Pedro Calais, computer scientist

Pedro Calais
Computer Scientist (Ph.D.) and Software Engineer

Belo Horizonte, MG, Brazil
email: pcalais@dcc.ufmg.br

I am a computer scientist (Ph.D.) motivated to do long-lasting, fundamental, technical work at the intersection of science and engineering.

I do my best to be a polymathic engineer-scientist. This means that at any given time you may find me building a large-scale enterprise distributed system, teaching a bootcamp on big data tools, studying Economics through the lens of bibliometrics, and writing a paper on how social psychology theories can be embedded into AI algorithms.

In my roles as academic researcher, teacher, individual contributor and manager, I've learned that great things can be accomplished when we constantly pursue truth and excellence by sharpening our knowledge in theory and practice, in breadth and depth, and in soft and hard skills.

My central purpose is designing and building useful software in a pragmatic manner, which means finding the right combination of code communicability, simplicity, and flexibility.
Academic teaching and research also motivate me because of the virtuous cycle they create: "In learning you will teach, and in teaching you will learn."

For more details on my career, check my LinkedIn profile and my publications on Google Scholar.

If you speak Portuguese, here is an interview I gave to UFMG in which I reflect on my career and the lessons I learned along the way.

You may also be interested in my post on how Generative AI has impacted my work as a lecturer.

Education:

My academic background is in computer science:

Ph.D., Computer Science, Universidade Federal de Minas Gerais (2015) - Ph.D. Dissertation.
I was a visiting scholar at the Cornell University CS Department and was supported by Google Research and UOL Bolsa Pesquisa grants.
M.S., Computer Science, Universidade Federal de Minas Gerais (2009) - Master's Thesis (in Portuguese).
Chosen as one of the 11 best Master's theses in Computer Science in Brazil in 2010.
B.S., Computer Science, Universidade Federal de Minas Gerais (2006).
I received the best student award (GPA 4.7/5.0).

I enjoy studying Economics, and I have just finished the Austrian School of Economics specialization course offered by Instituto Mises Brasil.

Industry Experience:

On the builder side, I have done hands-on software engineering at the following companies:

WorldSense (2015-2018)
Loggi (2018-2021)
Stone Co. (2021-2023)

Research and Professional Interests:

software design principles that tame complexity, allowing software to evolve while remaining stable
data science and data engineering
machine learning
computational social sciences, i.e., connecting social theories with algorithms

Articles in the media, talks and interviews:

AI and the Job Market (in Portuguese).
How are Brazilians using Artificial Intelligence? (in Portuguese). Interview with 98News Radio.
The impact of AI on the market. Interview with 98News Radio.
What do tech companies expect from data scientists and software engineers? (in Portuguese).
The Economics of AI (in Portuguese).
Using Spark in Scala (in Portuguese).
Measuring Polarization in Social Media using Community Boundaries.

Publications:

Large Language Models:

This work is the result of a collaboration with Boston University aimed at advancing our understanding of the cognitive structure of LLMs.

Disentangling Text and Math in Word Problems: Evidence for the Bidimensional Structure of Large Language Models' Reasoning
Pedro Calais, Gabriel Franco, Zilu Tang, Themistoklis Nikas, Wagner Meira Jr., Evimaria Terzi, Mark Crovella.
Findings of the ACL, 2025.

Bibliometrics and Austrian Economics:

I love Economics. In this work, I applied bibliometrics and data analysis tools to characterize the branch of Economics I identify with most, Austrian Economics, and how it studies topics such as entrepreneurship and business cycles.

Contemporary Austrian School as a research program: What can bibliometrics teach us?
Pedro Calais, Joao Mazzoni, Mariana Abreu
Review of Austrian Economics, 2024.

Software Engineering and Flow State:

Recently, the connection between software engineering and neuroscience has sparked my interest. While working at Stone Co., I noticed how test-driven development (TDD) helps developers enter the so-called flow state.

We found evidence that TDD promotes cognitive flow and enables better software delivery, bridging agile practice with cognitive science.

Since then, I have published academic work detailing the mechanics of this connection and have given talks at The Developer's Conference (TDC) in Brazil and at Google Developer Groups meetups.

Test-Driven Development Benefits Beyond Design Quality: Flow State and Developer Experience [bibtex] [slides]
Pedro Calais, Lissa Franzini
International Conference on Software Engineering - New Ideas and Emerging Results (ICSE NIER, 2023).

Politicization, Fake News and Hate Speech:

I have a special interest in connecting social science theories with computational methods. In recent research, some colleagues at UFMG and I show how politicization can be observed as a genuine social process: a transition from a non-political to a political topic.

Topic Shifts as a Proxy for Assessing Politicization in Social Media
Marcelo Sartori, Pedro Calais, Joao Pedro Junho, Matheus Prado, Tomas Lacerda, Wagner Meira Jr., Virgilio Almeida
International AAAI Conference on Web and Social Media (ICWSM 2024).

"Like Sheep Among Wolves": Characterizing Hateful Users on Twitter [bibtex]
Manoel Ribeiro, Pedro Calais, Yuri Santos, Wagner Meira Jr. and Virgilio Almeida
Misinformation and Misbehavior Mining Workshop (MIS2) @ WSDM 2018, Los Angeles, USA.

"Everything I disagree with is #FakeNews": Correlating Political Polarization and Spread of Misinformation [bibtex]
Manoel Ribeiro, Pedro Calais, Wagner Meira Jr. and Virgilio Almeida
Data Science + Journalism Workshop @ KDD 2017, Halifax, Canada.

Polarization and Sentiment Analysis on Social Streams:

My Ph.D. research focused on connecting social psychology theories about how people express their opinions to sentiment analysis algorithms tailored to rapidly evolving social streams with an underlying social graph.
We demonstrated that simple, effective algorithms based on such theories can operate on highly dynamic topics such as a soccer match.

Antagonism also Flows through Retweets: The Impact of Out-of-Context Quotes in Opinion Polarization Analysis (poster paper) [bibtex][extended paper]
Pedro Calais, Roberto Nalon, Renato Assuncao and Wagner Meira Jr.
11th International AAAI Conference on Weblogs and Social Media (ICWSM 2017), Montreal, Canada.

Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help [bibtex][slides]
Pedro Calais, Wagner Meira Jr., Claire Cardie
7th International ACM Conference on Web Search and Data Mining (WSDM 2014), New York City, USA.

A Measure of Polarization on Social Media Networks based on Community Boundaries [bibtex][slides][blog post][video]
Pedro Calais, Wagner Meira Jr., Claire Cardie, Robert Kleinberg.
7th International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, USA.

From Bias to Opinion: a Transfer-Learning Approach to Real-Time Sentiment Analysis [bibtex][slides]
Pedro Calais, Adriano Veloso, Wagner Meira Jr., Virgilio Almeida.
17th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2011), San Diego, California.

Spam Fighting and Characterization:

During my master's degree, I focused on characterizing and investigating the behavior and strategies spammers adopt to disseminate and distribute their messages.
The research centered on developing clustering algorithms to detect spam campaigns and on using association rule mining to uncover regularities and patterns of spam abuse.
Take a look at some of the patterns and regularities in spam construction techniques we were able to find:

Mining Spam Campaigns and Spam Address Lists

Spam Detection Using Web Page Content: a New Battleground [bibtex]
Marco Tulio Ribeiro, Pedro Calais, Dorgival Guedes, Adriano Veloso, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
8th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS'11), Perth, Australia

Exploring the Spam Arms Race to Characterize Spam Evolution [bibtex]
Pedro Calais, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
7th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS'10), Redmond, WA, USA

Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns (demo paper) [bibtex]
Pedro Calais, Douglas Pires, Marco Túlio Ribeiro, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
International Conference on Knowledge Discovery and Data Mining (KDD '09), 2009, Paris, France.

Spamming Chains: A New Way of Understanding Spammer Behavior [bibtex]
Pedro Calais, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
Sixth Conference on e-Mail and Anti-Spam (CEAS '09)

A Campaign-based Characterization of Spamming Strategies [bibtex]
Pedro Calais, Douglas Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen.
Fifth Conference on e-Mail and Anti-Spam (CEAS '08)

Information Retrieval:

An Anatomy for Neural Search Engines [bibtex]
Akio Nakamura, Pedro Calais, Davi Reis, Andre Paim
Information Sciences, 2018.

e-Commerce:

A Seller's Perspective Characterization Methodology for Online Auctions
Arlei Silva, Pedro Calais, Adriano Pereira, Fernando Mourao, Jussara Almeida, Wagner Meira Jr., Paulo Goes.
International Conference on Electronic Commerce (ICEC), 2008.

Broadband User Behavior Characterization:

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro Calais, Jussara Almeida, Wagner Meira Jr., Virgilio Almeida.
Handbook of Research in Global Diffusion of Broadband Data. 1 ed. Hershey, Pennsylvania, US: IGI Global, 2008

Characterizing broadband user behavior and their e-business activities
Humberto Marques, Leonardo Rocha, Pedro Calais, Jussara Almeida, Wagner Meira Jr., Virgilio Almeida.
ACM SIGMETRICS Performance Evaluation Review, v. 32, p. 3-13, 2004

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro Calais, Jussara Almeida, Wagner Meira Jr., Virgilio Almeida
The first ACM Workshop on Next Generation Residential Broadband Challenges, New York, NY (NRBC '2004)

High-Performance Computing:

AnthillSched: A Scheduling Strategy for Irregular and Iterative I/O-Intensive Parallel Jobs
Luis Fabricio Goes, Pedro Calais, Bruno Coutinho, Leonardo Rocha, Wagner Meira Jr, Renato Ferreira, Dorgival Guedes, Walfredo Cirne.
Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2005)

Some of my other interests:

Austrian School of Economics
Stoicism
Cars