Pedro Calais Guerra


Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
email: pcalais AT dcc.ufmg.br











About me

I hold a doctorate from the Department of Computer Science of the Federal University of Minas Gerais (UFMG) in Brazil, where I worked with Prof. Wagner Meira Jr. From October 2012 to July 2013, I conducted my research as a visiting scholar at the Cornell University CS Department.

Currently, I am a software engineer at WorldSense, an Ad Tech startup whose goal is to help content publishers focus and enhance their content with relevant links at content creation time.

On the more academic side, I am interested in polarization of opinions: the social process whereby a social or political group is divided into two opposing sub-groups, with fewer and fewer members of the group remaining neutral or holding an intermediate position. I am working on unveiling which structural characteristics polarization induces in the social graphs we find on social media systems, especially in contexts such as Politics, Sports and highly debated topics. I am also interested in multi-polarized social graphs, which arise when more than two sides compete against each other. Such a scenario is found, for example, in Elections with more than two candidates and in Sports competitions (in the Soccer World Cup, for instance, N=32 sides oppose one another). In multi-polarized networks, more complex relationships and interactions among sides emerge, in particular support, antagonism and indifference. My current research also goes in this direction, and looks at how to embed such knowledge in content analysis and information diffusion in social media.

With a good understanding of how polarization manifests in social networks, we can use the special structure and social properties of polarized networks to enable solutions to challenging tasks such as real-time sentiment analysis in social media, which is hard due to the lack of labeled data to support supervised classifiers and to the dynamics of textual content observed in streams such as the one provided by Twitter.

My basic assumption is that, in polarized contexts, opinions are not manifested randomly by users at random moments in time. In other words, humans are biased. In particular, I use the fact that bias is the human tendency to favor one side of a discussion in argumentation, lacking neutrality and balance. Unlike pure text, human bias is a robust and consistent pattern that can be used to analyze sentiment (polarity) in real time in social media discussions about polemic and heavily debated topics such as Politics and Sports. The main motivation is to cope with two hard challenges: the lack of labeled textual data to support learning algorithms, and the unpredictable directions discussions can take. My recent results show that bias manifests in at least two dimensions:

I also aim to devise graph mining strategies that use user bias to detect more complex sentiments, such as irony and sarcasm (e.g., when a user from one side endorses someone from an opposite side), to automatically detect when a social media user changes their view on a topic, and to rank content according to its "polemicness" (content endorsed by multiple sides of a discussion tends to be more interesting).
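As a rough illustration of the ranking idea above, the sketch below scores a piece of content by how evenly its endorsements are spread across sides, using normalized Shannon entropy. This is only one assumed way to quantify "polemicness", not a description of any published method; the function name `polemicness`, the side labels and the toy posts are all hypothetical.

```python
import math
from collections import Counter


def polemicness(endorser_sides, n_sides):
    """Normalized Shannon entropy of endorsements across sides.

    Returns 0.0 when all endorsements come from a single side and
    1.0 when they are spread evenly over all `n_sides` sides.
    (Hypothetical score, used here for illustration only.)
    """
    counts = Counter(endorser_sides)
    if n_sides < 2 or len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_sides)


# Toy data: each post maps to the side of every user who endorsed it.
posts = {
    "post_a": ["left"] * 10,                  # endorsed by one side only
    "post_b": ["left"] * 5 + ["right"] * 5,   # evenly endorsed by both sides
    "post_c": ["left"] * 9 + ["right"],       # mostly one side
}

# Rank posts from most to least cross-side endorsement.
ranking = sorted(posts, key=lambda p: polemicness(posts[p], n_sides=2), reverse=True)
```

With more than two sides (the multi-polarized case), the same score applies by passing the number of sides as `n_sides`; it assumes each endorsing user has already been assigned to a side.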

Education

See my CV Lattes and Google Scholar Citations.

M.Sc., Computer Science, Federal University of Minas Gerais (2009) - Master's Thesis (in Portuguese). Chosen as one of the 11 best Master's theses in Computer Science in Brazil in 2010.
B.S., Computer Science, Federal University of Minas Gerais (2006) (best student award - 4.7/5.0 GPA)

Media Coverage

There has been some media coverage of my research on analyzing Twitter discussions about polemic topics such as Politics and Sports:
And also of my Spam Research:

Research Interests

My general research interests are in the area of data mining and machine learning. I am particularly interested in the following areas.

 


 

Publications:

Real Time Sentiment Analysis in Social Media:

Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help [bibtex][slides]
Pedro H. Calais Guerra, Wagner Meira Jr., Claire Cardie
7th International ACM Conference on Web Search and Data Mining (WSDM 2014), New York City, USA.

A Measure of Polarization on Social Media Networks based on Community Boundaries [bibtex][slides][blog post][video]
Pedro H. Calais Guerra, Wagner Meira Jr., Claire Cardie, Robert Kleinberg.
7th International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, USA.

From Bias to Opinion: a Transfer-Learning Approach to Real-Time Sentiment Analysis [bibtex][slides]
Pedro H. Calais Guerra, Adriano Veloso, Wagner Meira Jr., Virgilio Almeida.
17th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD 2011), San Diego, California.

Spam Fighting and Characterization:

In the recent past I have also done research on spam fighting. During my master's course, I focused on characterizing and investigating the behavior and strategies adopted by spammers, in order to understand how they disseminate and distribute their messages. The core of the research is the development of clustering algorithms to detect spam campaigns. Take a look at some patterns and regularities in spam construction techniques we have been able to find:
Mining Spam Campaigns and Spam Address Lists
Spam Detection Using Web Page Content: a New Battleground [bibtex]
Marco Túlio Ribeiro, Pedro H. Calais Guerra, Dorgival Guedes, Adriano Veloso, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
8th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS'11), Perth, Australia

Exploring the Spam Arms Race to Characterize Spam Evolution [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
7th Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS'10), Redmond, WA, USA

Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns (demo paper) [bibtex]
Pedro H. Calais Guerra, Douglas Pires, Marco Túlio Ribeiro, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
International Conference on Knowledge Discovery and Data Mining (KDD '09), 2009, Paris, France.

Spamming Chains: A New Way of Understanding Spammer Behavior [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
Sixth Conference on e-Mail and Anti-Spam (CEAS '09)

A Campaign-based Characterization of Spamming Strategies [bibtex]
Pedro H. Calais Guerra, Douglas Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen.
Fifth Conference on e-Mail and Anti-Spam (CEAS '08)

e-Commerce:

A Seller's Perspective Characterization Methodology for Online Auctions
Arlei Silva, Pedro H. Calais Guerra, Adriano Pereira, Fernando Mourao, Jussara Almeida, Wagner Meira Jr., Paulo Goes.
International Conference on Electronic Commerce (ICEC), 2008.

Broadband User Behavior Characterization:

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida.
Handbook of Research in Global Diffusion of Broadband Data. 1 ed. Hershey, Pennsylvania, US: IGI Global, 2008

Characterizing broadband user behavior and their e-business activities
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida.
ACM SIGMETRICS Performance Evaluation Review, v. 32, p. 3-13, 2004

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida
The first ACM Workshop on Next Generation Residential Broadband Challenges, New York, NY (NRBC '2004)

Other Topics:

Estimativa de Demanda Potencial de Matrículas em Ensino Superior usando Dados Públicos e Múltiplos Modelos de Regressão (in Portuguese)
Pedro H. Calais Guerra, Rodrigo Mizobe, Eduardo Hruschka.
II Symposium on Knowledge Discovery, Mining and Learning (KDMILE), 2014, São Carlos, SP.

AnthillSched: A Scheduling Strategy for Irregular and Iterative I/O-Intensive Parallel Jobs
Luis Fabrício Góes, Pedro H. Calais Guerra, Bruno Coutinho, Leonardo Rocha, Wagner Meira Jr., Renato Ferreira, Dorgival Guedes, Walfredo Cirne.
Workshop on Job Scheduling Strategies for Parallel Processing, 2005, Cambridge. (JSSPP '2005)


Some of my other interests:


Find me on Social Networks:


