Pedro Henrique Calais Guerra
Ph.D. Student
Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
email: pcalais AT dcc.ufmg.br











About me

I am a Ph.D. student in the Department of Computer Science at the Federal University of Minas Gerais (UFMG), Brazil, working with Prof. Wagner Meira Jr.

In my Ph.D. research, I am devising transfer learning strategies to address the main challenges of real-time sentiment analysis in social media: the lack of labeled data to support supervised classifiers and the highly dynamic nature of textual content in streams such as the one provided by Twitter.

In particular, I exploit user bias, the human tendency to favor one side of a discussion in argumentation at the expense of neutrality and balance. Unlike the text itself, which changes quickly, human bias is a robust and consistent pattern, and my research uses this information to analyze sentiment (polarity) in real time in social media discussions about polemic, heavily debated topics such as politics and sports. The main motivation is to cope with two hard challenges: the lack of labeled textual data to support learning algorithms and the unpredictable directions discussions can take. For more details, check our KDD'11 paper.

As a follow-up to my KDD'11 work, I aim to devise graph mining strategies that use user bias to detect more complex sentiments, such as irony and sarcasm (e.g., when a user from one side endorses someone from the opposite side), to automatically detect when a social media user changes his or her view on a topic, and to rank content according to its "polemicness" (content endorsed by multiple sides of a discussion tends to be more interesting). I am also working on novel strategies to mine user inclinations, such as the temporal locality among users with the same bias; for example, supporters of the same football team tend to post at similar times.

I am currently looking for visiting scholar/internship opportunities to help advance my research, especially to extend the techniques I developed to analyze the 2010 Brazilian elections so that they can also analyze sentiment on the upcoming 2012 US elections and on other major sporting events, such as the Olympics and NBA and NFL games.

In parallel, I also do research on spam fighting. During my master's, I focused on characterizing the behavior and strategies adopted by spammers in order to understand how they disseminate and distribute their messages. The core of this research is the development of clustering algorithms to detect spam campaigns. Take a look at some of the patterns and regularities in spam construction techniques we have been able to find:
Mining Spam Campaigns and Spam Address Lists
For more details, check our KDD'09 demonstration tool.

Education

See my Lattes CV

M.Sc., Computer Science, Federal University of Minas Gerais (2009). Master's Thesis (in Portuguese). Chosen as one of the 11 best Master's theses in Computer Science in Brazil in 2010.
B.S., Computer Science, Federal University of Minas Gerais (2006) (best student award; GPA 4.7/5.0)

Media Coverage

My research on analyzing Twitter discussions about polemic topics such as politics and sports has received some media coverage:
My spam research has also been covered:

Research Interests

My general research interests are in the area of data mining and machine learning. I am particularly interested in the following areas.

 


 

Publications:

Real Time Sentiment Analysis in Social Media:

From Bias to Opinion: a Transfer-Learning Approach to Real-Time Sentiment Analysis [bibtex][slides]
Pedro H. Calais Guerra, Adriano Veloso, Wagner Meira Jr., Virgilio Almeida.
17th ACM International Conference on Knowledge Discovery and Data Mining (KDD '11), 2011, San Diego, California.

Exploiting Temporal Locality to Determine User Bias in Microblogging Platforms [bibtex][slides]
Pedro H. Calais Guerra, Loic Cerf, Thiago Costa Porto, Adriano Veloso, Wagner Meira Jr., Virgilio Almeida.
Journal of Information and Data Management, 2011.

Spam Fighting and Characterization:

Spam Detection Using Web Page Content: a New Battleground [bibtex]
Marco Túlio Ribeiro, Pedro H. Calais Guerra, Dorgival Guedes, Adriano Veloso, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
8th Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS '11), Perth, Australia

Exploring the Spam Arms Race to Characterize Spam Evolution [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
7th Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS '10), Redmond, WA, USA

Spam Miner: A Platform for Detecting and Characterizing Spam Campaigns (demo paper) [bibtex]
Pedro H. Calais Guerra, Douglas Pires, Marco Túlio Ribeiro, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
International Conference on Knowledge Discovery and Data Mining (KDD '09), 2009, Paris, France.

Spamming Chains: A New Way of Understanding Spammer Behavior [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
Sixth Conference on e-Mail and Anti-Spam (CEAS '09)

A Campaign-based Characterization of Spamming Strategies [bibtex]
Pedro H. Calais Guerra, Douglas Pires, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen.
Fifth Conference on e-Mail and Anti-Spam (CEAS '08)

Detecção de Spams Utilizando Conteúdo Web Associado a Mensagens [Spam Detection Using Web Content Associated with Messages] (in Portuguese) [bibtex]
Marco Túlio Ribeiro, Leonardo Vilela, Pedro H. Calais Guerra, Adriano Veloso, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
XXIX Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC '11)

Identificação e Caracterização de Spammers a partir de Listas de Destinatários [Identification and Characterization of Spammers Based on Recipient Lists] (in Portuguese) [bibtex]
Pedro H. Calais Guerra, Marco Túlio Ribeiro, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
XXVIII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC '10)

Caracterização do Encadeamento de Conexões para Envio de Spams [Characterization of Connection Chaining for Sending Spam] (in Portuguese) [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Marcelo H. P. C. Chaves, Klaus Steding-Jessen.
XXVII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC '09)

Caracterização de Estratégias de Disseminação de Spams [Characterization of Spam Dissemination Strategies] (in Portuguese) [bibtex]
Pedro H. Calais Guerra, Dorgival Guedes, Wagner Meira Jr., Cristine Hoepers, Klaus Steding-Jessen.
XXVI Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC '08)

e-Commerce:

A Seller's Perspective Characterization Methodology for Online Auctions
Arlei Silva, Pedro H. Calais Guerra, Adriano Pereira, Fernando Mourao, Jussara Almeida, Wagner Meira Jr., Paulo Goes.
International Conference on Electronic Commerce (ICEC), 2008.

Broadband User Behavior Characterization:

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida.
Handbook of Research in Global Diffusion of Broadband Data. 1 ed. Hershey, Pennsylvania, US: IGI Global, 2008

Uma caracterização de usuários de Internet de Banda Larga [A Characterization of Broadband Internet Users] (in Portuguese)
Pedro H. Calais Guerra, Leonardo Rocha.
Congresso da SBC, 2005, São Leopoldo/RS. Concurso de Trabalhos de Iniciação Científica [Undergraduate Research Contest] (CTIC '2005)

BUBA: Uma ferramenta de caracterização de usuários de Internet Banda Larga [BUBA: A Tool for Characterizing Broadband Internet Users] (in Portuguese)
Pedro H. Calais Guerra, Elisa Tuler de Albergaria, Leonardo Rocha, Humberto Marques, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida.
Simpósio Brasileiro de Redes de Computadores, Salão de Ferramentas [Tools Session], Fortaleza/CE (SBRC '05)

Characterizing broadband user behavior and their e-business activities
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida.
ACM SIGMETRICS Performance Evaluation Review, v. 32, p. 3-13, 2004

Characterizing Broadband User Behavior
Humberto Marques, Leonardo Rocha, Pedro H. Calais Guerra, Jussara Almeida, Wagner Meira Jr., Virgílio Almeida
First ACM Workshop on Next Generation Residential Broadband Challenges (NRBC '2004), New York, NY

Other Topics:

AnthillSched: A Scheduling Strategy for Irregular and Iterative I/O-Intensive Parallel Jobs
Luis Fabrício Góes, Pedro H. Calais Guerra, Bruno Coutinho, Leonardo Rocha, Wagner Meira Jr., Renato Ferreira, Dorgival Guedes, Walfredo Cirne.
Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP '2005), 2005, Cambridge.


Computer Science conferences I have attended:



Other interests:


