Daniel Kansaon

I am currently pursuing my Ph.D. at the Universidade Federal de Minas Gerais (UFMG), Brazil, where I am a member of the Social Computing Research Group and am advised by Dr. Fabrício Benevenuto. I have also collaborated on research with a group at the Max Planck Institute for Informatics (MPI-INF), working with Anja Feldmann and Savvas Zannettou.

My research spans areas of computer science related to data science, including natural language processing, machine learning, and data mining. I am currently working on projects related to misinformation, hate speech, and systems with societal impact. I gained valuable experience during a three-month research internship at the Max Planck Institute for Informatics (MPI-INF) in Germany. I earned my Master's degree in Computer Science at the Universidade Federal de Minas Gerais (UFMG) in 2020, where I focused on text classification and opinion mining. I completed my undergraduate degree in Information Systems at the Pontifícia Universidade Católica de Minas Gerais (PUC-MG) in 2017. In addition, I specialized in Distributed Software Architecture and worked for almost 3 years as a software engineer.

Interests: Data Analysis, Complex Networks, Machine Learning, Natural Language Processing and Sentiment Analysis.

My complete name: Daniel Pimentel Kansaon.

My CV is available for download: in English.
Last update: July 07, 2025

Journal Publications

  • Analysis of Classification Algorithms for Detecting Emotions in Tweets in Brazilian Portuguese (translated)
  • Abstract: With increasing access to the Web, large amounts of content are produced daily. The study of such contents allows the discovery of new knowledge. In this sense, this work presents an analysis of algorithms that allow the detection of emotions in tweets in the Brazilian Portuguese language. Thus, ten algorithms are considered, from decision trees to classifiers based on Bayes model, addressing altogether, seven classes of emotions: sad, upset, love, happy, anger, envy and irony. The results of the experimental evaluation are better when classifying relationships of distinct emotions, reaching 85% accuracy with a Naive Bayes algorithm. On the other hand, relations between close feelings present results inferior to 70% of correctness in some cases. Moreover, Naive Bayes-based classification algorithms present efficient results in a variety of contexts, in addition to having consistent language-independent behavior.
  • Mining Comparative Opinions in Portuguese: A Lexicon-based Approach [Awaiting Publication]
  • Abstract: The constant expansion of e-commerce, recently boosted due to the coronavirus pandemic, has led to a massive increase in online shopping, made by increasingly demanding customers, who seek comments and reviews on the Web to assist in decision-making regarding the purchase of products. In these reviews, part of the opinions found are comparisons, which contrast aspects expressing a preference for an object over others. However, this information is neglected by traditional sentiment analysis techniques, which are not applicable to comparisons, since comparisons do not directly express positive or negative sentiment. In this context, despite efforts in the English language, almost no studies have been done to develop appropriate solutions that allow the analysis of comparisons in the Portuguese language. This work presents one of the first studies on comparative opinion in Portuguese, with four main contributions: (1) a hierarchical approach for detecting comparative opinions, which consists of an initial binary step that separates regular opinions from comparatives, followed by a step that categorizes the sentences into five opinion groups: (i) Non-Comparative, (ii) Non-Equal Gradable, (iii) Equative, (iv) Superlative, and (v) Non-Gradable. The results are promising, reaching 87% Macro-F1 and 0.94 AUC (Area Under the Curve) for the binary step, and 61% Macro-F1 for the multi-class step; (2) a lexicon-based algorithm to detect the entity expressed as preferred in comparative sentences, reaching 94% Macro-F1 for Superlative; (3) two new datasets with approximately 5,000 comparative and non-comparative sentences in Portuguese; and (4) a lexicon with words and expressions frequently used to make comparisons in the Portuguese language.

Conference Publications

  • From Fake News to Real Protests: WhatsApp's Role in Brazilian Political Coordination
  • Abstract: The growth of social networks has raised concerns about the misuse of these platforms by disinformation campaigns, social bots, and coordinated activities. Among these platforms, WhatsApp has become a focal point for this abuse, particularly in Brazil, one of the countries with the highest use of the platform. Despite acknowledging the presence of coordinated campaigns and implementing restrictions on the number of messages forwarded per user, the platform continues to be abused. Due to its private nature and the difficulty of collecting information, little is known about these campaigns and the messages they disseminate. Given this context, our study investigates the presence of coordinated activities on WhatsApp in Brazil, identifying their content and purpose, especially how these messages relate to recent Brazilian political events. To answer these questions, we analyzed 13 million messages from 1,444 political groups over seven months from July 2022 to January 2023. Using network analysis, our findings suggest a significant prevalence of coordinated activity in the propagation of news messages, 26% of which originate from misinformation sites. Furthermore, we found that images play a key role in coordinated activity, accounting for 15% of messages, which are also used to mislead. Finally, coordinated accounts were used to organize collective actions, including attacks and protests against election results.
  • A Sticker is Worth a Thousand Words: Characterizing the Use and Abuse of Stickers on WhatsApp Political Groups in Brazil
  • Abstract: Instant messaging platforms have become an important means of communication in our world. According to WhatsApp, more than 100 billion messages are sent daily through the app. Communication on these platforms has allowed individuals to express themselves in other types of media, rather than simple text, including audio, videos, images, and, more recently, stickers. This new multimedia format, in particular, emerged with messaging apps and gained considerable popularity among users, promoting new forms of interactions. Stickers range from static images of memes and emojis to animated images similar to GIFs, often used in humorous contexts. However, in the Brazilian context of WhatsApp, they are transcending their role as a mere form of humor to become an important element in political strategy. In this regard, we investigate how stickers are used, revealing unique characteristics that these media bring to public WhatsApp groups and, more specifically, the political use of this new media format. Furthermore, we found evidence of sticker abuse on WhatsApp, where users attack political opponents and spread hate speech and offensive content in public groups without any moderation. To investigate this phenomenon, we collected a large sample of messages from public political WhatsApp groups in Brazil and analyzed the sticker messages shared in this context. Warning! This paper contains images and terms that may be offensive to some audiences.
  • Strategies and Attacks of Digital Militias in WhatsApp Political Groups
  • Abstract: WhatsApp provides a fertile ground for the large-scale dissemination of information, particularly in countries like Brazil and India. Given its increasing popularity and use for political discussions, it is paramount to ensure that WhatsApp groups are adequately protected from attackers who aim to disrupt the activity of WhatsApp groups. Motivated by this, in this work, we characterize two types of attacks that may disrupt WhatsApp groups. We look into the flooding attack, where an attacker shares usually numerous duplicate messages within a short period, and the hijacking attack, where attackers aim to obtain complete control of the group. We collect a large dataset of 19M messages shared in 1.6K WhatsApp public political groups from Brazil and analyze them to identify and characterize flooding and hijacking attacks. Among other things, we find that approximately 7% of the groups receive flooding attacks, which are usually short-lived (usually less than four minutes), and groups can receive multiple flooding attacks, even within the same day. Also, we find that most flooding attacks are executed using stickers (62% of all flooding attacks) and that, in most cases, attackers use both flooding and hijacking attacks to obtain complete control of the WhatsApp groups. Our work aims to raise user awareness about such attacks on WhatsApp and emphasizes the need to develop effective moderation tools to assist group administrators in preventing or mitigating such attacks.
  • WhatsApp Monitor 2.0 – Monitoring Brazilian Political Groups on WhatsApp [Awaiting Publication] (translated)
  • Abstract: WhatsApp has become a crucial tool in communicating and disseminating (mis)information in Brazil. Since 2018, the tool has been widely used for disinformation and hate speech campaigns. In this work, we propose WhatsApp Monitor 2.0, a web-based system that aids researchers and journalists in tracking, in real-time, the most popular content shared in public WhatsApp political groups. Our tool monitors, processes, and ranks images, videos, audios, and text messages posted in these groups, presenting the most popular content daily. WhatsApp Monitor 2.0 provides a valuable resource for identifying viral content on WhatsApp, thus helping to combat misinformation.
  • “Click Here to Join”: A Large-Scale Analysis of Topics Discussed by Brazilian Public Groups on WhatsApp
  • Abstract: WhatsApp has many similarities with online social networks, as it allows connections between multiple people and massive communication by sharing content with your contacts and public groups, which bring people together to discuss a topic. Even though it is one of the most popular social media platforms in the world, there is a lack of a systematic understanding of the WhatsApp ecosystem, especially when it comes to knowing the subjects discussed in public groups and how other users find and join those groups. In this direction, our goal is to investigate how public groups are shared on the Web and also map the main topics existing within this ecosystem. For this, we perform a large-scale collection, spanning four main sources on the Web for sharing groups, with more than 270k WhatsApp public groups, categorizing and analyzing this environment. Our results shed light on the large number of groups focused on topics such as friendship, pop culture, stickers, sales, jobs, education, and even adult content, suggesting the many uses of the WhatsApp tool. We also found key differences in groups according to the source where they were posted. Moreover, we discovered how group links work to persuade users from other platforms into the underground environment of WhatsApp. Malicious groups abuse its closed architecture and low moderation for illicit practices such as selling fake money and cloned cards. Furthermore, our analysis also found evidence of automated behavior in malicious group sharing. Finally, we discuss implications and measures that can be taken to address these issues.
  • Telegram Monitor: Monitoring Brazilian Political Groups and Channels on Telegram
  • Abstract: In this work, we present the “Telegram Monitor”, a web-based system that monitors the political debate in this environment and enables the analysis of the most shared content in multiple channels and public groups. Our system aims to allow journalists, researchers, and fact-checking agencies to identify trending conspiracy theories and misinformation campaigns, or simply to monitor the political debate in this space during the 2022 Brazilian elections. We hope our system can help combat misinformation spreading through Telegram in Brazil. The following link contains a brief description of the system: https://bit.ly/3l4xNrF.
  • WhatsApp Monitor: A Fact-Checking System to Combat Misinformation (translated)
  • Abstract: WhatsApp is the most popular instant messaging application in many countries such as Brazil, India, and Indonesia, where many people use it as the main interface to the Web. Recently, WhatsApp has been pointed to as an important actor in the spreading of misinformation. However, due to its encrypted and peer-to-peer nature, it is hard to explore the content people share within WhatsApp at scale. In this work, we propose the Monitor de WhatsApp (http://www.whatsapp-monitor.dcc.ufmg.br/), a web-based system that helps researchers and journalists explore the nature of content shared on WhatsApp public groups from three different contexts: Brazil, India, and Indonesia. Our tool monitors multiple content categories such as images, videos, audio, and textual messages posted on a set of WhatsApp groups and displays the most shared content ranked per day. Our tool has been used to monitor content from the 2018 Brazilian elections through the COVID-19 pandemic and was one of the major sources for estimating the spread of misinformation and supporting fact-checking efforts on WhatsApp.
  • Mining Portuguese Comparative Sentences in Online Reviews
  • Abstract: The constant expansion of e-commerce, recently boosted due to the coronavirus pandemic, has led to a huge increase in online shopping. More and more, customers demand online reviews of products and comments on the Web to make decisions about buying one product over another. In this context, sentiment analysis techniques constitute the traditional way to summarize users' opinions that criticize or highlight the positive aspects of a product. Sentiment analysis of reviews usually relies on extracting positive and negative aspects of products, neglecting comparative opinions. Such opinions do not directly express a positive or negative view but contrast aspects of products from different competitors. In this paper, we present the first effort towards detecting comparative sentences in Portuguese. Identifying comparative sentences is a key task for companies to know how users are comparing a product with its competitors and is essential for developing sentiment summarization applications for the end user. In addition, we present a supervised approach to automatically detect Portuguese comparative sentences, classifying them into five distinct groups: (1) Non-Comparative, (2) Non-Equal Gradable, (3) Equative, (4) Superlative, and (5) Non-Gradable. To that end, this paper provides three main contributions: (1) a Portuguese lexicon list with words used to make comparisons; (2) two new Portuguese datasets with comparative sentences; and (3) a hierarchical approach for detecting multiple comparisons and classifying the sentences into different groups by using state-of-the-art classification algorithms, reaching an accuracy of 87%.
  • Leveraging the Facebook ads platform for election polling.
  • Abstract: Election polls provide valuable insights about voting intention on candidates filtered by demographic characteristics. This information can be used to understand the dynamics of underlying electoral preferences and how they change in the days or months during the campaign. Despite their importance, election polls are often time-consuming and expensive, especially those based on face-to-face surveys that interview a representative population of voters across the entire country. In this study, we propose a novel approach that explores social media advertising platforms to infer the audience demographics of politicians in the electoral race. We leverage the attribute-based targeting available on the Facebook Advertising platform by using the 'interests' related to candidates to calculate their audience demographics. Then, we compare the data extracted online with election polls taken in the same period. Our findings suggest that a candidate's popularity on Facebook, captured in terms of the number of likes, people talking about him/her, and the number of people interested in the candidate, is a valid indicator of his/her variation in vote intention polls. Additionally, we found that the fluctuations in the demographic aspects of supporters detected by election polls are captured by our methodology, with more precision for popular candidates. In particular, we show that sharp variations caused by high-impact events during the campaign, such as protests, are well captured by Facebook measurements. Finally, we deployed a system that exposes the audience demographics of Brazilian politicians on Facebook (available at http://www.audiencia-dos-politicos.dcc.ufmg.br/) and contributes to the understanding of the political scene in Brazil.
  • Sentiment Analysis in Brazilian Portuguese Tweets (translated)
  • Abstract: There are several studies on sentiment analysis for the English language. In the case of Brazilian Portuguese, the number of papers is smaller because fewer datasets and methods are available to perform the analysis. This work presents a methodology to compare techniques that classify feelings expressed directly or indirectly in tweets in the Brazilian Portuguese language. In addition, seven classes of feelings are considered and identified in the tweets. The results are promising when classifying distinct feelings, as the best classifier achieves 85% accuracy. On the other hand, relations between close feelings yield results below 70% accuracy.

Master's Thesis

  • Mining Comparative Opinions in Portuguese
  • Abstract: The constant expansion of e-commerce, recently boosted due to the coronavirus pandemic, has led to a huge increase in online shopping, made by increasingly demanding customers, who seek comments and reviews on the Web to assist in decision making regarding the purchase of products. In these reviews, part of the opinions found are comparisons, which contrast aspects expressing a preference for an object over others, allowing, for example, companies to know how customers compare their products to their competitors. However, this information is neglected by traditional sentiment analysis techniques and it is not applicable for comparisons, since they do not directly express a positive or negative sentiment. In this context, despite efforts in the English language, almost no studies have been done to develop appropriate solutions that allow the analysis of comparisons in the Portuguese language.

Bachelor's Thesis

  • Classification Techniques for Sentiment Analysis in Tweets in Brazilian Portuguese (translated)
  • Abstract: With the popularization of online social networks and the large volume of data produced every day, analyzing this information can help to understand phenomena, predict trends, and gauge common opinion. In sentiment analysis, there are many studies for the English language, owing to the number of tools and techniques available for it. In the case of Brazilian Portuguese, the number of studies is smaller because fewer datasets are available. Classification and sentiment analysis techniques often yield worse results than when applied to English. In this work, we conduct a study that uses sentiment analysis and data mining techniques to uncover sentiments expressed directly or indirectly in tweets written in Brazilian Portuguese. The tweets were collected through the official Twitter API. The data were stored in a database and organized into sentiment classes for the application of the classification algorithms. The tweets had to be preprocessed before the algorithms were applied. In addition, emojis required specific treatment during the work. The results show up to 85% accuracy when classifying distinct sentiments. On the other hand, the classes that compared similar sentiments did not achieve satisfactory results because of the similarity of the sentiments. Thus, detecting multiple sentiments in text remains a challenge for sentiment analysis.

For more information and my complete curriculum, visit: LinkedIn

Online Systems

  • Eleições sem Fake: Project led by Dr. Fabricio Benevenuto, which has Ph.D. students who contribute with systems focused on bringing transparency to Brazilian elections.
  • WhatsApp Monitor: This system shows the most shared images, videos, audios, messages, URLs in more than 500 WhatsApp public groups.
  • Telegram Monitor: This system shows the most shared content on Telegram public groups.
  • WhatsApp Reports: Reports and Analyses through the lens of WhatsApp.
  • Covid-19 Monitor: Analysis of Brazilian WhatsApp groups during the first three months of the pandemic in Brazil.

Datasets

Federal University of Minas Gerais (UFMG)
Department of Computer Science
Avenue: Presidente Antônio Carlos - 4201 - 6499
Belo Horizonte, Minas Gerais - Brazil
Daniel Kansaon
daniel [DOT] kansaon [AT] dcc [DOT] ufmg [DOT] br