Reference
@inproceedings{costa@sac2013,
author = {Helen Costa and Fabricio Benevenuto and Luiz Henrique de Campos Merschmann},
title = {Detecting Tip Spam in Location-based Social Networks},
booktitle = {Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC)},
year = {2013},
location = {Coimbra, Portugal}
}
If you want to use our dataset, please let us know by email at benevenuto AT gmail.com.
Database
The dataset used in the paper "Detecting Tip Spam in Location-based Social Networks", published in the Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC), Coimbra, Portugal, March 2013, is available.
The file is in the Weka ARFF input format. Each line in its data section represents one review (tip) from our test collection. The attributes on each line are separated by commas and describe the review characteristics listed below. The last attribute is the review class ("spam" or "non-spam").
- Clicks on the link "This tip helped me"
- Clicks on the link "Report abuse"
- Number of places registered by the user
- Number of tips posted by the user
- Number of photos posted by the user
- Number of clicks on the place page
- Number of tips on the place
- Place rating
- Clicks on the link "Thumbs down"
- Clicks on the link "Thumbs up"
- Similarity score (avg)
- Similarity score (max)
- Similarity score (min)
- Similarity score (median)
- Similarity score (sd)
- Number of spam words and spam rules
- Number of capital letters
- Number of numeric characters
- Number of phone numbers in the text
- Number of email addresses in the text
- Number of URLs in the text
- Number of contact information items in the text
- Number of words
- Number of words in capital letters
- Distance among all places reviewed by the user (avg)
- Distance among all places reviewed by the user (max)
- Distance among all places reviewed by the user (min)
- Distance among all places reviewed by the user (median)
- Distance among all places reviewed by the user (sd)
- Clustering coefficient
- Reciprocity
- Number of followers (in-degree)
- Number of followees (out-degree)
- Ratio of followers to followees
- Degree
- Betweenness
- Assortativity (in-in)
- Assortativity (in-out)
- Assortativity (out-in)
- Assortativity (out-out)
- PageRank
- Class (spam or non-spam)
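Since each data line is a comma-separated row of features followed by the class, the file can be loaded in a few lines. The sketch below is a minimal, hypothetical reader written only with the Python standard library, not a tool shipped with the dataset. It assumes dense rows, numeric feature attributes, a nominal class as the last attribute, attribute names without spaces, and "?" for missing values. The attribute names in the inline sample are made up for illustration. For full ARFF support (quoted names, sparse rows), Weka itself or `scipy.io.arff.loadarff` are safer choices.

```python
import io
from typing import List, TextIO, Tuple


def load_arff(fh: TextIO) -> Tuple[List[str], List[List[float]], List[str]]:
    """Minimal ARFF reader (sketch, not a full ARFF parser).

    Assumes: dense data rows, numeric features, nominal class as the
    last attribute, attribute names without spaces, '?' = missing value.
    Returns (attribute names, feature rows, class labels).
    """
    names: List[str] = []
    features: List[List[float]] = []
    labels: List[str] = []
    in_data = False
    for raw in fh:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": the name is the second token
                names.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue  # header lines (@relation, etc.) carry no instances
        values = [v.strip() for v in line.split(",")]
        # All columns but the last are numeric features; '?' becomes NaN
        features.append(
            [float("nan") if v == "?" else float(v) for v in values[:-1]]
        )
        # The last column is the class label, with optional quotes removed
        labels.append(values[-1].strip("'\""))
    return names, features, labels


# Usage on a tiny inline sample (hypothetical attribute names)
sample = """@relation tips_example
@attribute helped numeric
@attribute report_abuse numeric
@attribute class {spam,non-spam}
@data
3,0,non-spam
0,2,spam
"""
names, X, y = load_arff(io.StringIO(sample))
print(names)  # ['helped', 'report_abuse', 'class']
print(y.count("spam"), y.count("non-spam"))  # 1 1
```

To read the real file, pass an open file handle instead, e.g. `with open(path) as fh: load_arff(fh)`, where `path` is wherever you stored the dataset.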