Reference

@inproceedings{costa@sac2013,

author = {Helen Costa and Fabricio Benevenuto and Luiz Henrique de Campos Merschmann},

title = {Detecting Tip Spam in Location-based Social Networks},

booktitle = {Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC)},

year = {2013},

location = {Coimbra, Portugal}

}

 

If you want to use our dataset, please let us know by email at benevenuto AT gmail.com.



Database

The dataset used in the paper "Detecting Tip Spam in Location-based Social Networks", published in the Proceedings of the 28th Annual ACM Symposium on Applied Computing (SAC), Coimbra, Portugal, March 2013, is available.

The file is in the Weka ARFF format. Each line represents a review from our test collection. The attributes are separated by commas and describe the review's characteristics, as explained in the list below. The last attribute is the review class (i.e., "spam" or "non-spam").

  1. Clicks on the link "This tip helped me"
  2. Clicks on the link "Report abuse"
  3. Number of places registered by the user
  4. Number of tips posted by the user
  5. Number of photos posted by the user
  6. Number of clicks on the place page
  7. Number of tips on the place
  8. Place rating
  9. Clicks on the link "Thumbs down"
  10. Clicks on the link "Thumbs up"
  11. Similarity score (avg)
  12. Similarity score (max)
  13. Similarity score (min)
  14. Similarity score (median)
  15. Similarity score (sd)
  16. Number of spam words and spam rules
  17. Number of capital letters
  18. Number of numeric characters
  19. Number of phone numbers in the text
  20. Number of email addresses in the text
  21. Number of URLs in the text
  22. Number of contact information items in the text
  23. Number of words
  24. Number of words in capital letters
  25. Distance among all places reviewed by the user (avg)
  26. Distance among all places reviewed by the user (max)
  27. Distance among all places reviewed by the user (min)
  28. Distance among all places reviewed by the user (median)
  29. Distance among all places reviewed by the user (sd)
  30. Clustering coefficient
  31. Reciprocity
  32. Number of followers (in-degree)
  33. Number of followees (out-degree)
  34. Ratio of followers to followees
  35. Degree
  36. Betweenness
  37. Assortativity (in-in)
  38. Assortativity (in-out)
  39. Assortativity (out-in)
  40. Assortativity (out-out)
  41. Pagerank
  42. Class (spam or non-spam)
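
The layout described above (41 comma-separated numeric attributes followed by the class label) can be read with a few lines of standard-library Python. This is a minimal sketch, not the authors' tooling: the function name parse_arff_rows, the sample values, and the handling of "?" as a missing value are assumptions for illustration, and the real file's header may differ.

```python
from typing import Iterable, List, Tuple

NUM_FEATURES = 41  # attributes 1-41 in the list above; attribute 42 is the class


def parse_arff_rows(lines: Iterable[str]) -> List[Tuple[List[float], str]]:
    """Return (features, label) pairs from the @data section of an ARFF file."""
    rows = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        if not in_data:
            # Header lines (@relation, @attribute) are skipped until @data.
            in_data = line.lower().startswith("@data")
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != NUM_FEATURES + 1:
            raise ValueError(
                f"expected {NUM_FEATURES + 1} fields, got {len(fields)}"
            )
        # Assumption: "?" marks a missing value, as is conventional in ARFF.
        features = [float("nan") if f == "?" else float(f) for f in fields[:-1]]
        label = fields[-1].strip("'\"")
        rows.append((features, label))
    return rows


# Tiny synthetic example (made-up values, not taken from the dataset).
sample = [
    "@relation tips",
    "@data",
    ",".join(["0"] * NUM_FEATURES + ["spam"]),
    ",".join(["1.5"] * NUM_FEATURES + ["non-spam"]),
]
rows = parse_arff_rows(sample)
spam_count = sum(1 for _, label in rows if label == "spam")
```

To read an actual file, pass an open file handle instead of the sample list, for example parse_arff_rows(open(path)) with path pointing at the downloaded ARFF file. Feature index i in each row corresponds to attribute i + 1 in the list above.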