The use of Twitter has grown rapidly since the first tweet in 2006, and the number of spammers on Twitter shows a similar increase. Classifying users into spammers and non-spammers has been heavily researched, and new methods for spam detection are being developed rapidly. One such classification technique is the random forest. We examine three studies that employ random forests using user-based features, geo-tagged features, and time-dependent features. Each study reported high accuracy and F-measures, with the exception of one model whose test set contained a more realistic proportion of spam than typical testing procedures use. These studies suggest that random forests, in combination with unique feature selection, can be used to identify spam and spammers with high accuracy but may have shortcomings when applied to real-world situations.
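The effect of the spam proportion in a test set on the F-measure can be illustrated with a small calculation. The sketch below is not taken from any of the three studies: the function name `f_measure` and the true-positive and false-positive rates are hypothetical values chosen only for illustration. It holds the classifier's per-class behavior fixed and changes only the fraction of spam in the test set.

```python
def f_measure(tpr, fpr, spam_fraction, n=100_000):
    """F1 score on the spam class for a classifier with fixed per-class rates."""
    spam = n * spam_fraction   # number of spam items in the test set
    ham = n - spam             # number of legitimate items
    tp = tpr * spam            # spam correctly flagged
    fp = fpr * ham             # legitimate items wrongly flagged
    precision = tp / (tp + fp)
    recall = tpr
    return 2 * precision * recall / (precision + recall)


# Hypothetical rates, assumed for illustration only.
TPR, FPR = 0.95, 0.05

balanced = f_measure(TPR, FPR, spam_fraction=0.50)   # evenly split test set
realistic = f_measure(TPR, FPR, spam_fraction=0.05)  # spam is a small minority

print(f"balanced test set: F1 = {balanced:.3f}")
print(f"5% spam test set:  F1 = {realistic:.3f}")
```

Under these assumed rates, the same classifier scores a noticeably lower F-measure when spam is rare, because false positives drawn from the much larger legitimate class reduce precision.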
Haider, Humza S.
"Identifying Twitter Spam by Utilizing Random Forests,"
Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: Vol. 4, Article 5.
Available at: http://digitalcommons.morris.umn.edu/horizons/vol4/iss2/5