Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal

Document Type

Article

Abstract

The use of Twitter has grown rapidly since the first tweet in 2006, and the number of spammers on Twitter shows a similar increase. Classifying users into spammers and non-spammers has been heavily researched, and new methods for spam detection are being developed rapidly. One of these classification techniques is known as random forests. We examine three studies that employ random forests using user-based features, geo-tagged features, and time-dependent features. Each study reported high accuracy rates and F-measures, with the exception of one model whose test set contained a more realistic proportion of spam than is typical in testing procedures. These studies suggest that random forests, in combination with unique feature selection, can be used to identify spam and spammers with high accuracy but may have shortcomings when applied to real-world situations.
