Spammers continually find new ways to put spam in front of the public, most commonly through email, social media, and advertisements. According to a 2011 report by the Messaging Anti-Abuse Working Group, roughly 90% of all email in the United States is spam, which is why this paper focuses on email spam. Spam filters have steadily improved at detecting and removing spam, but no method blocks 100% of it. As a result, many text classification methods have been developed, including a family of classifiers that take a Bayesian approach. Bayesian filtering was one of the earliest methods used to filter spam, and it remains relevant today. In this paper we analyze two specific optimizations of Naive Bayes text classification and spam filtering, examining the differences between them and how they have been used in practice. We show that a Bayesian filter can be implemented simply to yield a reasonably accurate text classifier, and that it can be modified to significantly improve the filter's accuracy. A variety of applications are explored as well.
Eberhardt, Jeremy J. "Bayesian Spam Detection," Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: Vol. 2, Article 2.
Available at: http://digitalcommons.morris.umn.edu/horizons/vol2/iss1/2