Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal

Article Title

Bayesian Spam Detection


Spammers always find new ways to get spammy content to the public. Very commonly this is accomplished by using email, social media, or advertisements. According to a 2011 report by the Messaging Anti-Abuse Working Group roughly 90% of all emails in the United States are spam. This is why we will be taking a more detailed look at email spam. Spam filters have been getting better at detecting spam and removing it, but no method is able to block 100% of it. Because of this, many different methods of text classification have been developed, including a group of classifiers that use a Bayesian approach. The Bayesian approach to spam filtering was one of the earliest methods used to filter spam, and it remains relevant to this day. In this paper we will analyze 2 specific optimizations of Naive Bayes text classification and spam filtering, looking at the differences between them and how they have been used in practice. This paper will show that Bayesian filtering can be simply implemented for a reasonably accurate text classifier and that it can be modified to make a significant impact on the accuracy of the filter. A variety of applications will be explored as well.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.