Machine Learning Techniques in Spam Filtering

Abstract

The article gives an overview of some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs) and of their applicability to the problem of spam filtering. Brief descriptions of the algorithms are given, intended to be understandable by readers with no prior familiarity with them. The author made a minimal sample implementation of each technique, and their performance on the PU1 spam corpus is compared. Finally, some ideas are offered on how to build a practically useful spam filter using the discussed techniques. The article describes the author's first attempt at applying machine learning techniques in practice, and may therefore be of interest primarily to readers who are just becoming acquainted with machine learning.

Page last updated: 1.05.2004