Machine Learning Techniques in Spam Filtering
Konstantin Tretyakov
Institute of Computer Science, University of Tartu
Data Mining Problem-oriented Seminar, MTAT.03.177
May 2004, pp. 60-79
Abstract
The article gives an overview of some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs) and of their applicability to the problem of spam filtering. Brief descriptions of the algorithms are presented, intended to be understandable to a reader not previously familiar with them. The author made a very simple sample implementation of each of the named techniques, and a comparison of their performance on the PU1 spam corpus is presented. Finally, some ideas are given on how to construct a practically useful spam filter using the discussed techniques. The article stems from the author's first attempt at applying machine learning techniques in practice, and may therefore be of interest primarily to those getting acquainted with machine learning.
Page last updated: 1.05.2004