How to train SpamAssassin’s Bayesian filter on spam or ham

SpamAssassin (SA) is an email spam filter that scores messages against content-matching rules. The Bayesian classifier that comes with SpamAssassin can be trained to recognize spam (or ham, meaning legitimate mail) from sample emails. SA breaks each message into tokens, or groups of tokens, and records how often each one appears in spam and in ham. Once SA has learned from a large enough sample (by default, the Bayes rules stay inactive until at least 200 spam and 200 ham messages have been learned), messages that resemble the spam samples receive a higher score and are more likely to be flagged as spam. The same applies to ham, except that the score is lowered.

The sa-learn utility that ships with SA is the tool for teaching SA what is spam and what is ham. Each run must be given only spam or only ham, never a mix, because every message in the input is learned as the class named on the command line. sa-learn has many command-line switches, but a couple of flags are enough to have it process emails. The following two commands are all one needs to get the job done:

To train SA on spam, run the following from the server in question:

sa-learn --showdots --mbox --spam spam-file

To train SA on ham, run the following from the server in question:

sa-learn --showdots --mbox --ham ham-file
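
To check whether enough mail has been learned, the message counts can be read from the Bayes database with `sa-learn --dump magic`. The sketch below is a small helper, not part of SpamAssassin: the function name `bayes_counts` is hypothetical, and it assumes the usual dump layout in which the `nspam` and `nham` lines carry their count in the third column.

```shell
#!/bin/sh
# Hypothetical helper: summarize how many spam and ham messages the
# Bayes database has learned.
# Assumption: input is the output of `sa-learn --dump magic`, where the
# lines ending in "non-token data: nspam" / "nham" hold the count in
# the third whitespace-separated column.
bayes_counts() {
    awk '
        / non-token data: nspam$/ { spam = $3 }
        / non-token data: nham$/  { ham = $3 }
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}
```

Usage, run as the same user that did the training: `sa-learn --dump magic | bayes_counts`. If either count is still below the configured minimum, keep feeding sa-learn more samples of that class.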

spam-file and ham-file are files in mbox (Mailbox) format. So what if your inbox is of type Maildir? No conversion is needed: sa-learn also accepts a directory and treats each file in it as a single message. Simply drop the --mbox flag and replace ‘spam-file’ in the command line above with the Maildir folder's cur (or new) subdirectory. Note that the mb2md utility converts in the opposite direction, from mbox to Maildir, so it does not help here.
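
For a Maildir inbox, here is a minimal sketch that hands the folder to sa-learn directly, relying on sa-learn treating every file in a directory argument as one message. The function name `learn_maildir`, the `SA_LEARN` override variable, and the example path are assumptions made for illustration, not part of SpamAssassin.

```shell
#!/bin/sh
# Hypothetical wrapper: train on a Maildir folder without an mbox file.
# Assumption: sa-learn learns each file in a directory argument as a
# single message, so --mbox is not passed.
# SA_LEARN may be overridden (e.g. SA_LEARN=echo) for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

learn_maildir() {
    kind="$1"      # "spam" or "ham"
    folder="$2"    # Maildir folder containing cur/ and new/
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: learn_maildir spam|ham <maildir-folder>" >&2
           return 2 ;;
    esac
    # A Maildir keeps read mail in cur/ and unread mail in new/.
    for sub in cur new; do
        if [ -d "$folder/$sub" ]; then
            "$SA_LEARN" --showdots "--$kind" "$folder/$sub" || return 1
        fi
    done
}
```

Usage, with a hypothetical junk folder: `learn_maildir spam "$HOME/Maildir/.Junk"`. Setting `SA_LEARN=echo` first prints the commands instead of running them, which is a cheap way to confirm the paths before touching the Bayes database.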

That’s all folks! We hope this was useful.