SpamAssassin (SA) is a program that filters email spam by scoring messages against content-matching rules. The Bayesian classifier that comes with SpamAssassin can be trained to recognize spam (or ham, meaning legitimate mail) from a sufficiently large set of sample emails. SA breaks each email into tokens or groups of tokens for processing. Once SA has been fed a large enough sample of spam tokens, it starts giving similar email a higher score, which pushes it toward the threshold at which mail is flagged as spam. The same applies to ham, except that the score is lowered.
The sa-learn utility that comes with SA is the tool used to train SA on what is spam and what is ham. Each run of sa-learn must be fed either spam or ham, never a mix of both, because every message in the input is learned as the class named on the command line. While sa-learn has several command-line switches, only a couple of flags are needed to process emails. The following two command lines are all one needs to get the job done:
To train SA on spam, run the following from the server in question:
sa-learn --showdots --mbox --spam spam-file
To train SA on ham, run the following from the server in question:
sa-learn --showdots --mbox --ham ham-file
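To see whether SA has been fed a large enough sample, the learned message counts can be read back with sa-learn --dump magic. By default the Bayes classifier is not used for scoring until it has learned at least 200 spam and 200 ham messages. The helper below is a minimal sketch: the name bayes_counts and the SA_LEARN override are hypothetical, and the parsing assumes the usual output layout in which each line ends in "non-token data: <name>" with the value in the third column, so check it against your own installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): report how many spam and
# ham messages the Bayes database has learned so far.
# Assumption: `sa-learn --dump magic` prints lines such as
#   0.000  0  208  0  non-token data: nspam
# where the third column is the value and the last word is the counter name.
# SA_LEARN may be set to another command for dry runs; default is sa-learn.
bayes_counts() {
    "${SA_LEARN:-sa-learn}" --dump magic | awk '
        $NF == "nspam" { spam = $3 }
        $NF == "nham"  { ham = $3 }
        END { printf "spam=%d ham=%d\n", spam, ham }
    '
}
```

Running bayes_counts after each training session shows whether both counts have reached the threshold.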
spam-file and ham-file are files in Mailbox (mbox) format. So what if your inbox is of type Maildir? No conversion is needed: leave out the --mbox switch and give sa-learn the Maildir directory that holds the messages (for example, the folder’s cur subdirectory) in place of ‘spam-file’ in the command line above. Note that the utility mb2md converts in the opposite direction, from Mailbox to Maildir, so it is not the tool for this job.
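As a sketch of training straight from a Maildir folder: sa-learn accepts a directory of message files when --mbox is omitted. The helper name train_maildir, the SA_LEARN override, and the example path are all hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): train from a Maildir folder.
# Usage: train_maildir spam|ham /path/to/maildir-folder
# SA_LEARN may be set to another command for dry runs; default is sa-learn.
train_maildir() {
    kind=$1
    folder=$2
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: train_maildir spam|ham FOLDER" >&2; return 2 ;;
    esac
    # A Maildir folder keeps seen mail in cur/ and newly delivered mail in new/.
    for sub in cur new; do
        if [ -d "$folder/$sub" ]; then
            "${SA_LEARN:-sa-learn}" --showdots "--$kind" "$folder/$sub" || return 1
        fi
    done
}
```

For example, train_maildir spam ~/Maildir/.Junk would learn every message in that (assumed) junk folder as spam. Setting SA_LEARN=echo first prints the commands instead of running them.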
That’s all folks! We hope this was useful.