How do I train SpamAssassin to recognize an email as spam?

OliverF · June 22, 2020, 9:41pm

Hello,

I’m running the latest stable Virtualmin open-source release on Debian 10, with Dovecot and SpamAssassin (spamd).
Assuming this is the right settings page, here are my server’s SpamAssassin options: https://imgur.com/a/WWtZFUZ

I’m seeing (and hearing complaints about) a fair number of typical “Canadian pharmacy” emails getting through. Looking at their source, they score 3 or 4 at most.
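Scores of 3 or 4 sit below SpamAssassin’s default spam threshold (required_score 5.0), which is why these messages land in the Inbox. The verdict is recorded in the X-Spam-Status header, so a quick way to see how far each missed message fell short is to pull the score and threshold out of that header. A minimal sketch, assuming the default header format (`No, score=3.4 required=5.0 tests=...`); the function name `spam_margin` is illustrative only:

```python
import re
from email import message_from_string
from typing import Optional, Tuple

# Matches the score/threshold pair in SpamAssassin's default X-Spam-Status header,
# e.g. "No, score=3.4 required=5.0 tests=HTML_MESSAGE autolearn=no" (format assumed).
_STATUS_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def spam_margin(raw_message: str) -> Optional[Tuple[float, float]]:
    """Return (score, required) from X-Spam-Status, or None if the header is missing."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = _STATUS_RE.search(status)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    sample = (
        "X-Spam-Status: No, score=3.4 required=5.0 tests=HTML_MESSAGE\n\tautolearn=no\n"
        "Subject: Canadian pharmacy\n\nBuy now\n"
    )
    print(spam_margin(sample))
```

Comparing `score` with `required` across a batch of missed spam shows how much extra weight training (or a threshold change) would need to add.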

These emails are read over IMAP in Thunderbird (2 email accounts, 2 websites).

I’d be happy to train SpamAssassin; after all, that’s what Bayesian algorithms are for. However, I simply can’t find where to do that. Pardon me for asking, but does anyone know of an official method that is guaranteed to work?

I’ve started clicking the spam button in Thunderbird, but I doubt that alone will magically do the job.

I also found instructions here: https://www.math.ias.edu/computing/faq/train-SpamAssassin
However, they are not Virtualmin-specific, and I have no way of knowing whether they would work in a Virtualmin context. They say “create a Spam folder, and move the spam emails inside”.
Erm… okay?
Would that also work with IMAP?
Does Virtualmin run spam learning from cron, or must it be triggered by hand, as web searches suggest is the case on other systems?
Should the Spam folder be created by hand in a terminal or over SFTP, or can it be created within Usermin?

You get the idea: in a Virtualmin context, I don’t know whether it would work, or how it would…

Thanks to anyone who knows and can tell me.

ramin · June 23, 2020, 2:23am

Webmin’s Bayesian filter setting and other SA controls are thorough enough for two mail accounts. If you really want to take your SA server to the next level, complete with ham vs. spam learning folders trained by humans, you’ll have to set it up manually according to the docs. It’s a complicated beast to set up, test and roll out, and I don’t know how, if at all, that setup would affect Webmin.

A working spam folder should have been created along with each new mail account if SA was running. You should see it in Usermin. If Thunderbird isn’t showing a spam folder, try re-subscribing to the IMAP folders.
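If the folder has to be created from a terminal rather than Usermin, it is just a directory on disk. A minimal sketch, assuming Dovecot uses the Maildir++ layout under each user’s `~/Maildir`, where an IMAP folder named Spam is the hidden directory `.Spam` with its own `cur`, `new` and `tmp` subdirectories. It uses Python’s standard `mailbox.Maildir`; the helper name `ensure_spam_folder` is hypothetical:

```python
import mailbox


def ensure_spam_folder(maildir_path: str, name: str = "Spam") -> mailbox.Maildir:
    """Create (if needed) a Maildir++ subfolder such as ~/Maildir/.Spam.

    Assumes a Maildir++ layout; check your Dovecot mail_location first.
    """
    # create=True makes the top-level Maildir (cur/new/tmp) if it is missing
    box = mailbox.Maildir(maildir_path, create=True)
    if name in box.list_folders():
        # Folder already exists: just open it
        return box.get_folder(name)
    # add_folder creates the dot-prefixed directory (".Spam") with cur/new/tmp
    # plus a "maildirfolder" marker file
    return box.add_folder(name)


if __name__ == "__main__":
    import os

    ensure_spam_folder(os.path.expanduser("~/Maildir"))
```

Run it as the mailbox owner (or `chown` the result afterwards): the directories are created with mode 0700, so a folder made as root would be unreadable by the mail user. Thunderbird may still need the new folder subscribed before it shows up.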

OliverF · June 23, 2020, 8:09am

I’m not sure I follow, Ramin (thanks for replying, though): are you saying SpamAssassin IS learning, or is NOT learning, in a standard Virtualmin setup?

Also, to quote you, “A working spam folder should have been created along with each new mail account if SA was running.”
I just checked in Usermin: neither of those 2 mail accounts had a Spam folder, and yet spamd is running and the email headers carry a score.

As a test, I just created the Spam folder in Virtualmin and moved a month’s worth of undeleted spam into it, but I have no idea whether SpamAssassin will use those messages as learning material from here on…

ramin · June 23, 2020, 2:29pm

Your spam folders may be missing because SA wasn’t enabled for the hosts in Virtualmin. If mail is being processed by SA, though, it was enabled at some point, so I’m not sure why your spam folders weren’t set up. Creating them manually ought to do the trick if all your other ducks are in a row. Doing it in Usermin is your best bet, unless the folder needs to be created by root. To test spam delivery, search for “test spam filters” or similar. Look for a string of text to put into the body of a test message, and send it to your server accounts from Gmail or another mail system outside your own.
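The string described above is most likely GTUBE, a fixed test pattern that SpamAssassin treats as spam whenever it appears in a message body. A minimal sketch that builds such a test message with Python’s standard `email` package; the addresses are placeholders, and actually sending it (from Gmail or another outside system, as suggested) is left to you:

```python
from email.message import EmailMessage

# GTUBE: the standard test string that SpamAssassin flags as spam on sight
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test message whose body contains the GTUBE string."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SpamAssassin delivery test"
    msg.set_content(f"This message should be flagged as spam.\n\n{GTUBE}\n")
    return msg


if __name__ == "__main__":
    # Placeholder addresses; replace with a real outside sender and your own account
    print(build_gtube_message("tester@example.org", "user@example.com").as_string())
```

If SA is applied to the account, the message should arrive with `X-Spam-Status: Yes` and be filed as spam; if it lands in the Inbox unflagged, SA is not scanning that account’s mail.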

Bayes filtering should be learning what is and isn’t spam, similar to how Gmail training works: undetected spam gets moved to the spam folder, and SA should know better next time. The same goes for innocent messages delivered as spam that are moved back to the Inbox. The problem is that you now have innocent ham with spam headers, which can be inconvenient in other ways. Try to process spam so that ham wrongly marked as spam is still usable by its recipient (avoid deleting it).
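To make the “know better next time” idea concrete, here is a toy multinomial naive Bayes classifier. It is a textbook illustration of how learning from sorted examples shifts future verdicts, not SpamAssassin’s own Bayes code (which scores individual tokens and combines them differently); the class `ToyBayes` and the sample messages are hypothetical:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lower-cased words and digits; real filters use much richer tokenizers
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyBayes:
    """Textbook multinomial naive Bayes with add-one smoothing (illustration only)."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        # Roughly what moving a message into a "spam" or "ham" folder teaches
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior P(label)
            log_p = math.log((self.docs[label] + 1) / (total_docs + 2))
            # Smoothed token likelihoods; the extra +1 leaves room for unseen tokens
            denom = sum(self.counts[label].values()) + len(vocab) + 1
            for tok in tokenize(text):
                log_p += math.log((self.counts[label][tok] + 1) / denom)
            log_scores[label] = log_p
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))), clamped against overflow
        diff = max(-700.0, min(700.0, log_scores["ham"] - log_scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    filt = ToyBayes()
    filt.learn("cheap canadian pharmacy pills", "spam")
    filt.learn("meeting notes for the website project", "ham")
    before = filt.spam_probability("discount pills")
    filt.learn("discount pills online", "spam")  # a missed spam moved to the spam folder
    after = filt.spam_probability("discount pills")
    print(before, after)  # the second value is higher after learning
```

Each correctly sorted message shifts the token counts a little, which is why consistent sorting over weeks matters more than any single click.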

The more complicated method I mentioned involves lots of extra setup and effort from every mail user. In addition to the regular spam folder that collects identified spam, there are two learning folders: one for spam that went undetected and one for ham that was incorrectly flagged as spam. In other words, the learning folders hold false negatives and false positives, plus any other case where SA gets it wrong. SA then processes the learning folders at the end of the day and sends a report. This setup yields the best results on high-volume servers and on domains that are spam magnets. Learning folders are very effective after a month or so of human training, but IMO it’s overkill for most of us. When I did it this way years ago, the biggest challenge was getting mail users to train their learning folders.
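The end-of-day step boils down to feeding each learning folder to `sa-learn`, SpamAssassin’s training tool, with `--spam` or `--ham`. A minimal sketch for one user’s Maildir, meant to be run nightly from that user’s crontab; the folder names `LearnSpam` and `LearnHam` and the function names are hypothetical, and it defaults to a dry run that only reports what it would do. Keep in mind that SpamAssassin ignores its Bayes results until it has learned a minimum number of messages (200 spam and 200 ham by default, set by `bayes_min_spam_num` and `bayes_min_ham_num`).

```python
import os
import shutil
import subprocess
from typing import List

# Hypothetical learning-folder names (Maildir++ ".Name" directories) and the
# sa-learn flag each one is fed with.
LEARNING_FOLDERS = {
    "LearnSpam": "--spam",  # spam that slipped past the filter (false negatives)
    "LearnHam": "--ham",    # good mail that was flagged as spam (false positives)
}


def build_commands(maildir: str) -> List[List[str]]:
    """Return one sa-learn command per existing learning-folder subdirectory."""
    commands = []
    for folder, flag in LEARNING_FOLDERS.items():
        for sub in ("cur", "new"):
            path = os.path.join(maildir, "." + folder, sub)
            if os.path.isdir(path):
                commands.append(["sa-learn", flag, path])
    return commands


def run_nightly(maildir: str, dry_run: bool = True) -> str:
    """Run the training commands (or only list them) and return a short report."""
    lines = []
    for cmd in build_commands(maildir):
        # List instead of running when dry_run is set or sa-learn is not installed
        if dry_run or shutil.which("sa-learn") is None:
            lines.append("would run: " + " ".join(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        output = (result.stdout or result.stderr).strip()
        lines.append(" ".join(cmd) + " -> " + output)
    return "\n".join(lines) or "no learning folders found"


if __name__ == "__main__":
    print(run_nightly(os.path.expanduser("~/Maildir")))
```

Running it as the mailbox owner matters, because sa-learn typically updates the Bayes database of the user who runs it (assuming spamd reads per-user databases). Afterwards, `sa-learn --dump magic` shows how many spam and ham messages have been learned so far.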

system · July 23, 2020, 2:29pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.