hmm. cant seem to find the conversation i had anywhere.
i apologize for the novel i’m about to write, but i plan to use this as an example in the future, so hopefully it helps. don’t take offence if i go over things you already understand… or if i explain them in an oversimplified manner.
i expect that you have a good understanding of these things, but i’d feel better if i wrote this so that anybody who comes along and finds it… can read it too.
then again, i have a tendency to confuse people. so who knows 
- what are you trying to achieve? what is your goal?
i.e. are you hosting personal email for yourself? or is this a high value customer?
how computer savvy are the email users? what is the environment that this email server will be used in?
how much security do you require? how much time do you have to spend on this?
these things matter for two main reasons. one, you need to decide just how much spam is acceptable. not that you want any, but the filter requires tuning, and you don’t want to deny legitimate mail during the tuning phase. two, a lot of people who use email should probably not even be using a computer
more simply put… and to the last question: how much security do they need?
once you answer these questions, you can define what needs to be done. and start.
note: this sounds like a headache… but once it’s DONE, and done RIGHT, you will appreciate it, and it will just work. there is nothing worse than a customer who keeps getting his highly important email destroyed by the spam filter. THIS can be a much bigger headache, as i am sure you are aware
- make sure you enable every plugin available from spamassassin.
the spam filter comes with a lot of features, but some aren’t enabled out of the box on some systems. simply head over to /etc/spamassassin/ (on most systems) and go through the .pre files there. on debian/ubuntu, the ones in question are titled v310.pre through v330.pre
you will likely just need to remove the comments from the lines which contain text like this:
“loadplugin Mail::SpamAssassin::Plugin::DCC”
however, make sure you restart spamassassin afterwards, and check that all these plugins actually work on your system before assuming they are operational
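to see at a glance which plugins are still commented out, something like the following could help. it is only a rough sketch: the /etc/spamassassin path comes from above, but the function names are made up for this example, and it only looks at "loadplugin" lines in the .pre files. it does not tell you whether a plugin actually works.

```python
import re
from pathlib import Path

# matches an active line like "loadplugin Mail::SpamAssassin::Plugin::DCC"
ENABLED = re.compile(r"^\s*loadplugin\s+(\S+)")
# matches the same line when it is still commented out
COMMENTED = re.compile(r"^\s*#\s*loadplugin\s+(\S+)")


def scan_pre_text(text):
    """Return (enabled, disabled) plugin names found in the text of one .pre file."""
    enabled, disabled = [], []
    for line in text.splitlines():
        match = ENABLED.match(line)
        if match:
            enabled.append(match.group(1))
            continue
        match = COMMENTED.match(line)
        if match:
            disabled.append(match.group(1))
    return enabled, disabled


def scan_config_dir(config_dir="/etc/spamassassin"):
    """Scan every *.pre file in config_dir (the path is an assumption, adjust it)."""
    report = {}
    for path in sorted(Path(config_dir).glob("*.pre")):
        report[path.name] = scan_pre_text(path.read_text(errors="replace"))
    return report


if __name__ == "__main__":
    for name, (enabled, disabled) in scan_config_dir().items():
        print(name)
        for plugin in enabled:
            print("  enabled: ", plugin)
        for plugin in disabled:
            print("  disabled:", plugin)
```

after uncommenting and restarting, running "spamassassin --lint" is a quick way to check that the configuration still parses. whether a given plugin is really doing its job still needs a look at the headers of a few test messages.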
- make use of clamAV.
more specifically, clamAV can do a lot more than just check for viruses. for example, on ubuntu… there is a package called “clamav-unofficial-sigs”. this might take a little bit of configuration (see its man page)… however, it includes definitions to detect many more types of viruses… phishing… some types of spam… and a lot of the junk that plagues email inboxes daily.
- take a very good look at the spamassassin configuration (/etc/spamassassin/local.cf)
there are some things that are great for helping deter spam with spamassassin… but a lot of people just enable the plugin… and assume all the work is over with
in reality, however, a lot of the plugins require configuration. some of it is present in the existing config… some is present but commented out… and some is missing entirely. i will attach my configuration to this post, as a sort of starting point
some things to make note of:
a) simple switches to turn on some plugins, such as
use_bayes 1
use_pyzor 1
use_razor2 1
skip_rbl_checks 0
b) shortcut blocks and whitelists are your friends. these are not exactly for helping with the spam problem, but they can reduce the time spamassassin spends working on legitimate mail (freeing resources on your server quicker).
c) the URI Black List plugin
this is incredibly useful, and helps quite a bit with spam, but i don’t think it’s actually configured by default. it’s like a DNSBL, but for URLs. suspicious or malicious URLs found in the email body can add to the spam score, helping identify spam. “very bad” URLs with links to malware can be used to trigger spamassassin to destroy the message entirely, before somebody less savvy infects the network by accident
d) the DNS white list. again, not exactly for finding spam… but we need to separate the real stuff from the spam, in order to properly train our filter.
a score decrease is applied if the sender exists in the DNS white list. example:
score RCVD_IN_DNSWL_LOW -1.750
score RCVD_IN_DNSWL_MED -2.000
score RCVD_IN_DNSWL_HI -2.500
or… the sender does not exist in the white list. that is neither good nor bad, so let’s add a very small increase:
score RCVD_IN_DNSWL_NONE 0.500
e) when you get the time, and after you answer the question “just how much spam is acceptable”, realize the fact that we are going to have to fine tune things
the stock configuration is just what it sounds like. a stock configuration. it’s difficult to coerce it into doing what you want.
generally what i do, as you will see in my example configuration… is go through and fine tune the scores for many of the plugins and processes in spamassassin. it can be a few hours of tedious work, but once it’s done… you will have a much better idea of just what is going on during the spam classification phase.
remember to choose a threshold for bottom of the barrel, throw away spam. i like to set this to around 10. some people prefer 5. it really doesn’t matter, because we are going to be fine tuning the scoring system to suit our needs anyway. i’m fairly certain this is set in virtualmin, and is a function of procmail, so look for it in your server templates, or somewhere else in virtualmin. i might also suggest NOT deleting “virus” mail, and instead storing it in a virus directory (if you are going to beef up your clamav signatures). the reason is that clamav is now doing more than just classifying viruses, so a “virus” hit may really be phishing or spam that you will want to review.
once we have this threshold level chosen… we can start the tuning process.
for example, email that doesn’t pass SPF record checks… in this day and age… is likely garbage. and if it’s not garbage, it should be. same goes for DKIM records. having no record is fine… but if the record is bad, or broken, then it’s likely junk. things like this deserve an obviously high score, from the get go.
i think the trick is to have all the obvious signs of junk mail add up to a score JUST BELOW your “throw away” spam level (the one we talked about earlier). this way, spamassassin can work its magic and push it over the upper limit. we aren’t going to tune everything… we are just going to tune the things which point to the email being obvious spam
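here is a rough way to sanity check that idea with a few lines of python. the threshold of 10 is the one mentioned earlier. the rule names are ones i believe ship with spamassassin, but check them against your own rule set, and the score values are made up purely for illustration. the helper names are hypothetical.

```python
# "throw away" threshold, as discussed above
THROWAWAY_THRESHOLD = 10.0

# scores for the "obvious junk" signals. the values here are illustrative
# assumptions, not recommendations, and the rule names should be verified
# against the rules installed on your system.
OBVIOUS_SIGNS = {
    "SPF_FAIL": 3.0,
    "DKIM_INVALID": 2.5,
    "URIBL_BLACK": 4.0,
}


def headroom(scores, threshold):
    """Points left before the obvious signs alone would reach the threshold."""
    return threshold - sum(scores.values())


def is_just_below(scores, threshold, max_gap=1.0):
    """True if the obvious signs total below the threshold, within max_gap of it."""
    gap = headroom(scores, threshold)
    return 0 < gap <= max_gap


if __name__ == "__main__":
    total = sum(OBVIOUS_SIGNS.values())
    gap = headroom(OBVIOUS_SIGNS, THROWAWAY_THRESHOLD)
    print(f"obvious signs total {total:.1f}, headroom {gap:.1f}")
    print("just below threshold:", is_just_below(OBVIOUS_SIGNS, THROWAWAY_THRESHOLD))
```

with these example numbers the obvious signs total 9.5, which leaves 0.5 points for the rest of the rules to push the message over the limit.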
another thing to take into consideration is adjusting the score values for the bayes auto classifier. anything that has a 99% probability of being spam should have a fairly high score set. we can then stagger this down to something that has a 10% chance. this low, i actually deduct points. why? if bayes gives it a 10% chance of being spam… i’d say that is a fairly low chance, and it is likely just splitting hairs. we don’t EVER want to classify legitimate mail as spam, or we are going to have to wipe the auto learning database. so in my opinion… better safe than sorry. if there are any other obvious telltale signs of the email being spam, our other rules should get it.
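to make the staggering concrete, here is a small python sketch that holds one score per bayes bucket, checks that the scores never go down as the spam probability goes up, and prints them as "score" lines for local.cf. the BAYES_xx names are the standard spamassassin bayes rules. the values are my own illustrative assumptions (negative at the low end, high at the top), not the ones from the attached configuration.

```python
# (rule name, score) pairs ordered from "almost certainly ham" to
# "almost certainly spam". the score values are illustrative assumptions.
BAYES_SCORES = [
    ("BAYES_00", -2.0),
    ("BAYES_05", -1.0),
    ("BAYES_20", -0.5),
    ("BAYES_40", 0.0),
    ("BAYES_50", 0.5),
    ("BAYES_60", 1.5),
    ("BAYES_80", 3.0),
    ("BAYES_95", 5.0),
    ("BAYES_99", 7.0),
]


def is_staggered(scores):
    """True if the score never decreases as the spam probability bucket rises."""
    values = [value for _, value in scores]
    return all(a <= b for a, b in zip(values, values[1:]))


def render(scores):
    """Format the pairs as 'score NAME VALUE' lines, as used in local.cf."""
    return [f"score {name} {value:.3f}" for name, value in scores]


if __name__ == "__main__":
    assert is_staggered(BAYES_SCORES), "scores should rise with spam probability"
    print("\n".join(render(BAYES_SCORES)))
```

note that even the top bucket stays under the throw away level of 10 in this sketch, so bayes alone never destroys a message. it needs at least one other sign to agree with it.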
you will be able to see how i went about all these score increases and decreases in my example configuration (which i will attach to this post). again, i’m fairly certain that i have used an upper limit throw away spam score of about 10 in these examples.
f) take advantage of other great tools available.
i find that it’s handy to have a few main back end servers, with several queue and relay servers scattered through the cloud. these servers come equipped with basic DNSBL lookup functionality… and “greylisting”. virtualmin uses the greylisting plugin or program for rate limiting mail… but with a little bit of effort, you can repurpose this… or rather, use it for its intended purpose… and perform greylisting on your perimeter mail servers. greylisting basically rejects mail from senders it has not seen before with a temporary error, and asks them to deliver it again in… say… 15 minutes. real mail servers retry. most spam bots don’t, and the junk mail never returns.
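the greylisting idea itself is simple enough to sketch in a few lines of python. this is a toy model only, not how any particular greylisting daemon is implemented: the class name, the in-memory dict, and the "accept" / "tempfail" return values are all made up for illustration. the 15 minute delay is the figure from above.

```python
class Greylist:
    """Toy greylist keyed on (client ip, envelope sender, envelope recipient)."""

    def __init__(self, delay_seconds=15 * 60):
        self.delay = delay_seconds
        self.first_seen = {}  # triplet -> timestamp of the first delivery attempt

    def check(self, client_ip, sender, recipient, now):
        """Return 'tempfail' until the triplet has waited out the delay, then 'accept'."""
        key = (client_ip, sender.lower(), recipient.lower())
        # remember when this triplet was first seen; later retries reuse that time
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"


if __name__ == "__main__":
    greylist = Greylist()
    triplet = ("192.0.2.10", "alice@example.org", "bob@example.com")
    print(greylist.check(*triplet, now=0))    # first attempt
    print(greylist.check(*triplet, now=60))   # retried too early
    print(greylist.check(*triplet, now=900))  # retried after 15 minutes
```

a real mail server that gets the temporary failure queues the message and retries later, so it passes on the third call here. a spam bot that fires once and moves on never gets past the first.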
g) monitor, clean, and organize your autowhite list.
the “auto white list” is a source of many problems for people. it doesn’t just white list things… but rather, applies a score adjustment to any mail from a specific sender. in effect, it can technically be a white list and a black list. it also operates using an equation. the equation looks like this
[finalscore = score + (mean - score) * auto_whitelist_factor]
and auto_whitelist_factor can be set by you. mine is typically set to 0.5
make sure it hasn’t sucked up legitimate addresses and started applying spam level scores to them. and make sure these white listed addresses get a negative score. you would be surprised how often this can get turned around by accident.
at the same time, make sure anything that should be spam is getting a positive score, not a negative one. sometimes spam will leak through no matter what you do, and if this is happening, double check that it didn’t make its way into your users’ auto white list. sometimes it’s easiest to just clear this every so often.
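to see how the equation above can turn things around, here is a tiny worked example in python. the formula is the one quoted earlier, and 0.5 is the factor i mentioned. the function name and the sample scores are made up for illustration, and "mean" is taken to be the average score of past mail from that sender.

```python
def awl_final_score(score, mean, auto_whitelist_factor=0.5):
    """finalscore = score + (mean - score) * auto_whitelist_factor"""
    return score + (mean - score) * auto_whitelist_factor


if __name__ == "__main__":
    # a spammy message (8.0) from a sender whose history looks clean (mean -2.0)
    # gets pulled down toward the sender's average
    print(awl_final_score(8.0, -2.0))

    # a clean message (1.0) from a sender whose history looks spammy (mean 9.0)
    # gets pulled up toward the sender's average
    print(awl_final_score(1.0, 9.0))
```

the first case comes out at 3.0 and the second at 5.0. so a good history can drag real spam under your threshold, and a bad history can push legitimate mail toward it, which is exactly why the list needs to be monitored.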
IN CONCLUSION:
it can take a while, and there are a lot of things to take into consideration… but once it’s done, and done right… spamassassin will be your friend. most mail servers i set up… take a few weeks of monitoring after initial launch. but after this trial period, i almost never have a serious problem creep up on me.
thanks for taking the time to read this! i hope it helped to some degree. like i said before… i’m going to point to this as an example in the future, so if i went over things you already know or understand, just ignore them!
and remember: a poorly configured spam filter is probably worse than having no spam filter at all
i hope you get it working nicely for yourself.
my final thoughts would have to be… as with any complicated piece of software, and any diverse system… it is only as good as you design it to be. spamassassin is not “crap”, it’s only “crap” if it was set up like “crap” :P. and that’s understandable… it isn’t exactly straightforward.
spamassassin is a spam filter tool kit, per se. it really doesn’t work all that great out of the box. but neither does any other industry standard spam filter out there (at least that i am aware of). and the same can be said about a lot of applications and services, when it comes to web hosting.
any serious shared host will be sure to carefully examine and benchmark critical parts of their infrastructure. a spam filter can be very handy, for sure. but it can also be a nightmare (if it isn’t used properly).
it can literally decide whether you get that reply from an employer, an important email from mum, an emergency message from your staff, and so much more. in any production environment, one must take careful consideration when it comes to this part of the infrastructure.
anyways, an example configuration for spamassassin has been attached. don’t just copy and paste it… but rather, use it as a starting point, or for reference.
and again… i hope this was helpful!
take care!