One of our friendly UNIX sysadmins (aka the guy who fixes webmail when it breaks) posted a super-detailed spam-fightin’ action guide on WesConfess. Here it is slightly abridged and formatted:
So as some of you may have noticed, there’s now a button in Wesleyan’s “webmail” that allows you to report a given email message as spam or not spam. Since reporting as spam fires off the email to me so that I can evaluate 1) if it’s legitimately spam, and 2) if so, why it bypassed our filters, I’ve noticed my mailbox has recently been flooded by reports of spam. This is great, cause this is exactly what I want. Bizarrely enough, I don’t actually get spam in my Wesleyan account (in contrast to my other accounts) so it’s been immensely frustrating to deal with complaints because no one has given me any data. But now I have it.
In any case, there are a few hints I can give for dealing with spam, based on my evaluation of the reports. The technology behind the filters works; where we’ve really failed is in making it easy to use and educating people on how to use it. Again, I’m working on this, but I have to rely on other people to help me out – I’m horrible at writing “serious” documentation and identifying what is user friendly.
- The most important – turn on your filters. I don’t know fully why it is the policy to make filtering opt-in, but unless you have specifically configured your spam filters (you can do this through Webmail->Options or some link in e-portfolio) nothing gets filtered for spam. Similarly, if you forward your mail off to someplace else, nothing gets filtered for spam.
- Nearly as important – do not whitelist @wesleyan.edu. Or yourself @wesleyan.edu. Anyone in the world can claim they are from Wesleyan and trust me, spammers do. This accounts for over half of the spam that gets reported to me, because the whitelist will let a mail through no matter how spammy it looks. As a point of comparison, I routinely get reported mails that, if they had not been whitelisted, would have received a spam score of 30, well over most people’s threshold. We don’t actually filter internal Wesleyan communications at all as a matter of policy. Email wasn’t really designed with verification of sender in mind (though there are things in place, but they’re tricky for reasons that have nothing to do with technology).
In any case, a “whitelist” in this context is a list of addresses that are absolutely allowed. Whitelisting an address means you have said that mail from it should be allowed under all circumstances. In the context of Wesleyan’s system, you do this through your spamassassin preferences or by clicking “Allow Sender” when you’re viewing a message in webmail, and the end result is that the mail will not be flagged as spam.
It can be quite useful in some circumstances, but as I noted above, lots of spammers fake their sender address to match the domain they’re sending to, so whitelisting all Wesleyan addresses means all of that spam gets through. And since we don’t pass internal mail through the filters in the first place, the whitelist entry isn’t necessary. Spoofing email is trivial: anyone can pretend to be, oh, I dunno, email@example.com with little effort. I wouldn’t recommend doing it though.
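To make the spoofing point concrete, here is a minimal Python sketch using only the standard library. It is not how Wesleyan’s filters are implemented; the `campus.example` domain, the addresses, and the helper names `sender_domain` and `is_whitelisted` are all hypothetical. It only shows that the From header is text the sender chooses, so a whitelist keyed on that header matches a forged message just as readily as a real one.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def sender_domain(msg):
    """Return the lowercased domain found in the message's From header."""
    _, addr = parseaddr(msg["From"])
    return addr.rpartition("@")[2].lower()


def is_whitelisted(msg, whitelisted_domains):
    """Hypothetical check: trust a message purely on its claimed From domain."""
    return sender_domain(msg) in whitelisted_domains


# The sender writes the From header; nothing below verifies who really sent it.
spoofed = EmailMessage()
spoofed["From"] = "Campus Help Desk <helpdesk@campus.example>"
spoofed["To"] = "student@campus.example"
spoofed["Subject"] = "Verify your account now"
spoofed.set_content("Click the link below...")

print(sender_domain(spoofed))                       # campus.example
print(is_whitelisted(spoofed, {"campus.example"}))  # True
```

The message above was built entirely by the "sender", yet it passes the domain check, which is why a whitelist entry for your own domain mostly helps spammers.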
Conversely, a blacklist is a list of addresses that are not allowed under any circumstances. It too can be configured through your spamassassin preferences, or by clicking “Block Sender” when viewing a message. Generally speaking, trying to use blacklists to block all possible spammers is a futile gesture, but it can be helpful for something like an Amazon list you just want to get rid of. Unfortunately, you can’t blacklist Wesleyan addresses (policy again) via this system, but anything else is fair game.
Of course, this brings up the point that sometimes internal Wesleyan communications are very much spam and that maybe you do want to blacklist certain Wesleyan addresses. I’m hoping to come up with a solution for this, but it is currently lacking.
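The way whitelists, blacklists, and the score threshold interact can be sketched roughly as follows. This is a simplified model of the behavior described above, not the actual filter code: the `classify` and `matches_any` helpers, the glob-style patterns, the `.example` addresses, the threshold of 5.0, and the order in which the lists are checked are all assumptions made for illustration. The score of 30 is the figure mentioned earlier.

```python
from fnmatch import fnmatchcase


def matches_any(sender, patterns):
    """True if the sender address matches any glob pattern, ignoring case."""
    sender = sender.lower()
    return any(fnmatchcase(sender, pattern.lower()) for pattern in patterns)


def classify(sender, score, whitelist=(), blacklist=(), threshold=5.0):
    """Simplified model: list membership overrides the spam score entirely."""
    if matches_any(sender, whitelist):
        return "ham"   # whitelisted: delivered no matter how spammy it looks
    if matches_any(sender, blacklist):
        return "spam"  # blacklisted: blocked no matter how clean it looks
    return "spam" if score >= threshold else "ham"


# A forged campus address with a score of 30 sails through a domain whitelist.
print(classify("anyone@campus.example", 30.0, whitelist=["*@campus.example"]))  # ham
# Without the whitelist entry, the same message is caught.
print(classify("anyone@campus.example", 30.0))  # spam
# A low-scoring newsletter is only stopped by blacklisting it.
print(classify("deals@shop.example", 1.2, blacklist=["*@shop.example"]))  # spam
```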
- It’s actually more important to learn messages as not spam than as spam. Very high-scoring messages are automatically learned by the system as spam, and very low-scoring messages are automatically learned as ham… but unfortunately, it’s very, very hard for a legitimate message to score that low. It’s simply easier to automatically detect a mail as spam; ham (legitimate mail), not so much. And since the learning technology will only start taking effect after you’ve learned 200 spams and 200 hams… well, unless you’ve learned enough non-spam, it’s never going to help. I know it’s a pain, but it will help. Reporting something as non-spam, btw, does NOT get sent to me in any fashion.
- Not every piece of bulk mail is necessarily spam. Some things will never be detected as spam by the filters because they aren’t spam by the current definition. Things like various newsletters from Amazon, Buy.com, whatever (which you unwittingly opted in to) will probably never get flagged. By learning them you might be able to push them past the barrier, but because they are “legitimate”, the filters give them a pretty low score. You’re better off unsubscribing from them or blacklisting them outright than trying to push their score over the threshold. What determines their legitimacy? Basically, if they 1) are big enough, 2) are opt-in, and 3) have a clear way to get off their list, they get a stamp of approval from the makers of the software.
- Lyris lists – depending on how the email is delivered to Lyris, messages from Lyris lists aren’t correctly scanned. We’re working on closing the hole, of course, but Lyris is not the most cooperative product in the world for some of these things, and the workaround/recommendation I suggested has some other issues.
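Going back to the learning point above (200 spams and 200 hams before learning kicks in), here is a small Python sketch of why reporting non-spam matters so much. The 200-message minimum comes from the text; the auto-learn cutoffs of 0.1 and 12.0, the example scores, and the `LearningState` class itself are assumptions for illustration and may not match the configuration actually in use.

```python
from dataclasses import dataclass

# Assumed cutoffs for automatic learning; the real values may differ.
AUTO_LEARN_HAM_BELOW = 0.1
AUTO_LEARN_SPAM_ABOVE = 12.0
# Minimum learned messages of each kind before learning takes effect.
MIN_LEARNED = 200


@dataclass
class LearningState:
    spam_learned: int = 0
    ham_learned: int = 0

    def auto_learn(self, score):
        """Learn a message automatically only if its score is extreme."""
        if score > AUTO_LEARN_SPAM_ABOVE:
            self.spam_learned += 1
            return "spam"
        if score < AUTO_LEARN_HAM_BELOW:
            self.ham_learned += 1
            return "ham"
        return None  # middling score: nothing is learned automatically

    def report(self, is_spam):
        """Record a manual 'spam' or 'not spam' report from the user."""
        if is_spam:
            self.spam_learned += 1
        else:
            self.ham_learned += 1

    def learning_active(self):
        """Learning only helps once both minimums have been reached."""
        return (self.spam_learned >= MIN_LEARNED
                and self.ham_learned >= MIN_LEARNED)


state = LearningState()
for _ in range(500):
    state.auto_learn(25.0)   # obvious spam is learned without any help
for _ in range(150):
    state.auto_learn(1.5)    # ordinary legitimate mail rarely scores below 0.1
print(state.learning_active())  # False

for _ in range(200):
    state.report(is_spam=False)  # manual "not spam" reports fill the ham side
print(state.learning_active())  # True
```

In this model the spam count fills up on its own, while the ham count only grows when someone reports legitimate mail as not spam.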