Spam Filtering

Note: this is pretty out-of-date

Spam

I have a pretty neat spam filter. It consists of two layers: DCC to throw away bulk junk, and then a whitelist with an auto-responder to catch more subtle spam.

DCC

DCC is the Distributed Checksum Clearinghouse. Essentially, it computes a robust fingerprint of every email you get and stores the fingerprints on a group of peered clearinghouse servers. If more than a certain threshold number of people get essentially the same email, it is marked as spam.
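The counting idea can be sketched in a few lines of shell. This is a toy, not DCC's actual algorithm: it takes an exact SHA-256 of a case- and whitespace-normalized body where DCC uses fuzzy checksums, and it keeps counts in a local file where DCC uses peered servers. The names fingerprint, report, COUNTS, and THRESHOLD are made up for the illustration, and sha256sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Toy bulk-mail counter (illustration only; not how dccproc works internally).
COUNTS=${COUNTS:-/tmp/toy-dcc-counts}   # hypothetical local "clearinghouse"
THRESHOLD=${THRESHOLD:-3}               # hypothetical bulk threshold

# fingerprint: hash stdin after lowercasing and squeezing whitespace,
# so trivially different copies of a message hash the same.
fingerprint() {
    tr 'A-Z' 'a-z' | tr -s ' \t\r\n' ' ' | sha256sum | cut -d' ' -f1
}

# report: record one sighting of the message on stdin, then print
# "bulk" if it has now been seen at least THRESHOLD times, else "ok".
report() {
    fp=$(fingerprint)
    echo "$fp" >> "$COUNTS"
    seen=$(grep -c -F -x "$fp" "$COUNTS")
    if [ "$seen" -ge "$THRESHOLD" ]; then echo bulk; else echo ok; fi
}
```

Feeding the same text through report three times (even with different capitalization or spacing) prints ok, ok, and then bulk.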

DCC never gets a false positive, unlike inferior solutions like SpamAssassin, which can “accidentally” delete important emails.

Here is the procmail rule I use for DCC:

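########### Add X-DCC header
# Filter the message through dccproc, which adds the X-DCC header.
:0 f
| /usr/local/bin/dccproc -R -h /home/megacz/.dcc

########### File bulk mail as junk
# Matches a Fuz1 or Fuz2 count of "many", 4-9, or 10 and up, unless exempted.
:0
*X-DCC-[^\:]*:.*Fuz[12]=(many|[4-9]|[1-9][0-9])
*!X-Ack: no
*!SELF-SERVE
|/home/megacz/bin/fixdeliver.pl user.megacz.junk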
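
To see which counts the DCC rule treats as bulk, the same pattern can be tried against sample header lines with grep. The header lines below are invented examples of the general shape of an X-DCC header, and dcc_is_bulk is a name made up for this sketch. procmail matches case-insensitively by default, hence the -i.

```shell
#!/bin/sh
# dcc_is_bulk: succeed if the header line on stdin has a Fuz1 or Fuz2 count
# of "many", a single digit 4-9, or anything starting with two digits (10+).
dcc_is_bulk() {
    grep -Eiq 'X-DCC-[^:]*:.*Fuz[12]=(many|[4-9]|[1-9][0-9])'
}

# Illustrative header lines (invented values):
for h in \
    'X-DCC-example-Metrics: host 1; Body=1 Fuz1=1 Fuz2=3' \
    'X-DCC-example-Metrics: host 1; Body=2 Fuz1=4 Fuz2=4' \
    'X-DCC-example-Metrics: host 1; Body=many Fuz1=many Fuz2=many'
do
    if printf '%s\n' "$h" | dcc_is_bulk; then
        echo "junk: $h"
    else
        echo "keep: $h"
    fi
done
```

The first line is kept and the other two are filed as junk. Note that only the fuzzy counts matter here; a high Body count alone does not trigger the rule.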

Whitelist With AutoResponder

I have a very large library of “trusted senders” in ~/.whitelist. If I get an email from a sender who is not on this list, it is placed in /var/spam, and an email is sent to the sender with a web link they can click to move the mail from /var/spam to my inbox. Clicking this link also puts them on the whitelist.

Here is the procmail rule to check the whitelist:

:0
* ? formail -x"From:" -x"From" -x"Sender:" | \
          tail -n 1 | \
          sed 's_.*<\([^>]*\)>.*_\1_' | \
          tr A-Z a-z | \
          grep -if /home/megacz/.whitelist
|/home/megacz/bin/fixdeliver.pl user.megacz.newmail
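
The address extraction and lookup can be tried outside procmail. The sketch below reuses the same sed and tr steps on a header value passed as an argument; sender_address, is_whitelisted, and the WHITELIST variable are names invented here. Unlike the rule above, it matches whole lines with grep -Fx, because grep -if treats each whitelist entry as a pattern that can match anywhere in the address.

```shell
#!/bin/sh
WHITELIST=${WHITELIST:-$HOME/.whitelist}   # one lowercase address per line

# sender_address: pull the address out of "Name <addr>" and lowercase it.
# A value with no angle brackets passes through unchanged apart from case.
sender_address() {
    printf '%s\n' "$1" | sed 's_.*<\([^>]*\)>.*_\1_' | tr 'A-Z' 'a-z'
}

# is_whitelisted: succeed only if the address appears as a whole line.
is_whitelisted() {
    sender_address "$1" | grep -Fxq -f "$WHITELIST"
}
```

With alice@example.com on the list, is_whitelisted 'Alice <Alice@Example.COM>' succeeds, while 'Mallory <malice@example.com>' is rejected even though it contains a whitelisted address as a substring.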

Here is the procmail rule to generate the auto-response:

:0
*! ^Subject: Mail failure
*! @craigslist.org
*$ ! ^X-Loop: megacz.com
{
    :0c
    |/home/megacz/bin/fixdeliver.pl user.megacz.maybespam

    :0c
    | umask 0022; cat > /var/spam/`ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.\*/__ | sed s_\\\\.__g`

    :0 fhw
    | formail -kr -I"X-Loop: megacz.com"; cat bin/spamreply; echo -n "http://www.megacz.com/spam.cgi?spamid="; ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.*/__ | sed s_\\.__g; echo; echo; echo

    :0 w
    ! -oi -t 
}
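
The ls -tr pipeline that names the held message and builds the link is dense, so here it is unpacked into a function that can be run against a scratch directory. The function name spam_id is invented for this sketch, and the file layout used to exercise it (Cyrus-style message files such as 41. and 42. next to cyrus.* index files) is an assumption about what the maybespam folder looks like.

```shell
#!/bin/sh
# spam_id DIR: print the name of the second-newest non-cyrus file in DIR
# with every "." removed, mirroring the pipeline in the recipe above.
spam_id() {
    ls -tr "$1" |        # oldest first, newest last
        grep -v cyrus |  # skip cyrus.* bookkeeping files
        tail -n 2 |      # keep the two newest entries
        head -n 1 |      # take the older of those two
        sed 's_.*/__' |  # strip any leading path
        sed 's_\.__g'    # drop the dots, so "41." becomes "41"
}
```

For a directory holding 41. (oldest), 42., and cyrus.index (newest), spam_id prints 41. Because the id comes from directory listing order rather than from the message itself, two messages arriving close together could resolve to the same id; the sketch does not address that.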

The spam.cgi script forwards the email to a special address (SPECIALADDRESS) which places people on my whitelist:

:0
*Envelope-To: SPECIALADDRESS
{
:0c
|formail -x From | tail -n 1 | sed 's_.*<\([^>]*\)>.*_\1_' | tr A-Z a-z | tr \\r \\n >> /home/megacz/.whitelist
:0
|/home/megacz/bin/fixdeliver.pl user.megacz.newmail
}
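
spam.cgi itself is not shown here. Whatever it looks like, a script of that kind has to turn the untrusted spamid parameter from the link into a path under /var/spam. The following is a minimal sketch of only that validation step, assuming ids are purely numeric (which fits the dot-stripped file names above); spam_path and SPAM_DIR are hypothetical names, not taken from the real script.

```shell
#!/bin/sh
SPAM_DIR=${SPAM_DIR:-/var/spam}   # where held messages live

# spam_path ID: print the path of the held message, or fail if ID is
# empty or contains anything other than digits (which rules out "../").
spam_path() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s/%s\n' "$SPAM_DIR" "$1"
}
```

A caller would forward the file only when spam_path succeeds, for example: path=$(spam_path "$id") || exit 1.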

Finally, every night this cron job adds the recipients of my sent mail to the whitelist, then sorts it and removes duplicates:

#!/bin/bash

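# Start from the current whitelist.
cp ~/.whitelist ~/.whitelist.unsorted

# Add everyone I have sent mail to (Cyrus message files are named "N.").
find /var/spool/imap/user/megacz/sent -name \*. |\
     xargs grep "^To:" |\
     sed 's/.*[ <,:]\([^ >,]*@[^ >,]*\).*/\1/' |\
     tr A-Z a-z >> ~/.whitelist.unsorted

# Sort, drop duplicates, and swap the new list into place.
sort -u ~/.whitelist.unsorted > ~/.whitelist.new
mv ~/.whitelist.new ~/.whitelist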
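
The extraction and merge steps of the cron job can be checked on sample input without touching the real mail spool. In this sketch, recipients and merge_whitelist are invented names, and the input lines imitate what grep prints when given several files (file name, colon, then the matching line).

```shell
#!/bin/sh
# recipients: read grep-style "file:To: ..." lines on stdin and print the
# address sed captures from each line (the last one, if several), lowercased.
recipients() {
    sed 's/.*[ <,:]\([^ >,]*@[^ >,]*\).*/\1/' | tr 'A-Z' 'a-z'
}

# merge_whitelist FILE: print FILE's entries plus stdin, sorted, no duplicates.
merge_whitelist() {
    sort -u "$1" -
}
```

For example, printf '%s\n' '12.:To: Alice <Alice@Example.com>' | recipients | merge_whitelist ~/.whitelist prints the existing list with alice@example.com merged in. Since the sed expression is greedy, a To: line with several recipients contributes only its last address.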