Spam Filtering

Note: this is pretty out-of-date


I have a pretty neat spam filter. It consists of two layers: DCC to throw away bulk junk mail, and then a whitelist with an auto-responder to catch more subtle spam.


DCC is the Distributed Checksum Clearinghouse. Essentially, it computes a robust fingerprint of every email you get and reports it to a group of peered clearinghouse servers, which count how often each fingerprint has been seen. If more than a certain threshold number of people get essentially the same email, it is marked as spam.
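
To make the fingerprint-and-count idea concrete, here is a toy local sketch. It is not the DCC protocol: real DCC uses fuzzy checksums and shared servers, while this version uses plain cksum and a text file. The function names (fingerprint, report_and_count, is_bulk) and the one-fingerprint-per-line database are assumptions made up for illustration.

```shell
#!/bin/sh
# Toy checksum-and-count bulk detection. NOT how DCC itself works:
# the checksum, the flat-file "database", and all names are illustrative.

# fingerprint FILE: print a checksum of the message body, ignoring
# letter case and whitespace so trivially altered copies still match.
fingerprint() {
    # Drop the headers (everything through the first blank line).
    sed '1,/^$/d' "$1" | tr 'A-Z' 'a-z' | tr -d ' \t\r\n' | cksum | cut -d ' ' -f 1
}

# report_and_count FILE DB: record one sighting of FILE's fingerprint
# in DB and print how many times it has been seen so far.
report_and_count() {
    fp=$(fingerprint "$1")
    echo "$fp" >> "$2"
    grep -c -x -F -e "$fp" "$2"
}

# is_bulk FILE DB THRESHOLD: succeed once the message has been seen
# at least THRESHOLD times. Each call counts as one more sighting.
is_bulk() {
    count=$(report_and_count "$1" "$2")
    [ "$count" -ge "$3" ]
}
```

Something like `is_bulk message.txt /tmp/seen.db 20 && echo spam` would then flag the twentieth copy of a message. The important property is that only the number of sightings matters, never the content.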

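Because DCC only flags mail that many other people have also received, it essentially never gets a false positive on person-to-person mail, unlike content-based filters like SpamAssassin, which can “accidentally” delete important emails. Legitimate bulk mail, such as mailing lists, does need to be whitelisted in DCC.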

Here is the procmail rule I use for DCC:

########### Add X-DCC header
:0 f
| /usr/local/bin/dccproc -R -h /home/megacz/.dcc
*!X-Ack: no
|/home/megacz/bin/ user.megacz.junk

Whitelist With AutoResponder

I keep a very large list of “trusted senders” in ~/.whitelist. If I get an email from a sender who is not on this list, the message is held in /var/spam, and an auto-response is sent to the sender with a web link they can click to move the message from /var/spam to my inbox. Clicking this link also puts them on the whitelist.

Here is the procmail rule to check the whitelist:

* ? formail -x"From:" -x"From" -x"Sender:" | \
          tail -n 1 | \
          sed 's_.*<\([^>]*\)>.*_\1_' | \
          tr A-Z a-z | \
          grep -if /home/megacz/.whitelist
|/home/megacz/bin/ user.megacz.newmail
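
One caveat with the check above, as far as I can tell: grep -if treats every whitelist line as a regular expression and matches substrings, so an entry for alice@example.com would also admit malice@example.com, and a blank line in the file matches everyone. A stricter variant might look like the sketch below. The function names are hypothetical, and it assumes the whitelist holds one bare address per line.

```shell
#!/bin/sh
# Stricter whitelist lookup (illustrative; not the rule used above).

# extract_address: read one header value on stdin and print the bare,
# lowercased address, e.g. "Alice <Alice@Example.COM>" -> alice@example.com
extract_address() {
    sed -e 's/.*<\([^>]*\)>.*/\1/' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' | tr 'A-Z' 'a-z'
}

# is_whitelisted ADDRESS FILE: succeed only if ADDRESS equals a whole
# line of FILE (-x), compared as a fixed string (-F), ignoring case (-i).
is_whitelisted() {
    grep -i -q -x -F -e "$1" "$2"
}
```

In a procmail condition this could be called along the lines of `* ? is_whitelisted "$(formail -x From: | extract_address)" /home/megacz/.whitelist`, provided the functions live in a script that procmail can run.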

Here is the procmail rule to generate the auto-response:

*! ^Subject: Mail failure
*$ ! ^X-Loop:
    |/home/megacz/bin/ user.megacz.maybespam

    | umask 0022; cat > /var/spam/`ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.\*/__ | sed s_\\\\.__g`

    :0 fhw
    | formail -kr -I"X-Loop:"; cat bin/spamreply; echo -n ""; ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.*/__ | sed s_\\.__g; echo; echo; echo

    :0 w
    ! -oi -t 
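
The two conditions at the top of that rule are what keep the auto-responder from answering bounces or its own replies. The same decision can be sketched as a standalone shell function. The name should_challenge and the loop-tag argument are hypothetical, and the sketch only looks at the two headers the rule tests.

```shell
#!/bin/sh
# Decide whether a held message should get an auto-response.
# Illustrative only: mirrors the "Mail failure" and X-Loop checks.

# should_challenge FILE LOOP_TAG: succeed if FILE is neither a bounce
# nor a message that already carries our own X-Loop tag.
should_challenge() {
    # Only look at the header block (everything before the first blank line).
    headers=$(sed '/^$/q' "$1")

    # Never answer a delivery-failure report.
    printf '%s\n' "$headers" | grep -q '^Subject: Mail failure' && return 1

    # Never answer something that already went through this responder.
    printf '%s\n' "$headers" | grep -i '^X-Loop:' |
        grep -q -F -e "$2" && return 1

    return 0
}
```

Restricting the check to the header block matters: a quoted X-Loop line in the body of an ordinary message should not suppress the response.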

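The link in the auto-response points at a spam.cgi script, which forwards the held email to a special address (SPECIALADDRESS). The procmail rule for that address puts the sender on my whitelist and delivers the message: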

|formail -x From | tail -n 1 | sed 's_.*<\([^>]*\)>.*_\1_' | tr A-Z a-z | tr \\r \\n >> /home/megacz/.whitelist
|/home/megacz/bin/ user.megacz.newmail
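
For readers without a mail-forwarding setup, the two effects of clicking the link (whitelist the sender, release the message) can also be sketched as one local shell function. This is not spam.cgi: the name release_held, the plain inbox directory, and the rule that ids contain only letters, digits, underscores, and hyphens are all assumptions for illustration.

```shell
#!/bin/sh
# Release a held message and whitelist its sender (illustrative sketch).

# release_held ID SPAMDIR INBOXDIR WHITELIST
release_held() {
    id=$1; spamdir=$2; inboxdir=$3; whitelist=$4

    # The id arrives from a clicked URL, so refuse anything that could
    # escape SPAMDIR (slashes, dots, empty string).
    case $id in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
    esac
    [ -f "$spamdir/$id" ] || return 1

    # First From: header in the header block, as a bare lowercase address.
    sender=$(sed -n -e '/^$/q' -e 's/^[Ff]rom:[[:space:]]*//p' "$spamdir/$id" |
             head -n 1 |
             sed 's/.*<\([^>]*\)>.*/\1/' |
             tr 'A-Z' 'a-z')
    [ -n "$sender" ] || return 1

    # Append the sender unless an identical entry already exists.
    grep -i -q -x -F -e "$sender" "$whitelist" 2>/dev/null ||
        echo "$sender" >> "$whitelist"

    # Move the message out of the holding area.
    mv "$spamdir/$id" "$inboxdir/$id"
}
```

Validating the id before touching the filesystem is the important design choice here, since the value is controlled by whoever clicks the link.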

Finally, every night this cron job adds everyone I have sent mail to, then sorts my whitelist and removes duplicates:


cp .whitelist .whitelist.unsorted
find /var/spool/imap/user/megacz/sent -name \*. |\
     xargs grep "^To:" |\
     sed 's/.*[ <,:]\([^ >,]*@[^ >,]*\).*/\1/' |\
     tr A-Z a-z >> .whitelist.unsorted

sort .whitelist.unsorted | uniq > ~/
mv ~/ ~/.whitelist
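
The sort-and-deduplicate step can also be written so that the whitelist is never left half-written if the job is interrupted. The sketch below is one way to do that, not the job I actually run: the function name dedupe_whitelist is hypothetical, and dropping blank lines and carriage returns is an extra cleanup I am assuming is wanted (a blank line is risky because grep -f treats an empty pattern as matching every input).

```shell
#!/bin/sh
# Normalize, sort, and deduplicate a whitelist in place (illustrative).

# dedupe_whitelist FILE: lowercase every entry, strip carriage returns,
# drop blank lines, sort, remove duplicates, then swap the result in
# with a single rename.
dedupe_whitelist() {
    wl=$1
    # Temp file next to the original so the final mv is a same-filesystem rename.
    tmpfile=$(mktemp "$wl.XXXXXX") || return 1

    tr 'A-Z' 'a-z' < "$wl" |
        tr -d '\r' |
        grep -v '^[[:space:]]*$' |
        sort -u > "$tmpfile" || { rm -f "$tmpfile"; return 1; }

    mv "$tmpfile" "$wl"
}
```

Note that mktemp creates the file with owner-only permissions, so the whitelist ends up readable only by its owner after the rename.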