Spam Filtering

Note: this is pretty out-of-date

Spam

I have a pretty neat spam filter. It consists of two layers: DCC to throw away junk spam, and then a whitelist with an auto-responder to filter more subtle spam.

DCC

DCC is the Distributed Checksum Clearinghouse. Essentially, it computes a robust fingerprint of every email you get and stores it on a group of peered clearinghouse servers. If more than a certain threshold number of people get essentially the same email, it is marked as spam. Because DCC flags only mail that many people have received, it almost never produces a false positive on person-to-person mail (legitimate bulk mail such as mailing lists has to be whitelisted), unlike content-based filters such as SpamAssassin, which can “accidentally” delete important emails. Here is the procmail rule I use for DCC:
########### Add X-DCC header
:0 f
| /usr/local/bin/dccproc -R -h /home/megacz/.dcc

########### File mail with a high fuzzy-checksum count as junk
:0
*X-DCC-[^\:]*:.*Fuz[12]=(many|[4-9]|[1-9][0-9])
*!X-Ack: no
*!SELF-SERVE
|/home/megacz/bin/fixdeliver.pl user.megacz.junk

Whitelist With AutoResponder

I have a very large library of “trusted senders” in ~/.whitelist. If I get an email from a sender who is not on this list, it is placed in /var/spam, and an email is sent to the sender with a web link they can click to move the mail from /var/spam to my inbox. Clicking this link also puts them on the whitelist. Here is the procmail rule to check the whitelist:
:0
* ? formail -x"From:" -x"From" -x"Sender:" | \
    tail -n 1 | \
    sed 's_.*<\([^>]*\)>.*_\1_' | \
    tr A-Z a-z | \
    grep -if /home/megacz/.whitelist
|/home/megacz/bin/fixdeliver.pl user.megacz.newmail

Here is the procmail rule to generate the auto-response:
:0
*! ^Subject: Mail failure
*! @craigslist.org
*$ ! ^X-Loop: megacz.com
{
  :0c
  |/home/megacz/bin/fixdeliver.pl user.megacz.maybespam

  :0c
  | umask 0022; cat > /var/spam/`ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.\*/__ | sed s_\\\\.__g`

  :0 fhw
  | formail -kr -I"X-Loop: megacz.com"; cat bin/spamreply; echo -n "http://www.megacz.com/spam.cgi?spamid="; ls -tr /home/megacz/mail/maybespam/ | grep -v cyrus | tail -n 2 | head -n 1 | sed s_.*/__ | sed s_\\.__g; echo; echo; echo

  :0 w
  ! -oi -t
}

The spam.cgi script forwards the email to a special address (SPECIALADDRESS) which places people on my whitelist:
:0
*Envelope-To: SPECIALADDRESS
{
  :0c
  |formail -x From | tail -n 1 | sed 's_.*<\([^>]*\)>.*_\1_' | tr A-Z a-z | tr \\r \\n >> /home/megacz/.whitelist

  :0
  |/home/megacz/bin/fixdeliver.pl user.megacz.newmail
}

Finally, every night this cron job adds the recipients of my sent mail to the whitelist, then sorts it and removes duplicates:
#!/bin/bash
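# Start from the current whitelist (paths are relative to the home directory)
cp .whitelist .whitelist.unsorted

# Append the lowercased To: address of every message in the sent folder
find /var/spool/imap/user/megacz/sent -name \*. |\
  xargs grep "^To:" |\
  sed 's/.*[ <,:]\([^ >,]*@[^ >,]*\).*/\1/' |\
  tr A-Z a-z >> .whitelist.unsorted

# Sort, drop duplicates, and swap the new list into place
sort .whitelist.unsorted | uniq > ~/.whitelist.new
mv ~/.whitelist.new ~/.whitelist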
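
The address normalization that the whitelist rules and the nightly job rely on can be tried outside procmail. The following is a minimal sketch, not the setup described above: normalize_addr and is_whitelisted are hypothetical helper names, the sample addresses are made up, and the whole-line lookup with grep -qxF is an assumption that is stricter than the grep -if substring match used in the recipe.

```shell
#!/bin/bash
# Hypothetical helpers for experimenting with the whitelist logic by hand.

# Read header values on stdin and print one lowercased address per line.
# "Name <addr>" is reduced to "addr"; a bare address passes through
# with surrounding whitespace trimmed.
normalize_addr() {
    sed -e 's_.*<\([^>]*\)>.*_\1_' \
        -e 's_^[[:space:]]*__' \
        -e 's_[[:space:]]*$__' |
        tr 'A-Z' 'a-z'
}

# Succeed if the address in $1 appears as a whole line in the file $2.
# (Assumption: the whitelist holds one lowercased address per line.)
is_whitelisted() {
    grep -qxF -- "$1" "$2"
}

# Usage example against a throwaway whitelist file.
wl=$(mktemp)
printf 'jane@example.com\nbob@example.org\n' > "$wl"
addr=$(printf ' Jane Doe <Jane@Example.COM>\n' | normalize_addr)
if is_whitelisted "$addr" "$wl"; then
    echo "$addr is whitelisted"
else
    echo "$addr is not whitelisted"
fi
rm -f "$wl"
```

Matching whole lines keeps a whitelisted address from also admitting every address that merely contains it, at the cost of requiring the whitelist to hold exact, lowercased addresses.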
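
The count test in the DCC rule near the top of this page can also be checked by hand with grep -E. This is a sketch under assumptions: dcc_is_bulk is a hypothetical helper, the sample header lines are made up for illustration, and the pattern is adapted from the rule with procmail's case-insensitive matching approximated by -i.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an X-DCC header line reports a fuzzy
# checksum count of "many" or of at least 4, mirroring the rule's pattern.
dcc_is_bulk() {
    printf '%s\n' "$1" |
        grep -Eiq 'X-DCC-[^:]*:.*Fuz[12]=(many|[4-9]|[1-9][0-9])'
}

# Made-up sample headers, for illustration only.
for hdr in \
    'X-DCC-Example-Metrics: host 1234; Body=1 Fuz1=1 Fuz2=2' \
    'X-DCC-Example-Metrics: host 1234; Body=5 Fuz1=7 Fuz2=7' \
    'X-DCC-Example-Metrics: host 1234; Body=many Fuz1=many Fuz2=many'
do
    if dcc_is_bulk "$hdr"; then
        echo "junk:  $hdr"
    else
        echo "keep:  $hdr"
    fi
done
```

The three alternatives in the pattern accept "many", a single digit from 4 to 9, or any number with two or more digits, so counts of 0 through 3 are the only ones that pass through to the whitelist stage.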