Eliminate Junk E-Mail

Editor's Note: Don't miss our new server-side anti-spam and anti-virus solution, ITS Mail Guard!

Everyone can agree nowadays that unsolicited commercial e-mail (a.k.a. UCE, or spam) has become a serious problem, soaking up valuable computing resources as well as human productivity. In extreme cases, users must wade through hundreds of junk e-mail messages per day. Fortunately, there are ways that users can combat spam while we all wait for either Internet e-mail standards or the legal system to mature to the point where junk e-mail is no longer a viable option.

Origins

Unsolicited e-mail dates back to at least 1978, when DEC announced a new computer by sending a message to all ARPANET addresses on the west coast (ARPANET was a precursor to the Internet). The term "spam" appears to have originated in reference to a skit by the British comedy troupe Monty Python, in which a group of Vikings dining in a restaurant repeatedly sing, "Spam, spam, spam," annoying other patrons and making conversation difficult. The term has therefore come to mean e-mail, USENET or message board posts, or other electronic communications, that drown out normal communications. Hormel has actually taken legal steps to protect its trademark, "SPAM" (note upper case) but does not object to the use of "spam" to describe UCE.

The sad fact is that the economics favor the senders of UCE. It costs little in terms of computing resources to send an e-mail message, but the recipients' mail servers must accept and store all the messages, which take up massive amounts of disk space. One of our web hosting partners' data centers recently estimated that it blocks almost two billion messages per month by refusing mail from systems such as open relays. An open relay is a mail server that is configured, usually accidentally, to forward all e-mail sent through it from anywhere in the world. UCE senders search the Internet for such systems and use them to send mail until the owner discovers and corrects the error.

Eradication Methods

Currently no standard exists for authenticating the "from" address of a message. Many suggest this capability is necessary, and free e-mail account giant Yahoo! has just recently proposed such a solution. This would also prevent address spoofing (someone else sending mail using your domain name), which is very easy to do today. Another suggestion is to require a computer to perform a small computation for each sent message, unnoticeable to most users but extremely limiting to someone trying to send out thousands of messages.
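The "small computation" idea above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not any proposed standard: the stamp format (recipient address plus a counter), the use of SHA-256, and the names mint_stamp, verify_stamp, and DIFFICULTY_BITS are all hypothetical choices made for the example.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16  # assumed cost setting; each extra bit doubles the sender's work


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint_stamp(recipient: str, bits: int = DIFFICULTY_BITS) -> str:
    """Sender side: try counters until the stamp's hash starts with `bits` zero bits."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify_stamp(stamp: str, recipient: str, bits: int = DIFFICULTY_BITS) -> bool:
    """Receiver side: a single hash is enough to check the sender's work."""
    if not stamp.startswith(recipient + ":"):
        return False  # stamp was minted for a different recipient
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits
```

The asymmetry is the point: minting a stamp takes many hash attempts on average, while checking it takes one. A person sending a handful of messages would not notice the delay, but a sender who needs a fresh stamp for each of thousands of recipients would.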

Aside from manually deleting UCE messages, users can attempt to filter out UCE via software. Such software can run on the server or on each workstation. We prefer filtering on each workstation, since server-based filtering often requires users to check a "spam" folder periodically via a web browser, an inconvenience. Workstation filtering can also be tailored to each user rather than to the company as a whole. However, any filtering effort has two concerns: false positives (valid mail incorrectly marked as spam) and false negatives (spam that slips through to the inbox). For example, users would not want to accidentally filter out e-mail from an important customer, but they also want as little UCE in their inbox as possible. Because false positives have a much more dramatic business impact, users need to regularly check the messages filtered out as UCE to ensure no important e-mail is lost. Note: if someone suddenly stops answering your e-mail, they may be incorrectly filtering out your messages!
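To make the two kinds of error concrete, here is a small Python sketch that tallies them from a hand-checked sample of messages. The function name filter_error_rates and the input format (pairs of "was it really spam" and "did the filter mark it as spam") are assumptions made for this illustration.

```python
def filter_error_rates(results):
    """Compute error rates from (actually_spam, marked_spam) boolean pairs."""
    results = list(results)  # allow any iterable, since we scan it several times
    valid_total = sum(1 for actual, _ in results if not actual)
    spam_total = sum(1 for actual, _ in results if actual)
    # False positive: valid mail that the filter marked as spam.
    false_positives = sum(1 for actual, marked in results if marked and not actual)
    # False negative: spam that the filter let through.
    false_negatives = sum(1 for actual, marked in results if actual and not marked)
    return {
        "false_positive_rate": false_positives / valid_total if valid_total else 0.0,
        "false_negative_rate": false_negatives / spam_total if spam_total else 0.0,
    }
```

Tracking the two rates separately matters because they are not equally costly: a filter that lets some spam through is an annoyance, while one that hides a customer's message can lose business.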

Currently there are two main methods for filtering UCE via software. One is to use a "blacklist": the program looks up the sender of each message to see whether that address has been recorded as sending UCE, and mail arriving from listed addresses is blocked. The downside is that UCE senders can easily change the "from" address on their messages to a random string, defeating the lookups. Also, what one person perceives as UCE may be a valid message to others. Users typically keep a "whitelist" of known good addresses as well.
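The lookup described above amounts to two set membership checks. The following Python sketch uses made-up addresses and a hypothetical function name, classify_sender; a real program would load its lists from a file or a shared lookup service rather than hard-coding them.

```python
# Hypothetical example lists, hard-coded only for illustration.
WHITELIST = {"buyer@customer.example"}
BLACKLIST = {"deals@bulkmail.example"}


def classify_sender(from_address: str) -> str:
    """Return 'accept', 'block', or 'unknown' for a message's "from" address."""
    address = from_address.strip().lower()
    if address in WHITELIST:  # checked first so known good senders are never blocked
        return "accept"
    if address in BLACKLIST:
        return "block"
    return "unknown"  # not on either list; the user or another filter must decide
```

The weakness mentioned above shows up immediately: a message sent from a randomly generated address such as "x7qk2@bulkmail.example" matches neither list and comes back as "unknown", even though it is from the same sender as the blacklisted one.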

The other method looks at the content of each message, using Bayesian statistics. Starting from a clean slate, the program "learns" what a person considers UCE or valid e-mail. From then on, the program calculates a score for each incoming message indicating how likely it is to be valid. The downside to this method is that users must correct the program's analysis for the first few days, until it has learned enough for its accuracy to improve.
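A minimal sketch of this kind of learning filter, written in Python, is shown below. It is a simple naive Bayes word counter with add-one smoothing, which is one common way to apply Bayesian statistics to text; it is not the actual algorithm of any particular product. The class name BayesFilter and its methods are hypothetical.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Learns word frequencies from messages the user labels 'spam' or 'valid'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "valid": Counter()}
        self.message_counts = {"spam": 0, "valid": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message that the user has marked as 'spam' or 'valid'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability (0.0 to 1.0) that a message is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["valid"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "valid"):
            # Start from how common this kind of message has been so far.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing out the score.
                seen = self.word_counts[label][word]
                score += math.log((seen + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_scores["valid"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Before any training, every message scores exactly 0.5, which is the "clean slate" described above. Each call to train shifts the word counts, so the scores move toward the user's own idea of spam as corrections accumulate. A mail program would then compare the score against a cutoff chosen by the user.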

Suggested Solutions

ITS has had good results using two free programs, which are listed on ITS StartCenter (teamITS.com/start) under the Internet Software link. K9 uses Bayesian statistics to analyze incoming e-mail. The other, MailWasher Free, primarily uses the blacklist/whitelist method. It can also "bounce" UCE messages as undeliverable, in hopes that the sender will remove your address from their list. Both do require some minor reconfiguration of users' mail programs as well as minor training in the procedures used. Alternatively, Microsoft Outlook 2003 includes a form of Bayesian statistical filtering.

January 2004
