Editorial Feature - Email Basics and Spam
Published: Nov 26, 2007
by Nebojsa Djogo of DigiPortal
What is email?
Email is a system for creating, storing, sending and receiving messages over electronic communication systems.
Basically, email is a group of tools and technologies that allow people to communicate over long or short distances. Messages can carry documents in various formats, such as pictures, sounds, presentations or, indeed, any kind of file. As we will see later, every email is transmitted as text only, even when binary files are sent within the message.
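How can a picture travel as "text only"? Modern mail uses MIME, which encodes binary attachments (commonly as base64) into plain 7-bit text. The Python sketch below uses the standard `email` package to illustrate this. The addresses and the `data.bin` attachment are hypothetical placeholders.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Arbitrary binary data standing in for a picture or other attachment.
binary_payload = bytes(range(256))

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "A binary attachment"
msg.set_content("See the attached file.")
# For bytes, add_attachment() uses base64 Content-Transfer-Encoding by default.
msg.add_attachment(binary_payload, maintype="application",
                   subtype="octet-stream", filename="data.bin")

# The complete message as text, the form in which it is handed to a mail server.
wire_text = msg.as_string()
print(wire_text.isascii())  # True: only plain text goes over the wire

# Parsing the text back recovers the original bytes exactly.
parsed = message_from_string(wire_text, policy=policy.default)
attachment = next(parsed.iter_attachments())
print(attachment.get_content() == binary_payload)  # True
```

The binary data grows by roughly a third when base64-encoded. This overhead is the price of keeping every message pure text.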
Did you know that email was used before the Internet was invented?
In the early 1960s, people used something called "time-sharing systems" (in particular, the IBM 7094 at MIT). People would dial into such a system from a remote terminal. These systems could run more than one program on a single computer (which was somewhat novel for that time) and gave users a way to send messages from one terminal to another. This was the first form of email, but it was limited to users connected to the same computer.
Ray Tomlinson created the first SNDMSG and READMAIL programs, which were the real beginning of the email we know today. He also introduced the at sign (@) for email addressing: user@host indicates a specific user on a specific computer system. Today's addressing format follows the same principle.
In the early 1980s, the SENDMAIL program was created for BSD UNIX systems. SENDMAIL is the most commonly used SMTP (Simple Mail Transfer Protocol) server on the Internet.
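The user@host principle can be shown with a minimal Python sketch that splits an address at its last @ sign. The `split_address` helper is hypothetical. Real address syntax (RFC 5322) also allows quoted user names and other forms that this simplified version ignores.

```python
from typing import Tuple

def split_address(address: str) -> Tuple[str, str]:
    """Split a simple user@host address into its user and host parts."""
    # Split on the last "@": everything after it names the host.
    user, sep, host = address.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host address: {address!r}")
    return user, host

print(split_address("alice@example.com"))  # ('alice', 'example.com')
```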
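An SMTP server such as SENDMAIL accepts mail through a short text dialogue defined in RFC 5321. The hypothetical Python helper below only lists what a client sends. A real server answers each command with a numeric reply code, such as 250 or 354. In practice, Python's `smtplib` module handles this exchange for you.

```python
def smtp_client_commands(helo_host, sender, recipients, message):
    """Return the client-side lines of a basic SMTP session (RFC 5321)."""
    lines = [f"HELO {helo_host}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for body_line in message.splitlines():
        # Dot-stuffing: a line that starts with "." gets an extra "." so it
        # cannot be mistaken for the end-of-data marker.
        lines.append("." + body_line if body_line.startswith(".") else body_line)
    lines.append(".")  # a line with a single dot ends the message data
    lines.append("QUIT")
    return lines

session = smtp_client_commands(
    "client.example.com", "alice@example.com", ["bob@example.org"],
    "Subject: Hi\n\nHello Bob.\n.a line starting with a dot",
)
for line in session:
    print(line)  # on the wire, each line is terminated by CRLF
```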
What is spam?
You may find many different definitions of what spam actually is. While we all understand the term very well, its definition is sometimes confusing. For example:
"SPAM is unsolicited commercial email"
While this sounds perfectly valid at first glance, we have to remember that many people do want to receive some unsolicited mail. Not all unsolicited commercial mail is spam to every user. One has to pair the user with the message to determine whether the message is spam or not.
Therefore spam is, in the simplest terms, an unwanted message, as determined by the recipient.
Wikipedia defines spam as:
"abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages"
This is actually a pretty good definition. The problem is that spam often comes from a KNOWN source from which one previously requested information. For example, if you purchased something online, you may have willingly subscribed to update notifications.
At some point, some of these companies use your consent to send you promotional material that may or may not be related to your original purchase. They are abusing electronic messaging systems to send unsolicited bulk messages, and they are doing it indiscriminately to their whole user base, regardless of the products or services each user originally purchased from them.
I want email about updates to the product I bought, but I do not want email promoting some other product from the same company. When I receive both, one message is spam and the other is not; one is unsolicited and the other is not. This is from my own, personal perspective, though. You may find that you do want ALL messages from the same vendor, since you may be interested in their other products.
It is very easy to classify a message as spam for yourself, but it is very hard to do the same for someone you have never met!
(Incidentally, this is the biggest problem with today's anti-spam solutions.)
Why do I receive spam?
The simple answer is: because it works! If nobody ever clicked on the ads that spammers send out, they would not waste the resources or bother to send them. Therefore, some of us, however small the percentage may be, must be responding often enough to justify the spammers' effort.
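A little arithmetic shows why even a tiny response rate is enough. Suppose a campaign sends N messages at a cost c per message and earns a profit p per response. It breaks even when N·r·p = N·c, so the break-even response rate is r = c / p, independent of N. The figures in this Python sketch are purely hypothetical and are chosen only for illustration.

```python
def break_even_response_rate(cost_per_message: float, profit_per_response: float) -> float:
    """Fraction of recipients who must respond for a campaign to break even."""
    # Revenue N * rate * profit equals cost N * cost_per_message;
    # solving for rate cancels N entirely.
    return cost_per_message / profit_per_response

# Hypothetical numbers: a thousandth of a cent per message, $10 per response.
rate = break_even_response_rate(cost_per_message=0.00001, profit_per_response=10.0)
print(f"{rate:.6%}")  # 0.000100%, i.e. one response per million messages
```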
How do spammers know my email address?
Spammers use several tricks to get lists of valid email addresses. Willingly or not, your email address gets transmitted over the Internet to various third parties. Some of these parties sell their email lists, get acquired by another company that sells the lists, or simply publish the lists for everyone to see.
Sometimes spammers use something called a "dictionary attack": they generate addresses by appending a domain name to words found in a dictionary and send messages to all of them. Only a very small percentage of these make it through to a real mailbox.
A somewhat recent form of spam, and a related method for gaining access to your online accounts, is called phishing. This is a particularly nasty form of spam in which spammers deliberately try to fool you into thinking the message came from someone you trust. They want you to visit a website that looks like the trusted party's website and fill out your personal information. The "spammer" then uses your information to gain access to your account.
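Generating a dictionary attack's address list takes almost no effort, which is why the tiny hit rate still pays off. The hypothetical helper below, sketched in Python, simply pairs each word with a domain. The reserved `example.com` domain stands in for a real target.

```python
def dictionary_addresses(words, domain):
    """Generate guessed addresses by pairing dictionary words with a domain."""
    # Each common word or first name becomes a candidate user name.
    # dict.fromkeys removes duplicates while keeping the original order.
    users = dict.fromkeys(word.strip().lower() for word in words if word.strip())
    return [f"{user}@{domain}" for user in users]

print(dictionary_addresses(["info", "sales", "John", "john"], "example.com"))
# ['info@example.com', 'sales@example.com', 'john@example.com']
```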
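One simple warning sign of phishing is a link whose visible text shows one website while the link actually points somewhere else. The Python heuristic below is a hypothetical sketch of that single check, not a complete defense. The host names use the reserved `.example` and `.test` domains.

```python
from urllib.parse import urlparse

def link_looks_deceptive(display_text: str, href: str) -> bool:
    """Flag a link whose visible text names a different host than its target."""
    text = display_text.strip()
    # Only check link text that itself looks like a web address.
    if " " in text or "." not in text:
        return False
    shown = urlparse(text if "://" in text else "http://" + text).hostname
    actual = urlparse(href).hostname
    if shown is None or actual is None:
        return False
    # The real target should be the shown host or a subdomain of it.
    return not (actual == shown or actual.endswith("." + shown))

print(link_looks_deceptive("www.mybank.example", "https://www.mybank.example/login"))         # False
print(link_looks_deceptive("www.mybank.example", "http://www.mybank.example.attacker.test/"))  # True
```

The second link starts with the trusted name, but its real host is `attacker.test`. A quick glance at the address bar easily misses this trick.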
What can be done? Blocking spam
Approaches to Blocking SPAM
There are many different approaches to blocking spam. Recently, one person simply said, "the best thing to do is not to use email at all". While that certainly is a perfect defense against spam, it may not be for everyone.
Rules and filters
The most common approach to the spam problem is rules and filters that try to "guess" whether a certain message is spam. While their success rate varies with the implementation, they all share a common problem: if the filters are set too aggressively, good messages wind up in the junk box; if they are set too leniently, you receive too much spam. They are still better than having nothing at all, though.
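The trade-off between aggressive and lenient settings is easy to see in a minimal Python sketch of a rule-based filter. Each matching rule adds to a score, and the message counts as spam once the score reaches a threshold. The rules and their weights are hypothetical.

```python
import re

# Hypothetical rules: (pattern, score). Real filters use far larger rule sets.
RULES = [
    (re.compile(r"free money", re.IGNORECASE), 3.0),
    (re.compile(r"act now", re.IGNORECASE), 2.0),
    (re.compile(r"!!!"), 1.0),
]

def spam_score(text: str) -> float:
    """Add up the scores of every rule that matches the message."""
    return sum(score for pattern, score in RULES if pattern.search(text))

def is_spam(text: str, threshold: float) -> bool:
    # A lower threshold is more aggressive: it catches more spam but
    # also flags more legitimate mail.
    return spam_score(text) >= threshold

legit = "Act now, the meeting moved to 3pm!!!"
spammy = "FREE MONEY, act now!!!"
print(spam_score(legit), spam_score(spammy))      # 3.0 6.0
print(is_spam(legit, 2.5), is_spam(spammy, 2.5))  # True True
print(is_spam(legit, 5.0), is_spam(spammy, 5.0))  # False True
```

At the aggressive threshold, the harmless meeting notice lands in the junk box. At the lenient threshold, it gets through, but so would any spam that scores below 5.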
Statistical approaches (Bayesian)
Statistical approaches may be a little better than simple filters, but they also suffer from inconsistencies and occasionally delete legitimate messages. Most filters of this type "learn" from user input. This is an improvement because it lets users personalize the filters, but it requires the user to constantly answer questions. With spammers changing their tactics daily, at some point this becomes as big a burden as spam itself.
Bayesian filters were all the rage a year or two ago. These could be considered "statistical" filters as well, but like all other filters, they are easily fooled by spammers. For example, a fairly long list of unrelated words at the bottom of an email will confuse many spam filters and, at the same time, skew the results of learning anti-spam filters.
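Bayesian spam filters are commonly built as naive Bayes word classifiers. Below is a minimal Python sketch of one under simplifying assumptions: whitespace tokenization, add-one smoothing, and at least one training message per class. The class name and the training sentences are hypothetical. The sketch shows both the "learning from the user" step and how appended words skew the result.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Tiny word-based naive Bayes spam filter that learns from user labels."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Each time the user marks a message, the word statistics update.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior P(label), then log P(word | label) with add-one smoothing.
            score = math.log(self.message_counts[label] / total_msgs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # P(spam | text) = 1 / (1 + P(ham part) / P(spam part)); clamp avoids overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        return 1.0 / (1.0 + math.exp(min(diff, 700.0)))

nb = NaiveBayesFilter()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("meeting notes for the project", "ham")
nb.train("lunch with the project team", "ham")

spam_text = "buy cheap pills"
padded = spam_text + " meeting project team lunch notes the"
print(nb.spam_probability(spam_text) > 0.5)  # True: classified as spam
print(nb.spam_probability(padded) < 0.5)     # True: appended words flip it
```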
Whitelists and blacklists
These systems are very good. The only problem is that the user must manually maintain the whitelist and the blacklist. These systems also suffer when a spammer pretends to be someone else, but that is a problem of identity theft, and there are already technologies like SPF that will help eliminate it.
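A whitelist/blacklist check looks only at the sender, never at the content. The hypothetical `classify_sender` function below is a minimal Python sketch. Like the real systems, it trusts the sender address as given, so a forged address on the whitelist would get through.

```python
def classify_sender(sender: str, whitelist: set, blacklist: set) -> str:
    """Decide what to do with a message based only on its sender address."""
    sender = sender.strip().lower()
    # Entries are compared in lower case, so store them that way.
    if sender in whitelist:
        return "deliver"
    if sender in blacklist:
        return "reject"
    # Senders on neither list need some other policy, such as a filter.
    return "unknown"

whitelist = {"friend@example.com"}
blacklist = {"offers@ads.example"}
print(classify_sender("Friend@Example.com", whitelist, blacklist))    # deliver
print(classify_sender("offers@ads.example", whitelist, blacklist))    # reject
print(classify_sender("stranger@example.org", whitelist, blacklist))  # unknown
```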
Community Shared RBL Lists
These are systems that try to gather large amounts of data from various sources and blacklist IP addresses or blocks of IP addresses.
ISPs LOVE to implement these systems. They are low cost and block tons of messages. The problem, though, is that VERY often a valid IP address gets blacklisted and people cannot send mail.
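Most such lists are published as DNS blacklists (RFC 5782). To check an IPv4 address, a mail server reverses its octets, appends the list's zone name, and looks up the result. If an address record comes back, the IP is listed. The Python sketch below assumes a hypothetical zone, `dnsbl.example.org`. Only the name construction is exercised here, because the lookup needs a live DNS server.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a DNS blacklist about an IPv4 address."""
    # The octets are reversed, as in reverse DNS: 192.0.2.1 -> 1.2.0.192
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone has an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # A failed lookup (normally "no such name") is treated as not listed.
        return False

print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))  # 1.2.0.192.dnsbl.example.org
```

A single DNS query is all it takes, which is why these lists are so cheap for ISPs to use. It also means that one bad entry silently blocks every message from that address.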
These types of anti-spam blocking usually do more harm than good. Moreover, nobody wants a community-shared service to decide which mail they receive, especially one that is extremely easy for spammers to exploit. All it takes is a few phony reports that a certain server is sending spam and, voila, you are blacklisted. The only reason these services exist is that ISPs like the fact that, with very few resources, they can increase the number of users served by the same hardware and offer you a lower cost of service. This, however, comes at a much higher cost, as even one lost message is one too many.
If you are missing a message and are not sure where it went, ask your ISP whether they use RBL lists to filter spam, and have them disable it for you.
Challenge-response systems
These systems are the definitive answer to spam. People often say that there is no real solution to spam. Well, actually there is. This kind of system tries to determine which senders I want mail from. It does not use the message content to decide whether the message is spam or not. A friend should be able to send you a joke regardless of its content, right?
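In general, a challenge-response system delivers mail from approved senders right away. For unknown senders, it holds the message and replies with a challenge that a person can easily answer. The usual rationale is that bulk spammers do not answer challenges. The Python class below is a hypothetical, minimal sketch of that flow. It is not a description of how ChoiceMail or any other product works. Note that it never looks at the message content.

```python
class ChallengeResponseFilter:
    """Hypothetical sketch: deliver mail from approved senders, challenge others."""

    def __init__(self, approved):
        self.approved = {s.lower() for s in approved}
        self.pending = {}  # sender -> list of held messages

    def receive(self, sender: str, message: str) -> str:
        sender = sender.lower()
        if sender in self.approved:
            return "delivered"
        # Unknown sender: hold the message; a real system would now send
        # the sender a challenge that a human can answer easily.
        self.pending.setdefault(sender, []).append(message)
        return "challenged"

    def challenge_answered(self, sender: str) -> list:
        """Approve a sender who answered the challenge and release held mail."""
        sender = sender.lower()
        self.approved.add(sender)
        return self.pending.pop(sender, [])

cr = ChallengeResponseFilter(approved={"friend@example.com"})
print(cr.receive("friend@example.com", "A joke"))  # delivered
print(cr.receive("new@example.org", "Hello"))      # challenged
print(cr.challenge_answered("new@example.org"))    # ['Hello']
print(cr.receive("new@example.org", "Again"))      # delivered
```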
If correctly implemented, these systems can be a joy to use. The only current problem with them is that spammers could impersonate someone else. This, however, is being solved by the new SPF and Sender ID efforts.
If every domain used SPF and every person had this kind of system, spam would not be possible.
I would like to add that I am the VP of Software Development at DigiPortal Software, Inc. We are the company behind ChoiceMail, the leading challenge-response system, so my views on this subject are not completely objective.
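SPF lets a domain owner publish a DNS TXT record that lists which servers may send mail for the domain. A receiving server then checks the connecting IP address against that list. The Python function below is a deliberately simplified, hypothetical sketch. It understands only the `ip4:` and `all` mechanisms with their qualifiers. Real SPF (RFC 7208) also handles `include:`, `a`, `mx`, `redirect=` and more. The record shown is a made-up example for `example.com`.

```python
import ipaddress

# Results for each SPF qualifier prefix (RFC 7208); "+" is the default.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def simple_spf_check(spf_record: str, client_ip: str) -> str:
    """Evaluate a toy subset of SPF: only ip4: and all mechanisms.

    Any term this sketch does not understand is simply skipped.
    """
    ip = ipaddress.IPv4Address(client_ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # the domain publishes no SPF policy
    for term in terms[1:]:
        qualifier = term[0] if term[0] in QUALIFIERS else "+"
        mechanism = term.lstrip("+-~?")
        if mechanism.lower() == "all":
            return QUALIFIERS[qualifier]
        if mechanism.lower().startswith("ip4:"):
            network = ipaddress.IPv4Network(mechanism[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # no mechanism matched

record = "v=spf1 ip4:192.0.2.0/24 -all"  # hypothetical TXT record for example.com
print(simple_spf_check(record, "192.0.2.10"))    # pass
print(simple_spf_check(record, "198.51.100.7"))  # fail
```

A spammer who forges an `example.com` sender address but connects from an unlisted IP gets a "fail". This is the kind of check that closes the impersonation gap described above.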
I passionately believe that this is the answer to our spam problems.
Stay tuned next week for ways to deal with SPAM effectively!
Nebojsa Djogo has been in the computer industry since 1984. He is the president of Software River Solutions, Inc., a small software company based in Ottawa, Canada. For the past six years he has been the VP of Software Development at DigiPortal Software and has been working on the ChoiceMail project since its beginning. He currently lives and works in Ottawa, Canada.