Coding email links to avoid spam

This page contains advice for developers and maintainers of ringing websites such as Guild sites. It describes how to encode email links to protect them from spammers. An automatic encoding tool is also provided.

Email contact links are an invaluable part of any web page. However, they are also vulnerable to a particular type of web robot known as the spam harvester or spambot. A spam harvester can read through the pages in your site and extract email addresses which are then added to bulk marketing databases. The result: more spam arrives in your inbox. If you've quoted other people's addresses on your site, they will probably get upset with you too.

The solution to this problem is to encode email links in a way that hides them from the spam harvester. Here's how.

The normal way of doing it

Normally you'd add an email address to a web page with a piece of HTML such as:

<a href="mailto:nobody@fake.address9z.com">Mr Nobody</a>

This creates a mailto link, and when displayed in a web page looks like this:

Mr Nobody

When the site user clicks the link, it does not take them to another web page. Instead, their mail client pops up a compose mail window addressed to the target of the link, in this case nobody@fake.address9z.com.

Unfortunately a spam harvester can easily read the email address within the HTML code, so this style of link should be AVOIDED!
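To see why, here is a minimal sketch of the kind of pattern matching a harvester might apply to a page's source. The function name harvestMailto and the regular expression are illustrative assumptions, not taken from any real harvester:

```javascript
// Hypothetical harvester logic: scan raw HTML for addresses inside mailto links.
function harvestMailto(html) {
  const pattern = /mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;
  const found = [];
  let match;
  // exec() with the g flag walks through every match in the text.
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const page = '<a href="mailto:nobody@fake.address9z.com">Mr Nobody</a>';
const harvested = harvestMailto(page);
console.log(harvested); // the address is recovered from the markup alone
```

A single regular expression is enough, which is why a plain mailto link offers no protection at all.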

One solution

A solution adopted by some sites (including Roger Bailey's Change Ringers' Email Directory) is to nobble the email address in such a way that a spam harvester won't recognise it, but a human reader will. The normal way of doing this is to replace the "@" sign with some text, such as "-AT-":

<a href="mailto:nobody-AT-fake.address9z.com">Mr Nobody</a>

When clicked, this will produce an email addressed to nobody-AT-fake.address9z.com. There are two drawbacks to this system:

  1. The user has to manually replace the "-AT-" with "@".
  2. Some spam harvesters are already aware of this technique and can recognise and fix nobbled email addresses of this form.

For these reasons I believe it is better to use a more sophisticated form of address hiding.
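The second drawback is easy to demonstrate. The sketch below assumes a harvester that knows a few common substitutions; the function name unNobble and the list of variants are made up for illustration:

```javascript
// Hypothetical clean-up step a harvester might run on a nobbled address.
// It undoes "-AT-" and, as assumed extra variants, "[at]" and "(at)".
function unNobble(address) {
  return address.replace(/\s*(?:-AT-|\[at\]|\(at\))\s*/gi, "@");
}

const repaired = unNobble("nobody-AT-fake.address9z.com");
console.log(repaired);
```

Any substitution simple enough for a human to undo at a glance is likely to be simple enough for a harvester to undo too.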

A better solution

All modern browsers support Javascript. This can be used to write HTML into a web page in a way that is very difficult for automatic robots such as spam harvesters to detect. Here's an example of this technique at work:

Mr Nobody

If you click this link, you will see a normal mail window open addressed to nobody@fake.address9z.com - so the user does not have to do any editing of the email address. But, if you view the HTML source for this link, you'll get the following code:

<script>mail2("nobody","fake.address9z",0,"","Mr Nobody")</script>

As you can see, nothing in this code can be used directly by a spam harvester to recover the email address, so the spam harvesting problem is solved as well.

How do I implement this solution?

You could write your own Javascript to do something similar, but to save time, or if you're not familiar with Javascript, you can download mine. Right-click the link below and save it to your hard disk:

email.js

This file should be added to a suitable directory, such as the root or a scripts directory, within your web site. It is very small (less than 1K) so will not adversely affect page-load times.

To use the script to protect email links in a web page you need to carry out the following steps:

  1. In the HEAD section of the web page, add the following line:

    <script type="text/javascript" src="/scripts/email.js"></script>

    You must be careful to specify the correct path to the email.js file - here I've assumed you've saved it into a top-level "scripts" directory in your site.

  2. Now, every email link must be converted to a script call. For instance, a link such as:

    <a href="mailto:nobody@fake.address9z.com">Mr Nobody</a>

    needs to be recoded as:

    <script>mail2("nobody","fake.address9z",0,"","Mr Nobody")</script>

The simplest way to encode your mail links is to use my automatic tool. This converts a list of email addresses into the required Javascript calls for step (2), with a simple button press.

If you do want to encode the links manually, here is a description of the five parameters needed by the mail2() function call:

  1. The email name: i.e. the bit before the @ sign
  2. The second-level domain of the email address: this is the bit after the @ sign, but with the top level domain (".com" or ".co.uk" etc) stripped off.
  3. A number specifying which top-level domain to use. I list these below.
  4. The fourth parameter allows you to specify a default subject for the email, by passing e.g. "?subject=Changeringing", but normally you'll just need the empty string "".
  5. Finally, the text of the link, i.e. the name to be displayed to the user in the web page.
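To make the roles of the five parameters concrete, here is a rough sketch of how a function of this shape could assemble the link. It is not the code in email.js: the names buildMailLink and TLD_EXCERPT are hypothetical, and only a few of the top-level domain numbers are included.

```javascript
// Hypothetical excerpt of the top-level domain numbers used on this page.
const TLD_EXCERPT = { 0: ".com", 1: ".org", 2: ".net", 10: ".co.uk" };

// Illustrative stand-in that takes the same five parameters as mail2().
function buildMailLink(name, domain, tldNumber, extra, text) {
  const address = name + "@" + domain + TLD_EXCERPT[tldNumber];
  // The scheme is joined from two pieces so it never appears whole in the code.
  const scheme = "mail" + "to:";
  return '<a href="' + scheme + address + extra + '">' + text + "</a>";
}

const linkHtml = buildMailLink("nobody", "fake.address9z", 0, "", "Mr Nobody");
console.log(linkHtml);
```

In a web page, a wrapper around a function like this would pass the returned string to document.write() so that the link appears where the script tag sits. Here the string is simply returned, so the sketch can also be run outside a browser.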

Sometimes you might want to code a link in which the email address itself is shown as the visible text, e.g. nobody@fake.address9z.com. To do this, simply call mail() rather than mail2(). Only the first four parameters are needed.

Numbers for top-level domains

The use of numbers helps hide the email address from spam harvesters. My email.js file uses the following table of common top-level domains:

0 .com
1 .org
2 .net
3 .ws
4 .info
5 .edu
6 .mobi
7 .biz
8 .name
9 .me.uk
10 .co.uk
11 .org.uk
12 .gov.uk
13 .ac.uk
14 .sch.uk
15 .uk
16 .net.uk
20 .com.au
21 .it
22 .au
23 .nl
   

If you have a need for other top-level domains, these can easily be added to the tld_[] array declared at the start of the email.js file. Alternatively, you can pass a "-1" value to the mail() and mail2() calls, and simply leave the unknown top-level domain on the end of parameter 2. This is slightly less secure but should still be safe from most harvesters. However, the best solution is to use my automatic encoder, which uses a -2 parameter to pass an encrypted version of the complete domain name; this is the most secure option.
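The first two options can be sketched as follows. This is an assumption about how such a lookup might behave, not a copy of email.js: the names tldTable and fullDomain are hypothetical, index 30 and ".de" are made-up examples of an added entry, and the -2 option is left out because its encoding is not described here.

```javascript
// Hypothetical lookup table; these entries follow the numbering on this page.
const tldTable = [];
tldTable[0] = ".com";
tldTable[10] = ".co.uk";
tldTable[23] = ".nl";
// Adding an extra top-level domain at an unused index (made-up example).
tldTable[30] = ".de";

// Combine parameter 2 with the top-level domain selected by parameter 3.
function fullDomain(domain, tldNumber) {
  if (tldNumber === -1) {
    // -1: the caller left the top-level domain on the end of the domain.
    return domain;
  }
  const tld = tldTable[tldNumber];
  if (tld === undefined) {
    throw new Error("Unknown top-level domain number: " + tldNumber);
  }
  return domain + tld;
}

console.log(fullDomain("fake.address9z", 30));
console.log(fullDomain("fake.address9z.ch", -1));
```

Rejecting unknown numbers outright, as this sketch does, makes a mistyped parameter obvious instead of quietly producing a broken address.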

Browser Compatibility

The code in email.js is valid Javascript 1.1 / ECMA-262, and as such will run correctly in all modern browsers including Internet Explorer from version 4, Netscape Navigator from version 4, Mozilla and Opera.

A very small number of users may have scripting disabled. For them it may be worth adding a note that the email addresses will not be visible. You can do this with HTML code such as:

<noscript><p>Please note that email addresses on this site are protected to avoid abuse by spammers. You will need a JavaScript-enabled browser to see the email addresses.</p></noscript>

More Information

For more information on spam harvesters and email links, try these pages:

http://west-penwith.org.uk/misc/spam.htm

http://www.turnstep.com/Spambot/

http://www.siteware.ch/webresources/useragents/collectors/

MBD August 2003

Amended PAT Nov 2003-Nov 2004

Amended ACAH Jan 2008