Dave Higton’s !AntiSpam deletes incoming ‘spam’ email by a very simple method. It downloads just the header fields at the top of the email, and compares each of them in turn against a set of Rules supplied by the user. These Rules in general consist of a text string to look for (e.g. *microsoft* or *download;*), the specific header field to which this rule applies (e.g. From: or Content-type:), and the action to be taken if the given string is found in the given header. The commonly-used actions are Delete, Accept (normally used at the head of the Rules file to ensure that mailing-list traffic, for example, is not unnecessarily checked and/or deleted) and Defer (which leaves the email in question on the server to be downloaded later, perhaps via an alternative, faster connection).
For the sake of brevity, I’m going to assume that my readers already have a copy of AntiSpam and have managed both to configure it to connect to their ISP and to set up a basic Rules file — if anyone wants an article on Rules file syntax or AntiSpam configuration options, please let me know!
Recent versions of the program have added more complicated options, such as a new action which neither deletes nor accepts the email it matches, but simply logs its full headers so that the user can make the final decision off-line, plus the ability to specify a Rule as being the contents of a given system variable (thus allowing it to vary according to external factors, e.g. the time of day). However, all these Rules share the same drawbacks. They rely on the simplest of matching techniques, looking for a single, literal string within a single specified header line — you cannot construct a Rule saying “if any header except Subject: contains the string ‘microsoft’, then delete”, any more than you can create one saying “if the Subject: header contains the string ‘free’ and the To: header does not contain my email address, then delete”.
However, there is a feature of AntiSpam that can do either of the above — and more! If you can write a simple Basic program, then you can test for any condition that can be formulated into a logical statement, ranging from simple AND/OR keyword combinations to concepts such as: “accept all email from people in my Messenger address book”, “delete any email where the subject line contains many letters separated by single spaces” (e.g. ‘NEW Cable D e s c r a m b l e r’), “delete all email with a From address containing four or more random digits” (i.e. ‘<aez354hdf82@mailtrail.com>’) or “delete all email where the subject line is all in capitals”.
Because AntiSpam is written in Basic, adding extra functionality is simply a matter of using the LIBRARY statement to include another Basic file containing extra functions; and this is where the UserTests file comes in. When the application starts up, it checks the Choices:AntiSpam directory (where your Rules file lives) for a Basic file called UserTests, and if it exists, loads it. Any extra comparisons programmed into this file will then be available to the main program, just as if they were part of AntiSpam itself.
Of course, in practice it isn’t quite that simple. In order for AntiSpam to be able to use them, all UserTests files have to contain four standard functions/procedures: PROCUserTest_Initialise, PROCUserTest_NewMailbox, PROCUserTest_NewMessage and FNUserTest_DoTest. AntiSpam will call these functions by name at various pre-defined points during the spam-checking process — but the user can edit their contents in order to affect what happens and to call any other functions he has added to the file.
The ‘template’ UserTests file provided, !AntiSpam.UserTests, contains blank versions of all the necessary functions as empty definitions which you can fill in as necessary. It also contains a lot of REMs to help remind you what the various parts of the file are for! The best thing to do is to copy this file into Choices:AntiSpam and use it as a basis for writing your own UserTests.
Before going into detail about how UserTests work and how to write them, I need to take this opportunity to draw the user’s attention to a feature of AntiSpam which is absolutely essential to the would-be UserTest programmer — the Trial window. This is the window which opens when you click <adjust> on the iconbar icon, and it enables you to simulate the effect of a new Rule without actually downloading or deleting anything. By default, it uses the settings for the first mailbox set up in your copy of AntiSpam; it is possible to drag in a different Rules file and edit the mailbox number to simulate downloading from a different ISP (if you have more than one set up), but this definitely comes under the heading of advanced usage.
To test the effect of your current Rules setup, drag a saved email (with headers, obviously...) into the Email icon at the top of the Trial window. Make sure you don’t drag it into the Rules icon underneath by mistake. AntiSpam will now pass the headers of that email through the various tests as if it had just been downloaded, pausing every time a Rule is matched to display the condition that was triggered and the current action (Delete/Accept/Defer) that would apply. At the bottom of the window is a separate section displaying any UserTests that matched. This is where the results of your new tests will show up.
Two things to note are, firstly, that the test input doesn’t actually have to be a complete email; it can be as little as one header line, and you can drag selected blocks of text out of other applications directly into the Trial window to test them. Secondly, and more importantly, while the Rules file is re-scanned each time a fresh download or test is conducted, the UserTests file is not. Because it is a library file of the main program, it only gets reloaded when the AntiSpam application itself is quit and run again — the upshot of which is that every time you make a change to the UserTests file, you have to reload AntiSpam before testing it. Otherwise, you may find yourself wondering why none of your changes ever seem to have any effect!
The UserTests file contains three procedure definitions, which are used for initialisation at various different points during the download, plus one function definition, which contains the actual test itself and returns its result to the main program.
PROCUserTest_Initialise gets called once only, when AntiSpam starts up. This is where you put any one-off initialisation for things you might want to use later — dimensioning arrays or blocks of memory, for example, or, more exotically, assembling scraps of machine-code and constructing lookup tables.
PROCUserTest_NewMailbox gets called before every download starts. Any variables used to retain information between different messages need to be cleared and/or set up here.
PROCUserTest_NewMessage gets called at the start of each fresh message. Any variables used to retain information between different header lines need to be cleared and/or set up here.
FNUserTest_DoTest is the user test function itself, which is called once for every header line, i.e. multiple times per email.
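Put together, the overall shape of a UserTests file might look something like the sketch below. This is only an outline under assumptions: the parameter list of FNUserTest_DoTest is the one used in the skeleton given later in this article, but the three procedures are written here without parameters purely as a placeholder, so treat the template file supplied with AntiSpam as the authority on their exact definitions.

```bbcbasic
REM Sketch only: outline of a UserTests file (check the supplied template)
DEF PROCUserTest_Initialise
REM Called once when AntiSpam starts: DIM arrays, build lookup tables
ENDPROC
:
DEF PROCUserTest_NewMailbox
REM Called before each download: reset anything kept between messages
ENDPROC
:
DEF PROCUserTest_NewMessage
REM Called at the start of each message: reset anything kept between header lines
ENDPROC
:
DEF FNUserTest_DoTest(kw$, data$, header$, mbox%)
REM Called once per header line; returning 0 tells AntiSpam to ignore the result
=0
```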
Every time AntiSpam performs a download, it calls FNUserTest_DoTest once for every header line. In your default setup (the copy of UserTests inside !AntiSpam), FNUserTest_DoTest does nothing — at the bottom of the file, you will see the line “=0”. If FNUserTest_DoTest returns the value zero, AntiSpam will ignore it.
All you have to do in order to bring your user tests to life is to make this function return a different value. Three named variables are available, preset to appropriate values to make these easier to remember: _ACCEPT%, _DELETE% and _DEFER%.
The best way to change the return value of FNUserTest_DoTest is to change the last line to “=exit%” instead of “=0” and to add a line immediately underneath DEF FNUserTest_DoTest, “LOCAL exit%”.
This creates a local variable ‘exit%’ every time the function is called by AntiSpam, and sets it to zero. Now, whatever the value of exit% when the program reaches the last line, this will be the value returned by FNUserTest_DoTest. In order to accept, defer or delete an email, all you have to do is have a line somewhere “LET exit%=_DELETE%”, etc.
There are two other (optional) variables you can set if you return a non-zero value — one is UserTestLog$, which (if present) will supply a short piece of text to be written into the log identifying which user rule was used. For example, if UserTestLog$ is set to “Many Bigfoots” you will get a log entry like this:
Message 7 deleted on user rule Many Bigfoots from: “joan garland” <staciusa@aol.com> subject: there is a better way ................... converge
The other is UserTestPriority%, which sets the priority of this user rule relative to the other rules in your Rules file — as you can see from your log file, the first rule in the file is rule Nº 1, the next line down is rule Nº 2, and so on. If you don’t give a specific value to UserTestPriority%, then your user rule will have bottom priority, that is, it will appear as if it is the last rule in the Rules file, and if any other rules match then they will be used instead. This is a safe default.
At all costs, you must make sure that user rules which delete messages are given a lower priority (that is, a larger rule number) than any Accept rules at the top of your Rules file. If you have seven Accept rules, for example, you shouldn’t set UserTestPriority% to any value below 8. There is a way to set the priority of your user rules relative to the rest of your Rules file automatically, but for simplicity’s sake I’ll discuss that separately in my next article.
To summarise:
DEF FNUserTest_DoTest(kw$, data$, header$, mbox%)
LOCAL exit%
UserTestLog$ = ""
=exit%
As you can see from the skeleton function above, FNUserTest_DoTest is passed four parameters by AntiSpam every time it is called, containing all the information you’ll ever need to know about the current header line.
Specifically, kw$ holds the ‘keyword’ for this header, in lower-case (“subject”, “to”, “cc” etc.), data$ holds the rest of the line in lower-case (the reason for forcing it to lower-case is so that you can perform checks without worrying about what case the words are in), header$ holds the original header line (largely duplicating the information in kw$ and data$, but retaining the case as received so that you can test for e.g. all capitals) and mbox% holds the ‘mailbox number’ of the mailbox currently being checked, corresponding to the number of this entry on the Run submenu. You won’t need to worry about either of these last two unless you are doing something very complicated indeed.
When programming a given spam check, you first need to decide which headers you want to test. FNUserTest_DoTest is called once for each header in the email, i.e. twelve or more times, each time with the variable kw$ set to the lower-case form of the current keyword. For example, the first time it is called, kw$ might be set to “return-path”, the second time “delivery-date”, the third, fourth, fifth and sixth times to “received”, the next time to “from”, the next time to “date”, and so on.
Obviously, you don’t want to apply the same tests to all headers. In the case of the normal Rules, AntiSpam checks down the Rules file and only applies those rules which match the keyword currently being checked — in the case of a user test, you need to decide yourself which tests apply to which keyword by looking at the current value of kw$.
If there are two or more possible different values for a single variable, the easiest way to check it is via Basic’s
CASE <variable> OF
  WHEN "value1"
  WHEN "value2"
  WHEN "value3"
ENDCASE
structure, which corresponds to “if variable equals value1, then do this; if it equals value2, then do that; if it equals value3, then do the other” and so on. This makes it much easier to add extra checks on different keywords later.
What we need to do is a CASE kw$ OF, to detect which header line is currently being looked at and pick out the specific ones we’re interested in. We can then set our arbitrary LOCAL variable exit% to the value of the _DELETE% exit code, if we decide we do want to delete the email, or to zero otherwise, and return this variable as the result of the function in the last line of the file instead of returning 0 (the default action).
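As a minimal sketch of this pattern, here is one way the “if any header except Subject: contains the string ‘microsoft’, then delete” condition mentioned earlier might be written. The log text is an arbitrary choice, and UserTestPriority% is deliberately left unset so that the rule keeps the safe bottom priority. Note that “microsoft” may well turn up in perfectly legitimate headers, so this is an illustration of the technique rather than a recommended rule.

```bbcbasic
DEF FNUserTest_DoTest(kw$, data$, header$, mbox%)
LOCAL exit%
UserTestLog$ = ""
CASE kw$ OF
  WHEN "subject"
    REM Subject: lines are deliberately left untested
  OTHERWISE
    REM data$ is already lower-case, so a lower-case search string is enough
    IF INSTR(data$,"microsoft")>0 THEN exit%=_DELETE%: UserTestLog$="microsoft outside Subject"
ENDCASE
=exit%
```

The OTHERWISE branch catches every keyword not named in a WHEN line, which is what makes the “any header except...” form possible.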
The principle behind UserTests is that you can test for anything you can program. The main challenge is to identify a spam trait that can be described in terms identifiable by a program!
To provide a working example of a UserTest, I’ll take the case of spammers who betray themselves by s p a c i n g o u t vital words in the Subject line in an attempt to foil keyword filtering. The first step, as always, is to work out what exactly you are looking for in programming terms. In this case, we can count the number of single-letter words; that is, spaces which are separated by only one intervening character.
I’m going to write the actual test in the form of a small function ‘FNspacetest’, which will scan through a string one letter at a time, recording the position of each space character it finds and checking this against the position of the last recorded space character. If the difference between the two values matches what we are looking for, FNspacetest increments a counter showing how many ‘single-letter words’ it has found, and the final value of this counter is returned as the result of the function.
DEF FNspacetest(subject$)
LOCAL n%,space%,foundcount%
FOR n%=1 TO LEN(subject$)
  IF MID$(subject$,n%,1)=" " THEN
    REM count this space if the previous one was exactly two characters back
    IF space%=n%-2 THEN foundcount%+=1
    REM remember where this space was found
    space%=n%
  ENDIF
NEXT
=foundcount%
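To get a feel for the numbers FNspacetest produces, it can be tried on its own, outside AntiSpam, with a couple of sample strings. The short program below is only a test harness; the two strings are the spaced-out subject quoted earlier in this article and the subject from the sample log entry, both typed in lower-case as they would arrive in data$.

```bbcbasic
REM Stand-alone check of FNspacetest, run outside AntiSpam
PRINT FNspacetest("new cable d e s c r a m b l e r")
PRINT FNspacetest("there is a better way")
END
:
DEF FNspacetest(subject$)
LOCAL n%,space%,foundcount%
FOR n%=1 TO LEN(subject$)
  IF MID$(subject$,n%,1)=" " THEN
    IF space%=n%-2 THEN foundcount%+=1
    space%=n%
  ENDIF
NEXT
=foundcount%
```

This prints 10 for the first string and 1 for the second. The final ‘r’ of the spaced-out word is not counted because no space follows it, and the lone ‘a’ in the second subject accounts for its single hit.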
Glancing down my inbox, I’d say that a result of anything more than about 4 from FNspacetest applied to a Subject: header line would be an indicator of spam. So all you’d need would be something like this:
DEF FNUserTest_DoTest(kw$, data$, header$, mbox%)
LOCAL exit%
UserTestLog$ = ""
CASE kw$ OF
  WHEN "subject": IF FNspacetest(data$)>4 THEN exit%=_DELETE%: UserTestPriority%=40: UserTestLog$="s e p a r a t e"
ENDCASE
=exit%
(adjusting UserTestPriority% and UserTestLog$ as desired, of course).
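The same pattern extends to other keywords. As a further sketch, two of the ideas mentioned at the start of this article (a subject line all in capitals, and a From line containing four or more digits) might be tackled as follows. The function names FNallcaps and FNdigitcount and the log texts are my own inventions for illustration, and the priority of 40 is simply carried over from the example above. The capitals test has to use header$ rather than data$, because data$ has already been forced to lower-case; it skips everything up to the first colon so that the keyword itself is not examined. The digit test cannot tell whether digits are ‘random’; it merely counts them across the whole From line.

```bbcbasic
DEF FNUserTest_DoTest(kw$, data$, header$, mbox%)
LOCAL exit%
UserTestLog$ = ""
CASE kw$ OF
  WHEN "subject"
    IF FNallcaps(header$) THEN exit%=_DELETE%: UserTestPriority%=40: UserTestLog$="ALL CAPS subject"
  WHEN "from"
    IF FNdigitcount(data$)>=4 THEN exit%=_DELETE%: UserTestPriority%=40: UserTestLog$="digits in From"
ENDCASE
=exit%
:
REM Hypothetical helper: TRUE if the text after the first colon
REM contains at least one capital letter and no lower-case letters
DEF FNallcaps(header$)
LOCAL n%,c%,upper%,lower%
FOR n%=INSTR(header$,":")+1 TO LEN(header$)
  c%=ASC(MID$(header$,n%,1))
  IF c%>=65 AND c%<=90 THEN upper%+=1
  IF c%>=97 AND c%<=122 THEN lower%+=1
NEXT
=(upper%>0 AND lower%=0)
:
REM Hypothetical helper: number of characters "0" to "9" in text$
DEF FNdigitcount(text$)
LOCAL n%,c%,count%
FOR n%=1 TO LEN(text$)
  c%=ASC(MID$(text$,n%,1))
  IF c%>=48 AND c%<=57 THEN count%+=1
NEXT
=count%
```

As written, FNallcaps would also catch a very short legitimate subject such as ‘FYI’, so in practice you might prefer to insist on some minimum number of capital letters before deleting. Either way, the Trial window is the place to find out what these tests do to your own saved emails before letting them loose on a real download.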
My next article will look at the rôle of PROCUserTest_Initialise, PROCUserTest_NewMailbox, and PROCUserTest_NewMessage, and give examples of some more complicated tests.