RabidFire spam filter for UNIX systems

RabidFire Version 1.0

A departmental (simplified) version of the package is also available for download.

Contents:

Introduction
Installation
The filter
Terminology and conventions
Principles of operation
Organization of the rules script
Pattern groups and files
Patterns
Global variables
Operations
Bounces, probes, and rejects
Rules
Examples
Options and extras
Logging and error handling
Other programs
The mail proxy
Sendmail
Adding patterns to the database
Open relay test
Handling options
The killer
The root window
The mailbox window
The target window
The console
The complaint letter
Hints and remarks
Changes in version 0.8

Updated November 14, 2002

Introduction

This document describes version 1.0 of the package. Most of the changes introduced since the first official version (0.6) affect the filter part. The killer has changed relatively little since version 0.6, and those changes are practically exclusively extensions rather than serious functional modifications.

RabidFire is a combination of an E-Mail filter and a program for semi-automated generation of SPAM complaints. Both programs are Tcl/Tk scripts. I wrote them (the original version number was 0.5) as an exercise to learn Tcl/Tk before embarking on a more serious scripting project, which turned out to be not so serious after all.

In the sequel, the two scripts will be referred to as RabidFire filter and RabidFire killer, or simply filter and killer. The name "RabidFire" is to "honor" one of the less famous spamming tools used by the competition, which was "popular" (at least in certain circles) at the time when I embarked on this project.

Please note that I am neither an active member of an anti-spam organization, nor a fanatical independent crusader in the war against spam. While my work on reducing the amount of garbage circulating in the network can be viewed as a hobby of sorts, I have other things to do in my life, and some of them I consider more serious than this little aberration. Consequently, although I may continue working on RabidFire in my spare time, I would prefer not to commit myself to any serious maintenance or development work related to this project. Needless to say, anybody willing to modify or extend these programs, and possibly redistribute them, is more than welcome to do so.

RabidFire is intended for UNIX systems. The filter assumes that you can set up a .forward file in your home directory. I believe that the killer can be ported to Windows with little effort because there exists a Tcl/Tk implementation for Windows, and programs like nslookup, traceroute, and whois used by the killer are also available there. However, I am not going to do this port myself.

Installation

Get the package, e.g., by clicking here.
Uncompress and untar the file wherever you want RabidFire to go, e.g.,
```
cd ~/Mail
zcat rf.tar.gz | tar -xvf -
```
This will create subdirectory RabidFire containing all the relevant files and subdirectories, including two executable scripts filter and killer. Your (quite minimal) work on adjusting various setup parameters will be truly minimized if the package is unpacked into Mail/RabidFire in your home directory. The directory where the package has been unpacked, whatever it is, will be called the base directory in the sequel. The default (and mildly recommended) location of the base directory is $HOME/Mail/RabidFire/.
Move to directory RabidFire and execute make. This will create a few symbolic links and compile rfproxy (the mail server proxy daemon). You may not want to use the proxy, in which case you may prefer to execute make links instead. This will create the relevant links without compiling the proxy. You need gcc to compile the proxy. If there are linking errors, you may have to add some libraries, e.g., -lnsl, -lsocket, at the end of the first gcc statement in Makefile. I know, I know - I should have used configure for this, but the only compiled component of the package (i.e., the proxy daemon, being a wrapper calling a Tcl script, which is unable to fork by itself) is so trivially small that such a solution would appear totally out of proportion.
Edit the file rfirerc_example to fit your environment. Read the comments in that file - they may clarify a couple of things that are not explicitly covered in this document. If some of them are confusing, don't worry - they will become clear in a short while (or perhaps they are irrelevant). Then copy rfirerc_example to your home directory as .rfirerc, i.e.,
```
cp rfirerc_example ~/.rfirerc
```
If you have decided to install RabidFire in a directory other than ~/Mail/RabidFire, make sure to specify the correct (home-relative) path to that directory in .rfirerc.
In the long run, you may want to edit the file rules in RabidFire, as well as the pattern group files in subdirectory Rules and, possibly, the programs in subdirectory Tests. All these files describe the "rules" for detecting spam (and generally processing your E-Mail), which you may want to tailor to your personal criteria and taste. Of course, to understand what you are doing, you will have to read through the rest of this document (or most of it anyway), so perhaps you should wait until then. The spam filter will be functional if you skip this step for now.
Edit the standard rejection letter (file preamble) and the standard complaint letter (file complaint) to your taste. You don't need the complaint letter if you are not going to use the killer. You may also want to modify file vacation (containing the "vacation" message) if you are going to use the vacation feature of RabidFire.
Make sure Tcl/Tk is installed on your system, i.e., the machine that delivers E-Mail to your mailbox and interprets the contents of your .forward file. RabidFire was developed under version 8.3 of Tcl/Tk and it seems to run happily under 8.2. If your version is significantly older, please upgrade. There is a bug in version 8.0.x that renders some of the fancier regular expressions from the filter's database useless.
Run two simple tests (on the machine that delivers your mail). Move to directory RabidFire (you should be there already) and execute this:
```
./filter < test1
./filter < test2
```
The first message should show up in your mailbox and the second one should be rejected. The rejected message will appear in directory RabidFire/JunkMail.
Create a .forward file in your home directory. It should consist of a single line with the following contents:
```
"|.../RabidFire/filter"
```
where `...' stands for the full path leading to the directory where you have put the package. For example, my .forward file contains the following line:
```
"|/home/pawel/Mail/RabidFire/filter"
```
Notes
Some mailers require that .forward be readable to the world, and they may quietly ignore that file if it isn't.
Some mailers tend to be confused about the location of the home directory of the user for whom they deliver mail. If things don't work for you (or just to be on the safe side), you may indicate your home directory to the filter by specifying it as an argument, i.e.,
```
"|/home/pawel/Mail/RabidFire/filter /home/pawel"
```
If the filter fails to locate .rfirerc, it may still operate correctly, if the following two conditions are met:
1. When the script is run by the mailer, the environment variables HOME and USER are set to the right values. This happens on many systems.
2. The script's files are located in the standard directory, i.e., $HOME/Mail/RabidFire.
In such a case, and if .rfirerc is absent, the filter will assume that its base directory is $HOME/Mail/RabidFire, your mail server is the current host (i.e., the host delivering your E-Mail), your E-Mail address is "your user name"@"current host", your mailbox is /var/spool/mail/"your user name", and your name is "The Spamfighter."
End of notes
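The fallback described in these notes can be sketched in Tcl as follows. This is a minimal illustration under stated assumptions, not the filter's actual code: apart from BaseDirectory (the variable that .rfirerc uses for the base directory), the procedure name fallbackSettings and the dictionary keys are hypothetical, and dict requires Tcl 8.5 or later.

```tcl
# Sketch of the defaults assumed when .rfirerc is absent.
# Only BaseDirectory is a real RabidFire setting name; fallbackSettings and
# the other keys are hypothetical labels chosen for this illustration.
proc fallbackSettings {} {
    # Both conditions from the notes: HOME and USER must be set correctly
    foreach var {HOME USER} {
        if {![info exists ::env($var)]} {
            error "environment variable $var is not set"
        }
    }
    set home $::env(HOME)
    set user $::env(USER)
    # "Current host" is the machine that runs the filter (delivers the mail)
    set host [info hostname]
    return [dict create \
        BaseDirectory [file join $home Mail RabidFire] \
        MailServer    $host \
        Address       "$user@$host" \
        Mailbox       [file join /var/spool/mail $user] \
        Name          "The Spamfighter"]
}
```

For user pawel with HOME set to /home/pawel, `dict get [fallbackSettings] Mailbox` would give /var/spool/mail/pawel.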
Send an innocuous E-Mail to yourself and see if you are receiving it. Then send another message with something offensive in it (e.g., "adults only" in the subject field). This message should be rejected and it shouldn't show up in your mailbox. This may not happen, i.e., the offensive message may make it through the filter, e.g., if your rules have been set up to exempt messages arriving from your domain.
Note that formally a rejected message bounces to its sender, but in this case you won't see it because YOU are the sender. This is to avoid bounce loops. Look in the mail folder RabidFire/JunkMail/dropped. This is where (with the standard set of rules distributed with the package) the filter deposits messages of this kind (i.e., ones that have been ignored for some good reason). Your message should be there preceded by the rejection letter. This is what the sender would see if he/she were somebody other than you (with a legitimate return address).

If all this works, then most likely everything is fine. If you are obsessive, you may make your .forward line look like this:

```
"|.../RabidFire/filter", "|cat >> .../RabidFire/backup"
```

which will have the effect of additionally saving all incoming messages (including all spam) in file backup. Note the append operator `>>': with a single `>', every delivered message would overwrite the previous contents of backup. You may want to keep this around for some time until you are absolutely convinced that nothing is getting lost.

No additional effort is required to install the killer. To run it, just execute the killer script in directory RabidFire. You may want to make a link to it from your bin directory, or make it callable by a click from your window manager. It also makes sense to include the base directory of the package in your PATH, as some other programs of possible interest to you are available from there as well. The default directory with junk mail folders assumed by the killer coincides with the JunkMail directory of the filter. This way, whenever you invoke the killer, you will automatically get to your recent load of spam.

Needless to say, no filter can be guaranteed to be 100% effective. My personal rules, which are not very far from the set of rules distributed with the package, are intentionally very aggressive. I don't mind if they occasionally reject a legitimate E-Mail from somebody trying to contact me for the first time, as long as they eliminate virtually all spam aimed at my mailbox. In a completely automatic way, they exempt all people with whom I exchange E-Mail. With the assistance of the mail proxy and sendmail substitute that come with the package, whenever I send an E-Mail message, its recipients are automatically added to my list of exceptions, so that their replies are guaranteed to make it through.

A polite rejection letter (feel free to edit the default letter to suit your personal taste) will tell a legitimate sender whose message didn't make it through your filter how to get it delivered without problems. By including a magic phrase of your choice in the subject line, the sender will make sure that the message makes it through the filter and, as it happens, his E-Mail address will be automatically added to your list of exceptions. Consequently, unless the sender uses multiple addresses, such an incident can occur only once per sender, and it practically never occurs, if you are the party initiating the exchange of letters.

Note: Subdirectory PG of the package contains my personalized files, which you can view for comparison with your setup.

The filter

The operation of the filter is driven by a Tcl script stored in the rules file in the base directory. Note that the location of the base directory can be changed by setting a Tcl variable (BaseDirectory) in .rfirerc. The script in rules is organized into a sequence of rules describing actions to be taken when the incoming message meets certain conditions. Although it is possible to describe all these conditions and actions in a single file, it makes better sense (as we shall shortly see) to split them into several files that are referenced from the rules script.

In the simplest case, the action of a rule can be an unconditional single operation, e.g., save, indicating that the message should be deposited in your default mailbox. At the other extreme are possibly complex conditions expressed as Tcl programs performed on selected fragments of the message. Ultimately, those conditions may poll the mail servers appearing in the headers of the received message to check if they are secured against third party relays. They may even send probe messages to such servers and modify the contents of the rules database when those messages return. This functionality is already present in the filter and in the standard (or rather sample) database of rules that arrives with it, but it can be easily extended or modified to meet a specific need or suit a particular taste.

Note that the present version of the program is considerably more than merely a spam filter narrowly aimed at discarding messages that fit simple patterns. Owing to the fact that its behavior is driven by programmable rules triggering potentially complex actions, the filter can be used as a sophisticated sorter preprocessing and distributing incoming E-Mail to a number of different mailboxes. Needless to say, all those functions can be performed in addition to its original purpose, which of course is eliminating spam.

Terminology and conventions

Most constructs occurring in rules are Tcl commands or programs. Consequently, in the following discussion, we will be using some (quite minimal) terminology related to Tcl. If you are new to Tcl, don't worry - you will be able to use the package without any problems. As a matter of fact, you can use it without even reading this document. A quick look at the contents of the rules file and some of the files in the Rules directory will give you most of the insight needed for adding new patterns and adjusting the rules to your specific needs. But to take advantage of the full power of RabidFire, you have to know a little bit about its workings.

We shall assume the following convention for presenting the syntax of constructs interpreted by the filter. Any text in the teletype font stands for itself, i.e., it must literally appear in the respective place of the discussed construct. A piece of text in italics represents a parameter to be specified by the user. A fragment enclosed in square brackets [...] is optional, i.e., it may or may not be present depending on the circumstances. Several alternative elements, of which only one can be legitimately specified, are separated by `|'. The ellipsis (...) indicates a sequence of items with a similar syntax and/or related meaning.

All predefined operations accessible to the rules script are in fact Tcl functions. Besides the standard functions provided by the Tcl interpreter, the filter offers a number of specific functions related to pattern matching and mail handling. Some of those functions accept flags that look like keywords starting with the `-' sign. Some of the flags may be exclusive, i.e., refer to multiple options of a single property of the respective function. Sometimes a flag may require a parameter that must immediately follow the flag keyword. In all cases, it is sufficient to specify an initial portion of the flag keyword that uniquely corresponds to a single recognizable flag.

If a function accepting (possibly optional) flags expects other arguments, those arguments must follow all flags. As a flag must begin with `-', there is usually no problem determining where the flag list ends and the argument list begins. In ambiguous cases, the sequence `--' (two consecutive minus signs) can be used to explicitly terminate the flags part.
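The flag conventions above (unique prefixes, flags before arguments, `--' as an explicit terminator) can be illustrated by a small parser. This is a hedged sketch: parseFlags is a hypothetical procedure, not one of the filter's functions, it ignores flags that take a parameter, and the known flag names are simply those of the group operation described later in this document.

```tcl
# Sketch only: parseFlags is a hypothetical helper, not part of RabidFire.
# It resolves abbreviated flags against the known flag names and returns a
# two-element list: {resolved-flags remaining-arguments}.
# Flags that take a parameter are not handled by this sketch.
proc parseFlags {known argList} {
    set resolved {}
    set i 0
    set n [llength $argList]
    while {$i < $n} {
        set arg [lindex $argList $i]
        if {$arg eq "--"} {
            # Explicit end of the flag part
            incr i
            break
        }
        if {[string index $arg 0] ne "-"} {
            # The first non-flag word starts the argument part
            break
        }
        set candidates {}
        foreach flag $known {
            if {$flag eq $arg} {
                # An exact match always wins
                set candidates [list $flag]
                break
            }
            # Is $arg a prefix of $flag?
            if {[string equal -length [string length $arg] $arg $flag]} {
                lappend candidates $flag
            }
        }
        switch -- [llength $candidates] {
            0 { error "unknown flag $arg" }
            1 { lappend resolved [lindex $candidates 0] }
            default { error "ambiguous flag $arg: [join $candidates {, }]" }
        }
        incr i
    }
    return [list $resolved [lrange $argList $i end]]
}

set known {-or -and -file -lock -nocase -regular -address -tags -multiple}
puts [parseFlags $known {-file -noc -reg hard_subj_spm hard_subj.spm}]
```

Here `-a' would be rejected as ambiguous (it is a prefix of both -and and -address), while `-reg' is accepted as -regular. The `--' terminator matters when an argument itself begins with `-'.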

Principles of operation

The filter is always invoked to process a single E-Mail message that arrives on its standard input. The way this message is processed is described by a Tcl script stored in the rules file in the base directory. The contents of that file are read in and evaluated (as a typical Tcl script) in the context of some definitions provided by the filter. These definitions set up some global variables and functions accessible by the script.

Being a Tcl program, the rules script can be put together in many different ways and perform many different things, not even related to processing incoming E-Mail. There exists a recommended standard way of organizing it into a series of rules that evaluate certain conditions related to the received message and determine its fate based on the outcome of those tests. One reason for this recommendation is that the rules script is also read and executed by four other incarnations of the filter (and also by the killer), which expect to execute it sensibly in circumstances somewhat different from its standard role. The other reason is methodological. By splitting the actions of the filter into a series of finely grained rules, one can better control its operation and easily modify it on-the-fly. In particular, it is easy to insert new rules without affecting the integrity and semantics of the existing ones.

The rules have immediate access to various components of the received message, which are mostly extracts from its headers. They can also invoke functions defined by the filter for checking various conditions on those components (typically involving pattern matching) and for performing actions on the incoming message. One example of such an action is bounce, which returns the message to its sender preceded by a rejection letter (preamble) provided by the user.

The filter uses a database of textual patterns that are matched against selected fragments of message headers and, possibly, its body. Although those patterns can be kept directly within the rules script, it makes better sense to organize them into sets of patterns with related functionality and keep them as a collection of separate files. Such files describe the so-called groups. A pattern group can be referenced by the rules as a single object and matched against the incoming message. Groups can also be modified by the filter (and by the killer), e.g., by adding new patterns. There exists a special class of tagged groups, which can be viewed as databases of patterns (or other strings) referenced by keys.

Besides filtering, the same filter program can be used to perform four other functions identified by the name under which the program has been invoked. In two of those functions, the program preprocesses outgoing E-Mail messages before they are submitted to the proper mail server. The purpose of this processing is to add the recipients of outgoing E-Mail to one selected group of patterns identifying trusted senders whose E-Mail should never be suspected of being spam. In yet another guise, the filter provides a user-callable program for introducing modifications to the pattern groups. The fourth function is performing open relay tests on indicated mail servers.

Organization of the rules script

As stated in the preceding section, besides filtering, the same filter program can be invoked to perform four other functions. This multiple functionality is accomplished by providing four extra links (different program names) to the executable file filter.

Regardless of the function for which the program has been invoked, it always reads and evaluates the contents of the rules script fetched from the base directory of the package. This is needed because, at least for some of those functions, the program must know the locations and definitions of the same pattern groups that are used by the filter (and are described in rules). However, only the filter function cares about the rules used to classify incoming E-Mail messages, and executing those rules for other functions would not make much sense. Therefore, the simple approach is to make sure that the actual rules are described using a special operation (Tcl procedure) named rule which is ignored (redefined as empty) in all incarnations of the filter except for the primary one. All other statements appearing in the rules script are always executed in the same way, regardless of the program function. In consequence, the declaration part of the rules file is shared by all incarnations of the script while the actual rules are only executed by the filter.
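The trick of redefining rule can be sketched as follows. The procedure setupRule and the program name values ("filter", "proxy") are hypothetical, since the actual link names are not listed here; the point is only that the same rules script is evaluated in every incarnation, while rule either runs its body at the global level or does nothing.

```tcl
# Sketch: one rules script, two meanings of "rule".
# setupRule and the program names are hypothetical illustrations.
proc setupRule {programName} {
    if {$programName eq "filter"} {
        # Primary incarnation: evaluate the rule body at the global level
        proc ::rule {name body} {
            uplevel #0 $body
        }
    } else {
        # Other incarnations: rules are parsed but their bodies never run
        proc ::rule {name body} {}
    }
}

# A real program would decide by the name it was invoked under
setupRule [file tail $::argv0]
```

Group declarations placed outside rule bodies are executed in all incarnations, which is exactly why the rules script keeps them at the top level.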

Practically, the only non-trivial operation whose execution makes sense in all functions of the filter program is a group definition. Consequently, the rules script should be organized in such a way that all other operations (i.e., those specific to its primary function of filtering E-Mail messages) are encapsulated into actual rules. This does not restrict the power of the script in any way, because the body of a rule is an unrestricted piece of Tcl code that executes at the global level of the interpreter and may itself be organized as a complete program (defining procedures, variables, and so on).

For illustration, let us have a look at an initial fragment of the standard rules script that comes with the package:

```
setdirectory Rules
#
# Some groups may be needed by the updater, mail proxy, and killer.
# These are just declarations, so it doesn't hurt to keep them all here.
#
group -file         -lock             hard_from_exc "hard_from.exc"
group -file -nocase             -reg  hard_head_exc "hard_head.exc"
group -file         -lock             hard_addr_spm "hard_addr.spm"
group -file         -lock             open_rlay_exc "open_rlay.exc"
group -file -nocase       -tags -reg  open_rlay_spm "open_rlay.spm"
group -file         -lock -tags       temp_from_exc "temp_from.exc"
group -file -nocase -lock -tags -reg  temp_subj_exc "temp_subj.exc"
group -file -nocase -lock -tags -reg  temp_body_exc "temp_body.exc"
group -file -nocase             -reg  hard_subj_exc "hard_subj.exc"
group -file -nocase             -reg  hard_body_exc "hard_body.exc"
group -file                           hard_from_spm "hard_from.spm"
group -file -nocase             -reg  hard_subj_spm "hard_subj.spm"
group -file -nocase             -reg  hard_head_spm "hard_head.spm"
group -file -nocase             -reg  hard_body_spm "hard_body.spm"
group -file                           mild_from_exc "mild_from.exc"
group -file -nocase             -reg  mild_subj_exc "mild_subj.exc"
group -file -nocase             -reg  mild_body_exc "mild_body.exc"
group -file -nocase             -reg  soft_from_spm "soft_from.spm"
group -file -nocase             -reg  soft_addr_spm "soft_addr.spm"
group -file -nocase             -reg  soft_subj_spm "soft_subj.spm"
group -file -nocase             -reg  soft_head_spm "soft_head.spm"
group -file -nocase             -reg  soft_body_spm "soft_body.spm"

setdirectory

rule Init {
#
# This unconditional rule initializes things for the filter
#
    group -nocase -reg complaint_subj { "spam +complaint" }
}
...
```

Being nothing more than a Tcl script, the rules script conforms to the Tcl syntax. In particular, all lines starting with `#' are treated as comments. The portion of the script following the second setdirectory command consists exclusively of a series of rules; thus, that portion is only meaningful to the filter. Note that the first rule (Init) is not really a rule. Its body is executed unconditionally and it defines one more pattern group - to be accessible to the filter but not to the other functions.

Pattern groups and files

From the viewpoint of the rules script, a pattern group provides a single identifier for a set of textual patterns to be matched against something. The actual patterns comprising that set can be fetched from an external file or can be specified directly when the group is defined. The following operation (Tcl function) is used to define a pattern group:

group flags groupname filename|patternlist

The flags part is a sequence of options describing the properties of the pattern group. The following flags are accepted:

-or or -and

This flag indicates whether the pattern set should be treated as an alternative (-or) or a conjunction (-and) of patterns. In particular, if -or is selected, then the group is assumed to match a given string if any of the patterns in it matches the string. With -and, all patterns must match the string for the matching operation to be considered successful. The default, assumed when neither -or nor -and is explicitly specified, is -or.
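For a group of regular expressions, the difference between -or and -and can be sketched as below. matchGroup is a hypothetical helper written for this illustration, not a RabidFire operation; it uses {*} and ni, so it needs Tcl 8.5 or later. In this sketch an empty -or group matches nothing, while an empty -and group matches everything.

```tcl
# Sketch of -or / -and semantics for a group of regular expressions.
# matchGroup is hypothetical; requires Tcl 8.5+ ({*} expansion, ni).
proc matchGroup {mode patterns text {nocase 0}} {
    if {$mode ni {-or -and}} {
        error "mode must be -or or -and"
    }
    set opts {}
    if {$nocase} {
        lappend opts -nocase
    }
    foreach pattern $patterns {
        set hit [regexp {*}$opts -- $pattern $text]
        if {$mode eq "-or" && $hit} {
            return 1   ;# one matching pattern is enough
        }
        if {$mode eq "-and" && !$hit} {
            return 0   ;# one non-matching pattern is enough
        }
    }
    # -or: nothing matched; -and: every pattern matched
    return [expr {$mode eq "-and"}]
}
```

For example, `matchGroup -or {"adults only" "free +money"} "FREE  MONEY now" 1' succeeds through the second pattern, whereas `matchGroup -and {lottery winner} "lottery tickets"' fails because winner does not occur.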

-file

If this flag is present, the patterns comprising the group will be read from the file indicated by the filename argument. Otherwise, the list of patterns is specified directly as patternlist.

-lock

This flag is only meaningful together with -file. It ensures that when the pattern file is read by the filter, it will be locked, but only for the duration of the read operation. This option is intended to guarantee the integrity of a pattern file that may be written into while being read - as explained below.

-nocase

The case of letters appearing in the patterns, as well as in the text being matched, is ignored. By default, the case of letters is significant, unless -address is selected (see below), which automatically forces -nocase.

-regular or -address

Selects one of two interpretations of the patterns comprising the group. With -regular, the patterns are regular expressions, as understood by Tcl, with a minor extension. With -address, the patterns are assumed to represent IP addresses (symbolic or numeric) and, possibly, E-Mail addresses. The details of these interpretations are explained in section Patterns. By default, if neither -regular nor -address is explicitly specified, -address is assumed.

-tags

If present, this flag indicates a group in which patterns are tagged with keywords that can be viewed as handles for referencing them. Such groups can be used as simple databases.

-multiple

This flag indicates that the group may have been already defined, in which case the present definition should be ignored. Without this flag, multiple attempts to define the same group are treated as errors.

The first of the two mandatory arguments following the flag list assigns a name to the group. Group names must be unique and they must be legitimate Tcl identifiers.

Depending on whether -file was included among the flags, the second argument is interpreted as the name of the file from which the patterns should be read, or as the list of patterns specified directly as part of the group command. For example, the Init rule in the above example defines a group consisting of a directly specified single pattern, which happens to be a regular expression.

The pattern list of a "file" group is not read until the group is referenced for the first time. This way, no overhead is incurred by declaring many groups, and it actually pays to have more small groups rather than few large ones. If some of the groups never get referenced (because the fate of the received message is determined earlier), their pattern files will never be read.
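Deferred reading can be illustrated with a small registry that records only the file name at declaration time, reads the file on first reference, and caches the result for later passive accesses. All names here (declareFileGroup, groupPatterns, GroupFile, GroupPatterns, ReadCount) are hypothetical, and the reader is simplified: it skips blank lines and comments but does not strip quotes.

```tcl
# Sketch of lazily loaded file groups; all names are hypothetical.
array set GroupFile {}
array set GroupPatterns {}
set ReadCount 0   ;# counts actual file reads, for illustration only

proc declareFileGroup {name path} {
    if {[info exists ::GroupFile($name)]} {
        error "group $name is already defined"
    }
    # Only remember where the patterns live; nothing is read yet
    set ::GroupFile($name) $path
}

proc groupPatterns {name} {
    if {![info exists ::GroupPatterns($name)]} {
        # First reference: read the pattern file now
        set fh [open $::GroupFile($name) r]
        set data [read $fh]
        close $fh
        incr ::ReadCount
        set patterns {}
        foreach line [split $data \n] {
            set line [string trim $line]
            if {$line eq "" || [string index $line 0] eq "#"} {
                continue
            }
            lappend patterns $line
        }
        set ::GroupPatterns($name) $patterns
    }
    # Later passive references reuse the cached list
    return $::GroupPatterns($name)
}
```

Declaring a group whose file does not even exist costs nothing; an error surfaces only if the group is actually referenced.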

We say that a file group is accessed passively if its pattern file has to be read but is never written back. This is the most typical reference of a group, e.g., to be matched against a piece of text. A file group may also be updated, which means that its pattern file is modified and overwrites the previous version. When this happens, the filter automatically locks the pattern file, to make sure that no other copy of the filter (or perhaps the killer) is performing a similar operation at the same time.

A file group that may well be modified while it is being referenced passively (i.e., read) should be declared with the -lock flag. This will guarantee that the pattern file is locked also while being read, but only for the duration of the read operation (to reduce contention). For example, the -lock flag is specified for the hard_from_exc group in the rules file that comes with the package. That group contains patterns identifying the trusted senders whose messages should always be delivered. Further down in the rules file, there exists a rule that automatically adds a new pattern to that group whenever the filter receives a message with the magic subject phrase. Therefore, as multiple copies of the filter are allowed to run concurrently, the file may be modified at any time, even while it is being read for passive access.

For as long as a group is being used passively (i.e., for matching), its pattern file is read only once - at the first use. Whenever a group is modified, this operation always consists of the following steps: locking the pattern file (which may involve waiting until the lock is available), then (re)reading the pattern file (even if it was read before, e.g., for passive access), modifying the group, writing the pattern file back, and finally unlocking it.

It may be worthwhile to mention that the scripts use their own method of locking files, because the standard UNIX file locks are not available from Tcl. Subdirectory Lock in the base directory is used for this purpose: a file is locked by creating a dummy file in that subdirectory. It is assumed that no file is ever locked for more than 2 minutes; older locks are considered stale and are forcibly removed.
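The locking scheme and the update sequence described above can be sketched as follows. Only the idea of dummy files in a Lock subdirectory and the two-minute stale limit come from this document; the procedure names are hypothetical, and opening the lock file with the CREAT and EXCL access flags of Tcl's open (so that creating an already existing lock fails) is an assumption about how such a lock can be made safe, not a description of the filter's code.

```tcl
# Sketch of dummy-file locking plus the update sequence.
# lockFile, unlockFile, and addPatternToFile are hypothetical names.
proc lockFile {lockDir name {staleSeconds 120}} {
    file mkdir $lockDir
    set lock [file join $lockDir $name]
    while 1 {
        # A lock older than the limit is considered stale and removed
        if {![catch {file mtime $lock} mtime] &&
                [clock seconds] - $mtime > $staleSeconds} {
            file delete -force $lock
        }
        # CREAT+EXCL: creation fails if the dummy file already exists
        if {![catch {open $lock {WRONLY CREAT EXCL}} fh]} {
            close $fh
            return $lock
        }
        after 500   ;# somebody else holds the lock: wait and retry
    }
}

proc unlockFile {lock} {
    file delete -force $lock
}

# Update = lock, (re)read, modify, write back, unlock.
proc addPatternToFile {path lockDir pattern} {
    set lock [lockFile $lockDir [file tail $path]]
    set code [catch {
        set lines {}
        if {[file exists $path]} {
            # Re-read even if read before: another process may have changed it
            set fh [open $path r]
            set lines [split [read -nonewline $fh] \n]
            close $fh
        }
        # Patterns written back by the filter are always quoted
        lappend lines "\"$pattern\""
        set fh [open $path w]
        puts $fh [join $lines \n]
        close $fh
    } result]
    # Release the lock even if the update failed
    unlockFile $lock
    if {$code} {
        return -code error $result
    }
    return $path
}
```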

For an untagged file group (i.e., one whose declaration does not specify the -tags flag), the corresponding pattern file is expected to contain one pattern per line. Empty lines and comments (i.e., lines containing `#' as the first non-blank character) are ignored, as are leading and trailing blanks of the lines containing patterns. If a pattern starts or ends with a blank (and also if it starts with `#'), it should be encapsulated in double quotes. Quoting patterns always makes sense and is recommended as a standard practice.
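The format of an untagged pattern file can be read with a few lines of Tcl. This sketch assumes that a quoted pattern is simply stripped of its outer double quotes, with no escape processing inside them; the procedure names parsePatternLines and readPatternFile are hypothetical.

```tcl
# Sketch of reading an untagged pattern file (hypothetical procedure names).
# Assumption: quotes are stripped as-is; no escapes are processed inside them.
proc parsePatternLines {lines} {
    set patterns {}
    foreach line $lines {
        # Leading and trailing blanks are never part of an unquoted pattern
        set line [string trim $line]
        # Skip empty lines and comments
        if {$line eq "" || [string index $line 0] eq "#"} {
            continue
        }
        # Quotes protect blanks at either end and a leading '#'
        if {[string length $line] >= 2 &&
                [string index $line 0] eq "\"" &&
                [string index $line end] eq "\""} {
            set line [string range $line 1 end-1]
        }
        lappend patterns $line
    }
    return $patterns
}

proc readPatternFile {path} {
    set fh [open $path r]
    set data [read $fh]
    close $fh
    return [parsePatternLines [split $data \n]]
}
```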

For a tagged file group, every line of the pattern file, except for a comment or an empty line, is assumed to include a pair of strings separated by at least one space or tab character. As each of those strings is allowed to contain spaces and tabs, it is recommended to use double quotes (for each of them) to avoid ambiguity. The first string is assumed to be a pattern (essentially in the same way as for an untagged group), and the second string is the tag of that pattern. If no second string is specified, the tag is assumed to be empty. It is not required that all tags be different.
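A tagged line can be split with a small field reader. This is a sketch under two assumptions: quoted strings contain no embedded double quotes, and an unquoted tag extends to the end of the line. takeField and parseTaggedLines are hypothetical names; lassign needs Tcl 8.5 or later.

```tcl
# Remove one (optionally double-quoted) field from the front of s.
# Returns {field rest}. Assumes no double quotes inside a quoted field.
proc takeField {s} {
    set s [string trimleft $s]
    if {[string index $s 0] eq "\""} {
        set endQuote [string first "\"" $s 1]
        if {$endQuote < 0} {
            error "unterminated quote in: $s"
        }
        return [list [string range $s 1 [expr {$endQuote - 1}]] \
                     [string range $s [expr {$endQuote + 1}] end]]
    }
    # Unquoted: the field ends at the first space or tab
    if {[regexp -indices {[[:blank:]]} $s where]} {
        set cut [lindex $where 0]
        return [list [string range $s 0 [expr {$cut - 1}]] \
                     [string range $s $cut end]]
    }
    return [list $s ""]
}

# Returns a list of {pattern tag} pairs (hypothetical helper).
proc parseTaggedLines {lines} {
    set entries {}
    foreach line $lines {
        set line [string trim $line]
        if {$line eq "" || [string index $line 0] eq "#"} {
            continue
        }
        lassign [takeField $line] pattern rest
        set rest [string trim $rest]
        if {[string index $rest 0] eq "\""} {
            set tag [lindex [takeField $rest] 0]   ;# quoted tag
        } else {
            set tag $rest   ;# assumption: rest of the line, possibly empty
        }
        lappend entries [list $pattern $tag]
    }
    return $entries
}
```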

For a non-file group, i.e., one whose patterns are specified directly as an argument to the group function, the specification comes as a list of strings representing the patterns. Note that double quotes in such a list are stripped automatically by Tcl when the list items are interpreted. For a tagged group, every element of the specified list must be itself a two-element list describing the pattern and its tag, respectively. Directly specified tagged groups do not seem to be very useful. A tagged group is meant to be a database, and a database requires non-volatile storage to make sense.
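Since a directly specified group is an ordinary Tcl list, the usual list-parsing rules apply to it. The standalone snippet below uses plain Tcl list commands (not the filter's group operation) to show one consequence that matters for regular groups: an element in double quotes undergoes backslash substitution, so `\.' inside quotes loses its backslash, whereas an element in braces is taken literally.

```tcl
# Plain Tcl, independent of RabidFire: how list elements are parsed.
set untagged { "spam +complaint" {spammer\.example} "spammer\.example" }
foreach pattern $untagged {
    # The third element arrives as spammer.example (backslash removed)
    puts "pattern: <$pattern>"
}

# Tagged form: every element is itself a two-element list {pattern tag}
set tagged { {"bulk +mail" "first-tag"} {{offer\.example} ""} }
foreach entry $tagged {
    lassign $entry pattern tag
    puts "pattern: <$pattern>  tag: <$tag>"
}
```

In a directly specified regular group, it is therefore safer to brace patterns that contain backslashes.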

When a file group is modified (operation modify) and its pattern file is rewritten, the filter tries to preserve the structure of comments and empty lines in the pattern file, even if some patterns are removed. All patterns and tags in a pattern file rewritten by the filter are always quoted.

Patterns

Depending on how the group was defined (with the group operation), its patterns can be viewed as representing either addresses or general regular expressions. The second interpretation is somewhat simpler to explain because it essentially coincides with the standard interpretation of regular expressions in Tcl. If you don't know Tcl, you may assume that its regular expressions are more or less the same as those accepted by sed, perl, or vi. Remember that the characters `.*+?^$()[]|\' are special. They should be escaped with `\' if they are meant to stand for themselves.

In fact, there is one somewhat inconvenient difference between the regular expressions formally accepted by Tcl and those used by most other programs. Namely, Tcl patterns don't accept the `\<...\>' notation that many other programs use for word boundaries. To compensate for this inconvenience, the filter preprocesses patterns belonging to a regular group, replacing `\<' and `\>' with sequences that simulate word boundary matches. This only works at the very beginning and end of a pattern, however.
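As an alternative to simulating word boundaries, Tcl 8.1 and later (whose regular expressions go beyond the older syntax quoted below) provide `\m' and `\M', which match only at the beginning and at the end of a word, respectively. The sketch below translates a leading `\<' and a trailing `\>' into those escapes. translateWordBoundaries is a hypothetical name and this is not what the filter does internally; like the filter's preprocessing, it handles only the very beginning and end of a pattern, and it ignores the corner case of an escaped backslash before a final `>'.

```tcl
# Hypothetical helper; assumes Tcl 8.1+ regular expressions (\m and \M).
proc translateWordBoundaries {pattern} {
    # Leading \< becomes \m (beginning of a word)
    if {[string range $pattern 0 1] eq "\\<"} {
        set pattern "\\m[string range $pattern 2 end]"
    }
    # Trailing \> becomes \M (end of a word)
    if {[string range $pattern end-1 end] eq "\\>"} {
        set pattern "[string range $pattern 0 end-2]\\M"
    }
    return $pattern
}

set re [translateWordBoundaries {\<free\>}]
puts "translated: $re"
```

With this translation, the pattern matches "free" in "a free offer" but not inside "carefree".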

Below I include (without permission) an excerpt from the on-line documentation of Tcl succinctly summarizing all the features of regular expressions in their Tcl edition.

Begin of excerpt

Regular expressions are implemented using Henry Spencer's package (thanks, Henry!), and much of the description of regular expressions below is copied verbatim from his manual entry.

A regular expression is zero or more branches, separated by `|'. It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by `*', `+', or `?'. An atom followed by `*' matches a sequence of 0 or more matches of the atom. An atom followed by `+' matches a sequence of 1 or more matches of the atom. An atom followed by `?' matches a match of the atom, or the null string.

An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), `.' (matching any single character), `^' (matching the null string at the beginning of the input string), `$' (matching the null string at the end of the input string), a `\' followed by a single character (matching that character), or a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in `[]'. It normally matches any single character from the sequence. If the sequence begins with `^', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by `-', this is shorthand for the full list of ASCII characters between them (e.g. `[0-9]' matches any decimal digit). To include a literal `]' in the sequence, make it the first character (following a possible `^'). To include a literal `-', make it the first or last character.

Copyright © 1993 The Regents of the University of California.
Copyright © 1994-1996 Sun Microsystems, Inc.
Copyright © 1995-1997 Roger E. Critchlow Jr.

End of excerpt

The same regular expression matching apparatus of Tcl is used to match patterns in both types of groups, i.e., regular and address. The difference is in how the expressions in the two types of groups are preprocessed before being used for matching. Also, the case of letters is always ignored when matching a pattern from an address group, whereas for a regular group case sensitivity is selected by a flag (-nocase).

An address pattern is assumed to describe a class of IP addresses (symbolic or numeric) or E-Mail addresses. The following rules apply:

An address pattern may include characters from the following set: letters (case ignored), digits, `._-~*' and at most one `@', with the asterisk considered special.

If an address pattern starts (ends) with a letter or digit, then it is matched starting (ending) at a word boundary. A word boundary is any character other than a letter or digit.

An asterisk occurring anywhere within the pattern (including beginning and end) matches any sequence of characters (including zero) consisting of letters, digits, and the underscore `_'.

For example, the pattern tdbank matches the following addresses: www.tdbank.com, mgg@tdbank.ca, but not www.tdbanking.com, which would be matched by "tdbank*". Similarly, @w*w. matches you@www.muka.com as well as anything@w123w.pl, but not somebody@w.ww.edu.
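The rules above can be expressed as a translation into an ordinary Tcl regular expression. The following sketch does this with the hypothetical procedures addrPatternToRegexp and matchAddrPattern. It illustrates the described semantics and is not the filter's implementation. It also assumes that the beginning and end of the string count as word boundaries.

```tcl
# Hypothetical translation of an address pattern into a Tcl regular
# expression, following the rules listed above (not RabidFire's code).
proc addrPatternToRegexp {pat} {
    set re ""
    foreach c [split $pat ""] {
        switch -- $c {
            "*"     { append re {[A-Za-z0-9_]*} }
            "."     { append re {\.} }
            default { append re $c }
        }
    }
    # Anchor at a word boundary (a non-alphanumeric character, or the
    # start/end of the string) when the pattern starts/ends with a letter or digit
    set lead ""
    set trail ""
    if {[string is alnum -strict [string index $pat 0]]} {
        set lead {(^|[^A-Za-z0-9])}
    }
    if {[string is alnum -strict [string index $pat end]]} {
        set trail {($|[^A-Za-z0-9])}
    }
    return $lead$re$trail
}

# Case is always ignored for address patterns
proc matchAddrPattern {pat str} {
    return [regexp -nocase -- [addrPatternToRegexp $pat] $str]
}

# Usage, mirroring the examples in the text
puts [matchAddrPattern tdbank www.tdbank.com]
puts [matchAddrPattern tdbank www.tdbanking.com]
puts [matchAddrPattern @w*w. somebody@w.ww.edu]
```

Only `.' needs escaping here, because the other allowed characters (`-', `~', `_', `@') have no special meaning outside a bracket expression.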

Although formally the local part of an E-Mail address may contain characters somewhat fancier than those mentioned in the first rule above (e.g., in addresses gatewayed from X.400 systems), it is safe to assume these days that all sensible addresses fit the rule. Moreover, the filter ignores the case of letters in all addresses, including E-Mail addresses, even though formally the local part of an E-Mail address is (or rather may be) case sensitive.

Of course, besides the pattern groups of RabidFire, one can freely use the standard Tcl tools (e.g., regexp) to match regular expressions directly in their exact Tcl flavor.

Global variables

Before the rules script is read and evaluated, the filter presets a few global variables. The script is evaluated at Tcl level 0, which means that it directly sees the global level of definitions. In other words, the script can access those variables directly, and any functions defined in the script can reference them as global variables.
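As an illustration, here is how a helper procedure in a rules script could test the subject line. Outside the filter, the preset variable has to be simulated, so the first assignment below is only a stand-in. The procedure name subjectMatches is hypothetical.

```tcl
# Stand-in for the value the filter presets before evaluating the script
set SUBJECT "Make Money Fast"

# Procedures see the preset variables only through 'global'
proc subjectMatches {re} {
    global SUBJECT
    return [regexp -nocase -- $re $SUBJECT]
}

# Usage; top-level code in the script could use $SUBJECT directly
if {[subjectMatches {money\s+fast}]} {
    puts "suspicious subject: $SUBJECT"
}
```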

The following variables represent fragments or extracts from the received message being processed by the filter:

HEADER

This variable stores the complete headers of the received message as a Tcl string.

BODY

This variable stores the message body. Depending on the message, BODY may not be its complete text, but only that part that the filter considers relevant from the viewpoint of matching patterns against it. For example, if the message contains a non-text MIME attachment (in base-64 encoding), this attachment will not be included in BODY.

FBODY

This is the complete body of the received message, including all (possibly encoded) attachments. Beware that the combined string stored in FBODY can be quite long; thus, it should be referenced and used with due care, to avoid memory problems and excessive processing overhead. For example, it is not recommended to copy FBODY or to do complicated group matching within it, unless you are sure that such an operation makes sense.

Note: While the filter never tries to guess at the contents of non-text components (attachments) of a received message, the handling of encoded textual components is configurable by the Expand option in .rfirerc. With the default setting of this option (base64+quoted), all textual components encoded in base-64 or as quoted-printable are unpacked before the message is presented to the rules. This is also how they appear in FBODY and how they will look when (and if) the message is stored in your mailbox. By adding html to Expand (i.e., base64+quoted+html), you can additionally convert HTML components to plaintext, which may be useful if you are not crazy about receiving HTML-encoded messages. This conversion only applies to inline HTML components, not to HTML attachments. HTML attachments are left in HTML, although they are decoded from base-64 or quoted-printable if the respective options of Expand are set.

One can argue that it might be a better idea to decode encoded textual attachments solely in BODY, preserving the original form of the incoming message so that it is saved in the mailbox exactly as it arrived. Personally, I prefer the present solution because I want to be able to process FBODY further, for which a plaintext ASCII view of all textual components is useful. Note that hiding textual components (by encoding them in base-64) makes little sense, unless you are a spammer trying to fool a filter.

FROM

This is the E-Mail address of the sender, if one could be determined from the headers. Otherwise, the variable contains an empty string.

RETURN

This is the return address from the headers, if one was specified there explicitly. Otherwise, this string is the same as FROM.

TO

This is the collection of E-Mail addresses representing the recipients of the message extracted from the "To" and "Cc" headers. Multiple addresses are separated by `\n' (the newline character) and stored as a single string.

SUBJECT

This variable stores the contents of the subject line. The string is empty if no subject has been found in the headers.

ADDRESS

This is the set of all unique host addresses (including symbolic and numeric IP addresses) that could be collected from the headers, excluding the "Cc" and "To" headers. They are stored as a single string with the individual entries separated by `\n' (the newline character).

One additional global variable is Flags, a Tcl array collecting certain status values (such as Flags(lastmatch) and Flags(nobounce)) described in sections Operations and Bounces, probes, and rejects.
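Because TO and ADDRESS pack several entries into a single newline-separated string, a rules script will usually turn them into Tcl lists before looping over them. The following minimal sketch uses made-up stand-in values. The helper nlList is hypothetical and also drops empty entries, for example those left by a trailing newline.

```tcl
# Stand-in values; in a real rules script the filter presets TO and ADDRESS
set TO "alice@example.com\nbob@example.org"
set ADDRESS "mail.example.com\n192.0.2.7\n"

# Split a newline-separated string into a list, dropping empty entries
proc nlList {s} {
    return [lsearch -all -inline -not -exact [split $s "\n"] ""]
}

set recipients [nlList $TO]
set hosts [nlList $ADDRESS]
foreach h $hosts {
    if {[regexp {^[0-9.]+$} $h]} {
        puts "numeric host: $h"
    } else {
        puts "symbolic host: $h"
    }
}
```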

Operations

In this section, we describe the functions defined by the filter and made available to the rules script. Although the order in which they are listed may seem arbitrary, it has been loosely guided by their relevance and interdependence. Note that one operation, group, has already been discussed in section Pattern groups and files.

setdirectory [-relative] [dirname]

This operation sets the current directory for locating files referenced by operations like group or readfile. By default, when the script starts executing, the current directory is set to the base directory. The -relative flag indicates that the specified dirname is relative to the present current directory; otherwise, it is relative to the base directory. If dirname is missing (this is illegal with -relative), the current directory is reset to the base directory. If dirname starts with `/', it is always absolute, regardless of the flag.
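The resolution rules for dirname can be summarized by the following Tcl sketch. The procedure resolveDir is hypothetical. Unlike setdirectory, it only computes the resulting path and changes no state. It relies on the fact that, on UNIX, file join drops everything preceding an absolute component.

```tcl
# Hypothetical model of how setdirectory resolves its argument
proc resolveDir {base current relative {dirname ""}} {
    if {$dirname eq ""} {
        # A missing dirname resets to the base directory (illegal with -relative)
        if {$relative} { error "-relative requires a directory name" }
        return $base
    }
    # file join discards earlier components when a later one is absolute,
    # so a dirname starting with a slash stays absolute in both branches
    if {$relative} {
        return [file join $current $dirname]
    }
    return [file join $base $dirname]
}

# Usage
puts [resolveDir /home/me/rfire /home/me/rfire/groups 1 extra]
```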

cmpnc s1 s2

This is a "case-less" string comparison. The operation converts both argument strings to lower case and compares them in essentially the same way as the standard string compare function. The result is an integer: -1 (s1 is lexicographically less than s2), 0 (the strings are equal), +1 (s1 is lexicographically greater than s2).
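A plausible equivalent of cmpnc in plain Tcl is shown below. It is a sketch of the described behavior, not the filter's source. Newer Tcl versions also offer string compare -nocase, which serves the same purpose.

```tcl
# Case-insensitive comparison returning -1, 0, or 1 (sketch)
proc cmpnc {s1 s2} {
    return [string compare [string tolower $s1] [string tolower $s2]]
}

puts [cmpnc "RabidFire" "rabidfire"]
```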

lappendunique list item

This function performs the same operation as the standard lappend function, i.e., it appends the specified item to the list, but makes sure that the item is not already on the list. It returns 1 if the item was not found on the list (and was actually appended), and 0 otherwise.
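Here is a sketch of lappendunique in the same spirit. It assumes, as with lappend, that the first argument is the name of a list variable rather than the list itself. This is a plausible reading of the description, not the filter's code.

```tcl
# Append item to the list stored in listVar unless it is already there.
# Returns 1 if the item was appended, 0 otherwise.
proc lappendunique {listVar item} {
    upvar 1 $listVar lst
    if {[info exists lst] && [lsearch -exact $lst $item] >= 0} {
        return 0
    }
    # Like lappend, this creates the variable if it does not exist yet
    lappend lst $item
    return 1
}

# Usage
set seen {a b}
lappendunique seen c
lappendunique seen a
puts $seen
```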

getaddri s

Extracts all numeric IP addresses that could be located within string s. The result is a Tcl list of those addresses, which are all unique. This list is empty, if not a single numeric IP address could be found in s.
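The following sketch approximates getaddri with a single regexp -all -inline call, assuming a modern Tcl (8.5 or later). It is a simplification. It does not check that each octet is at most 255, and it does not insist on delimiters around the address, so the filter's own heuristics may well be stricter.

```tcl
# Collect unique dotted-quad addresses found anywhere in s (sketch)
proc getaddri {s} {
    set result {}
    foreach a [regexp -all -inline {[0-9]{1,3}(?:\.[0-9]{1,3}){3}} $s] {
        if {[lsearch -exact $result $a] < 0} {
            lappend result $a
        }
    }
    return $result
}

# Usage
puts [getaddri "Received: from 10.0.0.1 by 192.168.1.20 (10.0.0.1)"]
```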

getaddrs s [nd]

Extracts all symbolic host addresses that could be located within string s, or rather the fragments that look like symbolic host addresses. The optional argument is an integer specifying the number of dots that must occur within a symbolic address to be considered legitimate. If not specified, it defaults to 1. The extracted (unique) addresses are returned as a Tcl list.

netaddress addr [nd]

If addr is a numeric IP address, the function returns the corresponding network address, based on the standard (and obsolete) interpretation of address classes (with the host bits set to zero). For a symbolic host address, the function returns the original address stripped (from the left) to the indicated number of dots, which defaults to 2. Thus, the number of address components in the result is not greater than nd + 1.
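The class-based rule and the dot-stripping rule can be sketched as follows. The treatment of class D and E addresses (first octet 224 and above) is an assumption, because the text does not cover it. In this sketch, such addresses are simply returned unchanged.

```tcl
# Sketch of netaddress: classful network part for numeric addresses,
# the last nd+1 components for symbolic ones
proc netaddress {addr {nd 2}} {
    if {[regexp {^[0-9]+(\.[0-9]+){3}$} $addr]} {
        scan $addr {%d.%d.%d.%d} a b c d
        if {$a < 128} { return "$a.0.0.0" }
        if {$a < 192} { return "$a.$b.0.0" }
        if {$a < 224} { return "$a.$b.$c.0" }
        # Class D/E: no network part defined; returned unchanged (assumption)
        return $addr
    }
    set parts [split $addr .]
    if {[llength $parts] > $nd + 1} {
        set parts [lrange $parts end-$nd end]
    }
    return [join $parts .]
}

puts [netaddress 10.1.2.3]               ;# 10.0.0.0
puts [netaddress mail.cs.example.com]    ;# cs.example.com
puts [netaddress mail.cs.example.com 1]  ;# example.com
```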

istrusted addr

This function returns 1 if the specified address matches one of the patterns specified in the .rfirerc file (variable ExcludeList) identifying the domains and hosts that are considered trusted. For example, the filter will never suspect a trusted mail server of being an open relay, and the killer will never send a complaint letter to a trusted host.

readfile [-lock] [-eval] filename

The specified filename is interpreted relative to the current directory. The operation reads the contents of the indicated file. With -eval, the file is immediately evaluated as a Tcl script in the context of the function that invoked readfile. In such a case, readfile returns the result of that evaluation. Without -eval, the entire contents of the read file are returned as a string. With -lock, the file is locked while being read.

writefile [-lock] [-append] filename string

The operation writes the specified string to the file described by the filename argument, which is interpreted relative to the current directory. With -append, the string is appended at the end of any present contents of the file, which must already exist. Without -append, the string completely overwrites the old contents, and the file is created if it does not exist. With -lock, the file is locked for the operation. No value is returned by writefile.
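To illustrate the described semantics, here is a sketch with the hypothetical names readfileSketch and writefileSketch. Locking is left out, and the -eval and -append flags are modeled as optional boolean arguments.

```tcl
# Hypothetical stand-in for readfile (no locking)
proc readfileSketch {filename {evalIt 0}} {
    set f [open $filename r]
    set data [read $f]
    close $f
    if {$evalIt} {
        # Evaluate in the caller's context, as readfile -eval does
        return [uplevel 1 $data]
    }
    return $data
}

# Hypothetical stand-in for writefile (no locking, returns nothing)
proc writefileSketch {filename string {appendIt 0}} {
    if {$appendIt} {
        # Appending requires an existing file
        if {![file exists $filename]} { error "no such file: $filename" }
        set f [open $filename a]
    } else {
        # Overwrite, creating the file if necessary
        set f [open $filename w]
    }
    puts -nonewline $f $string
    close $f
    return
}
```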

match groupname string

The function returns 1 if the group matches the string, and 0 otherwise. If the group was declared with -and, all patterns in the group must match the string for the match operation to succeed. Otherwise, it is sufficient if any of those patterns matches the specified string. The semantics of patterns was described in section Patterns.

Upon a successful return, Flags(lastmatch) is set to the substring of the specified string that triggered the matching. For an -and group, if match returns 1, Flags(lastmatch) is set to the substring matched by the last pattern in the group. If match returns 0, Flags(lastmatch) stores the substring matched by the last pattern that actually matched the string, and is undefined if the string wasn't matched by the first pattern in the group. For a tagged group, Flags(lasttag) is set along with Flags(lastmatch) - to the tag of the corresponding pattern.
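The success and failure rules for Flags(lastmatch) are easier to follow in code. The sketch below models a group as a plain list of regular expressions. The procedure matchGroupSketch and its andMode argument (0 or 1) are hypothetical, and the sketch assumes case-sensitive Tcl regexp matching.

```tcl
# Hypothetical model of 'match' on a group given as a list of regexps
proc matchGroupSketch {patterns str {andMode 0}} {
    global Flags
    foreach p $patterns {
        if {[regexp -- $p $str m]} {
            # Every successful pattern updates lastmatch
            set Flags(lastmatch) $m
            if {!$andMode} {
                # Ordinary group: the first matching pattern decides
                return 1
            }
        } elseif {$andMode} {
            # -and group: one failing pattern decides; lastmatch keeps the
            # substring of the last pattern that did match, if any
            return 0
        }
    }
    # Reached the end: all patterns matched (-and) or none did (ordinary)
    return $andMode
}

# Usage
puts [matchGroupSketch {cash viagra} "free cash now"]
puts $Flags(lastmatch)
```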

extract flags groupname item ... item

This operation can be used to extract patterns from the indicated group. The patterns to be extracted are identified by the list of items following the group name. The interpretation of those items depends on the flags and on the group type in the following way:

With -list, the function expects a single item which is treated as the list of actual items. Otherwise, the possibly multiple items are listed directly as arguments of extract.

If the group is not a tagged group, the operation extracts all patterns that match any of the indicated items (viewed as strings). The matching is carried out according to the group specification (i.e., -regular versus -address, case sensitive versus -nocase).

If the group is a tagged group and -tags is specified as one of the flags, the list of items refers to the tags rather than patterns, and the items are viewed as patterns to match the tags. Three other flags are applicable in such a case, -regular, -global, and -nocase, indicating what type of matching should be applied. With -regular (and also by default), the items are viewed as regular expressions in their Tcl flavor. With -global, the items are treated as "glob-style" patterns (a standard feature of Tcl). In both cases, -nocase selects case-insensitive matching (by default, matching is case-sensitive).

One more flag that can be specified for a tagged group, irrespective of whether -tags is used or not, is -both. When present, it indicates that along with the patterns being extracted, their tags should be extracted as well. In such a case, the function returns a list of pairs <pattern,tag>. Otherwise, the list returned by extract is a straightforward sequence of extracted patterns.

Flags(lastmatch) and Flags(lasttag) (whenever applicable) are set by extract in a way consistent with the match operation, although this feature is pretty much useless. The primary purpose of extract is to implement simple databases using tagged groups, where the tags are indexes and the patterns are records.
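The tag-based selection performed by extract -tags can be sketched over a list of {pattern tag} pairs. In the hypothetical extractByTags, the mode argument (regular or glob) stands in for -regular or -global. The two trailing booleans stand in for -nocase and -both.

```tcl
# Hypothetical sketch of extract -tags on a tagged group held as pairs
proc extractByTags {pairs tagPatterns {mode regular} {nocase 0} {both 0}} {
    set opts {}
    if {$nocase} { lappend opts -nocase }
    set result {}
    foreach pair $pairs {
        lassign $pair pat tag
        foreach tp $tagPatterns {
            if {$mode eq "glob"} {
                set hit [string match {*}$opts $tp $tag]
            } else {
                set hit [regexp {*}$opts -- $tp $tag]
            }
            if {$hit} {
                # With "both", return {pattern tag} pairs; else patterns only
                if {$both} {
                    lappend result [list $pat $tag]
                } else {
                    lappend result $pat
                }
                break
            }
        }
    }
    return $result
}

# Usage: tags act as an index, patterns as records
set db [list [list a@x.com "spam 2002"] [list b@y.com ok] [list c@z.com "SPAM 2001"]]
puts [extractByTags $db {spam*} glob 1]
```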

modify flags groupname item ... item

This operation modifies the contents of the indicated group, forcing the pattern file to be rewritten. The arguments, which are similar to those of extract, identify the patterns to be affected by the modification.

The list of items identifies the patterns to be added to or deleted from the group. What exactly happens is determined by the configuration of flags, in the following way:

If one of the flags is -delete, which is mutually exclusive with -add, the indicated patterns are removed from the group. Otherwise (by default), and also if -add is specified instead of -delete, new patterns are added to the group.

The -delete case is more complicated, at least for a tagged group. If -tags is specified among the flags, the items identify the tags of the patterns to be removed. The same options (-global, -regular, -nocase) are applicable as for the extract operation. Without -tags (which is illegal for a non-tagged group), the items are strings to which the patterns belonging to the group are matched. Any pattern that matches at least one of the strings is marked for deletion.

For addition, the list of items represents new patterns to be added to the group. Again, this operation is straightforward for a non-tagged group, in which case every item is simply a new pattern. For a tagged group, each item is interpreted as a two-element list identifying a pattern together with its tag. As for extract, if -list occurs as one of the flags, a single item argument is expected, which is interpreted as the list of actual items.

The script applies some dubious heuristics to avoid adding patterns that would match no new strings compared to the patterns already present in the group. Those heuristics are quite simple and don't even come close to a full equivalence/inclusion test for regular expressions. If a pattern being added to a tagged group is exactly the same as an existing pattern, but it specifies a new tag, the new tag replaces the old tag. If, however, an attempt is made to add a new pattern that is different from all existing patterns, but turns out to be superfluous, the new pattern is simply ignored.

The function returns the list of patterns (without tags, even if the group is tagged) affected by the operation, i.e., actually added or deleted. If this list is empty, it means that the operation was void, i.e., the pattern file was not modified and there was no need to write it back.

Although a group that has no underlying pattern file (i.e., one that was declared without the -file flag) can also be modified, such modifications are obviously volatile and not very useful.
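The tag-replacement rule for additions can be sketched as follows, with the group held as a list of {pattern tag} pairs in a variable. The procedure modifyAddTagged is hypothetical and omits the superfluous-pattern heuristics. It also assumes that a tag replacement counts as an affected pattern, which the text does not state explicitly.

```tcl
# Hypothetical sketch of "modify -add" on a tagged group
proc modifyAddTagged {pairsVar items} {
    upvar 1 $pairsVar pairs
    set affected {}
    foreach item $items {
        lassign $item pat tag
        set i [lsearch -exact -index 0 $pairs $pat]
        if {$i < 0} {
            # A new pattern is appended together with its tag
            lappend pairs [list $pat $tag]
            lappend affected $pat
        } elseif {[lindex $pairs $i 1] ne $tag} {
            # Same pattern, new tag: the new tag replaces the old one
            lset pairs $i 1 $tag
            lappend affected $pat
        }
        # An exact duplicate (same pattern and tag) is ignored
    }
    # An empty result means the pattern file would not need rewriting
    return $affected
}

# Usage
set db [list [list a@x.com old]]
puts [modifyAddTagged db [list [list a@x.com new] [list b@y.com t]]]
puts $db
```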

matchap pattern string

This function matches the specified address pattern to the indicated string. It returns 1 if the pattern matches the string, and 0 otherwise. In the former case, Flags(lastmatch) is set to the matched portion of the string.

rule name [trigger] body

This operation defines a rule and assigns a name to it. The name of a rule must be a legitimate Tcl identifier, and its body must be a list of Tcl statements, like the body of a Tcl procedure. The optional trigger is a Boolean condition, i.e., something that can be sensibly used as the first argument of an if statement. The semantics of rules is described in section Rules.

proceed rulename

This operation can only be used within a rule body. It transfers control to the indicated rule. Any statements following proceed in the current rule will not be executed.

stop

This is another operation that can only be performed from within a rule body. It unconditionally stops the execution of the rules, effectively terminating the execution of the filter script.

save [-list] [mbx ... mbx]

This operation saves the received mail message in the mailboxes specified as arguments. If no mailbox is specified, the message is saved in the default mailbox indicated in the .rfirerc file. With -list, the function expects a single mbx argument pointing to the list of actual mailboxes.

The saved message consists of the contents of HEADER and FBODY as they look at the moment when the operation is performed.

The filter makes sure that the message is saved at most once in any given mailbox. Multiple save operations referencing the same mailbox are legal, but only the first of them is effective and the remaining ones are void. If, for whatever reason, you would want to save the same message twice in the same mailbox (it may make sense, e.g., when the message has been processed and modified in between), you can use the following function:

resave [mailbox]

which resets the "saved" status of the message with respect to the indicated mailbox (or all mailboxes, if no argument is given), so that a subsequent save operation will be effective, even if the message was previously saved. For example, resave is called by function setmessage discussed in section Options and extras.
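The at-most-once bookkeeping behind save and resave can be pictured with a small array keyed by mailbox name. In this sketch, saveOnce and resaveSketch are hypothetical, and the actual writing of HEADER and FBODY to the mailbox is left out.

```tcl
# Hypothetical bookkeeping: one array element per mailbox already saved to
array set Saved {}

proc saveOnce {mbx} {
    global Saved
    if {[info exists Saved($mbx)]} {
        # Already saved to this mailbox: the operation is void
        return 0
    }
    set Saved($mbx) 1
    # (the real filter would append HEADER and FBODY to the mailbox here)
    return 1
}

# Without an argument, forget all mailboxes; otherwise just the one given
proc resaveSketch {{mbx ""}} {
    global Saved
    if {$mbx eq ""} {
        array unset Saved
    } else {
        unset -nocomplain Saved($mbx)
    }
}
```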

fixpreamble strvar [ef]

This operation is used implicitly by bounce, forward, and send, but it has been made generally available because it may be of some use in other situations. It scans the string stored in strvar (which must be a variable name) and modifies it by replacing all occurrences of the following sequences:

%N
is replaced with your name, as declared in .rfirerc
%E
is replaced with your E-Mail address, as declared in .rfirerc
%F
is replaced with the sender address (FROM) of the current message
%S
is replaced with the subject line (SUBJECT) of the current message

Additionally, if the second argument of fixpreamble is given (and it is not an empty string), it replaces all occurrences of `%X'. The modified source string overwrites the original (the first argument is passed "by name") and the function returns no value.
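Since %N and %E come from .rfirerc, a self-contained sketch has to stand them in with ordinary globals. MyName, MyEmail, and fixpreambleSketch below are hypothetical names. The substitution itself is a natural fit for string map, which performs all replacements in one pass, so substituted text is never rescanned.

```tcl
# Hypothetical stand-ins for .rfirerc settings and the current message
set MyName  "Jane Doe"
set MyEmail "jane@example.org"
set FROM    "spammer@example.com"
set SUBJECT "Cheap stuff"

proc fixpreambleSketch {strvar {ef ""}} {
    # The string is passed by name and modified in place
    upvar 1 $strvar s
    global MyName MyEmail FROM SUBJECT
    set map [list %N $MyName %E $MyEmail %F $FROM %S $SUBJECT]
    # %X is only substituted when a nonempty second argument is given
    if {$ef ne ""} {
        lappend map %X $ef
    }
    # One pass over the string; replaced text is not scanned again
    set s [string map $map $s]
    return
}

set preamble "Dear %F, your message '%S' was rejected by %N <%E>. Use %X."
fixpreambleSketch preamble "open sesame"
puts $preamble
```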

bounce [-except magic] [-subject s] preamble

This operation bounces the received message to its sender, preceding it with the indicated preamble. Some special sequences in the preamble are substituted as described above. If -except is specified, then additionally the magic string replaces `%X' in the preamble. This is intended for inserting a personalized magic phrase into the (generic) bounce preamble, to tell the recipient how to force the message through the filter.

Note that you can prepare a customized version of the preamble, which includes the right personalized items and needs no substitutions. However, it may be easier to parameterize the filter if those items are kept in one place (i.e., .rfirerc) and referenced from the preamble.

If no subject is explicitly given (with the -subject flag), the bounced message will receive the following standard subject: Returned message, filter reject.

forward [-preamble p] [-subject s] [-nobounce] [-list] addr ... addr

This operation forwards the received message to the indicated recipients. With -list, the expected list of recipients consists of exactly one item interpreted as the list of addresses. With -nobounce, the message is sent with an empty envelope sender (so that it won't be bounced by the mail server, e.g., for a nonexistent recipient) and tagged in such a way that if it ever returns, the filter will quietly drop it without any further attempt at processing. This flag should be used, e.g., when the processing of a bounced forwarded message may result in another forwarded message, which in turn may cause an infinite loop.

If a preamble string is specified, it will precede the forwarded message (standard substitutions are performed on the preamble string). Otherwise, the following default preamble will be prepended: "This message is being forwarded to you by RabidFire from your name <your E-Mail address>". Similarly, if no subject is explicitly specified, the default subject line is: "Forwarded message: original subject line".

send [-subject s] [-nobounce] [-list] msg addr ... addr

This operation sends the indicated message (string msg) to the specified recipients. The interpretation of -list, -nobounce, and -subject is exactly as for forward. The default subject is NONE. The specified message string is preprocessed by fixpreamble before being sent.

mark arg ... arg

This simple operation appends a piece of text (a mark) at the end of the received message. This piece of text is a concatenation of all arguments. The purpose of marks is to pass a note from the filter to its user, e.g., indicating some problems or abnormal conditions in processing the message.

pipe arg ... arg

The received message (including its headers) is "piped" to the command described by the arguments. See the standard Tcl function open for the pipe syntax.

nslookup addr

The function performs a DNS lookup on the specified address, which can be either numeric or symbolic. If the address resolves, the function returns a two-element list consisting of the numeric address (the first element) and the symbolic address. Otherwise, nslookup returns an empty string. This function used to invoke the standard program nslookup to do the job, but as of version 1.0 of RabidFire, it uses dig instead. This is because nslookup is now considered deprecated. Thus, the operation relies on the dig program being available in the system and accessible through the execution path declared in .rfirerc. If this program cannot be found, the operation will always return an empty string.

The filter maintains its private cache of nslookup results. Consequently, multiple and possibly redundant references to nslookup cause no excessive overhead.
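A cache of this kind is essentially memoization keyed by the queried address. The following sketch is hypothetical. cachedLookup takes the resolver as a command prefix; a real resolver might run dig. The cache is keyed on the lower-cased address, since DNS names are case-insensitive. Failed lookups are cached as well, which the text neither confirms nor rules out.

```tcl
# Hypothetical cache: lower-cased address -> lookup result
array set NsCache {}

proc cachedLookup {addr resolver} {
    global NsCache
    set key [string tolower $addr]
    if {![info exists NsCache($key)]} {
        # Only the first query for a given address reaches the resolver;
        # empty (failed) results are cached as well
        set NsCache($key) [{*}$resolver $key]
    }
    return $NsCache($key)
}

# A fake resolver for illustration that counts how often it is called
set Calls 0
proc fakeResolver {addr} {
    incr ::Calls
    if {$addr eq "mail.example.com"} {
        return [list 192.0.2.1 mail.example.com]
    }
    return ""
}

puts [cachedLookup Mail.Example.COM fakeResolver]
puts [cachedLookup mail.example.com fakeResolver]
puts "resolver calls: $Calls"
```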

openrelay [-full|-quick] [-message m] [-list] addr ... addr

This operation carries out an open relay test of the indicated hosts (assumed to be mail servers). The meaning of -list is standard - it indicates that a single addr argument is expected, which represents a list of addresses.

With -quick (which happens to be the default), the test is quick (or light) and it doesn't involve sending probe messages to the suspected hosts. The function will just poll the server using a sequence of SMTP commands and monitor its replies. Such a test is inconclusive, in the sense that it may give a false "positive" result. There exist servers that behave as if they were willing to accept a message for relaying it to a third party, and later drop the message without actually forwarding it anywhere.

With -full, if the server passes the light test, the filter attempts to send a probe message via the server. In such a case, the complete outcome of the test will only be known when (and if) the message comes back to you. The string specified with -message, if present, is appended to the standard body of the probe message, which contains a polite explanation addressed to the administrator (in case it is ever looked at by a human being). The probe message is also tagged in a special way and contains a string identifying the mail server, so that when it returns, the filter will be able to tell the identity of the culprit.

The function returns the following values:
-1
This value can only be returned if the function was invoked to verify a single host, i.e., the list of addresses consisted of a single item. It means that the host does not behave like an E-Mail server, i.e., it either doesn't exist or it doesn't respond sensibly to SMTP commands.

0
This value means that all the specified hosts are OK, i.e., none of them appears to be an open relay. This also includes the case of multiple hosts of which some (or all) did not respond like E-Mail servers.

>0
This is the number of hosts that, according to the light test, appear to be open relays. If the full test was selected, a probe message has been sent to each of those hosts.

Bounces, probes, and rejects

A message qualified as spam by your set of rules is usually bounced (via the bounce operation) to the sender. The primary reason for that is to notify the sender of a legitimate message misdiagnosed as spam that the message must be re-sent. Needless to say, in the vast majority of cases, a bounced message is going to be either dropped or bounced back at you, because its "from" address is nonexistent (even if it was legal at some point, it has become illegal in the meantime). As with all actions regarding incoming messages, the handling of re-bounced messages must be programmed into the rules. However, even if the rules say nothing about this, the filter makes sure that a re-bounced message arriving back at your mailbox will never be bounced again. This is accomplished by marking bounced messages with a special tag in the headers (and also in the body), so that they can be easily spotted when they return.

According to SMTP, a message whose so-called "envelope sender" (i.e., the sender specified with the MAIL FROM command) is empty should not be returned if it cannot be delivered. Although the filter takes advantage of this feature, some of the bounces do return and they have to be taken care of.

Having detected a re-bounced message, the filter sets Flags(nobounce) to a nonempty string. The role of this flag is to communicate to the rules that the received message is a re-bounce, so that they can perform some specific action, if they wish. Additionally, if Flags(nobounce) is set, any attempt to bounce the message to its sender will be void. Thus, if the rules decide to bounce such a message after all, nothing is going to happen. Note that a rule may explicitly set Flags(nobounce) to a nonempty string (any string) to disable bouncing of the current message.

The special tag identifying a re-bounced message is somewhat obfuscated and personalized based on the E-Mail address of the user running the filter. This way, if your filter bounces a message of another user of RabidFire, the two filters will not get confused.

Although the re-bounce tag is inserted by the filter into the headers of a bounced message, it is looked up in the headers as well as in the body of every received message. This is because some mail servers, when they return a message to the sender, completely strip the original headers, but include them as part of the body of the returned message (typically in a special attachment).

Another special type of message that may be received by the filter is a returning probe message from a full open relay test. A full open relay test may be initiated by the filter, by the killer, or manually by the user. Probe messages are also recognized through special (personalized) tags in their headers. Having received such a message, the filter sets Flags(nobounce) (probe messages are surely not to be bounced), and in addition sets Flags(ortreply) to a string consisting of two blank-separated tokens: the numeric and symbolic addresses of the probed host. The reception of a probe message demonstrates with absolute certainty that the host mentioned in it is an open relay. The rules can use the value of Flags(ortreply), e.g., to add the host address to a group file (via modify).
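A rule that wants to record the probed host can split Flags(ortreply) into its two tokens. The value below is a stand-in; in a real rule, the filter sets it. ortHost is a hypothetical helper that also tolerates a missing symbolic part.

```tcl
# Stand-in value; in a real rule the filter sets Flags(ortreply)
set Flags(ortreply) "192.0.2.25 relay.example.net"

# Split the probe reply into its numeric and symbolic parts
proc ortHost {value} {
    if {[regexp {^\s*(\S+)(?:\s+(\S+))?} $value -> numeric symbolic]} {
        return [list $numeric $symbolic]
    }
    return {}
}

lassign [ortHost $Flags(ortreply)] numeric symbolic
puts "open relay: $numeric ($symbolic)"
```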

The last case of a special message is a reply to the semi-automatic complaint letter sent by the killer. Of course, to ever receive such a message, you have to actually use the killer to complain about your spam. Only those responses that quote your complaint letter are going to be detected (using yet another personalized tag interpreted as the killer's signature). You may want to perform some specific action on such messages, e.g., store them in a dedicated mailbox without viewing (see section Examples for an illustration). Upon reception of a reply to the killer's complaint, the filter sets Flags(kilreply) to a nonempty string.

Rules

The rules (defined with the rule operation) constitute a series of named chunks of Tcl code with optional conditions triggering their execution. Both the triggering condition of a rule and its body are executed in the global context, so they can directly see and operate on the variables and flags discussed in section Global variables.

It should be noted that the execution of rules does not start until the entire rules file has been read and evaluated. Thus, the declarations of rules can be followed by other statements (e.g., definitions of functions and variable assignments) that will be evaluated before the first rule is executed.

Normally, the rules are processed in the order of their definitions in the rules file. If a rule has a trigger, the trigger is first evaluated as a Boolean expression, i.e., something that could be used as the first operand of a Tcl if statement. If the trigger evaluates to `true', the body of the rule is executed; otherwise, the rule is simply skipped and the next rule is tried. This operation continues until either the last rule is reached or some fired rule executes stop. A trigger-less rule is equivalent to a rule whose trigger always evaluates to `true'.

Operation proceed (available exclusively from a rule body) can be used to jump directly to a specific rule. It can be viewed as a `goto': the present rule is abandoned, and any statements following proceed are ignored. Note that rule names need not be unique. Formally, they are only needed as arguments of proceed. If two or more rules have the same name, only the first of them can ever be referenced by proceed, which always refers to the first rule (in the order of their definitions) with the specified name. Beware that loops are possible; in particular, it is legal for a rule to proceed to itself.
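The control flow described in this section can be modeled in a few lines of Tcl. The sketch below is a toy re-implementation of rule, proceed, and stop, not the filter's code. proceed and stop unwind the rule body with private return codes. The loop then either jumps to the first rule with the requested name or quits. The text does not say whether the filter re-evaluates the target rule's trigger after proceed; this sketch assumes that it does.

```tcl
# Toy model of rule sequencing; each rule is stored as {name trigger body}
set Rules {}

proc rule {name args} {
    global Rules
    switch [llength $args] {
        1 { lappend Rules [list $name 1 [lindex $args 0]] }
        2 { lappend Rules [list $name [lindex $args 0] [lindex $args 1]] }
        default { error "usage: rule name ?trigger? body" }
    }
}

# proceed and stop unwind the rule body with private return codes
proc proceed {name} { return -code 5 $name }
proc stop {} { return -code 6 }

proc runRules {} {
    global Rules
    set i 0
    while {$i < [llength $Rules]} {
        lassign [lindex $Rules $i] name trigger body
        incr i
        # Triggers and bodies are evaluated at the global level (#0)
        if {![uplevel #0 [list expr $trigger]]} { continue }
        set code [catch {uplevel #0 $body} result]
        switch $code {
            1 { return -code error $result }
            5 {
                # Jump to the first rule with the requested name
                set i [lsearch -exact -index 0 $Rules $result]
                if {$i < 0} { error "no rule named $result" }
            }
            6 { return }
        }
    }
}

# Usage: A jumps over B; C fires only if its trigger holds
set Log {}
set X 1
rule A { lappend Log A; proceed C }
rule B { lappend Log B }
rule C {$X > 0} { lappend Log C; stop }
rule D { lappend Log D }
runRules
puts $Log
```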

Examples

In this section, we discuss the layout of the standard (or rather sample) configuration of rules that comes with the package. We have already seen the opening fragment of the rules file in section Organization of the rules script, including the group declarations and the first (unconditional) rule. That fragment is followed by this sequence:

...
rule Bounce { $Flags(nobounce) != "" } {
######################################################################
# This rule detects bounces of our rejection letters and probes from #
# open relay tests                                                   #
######################################################################
    # try to account for those users who directly reply to our
    # rejection, even though we ask them not to do so
    if [match hard_subj_exc $SUBJECT] {
        mark "EXCEPTION (direct reply): '$Flags(lastmatch)'"
        modify -add hard_from_exc $FROM
        proceed OK
    } else {
        proceed ORTProbe
    }
}

rule Complaint { $Flags(kilreply) != "" || [match complaint_subj $SUBJECT] } {
################################################################
# Check if this is a response to the killer's complaint letter #
################################################################
    save "JunkMail/killer"
    stop
}
...

The first of the above two rules is triggered when Flags(nobounce) contains a nonempty string. As we explained in section Bounces, probes, and rejects, this means that the received message is a bounced message that at some point was sent by the filter. It may be a re-bounce of a bounced spam or a returning probe from an open relay test. Which is actually the case will be determined by another rule, named ORTProbe, located at the end of the full sequence of rules. One possibility, taken care of by the present rule, is that the message happens to be a legitimate piece of mail that was earlier (accidentally) rejected by the filter and is now being resubmitted by its sender in response to the instructions in the preamble.

Now, the sample preamble that comes with the package includes the following instruction:

...
To force your message through the filter, please RE-SEND it but
include the phrase "%X" (the case is insignificant) in the
subject field.  But DO NOT REPLY TO THIS MESSAGE! If you do so,
your message will be dropped and nobody will see it ever again.
This is to avoid looped bounces from non-existent users.
...

with "%X" being replaced by the actual magic phrase. Thus, the sender of the original message is quite explicitly asked not to simply reply to the rejected message, but to send a new message instead. One reason for this is to make sure that the new copy of the message will not be marked as a returning bounce (i.e., will not set Flags(nobounce) upon arrival), so that the Bounce rule will not have to deal with it. At first sight, the issue may seem irrelevant because, as we can see, the Bounce rule can detect such messages and salvage them before they are dropped. Note, however, that should the sender make a typo in the magic phrase, the Bounce rule will be powerless to detect his intentions and the message may be irretrievably lost. (This depends on what specifically your rules do with re-bounces, but their treatment is not going to be very nice, is it?) If the resubmitted message is new, the worst that can happen is that it will be bounced again, in which case the sender will be given another opportunity to correct his mistake. Thus, the preamble explicitly asks the sender to do the right thing, i.e., to send a new copy of the message, but the filter takes a second look at re-bounces anyway, to make sure that even if the sender does not comply with the instruction, the message is still going to be delivered (at least in the vast majority of cases).

All patterns stored in hard_subj_exc are viewed as "magic phrases", i.e., if any of them matches the subject of an arriving message, its sender is added to hard_from_exc (operation modify), which will effectively exempt his subsequent messages from any harsh treatment. Rule OK is listed a bit further in this text. Also note the mark operation, which adds a comment at the end of the message to notify you of what has happened. This comment consists of the text EXCEPTION (direct reply): followed by the matched portion of the subject line.

The next rule, Complaint, can only be reached if Flags(nobounce) is not set, because the Bounce rule always proceeds elsewhere. It detects replies to the complaint letters generated by the killer. Such replies are deposited in a special mailbox named killer in directory JunkMail. Note that, according to the last setdirectory operation, JunkMail is a subdirectory of the base directory. Having stored the message in the mailbox, the rule stops the processing of the current message.

The next three rules attempt to quickly classify the message as an obvious exception:

...
rule ExcHF { [match hard_from_exc $FROM] } {
###########################################################################
# Next we check for hard  'from'  exceptions.  A message originating from #
# a known sender is delivered no matter what. This may be too optimistic, #
# though.                                                                 #
###########################################################################
    mark "EXCEPTION: '$Flags(lastmatch)'"
    proceed OK
}

rule ExcHSB { [match hard_subj_exc $SUBJECT] || [match hard_body_exc $BODY] } {
#############################################################################
# Now we check for the special subject and "hard" exception patterns in the #
# body.   The sender is automatically added to our list of hard exceptions, #
# and the message is deemed to be OK.                                       #
#############################################################################
    mark "EXCEPTION: '$Flags(lastmatch)'"
    modify -add hard_from_exc $FROM
    proceed OK
}

rule ExcHH { [match hard_head_exc $HEADER] } {
##############################
# Hard  'header'  exceptions #
##############################
    mark "EXCEPTION: '$Flags(lastmatch)'"
    proceed OK
}
...

The first of them fires if the sender address (FROM) matches any of the patterns in the hard_from_exc group, which describes trusted senders. If this is the case, the rule proceeds to OK, where, as we shall shortly see, the message is deposited in the standard mailbox. Before that happens, the rule appends a comment at the end of the message (operation mark) indicating the reason for exempting it from more thorough scrutiny. This string consists of the word EXCEPTION followed by the matched portion of the sender address.

The second rule checks the subject and the body of the received message against patterns representing "hard" exceptions (groups hard_subj_exc and hard_body_exc). If the message meets this criterion, it is delivered to the user's mailbox and its sender is added to the list of "hard from" exceptions (operation modify), so that subsequent E-Mail from that address will always be let through. The third simple rule deals with general "header" exceptions, i.e., it applies the patterns from hard_head_exc to the complete headers of the received message.

If the filter has managed to get this far, i.e., past the ExcHH rule, subtler techniques are employed to determine the message's fate. The next two rules look like this:

...
rule ORTInit {
#################################################################
# The following file contains tests for validating mail servers #
#################################################################
    readfile -eval "Tests/server.tst"
}

rule Local { [local_test] } {
#################################################################
# Checks if the message was entirely passed in the local domain #
#################################################################
    proceed OK
}
...

The first of them is unconditional, and its sole purpose is to read and evaluate the contents of file Tests/server.tst, which contains definitions of a few functions referenced in subsequent rules (specifically, Local, Temp, SpmHSRV, and SpmORSR). Please consult server.tst for those definitions. In particular, local_test checks whether all addresses occurring in the headers of the received message (those extracted into the ADDRESS string) are trusted, which means that the message has been passed entirely within your trusted (local) domain. A message with this property is assumed to be OK and is delivered. Otherwise, the filter continues with the following rule:
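The idea behind local_test can be illustrated with a short sketch. It is hypothetical and not the code from Tests/server.tst: it assumes that ADDRESS can be treated as a Tcl list of host names and dotted IP addresses, and that "trusted" means matching one of a few glob patterns (the patterns in TrustedPatterns are made-up stand-ins for your own local domain).

```tcl
# Hypothetical local_test-style check (not the Tests/server.tst code).
# Assumption: the trusted (local) domain is described by glob patterns.
set TrustedPatterns {*.example.edu 192.0.2.*}

proc is_trusted {addr} {
    foreach p $::TrustedPatterns {
        if {[string match -nocase $p $addr]} { return 1 }
    }
    return 0
}

# True only if every header address is trusted, i.e., the message
# never left the local domain. An empty list is not taken as proof.
proc local_test_sketch {addresses} {
    if {[llength $addresses] == 0} { return 0 }
    foreach a $addresses {
        if {![is_trusted $a]} { return 0 }
    }
    return 1
}

puts [local_test_sketch {mail.example.edu 192.0.2.7}]          ;# 1
puts [local_test_sketch {mail.example.edu relay.spam.example}] ;# 0
```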

...
rule Temp {
########################
# Temporary exceptions #
########################
    remove_obsolete_tags temp_from_exc
    remove_obsolete_tags temp_subj_exc
    remove_obsolete_tags temp_body_exc
    if { [match temp_from_exc $FROM] || \
        [match temp_subj_exc $SUBJECT] || \
            [match temp_body_exc $BODY] } {
        mark "EXCEPTION: '$Flags(lastmatch)'"
        proceed OK
    }
}
...

which accounts for temporary exceptions. As we shall see in section The mail proxy, with the assistance of the mail proxy daemon, it is possible to insert patterns into outgoing messages that, for a limited time, will be applied to incoming mail. The idea is to make sure that expected replies to some messages (possibly sent to unreliable places) will be passed through, without making those exceptions permanent. Three special (tagged) groups are used for this purpose, storing patterns applicable to the sender address (temp_from_exc), subject (temp_subj_exc), and body (temp_body_exc). The tags represent the deadlines after which the tagged patterns should be dropped from the group, an operation performed by remove_obsolete_tags (defined in Tests/server.tst). If the above rule still fails to qualify the message as deliverable, its further processing looks as follows:
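The deadline mechanism can be sketched as follows. This is a hypothetical illustration of what remove_obsolete_tags does, assuming that a tagged group can be modelled as a Tcl list of {pattern deadline} pairs with deadlines in seconds since the epoch; the real group representation and function belong to the filter and to Tests/server.tst.

```tcl
# Hypothetical sketch: drop tagged patterns whose deadline has passed.
# Assumption: a group is a list of {pattern deadline} pairs.
proc remove_obsolete_sketch {group {now ""}} {
    if {$now eq ""} { set now [clock seconds] }
    set kept {}
    foreach entry $group {
        lassign $entry pattern deadline
        # keep only patterns whose deadline has not yet passed
        if {$deadline >= $now} { lappend kept $entry }
    }
    return $kept
}

# Usage with a fixed "current time" so the result is reproducible.
set now 1037000000
set temp_subj_exc [list \
    [list "Re: conference visa" [expr {$now + 86400}]] \
    [list "Re: old order #123"  [expr {$now - 60}]]]
# keeps only the "Re: conference visa" entry
puts [remove_obsolete_sketch $temp_subj_exc $now]
```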

...
rule SpmHFASHB {
#########################################################################
# When we get here, we begin to become suspicious. This rule checks for #
# "hard" spam patterns.                                                 #
#########################################################################
    [match hard_from_spm $FROM]    ||
    [match hard_addr_spm $ADDRESS] ||
    [match hard_subj_spm $SUBJECT] ||
    [match hard_head_spm $HEADER]  ||
    [match hard_body_spm $BODY]
} {
    mark "SPAM: '$Flags(lastmatch)'"
    proceed Spam
}

rule SpmHSRV { [dns_test] } {
############################################################################
# This one checks if the message has been received from a non-identifiable #
# server without a domain name                                             #
############################################################################
    mark "SPAM: unidentifiable server"
    proceed Spam
}

rule SpmORSR { [open_relay_test] || [source_test] } {
################################################################
# Now we shall verify the list of mail servers from the header #
################################################################
    mark "SPAM: SRV"
    proceed Spam
}
...

First, the message is checked (by SpmHFASHB) for being an obvious spam by matching its various components against the list of "hard spam" groups. It isn't difficult to guess that the rule named Spam (listed further in this document) takes care of such a message.

The next rule (SpmHSRV) checks if the last foreign E-Mail server, whose identity in the headers could not possibly be faked, has a DNS-resolvable IP address. The usefulness of this rule depends on the behavior of your "receiving" mail server (which determines the IP address of the preceding server) and on the format of the "Received" header inserted by it. Function dns_test (defined in Tests/server.tst) scans the list of mail servers in the headers and returns `true' if the following two conditions are met:

There is a trusted server that received the message from a non-trusted server.
The trusted server was not able to resolve the IP address of the non-trusted server.
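The two conditions can be expressed as a small decision procedure. The sketch below is hypothetical: it skips header parsing entirely and assumes that each "Received" header has already been reduced to a list {receiverTrusted senderTrusted senderResolved}, ordered from the top of the headers (most recent hop) downwards. The actual dns_test in Tests/server.tst works on the real headers.

```tcl
# Hypothetical sketch of dns_test's decision (header parsing omitted).
# Each hop: {receiverTrusted senderTrusted senderResolved}, newest first.
proc dns_test_sketch {hops} {
    foreach hop $hops {
        lassign $hop receiverTrusted senderTrusted senderResolved
        if {$receiverTrusted && !$senderTrusted} {
            # boundary hop: recorded by our own server, so it cannot be
            # faked; suspicious if the sender's address did not resolve
            return [expr {!$senderResolved}]
        }
    }
    # no trusted server received the message from outside
    return 0
}

puts [dns_test_sketch {{1 1 1} {1 0 0}}]   ;# 1: boundary sender unresolved
puts [dns_test_sketch {{1 1 1} {1 0 1}}]   ;# 0: boundary sender resolved
```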

The last of the three rules, SpmORSR, is triggered by either of two tests defined in Tests/server.tst. In particular, open_relay_test extracts from the headers of the received message the addresses of all untrusted E-Mail servers and performs the full variant of the open relay test on each of them. It returns `true' if at least one of the servers has responded positively to the light part of the test.

The exact behavior of open_relay_test is slightly more complicated, because the function accounts for those hosts that at first sight appear to be open relays (i.e., they respond positively to the light test), but in fact are decent, i.e., they silently drop the offending message. In its effort to account for such servers, the function maintains two databases. One of them (group open_rlay_spm and the underlying pattern file Rules/open_rlay.spm) stores those hosts that have responded to the light test but whose exact status isn't certain (the filter is still waiting for the probe messages to arrive back). The second database (group open_rlay_exc and file Rules/open_rlay.exc) lists those hosts whose probe messages haven't made it back within a predetermined amount of time (set to 4 hours in Tests/server.tst). Such hosts are assumed to be OK, even though they may look like open relays under the light test.

The first thing done by open_relay_test is to examine the first database, whose entries consist of the addresses of suspicious hosts tagged with the time of the light test. All entries older than four hours are deleted, and the corresponding addresses are moved to the second database. Then, for every E-Mail server spotted in the headers of the current message, the function checks whether the server occurs in the second database. If so, the server is assumed to be OK. Otherwise, the function gets back to the first database. If the server is present there, it means that it responded positively to the light test less than four hours ago. For the time being, such a host is assumed to be an open relay. Finally, if the host does not occur in either database, it is subjected to the full open relay test. If the host responds positively (to the light part), it is stored in the first database, tagged with the current time, and temporarily pronounced an open relay. Otherwise, it is stored in the second database, so that no more tests will be performed on it in the future. Rule ORTProbe (located at the very end of the list of rules) is responsible for interpreting returning probe messages and moving the offending hosts from the first database to a permanent black list.
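The bookkeeping just described can be sketched with two Tcl dicts standing in for the two databases. This is a hypothetical model, not the server.tst code: the filter actually uses the tagged groups open_rlay_spm and open_rlay_exc, the relay probe is replaced here by a caller-supplied command (fake_probe is made up), and the move to the permanent black list done by ORTProbe is not modelled.

```tcl
# Hypothetical model of open_relay_test's two databases (Tcl 8.5+).
set Pending {}                  ;# host -> time of positive light test
set Cleared {}                  ;# hosts presumed decent
set Timeout [expr {4 * 3600}]   ;# four hours, as in Tests/server.tst

# Move pending entries older than the timeout to the cleared database.
proc expire_pending {now} {
    foreach {host t} $::Pending {
        if {$now - $t > $::Timeout} {
            dict unset ::Pending $host
            dict set ::Cleared $host 1
        }
    }
}

# Return 1 if the host should currently be treated as an open relay.
proc relay_status {host now probeCmd} {
    if {[dict exists $::Cleared $host]} { return 0 }
    if {[dict exists $::Pending $host]} { return 1 }
    if {[{*}$probeCmd $host]} {
        dict set ::Pending $host $now
        return 1
    }
    dict set ::Cleared $host 1
    return 0
}

# Usage with a fake probe that "accepts" relaying on one host only.
proc fake_probe {host} { expr {$host eq "relay.example.net"} }
set t0 1037000000
set r1 [relay_status relay.example.net $t0 fake_probe]   ;# positive, pending
set r2 [relay_status mx.example.org $t0 fake_probe]      ;# negative, cleared
# five hours later no probe has come back, so the host is cleared
expire_pending [expr {$t0 + 5 * 3600}]
set r3 [relay_status relay.example.net [expr {$t0 + 5 * 3600}] fake_probe]
puts "$r1 $r2 $r3"   ;# prints 1 0 0
```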

The second function invoked by the trigger of SpmORSR, source_test, attempts to verify whether the message was sent directly from a mail server running on a subscriber's "home" PC. This favorite method of most spammers is relatively seldom used by senders of legitimate E-Mail. The function assumes that the message comes from a suspicious source if it was received by a trusted server from a host whose symbolic address looks like a "home" ADSL or cable subscriber. See the functions source_test and subscriber_test (in Tests/server.tst) for details. The heuristics may appear a bit weird, but they seem to work for me.
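For readers who want to experiment with their own variant, here is a hypothetical heuristic in the same spirit. It is not copied from subscriber_test in Tests/server.tst; the keyword list and the "IP octets embedded in the name" test are assumptions chosen for illustration.

```tcl
# Hypothetical "home subscriber" test (not the server.tst heuristic).
# A host name is suspicious if it contains a typical access-network
# keyword or embeds the host's own IP octets in its name.
proc looks_like_subscriber {name ip} {
    set name [string tolower $name]
    if {[regexp {(^|[.-])(adsl|dsl|cable|dialup|dial|ppp|dyn|dhcp|pool)} $name]} {
        return 1
    }
    # e.g. 203-0-113-45.example.net for 203.0.113.45
    set octets [split $ip .]
    if {[llength $octets] == 4} {
        set fwd [join $octets {[.-]}]
        set rev [join [lreverse $octets] {[.-]}]
        if {[regexp -- $fwd $name] || [regexp -- $rev $name]} { return 1 }
    }
    return 0
}

puts [looks_like_subscriber adsl-12.provider.example 198.51.100.9]   ;# 1
puts [looks_like_subscriber 203-0-113-45.example.net 203.0.113.45]   ;# 1
puts [looks_like_subscriber mail.example.org 192.0.2.1]              ;# 0
```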

The next three rules take care of "mild" exceptions and "soft" spam criteria:

...
rule ExcMFSB {
#######################################################################
# If it is still not obvious whether the message is a SPAM, we shall  #
# try more exceptions before using further less exact criteria.       #
#######################################################################
    [match mild_from_exc $FROM]    ||
    [match mild_subj_exc $SUBJECT] ||
    [match mild_body_exc $BODY]
} {
    mark "EXCEPTION: '$Flags(lastmatch)'"
    proceed OK
}

rule NoBody { ![regexp -nocase \
       "\[a-z\]\[a-z\]\[a-z\].* \[a-z\]\[a-z\]\[a-z\]" \
               $BODY] } {
#############################
# Non-text attachments only #
#############################
    unset body
    mark "SPAM: No text"
    proceed Spam
}

rule SpmSFASHB {
#########################################
# And now for those less exact criteria #
#########################################
    [match soft_from_spm $FROM]    ||
    [match soft_addr_spm $ADDRESS] ||
    [match soft_subj_spm $SUBJECT] ||
    [match soft_head_spm $HEADER]  ||
    [match soft_body_spm $BODY]
} {
    mark "SPAM: '$Flags(lastmatch)'"
    proceed Spam
}
...

They are executed if the filter hasn't been able to make up its mind so far, and their meaning is pretty much obvious. Feel free to look into the pattern files of the relevant groups to see what kind of criteria I personally consider "mild" and "soft". Needless to say, you are more than welcome to add your favorite patterns there. Note the middle rule, NoBody, which checks whether the received message contains any real text at all. Its trigger applies to the BODY string, which consists of the text portion of the received message, excluding all non-text attachments and MIME headers. The regular expression looks for two runs of three consecutive letters, the second of which immediately follows a blank; if no such match is found, the message is considered to have no text and is classified as spam. If you have a better idea for this heuristic, be my guest!
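To see which bodies the NoBody trigger catches, the regular expression from the rule can be tried on a few sample strings. The expression below is the one from the rule, written with braces instead of backslash escapes (the two spellings are equivalent in Tcl); the helper name has_text and the samples are made up for this illustration. The rule fires when has_text returns 0.

```tcl
# The NoBody trigger's regular expression, wrapped in a helper.
proc has_text {body} {
    regexp -nocase {[a-z][a-z][a-z].* [a-z][a-z][a-z]} $body
}

foreach sample [list "Hi Bob, see you soon" "" "<img> 12 34" "ok, yes"] {
    puts [format "%-24s -> NoBody fires: %s" "\"$sample\"" \
        [expr {![has_text $sample]}]]
}
```

Only the first sample counts as text; the empty body, the markup-only body, and the very short reply would all be classified as spam by this heuristic.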

Finally, we have the rule that delivers the message to your mailbox:

...
rule OK {
##################################
# The message is to be delivered #
##################################

    if [info exists Options(V)] {
        set fr [lindex $Options(V) 0]
        set to [lindex $Options(V) 1]
        setdirectory
        readfile -eval "Tests/goodies.tst"
        vacation $fr $to [readfile "vacation"]
    }

    save
    stop
}
...

If not for the conditional part, the rule would simply save the message in your default mailbox and terminate the execution of the filter. The if statement implements the "vacation" option which we shall discuss in section Options and extras.

Let us note that there is no way to fall through the OK rule. Therefore, the next rule (listed below) can only be entered via an explicit proceed operation from one of the previous rules:

...
rule Spam {
#########################
# The message is a SPAM #
#########################
    bounce -except [lindex [extract hard_subj_exc] 0] [readfile "preamble"]
    save "JunkMail/[clock format [clock seconds] -format %Y%m%d]"
    stop
}
...

This rule handles a message diagnosed as spam. Such a message is bounced to its sender and saved in a junk mailbox. Note the -except option of bounce, which specifies the first pattern from the hard_subj_exc group. This is your standard "magic phrase", i.e., the piece of text that the sender should put into the subject line to make sure that the message will not be rejected. This text will substitute for `%X' in the preamble prepended to the bounced message. Please have a look at the standard preamble file to see what I mean.

The somewhat cryptic argument to save identifies a mailbox in the JunkMail directory. The actual file name of that mailbox is simply the current date (in the format YYYYMMDD). Thus, a single mailbox in JunkMail will contain your daily load of spam.
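Two details of the Spam rule can be reproduced in isolation. The %X substitution below only approximates what bounce does with the preamble (the real operation handles other special sequences too), and the magic phrase and the fixed date are made-up examples; the mailbox name is computed with the same clock format call as in the rule, pinned to a fixed time and to GMT so that the result is reproducible.

```tcl
# Approximation of the %X substitution in the preamble.
set magic "rfire-ok"    ;# hypothetical magic phrase
set preamble {...include the phrase "%X" (the case is insignificant)...}
set filled [string map [list %X $magic] $preamble]
puts $filled

# One junk mailbox per day, named YYYYMMDD.
set t [clock scan "2002-11-14 10:30" -format "%Y-%m-%d %H:%M" -gmt 1]
set box "JunkMail/[clock format $t -format %Y%m%d -gmt 1]"
puts $box    ;# JunkMail/20021114
```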

Like the OK rule, Spam is unconditional and terminates the processing of the rule sequence. The next rule must therefore be entered by an explicit proceed operation, which is executed by the Bounce rule discussed earlier. This is how the set of rules ends:

...
rule ORTProbe { $Flags(ortreply) != "" } {
#########################################################################
# The message should be dropped, but if it is a reply to our open relay #
# test, we should add the server to our list of spammers.               #
#########################################################################
    set r ""
    foreach s $Flags(ortreply) {
        # this is questionable in the context of CIDR addressing
        # but let us be aggressive
        set s [netaddress $s]
        while { [regsub "(\\.0)(\[.*\]*)$" $s ".*\\2" s] } { }
        lappend r $s
    }
    modify -add -list "hard_addr_spm" $r
    modify -delete -list "open_rlay_spm" $Flags(ortreply)
    modify -delete -list "open_rlay_exc" $Flags(ortreply)
}

rule Drop {
    save "JunkMail/dropped"
    stop
}

Rule ORTProbe determines if the message happens to be a returning probe message from an open relay test. This is the case if Flags(ortreply) is a nonempty string. If so, this string is a two-element list of addresses (numeric and symbolic) identifying the guilty host. Each of the two addresses is first turned into a network address. The idea is to mark all hosts in the network of the broken server as suspicious. (Note that this approach is a bit drastic, as the network address is determined using the old pre-CIDR interpretation of the host address.) The while loop turns the numerical IP address of a network into an address pattern that will match all addresses in that network. For example, `129.128.0.0' is transformed into `129.128.*.*'. Then, both preprocessed addresses are added to the hard_addr_spm group, while the original addresses from Flags(ortreply) are removed from open_rlay_spm, where they were put by open_relay_test, and, just in case the probe message was delayed by more than 4 hours on its way, from open_rlay_exc. Note that hard_addr_spm is matched against ADDRESS by the SpmHFASHB rule. Thus, if any host address found in the headers of a received message belongs to a network running an open mail server, the message will be classified as spam. Drastic as they are, these measures appear to be quite effective.
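The pattern-building loop from ORTProbe can be run on its own to see what it produces. The regular expression and replacement are the ones from the rule, rewritten with braces instead of backslash escapes; netaddress is a filter function that is not reproduced here, so the inputs are assumed to be its output. The helper name to_pattern is made up.

```tcl
# The while loop from ORTProbe, wrapped in a helper procedure.
proc to_pattern {s} {
    # repeatedly turn the rightmost ".0" (followed only by "." and
    # "*" characters up to the end) into ".*"
    while {[regsub {(\.0)([.*]*)$} $s {.*\2} s]} { }
    return $s
}

foreach net {129.128.0.0 10.0.0.0 192.168.1.0 mail.example.com} {
    puts "$net -> [to_pattern $net]"
}
```

Class A, B, and C networks become `10.*.*.*', `129.128.*.*', and `192.168.1.*', respectively, while symbolic names pass through unchanged.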

The last rule is executed for any message for which Flags(nobounce) is set and which doesn't happen to be a returning probe message. Such a message is dropped, which here means that it is deposited in a special mailbox where you never look, except when really bored or intrigued.

My personal version of ORTProbe, in addition to performing the operations described above, sends a message to the administrator of the open mail server:

rule ORTProbe { $Flags(ortreply) != "" } {
    set r ""
    foreach s $Flags(ortreply) {
        set s [netaddress $s]
        while { [regsub "(\\.0)(\[.*\]*)$" $s ".*\\2" s] } { }
        lappend r $s
    }
    modify -add -list "hard_addr_spm" $r
    modify -delete -list "open_rlay_spm" $Flags(ortreply)
    modify -delete -list "open_rlay_exc" $Flags(ortreply)
    setdirectory
    set msg [readfile "ormessage"]
    regsub -all "%W" $msg [join $Flags(ortreply)] msg
    send -subject "Open mail server" -nobounce $msg \
        "postmaster@[lindex $Flags(ortreply) 1]"
}

File ormessage in my base directory contains the following text:

Dear Postmaster:

I am a spam filter operating on behalf of %E.

The following mail server:

    %W

appears to be unsecured against third party relays, and I suspect
that it has been used in an attempt to deliver spam to my master.
Consequently, effective immediately, I will be unconditionally
blocking all E-Mail passing through your server until my master
notifies me otherwise.

With automatic regards,

RF Spam Killer

Options and extras

The set of rules can be easily edited if needed and its functionality can be modified on-the-fly. However, there is always a risk of breaking something when introducing an ad-hoc change, even if the change appears innocent and very simple. Although the filter exercises extreme caution to avoid losing E-Mail in case of error (all errors are intercepted, and in any dubious case the message is delivered to the standard mailbox), little help (in terms of fighting spam) is offered by a broken set of rules. On the other hand, there are situations where it would be convenient to have a few options changeable in a reasonably simple way, without having to modify the rules file in every case. A typical example is the standard "vacation" setup which you would like to switch on and off without having to manually edit any sensitive files.

The standard and recommended way of passing options to the rules script is by setting the Options array in .rfirerc. This array is made global by the filter, which means that all its settings in .rfirerc are directly visible to the rules. Besides that, the filter doesn't interpret any of those settings, so it is up to the rules and functions run by them to take advantage of this feature.

By itself, this simple solution does not seem to help much, because now, instead of modifying the rules, you have to modify .rfirerc. However, the base directory of the package contains a handy script, named rfopt and described in section Handling options, for setting, clearing, and listing those options in a more civilized way. But if needed, they can always be modified by hand.

The set of rules that comes with the package illustrates the use of one such option, `V', implementing the standard "vacation" feature. The function implementing it, along with a few other handy functions, is defined in file Tests/goodies.tst. Let us have another look at the OK rule executed to deposit the received message in the standard mailbox:

...
rule OK {
##################################
# The message is to be delivered #
##################################

    if [info exists Options(V)] {
        set fr [lindex $Options(V) 0]
        set to [lindex $Options(V) 1]
        setdirectory
        readfile -eval "Tests/goodies.tst"
        vacation $fr $to [readfile "vacation"]
    }

    save
    stop
}
...

Options(V) is meant to indicate the vacation mode. If defined, it is expected to be a string consisting of two list items: the starting and the ending date of your vacation (in any format acceptable to Tcl, e.g., "May 20, 2002"). Since such dates contain blanks, each of them must be quoted or braced as a Tcl list element, e.g., {"May 20, 2002" "June 3, 2002"}. The vacation function (please look it up in Tests/goodies.tst) accepts three arguments, the two dates and a message string, and performs the following actions:

If the current time is earlier than the "from" date or later than the "to" date, the function returns zero and does nothing.
If the sender of the current message was notified about your absence within the last week, it returns 1 and does nothing more.
Otherwise, the function sends the indicated message to the sender. Special sequences in that message are substituted as for a preamble of a bounced message. Additionally, `%R' and `%U' are substituted with the starting and ending dates of your absence. The standard text of the vacation message is in file vacation in the base directory. Of course, you are welcome to edit it to your taste.

In the third case, the sender address is stored in a special tagged group (file vaclist.lst in Rules), used internally by vacation as the database of senders that have been notified about your absence within the last week.
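The three cases above amount to a small decision procedure, sketched below. It is hypothetical and not the goodies.tst code: the notification database is modelled as a dict mapping a sender to the time of the last notice (the real function keeps a tagged group in vaclist.lst), the value 2 for "send a notice now" is an arbitrary choice made for this sketch, and dates are parsed with Tcl's free-form clock scan.

```tcl
# Hypothetical sketch of vacation's decision logic (not goodies.tst).
set Notified {}                    ;# sender -> time of last notice
set Week [expr {7 * 86400}]

# 0: outside the vacation window; 1: notified within the last week;
# 2 (arbitrary): a notice should be sent now, and is recorded.
proc vacation_decision {from to sender now} {
    if {$now < [clock scan $from] || $now > [clock scan $to]} {
        return 0
    }
    if {[dict exists $::Notified $sender] &&
        $now - [dict get $::Notified $sender] < $::Week} {
        return 1
    }
    dict set ::Notified $sender $now
    return 2
}

set now [clock scan "May 25, 2002"]
set v1 [vacation_decision "May 20, 2002" "June 3, 2002" bob@example.org $now]
set v2 [vacation_decision "May 20, 2002" "June 3, 2002" bob@example.org \
    [expr {$now + 3600}]]
set v3 [vacation_decision "May 20, 2002" "June 3, 2002" bob@example.org \
    [clock scan "June 10, 2002"]]
puts "$v1 $v2 $v3"   ;# prints 2 1 0
```

Note that a date without a time of day is scanned as midnight, so in this sketch the "to" date itself already counts as after the vacation.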

File goodies.tst contains a few other handy functions. Imagine that you are leaving your office, e.g., for the weekend, and that you will be connecting to your mailbox via a slow dial-up line, e.g., from a hotel. You would like to make sure that there are no excessively long messages hogging your mailbox, e.g., ones with lengthy attachments that you won't have time to read anyway. Although with IMAP, you can view the subjects before deciding which message to download, it would be nice to be able to read the plain text part of a message with long attachments without having to download the attachments as well. The following function, defined in goodies.tst, implements this feature:

shorten [-plain] [-force] limit mailbox

The first argument (limit) is a number indicating the maximum size of a message that you would like to see in your mailbox. The second argument is the file name of the mailbox where you want to deposit the complete long message, if it happens to be longer than what you are willing to accept. If the full body (FBODY) of the message is longer than limit characters, the message actually deposited in your standard mailbox will be truncated to at most limit characters and "textified", i.e., converted to pure ASCII text as closely as possible. A footnote will be appended to the delivered message explaining that it has been truncated and where to look for the complete version. The function returns 1 in such a case. If the full body of the received message is not longer than the specified limit, the function returns 0 and performs no action.

With the -plain flag, the message will be "shortened" (with its original version saved in the indicated mailbox), even if it is shorter than the indicated limit but contains non-plaintext components. This is primarily intended for converting HTML messages to plain text, e.g., if you want to use mutt to read your E-Mail remotely. Note that if you de-HTML your messages by default, you do not have to do it here. With -force, the message will be treated as if it were shortened or textified, even if it is left intact. With this option, the function always returns 1.
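The decision made by shorten can be sketched as follows. This is a hypothetical simplification, not the goodies.tst code: "textification" is reduced to crude removal of HTML-like tags, the -plain and -force flags are replaced by positional arguments (plain and hasNonText are made-up names), -force is not modelled, and saving the full copy is left out; the procedure just returns a flag together with the text to deliver.

```tcl
# Hypothetical sketch of shorten's size/format decision.
# Returns {flag text}: flag is 1 if the message was shortened.
proc shorten_sketch {fbody limit {plain 0} {hasNonText 0}} {
    if {[string length $fbody] <= $limit && !($plain && $hasNonText)} {
        return [list 0 $fbody]
    }
    # crude "textify": drop anything that looks like an HTML tag
    regsub -all {<[^>]*>} $fbody "" text
    # keep at most limit characters, then append a footnote
    set text [string range $text 0 [expr {$limit - 1}]]
    append text "\n\[Message shortened; full copy saved elsewhere\]"
    return [list 1 $text]
}

puts [shorten_sketch "short" 10]
puts [shorten_sketch [string repeat x 50] 10]
puts [shorten_sketch "<b>hi</b>" 100 1 1]
```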

My personal version of the OK rule contains the following insert (after the "vacation" part):

...
if [info exists Options(L)] {
    if [shorten -plain $Options(L) "../longmail"] {
        stop
    }
}
...

Options(L), if defined at all, is set to the number representing the limit on the message size. A message longer than the specified limit (or containing non-plaintext components) is deposited in ../longmail, with its sanitized version stored in my standard mailbox. A short plaintext message triggers no special action, and the save operation that follows the above sequence delivers it as usual. Note that the longmail mailbox is located just above the base directory of the package, i.e., in $HOME/Mail in my case.

Quite often you would prefer to deposit some messages directly in dedicated mailboxes without passing them through your standard mailbox and sorting them by hand. For example, assignments from your students, newsletters, bulletins, calls for papers, all fit into this category. Of course, it isn't difficult to set explicit rules for this kind of action, but the goodies.tst file defines the following function that makes this job a bit simpler:

special filename

The function assumes that the indicated file (filename is interpreted relative to the current directory) contains a Tcl list. Every element of that list, which is itself a list of two or three elements, describes one sorting criterion, as illustrated in the following example:

#
# This is a sample list of two sorting criteria
#
{
    { [match conf_subject $SUBJECT] || [match conf_body $BODY] }
    "../conferences"
    "Received a new call for papers: $SUBJECT"
} {
    { [string first "portmap connect from" $SUBJECT] == 0 }
    "../security_alerts"
    ""
}

The first element of a sorting criterion is the condition that triggers it. It may involve group matching or any Tcl constructs that can sensibly appear as arguments to the if statement. The second element identifies the mailbox (its name is interpreted relative to the current directory) where messages meeting the criterion are to be deposited. If the last (optional) element is a nonempty string, it specifies the subject of a simple body-less message that will be stored in your standard mailbox whenever a message meeting the criterion is saved. No such message is generated if the third item is not specified at all, or if it is an empty string.
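A list in this format can be evaluated with a few lines of Tcl. The sketch below is hypothetical, not the special function from goodies.tst: it assumes that the first matching criterion wins, evaluates conditions in the global context (as rule triggers are), and expands variables such as $SUBJECT in the note with subst; instead of saving anything it returns the target mailbox and the expanded note. The first condition uses string match in place of the sample's group matching so that the example runs without the filter.

```tcl
# Hypothetical evaluator for a list of sorting criteria.
proc first_special {criteria} {
    foreach c $criteria {
        # a missing third element leaves note empty
        lassign $c cond mailbox note
        if {[uplevel #0 [list expr $cond]]} {
            return [list $mailbox [uplevel #0 [list subst $note]]]
        }
    }
    return {}
}

set criteria {
    {
        { [string match -nocase "*call for papers*" $SUBJECT] }
        "../conferences"
        "Received a new call for papers: $SUBJECT"
    } {
        { [string first "portmap connect from" $SUBJECT] == 0 }
        "../security_alerts"
        ""
    }
}
set SUBJECT "portmap connect from 10.1.2.3"
puts [first_special $criteria]   ;# ../security_alerts {}
```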

The following function:

setmessage subject

completely erases the received message and replaces it with an empty one. The subject of the empty message is set to the indicated string. Its purpose is to deposit special notes in your mailbox. Needless to say, the function should be used with care.

The last function in goodies.tst that may be handy is

textify limit

used by shorten. It substitutes BODY for FBODY, forces the removal of all HTML constructs, etc. The function also trims the message to at most limit characters.

Logging and error handling

The filter maintains a log where it writes some information related to its operation, including error messages. The log file is named log and resides in the base directory of the package. There is an easy way to switch off all logging. Namely, if the log file doesn't exist, the program will not try to create a new file and, consequently, it will not write any log messages. This may not be a good idea, especially at the beginning, when you would like to be able to see what is going on within your setup.

Unless it crashes due to an internal bug (so far this has never happened to me), the filter tries to make sure that no message is lost because of an error in the rules database. Error handling is simple: as soon as an error occurs, the filter abandons all operations (in particular, the execution of the rules sequence) and executes the argument-less variant of save to deposit the received message in your default mailbox. The worst that can happen if you mess up your database is that every incoming message, including all spam, will be delivered to you as if the filter were not there.

Error messages are primarily written to the log file, so they may pass unnoticed if you do not monitor the log. There is an easy and recommended way to make them more visible. By setting FooterLevel in .rfirerc, you determine the amount of information to be presented in the "footer" of received messages. If FooterLevel is undefined or zero, nothing is ever appended to a message deposited in your mailbox. If FooterLevel is 1, only error messages are shown there. As they typically diagnose problems with your database setup, missing them may not be a good idea. With the next higher level (2), in addition to errors, the filter will include a single line listing the succession of rules (their names) that were fired while the message was being processed. It may be a good idea to have this information handy, at least during the setup and development stage. The highest value of FooterLevel is 3 and it additionally selects all information that normally goes to the log. Note that if the log file is absent and FooterLevel is set to 3, you are still going to see most of the lines that would show up in the log, but distributed over the messages to which they are specifically related.
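The levels are cumulative: each one adds to what the previous level shows. The following Tcl sketch (Tcl 8.5 or later) illustrates that selection logic only. It is not the filter's code; build_footer and the "Rules:" line format are hypothetical, and the three inputs are assumed to be lists of strings collected while a message is processed.

```tcl
# Hypothetical sketch: which items end up in a message footer for a given
# FooterLevel. errors, fired_rules and log_lines are lists of strings.
proc build_footer {level errors fired_rules log_lines} {
    set footer {}
    # Level 1 and above: error messages.
    if {$level >= 1} {
        set footer [concat $footer $errors]
    }
    # Level 2 and above: a single line naming the rules fired, in order.
    if {$level >= 2 && [llength $fired_rules] > 0} {
        lappend footer "Rules: [join $fired_rules { }]"
    }
    # Level 3: additionally, everything that would normally go to the log.
    if {$level >= 3} {
        set footer [concat $footer $log_lines]
    }
    return $footer
}
```

An undefined FooterLevel would be passed as 0, which yields an empty footer.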

To get a better sense of how your incoming messages are being diagnosed, you may put commands like:

mark "MATCHED: `$Flags(lastmatch)'"

into the rules that are fired by group matching. This will insert a line of text at the end of every message that has matched some pattern, indicating the fragment of the message that triggered the matching.

Other programs

Four symbolic links in the base directory of RabidFire (rfmail, rfupd, rfort, and sendmail) point to the filter script. They correspond to four additional functions that the script performs when called under those names. Its filter function, described in the preceding sections, is only available when the script is invoked as filter. One moral from this paragraph is that you should never change the file name of the script.
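A script that changes its behavior depending on the name it was invoked under typically inspects argv0. The sketch below illustrates the idea and is not the filter's actual code; role_for_invocation and the returned keywords are hypothetical. It also shows why renaming the script breaks it: an unrecognized name maps to no role at all.

```tcl
# Hypothetical sketch: select the script's role from its invocation name.
# In a real script one would call: role_for_invocation $::argv0
proc role_for_invocation {argv0} {
    switch -- [file tail $argv0] {
        filter   { return filter }
        rfmail   { return mail_proxy_pipe }
        rfupd    { return pattern_update }
        rfort    { return open_relay_test }
        sendmail { return sendmail_substitute }
        default  { error "unknown invocation name: [file tail $argv0]" }
    }
}
```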

The mail proxy

When called as rfmail, the script acts as a pipe for passing E-Mail messages from a mail client to a mail server. This function, rather useless by itself, is exploited by the mail proxy daemon. With the assistance of this simple program, you can automatically update your list of trusted senders with the addresses of the recipients of your outgoing E-Mail. A similar and related function is performed by the sendmail substitute.

For all this to work, you have to set up a few simple things. First, you must define the so-called default update group by setting UpdateGroup in .rfirerc to the name of the pattern group storing your trusted senders. For the standard setup discussed in section Examples, this would be hard_from_exc. The default update group is the group of patterns where the proxy will automatically store the recipients of your outgoing messages. Second, if you want to take advantage of temporary exceptions, you should set TempExceptionGroups (also in .rfirerc) to a list identifying the three temporary exception groups and their default deadlines. The standard (and perfectly reasonable) settings of all those items are already present in .rfirerc and they look as follows:

set UpdateGroup "hard_from_exc"
set TempExceptionGroups {{temp_from_exc 7} {temp_subj_exc 7} {temp_body_exc 7}}

For TempExceptionGroups, the three items on the list correspond to the sender, the subject line, and the message body. The number following the group name gives the (default) pattern expiration time as the number of days.

The mail proxy daemon is written in C, which means that, unlike the filter, it must be compiled before it can be used. This is accomplished by executing make in the base directory of the package, which should produce an executable file named rfproxy. When you run it, you will spawn a detached daemon process that will appear as a mail server to trusted clients.

The proxy daemon assumes that there exists a "real" mail server to which the outgoing E-Mail messages should be passed for "true" processing. The host address of that server is determined by the value of MailServer in the same .rfirerc file (in your home directory) that is used by the filter and the killer. Three additional parameters in .rfirerc are interpreted exclusively by the proxy:

MailProxyPort

This is the port number on which the proxy will offer its SMTP service. It is recommended to use a number different from 25 (the standard SMTP port), provided that your E-Mail client is flexible enough to accept it as a parameter.

MailProxyList

This is the list of regexp patterns describing the numeric IP addresses of the hosts authorized to ask the proxy for service.

DoNotExcept

This is a single regexp pattern describing those destination addresses that should not be automatically added to your standard list of exceptions.

Note that the filter uses the "real" mail server (rather than the proxy) for bouncing messages (as well as for its sending and forwarding functions). Otherwise, the fake sender addresses used by spammers would end up on your list of exceptions, which is surely not what you want.

If your mail client can accept a non-standard SMTP port number, the proxy can be run on the same machine as the true mail server. My favorite mail client is kmail (from KDE). Although there is really nothing special about this program, one handy feature is that it lets you specify any port for the mail server. Reasonably recent versions of Netscape accept the port number after the server's host name, e.g.,

mailserver.ualberta.ca:3398

This feature does not appear to be documented anywhere (to the point of confusing Netscape's own configuration "wizard").

If your client insists on using the standard SMTP port number 25, you have to start the proxy on a machine that doesn't normally run a mail server, and do this as root, so that the program can open a socket on port 25. Needless to say, that machine must see your home directory (access to .rfirerc is critical) and the base directory of RabidFire, so that the proxy can execute rfmail and update your pattern files kept in Rules. In that case, and in other cases when a confusion may arise as to the location of your home directory, you can pass the home directory path as an argument to the proxy daemon, e.g.,

rfproxy /home/pawel

The role of MailProxyList is to restrict access to the proxy, so that it cannot be hijacked as an open relay mail server. Normally, you will be connecting to the proxy from fewer than a handful of local machines, and it is important to restrict access to those machines only. Otherwise, all the security mechanisms guarding your true mail server will be circumvented: the server has been set up to trust the machine running your proxy, so all requests passed to it by the proxy will appear to have originated locally. Just to make sure that you are not accidentally deploying a new spamming tool on your system, the proxy will refuse to operate if MailProxyList is undefined or empty.
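The access check can be sketched in Tcl as follows. This assumes that MailProxyList is a Tcl list of regexp patterns matched against the client's numeric IP address, as described above; client_authorized and the sample addresses are hypothetical, not taken from rfmail.

```tcl
# Hypothetical sketch of the MailProxyList check.
proc client_authorized {client_ip proxy_list} {
    # With no access list the proxy would be an open relay: refuse.
    if {[llength $proxy_list] == 0} {
        error "MailProxyList is undefined or empty; refusing to operate"
    }
    foreach pattern $proxy_list {
        if {[regexp -- $pattern $client_ip]} {
            return 1
        }
    }
    return 0
}
```

Anchoring each pattern with ^ and $ (e.g., {^127\.0\.0\.1$}) keeps a pattern for one host from accidentally admitting a different address that merely contains it.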

Occasionally, you may want to send a message without adding its recipient (or recipients) to your list of exceptions. If you do not feel like modifying your entire setup for every situation like this (or possibly removing the exception by hand after sending the message), you may declare a pattern identifying those destination addresses that should be ignored by the proxy. This solution is more general and flexible than it seems at first sight. For example, with the following setting in .rfirerc:

set DoNotExcept {[A-Z]$}

any address ending with an upper-case letter will not be automatically added to the list of exceptions. As the case of letters in a symbolic host address is irrelevant, you can always use this simple trick without affecting the interpretation of the destination address by the true mail server.
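Assuming DoNotExcept is applied as an ordinary Tcl regexp to each recipient address, choosing the addresses to be added to the default update group could look like the hypothetical helper below. Tcl's regexp is case-sensitive unless -nocase is given, which is what makes the upper-case trick work in this sketch.

```tcl
# Hypothetical sketch: keep only the recipients that should become
# permanent exceptions. An empty DoNotExcept excludes nothing.
proc recipients_to_except {recipients do_not_except} {
    set result {}
    foreach addr $recipients {
        if {$do_not_except ne "" && [regexp -- $do_not_except $addr]} {
            continue
        }
        lappend result $addr
    }
    return $result
}
```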

This is how the mail proxy daemon operates. Having received a connection, the program sets up a bidirectional channel between your client and the true mail server, invoking rfmail to implement one leg of this channel, i.e., the path from your client to the server. The other path is trivial and it simply relays the responses of the server without any attempts to interpret them. The rfmail script is responsible for verifying the legitimacy of the request, intercepting destination addresses from the SMTP commands, deciding which of them should be added to the database, and finally performing modifications to the default update group. The group file is locked for this operation.

In addition to all those operations, the proxy looks for certain special sequences in the recipient addresses, the message subject, and the message body that identify patterns to be added to the temporary exception groups. If the character '%' occurs in a recipient address, the remaining portion of that address (following the '%') is added to the list of temporary "from" exceptions (the first group on the TempExceptionGroups list) as an address pattern. An optional number immediately following '%' is interpreted as the expiration time of the pattern (in days). The default expiration time (7 days) is assumed if that number is absent. If the expiration time needs to be clearly separated from the remainder of the address, a second '%' can be used as a (matching) delimiter. For example, the following sequence of addresses:

johnny@%important.company.com
sales@my.favorite.%1superstore.ca
the_1%2%34user@muka.edu

will result in the following patterns:

*important.company.com
*superstore.ca
*34user@muka.edu

with the second and third patterns expiring in one and two days, respectively. If the indicated portion of an address starts with '.' or '@', that character is ignored, i.e.,

sales@my.favorite.%1superstore.ca

and

sales@my.favorite%1.superstore.ca

will produce identical patterns.

Note: A recipient address specifying a temporary exception pattern is not automatically added to the list of permanent "from" exceptions (the default update group).
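The address rules above can be expressed as a small Tcl sketch (Tcl 8.5 or later). temp_exception_from_address is a hypothetical name; it only derives the pattern and its expiration time and does not show how the proxy rewrites the address actually passed to the mail server.

```tcl
# Hypothetical sketch: derive a temporary "from" exception from a recipient
# address. Returns {pattern days}, or an empty list if there is no '%'.
proc temp_exception_from_address {addr {default_days 7}} {
    set pos [string first "%" $addr]
    if {$pos < 0} {
        return {}
    }
    set rest [string range $addr [expr {$pos + 1}] end]
    set days $default_days
    # An optional number right after '%' is the expiration time in days;
    # a second '%' may separate it from the rest of the address.
    if {[regexp {^[0-9]+} $rest num]} {
        set days $num
        set rest [string range $rest [string length $num] end]
        if {[string index $rest 0] eq "%"} {
            set rest [string range $rest 1 end]
        }
    }
    # A leading '.' or '@' is ignored.
    if {[string index $rest 0] in {. @}} {
        set rest [string range $rest 1 end]
    }
    return [list "*$rest" $days]
}
```

Applied to the three example addresses, the sketch yields the patterns listed earlier with expiration times of 7, 1, and 2 days.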

A sequence of the form "% ... %" (quotes included) appearing within the subject line (or message body) indicates a pattern to be added to the temporary subject (or body) exception group. In the simplest (default) case, the string between the delimiters is turned into a regular expression that will match the string exactly, except for white spaces which are considered flexible, i.e., any number of spaces (including tabs and newlines) in the message will match any number of spaces in the corresponding position of the string. Whether the case of letters is significant depends on the declaration of the corresponding group (in the standard setup, the case is ignored). The original string is retained where it occurs (i.e., passed along with the message) with the delimiters stripped.

If the opening delimiter is "r% instead of "%, the enclosed string is viewed directly as a regular expression. In that case, the entire string (including the delimiters) is removed before the message is passed to the recipient.

If the closing delimiter contains a decimal number between % and ", that number is viewed as the expiration time (in days) of the resulting pattern. The default expiration time (7 days, according to the standard setup) is assumed if the number is absent.

Note: A single message may specify an arbitrary number of temporary exception patterns, but none of them is allowed to cross a line boundary. This is obvious for a subject, but less so for a body pattern.

Here are a few examples:

Please send me the data sheet for the "%eCOG1%" CPU mentioned in
your products brochure.

Re: final grades for "%CMPUT 379%12" are now available

I will be deeply obliged for any information regarding your memory
chips including SDRAM and SRAM."r%SD?RAM%3"
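The subject and body rules can be illustrated with the following Tcl sketch (Tcl 8.5 or later). Both procedure names are hypothetical, and some details are assumptions rather than documented behavior: the enclosed text may not itself contain '%', a whitespace run in the literal form becomes \s+ (one or more whitespace characters), and case sensitivity, which depends on the group declaration, is left to whoever applies the pattern.

```tcl
# Hypothetical helper: a regexp matching the literal text exactly, except
# that each run of whitespace matches one or more whitespace characters.
proc literal_to_regexp {text} {
    set parts {}
    foreach word [regexp -all -inline {\S+} $text] {
        # Backslash-escape every non-word character so it matches literally.
        lappend parts [regsub -all {\W} $word {\\&}]
    }
    return [join $parts {\s+}]
}

# Hypothetical helper: scan one line for "%...%N" and "r%...%N" sequences.
# Returns {patterns cleaned_line}; patterns is a list of {regexp days}.
proc extract_temp_patterns {line {default_days 7}} {
    set re {"(r?)%([^%\n]*)%([0-9]*)"}
    set patterns {}
    set cleaned ""
    set start 0
    while {[regexp -start $start -indices -- $re $line whole kind body num]} {
        lassign $whole from to
        # Copy the text preceding the sequence unchanged.
        append cleaned [string range $line $start [expr {$from - 1}]]
        set text [string range $line {*}$body]
        set days [string range $line {*}$num]
        if {$days eq ""} {
            set days $default_days
        }
        if {[string range $line {*}$kind] eq "r"} {
            # "r%...%": used directly as a regexp and removed entirely.
            lappend patterns [list $text $days]
        } else {
            # "%...%": literal text, kept in the message without delimiters.
            lappend patterns [list [literal_to_regexp $text] $days]
            append cleaned $text
        }
        set start [expr {$to + 1}]
    }
    append cleaned [string range $line $start end]
    return [list $patterns $cleaned]
}
```

For the second example above, the sketch returns the pattern CMPUT\s+379 with a 12-day expiration and passes the subject on as `Re: final grades for CMPUT 379 are now available'.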

Sendmail

If called as sendmail, the script simulates the operation of the standard sendmail program when it is invoked to process a single outgoing message. Some E-Mail clients, e.g., mutt, let you specify a private version of sendmail to be used for processing outgoing E-Mail.

The simple sendmail lookalike offered by the filter is truly simple and naive. It ignores all options (arguments starting with `-') and assumes that the remaining arguments are destination addresses. The sender address cannot be changed and it is set to your E-Mail address as found in .rfirerc.

The program can execute in one of two modes. If MailProxyHost in .rfirerc is set to a nonempty string, sendmail uses the mail proxy to expedite outgoing E-Mail. In such a case, that string must be the correct name of the host running the proxy server. This way, all outgoing E-Mail is going to be processed by the mail proxy, as described above. Otherwise, i.e., if MailProxyHost is not defined or empty, sendmail will use the standard mail server (as declared in .rfirerc) and process the message (exactly as it is done by the proxy) all by itself. The first solution is to be used when sendmail is run on a host that has no immediate access to the base directory of the filter.
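The argument handling and mode selection described above amount to very little logic. The sketch below is hypothetical (sendmail_plan and the host names are illustrative) and stops short of actually talking SMTP. Because every argument starting with `-' is ignored, an option that takes a value (such as sendmail's -f address) would have its value treated as a recipient, which matches the naive behavior described above.

```tcl
# Hypothetical sketch: pick recipients and the server to contact.
# Returns {mode host recipients}, where mode is "proxy" or "direct".
proc sendmail_plan {arguments mail_proxy_host mail_server} {
    set recipients {}
    foreach arg $arguments {
        # All options (arguments starting with '-') are ignored.
        if {[string index $arg 0] ne "-"} {
            lappend recipients $arg
        }
    }
    if {$mail_proxy_host ne ""} {
        return [list proxy $mail_proxy_host $recipients]
    }
    # No proxy: use the standard server and update the database locally.
    return [list direct $mail_server $recipients]
}
```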

Adding patterns to the database

Of course, the files containing the regular expressions used by the groups (the inhabitants of the Rules directory) can be freely edited with whatever text editors are available on your system. Some care should be taken not to edit a file while the filter may be updating it, although both the likelihood of such a collision and the severity of its consequences are minuscule.

There is also a convenient way of adding simple patterns to the group files that automatically ensures their consistency by using the file locking mechanism of the filter. When the filter is called as rfupd, it interprets its command line arguments in a way similar to the modify function.

If rfupd is called with a single argument, this argument is interpreted as a single new pattern to be added to the default update group. For this invocation to be legal, a default update group must be defined in .rfirerc (by setting the UpdateGroup variable). Then, the operation performed by rfupd is equivalent to:

modify -add dug pattern

where dug is the default update group and pattern is the specified pattern. If rfupd is called with multiple arguments, they are simply directly passed to modify. For example,

rfupd -delete hard_from_exc "ualberta" "usask"

is executed as

modify -delete hard_from_exc "ualberta" "usask"
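The mapping from rfupd arguments to a modify call can be sketched as follows. The sketch only builds the command as a Tcl list and does not execute it; rfupd_command is a hypothetical name, and the behavior for zero arguments is an assumption since the text does not cover that case.

```tcl
# Hypothetical sketch: translate rfupd arguments into a modify command.
proc rfupd_command {arguments update_group} {
    switch -- [llength $arguments] {
        0 {
            error "rfupd needs at least one argument"
        }
        1 {
            # A single pattern goes to the default update group.
            if {$update_group eq ""} {
                error "UpdateGroup is not set in .rfirerc"
            }
            return [list modify -add $update_group [lindex $arguments 0]]
        }
        default {
            # Multiple arguments are passed to modify unchanged.
            return [linsert $arguments 0 modify]
        }
    }
}
```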

Open relay test

By invoking the filter as rfort, you can manually perform the open relay test on the indicated host or hosts. The call format of rfort is exactly the same as that of the openrelay function: the specified arguments are passed directly to openrelay. While performing the test, the program describes its activity on the standard output.

Handling options

In section Options and extras, we mentioned the idea of Options settable in .rfirerc and available to the rules. We provide a simple script, named rfopt, for setting, clearing, and listing such options without having to edit the .rfirerc file by hand. This is a separate script, not yet another link to filter.

The script is invoked in the following way:

rfopt [-h home] [-r rcfile] [opt ... opt]

With no -h or -r, the script references the standard .rfirerc file in your home directory. With -h you can change the home directory for looking up .rfirerc, with -r you can change the name of the `rc' file, possibly specifying its full path. The remaining arguments are interpreted as a sequence of options to be modified. If no option is specified, the script just lists the current setting of options and performs no other action.

The script assumes that every option is a single capital letter. In the specified list of options, this letter must be preceded by `-'. The subsequent arguments, until the first one beginning with `-', are all concatenated into a list and used as the new value of the option. If this list is empty, the option is to be cleared, i.e., unset.

For example, the following command:

rfopt -V "May 20, 2002" "May 30, 2002" -L 10000

sets Options(V) to `{{May 20, 2002} {May 30, 2002}}' and Options(L) to `{10000}', whereas

rfopt -V -L 32000

sets Options(L) to `{32000}' and clears Options(V).
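The option-list part of rfopt's command line can be parsed as in the sketch below (Tcl 8.5 or later, using a dict). This is not rfopt's code: parse_rfopt_args is hypothetical, it assumes -h and -r have already been removed from the arguments, and rewriting .rfirerc is not shown.

```tcl
# Hypothetical sketch: map each option letter to its new value (a list).
# An empty list means the option is to be cleared (unset).
proc parse_rfopt_args {arglist} {
    set result [dict create]
    set current ""
    foreach arg $arglist {
        if {[regexp {^-([A-Z])$} $arg -> letter]} {
            set current $letter
            dict set result $current {}
        } elseif {[string index $arg 0] eq "-"} {
            error "invalid option: $arg"
        } elseif {$current eq ""} {
            error "value given before any option: $arg"
        } else {
            # Arguments up to the next option are collected into a list.
            dict lappend result $current $arg
        }
    }
    return $result
}
```

For the first example above, the value collected for V is the two-element list whose string form is `{May 20, 2002} {May 30, 2002}'.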

The killer

The killer is a simple Tcl/Tk script that will help you analyze spam messages and send complaints about them to whatever addresses can be inferred from the headers, with the assistance of nslookup, traceroute, whois, and perhaps some other programs. You can directly access and update the filter's pattern database from the killer; you can also perform open relay tests whose outcomes will be automatically honored by the filter.

As the number of changes introduced to the killer from its last-documented version is rather small (compared to how much the filter has changed), I have decided to leave the original description of the killer practically intact, and add a section at the end of this writeup to comment specifically on the changes. Although some of the windows look a little different in the new version, and some new menus have been added to some of them, you should have no problems figuring out what they are about.

The root window

When you invoke the program, you will see the root window that may look like this:

The window lists the contents of the default directory with spam mailboxes. This default is set up to coincide with the JunkMail directory of the filter.

You can browse through the hierarchy of your file system with the Directory menu and select mailboxes located elsewhere. Those mailboxes may have been created by other programs, not necessarily the RabidFire filter. If you don't like the default setting of the junk mail directory, you may set the JunkMail parameter in .rfirerc to the full path of your favorite default.

The mailbox window

When you decide to open a mailbox, you can select it with your mouse and hit the open button. You can also double-click on the mailbox to the same effect. This will open the mailbox window showing the first message in the mailbox. This is how it looks:

You can navigate through the mailbox using the first four buttons from the left, but at any time you are looking at a single spam. You can delete this spam from the mailbox by hitting the Delete button, but you don't want to do it without complaining first, do you?

The root window is disabled while the mailbox window is active, which also means that you cannot have two mailbox windows open at the same time. When you close the mailbox window, the root window will activate itself again.

If you decide to do something about the current spam in the mailbox window, you hit the Process button. This operation will locate all IP and E-Mail addresses in the spam (without verifying them), and display them in a new window. Those addresses will also appear highlighted in the mailbox window. You will see something like this:

and the following new window will pop up:

The target window

The target window lists all the addresses that appear highlighted in the mailbox window, turned into the E-Mail addresses of the prospective recipients of our complaint. The last address has been extracted from the web pointer that appears at the bottom of the message (outside the visible region of the scrolled mailbox window). Note that the addresses from your own domain do not take part in our little game: they are eliminated by the setting of Excluded in .rfirerc.

If you don't exclude any domains by explicitly setting the Excluded parameter in .rfirerc, the killer will automatically exclude all addresses whose last two domain components match the last two domain components of the current host.
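The default rule can be sketched in a few lines of Tcl (Tcl 8.5 or later). The procedure names are hypothetical, and the comparison is assumed to be case-insensitive, as host names are. Numeric IP addresses are not meaningfully handled by this sketch.

```tcl
# Hypothetical sketch of the default exclusion rule.
# The last two dot-separated components of a name, lower-cased.
proc last_two_domains {name} {
    return [join [lrange [split [string tolower $name] .] end-1 end] .]
}

# For an E-Mail address only the part after '@' is compared.
proc is_default_excluded {address host} {
    set domain [lindex [split $address @] end]
    return [expr {[last_two_domains $domain] eq [last_two_domains $host]}]
}
```

In a running script, the current host could be obtained with [info hostname].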

Of course, it is too early to complain now. You can discard some of those addresses right away, but let's keep them all around for illustration. Note that the E-Mail addresses that could be located in the message appear verbatim in the target window, whereas abuse has been prepended to all IP addresses. You can change this default target by setting the Targets parameter in .rfirerc to your favorite list of default targets. In particular, you may request that postmaster be also included by default (I think I have included it in rfirerc_example).

You can play around with the Destination menu, which allows you to delete an entry from the target list, add/delete a postmaster/abuse target, edit an existing entry, add a new entry by editing an existing entry, or create a brand new entry. You can also undo the effect of the last modification of the target list. By clicking on a highlighted address in the mailbox window, you can always bring this address back to the target list (if it is not there already).

Of course, the numeric address 204.245.72.135 cannot be used directly as an E-Mail destination. Thus the next thing we should do is verify the addresses and turn them all into symbolic form. We accomplish that by choosing Verify from the Refine menu. But before doing that, we have to select all the addresses to which the operation will be applied. For starters, we select them all (by dragging the mouse through the target window) and then choose Verify. This operation performs the following steps:

All numeric addresses are looked up (by executing nslookup) and turned into symbolic addresses.
Symbolic addresses pointing to the same network are merged and replaced by the shortest among them.
All addresses that don't nslookup are discarded.

The list that we obtain after this step looks as follows:

We are not entirely satisfied with it because, for example, the most important address from the header (204.245.72.135) has disappeared from the list (because it didn't nslookup). We will get it back in a while, but first let us say a few words about the new window that has popped up while we were verifying the addresses.

The console

The console window pops up (if it is not present already) when the killer executes an external program (e.g., nslookup). It lists the output of the command. Besides being able to see that output for yourself, you can easily extract addresses from the console and move them to the target window. After the previous verification step, our console looks like this:

Just for fun, we can try selecting ShowAll from the Addresses menu. This operation will highlight all symbolic IP and E-Mail addresses within the console window. Now, the console looks like this:

Note that the numeric addresses are not highlighted. You can select them manually by clicking on them. When you click on a highlighted address, you will move it to the target window. When you double-click on an address, you will move it to the target window even if it is not highlighted. When you select Add from the Addresses menu, you will add all the highlighted addresses to the target list. Other related choices from the Addresses menu are ShowSelected which highlights all symbolic addresses within the current selection, and Clear which un-highlights all highlighted addresses.

OK, let's get back to the business at hand. With a single click of the mouse in the mailbox window we bring back the originating address of the spam to the target list. What options do we have now? One thing we can try is whois. Thus, we select the address in the target window, erase the console (the Erase button of course), and choose WhoIs from the Refine menu. In fact, we have an option to do a single whois specifying one server from the sub-menu, or to do a comprehensive search using all servers. This is what we get with the single variant:

Well, not much. But the comprehensive variant is more successful (the search stops as soon as a positive response arrives).

So we have the culprit. Note that we can arrive at the same result in a slightly different way. Namely, let us select the address again (in the target window) and choose UpStream from the Refine menu. This executes traceroute on the selected address(es), which produces roughly the following output:

We carefully select the entire output of traceroute (by dragging the mouse through it) and then choose SelectionToEmail from the Addresses menu. Nothing happens when we do that (except that the selection is canceled), but when the complaint message is eventually sent, the selected text will be prepended to the included text of the original spam. This way they will know that we have a good reason to complain to them, as opposed to, say, mailexcite.com.

Next we double-click on appliedsoftware-gw.g10.com to bring it to the target list, select it, and execute WhoIs once again. It seems that now we have most of the pieces in hand.

I guess we will have to add Mr. Reynolds's E-Mail address to the list, so we double-click on it. From the traceroute output, we can see that ATM5-0-3.atl01.IConNet.NET looks like a front node for appliedsoftware-gw.g10.com with pacbell.iconnet.net being the upstream provider. Thus, we add all three to the target list. When we remove some obvious rejects that are not going to get us anywhere, we are left with this:

Now we can select all addresses and perform Trim on them. This will do the following things:

Multiple standard targets (abuse, postmaster) pointing to the same network or to the same mail exchanger will be merged into one.
Multiple addresses pointing to the same network will be replaced by the single shortest address.
All addresses will be parsed from left to right stripping their domains, and each time a new network is hit, the address will be added to the target list.
All addresses without a mail exchanger will be removed from the list.
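The third step (stripping domains from left to right) can be pictured as generating successively shorter suffixes of a host name. The sketch below only produces these candidates. Deciding whether a suffix belongs to a new network and has a mail exchanger requires external lookups (e.g., with nslookup) and is not shown. domain_suffixes is hypothetical, and stopping at two components is an assumption.

```tcl
# Hypothetical sketch: successively shorter domain suffixes of a host name,
# stopping before fewer than min_labels components remain.
proc domain_suffixes {host {min_labels 2}} {
    set labels [split $host .]
    set result {}
    for {set i 0} {[llength $labels] - $i >= $min_labels} {incr i} {
        lappend result [join [lrange $labels $i end] .]
    }
    return $result
}
```

For example, ATM5-0-3.atl01.IConNet.NET yields itself, atl01.IConNet.NET, and IConNet.NET.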

In our case, this operation produces the following outcome:

After some cosmetic manual re-touch we finally settle for the following list:

Note that you can add a postmaster entry by selecting an address (or a range of addresses) and choosing Postmaster from the Destination menu. The command will add a postmaster target for every selected address that doesn't have this target already; otherwise, it will remove the postmaster target. A similar action is performed by the Abuse command from this menu.

You may include your own address in the target list (by choosing Self from the Destination menu) to see the message that has been sent to all those people. Note that the guys at lycosmail.com, mailexcite.com, and usa.net are innocent, but we should let them know that they are being used. When we are ready, we hit Complain and we are done. Complain will ask for confirmation before sending our message to all the parties in the target list.

Note: By selecting V+T from the Refine menu of the target window you perform Verify and then Trim on all addresses.

The complaint letter

The standard complaint letter is located in the file complaint. This letter will be followed by a copy of the spam message that you are complaining about.

The following two special sequences (each of them may occur more than once in the letter) will be substituted before the letter is sent:

This will be replaced with your name as specified in .rfirerc. If you haven't specified any name there, your name is "The Spamfighter."

This will be replaced with your E-Mail address (as specified in .rfirerc or determined automatically by the script).

Note that you may select a (single) fragment of the console's output to be included in your complaint. If you do so, the fragment will appear after the letter and before the message, preceded with the text `Included command output:'.

If you decide to modify the complaint letter, you may want to send a dummy complaint to yourself, to make sure that the letter looks exactly the way you want it to look.

Hints and remarks

You should carefully read the comments in the example version of .rfirerc and make sure that all parameters have the right values for your environment. Please check whether the commands nslookup, whois, and traceroute work on your system.

If your version of nslookup and/or whois returns output in a format that the script doesn't recognize, you may have to modify the way these commands are interfaced to the program. Queries implemented by external commands are described at the beginning of the script. It should be relatively easy to add new commands and describe their interface, but some knowledge of Tcl is required for this. See the comments in the script's header for additional information.

If you have dig installed on your system, you may uncomment the lines in .rfirerc that add dig to the Refine menu. Note, however, that dig will not be automatically interfaced to the script, in the sense that its queries won't affect the contents of the Target window. But you will be able to extract addresses from dig's output (directed to the console) and move them to the Target window with little effort. You may also add your other favorite commands to the Refine menu by using the syntax described in the sample .rfirerc file and at the beginning of the script.

Note that a command (be it a standard one or one defined by you) that can be invoked from the Refine menu may need an argument. This argument will be extracted from the current selection in the target window or in the console. Whenever you click in the target window or in the console, you select the current word (clicking on an address in the console has the additional effect of highlighting it and/or moving it to the target window). When you drag the mouse in either window, you may select several words and/or lines. By holding the Ctrl key, you can select multiple non-adjacent items in the target window, but you can have at most one continuous selection in the console. When you try this, you will see that whenever you select something in the console, the current selection in the target window is canceled, and vice versa. If you invoke a Refine command that needs an argument, this argument will be sought in the current selection (in the target window or in the console). Depending on the command format (see rfirerc_example), this argument can be an IP address, an E-Mail address, or simply a word (understood as a continuous sequence of non-blanks). The command will be executed once for each distinct argument it locates in the selection. Of course, if the command doesn't need any arguments, it will ignore the selection.
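Extracting the distinct arguments from a selection could look like the sketch below (Tcl 8.5 or later). The regular expressions are simplified assumptions, not the killer's own. In particular, "ip" here recognizes only numeric dotted-quad addresses, although the killer also accepts symbolic ones. selection_arguments is a hypothetical name; the command would then be run once per returned element.

```tcl
# Hypothetical sketch: distinct arguments of a given kind, in order of
# first appearance in the selected text.
proc selection_arguments {selection kind} {
    switch -- $kind {
        ip      { set re {\m[0-9]{1,3}(?:\.[0-9]{1,3}){3}\M} }
        email   { set re {[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+} }
        word    { set re {\S+} }
        default { error "unknown argument kind: $kind" }
    }
    set found {}
    foreach candidate [regexp -all -inline -- $re $selection] {
        if {$candidate ni $found} {
            lappend found $candidate
        }
    }
    return $found
}
```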

Feel free to increase and/or customize the standard list of NIC servers. You can add new servers to the list from .rfirerc; you can also edit the standard list (variable WhoIsServers in the script header) if you don't like it the way it is.

Note that you can execute any command in the console, by choosing Other from the Refine menu of the target window. Of course, you can collect any addresses returned by that command in the standard way.

The killer can operate on mailboxes created by various mailing programs (e.g., Netscape), but you shouldn't remove E-Mail from those mailboxes with the killer. If you do this, you may confuse the mailing program, which may have indexed the mailbox in a way that the killer cannot understand.

Changes in version 0.8

This section briefly lists the most relevant changes introduced to the killer since version 0.6 (described in the preceding sections).

Access to the filter database

The root window has a new menu labeled Filter listing all groups declared in the rules file (but outside actual rules). By clicking on a group name, you open its pattern file in a separate window and make it available for editing. Only the actual patterns (and tags for tagged groups) are listed in the window. Any comments are preserved and restored (as sensibly as possible) when the updated file is written back.

While the meaning of the three buttons on the right of a pattern file window is obvious, some of the operations in the Modify menu may need a few words of explanation. Please note that modifications to a pattern file take effect only when the pattern window is closed and the changes are saved.

Add IP

This operation attempts to add the currently selected IP addresses (symbolic or numeric) to the pattern file. Each address is first checked against the existing regular expressions, and only the addresses that are not already covered are added.
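A minimal sketch of the Add IP duplicate check, in Python rather than the killer's Tcl. It assumes that a pattern file is a list of regular expressions, that coverage is tested with an unanchored search (as Tcl's `regexp` does by default), and that a new address is stored as an anchored, escaped literal; `escape` and `add_addresses` are hypothetical names.

```python
import re

# Characters that are special in both Tcl (ARE) and Python regular expressions.
SPECIALS = set("\\^$.[]|()*+?{}")


def escape(text):
    """Backslash-escape regular-expression metacharacters."""
    return "".join("\\" + c if c in SPECIALS else c for c in text)


def add_addresses(patterns, addresses):
    """Append a pattern for each address not already matched; return the new patterns."""
    added = []
    for addr in addresses:
        # Skip addresses covered by an existing (or just-added) expression.
        if any(re.search(p, addr) for p in patterns):
            continue
        new = "^" + escape(addr) + "$"
        patterns.append(new)
        added.append(new)
    return added
```

Because newly added patterns are checked too, an address selected twice is added only once.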

Add NE

This operation adds a pattern covering the network of the selected numeric IP address. For a symbolic address, it behaves exactly like Add IP.
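The size of the "network" that Add NE covers is not specified here. The sketch below assumes the first three octets (a /24) purely for illustration, and falls back to an escaped literal for symbolic addresses; `network_pattern` is a hypothetical helper and the killer's actual choice may differ.

```python
import re

SPECIALS = set("\\^$.[]|()*+?{}")
NUMERIC_IP = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")


def escape(text):
    """Backslash-escape regular-expression metacharacters."""
    return "".join("\\" + c if c in SPECIALS else c for c in text)


def network_pattern(address):
    """Assumed Add NE behavior: cover the /24 of a numeric IP address."""
    m = NUMERIC_IP.match(address)
    if m is None:
        # Symbolic address: behave like Add IP (anchored literal).
        return "^" + escape(address) + "$"
    prefix = "\\.".join(m.group(1, 2, 3))
    return "^" + prefix + "\\.[0-9]+$"
```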

Add EM

This operation adds the E-Mail addresses from the current selections to the file. The rules are the same as for Add IP.

Add ST

This operation adds a single selected string to the file as one regular expression. It is not available for an address group. Some heuristic preprocessing is performed on the string to escape its special characters and improve its direct usability as a regular expression. For example, the string `1 bottle=$75.95' (you guessed it: it comes from a Viagra spam) will be added as `1[^.]+bottle=\$75\.95'.

Enter, Edit, Delete, Clone, Undo

These are obvious editing functions. Check them out.

Match

This operation opens a small window prompting you for a string to be matched against the group (using the group matching rules). When you enter a string and hit the Try It button, the killer tries to match the string against the group. If this succeeds, the word Matched shows up in the window and the first pattern that matched the string is highlighted. Otherwise, the killer says Not matched. The window remains on the screen (letting you try more matches) until you dismiss it with the Done button.

For an -and type group, no pattern is highlighted if the match succeeds. If it fails, the program highlights the first pattern that failed to match.
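The highlighting rules of the Match window can be summarized by a small function. This is an illustrative sketch: `match_group` is hypothetical, and Python's unanchored `re.search` stands in for the filter's actual group matching rules.

```python
import re


def match_group(patterns, text, mode="or"):
    """Return (matched, index_to_highlight) for a group of patterns.

    "or" group:  matched if any pattern matches; highlight the first match.
    "and" group: matched only if all patterns match; highlight nothing on
                 success, otherwise the first pattern that failed.
    """
    if mode == "and":
        for i, p in enumerate(patterns):
            if not re.search(p, text):
                return False, i
        return True, None
    for i, p in enumerate(patterns):
        if re.search(p, text):
            return True, i
    return False, None
```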

Note that backslash sequences within the specified string are substituted in the standard Tcl fashion, so it is possible to enter newlines (as \n), etc. Also beware that a literal backslash must be written as two backslashes (\\).

A file displayed in a Filter window is locked while it is being read or written, but it does not remain locked while its window is open. Consequently, if the filter happens to modify a pattern file after you have read it and before you write it back, the filter's changes will be lost.

Open relay test

The Refine menu in the target window offers the Open Relay Test function, which requires a selected IP address. The SMTP commands sent to the host, and the conclusion from the test, are displayed in the console window. You may select a full test, in which case a standard probe message is sent to the server if the light test indicates that it is an open relay. This operation is performed by invoking the rfort incarnation of the filter script.
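The light test itself is not spelled out here. A common way to run such a test is to open an SMTP dialogue and check whether the server accepts `RCPT TO` for a recipient outside its own domain (reply code 250 or 251), then abort with `RSET`. The sketch below only builds a command list and classifies the replies; it does not connect anywhere, and the command sequence, the helper names, and the classification are assumptions, not necessarily what rfort sends.

```python
def light_test_commands(helo_name, sender, foreign_recipient):
    """Hypothetical SMTP dialogue for a light relay test (no DATA is sent)."""
    return [
        f"HELO {helo_name}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{foreign_recipient}>",  # recipient outside the server's domain
        "RSET",
        "QUIT",
    ]


def classify_light_test(greeting_code, rcpt_code):
    """Classify one host as 'dead', 'relay', or 'ok' from its reply codes."""
    if greeting_code != 220 or rcpt_code is None:
        return "dead"  # does not behave like an E-Mail server
    if rcpt_code in (250, 251):
        return "relay"  # accepted a foreign recipient
    return "ok"  # refused (e.g. 550) or deferred (4xx)
```

Some servers accept the recipient and reject the message later, so a positive light test is only an indication; sending an actual probe message, as the full test does, is more conclusive.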

Miscellaneous

The .rfirerc file contains more examples of some handy user-defined commands that can be added to the Refine menu of the target window. Some of those commands are commented out. You may want to have a look at them and edit their configuration to your taste.

Please mail your comments (preferably constructive ones) to pawel@cs.ualberta.ca. Do not forget to insert the phrase "receive me" into the subject line ;-).

`%N`	is replaced with your name, as declared in `.rfirerc`
`%E`	is replaced with your E-Mail address, as declared in `.rfirerc`
`%F`	is replaced with the sender address (`FROM`) of the current message
`%S`	is replaced with the subject line (`SUBJECT`) of the current message
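The placeholders listed above amount to a simple template expansion. A minimal sketch, assuming a single left-to-right pass so that a `%N` appearing inside a substituted value (say, a spammer's subject line) is not expanded again; `expand_placeholders` is a hypothetical helper.

```python
import re


def expand_placeholders(template, name, email, sender, subject):
    """Replace %N, %E, %F, %S in one pass; other text is left unchanged."""
    values = {"N": name, "E": email, "F": sender, "S": subject}
    # A function replacement inserts each value literally, with no re-scanning.
    return re.sub(r"%([NEFS])", lambda m: values[m.group(1)], template)
```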

-1	This value can only be returned if the function was invoked to verify a single host, i.e., the list of addresses consisted of a single item. It means that the host does not behave like an E-Mail server, i.e., it either doesn't exist or doesn't respond sensibly to SMTP commands.

0	This value means that all the specified hosts are OK, i.e., none of them appears to be an open relay. This also includes the case of multiple hosts of which some (or all) did not respond like E-Mail servers.

>0	This is the number of hosts that, according to the light test, appear to be open relays. If the full test was selected, a probe message has been sent to each of those hosts.
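The three return values can be derived from per-host outcomes as follows. This sketch assumes each host has already been classified as `'relay'`, `'ok'`, or `'dead'` (not an E-Mail server); `relay_test_result` is a hypothetical name.

```python
def relay_test_result(statuses):
    """Combine per-host outcomes into -1, 0, or the number of open relays."""
    # -1 is possible only when a single host was checked and it is not a mail server.
    if len(statuses) == 1 and statuses[0] == "dead":
        return -1
    # Otherwise dead hosts count as OK; return the number of apparent relays.
    return sum(1 for s in statuses if s == "relay")
```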