Spam filtering with the LMTA and RHTA algorithms
LMTA (Legacy Mail Testing Algorithm) is one of the known successful non-content
algorithms for detecting illegitimate messages (spam). LMTA takes the following
input: the IP address of the sender's host, the sender's email address from the
message envelope, and optionally the sender's host name from the "helo" phase
of the SMTP protocol. From these values LMTA produces an answer about the
legitimacy of the sender or, strictly speaking, a measure of how well these
values correlate with each other. Because we want a definite answer, "yes" or
"no", the heuristic result is then converted to 1 or 0.
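The internals of LMTA are not described here, so the following Python sketch
shows only the interface stated above: three envelope-level inputs, a heuristic
score, and a threshold that turns the score into 1 or 0. The names EnvelopeInfo,
toy_score and verdict, the scoring rule, and the threshold value are all
assumptions made for illustration; they are not the actual LMTA heuristics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvelopeInfo:
    """Inputs named in the text; the class itself is hypothetical."""
    sender_ip: str                   # IP address of the sending host
    mail_from: str                   # envelope sender (SMTP MAIL FROM)
    helo_name: Optional[str] = None  # host name from HELO/EHLO, if given


def toy_score(env: EnvelopeInfo) -> float:
    """Placeholder heuristic, NOT the real LMTA: returns a value in [0, 1]
    that grows when the HELO name agrees with the sender's domain."""
    if "@" not in env.mail_from:
        return 0.0  # empty or malformed envelope sender
    domain = env.mail_from.rsplit("@", 1)[-1].lower()
    if not domain:
        return 0.0
    score = 0.5  # a syntactically usable sender address
    if env.helo_name and env.helo_name.lower().endswith(domain):
        score += 0.5  # HELO name lies in the sender's domain
    return score


def verdict(env: EnvelopeInfo, threshold: float = 0.5) -> int:
    """Collapse the heuristic score into the definite answer: 1 or 0."""
    return 1 if toy_score(env) >= threshold else 0
```

Note that the function never looks at the message body; everything it needs is
known before the message data is transferred.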
Like any complex heuristic algorithm with tunable parameters (content filter
algorithms, for example), LMTA has "soft" and "hard" acceptance conditions.
Clearly, moving the settings one way rejects more spam but increases the
chance of rejecting legitimate mail, and vice versa. By default the anti-spam
filter of the TPmail package uses the "soft" mode of mail acceptance. The
algorithm settings are not exposed in the package, which avoids many problems:
exact tuning of the algorithm requires good knowledge and understanding of
very complex things, such as the SMTP protocol and the operation of DNS
servers. It is therefore easier for an ordinary user or administrator to work
with the basic settings and treat LMTA as a well-working "black box".
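The trade-off between the two modes can be illustrated with a small Python
sketch. It assumes that each mode is simply a different threshold on a
legitimacy score; the threshold values and the sample scores below are made up
for illustration and are not taken from TPmail.

```python
# Hypothetical thresholds; the real LMTA settings are not published.
MODES = {"soft": 0.3, "hard": 0.7}


def accept(score: float, mode: str = "soft") -> bool:
    """Accept a message whose legitimacy score reaches the mode's threshold."""
    return score >= MODES[mode]


def count_rejected(scores, mode):
    """Number of messages that a given mode would reject."""
    return sum(1 for s in scores if not accept(s, mode))


# Made-up scores, for illustration only (higher = more likely legitimate).
spam_scores = [0.1, 0.2, 0.4, 0.6]
legit_scores = [0.5, 0.8, 0.9, 0.95]

for mode in MODES:
    print(mode,
          "spam rejected:", count_rejected(spam_scores, mode),
          "legitimate rejected:", count_rejected(legit_scores, mode))
```

On this toy data the "hard" mode rejects all four spam messages but also one
legitimate message, while the "soft" mode rejects two spam messages and no
legitimate ones.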
With the default basic settings LMTA has worked successfully in external
tests at different organizations with different volumes of mail traffic.
The strongest feature of LMTA is its ability to reject a spam message
at the envelope phase of the SMTP session, without consulting blacklist
servers or anything similar.
RHTA (Received Header Testing Algorithm) checks the header fields of a
message for conformance with the RFC standards. This algorithm is, of course,
a content filter algorithm. Unfortunately, it detects almost every error in
the header fields of a message, and it does so for spam and legitimate mail
alike. Typically about 30 percent of all mail would be rejected by this
algorithm. Do we need it, then? It can make the user aware of errors or
oddities in the header part of a message. For example, spam messages often
omit some fields, because spam databases contain e-mail addresses but not
the real names of the users. This can help the user decide that such messages
can simply be ignored.
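The exact set of checks performed by RHTA is not listed here. As a minimal
sketch of the idea, the following Python function uses the standard email
package to report a few header problems: a missing or repeated Date or From
field (RFC 5322 requires exactly one of each), a missing Message-ID, and
recipients given without display names. The function name header_anomalies
and the choice of checks are assumptions for illustration only.

```python
from email import message_from_string
from email.utils import getaddresses

# RFC 5322 requires exactly one Date and one From field per message.
REQUIRED_ONCE = ("Date", "From")


def header_anomalies(raw_message: str) -> list:
    """Return human-readable notes about suspicious header fields."""
    msg = message_from_string(raw_message)
    notes = []
    for name in REQUIRED_ONCE:
        count = len(msg.get_all(name, []))
        if count != 1:
            notes.append(f"{name}: expected 1 field, found {count}")
    if msg.get("Message-ID") is None:
        notes.append("Message-ID: missing (recommended by RFC 5322)")
    # Spam lists often hold bare addresses without the recipient's name.
    recipients = getaddresses(msg.get_all("To", []))
    if recipients and all(not name for name, _ in recipients):
        notes.append("To: no display name for any recipient")
    return notes
```

The sketch only reports what it finds and leaves the decision to the user,
since rejecting on every such error would also discard legitimate mail.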
How efficient is non-content filtering with LMTA? The best way to judge is
from the sma_stat statistics and from feedback from users. And, of course,
every organization has its own rules for e-mail. In test runs the algorithm
reaches an efficiency of up to 98 percent. But the creator of LMTA is quite
sceptical about such figures, and also about the figures of 99.9999 percent
quoted for various content filter packages. All these values were obtained on
particular samples and cannot be applied outside those samples. Any
generalizations are absolutely unfounded.
LMTA does not work for spam that is delivered by legitimate means, or for
mail relayed from a trusted user through another mail server (for example,
an @gmail.com address sent through an ISP in South Africa). These are classic
statistical errors, and such limitations are in the nature of the algorithm.
Errors of this kind must be corrected with static or dynamic accept lists,
or perhaps with a content filter.
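How an accept list can correct such errors is sketched below in Python. The
format of the TPmail accept lists is not described in this text, so the list
contents, the names on_accept_list and final_decision, and the rule that a
list match overrides a negative verdict are assumptions for illustration.

```python
import ipaddress

# Hypothetical static accept list: trusted relays and sender addresses.
ACCEPT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ACCEPT_SENDERS = {"partner@example.org"}


def on_accept_list(sender_ip: str, mail_from: str) -> bool:
    """True when the envelope matches a static accept-list entry."""
    ip = ipaddress.ip_address(sender_ip)
    if any(ip in net for net in ACCEPT_NETWORKS):
        return True
    return mail_from.lower() in ACCEPT_SENDERS


def final_decision(sender_ip: str, mail_from: str, lmta_verdict: int) -> int:
    """Accept-list entries override a negative LMTA verdict (1 = accept)."""
    return 1 if on_accept_list(sender_ip, mail_from) else lmta_verdict
```

A dynamic list would work the same way, with the two collections updated at
run time instead of being fixed in the configuration.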
Can LMTA be compared with the many content filter algorithms? It is very
difficult. First, LMTA is a single algorithm, while the other side is a group
of algorithms that make up a large and complex filter system. Second, by the
classical measures for comparing algorithms (execution time and working
memory), LMTA is the absolute winner. Indeed, the speed of LMTA does not
depend on the size or contents of the message, and LMTA can make the
accept/reject decision before the whole message has been received. LMTA needs
no large signature databases, no rooms full of classification experts, and no
operators for training and retraining. It is the only known algorithm of the
"install-run-forget" class. Still, LMTA cannot be declared the absolute
winner, because it cannot give 100 percent correct results on every sample
of the mail stream. We can note, however, that LMTA usually detects spam
more quickly and efficiently than content filter systems do.
The conclusions can be summarized as follows.
Use LMTA, and if you have a favourite content filter system
that is simple enough to manage, run it after LMTA. In that case
you can expect better results. Use the statistical reports
from the sma_stat module. sma_stat gives you an exact picture
of mail processing in your organization, so you can disregard the
strange figures from software vendors.
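The recommended order, the envelope check first and the content filter second,
can be sketched in Python as follows. Both checks are passed in as plain
functions because the real LMTA and content filter interfaces are not
described here; the name filter_message and all parameters are hypothetical.

```python
from typing import Callable

# Both callables are stand-ins: the real LMTA and content filter are
# external components whose interfaces are not described in this text.
EnvelopeCheck = Callable[[str, str, str], bool]   # (ip, mail_from, helo)
ContentCheck = Callable[[bytes], bool]            # (full message)


def filter_message(ip: str, mail_from: str, helo: str,
                   fetch_body: Callable[[], bytes],
                   envelope_ok: EnvelopeCheck,
                   content_ok: ContentCheck) -> str:
    """Run the cheap envelope check first; read and scan the body only
    for messages that pass it."""
    if not envelope_ok(ip, mail_from, helo):
        return "rejected at envelope"      # body is never fetched
    if not content_ok(fetch_body()):
        return "rejected by content filter"
    return "accepted"
```

With this ordering the more expensive content filter sees only the mail that
the envelope check has already let through.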
Note that most organizations using the TPmail package do not have
a content filter system.