Mail package TPmail for Unix systems

Spam filtering with the LMTA and RHTA algorithms

The LMTA algorithm (Legacy Mail Testing Algorithm) is one of the known successful algorithms for non-content filtering of illegitimate messages (spam). LMTA takes the following as input: the IP address of the sender's host, the sender's email address from the message envelope, and optionally the sender's host name from the "helo" phase of the SMTP protocol. From these values LMTA gives an answer about the legitimacy of the sender or, strictly speaking, it measures the link or correlation between these values. Because we want a definite answer - "yes" or "no" - the heuristic result is then converted to 1 or 0.
Like any complex heuristic algorithm with parametric tuning (for example, content filter algorithms), LMTA has "soft" and "hard" acceptance conditions. Clearly, moving in one direction rejects more spam but increases the chance of rejecting legitimate mail, and vice versa. By default the anti-spam filter of the TPmail package uses the "soft" mode of mail acceptance. The algorithm settings have been removed from the package to avoid many problems. Exact tuning of the algorithm requires good knowledge and understanding of very complex things: the SMTP protocol, the operation of DNS servers, and more. It is therefore easier for an ordinary user or administrator to work with the basic settings and use LMTA as a well-working "black box". With the default basic settings LMTA has worked successfully in external tests at different organizations with different volumes of mail traffic.
The strongest feature of LMTA is its ability to reject a spam message at the envelope phase of the mail transaction, without relying on servers of the various blacklists or anything similar.
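The interface described above can be sketched in a few lines of Python. This is only an illustration of the shape of the decision, not LMTA itself: the scoring function toy_score, the Envelope type and the threshold numbers are all hypothetical, since the real heuristic and its settings are not published with the package.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical thresholds; the real LMTA settings are not exposed.
THRESHOLDS = {"soft": 0.3, "hard": 0.7}


@dataclass(frozen=True)
class Envelope:
    """Values known before the message body is transferred."""
    client_ip: str                   # IP address of the sender's host
    mail_from: str                   # sender address from the SMTP envelope
    helo_name: Optional[str] = None  # optional host name from HELO/EHLO


def decide(envelope: Envelope,
           score: Callable[[Envelope], float],
           mode: str = "soft") -> int:
    """Convert a heuristic score in [0, 1] into 1 (accept) or 0 (reject)."""
    return 1 if score(envelope) >= THRESHOLDS[mode] else 0


def toy_score(envelope: Envelope) -> float:
    """Placeholder heuristic, NOT LMTA: compares the HELO name with the
    domain of the envelope sender and ignores the IP address entirely."""
    domain = envelope.mail_from.rpartition("@")[2].lower()
    if not envelope.helo_name:
        return 0.5  # nothing to correlate, stay neutral
    helo = envelope.helo_name.lower().rstrip(".")
    return 1.0 if helo == domain or helo.endswith("." + domain) else 0.0
```

With these made-up numbers, an envelope without a HELO name is accepted in "soft" mode and rejected in "hard" mode, which is the trade-off between the two modes in miniature.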

The RHTA algorithm (Received Header Testing Algorithm) checks the service fields (headers) of a message for conformance to the RFC standards. This algorithm is, of course, a content filter algorithm. Unfortunately, it detects almost every error in the service fields of a mail message, and this holds for spam and for legitimate mail alike. Usually about 30 percent of all mail would be rejected by this algorithm. Do we need this algorithm then? It can draw the user's attention to errors or oddities in the service part of a message. For example, spam messages often omit some fields, because spam databases contain e-mail addresses but not the real names of the users. This can help the user see that such messages can simply be ignored.
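As a rough illustration of this kind of header check, here is a minimal sketch using Python's standard email package. It is not the RHTA implementation: the function name header_problems and the particular checks (Date and From present, at least one Received field, recipients with a display name) are assumptions chosen to mirror the examples in the text.

```python
from email import message_from_string
from email.utils import getaddresses

# RFC 5322 requires the Date and From fields in every message.
REQUIRED = ("Date", "From")


def header_problems(raw_message: str) -> list:
    """Return human-readable problems found in the header section."""
    msg = message_from_string(raw_message)
    problems = [f"missing {name} field"
                for name in REQUIRED if msg[name] is None]
    # Assumption of this sketch: mail that passed through a relay
    # should carry at least one Received field.
    if not msg.get_all("Received"):
        problems.append("no Received fields")
    # Recipients written as a bare address, without the user's name.
    for name, addr in getaddresses(msg.get_all("To", [])):
        if addr and not name:
            problems.append(f"recipient {addr} has no display name")
    return problems
```

A list of findings, rather than a yes/no answer, fits the use described above: the findings are shown to the user instead of being used to reject mail outright.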

What is the efficiency of non-content filtering with LMTA? It is best judged from the sma_stat statistics and from feedback from your users. And, of course, every organization has its own rules for e-mail. In test runs the algorithm shows an efficiency of up to 98 percent. But the creator of LMTA is rather sceptical about such values, and also about the 99.9999 percent values quoted for various content filter packages. All these values were obtained on particular samples and cannot be applied outside those samples. Any generalizations are absolutely absurd here.

LMTA does not work on legal spam deliveries or on relaying from a trusted user through another mail server (for example, an @gmail.com address sent through an ISP in South Africa). Here we have classic statistical errors, and these limitations are in the nature of the algorithm. Such errors must be corrected with static or dynamic accept lists, or you may need to use a content filter.

Is it possible to compare LMTA with the many content filter algorithms? It is very difficult. Firstly, LMTA is a single algorithm, whereas the other side is a group of algorithms forming a large and complex filter system. Secondly, if we compare the algorithms by the classical measures (execution time and working memory size), then LMTA is the absolute winner. Indeed, the speed of LMTA does not depend on the size or contents of the message. LMTA can make the accept/reject decision before the whole message has been received. LMTA does not need large signature databases, rooms full of classification experts, or training operators with a retraining process. It is the only known algorithm of the "install-run-forget" class. But LMTA cannot be considered the absolute winner overall, because it cannot give 100 percent successful results on an arbitrary sample from the mail stream. We can note, however, that LMTA usually detects spam more quickly and efficiently than content filter systems.

The final conclusions are as follows.
Use the LMTA algorithm, and if you have a favourite content filter system that is simple enough to manage, run that system after LMTA. In this case you can expect better results. Use the statistical reports from the sma_stat module. sma_stat gives an exact picture of mail processing in your organization, free of the strange values quoted by software vendors.
Note that most organizations using the TPmail package do not have a content filter system.
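The recommended order of checks (accept list first, then LMTA, then an optional content filter) could look roughly like the sketch below. Everything in it is hypothetical: filter_message is not a TPmail function, and the LMTA and content filter stages are passed in as plain callables standing in for the real components.

```python
from typing import Callable, Optional, Set, Tuple


def filter_message(sender: str,
                   body: str,
                   accept_list: Set[str],
                   lmta_accepts: Callable[[str], bool],
                   content_filter_accepts: Optional[Callable[[str], bool]] = None
                   ) -> Tuple[str, str]:
    """Return (verdict, deciding stage) for one message.

    In a real SMTP session the body arrives only after the envelope
    stage; it is a plain argument here to keep the sketch short.
    """
    # 1. Static or dynamic accept list overrides everything else.
    if sender.lower() in accept_list:
        return ("accept", "accept list")
    # 2. Envelope-stage check (stand-in for LMTA).
    if not lmta_accepts(sender):
        return ("reject", "LMTA")
    # 3. Optional content filter, run only on mail that passed stage 2.
    if content_filter_accepts is not None and not content_filter_accepts(body):
        return ("reject", "content filter")
    return ("accept", "all checks passed")
```

Running the content filter last means it only ever sees the mail that survived the cheaper envelope-stage check.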

Last modified: $Date: 2007-12-10 17:00:13+03 $
