Network Working Group D. Crocker
Internet Draft Brandenburg
28 Apr 2003
Expires: <10-04>
Technical Considerations
for Spam Control Mechanisms
draft-crocker-spam-techconsider-00.txt
This document is an Internet-Draft and is in full
conformance with all provisions of Section 10 of
RFC2026. Internet-Drafts are working documents of the
Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or
obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be
accessed at
http://www.ietf.org/shadow.html.
Copyright (C) The Internet Society (2003). All Rights
Reserved.
SUMMARY
Internet mail has operated as an open and unfettered
channel between originator and recipient. This invites
some abuses, called spam, such as burdening recipients
with unwanted commercial email. Spam has become an
extremely serious problem, is getting much worse, and
is proving difficult (or impossible) to eliminate. The
most practical goal is to bring spam under reasonable
control; it will require an on-going, adaptive effort,
with stochastic rather than complete results. This note
discusses available points of control in the Internet
mail architecture, considerations in using any of those
points, and opportunities for creating Internet
standards to aid in spam control efforts. It offers
guidance about likely trade-offs (benefits and
limitations).
CONTENTS
1. Spam And Consent
2. Email Architecture Control Points
3. Administrative And Legal Mechanisms
4. Filtering
4.1. Policies
4.2. Explicit Lists
4.3. Content Analysis
4.4. Negotiation
4.5. Traffic Analysis
5. Infrastructure Enhancement
6. Evaluating Technical Approaches
6.1. Adoption
6.2. Burden
6.3. Scaling
6.4. Scenarios
7. Security Considerations
8. Acknowledgements
9. Authors' Addresses
1. SPAM AND CONSENT
Internet mail has operated as an open and unfettered
channel between originator and recipient. It has
always suffered from some degree of abuse, in which
originators impose on recipients inappropriately. In
recent years, a version of this abuse has grown
substantially. Called spam, its definition varies from
"unsolicited commercial email" to "any email the
recipient does not want". Often there are no technical
differences between spam and "acceptable" email. Their
format, content and even aggregate traffic patterns may
be identical. Hence spam is a problem for fundamentally
non-technical reasons, yet the Internet technical
community must pursue technical responses to it. The
lack of strong community consensus on a single, precise
definition makes this particularly challenging.
For most working discussions, the term "Unsolicited
Bulk Email" is sufficient. The salient point is that
it is mass-mailings that are of the broadest concern.
More detailed discussion must, of course, be precise in
the definition of "unsolicited" and usually must
distinguish between different types of mail, such as
commercial, religious, political or personal.
The simplistic -- but entirely adequate -- summary of
the impact of spam on Internet mail is that it is an
extremely serious problem, it is getting much worse,
and it is proving difficult (or impossible) to
eliminate. Spam is generated by a wide range of clever
originators and it always will be.
Instead of thinking of Spam as a disease that might be
eliminated, it is more useful to think about crime, war
and cockroaches. It is not realistic to expect to
eliminate any of these, no matter how much anyone might
wish otherwise. Therefore the best we can hope to
accomplish is to bring spam under reasonable control
and that control will require an on-going, adaptive
effort, with stochastic rather than complete results.
We need multiple, adaptive techniques. As spam changes,
so must our mechanisms. Different sets of mechanisms
will be appropriate for different circumstances.
In other words spam has become a permanent part of the
Internet mail experience and efforts to control it may
only reduce it to a tolerable level, rather than
eliminate it. It is somewhat comforting to remember
that an individual spam is not damaging. Rather it is
the quantity of spam that poses a threat. Therefore it
is acceptable for spam control mechanisms to be
imperfect.
This note discusses available points of control in the
Internet mail architecture, considerations in their
use, definitions of terminology and opportunities for
creating Internet standards. It also offers guidance
about likely trade-offs (benefits and limitations).
The note does not offer an analysis of the types of
spam or the types of attacks used in sending spam, nor
is it intended to specify solutions. Similarly, the
note does not discuss fine-grained details, such as the
arguments associated with single opt-in mechanisms,
versus double opt-in. These points are essential to
the engineering of particular solutions, but only as
refinements after the larger architectural and system
control choices are made.
COMMENT: This document is intended to evolve,
based on feedback. Comments are eagerly
sought, preferably in the form of
suggested text changes, and preferably
on the ASRG mailing list, at
<mailto:asrg@permissiontechnology.com>
2. EMAIL ARCHITECTURE CONTROL POINTS
Email transmission sequences can touch many systems,
between the originator and the recipient. However for
most discussions about control, only five major
components are important:
   Originator      Intermediary       Recipient
     Service          Service          Service
+---------------+                 +---------------+
| UA.o -> MTA.o | ->   ISP.i   -> | MTA.r -> UA.r |
+---------------+                 +---------------+
UA.o: The originator's user agent, typically
operated by the user and under their direct
control
MTA.o: The mail transfer agent service associated
with the originator's environment, possibly
operated by the sender and possibly operated
under separate control, such as by their
employer.
ISP.i: The IP and/or mail transfer agent service(s)
operated by one or more independent third parties.
MTA.r: The mail transfer agent service associated
with the recipient's environment
UA.r: The recipient's user agent
In many organizations, the MTA service is multi-stage,
such as including a department MTA and an Internet
"firewall" MTA. This distinction is of fundamental
importance for making software and operations
decisions, but it does not have a significant impact on
a discussion about points of control. By contrast, the
distinction between originator's service, recipient's
service and any independent third parties is essential
to this larger examination. These are separate,
independent administrative environments and are subject
to different policies. In particular, note that a
discussion about using control points hinges on the
scope of the control to be exercised.
Besides constituting a major burden to recipients, the
volume of spam traffic has become a serious problem for
transit services. Hence a precept in controlling spam
is to seek control as close to the source as possible.
The fewer downstream resources consumed by spam, the
better. Of course the ideal would be a mechanism in
UA.o that would prevent spam from being sent in the
first place. Indeed, legal remedies seek to affect a
sender's motivations, so that they will not send the
spam at all.
Unfortunately software control of spam in UA.o cannot
be assumed, because that software is usually under the
control of the originator. If they wish to bypass any
control mechanisms in UA.o, they will find a way. The
same may be true of MTA.o. Hence Internet-wide designs
of spam control must assume that UA.o and MTA.o may
cooperate to generate and transmit spam. Control of
either of these components may be sought as an adjunct,
where they are operated by an independent service, but
such control must not be relied on.
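The five components and the trust assumption above can
be written down as a small data structure. This is only
a sketch: the names Component, PATH and
reliable_control_points are hypothetical and are not
defined by this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str     # label used in the diagram, e.g. "MTA.o"
    domain: str   # administrative environment that operates it


# The handling path from originator to recipient.
PATH = [
    Component("UA.o", "originator"),
    Component("MTA.o", "originator"),
    Component("ISP.i", "intermediary"),
    Component("MTA.r", "recipient"),
    Component("UA.r", "recipient"),
]


def reliable_control_points(path=PATH):
    """Components that an Internet-wide design may rely on.

    UA.o and MTA.o are excluded because they may cooperate
    to generate and transmit spam.
    """
    return [c.name for c in path if c.domain != "originator"]


print(reliable_control_points())
```

The remaining components are the ones at which a
receiving-side mechanism can be placed without trusting
the originator.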
Wherever the detection mechanism is placed, the
critical challenge is to identify spam in real time, if
its relaying and delivery are to be stopped. The other
avenue is post-hoc removal of the right to make further
use of the MTA service. This may have strong utility
for controlling spammers needing to operate within
acceptable social bounds. It will have no effect upon
spammers who avoid accountability.
3. ADMINISTRATIVE AND LEGAL MECHANISMS
Both government law and service provider contracts can
be used for defining unacceptable behavior and the
remedies available when there are violations. There are
two major problems with this administrative control of
spam. One is that a spammer often cannot be
identified. There are many opportunities for anonymous
posting of email, such as through Internet cafes,
transient access services and free email services. The
second problem is that the sender of spam may not be in
the jurisdiction seeking to exercise control, or a
jurisdiction responsive to the recipient's
jurisdiction. The Internet is global. Unlike postal
bulk mail, the cost of sending spam over the Internet
does not change as the mail crosses jurisdictional
boundaries.
Hence it seems likely that use of administrative
procedures can be effective for controlling
"responsible" spam. That is, spam sent by
organizations operating as accountable social
participants, perhaps indulging in overly aggressive
policies, but still desiring to remain socially
tolerable. The large number of "rogue" spammers is not
similarly burdened.
4. FILTERING
The technical mechanism for real-time detection and
handling of spam is a filter, placed at ISP.i, MTA.r
and/or UA.r. A filter has two functions: qualification
and action. Action is usually either adding a special
label to the message or disposing of it.
Qualification tests whether a message is spam. Test
results are:
Positive: Message matches the test
criteria.
Negative: Message fails to match the
test criteria.
When the tests are heuristic or statistical, some
portion of the results will be incorrect. These are
classed as:
False Positive (FP): Message matches the test criteria,
but the criteria are too aggressive.
False Negative (FN): Message fails to match the test
criteria, but the criteria are not sufficiently strong.
Filters are used for two, complementary policies:
Acceptance: Approves mail for delivery.
Rejection: Withholds or refuses permission for
relaying or delivery.
Note that rules for acceptance are equally subject to
error. However Acceptance rules usually employ simple,
explicit criteria rather than heuristics, so that FP
and FN results are not usually a concern. Hence FP and
FN discussion is usually about Rejection rules.
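As an illustration of the two filter functions and the
four result classes, here is a minimal sketch. The
names Message, run_filter and tally are hypothetical,
and the "[SPAM]" subject prefix is only one assumed form
of labeling.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Message:
    sender: str
    subject: str
    body: str


# Qualification: True means Positive (message matches the criteria).
Qualifier = Callable[[Message], bool]


def run_filter(msg: Message, qualify: Qualifier,
               action: str = "label") -> Optional[Message]:
    """Apply qualification, then the chosen action.

    Returns the message (possibly labeled), or None if it was
    disposed of.
    """
    if not qualify(msg):
        return msg                      # Negative: pass unchanged
    if action == "label":
        return Message(msg.sender, "[SPAM] " + msg.subject, msg.body)
    return None                         # any other action: dispose


def tally(results: Iterable[Tuple[bool, bool]]) -> dict:
    """Count outcomes from (test_was_positive, really_spam) pairs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for positive, really_spam in results:
        if positive:
            counts["TP" if really_spam else "FP"] += 1
        else:
            counts["FN" if really_spam else "TN"] += 1
    return counts
```

Measuring FP and FN counts requires messages whose true
status is already known, for example a hand-sorted
sample.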
4.1. Policies
The simplest model for an assessment list is to have
entries containing a single, simple attribute, such as
sender email address or source system IP address or
domain name.
Standards Opportunity:
1. Control protocol between recipient and filtering
service server, to permit specifying policies and
specific rules.
2. Modify SMTP delivery status notifications to avoid
flooding innocent mailboxes because of forged senders.
[Needs clarification. /ed]
3. Codify best current practices of filters to minimize
sending DSN. [Cited by VS; needs clarification. /ed]
4. Codify DSN and SMTP status message wording, such as
saying that rejections resulting from filtering should
include a URL for an extended explanation. [Needs
clarification. /ed]
5. Replace SMTP.
The idea of replacing SMTP is appealing because it
permits thinking in terms of creating an infrastructure
that has accountability and restrictions built in.
Unfortunately an installed base the size of the
Internet is not likely to make such a change anytime
soon. It seems far more likely that successful spam
control mechanisms will be introduced as increments to
the existing Internet mail service.
4.2. Explicit Lists
The simplest method of testing is to have explicit
lists of simple identifier criteria, such as a From
address or IP address.
Pre-assessed senders are entered into a:
Whitelist: For automatic Acceptance
Blacklist: For automatic Rejection.
One approach to maintaining Whitelists and Blacklists
is to make explicit entries into them, manually. This
is often what a spam control service will propagate to
its subscribers. Most such services are for
Blacklisting "known" spammers.
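A minimal sketch of list-based qualification follows.
The function name classify and the rule that a Whitelist
match takes precedence over a Blacklist match are
assumptions made for the example, not requirements
stated in this note. List entries are assumed to be
stored in lower case.

```python
def classify(sender: str, source_ip: str,
             whitelist: set, blacklist: set) -> str:
    """Return "accept", "reject" or "unknown" for one message.

    Each list holds simple identifiers: sender addresses
    (lower case) and/or source IP addresses.
    """
    keys = {sender.lower(), source_ip}
    if keys & whitelist:
        return "accept"     # automatic Acceptance
    if keys & blacklist:
        return "reject"     # automatic Rejection
    return "unknown"        # leave the decision to other tests


whitelist = {"friend@example.org"}
blacklist = {"192.0.2.9"}
print(classify("Friend@Example.org", "192.0.2.1", whitelist, blacklist))
print(classify("someone@example.net", "192.0.2.9", whitelist, blacklist))
```

A message that matches neither list still needs some
other form of assessment.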
A difficulty with listing services is the set of
criteria used for adding and removing senders or sites.
These policies usually need to be explicit, objective
and documented, as well as consistently applied. Even
then they are attractive targets for lawsuits claiming
inappropriate listing.
For assessments based on the identity of the sender,
rather than the content of the message, another concern
is validation of the key attribute used for
identification. What if the value for that attribute
is set falsely? For example, what if email was not
sent by the address listed in the From field?
Standards Opportunity:
6. List format and exchange, to permit sharing
Whitelist and Blacklist entries
7. Format and access to filter logs, such as among MX
secondaries. [Suggested by VS; needs clarification. /ed]
4.3. Content Analysis
Filters look for message attributes, such as strings of
text in the headers or content of the message being
inspected. Other attributes include the address or
domain name of the originating system, or the
occurrence of the same message content in multiple
messages near the same time. Simple filters look for
any occurrence of specific strings. A more powerful
approach to content analysis looks for multiple sets of
these strings, assigns a score to each occurrence, and
then labels the message as spam according to the
aggregate score.
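The scoring approach can be sketched as follows. The
rule strings, weights and threshold are invented for
illustration; real rule sets are far larger and are
tuned against actual mail.

```python
# Each rule is (string to look for, score per occurrence).
RULES = [
    ("free money", 3.0),
    ("click here", 1.5),
    ("unsubscribe", 0.5),
]
THRESHOLD = 4.0   # assumed cut-off for labeling a message as spam


def score(text: str, rules=RULES) -> float:
    """Sum the rule scores over every occurrence in the text."""
    lowered = text.lower()
    return sum(weight * lowered.count(pattern)
               for pattern, weight in rules)


def is_spam(text: str, rules=RULES, threshold=THRESHOLD) -> bool:
    """Label by aggregate score rather than by any single string."""
    return score(text, rules) >= threshold
```

Lowering the threshold trades FNs for FPs, and raising
it does the reverse.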
Rule creation is done manually, or by a service, or by
analysis of a known corpus of messages. A service
observes email traffic at many Internet locations and
receives reports as recipients see new occurrences of
spam. The service then propagates new rules to its
subscribers. The analytic approach performs empirical
rule creation, using statistical (Bayesian) techniques
that discern string occurrences in known spam, versus
mail that is known not to be spam.
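Empirical rule creation from a known corpus can be
sketched with a small naive Bayes model. This is one
common statistical technique, shown under simplifying
assumptions (whitespace tokenization, add-one
smoothing); the function names train and spam_log_odds
are hypothetical, and both corpora must be non-empty.

```python
import math
from collections import Counter


def train(spam_docs, ham_docs):
    """Count word occurrences in known spam and known non-spam."""
    spam = Counter(w for d in spam_docs for w in d.lower().split())
    ham = Counter(w for d in ham_docs for w in d.lower().split())
    return {
        "spam": spam,
        "ham": ham,
        "vocab": set(spam) | set(ham),
        "n_spam": len(spam_docs),
        "n_ham": len(ham_docs),
    }


def spam_log_odds(model, text):
    """Positive result: more like the spam corpus; negative: less."""
    v = len(model["vocab"])
    spam_total = sum(model["spam"].values())
    ham_total = sum(model["ham"].values())
    log_odds = math.log(model["n_spam"] / model["n_ham"])
    for word in text.lower().split():
        if word not in model["vocab"]:
            continue                    # unseen words carry no evidence
        p_spam = (model["spam"][word] + 1) / (spam_total + v)
        p_ham = (model["ham"][word] + 1) / (ham_total + v)
        log_odds += math.log(p_spam / p_ham)
    return log_odds
```

Repeating train() as new mail is sorted is what keeps
such a rule set current.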
As rules become common, spammers adapt their messages
to bypass filters, so that existing rules quickly
become far less effective. Hence long-term filter use
must have a base of rules that is continually modified.
Empirical rule generation must be repeated, or must
operate continuously, analyzing all incoming mail.
Manual rule maintenance is simply not viable for
typical users; the effort is far too great. A concern
about services is that they are inherently post-hoc.
They are always updating the rule-set after an "attack"
commences, so that some spam is certain to reach some
recipients; however the view that a small amount of
spam is not dangerous mitigates this concern. Lastly,
methods using automated analysis rely on heuristics, or
guesses. They are certain to have some FNs that permit
real spam to reach the recipients, and some FPs that
incorrectly label legitimate mail as spam.
Any effective, long term filtering mechanism must have
automatic or semi-automatic rule creation and must
upgrade the set of rules continuously or periodically.
Standards Opportunity:
8. Rule format and exchange, to permit sharing
effective rules.
9. Sample message labeling and exchange, to permit
submission of candidate content to remote service
10. Hash-based identifier of content
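One possible shape for a hash-based content identifier
is sketched below. The normalization steps (collapsing
whitespace, folding case) and the choice of SHA-256 are
assumptions for the example; this note does not specify
either.

```python
import hashlib
import re


def content_id(body: str) -> str:
    """Return a hex digest that identifies a message body.

    Whitespace runs are collapsed and case is folded so that
    trivial reformatting does not change the identifier.
    """
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


print(content_id("Buy  NOW\n") == content_id("buy now"))
```

An exact hash like this changes completely when a sender
alters even one word, so it only detects copies that
differ in the normalized-away details.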
4.4. Negotiation
In addition to real-time analysis, a recipient may
engage in an explicit negotiation with the sender, to
validate them. When this is performed at the time of
message receipt, it is called a Challenge-Response (CR)
mechanism.
CR introduces delay in message receipt and creates at
least one additional email round-trip exchange for
every new sender/recipient pair. This is a substantial
burden both on participants and on the transit service.
Senders often refuse to respond to the challenge, so
that the mechanism dissuades senders from all but the
most urgent communications. Also the delay imposed by
CR can render time-sensitive messages useless.
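The recipient-side bookkeeping for a CR mechanism can be
sketched as follows. The class and method names are
hypothetical, and the actual sending of the challenge
message is left out.

```python
import secrets


class ChallengeResponse:
    def __init__(self):
        self.validated = set()   # senders who answered a challenge
        self.pending = {}        # token -> (sender, held messages)

    def receive(self, sender, message):
        """Return ("deliver", message) or ("challenge", token)."""
        if sender in self.validated:
            return ("deliver", message)
        for token, (held_sender, held) in self.pending.items():
            if held_sender == sender:
                held.append(message)      # already challenged: hold
                return ("challenge", token)
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, [message])
        return ("challenge", token)

    def respond(self, token):
        """Sender answered: validate them and release held mail."""
        if token not in self.pending:
            return []
        sender, held = self.pending.pop(token)
        self.validated.add(sender)
        return held
```

Mail from a new sender stays in pending until respond()
is called, which is the source of the delay described
above.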
As with other forms of Internet-based attack, effort is
often divided into two phases. The first assesses
details about the target and the second uses them. For
spam, the assessment phase of the process seeks to
discover valid email addresses. CR mechanisms suffer
from providing that validation.
Standards Opportunity:
11. CR protocol, to permit automated interaction
between the recipient's system and the sender's system.
4.5. Traffic Analysis
Spam is often referred to as "unsolicited bulk mail" to
highlight that senders typically post very large
amounts quickly. Opt-in (subscription) email also
demonstrates this traffic pattern. Still there is
benefit in measuring aggregate email behavior.
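A minimal sketch of one such measurement is a sliding
window count of messages per source. The class name,
window length and limit are assumptions for the example.

```python
from collections import defaultdict, deque


class TrafficMonitor:
    def __init__(self, window_seconds=60.0, limit=100):
        self.window = window_seconds
        self.limit = limit
        self.events = defaultdict(deque)   # source -> timestamps

    def record(self, source, timestamp):
        """Record one message from source.

        Returns True when the source has sent more than `limit`
        messages within the last `window_seconds`.
        """
        q = self.events[source]
        q.append(timestamp)
        while q and q[0] <= timestamp - self.window:
            q.popleft()                    # drop events outside window
        return len(q) > self.limit
```

Since subscription mail shows the same pattern, a True
result is a signal to combine with other tests, not a
verdict.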
Standards Opportunity:
12. Traffic reporting protocol, to permit collaboration
among independent administrations.
5. INFRASTRUCTURE ENHANCEMENT
Enhancement of underlying Internet services might
reduce the effectiveness of some spam transmission
mechanisms. For example many spammers prefer to send
to domain name service MX secondaries because
secondaries are often not as well filtered as MX
primaries. Because of the lack of MX secondary
coordination protocols, the best advice for all but
large sites is to stop using MX secondaries.
Standards Opportunity:
13. MX secondary coordination protocol. [Suggested by
VS; might need clarification. /ed]
14. Best Current Practices (BCP) documentation of
preferred MTA operation for spam control
15. BCPs for other services operating to control spam
Postal mail imposes a fee on the sender for each
message that is sent. Such a fee makes the cost of
sending significant, and proportional to the amount
sent. In contrast, current Internet mail is very
nearly free to the sender. Hence there is interest in
exploring "sender pays" email. One form of sender-pays
is identical to postal stamping. Another entails
"retribution" to the sender, taking the fee for their
posting only if the recipient indicates they were
unhappy to receive it.
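The accounting difference between the two forms can be
sketched with a toy ledger. Everything here is
hypothetical: the class name, the integer fee units, and
the "collected" total, which deliberately leaves open
who would receive the money.

```python
class SenderPaysLedger:
    """Toy accounting for the two sender-pays models."""

    def __init__(self, fee, refundable):
        self.fee = fee                # price per message
        self.refundable = refundable  # False: stamp; True: "retribution"
        self.balances = {}            # sender -> prepaid funds
        self.escrow = {}              # message id -> sender
        self.collected = 0            # fees kept (recipient unspecified)

    def deposit(self, sender, amount):
        self.balances[sender] = self.balances.get(sender, 0) + amount

    def post(self, sender, msg_id):
        """Charge for one message; False if funds are insufficient."""
        if self.balances.get(sender, 0) < self.fee:
            return False
        self.balances[sender] -= self.fee
        if self.refundable:
            self.escrow[msg_id] = sender   # hold until recipient decides
        else:
            self.collected += self.fee     # stamp: fee is always kept
        return True

    def settle(self, msg_id, recipient_unhappy):
        """Retribution model: keep the fee only on a complaint."""
        sender = self.escrow.pop(msg_id)
        if recipient_unhappy:
            self.collected += self.fee
        else:
            self.balances[sender] += self.fee
```

In both models the sender needs an identifiable, funded
account before posting.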
For both models, it is not clear that it is possible to
fit the necessary mechanisms to existing Internet mail.
The complete absence of fees from the current service,
and the existence of anonymous and free email services,
may create too much operational inertia. It is also not clear who
should accrue the revenues or how they should be
disbursed.
Standards Opportunity:
16. Billing and accounting protocols to obtain sender
fees and track them.
6. EVALUATING TECHNICAL APPROACHES
The complexity of Internet mail service and the nature
of spam make it difficult to evaluate proposals for
control mechanisms. In this section, the key technical
factors affecting viability are examined.
6.1. Adoption
A critical barrier to the success of a new mechanism is
the effort it takes to begin using it. It is essential
to look carefully at the adoption process.
What will it take for someone to start using the
proposed mechanism? What will it take for that person
to get some benefit from the mechanism? For example,
how many people and/or systems must adopt it before it
provides any benefit?
A key construct to this issue is "core-vs-edge". For
Internet-scale operations, adoption at the edge of a
system is typically easier and quicker than adoption in
the core. If a mechanism affects the core
(infrastructure) then it usually must be adopted by
most or all of the infrastructure before it provides
meaningful utility. In something the scale of the
Internet, it can take decades to reach that level of
adoption, if it ever does. For localized operations,
adoption in the core might be quicker, involving a
single administrative entity, rather than an array of
independent users.
Remember that the Internet comprises a massive number
of independent administrations, each with their own
politics and funding. What is important and feasible to
one might be neither to another. If the latter
administration is in the handling path for a spam, then
it will not have implemented the necessary control
mechanism. Worse, it might well not be possible to
change this. For example a proposal that requires a
brand new mail service is not likely to gain much
traction.
By contrast, some "edge" mechanisms provide utility to
the first one, two or three adopters who interact with
each other. No one else is needed for the adopters to
gain some benefit. Each additional adopter makes the
total system incrementally more useful. For example a
filter can be useful to the first recipient to adopt
it. A consent mechanism can be useful to the first two
or three adopters, depending upon the design of the
mechanism.
Obviously another concern is the effort it takes to
continue using the mechanism. That is, once a user has
chosen to make the change to adopt a mechanism, how
much effort does it take to use it regularly?
Equally, the impact on others is important. For
example, a challenge-response system is irritating for
the person being challenged, and it imposes extra delay
on the desired communication. If the originator and the
recipient both access the Internet only occasionally
(such as through dial-up when mobile) a challenge-
response model can impose days of delay. For some
communications, this can be disastrous.
6.2. Burden
The purpose of spam control is to cause some email to
fail to reach its intended destination. This is, of
course, directly at odds with the constructive goal of
email. Hence spam control alters the basic model of
email service.
Effective mechanisms will place some kind of burden on
senders and receivers. Hence a challenge for spam
control mechanisms is to require enough of a burden to
be effective, but not so much that it makes email
unacceptably painful to use. When evaluating
proposals, the nature and distribution of these burdens
must be considered carefully.
6.3. Scaling
How does the proposal scale? What happens if everyone
on the Internet engages in a particular behavior? What
if the Internet grows by a factor of 1000?
Remember that "everyone" is approximately 100 million
users today, and should be expected to grow to 10
billion, if we expect the Internet to be useful for
some decades. And it is likely there will be more email
users/accounts than there are people on the planet,
given that individuals and organizations occupy
multiple roles.
So, what will it be like for 100 million or 10 billion
users to employ the proposed mechanism?
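A rough way to approach the question is to put numbers
on one mechanism. The sketch below does so for
challenge-response, counting one challenge and one
response per new sender/recipient pair. The figure of
one new correspondent per user per day is an assumption
chosen only to make the arithmetic concrete.

```python
def extra_cr_messages_per_day(users, new_correspondents_per_day):
    """Added messages per day if every user employs CR.

    Each new sender/recipient pair costs one challenge plus
    one response, i.e. two extra messages.
    """
    return users * new_correspondents_per_day * 2


for users in (100_000_000, 10_000_000_000):
    print(users, extra_cr_messages_per_day(users, 1))
```

The added load grows linearly with the number of users,
so a cost that looks tolerable today is multiplied by
100 at the larger population.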
The other side of the scaling question is to ask how
much of the Internet will be affected by a proposal
and, therefore, how much spam will be controlled by it?
If a proposal requires substantial effort to adopt and
use, but will affect only a small percentage of spam,
the efficacy of that proposed mechanism is very much in
question. An obvious example of this concern is legal
scope, given that spam is global and there is no global
law enforcement.
6.4. Scenarios
Almost any proposal will make sense for a particular
scenario that is sufficiently constrained. The real
test is how the proposal works for other, likely
scenarios.
Make sure the proposal considers these likely cases
carefully. For example, citing the scenario of mailing
list participation is an excellent test. There are many
others.
7. SECURITY CONSIDERATIONS
This note discusses types of mechanisms for evaluating
and filtering email. As such it covers topics with
extremely sensitive security concerns. However it does
not propose any standards and therefore does not have
any direct security effects.
8. ACKNOWLEDGEMENTS
This note is motivated by discussions on the Anti-Spam
Research Group (ASRG) mailing list and draws a number
of points from discussion there. A number of Standards
Opportunity suggestions were taken from an ASRG posting
by Vernon Schryver. The sub-section "Burden" is taken
from a posting by Dave Hendricks.
9. AUTHORS' ADDRESSES
Dave Crocker
Brandenburg InternetWorking
675 Spruce Drive
Sunnyvale, CA 94086 USA
Tel: +1.408.246.8253
dcrocker@brandenburg.com
10. FULL COPYRIGHT STATEMENT
Copyright (C) The Internet Society (2003). All Rights
Reserved.
This document and translations of it may be copied and
furnished to others, and derivative works that comment
on or otherwise explain it or assist in its
implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice
and this paragraph are included on all such copies and
derivative works. However, this document itself may
not be modified in any way, such as by removing the
copyright notice or references to the Internet Society
or other Internet organizations, except as needed for
the purpose of developing Internet standards in which
case the procedures for copyrights defined in the
Internet Standards process must be followed, or as
required to translate it into languages other than
English.
The limited permissions granted above are perpetual and
will not be revoked by the Internet Society or its
successors or assigns.
This document and the information contained herein is
provided on an "AS IS" basis and THE INTERNET SOCIETY
AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.