One document matched: draft-ietf-marf-redaction-05.xml
<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfcXXXX.dtd'>
<rfc ipr="trust200902" docName="draft-ietf-marf-redaction-05" category="std">
<?rfc toc="yes" ?>
<?rfc tocompact="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc rfcprocack="yes" ?>
<front>
<title abbrev="Redaction">
Redaction of Potentially Sensitive Data from
Mail Abuse Reports
</title>
<author initials="J.D." surname="Falk" fullname="J.D. Falk"
role="editor">
<organization>Return Path</organization>
<address>
<postal>
<street>
100 Mathilda Place, Suite 100
</street>
<city>Sunnyvale</city>
<region>CA</region>
<code>94086</code>
<country>US</country>
</postal>
<email>ietf@cybernothing.org</email>
<uri>http://www.returnpath.net/</uri>
</address>
</author>
<author initials="M." surname="Kucherawy"
fullname="M. Kucherawy" role="editor">
<organization>Cloudmark</organization>
<address>
<postal>
<street>
128 King St., 2nd Floor
</street>
<city>San Francisco</city>
<region>CA</region>
<code>94107</code>
<country>US</country>
</postal>
<email>msk@cloudmark.com</email>
</address>
</author>
<date/>
<area>Applications</area>
<workgroup>MARF Working Group</workgroup>
<keyword>ARF</keyword>
<keyword>MARF</keyword>
<keyword>feedback loop</keyword>
<keyword>spam reporting</keyword>
<abstract>
<t> Email messages often contain information that
might be considered private or sensitive, per
either regulation or social norms. When such a
message becomes the subject of a report intended
to be shared with other entities, the report
generator may wish to redact or elide the
sensitive portions of the message. This memo
suggests one method for doing so effectively. </t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t> <xref target="ARF"/> defines a message format for
sending reports of abuse in the messaging
infrastructure, with an eye toward automating
both the generating and consumption of those
reports. </t>
<t> For privacy considerations it might be the policy
of a report generator to anonymize, or obscure,
portions of the report that might identify an end
user who caused the report to be generated.
This has come to be known in feedback loop
parlance as "redaction".
Precisely how this is done is unspecified in
<xref target="ARF"/> as it will generally be a
matter of local policy. That specification
does admonish generators against being too
over-zealous with this practice, as obscuring too
much data makes the report non-actionable. </t>
<t> Previous redaction practices, such as replacing
local-parts of addresses with a uniform string
like "xxxxxxxx", often frustrates any kind of
prioritizing or grouping of reports. </t>
<t> Generally, it is assumed that the
recipient-identifying fields of a message, when
copied into a report, are to be obscured to protect
the identity of the end user who submitted the
complaint about the message. However, it is also
presumed that other data will be left intact, and
that data could be correlated
against log files or other resources to determine
the intended recipient of the message. </t>
</section>
<section anchor="algorithm" title="Recommended Practice">
<t> When redacting of reports is desired, in order
to enable a report receiver to correlate reports
that might refer to a common but anonymous source,
the report generator SHOULD use the following
practice:
<list style="numbers">
<t> Select an arbitrary string that will be
used by an Administrative Management
Domain (ADMD) that generates reports.
This string will not be changed except
according to a key rotation policy or
similar. Call this the "redaction
key". The redaction key SHOULD be based
on at least 64 bits of pseudo-random
input. (See
<xref target="sec_not_redacted"/> and
<xref target="sec_key_protection"/> for
additional discussion.) </t>
<t> Identify string(s) (such as local-parts
of email addresses) in a message that need
to be redacted. Call these strings the
"private data". </t>
<t> For each piece of private data, construct
a new string that is a copy of the
redaction key with the private data
concatenated to it. </t>
<t> Compute a digest of each composite string
with any hashing/digest algorithm; a secure
hash such as one defined in
<xref target="FIPS-180-3-2008"/> or a
secure message digest algorithm based on
a secure hash is suggested.
(See <xref target="sec_not_redacted"/>
for discussion.) </t>
<t> Encode each digest with the base64
algorithm as defined in
<xref target="BASE64"/>. </t>
<t> Replace each instance of private data with
the corresponding base64-encoded hash when
generating the report. </t>
</list> </t>
<t> This has the effect of obscuring the data in an
irreversible way while still allowing the report
recipient to observe that numerous reports
are about one particular end user. Such detection
enables the receiver to prioritize its reactions
based on problems that appear to be focused on
specific end users that may be under attack. </t>
</section>
<section title="Security Considerations">
<section anchor="sec_general" title="General">
<t> General security issues with respect to these
reports are found in <xref target="ARF"/>. </t>
</section>
<section anchor="sec_collisions" title="Digest Collisions">
<t> Message digest collisions are a well-understood
issue. Their application here involves a
report receiver improperly concluding that two
pieces of redacted information were originally the
same when in fact they are not. This can lead
to a denial of service, where the inadvertently
improper application of complaint data causes
unjustified corrective action. Such cases are
sufficiently unlikely as to be of little
concern. </t>
</section>
<section anchor="sec_not_redacted"
title="Information Not Redacted">
<t> Although the identity of a report generator can
be redacted using this mechanism, other properties
of a message (such as the Message-ID field)
that are not redacted could be used to recover
the original data by locating them in the
message logs of the originating system or other
data correlation techniques. It is
incumbent on the report generator to anticipate
and redact or otherwise obscure such data, or
accept that such recovery is possible even from
the very simplest kinds of feedback. </t>
<t> It is for this reason that the normative portions
of this memo do not include stronger assertions
about minimum lengths for the redaction key or the
selection of particularly strong hashes. Given
the ultimate recoverability of the redacted
information, the cryptographic strength of the
hash and particularly long or unguessable keys are
not critical security measures. </t>
<t> The process of redacting a feedback report
satisfies a privacy requirement established by
local policy, and is not meant to provide strong
security properties. </t>
<t> <xref target="FBL-BCP"/> and Section 8 of
<xref target="ARF"/> discuss topics related to
establishment of bilateral agreements between
report producers and consumers. The issues
raised here are also things to be considered when
establishing such agreements. </t>
</section>
<section anchor="sec_key_protection" title="Key Management">
<t> As with any application that uses secret keys,
care must be taken to guard the redaction key
against compromise. If the key is no longer a
secret, recovering the redacted information
becomes a simple brute force attack. </t>
<t> Also, periodically changing the key is a means of
limiting the quantity of redacted information
that would be exposed by the compromise of a
single redaction key, and hence is advised.
However, a consideration when developing a key
rotation policy is that correlation of the redacted
form of the obscured information cannot occur
across use of different redaction keys. </t>
</section>
<section anchor="sec_not_complete"
title="Algorithm Vulnerabilities">
<t> The simple key-message hash method described in
<xref target="algorithm"/> is vulnerable to some
well known attacks that can be used to recover the
redaction key. This is especially important to
consider if obtaining the redaction key also
somehow creates an exposure for the report
generator in other ways. Although stronger
mechanisms like <xref target="HMAC"/> close these
loopholes, it is believed that such extra
hardening is unnecessary given the discussion
above. </t>
<t> Future work that seeks to obscure private data in
some way should not presume that this mechanism is
sufficient. It solves a simple policy requirement
for this specific use case, and is not a reliable
security mechanism for general use. </t>
</section>
</section>
<section title="Privacy Considerations">
<t> While the method of redaction described in this
document may reduce the likelihood of some types
of private data from leaking between ADMDs, it is
extremely unlikely that report generation software
could ever be created to recognize all of the
different ways that private information could be
expressed through human written language. If
further protections are required, implementers may
wish to consider establishing some sort of
out-of-band arrangements between the relevant
entities to contain private data as much as
possible. </t>
</section>
<section title="IANA Considerations">
<t> This memo includes no request to IANA. </t>
<t> [RFC Editor note: This section may be removed prior
to publication.] </t>
</section>
</middle>
<back>
<references title="Normative References">
<reference anchor='ARF'>
<front>
<title> An Extensible Format for Email
Feedback Reports </title>
<author initials='Y.'
surname='Shafranovich'
fullname='Y. Shafranovich'>
<organization />
</author>
<author initials='J.' surname='Levine'
fullname='J. Levine'>
<organization />
</author>
<author initials='M.'
surname='Kucherawy'
fullname='M. Kucherawy'>
<organization />
</author>
<date year='2010' month='August' />
</front>
<seriesInfo name='RFC' value='5965' />
</reference>
<reference anchor='BASE64'>
<front>
<title abbrev='BASE64'>
The Base16, Base32, and Base64
Data Encodings
</title>
<author initials='S.'
surname='Josefsson'
fullname='S. Josefsson'>
<organization>
SJD
</organization>
</author>
<date year='2006' month='October' />
</front>
<seriesInfo name='RFC' value='4648' />
</reference>
</references>
<references title="Informative References">
<reference anchor='FBL-BCP'>
<front>
<title abbrev='FBL Recommendations'>
Complaint Feedback Loop
Operational Recommendations
</title>
<author initials='J.D.' surname='Falk'
fullname='J.D. Falk'>
<organization>
Messaging Anti-Abuse
Working Group
</organization>
</author>
<date year='2011' month='November' />
</front>
<seriesInfo name='RFC' value='6449' />
</reference>
<reference
anchor="FIPS-180-3-2008">
<front>
<title>Secure Hash Standard</title>
<author
fullname="U.S. Department of Commerce"
surname="U.S. Department of Commerce" />
<date
month="October"
year="2008" />
</front>
<seriesInfo
name="FIPS PUB"
value="180-3" />
</reference>
<reference anchor='HMAC'>
<front>
<title abbrev='HMAC'>
HMAC: Keyed-Hashing for
Message Authentication
</title>
<author initials='H.'
surname='Krawczyk'
fullname='H. Krawczyk'>
<organization>
IBM
</organization>
</author>
<author initials='M.'
surname='Bellare'
fullname='M. Bellare'>
<organization>
UCSD
</organization>
</author>
<author initials='R.'
surname='Canetti'
fullname='R. Canetti'>
<organization>
IBM
</organization>
</author>
<date year='1997' month='February' />
</front>
<seriesInfo name='RFC' value='2104' />
</reference>
</references>
<section title="Example">
<t> Assume the following input message:
<figure><artwork>
From: alice@example.com
To: bob@example.net
Subject: Make money fast!
Message-ID: <123456789@mailer.example.com>
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
</artwork></figure> </t>
<t> On receipt, bob@example.net reports this message as abusive
through whatever mechanism his mailbox provider has
established. This causes an <xref target="ARF"/> message to
be generated. However, example.net wishes to obscure Bob's
email address lest it be relayed to the offending agent, which
could lead to more trouble for Bob. </t>
<t> Thus, example.net plans to redact the local-part of the recipient
address in the To: field. It has selected a redaction key of
"potatoes", and the private data in this case is the string
"bob". The concatenation of "potatoesbob" is digested with SHA1
and then base64-encoded to the string
"rZ8cqXWGiKHzhz1MsFRGTysHia4=". </t>
<t> Thus, when constructing the ARF message in response to Bob's
complaint, the following form of the received message is used in
the third part of the ARF report:
<figure><artwork>
From: alice@example.com
To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
Subject: Make money fast!
Message-ID: <123456789@mailer.example.com>
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
</artwork></figure> </t>
<t> Note, however, that it is possible the redacted information can
be recovered by agents at example.com by searching their logs for
the original envelope associated with the message by correlating
with the Message-ID contents, which were not redacted here. It
is expected that feedback loops generating such reports involve
senders that have been vetted against such information
leakage. </t>
</section>
<section title="Acknowledgements">
<t> Much of the text in this document was initially
moved from other MARF working group documents,
crafted by Murray S. Kucherawy with contributions
from Monica Chew, Tim Draegen, Michael Adkins, and
myself. Additional feedback was provided by
John Levine, S. Moonesamy, Alessandro Vesely, and
Mykyta Yevstifeyev. </t>
</section>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 05:40:11 |