<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfcXXXX.dtd'>
<rfc ipr="trust200902" docName="draft-ietf-marf-redaction-07" category="std">
	<?rfc toc="yes" ?>
	<?rfc tocompact="yes" ?>
	<?rfc symrefs="yes" ?>
	<?rfc sortrefs="yes" ?>
	<?rfc compact="yes" ?>
	<?rfc rfcprocack="yes" ?>
	
	<front>
		<title abbrev="Redaction">
			Redaction of Potentially Sensitive Data from
			Mail Abuse Reports
		</title>
		<author initials="J.D." surname="Falk" fullname="J.D. Falk"
			role="editor">
			<organization>Return Path</organization>
			<address>
				<postal>
					<street>
						100 Mathilda Place, Suite 100
					</street>
					<city>Sunnyvale</city>
					<region>CA</region>
					<code>94086</code>
					<country>US</country>
				</postal>
				<email>ietf@cybernothing.org</email>
				<uri>http://www.returnpath.net/</uri>
			</address>
		</author>
		<author initials="M." surname="Kucherawy"
			fullname="M. Kucherawy" role="editor">
			<organization>Cloudmark</organization>
			<address>
				<postal>
					<street>
						128 King St., 2nd Floor
					</street>
					<city>San Francisco</city>
					<region>CA</region>
					<code>94107</code>
					<country>US</country>
				</postal>
				<email>msk@cloudmark.com</email>
			</address>
		</author>

		<date/>
		
		<area>Applications</area>
		<workgroup>MARF Working Group</workgroup>
		<keyword>ARF</keyword>
		<keyword>MARF</keyword>
		<keyword>feedback loop</keyword>
		<keyword>spam reporting</keyword>
		
		<abstract>
			<t> Email messages often contain information that
			    might be considered private or sensitive, per
			    either regulation or social norms.  When such a
			    message becomes the subject of a report intended
			    to be shared with other entities, the report
			    generator may wish to redact or elide the
			    sensitive portions of the message.  This memo
			    suggests one method for doing so effectively. </t>
		</abstract>
	</front>
	
	<middle>
		<section title="Introduction">
			<t> <xref target="ARF"/> defines a message format for
			    sending reports of abuse in the messaging
			    infrastructure, with an eye toward automating 
			    both the generation and consumption of those
			    reports. </t>
			
			<t> For privacy reasons, it might be the policy
			    of a report generator to anonymize, or obscure,
			    portions of the report that might identify an end
			    user who caused the report to be generated.
			    This has come to be known in feedback loop
			    parlance as "redaction".
			    Precisely how this is done is unspecified in
			    <xref target="ARF"/> as it will generally be a
			    matter of local policy.  That specification
			    does admonish generators against being
			    overzealous with this practice, as obscuring too
			    much data makes the report non-actionable. </t>

			<t> Previous redaction practices, such as replacing
			    local-parts of addresses with a uniform string
			    like "xxxxxxxx", frustrated any kind of
			    prioritizing or grouping of reports.  This memo
			    presents a practice for conducting redaction in
			    a manner that allows a report receiver to
			    detect that two reports were caused by the same
			    end user without revealing the identity of
			    that user.  That is, the report receiver can
			    compare redacted strings, such as obscured email
			    addresses, to determine that the original,
			    unredacted strings were identical, and thus that
			    the reports originally contained the same
			    address. </t>

			<t> Generally, it is assumed that the
			    recipient-identifying fields of a message, when
			    copied into a report, are to be obscured to protect 
			    the identity of the end user who submitted the
			    complaint about the message.  However, it is also
			    presumed that other data will be left intact, and
			    those data could be correlated against log files
			    or other resources to determine the intended
			    recipient of the original message. </t>
		</section>

		<section anchor="algorithm" title="Recommended Practice">
			<t> When redaction of reports is desired, in order
			    to enable a report receiver to correlate reports
			    that might refer to a common but anonymous source,
			    the report generator SHOULD use the following
			    practice:
			
			    <list style="numbers">
				<t> Select a transformation mechanism (see
				    <xref target="mechs"/>) that is
				    consistent (i.e., the same input string
				    produces the same output each time) and
				    reasonably collision-resistant (i.e., two
				    different inputs are unlikely to produce
				    the same output). </t>

				<t> Identify string(s) (such as local-parts
				    of email addresses) in a message that need
				    to be redacted.  Call these strings the
				    "private data". </t>
						
				<t> For each piece of private data, apply
				    the selected transformation mechanism. </t>

				<t> If the output of the transformation
				    can contain bytes that are not printable
				    ASCII, or if the output can include
				    characters not appropriate to replace the
				    private data directly, encode the output
				    with the base64 algorithm as defined in
				    Section 4 of <xref target="BASE64"/>, or
				    some similar translation to a form that is a
				    valid replacement in the original context.  For
				    example, replacing a local-part in an email
				    address with transformation output
				    containing an "@" character (ASCII 0x40)
				    or a space character (ASCII 0x20) is not
				    permitted by the specification for
				    local-part (<xref target="SMTP"/>), so the
				    transformation output needs to be encoded
				    as described. </t>
						
				<t> Replace each instance of private data with
				    the corresponding (possibly encoded)
				    transformation when generating the
				    report.  Note that the replaced text could
				    also be in a context that has constraints
				    such as length limits that need to be
				    observed. </t>
			    </list> </t>
			
			<t> This has the effect of obscuring the data (in a
			    potentially irreversible way) while still allowing
			    the report recipient to observe that numerous
			    reports are about one particular end user.  Such
			    detection enables the receiver to prioritize its
			    reactions based on problems that appear to be
			    focused on specific end users that may be
			    under attack. </t>
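
			<t> The steps above can be sketched as follows.  This is
			    a minimal illustration rather than a normative
			    implementation: the choice of SHA-256, the UTF-8
			    encoding of the input, and the function names
			    transform() and redact_address() are all assumptions
			    made for the example, since this memo leaves the
			    transformation to local policy.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib


def transform(private_data: str) -> str:
    """Steps 1, 3 and 4: digest the private data, then base64-encode it.

    SHA-256 is an assumption made for this sketch; the memo leaves the
    choice of transformation to local policy.
    """
    digest = hashlib.sha256(private_data.encode("utf-8")).digest()
    # Raw digest bytes are not printable ASCII, so encode them (step 4).
    return base64.b64encode(digest).decode("ascii")


def redact_address(address: str) -> str:
    """Steps 2 and 5: treat the local-part as private data and replace it."""
    local_part, _, domain = address.rpartition("@")
    return transform(local_part) + "@" + domain


if __name__ == "__main__":
    first = redact_address("bob@example.net")
    second = redact_address("bob@example.net")
    other = redact_address("carol@example.net")
    # Consistent: two reports about the same user carry the same string.
    print(first == second)
    # Different users are expected to yield different redacted strings.
    print(first == other)
```
]]></artwork></figure> </t>

			<t> In this sketch the encoded digest is always 44
			    characters long, which fits within the length limit
			    for a local-part; a transformation with longer
			    output would need to be checked against such limits
			    as noted in the final step. </t>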
		</section>
		
		<section anchor="mechs" title="Transformation Mechanisms">
			<t> This memo does not specify a particular
			    transformation mechanism as a requirement.
			    The interoperability that this memo seeks to
			    provide is enabled by the consistency of the
			    transformation. </t>

			<t> The issue of the security of the transformation,
			    frustrating attempts to reverse the transformation,
			    is a matter of local policy.  A continuum of
			    possible transformations exists, from trivial
			    ones such as rot13, CRC32 and base64, through
			    strong cryptographic encodings such as
			    <xref target="HMAC"/> and even full encryption, or
			    private transformations such as mapping an email
			    address to an internal customer number.  An
			    operator wishing to perform report redaction needs
			    to select a consistent transformation that
			    obscures the private data and is resilient to
			    attempts to extract the original data to the
			    extent required by local policy, keeping in mind
			    that the environment in which the transformation is
			    operating is not a highly secure one.  See
			    <xref target="sec_not_redacted"/> for further
			    details of this issue. </t>

			<t> An implementation MAY choose any transformation
			    that has a reasonably low likelihood of
			    collision. </t>
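
			<t> To illustrate the two ends of this continuum, the
			    following sketch contrasts a keyed transformation
			    based on <xref target="HMAC"/> with plain base64.
			    Both are consistent, but only the first resists
			    trivial reversal.  The use of SHA-256 as the
			    underlying hash, the key value, and the function
			    names are assumptions made for illustration only.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib
import hmac


def hmac_transform(key: bytes, private_data: str) -> str:
    """Keyed transformation: HMAC-SHA256 of the data, base64-encoded."""
    mac = hmac.new(key, private_data.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


def base64_transform(private_data: str) -> str:
    """Trivial transformation: consistent, but reversible by anyone."""
    return base64.b64encode(private_data.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    key = b"example-local-secret"  # hypothetical key, illustration only
    print(hmac_transform(key, "bob"))
    trivial = base64_transform("bob")
    print(trivial)
    # Anyone can undo the trivial transformation without a key.
    print(base64.b64decode(trivial).decode("utf-8"))
```
]]></artwork></figure> </t>

			<t> With the keyed variant, a receiver can still group
			    reports by comparing redacted strings, but cannot
			    confirm a guessed address without the generator's
			    key. </t>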
		</section>
						
		<section title="Security Considerations">
		 <section anchor="sec_general" title="General">
			<t> General security issues with respect to these
			    reports are found in <xref target="ARF"/>. </t>
		 </section>

		 <section anchor="sec_collisions" title="Digest Collisions">
			<t> Message digest collisions are a well-understood
			    issue.  Their application here involves a
			    report receiver improperly concluding that two
			    pieces of redacted information were originally the
			    same when in fact they are not.  This can lead
			    to a denial of service, where the inadvertently
			    improper application of complaint data causes
			    unjustified corrective action.  Such cases are
			    sufficiently unlikely as to be of little
			    concern. </t>
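
			<t> How unlikely a collision is depends on the output
			    size of the chosen transformation.  The following
			    sketch estimates it with the standard birthday-bound
			    approximation, under the assumption that the
			    transformation behaves like a uniformly random
			    function; the function name and parameters are
			    hypothetical.

	<figure><artwork><![CDATA[
```python
import math


def collision_probability(distinct_inputs: int, output_bits: int) -> float:
    """Birthday-bound estimate: 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes outputs are uniformly distributed over 2**output_bits values.
    """
    pairs = distinct_inputs * (distinct_inputs - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** output_bits)


if __name__ == "__main__":
    # One million distinct addresses, 160-bit digest (e.g., SHA1 size).
    print(collision_probability(1_000_000, 160))
    # One hundred thousand distinct addresses, 32-bit output (e.g., CRC32).
    print(collision_probability(100_000, 32))
```
]]></artwork></figure> </t>

			<t> Under these assumptions, a 160-bit digest makes
			    collisions negligible at realistic volumes, whereas
			    a 32-bit output makes at least one collision likely
			    once the number of distinct inputs reaches the
			    hundreds of thousands. </t>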
		 </section>

		 <section anchor="sec_not_redacted"
		          title="Information Not Redacted">
			<t> Although the identity of the user causing a report
			    to be generated can be obscured using this
			    mechanism, other properties of a message (such as
			    the Message-ID field) that are not redacted could
			    be used to recover the original data by locating
			    them in the message logs of the originating system
			    or via other data correlation techniques.  It is
			    incumbent on the report generator to anticipate
			    and redact or otherwise obscure such data, or
			    accept that such recovery is possible even from
			    the very simplest kinds of feedback. </t>

			<t> It is for this reason that the normative portions
			    of this memo do not include stronger assertions
			    about cryptography used in the transformation.
			    Given the ultimate recoverability of the redacted
			    information, the cryptographic strength of the
			    transformation is not a critical security
			    measure. </t>

			<t> The process of redacting a feedback report
			    satisfies a privacy requirement established by
			    local policy, and is not meant to provide strong
			    security properties. </t>

			<t> <xref target="FBL-BCP"/> and Section 8 of
			    <xref target="ARF"/> discuss topics related to
			    establishment of bilateral agreements between
			    report producers and consumers.  The issues
			    raised here are also things to be considered when
			    establishing such agreements. </t>
		 </section>
		</section>

		<section title="Privacy Considerations">
			<t> While the method of redaction described in this
			    document may reduce the likelihood of some types
			    of private data leaking between ADministrative
			    Management Domains (ADMDs), it is
			    extremely unlikely that report generation software
			    could ever be created to recognize all of the
			    different ways that private information could be
			    expressed through human written language.  If
			    further protections are required, implementers may
			    wish to consider establishing some sort of
			    out-of-band arrangements between the relevant
			    entities to contain private data as much as
			    possible. </t>
		</section>
		
		<section title="IANA Considerations">
			<t> This memo includes no request to IANA. </t>
			<t> [RFC Editor note: This section may be removed prior
			    to publication.] </t>
		</section>
	</middle>
	
	<back>
		<references title="Normative References">
			<reference anchor='ARF'>
				<front>
					<title> An Extensible Format for Email
					        Feedback Reports </title>
					<author initials='Y.'
					        surname='Shafranovich'
					        fullname='Y. Shafranovich'>
						<organization />
					</author>
					<author initials='J.' surname='Levine'
					        fullname='J. Levine'>
						<organization />
					</author>
					<author initials='M.'
					        surname='Kucherawy'
					        fullname='M. Kucherawy'>
						<organization />
					</author>
					<date year='2010' month='August' />
				</front>
				<seriesInfo name='RFC' value='5965' />
			</reference>

			<reference anchor='BASE64'>
				<front>
					<title abbrev='BASE64'>
						The Base16, Base32, and Base64
						Data Encodings
					</title>

					<author initials='S.'
					        surname='Josefsson'
					        fullname='S. Josefsson'>
						<organization>
							SJD
						</organization>
					</author>
					<date year='2006' month='October' />
				</front>
				<seriesInfo name='RFC' value='4648' />
			</reference>
		</references> 

		<references title="Informative References">
			<reference anchor='FBL-BCP'>
				<front>
					<title abbrev='FBL Recommendations'>
						Complaint Feedback Loop
						Operational Recommendations
					</title>

					<author initials='J.D.' surname='Falk'
					        fullname='J.D. Falk'>
						<organization>
							Messaging Anti-Abuse
							Working Group
						</organization>
					</author>
					<date year='2011' month='November' />
				</front>
				<seriesInfo name='RFC' value='6449' />
			</reference>
			
			<reference anchor='HMAC'>
				<front>
					<title abbrev='HMAC'>
						HMAC: Keyed-Hashing for
						Message Authentication
					</title>

					<author initials='H.'
					        surname='Krawczyk'
					        fullname='H. Krawczyk'>
						<organization>
							IBM
						</organization>
					</author>

					<author initials='M.'
					        surname='Bellare'
					        fullname='M. Bellare'>
						<organization>
							UCSD
						</organization>
					</author>

					<author initials='R.'
					        surname='Canetti'
					        fullname='R. Canetti'>
						<organization>
							IBM
						</organization>
					</author>
					<date year='1997' month='February' />
				</front>
				<seriesInfo name='RFC' value='2104' />
			</reference>

			<reference anchor="SMTP">
				<front>
					<title> Simple Mail Transfer
					        Protocol </title>

					<author initials="J." surname="Klensin"
						fullname="J. Klensin">
						<organization/>
					</author>

					<date year="2008" month="October"/>

				</front>

				<seriesInfo name="RFC" value="5321"/>
			</reference>
		</references>

		<section title="Example">
			<t> Assume the following input message:

	<figure><artwork>
  From: alice@example.com
  To: bob@example.net
  Subject: Make money fast!
  Message-ID: &lt;123456789@mailer.example.com&gt;
  Date: Thu, 17 Nov 2011 22:19:40 -0500

  Want to make a lot of money really fast?  Check it out!
  http://www.example.com/scam/0xd0d0cafe
	</artwork></figure> </t>

	<t> On receipt, bob@example.net reports this message as abusive
	    through whatever mechanism his mailbox provider has
	    established.  This causes an <xref target="ARF"/> message to
	    be generated.  However, example.net wishes to obscure Bob's
	    email address lest it be relayed to the offending agent, which
	    could lead to more trouble for Bob. </t>

	<t> Thus, example.net plans to redact the local-part of the recipient
	    address in the To: field.  Local policy and security requirements
	    suggest that the algorithm known as "H" (a hash of a key
	    concatenated with the data to be obscured) using SHA1 is adequate.
	    The operator has thus selected a redaction key of "potatoes", and
	    the private data in this case is the string "bob".  The
	    concatenation "potatoesbob" is digested with SHA1 and then
	    base64-encoded to the string "rZ8cqXWGiKHzhz1MsFRGTysHia4=". </t>

	<t> Therefore, when constructing the ARF message in response to Bob's
	    complaint, the following form of the received message is used in
	    the third part of the ARF report:

	<figure><artwork>
  From: alice@example.com
  To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
  Subject: Make money fast!
  Message-ID: &lt;123456789@mailer.example.com&gt;
  Date: Thu, 17 Nov 2011 22:19:40 -0500

  Want to make a lot of money really fast?  Check it out!
  http://www.example.com/scam/0xd0d0cafe
	</artwork></figure> </t>
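
	<t> The "H" computation described above might be sketched as
	    follows.  The function name h_redact(), the ASCII encoding of
	    the concatenated string, and the use of the standard base64
	    alphabet are assumptions; the example text does not spell out
	    these details, so this sketch is not claimed to reproduce the
	    exact string shown above.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib


def h_redact(key: str, private_data: str) -> str:
    """SHA1 over the key concatenated with the data, then base64."""
    # Encoding the concatenation as ASCII is an assumption of this sketch.
    digest = hashlib.sha1((key + private_data).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    redacted = h_redact("potatoes", "bob")
    # Build the redacted To: field for the copy of the message in the report.
    print("To: " + redacted + "@example.net")
```
]]></artwork></figure> </t>

	<t> A SHA1 digest is 20 bytes, so the encoded replacement is always
	    28 characters including one trailing "=" pad, regardless of the
	    length of the original local-part. </t>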

	<t> Note, however, that the redacted information could still be
	    recovered by agents at example.com searching their logs for
	    the original envelope associated with the message, by correlating
	    with the Message-ID contents, which were not redacted here.  It
	    is expected that feedback loops generating such reports involve
	    senders that have been vetted against such information
	    leakage. </t>
		</section>

		<section title="Acknowledgements">
			<t> Much of the text in this document was initially
			    moved from other MARF working group documents,
			    with contributions from Monica Chew, Tim Draegen,
			    Michael Adkins, and other members of the Messaging
			    Anti-Abuse Working Group.  Additional feedback was
			    provided by John Levine, S. Moonesamy, Alessandro
			    Vesely, and Mykyta Yevstifeyev. </t>
		</section>
	</back>
</rfc>
