<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfcXXXX.dtd'>
<rfc ipr="trust200902" docName="draft-ietf-marf-redaction-05" category="std">
	<?rfc toc="yes" ?>
	<?rfc tocompact="yes" ?>
	<?rfc symrefs="yes" ?>
	<?rfc sortrefs="yes" ?>
	<?rfc compact="yes" ?>
	<?rfc rfcprocack="yes" ?>
	
	<front>
		<title abbrev="Redaction">
			Redaction of Potentially Sensitive Data from
			Mail Abuse Reports
		</title>
		<author initials="J.D." surname="Falk" fullname="J.D. Falk"
			role="editor">
			<organization>Return Path</organization>
			<address>
				<postal>
					<street>
						100 Mathilda Place, Suite 100
					</street>
					<city>Sunnyvale</city>
					<region>CA</region>
					<code>94086</code>
					<country>US</country>
				</postal>
				<email>ietf@cybernothing.org</email>
				<uri>http://www.returnpath.net/</uri>
			</address>
		</author>
		<author initials="M." surname="Kucherawy"
			fullname="M. Kucherawy" role="editor">
			<organization>Cloudmark</organization>
			<address>
				<postal>
					<street>
						128 King St., 2nd Floor
					</street>
					<city>San Francisco</city>
					<region>CA</region>
					<code>94107</code>
					<country>US</country>
				</postal>
				<email>msk@cloudmark.com</email>
			</address>
		</author>

		<date/>
		
		<area>Applications</area>
		<workgroup>MARF Working Group</workgroup>
		<keyword>ARF</keyword>
		<keyword>MARF</keyword>
		<keyword>feedback loop</keyword>
		<keyword>spam reporting</keyword>
		
		<abstract>
			<t> Email messages often contain information that
			    might be considered private or sensitive, per
			    either regulation or social norms.  When such a
			    message becomes the subject of a report intended
			    to be shared with other entities, the report
			    generator may wish to redact or elide the
			    sensitive portions of the message.  This memo
			    suggests one method for doing so effectively. </t>
		</abstract>
	</front>
	
	<middle>
		<section title="Introduction">
			<t> <xref target="ARF"/> defines a message format for
			    sending reports of abuse in the messaging
			    infrastructure, with an eye toward automating
			    both the generation and consumption of those
			    reports. </t>
			
			<t> For privacy reasons, it might be the policy
			    of a report generator to anonymize, or obscure,
			    portions of the report that might identify an end
			    user who caused the report to be generated.
			    This has come to be known in feedback loop
			    parlance as "redaction".
			    Precisely how this is done is unspecified in
			    <xref target="ARF"/> as it will generally be a
			    matter of local policy.  That specification
			    does admonish generators against being
			    overzealous with this practice, as obscuring too
			    much data makes the report non-actionable. </t>

			<t> Previous redaction practices, such as replacing
			    local-parts of addresses with a uniform string
			    like "xxxxxxxx", often frustrate any kind of
			    prioritizing or grouping of reports. </t>

			<t> Generally, it is assumed that the
			    recipient-identifying fields of a message, when
			    copied into a report, are to be obscured to protect 
			    the identity of the end user who submitted the
			    complaint about the message.  However, it is also
			    presumed that other data will be left intact, and
			    that data could be correlated
			    against log files or other resources to determine
			    the intended recipient of the message. </t>
		</section>

		<section anchor="algorithm" title="Recommended Practice">
			<t> When redaction of reports is desired, in order
			    to enable a report receiver to correlate reports
			    that might refer to a common but anonymous source,
			    the report generator SHOULD use the following
			    practice:
			
			    <list style="numbers">
				<t> Select an arbitrary string that will be
				    used by an Administrative Management
				    Domain (ADMD) that generates reports.
				    This string will not be changed except
				    according to a key rotation policy or
				    similar.  Call this the "redaction
				    key".  The redaction key SHOULD be based
				    on at least 64 bits of pseudo-random
				    input.  (See
				    <xref target="sec_not_redacted"/> and
				    <xref target="sec_key_protection"/> for
				    additional discussion.) </t>
					
				<t> Identify string(s) (such as local-parts
				    of email addresses) in a message that need
				    to be redacted.  Call these strings the
				    "private data". </t>
						
				<t> For each piece of private data, construct
				    a new string that is a copy of the
				    redaction key with the private data
				    concatenated to it. </t>
						
				<t> Compute a digest of each composite string
				    with any hashing/digest algorithm; a secure
				    hash such as one defined in
				    <xref target="FIPS-180-3-2008"/> or a
				    secure message digest algorithm based on
				    a secure hash is suggested.
				    (See <xref target="sec_not_redacted"/>
				    for discussion.) </t>
						
				<t> Encode each digest with the base64
				    algorithm as defined in
				    <xref target="BASE64"/>. </t>
						
				<t> Replace each instance of private data with
				    the corresponding base64-encoded hash when
				    generating the report. </t>
			    </list> </t>
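
			<t> The steps above can be sketched in Python roughly
			    as follows.  This is an illustration under stated
			    assumptions, not a reference implementation: the
			    function names are hypothetical, SHA-256 is simply
			    one of the hashes defined in
			    <xref target="FIPS-180-3-2008"/>, and encoding the
			    private data as UTF-8 before hashing is a choice
			    this memo does not make.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib
import secrets


def make_redaction_key(num_bytes: int = 8) -> bytes:
    """Step 1: draw a key from 64 bits (8 bytes) of random input."""
    return secrets.token_bytes(num_bytes)


def redact(private_data: str, redaction_key: bytes) -> str:
    """Steps 3-5: key || private data, digest, then base64-encode."""
    # UTF-8 is an assumption; the memo does not specify an encoding.
    composite = redaction_key + private_data.encode("utf-8")
    digest = hashlib.sha256(composite).digest()
    return base64.b64encode(digest).decode("ascii")


def redact_address(address: str, redaction_key: bytes) -> str:
    """Steps 2 and 6: treat the local-part as private data, replace it."""
    local_part, _, domain = address.rpartition("@")
    return redact(local_part, redaction_key) + "@" + domain


key = make_redaction_key()
print(redact_address("bob@example.net", key))
```
]]></artwork></figure> </t>

			<t> Because the same key is reused until it is
			    rotated, calling redact_address() twice on the same
			    address yields the same output, which is what lets
			    a report receiver group reports. </t>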
			
			<t> This has the effect of obscuring the data in an
			    irreversible way while still allowing the report
			    recipient to observe that numerous reports
			    are about one particular end user.  Such detection
			    enables the receiver to prioritize its reactions
			    based on problems that appear to be focused on
			    specific end users that may be under attack. </t>
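
			<t> On the receiving side, such grouping might look
			    like the following sketch.  The recipient values
			    are placeholders standing in for base64-encoded
			    digests, and the function name is hypothetical;
			    only the counting idea is taken from the text
			    above.

	<figure><artwork><![CDATA[
```python
from collections import Counter

# Placeholder tokens standing in for base64-encoded digests.
redacted_recipients = [
    "TOKEN-A@example.net",
    "TOKEN-B@example.net",
    "TOKEN-A@example.net",
    "TOKEN-A@example.net",
    "TOKEN-C@example.net",
]


def rank_by_report_count(recipients):
    """Return (redacted recipient, report count) pairs, busiest first."""
    return Counter(recipients).most_common()


ranking = rank_by_report_count(redacted_recipients)
print(ranking[0])
```
]]></artwork></figure> </t>

			<t> Had every local-part been replaced with the same
			    uniform string, all five reports would have
			    collapsed into a single group and no ranking would
			    be possible. </t>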
		</section>
		
		<section title="Security Considerations">
		 <section anchor="sec_general" title="General">
			<t> General security issues with respect to these
			    reports are found in <xref target="ARF"/>. </t>
		 </section>

		 <section anchor="sec_collisions" title="Digest Collisions">
			<t> Message digest collisions are a well-understood
			    issue.  Their application here involves a
			    report receiver improperly concluding that two
			    pieces of redacted information were originally the
			    same when in fact they are not.  This can lead
			    to a denial of service, where the inadvertently
			    improper application of complaint data causes
			    unjustified corrective action.  Such cases are
			    sufficiently unlikely as to be of little
			    concern. </t>
		 </section>

		 <section anchor="sec_not_redacted"
		          title="Information Not Redacted">
			<t> Although the identity of a report generator can
			    be redacted using this mechanism, other properties
			    of a message (such as the Message-ID field)
			    that are not redacted could be used to recover
			    the original data by locating them in the
			    message logs of the originating system or other
			    data correlation techniques.  It is
			    incumbent on the report generator to anticipate
			    and redact or otherwise obscure such data, or
			    accept that such recovery is possible even from
			    the very simplest kinds of feedback. </t>

			<t> It is for this reason that the normative portions
			    of this memo do not include stronger assertions
			    about minimum lengths for the redaction key or the
			    selection of particularly strong hashes.  Given
			    the ultimate recoverability of the redacted
			    information, the cryptographic strength of the
			    hash and particularly long or unguessable keys are
			    not critical security measures. </t>

			<t> The process of redacting a feedback report
			    satisfies a privacy requirement established by
			    local policy, and is not meant to provide strong
			    security properties. </t>

			<t> <xref target="FBL-BCP"/> and Section 8 of
			    <xref target="ARF"/> discuss topics related to
			    establishment of bilateral agreements between
			    report producers and consumers.  The issues
			    raised here are also things to be considered when
			    establishing such agreements. </t>
		 </section>

		 <section anchor="sec_key_protection" title="Key Management">
			<t> As with any application that uses secret keys,
			    care must be taken to guard the redaction key
			    against compromise.  If the key is no longer a
			    secret, recovering the redacted information
			    becomes a simple brute force attack. </t>
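
			<t> To make that concrete, the following Python sketch
			    shows how an attacker holding a leaked key could
			    test guesses against a redacted value.  The key,
			    the candidate list, the function names, and the
			    use of SHA-1 are all illustrative assumptions.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib


def redact(private_data: bytes, redaction_key: bytes) -> str:
    """Key || private data, SHA-1 digest, base64-encoded."""
    digest = hashlib.sha1(redaction_key + private_data).digest()
    return base64.b64encode(digest).decode("ascii")


def recover(token: str, leaked_key: bytes, candidates):
    """Return the first candidate that redacts to token, else None."""
    for candidate in candidates:
        if redact(candidate, leaked_key) == token:
            return candidate
    return None


leaked_key = b"potatoes"                 # assumed to be compromised
token = redact(b"bob", leaked_key)       # value seen in a report
guesses = [b"alice", b"bob", b"carol"]   # attacker's candidate list
print(recover(token, leaked_key, guesses))
```
]]></artwork></figure> </t>

			<t> Local-parts are usually short and drawn from a
			    small space of likely values, so the candidate list
			    does not need to be large for such a search to
			    succeed. </t>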

			<t> Also, periodically changing the key is a means of
			    limiting the quantity of redacted information
			    that would be exposed by the compromise of a
			    single redaction key, and hence is advised. 
			    However, a consideration when developing a key
			    rotation policy is that correlation of the redacted
			    form of the obscured information cannot occur
			    across use of different redaction keys. </t>
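
			<t> The effect of rotation on correlation can be seen
			    in a small Python sketch.  The two key strings and
			    the function name are hypothetical, and SHA-1 is
			    used only for illustration.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib


def redact(private_data: str, redaction_key: str) -> str:
    """Key || private data, SHA-1 digest, base64-encoded."""
    composite = (redaction_key + private_data).encode("utf-8")
    return base64.b64encode(hashlib.sha1(composite).digest()).decode("ascii")


before_rotation = redact("bob", "key-in-use-before-rotation")
after_rotation = redact("bob", "key-in-use-after-rotation")

# The same private data no longer maps to the same redacted form.
print(before_rotation == after_rotation)
```
]]></artwork></figure> </t>

			<t> A receiver comparing reports generated on either
			    side of a rotation therefore sees two unrelated
			    values for the same end user. </t>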
		 </section>

		 <section anchor="sec_not_complete"
		          title="Algorithm Vulnerabilities">
			<t> The simple key-message hash method described in
			    <xref target="algorithm"/> is vulnerable to some
			    well-known attacks that can be used to recover the
			    redaction key.  This is especially important to
			    consider if obtaining the redaction key also
			    somehow creates an exposure for the report
			    generator in other ways.  Although stronger
			    mechanisms like <xref target="HMAC"/> close these
			    loopholes, it is believed that such extra
			    hardening is unnecessary given the discussion
			    above. </t>
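
			<t> For a generator that nevertheless wants the
			    stronger construction, an <xref target="HMAC"/>
			    variant might be sketched in Python as below.
			    This is not the practice recommended by this memo;
			    the function name, the choice of SHA-256, and the
			    UTF-8 encoding are assumptions made for
			    illustration.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib
import hmac


def redact_hmac(private_data: str, redaction_key: bytes) -> str:
    """HMAC-SHA-256 over the private data, base64-encoded."""
    mac = hmac.new(redaction_key, private_data.encode("utf-8"),
                   hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


key = b"example-redaction-key"   # illustrative key only
print(redact_hmac("bob", key) + "@example.net")
```
]]></artwork></figure> </t>

			<t> The output is still stable for a given key and
			    input, so report receivers can correlate on it in
			    the same way as with the simple method. </t>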

			<t> Future work that seeks to obscure private data in
			    some way should not presume that this mechanism is
			    sufficient.  It solves a simple policy requirement
			    for this specific use case, and is not a reliable
			    security mechanism for general use. </t>
		 </section>
		</section>

		<section title="Privacy Considerations">
			<t> While the method of redaction described in this
			    document may reduce the likelihood of some types
			    of private data leaking between ADMDs, it is
			    extremely unlikely that report generation software
			    could ever be created to recognize all of the
			    different ways that private information could be
			    expressed through human-written language.  If
			    further protections are required, implementers may
			    wish to consider establishing some sort of
			    out-of-band arrangements between the relevant
			    entities to contain private data as much as
			    possible. </t>
		</section>
		
		<section title="IANA Considerations">
			<t> This memo includes no request to IANA. </t>
			<t> [RFC Editor note: This section may be removed prior
			    to publication.] </t>
		</section>
	</middle>
	
	<back>
		<references title="Normative References">
			<reference anchor='ARF'>
				<front>
					<title> An Extensible Format for Email
					        Feedback Reports </title>
					<author initials='Y.'
					        surname='Shafranovich'
					        fullname='Y. Shafranovich'>
						<organization />
					</author>
					<author initials='J.' surname='Levine'
					        fullname='J. Levine'>
						<organization />
					</author>
					<author initials='M.'
					        surname='Kucherawy'
					        fullname='M. Kucherawy'>
						<organization />
					</author>
					<date year='2010' month='August' />
				</front>
				<seriesInfo name='RFC' value='5965' />
			</reference>

			<reference anchor='BASE64'>
				<front>
					<title abbrev='BASE64'>
						The Base16, Base32, and Base64
						Data Encodings
					</title>

					<author initials='S.'
					        surname='Josefsson'
					        fullname='S. Josefsson'>
						<organization>
							SJD
						</organization>
					</author>
					<date year='2006' month='October' />
				</front>
				<seriesInfo name='RFC' value='4648' />
			</reference>
		</references> 

		<references title="Informative References">
			<reference anchor='FBL-BCP'>
				<front>
					<title abbrev='FBL Recommendations'>
						Complaint Feedback Loop
						Operational Recommendations
					</title>

					<author initials='J.D.' surname='Falk'
					        fullname='J.D. Falk'>
						<organization>
							Messaging Anti-Abuse
							Working Group
						</organization>
					</author>
					<date year='2011' month='November' />
				</front>
				<seriesInfo name='RFC' value='6449' />
			</reference>
			
         <reference
            anchor="FIPS-180-3-2008">
            <front>
               <title>Secure Hash Standard</title>

               <author
                  fullname="U.S. Department of Commerce"
                  surname="U.S. Department of Commerce" />
               <date
                  month="October"
                  year="2008" />
            </front>
            <seriesInfo
               name="FIPS PUB"
               value="180-3" />
         </reference>

			<reference anchor='HMAC'>
				<front>
					<title abbrev='HMAC'>
						HMAC: Keyed-Hashing for
						Message Authentication
					</title>

					<author initials='H.'
					        surname='Krawczyk'
					        fullname='H. Krawczyk'>
						<organization>
							IBM
						</organization>
					</author>

					<author initials='M.'
					        surname='Bellare'
					        fullname='M. Bellare'>
						<organization>
							UCSD
						</organization>
					</author>

					<author initials='R.'
					        surname='Canetti'
					        fullname='R. Canetti'>
						<organization>
							IBM
						</organization>
					</author>
					<date year='1997' month='February' />
				</front>
				<seriesInfo name='RFC' value='2104' />
			</reference>
		</references>

		<section title="Example">
			<t> Assume the following input message:

	<figure><artwork>
  From: alice@example.com
  To: bob@example.net
  Subject: Make money fast!
  Message-ID: &lt;123456789@mailer.example.com&gt;
  Date: Thu, 17 Nov 2011 22:19:40 -0500

  Want to make a lot of money really fast?  Check it out!
  http://www.example.com/scam/0xd0d0cafe
	</artwork></figure> </t>

	<t> On receipt, bob@example.net reports this message as abusive
	    through whatever mechanism his mailbox provider has
	    established.  This causes an <xref target="ARF"/> message to
	    be generated.  However, example.net wishes to obscure Bob's
	    email address lest it be relayed to the offending agent, which
	    could lead to more trouble for Bob. </t>

	<t> Thus, example.net plans to redact the local-part of the recipient
	    address in the To: field.  It has selected a redaction key of
	    "potatoes", and the private data in this case is the string
	    "bob".  The concatenation "potatoesbob" is digested with SHA-1
	    and then base64-encoded to the string
	    "rZ8cqXWGiKHzhz1MsFRGTysHia4=". </t>

	<t> Thus, when constructing the ARF message in response to Bob's
	    complaint, the following form of the received message is used in
	    the third part of the ARF report:

	<figure><artwork>
  From: alice@example.com
  To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
  Subject: Make money fast!
  Message-ID: &lt;123456789@mailer.example.com&gt;
  Date: Thu, 17 Nov 2011 22:19:40 -0500

  Want to make a lot of money really fast?  Check it out!
  http://www.example.com/scam/0xd0d0cafe
	</artwork></figure> </t>
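
	<t> The digest step and the redacted To: field in this example can be
	    reproduced with a short Python sketch, which lets a reader check
	    the result against the string quoted above.  It assumes the key
	    and private data are encoded as ASCII before hashing, which this
	    memo does not specify, and the variable names are illustrative.

	<figure><artwork><![CDATA[
```python
import base64
import hashlib

redaction_key = "potatoes"
private_data = "bob"

# Step 3: the redaction key with the private data concatenated to it.
composite = redaction_key + private_data

# Steps 4 and 5: SHA-1 digest, then base64 encoding.
digest = hashlib.sha1(composite.encode("ascii")).digest()
token = base64.b64encode(digest).decode("ascii")

# Step 6: replace the local-part in the To: field.
redacted_to = "To: " + token + "@example.net"
print(redacted_to)
```
]]></artwork></figure> </t>

	<t> A SHA-1 digest is 20 bytes long, so its base64 encoding is always
	    28 characters ending in a single "=" pad character. </t>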

	<t> Note, however, that it is possible the redacted information can
	    be recovered by agents at example.com by searching their logs for
	    the original envelope associated with the message by correlating
	    with the Message-ID contents, which were not redacted here.  It
	    is expected that feedback loops generating such reports involve
	    senders that have been vetted against such information
	    leakage. </t>
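
	<t> The kind of lookup described above can be as simple as the
	    following Python sketch.  The delivery log, its contents, and the
	    function name are hypothetical; a real sender's logs would differ
	    in form, but the principle is the same.

	<figure><artwork><![CDATA[
```python
# Hypothetical sender-side log: Message-ID -> envelope recipient.
delivery_log = {
    "<123456789@mailer.example.com>": "bob@example.net",
    "<987654321@mailer.example.com>": "carol@example.net",
}


def recover_recipient(report_headers, log):
    """Look up the unredacted envelope recipient by Message-ID."""
    return log.get(report_headers.get("Message-ID"))


# Header fields as they appear in the redacted report.
report_headers = {
    "To": "rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net",
    "Message-ID": "<123456789@mailer.example.com>",
}
print(recover_recipient(report_headers, delivery_log))
```
]]></artwork></figure> </t>

	<t> No knowledge of the redaction key or the hash algorithm is needed
	    for this lookup, which is why unredacted fields matter more than
	    the strength of the digest. </t>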
		</section>

		<section title="Acknowledgements">
			<t> Much of the text in this document was initially
			    moved from other MARF working group documents,
			    crafted by Murray S. Kucherawy with contributions
			    from Monica Chew, Tim Draegen, Michael Adkins, and
			    myself.  Additional feedback was provided by
			    John Levine, S. Moonesamy, Alessandro Vesely, and
			    Mykyta Yevstifeyev. </t>
		</section>
	</back>
</rfc>
