One document matched: draft-iab-privacy-terminology-00.xml


<?xml version="1.0" encoding="US-ASCII"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY RFC2119 SYSTEM
    "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
    <!ENTITY RFC3325 SYSTEM
    "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3325.xml">
    <!ENTITY RFC6265 SYSTEM
    "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6265.xml">
    <!ENTITY I-D.iab-privacy-considerations SYSTEM
    "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.iab-privacy-considerations.xml">
    <!ENTITY I-D.iab-identifier-comparison SYSTEM
    "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.iab-identifier-comparison.xml">
]>
    
    
<rfc ipr="trust200902" category="info" docName="draft-iab-privacy-terminology-00.txt">
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
  <?rfc toc="yes" ?>
  <?rfc symrefs="yes" ?>
  <?rfc sortrefs="yes"?>
  <?rfc rfcedstyle="yes" ?>
  <?rfc subcompact="no"?>
  <?rfc compact="no"?>
  <?rfc strict="no"?>

  <front>
    <title>Privacy Terminology</title>
    <author initials="M." surname="Hansen" fullname="Marit Hansen">
      <organization>ULD Kiel</organization>
      <address>
            <email>marit.hansen@datenschutzzentrum.de</email>
          </address>
    </author>
    <author initials="H." surname="Tschofenig" fullname="Hannes Tschofenig">
      <organization>Nokia Siemens Networks</organization>
      <address>
            <postal>
              <street>Linnoitustie 6</street>
              <city>Espoo</city>
              <code>02600</code>
              <country>Finland</country>
            </postal>
            <phone>+358 (50) 4871445</phone>
            <email>Hannes.Tschofenig@gmx.net</email>
            <uri>http://www.tschofenig.priv.at</uri>
          </address>
    </author>
    <author initials="R." surname="Smith" fullname="Rhys Smith">
      <organization>JANET(UK)</organization>
      <address>
        <email>rhys.smith@ja.net</email>
      </address>
    </author>
    
    <date year="2012"/>

    <abstract>
    
    <t>Privacy is a concept that has been debated and argued throughout the last few millennia by all manner of people. Its most striking feature is that nobody seems able to agree upon a precise definition of what it actually is. In order to discuss privacy in any meaningful way a tightly defined context needs to be elucidated. The specific context of privacy used within this document is that of "personal data", any information relating to a data subject; a data subject is an identified natural person or a natural person who can be identified, directly or indirectly. This context is highly relevant since a lot of work within the IETF involves defining protocols that can potentially transport (either explicitly or implicitly) personal data.</t>
    
        <t>This document aims to establish a basic lexicon around privacy so that IETF contributors who wish to discuss privacy considerations within their work can do so using terminology consistent across the area.</t>
        
		<t>Note: This document is discussed at https://www.ietf.org/mailman/listinfo/ietf-privacy </t>
    </abstract>
  </front>
  <middle>

    <!-- **************************************************************************************** -->
    <section anchor="intro" title="Introduction">

<t>
Privacy is a concept that has been debated and argued throughout the last few millennia by all manner of people, including philosophers, psychologists, lawyers, and more recently, computer scientists. Its most striking feature is that nobody seems able to agree upon a precise definition of what it actually is. Every individual, every group, and every culture have their own different views and preconceptions about the concept - some mutually complimentary, some distinctly different. However, it is generally (but not unanimously!) agreed that the protection of privacy is "A Good Thing" and often, people only realize what it was when they feel that they have lost it.
</t>

<!-- <t>
Even within the specific content of computing and computer science, there are still many facets to privacy. For example, consideration of privacy in terms of personal information is distinctly different from consideration of privacy in a geographical information sense: in the former a loss of privacy might be framed as the uncontrolled release of personal information without the subject's consent, while in the latter it might be the ability to compute the location of an individual beyond a certain degree of accuracy.
</t>--> 

<t>
In order to discuss privacy in any meaningful way a tightly defined context needs to be elucidated. The specific context of privacy used within this document is that of "personal data", any information relating to a data subject; a (data) subject is an identified natural person or a natural person who can be identified, directly or indirectly. We 

A lot of work within the IETF involves defining protocols that can potentially transport personal data and can therefore either, by dint of design decisions when creating them, enable either privacy protection or result in privacy breaches. 
While identifiers and data elements communicated in protocols often do not assume a specific association with a human using the software. However, a protocol may help or simplify the re-identification of a natural person by the choice of their identifiers and other state that is established and communicated, particularly when information from various sources is combined and analyzed together. <!-- For example, multiple hosts used by different persons may be attached to an single Internet gateway within a household. From the Internet Service Provider point of view all these devices belong to a single person: the subscriber with whom a contract was established. In this specific context, discussions of privacy largely centre around the collection minimalization, the usage, and release of such personal data.--> 
</t>

<t>
Work in this area of privacy and privacy protection over the last few decades has centered on the idea of data minimization; it uses terminologies such as anonymity, unlinkability, unobservability, and pseudonymity. These terms are often used in discussions about the privacy properties of systems.
</t>

<t>
The core principal of data minimization is that the ability for
others to collect personal data should be removed or at least minimized
when this is either not desirable or when it cannot be entirely prevented.
<!--
The core principal of data minimization is that the ability for others to collect any personal data should be removed. Often, however, the collection of personal data cannot not be prevented entirely, in which case the goal is to minimize the amount of personal data that can be collected for a given purpose and to offer ways to control the dissemination of personal data. --> 
</t>

<t>
Data minimization is the only generic strategy to enhance individual privacy in cases where valid personal information is used since all valid personal data inherently provides some linkability. Other techniques have been proposed and implemented that aim to enhance privacy by providing misinformation (inaccurate or erroneous information, provided usually without conscious effort to mislead or deceive) or disinformation (deliberately false or distorted information provided in order to mislead or deceive). However, these techniques are out of scope for this document. 
</t>

<t>We use the term 'attacker' in this writeup to refer to an entity that violates the privacy expectations of a data subject. The attacker may not only be an entity that is external to the system but, in many cases, the attacker is actually one of the communication partners. When necessary we use the term initiator and responder to refer to the communication interaction of a protocol. This particular terminology is used to highlight that many protocols utilize bidirectional communication where both ends send and receive data. We assume that the attacker uses all information available to infer (probabilities of) his items of interest (IOIs). These IOIs may be attributes (and their values) of personal data, or may be actions such as who sent, or who received, which messages.</t>

<t>
This document aims to establish a basic lexicon around privacy so that IETF contributors who wish to discuss privacy considerations within their work (see <xref target="I-D.iab-privacy-considerations"/>) can do so using terminology consistent across areas. Note that it does not attempt to define all aspects of privacy terminology, rather it just establishes terms to some of the most common ideas and concepts. 
</t>

</section>

    <!-- **************************************************************************************** -->
    <section anchor="anonymity" title="Anonymity">
        <t>
            <list style="hanging">
                <t hangText="Definition:">Anonymity of a subject means that the attacker 
                    cannot sufficiently identify the subject within a set of subjects, the anonymity set.</t>
            </list>
        </t>
        <t>To enable anonymity of a subject, there always has to be an appropriate set of subjects with potentially the same attributes. The set of all possible subjects is known as the anonymity set, and membership of this set may vary over time.</t>
        
        <t>The set of possible subjects depends on the knowledge of the attacker. Thus, anonymity is relative with respect to the attacker. Therefore, an initiator may be anonymous (initiator anonymity) only within a set of potential initiators - their initiator anonymity set - which itself may be a subset of all subjects who may send a message. Conversely a responder may be anonymous (responder anonymity) only within a set of potential responders - their responder anonymity set. Both anonymity sets may be disjoint, may overlap, or may be the same. 
      </t>
      
       <t>
        <list style="empty">
          <t>As an example consider RFC 3325 (P-Asserted-Identity, PAI) <xref target="RFC3325"/>, an extension for the Session Initiation Protocol (SIP), that allows subjects, such as a VoIP caller, to instruct an intermediary he or she trusts not to populate the SIP From header field with its authenticated and verified identity. The recipient of the call, as well as any other entity outside the data subjects's trust domain, would therefore only learn that the SIP message (typically a SIP INVITE) was sent with a header field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather than the subject's address-of-record, which is typically thought of as the "public address" of the user, i.e., the data subject. When PAI is used the subject becomes anonymous within the initiator anonymity set that is populated by every subject making use of that specific intermediary.
            </t>
            <t>Note that this example assumes that other personal data cannot be inferred from the other SIP protocol payloads, which is a useful assumption to be made in the analysis of one specific protocol extension but not for analysis of an entire architecture.</t>
        </list>
      </t>
    </section>
    <!--
          ****************************************************************************************
        -->
    <section anchor="unlinkability" title="Unlinkability">
           <t>
        <list style="hanging">
          <t hangText="Definition:"> Unlinkability of two or more Items Of Interest (e.g.,
            subjects, messages, actions, ...) means that within a particular
            set of information, the attacker cannot 
            distinguish whether these IOIs are related or not (with a high enough degree of probability to be useful).</t>
        </list>
      </t>

      <t>Unlinkability of two (or more) messages may of course depend on whether their content is
        protected against the attacker. In the cases where this is not true, messages may only be unlinkable if we
        assume that the attacker is not able to infer information about the initiator or responder from the
        message content itself. It is worth noting that even if the content itself does not betray linkable information explicitly, deep semantical analysis of a message sequence can often detect certain characteristics which link them
        together, e.g., similarities in structure, style, use of some words or phrases, consistent
        appearance of some grammatical errors, etc.
      </t>
      
     <t>
      The unlinkability property can be considered as a more "fine-grained" version of anonymity 
      since there are many more relations where unlinkability might be an issue than just the relation
      of "anonymity" between subjects and IOIs. As such, it may sometimes be necessary to explicitly 
      state to which attributes anonymity refers to (beyond the subject to IOI relationship). An attacker 
      might get to know information on linkability of various messages while not necessarily reducing 
      anonymity of the particular subject. As an example an attacker, in spite of being able to link 
      all encrypted messages in a set of transactions, does not learn the identify of the subject who 
      is the source of the transactions. 
      </t>

      <t>
      There are several items of terminology heavily related to unlinkability:
      </t>
      
      <t>
        <list style="hanging">
      <t hangText="Definition:">We use the term "profiling" to mean learning information about a particular subject while that subject remains anonymous to the attacker. For example, if an attacker concludes that a subject 
      plays a specific computer game, reads specific news article on a website, and uploads certain videos, then the subjects activities have been profiled, even if the attacker is unable to identify that specific subject.</t>

     
     <t hangText="Definition:">"Relationship anonymity" of a pair of subjects means that sender and recipient (or each
        recipient in case of multicast) are unlinkable. The classical MIX-net <xref target="Chau81"/> 
        without dummy traffic is one implementation with just this property: The attacker sees who 
        sends messages when, and who receives messages when, but cannot figure out who is sending messages 
        to whom.
        </t>
        
      <t hangText="Definition:">The term "unlinkable session" refers the ability of the system to render 
      a set of actions by a subject unlinkable from one another over a sequence of protocol runs (sessions).
      This term is useful for cases where a sequence of interactions between an initiator and a responder is necessary for the application logic rather than a single-shot message. We refer to this as a session. When doing an analysis with respect to unlinkability we compare this session to a sequence of sessions to determine linkability.
      </t>
          <t hangText="Definition:">We use the term "fingerprinting" to refer to any parameter (or set of parameters) that an attacker can observe for the purpose of re-identification. Fingerprinting is a form of tracking by associating activities of a communication software at different times, potentially with different communication partners, but without explicitly sharing state information (as it would be the case with cookies <xref target="RFC6265"/>). For example, the Panopticlick project by the Electronic Frontier Foundation uses parameters an HTTP-based Web browser shares with sites it visits to determine the uniqueness of the browser <xref target="panopticlick"/>.
          </t>
      </list> 
      </t>

    </section>

    <!--
          ****************************************************************************************
        -->
    <section anchor="undetectability" title="Undetectability">
      <t>
        <list style="hanging">
          <t hangText="Definition:"> Undetectability of an item of interest (IOI) means that the attacker cannot sufficiently distinguish whether it exists or
            not.</t>
        </list>
      </t>

      <t>In contrast to anonymity and unlinkability, where the IOI is protected indirectly through protection of the IOI's relationship to a subject or other IOI, undetectability is the direct protection of an IOI.
          For example, undetectability can be regarded as a possible and desirable property of steganographic
        systems.</t>


      <t>If we consider messages as IOIs, then undetectability means that messages are not sufficiently discernible
        from, e.g., "random noise".</t>

    </section>


    <!--
          ****************************************************************************************
        -->

<!--
    <section anchor="known-mechs"
      title="Known Mechanisms for Anonymity, Undetectability, and Unobservability">

      <t>Before it makes sense to speak about any particular mechanisms for anonymity,
        undetectability, and unobservability in communications, let us first remark that all of them
        assume that stations of users do not emit signals the attacker considered is able to use for
        identification of stations or their behavior or even for identification of users or their
        behavior. So if you travel around taking with you a mobile phone sending more or less
        continuously signals to update its location information within a cellular radio network,
        don't be surprised if you are tracked using its signals. If you use a computer emitting lots
        of radiation due to a lack of shielding, don't be surprised if observers using high-tech
        equipment know quite a bit about what's happening within your machine. If you use a
        computer, PDA, or smartphone without sophisticated access control, don't be surprised if
        Trojan horses send your secrets to anybody interested whenever you are online - or via
        electromagnetic emanations even if you think you are completely offline.</t>

      <t>DC-net <xref target="Chau85"/>, <xref target="Chau88"/>, and MIX-net <xref target="Chau81"
        /> are mechanisms to achieve sender anonymity and relationship anonymity, respectively, both
        against strong attackers. If we add dummy traffic, both provide for the corresponding
        unobservability <xref target="PfPW91"/>. If dummy traffic is used to pad sending and/or
        receiving on the sender's and/or recipient's line to a constant rate traffic, MIX-nets can
        even provide sender and/or recipient anonymity and unobservability. </t>

      <t>Broadcast <xref target="Chau85"/>, <xref target="PfWa86"/>, <xref target="Waid90"/> and
        private information retrieval <xref target="CoBi95"/> are mechanisms to achieve recipient
        anonymity against strong attackers. If we add dummy traffic, both provide for recipient
        unobservability.</t>
      <t> This may be summarized: A mechanism to achieve some kind of anonymity appropriately
        combined with dummy traffic yields the corresponding kind of unobservability.</t>
      <t> Of course, dummy traffic alone can be used to make the number and/or length of sent
        messages undetectable by everybody except for the recipients; respectively, dummy traffic
        can be used to make the number and/or length of received messages undetectable by everybody
        except for the senders. (Note: Misinformation and disinformation may be regarded as semantic
        dummy traffic, i.e., communication from which an attacker cannot decide which are real
        requests with real data or which are fake ones. Assuming the authenticity of misinformation
        or disinformation may lead to privacy problems for (innocent) bystanders.) </t>
      <t>As a side remark, we mention steganography and spread spectrum as two other well-known
        undetectability mechanisms.</t>
      <t> The usual concept to achieve undetectability of IOIs at some layer, e.g., sending
        meaningful messages, is to achieve statistical independence of all discernible phenomena at
        some lower implementation layer. An example is sending dummy messages at some lower layer to
        achieve, e.g., a constant rate flow of messages looking - by means of encryption - randomly
        for all parties except the sender and the recipient(s). </t>
    </section>

--> 
    <!--
          ****************************************************************************************
    -->

    <section anchor="pseudonymity" title="Pseudonymity">

      <t>
        <list style="hanging">
          <t hangText="Definition:">A pseudonym is an identifier of a subject other than one of the
            subject's real names. An identifier, as defined in 
        <xref target="id"/>, is "a lexical token that names entities".</t>
        </list>
      </t>
      <t>In the context of IETF protocols almost all identifiers are pseudonyms since there is typically no requirement 
      to use real names in protocols. However, in certain scenario it is reasonable to assume that real names 
      will be used and it will be worthwhile to point out this circumstance. </t>

      <t>For Internet protocols it is important whether protocols allow identifiers to be recycled dynamically, 
      what the lifetime of the pseudonyms are, to whom they get exposed, 
      how subjects are able to control disclosure, and how often 
      they can be changed over time (and what the consequences are when they are regularly changed). 
      These aspects are described in <xref target="I-D.iab-privacy-considerations"/>.</t>  

      <t>Achieving anonymity, unlinkability, and maybe undetectability may enable the ideal of  
        data minimization. Unfortunately, it would also prevent a certain class of useful two-way communication scenarios. 
        Therefore, for many
        applications, we need to accept a certain amount of linkability and detectability while attempting to retain unlinkability between the subject and their transactions. This is achieved through appropriate kinds of pseudonymous identifiers. These identifiers are then often used to refer 
        to established state or are used for access control purposes, see <xref target="I-D.iab-identifier-comparison"/>. </t>

      <t>The term 'real name' is the antonym to "pseudonym". There may be multiple real names
            over a lifetime -- in particular legal names. For example, a human being may possess the names which
            appear on their birth certificate or on other official identity documents issued by the
            State; for a legal person the name under which it operates and which is registered in
            official registers (e.g., commercial register or register of associations). A human
            being's real name typically comprises their given name and a family name. Note that
            from a mere technological perspective it cannot always be determined whether an
            identifier of a subject is a pseudonym or a real name.
      </t>

      <t>Additional useful terms are:</t>
      <t>
        <list style="hanging">
          <t hangText="Definition:">The "holder" of the pseudonym is the subject to whom the pseudonym refers.</t>
          <t hangText="Definition:">A subject is "pseudonymous" if a pseudonym is used as identifier
            instead of one of its real names.</t>
          <t hangText="Definition:">Pseudonymity is the state of remaining pseudonymous through the use of pseudonyms as identifiers.</t>
        </list>
      </t>

        <t>Sender pseudonymity is defined as the sender being pseudonymous, recipient pseudonymity
        is defined as the recipient being pseudonymous.</t>

      <t>Anonymity through the use of pseudonyms is stronger where ... <list style="symbols">
          <t> the less personal data of the pseudonym holder can be linked to the pseudonym;</t>
          <t> the less often and the less context-spanning pseudonyms are used and therefore the
            less data about the holder can be linked;</t>
          <t> the more often independently chosen 
            pseudonyms are used for new actions (i.e., making them, from an observer's perspective, unlinkable)</t>
        </list>
      </t>
      
      <!-- Example with a transaction pseudonym here. --> 
      
    </section>

    <!--
          ****************************************************************************************
        -->

<!-- 
    <section anchor="known-other" title="Known mechanisms and other properties of pseudonyms">
      <t>A digital pseudonym could be realized as a public key to test digital signatures where the
        holder of the pseudonym can prove holdership by forming a digital signature which is created
        using the corresponding private key <xref target="Chau81"/>. The most prominent example for
        digital pseudonyms are public keys generated by the user himself/herself, e.g., using PGP.
        In using PGP, each user may create an unlimited number of key pairs by himself/herself (at
        this moment, such a key pair is an initially unlinked pseudonym), bind each of them to an
        e-mail address, self-certify each public key by using his/her digital signature or asking
        another introducer to do so, and circulate it.</t>

      <t>A public key certificate bears a digital signature of a so-called certification authority
        and provides some assurance to the binding of a public key to another pseudonym, usually
        held by the same subject. In case that pseudonym is the civil identity (the real name) of a
        subject, such a certificate is called an identity certificate. An attribute certificate is a
        digital certificate which contains further information (attribute values) and clearly refers
        to a specific public key certificate. Independent of certificates, attributes may be used as
        identifiers of sets of subjects as well. Normally, attributes refer to sets of subjects
        (i.e., the anonymity set), not to one specific subject.</t>

      <t>There are several other properties of pseudonyms related to their use, such as revocation, 
         lifetime of the pseudonym, non-transferability, frequency of pseudonym changeover, the 
         ability to reveal civil identities in case of abuse, etc. Some of the properties may require 
         extension of the digital pseudonym by attributes of some kind. The binding of attributes to 
         a pseudonym can be documented in an attribute certificate produced either by the holder 
         himself/herself or by a certification authority.
      </t>
    </section>
--> 
    

    <!--
          ****************************************************************************************
        -->

    <section anchor="acks" title="Acknowledgments">
      <t>Parts of this document utilizes content from <xref target="anon_terminology"/>, which had a long history starting in
        2000 and whose quality was improved due to the feedback from a number of people. The authors would like to thank Andreas Pfitzmann for his work on an earlier draft version of this 
       document.</t>
       
     <t>Within the IETF a number of persons had provided their feedback to this document. We would like to thank Scott Brim, Marc Linsner, Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Alissa Cooper, Scott Bradner, Nat Sakimura, Bjoern Hoehrmann, 
     David Singer, Dean Willis, Christine Runnegar, Lucy Lynch, Trend Adams, Mark Lizar, Martin Thomson, Josh Howlett, and Mischa Tuffield.      
     </t>
    </section>

    <!--
          ****************************************************************************************
        -->

    <section anchor="security" title="Security Considerations">
      <t>This document introduces terminology for talking about privacy within IETF specifications. Since 
      privacy protection often relies on security mechanisms then this document is also related to security 
      in its broader context.</t>
    </section> 

    <!--
          ****************************************************************************************
        -->

    <section anchor="iana" title="IANA Considerations">
      <t>This document does not require actions by IANA.</t>
    </section> 

    <!--
          ****************************************************************************************
        -->

  </middle>
  <back>
    <references title="Normative References">  
    
     &I-D.iab-privacy-considerations; 

      <reference anchor="id">
        <front>
          <title>Identifier - Wikipeadia</title>
          <author/>
          <date year="2011"/>
        </front>
        <seriesInfo name="Wikipedia" value=", URL: http://en.wikipedia.org/wiki/Identifier"/>
      </reference>

    </references>
    
    <references title="Informative References">
    
     &RFC3325; 
     &I-D.iab-identifier-comparison;
     &RFC6265;
     
     
     
     
      <reference anchor="panopticlick">
        <front>
          <title>How Unique Is Your Web Browser?</title>
          <author fullname="Peter Eckersley" initials="P." surname="Eckersley"> </author>
          <date year="2009"/>
        </front>
        <seriesInfo name="Electronig Frontier Foundation" value=", URL: https://panopticlick.eff.org/browser-uniqueness.pdf"/>
      </reference>

     
      <reference anchor="Chau81">
        <front>
          <title>Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms</title>
          <author fullname="David Chaum" initials="D." surname="Chaum"> </author>
          <date year="1981"/>
        </front>
        <seriesInfo name="Communications of the ACM" value=", 24/2, 84-88"/>
      </reference>
      
      
      <reference anchor="anon_terminology">
        <front>
          <title>A terminology for talking about privacy by data minimization: 
		Anonymity, Unlinkability, Undetectability, Unobservability, 
		Pseudonymity, and Identity Management</title>
          <author fullname="Andreas Pfitzmann" initials="A." surname="Pfitzmann"> </author>
          <author fullname="Andreas Pfitzmann" initials="A." surname="Pfitzmann"> </author>
          <date year="2010"/>
        </front>
        <seriesInfo name="URL: http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.34.pdf" value=", version 034"/>
      </reference>
   </references>
  
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-23 08:54:04