<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY draftIpfixArch PUBLIC "" "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-arch.xml">
<!ENTITY draftIpfixAs PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-as.xml'>
<!ENTITY draftIpfixArchitecture PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-architecture.xml'>
<!ENTITY draftIpfixRR PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-reducing-redundancy.xml'>
<!ENTITY rfc3917 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3917.xml'>
<!ENTITY rfc5101 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5101.xml">
<!ENTITY rfc5102 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5102.xml">
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
] >
<rfc ipr="full3978" category="exp" docName="draft-boschi-ipfix-anon-01.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<front>
<title abbrev="IP Flow Anonymisation Support">
IP Flow Anonymisation Support
</title>
<author initials="E." surname="Boschi" fullname="Elisa Boschi">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 57</phone>
<email>elisa.boschi@hitachi-eu.com</email>
</address>
</author>
<author initials="B." surname="Trammell" fullname="Brian Trammell">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 13</phone>
<email>brian.trammell@hitachi-eu.com</email>
</address>
</author>
<date month="July" day="14" year="2008"></date>
<area>Operations</area>
<workgroup>IPFIX Working Group</workgroup>
<abstract>
<t>This document describes anonymisation techniques for IP flow data. It
provides a categorization of common anonymisation schemes and defines the
parameters needed to describe them. It describes support for anonymisation
within the IPFIX protocol, providing the basis for the definition of
information models for configuring anonymisation techniques within an
IPFIX Metering or Exporting Process, and for reporting the technique in
use to an IPFIX Collecting Process.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The standardisation of an IP flow information export protocol <xref target="RFC5101"></xref> and associated representations removes a
technical barrier to the sharing of IP flow data across organizational
boundaries and with network operations, security, and research communities
for a wide variety of purposes. However, with wider dissemination come
greater risks to the privacy of the users of networks under measurement,
and to the security of those networks. While it is not a complete solution
to the issues posed by distribution of IP flow information, anonymisation
is an important tool for the protection of privacy within network
measurement infrastructures.</t>
<!-- Additionally, various jurisdictions define
data protection laws and regulations that flow measurement activities must
comply with, and anonymisation may be a part of such compliance [IMC07,
FloCon08]. -->
<t>This document presents a mechanism for representing anonymised data
within IPFIX and guidelines for using it. It begins with a categorization
of anonymisation techniques. It then describes the applicability of each
technique to commonly anonymisable fields of IP flow data, organized by
information element data type and semantics as in <xref target="RFC5102"></xref>; enumerates the parameters required by each of
the applicable anonymisation techniques; and provides guidelines for the
use of each of these techniques in accordance with best practices in data
protection. Finally, it specifies a mechanism for exporting anonymised
data and binding anonymisation metadata to templates using IPFIX
Options.</t>
<section title="IPFIX Protocol Overview">
<t>In the IPFIX protocol, { type, length, value } tuples are expressed
in templates containing { type, length } pairs, specifying which { value
} fields are present in data records conforming to the Template, giving
great flexibility as to what data is transmitted. Since Templates are
sent very infrequently compared with Data Records, this results in
significant bandwidth savings. Various different data formats may be
transmitted simply by sending new Templates specifying the { type,
length } pairs for the new data format. See <xref target="RFC5101"></xref> for more information.</t>
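<t>As a rough illustration of the { type, length } mechanism, the following
Python sketch packs one Template Record and one Data Record that conforms to
it. The Template ID of 256 and the selection of fields are assumptions made
for this example only; the Information Element numbers used (8 for
sourceIPv4Address, 12 for destinationIPv4Address, 2 for packetDeltaCount) are
those of the IPFIX information model. Message and Set headers are omitted.</t>
<figure><artwork><![CDATA[
```python
import socket
import struct

# (Information Element ID, field length in octets). The IDs come from the
# IPFIX information model; the choice of fields is only an example.
FIELDS = [(8, 4), (12, 4), (2, 8)]  # srcIPv4, dstIPv4, packetDeltaCount
TEMPLATE_ID = 256  # assumed value; IDs below 256 are reserved for Set IDs


def encode_template_record(template_id, fields):
    """Template Record: Template ID, Field Count, then {type, length} pairs."""
    record = struct.pack("!HH", template_id, len(fields))
    for element_id, length in fields:
        record += struct.pack("!HH", element_id, length)
    return record


def encode_data_record(src, dst, packets):
    """Data Record: only the values, in the order the template declares."""
    return (
        socket.inet_aton(src)
        + socket.inet_aton(dst)
        + struct.pack("!Q", packets)
    )


template = encode_template_record(TEMPLATE_ID, FIELDS)
data = encode_data_record("192.0.2.1", "198.51.100.7", 42)
```
]]></artwork></figure>
<t>The Data Record carries no type or length information of its own, which is
where the bandwidth saving described above comes from.</t>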
<t>The <xref target="RFC5102">IPFIX information model</xref> defines a
large number of standard Information Elements which provide the
necessary { type } information for Templates. The use of standard
elements enables interoperability among different vendors'
implementations. Additionally, non-standard enterprise-specific elements
may be defined for private use.</t>
</section>
<section title="IPFIX Documents Overview" anchor="intro-docs">
<t><xref target="RFC5101">"Specification of the IPFIX
Protocol for the Exchange of IP Traffic Flow Information"</xref>
and its associated documents
define the IPFIX Protocol, which provides network engineers and
administrators with access to IP traffic flow information.</t>
<t><xref target="I-D.ietf-ipfix-arch">"Architecture for IP Flow
Information Export"</xref> defines
the architecture for the export of measured IP flow information out of
an IPFIX Exporting Process to an IPFIX Collecting Process, and the
basic terminology used to describe the elements of this architecture,
per the requirements defined in <xref target="RFC3917">"Requirements
for IP Flow Information Export"</xref>. The IPFIX Protocol document
<xref target="RFC5101"></xref> then covers the details of the method for
transporting IPFIX Data Records and Templates via a congestion-aware
transport protocol from an IPFIX Exporting Process to an IPFIX
Collecting Process.</t>
<t><xref target="RFC5102">"Information Model for IP Flow
Information Export"</xref> describes the Information Elements used by IPFIX, including
details on Information Element naming, numbering, and data type
encoding. Finally, <xref target="I-D.ietf-ipfix-as">"IPFIX
Applicability"</xref> describes the various applications of the IPFIX
protocol and their use of information exported via IPFIX, and relates
the IPFIX architecture to other measurement architectures and
frameworks.</t>
<t>This document references the Protocol and Architecture documents for
terminology and extends the IPFIX Information Model to provide new
Information Elements for anonymisation metadata.</t>
</section>
</section>
<section title="Terminology">
<t>Terms used in this document that are defined in the Terminology section
of the <xref target="RFC5101">IPFIX Protocol</xref> document are to be
interpreted as defined there.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119">RFC
2119</xref>.</t>
</section>
<section title="Categorisation of Anonymisation Techniques">
<t>Anonymisation modifies a data set in order to
protect the identity of the people or entities described by the data set
from disclosure. With respect to network traffic data, anonymisation
generally attempts to preserve some set of properties of the network
traffic useful for a given application or applications, while ensuring the
data cannot be traced back to the specific networks, hosts, or users
generating the traffic.</t>
<t>Anonymisation may be broadly split into three categories:
generalisation, reversible substitution, and irreversible substitution. When
generalisation is used, identifying information is grouped into sets, and
a single value is used to represent every element of each set. In effect, this
causes multiple records to become indistinguishable, thereby aggregating
them together. Generalisation is an irreversible operation, in that the
information needed to identify a single record from its "generalised
value" is lost.</t>
<t>Substitution (or pseudonymisation) maps the real space of identifiers
or values into a separate, replacement space, using some
substitution function. If the substitution function is invertible or can
otherwise be reversed, then the substitution is reversible, and a real
identifier can be recovered from a given replacement identifier. Reversible
substitution keeps different elements distinguishable from each other: the number of different
elements in the real and the replacement space is the same.</t>
<t>Irreversible substitution results when a randomising or one-way function
is used to map the value space; real identifiers cannot be recovered in an
irreversible substitution. The number of different elements in the real and
replacement spaces is not necessarily the same.</t>
</section>
<section title="Anonymisation of IP Flow Data">
<t>Due to the restricted semantics of IP flow data, there is a relatively
limited set of specific anonymisation techniques applicable to flow data,
though each falls into the broad categories above. Each type of field that
may commonly appear in a flow record may have its own applicable specific
techniques.</t>
<t>While anonymisation is generally applied at the resolution of single
fields within a flow record, attacks against anonymisation use entire
flows and relationships between hosts and flows within a given data set.
Therefore, fields which may not necessarily be identifying by themselves
may be anonymised in order to increase the anonymity of the data set as a
whole.</t>
<t>Of all the fields in an IP flow record, only IP addresses directly
identify entities in the real world. Each IP address is associated with an
interface on a network host, and can potentially be identified with a
single user. Additionally, IP addresses are structured identifiers; that
is, partial IP address prefixes may be used to identify networks just as
full IP addresses identify hosts. This makes anonymisation of IP addresses
particularly important.</t>
<t>Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and ephemeral
ports, can be used to determine the operating system running on a host.
Relative data volumes by port can also be used to determine the host's
function (workstation, web server, etc.); this information can be used to
identify hosts and users.</t>
<t>While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Any given
network activity is recognizable by a pattern of relative time differences
and data volumes in the associated sequence of flows, even without host
address information. They can therefore be used to identify hosts and
users. Timestamps and counters are also vulnerable to traffic injection
attacks, where traffic with a known pattern is injected into a network
under measurement, and this pattern is later identified in the anonymised
data set. </t>
<t>The simplest and most extreme form of anonymisation, which can be
applied to any field of a flow record, is black-marker anonymisation, or
complete deletion of a given field. <!--Note that black-marker anonymisation
is equivalent to simply not exporting the field(s) in question.</t>
<t> -->While black-marker anonymisation completely protects the data in the
deleted fields from the risk of disclosure, it also reduces the utility of
the anonymised data set as a whole. Techniques that retain some
information while reducing (though not eliminating) the disclosure risk
will be extensively discussed in the following sections; note that the
techniques specifically applicable to IP addresses, timestamps, and counters
will be discussed in separate sections.</t>
<section title="IP Address Anonymisation">
<t>The following table gives an overview of the schemes for IP address
anonymisation described in this document and their categorization.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<ttcol align="left">Reversibility</ttcol>
<c>Truncation</c><c>Generalisation</c><c>N</c>
<c>Random Permutation</c><c>Substitution</c><c>Y/N</c>
<c>Prefix-preserving Pseudonymisation</c><c>Substitution</c><c>Y</c>
</texttable>
<t>Note that random permutations might be either reversible or not, depending
on the function used.</t>
<section title="Truncation">
<t>Truncation removes "n" of the least significant bits from an IP Address,
setting them to zero. For example, truncating 8 bits of an IPv4 address replaces
the address with that of the enclosing /24 network (a class C network in
classful terminology).</t>
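<t>A minimal sketch of truncation, using the Python standard library
"ipaddress" module, might look as follows. The function name and its
parameters are hypothetical and chosen for this example.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def truncate_address(address, n):
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    if not 0 <= n <= ip.max_prefixlen:
        raise ValueError("n must be between 0 and the address length in bits")
    all_ones = (1 << ip.max_prefixlen) - 1
    mask = all_ones ^ ((1 << n) - 1)
    # Rebuild with the same address class so that a small IPv6 value is not
    # mistaken for an IPv4 address.
    return str(type(ip)(int(ip) & mask))
```
]]></artwork></figure>
<t>With n = 8, both 192.0.2.77 and 192.0.2.200 map to 192.0.2.0, which shows
how truncation makes hosts on the same network indistinguishable.</t>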
</section>
<section title="Random Permutations">
<t>When random permutations are used, each IP Address is replaced with its
image under a random permutation of the set of possible IP Addresses. The permutation function
can be implemented using hash tables.</t>
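<t>One possible hash-table implementation is sketched below for IPv4. It
builds the permutation lazily, assigning a fresh random replacement the first
time each real address is seen. The class and method names are hypothetical,
and lazy construction is an assumption of this sketch rather than a
requirement of the technique.</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import secrets


class RandomPermutation:
    """Lazily built random one-to-one mapping over IPv4 addresses."""

    def __init__(self):
        self._forward = {}   # real address (int) -> replacement (int)
        self._used = set()   # replacements already handed out

    def anonymise(self, address):
        real = int(ipaddress.IPv4Address(address))
        if real not in self._forward:
            candidate = secrets.randbits(32)
            while candidate in self._used:   # keep the mapping one-to-one
                candidate = secrets.randbits(32)
            self._forward[real] = candidate
            self._used.add(candidate)
        return str(ipaddress.IPv4Address(self._forward[real]))
```
]]></artwork></figure>
<t>In this sketch the substitution is reversible for as long as the table is
retained, and irreversible once the table has been discarded.</t>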
</section>
<section title="Prefix-preserving Pseudonymisation">
<t>Prefix-preserving pseudonymisation preserves the structure of IP Addresses.
If two IP Addresses match on a prefix of "n" bits, their anonymised versions
will match on a prefix of "n" bits too.</t>
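<t>The prefix-preserving property can be obtained by deriving each output bit
from the corresponding input bit and a keyed pseudorandom function of the
bits that precede it. The sketch below does this for IPv4. The use of
HMAC-SHA-256, the function names, and the key handling are assumptions made
for illustration; this is not a vetted anonymisation scheme.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(address, key):
    """Map an IPv4 address so that shared prefixes stay shared."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i, bit in enumerate(bits):
        # One pseudorandom bit that depends on the key and on the
        # preceding i bits only, never on the bit being transformed.
        prefix = bits[:i].encode("ascii")
        digest = hmac.new(key, prefix, hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bit) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_length(a, b):
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```
]]></artwork></figure>
<t>Two addresses that agree on exactly "n" leading bits yield outputs that
agree on exactly "n" leading bits, because the first "n" flips are computed
from identical prefixes. Under this construction a holder of the key can
undo the mapping one bit at a time.</t>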
</section>
<!--[EDITOR'S NOTE: text from old section above follows:
Prefix-preserving anonymisation is a (generally irreversible)
substitution technique which has the additional property that the
structure of the IP address space is maintained in the anonymised
data.]
<t>[EDITOR'S NOTE: Is prefix-preserving anonymisation necessarily
reversible? Is scrambling? Some potential implementations are, some are
not. This is one of the reasons I'm not sure reversibility is a prime
category -bt]</t>
<t>[TODO: This section is incomplete; text here should expand
on the table.]</t> -->
</section>
<section title="Timestamp Anonymisation">
<t>[TODO: introductory text]</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<ttcol align="left">Reversibility</ttcol>
<c>Precision Degradation</c><c>Generalisation</c><c>N</c>
<c>Enumeration</c><c>Substitution</c><c>Y</c>
<c>Random Time Shifts</c><c>Substitution</c><c>Y</c>
</texttable>
<section title="Precision Degradation">
<t>Precision Degradation removes the most precise components of a timestamp,
treating all events occurring in each given interval (e.g. one millisecond
for millisecond level degradation) as simultaneous. This has the effect of
potentially collapsing many timestamps into one.
With this technique time precision is reduced, and sequencing may be lost,
but the approximate time at which each event happened is kept.
</t>
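<t>As a minimal sketch, precision degradation of a timestamp expressed in
milliseconds can be written as rounding down to the start of the enclosing
interval. The function name and the choice of rounding down (rather than to
the nearest interval boundary) are assumptions of this example.</t>
<figure><artwork><![CDATA[
```python
def degrade_precision(timestamp_ms, interval_ms):
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return timestamp_ms - (timestamp_ms % interval_ms)
```
]]></artwork></figure>
<t>With a 1000 ms interval, two flows starting 876 ms apart within the same
second receive the same timestamp, so their relative order can no longer be
recovered.</t>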
</section>
<section title="Enumeration">
<t>Enumeration keeps the chronological order in which events occurred while
eliminating time information. Timestamps are substituted by equidistant timestamps
(or numbers) starting from a randomly chosen start value.</t>
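<t>Enumeration might be sketched as follows. The function name, the default
step of 1, the range of the random start value, and the decision to give
equal timestamps the same replacement are all assumptions of this
example.</t>
<figure><artwork><![CDATA[
```python
import secrets


def enumerate_timestamps(timestamps, step=1, start=None):
    """Replace timestamps by equidistant values that keep their order."""
    if start is None:
        start = secrets.randbelow(2**32)   # randomly chosen start value
    # Assumption: equal timestamps share one replacement value.
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + step * rank[t] for t in timestamps]
```
]]></artwork></figure>
<t>The output preserves which event came first but says nothing about how
much time passed between events.</t>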
</section>
<section title="Random Time Shifts">
<t>Random Time Shifts keep the information on how far apart two events are
from each other. This is achieved by shifting all timestamps by the same
random offset. Note that random time shifts also preserve chronological order.</t>
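<t>A random time shift can be sketched as a single offset drawn once and
applied to every timestamp in the data set. The function name and the bound
of one day on the offset are assumptions of this example.</t>
<figure><artwork><![CDATA[
```python
import secrets


def shift_timestamps(timestamps, max_shift_ms=86_400_000, offset=None):
    """Shift all timestamps by one common random offset."""
    if offset is None:
        # Uniform in [-max_shift_ms, +max_shift_ms]; the bound is assumed.
        offset = secrets.randbelow(2 * max_shift_ms + 1) - max_shift_ms
    return [t + offset for t in timestamps]
```
]]></artwork></figure>
<t>Because the same offset is used throughout, differences between any two
timestamps are unchanged.</t>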
</section>
<!-- Therefore, a variety of anonymisation techniques are available,
including loss of precision (a form of generalisation), or noise
addition (substitution), which may or may not preserve the sequencing of
flows and relationships among volumes in the data set. -->
</section>
<section title="Counter Anonymisation">
<t>Counters (such as packet and octet volumes per flow) are subject to
fingerprinting and injection attacks against anonymisation, as
timestamps are, but relative magnitudes of activity can be useful for
certain analysis tasks. [TODO: more intro text]</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<ttcol align="left">Reversibility</ttcol>
<c>Precision Degradation</c><c>Generalisation</c><c>N</c>
<c>Binning</c><c>Generalisation</c><c>N</c>
<c>Random Noise Addition</c><c>Substitution</c><c>N</c>
</texttable>
<section title="Precision Degradation">
<t>As with precision degradation in timestamps, precision degradation of
counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships
between sizes of similarly-sized flows, but keeps relative magnitude
information.</t>
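<t>Removing lower-order bits of a counter can be sketched with two shifts.
The function name is hypothetical.</t>
<figure><artwork><![CDATA[
```python
def degrade_counter(value, bits):
    """Clear the `bits` lowest-order bits of a non-negative counter."""
    if value < 0 or bits < 0:
        raise ValueError("value and bits must be non-negative")
    return (value >> bits) << bits
```
]]></artwork></figure>
<t>With 8 bits removed, octet counts of 1300 and 1500 both become 1280,
while a count of 70000 becomes 69888 and remains clearly larger.</t>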
</section>
<section title="Binning">
<t>Precision degradation can be seen as a special case of binning; the
operation is identical, except that in precision degradation the counter
ranges are uniform, and in binning they need not be. For example, a common
counter binning scheme for packet counters could be to bin values 1-2
together, and values of 3 and above together, thereby separating potentially
completely-opened TCP connections from unopened ones. Binning schemes are
generally chosen to keep precisely the amount of information required in a
counter for a given analysis task.</t>
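<t>The packet counter example can be sketched with a sorted list of bin
lower bounds. Representing each bin by its lower bound is an assumption of
this example, as are the names used.</t>
<figure><artwork><![CDATA[
```python
import bisect

# Lower bounds of the bins, in ascending order. These edges reproduce the
# example in the text: counts 1-2 in one bin, 3 and above in another.
BIN_EDGES = [1, 3]


def bin_counter(value, edges=BIN_EDGES):
    """Replace a counter by the lower bound of the bin that contains it."""
    index = bisect.bisect_right(edges, value) - 1
    if index < 0:
        raise ValueError("value lies below the lowest bin")
    return edges[index]
```
]]></artwork></figure>
<t>Passing a different list of edges yields non-uniform bins suited to
another analysis task.</t>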
</section>
<section title="Random Noise Addition">
<t>Random noise addition adds a random amount to a counter in each flow;
this is used to keep relative magnitude information and minimize the
disruption to size relationship information while avoiding fingerprinting
attacks against anonymisation.</t>
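<t>A sketch of random noise addition follows. Bounding the noise by a
fraction of the counter value, so that relative magnitudes are roughly
preserved, is an assumption of this example. The 5 percent default and the
function name are likewise illustrative.</t>
<figure><artwork><![CDATA[
```python
import random

_RNG = random.SystemRandom()   # operating-system entropy source


def add_noise(value, max_fraction=0.05, rng=_RNG):
    """Perturb a counter by a random amount of at most max_fraction of it."""
    bound = max(1, int(value * max_fraction))
    noisy = value + rng.randint(-bound, bound)
    return max(0, noisy)   # counters cannot be negative
```
]]></artwork></figure>
<t>Since the noise is drawn independently for each flow, the original value
cannot be recovered from the anonymised one.</t>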
</section>
</section>
<section title="Anonymisation of Other Flow Fields">
<t>[TODO: as section 4.1]</t>
<!--<t>[EDITOR'S NOTE: Port Numbers go here. Counters might, if not above.
It might make sense to split this into flow key anonymisation versus
flow value anonymisation.]</t> -->
</section>
</section>
<section title="Parameters for the Description of Anonymisation Techniques">
<t>[TODO: see corresponding section of draft-ietf-psamp-sample-tech for
the proposed structure of this section.] </t>
</section>
<section title="Anonymisation Support in IPFIX">
<t>[TODO: Here we'll describe how the information specified above can be
transmitted on the wire using an option template. The idea is to scope the
option to the Template ID and for each field specify which are anonymised,
providing info on the output characteristics of the technique, and which
ones aren't.]</t>
<t>[EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the same
time is indicated with multiple elements of the same type (in application
order as in PSAMP)]</t>
<t>[EDITOR'S NOTE: for blackmarking we'll recommend not to export the
information at all following the data protection law principle that only
necessary information should be exported.]</t>
<!-- All this came from file. It is probably wrong. -->
<!--
<section anchor="ie-informationElementAnonymized"
title="informationElementAnonymized">
<list style="hanging">
<t hangText="Description: "> A description of the anonymization
status of an IPFIX information element within a template. If this
field is FALSE, the corresponding Information Element is not
anonymized; to the best ability of the Exporting Process to
determine, it represents a real value. If this field is TRUE, the
corresponding Information Element is anonymized; to the best
ability of the Exporting Process to determine, it represents a
value that has been transformed to maintain privacy. Note that if
no informationElementAnonymized is specified for an information
element, it is assumed to be FALSE, or not anonymized.</t>
<t hangText="Abstract Data Type: ">boolean</t>
<t hangText="ElementId: ">TBD2</t>
<t hangText="Status: ">current</t>
</list>
</section>
-->
<!--
<t>This is followed by the Template Anonymization records noting that
the source and destination IPv4 address described the data template
are anonymized, as shown in <xref target="ex-anon" /></t>
<figure title="File Example Template Anonymization" anchor="ex-anon">
<artwork><![CDATA[
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 260 | Length = 14 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| templateId | informationElementId |
| 256 | 8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| iEAnonymized | templateId | iEId
| true | 256 | 12 . . .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| iEAnonymized |
. . . | true |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
-->
</section>
<section title="Security Considerations">
<t>[TODO: write this section.]</t>
</section>
<section title="IANA Considerations">
<t>This document contains no actions for IANA.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc5101;
&rfc5102;
</references>
<references title="Informative References">
&draftIpfixArch;
&draftIpfixAs;
&draftIpfixArchitecture;
&draftIpfixRR;
&rfc3917;
&rfc2119;
</references>
</back>
</rfc>