<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY draftIpfixMedps PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-mediators-problem-statement.xml'>
<!ENTITY draftIpfixMedframe PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-mediators-framework.xml'>
<!ENTITY rfc3330 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3330.xml'>
<!ENTITY rfc3917 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3917.xml'>
<!ENTITY rfc5101 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5101.xml">
<!ENTITY rfc5102 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5102.xml">
<!ENTITY rfc5103 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5103.xml">
<!ENTITY rfc5153 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5153.xml">
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc5470 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5470.xml">
<!ENTITY rfc5472 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5472.xml">
<!ENTITY rfc5610 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5610.xml">
<!ENTITY rfc5655 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5655.xml">
] >
<rfc ipr="trust200902" category="exp" docName="draft-ietf-ipfix-anon-01.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<front>
<title abbrev="IP Flow Anonymisation Support">
IP Flow Anonymisation Support
</title>
<author initials="E." surname="Boschi" fullname="Elisa Boschi">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 57</phone>
<email>elisa.boschi@hitachi-eu.com</email>
</address>
</author>
<author initials="B." surname="Trammell" fullname="Brian Trammell">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 13</phone>
<email>brian.trammell@hitachi-eu.com</email>
</address>
</author>
<date month="November" day="19" year="2009"></date>
<area>Operations</area>
<workgroup>IPFIX Working Group</workgroup>
<abstract>
<t>This document describes anonymisation techniques for IP flow data and
the export of anonymised data using the IPFIX protocol. It provides a
categorization of common anonymisation schemes and defines the parameters
needed to describe them. It provides guidelines for the implementation of
anonymised data export and storage over IPFIX, and describes an
Options-based method for anonymisation metadata export within the
IPFIX protocol, providing the basis for the definition of information
models for configuring anonymisation techniques within an IPFIX Metering
or Exporting Process, and for reporting the technique in use to an IPFIX
Collecting Process.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The standardisation of an IP flow information export protocol <xref target="RFC5101"/> and associated representations removes a
technical barrier to the sharing of IP flow data across organizational
boundaries and with network operations, security, and research communities
for a wide variety of purposes. However, with wider dissemination come
greater risks to the privacy of the users of networks under measurement,
and to the security of those networks. Anonymisation (i.e., the deletion
or transformation of information that is considered sensitive and could
be used to reveal the identity of subjects involved in a communication)
is not a complete solution to the issues posed by the distribution of IP
flow information, but it is an important tool for the protection of
privacy within network measurement infrastructures.</t>
<t>This document presents a mechanism for representing anonymised data
within IPFIX and guidelines for using it. It begins with a categorization
of anonymisation techniques. It then describes the applicability of each
technique to commonly anonymisable fields of IP flow data, organized by
information element data type and semantics as in <xref target="RFC5102"></xref>; enumerates the parameters required by each of
the applicable anonymisation techniques; and provides guidelines for the
use of each of these techniques in accordance with best practices in data
protection. Finally, it specifies a mechanism for exporting anonymised
data and binding anonymisation metadata to templates using IPFIX
Options.</t>
<section title="IPFIX Protocol Overview">
<t>In the IPFIX protocol, { type, length, value } tuples are expressed
in Templates containing { type, length } pairs, specifying which
{ value } fields are present in Data Records conforming to the Template,
giving great flexibility as to what data is transmitted. Since Templates
are sent very infrequently compared with Data Records, this results in
significant bandwidth savings. Different data formats may be
transmitted simply by sending new Templates specifying the { type,
length } pairs for the new data format. See <xref target="RFC5101"></xref> for more information.</t>
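<t>As an illustration of the Template mechanism, the following Python sketch encodes a Template Set containing one Template Record, and a Data Set whose records conform to it, following the Set and Template Record layouts of <xref target="RFC5101"></xref>. It is a minimal sketch under stated assumptions: the function names are hypothetical, the IPFIX Message header is omitted, and enterprise-specific Information Elements are not handled.</t>
<figure><artwork><![CDATA[
```python
import struct

# Information Element IDs from the IPFIX information model (RFC 5102).
OCTET_DELTA_COUNT = 1          # unsigned64
SOURCE_IPV4_ADDRESS = 8        # ipv4Address
DESTINATION_IPV4_ADDRESS = 12  # ipv4Address

TEMPLATE_SET_ID = 2  # Set ID reserved for Template Sets


def template_set(template_id, fields):
    """Encode a Template Set holding a single Template Record.

    fields is a list of (Information Element ID, field length) pairs,
    i.e. the { type, length } pairs of the Template.
    """
    record = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set header: Set ID, then total Set length including the header.
    return struct.pack("!HH", TEMPLATE_SET_ID, 4 + len(record)) + record


def data_set(template_id, records):
    """Encode a Data Set; its Set ID is the ID of its Template.

    Each record is the concatenated { value } fields, already encoded
    in Template order; no type or length is sent with the values.
    """
    body = b"".join(records)
    return struct.pack("!HH", template_id, 4 + len(body)) + body


fields = [(SOURCE_IPV4_ADDRESS, 4), (DESTINATION_IPV4_ADDRESS, 4),
          (OCTET_DELTA_COUNT, 8)]
tset = template_set(256, fields)  # 256: lowest Data Set Template ID
record = (bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 2])
          + struct.pack("!Q", 1500))
dset = data_set(256, [record])
```
]]></artwork></figure>
<t>Here the 20-octet Template Set needs to be sent only once; each subsequent Data Record costs just the 16 octets of its values.</t>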
<t>The <xref target="RFC5102">IPFIX information model</xref> defines a
large number of standard Information Elements which provide the
necessary { type } information for Templates. The use of standard
elements enables interoperability among different vendors'
implementations. Additionally, non-standard enterprise-specific elements
may be defined for private use.</t>
</section>
<section title="IPFIX Documents Overview" anchor="intro-docs">
<t><xref target="RFC5101">"Specification of the IPFIX
Protocol for the Exchange of IP Traffic Flow Information"</xref>
and its associated documents
define the IPFIX Protocol, which provides network engineers and
administrators with access to IP traffic flow information.</t>
<t><xref target="RFC5470">"Architecture for IP Flow
Information Export"</xref> defines
the architecture for the export of measured IP flow information out of
an IPFIX Exporting Process to an IPFIX Collecting Process, and the
basic terminology used to describe the elements of this architecture,
per the requirements defined in <xref target="RFC3917">"Requirements
for IP Flow Information Export"</xref>. The IPFIX Protocol document
<xref target="RFC5101"></xref> then covers the details of the method for
transporting IPFIX Data Records and Templates via a congestion-aware
transport protocol from an IPFIX Exporting Process to an IPFIX
Collecting Process.</t>
<t><xref target="RFC5102">"Information Model for IP Flow Information
Export"</xref> describes the Information Elements used by IPFIX,
including details on Information Element naming, numbering, and data
type encoding. Finally, <xref target="RFC5472">"IPFIX
Applicability"</xref> describes the various applications of the IPFIX
protocol and their use of information exported via IPFIX, and relates
the IPFIX architecture to other measurement architectures and
frameworks.</t>
<t>Additionally, <xref target="RFC5655">"Specification
of the IPFIX File Format"</xref> describes a file format based upon the
IPFIX Protocol for the storage of flow data.</t>
<t>This document references the Protocol and Architecture documents for
terminology, and extends the IPFIX Information Model to provide new
Information Elements for anonymisation metadata. The anonymisation
techniques described herein are equally applicable to the IPFIX Protocol
and data stored in IPFIX Files.</t>
</section>
<section title="Anonymisation within the IPFIX Architecture" anchor="intro-arch">
<t> <xref target="RFC5470">"Architecture for IP Flow Information
Export"</xref> defines the functions performed in sequence by the various
functional blocks in an IPFIX Device as in the figure below.</t>
<figure title="IPFIX Device functional blocks" anchor="ipfix-dev">
<artwork><![CDATA[
        Packet(s) coming into Observation Point(s)
               |                                    |
               v                                    v
  +------------+-------------------------+   +------+------+
  | Metering Process on an               |   |             |
  | Observation Point                    |   |             |
  |                                      |   |             |
  | packet header capturing              |   |             |
  |            |                         |...|  Metering   |
  |      timestamping                    |   |  Process N  |
  |            |                         |   |             |
  |            +----->+                  |   |             |
  |            |      |                  |   |             |
  |            |      | sampling Si      |   |             |
  |            |      | (1:1 in case of  |   |             |
  |            |      | no sampling)     |   |             |
  |            |      | filtering Fi     |   |             |
  |            |      | (select all when |   |             |
  |            |      | no criteria)     |   |             |
  |            +------+                  |   |             |
  |            |                         |   |             |
  |            | Timing out Flows        |   |             |
  |            | Handle resource         |   |             |
  |            | overloads               |   |             |
  |            |                         |   |             |
  +------------|-------------------------+   +------|------+
               |                                    |
               |  Flow Records (identified by       |  Flow Records
               |  Observation Domain)               |
               |                                    |
               +-------------------+----------------+
                                   |
  +--------------------------------|---------------------------------+
  |                                |               Exporting Process |
  | +------------------------------|-------------------------------+ |
  | |                              v IPFIX Protocol                | |
  | | +------------------------------+  +------------------------+ | |
  | | | Rules for                    |  | Functions              | | |
  | | |  Picking/sending Templates   |  | -Packetise selected    | | |
  | | |  Picking/sending Flow Records|->|  Control & data        | | |
  | | |  Encoding Template & data    |  |  Information into      | | |
  | | |  Selecting Flows to export(*)|  |  IPFIX export packets  | | |
  | | |                              |  | -Handle export errors  | | |
  | | +----------------------------+-+  +------------------------+ | |
  | |                              |                               | |
  | +------------------------------+-------------------------------+ |
  |                                |                                 |
  |                                | exported IPFIX Messages         |
  |                                |                                 |
  |                 +--------------+--------------+                  |
  |                 |  Anonymise export packet(*) |                  |
  |                 +--------------+--------------+                  |
  |                                |                                 |
  |                 +--------------+--------------+                  |
  |                 |     Transport Protocol      |                  |
  |                 +--------------+--------------+                  |
  |                                |                                 |
  +--------------------------------+---------------------------------+
                                   |
                                   v
                   IPFIX export packet to Collector

(*) indicates that the block is optional.
]]></artwork>
</figure>
<t>Note that, according to the original architecture specification, IPFIX Message anonymisation is optionally performed as the final operation before handing the Message to the transport protocol for export. While no provision is made in the architecture for anonymisation metadata as in <xref target="aes-section"></xref>, this arrangement does allow for the message rewriting necessary for comprehensive anonymisation of IPFIX export as in <xref target="export-anon-section"></xref>. The <xref target="I-D.ietf-ipfix-mediators-framework">IPFIX Mediation</xref> framework and the <xref target="RFC5655">IPFIX File Format</xref> expand upon this initial architectural allowance for anonymisation by adding to the list of places where anonymisation may be applied. The former specifies IPFIX Mediators, which rewrite existing IPFIX Messages, and the latter specifies a method for the storage of IPFIX data in files.</t>
<t>More detail on the applicable architectural arrangements of anonymisation can be found in <xref target="export-anon-arrangement"></xref>.</t>
</section>
</section>
<section title="Terminology">
<t>Terms used in this document that are defined in the Terminology section
of the <xref target="RFC5101">IPFIX Protocol</xref> document are to be
interpreted as defined there.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119">RFC
2119</xref>.</t>
</section>
<section title="Categorisation of Anonymisation Techniques">
<t>Anonymisation modifies a data set in order to
protect the identity of the people or entities described by the data set
from disclosure. With respect to network traffic data, anonymisation
generally attempts to preserve some set of properties of the network
traffic useful for a given application or applications, while ensuring the
data cannot be traced back to the specific networks, hosts, or users
generating the traffic.</t>
<t>Anonymisation may be broadly classified according to two properties:
recoverability and countability. All anonymisation techniques map the real
space of identifiers or values into a separate, anonymised space,
according to some function. A technique is said to be recoverable when the
function used is invertible or can otherwise be reversed and a real
identifier can be recovered from a given replacement identifier.</t>
<t>Countability compares the dimension (i.e., the number of distinct
values) of the anonymised space (N) to that of the real space (M), and
denotes how the count of unique values is preserved by the
anonymisation function. If the anonymised space
is smaller than the real space, then the function is said to generalise
the input, mapping more than one input point to each anonymous value
(e.g., as with aggregation). By definition, generalisation is not
recoverable.</t>
<t>If the dimensions of the anonymised and real spaces are the
same, such that the count of unique values is preserved, then the function
is said to be a direct substitution function. If the dimension of the
anonymised space is larger, such that each real value maps to a set of
anonymised values, then the function is said to be a set substitution
function. Note that with set substitution functions, the sets of
anonymised values are not necessarily disjoint. Either direct or set
substitution functions are said to be one-way if there exists no method
for recovering the real data point from an anonymised one.</t>
<t>This classification is summarised in the table below.</t>
<texttable>
<ttcol align="left">Recoverability / Countability</ttcol>
<ttcol align="left">Recoverable</ttcol>
<ttcol align="left">Non-recoverable</ttcol>
<c>N &lt; M</c><c>N.A.</c><c>Generalisation</c>
<c>N = M </c><c>Direct Substitution</c><c>One-way Direct Substitution</c>
<c>N > M </c><c>Set Substitution</c><c>One-way Set Substitution</c>
</texttable>
</section>
<section title="Anonymisation of IP Flow Data">
<t>Due to the restricted semantics of IP flow data, there is a relatively
limited set of specific anonymisation techniques available on flow data,
though each falls into the broad categories above. Each type of field that
may commonly appear in a flow record may have its own applicable specific
techniques.</t>
<t>While anonymisation is generally applied at the resolution of single
fields within a flow record, attacks against anonymisation use entire
flows and relationships between hosts and flows within a given data set.
Therefore, fields which may not necessarily be identifying by themselves
may be anonymised in order to increase the anonymity of the data set as a
whole.</t>
<t>Of all the fields in an IP flow record, only IP addresses directly
identify entities in the real world. Each IP address is associated with an
interface on a network host, and can potentially be identified with a
single user. Additionally, IP addresses are structured identifiers; that
is, partial IP address prefixes may be used to identify networks just as
full IP addresses identify hosts. This makes anonymisation of IP addresses
particularly important.</t>
<t>Hardware addresses uniquely identify devices on the network. They are
not often available in traffic data collected at Layer 3 and cannot be
used to locate devices within the network, but some traces may contain
sub-IP data, including hardware addresses. When combined with external
databases, hardware addresses may be mappable to device serial numbers,
and to the entities or individuals who purchased the devices. They may
also leak via IPv6 addresses in certain circumstances. Therefore,
hardware address anonymisation is also important.</t>
<t>Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and ephemeral
ports, can be used to determine the operating system running on a host.
Relative data volumes by port can also be used to determine the host's
function (workstation, web server, etc.); this information can be used to
identify hosts and users.</t>
<t>While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Many kinds
of network activity are recognizable by a pattern of relative time
differences and data volumes in the associated sequence of flows, even
without host address information; timestamps and counters can therefore
be used to identify hosts and users. They are also vulnerable to traffic
injection attacks, where traffic with a known pattern is injected into a
network under measurement, and this pattern is later identified in the
anonymised data set.</t>
<t>The simplest and most extreme form of anonymisation, which can be
applied to any field of a flow record, is black-marker anonymisation, or
complete deletion of a given field. Note that black-marker anonymisation
is equivalent to simply not exporting the field(s) in question.</t>
<t>While black-marker anonymisation completely protects the data in
the deleted fields from the risk of disclosure, it also reduces the
utility of the anonymised data set as a whole. The following sections
discuss techniques that retain some information while reducing (though
not eliminating) the disclosure risk, organized by field type: IP
addresses, hardware addresses, timestamps, counters, and other fields
such as port and protocol numbers.</t>
<section title="IP Address Anonymisation">
<t>Since IP addresses are the most common identifiers within flow data
that can be used to directly identify a person, organization, or host,
most of the work on flow and trace data anonymisation has gone into IP
address anonymisation techniques. Indeed, the aim of most attacks
against anonymisation is to recover the mapping from anonymised IP addresses
to original IP addresses, thereby identifying the hosts involved. There
is therefore a wide range of IP address anonymisation schemes that fit
into the following categories.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Truncation</c><c>Generalisation</c>
<c>Reverse Truncation</c><c>Generalisation</c>
<c>Random Permutation</c><c>Direct Substitution</c>
<c>Prefix-preserving Pseudonymisation</c><c>Direct Substitution</c>
</texttable>
<section title="Truncation">
<t>Truncation removes "n" of the least significant bits from an IP
address, replacing them with zeroes. In effect, it replaces a host
address with a network address for some fixed netblock; for IPv4
addresses, 8-bit truncation corresponds to replacement with a /24
network address. Truncation is a non-reversible generalisation scheme.
Note that while truncation is effective for making hosts
non-identifiable, it preserves information which can be used to
identify an organization, a geographic region, a country, or a
continent (or RIR region of responsibility).</t>
<t>Truncation to an address length of 0 is equivalent to black-marker
anonymisation. Complete removal of IP address information is only
recommended for analysis tasks which have no need to separate flow
data by host or network; e.g. as a first stage to per-application
(port) or time-series total volume analyses.</t>
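<t>A minimal sketch of truncation in Python, using the standard ipaddress module; the function name is hypothetical. It zeroes the n least significant bits, so n = 8 applied to an IPv4 address yields the /24 network address, and n equal to the address length yields black-marker anonymisation.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def truncate(addr: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address length")
    keep_mask = ((1 << width) - 1) ^ ((1 << n) - 1)  # high bits only
    # Rebuild with the same class so that IPv6 results stay IPv6.
    return str(type(ip)(int(ip) & keep_mask))


print(truncate("192.0.2.77", 8))    # 192.0.2.0
print(truncate("2001:db8::1", 64))  # 2001:db8::
```
]]></artwork></figure>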
</section>
<section title="Reverse Truncation">
<t>Reverse truncation removes "n" of the most significant bits from an
IP address, replacing them with zeroes. Reverse truncation is a
non-reversible generalisation scheme. Reverse truncation is effective
for making networks unidentifiable, partially or completely removing
information which can be used to identify an organization, a
geographic region, a country, or a continent (or RIR region of
responsibility). However, it may cause ambiguity when applied to data
collected from more than one network, since hosts on different networks
that share the same low-order bits become indistinguishable, as if they
were the same host. It is not particularly useful when publishing data where the
network of origin is known or can be easily guessed by virtue of the
identity of the publisher.</t>
<t>Like truncation, reverse truncation to an address length of 0 is
equivalent to black-marker anonymisation.</t>
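<t>Reverse truncation can be sketched the same way in Python with the standard ipaddress module (the function name is hypothetical); here only the (width - n) low-order bits are kept. The example shows the collision described above: two hosts on different /24 networks become the same anonymised address.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def reverse_truncate(addr: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address length")
    keep_mask = (1 << (width - n)) - 1  # low (width - n) bits only
    return str(type(ip)(int(ip) & keep_mask))


# Hosts on different networks collide once the network part is gone.
print(reverse_truncate("192.0.2.77", 24))     # 0.0.0.77
print(reverse_truncate("198.51.100.77", 24))  # 0.0.0.77
```
]]></artwork></figure>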
</section>
<section title="Random Permutation">
<t>Random permutation is a direct substitution technique, replacing
each IP address with an address randomly selected from the set of
possible IP addresses, guaranteeing that each anonymised address
represents a unique original address. The random permutation does not
preserve any structural information about a network, but it does
preserve the unique count of IP addresses. Any application that
requires more structure than host-uniqueness will not be able to use
randomly permuted IP addresses.</t>
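<t>Because the IPv4 address space is too large to shuffle in advance, one way to sketch random permutation is to reveal the permutation lazily: each previously unseen address is mapped to a uniformly chosen address not yet used as an output. The class below is a hypothetical Python illustration, not a reference implementation. Note that whoever holds its mapping table can reverse the substitution.</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import secrets


class RandomAddressPermutation:
    """Lazily built random permutation of the IPv4 address space."""

    def __init__(self):
        self.forward = {}   # real address (int) -> anonymised address
        self.used = set()   # anonymised addresses already assigned

    def anonymise(self, addr: str) -> str:
        real = int(ipaddress.IPv4Address(addr))
        if real not in self.forward:
            # Redraw until the output is unused, so the mapping stays
            # one-to-one and the count of unique addresses is kept.
            candidate = secrets.randbelow(1 << 32)
            while candidate in self.used:
                candidate = secrets.randbelow(1 << 32)
            self.forward[real] = candidate
            self.used.add(candidate)
        return str(ipaddress.IPv4Address(self.forward[real]))
```
]]></artwork></figure>
<t>Repeated calls on the same instance return the same anonymised address for a given input, while distinct inputs always receive distinct outputs.</t>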
</section>
<section title="Prefix-preserving Pseudonymisation">
<t>Prefix-preserving pseudonymisation is a direct substitution
technique, further restricted such that the structure of subnets is
preserved at each level while anonymising IP addresses. If two real IP
addresses match on a prefix of "n" bits, the two anonymised IP
addresses will match on a prefix of "n" bits as well. This is useful
when relationships among networks must be preserved for a given
analysis task, but introduces structure into the anonymised data which
can be exploited in attacks against the anonymisation technique.</t>
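<t>The following Python sketch shows the general keyed construction behind prefix-preserving schemes such as Crypto-PAn: each output bit is the input bit XORed with a pseudorandom bit computed only from the preceding input bits. Addresses whose longest common prefix is n bits are therefore mapped to anonymised addresses whose longest common prefix is also n bits. As an assumption made for brevity, HMAC-SHA256 replaces the AES-based pseudorandom function of Crypto-PAn, so the output is not interoperable with Crypto-PAn; the function names and key are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(addr: str, key: bytes) -> str:
    """Keyed prefix-preserving pseudonymisation of an IPv4 address."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The flip bit depends only on the first i input bits.
        msg = bits[:i].encode()
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common bit prefix of two IPv4 addresses."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


key = b"example secret key"  # hypothetical key; keep it secret
a = prefix_preserving("192.0.2.1", key)
b = prefix_preserving("192.0.2.200", key)
print(common_prefix_len("192.0.2.1", "192.0.2.200"),
      common_prefix_len(a, b))  # 24 24
```
]]></artwork></figure>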
</section>
</section>
<section title="Hardware Address Anonymisation">
<t>Flow data containing sub-IP information can also contain identifying
information in the form of the hardware (MAC) address. While hardware
address information cannot be used to locate a node within a network, it
can be used to directly and uniquely identify a specific device. Vendors or
organizations within the supply chain may then have the information
necessary to identify the entity or individual that purchased the
device.</t>
<t>Hardware address information is not as structured as IP address
information. EUI-48 and EUI-64 hardware addresses contain an
Organizationally Unique Identifier (OUI) in the three most significant bytes of
the address; this OUI additionally contains bits noting whether the
address is locally or globally administered. Beyond this, the address is
unstructured, and there is no particular relationship among the OUIs
assigned to a given vendor.</t>
<t>Note that hardware address information also appears within IPv6
addresses, as the EUI-64 address, or the EUI-48 address encoded as an
EUI-64 address, is used as the least significant 64 bits of the IPv6
address in the case of link-local addressing or stateless
autoconfiguration; the considerations and techniques in this section may
then apply to such IPv6 addresses as well.</t>
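<t>To illustrate how a hardware address can be read back out of an IPv6 address, the following Python sketch recovers an EUI-48 address from a modified EUI-64 interface identifier, which is formed by inserting 0xFFFE between the OUI and the remaining three bytes and inverting the universal/local bit. The function name is hypothetical; an anonymiser could use such a check to decide which IPv6 addresses need the treatment described in this section.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def mac_from_ipv6(addr: str):
    """Return the EUI-48 address embedded in an IPv6 address, or None.

    Assumes a modified EUI-64 interface identifier in the low 64 bits:
    OUI, 0xFFFE, node identifier, with the universal/local bit
    (0x02 in the first byte) inverted.
    """
    low64 = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    iid = low64.to_bytes(8, "big")
    if iid[3:5] != b"\xff\xfe":
        return None  # not derived from an EUI-48 address
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


print(mac_from_ipv6("fe80::200:5eff:fe00:5301"))  # 00:00:5e:00:53:01
print(mac_from_ipv6("2001:db8::1"))               # None
```
]]></artwork></figure>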
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Reverse Truncation</c><c>Generalisation</c>
<c>Random Permutation</c><c>Direct Substitution</c>
<c>Structured Pseudonymisation</c><c>Direct Substitution</c>
</texttable>
<section title="Reverse Truncation">
<t>Reverse truncation removes "n" of the most significant bits from a
MAC address, replacing them with zeroes. Reverse truncation is a
non-reversible generalisation scheme. This has the effect of removing
bits of the OUI, which identify manufacturers, before removing the
least significant bits. Reverse truncation of 24 bits zeroes out the
OUI.</t>
<t>Reverse truncation is effective for making device manufacturers
partially or completely unidentifiable within a dataset. However, it
may cause ambiguity by introducing the possibility of truncated MAC
address collision. Also note that the utility of removing manufacturer
information is dubious, and not particularly well-covered by the
literature.</t>
<t>Reverse truncation to an address length of 0 is
equivalent to black-marker anonymisation.</t>
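<t>A minimal Python sketch of reverse truncation for colon-separated EUI-48 addresses (the function name is hypothetical). Truncating 24 bits zeroes the OUI, as described above, while keeping the node identifier.</t>
<figure><artwork><![CDATA[
```python
def reverse_truncate_mac(mac: str, n: int) -> str:
    """Zero the n most significant bits of a colon-separated EUI-48."""
    if not 0 <= n <= 48:
        raise ValueError("n must be between 0 and 48")
    value = int(mac.replace(":", ""), 16)
    value &= (1 << (48 - n)) - 1  # keep the low (48 - n) bits
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


print(reverse_truncate_mac("00:00:5e:00:53:01", 24))  # 00:00:00:00:53:01
```
]]></artwork></figure>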
</section>
<section title="Random Permutation">
<t>Random permutation is a direct substitution technique, replacing
each hardware address with an address randomly selected from the set of
possible hardware addresses, guaranteeing that each anonymised address
represents a unique original address. The random permutation does not
preserve any structural information about the address (such as the
OUI), but it does preserve the unique count of hardware addresses. Any
application that requires more structure than device uniqueness will not
be able to use randomly permuted hardware addresses.</t>
</section>
<section title="Structured Pseudonymisation">
<t>Structured pseudonymisation for MAC addresses is a direct
substitution technique, like random permutation, but restricted such
that the OUI (the most significant three bytes) is permuted separately
from the node identifier, the remainder. This is useful when the
uniqueness of OUIs must be preserved for a given analysis task, but
introduces structure into the anonymised data which can be exploited
in attacks against the anonymisation technique.</t>
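<t>A minimal Python sketch, assuming colon-separated EUI-48 input; the class names are hypothetical. It keeps two independent, lazily revealed random permutations of the 24-bit space, one for OUIs and one for node identifiers, so that addresses sharing an OUI still share an anonymised OUI. As written, the sketch does not preserve the universal/local or individual/group bits of the first byte; an implementation that needs them would have to permute within the corresponding subsets.</t>
<figure><artwork><![CDATA[
```python
import secrets


class LazyPermutation:
    """Random permutation of range(size), revealed one value at a time."""

    def __init__(self, size: int):
        self.size = size
        self.forward = {}
        self.used = set()

    def __call__(self, x: int) -> int:
        if x not in self.forward:
            y = secrets.randbelow(self.size)
            while y in self.used:  # keep the mapping one-to-one
                y = secrets.randbelow(self.size)
            self.forward[x] = y
            self.used.add(y)
        return self.forward[x]


class StructuredMacPseudonymiser:
    """Permutes the OUI and the node identifier independently."""

    def __init__(self):
        self.oui_perm = LazyPermutation(1 << 24)
        self.node_perm = LazyPermutation(1 << 24)

    def anonymise(self, mac: str) -> str:
        raw = bytes.fromhex(mac.replace(":", ""))
        oui = self.oui_perm(int.from_bytes(raw[:3], "big"))
        node = self.node_perm(int.from_bytes(raw[3:], "big"))
        out = oui.to_bytes(3, "big") + node.to_bytes(3, "big")
        return ":".join(f"{b:02x}" for b in out)
```
]]></artwork></figure>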
</section>
</section>
<section title="Timestamp Anonymisation">
<t>The particular time at which a flow began or ended is not
particularly identifiable information, but it can be used as part of
attacks against other anonymisation techniques or for user profiling.
Precise timestamps can be used in injected-traffic fingerprinting
attacks [CITE] as well as to identify certain activity by response delay
and size fingerprinting [CITE]. Therefore, timestamp information may be
anonymised in order to ensure the protection of the entire dataset.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalisation</c>
<c>Enumeration</c><c>Direct or Set Substitution</c>
<c>Random Time Shifts</c><c>Direct Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>Precision Degradation is a generalisation technique that removes
the most precise components of a timestamp, treating all events
occurring within each given interval (e.g. one millisecond for
millisecond-level degradation) as simultaneous. This has the effect of
potentially collapsing many timestamps into one. With this technique time
precision is reduced, and sequencing may be lost, but the approximate
time at which each event occurred is preserved. The anonymised data may
not be generally useful for applications which require strict
sequencing of flows.</t>
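<t>A minimal Python sketch, assuming timestamps are integer milliseconds since the UNIX epoch (as for IPFIX dateTimeMilliseconds values); the function name is hypothetical. With second-level degradation, the first two timestamps below collapse onto the same value and their relative order is lost.</t>
<figure><artwork><![CDATA[
```python
def degrade_timestamp(ts_ms: int, granularity_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval.

    All events within the same interval become simultaneous.
    """
    if granularity_ms <= 0:
        raise ValueError("granularity must be positive")
    return ts_ms - ts_ms % granularity_ms


starts = [1258628400123, 1258628400987, 1258628401002]
print([degrade_timestamp(t, 1000) for t in starts])
# [1258628400000, 1258628400000, 1258628401000]
```
]]></artwork></figure>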
<t>Note that flow meters with low time precision (e.g. second
precision, or millisecond precision on high-capacity networks) perform
the equivalent of precision degradation anonymisation by their
design.</t>
<t>Note also that degradation to a very low precision (e.g. on the
order of minutes, hours, or days) is commonly used in analyses
operating on time-series aggregated data, and may also be described as
binning; though the time scales are longer and applicability more
restricted, this is in principle the same operation.</t>
<t>Precision degradation to infinitely low precision is equivalent to
black-marker anonymisation. Removal of timestamp information is only
recommended for analysis tasks which have no need to separate flows in
time, for example for counting total volumes or unique occurrences of
other flow keys in an entire dataset.</t>
</section>
<section title="Enumeration">
<t>Enumeration is a substitution function that retains the
chronological order in which events occurred while eliminating time
information. Timestamps are substituted by equidistant timestamps (or
numbers) starting from a randomly chosen start value. The resulting
data is useful for applications requiring strict sequencing, but not
for those requiring good timing information (e.g. delay- or jitter-
measurement for QoS applications or SLA validation).</t>
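<t>A minimal Python sketch of the direct-substitution form of enumeration, in which identical timestamps share one label; the function name, the 32-bit range of the random start value, and the step parameter are assumptions made for illustration.</t>
<figure><artwork><![CDATA[
```python
import secrets


def enumerate_timestamps(timestamps, step=1):
    """Replace timestamps with equidistant values preserving order.

    Distinct timestamps are ranked chronologically; the result starts
    at a random value and advances by `step` per rank. Equal inputs
    receive equal outputs.
    """
    start = secrets.randbelow(1 << 32)  # randomly chosen start value
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + rank[t] * step for t in timestamps]


print(enumerate_timestamps([1300, 1000, 1150, 1000]))
```
]]></artwork></figure>
<t>Only the ordering survives: the 150 and 300 millisecond gaps in the input both become a single step in the output.</t>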
</section>
<section title="Random Time Shifts">
<t>Random time shifts add a single random offset to every timestamp within a
dataset. This reversible substitution technique therefore retains
duration and inter-event interval information as well as chronological
order of flows. It is primarily intended to defeat traffic injection
fingerprinting attacks.</t>
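<t>A minimal Python sketch; the function name and the one-year bound on the offset are hypothetical choices. Because one offset is applied to the whole data set, durations and inter-event intervals are unchanged, and the offset is the secret needed to reverse the shift.</t>
<figure><artwork><![CDATA[
```python
import secrets

# Hypothetical bound on the shift: one year in milliseconds.
MAX_SHIFT_MS = 365 * 24 * 3600 * 1000


def random_time_shift(timestamps_ms):
    """Add one random offset to every timestamp in a data set.

    Returns the shifted timestamps and the offset; the offset is the
    secret that allows the shift to be reversed.
    """
    shift = secrets.randbelow(2 * MAX_SHIFT_MS + 1) - MAX_SHIFT_MS
    return [t + shift for t in timestamps_ms], shift


shifted, offset = random_time_shift([1000, 1500, 4000])
```
]]></artwork></figure>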
</section>
</section>
<section title="Counter Anonymisation">
<t>Like timestamps, counters (such as per-flow packet and octet counts)
are subject to fingerprinting and injection attacks against
anonymisation, and can be used for user profiling. Counter anonymisation
can help defeat these attacks, but anonymised counters are only usable
for analysis tasks for which relative or imprecise magnitudes of activity
are sufficient.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalisation</c>
<c>Binning</c><c>Generalisation</c>
<c>Random noise addition</c><c>Direct or Set Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>As with precision degradation in timestamps, precision degradation
of counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships
between sizes of similarly-sized flows, but keeps relative magnitude
information.</t>
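<t>A minimal Python sketch of counter precision degradation by clearing low-order bits; the function name is hypothetical. Clearing 8 bits maps every value from 1280 to 1535 onto 1280.</t>
<figure><artwork><![CDATA[
```python
def degrade_counter(value: int, bits: int) -> int:
    """Clear the `bits` lowest-order bits of an unsigned counter."""
    return value & ~((1 << bits) - 1)


print(degrade_counter(1500, 8))  # 1280
print(degrade_counter(200, 8))   # 0
```
]]></artwork></figure>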
</section>
<section title="Binning">
<t>Binning can be seen as a special case of precision degradation; the
operation is identical, except that in precision degradation the
counter ranges are uniform, and in binning they need not be. For
example, a common counter binning scheme for packet counters could be
to bin values 1-2 together, and 3-infinity together, thereby
separating potentially completely-opened TCP connections from unopened
ones. Binning schemes are generally chosen to keep precisely the
amount of information required in a counter for a given analysis task.
Note that, also unlike precision degradation, the bin label need not
be within the bin's range.</t>
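<t>A minimal Python sketch of binning with non-uniform ranges, using the standard bisect module; the function name and the bin labels are hypothetical. It implements the two-bin packet counter scheme described above, and its labels 0 and 1 deliberately lie outside their bins' ranges.</t>
<figure><artwork><![CDATA[
```python
import bisect


def bin_value(value, lower_bounds, labels):
    """Return the label of the bin containing value.

    lower_bounds holds the sorted lower bound of each bin; each bin
    extends up to (but excluding) the next bound, the last to infinity.
    """
    i = bisect.bisect_right(lower_bounds, value) - 1
    if i < 0:
        raise ValueError("value is below the first bin")
    return labels[i]


# Two bins for packet counts: 1-2 and 3-infinity.
PACKET_BOUNDS = [1, 3]
PACKET_LABELS = [0, 1]  # arbitrary labels outside the bins' ranges
counts = (1, 2, 3, 40)
print([bin_value(c, PACKET_BOUNDS, PACKET_LABELS) for c in counts])
# [0, 0, 1, 1]
```
]]></artwork></figure>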
<t>Binning counters to a single bin 0-infinity, or alternately
precision degradation to infinitely low precision, is equivalent to
black-marker anonymisation. Removal of counter information is only
recommended for analysis tasks which have no need to evaluate the
removed counter, for example for counting only unique occurrences of
other flow keys.</t>
</section>
<section title="Random Noise Addition">
<t>Random noise addition adds a random amount to a counter in each
flow; this is used to keep relative magnitude information and minimize
the disruption to size relationship information while avoiding
fingerprinting attacks against anonymisation. Note that there is no
guarantee that random noise addition will maintain ranking order by a
counter among members of a set. Random noise addition is particularly
useful when the derived analysis data will not be presented in such a
way as to require the lower-order bits of the counters.</t>
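<t>A minimal Python sketch; the uniform noise distribution, the spread parameter, and clamping at zero are illustrative assumptions rather than requirements of the technique, and the function name is hypothetical.</t>
<figure><artwork><![CDATA[
```python
import secrets


def add_noise(counter: int, spread: int) -> int:
    """Add uniform integer noise in [-spread, +spread] to a counter.

    Results are clamped at zero because IPFIX counters are unsigned;
    the clamping slightly biases small counters upward.
    """
    noise = secrets.randbelow(2 * spread + 1) - spread
    return max(0, counter + noise)
```
]]></artwork></figure>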
</section>
</section>
<section title="Anonymisation of Other Flow Fields">
<t>Other fields, particularly port numbers and protocol numbers, can
be used to partially identify the applications that generated the
traffic in a given flow trace. This information can be used in
fingerprinting attacks, and may be of interest on its own (e.g., to
reveal that a certain application with suspected vulnerabilities is
running on a given network). These fields are generally
anonymised using one of two techniques.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Binning</c><c>Generalisation</c>
<c>Random Permutation</c><c>Direct Substitution</c>
</texttable>
<section title="Binning">
<t>Binning is a generalisation technique mapping a set of potentially
non-uniform ranges into a set of arbitrarily labeled bins. Common bin
arrangements depend on the field type and the analysis application.
For example, an IP protocol bin arrangement may preserve 1, 6, and 17
for ICMP, TCP, and UDP traffic, and bin all other protocols into a
single bin, to mitigate the use of uncommon protocols in
fingerprinting attacks. Another example arrangement may bin source and
destination ports into low (0-1023) and high (1024-65535) bins in
order to tell service from ephemeral ports without identifying
individual applications.</t>
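<t>The two example arrangements above can be sketched in Python as follows. Bin labels are arbitrary by definition, so two choices in this sketch are assumptions: the "other protocols" bin is labelled 255, and each port bin is labelled by its lowest port number. The function names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
# Protocols preserved as-is (1 = ICMP, 6 = TCP, 17 = UDP);
# every other protocol is mapped into a single bin.
PRESERVED_PROTOCOLS = frozenset({1, 6, 17})
OTHER_PROTOCOL_BIN = 255  # arbitrary bin label (assumption)


def bin_protocol(protocol: int) -> int:
    """Map an IP protocol number to its bin label."""
    if protocol in PRESERVED_PROTOCOLS:
        return protocol
    return OTHER_PROTOCOL_BIN


def bin_port(port: int) -> int:
    """Map a port to the low (0-1023) or high (1024-65535) bin,
    labelling each bin by its lowest port number."""
    if not 0 <= port <= 65535:
        raise ValueError(f"invalid port {port}")
    return 0 if port <= 1023 else 1024


# Usage
assert bin_protocol(6) == 6        # TCP is preserved
assert bin_protocol(47) == 255     # any other protocol is binned
assert bin_port(443) == 0          # service port
assert bin_port(51515) == 1024     # ephemeral port
```
]]></artwork></figure>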
<t>Binning other flow key fields to a single bin is equivalent to
black-marker anonymisation. Removal of other flow key information is
only recommended for analysis tasks which have no need to
differentiate flows on the removed keys, for example for total traffic
counts or unique counts of other flow keys.</t>
</section>
<section title="Random Permutation">
<t>Random permutation is a direct substitution technique, replacing
each value with a value randomly selected from the range of possible
values, such that each anonymised value represents a single, unique
original value. This is used to preserve the count of unique values
without preserving information about, or the ordering of, the values
themselves.</t>
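<t>The following minimal Python sketch shows random permutation over a small value space, such as 16-bit port numbers. Materialising the full mapping table and the name make_permutation are assumptions of this example. The table itself is the permutation function, so it must be kept secret.</t>
<figure><artwork><![CDATA[
```python
import random


def make_permutation(space_size: int,
                     rng: random.Random) -> list[int]:
    """Return a table mapping each value in range(space_size) to a
    distinct, randomly chosen value in the same range."""
    table = list(range(space_size))
    rng.shuffle(table)  # in-place Fisher-Yates shuffle
    return table


# Usage: permute 16-bit transport port numbers.
port_map = make_permutation(65536, random.SystemRandom())
ports = [80, 443, 80, 53]
anonymised = [port_map[p] for p in ports]
# Distinct values stay distinct, so unique counts are preserved.
assert len(set(anonymised)) == len(set(ports))
```
]]></artwork></figure>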
</section>
</section>
</section>
<section title="Parameters for the Description of Anonymisation Techniques">
<t>This section details the abstract parameters used to describe the
anonymisation techniques examined in the previous section, on a
per-parameter basis. These parameters and their export safety inform the
design of the IPFIX anonymisation metadata export specified in the
following section.</t>
<section title="Stability" anchor="params-stability">
<t>Any given anonymisation technique may be applied with a varying range
of stability. Stability is important for assessing the comparability of
anonymised information in different data sets, or in the same data set
over different time periods. In general, stability ranges from
completely stable to completely unstable; however, note that the
completely unstable case is indistinguishable from black-marker
anonymisation. A completely stable anonymisation will always map a given
value in the real space to the same value in the anonymised space. In
practice, an anonymisation may also be stable for every data set
published by a particular producer to a particular consumer, stable
for a stated time period within a dataset or across datasets, or stable
only for a single data set.</t>
<t>If no information about stability is available, users of anonymised
data may assume that the techniques used are stable across the entire
dataset, but unstable across datasets. Note that stability presents a
risk-utility tradeoff, as completely stable anonymisation can be used
for longer-term trend analysis tasks but also presents more risk of
attack given the stable mapping.</t>
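<t>One way to control stability is to derive the anonymisation parameters from a secret key and a scope label. Reusing a label for every data set sent to one consumer, for one time period, or indefinitely then yields the corresponding stability. The Python sketch below illustrates this. The key-derivation scheme and the name scoped_permutation are hypothetical, and Python's random module is not a cryptographically strong generator.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import random


def scoped_permutation(secret: bytes, scope: str,
                       space_size: int) -> list[int]:
    """Derive a permutation from a secret key and a scope label.

    The same (secret, scope) pair always reproduces the same mapping,
    so the mapping is stable exactly as far as the label is reused.
    """
    seed = hmac.new(secret, scope.encode("utf-8"),
                    hashlib.sha256).digest()
    rng = random.Random(seed)  # illustration only; not a CSPRNG
    table = list(range(space_size))
    rng.shuffle(table)
    return table


key = b"hypothetical-secret-key"
per_dataset_a = scoped_permutation(key, "dataset-2009-01", 1024)
per_dataset_b = scoped_permutation(key, "dataset-2009-02", 1024)
```
]]></artwork></figure>
<t>Here the two data sets use unrelated mappings, so each mapping is stable only within its own data set. Passing the same label for both would make them comparable.</t>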
</section>
<section title="Truncation Length">
<t>Truncation and precision degradation are described by the truncation
length, or the amount of data still remaining in the anonymised field
after anonymisation.</t>
<t>Truncation length can be inferred from a given data set, and need not
be specially exported or protected.</t>
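<t>The following Python sketch shows why the truncation length of a set of truncated addresses need not be exported: it can be estimated from the data alone. The helper names are hypothetical. The estimate never exceeds the true number of retained bits. It matches the true value as soon as any value has a one in the lowest retained bit position.</t>
<figure><artwork><![CDATA[
```python
def trailing_zero_bits(value: int, width: int) -> int:
    """Count the zero bits at the low-order end of a width-bit value."""
    if value == 0:
        return width
    return (value & -value).bit_length() - 1


def infer_truncation_length(values, width: int = 32) -> int:
    """Estimate how many high-order bits survived truncation.

    Truncation zeroes the same low-order bits in every value, so the
    smallest trailing-zero count bounds the number of zeroed bits.
    """
    zeroed = min(trailing_zero_bits(v, width) for v in values)
    return width - zeroed


# Usage: 10.0.1.0, 10.0.2.0 and 192.168.1.0 as 32-bit integers.
truncated = [0x0A000100, 0x0A000200, 0xC0A80100]
retained_bits = infer_truncation_length(truncated)  # 24
```
]]></artwork></figure>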
</section>
<section title="Bin Map">
<t>Binning is described by the specification of a bin mapping function.
This function can be generally expressed in terms of an associative
array that maps each point in the original space to a bin, although from
an implementation standpoint most bin functions are much simpler and
more efficient.</t>
<t>Since knowledge of the bin mapping function can be used to partially
deanonymise binned data, depending on the degree of generalisation, no
information about the bin mapping function should be exported.</t>
</section>
<section title="Permutation">
<t>Like binning, permutation is described by the specification of a
permutation function. In the general case, this can be expressed in
terms of an associative array that maps each point in the original space
to a point in the anonymised space. Unlike binning, each point in the
anonymised space must correspond to a single, unique point in the
original space.</t>
<t>Since knowledge of the permutation function can be used to completely
deanonymise permuted data, no information about the permutation function
or its parameters should be exported.</t>
</section>
<section title="Shift Amount">
<t>Shifting requires an amount to shift each value by. Since the shift
amount can be used to deanonymise data protected by shifting, no
information about the shift amount should be exported.</t>
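<t>The following Python sketch uses hypothetical timestamp values to illustrate the risk. A single known pair of real and shifted values reveals the shift amount, and with it every real value.</t>
<figure><artwork><![CDATA[
```python
def shift(values: list[int], amount: int) -> list[int]:
    """Anonymise values (e.g. flowStartSeconds) by a constant shift."""
    return [v + amount for v in values]


real = [1230768000, 1230768042, 1230768100]
secret_amount = -86400 * 365  # hypothetical shift of one year
anonymised = shift(real, secret_amount)

# An attacker who learns one (real, anonymised) pair, e.g. from a
# known event, recovers the shift amount and thus every real value.
recovered_amount = anonymised[0] - real[0]
recovered = shift(anonymised, -recovered_amount)
assert recovered == real
```
]]></artwork></figure>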
</section>
</section>
<section title="Anonymisation Export Support in IPFIX" anchor="aes-section">
<t>Anonymised data exported via IPFIX SHOULD be annotated with
anonymisation metadata, which details which fields described by which
Templates are anonymised, and provides appropriate information on the
anonymisation techniques used. This metadata SHOULD be exported in Data
Records described by the recommended Options Templates described in this
section; these Options Templates use the additional Information Elements
described in the following subsection.</t>
<t>Note that fields anonymised using the black-marker (removal) technique
do not require any special metadata support. Black-marker anonymised
fields SHOULD NOT be exported at all; the absence of the field in a given
Data Set is implicitly declared by not including the corresponding
Information Element in the Template describing that Data Set.</t>
<section title="Anonymisation Options Template" anchor="opt-section">
<t>The Anonymisation Options Template describes anonymisation records,
which allow anonymisation metadata to be exported inline over IPFIX or
stored in an IPFIX File, by binding information about anonymisation
techniques to Information Elements within defined Templates. IPFIX
Exporting Processes SHOULD export anonymisation records for any Template
describing exported anonymised Data Records; IPFIX Collecting Processes
and processes downstream from them MAY use anonymisation records to
treat anonymised data differently depending on the applied
technique.</t>
<t>An Exporting Process SHOULD export anonymisation records after the
Templates they describe have been exported, and SHOULD export
anonymisation records reliably.</t>
<t>Anonymisation records, like Templates, MUST be handled by Collecting
Processes as scoped to the Transport Session in which they are sent.
While the Stability Class within the anonymisationFlags IE can be used
to declare that a given anonymisation technique's mapping will remain
stable across multiple sessions, each session MUST re-export the
anonymisation records along with the Templates.</t>
<texttable>
<ttcol align="left">IE</ttcol>
<ttcol align="left">Description</ttcol>
<c>templateId [scope]</c>
<c>
The Template ID of the Template containing the Information Element
described by this anonymisation record. This Information Element
MUST be defined as a Scope Field.
</c>
<c>informationElementId [scope]</c>
<c>
The Information Element identifier of the Information Element
described by this anonymisation record. This Information Element
MUST be defined as a Scope Field.
</c>
<c>privateEnterpriseNumber [scope] [optional]</c>
<c>
The Private Enterprise Number of the enterprise-specific Information
Element described by this anonymisation record. This Information
Element MUST be defined as a Scope Field if present.
</c>
<c>informationElementIndex [scope] [optional]</c>
<c>
The Information Element index of the instance of the Information
Element described by this anonymisation record identified by the
informationElementId within the Template. Optional; need only be
present when describing Templates that have multiple instances of
the same Information Element. This Information Element MUST be
defined as a Scope Field if present. This Information Element is
defined in <xref target="ie-section"></xref>, below.
</c>
<c>anonymisationFlags</c>
<c>
Flags describing the mapping stability and specialized modifications
to the Anonymisation Technique in use. SHOULD be present. This
Information Element is defined in <xref target="ie-section"></xref>,
below.
</c>
<c>anonymisationTechnique</c>
<c>
The technique used to anonymise the data. MUST be present. This
Information Element is defined in <xref target="ie-section"></xref>,
below.
</c>
</texttable>
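<t>Data Records carry only field values, so an anonymisation record is a fixed sequence of values in network byte order. The following Python sketch assumes a template containing templateId, informationElementId, anonymisationFlags, and anonymisationTechnique, in that order, each with a Field Length of 2. The optional privateEnterpriseNumber and informationElementIndex fields are omitted, and the function names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import struct

# Assumed field order: templateId, informationElementId,
# anonymisationFlags, anonymisationTechnique; four unsigned16
# values in network byte order.
RECORD_FORMAT = "!HHHH"


def encode_anonymisation_record(template_id: int, ie_id: int,
                                flags: int, technique: int) -> bytes:
    return struct.pack(RECORD_FORMAT, template_id, ie_id,
                       flags, technique)


def decode_anonymisation_record(data: bytes) -> tuple[int, ...]:
    return struct.unpack(RECORD_FORMAT, data)


# Usage: octetDeltaCount (IE 1) in Template 256, Session stability
# (flags = 1), Precision Degradation/Truncation (technique = 2).
record = encode_anonymisation_record(256, 1, 1, 2)
assert record == b"\x01\x00\x00\x01\x00\x01\x00\x02"
assert decode_anonymisation_record(record) == (256, 1, 1, 2)
```
]]></artwork></figure>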
</section>
<section title="Recommended Information Elements for Anonymisation Metadata" anchor="ie-section">
<section title="informationElementIndex">
<list style="hanging">
<t hangText="Description: ">
A zero-based index of an Information Element referenced by informationElementId within a Template referenced by templateId; used to disambiguate scope for Templates containing multiple instances of the same Information Element.</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="ElementId: ">TBD3</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="anonymisationFlags">
<list style="hanging">
<t hangText="Description: ">
A flag word describing specialized modifications to the
anonymisation policy in effect for the anonymisation technique
applied to a referenced Information Element within a referenced
Template. When flags are clear (0), the normal policy (as
described by anonymisationTechnique) applies without
modification.
<figure title="anonymisationFlags IE">
<artwork><![CDATA[
MSB 14 13 12 11 10 9 8 7 6 5 4 3 2 1 LSB
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| Reserved |LOR|PmA| SC |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
</figure>
<texttable>
<ttcol align="left">bit(s) (LSB = 0)</ttcol>
<ttcol align="left">name</ttcol>
<ttcol align="left">description</ttcol>
<c>0-1</c><c>SC</c><c>Stability Class: see the Stability Class table below, and <xref target="params-stability"/>.</c>
<c>2</c><c>PmA</c><c>Perimeter Anonymisation: when set (1), source address Information Elements are interpreted as external addresses, and destination address Information Elements are interpreted as internal addresses, for the purposes of associating anonymisationTechnique to Information Elements. MUST NOT be set when associated with a non-endpoint (i.e., source- or destination-) Information Element. SHOULD be consistent within a record (i.e., if a source- Information Element has this flag set, the corresponding destination- element SHOULD have this flag set, and vice versa).</c>
<c>3</c><c>LOR</c><c>Low-Order Unchanged: when set (1), the low-order bits of the anonymised Information Element contain real data. This modification is intended for the anonymisation of network-level addresses while leaving host-level addresses intact in order to preserve host-level structure, which could otherwise be used to reverse anonymisation. MUST NOT be set when associated with a truncation-based anonymisationTechnique.</c>
<c>4-15</c><c>Reserved</c><c>Reserved for future use: SHOULD be cleared (0) by the Exporting Process and MUST be ignored by the Collecting Process.</c>
</texttable>
The Stability Class portion of this flags word describes the
stability class of the anonymisation technique applied to a
referenced Information Element within a referenced Template.
Stability classes refer to the stability of the parameters of
the anonymisation technique, and therefore the comparability of
the mapping between the real and anonymised values over time.
This determines which anonymised datasets may be compared with
each other. Values are as follows:
<texttable>
<ttcol align="left">Bit 1</ttcol>
<ttcol align="left">Bit 0</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c><c>0</c><c>Undefined: the Exporting Process makes no representation as to how stable the mapping is, or over what time period values of this field will remain comparable; while the Collecting Process MAY assume Session level stability, Session level stability is not guaranteed. Processes SHOULD assume this is the case in the absence of stability class information; this is the default stability class.</c>
<c>0</c><c>1</c><c>Session: the Exporting Process will ensure that the parameters of the anonymisation technique are stable during the Transport Session. All the values of the described Information Element for each Record described by the referenced Template within the Transport Session are comparable. The Exporting Process SHOULD endeavour to ensure at least this stability class.</c>
<c>1</c><c>0</c><c>Exporter-Collector Pair: the Exporting Process will ensure that the parameters of the anonymisation technique are stable across Transport Sessions over time with the given Collecting Process, but may use different parameters for different Collecting Processes. Data exported to different Collecting Processes is not comparable.</c>
<c>1</c><c>1</c><c>Stable: the Exporting Process will ensure that the parameters of the anonymisation technique are stable across Transport Sessions over time, regardless of the Collecting Process to which it is sent.</c>
</texttable>
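As a non-normative illustration, the following Python sketch packs and unpacks the flag word; the constant and function names are hypothetical. Following the rules above, it leaves the reserved bits clear when encoding and ignores them when decoding.
<figure><artwork><![CDATA[
```python
SC_MASK = 0x0003   # bits 0-1: Stability Class
PMA_BIT = 1 << 2   # bit 2: Perimeter Anonymisation
LOR_BIT = 1 << 3   # bit 3: Low-Order Unchanged

SC_UNDEFINED, SC_SESSION, SC_EXPORTER_COLLECTOR, SC_STABLE = range(4)


def encode_flags(stability_class: int, perimeter: bool = False,
                 low_order_unchanged: bool = False) -> int:
    """Build an anonymisationFlags value; reserved bits stay clear."""
    if stability_class & ~SC_MASK:
        raise ValueError("stability class must be 0-3")
    flags = stability_class
    if perimeter:
        flags |= PMA_BIT
    if low_order_unchanged:
        flags |= LOR_BIT
    return flags


def decode_flags(flags: int) -> tuple[int, bool, bool]:
    """Split anonymisationFlags, ignoring reserved bits 4-15."""
    return (flags & SC_MASK, bool(flags & PMA_BIT),
            bool(flags & LOR_BIT))


# Usage
assert encode_flags(SC_SESSION, perimeter=True) == 0x0005
assert decode_flags(0xFFF5) == (SC_SESSION, True, False)
```
]]></artwork></figure>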
</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="ElementId: ">TBD1</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="anonymisationTechnique" anchor="ie-at-section">
<list style="hanging">
<t hangText="Description: ">
A description of the anonymisation technique applied to a
referenced Information Element within a referenced Template. Each
technique may be applicable only to certain Information Elements
and recommended only for certain Information Elements; these
restrictions are noted in the table below.
<texttable>
<ttcol align="left">Value</ttcol>
<ttcol align="left">Description</ttcol>
<ttcol align="left">Applicable to</ttcol>
<ttcol align="left">Recommended for</ttcol>
<c>0</c>
<c>Undefined: the Exporting Process makes no representation as to whether the defined field is anonymised or not. While the Collecting Process MAY assume that the field is not anonymised, it is not guaranteed not to be. This is the default anonymisation technique.</c>
<c>all</c>
<c>all</c>
<c>1</c>
<c>None: the values exported are real.</c>
<c>all</c>
<c>all</c>
<c>2</c>
<c>Precision Degradation/Truncation: the values exported are anonymised using simple precision degradation or truncation. The new precision or number of truncated bits is implicit in the exported data, and can be deduced by the Collecting Process.</c>
<c>all</c>
<c>all</c>
<c>3</c>
<c>Binning: the values exported are anonymised into bins.</c>
<c>all</c>
<c>all</c>
<c>4</c><c>Enumeration: the values exported are anonymised by enumeration.</c>
<c>all</c>
<c>timestamps</c>
<c>5</c>
<c>Permutation: the values exported are anonymised by random permutation.</c>
<c>all</c>
<c>identifiers</c>
<c>6</c><c>Structured Permutation: the values exported are anonymised by random permutation, preserving bit-level structure as appropriate; this represents prefix-preserving IP address anonymisation or structured MAC address anonymisation.</c>
<c>addresses</c>
<c></c>
<c>7</c><c>Reverse Truncation: the values exported are anonymised using reverse truncation. The number of truncated bits is implicit in the exported data, and can be deduced by the Collecting Process.</c>
<c>addresses</c>
<c></c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="ElementId: ">TBD2</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
</section>
</section>
<section title="Applying Anonymisation Techniques to IPFIX Export and Storage" anchor="export-anon-section">
<t>When exporting or storing anonymised flow data using IPFIX, certain
interactions between the IPFIX Protocol and the anonymisation techniques
in use must be considered; these are treated in the subsections below.</t>
<section title="Arrangement of Processes in IPFIX Anonymisation" anchor="export-anon-arrangement">
<t>Anonymisation may be applied to IPFIX data at three stages within
the collection infrastructure: on initial export, at a mediator, or
after collection, as shown in <xref target="loc-fig"></xref>. Each of
these locations has specific considerations and applicability.</t>
<figure title="Potential Anonymisation Locations" anchor="loc-fig">
<artwork><![CDATA[
+==========================================+
| Exporting Process |
+==========================================+
| |
| (Anonymised at Original Exporter) |
V |
+=============================+ |
| Mediator | |
+=============================+ |
| |
| (Anonymising Mediator) |
V V
+==========================================+
| Collecting Process |
+==========================================+
|
| (Anonymising CP/File Writer)
V
+--------------------+
| IPFIX File Storage |
+--------------------+
]]></artwork>
</figure>
<t>Anonymisation is generally performed before the wider dissemination
or repurposing of a flow data set, e.g., adapting operational
measurement data for research. Therefore, direct anonymisation of flow
data on initial export is only applicable in certain restricted
circumstances: when the Exporting Process is "publishing" data to a
Collecting Process directly, and the Exporting Process and Collecting
Process are operated by different entities. Note that certain guidelines
in <xref target="header-anon"></xref> with respect to timestamp
anonymisation may not apply in this case, as the Collecting Process may
be able to deduce certain timing information from the time at which each
Message is received.</t>
<t>A much more flexible arrangement is to anonymise data within a <xref target="I-D.ietf-ipfix-mediators-framework">Mediator</xref>. Here,
original data is sent to a Mediator, which performs the anonymisation
function and re-exports the anonymised data. Such a Mediator could be
located at the administrative domain boundary of the initial Exporting
Process operator, exporting anonymised data to other consumers outside
the organisation. In this case, the original Exporter SHOULD use TLS as
specified in <xref target="RFC5101"></xref> to secure the channel to the
Mediator, and the Mediator should follow the guidelines in <xref target="guidelines"></xref>, to mitigate the risk of original data
disclosure.</t>
<t>When data is to be published as an anonymised data set in an <xref target="RFC5655">IPFIX File</xref>, the anonymisation may be
done at the final Collecting Process before storage and dissemination,
as well. In this case, the Collector should follow the guidelines in
<xref target="guidelines"></xref>, especially as regards File-specific
Options in <xref target="opt-anon"></xref>.</t>
<t>In each of these data flows, the anonymisation of records is
undertaken by an Intermediate Anonymisation Process (IAP); the data
flows into and out of this IAP are shown in <xref target="iap-dataflows"></xref> below.</t>
<figure title="Data flows through the anonymisation process" anchor="iap-dataflows">
<artwork><![CDATA[
packets --+ +- IPFIX Messages -+
| | |
V V V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+
| Non-anonymised | Records |
V V V
+=========================================================+
| Intermediate Anonymisation Process (IAP) |
+=========================================================+
| Anonymised ^ Anonymised |
| Records | Records |
V | V
+===================+ Anonymisation +=============+
| Exporting Process |<--- Parameters ------>| File Writer |
+===================+ +=============+
| |
+------------> IPFIX Messages <----------+
]]></artwork>
</figure>
<t>Anonymisation parameters must also be available to the Exporting
Process and/or File Writer in order to ensure header data is also
appropriately anonymised as in <xref target="header-anon"></xref>.</t>
<t>Following each of the data flows through the IAP, we describe
five basic types of anonymisation arrangements within this framework in
<xref target="iap-arrangements"></xref>. In addition to the three arrangements
described in detail above, anonymisation can also be done at a
collocated Metering Process and File Writer (see section 7.3.2 of <xref target="RFC5655"></xref>), or at a file manipulator (see section
7.3.7 of <xref target="RFC5655"></xref>).</t>
<figure title="Possible anonymisation arrangements in the IPFIX architecture" anchor="iap-arrangements">
<artwork><![CDATA[
+----+ +-----+ +----+
pkts -> | MP |->| IAP |->| EP |-> anonymisation on Original Exporter
+----+ +-----+ +----+
+----+ +-----+ +----+
pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
+----+ +-----+ +----+
+----+ +-----+ +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masquerading Proxy)
+----+ +-----+ +----+
+----+ +-----+ +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
+----+ +-----+ +----+
+----+ +-----+ +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
File +----+ +-----+ +----+
]]></artwork>
</figure>
<t>Note that anonymisation may occur at more than one location within a
given collection infrastructure, to provide varying levels of anonymisation,
disclosure risk, or data utility for specific purposes.</t>
</section>
<section title="IPFIX-Specific Anonymisation Guidelines" anchor="guidelines">
<t>In implementing and deploying the anonymisation techniques described
in this document, implementors should note that IPFIX already provides
features that support anonymised data export, and use these where
appropriate. Care must also be taken that data structures supporting the
operation of the protocol itself do not leak data that could be used to
reverse the anonymisation applied to the flow data. Such data structures
may appear in the header, or within the data stream itself, especially
as options data. Each of these and their impact on specific
anonymisation techniques is noted in a separate subsection below.</t>
<section title="Appropriate Use of Information Elements for Anonymised Data" anchor="iespec-anon">
<t>Note, as in <xref target="aes-section"></xref> above, that black-marker
anonymised fields SHOULD NOT be exported at all; the absence of the
field in a given Data Set is implicitly declared by not including the
corresponding Information Element in the Template describing that Data
Set.</t>
<t>When using precision degradation of timestamps, Exporting Processes
SHOULD export timing information using Information Elements of an
appropriate precision, as explained in Section 4.5 of <xref target="RFC5153"></xref>.
For example, timestamps measured in
millisecond-level precision and degraded to second-level precision
should use flowStartSeconds and flowEndSeconds, not
flowStartMilliseconds and flowEndMilliseconds.</t>
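<t>The following minimal Python sketch illustrates this guideline. The dictionary representation of a record and the function name are assumptions made for illustration.</t>
<figure><artwork><![CDATA[
```python
def degrade_to_seconds(timestamp_ms: int) -> int:
    """Drop sub-second precision from a millisecond timestamp.

    The result belongs in a seconds-precision Information Element
    such as flowStartSeconds, not in flowStartMilliseconds with
    zeroed low-order digits.
    """
    return timestamp_ms // 1000


record = {"flowStartMilliseconds": 1230768000123}
exported = {
    "flowStartSeconds":
        degrade_to_seconds(record["flowStartMilliseconds"]),
}
```
]]></artwork></figure>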
<t>When exporting anonymised data and anonymisation metadata,
Exporting Processes SHOULD ensure that the combination of Information
Element and declared anonymisation technique are compatible.
Specifically, the applicable and recommended Information Element types
and semantics for each technique are noted in the description of the
anonymisationTechnique Information Element in <xref target="ie-at-section"></xref>.
In this description, a timestamp is an
Information Element with the data type dateTimeSeconds,
dateTimeMilliseconds, dateTimeMicroseconds, or dateTimeNanoseconds; an
address is an Information Element with the data type ipv4Address,
ipv6Address, or macAddress; and an identifier is an Information
Element with identifier data type semantics. Exporting Processes MUST
NOT export Anonymisation Options records binding techniques to
Information Elements to which they are not applicable, and SHOULD NOT
export Anonymisation Options records binding techniques to Information
Elements for which they are not recommended. </t>
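<t>The following Python sketch encodes these checks. The type sets mirror the definitions in this paragraph. The table gives no recommendation for techniques 6 and 7, so as an assumption the sketch treats them as recommended wherever they are applicable. All names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
from enum import IntEnum
from typing import Optional


class AnonymisationTechnique(IntEnum):
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    STRUCTURED_PERMUTATION = 6
    REVERSE_TRUNCATION = 7


TIMESTAMP_TYPES = frozenset({"dateTimeSeconds", "dateTimeMilliseconds",
                             "dateTimeMicroseconds",
                             "dateTimeNanoseconds"})
ADDRESS_TYPES = frozenset({"ipv4Address", "ipv6Address", "macAddress"})
ADDRESS_ONLY = frozenset({AnonymisationTechnique.STRUCTURED_PERMUTATION,
                          AnonymisationTechnique.REVERSE_TRUNCATION})


def is_applicable(technique: AnonymisationTechnique,
                  data_type: str) -> bool:
    """MUST-level check: techniques 6 and 7 apply only to addresses."""
    return technique not in ADDRESS_ONLY or data_type in ADDRESS_TYPES


def is_recommended(technique: AnonymisationTechnique, data_type: str,
                   semantics: Optional[str] = None) -> bool:
    """SHOULD-level check following the "Recommended for" column."""
    if not is_applicable(technique, data_type):
        return False
    if technique == AnonymisationTechnique.ENUMERATION:
        return data_type in TIMESTAMP_TYPES
    if technique == AnonymisationTechnique.PERMUTATION:
        return semantics == "identifier"
    return True
```
]]></artwork></figure>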
</section>
<section title="Export of Perimeter-Based Anonymisation Policies" anchor="perimeter-anon">
<t>Data collected from a single network may require different
anonymisation policies for addresses internal and external to the
network. For example, internal addresses could be subject to simple
permutation, while external addresses could be aggregated into
networks by truncation. When exporting anonymised perimeter biflow
data as in section 5.2 of <xref target="RFC5103"/>, this arrangement
may be easily represented by specifying one technique for source
endpoint information (which represents the external endpoint in a
perimeter biflow) and one technique for destination endpoint
information (which represents the internal endpoint in a perimeter
biflow).</t>
<t>However, it can also be useful to represent perimeter-based
anonymisation policies with uniflow, or non-perimeter biflow data.
In this case, the Perimeter Anonymisation bit (bit 2) in the
anonymisationFlags Information Element describing the anonymised
address Information Elements can be set to change the meaning of
"source" and "destination" of Information Elements to mean
"external" and "internal" as with perimeter biflows, but only with
respect to anonymisation policies.</t>
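<t>The perimeter policy from the example above can be sketched in Python as follows. The internal prefix 192.0.2.0/24, the /24 truncation length, the lookup-table permutation, and the function names are hypothetical choices for illustration.</t>
<figure><artwork><![CDATA[
```python
import ipaddress
from typing import Callable

IPv4 = ipaddress.IPv4Address


def truncate(addr: IPv4, keep_bits: int) -> IPv4:
    """Keep the keep_bits high-order bits of addr, zeroing the rest."""
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    return ipaddress.IPv4Address(int(addr) & mask)


def anonymise_endpoint(addr: IPv4, internal: ipaddress.IPv4Network,
                       permute: Callable[[IPv4], IPv4],
                       keep_bits: int = 24) -> IPv4:
    """Permute internal addresses; truncate external ones."""
    if addr in internal:
        return permute(addr)
    return truncate(addr, keep_bits)


# Usage with a hypothetical network and permutation table.
INTERNAL = ipaddress.IPv4Network("192.0.2.0/24")
PERM = {IPv4("192.0.2.10"): IPv4("192.0.2.200")}
src, dst = IPv4("192.0.2.10"), IPv4("198.51.100.77")
anon_src = anonymise_endpoint(src, INTERNAL, PERM.__getitem__)
anon_dst = anonymise_endpoint(dst, INTERNAL, PERM.__getitem__)
```
]]></artwork></figure>
<t>The technique applied depends only on which side of the perimeter an address lies, not on whether it appears as a source or a destination.</t>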
</section>
<section title="Anonymisation of Header Data" anchor="header-anon">
<t>Each IPFIX Message contains a Message Header; within this Message
Header are contained two fields which may be used to break certain
anonymisation techniques: the Export Time, and the Observation Domain
ID.</t>
<t>Export of IPFIX Messages containing anonymised timestamp data where
the original Export Time Message header has some relationship to the
anonymised timestamps SHOULD anonymise the Export Time header field
using an equivalent technique, if possible. Otherwise, relationships
between export and flow time could be used to partially or totally
reverse timestamp anonymisation.</t>
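<t>Where the timestamps are anonymised by shifting, the equivalent treatment of the Export Time is to apply the same shift. The following Python sketch assumes that technique and rewrites the Export Time in the 16-octet Message Header defined in <xref target="RFC5101"/>. The function name is hypothetical.</t>
<figure><artwork><![CDATA[
```python
import struct

# IPFIX Message Header: Version, Length, Export Time,
# Sequence Number, Observation Domain ID (network byte order).
HEADER = struct.Struct("!HHIII")


def shift_export_time(message: bytes, shift_seconds: int) -> bytes:
    """Return a copy of message with its Export Time shifted."""
    fields = HEADER.unpack_from(message)
    version, length, export_time, seq, odid = fields
    shifted = (export_time + shift_seconds) % 2**32  # wrap at 32 bits
    header = HEADER.pack(version, length, shifted, seq, odid)
    return header + message[HEADER.size:]


# Usage: a header-only Message exported at 1230768100 seconds.
msg = HEADER.pack(10, 16, 1230768100, 7, 1)
anon_msg = shift_export_time(msg, -100)
```
]]></artwork></figure>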
<t>The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID. If this address bears some relation to the IP addresses in the
flow data (e.g., shares a network prefix with internal addresses) and
the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymisation. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID is NOT RECOMMENDED in this case.</t>
<!--<t>[EDITOR'S NOTE: We might want to see if anyone is actually doing
this with IPFIX. The example comes from other network measurement
tools (e.g. Argus) which default to using an IPv4 address as a sensor
ID.]</t>-->
</section>
<section title="Anonymisation of Options Data" anchor="opt-anon">
<t>IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in <xref target="RFC5101"></xref> and <xref target="RFC5655"></xref> containing flow timestamps and network addresses of
Exporting and Collecting Processes may be used to break certain
anonymisation techniques; care should be taken while using them with
anonymised data export and storage.</t>
<t>The Exporting Process Reliability Statistics Options Template,
recommended in <xref target="RFC5101"></xref>, contains an Exporting Process
ID field, which may be an exportingProcessIPv4Address Information
Element or an exportingProcessIPv6Address Information Element. If the
Exporting Process address bears some relation to the IP addresses in
the flow data (e.g., shares a network prefix with internal addresses)
and the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Exporting Process address may be
used to break the IP address anonymisation. Exporting Processes
exporting anonymised data in this situation SHOULD mitigate the risk
of attack either by omitting Options described by the Exporting
Process Reliability Statistics Options Template, or by anonymising the
Exporting Process address using a similar technique to that used to
anonymise the IP addresses in the exported data.</t>
<t>Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the <xref target="RFC5655">IPFIX File Format</xref> may contain the
exportingProcessIPv4Address Information Element or the
exportingProcessIPv6Address Information Element to identify an
Exporting Process from which a flow record was received, and the
collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in
the flow data (e.g., shares a network prefix with internal addresses)
and the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Exporting Process or Collecting
Process address may be used to break the IP address anonymisation.
Since these Options Templates are primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it
is NOT RECOMMENDED that storage of anonymised data include these
Options Templates in order to mitigate the risk of attack.</t>
<t>The Message Details Options Template specified for the <xref target="RFC5655">IPFIX File Format</xref> also contains
the collectionTimeMilliseconds Information Element. As with the Export
Time Message Header field, if the exported flow data contains
anonymised timestamp information, and the collectionTimeMilliseconds
Information Element in a given Message has some relationship to the
anonymised timestamp information, then this relationship can be
exploited to reverse the timestamp anonymisation. Since this Options
Template is primarily intended for storing IPFIX Transport Session
data for auditing, replay, and testing purposes, it is NOT RECOMMENDED
that storage of anonymised data include this Options Template in order
to mitigate the risk of attack.</t>
<t>Since the Time Window Options Template specified for the
<xref target="RFC5655">IPFIX File Format</xref> refers to the
timestamps within the flow data to provide partial table of contents
information for an IPFIX File, care must be taken to ensure that
Options described by this template are written using the anonymised
timestamps instead of the original ones.</t>
<!--<t>[EDITOR'S NOTE: what about other non-standard templates
containing the same or similar IEs?]</t>-->
</section>
<section title="Special-Use Address Space Considerations" anchor="sua-anon">
<t>When IP addresses in data transported or stored using IPFIX are
anonymised, and the analysis purpose permits doing so, it is
recommended to filter out, or to leave unanonymised, data containing
the special-use IPv4 addresses enumerated in <xref
target="RFC3330"/> or the special-use IPv6 addresses enumerated in
RFC 5156. Data containing these addresses (e.g., 0.0.0.0, or
169.254.0.0/16 for link-local autoconfiguration in IPv4 space) is
often associated with specific, well-known behavioral
patterns. Detection of these patterns in anonymised data can lead to
deanonymisation of these special-use addresses, which increases the
chance of a complete reversal of anonymisation by an attacker,
especially of prefix-preserving techniques.</t>
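<t>A rough Python sketch of this filtering follows. It relies on the classification properties of the standard ipaddress module. Those properties follow the IANA special-purpose address registries, so they may not match <xref target="RFC3330"/> exactly. The record layout and function name are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def is_special_use(address: str) -> bool:
    """Heuristically flag special-use IPv4 or IPv6 addresses."""
    addr = ipaddress.ip_address(address)
    return (addr.is_unspecified or addr.is_loopback or
            addr.is_link_local or addr.is_multicast or
            addr.is_private or addr.is_reserved)


# Usage: (address, octets) pairs; special-use rows are set aside to
# be filtered out or left unanonymised.
records = [("169.254.3.4", 120), ("0.0.0.0", 64), ("8.8.8.8", 1500)]
to_anonymise = [r for r in records if not is_special_use(r[0])]
```
]]></artwork></figure>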
</section>
</section>
</section>
<section title="Examples">
<t>In this example, consider the export or storage of an anonymised IPv4 dataset from a single network described by a simple template containing a timestamp in seconds, a five-tuple, and packet and octet counters. The template describing each record in this dataset is shown in <xref target="af-template"/>.</t>
<figure title="Example Flow Template" anchor="af-template">
<artwork><![CDATA[
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Set ID = 2           |          Length = 40          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 256       |        Field Count = 8        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|    flowStartSeconds 150     |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|     sourceIPv4Address 8     |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|  destinationIPv4Address 12  |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|    sourceTransportPort 7    |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationTransportPort 11 |       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|     packetDeltaCount 2      |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|      octetDeltaCount 1      |       Field Length = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|    protocolIdentifier 4     |       Field Length = 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
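<t>To make the octet layout of the Example Flow Template concrete, the
following Python sketch encodes the same Template Set with the standard
struct module and confirms its Length of 40 octets: a 4-octet Set Header, a
4-octet Template Record Header, and eight 4-octet Field Specifiers. It is an
illustrative sketch rather than a complete IPFIX Exporter; the function
name encode_template_set is hypothetical, and Message Headers, padding, and
enterprise-specific Information Elements are not handled.</t>
<figure><artwork><![CDATA[
```python
import struct

# (Information Element name, IE number, field length) for Template 256,
# in the order shown in the Example Flow Template figure.
TEMPLATE_256 = [
    ("flowStartSeconds", 150, 4),
    ("sourceIPv4Address", 8, 4),
    ("destinationIPv4Address", 12, 4),
    ("sourceTransportPort", 7, 2),
    ("destinationTransportPort", 11, 2),
    ("packetDeltaCount", 2, 4),
    ("octetDeltaCount", 1, 4),
    ("protocolIdentifier", 4, 1),
]


def encode_template_set(template_id, fields):
    """Encode one Template Set (Set ID 2) holding a single Template Record.

    All fields are IANA-registered, so the enterprise bit stays clear and
    no Enterprise Number follows any Field Specifier.
    """
    record = struct.pack("!HH", template_id, len(fields))
    for _name, ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set Header: Set ID, then the Set length including this 4-octet header.
    return struct.pack("!HH", 2, 4 + len(record)) + record


tset = encode_template_set(256, TEMPLATE_256)
print(len(tset), tset[:8].hex())  # 40 0002002801000008
```
]]></artwork></figure>
<t>Summing the Field Lengths in the same list gives 25 octets, the size of
each Data Record described by Template 256.</t>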
<t>Suppose that this dataset is anonymised according to the following policy:
<list style="symbols">
<t>IP addresses within the network are protected by reverse truncation.</t>
<t>IP addresses outside the network are protected by prefix-preserving anonymisation.</t>
<t>Octet counts are exported using degraded precision in order to provide minimal protection against fingerprinting attacks.</t>
<t>All other fields are exported unanonymised.</t>
</list></t>
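<t>The policy above can be sketched in Python as three small functions, one
per anonymised field type. This is a minimal illustration under
assumptions that are not part of this document: the prefix-preserving
function follows the generic construction in which each output bit is the
input bit XORed with a pseudorandom function of the preceding input bits,
using HMAC-SHA256 in place of the AES-based Crypto-PAn scheme (so its
output is not Crypto-PAn compatible); the number of low-order bits kept by
reverse truncation (8, as for a /24 network) and the octet-count
granularity (1024) are hypothetical parameters; and the key is a
placeholder.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_v4(addr: str, key: bytes) -> str:
    """Prefix-preserving pseudonymisation of an IPv4 address.

    Each output bit is the input bit XORed with a pseudorandom bit derived
    from the preceding input bits, so two addresses sharing a k-bit prefix
    map to outputs sharing exactly a k-bit prefix. HMAC-SHA256 stands in for
    the pseudorandom function; this is NOT interoperable with Crypto-PAn.
    """
    a = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = a >> (32 - i)  # the first i bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (a >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def reverse_truncate_v4(addr: str, keep_bits: int) -> str:
    """Reverse truncation: zero the most significant bits, keep the low ones."""
    a = int(ipaddress.IPv4Address(addr))
    return str(ipaddress.IPv4Address(a & ((1 << keep_bits) - 1)))


def degrade_octets(count: int, granularity: int = 1024) -> int:
    """Precision degradation: round an octet count down to a multiple of a
    hypothetical granularity."""
    return (count // granularity) * granularity


KEY = b"example-key-not-for-production"  # hypothetical secret
anon_ext = prefix_preserving_v4("198.51.100.7", KEY)    # external address
anon_int = reverse_truncate_v4("192.0.2.77", keep_bits=8)  # internal address
# anon_int is "0.0.0.77"; the degraded octet count is 149504
print(anon_ext, anon_int, degrade_octets(150_000))
```
]]></artwork></figure>
<t>Because every flip bit depends only on the preceding input bits,
198.51.100.7 and 198.51.100.9 receive pseudonyms that share exactly their
first 28 bits, just as the original addresses do.</t>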
<t>To export Anonymisation Records for this template and policy, the
Anonymisation Options Template shown in <xref target="anon-opt-template"/>
is exported first. For this example, the optional privateEnterpriseNumber
and informationElementIndex Information Elements are omitted, because
Template 256 contains neither enterprise-specific Information Elements nor
multiple instances of the same Information Element.</t>
<figure title="Example Anonymisation Options Template" anchor="anon-opt-template">
<artwork><![CDATA[
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Set ID = 3           |          Length = 26          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Template ID = 257       |        Field Count = 4        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Scope Field Count = 2     |0|       templateId 145        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0|  informationElementId 303   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0|   anonymisationFlags TBD1   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |0| anonymisationTechnique TBD2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Field Length = 2        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
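<t>The Length of 26 octets follows from a 4-octet Set Header, a 6-octet
Options Template Record Header (Template ID, Field Count, and Scope Field
Count), and four 4-octet Field Specifiers. The Python sketch below
reproduces that arithmetic with the standard struct module. The function
name is hypothetical, and because the numbers for anonymisationFlags and
anonymisationTechnique are still to be assigned by IANA, the constants used
for them here are placeholders only.</t>
<figure><artwork><![CDATA[
```python
import struct

TEMPLATE_ID_IE = 145   # templateId (RFC 5102)
IE_ID_IE = 303         # informationElementId (RFC 5610)
# Hypothetical placeholders standing in for the unassigned TBD1 and TBD2.
ANON_FLAGS_IE = 32001
ANON_TECHNIQUE_IE = 32002


def encode_options_template_set(template_id, scope_fields, option_fields):
    """Encode one Options Template Set (Set ID 3) with a single record.

    scope_fields and option_fields are lists of (IE number, field length);
    scope fields come first, as required for Options Template Records.
    """
    fields = scope_fields + option_fields
    record = struct.pack("!HHH", template_id, len(fields), len(scope_fields))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    return struct.pack("!HH", 3, 4 + len(record)) + record


otset = encode_options_template_set(
    257,
    scope_fields=[(TEMPLATE_ID_IE, 2), (IE_ID_IE, 2)],
    option_fields=[(ANON_FLAGS_IE, 2), (ANON_TECHNIQUE_IE, 2)],
)
print(len(otset))  # 26
```
]]></artwork></figure>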
<t>The Anonymisation Options Template is followed by a Data Set
containing Anonymisation Records. This Data Set has one entry for each
Information Element Specifier in Template 256, which describes the flow
records, and is shown in <xref target="anon-records"/>. Note that
sourceIPv4Address and destinationIPv4Address have the Perimeter
Anonymisation (0x0004) flag set in anonymisationFlags, meaning that the
source address is to be treated as network-external, and the destination
address as network-internal.</t>
<figure title="Example Anonymisation Records" anchor="anon-records">
<artwork><![CDATA[
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Set ID = 257          |          Length = 68          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    flowStartSeconds IE 150    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymised 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    sourceIPv4Address IE 8     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC 0x0005  |   Structured Permutation 6    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          | destinationIPv4Address IE 12  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Perimeter, Stable 0x0007    |     Reverse Truncation 7      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |   sourceTransportPort IE 7    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymised 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |destinationTransportPort IE 11 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymised 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |     packetDeltaCount IE 2     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymised 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |     octetDeltaCount IE 1      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Stable 0x0003         |    Precision Degradation 2    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Template 256          |    protocolIdentifier IE 4    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        no flags 0x0000        |       Not Anonymised 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
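<t>The flag values in the Example Anonymisation Records can be derived by
OR-ing the individual flag bits, and the Set Length follows from eight
8-octet records plus the 4-octet Set Header. The Python sketch below
assumes the bit assignments implied by this example (stability class in the
two least significant bits, with 01 for session and 11 for stable, and
0x0004 for Perimeter Anonymisation); the authoritative definitions are
those given for anonymisationFlags and anonymisationTechnique earlier in
this document, and the function and constant names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import struct

# anonymisationFlags bits as used in this example: the low two bits carry
# the stability class (SC), and 0x0004 is the Perimeter Anonymisation flag.
SC_SESSION = 0x0001
SC_STABLE = 0x0003
PERIMETER = 0x0004

# anonymisationTechnique values used in the example records.
NOT_ANONYMISED, PRECISION_DEGRADATION = 1, 2
STRUCTURED_PERMUTATION, REVERSE_TRUNCATION = 6, 7

# (templateId, informationElementId, anonymisationFlags, anonymisationTechnique)
RECORDS = [
    (256, 150, 0x0000, NOT_ANONYMISED),                         # flowStartSeconds
    (256, 8, PERIMETER | SC_SESSION, STRUCTURED_PERMUTATION),   # sourceIPv4Address
    (256, 12, PERIMETER | SC_STABLE, REVERSE_TRUNCATION),       # destinationIPv4Address
    (256, 7, 0x0000, NOT_ANONYMISED),                           # sourceTransportPort
    (256, 11, 0x0000, NOT_ANONYMISED),                          # destinationTransportPort
    (256, 2, 0x0000, NOT_ANONYMISED),                           # packetDeltaCount
    (256, 1, SC_STABLE, PRECISION_DEGRADATION),                 # octetDeltaCount
    (256, 4, 0x0000, NOT_ANONYMISED),                           # protocolIdentifier
]


def encode_data_set(set_id, records):
    """Encode a Data Set of fixed-length records of four 2-octet fields."""
    body = b"".join(struct.pack("!HHHH", *rec) for rec in records)
    return struct.pack("!HH", set_id, 4 + len(body)) + body


dset = encode_data_set(257, RECORDS)
print(len(dset), hex(PERIMETER | SC_SESSION), hex(PERIMETER | SC_STABLE))
# 68 0x5 0x7
```
]]></artwork></figure>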
<t>The Anonymisation Records are followed by the Data Sets containing the
anonymised data, exported according to the template in <xref
target="af-template"/>.</t>
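<t>Each Data Record in those Data Sets is a fixed 25-octet concatenation of
the eight fields of Template 256, with the anonymised values simply taking
the place of the originals, so the anonymisation itself is not visible at
the encoding level. The Python sketch below packs two such records using
hypothetical, already-anonymised values; the function names and record
contents are illustrative only, and padding and Message Headers are
omitted.</t>
<figure><artwork><![CDATA[
```python
import ipaddress
import struct

# Field order and sizes from Template 256: flowStartSeconds, sourceIPv4Address,
# destinationIPv4Address, sourceTransportPort, destinationTransportPort,
# packetDeltaCount, octetDeltaCount, protocolIdentifier.
RECORD_FMT = "!IIIHHIIB"  # 25 octets; "!" means network order, no padding


def encode_flow(start, src, dst, sport, dport, packets, octets, proto):
    return struct.pack(RECORD_FMT, start,
                       int(ipaddress.IPv4Address(src)),
                       int(ipaddress.IPv4Address(dst)),
                       sport, dport, packets, octets, proto)


def encode_flow_data_set(flows):
    """Data Set for Template 256: the Set ID equals the Template ID."""
    body = b"".join(encode_flow(*f) for f in flows)
    return struct.pack("!HH", 256, 4 + len(body)) + body


# Hypothetical, already-anonymised values: the external source was permuted,
# the internal destination reverse-truncated, and octet counts rounded down
# to a multiple of 1024.
flows = [
    (1262304000, "203.0.113.45", "0.0.0.77", 51515, 80, 12, 9216, 6),
    (1262304005, "203.0.113.46", "0.0.0.78", 53, 33000, 1, 0, 17),
]
fset = encode_flow_data_set(flows)
print(struct.calcsize(RECORD_FMT), len(fset))  # 25 54
```
]]></artwork></figure>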
</section>
<section title="Security Considerations">
<t>This document provides guidelines for exporting metadata about
anonymised data in IPFIX, or storing metadata about anonymised data in
IPFIX Files. It is not intended as a general statement on the
applicability of specific flow data anonymisation techniques. Exporters or
publishers of anonymised data must take care that the applied
anonymisation technique is appropriate for the data source, the purpose,
and the risk of deanonymisation of a given application.</t>
<t>We note specifically that anonymisation is not a replacement for
encryption for confidentiality. It is only appropriate for protecting
identifying information in data to be used for purposes in which the
protected data is irrelevant. Confidentiality in export is best served by
using TLS or DTLS as in the Security Considerations section of <xref
target="RFC5101"/>, and in long-term storage by implementation-specific
protection applied as in the Security Considerations section of <xref
target="RFC5655"/>. Confidentiality and anonymisation are not mutually
exclusive: encryption may also be applied to the export or storage of
anonymised data when that data is not intended for public release.</t>
<t>When using pseudonymisation techniques that have a mutable mapping,
there is an inherent tradeoff in the stability of the map between
long-term comparability and security of the dataset against
deanonymisation. In general, deanonymisation attacks are more effective
given more information, so the longer a given mapping is valid, the more
information can be applied to deanonymisation. The specific details of
this are technique-dependent and therefore out of the scope of this
document.</t>
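<t>The tradeoff can be illustrated with a keyed pseudonymisation whose
mapping is re-keyed every period: a short period limits how much traffic
an attacker can correlate under one mapping, while a long period preserves
comparability across more of the dataset. The Python sketch below derives a
per-epoch key from a master key with HMAC-SHA256. The epoch length, key,
function names, and 16-hex-digit pseudonym length are hypothetical choices
for illustration, and a truncated keyed hash of this kind is only a simple
stand-in for a true permutation.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac


def pseudonym(addr: str, master_key: bytes, epoch: int) -> str:
    """Keyed pseudonym for an address whose mapping changes every epoch.

    The per-epoch key is derived from a master key, so the same address gets
    the same pseudonym within an epoch, but pseudonyms from different epochs
    cannot be linked without the master key.
    """
    epoch_key = hmac.new(master_key, epoch.to_bytes(8, "big"),
                         hashlib.sha256).digest()
    return hmac.new(epoch_key, addr.encode(), hashlib.sha256).hexdigest()[:16]


def epoch_of(timestamp: int, period_s: int) -> int:
    """Map a flowStartSeconds value to a mapping epoch of period_s seconds."""
    return timestamp // period_s


MASTER = b"hypothetical-master-key"
DAY = 86_400
t0 = 1262304000
same_day = (pseudonym("192.0.2.1", MASTER, epoch_of(t0, DAY))
            == pseudonym("192.0.2.1", MASTER, epoch_of(t0 + 3600, DAY)))
next_day = (pseudonym("192.0.2.1", MASTER, epoch_of(t0, DAY))
            == pseudonym("192.0.2.1", MASTER, epoch_of(t0 + DAY, DAY)))
print(same_day, next_day)  # True False
```
]]></artwork></figure>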
<t>When releasing anonymised data, publishers need to ensure that data
that could be used in deanonymisation is not leaked through the export
protocol; guidelines for addressing this risk are provided in <xref
target="guidelines"/>.</t>
<t>Note also that the Security Considerations section of <xref
target="RFC5101"/> applies to the export of anonymised data, and that of
<xref target="RFC5655"/> applies to the storage of anonymised data and
the publication of anonymised traces.</t>
</section>
<section title="IANA Considerations">
<t>This document specifies the creation of several new IPFIX Information
Elements in the IPFIX Information Element registry located at
http://www.iana.org/assignments/ipfix, as defined in <xref target="ie-section"></xref> above. IANA has assigned the following
Information Element numbers for their respective Information Elements as
specified below:
<list style="symbols">
<t>Information Element number TBD1 for the
anonymisationFlags Information Element.</t>
<t>Information Element number TBD2 for the anonymisationTechnique
Information Element.</t>
<t>Information Element number TBD3 for the informationElementIndex
Information Element.</t>
</list></t>
<t>[NOTE for IANA: The text TBDn should be replaced with the respective
assigned Information Element numbers where they appear in this
document.]</t>
<t>[EDITOR'S NOTE: do we want to define a new anonymisationTechnique
registry subject to standards action?]</t>
</section>
<section title="Acknowledgments">
<t>We thank Paul Aitken and John McHugh for their comments and insight,
and the PRISM project for its support of this work.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc5101;
&rfc5102;
&rfc5610;
&rfc5655;
&rfc3330;
&rfc5156;
</references>
<references title="Informative References">
&rfc5103;
&rfc5472;
&rfc5470;
&draftIpfixMedframe;
&draftIpfixMedps;
&rfc5153;
&rfc3917;
&rfc2119;
<!--
<reference anchor='cryptopan'>
<front>
<title>Prefix-Preserving IP Address Anonymisation</title>
<author initials='J' surname='Fan' fullname='Jinliang Fan'>
<organization />
</author>
<author initials='J' surname='Xu' fullname='Jun Xu'>
<organization />
</author>
<author initials='M' surname='Ammar' fullname='Mostafa H. Ammar'>
<organization />
</author>
<author initials='S' surname='Moon' fullname='Sue B. Moon'>
<organization />
</author>
<date month='October' day='7' year='2004' />
<abstract/>
</front>
<seriesInfo name='' value='Computer Networks, Volume 46, Issue 2, Pages 253-272, Elsevier'/>
</reference>
-->
</references>
</back>
</rfc>