<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="exp" docName="draft-ietf-ipfix-anon-06.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<front>
<title abbrev="IP Flow Anonymization Support">
IP Flow Anonymization Support
</title>
<author initials="E." surname="Boschi" fullname="Elisa Boschi">
<organization abbrev="ETH Zurich">
Swiss Federal Institute of Technology Zurich
</organization>
<address>
<postal>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<email>boschie@tik.ee.ethz.ch</email>
</address>
</author>
<author initials="B." surname="Trammell" fullname="Brian Trammell">
<organization abbrev="ETH Zurich">
Swiss Federal Institute of Technology Zurich
</organization>
<address>
<postal>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 13</phone>
<email>trammell@tik.ee.ethz.ch</email>
</address>
</author>
<date month="January" day="19" year="2011"></date>
<area>Operations</area>
<workgroup>IPFIX Working Group</workgroup>
<abstract>
<t>This document describes anonymization techniques for IP flow data and
the export of anonymized data using the IPFIX protocol. It categorizes
common anonymization schemes and defines the parameters needed to describe
them. It provides guidelines for the implementation of anonymized data
export and storage over IPFIX, and describes an information model and
Options-based method for anonymization metadata export within the IPFIX
protocol or storage in IPFIX Files.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The standardization of an IP flow information export protocol <xref
target="RFC5101"/> and associated representations removes a technical
barrier to the sharing of IP flow data across organizational boundaries
and with network operations, security, and research communities for a wide
variety of purposes. However, with wider dissemination come greater risks
to the privacy of the users of networks under measurement, and to the
security of those networks. While it is not a complete solution to the
issues posed by distribution of IP flow information, anonymization (i.e.,
the deletion or transformation of information that is considered sensitive
and could be used to reveal the identity of subjects involved in a
communication) is an important tool for the protection of privacy within
network measurement infrastructures.</t>
<t>This document presents a mechanism for representing anonymized data
within IPFIX and guidelines for using it. It is not intended as a general
statement on the applicability of specific flow data anonymization
techniques to specific situations, or as a recommendation of any
particular application of anonymization to flow data export. Exporters or
publishers of anonymized data must take care that the applied
anonymization technique is appropriate for the data source, the purpose,
and the risk of deanonymization of a given application.</t>
<t>It begins with a categorization of anonymization techniques. It then
describes applicability of each technique to commonly anonymizable fields
of IP flow data, organized by information element data type and semantics
as in <xref target="RFC5102"></xref>; enumerates the parameters required
by each of the applicable anonymization techniques; and provides
guidelines for the use of each of these techniques in accordance with
current best practices in data protection. Finally, it specifies a
mechanism for exporting anonymized data and binding anonymization metadata
to Templates and Options Templates using IPFIX Options.</t>
<section title="IPFIX Protocol Overview">
<t>In the IPFIX protocol, { type, length, value } tuples are expressed
in Templates containing { type, length } pairs, specifying which { value
} fields are present in data records conforming to the Template, giving
great flexibility as to what data is transmitted. Since Templates are
sent very infrequently compared with Data Records, this results in
significant bandwidth savings. Different data formats may be
transmitted simply by sending new Templates specifying the { type,
length } pairs for the new data format. See <xref target="RFC5101"></xref> for more information.</t>
<t>The <xref target="RFC5102">IPFIX information model</xref> defines a
large number of standard Information Elements which provide the
necessary { type } information for Templates. The use of standard
elements enables interoperability among different vendors'
implementations. Additionally, non-standard enterprise-specific elements
may be defined for private use.</t>
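<t>As an informal illustration of how a Template's { type, length } pairs
give meaning to the values in a Data Record, the following Python sketch
splits a record into fields. It is a simplified sketch rather than an IPFIX
implementation: it handles fixed-length fields only, and the names TEMPLATE
and decode_record are hypothetical. The Information Element numbers used (8
for sourceIPv4Address, 12 for destinationIPv4Address, 1 for octetDeltaCount)
are taken from the IPFIX information model.</t>
<figure><artwork><![CDATA[
```python
import struct

# Hypothetical Template: (Information Element ID, length in octets).
# 8 = sourceIPv4Address, 12 = destinationIPv4Address, 1 = octetDeltaCount.
TEMPLATE = [(8, 4), (12, 4), (1, 8)]


def decode_record(template, payload):
    """Split one Data Record into {ie_id: raw_bytes} using the Template."""
    fields = {}
    offset = 0
    for ie_id, length in template:
        fields[ie_id] = payload[offset:offset + length]
        offset += length
    if offset != len(payload):
        raise ValueError("record length does not match template")
    return fields


# One record: 192.0.2.1 -> 198.51.100.7, 1500 octets.
record = (bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])
          + struct.pack("!Q", 1500))
decoded = decode_record(TEMPLATE, record)
```
]]></artwork></figure>
<t>The record itself carries only values; without the Template, the 16
octets above could not be interpreted.</t>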
</section>
<section title="IPFIX Documents Overview" anchor="intro-docs">
<t><xref target="RFC5101">"Specification of the IPFIX
Protocol for the Exchange of IP Traffic Flow Information"</xref>
and its associated documents
define the IPFIX Protocol, which provides network engineers and
administrators with access to IP traffic flow information.</t>
<t><xref target="RFC5470">"Architecture for IP Flow
Information Export"</xref> defines
the architecture for the export of measured IP flow information out of
an IPFIX Exporting Process to an IPFIX Collecting Process, and the
basic terminology used to describe the elements of this architecture,
per the requirements defined in <xref target="RFC3917">"Requirements
for IP Flow Information Export"</xref>. The IPFIX Protocol document
<xref target="RFC5101"></xref> then covers the details of the method for
transporting IPFIX Data Records and Templates via a congestion-aware
transport protocol from an IPFIX Exporting Process to an IPFIX
Collecting Process.</t>
<t><xref target="RFC5102">"Information Model for IP Flow Information
Export"</xref> describes the Information Elements used by IPFIX,
including details on Information Element naming, numbering, and data
type encoding. Finally, <xref target="RFC5472">"IPFIX
Applicability"</xref> describes the various applications of the IPFIX
protocol and their use of information exported via IPFIX, and relates
the IPFIX architecture to other measurement architectures and
frameworks.</t>
<t>Additionally, <xref target="RFC5655">"Specification
of the IPFIX File Format"</xref> describes a file format based upon the
IPFIX Protocol for the storage of flow data.</t>
<t>This document references the Protocol and Architecture documents for
terminology, and extends the IPFIX Information Model to provide new
Information Elements for anonymization metadata. The anonymization
techniques described herein are equally applicable to the IPFIX Protocol
and data stored in IPFIX Files.</t>
</section>
<section title="Anonymization within the IPFIX Architecture" anchor="intro-arch">
<t>According to <xref target="RFC5470"/>, IPFIX Message anonymization is
optionally performed as the final operation before handing the Message
to the transport protocol for export. While no provision is made in the
architecture for anonymization metadata as in <xref
target="aes-section"></xref>, this arrangement does allow for the
rewriting necessary for comprehensive anonymization of IPFIX
export as in <xref target="export-anon-section"></xref>. The development
of the <xref target="I-D.ietf-ipfix-mediators-framework">IPFIX
Mediation</xref> framework and the <xref target="RFC5655">IPFIX File
Format</xref> expand upon this initial architectural allowance for
anonymization by adding to the list of places that anonymization may be
applied. The former specifies IPFIX Mediators, which rewrite existing
IPFIX Messages, and the latter specifies a method for storage of IPFIX
data in files.</t>
<t>More detail on the applicable architectural arrangements for
anonymization can be found in <xref
target="export-anon-arrangement"></xref>.</t>
</section>
<section title="Supporting Experimentation with Anonymization">
<t>The intended status of this document is Experimental, reflecting
the experimental nature of anonymization export support. Research on
network trace anonymization techniques and attacks against them is
ongoing. Indeed, there is increasing evidence that anonymization
applied to network trace or flow data on its own is insufficient for many
data protection applications as in <xref target="Bur10"/>. Therefore,
this document explicitly does not recommend any particular technique
or implementation thereof.</t>
<t>The intention of this document is to provide a common basis for
interoperable exchange of anonymized data, furthering research in this
area, both on anonymization techniques themselves and on the
application of anonymized data to network measurement. To that end,
the classification in <xref target="cat-section"/> and
anonymization export support in <xref target="aes-section"/>
can be used to describe and export information even about data
anonymized using techniques that are unacceptably weak for general
application to production data sets on their own.</t>
<t>While the specification herein is designed to be implementation-
and technique-independent, open research in this area may necessitate
future updates to the specification. Assuming the future successful
application of this specification to anonymized data publication and
exchange, it may be brought back to the IPFIX working group for
further development and publication on the standards track.</t>
</section>
</section>
<section title="Terminology">
<t>Terms used in this document that are defined in the Terminology section
of the <xref target="RFC5101">IPFIX Protocol</xref> document are to be
interpreted as defined there. In addition, this document defines the
following terms:</t>
<list style="hanging">
<t hangText="Anonymization Record: ">A record, defined by the
Anonymization Options Template in section <xref target="opt-section"/>,
that defines the properties of the anonymization applied to a single
Information Element within a single Template or Options Template.</t>
<t hangText="Anonymized Data Record: ">A Data Record within a Data Set
containing at least one Information Element with anonymized values. The
Information Element(s) within the Template or Options Template
describing this Data Record SHOULD have a corresponding Anonymization
Record.</t>
<t hangText="Intermediate Anonymization Process: ">An intermediate
process which takes Data Records and transforms them into Anonymized
Data Records.</t>
</list>
<t>Note that there is an explicit difference in this document between a
"Data Set" (which is defined as in <xref target="RFC5101"/>) and a "data
set". When in lower case, this term refers to any collection of data
(usually, within the context of this document, flow or packet data) which
may contain identifying information and is therefore subject to
anonymization.</t>
<t>Note also that when the term Template is used in this document, unless
otherwise noted, it applies both to Templates and Options Templates as
defined in <xref target="RFC5101"/>. Specifically, Anonymization Records
may apply to both Templates and Options Templates.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119">RFC
2119</xref>.</t>
</section>
<section title="Categorization of Anonymization Techniques" anchor="cat-section">
<t>Anonymization, as described by this document, is the modification of a
data set in order to protect the identity of the people or entities
described by the data set from disclosure. With respect to network traffic
data, anonymization generally attempts to preserve some set of properties
of the network traffic useful for a given application or applications,
while ensuring the data cannot be traced back to the specific networks,
hosts, or users generating the traffic.</t>
<t>Anonymization may be broadly classified according to two properties:
recoverability and countability. All anonymization techniques map the real
space of identifiers or values into a separate, anonymized space,
according to some function. A technique is said to be recoverable when the
function used is invertible or can otherwise be reversed and a real
identifier can be recovered from a given replacement identifier.
Techniques wherein the function used can only be reversed using additional
information, such as an encryption key, or knowledge of injected traffic
within the data set, are not considered recoverable; "recoverability" as
used within this categorization does not refer to recoverability under
attack.</t>
<t>Countability compares the dimension of the anonymized space (N) to the
dimension of the real space (M), and denotes how the count of unique
values is preserved by the anonymization function. If the anonymized space
is smaller than the real space, then the function is said to generalize
the input, mapping more than one input point to each anonymous value
(e.g., as with aggregation). By definition, generalization is not
recoverable.</t>
<t>If the dimensions of the anonymized and real spaces are the same, such
that the count of unique values is preserved, then the function is said to
be a direct substitution function. If the dimension of the anonymized
space is larger, such that each real value maps to a set of anonymized
values, then the function is said to be a set substitution function. Note
that with set substitution functions, the sets of anonymized values are
not necessarily disjoint. Either direct or set substitution functions are
said to be one-way if there exists no non-brute-force method for
recovering the real data point from an anonymized one in isolation (i.e.,
if the only way to recover the data point is to attack the anonymized data
set as a whole, e.g. through fingerprinting or data injection).</t>
<t>This classification is summarized in the table below.</t>
<texttable>
<ttcol align="left">Recoverability / Countability</ttcol>
<ttcol align="left">Recoverable</ttcol>
<ttcol align="left">Non-recoverable</ttcol>
<c>N &lt; M</c><c>N.A.</c><c>Generalization</c>
<c>N = M</c><c>Direct Substitution</c><c>One-way Direct Substitution</c>
<c>N &gt; M</c><c>Set Substitution</c><c>One-way Set Substitution</c>
</texttable>
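<t>The table can also be read as a simple decision procedure. The
following Python sketch restates it; the function name classify and its
parameters are hypothetical and are used here for illustration only.</t>
<figure><artwork><![CDATA[
```python
def classify(n_anonymized, m_real, recoverable):
    """Name the category for an anonymization function.

    n_anonymized: dimension of the anonymized space (N)
    m_real:       dimension of the real space (M)
    recoverable:  True if the function can be reversed as defined above
    """
    if n_anonymized < m_real:
        # Several real values share one anonymized value, so the
        # function cannot be reversed.
        if recoverable:
            raise ValueError("generalization is not recoverable")
        return "Generalization"
    if n_anonymized == m_real:
        kind = "Direct Substitution"
    else:
        kind = "Set Substitution"
    return kind if recoverable else "One-way " + kind
```
]]></artwork></figure>
<t>For example, classify(2**24, 2**32, False) corresponds to truncating
IPv4 addresses to /24 networks and yields "Generalization".</t>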
</section>
<section title="Anonymization of IP Flow Data">
<!--<t>[EDITOR'S NOTE: RESOLVED: discuss goals of anonymization here: two-way untraceability, one-way untraceability, partner profiling resistance (what's the difference)? Non-observability, non-linkability, correlation resistance. Point out we're mainly interested in obscuring (completely) the identity of at least one host in a given flow. Reorganize section to make this clear. (Discuss ST2)]</t>-->
<!--<t>[EDITOR'S NOTE: RESOLVED: Remove "intent to defeat" language (Discuss ST4)]</t>-->
<t>In anonymizing IP flow data as treated by this document, the goal is
generally two-way address untraceability: to remove the ability to assert
that endpoint X contacted endpoint Y at time T. Address untraceability is
important as IP addresses are the most suitable field in IP flow records to
identify real-world entities. Each IP address is associated with an
interface on a network host, and can potentially be identified with a
single user. Additionally, IP addresses are structured identifiers; that
is, partial IP address prefixes may be used to identify networks just as
full IP addresses identify hosts. This leads IP flow data anonymization to
be concerned first and foremost with IP address anonymization.</t>
<t>Any form of aggregation which combines flows from multiple endpoints
into a single record (e.g., aggregation by subnetwork, aggregation removing
addressing completely) may also provide address untraceability; however,
anonymization by aggregation is out of scope for this document.
Also of potential interest in this problem space, but likewise out of
scope, are anonymization techniques which are applied over multiple fields
or multiple records in a way which introduces dependencies among anonymized
fields or records. This document is concerned solely with anonymization
techniques applied at the resolution of single fields within a flow
record.</t>
<t>Even so, attacks against these anonymization techniques use entire
flows and relationships between hosts and flows within a given data set.
Therefore, fields which may not necessarily be identifying by themselves
may be anonymized in order to increase the anonymity of the data set as a
whole.</t>
<t>Due to the restricted semantics of IP flow data, there is a relatively
limited set of specific anonymization techniques available on flow data,
though each falls into the broad categories discussed in the previous
section. Each type of field that may commonly appear in a flow record may
have its own applicable specific techniques.</t>
<t>As with IP addresses, MAC addresses uniquely identify devices on the
network; while they are not often available in traffic data collected at
Layer 3, and cannot be used to locate devices within the network, some
traces may contain sub-IP data including MAC address data. Hardware
addresses may be mappable to device serial numbers, and to the entities or
individuals who purchased the devices, when combined with external
databases. MAC addresses are also often used in constructing IPv6
addresses (see section 2.5.1 of <xref target="RFC4291"/>), and as such may
be used to reconstruct the low-order bits of anonymized IPv6 addresses in
certain circumstances. Therefore, MAC address anonymization is also
important.</t>
<t>Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and ephemeral
ports, can be used to determine the operating system running on a host.
Relative data volumes by port can also be used to determine the host's
function (workstation, web server, etc.); this information can be used to
identify hosts and users.</t>
<t>While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Any given
network activity is recognizable by a pattern of relative time differences
and data volumes in the associated sequence of flows, even without host
address information. They can therefore be used to identify hosts and
users. Timestamps and counters are also vulnerable to traffic injection
attacks, where traffic with a known pattern is injected into a network
under measurement, and this pattern is later identified in the anonymized
data set. </t>
<t>The simplest and most extreme form of anonymization, which can be
applied to any field of a flow record, is black-marker anonymization, or
complete deletion of a given field. Note that black-marker anonymization
is equivalent to simply not exporting the field(s) in question.</t>
<t> While black-marker anonymization completely protects the data in
the deleted fields from the risk of disclosure, it also reduces the
utility of the anonymized data set as a whole. Techniques that retain some
information while reducing (though not eliminating) the disclosure risk
will be extensively discussed in the following sections; note that the
techniques specifically applicable to IP addresses, timestamps, ports, and
counters will be discussed in separate sections.</t>
<section title="IP Address Anonymization">
<t>Since IP addresses are the most common identifiers within flow data
that can be used to directly identify a person, organization, or host,
most of the work on flow and trace data anonymization has gone into IP
address anonymization techniques. Indeed, the aim of most attacks
against anonymization is to recover the map from anonymized IP addresses
to original IP addresses, thereby identifying the hosts. There
is therefore a wide range of IP address anonymization schemes that fit
into the following categories.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Truncation</c><c>Generalization</c>
<c>Reverse Truncation</c><c>Generalization</c>
<c>Permutation</c><c>Direct Substitution</c>
<c>Prefix-preserving Pseudonymization</c><c>Direct Substitution</c>
</texttable>
<section title="Truncation">
<t>Truncation removes "n" of the least significant bits from an IP
address, replacing them with zeroes. In effect, it replaces a host
address with a network address for some fixed netblock; for IPv4
addresses, 8-bit truncation corresponds to replacement with a /24
network address. Truncation is a non-reversible generalization scheme.
Note that while truncation is effective for making hosts
non-identifiable, it preserves information which can be used to
identify an organization, a geographic region, a country, or a
continent.</t>
<t>Truncation to an address length of 0 is equivalent to black-marker
anonymization. Complete removal of IP address information is only
recommended for analysis tasks which have no need to separate flow
data by host or network; e.g. as a first stage to per-application
(port) or time-series total volume analyses.</t>
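<t>Truncation as described above amounts to a single bit-mask operation.
The following Python sketch, which uses only the standard ipaddress module,
is one possible illustration; the function name truncate_address is
hypothetical.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def truncate_address(addr, n):
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    # Ones in the retained high-order bits, zeroes in the low n bits.
    mask = ((1 << width) - 1) >> n << n
    # type(ip) keeps the result in the same address family as the input.
    return str(type(ip)(int(ip) & mask))
```
]]></artwork></figure>
<t>With this sketch, truncate_address("192.0.2.77", 8) returns "192.0.2.0"
(the /24 network address), and truncate_address("192.0.2.77", 32) returns
"0.0.0.0", which carries no address information at all.</t>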
</section>
<section title="Reverse Truncation">
<t>Reverse truncation removes "n" of the most significant bits from an
IP address, replacing them with zeroes. Reverse truncation is a
non-reversible generalization scheme. Reverse truncation is effective
for making networks unidentifiable, partially or completely removing
information which can be used to identify an organization, a
geographic region, a country, or a continent (or RIR region of
responsibility). However, it may cause ambiguity when applied to data
collected from more than one network, since it treats all the hosts
with the same address on different networks as if they are the same
host. It is not particularly useful when publishing data where the
network of origin is known or can be easily guessed by virtue of the
identity of the publisher.</t>
<t>Like truncation, reverse truncation to an address length of 0 is
equivalent to black-marker anonymization.</t>
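<t>Reverse truncation is the mirror image of truncation: the mask keeps the
low-order bits instead. The following Python sketch illustrates this under
the same assumptions as any simple bit-mask implementation; the function
name reverse_truncate_address is hypothetical.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def reverse_truncate_address(addr, n):
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    # Ones in the retained low-order bits only.
    mask = (1 << (width - n)) - 1
    return str(type(ip)(int(ip) & mask))
```
]]></artwork></figure>
<t>The ambiguity noted above is easy to reproduce: with n = 24, both
"192.0.2.77" and "198.51.100.77" map to "0.0.0.77", so hosts on different
networks become indistinguishable.</t>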
</section>
<section title="Permutation">
<!--<t>[EDITOR'S NOTE: RESOLVED: We should say something about the
security requirements of the permutation function. There is all the
difference in the world between, say, a block cipher and an xor with
a known constant, but this section doesn't actually make that
distinction. Below, insection 5.4, the draft says that the
permutation and its parameters SHOULD not be exported, but that's
not quite the same as saying that the permutation SHOULD be hard to
invert without knowing its parameters. Similarly, the recommendation
to use a hash function can fail badly if the hash function is known
to the attacker: it is trivial for the attacker to brute force all
IPv4 addresses to deanonymize subjects if a known hash is used. HMAC
with a secret key would be more appropriate. Discuss ST5]</t>-->
<t>Permutation is a direct substitution technique, replacing each IP
address with an address selected from the set of possible IP
addresses, such that each anonymized address represents a
unique original address. The selection function is often random,
though it is not necessarily so. Permutation does not preserve any
structural information about a network, but it does preserve the
unique count of IP addresses. Any application that requires more
structure than host-uniqueness will not be able to use permuted IP
addresses.</t>
<t>There are many variations of permutation functions, each of which
has tradeoffs in performance, security, and guarantees of
non-collision; evaluating these tradeoffs is implementation
independent. However, in general permutation functions applied to
anonymization SHOULD be difficult to reverse without knowing the
parameters (e.g., a secret key for HMAC). Given the relatively small
space of IPv4 addresses in particular, hash functions applied without
additional parameters could be reversed through brute force if the
hash function is known, and SHOULD NOT be used as permutation
functions. Permutation functions may guarantee noncollision (i.e.,
that each anonymized address represents a unique original address),
but need not; however, the probability of collision SHOULD be low. We
treat even permutations with low but nonzero collision probability as
direct substitution nevertheless. Beyond these guidelines,
recommendations for specific permutation functions are out of scope
for this document.</t>
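<t>Purely as an illustration of a keyed permutation that guarantees
noncollision, the following Python sketch builds a bijection on the IPv4
address space from a small Feistel network whose round function is HMAC
with a secret key. This construction is an assumption chosen for brevity,
not a recommendation of this document; the function names, the round count,
and the key handling are all hypothetical, and the security of such a
small-domain construction would need separate evaluation.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def _round(key, rnd, half):
    """Keyed round function: 16-bit input half to 16-bit output."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def permute_ipv4(addr, key, rounds=4):
    """Map an IPv4 address to another one; a bijection for a fixed key."""
    value = int(ipaddress.IPv4Address(addr))
    left, right = value >> 16, value & 0xFFFF
    for rnd in range(rounds):
        left, right = right, left ^ _round(key, rnd, right)
    return str(ipaddress.IPv4Address((left << 16) | right))


def unpermute_ipv4(addr, key, rounds=4):
    """Invert permute_ipv4; only possible with knowledge of the key."""
    value = int(ipaddress.IPv4Address(addr))
    left, right = value >> 16, value & 0xFFFF
    for rnd in reversed(range(rounds)):
        left, right = right ^ _round(key, rnd, left), left
    return str(ipaddress.IPv4Address((left << 16) | right))
```
]]></artwork></figure>
<t>Because a Feistel network is invertible whatever its round function,
distinct original addresses always yield distinct anonymized addresses, and
the unique count of addresses is preserved exactly.</t>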
</section>
<section title="Prefix-preserving Pseudonymization">
<t>Prefix-preserving pseudonymization is a direct substitution
technique, like permutation but further restricted such that the
structure of subnets is preserved at each level while anonymizing IP
addresses. If two real IP addresses match on a prefix of "n" bits, the
two anonymized IP addresses will match on a prefix of "n" bits as
well. This is useful when relationships among networks must be
preserved for a given analysis task, but introduces structure into the
anonymized data which can be exploited in attacks against the
anonymization technique.</t>
<t>Scanning in Internet background traffic can cause particular
problems with this technique: if a scanner uses a predictable and
known sequence of addresses, this information can be used to reverse
the substitution. The low order portion of the address can be left
unanonymized as a partial defense against this attack.</t>
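<t>One way to obtain the prefix-preserving property is to decide, for each
bit position, whether to flip the bit based only on the bits that precede
it. The following Python sketch does this with HMAC as the keyed function.
It is a minimal illustration under that assumption, not a vetted scheme;
the function names and key handling are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_ipv4(addr, key):
    """Pseudonymize an IPv4 address while preserving shared prefixes."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)     # the i bits preceding bit i
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        # The flip decision depends only on the key and the prefix.
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a, b):
    """Number of leading bits on which two IPv4 addresses agree."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```
]]></artwork></figure>
<t>Two addresses that agree on their first "n" bits receive the same flip
decisions for those bits, so the anonymized addresses also agree on exactly
"n" bits. The same property is what makes the structure exploitable: an
attacker who learns one mapping learns something about every address
sharing a prefix with it.</t>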
</section>
</section>
<section title="MAC Address Anonymization">
<t>Flow data containing sub-IP information can also contain identifying
information in the form of the hardware (MAC) address. While MAC
address information cannot be used to locate a node within a network, it
can be used to directly and uniquely identify a specific device. Vendors or
organizations within the supply chain may then have the information
necessary to identify the entity or individual that purchased the
device.</t>
<t>MAC address information is not as structured as IP address
information. EUI-48 and EUI-64 MAC addresses contain an Organizationally
Unique Identifier (OUI) in the three most significant bytes of the
address; this OUI additionally contains bits noting whether the address
is locally or globally administered. Beyond this, there is no standard
relationship among the OUIs assigned to a given vendor.</t>
<t>Note that MAC address information also appears within IPv6
addresses, as the EUI-64 address, or EUI-48 address encoded as an EUI-64
address, is used as the least significant 64 bits of the IPv6 address in
the case of link local addressing or stateless autoconfiguration; the
considerations and techniques in this section may then apply to such
IPv6 addresses as well.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Truncation</c><c>Generalization</c>
<c>Reverse Truncation</c><c>Generalization</c>
<c>Permutation</c><c>Direct Substitution</c>
<c>Structured Pseudonymization</c><c>Direct Substitution</c>
</texttable>
<section title="Truncation">
<t>Truncation removes "n" of the least significant bits from a MAC
address, replacing them with zeroes. In effect, it retains bits of the
OUI, which identifies the manufacturer, while removing the least
significant bits identifying the particular device. Truncation of 24
bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the
device identifier while retaining the OUI.</t>
<t>Truncation is effective for making device manufacturers partially
or completely identifiable within a dataset while deleting unique
host identifiers; this can be used to retain and aggregate MAC layer
behavior by vendor.</t>
<t>Truncation to an address length of 0 is equivalent to
black-marker anonymization.</t>
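<t>The following Python sketch illustrates MAC address truncation for
addresses written as colon- or hyphen-separated hexadecimal octets. The
function name truncate_mac and the textual address format are assumptions
made for this example.</t>
<figure><artwork><![CDATA[
```python
def truncate_mac(mac, n):
    """Zero the n least significant bits of an EUI-48 or EUI-64 address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) not in (6, 8):
        raise ValueError("expected a 6- or 8-octet address")
    width = len(octets) * 8
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    value = int.from_bytes(octets, "big") >> n << n
    out = value.to_bytes(len(octets), "big")
    return ":".join("%02x" % b for b in out)
```
]]></artwork></figure>
<t>For example, truncate_mac("00:1b:21:3c:4d:5e", 24) returns
"00:1b:21:00:00:00": the OUI is retained and the device identifier is
removed.</t>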
</section>
<section title="Reverse Truncation">
<t>Reverse truncation removes "n" of the most significant bits from a
MAC address, replacing them with zeroes. Reverse truncation is a
non-reversible generalization scheme. This has the effect of removing
bits of the OUI, which identify manufacturers, before removing the
least significant bits. Reverse truncation of 24 bits zeroes out the
OUI.</t>
<t>Reverse truncation is effective for making device manufacturers
partially or completely unidentifiable within a dataset. However, it
may cause ambiguity by introducing the possibility of truncated MAC
address collision. Also note that the utility of removing manufacturer
information is not particularly well-covered by the literature.</t>
<t>Reverse truncation to an address length of 0 is
equivalent to black-marker anonymization.</t>
</section>
<section title="Permutation">
<t>Permutation is a direct substitution technique, replacing each
MAC address with an address selected from the set of possible
MAC addresses, such that each anonymized address
represents a unique original address. The selection function is often
random, though it is not necessarily so. Permutation does not preserve
any structural information about a network, but it does preserve the
unique count of devices on the network. Any application that requires
more structure than host-uniqueness will not be able to use permuted
MAC addresses.</t>
<t>There are many variations of permutation functions, each of which
has tradeoffs in performance, security, and guarantees of
non-collision; evaluating these tradeoffs is implementation
independent. However, in general permutation functions applied to
anonymization SHOULD be difficult to reverse without knowing the
parameters (e.g., a secret key for HMAC). While the EUI-48 space is
larger than the IPv4 address space, hash functions applied without
additional parameters could be reversed through brute force if the
hash function is known, and SHOULD NOT be used as permutation
functions. Permutation functions may guarantee noncollision (i.e.,
that each anonymized address represents a unique original address),
but need not; however, the probability of collision SHOULD be low. We
treat even permutations with low but nonzero collision probability as
direct substitution nevertheless. Beyond these guidelines,
recommendations for specific permutation functions are out of scope
for this document.</t>
</section>
<section title="Structured Pseudonymization">
<t>Structured pseudonymization for MAC addresses is a direct
substitution technique, like permutation, but restricted such
that the OUI (the most significant three bytes) is permuted separately
from the node identifier, the remainder. This is useful when the
uniqueness of OUIs must be preserved for a given analysis task, but
introduces structure into the anonymized data which can be exploited
in attacks against the anonymization technique.</t>
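<t>As an illustration only, the following Python sketch permutes the OUI
and the node identifier of an EUI-48 address independently, using a small
keyed Feistel network for each 24-bit half. The construction, the function
names, and the labels are assumptions made for this example; note that this
sketch also permutes the locally/globally administered and group bits
contained in the OUI, which an implementation may prefer to leave
intact.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac


def _feistel24(value, key, label, rounds=4):
    """Keyed bijection on 24-bit integers (two 12-bit halves)."""
    left, right = value >> 12, value & 0xFFF
    for rnd in range(rounds):
        msg = label + bytes([rnd]) + right.to_bytes(2, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        left, right = right, left ^ (int.from_bytes(digest[:2], "big") & 0xFFF)
    return (left << 12) | right


def pseudonymize_mac(mac, key):
    """Permute the OUI and the node identifier separately."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("expected an EUI-48 address")
    # Distinct labels keep the two permutations independent.
    oui = _feistel24(int.from_bytes(octets[:3], "big"), key, b"oui")
    node = _feistel24(int.from_bytes(octets[3:], "big"), key, b"node")
    out = oui.to_bytes(3, "big") + node.to_bytes(3, "big")
    return ":".join("%02x" % b for b in out)
```
]]></artwork></figure>
<t>Devices sharing an OUI share an anonymized OUI, and devices with
different OUIs never do, so per-vendor grouping survives
anonymization.</t>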
</section>
</section>
<section title="Timestamp Anonymization">
<t>The particular time at which a flow began or ended is not in itself
identifying information, but it can be used as part of
attacks against other anonymization techniques or for user profiling,
e.g. as in <xref target="Mur07"/>. Timestamps can be used in traffic
injection attacks, which use known information about a set of traffic
generated or otherwise known by an attacker to recover mappings of other
anonymized fields, as well as to identify certain activity by response
delay and size fingerprinting, which compares response sizes and
inter-flow times in anonymized data to known values. Note that these
attacks have been shown to be relatively robust against timestamp
anonymization techniques (see <xref target="Bur10"/>), so the techniques
presented in this section are relatively weak and should be used with
care.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalization</c>
<c>Enumeration</c><c>Direct or Set Substitution</c>
<c>Random Shifts</c><c>Direct Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>Precision Degradation is a generalization technique that removes
the most precise components of a timestamp, treating all events
occurring in each given interval (e.g., one millisecond for
millisecond-level degradation) as simultaneous. This has the effect of
potentially collapsing many timestamps into one. With this technique,
time precision is reduced and sequencing may be lost, but the
approximate time at which each event occurred is preserved. The
anonymized data may
not be generally useful for applications which require strict
sequencing of flows.</t>
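<t>As an illustration only, precision degradation of a timestamp can be
sketched as rounding down to the start of the enclosing interval. The
function name and the sample values are assumptions made for this
example.</t>
<figure><artwork><![CDATA[
```python
def degrade_timestamp(ts_ms, interval_ms):
    """Round a millisecond timestamp down to its interval start."""
    return ts_ms - (ts_ms % interval_ms)


# Three flow start times, degraded from millisecond to second precision.
flow_starts_ms = [1295434800123, 1295434800456, 1295434801001]
degraded = [degrade_timestamp(t, 1000) for t in flow_starts_ms]
# The first two flows now appear simultaneous; their order is lost.
```
]]></artwork></figure>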
<t>Note that flow meters with low time precision (e.g. second
precision, or millisecond precision on high-capacity networks) perform
the equivalent of precision degradation anonymization by their
design.</t>
<t>Note also that degradation to a very low precision (e.g. on the
order of minutes, hours, or days) is commonly used in analyses
operating on time-series aggregated data, and may also be described as
binning; though the time scales are longer and applicability more
restricted, this is in principle the same operation.</t>
<t>Precision degradation to infinitely low precision is equivalent to
black-marker anonymization. Removal of timestamp information is only
recommended for analysis tasks which have no need to separate flows in
time, for example for counting total volumes or unique occurrences of
other flow keys in an entire dataset.</t>
</section>
<section title="Enumeration">
<t>Enumeration is a substitution function that retains the
chronological order in which events occurred while eliminating time
information. Timestamps are substituted by equidistant timestamps (or
numbers) starting from a randomly chosen start value. The resulting
data is useful for applications requiring strict sequencing, but not
for those requiring good timing information (e.g. delay- or jitter-
measurement for quality-of-service (QoS) applications or service-level
agreement (SLA) validation).</t>
<t>Note that enumeration is functionally equivalent to precision
degradation in any environment into which traffic can be regularly
injected to serve as a clock at the precision of the frequency of the
injected flows.</t>
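<t>A minimal sketch of enumeration follows, assuming that all timestamps
of a data set are available at once. The function name, the step size,
and the range of the start value are hypothetical choices.</t>
<figure><artwork><![CDATA[
```python
import random


def enumerate_timestamps(timestamps, step=1, rng=None):
    """Replace timestamps by equidistant values, keeping their order."""
    rng = rng or random.SystemRandom()
    start = rng.randrange(1 << 31)  # randomly chosen start value
    # Indices of the input, ordered chronologically (ties keep their
    # input order and still receive distinct output values).
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    result = [0] * len(timestamps)
    for rank, index in enumerate(order):
        result[index] = start + rank * step
    return result
```
]]></artwork></figure>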
</section>
<section title="Random Shifts">
<t>Random time shifts add a random offset to every timestamp within a
dataset. This reversible substitution technique therefore retains
duration and inter-event interval information as well as chronological
order of flows. Random time shifts are quite weak, and relatively easy
to reverse in the presence of external knowledge about traffic on the
measured network.</t>
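<t>The sketch below illustrates a random time shift under the assumption
that a single offset is drawn once per data set; the function name and
the bound on the offset are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import random


def shift_timestamps(timestamps, max_shift, rng=None):
    """Add one random offset to every timestamp in a data set."""
    rng = rng or random.SystemRandom()
    offset = rng.randint(-max_shift, max_shift)  # the secret shift
    return [t + offset for t in timestamps]
```
]]></artwork></figure>
<t>Because every value moves by the same amount, differences between
timestamps are unchanged, and a single flow with a known real timestamp
suffices to recover the offset.</t>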
</section>
</section>
<section title="Counter Anonymization">
<t>Counters (such as packet and octet volumes per flow) are subject to
fingerprinting and injection attacks against anonymization, and can be
used for user profiling, as is the case for timestamps. Data sets with
anonymized counters are
useful only for analysis tasks for which relative or imprecise
magnitudes of activity are useful. Counter information can also be
completely removed, but this is only recommended for analysis tasks
which have no need to evaluate the removed counter, for example for
counting only unique occurrences of other flow keys.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalization</c>
<c>Binning</c><c>Generalization</c>
<c>Random Noise Addition</c><c>Direct or Set Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>As with precision degradation in timestamps, precision degradation
of counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships
between sizes of similarly-sized flows, but keeps relative magnitude
information. Precision degradation to an infinitely low precision is
equivalent to black-marker anonymization.</t>
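<t>For illustration, removing the lower-order bits of a counter can be
sketched as a bit mask; the function name and the number of dropped bits
are assumptions for this example.</t>
<figure><artwork><![CDATA[
```python
def degrade_counter(value, dropped_bits):
    """Clear the lowest dropped_bits bits of a counter."""
    return value & ~((1 << dropped_bits) - 1)


# With 8 bits dropped, 1300 and 1500 octets become indistinguishable,
# while a 70000-octet flow remains clearly larger.
small_a = degrade_counter(1300, 8)
small_b = degrade_counter(1500, 8)
large = degrade_counter(70000, 8)
```
]]></artwork></figure>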
</section>
<section title="Binning">
<t>Binning can be seen as a special case of precision degradation; the
operation is identical, except that in precision degradation the
counter ranges are uniform, and in binning they need not be. For
example, consider separating unopened TCP connections from potentially
opened TCP connections. Here, packet counters per flow would be binned
into two bins, one for 1-2 packet flows, and one for flows with 3 or
more packets. Binning schemes are generally chosen to keep precisely
the amount of information required in a counter for a given analysis
task. Note that, also unlike precision degradation, the bin label need
not be within the bin's range. Binning counters to a single bin is
equivalent to black-marker anonymization. </t>
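<t>The two-bin packet counter example above might be sketched as
follows. The bin labels 0 and 1 are arbitrary, hypothetical choices; as
noted, a label need not lie within the range of its bin.</t>
<figure><artwork><![CDATA[
```python
import bisect

BIN_EDGES = [3]      # lower bound of each bin after the first
BIN_LABELS = [0, 1]  # 0: 1-2 packets, 1: 3 or more packets


def bin_packet_count(packets):
    """Map a per-flow packet count to the label of its bin."""
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, packets)]
```
]]></artwork></figure>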
</section>
<section title="Random Noise Addition">
<t>Random noise addition adds a random amount to a counter in each
flow; this is used to keep relative magnitude information and minimize
the disruption to size relationship information while avoiding
fingerprinting attacks against anonymization. Note that there is no
guarantee that random noise addition will maintain ranking order by a
counter among members of a set. Random noise addition is particularly
useful when the derived analysis data will not be presented in such a
way as to require the lower-order bits of the counters.</t>
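<t>A minimal sketch of random noise addition follows, assuming uniformly
distributed integer noise within a fixed bound and clamping at zero.
Neither the distribution nor the bound is specified by this document, and
the function name is hypothetical.</t>
<figure><artwork><![CDATA[
```python
import random


def add_noise(counter, max_noise, rng=None):
    """Add a random amount in [-max_noise, max_noise] to a counter."""
    rng = rng or random.SystemRandom()
    noisy = counter + rng.randint(-max_noise, max_noise)
    return max(0, noisy)  # a counter cannot become negative
```
]]></artwork></figure>
<t>Two counters that differ by less than twice the bound may swap their
relative order after noise is added, which is why ranking order is not
guaranteed.</t>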
</section>
</section>
<section title="Anonymization of Other Flow Fields">
<t>Other fields, particularly port numbers and protocol numbers, can
be used to partially identify the applications that generated the
traffic in a given flow trace. This information can be used in
fingerprinting attacks, and may be of interest on its own (e.g., to
reveal that a certain application with suspected vulnerabilities is
running on a given network). These fields are generally
anonymized using one of two techniques.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Binning</c><c>Generalization</c>
<c>Permutation</c><c>Direct Substitution</c>
</texttable>
<section title="Binning">
<t>Binning is a generalization technique mapping a set of potentially
non-uniform ranges into a set of arbitrarily labeled bins. Common bin
arrangements depend on the field type and the analysis application.
For example, an IP protocol bin arrangement may preserve 1, 6, and 17
for ICMP, TCP, and UDP traffic, and bin all other protocols into a
single bin, to mitigate the use of uncommon protocols in
fingerprinting attacks. Another example arrangement may bin source and
destination ports into low (0-1023) and high (1024-65535) bins in
order to tell service from ephemeral ports without identifying
individual applications.</t>
<t>Binning other flow key fields to a single bin is equivalent to
black-marker anonymization. Removal of other flow key information is
only recommended for analysis tasks which have no need to
differentiate flows on the removed keys, for example for total traffic
counts or unique counts of other flow keys.</t>
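<t>The two example bin arrangements above can be sketched as follows.
The label used for the "all other protocols" bin and the labels of the
two port bins are hypothetical; any arbitrary labels would serve.</t>
<figure><artwork><![CDATA[
```python
PRESERVED_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP
OTHER_PROTOCOL_BIN = 255          # hypothetical label for the rest


def bin_protocol(protocol):
    """Keep common protocols, collapse all others into one bin."""
    if protocol in PRESERVED_PROTOCOLS:
        return protocol
    return OTHER_PROTOCOL_BIN


def bin_port(port):
    """Map a port to the low (0-1023) or high (1024-65535) bin."""
    return 0 if port <= 1023 else 1024
```
]]></artwork></figure>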
</section>
<section title="Permutation">
<t>Permutation is a direct substitution technique, replacing
each value with a value selected from the set of possible
values, such that each anonymized value represents a unique
original value. This is used to preserve the count of unique values
without preserving information about, or the ordering of, the values
themselves.</t>
<t>While permutation ideally guarantees that each anonymized value
represents a unique original value, this may require significant state
in the Intermediate Anonymization Process. Therefore, permutation may
be implemented by hashing for performance reasons, with hash functions
that may have relatively small collision probabilities. Such
techniques are still essentially direct substitution techniques,
despite the nonzero error probability.</t>
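<t>As a sketch of hash-based substitution, the example below maps IPv4
addresses through a keyed hash truncated to 32 bits. The choice of
HMAC-SHA-256 and the function name are assumptions made for this example;
this document does not recommend a specific function.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def hash_ipv4(address, key):
    """Substitute an IPv4 address by a truncated keyed hash."""
    packed = ipaddress.IPv4Address(address).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Keeping only 32 bits means two inputs can collide; no state is
    # kept, so such collisions are neither detected nor resolved.
    return str(ipaddress.IPv4Address(digest[:4]))
```
]]></artwork></figure>
<t>The key plays the role of the permutation parameter and has to be
protected accordingly.</t>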
</section>
</section>
</section>
<section title="Parameters for the Description of Anonymization Techniques">
<t>This section details the abstract parameters used to describe the
anonymization techniques examined in the previous section, on a
per-parameter basis. These parameters and their export safety inform the
design of the IPFIX anonymization metadata export specified in the
following section.</t>
<section title="Stability" anchor="params-stability">
<t>A stable anonymization will always map a given value in the real
space to a given value in the anonymized space, while an unstable
anonymization will change this mapping over time; a completely unstable
anonymization is essentially indistinguishable from black-marker
anonymization. Any given anonymization technique may be applied with a
varying range of stability. Stability is important for assessing the
comparability of anonymized information in different data sets, or in
the same data set over different time periods. In practice, an
anonymization may also be stable for every data set published by a
particular producer to a particular consumer, stable for a stated time
period within a dataset or across datasets, or stable only for a single
data set.</t>
<t>If no information about stability is available, users of anonymized
data MAY assume that the techniques used are stable across the entire
dataset, but unstable across datasets. Note that stability presents a
risk-utility tradeoff, as completely stable anonymization can be used
for longer-term trend analysis tasks but also presents more risk of
attack given the stable mapping. Information about the stability of
a mapping SHOULD be exported along with the anonymized data.</t>
</section>
<section title="Truncation Length">
<t>Truncation and precision degradation are described by the truncation
length, or the amount of data still remaining in the anonymized field
after anonymization.</t>
<t>Truncation length can generally be inferred from a given data set,
and need not be specially exported or protected. For bit-level
truncation, the truncated bits are generally inferable by the least
significant bit set for an instance of an Information Element described
by a given Template (or the most significant bit set, in the case of
reverse truncation). For precision degradation, the truncation is
inferable from the maximum precision given. Note that while this
inference method is generally applicable, it is data-dependent: there is
no guarantee that it will recover the exact truncation length used to
prepare the data.</t>
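<t>The inference for bit-level truncation might be sketched as follows
for IPv4 addresses; the function name is hypothetical. The example also
shows the data dependence noted above.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def inferred_prefix_length(addresses, width=32):
    """Infer the surviving prefix length from the lowest bit set."""
    combined = 0
    for value in addresses:
        combined |= value
    if combined == 0:
        return 0
    # Isolate the least significant set bit and count the zero bits
    # below it.
    trailing_zeros = (combined & -combined).bit_length() - 1
    return width - trailing_zeros


def as_int(text):
    return int(ipaddress.IPv4Address(text))


# All three addresses were truncated to 24 bits, but the first two
# alone happen to have bit 8 clear, so only 23 bits are inferred.
two = [as_int("192.0.2.0"), as_int("198.51.100.0")]
three = two + [as_int("203.0.113.0")]
```
]]></artwork></figure>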
<t>In the special case of IP address export with variable (per-record)
truncation, the truncation MAY be expressed by exporting the prefix
length alongside the address.</t>
</section>
<section title="Bin Map">
<t>Binning is described by the specification of a bin mapping function.
This function can be generally expressed in terms of an associative
array that maps each point in the original space to a bin, although from
an implementation standpoint most bin functions are much simpler and
more efficient.</t>
<t>Since the bin map for a bin mapping function is in essence the bin
mapping key, and can be used to partially deanonymize binned data,
depending on the degree of generalization, information about the bin
mapping function SHOULD NOT be exported.</t>
</section>
<section title="Permutation">
<t>Like binning, permutation is described by the specification of a
permutation function. In the general case, this can be expressed in
terms of an associative array that maps each point in the original space
to a point in the anonymized space. Unlike binning, each point in the
anonymized space corresponds to a single, unique point in the
original space.</t>
<t>Since the parameters of the permutation function are in essence
key-like (indeed, for cryptographic permutation functions, they are the
keys themselves), information about the permutation function or its
parameters SHOULD NOT be exported.</t>
</section>
<section title="Shift Amount">
<t>Shifting requires an amount to shift each value by. Since the shift
amount is the only key to a shift function, and can be used to trivially
deanonymize data protected by shifting, information about the shift
amount SHOULD NOT be exported.</t>
</section>
</section>
<section title="Anonymization Export Support in IPFIX" anchor="aes-section">
<t>Anonymized data exported via IPFIX SHOULD be annotated with
anonymization metadata, which details which fields described by which
Templates are anonymized, and provides appropriate information on the
anonymization techniques used. This metadata SHOULD be exported in Data
Records described by the recommended Options Templates described in this
section; these Options Templates use the additional Information Elements
described in the following subsection.</t>
<t>Note that fields anonymized using the black-marker (removal) technique
do not require any special metadata support: black-marker anonymized
fields SHOULD NOT be exported at all, by omitting the corresponding
Information Elements from the Template describing the Data Set. In the case
where application requirements dictate that a black-marker anonymized
field must remain in a Template, then an Exporting Process MAY export
black-marker anonymized fields with their native length as all-zeros, but
only in cases where enough contextual information exists within the record
to differentiate a black-marker anonymized field exported in this way from
a real zero value.</t>
<section title="Anonymization Records and the Anonymization Options Template" anchor="opt-section">
<t>The Anonymization Options Template describes Anonymization Records,
which allow anonymization metadata to be exported inline over IPFIX or
stored in an IPFIX File, by binding information about anonymization
techniques to Information Elements within defined Templates or Options
Templates. IPFIX Exporting Processes SHOULD export anonymization records
for any Template describing exported anonymized Data Records; IPFIX
Collecting Processes and processes downstream from them MAY use
anonymization records to treat anonymized data differently depending on
the applied technique.</t>
<t>Anonymization Records contain ancillary information bound to a
Template, so many of the considerations for Templates apply to
Anonymization Records as well. First, reliability is important: an
Exporting Process SHOULD export Anonymization Records after the
Templates they describe have been exported, and SHOULD export
anonymization records reliably if supported by the underlying transport
(i.e., without partial reliability when using SCTP).</t>
<t>Anonymization Records MUST be handled by Collecting Processes as
scoped to the Template to which they apply within the Transport Session
in which they are sent. When a Template is withdrawn via a Template
Withdrawal Message or expires during a UDP transport session, the
accompanying Anonymization Records are withdrawn or expire as well, and
do not apply to subsequent Templates with the same Template ID within
the Session unless re-exported.</t>
<t>The Stability Class within the anonymizationFlags IE can be used to
declare that a given anonymization technique's mapping will remain
stable across multiple sessions, but this does not mean that
anonymization technique information given in the Anonymization Records
themselves persists across Sessions. Each new Transport Session MUST
contain new Anonymization Records for each Template describing
anonymized Data Sets.</t>
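<t>One way a Collecting Process might keep Anonymization Records scoped
to a Template within a Transport Session is sketched below. The class,
its methods, and the representation of a session as an opaque key are
hypothetical and serve only to illustrate the scoping rules above.</t>
<figure><artwork><![CDATA[
```python
class AnonymizationRecordStore:
    """Anonymization Records keyed by (session, Template ID)."""

    def __init__(self):
        self._records = {}

    def add(self, session, template_id, ie_id, technique, flags=0):
        scope = self._records.setdefault((session, template_id), {})
        scope[ie_id] = (technique, flags)

    def lookup(self, session, template_id, ie_id):
        scope = self._records.get((session, template_id), {})
        return scope.get(ie_id)

    def template_withdrawn(self, session, template_id):
        # Withdrawal or expiry of a Template drops its records, so
        # they cannot leak onto a reused Template ID.
        self._records.pop((session, template_id), None)

    def session_closed(self, session):
        # Records never persist across Transport Sessions.
        for key in [k for k in self._records if k[0] == session]:
            del self._records[key]
```
]]></artwork></figure>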
<t>SCTP per-stream export <xref
target="I-D.ietf-ipfix-export-per-sctp-stream"/> may be used to ease
management of Anonymization Records if appropriate for the
application.</t>
<t>The fields of the Anonymization Options Template are as follows:</t>
<texttable>
<ttcol align="left">IE</ttcol>
<ttcol align="left">Description</ttcol>
<c>templateId [scope]</c>
<c>
The Template ID of the Template or Options Template containing the
Information Element described by this anonymization record. This
Information Element MUST be defined as a Scope Field.
</c>
<c>informationElementId [scope]</c>
<c>
The Information Element identifier of the Information Element
described by this anonymization record. This Information Element
MUST be defined as a Scope Field. Exporting Processes MUST clear
the Enterprise bit of the informationElementId and Collecting
Processes SHOULD ignore it; information about enterprise-specific
Information Elements is exported via the privateEnterpriseNumber
Information Element.
</c>
<c>privateEnterpriseNumber [scope] [optional]</c>
<c>
The Private Enterprise Number of the enterprise-specific Information
Element described by this anonymization record. This Information
Element MUST be defined as a Scope Field if present. A
privateEnterpriseNumber of 0 signifies that the Information Element
is IANA-registered.
</c>
<c>informationElementIndex [scope] [optional]</c>
<c>
The Information Element index of the instance of the Information
Element described by this anonymization record identified by the
informationElementId within the Template. Optional; need only be
present when describing Templates that have multiple instances of
the same Information Element. This Information Element MUST be
defined as a Scope Field if present. This Information Element is
defined in <xref target="ie-section"></xref>, below.
</c>
<c>anonymizationFlags</c>
<c>
Flags describing the mapping stability and specialized modifications
to the Anonymization Technique in use. SHOULD be present. This
Information Element is defined in <xref target="ie-af-section"/>,
below.
</c>
<c>anonymizationTechnique</c>
<c>
The technique used to anonymize the data. MUST be present. This
Information Element is defined in <xref target="ie-at-section"/>,
below.
</c>
</texttable>
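<t>The sketch below encodes a minimal Anonymization Options Template
without the optional fields, assuming the Options Template Record layout
of <xref target="RFC5101"/> (Set ID 3) and the registered element
identifiers 145 (templateId) and 303 (informationElementId). The element
identifiers of anonymizationFlags and anonymizationTechnique are not yet
assigned, so they are passed in as parameters; the function name is
hypothetical.</t>
<figure><artwork><![CDATA[
```python
import struct

OPTIONS_TEMPLATE_SET_ID = 3      # per RFC 5101
TEMPLATE_ID_IE = 145             # templateId
INFORMATION_ELEMENT_ID_IE = 303  # informationElementId


def anonymization_options_template(template_id, flags_ie,
                                   technique_ie):
    """Encode an Options Template Set holding a single template."""
    scope = [(TEMPLATE_ID_IE, 2), (INFORMATION_ELEMENT_ID_IE, 2)]
    options = [(flags_ie, 2), (technique_ie, 2)]
    fields = scope + options
    # Record header: Template ID, Field Count, Scope Field Count.
    record = struct.pack("!HHH", template_id, len(fields), len(scope))
    for ie_id, length in fields:
        record += struct.pack("!HH", ie_id, length)
    # Set header: Set ID and Set length in octets, header included.
    header = struct.pack("!HH", OPTIONS_TEMPLATE_SET_ID,
                         4 + len(record))
    return header + record
```
]]></artwork></figure>
<t>No padding is added after the record in this sketch.</t>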
</section>
<section title="Recommended Information Elements for Anonymization Metadata" anchor="ie-section">
<section title="informationElementIndex" anchor="ie-iei-section">
<list style="hanging">
<t hangText="Description: ">
A zero-based index of an Information Element referenced by informationElementId within a Template referenced by templateId; used to disambiguate scope for templates containing multiple identical Information Elements.</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="Data Type Semantics: ">identifier</t>
<t hangText="ElementId: ">TBD3</t>
<t hangText="Status: ">Current</t>
</list>
</section>
<section title="anonymizationTechnique" anchor="ie-at-section">
<list style="hanging">
<t hangText="Description: ">
A description of the anonymization technique applied to a
referenced Information Element within a referenced Template. Each
technique may be applicable only to certain Information Elements
and recommended only for certain Information Elements; these
restrictions are noted in the table below.
<texttable>
<ttcol align="left">Value</ttcol>
<ttcol align="left">Description</ttcol>
<ttcol align="left">Applicable to</ttcol>
<ttcol align="left">Recommended for</ttcol>
<c>0</c>
<c>Undefined: the Exporting Process makes no representation as to whether the defined field is anonymized or not. While the Collecting Process MAY assume that the field is not anonymized, it is not guaranteed not to be. This is the default anonymization technique.</c>
<c>all</c>
<c>all</c>
<c>1</c>
<c>None: the values exported are real.</c>
<c>all</c>
<c>all</c>
<c>2</c>
<c>Precision Degradation/Truncation: the values exported are anonymized using simple precision degradation or truncation. The new precision or number of truncated bits is implicit in the exported data, and can be deduced by the Collecting Process.</c>
<c>all</c>
<c>all</c>
<c>3</c>
<c>Binning: the values exported are anonymized into bins.</c>
<c>all</c>
<c>all</c>
<c>4</c><c>Enumeration: the values exported are anonymized by enumeration.</c>
<c>all</c>
<c>timestamps</c>
<c>5</c>
<c>Permutation: the values exported are anonymized by permutation.</c>
<c>all</c>
<c>identifiers</c>
<c>6</c><c>Structured Permutation: the values exported are anonymized by permutation, preserving bit-level structure as appropriate; this represents prefix-preserving IP address anonymization or structured MAC address anonymization.</c>
<c>addresses</c>
<c></c>
<c>7</c><c>Reverse Truncation: the values exported are anonymized using reverse truncation. The number of truncated bits is implicit in the exported data, and can be deduced by the Collecting Process.</c>
<c>addresses</c>
<c></c>
<c>8</c><c>Noise: the values exported are anonymized by adding random noise to each value.</c>
<c>non-identifiers</c>
<c>counters</c>
<c>9</c><c>Offset: the values exported are anonymized by adding a single offset to all values.</c>
<c>all</c>
<c>timestamps</c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="Data Type Semantics: ">identifier</t>
<t hangText="ElementId: ">TBD2</t>
<t hangText="Status: ">Current</t>
</list>
</section>
<section title="anonymizationFlags" anchor="ie-af-section">
<list style="hanging">
<t hangText="Description: ">
A flag word describing specialized modifications to the
anonymization policy in effect for the anonymization technique
applied to a referenced Information Element within a referenced
Template. When flags are clear (0), the normal policy (as
described by anonymizationTechnique) applies without
modification.
<figure title="anonymizationFlags IE">
<artwork><![CDATA[
 MSB  14  13  12  11  10   9   8   7   6   5   4   3   2   1 LSB
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|                   Reserved                    |LOR|PmA|  SC   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
</figure>
<texttable>
<ttcol align="left">bit(s) (LSB = 0)</ttcol>
<ttcol align="left">name</ttcol>
<ttcol align="left">description</ttcol>
<c>0-1</c><c>SC</c><c>Stability Class: see the Stability Class
table below, and section <xref target="params-stability"/>.</c>
<c>2</c><c>PmA</c><c>Perimeter Anonymization: when set (1),
source- Information Elements as described in <xref
target="RFC5103"/> are interpreted as external addresses, and
destination- Information Elements as described in <xref
target="RFC5103"/> are interpreted as internal addresses, for
the purposes of associating anonymizationTechnique to
Information Elements only; see <xref
target="perimeter-anon"/> for details. This bit MUST NOT be set
when associated with an Information Element that is not an endpoint
(i.e., source- or destination-) Information Element. This bit SHOULD
be set consistently within a
record (i.e., if a source- Information Element has this flag
set, the corresponding destination- element SHOULD have this
flag set, and vice-versa.)</c>
<c>3</c><c>LOR</c><c>Low-Order Unchanged: when set (1), the
low-order bits of the anonymized Information Element contain
real data. This modification is intended for the anonymization
of network-level addresses while leaving host-level addresses
intact in order to preserve host-level structure, which could
otherwise be used to reverse anonymization. MUST NOT be set when
associated with a truncation-based anonymizationTechnique.</c>
<c>4-15</c><c>Reserved</c><c>Reserved for future use: SHOULD be
cleared (0) by the Exporting Process and MUST be ignored by the
Collecting Process.</c>
</texttable>
The Stability Class portion of this flags word describes the
stability class of the anonymization technique applied to a
referenced Information Element within a referenced Template.
Stability classes refer to the stability of the parameters of
the anonymization technique, and therefore the comparability of
the mapping between the real and anonymized values over time.
This determines which anonymized datasets may be compared with
each other. Values are as follows:
<texttable>
<ttcol align="left">Bit 1</ttcol>
<ttcol align="left">Bit 0</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c><c>0</c><c>Undefined: the Exporting Process makes no representation as to how stable the mapping is, or over what time period values of this field will remain comparable; while the Collecting Process MAY assume Session level stability, Session level stability is not guaranteed. Processes SHOULD assume this is the case in the absence of stability class information; this is the default stability class.</c>
<c>0</c><c>1</c><c>Session: the Exporting Process will ensure that the parameters of the anonymization technique are stable during the Transport Session. All the values of the described Information Element for each Record described by the referenced Template within the Transport Session are comparable. The Exporting Process SHOULD endeavour to ensure at least this stability class.</c>
<c>1</c><c>0</c><c>Exporter-Collector Pair: the Exporting Process will ensure that the parameters of the anonymization technique are stable across Transport Sessions over time with the given Collecting Process, but may use different parameters for different Collecting Processes. Data exported to different Collecting Processes are not comparable.</c>
<c>1</c><c>1</c><c>Stable: the Exporting Process will ensure that the parameters of the anonymization technique are stable across Transport Sessions over time, regardless of the Collecting Process to which it is sent.</c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="Data Type Semantics: ">flags</t>
<t hangText="ElementId: ">TBD1</t>
<t hangText="Status: ">Current</t>
</list>
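<t>For illustration, the flag word described above can be packed and
unpacked as follows. The Python names are hypothetical; the bit positions
and Stability Class values are taken from the tables in this section.</t>
<figure><artwork><![CDATA[
```python
from enum import IntEnum


class StabilityClass(IntEnum):
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


SC_MASK = 0x3     # bits 0-1
PMA_BIT = 1 << 2  # Perimeter Anonymization
LOR_BIT = 1 << 3  # Low-Order Unchanged


def encode_flags(stability, perimeter=False,
                 low_order_unchanged=False):
    """Build the 16-bit flag word; reserved bits stay cleared."""
    word = int(stability) & SC_MASK
    if perimeter:
        word |= PMA_BIT
    if low_order_unchanged:
        word |= LOR_BIT
    return word


def decode_flags(word):
    """Return (stability class, PmA, LOR), ignoring reserved bits."""
    return (StabilityClass(word & SC_MASK),
            bool(word & PMA_BIT),
            bool(word & LOR_BIT))
```
]]></artwork></figure>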
</section>
</section>
</section>
<section title="Applying Anonymization Techniques to IPFIX Export and Storage" anchor="export-anon-section">
<t>When exporting or storing anonymized flow data using IPFIX, certain
interactions between the IPFIX Protocol and the anonymization techniques
in use must be considered; these are treated in the subsections below.</t>
<section title="Arrangement of Processes in IPFIX Anonymization" anchor="export-anon-arrangement">
<t>Anonymization may be applied to IPFIX data at three stages within the
collection infrastructure: on initial export, at a mediator, or after
collection, as shown in <xref target="loc-fig"></xref>. Each of these
locations has specific considerations and applicability.</t>
<figure title="Potential Anonymization Locations" anchor="loc-fig">
<artwork><![CDATA[
+==========================================+
|            Exporting Process             |
+==========================================+
  |                                      |
  |  (Anonymized at Original Exporter)   |
  V                                      |
+=============================+          |
|          Mediator           |          |
+=============================+          |
  |                                      |
  |  (Anonymising Mediator)              |
  V                                      V
+==========================================+
|            Collecting Process            |
+==========================================+
  |
  |  (Anonymising CP/File Writer)
  V
+--------------------+
| IPFIX File Storage |
+--------------------+
]]></artwork>
</figure>
<t>Anonymization is generally performed before the wider dissemination
or repurposing of a flow data set, e.g., adapting operational
measurement data for research. Therefore, direct anonymization of flow
data on initial export is only applicable in certain restricted
circumstances: when the Exporting Process (EP) is "publishing" data to a
Collecting Process (CP) directly, and the Exporting Process and
Collecting Process are operated by different entities. Note that certain
guidelines in <xref target="header-anon"></xref> with respect to
timestamp anonymization may not apply in this case, as the Collecting
Process may be able to deduce certain timing information from the time
at which each Message is received.</t>
<t>A much more flexible arrangement is to anonymize data within a <xref
target="I-D.ietf-ipfix-mediators-framework">Mediator</xref>. Here,
original data is sent to a Mediator, which performs the anonymization
function and re-exports the anonymized data. Such a Mediator could be
located at the administrative domain boundary of the initial Exporting
Process operator, exporting anonymized data to other consumers outside
the organization. In this case, the original Exporter SHOULD use <xref
target="RFC5246">TLS</xref> as specified in <xref
target="RFC5101"></xref> to secure the channel to the Mediator, and the
Mediator should follow the guidelines in <xref
target="guidelines"></xref>, to mitigate the risk of original data
disclosure.</t>
<t>When data is to be published as an anonymized data set in an <xref
target="RFC5655">IPFIX File</xref>, the anonymization may be done at the
final Collecting Process before storage and dissemination, as well. In
this case, the Collector should follow the guidelines in <xref
target="guidelines"></xref>, especially as regards File-specific Options
in <xref target="opt-anon"></xref>.</t>
<t>In each of these data flows, the anonymization of records is
undertaken by an Intermediate Anonymization Process (IAP); the data
flows into and out of this IAP are shown in <xref target="iap-dataflows"></xref> below.</t>
<figure title="Data flows through the anonymization process" anchor="iap-dataflows">
<artwork><![CDATA[
packets --+                     +- IPFIX Messages -+
          |                     |                  |
          V                     V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+
          |      Non-anonymized | Records          |
          V                     V                  V
+=========================================================+
|        Intermediate Anonymization Process (IAP)         |
+=========================================================+
          | Anonymized          ^ Anonymized       |
          | Records             | Records          |
          V                     |                  V
+===================+     Anonymization     +=============+
| Exporting Process |<--- Parameters ------>| File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+
]]></artwork>
</figure>
<t>Anonymization parameters must also be available to the Exporting
Process and/or File Writer in order to ensure header data is also
appropriately anonymized as in <xref target="header-anon"></xref>.</t>
<t>Following each of the data flows through the IAP, we describe
five basic types of anonymization arrangements within this framework in
<xref target="iap-arrangements"></xref>. In addition to the three arrangements
described in detail above, anonymization can also be done at a
collocated Metering Process (MP) and File Writer (FW) (see section 7.3.2 of <xref target="RFC5655"></xref>), or at a file manipulator, which combines a File Writer with a File Reader (FR) (see section
7.3.7 of <xref target="RFC5655"></xref>).</t>
<figure title="Possible anonymization arrangements in the IPFIX architecture" anchor="iap-arrangements">
<artwork><![CDATA[
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> anonymization on Original Exporter
         +----+  +-----+  +----+
         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masq. Proxy)
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
         +----+  +-----+  +----+
         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
File     +----+  +-----+  +----+
]]></artwork>
</figure>
<t>Note that anonymization may occur at more than one location within a
given collection infrastructure, to provide varying levels of anonymization,
disclosure risk, or data utility for specific purposes.</t>
</section>
<section title="IPFIX-Specific Anonymization Guidelines" anchor="guidelines">
<t>In implementing and deploying the anonymization techniques described
in this document, implementors should note that IPFIX already provides
features that support anonymized data export, and use these where
appropriate. Care must also be taken that data structures supporting the
operation of the protocol itself do not leak data that could be used to
reverse the anonymization applied to the flow data. Such data structures
may appear in the header, or within the data stream itself, especially
as options data. Each of these and their impact on specific
anonymization techniques is noted in a separate subsection below.</t>
<section title="Appropriate Use of Information Elements for Anonymized Data" anchor="iespec-anon">
<t>Note, as in <xref target="aes-section"></xref> above, that
black-marker anonymized fields SHOULD NOT be exported at all; the
absence of the field in a given Data Set is implicitly declared by not
including the corresponding Information Element in the Template
describing that Data Set.</t>
<t>When using precision degradation of timestamps, Exporting Processes
SHOULD export timing information using Information Elements of an
appropriate precision, as explained in Section 4.5 of <xref
target="RFC5153"></xref>. For example, timestamps measured in
millisecond-level precision and degraded to second-level precision
should use flowStartSeconds and flowEndSeconds, not
flowStartMilliseconds and flowEndMilliseconds.</t>
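<t>As a rough illustration of the timestamp case above, the following
Python sketch degrades a millisecond timestamp to second precision and
returns it together with the name of the matching Information Element.
The function name and the returned tuple layout are assumptions made for
this example only.</t>
<figure>
<artwork><![CDATA[
```python
def degrade_to_seconds(flow_start_ms):
    """Truncate a millisecond epoch timestamp to whole seconds.

    Returns the Information Element name to export together with the
    degraded value, so that the declared precision matches the data.
    """
    return ("flowStartSeconds", flow_start_ms // 1000)
```
]]></artwork>
</figure>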
<t>When exporting anonymized data and anonymization metadata,
Exporting Processes SHOULD ensure that the combination of Information
Element and declared anonymization technique is compatible.
Specifically, the applicable and recommended Information Element types
and semantics for each technique are noted in the description of the
anonymizationTechnique Information Element in <xref
target="ie-at-section"></xref>. In this description, a timestamp is an
Information Element with the data type dateTimeSeconds,
dateTimeMilliseconds, dateTimeMicroseconds, or dateTimeNanoseconds; an
address is an Information Element with the data type ipv4Address,
ipv6Address, or macAddress; and an identifier is an Information
Element with identifier data type semantics. Exporting Processes MUST
NOT export Anonymization Options records binding techniques to
Information Elements to which they are not applicable, and SHOULD NOT
export Anonymization Options records binding techniques to Information
Elements for which they are not recommended.</t>
</section>
<section title="Export of Perimeter-Based Anonymization Policies" anchor="perimeter-anon">
<t>Data collected from a single network may require different
anonymization policies for addresses internal and external to the
network. For example, internal addresses could be subject to simple
permutation, while external addresses could be aggregated into
networks by truncation. When exporting anonymized perimeter
bidirectional flow (biflow) data as in Section 5.2 of <xref
target="RFC5103"/>, this arrangement may be easily represented by
specifying one technique for source endpoint information (which
represents the external endpoint in a perimeter biflow) and one
technique for destination endpoint information (which represents the
internal address in a perimeter biflow).</t>
<t>However, it can also be useful to represent perimeter-based
anonymization policies with unidirectional flow (uniflow), or
non-perimeter biflow data. In this case, the Perimeter Anonymization
bit (bit 2) in the anonymizationFlags Information Element describing
the anonymized address Information Elements can be set to change the
meaning of "source" and "destination" of Information Elements to
mean "external" and "internal" as with perimeter biflows, but only
with respect to anonymization policies.</t>
</section>
<section title="Anonymization of Header Data" anchor="header-anon">
<t>Each IPFIX Message contains a Message Header. Two fields within this
header may be used to break certain anonymization techniques: the
Export Time and the Observation Domain ID.</t>
<t>When exporting IPFIX Messages containing anonymized timestamp data,
where the original Export Time Message Header field has some
relationship to the anonymized timestamps, Exporting Processes SHOULD
anonymize the Export Time header field so that it is consistent with
the anonymized timestamp data. Otherwise, relationships between export
and flow time could be used to partially or totally reverse timestamp
anonymization. When anonymising timestamps and the Export Time header
field, Exporting Processes SHOULD avoid times too far in the past or
future; while <xref target="RFC5101"/> does not make any allowance for
Export Time error detection, Collecting Processes may reasonably
interpret Messages with seemingly nonsensical Export Times as
erroneous. Specific limits are implementation-dependent, but this may
cause interoperability problems when anonymising the Export Time
header field.</t>
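<t>The following Python sketch illustrates one way to keep the Export
Time consistent with anonymized flow timestamps. It assumes, purely for
illustration, that timestamps are anonymized by a single constant shift,
which is only one of several possible techniques. The function names and
the plausibility bound MAX_SKEW_SECONDS are hypothetical and are not
defined by this document or by IPFIX.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical plausibility bound of roughly ten years; real limits
# are implementation-dependent.
MAX_SKEW_SECONDS = 10 * 365 * 24 * 3600


def shift_export_time(export_time, flow_times, offset):
    """Apply one constant offset to the Export Time and flow timestamps.

    Shifting both by the same amount keeps their differences unchanged,
    so the header stays consistent with the anonymized flow times.
    """
    return export_time + offset, [t + offset for t in flow_times]


def is_plausible(export_time, now):
    """Return True if export_time is within MAX_SKEW_SECONDS of now."""
    return abs(export_time - now) <= MAX_SKEW_SECONDS
```
]]></artwork>
</figure>
<t>A caller could reject or re-draw any offset for which is_plausible()
returns False, to reduce the chance that a Collecting Process treats the
Message as erroneous.</t>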
<t>The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID. If this address bears some relation to the IP addresses in the
flow data (e.g., shares a network prefix with internal addresses) and
the IP addresses in the flow data are anonymized in a
structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymization. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID is NOT RECOMMENDED in this case.</t>
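<t>A deployment could check for this situation before export. The
following Python sketch reads an Observation Domain ID as an IPv4 address
and tests whether it falls within any internal network. The function name
and the representation of internal networks as CIDR strings are
assumptions made for this example.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress


def odid_leaks_prefix(observation_domain_id, internal_networks):
    """Return True if the 32-bit ID, read as an IPv4 address, lies in
    any of the given internal networks (CIDR strings)."""
    as_address = ipaddress.IPv4Address(observation_domain_id)
    return any(as_address in ipaddress.ip_network(net)
               for net in internal_networks)
```
]]></artwork>
</figure>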
</section>
<section title="Anonymization of Options Data" anchor="opt-anon">
<t>IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in <xref
target="RFC5101"></xref> and <xref target="RFC5655"></xref> containing
flow timestamps and network addresses of Exporting and Collecting
Processes may be used to break certain anonymization techniques. When
using these Options along with anonymized data export and storage, values
within the Options which could be used to break the anonymization
SHOULD themselves be anonymized or omitted.</t>
<t>The Exporting Process Reliability Statistics Options Template,
recommended in <xref target="RFC5101"></xref>, contains an Exporting
Process ID field, which may be an exportingProcessIPv4Address
Information Element or an exportingProcessIPv6Address Information
Element. If the Exporting Process address bears some relation to the
IP addresses in the flow data (e.g., shares a network prefix with
internal addresses) and the IP addresses in the flow data are
anonymized in a structure-preserving way, then the Exporting Process
address may be used to break the IP address anonymization. Exporting
Processes exporting anonymized data in this situation SHOULD mitigate
the risk of attack either by omitting Options described by the
Exporting Process Reliability Statistics Options Template, or by
anonymising the Exporting Process address using a similar technique to
that used to anonymize the IP addresses in the exported data.</t>
<t>Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the <xref
target="RFC5655">IPFIX File Format</xref> may contain the
exportingProcessIPv4Address Information Element or the
exportingProcessIPv6Address Information Element to identify an
Exporting Process from which a flow record was received, and the
collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in
the data set (e.g., shares a network prefix with internal addresses)
and the IP addresses in the data set are anonymized in a
structure-preserving way, then the Exporting Process or Collecting
Process address may be used to break the IP address anonymization.
Since these Options Templates are primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it
is NOT RECOMMENDED that storage of anonymized data include these
Options Templates in order to mitigate the risk of attack.</t>
<t>The Message Details Options Template specified for the <xref
target="RFC5655">IPFIX File Format</xref> also contains the
collectionTimeMilliseconds Information Element. As with the Export
Time Message Header field, if the exported data set contains
anonymized timestamp information, and the collectionTimeMilliseconds
Information Element in a given Message has some relationship to the
anonymized timestamp information, then this relationship can be
exploited to reverse the timestamp anonymization. Since this Options
Template is primarily intended for storing IPFIX Transport Session
data for auditing, replay, and testing purposes, it is NOT RECOMMENDED
that storage of anonymized data include this Options Template in order
to mitigate the risk of attack.</t>
<t>Since the Time Window Options Template specified for the <xref
target="RFC5655">IPFIX File Format</xref> refers to the timestamps
within the data set to provide partial table of contents information
for an IPFIX File, Options described by this template SHOULD be
written using the anonymized timestamps instead of the original
ones.</t>
</section>
<section title="Special-Use Address Space Considerations" anchor="sua-anon">
<t>When anonymising IP addresses in data for transport or storage
using IPFIX, and when the analysis purpose permits doing so, it is
RECOMMENDED to filter out, or leave unanonymized, data containing the
special-use IPv4 addresses enumerated in <xref
target="RFC5735"/> or the special-use IPv6 addresses enumerated in
<xref target="RFC5156"/>. Data containing these addresses (e.g.,
0.0.0.0 and 169.254.0.0/16 for link-local autoconfiguration in IPv4
space) are often associated with specific, well-known behavioral
patterns. Detection of these patterns in anonymized data can lead to
deanonymization of these special-use addresses, which increases the
chance of a complete reversal of anonymization by an attacker,
especially for prefix-preserving techniques.</t>
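<t>The filtering step recommended above might be sketched in Python as
follows. The list of blocks is an illustrative subset chosen for this
example, not the complete set of special-use addresses, and the function
names are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

# Illustrative subset of special-use blocks; not the complete list.
SPECIAL_USE = [
    ipaddress.ip_network(n)
    for n in ("0.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
              "224.0.0.0/4", "fe80::/10", "ff00::/8")
]


def is_special_use(address):
    """Return True if the address lies in one of the listed blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr.version == net.version and addr in net
               for net in SPECIAL_USE)


def split_for_anonymization(addresses):
    """Partition addresses into (to_anonymize, special_use) lists."""
    special = [a for a in addresses if is_special_use(a)]
    normal = [a for a in addresses if not is_special_use(a)]
    return normal, special
```
]]></artwork>
</figure>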
</section>
<section title="Protecting Out-of-Band Configuration and Management Data">
<t>Special care should be taken when exporting or sharing anonymized
data to avoid information leakage via the configuration or management
planes of the IPFIX Device containing the Exporting Process or the File
Writer. For example, adding noise to counters is useless if the
receiver can deduce the values in the counters from SNMP information,
and concealing the network under test is similarly useless if such
information is available in a configuration document. As the specifics
of these concerns are largely implementation- and deployment-dependent,
specific mitigation is out of scope for this document. The general ground
rule is that information of similar type to that anonymized SHOULD NOT
be made available to the receiver by any means, whether in the Data
Records, in IPFIX protocol structures such as Message Headers, or
out-of-band.</t>
</section>
</section>
</section>
<section title="Examples">
<t>In this example, consider the export or storage of an anonymized IPv4 data set from a single network, described by a simple template containing a timestamp in seconds, a five-tuple, and packet and octet counters. The template describing each record in this data set is shown in <xref target="af-template"/>.</t>
<figure title="Example Flow Template" anchor="af-template">
<artwork><![CDATA[
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 2 | Length = 40 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template ID = 256 | Field Count = 8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| flowStartSeconds 150 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceIPv4Address 8 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationIPv4Address 12 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceTransportPort 7 | Field Length = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationTransportPort 11 | Field Length = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| packetDeltaCount 2 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| octetDeltaCount 1 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| protocolIdentifier 4 | Field Length = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
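<t>As a cross-check of the field layout above, the following Python
sketch encodes the same Template Set with the standard struct module. The
function and variable names are assumptions for this example. The encoded
Set is 40 bytes long, and the fields sum to 25 bytes per Data Record.</t>
<figure>
<artwork><![CDATA[
```python
import struct

# (Information Element ID, field length) pairs from the figure above.
FLOW_FIELDS = [(150, 4), (8, 4), (12, 4), (7, 2),
               (11, 2), (2, 4), (1, 4), (4, 1)]


def encode_template_set(template_id, fields):
    """Encode one Template Record inside a Template Set (Set ID 2)."""
    body = struct.pack("!HH", template_id, len(fields))
    for ie_id, length in fields:
        body += struct.pack("!HH", ie_id, length)
    # Set header: Set ID, then total Set length including the header.
    return struct.pack("!HH", 2, 4 + len(body)) + body
```
]]></artwork>
</figure>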
<t>Suppose that this data set is anonymized according to the following policy:
<list style="symbols">
<t>IP addresses within the network are protected by reverse truncation.</t>
<t>IP addresses outside the network are protected by prefix-preserving anonymization.</t>
<t>Octet counts are exported using degraded precision in order to provide minimal protection against fingerprinting attacks.</t>
<t>All other fields are exported unanonymized.</t>
</list></t>
<t>In order to export anonymization records for this template and policy,
the Anonymization Options Template shown in <xref target="anon-opt-template"/> is exported first. For this
example, the optional privateEnterpriseNumber and informationElementIndex
Information Elements are omitted, because they are not used.</t>
<figure title="Example Anonymization Options Template" anchor="anon-opt-template">
<artwork><![CDATA[
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 3 | Length = 26 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template ID = 257 | Field Count = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Scope Field Count = 2 |0| templateID 145 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| informationElementId 303 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| anonymizationFlags TBD1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| anonymizationTechnique TBD2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
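<t>The Options Template Set above can be encoded in the same way. In the
following Python sketch, the two Information Element numbers written as
TBD1 and TBD2 in the figure are passed in as parameters; the constants
shown are hypothetical stand-ins, not assigned values. Without padding,
the encoded Set is 26 bytes long, matching the Length field in the
figure.</t>
<figure>
<artwork><![CDATA[
```python
import struct

# Placeholders only: this document leaves these numbers as TBD1, TBD2.
ANON_FLAGS_IE = 0x7FFE      # hypothetical stand-in for TBD1
ANON_TECHNIQUE_IE = 0x7FFF  # hypothetical stand-in for TBD2


def encode_anon_options_template(template_id, flags_ie, technique_ie):
    """Encode the Options Template Set (Set ID 3) without padding."""
    # Scope fields templateID (145) and informationElementId (303)
    # come first, followed by the two non-scope fields.
    fields = [(145, 2), (303, 2), (flags_ie, 2), (technique_ie, 2)]
    # Record header: Template ID, Field Count, Scope Field Count.
    body = struct.pack("!HHH", template_id, len(fields), 2)
    for ie_id, length in fields:
        body += struct.pack("!HH", ie_id, length)
    return struct.pack("!HH", 3, 4 + len(body)) + body
```
]]></artwork>
</figure>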
<t>Following the Anonymization Options Template comes a Data Set
containing Anonymization Records. This Data Set has an entry for each
Information Element Specifier in Template 256 describing the flow records.
This Data Set is shown in <xref target="anon-records"/>. Note that
sourceIPv4Address and destinationIPv4Address have the Perimeter
Anonymization (0x0004) flag set in anonymizationFlags, meaning that the source
address should be treated as network-external, and the destination address
as network-internal.</t>
<figure title="Example Anonymization Records" anchor="anon-records">
<artwork><![CDATA[
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 257 | Length = 68 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | flowStartSeconds IE 150 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | sourceIPv4Address IE 8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC 0x0005 | Structured Permutation 6 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | destinationIPv4Address IE 12 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Stable 0x0007 | Reverse Truncation 7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | sourceTransportPort IE 7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | dest.TransportPort IE 11 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | packetDeltaCount IE 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | octetDeltaCount IE 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stable 0x0003 | Precision Degradation 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | protocolIdentifier IE 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
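<t>The Anonymization Records above consist of four 16-bit fields each.
The following Python sketch encodes them as a Data Set and arrives at the
same 68-byte Set length. The row values are copied from the figure; the
function and variable names are assumptions for this example.</t>
<figure>
<artwork><![CDATA[
```python
import struct

# (informationElementId, anonymizationFlags, anonymizationTechnique)
# rows copied from the figure above; templateID is 256 for every row.
ANON_ROWS = [
    (150, 0x0000, 1), (8, 0x0005, 6), (12, 0x0007, 7), (7, 0x0000, 1),
    (11, 0x0000, 1), (2, 0x0000, 1), (1, 0x0003, 2), (4, 0x0000, 1),
]


def encode_anon_records(options_template_id, flow_template_id, rows):
    """Encode Anonymization Records as a Data Set of 8-byte records."""
    body = b"".join(
        struct.pack("!HHHH", flow_template_id, ie_id, flags, technique)
        for ie_id, flags, technique in rows)
    # The Set ID of a Data Set is the ID of the describing template.
    return struct.pack("!HH", options_template_id, 4 + len(body)) + body
```
]]></artwork>
</figure>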
<t>Following the Anonymization Records come the Data Sets containing the
anonymized data, exported according to the template in <xref
target="af-template"/>. Bringing it all together, consider an IPFIX
Message containing three real data records and the necessary templates to
export them, shown in <xref target="af-complete-real"/>. (Note that the
scale of this message is 8 bytes per line, for compactness; lines of dots
'. . . . . ' represent shifting of the example bit structure for
clarity.)</t>
<figure title="Example Real Message" anchor="af-complete-real">
<artwork><![CDATA[
1 2 3 4 5 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a | length 135 | export time 1271227717 | msg
| sequence 0 | domain 1 | hdr
| SetID 2 | length 40 | tid 256 | fields 8 | tmpl
| IE 150 | length 4 | IE 8 | length 4 | set
| IE 12 | length 4 | IE 7 | length 2 |
| IE 11 | length 2 | IE 2 | length 4 |
| IE 1 | length 4 | IE 4 | length 1 |
| SetID 256 | length 79 | time 1271227681 | data
| sip 192.0.2.3 | dip 198.51.100.7 | set
| sp 53 | dp 53 | packets 1 |
| bytes 74 | prt 17 | . . . . . . . . . . .
| time 1271227682 | sip 198.51.100.7 |
| dip 192.0.2.88 | sp 5091 | dp 80 |
| packets 60 | bytes 2896 |
| prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683 | sip 198.51.100.7 |
| dip 203.0.113.9 | sp 5092 | dp 80 |
| packets 44 | bytes 2037 |
| prt 6 |
+---------+
]]></artwork>
</figure>
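<t>The lengths in the message above can be cross-checked by walking its
Sets: a 16-byte Message Header, a 40-byte Template Set, and a 79-byte
Data Set (three 25-byte records plus a 4-byte Set header) add up to 135
bytes. The following Python sketch of such a walk is a minimal
illustration; the function name is an assumption, and it performs only
the basic length checks shown.</t>
<figure>
<artwork><![CDATA[
```python
import struct


def walk_sets(message):
    """Yield (set_id, set_length) for each Set in one IPFIX Message."""
    version, length, export_time, sequence, domain = struct.unpack(
        "!HHIII", message[:16])
    if version != 0x000A or length != len(message):
        raise ValueError("not a complete IPFIX Message")
    offset = 16
    while offset < length:
        set_id, set_length = struct.unpack(
            "!HH", message[offset:offset + 4])
        if set_length < 4 or offset + set_length > length:
            raise ValueError("malformed Set header")
        yield set_id, set_length
        offset += set_length
```
]]></artwork>
</figure>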
<t>The corresponding anonymized message is then shown in <xref
target="af-complete-anon"/>. The options template set describing
Anonymization Records and the Anonymization Records themselves are
added; IP addresses and byte counts are anonymized as declared.</t>
<figure title="Corresponding Anonymized Message" anchor="af-complete-anon">
<artwork><![CDATA[
1 2 3 4 5 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a | length 229 | export time 1271227717 | msg
| sequence 0 | domain 1 | hdr
| SetID 2 | length 40 | tid 256 | fields 8 | tmpl
| IE 150 | length 4 | IE 8 | length 4 | set
| IE 12 | length 4 | IE 7 | length 2 |
| IE 11 | length 2 | IE 2 | length 4 |
| IE 1 | length 4 | IE 4 | length 1 |
| SetID 3 | length 26 | tid 257 | fields 4 | opt
| scope 2 | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
| IE 145 | length 2 | IE 303 | length 2 | set
| IE TBD1 | length 2 | IE TBD2 | length 2 |
| SetID 257 | length 68 | . . . . . . . . . . . . . . . . anon
| tid 256 | IE 150 | flags 0 | tech 1 | recs
| tid 256 | IE 8 | flags 5 | tech 6 |
| tid 256 | IE 12 | flags 7 | tech 7 |
| tid 256 | IE 7 | flags 0 | tech 1 |
| tid 256 | IE 11 | flags 0 | tech 1 |
| tid 256 | IE 2 | flags 0 | tech 1 |
| tid 256 | IE 1 | flags 3 | tech 2 |
| tid 256 | IE 4 | flags 0 | tech 1 |
| SetID 256 | length 79 | time 1271227681 | data
| sip 254.202.119.209 | dip 0.0.0.7 | set
| sp 53 | dp 53 | packets 1 |
| bytes 100 | prt 17 | . . . . . . . . . . .
| time 1271227682 | sip 0.0.0.7 |
| dip 254.202.119.6 | sp 5091 | dp 80 |
| packets 60 | bytes 2900 |
| prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683 | sip 0.0.0.7 |
| dip 2.19.199.176 | sp 5092 | dp 80 |
| packets 44 | bytes 2000 |
| prt 6 |
+---------+
]]></artwork>
</figure>
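<t>The internal address and octet count values in the anonymized message
are consistent with the following Python sketch. The number of retained
address bits (8) and the rounding granularity (100) are inferred from the
example values and are not stated by this document; the function names
are likewise assumptions.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress


def reverse_truncate(address, keep_bits):
    """Zero all but the low keep_bits bits of an IPv4 address."""
    value = int(ipaddress.IPv4Address(address))
    return str(ipaddress.IPv4Address(value & ((1 << keep_bits) - 1)))


def degrade_count(count, granularity):
    """Round a counter to the nearest multiple of granularity."""
    return ((count + granularity // 2) // granularity) * granularity
```
]]></artwork>
</figure>
<t>With these assumptions, reverse_truncate("198.51.100.7", 8) gives
0.0.0.7, and degrade_count() maps the octet counts 74, 2896, and 2037 to
100, 2900, and 2000, respectively.</t>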
</section>
<section title="Security Considerations">
<!--<t>[EDITOR'S NOTE: RESOLVED: A risk that it could be worthwhile to mention:
Frequently, anonymized data will be treated by administrators as "not
privacy-sensitive" when in fact it should only be treated as "less
privacy-sensitive." (For examples in other fields, see the results
concerning user reidentification from AOL's search terms, or Netflix film
queues.) The anonymization techniques described here do indeed make
entities associated with flows harder to trace ... but there is a risk that
when they are applied, administrators will treat flow data as "completely
safe" when in fact it has only become "less harmful if misused". Discuss
ST9]</t>-->
<t>This document provides guidelines for exporting metadata about
anonymized data in IPFIX, or storing metadata about anonymized data in
IPFIX Files. It is not intended as a general statement on the
applicability of specific flow data anonymization techniques. Exporters or
publishers of anonymized data must take care that the applied
anonymization technique is appropriate for the data source, the purpose,
and the risk of deanonymization of a given application. Research in
anonymization techniques, and techniques for deanonymization, is ongoing,
and currently "safe" anonymization techniques may be rendered unsafe by
future developments. </t>
<t>We note specifically that anonymization is not a replacement for
encryption for confidentiality. It is only appropriate for protecting
identifying information in data to be used for purposes in which the
protected data is irrelevant. Confidentiality in export is best served by
using <xref target="RFC5246">TLS</xref> or <xref
target="RFC4347">DTLS</xref> as in the Security Considerations section of
<xref target="RFC5101"/>, and in long-term storage by
implementation-specific protection applied as in the Security
Considerations section of <xref target="RFC5655"/>. Indeed,
confidentiality and anonymization are not mutually exclusive, as
encryption for confidentiality may be applied to anonymized data export or
storage, as well, when the anonymized data is not intended for public
release.</t>
<t>We note as well that care should be taken even with well-anonymized
data, and anonymized data should still be treated as privacy-sensitive.
Anonymization reduces the risk of misuse, but is not a complete solution
to the problem of protecting end-user privacy in network flow trace
analysis.</t>
<t>When using pseudonymization techniques that have a mutable mapping,
there is an inherent tradeoff in the stability of the map between
long-term comparability and security of the data set against
deanonymization. In general, deanonymization attacks are more effective
given more information, so the longer a given mapping is valid, the more
information can be applied to deanonymization. The specific details of
this are technique-dependent and therefore out of the scope of this
document.</t>
<t>When releasing anonymized data, publishers need to ensure that data
that could be used in deanonymization is not leaked through a side
channel. The entire workflow (hardware, software, operational policies and
procedures, etc.) for handling anonymized data must be evaluated for risk
of data leakage. While most of these possible side channels are out of
scope for this document, guidelines for reducing the risk of information
leakage specific to the IPFIX export protocol are provided in <xref
target="guidelines"/>.</t>
<t>Note as well that the Security Considerations section of <xref
target="RFC5101"/> applies to the export of anonymized data, and
the Security Considerations section of <xref
target="RFC5655"/> applies to the storage of anonymized data or the
publication of anonymized traces.</t>
</section>
<section title="IANA Considerations">
<t>This document specifies the creation of several new IPFIX Information
Elements in the IPFIX Information Element registry located at
http://www.iana.org/assignments/ipfix, as defined in <xref
target="ie-section"></xref> above. IANA has assigned the following
Information Element numbers for their respective Information Elements as
specified below:
<list style="symbols">
<t>Information Element number TBD1 for the
anonymizationFlags Information Element.</t>
<t>Information Element number TBD2 for the anonymizationTechnique
Information Element.</t>
<t>Information Element number TBD3 for the informationElementIndex
Information Element.</t>
</list></t>
<t>[NOTE for IANA: The text TBDn should be replaced with the respective
assigned Information Element numbers where they appear in this document.
Information Element numbers should be assigned outside the NetFlow V9
compatibility range, as these Information Elements are not supported by
NetFlow V9.]</t>
</section>
<section title="Acknowledgments">
<t>We thank Paul Aitken and John McHugh for their comments and insight,
and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu, Stewart
Bryant, and Sean Turner for their reviews. Special thanks to the FP7 PRISM
and DEMONS projects for their material support of this work.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.5101" ?>
<?rfc include="reference.RFC.5102" ?>
<?rfc include="reference.RFC.5103" ?>
<?rfc include="reference.RFC.5655" ?>
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.RFC.5735" ?>
<?rfc include="reference.RFC.5156" ?>
</references>
<references title="Informative References">
<?rfc include="reference.RFC.5470" ?>
<?rfc include="reference.RFC.5472" ?>
<?rfc include="reference.I-D.ietf-ipfix-mediators-framework" ?>
<?rfc include="reference.I-D.ietf-ipfix-export-per-sctp-stream" ?>
<?rfc include="reference.RFC.5153" ?>
<?rfc include="reference.RFC.3917" ?>
<?rfc include="reference.RFC.4291" ?>
<?rfc include="reference.RFC.4347" ?>
<?rfc include="reference.RFC.5246" ?>
<reference anchor="Bur10">
<front>
<title>The Role of Network Trace Anonymization Under Attack</title>
<author initials="M" surname="Burkhart" fullname="Martin Burkhart">
<organization/>
</author>
<author initials="D" surname="Schatzmann" fullname="Dominik Schatzmann">
<organization/>
</author>
<author initials="B" surname="Trammell" fullname="Brian Trammell">
<organization/>
</author>
<author initials="E" surname="Boschi" fullname="Elisa Boschi">
<organization/>
</author>
<date month="January" year="2010"/>
<abstract/>
</front>
<seriesInfo name="" value="ACM Computer Communications Review, vol. 40, no. 1, pp. 6-11"/>
</reference>
<reference anchor="Mur07">
<front>
<title>Sampled Traffic Analysis by Internet-Exchange-Level Adversaries</title>
<author initials="S." surname="Murdoch" fullname="Steven Murdoch">
<organization />
</author>
<author initials="P." surname="Zielinski" fullname="Piotr Zielinski">
<organization />
</author>
<date month="June" year="2007"/>
<abstract/>
</front>
<seriesInfo name="" value="Proceedings of the 7th Workshop on Privacy Enhancing Technologies, Ottawa, Canada."/>
</reference>
<!--
<reference anchor='cryptopan'>
<front>
<title>Prefix-Preserving IP Address Anonymization</title>
<author initials='J' surname='Fan' fullname='Jinliang Fan'>
<organization />
</author>
<author initials='J' surname='Xu' fullname='Jun Xu'>
<organization />
</author>
<author initials='M' surname='Ammar' fullname='Mostafa H. Ammar'>
<organization />
</author>
<author initials='S' surname='Moon' fullname='Sue B. Moon'>
<organization />
</author>
<date month='October' day='7' year='2004' />
<abstract/>
</front>
<seriesInfo name='' value='Computer Networks, Volume 46, Issue 2, Pages 253-272, Elsevier'/>
</reference>
-->
</references>
</back>
</rfc>