<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY draftIpfixFile PUBLIC "" "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-file.xml">
<!ENTITY draftIpfixAs PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-as.xml'>
<!ENTITY draftIpfixArchitecture PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-architecture.xml'>
<!ENTITY draftIpfixMedframe PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-ipfix-mediators-framework.xml'>
<!ENTITY rfc3917 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3917.xml'>
<!ENTITY rfc5101 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5101.xml">
<!ENTITY rfc5102 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5102.xml">
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
] >
<rfc ipr="trust200902" category="exp" docName="draft-boschi-ipfix-anon-03.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<front>
<title abbrev="IP Flow Anonymisation Support">
IP Flow Anonymisation Support
</title>
<author initials="E." surname="Boschi" fullname="Elisa Boschi">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 57</phone>
<email>elisa.boschi@hitachi-eu.com</email>
</address>
</author>
<author initials="B." surname="Trammell" fullname="Brian Trammell">
<organization abbrev="Hitachi Europe">
Hitachi Europe
</organization>
<address>
<postal>
<street>c/o ETH Zurich</street>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 13</phone>
<email>brian.trammell@hitachi-eu.com</email>
</address>
</author>
<date month="March" day="30" year="2009"></date>
<area>Operations</area>
<workgroup>IPFIX Working Group</workgroup>
<abstract>
<t>This document describes anonymisation techniques for IP flow data and
the export of anonymised data using the IPFIX protocol. It provides a
categorisation of common anonymisation schemes and defines the parameters
needed to describe them. It provides guidelines for the implementation of
anonymised data export and storage over IPFIX, and describes an
Options-based method for anonymisation metadata export within the
IPFIX protocol, providing the basis for the definition of information
models for configuring anonymisation techniques within an IPFIX Metering
or Exporting Process, and for reporting the technique in use to an IPFIX
Collecting Process.</t>
</abstract>
</front>
<middle>
<section title="Open Issues">
<t>There is not yet a mechanism for exporting information about
defined-time anonymisation stability.</t>
<t>The terminology section is incomplete; we should decide which of the
terms introduced in this document are to be treated as terminology.</t>
<t>Between "classes" of techniques and "parameters", there may be
"properties" as well; for example, binning and timestamp anonymisation may
be "ordered" or not (i.e., x > y in the real space implies x' > y' in the anonymised space). We should verify
that we're splitting these up correctly.</t>
<t>In parallel with this, the anonymisationTechnique values might be
useful as a bitfield, with properties and classes being represented by
some set of the bits in the field. We'll have to make sure that the
properties and classes are exhaustive, if we do this.</t>
<t>Both anonymisationStability and anonymisationTechnique might benefit
from the creation of IANA registries; HOWEVER, in this case, it would be
very important to ensure that such a registry contains only classes and
properties of anonymised data, not information about specific
algorithms.</t>
<t>Certain technique/IE combinations (e.g. structure-preserving counters)
don't make any sense; these should be noted in "IPFIX-Specific
Anonymisation Guidelines".</t>
<t>Guidelines should be provided for the evaluation of _new_ IEs added to
the IANA registry after the publication of this draft for their
anonymisation potential.</t>
<t>This document does not cover the anonymisation of sub-IP level
information, specifically MAC addresses. It should.</t>
<!-- Do we want to add information elements and templates for dissemination and publication policies? -->
</section>
<section title="Introduction">
<t>The standardisation of an IP flow information export protocol <xref target="RFC5101"></xref> and associated representations removes a
technical barrier to the sharing of IP flow data across organizational
boundaries and with network operations, security, and research communities
for a wide variety of purposes. However, with wider dissemination comes
greater risk to the privacy of the users of networks under measurement,
and to the security of those networks. While it is not a complete solution
to the issues posed by distribution of IP flow information, anonymisation
is an important tool for the protection of privacy within network
measurement infrastructures.</t>
<t>This document presents a mechanism for representing anonymised data
within IPFIX and guidelines for using it. It begins with a categorisation
of anonymisation techniques. It then describes the applicability of each
technique to commonly anonymisable fields of IP flow data, organized by
information element data type and semantics as in <xref target="RFC5102"></xref>; enumerates the parameters required by each of
the applicable anonymisation techniques; and provides guidelines for the
use of each of these techniques in accordance with best practices in data
protection. Finally, it specifies a mechanism for exporting anonymised
data and binding anonymisation metadata to templates using IPFIX
Options.</t>
<section title="IPFIX Protocol Overview">
<t>In the IPFIX protocol, { type, length, value } tuples are expressed
in Templates containing { type, length } pairs, specifying which
{ value } fields are present in Data Records conforming to the Template,
giving great flexibility as to what data is transmitted. Since Templates
are sent very infrequently compared with Data Records, this results in
significant bandwidth savings. Different data formats may be transmitted
simply by sending new Templates specifying the { type, length } pairs
for the new data format. See <xref target="RFC5101"></xref> for more information.</t>
<t>The <xref target="RFC5102">IPFIX information model</xref> defines a
large number of standard Information Elements which provide the
necessary { type } information for Templates. The use of standard
elements enables interoperability among different vendors'
implementations. Additionally, non-standard enterprise-specific elements
may be defined for private use.</t>
</section>
<section title="IPFIX Documents Overview" anchor="intro-docs">
<t><xref target="RFC5101">"Specification of the IPFIX
Protocol for the Exchange of IP Traffic Flow Information"</xref>
and its associated documents
define the IPFIX Protocol, which provides network engineers and
administrators with access to IP traffic flow information.</t>
<t><xref target="I-D.ietf-ipfix-architecture">"Architecture for IP Flow
Information Export"</xref> defines
the architecture for the export of measured IP flow information out of
an IPFIX Exporting Process to an IPFIX Collecting Process, and the
basic terminology used to describe the elements of this architecture,
per the requirements defined in <xref target="RFC3917">"Requirements
for IP Flow Information Export"</xref>. The IPFIX Protocol document
<xref target="RFC5101"></xref> then covers the details of the method for
transporting IPFIX Data Records and Templates via a congestion-aware
transport protocol from an IPFIX Exporting Process to an IPFIX
Collecting Process.</t>
<t><xref target="RFC5102">"Information Model for IP Flow Information
Export"</xref> describes the Information Elements used by IPFIX,
including details on Information Element naming, numbering, and data
type encoding. Finally, <xref target="I-D.ietf-ipfix-as">"IPFIX
Applicability"</xref> describes the various applications of the IPFIX
protocol and their use of information exported via IPFIX, and relates
the IPFIX architecture to other measurement architectures and
frameworks.</t>
<t>Additionally, the <xref target="I-D.ietf-ipfix-file">"Specification
of the IPFIX File Format"</xref> describes a file format based upon the
IPFIX Protocol for the storage of flow data.</t>
<t>This document references the Protocol and Architecture documents for
terminology, and extends the IPFIX Information Model to provide new
Information Elements for anonymisation metadata. The anonymisation
techniques described herein are equally applicable to the IPFIX Protocol
and data stored in IPFIX Files.</t>
</section>
</section>
<section title="Terminology">
<t>Terms used in this document that are defined in the Terminology section
of the <xref target="RFC5101">IPFIX Protocol</xref> document are to be
interpreted as defined there.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119">RFC
2119</xref>.</t>
</section>
<section title="Categorisation of Anonymisation Techniques">
<t>Anonymisation modifies a data set in order to
protect the identity of the people or entities described by the data set
from disclosure. With respect to network traffic data, anonymisation
generally attempts to preserve some set of properties of the network
traffic useful for a given application or applications, while ensuring the
data cannot be traced back to the specific networks, hosts, or users
generating the traffic.</t>
<t>Anonymisation may be broadly classified according to two properties:
recoverability and countability. All anonymisation techniques map the real
space of identifiers or values into a separate, anonymised space,
according to some function. A technique is said to be recoverable when the
function used is invertible or can otherwise be reversed and a real
identifier can be recovered from a given replacement identifier.</t>
<t>Countability compares the size of the anonymised space (N), i.e., the
number of distinct anonymised values, to the size of the real space (M),
and denotes how the count of unique values is preserved by the
anonymisation function. If the anonymised space is smaller than the real
space, then the function is said to generalise the input, mapping more
than one input point to each anonymous value (e.g., as with aggregation).
By definition, generalisation is not recoverable.</t>
<t>If the anonymised and real spaces are the same size, such that the
count of unique values is preserved, then the function is said to be a
direct substitution function. If the anonymised space is larger, such
that each real value maps to a set of anonymised values, then the
function is said to be a set substitution function. Note that with set
substitution functions, the sets of anonymised values are not
necessarily disjoint. Either direct or set substitution functions are
said to be one-way if there exists no method for recovering the real
data point from an anonymised one.</t>
<t>This classification is summarised in the table below.</t>
<texttable>
<ttcol align="left">Recoverability / Countability</ttcol>
<ttcol align="left">Recoverable</ttcol>
<ttcol align="left">Non-recoverable</ttcol>
<c>N &lt; M</c><c>N.A.</c><c>Generalisation</c>
<c>N = M</c><c>Direct Substitution</c><c>One-way Direct Substitution</c>
<c>N &gt; M</c><c>Set Substitution</c><c>One-way Set Substitution</c>
</texttable>
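<t>As an illustration only, the classification above can be written as a
small Python function, taking N and M as the number of distinct values in
the anonymised and real spaces. The function name classify is
hypothetical and not part of any IPFIX specification.</t>
<figure>
<artwork><![CDATA[
```python
def classify(n_anon: int, m_real: int, recoverable: bool) -> str:
    """Map space sizes and recoverability to a category in the table."""
    if n_anon < m_real:
        # Several real values share each anonymised value.
        if recoverable:
            raise ValueError("a generalisation cannot be recoverable")
        return "Generalisation"
    kind = "Direct Substitution" if n_anon == m_real else "Set Substitution"
    return kind if recoverable else "One-way " + kind


# Truncating 8 bits of an IPv4 address maps 2**32 values onto 2**24.
print(classify(2**24, 2**32, recoverable=False))  # prints Generalisation
```
]]></artwork>
</figure>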
</section>
<section title="Anonymisation of IP Flow Data">
<t>Due to the restricted semantics of IP flow data, there is a relatively
limited set of specific anonymisation techniques available on flow data,
though each falls into the broad categories above. Each type of field that
may commonly appear in a flow record may have its own applicable specific
techniques.</t>
<t>While anonymisation is generally applied at the resolution of single
fields within a flow record, attacks against anonymisation use entire
flows and relationships between hosts and flows within a given data set.
Therefore, fields which may not necessarily be identifying by themselves
may be anonymised in order to increase the anonymity of the data set as a
whole.</t>
<t>Of all the fields in an IP flow record, only IP addresses directly
identify entities in the real world. Each IP address is associated with an
interface on a network host, and can potentially be identified with a
single user. Additionally, IP addresses are structured identifiers; that
is, partial IP address prefixes may be used to identify networks just as
full IP addresses identify hosts. This makes anonymisation of IP addresses
particularly important.</t>
<t>Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and ephemeral
ports, can be used to determine the operating system running on a host.
Relative data volumes by port can also be used to determine the host's
function (workstation, web server, etc.); this information can be used to
identify hosts and users.</t>
<t>While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Any given
network activity is recognizable by a pattern of relative time differences
and data volumes in the associated sequence of flows, even without host
address information. They can therefore be used to identify hosts and
users. Timestamps and counters are also vulnerable to traffic injection
attacks, where traffic with a known pattern is injected into a network
under measurement, and this pattern is later identified in the anonymised
data set. </t>
<t>The simplest and most extreme form of anonymisation, which can be
applied to any field of a flow record, is black-marker anonymisation, or
complete deletion of a given field. Note that black-marker anonymisation
is equivalent to simply not exporting the field(s) in question.</t>
<t>While black-marker anonymisation completely protects the data in
the deleted fields from the risk of disclosure, it also reduces the
utility of the anonymised data set as a whole. Techniques that retain some
information while reducing (though not eliminating) the disclosure risk
are discussed in the following sections, separately for IP addresses,
timestamps, counters, and other flow fields such as ports.</t>
<section title="IP Address Anonymisation">
<t>Since IP addresses are the most common identifiers within flow data
that can be used to directly identify a person, organization, or host,
most of the work on flow and trace data anonymisation has gone into IP
address anonymisation techniques. Indeed, the aim of most attacks
against anonymisation is to recover the mapping from anonymised IP
addresses to original IP addresses, thereby identifying the hosts. There
is therefore a wide range of IP address anonymisation schemes that fit
into the following categories.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Truncation</c><c>Generalisation</c>
<c>Random Permutation</c><c>Direct Substitution</c>
<c>Prefix-preserving Pseudonymisation</c><c>Direct Substitution</c>
</texttable>
<section title="Truncation">
<t>Truncation removes the "n" least significant bits from an IP
address, replacing them with zeroes. In effect, it replaces a host
address with a network address for some fixed netblock; for IPv4
addresses, 8-bit truncation corresponds to replacement with a /24
network address. Truncation is a non-reversible generalisation scheme.
Note that while truncation is effective for making hosts
non-identifiable, it preserves information which can be used to
identify an organization, a geographic region, a country, or a
continent (or RIR region of responsibility).</t>
<t>Truncation to an address length of 0 is equivalent to black-marker
anonymisation. Removal of IP address information is only recommended
for analysis tasks which have no need to separate flow data by host or
network; e.g. as a first stage to per-application (port) or
time-series total volume analyses.</t>
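<t>A minimal Python sketch of truncation, using the standard ipaddress
module, follows. Here n is the number of bits removed, as in the text
above (the truncation length parameter described later instead counts the
bits that remain); the function name is illustrative.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress


def truncate(addr: str, n: int) -> str:
    """Replace the n least significant bits of an IPv4/IPv6 address with 0."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    # Keep the (width - n) most significant bits, clear the rest.
    mask = ((1 << width) - 1) ^ ((1 << n) - 1)
    return str(type(ip)(int(ip) & mask))


print(truncate("192.0.2.137", 8))      # 192.0.2.0, i.e. the /24 network
print(truncate("2001:db8::1234", 16))  # 2001:db8::
```
]]></artwork>
</figure>
<t>With n equal to the full address width the result is the all-zeroes
address, the black-marker case noted above.</t>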
</section>
<section title="Random Permutation">
<t>Random permutation is a direct substitution technique, replacing
each IP address with an address randomly selected from the set of
possible IP addresses, guaranteeing that each anonymised address
represents a unique original address. The random permutation does not
preserve any structural information about a network, but it does
preserve the unique count of IP addresses. Any application that
requires more structure than host-uniqueness will not be able to use
randomly permuted IP addresses.</t>
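<t>The following Python sketch shows one way to build a random
permutation lazily, assigning each newly seen address an unused random
address. It assumes IPv4 input and a data set much smaller than the
address space; the class name is hypothetical. By default it draws from
random.SystemRandom, since a seeded random.Random is not cryptographically
secure. Because the mapping lives in one object, it is stable only for as
long as that object is reused.</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress
import random


class RandomPermutation:
    """Lazily built random permutation of the IPv4 address space."""

    def __init__(self, rng=None):
        # SystemRandom draws from the OS CSPRNG; a seeded random.Random
        # may be passed instead for reproducible tests.
        self._rng = rng if rng is not None else random.SystemRandom()
        self._forward = {}  # real address (int) -> anonymised address (int)
        self._used = set()  # anonymised addresses already assigned

    def anonymise(self, addr: str) -> str:
        real = int(ipaddress.IPv4Address(addr))
        if real not in self._forward:
            # Rejection sampling: draw until an unused address is found,
            # which keeps the mapping one-to-one.
            candidate = self._rng.getrandbits(32)
            while candidate in self._used:
                candidate = self._rng.getrandbits(32)
            self._forward[real] = candidate
            self._used.add(candidate)
        return str(ipaddress.IPv4Address(self._forward[real]))
```
]]></artwork>
</figure>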
</section>
<section title="Prefix-preserving Pseudonymisation">
<t>Prefix-preserving pseudonymisation is a direct substitution
technique, further restricted such that the structure of subnets is
preserved at each level while anonymising IP addresses. If two real IP
addresses match on a prefix of "n" bits, the two anonymised IP
addresses will match on a prefix of "n" bits as well. This is useful
when relationships among networks must be preserved for a given
analysis task, but introduces structure into the anonymised data which
can be exploited in attacks against the anonymisation technique.</t>
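<t>The prefix-preserving property can be obtained with a keyed
construction in the spirit of Crypto-PAn, sketched below in Python. This
is a simplified illustration (one HMAC-SHA-256 computation per bit, IPv4
only), not Crypto-PAn itself, and it has not been analysed for security.
Each output bit is the input bit XORed with a pseudorandom bit derived
from the key and the preceding input bits only, so the mapping is a
bijection and two addresses sharing exactly an n-bit prefix produce
outputs sharing exactly an n-bit prefix.</t>
<figure>
<artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving(addr: str, key: bytes) -> str:
    """Keyed prefix-preserving pseudonymisation of an IPv4 address."""
    real = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        bit = (real >> (31 - i)) & 1
        prefix = real >> (32 - i)  # the i most significant bits (0 if i == 0)
        # Pseudorandom flip bit that depends only on the key, the prefix
        # length and the prefix value, never on bit i itself or later bits.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))
```
]]></artwork>
</figure>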
</section>
</section>
<section title="Timestamp Anonymisation">
<t>The exact time at which a flow began or ended is not in itself highly
identifying information, but it can be used as part of attacks against
other anonymisation techniques or for user profiling. Precise timestamps
can be used in injected-traffic fingerprinting attacks [CITE] as well as
to identify certain activity by response delay and size fingerprinting
[CITE]. Therefore, timestamp information may be anonymised in order to
ensure the protection of the entire dataset.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalisation</c>
<c>Enumeration</c><c>Direct or Set Substitution</c>
<c>Random Time Shifts</c><c>Direct Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>Precision degradation is a generalisation technique that removes
the most precise components of a timestamp, treating all events
occurring in each given interval (e.g. one millisecond for millisecond
level degradation) as simultaneous. This has the effect of potentially
collapsing many timestamps into one. With this technique, time
precision is reduced and sequencing may be lost, but the approximate
time at which each event occurred is preserved. The anonymised data may
not be generally useful for applications which require strict
sequencing of flows.</t>
<t>Note that flow meters with low time precision (e.g. second
precision, or millisecond precision on high-capacity networks) perform
the equivalent of precision degradation anonymisation by their
design.</t>
<t>Note also that degradation to a very low precision (e.g. on the
order of minutes, hours, or days) is commonly used in analyses
operating on time-series aggregated data, and is referred to as binning;
though the time scales are longer and applicability more restricted,
this is in principle the same operation.</t>
<t>Precision degradation to infinitely low precision is equivalent to
black-marker anonymisation. Removal of timestamp information is only
recommended for analysis tasks which have no need to separate flows in
time, for example for counting total volumes or unique occurrences of
other flow keys in an entire dataset.</t>
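<t>A sketch of timestamp precision degradation in Python, assuming
timestamps are integer milliseconds since the epoch (as carried, for
example, in flowStartMilliseconds); a granularity of minutes or hours
gives the binning described above. The function name is
illustrative.</t>
<figure>
<artwork><![CDATA[
```python
def degrade(ts_ms: int, granularity_ms: int) -> int:
    """Round a millisecond timestamp down to a multiple of granularity_ms,
    so that all events in the same interval become simultaneous."""
    if granularity_ms <= 0:
        raise ValueError("granularity must be positive")
    return ts_ms - (ts_ms % granularity_ms)


# At 1-second precision both flows appear to start at the same instant.
print(degrade(1238400123456, 1000), degrade(1238400123999, 1000))
# 1238400123000 1238400123000
```
]]></artwork>
</figure>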
</section>
<section title="Enumeration">
<t>Enumeration is a substitution function that retains the
chronological order in which events occurred while eliminating all
other time information. Timestamps are substituted by equidistant
timestamps (or numbers) starting from a randomly chosen start value.
The resulting data is useful for applications requiring strict
sequencing, but not for those requiring good timing information (e.g.
delay or jitter measurement for QoS applications or SLA validation).</t>
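<t>Enumeration can be sketched in Python as follows: each distinct
timestamp is replaced by its chronological rank, scaled by a step and
offset by a random start value, so equal timestamps remain equal. The
function name and the range from which the start value is drawn are
assumptions for illustration.</t>
<figure>
<artwork><![CDATA[
```python
import random


def enumerate_timestamps(timestamps, step=1, rng=None):
    """Replace each distinct timestamp with start + rank * step."""
    if rng is None:
        rng = random.SystemRandom()
    start = rng.randrange(0, 2**32)  # hypothetical start-value range
    # Rank distinct values in chronological order; ties share a rank.
    rank = {t: k for k, t in enumerate(sorted(set(timestamps)))}
    return [start + rank[t] * step for t in timestamps]
```
]]></artwork>
</figure>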
</section>
<section title="Random Time Shifts">
<t>Random time shifts add a single random offset, chosen once, to every
timestamp within a dataset. This substitution technique, which is
reversible by anyone who knows the offset, therefore retains duration
and inter-event interval information as well as the chronological order
of flows. It is primarily intended to defeat traffic injection
fingerprinting attacks.</t>
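<t>A minimal Python sketch of random time shifting, assuming integer
millisecond timestamps and a hypothetical maximum shift of one day. The
offset is drawn once with the secrets module and, as discussed under the
shift amount parameter, is kept secret.</t>
<figure>
<artwork><![CDATA[
```python
import secrets


def make_shift(max_shift_ms: int) -> int:
    """Choose one secret offset for the whole data set."""
    return secrets.randbelow(max_shift_ms + 1)


def shift_all(timestamps, offset_ms: int):
    # The same offset is applied to every timestamp, so durations,
    # inter-event intervals and ordering are unchanged.
    return [t + offset_ms for t in timestamps]


offset = make_shift(86_400_000)  # hypothetical bound: one day
```
]]></artwork>
</figure>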
</section>
</section>
<section title="Counter Anonymisation">
<t>Like timestamps, counters (such as per-flow packet and octet counts)
can be used in fingerprinting and injection attacks against
anonymisation, and for user profiling. Counter anonymisation can help
defeat these attacks, but anonymised counters are only usable for
analysis tasks for which relative or imprecise magnitudes of activity
are sufficient.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Precision Degradation</c><c>Generalisation</c>
<c>Binning</c><c>Generalisation</c>
<c>Random noise addition</c><c>Direct or Set Substitution</c>
</texttable>
<section title="Precision Degradation">
<t>As with precision degradation in timestamps, precision degradation
of counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships
between sizes of similarly-sized flows, but keeps relative magnitude
information.</t>
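<t>Counter precision degradation can be sketched in Python by clearing
the low-order bits of the counter; the function name and the number of
bits cleared are illustrative parameters.</t>
<figure>
<artwork><![CDATA[
```python
def degrade_counter(value: int, low_bits: int) -> int:
    """Clear the low_bits least significant bits of a non-negative counter.

    All values from k * 2**low_bits up to (k + 1) * 2**low_bits - 1 map to
    k * 2**low_bits, so relative magnitude survives but fine detail does not.
    """
    return value & ~((1 << low_bits) - 1)


print(degrade_counter(1500, 10))  # 1024
```
]]></artwork>
</figure>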
</section>
<section title="Binning">
<t>Binning can be seen as a special case of precision degradation; the
operation is identical, except that in precision degradation the
counter ranges are uniform, while in binning they need not be. For
example, a common binning scheme for packet counters could be to bin
values 1-2 together and 3-infinity together, thereby separating
potentially fully opened TCP connections from unopened ones. Binning
schemes are generally chosen to keep precisely the amount of
information required in a counter for a given analysis task. Note
that, also unlike precision degradation, the bin label need not be
within the bin's range.</t>
<t>Binning counters to a single bin 0-infinity, or alternately
precision degradation to infinitely low precision, is equivalent to
black-marker anonymisation. Removal of counter information is only
recommended for analysis tasks which have no need to evaluate the
removed counter, for example for counting only unique occurrences of
other flow keys.</t>
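<t>The following Python sketch implements the packet counter example
above with a bin map given as sorted lower bounds and labels. The map,
the labels (1 and 2), and the function name are hypothetical; label 2
lies outside its bin's range, which binning permits.</t>
<figure>
<artwork><![CDATA[
```python
import bisect

# Hypothetical bin map for a packet counter: (lower bound, label) pairs,
# sorted by lower bound. Values 1-2 get label 1; values 3 and up get 2.
PACKET_BINS = [(1, 1), (3, 2)]


def bin_counter(value: int, bins=PACKET_BINS) -> int:
    lows = [low for low, _ in bins]
    # Index of the last bin whose lower bound is not above the value.
    i = bisect.bisect_right(lows, value) - 1
    if i < 0:
        raise ValueError("value below the first bin")
    return bins[i][1]
```
]]></artwork>
</figure>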
</section>
<section title="Random Noise Addition">
<t>Random noise addition adds a random amount to a counter in each
flow; this is used to keep relative magnitude information and minimize
the disruption to size relationship information while avoiding
fingerprinting attacks against anonymisation. Note that there is no
guarantee that random noise addition will maintain ranking order by a
counter among members of a set. Random noise addition is particularly
useful when the derived analysis data will not be presented in such a
way as to require the lower-order bits of the counters.</t>
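<t>A sketch of random noise addition in Python, assuming uniformly
distributed integer noise within a fixed bound. The distribution, the
bound, and the clamping of negative results to zero are illustrative
choices rather than requirements; clamping slightly biases small counters
upward.</t>
<figure>
<artwork><![CDATA[
```python
import random


def add_noise(value: int, max_noise: int, rng: random.Random) -> int:
    """Add uniform integer noise in [-max_noise, max_noise] to a counter."""
    noisy = value + rng.randint(-max_noise, max_noise)
    return max(0, noisy)  # counters cannot be negative
```
]]></artwork>
</figure>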
</section>
</section>
<section title="Anonymisation of Other Flow Fields">
<t>Other fields, particularly port numbers and protocol numbers, can
be used to partially identify the applications that generated the
traffic in a given flow trace. This information can be used in
fingerprinting attacks, and may be of interest on its own (e.g., to
reveal that a certain application with suspected vulnerabilities is
running on a given network). These fields are generally
anonymised using one of two techniques.</t>
<texttable>
<ttcol align="left">Scheme</ttcol>
<ttcol align="left">Action</ttcol>
<c>Binning</c><c>Generalisation</c>
<c>Random Permutation</c><c>Direct Substitution</c>
</texttable>
<section title="Binning">
<t>Binning is a generalisation technique mapping a set of potentially
non-uniform ranges into a set of arbitrarily labeled bins. Common bin
arrangements depend on the field type and the analysis application.
For example, an IP protocol bin arrangement may preserve 1, 6, and 17
for ICMP, TCP, and UDP traffic, and bin all other protocols into a
single bin, to mitigate the use of uncommon protocols in
fingerprinting attacks. Another example arrangement may bin source and
destination ports into low (0-1023) and high (1024-65535) bins in
order to tell service from ephemeral ports without identifying
individual applications.</t>
<t>Binning other flow key fields to a single bin is equivalent to
black-marker anonymisation. Removal of other flow key information is
only recommended for analysis tasks which have no need to
differentiate flows on the removed keys, for example for total traffic
counts or unique counts of other flow keys.</t>
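<t>The two example arrangements above can be sketched in Python as
follows. The label chosen for the catch-all protocol bin (255, which is
reserved in the IANA protocol numbers registry and so cannot collide with
a real protocol), the port bin labels 0 and 1, and the function names are
assumptions for illustration.</t>
<figure>
<artwork><![CDATA[
```python
KEPT_PROTOCOLS = {1, 6, 17}  # ICMP, TCP, UDP keep their own values
OTHER_PROTOCOL = 255         # hypothetical label for every other protocol


def bin_protocol(proto: int) -> int:
    return proto if proto in KEPT_PROTOCOLS else OTHER_PROTOCOL


def bin_port(port: int) -> int:
    """Label 0 for low (service) ports 0-1023, 1 for high ports 1024-65535."""
    if not 0 <= port <= 65535:
        raise ValueError("not a transport port number")
    return 0 if port < 1024 else 1
```
]]></artwork>
</figure>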
</section>
<section title="Random Permutation">
<t>Random permutation is a direct substitution technique, replacing
each key value with a value randomly selected from the range of
possible values, guaranteeing that each anonymised value represents a
unique original value. This is used to preserve the count of unique
flow key values without preserving information about the keys
themselves.</t>
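<t>Because the transport port space has only 65536 values, a random
permutation of it can be precomputed in full, as in the Python sketch
below. The function name is hypothetical; a deployment would pass a
cryptographically secure generator such as random.SystemRandom rather
than a seeded one.</t>
<figure>
<artwork><![CDATA[
```python
import random


def port_permutation(rng: random.Random) -> list:
    """Return perm such that perm[p] is the anonymised value of port p."""
    perm = list(range(65536))
    rng.shuffle(perm)  # uniform random permutation of the 16-bit space
    return perm


perm = port_permutation(random.SystemRandom())
```
]]></artwork>
</figure>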
</section>
</section>
</section>
<section title="Parameters for the Description of Anonymisation Techniques">
<t>This section details the abstract parameters used to describe the
anonymisation techniques examined in the previous section, on a
per-parameter basis. These parameters and their export safety inform the
design of the IPFIX anonymisation metadata export specified in the
following section.</t>
<section title="Stability">
<t>Any given anonymisation technique may be applied with varying degrees
of stability. Stability is important for assessing the comparability of
anonymised information in different data sets, or in the same data set
over different time periods. In general, stability ranges from
completely stable to completely unstable; however, note that the
completely unstable case is indistinguishable from black-marker
anonymisation. A completely stable anonymisation will always map a given
value in the real space to the same value in the anonymised space. In
practice, an anonymisation may also be stable for every data set
published by a particular producer to a particular consumer, stable
for a stated time period within a dataset or across datasets, or stable
only for a single data set.</t>
<t>If no information about stability is available, users of anonymised
data may assume that the techniques used are stable across the entire
dataset, but unstable across datasets. Note that stability presents a
risk-utility tradeoff, as completely stable anonymisation can be used
for longer-term trend analysis tasks but also presents more risk of
attack given the stable mapping.</t>
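<t>One possible way to realise these stability classes, not specified by
this document, is to parameterise a keyed anonymisation technique with a
key derived from a master secret and a scope label: values anonymised
under the same scope map identically, while different scopes yield
unrelated mappings. The Python sketch below assumes HMAC-SHA-256 for key
derivation; the function name and scope label strings are
hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
import hashlib
import hmac


def scope_key(master_key: bytes, scope: str) -> bytes:
    """Derive the key used by a keyed anonymisation technique for one scope.

    Reusing the same scope label reproduces the same key, and therefore the
    same mapping; a different label gives an unrelated key.
    """
    return hmac.new(master_key, scope.encode("utf-8"), hashlib.sha256).digest()


master = b"hypothetical master secret"
k_session = scope_key(master, "session:42")          # Session stability
k_period = scope_key(master, "period:2009-03")       # stable for a time period
k_pair = scope_key(master, "producer:A/consumer:B")  # stable across data sets
```
]]></artwork>
</figure>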
<!--<t>[EDITOR'S NOTE: are there any other universally applicable
parameters?]</t>-->
</section>
<section title="Truncation Length">
<t>Truncation and precision degradation are described by the truncation
length, or the amount of data still remaining in the anonymised field
after anonymisation.</t>
<t>Truncation length can be inferred from a given data set, and need not
be specially exported or protected.</t>
</section>
<section title="Bin Map">
<t>Binning is described by the specification of a bin mapping function.
This function can be generally expressed in terms of an associative
array that maps each point in the original space to a bin, although in
practice most bin functions can be implemented much more simply and
efficiently.</t>
<t>Since knowledge of the bin mapping function can be used to partially
deanonymise binned data, depending on the degree of generalisation, no
information about the bin mapping function should be exported.</t>
</section>
<section title="Permutation">
<t>Like binning, permutation is described by the specification of a
permutation function. In the general case, this can be expressed in
terms of an associative array that maps each point in the original space
to a point in the anonymised space. Unlike binning, each point in the
anonymised space must correspond to a single, unique point in the
original space.</t>
<t>Since knowledge of the permutation function can be used to completely
deanonymise permuted data, no information about the permutation function
or its parameters should be exported.</t>
</section>
<section title="Shift Amount">
<t>Shifting requires an amount to shift each value by. Since the shift
amount can be used to deanonymise data protected by shifting, no
information about the shift amount should be exported.</t>
</section>
</section>
<section title="Anonymisation Export Support in IPFIX">
<t>Anonymised data exported via IPFIX SHOULD be annotated with
anonymisation metadata, which details which fields described by which
Templates are anonymised, and provides appropriate information on the
anonymisation techniques used. This metadata SHOULD be exported in Data
Records described by the recommended Options Template defined in this
section, which uses the additional Information Elements defined in the
following subsection.</t>
<t>Note that fields anonymised using the black-marker (removal) technique
do not require any special metadata support. Black-marker anonymised
fields SHOULD NOT be exported at all; the absence of the field in a given
Data Set is implicitly declared by not including the corresponding
Information Element in the Template describing that Data Set. Exporting
"empty" data elements is inefficient and, in the general case,
impossible, as many non-counter Information Elements do not have
semantically distinct null values.</t>
<section title="Anonymisation Options Template" anchor="opt-section">
<t>The Anonymisation Options Template describes anonymisation records,
which allow anonymisation metadata to be exported inline over IPFIX or
stored in an IPFIX File, by binding information about anonymisation
techniques to Information Elements within defined Templates. IPFIX
Exporting Processes SHOULD export anonymisation records for any Template
describing exported anonymised Data Records; IPFIX Collecting Processes
and processes downstream from them MAY use anonymisation records to
treat anonymised data differently depending on the applied
technique.</t>
<t>An Exporting Process SHOULD export anonymisation records after the
Templates they describe have been exported, and SHOULD export
anonymisation records reliably.</t>
<t>Anonymisation records, like Templates, MUST be handled by Collecting
Processes as scoped to the Transport Session in which they are sent.
While the anonymisationStability IE can be used to declare that a given
anonymisation technique's mapping will remain stable across multiple
sessions, each session MUST re-export the anonymisation records along
with the Templates.</t>
<t>[EDITOR'S NOTE: Multiple anonymisation techniques applied to an IE at
the same time are indicated with multiple elements of the same type (in
application order, as in PSAMP). Need to verify this is actually useful
given the defined techniques.]</t>
<texttable>
<ttcol align="left">IE</ttcol>
<ttcol align="left">Description</ttcol>
<c>templateId [scope]</c>
<c>
The Template ID of the Template containing the Information Element
described by this anonymisation record. This Information Element
MUST be defined as a Scope Field.
</c>
<c>informationElementId [scope]</c>
<c>
The Information Element identifier of the Information Element
described by this anonymisation record. This Information Element
MUST be defined as a Scope Field.
</c>
<c>informationElementIndex [scope] [optional]</c>
<c>
The index of the instance, within the Template, of the Information
Element identified by informationElementId that is described by this
anonymisation record. Optional; need only be present when describing
Templates that contain multiple instances of the same Information
Element. If present, this Information Element MUST be defined as a
Scope Field. This Information Element is defined in <xref
target="ie-section"/>, below.
</c>
<c>anonymisationStability</c>
<c>
The stability class of the anonymised data. MUST be present. This
Information Element is defined in <xref target="ie-section"/>,
below.
</c>
<c>anonymisationTechnique</c>
<c>
The technique used to anonymise the data. MUST be present. This
Information Element is defined in <xref target="ie-section"/>,
below.
</c>
</texttable>
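<t>One anonymisation Data Record can be laid out as sketched below. The
sketch assumes fixed-length encoding at the natural size of each
abstract data type: templateId is unsigned16 in <xref
target="RFC5102"/>, informationElementId is assumed to be unsigned16
(its type is not specified in this document), and informationElementIndex
is assumed to be present. It encodes only the record body, not the Set
Header or the Options Template itself.</t>
<figure><artwork><![CDATA[
```python
import struct

# Assumed layout, in network byte order, following the field order of
# the Options Template table:
#   templateId (unsigned16), informationElementId (unsigned16, assumed),
#   informationElementIndex (unsigned16), anonymisationStability
#   (unsigned8), anonymisationTechnique (unsigned8).
ANON_RECORD = struct.Struct("!HHHBB")


def encode_anon_record(template_id, ie_id, ie_index, stability, technique):
    """Pack one anonymisation Data Record body (8 bytes)."""
    return ANON_RECORD.pack(template_id, ie_id, ie_index,
                            stability, technique)


def decode_anon_record(data):
    """Unpack an 8-byte body into its five field values."""
    return ANON_RECORD.unpack(data)
```
]]></artwork></figure>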
</section>
<section title="Recommended Information Elements for Anonymisation Metadata" anchor="ie-section">
<section title="anonymisationStability">
<list style="hanging">
<t hangText="Description: ">
A description of the stability class of the anonymisation
technique applied to a referenced Information Element within a
referenced Template. Stability classes refer to the stability of
the parameters of the anonymisation technique, and therefore the
comparability of the mapping between the real and anonymised
values over time. This determines which anonymised datasets may be
compared with each other.
<texttable>
<ttcol align="left">Value</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c><c>Undefined: the Exporting Process makes no representation as to how stable the mapping is, or over what time period values of this field will remain comparable. While the Collecting Process MAY assume Session-level stability, Session-level stability is not guaranteed. This is equivalent to Session (value 1) stability, while advising the Collecting Process that no special effort has been made to ensure stability. Collecting Processes SHOULD assume this stability class in the absence of stability class information; this is the default stability class.</c>
<c>1</c><c>Session: the Exporting Process will ensure that the parameters of the anonymisation technique are stable during the Transport Session. All the values of the described Information Element for each Record described by the referenced Template within the Transport Session are comparable. The Exporting Process SHOULD endeavour to ensure at least this stability class.</c>
<c>2</c><c>Exporter-Collector Pair: the Exporting Process will ensure that the parameters of the anonymisation technique are stable across Transport Sessions over time with the given Collecting Process, but may use different parameters for different Collecting Processes. Data exported to different Collecting Processes is not comparable.</c>
<c>3</c><c>Stable: the Exporting Process will ensure that the parameters of the anonymisation technique are stable across Transport Sessions over time, regardless of the Collecting Process to which the data is sent.</c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned8</t>
<t hangText="ElementId: ">TBD1</t>
<t hangText="Status: ">Proposed</t>
</list>
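<t>The stability classes can be read as a comparability rule. The
following Python sketch assumes that both data sets carry the same
anonymisationStability value for the field, and that each data set can
be labelled with its Exporting Process, Collecting Process, and
Transport Session; the Provenance type and its labels are hypothetical.
Undefined is treated as Session stability, which the Collecting Process
MAY assume but which is not guaranteed.</t>
<figure><artwork><![CDATA[
```python
from typing import NamedTuple

UNDEFINED, SESSION, EXPORTER_COLLECTOR_PAIR, STABLE = 0, 1, 2, 3


class Provenance(NamedTuple):
    exporter: str    # Exporting Process label (hypothetical)
    collector: str   # Collecting Process label (hypothetical)
    session: str     # Transport Session label (hypothetical)


def comparable(stability, a, b):
    """Whether values of one anonymised field from a and b may be
    compared, given their shared anonymisationStability value."""
    if a.exporter != b.exporter:
        # Every class describes the parameters of one Exporting Process.
        return False
    if stability in (UNDEFINED, SESSION):
        # Undefined: Session stability MAY be assumed, not guaranteed.
        return a.collector == b.collector and a.session == b.session
    if stability == EXPORTER_COLLECTOR_PAIR:
        return a.collector == b.collector
    if stability == STABLE:
        return True
    raise ValueError(f"unknown stability class {stability}")
```
]]></artwork></figure>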
</section>
<section title="anonymisationTechnique">
<list style="hanging">
<t hangText="Description: ">
A description of the anonymisation technique applied to a
referenced Information Element within a referenced Template.
<texttable>
<ttcol align="left">Value</ttcol>
<ttcol align="left">Description</ttcol>
<c>0</c><c>Undefined: the Exporting Process makes no representation as to whether the defined field is anonymised or not. While the Collecting Process MAY assume that the field is not anonymised, this is not guaranteed. This is the default anonymisation technique.</c>
<c>1</c><c>None: the values exported are real.</c>
<c>2</c><c>Precision Degradation/Truncation: the values exported are anonymised using simple precision degradation or truncation. The new precision is implicit in the exported data, and can be deduced by the Collecting Process.</c>
<c>3</c><c>Binning: the values exported are anonymised into bins.</c>
<c>4</c><c>Enumeration: the values exported are anonymised by enumeration.</c>
<c>5</c><c>Permutation: the values exported are anonymised by random permutation.</c>
<c>6</c><c>Prefixed Permutation: the values exported are anonymised by random permutation, preserving bit-level structure; this represents prefix-preserving IP address anonymisation.</c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned8</t>
<t hangText="ElementId: ">TBD2</t>
<t hangText="Status: ">Proposed</t>
</list>
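<t>Techniques 2 and 3 are simple enough to illustrate directly. The
sketch below truncates an IPv4 address to a prefix (Precision
Degradation/Truncation) and maps octet counts into bins (Binning). The
/24 prefix and the bin boundaries are hypothetical choices, not values
defined by this document. Because truncation always leaves the low-order
bits zero, a Collecting Process can infer the new precision from the
exported values, as the description of value 2 states.</t>
<figure><artwork><![CDATA[
```python
import bisect
import ipaddress


def truncate_ipv4(addr, prefix_len):
    """Precision Degradation/Truncation (technique 2): zero host bits."""
    net = ipaddress.IPv4Network(f"{addr}/{prefix_len}", strict=False)
    return str(net.network_address)


# Hypothetical bin boundaries for octet counts; each value is replaced
# by the lower bound of the bin it falls into.
OCTET_BINS = [0, 64, 1500, 65536, 1 << 20]


def bin_value(value, bounds=OCTET_BINS):
    """Binning (technique 3): map a value to its bin's lower bound."""
    i = bisect.bisect_right(bounds, value) - 1
    return bounds[max(i, 0)]
```
]]></artwork></figure>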
</section>
<section title="informationElementIndex">
<list style="hanging">
<t hangText="Description: ">
A zero-based index of an Information Element referenced by informationElementId within a Template referenced by templateId; used to disambiguate scope for Templates containing multiple identical Information Elements.</t>
<t hangText="Abstract Data Type: ">unsigned16</t>
<t hangText="ElementId: ">TBD3</t>
<t hangText="Status: ">Proposed</t>
</list>
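<t>Resolving informationElementIndex against a Template can be sketched
as follows. The sketch assumes that the index counts occurrences of the
given informationElementId within the Template (rather than all field
positions), and represents a Template simply as its ordered list of
Information Element identifiers; the function name is hypothetical.</t>
<figure><artwork><![CDATA[
```python
def resolve_field_position(template_ie_ids, ie_id, ie_index=0):
    """Return the zero-based field position, within the Template, of the
    ie_index-th (zero-based) occurrence of ie_id."""
    seen = 0
    for pos, current in enumerate(template_ie_ids):
        if current == ie_id:
            if seen == ie_index:
                return pos
            seen += 1
    raise LookupError(f"IE {ie_id} index {ie_index} not in Template")
```
]]></artwork></figure>
<t>For a hypothetical Template carrying sourceIPv4Address (8) and
destinationIPv4Address (12) twice each, followed by octetDeltaCount (1),
i.e. [8, 12, 8, 12, 1], informationElementId 12 with
informationElementIndex 1 resolves to field position 3.</t>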
</section>
</section>
</section>
<section title="Applying Anonymisation Techniques to IPFIX Export and Storage">
<t>When exporting or storing anonymised flow data using IPFIX, certain
interactions between the IPFIX Protocol and the anonymisation techniques
in use must be considered; these are treated in the subsections below.</t>
<section title="Arrangement of Processes in IPFIX Anonymisation">
<t>Anonymisation may be applied to IPFIX data at three stages within the
collection infrastructure: on initial export, at a Mediator, or
after collection, as shown in <xref target="loc-fig"></xref>. Each of these
locations has specific considerations and applicability.</t>
<figure title="Potential Anonymisation Locations" anchor="loc-fig">
<artwork><![CDATA[
             +--------------------+
             | IPFIX File Storage |
             +--------------------+
                        ^
                        | (Anonymised after collection)
                        |
    +=======================================+
    |          Collecting Process           |
    +=======================================+
        ^                                 ^
        | (Anonymised at mediator)        |
        |                                 |
    +=============================+       |
    |          Mediator           |       |
    +=============================+       |
        ^                                 |
        | (Anonymised on initial export)  |
        |                                 |
    +=======================================+
    |           Exporting Process           |
    +=======================================+
]]></artwork>
</figure>
<t>Anonymisation is generally performed before the wider dissemination
or repurposing of a flow data set, e.g., adapting operational
measurement data for research. Therefore, direct anonymisation of flow
data on initial export is only applicable in certain restricted
circumstances: when the Exporting Process is "publishing" data to a
Collecting Process directly, and the Exporting Process and Collecting
Process are operated by different entities. Note that certain guidelines
in <xref target="header-anon"/> with respect to timestamp anonymisation
may not apply in this case, as the Collecting Process may be able to
deduce certain timing information from the time at which each Message is
received.</t>
<t>A much more flexible arrangement is to anonymise data within a <xref
target="I-D.ietf-ipfix-mediators-framework">Mediator</xref>. Here,
original data is sent to a Mediator, which performs the anonymisation
function and re-exports the anonymised data. Such a Mediator could be
located at the administrative domain boundary of the initial Exporting
Process operator, exporting anonymised data to other consumers outside
the organisation. In this case, the original Exporter SHOULD use TLS as
specified in <xref target="RFC5101"/> to secure the channel to the
Mediator, and the Mediator should follow the guidelines in <xref
target="guidelines"></xref>, to mitigate the risk of original data
disclosure.</t>
<t>When data is to be published as an anonymised data set in an <xref
target="I-D.ietf-ipfix-file">IPFIX File</xref>, the anonymisation may
also be done at the final Collecting Process, before storage and
dissemination. In this case, the Collector should follow the guidelines
in <xref target="guidelines"/>, especially those regarding File-specific
Options in <xref target="opt-anon"/>.</t>
<t>Note that anonymisation may occur at more than one location within a
given collection infrastructure, to provide varying levels of
anonymisation reversal risk and utility for specific purposes.</t>
</section>
<section title="IPFIX-Specific Anonymisation Guidelines" anchor="guidelines">
<t>In implementing and deploying the anonymisation techniques described
in this document, care must be taken that data structures supporting the
operation of the protocol itself do not leak data that could be used to
reverse the anonymisation applied to the flow data. Such data structures
may appear in the Message Header or within the data stream itself,
especially as Options data. Each of these, and its impact on specific
anonymisation techniques, is discussed in a separate subsection below.</t>
<section title="Appropriate Use of Information Elements for Anonymised Data" section="iespec-anon">
<t>[TODO: reiterate black-marker guidelines here]</t>
<t>[TODO: note that precision degradation SHOULD use appropriately-sized fields]</t>
</section>
<section title="Anonymisation of Header Data" anchor="header-anon">
<t>Each IPFIX Message contains a Message Header; this Message Header
contains two fields that may be used to break certain anonymisation
techniques: the Export Time and the Observation Domain ID.</t>
<t>When IPFIX Messages containing anonymised timestamp data are
exported, and the original Export Time in the Message Header has some
relationship to the anonymised timestamps, the Exporting Process SHOULD
anonymise the Export Time header field using an equivalent technique, if
possible. Otherwise, the relationship between export time and flow time
could be used to partially or totally reverse the timestamp
anonymisation.</t>
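<t>As an illustration, suppose flow timestamps are anonymised by
precision degradation to whole minutes. The sketch below applies the same
degradation to the Export Time; the field names follow flowStartSeconds
and flowEndSeconds from <xref target="RFC5102"/>, while the one-minute
granularity, the dictionary record representation, and the function
names are assumptions. If the Export Time were left exact, its
relationship to the flow times (for example, a roughly constant delay
between flow end and export) could be used to recover precision that the
degradation removed.</t>
<figure><artwork><![CDATA[
```python
def degrade(ts, granularity):
    """Precision degradation: round a seconds timestamp down to a
    multiple of granularity."""
    return ts - ts % granularity


def anonymise_message(export_time, records, granularity=60):
    """Degrade the flow timestamps and the Export Time header field
    with the same technique and granularity."""
    anon_records = [
        {**r,
         "flowStartSeconds": degrade(r["flowStartSeconds"], granularity),
         "flowEndSeconds": degrade(r["flowEndSeconds"], granularity)}
        for r in records
    ]
    return degrade(export_time, granularity), anon_records
```
]]></artwork></figure>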
<t>The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID. If this address bears some relation to the IP addresses in the
flow data (e.g., shares a network prefix with internal addresses) and
the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymisation. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID is NOT RECOMMENDED in this case.</t>
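<t>An Exporting Process could guard against this by checking a candidate
Observation Domain ID before use. The sketch below interprets the 32-bit
Observation Domain ID as an IPv4 address and reports whether it falls
within the same network as any original (pre-anonymisation) flow
address. The /24 prefix length and the function name are assumptions
for illustration.</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def odid_shares_prefix(odid, flow_addresses, prefix_len=24):
    """Return True if the Observation Domain ID, read as an IPv4
    address, lies in the same /prefix_len network as any original
    (unanonymised) flow address."""
    # strict=False masks off the host bits of the ODID.
    odid_net = ipaddress.IPv4Network((odid, prefix_len), strict=False)
    return any(ipaddress.IPv4Address(a) in odid_net
               for a in flow_addresses)
```
]]></artwork></figure>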
<!--<t>[EDITOR'S NOTE: We might want to see if anyone is actually doing
this with IPFIX. The example comes from other network measurement
tools (e.g. Argus) which default to using an IPv4 address as a sensor
ID.]</t>-->
</section>
<section title="Anonymisation of Options Data" anchor="opt-anon">
<t>IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in <xref
target="RFC5101"/> and <xref target="I-D.ietf-ipfix-file">the IPFIX
File Format</xref> containing flow timestamps and network addresses of
Exporting and Collecting Processes may be used to break certain
anonymisation techniques; care should be taken while using them with
anonymised data export and storage.</t>
<t>The Exporting Process Reliability Statistics Options Template,
recommended in <xref target="RFC5101"/>, contains an Exporting Process
ID field, which may be an exportingProcessIPv4Address Information
Element or an exportingProcessIPv6Address Information Element. If the
Exporting Process address bears some relation to the IP addresses in
the flow data (e.g., shares a network prefix with internal addresses)
and the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Exporting Process address may be
used to break the IP address anonymisation. Exporting Processes
exporting anonymised data in this situation SHOULD mitigate the risk
of attack either by omitting Options described by the Exporting
Process Reliability Statistics Options Template, or by anonymising the
Exporting Process address using a similar technique to that used to
anonymise the IP addresses in the exported data.</t>
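<t>The second mitigation requires that the Exporting Process address
pass through the same keyed, structure-preserving mapping, with the same
key, as the flow addresses. The sketch below is a prefix-preserving
permutation in the style of Crypto-PAn, but it swaps in HMAC-SHA256 from
the Python standard library as the pseudorandom function in place of
AES. It is illustrative only; the function name and key handling are
assumptions.</t>
<figure><artwork><![CDATA[
```python
import hashlib
import hmac
import ipaddress


def prefix_preserving_ipv4(addr, key):
    """Prefix-preserving anonymisation of an IPv4 address (Crypto-PAn
    style, HMAC-SHA256 instead of AES).

    Output bit i is input bit i XORed with a pseudorandom bit derived
    from the first i input bits, so two addresses sharing exactly a
    k-bit prefix map to outputs sharing exactly a k-bit prefix.
    """
    x = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        # First i bits of the original address (0 when i == 0), tagged
        # with i so prefixes of different lengths give distinct inputs.
        prefix = x >> (32 - i)
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (x >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))
```
]]></artwork></figure>
<t>Applying prefix_preserving_ipv4 to the exporter address with the key
used for the flow data keeps the exporter's relationship to internal
addresses within the anonymised address space, instead of exposing it
against real addresses.</t>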
<t>Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the <xref
target="I-D.ietf-ipfix-file">IPFIX File Format</xref> may contain the
exportingProcessIPv4Address Information Element or the
exportingProcessIPv6Address Information Element to identify an
Exporting Process from which a flow record was received, and the
collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in
the flow data (e.g., shares a network prefix with internal addresses)
and the IP addresses in the flow data are anonymised in a
structure-preserving way, then the Exporting Process or Collecting
Process address may be used to break the IP address anonymisation.
Since these Options Templates are primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it
is NOT RECOMMENDED that storage of anonymised data include these
Options Templates in order to mitigate the risk of attack.</t>
<t>The Message Details Options Template specified for the <xref
target="I-D.ietf-ipfix-file">IPFIX File Format</xref> also contains
the collectionTimeMilliseconds Information Element. As with the Export
Time Message Header field, if the exported flow data contains
anonymised timestamp information, and the collectionTimeMilliseconds
Information Element in a given Message has some relationship to the
anonymised timestamp information, then this relationship can be
exploited to reverse the timestamp anonymisation. Since this Options
Template is primarily intended for storing IPFIX Transport Session
data for auditing, replay, and testing purposes, it is NOT RECOMMENDED
that storage of anonymised data include this Options Template in order
to mitigate the risk of attack.</t>
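<t>A File writer could apply these recommendations with a simple filter.
In the sketch below, records are (kind, record) pairs and the kind labels
are hypothetical names for the Options Templates discussed above; a real
implementation would identify these Options Templates by their fields.
Omitting Exporting Process Reliability Statistics is only one of the two
mitigations allowed for that Options Template; anonymising its address
is the other.</t>
<figure><artwork><![CDATA[
```python
# Hypothetical labels for the Options Templates discussed above.
OMIT_WHEN_ANONYMISED = {
    "exportingProcessReliabilityStatistics",
    "exportSessionDetails",
    "messageDetails",
}


def records_to_store(records, anonymised):
    """Yield the (kind, record) pairs that may be written to an IPFIX
    File, dropping the listed Options when storing anonymised data."""
    for kind, rec in records:
        if anonymised and kind in OMIT_WHEN_ANONYMISED:
            continue
        yield kind, rec
```
]]></artwork></figure>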
<t>Since the Time Window Options Template specified for the <xref
target="I-D.ietf-ipfix-file">IPFIX File Format</xref> refers to the
timestamps within the flow data to provide partial table-of-contents
information for an IPFIX File, care must be taken to ensure that
Options described by this Template are written using the anonymised
timestamps instead of the original ones.</t>
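<t>A sketch of this, assuming each flow record carries flowStartSeconds
and flowEndSeconds and that the anonymisation is passed in as a function
(both assumptions for illustration): the window bounds are computed over
the anonymised values, not by anonymising the original bounds. For a
monotonic technique such as precision degradation the two approaches
agree, but for a non-monotonic one such as permutation they do not.</t>
<figure><artwork><![CDATA[
```python
def degrade(ts, granularity=60):
    """Precision degradation: round a seconds timestamp down."""
    return ts - ts % granularity


def time_window(records, anonymise):
    """Return (earliest start, latest end) over the anonymised
    timestamps, i.e. the values actually written to the File."""
    starts = [anonymise(r["flowStartSeconds"]) for r in records]
    ends = [anonymise(r["flowEndSeconds"]) for r in records]
    return min(starts), max(ends)
```
]]></artwork></figure>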
<!--<t>[EDITOR'S NOTE: what about other non-standard templates
containing the same or similar IEs?]</t>-->
</section>
</section>
</section>
<section title="Examples">
<t>[TODO: write this section.]</t>
</section>
<section title="Security Considerations">
<t>[TODO: write this section.]</t>
</section>
<section title="IANA Considerations">
<t>This document contains no actions for IANA.</t>
<t>[EDITOR'S NOTE: creation of anonymisationStability and anonymisationTechnique registries may change this.]</t>
</section>
<section title="Acknowledgments">
<t>We thank Paul Aitken for his comments and insight, and the PRISM
project for its support of this work.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc5101;
&rfc5102;
</references>
<references title="Informative References">
&draftIpfixAs;
&draftIpfixArchitecture;
&draftIpfixFile;
&draftIpfixMedframe;
&rfc3917;
<!--
<reference anchor='cryptopan'>
<front>
<title>Prefix-Preserving IP Address Anonymization</title>
<author initials='J' surname='Fan' fullname='Jinliang Fan'>
<organization />
</author>
<author initials='J' surname='Xu' fullname='Jun Xu'>
<organization />
</author>
<author initials='M' surname='Ammar' fullname='Mostafa H. Ammar'>
<organization />
</author>
<author initials='S' surname='Moon' fullname='Sue B. Moon'>
<organization />
</author>
<date month='October' day='7' year='2004' />
<abstract/>
</front>
<seriesInfo name='' value='Computer Networks, Volume 46, Issue 2, Pages 253-272, Elsevier'/>
</reference>
-->
</references>
</back>
</rfc>