<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="std" docName="draft-trammell-ipfix-a9n-00.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<front>
<title abbrev="IPFIX Aggregation">
Exporting Aggregated Flow Data using the IP Flow Information Export (IPFIX) Protocol
</title>
<author initials="B." surname="Trammell" fullname="Brian Trammell">
<organization abbrev="ETH Zurich">
Swiss Federal Institute of Technology Zurich
</organization>
<address>
<postal>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<phone>+41 44 632 70 13</phone>
<email>trammell@tik.ee.ethz.ch</email>
</address>
</author>
<author initials="E." surname="Boschi" fullname="Elisa Boschi">
<organization abbrev="ETH Zurich">
Swiss Federal Institute of Technology Zurich
</organization>
<address>
<postal>
<street>Gloriastrasse 35</street>
<city>8092 Zurich</city>
<country>Switzerland</country>
</postal>
<email>boschie@tik.ee.ethz.ch</email>
</address>
</author>
<author initials="A." surname="Wagner" fullname="Arno Wagner">
<organization abbrev="Consecom AG">
Consecom AG
</organization>
<address>
<postal>
<street>Bellariastrasse 11</street>
<city>8002 Zurich</city>
<country>Switzerland</country>
</postal>
<email>arno@wagner.name</email>
</address>
</author>
<date month="September" day="21" year="2010"></date>
<area>Operations</area>
<workgroup>IPFIX Working Group</workgroup>
<abstract>
<t>This document describes the export of aggregated Flow information using
IPFIX. An Aggregated Flow is essentially an IPFIX Flow representing
packets from zero or more original Flows, within an externally imposed
time interval. The document describes Aggregated Flow export within the
framework of IPFIX Mediators and defines an interoperable,
implementation-independent method for Aggregated Flow export.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The aggregation of packet data into flows serves a variety of
different purposes, as noted in <xref target="RFC3917"/> and <xref
target="RFC5472"/>. Aggregation beyond the flow level, into records
representing multiple Flows, is a common analysis and data reduction
technique as well, with applicability to large-scale network data
analysis, archiving, and inter-organization exchange.</t>
<t>Aggregation is applicable to a wide variety of situations,
including traffic matrix calculation, generation of time series data
for visualizations or anomaly detection, and data reduction. Depending
on the keys used for aggregation, it may have an anonymising effect on
the data. Aggregation can take place at any of a number of locations
within a measurement infrastructure. Exporters may export aggregated
Flow information simply as normal flow information, by performing
aggregation after metering but before export. IPFIX Mediators are
particularly well suited to performing aggregation, as they can
collect information from multiple original exporters at geographically
and topologically distinct observation points.</t>
<t>Aggregation as defined and described in this document covers a
superset of the applications defined in the <xref
target="RFC5982">IPFIX Mediators Problem Statement</xref>, including
5.1 "Adjusting Flow Granularity" (herein referred to as Key
Aggregation), 5.4 "Time Composition" (herein referred to as Interval
Combination), and 5.5 "Spatial Composition", although the
architectural aspects of spatial composition are not addressed by this
document.</t>
<t>Since Aggregated Flows as defined in the following section are
essentially Flows, IPFIX can be used to export <xref
target="RFC5101"/> and store <xref target="RFC5655"/> aggregated data
without further specification. However, this document further provides
a common basis for the application of IPFIX to the handling of
aggregated data, through a detailed terminology, model of aggregation
operations, methods for original Flow counting and counter
distribution across time intervals, and an aggregation metadata
representation based upon IPFIX Options.</t>
</section>
<section title="Terminology" anchor="sec-terminology">
<t>Terms used in this document that are defined in the Terminology
section of the <xref target="RFC5101">IPFIX Protocol</xref> document
are to be interpreted as defined there.</t>
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
<t>In addition, this document defines the following terms:</t>
<list style="hanging">
<t hangText="Aggregated Flow: ">A Flow, as defined by <xref
target="RFC5101"/>, derived from a set of zero or more original
Flows within a defined time interval. The two primary differences
between a Flow and an Aggregated Flow are (1) that the time
interval of a Flow is generally derived from information about the
timing of the packets comprising the Flow, while the time interval
of an Aggregated Flow is generally externally imposed; and (2)
that an Aggregated Flow may represent zero packets (i.e., an
assertion that no packets were seen for a given Flow Key in a
given time interval).</t>
<t hangText="(Intermediate) Aggregation Function: ">A mapping from
a set of zero or more original Flows into a set of Aggregated
Flows, which separates the original Flows into one or more
given time intervals.</t>
<t hangText="(Intermediate) Aggregation Process: ">An Intermediate
Process, as in <xref
target="I-D.ietf-ipfix-mediators-framework"/>, hosting an
Intermediate Aggregation Function.</t>
<t hangText="Aggregation Interval: ">A time interval imposed upon
an Aggregated Flow. Aggregation Functions commonly use a regular
Aggregation Interval (e.g. "every five minutes", "every calendar
month"), though regularity is not necessary.</t>
<t hangText="Interval Distribution: ">A temporal aggregation
operation which imposes a new time interval on an original Flow,
an Aggregated Flow produced by some other operation, or a set
thereof. Interval Distribution is a many-to-many operation: it may
result in the values from an original Flow appearing in multiple
Aggregated Flows as well as in multiple original Flows
contributing to each imposed time interval.</t>
<t hangText="Interval Combination: ">A temporal aggregation
operation which combines temporally adjacent original Flows with
matching Flow Keys, expanding the interval of the combined Flow to
cover the entire interval covered by the set of original
Flows.</t>
<t hangText="Key Aggregation: ">A spatial aggregation operation
that generates new Aggregated Flows from original Flows by
modifying the Flow Key. Key Aggregation is usually applied in
combination with an Interval Distribution operation.</t>
<t hangText="original Flow: ">A Flow given as input to an
Aggregation Function in order to generate Aggregated Flows.</t>
<t hangText="contributing Flow: ">An original Flow that is
partially or completely represented within an Aggregated Flow.
Each Aggregated Flow is made up of zero or more contributing
Flows, and an original Flow may contribute to zero or more
Aggregated Flows.</t>
</list>
</section>
<section title="Requirements for Aggregation Support in IPFIX">
<t>In defining a terminology, model, and metadata for Aggregated Flow
export using IPFIX, we have sought to meet the following
requirements.</t>
<t>First, a specification of Aggregated Flow export must seek to be as
interoperable as possible. Export of Aggregated Flows using the
techniques described in this document will result in Flow data which
can be collected by Collecting Processes and read by File Readers
which do not provide any special support for Aggregated Flow
export.</t>
<t>Second, a specification of Aggregated Flow export must seek to be
as implementation-independent as the IPFIX protocol itself. In <xref
target="sec-arch"/>, we specify the flow aggregation process as an
intermediate process within the <xref
target="I-D.ietf-ipfix-mediators-framework">IPFIX Mediator
framework</xref>, and specify a variety of different architectural
arrangements for flow aggregation; these are meant to be descriptive
as opposed to prescriptive. In metadata export, we seek to define
properties of the set of exported Aggregated Flows, as opposed to the
properties of the specific algorithms used to aggregate these Flows.
Specifically out of scope for this effort are the definition of a
language for specifying aggregation operations and the configuration
parameters of Aggregation Processes.</t>
<t>From the definition presented in <xref
target="sec-terminology"/>, an Aggregated Flow is a Flow as in <xref
target="RFC5101"/>, with a restricted definition as to the packets
making up the Flow. Practically speaking, Aggregated Flows are derived
from original Flows, as opposed to a raw packet stream. Key to this
definition of Aggregated Flow is how timing affects the process of
aggregation, as for the most part flow aggregation takes place within
some set of (usually regular) time intervals. Any specification for
Aggregated Flow export must account for the special role time
intervals play in aggregation, and the many-to-many relationship
between Aggregated Flows and original Flows which this implies.</t>
</section>
<section title="Use Cases for IPFIX Aggregation" anchor="sec-usecase">
<t>Aggregation, as a common data analysis method, has many
applications. When used with a regular Aggregation Interval, it
generates time series data from a collection of flows with discrete
intervals. Time series data is itself useful for a wide variety of
analysis tasks, such as generating parameters for network anomaly
detection systems, or driving visualizations of volume per time for
traffic with specific characteristics. Traffic matrix calculation from
flow data is inherently an aggregation action, by aggregating the flow
key down to interface, address prefix, or autonomous system.</t>
<t>Irregular or data-dependent Aggregation Intervals and Key
Aggregation operations can also be used to provide adaptive
aggregation of network flow data, providing a higher-resolution view
on data of interest (e.g., potential attacks) to an application while
providing lower resolution to "less interesting" data (e.g., normal
web traffic). Indeed, this multiple-resolution approach can be applied
by a Mediator exporting unchanged original Flow data for the most
interesting flows alongside the Aggregated Flows of varying resolution
for the less interesting ones.</t>
<t>Note that an aggregation operation which removes potentially
sensitive information as identified in <xref
target="I-D.ietf-ipfix-anon"/> may tend to have an anonymising effect
on the Aggregated Flows, as well; however, any application of
aggregation as part of a data protection scheme should ensure that all
the issues raised in Section 4 of <xref target="I-D.ietf-ipfix-anon"/>
are addressed.</t>
</section>
<section title="Aggregation of IP Flows">
<t>As stated in <xref target="sec-terminology"/>, an Aggregated Flow
is simply an IPFIX Flow generated from original Flows by an
Aggregation Function. Here, we present a general model for
aggregation, and elaborate and provide examples of specific
aggregation operations that may be performed by the Aggregation
Process; we use this to define the export of Aggregated Flows in <xref
target="sec-export"/>.</t>
<section title="A general model for IP Flow Aggregation">
<t>An Intermediate Aggregation Process consumes original Flows and
exports Aggregated Flows, as defined in <xref
target="sec-terminology"/>. While this document does not define an
implementation of an Intermediate Aggregation Process further than
this, or the Aggregation Functions that it applies, it can be
helpful to partially decompose this function into a set of common
operations, in order to more fully examine the effects these
operations have.</t>
<t>Aggregation is composed of three general types of operations on
original Flows: those that externally impose a time interval,
called here the Aggregation Interval; those that reduce or
otherwise modify the Flow Key; and those that aggregate and
distribute the resulting non-Flow Key fields accordingly. Most
aggregation functions will perform each of these types of
operations.</t>
<t>Interval Distribution is the external imposition of a time
interval onto an original Flow. Note that this may lead to an
original Flow contributing to multiple aggregated Flows, if the
original Flow's time interval crosses at least one boundary
between Aggregation Intervals. Interval Distribution is described
in more detail in <xref target="sec-intdist"/>.</t>
<t>Key aggregation, the modification of Flow Keys, may occur in
two ways. First, the Flow Key may be projected: that is,
Information Elements may be removed from the Flow Key, or the
space of values in the Flow Key may be reduced. Second, derived
Information Elements may be added to the Flow Key. Both of these
modifications may result in multiple original Flows
contributing to the same Aggregated Flow. Key Aggregation is
described in more detail in <xref target="sec-keyagg"/>.</t>
<t>Interval distribution and key aggregation together may generate
multiple intermediate aggregated Flows covering the same time
interval with the same Flow Key; these intermediate Flows must
then be combined into Aggregated Flows. Non-key values are first
distributed among the Aggregated Flows to which an original Flow
contributes according to some distribution algorithm (see <xref
target="sec-distro"/>), and values from multiple contributing
Flows are combined using the same operation by which values are
combined from packets to form Flows for each Information Element:
in general, counters are added, averages are averaged, flags are
unioned, and so on. Aggregation may also introduce new non-key
fields, e.g. per-flow average counters, or distinct counters for
key fields projected out of the Aggregated Flow.</t>
<t>As a result of this final combination and distribution, an
Aggregation Function produces at most one Aggregated Flow
resulting from a set of original Flows for a given modified Flow
Key and Aggregation Interval.</t>
<t>This general model is illustrated in the figure below. Note
that interval and key field steps are commutative and optional,
and as such may occur in any order.</t>
<figure title="Conceptual model of aggregation operations" anchor="iaf-operations"><artwork><![CDATA[
original Flows
|
V
+------------------------+
| Interval Distribution |<--- Aggregation Interval
+------------------------+
| (Flows with modified intervals)
V
+------------------------+
| Key Aggregation and |<--- specification of keys
| Key Field replacement |
+------------------------+
| (Flows with modified keys/intervals)
V (Addition of new non-key values)
+------------------------+
| Combination of |
| contributing Flows and |
| Counter Distribution |
+------------------------+
|
V
Aggregated Flows
]]></artwork></figure>
</section>
<section title="Interval Distribution" anchor="sec-intdist">
<t>Interval Distribution generally imposes a regular interval on
the resulting Aggregated Flows; the selection of an interval is a
matter for the specific aggregation application, and has
tradeoffs. Shorter intervals allow higher resolution aggregated
data and, in streaming applications, faster reaction time. Longer
intervals lead to greater data reduction and simplified counter
distribution. Specifically, counter distribution is greatly
simplified by the choice of an interval longer than the duration
of the longest original Flow, itself generally determined by the
original Flow's Metering Process active timeout; in this case an
original Flow can contribute to at most two Aggregated Flows, and
the more exotic value distribution methods become
inapplicable.</t>
<t>Aggregation intervals, however, need not be regular. The
aggregation interval can be chosen, for example, based on time of
day, or on the relative volume of the original Flows, in order to
adapt the aggregation to the conditions on the measured
network.</t>
<figure title="Illustration of interval distribution" anchor="intdist-fig">
<artwork><![CDATA[
| | | |
| |<--flow A-->| | | |
| |<--flow B-->| | |
| |<-------------flow C-------------->| |
| | | |
| interval 0 | interval 1 | interval 2 |
]]></artwork>
</figure>
<t>In <xref target="intdist-fig"/>, we illustrate three common
possibilities for interval distribution. For flow A, the start and
end times lie within the boundaries of a single interval 0;
therefore, flow A contributes to only one Aggregated Flow. Flow B,
by contrast, has the same duration but crosses the boundary
between intervals 0 and 1; therefore, it will contribute to two
Aggregated Flows, and its counters must be distributed among these
flows, though in the two-interval case this can be simplified by
consistently picking one of the two intervals, or by proportionally
distributing between them. Only flows like flow A and flow B will
be produced when the interval is chosen to be longer than the
duration of the longest original Flow, as above. More
complicated is the case of flow C, which contributes to more than
two flows, and must have its counters distributed according to
some policy as in <xref target="sec-distro"/>.</t>
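<t>As a non-normative illustration, the following Python sketch
computes which regular Aggregation Intervals an original Flow
contributes to; the interval width, timestamps, and the half-open
boundary convention are assumptions of the example, not requirements
of this document:</t>
<figure><artwork><![CDATA[
```python
def covered_intervals(flow_start, flow_end, width):
    # Indices of the regular Aggregation Intervals that an original
    # Flow with the given start and end timestamps contributes to.
    # Intervals are [i*width, (i+1)*width); a Flow ending exactly
    # on a boundary counts in the later interval under this
    # (assumed) convention.
    return list(range(flow_start // width, flow_end // width + 1))

# With an interval width of 10, a flow like flow A above lies in a
# single interval, flow B crosses one boundary, flow C crosses two:
print(covered_intervals(2, 8, 10))    # flow A: [0]
print(covered_intervals(6, 14, 10))   # flow B: [0, 1]
print(covered_intervals(4, 27, 10))   # flow C: [0, 1, 2]
```
]]></artwork></figure>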
</section>
<section title="Key Aggregation" anchor="sec-keyagg">
<t>Key Aggregation modifies the Flow Key of the original Flows,
through projection, replacement, and augmentation. For example,
consider original Flows with a flow key containing the traditional
five-tuple of source and destination address and port, and
transport protocol. Aggregating by host pair would project the
Flow Key down by eliminating port and protocol fields. Aggregating
by source /24 network would project the Flow Key down to just the
source address, and then further apply a prefix mask to the source
address.</t>
<t>During aggregation, new Flow Key fields may be added to
original Flows, or Flow Key Fields may be replaced with ancillary
values derived from the Flow. To continue the example from above,
consider an aggregation operation for counting traffic per source
autonomous system. Here, the Flow Key would be projected down to
just the source address, and the source address would be replaced
with the source AS number, looked up in a table maintained by the
intermediate Aggregation Process.</t>
<figure title="Illustration of key aggregation by simple masking" anchor="keyagg-simple-fig">
<artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
| | | | | |
retain mask /24 X X X X
V V
+---------+-------------+
| src ip4 | dst ip4 /24 |
+---------+-------------+
Aggregated Flow Key (by source address and destination class-C)
]]></artwork>
</figure>
<t><xref target="keyagg-simple-fig"/> illustrates an example
projection operation, aggregation by source address and
destination class C network. Here, the port, protocol, and
type-of-service information is removed from the flow key, the
source address is retained, and the destination address is masked
by dropping the low 8 bits.</t>
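<t>The projection and masking step above can be sketched,
non-normatively, in Python using the standard ipaddress module; the
function name and the tuple layout of the resulting key are
illustrative only:</t>
<figure><artwork><![CDATA[
```python
import ipaddress

def aggregate_key(src_ip, dst_ip, prefix_len=24):
    # Project the five-tuple Flow Key down to (source address,
    # destination network): ports, protocol, and type-of-service
    # are dropped, and the destination address is masked to the
    # given prefix length (dropping the low 8 bits for a /24).
    dst_net = ipaddress.ip_network(f"{dst_ip}/{prefix_len}",
                                   strict=False)
    return (src_ip, str(dst_net.network_address))

print(aggregate_key("192.0.2.17", "198.51.100.200"))
# ('192.0.2.17', '198.51.100.0')
```
]]></artwork></figure>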
<figure title="Illustration of key aggregation by replacement" anchor="keyagg-replace-fig">
<artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
| | | | | |
+-------------------+ X X X X
| ASN lookup table |
+-------------------+
V V
+---------+---------+
| src asn | dst asn |
+---------+---------+
Aggregated Flow Key (by source and dest ASN)
]]></artwork>
</figure>
<t><xref target="keyagg-replace-fig"/> illustrates an example
projection operation with a replacement function, aggregation by
source and destination ASN without ASN information available in
the original Flow. Here, the port, protocol, and type-of-service
information is removed from the flow key, while the source and
destination addresses are run through an IP address to ASN lookup
table, and the Aggregated Flow key is made up of the resulting
source and destination ASNs.</t>
</section>
<section title="Aggregating and Distributing Counters" anchor="sec-distro">
<t>In general, counters in Aggregated Flows are treated the same
as in any Flow: on a per-Information Element basis, the counters
are calculated as if they were derived from the set of packets in
the original flow. For the most part, when aggregating original
Flows into Aggregated Flows, this is simply done by summation.</t>
<t>When the Aggregation Interval is longer or much longer than the
longest original Flow, a Flow can cross at most one
Interval boundary, and will therefore contribute to at most two
Aggregated Flows. Most common in this case is to arbitrarily but
consistently choose to account the original Flow's counters either
to the first or the last aggregated Flow to which it could
contribute.</t>
<t>However, this becomes more complicated when the Aggregation
Interval is shorter than the longest original Flow in the source
data. In such cases, each original Flow can incompletely cover
one or more time intervals, and contribute to one or more Aggregated
Flows; in this case, the Aggregation Process must distribute the
counters in the original Flows across the multiple Aggregated
Flows. There are several methods for doing this, listed here in
increasing order of complexity and accuracy.</t>
<list style="hanging">
<t hangText="End Interval: ">The counters for an original Flow
are added to the counters of the appropriate Aggregated Flow
containing the end time of the original Flow.</t>
<t hangText="Start Interval: ">The counters for an original
Flow are added to the counters of the appropriate Aggregated
Flow containing the start time of the original Flow.</t>
<t hangText="Mid Interval: ">The counters for an original Flow
are added to the counters of a single appropriate Aggregated
Flow containing some timestamp between start and end time of
the original Flow.</t>
<t hangText="Simple Uniform Distribution: ">Each counter for
an original Flow is divided by the number of time intervals
the original Flow covers (i.e., of appropriate Aggregated
Flows sharing the same Flow Key), and this number is added to
each corresponding counter in each Aggregated Flow.</t>
<t hangText="Proportional Uniform Distribution: ">Each counter
for an original Flow is divided by the number of time _units_
the original Flow covers, to derive a mean count rate. This
mean count rate is then multiplied by the number of time units
in the intersection of the duration of the original Flow and
the time interval of each Aggregated Flow. This is like simple
uniform distribution, but accounts for the fractional portions
of a time interval covered by an original Flow in the first
and last time interval.</t>
<t hangText="Simulated Process: ">Each counter of the original
Flow is distributed among the intervals of the Aggregated
Flows according to some function the Aggregation Process uses
based upon properties of Flows presumed to be like the
original Flow. For example, bulk transfer flows might follow a
more or less proportional uniform distribution, while
interactive processes are far more bursty.</t>
<t hangText="Direct: ">The Aggregation Process has access to
the original packet timings from the packets making up the
original Flow, and uses these to distribute or recalculate the
counters.</t>
</list>
<t>A method for exporting the distribution of counters across
multiple Aggregated Flows is detailed in <xref
target="sec-ex-distro"/>. In any case, counters MUST be
distributed across the multiple Aggregated Flows in such a way
that the total count is preserved, within the limits of accuracy
of the implementation (e.g., inaccuracy introduced by the use of
floating-point numbers is tolerable). This property allows data to
be aggregated and re-aggregated without any loss of original count
information. To avoid confusion in interpretation of the
aggregated data, all the counters for a set of given original
Flows SHOULD be distributed via the same method.</t>
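<t>As a non-normative example, Proportional Uniform Distribution can
be sketched as follows; the interval width and time units are
assumptions of the example. Note that the per-interval shares sum to
the original count, preserving the total as required above:</t>
<figure><artwork><![CDATA[
```python
def distribute_proportional(flow_start, flow_end, count, width):
    # Split one counter of an original Flow across the regular
    # Aggregation Intervals it covers, in proportion to the time
    # spent in each interval; the shares sum to the original count.
    duration = flow_end - flow_start
    if duration == 0:
        return {flow_start // width: float(count)}
    shares = {}
    for i in range(flow_start // width, flow_end // width + 1):
        overlap = (min(flow_end, (i + 1) * width)
                   - max(flow_start, i * width))
        shares[i] = count * overlap / duration
    return shares

# A 1000-packet flow spanning [4, 24) with interval width 10
# spends 6, 10, and 4 time units in intervals 0, 1, and 2:
print(distribute_proportional(4, 24, 1000, 10))
# {0: 300.0, 1: 500.0, 2: 200.0}
```
]]></artwork></figure>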
</section>
<section title="Counting Original Flows" anchor="sec-flowcount">
<t>When aggregating multiple original Flows into an Aggregated
Flow, it is often useful to know how many original Flows are
present in the Aggregated Flow. This document introduces four new
information elements in <xref target="sec-ex-flowcount"/> to
export these counters.</t>
<t>There are two possible ways to count original Flows, which we
call here conservative and non-conservative. Conservative flow
counting has the property that each original Flow contributes
exactly one to the total flow count within a set of aggregated
Flows. In other words, conservative flow counters are distributed
just as any other counter, except each original Flow is assumed to
have a flow count of one. When a count for an original Flow must
be distributed across a set of Aggregated Flows, and a
distribution method is used which does not account for that
original Flow completely within a single Aggregated Flow,
conservative flow counting requires a fractional
representation.</t>
<t>By contrast, non-conservative flow counting is used to count
how many flows are represented in an Aggregated Flow. Flow
counters are not distributed in this case. An original Flow which
is present within N Aggregated Flows would add N to the sum of
non-conservative flow counts, one to each Aggregated Flow. In
other words, the sum of conservative flow counts over a set of
Aggregated Flows is always equal to the number of original Flows,
while the sum of non-conservative flow counts is greater than
or equal to the number of original Flows.</t>
<t>For example, consider flows A, B, and C as illustrated in <xref
target="intdist-fig"/>. Assume that the key aggregation step
aggregates the keys of these three flows to the same aggregated
flow key, and that start interval counter distribution is in
effect. The conservative flow count for interval 0 is 3 (since
flows A, B, and C all begin in this interval), and for the other
two intervals is 0. The non-conservative flow count for interval 0
is also 3 (due to the presence of flows A, B, and C), for interval
1 is 2 (flows B and C), and for interval 2 is 1 (flow C). The sum
of the conservative counts 3 + 0 + 0 = 3, the number of original
Flows; while the sum of the non-conservative counts 3 + 2 + 1 =
6.</t>
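<t>The worked example above can be reproduced with the following
non-normative Python sketch, which computes both counts under Start
Interval distribution; the timestamps and interval width are
assumptions chosen to match the figure:</t>
<figure><artwork><![CDATA[
```python
def flow_counts(flows, width, n_intervals):
    # Conservative count: each original Flow adds exactly 1, to the
    # interval containing its start time (Start Interval method).
    # Non-conservative count: each Flow adds 1 to every interval
    # it covers.
    conservative = [0] * n_intervals
    non_conservative = [0] * n_intervals
    for start, end in flows:
        conservative[start // width] += 1
        for i in range(start // width, end // width + 1):
            non_conservative[i] += 1
    return conservative, non_conservative

# Flows A, B, and C as in the interval distribution figure:
flows = [(2, 8), (6, 14), (4, 27)]
print(flow_counts(flows, 10, 3))  # ([3, 0, 0], [3, 2, 1])
```
]]></artwork></figure>
<t>The conservative counts sum to 3, the number of original Flows,
while the non-conservative counts sum to 6, as in the example
above.</t>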
</section>
<section title="Counting Distinct Key Values" anchor="sec-distinct">
<t>One common case in aggregation is counting distinct values that
were projected out during key aggregation. For example, consider
an application counting destinations contacted per host, a common
case in host characterization or anomaly detection. Here, the
Aggregation Process needs a way to export this distinct key count
information.</t>
<t>For such applications, a distinctCountOf(key name) Information
Element should be registered with IANA to represent these cases.
[EDITOR'S NOTE: There is an open question as to the best way to do
this: either through the registration of Information Elements for
common cases in this draft, the registration of Information
Elements on demand, or the definition of a new Information Element
space for distinct counts bound to a PEN, as in <xref
target="RFC5103"/>.]</t>
</section>
<section title="Exact versus Approximate Counting during Aggregation"
anchor="sec-lowfi">
<t>In certain circumstances, particularly involving aggregation by
devices with limited resources, and in situations where exact
aggregated counts are less important than relative magnitudes
(e.g. driving graphical displays), counter distribution during key
aggregation may be performed by approximate counting means (e.g.
Bloom filters).</t>
<!-- <t>In certain cases, the magnitude of error for a given
Information Element due to approximate counting may be known. An
Exporting Process MAY use the Error Magnitude Options Template
defined in <xref target="sec-ex-error"/> to export this
information.</t> -->
</section>
<section title="Interval Combination">
<t>One special case of aggregation uses adaptive Aggregation
Intervals without any projection in order to join long-lived Flows
which may have been split (e.g., due to an active timeout shorter
than the Flow's duration). This is referred to as "Time Composition" in
section 5.4 of <xref target="RFC5982"/>. Here, the Flow Key is
unmodified, and the Aggregation Interval is chosen on a per-Flow
basis to cover the interval spanned by the set of aggregated
Flows. This may be applied alone in order to normalize split
Flows, or in combination with other aggregation functions in order
to obtain more accurate original Flow counts.</t>
</section>
</section>
<section title="Aggregation in the IPFIX Architecture" anchor="sec-arch">
<t>The techniques described in this document can be applied to IPFIX
data at three stages within the collection infrastructure: on initial
export, within a mediator, or after collection, as shown in <xref
target="loc-fig"/>.</t>
<figure title="Potential Aggregation Locations" anchor="loc-fig">
<artwork><![CDATA[
+==========================================+
| Exporting Process |
+==========================================+
| |
| (Aggregated Flow Export) |
V |
+=============================+ |
| Mediator | |
+=============================+ |
| |
| (Aggregating Mediator) |
V V
+==========================================+
| Collecting Process |
+==========================================+
|
| (Aggregation for Storage)
V
+--------------------+
| IPFIX File Storage |
+--------------------+
]]></artwork>
</figure>
<t>Aggregation can be applied for either intermediate or final analytic
purposes. In certain circumstances, it may make sense to export
Aggregated Flows from an Exporting Process, for example, if the
Exporting Process is designed to drive a time-series visualization
directly. Note that this case, where the Aggregation Process is
effectively integrated into the Metering Process, is already covered
by the <xref target="RFC5470">IPFIX architecture</xref>: the flow keys
used are simply a subset of those that would normally be used. A
Metering Process in this arrangement MAY choose to simulate the
generation of larger flows in order to generate original flow counts, if
the application calls for compatibility with an Aggregation Process
deployed in a separate location.</t>
<t>Deployment of an Intermediate Aggregation Process within a <xref
target="RFC5982">Mediator</xref> is a much more flexible arrangement. Here, the
Mediator consumes original Flows and produces aggregated Flows; this
arrangement is suited to any of the use cases detailed in <xref
target="sec-usecase"/>. In a mediator, aggregation can be applied as
well to aggregating original Flows from multiple sources into a single
stream of aggregated Flows; the architectural specifics of this
arrangement are not addressed in this document, which is concerned only
with the aggregation operation itself; see <xref
target="I-D.claise-ipfix-mediation-protocol"/> for details.</t>
<t>In the specific case that an Aggregation Process is employed for data
reduction for storage purposes, it can take original Flows from a
Collecting Process or File Reader and pass Aggregated Flows to a File
Writer for storage.</t>
<t>The data flows into and out of an Intermediate Aggregation Process
are shown in <xref target="iap-dataflows"/>.</t>
<figure title="Data flows through the aggregation process" anchor="iap-dataflows">
<artwork><![CDATA[
packets --+ +- IPFIX Messages -+
| | |
V V V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
| | +====================+ +=============+
| | | original Flows |
| | V V
+ - - - - - - - - -+======================================+
| Intermediate Aggregation Process (IAP) |
+=========================================================+
| Aggregated Aggregated |
| Flows Flows |
V V
+===================+ +=============+
| Exporting Process | | File Writer |
+===================+ +=============+
| |
+------------> IPFIX Messages <----------+
]]></artwork>
</figure>
</section>
<section title="Export of Aggregated IP Flows using IPFIX" anchor="sec-export">
<t>In general, Aggregated Flows are exported in IPFIX like any other Flows. However, certain aspects of Aggregated Flow export benefit from additional guidelines, or from new Information Elements to represent aggregation metadata or information generated during aggregation. These are detailed in the following subsections.</t>
<section title="Time Interval Export">
<t>Since an Aggregated Flow is simply a Flow, the existing
timestamp Information Elements in the IPFIX Information Model
(e.g., flowStartMilliseconds, flowEndNanoseconds) are sufficient
to specify the time interval for aggregation. Therefore, this
document specifies no new aggregation-specific Information
Elements for exporting time interval information.</t>
<t>Each Aggregated Flow SHOULD contain both an interval start and
interval end timestamp. If an exporter of Aggregated Flows omits
the interval end timestamp from each Aggregated Flow, the time
interval for Aggregated Flows within an Observation Domain and
Transport Session MUST be regular and constant.
However, note that this approach might lead to interoperability
problems when exporting Aggregated Flows to non-aggregation-aware
Collecting Processes and downstream analysis tasks; therefore, an
Exporting Process capable of exporting only interval start
timestamps MUST provide a configuration option to export interval
end timestamps as well.</t>
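<t>As a non-normative illustration, a Collecting Process receiving only interval start timestamps can reconstruct interval end timestamps from the constant, externally known interval. A minimal sketch, assuming a 60000-millisecond interval (the function name and interval length are illustrative, not part of this specification):</t>

```python
# Hypothetical sketch: reconstruct the interval end timestamp at a
# Collecting Process when the exporter sends only interval start
# timestamps and the aggregation interval is regular and constant.

INTERVAL_MS = 60000  # assumed interval length; externally configured

def interval_end_ms(flow_start_ms, interval_ms=INTERVAL_MS):
    """End of the aggregation interval containing flow_start_ms."""
    return flow_start_ms - (flow_start_ms % interval_ms) + interval_ms

print(interval_end_ms(1285056000000))  # prints 1285056060000
```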
</section>
<section title="Flow Count Export" anchor="sec-ex-flowcount">
<t>The following four Information Elements are defined to count original Flows as discussed in <xref target="sec-flowcount"/>.</t>
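<t>To make the distinction concrete, the following non-normative sketch (the function names and the 60-second interval are assumptions for illustration, not part of this specification) counts original Flows into fixed time intervals both ways: an original Flow spanning two intervals contributes to the non-conservative originalFlowsPresent count of both, but to the conservative originalFlowsInitiated and originalFlowsCompleted counts exactly once each.</t>

```python
# Hypothetical sketch: conservative vs. non-conservative original Flow
# counts when aggregating Flows into fixed time intervals.

INTERVAL = 60  # aggregation interval in seconds (assumption)

def covered_bins(start, end, interval=INTERVAL):
    """Start times of the intervals a Flow active [start, end] covers."""
    first = start - start % interval
    last = end - end % interval
    return list(range(first, last + interval, interval))

def count_original_flows(flows, interval=INTERVAL):
    """flows: list of (start, end) tuples in seconds. Returns
    per-interval counts for three of the flow count
    Information Elements."""
    present, initiated, completed = {}, {}, {}
    for start, end in flows:
        bins = covered_bins(start, end, interval)
        for b in bins:  # originalFlowsPresent: every covered interval
            present[b] = present.get(b, 0) + 1
        # originalFlowsInitiated: interval containing the first packet
        initiated[bins[0]] = initiated.get(bins[0], 0) + 1
        # originalFlowsCompleted: interval containing the last packet
        completed[bins[-1]] = completed.get(bins[-1], 0) + 1
    return present, initiated, completed

# One original Flow spanning two 60s intervals:
present, initiated, completed = count_original_flows([(30, 90)])
print(sum(present.values()))    # 2: not conserved on re-aggregation
print(sum(initiated.values()))  # 1: sums to the original Flow count
print(sum(completed.values()))  # 1
```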
<section title="originalFlowsPresent Information Element" anchor="ie-noncon-flowcount">
<list style="hanging">
<t hangText="Description: ">
The non-conservative count of original Flows contributing to
this Aggregated Flow. Non-conservative counts need not sum to
the original count on re-aggregation.
</t>
<t hangText="Abstract Data Type: ">unsigned64</t>
<t hangText="ElementId: ">TBD1</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="originalFlowsInitiated Information Element" anchor="ie-con-flowstartcount">
<list style="hanging">
<t hangText="Description: ">
The conservative count of original Flows whose first packet is
represented within this Aggregated Flow. Conservative counts
must sum to the original count on re-aggregation.
</t>
<t hangText="Abstract Data Type: ">unsigned64</t>
<t hangText="ElementId: ">TBD2</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="originalFlowsCompleted Information Element" anchor="ie-con-flowendcount">
<list style="hanging">
<t hangText="Description: ">
The conservative count of original Flows whose last packet is
represented within this Aggregated Flow. Conservative counts
must sum to the original count on re-aggregation.
</t>
<t hangText="Abstract Data Type: ">unsigned64</t>
<t hangText="ElementId: ">TBD3</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="originalFlows Information Element" anchor="ie-con-flowcount">
<list style="hanging">
<t hangText="Description: ">
The conservative count of original Flows contributing to this
Aggregated Flow; may be distributed via any of the methods
described in <xref target="sec-distro"/>.
</t>
<t hangText="Abstract Data Type: ">float64</t>
<t hangText="ElementId: ">TBD4</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
</section>
<section title="Aggregate Counter Distribution Export" anchor="sec-ex-distro">
<t>When exporting counters distributed among Aggregated Flows, as
described in <xref target='sec-distro'/>, the Exporting Process MAY
export an Aggregate Counter Distribution Record for each Template
describing Aggregated Flow records; this Options Template is
described below. It uses the valueDistributionMethod Information
Element, also defined below. In many cases distribution is
simple: the counters from contributing Flows are accounted to the
first time interval to which they contribute. This is the default
situation, for which no Aggregate Counter Distribution Record is
necessary; Aggregate Counter Distribution Records are only
applicable in more exotic situations, such as when the Aggregation
Interval is smaller than the durations of the original Flows.</t>
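<t>As a non-normative illustration of three of the distribution methods described in <xref target='sec-distro'/> (the function names and the 60-second interval below are assumptions for the sketch, not part of this specification):</t>

```python
# Hypothetical sketch: distribute one counter from an original Flow
# across Aggregated Flow time intervals using three of the methods.

INTERVAL = 60  # aggregation interval in seconds (assumption)

def start_interval(count, start, end, interval=INTERVAL):
    """Method 1, Start Interval: the whole counter is accounted to
    the interval containing the Flow's start time (the default)."""
    return {start - start % interval: count}

def simple_uniform(count, start, end, interval=INTERVAL):
    """Method 4, Simple Uniform Distribution: the counter is divided
    evenly among all intervals the Flow covers."""
    first = start - start % interval
    last = end - end % interval
    covered = range(first, last + interval, interval)
    return {b: count / len(covered) for b in covered}

def proportional_uniform(count, start, end, interval=INTERVAL):
    """Method 5, Proportional Uniform Distribution: a mean count rate
    is multiplied by the overlap of the Flow with each interval."""
    rate = count / (end - start)
    first = start - start % interval
    last = end - end % interval
    out = {}
    for b in range(first, last + interval, interval):
        overlap = min(end, b + interval) - max(start, b)
        out[b] = rate * overlap
    return out

# A 600-packet Flow active from t=30 to t=150 over 60s intervals;
# each method conserves the total, but splits it differently:
print(start_interval(600, 30, 150))       # {0: 600}
print(simple_uniform(600, 30, 150))       # {0: 200.0, 60: 200.0, 120: 200.0}
print(proportional_uniform(600, 30, 150)) # {0: 150.0, 60: 300.0, 120: 150.0}
```

Note that Proportional Uniform Distribution accounts for the fractional coverage of the first and last intervals, so interior intervals receive proportionally more of the counter than under Simple Uniform Distribution.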
<section title="Aggregate Counter Distribution Options Template">
<t>This Options Template defines the Aggregate Counter
Distribution Record, which allows the binding of a value
distribution method to a Template ID. This is used to signal
to the Collecting Process how the counters were distributed.
The fields are as below:
<texttable>
<ttcol align="left">IE</ttcol>
<ttcol align="left">Description</ttcol>
<c>templateId [scope]</c>
<c>
The Template ID of the Template defining the Aggregated
Flows to which this distribution option applies. This
Information Element MUST be defined as a Scope Field.
</c>
<c>valueDistributionMethod</c>
<c>
The method used to distribute the counters for the
Aggregated Flows defined by the associated Template.
</c>
</texttable></t>
</section>
<section title="valueDistributionMethod Information Element" anchor="ie-errmag">
<list style="hanging">
<t hangText="Description: ">
A description of the method used to distribute the
counters from contributing Flows into the Aggregated
Flow records described by an associated Template. The
method is deemed to apply to all the non-key
Information Elements in the referenced Template for
which value distribution is a valid operation; if the
originalFlowsInitiated and/or originalFlowsCompleted
Information Elements appear in the Template, they are
not subject to this distribution method, as each
implies its own distribution method. The distribution
methods are taken from <xref target='sec-distro'/> and
encoded as follows:
<texttable>
<ttcol align="left">Value</ttcol>
<ttcol align="left">Description</ttcol>
<c>1</c><c>Start Interval: The counters for an
original Flow are added to the counters of the
appropriate Aggregated Flow containing the start time
of the original Flow. This method should be assumed
to be the default if value distribution information is not
available at a Collecting Process for an Aggregated
Flow.</c>
<c>2</c><c>End Interval: The counters for an original
Flow are added to the counters of the appropriate
Aggregated Flow containing the end time of the
original Flow.</c>
<c>3</c><c>Mid Interval: The counters for an original
Flow are added to the counters of a single appropriate
Aggregated Flow containing some timestamp between
start and end time of the original Flow.</c>
<c>4</c><c>Simple Uniform Distribution: Each counter
for an original Flow is divided by the number of time
intervals the original Flow covers (i.e., of
appropriate Aggregated Flows sharing the same Flow
Key), and this number is added to each corresponding
counter in each Aggregated Flow.</c>
<c>5</c><c>Proportional Uniform Distribution: Each
counter for an original Flow is divided by the number
of time _units_ the original Flow covers, to derive a
mean count rate. This mean count rate is then
multiplied by the number of time units in the
intersection of the duration of the original Flow and
the time interval of each Aggregated Flow. This is
like simple uniform distribution, but accounts for the
fractional portions of a time interval covered by an
original Flow in the first and last time interval.</c>
<c>6</c><c>Simulated Process: Each counter of the
original Flow is distributed among the intervals of
the Aggregated Flows according to some function the
Aggregation Process uses based upon properties of
Flows presumed to be like the original Flow. This is
essentially an assertion that the Aggregation Process
has no direct packet timing information but is
nevertheless not using one of the other simpler
distribution methods. The Aggregation Process
specifically makes no assertion as to the correctness
of the simulation.</c>
<c>7</c><c>Direct: The Aggregation Process has access
to the original packet timings from the packets making
up the original Flow, and uses these to distribute or
recalculate the counters.</c>
</texttable>
</t>
<t hangText="Abstract Data Type: ">unsigned8</t>
<t hangText="ElementId: ">TBD5</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
</section>
<!--
<section title="Error Magnitude Export" anchor="sec-ex-error">
<section title="errorMagnitude Information Element" anchor="ie-errmag">
<list style="hanging">
<t hangText="Description: ">
The approximate magnitude of the error induced by the
Intermediate Aggregation Process in the values of the
associated Information Element, as a proportion of the
range of values covered by the Information Element;
SHOULD be associated with an informationElementId and
optional templateId as scope in an Options
Template.</t>
<t hangText="Abstract Data Type: ">float64</t>
<t hangText="ElementId: ">TBD6</t>
<t hangText="Status: ">Proposed</t>
</list>
</section>
<section title="Error Magnitude Options Template">
<t>[TODO: define options template]</t>
</section>
</section>
-->
</section>
<section title="Examples">
<t>[TODO]</t>
</section>
<section title="Security Considerations">
<t>[TODO]</t>
</section>
<section title="IANA Considerations">
<t>[TODO: add all IEs defined in Section 6.]</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.5101" ?>
<?rfc include="reference.RFC.5102" ?>
</references>
<references title="Informative References">
<?rfc include="reference.RFC.5103" ?>
<?rfc include="reference.RFC.5470" ?>
<?rfc include="reference.RFC.5472" ?>
<?rfc include="reference.RFC.5153" ?>
<?rfc include="reference.RFC.5610" ?>
<?rfc include="reference.RFC.5655" ?>
<?rfc include="reference.RFC.5982" ?>
<?rfc include="reference.RFC.3917" ?>
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.I-D.ietf-ipfix-anon" ?>
<?rfc include="reference.I-D.ietf-ipfix-mediators-framework" ?>
<?rfc include="reference.I-D.claise-ipfix-mediation-protocol" ?>
</references>
</back>
</rfc>