<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="std" docName="draft-trammell-ipfix-a9n-01.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>

<front>
  <title abbrev="IPFIX Aggregation">
    Exporting Aggregated Flow Data using the IP Flow Information Export (IPFIX) Protocol 
  </title>
  <author initials="B." surname="Trammell" fullname="Brian Trammell">
    <organization abbrev="ETH Zurich">
      Swiss Federal Institute of Technology Zurich 
    </organization>
    <address>
      <postal>
        <street>Gloriastrasse 35</street>
        <city>8092 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <phone>+41 44 632 70 13</phone>
      <email>trammell@tik.ee.ethz.ch</email>
    </address>
  </author>
  <author initials="E." surname="Boschi" fullname="Elisa Boschi">
    <organization abbrev="ETH Zurich">
      Swiss Federal Institute of Technology Zurich 
    </organization>
    <address>
      <postal>
        <street>Gloriastrasse 35</street>
        <city>8092 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <email>boschie@tik.ee.ethz.ch</email>
    </address>
  </author>
  <author initials="A." surname="Wagner" fullname="Arno Wagner">
    <organization abbrev="Consecom AG">
      Consecom AG 
    </organization>
    <address>
      <postal>
        <street>Bellariastrasse 12</street>
        <city>8002 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <email>arno@wagner.name</email>
    </address>
  </author>
  <date month="October" day="25" year="2010"></date>
  <area>Operations</area>
  <workgroup>IPFIX Working Group</workgroup>
  <abstract> 

    <t>This document describes the export of aggregated Flow information using
    IPFIX. An Aggregated Flow is essentially an IPFIX Flow representing
    packets from zero or more original Flows, within an externally imposed
    time interval. The document describes Aggregated Flow export within the
    framework of IPFIX Mediators and defines an interoperable,
    implementation-independent method for Aggregated Flow export.</t>

  </abstract>
</front>

<middle>

    <section title="Introduction">

        <t>The aggregation of packet data into flows serves a variety of
        different purposes, as noted in the <xref
        target="RFC3917">requirements</xref> and <xref
        target="RFC5472">applicability statement</xref> for the IP Flow
        Information Export (IPFIX) <xref target="RFC5101">protocol</xref>.
        Aggregation beyond the flow level, into records representing multiple
        Flows, is a common analysis and data reduction technique as well, with
        applicability to large-scale network data analysis, archiving, and
        inter-organization exchange. This applicability in large-scale
        situations, in particular, led to the inclusion of aggregation as part
        of the <xref target="RFC5982">IPFIX Mediators Problem
        Statement</xref>, and the definition of an Intermediate Aggregation
        Process in the <xref
        target="I-D.ietf-ipfix-mediators-framework">Mediator
        framework</xref>.</t>

        <t>The Mediator framework offered an initial but non-exhaustive
        treatment of the topic of aggregation. This document expands on the
        definitions presented there, providing an implementation-neutral,
        interoperable specification of an Intermediate Aggregation Process
        which can operate within the Mediator framework or independently
        thereof.</t>

        <t>Aggregation is part of a wide variety of applications, including
        traffic matrix calculation, generation of time series data for
        visualizations or anomaly detection, and data reduction. Depending on
        the keys used for aggregation, it may have an anonymising effect on
        the data. Aggregation can take place at one of any number of locations
        within a measurement infrastructure. Exporters may export aggregated
        Flow information simply as normal flow information, by performing
        aggregation after metering but before export. IPFIX Mediators are
        particularly well suited to performing aggregation, as they can
        collect information from multiple original exporters at geographically
        and topologically distinct observation points.</t>

        <t>Aggregation as defined and described in this document covers a
        superset of the applications defined in <xref target="RFC5982"/>,
        including 5.1 "Adjusting Flow Granularity" (herein referred to as Key
        Aggregation), 5.4 "Time Composition" (herein referred to as Interval
        Combination), and 5.5 "Spatial Composition".</t>

        <t>Note that an Intermediate Aggregation Process may be applied to
        data collected from multiple Observation Points, as aggregation is
        natural to apply for data reduction when concentrating measurement
        data. This document specifically does not address the architectural
        and protocol issues that arise when combining IPFIX data from multiple
        Observation Points and exporting from a single Mediator, as these
        issues are common to Mediation in general. These are treated in
        detail in the <xref
        target="I-D.claise-ipfix-mediation-protocol">Mediator Protocol</xref>
        document.</t>

        <t>Since aggregated flows as defined in the following section are
        essentially Flows, IPFIX can be used to export <xref
        target="RFC5101"/> and store <xref target="RFC5655"/> aggregated data
        "as-is"; there are no changes necessary to the protocol. However, this
        document further provides a common basis for the application of IPFIX
        to the handling of aggregated data, through a detailed terminology,
        model of aggregation operations, methods for original Flow counting
        and counter distribution across time intervals, and an aggregation
        metadata representation based upon IPFIX Options.</t>

        <section title="Rationale and Scope">

            <t>This specification of Aggregated Flow export has
            interoperability and implementation-independence as its two key
            goals. First, export of Aggregated Flows using the techniques
            described in this document will result in Flow data which can be
            collected by Collecting Processes and read by File Readers which
            do not provide any special support for Aggregated Flow export. An
            Aggregated Flow is simply a Flow with some additional conditions
            as to how it is derived.</t>

            <t>Second, in <xref target="sec-arch"/>, we specify aggregation in
            an implementation-independent way. While we must describe the
            aggregation process in terms of operations due to the
            interdependencies among them, these operations, like the stages in
            the <xref target="RFC5470">IPFIX Architecture</xref>, are meant to
            be descriptive as opposed to prescriptive. We specify the flow
            aggregation process as an intermediate process within the <xref
            target="I-D.ietf-ipfix-mediators-framework">IPFIX Mediator
            framework</xref>, and specify a variety of different architectural
            arrangements for flow aggregation. When exporting
            aggregation-relevant metadata, we seek to define properties of the
            set of exported Aggregated Flows, as opposed to the properties of
            the specific algorithms used to aggregate these Flows.
            Specifically out of scope for this effort is any definition of a
            language for defining aggregation operations, or the configuration
            parameters of Aggregation Processes, as these are necessarily
            implementation dependent.</t>

            <t>From the definition presented below in <xref
            target="sec-terminology"/>, an Aggregated Flow is a Flow as in
            <xref target="RFC5101"/>, with additional conditions as to the
            packets making up the Flow. Practically speaking, Aggregated Flows
            are derived from original Flows, as opposed to a raw packet
            stream. Key to this definition of Aggregated Flow is how timing
            affects the process of aggregation, as for the most part flow
            aggregation takes place within some set of time intervals, which
            are usually regular and externally imposed, or derived from the
            flows themselves. Aggregation operations concerning keys, which
            are often called "spatial aggregation" in the literature, will
            necessarily impact and be impacted by these time intervals;
            aggregation operations concerning these time intervals are often
            called "temporal aggregation" in the literature. Prior definitions
            of aggregation attempt to treat temporal and spatial aggregation
            separately; this document recognizes that this is not possible due
            to the interdependencies between flows and their time intervals,
            and defines these operations as interdependent.</t>

        </section>

<!--
        <section title="Related IPFIX Documents"/>

            <t>[EDITOR'S NOTE: TODO roadmap goes here]</t>

        </section>
-->

    </section>

    <section title="Terminology" anchor="sec-terminology">

        <t>Terms used in this document that are defined in the Terminology
        section of the <xref target="RFC5101">IPFIX Protocol</xref> document
        are to be interpreted as defined there.</t>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119"/>.</t>

        <t>In addition, this document defines the following terms:</t>
        
        <list style="hanging">

            <t hangText="Aggregated Flow: ">A Flow, as defined by <xref
            target="RFC5101"/>, derived from a set of zero or more original
            Flows within a defined time interval. The two primary differences
            between a Flow and an Aggregated Flow are (1) that the time
            interval of a Flow is generally derived from information about the
            timing of the packets comprising the Flow, while the time interval
            of an Aggregated Flow is generally externally imposed; and (2)
            that an Aggregated Flow may represent zero packets (i.e., an
            assertion that no packets were seen for a given Flow Key in a
            given time interval).</t>

            <t hangText="(Intermediate) Aggregation Function: ">A mapping from
            a set of zero or more original Flows into a set of Aggregated
            Flows across one or more time intervals.</t>
            
             <t hangText="(Intermediate) Aggregation Process: ">An
            Intermediate Process, as in <xref
            target="I-D.ietf-ipfix-mediators-framework"/>, hosting an
            Intermediate Aggregation Function. Note that this definition,
            together with that given above, updates the definition given in
            <xref target="I-D.ietf-ipfix-mediators-framework"/> to account for
            the more precise definition of Aggregated Flow given herein. An
            Aggregation Process need not be intermediate; that is, while
            Aggregation Processes will often be deployed within a Mediator,
            this is not necessarily the case.</t>

            <t hangText="Aggregation Interval: ">A time interval imposed upon
            an Aggregated Flow. Aggregation Functions may use a regular
            Aggregation Interval (e.g. "every five minutes", "every calendar
            month"), though regularity is not necessary. Aggregation intervals
            may also be derived from the time intervals of the flows being
            aggregated.</t>

            <!-- <t hangText="Interval Distribution: ">A temporal aggregation
            operation which imposes a new time interval on an original Flow,
            an Aggregated Flow produced by some other operation, or a set
            thereof. Interval Distribution is a many-to-many operation: it may
            result in the values from an original Flow appearing in multiple
            Aggregated Flows as well as in multiple original Flows
            contributing to each imposed time interval.</t>

            <t hangText="Interval Combination: ">A temporal aggregation
            operation which combines temporally adjacent original Flows with
            matching Flow Keys, expanding the interval of the combined Flow to
            cover the entire interval covered by the set of original
            Flows.</t>

            <t hangText="Key Aggregation: ">A spatial aggregation operation
            that generates new Aggregated Flows from original Flows by
            modifying the Flow Key. Key Aggregation is usually applied in
            combination with an Interval Distribution operation.</t>
-->

            <t hangText="original Flow: ">A Flow given as input to an
            Aggregation Function in order to generate Aggregated Flows.</t>

            <t hangText="contributing Flow: ">An original Flow that is
            partially or completely represented within an Aggregated Flow.
            Each Aggregated Flow is made up of zero or more contributing
            Flows, and an original Flow may contribute to zero or more
            Aggregated Flows.</t>

        </list>

    </section>

    <section title="Use Cases for IPFIX Aggregation" anchor="sec-usecase">

        <t>Aggregation, as a common data analysis method, has many
        applications. When used with a regular Aggregation Interval, it
        generates time series data from a collection of flows with discrete
        intervals. Time series data is itself useful for a wide variety of
        analysis tasks, such as generating parameters for network anomaly
        detection systems, or driving visualizations of volume per time for
        traffic with specific characteristics. Traffic matrix calculation from
        flow data is inherently an aggregation action, in which the flow key
        is aggregated down to interface, address prefix, or autonomous
        system.</t>

        <t>Irregular or data-dependent Aggregation Intervals and Key
        Aggregation operations can also be used to provide adaptive
        aggregation of network flow data, providing a lower-resolution view
        (i.e. more aggregation) on data deemed "less interesting" to a given
        application, while allowing higher resolution (i.e. less or no
        aggregation) for data of interest. For example, in a Mediator equipped
        with traffic classification capabilities for security purposes,
        potentially malicious flows could be exported directly, while
        known-good or probably-good flows (e.g. normal web browsing) could be
        exported simply as time series volumes per web server.</t>
        
         <t>Note that an aggregation operation which removes potentially
        sensitive information as identified in <xref
        target="I-D.ietf-ipfix-anon"/> may tend to have an anonymising effect
        on the Aggregated Flows, as well; however, any application of
        aggregation as part of a data protection scheme should ensure that all
        the issues raised in Section 4 of <xref target="I-D.ietf-ipfix-anon"/>
        are addressed.</t>
        
         </section>

     <section title="Aggregation of IP Flows">

        <t>As stated in <xref target="sec-terminology"/>, an Aggregated Flow
        is simply an IPFIX Flow generated from original Flows by an
        Aggregation Function. Here, we discuss temporal and spatial aspects of
        aggregation, present a general model for aggregation, and elaborate
        and provide examples of specific aggregation operations that may be
        performed by the Aggregation Process; we use this to define the export
        of Aggregated Flows in <xref target="sec-export"/>.</t>

        <section title="A note on temporal and spatial aggregation">

            <t>In general, aggregation of data records bearing time
            information can take place in time (by grouping the original
            records by time) or in space (by grouping the original records by
            some other dimension; in the case of IP Flows, this would
            generally be a flow key).</t>

            <t>Temporal aggregation is treated in <xref
            target="I-D.ietf-ipfix-mediators-framework"/> in section 5.3.2.3,
            as "[m]erging a set of Data Records within a certain time period
            into one Flow Record by summing up the counters where
            appropriate," as well as in the definition of "temporal
            composition", wherein "multiple consecutive Flow Records with
            identical Flow Key values are merged into a single Flow Record of
            longer Flow duration if they arrive within a certain time
            interval."</t>

            <t>Spatial aggregation is treated in <xref
            target="I-D.ietf-ipfix-mediators-framework"/> in section 5.3.2.3,
            as "spatial composition", wherein "Data Records sharing common
            properties are merged into one Flow Record within a certain time
            period." Even this definition hints at the problem in attempting
            to treat temporal and spatial aggregation of IP flow data
            orthogonally.</t>

            <t>The issue arises because an IP Flow, as defined in <xref
            target="RFC5101"/>, has three types of properties: flow keys,
            which "define" the properties common to all packets in the Flow;
            flow values or non-key fields, which describe the Flow itself; and
            the time interval of the Flow. The keys and time interval serve to
            uniquely identify the Flow. When spatially aggregating Flows,
            these Flows bring their time intervals along with them. The time
            intervals of the spatially aggregated Flows must either be
            combined through union, or externally imposed by splitting the
            original Flow across one or more intervals.</t>

            <t>To address this subtle interdependency, it is more useful to
            view an Aggregation Function in terms of the temporal operations
            of the function, called "interval distribution" herein; and the
            spatial operations of the function, called "key aggregation"
            herein; this follows in the general model presented in the
            following subsection.</t>

        </section>

        <section title="A general operational model for IP Flow aggregation">

            <t>An Intermediate Aggregation Process consumes original Flows and
            exports Aggregated Flows, as defined in <xref
            target="sec-terminology"/>. While this document does not define an
            implementation of an Intermediate Aggregation Process further than
            this, or the Aggregation Functions that it applies, it can be
            helpful to partially decompose this function into a set of common
            operations, in order to more fully examine the effects these
            operations have.</t>

            <t>Aggregation is composed of three general types of operations on
            original Flows: those that externally impose a time interval,
            called here the Aggregation Interval; those that derive a new Flow
            Key for the Aggregated Flows from the original Flow information;
            and those that aggregate and distribute the resulting non-Flow Key
            fields accordingly. Most aggregation functions will perform each
            of these types of operations.</t>

            <t>Interval distribution is the external imposition of a time
            interval onto an original Flow. Note that this may lead to an
            original Flow contributing to multiple aggregated Flows, if the
            original Flow's time interval crosses at least one boundary
            between Aggregation Intervals. Interval Distribution is described
            in more detail in <xref target="sec-intdist"/>.</t>

            <t>Key aggregation, the derivation of Flow Keys for Aggregated
            Flows from original Flow information, is made up of two
            operations: reduction and replacement. Reduction removes
            Information Elements from the original Flow Key, or otherwise
            constrains the space of values in the Flow Key (e.g., by replacing
            IP addresses with /24 CIDR blocks). In replacement, Information
            Elements derived from fields in the original Flow itself may be
            added to the Flow Key. Both of these modifications may result in
            multiple original Flows contributing to the same Aggregated Flow.
            Key Aggregation is described in more detail in <xref
            target="sec-keyagg"/>.</t>

            <t>Interval distribution and key aggregation together may generate
            multiple intermediate aggregated Flows covering the same time
            interval with the same Flow Key; these intermediate Flows must
            therefore be combined into Aggregated Flows. Non-key values are
            first distributed among the Aggregated Flows to which an original
            Flow contributes according to some distribution algorithm (see
            <xref target="sec-distro"/>), and values from multiple
            contributing Flows are combined using the same operation by which
            values are combined from packets to form Flows for each
            Information Element: in general, counters are added, averages are
            averaged, flags are unioned, and so on. Key aggregation may also
            introduce new non-key fields, e.g. per-flow average counters, or
            distinct counters for key fields reduced out of the Aggregated
            Flow.</t>

            <t>As a result of this final combination and distribution, an
            Aggregation Function produces at most one Aggregated Flow
            resulting from a set of original Flows for a given Aggregated Flow
            Key and Aggregation Interval.</t>

            <t>This general model is illustrated in the figure below. Note
            that within an implementation, these steps may occur in any order,
            and indeed be combined together in any way.</t>

            <figure title="Conceptual model of aggregation operations" anchor="iaf-operations"><artwork><![CDATA[
        
                    +-----------------------+
                 +->| Interval distribution |-+
                 |  +-----------------------+ |
                 |            ^  (partially   |
                 |            |   aggregated  |
                 |            V     flows)    |
                 |  +-----------------+       |
 original Flows -+->| Key aggregation |----+  |
                    +-----------------+    |  |
                                           V  V
                            +--------------------+
                            |  Combination of    |
                            | contributing Flows |
                            +--------------------+
                                      |
                                      V    
                           +----------------------+
                           | Counter Distribution |
                           +----------------------+
                                      |
                                      V
                               Aggregated Flows
            ]]></artwork></figure>
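
            <t>The conceptual model above can be sketched in a few lines of
            Python. This is an illustrative sketch only, not a normative
            algorithm: the record layout (start, end, key, octets), the
            function name aggregate, the key_fn parameter, and the
            300-second interval are all assumptions made for the example,
            and counters are simply accounted to the interval containing the
            flow start time.</t>

            <figure><artwork><![CDATA[
```python
from collections import defaultdict

INTERVAL = 300  # hypothetical regular Aggregation Interval, in seconds


def aggregate(flows, key_fn, interval=INTERVAL):
    """Aggregate (start, end, key, octets) tuples into one record per
    (interval start, aggregated key).

    Assumption: each flow's counter is accounted to the interval that
    contains the flow start time; other policies are possible.
    """
    totals = defaultdict(int)
    for start, end, key, octets in flows:
        bucket = start - (start % interval)   # interval distribution
        new_key = key_fn(key)                 # key aggregation
        totals[(bucket, new_key)] += octets   # combination by summation
    return dict(totals)


# Hypothetical original Flows keyed by (source, destination, port).
flows = [
    (10, 20, ("a", "x", 80), 100),
    (30, 40, ("a", "y", 443), 50),
    (310, 320, ("a", "x", 80), 7),
]
by_source = aggregate(flows, lambda key: key[0])
```
            ]]></artwork></figure>

            <t>In this sketch, at most one output record exists for each
            combination of Aggregation Interval and aggregated Flow Key,
            matching the property stated above.</t>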
        </section>

        <section title="Interval Distribution" anchor="sec-intdist">

            <t>Interval Distribution imposes a time interval on the resulting
            Aggregated Flows. The selection of an interval is a matter for the
            specific aggregation application. Intervals may be derived from
            the flows themselves (e.g., an interval may be selected to cover
            the entire interval containing the set of all flows sharing a
            given Key) or externally imposed; in the latter case the
            externally imposed interval may be regular (e.g., every five
            minutes) or irregular (e.g., to allow for different time
            resolutions at different times of day, under different network
            conditions, or indeed for different sets of original Flows).</t>

            <t>The length of the imposed interval itself has tradeoffs.
            Shorter intervals allow higher resolution
            aggregated data and, in streaming applications, faster reaction
            time. Longer intervals lead to greater data reduction and
            simplified counter distribution. Specifically, counter
            distribution is greatly simplified by the choice of an interval
            longer than the duration of the longest original Flow, itself
            generally determined by the original Flow's Metering Process
            active timeout; in this case an original Flow can contribute to at
            most two Aggregated Flows, and the more complex value distribution
            methods become inapplicable.</t>
            
            <figure title="Illustration of interval distribution" anchor="intdist-fig">
                <artwork><![CDATA[
|                |                |                |
| |<--flow A-->| |                |                |
|        |<--flow B-->|           |                |
|          |<-------------flow C-------------->|   |
|                |                |                |
|   interval 0   |   interval 1   |   interval 2   |
                ]]></artwork>
            </figure>

            <t>In <xref target="intdist-fig"/>, we illustrate three common
            possibilities for interval distribution as applied with regular
            intervals to a set of three original Flows. For flow A, the start
            and end times lie within the boundaries of a single interval,
            interval 0; therefore, flow A contributes to only one Aggregated
            Flow. Flow B, by contrast, has the same duration but crosses the
            boundary between intervals 0 and 1; therefore, it will contribute
            to two Aggregated Flows, and its counters must be distributed
            among these flows. In the two-interval case this can be
            simplified by picking one of the two intervals, or by
            distributing the counters proportionally between them. Only
            flows like flow A and flow B will be produced when the interval
            is chosen to be longer than the duration of the longest original
            Flow, as above. More complicated is the case of flow C, which
            contributes to more than two Aggregated Flows, and must have its
            counters distributed according to some policy as in <xref
            target="sec-distro"/>.</t>
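
            <t>As a minimal sketch of how the set of intervals touched by an
            original Flow might be computed for regular intervals, consider
            the following. The function name intervals_touched, the integer
            timestamps, and the interval length of 100 are assumptions made
            for illustration; the three sample flows only roughly mirror
            flows A, B and C in <xref target="intdist-fig"/>.</t>

            <figure><artwork><![CDATA[
```python
def intervals_touched(start, end, interval):
    """Return the indices of the regular intervals a flow overlaps.

    Assumption: interval n covers [n * interval, (n + 1) * interval),
    and the flow end time is inclusive.
    """
    first = start // interval
    last = end // interval
    return list(range(first, last + 1))


# Hypothetical (start, end) times with an interval length of 100.
flow_a = intervals_touched(5, 80, 100)     # inside one interval
flow_b = intervals_touched(50, 130, 100)   # crosses one boundary
flow_c = intervals_touched(60, 280, 100)   # crosses two boundaries
```
            ]]></artwork></figure>

            <t>A flow whose result contains more than two indices is in the
            situation of flow C, and needs a counter distribution
            policy.</t>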

        </section>
        
        <section title="Key Aggregation" anchor="sec-keyagg">

            <t>Key Aggregation generates a new Flow Key for the Aggregated
            Flows from the original Flow Keys, non-Key fields in the original
            Flows, or from correlation of the original Flow information with
            some external source. There are two basic operations here. First,
            Aggregated Flow Keys may be derived directly from original Flow
            Keys through reduction, or the dropping of fields or precision in
            the original Flow Keys. Second, an Aggregated Flow Key may be
            derived through replacement, e.g. by removing one or more fields
            from the original Flow and replacing them with fields derived
            from the removed fields. Replacement may refer to external
            information (e.g., IP to AS number mappings). Replacement need not
            replace only key fields; for example, an application aggregating
            byte counts per flow size in packets would promote the packet
            count to a Flow Key field.</t>

            <t>Key aggregation may also result in the addition of new non-Key
            fields to the Aggregated Flows, namely original Flow counters and
            unique reduced key counters; these are treated in more detail in
            <xref target="sec-flowcount"/> and <xref target="sec-distinct"/>,
            respectively.</t>

            <t>In any Key Aggregation operation, reduction and/or replacement
            may be applied any number of times in any order. Which of these
            operations are supported by a given implementation is
            implementation- and application-dependent. Key Aggregation may
            aggregate original Flows with different sets of Flow Key fields;
            only the Flow Keys of the resulting Aggregated Flows of any given
            Key Aggregation operation need contain the same set of fields.</t>

            <figure title="Illustration of key aggregation by reduction" anchor="keyagg-simple-fig">
                <artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
     |         |         |          |         |      |
  retain   mask /24      X          X         X      X
     V         V
+---------+-------------+
| src ip4 | dst ip4 /24 |
+---------+-------------+
Aggregated Flow Key (by source address and destination class-C)
                ]]></artwork>
            </figure>

            <t><xref target="keyagg-simple-fig"/> illustrates an example
            reduction operation, aggregation by source address and
            destination class C network. Here, the port, protocol, and
            type-of-service information is removed from the flow key, the
            source address is retained, and the destination address is masked
            by dropping the low 8 bits.</t>
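
            <t>The reduction in <xref target="keyagg-simple-fig"/> could be
            sketched as follows using the Python standard library. The tuple
            layout of the Flow Key and the function name reduce_key are
            assumptions made for this example; the addresses are taken from
            documentation ranges.</t>

            <figure><artwork><![CDATA[
```python
import ipaddress


def reduce_key(key):
    """Reduce a six-field Flow Key to (source address, destination /24).

    `key` is a hypothetical tuple laid out as
    (src_ip4, dst_ip4, src_port, dst_port, proto, tos).
    """
    src, dst = key[0], key[1]
    # strict=False masks off the low 8 host bits instead of raising.
    dst_net = ipaddress.ip_network(f"{dst}/24", strict=False)
    # Ports, protocol and type of service are dropped from the key.
    return (src, str(dst_net))


original = ("198.51.100.9", "192.0.2.77", 51234, 80, 6, 0)
reduced = reduce_key(original)
```
            ]]></artwork></figure>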

            <figure title="Illustration of key aggregation by reduction and replacement" anchor="keyagg-replace-fig">
                <artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
     |         |         |          |         |      |
+-------------------+    X          X         X      X
| ASN lookup table  |
+-------------------+
     V         V
+---------+---------+
| src asn | dst asn |
+---------+---------+
Aggregated Flow Key (by source and dest ASN)
                ]]></artwork>
            </figure>

            <t><xref target="keyagg-replace-fig"/> illustrates an example
            reduction and replacement operation, aggregation by source and
            destination ASN without ASN information available in the original
            Flow. Here, the port, protocol, and type-of-service information is
            removed from the flow key, while the source and destination
            addresses are run through an IP address to ASN lookup table, and
            the Aggregated Flow key is made up of the resulting source and
            destination ASNs.</t>
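
            <t>A replacement step like the one in <xref
            target="keyagg-replace-fig"/> might look like the sketch below.
            The lookup table, its contents, and the function names
            lookup_asn and replace_key are hypothetical; a real deployment
            would populate the table from routing data, which is outside
            the scope of this example. The prefixes and AS numbers used are
            reserved for documentation.</t>

            <figure><artwork><![CDATA[
```python
import ipaddress

# Hypothetical prefix-to-ASN table built from documentation values.
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def lookup_asn(address, table=ASN_TABLE):
    """Return the ASN of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


def replace_key(key, table=ASN_TABLE):
    """Map a key starting with (src_ip4, dst_ip4) to (src_asn, dst_asn)."""
    return (lookup_asn(key[0], table), lookup_asn(key[1], table))


asn_key = replace_key(("192.0.2.1", "198.51.100.7", 51234, 80, 6, 0))
```
            ]]></artwork></figure>

            <t>Original Flows whose addresses map to the same pair of ASNs
            receive the same aggregated Flow Key and therefore contribute
            to the same Aggregated Flow.</t>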

        </section>

        <section title="Aggregating and Distributing Counters" anchor="sec-distro">

            <t>In general, counters in Aggregated Flows are treated the same
            as in any Flow. Each counter is independently calculated as if
            it were derived from the set of packets in the original flow. For
            the most part, when aggregating original Flows into Aggregated
            Flows, this is simply done by summation.</t>

            <t>When the Aggregation Interval is guaranteed to be longer than
            the longest original Flow, a Flow can cross at most one Interval
            boundary, and will therefore contribute to at most two Aggregated
            Flows. The most common approach in this case is to account the
            original Flow's counters, arbitrarily but consistently, to either
            the first or the last Aggregated Flow to which it could
            contribute.</t>

            <t>However, this becomes more complicated when the Aggregation
            Interval is shorter than the longest original Flow in the source
            data. In such cases, each original Flow can incompletely cover one
            or more time intervals, and apply to one or more Aggregated Flows;
            in this case, the Aggregation Process must distribute the counters
            in the original Flows across the multiple Aggregated Flows. There
            are several methods for doing this, listed here in roughly
            increasing order of complexity and accuracy.</t>

            <list style="hanging">

                <t hangText="End Interval: ">The counters for an original Flow
                are added to the counters of the appropriate Aggregated Flow
                containing the end time of the original Flow.</t>

                <t hangText="Start Interval: ">The counters for an original
                Flow are added to the counters of the appropriate Aggregated
                Flow containing the start time of the original Flow.</t>

                <t hangText="Mid Interval: ">The counters for an original Flow
                are added to the counters of a single appropriate Aggregated
                Flow containing some timestamp between start and end time of
                the original Flow.</t>

                <t hangText="Simple Uniform Distribution: ">Each counter for
                an original Flow is divided by the number of time intervals
                the original Flow covers (i.e., of appropriate Aggregated
                Flows sharing the same Flow Key), and this number is added to
                each corresponding counter in each Aggregated Flow.</t>

                <t hangText="Proportional Uniform Distribution: ">Each counter
                for an original Flow is divided by the number of time _units_
                the original Flow covers, to derive a mean count rate. This
                mean count rate is then multiplied by the number of time units
                in the intersection of the duration of the original Flow and
                the time interval of each Aggregated Flow. This is like simple
                uniform distribution, but accounts for the fractional portions
                of a time interval covered by an original Flow in the first
                and last time interval.</t>

                <t hangText="Simulated Process: ">Each counter of the original
                Flow is distributed among the intervals of the Aggregated
                Flows according to some function the Aggregation Process uses
                based upon properties of Flows presumed to be like the
                original Flow. For example, bulk transfer flows might follow a
                more or less proportional uniform distribution, while
                interactive processes are far more bursty.</t>

                <t hangText="Direct: ">The Aggregation Process has access to
                the original packet timings from the packets making up the
                original Flow, and uses these to distribute or recalculate the
                counters.</t>

            </list>
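
            <t>Four of the methods listed above can be sketched as follows.
            This non-normative Python fragment assumes fixed-length intervals
            aligned at time zero, with interval i covering the half-open range
            from i*length to (i+1)*length, and assumes that the Flow end time
            belongs to the interval that contains it. The function names and
            method labels are hypothetical.</t>

            <figure title="Sketch of counter distribution across intervals">
                <artwork><![CDATA[
```python
def covered_intervals(start, end, length):
    """Indices of the fixed-length intervals a Flow touches.
    Interval i covers [i*length, (i+1)*length)."""
    return range(int(start // length), int(end // length) + 1)

def distribute(count, start, end, length, method):
    """Return {interval index: share of count} for one counter."""
    idx = covered_intervals(start, end, length)
    if method == "start":
        return {idx[0]: float(count)}
    if method == "end":
        return {idx[-1]: float(count)}
    if method == "simple_uniform":
        # Equal share for every interval the Flow covers.
        return {i: count / len(idx) for i in idx}
    if method == "proportional_uniform":
        duration = end - start
        if duration <= 0:
            # Zero-length Flow: nothing to spread over time.
            return {idx[0]: float(count)}
        rate = count / duration  # mean count rate per time unit
        shares = {}
        for i in idx:
            # Time units shared by the Flow and interval i.
            overlap = min(end, (i + 1) * length) - max(start, i * length)
            shares[i] = rate * overlap
        return shares
    raise ValueError(f"unknown method: {method}")

# A Flow of 100 packets lasting from t=5 to t=25, 10-unit intervals.
for m in ("start", "end", "simple_uniform", "proportional_uniform"):
    print(m, distribute(100, 5, 25, 10, m))
```
                ]]></artwork>
            </figure>

            <t>In each case the shares sum to the original count, within
            floating-point accuracy.</t>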

            <t>A method for exporting the distribution of counters across
            multiple Aggregated Flows is detailed in <xref
            target="sec-ex-distro"/>. In any case, counters MUST be
            distributed across the multiple Aggregated Flows in such a way
            that the total count is preserved, within the limits of accuracy
            of the implementation (e.g., inaccuracy introduced by the use of
            floating-point numbers is tolerable). This property allows data to
            be aggregated and re-aggregated without any loss of original count
            information. To avoid confusion in interpretation of the
            aggregated data, all the counters for a set of given original
            Flows SHOULD be distributed via the same method.</t>

        </section>
        
        <section title="Counting Original Flows" anchor="sec-flowcount">

            <t>When aggregating multiple original Flows into an Aggregated
            Flow, it is often useful to know how many original Flows are
            present in the Aggregated Flow. This document introduces four new
            Information Elements in <xref target="sec-ex-flowcount"/> to
            export these counters.</t>
            
             <t>There are two possible ways to count original Flows, which we
            call here conservative and non-conservative. Conservative flow
            counting has the property that each original Flow contributes
            exactly one to the total flow count within a set of aggregated
            Flows. In other words, conservative flow counters are distributed
            just as any other counter, except each original Flow is assumed to
            have a flow count of one. When a count for an original Flow must
            be distributed across a set of Aggregated Flows, and a
            distribution method is used which does not account for that
            original Flow completely within a single Aggregated Flow,
            conservative flow counting requires a fractional
            representation.</t>

            <t>By contrast, non-conservative flow counting is used to count
            how many flows are represented in an Aggregated Flow. Flow
            counters are not distributed in this case. An original Flow which
            is present within N Aggregated Flows would add N to the sum of
            non-conservative flow counts, one to each Aggregated Flow. In
            other words, the sum of conservative flow counts over a set of
            Aggregated Flows is always equal to the number of original Flows,
            while the sum of non-conservative flow counts is always greater
            than or equal to the number of original Flows.</t>

            <t>For example, consider flows A, B, and C as illustrated in <xref
            target="intdist-fig"/>. Assume that the key aggregation step
            aggregates the keys of these three flows to the same aggregated
            flow key, and that start interval counter distribution is in
            effect. The conservative flow count for interval 0 is 3 (since
            flows A, B, and C all begin in this interval), and for the other
            two intervals is 0. The non-conservative flow count for interval 0
            is also 3 (due to the presence of flows A, B, and C), for interval
            1 is 2 (flows B and C), and for interval 2 is 1 (flow C). The sum
            of the conservative counts is 3 + 0 + 0 = 3, the number of
            original Flows, while the sum of the non-conservative counts is
            3 + 2 + 1 = 6.</t>
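
            <t>The arithmetic of this example can be reproduced with the
            following non-normative Python sketch. The interval membership of
            each flow is inferred from the counts stated above (A in interval
            0; B in intervals 0 and 1; C in intervals 0, 1, and 2), and the
            function names are hypothetical.</t>

            <figure title="Sketch of conservative and non-conservative flow counting">
                <artwork><![CDATA[
```python
from collections import Counter

# Intervals in which each original Flow is present, inferred from
# the example text (an assumption of this sketch).
present = {"A": [0], "B": [0, 1], "C": [0, 1, 2]}

def conservative_start(present):
    """Each Flow counts once, in the interval where it starts
    (start interval distribution)."""
    counts = Counter()
    for intervals in present.values():
        counts[min(intervals)] += 1
    return counts

def non_conservative(present):
    """Each Flow counts once in every interval where it is present."""
    counts = Counter()
    for intervals in present.values():
        counts.update(intervals)
    return counts

con = conservative_start(present)
non = non_conservative(present)
print([con[i] for i in range(3)], sum(con.values()))
print([non[i] for i in range(3)], sum(non.values()))
```
                ]]></artwork>
            </figure>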

        </section>

        <section title="Counting Distinct Key Values" anchor="sec-distinct">

            <t>One common case in aggregation is counting distinct values that
            were reduced away during key aggregation. For example, consider
            an application counting destinations contacted per host, a common
            case in host characterization or anomaly detection. Here, the
            Aggregation Process needs a way to export this distinct key count
            information.</t>

            <t>For such applications, a distinctCountOf(key name) Information
            Element should be registered with IANA to represent these cases.
            [EDITOR'S NOTE: There is an open question as to the best way to do
            this: either through the registration of Information Elements for
            common cases in this draft, the registration of Information
            Elements on demand, or the definition of a new Information Element
            space for distinct counts bound to a PEN, as in <xref
            target="RFC5103"/>.]</t>
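
            <t>As a non-normative illustration of the destinations-per-host
            case, the following Python sketch counts the distinct destination
            addresses seen for each source address. The input representation,
            a sequence of (source, destination) pairs, and the function name
            are assumptions made for this example.</t>

            <figure title="Sketch of distinct key value counting">
                <artwork><![CDATA[
```python
from collections import defaultdict

def distinct_destinations(flows):
    """Map each source address to the number of distinct
    destination addresses it contacted."""
    seen = defaultdict(set)
    for src, dst in flows:
        seen[src].add(dst)
    return {src: len(dsts) for src, dsts in seen.items()}

# Documentation addresses used as assumed sample data.
flows = [
    ("192.0.2.1", "198.51.100.1"),
    ("192.0.2.1", "198.51.100.2"),
    ("192.0.2.1", "198.51.100.1"),  # repeated destination
    ("192.0.2.2", "198.51.100.1"),
]
print(distinct_destinations(flows))
```
                ]]></artwork>
            </figure>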

        </section>
        

        <section title="Exact versus Approximate Counting during Aggregation"
        anchor="sec-lowfi">

            <t>In certain circumstances, particularly involving aggregation by
            devices with limited resources, and in situations where exact
            aggregated counts are less important than relative magnitudes
            (e.g., driving graphical displays), counter distribution during
            key aggregation may be performed by approximate counting means
            (e.g., Bloom filters). The choice to use approximate counting is
            implementation- and application-dependent.</t>

            <!-- <t>In certain cases, the magnitude of error for a given
            Information Element due to approximate counting may be known. An
            Exporting Process MAY use the Error Magnitude Options Template
            defined in <xref target="sec-ex-error"/> to export this
            information.</t> -->

        </section>

        <section title="Time Composition">

            <t>Time Composition (or interval combination), as described in
            Section 5.4 of <xref target="RFC5982"/>, is a special case of
            aggregation in which interval distribution imposes longer
            intervals on Flows with matching keys and "chained" start and end
            times, without any key reduction, in order to join long-lived
            Flows which may have been split (e.g., due to an active timeout
            shorter than the Flow). Here, no Key Aggregation is applied, and
            the Aggregation Interval is chosen on a per-Flow basis to cover
            the interval spanned by the set of aggregated Flows. This may be
            applied alone in order to normalize split Flows, or in
            combination with other aggregation functions in order to obtain
            more accurate original Flow counts.</t>
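
            <t>One possible reading of this operation is sketched below in
            Python. It is non-normative: the record layout, the compose
            function, and in particular the max_gap criterion used to decide
            whether two records are "chained" are assumptions made for
            illustration, since this document does not define that
            criterion.</t>

            <figure title="Sketch of time composition">
                <artwork><![CDATA[
```python
def compose(flows, max_gap=0):
    """Join records with equal keys whose times chain together.

    flows: iterable of (key, start, end, count) tuples.
    max_gap: hypothetical tolerance between one record's end and the
    next record's start for the two to be considered chained.
    """
    joined = []
    latest = {}  # key -> index in joined of its most recent record
    for key, start, end, count in sorted(
            flows, key=lambda f: (f[1], f[2])):
        i = latest.get(key)
        if i is not None and start - joined[i][2] <= max_gap:
            # Extend the existing record and sum its counter.
            k, s, e, c = joined[i]
            joined[i] = (k, s, max(e, end), c + count)
        else:
            latest[key] = len(joined)
            joined.append((key, start, end, count))
    return joined

records = [
    ("k1", 0, 60, 10),     # split by an active timeout at t=60
    ("k1", 60, 120, 5),    # continuation of the same Flow
    ("k2", 10, 20, 1),
    ("k1", 300, 310, 2),   # same key, but not chained
]
print(compose(records))
```
                ]]></artwork>
            </figure>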

        </section>

    </section>

    <section title="Aggregation in the IPFIX Architecture" anchor="sec-arch">

      <t>The techniques described in this document can be applied to IPFIX
      data at three stages within the collection infrastructure: on initial
      export, within a mediator, or after collection, as shown in <xref
      target="loc-fig"/>.</t>
      
      <t>[EDITOR'S NOTE: determine where this lives: in the introduction or down here? Note explicitly that an IAP may live outside a mediator. Check both these figures for parallels to mediator framework.]</t>

      <figure title="Potential Aggregation Locations" anchor="loc-fig">
        <artwork><![CDATA[
+==========================================+
| Exporting Process                        |
+==========================================+
  |                                      |
  |             (Aggregated Flow Export) |
  V                                      |
+=============================+          |
| Mediator                    |          |
+=============================+          |
  |                                      |
  | (Aggregating Mediator)               |
  V                                      V
+==========================================+
| Collecting Process                       |
+==========================================+
        |
        | (Aggregation for Storage)
        V
+--------------------+
| IPFIX File Storage |
+--------------------+
        ]]></artwork>
      </figure>

      <t>Aggregation can be applied for either intermediate or final analytic
      purposes. In certain circumstances, it may make sense to export
      Aggregated Flows from an Exporting Process, for example, if the
      Exporting Process is designed to drive a time-series visualization
      directly. Note that this case, where the Aggregation Process is
      integrated into the Metering Process, is essentially covered
      by the <xref target="RFC5470">IPFIX architecture</xref>: the Flow Keys
      used are simply a subset of those that would normally be used. A
      Metering Process in this arrangement MAY choose to simulate the
      generation of larger flows in order to generate original flow counts, if
      the application calls for compatibility with an Aggregation Process
      deployed in a separate location.</t>

      <t>Deployment of an Intermediate Aggregation Process within a <xref
      target="RFC5982">Mediator</xref> is a much more flexible arrangement.
      Here, the Mediator consumes original Flows and produces aggregated
      Flows; this arrangement is suited to any of the use cases detailed in
      <xref target="sec-usecase"/>. In a Mediator, aggregation can also be
      applied to original Flows from multiple sources, combining them into a
      single stream of Aggregated Flows. The architectural specifics of this
      arrangement are not addressed in this document, which is concerned only
      with the aggregation operation itself; see <xref
      target="I-D.claise-ipfix-mediation-protocol"/> for details.</t>

      <t>In the specific case that an Aggregation Process is employed for data
      reduction for storage purposes, it can take original Flows from a
      Collecting Process or File Reader and pass Aggregated Flows to a File
      Writer for storage.</t>
      
      <t>The data flows into and out of an Intermediate Aggregation Process
      are shown in <xref target="iap-dataflows"/>.</t>

      <figure title="Data flows through the aggregation process" anchor="iap-dataflows">
                <artwork><![CDATA[
packets --+                     +- IPFIX Messages -+
          |                     |                  |
          V                     V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
|                  | +====================+ +=============+
|                  |            |  original Flows  |
|                  |            V                  V
+ - - - - - - - - -+======================================+
|           Intermediate Aggregation Process (IAP)        |
+=========================================================+
          | Aggregated                  Aggregated |
          | Flows                            Flows |
          V                                        V
+===================+                       +=============+
| Exporting Process |                       | File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+
        ]]></artwork>
        </figure>

    </section>

    <section title="Export of Aggregated IP Flows using IPFIX" anchor="sec-export">

        <t>In general, Aggregated Flows are exported in IPFIX as any normal Flow. However, certain aspects of Aggregated Flow export benefit from additional guidelines, or from new Information Elements to represent aggregation metadata or information generated during aggregation. These are detailed in the following subsections.</t>

        <section title="Time Interval Export">

            <t>Since an Aggregated Flow is simply a Flow, the existing
            timestamp Information Elements in the IPFIX Information Model
            (e.g., flowStartMilliseconds, flowEndNanoseconds) are sufficient
            to specify the time interval for aggregation. Therefore, this
            document specifies no new aggregation-specific Information
            Elements for exporting time interval information.</t>

            <t>Each Aggregated Flow SHOULD contain both an interval start and
            interval end timestamp. If an exporter of Aggregated Flows omits
            the interval end timestamp from each Aggregated Flow, the time
            interval for Aggregated Flows within an Observation Domain and
            Transport Session MUST be regular and constant.
            However, note that this approach might lead to interoperability
            problems when exporting Aggregated Flows to non-aggregation-aware
            Collecting Processes and downstream analysis tasks; therefore, an
            Exporting Process capable of exporting only interval start
            timestamps MUST provide a configuration option to export interval
            end timestamps as well.</t>

        </section>

        <section title="Flow Count Export" anchor="sec-ex-flowcount">

          <t>The following four Information Elements are defined to count original Flows as discussed in <xref target="sec-flowcount"/>.</t>

          <section title="originalFlowsPresent Information Element" anchor="ie-noncon-flowcount">
            <list style="hanging">
              <t hangText="Description: ">

                The non-conservative count of original Flows contributing to
                this Aggregated Flow. Non-conservative counts need not sum to
                the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD1</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlowsInitiated Information Element" anchor="ie-con-flowstartcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows whose first packet is
                represented within this Aggregated Flow. Conservative counts
                must sum to the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD2</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlowsCompleted Information Element" anchor="ie-con-flowendcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows whose last packet is
                represented within this Aggregated Flow. Conservative counts
                must sum to the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD3</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlows Information Element" anchor="ie-con-flowcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows contributing to this
                Aggregated Flow; may be distributed via any of the methods
                described in <xref target="sec-distro"/>.

              </t>

              <t hangText="Abstract Data Type: ">float64</t>
              <t hangText="ElementId: ">TBD4</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

        </section>
        
        <section title="Aggregate Counter Distribution Export" anchor="sec-ex-distro">

            <t>When exporting counters distributed among Aggregated Flows, as
            described in <xref target='sec-distro'/>, the Exporting Process MAY
            export an Aggregate Counter Distribution Record for each Template
            describing Aggregated Flow records; this Options Template is
            described below. It uses the valueDistributionMethod Information
            Element, also defined below. In many cases distribution is
            simple: the counters from contributing Flows are accounted to
            the first Interval to which they contribute. This is the default
            situation, for which no Aggregate Counter Distribution Record is
            necessary; Aggregate Counter Distribution Records are only
            applicable in more exotic situations, such as using an Aggregation
            Interval smaller than the durations of original Flows.</t>

            <section title="Aggregate Counter Distribution Options Template">

                <t>This Options Template defines the Aggregate Counter
                Distribution Record, which allows the binding of a value
                distribution method to a Template ID. This is used to signal
                to the Collecting Process how the counters were distributed.
                The fields are as below:

                <texttable>
                    <ttcol align="left">IE</ttcol>
                    <ttcol align="left">Description</ttcol>
                    <c>templateId [scope]</c>
                    <c>

                      The Template ID of the Template defining the Aggregated
                      Flows to which this distribution option applies. This
                      Information Element MUST be defined as a Scope Field.

                    </c>
                    <c>valueDistributionMethod</c>
                    <c>

                        The method used to distribute the counters for the
                        Aggregated Flows defined by the associated Template.

                    </c>
                </texttable></t>
            </section>
            
            <section title="valueDistributionMethod Information Element" anchor="ie-errmag">
                <list style="hanging">
                    <t hangText="Description: ">

                        A description of the method used to distribute the
                        counters from contributing Flows into the Aggregated
                        Flow records described by an associated Template. The
                        method is deemed to apply to all the non-key
                        Information Elements in the referenced Template for
                        which value distribution is a valid operation; if the
                        originalFlowsInitiated and/or originalFlowsCompleted
                        Information Elements appear in the Template, they are
                        not subject to this distribution method, as they each
                        imply their own distribution method. The distribution
                        methods are taken from <xref target='sec-distro'/> and
                        encoded as follows:

                        <texttable>
                        <ttcol align="left">Value</ttcol>
                        <ttcol align="left">Description</ttcol>

                        <c>1</c><c>Start Interval: The counters for an
                        original Flow are added to the counters of the
                        appropriate Aggregated Flow containing the start time
                        of the original Flow. This should be assumed the
                        default if value distribution information is not
                        available at a Collecting Process for an Aggregated
                        Flow.</c>

                        <c>2</c><c>End Interval: The counters for an original
                        Flow are added to the counters of the appropriate
                        Aggregated Flow containing the end time of the
                        original Flow.</c>

                        <c>3</c><c>Mid Interval: The counters for an original
                        Flow are added to the counters of a single appropriate
                        Aggregated Flow containing some timestamp between
                        start and end time of the original Flow.</c>

                        <c>4</c><c>Simple Uniform Distribution: Each counter
                        for an original Flow is divided by the number of time
                        intervals the original Flow covers (i.e., of
                        appropriate Aggregated Flows sharing the same Flow
                        Key), and this number is added to each corresponding
                        counter in each Aggregated Flow.</c>

                        <c>5</c><c>Proportional Uniform Distribution: Each
                        counter for an original Flow is divided by the number
                        of time _units_ the original Flow covers, to derive a
                        mean count rate. This mean count rate is then
                        multiplied by the number of time units in the
                        intersection of the duration of the original Flow and
                        the time interval of each Aggregated Flow. This is
                        like simple uniform distribution, but accounts for the
                        fractional portions of a time interval covered by an
                        original Flow in the first and last time interval.</c>

                        <c>6</c><c>Simulated Process: Each counter of the
                        original Flow is distributed among the intervals of
                        the Aggregated Flows according to some function the
                        Aggregation Process uses based upon properties of
                        Flows presumed to be like the original Flow. This is
                        essentially an assertion that the Aggregation Process
                        has no direct packet timing information but is
                        nevertheless not using one of the other simpler
                        distribution methods. The Aggregation Process
                        specifically makes no assertion as to the correctness
                        of the simulation.</c>

                        <c>7</c><c>Direct: The Aggregation Process has access
                        to the original packet timings from the packets making
                        up the original Flow, and uses these to distribute or
                        recalculate the counters.</c>

                        </texttable>

                    </t> 
                    <t hangText="Abstract Data Type: ">unsigned8</t>
                    <t hangText="ElementId: ">TBD5</t> 
                    <t hangText="Status: ">Proposed</t> 
                  </list>
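
                  <t>A Collecting Process might represent these values as
                  sketched in the following non-normative Python fragment.
                  The class, member, and function names are hypothetical, as
                  is the bindings dictionary, which stands in for whatever
                  state the Collecting Process keeps from received Aggregate
                  Counter Distribution Records. The fallback to Start
                  Interval follows the default described for value 1
                  above.</t>

                  <figure title="Sketch of valueDistributionMethod handling">
                      <artwork><![CDATA[
```python
from enum import IntEnum

class ValueDistributionMethod(IntEnum):
    # Values as listed in the table above.
    START_INTERVAL = 1
    END_INTERVAL = 2
    MID_INTERVAL = 3
    SIMPLE_UNIFORM = 4
    PROPORTIONAL_UNIFORM = 5
    SIMULATED_PROCESS = 6
    DIRECT = 7

def method_for(template_id, bindings):
    """Look up the method bound to a Template ID, assuming Start
    Interval when no distribution information is available."""
    return bindings.get(template_id,
                        ValueDistributionMethod.START_INTERVAL)

def encode(method):
    """Encode the method as a single octet (unsigned8)."""
    return int(method).to_bytes(1, "big")

# Hypothetical state: Template 256 uses proportional distribution.
bindings = {256: ValueDistributionMethod(5)}
print(method_for(256, bindings).name, method_for(257, bindings).name)
```
                      ]]></artwork>
                  </figure>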
                </section>

        </section>

<!--
        <section title="Error Magnitude Export" anchor="sec-ex-error">

            <section title="errorMagnitude Information Element" anchor="ie-errmag">
                <list style="hanging">
                    <t hangText="Description: ">

                        The approximate magnitude of the error induced by the
                        Intermediate Aggregation Process in the values of the
                        associated Information Element, as a proportion of the
                        range of values covered by the Information Element;
                        SHOULD be associated with an informationElementId and
                        optional templateId as scope in an Options
                        Template.</t>

                    <t hangText="Abstract Data Type: ">float64</t>
                    <t hangText="ElementId: ">TBD6</t>
                    <t hangText="Status: ">Proposed</t>
                </list>
            </section>      

            <section title="Error Magnitude Options Template">
                <t>[TODO: define options template]</t>
            </section>

        </section>
-->

    </section>

    <section title="Examples">

        <t>[TODO]</t>

    </section>

    <section title="Security Considerations">

        <t>[TODO]</t>

    </section>

    <section title="IANA Considerations">

        <t>[TODO: add all IEs defined in Section 6.]</t>

    </section>

    <section title="Acknowledgments">

        <t>Many thanks to Benoit Claise for his thorough review of this work.
        This work is materially supported by the European Union Seventh
        Framework Programme under grant agreement 257315 (DEMONS).</t>

    </section>

		
  </middle>
  
  <back>

  <references title="Normative References">
      <?rfc include="reference.RFC.5101" ?>
      <?rfc include="reference.RFC.5102" ?>
  </references>

  <references title="Informative References">
      <?rfc include="reference.RFC.5103" ?>
      <?rfc include="reference.RFC.5470" ?>
      <?rfc include="reference.RFC.5472" ?>
      <?rfc include="reference.RFC.5153" ?>
      <?rfc include="reference.RFC.5610" ?>
      <?rfc include="reference.RFC.5655" ?>
      <?rfc include="reference.RFC.5982" ?>
      <?rfc include="reference.RFC.3917" ?>
      <?rfc include="reference.RFC.2119" ?>
      <?rfc include="reference.I-D.ietf-ipfix-anon" ?>
      <?rfc include="reference.I-D.ietf-ipfix-mediators-framework" ?>
      <?rfc include="reference.I-D.claise-ipfix-mediation-protocol" ?>
  </references>

</back>
</rfc>
