<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc ipr="trust200902" category="std" docName="draft-trammell-ipfix-a9n-00.txt">
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>

<front>
  <title abbrev="IPFIX Aggregation">
    Exporting Aggregated Flow Data using the IP Flow Information Export (IPFIX) Protocol 
  </title>
  <author initials="B." surname="Trammell" fullname="Brian Trammell">
    <organization abbrev="ETH Zurich">
      Swiss Federal Institute of Technology Zurich 
    </organization>
    <address>
      <postal>
        <street>Gloriastrasse 35</street>
        <city>8092 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <phone>+41 44 632 70 13</phone>
      <email>trammell@tik.ee.ethz.ch</email>
    </address>
  </author>
  <author initials="E." surname="Boschi" fullname="Elisa Boschi">
    <organization abbrev="ETH Zurich">
      Swiss Federal Institute of Technology Zurich 
    </organization>
    <address>
      <postal>
        <street>Gloriastrasse 35</street>
        <city>8092 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <email>boschie@tik.ee.ethz.ch</email>
    </address>
  </author>
  <author initials="A." surname="Wagner" fullname="Arno Wagner">
    <organization abbrev="Consecom AG">
      Consecom AG 
    </organization>
    <address>
      <postal>
        <street>Bellariastrasse 11</street>
        <city>8002 Zurich</city>
        <country>Switzerland</country>
      </postal>
      <email>arno@wagner.name</email>
    </address>
  </author>
  <date month="September" day="21" year="2010"></date>
  <area>Operations</area>
  <workgroup>IPFIX Working Group</workgroup>
  <abstract> 

    <t>This document describes the export of aggregated Flow information using
    IPFIX. An Aggregated Flow is essentially an IPFIX Flow representing
    packets from zero or more original Flows, within an externally imposed
    time interval. The document describes Aggregated Flow export within the
    framework of IPFIX Mediators and defines an interoperable,
    implementation-independent method for Aggregated Flow export.</t>

  </abstract>
</front>

<middle>

    <section title="Introduction">

        <t>The aggregation of packet data into flows serves a variety of
        different purposes, as noted in <xref target="RFC3917"/> and <xref
        target="RFC5472"/>. Aggregation beyond the flow level, into records
        representing multiple Flows, is a common analysis and data reduction
        technique as well, with applicability to large-scale network data
        analysis, archiving, and inter-organization exchange.</t>

        <t>Aggregation is applicable to a wide variety of situations,
        including traffic matrix calculation, generation of time series data
        for visualizations or anomaly detection, and data reduction. Depending
        on the keys used for aggregation, it may have an anonymising effect on
        the data. Aggregation can take place at one of any number of locations
        within a measurement infrastructure. Exporters may export aggregated
        Flow information simply as normal flow information, by performing
        aggregation after metering but before export. IPFIX Mediators are
        particularly well suited to performing aggregation, as they can
        collect information from multiple original exporters at geographically
        and topologically distinct observation points.</t>

        <t>Aggregation as defined and described in this document covers a
        superset of the applications defined in the <xref
        target="RFC5982">IPFIX Mediators Problem Statement</xref>, including
        5.1 "Adjusting Flow Granularity" (herein referred to as Key
        Aggregation), 5.4 "Time Composition" (herein referred to as Interval
        Combination), and 5.5 "Spatial Composition", although the
        architectural aspects of spatial composition are not addressed by this
        document.</t>

        <t>Since aggregated flows as defined in the following section are
        essentially Flows, IPFIX can be used to export <xref
        target="RFC5101"/> and store <xref target="RFC5655"/> aggregated data
        without further specification. However, this document further provides
        a common basis for the application of IPFIX to the handling of
        aggregated data, through a detailed terminology, model of aggregation
        operations, methods for original Flow counting and counter
        distribution across time intervals, and an aggregation metadata
        representation based upon IPFIX Options.</t>

        </section>

    <section title="Terminology" anchor="sec-terminology">

        <t>Terms used in this document that are defined in the Terminology
        section of the <xref target="RFC5101">IPFIX Protocol</xref> document
        are to be interpreted as defined there.</t>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119"/>.</t>

        <t>In addition, this document defines the following terms:</t>
        
        <list style="hanging">

            <t hangText="Aggregated Flow: ">A Flow, as defined by <xref
            target="RFC5101"/>, derived from a set of zero or more original
            Flows within a defined time interval. The two primary differences
            between a Flow and an Aggregated Flow are (1) that the time
            interval of a Flow is generally derived from information about the
            timing of the packets comprising the Flow, while the time interval
            of an Aggregated Flow is generally externally imposed; and (2)
            that an Aggregated Flow may represent zero packets (i.e., an
            assertion that no packets were seen for a given Flow Key in a
            given time interval).</t>

            <t hangText="(Intermediate) Aggregation Function: ">A mapping from
            a set of zero or more original Flows into a set of Aggregated
            Flows, which separates the original Flows into a set of one or
            more given time intervals.</t>
            
            <t hangText="(Intermediate) Aggregation Process: ">An Intermediate
            Process, as in <xref
            target="I-D.ietf-ipfix-mediators-framework"/>, hosting an
            Intermediate Aggregation Function.</t>

            <t hangText="Aggregation Interval: ">A time interval imposed upon
            an Aggregated Flow. Aggregation Functions commonly use a regular
            Aggregation Interval (e.g. "every five minutes", "every calendar
            month"), though regularity is not necessary.</t>
<!--
            <t hangText="Interval Distribution: ">A temporal aggregation
            operation which imposes a new time interval on an original Flow,
            an Aggregated Flow produced by some other operation, or a set
            thereof. Interval Distribution is a many-to-many operation: it may
            result in the values from an original Flow appearing in multiple
            Aggregated Flows as well as in multiple original Flows
            contributing to each imposed time interval.</t>

            <t hangText="Interval Combination: ">A temporal aggregation
            operation which combines temporally adjacent original Flows with
            matching Flow Keys, expanding the interval of the combined Flow to
            cover the entire interval covered by the set of original
            Flows.</t>

            <t hangText="Key Aggregation: ">A spatial aggregation operation
            that generates new Aggregated Flows from original Flows by
            modifying the Flow Key. Key Aggregation is usually applied in
            combination with an Interval Distribution operation.</t>
-->

            <t hangText="original Flow: ">A Flow given as input to an
            Aggregation Function in order to generate Aggregated Flows.</t>

            <t hangText="contributing Flow: ">An original Flow that is
            partially or completely represented within an Aggregated Flow.
            Each Aggregated Flow is made up of zero or more contributing
            Flows, and an original Flow may contribute to zero or more
            Aggregated Flows.</t>

        </list>

    </section>

    <section title="Requirements for Aggregation Support in IPFIX">

        <t>In defining a terminology, model, and metadata for Aggregated Flow
        export using IPFIX, we have sought to meet the following
        requirements.</t>

        <t>First, a specification of Aggregated Flow export must seek to be as
        interoperable as possible. Export of Aggregated Flows using the
        techniques described in this document will result in Flow data which
        can be collected by Collecting Processes and read by File Readers
        which do not provide any special support for Aggregated Flow
        export.</t>

        <t>Second, a specification of Aggregated Flow export must seek to be
        as implementation-independent as the IPFIX protocol itself. In <xref
        target="sec-arch"/>, we specify the flow aggregation process as an
        intermediate process within the <xref
        target="I-D.ietf-ipfix-mediators-framework">IPFIX Mediator
        framework</xref>, and specify a variety of different architectural
        arrangements for flow aggregation; these are meant to be descriptive
        as opposed to prescriptive. In metadata export, we seek to define
        properties of the set of exported Aggregated Flows, as opposed to the
        properties of the specific algorithms used to aggregate these Flows.
        Specifically out of scope for this effort is any definition of a
        language for defining aggregation operations, or of the configuration
        parameters of Aggregation Processes.</t>

        <t>From the definition presented in <xref
        target="sec-terminology"/>, an Aggregated Flow is a Flow as in <xref
        target="RFC5101"/>, with a restricted definition as to the packets
        making up the Flow. Practically speaking, Aggregated Flows are derived
        from original Flows, as opposed to a raw packet stream. Key to this
        definition of Aggregated Flow is how timing affects the process of
        aggregation, as for the most part flow aggregation takes place within
        some set of (usually regular) time intervals. Any specification for
        Aggregated Flow export must account for the special role time
        intervals play in aggregation, and the many-to-many relationship
        between Aggregated Flows and original Flows which this implies.</t>

    </section>

    <section title="Use Cases for IPFIX Aggregation" anchor="sec-usecase">

        <t>Aggregation, as a common data analysis method, has many
        applications. When used with a regular Aggregation Interval, it
        generates time series data from a collection of flows with discrete
        intervals. Time series data is itself useful for a wide variety of
        analysis tasks, such as generating parameters for network anomaly
        detection systems, or driving visualizations of volume per time for
        traffic with specific characteristics. Traffic matrix calculation from
        flow data is inherently an aggregation action, by aggregating the flow
        key down to interface, address prefix, or autonomous system.</t>

        <t>Irregular or data-dependent Aggregation Intervals and Key
        Aggregation operations can also be used to provide adaptive
        aggregation of network flow data, providing a higher-resolution view
        on data of interest (e.g., potential attacks) to an application while
        providing lower resolution to "less interesting" data (e.g., normal
        web traffic). Indeed, this multiple-resolution approach can be applied
        by a Mediator exporting unchanged original Flow data for the most
        interesting flows alongside the Aggregated Flows of varying resolution
        for the less interesting ones.</t>
        
         <t>Note that an aggregation operation which removes potentially
        sensitive information as identified in <xref
        target="I-D.ietf-ipfix-anon"/> may tend to have an anonymising effect
        on the Aggregated Flows, as well; however, any application of
        aggregation as part of a data protection scheme should ensure that all
        the issues raised in Section 4 of <xref target="I-D.ietf-ipfix-anon"/>
        are addressed.</t>
        
         </section>

     <section title="Aggregation of IP Flows">

        <t>As stated in <xref target="sec-terminology"/>, an Aggregated Flow
        is simply an IPFIX Flow generated from original Flows by an
        Aggregation Function. Here, we present a general model for
        aggregation, and elaborate and provide examples of specific
        aggregation operations that may be performed by the Aggregation
        Process; we use this to define the export of Aggregated Flows in <xref
        target="sec-export"/>.</t>

        <section title="A general model for IP Flow Aggregation">

            <t>An Intermediate Aggregation Process consumes original Flows and
            exports Aggregated Flows, as defined in <xref
            target="sec-terminology"/>. While this document does not define an
            implementation of an Intermediate Aggregation Process further than
            this, or the Aggregation Functions that it applies, it can be
            helpful to partially decompose this function into a set of common
            operations, in order to more fully examine the effects these
            operations have.</t>

            <t>Aggregation is composed of three general types of operations on
            original Flows: those that externally impose a time interval,
            called here the Aggregation Interval; those that reduce or
            otherwise modify the Flow Key; and those that aggregate and
            distribute the resulting non-Flow Key fields accordingly. Most
            aggregation functions will perform each of these types of
            operations.</t>

            <t>Interval Distribution is the external imposition of a time
            interval onto an original Flow. Note that this may lead to an
            original Flow contributing to multiple aggregated Flows, if the
            original Flow's time interval crosses at least one boundary
            between Aggregation Intervals. Interval Distribution is described
            in more detail in <xref target="sec-intdist"/>.</t>

            <t>Key aggregation, the modification of Flow Keys, may occur in
            two ways. First, the Flow Key may be projected: that is,
            Information Elements may be removed from the Flow Key, or the
            space of values in the Flow Key may be reduced. Second, derived
            Information Elements may be added to the Flow Key. Both of these
            modifications may result in multiple original Flows
            contributing to the same Aggregated Flow. Key Aggregation is
            described in more detail in <xref target="sec-keyagg"/>.</t>

            <t>Interval distribution and key aggregation together may generate
            multiple intermediate aggregated Flows covering the same time
            interval with the same Flow Key; these intermediate Flows must
            then be combined into Aggregated Flows. Non-key values are first
            distributed among the Aggregated Flows to which an original Flow
            contributes according to some distribution algorithm (see <xref
            target="sec-distro"/>), and values from multiple contributing
            Flows are combined using the same operation by which values are
            combined from packets to form Flows for each Information Element:
            in general, counters are added, averages are averaged, flags are
            unioned, and so on. Aggregation may also introduce new non-key
            fields, e.g. per-flow average counters, or distinct counters for
            key fields projected out of the Aggregated Flow.</t>

            <t>As a result of this final combination and distribution, an
            Aggregation Function produces at most one Aggregated Flow
            resulting from a set of original Flows for a given modified Flow
            Key and Aggregation Interval.</t>

            <t>This general model is illustrated in the figure below. Note
            that interval and key field steps are commutative and optional,
            and as such may occur in any order.</t>

            <figure title="Conceptual model of aggregation operations" anchor="iaf-operations"><artwork><![CDATA[
                
        original Flows
              |
              V
   +------------------------+
   |  Interval Distribution |<--- Aggregation Interval
   +------------------------+
              | (Flows with modified intervals)
              V
   +------------------------+
   |   Key Aggregation and  |<--- specification of keys
   |  Key Field replacement |
   +------------------------+
              | (Flows with modified keys/intervals)
              V (Addition of new non-key values)
   +------------------------+
   |     Combination of     |
   | contributing Flows and |
   |  Counter Distribution  |
   +------------------------+
              |
              V
        Aggregated Flows
            ]]></artwork></figure>
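
            <t>The three stages of the model above can be sketched in a few
            lines of Python. This is a minimal, non-normative illustration
            and not a specified algorithm: the function name aggregate, the
            dictionary layout used for a Flow, the octets counter, and the
            choice of start interval counter distribution are all
            assumptions made for the example.</t>

            <figure><artwork><![CDATA[
```python
from collections import defaultdict


def aggregate(flows, interval_len, key_fn):
    """Aggregate flows into {(interval index, new key): octets}."""
    result = defaultdict(int)
    for flow in flows:
        # Interval distribution (start interval method, an assumption).
        interval = int(flow["start"] // interval_len)
        # Key aggregation: project or replace the original Flow Key.
        new_key = key_fn(flow["key"])
        # Combination: counters of contributing Flows are added.
        result[(interval, new_key)] += flow["octets"]
    return dict(result)


# Hypothetical original Flows; key is (source, destination, port).
flows = [
    {"start": 10, "end": 40, "octets": 500,
     "key": ("192.0.2.1", "198.51.100.1", 80)},
    {"start": 20, "end": 90, "octets": 700,
     "key": ("192.0.2.1", "198.51.100.2", 443)},
    {"start": 310, "end": 350, "octets": 100,
     "key": ("192.0.2.1", "198.51.100.1", 80)},
]

# Aggregate by source address over 300-second intervals.
by_source = aggregate(flows, 300, lambda key: key[0])
print(by_source)
```
]]></artwork></figure>

            <t>In this sketch, the first two Flows share both the projected
            key and the interval, so they become contributing Flows of a
            single Aggregated Flow.</t>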
        </section>

        <section title="Interval Distribution" anchor="sec-intdist">

            <t>Interval Distribution generally imposes a regular interval on
            the resulting Aggregated Flows; the selection of an interval is a
            matter for the specific aggregation application, and has
            tradeoffs. Shorter intervals allow higher resolution aggregated
            data and, in streaming applications, faster reaction time. Longer
            intervals lead to greater data reduction and simplified counter
            distribution. Specifically, counter distribution is greatly
            simplified by the choice of an interval longer than the duration
            of the longest original Flow, itself generally determined by the
            original Flow's Metering Process active timeout; in this case an
            original Flow can contribute to at most two Aggregated Flows, and
            the more exotic value distribution methods become
            inapplicable.</t>

            <t>Aggregation intervals, however, need not be regular. The
            aggregation interval can be chosen, for example, based on time of
            day, or on the relative volume of the original Flows, in order to
            adapt the aggregation to the conditions on the measured
            network.</t>
            
            <figure title="Illustration of interval distribution" anchor="intdist-fig">
                <artwork><![CDATA[
|                |                |                |
| |<--flow A-->| |                |                |
|        |<--flow B-->|           |                |
|          |<-------------flow C-------------->|   |
|                |                |                |
|   interval 0   |   interval 1   |   interval 2   |
                ]]></artwork>
            </figure>

            <t>In <xref target="intdist-fig"/>, we illustrate three common
            possibilities for interval distribution. For flow A, the start and
            end times lie within the boundaries of a single interval
            (interval 0); therefore, flow A contributes to only one Aggregated
            Flow. Flow B, by contrast, has the same duration but crosses the
            boundary between intervals 0 and 1; therefore, it will contribute
            to two Aggregated Flows, and its counters must be distributed
            among these flows. In the two-interval case, this can be
            simplified by picking one of the two intervals, or by distributing
            the counters proportionally between them. Only flows like flow A
            and flow B will be produced when the interval is chosen to be
            longer than the duration of the longest original Flow, as above.
            More complicated is the case of flow C, which contributes to more
            than two Aggregated Flows, and must have its counters distributed
            according to some policy as in <xref target="sec-distro"/>.</t>
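
            <t>As a rough illustration, the set of regular Aggregation
            Intervals touched by an original Flow can be computed from its
            start and end times. The sketch below is non-normative: the
            function name covered_intervals, the half-open interval
            convention, and the flow timings (chosen only to resemble flows
            A, B, and C in the figure) are assumptions.</t>

            <figure><artwork><![CDATA[
```python
def covered_intervals(start, end, interval_len):
    """Return indices of the regular intervals a flow touches.

    Interval i is assumed to cover the half-open range
    [i * interval_len, (i + 1) * interval_len).
    """
    first = int(start // interval_len)
    last = int(end // interval_len)
    return list(range(first, last + 1))


# Hypothetical (start, end) times in seconds, 300-second intervals.
flows = {"A": (20, 250), "B": (150, 380), "C": (200, 850)}
for name, (start, end) in flows.items():
    print(name, covered_intervals(start, end, 300))
```
]]></artwork></figure>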

        </section>
        
        <section title="Key Aggregation" anchor="sec-keyagg">

            <t>Key Aggregation modifies the Flow Key of the original Flows,
            through projection, replacement, and augmentation. For example,
            consider original Flows with a flow key containing the traditional
            five-tuple of source and destination address and port, and
            transport protocol. Aggregating by host pair would project the
            Flow Key down by eliminating port and protocol fields. Aggregating
            by source /24 network would project the Flow Key down to just the
            source address, then further apply a prefix mask to the source
            address.</t>

            <t>During aggregation, new Flow Key fields may be added to
            original Flows, or Flow Key Fields may be replaced with ancillary
            values derived from the Flow. To continue the example from above,
            consider an aggregation operation for counting traffic per source
            autonomous system. Here, the Flow Key would be projected down to
            just the source address, and the source address would be replaced
            with the source AS number, looked up in a table maintained by the
            intermediate Aggregation Process.</t>

            <figure title="Illustration of key aggregation by simple masking" anchor="keyagg-simple-fig">
                <artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
     |         |         |          |         |      |
  retain   mask /24      X          X         X      X
     V         V
+---------+-------------+
| src ip4 | dst ip4 /24 |
+---------+-------------+
Aggregated Flow Key (by source address and destination class-C)
                ]]></artwork>
            </figure>

            <t><xref target="keyagg-simple-fig"/> illustrates an example
            projection operation, aggregation by source address and
            destination class C network. Here, the port, protocol, and
            type-of-service information is removed from the flow key, the
            source address is retained, and the destination address is masked
            by dropping the low 8 bits.</t>
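
            <t>A minimal sketch of this projection, using Python's standard
            ipaddress module, follows. The FlowKey tuple, its field names,
            and the function project_key are hypothetical names introduced
            for illustration only.</t>

            <figure><artwork><![CDATA[
```python
import ipaddress
from collections import namedtuple

# Hypothetical representation of the original six-field Flow Key.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip src_port dst_port proto tos")


def project_key(key, dst_prefix_len=24):
    """Retain the source address and mask the destination address."""
    # strict=False lets ip_network() clear the host bits for us.
    dst_net = ipaddress.ip_network(
        f"{key.dst_ip}/{dst_prefix_len}", strict=False)
    # Ports, protocol, and type of service are dropped from the key.
    return (key.src_ip, str(dst_net))


key = FlowKey("192.0.2.10", "198.51.100.77", 49152, 80, 6, 0)
print(project_key(key))
```
]]></artwork></figure>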

            <figure title="Illustration of key aggregation by replacement" anchor="keyagg-replace-fig">
                <artwork><![CDATA[
Original Flow Key
+---------+---------+----------+----------+-------+-----+
| src ip4 | dst ip4 | src port | dst port | proto | tos |
+---------+---------+----------+----------+-------+-----+
     |         |         |          |         |      |
+-------------------+    X          X         X      X
| ASN lookup table  |
+-------------------+
     V         V
+---------+---------+
| src asn | dst asn |
+---------+---------+
Aggregated Flow Key (by source and dest ASN)
                ]]></artwork>
            </figure>

            <t><xref target="keyagg-replace-fig"/> illustrates an example
            projection operation with a replacement function, aggregation by
            source and destination ASN without ASN information available in
            the original Flow. Here, the port, protocol, and type-of-service
            information is removed from the flow key, while the source and
            destination addresses are run through an IP address to ASN lookup
            table, and the Aggregated Flow key is made up of the resulting
            source and destination ASNs.</t>
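
            <t>The replacement step might be sketched as follows. The table
            contents, the names ASN_TABLE, lookup_asn, and asn_key, and the
            use of 0 for an address with no matching prefix are all
            assumptions for this example; a real Aggregation Process would
            maintain its table from routing data and would likely use a more
            efficient longest-prefix-match structure.</t>

            <figure><artwork><![CDATA[
```python
import ipaddress

# Hypothetical prefix-to-ASN table (documentation prefixes and ASNs).
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
}


def lookup_asn(addr):
    """Return the ASN of the longest matching prefix, or 0 if none."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ASN_TABLE if ip in net]
    if not matches:
        return 0  # assumption: 0 marks an unknown origin
    return ASN_TABLE[max(matches, key=lambda net: net.prefixlen)]


def asn_key(src_ip, dst_ip):
    """Replace both addresses in the Flow Key with their ASNs."""
    return (lookup_asn(src_ip), lookup_asn(dst_ip))


print(asn_key("192.0.2.10", "198.51.100.77"))
```
]]></artwork></figure>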

        </section>

        <section title="Aggregating and Distributing Counters" anchor="sec-distro">

            <t>In general, counters in Aggregated Flows are treated the same
            as in any Flow: on a per-Information Element basis, the counters
            are calculated as if they were derived from the set of packets in
            the original flow. For the most part, when aggregating original
            Flows into Aggregated Flows, this is simply done by summation.</t>

            <t>When the Aggregation Interval is longer or much longer than the
            longest original Flow, a Flow can cross at most one
            Interval boundary, and will therefore contribute to at most two
            Aggregated Flows. Most common in this case is to arbitrarily but
            consistently choose to account the original Flow's counters either
            to the first or the last aggregated Flow to which it could
            contribute.</t>

            <t>However, this becomes more complicated when the Aggregation
            Interval is shorter than the longest original Flow in the source
            data. In such cases, each original Flow can incompletely cover
            one or more time intervals, and apply to one or more Aggregated
            Flows; in this case, the Aggregation Process must distribute the
            counters in the original Flows across the multiple Aggregated
            Flows. There are several methods for doing this, listed here in
            increasing order of complexity and accuracy.</t>

            <list style="hanging">

                <t hangText="End Interval: ">The counters for an original Flow
                are added to the counters of the appropriate Aggregated Flow
                containing the end time of the original Flow.</t>

                <t hangText="Start Interval: ">The counters for an original
                Flow are added to the counters of the appropriate Aggregated
                Flow containing the start time of the original Flow.</t>

                <t hangText="Mid Interval: ">The counters for an original Flow
                are added to the counters of a single appropriate Aggregated
                Flow containing some timestamp between start and end time of
                the original Flow.</t>

                <t hangText="Simple Uniform Distribution: ">Each counter for
                an original Flow is divided by the number of time intervals
                the original Flow covers (i.e., of appropriate Aggregated
                Flows sharing the same Flow Key), and this number is added to
                each corresponding counter in each Aggregated Flow.</t>

                <t hangText="Proportional Uniform Distribution: ">Each counter
                for an original Flow is divided by the number of time _units_
                the original Flow covers, to derive a mean count rate. This
                mean count rate is then multiplied by the number of time units
                in the intersection of the duration of the original Flow and
                the time interval of each Aggregated Flow. This is like simple
                uniform distribution, but accounts for the fractional portions
                of a time interval covered by an original Flow in the first
                and last time interval.</t>

                <t hangText="Simulated Process: ">Each counter of the original
                Flow is distributed among the intervals of the Aggregated
                Flows according to some function the Aggregation Process uses
                based upon properties of Flows presumed to be like the
                original Flow. For example, bulk transfer flows might follow a
                more or less proportional uniform distribution, while
                interactive processes are far more bursty.</t>

                <t hangText="Direct: ">The Aggregation Process has access to
                the original packet timings from the packets making up the
                original Flow, and uses these to distribute or recalculate the
                counters.</t>

            </list>

            <t>A method for exporting the distribution of counters across
            multiple Aggregated Flows is detailed in <xref
            target="sec-ex-distro"/>. In any case, counters MUST be
            distributed across the multiple Aggregated Flows in such a way
            that the total count is preserved, within the limits of accuracy
            of the implementation (e.g., inaccuracy introduced by the use of
            floating-point numbers is tolerable). This property allows data to
            be aggregated and re-aggregated without any loss of original count
            information. To avoid confusion in interpretation of the
            aggregated data, all the counters for a set of given original
            Flows SHOULD be distributed via the same method.</t>
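
            <t>Three of the distribution methods listed above can be sketched
            as follows, together with a check that each preserves the total
            count. This is an illustrative sketch only: the function names,
            the regular interval layout, and the example flow (1200 octets
            from time 200 to time 800, with intervals of 300 time units) are
            assumptions.</t>

            <figure><artwork><![CDATA[
```python
import math


def overlaps(start, end, interval_len):
    """Yield (interval index, time units of the flow inside it)."""
    first = int(start // interval_len)
    last = int(end // interval_len)
    for i in range(first, last + 1):
        lo = max(start, i * interval_len)
        hi = min(end, (i + 1) * interval_len)
        # Skip an interval the flow only touches at its boundary,
        # unless the flow itself has zero duration.
        if hi > lo or start == end:
            yield i, hi - lo


def start_interval(count, start, end, interval_len):
    return {int(start // interval_len): count}


def simple_uniform(count, start, end, interval_len):
    parts = list(overlaps(start, end, interval_len))
    return {i: count / len(parts) for i, _ in parts}


def proportional_uniform(count, start, end, interval_len):
    if end == start:
        return start_interval(count, start, end, interval_len)
    rate = count / (end - start)  # mean count rate per time unit
    return {i: rate * units
            for i, units in overlaps(start, end, interval_len)}


for method in (start_interval, simple_uniform, proportional_uniform):
    shares = method(1200, 200, 800, 300)
    # The total count must be preserved by every method.
    assert math.isclose(sum(shares.values()), 1200)
    print(method.__name__, shares)
```
]]></artwork></figure>

            <t>In this example the flow covers 100, 300, and 200 time units of
            three consecutive intervals, so proportional uniform distribution
            assigns the counter in the ratio 1:3:2, while simple uniform
            distribution assigns one third to each.</t>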

        </section>
        
        <section title="Counting Original Flows" anchor="sec-flowcount">

            <t>When aggregating multiple original Flows into an Aggregated
            Flow, it is often useful to know how many original Flows are
            present in the Aggregated Flow. This document introduces four new
            information elements in <xref target="sec-ex-flowcount"/> to
            export these counters.</t>
            
             <t>There are two possible ways to count original Flows, which we
            call here conservative and non-conservative. Conservative flow
            counting has the property that each original Flow contributes
            exactly one to the total flow count within a set of aggregated
            Flows. In other words, conservative flow counters are distributed
            just as any other counter, except each original Flow is assumed to
            have a flow count of one. When a count for an original Flow must
            be distributed across a set of Aggregated Flows, and a
            distribution method is used which does not account for that
            original Flow completely within a single Aggregated Flow,
            conservative flow counting requires a fractional
            representation.</t>

            <t>By contrast, non-conservative flow counting is used to count
            how many flows are represented in an Aggregated Flow. Flow
            counters are not distributed in this case. An original Flow which
            is present within N Aggregated Flows would add N to the sum of
            non-conservative flow counts, one to each Aggregated Flow. In
            other words, the sum of conservative flow counts over a set of
            Aggregated Flows is always equal to the number of original Flows,
            while the sum of non-conservative flow counts is greater
            than or equal to the number of original Flows.</t>

            <t>For example, consider flows A, B, and C as illustrated in <xref
            target="intdist-fig"/>. Assume that the key aggregation step
            aggregates the keys of these three flows to the same aggregated
            flow key, and that start interval counter distribution is in
            effect. The conservative flow count for interval 0 is 3 (since
            flows A, B, and C all begin in this interval), and for the other
            two intervals is 0. The non-conservative flow count for interval 0
            is also 3 (due to the presence of flows A, B, and C), for interval
            1 is 2 (flows B and C), and for interval 2 is 1 (flow C). The sum
            of the conservative counts is 3 + 0 + 0 = 3, the number of
            original Flows, while the sum of the non-conservative counts is
            3 + 2 + 1 = 6.</t>
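
            <t>The two counts in this example can be reproduced with a short
            sketch. It is illustrative only: the interval indices assigned to
            flows A, B, and C are assumptions chosen to match the description
            above (A confined to interval 0, B spanning intervals 0 and 1, C
            spanning intervals 0 through 2), and the function names are
            hypothetical.</t>

            <figure><artwork><![CDATA[
```python
from collections import Counter

# Assumed (first_interval, last_interval) for each original Flow; these
# values are illustrative, chosen to match the example in the text.
FLOWS = {"A": (0, 0), "B": (0, 1), "C": (0, 2)}


def conservative_counts(flows, n_intervals):
    """Start interval distribution: each Flow counts once, where it begins."""
    counts = Counter(first for first, _last in flows.values())
    return [counts.get(i, 0) for i in range(n_intervals)]


def non_conservative_counts(flows, n_intervals):
    """Each Flow counts once in every interval in which it is present."""
    counts = Counter()
    for first, last in flows.values():
        for i in range(first, last + 1):
            counts[i] += 1
    return [counts.get(i, 0) for i in range(n_intervals)]


conservative = conservative_counts(FLOWS, 3)          # [3, 0, 0]
non_conservative = non_conservative_counts(FLOWS, 3)  # [3, 2, 1]
```
            ]]></artwork></figure>

            <t>Under these assumptions the conservative counts sum to 3 and
            the non-conservative counts sum to 6, matching the figures given
            above.</t>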

        </section>

        <section title="Counting Distinct Key Values" anchor="sec-distinct">

            <t>One common case in aggregation is counting distinct values that
            were projected out during key aggregation. For example, consider
            an application counting destinations contacted per host, a common
            case in host characterization or anomaly detection. Here, the
            Aggregation Process needs a way to export this distinct key count
            information.</t>

            <t>For such applications, a distinctCountOf(key name) Information
            Element should be registered with IANA to represent these cases.
            [EDITOR'S NOTE: There is an open question as to the best way to do
            this: either through the registration of Information Elements for
            common cases in this draft, the registration of Information
            Elements on demand, or the definition of a new Information Element
            space for distinct counts bound to a PEN, as in <xref
            target="RFC5103"/>.]</t>
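
            <t>As a minimal sketch of the destinations-per-host case, the
            following counts distinct destination addresses while the
            destination is projected out of the key. The sample records, the
            two-field key, and the function name are assumptions made for
            illustration; this document does not define how an Aggregation
            Process tracks distinct values internally.</t>

            <figure><artwork><![CDATA[
```python
from collections import defaultdict

# Hypothetical original Flow keys as (source, destination) address pairs.
ORIGINAL_FLOWS = [
    ("192.0.2.1", "198.51.100.7"),
    ("192.0.2.1", "198.51.100.8"),
    ("192.0.2.1", "198.51.100.7"),  # repeated destination, counted once
    ("192.0.2.2", "198.51.100.7"),
]


def distinct_destinations_per_source(flows):
    """Project out the destination while counting its distinct values."""
    seen = defaultdict(set)
    for src, dst in flows:
        seen[src].add(dst)
    return {src: len(dsts) for src, dsts in seen.items()}


distinct_counts = distinct_destinations_per_source(ORIGINAL_FLOWS)
```
            ]]></artwork></figure>

            <t>Here the aggregated key is the source address alone, and the
            value that would be exported in a distinct count Information
            Element is 2 for 192.0.2.1 and 1 for 192.0.2.2.</t>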

        </section>
        

        <section title="Exact versus Approximate Counting during Aggregation"
        anchor="sec-lowfi">

            <t>In certain circumstances, particularly involving aggregation by
            devices with limited resources, and in situations where exact
            aggregated counts are less important than relative magnitudes
            (e.g., driving graphical displays), counter distribution during key
            aggregation may be performed by approximate counting means (e.g.,
            Bloom filters).</t>
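
            <t>One possible reading of approximate counting with a Bloom
            filter is sketched here, applied to counting distinct values in
            bounded memory. The class name, the filter size, the number of
            hash functions, and the use of SHA-256 are all assumptions; this
            document does not specify any particular approximate counting
            technique.</t>

            <figure><artwork><![CDATA[
```python
import hashlib


class BloomDistinctCounter:
    """Approximate distinct counter backed by a Bloom filter.

    False positives in the filter make the estimate an undercount;
    the bit-array size and hash count below are arbitrary assumptions.
    """

    def __init__(self, n_bits=8192, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)
        self.estimate = 0

    def _positions(self, value):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, value):
        """Record value; bump the estimate if it was not seen before."""
        is_new = False
        for pos in self._positions(value):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True
                self.bits[byte] |= 1 << bit
        if is_new:
            self.estimate += 1


counter = BloomDistinctCounter()
for i in range(100):
    for _repeat in range(3):  # each value is observed three times
        counter.add(f"198.51.100.{i}")
```
            ]]></artwork></figure>

            <t>The estimate can never exceed the true number of distinct
            values (100 here), and falls below it only when a new value
            collides with bits already set. Memory use is fixed by the filter
            size regardless of how many values are observed.</t>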

            <!-- <t>In certain cases, the magnitude of error for a given
            Information Element due to approximate counting may be known. An
            Exporting Process MAY use the Error Magnitude Options Template
            defined in <xref target="sec-ex-error"/> to export this
            information.</t> -->

        </section>

        <section title="Interval Combination">

            <t>One special case of aggregation uses adaptive Aggregation
            Intervals without any projection in order to join long-lived Flows
            which may have been split (e.g., due to an active timeout shorter
            than the Flow). This is referred to as "Time Composition" in
            section 5.4 of <xref target="RFC5982"/>. Here, the Flow Key is
            unmodified, and the Aggregation Interval is chosen on a per-Flow
            basis to cover the interval spanned by the set of aggregated
            Flows. This may be applied alone in order to normalize split
            Flows, or in combination with other aggregation functions in order
            to obtain more accurate original Flow counts.</t>
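
            <t>A minimal sketch of interval combination follows. It assumes
            that records are simple tuples carrying a single octet counter,
            and that two records with the same Flow Key belong to the same
            long-lived Flow when the gap between them does not exceed a
            hypothetical max_gap_ms tolerance; neither the record layout nor
            the tolerance is defined by this document.</t>

            <figure><artwork><![CDATA[
```python
from collections import defaultdict


def combine_split_flows(records, max_gap_ms=0):
    """Join records sharing a Flow Key whose time spans touch or overlap.

    records: iterable of (flow_key, start_ms, end_ms, octets) tuples.
    max_gap_ms: hypothetical tolerance for the gap between segments.
    """
    by_key = defaultdict(list)
    for key, start, end, octets in records:
        by_key[key].append((start, end, octets))

    combined = []
    for key, segments in by_key.items():
        segments.sort()
        cur_start, cur_end, cur_octets = segments[0]
        for start, end, octets in segments[1:]:
            if start - cur_end <= max_gap_ms:
                # Continuation of the same long-lived Flow.
                cur_end = max(cur_end, end)
                cur_octets += octets
            else:
                combined.append((key, cur_start, cur_end, cur_octets))
                cur_start, cur_end, cur_octets = start, end, octets
        combined.append((key, cur_start, cur_end, cur_octets))
    return combined


SPLIT = [
    ("k1", 0, 30_000, 500),
    ("k1", 30_000, 60_000, 700),   # split by an active timeout
    ("k1", 90_000, 95_000, 100),   # a separate, later Flow
    ("k2", 10_000, 12_000, 40),
]
joined = combine_split_flows(SPLIT)
```
            ]]></artwork></figure>

            <t>With this input, the two adjacent k1 records are joined into
            one record covering 0 to 60000 ms with 1200 octets, while the
            later k1 record and the k2 record are left unchanged. The Flow
            Key itself is never modified.</t>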

        </section>

    </section>

    <section title="Aggregation in the IPFIX Architecture" anchor="sec-arch">

      <t>The techniques described in this document can be applied to IPFIX
      data at three stages within the collection infrastructure: on initial
      export, within a mediator, or after collection, as shown in <xref
      target="loc-fig"/>.</t>

      <figure title="Potential Aggregation Locations" anchor="loc-fig">
        <artwork><![CDATA[
+==========================================+
| Exporting Process                        |
+==========================================+
  |                                      |
  |             (Aggregated Flow Export) |
  V                                      |
+=============================+          |
| Mediator                    |          |
+=============================+          |
  |                                      |
  | (Aggregating Mediator)               |
  V                                      V
+==========================================+
| Collecting Process                       |
+==========================================+
        |
        | (Aggregation for Storage)
        V
+--------------------+
| IPFIX File Storage |
+--------------------+
        ]]></artwork>
      </figure>

      <t>Aggregation can be applied for either intermediate or final analytic
      purposes. In certain circumstances, it may make sense to export
      Aggregated Flows from an Exporting Process, for example, if the
      Exporting Process is designed to drive a time-series visualization
      directly. Note that this case, where the Aggregation Process is
      integrated into the Metering Process, is essentially covered
      by the <xref target="RFC5470">IPFIX architecture</xref>: the flow keys
      used are simply a subset of those that would normally be used. A
      Metering Process in this arrangement MAY choose to simulate the
      generation of larger flows in order to generate original flow counts, if
      the application calls for compatibility with an Aggregation Process
      deployed in a separate location.</t>

      <t>Deployment of an Intermediate Aggregation Process within a <xref
      target="RFC5982">Mediator</xref> is a much more flexible arrangement. Here, the
      Mediator consumes original Flows and produces aggregated Flows; this
      arrangement is suited to any of the use cases detailed in <xref
      target="sec-usecase"/>. In a Mediator, aggregation can also be used to
      combine original Flows from multiple sources into a single
      stream of aggregated Flows; the architectural specifics of this
      arrangement are not addressed in this document, which is concerned only
      with the aggregation operation itself; see <xref
      target="I-D.claise-ipfix-mediation-protocol"/> for details.</t>

      <t>In the specific case that an Aggregation Process is employed for data
      reduction for storage purposes, it can take original Flows from a
      Collecting Process or File Reader and pass Aggregated Flows to a File
      Writer for storage.</t>
      
       <t>The data flows into and out of an Intermediate Aggregation Process
      are shown in <xref target="iap-dataflows"/>.</t>

      <figure title="Data flows through the aggregation process" anchor="iap-dataflows">
                <artwork><![CDATA[
packets --+                     +- IPFIX Messages -+
          |                     |                  |
          V                     V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
|                  | +====================+ +=============+
|                  |            |  original Flows  |
|                  |            V                  V
+ - - - - - - - - -+======================================+
|           Intermediate Aggregation Process (IAP)        |
+=========================================================+
          | Aggregated                  Aggregated |
          | Flows                            Flows |
          V                                        V
+===================+                       +=============+
| Exporting Process |                       | File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+
        ]]></artwork>
        </figure>

    </section>

    <section title="Export of Aggregated IP Flows using IPFIX" anchor="sec-export">

        <t>In general, Aggregated Flows are exported in IPFIX like any normal Flow. However, certain aspects of aggregated flow export benefit from additional guidelines, or from new Information Elements to represent aggregation metadata or information generated during aggregation. These are detailed in the following subsections.</t>

        <section title="Time Interval Export">

            <t>Since an Aggregated Flow is simply a Flow, the existing
            timestamp Information Elements in the IPFIX Information Model
            (e.g., flowStartMilliseconds, flowEndNanoseconds) are sufficient
            to specify the time interval for aggregation. Therefore, this
            document specifies no new aggregation-specific Information
            Elements for exporting time interval information.</t>

            <t>Each Aggregated Flow SHOULD contain both an interval start and
            interval end timestamp. If an exporter of Aggregated Flows omits
            the interval end timestamp from each Aggregated Flow, the time
            interval for Aggregated Flows within an Observation Domain and
            Transport Session MUST be regular and constant.
            However, note that this approach might lead to interoperability
            problems when exporting Aggregated Flows to non-aggregation-aware
            Collecting Processes and downstream analysis tasks; therefore, an
            Exporting Process capable of exporting only interval start
            timestamps MUST provide a configuration option to export interval
            end timestamps as well.</t>

        </section>

        <section title="Flow Count Export" anchor="sec-ex-flowcount">

          <t>The following four Information Elements are defined to count original Flows as discussed in <xref target="sec-flowcount"/>.</t>

          <section title="originalFlowsPresent Information Element" anchor="ie-noncon-flowcount">
            <list style="hanging">
              <t hangText="Description: ">

                The non-conservative count of original Flows contributing to
                this Aggregated Flow. Non-conservative counts need not sum to
                the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD1</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlowsInitiated Information Element" anchor="ie-con-flowstartcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows whose first packet is
                represented within this Aggregated Flow. Conservative counts
                must sum to the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD2</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlowsCompleted Information Element" anchor="ie-con-flowendcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows whose last packet is
                represented within this Aggregated Flow. Conservative counts
                must sum to the original count on re-aggregation.

              </t>
              <t hangText="Abstract Data Type: ">unsigned64</t>
              <t hangText="ElementId: ">TBD3</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

          <section title="originalFlows Information Element" anchor="ie-con-flowcount">
            <list style="hanging">
              <t hangText="Description: ">

                The conservative count of original Flows contributing to this
                Aggregated Flow; may be distributed via any of the methods
                described in <xref target="sec-distro"/>.

              </t>

              <t hangText="Abstract Data Type: ">float64</t>
              <t hangText="ElementId: ">TBD4</t>
              <t hangText="Status: ">Proposed</t>
            </list>
          </section>      

        </section>
        
        <section title="Aggregate Counter Distribution Export" anchor="sec-ex-distro">

            <t>When exporting counters distributed among Aggregated Flows, as
            described in <xref target='sec-distro'/>, the Exporting Process MAY
            export an Aggregate Counter Distribution Record for each Template
            describing Aggregated Flow records; this Options Template is
            described below. It uses the valueDistributionMethod Information
            Element, also defined below. In many cases distribution is
            simple: the counters from contributing Flows are accounted to the
            first Interval to which they contribute. This is the default
            situation, for which no Aggregate Counter Distribution Record is
            necessary; Aggregate Counter Distribution Records are only
            applicable in more exotic situations, such as using an Aggregation
            Interval smaller than the durations of original Flows.</t>

            <section title="Aggregate Counter Distribution Options Template">

                <t>This Options Template defines the Aggregate Counter
                Distribution Record, which allows the binding of a value
                distribution method to a Template ID. This is used to signal
                to the Collecting Process how the counters were distributed.
                The fields are as below:

                <texttable>
                    <ttcol align="left">IE</ttcol>
                    <ttcol align="left">Description</ttcol>
                    <c>templateId [scope]</c>
                    <c>

                      The Template ID of the Template defining the Aggregated
                      Flows to which this distribution option applies. This
                      Information Element MUST be defined as a Scope Field.

                    </c>
                    <c>valueDistributionMethod</c>
                    <c>

                        The method used to distribute the counters for the
                        Aggregated Flows defined by the associated Template.

                    </c>
                </texttable></t>
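
                <t>The body of one Aggregate Counter Distribution Record can
                be sketched as follows, assuming the two fields are encoded
                in the order listed above, with templateId as an unsigned16
                and valueDistributionMethod as an unsigned8, both in network
                byte order. The function name is hypothetical, and the Set
                header and the Options Template itself are deliberately left
                out of this sketch.</t>

                <figure><artwork><![CDATA[
```python
import struct


def pack_distribution_record(template_id, method):
    """Pack templateId (unsigned16) then valueDistributionMethod (unsigned8).

    Only the Data Record body is produced; the Set header and the Options
    Template that describes these two fields are outside this sketch.
    """
    if not 256 <= template_id <= 65535:
        raise ValueError("Template IDs for Data Sets lie in 256..65535")
    if not 0 <= method <= 255:
        raise ValueError("valueDistributionMethod is an unsigned8")
    return struct.pack("!HB", template_id, method)


record = pack_distribution_record(template_id=256, method=4)
```
                ]]></artwork></figure>

                <t>For Template ID 256 and method 4, the resulting record
                body is the three octets 0x01 0x00 0x04.</t>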
            </section>
            
            <section title="valueDistributionMethod Information Element" anchor="ie-errmag">
                <list style="hanging">
                    <t hangText="Description: ">

                        A description of the method used to distribute the
                        counters from contributing Flows into the Aggregated
                        Flow records described by an associated Template. The
                        method is deemed to apply to all the non-key
                        Information Elements in the referenced Template for
                        which value distribution is a valid operation; if the
                        originalFlowsInitiated and/or originalFlowsCompleted
                        Information Elements appear in the Template, they are
                        not subject to this distribution method, as they each
                        imply their own distribution method. The distribution
                        methods are taken from <xref target='sec-distro'/> and
                        encoded as follows:

                        <texttable>
                        <ttcol align="left">Value</ttcol>
                        <ttcol align="left">Description</ttcol>

                        <c>1</c><c>Start Interval: The counters for an
                        original Flow are added to the counters of the
                        appropriate Aggregated Flow containing the start time
                        of the original Flow. A Collecting Process should
                        assume this method by default if no value distribution
                        information is available for an Aggregated
                        Flow.</c>

                        <c>2</c><c>End Interval: The counters for an original
                        Flow are added to the counters of the appropriate
                        Aggregated Flow containing the end time of the
                        original Flow.</c>

                        <c>3</c><c>Mid Interval: The counters for an original
                        Flow are added to the counters of a single appropriate
                        Aggregated Flow containing some timestamp between
                        start and end time of the original Flow.</c>

                        <c>4</c><c>Simple Uniform Distribution: Each counter
                        for an original Flow is divided by the number of time
                        intervals the original Flow covers (i.e., of
                        appropriate Aggregated Flows sharing the same Flow
                        Key), and this number is added to each corresponding
                        counter in each Aggregated Flow.</c>

                        <c>5</c><c>Proportional Uniform Distribution: Each
                        counter for an original Flow is divided by the number
                        of time _units_ the original Flow covers, to derive a
                        mean count rate. This mean count rate is then
                        multiplied by the number of time units in the
                        intersection of the duration of the original Flow and
                        the time interval of each Aggregated Flow. This is
                        like simple uniform distribution, but accounts for the
                        fractional portions of a time interval covered by an
                        original Flow in the first and last time interval.</c>

                        <c>6</c><c>Simulated Process: Each counter of the
                        original Flow is distributed among the intervals of
                        the Aggregated Flows according to some function the
                        Aggregation Process uses based upon properties of
                        Flows presumed to be like the original Flow. This is
                        essentially an assertion that the Aggregation Process
                        has no direct packet timing information but is
                        nevertheless not using one of the other simpler
                        distribution methods. The Aggregation Process
                        specifically makes no assertion as to the correctness
                        of the simulation.</c>

                        <c>7</c><c>Direct: The Aggregation Process has access
                        to the original packet timings from the packets making
                        up the original Flow, and uses these to distribute or
                        recalculate the counters.</c>

                        </texttable>

                    </t> 
                    <t hangText="Abstract Data Type: ">unsigned8</t>
                    <t hangText="ElementId: ">TBD5</t> 
                    <t hangText="Status: ">Proposed</t> 
                  </list>
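
                <t>Four of the methods above can be sketched for a single
                counter as follows. This is an illustration under stated
                assumptions, not a normative algorithm: times are integers in
                arbitrary units, the end of the original Flow is treated as
                exclusive, aggregation intervals are regular and start at
                time zero, and the function name is hypothetical. Methods 3,
                6, and 7 depend on choices or data this document leaves to
                the Aggregation Process and are not covered.</t>

                <figure><artwork><![CDATA[
```python
def distribute(value, flow_start, flow_end, interval_len, method):
    """Split one counter of an original Flow across aggregation intervals.

    Times are integers in arbitrary units with flow_end exclusive and
    flow_end > flow_start; returns {interval_index: share}. Only methods
    1 (Start Interval), 2 (End Interval), 4 (Simple Uniform), and
    5 (Proportional Uniform) are sketched.
    """
    first = flow_start // interval_len
    last = (flow_end - 1) // interval_len
    if method == 1:
        return {first: float(value)}
    if method == 2:
        return {last: float(value)}
    if method == 4:
        n = last - first + 1
        return {i: value / n for i in range(first, last + 1)}
    if method == 5:
        rate = value / (flow_end - flow_start)  # mean count per time unit
        shares = {}
        for i in range(first, last + 1):
            lo = max(flow_start, i * interval_len)
            hi = min(flow_end, (i + 1) * interval_len)
            shares[i] = rate * (hi - lo)
        return shares
    raise ValueError(f"method {method} is not covered by this sketch")


# A Flow of 1200 octets active over [50, 170) with 100-unit intervals.
simple = distribute(1200, 50, 170, 100, method=4)
proportional = distribute(1200, 50, 170, 100, method=5)
```
                ]]></artwork></figure>

                <t>In this example the original Flow covers half of interval
                0 and seven tenths of interval 1. Simple uniform distribution
                assigns 600 octets to each interval, while proportional
                uniform distribution assigns 500 and 700 octets respectively.
                In both cases the shares sum to the original 1200 octets.</t>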
                </section>

        </section>

<!--
        <section title="Error Magnitude Export" anchor="sec-ex-error">

            <section title="errorMagnitude Information Element" anchor="ie-errmag">
                <list style="hanging">
                    <t hangText="Description: ">

                        The approximate magnitude of the error induced by the
                        Intermediate Aggregation Process in the values of the
                        associated Information Element, as a proportion of the
                        range of values covered by the Information Element;
                        SHOULD be associated with an informationElementId and
                        optional templateId as scope in an Options
                        Template.</t>

                    <t hangText="Abstract Data Type: ">float64</t>
                    <t hangText="ElementId: ">TBD6</t>
                    <t hangText="Status: ">Proposed</t>
                </list>
            </section>      

            <section title="Error Magnitude Options Template">
                <t>[TODO: define options template]</t>
            </section>

        </section>
-->

    </section>

    <section title="Examples">

        <t>[TODO]</t>

    </section>

    <section title="Security Considerations">

        <t>[TODO]</t>

    </section>

    <section title="IANA Considerations">

        <t>[TODO: add all IEs defined in Section 6.]</t>

    </section>

  </middle>
  
  <back>

  <references title="Normative References">
      <?rfc include="reference.RFC.5101" ?>
      <?rfc include="reference.RFC.5102" ?>
  </references>

  <references title="Informative References">
      <?rfc include="reference.RFC.5103" ?>
      <?rfc include="reference.RFC.5470" ?>
      <?rfc include="reference.RFC.5472" ?>
      <?rfc include="reference.RFC.5153" ?>
      <?rfc include="reference.RFC.5610" ?>
      <?rfc include="reference.RFC.5655" ?>
      <?rfc include="reference.RFC.5982" ?>
      <?rfc include="reference.RFC.3917" ?>
      <?rfc include="reference.RFC.2119" ?>
      <?rfc include="reference.I-D.ietf-ipfix-anon" ?>
      <?rfc include="reference.I-D.ietf-ipfix-mediators-framework" ?>
      <?rfc include="reference.I-D.claise-ipfix-mediation-protocol" ?>
  </references>

</back>
</rfc>
