<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<!--
See the bottom of the document for formatting instructions
TODO:


Figure float
New build&bibxml management


Check: http://trac.tools.ietf.org/wg/conex/trac/report/6?sort=component
@@'s
-->
<!-- Alterations to I-D/RFC boilerplate -->
<?rfc strict="no" ?>
<!-- Default strict="no" Don't check I-D nits -->
<?rfc rfcedstyle="yes" ?>
<!-- IETF process -->
<?rfc ipr="yes" ?>
<!-- Matt: Not a problem, as long as all IPR leads to a free licence. -->
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->
<?rfc sortrefs="yes"?>
<!-- Default sortrefs="no" Don't sort references into order -->
<?rfc comments="yes" ?>
<!-- Default comments="no" Don't render comments -->
<?rfc inline="no" ?>
<!-- Default inline="no" if comments is "yes", then render comments inline; otherwise render them in an `Editorial Comments' section -->
<?rfc compact="yes"?>
<?rfc subcompact="yes"?>
<?rfc emoticonic="yes" ?>
<!-- Default emoticonic="no" Doesn't prettify HTML format -->
<rfc category="info" docName="draft-ietf-conex-abstract-mech-08"
     ipr="trust200902">
  <front>
    <title abbrev="ConEx Concepts and Abstract Mechanism">Congestion Exposure
    (ConEx) Concepts and Abstract Mechanism</title>

    <author fullname="Matt Mathis" initials="M." surname="Mathis">
      <organization>Google, Inc</organization>

      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>

          <city>Mountain View</city>

          <code>94043</code>

          <region>California</region>

          <country>USA</country>
        </postal>

        <email>mattmathis at google.com</email>
      </address>
    </author>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <street>Martlesham Heath</street>

          <city>Ipswich</city>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <email>bob.briscoe@bt.com</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <date day="21" month="October" year="2013"/>

    <area>Transport</area>

    <workgroup>Congestion Exposure (ConEx) Working Group</workgroup>

    <keyword>Quality of Service</keyword>

    <keyword>QoS</keyword>

    <keyword>Congestion Control</keyword>

    <keyword>Signaling</keyword>

    <keyword>Protocol</keyword>

    <keyword>Encoding</keyword>

    <keyword>Audit</keyword>

    <keyword>Policing</keyword>

    <abstract>
      <t>This document describes an abstract mechanism by which senders inform
      the network about the congestion encountered by packets earlier in the
      same flow. Today, network elements at any layer may signal congestion to
      the receiver by dropping packets or by ECN markings, and the receiver
      passes this information back to the sender in transport-layer feedback.
      The mechanism described here enables the sender to also relay this
      congestion information back into the network in-band at the IP layer,
      such that the total amount of congestion from all elements on the path
      is revealed to all IP elements along the path, where it could, for
      example, be used to provide input to traffic management. This mechanism
      is called congestion exposure or ConEx. The companion document "ConEx
      Concepts and Use Cases" provides the entry point to the set of ConEx
      documentation.</t>
    </abstract>
  </front>

  <middle>
    <!-- ================================================================ -->

    <section anchor="abstrmech_Introduction" title="Introduction">
      <t>This document describes an abstract mechanism by which, to a first
      approximation, senders inform the network about the congestion
      encountered by packets earlier in the same flow. It is not a complete
      protocol specification, because it is known that designing an encoding
      (e.g. packet formats and codepoint allocations) is likely to entail
      compromises that preclude some uses of the protocol. The goal of this
      document is to provide a framework for developing and testing algorithms
      to evaluate the benefits of the ConEx protocol and the consequences of
      the compromises made in different encoding designs.</t>

      <t>A companion document <xref target="RFC6789"/> provides the entry
      point to the set of ConEx documentation. It outlines concepts that are
      prerequisites to understanding why ConEx is useful, and it outlines
      various ways that ConEx might be used.</t>
    </section>

    <section anchor="abstrmech_Overview" title="Overview">
      <t>As typical end-to-end transport protocols continually seek out more
      network capacity, network elements signal whenever congestion results,
      and the transports are responsible for controlling this network
      congestion <xref target="RFC5681"/>. The more a transport tries to use
      capacity that others want to use, the more congestion signals will be
      attributable to that transport. Likewise, the more transport sessions
      sustained by a user and the longer the user sustains them, the more
      congestion signals will be attributable to that user. The goal of ConEx
      is to ensure that the resulting congestion signals are sufficiently
      visible and robust, because they are an ideal metric for networks to use
      as the basis of traffic management or other related functions.</t>

      <t>Networks indicate congestion by three possible signals: packet loss,
      ECN marking or queuing delay. ECN marking and some packet loss may be
      the outcome of Active Queue Management (AQM), which the network uses to
      warn senders to reduce their rates. Packet loss is also the natural
      consequence of complete exhaustion of a buffer or other network
      resource. Some experimental transport protocols and TCP variants infer
      impending congestion from increasing queuing delay. However, delay is
      too amorphous to use as a congestion metric. In this and other ConEx
      documents, the term 'congestion signals' is generally used solely for
      ECN markings and packet losses, because they are unambiguous signals of
      congestion.</t>

      <!--
<list style="symbols">
          <t>The most common congestion signal is packet loss. When congested,
          the network simply discards some packets either as part of
          active queue management <xref target="RFC2309"></xref> or as the
          consequence of a queue overflow or other resource starvation. The
          transport receiver detects that some data is missing and signals
          such through transport acknowledgments to the transport sender (e.g. 
          TCP duplicate acknowledgements or SACK options). The sender performs the appropriate congestion
          control rate reduction (e.g. <xref target="RFC5681"></xref> for TCP).
          </t>


          <t>If the transport supports explicit congestion notification (ECN)
          <xref target="RFC3168"></xref> or pre-congestion notification (PCN)
          <xref target="RFC5670"></xref> , the transport sender indicates this
          by setting an ECN-capable transport (ECT) codepoint in the IP header of every packet.
          Network devices can then explicitly signal congestion to the
          receiver by changing the codepoint in the IP header from ECT to ECN
          (1 bit change) of such packets. The
          transport receiver communicates these ECN signals back to the
          sender, which then performs the appropriate congestion control rate
          reduction.</t>


          


        </list></t>
-->

      <t>In both cases, the congestion signals follow the route indicated in
      <xref target="abstrmech_Fig_ConEx_Placement"/>. A congested network
      device sends a signal in the data stream on the forward path to the
      transport receiver, the receiver passes it back to the sender through
      transport-level feedback, and the sender makes some congestion control
      adjustment.</t>

      <t>This document extends the capabilities of the Internet protocol suite
      with the addition of a new Congestion Exposure signal. To a first
      approximation this signal, also shown in <xref
      target="abstrmech_Fig_ConEx_Placement"/>, relays the congestion
      information from the transport sender back through the internetwork
      layer where it is visible to any interested internetwork layer devices
      along the forward path. This document frames the engineering problem of
      designing the ConEx signal. The requirements are described in <xref
      target="abstrmech_Requirements"/> and some example encodings are
      presented in <xref target="abstrmech_Representing_ConEx"/>. <xref
      target="abstrmech_ConEx_Components"/> describes all of the protocol
      components.</t>

      <t>This new signal is expressly designed to support a variety of new
      policy mechanisms that might be used to instrument, monitor or manage
      traffic. The policy devices are not shown in <xref
      target="abstrmech_Fig_ConEx_Placement"/> but might be placed anywhere
      along the forward data path (see <xref
      target="abstrmech_Policy_Devices"/>).</t>

      <figure anchor="abstrmech_Fig_ConEx_Placement"
              title="The Flow of Congestion and ConEx Signals">
        <!--
123456789012345678901234567890123456789012345678901234567890123456789 -->

        <artwork><![CDATA[
,---------.                                               ,---------.
|Transport|                                               |Transport|
| Sender  |   .                                           |Receiver |
|         |  /|___________________________________________|         |
|     ,-<---------------Congestion-Feedback-Signals--<--------.     |
|     |   |/                                              |   |     |
|     |   |\           Transport Layer Feedback Flow      |   |     |
|     |   | \  ___________________________________________|   |     |
|     |   |  \|                                           |   |     |
|     |   |   '         ,-----------.               .     |   |     |
|     |   |_____________|           |_______________|\    |   |     |
|     |   |    IP Layer |           |  Data Flow      \   |   |     |
|     |   |             |(Congested)|                  \  |   |     |
|     |   |             |  Network  |--Congestion-Signals--->-'     |
|     |   |             |  Device   |                    \|         |
|     |   |             |           |                    /|         |
|     `----------->--(new)-IP-Layer-ConEx-Signals-------->|         |
|         |             |           |                  /  |         |
|         |_____________|           |_______________  /   |         |
|         |             |           |               |/    |         |
`---------'             `-----------'               '     `---------'
]]></artwork>
      </figure>

      <t>Since the policy devices can affect how traffic is treated, it is
      assumed that there is an intrinsic motivation for users, applications or
      operating systems to understate the congestion that they are causing.
      Therefore, it is important to be able to audit ConEx signals, and to be
      able to apply sufficient sanction to discourage cheating of congestion
      policies. The general approach to auditing is to count signals on the
      forward path to confirm that there are never fewer ConEx signals than
      congestion signals. Many ConEx design constraints come from the need to
      ensure that the audit function is sufficiently robust. The audit
      function is described in <xref target="abstrmech_Audit"/>. However,
      significant portions of this document (and prior research <xref
      target="Refb-dis"/>) are motivated by issues relating to the audit
      function and making it robust.</t>
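
      <t>The counting approach described above can be illustrated with a
      minimal Python sketch. It is not part of any ConEx specification: the
      class name AuditCounter, its field names and the use of plain integer
      units are assumptions made purely for illustration, and the question of
      how an audit device observes losses is deliberately left out.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class AuditCounter:
    """Running totals for one flow, in whatever units apply."""
    conex_volume: int = 0       # ConEx signals seen so far
    congestion_volume: int = 0  # congestion signals seen so far

    def observe(self, conex: int = 0, congestion: int = 0) -> bool:
        """Add one packet's signals; True while the flow is in balance.

        "In balance" means there are not fewer ConEx signals than
        congestion signals on the forward path so far.
        """
        self.conex_volume += conex
        self.congestion_volume += congestion
        return self.conex_volume >= self.congestion_volume
```
]]></artwork>
      </figure>

      <t>In this sketch a flow that has sent one ConEx signal stays in
      balance after one congestion signal is observed, and falls out of
      balance on the second. What sanction follows is a separate design
      question.</t>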

      <t>The congestion and ConEx signals shown in <xref
      target="abstrmech_Fig_ConEx_Placement"/> represent a series of discrete
      events: ECN marks or lost packets, carried by the forward data stream
      and fed back into the Internetwork layer. The policy and audit functions
      are most likely to act on the accumulated values of these signals, for
      which we use the term "volume". For example, traffic volume is the total
      number of bytes delivered, optionally over a specified time interval and
      over some aggregate of traffic (e.g. all traffic from a site), while
      loss-volume is the total number of bytes discarded from some aggregate
      over an interval. The term congestion-volume is defined precisely in
      <xref target="RFC6789"/>. Note that volume per unit time is (average)
      rate.</t>
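
      <t>As a worked illustration of these definitions, the following Python
      sketch accumulates traffic volume and loss-volume over an interval and
      derives an average rate. The Packet record, its fields and the function
      names are hypothetical, and bytes are used as the unit only because the
      examples above are stated in bytes.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    time: float  # arrival time in seconds (hypothetical field)
    size: int    # packet size in bytes (hypothetical field)
    lost: bool   # True if the packet was discarded


def volumes(packets: Iterable[Packet], start: float,
            end: float) -> Tuple[int, int]:
    """Return (traffic volume, loss-volume) in bytes for [start, end)."""
    traffic = loss = 0
    for p in packets:
        if start <= p.time < end:
            if p.lost:
                loss += p.size      # bytes discarded
            else:
                traffic += p.size   # bytes delivered
    return traffic, loss


def average_rate(volume: int, start: float, end: float) -> float:
    """Volume per unit time is the (average) rate, in bytes/s."""
    return volume / (end - start)
```
]]></artwork>
      </figure>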

      <t>A design goal of the ConEx protocol is that the important policy
      mechanisms can be implemented per logical link without per flow state
      (see <xref target="abstrmech_Policy_Devices"/>). However, the price to
      pay can be flow state to audit ConEx signals (<xref
      target="abstrmech_Audit"/>). This is justified in that i) auditing at
      the edges, with limited per flow state, enables policy elsewhere,
      including in the core, without any per flow state; ii) auditing can use
      soft flow state, which does not require route pinning.</t>

      <t>There is a long-standing argument over units of congestion: bytes vs
      packets (see <xref target="I-D.ietf-tsvwg-byte-pkt-congest"/> and its
      references). <xref target="abstrmech_Byte_Pkt"/> explains why this
      problem must be addressed carefully. However, this document does not
      take a strong position on this issue. Nonetheless, it does require that
      the units of congestion be an explicitly stated property of any
      proposed encoding, and that the consequences of that design decision be
      evaluated along with other aspects of the design.</t>

      <!-- Furthermore, unifying these perspectives is likely to rely on a units conversion using the lengths of packets from successive transport round trips. -->

      <t>To be successful, the ConEx protocol must have the property that the
      relevant stakeholders each have the incentive to unilaterally start on
      each stage of partial deployment, which in turn creates incentives for
      further deployment. Furthermore, legacy systems that will never be
      upgraded must not become a barrier to deploying ConEx. Issues relating to
      partial deployment are described in <xref
      target="abstrmech_Incr_Deploy"/>.</t>

      <!--  or using partial signals to improve traffic management -->

      <t>Note that ConEx signals are not intended to be used for fine-grained
      congestion control. They are anticipated to be most useful at longer
      time scales. For example, the total congestion caused by a user might
      serve as an input to higher-level policy or accountability functions,
      designed to create incentives for improving user behavior, such as
      choosing to send large quantities of data at off-peak times, at lower
      data rates or with less aggressive protocols such as LEDBAT <xref
      format="default" target="RFC6817"/> (see <xref target="RFC6789"/>).</t>

      <t>Ultimately ConEx signals have the potential to provide a mechanism to
      regulate global Internet congestion. From the earliest days of
      congestion control research there has been a concern that there is no
      mechanism to prevent transport designers from incrementally making
      protocols more aggressive without bound and spiraling to a "tragedy of
      the commons" Internet congestion collapse. The "TCP friendly" paradigm
      was created in part to forestall this failure. However, it no longer
      commands any authority because it has little to say about the Internet
      of today, which has moved beyond the scaling range of standard TCP. As a
      consequence, many transports and applications are opening arbitrarily
      large numbers of connections or using arbitrary levels of
      aggressiveness. ConEx represents a recognition that the IETF cannot
      regulate this space directly because it concerns the behaviour of users
      and applications, not individual transport protocols. Instead the IETF
      can give network operators the protocol tools to arbitrate the space
      themselves, with better bulk traffic management. This in turn should
      create incentives for users, and for designers of applications and of
      transport protocols, to be more mindful about contributing to
      congestion.</t>

      <section anchor="abstrmech_Terminology" title="Terminology">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in RFC 2119 <xref
        target="RFC2119"/>.</t>

        <t>ConEx signals in IP packet headers from the sender to the
        network:<list style="hanging">
            <t hangText="Not-ConEx:">The transport (or at least this packet)
            is not ConEx-capable.</t>

            <t hangText="ConEx-Capable:">The transport is ConEx-Capable. This
            is the opposite of Not-ConEx.</t>

            <t hangText="ConEx Signal:">A packet sent by a ConEx-Capable
            transport. It carries at least one of the following signals: <list
                style="hanging">
                <t hangText="Re-Echo-Loss:">The transport has experienced a
                loss.</t>

                <t hangText="Re-Echo-ECN:">The transport has experienced an
                ECN mark.</t>

                <t hangText="Credit:">The transport is building up credit to
                signal advance notice of the risk of packets contributing to
                congestion, in contrast to signalling only after inherently
                delayed feedback of actual congestion (see <xref
                target="abstrmech_Credit_Simple_Audit"/>).</t>

                <t hangText="ConEx-Not-Marked:">The transport is ConEx-capable
                but is signaling none of Re-Echo-Loss, Re-Echo-ECN or
                Credit.</t>
              </list></t>

            <t hangText="ConEx-Marked:">At least one of Re-Echo-Loss,
            Re-Echo-ECN or Credit.</t>
          </list></t>
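
        <t>The relationship between these terms can be sketched in Python as
        a set of flags. This is only a reading aid and is not a proposed
        encoding: the names Signal, MARKS and classify are hypothetical, and
        treating marks without the capability flag as Not-ConEx is an
        assumption of this sketch (a concrete encoding would have to state
        how such combinations are handled).</t>

        <figure>
          <artwork><![CDATA[
```python
from enum import Flag, auto


class Signal(Flag):
    """Hypothetical flags; NOT a proposed bit layout."""
    CAPABLE = auto()       # transport is ConEx-Capable
    RE_ECHO_LOSS = auto()  # transport has experienced a loss
    RE_ECHO_ECN = auto()   # transport has experienced an ECN mark
    CREDIT = auto()        # transport is building up credit


MARKS = Signal.RE_ECHO_LOSS | Signal.RE_ECHO_ECN | Signal.CREDIT


def classify(signal: Signal) -> str:
    """Map a packet's flags onto the terms defined in this section."""
    if not signal & Signal.CAPABLE:
        # Assumption: marks without CAPABLE count as Not-ConEx.
        return "Not-ConEx"
    if signal & MARKS:
        return "ConEx-Marked"
    return "ConEx-Not-Marked"
```
]]></artwork>
        </figure>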
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_Requirements"
             title="Requirements for the ConEx Abstract Mechanism">
      <t>First-time readers may wish to skim this section, since it is more
      understandable having read the entire document.</t>

      <section anchor="abstrmech_Requirements_Signals"
               title="Requirements for ConEx Signals">
        <t>Ideally, all the following requirements would be met by a
        Congestion Exposure Signal:<list style="letters">
            <t>The ConEx Signal SHOULD be visible to internetwork layer
            devices along the entire path from the transport sender to the
            transport receiver. Equivalently, it SHOULD be present in the IPv4
            or IPv6 header, and in the outermost IP header if using IP in IP
            tunneling. The ConEx Signal SHOULD be immutable once set by the
            transport sender. A corollary of these requirements is that the
            chosen ConEx encoding SHOULD pass silently without modification
            through pre-existing networking gear.</t>

            <t>The ConEx Signal SHOULD be useful under only partial
            deployment. A minimal deployment SHOULD only require changes to
            transport senders. Furthermore, partial deployment SHOULD create
            incentives for additional deployment, both in terms of enabling
            ConEx on more devices and adding richer features to existing
            devices. Nonetheless, ConEx deployment need never be universal,
            and it is anticipated that some hosts and some transports may
            never support the ConEx Protocol and some networks may never use
            the ConEx Signals.</t>

            <t>The ConEx signal SHOULD be timely. There will be a minimum
            delay of one RTT, and often longer if the transport protocol sends
            infrequent feedback (consider RTCP <xref target="RFC3550"/> for
            example).</t>

            <t>The ConEx signal SHOULD be accurate and auditable. The general
            approach is to observe the volume of congestion signals and ConEx
            signals on the forward data path and verify that the ConEx signals
            do not under-represent the congestion signals (see <xref
            target="abstrmech_Audit"/>). The simplest mechanism to compensate
            for the round trip delay between the signals is for the sender to
            include a "credit" signal to cover the yet-to-be-observed
            congestion that might occur during this delay (see <xref
            target="abstrmech_Credit_Simple_Audit"/> for details).
            Furthermore, the ConEx signals for packet loss and ECN marking
            SHOULD have distinct encodings because they are likely to require
            different auditing techniques.</t>
          </list></t>

        <t>It is already known that implementing ConEx signals is likely to
        entail some compromises, and therefore all the requirements above are
        expressed with the keyword 'SHOULD' rather than 'MUST'. The only
        mandatory requirement is that a concrete protocol description MUST
        give sound reasoning if it chooses not to meet some requirement.</t>
      </section>

      <section anchor="abstrmech_Audit_Behave_Constraints"
               title="Requirements for the Audit Function">
        <t>The role of the audit function and the constraints on it are described in
        <xref target="abstrmech_Audit"/>. There is no intention to standardise
        the audit function. However, it is necessary to lay down the following
        normative constraints on audit behaviour so that transport designers
        will know what to design against and implementers of audit devices
        will know what pitfalls to avoid: <list style="hanging">
            <t hangText="Minimal False Hits:">Audit SHOULD introduce minimal
            false hits for honest flows;</t>

            <t hangText="Minimal False Misses:">Audit SHOULD quickly detect
            and sanction dishonest flows, ideally on the first dishonest
            packet;</t>

            <t hangText="Transport Oblivious:">Audit SHOULD NOT be designed
            around one particular rate response, such as any particular TCP
            congestion control algorithm or one particular resource sharing
            regime such as TCP-friendliness <xref target="RFC5348"/>. An
            important goal is to give ingress networks the freedom to
            unilaterally allow different rate responses to congestion and
            different resource sharing regimes <xref target="Evol_cc"/>,
            without having to coordinate with other networks over details of
            individual flow behaviour;</t>

            <t hangText="Sufficient Sanction:">Audit SHOULD introduce
            sufficient sanction (e.g. loss in goodput) such that senders
            cannot gain from understating congestion;</t>

            <t hangText="Proportionate Sanction:">To the extent that the audit
            might be subject to false hits, the sanction SHOULD be
            proportionate to the degree to which congestion is understated. If
            audit over-punishes, attackers will find ways to harness it into
            amplifying attacks on others. Ideally audit should, in the
            long run, cause the user to get no better performance than they
            would get by being accurate.</t>

            <!-- Ideally the consequences of a false hit would be only moderately more severe than the likely policy response to the same degree of congestion. -->

            <t hangText="Manage Memory Exhaustion:">Audit SHOULD be able to
            counter state exhaustion attacks. For instance, if the audit
            function uses flow-state, it should not be possible for senders to
            exhaust its memory capacity by gratuitously sending numerous
            packets, each with a different flow ID.</t>

            <t hangText="Identifier Accountability:">Audit SHOULD NOT be
            vulnerable to 'identity whitewashing', where a transport can label
            a flow with a new ID more cheaply than paying the cost of
            continuing to use its current ID <xref target="CheapPseud"/>.</t>
          </list></t>
      </section>

      <section anchor="abstrmech_Secific_Constraints"
               title="Requirements for Non-abstract ConEx Specifications">
        <t>An experimental ConEx specification SHOULD describe the following
        protocol details:<list style="hanging">
            <t hangText="Network Layer:"><list style="letters">
                <t>The specific ConEx signal encodings with packet formats,
                bit fields and/or code points;</t>

                <t>An inventory of invalid combinations of flags or invalid
                codepoints in the encoding. Whether security gateways should
                normalise, discard or ignore such invalid encodings, and what
                values they should be considered equivalent to by ConEx-aware
                elements;</t>

                <t>An inventory of any conflated signals or any other effects
                that are known to compromise signal integrity;</t>

                <t>Whether the source is responsible for allowing for the
                round trip delay in ConEx signals (e.g. using a Credit
                marking), and if so whether Credit is maintained for the
                duration of a flow or degrades over time, and what defines the
                end of the duration of a flow;</t>

                <t>A specification for signal units (bytes vs packets, etc),
                any approximations allowed and algorithms to do any implied
                conversions or accounting;</t>

                <t>If the units are bytes, a definition of which headers are
                included in the size of the packet;</t>

                <t>How tunnels should propagate the ConEx encoding;</t>

                <t>Whether the encoding fields are mutable or not, to ensure
                that header authentication, checksum calculation, etc. process
                them correctly. A ConEx encoding field SHOULD be immutable
                end-to-end, so that end points can detect whether it has been
                tampered with in transit;</t>

                <t>If a specific encoding allows mutability (e.g. at proxies),
                an inventory of invalid transitions between codepoints. In all
                encodings, transitions from any ConEx marking to Not-ConEx
                MUST be invalid;</t>

                <t>A statement that the ConEx encoding is only applicable to
                unicast and anycast, and that forwarding elements should
                silently ignore any ConEx signalling on multicast packets
                (they should be forwarded unchanged);</t>

                <t>Definition of any extensibility;</t>

                <t>Backward and forward compatibility and potential migration
                strategies. In all cases, a ConEx encoding MUST be arranged so
                that legacy transport senders implicitly send Not-ConEx;</t>

                <t>Any (optional) modification to data-plane forwarding
                dependent on the encoding (e.g. preferential discard,
                interaction with Diffserv, ECN etc.);</t>

                <t>Any warning or error messages relevant to the encoding.</t>
              </list></t>

            <t><vspace blankLines="1"/>Note regarding item j on multicast: A
            multicast tree may involve different levels of congestion on each
            leg. Any traffic management can only monitor or control multicast
            congestion at or near each receiver. It would make no sense for
            the sender to try to expose "whole path congestion" in sent
            packets, because it cannot hope to describe all the differing
            congestion levels on every leg of the tree.</t>

            <t hangText="Transport Layer:"><list style="letters">
                <t>A specification of any required changes to congestion
                feedback in particular transport protocols.</t>

                <t>A specification (or minimally a recommendation) for how a
                transport should estimate credits at the beginning of a
                connection and while it is in progress.</t>

                <t>A specification of whether any other protocol options
                should (or must) be enabled along with an implementation of
                ConEx (e.g. at least attempting to negotiate ECN and SACK
                capability);</t>

                <t>A specification of any configuration that a ConEx stack may
                require (or preferably confirmation that it requires no
                configuration);</t>

                <t>A specification of the statistics that a protocol stack
                should log for each type of marking on a per-flow or aggregate
                basis.</t>
              </list></t>

            <t hangText="Security:"><list style="letters">
                <t>An example of a strong audit algorithm suitable for
                detecting if a single flow is misstating congestion. This
                algorithm should present minimal false results, but need not
                have optimal scaling properties (e.g. may need per flow
                state).</t>

                <t>An example of an audit algorithm suitable for detecting
                misstated congestion in a large aggregate (e.g. no per-flow
                state).</t>
              </list></t>
          </list></t>

        <t>The possibility exists that these specifications over-constrain the
        ConEx design, and cannot be fully satisfied. An important part of the
        evaluation of any particular design will be a thorough inventory of
        all ways in which it might fail to satisfy these specifications.</t>
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_Representing_ConEx"
             title="Encoding Congestion Exposure">
      <t>Most protocol specifications start with a description of packet
      formats and codepoints with their associated meanings. This document
      does not: it is already known that choosing the encoding for ConEx is
      likely to entail some engineering compromises that have the potential to
      reduce the protocol's usefulness in some settings. For instance the
      experimental ConEx encoding chosen for IPv6 <xref
      target="I-D.ietf-conex-destopt"/> had to make compromises on tunnelling.
      Rather than making these engineering choices prematurely, this document
      sidesteps the encoding problem by making it abstract. It describes
      several different representations of ConEx Signals, none of which are
      specified to the level of specific bits or code points.</t>

      <!-- <t>A companion documents <xref target="RFC6789" /> describes the preliminary use cases for ConEx in terms of these abstract representations.</t> -->

      <t>The goal of this approach is to be as complete as possible for
      discovering the potential usage and capabilities of the ConEx protocol,
      so we have some hope of making optimal design decisions when choosing
      the encoding. Even if experiments reveal particular problems due to the
      encoding, this document will still serve as a reference model.</t>

      <!-- <t>Ideally, this document would not describe encoding at all, and leave that little detail to some future document.  However, given the protocol engineering mindset of most readers, we have observed that nearly everybody invents an encoding in order to help themselves understand ConEx document. </t> -->

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Simple_Encoding" title="Naïve Encoding">
        <t>For tutorial purposes, it is helpful to describe a naïve
        encoding of the ConEx protocol for TCP and similar protocols: set a
        bit (not specified here) in the IP header on each retransmission and
        on each ECN-signaled window reduction. Network devices along the
        forward path can see this bit and act on it. For example, any device
        along the path might limit the rate of all traffic if the rate of
        marked (congested) packets exceeds a threshold.</t>
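
        <t>The behaviour just described can be sketched as follows. This is
        only an illustration under stated assumptions: the names NaiveMonitor
        and sender_mark, the 1% threshold, and the use of a byte-weighted
        marked fraction (rather than a rate measured over time) are all
        hypothetical and are not specified by ConEx.</t>

        <figure>
          <artwork><![CDATA[
```python
class NaiveMonitor:
    """Hypothetical forwarding-path monitor (naive encoding)."""

    def __init__(self, threshold=0.01):
        self.threshold = threshold  # assumed allowed marked fraction
        self.total_bytes = 0
        self.marked_bytes = 0

    def observe(self, size, marked):
        """Count one packet; 'marked' is the naive ConEx bit."""
        self.total_bytes += size
        if marked:
            self.marked_bytes += size

    def should_limit(self):
        """True when the marked fraction exceeds the threshold."""
        if self.total_bytes == 0:
            return False
        fraction = self.marked_bytes / self.total_bytes
        return fraction > self.threshold


def sender_mark(is_retransmission, ecn_window_reduction):
    """Naive sender rule: mark on retransmit or ECN reduction."""
    return is_retransmission or ecn_window_reduction
```
]]></artwork>
        </figure>

        <t>Nothing in this sketch checks that the sender's marks are
        truthful.</t>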

        <!--        <t>For tutorial purposes, it is helpful to describe a naïve encoding of the ConEx protocol for TCP and similar protocols: set a bit (not specified here) in the IP header on all non-retransmissions, except for once per ECN signaled window reduction.   This encoding conflates Not-ConEx and all ConEn-Marked signals (Re-Echo-Loss, Re-Echo-ECN and Credit).   
Network devices along the forward path can see this bit and act on it. For example any device along the path might give preferential treatment to marked (e.g. uncongested) packets.</t> -->

        <t>This simple encoding is sufficient to illustrate many of the
        benefits envisioned for ConEx. At first glance it looks like it might
        motivate people to deploy and use it. It is a one-line code change
        that a small number of OS developers and content providers could
        unilaterally deploy across a significant fraction of all Internet
        traffic. However, this encoding does not support auditing, so it would
        also motivate users and/or applications to misrepresent the congestion
        that they are causing <xref target="RFC3514"/>. As a consequence, the
        naïve encoding is not likely to be trusted and thus creates its
        own disincentives for deployment.</t>

        <t>Nonetheless, this naïve encoding does present a clear mental
        model of how the ConEx protocol might function under various uses. It
        is useful for thought experiments where it can be stipulated that all
        participants are honest, and it does illustrate some of the incentives
        that might be introduced by ConEx.</t>
      </section>

      <section anchor="abstrmech_Null_Encoding" title="Null Encoding">
        <t>In limited contexts it is possible to implement ConEx-like
        functions without any signals at all by measuring rest-of-path
        congestion directly from TCP headers. The algorithm is to keep at
        least one RTT of past TCP headers and to match each new header against
        the history to count duplicate data.</t>
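
        <t>A minimal sketch of this duplicate-counting idea, for a single
        flow, is given below. It is illustrative only: the class name, the
        fixed history window (assumed to be at least one RTT) and the tuple
        layout are assumptions, and the sketch ignores sequence number
        wrap-around and segments whose headers cannot be seen.</t>

        <figure>
          <artwork><![CDATA[
```python
from collections import deque


class DuplicateCounter:
    """Hypothetical per-flow counter of retransmitted TCP bytes."""

    def __init__(self, window=0.5):
        self.window = window     # assumed history length, seconds
        self.history = deque()   # (time, seq_start, seq_end)
        self.duplicate_bytes = 0

    def observe(self, now, seq, length):
        """Record one data segment and count overlap with history."""
        # Forget headers older than the history window.
        cutoff = now - self.window
        while self.history and self.history[0][0] < cutoff:
            self.history.popleft()
        end = seq + length
        overlap = 0
        for _, old_seq, old_end in self.history:
            overlap += max(0, min(end, old_end) - max(seq, old_seq))
        # Cap so overlapping history entries are not double counted.
        self.duplicate_bytes += min(overlap, length)
        self.history.append((now, seq, end))
```
]]></artwork>
        </figure>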

        <t>This could implement many ConEx policies, without any explicit
        protocol. It is fairly easy to implement, at least at low rates (e.g.
        in a software-based edge router). However, it would only be useful in
        cases where the network operator can see the TCP headers. Such
        traffic is currently (2012) the vast majority because UDP, IPsec and
        VPN tunnels are used far less than SSL or TLS over TCP/IP, which do
        not hide TCP sequence numbers from network devices. However, anyone
        specifically intending to avoid the attention of a congestion policy
        device would only have to hide their TCP headers from the network
        operator (e.g. by using a VPN tunnel).</t>
      </section>

      <!-- ________________________________________________________________ -->

      <!---->

      <section anchor="abstrmech_ECN_Encoding" title="ECN Based Encoding">
        <t>The re-ECN specification <xref
        target="I-D.briscoe-conex-re-ecn-tcp"/> presents an encoding of ConEx
        in IPv4 and IPv6 that was tightly integrated with ECN encoding in
        order to fit into the IPv4 header. ConEx and ECN are orthogonal
        signals in the sense that any individual packet may need to represent
        any one of the 4 possible combinations of signal values. Ideally their
        encoding should be entirely independent. However, given the limited
        number of header bits and/or code points, re-ECN chooses to partially
        share code points and to re-echo both losses and ECN with just one
        codepoint.</t>

        <t>The central theme of the re-ECN work is an audit mechanism that
        provides sufficient disincentives against misrepresenting congestion
        <xref target="I-D.briscoe-conex-re-ecn-motiv"/>. It is analyzed
        extensively in Briscoe's PhD dissertation <xref target="Refb-dis"/>.
        For a tutorial background on re-ECN motivation and techniques, see
        [<xref format="counter" target="Re-fb"/>, <xref format="counter"
        target="FairerFaster"/>].</t>

        <t>Re-ECN is an example of one chosen set of compromises attempting to
        meet the requirements of <xref target="abstrmech_Requirements"/>. The
        present document takes a step back, aiming to state the ideal
        requirements in order to allow the Internet community to assess
        whether different compromises might be better.</t>

        <t>The problem with Re-ECN is that it requires receivers to be
        ECN-enabled, in addition to changes at the sender. Newer encodings <xref
        target="I-D.ietf-conex-destopt"/> overcome this problem by being able
        to represent loss and ECN-based congestion separately.</t>

        <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->
      </section>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Separate" title="Independent Bits">
        <t>This encoding involves flag bits, each of which the sender can set
        independently to indicate to the network one of the following four
        signals:<list style="hanging">
            <t hangText="ConEx (Not-ConEx)">The transport is (or is not) using
            ConEx with this packet (the protocol must be arranged so that
            legacy transport senders implicitly send Not-ConEx; see network
            layer encoding requirement L in <xref
            target="abstrmech_Secific_Constraints"/>)</t>

            <t hangText="Re-Echo-Loss (Not-Re-Echo-Loss)">The transport has
            (or has not) experienced a loss</t>

            <t hangText="Re-Echo-ECN (Not-Re-Echo-ECN)">The transport has (or
            has not) experienced ECN-signaled congestion</t>

            <t hangText="Credit (Not-Credit)">The transport is (or is not)
            building up congestion credit (see <xref
            target="abstrmech_Audit"/> on the audit function)</t>
          </list></t>

        <t>A packet with ConEx set and all three other flags cleared implies
        ConEx-Not-Marked.</t>

        <t>This encoding does not imply any exclusion property among the
        signals. Multiple types of congestion (ECN, loss) can be signalled on
        the same ACK. However, there will be many invalid combinations of
        flags (e.g. Not-ConEx combined with any of the ConEx-marked flags),
        which could be used to advantage against naive policy devices that
        only check each flag separately.</t>
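
        <t>The following sketch models the four flags abstractly and shows
        the kind of combined check a policy device would need. It is not a
        wire format. The class and method names are hypothetical, and the
        validity rule covers only the one invalid case named above (a
        marking flag set together with Not-ConEx).</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConExFlags:
    """Abstract flags; not a wire format."""
    conex: bool = False
    re_echo_loss: bool = False
    re_echo_ecn: bool = False
    credit: bool = False

    def _any_mark(self):
        return self.re_echo_loss or self.re_echo_ecn or self.credit

    def is_valid(self):
        """Reject any marking flag set without the ConEx flag."""
        return self.conex or not self._any_mark()

    def is_not_marked(self):
        """ConEx set with the three other flags cleared."""
        return self.conex and not self._any_mark()
```
]]></artwork>
        </figure>

        <t>Because the flags are checked together here, a packet with
        Not-ConEx and Credit both set is rejected rather than being counted
        as credit.</t>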

        <t>As long as the packets in a flow have uniform sizes, it does not
        matter whether the units of congestion are packets or bytes. However,
        if an application sends very irregular packet sizes, it may be
        necessary for the sender to mark multiple packets to avoid being in
        technical violation of the audit function.</t>
      </section>

      <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->

      <section anchor="abstrmech_Enumerated" title="Codepoint Encoding">
        <t>This encoding involves signaling one of the following five
        codepoints:</t>

        <t>ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN,
        Credit}</t>

        <t>Each named codepoint has the same meaning as in the encoding using
        independent bits in the previous section. The use of any one codepoint
        implies the negative of all the others.</t>

        <t>Inherently, the semantics of most of the enumerated codepoints are
        mutually exclusive. 'Credit' is the only one that might need to be
        used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
        that requirement is questionable. It must not be forgotten that the
        enumerated encoding loses the flexibility to signal these two
        combinations, whereas the encoding with four independent bits is not
        so limited. Alternatively two extra codepoints could be assigned to
        these two combinations of semantics. The comment in the previous
        section about units also applies.</t>
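
        <t>The loss of flexibility relative to independent bits can be
        illustrated with the sketch below, which maps four abstract flags
        onto the five codepoints. The numeric values, the function name
        to_codepoint and the choice to return None for an unrepresentable
        combination are all assumptions made for illustration.</t>

        <figure>
          <artwork><![CDATA[
```python
from enum import Enum


class ConExCodepoint(Enum):
    NOT_CONEX = 0           # values are arbitrary placeholders
    CONEX_NOT_MARKED = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


def to_codepoint(conex, loss=False, ecn=False, credit=False):
    """Map four independent flags onto the five codepoints.

    Returns None when the flags have no single codepoint, e.g.
    Credit combined with Re-Echo-Loss or Re-Echo-ECN.
    """
    marks = [
        cp for flag, cp in (
            (loss, ConExCodepoint.RE_ECHO_LOSS),
            (ecn, ConExCodepoint.RE_ECHO_ECN),
            (credit, ConExCodepoint.CREDIT),
        ) if flag
    ]
    if not conex:
        return None if marks else ConExCodepoint.NOT_CONEX
    if not marks:
        return ConExCodepoint.CONEX_NOT_MARKED
    return marks[0] if len(marks) == 1 else None
```
]]></artwork>
        </figure>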

        <!--{ToDo: Signal from Policer to Receiver to distinguish policy-induced drop from congestion-induced drop. 
Bob NIX this, it is not in scope. -MM}-->

        <!--Some might prefer to use the following colours respectively for each codepoint. 
The same colours as follows (with the omission of Purple) were used to describe re-ECN codepoints:
{Hmmm, I changed them above, I strongly prefer white to be unmarked ConEx enabled, and a non-color (blank?) to be non-conex.}
ENUM {White, Grey, Purple, Black, Green}.
-->
      </section>

      <!--  <section anchor="abstrmech_Explicit" title="Explicit Carriage">
<t>Add an IPv6 header option to explicitly indicate the number of marked bytes (or packets), with bits indicating marking type. expand.</t>


</section> -->

      <section anchor="abstrmech_Byte_Pkt"
               title="Units Implied by an Encoding">
        <t>The following comments apply generally to all the other
        encodings.</t>

        <t>Congestion can be due to exhaustion of bit-carrying capacity, or
        exhaustion of packet processing power. When a packet is discarded or
        marked to indicate congestion, there is no easy way to know whether
        the lost or marked packet signifies bit-congestion or
        packet-congestion. The above ConEx encodings that rely on marking
        packets suffer from the same ambiguity.</t>

        <t>This problem is most acute when audit needs to check that one count
        of markings matches another. For example, if there are ConEx markings
        on three large (1500B) packets, is that sufficient to match the loss
        of 5 small (60B) packets? If a packet-marking is defined to mean all
        the bytes in the packet are marked, then we have 4500B of ConEx-marked
        data against 300B of lost data, which is easily sufficient. If instead
        we are counting packets, then we have 3 ConEx packets against 5 lost
        packets, which is not sufficient. This problem will not arise when all
        the packets in a flow are the same size, but a choice needs to be made
        for flows in which packet sizes vary, such as BGP, SPDY and some
        variable-rate video encoding schemes.</t>
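
        <t>The arithmetic of this example can be reproduced with the short
        sketch below. The function name covers and the string values used to
        select the unit are hypothetical; the packet sizes are those of the
        example above.</t>

        <figure>
          <artwork><![CDATA[
```python
def covers(marked_sizes, lost_sizes, unit):
    """True if ConEx markings cover the losses in the given unit."""
    if unit == "bytes":
        return sum(marked_sizes) >= sum(lost_sizes)
    if unit == "packets":
        return len(marked_sizes) >= len(lost_sizes)
    raise ValueError("unit must be 'bytes' or 'packets'")


marked = [1500, 1500, 1500]   # three large ConEx-marked packets
lost = [60, 60, 60, 60, 60]   # five small lost packets

print(covers(marked, lost, "bytes"))    # compares 4500B with 300B
print(covers(marked, lost, "packets"))  # compares 3 with 5 packets
```
]]></artwork>
        </figure>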

        <t>Whether to use bytes or packets is not obvious. For instance, the
        most expensive links in the Internet, in terms of cost per bit, are
        all at lower data rates, where transmission times are large and packet
        sizes are important. In order for a policy to consider wire time, it
        needs to know the number of congested bytes. However, high-speed
        networking equipment and the transport protocols themselves sometimes
        gauge resource consumption and congestion in terms of packets.</t>

        <t>This document does not take a strong position on this issue.
        However, a ConEx encoding will need to explicitly specify whether it
        assumes units of bytes or packets consistently for both congestion
        indications and ConEx markings (see network layer requirement E in
        <xref target="abstrmech_Secific_Constraints"/>). It may help to refer
        to the guidance in <xref
        target="I-D.ietf-tsvwg-byte-pkt-congest"/>.</t>

        <t><xref target="I-D.ietf-tsvwg-byte-pkt-congest"/> advises that
        congestion indications should be interpreted in units of bytes when
        responding to congestion, at least on today's Internet. In any TCP
        implementation this is simple to achieve for packets of varying size,
        given that TCP SACK tracks losses in bytes. If an encoding is specified in
        units of bytes, the encoding should also specify which headers to
        include in the size of a packet (see network layer requirement F in
        <xref target="abstrmech_Secific_Constraints"/>).</t>
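
        <t>One possible way for a sender to account in bytes is sketched
        below, assuming it keeps a running count of congested bytes that have
        not yet been covered by ConEx markings. The class and method names
        are hypothetical, and the sketch is not taken from any ConEx
        transport specification.</t>

        <figure>
          <artwork><![CDATA[
```python
class ByteMarker:
    """Hypothetical sender-side counter of bytes still to mark."""

    def __init__(self):
        self.owed = 0  # congested bytes not yet ConEx-marked

    def on_loss(self, lost_bytes):
        """Add the size of a loss, e.g. as reported via SACK."""
        self.owed += lost_bytes

    def mark_next(self, packet_size):
        """Return True if the next data packet should be marked."""
        if self.owed <= 0:
            return False
        self.owed -= packet_size
        return True
```
]]></artwork>
        </figure>

        <t>With this sketch, the loss of one 1500B packet followed by a
        stream of 600B packets leads to three marked packets, leaving a
        remainder of -300B that is carried over towards the next loss.</t>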

        <!--
<t>We could require that a ConEx encoding specifies whether ConEx markings are in units of bytes or packets. But the problem is deeper than that: we do not even know whether congestion signals themselves (loss & ECN) are in units of bytes or packets. 
</t>


<t>Therefore a ConEx encoding SHOULD specify whether it assumes units of bytes or packets for both ConEx markings and for congestion indications.
</t>


<t><xref target="I-D.ietf-tsvwg-byte-pkt-congest" /> advises that congestion indications SHOULD be interpreted in units of bytes when responding to congestion, at least on today's Internet. In any TCP implementation this is simple to achieve for varying size packets, given TCP SACK tracks losses in bytes. 
</t>


<t>For example, to implement ConEx in bytes, the sender maintains a counter of outstanding bytes to be ConEx-marked. When the SACK options report the size of a loss, this is added to the counter, and whenever the counter is positive the next data packet is ConEx-marked and its size subtracted from the counter. Then, if one 1500B packet is lost, even if subsequent packets to be sent are all 600B, the sender will compensate by Conex-marking enough small packets. In this case, the sender will ConEx-mark the next three 600B packets before the counter goes negative (1500 - 3*600 = -300), which indicates that it has sent sufficient ConEx marked small packets to compensate for the lost large packet. It will hold over the negative remainder towards the next loss. As long as the remainder is kept negative, the ConEx markings will be on the safe side for audit purposes. 
</t>


<t>With TCP-ECN the sender knows the size in bytes of packets going out, but ECN feedback is in units of packets not bytes. In some TCP implementations, ECN markings are easy to convert to marked bytes, while in others it requires significant work. Therefore even if a ConEx encoding specifies that markings should be interpreted in bytes, it SHOULD allow implementers some leeway to approximate. Experiments with these approximations will determine whether they are sufficient for different patterns of packet size variations.
</t>


<t>If an encoding is specified in units of bytes, the encoding SHOULD also specify which headers to include in the size of a packet. Bit-congestion is caused by all the bits transmitted with packets, including lower layer frame headers, trailers etc. However, a transport endpoint cannot know the size of the frame header on a packet when it caused congestion at some other link in the Internet, or what size frame header will be used at the audit function. Therefore, it will be practical to define the size of a packet as including the layer 3 header that encapsulates the transport header associated with the ConEx transport sender, but not any more lower layer headers, nor any tunnel headers (which a transport is unlikely to be aware of anyway, because they will already have been stripped before the transport sees the segment). 
</t>


<t>It is appropriate to defer the definition of units to the (non-abstract) encoding specification, because this choice will need to be made in normative language, and the present document is only informative. It may seem that this could lead to interoperability problems if more than one encoding is specified. However, one encoding is unlikely to have to interact with another: the interactions between ConEx implementations in senders, policy devices and audit devices can only happen in the context of one encoding on the wire.
</t>
-->
      </section>
    </section>

    <!-- End of encoding -->

    <!-- ================================================================ -->

    <section anchor="abstrmech_ConEx_Components"
             title="Congestion Exposure Components">
      <t>This section describes in more detail the components shown in <xref
      target="abstrmech_Fig_ConEx_Placement"/>, as well as policy and
      audit.</t>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Network"
               title="Network Devices (Not modified)">
        <t>Congestion signals originate from network devices as they do today.
        A congested router, switch or other network device can discard or ECN
        mark packets when it is congested.</t>

        <!--
<section anchor="abstrmech_ECN_Changes" title="ECN Changes">
        <t>@@@ Move elsewhere </t>
          <t>Although the re-ECN protocol requires no changes to the network
          part of the ECN protocol, it is important to note that it does
          propose some relatively minor modifications to the host-to-host
          aspects of the ECN protocol specified in RFC 3168. They include:
          redefining the ECT(1) code point (the change is consistent with
          RFC3168 but requires deprecating the experimental ECN nonce <xref
          target="RFC3540"></xref>); modifications to the ECN negotiations
          carried on the SYN and SYN-ACK; and using a different state machine
          to carry ECN signals in the transport acknowledgments from a modified
          Receiver to the Sender. This last change is optional, but it permits the transport
          protocol to carry multiple congestion signals per round trip. It
          greatly simplifies accurate auditing, and is likely to be useful in other 
          transports, e.g. DCTCP <xref target="DCTCP" />.</t>


          <t>All of these adjustments to RFC 3168 may also be needed in a
          future standardized ConEx protocol. There will need to be very
          careful consideration of any proposed changes to ECN or other
          existing protocols, because any such changes increase the cost of
          deployment.</t>
        </section> -->
      </section>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Senders" title="Modified Senders">
        <t>The sending transport needs to be modified to send Congestion
        Exposure Signals in response to congestion feedback signals (e.g. for
        the case of a TCP transport see <xref
        target="I-D.ietf-conex-tcp-modifications"/>). We want to permit ConEx
        without ECN (e.g. if the receiver does not support ECN). However, we
        want to encourage a ConEx sender to at least attempt to negotiate ECN
        (a ConEx transport protocol spec may require this), because it is
        believed that ConEx without ECN is harder to audit, and thus
        potentially exposed to cheating. Since honest users have the potential
        to benefit from stronger mechanisms to manage traffic, they have an
        incentive to deploy ConEx and ECN together. This incentive is not
        sufficient to prevent a dishonest user from constructing (or
        configuring) a sender that enables ConEx after choosing not to
        negotiate ECN, but it should be sufficient to prevent this from being
        the sustained default case for any significant pool of users.</t>

        <t>Permitting ConEx without ECN is necessary to facilitate
        bootstrapping other parts of ConEx deployment.</t>
      </section>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Receivers"
               title="Receivers (Optionally Modified)">
        <t>Any receiving transport may already feed back sufficiently useful
        signals to the sender so that it does not need to be altered.</t>

        <t>If the transport receiver does not support ECN, then its native
        loss signaling mechanism (required for compliance with existing
        congestion control standards) will be sufficient for the Sender to
        generate ConEx signals.</t>

        <t>A traditional ECN implementation (RFC 3168 for TCP) signals
        congestion no more than once per round trip. The sender may require
        more precise feedback from the receiver; otherwise it is at risk of
        appearing to be understating its ConEx Signals.</t>

        <t>Ideally, ConEx should be added to a transport like TCP without
        mandatory modifications to the receiver. But an optional modification
        to the receiver could be recommended for precision (see <xref
        target="I-D.tcpm-accurate-ecn"/>). This is based on the approach
        originally taken when adding re-ECN to TCP <xref
        target="I-D.briscoe-conex-re-ecn-tcp"/>.</t>
      </section>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Policy_Devices" title="Policy Devices">
        <t>Policy devices are characterised by a need to be configured with a
        policy related to the users or neighboring networks being served. In
        contrast, auditing devices solely enforce compliance with the ConEx
        protocol and do not need to be configured with any client-specific
        policy.</t>

        <t>One of the design goals of the ConEx protocol is that none of the
        important policy mechanisms requires per-flow state, and that policy
        mechanisms can even be implemented for heavily aggregated traffic in
        the core of the Internet with complexity akin to accumulating marking
        volumes per logical link. Of course, policy mechanisms may sometimes
        choose to focus down on individual flows, but ConEx aims to make
        aggregate policy devices feasible.</t>

        <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->

        <section anchor="abstrmech_Other_Policy"
                 title="Congestion Monitoring Devices">
          <t>Policy devices can typically be decomposed into two functions:
          (i) monitoring the ConEx signal to compare it with a policy, then
          (ii) acting in some way on the result. Various actions might be invoked
          against 'out of contract' traffic, such as policing (see <xref
          target="abstrmech_Policers"/>), re-routing, or downgrading the class
          of service.</t>

          <t>Alternatively a policy device might not act directly on the
          traffic, but instead report to management systems that are designed
          to control congestion indirectly. For instance, the reports might
          be used to trigger capacity upgrades or penalty clauses in
          contracts, to levy charges based on congestion, or merely to send
          warnings to clients who are causing excessive congestion.</t>

          <t>Nonetheless, whatever action is invoked, the congestion
          monitoring function will always be a necessary part of any policy
          device.</t>
        </section>

        <section anchor="abstrmech_RoP_Monitoring"
                 title="Rest-of-Path Congestion Monitoring">
          <t>ConEx signals indicate the level of congestion along a whole path
          from source to destination. In contrast, ECN signals monitored in
          the middle of a network indicate the level of congestion experienced
          so far on the path (of course, only in ECN-capable traffic).</t>

          <t>If a monitor in the middle of a network (e.g. at a network
          border) measures both of these signals, it can subtract the level of
          ECN (path so far) from the level of ConEx (whole path) to derive a
          measure of the congestion that packets are likely to experience
          between the monitoring point and their destination (rest-of-path
          congestion).</t>
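
          <t>The subtraction described above can be sketched as follows. The
          function name and its arguments are hypothetical. The volumes are
          assumed to be byte counts taken at one monitoring point over the
          same interval, and, as noted above, the result is only meaningful
          for ECN-capable traffic.</t>

          <figure>
            <artwork><![CDATA[
```python
def rest_of_path(conex_bytes, ecn_bytes, total_bytes):
    """Estimate rest-of-path congestion as a fraction of traffic.

    conex_bytes: ConEx-marked volume (whole-path congestion)
    ecn_bytes:   ECN-marked volume seen here (path so far)
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    whole_path = conex_bytes / total_bytes
    upstream = ecn_bytes / total_bytes
    # Clamp at zero: measurement noise can make this negative.
    return max(0.0, whole_path - upstream)
```
]]></artwork>
          </figure>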

          <t>It will often be preferable for policy devices to monitor
          rest-of-path congestion if they can, because it is a measure of the
          downstream congestion that the policy device can directly influence
          by controlling the traffic passing through it.</t>

          <!--
        <t>A monitor cannot use ConEx to reliably measure upstream congestion if it is
              signaled by losses rather than ECN. Therefore a monitor can only
              accurately measure rest-of-path congestion if it ignores traffic
              from non-ECN-capable transports (Not-ECT) and if the congested
              queues upstream of the monitor are ECN-enabled.</t>
-->
        </section>

        <section anchor="abstrmech_Policers" title="Congestion Policers">
          <t>A congestion policer can be implemented in a very similar way to
          a bit-rate policer, but its effect can be focused solely on traffic
          of users causing congestion downstream, which ConEx signals make
          visible. Without ConEx signals, the only way to mitigate congestion
          is to blindly limit traffic bit-rate, on the assumption that high
          bit-rate is more likely to cause congestion.</t>

          <t>A congestion policer monitors all ConEx traffic entering a
          network, or some identifiable subset. Using ConEx signals and/or
          Credit signals (and preferably subtracting ECN signals to yield
          rest-of-path congestion), it measures the amount of congestion that
          this traffic is contributing somewhere downstream. If this
          persistently exceeds a policy-configured 'congestion-bit-rate', the
          congestion policer can limit all the monitored ConEx traffic.</t>

          <!-- Should we give this example here, or rely on the definition of congestion-bit-rate in conex-concepts-uses? -->

          <!--              <t>Downstream congestion-bit-rate is the bit-rate of only those packets that are ConEx marked. For instance an allowed congestion-bit-rate of 100kb/s would allow traffic to flow at 10Mb/s into 1% congestion or 100Mb/s into 0.1% congestion.</t> -->

          <t>A congestion policer can be implemented by a simple token bucket
          applied to an aggregate. But unlike a bit-rate policer, it removes
          tokens only when it forwards packets that are ConEx-Marked and/or
          Credit-Marked, effectively treating Not-ConEx-Marked packets as
          invisible. Consequently, because tokens give the right to send
          congested bits, the fill-rate of the token bucket will represent the
          allowed congestion-bit-rate. This should provide sufficient traffic
          management without having to additionally constrain the straight
          bit-rate at all. See <xref target="I-D.briscoe-conex-policing"/> for
          details.</t>
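
          <t>A minimal sketch of such a token bucket is given below. It is
          illustrative only: the class name, the units (bytes and seconds),
          the caller-supplied clock and the choice to let the bucket go into
          debt are assumptions, not part of any ConEx specification.</t>

          <figure>
            <artwork><![CDATA[
```python
class CongestionPolicer:
    """Hypothetical token bucket metering congested bytes."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congested bytes/second
        self.depth = depth          # bucket depth in bytes
        self.tokens = depth
        self.last = 0.0

    def forward(self, now, size, marked):
        """Return True if the aggregate is within its allowance.

        'marked' is True for ConEx-Marked or Credit-Marked packets.
        """
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.depth,
                          self.tokens + elapsed * self.fill_rate)
        if marked:
            # Only marked packets consume tokens.
            self.tokens -= size
        return self.tokens >= 0
```
]]></artwork>
          </figure>

          <t>In this sketch, unmarked packets never drain the bucket, so a
          user sending at a high bit-rate into an uncongested path is not
          limited.</t>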

          <t>Note that the policing action could be to introduce a throttle
          (discard some traffic) immediately upstream of the congestion
          monitor. Alternatively, this throttle could introduce delay using a
          queue with its own AQM, which potentially increases the whole-path
          congestion. In effect the congestion policer has moved the
          congestion earlier in the path, and focused it on one user to
          protect downstream resources by reducing the congestion in the rest
          of the path.</t>
        </section>
      </section>

      <!-- ________________________________________________________________ -->

      <section anchor="abstrmech_Audit" title="Audit">
        <t>The most critical aspect of ConEx is the capability to support
        robust auditing. It can be assumed that there will be an intrinsic
        motivation for users to understate the congestion that they are
        causing. Without strong audit functions the ConEx signal is likely to
        become understated to the point of being useless. The most important
        feature of an encoding design is likely to be the robustness of the
        auditing it supports.</t>

        <t>The general approach is to compare the volume of ConEx signals to
        direct measures of actual congestion volume observed in ConEx-enabled
        traffic. The credit approach described in <xref
        target="abstrmech_Credit_Simple_Audit"/> can be used to guarantee that
        this is a strict bound: if the actual congestion exceeds the ConEx
        signal, then some congestion was understated and some sanction should
        be applied to the traffic. Although sanctions are beyond the scope of
        this document, an example sanction might be to throttle the traffic
        immediately upstream of the auditor to prevent the user from getting
        any advantage by understating congestion. Such a throttle would likely
        include some combination of delaying or dropping traffic.</t>
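
        <t>The comparison at the heart of this approach can be sketched as a
        running balance. The class and method names below are hypothetical.
        The sketch assumes byte units and omits practical matters such as
        expiry of old credit, flow start-up and the choice of sanction.</t>

        <figure>
          <artwork><![CDATA[
```python
class SimpleAuditor:
    """Hypothetical audit balance for one flow or aggregate."""

    def __init__(self):
        self.balance = 0  # ConEx-signaled minus observed, bytes

    def on_conex(self, marked_bytes):
        """ConEx-marked (Re-Echo or Credit) bytes seen."""
        self.balance += marked_bytes

    def on_congestion(self, congested_bytes):
        """Lost or ECN-marked bytes observed by the auditor."""
        self.balance -= congested_bytes

    def understated(self):
        """True if observed congestion exceeds ConEx signals."""
        return self.balance < 0
```
]]></artwork>
        </figure>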

        <t>A ConEx auditor might use one of the following techniques:<list
            style="hanging">
            <t hangText="Generic loss auditing:">For congestion signaled by
            loss, totally accurate auditing is not believed to be possible in
            the general case, because it involves a network node detecting the
            absence of some packets, when it cannot always identify
            retransmissions or missing packets. The missing packet
            might simply be taking a different route, or the IP payload may be
            encrypted. <vspace blankLines="1"/> It is for this reason that it
            is desirable to motivate the deployment of ECN, even though ECN is
            not strictly required for ConEx.</t>

            <t hangText="ECN auditing:">Directly observe and compare the
            volume of ECN and ConEx marks. Since the volume of ECN marks rises
            monotonically along a path, ECN auditing is most accurate when
            located near the transport receiver. For this reason ECN should be
            monitored downstream of the predominant bottleneck.</t>

            <t hangText="TCP-specific loss auditing:">For non-encrypted
            standard TCP traffic on a single path, an auditor could measure
            losses by detecting retransmissions, which appear as duplicate
            sequence numbers upstream of the loss and out-of-order data
            downstream of the loss. Since some reordering is present in the
            Internet, such a loss estimator would be most accurate near the
            sender. Such an audit device should treat non-ECN-capable packets
            with encrypted IP payload as Not-ConEx, even if they claim to be
            ConEx-capable, unless the operator knows it is also using one of
            the other two techniques below that can audit such packets against
            losses.</t>

            <t hangText="Predominant bottleneck loss auditing:">For networks
            designed so that losses predominantly occur under the control of
            one IP-aware bottleneck node on the path, the auditor could be
            located at this bottleneck. It could simply compare ConEx Signals
            with actual local packet discards (and ECN marks). This is a good
            model for most consumer access networks where audit accuracy could
            well be sufficient even if losses occasionally occur elsewhere in
            the network. <vspace blankLines="1"/> Although the auditor at the
            predominant bottleneck would not be able to count losses at other
            nodes, transports would not know where losses were occurring
            either. Therefore a transport would not know which losses it could
            understate without getting caught and which it could not.</t>

            <t hangText="ECN tunnel loss auditing:">A network operator can
            arrange IP-in-IP tunnels (or IP-in-MPLS etc.) so that any losses
            within the tunnels are deferred until the tunnel egress. Then the
            audit function can be deployed at the egress and be aware of all
            losses. This is possible by enabling ECN marking on switches and
            routers within a tunnel, irrespective of whether end-systems
            support ECN, by exploiting a side-effect of the way tunnels handle
            the ECN field. After encapsulation at the tunnel ingress, the
            network should arrange for any non-ECN packets (with '00' in the
            ECN field of the outer header) to be set to the ECN-capable
            transport (ECT(0)) codepoint. Then, if they experience congestion
            at one of the ECN-capable switches or routers within the tunnel,
            some will be ECN-marked rather than immediately dropped. However,
            when the tunnel decapsulator strips the outer header from such an
            ECN-marked packet, if it finds the inner header has '00' in the
            ECN field
            (meaning that the endpoints do not support ECN) it will
            automatically drop the packet, assuming it complies with <xref
            target="RFC6040"/>. Thus, an audit function at the decapsulator
            can know which packets would have been dropped within the tunnel
            (and even which are genuinely ECN-marked for the end-to-end
            protocol). Non-ECN end-systems outside the tunnel see no sign of
            the use of ECN internally.</t>
          </list></t>
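
        <t>As a non-normative illustration of the ECN tunnel loss auditing
        technique in the list above, the following Python sketch shows how an
        audit function co-located with a tunnel decapsulator might classify
        packets. The class, method and counter names are hypothetical. Only
        the subset of <xref target="RFC6040"/> decapsulation behaviour
        described above is modelled: a congestion experienced (CE) mark on the
        outer header over a '00' (Not-ECT) inner header leads to a drop, which
        the sketch counts as a loss deferred to the egress.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass

# ECN field codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class EgressAudit:
    """Hypothetical audit counters kept at a tunnel decapsulator."""
    deferred_losses: int = 0  # CE outer over Not-ECT inner: dropped here
    ecn_marks: int = 0        # CE outer over ECN-capable inner: forwarded

    def decapsulate(self, outer_ecn: int, inner_ecn: int) -> bool:
        """Return True if the inner packet is forwarded, False if dropped.

        Simplification: any outer codepoint other than CE is treated as
        "no congestion signalled within the tunnel".
        """
        if outer_ecn != CE:
            return True
        if inner_ecn == NOT_ECT:
            # The endpoints do not support ECN, so the mark applied inside
            # the tunnel becomes a drop at the egress, visible to audit.
            self.deferred_losses += 1
            return False
        # Genuine ECN mark for the end-to-end protocol.
        self.ecn_marks += 1
        return True


audit = EgressAudit()
audit.decapsulate(ECT0, NOT_ECT)   # unmarked in the tunnel: forwarded
audit.decapsulate(CE, NOT_ECT)     # marked in the tunnel: dropped, counted
audit.decapsulate(CE, ECT0)        # marked, ECN-capable flow: forwarded
```
]]></artwork>
        </figure>

        <t>An auditor built along these lines could then compare the two
        counters against the ConEx signals it observes for the same
        traffic.</t>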

        <t>In addition, other audit techniques may be identified in the
        future.</t>

        <t><xref target="Refb-dis"/> gives a comprehensive inventory of
        attacks against audit proposed by various people. It includes
        pseudocode for both deterministic and statistical audit functions
        designed to thwart these attacks and analyses the effectiveness of an
        implementation. Although this work is specific to the re-ECN protocol,
        most of the material is useful for designing and assessing audit of
        other specific ConEx encodings, against both ECN and loss.</t>

        <t>The auditing function should be able to trigger sufficient sanction
        to discourage understating congestion <xref target="Salvatori05"/>.
        This seems to require designing the sanction in concert with the
        policy functions, even though they might be implemented in different
        parts of the network. However, <xref target="Refb-dis"/> proves audit
        and policy functions can be independent as long as audit drops
        sufficient traffic to 'normalise' actual congestion signals to be no
        greater than ConEx signals.</t>

        <t>Similarly, the job of incentivising the sending of ConEx-enabled
        packets belongs solely to policy devices, independent of the audit
        function. The audit function's job is policy-neutral, so it should be
        solely confined to checking for correctness within those packets that
        have been marked as ConEx-capable. Even if there are Not-ConEx packets
        mixed with ConEx packets within a flow, audit will not need to monitor
        any Not-ConEx packets.</t>

        <t>Note that in the future it might prove to be desirable to provide
        advice on uniformly implementing sanctions, because otherwise
        insufficient sanctions could impair the ability to implement policy
        elsewhere in the network.</t>

        <t>Some of the audit algorithms require per-flow state. This cost is
        expected to be tolerable, because these techniques are most
        appropriate near the edges of the network, where traffic is generally
        much less aggregated, so the state need not overwhelm any one device.
        The audit function creates the flow-state it requires as it detects
        new flows. Therefore a flow will not fail if it is re-routed away from
        the audit box currently holding its flow-state, so auditing does not
        require route pinning and works with multipath flows.</t>

        <t>Holding flow-state seems to create a vulnerability to attacks that
        exhaust the auditor's memory by opening numerous new short flows. The
        audit function can protect itself from this attack by not allocating
        new flow-state unless a ConEx-marked packet arrives (e.g. credit at
        the start of a flow). Because policy devices rate limit ConEx-marked
        packets, this sets a natural limit to the rate at which a source can
        create flow-state in audit devices.</t>
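
        <t>The self-protection rule in the previous paragraph can be sketched
        as follows. This is a minimal illustration only; the class name, the
        flow identifier and the per-flow record (here just a count of
        ConEx-marked packets) are assumptions made for the example and are not
        part of any ConEx encoding.</t>

        <figure>
          <artwork><![CDATA[
```python
class FlowTable:
    """Hypothetical audit flow table that allocates state lazily."""

    def __init__(self):
        # flow_id -> number of ConEx-marked packets seen (assumed record)
        self.flows = {}

    def on_packet(self, flow_id, conex_marked: bool) -> bool:
        """Return True if the flow has audit state after this packet."""
        if flow_id in self.flows:
            if conex_marked:
                self.flows[flow_id] += 1
            return True
        if not conex_marked:
            # No state is allocated until a ConEx-marked packet arrives,
            # so unmarked short flows cannot exhaust the table.
            return False
        self.flows[flow_id] = 1
        return True


table = FlowTable()
table.on_packet("flow-a", conex_marked=False)  # ignored: no state created
table.on_packet("flow-a", conex_marked=True)   # state created here
```
]]></artwork>
        </figure>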

        <t>Auditing can be distributed and redundant. One flow may be audited
        in multiple places, using multiple techniques. Some audit techniques
        do not require any per-flow state and can be applied to aggregate
        traffic. These might be able to detect the presence of understated
        congestion at large scale and support recursively hunting for
        individual flows that are understating their congestion. Even at large
        scales, flows can be randomly selected for individual auditing.</t>

        <t>Sampling techniques can also be used to bound the total auditing
        memory footprint, although the implementer must be wary of "identifier
        whitewashing" tactics, where a source cheats until caught by sampling,
        then simply discards that flow ID and starts cheating with a new
        one.</t>

        <section anchor="abstrmech_Credit_Simple_Audit"
                 title="Using Credit to Simplify Audit">
          <!-- <t> add the idea that credit is an estimate: there is a trade off between requiring a strict bound on ConEx with an extremely conservative credit estimate or a statistical ConEx audit with a measured sanction.</t> -->

          <t>At the audit function, there will be an inherent delay of at
          least one round trip between a congestion signal and the subsequent
          ConEx signal it triggers, as shown in <xref
          target="abstrmech_Fig_ConEx_Placement"/>. However, the audit
          function cannot be expected to wait for a round trip to check that
          one signal balances the other, because that requires excessive state
          and the auditor cannot easily determine the RTT of each flow.</t>

          <t>The simplest mechanism to compensate for the round trip delay
          between the signals is to have the sender include a "credit" signal
          to cover the yet to be observed congestion that might occur during
          this delay. The transport signals sufficient credit in advance to
          cover congestion expected during its feedback delay. Then, the audit
          function does not need to make allowance for round trip delays that
          it cannot quantify. This design choice correctly makes the transport
          responsible for both minimizing feedback delay and for the risk that
          packets in flight will cause congestion to others before the source
          can react.</t>

          <t>Making the source responsible for allowing for the round trip
          delay in ConEx signals is a design choice that needs to be
          consistently applied, as is the question of whether Credit markings
          continue to maintain their value for the duration of a flow or
          expire or degrade over time. Any such requirements should be defined
          in a particular ConEx encoding specification (see network layer
          encoding requirement D in <xref
          target="abstrmech_Secific_Constraints"/>).</t>

          <t>For example, imagine that the audit function makes the transport
          responsible for round trip delays by keeping a running account of
          two balances: a) a first balance between credit signals, which it
          counts as positive, and actual congestion signals (loss or ECN),
          which it counts as negative; and b) a second balance between ConEx
          signals, which it counts as positive, and all but the most recent
          congestion signals (loss or ECN), which it counts as negative. If
          audit punishes a flow as soon as either of these two balances goes
          negative, the source will be forced to 'pre-load' some credit
          markings at the start of a flow, as well as continually replenishing
          both credit and ConEx signals in response to actual congestion. Then
          the audit function can immediately start punishing a flow, without
          any grace period, as soon as the credit balance goes negative.</t>
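
          <t>The two-balance example above might be sketched as follows. This
          is an illustration under stated assumptions, not a specified audit
          algorithm: signals are counted in abstract units, the names are
          hypothetical, and the parameter 'recent' (how many of the latest
          congestion signals the second balance ignores) is an assumption
          standing in for "all but the most recent congestion signals".</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class FlowAudit:
    """Hypothetical per-flow audit record keeping two running balances."""
    recent: int = 1      # assumed allowance for the most recent congestion
    credit: int = 0      # credit signals seen (positive in balance a)
    conex: int = 0       # ConEx signals seen (positive in balance b)
    congestion: int = 0  # actual congestion signals seen (loss or ECN)

    def on_credit(self, n: int = 1) -> None:
        self.credit += n

    def on_conex(self, n: int = 1) -> None:
        self.conex += n

    def on_congestion(self, n: int = 1) -> None:
        self.congestion += n

    def credit_balance(self) -> int:
        # Balance a): credit minus all actual congestion signals.
        return self.credit - self.congestion

    def conex_balance(self) -> int:
        # Balance b): ConEx signals minus all but the most recent
        # congestion signals.
        return self.conex - max(0, self.congestion - self.recent)

    def should_sanction(self) -> bool:
        # Punish as soon as either balance goes negative.
        return self.credit_balance() < 0 or self.conex_balance() < 0


flow = FlowAudit(recent=1)
flow.on_credit(2)        # source pre-loads credit at flow start
flow.on_congestion()     # first congestion signal: still covered
flow.on_congestion()     # second signal: the first is no longer "recent"
needs_reply = flow.should_sanction()
flow.on_conex()          # source re-echoes the earlier congestion
```
]]></artwork>
          </figure>

          <t>In this sketch a flow that sends no credit is sanctioned at its
          first congestion signal, which is the behaviour that forces a source
          to pre-load credit.</t>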

          <t>This approach also ensures that a source has to 'pay' up-front
          for the risk of subjecting others to congestion. Then, a congestion
          policer can stop traffic from a source that is taking too much risk
          (e.g. opening too many large initial windows) before it enters the
          network and causes any actual harm.</t>
        </section>
      </section>

      <!-- end of audit -->
    </section>

    <!-- end of elements -->

    <!-- ================================================================ -->

    <section anchor="abstrmech_Incr_Deploy"
             title="Support for Incremental Deployment">
      <t>The ConEx abstract protocol described so far is intended to support
      incremental deployment in every possible respect. For convenience, the
      following list collects together all the features of ConEx that support
      incremental deployment, and points to further information on each:<list
          style="hanging">
          <t hangText="Packets:">The wire protocol encoding allows each packet
          to indicate whether it is using ConEx or not (see <xref
          format="default" target="abstrmech_Representing_ConEx"/> on <xref
          format="title" target="abstrmech_Representing_ConEx"/>).</t>

          <t hangText="Senders:">ConEx requires a modification to the source
          in order to send ConEx packet markings (see <xref
          target="abstrmech_Senders"/>). Although ConEx support can be
          indicated on a packet-by-packet basis, it is likely that all the
          packets in a flow will either consistently support ConEx or
          consistently not. It is also likely that, if the implementation of a
          transport protocol supports ConEx, all the packets sent from that
          host using that protocol will be ConEx marked. <vspace
          blankLines="1"/>The implementations of some of the transport
          protocols on a host might not support ConEx (e.g. the implementation
          of DNS over UDP might not support ConEx, while perhaps RTP over UDP
          and TCP will). Any non-upgraded transports and non-upgraded hosts
          will simply continue to send regular Not-ConEx packets as
          always.<vspace blankLines="1"/>A network operator can create
          incentives for senders to voluntarily reveal ConEx information (see
          the item on incremental deployment by 'Networks' below).</t>

          <t hangText="Receivers:">A ConEx source should be able to work
          without a modified receiver. However, without sufficiently precise
          congestion feedback from the receiver, the source may have to
          conservatively send extra ConEx markings in order to avoid
          understating congestion. The need for more precise receiver feedback
          is not exclusive to ConEx; for instance, Data Center TCP (DCTCP <xref
          target="DCTCP"/>) uses precise feedback to good effect. Nonetheless,
          if a receiver offers precise feedback <xref
          target="I-D.tcpm-accurate-ecn"/>, it will be best if ConEx uses it
          (see <xref target="abstrmech_Receivers"/>).</t>

          <t hangText="Proxies:">Although it was stated above that ConEx
          requires a modification to the source, ConEx signals could
          theoretically be introduced by a proxy for the source, as long as it
          can intercept feedback from the receiver. Similarly, more precise
          feedback could theoretically be provided by a proxy for the receiver
          rather than modifying the receiver itself.</t>

          <t hangText="Forwarding:">No modification to forwarding or queuing
          is needed for ConEx.<vspace blankLines="1"/> However, once ConEx is
          deployed, it is possible that a queue implementation could
          optionally take advantage of the ConEx information in packets. For
          instance, it has been suggested <xref
          target="I-D.briscoe-conex-re-ecn-tcp"/> that a queue would be more
          robust against flooding if it preferentially discarded Not-ConEx
          packets first, then Not-Marked ConEx packets.<vspace blankLines="1"/>A
          ConEx sender re-echoes congestion whether the queues signaling
          congestion are ECN-enabled or not. Nonetheless, auditing works best
          if most congestion is indicated by ECN rather than loss (see <xref
          target="abstrmech_Requirements"/>). Also, monitoring rest-of-path
          congestion is not accurate if there are congested non-ECN queues
          upstream of the monitoring point (<xref
          target="abstrmech_RoP_Monitoring"/>).</t>

          <t hangText="Networks:">If a subset of traffic sources (or proxies)
          use ConEx signals to reveal congestion in the internetwork layer, a
          network operator can choose (or not) to use this information for
          traffic management. As long as the end-to-end ConEx signals are
          present, each network can unilaterally choose to use them,
          independently of whether other networks do. <vspace
          blankLines="1"/>ConEx marked packets may safely traverse a network
          that ignores them. ConEx signals are defined to remain unchanged
          once set by the sender, but some encodings may allow changes in
          transit (e.g. by proxies). In no circumstances will a network node
          change ConEx marked packets to Not-ConEx (network layer encoding
          requirement I in <xref target="abstrmech_Secific_Constraints"/>). If
          necessary, endpoints should be able to detect if a network is
          removing ConEx signals (network layer encoding requirement H in
          <xref target="abstrmech_Secific_Constraints"/>). <vspace
          blankLines="1"/> An operator can deploy policy devices (<xref
          target="abstrmech_Policy_Devices"/>) wherever traffic enters its
          network, in order to monitor the downstream congestion that incoming
          traffic contributes to, and control it if necessary. A network
          operator can create incentives for the developers of sending
          applications and transports to voluntarily reveal ConEx information.
          Without ConEx information, a network operator tends to have to limit
          the bit-rate or volume from a site more than is necessary, just in
          case it might congest others. With ConEx information, the operator
          can limit only congestion-causing traffic, and otherwise allow
          complete freedom. This greater freedom acts as an inducement for the
          source to volunteer ConEx information. An operator may also monitor
          whether a source transport has sent ConEx packets, and treat the
          same transport with greater suspicion (e.g. a more stringent
          rate-limit) whenever it selectively sends packets without ConEx
          support. See <xref target="RFC6789"/> for further discussion of
          deployment incentives for networks and references to scenarios where
          some networks use ConEx-based policy devices and others
          don't.<vspace blankLines="1"/> An operator can deploy audit devices
          (<xref target="abstrmech_Audit"/>) unilaterally within its own
          network to verify that traffic sources are not understating ConEx
          information. A given network operator (say N_a) only cares that the
          level of ConEx signaling is sufficient to cover congestion in its
          own network. If traffic continues into a
          congested downstream network (say N_b), it is of no concern to the
          first network (N_a) if the end-to-end ConEx signaling is
          insufficient to cover the congestion in N_b as well. This is N_b's
          concern, and N_b can both detect such anomalous traffic and deal
          with it using ConEx-based policy devices (<xref
          target="abstrmech_Policy_Devices"/>).</t>

          <!--{Network N_b can make it in N_a's interest to deal with congestion at source,
by including ConEx metrics in the traffic contract betwen them. However, in the absence of such contracts,
ConEx audit and policy devices can still be usefully deployed by each network operator unilaterally.}-->
        </list></t>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>

      <t>Note to RFC Editor: this section may be removed on publication as an
      RFC.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_Sec_Consider" title="Security Considerations">
      <t>The only known risk associated with ConEx is that users and
      applications are very likely to be motivated to under-represent the
      congestion that they are causing. Significant portions of this document
      are about mechanisms to audit the ConEx signals and create sufficient
      sanction to inhibit such under-representation. In particular see <xref
      target="abstrmech_Audit"/>.</t>

      <t>Security attacks and their defences are best discussed against a
      concrete protocol specification, not the abstract mechanism of this
      document. A concrete ConEx protocol will need to be accompanied by a
      document describing how the protocol and its audit mechanisms defend
      against likely attacks. <xref target="Refb-dis"/> will be a useful
      source for such a document. It gives a comprehensive inventory of
      attacks against audit that have been proposed by various parties. It
      includes pseudocode for both deterministic and statistical audit
      functions designed to thwart these attacks and analyses the
      effectiveness of an implementation.</t>

      <t>However, <xref target="Refb-dis"/> is specific to the re-ECN
      protocol, which signalled ECN and loss together, whereas ConEx signals
      them separately. Therefore, although likely attacks will be similar,
      there will be more combinations of attacks to worry about, and defences
      and their analysis are likely to be a little different for ConEx.</t>

      <t>The main known attacks that a security document for a concrete ConEx
      protocol will need to address are listed below, and <xref
      target="Refb-dis"/> should be referred to for how re-ECN was designed to
      defend against similar attacks: <list style="symbols">
          <t>Attacks on the audit function (see Section 7.5 of <xref
          target="Refb-dis"/>): <list style="hanging">
              <t hangText="Flow ID Whitewashing: ">Designing the audit
              function so that a source cannot gain from starting a new flow
              once audit has detected cheating in a previous flow.</t>

              <t hangText="Dragging Down an Aggregate: ">Ensuring that the
              audit function does not discard packets from all flows within
              an aggregate. Otherwise, one flow could pull down the average
              so that the audit function would discard packets from all
              flows, not just the offending flow.</t>

              <t hangText="Dragging Down a Spoofed Flow ID: ">An attacker
              understates ConEx markings in packets that spoof another flow,
              which fools the audit function into dropping the genuine user's
              packets.</t>
            </list></t>

          <t>Attacks by networks on other networks (see Section 8.2 of <xref
          target="Refb-dis"/>): <list style="hanging">
              <t hangText="Dummy Traffic: ">Sending dummy traffic across a
              border with understated ConEx markings to bring down the average
              ConEx markings in the aggregate of border traffic. This attack
              can be combined with a TTL that expires before the packets reach
              an audit function.</t>

              <t
              hangText="Signal Poisoning with 'Cancelled' Marking: ">Sending
              high volumes of valid packets that are both ConEx-Marked and
              ECN-Marked, which seems to represent congestion upstream but
              makes these packets immune to being further ECN-Marked
              downstream.</t>
            </list></t>
        </list></t>

      <t>It is planned to document all known attacks and their defences
      (including all the above) in the RFC series against a concrete ConEx
      protocol specification. In the interim, <xref target="Refb-dis"/> and
      its references should be referred to for details and ways to address
      these attacks in the case of re-ECN.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_Acknowledgements" title="Acknowledgements">
      <t>This document was improved by review comments from Toby Moncaster,
      Nandita Dukkipati, Mirja Kuehlewind, Caitlin Bestler, Marcelo Bagnulo
      Braun, John Leslie, Ingemar Johansson and David Wagner.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="abstrmech_Comments_Solicited" title="Comments Solicited">
      <t>Comments and questions are encouraged and very welcome. They can be
      addressed to the IETF Congestion Exposure (ConEx) working group mailing
      list &lt;conex@ietf.org&gt;, and/or to the authors.</t>
    </section>
  </middle>

  <back>
    <!-- ================================================================ -->

    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>
    </references>

    <references style="hanging" title="Informative References">
      <?rfc include='reference.RFC.3514'?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.5348'?>

      <?rfc include='reference.RFC.5681'?>

      <?rfc include='reference.RFC.6789'?>

      <?rfc include='reference.RFC.6040'?>

      <?rfc include='reference.RFC.6817'?>

      <?rfc include='reference.I-D.briscoe-conex-re-ecn-tcp'?>

      <?rfc include='reference.I-D.ietf-tsvwg-byte-pkt-congest'?>

      <?rfc include='reference.I-D.briscoe-conex-re-ecn-motiv'?>

      <?rfc include='reference.I-D.briscoe-conex-policing'?>

      <?rfc include='reference.I-D.ietf-conex-destopt'?>

      <?rfc include='reference.I-D.ietf-conex-tcp-modifications'?>

      <?rfc include='reference.I-D.kuehlewind-tcpm-accurate-ecn-02'?>

      <reference anchor="DCTCP"
                 target="http://portal.acm.org/citation.cfm?id=1851192">
        <front>
          <title>Data Center TCP (DCTCP)</title>

          <author fullname="Mohammad Alizadeh" initials="M" surname="Alizadeh"/>

          <author fullname="Albert Greenberg" initials="A" surname="Greenberg">
            <organization/>
          </author>

          <author fullname="David A. Maltz" initials="D.A." surname="Maltz">
            <organization/>
          </author>

          <author fullname="Jitendra Padhye" initials="J" surname="Padhye">
            <organization/>
          </author>

          <author fullname="Parveen Patel" initials="P" surname="Patel">
            <organization/>
          </author>

          <author fullname="Balaji Prabhakar" initials="B" surname="Prabhakar">
            <organization/>
          </author>

          <author fullname="Sudipta Sengupta" initials="S" surname="Sengupta">
            <organization/>
          </author>

          <author fullname="Murari Sridharan" initials="M" surname="Sridharan">
            <organization/>
          </author>

          <date month="October" year="2010"/>
        </front>

        <seriesInfo name="ACM SIGCOMM CCR" value="40(4)63--74"/>

        <format target="http://ccr.sigcomm.org/drupal/files/p63_0.pdf"
                type="PDF"/>
      </reference>

      <reference anchor="Refb-dis"
                 target="http://bobbriscoe.net/projects/refb/#refb-dis">
        <front>
          <title>Re-feedback: Freedom with Accountability for Causing
          Congestion in a Connectionless Internetwork</title>

          <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
            <organization>BT &amp; UCL</organization>
          </author>

          <date month="" year="2009"/>
        </front>

        <seriesInfo name="UCL PhD Dissertation" value=""/>

        <format target="http://www.bobbriscoe.net/pubs.html#refb-dis"
                type="PDF"/>
      </reference>

      <reference anchor="Re-fb"
                 target="http://www.acm.org/sigs/sigcomm/sigcomm2005/techprog.html#session8">
        <front>
          <title>Policing Congestion Response in an Internetwork Using
          Re-Feedback</title>

          <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
            <organization>BT &amp; UCL</organization>
          </author>

          <author fullname="Arnaud Jacquet" initials="A" surname="Jacquet">
            <organization>BT</organization>
          </author>

          <author fullname="Carla Di Cairano-Gilfedder" initials="C"
                  surname="Di Cairano-Gilfedder">
            <organization>BT</organization>
          </author>

          <author fullname="Alessandro Salvatori" initials="A"
                  surname="Salvatori">
            <organization>Eur&#233;com &amp; BT</organization>
          </author>

          <author fullname="Andrea Soppera" initials="A" surname="Soppera">
            <organization>BT</organization>
          </author>

          <author fullname="Martin Koyabe" initials="M" surname="Koyabe">
            <organization>BT</organization>
          </author>

          <date month="August" year="2005"/>
        </front>

        <seriesInfo name="ACM SIGCOMM CCR" value="35(4)277--288"/>

        <format target="http://www.cs.ucl.ac.uk/staff/B.Briscoe/projects/2020comms/refb/refb_sigcomm05.pdf"
                type="PDF"/>
      </reference>

      <reference anchor="FairerFaster"
                 target="http://bobbriscoe.net/projects/refb/#fairfastip">
        <front>
          <title>A Fairer, Faster Internet Protocol</title>

          <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
            <organization>BT &amp; UCL</organization>
          </author>

          <date month="December" year="2008"/>
        </front>

        <seriesInfo name="IEEE Spectrum" value="Dec 2008:38--43"/>

        <format target="http://www.spectrum.ieee.org/print/7027" type="HTML"/>
      </reference>

      <reference anchor="CheapPseud">
        <front>
          <title>The Social Cost of Cheap Pseudonyms</title>

          <author fullname="E. Friedman" initials="E" surname="Friedman">
            <organization/>
          </author>

          <author fullname="P. Resnick" initials="P" surname="Resnick">
            <organization/>
          </author>

          <date month="" year="1998"/>
        </front>

        <seriesInfo name="Journal of Economics and Management Strategy"
                    value="10(2)173--199"/>
      </reference>

      <reference anchor="Evol_cc"
                 target="http://www.statslab.cam.ac.uk/~frank/evol.html">
        <front>
          <title>Resource pricing and the evolution of congestion
          control</title>

          <author fullname="Richard J. Gibbens" initials="R"
                  surname="Gibbens">
            <organization>Cam Uni</organization>
          </author>

          <author fullname="Frank P. Kelly" initials="F" surname="Kelly">
            <organization>Cam Uni</organization>
          </author>

          <date month="December" year="1999"/>
        </front>

        <seriesInfo name="Automatica" value="35(12)1969--1985"/>

        <format target="http://www.statslab.cam.ac.uk/~frank/evol.html"
                type="PDF"/>
      </reference>

      <reference anchor="Salvatori05">
        <front>
          <title>Closed Loop Traffic Policing</title>

          <author fullname="Alessandro Salvatori" initials="A"
                  surname="Salvatori">
            <organization>Eur&#233;com &amp; BT</organization>
          </author>

          <date month="September" year="2005"/>
        </front>

        <seriesInfo name="Politecnico Torino and Institut Eurecom Masters Thesis"
                    value=""/>
      </reference>
    </references>
  </back>
</rfc>
