<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC0896 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.0896.xml">
<!ENTITY RFC2018 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2018.xml">
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC3168 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml">
<!ENTITY RFC3449 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3449.xml">
<!ENTITY RFC3540 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3540.xml">
<!ENTITY RFC5562 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5562.xml">
<!ENTITY RFC5681 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5681.xml">
<!ENTITY RFC5690 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5690.xml">
<!ENTITY RFC6093 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6093.xml">
<!ENTITY RFC6679 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6679.xml">
<!ENTITY RFC6789 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6789.xml">
<!ENTITY I-D.bensley-tcpm-dctcp SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.bensley-tcpm-dctcp.xml">
<!ENTITY I-D.moncaster-tcpm-rcv-cheat SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.moncaster-tcpm-rcv-cheat.xml">
<!ENTITY I-D.stewart-tsvwg-sctpecn SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.stewart-tsvwg-sctpecn.xml">
<!ENTITY I-D.welzl-ecn-benefits SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.welzl-ecn-benefits.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-ietf-tcpm-accecn-reqs-08"
     ipr="trust200902">
  <!-- updates="3186" -->

  <!-- category values: std, bcp, info, exp, and historic
    ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
       or pre5378Trust200902
    you can add the attributes updates="NNNN" and obsoletes="NNNN" 
    they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
       full title is longer than 39 characters -->

    <title abbrev="Requirements for More Accurate ECN">Problem Statement and
    Requirements for a More Accurate ECN Feedback</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

    <author fullname="Mirja Kühlewind" initials="M." role="editor"
            surname="Kühlewind">
      <organization>ETH Zurich</organization>

      <address>
        <postal>
          <street>Gloriastrasse 35</street>

          <code>8092</code>

          <city>Zurich</city>

          <country>Switzerland</country>
        </postal>

        <email>mirja.kuehlewind@tik.ee.ethz.ch</email>
      </address>
    </author>

    <author fullname="Richard Scheffenegger" initials="R."
            surname="Scheffenegger">
      <organization>NetApp, Inc.</organization>

      <address>
        <postal>
          <street>Am Euro Platz 2</street>

          <code>1120</code>

          <city>Vienna</city>

          <region/>

          <country>Austria</country>
        </postal>

        <phone>+43 1 3676811 3146</phone>

        <email>rs@netapp.com</email>
      </address>
    </author>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <street>Martlesham Heath</street>

          <city>Ipswich</city>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <email>bob.briscoe@bt.com</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <date day="" month="" year="2015"/>

    <area>Transport</area>

    <workgroup>TCP Maintenance and Minor Extensions (tcpm)</workgroup>

    <keyword>Internet-Draft</keyword>

    <keyword>I-D</keyword>

    <abstract>
      <t>Explicit Congestion Notification (ECN) is a mechanism where network
      nodes can mark IP packets instead of dropping them to indicate
      congestion to the end-points. An ECN-capable receiver will feed this
      information back to the sender. ECN is specified for TCP in such a way
      that it can only feed back one congestion signal per Round-Trip Time
      (RTT). In contrast, ECN for other transport protocols, such as RTP/UDP
      and SCTP, is specified with more accurate ECN feedback. Recent new TCP
      mechanisms (like ConEx or DCTCP) need more accurate ECN feedback in the
      case where more than one marking is received in one RTT. This document
      specifies requirements for an update to the TCP protocol to provide more
      accurate ECN feedback.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> is a
      mechanism where network nodes can mark IP packets instead of dropping
      them to indicate congestion to the end-points. An ECN-capable receiver
      will feed this information back to the sender. ECN is specified for TCP
      in such a way that only one feedback signal can be transmitted per
      Round-Trip Time (RTT). This is sufficient for pre-existing TCP
      congestion control mechanisms that perform only one reduction in sending
      rate per RTT, independent of the number of ECN congestion marks. But
      recently proposed or deployed mechanisms like Congestion Exposure
      (ConEx) <xref target="RFC6789"/> or Data Center TCP (DCTCP) <xref
      target="I-D.bensley-tcpm-dctcp"/> need more accurate ECN feedback than
      'classic' ECN <xref target="RFC3168"/> to work correctly in the case
      where more than one marking is received in any one RTT.</t>

      <t>For an in-depth discussion of the application benefits of using ECN
      (including with sufficiently granular feedback) see <xref
      target="I-D.welzl-ecn-benefits"/>.</t>

      <t>ECN is also defined for transport protocols besides TCP. ECN feedback
      as defined for RTP/UDP <xref target="RFC6679"/> provides a very detailed
      level of information, delivering individual counters for all four ECN
      codepoints as well as lost and duplicate segments, but at the cost of
      high signaling overhead. ECN feedback for SCTP has been proposed in
      <xref target="I-D.stewart-tsvwg-sctpecn"/>. This delivers a counter for
      the number of CE marked segments between CWR chunks, but also comes at
      the cost of increased overhead.</t>

      <t>Today, implementations of DCTCP already exist that alter TCP's ECN
      feedback protocol in proprietary ways (DCTCP was released in Microsoft
      Windows 8, and implementations exist for Linux and FreeBSD). The changes
      DCTCP makes to TCP are not currently the subject of any IETF
      standardization activity, and they omit capability negotiation, relying
      instead on uniform configuration across all hosts and network devices
      with ECN capability. A primary motivation for this document is to
      intervene before each proprietary implementation invents its own
      non-interoperable handshake, which could lead to <spanx style="emph">de facto</spanx>
      consumption of the few flags or codepoints that remain available for
      standardizing capability negotiation.</t>

      <t>This document lists requirements for a robust and interoperable
      TCP/ECN feedback protocol that is more accurate than classic ECN <xref
      target="RFC3168"/> and that all implementations of new TCP extensions,
      like ConEx and/or DCTCP, can use. While a new feedback scheme should
      still deliver as much information as classic ECN, this document also
      clarifies what has to be taken into consideration in addition. Thus the
      listed requirements should be addressed in the specification of a more
      accurate ECN feedback scheme. A few solutions have already been
      proposed. <xref target="accecn_designs"/> demonstrates how to use the
      requirements to compare them, by briefly sketching their high level
      design choices and discussing the benefits and drawbacks of each.</t>

      <t>The scope of these requirements is not limited to any specific
      environment and is intended for general deployment over public and
      private IP networks. Candidate solutions should try to adhere to all
      these requirements, but where this is not possible they should justify
      the deviation. The ordering of the requirements listed in this document
      is not to be taken as an order of importance, because each requirement
      might have different weight in different deployment scenarios.</t>

      <t>These requirements are only concerned with the type and quality of
      the ECN feedback signal. The requirements do not stipulate how a TCP
      sender might react to the improved ECN signal. The requirements also do
      not imply that any modifications to TCP senders or receivers are
      obligatory.</t>

      <section title="Terminology">
        <t>We use the following terminology from <xref target="RFC3168"/> and
        <xref target="RFC3540"/>:</t>

        <t>The ECN field in the IP header: <list hangIndent="10" style="empty">
            <t><list hangIndent="9" style="hanging">
                <t hangText="Not-ECT:">the not ECN-Capable Transport
                codepoint,</t>

                <t hangText="CE:">the Congestion Experienced codepoint,</t>

                <t hangText="ECT(0):">the first ECN-Capable Transport
                codepoint, and</t>

                <t hangText="ECT(1):">the second ECN-Capable Transport
                codepoint.</t>
              </list></t>
          </list> The ECN flags in the TCP header: <list hangIndent="10"
            style="empty">
            <t><list hangIndent="9" style="hanging">
                <t hangText="CWR:">the Congestion Window Reduced flag,</t>

                <t hangText="ECE:">the ECN-Echo flag, and</t>

                <t hangText="NS:">ECN Nonce Sum.</t>
              </list></t>
          </list></t>

        <t>In this document, the ECN feedback scheme as specified in <xref
        target="RFC3168"/> is called 'classic ECN' and any new proposal is
        called a 'more accurate ECN feedback' scheme. A 'congestion mark' is
        defined as an IP packet where the CE codepoint is set. A 'congestion
        episode' refers to one or more congestion marks that belong to the
        same overload situation in the network (usually during one RTT). A TCP
        segment with the acknowledgment flag set is simply called an ACK.</t>
      </section>
    </section>

    <section anchor="accecn_recap"
             title="Recap of Classic ECN and ECN Nonce in IP/TCP">
      <t>ECN requires two bits in the IP header. The ECN capability of a
      packet is indicated when either one of the two bits is set. <!--An 
    ECN sender can set one or the other bit to indicate an ECN-capable 
    transport (ECT) which results in two signals, ECT(0) and ECT(1).--> A
      network node can set both bits simultaneously when it experiences
      congestion. This leads to the four codepoints (Not-ECT, ECT(0), ECT(1),
      and CE) as listed above. <!--When both bits are set the 
    packet is regarded as "Congestion Experienced" (CE).--></t>

      <t>In the TCP header the first two bits in byte 14 are defined as ECN
      feedback for each half-connection. A TCP receiver signals the reception
      of a congestion mark using the ECN-Echo (ECE) flag in the TCP header.
      For reliability, the receiver continues to set the ECE flag on every
      ACK. To enable the TCP receiver to determine when to stop setting the
      ECN-Echo flag, the sender sets the CWR flag upon reception of an ECE
      feedback signal. This always leads to a full RTT of ACKs with ECE set.
      Thus the receiver cannot signal back any additional CE markings arriving
      within the same RTT.</t>
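
      <t>The ECE/CWR exchange described above can be sketched as a small
      receiver-side latch. The following Python fragment is an illustration
      only and not part of any specification: the class and function names
      are hypothetical, and the handling of a packet that carries both CWR
      and a CE mark is an assumption of this sketch. It shows that one CE
      mark and three CE marks in the same flight produce identical
      feedback.</t>

      <figure>
        <artwork><![CDATA[
class ClassicEcnReceiver:
    """Toy model of the receiver-side ECE latch (hypothetical)."""

    def __init__(self):
        self.ece_latched = False

    def on_data(self, ce_marked, cwr_flag):
        """Return the ECE bit for the ACK of one data packet."""
        if cwr_flag:
            # The sender has reacted, so stop echoing the old mark.
            self.ece_latched = False
        if ce_marked:
            # Assumption: CE on the CWR packet re-arms the latch.
            self.ece_latched = True
        return self.ece_latched


def ece_sequence(packets):
    """Map (ce_marked, cwr_flag) pairs to the ECE bit of each ACK."""
    rx = ClassicEcnReceiver()
    return [rx.on_data(ce, cwr) for ce, cwr in packets]


one_mark = [(True, False), (False, False), (False, False)]
three_marks = [(True, False), (True, False), (True, False)]
# Both flights yield ECE on every ACK, so the sender cannot tell
# them apart.
same_feedback = ece_sequence(one_mark) == ece_sequence(three_marks)
]]></artwork>
      </figure>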

      <t>The ECN Nonce <xref target="RFC3540"/> is an experimental addition to
      ECN that the TCP sender can use to protect itself against accidental or
      malicious concealment of CE-marked or dropped packets. This addition
      defines the last bit of byte 13 in the TCP header as the Nonce Sum (NS)
      flag. The receiver maintains a nonce sum that counts the occurrence of
      ECT(1) packets, and signals the least significant bit of this sum on the
      NS flag. There are no known deployments of a TCP stack that makes use of
      the ECN Nonce extension.</t>
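
      <t>As a rough illustration of the nonce sum described above, the
      following Python sketch computes the NS bit as the parity of the
      ECT(1) packets received. The function name is hypothetical, and the
      sketch deliberately ignores the resynchronisation rules of <xref
      target="RFC3540"/>.</t>

      <figure>
        <artwork><![CDATA[
def nonce_sum_bit(codepoints):
    """Return the 1-bit nonce sum over received ECN codepoints."""
    ect1_count = sum(1 for cp in codepoints if cp == "ECT(1)")
    return ect1_count & 1  # only the least significant bit is sent


sent = ["ECT(1)", "ECT(0)", "ECT(1)", "ECT(0)"]
# A network node overwrites the third packet's codepoint with CE.
received = ["ECT(1)", "ECT(0)", "CE", "ECT(0)"]

expected_ns = nonce_sum_bit(sent)      # what the sender computes
honest_ns = nonce_sum_bit(received)    # what the receiver can compute
]]></artwork>
      </figure>

      <t>In this example the CE mark erased one ECT(1) nonce, so the two
      values differ. A receiver that wanted to conceal the mark would have
      to guess the erased bit.</t>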

      <figure align="center" anchor="TCPHdr"
              title="The (post-ECN Nonce) definition of the TCP header flags">
        <artwork align="center"><![CDATA[             
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
      </figure>

      <!--
      <t>However, as the ECN Nonce is a separate extension to ECN, even if a
      sender tries to protect itself with the ECN Nonce, any receiver wishing
      to conceal marked packets only has to pretend not to support the ECN
      Nonce and simply does not provide any nonce sum feedback.</t>
-->

      <t>An alternative way for a sender to assure feedback integrity has
      been proposed, in which the sender itself occasionally inserts a CE
      mark or reordering and checks that the receiver feeds it back
      faithfully <xref target="I-D.moncaster-tcpm-rcv-cheat"/>. This
      alternative consumes no header bits or codepoints, and it releases the
      ECT(1) codepoint in the IP header and the NS flag in the TCP header for
      other uses.</t>
    </section>

    <section title="Use Cases">
      <t>The following two examples serve to show where existing mechanisms
      would already benefit from more accurate ECN feedback information.
      However, as it is hard to predict the future, once a more accurate ECN
      feedback mechanism that adheres to the requirements stated in this
      document is widely deployed, it is very likely that additional uses
      will be found. The examples listed below are in no particular order.</t>

      <t>ConEx is an experimental approach that allows a sender to relay
      congestion feedback provided by the receiver into the network along the
      forward data path. ConEx information can be used for traffic management
      to limit traffic proportionate to the actual congestion being caused,
      rather than limiting traffic based on rate or volume <xref
      target="RFC6789"/>. A ConEx sender uses selective acknowledgements
      (SACK) <xref target="RFC2018"/> for accurate feedback of loss signals,
      but currently TCP offers no equivalent accurate feedback for ECN.</t>

      <t>DCTCP offers very low and predictable queuing delay. DCTCP changes
      the reaction to congestion of a TCP sender and additionally requires
      switches/routers to have ECN enabled and configured with a low step
      threshold and no signal smoothing, so it is currently only used in
      private networks, e.g. internal to data centers. DCTCP was released in
      Microsoft Windows 8, and implementations exist for Linux and FreeBSD. To
      retrieve sufficient congestion information, the different DCTCP
      implementations use a proprietary ECN feedback protocol, but they omit
      capability negotiation. Moreover, the feedback protocol proposed in
      <xref target="I-D.bensley-tcpm-dctcp"/> only works if there are no
      losses at all; otherwise the feedback becomes ambiguous to the sender
      (see <xref target="DCTCP_Ambiguity"/>). Therefore, if a generic more
      accurate ECN feedback scheme were available, it would solve two
      problems for DCTCP: i) the need for a consistent variant of DCTCP to be
      deployed network-wide and ii) the inability to cope with ACK loss.</t>

      <t>Classic ECN-TCP would not benefit from more accurate ECN feedback,
      but it would not suffer either. The same signal that is currently
      conveyed with ECN following the specification given in <xref
      target="RFC3168"/> would be available.</t>

      <t>The following scenarios briefly illustrate where accurate ECN
      feedback is needed or adds value: <list hangIndent="8" style="hanging">
          <t
          hangText="A sender with standardised TCP congestion control that supports ConEx:"><vspace/>
          In this case the ConEx mechanism uses the extra information per RTT
          to re-echo the precise congestion information, but the congestion
          control algorithm still ignores multiple marks per RTT <xref
          target="RFC5681"/>.</t>

          <t
          hangText="A sender using DCTCP congestion control without ConEx:"><vspace/>
          The congestion control algorithm uses the extra info per RTT to
          perform its decrease depending on the number of congestion
          marks.</t>

          <t
          hangText="A sender using DCTCP congestion control and supporting ConEx:"><vspace/>
          Both the congestion control algorithm and ConEx use the more
          accurate ECN feedback mechanism.</t>

          <t hangText="As-yet-unspecified sender mechanisms:"><vspace/> The
          above are two examples of more general interest in sender mechanisms
          that respond to the extent of congestion feedback, not just its
          existence. It will greatly simplify incremental deployment if the
          sender can unilaterally deploy new behaviours, and rely on the
          presence of generic receivers that have already implemented more
          accurate feedback.</t>

          <t hangText="An RFC5681 TCP sender without ConEx:"><vspace/> No
          accurate feedback is necessary here. The congestion control
          algorithm still reacts to only one signal per RTT. But it is best to
          feed back all the information the receiver gets, whether the sender
          uses it or not, at least as long as overhead is low or zero.</t>

          <t hangText="Using CE for checking integrity:"><vspace/> If a more
          accurate ECN feedback scheme feeds all occurrences of CE marks back,
          a sender could perform integrity checking by occasionally injecting
          CE marks itself. Specifically, a sender can send packets which it
          randomly marks with CE (at low frequency), then check if feedback is
          received for these packets. The congestion notification feedback for
          these self-injected markings would not require a congestion control
          reaction <xref target="I-D.moncaster-tcpm-rcv-cheat"/>.</t>
        </list></t>
    </section>

    <section anchor="accecn_reqs" title="Requirements">
      <t>The requirements of the accurate ECN feedback protocol <!--, for the use of e.g. Conex or DCTCP,-->
      are to have fairly accurate (not necessarily perfect), timely and
      protected signaling. This leads to the following requirements, which
      should be discussed for any proposed more accurate ECN feedback
      scheme:</t>

      <t><list hangIndent="8" style="hanging">
          <t hangText="Resilience"><vspace/> The ECN feedback signal is
          carried within the ACK. Pure TCP ACKs can get lost without recovery
          (not just due to congestion, but also due to deliberate ACK
          thinning). Moreover, delayed ACKs are commonly used with TCP.
          Typically, an ACK is triggered after two data segments (or more,
          e.g., due to receive segment coalescing, ACK compression, ACK
          congestion control <xref target="RFC5690"/> or other phenomena, see
          <xref target="RFC3449"/>). In a high congestion situation where most
          of the packets are marked with CE, an accurate feedback mechanism
          should still be able to signal sufficient congestion information.
          Thus the accurate ECN feedback extension has to take delayed ACKs
          and ACK loss into account. Also, a more accurate feedback protocol
          should still provide more accurate feedback than classic ECN when
          delayed ACKs cover more than two segments, or when a thin stream
          disables Nagle's algorithm <xref target="RFC0896"/>. Finally, the
          feedback mechanism should not be impacted by reordering of ACKs,
          even when the ACK'ed sequence number does not increase.<vspace
          blankLines="1"/></t>

          <t hangText="Timeliness"><vspace/> A CE mark can be induced by the
          sending host, or more commonly a network node on the transmission
          path, and is then echoed by the receiver in the TCP ACK. Thus when
          this information arrives at the sender, it is naturally already
          about one RTT old. With a sufficient ACK rate a further delay of a
          small number of packets can be tolerated. However, this information
          will become stale with large delays, given the dynamic nature of
          networks. TCP congestion control (which itself partly introduces
          these dynamics) operates on a time scale of one RTT. Thus, to be
          timely, congestion feedback information should be delivered within
          about one RTT.</t>

          <t hangText="Integrity"><vspace/> <!-- With ECN Nonce, a misbehaving receiver or network node 
          can be detected with good probability. If the accurate ECN 
          feedback is reusing the NS bit, it is encouraged to ensure 
          integrity at least as good as ECN Nonce. If this is not 
          possible, alternative approaches should be provided how a 
          mechanism using the accurate ECN feedback extension can re-
          ensure integrity or give strong incentives for the receiver 
          and network node to cooperate honestly.--> The integrity of the
          feedback in a more accurate ECN feedback scheme should be assured,
          at least as well as the ECN Nonce. Alternatively, it should at least
          be possible to give strong incentives for the receiver and network
          nodes to cooperate honestly. <vspace blankLines="1"/> Given there
          are known problems with ECN Nonce deployment, this document only
          requires that the integrity of the more accurate ECN feedback can be
          assured; it does not require that the ECN Nonce mechanism is
          employed to achieve this. Indeed, if integrity could be provided by
          other means, a more accurate ECN feedback protocol might re-purpose
          the nonce sum (NS) flag in the TCP header. <vspace blankLines="1"/>
          If the more accurate ECN feedback scheme provides sufficient
          information, the integrity check could, e.g., be performed by
          deterministically setting the CE codepoint at the sender and
          monitoring the respective feedback (similar to ECT(1) and the ECN
          Nonce sum). Whether a sender should take enforcement action when it
          detects wrong feedback information, and what kind of enforcement it
          should apply, are policy issues that need not be specified as part
          of the more accurate ECN feedback scheme itself, but rather when
          specifying an update to core TCP mechanisms like congestion control
          that makes use of the more accurate ECN signal.</t>

          <t hangText="Accuracy"><vspace/> <!--In TCP usually delayed ACKs are used. Thats means in most 
          cases only for every second data packets an acknowledgment is 
          sent. Moreover, an ACK can get lost.-->Classic ECN feeds back one
          congestion notification per RTT, which is sufficient for classic TCP
          congestion control which reduces the sending rate at most once per
          RTT. Thus the more accurate ECN feedback scheme should ensure that,
          if a congestion episode occurs, at least one congestion notification
          is echoed and received per RTT as classic ECN would do. Of course,
          the goal of a more accurate ECN extension is to reconstruct the
          number of CE markings more accurately. In the best case the new
          scheme should even allow reconstruction of the exact number of
          payload bytes that a CE marked packet was carrying. However, it is
          accepted that it may be too complex for a sender to get the exact
          number of congestion markings or marked bytes in all situations.
          Ideally, the feedback scheme should preserve the order in which any
          (of the four) ECN signals were received. And, ideally, it would even
          be possible for the sender to determine which of the packets covered
          by one delayed ACK were congestion marked, e.g. if the flow consists
          of packets of different sizes, or to allow for future protocols
          where the order of the markings may be important. <vspace
          blankLines="1"/> In the best case, a sender that sees more accurate
          ECN feedback information would be able to reconstruct the occurrence
          of any of the four codepoints (Not-ECT, CE, ECT(0), ECT(1)).
          However, assuming the sender marks all data packets as ECN-capable
          and uses a default setting of ECT(0) (as with <xref
          target="RFC3168"/>), solely feeding back the occurrence of CE and
          ECT(1) might be sufficient. Because the sender can keep account of
          the transmitted segments with any of the three ECN codepoints,
          conveying any two of these back to the sender is sufficient for it
          to reconstruct the third as observed by the receiver. Thus a more
          accurate ECN feedback scheme should at least provide information on
          two of these signals, e.g. CE and ECT(1).<vspace blankLines="1"/> If
          a more accurate ECN scheme can reliably deliver feedback in most but
          not all circumstances, ideally the scheme should at least not
          introduce bias. In other words, undetected loss of some ACKs should
          be as likely to increase as decrease the sender's estimate of the
          probability of ECN marking.</t>

          <t hangText="Complexity"><vspace/> Implementation should be as
          simple as possible and only a minimum of additional state
          information should be needed. This will enable more accurate ECN
          feedback to be used as the default feedback mechanism, even if only
          one ECN feedback signal per RTT is needed. <!--A proposal fulfilling this for a more accurate 
          ECN feedback can then also be the standard ECN feedback mechanism. --></t>

          <t hangText="Overhead"><vspace/> A more accurate ECN feedback signal
          should limit the additional network load, because ECN feedback is
          ultimately not critical information (in the worst case, loss will
          still be available as a congestion signal of last resort). As
          feedback information has to be provided frequently and in a timely
          fashion, potentially all or a large fraction of TCP acknowledgments
          might carry this information. Ideally, no additional segments should
          be exchanged compared to an RFC3168 TCP session, and the overhead in
          each segment should be minimized.</t>

          <t hangText="Backward and forward compatibility"><vspace/> Given
          more accurate ECN feedback will involve a change to the TCP
          protocol, it should be negotiated between the two TCP endpoints. If
          either end does not support the more accurate feedback, they should
          both be able to fall back to classic ECN feedback. <vspace
          blankLines="1"/> A more accurate ECN feedback extension should aim
          to traverse most middleboxes, including firewalls and network
          address translators (NAT). Further, a feedback mechanism should
          provide a method to fall back to classic ECN signaling if the new
          signal is suppressed by certain middleboxes. <vspace
          blankLines="1"/> In order to avoid a fork in the TCP protocol
          specifications, if experiments with the new ECN feedback protocol
          are successful, it is intended to eventually update RFC3168 for any
          TCP/ECN sender, not just for ConEx or DCTCP senders. Then future
          senders will be able to unilaterally deploy new behaviours that
          exploit the existence of more accurate ECN feedback in receivers
          (forward compatibility). Conversely, even if another sender only
          needs one ECN feedback signal per RTT, it should be able to use more
          accurate ECN feedback, and simply ignore the excess information.</t>
        </list></t>

      <t>Furthermore, the receiver should not make assumptions about the
      mechanism that was used to set the markings nor about any interpretation
      or reaction to the congestion signal. The receiver only needs to
      faithfully reflect congestion information back to the sender.</t>
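
      <t>The observation in the Accuracy requirement, that feedback on two
      signals lets the sender reconstruct the third, can be illustrated with
      a minimal Python sketch. The function name and its arguments are
      hypothetical, and the sketch assumes that every packet sent was
      delivered and that none was re-marked to Not-ECT; a real scheme would
      also have to account for loss.</t>

      <figure>
        <artwork><![CDATA[
def reconstruct_ect0(sent_total, fb_ce, fb_ect1):
    """Infer how many packets arrived still marked ECT(0).

    Assumes every sent packet was delivered and none was re-marked
    to Not-ECT; both are simplifications for illustration.
    """
    rx_ect0 = sent_total - fb_ce - fb_ect1
    if rx_ect0 < 0:
        raise ValueError("feedback exceeds packets sent")
    return rx_ect0


# 10 packets sent; feedback reports 3 CE and 2 ECT(1) arrivals.
rx_ect0 = reconstruct_ect0(10, 3, 2)
]]></artwork>
      </figure>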
    </section>

    <section anchor="accecn_designs" title="Design Approaches">
      <t><!-- ToDo: Consider reemphasising why these sections are needed in a requirements doc -->This
      section introduces some possible TCP ECN feedback design approaches. The
      purpose of this section is to give examples of how trade-offs might be
      needed between the requirements, as input to future IETF work to specify
      a protocol. The order is not significant and there is no intention to
      endorse any particular approach.</t>

      <t>All approaches presented below (and proposed so far) are able to
      provide accurate ECN feedback information as long as no ACK loss occurs
      and the congestion rate is reasonable. In the case of a high ACK loss
      rate or very high congestion (CE marking) rate, the proposed schemes
      have different resilience characteristics depending on the number of
      bits used for the encoding. While classic ECN provides reliable (but
      inaccurate) feedback of a maximum of one congestion signal per RTT, the
      proposed schemes do not implement an explicit acknowledgement mechanism
      for the feedback (such as the ECE / CWR exchange of <xref
      target="RFC3168"/>).</t>

      <section title="Re-Definition of ECN/NS Header Bits">
        <!--as a Flag-->

        <t>Schemes in this category simply re-define the ECN header flags,
        ECE and CWR, to encode the occurrence of a CE marking at the receiver.
        This approach provides very limited resilience against loss of ACKs,
        particularly pure ACKs (no payload and therefore delivered
        unreliably).</t>

        <t>Such schemes can additionally use the NS bit for capability
        negotiation during the TCP handshake. More accurate ECN feedback
        could thus be negotiated without changing the classic ECN
        negotiation, which keeps the scheme backwards compatible.</t>

        <t>A couple of schemes have been proposed so far: <list
            style="symbols">
            <t>A naive one-bit scheme that sends one ECE for each CE received
            could use CWR to increase robustness against ACK loss by
            introducing redundant information on the next ACK, but this is
            still vulnerable to ACK loss.</t>

            <t>The scheme defined for DCTCP <xref
            target="I-D.bensley-tcpm-dctcp"/>, which toggles the ECE feedback
            on an immediate ACK whenever the CE marking changes, and otherwise
            feeds back delayed ACKs with the ECE value unchanged. <xref
            target="DCTCP_Ambiguity"/> demonstrates that this scheme is still
            ambiguous to the sender if the ACKs are pure ACKs, and if some may
            have been lost.</t>
          </list></t>

        <!--</section>-->

        <!--<section title="Re-Definition of ECN/NS Header Bits as a Field">-->

        <t>Alternatively, the receiver uses the three ECN/NS header flags,
        ECE, CWR and NS, to represent a counter that signals the accumulated
        number of CE markings it has received. Resilience against loss is
        better than with the flag-based schemes, but may not suffice in the
        presence of extended ACK loss that otherwise would not affect the TCP
        sender's performance.</t>

        <t>A number of coding schemes have been proposed so far in this
        category: <list style="symbols">
            <t>A 3-bit counter scheme continuously feeds back the three least
            significant bits of a CE counter;</t>

            <t>A scheme that defines a standardised lookup table to map the 8
            codepoints onto either a CE counter or an ECT(1) counter.</t>
          </list></t>

        <t>These proposed schemes provide accumulated information on ECN-CE
        marking feedback, similar to the number of acknowledged bytes in the
        TCP header. Due to the limited number of bits the ECN feedback
        information will wrap much more often than the acknowledgement field.
        Thus feedback information could be lost due to a relatively small
        sequence of pure-ACK losses. Resilience could be increased by
        introducing redundancy, e.g. by sending each counter increase two or
        more times. Of course, any such additional mechanism will increase the
        complexity. If the congestion rate is greater than the ACK rate
        (multiplied by the number of congestion marks that can be signaled per
        ACK), the congestion information cannot correctly be fed back.
        Covering the worst case where every packet is CE marked can
        potentially be realized by dynamically adapting the ACK rate and
        redundancy. This again increases complexity and perhaps the signaling
        overhead as well. Schemes that do not re-purpose the ECN NS bit could
        still support the ECN Nonce.</t>
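
        <t>As a minimal sketch of the wrap behaviour described above, the
        following Python fragment models a receiver that feeds back the
        three least significant bits of its CE counter and a sender that
        accumulates the differences between successive fields. The function
        names and the sample counter values are illustrative assumptions and
        are not taken from any of the proposed schemes.</t>

        <figure>
          <artwork><![CDATA[
```python
COUNTER_BITS = 3               # ECE, CWR and NS used as one field
MODULUS = 1 << COUNTER_BITS    # the field wraps after 8 CE marks


def feedback_field(ce_total):
    """Receiver side: the low-order bits of the CE counter."""
    return ce_total % MODULUS


def marks_since(prev_field, new_field):
    """Sender side: smallest increase that explains the new field."""
    return (new_field - prev_field) % MODULUS


def count_marks(fields):
    """Sum the per-ACK increases over the ACKs that arrived."""
    pairs = zip(fields, fields[1:])
    return sum(marks_since(a, b) for a, b in pairs)


# Receiver's running CE totals at five consecutive ACKs (made up).
totals = [0, 3, 6, 9, 12]
fields = [feedback_field(t) for t in totals]    # [0, 3, 6, 1, 4]

all_acks = count_marks(fields)                  # 12 marks recovered
# Lose the three middle ACKs: the counter wrapped once unseen.
after_loss = count_marks([fields[0], fields[-1]])    # 4, not 12
```
]]></artwork>
        </figure>

        <t>In this sketch the loss of three consecutive pure ACKs hides one
        full wrap of the field, so the sender undercounts by eight CE
        marks without any indication that feedback was lost.</t>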
      </section>

      <section title="Using Other Header Bits">
        <t>As seen in <xref target="TCPHdr"/>, there are currently three
        unused flags in the TCP header. The proposed 3-bit counter or
        codepoint schemes could be extended by one or more bits to add higher
        resilience against ACK loss. Each additional bit doubles the number of
        CE marks that can be counted before the field wraps, so resilience
        against ACK loss would grow exponentially with the number of bits,
        while the respective drawbacks would remain identical.</t>
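
        <t>The gain from additional bits can be estimated with a rough
        worst-case calculation. The sketch below is only an illustration
        under stated assumptions: every packet is CE-marked, the receiver
        sends one ACK per ack_every packets, and the function name and
        parameters are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def max_lost_acks(counter_bits, ack_every=2):
    """Longest run of lost ACKs after which the next ACK that
    arrives still yields an unambiguous CE count.

    Assumes every packet is CE-marked and each ACK covers
    'ack_every' packets, so k lost ACKs hide (k + 1) * ack_every
    marks, which must stay below 2 ** counter_bits. A negative
    result means the count is ambiguous even without ACK loss.
    """
    largest_delta = (1 << counter_bits) - 1
    return largest_delta // ack_every - 1


for bits in (3, 4, 5, 6):
    # Prints 2, 6, 14 and 30 lost ACKs for 3, 4, 5 and 6 bits.
    print(bits, max_lost_acks(bits))
```
]]></artwork>
        </figure>

        <t>Under these assumptions, extending the 3-bit field by the three
        currently unused flags would raise the tolerable run of lost ACKs
        from 2 to 30.</t>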

        <t>Alternatively, a new method could standardise the use of the bits
        in the Urgent Pointer field (see <xref target="RFC6093"/>) to signal
        more bits of its congestion signal counter, but only whenever it does
        not set the Urgent Flag. As this is often the case, resilience could
        be increased without additional header overhead.</t>

        <t>Any proposal to use such bits would need to check the likelihood
        that some middleboxes might discard or 'normalize' the currently
        unused flag bits or a non-zero Urgent Pointer when the Urgent Flag is
        cleared. If during experimentation certain bits have been proven to be
        usable, the assignment of any of these bits would then require an IETF
        standards action.</t>
      </section>

      <section title="Using a TCP Option">
        <t>Alternatively, a new TCP option could be introduced, to help
        maintain the accuracy and integrity of ECN feedback between receiver
        and sender. Such an option could provide higher resilience and even
        more information, perhaps as much as ECN for RTP/UDP <xref
        target="RFC6679"/>, which explicitly provides the number of ECT(0),
        ECT(1), CE, non-ECT marked and lost packets, or as much as a proposal
        for SCTP that counts the number of ECN marks <xref
        target="I-D.stewart-tsvwg-sctpecn"/> between CWR chunks. However,
        deploying new TCP options has its own challenges. Moreover, to
        actually achieve high resilience, this option would need to be carried
        by most or all ACKs as the receiver cannot know if and when ACKs may
        be dropped. Thus this approach would introduce considerable signaling
        overhead even though ECN feedback is not extremely critical
        information (in the worst case, loss will still be available to
        provide a strong congestion feedback signal). In any case, such a TCP
        option could be used in addition to a more accurate ECN feedback
        scheme in the TCP header or in addition to classic ECN, only when
        needed and when space is available.</t>
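
        <t>To make the overhead concern concrete, the following sketch packs
        three packet counters into a kind/length/value option and parses it
        again. The layout, the 16-bit counter width, the placeholder option
        kind and the function names are all assumptions made for
        illustration; no such option is defined by this document.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

PLACEHOLDER_KIND = 253   # assumption: placeholder option kind
MAX_OPTION_SPACE = 40    # bytes of options a TCP header can carry


def build_option(ect0, ect1, ce):
    """Pack three 16-bit packet counters into kind/length/value."""
    body = struct.pack("!HHH", ect0 & 0xFFFF, ect1 & 0xFFFF,
                       ce & 0xFFFF)
    return struct.pack("!BB", PLACEHOLDER_KIND, 2 + len(body)) + body


def parse_option(data):
    """Return (ect0, ect1, ce), or None if this is another option."""
    kind, length = struct.unpack("!BB", data[:2])
    if kind != PLACEHOLDER_KIND or length != 8 or len(data) < 8:
        return None
    return struct.unpack("!HHH", data[2:8])


option = build_option(ect0=70000, ect1=0, ce=12)
overhead = len(option)             # 8 bytes on every ACK
counters = parse_option(option)    # (4464, 0, 12): ect0 wrapped
```
]]></artwork>
        </figure>

        <t>Carrying these 8 bytes on every ACK would use a fifth of the 40
        bytes of option space in a TCP header, and even 16-bit counters
        still wrap, although far less often than a 3-bit field.</t>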
      </section>

      <!-- 
    
    <t>Combining the idea of <xref target="eci_mode"/> and <xref 
    target="cp_mode"/>, further extending it to a one-octet option, 
    would allow the signaling of two values, each with 4 bit. The gains 
    in worst case ACK loss, delayed ACK ratios and maintaining ECN Nonce 
    would scale accordingly. </t> 
    
    <t>Alternatively, if timestamp capability negotiation is supported, 
    a few bits could be extracted from the timestamp value, to provide 
    extended signaling. However, processing TCP options (or overloaded 
    TCP options) is more complex than processing of header flags. </t>
    
    -->
    </section>

    <section title="Acknowledgements">
      <t>Thanks <!-- to Bob Briscoe for reviewing and providing valuable 
    additions on DCTCP and ConEx. Moreover, thanks -->to Gorry Fairhurst <!-- as 
    well as Bob Briscoe -->for his review and for ideas on CE-based integrity
      checking and to Mohammad Alizadeh for suggesting the need to avoid
      bias.</t>

      <t>Bob Briscoe was part-funded by the European Community under its
      Seventh Framework Programme through the Reducing Internet Transport
      Latency (RITE) project (ICT-317700) and through the Trilogy 2 project
      (ICT-317756). The views expressed here are solely those of the
      authors, in the context of the mentioned funding projects.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>

      <!--
    
    <t> If this memo was to progress to standards track, it would update 
    RFC3168 and RFC3540, to add new combinations of flags in the TCP 
    header for capability negotiation (see <xref target="TCPNeg"/>) and 
    a change in TCP ECN semantics (see <xref target="TCPSig"/>).</t>
    
    -->
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>ECN feedback information must only be used if the other information
      contained in a received TCP segment indicates that the congestion was
      genuinely part of the flow and not spoofed - i.e. the normal TCP
      acceptance techniques have to be used to verify that the segment is part
      of the flow before returning any contained ECN information, and
      similarly ECN feedback is only accepted on valid ACKs.</t>

      <t>Given that ECN feedback is used as input for congestion control, the
      respective algorithm would not react appropriately if ECN feedback were
      lost and the resilience mechanism to recover it were inadequate. This
      resilience requirement is articulated in <xref target="accecn_reqs"/>.
      However, it should be noted that ECN feedback is not the last resort
      against congestion collapse, because if there is insufficient response
      to ECN, loss will ensue, and TCP will still react appropriately to
      loss.</t>

      <t>A receiver could suppress ECN feedback information leading to its
      connections consuming excess sender or network resources. <!--Or an attacker could providing wrong congestion information 
    which then easily leads to throttling of certain connections. These 
    problems are --> This problem is similar to that seen with the classic ECN
      feedback scheme and should be addressed by integrity checking as
      required in <xref target="accecn_reqs"/>.</t>
    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <references title="Normative References">
      <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->

      &RFC3168;

      &RFC3540;
    </references>

    <references title="Informative References">
      <!--      <?rfc include="reference.I-D.briscoe-tsvwg-re-ecn-tcp.xml"?> -->

      <!--      <?rfc include="reference.I-D.kuehlewind-tcpm-accurate-ecn-option.xml"?> -->

      &RFC0896;

      &RFC2018;

      &RFC3449;

      <!--      &RFC5562; -->

      &RFC5681;

      &RFC5690;

      &RFC6093;

      &RFC6679;

      &RFC6789;

      &I-D.bensley-tcpm-dctcp;

      &I-D.moncaster-tcpm-rcv-cheat;

      &I-D.stewart-tsvwg-sctpecn;

      &I-D.welzl-ecn-benefits;
    </references>

    <section anchor="DCTCP_Ambiguity"
             title="Ambiguity of the More Accurate ECN Feedback in DCTCP">
      <t>As defined in <xref target="I-D.bensley-tcpm-dctcp"/>, a DCTCP
      receiver feeds back ECE=0 on delayed ACKs as long as CE remains 0, and
      also immediately sends an ACK with ECE=0 when CE transitions to 1.
      Similarly, it continually feeds back ECE=1 on delayed ACKs while CE
      remains 1 and immediately feeds back ECE=1 when CE transitions to 0. A
      sender can unambiguously decode this scheme if there is never any ACK
      loss, and the sender assumes there will never be any ACK loss.</t>

      <t>The following example shows that the feedback sequence becomes
      highly ambiguous to the sender if either of these conditions is broken.
      Below, '0' will represent ECE=0, '1' will represent ECE=1 and '.' will
      represent a gap of one segment between delayed ACKs. Now imagine that
      the sender receives the following sequence of feedback on 3 pure
      ACKs:<list style="empty">
          <t>0.0.0</t>
        </list>When the receiver sent this sequence it could have been any of
      the following four sequences:<list style="letters">
          <t>0.0.0 (0 x CE)</t>

          <t>010.0 (1 x CE)</t>

          <t>0.010 (1 x CE)</t>

          <t>01010 (2 x CE)</t>
        </list>where any of the 1s represent a possible pure ACK carrying ECE
      feedback that could have been lost. If the sender guesses (a), it might
      be correct, or it might miss 1 or 2 congestion marks over 5 packets.
      Therefore, when confronted with this simple sequence (that is not
      contrived), a sender can guess that congestion might have been 0%, 20%
      or 40%, but it doesn't know which.</t>
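
      <t>The enumeration above can be reproduced mechanically. The sketch
      below uses the same notation and treats every '.' as a one-segment
      gap that either held no ACK or held a lost pure ACK with ECE=1. The
      function names are hypothetical, and the model covers only
      single-segment gaps as in this example.</t>

      <figure>
        <artwork><![CDATA[
```python
from itertools import product


def candidates(observed):
    """All sequences the receiver may have sent.

    Each '.' in 'observed' is a one-segment gap that either held
    no ACK at all ('.') or a lost pure ACK carrying ECE=1 ('1').
    """
    gaps = [i for i, ch in enumerate(observed) if ch == "."]
    result = []
    for fill in product(".1", repeat=len(gaps)):
        seq = list(observed)
        for pos, ch in zip(gaps, fill):
            seq[pos] = ch
        result.append("".join(seq))
    return result


def congestion_percent(seq):
    """CE marks as a percentage of the segments covered."""
    return round(100 * seq.count("1") / len(seq))


seqs = candidates("0.0.0")
# seqs holds 0.0.0, 0.010, 010.0 and 01010
levels = sorted({congestion_percent(s) for s in seqs})
# levels is [0, 20, 40]
```
]]></artwork>
      </figure>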

      <t>Sequences with a longer gap (e.g. 0...0.0) become far more ambiguous.
      It helps a little if the sender knows the distance the receiver uses
      between delayed ACKs, and it helps a lot if the distance is 1, i.e. no
      delayed ACKs, but even then there will still be ambiguity whenever there
      are pure ACK losses.</t>

      <!--      <t>Another simple example illustrates how quickly the ambiguity can get
      out of hand. Imagine the sender receives this sequence of feedback on
      pure ACKs:<list style="empty">
          <t>0...0.0</t>
        </list>The sender could guess that the receiver originally sent any of
      the following nine sequences:<list style="letters">
          <t>0.0.0.0 (0 x CE)</t>

          <t>010.0.0 (1 x CE)</t>

          <t>0.010.0 (1 x CE)</t>

          <t>001.0.0 (1 x CE)</t>

          <t>0.1.0.0 (2 x CE)</t>

          <t>00.10.0 (2 x CE)</t>

          <t>01.0010 (2 x CE)</t>

          <t>0.110.0 (3 x CE)</t>

          <t>01010.0 (3 x CE)</t>
        </list>If the sender guesses (a), it might be correct, or it might
      miss 1, 2 or 3 congestion marks over 7 packets. Therefore, when
      confronted with this simple sequence (that is not contrived), a sender
      can guess that congestion might have been 0%, 14%, 29% or 43%., but it
      doesn't know which. </t> -->
    </section>
  </back>
</rfc>
