<?xml version="1.0" encoding="UTF-8"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2018 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2018.xml">
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC3168 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml">
<!ENTITY RFC3540 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3540.xml">
<!ENTITY RFC5562 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5562.xml">
<!ENTITY RFC5681 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5681.xml">
<!ENTITY RFC5690 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5690.xml">
<!ENTITY RFC6679 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6679.xml">
<!ENTITY RFC6789 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6789.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-ietf-tcpm-accecn-reqs-05"
     ipr="trust200902">
  <!-- updates="3168" -->

  <!-- category values: std, bcp, info, exp, and historic
    ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
       or pre5378Trust200902
    you can add the attributes updates="NNNN" and obsoletes="NNNN" 
    they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
       full title is longer than 39 characters -->

    <title abbrev="Requirements for More Accurate ECN">Problem Statement and
    Requirements for a More Accurate ECN Feedback</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

    <author fullname="Mirja Kühlewind" initials="M." role="editor"
            surname="Kühlewind">
      <organization>University of Stuttgart</organization>

      <address>
        <postal>
          <street>Pfaffenwaldring 47</street>

          <code>70569</code>

          <city>Stuttgart</city>

          <country>Germany</country>
        </postal>

        <email>mirja.kuehlewind@ikr.uni-stuttgart.de</email>
      </address>
    </author>

    <author fullname="Richard Scheffenegger" initials="R."
            surname="Scheffenegger">
      <organization>NetApp, Inc.</organization>

      <address>
        <postal>
          <street>Am Euro Platz 2</street>

          <code>1120</code>

          <city>Vienna</city>

          <region/>

          <country>Austria</country>
        </postal>

        <phone>+43 1 3676811 3146</phone>

        <email>rs@netapp.com</email>
      </address>
    </author>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <street>Martlesham Heath</street>

          <city>Ipswich</city>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <email>bob.briscoe@bt.com</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <date day="12" month="February" year="2014"/>

    <area>Transport</area>

    <workgroup>TCP Maintenance and Minor Extensions (tcpm)</workgroup>

    <keyword>Internet-Draft</keyword>

    <keyword>I-D</keyword>

    <abstract>
      <t>Explicit Congestion Notification (ECN) is an IP/TCP mechanism where
      network nodes can mark IP packets instead of dropping them to indicate
      congestion to the end-points. An ECN-capable receiver will feed this
      information back to the sender. ECN is specified for TCP in such a way
      that it can only feed back one congestion signal per Round-Trip Time
      (RTT). In contrast, ECN for other transport protocols, such as RTP/UDP
      and SCTP, is specified with more accurate ECN feedback. Recent new TCP
      mechanisms (like ConEx or DCTCP) need more accurate ECN feedback in the
      case where more than one marking is received in one RTT. This document
      specifies requirements for an update to the TCP protocol to provide more
      accurate ECN feedback.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> is an
      IP/TCP mechanism where network nodes can mark IP packets instead of
      dropping them to indicate congestion to the end-points. An ECN-capable
      receiver will feed this information back to the sender. ECN is specified
      for TCP in such a way that only one feedback signal can be transmitted
      per Round-Trip Time (RTT). This is sufficient for pre-existing TCP
      congestion control mechanisms that perform only one reduction in sending
      rate per RTT, independent of the number of ECN congestion marks. But
      recently proposed or deployed mechanisms like Congestion Exposure
      (ConEx) <xref target="RFC6789"/> or Data Center TCP (DCTCP) <xref
      target="Ali10"/> need more accurate ECN feedback to work correctly in
      the case where more than one marking is received in any one RTT.</t>

      <t>ECN is also defined for transport protocols besides TCP. ECN feedback
      as defined for RTP/UDP <xref target="RFC6679"/> provides a very detailed
      level of information, delivering individual counters for all four ECN
      codepoints as well as lost and duplicate segments, but at the cost of
      high signaling overhead. ECN feedback for SCTP <xref
      target="I-D.stewart-tsvwg-sctpecn"/> delivers a counter for the number
      of CE marked segments between CWR chunks, but also comes at the cost of
      increased overhead.</t>

      <t>Today, implementations of DCTCP already exist that alter TCP's ECN
      feedback protocol in proprietary ways (DCTCP was released in Microsoft
      Windows 8, and implementations exist for Linux and FreeBSD). The changes
      DCTCP makes to TCP are not currently the subject of any IETF
      standardization activity, and they omit capability negotiation, relying
      instead on uniform configuration across all
      hosts and network devices with ECN capability. A primary motivation
      for this document is to intervene before each proprietary implementation
      invents its own non-interoperable handshake, which could lead to <spanx
      style="emph">de facto</spanx> consumption of the few flags or codepoints
      that remain available for standardizing capability negotiation.</t>

      <t>This document lists requirements for a robust and interoperable more
      accurate TCP/ECN feedback protocol that all implementations of new TCP
      extensions, like ConEx and/or DCTCP, can use. While a new feedback
      scheme should still deliver as much information as classic ECN, this
      document also clarifies what has to be taken into consideration in
      addition. Thus the listed requirements should be addressed in the
      specification of a more accurate ECN feedback scheme. A few solutions
      have already been proposed. <xref target="accecn_designs"/> demonstrates
      how to use the requirements to compare them, by briefly sketching their
      high level design choices and discussing the benefits and drawbacks of
      each.</t>

      <section title="Terminology">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>

        <t>We use the following terminology from <xref target="RFC3168"/> and
        <xref target="RFC3540"/>:</t>

        <t>The ECN field in the IP header: <list hangIndent="10" style="empty">
            <t><list hangIndent="9" style="hanging">
                <t hangText="Not-ECT:">the not ECN-Capable Transport
                codepoint,</t>

                <t hangText="CE:">the Congestion Experienced codepoint,</t>

                <t hangText="ECT(0):">the first ECN-Capable Transport
                codepoint, and</t>

                <t hangText="ECT(1):">the second ECN-Capable Transport
                codepoint.</t>
              </list></t>
          </list> The ECN flags in the TCP header: <list hangIndent="10"
            style="empty">
            <t><list hangIndent="9" style="hanging">
                <t hangText="CWR:">the Congestion Window Reduced flag,</t>

                <t hangText="ECE:">the ECN-Echo flag, and</t>

                <t hangText="NS:">ECN Nonce Sum.</t>
              </list></t>
          </list></t>

        <t>In this document, the ECN feedback scheme as specified in <xref
        target="RFC3168"/> is called 'classic ECN' and any new proposal is
        called a 'more accurate ECN feedback' scheme. A 'congestion mark' is
        defined as an IP packet where the CE codepoint is set. A 'congestion
        episode' refers to one or more congestion marks that belong to the
        same overload situation in the network (usually during one RTT). A TCP
        segment with the acknowledgment flag set is simply called ACK.</t>
      </section>
    </section>

    <section anchor="accecn_recap"
             title="Recap of Classic ECN and ECN Nonce in IP/TCP">
      <t>ECN requires two bits in the IP header. The ECN capability of a
      packet is indicated when either one of the two bits is set. <!--An 
    ECN sender can set one or the other bit to indicate an ECN-capable 
    transport (ECT) which results in two signals, ECT(0) and ECT(1).--> A
      network node can set both bits simultaneously when it experiences
      congestion. This leads to the four codepoints (Not-ECT, ECT(0), ECT(1),
      and CE) as listed above. <!--When both bits are set the 
    packet is regarded as "Congestion Experienced" (CE).--></t>
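      <t>As a minimal sketch of the two-bit encoding just described, the
      following Python fragment decodes the ECN field using the codepoint
      values assigned in <xref target="RFC3168"/>. The function names are
      hypothetical and chosen only for illustration; the fragment assumes the
      ECN field occupies the two least significant bits of the former IPv4
      TOS (or IPv6 Traffic Class) octet.</t>

      <figure>
        <artwork><![CDATA[
```python
# Illustrative sketch only; names are hypothetical, values per RFC 3168.
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Name of the codepoint held in the low two bits of the octet."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def is_ecn_capable(tos_byte):
    """ECN capability is indicated when at least one bit is set."""
    return (tos_byte & 0b11) != 0


def mark_ce(tos_byte):
    """A congested node sets both bits of an ECN-capable packet."""
    if not is_ecn_capable(tos_byte):
        raise ValueError("a Not-ECT packet cannot be CE-marked")
    return tos_byte | 0b11
```
]]></artwork>
      </figure>

      <t>For example, under these assumptions ecn_codepoint(mark_ce(0b10))
      yields "CE", while the upper six bits of the octet are left
      untouched.</t>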

      <t>In the TCP header the first two bits in byte 14 are defined as ECN
      feedback for each half-connection. A TCP receiver signals the reception
      of a congestion mark using the ECN-Echo (ECE) flag in the TCP header.
      For reliability, the receiver continues to set the ECE flag on every
      ACK. To enable the TCP receiver to determine when to stop setting the
      ECN-Echo flag, the sender sets the CWR flag upon reception of an ECE
      feedback signal. This always leads to a full RTT of ACKs with ECE set.
      Thus the receiver cannot signal back any additional CE markings arriving
      within the same RTT.</t>
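      <t>The limitation described above can be illustrated with a small
      simulation. The following Python sketch models only the receiver side
      of the ECE / CWR exchange; the class and function names are
      hypothetical, and it assumes one ACK per data segment (no delayed ACKs)
      with the CWR flag arriving about one RTT after the first mark.</t>

      <figure>
        <artwork><![CDATA[
```python
# Sketch of the classic ECN receiver behaviour; names are hypothetical.
# Assumes one ACK per data segment (no delayed ACKs).

class ClassicEcnReceiver:
    def __init__(self):
        self.ece_latched = False  # True while ECE is being echoed

    def on_segment(self, ce_marked, cwr):
        """Process one data segment; return the ECE flag of its ACK."""
        if cwr:
            self.ece_latched = False  # sender has reacted: stop echoing
        if ce_marked:
            self.ece_latched = True   # (re)start echoing on a CE mark
        return self.ece_latched


def ece_trace(segments):
    """segments: (ce_marked, cwr) pairs; returns the ECE flag per ACK."""
    receiver = ClassicEcnReceiver()
    return [receiver.on_segment(ce, cwr) for ce, cwr in segments]


# Two round trips that differ only in the number of CE marks.
one_mark = [(True, False)] + [(False, False)] * 4 + [(False, True)]
three_marks = [(True, False), (False, False), (True, False),
               (True, False), (False, False), (False, True)]

# Both traces are identical: the sender cannot tell the cases apart.
assert ece_trace(one_mark) == ece_trace(three_marks)
```
]]></artwork>
      </figure>

      <t>In this sketch the sender sees the same sequence of ECE flags
      whether one or three packets were marked, which is exactly the
      information a more accurate feedback scheme would need to recover.</t>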

      <t>The ECN Nonce <xref target="RFC3540"/> is an experimental addition to
      ECN that the TCP sender can use to protect itself against accidental or
      malicious concealment of CE-marked (or dropped) packets. This addition
      defines the last bit of byte 13 in the TCP header as the Nonce Sum (NS)
      flag. The receiver maintains a nonce sum that counts the occurrence of
      ECT(1) packets, and signals the least significant bit of this sum on the
      NS flag.</t>

      <figure align="center" anchor="TCPHdr"
              title="The (post-ECN Nonce) definition of the TCP header flags">
        <artwork align="center"><![CDATA[             
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
      </figure>

      <t>However, as the ECN Nonce is a separate extension to ECN, even if a
      sender tries to protect itself with the ECN Nonce, any receiver wishing
      to conceal marked packets only has to pretend not to support the ECN
      Nonce and simply does not provide any nonce sum feedback.</t>

      <t>An alternative for a sender to assure feedback integrity has been
      proposed where the sender occasionally injects a CE mark (or
      reordering, or a loss) itself, and checks that the receiver feeds it
      back faithfully <xref target="I-D.moncaster-tcpm-rcv-cheat"/>. This
      alternative requires no standardization and consumes no header bits or
      codepoints. It would also release the ECT(1) codepoint in the IP header
      and the NS flag in the TCP header for other uses.</t>
    </section>

    <section title="Use Cases">
      <t>ConEx is an experimental approach that allows a sender to relay
      congestion feedback provided by the receiver into the network along the
      forward data path. ConEx information can be used for traffic management
      to limit traffic proportionate to the actual congestion being caused,
      rather than limiting traffic based on rate or volume <xref
      target="RFC6789"/>. A ConEx sender uses selective acknowledgements
      (SACK) <xref target="RFC2018"/> for accurate feedback of loss signals,
      but currently TCP offers no equivalent accurate feedback for ECN.</t>

      <t>DCTCP offers very low and predictable queuing delay. DCTCP changes
      the reaction to congestion of a TCP sender and additionally requires
      switches/routers to have ECN enabled and configured with a low step
      threshold and no signal smoothing, so it is currently only used in
      private networks, e.g. internal to data centers. DCTCP was released in
      Microsoft Windows 8, and implementations exist for Linux and FreeBSD. To
      retrieve sufficient congestion information, the different DCTCP
      implementations use a proprietary ECN feedback protocol, but they omit
      capability negotiation. Moreover, the feedback protocol proposed in
      <xref target="Ali10"/> only works if there are no losses at all;
      otherwise its feedback becomes ambiguous (see <xref target="DCTCP_Ambiguity"/>).
      Therefore, if a generic more accurate ECN feedback scheme were
      available, it would solve two problems for DCTCP: i) the need for a
      consistent variant of DCTCP to be deployed network-wide and ii) the
      inability to cope with ACK loss.</t>

      <t>The following scenarios briefly illustrate where accurate ECN
      feedback is needed or adds value: <list hangIndent="8" style="hanging">
          <t
          hangText="A sender with standardised TCP congestion control that supports ConEx:"><vspace/>
          In this case the ConEx mechanism uses the extra information per RTT
          to re-echo the precise congestion information, but the congestion
          control algorithm still ignores multiple marks per RTT <xref
          target="RFC5681"/>.</t>

          <t
          hangText="A sender using DCTCP congestion control without ConEx:"><vspace/>
          The congestion control algorithm uses the extra information per RTT to
          perform its decrease depending on the number of congestion
          marks.</t>

          <t
          hangText="A sender using DCTCP congestion control and supporting ConEx:"><vspace/>
          Both the congestion control algorithm and ConEx use the more
          accurate ECN feedback mechanism.</t>

          <t hangText="As-yet-unspecified sender mechanisms:"><vspace/> The
          above are two examples of more general interest in sender mechanisms
          that respond to the extent of congestion feedback, not just its
          existence. It will greatly simplify incremental deployment if the
          sender can unilaterally deploy new behaviours, and rely on the
          presence of generic receivers that have already implemented more
          accurate feedback.</t>

          <t hangText="An RFC 5681 TCP sender without ConEx:"><vspace/> No
          accurate feedback is necessary here. The congestion control
          algorithm still reacts to only one signal per RTT. But it is best to
          feed back all the information the receiver gets, whether the sender
          uses it or not, at least as long as overhead is low or
          zero.</t>

          <t hangText="Using CE for checking integrity:"><vspace/> If a more
          accurate ECN feedback scheme feeds all occurrences of CE marks back,
          a sender could perform integrity checking by occasionally injecting
          CE marks itself. Specifically, a sender can send packets which it
          randomly marks with CE (at low frequency), then check if feedback is
          received for these packets. The congestion notification feedback for
          these self-injected markings would not require a congestion control
          reaction <xref target="I-D.moncaster-tcpm-rcv-cheat"/>.</t>
        </list></t>
    </section>

    <section anchor="accecn_reqs" title="Requirements">
      <t>The requirements of the accurate ECN feedback protocol <!--, for the use of e.g. Conex or DCTCP,-->
      are to have fairly accurate (not necessarily perfect), timely and
      protected signaling. This leads to the following requirements, which
      MUST be discussed for any proposed more accurate ECN feedback
      scheme:</t>

      <t><list hangIndent="8" style="hanging">
          <t hangText="Resilience"><vspace/> The ECN feedback signal is
          carried within the ACK. Pure TCP ACKs can get lost without recovery
          (not just due to congestion, but also due to deliberate ACK
          thinning). Moreover, delayed ACKs are commonly used with TCP.
          Typically, an ACK is triggered after two data segments (or more,
          e.g. due to receive segment coalescing, ACK compression, ACK
          congestion control <xref target="RFC5690"/> or other phenomena). In
          a high congestion situation where most of the packets are marked
          with CE, an accurate feedback mechanism should still be able to
          signal sufficient congestion information. Thus the accurate ECN
          feedback extension has to take delayed ACKs and ACK loss into
          account. Also, a more accurate feedback protocol should still work
          if delayed ACKs cover more than two packets.<vspace
          blankLines="1"/></t>

          <t hangText="Timeliness"><vspace/> A CE mark can be induced by a
          network node on the transmission path and is then echoed by the
          receiver in the TCP ACK. Thus when this information arrives at the
          sender, it is naturally already about one RTT old. With a sufficient
          ACK rate a further delay of a small number of packets can be
          tolerated. However, this information will become stale with large
          delays, given the dynamic nature of networks. TCP congestion control
          (which itself partly introduces these dynamics) operates on a time
          scale of one RTT. Thus, to be timely, congestion feedback
          information should be delivered within about one RTT.</t>

          <t hangText="Integrity"><vspace/> <!-- With ECN Nonce, a misbehaving receiver or network node 
          can be detected with good probability. If the accurate ECN 
          feedback is reusing the NS bit, it is encouraged to ensure 
          integrity at least as good as ECN Nonce. If this is not 
          possible, alternative approaches should be provided how a 
          mechanism using the accurate ECN feedback extension can re-
          ensure integrity or give strong incentives for the receiver 
          and network node to cooperate honestly.-->It should be possible to
          assure the integrity of the feedback in a more accurate ECN feedback
          scheme, at least as well as the ECN Nonce. Alternatively, it should
          at least be possible to give strong incentives for the receiver and
          network nodes to cooperate honestly. <vspace blankLines="1"/>Given
          there are known problems with the ECN nonce (as identified above),
          this document only requires that the integrity of the more accurate
          ECN feedback can be assured as an inherent part of the new more
          accurate ECN feedback protocol; it does not require that the ECN
          Nonce mechanism is employed to achieve this. Indeed, if integrity
          could be provided by other means, a more accurate ECN feedback protocol
          might re-purpose the nonce sum (NS) flag in the TCP header. <vspace
          blankLines="1"/> If the more accurate ECN feedback scheme provides
          sufficient information, the integrity check could e.g. be performed
          by deterministically setting the CE in the sender and monitoring the
          respective feedback (similar to ECT(1) and the ECN Nonce sum).
          Whether a sender should take enforcement action when it detects
          wrong feedback information, and what kind of enforcement it should
          apply, are policy issues that need not be specified as part of a
          more accurate ECN feedback scheme.</t>

          <t hangText="Accuracy"><vspace/> <!--In TCP usually delayed ACKs are used. Thats means in most 
          cases only for every second data packets an acknowledgment is 
          sent. Moreover, an ACK can get lost.-->Classic ECN feeds back one
          congestion notification per RTT, which is sufficient for classic TCP
          congestion control which reduces the sending rate at most once per
          RTT. Thus the more accurate ECN feedback scheme should ensure that,
          if a congestion episode occurs, at least one congestion notification
          is echoed and received per RTT as classic ECN would do. Of course,
          the goal of a more accurate ECN extension is to reconstruct the
          number of CE markings more accurately. In the best case the new
          scheme should even allow reconstruction of the exact number of
          payload bytes that a CE marked packet was carrying. However, it is
          accepted that it may be too complex for a sender to get the exact
          number of congestion markings or marked bytes in all situations.
          Ideally, the feedback scheme should preserve the order in which any
          (of the four) ECN signals were received. And, ideally, it would even
          be possible for the sender to determine which of the packets covered
          by one delayed ACK were congestion marked, e.g. if the flow consists
          of packets of different sizes, or to allow for future protocols
          where the order of the markings may be important. <vspace
          blankLines="1"/> In the best case, a sender that sees more accurate
          ECN feedback information would be able to reconstruct the occurrence
          of any of the four codepoints (Not-ECT, CE, ECT(0), ECT(1)).
          However, assuming the sender marks all data packets as ECN-capable
          and uses the default setting of ECT(0), solely feeding back the
          occurrence of CE and ECT(1) might be sufficient. Thus a more
          accurate ECN feedback scheme should at least provide information on
          these two signals, CE and ECT(1).<vspace blankLines="1"/>If a more
          accurate ECN scheme can reliably deliver feedback in most but not
          all circumstances, ideally the scheme should at least not introduce
          bias. In other words, undetected loss of some ACKs should be as
          likely to increase as decrease the sender's estimate of the
          probability of ECN marking.</t>

          <t hangText="Complexity"><vspace/> Implementation should be as
          simple as possible and only a minimum of additional state
          information should be needed. This will enable more accurate ECN
          feedback to be used as the default feedback mechanism, even if only
          one ECN feedback signal per RTT is needed. Furthermore, the receiver
          should not make assumptions about the mechanism that was used to set
          the markings nor about any interpretation or reaction to the
          congestion signal. The receiver only needs to faithfully reflect
          congestion information back to the sender. <!--A proposal fulfilling this for a more accurate ECN 
          feedback can then also be the standard ECN feedback 
          mechanism.--></t>

          <t hangText="Overhead"><vspace/> A more accurate ECN feedback signal
          should limit the additional network load, because ECN feedback is
          ultimately not critical information (in the worst case, loss will
          still be available as a congestion signal of last resort). As
          feedback information has to be provided frequently and in a timely
          fashion, potentially all or a large fraction of TCP acknowledgments
          might carry this information. Ideally, no additional segments should
          be exchanged compared to an RFC3168 TCP session, and the overhead in
          each segment should be minimized.</t>

          <t hangText="Backward and forward compatibility"><vspace/> Given
          more accurate ECN feedback will involve a change to the TCP
          protocol, it should be negotiated between the two TCP endpoints.
          If either end does not support the more accurate feedback, they
          should both be able to fall back to classic ECN feedback. <vspace
          blankLines="1"/> A more accurate ECN feedback extension should aim
          to be able to traverse most existing middleboxes. Further, a
          feedback mechanism should provide a method to fall back to classic
          ECN signaling if the new signal is suppressed by certain
          middleboxes. <vspace blankLines="1"/> In order to avoid a fork in
          the TCP protocol specifications, if experiments with the new ECN
          feedback protocol are successful, it is intended to eventually
          update RFC3168 for any TCP/ECN sender, not just for ConEx or DCTCP
          senders. Then future senders will be able to unilaterally deploy new
          behaviours that exploit the existence of more accurate ECN feedback
          in receivers (forward compatibility). Conversely, even if another
          sender only needs one ECN feedback signal per RTT, it should be able
          to use more accurate ECN feedback, and simply ignore the excess
          information.</t>
        </list></t>
    </section>

    <section anchor="accecn_designs" title="Design Approaches">
      <t><!-- ToDo: Consider reemphasising why these sections are needed in a requirements doc -->
      All approaches presented below (and proposed so far) are able to provide
      accurate ECN feedback information as long as no ACK loss occurs and the
      congestion rate is reasonable. In case of a high ACK loss rate or very
      high congestion (CE marking) rate, the proposed schemes have different
      resilience characteristics depending on the number of bits used for the
      encoding. While classic ECN provides reliable (but inaccurate) feedback
      of a maximum of one congestion signal per RTT, the proposed schemes do
      not implement an explicit acknowledgement mechanism for the feedback (as
      e.g. the ECE / CWR exchange of <xref target="RFC3168"/>).</t>

      <section title="Re-Definition of ECN/NS Header Bits"><!--as a Flag-->
	<t>Schemes in this category can additionally use the NS bit for 
	capability negotiation during the TCP handshake exchange. Thus
	more accurate ECN feedback could be negotiated without changing the
	classic ECN negotiation, which keeps the scheme backwards compatible.</t>
	
        <t>Schemes in this category can simply re-define the ECN header flags, ECE 
	and CWR, to encode the occurrence of a CE marking at the receiver. This
        approach provides very limited resilience against loss of ACKs,
        particularly pure ACKs (no payload and therefore delivered
        unreliably).</t>

        <t>A couple of schemes have been proposed so far: <list
            style="symbols">
            <t>A naive one-bit scheme that sends one ECE for each CE received
            could use CWR to increase robustness against ACK loss by
            introducing redundant information on the next ACK, but this is
            still highly vulnerable to ACK loss.</t>

            <t>The scheme defined for DCTCP <xref target="Ali10"/>, which
            toggles the ECE feedback on an immediate ACK whenever the CE
            marking changes, and otherwise feeds back delayed ACKs with the
            ECE value unchanged. <xref target="DCTCP_Ambiguity"/> demonstrates
            that this scheme is still highly ambiguous to the sender if the
            ACKs are pure ACKs, and if some may have been lost.</t>
          </list></t>
      <!--</section>-->

      <!--<section title="Re-Definition of ECN/NS Header Bits as a Field">-->
	<t> Alternatively, the receiver uses the three ECN/NS header
        flags, ECE, CWR and NS to represent a counter that signals the 
	accumulated number of CE markings it has received. Resilience 
	against loss is better than with the flag-based schemes, but still not 
	ideal.</t>

        <t>A couple of coding schemes have been proposed so far in this
        category: <list style="symbols">
            <t>A 3-bit counter scheme continuously feeds back the three least
            significant bits of a CE counter;</t>

            <t>A scheme that defines a standardised lookup table to map the 8
            codepoints onto either a CE counter or an ECT(1) counter.</t>
          </list></t>

        <t>These proposed schemes provide accumulated information on ECN-CE
        marking feedback, similar to the number of acknowledged bytes in the
        TCP header. Due to the limited number of bits the ECN feedback
        information will wrap much more often than the acknowledgement field.
        Thus feedback information could be lost due to a relatively small
        sequence of pure-ACK losses. Resilience could be increased by
        introducing redundancy, e.g. by sending each counter increase two or more
        times. Of course any of these additional mechanisms will increase the
        complexity. If the congestion rate is greater than the ACK rate
        (multiplied by the number of congestion marks that can be signaled per
        ACK), the congestion information cannot correctly be fed back.
        Covering the worst case where every packet is CE marked can
        potentially be realized by dynamically adapting the ACK rate and
        redundancy. This again increases complexity and perhaps the signaling
        overhead as well. Schemes that do not re-purpose the ECN NS bit, could
        still support the ECN Nonce.</t>
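
        <t>The wrap problem can be illustrated with a minimal sketch. It
        assumes that the sender always picks the smallest non-negative
        counter increase consistent with the echoed field. The helper name
        decode_delta and the example numbers are hypothetical and are not
        taken from any of the proposed schemes:</t>

        <figure>
          <artwork><![CDATA[
```python
def decode_delta(prev_field, new_field, bits=3):
    """Smallest non-negative counter increase consistent with two
    echoed field values (hypothetical helper, not from any spec)."""
    return (new_field - prev_field) % (2 ** bits)

# Receiver side: the true CE counter before and after a burst of
# 10 CE marks (assumed example values).
true_before, true_after = 5, 15

# Assume every ACK sent during the burst is lost, so the sender only
# sees the field values at the two ends of the burst.

# 3-bit field: the counter wrapped once, so the sender under-counts.
seen_3bit = decode_delta(true_before % 8, true_after % 8, bits=3)

# 4-bit field: no wrap within the burst, so the increase is exact.
seen_4bit = decode_delta(true_before % 16, true_after % 16, bits=4)

print(seen_3bit, seen_4bit)
```
]]></artwork>
        </figure>

        <t>Under these assumptions, the 3-bit field reports an increase of
        only 2 marks instead of 10, while a field that is one bit wider
        recovers the full increase for the same loss pattern.</t>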
      </section>

      <section title="Using Other Header Bits">
        <t>As seen in <xref target="TCPHdr"/>, there are currently three
        unused flags in the TCP header. The proposed 3-bit counter or
        codepoint schemes could be extended by one or more of these bits.
        Each additional bit doubles the range of the counter, so resilience
        against ACK loss would grow exponentially with the number of bits
        added, while the respective drawbacks would remain the same.</t>

        <t>Alternatively, the receiver could use bits in the Urgent Pointer
        field to signal more bits of its congestion signal counter, but only
        whenever it does not set the Urgent Flag. Because the Urgent Flag is
        often not set, resilience could be increased without additional
        header overhead.</t>

        <t>Any proposal to use such bits would need to check the likelihood
        that some middleboxes might discard or 'normalize' the currently
        unused flag bits or a non-zero Urgent Pointer when the Urgent Flag is
        cleared.</t>
      </section>

      <section title="Using a TCP Option">
        <t>Alternatively, a new TCP option could be introduced to help
        maintain the accuracy and integrity of ECN feedback between receiver
        and sender. Such an option could provide higher resilience and even
        more information. For example, ECN for RTP/UDP <xref target="RFC6679"/>
        explicitly provides the number of ECT(0), ECT(1), CE, non-ECT marked
        and lost packets, and SCTP counts the number of ECN marks <xref
        target="I-D.stewart-tsvwg-sctpecn"/> between CWR chunks. However,
        deploying new TCP options has its own challenges. Moreover, to
        actually achieve high resilience, this option would need to be carried
        by most or all ACKs. Thus this approach would introduce considerable
        signaling overhead, even though ECN feedback is not extremely critical
        information (in the worst case, loss will still be available to
        provide a strong congestion feedback signal). In any case, such a TCP
        option could be used in addition to a more accurate ECN feedback
        scheme in the TCP header or in addition to classic ECN, only when
        needed and when space is available.</t>
      </section>

      <!-- 
    
    <t>Combining the idea of <xref target="eci_mode"/> and <xref 
    target="cp_mode"/>, further extending it to a one-octet option, 
    would allow the signaling of two values, each with 4 bit. The gains 
    in worst case ACK loss, delayed ACK ratios and maintaining ECN Nonce 
    would scale accordingly. </t> 
    
    <t>Alternatively, if timestamp capability negotiation is supported, 
    a few bits could be extracted from the timestamp value, to provide 
    extended signaling. However, processing TCP options (or overloaded 
    TCP options) is more complex than processing of header flags. </t>
    
    -->
    </section>

    <section title="Acknowledgements">
      <t>Thanks <!-- to Bob Briscoe for reviewing and providing valuable 
    additions on DCTCP and ConEx. Moreover, thanks -->to Gorry Fairhurst <!-- as 
    well as Bob Briscoe -->for ideas on CE-based integrity checking and to
      Mohammad Alizadeh for suggesting the need to avoid bias. Moreover,
    thanks to Michael Welzl and Michael Scharf for their feedback.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>

      <!--
    
    <t> If this memo was to progress to standards track, it would update 
    RFC3168 and RFC3540, to add new combinations of flags in the TCP 
    header for capability negotiation (see <xref target="TCPNeg"/>) and 
    a change in TCP ECN semantics (see <xref target="TCPSig"/>).</t>
    
    -->
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>Given that ECN feedback is used as input for congestion control, the
      respective algorithm would not react appropriately if ECN feedback were
      lost and the resilience mechanism to recover it were inadequate. This
      resilience requirement is articulated in <xref target="accecn_reqs"/>.
      However, it should be noted that ECN feedback is not the last resort
      against congestion collapse, because if there is insufficient response
      to ECN, loss will ensue, and TCP will still react appropriately to
      loss.</t>

      <t>A receiver could suppress ECN feedback information, leading to its
      connections consuming excess sender or network resources. <!--Or an attacker could providing wrong congestion information 
    which then easily leads to throttling of certain connections. These 
    problems are --> This problem is similar to that seen with the classic ECN
      feedback scheme and should be addressed by integrity checking as
      required in <xref target="accecn_reqs"/>.</t>
    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <references title="Normative References">
      <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->

      &RFC2119;

      &RFC3168;

      &RFC3540;
    </references>

    <references title="Informative References">
      <!--      <?rfc include="reference.I-D.briscoe-tsvwg-re-ecn-tcp.xml"?> -->

      <!--      <?rfc include="reference.I-D.kuehlewind-tcpm-accurate-ecn-option.xml"?> -->

      <?rfc include="reference.I-D.moncaster-tcpm-rcv-cheat.xml"?>

      <?rfc include="reference.I-D.stewart-tsvwg-sctpecn.xml"?>

      &RFC2018;

      <!--      &RFC5562; -->

      &RFC5681;

      &RFC5690;

      &RFC6679;

      &RFC6789;

      <reference anchor="Ali10"
                 target="http://portal.acm.org/citation.cfm?id=1851192">
        <front>
          <title>Data Center TCP (DCTCP)</title>

          <author fullname="Mohammad Alizadeh" initials="M" surname="Alizadeh">
            <organization/>
          </author>

          <author fullname="Albert Greenberg" initials="A" surname="Greenberg">
            <organization/>
          </author>

          <author fullname="David A. Maltz" initials="D.A." surname="Maltz">
            <organization/>
          </author>

          <author fullname="Jitendra Padhye" initials="J" surname="Padhye">
            <organization/>
          </author>

          <author fullname="Parveen Patel" initials="P" surname="Patel">
            <organization/>
          </author>

          <author fullname="Balaji Prabhakar" initials="B" surname="Prabhakar">
            <organization/>
          </author>

          <author fullname="Sudipta Sengupta" initials="S" surname="Sengupta">
            <organization/>
          </author>

          <author fullname="Murari Sridharan" initials="M" surname="Sridharan">
            <organization/>
          </author>

          <date month="October" year="2010"/>
        </front>

        <seriesInfo name="ACM SIGCOMM CCR" value="40(4):63-74"/>

        <format target="http://portal.acm.org/citation.cfm?id=1851192"
                type="PDF"/>
      </reference>
    </references>

    <section anchor="DCTCP_Ambiguity"
             title="Ambiguity of the More Accurate ECN Feedback in DCTCP">
      <t>As defined in <xref target="Ali10"/>, a DCTCP receiver feeds back
      ECE=0 on delayed ACKs as long as CE remains 0, and also immediately
      sends an ACK with ECE=0 when CE transitions to 1. Similarly, it
      continually feeds back ECE=1 on delayed ACKs while CE remains 1 and
      immediately feeds back ECE=1 when CE transitions to 0. A sender can
      unambiguously decode this scheme if there is never any ACK loss, and the
      sender assumes there will never be any ACK loss. </t>

      <t>The following two examples show that the feedback sequence becomes
      highly ambiguous to the sender, if either of these conditions is broken.
      Below, '0' will represent ECE=0, '1' will represent ECE=1 and '.' will
      represent a gap of one segment between delayed ACKs. Now imagine that
      the sender receives the following sequence of feedback on 3 pure
      ACKs:<list style="empty">
          <t>0.0.0</t>
        </list>The receiver could originally have sent any of the following
      four sequences:<list style="letters">
          <t>0.0.0 (0 x CE)</t>

          <t>010.0 (1 x CE)</t>

          <t>0.010 (1 x CE)</t>

          <t>01010 (2 x CE)</t>
        </list>where each 1 represents a possible pure ACK carrying ECE
      feedback that could have been lost. If the sender guesses (a), it might
      be correct, or it might miss 1 or 2 congestion marks over 5 packets.
      Therefore, when confronted with this simple sequence (that is not
      contrived), a sender can guess that congestion might have been 0%, 20%
      or 40%, but it cannot know which.</t>

      <t>Sequences with a longer gap (e.g. 0...0.0) become far more ambiguous.
      It helps a little if the sender knows the distance the receiver uses
      between delayed ACKs, and it helps a lot if the distance is 1, i.e. no
      delayed ACKs, but even then there will still be ambiguity whenever there
      are pure ACK losses. </t>
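
      <t>The enumeration for the received sequence 0.0.0 can be reproduced
      with a short sketch. It assumes, as in the example above, that each
      '.' gap hides either nothing or exactly one lost pure ACK carrying
      ECE=1. It therefore does not cover longer gaps, where more
      combinations are possible. The function name candidates is
      hypothetical:</t>

      <figure>
        <artwork><![CDATA[
```python
from itertools import product

def candidates(received):
    """All sequences the receiver may have sent, assuming each '.'
    gap hides either nothing or one lost pure ACK with ECE=1."""
    gaps = [i for i, c in enumerate(received) if c == "."]
    out = []
    for fill in product(".1", repeat=len(gaps)):
        seq = list(received)
        for pos, sym in zip(gaps, fill):
            seq[pos] = sym
        out.append("".join(seq))
    return out

received = "0.0.0"
seqs = candidates(received)

# Distinct numbers of CE marks the sender cannot tell apart.
marks = sorted({s.count("1") for s in seqs})

# Corresponding congestion levels in percent of the 5 packets.
rates = [100 * m // len(received) for m in marks]

print(seqs, marks, rates)
```
]]></artwork>
      </figure>

      <t>Under this assumption the sketch yields the same four candidate
      sequences, with 0, 1 or 2 possible CE marks over the 5 packets.</t>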

      <!--      <t>Another simple example illustrates how quickly the ambiguity can get
      out of hand. Imagine the sender receives this sequence of feedback on
      pure ACKs:<list style="empty">
          <t>0...0.0</t>
        </list>The sender could guess that the receiver originally sent any of
      the following nine sequences:<list style="letters">
          <t>0.0.0.0 (0 x CE)</t>

          <t>010.0.0 (1 x CE)</t>

          <t>0.010.0 (1 x CE)</t>

          <t>001.0.0 (1 x CE)</t>

          <t>0.1.0.0 (2 x CE)</t>

          <t>00.10.0 (2 x CE)</t>

          <t>01.0010 (2 x CE)</t>

          <t>0.110.0 (3 x CE)</t>

          <t>01010.0 (3 x CE)</t>
        </list>If the sender guesses (a), it might be correct, or it might
      miss 1, 2 or 3 congestion marks over 7 packets. Therefore, when
      confronted with this simple sequence (that is not contrived), a sender
      can guess that congestion might have been 0%, 14%, 29% or 43%., but it
      doesn't know which. </t> -->
    </section>
  </back>
</rfc>
