<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC0793 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.0793.xml">
<!ENTITY RFC2018 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2018.xml">
<!ENTITY RFC2119 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC3168 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml">
<!ENTITY RFC3540 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3540.xml">
<!ENTITY RFC4987 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4987.xml">
<!ENTITY RFC5226 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5226.xml">
<!ENTITY RFC5562 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5562.xml">
<!ENTITY RFC5681 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5681.xml">
<!ENTITY RFC5925 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5925.xml">
<!ENTITY RFC6824 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6824.xml">
<!ENTITY RFC6994 SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6994.xml">
<!ENTITY I-D.ietf-tcpm-accecn-reqs SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-tcpm-accecn-reqs.xml">
<!ENTITY I-D.ietf-tcpm-fastopen SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-tcpm-fastopen.xml">
<!ENTITY I-D.ietf-conex-abstract-mech SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-conex-abstract-mech.xml">
<!ENTITY I-D.kuehlewind-tcpm-ecn-fallback SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.kuehlewind-tcpm-ecn-fallback.xml">
<!ENTITY I-D.moncaster-tcpm-rcv-cheat SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.moncaster-tcpm-rcv-cheat.xml">
<!ENTITY I-D.bensley-tcpm-dctcp SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.bensley-tcpm-dctcp.xml">
]>
<?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ?>
<!-- Alterations to I-D/RFC boilerplate -->
<?rfc private="" ?>
<!-- Default private="" Produce an internal memo 2.5pp shorter than an I-D or RFC -->
<?rfc rfcprocack="yes" ?>
<!-- Default rfcprocack="no" add a short sentence acknowledging xml2rfc -->
<?rfc strict="no" ?>
<!-- Default strict="no" Don't check I-D nits -->
<?rfc rfcedstyle="yes" ?>
<!-- Default rfcedstyle="yes" attempt to closely follow finer details from the latest observable RFC-Editor style -->
<!-- IETF process -->
<?rfc iprnotified="no" ?>
<!-- Default iprnotified="no" I haven't disclosed existence of IPR to IETF -->
<!-- ToC format -->
<?rfc toc="yes" ?>
<!-- Default toc="no" No Table of Contents -->
<!-- Cross referencing, footnotes, comments -->
<?rfc symrefs="yes"?>
<!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->
<?rfc sortrefs="yes"?>
<!-- Default sortrefs="no" Don't sort references into order -->
<?rfc comments="yes" ?>
<!-- Default comments="no" Don't render comments -->
<?rfc inline="no" ?>
<!-- Default inline="no" if comments is "yes", then render comments inline; otherwise render them in an `Editorial Comments' section -->
<!-- Pagination control -->
<?rfc compact="yes"?>
<!-- Default compact="no" Start sections on new pages -->
<?rfc subcompact="no"?>
<!-- Default subcompact="(as compact setting)" yes/no is not quite as compact as yes/yes -->
<!-- HTML formatting control -->
<?rfc emoticonic="yes" ?>
<!-- Default emoticonic="no" Doesn't prettify HTML format -->
<rfc category="exp" docName="draft-kuehlewind-tcpm-accurate-ecn-03"
     ipr="trust200902" updates="">
  <front>
    <title abbrev="Accurate TCP-ECN Feedback">More Accurate ECN Feedback in
    TCP</title>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <street>Martlesham Heath</street>

          <city>Ipswich</city>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <email>bob.briscoe@bt.com</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <author fullname="Richard Scheffenegger" initials="R."
            surname="Scheffenegger">
      <organization>NetApp, Inc.</organization>

      <address>
        <postal>
          <street>Am Euro Platz 2</street>

          <code>1120</code>

          <city>Vienna</city>

          <region/>

          <country>Austria</country>
        </postal>

        <phone>+43 1 3676811 3146</phone>

        <email>rs@netapp.com</email>
      </address>
    </author>

    <author fullname="Mirja Kühlewind" initials="M."
            surname="Kühlewind">
      <organization>University of Stuttgart</organization>

      <address>
        <postal>
          <street>Pfaffenwaldring 47</street>

          <code>70569</code>

          <city>Stuttgart</city>

          <country>Germany</country>
        </postal>

        <email>mirja.kuehlewind@ikr.uni-stuttgart.de</email>
      </address>
    </author>

    <date day="02" month="July" year="2014"/>

    <area>Transport</area>

    <workgroup>Transport Area Working Group</workgroup>

    <keyword>Congestion Control and Management</keyword>

    <keyword>Congestion Notification</keyword>

    <keyword>Feedback</keyword>

    <keyword>Reliable</keyword>

    <keyword>Ordered</keyword>

    <keyword>Protocol</keyword>

    <keyword>ECN</keyword>

    <abstract>
      <t>Explicit Congestion Notification (ECN) is a mechanism where network
      nodes can mark IP packets instead of dropping them to indicate incipient
      congestion to the end-points. Receivers with an ECN-capable transport
      protocol feed back this information to the sender. ECN is specified for
      TCP in such a way that only one feedback signal can be transmitted per
      Round-Trip Time (RTT). Recently proposed mechanisms such as Congestion
      Exposure (ConEx) and Data Center TCP (DCTCP) need more accurate ECN
      feedback whenever more than one marking is received in one
      RTT. This document specifies an experimental scheme to provide more than
      one feedback signal per RTT in the TCP header. Given that TCP header space is
      scarce, it overloads the three existing ECN-related flags in the TCP
      header. Also, to improve robustness it uses 15 more bits if available.
      For initial experiments it places these in a TCP option. However, if the
      Urgent flag is cleared, zero header overhead could be achieved by
      reusing the Urgent Pointer opportunistically. Therefore this document
      reserves space in the Urgent Pointer to be used if the protocol
      progresses to the standards track.</t>
    </abstract>
  </front>

  <!-- ================================================================ -->

  <middle>
    <!-- ================================================================ -->

    <section anchor="accecn_Introduction" title="Introduction">
      <t>Explicit Congestion Notification (ECN) <xref target="RFC3168"/> is a
      mechanism where network nodes can mark IP packets instead of dropping
      them to indicate incipient congestion to the end-points. Receivers with
      an ECN-capable transport protocol feed back this information to the
      sender. ECN is specified for TCP in such a way that only one feedback
      signal can be transmitted per Round-Trip Time (RTT). Recently proposed
      mechanisms such as Congestion Exposure (ConEx <xref
      target="I-D.ietf-conex-abstract-mech"/>) and DCTCP <xref
      target="I-D.bensley-tcpm-dctcp"/> need more accurate ECN feedback
      whenever more than one marking is received in one RTT. A
      fuller treatment of the motivation for this specification is given in
      <xref target="I-D.ietf-tcpm-accecn-reqs"/>.</t>

      <t>This document specifies an experimental scheme for ECN feedback in
      the TCP header to provide more than one feedback signal per RTT. It is
      called the more accurate ECN feedback scheme, or AccECN for short. If
      AccECN progresses from experimental to the standards track, it is
      intended to be a complete replacement for classic ECN feedback, not a
      fork in the design of TCP. Thus, the applicability of AccECN is intended
      to include all public and private IP networks (and even any non-IP
      networks over which TCP is used today). Until the AccECN experiment
      succeeds, <xref target="RFC3168"/> will remain as the standards track
      specification for adding ECN to TCP. To avoid confusion we call the ECN
      specification of <xref target="RFC3168"/> 'classic ECN' in this
      document.</t>

      <t>AccECN is solely an (experimental) change to the TCP wire protocol.
      It is completely independent of how TCP might respond to congestion
      feedback. This specification overloads flags and fields in the main TCP
      header with new definitions, so both ends have to support the new wire
      protocol before it can be used. Therefore during the TCP handshake the
      two ends use the three ECN-related flags in the TCP header to negotiate
      the most advanced feedback protocol that they can both support.</t>

      <section title="Document Roadmap">
        <t>The following introductory sections outline the goals of AccECN
        (<xref target="accecn_Goals"/>) and the goal of experiments with ECN
        (<xref target="accecn_Expt_Goals"/>) so that it is clear what success
        would look like. Then terminology is defined (<xref
        target="accecn_Terminology"/>) and a recap of existing prerequisite
        technology is given (<xref target="accecn_Recap"/>).</t>

        <t><xref target="accecn_Overview"/> gives an informative overview of
        the AccECN protocol. Then <xref target="accecn_Spec"/> gives the
        normative protocol specification. <xref
        target="accecn_Interact_Variants"/> assesses the interaction of AccECN
        with commonly used variants of TCP, whether standardised or not. <xref
        target="accecn_Properties"/> summarises the features and properties of
        AccECN.</t>

        <t><xref target="accecn_IANA_Considerations"/> summarises the protocol
        fields and numbers that IANA will need to assign and <xref
        target="accecn_Security_Considerations"/> points to the aspects of the
        protocol that will be of interest to the security community, as well
        as discussing additional security-related issues.</t>

        <t>The following aspects are relegated to appendices:<list
            style="symbols">
            <t><xref target="accecn_Algo_Examples"/>: Pseudocode examples for
            the various algorithms that AccECN uses;</t>

            <t>Then three appendices for use during document development that
            will be deleted before publication {ToDo: Delete this list before
            publication}:<list style="symbols">
                <t><xref target="accecn_Alt_Designs"/>: Protocol design
                alternatives that could be considered for inclusion in the
                main specification;</t>

                <t><xref target="accecn_Open_Issues"/>: a 'To Do' list of open
                protocol design issues;</t>

                <t><xref target="accecn_Doc_Changes"/>: Document change
                log.</t>
              </list></t>
          </list></t>
      </section>

      <section anchor="accecn_Goals" title="Goals">
        <t><xref target="I-D.ietf-tcpm-accecn-reqs"/> enumerates requirements
        that a candidate feedback scheme will need to satisfy, under the
        headings: resilience, timeliness, integrity, accuracy (including
        ordering and lack of bias), complexity, overhead and compatibility
        (both backward and forward). It recognises that no scheme is likely
        to satisfy all the requirements fully, so trade-offs
        between them are to be expected. <xref target="accecn_Properties"/>
        presents the properties of AccECN against these requirements and
        discusses the trade-offs made.</t>

        <t>The requirements document recognises that a protocol as ubiquitous
        as TCP needs to be able to serve as-yet-unspecified requirements.
        Therefore an AccECN receiver aims to act as a generic reflector of
        congestion information so that in future new sender behaviours can be
        deployed unilaterally.</t>
      </section>

      <section anchor="accecn_Expt_Goals" title="Experiment Goals">
        <t>TCP is critical to the robust functioning of the Internet, so
        any proposed modifications to TCP need to be thoroughly
        tested. The present specification describes an experimental protocol
        that adds more accurate ECN feedback to TCP. The
        intention is to specify the protocol sufficiently so that more than
        one implementation can be built in order to test its function,
        robustness and interoperability (with itself and with previous versions
        of ECN and TCP).</t>

        <t><list style="hanging">
            <t hangText="Success criteria: ">The experimental protocol will be
            considered successful if it satisfies the requirements of <xref
            target="I-D.ietf-tcpm-accecn-reqs"/> in the consensus opinion of
            the IETF tcpm working group. In short, this requires that it
            improves the accuracy and timeliness of TCP's ECN feedback, as
            claimed in <xref target="accecn_Properties"/>, while striking a
            balance between the conflicting requirements of resilience,
            integrity and minimisation of overhead. It also requires that it
            is not unduly complex, and that it is compatible with prevalent
            equipment behaviours in the current Internet, whether or not they
            comply with standards.</t>

            <t hangText="Duration: ">To be credible, the experiment will need
            to last at least 12 months from publication of the present
            specification. At that time, a report on the experiment will be
            written up. If successful, it would then be appropriate to work on
            a standards track specification that adds more accurate ECN
            feedback to TCP.</t>
          </list></t>
      </section>

      <section anchor="accecn_Terminology" title="Terminology">
        <t><list style="hanging">
            <t hangText="AccECN:">The more accurate ECN feedback scheme will
            be called AccECN for short.</t>

            <t hangText="Classic ECN:">The ECN scheme as specified in <xref
            target="RFC3168"/>.</t>

            <t hangText="ACK:">A TCP acknowledgement, with or without a data
            payload.</t>

            <t hangText="Pure ACK:">A TCP acknowledgement without a data
            payload.</t>

            <t hangText="SupAccECN:">The Supplementary Accurate ECN field that
            provides additional resilience as well as information about the
            ordering of ECN markings covered by a delayed ACK.</t>

            <t hangText="Data receiver:">The endpoint of a TCP half-connection
            that receives data and sends AccECN feedback.</t>

            <t hangText="Data sender:">The endpoint of a TCP half-connection
            that sends data and receives AccECN feedback.</t>

            <t
            hangText="Outgoing AccECN Protocol Handler (or, Outgoing Protocol Handler):">The
            protocol handler at the Data Receiver that marshals the AccECN
            fields when sending an ACK.</t>

            <t
            hangText="Incoming AccECN Protocol Handler (or, Incoming Protocol Handler):">The
            protocol handler at the Data Sender that reads the AccECN fields
            when receiving an ACK.</t>
          </list></t>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>

      <section anchor="accecn_Recap"
               title="Recap of Existing ECN feedback in IP/TCP">
        <t>ECN <xref target="RFC3168"/> requires two bits in the IP header.
        Prior to the specification of ECN, these two bits were always zero,
        which is called Not-ECT. An ECN sender can set two possible codepoints
        (ECT(0) or ECT(1)) to indicate an ECN-capable transport (ECT). It is
        prohibited from doing so unless it has checked that the receiver will
        understand ECN and be able to feed it back. A network node can set
        both bits simultaneously when it experiences congestion, which is
        termed 'Congestion Experienced' (CE), or loosely a 'congestion mark'.
        <xref target="accecn_Tab_ECN"/> summarises these codepoints.</t>

        <texttable anchor="accecn_Tab_ECN"
                   title="The ECN Field in the IP Header">
          <ttcol>IP-ECN codepoint (binary)</ttcol>

          <ttcol>Codepoint name</ttcol>

          <ttcol>Abbreviation</ttcol>

          <ttcol>Description</ttcol>

          <c>00</c>

          <c>Not-ECT</c>

          <c>N</c>

          <c>Not ECN-Capable Transport</c>

          <c>01</c>

          <c>ECT(1)</c>

          <c>1</c>

          <c>ECN-Capable Transport (1)</c>

          <c>10</c>

          <c>ECT(0)</c>

          <c>0</c>

          <c>ECN-Capable Transport (0)</c>

          <c>11</c>

          <c>CE</c>

          <c>C</c>

          <c>Congestion Experienced</c>
        </texttable>
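
        <t>As an informal illustration of <xref target="accecn_Tab_ECN"/>, the
        following Python sketch reads the IP-ECN codepoint from the octet that
        carries it (the IPv4 TOS octet or the IPv6 Traffic Class). It assumes
        the ECN field occupies the two least significant bits of that octet,
        as in <xref target="RFC3168"/>. The names EcnCodepoint and
        ecn_from_octet are hypothetical and are not defined by this
        specification.</t>

        <figure>
          <artwork><![CDATA[
```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """The four IP-ECN codepoints, keyed by their 2-bit binary value."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def ecn_from_octet(octet: int) -> EcnCodepoint:
    """Return the codepoint held in the two low-order bits of the octet.

    Assumption: 'octet' is the IPv4 TOS octet or IPv6 Traffic Class,
    with the DSCP in the upper six bits and ECN in the lower two.
    """
    return EcnCodepoint(octet & 0b11)
```
]]></artwork>
        </figure>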

        <t>In the TCP header the first two bits in byte 14 are defined as
        flags for the use of ECN (CWR and ECE in <xref
        target="accecn_Fig_TCPHdr"/>). On reception of a CE-marked packet at
        the IP layer, the Data Receiver starts to set the ECN-Echo
        (ECE) flag continuously in the TCP header of ACKs, which
        ensures the signal is received reliably even if ACKs are lost. The Data
        Sender confirms that it has received at least one ECE signal by
        responding with the Congestion Window Reduced (CWR) flag, which allows
        the Data Receiver to stop repeating the ECE flag. This always
        leads to a full RTT of ACKs with ECE set. Thus any additional CE
        markings arriving within this RTT cannot be fed back.</t>

        <t>The ECN Nonce <xref target="RFC3540"/> is an optional experimental
        addition to ECN that the TCP sender can use to protect against
        accidental or malicious concealment of marked or dropped packets. The
        sender can send an ECN nonce, which is a continuous pseudo-random
        pattern of ECT(0) and ECT(1) codepoints in the ECN field. The receiver
        is required to feed back a 1-bit nonce sum that counts the occurrence
        of ECT(1) packets using the last bit of byte 13 in the TCP header,
        which is defined as the Nonce Sum (NS) flag.</t>

        <?rfc needLines="8" ?>

        <figure align="center" anchor="accecn_Fig_TCPHdr"
                title="The (post-ECN Nonce) definition of the TCP header flags">
          <artwork align="center"><![CDATA[             
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I |
|               |           |   | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
        </figure>
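
        <t>The flag positions in <xref target="accecn_Fig_TCPHdr"/> can be
        illustrated with a short Python sketch that pulls NS, CWR and ECE out
        of a raw TCP header. It assumes the text above counts header bytes
        from 1, so that "byte 13" and "byte 14" are at offsets 12 and 13 of a
        zero-indexed buffer. The function name ecn_flags is hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def ecn_flags(tcp_header: bytes) -> dict:
    """Extract the three ECN-related flags from a raw TCP header.

    Assumption: bytes are counted from 1 in the text, so byte 13 is
    tcp_header[12] and byte 14 is tcp_header[13].
    """
    if len(tcp_header) < 14:
        raise ValueError("need at least 14 bytes of TCP header")
    byte13 = tcp_header[12]  # Header Length, Reserved, NS
    byte14 = tcp_header[13]  # CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
    return {
        "NS": byte13 & 0x01,         # last bit of byte 13
        "CWR": (byte14 >> 7) & 0x01,  # first bit of byte 14
        "ECE": (byte14 >> 6) & 0x01,  # second bit of byte 14
    }
```
]]></artwork>
        </figure>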
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Overview" title="AccECN Protocol Overview">
      <t>This section provides an informative overview of the AccECN protocol
      that will be normatively specified in <xref target="accecn_Spec"/>.</t>

      <section title="Essential and Supplementary Parts">
        <t>Given limitations on the space available for TCP options and given
        the possibility that certain incorrectly designed middleboxes prevent
        TCP using any new options, the AccECN protocol has had to be designed
        in two parts:<list style="symbols">
            <t>an essential part that provides more accurate ECN feedback than
            classic ECN with limited resilience against ACK loss;</t>

            <t>a supplementary part that serves three functions:<list
                style="symbols">
                <t>it greatly improves the resilience of AccECN feedback
                information against loss of ACKs;</t>

                <t>it provides information about the order in which ECN
                markings in the IP header arrived at the Data Receiver;</t>

                <t>it improves the timeliness of AccECN feedback when a
                delayed ACK covers multiple congestion signals.</t>
              </list></t>
          </list></t>

        <t>The essential part overloads the previous definition of the three
        flags in the TCP header that had been assigned for use by ECN. This
        design choice deliberately replaces the classic ECN feedback protocol,
        rather than leaving classic ECN intact and adding more accurate
        feedback separately:<list style="symbols">
            <t>because this efficiently reuses scarce TCP header space, given
            TCP option space is approaching saturation;</t>

            <t>because a single upgrade path for the TCP protocol is
            preferable to a fork in the design;</t>

            <t>because otherwise classic and accurate ECN feedback could give
            conflicting feedback on the same segment, which could open up new
            security concerns and make implementations unnecessarily
            complex;</t>

            <t>because middleboxes are more likely to faithfully forward the
            TCP ECN flags than newly defined areas of the TCP header.</t>
          </list></t>

        <t>AccECN is designed to work even if the supplementary part is
        removed or zeroed out, as long as the essential part gets through. The
        supplementary part is carried in a field called Supplementary Accurate
        ECN (SupAccECN).</t>

        <t>It is eventually intended that the SupAccECN field would be placed
        within the main TCP header, by overloading the Urgent Pointer in any
        segment with URG = 0. However, it would be presumptuous to reassign
        bits in the main TCP header on an experimental basis. Therefore, this
        specification reserves sufficient bits within the Urgent Pointer (when
        URG = 0) for use by AccECN if it reaches the standards track. For the
        present AccECN experiments, this specification defines an experimental
        TCP option to carry SupAccECN instead.</t>

        <t>When URG = 0, the Urgent Pointer field cannot be used as an Urgent
        Pointer. Therefore, this specification gives it a new name when URG =
        0, defining it as the Non-Urgent field. This specification also
        establishes an IANA registry for future standards actions to assign
        values in this newly defined Non-Urgent field.</t>

        <t>In order to ease a future transition from experiment to standards
        track, the Incoming Protocol Handler of all AccECN implementations is
        required to be able to read the SupAccECN field whether it arrives in
        a TCP Option or within the Non-Urgent field. However, for the present
        experimental specification, an AccECN implementation is forbidden from
        writing into the Non-Urgent field.</t>

        <t>Reserving the Non-Urgent field for future use by AccECN is
        justified because the Non-Urgent field cannot always be guaranteed to
        be available, and AccECN is unusual in that it is designed to work
        reasonably well even if the supplementary part is sometimes missing.
        Therefore, on the rare segments when the Urgent Pointer is needed for
        its original purpose, URG = 1 can still be set and AccECN will still
        work. On all other segments, where URG = 0, a future standards action
        can overload part of the Non-Urgent field for use by AccECN.</t>
      </section>

      <section title="Capability Negotiation">
        <t>AccECN is a change to the wire protocol of the main TCP header,
        so it can only be used if both endpoints have been upgraded to
        understand it. The client signals support for AccECN on the initial
        SYN of a connection and the server signals whether it supports AccECN
        on the SYN/ACK. The TCP flags on the SYN that the client uses to
        signal AccECN support have been carefully chosen so that a server will
        interpret them as a request to support the most advanced variant of
        ECN that it supports. Then the client falls back to the same ECN
        variant.</t>

        <t>The above negotiation uses the three ECN-related flags in the TCP
        header and determines if both ends support the essential part of
        AccECN. <!-- On the SYN/ACK then on subsequent segments -->On segments
        after the SYN/ACK, the SupAccECN field is used to determine whether
        the supplementary part of AccECN is usable over each half-connection.
        No supplementary part is needed on the initial SYN. A proposal to
        include a supplementary AccECN field on the SYN/ACK is included in
        <xref target="accecn_SupAccECN_on_SYN_ACK"/>.</t>
      </section>

      <section title="Two Complementary Feedback Methods">
        <t>Each AccECN half-connection uses two complementary methods to feed
        back ECN markings:<list style="hanging">
            <t hangText="Cumulative Counters:">A Data Receiver maintains three
            counters for the number of CE, ECT(1) and Not-ECT codepoints
            received since the start of the half-connection. In each ACK it
            places one of these counters, reduced in size by a suitable modulo
            operation. The Data Sender reads each counter in order to update
            its own three respective counters, which it uses to track the
            three counters at the Data Receiver. Of course, each endpoint
            takes the role of both Data Receiver and Data Sender, so each will
            maintain three counters as a receiver and three as a sender.
            AccECN does not provide an explicit count of ECT(0) marks, but
            this can be inferred from the other feedback;</t>

            <t hangText="Sequence List:">A list of the codepoints in the
            IP-ECN field of all the segments covered by a delayed ACK, in the
            order in which they arrived at the Data Receiver. This list also
            provides timely feedback of congestion information that is not
            covered by the single counter selected for that ACK.</t>
          </list></t>
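
        <t>The cumulative counter method described above can be sketched
        informally in Python as follows. This is an illustration under stated
        assumptions, not the normative algorithm: the class and function
        names are hypothetical, the field width passed as 'bits' is a
        placeholder rather than a value defined here, and the ECT(0)
        inference assumes the Data Sender knows how many packets the feedback
        covers.</t>

        <figure>
          <artwork><![CDATA[
```python
COUNTED = ("CE", "ECT(1)", "Not-ECT")


class ReceiverCounters:
    """Counts of codepoints seen since the half-connection started."""

    def __init__(self):
        self.counts = {name: 0 for name in COUNTED}

    def on_packet(self, codepoint: str) -> None:
        # ECT(0) arrivals are not counted explicitly.
        if codepoint in self.counts:
            self.counts[codepoint] += 1

    def feedback(self, name: str, bits: int) -> int:
        # Reduce the chosen counter modulo 2**bits so it fits the field.
        # 'bits' is a hypothetical width used only for illustration.
        return self.counts[name] % (1 << bits)


def infer_ect0(packets_covered: int, ce: int, ect1: int,
               not_ect: int) -> int:
    """ECT(0) count implied by the other three counts (assumption)."""
    return packets_covered - (ce + ect1 + not_ect)
```
]]></artwork>
        </figure>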

        <t>TCP's traditional feedback is byte-based, whereas AccECN feedback
        is packet-based, which was a pragmatic choice to reduce feedback
        overhead, given each packet carries only one ECN mark. AccECN aims to
        act as a sufficiently generic feedback reflector that can be applied
        for different uses by different TCP sender behaviours, both existing
        and in the future.</t>

        <t>If a particular sender behaviour needed to associate AccECN's
        feedback of each ECN marking with the size of the original packet that
        picked up the marking, there is enough information in AccECN feedback
        to do so, although perhaps imperfectly. Similarly, if a sender
        behaviour needed to associate the feedback of each ECN marking with
        the timing of each packet it originally sent, that too ought to be
        possible. Of course, the order of arrival at the receiver is not
        necessarily the order in which packets were sent, and the order in
        which ACKs return might be different again. So, to apply AccECN to
        these more challenging tasks, the Data Sender would probably have to
        record the sizes and/or timings of packets in flight and combine
        AccECN feedback with the cumulative acknowledgement numbers on each
        ACK as well as selective ACK (SACK) information <xref
        target="RFC2018"/>.</t>

        <t>Whether such calculations are required or not is outside the scope
        of the present AccECN specification. The role of AccECN is merely to
        ensure it would be possible for a Data Sender to reconstruct which
        segment carried which marking, not to mandate whether it should. As
        long as AccECN reflects sufficient feedback information without
        excessive overhead, it fulfils its role. One reason for the
        experimental status of the present specification is to establish
        whether the trade-off between accuracy and overhead has been pitched
        at the right level.</t>
      </section>

      <section title="Resilience Against ACK Loss">
        <t>Because the counter method repeats one of the accumulating counters
        on each ACK, if ACKs are lost, a counter in a subsequent ACK will
        still recover the lost information in a fairly timely fashion.</t>

        <t>There is very little space in the 3 bits available for the
        essential part of an AccECN acknowledgement, so each of the three
        counters can wrap fairly frequently. Therefore, even if the counter
        appears to have incremented by one (say), the counter might have
        actually wrapped completely then incremented by one. This is a
        possibility because the whole sequence of ACKs carrying the
        intervening values of the counter might all have been lost or delayed.
        To be able to tell if a counter has wrapped, AccECN feeds back more
        significant bits of the counter within the supplementary part, making
        it resilient to ACK loss.</t>
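
        <t>The wrap ambiguity described above can be made concrete with a
        small Python sketch. The field widths used (2 low-order bits plus 3
        more significant bits) are purely illustrative assumptions, and the
        helper names counter_delta and widen are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def counter_delta(prev: int, new: int, bits: int) -> int:
    """Smallest non-negative increment consistent with two readings
    of a counter that wraps modulo 2**bits."""
    return (new - prev) % (1 << bits)


def widen(low: int, high: int, low_bits: int) -> int:
    """Join low-order bits with more significant bits of a counter."""
    return (high << low_bits) | low


# Suppose the true counter moved from 3 to 8 while the intervening
# ACKs were lost. With only 2 bits the sender sees 3 then 0, which
# looks like an increment of 1 rather than 5.
narrow = counter_delta(3 % 4, 8 % 4, bits=2)

# Feeding back 3 more significant bits removes the ambiguity here.
before = widen(3 % 4, 3 >> 2, low_bits=2)
after = widen(8 % 4, 8 >> 2, low_bits=2)
wide = counter_delta(before, after, bits=5)
```
]]></artwork>
        </figure>

        <t>In this sketch the narrow reading yields 1 while the widened
        reading yields the true increment of 5.</t>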

        <t>The supplementary part includes the sequence of ECN codepoints
        covered by a delayed ACK (see below). As well as providing ordering
        information, this provides more timely feedback when more than one
        counter has changed within the time covered by one delayed ACK. It
        also provides resilience against the loss of a counter in a future
        ACK.</t>
      </section>

      <section title="Order of Arrival of IP-ECN Markings">
        <t><xref target="RFC5681"/> recommends using delayed ACKs, so one
        acknowledgement will often carry feedback about the ECN markings on
        more than one segment. Therefore, ideally, AccECN is required to
        provide ordering information <xref
        target="I-D.ietf-tcpm-accecn-reqs"/>. However, a counter in each ACK
        only says how many more IP-ECN markings arrived since the last ACK,
        not the order in which they arrived.</t>

        <t>This might seem an unnecessary level of precision given <xref
        target="RFC5681"/> currently advises against delaying acknowledgement
        for more than two full-sized segments. However, a delayed ACK could
        cover multiple segments that are smaller than full-size. Also, in
        practice one delayed ACK can cover many tens of packets that have all
        been coalesced into one large segment by large receive offload (LRO)
        hardware before being passed to the Data Receiver. Therefore, the
        design of AccECN allows for future expansion of the number of segments
        that can be covered by one delayed ACK.</t>

        <t>Once the connection is in progress, in each ACK the Data Receiver
        encodes the sequence of IP-ECN markings covered by that ACK, which
        includes the number of segments covered by the delayed ACK. The
        sequence does not need to include the last segment to arrive, because
        there is already sufficient information in the essential part of the
        feedback to infer that marking (by subtracting the markings in the
        list from the increment of the cumulative counter).</t>

        <t>AccECN uses a fixed size (10b) field for the sequence encoding.
        This can communicate a sequence of up to 14 codepoints, not including
        the last segment. The encoding is optimised for a selection of simple
        but common patterns. If the pattern of arriving codepoints becomes too
        complex to encode in 10b, the Data Receiver has to emit an ACK and
        start a new sequence for the next ACK. The scheme can always encode
        all the theoretically possible combinations of arriving codepoints in
        a delayed ACK covering 3 segments or fewer.</t>
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Spec" title="AccECN Protocol Specification">
      <section anchor="accecn_Negotiation"
               title="Negotiation during the TCP handshake">
        <t>During the TCP handshake at the start of a connection, to request
        more accurate ECN feedback the originator of the connection (host A)
        MUST set the TCP flags NS=1, CWR=1 and ECE=1 in the initial SYN
        segment.</t>

        <t>If a responding host (B) that implements AccECN receives a SYN with
        the above three flags set, it MUST set both its half connections into
        AccECN mode. Then it MUST set the flags NS=0, CWR=1 and ECE=0 on its
        response in the SYN/ACK segment to confirm that it supports AccECN.
        The responding host MUST NOT set this combination of flags unless the
        preceding SYN requested support for AccECN as above.</t>

        <t>Once an originating host (A) has sent the above SYN to declare that
        it supports AccECN, and once it has received the above SYN/ACK segment
        that confirms that the responding host supports AccECN, the
        originating host MUST set both its half connections into AccECN
        mode.</t>

        <t>The three flags set to 1 to indicate AccECN support on the SYN have
        been carefully chosen to enable natural fall-back to prior stages in
        the evolution of ECN. <xref target="accecn_Tab_Negotiation"/>
        tabulates all the negotiation possibilities for ECN-related
        capabilities that involve at least one AccECN-capable host. To
        compress the width of the table, the headings of the first four
        columns have been severely abbreviated, as follows: <list
            hangIndent="4" style="hanging">
            <t hangText="Ac:">More *Ac*curate ECN Feedback</t>

            <t hangText="N:">ECN-*N*once <xref target="RFC3540"/></t>

            <t hangText="E:">*E*CN <xref target="RFC3168"/></t>

            <t hangText="I:">Not-ECN (*I*mplicit congestion notification using
            packet drop).</t>
          </list></t>

        <?rfc needLines="22" ?>

        <texttable align="center" anchor="accecn_Tab_Negotiation"
                   title="ECN capability negotiation between Originator (A) and Responder (B)">
          <ttcol align="left">Ac</ttcol>

          <ttcol align="center">N</ttcol>

          <ttcol align="center">E</ttcol>

          <ttcol align="center">I</ttcol>

          <ttcol align="center">SYN A->B</ttcol>

          <ttcol align="center">SYN/ACK B->A</ttcol>

          <ttcol align="left">Mode</ttcol>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>NS CWR ECE</c>

          <c>NS CWR ECE</c>

          <c/>

          <c>AB</c>

          <c/>

          <c/>

          <c/>

          <c>1   1   1</c>

          <c>0   1   0</c>

          <c>AccECN</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>A</c>

          <c>B</c>

          <c/>

          <c/>

          <c>1   1   1</c>

          <c>1   0   1</c>

          <c>classic ECN</c>

          <c>A</c>

          <c/>

          <c>B</c>

          <c/>

          <c>1   1   1</c>

          <c>0   0   1</c>

          <c>classic ECN</c>

          <c>A</c>

          <c/>

          <c/>

          <c>B</c>

          <c>1   1   1</c>

          <c>0   0   0</c>

          <c>Not ECN</c>

          <c>A</c>

          <c/>

          <c/>

          <c>B</c>

          <c>1   1   1</c>

          <c>1   1   1</c>

          <c>Not ECN (broken)</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>B</c>

          <c>A</c>

          <c/>

          <c/>

          <c>0   1   1</c>

          <c>0   0   1</c>

          <c>classic ECN</c>

          <c>B</c>

          <c/>

          <c>A</c>

          <c/>

          <c>0   1   1</c>

          <c>0   0   1</c>

          <c>classic ECN</c>

          <c>B</c>

          <c/>

          <c/>

          <c>A</c>

          <c>0   0   0</c>

          <c>0   0   0</c>

          <c>Not ECN</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>A</c>

          <c/>

          <c/>

          <c/>

          <c>1   1   1</c>

          <c>0   1   1</c>

          <c>AccECN (Rsvd)</c>

          <c>A</c>

          <c/>

          <c/>

          <c/>

          <c>1   1   1</c>

          <c>1   0   0</c>

          <c>AccECN (Rsvd)</c>

          <c>A</c>

          <c/>

          <c/>

          <c/>

          <c>1   1   1</c>

          <c>1   1   0</c>

          <c>AccECN (Rsvd)</c>
        </texttable>

        <t><xref target="accecn_Tab_Negotiation"/> is divided into blocks each
        separated by an empty row.<list style="numbers">
            <t>The top block shows the case already described where both
            endpoints support AccECN.</t>

            <t>The second block shows the cases where the originating host (A)
            supports AccECN but the responding host (B) supports some earlier
            variant of TCP, indicated in its SYN/ACK. Therefore, as soon as an
            originating AccECN-capable host (A) receives the SYN/ACK shown it
            MUST set both its half connections into the mode shown in the
            rightmost column.</t>

            <t>The third block shows the cases where the responding host (B)
            supports AccECN but the originating host (A) supports some earlier
            variant of TCP, indicated in its SYN. Therefore, as soon as
            responding AccECN-capable host (B) receives the SYN shown it MUST
            set both its half connections into the mode shown in the rightmost
            column.</t>

            <t>Forward Compatibility: The fourth block enumerates the
            remaining combinations of AccECN-related flags that are Reserved
            for future use by AccECN ('Rsvd').<list style="symbols">
                <t>If an originating AccECN host (A) sends NS=1, CWR=1 and
                ECE=1 in the initial SYN segment and if it receives any of
                these Reserved values in a SYN/ACK response, it MUST set both
                its half connections into AccECN mode. <vspace
                blankLines="1"/>{ToDo: Can we think of anything now that an
                AccECN server could use any of these Reserved combinations of
                flags for, to signal something extra for the whole connection?
                If not, rather than Reserved, we need to decide whether to
                make these combinations Rsvd and therefore not switch to
                AccECN mode.}</t>

                <t>To comply with the present AccECN protocol, middleboxes
                MUST forward these Rsvd combinations of flags unaltered (see
                also <xref target="accecn_Mbox_Operation"/>).</t>
              </list></t>
          </list></t>

        <t>The table is self-explanatory in most respects, but the following
        exceptional cases need some explanation.<list style="hanging">
            <t hangText="Not ECN (broken):"><xref target="RFC3168"/> points
            out that broken TCP server implementations exist that reflect the
            'reserved' flags <xref target="RFC0793"/> back to the originator.
            If the SYN/ACK reflects the same flag settings as the preceding
            SYN, an AccECN client implementation MUST revert to Not-ECT.</t>

            <t hangText="ECN Nonce:">An AccECN implementation, whether client
            or server, sender or receiver, does not need to implement the ECN
            Nonce behaviour <xref target="RFC3540"/>. AccECN is compatible
            with a sender-only ECN feedback integrity approach that does not
            use up the ECT(1) codepoint (see <xref
            target="accecn_Integrity"/>).</t>

            <t hangText="Simultaneous Open:">An originating AccECN Host (A),
            having sent a SYN with NS=1, CWR=1 and ECE=1, might receive
            another SYN from host B. Host A MUST then enter the same mode as
            it would have entered had it been a responding host and received
            the same SYN. Then host A MUST send the same SYN/ACK as it would
            have sent had it been a responding host (see the third block
            above).</t>
          </list></t>
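
        <t>As a non-normative illustration, the lookups implied by <xref
        target="accecn_Tab_Negotiation"/> can be sketched in Python as
        below. The function names and the representation of the flags as
        (NS, CWR, ECE) tuples are assumptions made for this sketch only.
        SYN flag combinations that the table does not list are simply
        rejected here, which is a choice of the sketch and not a
        requirement of this specification.</t>

        <figure>
          <artwork><![CDATA[
```python
# Flags are written as (NS, CWR, ECE), as in the negotiation table.
ACCECN_SYN = (1, 1, 1)

# Mode entered by an AccECN originator (A) that sent ACCECN_SYN,
# keyed by the flags on the SYN/ACK it receives.
ORIGINATOR_MODE = {
    (0, 1, 0): "AccECN",
    (1, 0, 1): "classic ECN",   # B supports the ECN Nonce
    (0, 0, 1): "classic ECN",   # B supports ECN
    (0, 0, 0): "Not ECN",
    (1, 1, 1): "Not ECN",       # broken: B reflected the SYN flags
    (0, 1, 1): "AccECN",        # Rsvd
    (1, 0, 0): "AccECN",        # Rsvd
    (1, 1, 0): "AccECN",        # Rsvd
}


def originator_mode(synack_flags):
    """Mode for originator A, given the SYN/ACK flags from B."""
    return ORIGINATOR_MODE[synack_flags]


def responder_reply(syn_flags):
    """Return (mode, SYN/ACK flags) for an AccECN responder B."""
    replies = {
        (1, 1, 1): ("AccECN", (0, 1, 0)),
        (0, 1, 1): ("classic ECN", (0, 0, 1)),
        (0, 0, 0): ("Not ECN", (0, 0, 0)),
    }
    if syn_flags not in replies:
        # Not listed in the table; out of scope for this sketch.
        raise ValueError("SYN flags not covered by the table")
    return replies[syn_flags]
```
]]></artwork>
        </figure>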
      </section>

      <section anchor="accecn_Essential" title="Essential AccECN Feedback">
        <t>This section specifies the essential part of AccECN feedback,
        including its placement and the encoding of the counters.</t>

        <section anchor="accecn_ACE" title="The ACE Field">
          <t>Once AccECN has been negotiated for a connection, it overloads
          the three TCP flags ECE, CWR and NS in the main TCP header as one
          3-bit field to encode 8 distinct codepoints. Then the field is given
          a new name, ACE, as shown in <xref target="accecn_Fig_ACE_ACK"/>.
          The original definition of these three flags in the TCP header,
          including the addition of support for the ECN Nonce, is shown for
          comparison in <xref target="accecn_Fig_TCPHdr"/>. This specification
          does not rename these three TCP flags; it merely overloads them with
          another name and definition once an AccECN connection has been
          established.</t>

          <t>A host MUST interpret the ECE, CWR and NS flags as the 3-bit ACE
          counter on a segment with SYN=0 that it sends or receives after it
          has set both its half-connections into AccECN mode having
          successfully negotiated AccECN (see <xref
          target="accecn_Negotiation"/>). A host MUST NOT interpret the 3
          flags as a 3-bit ACE field on any segment with SYN=1 (whether ACK is
          0 or 1), or if AccECN negotiation is incomplete or has not
          succeeded.</t>

          <t>Both parts of each of these conditions are equally important. For
          instance, even if AccECN negotiation has been successful, the ACE
          field is not defined on any segments with SYN=1 (e.g. a
          retransmission of an unacknowledged SYN/ACK, or when both ends send
          SYN/ACKs after AccECN support has been successfully negotiated
          during a simultaneous open).</t>
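
          <t>The two conditions can be expressed as a single guard, as in
          the non-normative Python sketch below. The function name and its
          arguments are hypothetical. The sketch also assumes that the
          three flags are combined in header order, with NS as the most
          significant bit of ACE, as the figures in this section
          suggest.</t>

          <figure>
            <artwork><![CDATA[
```python
def ace_field(ns, cwr, ece, syn, accecn_mode):
    """Return the 3-bit ACE value, or None if ACE is not defined.

    ns, cwr, ece, syn are the flag bits (0 or 1) of one segment.
    accecn_mode is True only once both half-connections have been
    set into AccECN mode after a successful negotiation.
    """
    if syn == 1 or not accecn_mode:
        # SYN=1, or negotiation incomplete or unsuccessful:
        # the flags keep their original meaning.
        return None
    # Assumption: NS is the most significant bit of ACE.
    return (ns << 2) | (cwr << 1) | ece
```
]]></artwork>
          </figure>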

          <?rfc needLines="9" ?>

          <figure align="center" anchor="accecn_Fig_ACE_ACK"
                  title="Definition of the ACE field within bytes 13 and 14 of the TCP Header (when AccECN has been negotiated and SYN=0).">
            <artwork align="center"><![CDATA[  
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |           |           | U | A | P | R | S | F |
| Header Length | Reserved  |    ACE    | R | C | S | S | Y | I |
|               |           |           | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
          </figure>

          <t>The Data Receiver maintains three counters, r.ci, r.e1 and r.ni,
          to count the number of packets it receives with respectively the CE,
          ECT(1) and Not-ECT codepoint in the IP-ECN field. When a Data
          Receiver first enters AccECN mode, it MUST initialise its counters
          to zero. The Outgoing Protocol Handler at the Data Receiver uses the
          ACE field to encode one of these counters at a time into each ACK.
          How it determines which counter to signal on any particular ACK is
          specified later (<xref target="accecn_ACE_Counter_Selection"/>).</t>

          <t>The 8 possible codepoints of the ACE field are shown in <xref
          target="accecn_Tab_ACE"/>. A Data Receiver uses four of them to
          encode a 'Congestion Indication' (CI) counter for CE markings and
          three to encode E1 for ECT(1) markings. It uses the eighth codepoint
          to feed back the arrival of Not-ECT in the IP-ECN field using a
          codepoint termed NI (Not-ECT Indication). We will now use an example
          to explain how ACE is encoded by the Outgoing Protocol Handler and
          decoded by the Incoming Protocol Handler.</t>

          <?rfc needLines="15" ?>

          <texttable align="center" anchor="accecn_Tab_ACE"
                     title="Codepoint assignments in the ACE field for feedback of congestion counters">
            <ttcol align="center">ACE (base 2)</ttcol>

            <ttcol align="center">CI (base 4) for CE</ttcol>

            <ttcol align="center">E1 (base 3) for ECT(1)</ttcol>

            <ttcol>NI (base 1) for Not-ECT</ttcol>

            <c>000</c>

            <c>0</c>

            <c>-</c>

            <c>-</c>

            <c>001</c>

            <c>1</c>

            <c>-</c>

            <c>-</c>

            <c>010</c>

            <c>2</c>

            <c>-</c>

            <c>-</c>

            <c>011</c>

            <c>3</c>

            <c>-</c>

            <c>-</c>

            <c>100</c>

            <c>-</c>

            <c>0</c>

            <c>-</c>

            <c>101</c>

            <c>-</c>

            <c>1</c>

            <c>-</c>

            <c>110</c>

            <c>-</c>

            <c>2</c>

            <c>-</c>

            <c>111</c>

            <c>-</c>

            <c>-</c>

            <c>0</c>
          </texttable>

          <t>Encode: Imagine that the E1 counter is the next to be signalled
          and r.e1 = 17. Then, because the E1 counter is base 3, the Data
          Receiver calculates</t>

          <figure>
            <artwork><![CDATA[    E1 = 17 % 3
       = 2]]></artwork>
          </figure>

          <t>So it looks up E1=2 in <xref target="accecn_Tab_ACE"/> to get the
          codepoint to set in ACE, which is 0b110.</t>
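
          <t>The same encoding step can be sketched in Python as follows.
          This is a non-normative illustration; the names ACE_LAYOUT and
          encode_ace are hypothetical, and the layout values are read
          directly from <xref target="accecn_Tab_ACE"/>.</t>

          <figure>
            <artwork><![CDATA[
```python
# For each counter: (first ACE codepoint, number of codepoints).
ACE_LAYOUT = {
    "CI": (0b000, 4),   # CE markings, base 4
    "E1": (0b100, 3),   # ECT(1) markings, base 3
    "NI": (0b111, 1),   # Not-ECT indication, single codepoint
}


def encode_ace(counter_name, counter_value):
    """Return the ACE codepoint for the selected receiver counter."""
    first, base = ACE_LAYOUT[counter_name]
    return first + counter_value % base
```
]]></artwork>
          </figure>

          <t>With the values of the example, encode_ace("E1", 17) yields
          0b110.</t>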

          <t>Decode: The Data Sender maintains three counters, s.ci, s.e1 and
          s.ni and it uses the incoming codepoints in ACE to ensure these
          track the equivalent counters at the receiver. Imagine the s.e1
          counter at the Data Sender has currently reached 16 when the 0b110
          codepoint arrives via the ACE field. The Data Sender looks up 0b110
          in <xref target="accecn_Tab_ACE"/> to get E1 = 2. It finds the
          difference between s.e1 and E1 using modulo 3 arithmetic, then adds
          the difference to s.e1, as follows:</t>

          <figure>
            <artwork><![CDATA[    delta_s.e1 = (E1 + 3 - s.e1 % 3) % 3
               = (2 + 3 - 16 % 3) % 3
               = 1
    =>  s.e1 = s.e1 + delta_s.e1
             = 16 + 1
             = 17
]]></artwork>
          </figure>
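
          <t>The decoding step can be sketched in the same way. The
          function names below are hypothetical and the sketch is
          non-normative. It deliberately takes no account of counter
          wrap, which is discussed in <xref
          target="accecn_ACE_Safety"/>.</t>

          <figure>
            <artwork><![CDATA[
```python
def decode_ace(ace):
    """Map an ACE codepoint to (counter name, field value, base)."""
    if ace <= 0b011:
        return "CI", ace, 4
    if ace <= 0b110:
        return "E1", ace - 0b100, 3
    return "NI", 0, 1


def update_sender_counter(s_value, field_value, base):
    """Advance a sender counter by the modulo-base difference."""
    delta = (field_value + base - s_value % base) % base
    return s_value + delta
```
]]></artwork>
          </figure>

          <t>With the values of the example, decode_ace(0b110) gives the
          E1 counter with field value 2 and base 3, and
          update_sender_counter(16, 2, 3) returns 17.</t>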
        </section>

        <section anchor="accecn_ACE_Safety"
                 title="Safety against Ambiguity of the ACE Field">
          <t>Clearly, the CI, E1 and NI counters will frequently wrap given
          the size of the space available to encode them is so small. If a
          number of ACKs in a row are lost, the Data Sender might not be able
          to tell whether one of these counters has wrapped or not.</t>

          <t>The supplementary part of AccECN provides more space to signal
          higher bits of these counters, which gives resilience against ACK
          loss (<xref target="accecn_Higher_Counter"/>). However, the
          supplementary part of the AccECN protocol might be unavailable
          (perhaps due to middlebox interference).</t>

          <t>Therefore, if the Data Sender detects that these fields could
          have wrapped, it SHOULD behave conservatively. That is, if the
          AccECN sender detects that the supplementary part of the AccECN
          protocol is unavailable, and it detects a jump in the
          acknowledgement number that implies that so many ACKs are missing
          that a counter could have wrapped under the prevailing conditions,
          it SHOULD decode the counter assuming that the counter did wrap. If
          missing acknowledgement numbers arrive later (reordering) and prove
          that the counter did not wrap, the Data Sender MAY attempt to
          neutralise the effect of any action it took based on a conservative
          assumption that it later found to be incorrect.</t>

          <t>An example algorithm to implement this policy is given in <xref
          target="accecn_Algo_ACE_Wrap"/>. An implementer MAY develop an
          alternative algorithm as long as it satisfies these
          requirements.</t>
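
          <t>Purely as an illustration of the conservative behaviour, and
          not as a reproduction of the example algorithm referred to
          above, the Python sketch below takes one possible reading of
          "assuming that the counter did wrap": it picks the largest
          increment that is consistent with the value fed back and that
          does not exceed the number of segments newly acknowledged. The
          function name, its arguments and this choice of upper bound are
          all assumptions of the sketch.</t>

          <figure>
            <artwork><![CDATA[
```python
def conservative_delta(observed_delta, base, newly_acked):
    """Largest plausible counter increment after missing ACKs.

    observed_delta: increment implied by the ACE field alone,
                    in the range 0 .. base-1.
    base:           number of codepoints of the counter (4 for CI).
    newly_acked:    segments newly covered by this ACK (assumed to
                    bound the number of markings that arrived).
    """
    if newly_acked < observed_delta:
        # Nothing suggests a wrap; keep the plain modulo result.
        return observed_delta
    # Add as many whole wraps as still fit under the bound.
    wraps = (newly_acked - observed_delta) // base
    return observed_delta + wraps * base
```
]]></artwork>
          </figure>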
        </section>

        <section anchor="accecn_ACE_Counter_Selection"
                 title="ACE Counter Selection">
          <t>If the Data Receiver implements ACK-withholding as recommended in
          <xref target="RFC5681"/>, more than one counter could have
          incremented before sending each ACK. It follows the steps below to
          determine which counter to encode in the ACE field:<list
              style="numbers">
              <t>If the last IP-ECN field that arrived was CE, ECT(1) or
              Not-ECT, the Data Receiver MUST encode the associated counter in
              the ACE field, i.e. respectively CI, E1 or NI;</t>

              <t>If the last IP-ECN field that arrived was ECT(0), the Data
              Receiver can signal either the CI or the E1 counter:<list
                  style="symbols">
                  <t>The choice of which to signal SHOULD be based on the
                  principle that the more one counter has changed recently the
                  more it SHOULD be signalled;</t>

                  <t>If there is a tie between CI and E1, CI MUST take
                  precedence.</t>
                </list></t>
            </list><xref target="accecn_Algo_Counter_Select"/> suggests two
          possible algorithms that could be used to determine which counter to
          encode in ACE. An implementer MAY develop an alternative algorithm
          as long as it meets the requirements in the steps above.</t>
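
          <t>A minimal, non-normative Python sketch of these steps is
          given below. The function name is hypothetical, and the two
          "recent change" arguments stand for whatever measure of recent
          counter activity an implementation chooses to keep; this
          specification does not define that measure here.</t>

          <figure>
            <artwork><![CDATA[
```python
def select_counter(last_codepoint, recent_ci, recent_e1):
    """Choose which counter to encode in the ACE field of an ACK.

    last_codepoint: IP-ECN codepoint of the last segment to arrive,
                    one of "CE", "ECT(1)", "Not-ECT" or "ECT(0)".
    recent_ci, recent_e1: hypothetical measures of how much the CI
                    and E1 counters have changed recently.
    """
    direct = {"CE": "CI", "ECT(1)": "E1", "Not-ECT": "NI"}
    if last_codepoint in direct:
        # Step 1: signal the counter of the last arrival.
        return direct[last_codepoint]
    # Step 2 (ECT(0)): favour the counter that changed more;
    # CI takes precedence on a tie.
    return "E1" if recent_e1 > recent_ci else "CI"
```
]]></artwork>
          </figure>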

          <t>If an AccECN Data Sender has to retransmit a packet due to a
          suspected loss, in its role as a Data Receiver it will piggy-back
          AccECN feedback on the retransmitted packet. On a retransmitted
          packet, a Data Receiver MUST select which counter to send using the
          rules in the steps above and encode the latest prevailing
          value of the selected counter, which will not necessarily be the
          same counter that the packet carried originally, nor the original
          value of that counter.</t>

          <t>There is no standards track end-to-end definition of the ECT(1)
          codepoint of the IP-ECN field. Nonetheless, to comply with this
          specification, an AccECN Data Receiver MUST implement and reflect
          the ECT(1) counter as specified here. Then, a standards track
          definition of the ECT(1) codepoint can be specified in future and be
          deployed unilaterally in Data Senders, without having to wait for
          associated receivers to be deployed. The above rules ensure that a
          Data Receiver will only feed back the ECT(1) counter if some packets
          marked with ECT(1) are arriving.</t>

          <t>At the Data Sender, the Incoming AccECN Protocol Handler MUST be
          able to receive feedback of E1 codepoints, but the Data Sender MAY
          discard them (it might not have any logic to understand what to do
          with them). However, if an Incoming AccECN Protocol Handler is
          running back-to-back with an Outgoing AccECN Protocol Handler (e.g.
          to implement a split TCP connection), it MUST forward the values of
          all AccECN counters including E1, and not discard any.</t>

          <t>{ToDo: Refer if necessary to <xref
          target="accecn_Rcvr_Operation"/>.}</t>
        </section>
      </section>

      <section anchor="accecn_SupAccECN"
               title="The Supplementary AccECN Field (SupAccECN)">
        <t>This section defines the size, placement and internal structure of
        the Supplementary AccECN field (SupAccECN), as well as the semantics
        of the sub-fields within it. The internal structure of the SupAccECN
        field is agnostic to where it is placed in the TCP header, so that it
        can be moved during planned evolution of the protocol. The protocol
        overview in <xref target="accecn_Overview"/> explains that the field
        is placed in a TCP option for initial experiments, but if it
        progresses to the standards track, it is planned to place it in the
        main TCP header, using some of the bits in the Urgent Pointer (when
        URG=0).</t>

        <section anchor="accecn_SupAccECN_Placement"
                 title="Placement of the SupAccECN Field">
          <t>The Outgoing AccECN Protocol Handler at a Data Receiver MUST
          place the SupAccECN field in a SupAccECN TCP option (<xref
          target="accecn_SupAccECN_TCP_Option"/>).</t>

          <t>Forward compatibility: If the SupAccECN TCP option (<xref
          target="accecn_SupAccECN_TCP_Option"/>) is absent, the Incoming
          AccECN Protocol Handler at a Data Sender MUST attempt to read the
          SupAccECN field from within the Non-Urgent field (<xref
          target="accecn_Non-Urgent"/>).</t>

          <section anchor="accecn_SupAccECN_TCP_Option"
                   title="The SupAccECN TCP Option">
            <t>The Data Receiver MUST set the Kind field to 0x&lt;KK&gt;
            (TBA), which is registered in <xref
            target="accecn_SupAccECN_TCPopt_IANA"/> as a new TCP option Kind
            called SupAccECN. An experimental TCP option with Kind=254 MAY be
            used for initial experiments, with magic number 0xACCE.</t>

            <t>The Data Receiver MUST set the Length field to <!-- 3 [octets] on a
            SYN/ACK or --> 4 [octets] on any segment with SYN=0. For initial
            experiments, the Length field MUST be 2 greater to accommodate the
            16-bit magic number. In either case, the Data Receiver MUST pad
            the most significant bit<!--(s)--> with zeros up to a whole number
            of octets, as illustrated in <xref
            target="accecn_Fig_SupAccECN_TCP_Option"/>. <!--These-->This
            padding bit<!--s are--> is currently unused (CU).</t>

            <t>Forward compatibility: To comply with the present AccECN
            specification:<list style="symbols">
                <t>the Incoming AccECN Protocol Handler at the Data Sender
                MUST ignore the padding bit<!--(s)-->, whether <!--they are-->it
                is set to zero or not;</t>

                <t>if the Length field of the TCP option is greater than that
                expected from the paragraph above, a Data Sender MUST take the
                SupAccECN field to be aligned with the right hand end (least
                significant bit) of the TCP Option as calculated using the
                Length field;</t>

                <t>if the Length value is less than that expected from the
                paragraph above, the Incoming AccECN Protocol Handler at the
                Data Sender MUST discard the segment;</t>

                <t>a middlebox MUST forward the padding bit<!--(s)-->
                unaltered, whether <!--they are-->it is set to zero or
                not;</t>

                <t>if the Length value is different to that expected from the
                paragraph above (whether larger or smaller), a middlebox MUST
                still forward the TCP option unaltered.</t>
              </list></t>

            <?rfc needLines="24" ?>

            <figure align="center" anchor="accecn_Fig_SupAccECN_TCP_Option"
                    title="Placement of the SupAccECN field within the SupAccECN TCP Option on a Segment with SYN=0">
              <artwork><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ a)
|  Kind = 0xKK  |  Length = 4   |0|        SupAccECN            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Kind = 254   |  Length = 6   |     magic number = 0xACCE     | b)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|        SupAccECN            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>

              <postamble>a) Using the permanently assigned TCP option Kind
              0x&lt;KK&gt; (TBA); b) Using a Shared TCP Option Kind for
              Initial Experiments</postamble>
            </figure>
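
            <t>As a non-normative illustration of layout b), the Python
            sketch below builds and parses the experimental form of the
            option. The function names are hypothetical, and raising an
            exception stands in for "discard the segment". The
            permanently assigned Kind is not modelled because its value
            is still to be assigned.</t>

            <figure>
              <artwork><![CDATA[
```python
import struct

EXP_KIND = 254    # shared experimental TCP option Kind
MAGIC = 0xACCE    # magic number for initial experiments
EXP_LENGTH = 6    # Kind + Length + magic number + 16-bit field


def build_experimental_option(sup_accecn):
    """Encode a 15-bit SupAccECN value with a zero padding bit."""
    if not 0 <= sup_accecn < (1 << 15):
        raise ValueError("SupAccECN is a 15-bit field")
    return struct.pack("!BBHH", EXP_KIND, EXP_LENGTH, MAGIC,
                       sup_accecn)


def parse_experimental_option(option):
    """Return the 15-bit SupAccECN value carried in the option."""
    kind, length = option[0], option[1]
    if kind != EXP_KIND or length != len(option):
        raise ValueError("not a well-formed Kind=254 option")
    if length < EXP_LENGTH:
        # Shorter than expected: the segment is to be discarded.
        raise ValueError("option too short")
    if int.from_bytes(option[2:4], "big") != MAGIC:
        raise ValueError("not a SupAccECN option")
    # The field is aligned with the right-hand end of the option,
    # even if the option is longer than expected; the padding bit
    # is ignored.
    tail = int.from_bytes(option[length - 2:length], "big")
    return tail & 0x7FFF
```
]]></artwork>
            </figure>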
          </section>

          <section anchor="accecn_Non-Urgent" title="The Non-Urgent Field">
            <t>If the Urgent (URG) flag in the TCP header <xref
            target="RFC0793"/> is zero, this specification experimentally
            renames the Urgent Pointer (bytes 19 and 20 counting from 1 of the
            TCP header) as the Non-Urgent field. If URG = 1, this 16 bit field
            keeps its original name and definition from <xref
            target="RFC0793"/> as the Urgent Pointer. Bytes 13 to 20 of the
            TCP header when URG=0 are illustrated in <xref
            target="accecn_Fig_Non-Urgent"/>, which shows the new experimental
            definition of the Non-Urgent Field.</t>

            <t>Note that the new experimental definition of the Non-Urgent
            field is intended for wider use than just AccECN, which is why it
            solely depends on the URG flag and it is independent of whether
            AccECN has been negotiated or not.</t>

            <t><xref target="accecn_Non-Urgent_IANA"/> establishes a new
            registry to assign values within this Non-Urgent field. <xref
            target="accecn_Non-Urgent_IANA"/> also reserves space for a future
            standards track AccECN specification within this field.</t>

            <?rfc needLines="16" ?>

            <figure align="center" anchor="accecn_Fig_Non-Urgent"
                    title="Experimental Renaming of the TCP Urgent Pointer (bytes 19 &amp; 20) as the Non-Urgent field when URG=0">
              <artwork><![CDATA[    0                   1                   2                   3   
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
    ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |Res- |N|C|E|U|A|P|R|S|F|                               |
   | Offset|erved|S|W|C|R|C|S|S|Y|I|            Window             |
   |       |     | |R|E|G|K|H|T|N|N|                               |
   |       |     | | | |=| | | | | |                               |
   |       |     | | | |0| | | | | |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |           Non-Urgent          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...
]]></artwork>
            </figure>

            <t>As required in <xref target="accecn_SupAccECN_Placement"/>, the
            Outgoing Protocol Handler of the present AccECN specification
            never writes into the Non-Urgent field. Nonetheless, the Incoming
            AccECN Protocol Handler can read the SupAccECN field from within
            the Non-Urgent field.</t>

            <t>When reading the Non-Urgent field, AccECN implementations MUST
            take the SupAccECN field to be right-justified (i.e. the least
            significant bit of SupAccECN is aligned with the least significant
            bit of the Non-Urgent Field) as shown in <xref
            target="accecn_Fig_SupAccECN_Non-Urgent"/>. The remaining most
            significant bit<!--s are--> is currently unused (CU).</t>

            <?rfc needLines="7" ?>

            <figure align="center" anchor="accecn_Fig_SupAccECN_Non-Urgent"
                    title="Placement of the SupAccECN field within the Non-Urgent field of a segment with SYN=0">
              <artwork><![CDATA[      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | X |                         SupAccECN                         |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
            </figure>

            <t>Forward compatibility: To comply with the present AccECN
            specification:<list style="symbols">
                <t>the Incoming Protocol Handler of an AccECN Data Sender MUST
                ignore the remaining most significant bit<!--(s)--> in the
                Non-Urgent field (shown as X in <xref
                target="accecn_Fig_SupAccECN_Non-Urgent"/> meaning "Don't
                care");</t>

                <t>middleboxes MUST forward the most significant bit<!--(s)-->
                unaltered, whether <!--they are-->it is set to zero or
                not.</t>
              </list></t>
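
            <t>Reading the field in this way can be sketched as follows.
            The Python function below is a non-normative illustration
            with a hypothetical name. It assumes it is handed at least
            the first 20 bytes of the TCP header, and it takes the
            position of the URG flag (mask 0x20 in byte 14, counting
            from 1) from <xref target="RFC0793"/>.</t>

            <figure>
              <artwork><![CDATA[
```python
def sup_accecn_from_non_urgent(tcp_header):
    """Return the 15-bit SupAccECN value, or None when URG=1."""
    urg = tcp_header[13] & 0x20
    if urg:
        # The field keeps its meaning as the Urgent Pointer.
        return None
    # Bytes 19 and 20 (counting from 1) hold the Non-Urgent field.
    non_urgent = int.from_bytes(tcp_header[18:20], "big")
    # SupAccECN is right-justified; the top bit is "Don't care".
    return non_urgent & 0x7FFF
```
]]></artwork>
            </figure>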
          </section>
        </section>

        <section anchor="accecn_SupAccECN_Structure"
                 title="Structure of the SupAccECN Field">
          <t>This section defines the structure of the Supplementary AccECN
          field (SupAccECN) for SYN/ACKs and for subsequent segments within
          each half-connection. There is no SupAccECN field in the initial SYN
          segment.</t>

          <t>The size of the SupAccECN field on a segment with SYN = 0 is
          always 15 bits. <xref target="accecn_Fig_SupAccECN"/> shows the
          internal structure of the SupAccECN field on any segment with SYN =
          0 including the ACK that ends the 3-way handshake.</t>

          <?rfc needLines="7" ?>

          <figure align="center" anchor="accecn_Fig_SupAccECN"
                  title="The Supplementary AccECN Field on a Segment with SYN = 0">
            <artwork><![CDATA[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|DAC|                  ESQ                  |    Top-ACE    |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
]]></artwork>
          </figure>

          <t/>

          <t>The sub-fields of SupAccECN on a segment with SYN = 0 have the
          following meanings:<list style="hanging">
              <t hangText="Top-ACE:">More significant bits of the counter in
              ACE within the same segment (defined in <xref
              target="accecn_Higher_Counter"/>).</t>

              <t hangText="ESQ:">The ECN Sequence field (defined in <xref
              target="accecn_Sequence"/>).</t>

              <t hangText="DAC:">Reserved for Delayed ACK Control (see <xref
              target="accecn_Delayed_ACK_Control"/>). <vspace
              blankLines="1"/>Forward Compatibility: In the meantime, the
              Outgoing AccECN Protocol Handler MUST set DAC to zero (0); the
              Incoming AccECN Protocol Handler MUST ignore this flag; and
              middleboxes MUST forward this flag unaltered whether or not it
              is zero.</t>
            </list></t>
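
          <t>The layout in <xref target="accecn_Fig_SupAccECN"/> can be
          sketched in code as follows. This is an illustration only, not part
          of the specification: the function names are hypothetical, and the
          sketch assumes that bit 0 in the figure is the most significant bit
          of the 15-bit field, so that Top-ACE occupies the four least
          significant bits.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch; function names are hypothetical.
# Assumes bit 0 of the figure is the most significant bit.

def pack_supaccecn(esq, top_ace, dac=0):
    """Pack ESQ (10 bits) and Top-ACE (4 bits) into 15 bits.

    DAC is reserved, so it defaults to zero as required of the
    Outgoing AccECN Protocol Handler.
    """
    if not (0 <= dac <= 1 and 0 <= esq < 1 << 10
            and 0 <= top_ace < 1 << 4):
        raise ValueError("sub-field out of range")
    return (dac << 14) | (esq << 4) | top_ace


def unpack_supaccecn(field):
    """Split a SupAccECN value into (esq, top_ace).

    The DAC flag is deliberately ignored, following the forward
    compatibility rule for the Incoming Protocol Handler.
    """
    field &= 0x7FFF          # keep the 15 SupAccECN bits only
    esq = (field >> 4) & 0x3FF
    top_ace = field & 0xF
    return esq, top_ace
```
]]></artwork>
          </figure>

          <t>In this sketch the receiving side never inspects DAC, so a
          future use of that flag would not change the decoded ESQ or Top-ACE
          values.</t>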
        </section>

        <section anchor="accecn_Higher_Counter"
                 title="Higher Resilience Congestion Counters (Top-ACE)">
          <t>Four codepoints are set aside for the CI counter in the ACE field
          to provide reasonable resilience under expected marking and loss
          regimes. However, resilience against more extreme levels of CE
          marking, return ACK loss or ACK thinning really requires more space
          than the 3 bits taken from existing TCP flags for the ACE counter.
          At the same time, it is not necessary to deliver higher order bits
          with every returned segment, or even reliably at all.</t>

          <t>Therefore on segments with SYN=0, the least significant four bits
          of the Supplementary AccECN field are defined as the 'Top-ACE'
          field, as illustrated in <xref target="accecn_Fig_SupAccECN"/>.
          Whenever an AccECN implementation encodes a counter in ACE, it MUST
          also encode the higher precision bits of the same counter in the
          Top-ACE field of the same segment, using the following rules: <list
              style="symbols">
              <t>Top-ACE MUST be initialised to 0 at the start of each
              half-connection.</t>

              <t>Whenever the CI counter (base 4) in ACE wraps, the associated
              Top-ACE MUST increment by 1.</t>

              <t>Similarly, whenever the E1 counter (base 3) in ACE wraps,
              Top-ACE MUST increment by 1.</t>

              <t>The NI counter in ACE is base 1, so it can hardly be called a
              counter. The presence of the NI counter in ACE MUST be
              interpreted as an indication that the associated Top-ACE field
              in the same segment has incremented, because Top-ACE on its own
              represents the NI counter.</t>
            </list></t>

          <t>Formulae for encoding and decoding the counters CI, E1 or NI into
          the Top-ACE and ACE fields are given in <xref
          target="accecn_Algo_Top-ACE_ACE"/>, which also includes numerical
          examples.</t>
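
          <t>As a rough illustration of the rules above (the normative
          formulae remain those in the referenced appendix), a running counter
          can be split into a Top-ACE part and a low-order part as sketched
          below. The function names are hypothetical, and the sketch does not
          show how the low-order part is mapped onto the actual 3-bit ACE
          codepoints, which is defined elsewhere in this specification.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch; names are hypothetical.
TOP_ACE_RANGE = 1 << 4   # a 4-bit Top-ACE has 16 values

# Base of each counter within ACE, taken from the rules above.
ACE_BASE = {"CI": 4, "E1": 3, "NI": 1}


def split_counter(name, value):
    """Return (top_ace, low_part) for a running counter value.

    low_part is the position within the counter's own ACE
    codepoints; Top-ACE increments each time low_part wraps.
    """
    base = ACE_BASE[name]
    low_part = value % base
    top_ace = (value // base) % TOP_ACE_RANGE
    return top_ace, low_part


def join_counter(name, top_ace, low_part):
    """Inverse of split_counter, modulo 16 * base."""
    return top_ace * ACE_BASE[name] + low_part
```
]]></artwork>
          </figure>

          <t>For the NI counter the low-order part is always zero in this
          sketch, which reflects the statement above that Top-ACE on its own
          represents the NI counter.</t>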

          <t>The 4 bits in the Top-ACE field multiply the number of distinct
          codepoints for each counter by 2^4 = 16. Using Top-ACE therefore
          increases the number of distinct codepoints for each counter as
          follows:</t>

          <?rfc needLines="6" ?>

          <texttable anchor="accecn_Tab_Counter_Space" suppress-title="true">
            <ttcol>Counter</ttcol>

            <ttcol>codepoints in ACE</ttcol>

            <ttcol>codepoints in Top-ACE with ACE</ttcol>

            <c>CI (counts CE)</c>

            <c>4</c>

            <c>16 * 4 = 64</c>

            <c>E1 (counts ECT(1))</c>

            <c>3</c>

            <c>16 * 3 = 48</c>

            <c>NI (counts Not-ECT)</c>

            <c>1</c>

            <c>16 * 1 = 16</c>
          </texttable>

          <t>Top-ACE hugely improves the resilience of AccECN against
          ambiguity of counters due to ACK loss, compared with that of ACE
          alone (quantified in <xref target="accecn_Algo_ACE_Wrap"/>). With
          Top-ACE, the AccECN protocol can lose a whole string of ACKs
          covering up to 64 - 1 = 63 congestion indications without becoming
          ambiguous. Similarly AccECN is robust to losing a whole string of
          ACKs covering 47 ECT(1) markings or 15 Not-ECT markings. If, for
          example, about 1 in 100 data packets were marked with a CE codepoint
          on the forward path, all the ACKs covering about 100 * 63 = 6,300
          segments would have to be missing from the reverse path before
          AccECN would become ambiguous. If just one of these ACKs got
          through, it would resolve any ambiguity.</t>
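
          <t>The arithmetic behind these figures can be checked with a short
          sketch. It assumes, for illustration only, that the Data Sender
          recovers the increase in a counter as a modular difference between
          the combined Top-ACE/ACE values on two successive ACKs that it
          receives; the function names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch; names are hypothetical.

def codepoint_space(base, top_ace_bits=4):
    """Distinct codepoints for a counter using Top-ACE with ACE."""
    return (1 << top_ace_bits) * base


def max_unambiguous_increase(base):
    """Largest increase between ACKs that cannot be confused."""
    return codepoint_space(base) - 1


def counter_increase(prev_code, new_code, base):
    """Increase between two combined Top-ACE/ACE values.

    Only correct if the true increase does not exceed
    max_unambiguous_increase(base).
    """
    return (new_code - prev_code) % codepoint_space(base)
```
]]></artwork>
          </figure>

          <t>With these definitions, max_unambiguous_increase(4) gives 63 for
          CI, and a CI value that moves from 60 to 2 across a wrap is
          recovered as an increase of 6.</t>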
        </section>

        <section anchor="accecn_Sequence"
                 title="Accurate ECN Sequence within Delayed ACKs">
          <t>Given each delayed ACK can cover multiple segments, a Data
          Receiver needs to describe the order in which the ECN codepoints
          arrived. AccECN uses a 10-bit ECN Sequence (ESQ) field to encode
          this ordering. This section explains the encoding. An example
          encoding algorithm in pseudocode is given in <xref
          target="accecn_Algo_ESQ"/>. Implementations MAY develop their own
          encoding algorithm as long as it complies with the requirements in
          this section.</t>

          <t>Once the TCP 3-way handshake has completed, an AccECN Data
          Receiver can defer an ACK for as long as all three of the following
          tests pass. A test fails, and an ACK has to be sent, as soon
          as:<list style="numbers">
              <t>The number of deferred bytes exceeds a configured limit
              (currently two full-sized segments <xref
              target="RFC5681"/>);</t>

              <t>The longest time for which an ACK has been delayed exceeds a
              configured limit (currently 500ms <xref target="RFC5681"/>);</t>

              <t>The sequence of ECN codepoints has become too complex to
              encode in the fixed 10 bits available.</t>
            </list></t>

          <t>AccECN can encode the order of a sequence of up to 15 ECN
          codepoints in one ACK. The ACE field in the ACK always encodes the
          ECN codepoint of the latest packet to arrive. Using the ESQ field of
          the same ACK, the Outgoing AccECN Protocol Handler can encode the
          order of arrival of up to 14 ECN codepoints that arrived before
          this, making a maximum coverage of 15 packets.</t>

          <t>The encoding of the ESQ field is optimised for a selection of
          simple sequences that are expected to be common. Even if the first
          two tests pass, if a more complex sequence occurs, the third test
          above will fail so the Data Receiver will be forced to send an ACK
          earlier than it would have otherwise. The most complex sequence that
          AccECN can encode is a run of 'spaces' (SP) ending in one 'mark'
          (MK1), then another run of 'spaces', followed by a 'mark' that might
          be different from the first (MK2).</t>

          <t>The internal structure of the 10-bit Accurate ECN Sequence (ESQ)
          field is shown in <xref target="accecn_Fig_ESQ"/>.</t>

          <?rfc needLines="7" ?>

          <figure align="center" anchor="accecn_Fig_ESQ"
                  title="Internal Structure of the Accurate ECN Sequence (ESQ) Field">
            <artwork><![CDATA[  0   1   2   3   4   5   6   7   8   9
+---+---+---+---+---+---+---+---+---+---+
|    RL1    |    RL2    |  SP   |  MK1  |
+---+---+---+---+---+---+---+---+---+---+]]></artwork>
          </figure>

          <t>The sub-fields of ESQ have the following meanings:<list
              style="hanging">
              <t hangText="RL1:">Run-Length #1: a 3-bit field giving the
              length of a first run consisting of spaces (SP) ending in one
              mark (MK1), which is included in the length of the run;</t>

              <t hangText="RL2:">Run-Length #2: another 3-bit field giving the
              length of a second run of spaces (SP). There is no mark included
              in this run;</t>

              <t hangText="SP:">Space: The 2-bit ECN codepoint defined as a
              space, for the present ACK only;</t>

              <t hangText="MK1:">Mark #1: The 2-bit ECN codepoint defined as
              the first mark, for the present ACK only.</t>
            </list></t>

          <t>The Incoming Protocol Handler can always determine the second
          mark (MK2) from the counter that the Data Receiver uses in the ACE
          field, which has to be the counter associated with the last ECN
          codepoint to have arrived (according to the rules in <xref
          target="accecn_ACE_Counter_Selection"/>). Even though there is no
          counter associated with ECT(0), the Incoming Protocol Handler can
          tell if the last codepoint to arrive was ECT(0), because the counter
          used in ACE will not have changed relative to the previous
          packet.</t>

          <t><xref target="accecn_Fig_ESQ_Examples"/> gives example sequences
          of ECN codepoints and illustrates how the Data Receiver encodes
          them. The sequences use the single-character abbreviations in <xref
          target="accecn_Tab_ECN"/> for each ECN codepoint. The last codepoint
          to arrive is shown on the right.</t>

          <figure align="center" anchor="accecn_Fig_ESQ_Examples"
                  title="Example Encodings of Sequences of ECN Codepoints in the ESQ Field">
            <artwork><![CDATA[       ,----- RL1 = 6 ------>  ,--- RL2=4 -->
a)      0   0   0   0   0   C   0   0   0   0   1
       SP  SP  SP  SP  SP  MK1 SP  SP  SP  SP  MK2

       ,--- RL1=4 -->       (RL2 = 0)
b)      C   C   C   0   0
       SP  SP  SP  MK1 MK2

       ,--------- RL1 = 7 ------>  ,--------- RL2 = 7 ------>
c)      0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
       SP  SP  SP  SP  SP  SP  MK1 SP  SP  SP  SP  SP  SP  SP  MK2

 RL1=1 ,>  ,--- RL2=4 -->
d)      C   0   0   0   0   C
       MK1 SP  SP  SP  SP  MK2

 RL2=1 ,>                  (RL1 = 0)
e)      N   N
       SP  MK2
]]></artwork>
          </figure>

          <t>The examples should be self-explanatory, but the following points
          might help:<list style="symbols">
              <t>The term 'mark' does not have to mean an 'ECN mark'. In (a)
              the 'spaces' are defined as ECT(0) and the first 'mark' is
              defined as CE. However, in (b) it is more efficient to define CE
              as the 'space' and ECT(0) as the first 'mark';</t>

              <t>A mark is defined to mean just one codepoint, so two marks in
              a row have to be encoded as two different marks, even if they
              are the same codepoint (b). The first and second marks can be
              defined as different (a) or the same (b or c);</t>

              <t>For a long run of the same codepoint, the first mark can be
              defined to be the same as a space, and if necessary the second
              mark can be the same as well (c);</t>

              <t>The first run (if non-zero length) always ends in one mark.
              So, if its run-length is 1, it contains a mark but no spaces
              (d);</t>

              <t>Either run-length might be zero (b &amp; e), but MK2 will
              always be present. If the first run-length is zero, the
              definition of MK1 is redundant (e). If both run-lengths are
              zero, the definition of SP would be redundant as well.</t>
            </list></t>
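
          <t>The way a sequence is reconstructed from the ESQ sub-fields can
          be sketched as follows. The function names are hypothetical, the
          sketch assumes bit 0 of <xref target="accecn_Fig_ESQ"/> is the most
          significant bit, and MK2 is passed in separately because it is
          inferred from the ACE field rather than carried in ESQ.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch; function names are hypothetical.

def unpack_esq(esq):
    """Split a 10-bit ESQ value into (rl1, rl2, sp, mk1)."""
    return ((esq >> 7) & 0x7, (esq >> 4) & 0x7,
            (esq >> 2) & 0x3, esq & 0x3)


def expand_esq(rl1, rl2, sp, mk1, mk2):
    """Return the codepoints in arrival order, oldest first.

    The first run holds rl1 - 1 spaces ending in MK1 (or is
    absent if rl1 is zero); the second run holds rl2 spaces;
    MK2 is always last.
    """
    first_run = [sp] * (rl1 - 1) + [mk1] if rl1 > 0 else []
    return first_run + [sp] * rl2 + [mk2]
```
]]></artwork>
          </figure>

          <t>Using the single-character abbreviations of the examples,
          expand_esq(6, 4, "0", "C", "1") reproduces sequence (a) and
          expand_esq(0, 1, "N", "0", "N") reproduces sequence (e), in which
          the value given for MK1 is redundant.</t>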

          <t>The following normative statements govern an implementation of an
          AccECN Data Receiver when it defers an ACK:<list style="symbols">
              <t>The Outgoing Protocol Handler MUST NOT encode the last packet
              to be acknowledged into the ESQ field;</t>

              <t>If the Outgoing Protocol Handler cannot encode the last ECN
              codepoint to arrive in the ESQ field, it MUST send an ACK
              immediately;</t>

              <t>The Outgoing Protocol Handler MUST NOT include a codepoint in
              the sequence of codepoints in an ACK that is from any packet
              already reported in another ACK;</t>

              <t>If RL1=0, the Outgoing Protocol Handler MUST set MK1 = ECT(0)
              = 0b10, even though the value of MK1 seems redundant.</t>

              <t>If RL2=0 and RL1&lt;=1, the Outgoing Protocol Handler MUST
              set SP = ECT(0) = 0b10, even though the value of SP seems
              redundant.</t>
            </list></t>

          <t>The last two rules ensure that the value of ESQ as a whole is
          never all-zeros, which allows the Incoming Protocol Handler to
          detect interference by middleboxes (see <xref
          target="accecn_Mbox_Interference"/>).</t>
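
          <t>One possible way for a Data Receiver to satisfy these rules is a
          simple exhaustive search over the run-lengths, sketched below. This
          is not the example algorithm referenced earlier in this section; it
          is an alternative illustration with hypothetical names. Where more
          than one encoding is valid, the sketch arbitrarily picks the one
          with the smallest RL1, which is an assumption rather than a
          requirement.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch; names are hypothetical.
ECT0 = 0b10  # IP-ECN codepoint for ECT(0)


def encode_esq(earlier):
    """Encode the codepoints that arrived before the latest one.

    'earlier' lists 2-bit IP-ECN values, oldest first. It excludes
    the latest packet (MK2), which is conveyed by ACE, not by ESQ.
    Returns the 10-bit ESQ value, or None if the sequence is too
    complex, meaning an ACK should already have been sent.
    """
    n = len(earlier)
    for rl1 in range(0, 8):
        rl2 = n - rl1
        if not 0 <= rl2 <= 7:
            continue
        # Every codepoint except MK1 has to be the same 'space'.
        spaces = earlier[:max(rl1 - 1, 0)] + earlier[rl1:]
        if len(set(spaces)) > 1:
            continue
        # Redundant sub-fields are set to ECT(0), so that ESQ
        # is never all-zeros.
        sp = spaces[0] if spaces else ECT0
        mk1 = earlier[rl1 - 1] if rl1 > 0 else ECT0
        return (rl1 << 7) | (rl2 << 4) | (sp << 2) | mk1
    return None
```
]]></artwork>
          </figure>

          <t>With no earlier codepoints at all, this sketch returns
          0b0000001010, that is RL1 = RL2 = 0 with SP and MK1 both set to
          ECT(0). A strictly alternating sequence of CE and ECT(0) longer than
          three packets cannot be encoded and yields None.</t>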

          <t>The following normative statements govern an implementation of an
          AccECN Data Sender:<list style="symbols">
              <t>The Incoming AccECN Protocol Handler MUST increment the
              congestion codepoint counters (other than the one associated
              with the ACE field) by counting the codepoints as it decodes the
              ESQ field;</t>

              <t>If the Incoming AccECN Protocol Handler finds that the value
              of a congestion counter calculated using ESQ would be more than
              that calculated using Top-ACE/ACE, it SHOULD use the higher of
              the two calculations.</t>

              <t>If the Incoming AccECN Protocol Handler finds that the value
              of a congestion counter calculated using ESQ would be less than
              that calculated using Top-ACE/ACE, it SHOULD use the higher of
              the two calculations. An example of an exception to this rule
              would be where the Incoming Protocol Handler had previously
              conservatively assumed counter wrap, but then missing ACKs
              arriving later filled the gap in the sequence feedback.</t>

              <t>While the Incoming AccECN Protocol Handler is calculating the
              value of a congestion counter using Top-ACE/ACE, if it finds
              that the value calculated using ESQ in a previous segment is
              already higher, it SHOULD use the lower value calculated using
              Top-ACE/ACE. It SHOULD also consider the SupAccECN field in
              subsequent segments as suspect {ToDo: suggest what concrete
              action this implies}.</t>
            </list></t>

          <t>Forward Compatibility:<list style="symbols">
              <t>if RL1=0:<list style="symbols">
                  <t>the Incoming Protocol Handler MUST ignore the value in
                  MK1;</t>

                  <t>middleboxes MUST forward the value in MK1 unaltered
                  (whether or not it is 0b10 as it ought to be).</t>
                </list></t>
            </list><list style="symbols">
              <t>if RL2=0 and RL1&lt;=1:<list style="symbols">
                  <t>the Incoming Protocol Handler MUST ignore the value in
                  SP;</t>

                  <t>middleboxes MUST forward the value in SP unaltered
                  (whether or not it is 0b10 as it ought to be).</t>
                </list></t>
            </list></t>
        </section>

        <section anchor="accecn_Integrity" title="AccECN Feedback Integrity">
          <t>The ECN Nonce <xref target="RFC3540"/> is an experimental IETF
          specification intended to allow a sender to test whether ECN CE
          markings (or losses) are being suppressed by the receiver (or
          anywhere else in the feedback loop, such as another network or a
          middlebox). The ECN nonce has not been deployed as far as can be
          ascertained. The nonce would now be nearly impossible to deploy
          retrospectively, because to catch a misbehaving receiver it relies
          on the receiver volunteering feedback information to incriminate
          itself. A receiver that has been modified to misbehave can simply
          claim that it does not support nonce feedback, which will seem
          unremarkable given so many other hosts do not support it either.</t>

          <t>With minor changes AccECN could be optimised for the possibility
          that the ECT(1) codepoint might be used as a nonce. However, given
          the nonce is now probably undeployable, the AccECN design has been
          generalised so that it ought to be able to support other possible
          uses of the ECT(1) codepoint, such as a lower severity or a more
          instant congestion signal than CE.</t>

          <t>Three alternative mechanisms are available to assure the
          integrity of ECN and/or loss signals. AccECN is compatible with any
          of these approaches:<list style="symbols">
              <t>The Data Sender can test the integrity of the receiver's ECN
              (or loss) feedback by occasionally setting the IP-ECN field to a
              value normally only set by the network (and/or deliberately
              leaving a sequence number gap). Then it can test whether the
              Data Receiver's feedback faithfully reports what it expects
              <xref target="I-D.moncaster-tcpm-rcv-cheat"/>. Unlike the ECN
              Nonce, this approach does not waste the ECT(1) codepoint in the
              IP header, it does not require standardisation and it does not
              rely on misbehaving receivers volunteering to reveal feedback
              information that allows them to be detected.</t>

              <t>Networks generate congestion signals when they are becoming
              congested, so they are more likely than Data Senders to be
              concerned about the integrity of the receiver's feedback of
              these signals. A network can enforce a congestion response to
              its ECN markings (or packet losses) using congestion exposure
              (ConEx) audit <xref target="I-D.ietf-conex-abstract-mech"/>.
              Whether the receiver or a downstream network is suppressing
              congestion feedback or the sender is unresponsive to the
              feedback, or both, ConEx audit can neutralise any advantage that
              any of these three parties would otherwise gain. <vspace
              blankLines="1"/>ConEx is a change to the Data Sender that is
              most useful when combined with AccECN. Without AccECN, the ConEx
              behaviour of a Data Sender would have to be more conservative
              than would be necessary if it had the accurate feedback of
              AccECN.</t>

              <t>The TCP authentication option (TCP-AO <xref
              target="RFC5925"/>) can be used to detect any tampering with
              AccECN feedback between the Data Receiver and the Data Sender.
              Although this section of the feedback loop is the least likely
              to come under malicious attack, it is increasingly likely to be
              tampered with accidentally by middleboxes intervening at layer
              4. The AccECN fields are immutable end-to-end, so whether placed
              in the Non-Urgent field or a TCP option, they are amenable to
              default TCP-AO protection (but not if TCP-AO protection of TCP
              options is turned off, which is non-default but might be
              necessary for other reasons).</t>
            </list></t>
        </section>
      </section>

      <section anchor="accecn_Rcvr_Operation"
               title="Accurate ECN Receiver Operation">
        <t>A TCP receiver MUST only feed back ECN information arriving in a
        segment that it deems is part of the flow, by using regular TCP
        techniques based on sequence numbers.</t>

        <t>{ToDo: It might be useful to describe the receiver end of the feedback
        process, including special cases, e.g. pure ACKs, retransmissions,
        window probes, partial ACKs, etc. Does AccECN feed back each ECN
        codepoint when a data packet is duplicated?}</t>
      </section>

      <section anchor="accecn_Sndr_Operation"
               title="Accurate ECN Sender Operation">
        <t>A TCP sender MUST only accept ECN feedback on ACKs that it deems are
        part of the flow, by using regular TCP techniques based on sequence
        numbers.</t>

        <t>{ToDo: It might be useful to describe the sender end of the
        feedback process, including special cases, e.g. pure ACKs,
        retransmissions, window probes, partial ACKs, etc.}</t>
      </section>

      <section anchor="accecn_Mbox_Interference"
               title="Detection of Legacy Middlebox Interference">
        <t>The definition of the SupAccECN field has been contrived so that
        the value all-zeros is undefined. Therefore, an Outgoing AccECN
        Protocol Handler MUST NOT ever set the value of SupAccECN to
        all-zeros. <!--The specific rules are reiterated here:<list
            style="symbols">
            <t>On a SYN/ACK, an Outgoing AccECN Protocol Handler MUST set the
            reserved D-ECN field to ECT(0) = 0b10 (<xref
            target="accecn_SupAccECN_Structure"/>);</t>

            <t>On a segment with SYN=0, if RL1=RL2=0, an Outgoing AccECN
            Protocol Handler MUST set SP = MK1 = 0b10 (<xref
            target="accecn_Sequence"/>)</t>
          </list>--></t>

        <t>Therefore, the Incoming AccECN Protocol Handler MUST check<!--:<list
            style="symbols">
            <t>that the value of D-ECN is non-zero (on a SYN/ACK segment with
            SYN=1, ACK=1)</t>

            <t>or --> that the value of ESQ is non-zero (on a segment with
        SYN=0).<!--</t>
          </list>--> If the Incoming Protocol Handler detects all-zeros in
        this field on any segment, it MUST ignore the whole
        SupAccECN field on that segment, and it SHOULD ignore the SupAccECN
        field on all subsequent segments in the same half-connection or at
        least treat each with greater suspicion.</t>

        <t>If a Data Sender ignores the incoming SupAccECN field, it MUST
        revert to the conservative behaviour needed when only the essential
        part of the AccECN protocol is available, as described in <xref
        target="accecn_ACE_Safety"/>. Nonetheless, the Outgoing AccECN
        Protocol Handler of the same Data Sender MUST continue to set the
        SupAccECN field as normal (<xref target="accecn_SupAccECN"/>), because
        any interference might be only in one direction. The AccECN protocol
        does not include any requirement for a Data Sender that detects
        interference to notify the other end, because the complexity required
        to assure message integrity in the face of interference is not
        warranted.</t>
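
        <t>The check described in this section can be sketched as a small
        piece of per-half-connection state. The class and method names are
        hypothetical, and the sketch adopts the stricter of the two options
        above, ignoring SupAccECN on all later segments once an all-zeros ESQ
        has been seen. It applies only to segments with SYN=0 and assumes that
        ESQ occupies the ten bits above the four Top-ACE bits.</t>

        <figure>
          <artwork><![CDATA[
```python
# Illustrative sketch; names are hypothetical.

class SupAccECNMonitor:
    """Tracks whether SupAccECN is usable on a half-connection."""

    def __init__(self):
        self.trusted = True

    def accept(self, supaccecn):
        """Return True if this segment's SupAccECN may be used."""
        esq = (supaccecn >> 4) & 0x3FF
        if esq == 0:
            # A compliant peer never sends an all-zeros ESQ, so
            # this suggests interference on the path.
            self.trusted = False
        return self.trusted
```
]]></artwork>
        </figure>

        <t>When accept() returns False, the Data Sender would fall back to the
        conservative behaviour for the essential part of the protocol, while
        still setting SupAccECN as normal on the segments it sends.</t>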
      </section>

      <section anchor="accecn_Mbox_Operation"
               title="Correct Middlebox Operation">
        <t>A large class of middleboxes split TCP connections, acting as the
        receiver for one connection and the sender for another, passing data
        between the two, usually via a buffer. Network interface hardware to
        offload certain TCP processing represents another large class of
        middleboxes, even though it is rarely in its own 'box'.</t>

        <t>To comply with this specification, each side of such a middlebox
        MUST comply with the AccECN requirements applicable to a responding
        host or an originating host during capability negotiation (<xref
        target="accecn_Negotiation"/>) and the required AccECN behaviours as a
        Data Receiver or as a Data Sender throughout this specification.</t>

        <t>Another class of middleboxes attempts to 'normalise' the TCP wire
        protocol by checking that all values in header fields comply with a
        rather narrow interpretation of the TCP specifications. To comply with
        this specification, such middleboxes MUST be updated to recognise and
        forward values in fields that comply with the newly defined semantics
        of AccECN. This includes the explicitly stated requirements to forward
        Reserved (Rsvd) and Currently Unused (CU) values unaltered. An 'ideal'
        TCP normaliser would not have to change to accommodate AccECN, because
        AccECN does not directly contravene any existing TCP specifications,
        even though it uses existing TCP fields in unorthodox ways.</t>
      </section>
    </section>

    <section anchor="accecn_Interact_Variants"
             title="Interaction with Other TCP Variants">
      <section anchor="accecn_Interaction_SYN_Cookies"
               title="Compatibility with SYN Cookies">
        <t>A server can use SYN Cookies (see Appendix A of <xref
        target="RFC4987"/>) to protect itself from SYN flooding attacks. It
        places minimal commonly used connection state in the SYN/ACK, and
        deliberately does not hold any additional state while waiting for the
        subsequent ACK. Therefore it cannot record the fact that it entered
        AccECN mode for both half-connections. Indeed, it cannot even remember
        whether it negotiated the use of classic ECN <xref
        target="RFC3168"/>.</t>

        <t>If the server (host B) receives the final ACK of the 3-way
        handshake with a SupAccECN TCP option, it can infer that the
        originating host (A) supports AccECN. If host B supports AccECN
        itself, it can further infer that it would have entered AccECN mode
        before sending the SYN/ACK.</t>

        <t>If, on the other hand, the originating host (A) sends the final ACK
        of the 3-way handshake with the SupAccECN field in the Non-Urgent
        field, responding host B can still infer that host A originally
        negotiated AccECN, by checking the fourteen least significant bits of
        the Non-Urgent field and the ACE field, as follows:<list
            style="symbols">
            <t>Host B knows that host A would not defer the final ACK of the
            3-way handshake, because TCP never delays this.</t>

            <t>Therefore, if host B sends the SYN/ACK with its IP-ECN field
            set to ECT(0) <xref target="RFC5562"/>, then checks the fourteen
            least significant bits of the Non-Urgent field of the final ACK of
            the 3-way handshake, it can make the following inferences:<list
                style="numbers">
                <t>lsb(Non-Urgent) == 000010100000 &amp;&amp; ACE == 000
                implies host A is AccECN and the SYN/ACK arrived unchanged as
                ECT(0);</t>

                <t>lsb(Non-Urgent) == 000010100000 &amp;&amp; ACE == 001
                implies host A is AccECN and the SYN/ACK was CE-marked;</t>

                <t>lsb(Non-Urgent) == 000010100001 &amp;&amp; ACE == 111
                implies host A is AccECN and the IP-ECN field of the SYN/ACK
                was zeroed;</t>

                <t>lsb(Non-Urgent) == 000000000000 or any value other than
                those above implies host A is Not AccECN or a middlebox is
                interfering with the Non-Urgent field.</t>
              </list></t>

            <t>If, on the other hand, host B sends the SYN/ACK with its IP-ECN
            field set to Not-ECT, then checks the fourteen least significant
            bits of the Non-Urgent field of the final ACK of the 3-way
            handshake, it can make the following inferences:<list
                style="numbers">
                <t>lsb(Non-Urgent) == 000010100001 &amp;&amp; ACE == 111
                implies host A is AccECN;</t>

                <t>lsb(Non-Urgent) == 000000000000 or any value other than
                that above implies host A is Not AccECN or a middlebox is
                interfering with the Non-Urgent field.</t>
              </list></t>
          </list></t>
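
        <t>The inferences listed above can be gathered into a single
        classification function, sketched below. The function name, constant
        names and returned strings are hypothetical; the bit patterns and ACE
        values are those listed above, compared numerically against the
        fourteen least significant bits of the Non-Urgent field.</t>

        <figure>
          <artwork><![CDATA[
```python
# Illustrative sketch; names and return strings are hypothetical.
EMPTY_ESQ_TOP0 = 0b000010100000  # pattern listed with Top-ACE 0
EMPTY_ESQ_TOP1 = 0b000010100001  # pattern listed with Top-ACE 1


def infer_from_final_ack(sent_ect0, non_urgent, ace):
    """Classify the final ACK of the 3-way handshake at host B.

    sent_ect0:  True if host B sent its SYN/ACK as ECT(0),
                False if it sent it as Not-ECT.
    non_urgent: 16-bit Non-Urgent field of the final ACK.
    ace:        3-bit ACE field of the final ACK.
    """
    lsb = non_urgent & 0x3FFF  # fourteen least significant bits
    if sent_ect0 and lsb == EMPTY_ESQ_TOP0:
        if ace == 0b000:
            return "accecn: SYN/ACK arrived as ECT(0)"
        if ace == 0b001:
            return "accecn: SYN/ACK was CE-marked"
    if lsb == EMPTY_ESQ_TOP1 and ace == 0b111:
        if sent_ect0:
            return "accecn: IP-ECN of SYN/ACK was zeroed"
        return "accecn"
    return "not accecn, or middlebox interference"
```
]]></artwork>
        </figure>

        <t>Because the server holds no state while using SYN Cookies, the
        sent_ect0 argument in this sketch would have to come from the server's
        fixed policy for marking SYN/ACKs rather than from per-connection
        memory.</t>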
      </section>

      <section anchor="accecn_Interaction_Other"
               title="Compatibility with Other Options and Experiments">
        <t>AccECN is compatible (at least on paper) with the most commonly
        used TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It
        is also compatible with the recent promising experimental TCP options
        TCP Fast Open (TFO <xref target="I-D.ietf-tcpm-fastopen"/>) and
        Multipath TCP (MPTCP <xref target="RFC6824"/>). AccECN is particularly
        friendly to all these protocols, because space for TCP options is
        particularly scarce on the SYN, where AccECN consumes zero additional
        header space.</t>
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Properties" title="Protocol Properties">
      <t>This section is informative not normative. It describes how well the
      protocol satisfies the agreed requirements for a more accurate ECN
      feedback protocol <xref target="I-D.ietf-tcpm-accecn-reqs"/>.<list
          style="hanging">
          <t hangText="Accuracy:">From each ACK, the Data Sender can infer the
          number of new Not-ECT, ECT(0), ECT(1) and CE markings since the
          previous ACK.</t>

          <t hangText="Accuracy:">The Data Receiver can feed back to the Data
          Sender a list of the order of the IP-ECN markings covered by each
          delayed ACK.</t>

          <t hangText="Overhead:">The AccECN scheme is divided into two parts.
          The essential part reuses the 3 flags already assigned to ECN in the
          TCP header. The supplementary part requires fifteen bits.</t>

          <t hangText="Overhead:">Two alternative locations for the
          supplementary protocol field are proposed:<list style="numbers">
              <t>In the 16-bit Urgent Pointer when URG=0. This specification
              reserves 15 bits of this space, but while the specification is
              only experimental it refrains from using this space in the main
              TCP header. If AccECN progresses to the standards track and uses
              these 15 bits, it will require zero additional overhead, because
              it will overload a field that already takes up space in every
              TCP header.</t>

              <t>In a TCP option. This takes up 4 bytes; the fifteen bits have
              to be rounded up to 2 bytes, plus 2 bytes for the TCP option
              Kind and Length.</t>
            </list></t>

          <t hangText="Timeliness:">In the absence of lost ACKs, no feedback
          is deferred to a future ACK, which is intended to enable
          latency-sensitive uses of ECN feedback.</t>

          <t hangText="Timeliness:">{ToDo: Add improved timeliness if the
          Delayed ACK Control (DAC) feature is included.}</t>

          <t hangText="Resilience:">Each ACK includes a counter of one of the
          ECN congestion signals. If ACKs are lost, the counter on the first
          ACK following the losses allows the Data Sender to immediately
          recover the number of one of the ECN markings that it missed.</t>

          <t hangText="Resilience:">Subsequent ACKs will allow it to recover
          the number of other ECN markings that it missed.</t>

          <t hangText="Resilience against Bias:">Undetected ACK loss is as
          likely to decrease as increase congestion signals detected by the
          Data Sender.</t>

          <t hangText="Resilience against Bias:">However, if the supplementary
          part is unavailable, the required conservative decoding of feedback
          during ACK loss is more likely to increase perceived congestion
          signals, which would otherwise be more likely to be
          under-reported.</t>

          <t hangText="Timeliness vs Overhead:">For efficiency, each delayed
          ACK only includes one of the counters at a time, therefore recovery
          of the count of the other signals might not be immediate if an ACK
          is lost that covers more than one signal. The receiver cannot
          predict which ACKs might get lost, if any. Therefore it repeats the
          count of each signal roughly in proportion to how often each signal
          changes.</t>

          <t hangText="Ordering:">The order of arriving ECN codepoints is
          communicated in a 10-bit field in the supplementary part.</t>

          <t hangText="Resilience vs. Ordering:">Following an ACK loss, only a
          count of the lost ECN signals is recovered, not their order of
          arrival over the sequence covered by the loss.</t>

          <t hangText="Ordering vs. Overhead:">The encoding is tailored for
          sequences of ECN codepoints expected to be typical. It can encode
          sequences of up to 15 segments but, if the pattern of arrivals
          becomes too complex, the protocol forces the Data Receiver to emit
          an ACK. The protocol can always encode any sequence of 3 segments in
          one delayed ACK.</t>

          <t hangText="Ordering, Timeliness and Resilience:">If one delayed
          ACK covers changes to more than one congestion counter, the
          supplementary sequence information provides more timely congestion
          feedback than waiting for the other congestion counters on future
          ACKs, and it provides resilience against the possibility of those
          future ACKs going missing.</t>

          <t hangText="Complexity:">{ToDo: Once implemented, quantify the code
          complexity}</t>

          <t hangText="Integrity:">AccECN is compatible with complementary
          protocols that assure the integrity of ECN feedback.</t>

          <t hangText="Backward Compatibility:">If only one endpoint supports
          the AccECN scheme, it will fall back to the most advanced ECN
          feedback scheme supported by the other end.</t>

          <t hangText="Backward Compatibility:">Each endpoint can detect
          normalisation of the Supplementary AccECN field by middleboxes at
          any time during a connection. It could then fall back to the
          essential part using only the fewer but safer bits in the TCP
          header.</t>

          <t hangText="Forward Compatibility:">The behaviour of endpoints and
          middleboxes is carefully defined for all reserved or currently
          unused codepoints in the scheme, to ensure that any blocking of
          anomalous values is always at least under reversible policy
          control.</t>
        </list></t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_IANA_Considerations" title="IANA Considerations">
      <section anchor="accecn_SupAccECN_TCPopt_IANA"
               title="SupAccECN TCP Option Allocation">
        <t>This specification requires IANA to allocate one value from the TCP
        option Kind name-space, against the name "Supplementary Accurate ECN"
        (SupAccECN).</t>

        <t>Early implementation before the IANA allocation MUST follow <xref
        target="RFC6994"/> and use experimental option 254 and magic number
        0xACCE (16 bits) {ToDo register this with IANA}, then migrate to the
        new option after the allocation.</t>
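
        <t>As a rough illustration of the interim encoding, the sketch below
        packs a SupAccECN payload into an experimental option laid out as in
        <xref target="RFC6994"/> (Kind, Length, 16-bit ExID, then data). The
        function name and the opaque 2-octet payload value are assumptions
        made for this example; it is not a definitive wire format.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

EXP_KIND = 254        # experimental TCP option Kind (RFC 6994)
ACCECN_EXID = 0xACCE  # 16-bit magic number proposed in this section


def build_exp_supaccecn(payload: bytes) -> bytes:
    """Return Kind, Length, ExID and payload as one byte string."""
    length = 2 + 2 + len(payload)  # Kind + Length + ExID + data
    header = struct.pack("!BBH", EXP_KIND, length, ACCECN_EXID)
    return header + payload


# Hypothetical 2-octet payload standing in for the 15-bit field
option = build_exp_supaccecn(bytes([0x12, 0x34]))
```
]]></artwork>
        </figure>

        <t>With the 2-octet ExID, the interim option in this sketch occupies
        6 octets, rather than the 4 octets needed once a dedicated Kind is
        allocated.</t>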
      </section>

      <section anchor="accecn_Non-Urgent_IANA"
               title="Non-Urgent Field Registry">
        <t>This specification requests that IANA sets up a new TCP parameters
        registry in accordance with <xref target="RFC5226"/>. This registry
        enables future standards track RFCs to assign values to sub-fields of
        the TCP Non-Urgent field defined in <xref
        target="accecn_Non-Urgent"/>.</t>

        <t><list style="hanging">
            <t hangText="Name of registry:">Non-Urgent field.</t>

            <t hangText="Information required for assignments:"><list
                style="symbols">
                <t>Width and position of sub-field or sub-fields,</t>

                <t>Assignment of values to sub-field(s),</t>

                <t>Confirmation of compliance with additional conditions 1
                &amp; 2 below.</t>
              </list></t>

            <t hangText="Review Process:">Standards Action - Values to be
            assigned for Standards Track RFCs approved by the IESG. At the
            IESG's discretion, values MAY be assigned for Standards Track RFCs
            still in the process of approval, in order to resolve the catch-22
            where the assignment needs deployment testing but deployment
            testing needs the assignment.</t>

            <t hangText="Size, format and syntax of registry entries:">Binary
            values of sub-fields.</t>

            <t hangText="Initial assignments and reservations:">This
            specification reserves the 15 least significant bits of the
            Non-Urgent field for use by a potential future standards action
            that might define the AccECN scheme for the standards track.</t>
          </list></t>

        <t>Additional conditions for assignment:<list style="numbers">
            <t>Assignments within the Non-Urgent field MUST be used by a
            protocol that is robust to the field being unavailable
            occasionally. This is because the Non-Urgent field is unusable and
            undefined on segments with URG = 1 in the TCP header <xref
            target="RFC0793"/>. The Non-Urgent field overloads the meaning of
            the 16-bit Urgent Pointer only when URG = 0.</t>

            <t>The value zero, i.e. all 16 bits of the Non-Urgent field
            cleared to zero, SHOULD be undefined, because it is known that
            certain 'normalising' middleboxes overzealously zero the urgent
            pointer when URG = 0. An undefined zero value can be achieved by
            requiring that the value all-zeros is undefined for at least one
            sub-field of the Non-Urgent field. Then even if the value
            all-zeros is defined and used in other sub-fields, the value
            all-zeros for the whole field will be undefined.</t>
          </list></t>
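
        <t>The two conditions above could be reflected in a receive-side
        check along the following lines. This is only a sketch: the function
        name, the return convention (None for "unusable") and the mask
        constant are assumptions for illustration, not part of the registry
        definition.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import Optional

ACCECN_MASK = 0x7FFF  # 15 least significant bits of the 16-bit field


def read_non_urgent(urg: bool, urgent_pointer: int) -> Optional[int]:
    """Return the 15 reserved bits, or None if the field is unusable."""
    if urg:
        return None  # URG = 1: the field is a real Urgent Pointer
    if (urgent_pointer & 0xFFFF) == 0:
        return None  # all-zeros is undefined (possibly normalised)
    return urgent_pointer & ACCECN_MASK
```
]]></artwork>
        </figure>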
      </section>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Security_Considerations"
             title="Security Considerations">
      <t>If ever the supplementary part of AccECN is unusable (due for example
      to middlebox interference) the essential part of AccECN's congestion
      feedback offers only limited resilience to long runs of ACK loss (see
      <xref target="accecn_ACE_Safety"/>). These problems are unlikely to be
      due to malicious intervention (because if an attacker could discard a
      long run of ACKs it could wreak other arbitrary havoc). However, it
      would be of concern if AccECN's resilience could be indirectly
      compromised during a flooding attack. AccECN is still considered safe
      though, because an AccECN Data Sender can detect when the supplementary
      part is unusable, and it is then required to switch to more conservative
      assumptions about wrap of congestion indication counters (see <xref
      target="accecn_ACE_Safety"/> and <xref
      target="accecn_Algo_ACE_Wrap"/>).</t>

      <t>AccECN does not signal the ordering of ECN codepoints covered by a
      delayed ACK reliably, i.e. if one delayed ACK is lost, the ECN sequence
      information in that ACK is not retransmitted. The design of AccECN
      assumes gaps in this information will not be critical, and that this
      information is unlikely to be security-sensitive. However, this point is
      mentioned for completeness.</t>

      <t>The SYN cookie method for mitigating SYN flooding attacks is not
      generally compatible with enhancements to the TCP 3-way handshake.
      Nonetheless, <xref target="accecn_Interaction_SYN_Cookies"/> describes
      how a server can negotiate AccECN and use SYN cookies.</t>

      <t>AccECN is compatible with all the known schemes that ensure the
      integrity of ECN feedback (see <xref target="accecn_Integrity"/> for
      details). Given the experimental ECN nonce is now probably undeployable,
      AccECN has been generalised for other possible uses of the ECT(1)
      codepoint to avoid any risk of obsolescence.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Acknowledgements" title="Acknowledgements">
      <t>We want to thank Michael Welzl for his input and discussion. The idea
      of using the three ECN-related TCP flags as one field for more accurate
      TCP-ECN feedback was first introduced in the re-ECN protocol that was
      the ancestor of ConEx.</t>

      <t>Bob Briscoe was part-funded by the European Community under its
      Seventh Framework Programme through the Reducing Internet Transport
      Latency (RITE) project (ICT-317700) and through the Trilogy 2 project
      (ICT-317756). The views expressed here are solely those of the
      authors.</t>
    </section>

    <!-- ================================================================ -->

    <section anchor="accecn_Comments_Solicited" title="Comments Solicited">
      <t>Comments and questions are encouraged and very welcome. They can be
      addressed to the IETF TCP maintenance and minor modifications working
      group mailing list &lt;tcpm@ietf.org&gt;, and/or to the authors.</t>
    </section>
  </middle>

  <back>
    <!-- ================================================================ -->

    <references title="Normative References">
      &RFC0793;

      &RFC2119;

      &RFC3168;

      &RFC5681;

      &RFC6994;
    </references>

    <references title="Informative References">
      &RFC2018;

      &RFC3540;

      &RFC4987;

      &RFC5226;

      &RFC5562;

      &RFC5925;

      &RFC6824;

      &I-D.ietf-tcpm-accecn-reqs;

      &I-D.ietf-tcpm-fastopen;

      &I-D.ietf-conex-abstract-mech;

      &I-D.kuehlewind-tcpm-ecn-fallback;

      &I-D.moncaster-tcpm-rcv-cheat;

      &I-D.bensley-tcpm-dctcp;
    </references>

    <section anchor="accecn_Algo_Examples" title="Example Algorithms">
      <t>This appendix is informative, not normative. It gives examples in
      pseudocode for the various algorithms used by AccECN.</t>

      <section anchor="accecn_Algo_ACE_Wrap"
               title="Example Algorithm for Safety Against Long Sequences of ACK Loss">
        <t>This appendix gives an example algorithm that a Data Sender can use
        to heuristically detect a long enough unbroken string of ACK losses
        that could have concealed wrap of the congestion counter in the ACE
        field of the next ACK to arrive. The Data Sender is unlikely to need
        to run an algorithm like this unless it detects that supplementary
        AccECN feedback is not available (see <xref
        target="accecn_ACE_Safety"/> and <xref
        target="accecn_Mbox_Interference"/>).</t>

        <t>It is assumed that the focus is solely on safety, not on complete
        protocol precision. Therefore, this example detects only possible wrap of the
        congestion indication (CI) counter, not E1 or NI. This is on the
        assumption that, even if ECT(1) is redefined to indicate congestion in
        some way, then ECN CE markings will always indicate more severe
        congestion. It is also assumed that numerous Not-ECT markings imply
        middlebox tampering, which only needs to be detected, not quantified
        perfectly.</t>

        <t>If the supplementary Top-ACE field cannot be used, there is only
        room for 4 values of the congestion indication (CI) counter in the ACE
        field. The CI counter in an arriving ACK could have wrapped and become
        ambiguous to the Data Sender if a row of ACKs goes missing that covers
        a stream of data long enough to contain 4 or more CE marks. We use the
        word missing rather than lost, because some or all the missing ACKs
        might arrive eventually, but out of order. Even if some of the lost
        ACKs are piggy-backed on data (i.e. not pure ACKs), retransmissions
        will not repair the lost AccECN information, because AccECN requires
        retransmissions to carry the latest AccECN counters, not the original
        ones (<xref target="accecn_ACE_Counter_Selection"/>).</t>

        <t>If the CE marking probability were p on the forward data path,
        ambiguity would arise if an unbroken run of ACKs went missing from the
        reverse path that covered at least 4/p packets. For example, if p was 5% on the
        forward path, ambiguity would ensue if simultaneously on the reverse
        path a sequence of ACKs covering 4/0.05 = 80 packets all went missing.
        With a delayed ACK ratio of 2 that translates to missing 40 ACKs in a
        row. Obviously, missing ACKs would be far less likely if pure ACKs
        were allowed to be ECN-capable. However, because RFC 3168 currently
        precludes this, we will assume that pure ACKs are not ECN-capable.</t>
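
        <t>The arithmetic in the example above can be reproduced with the
        short sketch below. The function name and its parameters are
        assumptions introduced for illustration; exact fractions are used
        only to avoid floating-point rounding.</t>

        <figure>
          <artwork><![CDATA[
```python
import math
from fractions import Fraction


def ack_run_for_wrap(mark_prob, ack_ratio=2, ci_values=4):
    """Return (packets, acks) in the shortest run that hides a wrap."""
    p = Fraction(str(mark_prob))        # exact value, e.g. 1/20
    packets = math.ceil(ci_values / p)  # packets expected to hold 4 CE
    acks = math.ceil(Fraction(packets, ack_ratio))
    return packets, acks


example = ack_run_for_wrap(0.05)  # 5% marking, delayed ACK ratio 2
```
]]></artwork>
        </figure>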

        <t>To protect against such an unlikely event, <xref
        target="accecn_ACE_Safety"/> requires the Incoming Protocol Handler to
        assume that the CI field did wrap if it could have wrapped under
        prevailing conditions. It could be extremely conservative and assume
        that ECN marking suddenly jumped to 100% on the forward path just when
        there were no ACKs on the reverse path to detect it.</t>

        <t>Specifically, if the Incoming Protocol Handler receives an ACK with
        an acknowledgement number that acknowledges L full-sized segments
        since the previous ACK, it could conservatively assume that the CI
        field incremented by <figure>
            <artwork><![CDATA[    D' = L - ((L-D) % 4),]]></artwork>
          </figure>where D is the apparent increase in the CI field. This
        formula assumes full-sized segments and 100% marking. It would still
        be safe if segments were only 5% of full size, as long as ECN marking
        was 5% or less rather than 100%.</t>

        <t>For example, imagine an ACK acknowledges 5 more full-size segments
        than any previous ACK, and that it apparently increases CI by 2. The
        above formula works out that a safe increment of CI would still be 2
        (because 5 - ((5-2) % 4) = 2). However, if CI apparently increases by
        2 but acknowledges 11 more full-sized segments, then CI should be
        assumed to have increased by 10 (because 11 - ((11-2) % 4) = 10).</t>
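
        <t>The formula and the two worked examples above can be checked with
        a minimal sketch such as the following. The function and parameter
        names are assumptions for illustration, as is the guard for the case
        where fewer segments are acknowledged than the apparent increase.</t>

        <figure>
          <artwork><![CDATA[
```python
def safe_ci_increment(acked_segments, apparent_delta, base=4):
    """Conservative CI increment D' = L - ((L - D) % base)."""
    if acked_segments < apparent_delta:
        # Assumption: never report less than the visible increase
        return apparent_delta
    hidden = (acked_segments - apparent_delta) % base
    return acked_segments - hidden


small_gap = safe_ci_increment(5, 2)   # 5 segments ACKed, CI up by 2
large_gap = safe_ci_increment(11, 2)  # 11 segments ACKed, CI up by 2
```
]]></artwork>
        </figure>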

        <t>Implementers could build in more heuristics to estimate prevailing
        segment sizes and prevailing ECN marking. For instance, L in the above
        formula could be replaced with L' = L*p*M/s, where M is the MSS, s is
        the prevailing segment size and p is the prevailing ECN marking
        probability. However, ultimately, if TCP's ECN feedback becomes
        inaccurate it still has loss detection to fall back on. Therefore, it
        would seem safe to implement a simple algorithm like that given
        initially, rather than a perfect one.</t>
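
        <t>One possible reading of the refinement L' = L*p*M/s is sketched
        below. Rounding L' up to a whole number of segments and never
        returning less than the apparent increase D are assumptions made
        here to keep the estimate conservative; the names are illustrative
        only.</t>

        <figure>
          <artwork><![CDATA[
```python
import math


def heuristic_ci_increment(acked, delta, mark_prob, mss, seg_size):
    """Apply D' = L' - ((L' - D) % 4) with L' = L*p*M/s."""
    scaled = math.ceil(acked * mark_prob * mss / seg_size)
    l_eff = max(scaled, delta)  # assumption: floor at the visible D
    return l_eff - ((l_eff - delta) % 4)


# 11 full-sized segments ACKed, CI apparently up by 2, 50% marking
estimate = heuristic_ci_increment(11, 2, 0.5, 1460, 1460)
```
]]></artwork>
        </figure>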

        <t>If missing acknowledgement numbers arrive later (due to
        reordering), <xref target="accecn_ACE_Safety"/> says "the Data Sender
        MAY attempt to neutralise the effect of any action it took based on a
        conservative assumption that it later found to be incorrect". To do
        this, the Data Sender would have to store the values of all the
        relevant variables whenever it made assumptions, so that it could
        re-evaluate them later. Given this could become complex and it is not
        required, we do not attempt to provide an example of how to do
        this.</t>
      </section>

      <section anchor="accecn_Algo_Counter_Select"
               title="Example Counter Selection Algorithms">
        <t>When the Data Receiver sends an ACK, if the last IP-ECN field that
        arrived was ECT(0), <xref target="accecn_ACE_Counter_Selection"/>
        says, "...the Data Receiver can signal either the CI or the E1
        counter. The choice of which to signal SHOULD be based on the
        principle that the more one counter has changed recently the more it
        SHOULD be signalled." A couple of alternative algorithms are suggested
        below that would satisfy this requirement.</t>

        <section anchor="accecn_Algo_Counter_Select1"
                 title="Counter Selection Algorithm Alt#1">
          <t>Counter selection algorithm Alt#1 repeats whichever counter has
          been repeated proportionately less often, relative to how often it
          has changed, with preference for CI if they tie. Or in
          pseudocode:</t>

          <figure>
            <artwork><![CDATA[if ( (e1 / r_e1) > (ci / r_ci) )
    send_ack(e1)
else
    send_ack(ci)
]]></artwork>
          </figure>

          <t>where r_e1 and r_ci are counts of how often E1 and CI were
          already repeated when ECT(0) was signalled. The algorithm below
          implements this comparison between two divisions using only integer
          addition. It is a little terse, so it is explained afterwards.</t>

          <figure>
            <artwork><![CDATA[ci   = 0    // CE counter
w_ci = 0    // internal 'weight' variable for CI
r_ci = 0    // internal count of how often CI has been repeated
e1   = 0    // ECT(1) counter
w_e1 = 0    // internal 'weight' variable for E1
r_e1 = 0    // internal count of how often E1 has been repeated
ni   = 0    // Not-ECT counter

dack_to_be_sent()    // shorthand for test if a delayed ACK is needed

switch (read(pkt.ip.ecn)) {
    case CE :
        ci++
        w_ci += r_e1
        if (dack_to_be_sent()) send_ack(ci)
        break               // no fall-through between cases
    case ECT1 :
        e1++
        w_e1 += r_ci
        if (dack_to_be_sent()) send_ack(e1)
        break
    case Not-ECT :
        ni++
        if (dack_to_be_sent()) send_ack(ni)
        break
    case ECT0 :
        if (dack_to_be_sent()) {
            /* Choice between E1 and CI */
            if (w_e1 > w_ci) {      // Preference to CI if they tie
                send_ack(e1)
                r_e1++
                w_ci += ci
            } else {
                send_ack(ci)
                r_ci++
                w_e1 += e1
            }
        }
}
]]></artwork>
          </figure>

          <t>{ToDo: Handle wrap of the weights (see my notebook?).}</t>

          <t>Explanation: The algorithm ensures that the weights always equal
          the following products:</t>

          <figure>
            <artwork><![CDATA[    w_ci = ci * r_e1,
    w_e1 = e1 * r_ci.
]]></artwork>
          </figure>

          <t>It does this by incremental addition rather than
          multiplication:<list style="symbols">
              <t>every time r_e1 increments by 1, w_ci is incremented by 1 *
              ci;</t>

              <t>every time ci increments by 1, w_ci is incremented by 1 *
              r_e1;</t>
            </list>and the same for w_e1 and the pair of variables it consists
          of.</t>

          <t>This ensures that the condition</t>

          <figure>
            <artwork><![CDATA[    w_e1 > w_ci]]></artwork>
          </figure>

          <t>used in the algorithm is equivalent to:</t>

          <figure>
            <artwork><![CDATA[    e1 * r_ci > ci * r_e1,]]></artwork>
          </figure>

          <t>or rearranging:</t>

          <figure>
            <artwork><![CDATA[    (e1 / r_e1) > (ci / r_ci),]]></artwork>
          </figure>

          <t>which is the required proportionality condition.</t>
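
          <t>The invariant described above can be checked with a small
          simulation such as the one below. It is a sketch under stated
          assumptions: every arriving packet is assumed to trigger an ACK,
          and the class and method names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
class Alt1Selector:
    """Track counters and weights as in counter selection Alt#1."""

    def __init__(self):
        self.ci = self.e1 = self.ni = 0
        self.w_ci = self.w_e1 = 0
        self.r_ci = self.r_e1 = 0

    def on_packet(self, ecn):
        """Update state; return the name of the counter signalled."""
        if ecn == "CE":
            self.ci += 1
            self.w_ci += self.r_e1
            return "ci"
        if ecn == "ECT1":
            self.e1 += 1
            self.w_e1 += self.r_ci
            return "e1"
        if ecn == "Not-ECT":
            self.ni += 1
            return "ni"
        # ECT0: choose between E1 and CI, preferring CI on a tie
        if self.w_e1 > self.w_ci:
            self.r_e1 += 1
            self.w_ci += self.ci
            return "e1"
        self.r_ci += 1
        self.w_e1 += self.e1
        return "ci"


sel = Alt1Selector()
arrivals = ["CE", "ECT0", "ECT1", "ECT0", "ECT0", "CE", "ECT0"]
signals = [sel.on_packet(ecn) for ecn in arrivals]
```
]]></artwork>
          </figure>

          <t>After any sequence of arrivals, w_ci equals ci * r_e1 and w_e1
          equals e1 * r_ci, so no multiplication or division is needed at
          run time.</t>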
        </section>

        <section anchor="accecn_Algo_Counter_Select2"
                 title="Counter Selection Algorithm Alt#2">
          <t>Counter selection algorithm Alt#2 implements the policy "Send
          each recently changed codepoint twice, unless the other one has also
          changed, and alternate sending CI, E1 if no counter changes."</t>

          <t>{ToDo: Alt#2 has the disadvantage that it can repeat E1 a lot,
          even if E1 has never been signalled, which unnecessarily reduces the
          resilience of CI.}</t>

          <figure>
            <artwork><![CDATA[ci   = 0        // CE counter
q_ci = 0        // queue of CI's to repeat
nxt_ci = TRUE   // Signal E1 next if FALSE
e1   = 0        // ECT(1) counter
q_e1 = 0        // queue of E1's to repeat
ni   = 0        // Not-ECT counter

dack_to_be_sent()    // shorthand for test if a delayed ACK is needed

switch (read(pkt.ip.ecn)) {
    case CE :
        ci++
        q_ci = 2
        if (dack_to_be_sent()) send_ack(ci)
        break               // no fall-through between cases
    case ECT1 :
        e1++
        q_e1 = 2
        if (dack_to_be_sent()) send_ack(e1)
        break
    case Not-ECT :
        ni++
        if (dack_to_be_sent()) send_ack(ni)
        break
    case ECT0 :
        if (dack_to_be_sent()) {
            /* Choice between E1 and CI */
            if (q_ci || q_e1) {     // If either queue is non-zero
                if (q_e1 > q_ci) {  // Preference to CI if they tie
                    send_ack(e1)
                    q_e1 = max(0, q_e1 - 1)
                } else {
                    send_ack(ci)
                    q_ci = max(0, q_ci - 1)
                }
            } else {            // Both queues are zero
                if (nxt_ci)
                    send_ack(ci)
                else
                    send_ack(e1)
                nxt_ci = !nxt_ci    // Toggle the next signal
            }
        }
}
]]></artwork>
          </figure>
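
          <t>To see how this policy behaves, the sketch below replays a
          short arrival sequence through a direct transliteration of the
          pseudocode. It assumes every arriving packet triggers an ACK, and
          the class and method names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
class Alt2Selector:
    """Repeat a changed counter twice, then alternate CI and E1."""

    def __init__(self):
        self.ci = self.e1 = self.ni = 0
        self.q_ci = self.q_e1 = 0  # repeats still owed per counter
        self.nxt_ci = True         # signal E1 next if False

    def on_packet(self, ecn):
        """Update state; return the name of the counter signalled."""
        if ecn == "CE":
            self.ci += 1
            self.q_ci = 2
            return "ci"
        if ecn == "ECT1":
            self.e1 += 1
            self.q_e1 = 2
            return "e1"
        if ecn == "Not-ECT":
            self.ni += 1
            return "ni"
        # ECT0: drain the repeat queues first, preferring CI on a tie
        if self.q_ci or self.q_e1:
            if self.q_e1 > self.q_ci:
                self.q_e1 = max(0, self.q_e1 - 1)
                return "e1"
            self.q_ci = max(0, self.q_ci - 1)
            return "ci"
        signal = "ci" if self.nxt_ci else "e1"
        self.nxt_ci = not self.nxt_ci  # toggle the next signal
        return signal


sel2 = Alt2Selector()
trace = [sel2.on_packet(e) for e in ["CE"] + ["ECT0"] * 4]
```
]]></artwork>
          </figure>

          <t>In this trace the single CE mark is followed by two queued
          repeats of CI, after which the algorithm alternates between CI and
          E1.</t>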
        </section>
      </section>

      <section anchor="accecn_Algo_Top-ACE_ACE"
               title="Example Encodings and Decodings of Top-ACE and ACE">
        <t>This appendix gives formulae for encoding and decoding the counters
        CI, E1 or NI with higher resilience to ACK loss by supplementing the
        ACE field with the Top-ACE field, as required in <xref
        target="accecn_Higher_Counter"/>.</t>

        <section anchor="accecn_Algo_Top-ACE_ACE_Encode"
                 title="Encoding Top-ACE and ACE by the Data Receiver">
          <t>The values associated with codepoints in ACE for CI and E1 are
          respectively base 4 and base 3 numbers (see <xref
          target="accecn_Tab_ACE"/>). Although there is only space for one
          value of NI, mathematically, NI can still be treated as a base 1
          counter. Then the following general formulae allow a Data Receiver
          to encode any of the counters CI, E1 or NI, by calling them all
          cntr, and defining ACE_base as their respective number base:</t>

          <figure>
            <artwork><![CDATA[    Top-ACE  = Int(cntr / ACE_base) % 16,
    ACE_cntr = cntr % ACE_base.
]]></artwork>
          </figure>

          <t>Then the Data Receiver looks up the codepoint to put in the ACE
          field by looking up ACE_cntr in <xref target="accecn_Tab_ACE"/> in
          the column of the relevant counter (CI, E1 or NI). Int() means round
          down to an integer and '%' is the modulo operator.</t>

          <t>To implement this without a costly division operation, two
          counters can be maintained while processing the header information
          for the ACK. The first counter can be mapped into the ACE field via
          <xref target="accecn_Tab_ACE"/>. A wrap every 4 increments of the
          counter could be implemented as a single conditional check, and when
          it wraps, a secondary, high-order counter could be incremented. This
          secondary counter could then be mapped directly into the Top ACE
          field. For instance, the two counters for CE markings would be
          implemented as follows:</t>

          <figure>
            <artwork><![CDATA[if (read(pkt.ip.ecn) == CE) {
    if (ACE_cntr.ci == 3) {         // base 4: wrap from 3 to 0
        ACE_cntr.ci = 0
        if (Top-ACE.ci == 15) {     // 4-bit field: wrap from 15 to 0
            Top-ACE.ci = 0
        } else
            Top-ACE.ci++
    } else
        ACE_cntr.ci++
}
]]></artwork>
          </figure>
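
          <t>As a cross-check, the sketch below compares a two-counter
          implementation against the division-based formula. It assumes the
          low-order counter takes the values 0 to ACE_base - 1 and the
          high-order counter the values 0 to 15; the function name is
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def increment(ace_cntr, top_ace, base=4, top_size=16):
    """Advance the (ACE_cntr, Top-ACE) pair by one without dividing."""
    if ace_cntr == base - 1:      # low-order counter is about to wrap
        ace_cntr = 0
        if top_ace == top_size - 1:
            top_ace = 0           # high-order counter wraps as well
        else:
            top_ace += 1
    else:
        ace_cntr += 1
    return ace_cntr, top_ace


state = (0, 0)
mismatches = 0
for n in range(1, 201):
    state = increment(*state)
    if state != (n % 4, (n // 4) % 16):
        mismatches += 1
```
]]></artwork>
          </figure>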

          <t>The three examples below explain how the algorithm determines
          which codepoints to place in Top-ACE and ACE, for each counter in
          turn. For brevity, they use the first mathematical formula above,
          rather than the second conditional logic variant.</t>

          <t>Example #1: if the Data Receiver has determined that it will
          signal its CI counter next and its local value is 73, it encodes
          this as:</t>

          <figure>
            <artwork><![CDATA[    Top-ACE = INT(73 / 4) % 16
            = 2
            = 0b0010
    ACE_cntr = 73 % 4
             = 1
]]></artwork>
          </figure>

          <t>Looking up the codepoint for CI = 1 in <xref
          target="accecn_Tab_ACE"/> gives:</t>

          <figure>
            <artwork><![CDATA[    ACE = 0b001.]]></artwork>
          </figure>

          <t>Example #2: if the Data Receiver has determined that it will
          signal its E1 counter next and its local value is 75, it encodes
          this as:</t>

          <figure>
            <artwork><![CDATA[    Top-ACE = INT(75 / 3) % 16
            = 9
            = 0b1001
    ACE_cntr = 75 % 3
             = 0
]]></artwork>
          </figure>

          <t>Looking up the codepoint for E1 = 0 in <xref
          target="accecn_Tab_ACE"/> gives:</t>

          <figure>
            <artwork><![CDATA[    ACE = 0b100.]]></artwork>
          </figure>

          <t>Example #3: if the Data Receiver has determined that it will
          signal its NI counter next and its local value is 43, it encodes
          this as:</t>

          <figure>
            <artwork><![CDATA[    Top-ACE = INT(43 / 1) % 16
            = 11
            = 0b1011
    ACE_cntr = 43 % 1
             = 0               // Anything modulo 1 is 0
]]></artwork>
          </figure>

          <t>Looking up the codepoint for NI = 0 in <xref
          target="accecn_Tab_ACE"/> gives:</t>

          <figure>
            <artwork><![CDATA[    ACE = 0b111.]]></artwork>
          </figure>
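
          <t>The three examples above can be reproduced with the sketch
          below. The mapping from counter values to ACE codepoints is an
          assumption inferred from the worked examples (CI from 0b000, E1
          from 0b100, NI at 0b111) rather than copied from <xref
          target="accecn_Tab_ACE"/>; the names are illustrative.</t>

          <figure>
            <artwork><![CDATA[
```python
# counter name -> (ACE_base, first ACE codepoint); assumed layout
ACE_LAYOUT = {"CI": (4, 0b000), "E1": (3, 0b100), "NI": (1, 0b111)}


def encode(counter, value):
    """Return (Top-ACE, ACE) for the named counter's local value."""
    base, first = ACE_LAYOUT[counter]
    top_ace = (value // base) % 16  # Int(cntr / ACE_base) % 16
    ace = first + value % base      # codepoint for cntr % ACE_base
    return top_ace, ace


ex1 = encode("CI", 73)
ex2 = encode("E1", 75)
ex3 = encode("NI", 43)
```
]]></artwork>
          </figure>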
        </section>

        <section anchor="accecn_Algo_Top-ACE_ACE_Decode"
                 title="Decoding Top-ACE and ACE by the Data Sender">
          <t>An AccECN Data Sender decodes the incoming combination of Top-ACE
          and ACE by looking up the ACE codepoint in <xref
          target="accecn_Tab_ACE"/> to get ACE_cntr and ACE_base, then:</t>

          <figure>
            <artwork><![CDATA[    cntr = Top-ACE * ACE_base + ACE_cntr.]]></artwork>
          </figure>

          <t>For example, if ACE = 0b101 and Top-ACE = 0b0111 = 7, the Data
          Sender looks up ACE = 0b101 in <xref target="accecn_Tab_ACE"/> to
          see that this is the E1 counter and that ACE_cntr = 1 base 3.
          Therefore,</t>

          <figure>
            <artwork><![CDATA[    E1 = cntr = 7 * 3 + 1
              = 22
]]></artwork>
          </figure>

          <t>The Data Sender is likely to be primarily interested in the
          increment in this counter relative to the previous ACK. In the case
          of E1, it will have to use modulo 48 arithmetic for the difference,
          because the encoding wraps at 48 (see <xref
          target="accecn_Tab_Counter_Space"/>). Specifically, if the Data
          Sender's local counter is snd_e1, then the difference is:</t>

          <figure>
            <artwork><![CDATA[    delta_e1 = (E1 + 48 - snd_e1 % 48) % 48]]></artwork>
          </figure>
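
          <t>A minimal sketch of the decoding and the modular difference
          follows. As before, the codepoint ranges are an assumption
          inferred from the examples in this appendix, the wrap point is
          taken to be ACE_base * 16, and the function names are
          hypothetical. It does not handle ACK reordering.</t>

          <figure>
            <artwork><![CDATA[
```python
def decode(top_ace, ace):
    """Return (counter name, ACE_base, decoded value)."""
    if ace <= 0b011:
        name, base, ace_cntr = "CI", 4, ace
    elif ace <= 0b110:
        name, base, ace_cntr = "E1", 3, ace - 0b100
    else:
        name, base, ace_cntr = "NI", 1, 0
    return name, base, top_ace * base + ace_cntr


def delta(decoded, local, base):
    """Increment since the sender's local count, modulo the wrap."""
    wrap = base * 16  # e.g. 48 for E1
    return (decoded + wrap - local % wrap) % wrap


name, base, e1 = decode(0b0111, 0b101)
step = delta(e1, 20, base)          # local counter was at 20
wrapped = delta(1, 47, base)        # encoding wrapped past 47
```
]]></artwork>
          </figure>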

          <t>{ToDo: Provide algorithms that decode correctly with ACK
          reordering}</t>
        </section>
      </section>

      <section anchor="accecn_Algo_ESQ"
               title="Example ECN Sequence (ESQ) Encoding Algorithms">
        <t>This appendix gives an example algorithm for the Data Receiver to
        encode the arriving sequence of IP-ECN codepoints in the ECN Sequence
        (ESQ) field of a delayed ACK, as required in <xref
        target="accecn_Sequence"/>.</t>

        <figure>
          <artwork><![CDATA[/* Algorithm to encode the arrival sequence of IP-ECN codepoints
 */
DEFAULT = ECT0      // Any ECN codepoint except Not-ECT
DACK_T_MAX = 500    // Max time to delay an ACK [ms]
RL_MAX = 7          // Max run-length that can fit in 3-bit field
DACK_SEG_MAX = 2    // Max full-sized delayed ACK segments:
MSS = 1500          // Example max segment size [B]
DACK_B_MAX = DACK_SEG_MAX * MSS     // Max deferred bytes

sp = mk1 = DEFAULT  // 2-bit ECN codepoints: space and mark
mk2                 // second mark (fed back in ACE, not ESQ)
rl1 = rl2 = 0       // 3-bit run-lengths
dack_b = 0          // deferred bytes

/* Strategy: in readiness for a packet arrival, hold the variables
 *  necessary to build the ECN sequence field (ESQ) of the next ACK.
 * If a packet arrives, and it can be added to the held sequence,
 *  do so and return.
 * If it can't be added to the held sequence, send the ACK
 *  with the most recent packet as the second mark.
 * If the delayed ack timer expires, unwind the last packet in the 
 * held sequence to use as the second mark, and send the ACK
 */

foreach pkt {
    tmp = read(pkt.ip.ecn)      // Store incoming ECN field
    dack_b += read(pkt.ip.size) // Add to deferred bytes

    if (dack_b >= DACK_B_MAX) { // Test deferred bytes threshold
        mk2 = tmp               // Assign incoming ECN to mk2
        send_ack(rl1,rl2,sp,mk1,mk2)     // Encode ESQ and send ACK
    } elif ((rl1 + rl2) <= 0) { // Is the held sequence empty?
        sp = tmp                // Initialise with a space in run2
        rl2++
        init_timer(dack_expire, DACK_T_MAX) // Arm delayed ACK timer
    } elif (tmp == sp) {        // Is the incoming ECN another space?
        if (rl2 < RL_MAX) {     // Is there room in run2?
            rl2++               // Extend run2
        } elif (rl1 <= 0) {     // Otherwise, is run1 empty?
            mk1 = sp            // Shift run2 to run1, making mk1=sp
            rl1 = rl2
            rl2 = 1
        }
    /* If got to here, incoming ECN is assigned as a mark */
    } elif (rl1 <= 0) {     // If there's room in run1, switch to it
        mk1 = tmp
        rl1 = rl2
        rl2 = 0
    } elif ( (tmp == mk1)   // Is incoming ECN a mark already seen
          && (rl1 == 2)     //  with only one space before it?
          && (rl2 == 0) ) {
        mk1 = sp            // If so, swap marks with spaces
        sp = tmp
        rl1 = 1
        rl2 = 2
    } else {                // Cannot extend sequence
        mk2 = tmp           // Assign the incoming ECN to mk2
        send_ack(rl1,rl2,sp,mk1,mk2)    // Encode ESQ and send ACK
    }
}

/* dack_expire()
 * Routine called when the delayed ACK timer expires.
 * There is no incoming packet to fill mk2, 
 *  so the last value from the held sequence has to be used instead
 *  (there will always be a held sequence because the timer is only
 *  armed once the sequence is non-empty).
 */
dack_expire() {
    if (rl2 > 0) {  // run2 contains a value
        rl2--
        mk2 = sp    // copy it into mk2
    } else {        // run2 is empty, therefore run1 is not
        mk2 = mk1   // copy mk1 into mk2
        rl2 = rl1 - 1 // shift run1 into run2 without mk1
        rl1 = 0
    }               // Last value extraction is complete
    send_ack(rl1,rl2,sp,mk1,mk2)    // Encode ESQ and send ACK
}

/* send_ack()
 * Algorithm to encode the arrival sequence of IP-ECN codepoints
 *  into the ECN sequence (ESQ) field of a TCP ACK, then send it.
 */
send_ack(rl1,rl2,sp,mk1,mk2) {
    del_timer(dack_expire)  // Remove any pending delayed ACK timer
    /* Marshall the ECN Sequence field (esq) */
    pkt.tcp.esq = lsb(2,sp) & lsb(2,mk1) & lsb(3,rl1) & lsb(3,rl2)
    /* lsb(n,x): pseudocode for the lowest n significant bits of x */
    /* x & y   : pseudocode for concatenate x and y */
    /*
     * Insert code to send ACK here, with mk2 in pkt.tcp.ace
     */
    /* Reset all variables ready for next packet arrival */
    sp = mk1 = DEFAULT
    rl1 = rl2 = 0
    dack_b = 0              // Restart the count of deferred bytes
}
]]></artwork>
        </figure>
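
        <t>The marshalling step in send_ack() concatenates two 2-bit
        codepoints and two 3-bit run-lengths into the 10-bit ESQ field. A
        possible bit-level rendering is sketched below. Placing sp in the
        most significant bits is an assumption about the order of
        concatenation, and the function name is hypothetical; the 2-bit
        codepoint values are those of the IP-ECN field in <xref
        target="RFC3168"/>.</t>

        <figure>
          <artwork><![CDATA[
```python
# IP-ECN codepoints (RFC 3168)
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def pack_esq(sp, mk1, rl1, rl2):
    """Concatenate sp(2) | mk1(2) | rl1(3) | rl2(3) into 10 bits."""
    return (((sp & 0b11) << 8) | ((mk1 & 0b11) << 6)
            | ((rl1 & 0b111) << 3) | (rl2 & 0b111))


# Spaces are ECT(0), the first mark is CE, run-lengths are 2 and 3
esq = pack_esq(ECT0, CE, 2, 3)
```
]]></artwork>
        </figure>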
      </section>
    </section>

    <section anchor="accecn_Alt_Designs"
             title="Alternative Design Choices (To Be Removed Before Publication)">
      <t>This appendix is informative, not normative. It records alternative
      designs that the authors chose not to include in the normative
      specification, but which the IETF might wish to consider for
      inclusion.</t>

      <section anchor="accecn_SupAccECN_on_SYN_ACK"
               title="Supplementary AccECN Field on the SYN/ACK">
        <t>{ToDo: The tcpm working group is recommended to consider including
        this in an AccECN RFC from the start. The AccECN protocol defined in
        the body of this specification currently gives no ECN feedback on the
        SYN/ACK on the assumption that the SYN is not ECN-capable. If it is
        required for the protocol to be future-proofed against the possibility
        that SYNs might one day be ECN-capable, the following definition of
        the SupAccECN field for the SYN/ACK would need to be added to <xref
        target="accecn_SupAccECN_Placement"/> and <xref
        target="accecn_SupAccECN_Structure"/>. The text below is written as if
        it is normative, but it is only informative while it is demoted to
        this appendix.}</t>

        <section anchor="accecn_SupAccECN_Placement_SYN_ACK"
                 title="Placement of the Supplementary AccECN Field in a SYN/ACK">
          <t>To include the SupAccECN field on a SYN/ACK, the Data Receiver
          MUST use the SupAccECN TCP Option with TCP option Kind 0x&lt;KK&gt;
          (TBA) and set the Length field to 3 [octets], as illustrated in
          <xref target="accecn_Fig_SupAccECN_TCP_Option_SYN_ACK"/>.</t>

          <figure align="center"
                  anchor="accecn_Fig_SupAccECN_TCP_Option_SYN_ACK"
                  title="Placement of the SupAccECN field within the SupAccECN TCP Option on a SYN/ACK">
            <artwork><![CDATA[ 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Kind = 0xKK  |  Length = 3   |0 0 0 0|  Sup- |
|               |               |       | AccECN|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
          </figure>
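          <t>The option layout above could be serialised roughly as follows.
          This is a minimal sketch: the option Kind has not been assigned, so
          PLACEHOLDER_KIND is a hypothetical stand-in used only so that the
          example runs, and the function name is likewise an assumption.</t>

          <figure>
            <artwork><![CDATA[
```python
PLACEHOLDER_KIND = 0xFD  # hypothetical; the real Kind is TBA


def build_synack_supaccecn_option(kind, sup_accecn):
    """Return the 3-octet option: Kind, Length = 3, 0000 + SupAccECN."""
    if not 0 <= kind <= 0xFF:
        raise ValueError("Kind must fit in one octet")
    if not 0 <= sup_accecn <= 0xF:
        raise ValueError("SupAccECN on a SYN/ACK is 4 bits")
    # The upper 4 bits of the third octet are zero; SupAccECN is the
    # lower 4 bits.
    return bytes([kind, 3, sup_accecn & 0x0F])


option = build_synack_supaccecn_option(PLACEHOLDER_KIND, 0b1000)
```
]]></artwork>
          </figure>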

          <t>If the Data Sender has entered AccECN mode but there is no
          SupAccECN TCP Option on a SYN/ACK, the Incoming AccECN Protocol
          Handler MUST take the SupAccECN field to be right-justified within
          the Non-Urgent field (i.e. the least significant bit of SupAccECN is
          aligned with the least significant bit of the Non-Urgent field) as
          shown in <xref target="accecn_Fig_SupAccECN_Non-Urgent_SYN_ACK"/>.
          The remaining most significant bits are currently unused (CU).</t>

          <?rfc needLines="7" ?>

          <figure align="center"
                  anchor="accecn_Fig_SupAccECN_Non-Urgent_SYN_ACK"
                  title="Placement of the SupAccECN field within the Non-Urgent field on a SYN/ACK">
            <artwork><![CDATA[      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    | X   X   X   X   X   X   X   X   X   X   X   X |   SupAccECN   |
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

]]></artwork>
          </figure>
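          <t>A reader of the SYN/ACK might select between the two placements
          along the following lines. This is a sketch under stated
          assumptions: the caller is assumed to have already parsed the TCP
          header, passing the third octet of the SupAccECN TCP Option (or
          None if the option is absent) and the 16-bit Non-Urgent field. The
          function name and argument names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def read_synack_supaccecn(option_octet, non_urgent):
    """Return the 4-bit SupAccECN field of a SYN/ACK.

    option_octet: third octet of the SupAccECN TCP Option, or None
                  when the option is absent.
    non_urgent:   16-bit Non-Urgent field from the TCP header.
    """
    if option_octet is not None:
        # Option present: the Non-Urgent field is not consulted.
        return option_octet & 0x0F
    # Option absent: SupAccECN is right-justified in the 16-bit field.
    return non_urgent & 0x000F


from_option = read_synack_supaccecn(0x08, 0xFFFF)
from_field = read_synack_supaccecn(None, 0xABC9)
```
]]></artwork>
          </figure>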
        </section>

        <section anchor="accecn_SupAccECN_Structure_SYN_ACK"
                 title="Structure of the Supplementary AccECN Field in a SYN/ACK">
          <t>The size of the SupAccECN field on a SYN/ACK (i.e. a segment with
          SYN = 1 and ACK = 1) is always 4 bits. <xref
          target="accecn_Fig_SupAccECN_SYN"/> defines the sub-fields of the
          SupAccECN field on a SYN/ACK.</t>

          <?rfc needLines="11" ?>

          <figure align="center" anchor="accecn_Fig_SupAccECN_SYN"
                  title="The Supplementary AccECN Field on a SYN/ACK Segment">
            <artwork><![CDATA[             0   1   2   3
           +---+---+---+---+
           | D-ECN | E-ECN |
           +---+---+---+---+
]]></artwork>
          </figure>

          <t>The sub-fields of SupAccECN on a SYN/ACK segment have the
          following meanings:<list style="hanging">
              <t hangText="E-ECN:">Echo ECN, for the responding host (B) to
              echo the IP-ECN field that arrives in the SYN. RFC 3168 requires
              that the ECN field on a SYN must always be Not-ECT (0b00).
              Therefore initially the E-ECN field is likely to always be 0b00.
              However, the AccECN wire protocol allows for the possibility
              that ECN-capable SYNs might be allowed in future. The responding
              host (B) MUST echo a copy of the IP-ECN field of the SYN in the
              E-ECN field of the SYN/ACK. <vspace blankLines="1"/>If the SYN
              were to arrive carrying a congestion indication, the responding
              host (B) MUST also increment the relevant counter (r.ci, r.e1 or
              r.e1) as specified in <xref target="accecn_ACE"/>. Then the
              counters on subsequent feedback will remain consistent even
              though the SYN/ACK does not have an ACE field to feed back
              congestion counters (because it is still using the same bits as
              flags for capability negotiation). The E-ECN field has been
              defined within a SYN/ACK because the start of a flow is when it
              is most critical for congestion feedback to be timely. Without
              the E-ECN field, feedback of any congestion marking on a SYN
              would get deferred for at least a round trip.</t>

              <t hangText="D-ECN:">Reserved for a Duplicate ECN field, meaning
              a duplicate of the ECN field in the IP header of the same
              packet. This field is not defined in the present specification,
              but it is reserved for possible use by a companion specification
              about ECN-fall-back (see <xref target="accecn_ECN_Fall-Back"/>).
              <vspace blankLines="1"/>Forward Compatibility: In the meantime,
              the responding host (B) MUST set D-ECN to ECT(0) (0b10), the
              originating host (A) MUST ignore this field and middleboxes MUST
              forward this field unaltered whether or not it is 0b10.</t>
            </list></t>
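          <t>The two sub-fields could be composed and read as sketched below.
          The sketch assumes the usual convention that bit 0 in the figure is
          the most significant bit, so D-ECN occupies the upper two bits and
          E-ECN the lower two. The function names are hypothetical, and the
          counter update for a congestion-marked SYN is omitted.</t>

          <figure>
            <artwork><![CDATA[
```python
# IP-ECN codepoints as defined in RFC 3168
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def synack_supaccecn(syn_ip_ecn):
    """Responding host (B): build the 4-bit field for the SYN/ACK."""
    d_ecn = ECT0                 # forward-compatibility value
    e_ecn = syn_ip_ecn & 0b11    # echo of the IP-ECN field of the SYN
    return (d_ecn << 2) | e_ecn


def echoed_ecn(sup_accecn):
    """Originating host (A): ignore D-ECN and return E-ECN."""
    return sup_accecn & 0b11


typical = synack_supaccecn(NOT_ECT)   # SYN sent as Not-ECT today
marked = synack_supaccecn(CE)         # hypothetical ECN-capable SYN
```
]]></artwork>
          </figure>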
        </section>
      </section>

      <section anchor="accecn_Remove_Not-ECT_from_ESQ"
               title="Remove Not-ECT from ECN Sequence (ESQ) Encoding">
        <t>This alternative encoding would allow the ESQ field to be 1 bit
        shorter (9 bits instead of 10). The trade-off is that the receiver has
        to send an ACK immediately whenever a Not-ECT packet arrives. This is
        because this alternative encoding only caters for one Not-ECT
        codepoint in the ACE field, and none in the ESQ field.</t>

        <t>Once ECN has been negotiated for a connection, the sender ought to
        rarely send data segments with the Not-ECT codepoint. The only data
        segments on which RFC 3168 requires the sender to set Not-ECT are
        retransmissions and window probes. Pure ACKs also have to be sent as
        Not-ECT, but they are not data segments, so they are not included in
        the feedback sequence.</t>

        <t>If the encoding of the ESQ field has to allow for Not-ECT as well
        as the three ECN-capable codepoints, it needs space to encode 4
        possible spaces and 4 possible marks. This requires 4 bits for 4x4=16
        combinations (two 2-bit fields for SP and MK1). If on the other hand
        Not-ECT is excluded, space for only 3x3=9 combinations is required.
        This many combinations can only be fitted into 3 bits if they can be
        reduced to 8 codepoints by encoding two combinations as one symbol.
        Two combinations can be encoded as one symbol using the same encoding
        for sp=mk1=ECT(1) and sp=mk1=CE. This is because either an ECT(1) or
        CE code in the ACE field can be used to distinguish which is which.
        However, whenever a run of ECT(1) or of CE ended, the encoding
        algorithm would have to send two ACKs at once.</t>
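        <t>The counting argument above can be checked mechanically. The
        sketch below enumerates the (sp, mk1) combinations and assumes that
        the two run-length sub-fields stay at 3 bits each, as marshalled by
        send_ack(); the helper names are illustrative only.</t>

        <figure>
          <artwork><![CDATA[
```python
from itertools import product

ALL_CODEPOINTS = ["Not-ECT", "ECT(1)", "ECT(0)", "CE"]
ECN_CAPABLE = ["ECT(1)", "ECT(0)", "CE"]
RUN_LENGTH_BITS = 3 + 3  # rl1 and rl2 (assumed unchanged)


def bits_for(n):
    """Smallest number of bits that can distinguish n values."""
    return (n - 1).bit_length()


def symbol(sp, mk1):
    """Map an (sp, mk1) pair to its symbol in the alternative encoding."""
    if sp == mk1 and sp in ("ECT(1)", "CE"):
        # One shared symbol; the ACE field tells the two cases apart.
        return "ECT(1)/ECT(1) or CE/CE"
    return sp + "/" + mk1


full = list(product(ALL_CODEPOINTS, repeat=2))      # 4 x 4
reduced = list(product(ECN_CAPABLE, repeat=2))      # 3 x 3
symbols = {symbol(sp, mk1) for sp, mk1 in reduced}  # after merging

current_esq_bits = bits_for(len(full)) + RUN_LENGTH_BITS
alternative_esq_bits = bits_for(len(symbols)) + RUN_LENGTH_BITS
```
]]></artwork>
        </figure>

        <t>Without the merged symbol, the 9 combinations would still need 4
        bits, so the 1-bit saving depends entirely on sharing one symbol
        between the two cases.</t>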

        <t>Arguments against this alternative design choice:<list
            style="symbols">
            <t>Although retransmissions would be expected to be rare in a
            fully ECN-enabled network, there might be frequent losses and
            retransmissions during early deployment of ECN, when many
            bottleneck links might not be ECN-enabled. Then this alternative
            encoding would reduce the opportunities when a receiver could use
            delayed ACKs.</t>

            <t>Even if the sender sets Not-ECT on few data segments,
            incorrectly configured or buggy network equipment exists that
            clears the IP-ECN field to Not-ECT. With this alternative
            encoding, connections via such equipment would never be able to
            use delayed ACKs. The consequential extra ACK load might be
            considered an incentive for these networks to fix their bugs.
            However, the endpoints would also suffer the extra ACK load.</t>

            <t>To save 1 bit in the encoding it seems necessary for the
            algorithm to sometimes have to send two ACKs at once.</t>
          </list></t>
      </section>

      <section anchor="accecn_ECN_Fall-Back" title="ECN Fall-Back">
        <t>{ToDo: consider whether the present specification could be enhanced
        with ECN fall-back on the SYN/ACK to give earlier fall-back than in
        <xref target="I-D.kuehlewind-tcpm-ecn-fallback"/>. Space for a
        duplicate of the IP-ECN field on the SYN/ACK has been reserved in the
        SupAccECN field (<xref target="accecn_SupAccECN_on_SYN_ACK"/>), but
        the behaviour is still TBA. A duplicate of the IP-ECN field has not
        been provided on the SYN, because it would be unremarkable if ECN on
        the SYN was zeroed by security devices, given RFC 3168 prohibited ECT
        on SYN because it enables DoS attacks. Therefore the IP-ECN field has
        to be tested on the last ACK of the 3WHS, IMO}</t>
      </section>

      <section anchor="accecn_Delayed_ACK_Control"
               title="Remote Delayed ACK Control Proposal">
        <t>{ToDo: The tcpm working group is recommended to consider including
        this in an AccECN RFC from the start, because it would be less useful
        if it was unpredictable whether it had been implemented. The text
        below is written as if it is normative, but it is only informative
        while it is demoted to this appendix.} {ToDo: Add a use-case.}</t>

        <t>Traditionally, each decision on whether to delay an ACK is taken
        independently by the Data Receiver. This makes it hard to deploy
        behaviours where the Data Sender would like the Data Receiver not to
        delay feedback, perhaps so that it can measure the effect of subtle
        changes in the timing between packets to more rapidly get up to speed
        during slow-start without overshoot.</t>

        <t>A single bit for a Delayed ACK Control (DAC) flag is defined within
        the SupAccECN field of segments with SYN=0. Space for this is reserved
        in <xref target="accecn_SupAccECN_Structure"/> and illustrated in
        <xref target="accecn_Fig_SupAccECN"/>. For either half-connection, the
        Data Sender can use the DAC flag to request that the remote Data
        Receiver turns delayed ACKing on or off:<list style="symbols">
            <t>DAC = 0 means the sender requests that the receiver turns
            Delayed ACKing on, using the receiver's choice of delayed ACK
            factor.</t>

            <t>DAC = 1 means the sender requests that the receiver turns
            Delayed ACKing off.</t>
          </list></t>

        <t>For resilience, the Data Sender MUST repeat its currently chosen
        value of DAC continuously on every packet. The Data Receiver SHOULD
        start to honour the request on receipt. Therefore, as soon as a
        segment arrives with DAC=1, the Data Receiver SHOULD immediately send any
        deferred ACKs and no longer withhold ACKs while it continues to
        receive segments with DAC=1. The DAC flag is meaningful on every
        packet with SYN=0. The DAC flag is not needed and therefore not
        present in the SupAccECN field when SYN=1 (<xref
        target="accecn_Fig_SupAccECN_SYN"/>), because TCP never withholds the
        SYN/ACK or the final ACK of the 3-way handshake.</t>

        <t>A receiver MAY ignore a request from a sender to alter its Delayed
        ACKing behaviour, e.g. a challenged receiver that cannot send ACKs
        fast enough need not turn off Delayed ACKs, or a receiver that has not
        implemented delayed ACKs need not turn them on.</t>
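        <t>The receiver-side behaviour described above might look roughly
        like the following sketch. It is not a definitive implementation: the
        class name, the honour_dac switch and the delayed ACK factor of 2 are
        assumptions, and the delayed ACK timer is left out for brevity.</t>

        <figure>
          <artwork><![CDATA[
```python
class DelayedAckReceiver:
    """Toy Data Receiver that counts the ACKs it would send."""

    def __init__(self, honour_dac=True, ack_every=2):
        self.honour_dac = honour_dac  # a receiver MAY ignore requests
        self.ack_every = ack_every    # hypothetical delayed ACK factor
        self.deferred = 0             # segments not yet acknowledged
        self.acks_sent = 0

    def on_segment(self, dac, syn=False):
        """Process one arriving data segment carrying the DAC flag."""
        self.deferred += 1
        if not syn and dac == 1 and self.honour_dac:
            # DAC = 1: release deferred ACKs and do not withhold.
            self._send_ack()
        elif self.deferred >= self.ack_every:
            # DAC = 0, or request ignored: normal delayed ACKing.
            self._send_ack()

    def _send_ack(self):
        """Send one cumulative ACK covering all deferred segments."""
        self.acks_sent += 1
        self.deferred = 0


rx = DelayedAckReceiver()
rx.on_segment(dac=0)   # withheld
rx.on_segment(dac=1)   # flushes the deferred ACK at once
rx.on_segment(dac=1)   # acknowledged immediately
```
]]></artwork>
        </figure>

        <t>Constructing the receiver with honour_dac set to False models a
        receiver that exercises the MAY above and keeps its own delayed ACK
        behaviour.</t>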
      </section>
    </section>

    <section anchor="accecn_Open_Issues"
             title="Open Protocol Design Issues (To Be Removed Before Publication)">
      <t><list style="numbers">
          <t>A possibility to simplify the protocol would be to remove
          ordering feedback entirely, but require the receiver to disable
          delayed ACKs during slow-start (including within a connection after
          a time-out or idle period) or to provide the DAC flag to allow the
          sender to ask the receiver to disable delayed ACKs when it needs
          more accuracy. However, not delaying ACKs may impact server
          performance. Also, a new way to identify middlebox interference in
          the remaining SupAccECN field (Top-ACE &amp; DAC) would have to be
          found.</t>

          <t>The protocol currently gives no ECN feedback on the SYN/ACK on
          the assumption that the SYN is not ECN-capable. If it is required
          for the protocol to be future-proofed against the possibility that
          SYNs might one day be ECN-capable, the proposal in <xref
          target="accecn_SupAccECN_on_SYN_ACK"/> could be adopted. This also
          provides earlier ECN-fall-back than would otherwise be possible.</t>

          <t><xref target="accecn_SupAccECN_Placement"/> says an AccECN
          implementation has to be prepared to read the SupAccECN field from
          either a TCP option or the Non-Urgent field. If the definition of
          the SupAccECN field changes between this experimental spec and the
          standards track spec, the structure of the Non-Urgent field will
          have to include a version number somehow.</t>

          <t>The Non-Urgent field might be used for something else in future
          rather than SupAccECN, despite the attempt to reserve it in this
          spec. <xref target="accecn_SupAccECN_Placement"/> says "If a
          SupAccECN TCP option is present, the Non-Urgent field MUST be
          ignored.", which seems to correctly ensure that experimental
          implementations will not read the altered Non-Urgent field in this
          case. However, they will incorrectly read the Non-Urgent field if a
          future AccECN protocol uses a different TCP option.</t>

          <t>There is possibly a concern that, if the supplementary field is
          unavailable, the counter selection (<xref
          target="accecn_ACE_Counter_Selection"/>) always uses the last
          codepoint in a delayed ACK, which may starve visibility of other
          counters.</t>

          <t>Counter Selection Algo #Alt2 <xref
          target="accecn_Algo_Counter_Select2"/> needs to be altered to
          prevent the E1 counter being continually repeated when no ECT(1)
          codepoints are arriving at the Data Receiver.</t>

          <t>A production version of Counter Selection Algo #Alt1 <xref
          target="accecn_Algo_Counter_Select1"/> needs to be developed that
          handles wrapping of the variables, without losing
          proportionality.</t>

          <t>Example algorithms need to be developed that decode the
          Top-ACE:ACE counters correctly when ACKs are reordered.</t>

          <t>The definition of the D-ECN field <xref
          target="accecn_SupAccECN_Structure"/> and ECN fall-back more
          generally <xref target="accecn_ECN_Fall-Back"/> will need to be
          resolved before publication.</t>
        </list></t>
    </section>

    <section anchor="accecn_Doc_Changes"
             title="Changes in This Version (To Be Removed Before Publication)">
      <t>The difference between any pair of versions can be displayed at
      &lt;http://datatracker.ietf.org/doc/draft-kuehlewind-tcpm-accurate-ecn/history/&gt;<list
          style="hanging">
          <t hangText="From 02 to 03:"><list style="symbols">
              <t>Extensively rewritten. No summary of changes has been
              prepared.</t>
            </list></t>
        </list></t>
    </section>
  </back>
</rfc>
