<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="bcp" docName="draft-fairhurst-tsvwg-buffers-00"
     ipr="trust200902" obsoletes="" updates="RFC 3819 (if published)">
  <!-- Updates SDP -->

  <?rfc toc="yes"?>

  <!-- generate a table of contents -->

  <?rfc symrefs="yes"?>

  <!-- use anchors instead of numbers for references -->

  <?rfc sortrefs="yes" ?>

  <!-- alphabetize the references -->

  <?rfc compact="yes" ?>

  <!-- conserve vertical whitespace -->

  <?rfc subcompact="no" ?>

  <!-- but keep a blank line between list items -->

  <front>
    <title abbrev="Advice on network buffering">Advice on network
    buffering</title>

    <author fullname="Godred Fairhurst" initials="G." surname="Fairhurst">
      <organization>University of Aberdeen</organization>

      <address>
        <postal>
          <street>School of Engineering</street>

          <street>Fraser Noble Building</street>

          <city>Aberdeen</city>

          <region>Scotland</region>

          <code>AB24 3UE</code>

          <country>UK</country>
        </postal>

        <email>gorry@erg.abdn.ac.uk</email>

        <uri>http://www.erg.abdn.ac.uk/~gorry</uri>
      </address>
    </author>

    <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
      <organization>BT</organization>

      <address>
        <postal>
          <street>B54/77, Adastral Park</street>

          <city>Martlesham Heath</city>

          <region>Ipswich</region>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <phone>+44 1473 645196</phone>

        <facsimile></facsimile>

        <email>bob.briscoe@bt.com</email>

        <uri>http://bobbriscoe.net/</uri>
      </address>
    </author>

    <date day="10" month="March" year="2013" />

    <area>Transport</area>

    <workgroup>TSVWG Working Group</workgroup>

    <keyword>buffers</keyword>

    <keyword>router</keyword>

    <abstract>
      <t>This document proposes an update to the advice given in RFC 3819.
      Subsequent research has altered understanding of buffer sizing and queue
      management. Therefore this document significantly revises the previous
      recommendations on buffering. The advice applies to all packet buffers,
      whether in network equipment, end hosts or middleboxes such as firewalls
      or NATs. The advice also applies to packet buffers at any layer,
      whether subnet, IP, transport or application.</t>
    </abstract>
  </front>

  <middle>
    <!-- text starts here -->

    <section title="Introduction" toc="include">
      <t><xref target="RFC3819"></xref> provides guidance on the design of
      subnetworks and networking equipment. This document updates this
      guidance for the topic of Internet buffer configuration and control. The
      guidance is aimed at both equipment designers and network operators.</t>

      <t>All networking devices use buffers to temporarily store packets that
      are waiting for transmission on an outgoing link during traffic bursts
      or at times when the capacity of the ingress/egress changes.</t>

      <t>The congestion control algorithms in TCP (and derivatives of TCP) are
      designed to try to fully utilise the link that has the least available
      capacity on the path across the network. This is called the bottleneck
      link. Network link capacities are typically arranged so that it will be
      rare for a bottleneck to arise in the network core. However, depending
      on prevailing patterns of traffic, any link might become the bottleneck
      (within the host, at an edge router, at a core router, at a switch in
      the subnet between routers or at some middlebox such as a firewall or a
      network address translator). Modern TCP stacks are capable of filling a
      link of any capacity.</t>

      <t>A buffer that simply discards incoming packets when it is full is
      called a tail-drop buffer. A long-running TCP flow will fill a tail-drop
      buffer and keep it full, so that there is no longer any space to absorb
      bursts. This is called a standing queue. Packets arriving at the tail of
      a standing queue still work their way through the buffer until they
      emerge onto the link, but this introduces unnecessary delay to every
      packet, including those from other sessions sharing the link. This can
      intermittently add intolerable delay to a real-time interactive media
      session (e.g. voice or video). Also, most Web pages involve dozens of
      short back-and-forth exchanges, so adding even a small amount of queuing
      delay to each round can accumulate considerable delay in the completion
      of the whole task.</t>

      <t>The recommended way to avoid these problems is to use an active queue
      management (AQM) algorithm in every potential bottleneck buffer (subnet,
      router, middlebox or host), and to enable explicit congestion
      notification (ECN). However, if AQM has not been implemented in existing
      equipment, the next best option is to at least size the buffer so that
      it is no larger than needed to absorb bursts.</t>

      <t>This document gives advice on using and configuring AQM algorithms
      and ECN, and advice on buffer sizing in the absence of such
      algorithms.</t>

      <t>The correct buffer size depends on the link rate. A common problem
      arises when equipment auto-adjusts its rate, often over a wide range,
      leaving the buffer badly mis-sized. Advice is also given on how to
      relate buffer auto-sizing algorithms to rate-adjusting algorithms, and
      the best static buffer size to configure if auto-sizing has not been
      implemented.</t>

      <t>It is difficult to test whether a network might exhibit these
      problems. They appear only intermittently, because they depend on four
      pathologies coinciding: i) a particular buffer has become the
      bottleneck for a long-running TCP flow, which depends on relative
      traffic levels in other links; ii) the TCP flow has run for long enough
      to fill this buffer; iii) the buffer lacks AQM or the AQM is badly
      configured; and iv) the buffer has been badly over-sized. When all four
      conditions coincide, the delays can be bad enough to lead to support
      desk calls.</t>

      <t>This document updates section 13 of RFC 3819, which gave guidance to
      subnet designers on the use and sizing of buffers. <xref
      target="buf_3819"></xref> reviews that guidance, which now requires
      considerable revision in the light of subsequent research. Also, whereas
      RFC 3819 addressed subnet designers, the advice in this document is
      relevant to a wider audience, because it concerns buffers wherever they
      are, including in end-systems and middleboxes, not just in subnet
      technology.</t>
    </section>

    <section title="Terminology">
      <t>The document assumes familiarity with the terminology of RFC 3819
      <xref target="RFC3819"></xref>.</t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"></xref>.</t>

      <t>The term active queue management (AQM) has been applied to
      technologies that work only at the packet level as well as technologies
      that identify and police flows with above average rates or that enforce
      flow-level or user-level policies such as fair queuing. For this
      document, we will use the term 'AQM' for technologies or parts of
      technologies that treat packets indiscriminately, and the term
      'policing' for the additional technologies that attempt to enforce some
      level of behaviour or isolation at the flow or user level of
      granularity.</t>
    </section>

    <section title="Updated Recommendations on Buffering">
      <t>This section updates the rules for network buffers in section 13 of
      RFC 3819.</t>

      <section title="Recommendations Applicable to Any Buffer">
        <t>XX Work in Progress, to be included in next revision XX</t>

        <t>AQM is strongly recommended for any buffer. Auto-tuned
        configuration is recommended.</t>

        <t>Explicit Congestion Notification (ECN) <xref
        target="RFC3168"></xref> is also strongly recommended for any buffer
        (this avoids delays due to timeouts after loss). It is safe to enable
        ECN for routers and servers. If concerns arise over the use of ECN,
        they can be fully addressed by turning off ECN support at the
        endpoint. Conversely, if routers and servers do not enable ECN where
        it is deemed safe, endpoints will have no way to turn it on.</t>

        <t>Buffer size: if AQM is implemented, there is no harm in having a
        large buffer to absorb bursts. However, if there is no AQM, it is
        important to keep the buffer small.<list style="symbols">
            <t>Too little buffering can result in poor utilisation of the
            egress link, since many traffic flows are not smooth-paced and
            bursts of traffic may fail to be buffered.</t>

            <t>Large buffers can help ensure full utilisation of the egress
            link, but excessive buffering results in slow response to
            congestion and in unnecessary delay experienced by any flow that
            shares the egress link. Such events are not uncommon, since a
            single long-lived connection using a modern TCP stack can fill
            any size of network buffer.</t>
          </list></t>

        <t>Auto-sizing is recommended if the line rate is adjustable or
        auto-adjusts (e.g. by setting the buffer size in units of time, not
        bytes). If auto-sizing has not been implemented, a large buffer is
        not best, although too small a buffer reduces link utilisation. If it
        is necessary to find a compromise size for adjustable line rates,
        designers and operators should consider sacrificing some utilisation
        at lower rates to keep the buffer delay reasonable.</t>
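
        <t>As an illustrative sketch of sizing a buffer in units of time
        rather than bytes, the Python fragment below converts a target
        queuing delay into a byte limit for a given line rate. The function
        name, the 20 ms target and the example rates are assumptions chosen
        for illustration only; they are not values recommended by this
        document.</t>

        <figure>
          <artwork><![CDATA[
```python
def buffer_limit_bytes(rate_bps, target_delay_ms):
    """Largest byte count that drains within target_delay_ms at rate_bps.

    bits/s * ms / 1000 gives bits; dividing by 8 gives bytes.
    Integer arithmetic avoids floating-point rounding.
    """
    return rate_bps * target_delay_ms // 8000


# Assumption: a 20 ms delay target (hypothetical, for illustration).
TARGET_DELAY_MS = 20

# Hypothetical line rates; the limit would be recomputed whenever
# the line rate is adjusted.
limits = {
    rate_bps: buffer_limit_bytes(rate_bps, TARGET_DELAY_MS)
    for rate_bps in (10_000_000, 100_000_000, 1_000_000_000)
}

for rate_bps, limit in limits.items():
    print(f"{rate_bps} bit/s -> {limit} bytes")
```
]]></artwork>
        </figure>

        <t>Holding the delay target constant means that the byte limit
        scales with the line rate, so the worst-case queuing delay stays the
        same when the rate changes.</t>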
      </section>

      <section title="Buffering recommendations for end hosts">
        <t>XX Work in Progress, to be included in next revision XX</t>

        <t>Large buffers are not best. AQM and auto-tuning/auto-sizing are as
        applicable in end hosts as in network equipment.</t>

        <t>ECN may even be appropriate (e.g. on a subsystem such as a NIC),
        but within a host it should be possible to use back-pressure messages
        instead.</t>

        <t>Buffer sizing recommendations specific to end-systems.</t>
      </section>

      <section title="Buffering recommendations for edge routers and switches">
        <t>XX Work in Progress, to be included in next revision XX</t>

        <t>Large is not best.</t>

        <t>AQM and ECN are strongly recommended.</t>

        <t>Buffer sizing recommendations specific to edge routers, switches
        and middleboxes.</t>
      </section>

      <section title="Buffering recommendations for core routers and switches">
        <t>XX Work in Progress, to be included in next revision XX</t>

        <t>Large is not best.</t>

        <t>Buffer sizing recommendations specific to core routers and
        switches.</t>
      </section>

      <section title="Recommendations on Flow Isolation">
        <t>XX Work in Progress, to be included in next revision XX</t>

        <t>This is still a subject of debate and research. It may be possible
        to recommend something here, but more likely this section will
        comment on the debate.</t>
      </section>
    </section>

    <section title="Buffer Management Methods" toc="include">
      <t>This section provides informative documentation of current
      practice.</t>

      <section title="Examples of subnetwork buffering">
        <t>This section provides informative examples of buffer configuration
        and their impact on network traffic {TBA: to consider whether to
        bless, deprecate or merely state each of these practices}.<list
            style="symbols">
            <t>An Ethernet subnetwork may operate over a range of speeds from
            a shared 10 Mbps of capacity to over 40 Gbps. The buffering
            required depends on the link speed, yet many device drivers
            and operating systems do not adjust their buffering to the
            available capacity. The first hop link from a host often has a
            higher speed than the subsequent links along a network path.</t>

            <t>Subnetwork flow-control can be triggered when a subnetwork link
            suffers congestion. An example is the use of Ethernet Pause frames
            (e.g. by consumer Ethernet switches) to slow a sender emitting
            traffic towards a congested switch port. These methods can
            increase the buffering experienced by the end-to-end flow.</t>

            <t>DOCSIS 3.1 supports transmission up to 300 Mbps. A current
            modem can be plugged into a current network. If a customer's
            service supports only 10 Mbps, the network equipment may be 30
            times over-buffered (assuming buffers are dimensioned based on the
            maximum bit rate). The buffer control amendment may be implemented
            in the modem and in its provisioning system to address this type
            of issue. Similar issues apply to other link technologies, where
            the offered service is often less than the maximum supported
            rate.</t>

            <t>On wireless links, bandwidth (and hence network capacity) is
            often highly variable, unless the link is a fixed point-to-point
            link. Even fixed links may use adaptive methods, and propagation
            conditions can cause the capacity to vary.</t>
          </list></t>
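
        <t>The over-buffering factor in the cable-modem example above can be
        checked with a short calculation. The Python sketch below takes the
        300 Mbps and 10 Mbps rates from that example; the 50 ms figure used
        to dimension the buffer at the maximum rate is purely an assumption
        for illustration, as are the function and variable names.</t>

        <figure>
          <artwork><![CDATA[
```python
def drain_time_ms(buffer_bytes, rate_bps):
    """Time in milliseconds for a full buffer to drain at rate_bps."""
    return buffer_bytes * 8 * 1000 / rate_bps


MAX_RATE_BPS = 300_000_000     # maximum rate from the example
SERVICE_RATE_BPS = 10_000_000  # customer's provisioned rate

# Assumption: the buffer was dimensioned to hold 50 ms of data
# at the maximum rate (hypothetical figure).
ASSUMED_DELAY_MS = 50
buffer_bytes = MAX_RATE_BPS * ASSUMED_DELAY_MS // 8000

delay_at_max_ms = drain_time_ms(buffer_bytes, MAX_RATE_BPS)
delay_at_service_ms = drain_time_ms(buffer_bytes, SERVICE_RATE_BPS)
over_buffering = delay_at_service_ms / delay_at_max_ms

print(buffer_bytes, delay_at_max_ms, delay_at_service_ms, over_buffering)
```
]]></artwork>
        </figure>

        <t>Under these assumptions, a buffer that adds at most 50 ms of
        queuing delay at the maximum rate can add up to 1.5 seconds once the
        service is limited to 10 Mbps, which is the factor of 30 quoted
        above.</t>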
      </section>

      <section title="Examples of methods for active buffer management">
        <t>This section provides informative examples of active buffer
        management.</t>

        <t>While large buffers can lead to an increase in experienced network
        delay, they do not necessarily impact the flow delay. The issue is not
        how much buffering is provided, but how the provided buffers are
        used to manage the flow of traffic.</t>

        <t>Several active buffer/queue management methods have been proposed
        that can significantly improve performance of flows using a
        (potentially) congested bottleneck.</t>

        <t><list style="symbols">
            <t>RED</t>

            <t>CoDel</t>

            <t>Pi</t>

            <t>etc</t>
          </list></t>
      </section>
    </section>

    <section title="Security Considerations" toc="include">
      <t>Decisions on queue management and buffer sizing are neutral to
      security considerations if they act indiscriminately over all packets.
      Recommendations on treatment or lack of treatment at the flow or
      user-level can have security considerations, which are TBA.</t>

      <t>The question of whether end-systems respond to congestion signals is
      a valid security concern, but outside the scope of this document.</t>
    </section>

    <section title="IANA Considerations" toc="include">
      <t>This document does not require any IANA considerations.</t>

      <t>[RFC-ED]: Please remove this section prior to publication.</t>
    </section>

    <section title="Acknowledgments">
      <t>This work was part-funded by the European Community under its Seventh
      Framework Programme through the Reducing Internet Transport Latency
      (RITE) project (ICT-317700). The views expressed are solely those of the
      authors.</t>

      <t>The authors acknowledge contributions from: Jim Gettys.</t>

    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <!-- -->

    <references title="Normative References">
      <?rfc sortrefs="yes"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3819.xml"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml"?>
    </references>

    <references title="Informative References">
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.0793.xml"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?>

      <reference anchor="Wischik">
        <front>
          <title>TCP Buffer Sizing Advice</title>

          <author fullname="Wischik">
            <organization></organization>
          </author>

          <author fullname="McKeown">
            <organization></organization>

            <address>
              <postal>
                <street></street>

                <city></city>

                <region></region>

                <code></code>

                <country></country>
              </postal>

              <phone></phone>

              <facsimile></facsimile>

              <email></email>

              <uri></uri>
            </address>
          </author>

          <date />
        </front>
      </reference>

      <reference anchor="Ganjali">
        <front>
          <title>Update on Buffer Sizing in Internet Routers; ACM SIGCOMM
          Computer Communication Review 36 ACM</title>

          <author fullname="Ganjali" initials="Y" surname="Ganjali">
            <organization></organization>
          </author>

          <author fullname="McKeown" initials="N" surname="McKeown">
            <organization></organization>

            <address>
              <postal>
                <street></street>

                <city></city>

                <region></region>

                <code></code>

                <country></country>
              </postal>

              <phone></phone>

              <facsimile></facsimile>

              <email></email>

              <uri></uri>
            </address>
          </author>

          <date month="October" year="2006" />
        </front>
      </reference>

      <reference anchor="Appenzeller">
        <front>
          <title>Sizing router buffers; ACM SIGCOMM '04, pages
          281-292, New York, NY, USA.</title>

          <author fullname="Appenzeller" initials="G" surname="Appenzeller">
            <organization></organization>
          </author>

          <author fullname="Keslassy" initials="I" surname="Keslassy">
            <organization></organization>

            <address>
              <postal>
                <street></street>

                <city></city>

                <region></region>

                <code></code>

                <country></country>
              </postal>

              <phone></phone>

              <facsimile></facsimile>

              <email></email>

              <uri></uri>
            </address>
          </author>

          <author fullname="McKeown" initials="N" surname="McKeown">
            <organization></organization>

            <address>
              <postal>
                <street></street>

                <city></city>

                <region></region>

                <code></code>

                <country></country>
              </postal>

              <phone></phone>

              <facsimile></facsimile>

              <email></email>

              <uri></uri>
            </address>
          </author>

          <date year="2004" />
        </front>
      </reference>

      <reference anchor="Villamizar">
        <front>
          <title>High Performance TCP in ANSNET; ACM Computer Communications
          Review, 24(5):45-60</title>

          <author fullname="Villamizar" initials="C" surname="Villamizar">
            <organization></organization>
          </author>

          <author fullname="Song" initials="C" surname="Song">
            <organization></organization>

            <address>
              <postal>
                <street></street>

                <city></city>

                <region></region>

                <code></code>

                <country></country>
              </postal>

              <phone></phone>

              <facsimile></facsimile>

              <email></email>

              <uri></uri>
            </address>
          </author>

          <date year="1994" />
        </front>
      </reference>
    </references>

    <section anchor="buf_3819"
             title="Previous IETF guidance for configuring network buffers"
             toc="include">
      <t>This section reviews previous guidance for configuring network
      buffers and motivates the need to update these recommendations.</t>

      <t>Guidance for the use of buffers was provided in section 13 of RFC
      3819:</t>

      <t>"each node should have enough buffering to hold one
      link_bandwidth*link_delay product's worth of data for each TCP
      connection sharing the link."</t>

      <t>However, in today's Internet, a deployment following this
      recommendation would over-allocate buffering for a network link that
      supports multiple flows. This is discussed in the observations
      below:</t>

      <t><list style="symbols">
          <t>This buffering recommendation is appropriate for a device that
          supports a single or small number of bulk TCP flows <xref
          target="Villamizar"></xref>.</t>

          <t>The buffering is unduly large when there are more than a small
          number of flows (e.g. >10). The goal of sharing between TCP flows
          requires only that the buffering is sufficient to hold one
          link_bandwidth*path_delay product's worth of data for the longest
          path flow. The more flows share a link, the less buffering is needed
          <xref target="Appenzeller"></xref>, unless the egress link becomes
          congested with so many flows that there are only a few packets per
          flow buffered.</t>

          <t>Many egress links have a higher level of multiplexing (e.g.
          more than 100 uncorrelated flows). This is often found beyond the
          edge of a network. In this case, the buffer size may be inversely
          proportional to the square root of the number of flows (for medium
          numbers of flows). For still higher levels of multiplexing, this may
          be of the order of the logarithm of the number of flows <xref
          target="Wischik"></xref><xref target="Ganjali"></xref>.</t>

          <t>Note that while optimal buffering may be a function of the number
          of concurrent flows, it is not recommended to tune buffering by
          dynamically estimating the number of flows sharing a network device
          or path, or by attempting to classify flows as "long", "short", etc.
          Such estimates are difficult, due to the wide variety of flow
          behaviours and the use of aggregation methods (such as tunnels) that
          hide the traffic of individual flows.</t>

          <t>In deployed scenarios (apart from restricted deployments in
          operator-controlled subnetworks), it is usually impossible for a
          router or other network middlebox to know the path delay experienced
          by a flow. In the Internet service model this information is only
          available to end points (e.g. using feedback provided by TCP <xref
          target="RFC0793"></xref> or RTCP <xref target="RFC3550"></xref>). It
          is therefore not usually possible for operators to use the
          end-to-end path delay calculation to determine the size of buffering
          when configuring network equipment.</t>
        </list></t>
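
      <t>To give a feel for how far apart the sizing rules discussed above
      can be, the Python sketch below evaluates three of them for one
      hypothetical link. The 1 Gbit/s rate, 100 ms delay and 100 flows are
      assumed values, and the square-root rule is written with one
      bandwidth*delay product as its constant of proportionality, which is
      also an assumption here since the text above states only the
      proportionality. The function names are illustrative.</t>

      <figure>
        <artwork><![CDATA[
```python
import math


def product_bytes(rate_bps, delay_ms):
    """One bandwidth*delay product, in bytes."""
    return rate_bps * delay_ms / 8000


def per_connection_rule(rate_bps, delay_ms, n_flows):
    """RFC 3819 wording: one product's worth per TCP connection."""
    return n_flows * product_bytes(rate_bps, delay_ms)


def single_product_rule(rate_bps, delay_ms):
    """One product's worth, for the longest-path flow."""
    return product_bytes(rate_bps, delay_ms)


def sqrt_rule(rate_bps, delay_ms, n_flows):
    """Assumed form: one product divided by sqrt(number of flows)."""
    return product_bytes(rate_bps, delay_ms) / math.sqrt(n_flows)


# Hypothetical link: 1 Gbit/s, 100 ms delay, 100 flows.
RATE_BPS, DELAY_MS, N_FLOWS = 1_000_000_000, 100, 100

sizes = {
    "per_connection": per_connection_rule(RATE_BPS, DELAY_MS, N_FLOWS),
    "single_product": single_product_rule(RATE_BPS, DELAY_MS),
    "sqrt": sqrt_rule(RATE_BPS, DELAY_MS, N_FLOWS),
}

for rule, size in sizes.items():
    print(f"{rule}: {size:.0f} bytes")
```
]]></artwork>
      </figure>

      <t>For this assumed link, the per-connection rule asks for a thousand
      times more buffer memory than the square-root rule. The sketch takes
      the delay and flow count as given, whereas the observations above note
      that a network device usually cannot know either.</t>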

      <t>The discussion in section 13 of RFC 3819 summarises:</t>

      <t>"In general, it is wise to err in favor of too much buffering rather
      than too little."</t>

      <t>While this advice may have been appropriate when routers and
      subnetworks carried small numbers of flows and had little buffer memory
      <xref target="Villamizar"></xref>, it is no longer appropriate for many
      modern networks.</t>

      <t>Section 13 of RFC 3819 also motivates using methods such as Active
      Queue Management (AQM) and ECN <xref target="RFC3168"></xref>. However,
      when RFC 3819 was written there was little deployment experience, and
      little understanding of how to configure these methods. We now argue that
      these methods should be considered for deployment in operational
      networks.</t>
    </section>

    <section title="Revision notes">
      <t>RFC-Editor: Please remove this section prior to publication</t>

      <t>Draft 00</t>

      <t><list style="symbols">
          <t>This contains the first draft for comment.</t>
        </list></t>
    </section>
  </back>
</rfc>
