<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="std" docName="draft-fairhurst-tsvwg-00g" ipr="trust200902"
     obsoletes="" updates="">
  <!-- Updates SDP -->

  <?rfc toc="yes"?>

  <!-- generate a table of contents -->

  <?rfc symrefs="yes"?>

  <!-- use anchors instead of numbers for references -->

  <?rfc sortrefs="yes" ?>

  <!-- alphabetize the references -->

  <?rfc compact="yes" ?>

  <!-- conserve vertical whitespace -->

  <?rfc subcompact="no" ?>

  <!-- but keep a blank line between list items -->

  <front>
    <title abbrev="">Network Transport Circuit Breakers</title>

    <author fullname="Godred Fairhurst" initials="G." surname="Fairhurst">
      <organization>University of Aberdeen</organization>

      <address>
        <postal>
          <street>School of Engineering</street>

          <street>Fraser Noble Building</street>

          <city>Aberdeen</city>

          <region>Scotland</region>

          <code>AB24 3UE</code>

          <country>UK</country>
        </postal>

        <email>gorry@erg.abdn.ac.uk</email>

        <uri>http://www.erg.abdn.ac.uk</uri>
      </address>
    </author>

    <date day="12" month="April" year="2014" />

    <area>Transport</area>

    <workgroup>TSVWG Working Group</workgroup>

    <keyword></keyword>

    <keyword></keyword>

    <abstract>
      <t>This note explains what is meant by the term "transport circuit
      breaker" in the context of an Internet tunnel service.</t>
    </abstract>
  </front>

  <middle>
    <!-- text starts here -->

    <section title="Introduction" toc="include">
      <t>A transport Circuit Breaker (CB) is an automatic mechanism that is
      used to estimate congestion caused by a flow, and to terminate (or
      significantly reduce the rate of) the flow when excessive congestion is
      detected. This is a safety measure to prevent congestion collapse
      (starvation of resources available to other flows), essential for an
      Internet that is heterogeneous and for traffic that is hard to predict
      in advance.</t>

      <t>A CB is intended as a protection mechanism of last resort. Under
      normal circumstances, a CB should not be triggered; it is designed to
      protect the network when there is overload. In the same way, people do
      not expect the electrical circuit breaker (or fuse) in their home to be
      triggered, except when there is a wiring fault or a problem with an
      electrical appliance.</t>

      <t>Persistent congestion (also known as "congestion collapse") was a
      feature of the early Internet of the 1980s. This resulted in excess
      traffic starving other connections of access to the Internet. It was
      countered by the requirement to use congestion control (CC) in the TCP
      transport protocol <xref target="Jacobsen88"></xref> <xref
      target="RFC1112"></xref>. These mechanisms operate in Internet hosts to
      cause TCP connections to "back off" during congestion. The introduction
      of CC in TCP (currently documented in <xref target="RFC5681"></xref>)
      ensured the stability of the Internet, because it was able to detect
      congestion and promptly react. This worked well while TCP was by far the
      dominant traffic in the Internet, and most TCP flows were long-lived
      (ensuring that they could detect and respond to congestion before the
      flows terminated). This is no longer the case, and traffic that is not
      congestion-controlled, such as UDP, can form a significant proportion
      of the total traffic traversing a link. The current Internet therefore
      requires that such traffic be considered in order to avoid congestion
      collapse.</t>

      <t>There are important differences between a transport circuit breaker
      and a congestion-control method. Specifically, congestion control (as
      implemented in TCP, SCTP, and DCCP) needs to operate on a timescale on
      the order of a packet round-trip time (RTT), the time from sender to
      destination and return. Congestion control methods may react to a single
      packet loss/marking and reduce the transmission rate for each loss or
      congestion event. The goal is usually to limit the maximum transmission
      rate to one that reflects the available capacity of a network path.
      These methods typically operate on individual traffic flows (e.g. a
      5-tuple).</t>

      <t>In contrast, CBs are recommended for traffic aggregates, e.g. traffic
      sent using a network tunnel. Later sections provide examples of cases
      where circuit-breakers may or may not be desirable.</t>

      <t>A CB needs to be designed to trigger robustly when there is
      persistent congestion. It will often operate on a much longer timescale:
      many RTTs, possibly many tens of seconds. This longer period is needed
      to provide sufficient time for transports (or applications) to adjust
      their rate following congestion, and for the network load to stabilise
      after adjustment. A CB also needs to decide if a reaction is required
      based on a series of successive samples taken over a reasonably long
      period of time. This is to ensure that a CB does not accidentally
      trigger following a single congestion event, or even several successive
      events (congestion events are what trigger congestion control, and are
      to be regarded as normal on a network link operating near its
      capacity).</t>

      <section title="Types of Circuit-Breaker">
        <t>There are various forms of circuit breaker, which are
        differentiated mainly on the timescale over which they are triggered,
        but also in the intended protection they offer:<list style="symbols">
            <t>Fast-Trip Circuit Breakers: The relatively short timescale used
            by this form of circuit breaker is intended to protect a flow or
            related group of flows.</t>

            <t>Slow-Trip Circuit Breakers: This circuit breaker utilises a
            longer timescale and is designed to protect traffic
            aggregates.</t>

            <t>Managed Circuit Breakers: These utilise the operations and
            management functions that may be present in a managed service to
            implement a circuit breaker.</t>
          </list>Examples of each type of circuit breaker are provided in
        section 4.</t>
      </section>
    </section>

    <section title="Terminology" toc="include">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"></xref>.</t>
    </section>

    <section title="Designing a Circuit-Breaker (What makes a good circuit breaker?)"
             toc="include">
      <t>Although circuit breakers have been talked about in the IETF for many
      years, there has not yet been guidance on the cases where they are
      needed or on the design of circuit breaker mechanisms. This document
      seeks to offer advice on these topics.</t>

      <t>The basic design of a circuit breaker involves communication between
      the sender and receiver of a network flow. It is assumed that a sender
      can control the rate of the flow, but the effect of congestion can only
      be measured at the corresponding receiver (after loss/marking is
      experienced across the end-to-end path). The receiver therefore needs to
      be responsible for either measuring the level of congestion (and
      returning this measure to the sender to inform a trigger) or for
      detecting excessive congestion (returning the trigger to the sender).
      Whether the trigger is generated at the receiver or based on
      measurements returned to the sender, the result of the trigger (the
      circuit-breaker action) needs to be applied at the sender.</t>

      <t>The set of components needed to implement a circuit breaker is:</t>

      <t><list style="symbols">
          <t>There MUST be a control path from the receiver to the sender.
          Ideally the CB should trigger if this control path fails. That is,
          the feedback indicating a congested period is designed so that the
          sender triggers the CB action when it fails to receive reports from
          the receiver that indicate an absence of congestion, rather than
          relying on the successful transmission of a "congested" signal back
          to the sender. (The feedback signal could itself be lost under
          congestion collapse).</t>

          <t>A CB MUST define a measurement period over which the receiver
          measures the level of congestion. This method does not have to
          detect individual packet loss, but MUST have a way to know that
          packets have been lost/marked from the traffic flow. If Explicit
          Congestion Notification (ECN) <xref target="RFC3168"></xref> is
          enabled, a receiver MAY also count the number of ECN marks per
          measurement interval, but even if ECN is used, the loss MUST still
          be measured, since this better reflects the impact of excessive
          congestion. The type of CB will determine how long this measurement
          period needs to be. The minimum time must be significantly longer
          than the time that current CC algorithms need to reduce their rate
          following detection of congestion (i.e. many path RTTs).</t>

          <t>A CB MUST define a threshold to determine whether the measured
          congestion is considered excessive.</t>

          <t>A CB MUST define a period over which the trigger uses collected
          measurements.</t>

          <t>A CB MUST be robust to multiple congestion events. This is
          usually achieved by defining the number of measured excessive
          congestion events required per triggering period. For example, a CB
          may combine the results of several measurement periods to determine
          if the CB is triggered (e.g. triggered when excessive congestion is
          detected in 3 measurements within the triggering interval).</t>

          <t>A triggered CB MUST react decisively by reducing traffic at the
          source (e.g. tunnel ingress). A CB SHOULD be constructed so that it
          does not trigger under light or intermittent congestion, hence the
          response when triggered needs to be much more severe than that of a
          CC algorithm. By default, a CB SHOULD disable the flow;
          alternatively, it could significantly reduce the rate of the flow
          it controls.</t>

          <t>Triggering a CB SHOULD result in a response that continues for a
          period of time. This by default SHOULD be at least the triggering
          interval. Manual operator intervention MAY be required to restore
          the flow. If an automated response is needed to restore the flow,
          then this MUST NOT be immediate.</t>

          <t>When a CB is triggered, it SHOULD be regarded as an abnormal
          network event. As such, this event SHOULD be logged. The
          measurements that led to triggering of the CB SHOULD also be
          logged.</t>
        </list></t>
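
      <t>The last two requirements (a response that persists, a restoration
      that is not immediate, and logging of the event) could be sketched
      roughly as follows. This is an illustrative Python sketch only, not a
      normative algorithm: the class name HoldDown, its methods, the
      auto_restore option, and the use of the triggering interval as the
      hold time are all assumptions made for the example.</t>

      <figure>
        <artwork><![CDATA[
```python
import logging

log = logging.getLogger("circuit_breaker")  # hypothetical logger name


class HoldDown:
    """Keep a flow disabled for a minimum time after a CB trigger."""

    def __init__(self, triggering_interval, auto_restore=False):
        # Assumption: hold for at least the triggering interval (s).
        self.hold_time = triggering_interval
        self.auto_restore = auto_restore
        self.tripped_at = None  # trigger time, or None if enabled

    def trip(self, now, measurements):
        """Record the trigger and log what led to it."""
        if self.tripped_at is None:
            self.tripped_at = now
            log.warning("CB triggered at %.1f: %r", now, measurements)

    def flow_enabled(self, now):
        """Return True if the sender may transmit at time `now`."""
        if self.tripped_at is None:
            return True
        held = now - self.tripped_at
        if self.auto_restore and held >= self.hold_time:
            self.tripped_at = None  # automated, but never immediate
            return True
        return False

    def operator_reset(self):
        """Manual operator intervention restores the flow."""
        self.tripped_at = None
```
]]></artwork>
      </figure>

      <t>In this sketch the caller supplies the current time explicitly, so
      the same logic can be driven by a real clock or by a simulation. With
      auto_restore left disabled, only operator_reset() restores the
      flow.</t>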

      <section title="Basic Function" toc="default">
        <t>This section provides one example of a suitable method to measure
        congestion:</t>

        <t><list style="numbers">
            <t>A sender or a tunnel ingress records the number of
            packets/bytes sent in each measurement interval. The measurement
            interval could be every few seconds.</t>

            <t>The receiver or tunnel egress also records the number of
            packets/bytes received in each measurement interval.</t>

            <t>The receiver periodically returns the measured values. (This
            could be using Operations and Management (OAM), or an in-band
            signalling datagram).</t>

            <t>Using the ingress and egress measurements, the loss rate for
            each measurement interval can be deduced by calculating the
            difference between these two counter values. Note that accurate
            measurement intervals are not typically important, since isolated
            loss events need to be disregarded. An appropriate threshold for
            determining excessive congestion needs to be set (e.g. more than
            10% loss; other methods could also be based on the rate of
            transmission as well as the loss rate).</t>

            <t>The transport circuit breaker is triggered when the threshold
            is exceeded in multiple measurement intervals (e.g. 3 successive
            measurements). This design prevents a single or spurious event
            from resulting in a trigger.</t>

            <t>The design may also trigger when the sender does not receive
            receiver measurements for 3 successive measurement periods; this
            may indicate a loss of control packets.</t>
          </list></t>
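
        <t>The steps above could be sketched as follows. This is a minimal
        Python illustration under stated assumptions, not a definitive
        implementation: the class and method names are hypothetical, the
        10% threshold and the count of 3 intervals are the example values
        given above, and the choice that a missing report leaves the count
        of successive high-loss intervals unchanged is an assumption of the
        sketch.</t>

        <figure>
          <artwork><![CDATA[
```python
class LossCircuitBreaker:
    """Trip after several successive bad or missing intervals."""

    def __init__(self, loss_threshold=0.10, trip_count=3):
        self.loss_threshold = loss_threshold  # "more than 10% loss"
        self.trip_count = trip_count  # "3 successive measurements"
        self.bad_streak = 0      # successive intervals above threshold
        self.missing_streak = 0  # successive intervals with no report
        self.triggered = False

    @staticmethod
    def loss_rate(sent, received):
        """Fraction of the ingress count not seen at the egress."""
        if sent <= 0:
            return 0.0
        # Intervals may be misaligned; never report negative loss.
        return max(0.0, (sent - received) / sent)

    def end_interval(self, sent, received=None):
        """Process one interval; received is None if no report came."""
        if received is None:
            self.missing_streak += 1
        else:
            self.missing_streak = 0
            if self.loss_rate(sent, received) > self.loss_threshold:
                self.bad_streak += 1
            else:
                self.bad_streak = 0
        if (self.bad_streak >= self.trip_count
                or self.missing_streak >= self.trip_count):
            self.triggered = True  # latched; restoring is out of scope
        return self.triggered
```
]]></artwork>
        </figure>

        <t>The sender (or tunnel ingress) would call end_interval() once per
        measurement interval, passing its own counter and the value returned
        by the receiver, or None when no report arrived in that interval.</t>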
      </section>
    </section>

    <section title="Examples of Circuit Breakers">
      <t>This section provides examples of different types of circuit breaker.
      There are multiple types of circuit breaker that may be defined for use
      in different deployment cases:</t>

      <section title="A fast-trip Circuit Breaker">
        <t>A fast-trip circuit breaker is the most responsive. It has a
        response time that is only slightly larger than that of the traffic it
        controls. It is suited to traffic with well-understood
        characteristics. It is not suited to arbitrary network traffic,
        since it may prematurely trigger (e.g. when multiple
        congestion-controlled flows lead to short-term overload).</t>

        <section title="A fast-trip RTP Circuit Breaker">
          <t>A set of fast-trip CB methods has been specified for use
          together by a Real-time Transport Protocol (RTP) flow using the
          RTP/AVP Profile <xref target="RTP-CB"></xref>. It is expected
          that, in the absence of severe congestion, all RTP applications
          running on best-effort IP networks will be able to run without
          triggering these circuit breakers.</t>

          <t>The RTP circuit breaker is therefore implemented as a
          fail-safe.</t>

          <t>The sender monitors reception of RTCP Reception Report (RR or
          XRR) packets that convey reception quality feedback information.
          This is used to measure (congestion) loss, possibly in combination
          with ECN <xref target="RFC6679"></xref>.</t>

          <t>The CB action (shutdown of the flow) is triggered when any of the
          following trigger conditions is true:</t>

          <t><list style="numbers">
              <t>An RTP CB triggers on reported lack of progress.</t>

              <t>An RTP CB triggers when no receiver report messages are
              received.</t>

              <t>An RTP CB uses a TFRC-style check and sets a hard upper limit
              on the long-term RTP throughput (over many RTTs).</t>

              <t>An RTP CB includes the notion of Media Usability. This
              circuit breaker is triggered when the quality of the transported
              media falls below some required minimum acceptable quality.</t>
            </list></t>
        </section>
      </section>

      <section title="A Slow-trip Circuit Breaker">
        <t>It is expected that most circuit breakers will be slower at
        responding to loss. </t>

        <t>One example where a circuit breaker is needed is where flows or
        traffic-aggregates use a tunnel or encapsulation and the flows within
        the tunnel do not all support TCP-style congestion control (e.g. TCP,
        SCTP, TFRC), see <xref target="RFC5405"></xref> section 3.1.3. The
        usual case where this is needed is when tunnels are deployed in the
        general Internet (rather than "controlled environments" within an ISP
        or Enterprise), especially when the tunnel may need to cross a
        customer access router.</t>
      </section>

      <section title="A Managed Circuit Breaker">
        <t>This type of circuit breaker is implemented in the signalling
        protocol or management plane that relates to the traffic aggregate
        being controlled. This type of circuit breaker is typically applicable
        when the deployment is within a "controlled environment".</t>

        <section title="A Managed Circuit Breaker for SAToP Pseudo-Wires">
          <t><xref target="RFC4553"></xref>, SAToP Pseudo-Wires (PWE3),
          section 8 describes an example of a managed circuit breaker for
          isochronous flows. </t>

          <t>If such flows were to run over a pre-provisioned (e.g. MPLS)
          infrastructure, then it may be expected that the Pseudo-Wire (PW)
          would not experience congestion, because such a flow is not
          expected to either increase or decrease its rate. If instead
          Pseudo-Wire traffic is multiplexed with other traffic over the
          general Internet, it could experience congestion. <xref
          target="RFC4553"></xref> states: "If SAToP PWs run over a PSN
          providing best-effort service, they SHOULD monitor packet loss in
          order to detect 'severe congestion'." The currently recommended
          measurement period is 1 second, and the trigger operates when there
          are more than three measured Severely Errored Seconds (SES) within
          a period.</t>

          <t>It further states: "If such a condition is detected, a SAToP PW
          should shut down bidirectionally for some period of time...". The
          concept was that when the packet loss ratio (congestion) level
          increased above a threshold, the PW was by default disabled. This
          use case considered fixed-rate transmission, where the PW had no
          reasonable way to shed load.</t>

          <t>The trigger needs to be set at the loss rate at which the PW is
          likely to have a serious problem, possibly making the service
          non-compliant. At this point, triggering the CB would remove the
          traffic, preventing undue impact on congestion-responsive traffic
          (e.g., TCP). Part of the rationale was that high loss ratios
          typically indicated that something was "broken"; this should
          already have resulted in operator intervention, and if it has not,
          it should trigger this intervention. An operator-based response
          provides opportunity for other action to restore the service
          quality, e.g. by shedding other loads or assigning additional
          capacity, or to consciously avoid reacting to the trigger while
          engineering a solution to the problem. This may require the trigger
          to be sent to a third location (e.g. a network operations centre,
          NOC) responsible for operation of the tunnel ingress, rather than
          the tunnel ingress itself.</t>
        </section>
      </section>
    </section>

    <section title="Examples where circuit breakers may not be needed">
      <t>A CB is not required for a single CC-controlled flow using TCP, SCTP,
      TFRC, etc. In these cases, the CC methods are designed to prevent
      congestion collapse.</t>

      <section title="CBs and uni-directional Traffic">
        <t>A CB cannot be used to control uni-directional UDP traffic. The
        lack of feedback prevents automated triggering of the CB. Supporting
        this type of traffic in the general Internet requires operator
        monitoring to detect and respond to congestion collapse, or the use
        of dedicated capacity, e.g. using pre-provisioned MPLS services,
        RSVP, or admission-controlled Differentiated Services.</t>
      </section>

      <section title="CBs over pre-provisioned Capacity">
        <t>One common question is whether a CB is needed when a tunnel is
        deployed in a private network with pre-provisioned capacity. In this
        case, compliant traffic that does not exceed the provisioned capacity
        should not result in congestion. The CB will hence only be triggered
        when there is non-compliant traffic. It could be argued that this
        event should never happen - but it may also be argued that the CB
        equally should never be triggered. If a CB were to be implemented, it
        would provide an appropriate response should this excessive congestion
        occur in an operational network.</t>
      </section>

      <section title="CBs with CC Traffic">
        <t>IP-based traffic is generally assumed to be congestion-controlled,
        i.e., it is assumed that the transport protocols generating IP-based
        traffic at the sender already employ mechanisms that are sufficient to
        address congestion on the path <xref target="RFC5405"></xref>. A
        question therefore arises when people deploy a tunnel that is thought
        to only carry an aggregate of TCP (or some other CC-controlled)
        traffic: Is there an advantage in this case in using a CB? Certainly,
        traffic in such a tunnel will respond to congestion. However, the
        answer to the question is not obvious, because an aggregate of flows
        that each implement a CC mechanism does not necessarily avoid
        congestion collapse. For instance, most CC mechanisms need a flow to
        be long-lived before they can react and reduce its rate; in an
        aggregate of many short flows, many may terminate before they
        experience congestion. It is also often impossible for a tunnel
        service provider to know that the tunnel only contains CC-controlled
        traffic (e.g. inspecting packet headers may not be possible). The
        important thing to note is that if the aggregate of the
        traffic does not result in persistent congestion (impacting other
        flows), then the CB will not trigger. This is the expected case in
        this context - so implementing a CB will not reduce performance of the
        tunnel, but offers protection should congestion collapse occur.</t>
      </section>
    </section>

    <section title="Security Considerations" toc="include">
      <t>This section will describe security considerations.</t>
    </section>

    <section title="IANA Considerations" toc="include">
      <t>This document makes no request of IANA.</t>
    </section>

    <section title="Acknowledgments">
      <t>There are many people who have discussed and described the issues
      that have motivated this draft. </t>
    </section>

    <section title="Revision Notes">
      <t>RFC-Editor: Please remove this section prior to publication</t>

      <t>Draft 00</t>

      <t>This was the first revision. Help and comments are greatly
      appreciated.</t>

      <t></t>
    </section>
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>
    <!-- -->

    <references title="Normative References">
      <?rfc sortrefs="yes"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5405.xml"?>

      <reference anchor="Jacobsen88">
        <front>
          <title>Congestion Avoidance and Control</title>

          <author fullname="V. Jacobson" initials="V." surname="Jacobson">
            <organization></organization>
          </author>

          <date month="August" year="1988" />
        </front>

        <seriesInfo name="SIGCOMM"
                    value="Symposium proceedings on Communications architectures and protocols" />
      </reference>

      <reference anchor="RTP-CB">
        <front>
          <title>Multimedia Congestion Control: Circuit Breakers for Unicast
          RTP Sessions</title>

          <author fullname="C. S. Perkins">
            <organization></organization>
          </author>

          <author fullname="V. Singh">
            <organization></organization>
          </author>

          <date month="February" year="2014" />
        </front>
      </reference>
    </references>

    <references title="Informative References">
      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.1112.xml"
?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4553.xml"
?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6040.xml"
?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml"
?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6679.xml"?>

      <?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5681.xml"?>

      <?rfc ?>
    </references>
  </back>
</rfc>
