<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC '' 
	'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
    <!ENTITY rfc5681 PUBLIC '' 
	'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5681.xml'>
    <!ENTITY rfc6298 PUBLIC '' 
	'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6298.xml'>
    <!ENTITY rfc4821 PUBLIC '' 
	'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4821.xml'>
    <!ENTITY rfc3168 PUBLIC '' 
	'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml'>
]>
<rfc category="exp" ipr="trust200902"
	docName="draft-ietf-ledbat-congestion-09.txt">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>

    <front>
        <title abbrev="LEDBAT">Low Extra Delay Background Transport (LEDBAT)</title>
        <author initials='S' surname="Shalunov" fullname='Stanislav Shalunov'>
          <organization>BitTorrent Inc</organization>
          <address>
            <postal>
              <street>612 Howard St, Suite 400</street>
              <city>San Francisco</city> <region>CA</region> <code>94105</code>
              <country>USA</country>
            </postal>
            <email>shalunov@bittorrent.com</email>
            <uri>http://shlang.com</uri>
          </address>
        </author>
        <author initials='G' surname="Hazel" fullname='Greg Hazel'>
          <organization>BitTorrent Inc</organization>
          <address>
            <postal>
              <street>612 Howard St, Suite 400</street>
              <city>San Francisco</city> <region>CA</region> <code>94105</code>
              <country>USA</country>
            </postal>
            <email>greg@bittorrent.com</email>
          </address>
        </author>
        <author initials='J' surname="Iyengar" fullname='Janardhan Iyengar'>
          <organization>Franklin and Marshall College</organization>
          <address>
            <postal>
              <street>415 Harrisburg Ave.</street>
              <city>Lancaster</city> <region>PA</region> <code>17603</code>
              <country>USA</country>
            </postal>
            <email>jiyengar@fandm.edu</email>
          </address>
        </author>
	<author initials='M' surname="Kühlewind" fullname='Mirja Kühlewind'>
		<organization>University of Stuttgart</organization>
		<address>
			<postal>
				<street></street>
				<code></code><city>Stuttgart</city>
				<country>DE</country>
			</postal>
			<email>mirja.kuehlewind@ikr.uni-stuttgart.de</email>
		</address>
        </author>
        
        <date/>
		<area>Transport</area>

		<workgroup>LEDBAT WG</workgroup>
        <abstract>
	
		<t>LEDBAT is an experimental delay-based congestion control algorithm
		  that attempts to utilize the available bandwidth on an end-to-end path
		  while limiting the consequent increase in queueing delay on the path.
		  LEDBAT uses changes in one-way delay measurements
		  to limit congestion that the flow itself induces in the network.
		  LEDBAT is designed for use by background bulk-transfer applications;
		  it is designed to be no more aggressive than TCP congestion control
		  and to yield in the presence of any competing flows when latency builds,
		  thus limiting interference with the network performance of the competing flows.</t>
	 </abstract>
	 </front>

    <middle>
        <section title="Requirements notation">
            <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
            "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
            and "OPTIONAL" in this document are to be interpreted as
            described in <xref target="RFC2119"/>.</t>
        </section>

		<section title="Introduction">
		  <t>
			TCP congestion control <xref target="RFC5681"/>
			seeks to share bandwidth at a bottleneck link equitably
			among flows competing at the bottleneck,
			and it is
			the predominant congestion control mechanism used on the Internet.
			Not all applications seek an equitable share 
			of network throughput, however. "Background" applications,
			such as software updates or file-sharing applications,
			seek to operate
			without interfering with the performance of 
			more interactive and delay- and/or bandwidth-sensitive "foreground" 
			applications, and 
			standard TCP may be too aggressive for use with such background applications.
		  </t>
		  <t> LEDBAT is an experimental delay-based congestion control mechanism
			that reacts early to congestion in the network, 
			thus enabling "background" applications to use the network
			while avoiding interference with the network performance of competing flows.
			A LEDBAT sender
			uses one-way delay measurements to estimate the amount of queueing on the data path,
			controls the LEDBAT flow's congestion window based on this estimate, and 
			minimizes interference with competing flows when latency builds
			by adding low extra queueing delay on the end-to-end path.
		  </t>
    	  <t>
			Delay-based congestion control protocols,
			such as TCP-Vegas <xref target="Bra94"/><xref target="Low02"/>,
			are generally designed to achieve more, not less, throughput than
			standard TCP,
			and often outperform TCP under particular network settings.
			In contrast, LEDBAT is designed to be no more aggressive than TCP;
			LEDBAT is a "scavenger" congestion control mechanism
			that seeks to utilize all available bandwidth
			and yields quickly when competing with standard TCP at a bottleneck link.
		  </t>
		  <!--
			Other than TCP Vegas, the LEDBAT appraoch is based on changes in
			the one-way delay (OWD) instead of Round-Trip Time (RTT). This avoids that
			additional delays on the backchannel influence the sent-out decision.
			Additionally, LEDBAT maintains an low extra delay to operate more stable. 
			Moreover, LEDBAT is designed to be not more aggressive than standard
			TCP as LEDBAT will additionally react to loss as a congestion signal.
		</t>
		 <t>
			TCP Vegas [Bra94] is one of the first congestion control mechanisms known to have a
			smaller sending rate than standard TCP when both protocols share a bottleneck
			[Kur00]---yet it was designed to achieve more, not less throughput than
			standard TCP. Indeed, when it is the only protocol on the bottleneck, the
			throughput of TCP Vegas is greater than the throughput of standard TCP.
		</t>
			
			The predominant congestion control mechanism used on the Internet,
			TCP congestion control [XXXRFC2581],
			requires data loss to detect congestion.
			A TCP sender increases its congestion window [XXXRFC2581]
			until a loss occurs,
			which, 
			in the absence of Active Queue Management (AQM), 
			occurs only
			when the queue at the
			bottleneck link
			on the end-to-end path overflows.
			Even with AQM, 
			The queueing delay at the bottleneck link
			increases significantly before
			TCP responds to congestion at the tight link.

			This increased delay can be significant;
			default parameters on customer-side ADSL modems [XXXcite]
			can result in seconds of queueing delay on the ADSL uplink alone.
			While these large queueing delays have no known
			benefit, they have substantial drawbacks for interactive
			applications---"lag" increases for interactive games,
			and voice/video communication suffers from the consequently high roundtrip times.
			-->
			<!--
			It has been deployed by BitTorrent
            in the wild first with the BitTorrent DNA client (a P2P-based CDN) and now with the uTorrent
            client. This mechanism not only allows to keep delay across a bottleneck low, but also
            yields quickly in the presence of competing traffic with loss-based congestion
            control.</t>

		   	<t>Beyond its utility for P2P, LEDBAT enables other advanced networking applications to
            better get out of the way of interactive apps.</t>

		   	<t>In addition to direct and immediate benefits for P2P and other application that can
            benefit from scavenger service, LEDBAT could point the way for a possible future evolution
            of the Internet where loss is not part of the designed behavior and delay is
            minimized.</t>
			-->
		  	<section title="Design Goals">
			<t>LEDBAT congestion control seeks to:
				<list style="numbers">
					<t> utilize end-to-end available bandwidth and maintain 
						low queueing delay when no other traffic is present, </t>
					<t> add little to the queueing delay induced by concurrent flows, and</t>
					<t> yield quickly to flows using standard TCP congestion control 
						that share the same bottleneck link.</t>
				</list>
			</t>
			</section>

		  	<section title="Applicability">
			  <t> LEDBAT is a "scavenger" congestion control mechanism that is 
				primarily motivated by background bulk-transfer applications,
				such as 
				large file transfers (as with file-sharing applications)
				and 
				software updates.
				It can be used with any application
				that seeks to minimize its impact on the network
				and on other interactive delay- and/or bandwidth-sensitive network applications.
				LEDBAT is expected to work well	when the sender and/or receiver
				is connected via a residential access network.
			  </t>
			  <t> LEDBAT can be used as part of a
				transport protocol or as part of an application,
				as long as the data transmission 
				mechanisms are capable of carrying timestamps and
				acknowledging data frequently.
				LEDBAT can be used, 
				with appropriate extensions where necessary,
				with TCP, SCTP, and DCCP,
				and with proprietary application protocols such as those built on top of
				UDP for P2P applications.
			  </t>
			  <t>
			  When used with an ECN-capable framing protocol,
			  LEDBAT should react to an ECN mark as it would to a loss, 
			  as specified in <xref target="RFC3168"/>.</t>

			  <t> LEDBAT is designed to reduce build-up
				of a standing queue by long-lived LEDBAT flows
				at a link with a tail-drop FIFO queue,
				so as to avoid persistently delaying other flows 
				sharing the queue.
				If Active Queue Management (AQM) is configured to
				drop or ECN-mark packets before the
				LEDBAT flow starts reacting to persistent queue build-up,
				LEDBAT reverts to standard TCP behavior, rather than
				yielding to other TCP flows.
				However, such an AQM is still desirable since it keeps
				queuing delay low, achieving an outcome that is in line with LEDBAT's goals.
				Additionally,
				a LEDBAT transport that supports ECN enjoys
				the advantages that an ECN-capable TCP enjoys over
				an ECN-agnostic TCP: avoiding losses and possible retransmissions.
				Weighted Fair Queuing (WFQ), as employed by some home gateways,
				seeks to isolate and protect delay-sensitive flows from delays due to 
				standing queues built up by concurrent long-lived flows.
				Consequently, while it prevents LEDBAT from yielding to other TCP flows, 
				it again achieves an outcome
				that is in line with LEDBAT's goals <xref target="Sch10"/>.
			  </t>
			  <t>
				Further study is required to fully understand
				the behaviour of LEDBAT with non-drop-tail, non-FIFO
				queues.
			  </t>
		  	  </section>
		</section>
		
		<section title="LEDBAT Congestion Control">
		  <section title="Overview">
			<t> A standard TCP sender increases its congestion window
				until a loss occurs <xref target="RFC5681"/> or 
				an ECN mark is received <xref target="RFC3168"/>,
			  which, in the absence of any Active Queue Management (AQM) and 
			  link errors in the network, occurs only when the queue at the bottleneck link
			  on the end-to-end path overflows.
			  Since packet loss or marking at the bottleneck link is expected to 
			  be preceded by an increase in the queueing delay at the bottleneck link,
			  LEDBAT congestion control uses this increase in queueing delay as an early 
			  signal of congestion,
			  enabling it to respond to congestion earlier than standard TCP,
			  and enabling it to yield bandwidth to a competing TCP flow.
			</t>
			<t> LEDBAT employs one-way delay measurements
			  to estimate queueing delay.
			  When the estimated queueing delay 
			  is less than a pre-determined target,
			  LEDBAT infers that the network is not yet congested,
			  and increases its sending rate to utilize any spare capacity in the network.
			  When the estimated queueing delay
			  becomes greater than a pre-determined target,
			  LEDBAT decreases its sending rate quickly
			  as a response to potential congestion in the network.
			</t>
		  </section>

		  <section title="Preliminaries">
			<t> 
			  <!-- (JRI:  We don't need to assume fixed-size segments since bytes_acked takes care of it.)
				   For the purposes of explaining LEDBAT, 
			  we assume a transport sender that uses fixed-size
			  segments and a receiver that acknowledges each segment separately.
			  It is straightforward to apply the mechanisms described here 
			  with variable-sized segments
			  and with delayed acknowledgments. -->
			  A LEDBAT sender uses a congestion window (cwnd)
			  to gate the amount of data that the sender can send into the network in one roundtrip time (RTT).
			  A sender MAY maintain its cwnd in bytes or in packets;
			  this document uses cwnd in bytes.
			  LEDBAT requires that each data segment carries a "timestamp" from the sender,
			  based on which the receiver computes the one-way delay from the sender,
			  and sends this computed value back to the sender.</t>
			
			<t> In addition to the LEDBAT mechanism described below,
			  we note that a slow start mechanism can be used as specified in <xref target="RFC5681"/>.
			  Since slow start leads to faster increase in the window than 
			  that specified in LEDBAT,
			  conservative congestion control implementations employing LEDBAT
			  may skip slow start altogether
			  and start with an initial window of INIT_CWND * MSS.
			  (INIT_CWND is described later in
			  <xref target='params'/>.)
			</t>
			<t> The term "MSS", or the sender's Maximum Segment Size,
			  used in this document
			  refers to the size of the
			  largest segment that the sender can transmit.
			  The value of MSS can be based on the path MTU discovery <xref target="RFC4821"/> algorithm
			  and/or on other factors.
			</t>
		  </section>

		  <section title="Receiver-Side Operation">
			<t> A LEDBAT receiver operates as follows:
				<figure><artwork><![CDATA[
on data_packet:
    remote_timestamp = data_packet.timestamp
    acknowledgement.delay = local_timestamp() - remote_timestamp
    # fill in other fields of acknowledgement
    acknowledgement.send()
]]></artwork></figure></t>
			<t> A receiver MAY send more than one delay sample in an acknowledgment.
			  For instance, a receiver that delays acknowledgments,
			  i.e., sends an acknowledgment 
			  less frequently than once per data packet, 
			  MAY
			  send all the one-way delay samples that it gathers in one acknowledgment.
			</t>
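			<t> The receiver behaviour described above, including a delayed acknowledgment that carries several delay samples, could be sketched in Python roughly as follows. This is an illustrative sketch and not part of the specification: the Ack and Receiver classes, the ack_every parameter, and the injectable clock are hypothetical names introduced only for this example, and timestamps are assumed to be in seconds.</t>
			<figure><artwork><![CDATA[
```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Ack:
    """Hypothetical acknowledgment carrying one or more delay samples."""
    delays: List[float] = field(default_factory=list)


class Receiver:
    """Collects one-way delay samples and acks every `ack_every` packets."""

    def __init__(self, ack_every: int = 2,
                 clock: Callable[[], float] = time.monotonic):
        self.ack_every = ack_every      # assumed delayed-ack policy
        self.clock = clock              # stands in for local_timestamp()
        self.pending: List[float] = []

    def on_data_packet(self, remote_timestamp: float) -> Optional[Ack]:
        # One-way delay sample: local receive time minus sender timestamp.
        self.pending.append(self.clock() - remote_timestamp)
        if len(self.pending) < self.ack_every:
            return None                 # acknowledgment is delayed
        ack = Ack(delays=self.pending)  # send all gathered samples at once
        self.pending = []
        return ack
```
]]></artwork></figure>
			<t> With ack_every set to 1, the sketch reduces to the per-packet acknowledgment shown in the pseudocode above.</t>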

			</section>

			<section title="Sender-Side Operation">
			<section title="An Overview">
			<t>As a first approximation, a LEDBAT sender operates as shown below; 
			  the complete algorithm is specified later in <xref target='full-algo'/>.
			  TARGET is the maximum queueing delay that LEDBAT itself may introduce in the network,
			  and GAIN determines the rate at which the cwnd responds to changes in queueing delay;
			  both constants are specified later. 
			  off_target is a normalized value representing the difference between
			  the measured current queueing delay and the pre-determined TARGET queuing delay.
			  off_target can be positive or negative,
			  and consequently, cwnd increases or decreases in proportion to off_target.
			</t>
				<figure><artwork><![CDATA[
on initialization:
    base_delay = +INFINITY

on acknowledgement:
    current_delay = acknowledgement.delay
    base_delay = min(base_delay, current_delay)
    queuing_delay = current_delay - base_delay
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_newly_acked * MSS / cwnd
]]></artwork></figure>
			  <t>The simplified mechanism above ignores multiple delay samples in an acknowledgment, 
				noise filtering, base delay expiration, and sender idle times,
				which we now take into account in our complete sender algorithm below.
			</t>
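			<t> As a worked illustration of the simplified controller, the following Python sketch applies the same update to two acknowledgments. The values chosen for TARGET, GAIN, and MSS, the use of milliseconds for delays, and the SimpleSender class name are assumptions made only for this example.</t>
			<figure><artwork><![CDATA[
```python
TARGET = 100   # target queueing delay in milliseconds (assumed value)
GAIN = 1.0     # assumed value
MSS = 1500     # assumed segment size in bytes


class SimpleSender:
    """First-approximation sender: no filtering and no base-delay expiry."""

    def __init__(self, cwnd):
        self.base_delay = float("inf")
        self.cwnd = float(cwnd)

    def on_acknowledgement(self, delay, bytes_newly_acked):
        # Track the smallest delay seen so far as the base delay.
        self.base_delay = min(self.base_delay, delay)
        queuing_delay = delay - self.base_delay
        # Normalized distance from the target; may be negative.
        off_target = (TARGET - queuing_delay) / TARGET
        self.cwnd += GAIN * off_target * bytes_newly_acked * MSS / self.cwnd
        return off_target


sender = SimpleSender(cwnd=10 * MSS)
print(sender.on_acknowledgement(delay=50, bytes_newly_acked=MSS), sender.cwnd)
print(sender.on_acknowledgement(delay=250, bytes_newly_acked=MSS), sender.cwnd)
```
]]></artwork></figure>
			<t> In this example the first sample establishes a base delay of 50 ms, so off_target is 1 and the window grows by 150 bytes. The second sample implies 200 ms of queueing delay, twice the assumed TARGET, so off_target is -1 and the window shrinks by a similar amount.</t>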
			</section>

			<section anchor="full-algo" title="The Complete Sender Algorithm">
			  <t> 
				update_current_delay() maintains a list of one-way delay measurements,
				of which a filtered value is used as an estimate of the current end-to-end delay.
				update_base_delay() maintains a list of one-way delay minima over a number of one-minute intervals,
				to measure and to track changes in the base delay of the end-to-end path.
				<!--
				Note that while this document uses the minimum to filter any noise in the one-way delay, 
				a different and more sophisticated filter MAY be used.
				-->
			  </t>
			  <t>
				This algorithm restricts cwnd growth
				after a period of inactivity, 
				where the cwnd is clamped down to a little more than flightsize
				using max_allowed_cwnd.
				To be TCP-friendly on data loss, 
				LEDBAT halves its cwnd.
				The full sender-side algorithm is given below:</t>
			  <figure><artwork><![CDATA[
on initialization:
    # cwnd is the amount of data that is allowed to be sent in one RTT and 
    # is defined in bytes.
    # CTO is the Congestion Timeout value.
   
    create current_delays list with CURRENT_FILTER elements
    create base_delays list with BASE_HISTORY number of elements
    initialize elements in base_delays to +INFINITY
    initialize elements in current_delays appropriate to FILTER()
    last_rollover = -INFINITY # More than a minute in the past
    flightsize = 0
    cwnd = INIT_CWND * MSS
    CTO = 1 second

on acknowledgment:
    # flightsize is the amount of data outstanding before this ack 
    #    was received and is updated later;
    # bytes_newly_acked is the number of bytes that this ack 
    #    newly acknowledges, and it MAY be set to MSS.

    for each delay sample in the acknowledgment:
        delay = acknowledgement.delay
        update_base_delay(delay)
        update_current_delay(delay)
    queuing_delay = FILTER(current_delays) - MIN(base_delays)
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_newly_acked * MSS / cwnd
    max_allowed_cwnd = flightsize + ALLOWED_INCREASE * MSS
    cwnd = min(cwnd, max_allowed_cwnd)
    cwnd = max(cwnd, MIN_CWND * MSS)
    flightsize = flightsize - bytes_newly_acked
    update_CTO()

on data loss:
    # at most once per RTT
    cwnd = min (cwnd, max (cwnd/2, MIN_CWND * MSS))
    if data lost is not to be retransmitted:
        flightsize = flightsize - bytes_not_to_be_retransmitted

if no acks are received within a CTO:
    # extreme congestion, or significant RTT change.
    # set cwnd to 1 MSS and back off the congestion timer.
    cwnd = 1 * MSS
    CTO = 2 * CTO

]]></artwork></figure>

<figure><artwork><![CDATA[
			  
update_CTO()
    # implements an RTT estimation mechanism using data
    # transmission times and ack reception times,
    # which is used to implement a congestion timeout (CTO).
    # If implementing LEDBAT in TCP, sender SHOULD use
    # mechanisms described in RFC 6298 [RFC6298],
    # and the CTO would be the same as the RTO.

update_current_delay(delay)
    # Maintain a list of CURRENT_FILTER last delays observed.
    delete first item in current_delays list
    append delay to current_delays list

update_base_delay(delay)
    # Maintain BASE_HISTORY delay-minima. 
    # Each minimum is measured over a period of a minute.
    # 'now' is the current system time
    if round_to_minute(now) != round_to_minute(last_rollover)
        last_rollover = now
        delete first item in base_delays list
        append delay to base_delays list
    else
        base_delays.tail = MIN(base_delays.tail, delay)


]]></artwork></figure>
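			  <t> The window clamping and loss response in the algorithm above can be illustrated with a short Python sketch. It takes the queueing delay estimate as an input rather than computing it, and the constant values, the millisecond units, and the function names are assumptions made only for this example.</t>
			  <figure><artwork><![CDATA[
```python
TARGET = 100           # milliseconds (assumed value)
GAIN = 1.0             # assumed value
MSS = 1500             # bytes (assumed value)
ALLOWED_INCREASE = 1
MIN_CWND = 2


def cwnd_on_ack(cwnd, flightsize, queuing_delay, bytes_newly_acked):
    """Return (new cwnd, new flightsize) after one acknowledgment."""
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_newly_acked * MSS / cwnd
    # Do not let cwnd run far ahead of what was actually in flight.
    max_allowed_cwnd = flightsize + ALLOWED_INCREASE * MSS
    cwnd = min(cwnd, max_allowed_cwnd)
    # Never drop below the minimum window.
    cwnd = max(cwnd, MIN_CWND * MSS)
    return cwnd, flightsize - bytes_newly_acked


def cwnd_on_loss(cwnd):
    """Halve cwnd, but neither below MIN_CWND * MSS nor above its old value."""
    return min(cwnd, max(cwnd / 2, MIN_CWND * MSS))
```
]]></artwork></figure>
			  <t> For example, a sender that was idle with a cwnd of 15000 bytes but only 3000 bytes in flight ends up with a cwnd of 4500 bytes after the next acknowledgment under these assumed values, because max_allowed_cwnd dominates.</t>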

			  <t> The LEDBAT sender seeks to extract the actual delay estimate 
				from the current_delay samples by implementing FILTER() to eliminate any outliers. 
				Different types of filters MAY be used for FILTER() ---
				a NULL filter, that does not filter at all, is a reasonable candidate as well, 
				since LEDBAT's use of a linear controller for cwnd increase and decrease
				may allow it to recover quickly from errors induced by bad samples.
				Another example of a filter is the
				Exponentially-Weighted Moving Average (EWMA) function,
				with weights that enable agile tracking of changing network delay.
				A simple MIN filter applied over a small window may also provide robustness to large delay peaks, 
				as may occur with delayed acks in TCP. 
				Care should be taken that the filter used, while providing robustness to noise, 
				remains sensitive to persistent congestion signals.
			  </t>
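			  <t> The three candidate filters mentioned above could be sketched in Python as follows. The function names and the EWMA weight alpha are hypothetical choices made for this example; this document does not prescribe them. Each function takes the current_delays list, ordered oldest first.</t>
			  <figure><artwork><![CDATA[
```python
def null_filter(current_delays):
    """No filtering: use the most recent sample as-is."""
    return current_delays[-1]


def min_filter(current_delays):
    """Minimum over a small window: robust to large delay peaks."""
    return min(current_delays)


def ewma_filter(current_delays, alpha=0.25):
    """Exponentially weighted moving average; alpha is an assumed weight."""
    estimate = current_delays[0]
    for sample in current_delays[1:]:
        estimate = (1 - alpha) * estimate + alpha * sample
    return estimate


samples = [20, 22, 80, 21]   # milliseconds, with one outlier
print(null_filter(samples), min_filter(samples), ewma_filter(samples))
```
]]></artwork></figure>
			  <t> A larger alpha makes the EWMA track changes faster at the cost of passing more noise through, which is the tradeoff between robustness and sensitivity noted above.</t>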
			  <t>To implement an approximate
				  minimum over the past few minutes,
				  a LEDBAT sender stores BASE_HISTORY separate minima---one each
				  for the last BASE_HISTORY-1 minutes,
				  and one for the running current minute.
				  At the end of the current minute, the window moves---the 
				  earliest minimum is dropped and the latest minimum is added.
				  If the connection is idle for a given minute, 
				  no data is available for the one-way delay and,
				  therefore,
				  a value of +INFINITY has to be stored in the list.
				  If the connection has been idle for BASE_HISTORY minutes, 
				  all minima in the list are thus set to +INFINITY and measurement begins
				  anew. 
				  LEDBAT thus requires an implementation to maintain
				  the base delay list during idle periods.</t>
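			  <t> One possible reading of this bookkeeping is sketched below in Python. The BaseDelayTracker class and its method names are hypothetical, time is assumed to be given in seconds, and idle minutes are filled in lazily when the next sample arrives, which is an implementation choice of this sketch rather than a requirement.</t>
			  <figure><artwork><![CDATA[
```python
import math
from collections import deque

BASE_HISTORY = 10


class BaseDelayTracker:
    """Keeps one delay minimum per minute for the last BASE_HISTORY minutes."""

    def __init__(self, history=BASE_HISTORY):
        self.history = history
        # A bounded deque drops the earliest minimum on each append.
        self.base_delays = deque([math.inf] * history, maxlen=history)
        self.current_minute = None

    def update(self, delay, now):
        minute = int(now // 60)
        if minute == self.current_minute:
            # Same minute: keep the running minimum in the last slot.
            self.base_delays[-1] = min(self.base_delays[-1], delay)
            return
        if self.current_minute is not None:
            # Minutes with no samples are recorded as +INFINITY.
            idle = minute - self.current_minute - 1
            for _ in range(min(idle, self.history)):
                self.base_delays.append(math.inf)
        self.current_minute = minute
        self.base_delays.append(delay)

    def base_delay(self):
        return min(self.base_delays)
```
]]></artwork></figure>
			  <t> After BASE_HISTORY or more idle minutes, every earlier minimum has been replaced by +INFINITY, so the first new sample becomes the base delay and measurement begins anew.</t>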

			  <t> LEDBAT uses a congestion timeout (CTO) to avoid transmitting 
				data during periods of heavy congestion, and to avoid congestion collapse.
				A CTO is used to detect heavy congestion indicated by loss of all outstanding data or acknowledgments,
				resulting in reduction of the cwnd to 1 MSS
				and an exponential backoff of the CTO interval.
				This backoff of the CTO value avoids sending more data into an overloaded queue,
				and also allows the sender to cope with sudden changes in the RTT of the path.
				The function of a CTO is similar to that of a retransmission timeout (RTO) in TCP <xref target='RFC6298'/>,
				but 
				since LEDBAT separates reliability from congestion control,
				a retransmission need not be triggered by a CTO.
				LEDBAT, however, does not preclude a CTO from triggering retransmissions,
				as could be the case if LEDBAT congestion control were to be used with TCP framing and reliability.
			</t>
			  <t> The CTO is a gating mechanism that ensures
				exponential backoff of sending rate under heavy congestion, 
				and it may
				be implemented with or without a timer.  
				An implementation choosing to avoid timers may consider
				using a "next-time-to-send" variable,
				set based on the CTO,
				to control the earliest time a sender may transmit
				without receiving any acks.
			  </t>
			  <t>
				A maximum value MAY be placed on the CTO, and if placed, it MUST be 60 seconds or more.
			  </t>
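			  <t> A timer-free CTO could, for instance, be evaluated lazily whenever the sender is about to transmit, as in the Python sketch below. This is only one possible interpretation: the CongestionTimeout class, its method names, and the deadline variable are hypothetical, times are in seconds, and the RTT-based update of the CTO on each acknowledgment (update_CTO) is deliberately left out.</t>
			  <figure><artwork><![CDATA[
```python
class CongestionTimeout:
    """Lazy, timer-free congestion timeout check (illustrative only)."""

    def __init__(self, mss, cto=1.0, max_cto=60.0):
        self.mss = mss
        self.cto = cto
        self.max_cto = max_cto   # optional cap; 60 seconds or more if used
        self.deadline = None     # None while no data is outstanding

    def on_send(self, now):
        # Start waiting for an ack when data first becomes outstanding.
        if self.deadline is None:
            self.deadline = now + self.cto

    def on_ack(self, now, data_outstanding):
        # Any ack restarts the wait; stop waiting if nothing is in flight.
        self.deadline = now + self.cto if data_outstanding else None

    def cwnd_before_send(self, now, cwnd):
        """Return the cwnd to use for a transmission attempted at `now`."""
        if self.deadline is not None and now >= self.deadline:
            # No ack within a CTO: back off exponentially, collapse cwnd.
            self.cto = min(2 * self.cto, self.max_cto)
            self.deadline = now + self.cto
            return 1 * self.mss
        return cwnd
```
]]></artwork></figure>
			  <t> Because the check runs only when the sender wants to transmit, no timer is needed; the price is that the window reduction is applied at the next send attempt rather than at the instant the CTO expires.</t>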
			  
			<t> We note that LEDBAT assumes random fluctuations in inter-packet 
				transmission times. These fluctuations help in measuring the correct base delay, because the
				bottleneck queue then runs empty from time to time; see section 
				<xref target="fairness"/> for a discussion.
			  </t>

			</section>
			</section>
			<section anchor="params" title="Parameter Values">
				<t>TARGET MUST be 100 milliseconds or less, 
				  and this choice of value is explained further in <xref target="target"/>. 
				  Note that using the same TARGET
				  value across LEDBAT flows enables equitable sharing
				  of the bottleneck bandwidth---flows with a higher TARGET may
				  get a larger share of the bottleneck bandwidth.
				  It is possible to consider 
				  the use of different TARGET values
				  for implementing a relative priority between 
				  two competing LEDBAT flows
				  by setting a higher TARGET value for the higher-priority flow.
				</t>
				  
				  <t>
				  ALLOWED_INCREASE SHOULD be 1, and it MUST be greater than 0.
				  An ALLOWED_INCREASE of 0 results in no cwnd growth at all,
				  and an ALLOWED_INCREASE of 1 allows and limits the cwnd increase 
				  based on flightsize in the previous RTT.
				  An ALLOWED_INCREASE greater than 1 MAY be used when 
				  interactions between LEDBAT and the framing protocol
				  provide a clear reason for doing so.
				  <!--
				  ALLOWED_INCREASE allows for congestion window growth where the
				  flightsize consistently remains lower than the congestion window.
				  For instance,
				  standard TCP
				  for a sender to artificially inflate the congestion window
				  during loss recovery 
				  partially since duplicate acks do not convey ,
				  -->
				  </t>

				  <t> GAIN MUST be set to 1 or less. 
				  A GAIN of 1 limits the
				  maximum cwnd ramp-up to the same rate as 
				  TCP Reno in Congestion Avoidance.
				  While this document specifies the use
				  of the same GAIN for both cwnd increase (when off_target is greater than zero) and 
					decrease (when off_target is less than zero),
				  implementations MAY use a higher GAIN for cwnd decrease than for the increase; 
					our justification follows.
				  When a competing non-LEDBAT flow increases its 
				  sending rate, 
				  the LEDBAT sender may only measure a small amount of additional delay
				  and decrease the sending rate slowly. 
				  To ensure no
				  impact on a competing non-LEDBAT flow, 
				  the LEDBAT flow should decrease its sending rate 
				  at least as quickly as the competing flow increases its sending rate.
				  A higher decrease GAIN MAY be used to allow the LEDBAT flow
				  to decrease its sending rate faster than 
				  the competing flow's increase rate.
				  </t>
				  <t>
				  The size of the base_delays list, BASE_HISTORY, SHOULD be 10.
				  If the actual base delay decreases,
				  due to a route change for instance,
				  a LEDBAT sender adapts immediately,
				  irrespective of the value of BASE_HISTORY.
				  If the actual base delay increases, however,
				  a LEDBAT sender will take BASE_HISTORY 
				  minutes to adapt and may wrongly infer a little more extra delay than intended (TARGET) in the meantime.
				  A value for BASE_HISTORY is thus a tradeoff:
				  a higher value may yield a more accurate measurement when the base delay is unchanging,
				  and a lower value results in a quicker response to actual increase in base delay.
				  </t>
				  <t>
				  A LEDBAT sender uses the current_delays list to 
				  maintain only delay measurements made within an RTT amount of time in the past,
				  seeking to eliminate noise spikes 
				  in its measurement of the current one-way delay through the network.
				  The size of this list, CURRENT_FILTER, may be variable,
				  and depends on the FILTER() function as well as the number of successful measurements made within 
				  an RTT amount of time in the past.
				  The sender should seek to gather enough delay samples in each RTT
				  so as to have statistical confidence in the measurements.
				  While the number of delay samples required for such confidence will vary
				  depending on network conditions,
				  the sender SHOULD use at least 4 delay samples in each RTT,
				  unless the number of samples is lower due to a small congestion window.
				  The value of CURRENT_FILTER will depend on the filter being employed,
				  but CURRENT_FILTER MUST be limited such that samples in the list are
				  not older than an RTT in the past.
				</t>
				<t> 
				  INIT_CWND and
				  MIN_CWND SHOULD both be 2.
				  An INIT_CWND of 2 should
				  help seed FILTER() at the sender
				  when there are no samples at the beginning of a flow, 
				  and 
				  a MIN_CWND of 2
				  allows FILTER() to use more than a single instantaneous delay estimate
				  while not being too aggressive.
				  Slight deviations
				  may be warranted, for example,
				  when these values of INIT_CWND and MIN_CWND interact poorly with the framing protocol.
				  However,
				  INIT_CWND and MIN_CWND
				  MUST be no larger than 
				  the corresponding values specified for TCP <xref target='RFC5681'/>.
				</t>
				  <!--MIN_CWND SHOULD be 2, and it MUST be at least 1.
				  INIT_CWND SHOULD be 2, and it MUST be at least 1.
				  The choice of MIN_CWND and INIT_CWND 
				  are strongly connected to the framing protocol,
				  and the acknowledgment mechanisms used;
				  a larger MIN_CWND and/or INIT_CWND MAY be used
				  if the framing protocol allows it.
				  For instance, TCP senders may use
				  a larger INIT_CWND as specified in <xref target='RFC3390'/>. -->

			</section>
		</section>
		<section title="Understanding LEDBAT Mechanisms">
			<t>This section describes the 
			  delay estimation and window management mechanisms 
			  used in LEDBAT.
			</t>
			
			<section title="Delay Estimation">
			  <t> LEDBAT estimates congestion in the direction of the data flow,
				  and to avoid measuring additional delay from e.g. queue build-up 
				  on the reverse path (or ack path) or reordering,
				LEDBAT uses one-way delay estimates.
				LEDBAT assumes measurements are done with data packets,
				thus avoiding the need for separate measurement packets
				and avoiding the pitfall of
				measurement packets being treated
				differently from the data packets in the network.</t>	

			  <t> End-to-end delay
				can be decomposed into transmission (or serialization) delay,
				propagation (or speed-of-light) delay,
				queueing delay,
				and processing delay.
				On any given path,
				barring some noise,
				all delay components except for queueing delay are constant.
				To observe an increase in the queueing delay in the network,
				a LEDBAT sender separates the queueing delay component 
				from the rest of the end-to-end delay,
				as described below.
			  </t>
  
		  <section anchor="estimating_base_delay" title="Estimating Base Delay">
			  <t>
			  Since queuing delay is always additive to the end-to-end delay, 
			  LEDBAT estimates the
			  sum of the constant delay components,
			  which we call "base delay",
			  to be the minimum delay observed on the end-to-end path. 
			  <!--Using the minimum observed delay 
			  also allows LEDBAT to eliminate noise in the delay estimation,
			  such as due to spikes in processing delay at a node on the path.--></t>
			

		  <t> To respond to true changes in the base delay,
			as can be caused by a route change,
			  LEDBAT uses only recent measurements in estimating
			  the base delay. The duration of the observation window itself 
			  is a tradeoff between 
			  robustness of measurement and responsiveness to change---a
			  larger observation window increases the chances that 
			  the true base delay will be detected (as long as the 
			  true base delay is unchanged),
			  whereas a smaller observation window results 
			  in faster response to true changes in the base delay.</t>
			</section>

			<section title="Estimating Queueing Delay">
				<t>Given that the base delay is constant,
				  the queueing delay is represented by the variable component 
				  of the measured end-to-end delay.
				  LEDBAT measures queueing delay as simply the
				  difference between an end-to-end delay measurement
				  and the current estimate of base delay.
				  The queueing delay should be filtered (depending on the usage 
				  scenario) to eliminate noise in the delay estimation,
				  such as that caused by spikes in processing delay at a node on the path.</t>
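				<t>The subtraction and filtering can be sketched as follows.
				  This is only an illustration: the minimum-of-recent-samples filter,
				  the filter length, and the function names are assumptions,
				  and other filters may suit other usage scenarios better.</t>
				<figure><artwork><![CDATA[
```python
from collections import deque


def make_queueing_delay_estimator(filter_len=4):
    """Return a function that estimates queueing delay (hypothetical helper)."""
    recent = deque(maxlen=filter_len)  # most recent one-way delay samples

    def estimate(delay_sample, base_delay):
        recent.append(delay_sample)
        # Assumed filter: the minimum of the recent samples, which discards
        # isolated spikes such as a burst of processing delay.
        filtered = min(recent)
        # The estimate is clamped so that it is never negative.
        return max(0.0, filtered - base_delay)

    return estimate
```
]]></artwork></figure>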
			</section>
			</section>

			<section title="Managing the Congestion Window">
			  <section title="Window Increase: Probing For More Bandwidth">
				<t> A LEDBAT sender increases its
				  congestion window if the queuing delay is smaller than a target value,
				  proportionally to the relative difference 
				  between the current queueing delay 
				  and the delay target.
				  To be friendly to competing TCP flows,
				  we set this highest rate of window growth
				  to be the same as TCP's.
				  In other words,
				  a LEDBAT flow thus
				  never ramps up faster than a competing TCP flow over the same path.
				  The closer the extra delay gets to the TARGET value, the more slowly LEDBAT
				  increases the window.
				  </t>
			  </section>
				
			  <section title="Window Decrease: Responding To Congestion">
				<t> When the sender's queueing delay estimate is higher than the target, 
				  the LEDBAT flow's rate should be reduced.
				  LEDBAT uses a simple linear controller to determine the sending rate
				  as a function of the delay estimate, where the
				  response is proportional to the difference between the
				  current queueing delay estimate and the target. This allows the window
				  to be decreased only slightly 
				  while probing and leads to a fairly stable state 
				  with high link utilization.
				  In limited experiments with BitTorrent nodes, 
				  this controller seems to work well.</t>
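				<t>The proportional increase and decrease can be read as one linear controller.
				  The sketch below is a hedged illustration under stated assumptions:
				  the gain of 1, the segment size, the minimum window, and all function
				  and parameter names are chosen for the example only.</t>
				<figure><artwork><![CDATA[
```python
def adjust_cwnd(cwnd, queueing_delay, bytes_acked,
                target=0.100, gain=1.0, mss=1452, min_cwnd=2):
    """Linear controller sketch; cwnd and bytes_acked are in bytes."""
    # Positive below the target (grow), negative above it (shrink),
    # and exactly 1.0 when the queueing delay estimate is zero.
    off_target = (target - queueing_delay) / target
    # With gain 1 and off_target 1, one window's worth of acks adds
    # about one segment, which is the assumed TCP-like maximum ramp-up.
    cwnd += gain * off_target * bytes_acked * mss / cwnd
    # Assumed floor so the flow can always keep a few packets in flight.
    return max(cwnd, min_cwnd * mss)
```
]]></artwork></figure>
				<t>In this sketch the window is unchanged when the queueing delay equals the target,
				  and the size of each step shrinks as the delay approaches the target from either side.</t>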

				<t> Unlike TCP-like loss-based
				  congestion control, LEDBAT
				  seeks to 
				  avoid losses and so 
				  a LEDBAT sender is not expected to normally 
				  rely on losses to determine the sending rate. 
				  However, when data loss does occur, 
				  LEDBAT must respond as standard
				  TCP does;  
				  even if the queueing delay estimates indicate otherwise,
				  a loss is assumed to be a strong indication of congestion.
				  Thus, 
				  to deal with severe congestion when
				  packets are dropped in the network,
				  and to provide a fallback against
				  incorrect queuing delay estimates,
				  a LEDBAT sender halves its congestion window
				  when a loss event is detected.
				  As with TCP NewReno, 
				  LEDBAT reduces its cwnd by half at
				  most once per RTT. 
				</t>
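				<t>The loss response can be sketched as below.
				  The class name, the minimum window, and the way time and RTT are passed in
				  are assumptions made for illustration.</t>
				<figure><artwork><![CDATA[
```python
class LossResponder:
    """Halve cwnd on loss, at most once per RTT (hypothetical helper)."""

    def __init__(self, mss=1452, min_cwnd=2):
        self.mss = mss
        self.min_cwnd = min_cwnd      # assumed floor, in segments
        self.last_reduction = None    # time of the last halving, if any

    def on_loss(self, cwnd, now, rtt):
        """Return the new cwnd (bytes) after a loss detected at `now`."""
        if self.last_reduction is not None and now - self.last_reduction < rtt:
            # Already reduced within this RTT: treat the losses as one event.
            return cwnd
        self.last_reduction = now
        return max(cwnd / 2, self.min_cwnd * self.mss)
```
]]></artwork></figure>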
			  </section>

			</section>
			<section anchor="target" title="Choosing The Queuing Delay Target">
			  <t>
				The queueing delay target is a tradeoff.
				A target that is too low might result in under-utilization of the bottleneck link,
				because of noise in the delay measurement (e.g., in a mobile scenario),
				and may also be more sensitive to error in the measured delay.
			  <!--
				<t>Consider the queuing delay at the queue associated with the bottleneck link.  
				  This delay is the extra delay induced by congestion
				control. One of LEDBAT's design goals is to keep this delay
				low. However, when this delay is zero, the queue is
				empty and the link
				is thus not saturated. Hence, our design goal is to
				keep the queuing delay low, but non-zero.</t>

				<t>How low do we want the queuing delay to be? Because
				another design goal is to be deployable on networks
				with only simple FIFO queuing and drop-tail
				discipline, we can't rely on explicit signaling for
				the queuing delay.  So we're going to estimate it
				using external measurements. The external measurements
				will have an error at least on the order of best-case
				scheduling delays in the OSes or other devices on the network path. 
				There's thus a good
				reason to try to make the queuing delay larger than
				this error.</t>
				-->
				The International Telecommunication Union's (ITU's)
				Recommendation G.114 defines a
				delay of 150 ms to be acceptable for most user voice
				applications. Thus the extra delay induced by LEDBAT must be
				below 150 ms to reduce impact on delay-sensitive applications.
				If the TARGET value is larger than the maximum delay the queue can 
				induce, LEDBAT will fall back to the same behavior as standard TCP 
				(see <xref target="competing_flows"/>).</t>
				
				<t> Our recommendation of 100 ms or less as the target
				  is based on these considerations.
				  Anecdotal evidence indicates that this value works well: 
				  LEDBAT has been implemented and successfully 
				  deployed with a target value of 100 ms
				  in two BitTorrent implementations: in BitTorrent DNA as the
				  exclusive congestion control mechanism and in uTorrent as an
				  experimental mechanism.
				</t>
		</section>
		</section>
		

		<section title="Discussion">
		  <section title="Framing and Ack Frequency Considerations">
			  <t>While the actual framing and wire format of the protocols
				using LEDBAT are outside the scope of this document, 
				we briefly consider the
				data framing and ack frequency needs
				of LEDBAT mechanisms.</t>

			  <t> To compute the data path's one-way delay,
				our discussion of LEDBAT assumes a framing that
				allows the sender
				to timestamp packets and 
				for the receiver to convey the measured one-way delay 
				back to the sender in ack packets.
				LEDBAT does not require this particular method,
				but it does require unambiguous delay estimates using data and ack packets.
				</t>

			  <t>A LEDBAT receiver may send an ack as frequently as 
				one for every data packet received
				or less frequently;
				however, a LEDBAT receiver MUST transmit 
				at least one ack in every RTT.
			  </t>

		  </section>

		  <section title="Competing With TCP Flows" anchor="competing_flows">
			<t> LEDBAT is designed to
			  respond to congestion indications earlier
			  than loss-based TCP.
			  A LEDBAT flow is more aggressive when the 
			  queueing delay estimate is lower;
			  since the queueing delay estimate is non-negative,
			  LEDBAT is most aggressive when its queuing delay
			  estimate is zero. 
			  In this case, 
			  LEDBAT ramps up its congestion window at the same rate as TCP does.
			  LEDBAT reduces its rate earlier than TCP does,
			  always halving the congestion window on loss.  
			  Thus,
			  in the worst case where the delay estimates are completely and consistently off,
			  a LEDBAT flow falls back to TCP behavior: it is at most as aggressive as a TCP flow and halves its congestion window on loss.</t>
			  <!-- If LEDBAT can ramp down faster than the
			  loss-based connection ramps up, LEDBAT will
			  yield. LEDBAT ramps down when queuing delay estimate
			  exceeds the target: the more the excess, the faster the
			  ramp-down.  When the loss-based connection is standard
			  TCP, LEDBAT will yield at precisely the same rate as TCP
			  is ramping up when the queuing delay is double the
			  target.
			  -->
		  </section>

		  <section anchor="fairness" title="Fairness Among LEDBAT Flows">
			<t>The primary design goals of LEDBAT are focussed on the aggregate
			  behavior of LEDBAT flows when they compete with standard
			  TCP.  Since LEDBAT is designed for
			  background traffic, we consider link utilization to be 
			  more important than fairness amongst LEDBAT flows. 
			  Nevertheless, we now consider fairness issues that might arise 
			  amongst competing LEDBAT flows.</t>

			<t>LEDBAT as described so far lacks a mechanism
			  specifically designed to equalize utilization 
			  amongst LEDBAT flows. Anecdotally observed behavior of existing
			  implementations indicates that a rough equalization
			  does occur, since in most environments some amount 
			  of randomness in the inter-packet transmission times exists, as explained further below.</t>

			<t>Delay-based congestion control systems suffer from the 
			  possibility of late-comers incorrectly
			  measuring and using a higher base-delay than an active flow that started earlier.
			  Suppose a LEDBAT flow is the only flow on the bottleneck, which the flow saturates,
			  steadily maintaining the queueing delay at a target delay.
			  When a new LEDBAT flow arrives,
			  it might incorrectly measure the current end-to-end delay,
			  including the queueing delay being maintained by the first LEDBAT flow,
			  as its base delay,
			  and the incoming flow might now effectively seek to build on top of the existing, 
			  already maximal queueing delay.
			  As the second flow builds up, 
			  the first flow sees the true queueing delay and backs off,
			  while the late-comer keeps building up, using up the entire link's capacity;
			  this advantage is called the "late-comer's advantage".
			</t>
			<t> In the worst case, if the first flow yields at the same rate as the new flow 
			  increases its sending rate, 
			  the new flow will see constant end-to-end delay, which it assumes is the base delay,
			  until the first flow backs off completely. 
			  As a result, 
			  by the time the second flow stops increasing its cwnd,
			  it would have added twice the target queueing delay to the network.
			</t>
			  
			<t> This advantage can be reduced 
			  if the first flow yields quickly enough to empty the bottleneck queue
			  faster than the incoming flow increases its occupancy in the queue;
			  as a result, the late-comer might measure a delay closer to the base delay.
			  While such a reduction might be achieved through a multiplicative decrease 
			  of the congestion window, this might cause stronger fluctuations in flow throughput
			  during steady state.
			  Thus we do not recommend a multiplicative decrease scheme.
			</t>

			<t>We note that in certain use-case scenarios,
			  it is possible for a later LEDBAT flow to gain an
			  unfair advantage over an existing one <xref target="Car10"/>.
			  In practice, this concern may be
			  alleviated by the burstiness of network traffic: all
			  that's needed to measure the base delay is one small gap
			  in transmission schedules between the LEDBAT
			  flows. These gaps can occur for a number of reasons such
			  as latency introduced due to application sending patterns,
			  OS scheduling at the
			  sender, processing delay at the sender or any network
			  node, and link contention. When such a gap occurs in the
			  first sender's transmission while the late-comer is
			  starting, the base delay is immediately measured correctly.
			  With a small number of LEDBAT flows, 
			  system noise may sufficiently regulate the late-comer's
			  advantage.
			</t>
		</section>
	</section>
		
	<section title="IANA Considerations">
	  <t>There are no IANA considerations for this document.</t>
	</section>

        <section title="Security Considerations">
          <t>A network on the path might choose to cause higher
        	delay measurements than the real queuing delay so that
        	LEDBAT backs off even when there's no congestion present.
        	While shaping of traffic into an artificially narrow bottleneck 
			by increasing the queueing delay
        	cannot be trivially counteracted, 
			a protocol using LEDBAT should seek to minimize the risk of such an attack
			by authenticating the timestamp and delay fields in the packets.
		  </t>
		  <t>
			  LEDBAT is not known to introduce any new concerns with privacy, integrity,
			  or other security issues for flows that use it.  It should be compatible with use of IPsec and TLS/DTLS.
		  </t>
        </section>

		<section title="Acknowledgements">
		  <t> We thank folks in the LEDBAT working group for their comments and feedback.
			Special thanks to Murari Sridharan and Rolf Winter for their patient and untiring shepherding.
		  </t>
		</section>
	</middle>

    <back>
        <references title='Normative References'>
		&rfc2119;&rfc6298;&rfc4821;&rfc5681;&rfc3168;
		</references>
        <references title='Informative References'>
		<reference anchor="Bra94">
			<front>
				<title>TCP Vegas: New techniques for congestion detection and avoidance</title>
				<author initials="L" surname="Brakmo"><organization></organization></author>
				<author initials="S" surname="O'Malley"><organization></organization></author>
				<author initials="L" surname="Peterson"><organization></organization></author>
				<date year="Proceedings of SIGCOMM '94, pages 24-35, August 1994" />
			</front>
		</reference>
		<reference anchor="Low02">
			<front>
				<title>Understanding TCP Vegas: A Duality Model</title>
				<author initials="S" surname="Low"><organization></organization></author>
				<author initials="L" surname="Peterson"><organization></organization></author>
				<author initials="L" surname="Wang"><organization></organization></author>
				<date year="JACM 49 (2), March 2002" />
			</front>
		</reference>
		<reference anchor="Car10">
			<front>
				<title>Rethinking Low Extra Delay Background Transport Protocols</title>
				<author initials="G" surname="Carofiglio"><organization></organization></author>
				<author initials="L" surname="Muscariello"><organization></organization></author>
				<author initials="D" surname="Rossi"><organization></organization></author>
				<author initials="C" surname="Testa"><organization></organization></author>
				<author initials="S" surname="Valenti"><organization></organization></author>
				<date year="arXiv:1010.5623v1, September 2010" />
			</front>
		</reference>
		<reference anchor="Sch10">
			<front>
				<title>Out of my Way -- Evaluating Low Extra Delay Background Transport in an ADSL Access Network</title>
				<author initials="J" surname="Schneider"><organization></organization></author>
				<author initials="J" surname="Wagner"><organization></organization></author>
				<author initials="R" surname="Winter"><organization></organization></author>
				<author initials="H" surname="Kolbe"><organization></organization></author>
				<date year="Proceedings of  22nd International Teletraffic Congress (ITC22), September 2010" />
			</front>
		</reference>
		<!--
		<reference anchor="Kur00">
			<front>
				<title>Fairness Comparisons Between TCP Reno and TCP Vegas for 
					Future Deployment of TCP Vegas</title>
				<author initials="K" surname="Kurata"><organization></organization></author>
				<author initials="G" surname="Hasegawa"><organization></organization></author>
				<author initials="M" surname="Murata"><organization></organization></author>

				<date year="Proceedings of INET 2000, July 2000" />
			</front>
		</reference>
		-->
	</references>

		<section anchor="app-additional" title="Timestamp errors">
		  <t>One-way delay measurement needs to deal with timestamp
			errors. We'll use the same locally linear clock model and
			the same terminology as Network Time Protocol (NTP). This
			model is valid for any differentiable clocks.  NTP uses
			the term "offset" to refer to difference from true time
			and "skew" to refer to difference of clock rate from the
			true rate.  The clock will thus have a fixed offset from
			the true time and a skew.  We'll consider what we need to
			do about the offset and the skew separately.</t>
				
		  <section title="Clock offset">
			<t>First, consider the case of zero skew. The offset of
			  each of the two clocks shows up as a fixed error in
			  one-way delay measurement. The difference of the offsets
			  is the absolute error of the one-way delay estimate. We
			  won't use this estimate directly, however. We'll use the
			  difference between that and a base delay. Because the
			  error (difference of clock offsets) is the same for the
			  current and base delay, it cancels from the queuing
			  delay estimate, which is what we'll use. Clock offset is
			  thus irrelevant to the design.</t>
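			<t>The cancellation can be checked numerically with a small sketch.
			  The function name and the sample values are hypothetical;
			  the model assumes zero skew, as in the paragraph above.</t>
			<figure><artwork><![CDATA[
```python
def measured_delay(true_delay, sender_offset, receiver_offset):
    """One-way delay as seen through two offset clocks (zero skew assumed)."""
    # The receiver stamps arrival with its clock and the sender stamps
    # departure with its own, so the error is the difference of the offsets.
    return true_delay + receiver_offset - sender_offset


# Hypothetical values: sender an hour ahead, receiver 12.5 s behind.
current = measured_delay(0.070, 3600.0, -12.5)
base = measured_delay(0.040, 3600.0, -12.5)
queueing_delay = current - base  # close to 0.030: the offsets drop out
```
]]></artwork></figure>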
		  </section>

		  <section title="Clock skew">
			  <t>The clock skew manifests in a
			  linearly changing error in the time estimate.  For a given pair of
			  clocks, the difference in skews is the skew of the one-way delay
			  estimate.  Unlike the offset, this no longer cancels in the
			  computation of the queuing delay estimate.  On the other hand, while
			  the offset could be huge, with some clocks off by minutes or even
			  hours or more, the skew is typically small.  For example, NTP
			  is designed to work with most clocks, yet it gives up when the skew
			  is more than 500 parts per million (PPM).  Typical skews of clocks
			  that have never been trained seem to often be around 100-200 PPM.
			  Previously trained clocks could have 10-20 PPM skew due to
			  temperature changes.  A 100-PPM skew means accumulating 6
			  milliseconds of error per minute. The base delay updates mostly 
			  take care of clock skew unless the skew is unusually high or extreme 
			  values have been chosen for TARGET and BASE_HISTORY so that the clock skew 
			  accumulated over BASE_HISTORY minutes is larger than the TARGET. </t>
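			  <t>The arithmetic behind these figures can be sketched as follows.
			  The function names are hypothetical, and the ten-minute history used
			  in the usage note is an assumed example value.</t>
			  <figure><artwork><![CDATA[
```python
def accumulated_skew_error(skew_ppm, seconds):
    """Timing error (seconds) built up by a skew given in parts per million."""
    return skew_ppm * 1e-6 * seconds


def skew_exceeds_target(skew_ppm, base_history_minutes, target):
    """True if the skew accumulated over the base delay history exceeds target."""
    window = base_history_minutes * 60.0
    return accumulated_skew_error(skew_ppm, window) > target
```
]]></artwork></figure>
			  <t>For instance, accumulated_skew_error(100, 60) is about 0.006 seconds.
			  With an assumed history of 10 minutes and a target of 100 ms,
			  a 100-PPM skew accumulates about 60 ms and stays below the target,
			  whereas a 200-PPM skew accumulates about 120 ms and exceeds it.</t>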
					
		  <t>Clock skew can be in two directions: either the sender's clock is
			  faster than the receiver's, or vice versa. </t>
			
		  <t>If the sender's clock is faster, the one-way delay measurement 
			is progressively reduced by the clock drift over time. 
			Whenever there is no additional delay, the base delay is 
			updated with a smaller one-way delay value and the skew is compensated. 
			<!--This will happen continuously as LEBDAT is design to keep the queue 
			empty.--> If a competing flow introduces additional queueing delay, 
			LEDBAT will get out of the way quickly anyway, and an overestimated 
			one-way delay will just speed up the back-off.</t>

		 <t>When the receiver's clock runs faster, the raw delay
			 estimate will drift up with time. This can suppress the throughput
			 unnecessarily. In this case, a skew correction mechanism can
			 be beneficial. Further considerations based on a deployed implementation
			 and LEDBAT-specific preconditions are given in the next section.</t>

	          </section>
          
		<section title="Clock skew correction mechanism">
		  <t>For documentation purposes, the following paragraph describes the clock skew correction mechanism
			  deployed in the BitTorrent implementation.</t>
            
		  <t>In the BitTorrent implementation, the receiver sends back raw (sending 
			  and receiving) timestamps. Using this information,
			the sender can estimate the one-way delays in both 
			  directions,
			and can also compute and maintain an estimate of the base delay as 
			would be observed by the receiver.
			If the sender 
			  detects the receiver reducing its base delay, it infers that this reduction is due to
			  clock drift. The sender can compensate by increasing its own base 
			  delay by the same amount. To apply this mechanism, however, 
			timestamps need to be transmitted in both directions.</t>
            
		  <t> The following considerations can serve as a reference for an alternative 
			  implementation:
			  <list style="symbols">

		 <t>Skew correction with faster virtual clock:<vspace blankLines="0" /> 
			  Since having a faster clock on the sender will continuously update the base 
			  delay, a faster virtual clock for sender timestamping can be applied. This 
			  virtual clock can be computed from the default machine clock through
			  a linear transformation. For example, with a 500 PPM speed-up the sender's clock 
			  is very likely to be faster than any receiver's clock and thus LEDBAT will benefit 
			  from the implicit correction when updating the base delay.<vspace blankLines="1" /> </t>
            
		  <t>Skew correction with estimating drift:<vspace blankLines="0" /> 
			  With LEDBAT the history of base delay minima is already kept for each minute.
			  This can provide a base to compute the clock skew difference between the 
			  two hosts. The slope of a linear function fitted to the set of base delay minima 
			  gives an estimate of the clock skew. This estimation can be used to correct 
			  the clocks. If the other endpoint is doing the same, the clock should be 
			  corrected by half of the estimated skew amount.<vspace blankLines="1" /></t> 
		  
		  <t>Byzantine skew correction:<vspace blankLines="0" /> 
			  When it is known that each host maintains long-lived connections to a
			  number of different other hosts, a byzantine scheme can be used to
			  estimate the skew with respect to the true time.  Namely, calculate
			  the skew difference for each of the peer hosts as described with the 
			  previous approach, then take the median of the skew differences.
			  While this scheme is not universally applicable, it combines well
			  with other schemes, since it is essentially a clock training
			  mechanism.  The scheme also acts the fastest, since the state is
			  preserved between connections.</t>
		  
		  </list></t>
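		  <t>The drift-estimation and median steps listed above might be sketched as follows.
			  This is an illustration under assumptions, not the deployed implementation:
			  the function names are hypothetical, the minima are assumed to be evenly spaced
			  one per minute, and an ordinary least-squares fit is used for the linear function.</t>
		  <figure><artwork><![CDATA[
```python
from statistics import median


def estimate_skew_ppm(minima, interval=60.0):
    """Least-squares slope of base delay minima (seconds), in PPM.

    `minima` holds one minimum per `interval` seconds, oldest first.
    """
    n = len(minima)
    if n < 2:
        raise ValueError("need at least two minima to fit a slope")
    times = [i * interval for i in range(n)]
    mean_t = sum(times) / n
    mean_d = sum(minima) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, minima))
    den = sum((t - mean_t) ** 2 for t in times)
    # Seconds of delay drift per second of elapsed time, scaled to PPM.
    return num / den * 1e6


def byzantine_skew_ppm(per_peer_skews_ppm):
    """Median of the skew differences measured against several peers."""
    return median(per_peer_skews_ppm)
```
]]></artwork></figure>
		  <t>In this sketch, minima that rise by 6 ms per minute yield an estimate of about 100 PPM,
			  and the median keeps a single badly skewed peer from dominating the result.</t>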

		  </section>
		</section>
    </back>

</rfc>