<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc2581 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2581.xml'>
]>
<rfc category="exp" ipr="trust200902"
docName="draft-ietf-ledbat-congestion-03.txt">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<front>
<title abbrev="LEDBAT">Low Extra Delay Background Transport (LEDBAT)</title>
<author initials='S' surname="Shalunov" fullname='Stanislav Shalunov'>
<organization>BitTorrent Inc</organization>
<address>
<postal>
<street>612 Howard St, Suite 400</street>
<city>San Francisco</city> <region>CA</region> <code>94105</code>
<country>USA</country>
</postal>
<email>shalunov@bittorrent.com</email>
<uri>http://shlang.com</uri>
</address>
</author>
<author initials='G' surname="Hazel" fullname='Greg Hazel'>
<organization>BitTorrent Inc</organization>
<address>
<postal>
<street>612 Howard St, Suite 400</street>
<city>San Francisco</city> <region>CA</region> <code>94105</code>
<country>USA</country>
</postal>
<email>greg@bittorrent.com</email>
</address>
</author>
<author initials='J' surname="Iyengar" fullname='Janardhan Iyengar'>
<organization>Franklin and Marshall College</organization>
<address>
<postal>
<street>415 Harrisburg Ave.</street>
<city>Lancaster</city> <region>PA</region> <code>17603</code>
<country>USA</country>
</postal>
<email>jiyengar@fandm.edu</email>
</address>
</author>
<date/>
<area>Transport</area>
<workgroup>LEDBAT WG</workgroup>
<abstract>
<t>LEDBAT is an experimental delay-based congestion control algorithm
that attempts to utilize the available bandwidth on an end-to-end path
while limiting the consequent increase in queueing delay on the path.
LEDBAT uses changes in one-way delay measurements
to limit congestion induced in the network by the LEDBAT flow.
LEDBAT is designed largely for use by background bulk-transfer applications;
it is designed to be no more aggressive than TCP congestion control
and yields in the presence of competing TCP flows,
thus reducing interference with the network performance of the competing flows.</t>
</abstract>
</front>
<middle>
<section title="Requirements notation">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as
described in <xref target="RFC2119"/>.</t>
</section>
<section title="Introduction">
<t>
TCP congestion control <xref target="RFC2581"/>,
the predominant congestion control mechanism used on the Internet,
aims to share bandwidth at a bottleneck link equitably
among flows competing at the bottleneck.
While TCP works well for many applications,
some applications,
such as software updates and file sharing,
prefer to use bandwidth available in the network
without interfering with the network performance of
interactive applications.
Such "background" traffic
can yield bandwidth to TCP-based "foreground" traffic
by reacting earlier than TCP to congestion signals.
</t>
<t> LEDBAT is an experimental delay-based congestion control mechanism
that allows applications that send large amounts of data,
such as peer-to-peer applications,
to operate in the background
without interfering with the performance of interactive applications,
particularly over links with deep buffers,
such as residential uplinks.
LEDBAT uses one-way delay measurements to detect congestion on the data path,
and keeps latency across the tight link in the end-to-end path low
while attempting to utilize the available bandwidth on the end-to-end path. </t>
<!--
The predominant congestion control mechanism used on the Internet,
TCP congestion control [XXXRFC2581],
requires data loss to detect congestion.
A TCP sender increases its congestion window [XXXRFC2581]
until a loss occurs,
which,
in the absence of Active Queue Management (AQM),
occurs only
when the queue at the
bottleneck link
on the end-to-end path overflows.
Even with AQM,
The queueing delay at the bottleneck link
increases significantly before
TCP responds to congestion at the tight link.
This increased delay can be significant;
default parameters on customer-side ADSL modems [XXXcite]
can result in seconds of queueing delay on the ADSL uplink alone.
While these large queueing delays have no known
benefit, they have substantial drawbacks for interactive
applications---"lag" increases for interactive games,
and voice/video communication suffers from the consequently high roundtrip times.
-->
<!--
It has been deployed by BitTorrent
in the wild first with the BitTorrent DNA client (a P2P-based CDN) and now with the uTorrent
client. This mechanism not only allows to keep delay across a bottleneck low, but also
yields quickly in the presence of competing traffic with loss-based congestion
control.</t>
<t>Beyond its utility for P2P, LEDBAT enables other advanced networking applications to
better get out of the way of interactive apps.</t>
<t>In addition to direct and immediate benefits for P2P and other application that can
benefit from scavenger service, LEDBAT could point the way for a possible future evolution
of the Internet where loss is not part of the designed behavior and delay is
minimized.</t>
-->
<section title="Design Goals">
<t>As a "scavenger" mechanism for the Internet, LEDBAT's design goals are to:
<list style="numbers">
<t>Keep delay low when no other traffic is present </t>
<t>Add little to the queuing delays induced by TCP traffic</t>
<t>Quickly yield to traffic sharing the same bottleneck queue that uses standard TCP congestion control </t>
<t>Utilize end-to-end available bandwidth </t>
<t>Operate well in networks with FIFO queuing with drop-tail discipline</t>
</list>
</t>
</section>
<section title="Applicability">
<t> LEDBAT is a "scavenger" congestion control mechanism---a LEDBAT flow attempts to utilize all available bandwidth
and yields quickly to a competing TCP flow---and is primarily motivated by background bulk-transfer applications,
such as peer-to-peer file transfers and software updates.
It can be used for any application that needs to run in the "background",
to reduce the application's impact on the network
and on other interactive network applications.</t>
<t> LEDBAT can be used with any
transport protocol
capable of carrying timestamps and
acknowledging data frequently---LEDBAT can be easily used with TCP, SCTP, and DCCP.</t>
<!--
<t> The constants specified in this document
are based on XXX. (XXX What contexts is LEDBAT applicable in? Residential networks? Others?)
</t>-->
</section>
</section>
<section title="LEDBAT Congestion Control">
<section title="Overview">
<t> A TCP sender increases its congestion window
until a loss occurs,
which,
in the absence of any Active Queue Management (AQM) in the network,
occurs only
when the queue at the
bottleneck link
on the end-to-end path overflows.
Since packet loss at the bottleneck link is often preceded by an increase in the queueing delay at the bottleneck link,
LEDBAT congestion control uses this increase in queueing delay as an early signal of congestion,
enabling it to respond to congestion earlier than TCP,
and enabling it to yield bandwidth to a competing TCP flow.
</t>
<t> LEDBAT employs one-way delay measurements
to estimate queueing delay.
When the estimated queueing delay
is less than a pre-determined target,
LEDBAT infers that the network is not yet congested,
and increases its sending rate to utilize any spare capacity in the network.
When the estimated queueing delay
becomes greater than a pre-determined target,
LEDBAT decreases its sending rate quickly
as a response to potential congestion in the network.</t>
</section>
<section title="Preliminaries">
<t> For the purposes of explaining LEDBAT,
we assume a transport sender that uses fixed-size
segments and a receiver that acknowledges each segment separately.
It is straightforward to apply the mechanisms described here
with variable-sized segments
and with delayed acknowledgments.
A LEDBAT sender uses a congestion window (cwnd)
that gates the amount of data that the sender can send into the network in one RTT.
</t>
<t>LEDBAT requires that each data segment carry a "timestamp" from the sender.
The receiver uses this timestamp to compute the one-way delay from the sender,
and sends the computed value back to the sender.</t>
<t> In addition to the LEDBAT mechanism described below,
we note that a slow start mechanism can be used as specified in <xref target="RFC2581"/>.
Since slow start leads to faster increase in the window than
that specified in LEDBAT,
conservative congestion control implementations employing LEDBAT
may skip slow start altogether
and start with an initial window of XXX MSS.</t>
</section>
<section title="Receiver-Side Operation">
<t> A LEDBAT receiver operates as follows:
<figure><artwork><![CDATA[
on data_packet:
    remote_timestamp = data_packet.timestamp
    acknowledgement.delay = local_timestamp() - remote_timestamp
    # fill in other fields of acknowledgement
    acknowledgement.send()
]]></artwork></figure></t>
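<t>As an illustration only, the receiver-side step can be sketched in Python.
The DataPacket and Acknowledgement types, the microsecond unit,
and the use of a monotonic local clock are assumptions of this sketch;
this document does not define a wire format.</t>
<figure><artwork><![CDATA[
```python
import time
from dataclasses import dataclass


@dataclass
class DataPacket:
    timestamp_us: int  # sender's clock reading when the segment was sent


@dataclass
class Acknowledgement:
    delay_us: int  # one-way delay as measured by the receiver


def on_data_packet(packet, now_us=None):
    """Build the acknowledgement for one received data segment."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    # Difference of two unsynchronized clocks: it may be negative or
    # very large, because only changes in it matter to the sender.
    return Acknowledgement(delay_us=now_us - packet.timestamp_us)
```
]]></artwork></figure>
<t>Because the two clocks are not synchronized, the reported value is
meaningful only relative to other values reported on the same connection.</t>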
</section>
<section title="Sender-Side Operation">
<section title="An Overview">
<t>As a first approximation, a LEDBAT sender operates as shown below.
TARGET is the maximum queueing delay that LEDBAT itself can introduce in the network,
and GAIN determines the rate at which the congestion window changes;
both constants are specified later.
</t>
<figure><artwork><![CDATA[
on initialization:
    base_delay = +infinity

on acknowledgement:
    current_delay = acknowledgement.delay
    base_delay = min(base_delay, current_delay)
    queuing_delay = current_delay - base_delay
    off_target = TARGET - queuing_delay
    cwnd += GAIN * off_target / cwnd
]]></artwork></figure>
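<t>A minimal Python sketch of this simplified sender follows.
It applies the update rule literally.
Expressing delays in milliseconds and leaving the unit of cwnd unspecified
are assumptions of the sketch, and no lower bound on cwnd is applied.</t>
<figure><artwork><![CDATA[
```python
import math

TARGET_MS = 100.0  # value from the Parameter Values section
GAIN = 1.0


class SimpleSender:
    def __init__(self, cwnd):
        self.cwnd = float(cwnd)
        self.base_delay = math.inf

    def on_acknowledgement(self, delay_ms):
        # The smallest delay seen so far approximates the base delay.
        self.base_delay = min(self.base_delay, delay_ms)
        queuing_delay = delay_ms - self.base_delay
        # Positive below the target (grow), negative above it (shrink).
        off_target = TARGET_MS - queuing_delay
        self.cwnd += GAIN * off_target / self.cwnd
        return self.cwnd
```
]]></artwork></figure>
<t>With an initial cwnd of 10, a first sample of 30 ms sets the base delay and
grows the window; a later sample of 130 ms sits exactly on the target and
leaves the window unchanged.</t>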
</section>
<section title="The Complete Sender Algorithm">
<t>The simplified mechanism above ignores noise filtering and base expiration.
The full sender-side algorithm is specified below:</t>
<t><figure><artwork><![CDATA[
on initialization:
    set all NOISE_FILTER delays used by current_delay() to +infinity
    set all BASE_HISTORY delays used by base_delay() to +infinity
    last_rollover = -infinity # More than a minute in the past.

on acknowledgement:
    delay = acknowledgement.delay
    update_base_delay(delay)
    update_current_delay(delay)
    queuing_delay = current_delay() - base_delay()
    off_target = TARGET - queuing_delay + random_input()
    cwnd += GAIN * off_target / cwnd
    # flight_size() is the amount of currently not acked data.
    max_allowed_cwnd = ALLOWED_INCREASE + TETHER*flight_size()
    cwnd = min(cwnd, max_allowed_cwnd)

random_input()
    # random() is a PRNG between 0.0 and 1.0
    # NB: RANDOMNESS_AMOUNT is normally 0
    RANDOMNESS_AMOUNT * TARGET * ((random() - 0.5)*2)

update_current_delay(delay)
    # Maintain a list of NOISE_FILTER last delays observed.
    forget the earliest of NOISE_FILTER current_delays
    add delay to the end of current_delays

current_delay()
    min(the NOISE_FILTER delays stored by update_current_delay)

update_base_delay(delay)
    # Maintain BASE_HISTORY min delays. Each represents a minute.
    if round_to_minute(now) != round_to_minute(last_rollover)
        last_rollover = now
        forget the earliest of base_delays
        add delay to the end of base_delays
    else
        last of base_delays = min(last of base_delays, delay)

base_delay()
    min(the BASE_HISTORY min delays stored by update_base_delay)
]]></artwork></figure>
</t>
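<t>The pseudocode above can be transcribed into Python roughly as follows.
This is a sketch under stated assumptions, not a reference implementation:
delays are in milliseconds, the caller supplies the current time in seconds
and the flight size in packets, and round_to_minute is interpreted as
truncation to a whole minute.</t>
<figure><artwork><![CDATA[
```python
import math
import random
from collections import deque

TARGET = 100.0           # milliseconds
GAIN = 1.0
BASE_HISTORY = 10
NOISE_FILTER = 1
ALLOWED_INCREASE = 1.0   # packets
TETHER = 1.5
RANDOMNESS_AMOUNT = 0.0


class LedbatSender:
    def __init__(self, cwnd):
        self.cwnd = float(cwnd)
        # deque(maxlen=...) forgets the earliest entry on append.
        self.current_delays = deque([math.inf] * NOISE_FILTER,
                                    maxlen=NOISE_FILTER)
        self.base_delays = deque([math.inf] * BASE_HISTORY,
                                 maxlen=BASE_HISTORY)
        self.last_rollover_minute = None  # no rollover has happened yet

    def random_input(self):
        return RANDOMNESS_AMOUNT * TARGET * ((random.random() - 0.5) * 2)

    def update_current_delay(self, delay):
        self.current_delays.append(delay)

    def current_delay(self):
        return min(self.current_delays)

    def update_base_delay(self, delay, now_s):
        minute = int(now_s // 60)  # assumed meaning of round_to_minute
        if minute != self.last_rollover_minute:
            self.last_rollover_minute = minute
            self.base_delays.append(delay)
        else:
            self.base_delays[-1] = min(self.base_delays[-1], delay)

    def base_delay(self):
        return min(self.base_delays)

    def on_acknowledgement(self, delay, flight_size, now_s):
        self.update_base_delay(delay, now_s)
        self.update_current_delay(delay)
        queuing_delay = self.current_delay() - self.base_delay()
        off_target = TARGET - queuing_delay + self.random_input()
        self.cwnd += GAIN * off_target / self.cwnd
        # Keep cwnd tethered to the data actually in flight.
        max_allowed_cwnd = ALLOWED_INCREASE + TETHER * flight_size
        self.cwnd = min(self.cwnd, max_allowed_cwnd)
        return self.cwnd
```
]]></artwork></figure>
<t>Passing the time and flight size as arguments keeps the sketch
deterministic; a real sender would read them from its own state.</t>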
</section>
</section>
<section title="Parameter Values">
<t>The TARGET parameter MUST be set to 100 milliseconds. GAIN
SHOULD be set to 1 so that the maximum ramp-up rate is the same as for
TCP. BASE_HISTORY SHOULD be
10; it MUST be no less than 2 and SHOULD NOT be more than
20. NOISE_FILTER SHOULD be 1; it MAY be tuned so that it
is at least 1 and no more than cwnd/2. ALLOWED_INCREASE
SHOULD be 1 packet; it MUST be at least 1 packet and
SHOULD NOT be more than 3 packets. TETHER SHOULD be 1.5;
it MUST be greater than 1. RANDOMNESS_AMOUNT SHOULD be 0;
it MUST be between 0 and 0.1 inclusive.</t>
<t>Note that using the same TARGET
value across LEDBAT flows is important, since flows
using different TARGET values will not share a
bottleneck equitably---flows with higher values will
get a larger share of the bottleneck bandwidth.</t>
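<t>An implementation might check its configuration against the MUST-level
constraints above. The following Python sketch shows one way to do so;
the function name and argument names are hypothetical, and the SHOULD-level
and MAY-level recommendations are deliberately not checked.</t>
<figure><artwork><![CDATA[
```python
def must_violations(target_ms, base_history, allowed_increase,
                    tether, randomness_amount):
    """Return the MUST-level constraints that the given values violate."""
    violations = []
    if target_ms != 100:
        violations.append("TARGET MUST be 100 milliseconds")
    if base_history < 2:
        violations.append("BASE_HISTORY MUST be no less than 2")
    if allowed_increase < 1:
        violations.append("ALLOWED_INCREASE MUST be at least 1 packet")
    if tether <= 1:
        violations.append("TETHER MUST be greater than 1")
    if not 0 <= randomness_amount <= 0.1:
        violations.append("RANDOMNESS_AMOUNT MUST be between 0 and 0.1")
    return violations
```
]]></artwork></figure>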
</section>
</section>
<section title="Understanding LEDBAT Mechanisms">
<t>This section describes and
provides insight into the delay estimation
and window management mechanisms
used in LEDBAT congestion control.
</t>
<section title="Delay Estimation">
<t>LEDBAT estimates congestion in the network
based on observed increase in queueing delay in the network.
To observe an increase in the queueing delay in the network,
LEDBAT must separate the queueing delay component
from the rest of the end-to-end delay.
This section explains how LEDBAT decomposes the
observed changes in end-to-end delay into these two components.</t>
<t> LEDBAT estimates congestion in the direction of data flow.
To avoid measuring queue build-up on the reverse path (or ack path),
LEDBAT uses changes in one-way delay estimates.
The extant One-Way Active Measurement
Protocol (OWAMP) [XXXcite],
can be used for measuring one-way delay,
but since LEDBAT is used for sending data,
and since LEDBAT requires only changes in one-way delay to infer congestion,
simply adding a timestamp to the data segments
and a measurement result field in the ack packets
seems sufficient.
Doing so also avoids the pitfall of
measurement packets being treated
differently from the data packets in the network.</t>
<section title="Estimating Base Delay">
<t>End-to-end delay
can be decomposed into transmission (or serialization) delay,
propagation (or speed-of-light) delay,
queueing delay,
and processing delay.
On any given path,
barring some noise,
all delay components except for queueing delay are constant;
over time, we expect only the queueing delay on the path
to change as the queue sizes at the links change.
Since queuing delay is always additive to the end-to-end delay,
we estimate the
sum of the constant delay components,
which we call "base delay",
to be the minimum delay observed on the end-to-end path.
Using the minimum observed delay
also allows LEDBAT to eliminate noise in the delay estimation,
such as due to spikes in processing delay at a node on the path.</t>
<t>To respond to true changes in the base delay due to route changes,
LEDBAT uses only "recent" measurements---measurements over the last N minutes---in estimating
the base delay.
To implement an approximate
minimum over the last N minutes,
a LEDBAT sender stores N+1 separate minima---N for the last N minutes,
and one for the running current minute.
At the end of the current minute, the window moves---the
earliest minimum is dropped and the latest minimum is added.
When the connection is idle for a given minute,
no data is available for the one-way delay and,
therefore, no minimum is stored.
When the connection has
been idle for N minutes,
the measurement begins anew.</t>
<t> The duration of the observation window itself is a tradeoff between
robustness of measurement and responsiveness to change:
a larger observation window yields a more accurate base delay if the true base delay is unchanged,
whereas a smaller observation window results in faster response to true changes in the base delay.</t>
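<t>The windowed minimum described above can be sketched in Python as follows.
Keying the stored minima by minute number is an assumption of this sketch;
it makes idle minutes simply absent, so that after N idle minutes every
stored minimum has expired and measurement starts anew.
The class and method names are illustrative.</t>
<figure><artwork><![CDATA[
```python
import math


class BaseDelayWindow:
    def __init__(self, n_minutes):
        self.n = n_minutes
        self.minima = {}  # minute number -> minimum delay seen in it

    def update(self, delay, now_s):
        minute = int(now_s // 60)
        # Keep the running minute plus the N minutes before it.
        self.minima = {m: d for m, d in self.minima.items()
                       if m >= minute - self.n}
        self.minima[minute] = min(self.minima.get(minute, math.inf),
                                  delay)

    def base_delay(self):
        # Reflects the window as of the most recent update() call.
        return min(self.minima.values(), default=math.inf)
```
]]></artwork></figure>
<t>With N = 2, a minimum recorded in minute 0 still counts during minute 2
and is dropped by the first sample of minute 3.</t>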
<!--Assuming that the
queuing delay distribution density has a non-zero
integral from zero to any sufficiently small upper
limit, minimum is also an asymptotically consistent
estimate of the constant fraction of the delay. We can
thus estimate the queuing delay as the difference
between current and base delay as usual.-->
</section>
<section title="Estimating Queueing Delay">
<t>Given that the base delay is constant,
the queueing delay is represented by the variable component
of the measured end-to-end delay.
We measure queueing delay as simply the
difference between an end-to-end delay measurement
and the current estimate of base delay.</t>
</section>
</section>
<section title="Managing the Congestion Window">
<section title="Window Increase: Probing For More Bandwidth">
<t> A LEDBAT sender increases its
congestion window most
when the queuing delay estimate is zero.
To be friendly to competing TCP flows,
we set this highest rate of window growth
to be the same as TCP's.
In other words,
the LEDBAT window grows by at most twice per round-trip time.
Since the queuing delay estimate is always non-negative,
this growth rate ensures that a LEDBAT flow
never ramps up faster than a competing TCP flow over the same path.
</t>
</section>
<section title="Window Decrease: Responding To Congestion">
<t> When the sender's queuing delay estimate is lower
than the target, the sending rate should be increased.
When the sender's queueing delay estimate is higher than the target,
the sending rate should be reduced.
LEDBAT uses a simple linear controller to determine the sending rate
as a function of the delay estimate, where the
response is proportional to the difference between the
current queueing delay estimate and the target.
In limited experiments with BitTorrent nodes,
this controller seems to work well.</t>
<t> To deal with severe congestion when several
packets are dropped in the network,
and to provide a fallback against
incorrect queuing delay estimates,
a LEDBAT sender halves its cwnd
when a loss event is detected.
As with NewReno,
LEDBAT reduces its cwnd by half at
most once per RTT.
Note that, unlike TCP-like loss-based
congestion control, LEDBAT
does not induce losses and so it normally does
not rely on losses to determine the sending
rate. LEDBAT's reaction to loss is thus less
important than it is in the case of loss-based congestion
control. For LEDBAT, reducing the congestion window on
loss is a fallback mechanism in case of severe congestion and
in the case of incorrect delay estimates.</t>
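<t>The loss response can be sketched in Python as follows.
The caller-supplied time and RTT (both in seconds) and the min_cwnd floor
are assumptions of this sketch; this document does not specify a minimum
window.</t>
<figure><artwork><![CDATA[
```python
class LossResponder:
    def __init__(self, cwnd, min_cwnd=1.0):
        self.cwnd = float(cwnd)
        self.min_cwnd = min_cwnd     # hypothetical floor
        self.last_reduction = None   # time of the last halving

    def on_loss(self, now, rtt):
        # Halve at most once per RTT, however many losses are seen.
        if (self.last_reduction is not None
                and now - self.last_reduction < rtt):
            return self.cwnd
        self.cwnd = max(self.cwnd / 2, self.min_cwnd)
        self.last_reduction = now
        return self.cwnd
```
]]></artwork></figure>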
</section>
</section>
</section>
<section title="Choosing Parameter Values">
<t> Through this discussion, we hope to encourage informed experimentation with LEDBAT.</t>
<section title="Queuing Delay Target">
<t>Consider the queuing delay on the bottleneck. This
delay is the extra delay induced by congestion
control. One of our design goals is to keep this delay
low. However, when this delay is zero, the queue is
empty and so no data is being transmitted and the link
is thus not saturated. Hence, our design goal is to
keep the queuing delay low, but non-zero.</t>
<t>How low do we want the queuing delay to be? Because
another design goal is to be deployable on networks
with only simple FIFO queuing and drop-tail
discipline, we can't rely on explicit signaling for
the queuing delay. So we're going to estimate it
using external measurements. The external measurements
will have an error at least on the order of best-case
scheduling delays in the OSes. There's thus a good
reason to try to make the queuing delay larger than
this error. There's no reason to push the delay much
higher than that. Thus, we will have a
delay target that we would want to maintain.</t>
</section>
</section>
<section title="Discussion">
<section title="Framing Considerations">
<t>The actual framing and wire format of the protocol(s)
using the LEDBAT congestion control mechanism are outside
the scope of this document, which only describes the
congestion control part.</t>
<t>There is an implication of the need to use one-way
delay from the sender to the receiver in the sender. An
obvious way to support this is to use a framing that
timestamps packets at the sender and conveys the measured
one-way delay back to the sender in ack packets. This is
the method we'll keep in mind for the purposes of
exposition. Other methods are possible and valid.</t>
<t>Another implication is the receipt of frequent ACK
packets. The exposition in this document assumes one ACK per data
packet, but any reasonably small number of data packets
per ACK will work as long as there is at least one ACK
every round-trip time.</t>
<t>The protocols to which this congestion control
mechanism is applicable, with possible appropriate
extensions, are TCP, SCTP, DCCP, etc. It is not a goal of
this document to cover such applications. The mechanism
can also be used with proprietary transport protocols,
e.g., those built over UDP for P2P applications.</t>
</section>
<section title="Competing With TCP Flows">
<t>Consider competition between a LEDBAT connection and a
connection governed by loss-based congestion control (on
a FIFO bottleneck with drop-tail discipline).
A loss-based connection will need to experience loss to
back off. Loss will only occur after the connection
experiences maximum possible delays. LEDBAT will thus
receive congestion indication sooner than the loss-based
connection. If LEDBAT can ramp down faster than the
loss-based connection ramps up, LEDBAT will
yield. LEDBAT ramps down when the queuing delay estimate
exceeds the target: the more the excess, the faster the
ramp-down. When the loss-based connection is standard
TCP, LEDBAT will yield at precisely the same rate as TCP
is ramping up when the queuing delay is double the
target.</t>
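<t>The symmetry claimed above follows directly from the linear controller.
The short Python sketch below evaluates the per-acknowledgement window
change from the sender pseudocode at a queuing delay of zero and at twice
the target; the millisecond unit and the function name are assumptions of
the sketch.</t>
<figure><artwork><![CDATA[
```python
TARGET = 100.0  # milliseconds
GAIN = 1.0


def window_change_per_ack(queuing_delay, cwnd):
    """Window change applied for one acknowledgement."""
    return GAIN * (TARGET - queuing_delay) / cwnd


fastest_increase = window_change_per_ack(0.0, cwnd=20.0)
decrease_at_double = window_change_per_ack(2 * TARGET, cwnd=20.0)
```
]]></artwork></figure>
<t>The two values have the same magnitude and opposite signs: at twice the
target, the window shrinks exactly as fast as it can ever grow.</t>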
<t>LEDBAT is most aggressive when its queuing delay
estimate is most wrong and is as low as it can be.
The queuing delay estimate is nonnegative, therefore the
worst possible case is when somehow the estimate is
always returned as zero. In this case, LEDBAT will ramp
up as fast as TCP and halve the rate on loss. Thus, in
case of worst possible failure of estimates, LEDBAT will
behave identically to TCP. This provides an extra
safety net.</t>
</section>
<section title="Fairness Among LEDBAT Flows">
<t>The design goals of LEDBAT center around the aggregate
behavior of LEDBAT flows when they compete with standard
TCP. It is also interesting how LEDBAT flows share
bottleneck bandwidth when they only compete between
themselves.</t>
<t>LEDBAT as described so far lacks a mechanism
specifically designed to equalize utilization between
these flows. The observed behavior of existing
implementations indicates that a rough equalization, in
fact, does occur.</t>
<t>The delay measurements used as control inputs by LEDBAT
contain some amount of noise and errors. The linear
controller converts this input noise into the same
amount of output noise. The effect that this has is that
the uncorrelated component of the noise between flows
serves to randomly shuffle some amount of bandwidth
between flows. The amount shuffled during each RTT is
proportional to the noise divided by the target
delay. The random-walk trajectory of bandwidth utilized
by each of the flows over time tends to the fair
share. The timescales on which the rates become
comparable are proportional to the target delay
multiplied by the RTT and divided by the noise.</t>
<t>In complex real-life systems, the main concern is
usually the reduction of the amount of noise, which is
copious if not eliminated. In some circumstances,
however, the measurements might be "too good" -- since
the equalization timescale is inversely proportional to
noise, perfect measurements would result in lack of
convergence.</t>
<t>Under these circumstances, it may be beneficial to
introduce some artificial randomness into the inputs
(or, equivalently, outputs) of the controller. Note that
most systems should not require this and should be
primarily concerned with reducing, not adding,
noise.</t>
<t>With delay-based congestion control systems, there's a
concern about the ability of late comers to measure the
base delay correctly. Suppose a LEDBAT flow saturates a
bottleneck; another LEDBAT flow starts and proceeds to
measure the base delay and the current delay and to
estimate the queuing delay. If the bottleneck always
contains target delay worth of packets, the second flow
would see the bottleneck as empty and start building a
second target delay worth of queue on top of the
existing queue. The concern ("late comers' advantage")
is that the initial flow would now back off because it
sees the real delay and the late comer would use the
whole capacity.</t>
<t>However, once the initial flow yields, the late comer
immediately measures the true base delay and the two
flows operate from the same (correct) inputs.</t>
<t>Additionally, in practice this concern is further
alleviated by the burstiness of network traffic: all
that's needed to measure the base delay is one small
gap. These gaps can occur for a variety of reasons: the
OS may delay the scheduling of the sending process until
a time slice ends, the sending computer might be
unusually busy for some number of milliseconds or tens
of milliseconds, etc. If such a gap occurs while the
late comer is starting, base delay is immediately
correctly measured. With a small number of flows, this
appears to be the main mechanism of regulating the late
comers' advantage.</t>
</section>
<!-- <section title="Deployment Status">
</section>
--> </section>
<section title="IANA Considerations">
<t>There are no IANA considerations for this document.</t>
</section>
<section title="Security Considerations">
<t>A network on the path might choose to cause higher
delay measurements than the real queuing delay so that
LEDBAT backs off even when there's no congestion present.
Shaping of traffic into an artificially narrow bottleneck
can't be counteracted, but faking of the timestamp field can and
SHOULD be. A protocol using the LEDBAT congestion control
SHOULD authenticate the timestamp and delay fields,
preferably as part of authenticating most of the rest of
the packet, with the exception of volatile header fields.
The choice of the authentication mechanism that resists
man-in-the-middle attacks is outside the scope of this
document.</t>
</section>
</middle>
<back>
<references title='Normative References'>&rfc2119;&rfc2581;
</references>
<section anchor="app-additional" title="Timestamp errors">
<t>One-way delay measurement needs to deal with timestamp
errors. We'll use the same locally linear clock model and
the same terminology as Network Time Protocol (NTP). This
model is valid for any differentiable clocks. NTP uses
the term "offset" to refer to difference from true time
and "skew" to refer to difference of clock rate from the
true rate. The clock will thus have a fixed offset from
the true time and a skew. We'll consider what we need to
do about the offset and the skew separately.</t>
<section title="Clock offset">
<t>First, consider the case of zero skew. The offset of
each of the two clocks shows up as a fixed error in
one-way delay measurement. The difference of the offsets
is the absolute error of the one-way delay estimate. We
won't use this estimate directly, however. We'll use the
difference between that and a base delay. Because the
error (difference of clock offsets) is the same for the
current and base delay, it cancels from the queuing
delay estimate, which is what we'll use. Clock offset is
thus irrelevant to the design.</t>
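<t>The cancellation can be checked numerically. In the Python sketch below,
the offsets and delays are made-up values in milliseconds, chosen only
to illustrate the argument under the zero-skew assumption.</t>
<figure><artwork><![CDATA[
```python
def measured_delay(true_delay, sender_offset, receiver_offset):
    # The receiver reads (arrival time + receiver_offset) and the
    # sender stamped (departure time + sender_offset).
    return true_delay + receiver_offset - sender_offset


# Clocks that disagree by 8 seconds in total.
base = measured_delay(20, sender_offset=5000, receiver_offset=-3000)
current = measured_delay(35, sender_offset=5000, receiver_offset=-3000)
queuing_delay = current - base
```
]]></artwork></figure>
<t>Each individual measurement is wildly wrong (negative, in fact), yet the
difference is the true 15 ms of queuing delay.</t>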
</section>
<section title="Clock skew">
<t>Now consider the skew. For a given clock, skew
manifests in a linearly changing error in the time
estimate. For a given pair of clocks, the difference in
skews is the skew of the one-way delay estimate. Unlike
the offset, this no longer cancels in the computation of
the queuing delay estimate. On the other hand, while
the offset could be huge, with some clocks off by
minutes or even hours or more, the skew is typically not
too bad. For example, NTP is designed to work with most
clocks, yet it gives up when the skew is more than 500
parts per million (PPM). Typical skews of clocks that
have never been trained seem to often be around 100-200
PPM. Previously trained clocks could have 10-20 PPM
skew due to temperature changes. A 100-PPM skew means
accumulating 6 milliseconds of error per minute. The
expiration of base delay related to route changes mostly
takes care of clock skew. A technique to specifically
compute and cancel it is trivially possible and involves
tracking base delay skew over a number of minutes and
then correcting for it, but usually isn't necessary,
unless the target is unusually low, the skew is
unusually high, or the base interval is unusually long.
It is not further described in this document.</t>
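<t>The figure of 6 milliseconds per minute is simple arithmetic, sketched
below in Python. The function name is illustrative. The second value
assumes a base interval of 10 minutes, matching the recommended
BASE_HISTORY value.</t>
<figure><artwork><![CDATA[
```python
def skew_error_ms(skew_ppm, interval_s):
    """Error accumulated over interval_s seconds at skew_ppm."""
    return skew_ppm * 1e-6 * interval_s * 1000.0


per_minute = skew_error_ms(100, 60)          # about 6 ms
per_base_interval = skew_error_ms(100, 600)  # about 60 ms
```
]]></artwork></figure>
<t>Over a 10-minute base interval, a 100-PPM skew can thus account for a
sizeable fraction of a 100-millisecond target.</t>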
<t>For cases when the base interval is long or the skew is
high or the target is low, a technique to correct for
skew can be beneficial. The technique described here or
a different mechanism MAY be used by
implementations. The technique described is still
experimental, but it is actually currently used. The
pseudocode in the specification above does not include
any of the skew correction algorithms.</t>
<section title="Deployed clock skew correction mechanism">
<t>Clock skew can be in two directions: either the
sender's clock is faster than the receiver's, or vice
versa. We refer to the former situation as clock drift "in
sender's favor" and to the latter as clock drift "in
receiver's favor".</t>
<t>When the clock drift is "in sender's favor", nothing
special needs to be done, because the timestamp
differences (i.e., the raw delay estimates) will grow
smaller with time, and thus the base delay will be
continuously updated with the drift.</t>
<t>When the clock drift is "in receiver's favor", the raw
delay estimates will drift up with time, suppressing the
throughput needlessly. This is the case that can benefit
from a special mechanism. Assume symmetrical framing
(i.e., the same information about delays transmitted in both
directions). If the sender can detect the receiver reducing its
base delay, it can infer that this is due to clock drift
"in receiver's favor". This can be compensated for by
increasing the sender's base delay by the same
amount. Since, in our implementation, the receiver sends
back the raw timestamp estimate, the sender can run the
same base delay calculation algorithm it runs for itself
for the receiver as well; when it reduces the inferred
receiver's delay, it increases its own by the same
amount.</t>
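<t>The compensation step can be sketched in Python as follows.
This is a simplification for illustration and not the deployed code:
it uses plain running minima in place of the per-minute history,
and the class and attribute names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
import math


class SkewCompensator:
    def __init__(self):
        self.own_base = math.inf   # base delay, sender to receiver
        self.peer_base = math.inf  # inferred base delay, reverse path

    def on_sample(self, own_delay, peer_delay):
        self.own_base = min(self.own_base, own_delay)
        if peer_delay < self.peer_base:
            if self.peer_base != math.inf:
                # The peer's base delay fell: treat this as drift in
                # the receiver's favor and raise our base to match.
                self.own_base += self.peer_base - peer_delay
            self.peer_base = peer_delay
        return self.own_base
```
]]></artwork></figure>
<t>If the forward delay drifts from 100 to 102 while the reverse delay drifts
from 50 to 48, the sender's base delay moves to 102 and the drift no longer
appears as queuing delay.</t>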
</section>
<section title="Skew correction with faster virtual clock">
<t>This is an alternative skew correction algorithm,
currently under consideration and not deployed in the
wild.</t>
<t>Since having a faster clock on the sender is,
relatively speaking, a non-problem, one can use two
different virtual clocks on each LEDBAT implementation:
use, for example, the default machine clock for situations
where the instance is acting as a receiver, and use a
virtual clock, easily computed from the default machine
clock through a linear transformation, for situations
where the instance is acting as a sender. Make the virtual
clock, e.g., 500 PPM faster than the machine clock. Since
500 PPM is more than the variability of most clocks (plus
or minus 100 PPM), any sender's clock is very likely to be
faster than any receiver's clock, thus benefitting from
the implicit correction of taking the minimum as the base
delay.</t>
<t>Note that this approach is not compatible with the one
described in the preceding section.</t>
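<t>The linear transformation mentioned above might look as follows in
Python. The microsecond unit, the epoch argument, and the function name
are assumptions of this sketch.</t>
<figure><artwork><![CDATA[
```python
VIRTUAL_SKEW_PPM = 500


def sender_clock_us(machine_clock_us, epoch_us=0):
    """Virtual clock that runs 500 PPM faster than the machine clock."""
    elapsed = machine_clock_us - epoch_us
    return machine_clock_us + elapsed * VIRTUAL_SKEW_PPM // 1_000_000
```
]]></artwork></figure>
<t>An instance would use sender_clock_us() when timestamping outgoing data
and the unmodified machine clock when computing delays as a receiver.</t>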
</section>
<section title="Skew correction with estimating drift">
<t>This is an alternative skew correction algorithm,
currently under consideration and not deployed in the
wild.</t>
<t>The history of base delay minima we already keep for
each minute provides us with direct means of computing the
clock skew difference between the two hosts. Namely, we
can fit a linear function to the set of base delay
estimates for each minute. The slope of the function is an
estimate of the clock skew difference for the given pair
of sender and receiver. Once the clock skew difference is
estimated, it can be used to correct the clocks so that
they advance at nearly the same rate. Namely, the clock
needs to be corrected by half of the estimated skew
amount, since the other half will be corrected by the
other endpoint. Note that the skew differences are then
maintained for each connection and the virtual clocks used
with each connection can differ, since they do not attempt
to estimate the skew with respect to the true time, but
instead with respect to the other endpoint.</t>
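<t>One way to fit the linear function is ordinary least squares, sketched
below in Python. The choice of least squares, the function names, and the
assumption that the minima are evenly spaced one minute apart are
illustrative; the algorithm is still under consideration.</t>
<figure><artwork><![CDATA[
```python
def estimate_skew(minima):
    """Least-squares slope of per-minute minima (delay units/minute)."""
    n = len(minima)
    if n < 2:
        return 0.0  # not enough history to fit a line
    mean_x = (n - 1) / 2
    mean_y = sum(minima) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(minima))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def local_correction(minima):
    # Each endpoint corrects half of the estimated difference.
    return estimate_skew(minima) / 2
```
]]></artwork></figure>
<t>For minima of 10, 16, 22, and 28 milliseconds, the slope is 6 milliseconds
per minute (about 100 PPM), of which this endpoint would correct 3.</t>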
<section title="Byzantine skew correction">
<t>This is an alternative skew correction algorithm,
currently under consideration and not deployed in the
wild.</t>
<t>When it is known that each host maintains long-lived
connections to a number of different other hosts, a
byzantine scheme can be used to estimate the skew with
respect to the true time. Namely, calculate the skew
difference for each of the peer hosts as described in the
preceding section, then take the median of the skew
differences.</t>
<t>This inherent clock drift can then be corrected with a
linear transformation before the clock data is used in the
algorithm from the preceding section, the currently
deployed algorithm, or nearly any other skew correction
algorithm.</t>
<t>While this scheme is not universally applicable, it
combines well with other schemes, since it is essentially
a clock training mechanism. The scheme also acts the
fastest, since the state is preserved between
connections.</t>
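<t>The median step can be sketched in Python as follows. The function name
and the PPM unit are assumptions; the per-peer skew differences are
assumed to have been estimated separately for each connection.</t>
<figure><artwork><![CDATA[
```python
import statistics


def own_skew_estimate(peer_skew_differences_ppm):
    """Median of the skew differences measured against each peer."""
    return statistics.median(peer_skew_differences_ppm)
```
]]></artwork></figure>
<t>The median tolerates a minority of badly skewed peers: for differences of
48, 52, 50, -300, and 51 PPM, the estimate is 50 PPM.</t>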
</section>
</section>
</section>
</section>
</back>
</rfc>