http://stupid.domain.name/ietf/

One document matched: draft-alvestrand-rmcat-congestion-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-alvestrand-rmcat-congestion-00"
     ipr="trust200902">
  <front>
    <title abbrev="Congestion Control for RTCWEB">A Google Congestion Control
    Algorithm for Real-Time Communication</title>

    <author fullname="Henrik Lundin" initials="H." surname="Lundin">
      <organization>Google</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <code>11122</code>

          <country>Sweden</country>
        </postal>
      </address>
    </author>

    <author fullname="Stefan Holmer" initials="S." surname="Holmer">
      <organization>Google</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <code>11122</code>

          <country>Sweden</country>
        </postal>

        <email>holmer@google.com</email>
      </address>
    </author>

    <author fullname="Harald Alvestrand" initials="H. T." role="editor"
            surname="Alvestrand">
      <organization>Google</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <code>11122</code>

          <country>Sweden</country>
        </postal>

        <email>harald@alvestrand.no</email>
      </address>
    </author>

    <date day="18" month="February" year="2013" />

    <abstract>
      <t>This document describes two methods of congestion control when using
      real-time communications on the World Wide Web (RTCWEB); one
      sender-based and one receiver-based.</t>

      <t>It is published as an input document to the RMCAT working group on
      congestion control for media streams. The mailing list of that WG is
      rmcat@ietf.org.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Congestion control is a requirement for all applications that wish to
      share the Internet <xref target="RFC2914"></xref>.</t>

      <t>The problem of doing congestion control for real-time media is made
      difficult for a number of reasons:</t>

      <t><list style="symbols">
          <t>The media is usually encoded in forms that cannot be quickly
          changed to accommodate varying bandwidth, and bandwidth requirements
          can often be changed only in discrete, rather large steps</t>

          <t>The participants may have certain specific wishes on how to
          respond - which may not be reducing the bandwidth required by the
          flow on which congestion is discovered</t>

          <t>The encodings are usually sensitive to packet loss, while the
          real time requirement precludes the repair of packet loss by
          retransmission</t>
        </list>This memo describes two congestion control algorithms that
      together are seen to give reasonable performance and reasonable (not
      perfect) bandwidth sharing with other conferences and with TCP-using
      applications that share the same links.</t>

      <t>The signalling used consists of standard RTP timestamps <xref
      target="RFC3550"></xref> possibly augmented with RTP transmission time
      offsets <xref target="RFC5450"></xref>, standard RTCP feedback reports
      and Temporary Maximum Media Stream Bit Rate Requests (TMMBR) as defined
      in <xref target="RFC5104"></xref> section 3.5.4, or by using the REMB
      feedback report defined in <xref
      target="I-D.alvestrand-rmcat-remb"></xref></t>

      <t></t>

      <section title="Mathemathical notation conventions">
        <t>The mathematics of this document have been transcribed from a more
        formula-friendly format.</t>

        <t>The following notational conventions are used:</t>

        <t><list style="hanging">
            <t hangText="X_bar">The variable X, where X is a vector -
            conventionally marked by a bar on top of the variable name.</t>

            <t hangText="X_hat">An estimate of the true value of variable X -
            conventionally marked by a circumflex accent on top of the
            variable name.</t>

            <t hangText="X(i)">The "i"th value of X - conventionally marked by
            a subscript i.</t>

            <t hangText="[x y z]">A row vector consisting of elements x, y and
            z.</t>

            <t hangText="X_bar^T">The transpose of vector X_bar.</t>

            <t hangText="E{X}">The expected value of the stochastic variable
            X</t>
          </list></t>
      </section>
    </section>

    <section title="System model">
      <t>The following elements are in the system:</t>

      <t><list style="symbols">
          <t>RTP packet - an RTP packet containing media data.</t>

          <t>Frame - a set of RTP packets transmitted from the sender at the
          same time instant. This could be a video frame, an audio frame, or a
          mix of audio and video packets. A frame can be defined by the RTP
          packet send time (RTP timestamp + transmission time offset), or by
          the RTP timestamp if the transmission time offset field is not
          present.</t>

          <t>Incoming media streams - a stream of frames consisting of RTP
          packets.</t>

          <t>Media codec - has a bandwidth control, and encodes the incoming
          media stream into an RTP stream.</t>

          <t>RTP sender - sends the RTP stream over the network to the RTP
          receiver. Generates the RTP timestamp.</t>

          <t>RTP receiver - receives the RTP stream, notes the time of
          arrival. Regenerates the media stream for the recipient.</t>

          <t>RTCP sender at RTP sender - sends sender reports with mappings
          between RTP timestamps and NTP time.</t>

          <t>RTCP sender at RTP receiver - sends receiver reports and
          TMMBR/REMB messages.</t>

          <t>RTCP receiver at RTP sender - receives receiver reports and
          TMMBR/REMB messages, reports these to sender side control.</t>

          <t>RTCP receiver at RTP receiver.</t>

          <t>Sender side control - takes loss rate info, round trip time info,
          and TMMBR/REMB messages and computes a sending bitrate.</t>

          <t>Receiver side control - takes the packet arrival info at the RTP
          receiver and decides when to send TMMBR/REMB messages.</t>
        </list>Together, sender side control and receiver side control
      implement the congestion control algorithm.</t>

      <t></t>
    </section>

    <section anchor="receiverside" title="Receiver side control">
      <t>The receive-side algorithm can be further decomposed into four parts:
      an RTP timestamp to NTP time conversion, arrival-time filter, an
      over-use detector, and a remote rate-control.</t>

      <t></t>

      <section title="Procsesing multiple streams using RTP timestamp to NTP time conversion">
        <t>It is common that multiple RTP streams are sent from the sender to
        the receiver. In such a situation the RTP timestamps of incoming can
        first be converted to a common time base using the RTP timestamp and
        NTP time pairs in RTCP SR reports<xref target="RFC3550"></xref>. The
        converted timestamps can then be used instead of RTP timestamps in the
        arrival-time filtering, and since all streams from the same sender
        have timestamps in the same time base they can all be processed by the
        same filter. This has the advantage of quicker reactions and reduces
        problems of noisy measurements due to self-inflicted
        cross-traffic.</t>

        <t>In the time interval from the start of the call until a stream from
        the same sender has received an RTCP SR report, the receiver-side
        control operates in single-stream mode. In that mode only one RTP
        stream can be processed by the over-use detector. As soon as a stream
        has received one or more RTCP SR reports the receiver-side control can
        change to a multi-stream mode, where all RTP streams from the same
        sender which have received one or more RTCP SR reports can be
        processed by the over-use detector. When switching to the multi-stream
        mode the state of the over-use detector must be modified to avoid a
        time base mismatch. This can either be done by resetting the stored
        RTP timestamp values or by converting them using the newly received
        RTCP SR report.</t>
      </section>

      <section title="Arrival-time model">
        <t>This section describes an adaptive filter that continuously updates
        estimates of network parameters based on the timing of the received
        frames.</t>

        <t>At the receiving side we are observing groups of incoming packets,
        where each group of packets corresponding to the same frame having
        timestamp T(i).</t>

        <t>Each frame is assigned a receive time t(i), which corresponds to
        the time at which the whole frame has been received (ignoring any
        packet losses). A frame is delayed relative to its predecessor if
        t(i)-t(i-1)>T(i)-T(i-1), i.e., if the arrival time difference is
        larger than the timestamp difference.</t>

        <t>We define the (relative) inter-arrival time, d(i) as</t>

        <figure>
          <artwork><![CDATA[ 
  d(i) = t(i)-t(i-1)-(T(i)-T(i-1))

]]></artwork>
        </figure>

        <t>Since the time ts to send a frame of size L over a path with a
        capacity of C is roughly</t>

        <figure>
          <artwork><![CDATA[
  ts = L/C

]]></artwork>
        </figure>

        <t>we can model the inter-arrival time as</t>

        <figure>
          <artwork><![CDATA[ 
           L(i)-L(i-1)
  d(i) = -------------- + w(i) = dL(i)/C+w(i)
               C

]]></artwork>
        </figure>

        <t>Here, w(i) is a sample from a stochastic process W, which is a
        function of the capacity C, the current cross traffic X(i), and the
        current send bit rate R(i). We model W as a white Gaussian process. If
        we are over-using the channel we expect w(i) to increase, and if a
        queue on the network path is being emptied, w(i) will decrease;
        otherwise the mean of w(i) will be zero.</t>

        <t>Breaking out the mean m(i) from w(i) to make the process zero mean,
        we get</t>

        <figure>
          <preamble>Equation 5</preamble>

          <artwork><![CDATA[
  d(i) = dL(i)/C + m(i) + v(i)

]]></artwork>
        </figure>

        <t>This is our fundamental model, where we take into account that a
        large frame needs more time to traverse the link than a small frame,
        thus arriving with higher relative delay. The noise term represents
        network jitter and other delay effects not captured by the model.</t>

        <t>When graphing the values for d(i) versus dL(i) on a scatterplot, we
        find that most samples cluster around the center, and the outliers are
        clustered along a line with average slope 1/C and zero offset.</t>

        <t>For instance, when using a regular video codec, most frames are
        roughly the same size after encoding (the central
        “cloud”); the exceptions are I-frames (or key frames)
        which are typically much larger than the average causing positive
        outliers (the I-frame itself) and negative outliers (the frame after
        an I-frame) on the dL axis. Audio frames on the other hand often
        consist of single packets of equal size, and an audio-only media
        stream would have its frames scattered at dL = 0.</t>
      </section>

      <section title="Arrival-time filter">
        <t>The parameters d(i) and dL(i) are readily available for each frame
        i > 1, and we want to estimate C(i) and m(i) and use those
        estimates to detect whether or not we are over-using the bandwidth
        currently available. These parameters are easily estimated by any
        adaptive filter – we are using the Kalman filter.</t>

        <t>Let</t>

        <figure>
          <artwork><![CDATA[
  theta_bar(i) = [1/C(i)  m(i)]^T
]]></artwork>
        </figure>

        <t>and call it the state of time i. We model the state evolution from
        time i to time i+1 as</t>

        <t></t>

        <figure>
          <artwork><![CDATA[
  theta_bar(i+1) = theta_bar(i) + u_bar(i)
]]></artwork>
        </figure>

        <t>where u_bar(i) is the zero mean white Gaussian process noise with
        covariance</t>

        <t></t>

        <figure>
          <preamble>Equation 7</preamble>

          <artwork><![CDATA[
  Q(i) = E{u_bar(i) u_bar(i)^T}

]]></artwork>
        </figure>

        <t>Given equation 5 we get</t>

        <figure>
          <preamble>Equation 8</preamble>

          <artwork><![CDATA[
  d(i) = h_bar(i)^T theta_bar(i) + v(i)

  h_bar(i) = [dL(i)  1]^T

]]></artwork>
        </figure>

        <t>where v(i) is zero mean white Gaussian measurement noise with
        variance var_v = sigma(v,i)^2</t>

        <t>The Kalman filter recursively updates our estimate</t>

        <figure>
          <artwork><![CDATA[
  theta_hat(i) = [1/C_hat(i) m_hat(i)]^T]]></artwork>
        </figure>

        <t>as</t>

        <figure>
          <artwork><![CDATA[
  z(i) = d(i) - h_bar(i)^T * theta_hat(i-1)

  theta_hat(i) = theta_hat(i-1) + z(i) * k_bar(i)

                           E(i-1) * h_bar(i)
  k_bar(i) = --------------------------------------------
               var_v_hat + h_bar(i)^T * E(i-1) * h_bar(i)

  E(i) = (I - K_bar(i) * h_bar(i)^T) * E(i-1) + Q(i)
]]></artwork>
        </figure>

        <t>I is the 2-by-2 identity matrix.</t>

        <t>The variance var_v = sigma(v,i)^2 is estimated using an exponential
        averaging filter, modified for variable sampling rate</t>

        <figure>
          <artwork><![CDATA[
  var_v_hat = beta*sigma(v,i-1)^2 + (1-beta)*z(i)^2

  beta = (1-alpha)^(30/(1000 * f_max))

]]></artwork>
        </figure>

        <t>where f_max = max {1/(T(j) - T(j-1))} for j in i-K+1...i is the
        highest rate at which frames have been captured by the camera the last
        K frames and alpha is a filter coefficient typically chosen as a
        number in the interval [0.1, 0.001]. Since our assumption that v(i)
        should be zero mean WGN is less accurate in some cases, we have
        introduced an additional outlier filter around the updates of
        var_v_hat. If z(i) > 3 var_v_hat the filter is updated with 3
        sqrt(var_v_hat) rather than z(i). For instance v(i) will not be white
        in situations where packets are sent at a higher rate than the channel
        capacity, in which case they will be queued behind each other. In a
        similar way, Q(i) is chosen as a diagonal matrix with main diagonal
        elements given by</t>

        <figure>
          <artwork><![CDATA[
  diag(Q(i)) = 30/(1000 * f_max)[10^-10 10^-2]^T
]]></artwork>
        </figure>

        <t>It is necessary to scale these filter parameters with the frame
        rate to make the detector respond as quickly at low frame rates as at
        high frame rates.</t>
      </section>

      <section title="Over-use detector">
        <t>The offset estimate m(i) is compared with a threshold gamma_1. An
        estimate above the threshold is considered as an indication of
        over-use. Such an indication is not enough for the detector to signal
        over-use to the rate control subsystem. Not until over-use has been
        detected for at least gamma_2 milliseconds and at least gamma_3
        frames, a definitive over-use will be signaled. However, if the offset
        estimate m(i) was decreased in the last update, over-use will not be
        signaled even if all the above conditions are met. Similarly, the
        opposite state, under-use, is detected when m(i) < -gamma_1. If
        neither over-use nor under-use is detected, the detector will be in
        the normal state.</t>
      </section>

      <section title="Rate control">
        <t>The rate control at the receiving side is designed to increase the
        receive-side estimate of the available bandwidth A_hat as long as the
        detected state is normal. Doing that assures that we, sooner or later,
        will reach the available bandwidth of the channel and detect an
        over-use.</t>

        <t>As soon as over-use has been detected the receive-side estimate of
        the available bandwidth is decreased. In this way we get a recursive
        and adaptive estimate of the available bandwidth.</t>

        <t>In this document we make the assumption that the rate control
        subsystem is executed periodically and that this period is
        constant.</t>

        <t>The rate control subsystem has 3 states: Increase, Decrease and
        Hold. "Increase" is the state when no congestion is detected;
        "Decrease" is the state where congestion is detected, and "Hold" is a
        state that waits until built-up queues have drained before going to
        "increase" state.</t>

        <t>The state transitions (with blank fields meaning "remain in state")
        are:</t>

        <t></t>

        <figure>
          <artwork><![CDATA[State ---->  | Hold      |Increase    |Decrease
Signal-----------------------------------------
  v          |           |            |
Over-use     | Decrease  |Decrease    |
-----------------------------------------------
Normal       | Increase  |            |Hold
-----------------------------------------------
Under-use    |           |Hold        |Hold
-----------------------------------------------



]]></artwork>
        </figure>

        <t>The subsystem starts in the increase state, where it will stay
        until over-use or under-use has been detected by the detector
        subsystem. On every update the receive-side estimate of the available
        bandwidth is increased with a factor which is a function of the global
        system response time and the estimated measurement noise variance
        var_v_hat. The global system response time is the time from an
        increase that causes over-use until that over-use can be detected by
        the over-use detector. The variance var_v_hat affects how responsive
        the Kalman filter is, and is thus used as an indicator of the delay
        inflicted by the Kalman filter.</t>

        <figure>
          <artwork><![CDATA[
  A_hat(i) = eta*A_hat(i-1)
                                 1.001+B
  eta(RTT, var_v_hat) = ------------------------------------------
                           1+e^(b(d*RTT - (c1 * var_v_hat + c2)))]]></artwork>
        </figure>

        <t>Here, B, b, d, c1 and c2 are design parameters.</t>

        <t>Since the system depends on over-using the channel to verify the
        current available bandwidth estimate, we must make sure that our
        estimate doesn't diverge from the rate at which the sender is actually
        sending. Thus, if the sender is unable to produce a bit stream with
        the bit rate the receiver is asking for, the available bandwidth
        estimate must stay within a given bound. Therefore we introduce a
        threshold</t>

        <figure>
          <artwork><![CDATA[
  A_hat(i) < 1.5 * R_hat(i)
]]></artwork>
        </figure>

        <t>where R_hat(i) is the incoming bit rate measured over a T seconds
        window:</t>

        <figure>
          <artwork><![CDATA[
  R_hat(i) = 1/T * sum(L(j)) for j from 1 to N(i)]]></artwork>
        </figure>

        <t>N(i) is the number of frames received the past T seconds and L(j)
        is the payload size of frame j. Ideally T should be chosen to match
        the rate controller at the sender. A window between 0.5 and 1 second
        is recommended.</t>

        <t>When an over-use is detected the system transitions to the decrease
        state, where the receive-side available bandwidth estimate is
        decreased to a factor times the currently incoming bit rate.</t>

        <figure>
          <artwork><![CDATA[
  A_hat(i) = alpha*R_hat(i)]]></artwork>
        </figure>

        <t>alpha is typically chosen to be in the interval [0.8, 0.95].</t>

        <t>When the detector signals under-use to the rate control subsystem,
        we know that queues in the network path are being emptied, indicating
        that our available bandwidth estimate is lower than the actual
        available bandwidth. Upon that signal the rate control subsystem will
        enter the hold state, where the receive-side available bandwidth
        estimate will be held constant while waiting for the queues to
        stabilize at a lower level – a way of keeping the delay as low
        as possible. This decrease of delay is wanted, and expected,
        immediately after the estimate has been reduced due to over-use, but
        can also happen if the cross traffic over some links is reduced. In
        either case we want to measure the highest incoming rate during the
        under-use interval:</t>

        <figure>
          <artwork><![CDATA[
  R_max = max{R_hat(i)} for i in 1..K

]]></artwork>
        </figure>

        <t>where K is the number of frames of under-use before returning to
        the normal state. R_max is a measure of the actual bandwidth available
        and is a good guess of what bit rate the sender should be able to
        transmit at. Therefore the receive-side available bandwidth estimate
        will be set to R_max when we transition from the hold state to the
        increase state.</t>

        <t>One design decision is when to send rate control messages. The time
        from a change in congestion to the sending of the feedback message is
        a limitation on how fast the sender can react. Sending too many
        messages giving no new information is a waste of bandwidth - but in
        the case of severe congestion, feedback messages can be lost,
        resulting in a failure to react in a timely manner.</t>

        <t>The conclusion is that feedback messages should be sent on a
        "heartbeat" schedule, allowing the sender side control to react to
        missing feedback messages by reducing its send rate, but they should
        also be sent whenever the estimated bandwidth value has changed
        significantly, without waiting for the heartbeat time, up to some
        limiting upper bound on the send rate.</t>

        <t>The minimum interval is named t_min_fb_interval.</t>

        <t>The maximum interval is named t_max_fb_interval.</t>

        <t>The permissible values of these intervals will be bounded by the
        RTP session's RTCP bandwidth and its rtcp_frr setting.</t>

        <t>[TODO: Get some example values for these timers]</t>
      </section>
    </section>

    <section anchor="senderside" title="Sender side control">
      <t>An additional congestion controller resides at the sending side. It
      bases its decisions on the round-trip time, packet loss and available
      bandwidth estimates transmitted from the receiving side.</t>

      <t>The available bandwidth estimates produced by the receiving side are
      only reliable when the size of the queues along the channel are large
      enough. If the queues are very short, over-use will only be visible
      through packet losses, which aren't used by the receiving side
      algorithm.</t>

      <t>This algorithm is run every time a receive report arrives at the
      sender, which will happen no more often than t_min_fb_interval, and no
      less often than t_max_fb_interval. If no receive report is received
      within 2x t_max_fb_interval (indicating at least 2 lost feedback
      reports), the algorithm will take action as if all packets in the
      interval have been lost, resulting in a halving of the send rate.</t>

      <t><list style="symbols">
          <t>If 2-10% of the packets have been lost since the previous report
          from the receiver, the sender available bandwidth estimate As(i) (As
          denotes ‘sender available bandwidth’) will be kept
          unchanged.</t>

          <t>If more than 10% of the packets have been lost a new estimate is
          calculated as As(i)=As(i-1)(1-0.5p), where p is the loss ratio.</t>

          <t>As long as less than 2% of the packets have been lost As(i) will
          be increased as As(i)=1.05(As(i-1)+1000)</t>
        </list></t>

      <t>The new send-side estimate is limited by the TCP Friendly Rate
      Control formula <xref target="RFC3448"></xref> and the receive-side
      estimate of the available bandwidth A(i):</t>

      <figure>
        <artwork><![CDATA[                               8 s
As(i) >= ----------------------------------------------------------
         R*sqrt(2*b*p/3) + (t_RTO*(3*sqrt(3*b*p/8) * p * (1+32*p^2)))

As(i) <= A(i)

]]></artwork>
      </figure>

      <t>where b is the number of packets acknowledged by a single TCP
      acknowledgement (set to 1 per TFRC recommendations), t_RTO is the TCP
      retransmission timeout value in seconds (set to 4*R) and s is the
      average packet size in bytes. R is the round-trip time in seconds.</t>

      <t>(The multiplication by 8 comes because TFRC is computing bandwidth in
      bytes, while this document computes bandwidth in bits.)</t>

      <t>In words: The sender-side estimate will never be larger than the
      receiver-side estimate, and will never be lower than the estimate from
      the TFRC formula.</t>

      <t>We motivate the packet loss thresholds by noting that if the
      transmission channel has a small amount of packet loss due to over-use,
      that amount will soon increase if the sender does not adjust his bit
      rate. Therefore we will soon enough reach above the 10 % threshold and
      adjust As(i). However if the packet loss rate does not increase, the
      losses are probably not related to self-induced channel over-use and
      therefore we should not react on them.</t>
    </section>

    <section title="Interoperability Considerations">
      <t>There are three scenarios of interest, and one included for
      reference</t>

      <t><list style="symbols">
          <t>Both parties implement the algorithms described here</t>

          <t>Sender implements the algorithm described in section <xref
          target="senderside"></xref>, recipient does not implement <xref
          target="receiverside"></xref></t>

          <t>Recipient implements the algorithm in section <xref
          target="receiverside"></xref>, sender does not implement <xref
          target="senderside"></xref>.</t>
        </list>In the case where both parties implement the algorithms, we
      expect to see most of the congestion control response to slowly varying
      conditions happen by TMMBR/REMB messages from recipient to sender. At
      most times, the sender will send less than the congestion-inducing
      bandwidth limit C, and when he sends more, congestion will be detected
      before packets are lost.</t>

      <t>If sudden changes happen, packets will be lost, and the sender side
      control will trigger, limiting traffic until the congestion becomes low
      enough that the system switches back to the receiver-controlled
      state.</t>

      <t>In the case where sender only implements, we expect to see somewhat
      higher loss rates and delays, but the system will still be overall TCP
      friendly and self-adjusting; the governing term in the calculation will
      be the TFRC formula.</t>

      <t>In the case where recipient implements this algorithm and sender does
      not, congestion will be avoided for slow changes as long as the sender
      understands and obeys TMMBR/REMB; there will be no backoff for
      packet-loss-inducing changes in capacity. Given that some kind of
      congestion control is mandatory for the sender according to the TMMBR
      spec, this case has to be reevaluated against the specific congestion
      control implemented by the sender.</t>
    </section>

    <section title="Implementation Experience">
      <t>This algorithm has been implemented in the open-source WebRTC
      project.</t>
    </section>

    <section title="Further Work">
      <t>This draft is offered as input to the congestion control
      discussion.</t>

      <t>Work that can be done on this basis includes:</t>

      <t><list style="symbols">
          <t>Consideration of timing info: It may be sensible to use the
          proposed TFRC RTP header extensions <xref
          target="I-D.gharai-avtcore-rtp-tfrc"></xref> to carry per-packet
          timing information, which would both give more data points and a
          timestamp applied closer to the network interface. This draft
          includes consideration of using the transmission time offset defined
          in <xref target="RFC5450"></xref></t>

          <t>Considerations of cross-channel calculation: If all packets in
          multiple streams follow the same path over the network, congestion
          or queueing information should be considered across all packets
          between two parties, not just per media stream. A feedback message
          (REMB) that may be suitable for such a purpose is given in <xref
          target="I-D.alvestrand-rmcat-remb"></xref>.</t>

          <t>Considerations of cross-channel balancing: The decision to slow
          down sending in a situation with multiple media streams should be
          taken across all media streams, not per stream.</t>

          <t>Considerations of additional input: How and where packet loss
          detected at the recipient can be added to the algorithm.</t>

          <t>Considerations of locus of control: Whether the sender or the
          recipient is in the best position to figure out which media streams
          it makes sense to slow down, and therefore whether one should use
          TMMBR to slow down one channel, signal an overall bandwidth change
          and let the sender make the decision, or signal the (possibly
          processed) delay info and let the sender run the algorithm.</t>

          <t>Considerations of over-bandwidth estimation: Whether we can use
          the estimate of how much we're over bandwidth in section 3 to
          influence how much we reduce the bandwidth, rather than using a
          fixed factor.</t>

          <t>Startup considerations. It's unreasonable to assume that just
          starting at full rate is always the best strategy.</t>

          <t>Dealing with sender traffic shaping, which delays sending of
          packets. Using send-time timestamps rather than RTP timestamps may
          be useful here, but as long as the sender's traffic shaping does not
          spread out packets more than the bottleneck link, it should not
          matter.</t>

          <t>Stability considerations. It is not clear how to show that the
          algorithm cannot provide an oscillating state, either alone or when
          competing with other algorithms / flows.</t>
        </list>These are matters for further work; since some of them involve
      extensions that have not yet been standardized, this could take some
      time.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>

      <t>Note to RFC Editor: this section may be removed on publication as an
      RFC.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>An attacker with the ability to insert or remove messages on the
      connection will, of course, have the ability to mess up rate control,
      causing people to send either too fast or too slow, and causing
      congestion.</t>

      <t>In this case, the control information is carried inside RTP, and can
      be protected against modification or message insertion using SRTP, just
      as for the media. Given that timestamps are carried in the RTP header,
      which is not encrypted, this is not protected against disclosure, but it
      seems hard to mount an attack based on timing information only.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to Randell Jesup, Magnus Westerlund, Varun Singh, Tim Panton,
      Soo-Hyun Choo, Jim Gettys, Ingemar Johansson, Michael Welzl and others
      for providing valuable feedback on earlier versions of this draft.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3448'?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.5104'?>

      <?rfc include='reference.RFC.5450'?>

      <?rfc include='reference.I-D.alvestrand-rmcat-remb'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.I-D.gharai-avtcore-rtp-tfrc'?>

      <?rfc include='reference.RFC.2914'?>
    </references>

    <section title="Change log">
      <t></t>

      <section title="Version -00 to -01">
        <t><list style="symbols">
            <t>Added change log</t>

            <t>Added appendix outlining new extensions</t>

            <t>Added a section on when to send feedback to the end of section
            3.3 "Rate control", and defined min/max FB intervals.</t>

            <t>Added size of over-bandwidth estimate usage to "further work"
            section.</t>

            <t>Added startup considerations to "further work" section.</t>

            <t>Added sender-delay considerations to "further work"
            section.</t>

            <t>Filled in acknowledgements section from mailing list
            discussion.</t>
          </list></t>
      </section>

      <section title="Version -01 to -02">
        <t><list style="symbols">
            <t>Defined the term "frame", incorporating the transmission time
            offset into its definition, and removed references to "video
            frame".</t>

            <t>Referred to "m(i)" from the text to make the derivation
            clearer.</t>

            <t>Made it clearer that we modify our estimates of available
            bandwidth, and not the true available bandwidth.</t>

            <t>Removed the appendixes outlining new extensions, added pointers
            to REMB draft and RFC 5450.</t>
          </list></t>
      </section>

      <section title="Version -02 to -03">
        <t><list style="symbols">
            <t>Added a section on how to process multiple streams in a single
            estimator using RTP timestamps to NTP time conversion.</t>

            <t>Stated in introduction that the draft is aimed at the RMCAT
            working group.</t>
          </list></t>
      </section>

      <section title="rtcweb-03 to rmcat-00">
        <t>Renamed draft to link the draft name to the rmcat WG.</t>

        <t></t>
      </section>
    </section>
  </back>
</rfc>
PAFTECH AB 2003-2026
2026-04-24 04:29:41