One document matched: draft-lennox-avtcore-rtp-multi-stream-00.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "reference.RFC.2119.xml">
<!ENTITY rtp SYSTEM "reference.RFC.3550.xml">
<!ENTITY rtptopo SYSTEM "reference.RFC.5117.xml">
<!ENTITY sip SYSTEM "reference.RFC.3261.xml">
<!ENTITY sdp SYSTEM "reference.RFC.4566.xml">
<!ENTITY srtp SYSTEM "reference.RFC.3711.xml">
<!ENTITY mikey SYSTEM "reference.RFC.3830.xml">
<!ENTITY rtprtx SYSTEM "reference.RFC.4588.xml">
<!ENTITY avpf SYSTEM "reference.RFC.4585.xml">
<!ENTITY offeranswer SYSTEM "reference.RFC.3264.xml">
<!ENTITY bundle SYSTEM "reference.I-D.ietf-mmusic-sdp-bundle-negotiation.xml">
<!ENTITY savpf SYSTEM "reference.RFC.5124.xml">
<!ENTITY reducedrtcp SYSTEM "reference.RFC.5506.xml">
<!ENTITY sourcedesc SYSTEM "reference.RFC.5576.xml">
<!ENTITY clue SYSTEM "reference.I-D.ietf-clue-framework.xml">
<!ENTITY xr SYSTEM "reference.RFC.3611.xml">
]>
<?rfc toc="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="std" docName="draft-lennox-avtcore-rtp-multi-stream-00"
     ipr="trust200902" updates="3550">
  <front>
    <title abbrev="RTP Multi-Stream Considerations">Real-Time Transport
    Protocol (RTP) Considerations for Endpoints Sending Multiple Media Streams</title>

    <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
      <organization abbrev="Vidyo">Vidyo, Inc.</organization>

      <address>
        <postal>
          <street>433 Hackensack Avenue</street>

          <street>Seventh Floor</street>

          <city>Hackensack</city>

          <region>NJ</region>

          <code>07601</code>

          <country>US</country>
        </postal>

        <email>jonathan@vidyo.com</email>
      </address>
    </author>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 82 87</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <date/>

    <area>RAI</area>

    <workgroup>AVTCORE</workgroup>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <!-- TODO: more keywords -->

    <abstract>
      <t>This document expands and clarifies the behavior of the Real-Time
      Transport Protocol (RTP) endpoints when they are sending multiple media
      streams in a single RTP session. In particular, issues involving
      Real-Time Transport Control Protocol (RTCP) messages are described.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>At the time The <xref target="RFC3550">Real-Time Tranport Protocol
      (RTP)</xref> was originally written, and for quite some time after,
      endpoints in RTP sessions typically only transmitted a single media
      stream per RTP session, where separate RTP sessions were typically used
      for each distinct media type.</t>

      <t>Recently, however, a number of scenarios have emerged (discussed
      further in <xref target="usecases"/>) in which endpoints wish to send
      multiple RTP media streams, distinguished by distinct RTP
      synchronization source (SSRC) identifiers, in a single RTP session.
      Although RTP's initial design did consider such scenarios, the
      specification was not consistently written with such use cases in mind.
      The specifications are thus somewhat unclear.</t>

      <t>The purpose of this document is to expand and clarify <xref
      target="RFC3550"/>'s language for these use cases. The authors believe
      this does not result in any major normative changes to the RTP
      specification, however this document defines how the RTP specification
      shall be interpreted.  In these cases, this document updates RFC3550.</t>
    </section>

    <section title="Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref> and indicate requirement levels for
      compliant implementations.</t>
    </section>

    <section anchor="usecases" title="Use Cases For Multi-Stream Endpoints">
      <t>This section discusses several use cases that have motivated the
      development of endpoints that send multiple streams in a single RTP
      session.</t>

      <section anchor="telepresence" title="Multiple-Capturer Endpoints">
        <t>The most straightforward motivation for an endpoint to send
        multiple media streams in a session is the scenario where an endpoint
        has multiple capture devices of the same media type and
        characteristics. For example, telepresence endpoints, of the type
        described by the <xref target="I-D.ietf-clue-framework">CLUE
        Telepresence Framework</xref> is designed, often have multiple cameras
        or microphones covering various areas of a room.</t>
      </section>

      <section title="Multi-Media Sessions">
        <t>Recent work has been done in <xref
        target="I-D.westerlund-avtcore-multi-media-rtp-session">RTP</xref> and
        <xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">SDP</xref> to
        update RTP's historical assumption that media streams of different
        media types would always be sent on different RTP sessions. In this
        work, a single endpoint's audio and video media streams (for example)
        are instead sent in a single RTP session.</t>
      </section>

      <section title="Multi-Stream Mixers">
        <t>There are several RTP topologies which can involve a central box
        which itself generates multiple media streams in a session.</t>

        <t>One example is a mixer providing centralized compositing for a
        multi-capturer scenario like the one described in <xref
        target="telepresence"/>. In this case, the centralized node is
        behaving much like a multi-capturer endpoint, generating several
        similar and related sources.</t>

        <t>More complicated is the Source Projecting Mixer, which is a central
        box that receives media streams from several endpoints, and then
        selectively forwards modified versions of some of the streams toward
        the other endpoints it is connected to. Toward one destination, a
        separate media source appears in the session for every other source
        connected to the mixer, "projected" from the original streams, but at
        any given time many of them may appear to be inactive (and thus
        receivers, not senders, in RTP). This box is an RTP mixer, not an RTP
        translator, in that it terminates RTCP reporting about the mixed
        streams, and it can re-write SSRCs, timestamps, and sequence numbers,
        as well as the contents of the RTP payloads, and can turn sources on
        and off at will without appearing to be generating packet loss. Each
        projected stream will typically preserve its original RTCP source
        description (SDES) information.</t>
      </section>
    </section>

    <section title="Multi-Stream Endpoint RTP Media Recommendations">
      <t>While an endpoint MUST (of course) stay within its share of the
      available session bandwidth, as determined by signalling and congestion
      control, this need not be applied independently or uniformly to each
      media stream. In particular, session bandwidth MAY be reallocated among
      an endpoint's media streams, for example by varying the bandwidth use of
      a variable-rate codec, or changing the codec used by the media stream,
      up to the constraints of the session's negotiated (or declared) codecs.
      This includes enabling or disabling media streams as more or less
      bandwidth becomes available.</t>
    </section>

    <section title="Multi-Stream Endpoint RTCP Recommendations">
      <t>The Real-Time Transport Control Protocol (RTCP) is defined in Section
      6 of <xref target="RFC3550"/>, but it is largely documented in terms of
      "participants". For multi-media-stream endpoints, it is generally most
      useful to interpret the specification such that each media stream is a
      separate "participant".</t>

      <t>For each of an endpoint's media media streams, whether or not it is
      currently being sent, SR/RR and SDES packets MUST be sent at least once
      per RTCP report interval. (For discussion of the content of SR or RR
      packets' reception statistic reports, see <xref
      target="reportblocks"/>.)</t>

      <t>When a new media stream is added to a unicast session, the sentence
      in <xref target="RFC3550"/>'s Section 6.2 applies: "For unicast sessions
      ... the delay before sending the initial compound RTCP packet MAY be
      zero." Thus, endpoints MAY send an initial RTCP packet for the media
      stream immediately upon adding to the session.</t>

      <t>Similarly, <xref target="RFC3550"/> Section 6.1 gives the following
      advice to RTP translators and mixers:</t>

      <t><list style="empty">
          <t>It is RECOMMENDED that translators and mixers combine individual
          RTCP packets from the multiple sources they are forwarding into one
          compound packet whenever feasible in order to amortize the packet
          overhead (see Section 7). An example RTCP compound packet as might
          be produced by a mixer is shown in Fig. 1. If the overall length of
          a compound packet would exceed the MTU of the network path, it
          SHOULD be segmented into multiple shorter compound packets to be
          transmitted in separate packets of the underlying protocol. This
          does not impair the RTCP bandwidth estimation because each compound
          packet represents at least one distinct participant. Note that each
          of the compound packets MUST begin with an SR or RR packet.</t>
        </list>Note: To avoid confusion, an RTCP packet is an individual item,
      such as an Sender Report (SR), Receiver Report (RR), Source Description
      (SDES), Goodbye (BYE), Application Defined (APP), <xref
      target="RFC4585">Feedback</xref> or <xref target="RFC3611">Extended
      Report (XR)</xref> packet. A compound packet is the combination of two
      or more such RTCP packets where the first packet must be an SR or an RR
      packet, and which contains a SDES packet containing an CNAME item. Thus the above
      results in compound RTCP packets that contain multiple SR or RR packets from
      different sources as well as any of the other packet types. There are no
      restrictions on the order the packets may occur within the compound
      packet, except the regular compound rule, i.e. starting with an SR or RR.
      </t>

      <t>This advice applies to multi-media-stream endpoints as well, with the
      same restrictions and considerations. (Note, however, that the last
      sentence does not apply to <xref target="RFC4585">AVPF</xref> or <xref
      target="RFC5124">SAVPF</xref> feedback packets if <xref
      target="RFC5506">Reduced-Size RTCP</xref> is in use.)</t>

      <t>Open Issue: Any clarifications on how one handle the scheduling of
      RTCP transmissions when having multiple sources? Alternatives include
      delaying one source to the next source's transmission, or to group
	  multiple sources to use only one scheduling. </t>

      <section anchor="reportblocks"
               title="Transmission of RTCP Reception Statistics">
        <t>As required by <xref target="RFC3550"/>, an endpoint MUST send
        reception reports about every active media stream it is receiving,
        from at least one local source.</t>

        <t>However, a naive application of the RTP specification's rules could
        be quite inefficient. In particular, if a session has N media sources
        (active and inactive), and had S senders in each reporting interval,
        there would either be N*S report blocks per reporting interval, or
        (per the round-robinning recommendations of <xref target="RFC3550"/>
        Section 6.1) reception sources would be unnecessarily round-robinned.
        In a session where most media sources become senders reasonably
        frequently, this results in quadratically many reception report blocks
        in the conference, or reporting delays proportional to the number of
        session members.</t>

        <t>Since traffic is received by endpoints, however, rather than by
        media sources, there is not actually any need for this quadratic
        expansion. All that is needed is for each endpoint to report all the
        remote sources it is receiving.</t>

        <t>Thus, an endpoint SHOULD NOT send reception reports from one of its
        own media sources about another one of its own ("self-reports").
        Similarly, an endpoint with multiple media sources SHOULD NOT send
        reception reports about a remote media source from more than one of
        its local sources ("cross-reports"). Instead, it SHOULD pick one of
        its local media sources as the "reporting" source for each remote
        media source, and use it to send reception reports for that remote
        source; all its other media sources SHOULD NOT send any reception
        reports for that remote media source.</t>

        <t>An endpoint MAY choose different local media sources as the
        reporting source for different remote media sources (for example, it
        could choose to send reports about remote audio sources from its local
        audio source, and reports about remote video sources from its local
        video source), or it MAY choose a single local source for all its
        reports. If the reporting source leaves the session (sends BYE),
        another reporting source MUST be chosen. This "reporting" source
        SHOULD also be the source for any AVPF feedback messages about its
        remote sources, as well.</t>
      </section>

      <section title="Consequences of Restricted RTCP Reception Statistics">
        <t>The RTCP traffic generated by receivers following the rules in
        <xref target="reportblocks"/> might appear, to observers unaware of
        the recommendations of this specification or knowledge about which
        end-points are associated with which SSRCs, to be generated by
        receivers who are experiencing a network disconnection. </t>

        <t>This could be a potentially critical problem when one uses RTCP for
        congestion control, as a sender might think that it is sending so
		  much traffic that it is causing complete congestion collapse.
		  At the same time, however, a congestion control solution is
        likely not interested in performing unecessary processing based on
        multiple reporting sources having identical statistics. A congestion
        control algorithm is likely more interested in frequent reporting from
        one specific source than multiple sources at the same end-point based
        on common statistics. That would reduce the uncertainty that sources
        are from the same end-point, and likely improve the interarrival time of the
        reporting, compared to multiple SSRCs which, by the RTCP algorithm, are
		deliberately desynchronized. However, this would clearly require
        clarifications on how the RTCP timer rules are to be treated. </t>

        <t>However, such an interpretation of the session statistics would
        require a fairly sophisticated RTCP analysis. Any receiver of RTCP
        statistics which is just interested in information about itself needs
        to be prepared that any given reception report might not contain
        information about a specific media source, because reception reports
        in large conferences can be round-robined.</t>

        <t>Thus, it is unclear to what extent this restriction would actually
        cause trouble in practice.</t>
      </section>

      <section anchor="alternate" title="Alternate Restriction Proposal">
        <t>If there are indeed scenarios in which the rules of <xref
        target="reportblocks"/> do cause troubles, an alternative solution
        would be to explicitly signal, in RTCP, which groups of media sources
        originate from a single endpoint. Thus, within a group of sources,
        receivers could know that there would not be self-reports, and only a
        single SSRC would be providing cross-reports. In such a mode, the
        signaling protocol would need to negotiate, or declare, that the
        mode was in use. </t>

        <t>The next question would be to determine how to indicate the groups
        of sources for this purpose. The sources' CNAMEs would probably not be
        sufficient, as some of the use cases described in <xref
        target="usecases"/>, notably the source-projecting mixer, result in a
        single endpoint generating sources with multiple CNAME values. Thus, a
        new SDES item would be needed for these purposes.</t>

        <t>TBD: If this solution is indeed taken, define the specifics of this
        SDES item, and the signaling needed to indicate its use.</t>
      </section>
    </section>

    <section anchor="security" title="Security Considerations">
      <t>In the <xref target="RFC3711">secure RTP protocol (SRTP)</xref>, the
      cryptographic context of a compound SRTCP packet is the SSRC of the
      sender of the first RTCP (sub-)packet. This could matter in some cases,
      especially for keying mechanisms such as <xref
      target="RFC3830">Mikey</xref> which use per-SSRC keying.</t>

      <t>Other than that, the standard security considerations of RTP apply;
      sending multiple media streams from a single endpoint does not appear to
      have different security consequences than sending the same number of
      streams.</t>
    </section>

    <section title="Open Issues">
      <t>At this stage this document contains a number of open issues. The
      below list tries to summarize the issues:<list style="numbers">
          <t>Any clarifications on how to handle the RTCP scheduler when
          sending multiple sources in one compound packet.</t>

          <t>Shall suppression of self-reporting, i.e. reporting one's other
          SSRCs in any SR/RR, be applied?</t>

          <t>Shall suppression of cross-reporting be used, i.e. each end-point uses
          only one SSRC to report on any non-local SSRCs being received? If so
          what method should be applied:<list style="numbers">
              <t>Implicit, by just not report using any other SSRC</t>

              <t>Explicit binding of SSRCs that are being commonly reported,
              either using SDES or another packet type, to explicitly indicate the
              SSRCs on whose behalf the report applies.</t>

              <t>Add any specific RTCP scheduler considerations.</t>
            </list></t>
        </list></t>

      <t/>
    </section>

    <section anchor="iana" title="IANA Considerations">
      <t>This document makes no requests of IANA.</t>

      <t>Note to the RFC Editor: please remove this section before
      publication.</t>

      <t>(Note: This section may change if the alternative proposal of <xref
      target="alternate"/> is adopted.)</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc2119;

      &rtp;

      &srtp;

      &avpf;

      &savpf;

      &reducedrtcp;
    </references>

    <references title="Informative References">
      <!-- &rtptopo; -->

      <!-- &rtprtx; -->

      &bundle;

      <!-- &sourcedesc; -->

      &clue;

      <!-- &sdp; -->

      <!-- &offeranswer; -->

      &mikey;

      &xr;

      <!-- make this a real ref once the draft is published -->

      <reference anchor="I-D.westerlund-avtcore-multi-media-rtp-session">
        <front>
          <title abbrev="Multiple Media Types in an RTP Session">Multiple
          Media Types in an RTP Session</title>

          <author fullname="Magnus Westerlund" initials="M."
                  surname="Westerlund">
            <organization>Ericsson</organization>
          </author>

          <author fullname="Colin Perkins" initials="C" surname="Perkins"/>

          <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
            <organization abbrev="Vidyo">Vidyo, Inc.</organization>
          </author>

          <date day="9" month="July" year="2012"/>
        </front>

        <seriesInfo name="Internet-Draft"
                    value="draft-westerlund-avtcore-multi-media-rtp-session-00"/>

        <format target="http://www.ietf.org/internet-drafts/draft-westerlund-avtcore-multi-media-rtp-session-00.txt"
                type="TXT"/>
      </reference>
    </references>

    <!--
<section title='Open issues'>

<t><list style='symbols'>

<t></t>

</list></t>

</section>
-->

    <!-- 
<section title='Changes From Earlier Versions'>

<t>Note to the RFC-Editor: please remove this section prior to publication
as an RFC.</t>

<section title='Changes From Draft -00'>

<t><list style='symbols'>

<t></t>


</list></t>

</section>

</section>

-->
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 13:11:48