<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "reference.RFC.2119.xml">
<!ENTITY rtp SYSTEM "reference.RFC.3550.xml">
<!ENTITY rtptopo SYSTEM "reference.I-D.westerlund-avtcore-rtp-topologies-update.xml">
<!ENTITY sip SYSTEM "reference.RFC.3261.xml">
<!ENTITY sdp SYSTEM "reference.RFC.4566.xml">
<!ENTITY srtp SYSTEM "reference.RFC.3711.xml">
<!ENTITY mikey SYSTEM "reference.RFC.3830.xml">
<!ENTITY rtprtx SYSTEM "reference.RFC.4588.xml">
<!ENTITY avpf SYSTEM "reference.RFC.4585.xml">
<!ENTITY offeranswer SYSTEM "reference.RFC.3264.xml">
<!ENTITY bundle SYSTEM "reference.I-D.ietf-mmusic-sdp-bundle-negotiation.xml">
<!ENTITY savpf SYSTEM "reference.RFC.5124.xml">
<!ENTITY reducedrtcp SYSTEM "reference.RFC.5506.xml">
<!ENTITY sourcedesc SYSTEM "reference.RFC.5576.xml">
<!ENTITY clue SYSTEM "reference.I-D.ietf-clue-framework.xml">
<!ENTITY xr SYSTEM "reference.RFC.3611.xml">
<!ENTITY session SYSTEM "reference.I-D.ietf-avtcore-multi-media-rtp-session.xml">
<!ENTITY cname SYSTEM "reference.RFC.6222.xml">
]>
<?rfc toc="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="std" docName="draft-lennox-avtcore-rtp-multi-stream-01"
     ipr="trust200902" updates="3550">
  <front>
    <title abbrev="RTP Multi-Stream Considerations">Real-Time Transport
    Protocol (RTP) Considerations for Endpoints Sending Multiple Media
    Streams</title>

    <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
      <organization abbrev="Vidyo">Vidyo, Inc.</organization>

      <address>
        <postal>
          <street>433 Hackensack Avenue</street>

          <street>Seventh Floor</street>

          <city>Hackensack</city>

          <region>NJ</region>

          <code>07601</code>

          <country>US</country>
        </postal>

        <email>jonathan@vidyo.com</email>
      </address>
    </author>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 82 87</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <date/>

    <area>RAI</area>

    <workgroup>AVTCORE</workgroup>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <!-- TODO: more keywords -->

    <abstract>
      <t>This document expands and clarifies the behavior of the Real-Time
      Transport Protocol (RTP) endpoints when they are sending multiple media
      streams in a single RTP session. In particular, issues involving
      Real-Time Transport Control Protocol (RTCP) messages are described.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>At the time the <xref target="RFC3550">Real-Time Transport Protocol
      (RTP)</xref> was originally written, and for quite some time after,
      endpoints in RTP sessions typically transmitted only a single media
      stream per RTP session, and separate RTP sessions were typically used
      for each distinct media type.</t>

      <t>Recently, however, a number of scenarios have emerged (discussed
      further in <xref target="usecases"/>) in which endpoints wish to send
      multiple RTP media streams, distinguished by distinct RTP
      synchronization source (SSRC) identifiers, in a single RTP session.
      Although RTP's initial design did consider such scenarios, the
      specification was not consistently written with such use cases in mind,
      and its language is therefore somewhat unclear for them.</t>

      <t>The purpose of this document is to expand and clarify <xref
      target="RFC3550"/>'s language for these use cases. The authors believe
      this does not result in any major normative changes to the RTP
      specification; however, this document defines how the RTP specification
      shall be interpreted. In these cases, this document updates RFC 3550.</t>

      <t>The document starts with terminology and some use cases in which
      multiple sources occur. It then presents case studies that identify
      issues needing consideration, followed by RTP and RTCP recommendations
      to resolve those issues. It closes with security considerations and the
      remaining open issues.</t>
    </section>

    <section title="Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref> and indicate requirement levels for
      compliant implementations.</t>
    </section>

    <section anchor="usecases" title="Use Cases For Multi-Stream Endpoints">
      <t>This section discusses several use cases that have motivated the
      development of endpoints that send multiple streams in a single RTP
      session.</t>

      <section anchor="telepresence" title="Multiple-Capturer Endpoints">
        <t>The most straightforward motivation for an endpoint to send
        multiple media streams in a session is the scenario where an endpoint
        has multiple capture devices of the same media type and
        characteristics. For example, telepresence endpoints of the type
        described by the <xref target="I-D.ietf-clue-framework">CLUE
        Telepresence Framework</xref> often have multiple cameras or
        microphones covering various areas of a room.</t>
      </section>

      <section title="Multi-Media Sessions">
        <t>Recent work has been done in <xref
        target="I-D.ietf-avtcore-multi-media-rtp-session">RTP</xref> and <xref
        target="I-D.ietf-mmusic-sdp-bundle-negotiation">SDP</xref> to update
        RTP's historical assumption that media streams of different media
        types would always be sent on different RTP sessions. In this work, a
        single endpoint's audio and video media streams (for example) are
        instead sent in a single RTP session.</t>
      </section>

      <section title="Multi-Stream Mixers">
        <t>There are several RTP topologies which can involve a central box
        which itself generates multiple media streams in a session.</t>

        <t>One example is a mixer providing centralized compositing for a
        multi-capturer scenario like the one described in <xref
        target="telepresence"/>. In this case, the centralized node is
        behaving much like a multi-capturer endpoint, generating several
        similar and related sources.</t>

        <t>More complicated is the Source Projecting Mixer, see <xref
        target="I-D.westerlund-avtcore-rtp-topologies-update">Section
        3.6</xref>, which is a central box that receives media streams from
        several endpoints, and then selectively forwards modified versions of
        some of the streams toward the other endpoints it is connected to.
        Toward one destination, a separate media source appears in the session
        for every other source connected to the mixer, "projected" from the
        original streams, but at any given time many of them may appear to be
        inactive (and thus receivers, not senders, in RTP). This box is an RTP
        mixer, not an RTP translator, in that it terminates RTCP reporting
        about the mixed streams, and it can re-write SSRCs, timestamps, and
        sequence numbers, as well as the contents of the RTP payloads, and can
        turn sources on and off at will without appearing to be generating
        packet loss. Each projected stream will typically preserve its
        original RTCP source description (SDES) information.</t>
      </section>
    </section>

    <section title="Issue Cases">
      <t>This section tries to illustrate a few cases that have been
      determined to cause issues.</t>

      <section title="Cascaded Multi-party Conference with Source
      Projecting Mixers" anchor="source-proj-mixers">
        <t>This issue case tries to illustrate the effect of having multiple
        SSRCs sent by an endpoint, by considering the traffic between
        two source-projecting mixers in a large multi-party conference.</t>

        <t>For concreteness, consider a 200-person conference,
        where 16 sources are viewed at any given time.  Assuming
        participants are distributed evenly among the mixers, each
        mixer would have 100 sources "behind" (projected through) it,
        of which at any given time eight are active senders.  Thus,
        the RTP session between the mixers consists of two endpoints,
        but 200 sources.</t>

		<t>The RTCP bandwidth implications of this scenario are
		discussed further in <xref target="reportgroups-bw"/>.</t>

		<t>(TBD: Other examples?)</t>

      </section>

    </section>

    <section title="Multi-Stream Endpoint RTP Media Recommendations">
      <t>While an endpoint MUST (of course) stay within its share of the
      available session bandwidth, as determined by signalling and congestion
      control, this need not be applied independently or uniformly to each
      media stream. In particular, session bandwidth MAY be reallocated among
      an endpoint's media streams, for example by varying the bandwidth use of
      a variable-rate codec, or changing the codec used by the media stream,
      up to the constraints of the session's negotiated (or declared) codecs.
      This includes enabling or disabling media streams as more or less
      bandwidth becomes available.</t>
    </section>

    <section title="Multi-Stream Endpoint RTCP Recommendations">
      <t>This section contains a number of RTCP clarifications and
      recommendations that enable more efficient and simpler behavior without
      loss of functionality.</t>

      <t>The Real-Time Transport Control Protocol (RTCP) is defined in
        Section 6 of <xref target="RFC3550"/>, but it is largely documented in
        terms of "participants".  In many cases, the specification's recommendations for
        "participants" should be interpreted as applying to individual
        media streams, rather than to endpoints.  This section
        describes several concrete cases where this applies.</t>

      <section title="RTCP Reporting Requirement">
        <t>For each of an endpoint's media streams, whether or not it is
        currently sending media, SR/RR and SDES packets MUST be sent at least
        once per RTCP report interval. (For discussion of the content of SR or
        RR packets' reception statistic reports, see <xref
        target="reportgroups"/>.)</t>
      </section>

      <section title="Initial Reporting Interval">
        <t>When a new media stream is added to a unicast session, the sentence
        in <xref target="RFC3550"/>'s Section 6.2 applies: "For unicast
        sessions ... the delay before sending the initial compound RTCP packet
        MAY be zero." This applies to individual media sources as
        well.  Thus, endpoints MAY send an initial RTCP packet for an
        SSRC immediately upon adding it to a unicast session.</t>

        <t>This allowance also applies, as written, when initially joining a
        unicast session. However, in this case some caution should be
        exercised if the endpoint or mixer has a large number of sources
        (SSRCs), as this can create a significant burst. How big an issue this
        is depends on the number of sources for which initial SR or RR packets
        and Source Description (SDES) CNAME items need to be sent, in relation
        to the RTCP bandwidth. TBD: Maybe some recommendation here?</t>
      </section>

      <section title="Compound RTCP Packets" anchor="compound">
        <t>Section 6.1 gives the following advice to RTP translators and
        mixers:</t>

        <t><list style="empty">
            <t>It is RECOMMENDED that translators and mixers combine
            individual RTCP packets from the multiple sources they are
            forwarding into one compound packet whenever feasible in order to
            amortize the packet overhead (see Section 7). An example RTCP
            compound packet as might be produced by a mixer is shown in Fig.
            1. If the overall length of a compound packet would exceed the MTU
            of the network path, it SHOULD be segmented into multiple shorter
            compound packets to be transmitted in separate packets of the
            underlying protocol. This does not impair the RTCP bandwidth
            estimation because each compound packet represents at least one
            distinct participant. Note that each of the compound packets MUST
            begin with an SR or RR packet.</t>
          </list></t>

        <t>Note: To avoid confusion, an RTCP packet is an individual item,
        such as a Sender Report (SR), Receiver Report (RR), Source
        Description (SDES), Goodbye (BYE), Application Defined (APP), <xref
        target="RFC4585">Feedback</xref> or <xref target="RFC3611">Extended
        Report (XR)</xref> packet. A compound packet is the combination of two
        or more such RTCP packets where the first packet must be an SR or an
        RR packet, and which contains an SDES packet containing a CNAME item.
        Thus the above results in compound RTCP packets that contain multiple
        SR or RR packets from different sources as well as any of the other
        packet types. There are no restrictions on the order the packets may
        occur within the compound packet, except the regular compound rule,
        i.e. starting with an SR or RR.</t>

        <t>This advice applies to multi-media-stream endpoints as well, with
        the same restrictions and considerations. (Note, however, that the
        last sentence does not apply to <xref target="RFC4585">AVPF</xref> or
        <xref target="RFC5124">SAVPF</xref> feedback packets if <xref
        target="RFC5506">Reduced-Size RTCP</xref> is in use.)</t>
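
        <t>As an illustration only, the regular compound rule described in
        the note above can be checked mechanically. The following Python
        sketch walks the RTCP packets in a buffer and checks that the first
        one is an SR or RR and that some SDES packet carries a CNAME item.
        The function names are hypothetical and not part of any
        specification; the packet type values (200, 201, 202) and the CNAME
        item type (1) are those of <xref target="RFC3550"/>. The sketch
        assumes otherwise well-formed input and does not cover Reduced-Size
        RTCP.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

RTCP_SR, RTCP_RR, RTCP_SDES = 200, 201, 202   # packet types from RFC 3550
SDES_CNAME = 1                                # SDES item type from RFC 3550


def split_compound(data):
    """Return [(packet_type, packet_bytes), ...] for a compound buffer."""
    packets, offset = [], 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        first, ptype, length = struct.unpack_from("!BBH", data, offset)
        if first >> 6 != 2:
            raise ValueError("unexpected RTP version")
        size = 4 * (length + 1)      # length counts 32-bit words minus one
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        packets.append((ptype, data[offset:offset + size]))
        offset += size
    return packets


def sdes_has_cname(packet):
    """True if any chunk of this SDES packet carries a CNAME item."""
    pos = 4
    for _ in range(packet[0] & 0x1F):     # SC field: number of chunks
        pos += 4                          # skip the chunk's SSRC
        while pos + 1 < len(packet) and packet[pos] != 0:
            if packet[pos] == SDES_CNAME:
                return True
            pos += 2 + packet[pos + 1]    # item header plus item text
        pos = (pos // 4 + 1) * 4          # skip terminator and padding
    return False


def is_regular_compound(data):
    """SR or RR must come first, and an SDES with a CNAME must be present."""
    packets = split_compound(data)
    if not packets or packets[0][0] not in (RTCP_SR, RTCP_RR):
        return False
    return any(t == RTCP_SDES and sdes_has_cname(p) for t, p in packets)
```
]]></artwork>
        </figure>

        <t>Because the check looks only at the first packet and at the
        presence of a CNAME, a buffer holding SR or RR packets from several
        local sources followed by their SDES packets passes it unchanged.</t>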

        <t>Due to RTCP's randomization of reporting times,
		  there is a fair bit of tolerance in precisely when an
		  endpoint schedules RTCP to be sent.  Thus, one potential
		  way of implementing this recommendation would be to
		  randomize all of an endpoint's sources together, with a
		  single randomization schedule, so an MTU's worth of RTCP all
		  comes out simultaneously.</t>
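
        <t>One possible reading of this approach is sketched below in
        Python. It assumes that each local source has already produced its
        own run of RTCP packets, beginning with its SR or RR, and it
        greedily concatenates these runs into datagrams that stay within a
        size budget. The function name and the 1200-byte default budget are
        assumptions made for illustration; neither is taken from this
        document or from <xref target="RFC3550"/>.</t>

        <figure>
          <artwork><![CDATA[
```python
def pack_compounds(bundles, budget=1200):
    """Greedily pack per-source RTCP runs into compound packets.

    bundles: list of bytes objects; each holds one source's SR or RR
             followed by that source's other RTCP packets.
    budget:  assumed maximum compound size in bytes (hypothetical value).
    """
    datagrams, current = [], b""
    for bundle in bundles:
        # Start a new compound when the next run would not fit.
        if current and len(current) + len(bundle) > budget:
            datagrams.append(current)
            current = b""
        current += bundle     # a run larger than the budget is sent alone
    if current:
        datagrams.append(current)
    return datagrams
```
]]></artwork>
        </figure>

        <t>Since every run starts with an SR or RR, every resulting compound
        packet does too.</t>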

		<t>TBD: Multiplexing RTCP packets from multiple different
		sources may require some adjustment to the calculation of
		RTCP's avg_rtcp_size, as the RTCP group interval is
		proportional to avg_rtcp_size times the group size.</t>

      </section>
	</section>

	<section title="RTCP Bandwidth Considerations When Sources have
	Greatly-Differing Bandwidths">

	  <t>It is possible for an RTP session to carry sources of greatly
	  differing bandwidths.  One example is the scenario of
	  <xref target="I-D.ietf-avtcore-multi-media-rtp-session"/>, when
	  audio and video are sent in the same session.  However, this can
	  occur even within a single media type, for example a video
	  session carrying both 5 fps QCIF and 60 fps 1080p HD video, or
	  an audio session carrying both G.729 and L24/48000/6 audio.</t>

	  <t>TBD: recommend how RTCP bandwidths should be chosen in these
	  scenarios.  Likely, these recommendations will be different for
	  sessions using AVPF-based profiles (where the trr-int parameter
	  is available) than for those using AVP.</t>

	</section> 

      <section anchor="reportgroups"
               title="Grouping of RTCP Reception Statistics and
               Other Feedback">
        <t>As required by <xref target="RFC3550"/>, an endpoint MUST send
        reception reports about every active media stream it is receiving,
        from at least one local source.</t>

        <t>However, a naive application of the RTP specification's rules could
        be quite inefficient. In particular, if a session has N media sources
        (active and inactive), and has S senders in each reporting interval,
        there will either be N*S report blocks per reporting interval, or
        (per the round-robinning recommendations of <xref target="RFC3550"/>
        Section 6.1) reception sources would be unnecessarily round-robinned.
        In a session where most media sources become senders reasonably
        frequently, this results in quadratically many reception report blocks
        in the conference, or reporting delays proportional to the number of
        session members.</t>

        <t>Since traffic is received by endpoints, however, rather than by
        media sources, there is not actually any need for this quadratic
        expansion. All that is needed is for each endpoint to report all the
        remote sources it is receiving.</t>

		<t>Thus, this document defines a new RTCP mechanism, Reporting
		Groups, to indicate sources which originate from the same
		endpoint, and which therefore would have identical reception
		reports.</t>

		<section title="Semantics and Behavior of Reporting Groups">

		<t>An RTCP Reporting Group indicates that a set of sources
		originate from a single entity in an RTP session, and that
		all the sources in the group therefore have an identical view
		of the network.  Typically, a Reporting Group corresponds to a
		physical entity in the network.</t>

        <t>If reporting groups are in use, an endpoint MUST NOT send
        reception reports from one source in a reporting group about another
        source in the same group ("self-reports"). Similarly, an endpoint
        MUST NOT send reception reports about a remote media source from more
        than one source in a reporting group ("cross-reports"). Instead, it
        MUST pick one of its local media sources as the "reporting" source
        for each remote media source, and use it to send reception reports
        for that remote source; all its other media sources MUST NOT send any
        reception reports for that remote media source.</t>

        <t>An endpoint MAY choose different local media sources as the
        reporting source for different remote media sources (for example, it
        could choose to send reports about remote audio sources from a local
        audio source, and reports about remote video sources from a local
        video source), or it MAY choose a single local source for all its
        reports.   This reporting source
        MUST also be the source for any <xref target="RFC4585">AVPF
        Feedback</xref> or <xref target="RFC3611">Extended Report (XR)</xref> packets 
        about the corresponding remote sources as
        well.  If a reporting source leaves the session (i.e., if
        it sends a BYE, or leaves the group without sending BYE under the rules
        of <xref target="RFC3550"/> section 6.3.7),
        another reporting source MUST be chosen, if the sources
        it was reporting on are still in the session.</t>

		<t>If AVPF feedback is in use, a reporting source MAY send
		immediate or early feedback at any point when any member of
		the reporting group could validly do so.</t>

		<t>An endpoint SHOULD NOT create single-source reporting
		groups, unless it is anticipated that the group might have
		additional sources added to it in the future.</t>
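
		<t>The selection rules above can be sketched as a small
		bookkeeping structure. The Python sketch below is
		illustrative only: the class and method names are
		hypothetical, and the policy of always picking the first
		remaining local source is an assumption, since the text
		allows any local source to be chosen.</t>

		<figure>
		  <artwork><![CDATA[
```python
class ReportingGroup:
    """Tracks which local source reports on each remote source."""

    def __init__(self, local_ssrcs):
        self.local = list(local_ssrcs)
        self.reporter_for = {}      # remote SSRC -> local reporting SSRC

    def add_remote(self, remote_ssrc):
        if remote_ssrc in self.local:
            raise ValueError("no reports between sources of one group")
        if not self.local:
            raise ValueError("no local source available to report")
        # Assumed policy: the first local source becomes the reporter.
        self.reporter_for.setdefault(remote_ssrc, self.local[0])

    def may_report(self, local_ssrc, remote_ssrc):
        """Only the chosen reporting source may report on a remote."""
        return self.reporter_for.get(remote_ssrc) == local_ssrc

    def remote_left(self, remote_ssrc):
        self.reporter_for.pop(remote_ssrc, None)

    def local_left(self, local_ssrc):
        """A local source sent BYE or timed out: choose new reporters."""
        self.local.remove(local_ssrc)
        replacement = self.local[0] if self.local else None
        for remote, reporter in self.reporter_for.items():
            if reporter == local_ssrc:
                self.reporter_for[remote] = replacement
```
]]></artwork>
		</figure>

		<t>In such a design, the same lookup would also decide which
		local source originates AVPF feedback or XR packets about a
		given remote source.</t>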

		</section>

		<section title="RTCP Source Description (SDES) item for
		Reporting Groups">

		  <t>A new Source Description (SDES) item, "RGRP", indicates
		  that a source is a member of a specified reporting group.
		  Syntactically, its format is the same as the
		  <xref target="RFC6222">RTCP CNAME</xref>, and its value MUST
		  be chosen with the same global-uniqueness and privacy
		  considerations as CNAME.</t>

		  <t>Every source which belongs to a reporting group MUST
		  include an RGRP SDES item in an SDES packet, alongside its
		  CNAME, in every compound RTCP packet in which it sends an RR
		  or SR packet.  (I.e., in every RTCP packet it sends, unless
		  <xref target="RFC5506">Reduced-Size RTCP</xref> is in use.)</t>
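
		  <t>As a rough illustration, an SDES packet carrying both
		  items for one source could be built as in the Python sketch
		  below. The item type value used for RGRP is a hypothetical
		  placeholder, since the real value is still to be assigned
		  (see <xref target="iana"/>); the function names are likewise
		  assumptions. The item and chunk layout follows the SDES
		  format of <xref target="RFC3550"/>.</t>

		  <figure>
		    <artwork><![CDATA[
```python
import struct

RTCP_SDES = 202          # packet type from RFC 3550
SDES_CNAME = 1           # item type from RFC 3550
RGRP_ITEM_TYPE = 255     # HYPOTHETICAL placeholder; real value is TBD


def sdes_item(item_type, text):
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("SDES item text is limited to 255 octets")
    return bytes([item_type, len(data)]) + data


def sdes_chunk(ssrc, cname, rgrp):
    """One chunk: SSRC, CNAME item, RGRP item, terminator, padding."""
    body = struct.pack("!I", ssrc)
    body += sdes_item(SDES_CNAME, cname)
    body += sdes_item(RGRP_ITEM_TYPE, rgrp)
    body += b"\x00"                       # end of item list
    body += b"\x00" * (-len(body) % 4)    # pad to a 32-bit boundary
    return body


def sdes_packet(chunks):
    if not 1 <= len(chunks) <= 31:
        raise ValueError("an SDES packet holds 1 to 31 chunks here")
    payload = b"".join(chunks)
    # Length field: packet size in 32-bit words minus one.
    header = struct.pack("!BBH", 0x80 | len(chunks), RTCP_SDES,
                         len(payload) // 4)
    return header + payload
```
]]></artwork>
		  </figure>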

		</section>

		<section title="SDP signaling for Reporting Groups">

		<t>TBD</t>

		</section>

		<section anchor="reportgroups-bw" title="Bandwidth Benefits of RTCP Reporting Groups">

		<t>To understand the benefits of RTCP reporting groups, consider the scenario described in
		<xref target="source-proj-mixers"/>.  This scenario describes
		an environment in which the two endpoints in a session each
		have a hundred sources, of which eight each are sending
		within any given reporting interval.</t>

		<t>For ease of analysis, we can make the simplifying
		approximation that the duration of the RTCP reporting interval is equal to the
		total size of the RTCP packets sent during an RTCP interval,
		divided by the RTCP bandwidth.  (This will be approximately
		true in scenarios where the bandwidth is not so high that the
		minimum RTCP interval is reached.) For further simplification,
		we can assume RTCP senders are following the recommendations
		of <xref target="compound"/>; thus, the per-packet
		transport-layer overhead will be small relative to the RTCP
		data.  Thus, only the actual RTCP data itself need be
		considered.</t>

		<t>In a report interval in this scenario, there will, as a
		baseline, be 200 SDES packets, 184 RR packets, and 16 SR
		packets.  This amounts to approximately 6.5 kB of RTCP per
		report interval, assuming 16-byte CNAMEs and no other SDES
		information.</t>

		<t>Using naive everyone-reports-on-every-sender feedback
		rules, each of the 184 receivers will send 16 report blocks,
		and each of the 16 senders will send 15.  This amounts to
		approximately 76 kB of report block traffic per interval; 92%
		of RTCP traffic consists of report blocks.</t>

		<t>If reporting groups are used, however, there is only 0.4 kB
		of reports per interval, with no loss of
		useful information.  Additionally, there will be (assuming
		16-byte RGRPs as well) an additional 3.2 kB per cycle of RGRP
		SDES items.  Put another way, the naive case's
		reporting interval is approximately 7.5 times longer than if
		reporting groups are in use.</t>
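
		<t>The arithmetic above can be approximated with a short
		calculation. The Python sketch below assumes 8 bytes for an
		RR packet without report blocks, 28 bytes for an SR packet
		without report blocks, and 24 bytes per report block (the
		sizes in <xref target="RFC3550"/>), plus 24 bytes per SDES
		packet carrying only a CNAME and 16 bytes per RGRP item. The
		last two sizes and all variable names are assumptions chosen
		for illustration. Under these assumptions the sketch gives
		figures close to those quoted above; the interval ratio it
		yields (about 8) is of the same order as the figure in the
		text but not identical to it, because the result is sensitive
		to the assumed packet sizes.</t>

		<figure>
		  <artwork><![CDATA[
```python
SOURCES, SENDERS, ENDPOINTS = 200, 16, 2
RECEIVERS = SOURCES - SENDERS                 # 184 sources send RR
RR_BYTES, SR_BYTES, BLOCK_BYTES = 8, 28, 24   # sizes from RFC 3550
SDES_BYTES = 24    # assumption: SDES packet with a CNAME only
RGRP_BYTES = 16    # assumption: one RGRP item

# Baseline: one SDES per source, plus an RR or SR header per source.
baseline = (SOURCES * SDES_BYTES + RECEIVERS * RR_BYTES
            + SENDERS * SR_BYTES)                     # 6720 bytes

# Naive rules: every source reports on every sender except itself.
naive_blocks = RECEIVERS * SENDERS + SENDERS * (SENDERS - 1)
naive_reports = naive_blocks * BLOCK_BYTES            # 76416 bytes

# Reporting groups: each endpoint reports once per remote sender.
grouped_blocks = ENDPOINTS * (SENDERS // ENDPOINTS)
grouped_reports = grouped_blocks * BLOCK_BYTES        # 384 bytes
rgrp_items = SOURCES * RGRP_BYTES                     # 3200 bytes

naive_total = baseline + naive_reports
grouped_total = baseline + grouped_reports + rgrp_items
report_share = naive_reports / naive_total
interval_ratio = naive_total / grouped_total

print(f"baseline {baseline} B, naive reports {naive_reports} B")
print(f"report share {report_share:.0%}, ratio {interval_ratio:.1f}")
```
]]></artwork>
		</figure>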

		</section>


        <section title="Consequences of RTCP Reporting Groups">
          <t>The RTCP traffic generated by receivers using RTCP
          Reporting Groups might appear, to observers unaware of
          these semantics, to be generated by
          receivers who are experiencing a network disconnection, as
          the non-reporting sources appear not to be receiving a given
          sender at all.</t>

          <t>This could be a critical problem for a sender using RTCP for
          congestion control, as such a sender might think that it is sending
          so much traffic that it is causing complete congestion collapse.</t>

          <t>However, such an interpretation of the session statistics would
          require a fairly sophisticated RTCP analysis. Any receiver of RTCP
          statistics which is just interested in information about itself
          already needs to be prepared for the possibility that a given
          reception report does not contain information about a specific
          media source, because reception reports in large conferences can be
          round-robined.</t>

          <t>Thus, it is unclear to what extent such backward
          compatibility issues would actually cause trouble in practice.</t>
        </section>

    </section>

    <section anchor="security" title="Security Considerations">
      <t>In the <xref target="RFC3711">secure RTP protocol (SRTP)</xref>, the
      cryptographic context of a compound SRTCP packet is the SSRC of the
      sender of the first RTCP (sub-)packet. This could matter in some cases,
      especially for keying mechanisms such as <xref
      target="RFC3830">MIKEY</xref> which use per-SSRC keying.</t>

      <t>Other than that, the standard security considerations of RTP apply;
      sending multiple media streams from a single endpoint does not appear to
      have different security consequences than sending the same number of
      streams from separate endpoints.</t>
    </section>

    <section title="Open Issues">
      <t>At this stage this document contains a number of open issues. The
      below list tries to summarize the issues:<list style="numbers">
          <t>Further clarifications on how to handle the RTCP scheduler when
          sending multiple sources in one compound packet.</t>

		  <t>How should the use of reporting groups be signaled in SDP?</t>

		  <t>How should the RTCP avg_rtcp_size be calculated when
		  RTCP packets are routinely multiplexed among multiple RTCP
		  senders?</t>

          <t>Do we need to recommend that unicast session joiners with many
          sources not use a zero initial minimum interval, given the
          resulting bit-rate burst?</t>
        </list></t>

      <t/>
    </section>

    <section anchor="iana" title="IANA Considerations">

	  <t>This document adds an additional SDES type to the IANA "RTCP
	  SDES Item Types" Registry, as follows:</t>

<figure anchor="RTCP-item" title="Addition to the IANA RTCP SDES Item Types Registry">
<artwork>
Value    Abbrev      Name              Reference
TBD      RGRP        Reporting Group   [RFC&rfc.number;]
</artwork>
</figure>

<t>(Note to the RFC-Editor: please replace "TBD" with the
  IANA-assigned value, and "&rfc.number;" with the
  number of this document, prior to publication as an RFC.)</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc2119;

      &rtp;

      &srtp;

      &avpf;

      &savpf;

      &reducedrtcp;

	  &cname;
    </references>

    <references title="Informative References">
      &rtptopo;

      &bundle;

      &clue;

      &mikey;

      &xr;

      &session;

      <!-- &rtprtx; -->

      <!-- &sourcedesc; -->

      <!-- &sdp; -->

      <!-- &offeranswer; -->
    </references>

    <section title="Changes From Earlier Versions">
      <t>Note to the RFC-Editor: please remove this section prior to
      publication as an RFC.</t>

      <section title="Changes From Draft -00">
        <t><list style="symbols">
            <t>Added the Reporting Group semantic to explicitly
            indicate which sources come from a single endpoint, rather
            than leaving it implicit.</t>
			<t>Specified that Reporting Group semantics (as they now
			are) apply to AVPF and XR, as well as to RR/SR report
			blocks.</t>
			<t>Added a description of the cascaded source-projecting
			mixer, along with a calculation of its RTCP overhead if
			reporting groups are not in use.</t>
			<t>Gave some guidance on how the flexibility of RTCP
			randomization allows some freedom in RTCP multiplexing.</t>
			<t>Clarified the language of several of the
			recommendations.</t>
			<t>Added an open issue discussing how avg_rtcp_size should
			be calculated for multiplexed RTCP.</t>
			<t>Added an open issue discussing how RTCP bandwidths
			should be chosen for sessions where source bandwidths
			greatly differ.</t>
          </list></t>
      </section>
    </section>
  </back>
</rfc>
