One document matched: draft-alvestrand-one-rtp-02.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-alvestrand-one-rtp-02" ipr="trust200902">
  <front>
    <title abbrev="Single RTP Session Grouping">SDP Grouping for Single RTP
    Sessions</title>

    <author fullname="Harald Tveit Alvestrand" initials="H. T."
            surname="Alvestrand">
      <organization>Google</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <region></region>

          <code>11122</code>

          <country>Sweden</country>
        </postal>

        <email>harald@alvestrand.no</email>
      </address>
    </author>

    <date day="28" month="September" year="2011" />

    <abstract>
      <t>This document describes an extension to the Session Description
      Protocol (SDP) to describe RTP sessions where media of multiple top
      level types, for example audio and video, are carried in the same RTP
      session.</t>

      <t>This document is presented to the RTCWEB, AVTCORE and MMUSIC WGs for
      consideration.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>In the work with the RTCWEB specifications <xref
      target="I-D.ietf-rtcweb-overview"></xref>, a need was discovered for
      representing within the SDP framework an SDP session consisting of a
      single RTP session, where that single RTP session, mapped to a single
      transport flow, contained multiple top-level data types.</t>

      <t>This is advantageous for the use case where there is no desire for
      different treatment by the network of the different flows in the
      session, there exist other appropriate mechanisms (for instance based on
      SSRC) to identify the flows in the session to the applications, and
      where the handling of multiple RTP sessions would increase the work
      required to establish the session (for instance by requiring multiple
      ICE <xref target="RFC5245"></xref> negotiations, or handling of failure
      cases where one RTP session is established and another is not).</t>

      <t>This document describes how to represent such a session.</t>

      <t></t>
    </section>

    <section title="Requirements for a solution">
      <t>The requirements for our representation are:</t>

      <t><list style="symbols">
          <t>It should be possible to represent an SDP session consisting of a
          single RTP session, where that session carries both audio and
          video.</t>

          <t>If this description is presented in an Offer in the offer/answer
          model to an entity that does not understand it, the resulting Answer
          should contain a valid description of an SDP session consisting of
          one video RTP session and one audio RTP session.</t>
        </list></t>
    </section>

    <section title="SDP Grouping Framework Parameter">
      <t>This document defines a new semantics extension called TOGETHER
      within the SDP Grouping framework <xref target="RFC5888"></xref>.</t>

      <t>The extension looks like this:</t>

      <t> a=group:TOGETHER <first> <subsequent>...</t>

      <t>The first media section mentioned in the list is special; it
      identifies the base media section for the group. This is referred to as
      the "first" media section below, but it may occur anywhere in the SDP
      description.</t>

      <t>If this semantics extension is present in an SDP Session-level
      a=group: line, the semantics are that the two or more media sections are
      intended to be read as components of the description of a single RTP
      session, creating a single SSRC numbering space that can contain
      components of all the types described in the referenced media
      sections.</t>

      <t>The following properties of the media sections are REQUIRED:</t>

      <t><list style="symbols">
          <t>The defined RTPMAP values of the section MUST NOT overlap</t>

          <t>The "proto" profile of the sections MUST be the same (e.g
          RTP/AVPF)</t>
        </list>The media sections MAY contain connection data (port numbers or
      ICE parameters), but some of these may be ignored in processing (see
      next section).</t>

      <t>The reason for the requirement for systematic proto is that there are
      many combinations that don't make sense (for instance "RTP/AVPF" in one
      section and "RTP/SAVP" in another would make encryption and availability
      of TMMBR depend on the outcome of negotiation, which seems strange). The
      cases where combinations make sense (RTP/AVPF with UDP/FEC for instance)
      also usually require that separate RTP sessions be used.</t>

      <t>It is RECOMMENDED that the source port number of the media sections
      are the same, since this will give the least difficulty for SDP-parsing
      intermediaries when trying to keep track of the media flows established
      as a result of negotiation.</t>
    </section>

    <section title="Use in Offer/Answer">
      <t>This extension MAY be included in a Offer; as specified in RFC 5888
      section 9 when describing SIP usage, if it is not included in an Offer,
      it MUST NOT be included in an answer.</t>

      <t>If the responder understands the semantics of the TOGETHER extension,
      the parameters of the first section MUST be used to establish the RTP
      session, and the parameters for the other sections MUST be ignored.</t>

      <t>The following parameters are taken from the first section only:</t>

      <t><list style="symbols">
          <t>Port number from the m= line</t>

          <t>All media-level attributes defined in RFC 5245 section 15.1 -
          this includes "candidate", "remote-candidates", "ice-mismatch",
          "ice-ufrag", "ice-pwd"</t>
        </list>The bandwidth of the "m" line is treated specially: The values
      for all "m=" lines in the group are added together, and the resulting
      value is taken to be the negotiated bandwidth value for the RTP
      session.</t>

      <t>The expected behaviour when the extension is present in an offer and
      not understood is that the generated answer will not contain the
      "a=group:TOGETHER" line, and that each sections' parameters will be
      used.</t>

      <t>If the answerer wishes to refuse an entire m= section, while
      accepting the RTP session otherwise, he MAY indicate this by setting the
      port number of the relevant section to zero. In all other cases, the
      port number of second and subsequent sections is to be ignored.</t>

      <t>The answerer cannot refuse the first m= section and establish the
      call.</t>
    </section>

    <section title="Parameter combining">
      <t>The general approach taken when combining the sections is to treat
      the parameters according to normal processing; for instance,
      a=fmtp:<payloadtype> parameters still bind to their . However, a
      number of special considerations have to apply.</t>

      <t><list style="symbols">
          <t>For the "a=rtpmap" lines, their interpretation depends on the
          "m=" line they occur under, even after combination; "a=rtpmap:8
          PCMA/8000" needs to carry the info that it occured under an
          "m=audio" line.</t>

          <t>For parameters that can only occur once in a section, such as the
          port number from the m= line, the occurence of the parameter that
          comes from the first section needs to take precedence.</t>

          <t>The "b=" (bandwidth) line needs to be considered specially, since
          the values are to be added together.</t>

          <t>Media-level ICE attributes need to come from the first section
          only, even though syntax would allow more occurences.</t>
        </list></t>
    </section>

    <section title="Interaction with other extensions">
      <t>If other extensions modify the bandwidth calculation algorithm, those
      extensions will have to take into consideration how bandwidth from
      multiple sections of the SDP description should be merged.</t>
    </section>

    <section title="RTCP bandwidth considerations">
      <t>A concern has been raised that when audio and video are combined, the
      bandwidth of RTCP reports required for an audio stream may exceed the
      bandwidth of the audio stream itself, which seems a bit bizarre. While
      not critical (overall RTCP bandwidth is still limited to 5% of the total
      bandwidth), this warrants a little more study.</t>

      <t>Considering a combined RTP session with one sender and one recipient,
      four 1-Mbit/sec video flows and four 100-Kbit/sec audio flows, all
      flowing in one direction.</t>

      <t>The total bandwidth is 4.4 Mbit/sec, so if the RTCP share of the
      bandwidth is 5% as recommended by <xref target="RFC3550"></xref> section
      6.2, the RTCP bandwidth limit is 220 Kbits/sec. Eight SSRCs need to be
      reported on.</t>

      <t>Each report sender will have 24.4 Kbits/second of RTCP bandwidth at
      its disposal. Assuming a packet size of 100 bytes (11 bytes per SSRC
      reported on), the maximum RTCP rate allowed is 30 RTCP packets per
      second, which is slightly slower than the typical audio heartbeat flow
      of 50 packets per second (20 ms interval).</t>

      <t>If this is deemed excessive, one can adopt the RTP/AVPF model of
      5-second regular RTCP reports with additional availability of
      "on-demand" RTCP packets. But the RTCP feedback interval also enters
      into congestion control algorithms, which may complicate the
      picture.</t>

      <t></t>
    </section>

    <section title="Examples">
      <t>The examples are taken from RFC 4317, "SDP Offer/Answer
      Examples".</t>

      <t></t>

      <figure>
        <preamble>Offer</preamble>

        <artwork><![CDATA[
      v=0
      o=alice 2890844526 2890844526 IN IP4 host.atlanta.example.com
      s=
      c=IN IP4 host.atlanta.example.com
      t=0 0
      a=group:TOGETHER foo bar
      m=audio 49170 RTP/AVP 0 8 97
      a=mid:foo
      b=AS:200
      a=rtpmap:0 PCMU/8000
      a=rtpmap:8 PCMA/8000
      a=rtpmap:97 iLBC/8000
      m=video 49170 RTP/AVP 31 32
      a=mid:bar
      b=AS:1000
      a=rtpmap:31 H261/90000
      a=rtpmap:32 MPV/90000]]></artwork>
      </figure>

      <t></t>

      <t>This is a request to have both audio and video sent over port 49170,
      and invites the responder to accept these on the same port, creating a
      single RTP session. If this can't be done, audio and video will be sent
      over port 49170, but the respondent's different port numbers will be
      used, creating different 5-tuples for the two RTP sessions. The total
      bandwidth, if combined, is 1200 Kbits/second; if separated, 200 Kbits
      goes to audio and 1000 Kbits goes to video.</t>

      <figure>
        <preamble>Answer, from an entity that understands TOGETHER</preamble>

        <artwork><![CDATA[
      v=0
      o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
      s=
      c=IN IP4 host.biloxi.example.com
      t=0 0
      a=group:TOGETHER foo bar
      m=audio 49174 RTP/AVP 0
      a=mid:foo
      b=AS:200
      a=rtpmap:0 PCMU/8000
      m=video 49174 RTP/AVP 32
      a=mid:bar
      b=AS:1000
      a=rtpmap:32 MPV/90000]]></artwork>
      </figure>

      <t>After processing this answer, video and audio will flow together on
      one RTP session between initiator port 49170 and responder port
      49174.</t>

      <figure>
        <preamble>Answer, from an entity that understands grouping, but does
        not understand TOGETHER</preamble>

        <artwork><![CDATA[      
      v=0
      o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
      s=
      c=IN IP4 host.biloxi.example.com
      t=0 0
      m=audio 49174 RTP/AVP 0
      a=mid:foo
      a=rtpmap:0 PCMU/8000
      b=AS:200
      m=video 49175 RTP/AVP 32
      a=mid:bar
      a=rtpmap:32 MPV/90000
      b=AS:1000]]></artwork>
      </figure>

      <t>After processing this answer, video will flow between initiator port
      49170 and responder port 49174, while audio flows between initiator port
      49170 and responder port 49175, forming two RTP sessions.</t>

      <figure>
        <preamble>Answer, from an entity that does not understand
        grouping</preamble>

        <artwork><![CDATA[      v=0
      o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com
      s=
      c=IN IP4 host.biloxi.example.com
      t=0 0
      m=audio 49174 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      b=AS:200
      m=video 49175 RTP/AVP 32
      a=rtpmap:32 MPV/90000
      b=AS:1000]]></artwork>
      </figure>

      <t>After processing this answer, the flows set up will be the same ones
      as in the previous example.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requests IANA to register the new SDP Grouping semantic
      extension called TOGETHER.</t>

      <t></t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>No new security issues have been raised specifically for this
      extension.</t>

      <t>Third-party interceptors that sniff negotiation but do not understand
      the extension may end up listening to the wrong port number for some of
      the media flows. This is not deemed greatly harmful.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>This draft is based on a discussion between a number of participants
      at the Quebec City IETF, July 2011, about the issue of multiplexing
      audio and video on a single network transport using RTP.</t>

      <t>Thanks to (in alphabetical order) Christer Holmberg, Randell Jesup,
      Paul Kyzivat, Muthu Arul Pozhi Perumal, Justin Uberti for reviews and
      comments.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.5245'?>

      <?rfc include='reference.RFC.5888'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.I-D.ietf-rtcweb-overview'?>

      <?rfc include='reference.RFC.5761'?>

      <?rfc include='reference.RFC.6051'?>
    </references>

    <section title="Change log">
      <t></t>

      <section title="From draft-alvestrand-one-rtp-00 to -01">
        <t>Added change log and "other matters" appendix.</t>

        <t>Added section on parameter combining.</t>

        <t>Added an example for an entity that does not understand
        grouping.</t>

        <t>Added port-zero and no-codecs methods for refusing a section.</t>
      </section>

      <section title="From draft-alvestrand-one-rtp-01 to -02">
        <t>Clarified that "first" refers to the first section in a TOGETHER
        section.</t>

        <t>Allowed and encouraged the port numbers of sections to be equal
        (after mmusic discussion).</t>

        <t>Added some more text describing the result of processing the
        example answers.</t>
      </section>
    </section>

    <section title="Other matters">
      <t>This appendix is not normative.</t>

      <t>A number of matters have been raised in discussion on this draft.
      Some of the discussions have led to actual changes, others have pointed
      to the need to identify other documents as authoritative on these
      issues. This appendix serves to collect such matters.</t>

      <t></t>

      <section title="RTCP session establishment">
        <t>This document does not specify anything about RTCP sessions. Since
        a purpose of the specification is to minimize the number of transport
        connections, it's natural to use the "a=rtcp-mux" attribute defined in
        <xref target="RFC5761"></xref>, but this document does not specify
        anything about that matter.</t>

        <t>If the TOGETHER extension is successfully negotiated, there will be
        only one RTCP session for the RTP session described by the TOGETHER
        group.</t>

        <t>Since the purpose of this extension is to have SSRCs carrying
        multiple media types on one RTP session, it may be more important than
        usual to get SSRC metainformation such as CNAME quicly; this can be
        done by using the recommendation in <xref target="RFC6051"></xref>to
        send an RTCP SR at once when starting the flow, and the recipient can
        use the RTCP-SR-REQ defined in the same document to request an RTCP SR
        if the initial one is lost.</t>
      </section>

      <section title="Renegotiation">
        <t>After the session has been established, there might occur a need to
        change its parameters. At the moment, no issues that are different
        from the issues for a session that does not use TOGETHER have been
        identified.</t>

        <t>If a renegotiation mechanism that uses SDP fragments, rather than
        whole SDP descriptors, is developed, there might be a need for
        specifying that all the fragments included in the TOGETHER group are
        sent together. This will have to wait for the development of such a
        mechanism.</t>
      </section>

      <section title="ICE negotiation sequence">
        <t>When an SDP agent sends out an offer using TOGETHER and ICE, it
        must choose one of two strategies:</t>

        <t><list style="symbols">
            <t>List ICE parameters for all sections. This means that candidate
            gathering and opening of local ports must happen before the SDP
            can be generated, and some of the ports get closed, unused, after
            the answer has been received. Since the generation of candidates
            may involve negotiation with a remote server such as a TURN
            server, this may be an expensive proposition. If the same port
            number is used, the same ICE parameters can be used for all
            sections too.</t>

            <t>List ICE parameters for only the first section. If the TOGETHER
            extension is not understood, this will lead to only the first
            section's RTP session being established. It is up to the
            application whether or not to consider this to be a call
            failure.</t>
          </list>The choice between these two options is left as a local
        matter.</t>

        <t>The offerer may also choose to send out ICE probes on either one
        port or all ports even before the answer comes back, in order to speed
        up connection establishment; this too is left as a local matter.</t>

        <t></t>
      </section>

      <section title="SDP-inspecting intermediaries">
        <t>There may exist intermediaries that inspect, but do not modify, the
        SDP flow, and take some action based on what they parse there.</t>

        <t>Such intermediaries are architecturally unsound in general (if they
        really are needed, they should be explicit participants and be able to
        refuse to carry the extension they don't understand; if not, their
        working is of no concern to the callers), but they still exist. Some
        of them will even take active steps to cause a media stream to be
        closed if they don't see any packets on the port pair that they
        interpret as having been established by the exchange, causing
        difficulty for the endpoints. The use of the same port number for all
        media sections on the offer mitigates this issue.</t>

        <t>In the RTCWEB use case, they may have more problems existing, since
        the negotiation will normally be carried over encrypted HTTPS
        connections. In other cases, where negotiation is done in the clear,
        they may be more common.</t>

        <t></t>
      </section>
    </section>
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 02:44:18