<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
  <!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
  <!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>
  <!ENTITY bundle PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-mmusic-sdp-bundle-negotiation-01.xml'>
  <!ENTITY mmt PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-holmberg-mmusic-sdp-mmt-negotiation-00.xml'>

  <!ENTITY nbsp " ">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="no"?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>

<rfc category="info"
     ipr="trust200902" >
  <front>
    <title abbrev='Media Stream Syntax'>
      Thoughts on syntax for representing multiple media streams
    </title>

    <author fullname="Adam Roach" initials="A. B." surname="Roach">
      <organization>Mozilla</organization>

      <address>
        <postal>
          <street></street>
          <city>Dallas</city>
          <region>TX</region>
          <code></code>
          <country>US</country>
        </postal>
        <email>adam@nostrum.com</email>
      </address>
    </author>

    <date/> <!-- Date is auto-generated -->

    <area>RAI</area>
    <workgroup>MMUSIC</workgroup>

    <abstract>
      <t>
        This document briefly explores the ramifications of
        combining multiple media streams into one SDP m=
        section versus expressing each in its own
        m= section.
      </t>
    </abstract>

  </front>

  <middle>
    <section title="Introduction">
      <t>
        As part of the ongoing RTCWEB and CLUE work, it has
        become clear that the current mechanisms in SDP
        are insufficient for describing complex sessions
        with multiple streams. Two competing schools of
        thought have emerged. One holds that the m=
        lines should apply to RTP sessions, regardless
        of how many media streams they contain. Another
        holds that m= lines should apply to media streams
        exclusively, and that an additional mechanism
        should be applied to combine multiple streams
        into a single RTP session, if necessary.
      </t>
    </section>

    <section title="Alternatives">
      <section title="Alternative 1: Multiple streams per m= section">
        <t>
          One approach to specifying multiple streams in a
          single RTP session is to put information for several
          streams into a single m= section and, by doing so,
          implicitly combine them into a single session.
        </t>
        <t>
          To maintain some level of backwards compatibility with
          SDP, this approach might choose to have one m= section
          for audio and a second for video (with additional
          m= sections for other media types if they are used
          in the future), combining those sections with a=group:BUNDLE
          <xref target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>; we
          will call this "Alternative 1a".
          An alternate approach would be the definition
          of a new media type which effectively allows transmission
          of any kind of media, thereby avoiding the need to bundle
          multiple sections together at all. A syntax for such an
          approach is proposed by
          <xref target="I-D.holmberg-mmusic-sdp-mmt-negotiation"/>.
          We will call this "Alternative 1b".
        </t>
        <t>
          In both of the cases described above, certain SDP attributes
          might be targeted at only one of the streams in an
          RTP session. These attributes can be matched up with
          individual streams using the "a=ssrc" extension defined
          in <xref target="RFC5576"/>.
        </t>
        <t>
          For "Alternative 1a", we have the additional challenge of
          specifying attributes that apply to the entire RTP
          session, such as a=rtcp-fb and ICE candidate parameters.
          One approach would be inclusion of such parameters only
          in the first m= section within a bundle, with the implication
          that they apply to the entire session.
        </t>
        <section title="Alternative 1a: One section per RTP session per type">
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:BUNDLE c1 c2
m=audio 10000 RTP/AVP 0 8 97
a=mid:c1
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
   192.0.2.240 rport 51091
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=ssrc:11111 label:speaker-audio
a=ssrc:22222 label:floor-mic
m=video 10000 RTP/AVP 31 32
a=mid:c2
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
a=ssrc:33333 label:speaker-video
a=ssrc:44444 label:slides
]]></artwork> </figure>
        </section>
        <section title="Alternative 1b: One section per RTP session">
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:MMT foo bar zoe
m=anymedia 10000 RTP/AVP 0 8 97 31 32
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
   192.0.2.240 rport 51091
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
a=mmtype:0 audio
a=mmtype:8 audio
a=mmtype:97 audio
a=mmtype:31 video
a=mmtype:32 video
a=ssrc:11111 label:speaker-audio
a=ssrc:22222 label:floor-mic
a=ssrc:33333 label:speaker-video
a=ssrc:44444 label:slides
]]></artwork> </figure>
        </section>
      </section>
      <section title="Alternative 2: Single stream per m= section">
        <t>
          An alternative proposal constrains each m= section
          to describe a single media stream. As in alternative
          1a, above, the BUNDLE extension is used to combine
          several m= sections into a single RTP session. Any
          attributes that are applicable to a single media stream
          can be correlated by putting them in the corresponding
          m= section. Any attributes that apply to the transport
          parameters (e.g., rtcp-fb, ICE parameters) are conveyed in the
          first m= section within the bundle (alternate schemes
          are possible, but this seems the simplest and most
          straightforward).
        </t>
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:BUNDLE c1 c2 c3 c4
m=audio 10000 RTP/AVP 0 8 97
a=mid:c1
a=label:speaker-audio
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
   192.0.2.240 rport 51091
m=audio 10000 RTP/AVP 0 8 97
a=mid:c2
a=label:floor-mic
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 10000 RTP/AVP 31 32
a=mid:c3
a=label:speaker-video
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
m=video 10000 RTP/AVP 31 32
a=mid:c4
a=label:slides
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
]]></artwork> </figure>
      </section>
      <section title="Pros and Cons">
        <section title="Codec Selection" anchor="codec">
          <t>
            Currently, in SDP and the various documents that
            rely on it (such as <xref target="RFC3264"/>),
            certain assumptions are made about the
            cardinality of streams relative to m= sections.
            Consider, for example, wanting to convey
            two audio streams with a low-bandwidth voice
            codec preferred for one, but a high-quality
            codec preferred for the other. RFC 3264 has
            rules indicating that codecs are conveyed in
            the order of their preference. With alternative
            2, it is trivial to provide different ordering
            (or even a different set) of codecs to achieve
            such a goal. Alternatives 1a and 1b lack the ability
            to do so without additional extensions.
          </t>
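          <t>
            As an illustrative sketch only, the scenario above might
            be expressed under alternative 2 as shown in the following
            fragment. The choice of G.722 as the high-quality codec and
            iLBC as the low-bandwidth codec, along with the payload type
            numbers, labels, and port, are assumptions made for this
            example rather than part of any proposal. The v=, o=, s=,
            c=, and t= lines are omitted for brevity.
          </t>
<figure> <artwork><![CDATA[
a=group:BUNDLE c1 c2
m=audio 10000 RTP/AVP 9 0 97
a=mid:c1
a=label:speaker-audio
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:97 iLBC/8000
m=audio 10000 RTP/AVP 97 0
a=mid:c2
a=label:floor-mic
a=rtpmap:97 iLBC/8000
a=rtpmap:0 PCMU/8000
]]></artwork> </figure>
          <t>
            Because each m= line carries its own format list, the
            first stream lists G.722 first while the second lists
            iLBC first and omits G.722 entirely.
          </t>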
          <t>
            This set of facts supports alternative 2 in preference
            to alternatives 1a and 1b.
          </t>
        </section>
        <section title="Port Number Handling" anchor="ports">
          <t>
            When multiple sections are used to represent a single
            session, we need to make a decision regarding the port
            number conveyed in the m= line itself. One option is
            to use the same port number in all related m= sections.
            According to Cullen Jennings, this interacts very poorly
            with existing implementations that use SDP. The other
            alternative is to indicate bogus port numbers in all
            (or all but one) of the m= lines. According to Hadriel
            Kaplan, this usage will lead certain media intermediaries
            to destroy the session when they determine that a signaled
            port is going unused.
          </t>
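          <t>
            The two options can be sketched as follows for a bundle
            of two audio sections. The port values and the single
            payload type are arbitrary assumptions chosen for
            illustration, and the session-level lines other than
            a=group are omitted. In the first fragment, both m= lines
            repeat the same port:
          </t>
<figure> <artwork><![CDATA[
a=group:BUNDLE c1 c2
m=audio 10000 RTP/AVP 0
a=mid:c1
m=audio 10000 RTP/AVP 0
a=mid:c2
]]></artwork> </figure>
          <t>
            In the second fragment, the later m= line names a port
            (10002 here) on which no media is actually expected to
            flow:
          </t>
<figure> <artwork><![CDATA[
a=group:BUNDLE c1 c2
m=audio 10000 RTP/AVP 0
a=mid:c1
m=audio 10002 RTP/AVP 0
a=mid:c2
]]></artwork> </figure>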
          <t>
            Alternative 1b avoids this problem altogether by
            having only one m= per IP/port combination,
            thereby completely sidestepping the question of what
            to put in subsequent m= lines.
          </t>
          <t>
            This set of facts supports alternative 1b in
            preference to alternatives 1a and 2.
          </t>
        </section>
        <section title="Attribute handling" anchor="attr">
          <t>
            Attributes that appear inside m= sections can be generally
            broken down into three categories: those intended to apply
            to a single media stream (e.g., framerate); those intended
            to apply to an RTP session (e.g., rtcp-fb); and those
            that are explicitly bound to the m= line itself
            (e.g., rtpmap). By and large, these attributes have been
            defined with an assumption that each RTP session had one
            stream and vice-versa.
          </t>
          <t>
            By specifying a model that breaks this one-to-one
            correspondence, we have created the need to be able to
            designate a specific media stream within an RTP session
            (for alternatives 1a and 1b), or the need to be able to
            talk about session-level attributes (for alternatives
            1a and 2).
          </t>
          <t>
            Alternatives 1a and 1b can perform stream-level designation
            through the use of the a=ssrc attribute specified in
            <xref target="RFC5576"/>. Alternatives 1a and 2
            can apply a convention that any RTP-session-level
            attributes are placed in the first m= section in a
            bundle (although other, more complicated approaches
            may also be possible).
          </t>
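          <t>
            A minimal sketch of the first-section convention under
            alternative 2 follows. It assumes the RTP/AVPF profile,
            for which a=rtcp-fb is defined; the wildcard nack feedback
            line, the labels, and the payload types are illustrative
            assumptions only. The RTP-session-level attributes
            (rtcp-fb and the ICE candidate) appear only in the first
            m= section of the bundle, while per-stream attributes
            appear in each section.
          </t>
<figure> <artwork><![CDATA[
a=group:BUNDLE c1 c2
m=video 10000 RTP/AVPF 31 32
a=mid:c1
a=label:speaker-video
a=rtcp-fb:* nack
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
m=video 10000 RTP/AVPF 31 32
a=mid:c2
a=label:slides
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
]]></artwork> </figure>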
          <t>
            Note, in particular, that alternative 1a inherits both
            problems: it must be able to designate attributes as applying
            to a single stream, and it must be able to talk about
            session-level attributes when multiple m= lines are
            bundled together.
          </t>
          <t>
            This set of facts supports alternatives 1b and 2 in
            preference to alternative 1a.
          </t>
        </section>
        <section title="What We're Unaware of Not Knowing" anchor="unkn">
          <t>
            It is worth noting that the problem described in
            <xref target="codec"/> was not discovered for quite
            a long time after the discussion of multiple media
            streams had begun. In the characterization of "known
            knowns," "known unknowns," and "unknown unknowns,"
            this issue remained an unknown unknown for more
            than a little time.
          </t>
          <t>
            Generally, addressing these unknown unknowns is likely
            to be easiest if we have the highest granularity of
            control. Alternative 2, by breaking each stream apart
            into its own instance of the control structure that has
            historically been used to work with media (the m= section),
            provides this high granularity where alternatives 1a and
            1b do not.
          </t>
          <t>
            It is the author's opinion that the probable existence
            of such unknown unknowns favors alternative 2 over
            1a or 1b.
          </t>
        </section>
      </section>
      <section title="Red Herrings">
        <t>
          During the course of discussing this topic, several
          points have been raised that, while relevant, do not
          bias the selection of one solution over another.
        </t>
        <t>
          One issue that has been brought up is that SDP offer/answer
          requires signaling of the number of m= sections in the offer,
          to allow clear semantics for negotiation. Some proponents of
          solutions 1a and 1b have indicated a belief that allowing
          multiple streams per m= section avoids this restriction.
          This assertion has a number of problems. First, it assumes that
          implementations can perform reasonable operations on dynamically
          created media streams that begin and end without any signaling.
          It further assumes that the problems the m-line restrictions
          in the offer/answer model were meant to address are no longer
          applicable (at least, not on a stream level). Finally, this
          assertion assumes that no control surfaces are necessary to
          talk about and/or manipulate the individual streams (alternately,
          if such control surfaces are introduced, then additional
          SDP round-trips to exchange information about those
          controls are necessary, making them semantically equivalent
          to a new offer/answer exchange -- which eliminates any
          purported advantage).
        </t>
        <t>
          It has also been observed that, in addition to being
          sometimes applicable to streams and sometimes applicable
          to sessions, attributes are also sometimes unidirectional,
          and sometimes bidirectional. While an astute observation,
          this does not appear to have any bearing on the ultimate
          solution selected, as all three alternatives face exactly
          the same challenges in dealing with issues of directionality.
        </t>
        <t>
          Finally, it should be noted that any decision to include
          multiple streams within a single m= section does little
          to simplify implementation. Even if native RTCWEB implementations
          generate the fewest m= sections necessary to convey their
          desired session state, the selection of alternatives
          1a and 1b does not obviate the requirement that implementations
          must be able to receive SDP with several m=audio sections
          (for example). Inter-operation with legacy implementations,
          even through a gateway, will require that proper handling of
          such session descriptions is present in every RTCWEB
          implementation.
        </t>
      </section>
      <section title="Summary">
        <t>
          The following table summarizes the pros and cons conveyed
          in the preceding sections on a per-solution basis.
        </t>
        <texttable>
          <ttcol>Issue</ttcol> <ttcol>1a</ttcol><ttcol>1b</ttcol><ttcol>2</ttcol>
          <c><xref target="codec"/></c> <c>-</c><c>-</c><c>+</c>
          <c><xref target="ports"/></c> <c>-</c><c>+</c><c>-</c>
          <c><xref target="attr"/></c> <c>-</c><c>+</c><c>+</c>
          <c><xref target="unkn"/></c> <c>-</c><c>-</c><c>+</c>
        </texttable>
        <t>
          Based on these criteria, it is the author's belief that
          Alternative 2 provides the most benefit, with Alternative
          1b a close second.
        </t>
        <t>
          Alternative 1a has the remarkable property of combining
          all of the drawbacks of solutions 1b and
          2, forming a kind of "sweet-spot" of ill-advisement, and thereby
          maximizing the amount of work required of the MMUSIC, RTCWEB, and
          CLUE working groups.
        </t>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>
        This document makes no requests of IANA.
      </t>
    </section>
    <section title="Security Considerations">
      <t>
        The author does not believe that the syntax under
        discussion has an impact on the security properties
        of those protocols that make use of SDP.
      </t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc3264;
      &rfc5576;
      &bundle;
      &mmt;
    </references>

<!--
    <references title="Informative References">
    </references>
-->

  </back>
</rfc>