One document matched: draft-roach-mmusic-mlines-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
<!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>
<!ENTITY bundle PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-mmusic-sdp-bundle-negotiation-01.xml'>
<!ENTITY mmt PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-holmberg-mmusic-sdp-mmt-negotiation-00.xml'>
<!ENTITY nbsp " ">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="no"?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="info"
ipr="trust200902" >
<front>
<title abbrev='Media Stream Syntax'>
Thoughts on syntax for representing multiple media streams
</title>
<author fullname="Adam Roach" initials="A. B." surname="Roach">
<organization>Mozilla</organization>
<address>
<postal>
<street></street>
<city>Dallas</city>
<region>TX</region>
<code></code>
<country>US</country>
</postal>
<email>adam@nostrum.com</email>
</address>
</author>
<date/> <!-- Date is auto-generated -->
<area>RAI</area>
<workgroup>MMUSIC</workgroup>
<abstract>
<t>
This document briefly explores the ramifications of
combining multiple media streams into one SDP m=
section versus expressing each in its own
m= section.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
As part of the ongoing RTCWEB and CLUE work, it has
become clear that the current mechanisms in SDP
are insufficient for describing complex sessions
with multiple streams. Two competing schools of
thought have emerged. One holds that the m=
lines should apply to RTP sessions, regardless
of how many media streams they contain. Another
holds that m= lines should apply to media streams
exclusively, and that an additional mechanism
should be applied to combine multiple streams
into a single RTP session, if necessary.
</t>
</section>
<section title="Alternatives">
<section title="Alternative 1: Multiple streams per m= section">
<t>
One approach to specifying multiple streams in a
single RTP session is to put information for several
streams into a single m= section; and, by doing do,
implicitly combine them into a single session.
</t>
<t>
To maintain some level of backwards compatibility with
SDP, this approach might choose to have one m= section
for audio and a second for video (with additional
m= sections for other media types if they are used
in the future), combining those sections with a=group:BUNDLE
<xref target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>; we
will call this "Alternative 1a".
An alternate approach would be the definition
of a new media type which effectively allows transmission
of any kind of media, thereby avoiding the need to bundle
multiple sections together at all. A syntax for such an
approach is proposed by
<xref target="I-D.holmberg-mmusic-sdp-mmt-negotiation"/>.
We will call this "Alternative 1b".
</t>
<t>
In both of the cases described above, certain SDP attributes
might be targeted at only one of the streams in an
RTP session. These attributes can be matched up with
individual streams using the "a=ssrc" extension defined
in <xref target="RFC5576"/>.
</t>
<t>
For "Alternative 1a", we have the additional challenge of
specifying attributes that apply to the entire RTP
session, such as a=rtcp-fb and ICE candidate parameters.
One approach would be inclusion of such parameters only
in the first m= section within a bundle, with the implication
that they apply to the entire session.
</t>
<section title="Alternative 1a: One section per RTP session per type">
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:BUNDLE c1 c2
m=audio 10000 RTP/AVP 0 8 97
a=mid:c1
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
192.0.2.240 rport 51091
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=ssrc:11111 label:speaker-audio
a=ssrc:22222 label:floor-mic
m=video 10000 RTP/AVP 31 32
a=mid:c2
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
a=ssrc:33333 label:speaker-video
a=ssrc:44444 label:slides
]]></artwork> </figure>
</section>
<section title="Alternative 1b: One section per RTP session">
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:MMT foo bar zoe
m=anymedia 10000 RTP/AVP 0 8 97 31 32
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
192.0.2.240 rport 51091
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
a=mmtype:0 audio
a=mmtype:8 audio
a=mmtype:97 audio
a=mmtype:31 video
a=mmtype:32 video
a=ssrc:11111 label:speaker-audio
a=ssrc:22222 label:floor-mic
a=ssrc:33333 label:speaker-video
a=ssrc:44444 label:slides
]]></artwork> </figure>
</section>
</section>
<section title="Alternative 2: Single stream per m= section">
<t>
An alternate proposal is constraining one m= section
to talk about a single media stream. Like alternative
1a, above, the BUNDLE extension is used to combine
several m= sections into a single RTP session. Any
attributes that are applicable to a single media stream
can be correlated by putting them in the corresponding
m= section. Any attributes that apply to the transport
parameters (e.g., rtcp-fb, ICE parameters) are conveyed in the
first m= section within the bundle (alternate schemes
are possible, but this seems the simplest and most
straightforward).
</t>
<figure> <artwork><![CDATA[
v=0
o=- 2890844526 2890844526 IN IP4 host.example.com
s=
c=IN IP4 host.example.com
t=0 0
a=group:BUNDLE c1 c2 c3 c4
m=audio 10000 RTP/AVP 0 8 97
a=mid:c1
a=label:speaker-audio
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
a=candidate:0 1 UDP 2113601791 192.0.2.240 51091 typ host
a=candidate:1 1 UDP 1694194431 198.51.100.32 51091 typ srflx raddr
192.0.2.240 rport 51091
m=audio 10000 RTP/AVP 0 8 97
a=mid:c2
a=label:floor-mic
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 10000 RTP/AVP 31 32
a=mid:c3
a=label:speaker-video
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
m=video 10000 RTP/AVP 31 32
a=mid:c4
a=label:slides
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000
]]></artwork> </figure>
</section>
<section title="Pros and Cons">
<section title="Codec Selection" anchor="codec">
<t>
Currently, in SDP and the various documents that
rely on it (such as <xref target="RFC3264"/>),
there are certain assumptions made about the
ordinality of streams to m= sections.
Consider, for example, wanting to convey
two audio streams with a low-bandwidth voice
codec preferred for one, but a high-quality
codec preferred for the other. RFC 3264 has
rules indicating that codecs are conveyed in
the order of their preference. With alternative
2, it is trivial to provide different ordering
(or even a different set) of codecs to achieve
such a goal. Alternatives 1a and 1b lack the ability
to do so without additional extensions.
</t>
<t>
This set of facts supports alternative 2 in preference
to alternatives 1a and 1b.
</t>
</section>
<section title="Port Number Handling" anchor="ports">
<t>
When multiple sections are used to represent a single
session, we need to make a decision regarding the port
number conveyed in the m= line itself. One option is
to use the same port number in all related m= sections.
According to Cullen Jennings, this interacts very poorly
with existing implementations that use SDP. The other
alternative is to indicate bogus port numbers in all
(or all but one) of the m= lines. According to Hadriel
Kaplan, this usage will lead to certain media intermediaries
destroying the session when it determines that a signaled
port is going unused.
</t>
<t>
Alternative 1b avoids this problem altogether by
having only one m= per IP/port combination,
thereby completely sidestepping the question of what
to put in subsequent m= lines.
</t>
<t>
This set of facts supports alternative 1b in
preference to alternatives 1a and 2.
</t>
</section>
<section title="Attribute handling" anchor="attr">
<t>
Attributes that appear inside m= sections can be generally
broken down into three categories: those intended to apply
to a single media stream (e.g., framerate); those intended
to apply to an RTP session (e.g., rtcp-fb), and those
that are explicitly bound to the m= line itself
(e.g., rtpmap). By and large, these attributes have been
defined with an assumption that each RTP session had one
stream and vice-versa.
</t>
<t>
By specifying a model that breaks this one-to-one
correspondence, we have created the need to be able
designate a specific media stream within an RTP session
(for alternatives 1a and 1b), or the need to be able to
talk about session-level attributes (for alternatives
1a and 2).
</t>
<t>
Alternatives 1a and 1b can perform stream-level designation
through the use of the ssid attribute specified in
<xref target="RFC5576"/>. Alternatives 1a and 2
can apply a convention that any RTP-session-level
attributes are placed in the first m= section in a
bundle (although other, more complicated approaches
may also be possible).
</t>
<t>
Note, in particular, that alternative 1a inherits both
problems of being able to designate attributes as applying
to a single stream, as well as being able to talk about
session-level attributes when multiple m=lines are
bundled together.
</t>
<t>
This set of facts supports alternatives 1b and 2 in
preference to alternative 1a.
</t>
</section>
<section title="What We're Unaware of Not Knowing" anchor="unkn">
<t>
It is worth noting that the problem described in
<xref target="codec"/> was not discovered for quite
a long time after the discussion of multiple media
streams had begun. In the characterization of "known
knowns," "known unknowns," and "unknown unknowns,"
this issue remained an unknown unknown for more
than a little time.
</t>
<t>
Generally, addressing these unknown unknowns is likely
to be easiest if we have the highest granularity of
control. Alternative 2, by breaking each stream apart
into its own instance of the control structure that has
historically been used to work with media (the m= section),
provides this high granularity where alternatives 1a and
1b do not.
</t>
<t>
It is the author's opinion that the probable existence
of such unknown unknowns favors alternative 2 over
1a or 1b.
</t>
</section>
</section>
<section title="Red Herrings">
<t>
During the course of discussing this topic, several
points have been raised that, while relevant, do not
bias the selection of one solution over another.
</t>
<t>
One issue that has been brought up is that SDP offer/answer
requires signaling of the number of m= sections in the offer,
to allow clear semantics for negotiation. Some proponents of
solutions 1a and 1b have indicated a belief that allowing
multiple streams per m= section avoids this restriction.
This assertion has a number of problems. First, it assumes that
implementations can perform reasonable operations on dynamically
created media streams that begin and end without any signaling.
It further assumes that the problems that the offer/answer
model imposed the m-line restrictions for are no longer
applicable (at least, not on a stream level). Finally, this
assertion assumes that no control surfaces are necessary to
talk about and/or manipulate the individual streams (alternately,
if such control surfaces are introduced, then additional
SDP round-trips to exchange information about those
controls is necessary, making them semantically equivalent
to a new offer/answer exchange -- which eliminates any
purported advantage).
</t>
<t>
It has also been observed that, in addition to being
sometimes applicable to streams and sometimes applicable
to sessions, attribute are also sometimes unidirectional,
and sometimes bidirectional. While an astute observation,
this does not appear to have any bearing on the ultimate
solution selected, as all three alternatives face exactly
the same challenges in dealing with issues of directionality.
</t>
<t>
Finally, it should be noted that any decision to include
multiple sections within a single m= section does little
to simplify implementation. Even if native RTCWEB implementations
generate the fewest m= sections necessary to convey their
desired session state, the selection of alternatives
1a and 1b does not obviate the requirement that implementations
must be able to receive SDP with several m=audio sections
(for example). Inter-operation with legacy implementations,
even through a gateway, will require that proper handling of
such session descriptions is present in every RTCWEB
implementation.
</t>
</section>
<section title="Summary">
<t>
The following table summarizes the pros and cons conveyed
in the preceding sections on a per-solution basis.
</t>
<texttable>
<ttcol>Issue</ttcol> <ttcol>1a</ttcol><ttcol>1b</ttcol><ttcol>2</ttcol>
<c><xref target="codec"/></c> <c>-</c><c>-</c><c>+</c>
<c><xref target="ports"/></c> <c>-</c><c>+</c><c>-</c>
<c><xref target="attr"/></c> <c>-</c><c>+</c><c>+</c>
<c><xref target="unkn"/></c> <c>-</c><c>-</c><c>+</c>
</texttable>
<t>
Based on these criteria, it is the author's belief that
Alternative 2 provides the most benefit, with Alternative
1b providing a close second place.
</t>
<t>
Alternative 1a has the remarkable property of combining
all of the drawbacks of solutions 1b and
2, forming a kind of "sweet-spot" of ill-advisement, and thereby
maximizing the amount of work required of the MMUSIC, RTCWEB,and
CLUE working groups.
</t>
</section>
</section>
<section title="IANA Considerations">
<t>
This document makes no requests of IANA.
</t>
</section>
<section title="Security Considerations">
<t>
The author does not believe that the syntax under
discussion has an impact on the security properties
of those protocols that make use of SDP.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc3264;
&rfc5576;
&bundle;
&mmt;
</references>
<!--
<references title="Informative References">
</references>
-->
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 02:37:54 |