<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-mmusic-sdp-simulcast-01"
ipr="trust200902" submissionType="IETF">
<front>
<title abbrev="Simulcast">Using Simulcast in SDP and RTP Sessions</title>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Kistavagen 25</street>
<city>SE-164 80 Stockholm</city>
<region/>
<code/>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<facsimile/>
<email>bo.burman@ericsson.com</email>
<uri/>
</address>
</author>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Stockholm</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Suhas Nandakumar" initials="S." surname="Nandakumar">
<organization>Cisco</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<phone/>
<facsimile/>
<email>snandaku@cisco.com</email>
<uri/>
</address>
</author>
<author fullname="Mo Zanaty" initials="M." surname="Zanaty">
<organization>Cisco</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<phone/>
<facsimile/>
<email>mzanaty@cisco.com</email>
<uri/>
</address>
</author>
<date day="21" month="July" year="2015"/>
<abstract>
<t>In some application scenarios it may be desirable to send multiple
differently encoded versions of the same media source in independent RTP
streams. This is called simulcast. This document discusses the best way
of accomplishing simulcast in RTP and how to signal it in SDP. A
solution is defined by making an extension to SDP, and using RTP/RTCP
identification methods to relate RTP streams belonging to the same media
source. The SDP extension consists of a new media level SDP attribute that
expresses capability to send and/or receive simulcast RTP streams. One
part of the RTP/RTCP identification method is included as a reference to
a separate document, since it is useful also for other purposes.</t>
</abstract>
</front>
<middle>
<section anchor="sec-intro" title="Introduction">
<t>Most of today's multiparty video conference solutions make use of
centralized servers to reduce the bandwidth and CPU consumption in the
endpoints. Those servers receive RTP streams from each participant and
send some suitable set of possibly modified RTP streams to the rest of
the participants, which usually have heterogeneous capabilities (screen
size, CPU, bandwidth, codec, etc.). One of the biggest issues is how to
perform RTP stream adaptation to different participants' constraints
with the minimum possible impact on both video quality and server
performance.</t>
<t>Simulcast is defined in this memo as the act of simultaneously
sending multiple different encoded streams of the same media source,
e.g. the same video source encoded with different video encoder types or
image resolutions. This can be done in several ways and for different
purposes. This document focuses on the case where it is desirable to
provide a media source as multiple encoded streams over <xref
target="RFC3550">RTP</xref> towards an intermediary so that the
intermediary can provide the wanted functionality by selecting which RTP
stream to forward to other participants in the session, and more
specifically how the identification and grouping of the involved RTP
streams are done. From an RTP perspective, simulcast is a specific
application of the aspects discussed in <xref
target="I-D.ietf-avtcore-multiplex-guidelines">RTP Multiplexing
Guidelines</xref>.</t>
<t>The purpose of this document is to describe a few scenarios that
motivate the use of simulcast, and to propose a suitable solution for SDP
signaling and performing RTP simulcast.</t>
</section>
<section anchor="sec-definitions" title="Definitions">
<t/>
<section title="Terminology">
<t>This document makes use of the terminology defined in <xref
target="I-D.ietf-avtext-rtp-grouping-taxonomy">RTP Taxonomy</xref>,
<xref target="RFC5117">RTP Topology</xref> and <xref
target="I-D.ietf-avtcore-rtp-topologies-update">RTP Topologies
Update</xref>. In addition, the following terms are used:<list
style="hanging">
<t hangText="RTP Mixer:">An RTP middle node, defined in <xref
target="RFC5117"/> (Section 3.4: Topo-Mixer), further elaborated
and extended with other topologies in <xref
target="I-D.ietf-avtcore-rtp-topologies-update"/> (Section 3.6 to
3.9).</t>
<t hangText="RTP Switch:">A common short term for the terms
"switching RTP mixer", "source projecting middlebox", and "video
switching MCU" as discussed in <xref
target="I-D.ietf-avtcore-rtp-topologies-update"/>.</t>
<t hangText="Simulcast version:">One encoded stream from the set
of encoded streams that constitutes the simulcast for a single
media source.</t>
<t hangText="Simulcast version alternative:">One encoded stream
being encoded in one of possibly multiple alternative ways to
create a simulcast version.</t>
</list></t>
</section>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
</section>
<section anchor="sec-use-cases" title="Use Cases">
<t>Many use cases of simulcast as described in this document relate to a
multi-party communication session where one or more central nodes are
used to adapt the view of the communication session towards individual
participants, and facilitate the media transport between participants.
Thus, these cases target the RTP Mixer type of topology.</t>
<t>There are two principal approaches for an RTP Mixer to provide this
adapted view of the communication session to each receiving
participant:<list style="symbols">
<t>Transcoding (decoding and re-encoding) received RTP streams with
characteristics adapted to each receiving participant. This often
includes mixing or composition of media sources from multiple
participants into a mixed media source originated by the RTP Mixer.
The main advantage of this approach is that it achieves close to
optimal adaptation to individual receiving participants. The main
disadvantages are that it can be very computationally expensive to
the RTP Mixer and typically also degrades media Quality of
Experience (QoE) such as end-to-end delay for the receiving
participants.</t>
<t>Switching a subset of all received RTP streams or sub-streams to
each receiving participant, where the used subset is typically
specific to each receiving participant. The main advantages of this
approach are that it is computationally cheap to the RTP Mixer and
it has very limited impact on media QoE. The main disadvantage is
that it can be difficult to combine a subset of received RTP streams
into a perfect fit to the resource situation of a receiving
participant.</t>
</list></t>
<t>The use of simulcast relates to the latter approach, where it is more
important to reduce the load on the RTP Mixer and/or minimize QoE impact
than to achieve an optimal adaptation of resource usage.</t>
<t>A multicast/broadcast case where the receivers themselves select the
most appropriate simulcast version and tune in to the right media
transport to receive that version is also <xref
target="sec-multicast">considered</xref>. This enables large,
heterogeneous receiver populations, when it comes to capabilities and
the use of network path bandwidth resources.</t>
<section anchor="sec-diverse-receivers"
title="Reaching a Diverse Set of Receivers">
<t>The media sources provided by a sending participant potentially
need to reach several receiving participants that differ in terms of
available resources. The receiver resources that typically differ
include, but are not limited to:<list style="hanging">
<t hangText="Codec:">This includes codec type (such as SDP MIME
type) and can include codec configuration options (e.g. SDP fmtp
parameters). Two codec resources that differ only in codec
configuration are considered "different" if they are not
"compatible", for example if they differ in video codec profile or in
transport packetization configuration.</t>
<t hangText="Sampling:">This relates to how the media source is
sampled, in spatial as well as in temporal domain. For video
streams, spatial sampling affects image resolution and temporal
sampling affects video frame rate. For audio, spatial sampling
relates to the number of audio channels and temporal sampling
affects audio bandwidth. This may be used to suit different
rendering capabilities or needs at the receiving endpoints, as
well as a method to achieve different transport capabilities,
bitrates and eventually QoE by controlling the amount of source
data.</t>
<t hangText="Bitrate:">This relates to the number of bits spent
per second to transmit the media source as an RTP stream, which
typically also affects the Quality of Experience (QoE) for the
receiving user.</t>
</list>Letting the sending participant create a simulcast of a few
differently configured RTP streams per media source can be a good
tradeoff when using an RTP switch as middlebox, instead of sending a
single RTP stream and using an RTP mixer to create individual
transcodings to each receiving participant.</t>
<t>This requires that the receiving participants can be categorized in
terms of available resources and that the sending participant can
choose a matching configuration for a single RTP stream per category
and media source.</t>
<t>For example, assume for simplicity a set of receiving participants
that differ only in that some have support to receive Codec A, and the
others have support to receive Codec B. Further assume that the
sending participant can send both Codec A and B. It can then reach all
receivers by creating two simulcasted RTP streams from each media
source; one for Codec A and one for Codec B.</t>
<t>In another simple example, a set of receiving participants differ
only in screen resolution; some are able to display video with at most
360p resolution and some support 720p resolution. A sending
participant can then reach all receivers by creating a simulcast of
RTP streams with 360p and 720p resolution for each sent video media
source.</t>
<t>In more elaborate cases, the receiving participants differ both in
available sampling and bitrate, and maybe also codec, and it is up to
the RTP switch to find a good trade-off in which simulcasted stream to
choose for each intended receiver. It is also the responsibility of
the RTP switch to negotiate a good fit of simulcast streams with the
sending participant.</t>
<t>The maximum number of simulcasted RTP streams that can be sent is
mainly limited by the amount of processing and uplink network
resources available to the sending participant.</t>
</section>
<section anchor="sec-application-specific"
title="Application Specific Media Source Handling">
<t>The application logic that controls the communication session may
include special handling of some media sources. It is for example
commonly the case that the media from a sending participant is not
sent back to itself.</t>
<t>It is also common that a currently active speaker participant is
shown in larger size or higher quality than other participants (the
sampling or bitrate aspects of <xref
target="sec-diverse-receivers"/>). Not sending the active speaker
media back to itself means there is some other participant's media
that instead has to receive special handling towards the active
speaker; typically the previous active speaker. This way, the
previously active speaker is needed both in larger size (to current
active speaker) and in small size (to the rest of the participants),
which can be solved with a simulcast from the previously active
speaker to the RTP switch.</t>
</section>
<section anchor="sec-multicast"
title="Receiver Adaptation in Multicast/Broadcast">
<t>When using broadcast or multicast technology to distribute
real-time media streams to large populations of receivers, there can
still be significant heterogeneity among the receiver population. This
can depend on several factors:<list style="hanging">
<t hangText="Network Bandwidth:">The network paths to individual
receivers will have variations in the bandwidth, thus putting
different limits on the supported bit-rates that can be
received.</t>
<t hangText="Endpoint Capabilities:">The end point's hardware and
software can have varying capabilities in relation to screen
resolution, decoding capabilities, and supported media codecs.</t>
</list></t>
<t>To handle these variations, a transmitter of real-time media may
want to apply simulcast to a media source and provide it as a set of
different encoded streams, enabling the receivers to select the best
fit from this set themselves. The end point capabilities will usually
result in a single initial choice. However, the network bandwidth can
vary over time, which requires a client to continuously monitor its
reception to determine if the received RTP streams still fit within
the available bandwidth. If not, another set of encoded streams from
the ones offered in the simulcast will have to be chosen.</t>
<t>When using IP multicast, the level of granularity that the receiver
can select from is decided by its ability to choose different
multicast addresses. Thus, different simulcast versions need to be put
on different media transports using different multicast addresses. If
these simulcast versions are described using SDP, they need to be part
of different SDP media descriptions, as SDP binds to transport on
media description level.</t>
</section>
<section anchor="sec-receiver-preferences"
title="Receiver Media Source Preferences">
<t>The application logic that controls the communication session may
allow receiving participants to apply preferences to the
characteristics of the RTP stream they receive, for example in terms
of the aspects listed in <xref target="sec-diverse-receivers"/>.
Sending a simulcast of RTP streams is one way of accommodating
receivers with conflicting or otherwise incompatible preferences.</t>
</section>
</section>
<section anchor="sec-requirements" title="Requirements">
<t>The following requirements need to be met to support the use cases in
previous sections:<list style="hanging">
<t anchor="req-1" hangText="REQ-1:">Identification. It must be
possible to identify a set of simulcasted RTP streams as originating
from the same media source:<list style="hanging">
<t anchor="req-1.1" hangText="REQ-1.1:">In SDP signaling.</t>
<t anchor="req-1.2" hangText="REQ-1.2:">On RTP/RTCP level.</t>
</list></t>
<t anchor="req-2" hangText="REQ-2:">Transport usage. The solution
must work when using:<list style="hanging">
<t anchor="req-2.1" hangText="REQ-2.1:">Legacy SDP with separate
media transports per SDP media description.</t>
<t anchor="req-2.2" hangText="REQ-2.2:">Bundled SDP media
descriptions.</t>
</list></t>
<t anchor="req-3" hangText="REQ-3:">Capability negotiation. It must
be possible that:<list style="hanging">
<t anchor="req-3.1" hangText="REQ-3.1:">Sender can express
capability of sending simulcast.</t>
<t anchor="req-3.2" hangText="REQ-3.2:">Receiver can express
capability of receiving simulcast.</t>
<t anchor="req-3.3" hangText="REQ-3.3:">Sender can express
maximum number of simulcast versions that can be provided.</t>
<t anchor="req-3.4" hangText="REQ-3.4:">Receiver can express
maximum number of simulcast versions that can be received.</t>
<t anchor="req-3.5" hangText="REQ-3.5:">Sender can detail the
characteristics of the simulcast versions that can be
provided.</t>
<t anchor="req-3.6" hangText="REQ-3.6:">Receiver can detail the
characteristics of the simulcast versions that it prefers to
receive.</t>
</list></t>
<t anchor="req-4" hangText="REQ-4:">Distinguishing features. It must
be possible to have different simulcast versions use different codec
parameters, as can be expressed by SDP format values and RTP payload
types.</t>
<t anchor="req-5" hangText="REQ-5:">Compatibility. It must be
possible to use simulcast in combination with other RTP mechanisms
that generate additional RTP streams:<list style="hanging">
<t anchor="req-5.1" hangText="REQ-5.1:"><xref
target="RFC4588">RTP Retransmission</xref>.</t>
<t anchor="req-5.2" hangText="REQ-5.2:"><xref
target="RFC5109">RTP Forward Error Correction</xref>.</t>
<t anchor="req-5.3" hangText="REQ-5.3:">Related payload types
such as audio Comfort Noise and/or DTMF.</t>
</list></t>
<t anchor="req-6" hangText="REQ-6:">Interoperability. The solution
must be possible to use in:<list style="hanging">
<t anchor="req-6.1" hangText="REQ-6.1:">Interworking with
non-simulcast legacy clients using a single media source per
media type.</t>
<t anchor="req-6.2" hangText="REQ-6.2:">WebRTC "Unified Plan"
environment with a single media source per SDP media
description.</t>
</list></t>
</list></t>
</section>
<section anchor="sec-solution-overview" title="Proposed Solution Overview">
<t>The proposed solution consists of signaling simulcast capability and
configurations in <xref target="RFC4566">SDP</xref>:<list
style="symbols">
<t>An offer or answer can contain a number of simulcast versions,
separate for send and receive directions.</t>
<t>An offer or answer can contain multiple, alternative simulcast
versions in the same fashion as multiple, alternative codecs can be
offered in a media description.</t>
<t>Currently, a single media source per SDP media description is
assumed, which makes the solution work in a <xref
target="I-D.roach-mmusic-unified-plan">Unified Plan</xref> context
(although different from what is currently defined there), both with
and without BUNDLE grouping.</t>
<t>The codec configuration for each simulcast version is expressed
in terms of existing SDP formats (and typically RTP payload types).
Some codecs may rely on codec configuration based on general
attributes that apply for all formats within a media description,
and which could thus not be used to separate different simulcast
versions. This memo makes no attempt to address such shortcomings,
but if needed instead encourages that a separate, general mechanism
is defined for that purpose.</t>
<t>It is possible, but not required, to use <xref
target="RFC5576">source-specific signaling</xref> with the proposed
solution.</t>
</list></t>
</section>
<section anchor="sec-solution" title="Proposed Solution">
<t>This section further details the signaling solution outlined <xref
target="sec-solution-overview">above</xref>.</t>
<section anchor="sec-cap" title="Simulcast Capability">
<t>Simulcast capability is expressed as a new media level SDP
attribute, "a=simulcast". For each desired direction
(send/recv/sendrecv), the simulcast attribute defines a list of
simulcast versions (separated by semicolons), each of which is a list
of alternative RTP payload types (separated by commas) for that
simulcast version. The attribute has no defined meaning on SDP session level
and MUST NOT be used there. There MUST be at most one
"a=simulcast" attribute per media description. The <xref
target="RFC5234">ABNF</xref> for this attribute is:</t>
<figure align="center" anchor="fig-abnf" title="ABNF for Simulcast">
<artwork align="left"><![CDATA[
simulcast-attribute = "a=simulcast" 1*3( WSP sc-dir-list )
sc-dir-list = sc-dir WSP sc-fmt-list *( ";" sc-fmt-list )
sc-dir = "send" / "recv" / "sendrecv"
sc-fmt-list = sc-fmt *( "," sc-fmt )
sc-fmt = fmt
; WSP defined in [RFC5234]
; fmt defined in [RFC4566]
]]></artwork>
</figure>
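<t>As an illustration only, the ABNF above can be exercised with a small
parser. The following Python sketch is not part of the proposed solution;
the function name parse_simulcast and the returned dictionary layout are
assumptions made for this example, and the "fmt" check is simplified to
the token characters of RFC 4566. The sketch also rejects a direction
that is listed more than once.</t>
<figure>
<artwork align="left"><![CDATA[
```python
import re

# SDP format token, simplified to the RFC 4566 "token" characters
# (an assumption; the ABNF above only references "fmt").
_FMT = re.compile(r"^[A-Za-z0-9!#$%&'*+\-.^_`{|}~]+$")
_DIRECTIONS = ("send", "recv", "sendrecv")


def parse_simulcast(line):
    """Parse an "a=simulcast" line into {direction: versions}.

    versions is a list of simulcast versions; each version is the list
    of its alternative format tokens.
    """
    prefix = "a=simulcast"
    if not line.startswith(prefix):
        raise ValueError("not an a=simulcast line")
    # Lenient on whitespace: the ABNF requires exactly one WSP between
    # elements, while split() also accepts runs of whitespace.
    tokens = line[len(prefix):].split()
    # 1*3( WSP sc-dir-list ): one to three direction/format-list pairs.
    if len(tokens) % 2 != 0 or not 2 <= len(tokens) <= 6:
        raise ValueError("expected 1 to 3 direction/format-list pairs")
    result = {}
    for direction, fmt_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in _DIRECTIONS:
            raise ValueError("unknown direction: " + direction)
        if direction in result:
            raise ValueError("direction listed twice: " + direction)
        # ";" separates simulcast versions, "," separates alternatives.
        versions = [v.split(",") for v in fmt_list.split(";")]
        for alternatives in versions:
            for fmt in alternatives:
                if not _FMT.match(fmt):
                    raise ValueError("bad format token: %r" % fmt)
        result[direction] = versions
    return result


print(parse_simulcast("a=simulcast send 97;98 recv 97"))
```
]]></artwork>
</figure>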
<t/>
<t>There are separate and independent sets of parameters for simulcast
in send and receive directions. When listing multiple directions, each
direction MUST NOT occur more than once.</t>
<t>Attribute parameters are grouped by direction and consist of a
listing of SDP format tokens (usually corresponding to RTP payload
types), which describe the simulcast versions to be used. The number
of (non-alternative, see below) formats in the list sets a limit to
the number of supported simulcast versions in that direction. The
order of the listed simulcast versions in the "send" direction is not
significant. The order of the listed simulcast versions in the "recv"
direction expresses which simulcast versions are preferred, with the
leftmost being most preferred, if the number of actually sent simulcast
versions has to be reduced for some reason.</t>
<t>Formats that have explicit <xref
target="RFC5583">dependencies</xref> to other formats (even in the
same media description) MAY be listed as different simulcast
versions.</t>
<t>Alternative simulcast versions MAY be specified as part of the
attribute parameters by expressing each simulcast version format as a
comma-separated list of alternative values. In this case, all
combinations of those alternatives MUST be supported. The order of the
alternatives within a simulcast version is not significant; codec
preference is expressed by format type ordering on the m-line, using
regular SDP rules.</t>
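<t>To make the "all combinations" requirement concrete, the following
Python sketch enumerates every way of picking one alternative per
simulcast version. It is an illustration only; the function name
alternative_combinations and the format values 97 to 100 are hypothetical
and do not come from the examples in this memo.</t>
<figure>
<artwork align="left"><![CDATA[
```python
from itertools import product


def alternative_combinations(versions):
    """Return every way to pick one alternative per simulcast version.

    versions: list of simulcast versions, each a list of alternative
    format tokens, for example [["97", "98"], ["99"]].
    """
    return [tuple(choice) for choice in product(*versions)]


# Hypothetical list "97,98;99,100": two simulcast versions with two
# alternatives each, so four combinations have to be supported.
for combination in alternative_combinations([["97", "98"], ["99", "100"]]):
    print(";".join(combination))
```
]]></artwork>
</figure>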
<t>A simulcast version can use a codec defined such that the same RTP
SSRC can change RTP payload type multiple times during a session,
possibly even on a per-packet basis. A typical example can be a speech
codec that makes use of <xref target="RFC3389">Comfort Noise</xref>
and/or <xref target="RFC4733">DTMF</xref> formats. In those cases,
such "related" formats MUST NOT be listed explicitly in the attribute
parameters, since they are not strictly simulcast versions of the
media source, but rather a specific way of generating the RTP stream
of a single simulcast version with varying RTP payload type. Instead,
only a single codec format MUST be used per simulcast version or
simulcast version alternative (if there are such). The codec format
SHOULD be the codec most relevant to the media description, if
possible to identify, for example the audio codec rather than the
DTMF. What codec format to choose in the case of switching between
multiple equally "important" formats is left open, but it is assumed
that in the presence of such strong relation it does not matter which
is chosen.</t>
<t>Use of the <xref target="RFC2198">redundant audio data</xref>
format could be seen as a form of simulcast for loss protection
purposes, but is not considered conflicting with the mechanisms
described in this memo and MAY therefore be used as any other format.
In this case the "red" format, rather than the carried formats, SHOULD
be the one to list as a simulcast version on the "a=simulcast"
line.</t>
<t><list style="empty">
<t>Editor's note: Consider adding the possibility to put an RTP
stream in <xref target="I-D.ietf-avtext-rtp-stream-pause">"paused"
state</xref> from the beginning of the session, possibly starting
it at a later point in time by applying RTP/RTCP level procedures
from that specification.</t>
</list></t>
<section title="Declarative Use">
<t>When used as a declarative media description, a=simulcast "recv"
direction formats indicate the configured end point's required
capability to recognize and receive a specified set of RTP streams
as simulcast streams. In the same fashion, a=simulcast "send"
direction requests the end point to send a specified set of RTP
streams as simulcast streams. The "sendrecv" direction combines
"send" and "recv" requirements, using the same format values for
both.</t>
<t>If simulcast version alternatives are listed, it means that the
configured end point MUST be prepared to receive any of the "recv"
formats, and MAY send any of the "send" formats for that simulcast
version.</t>
</section>
<section title="Offer/Answer Use">
<t>An offerer wanting to use simulcast SHALL include the
"a=simulcast" attribute in the offer. An offerer that receives an
answer without "a=simulcast" MUST NOT use simulcast towards the
answerer. An offerer that receives an answer with "a=simulcast" not
listing a direction or without any formats in a specified direction
MUST NOT use simulcast in that direction.</t>
<t>An answerer that does not understand the concept of simulcast
will also not know the attribute and will remove it in the SDP
answer, as defined in existing <xref target="RFC3264">SDP
Offer/Answer</xref> procedures. An answerer that does understand the
attribute and that wants to support simulcast in an indicated
direction SHALL reverse directionality of the unidirectional
direction parameters; "send" becomes "recv" and vice versa, and
include it in the answer. If the offered direction is "sendrecv",
the answerer MAY keep it, but MAY also change it to "send" or "recv"
to indicate that it is only interested in simulcast for a single
direction. Note that, like all other use of SDP format tags for the
send direction in Offer/Answer, format tags related to the simulcast
send direction in an offer ("send" or "sendrecv") are placeholders
that refer to information in the offer SDP, and the actual formats
that will be used on the wire (including RTP Payload Format numbers)
depend on information included in the SDP answer.</t>
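<t>The direction handling described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not a
definitive implementation: the function name answer_directions, the
dictionary representation of the attribute and the sendrecv_choice
parameter are hypothetical, and rejecting an answer in which one direction
would appear twice is a choice made for this sketch.</t>
<figure>
<artwork align="left"><![CDATA[
```python
_REVERSE = {"send": "recv", "recv": "send"}


def answer_directions(offered, sendrecv_choice="sendrecv"):
    """Map the directions of an offered a=simulcast to the answer.

    offered: {direction: versions} as seen in the offer.
    sendrecv_choice: what the answerer turns an offered "sendrecv"
    into; it may keep it or narrow it to "send" or "recv".
    """
    if sendrecv_choice not in ("sendrecv", "send", "recv"):
        raise ValueError("invalid choice: " + sendrecv_choice)
    answer = {}
    for direction, versions in offered.items():
        if direction == "sendrecv":
            mapped = sendrecv_choice
        else:
            # "send" becomes "recv" and vice versa.
            mapped = _REVERSE[direction]
        if mapped in answer:
            # Assumption of this sketch: never emit a direction twice.
            raise ValueError("direction would appear twice: " + mapped)
        answer[mapped] = [list(alternatives) for alternatives in versions]
    return answer


# Offer "a=simulcast send 97;98 recv 97" as a dictionary.
print(answer_directions({"send": [["97"], ["98"]], "recv": [["97"]]}))
```
]]></artwork>
</figure>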
<t>An offerer listing a set of receive simulcast versions and/or
alternatives in the offer MUST be prepared to receive RTP streams
for any of those simulcast versions and/or alternatives from the
answerer.</t>
<t>An answerer that receives an offer containing an
"a=simulcast" attribute that lists alternative formats for simulcast
versions MAY keep all the alternatives in the answer, but it MAY
also choose to remove any non-desirable alternatives per simulcast
version in the answer. The answerer MUST NOT add any alternatives
that were not present in the offer.</t>
<t>An answerer that receives an offer with simulcast that lists a
number of simulcast versions MAY reduce the number of simulcast
versions in the answer, but MUST NOT add simulcast versions.</t>
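<t>The restrictions on what an answerer may change can be checked
mechanically. The Python sketch below is an illustration only; the
function name answer_respects_offer is hypothetical. It assumes that both
lists describe the same direction after the send/recv reversal has been
applied, and that a format token does not appear in more than one
simulcast version of that direction.</t>
<figure>
<artwork align="left"><![CDATA[
```python
def answer_respects_offer(offered, answered):
    """Check that an answer only removes versions or alternatives.

    offered, answered: lists of simulcast versions for one direction,
    each version being a list of alternative format tokens.
    """
    remaining = [set(alternatives) for alternatives in offered]
    for alternatives in answered:
        wanted = set(alternatives)
        if not wanted:
            return False
        # Find a not yet used offered version that contains every
        # alternative kept in the answer.
        match = next((o for o in remaining if wanted <= o), None)
        if match is None:
            # An alternative or a version was added by the answerer.
            return False
        remaining.remove(match)
    return True


offer = [["100"], ["101"], ["103", "102"]]
print(answer_respects_offer(offer, [["100"], ["102"]]))
print(answer_respects_offer(offer, [["100"], ["104"]]))
```
]]></artwork>
</figure>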
<t>An offerer that receives an answer where some simulcast version
alternatives are kept MUST be prepared to receive any of the kept
send direction alternatives, and MAY send any of the kept receive
direction alternatives from the answer. This is similar to the case
when the answer includes multiple formats on the m-line.</t>
<t>An offerer that receives an answer where some of the simulcast
versions are removed MAY release the corresponding resources (codec,
transport, etc.) in its receive direction and MUST NOT send any RTP
streams corresponding to the removed simulcast versions.</t>
<t>The media formats and corresponding characteristics of encoded
streams used in a simulcast SHOULD be chosen such that they are
different. If this difference is not required, <xref
target="RFC7104">RTP duplication</xref> procedures SHOULD be
considered instead of simulcast.</t>
<t><list style="empty">
<t>Note: The inclusion of "a=simulcast" or the use of simulcast
does not change any of the interpretation or Offer/Answer
procedures for other SDP attributes, like "a=fmtp".</t>
</list></t>
</section>
</section>
<section anchor="sec-srcname" title="Relating Simulcast Versions">
<t>As long as there is only a single media source per SDP media
description, simulcast RTP streams can be related on RTP level through
the RTP payload type, as specified in the SDP <xref
target="sec-cap">"a=simulcast" attribute </xref> parameters. When
using <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref> to use
multiple SDP media descriptions to specify a single RTP session, there
is an identification mechanism that allows relating RTP streams back
to individual media descriptions, after which the above RTP payload
type relation can be used.</t>
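<t>A receiver could implement this relation as a simple lookup table.
The Python sketch below is an illustration under stated assumptions: the
function name build_stream_lookup, the mid values "v1" and "v2" and the
payload type numbers are hypothetical, and it presumes that the mid value
of each RTP stream has already been obtained through the BUNDLE
identification mechanism.</t>
<figure>
<artwork align="left"><![CDATA[
```python
def build_stream_lookup(media_descriptions):
    """Map (mid, payload type) to the index of the simulcast version.

    media_descriptions: {mid: versions}, where versions is the list of
    simulcast versions of that media description and each version is a
    list of alternative format tokens (RTP payload types here).
    """
    lookup = {}
    for mid, versions in media_descriptions.items():
        for index, alternatives in enumerate(versions):
            for fmt in alternatives:
                lookup[(mid, int(fmt))] = index
    return lookup


# Hypothetical bundled session: payload type 103 is used in both media
# descriptions, so the mid value is needed to tell the streams apart.
lookup = build_stream_lookup({
    "v1": [["100"], ["103", "102"]],
    "v2": [["103"], ["96"]],
})
print(lookup[("v1", 103)], lookup[("v2", 103)])
```
]]></artwork>
</figure>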
</section>
<section anchor="sec-ex" title="Signaling Examples">
<t>These examples cover a client connecting to a video conference service
that uses a centralized media topology with an RTP mixer.</t>
<figure align="center" anchor="fig-mixer-four-party"
title="Four-party Mixer-based Conference">
<artwork align="center"><![CDATA[
+---+ +-----------+ +---+
| A |<---->| |<---->| B |
+---+ | | +---+
| Mixer |
+---+ | | +---+
| F |<---->| |<---->| J |
+---+ +-----------+ +---+]]></artwork>
</figure>
<section anchor="sec-ex-unified-plan" title="Unified Plan Client">
<t>Alice is calling in to the mixer with a simulcast-enabled Unified
Plan client capable of a single media source per media type. The
client can send a simulcast of 2 video resolutions and frame rates:
HD 1280x720p 30fps and thumbnail 320x180p 15fps. Alice's Offer:</t>
<figure align="center" anchor="fig-up-offer"
title="Unified Plan Simulcast Offer">
<artwork align="left"><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.156
m=audio 49200 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49300 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=rtpmap:98 H264/90000
a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
a=simulcast send 97;98 recv 97
]]></artwork>
</figure>
<t>The only thing in the SDP that indicates simulcast capability is
the line in the video media description containing the "simulcast"
attribute. The included format parameters indicate that sent
simulcast versions can differ in video resolution and frame rate.</t>
<t>The Answer from the server indicates that it too is simulcast
capable. Should it not have been simulcast capable, the
"a=simulcast" line would not have been present and communication
would have started with the media negotiated in the SDP.</t>
<figure align="center" anchor="fig-up-answer"
title="Unified Plan Simulcast Answer">
<artwork align="left"><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.43
m=audio 49672 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 49674 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=rtpmap:98 H264/90000
a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
a=simulcast recv 97;98 send 97
]]></artwork>
</figure>
<t>Since the server is the simulcast media receiver, it reverses the
direction of the "simulcast" attribute.</t>
</section>
<section anchor="sec-ex-multi-source" title="Multi-Source Client">
<t>Fred is calling in to the same conference as in the example above
with a two-camera, two-display system, thus capable of handling two
separate media sources in each direction, where each media source is
simulcast-enabled in the send direction. Fred's client is a Unified
Plan client, restricted to a single media source per media
description.</t>
<t>The first two simulcast versions for the first media source use
different codecs, <xref target="RFC6190">H264-SVC</xref> and <xref
target="RFC6184">H264</xref>. These two simulcast versions also have
a temporal dependency. Two different video codecs, <xref
target="I-D.ietf-payload-vp8">VP8</xref> and H264, are offered as
alternatives for the third simulcast version for the first media
source.</t>
<t>The second media source is offered with three different simulcast
versions. All video streams of this second media source are loss
protected by <xref target="RFC4588">RTP retransmission</xref>.</t>
<t>Fred's client also uses BUNDLE to send all RTP streams from
all media descriptions in the same RTP session on a single media
transport. This example uses too few RTP payload types to risk
running out of them, but for the sake of the example it is assumed
that one payload type cannot be kept unique across all media
descriptions. Therefore, the SDP uses the mechanism (work in
progress) in BUNDLE that identifies which media description an RTP
stream belongs to (a new RTCP SDES item and <xref
target="RFC5285">RTP header extension</xref> type carrying the a=mid
value). That identification makes it possible to determine
unambiguously, also at the RTP level, which media source a stream
belongs to and thus what the related simulcast versions are, even
though two separate RTP streams in the joint RTP session share an
RTP payload type.</t>
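<t>The effect of this identification can be sketched in Python. The
sketch is illustrative only: the table is transcribed by hand from the
a=rtpmap lines of Fred's offer in this section (rtx formats omitted),
the names VIDEO_FORMATS, mids_using and identify are hypothetical, and
extraction of the a=mid value from the RTCP SDES item or RTP header
extension is assumed to have happened elsewhere.</t>
<figure>
<artwork><![CDATA[
```python
# Video payload types per media description, keyed by a=mid value and
# transcribed from the a=rtpmap lines of Fred's offer (rtx omitted).
VIDEO_FORMATS = {
    "bar": {100: "H264-SVC", 101: "H264", 102: "H264", 103: "VP8"},
    "zen": {96: "VP8", 103: "VP8", 97: "VP8"},
}


def mids_using(payload_type):
    """Return every mid whose media description lists payload_type."""
    return sorted(mid for mid, formats in VIDEO_FORMATS.items()
                  if payload_type in formats)


def identify(mid, payload_type):
    """Resolve a received RTP stream to a (mid, codec) pair.

    mid is assumed to have been read from the RTCP SDES item or the
    RTP header extension; parsing packets is not shown here.
    """
    codec = VIDEO_FORMATS[mid][payload_type]
    return mid, codec


print(mids_using(103))       # payload type alone is ambiguous
print(identify("zen", 103))  # the mid value resolves the ambiguity
```
]]></artwork>
</figure>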
<figure anchor="fig-ms-offer"
title="Fred's Multi-Source Simulcast Offer">
<artwork><![CDATA[
v=0
o=fred 238947129 823479223 IN IP4 192.0.2.125
s=Offer from Simulcast Enabled Multi-Source Client
t=0 0
c=IN IP4 192.0.2.125
a=group:BUNDLE foo bar zen
m=audio 49200 RTP/AVP 99
a=mid:foo
a=rtpmap:99 G722/8000
m=video 49600 RTP/AVP 100 101 102 103
a=mid:bar
a=rtpmap:100 H264-SVC/90000
a=rtpmap:101 H264/90000
a=rtpmap:102 H264/90000
a=rtpmap:103 VP8/90000
a=fmtp:100 profile-level-id=42400d; max-fs=3600; max-mbps=108000; \
mst-mode=NI-TC
a=fmtp:101 profile-level-id=42c00d; max-fs=3600; max-mbps=54000
a=fmtp:102 profile-level-id=42c00d; max-fs=900; max-mbps=27000
a=fmtp:103 max-fs=900; max-fr=30
a=imageattr:100 send [x=1280,y=720] recv [x=1280,y=720]
a=imageattr:101 send [x=1280,y=720] recv [x=1280,y=720]
a=imageattr:102 send [x=640,y=360] recv [x=640,y=360]
a=imageattr:103 send [x=640,y=360] recv [x=640,y=360]
a=depend:100 lay bar:101
a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
a=simulcast sendrecv 100;101 send 103,102
m=video 49602 RTP/AVP 96 103 97 104 105 106
a=mid:zen
a=rtpmap:96 VP8/90000
a=fmtp:96 max-fs=3600; max-fr=30
a=rtpmap:104 rtx/90000
a=fmtp:104 apt=96;rtx-time=200
a=rtpmap:103 VP8/90000
a=fmtp:103 max-fs=900; max-fr=30
a=rtpmap:105 rtx/90000
a=fmtp:105 apt=103;rtx-time=200
a=rtpmap:97 VP8/90000
a=fmtp:97 max-fs=240; max-fr=15
a=rtpmap:106 rtx/90000
a=fmtp:106 apt=97;rtx-time=200
a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
a=simulcast send 97;96;103
]]></artwork>
</figure>
<t><list style="empty">
<t>Note: Empty lines in the SDP above are added only for
readability and would not be present in an actual SDP.</t>
</list></t>
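<t>The "a=simulcast" lines in the offer above can be decomposed with a
small Python sketch. It assumes, based only on the examples in this
memo and not on the formal syntax, that simulcast versions are
separated by ";" and that alternative payload types for one version
are separated by ",". The function name parse_simulcast is
hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
DIRECTIONS = ("send", "recv", "sendrecv")


def parse_simulcast(value):
    """Parse an a=simulcast value into {direction: versions}.

    Each version is a list of alternative payload types. The syntax
    assumed here (";" between versions, "," between alternatives)
    is inferred from the examples in this memo.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected direction/format pairs")
    result = {}
    for direction, formats in zip(tokens[0::2], tokens[1::2]):
        if direction not in DIRECTIONS:
            raise ValueError("unknown direction: " + direction)
        versions = []
        for version in formats.split(";"):
            alternatives = [int(pt) for pt in version.split(",")]
            versions.append(alternatives)
        result.setdefault(direction, []).extend(versions)
    return result


# First and second video media descriptions of Fred's offer.
print(parse_simulcast("sendrecv 100;101 send 103,102"))
print(parse_simulcast("send 97;96;103"))
```
]]></artwork>
</figure>
<t>For the first media source, the sketch yields two "sendrecv"
versions (100 and 101) and one "send" version with the two
alternatives 103 and 102.</t>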
</section>
</section>
</section>
<section anchor="sec-network-aspects" title="Network Aspects">
<t>In this memo, simulcast is defined as the act of sending multiple
alternative encoded streams of the same underlying media source.
Multiple independent streams that originate from the same source can be
transmitted in several different ways using RTP. A general discussion
of considerations for use of the different RTP
multiplexing alternatives can be found in <xref
target="I-D.ietf-avtcore-multiplex-guidelines">Guidelines for
Multiplexing in RTP</xref>. Discussion and clarification on how to
handle multiple streams in an RTP session can be found in <xref
target="I-D.ietf-avtcore-rtp-multi-stream"/>.</t>
<t>The network aspects that are relevant for simulcast are:<list
style="hanging">
<t hangText="Quality of Service:">When using simulcast it might be
of interest to prioritize a particular simulcast version, rather
than applying equal treatment to all versions. For example, lower
bit-rate versions may be prioritized over higher bit-rate versions
to minimize congestion or packet losses in the low bit-rate
versions. Thus, there is a benefit in using a simulcast solution that
supports QoS as well as possible. By separating simulcast versions
into different RTP sessions and sending those RTP sessions over
different media transports, a simulcast version can be prioritized
by existing flow-based QoS mechanisms. When using unicast, QoS
mechanisms based on individual packet marking are also feasible,
which do not require separation of simulcast versions into different
RTP sessions to apply different QoS. The proposed solution can be
extended to support this functionality with an optional mid: prefix
before the RTP payload types of a simulcast version, to describe
simulcast across multiple media descriptions.</t>
<t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
incur more cost for NAT/FW traversal unless they can re-use the same
transport flow, which can be achieved by either <xref
target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
multiple RTP sessions on a single lower layer transport</xref> or
<xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">Multiplexing
Negotiation Using SDP Port Numbers</xref>. If flow-based QoS with
any differentiation is desired, the cost of additional transport
flows is likely unavoidable.</t>
<t hangText="Multicast:">Multiple RTP sessions will be required to
enable combining simulcast with multicast. Different simulcast
versions have to be separated to different multicast groups to allow
a multicast receiver to pick the version it wants, rather than
receive all of them. In this case, the only reasonable
implementation is to use different RTP sessions for each multicast
group so that reporting and other RTCP functions operate as
intended. The proposed solution can be extended to support this
functionality with an optional mid: prefix before the RTP payload
types of a simulcast version, to describe simulcast across multiple
media descriptions.</t>
</list></t>
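<t>The prioritization of lower bit-rate versions by individual packet
marking can be sketched in Python. Everything specific in the sketch
is an assumption made for illustration: this memo does not assign
DSCP values to simulcast versions, the bit-rates are invented, and the
names DSCP_BY_RANK, dscp_for_versions and tos_byte are hypothetical.
Whether a given code point results in the desired treatment depends on
network configuration.</t>
<figure>
<artwork><![CDATA[
```python
# Placeholder code points (AF41, AF31, AF21), for illustration only.
DSCP_BY_RANK = [34, 26, 18]


def dscp_for_versions(bitrates_kbps):
    """Map each simulcast version to a DSCP, lowest bit-rate first.

    bitrates_kbps maps a version label (here an RTP payload type)
    to an assumed bit-rate. Versions beyond the length of the table
    share its last entry.
    """
    ordered = sorted(bitrates_kbps, key=bitrates_kbps.get)
    plan = {}
    for rank, version in enumerate(ordered):
        index = min(rank, len(DSCP_BY_RANK) - 1)
        plan[version] = DSCP_BY_RANK[index]
    return plan


def tos_byte(dscp):
    """Place the six DSCP bits in the upper bits of the TOS octet."""
    return dscp << 2


# Invented bit-rates for two simulcast versions.
plan = dscp_for_versions({97: 1500, 98: 200})
for version, dscp in sorted(plan.items()):
    print(version, dscp, tos_byte(dscp))
```
]]></artwork>
</figure>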
<t/>
</section>
<section anchor="sec-iana" title="IANA Considerations">
<t>This document requests registration of a new SDP attribute,
simulcast.</t>
<t>Formal registrations to be written.</t>
</section>
<section anchor="sec-security" title="Security Considerations">
<t>The simulcast capability and configuration attributes and parameters
are vulnerable to attacks in signaling.</t>
<t>A false inclusion of the "a=simulcast" attribute may result in
simultaneous transmission of multiple RTP streams that would otherwise
not be generated. The impact is limited by the media description joint
bandwidth, shared by all simulcast versions irrespective of their
number. There may however be a large number of unwanted RTP streams that
will impact the share of the bandwidth allocated for the originally
wanted RTP stream.</t>
<t>A hostile removal of the "a=simulcast" attribute will result in
simulcast not being used.</t>
<t>Neither of the above is likely to have major consequences, and both
can be mitigated by signaling that is at least integrity protected and
source authenticated, which prevents an attacker from changing it.</t>
</section>
<section title="Contributors">
<t>Morgan Lindqvist and Fredrik Jansson, both from Ericsson,
contributed important material to the first versions of this
document. Robert Hansen, from Cisco, contributed significantly to
subsequent versions.</t>
</section>
<section anchor="sec-ack" title="Acknowledgements">
<t/>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.5109'?>
<?rfc include='reference.RFC.5234'?>
<?rfc include='reference.RFC.7104'?>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.2198'?>
<?rfc include='reference.RFC.3264'?>
<?rfc include='reference.RFC.3389'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.4733'?>
<?rfc include='reference.RFC.5117'?>
<?rfc include='reference.RFC.5285'?>
<?rfc include='reference.RFC.5576'?>
<?rfc include='reference.RFC.5583'?>
<?rfc include='reference.RFC.6184'?>
<?rfc include='reference.RFC.6190'?>
<?rfc include='reference.RFC.6236'?>
<?rfc include='reference.I-D.ietf-avtcore-multiplex-guidelines'?>
<?rfc include='reference.I-D.ietf-avtcore-rtp-multi-stream'?>
<?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>
<?rfc include='reference.I-D.ietf-avtcore-rtp-topologies-update'?>
<?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>
<?rfc include='reference.I-D.ietf-avtext-rtp-grouping-taxonomy'?>
<?rfc include='reference.I-D.roach-mmusic-unified-plan'?>
<?rfc include='reference.I-D.ietf-avtext-rtp-stream-pause'?>
<?rfc include='reference.I-D.ietf-payload-vp8'?>
</references>
<section title="Changes From Earlier Versions">
<t>NOTE TO RFC EDITOR: Please remove this section prior to
publication.</t>
<section title="Modifications Between Individual Version -00 and WG Version -00">
<t><list style="symbols">
<t>Added this appendix.</t>
</list></t>
</section>
</section>
</back>
</rfc>