<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-rtp-simulcast-02"
ipr="trust200902" submissionType="IETF">
<front>
<title abbrev="RTP Simulcast">Using Simulcast in RTP Sessions</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<author fullname="Morgan Lindqvist" initials="M." surname="Lindqvist">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region/>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile/>
<email>morgan.lindqvist@ericsson.com</email>
<uri/>
</address>
</author>
<author fullname="Fredrik Jansson" initials="F." surname="Jansson">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region/>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile/>
<email>fredrik.k.jansson@ericsson.com</email>
<uri/>
</address>
</author>
<date day="25" month="February" year="2013"/>
<abstract>
<t>In some applications it may be necessary to send multiple media
encodings derived from the same media source in independent RTP media
streams. This is called Simulcast. This document discusses the best way
of accomplishing this in RTP and how to signal it in SDP. It is
concluded that a solution where the different simulcast versions are
based on separate SDP media descriptions provides best support for
simulcast. A solution is defined by making two extensions to SDP. The
first extension consists of two new attributes in SDP that express
capability to send or receive simulcast streams, respectively. The
second extension describes how to group media descriptions belonging to
the same simulcast source by using the grouping framework.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>Simulcast is the act of simultaneously sending multiple different
versions of the same media content, e.g. the same video source encoded
with different video encoders or target resolutions. This can be done in
several ways and for different purposes. This document focuses on the
case where one wants to provide multiple streams with different
encodings over <xref target="RFC3550">RTP</xref> towards an intermediary
so that the intermediary can select which encoding to forward to other
participants in the session, and more specifically how the grouping of
the streams is defined. From an RTP perspective, simulcast is a specific
application of the aspects discussed in <xref
target="I-D.westerlund-avtcore-multiplex-architecture">RTP Multiplexing
Architecture</xref>.</t>
<t>The different encodings of a media content that are considered in
this document can differ in:</t>
<t><list style="hanging">
<t hangText="Bit-rate:">The difference is the amount of bits spent
to encode the media thus giving different quality.</t>
<t hangText="Codec:">Different media codecs are used to ensure that
different receivers that do not have a common set of decoders can
decode at least one of the versions. This can include codec
configuration options that are not compatible, like video encoder
profiles, or the capability of receiving the transport
packetization.</t>
<t hangText="Sampling:">Different sampling of media, in spatial as
well as in temporal domain, may be used to suit different rendering
capabilities or needs at the receiving endpoints, as well as a
method to achieve different bit-rates. For video streams, spatial
sampling affects image resolution and temporal sampling affects
video frame rate. For audio, spatial sampling relates to the number
of audio channels and temporal sampling affects audio bandwidth.
Obviously, a difference in sampling may result in a difference in
bit-rate.</t>
</list>There are different reasons for an application to provide
multiple different encodings of a single media source. As soon as an
application has the need to send multiple encodings, there is a
potential need for simulcast. This need can arise even when using media
codecs that have scalability features built in. The purpose of this
document is to describe a few scenarios where it is motivated to use
simulcast, elaborate on possible alternatives and available mechanisms,
and find a suitable solution for signaling and performing RTP simulcast.
The discussion results in a signaling proposal to support simulcast.</t>
</section>
<section title="Definitions">
<t/>
<section title="Terminology">
<t>The following terms and abbreviations are used in this
document:<list style="hanging">
<t hangText="Encoding:">A particular encoding is the choice of the
media encoder (codec) that has been used to compress the media and
the fidelity of that encoding through the choice of sampling,
bit-rate and other codec configuration parameters.</t>
<t hangText="Different encodings:">An encoding is different when
some parameter that characterizes the encoding of a particular
media source is changed. Such changes can affect one or more of the
following parameters: codec, codec configuration, bit-rate, and
sampling.</t>
<t hangText="Simulcast versions:">Media streams used for simulcast
that use different encodings and thus constitute different
versions of the same media source.</t>
</list></t>
</section>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
</section>
<section title="Simulcast Scenarios">
<t>This section discusses different usage scenarios for the term
simulcast and clarifies which of those this document focuses on. It also
reviews why simulcast and scalable codecs can be a useful
combination.</t>
<section title="Simulcasting to RTP Mixer">
<t>This scenario relates to a multi-party session where one or more
central nodes are used to facilitate the media transport between the
session participants. Thus, this targets the RTP Mixer Topology
defined in <xref target="RFC5117"/> (Section 3.4: Topo-Mixer). This
scenario is targeted for further discussion in this document.</t>
<t>Simulcasting different media encodings of video that differ both in
resolution and in bit-rate is highly applicable to video conferencing
scenarios. For example, an RTP mixer selects the video of the most
active speaker and sends that participant's video stream as a high
resolution stream to the other participants, and in addition also
sends a number of low resolution video streams of the other
participants, enabling the receiving user to both display the current
speaker in high quality and monitor the other participants in lower
quality/resolution/size. As the participants should not receive the
stream showing themselves, the set of streams will be unique to all
participants.</t>
<t>A number of alternatives exist to provide both high and low
resolutions from an RTP Mixer:<list style="hanging">
<t hangText="Simulcast:">The clients send one stream for the low
resolution and another for the high resolution to the RTP
Mixer.</t>
<t hangText="Scalable Video Coding:">The clients send one stream
to the RTP Mixer, using a video encoder that in this stream can
provide both the high resolution and also enables the mixer to
extract a low resolution representation from that single
stream.</t>
<t hangText="Transcoding in the Mixer:">The clients send a high
resolution stream to the RTP Mixer which performs a transcoding to
a lower resolution stream.</t>
</list>The Transcoding alternative requires that the RTP mixer has
a sufficient amount of transcoding resources to produce the number of
low resolution streams required. In the worst case, all participants'
streams may need to be transcoded. If the resources are not available,
a different solution is needed. There will also normally be a quality
loss and an increase in latency associated with the transcoding
operation.</t>
<t>Scalable video encoding requires a more complex encoder compared to
non-scalable encoding. Also, if the resolution difference between the
streams is large, a scalable codec may in fact be only marginally more
bandwidth efficient than the simulcast case where the different
resolutions are sent as separate streams from the clients to the
mixer. At the same time, with scalable video encoding using the
currently available scalable video codecs, the transmission of all but
the lowest resolution will consume more bandwidth from the mixer to
the other participants compared to a non-scalable encoding.</t>
<t>Simulcasting has the benefit that it is conceptually simple. It
enables the use of any media codec that the participants agree on,
allowing the RTP mixer to be codec-agnostic.</t>
<figure align="center" anchor="fig-mixer-forwarding"
title="RTP Mixer selecting from simulcast versions">
<artwork><![CDATA[
+------------+ +---+
+---+ | |----->| B |
| |=====>| | +---+
| A | | Mixer |
| |----->| | +---+
+---+ | |=====>| C |
+------------+ +---+
]]></artwork>
</figure>
<t>The sender A provides the mixer with both a high resolution version
"===>" and a low resolution version "--->". The mixer selects
which members of its receiver population should get a particular version.</t>
<section title="Simulcast Combined with Scalable Encoding">
<t>As explained in the previous section, a scalable codec is not
always more bandwidth efficient than simulcast, especially in the
path from the mixer to the receiver.</t>
<t>There are however cases where a combination of simulcast and
scalable encoding can be beneficial. By using simulcast in cases
where the scalable codec is less efficient, it is possible to
optimize the efficiency of the complete system. A good example of
this usage would be where the video is encoded using <xref
target="RFC6190">SVC transported in RTP</xref>, where each simulcast
stream has a different resolution, and each SVC media stream uses
temporal scalability and signal to noise ratio (SNR) scalability
within that single media stream. If only resolution and temporal
variations are needed, this can be implemented using the
non-scalable part of H.264, as each simulcast version provides the
different resolution, and each media stream within a simulcast
encoding has temporal scalability through the use of non-reference
frames.</t>
</section>
</section>
<section title="Multicast Transported Simulcasted Media">
<t>When using multicast, particularly <xref
target="RFC3569">Source-Specific Multicast (SSM)</xref> to distribute
RTP/RTCP packets to a large receiver population one faces some issues.
There are at least two different issues where simulcast can
potentially be useful.</t>
<section title="Diversity in Receiver Population">
<t>If there is any diversity in the receivers regarding e.g.
capability, codec support or code base, there are potentially
restrictions in what streams can be delivered to the receivers. If
using the lowest common denominator over a diverse receiver
population isn't acceptable, simulcast can be one possible solution.
By offering different stream alternatives, it is possible to let the
receivers choose the simulcast version that matches their
capabilities. By using explicit signalling for simulcast, it is not
necessary for the stream distributor to handle multiple receiver
configurations individually for a multi-media session, nor to ensure
that each receiver gets an encoding that matches their
capabilities.</t>
<t>The simulcast version granularity the receivers can select will
be on multicast group level. Thus, this use case puts a strict
requirement on supporting separation through different RTP sessions.
The reason is that having a single RTP session straddle several
multicast groups makes any reporting on the received sources very
difficult to interpret. Using one RTP session per simulcast version
instead provides consistency.</t>
</section>
<section title="Bit-rate Adaptation">
<t>If the network paths from the media sender to the receivers can
support different bit-rates, there is a need to support media
streams encoded to different bit-rates. If these path differences
are of a more static nature, for example depending primarily on the
underlying link layers, using simulcast has an advantage over
scalable encoding. The reason is that the efficiency of scalable
coding will never be better than encoding to a single target rate.
When the receiver can determine current network interface
connectivity, it can choose a simulcast version with certainty. That
choice will also remain correct until another network
interface becomes the active one. This assumes that the multicast
transmission uses dedicated resources and will thus not be congested
due to other network traffic. To support this behavior, the
signalling must support indication of which media streams are
alternatives to each other, and it is also necessary to be able to
determine the aggregate bit-rate for the selected multicast group(s)
compared to available network properties.</t>
<t>Simulcast is possible to use also in more dynamic situations
where each receiver continuously gathers reception statistics to
detect path congestion and based on that may change which version to
receive. The main issue with such usage is how to achieve a switch
from one version to another with minimal playback interruption and
without putting extra load on the network during the actual
switch. Here, scalable encoding in general has better
characteristics since scalability layers are typically
synchronized.</t>
<t>When comparing simulcast and scalable encoding, the trade-offs
are different and the down-sides occur at different places.
Simulcast will have a higher bit-rate load at a media sender and
that will also be the case for any network path shared between
receivers of multiple simulcast versions. However, for parts of the
network path where there is only a single simulcast version, the
achievable quality at a given bit-rate will be slightly higher for
simulcast. It will also be more difficult to seamlessly switch
between simulcast versions than between different scalable
encodings, as simulcast actually switches from one media stream
version to another instead of adding or removing some enhancement
layers.</t>
</section>
</section>
<section title="Same Encoding to Multiple Destinations">
<t>One interpretation of simulcast is when one encoding is sent to
multiple receivers. This is well supported in RTP by simply copying
all outgoing RTP and RTCP traffic to several transport destinations,
if the intention is to create a common RTP session. As long as all
participants do the same, a full mesh is constructed and everyone in
the multi-party session has a similar view of the joint RTP session.
This is analogous to an Any Source Multicast (ASM) session but without
the traffic optimization, as multiple copies of the same content are
likely to have to pass over the same link.</t>
<figure align="center" anchor="fig-full-mesh"
title="Full Mesh / Multi-unicast">
<artwork><![CDATA[
+---+ +---+
| A |<---->| B |
+---+ +---+
^ ^
\ /
\ /
v v
+---+
| C |
+---+
]]></artwork>
</figure>
<t>As this type of simulcast is analogous to ASM usage and RTP has good
support for ASM sessions, no further consideration is made in this
document for this scenario.</t>
</section>
<section title="Different Encoding to Independent Destinations">
<t>Another alternative interpretation of simulcast includes multiple
destinations, where each destination gets a specifically tailored
version, but where the destinations are independent. A typical example
for this would be a streaming server distributing the same live
session to a number of receivers, adapting the quality and resolution
of the multi-media session to each receiver's capability and available
bit-rate. This case can be solved in RTP by having independent RTP
sessions between the sender and the receivers. Thus this case is not
considered further.</t>
</section>
</section>
<section title="Network Aspects">
<t>The network aspects that are relevant for simulcast are:<list
style="hanging">
<t hangText="Quality of Service:">When using simulcast it might be
of interest to prioritize a particular simulcast version, rather
than applying equal treatment to all versions. For example, lower
bit-rate versions may be prioritized over higher bit-rate versions
to minimize congestion or packet losses in the low bit-rate
versions. Thus, it is beneficial to use a simulcast solution that
supports QoS as well as possible. By separating simulcast versions
into different RTP sessions and sending those RTP sessions over
different transport flows, a simulcast version can be prioritized by
existing flow based QoS mechanisms. When using unicast, QoS
mechanisms based on individual packet marking are also feasible,
which do not require separation of simulcast versions into different
RTP sessions to apply different QoS.</t>
<t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
incur more cost for NAT/FW traversal unless they can re-use the same
transport flow, which can be achieved by either one of <xref
target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
multiple RTP sessions on a single lower layer transport</xref> or
<xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">Multiplexing
Negotiation Using SDP Port Numbers</xref>. If flow based QoS with
any differentiation is desirable, the cost for additional transport
flows is likely necessary.</t>
<t hangText="Multicast:">Multiple RTP sessions will be required to
enable combining simulcast with multicast. Different simulcast
versions have to be separated to different multicast groups to allow
a multicast receiver to pick the version it wants, rather than
receive all of them. In this case, the only reasonable
implementation is to use different RTP sessions for each multicast
group so that reporting and other RTCP functions operate as
intended.</t>
</list></t>
<t/>
</section>
<section title="Simulcast Alternatives">
<t>Simulcast is in this document defined as the act of sending multiple
alternative encodings of the same underlying media source. When
transmitting multiple independent streams that originate from the same
source, it could potentially be done in several different ways using
RTP. A general discussion of considerations for use of the different
RTP multiplexing alternatives can be found in <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
using the Multiplexing Features of RTP</xref>. Discussion and
clarification on how to handle multiple streams in an RTP session can be
found in <xref target="I-D.lennox-avtcore-rtp-multi-stream"/>.</t>
<t>The below sub-sections briefly describe potential ways of achieving
RTP media stream multiplexing and identification of which streams are
alternative simulcast encodings of the same source. The descriptions
also cover how each alternative interacts with multiple sources
(SSRCs) present in the same RTP session for reasons other than
simulcast. Multiple SSRCs may occur for various reasons such as multiple
participants in multipoint topologies like multicast, transport relays
or full mesh transport simulcasting, multiple source devices such as
multiple cameras or microphones at one end-point, or other RTP
mechanisms such as <xref target="RFC4588">RTP Retransmission</xref>.</t>
<section title="Using the Payload Type">
<t>An alternative could be to use only the RTP payload type to
identify the different simulcast streams. This could be tempting,
since simulcast streams may differ in codec, codec configuration, or
sampling, all of which are typically specified in SDP by a format
number on the media line that is in turn connected to an RTP Payload
Type. Thus all simulcast streams would be sent in the same RTP session
using only a single SSRC per actual media source. However, as
discussed in <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
using the Multiplexing Features of RTP</xref>, using Payload Type
Multiplexing does not generally work and is hereby dismissed as a
potential solution.</t>
</section>
<section anchor="sec-single-rtp" title="Using Single RTP session ">
<t>This idea is based on using a unique SSRC for each alternative
encoding of an actual media source within a single RTP session. The
identification of streams and how they are specified to be related
alternatives needs an additional mechanism, for example using <xref
target="RFC5576">SSRC grouping</xref>, and potentially also a new SDES
item such as SRCNAME proposed in <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname"/>, with semantics
that indicate them as alternatives of a particular media source. When
there are multiple actual media sources in a session, each media
source will have to use a number of SSRCs to represent the different
simulcast alternatives it produces. For example, if there are n
media sources and each of them produces the same number of
simulcast versions, m, there will be n*m SSRCs in use in the RTP
session. Each SSRC can use any of the configured payload types for
this RTP session. All session level attributes and parameters that are
not source specific will apply and must function with all the
alternative encodings in use.</t>
<t>In the currently used signaling system based on <xref
target="RFC4566">SDP</xref> and <xref
target="RFC3264">Offer/Answer</xref>, the properties of media streams
are typically negotiated on media block (m-line) level. Sending
simulcast alternatives as different SSRCs belonging to the same media
description is likely possible to achieve, but SSRC centric signaling
providing the needed media stream properties is currently almost
non-existent and it would require a considerable effort to make the
necessary SDP extensions.</t>
<t>A single RTP session can be described in SDP by more than a single
m-line, like for <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>, and it
can re-use the same <xref target="RFC5888">m-line grouping</xref> as
would be used for <xref target="sec-multi-rtp">multiple RTP
sessions</xref>, but the RTP aspects described in this section will
still apply. This would enable the same signalling expressiveness for
multiple RTP sessions as for a single RTP session. <!--BoB: Add figure(s) that shows distribution of simulcast versions to SSRC groups and m-lines?--></t>
</section>
<section anchor="sec-multi-rtp" title="Using Multiple RTP sessions">
<t>Using multiple RTP sessions means that each different simulcast
version of an actual media source is transmitted in a separate RTP
session, using some session identifier to distinguish the
different versions. Since each RTP session is described by one or more
SDP m-lines, this solution needs explicit <xref
target="RFC5888">m-line grouping</xref> with semantics that indicate
them as simulcast alternatives. It is also important to identify the
SSRCs in the different sessions that are alternative encodings of the
same media source, if there are more than a single media source in
each RTP session. This could be accomplished using the same SSRC
across the sessions, but that is not robust against SSRC collisions
and could potentially force cascading SSRC changes between sessions. A
better choice would be to use different SSRCs, but relate streams
through a new SDES item proposed in <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname"/>. Each RTP session
will have its own set of configured RTP payload types available for
use with any SSRC in that session. In addition, all other attributes
for sessions or sources can be used as normal to indicate the
configuration of that particular alternative.<!--BoB: Add figure(s) that shows distribution of media sources and versions to sessions and m-lines--></t>
</section>
<section title="Conclusions">
<t>If it is at all desirable to support simulcast based on multicast,
the solution must support using multiple RTP sessions. The main reason
is that receiver based selection of simulcast version must be
possible, which is accomplished in multicast through receiver
selection of which multicast group(s) it joins. This also has the
advantage of being able to use the existing SDP media description (m=)
expressiveness to signal or negotiate simulcast versions.</t>
<t>When using simulcast based on unicast, it is desirable to be able
to use the same media description signalling expressiveness regardless
of whether multiple RTP sessions are used. Assuming that MMUSIC decides
to enable single RTP media stream negotiation per SDP media
description and combine that with BUNDLE to identify RTP sessions, it
appears that using one or more RTP sessions for simulcast over unicast
will be able to use the same signalling solution. Thus the decision to
use one or more RTP sessions can be taken based on other limitations,
such as cost of NAT/FW traversal, need for flow-based QoS etc.</t>
<t>A solution proposal for an SDP media description level signaling
for Simulcast version parameters is outlined below.</t>
</section>
</section>
<section title="Simulcast Signaling Proposal">
<t>Signaling simulcast is about negotiating between media sender and
receiver what the different simulcast versions should be, how to
identify them in terms of RTP streams, and how to relate those RTP
streams.</t>
<t>The proposed solution consists of:<list style="symbols">
<t>Signaling simulcast capability as SDP media level attributes in a
first round of Offer/Answer<list style="symbols">
<t>Separate send and receive simulcast capabilities</t>
<t>Media properties that are supported as base for different
simulcast versions are listed as parameters</t>
</list></t>
<t>Adding SDP media descriptions for the simulcast streams in a
second round of Offer/Answer<list style="symbols">
<t>Grouping SDP media descriptions from the same media source,
belonging to the same simulcast, using the <xref
target="RFC5888">SDP grouping framework</xref></t>
<t>Separate send and receive simulcast groupings</t>
<t>Negotiating parameters for simulcast version using regular,
individual SDP media descriptions</t>
<t>Identifying RTP media streams (SSRC) from same media source
using new SDES Item <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref></t>
</list></t>
</list></t>
<t> This is further outlined below.</t>
<section title="Simulcast Capability">
<t>There are numerous media properties that can be varied to construct
a set of simulcast versions. A simulcast enabled endpoint could also
support simulcast based on several of those properties. If
those properties are relatively independent and each simulcast
version needs explicit definition (an m-line) in the SDP, this would
lead to an exponential number of simulcast version candidates and a
very long SDP that is likely also hard to interpret. There is thus a
need to limit the simulcast version candidates included in the SDP to
cover as small a set of properties as possible.</t>
<t>If a legacy endpoint not supporting simulcast were to be presented
with an SDP including media descriptions for a set of simulcast
versions, it may not know how to correctly handle or interpret these
"surplus" media descriptions.</t>
<t>Based on the functionality that simulcast is intended to achieve,
it should be clear that the reasons to send simulcast versions are not
the same as to receive simulcast versions, seen from a single
endpoint.</t>
<t>For these reasons, it is proposed to define two new SDP media level
attributes, "a=sim-send" and "a=sim-recv", which explicitly signal
support for simulcast media transmission and simulcast media
reception, respectively, for that media description. "a=sim-send" and
"a=sim-recv" MAY be used independently and simultaneously. These
attributes are also proposed to have parameters indicating the media
properties used to create the simulcast versions. The attributes have
no defined meaning on SDP session level and MUST NOT be used there.</t>
<figure anchor="fig-abnf" title="ABNF for Simulcast">
<artwork><![CDATA[simulcast = "a="( "sim-send:" / "sim-recv:" ) prop-list
prop-list = prop-entry *(WSP prop-entry)
prop-entry = prop *("=" q-value)
prop = "rtpmap"
/ "fmtp"
/ "imageattr"
/ "ptime"
/ "crypto"
/ token ; for future extensions
q-value = ( "0" "." 1*2DIGIT )
/ ( "1" "." 1*2("0") )
; Values between 0.00 and 1.00
; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]
]]></artwork>
</figure>
<t>The media property values are taken from existing (and could likely
be extended to cover future) SDP attributes that express media
properties that can be varied to create different simulcast
versions:<list style="hanging">
<t hangText="rtpmap:">Differences in codec type, sampling rate
(see <xref target="sec-requirements"/>), and number of
channels</t>
<t hangText="fmtp:">Differences in codec-specific encoding
parameters</t>
<t hangText="imageattr:">Differences in video resolution, aspect
ratio, and framerate <xref target="RFC6236"/></t>
<t hangText="ptime:">Differences in frame aggregation per
packet</t>
<t hangText="crypto:">Differences in encryption <xref
target="RFC4568"/></t>
<t hangText="...:"/>
</list></t>
<t>The optional q-value expresses the relative preference to base a
simulcast version on that media property, with 1.00 meaning maximum
(100%) preference and 0.00 meaning no (0%) preference. Several media
properties can share the same q-value, in which case they are equally
preferred.</t>
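<t>As a purely illustrative sketch of the syntax above, the following
attribute lines conform to the ABNF. The chosen properties and
q-values are assumptions made for this example only and carry no
normative meaning. The first line signals support for sending
simulcast versions based on "imageattr" (most preferred) and on
"rtpmap" (less preferred). The second line signals support for
receiving simulcast versions based on "imageattr", without stating a
preference.</t>
<figure>
<artwork><![CDATA[
a=sim-send:imageattr=1.00 rtpmap=0.50
a=sim-recv:imageattr
]]></artwork>
</figure>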
<t>An offerer wanting to use simulcast SHALL include either one or
both of those attributes, depending on the direction(s) in which simulcast
will be used. An offerer that receives an answer without "a=sim-send"
or "a=sim-recv" MUST NOT define or use any simulcast alternatives
belonging to that media description and in that direction to the
answerer.</t>
<t>An answerer that does not understand the concept of simulcast will
also not know those attributes and will remove them in the SDP answer,
as defined in existing SDP Offer/Answer procedures. An answerer that
does understand the attributes and that wants to support simulcast in
the indicated direction SHALL reverse directionality of the attribute,
"sim-send" becomes "sim-recv" and vice versa, and include it in the
answer.</t>
<t>An offerer that intends to send simulcast alternatives and thus
includes "a=sim-send", MUST also include at least one media property
parameter that it intends to use to construct the simulcast
alternatives, but it MAY include more media property parameters.
Including multiple media property parameters in "a=sim-send" SHALL be
interpreted as an offer to send simulcast versions covering all
combinations thereof, but MAY be further restricted by other
information in the SDP such as for example the number of
simulcast-related media descriptions in the SDP or use of <xref
target="I-D.westerlund-mmusic-max-ssrc">max-ssrc signaling</xref>.</t>
<t>An offerer that is capable of receiving simulcast alternatives and
thus includes "a=sim-recv", MUST also include at least one media
property parameter that it is willing to use as discriminator between
received simulcast alternatives, but MAY include more media property
parameters. Including multiple media property parameters in
"a=sim-recv" SHALL be interpreted as an offer to receive simulcast
versions covering all combinations thereof, but MAY be further
restricted by other information in the SDP such as for example the
number of simulcast-related media descriptions in the SDP or use of
<xref target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
signaling</xref>.</t>
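<t>To make "all combinations thereof" concrete, the non-normative
Python sketch below enumerates the combinations for a set of media
properties. The function name and the per-property value lists are
hypothetical; the values are borrowed from the examples in this
document purely for illustration.</t>
<figure>
<artwork><![CDATA[
```python
from itertools import product


def simulcast_combinations(property_values):
    """List every combination of the given media property values.

    property_values maps a media property name to the values an
    endpoint could use for it (hypothetical input structure).
    """
    names = list(property_values)
    value_lists = [property_values[name] for name in names]
    # The Cartesian product is the upper bound on offered versions.
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = simulcast_combinations({
    "imageattr": ["640x360", "320x180"],
    "fmtp": ["profile-level-id=42c01e", "profile-level-id=42c00d"],
})
```
]]></artwork>
</figure>
<t>Two properties with two values each give four combinations. Other
information in the SDP, such as the number of simulcast-related media
descriptions, can reduce the set that is actually used.</t>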
<t>An answerer that lacks either the capability or the desire to use
simulcast versions based on a certain media property parameter in a
specific direction MUST remove that media property parameter from
"a=sim-send" or "a=sim-recv". The answerer MUST NOT add any media
property parameters that were not included in the offer.</t>
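<t>The answerer rules above can be sketched, non-normatively, in
Python as follows. The function name and argument shapes are
hypothetical. Omitting the attribute when no offered property remains
is an assumption of this sketch, as is leaving q-values out of the
answer, which mirrors the example answer in this document.</t>
<figure>
<artwork><![CDATA[
```python
def answer_sim_attribute(offered_direction, offered_properties,
                         supported_properties):
    """Build the answerer's simulcast attribute line (hypothetical helper).

    offered_direction:    "send" or "recv", as seen in the offer.
    offered_properties:   media property names listed in the offer.
    supported_properties: names the answerer is able and willing to use.
    Returns the attribute line, or None when nothing can be answered.
    """
    reverse = {"send": "recv", "recv": "send"}
    # Keep only offered properties; never add ones absent from the offer.
    kept = [p for p in offered_properties if p in supported_properties]
    if not kept:
        # Assumption: with no usable property the attribute is left out.
        return None
    return "a=sim-%s:%s" % (reverse[offered_direction], " ".join(kept))
```
]]></artwork>
</figure>
<t>For an offer listing "imageattr" and "fmtp" in "a=sim-send", an
answerer that only supports "imageattr" would answer with
"a=sim-recv:imageattr".</t>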
</section>
<section anchor="sec-group-m"
title="Grouping Simulcast Media Descriptions">
<t>To relate media descriptions holding simulcast versions, two new
simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
and "SimulCast Send" (SCS). Two semantics are needed to separate the
intent to send simulcast streams from the capability to recognize and
receive simulcast streams. Both semantics act as an indicator that
simulcast is desired and that the grouped media descriptions (m-lines)
carry simulcast versions of media sources. There may be multiple sets
of media descriptions that carry simulcast versions.</t>
<section title="Declarative Use">
<t>When used as a declarative media description, SCR indicates the
configured end-point's required capability to recognize and receive
a specified set of RTP streams as simulcast streams. In the same
fashion, SCS requests the end-point to send a specified set of RTP
streams as simulcast streams. SCR and SCS MAY be used independently
and at the same time, and they need not list the same media
descriptions, or even the same number of media descriptions, in the
group.</t>
</section>
<section title="Offer/Answer Use">
<t>When used in an offer, SCS indicates the SDP providing agent's
intent to send simulcast and the particular set of media
descriptions used for it, and SCR indicates the agent's capability to
receive simulcast streams within the configured set of media
descriptions. SCS and SCR MAY be used independently and at the same
time, and they need not list the same media descriptions, or even the
same number of media descriptions, in the group. The answerer MUST
change SCS to SCR and SCR to SCS in the answer, provided that it has
and wants to use the corresponding (reverse) capability. An answerer
not supporting the SCS or SCR direction, or not supporting SCS or SCR
grouping semantics at all, will remove that grouping attribute
altogether, according to <xref target="RFC5888">the grouping
framework</xref>. However, this case should not occur, or should at
least be very rare, due to the proposed <xref
target="sec-two-phase">two-phase approach</xref>.
An offerer that receives an answer indicating lack of simulcast
support in one or both directions, where SCR and/or SCS grouping are
removed, MUST NOT use simulcast in the non-supported
direction(s).</t>
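<t>A non-normative Python sketch of the answerer's handling of the
grouping attribute follows. The function name and the "usable" set
are hypothetical; "usable" holds the semantics the answerer has and
wants to use in its own role. Passing other group semantics through
unchanged is a simplification of this sketch.</t>
<figure>
<artwork><![CDATA[
```python
def answer_group_lines(offer_group_lines, usable):
    """Reverse SCS/SCR group lines for the answer (hypothetical helper).

    offer_group_lines: session-level 'a=group:' lines from the offer.
    usable: subset of {"SCS", "SCR"} available to the answerer, for
            example {"SCS"} when it can and wants to send simulcast.
    """
    swap = {"SCS": "SCR", "SCR": "SCS"}
    answer = []
    for line in offer_group_lines:
        semantics, *mids = line[len("a=group:"):].split()
        if semantics not in swap:
            # Not a simulcast group: outside the scope of this sketch.
            answer.append(line)
            continue
        reversed_semantics = swap[semantics]
        if reversed_semantics in usable:
            answer.append(
                "a=group:%s %s" % (reversed_semantics, " ".join(mids)))
        # Otherwise the simulcast grouping attribute is removed.
    return answer
```
]]></artwork>
</figure>
<t>An offer carrying "a=group:SCR 2 3" is answered with
"a=group:SCS 2 3" by an answerer able to send simulcast, and with no
simulcast group line otherwise.</t>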
</section>
</section>
<section anchor="sec-two-phase" title="Two-Phase Negotiation">
<t>The new "a=sim-send" and "a=sim-recv" attributes are proposed to
be included in the SDP as the first phase of a two-phase approach,
in which a first SDP Offer/Answer procedure only establishes
simulcast capability at both the offerer and the answerer. This has
the additional advantage of avoiding sending media descriptions
related to simulcast to an endpoint that does not support simulcast.
It is also unlikely to incur any significant extra signaling
round-trips, given that many other recent SDP techniques also make
use of two Offer/Answer procedures, as long as this phased approach
can be used in parallel with those. Such other two-phase techniques
include <xref target="RFC5245">ICE</xref> and <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>.</t>
<t>Thus, the first Offer/Answer SHOULD NOT include any
simulcast-grouped media descriptions, which SHOULD then be added in a
second Offer/Answer phase. This second phase SHOULD be initiated by
the simulcast receiver, meaning the endpoint that included
"a=sim-recv" in the first phase SDP SHOULD be offerer in the second
phase. If both endpoints are simulcast receivers, it is not possible
to define a preferred offerer in the second phase and either endpoint
MAY then send the offer, using regular Offer/Answer rules to handle
race conditions.</t>
<t>The first phase of establishing capability cannot be used with
declarative SDP, in which case it SHALL be bypassed and the
second-phase media description grouping used directly.</t>
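<t>The choice of offerer for the second phase can be summarized by the
following non-normative Python sketch. The function name, arguments
and return labels are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def second_phase_offerer(local_has_sim_recv, remote_has_sim_recv):
    """Pick the preferred second-phase offerer (hypothetical helper).

    Each argument says whether that endpoint included "a=sim-recv"
    in its first-phase SDP, making it a simulcast receiver.
    """
    if local_has_sim_recv and remote_has_sim_recv:
        # No preferred offerer; regular Offer/Answer rules handle races.
        return "either"
    if local_has_sim_recv:
        return "local"
    if remote_has_sim_recv:
        return "remote"
    # Simulcast was not negotiated, so there is no second phase.
    return None
```
]]></artwork>
</figure>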
</section>
<section anchor="sec-requirements" title="Media Stream Requirements">
<t>When doing simulcast, the media streams that are alternatives need
to meet certain constraints to ensure that switching between
alternative streams is as issue-free as possible. The following
constraints are needed:<list style="hanging">
<t hangText="Same Clock Base:">To enable correct alignment of
media packets on the source time-line, all alternative streams
(SSRCs) MUST use the same underlying clock to relate their RTP
timestamp values with the network time protocol (NTP) formatted
sender time in the RTCP Sender Reports.</t>
<t hangText=""/>
</list></t>
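<t>The purpose of the common clock base can be illustrated with the
non-normative Python sketch below. It maps an RTP timestamp to sender
time using the NTP and RTP timestamp pair from one RTCP Sender Report,
as in <xref target="RFC3550">RTP</xref>. The function name is
hypothetical and the NTP time is simplified to seconds as a floating
point number.</t>
<figure>
<artwork><![CDATA[
```python
RTP_TS_MODULO = 1 << 32  # RTP timestamps are 32-bit and wrap around


def rtp_to_sender_time(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender time in seconds (hypothetical helper).

    sr_rtp_ts and sr_ntp_seconds come from the same Sender Report of
    the stream that rtp_ts belongs to.
    """
    # Signed difference modulo 2**32 copes with timestamp wrap-around.
    delta = (rtp_ts - sr_rtp_ts) % RTP_TS_MODULO
    if delta >= RTP_TS_MODULO // 2:
        delta -= RTP_TS_MODULO
    return sr_ntp_seconds + delta / clock_rate


# Two alternatives with unrelated RTP timestamp offsets but one clock.
time_a = rtp_to_sender_time(180000, 90000, 1000.0, 90000)
time_b = rtp_to_sender_time(4000045000, 4000000000, 1000.5, 90000)
```
]]></artwork>
</figure>
<t>Because both alternatives derive their Sender Reports from the same
clock, the two packets above map to the same point on the source
time-line, which lets a receiver line up the streams when switching
between them.</t>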
<t/>
</section>
<section anchor="sec-srcname" title="Relating Alternative Encodings">
<t>To ensure that simulcast streams can be related correctly also on
RTP level, the usage of <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname">SDES SRCNAME</xref>
to label and relate simulcast versions belonging to the same media
source is RECOMMENDED.</t>
</section>
<section anchor="sec-max-ssrc" title="Multiple Stream handling">
<t>When using multiple SSRCs in a single media description, for
example when using simulcast for multiple independent media sources,
the grouping semantics SCR and SCS SHOULD be combined with the SDP
attributes <xref
target="I-D.westerlund-mmusic-max-ssrc">"a=max-send-ssrc" and
"a=max-recv-ssrc"</xref> to indicate the number of simultaneous
streams of each encoding that may be sent or that can be handled in
the receive direction.</t>
</section>
</section>
<section title="Simulcast Signaling Examples">
<t>For brevity and clarity, the SDP in all examples below does not
contain signaling for multiple streams, such as the ones related to
<xref target="sec-srcname">RTP level relations</xref> or <xref
target="sec-max-ssrc">multiple SSRC signaling</xref>.</t>
<t>This example is for a case of client to video conference service
using a centralized media topology with an RTP mixer. Alice and Bob
call into a conference server for a conference call with audio and
video sent to the RTP mixer, these clients being capable of sending a
few video simulcast versions. The conference server also dials out to
Fred, who is using a legacy client, resulting in fallback behavior.
When dialing out to Joe, more functionality is enabled as Joe is using
a client similar to Alice's.</t>
<figure align="center" anchor="fig-mixer-four-party"
title="Four-party Mixer-based Conference">
<artwork><![CDATA[
+---+ +-----------+ +---+
| A |<---->| |<---->| B |
+---+ | | +---+
| Mixer |
+---+ | | +---+
| F |<---->| |<---->| J |
+---+ +-----------+ +---+]]></artwork>
</figure>
<t>Example of Media plane for RTP mixer based multi-party conference
with 4 participants.</t>
<section title="Alice: Desktop Client">
<t>Alice is calling in to the mixer with an audiovisual single stream
desktop client. Compared to a legacy client, it only adds the
capability to send <xref
target="RFC6236">video resolution</xref> ("imageattr") and framerate
based simulcast. The first phase offer from Alice looks like
this:</t>
<figure anchor="fig-alice-first-offer"
title="Alice First Offer for a Simulcast Conference">
<artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:665
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr=1.0 fmtp=0.8
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=content:main
]]></artwork>
</figure>
<t>In this first phase, the only thing in the SDP that indicates
simulcast capability is the line in the video media description
containing the "sim-send" attribute.</t>
<t>The answer from the server indicates both that it is simulcast
capable and that it would like to use only video resolution
("imageattr") based simulcast. Had it not been simulcast capable, the
"a=sim-recv" line would not have been present and communication would
have started with the media negotiated in the SDP.</t>
<figure anchor="fig-alice-first-answer"
title="Server First Answer for a Simulcast Conference">
<artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:665
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=content:main
]]></artwork>
</figure>
<t>Since the server is the simulcast media receiver, it immediately
initiates another Offer/Answer including the simulcast versions. The
server also keeps the "sim-recv" as explicit simulcast capability
indication in this second Offer/Answer round. Note that the
"non-simulcast" media can be started already now, before the second
phase Offer/Answer, with the only restriction that the simulcast
functionality is not yet established.</t>
<figure anchor="fig-alice-second-offer"
title="Server Second Offer for a Simulcast Conference">
<artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Server inviting simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:825
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 recv [x=320,y=180]
a=mid:3
a=recvonly
]]></artwork>
</figure>
<t>The server has added one additional receive-only media description
with the simulcast version based on difference only in imageattr. That
the two media lines are considered to be simulcast versions is seen
from the SCR grouping tag and the two media IDs (2 and 3). The first
video version with media ID 2 prefers 360p resolution (signaled via
imageattr) and the second video version with media ID 3 prefers 180p
resolution. The first video media line also acts as the single send
video (making media line sendrecv), while the second video media line
is only related to simulcast transmission and is thus offered
recvonly. </t>
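<t>As a non-normative illustration of how a receiver of such an SDP
could locate the simulcast versions, the Python sketch below collects
the media descriptions referenced by SCS and SCR group lines. The
function name and the returned structure are hypothetical, and the
parsing is deliberately minimal.</t>
<figure>
<artwork><![CDATA[
```python
def simulcast_groups(sdp):
    """Collect media descriptions per SCS/SCR group (hypothetical helper).

    Returns {semantics: [group, ...]}, where each group is the list of
    media descriptions named by one 'a=group:' line, in mid order.
    """
    mid_lists = {}   # semantics -> list of mid lists, one per group line
    media = {}       # mid -> {"m": m-line value, "attributes": [...]}
    current = None   # media description being read; None at session level
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current = {"m": line[2:], "attributes": []}
        elif current is None:
            if line.startswith("a=group:"):
                semantics, *mids = line[len("a=group:"):].split()
                if semantics in ("SCS", "SCR"):
                    mid_lists.setdefault(semantics, []).append(mids)
        elif line.startswith("a=mid:"):
            media[line[len("a=mid:"):]] = current
        elif line.startswith("a="):
            current["attributes"].append(line[2:])
    return {
        semantics: [[media[mid] for mid in mids] for mids in groups]
        for semantics, groups in mid_lists.items()
    }
```
]]></artwork>
</figure>
<t>For the second offer above, the sketch returns a single SCR group
holding the two video media descriptions with media IDs 2 and 3.</t>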
<t>The fact that fmtp for this second video is also different should
be seen as a secondary effect of the change of resolution and does
not create any kind of conflict. The capabilities of Alice's client
are well aligned with this offer, and the SDP answer is
straightforward.</t>
<figure anchor="fig-alice-second-answer"
title="Alice Second Answer for a Simulcast Conference">
<artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Final answer from simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:825
a=group:SCS 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=mid:3
a=sendonly
]]></artwork>
</figure>
<t/>
</section>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document requests the registration of two new SDP attributes,
sim-send and sim-recv, with a new registry of defined parameters taken
from existing SDP attributes, and of two new SDP grouping semantics,
SCS and SCR.</t>
<t>Formal registrations to be written.</t>
<t/>
</section>
<section anchor="Security" title="Security Considerations">
<t>The simulcast capability attributes and parameters are vulnerable to
attacks in signaling.</t>
<t>A false inclusion of simulcast attributes may result in generation of
a second phase SDP that potentially contains a large number of
non-supported media descriptions expressing simulcast alternatives. A
correct SDP implementation will however be able to reject any
non-supported media descriptions and the effect from that should be
limited.</t>
<t>A hostile removal of the simulcast attributes will cause any second
phase Offer/Answer to be skipped, so that simulcast is not used.</t>
<t>The simulcast grouping semantics are vulnerable to attacks in the
signaling.</t>
<t>A false grouping of non-simulcast streams as simulcast would risk
that some streams are incorrectly ignored by receivers that know
simulcast and that are not interested in the assumed simulcast
streams.</t>
<t>A hostile removal of simulcast grouping will prevent streams from
being interpreted as simulcast, which obviously prevents use of the
simulcast functionality. It will also risk that intended simulcast
streams are instead presented as separate, independent streams to a
receiver.</t>
<t>None of the above is likely to have any major consequences, and all
can be mitigated by signaling that is at least integrity protected and
source authenticated, which prevents an attacker from changing it.</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t/>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.4568'?>
<?rfc include='reference.RFC.5576'?>
<?rfc include='reference.RFC.5888'?>
<?rfc include='reference.RFC.6236'?>
<?rfc include='reference.I-D.westerlund-avtext-rtcp-sdes-srcname'?>
<?rfc include='reference.I-D.westerlund-mmusic-max-ssrc'?>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.3264'?>
<?rfc include='reference.RFC.3569'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.5117'?>
<?rfc include='reference.RFC.5245'?>
<?rfc include='reference.RFC.6190'?>
<?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>
<?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>
<?rfc include='reference.I-D.lennox-avtcore-rtp-multi-stream'?>
<?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>
</references>
</back>
</rfc>