<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-rtp-simulcast-01"
ipr="trust200902" submissionType="IETF">
<front>
<title abbrev="RTP Simulcast">Using Simulcast in RTP sessions</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<author fullname="Morgan Lindqvist" initials="M." surname="Lindqvist">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region/>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile/>
<email>morgan.lindqvist@ericsson.com</email>
<uri/>
</address>
</author>
<author fullname="Fredrik Jansson" initials="F." surname="Jansson">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region/>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile/>
<email>fredrik.k.jansson@ericsson.com</email>
<uri/>
</address>
</author>
<date day="16" month="July" year="2012"/>
<abstract>
<t>In some applications it may be necessary to send multiple media
streams derived from the same media source. This is called Simulcast.
This document discusses the best way of accomplishing this in RTP. It
concludes that a session based solution provides the best support for
simulcast, and defines such a solution. Two extensions are
necessary. The first extension is how to group RTP sessions belonging
to the same simulcast source using the grouping framework, and the
second is how to identify which SSRCs originate from the same media
source by using a new RTCP SDES item SRCNAME.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>Simulcast is the act of simultaneously sending multiple different
versions of the same media content, e.g. the same video source encoded
with different video encoders. This can be done in several ways and for
different purposes. This document focuses on the case where one wants to
provide multiple streams with different encodings over <xref
target="RFC3550">RTP</xref> towards an intermediary so that the
intermediary can select which encoding to forward to other participants
in the session, and more specifically how the grouping of the streams is
defined.</t>
<t>The different encodings of a media content considered in this
document can differ in:</t>
<t><list style="hanging">
<t hangText="Bit-rate:">The encodings differ in the number of bits
spent to encode the media, and thus in the resulting quality.</t>
<t hangText="Codec:">Different media codecs are used to ensure that
different receivers that do not have a common set of decoders can
decode at least one of the versions. This can include codec
configuration options that are not compatible, like video encoder
profiles, or the capability of receiving the transport
packetization.</t>
<t hangText="Sampling:">Different sampling of media, in spatial as
well as in temporal domain, may be used to suit different rendering
capabilities or needs at the receiving endpoints, as well as a
method to achieve different bit-rates. For video streams, spatial
sampling affects image resolution and temporal sampling affects
video frame rate. For audio, spatial sampling relates to the number
of audio channels and temporal sampling affects audio bandwidth.
A difference in sampling may also result in a difference in
bit-rate.</t>
</list>There are different reasons for an application to provide a
single media source in different encodings. As soon as an application
has the need to send multiple encodings, there is a potential need for
simulcast. This need can arise even when using media codecs that have
scalability features built in. The purpose of this document is to find
the most suitable solution for the non-trivial variants of simulcast and
in order to do this, different ways of multiplexing the different
encodings are discussed. Following the presentation of the alternatives,
an analysis is performed on how different aspects like RTP mechanisms,
signaling possibilities, and network features are affected by the
alternatives. This is a specific application of the aspects discussed in
<xref target="I-D.westerlund-avtcore-multiplex-architecture">RTP
Multiplexing Architecture</xref>. The discussion results in a
conclusion, a solution, and a proposal for the standardization work
required to support simulcast.</t>
</section>
<section title="Definitions">
<t/>
<section title="Terminology">
<t>The following terms and abbreviations are used in this
document:<list style="hanging">
<t hangText="Encoding:">A particular encoding is the choice of the
media encoder (codec) that has been used to compress the media and
the fidelity of that encoding through the choice of sampling,
bit-rate and other codec configuration parameters.</t>
<t hangText="Different encodings:">An encoding is different when
some parameter that characterizes the encoding of a particular
media source is changed. Such changes can involve one or more of the
following parameters: codec, codec configuration, bit-rate,
sampling.</t>
<t hangText="Simulcast versions:">Media streams used for simulcast
that use different encodings and thus constitute different
versions of the same media source.</t>
</list></t>
</section>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
</section>
<section title="Simulcast and Applicability">
<t>This section discusses different usage scenarios for the term
simulcast and clarifies which of those this document focuses on. It also
reviews why simulcast and scalable codecs can be a useful
combination.</t>
<section title="Simulcasting to RTP Mixer">
<t>This scenario relates to a multi-party session where one or more
central nodes are used to facilitate the media transport between the
session participants. Thus, this targets the RTP Mixer Topology
defined in <xref target="RFC5117"/> (Section 3.4: Topo-Mixer). This
scenario is targeted for further discussion in this document.</t>
<t>Simulcasting different media encodings of video that differ both in
resolution and in bit-rate is highly applicable to video conferencing
scenarios. For example, an RTP mixer selects the video of the most
active speaker and sends that participant's video stream as a high
resolution stream to the other participants, and in addition also
sends a number of low resolution video streams of the other
participants, enabling the receiving user to both display the current
speaker in high quality and monitor the other participants in lower
quality/resolution/size. As the participants should not receive the
stream showing themselves, the set of streams will be unique to all
participants.</t>
<t>A number of alternatives exist to provide both high and low
resolutions from an RTP Mixer:<list style="hanging">
<t hangText="Simulcast:">The clients send one stream for the low
resolution and another for the high resolution.</t>
<t hangText="Scalable Video Coding:">The clients use a video
encoder that produces a single stream which provides the
high resolution and also enables the mixer to extract a low
resolution representation from that stream.</t>
<t hangText="Transcoding in the Mixer:">The clients send a high
resolution stream to the RTP Mixer which performs a transcoding to
a lower resolution stream.</t>
</list>The Transcoding alternative requires that the RTP mixer has
a sufficient amount of transcoding resources to produce the number of
low resolution streams required. In the worst case, all participants'
streams may need to be transcoded. If the resources are not available,
a different solution is needed. There will also normally be a quality
loss and an increase in latency associated with the transcoding
operation.</t>
<t>Scalable video encoding requires a more complex encoder compared to
non-scalable encoding. Also, if the resolution difference between the
streams is large, a scalable codec may in fact be only marginally more
bandwidth efficient than the simulcast case where the different
resolutions are sent as separate streams from the clients to the
mixer. At the same time, with scalable video encoding, the
transmission of all but the lowest resolution will consume more
bandwidth from the mixer to the other participants than with a
non-scalable encoding.</t>
<t>Simulcasting has the benefit that it is conceptually simple. It
enables the use of any media codec that the participants agree on,
allowing the RTP mixer to be codec-agnostic. With the currently
available video encoders, simulcasting may be less bit-rate efficient
in the path from the sending client to the mixer but more efficient in
the mixer to receiver path compared to Scalable Video Coding.</t>
<figure align="center" anchor="fig-mixer-forwarding"
title="RTP Mixer selecting from simulcast versions">
<artwork><![CDATA[
+------------+ +---+
+---+ | |----->| B |
| |=====>| | +---+
| A | | Mixer |
| |----->| | +---+
+---+ | |=====>| C |
+------------+ +---+
]]></artwork>
</figure>
<t>The sender A provides the mixer with both a high resolution version
"===>" and a low resolution version "--->". The mixer selects
which of its receivers should get a particular version.</t>
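<t>As a minimal sketch of the selection described in this section, and
not a normative algorithm, the following Python fragment builds a
forwarding plan for a mixer. It assumes that every sender offers exactly
two versions, labelled "high" and "low"; the function name, the labels
and the participant names are hypothetical and serve only as
illustration.</t>
<figure>
<artwork><![CDATA[
```python
def select_streams(participants, active_speaker):
    """Return, per receiver, the (sender, version) pairs to forward."""
    plan = {}
    for receiver in participants:
        forwarded = []
        for sender in participants:
            if sender == receiver:
                # A participant does not receive its own stream.
                continue
            # The active speaker is forwarded in high resolution,
            # everyone else in low resolution.
            version = "high" if sender == active_speaker else "low"
            forwarded.append((sender, version))
        plan[receiver] = forwarded
    return plan


plan = select_streams(["A", "B", "C"], active_speaker="A")
print(plan["B"])  # [('A', 'high'), ('C', 'low')]
print(plan["A"])  # [('B', 'low'), ('C', 'low')]
```
]]></artwork>
</figure>
<t>Because the sender itself is excluded, each receiver ends up with
its own set of streams, which is why the mixer needs both versions from
every potential speaker.</t>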
<section title="Simulcast Combined with Scalable Encoding">
<t>As explained in the previous section, a scalable codec is not
always more bandwidth efficient than simulcast, especially in the
path from the mixer to the receiver.</t>
<t>There are however cases where a combination of simulcast and
scalable encoding can be beneficial. By using simulcast in cases
where the scalable codec is less efficient, one can optimize the
efficiency of the complete system. A good example of this usage
would be where the video is encoded using <xref target="RFC6190">SVC
transported in RTP</xref>, where each simulcast stream has a
different resolution, and each SVC media stream uses temporal
scalability and signal to noise ratio (SNR) scalability within that
single media stream. If only resolution and temporal variations are
needed, this can be implemented using the non-scalable part of
H.264, as each simulcast version provides the different resolution,
and each media stream within a simulcast encoding has temporal
scalability through the use of non-reference frames.</t>
</section>
</section>
<section title="Multicast Transported Simulcasted Media">
<t>When using multicast, particularly <xref
target="RFC3569">Source-Specific Multicast (SSM)</xref> to distribute
RTP/RTCP packets to a large receiver population one faces some issues.
There are at least two different issues where simulcast can
potentially be useful.</t>
<section title="Diversity in Receiver Population">
<t>If there is any diversity in the receivers regarding e.g.
capability, codec support or code base, there are potentially
restrictions in what streams can be delivered to the receivers. If
using the lowest common denominator over a diverse receiver
population isn't acceptable, simulcast can be one possible solution.
By offering different stream alternatives, it is possible to let the
receivers choose the simulcast version that matches their
capabilities. By using explicit signalling for simulcast, it is not
necessary for the stream distributor to handle multiple receiver
configurations individually for a multi-media session, nor to ensure
that each receiver gets an encoding that matches their
capabilities.</t>
<t>The simulcast version granularity the receivers can select will
be on multicast group level. Thus, this use case puts a strict
requirement on supporting RTP session multiplexing. The reason is
that having a single RTP session straddle several multicast groups
makes any reporting on the received sources very difficult to
interpret. Using one RTP session per simulcast version instead
provides consistency.</t>
</section>
<section title="Bit-rate Adaptation">
<t>If the network paths from the media sender to the receivers can
support different bit-rates, there is a need to support media
streams encoded to different bit-rates. If these path differences
are of a more static nature, for example depending primarily on the
underlying link layers, using simulcast has an advantage over
scalable encoding. The reason is that the efficiency of scalable
coding will never be better than encoding to a single target rate.
When the receiver can determine its current network interface
connectivity, it can choose the simulcast version with certainty. That
choice will also remain correct until another network
interface becomes the active one. This assumes that the multicast
transmission uses dedicated resources and will thus not be congested
due to other network traffic. To support this behavior, the
signalling must indicate which media streams are
alternatives to each other, and it must also be possible to
determine the aggregate bit-rate for the selected multicast group(s)
and compare it to the available network properties.</t>
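<t>The static selection described above could be sketched as follows.
This is an illustration under stated assumptions, not a defined
procedure: each alternative is assumed to be described by the bit-rates
(in kbit/s) of the multicast groups a receiver would have to join, and
the names and figures are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def choose_version(alternatives, link_kbps):
    """Pick the alternative with the highest aggregate bit-rate
    that still fits within the known link capacity."""
    best_name, best_rate = None, -1
    for name, group_rates in alternatives.items():
        # Aggregate over all multicast groups of this alternative.
        aggregate = sum(group_rates)
        if aggregate <= link_kbps and aggregate > best_rate:
            best_name, best_rate = name, aggregate
    return best_name


# Hypothetical alternatives: an audio group plus a video group each.
alternatives = {"low": [64, 300], "high": [64, 2500]}
print(choose_version(alternatives, 1000))  # low
print(choose_version(alternatives, 3000))  # high
print(choose_version(alternatives, 100))   # None
```
]]></artwork>
</figure>
<t>The sketch returns no alternative when none fits, leaving it to the
application to decide how to handle that case.</t>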
<t>Simulcast is possible to use also in more dynamic situations
where each receiver continuously gathers reception statistics to
detect path congestion and based on that may change which version to
receive. The main issue with such usage is how to achieve a switch
from one version to another with minimal playback interruption and
without putting extra load on the network during the actual
switch. Here, scalable encoding in general has better
characteristics since scalability layers are typically
synchronized.</t>
<t>When comparing simulcast and scalable encoding, the trade-offs
are different and the down-sides occur at different places.
Simulcast will have a higher bit-rate load at a media sender and
that will also be the case for any network path shared between
receivers of multiple simulcast versions. However, for parts of the
network path where there is only a single simulcast version, the
achievable quality at a given bit-rate will be slightly higher for
simulcast. It will also be more difficult to seamlessly switch
between simulcast versions than between different scalable
encodings, as simulcast actually switches from one media stream
version to another instead of adding or removing some enhancement
layers.</t>
</section>
</section>
<section title="Simulcasting to a Consuming End-Point">
<t>This scenario is based on an RTP Transport Translator (Section 3.3:
Topo-Trn-Translator) <xref target="RFC5117"/>. The transport
translator functions as a relay and transmits all streams received
from one participant to all other participants. For example, when
simulcasting a low resolution and a high resolution video stream, the
RTP Translator would send all the streams to all clients. This clearly
increases the bit-rate transmitted on the paths to the clients
compared to the mixer case in the previous section. The only simulcast
benefit for the receiving client over a single stream scenario would
be reduced decoding complexity for the low resolution streams. A
single stream scenario which only transmits the high resolution stream
would allow the receiver to decode it and scale it down to the desired
resolution.</t>
<t>The usage of transport translator and simulcast becomes efficient
if each receiving client is allowed to control or configure the relay
with respect to which version it wants to receive. However, such usage
of RTP has some potential issues with RTCP. One example is when a
receiver has indicated to the transport translator that it does not
want to receive a particular stream, but at the same time it is
receiving and reporting on other streams from the same sender. In this
case, the sender will receive no RTCP messages about the non-forwarded
stream and therefore get the impression that the stream somehow is
lost. Thus some consideration and mechanism are needed to support such
a use case in order not to break RTCP reception reporting.</t>
<t>This scenario is considered in the continuation of the document but
with less emphasis than on the RTP mixer case.</t>
</section>
<section title="Same Encoding to Multiple Destinations">
<t>One interpretation of simulcast is when one encoding is sent to
multiple receivers. This is well supported in RTP by simply copying
all outgoing RTP and RTCP traffic to several transport destinations,
if the intention is to create a common RTP session. As long as all
participants do the same, a full mesh is constructed and everyone in
the multi-party session has a similar view of the joint RTP session.
This is analogous to an Any Source Multicast (ASM) session but without
the traffic optimization, as multiple copies of the same content are
likely to have to pass over the same link.</t>
<figure align="center" anchor="fig-full-mesh"
title="Full Mesh / Multi-unicast">
<artwork><![CDATA[
+---+ +---+
| A |<---->| B |
+---+ +---+
^ ^
\ /
\ /
v v
+---+
| C |
+---+
]]></artwork>
</figure>
<t>As this type of simulcast is analogous to ASM usage and RTP has good
support for ASM sessions, no further consideration for this scenario
is made in this document.</t>
</section>
<section title="Different Encoding to Independent Destinations">
<t>Another alternative interpretation of simulcast is multiple
destinations, where each destination gets a specifically tailored
version, but where the destinations are independent. A typical example
for this would be a streaming server distributing the same live
session to a number of receivers, adapting the quality and resolution
of the multi-media session to each receiver's capability and available
bit-rate. This case can be solved in RTP by having independent RTP
sessions between the sender and the receivers. Thus this case is not
considered further.</t>
</section>
</section>
<section title="Simulcast Alternatives">
<t>Simulcast is defined in this document as the act of sending multiple
alternative encodings of the same underlying media source. When
transmitting multiple independent streams that originate from the same
source, it could potentially be done in several different ways using
RTP. The sub-sections below describe potential ways of achieving stream
multiplexing and identification of which streams are alternative
encodings of the same source. The descriptions also cover how each
alternative interacts with multiple sources (SSRCs) that are present in
the same RTP session for reasons other than simulcast. Multiple SSRCs
may occur for various reasons: multiple participants in multipoint
topologies such as multicast, transport relays or full mesh transport
simulcasting; multiple source devices, such as multiple cameras or
microphones at one end-point; or other RTP mechanisms such as <xref
target="RFC4588">RTP Retransmission</xref>.</t>
<section title="Using the Payload Type">
<t>This alternative uses only the RTP payload type to identify the
different simulcast streams. Thus all simulcast streams would be sent
in the same RTP session using only a single SSRC per actual media
source. However, as discussed in <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
using the Multiplexing Features of RTP</xref>, using Payload Type
Multiplexing does not work and is hereby dismissed as a potential
solution.</t>
</section>
<section title="Using a Single RTP Session">
<t>This idea is based on using a unique SSRC for each alternative
encoding of an actual media source within a single RTP session. The
identification of which streams are alternatives needs
an additional mechanism, for example using <xref target="RFC5576">SSRC
grouping</xref> and a new SDES item such as SRCNAME proposed in <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname"/> with semantics
that indicate them as alternatives of a particular media source. When
there are multiple actual media sources in a session, each media
source will have to use a number of SSRCs to represent the different
alternatives it produces. For example, if all actual media sources are
similar and produce the same number of simulcast versions, there will
be n*m SSRCs in use in the RTP session, where n is the number of
actual media sources and m the number of simulcast versions they can
produce. Each SSRC can use any of the configured payload types for
this RTP session. All session level attributes and parameters that are
not source specific will apply and must function with all the
alternative encodings intended to be used.</t>
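<t>The n*m relation can be made concrete with a small calculation. The
sketch below is illustrative only; the function and key names are
hypothetical. For comparison, it also shows the case where each
simulcast version is carried in its own RTP session, in which the number
of SSRCs per session stays at n.</t>
<figure>
<artwork><![CDATA[
```python
def ssrc_counts(n_sources, m_versions):
    """Compare SSRC usage for n media sources with m versions each."""
    return {
        # Single RTP session: one SSRC per source and version.
        "single_session_ssrcs": n_sources * m_versions,
        # One RTP session per version: only the session count grows.
        "multi_session_sessions": m_versions,
        "multi_session_ssrcs_per_session": n_sources,
    }


counts = ssrc_counts(n_sources=3, m_versions=2)
print(counts["single_session_ssrcs"])             # 6
print(counts["multi_session_sessions"])           # 2
print(counts["multi_session_ssrcs_per_session"])  # 3
```
]]></artwork>
</figure>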
</section>
<section title="Using Multiple RTP sessions">
<t>Using multiple RTP sessions means that each different simulcast
version of an actual media source is transmitted in a separate RTP
session, using whatever session identifier to distinguish the
different versions. This solution needs explicit <xref
target="RFC5888">session grouping</xref> with semantics that
indicate that the sessions are alternatives. It is also important to
identify the SSRCs in the different sessions that are alternative
encodings of the same media source. This could be accomplished using
the same SSRC across the sessions, but that is not robust against SSRC
collisions and could potentially force cascading SSRC changes between
sessions. A better choice would be to use the same value for the new
SDES item proposed in <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname"/>.
Each RTP session will have its own set of configured RTP payload types
available for use with any SSRC in that session. In addition, all
other attributes for sessions or sources can be used as normal to
indicate the configuration of that particular alternative.</t>
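<t>One way to picture the binding of SSRCs across sessions is the
following sketch. It is not the mechanism defined by the SRCNAME
proposal. It only assumes that a receiver has learned, for each SSRC in
each RTP session, the SRCNAME value that the SSRC reports. The session
identifiers, SSRC values and SRCNAME strings are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
from collections import defaultdict


def bind_by_srcname(observations):
    """Group SSRCs from different RTP sessions by their SRCNAME.

    observations: iterable of (session_id, ssrc, srcname) tuples.
    Returns a dict mapping srcname to {session_id: ssrc}.
    """
    bound = defaultdict(dict)
    for session_id, ssrc, srcname in observations:
        bound[srcname][session_id] = ssrc
    return dict(bound)


observations = [
    ("session-high", 0x1111AAAA, "camera-1"),
    ("session-low", 0x2222BBBB, "camera-1"),
    ("session-high", 0x3333CCCC, "camera-2"),
    ("session-low", 0x4444DDDD, "camera-2"),
]
bound = bind_by_srcname(observations)
print(sorted(bound["camera-1"]))  # ['session-high', 'session-low']
```
]]></artwork>
</figure>
<t>Because the binding key is the SRCNAME value rather than the SSRC,
an SSRC change in one session, for example after a collision, does not
force a change in the other sessions.</t>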
</section>
</section>
<section title="Analysis">
<t>This section provides an analysis of simulcast as a specific case of
the aspects discussed in <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
using the Multiplexing Features of RTP</xref> to determine which
solution is most suitable. The sections below discuss the relevant
points for simulcast and contrast using only SSRCs with using both RTP
sessions and SSRCs.</t>
<section title="RTP/RTCP Aspects">
<t>The RTP/RTCP aspects of relevance are:<list style="hanging">
<t hangText="RTP Specification:">From a base RTP specification
point of view, there is no real difference between using a single RTP
session and using multiple RTP sessions.</t>
<t hangText="Multiple SSRC Legacy Considerations:">Dealing with
legacy handling of multiple SSRCs in one RTP session for simulcast
is a minor issue as end-points supporting simulcast will implement
the necessary support. They should also determine from signalling
whether the necessary support is present. However, for cases where
usage of simulcast is combined with legacy in the same scenario,
multiple RTP sessions will have an advantage as the number of
SSRCs in each session does not increase due to simulcast, only the
number of sessions.</t>
<t hangText="Cross Session RTCP Requests:">In the case of
simulcast, the findings in the architecture document stand and
might be relevant when switching between simulcast versions to
configure the current codec control state.</t>
<t hangText="Binding Related Sources:">Simulcast will require a
clear binding between the SSRCs carrying the different simulcast
versions. This issue will be independent of using one or multiple
RTP sessions.</t>
<t hangText="Transport Translators:">Transport translators and
simulcast are not the best match, because the core of the
functionality desired in simulcast is usually to be able to switch
between alternatives, which is not really possible with transport
translators as they do not manipulate the media streams. However,
if one uses multiple RTP sessions, a session participant can
control the simulcast version it receives in a very coarse grained
fashion by joining the right RTP session. It is still not
capable of switching individual sources within the sessions.</t>
</list></t>
<t>Regarding RTP/RTCP aspects, a multiple RTP sessions based solution
can handle legacy better, while a single RTP session solution has
some advantage if there is a need for synchronized requests across
multiple stream versions, but there are no major differences.</t>
</section>
<section title="Signalling Aspects">
<t>Signalling is one of the major issues for simulcast. In
the currently used signalling system based on <xref
target="RFC4566">SDP</xref> and <xref
target="RFC3264">Offer/Answer</xref>, the properties of media streams
are negotiated on RTP session level. This is discussed in Section
7.3.1 of the <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
using the Multiplexing Features of RTP</xref>.</t>
<t>As simulcast is all about being able to signal and negotiate what
the different simulcast versions should be, it becomes important that
the signalling supports such usage. An SSRC-only solution does not
prevent such signalling from being developed, but SSRC centric
signalling is currently almost non-existent. If a session and SSRC
based solution is used instead, it is already possible to signal and
negotiate the version properties on a session level. Negotiated media
properties will apply to all media sources sent in the same RTP
session, which is likely not an issue in most cases. For example, using
a common simulcast version definition across all media sources at one
end-point will allow an RTP mixer to choose both which media sources
and which simulcast versions of them to forward towards the other
end-points.</t>
<t>From a signalling perspective, the only rapid way forward is
a multiple RTP sessions based solution.</t>
</section>
<section title="Network Aspects">
<t>The network aspects that have any relevance for simulcast are:<list
style="hanging">
<t hangText="Quality of Service:">When using simulcast it might be
of interest to prioritize a particular simulcast version, rather
than applying equal treatment of all versions. For example, lower
bit-rate versions may be prioritized over higher bit-rate versions
to minimize congestion or packet losses in the low bit-rate
versions. Thus, there is a benefit to using a simulcast solution
that supports QoS as well as possible. By using RTP sessions over
different transport flows, a simulcast version can be prioritized
by flow based QoS mechanisms. If the application would like to
prioritize a particular media source in one simulcast version then
the two proposals are equal.</t>
<t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
incur more cost for NAT/FW traversal unless the solution for <xref
target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
multiple RTP sessions on a single lower layer transport</xref> is
used, in which case the two are basically equal, both from a
NAT/FW traversal perspective and in QoS possibilities. If flow based
QoS with any differentiation is desirable, the cost for additional
transport flows is likely necessary.</t>
<t hangText="Multicast:">To enable simulcast to be combined with
multicast, it will be required to use multiple RTP sessions.
Multicast groups need to be separate for the different versions to
allow a multicast receiver to pick the version it wants, rather
than receive all of them. In this case, the only reasonable
implementation is to use different RTP sessions for each multicast
group so that reporting and other RTCP functions operate as
intended.</t>
</list></t>
<t>Using multiple RTP sessions is clearly the better choice when
taking network aspects into account. Multiple RTP sessions are
required to support any multicast usage. In addition, they can provide
support for differentiated flow based QoS. The extra NAT/FW traversal
costs can be mitigated completely by multiplexing all RTP sessions
over a single transport.</t>
</section>
<section title="Security Aspects">
<t>The discussed security aspects have the following applicability or
considerations when it comes to simulcast:<list style="hanging">
<t hangText="Security Context Scope:">Both issues may be
applicable to simulcast usage. If differentiation enforcement is
based on encryption and keying then multiple RTP session based
simulcast has a slight benefit.</t>
<t hangText="Key-Management:">There is no significant difference
in the solution except that multiple RTP sessions may require
keying more contexts. Having more contexts is also what brings
additional freedom to make differentiation.</t>
</list></t>
<t>There is a small difference in security aspects where multiple RTP
sessions provide more freedom, but also a higher cost in the number
of contexts needing to be keyed.</t>
</section>
<section title="Summary">
<t>Defining multiple RTP sessions based simulcast appears to be the
best choice. It supports the most use cases including the multicast
based one, it has better support for flow based QoS, and the NAT/FW
costs can be mitigated. When it comes to signalling, multiple RTP
sessions based simulcast appears to require a modest set of extensions
to work, while a single RTP session seems to require a large number of
extensions to enable sets of SSRCs to negotiate different parameters
that differentiate the simulcast versions. Multiple RTP sessions also
provide greater flexibility when it comes to key-management choices
for the applications.</t>
<t>A single RTP session solution, as a complement to the multiple RTP
sessions, is not considered due to the large number of extensions
required for signalling. The needed extensions to support single RTP
session simulcast may be defined in the future.</t>
</section>
</section>
<section title="Signaling Support for Multiple RTP session based Simulcast">
<!--MW: Needs to be worked through
MW2: We might need a simulcast capable attribute and describe a two phased offer/answer case.
The reason is if the simulcast receiver invites and they don't know how many versions or in which
configuration dimensions the simulcast will occur. Then they can't populate the RTP sessions
intended to receive the simulcast in.-->
<t>To enable the usage of multiple RTP sessions based simulcast, some
minimal additional signaling support is required. That support is
discussed in this section. First of all, there is a need for a mechanism
to identify the RTP sessions carrying simulcast versions from the same
media source. Secondly, a receiver needs to be able to identify the
SSRCs in the different sessions belonging to the same media source.
Beyond the necessary signaling support for simulcast, some very useful
optimizations regarding transmission of media streams are described that
will also help RTP mixers to select which stream alternatives to deliver
to a specific client, or request a client to encode in a particular
way.</t>
<section title="Grouping Simulcast RTP Sessions">
<t>The proposal is to define new grouping semantics for the <xref
target="RFC5888">session groupings framework</xref>. There is a need
to separate the semantics of intent to send simulcast streams from the
capability to recognize and receive simulcast streams. For that reason
two new simulcast grouping semantics are defined, "SimulCast Receive"
(SCR) and "SimulCast Send" (SCS). They both act as an indicator that
session level simulcast is desired, and each group provides one set of
RTP sessions that carry simulcast versions of media sources. There may
be multiple sets of RTP sessions that carry simulcast versions.</t>
<section title="Declarative Use">
<t>When used as a declarative media description, SCR indicates the
configured end-point's required capability to recognize and receive
a specified set of RTP streams as simulcast streams. In the same
fashion, SCS requests the end-point to send a specified set of RTP
streams as simulcast streams. SCR and SCS MAY be used independently
and at the same time, and they need not specify the same RTP
sessions, or even the same number of RTP sessions, in the group.</t>
</section>
<section title="Offer/Answer Use">
<t>When used in an offer, SCS indicates the offerer's intent to send
simulcast and the particular set of RTP sessions to be used,
and SCR indicates the offerer's capability to receive simulcast
streams within the configured set of RTP sessions. SCS and SCR MAY
be used independently and at the same time, and they need not specify
the same RTP sessions, or even the same number of RTP sessions, in
the group. The answerer MUST change SCS to SCR and SCR to SCS in the
answer, provided that it has and wants to use the corresponding
(reverse) capability. An answerer that does not support the SCS or
SCR direction, or does not support the SCS or SCR grouping semantics
at all, will remove that grouping attribute altogether, according to
<xref target="RFC5888">the grouping framework</xref>. An offerer that
receives an answer indicating lack of simulcast support in one or
both directions, where the SCR and/or SCS grouping is removed, MUST NOT
use simulcast in the non-supported direction(s).</t>
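<t>As a non-normative illustration, the answerer behavior described
above can be sketched in Python as follows. The function name
answer_groups and its two capability flags are hypothetical and are not
defined by this document. The sketch assumes its input contains only
SCS and SCR group lines and, as a simplification, keeps the offered
media IDs unchanged.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document: derive the
# simulcast a=group lines of an answer from those of an offer.
def answer_groups(offer_group_lines, can_send, can_receive):
    """Swap SCS/SCR for supported directions; drop the rest."""
    answer = []
    for line in offer_group_lines:
        semantics, *mids = line[len("a=group:"):].split()
        if semantics == "SCS" and can_receive:
            # Offerer intends to send, so the answerer declares receive.
            answer.append("a=group:SCR " + " ".join(mids))
        elif semantics == "SCR" and can_send:
            # Offerer can receive, so the answerer declares send intent.
            answer.append("a=group:SCS " + " ".join(mids))
        # Otherwise the group is left out of the answer altogether.
    return answer
```
]]></artwork>
</figure>
<t>An answerer that supports neither direction returns an empty list
here, which corresponds to removing the grouping attributes from the
answer.</t>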
</section>
</section>
<section title="Media Stream Requirements">
<t>When doing simulcast, the media streams that are alternatives need
certain considerations to ensure that switching between alternative
streams is as issue-free as possible. The following considerations
apply:<list style="hanging">
<t hangText="Same Clock Base:">To enable correct alignment of
media packets on the source time-line, all alternative streams
(SSRCs) MUST use the same underlying clock to relate their RTP
timestamp values with the network time protocol (NTP) formatted
sender time in the RTCP Sender Reports.</t>
<t hangText=""/>
</list></t>
<t/>
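<t>The benefit of a common clock base can be illustrated with a
minimal, non-normative Python sketch. It maps an RTP timestamp to
sender wall-clock time using the RTP/NTP timestamp pair from the most
recent RTCP Sender Report of the same SSRC. The function name and
parameter names are hypothetical, and the NTP time is assumed to be
already converted to seconds.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document.
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wall-clock seconds via an SR."""
    # Signed 32-bit difference, so RTP timestamp wrap-around is handled.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate
```
]]></artwork>
</figure>
<t>If two alternative streams use the same underlying clock, packets
captured at the same instant map to the same wall-clock value even
though their RTP timestamp offsets differ, which lets a receiver or
mixer align the streams when switching.</t>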
</section>
<section title="Relating Alternative Encodings">
<t>To ensure that simulcast streams can be related correctly, the
usage of the <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname">SDES SRCNAME</xref>
item with the same value across all simulcast versions belonging to the
same media source is REQUIRED.</t>
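<t>A receiver could relate the alternative encodings roughly as in the
following non-normative Python sketch. The function name and the input
tuple layout are hypothetical. Keying on the (CNAME, SRCNAME) pair,
rather than on SRCNAME alone, is an assumption made here to keep
different end-points apart.</t>
<figure>
<artwork><![CDATA[
```python
from collections import defaultdict

# Hypothetical helper, not defined by this document.
def group_by_source(ssrc_info):
    """ssrc_info: iterable of (mid, ssrc, cname, srcname) tuples."""
    sources = defaultdict(list)
    for mid, ssrc, cname, srcname in ssrc_info:
        # All SSRCs sharing CNAME and SRCNAME are treated as simulcast
        # versions of one media source.
        sources[(cname, srcname)].append((mid, ssrc))
    return dict(sources)
```
]]></artwork>
</figure>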
</section>
<section title="Multiple Stream handling">
<t>The grouping semantics SCR and SCS SHOULD be combined with the SDP
attributes <xref
target="I-D.westerlund-avtcore-max-ssrc">"a=max-send-ssrc" and
"a=max-recv-ssrc"</xref> to indicate the number of simultaneous
streams of each encoding that may be sent or that can be handled in
the receive direction.</t>
</section>
</section>
<section title="Simulcast Signalling Examples">
<t>This example covers a client to video conference service scenario
using a centralized media topology with an RTP mixer. Alice and Bob
call into a conference server for a conference call with audio and
video sent to the RTP mixer; both clients are capable of sending a few
video simulcast versions. The conference server also dials out to Fred,
who has a legacy client, resulting in fallback behavior. When dialing
out to Joe, more functionality is enabled as Joe has a client similar to
Alice's.</t>
<figure align="center" anchor="fig-mixer-four-party"
title="Four-party Mixer-based Conference">
<artwork><![CDATA[
+---+ +-----------+ +---+
| A |<---->| |<---->| B |
+---+ | | +---+
| Mixer |
+---+ | | +---+
| F |<---->| |<---->| J |
+---+ +-----------+ +---+]]></artwork>
</figure>
<t>Example of Media plane for RTP mixer based multi-party conference
with 4 participants.</t>
<section title="Alice: Desktop Client">
<t>Alice is calling in to the mixer with an audiovisual single stream
desktop client that, compared to a legacy client, only adds the
capability to send simulcast and announce SRCNAME. The offer from
Alice looks like:</t>
<figure anchor="fig-alice-offer"
title="Alice Offer for a Simulcast Conference">
<artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:825
a=group:SCS 2 3
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=ssrc:521923924 cname:alice@foo.example.com
a=ssrc:521923924 srcname:a
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=ssrc:192392452 cname:alice@foo.example.com
a=ssrc:192392452 srcname:v
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=ssrc:239245219 cname:alice@foo.example.com
a=ssrc:239245219 srcname:v
a=mid:3
a=sendonly
]]></artwork>
</figure>
<t>As can be seen from the SDP, Alice has a simulcast-enabled client
and offers two different simulcast versions sent from her single
camera, indicated by the SCS grouping tag and the two media IDs (2 and
3). The first video version with media ID 2 prefers 360p resolution
(signaled via imageattr) and the second video version with media ID 3
prefers 180p resolution. The first video media line also acts as the
single receive video (making media line sendrecv), while the second
video media line is only related to simulcast transmission and is thus
offered sendonly. The two simulcast encodings are bound together using
the SRCNAME SDES item with the shared identifier "v", while the audio
stream carries its own identifier "a"; only a single level is required
in this case. The end-point CNAME is also declared, as all sources
belong to the same synchronization context.</t>
</section>
<section title="Bob: Telepresence Room">
<t>Bob is calling in to the mixer with a telepresence client that is
capable of sending multiple streams, of receiving and locally
rendering multiple streams, and of sending simulcast
versions to the mixer. More specifically, in this example the client
has three cameras, each being sent in three different simulcast
versions. In the receive direction, up to two main screens can show
video from a (multi-stream) conference participant that is the active
speaker, and further screen real estate can be used to show videos from
up to 16 other conference listeners. Each camera has a corresponding
(stereo) microphone that can also be negotiated down to mono by
removing the stereo payload type from the answer. The capability to
send and receive multiple SSRCs in the same RTP session is explicitly
announced through use of <xref
target="I-D.westerlund-avtcore-max-ssrc">RTP multi-stream
signalling</xref>.</t>
<figure anchor="fig-bob-offer"
title="Bob Offer for a Multi-stream and Simulcast Telepresence Conference">
<artwork><![CDATA[v=0
o=bob 129384719 9834727 IN IP4 192.0.2.35
s=Simulcast Enabled Multi Stream Telepresence Client
t=0 0
c=IN IP4 192.0.2.35
b=AS:6035
a=group:SCS 2 3 4
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:435
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=max-send-ssrc:* 3
a=max-recv-ssrc:* 3
a=ssrc:724847850 cname:bob@foo.example.com
a=ssrc:724847850 srcname:a1
a=ssrc:2847529901 cname:bob@foo.example.com
a=ssrc:2847529901 srcname:a2
a=ssrc:57289389 cname:bob@foo.example.com
a=ssrc:57289389 srcname:a3
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:4500
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:* send [x=1280,y=720] recv [x=1280,y=720]
[x=640,y=360] [x=320,y=180]
a=max-send-ssrc:96 3
a=max-recv-ssrc:96 2
a=ssrc:75384768 cname:bob@foo.example.com
a=ssrc:75384768 srcname:v1
a=ssrc:2934825991 cname:bob@foo.example.com
a=ssrc:2934825991 srcname:v2
a=ssrc:3582594238 cname:bob@foo.example.com
a=ssrc:3582594238 srcname:v3
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:1560
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:* send [x=640,y=360]
a=max-send-ssrc:96 3
a=ssrc:1371234978 cname:bob@foo.example.com
a=ssrc:1371234978 srcname:v1
a=ssrc:897234694 cname:bob@foo.example.com
a=ssrc:897234694 srcname:v2
a=ssrc:239263879 cname:bob@foo.example.com
a=ssrc:239263879 srcname:v3
a=mid:3
a=sendonly
m=video 49500 RTP/AVP 96
b=AS:420
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=max-send-ssrc:96 3
a=ssrc:485723998 cname:bob@foo.example.com
a=ssrc:485723998 srcname:v1
a=ssrc:2345798212 cname:bob@foo.example.com
a=ssrc:2345798212 srcname:v2
a=ssrc:1295729848 cname:bob@foo.example.com
a=ssrc:1295729848 srcname:v3
a=mid:4
a=sendonly
m=video 49600 RTP/AVP 96 97 98
b=AS:2600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:96 recv [x=1280,y=720]
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 recv [x=640,y=360]
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42c00d
a=imageattr:98 recv [x=320,y=180]
a=max-recv-ssrc:96 1
a=max-recv-ssrc:97 4
a=max-recv-ssrc:98 16
a=max-recv-ssrc:* 16
a=mid:5
a=recvonly
a=content:alt
]]></artwork>
</figure>
<t>Bob has a three-camera, three-screen, simulcast-enabled client with
even higher performance than Alice's and can additionally support 720p
video, as well as multiple receive streams of various resolutions. The
client implementor has thus decided to offer three simulcast streams
for each camera, indicated by the SCS grouping tag and the three media
IDs (2, 3, and 4) in the SDP.</t>
<t>The first video media line with media ID 2 indicates the ability to
send video from three simultaneous video sources (cameras) through the
max-send-ssrc attribute with value 3. This media line is also marked
as the main video by using the content attribute from <xref
target="RFC4796"/>. The receive direction also declares the ability to
handle multiple video sources, two in this example, through the
max-recv-ssrc attribute. The interpretation of content:main for those
two streams in the receive direction is that the client expects and
can present (in prime position) at most two main (active speaker)
video streams from another multi-camera client.</t>
<t>The second and third video media lines with media ID 3 and 4 are
the sendonly simulcast streams. Through the grouping, they can
implicitly be interpreted as also being content:main for the send
direction, but are not marked as such since multiple media blocks with
content:main could be confusing for a legacy client.</t>
<t>The fourth video media line with media ID 5 is recvonly and is
marked with content:alt. That media line should, as was intended for
that content attribute value, receive alternative content to the main
speaker, such as "audience". In a multi-party conference, that could
for example be the next-to-most-active and/or non-active speakers. The
SDP describes that those streams can be presented in a set of
different resolutions, indicated through the different payload types.
The maximum number of streams per payload type is indicated through
the max-recv-ssrc attribute. In this example, at most one stream can
have payload type 96, preferably 720p, as indicated by the related
imageattr line. Similarly, at most 4 streams can have payload type 97,
preferably using 360p resolution, and at most 16 streams can have
payload type 98, preferably of 180p resolution. In any case, there
must never be more than 16 simultaneous streams in total across all
payload types, but combinations of payload types may occur, for example
two streams using payload type 97 and 8 streams using payload type 98.</t>
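<t>The per-payload-type and total limits described above can be checked
as in the following non-normative Python sketch. The function name and
the dictionary representation of the max-recv-ssrc values are
hypothetical. Treating a payload type without its own entry as limited
only by the "*" value is an assumption of this sketch.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document.
def fits_receive_limits(streams_per_pt, limits):
    """limits maps a payload type, or "*", to its max-recv-ssrc value."""
    for pt, count in streams_per_pt.items():
        # Per-payload-type limit, if one was signalled.
        if pt in limits and count > limits[pt]:
            return False
    # The "*" entry bounds the total number of simultaneous streams.
    total = sum(streams_per_pt.values())
    return "*" not in limits or total <= limits["*"]
```
]]></artwork>
</figure>
<t>With the limits from Bob's offer, {"96": 1, "97": 4, "98": 16,
"*": 16}, the combination of two streams of payload type 97 and eight
of payload type 98 is accepted, while two streams of payload type 96
are not.</t>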
<t>The answer from a simulcast-enabled RTP mixer to this last SDP
could look like:</t>
<figure anchor="fig-bob-answer"
title="Server Answer for Bob Multi-stream and Simulcast Telepresence Conference">
<artwork><![CDATA[
v=0
o=server 238947290 239573929 IN IP4 192.0.2.2
s=Multi stream and Simulcast Telepresence Bob Answer
t=0 0
c=IN IP4 192.0.2.43
b=AS:7065
a=group:SCR 2 3 4
m=audio 49200 RTP/AVP 96
b=AS:435
a=rtpmap:96 G719/48000/2
a=max-send-ssrc:96 3
a=max-recv-ssrc:96 3
a=ssrc:4111848278 cname:server@conf1.example.com
a=ssrc:4111848278 srcname:r1
a=ssrc:835978294 cname:server@conf1.example.com
a=ssrc:835978294 srcname:r2
a=ssrc:2938491278 cname:server@conf1.example.com
a=ssrc:2938491278 srcname:r3
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:4650
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:* send [x=1280,y=720] [x=640,y=360] [x=320,y=180]
recv [x=1280,y=720]
a=max-recv-ssrc:96 3
a=max-send-ssrc:96 2
a=ssrc:2938746293 cname:server@conf1.example.com
a=ssrc:2938746293 srcname:t1
a=ssrc:1207102398 cname:server@conf1.example.com
a=ssrc:1207102398 srcname:t2
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:1560
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:* recv [x=640,y=360]
a=max-recv-ssrc:96 3
a=mid:3
a=recvonly
m=video 49500 RTP/AVP 96
b=AS:420
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 recv [x=320,y=180]
a=max-recv-ssrc:96 3
a=mid:4
a=recvonly
m=video 49600 RTP/AVP 96 97 98
b=AS:2600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:96 send [x=1280,y=720]
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360]
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42c00d
a=imageattr:98 send [x=320,y=180]
a=max-send-ssrc:96 1
a=max-send-ssrc:97 4
a=max-send-ssrc:98 8
a=max-send-ssrc:* 8
a=ssrc:2981523948 cname:server@conf1.example.com
a=ssrc:2938237 cname:server@conf1.example.com
a=ssrc:1230495879 cname:server@conf1.example.com
a=ssrc:74835983 cname:server@conf1.example.com
a=ssrc:3928594835 cname:server@conf1.example.com
a=ssrc:948753 cname:server@conf1.example.com
a=ssrc:1293456934 cname:server@conf1.example.com
a=ssrc:4134923746 cname:server@conf1.example.com
a=mid:5
a=sendonly
a=content:alt
]]></artwork>
</figure>
<t>In this SDP answer, the grouping tag is changed to SCR, confirming
that the sent simulcast streams will be received. The directionality
of the streams themselves, as well as that of the multi-stream and
bandwidth attributes, is changed accordingly. The number of allowed
streams in the content:alt video session has been reduced from 16 to
8 in the answer.</t>
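<t>The reversal of direction-dependent attributes can be illustrated
with the following non-normative Python sketch. The function name is
hypothetical. The sketch only mirrors an attribute line; an answerer
remains free to lower the mirrored values, as the mixer does for the
content:alt session.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document.
_DIRECTION = {"a=sendonly": "a=recvonly", "a=recvonly": "a=sendonly"}

def mirror_attribute(line):
    """Return the answer-side counterpart of one offer attribute."""
    if line in _DIRECTION:
        return _DIRECTION[line]
    if line.startswith("a=max-send-ssrc:"):
        return "a=max-recv-ssrc:" + line[len("a=max-send-ssrc:"):]
    if line.startswith("a=max-recv-ssrc:"):
        return "a=max-send-ssrc:" + line[len("a=max-recv-ssrc:"):]
    return line  # sendrecv and unrelated attributes stay unchanged
```
]]></artwork>
</figure>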
</section>
<section title="Fred: Dial-out to Legacy Client">
<t>Fred has a simple legacy client that knows nothing of the new
signaling means discussed in this document. In this example, the
multi-stream and simulcast aware RTP mixer is calling out to Fred.
Even though it is never actually sent, the following would be Fred's
offer SDP, had he called in. It is included here to improve the
reader's understanding of Fred's response to the conference SDP.</t>
<figure anchor="fig-fred-offer"
title="Legacy Client Hypothetical Offer">
<artwork><![CDATA[
v=0
o=fred 82342187 237429834 IN IP4 192.0.2.213
s=Legacy Client
t=0 0
c=IN IP4 192.0.2.213
m=audio 50132 RTP/AVP 9 8
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
m=video 50134 RTP/AVP 96 97
b=AS:405
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00c
a=rtpmap:97 H263-2000/90000
a=fmtp:97 profile=0;level=30
]]></artwork>
</figure>
<t>Fred would offer a single mono audio and a single video, each with
a couple of different codec alternatives.</t>
<t>The same conference server as in the previous example is calling
out to Fred, offering the full set of multi-stream and simulcast
features based on what the server itself can support.</t>
<figure anchor="fig-fred-dial-out"
title="Server Dial-out Offer with Multi-stream and Simulcast">
<artwork><![CDATA[
v=0
o=server 323439283 2384192332 IN IP4 192.0.2.2
s=Multi stream and Simulcast Dial-out Offer
t=0 0
c=IN IP4 192.0.2.43
b=AS:7065
a=group:SCR 2 3 4
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:435
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=max-send-ssrc:* 4
a=max-recv-ssrc:* 3
a=ssrc:3293472833 cname:server@conf1.example.com
a=ssrc:3293472833 srcname:q9
a=ssrc:1734728348 cname:server@conf1.example.com
a=ssrc:1734728348 srcname:Gr
a=ssrc:1054453769 cname:server@conf1.example.com
a=ssrc:1054453769 srcname:SO
a=ssrc:3923447729 cname:server@conf1.example.com
a=ssrc:3923447729 srcname:AJ
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:4650
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:* send [x=1280,y=720] [x=640,y=360] [x=320,y=180]
recv [x=1280,y=720]
a=max-recv-ssrc:96 3
a=max-send-ssrc:96 3
a=ssrc:78456398 cname:server@conf1.example.com
a=ssrc:78456398 srcname:bj
a=ssrc:3284726348 cname:server@conf1.example.com
a=ssrc:3284726348 srcname:ON
a=ssrc:2394871293 cname:server@conf1.example.com
a=ssrc:2394871293 srcname:ya
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:1560
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:* recv [x=640,y=360]
a=max-recv-ssrc:96 3
a=mid:3
a=recvonly
m=video 49500 RTP/AVP 96
b=AS:420
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 recv [x=320,y=180]
a=max-recv-ssrc:96 3
a=mid:4
a=recvonly
m=video 49600 RTP/AVP 96 97 98
b=AS:2600
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01f
a=imageattr:96 send [x=1280,y=720]
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360]
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42c00d
a=imageattr:98 send [x=320,y=180]
a=max-send-ssrc:96 1
a=max-send-ssrc:97 4
a=max-send-ssrc:98 8
a=max-send-ssrc:* 8
a=ssrc:2342872394 cname:server@conf1.example.com
a=ssrc:1283741823 cname:server@conf1.example.com
a=ssrc:3294823947 cname:server@conf1.example.com
a=ssrc:1020408838 cname:server@conf1.example.com
a=ssrc:1999343791 cname:server@conf1.example.com
a=ssrc:2934192349 cname:server@conf1.example.com
a=ssrc:2234347728 cname:server@conf1.example.com
a=ssrc:3224283479 cname:server@conf1.example.com
a=mid:5
a=sendonly
a=content:alt
]]></artwork>
</figure>
<t/>
<t>The answer from Fred to this offer would look like:</t>
<figure anchor="fig-fred-answer"
title="Legacy Client Answer to Server Dial-out">
<artwork><![CDATA[
v=0
o=fred 9842793823 239482793 IN IP4 192.0.2.213
s=Legacy Client Answer to Server Dial-out
t=0 0
c=IN IP4 192.0.2.213
m=audio 50132 RTP/AVP 9
b=AS:80
a=rtpmap:9 G722/8000
m=video 50134 RTP/AVP 96
b=AS:405
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00c
m=video 0 RTP/AVP 96
m=video 0 RTP/AVP 96
m=video 0 RTP/AVP 96
]]></artwork>
</figure>
<t>As can be seen from the hypothetical offer, Fred does not
understand any of the multi-stream or simulcast attributes, nor does
he understand the grouping framework. Thus, all those lines are
removed from the answer SDP and all video media blocks except
for the first are rejected. The media bandwidths are adjusted down to
what Fred actually accepts to receive.</t>
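<t>The fallback seen in Fred's answer can be detected by the offerer as
in the following non-normative Python sketch. The function name and the
"send"/"receive" labels are hypothetical. The sketch only looks for
session-level SCR and SCS group lines in the answer.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document.
def simulcast_directions(answer_sdp):
    """Return the simulcast directions the offerer may use."""
    allowed = set()
    for line in answer_sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:SCR "):
            allowed.add("send")      # answerer receives our simulcast
        elif line.startswith("a=group:SCS "):
            allowed.add("receive")   # answerer sends simulcast to us
    return allowed
```
]]></artwork>
</figure>
<t>For an answer like Fred's, which carries no group attribute, the
result is the empty set and the offerer does not use simulcast in
either direction.</t>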
</section>
<section title="Joe: Dial-out to Desktop Client">
<t>This example is almost identical to the one above, with the
difference that the answering end-point has some limited simulcast and
multi-stream capability. As above, this is the offer SDP that Joe
would have used, should he have called in.</t>
<figure anchor="fig-joe-offer"
title="Desktop Client Hypothetical Offer">
<artwork><![CDATA[
v=0
o=joe 82342187 237429834 IN IP4 192.0.2.117
s=Simulcast and Multistream enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.117
b=AS:985
a=group:SCS 2 3
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
a=ssrc:1223883729 cname:joe@foo.example.com
a=ssrc:1223883729 srcname:jV
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:96 send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=ssrc:3842394823 cname:joe@foo.example.com
a=ssrc:3842394823 srcname:BD
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=ssrc:1214232284 cname:joe@foo.example.com
a=ssrc:1214232284 srcname:BD
a=mid:3
a=sendonly
m=video 49500 RTP/AVP 96
b=AS:320
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00c
a=imageattr:96 recv [x=320,y=180]
a=max-recv-ssrc:* 2
a=mid:4
a=recvonly
a=content:alt
]]></artwork>
</figure>
<t>Joe would send two versions of simulcast, 360p and 180p, from a
single camera and can receive three sources of multi-stream, one 360p
and two 180p streams.</t>
<t>Again, the same conference server is calling out to Joe and the
offer SDP from the server would be almost identical to the one in the
previous example. It is therefore not included here. The response from
Joe would look like:</t>
<figure anchor="fig-joe-answer"
title="Desktop Client Answer to Server Dial-out">
<artwork><![CDATA[
v=0
o=joe 239482639 4702341992 IN IP4 192.0.2.117
s=Answer from Desktop Client to Server Dial-out
t=0 0
c=IN IP4 192.0.2.117
b=AS:985
a=group:SCS 2 4
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=ssrc:1223883729 cname:joe@foo.example.com
a=ssrc:1223883729 srcname:iJ
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=imageattr:96 send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=ssrc:3842394823 cname:joe@foo.example.com
a=ssrc:3842394823 srcname:YD
a=mid:2
a=content:main
m=video 0 RTP/AVP 96
a=mid:3
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=ssrc:1214232284 cname:joe@foo.example.com
a=ssrc:1214232284 srcname:YD
a=mid:4
a=sendonly
m=video 49500 RTP/AVP 96
b=AS:320
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00c
a=imageattr:96 recv [x=320,y=180]
a=max-recv-ssrc:* 2
a=mid:5
a=recvonly
a=content:alt
]]></artwork>
</figure>
<t>Since the RTP mixer supports all of the features that Joe does and
more, the SDP does not differ much from what it would have been in an
offer. It can be noted that, as stated in <xref target="RFC5888"/>, all
media lines need mid attributes, even the rejected ones, which is why
mid:3 is present even though the medium-quality simulcast version
offered by the mixer is rejected by Joe.</t>
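<t>Rejected media lines and their mid values can be extracted from an
answer as in the following non-normative Python sketch. The function
name is hypothetical. Only "m=" and "a=mid:" lines are inspected.</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical helper, not defined by this document.
def media_status(sdp):
    """Return (mid, port) per m= section, in order; mid may be None."""
    sections = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # The port may carry a "/count" suffix; keep the port only.
            port = int(line.split()[1].split("/")[0])
            sections.append([None, port])
        elif line.startswith("a=mid:") and sections:
            sections[-1][0] = line[len("a=mid:"):]
    return [tuple(section) for section in sections]
```
]]></artwork>
</figure>
<t>A media line with port 0 is rejected, so for Joe's answer the entry
for mid 3 carries port 0 while the other entries carry non-zero
ports.</t>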
</section>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document requests that two new SDP grouping semantics, SCS and
SCR, be registered.</t>
<t>Formal registrations to be written.</t>
<t/>
</section>
<section anchor="Security" title="Security Considerations">
<t>The Simulcast grouping semantics are vulnerable to attacks in the
signalling.</t>
<t>A false grouping of non-simulcast streams as simulcast would risk
that some streams are incorrectly ignored by receivers that know
simulcast and that are uninterested in the assumed simulcast
streams.</t>
<t>A hostile removal of simulcast grouping will prevent streams from
being interpreted as simulcast, which obviously prevents use of the
simulcast functionality. It will also risk that intended simulcast
streams are instead presented as separate, independent streams to a
receiver.</t>
<t>Neither of the above is likely to have any major consequences, and
both can be mitigated by signaling that is at least integrity protected
and source authenticated, preventing an attacker from changing it.</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t/>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.5576'?>
<?rfc include='reference.RFC.5888'?>
<?rfc include='reference.I-D.westerlund-avtext-rtcp-sdes-srcname'?>
<?rfc include='reference.I-D.westerlund-avtcore-max-ssrc'?>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.3264'?>
<?rfc include='reference.RFC.3569'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.4796'?>
<?rfc include='reference.RFC.5117'?>
<?rfc include='reference.RFC.6190'?>
<?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>
<?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>
</references>
</back>
</rfc>