<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-rtp-simulcast-03"
ipr="trust200902" submissionType="IETF">
<front>
<title abbrev="RTP Simulcast">Using Simulcast in RTP Sessions</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<author fullname="Suhas Nandakumar" initials="S." surname="Nandakumar">
<organization>Cisco</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<phone/>
<facsimile/>
<email>snandaku@cisco.com</email>
<uri/>
</address>
</author>
<date day="21" month="October" year="2013"/>
<abstract>
<t>In some application scenarios it may be desirable to send multiple
differently encoded versions of the same Media Source in independent
Source Packet Streams. This is called Simulcast. This document discusses
the best way of accomplishing Simulcast in RTP and how to signal it in
SDP. A solution is defined by making three extensions to SDP, and using
RTP/RTCP identification methods to relate RTP Source Packet Streams. The
first SDP extension consists of two new session level SDP attributes
that express capability to send or receive Simulcast Source Packet
Streams, respectively. The second SDP extension introduces an SDP media
level attribute that groups and identifies a selected set of media level
parameters for a specific direction, called a media configuration. The
third SDP extension describes how to group such media configurations on
SDP session or media level for Simulcast purposes.</t>
</abstract>
</front>
<middle>
<section anchor="sec-intro" title="Introduction">
<t>Most of today's multiparty video conference solutions make use of
centralized servers to reduce the bandwidth and CPU consumption in the
endpoints. Those servers receive Source Packet Streams from each
participant and send some suitable set of possibly modified streams to
the rest of the participants, which usually have heterogeneous
capabilities (screen size, CPU, bandwidth, codec, etc.). One of the
biggest issues is how to perform stream adaptation to different
participants' constraints with the minimum possible impact on video
quality and server performance.</t>
<t>Simulcast is the act of simultaneously sending multiple different
versions of the same media content, e.g. the same video source encoded
with different video encoder types or image resolutions. This can be
done in several ways and for different purposes. This document focuses
on the case where it is desirable to provide a Media Source as multiple
Source Packet Streams over <xref target="RFC3550">RTP</xref> towards an
intermediary so that the intermediary can provide the wanted
functionality by selecting which Source Packet Stream to forward to
other participants in the session, and more specifically how the
identification and grouping of the involved Source Packet Streams are
done. From an RTP perspective, Simulcast is a specific application of
the aspects discussed in <xref
target="I-D.ietf-avtcore-multiplex-guidelines">RTP Multiplexing
Guidelines</xref>.</t>
<t>The purpose of this document is to describe a few scenarios where the
use of Simulcast is motivated, and to propose a suitable solution for
signaling and performing RTP Simulcast.</t>
</section>
<section anchor="sec-definitions" title="Definitions">
<t/>
<section title="Terminology">
<t>This document makes use of the terminology defined in <xref
target="I-D.lennox-raiarea-rtp-grouping-taxonomy">RTP Taxonomy</xref>.
In addition, the following terms are used:<list style="hanging">
<t hangText="Media Configuration:">A specific set of parameter
values applied on the encoding and packetization process that
creates a specific Source Packet Stream. In SDP, the applicable
parameter values are described by the joint set of "rtpmap"
parameters, "fmtp" parameters, and the <xref
target="sec-media-config">"config-id"</xref> parameters, including
extensions.</t>
</list></t>
</section>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
</section>
<section anchor="sec-use-cases" title="Use Cases">
<t>Many use cases of Simulcast as described in this document relate to a
multi-party Communication Session where one or more central nodes are
used to adapt the view of the Communication Session towards individual
Participants, and facilitate the Media Transport between Participants.
Thus, these cases target the RTP Mixer topology defined in <xref
target="RFC5117"/> (Section 3.4: Topo-Mixer), further elaborated and
extended with other topologies in <xref
target="I-D.ietf-avtcore-rtp-topologies-update"/> (Section 3.6 to
3.9).</t>
<t>There are two principal approaches for an RTP Mixer to provide this
adapted view of the Communication Session to each receiving
Participant:<list style="symbols">
<t>Transcoding (decoding and re-encoding) received Source Packet
Streams with characteristics adapted to each receiving Participant.
This often includes mixing or composition of Media Sources from
multiple Participants into a mixed Media Source originated by the
RTP Mixer. The main advantage of this approach is that it achieves
close to optimal adaptation to individual receiving Participants.
The main disadvantages are that it can be computationally very
expensive for the RTP Mixer and that it typically also degrades media
Quality of Experience (QoE), for example by increasing end-to-end
delay, for the receiving Participants.</t>
<t>Switching a subset of all received Source Packet Streams or
sub-streams to each receiving Participant, where the used subset is
typically specific to each receiving Participant. The main
advantages of this approach are that it is computationally cheap to
the RTP Mixer and it has very limited impact on media QoE. The main
disadvantage is that it can be difficult to combine a subset of
received Source Packet Streams into a perfect fit to the resource
situation of a receiving Participant.</t>
</list></t>
<t>The use of Simulcast relates to the latter approach, where it is
more important to reduce the load on the RTP Mixer and/or minimize QoE
impact than to achieve an optimal adaptation of resource usage.</t>
<t>A multicast/broadcast case where the receivers themselves select the
most appropriate simulcast version and tune in to the right transport to
receive that version is also <xref
target="sec-multicast">considered</xref>. This enables large receiver
populations that are heterogeneous in terms of capabilities and the
bandwidth of the network paths used.</t>
<t>In this section, an "RTP switch" is used as a common short term for
the terms "switching RTP mixer", "source projecting middlebox", and
"video switching MCU" as discussed in <xref
target="I-D.ietf-avtcore-rtp-topologies-update"/>.</t>
<section anchor="sec-diverse-receivers"
title="Reaching a Diverse Set of Receivers">
<t>The Media Sources provided by a sending Participant potentially
need to reach several receiving Participants that differ in terms of
available resources. A discussion on that topic is included in <xref
target="appendix-a"/>. The receiver resources that typically differ
include, but are not limited to:<list style="hanging">
<t hangText="Codec:">This includes codec type (such as SDP MIME
type) and can include codec configuration options (e.g. SDP fmtp
parameters). Two codec resources that differ only in codec
configuration are considered "different" if they are not
"compatible", for example if they differ in video codec profile or
in the transport packetization configuration.</t>
<t hangText="Sampling:">This relates to how the Media Source is
sampled, in spatial as well as in temporal domain. For video
streams, spatial sampling affects image resolution and temporal
sampling affects video frame rate. For audio, spatial sampling
relates to the number of audio channels and temporal sampling
affects audio bandwidth. This may be used to suit different
rendering capabilities or needs at the receiving endpoints, as
well as a method to achieve different transport capabilities,
bitrates and eventually QoE by controlling the amount of source
data.</t>
<t hangText="Bitrate:">This relates to the number of bits spent
per second to transmit the Media Source as a Source Packet
Stream, which typically also affects the Quality of Experience
(QoE) for the receiving user.</t>
</list>Letting the sending Participant create a Simulcast of a few
differently configured Source Packet Streams per Media Source can be a
good trade-off when using an RTP switch as middlebox, instead of
sending a single Source Packet Stream and using an RTP Mixer to create
individual transcodings to each receiving Participant.</t>
<t>This requires that the receiving Participants can be categorized in
terms of available resources and that the sending Participant can
choose a matching configuration for a single Source Packet Stream per
category and Media Source.</t>
<t>For example, assume for simplicity a set of receiving Participants
that differ only in that some have support to receive Codec A, and the
others have support to receive Codec B. Further assume that the
sending participant can send both Codec A and B. It can then reach all
receivers by creating two Simulcasted Source Packet Streams from each
Media Source; one for Codec A and one for Codec B.</t>
<t>In another simple example, a set of receiving Participants differ
only in screen resolution; some are able to display video with at most
360p resolution and some support 720p resolution. A sending
Participant can then reach all receivers by creating a Simulcast of
Source Packet Streams with 360p and 720p resolution for each sent
video Media Source.</t>
<t>In more elaborate cases, the receiving Participants differ both in
available Sampling and Bitrate, and maybe also Codec, and it is up to
the RTP switch to find a good trade-off in which Simulcasted stream to
choose for each intended receiver. It is also the responsibility of
the RTP switch to negotiate a good fit of Simulcast streams with the
sending Participant.</t>
<t>The maximum number of Simulcasted Source Packet Streams that can be
sent is mainly limited by the amount of processing and uplink network
resources available to the sending Participant.</t>
</section>
<section anchor="sec-application-specific"
title="Application Specific Media Source Handling">
<t>The application logic that controls the Communication Session may
include special handling of some Media Sources. It is for example
commonly the case that the media from a sending Participant is not
sent back to itself.</t>
<t>It is also common that a currently active speaker Participant is
shown in larger size or higher quality than other Participants (the
Sampling or Bitrate aspects of <xref
target="sec-diverse-receivers"/>). Not sending the active speaker
media back to itself means that some other Participant's media
instead receives special handling towards the active speaker,
typically that of the previous active speaker. This way, the media
of the previously active speaker is needed both in larger size (to
the current active speaker) and in small size (to the rest of the
Participants), which can be solved with a Simulcast from the
previously active speaker to the RTP switch.</t>
</section>
<section anchor="sec-multicast"
title="Receiver Adaptation in Multicast/Broadcast">
<t>When using Broadcast or Multicast technology to distribute
real-time media streams to large populations of receivers there can
still be significant heterogeneity among the receiver population. This
can depend on several factors:<list style="hanging">
<t hangText="Network Bandwidth:">The network paths to individual
receivers will vary in bandwidth, which puts different limits on
the bit-rates that can be received.</t>
<t hangText="Endpoint Capabilities:">The endpoint's hardware and
software can have varying capabilities in relation to screen
resolution, decoding capabilities, and supported media codecs.</t>
</list></t>
<t>To handle these variations, a transmitter of real-time media may
want to apply Simulcast to its Source Packet Streams and provide a set
of media configurations, enabling the receivers to select the best fit
from these sets themselves. The endpoint capabilities will usually
result in a single initial choice. However, the network bandwidth can
vary over time, which requires a client to continuously monitor its
reception to determine if the received media streams still fit within
the available bandwidth. If not, another Simulcast media configuration
containing a thinner set of Source Packet Streams will have to be
chosen.</t>
<t>When IP multicast is used, a receiver selects among Simulcast
versions by choosing which multicast addresses to receive, so the
multicast address sets the granularity of that selection. Thus,
different Simulcast versions need to be put on
different Media Transports using different multicast addresses. If
these Simulcast versions are described using SDP, they need to be part
of different SDP media descriptions, as SDP binds to transport on
media description level. To enable more than the initial choice to
function well, there is a need to enable correct mapping of Source
Packet Streams in one Simulcast media configuration to a corresponding
Source Packet Stream in another Simulcast media configuration on
another multicast group.</t>
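<figure title="Illustrative SDP Sketch: Simulcast Versions on Separate Multicast Groups">
<preamble>As a non-normative sketch of the situation described above,
the two media descriptions below carry two versions of the same video
Media Source on different multicast groups. The addresses, ports,
payload types, codec, and bandwidth values are assumptions chosen for
illustration only.</preamble>
<artwork><![CDATA[
m=video 49200 RTP/AVP 96
c=IN IP4 233.252.0.1/127
b=AS:2000
a=rtpmap:96 H264/90000
m=video 49300 RTP/AVP 96
c=IN IP4 233.252.0.2/127
b=AS:500
a=rtpmap:96 H264/90000
]]></artwork>
<postamble>Nothing in plain SDP such as this tells a receiver that the
two media descriptions carry the same Media Source, which is the
mapping need identified above.</postamble>
</figure>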
</section>
<section anchor="sec-receiver-preferences"
title="Receiver Media Source Preferences">
<t>The application logic that controls the Communication Session may
allow receiving Participants to apply preferences to the
characteristics of the Source Packet Stream they receive, for example
in terms of the aspects listed in <xref
target="sec-diverse-receivers"/>. Sending a Simulcast of Source Packet
Streams is one way of accommodating receivers with conflicting or
otherwise incompatible preferences.</t>
</section>
</section>
<section anchor="sec-requirements" title="Requirements">
<t>The following requirements need to be met to support the use cases in
previous sections:<list style="hanging">
<t anchor="req-1" hangText="REQ-1:">Identification. It must be
possible to identify a set of simulcasted Source Packet Streams as
originating from the same Media Source:<list style="hanging">
<t anchor="req-1.1" hangText="REQ-1.1:">In SDP signaling.</t>
<t anchor="req-1.2" hangText="REQ-1.2:">On RTP/RTCP level.</t>
</list></t>
<t anchor="req-2" hangText="REQ-2:">Transport usage. The solution
must work when distributing different Simulcast versions on:<list
style="hanging">
<t anchor="req-2.1" hangText="REQ-2.1:">Same Media Transport and
RTP session.</t>
<t anchor="req-2.2" hangText="REQ-2.2:">Different Media
Transports and RTP sessions.</t>
</list></t>
<t anchor="req-3" hangText="REQ-3:">Capability negotiation. It must
be possible that:<list style="hanging">
<t anchor="req-3.1" hangText="REQ-3.1:">Sender can express
capability of sending simulcast.</t>
<t anchor="req-3.2" hangText="REQ-3.2:">Receiver can express
capability of receiving simulcast.</t>
<t anchor="req-3.3" hangText="REQ-3.3:">Sender can express
maximum number of Simulcast versions that can be provided.</t>
<t anchor="req-3.4" hangText="REQ-3.4:">Receiver can express
maximum number of Simulcast versions that can be received.</t>
<t anchor="req-3.5" hangText="REQ-3.5:">Sender can detail the
characteristics of the Simulcast versions that can be
provided.</t>
<t anchor="req-3.6" hangText="REQ-3.6:">Receiver can detail the
characteristics of the Simulcast versions that it prefers to
receive.</t>
</list></t>
<t anchor="req-4" hangText="REQ-4:">Distinguishing features. It must
be possible to have different Simulcast versions use different
values for any combination of:<list style="hanging">
<t anchor="req-4.1" hangText="REQ-4.1:">Codec. This includes
both codec type and configuration options for both codec and RTP
packetization. It also includes different layers from a scalable
codec, but only as long as those layers are possible to identify
on RTP level.</t>
<t anchor="req-4.2" hangText="REQ-4.2:">Bitrate of Source Packet
Stream.</t>
<t anchor="req-4.3" hangText="REQ-4.3:">Sampling in spatial as
well as in temporal domain.</t>
</list></t>
<t anchor="req-5" hangText="REQ-5:">Compatibility. It must be
possible to use Simulcast in combination with other RTP mechanisms
that generate additional Source Packet Streams:<list style="hanging">
<t anchor="req-5.1" hangText="REQ-5.1:"><xref
target="RFC4588">RTP Retransmission</xref>.</t>
<t anchor="req-5.2" hangText="REQ-5.2:"><xref
target="RFC5109">RTP Forward Error Correction</xref>.</t>
</list></t>
<t anchor="req-6" hangText="REQ-6:">Interoperability. The solution
must also be possible to use in:<list style="hanging">
<t anchor="req-6.1" hangText="REQ-6.1:">Interworking with
non-simulcast legacy clients using a single Media Source per
media type.</t>
<t anchor="req-6.2" hangText="REQ-6.2:">WebRTC "Unified Plan"
environment.</t>
</list></t>
</list></t>
</section>
<section anchor="sec-solution" title="Proposed Solution Overview">
<t>Signaling Simulcast is about negotiating between media sender and
receiver what the different Simulcast versions should be, how to
identify them in terms of Source Packet Streams, and how to inter-relate
those Source Packet Streams.</t>
<t>The proposed solution consists of:<list style="symbols">
<t>Signaling Simulcast capability in an optional, pre-stage
Offer/Answer:<list style="symbols">
<t>Separate send and receive Simulcast capabilities as SDP
session level attributes.</t>
<t>Media properties that are supported as base for different
Simulcast versions are listed as parameters that are also
possible to rank.</t>
<t>Early indication of maximum number of available
encoding/decoding resources on SDP media level.</t>
</list></t>
<t>Including detailed information for the Simulcast in a main
Offer/Answer:<list style="symbols">
<t>Retaining the Simulcast capability indications, as described
above, from the pre-stage Offer/Answer, if any.</t>
<t>Defining and labeling of the media configuration for each
Simulcast version to be sent or received.</t>
<t>The media configuration for a Simulcast version can include
acceptable parameter ranges for parameters that are most likely
used to distinguish Simulcast versions.</t>
<t>Indicating the use of Simulcast, separately per direction, by
grouping the defined media configurations, not individual
streams, that will constitute the Simulcast.</t>
<t>Allowing that any one of the media configurations in a
specific Simulcast is signaled inactive from the start of the
session. This is defined as equivalent to the affected Source
Packet Stream being in <xref
target="I-D.westerlund-avtext-rtp-stream-pause">PAUSED
state</xref>.</t>
<t>Adding and/or modifying SDP media descriptions as needed to
accommodate the negotiated Simulcast streams.</t>
<t>Parameter limits to the aggregate of media configurations are
signaled by existing SDP attributes on session and media
description level.</t>
<t>Including media level indication of maximum number of
available encoding/decoding resources on SDP media level. They
MAY be modified compared to the pre-stage Offer/Answer, if
any.</t>
<t>Identifying which Source Packet Stream corresponds to which
media configuration by including the configuration label as part
of the SDES item <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref>
information included in the RTP and RTCP packets. The optional
mechanism for source specific signalling defined in SRCNAME
could be used to let a Simulcast sender pre-announce such a
relationship before sending the Source Packet Stream.</t>
</list></t>
<t>Adding Simulcast information to the Source Packet Stream:<list
style="symbols">
<t>Identifying Source Packet Streams from the same Media Source
using the new RTCP SDES Item <xref
target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref>,
and as described there including the possibility to send the
same information as an <xref target="RFC5285">RTP Header
Extension</xref>.</t>
<t>Using <xref
target="I-D.westerlund-avtext-rtp-stream-pause">PAUSE/RESUME</xref>
functionality to temporarily turn individual Simulcast versions
on or off.</t>
</list></t>
</list></t>
</section>
<section anchor="sec-signaling" title="Proposed Signaling">
<t>This section further details the signaling solution outlined <xref
target="sec-solution">above</xref>.</t>
<section title="Simulcast Capability">
<t>There are numerous media properties that can be varied to construct
a set of Simulcast versions. A Simulcast enabled endpoint could also
support Simulcast based on several of those properties. If those
properties are relatively independent and each Simulcast version
needs explicit definition in the SDP, this would lead to an
exponentially growing number of Simulcast version candidates and a
very long SDP that is likely also hard to interpret. There is thus a
need to limit the Simulcast version candidates included in the SDP to
cover as small a set of properties as possible.</t>
<t>If a legacy endpoint not supporting Simulcast were to be presented
with an SDP including media descriptions for a set of Simulcast
versions, it may not know how to correctly handle or interpret these
"surplus" media descriptions.</t>
<t>Based on the functionality that Simulcast is intended to achieve,
it should be clear that the reasons to send Simulcast versions are not
the same as to receive Simulcast versions, seen from a single
endpoint.</t>
<t>For these reasons, it is proposed to define two new SDP session
level attributes, "a=sim-send-cap" and "a=sim-recv-cap", which
explicitly signal support for Simulcast media transmission and
Simulcast media reception, respectively, for the session.
"a=sim-send-cap" and "a=sim-recv-cap" MAY be used independently and
simultaneously. These attributes are also proposed to have parameters
indicating the media properties used to create the Simulcast versions,
and their preferred ranking. The meaning of the attributes on SDP
media level is undefined, and they MUST NOT be used on that level.</t>
<figure anchor="fig-abnf-cap" title="ABNF for Simulcast Capability">
<artwork><![CDATA[
simulcast-cap = "a="( "sim-send-cap:" / "sim-recv-cap:" )
cap-prop-list
cap-prop-list = cap-prop-entry *(WSP cap-prop-entry)
cap-prop-entry = cap-prop ["=" q-value]
cap-prop = "rtpmap"
/ "fmtp"
/ "imageattr"
/ "framerate"
/ token ; for future extensions
q-value = ( "0" "." 1*2DIGIT )
/ ( "1" "." 1*2("0") )
; Values between 0.00 and 1.00
; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]
]]></artwork>
</figure>
<t>The media property values are taken from existing (and could be
extended to cover other or future) SDP attributes that express media
properties that can be varied to create different Simulcast
versions:<list style="hanging">
<t hangText="rtpmap:">Differences in codec type, sampling rate
(see <xref target="sec-requirements"/>), and number of
channels.</t>
<t hangText="fmtp:">Differences in codec-specific encoding
parameters.</t>
<t hangText="imageattr:">Differences in video resolution and
aspect ratio <xref target="RFC6236"/>.</t>
<t hangText="framerate:">Differences in framerate.</t>
</list></t>
<t>The optional q-value expresses the relative preference to base a
Simulcast version on that media property, with 1.00 meaning maximum
(100%) preference and 0.00 meaning no (0%) preference. Several media
properties can share the same q-value, in which case they are equally
preferred. A media property listed without a q-value SHALL be treated
as having a q-value of 1.00.</t>
<t>The list of media properties is made extensible, to allow
introducing additional dimensions for Simulcast versions.</t>
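<figure title="Illustrative Simulcast Capability Attributes">
<preamble>A minimal sketch of attribute lines that conform to the ABNF
in <xref target="fig-abnf-cap"/>. The chosen media properties and
q-values are assumptions made for illustration only.</preamble>
<artwork><![CDATA[
a=sim-send-cap:imageattr=1.0 framerate=0.5
a=sim-recv-cap:rtpmap
]]></artwork>
<postamble>The first line expresses a preference for basing sent
Simulcast versions on image resolution (q-value 1.0) over framerate
(q-value 0.5). The second line lists rtpmap without a q-value, which
is then taken as 1.00.</postamble>
</figure>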
<section title="Declarative Use">
<t>When used as a declarative media description, sim-recv-cap
indicates the configured end-point's required capability to
recognize and receive a specified set of Source Packet Streams as
Simulcast streams. In the same fashion, sim-send-cap requests the
end-point to send a specified set of Source Packet Streams as
Simulcast streams. sim-recv-cap and sim-send-cap MAY be used
independently and at the same time and they need not specify the
same capability properties.</t>
</section>
<section title="Offer/Answer Use">
<t>An offerer wanting to use Simulcast SHALL include either one or
both of those attributes, depending on in which direction(s)
Simulcast is both supported and desirable. An offerer that receives
an answer without "a=sim-send-cap" or "a=sim-recv-cap" MUST NOT
define or use any Simulcast alternatives in that direction to the
answerer.</t>
<t>An answerer that does not understand the concept of Simulcast
will also not know those attributes and will remove them in the SDP
answer, as defined in existing SDP Offer/Answer procedures. An
answerer that does understand the attributes and that wants to
support Simulcast in the indicated direction SHALL reverse
directionality of the attribute; "sim-send-cap" becomes
"sim-recv-cap" and vice versa, and include it in the answer.</t>
<t>An offerer that intends to send Simulcast alternatives and thus
includes "a=sim-send-cap", MUST also include at least one media
property parameter that it intends to use to construct the Simulcast
alternatives, but it MAY include more media property parameters.
Including multiple media property parameters in "a=sim-send-cap"
SHALL be interpreted as an offer to send Simulcast versions covering
all combinations thereof, but MAY be further restricted by other
information in the SDP such as for example the number of
simulcast-related media descriptions in the SDP or use of <xref
target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
signaling</xref>.</t>
<t>An offerer that is capable of receiving Simulcast alternatives
and thus includes "a=sim-recv-cap", MUST also include at least one
media property parameter that it is willing to use as discriminator
between received Simulcast alternatives, but MAY include more media
property parameters. Including multiple media property parameters in
"a=sim-recv-cap" SHALL be interpreted as an offer to receive
Simulcast versions covering all combinations thereof, but MAY be
further restricted by other information in the SDP such as for
example the number of simulcast-related media descriptions in the
SDP or use of <xref target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
signaling</xref>.</t>
<t>An answerer that either lacks the capability or does not desire
to use Simulcast versions based on a certain media property
parameter in a specific direction MUST remove such media property
parameter from "a=sim-send-cap" or "a=sim-recv-cap". The answerer
MUST NOT add any media property parameters that were not included in
the offer.</t>
<t>An answerer SHOULD take the offerer's q-values into account when
choosing which <xref target="sec-media-config">media
configurations</xref> to include in the answer and how to <xref
target="sec-group-config">group them</xref> into the resulting
Simulcast(s).</t>
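<figure title="Illustrative Capability Offer/Answer Sketch">
<preamble>The procedures above can be sketched with the following
exchange, which shows only the session level capability attributes.
The media properties and q-values are assumptions made for
illustration only.</preamble>
<artwork><![CDATA[
Offer (session level capability attributes only):

   a=sim-send-cap:imageattr=1.0 framerate=0.5
   a=sim-recv-cap:rtpmap

Answer (session level capability attributes only):

   a=sim-recv-cap:imageattr
]]></artwork>
<postamble>In this sketch, the answerer reverses "sim-send-cap" into
"sim-recv-cap", removes the framerate property that it does not want
to use, and omits "sim-send-cap" because it does not want to send
Simulcast. Whether q-values are echoed in an answer is not stated
above; leaving them out is a choice made for this example.</postamble>
</figure>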
</section>
</section>
<section anchor="sec-media-config" title="Media Configuration">
<t>Media that constitutes a Simulcast version has certain desirable
characteristics that are meant to suit one category of <xref
target="sec-diverse-receivers">diverse receivers</xref>. A receiver
that is willing to receive Simulcast streams must be given sufficient
means to express what it is capable of and desires to receive. A
sender that is willing to send Simulcast streams must similarly be
given sufficient means to express what it is capable of and desires to
send.</t>
<t>An obvious candidate to express those characteristics is the media
format in an SDP media description, defined by the rtpmap and fmtp
attributes, which is typically mapped to an RTP Payload Type. Some of
the most interesting characteristics for Simulcast purposes are
however not included in rtpmap or fmtp, but are instead defined as
separate attributes. Some of those individual attributes are possible
to directly relate to a defined media format and could form a
configuration together with the media format, but some attributes
cannot be related to a specific media format and using the existing
media format as a common identifier for a media configuration is not
fully sufficient.</t>
<t>The act of Simulcast is trying to handle senders and receivers
belonging to the vast multi-dimensional parameter space of "media
configuration" by sub-dividing that parameter space into manageable
and meaningful sub-sets. Communication between a sender and a receiver
can be established successfully only when the actually sent media
configuration (sub-set) fits within the receiver's available media
configuration sub-set. At the same time, practical and implementation
aspects often limit the size of those sub-sets. When that receiver or
sender sub-set is either too small or is not known, the probability of
successful communication decreases significantly. To increase the
probability of finding a match between sender and receiver media
configurations, it is essential that a media configuration can be a
set instead of a single point in the parameter space, i.e. include
parameter listings and/or ranges instead of single values.</t>
<t>Therefore, it is proposed to define a new media level SDP
attribute, "a=config-id", which relates the needed parameter types
and the corresponding value ranges that together constitute a
Simulcast media configuration. Each SDP media description MAY contain
zero or more config-id attributes. The meaning of the attribute on SDP
session level is undefined, and it MUST NOT be used on that level.</t>
<figure anchor="fig-abnf-config" title="ABNF for Media Configuration">
<artwork><![CDATA[
configuration = "a=config-id:" config-id WSP config-dir
WSP config-list
config-id = token
config-dir = "send"
/ "recv"
config-list = config-entry *(WSP config-entry)
config-entry = "pt" "=" pt-value *("," pt-value)
/ image-attr
/ "framerate" "=" fr-param
/ "b" "=" bw-mod ":" bw-value *1("-" bw-value)
/ ext-config-id [ "=" ext-config-value ]
; for future ext
image-attr = "imageattr" "=" resolution-list
resolution-list = resolution-set *("," resolution-set)
ext-config-id = token
ext-config-value = non-ws-string
pt-value = 1*3DIGIT ; could be made more strict
resolution-set = "[" "x=" xyrange "," "y=" xyrange *key-values "]"
key-values = ( "," key-value )
key-value = ( "sar=" srange )
/ ( "par=" prange )
/ ( "q=" qvalue )
onetonine = "1" / "2" / "3" / "4" / "5"
/ "6" / "7" / "8" / "9"
xyvalue = onetonine *5DIGIT
step = xyvalue
xyrange = ( "[" xyvalue ":" [ step ":" ] xyvalue "]" )
/ ( "[" xyvalue 1*( "," xyvalue ) "]" )
/ ( xyvalue )
spvalue = ( "0" "." onetonine *3DIGIT )
/ ( onetonine "." 1*4DIGIT )
srange = ( "[" spvalue 1*( "," spvalue ) "]" )
/ ( "[" spvalue "-" spvalue "]" )
/ ( spvalue )
prange = ( "[" spvalue "-" spvalue "]" )
qvalue = ( "0" "." 1*2DIGIT )
/ ( "1" "." 1*2("0") )
fr-param = fr-value *("," fr-value)
/ fr-value "-" fr-value
fr-value = 1*3DIGIT [ "." 1*2DIGIT ]
bw-mod = "AS"
/ "TIAS"
/ token ; for future extensions
bw-value = 1*DIGIT
; WSP, DQUOTE and DIGIT defined in [RFC5234]
; token and non-ws-string defined in [RFC4566]
]]></artwork>
</figure>
<t>A media configuration is thus identified by:<list style="hanging">
<t hangText="config-id:">A token that identifies the media
configuration, which MUST be unique across all media
configurations and media descriptions in the SDP.</t>
<t hangText="config-dir:">The direction of the stream(s) to which
the media configuration applies, as seen from the party issuing
the SDP.</t>
</list></t>
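<t>As a non-normative sketch of the syntax in <xref
target="fig-abnf-config"/>, the following line declares a media
configuration with config-id "a" and config-dir "recv". The payload
type number, resolutions, framerate range, and bandwidth range are
illustrative values only, and the line is folded with "\" for
presentation, as in the signaling examples of this document.</t>
<figure>
<artwork><![CDATA[
a=config-id:a recv pt=97 imageattr=[x=640,y=360],[x=1280,y=720] \
framerate=25-60 b=AS:500-2500
]]></artwork>
</figure>
<t>The four whitespace-separated items after "recv" are config-entry
elements of the types pt, imageattr, framerate, and b.</t>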
<t>The media configuration MUST contain at least one and MAY contain
more of the media configuration entries below. Each entry type MUST
NOT appear more than once in any media configuration.</t>
<t><list style="hanging">
<t hangText="pt:">A comma-separated list of media formats, RTP
payload types, which MUST be defined within the same media
description as config-id. This describes the allowed set of codecs
or codec configurations for this media configuration. MUST be
present in every media configuration.</t>
<t hangText="imageattr:">An OPTIONAL listing of preferred image
resolutions for this media configuration. MUST NOT be used with
media types other than video and image. An imageattr media
configuration entry MUST NOT conflict with any "a=imageattr"
attribute present in the same media description.</t>
<t hangText="framerate:">An OPTIONAL range or enumeration of
preferred framerates for this media configuration. MUST NOT be
used with media types other than video. The high end of the range
MUST be equal to or larger than the low end. An enumerating
framerate media configuration entry MUST include the value of the
"a=framerate" attribute, if any. A framerate range media
configuration entry MUST include the "a=framerate" value in the
range.</t>
<t hangText="b:">An acceptable bandwidth range for this media
configuration. Either one of the defined bandwidth modifiers MAY
be used, which MUST share semantics with corresponding bandwidth
modifiers from the SDP bandwidth attribute. The bandwidth value
MUST be interpreted as defined by the bandwidth modifier. The high
end of the range MUST be equal to or larger than the low end. The
high end of the range MUST NOT exceed the bandwidth parameter in
the same media description, if any. The sum of bandwidth range low
ends for all media configurations within a media description MUST
NOT exceed the value of that media description's bandwidth
parameter. MUST be present in every media configuration.</t>
</list></t>
<t>Media configuration entry types "pt" and "b" MUST be supported by
all implementations of this specification. Otherwise, an
implementation MAY ignore any media configuration entry types that are
not understood. A media configuration MAY be re-used to describe more
than a single Source Packet Stream.</t>
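<t>The rules for the "pt", "framerate", and "b" entries can be
illustrated with a media description fragment. This is a sketch under
assumed values: the port, payload type, "a=framerate" value, config-id
names, and all ranges are chosen only for illustration, and attributes
not relevant to these rules (such as "a=fmtp") are omitted.</t>
<figure>
<artwork><![CDATA[
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=framerate:30
a=config-id:hi send pt=97 framerate=15-30 b=AS:150-400
a=config-id:lo send pt=97 framerate=15,30 b=AS:100-150
]]></artwork>
</figure>
<t>Both media configurations reference payload type 97, which is
defined in the same media description. The range 15-30 and the
enumeration 15,30 each include the "a=framerate" value 30. Neither
bandwidth high end (400 and 150) exceeds the media level b=AS:520, and
the sum of the low ends, 150 + 100 = 250, does not exceed 520.</t>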
<section title="Simulcast Limitations">
<t>The Session and Media level attributes and parameters outside of
individual media configurations (a=config-id) provide limitations
on the set of media configurations in simultaneous use. For example,
a media description bandwidth limitation using b=AS would apply to
all the Packet Streams sent within the scope of that media
description, thus forcing the media configurations in use to share
that available bandwidth. Other Packet Streams, such as RTP
retransmission or FEC flows, also need to be included in that
sum.</t>
<t>There exist a number of different limitations, and this section
does not intend to be complete. The payload formats and their
configurations can impose limitations; for example, video profiles and
levels impose a joint limit on bit-rate, frame-rate and resolution.
The bandwidth parameters on session and media description level
apply according to their semantics and their level. Packetization
limitations, e.g. maxptime, as well as recommendations apply to all
the configurations within the scope where this parameter is
defined.</t>
<t>It is important to note that limits, such as bandwidth, expressed
within a media configuration are not limited by the media
description values. First of all, the sum of bit-rates across all
media configurations in a media description can be greater than the
media description limit, since not all configurations need to be in
simultaneous use. For example, only a single configuration can be
enabled, which is then allowed to consume the full outer limit.
Secondly, the media configuration directionality needs to be taken
into account, for example that SDP receiver limitations are not
applied to the sender configuration.</t>
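<t>A worked example of the first point, using assumed values (the
port, payload type, config-id names, and ranges are illustrative
only):</t>
<figure>
<artwork><![CDATA[
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=config-id:p send pt=97 b=AS:300-520
a=config-id:q send pt=97 b=AS:150-400
a=config-id:r send pt=97 b=AS:50-150
]]></artwork>
</figure>
<t>The sum of the high ends is 520 + 400 + 150 = 1070, well above the
media description limit of 520, which is acceptable because the three
configurations need not be used at the same time. If only "p" is
enabled, it can consume the full 520. If "q" and "r" are used
together, their own ranges would allow up to 400 + 150 = 550, but
their combined rate is still bounded by 520. The sum of the low ends,
300 + 150 + 50 = 500, fits within 520.</t>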
</section>
<section title="Declarative Use">
<t>When used as a declarative media description, config-id with the
recv parameter indicates the media configuration that the configured
end-point is required to use to receive a specified set of Source
Packet Streams as Simulcast streams. In the same fashion, config-id
with the send parameter requests the end-point to use the specified
media configuration when sending a specified set of Source Packet
Streams as Simulcast streams.</t>
</section>
<section title="Offer/Answer Use">
<t>An offerer wanting to use Simulcast in a specific direction SHALL
use config-id to describe the media configurations to use in that
direction in the Offer.</t>
<t>An answerer that receives a config-id media configuration for a
specific direction and accepts that media configuration SHALL
include a corresponding media configuration with the reverse
direction in the Answer. The config-id identification value MUST be
kept unchanged between the Offer and the Answer. An answerer that
does not accept a specific media configuration SHALL remove it from
the Answer.</t>
<t>The Answer MUST keep exactly the same media configuration entry
types in a media configuration as were present in the corresponding
media configuration in the Offer.</t>
<t>The answerer MAY remove values from enumerations and MAY reduce
ranges of media configuration entries in the Answer. If the reduced
media configuration entry relates to the answerer's send direction,
negotiation is complete and no further action is needed. If the
reduced media configuration relates to the answerer's receive
direction, the offerer SHOULD send another Offer in which the related
send-direction media configuration is reduced at least to the level
in the previous Answer; it MAY be reduced even more, and MAY be
removed entirely.</t>
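<t>The following sketch illustrates these rules with assumed values
(config-id "a", payload type 97, and the ranges are illustrative
only). An Offer could contain this send-direction media
configuration:</t>
<figure>
<artwork><![CDATA[
a=config-id:a send pt=97 framerate=15-60 b=AS:200-500
]]></artwork>
</figure>
<t>An accepting answerer could then respond with:</t>
<figure>
<artwork><![CDATA[
a=config-id:a recv pt=97 framerate=15-30 b=AS:200-400
]]></artwork>
</figure>
<t>The config-id value "a" is kept, the direction is reversed, and
the same entry types (pt, framerate, b) are present, while the
framerate and bandwidth ranges are reduced. Since the reduction
concerns the answerer's receive direction, the rule above calls for
the offerer to send another Offer in which the send-direction media
configuration "a" is reduced at least to framerate=15-30 and
b=AS:200-400.</t>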
</section>
</section>
<section anchor="sec-group-config"
title="Grouping Simulcast Configurations">
<t>A set of <xref target="sec-media-config">media
configurations</xref> is needed to describe a Simulcast. Each Source
Packet Stream in the Simulcast shares the same Media Source, but has a
different media configuration. Thus, the actual grouping of media
configurations is what defines a specific Simulcast. It is proposed to
define two new media level and session level SDP attributes,
"a=sim-send" and "a=sim-recv", which use config-id values to group
media configurations for the purpose of Simulcast transmission and
reception, respectively. "a=sim-send" and "a=sim-recv" MAY be used
independently and simultaneously. They MAY be used on session level to
group media configurations when different Simulcast encodings of a
Media Source are to be sent in different Media Transports and RTP
sessions. They MAY also be used on media level to group media
configurations when different Simulcast encodings of a Media Source
are to be sent based on the same media description and thus use the
same Media Transport and RTP session. When used on media level, the
Simulcast direction MAY conflict with the general media description
direction, but a conflict MUST be interpreted as the Simulcast being
effectively inhibited. For example, sim-send in a recvonly media
description means that no Simulcast Source Packet Streams are
sent.</t>
<figure anchor="fig-abnf-group"
title="ABNF for Simulcast Configuration Grouping">
<artwork><![CDATA[
simulcast = "a="( "sim-send:" / "sim-recv:" ) config-id-list
config-id-list = config-item *(WSP config-item)
config-item = config-id [":" config-param-list]
config-id = token
config-param-list = config-param *("," config-param)
config-param = "inactive"
/ token ["=" param-value] ; for future extension
param-value = 1*(value-char)
/ DQUOTE non-ws-string DQUOTE
value-char = token-char / %x28 / %x29 / %x2F / %x3A-3C
/ %x3E-40 / %x5B-5D ; VCHAR except DQUOTE, "=" and ","
; WSP, DQUOTE and VCHAR defined in [RFC5234]
; token, token-char and non-ws-string defined in [RFC4566]
]]></artwork>
</figure>
<t>The config-id identification of a media configuration MUST be
defined by a "config-id" attribute in any of the media descriptions
that are part of the SDP.</t>
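<t>As a sketch of media level grouping, the fragment below uses
assumed values (port, payload type, config-id names, and bandwidth
ranges are illustrative only) and shows a case where the Simulcast
direction conflicts with the media description direction:</t>
<figure>
<artwork><![CDATA[
m=video 49300 RTP/AVP 97
a=rtpmap:97 H264/90000
a=recvonly
a=config-id:a send pt=97 b=AS:200-500
a=config-id:b send pt=97 b=AS:100-200
a=sim-send:a b
]]></artwork>
</figure>
<t>Both config-id values referenced by sim-send are defined by
"config-id" attributes in the SDP. Because the media description is
recvonly, the sim-send grouping is effectively inhibited and no
Simulcast Source Packet Streams are sent.</t>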
<section title="Declarative Use">
<t>When used as a declarative media description, sim-recv indicates
the configured end-point's required ability to receive Source Packet
Streams with the specified set of media configurations as Simulcast
streams. In the same fashion, sim-send requests the end-point to
send Source Packet Streams with the specified set of media
configurations as Simulcast streams.</t>
<t>The configuration parameter "inactive" SHALL be interpreted to mean
that the related Source Packet Stream is in <xref
target="I-D.westerlund-avtext-rtp-stream-pause">PAUSED state</xref>
at the start of the session, and applicable RTP level procedures
from that specification SHALL be applied.</t>
</section>
<section title="Offer/Answer Use">
<t>An offerer wanting to send a set of Source Packet Streams as
Simulcast streams includes sim-send in the Offer to describe which
media configurations to use for that Simulcast. Similarly, an
offerer wanting to receive a set of Source Packet Streams as
Simulcast streams includes sim-recv in the Offer to describe which
media configurations to use for that Simulcast.</t>
<t>An answerer that receives sim-send and accepts to receive those
media configurations as Simulcast Source Packet Streams SHALL include
sim-recv with the accepted media configurations in the Answer.
Similarly, an answerer that receives sim-recv and accepts to send
those media configurations as Simulcast Source Packet Streams SHALL
include sim-send with the accepted media configurations in the
Answer. An answerer MAY remove media configurations from sim-send or
sim-recv included in the Answer compared to the ones included in the
sim-send or sim-recv in the Offer. The answerer MUST NOT add any
media configurations to sim-send or sim-recv in the Answer that were
not in the corresponding ones in the Offer.</t>
<t>An "inactive" parameter present in the Offer MUST be kept in the
Answer. The Answer MAY add an "inactive" parameter to any of the
media configurations. An "inactive" parameter on a media
configuration in "sim-recv" is equivalent to a <xref
target="I-D.westerlund-avtext-rtp-stream-pause">PAUSE (or in some
cases, an equivalent TMMBR 0) message</xref> being sent for the
received Source Packet Stream at the start of the session, and
applicable RTP level procedures from that specification SHALL be
applied. An "inactive" parameter on a media configuration in
"sim-send" is equivalent to the related Source Packet Stream being
in PAUSED state at the start of the session, and applicable RTP
level procedures SHALL be applied.</t>
<t>The number of different Source Packet Streams used for a
Simulcast related to a single media description MUST NOT exceed the
number of listed media configurations in the corresponding sim-recv
in that media description sent by the media receiver.</t>
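<t>A minimal sketch of these rules, with illustrative config-id
values "x", "y", and "z" that are assumed to be defined by config-id
attributes elsewhere in the SDP. An Offer could contain:</t>
<figure>
<artwork><![CDATA[
a=sim-send:x y:inactive z
]]></artwork>
</figure>
<t>An answerer that accepts only "x" and "y" could respond with:</t>
<figure>
<artwork><![CDATA[
a=sim-recv:x y:inactive
]]></artwork>
</figure>
<t>Media configuration "z" is removed, nothing is added, and the
"inactive" parameter on "y" is kept. Since the sim-recv sent by the
media receiver lists two media configurations, the sender uses at
most two Source Packet Streams for this Simulcast.</t>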
</section>
</section>
<section anchor="sec-srcname" title="Relating Simulcast Versions">
<t>To ensure that Simulcast Packet Streams can be related correctly on
RTP level, <xref target="I-D.westerlund-avtext-rtcp-sdes-srcname">SDES
SRCNAME</xref> MUST be used to label Simulcast versions belonging to
the same Media Source. The RTP Header Extension option of that
specification MAY be used with Simulcast.</t>
<t>The SRCNAME identifier for Simulcast MUST contain a first part that
uniquely identifies the Media Source within a given CNAME, followed by
a single "." (period) and the config-id as defined <xref
target="sec-media-config">above</xref>.</t>
<t>The SRCNAME parameter to <xref target="RFC5576">source-specific
signaling</xref> ("a=ssrc") MAY be used for Source Packet Streams in
the send direction to relate SRCNAME to SSRC already in the SDP.</t>
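<t>As an illustration, assume a Media Source identified as "1" within
its CNAME and two media configurations with config-id "b" and "c".
Using the "a=ssrc" notation and the SSRC and CNAME values from the
example in <xref target="sec-ex-unified-plan"/>, the resulting
SRCNAME values are "1.b" and "1.c":</t>
<figure>
<artwork><![CDATA[
a=ssrc:31053821 cname=SDIe93850aQFid9P srcname=1.b
a=ssrc:43298172 cname=SDIe93850aQFid9P srcname=1.c
]]></artwork>
</figure>
<t>Both lines carry the same first part, since the two Source Packet
Streams are Simulcast versions of the same Media Source, and differ
only in the config-id after the period.</t>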
</section>
<section anchor="sec-two-phase" title="Two-Phase Negotiation">
<t>The new "a=sim-send-cap" and "a=sim-recv-cap" attributes MAY be
included in the SDP as an optional pre-stage in a two-phased approach,
where the pre-stage involves a first SDP Offer/Answer procedure that
only establishes Simulcast capability at both the offerer and the
answerer. This has the additional advantage of avoiding sending media
descriptions related to Simulcast to an endpoint that does not support
Simulcast. If two Offer/Answer procedures are already used for
other reasons, the pre-stage will not incur any significant extra
signaling round-trips. Such other two-phase techniques include use of SIP
OPTIONS, <xref target="RFC3311">SIP UPDATE</xref> with reliable
provisional responses, and <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>.</t>
<t>Thus, when a pre-stage Offer/Answer is used, it SHOULD NOT include
any Simulcast-grouped media descriptions, which SHOULD instead be
added in a main Offer/Answer phase. When using the pre-stage
Offer/Answer, half a signaling round-trip time can sometimes be saved
if the main phase is initiated by the Simulcast receiver, meaning that
the endpoint that included "a=sim-recv-cap" in the pre-stage SDP is the
offerer in the main phase. If both endpoints are Simulcast receivers,
it does not matter which endpoint sends the main Offer, using regular
Offer/Answer rules to handle any race conditions.</t>
<t>It is not possible to use any pre-stage to establish capability
with declarative SDP, in which case it SHALL be bypassed, using only
the main phase directly.</t>
</section>
<section title="Signaling Examples">
<t>These examples cover the case of a client connecting to a video
conference service that uses a centralized media topology with an RTP
mixer.</t>
<figure align="center" anchor="fig-mixer-four-party"
title="Four-party Mixer-based Conference">
<artwork><![CDATA[
+---+ +-----------+ +---+
| A |<---->| |<---->| B |
+---+ | | +---+
| Mixer |
+---+ | | +---+
| F |<---->| |<---->| J |
+---+ +-----------+ +---+]]></artwork>
</figure>
<section anchor="sec-ex-unified-plan" title="Unified Plan Client">
<t>Alice is calling in to the mixer with a Simulcast-enabled Unified
Plan client capable of a single Media Source per media type. The
only difference from a non-Simulcast client is the capability to send
<xref target="RFC6236">video resolution</xref> ("imageattr") and
framerate based Simulcast. Alice uses a pre-stage Offer, which looks
like:</t>
<figure anchor="fig-up-first-offer"
title="Unified Plan Simulcast Pre-Stage Offer">
<artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:665
a=sim-send-cap:imageattr framerate
m=audio 49200 RTP/AVP 96 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:8 PCMA/8000
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
recv [x=640,y=360] [x=320,y=180]
]]></artwork>
</figure>
<t>In this pre-stage, the only thing in the SDP that indicates
Simulcast capability is the line containing the "sim-send-cap"
attribute, which also indicates that sent Simulcast versions can
differ in video resolution and/or framerate.</t>
<t>The Answer from the server indicates both that it too is
Simulcast capable and that it would prefer to use video resolution
("imageattr") based Simulcast, but that it supports both video
resolution and framerate. Should it not have been Simulcast capable,
the "a=sim-recv-cap" line would not have been present and
communication would have started with the media negotiated in the
SDP.</t>
<figure anchor="fig-up-first-answer"
title="Unified Plan Simulcast Pre-Stage Answer">
<artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:665
a=sim-recv-cap:imageattr=1.0 framerate=0.8
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
recv [x=640,y=360] [x=320,y=180]
]]></artwork>
</figure>
<t>Since the server is the Simulcast media receiver, it immediately
initiates another Offer/Answer including details on the Simulcast
versions. The server also keeps the "sim-recv-cap" as explicit
Simulcast capability indication in this main Offer/Answer. Note that
the "non-simulcast" media can be started already now, before the
main Offer/Answer, with the only restriction that the Simulcast
functionality is not yet established.</t>
<figure anchor="fig-up-main-offer"
title="Unified Plan Simulcast Main Offer">
<artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Server Inviting Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:825
a=sim-recv-cap:imageattr=1.0 framerate=0.8
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:2200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=config-id:a recv pt=97 imageattr=[x=640,y=360],[x=1280,y=720] \
framerate=25-60 b=AS:500-2500
a=config-id:b recv pt=97 imageattr=[x=320,y=180],[x=640,y=360] \
framerate=25-60 b=AS:150-500
a=config-id:c recv pt=97 imageattr=[x=256,y=144],[x=320,y=180] \
framerate=10-30 b=AS:100-250
a=sim-recv:a b c
]]></artwork>
</figure>
<t>The server chooses to structure the Offer according to Unified
Plan and has added three config-id lines in the video media
description, one for each Simulcast media configuration that it is
prepared to receive. Each media configuration refers to a defined
media format, and lists a set of preferred video resolutions as well
as a range of acceptable framerates, concluded by a bandwidth range.
It also includes the sim-recv attribute for those three media
configurations, indicating that the Simulcast it is prepared to
receive in this media description can include one or more of those
media configurations.</t>
<t>Alice's Answer is:</t>
<figure anchor="fig-up-main-answer"
title="Unified Plan Simulcast Main Answer">
<artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Final answer from Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:825
a=sim-send-cap:imageattr framerate
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=config-id:b send pt=97 imageattr=[x=640,y=360] \
framerate=25-30 b=AS:150-400
a=config-id:c send pt=97 imageattr=[x=320,y=180] \
framerate=10-12.5 b=AS:100-150
a=sim-send:b c:inactive
a=ssrc:31053821 cname=SDIe93850aQFid9P srcname=1.b
a=ssrc:43298172 cname=SDIe93850aQFid9P srcname=1.c
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
recv [x=640,y=360] [x=320,y=180]
]]></artwork>
</figure>
<t>The Simulcast capability, sim-send-cap, is kept from Alice's
previous Offer. One of the media configurations from the server
Offer, config-id:a, is not acceptable to Alice's client for some
reason and is removed from the Answer. The resulting Simulcast,
described by sim-send, thus contains two media configurations, b and
c, where c is initially set to "inactive", which effectively means it
is paused from the start of the session. The media configuration
parameter value ranges are in some cases reduced, which gives a more
precise definition of what will actually be sent. This Answer SDP
also includes a specification of the SSRC values that will be sent
and what media configurations those SSRCs will carry, by including
the srcname parameter. The first part of srcname, before the ".", is
the Media Source identification. Both SSRCs share the same Media
Source identification, since they are part of the same Simulcast.
The second part, after the ".", is the config-id of the media
configuration sent with that SSRC.</t>
</section>
<section anchor="sec-ex-multi-transport"
title="Multi-Transport Client">
<t>Bob is calling in to the mixer with a Simulcast-enabled client,
like Alice's, capable of a single Media Source per media type, but
also capable of sending Source Packet Streams as Simulcast versions
on separate Media Transports. In this example, Bob's client knows
that the server is capable of Simulcast and does not use any
pre-stage Offer, but goes straight to the main Offer.</t>
<figure anchor="fig-mt-main-offer"
title="Multi-Transport Simulcast Main Offer">
<artwork><![CDATA[
v=0
o=bob 94572932847 3429478298 IN IP4 192.0.2.93
s=Offer from Simulcast Enabled Multi-Transport Client
t=0 0
c=IN IP4 192.0.2.93
b=AS:825
a=sim-send-cap:imageattr=1.0 framerate=0.9
a=sim-send:x y
m=audio 50138 RTP/AVP 101
b=AS:145
a=rtpmap:101 G719/48000/2
m=video 50226 RTP/AVP 118
b=AS:500
a=rtpmap:118 H264/90000
a=fmtp:118 profile-level-id=42c01e
a=config-id:x send pt=118 imageattr=[x=320,y=180],[x=640,y=360] \
framerate=25-50 b=AS:200-500
a=ssrc:3929384298 cname=Nsdko39Oen828FKn srcname=M.x
a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
recv [x=640,y=360] [x=320,y=180]
m=video 50228 RTP/AVP 119
b=AS:150
a=rtpmap:119 H264/90000
a=fmtp:119 profile-level-id=42c01e
a=config-id:y send pt=119 imageattr=[x=256,y=144],[x=320,y=180] \
framerate=12.5-25 b=AS:100-200
a=ssrc:1923419284 cname=Nsdko39Oen828FKn srcname=M.y
a=imageattr:119 send [x=320,y=180] [x=256,y=144]
a=sendonly
]]></artwork>
</figure>
<t>As can be seen above, this Offer uses sim-send on session
level and has split the Simulcast media configurations across two media
descriptions, in order to be able to use separate Media Transports
and enable differentiated treatment of the two Simulcast
streams.</t>
<t>The server accepts this structure in its Answer:</t>
<figure anchor="fig-mt-main-answer"
title="Multi-Transport Simulcast Main Answer">
<artwork><![CDATA[
v=0
o=server 283479882 9384298374 IN IP4 192.0.2.2
s=Server Answering Simulcast Enabled Multi-Transport Client
t=0 0
c=IN IP4 192.0.2.45
b=AS:825
a=sim-recv-cap:imageattr framerate
a=sim-recv:x y
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 118
b=AS:500
a=rtpmap:118 H264/90000
a=fmtp:118 profile-level-id=42c01e
a=config-id:x recv pt=118 imageattr=[x=640,y=360] \
framerate=25-50 b=AS:350-500
a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
recv [x=640,y=360] [x=320,y=180]
m=video 49302 RTP/AVP 119
b=AS:150
a=rtpmap:119 H264/90000
a=fmtp:119 profile-level-id=42c01e
a=config-id:y recv pt=119 imageattr=[x=256,y=144] \
framerate=12.5-25 b=AS:120-150
a=imageattr:119 recv [x=320,y=180] [x=256,y=144]
a=recvonly
]]></artwork>
</figure>
</section>
<section title="Multi-Source Client">
<t>Fred is calling in to the same conference as in the examples
above with a three-camera, three-display system, thus capable of
handling three separate Media Sources in each direction, where each
Media Source is also Simulcast-enabled in the send direction. Fred's
client is a Unified Plan client, restricted to a single Media Source
per media description.</t>
<figure anchor="fig-ms-main-offer"
title="Fred's Multi-Source Simulcast Main Offer">
<artwork><![CDATA[
v=0
o=fred 238947129 823479223 IN IP4 192.0.2.125
s=Offer from Simulcast Enabled Multi-Source Client
t=0 0
c=IN IP4 192.0.2.125
b=AS:825
a=sim-send-cap:imageattr=1.0 framerate=0.5
m=audio 49200 RTP/AVP 98
b=AS:145
a=rtpmap:98 G719/48000/2
m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:1h send pt=100 imageattr=[x=1920,y=1080] \
framerate=30-60 b=AS:2000-3500
a=config-id:1m send pt=100 imageattr=[x=1280,y=720] \
framerate=15-60 b=AS:1000-2000
a=config-id:1l send pt=100 imageattr=[x=640,y=360] \
framerate=10-60 b=AS:200-1000
a=sim-send:1h 1m 1l
a=ssrc:2397234521 cname=EkeS32892FeO29DK srcname=1.1h
a=ssrc:1023894789 cname=EkeS32892FeO29DK srcname=1.1m
a=ssrc:4029284928 cname=EkeS32892FeO29DK srcname=1.1l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:2h send pt=100 imageattr=[x=1920,y=1080] \
framerate=30-60 b=AS:2000-3500
a=config-id:2m send pt=100 imageattr=[x=1280,y=720] \
framerate=15-60 b=AS:1000-2000
a=config-id:2l send pt=100 imageattr=[x=640,y=360] \
framerate=10-60 b=AS:200-1000
a=sim-send:2h 2m 2l
a=ssrc:2301017618 cname=EkeS32892FeO29DK srcname=2.2h
a=ssrc:639711316 cname=EkeS32892FeO29DK srcname=2.2m
a=ssrc:3293473905 cname=EkeS32892FeO29DK srcname=2.2l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:3h send pt=100 imageattr=[x=1920,y=1080] \
framerate=30-60 b=AS:2000-3500
a=config-id:3m send pt=100 imageattr=[x=1280,y=720] \
framerate=15-60 b=AS:1000-2000
a=config-id:3l send pt=100 imageattr=[x=640,y=360] \
framerate=10-60 b=AS:200-1000
a=sim-send:3h 3m 3l
a=ssrc:4115355057 cname=EkeS32892FeO29DK srcname=3.3h
a=ssrc:3196538337 cname=EkeS32892FeO29DK srcname=3.3m
a=ssrc:3757973912 cname=EkeS32892FeO29DK srcname=3.3l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]
]]></artwork>
</figure>
<t>The three media descriptions for video are essentially the same,
except that values that need to be unique are given unique values.
The above also assumes that BUNDLE will be used across these three
video media descriptions to create a common RTP session.</t>
</section>
</section>
</section>
<section anchor="sec-network-aspects" title="Network Aspects">
<t>Simulcast is defined as the act of sending multiple alternative
encodings of the same underlying media source. When transmitting
multiple independent streams that originate from the same source, it
could potentially be done in several different ways using RTP. A general
discussion on considerations for use of the different RTP multiplexing
alternatives can be found in <xref
target="I-D.ietf-avtcore-multiplex-guidelines">Guidelines for
Multiplexing in RTP</xref>. Discussion and clarification on how to
handle multiple streams in an RTP session can be found in <xref
target="I-D.ietf-avtcore-rtp-multi-stream"/>.</t>
<t>The network aspects that are relevant for Simulcast are:<list
style="hanging">
<t hangText="Quality of Service:">When using Simulcast it might be
of interest to prioritize a particular Simulcast version, rather
than applying equal treatment to all versions. For example, lower
bit-rate versions may be prioritized over higher bit-rate versions
to minimize congestion or packet losses in the low bit-rate
versions. Thus, there is a benefit in using a Simulcast solution that
supports QoS as well as possible. By separating Simulcast versions
into different RTP sessions and sending those RTP sessions over
different Media Transports, a Simulcast version can be prioritized
by existing flow based QoS mechanisms. When using unicast, QoS
mechanisms based on individual packet marking are also feasible,
which do not require separation of Simulcast versions into different
RTP sessions to apply different QoS.</t>
<t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
incur more cost for NAT/FW traversal unless they can re-use the same
transport flow, which can be achieved by either one of <xref
target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
multiple RTP sessions on a single lower layer transport</xref> or
<xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">Multiplexing
Negotiation Using SDP Port Numbers</xref>. If flow based QoS with
any differentiation is desirable, the cost for additional transport
flows is likely necessary.</t>
<t hangText="Multicast:">Multiple RTP sessions will be required to
enable combining Simulcast with multicast. Different Simulcast
versions have to be separated to different multicast groups to allow
a multicast receiver to pick the version it wants, rather than
receive all of them. In this case, the only reasonable
implementation is to use different RTP sessions for each multicast
group so that reporting and other RTCP functions operate as
intended.</t>
</list></t>
</section>
<section anchor="sec-iana" title="IANA Considerations">
<t>This document requests registration of five new SDP attributes:
sim-send-cap, sim-recv-cap, sim-send, sim-recv, and config-id. It also
requests creation of a new registry of defined parameters taken from
existing SDP attributes for sim-send-cap, sim-recv-cap, and config-id.</t>
<t>Formal registrations to be written.</t>
</section>
<section anchor="sec-security" title="Security Considerations">
<t>The Simulcast capability and configuration attributes and parameters
are vulnerable to attacks in signaling.</t>
<t>A false inclusion of Simulcast attributes may result in generation of
a second phase SDP that potentially contains a large number of
non-supported media descriptions expressing Simulcast alternatives. A
correct SDP implementation will however be able to reject any
non-supported media descriptions and the effect from that should be
limited.</t>
<t>A hostile removal of the Simulcast attributes will result in any
second phase Offer/Answer being skipped and Simulcast not being used.</t>
<t>The Simulcast grouping semantics are vulnerable to attacks in the
signaling. Changing the set of media configurations that are used in a
Simulcast will impact the number of Source Packet Streams.</t>
<t>A hostile removal of Simulcast grouping will prevent streams from
being interpreted as Simulcast, which obviously prevents use of the
Simulcast functionality. It will also risk that intended Simulcast
streams are instead presented as separate, independent streams to a
receiver.</t>
<t>None of the above is likely to have major consequences, and all can
be mitigated by signaling that is at least integrity protected and
source authenticated, which prevents an attacker from changing it.</t>
</section>
<section title="Contributors">
<t>Morgan Lindqvist and Fredrik Jansson, both from Ericsson,
contributed important material to the first versions of this
document.</t>
</section>
<section anchor="sec-ack" title="Acknowledgements">
<t/>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include='reference.RFC.3311'?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.4568'?>
<?rfc include='reference.RFC.5109'?>
<?rfc include='reference.RFC.5234'?>
<?rfc include='reference.RFC.5285'?>
<?rfc include='reference.RFC.5576'?>
<?rfc include='reference.RFC.5888'?>
<?rfc include='reference.RFC.6236'?>
<?rfc include='reference.I-D.westerlund-avtext-rtcp-sdes-srcname'?>
<?rfc include='reference.I-D.westerlund-mmusic-max-ssrc'?>
<?rfc include='reference.I-D.westerlund-avtext-rtp-stream-pause'?>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.3264'?>
<?rfc include='reference.RFC.3569'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.5117'?>
<?rfc include='reference.RFC.5245'?>
<?rfc include='reference.RFC.6190'?>
<?rfc include='reference.I-D.ietf-avtcore-multiplex-guidelines'?>
<?rfc include='reference.I-D.ietf-avtcore-rtp-multi-stream'?>
<?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>
<?rfc include='reference.I-D.ietf-avtcore-rtp-topologies-update'?>
<?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>
<?rfc include='reference.I-D.lennox-raiarea-rtp-grouping-taxonomy'?>
</references>
<section anchor="appendix-a" title="Discussion on Receiver Diversity">
<t>Receiver diversity can be handled in a number of different ways, each
with its own advantages and disadvantages. The approaches trade off four
factors against each other: RTP Mixer processing requirements, bandwidth
usage on the uplink from the sending Participant to the RTP Mixer,
bandwidth usage on the downlink from the RTP Mixer to the receiving
Participant, and media Quality of Experience (QoE) at the receiving
Participant.</t>
<t>The following is a listing of possible approaches:<list
style="numbers">
<t>Lowest Common Denominator: Create a single Source Packet Stream
per Media Source and, assuming that everyone can receive a "simple"
stream, adapt the characteristics of that Source Packet Stream
already at the sending Participant to the lowest common denominator
among all receiving Participants. Let the RTP Mixer forward this
single Source Packet Stream to all receiving Participants. The
advantages are low bandwidth usage on both uplink and downlink and
low RTP Mixer processing requirements. The disadvantage is that the
least capable receiver and/or network path dictates the (low) QoE
for everyone else.</t>
<t>Individual Transcoding: Create a single Source Packet Stream per
Media Source with characteristics governed by resources available to
the sending Participant and the network path to the RTP Mixer. Let
the RTP Mixer transcode (decode and re-encode) that into individual
Source Packet Streams for each receiving Participant, governed by
the RTP Mixer resources, receiving Participant resources, and the
network path to that Participant. The advantages are QoE that is
adapted to each Participant, although slightly lowered overall by
the transcoding, and optimised bandwidth usage on both uplink and
downlink. The disadvantage is (very) high RTP Mixer processing
requirements.</t>
<t>Individual Simulcast: Create individual Source Packet Streams of
each Media Source for each receiving Participant, constituting a
complete individual Simulcast. Let the RTP Mixer forward each
individual Source Packet Stream to the targeted receiving
Participant. The advantages are low RTP Mixer processing and
optimised downlink bandwidth. The disadvantage is (very) high uplink
bandwidth.</t>
<t>Grouped Simulcast: For each Media Source, create a "suitable"
logical grouping of receiving Participants into sub-groups with
respect to available receiver resources, for example the resources
listed <xref target="sec-diverse-receivers">above</xref>. Create a
set of Source Packet Streams for this Media Source with well-chosen
characteristics, where each Source Packet Stream in the set is a
good-enough fit to the receiving sub-group of Participants. This set
of Source Packet Streams constitutes a Simulcast of the Media
Source. The size of the set and the characteristics of each Source
Packet Stream can be adjusted to cater for various restrictions in
the sending Participant, receiving Participants in the sub-group,
and network path(s) to the Participants in the sub-group. Let the
RTP Mixer forward the same Source Packet Stream to all Participants
in a sub-group, for all Source Packet Streams and sub-groups. The
advantages are low RTP Mixer processing, near optimum QoE, and near
optimum downlink bandwidth. The disadvantages are high uplink
bandwidth and, arguably, that downlink bandwidth and QoE are
optimised per sub-group rather than per individual receiving
Participant.</t>
</list>A summary of the advantages and disadvantages of the above four
principal alternatives is given <xref
target="tab-diversity">below</xref>:</t>
<texttable anchor="tab-diversity"
title="Receiver Diversity Handling Comparison">
<ttcol>Method</ttcol>
<ttcol>Mixer CPU</ttcol>
<ttcol>Uplink</ttcol>
<ttcol>Downlink</ttcol>
<ttcol>QoE</ttcol>
<c>1</c>
<c>Low</c>
<c>Low</c>
<c>Low</c>
<c>Low</c>
<c>2</c>
<c>Very high</c>
<c>Optimum</c>
<c>Optimum</c>
<c>Near optimum</c>
<c>3</c>
<c>Low</c>
<c>Very high</c>
<c>Optimum</c>
<c>Optimum</c>
<c>4</c>
<c>Low</c>
<c>High</c>
<c>Near optimum</c>
<c>Near optimum</c>
</texttable>
<t>The authors of this document believe that alternative 4, the Grouped
Simulcast, can be a good tradeoff whenever supported by sufficient
uplink resources.</t>
</section>
</back>
</rfc>