One document matched: draft-westerlund-avtcore-max-ssrc-02.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-max-ssrc-02"
ipr="trust200902">
<front>
<title abbrev="Multiple SSRC Signalling">Multiple Synchronization sources
(SSRC) in RTP Session Signaling</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<author fullname="Fredrik Jansson" initials="F." surname="Jansson">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region/>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile/>
<email>fredrik.k.jansson@ericsson.com</email>
<uri/>
</address>
</author>
<date day="16" month="July" year="2012"/>
<abstract>
<t>RTP has always been a protocol that supports multiple participants
each sending their own media streams in an RTP session. Unfortunately,
many implementations are designed only for point to point voice over IP
with a single source in each end-point. Even client implementations
aimed at video conferences have often been built with the assumption
around central mixers that only deliver a single media stream per media
type. Thus any application that wants to allow for more advanced usage,
where multiple media streams are sent and received by an end-point, has
an issue with legacy implementations. This document describes the
problem and proposes a signalling solution for how to use multiple SSRCs
within one RTP session and at the same time handle the legacy
issues.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>This document discusses the issues of non basic usage of <xref
target="RFC3550">RTP</xref> where there is multiple media sources sent
over an RTP session using the SSRC source identifier to distinguish
between the sources. This include multiple sources from the same
end-point, multiple end-points each having a source, or an application
that sends or receive multiple encodings of a particular media source
using multiple SSRCs.</t>
<section title="Background">
<t>RTP sessions are a concept which most fundamental part is an SSRC
space. This space can encompass a number of network nodes and
interconnected transport flows between these nodes. Each node may have
zero, one or more source identifiers (SSRCs) used to either identify a
real media source such as a camera or a microphone, a conceptual
source (like the most active speaker selected by an RTP mixer that
switches between incoming media streams based on the media stream or
additional information), or simply as an identifier for a receiver
that provides feedback and reports on reception. There are also RTP
nodes, like translators that are manipulating data, transport or
session state without making their presence aware to the other session
participants.</t>
<t>RTP was designed to support multiple participants in a session from
the beginning. This was not restricted to multicast as many believe
but also unicast using either multiple transport flows below RTP or a
network node that redistributes the RTP packets, either unchanged in
the form of a transport translator (relay) or modified in an RTP
mixer. There is also the case where a single end-point have multiple
media sources, like multiple cameras or microphones.</t>
<t>However, the most common use cases for RTP have been point to point
Voice over IP (VoIP) or streaming applications where there have
commonly not been more than one media source per end-point. Even in
conferencing applications, especially voice only, the conference focus
or bridge have provided a single stream being a mix of the other
participants to each participant. Thus there has been little need for
handling multiple SSRCs in implementations. This has resulted in an
installed legacy base that is not fully RTP specification compliant
and will have different issues if they receive multiple SSRCs of
media, either simultaneously or in sequence. These issues will
manifest themselves in various ways, either by software crashes, or
simply in limited functionality, like only decoding and playing back
the first or latest SSRC received and discarding any other SSRCs.</t>
<t>The signaling solutions around RTP, especially the <xref
target="RFC4566">SDP</xref> based, have not considered the fundamental
issues around an RTP session's theoretical support of up to 4 billion
plus sources all sending media. No end-point has infinite processing
resources to decode and mix any number of media sources. In addition
the memory for storing related state, especially decoder state is
limited, and the network bandwidth to receive multiple streams is also
limited. Today, the most likely limitations are processing and network
bandwidth although for some use cases memory or other limitations may
also exist. The issue is that a given end-point will have some
limitations in the number of streams it simultaneously can receive,
decode and playback. These limitations need to be possible to expose
and enabling the session participants to take them into account.</t>
<t>In similar ways there is a need for an end-point to express if it
intends to produce one or more media streams in an RTP session. Todays
SDP signaling support for this is basically the directionality
attribute which indicates an end-point intent to send media or not.
There is however no way to indicate how many media streams will be
sent.</t>
<t>Taking these things together there exist a clear need to enable the
usage of multiple simultaneous media streams within an RTP session in
a way that allows a system to take legacy implementations into account
in addition to negotiate the actual capabilities around the multiple
streams in an RTP session.</t>
</section>
</section>
<section title="Definitions">
<t/>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</section>
<section title="Terminology">
<t>The following terms and abbreviations are used in this
document:</t>
<t><list style="hanging">
<t hangText="Encoding:">A particular encoding is the choice of the
media encoder (codec) that has been used to compress the media,
the fidelity of that encoding through the choice of sampling,
bit-rate and other configuration parameters.</t>
<t hangText="Different encodings:">An encoding is different when
some parameter that characterize the encoding of a particular
media source has been changed. Such changes can be one or more of
the following parameters; codec, codec configuration, bit-rate,
sampling.</t>
</list></t>
</section>
</section>
<section title="Multiple Streams Issues">
<t>This section attempts to go a bit more in depth around the different
issues when using multiple media streams in an RTP session to make it
clear that although in theory multi-stream applications should already
be possible to use, there are good reasons to create extensions for
signaling. In addition, the RTP specification could benefit from
clarifications on how certain mechanisms should be working when an RTP
session contains more than two SSRCs.</t>
<section title="Legacy Behaviors">
<t>It is a common assumption among many applications using RTP that
they do not have a need to support more than one incoming and one
outgoing media stream per RTP session. For a number of applications
this assumption has been correct. For VoIP and Streaming applications
it has been easiest to ensure that a given end-point only receives
and/or sends a single stream. However, all end-points should support a
source changing SSRC value during a session, e.g due to SSRC value
collision between participants in a conference and the requirement to
always use unique SSRC values.</t>
<t>Some RTP extension mechanisms require the RTP stacks to handle
additional SSRCs, like SSRC multiplexed RTP retransmission described
in <xref target="RFC4588"/>. However, that still has only required
handling a single media decoding chain.</t>
<t>There are however applications that clearly can benefit from
receiving and using multiple media streams simultaneously. A very
basic case would be T.140 conversational text, where the text
characters are transmitted as a real-time media stream as you type.
When used in a multi-party chat scenario, an end-point can receive
input from multiple sending end-points where the <xref
target="RFC4103">T.140 RTP Payload Format</xref> text media is both
low bandwidth and where there is no obvious method to algorithmically
distinguish between multiple sources of text, making simple multiplex
and identification of separate sources through an identifier (SSRC) a
good choice.</t>
<t>An RTP session that contains an end-point with more than two SSRCs
actively sending media streams puts some requirements on the receiving
client, which is not necessarily fulfilled by a legacy client:</t>
<t><list style="numbers">
<t>The receiving client needs to handle receiving more than one
stream simultaneously rather than replacing the already existing
stream with the new one.</t>
<t>Be capable of decoding multiple streams simultaneously.</t>
<t>Be capable of rendering multiple streams simultaneously.</t>
</list></t>
<t>An application using multiple streams may be very similar to
existing one media stream applications at signaling level. To avoid
connecting two different implementations, one that is built to support
multiple streams and one that is not, it is important that the
capabilities are signaled. This also enables a particular application
to separate legacy clients that do not signal it explicitly from the
ones that do, and take appropriate measures for the legacy
application. Due to that there exist both legacy applications that
only handle a single stream and applications that handle multiple
streams, a single authoritative interpretation cannot be defined by
this specification.</t>
<t>There exist many models for legacy behaviors when it comes to
handling multiple SSRCs. This briefly describes some of the more
common ones:<list style="hanging">
<t hangText="Single SSRC per Direction:">As previously discussed
there exist a large number of applications where each end-point
has only a single SSRC and the number of end-points in the RTP
session are only two. This class of applications needs to have a
legacy fallback behavior that provide only a single SSRC, or
issues will likely occur.</t>
<t
hangText="Multiple SSRCs per Direction Single Media Stream:">There
exist some applications that uses multiple concurrent SSRCs but in
the end only consumes a single media stream. This include the
streaming applications using RTP retransmission with SSRC
multiplexing, but also applications switching between sources
although only consuming one at a time. When an offer or answer
without the signalling extension is received, the end-point can
potentially determine this by inspecting payload types in use and
correctly determine the legacy behavior. In some cases it must be
determined through application context or SDP external signaling
indications.</t>
<t hangText="Multi-Party Each Using Multiple SSRCs:">Multicast
applications commonly have multiple SSRCs, if for no other reason
than the existence of multiple end-points in the same RTP session.
Similar considerations exist in multi-party applications that uses
central nodes. There may be no indications in the SDP regarding
how the application will handle multiple concurrent SSRCs and the
expectancy to decode these. Thus also this legacy behavior must
commonly be determined from the application context and external
signaling.</t>
</list></t>
</section>
<section title="Receiver Limitations">
<t>An RTP end-point that intends to process the media in an RTP
session needs to have sufficient resources to receive and process all
the incoming streams. It is extremely likely that no receiver is
capable to handle the theoretical upper limit of more than 4 billion
media sources in an RTP session. Instead, one or more properties will
limit the end-points' capabilities to handle simultaneous media
streams. These properties are for example memory, processing, network
bandwidth, memory bandwidth, or rendering estate to mention a few
possible limitations. Those limits in number of simultaneous streams
may also differ depending on what codec (payload type) that is
used.</t>
<t>We have also considered the issue of how many simultaneous
non-active sources an end-point can handle. We cannot see that
inactive media sending SSRCs result in significant resource
consumption and there should thus be no need to limit them.</t>
<t>A potential issue that needs to be acknowledged is where a limited
set of simultaneously active sources varies within a larger set of
session members. As each media decoding chain may contain state, it is
important that a receiver can flush a decoding state for an inactive
source and if that source becomes active again it does not assume that
this previous state exists. Thus, we see need for a signaling solution
that allows a receiver to indicate its upper limit in terms of
capability to handle simultaneous media streams. We see little need
for an upper limitation of RTP session members. Applications will need
to account for its own capability to use different codecs
simultaneously when choosing general and payload specific limits.</t>
</section>
<section title="Transmission Declarations">
<t>In an RTP based system where an end-point may either be legacy or
has an explicit upper limit in the number of simultaneous streams, one
will encounter situations where the end-point can not receive and
process all simultaneous active streams in the session. Instead, the
sending end-points or central nodes, like RTP mixers, will have to
provide the end-point with a selected set of streams based on various
metrics, such as most active, most interesting, or user selected. In
addition, the central node may combine multiple media streams using
mixing or composition into a new media stream to enable an end-point
to get sufficient source coverage in the session, despite existing
limitations.</t>
<t>For such a system to be able to correctly determine the need for
central processing, the capabilities needed for such a central
processing node, and the potential need for an end-point to do sender
side limitations, it is necessary for an end-point to declare how many
simultaneous streams it may send. Thus, enabling negotiation of the
number of streams an end-point sends. The limit in number of
simultaneous streams may also differ depending on what codec (payload
type) that is used.</t>
</section>
</section>
<section title="Multiple Streams SDP Extension">
<t>This section describes an extension of the media-level SDP attributes
to support signaling of the end point's multiple stream
capabilities.</t>
<section title="Signaling Support for Multiple Streams">
<t>A solution to the issues described in the previous section needs
to:<list style="symbols">
<t>Enable signaling between the RTP sender and receiver how many
simultaneous RTP streams that can be handled, including a
possibility to specify a different number of streams depending on
what codec (payload type) that is used.</t>
<t>Be able to handle the case where the number of RTP streams that
can be sent from a client do not match the number of streams that
can be received by the same client.</t>
</list></t>
<t>It is also a requirement that a multiple streams capable RTP sender
MUST be able to adapt the number of sent streams to the RTP receiver
capability.</t>
<t>For this purpose and for use in SDP, two new media-level SDP
attributes are defined, max-send-ssrc and max-recv-ssrc, which can be
used independently to establish a limit to the number of
simultaneously active SSRCs for the send and receive directions,
respectively. Active SSRCs are the ones counted as senders according
to <xref target="RFC3550"/>, i.e. they have sent RTP packets during
the last two regular RTCP reporting intervals. Alternatively, if the
end-points supports <xref
target="I-D.westerlund-avtext-rtp-stream-pause">PAUSE and RESUME
signaling</xref>, any stream that the media sender sent a PAUSED
indication for can be regarded as non-Active and can immediately be
replaced by another SSRC, considering the limits established by this
specification, including possible changes in the applicable limit
required by a change of codec.</t>
<t>The syntax for the attributes is in ABNF <xref
target="RFC5234"/>:</t>
<figure>
<artwork><![CDATA[
max-ssrc = "a="( "max-send-ssrc:" / "max-recv-ssrc:" ) alt-list
alt-list = alt-set *(WSP alt-set)
alt-set = "{" alt *("&" alt)) "}"
alt = pt ":" limit
pt = ( pt-list / pt-wildcard )
pt-list = ( pt-value / pt-range ) *(","( pt-value / pt-range ))
pt-value = 1*3DIGIT
pt-range = pt-value "-" pt-value
pt-wildcard = "*"
limit = 1*8DIGIT
; WSP and DIGIT defined in [RFC5234]
]]></artwork>
</figure>
<t>A Payload Type (PT)-agnostic upper limit to the total number of
simultaneous SSRCs that can be sent or received in this RTP session is
signaled with a "*" instead of the PT number. A value of 0 MAY be used
as maximum number of SSRC, but it is then RECOMMENDED that this is
also reflected using the sendonly or recvonly attribute.</t>
<t>A PT-specific upper limit to the total number of simultaneous SSRCs
in the RTP session with that specific PT is signaled with a defined PT
(static, or dynamic through rtpmap). Multiple, alternative sets of PT
and limit MAY be specified on the same line, where each set indicates
the codec-dependent limit when a certain PT is used. A combination of
a comma-separated list of PT and a range of PT sharing a single limit
MAY be used. Within the alternative set, there MAY be a specification
of limits valid when different PT are used simultaneously. Any
PT-agnostic specification on a line MUST be interpreted as valid for
any PT that was not included within an explicit limit within that
alternative set. Multiple lines with max-send-ssrc or max-recv-ssrc
attributes specifying a single PT MAY be used, but MUST NOT contain
conflicting limits. PT values that are not defined in the media block
MUST be ignored.</t>
<t>When max-send-ssrc or max-recv-ssrc are not included in the SDP, it
is to be interpreted in the application's context. The default number
of allowed SSRCs can vary depending on the type of application.</t>
</section>
<section title="Declarative Use">
<t>When used as a declarative media description, the specified limit
in max-send-ssrc indicates the maximum number of simultaneous streams
of the specified payload types that the configured end-point may send
at any single point in time. Similarly, max-recv-ssrc indicates the
maximum number of simultaneous streams of the specified payload types
that may be sent to the configured end-point. Payload-agnostic limits
MAY be used with or without additional payload-specific limits.</t>
</section>
<section title="Use in Offer/Answer">
<t>When used in an offer <xref target="RFC3264"/>, the specified
limits indicate the agent's intent of sending and/or capability of
receiving that number of simultaneous SSRCs. An offerer supporting
this specification MUST include both attributes for sendrecv media
streams, even if one or both has a value of 1. For sendonly or
recvonly m= blocks, the one matching the Offer/Answer agent's role
MUST be included when using this extension, and the other
directionality MAY be included for informational purpose, if
bi-directionality can potentially be used in the future for this m=
block. The answerer MUST reverse the directionality of recognized
attributes such that max-send-ssrc becomes max-recv-ssrc and vice
versa. The answerer SHOULD modify the offered limits in the answer to
suit the answering client's capability and intentions. A sender MUST
NOT send more simultaneous streams of the specified payload type(s)
than the receiver has indicated ability to receive, taking into
account also any payload type-agnostic limit.</t>
<t>In case an answer fails to include any of the limitation
attributes, the agent is RECOMMENDED to have defined a suitable
interpretation for the application context. This may be the choice of
being capable of supporting only a single stream (SSRC) in the
direction for which attributes are missing. It may also indicate
multiple SSRC support depending on the application. In case the offer
lacks both max-send-ssrc and max-recv-ssrc, they MUST NOT be included
in the answer.</t>
</section>
<section title="Examples">
<t>The SDP examples below are not complete. Only relevant parts have
been included, for brevity and readability.</t>
<figure>
<artwork><![CDATA[
m=video 49200 RTP/AVP 99
a=rtpmap:99 H264/90000
a=max-send-ssrc:{*:2}
a=max-recv-ssrc:{*:4}]]></artwork>
</figure>
<t>An offer with a stated intention of sending 2 simultaneous SSRCs
and a capability to receive 4 simultaneous SSRCs.</t>
<figure>
<artwork><![CDATA[
m=video 50324 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=max-recv-ssrc:{96:2&97:3} {96:1&97:4} {97:5}
a=max-send-ssrc:{* 1}]]></artwork>
</figure>
<t>An offer to receive at most 5 SSRCs, at most 2 of which using
payload type 96 and the rest using payload type 97. The "max-
send-ssrc" is used to explicitly indicate the value of 1.</t>
<figure>
<artwork><![CDATA[
m=video 50324 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=rtpmap:98 H263/90000
a=max-recv-ssrc:{96:2&97:3} {96-97:2&98:1} {96,98:2&97:1}
a=max-recv-ssrc:{96,98:1&97:3} {96:1&97-98:2} {96-97:1&98:3}
a=max-recv-ssrc:{96:1&98:4} {97:3&98:2} {97:2&98:3} {97:1&98:4}
a=max-recv-ssrc:{98:5}
a=max-send-ssrc:{* 1}]]></artwork>
</figure>
<t>An offer to receive at most 5 SSRCs, at most 2 of which using
payload type 96, at most 3 of which using payload type 97, and at most
5 using payload type 98. Since there are many combinations, more than
one "max-recv-ssrc" line is used, which is OK since no specifications
on those lines are in conflict.</t>
</section>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document registers two media level SDP attributes.</t>
<!--MW: Need to add registration template.-->
</section>
<section anchor="Security" title="Security Considerations">
<t>The SDP attributes defined in this document "a=max-recv-ssrc" and
"a=max-send-ssrc" signals capabilities of the end-point. Thus they are
vulnerable to attacks. The primary security concerns would be with third
parties that modifies the values of the attributes or inserts the
attributes in a signalling context. Thus changing the peers view of the
others peers capabilities and proposals. A modification reducing either
of send or receive values will degrade the service, potentially
preventing the service all together. Increasing the value or inserting
the attribute with a value different from 1 have the potential of being
even more effective. It can result in that an end-point that only
supports a single stream will be sent multiple streams. First of all,
this potentially exposes software flaws regarding handling of multiple
streams, thus causing crashes, less severe it can cause media
degradation as the receiving entity flaps between media streams, or
plays only a single one, where the other side assumes both will be
played. In addition, negotiation of several streams has transport
impact, potentially increasing the bit-rate consumed towards the
end-point, and in addition forcing an adaptation response over a limited
path, thus degrading the media stream the end-point may play out.</t>
<t>To prevent third party manipulation of the SDP, it should be source
authenticated and integrity protected. The solution suitable for this
depends on the signalling protocol being used. For <xref
target="RFC3261">SIP S/MIME</xref> is the ideal, and hop by hop TLS
provides at least some protection, although not perfect. For SDPs
retrieved using <xref target="RFC2326">RTSP DESCRIBE</xref>, TLS would
be the RECOMMENDED solution.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include='reference.RFC.2119'?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.RFC.5234'?>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.2326'?>
<?rfc include='reference.RFC.3261'?>
<?rfc include='reference.RFC.3264'?>
<?rfc include='reference.RFC.4103'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.I-D.westerlund-avtext-rtp-stream-pause'?>
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 14:29:21 |