One document matched: draft-westerlund-avtcore-max-ssrc-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc2326 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2326.xml">
<!ENTITY rfc3261 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml">
<!ENTITY rfc3264 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml">
<!ENTITY rfc3550 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc4103 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4103.xml">
<!ENTITY rfc4566 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml">
<!ENTITY rfc4588 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4588.xml">
<!ENTITY rfc5234 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-max-ssrc-00"
ipr="trust200902">
<front>
<title abbrev="Multiple SSRC Signalling">Multiple Synchronization sources
(SSRC) in RTP Session Signaling</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<author fullname="Fredrik Jansson" initials="F." surname="Jansson">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>Kista</city>
<region></region>
<code>SE-164 80</code>
<country>Sweden</country>
</postal>
<phone>+46 10 719 00 00</phone>
<facsimile></facsimile>
<email>fredrik.k.jansson@ericsson.com</email>
<uri></uri>
</address>
</author>
<date day="24" month="October" year="2011" />
<abstract>
<t>RTP has always been a protocol that supports multiple participants
each sending their own media streams in an RTP session. Unfortunately
many implementations are designed only for point to point voice over IP
with a single source in each end-point. Even client implementations
aimed at video conferences have often been built with the assumption
around central mixers that only deliver a single media stream per media
type. Thus any application that wants to allow for more advance usage
where multiple media streams are sent and received by an end-point has
an issue with legacy implementations. This document describes the
problem and proposes a solution for how to use multiple SSRCs within one
RTP session and at the same time handle the legacy issues.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>This document discusses the issues of non basic usage of <xref
target="RFC3550">RTP</xref> where there is multiple media sources sent
over an RTP session using the SSRC source identifier to distinguish
between the sources. This include multiple sources from the same
end-point, multiple end-points each having a source, or an application
that sends or receive multiple encodings of a particular source.</t>
<section title="Background">
<t>RTP sessions are a concept which most fundamental part is an SSRC
space. This space can encompass a number of network nodes and
interconnected transport flows between these nodes. Each node may have
zero, one or more source identifiers (SSRCs) used to either identify a
real media source such as a camera or a microphone, a conceptual
source (like the most active speaker selected by an RTP mixer that
switches between incoming media streams based on the media stream or
additional information), or simply as an identifier for a receiver
that provides feedback and reports on reception. There are also RTP
nodes, like translators that are manipulating data, transport or
session state without making their presence aware to the other session
participants.</t>
<t>RTP was designed with multiple participants in a session from the
beginning. This was not restricted to multicast as many believe but
also unicast using either multiple transport flows below RTP or a
network node that redistributes the RTP packets, either unchanged in
the form of a transport translator (relay) or modified in an RTP
mixer. There is also the case where a single end-point have multiple
media sources of the same media type, like multiple cameras or
microphones.</t>
<t>However, the most common use cases have been point to point Voice
over IP (VoIP) or streaming applications where there have commonly not
been more than one media source per end-point. Even in conferencing
applications, especially voice only, the conference focus or bridge
have provided a single stream being a mix of the other participants to
each participant. Thus there has been little need for handling
multiple SSRCs in implementations. This has resulted in an installed
legacy base that is not fully RTP specification compliant and will
have different issues if they receive multiple SSRCs of media, either
simultaneously or in sequence. These issues will manifest themselves
in various ways, either by software crashes, or simply in limited
functionality, like only decoding and playing back the first or latest
SSRC received and discarding any other SSRCs.</t>
<t>The signaling solutions around RTP, especially the <xref
target="RFC4566">SDP</xref> based, have not considered the fundamental
issues around an RTP session's theoretical support of up to 4 billion
plus sources all sending media. No end-point has infinite processing
resources to decode and mix any number of media sources. In addition
the memory for storing related state, especially decoder state is
limited, and the network bandwidth to receive multiple streams is also
limited. Today, the most likely limitations are processing and network
bandwidth although for some use cases memory or other limitations may
also exist. The issue is that a given end-point will have some
limitations in the number of streams it simultaneously can receive,
decode and playback. These limitations need to be possible to expose
and enabling the session participants to take them into account.</t>
<t>In similar ways there is a need for an end-point to express if it
intends to produce one or more media streams in an RTP session. Todays
SDP signaling support for this is basically the directionality
attribute which indicates an end-point intent to send media or not.
There is however no way to indicate how many media streams will be
sent.</t>
<t>Taking these things together there exist a clear need to enable the
usage of multiple simultaneous media streams within an RTP session in
a way that allows a system to take legacy implementations into account
in addition to negotiate the actual capabilities around the multiple
streams in an RTP session.</t>
</section>
</section>
<section title="Definitions">
<t></t>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"></xref>.</t>
</section>
<section title="Terminology">
<t>The following terms and abbreviations are used in this
document:</t>
<t><list style="hanging">
<t hangText="Encoding:">A particular encoding is the choice of the
media encoder (codec) that has been used to compress the media,
the fidelity of that encoding through the choice of sampling,
bit-rate and other configuration parameters.</t>
<t hangText="Different encodings:">An encoding is different when
some parameter that characterize the encoding of a particular
media source has been changed. Such changes can be one or more of
the following parameters; codec, codec configuration, bit-rate,
sampling.</t>
</list></t>
</section>
</section>
<section title="Multiple Streams Issues">
<t>This section attempts to go a bit more in depth around the different
issues when using multiple media streams in an RTP session to make it
clear that although in theory multi-stream applications should already
be possible to use, there are good reasons to create extensions for
signaling. In addition, the RTP specification could benefit from
clarifications on how certain mechanisms should be working when an RTP
session contains more than two SSRCs.</t>
<section title="Legacy Behaviors">
<t>It is a common assumption among many applications using RTP that
they do not have a need to support more than one incoming and one
outgoing media stream per RTP session. For a number of applications
this assumption has been correct. For VoIP and Streaming applications
it has been easiest to ensure that a given end-point only receives
and/or sends a single stream. However, all end-points should support a
source changing SSRC value during a session, e.g due to SSRC value
collision between participants in a conference and the requirement to
always use unique SSRC values.</t>
<t>Some RTP extension mechanisms require the RTP stacks to handle
additional SSRCs, like SSRC multiplexed RTP retransmission described
in <xref target="RFC4588"></xref>. However, that still has only
required handling a single media decoding chain.</t>
<t>There are however applications that clearly can benefit from
receiving and using multiple media streams simultaneously. A very
basic case would be T.140 conversational text, where the text
characters are transmitted as a real-time media stream as you type.
When used in a multi-party chat scenario, an end-point can receive
input from multiple sending end-points where the <xref
target="RFC4103">T.140 RTP Payload Format</xref> text media is both
low bandwidth and where there is no obvious method to algorithmically
distinguish between multiple sources of text, making simple multiplex
and identification of separate sources through an identifier (SSRC) a
good choice.</t>
<t>An RTP session that contains an end-point with more than two SSRCs
actively sending media streams put some requirements on the receiving
client which is not necessarily fulfilled by a legacy client:</t>
<t><list style="numbers">
<t>The receiving client needs to handle receiving more than one
stream simultaneously rather than replacing the already existing
stream with the new one.</t>
<t>Be capable of decoding multiple streams simultaneously.</t>
<t>Be capable of rendering multiple streams simultaneously.</t>
</list></t>
<t>An application using multiple streams may be very similar to
existing one media stream applications at signaling level. To avoid
connecting two different implementations, one that is built to support
multiple streams and one that is not, it is important that the
capabilities are signaled. It is also the legacy that makes us use a
basic assumption in the solution. Anyone that does not explicitly
indicate capability to receive multiple media streams is assumed to
only handle a single media, to avoid affecting legacy clients.</t>
</section>
<section title="Receiver Limitations">
<t>An RTP end-point that intends to process the media in an RTP
session needs to have sufficient resources to receive and process all
the incoming streams. It is extremely likely that no receiver is
capable to handle the theoretical upper limit of more than 4 billion
media sources in an RTP session. Instead, one or more properties will
limit the end-points' capabilities to handle simultaneous media
streams. These properties are for example memory, processing, network
bandwidth, memory bandwidth, or rendering estate to mention a few
possible limitations.</t>
<t>We have also considered the issue of how many simultaneous
non-active sources an end-point can handle. We cannot see that
inactive media sending SSRCs result in significant resource
consumption and there should thus be no need to limit them.</t>
<t>A potential issue that needs to be acknowledged is where a limited
set of simultaneously active sources varies within a larger set of
session members. As each media decoding chain may contain state, it is
important that a receiver can flush a decoding state for an inactive
source and if that source becomes active again it does not assume that
this previous state exists. Thus, we see need for a signaling solution
that allows a receiver to indicate its upper limit in terms of
capability to handle simultaneous media streams. We see little need
for an upper limitation of RTP session members. Applications will need
to account for its own capability to use different codecs
simultaneously when choosing general and payload specific limits.</t>
</section>
<section title="Transmission Declarations">
<t>In an RTP based system where an end-point may either be legacy or
has an explicit upper limit in the number of simultaneous streams, one
will encounter situations where the end-point can not receive and
process all simultaneous active streams in the session. Instead the
sending end-points or central nodes, like RTP mixers, will provide the
end-point with a selected set of streams based on various metrics,
such as most active, most interesting, or user selected. In addition,
the central node may combine multiple media streams using mixing or
composition into a new media stream to enable an end-point to get
sufficient source coverage in the session, despite existing
limitations.</t>
<t>For such a system to be able to correctly determine the need for
central processing, the capabilities needed for such a central
processing node, and the potential need for an end-point to do sender
side limitations, it is necessary for an end-point to declare how many
simultaneous streams it may send. Thus, enabling negotiation of the
number of streams an end-point sends.</t>
</section>
</section>
<section title="Multiple Streams Extension">
<t>This section describes an extension of the media-level SDP attributes
to support signaling of the end points multiple stream capabilities.</t>
<section title="Signaling Support for Multiple Streams">
<t>A solution to the issues described in the previous section needs
to:<list style="symbols">
<t>Enable signaling between the RTP sender and receiver how many
simultaneous RTP streams that can be handled.</t>
<t>Be able to handle the case where the number of RTP streams that
can be sent from a client do not match the number of streams that
can be received by the same client.</t>
</list></t>
<t>It is also a requirement that a multiple streams capable RTP sender
MUST be able to adapt the number of sent streams to the RTP receiver
capability.</t>
<t>For this purpose and for use in SDP, two new media-level SDP
attributes are defined, max-send-ssrc and max-recv-ssrc, which can be
used independently to establish a limit to the number of
simultaneously active SSRCs for the send and receive directions,
respectively. Active SSRCs are the ones counted as senders according
to <xref target="RFC3550"></xref>, i.e. they have sent RTP packets
during the last two regular RTCP reporting intervals.</t>
<!--MW: The definition in the last sentence should likely be changed. Because we like to be able
to reuse a channle quicker than several seconds. -->
<t>The syntax for the attributes is in ABNF <xref
target="RFC5234"></xref>:</t>
<figure>
<artwork><![CDATA[
max-ssrc = "a="("max-send-ssrc:" / "max-recv-ssrc:") PT 1WSP limit
PT = "*" / 1*3DIGIT
limit = 1*8DIGIT
;WSP and DIGIT defined in [RFC5234]
]]></artwork>
</figure>
<t>A payload type-agnostic upper limit to the total number of
simultaneous SSRCs that can be sent or received in this RTP session is
signaled with a * instead of the payload type number. A value of 0 MAY
be used as maximum number of SSRC, but it is then RECOMMENDED that
this is also reflected using the sendonly or recvonly attribute. There
MUST be at most one payload type-agnostic limit specified in each
direction.</t>
<t>A payload type-specific upper limit to the total number of
simultaneous SSRCs in the RTP session with that specific payload type
is signaled with a defined payload type (static, or dynamic through
rtpmap). Multiple lines with max-send-ssrc or max-recv-ssrc attributes
specifying a single payload type MAY be used, each line providing a
limitation for that specific payload type. Payload types that are not
defined in the media block MUST be ignored.</t>
<!--MW: Consider to change "payloat type" to PT in this text. -->
<t>If a payload type-agnostic limit is present in combination with one
or more payload type-specific ones, the total number of payload
type-specific SSRCs are additionally limited by the payload
type-agnostic number. When there are multiple lines with payload
type-specific limits, the sender or receiver MUST be able to handle
any combination of the SSRCs with different payload types that fulfill
all of the payload type specific limitations, with a total number of
SSRCs up to the payload type-agnostic limit.</t>
<t>When max-send-ssrc or max-recv-ssrc are not included in the SDP, it
MUST be interpreted as equivalent to a limit of one, unless sendonly
or recvonly attributes are specified, in which case the limit is
implicitly zero for the corresponding unused direction.</t>
</section>
<section title="Declarative Use">
<t>When used as a declarative media description, the specified limit
in max-send-ssrc indicates the maximum number of simultaneous streams
of the specified payload types that the configured end-point may send
at any single point in time. Similarly, max-recv-ssrc indicates the
maximum number of simultaneous streams of the specified payload types
that may be sent to the configured end-point. Payload-agnostic limits
MAY be used with or without additional payload-specific limits.</t>
</section>
<section title="Use in Offer/Answer">
<t>When used in an offer <xref target="RFC3264"></xref>, the specified
limits indicate the agent's intent of sending and/or capability of
receiving that number of simultaneous SSRCs. The answerer MUST reverse
the directionality of recognized attributes such that max-send-ssrc
becomes max-recv-ssrc and vice versa. The answerer SHOULD modify the
offered limits in the answer to suit the answering client's capability
and intentions. A sender MUST NOT send more simultaneous streams of
the specified payload type than the receiver has indicated ability to
receive, taking into account also any payload type-agnostic limit.</t>
<t>In case an answer fails to include any of the limitation
attributes, the agent SHOULD be interpreted as capable of supporting
only a single stream in the direction for which attributes are
missing. If the offer lacks attributes it SHOULD be assumed that the
offerer only supports a single stream in each direction. In case the
offer lack both max-send-ssrc and max-recv-ssrc, they MUST NOT be
included in the answer.</t>
</section>
<section title="Examples">
<t>The SDP examples below are not complete. Only relevant parts have
been included.</t>
<figure>
<artwork><![CDATA[
m=video 49200 RTP/AVP 99
a=rtpmap:99 H264/90000
a=max-send-ssrc:* 2
a=max-recv-ssrc:* 4]]></artwork>
</figure>
<t>An offer with a stated intention of sending 2 simultaneous SSRCs
and a capability to receive 4 simultaneous SSRCs.</t>
<figure>
<artwork><![CDATA[
m=video 50324 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:97 5
a=max-recv-ssrc:* 5]]></artwork>
</figure>
<t>An offer to receive at most 5 SSRCs, at most 2 of which using
payload type 96 and the rest using payload type 97. By not including
"max- send-ssrc" the value is implicitly set to 1.</t>
<figure>
<artwork><![CDATA[
m=video 50324 RTP/AVP 96 97 98
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=rtpmap:98 H263/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:97 3
a=max-recv-ssrc:98 5
a=max-recv-ssrc:* 5]]></artwork>
</figure>
<t>An offer to receive at most 5 SSRCs, at most 2 of which using
payload type 96, and at most 3 of which using payload type 97, and at
most 5 using payload type 98. Permissible payload type combinations
include those with no streams at all for one or more of the payload
types, as well as a total number of SSRCs less than 5, e.g. two SSRCs
with PT=96 and three SSRCs with PT=97, or one SSRC with PT=96, one
with PT=97 and two with PT=98.</t>
</section>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document registers two media level SDP attributes.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The SDP attributes defined in this document "a=max-recv-ssrc" and
"a=max-send-ssrc" signals capabilities of the end-point. Thus they are
vulnerable to attacks. The primary security concerns would be with third
parties that modifies the values of the attributes or inserts the
attributes in a signalling context. Thus changing the peers view of the
others peers capabilities and proposals. A modification reducing either
of send or receive values will degrade the service, potentially
preventing the service all together. Increasing the value or inserting
the attribute with a value different from 1 have the potential of being
even more effective. It can result in that an end-point that only
supports a single stream receives multiple streams. First of all
potentially exposing software flaws regarding handling of multiple
streams, thus causing crashes, less severe it can cause media
degradation as the receiving entity flaps between media streams, or
plays only a single one, where the other side assumes both will be
played. In addition negotiation several streams has transport impact,
potentially increasing the bit-rate consumed towards the end-point, and
in addition forcing a adaptation response over a limited path thus
degrading the media stream the end-point may play out.</t>
<t>To prevent third party manipulation of the SDP it should be source
authenticated and integrity protected. The solution suitable for this
depends on the signalling protocol being used. For <xref
target="RFC3261">SIP S/MIME</xref> are the ideal, and hop by hop TLS
provides at least some protection, although not perfect. For SDP's
retrieved using <xref target="RFC2326">RTSP DESCRIBE</xref> TLS would be
the RECOMMENDED solution.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc3550;
&rfc5234;
</references>
<references title="Informative References">
&rfc2326;
&rfc3261;
&rfc3264;
&rfc4103;
&rfc4566;
&rfc4588;
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 14:30:31 |