One document matched: draft-ietf-avtcore-multi-media-rtp-session-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-avtcore-multi-media-rtp-session-00"
ipr="trust200902" updates="3550">
<front>
<title abbrev="Multiple Media Types in an RTP Session">Multiple Media
Types in an RTP Session</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Colin Perkins" initials="C. " surname="Perkins">
<organization>University of Glasgow</organization>
<address>
<postal>
<street>School of Computing Science</street>
<city>Glasgow</city>
<code>G12 8QQ</code>
<country>United Kingdom</country>
</postal>
<email>csp@csperkins.org</email>
</address>
</author>
<author fullname="Jonathan Lennox" initials="J." surname="Lennox">
<organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address>
<postal>
<street>433 Hackensack Avenue</street>
<street>Seventh Floor</street>
<city>Hackensack</city>
<region>NJ</region>
<code>07601</code>
<country>US</country>
</postal>
<email>jonathan@vidyo.com</email>
</address>
</author>
<date day="8" month="October" year="2012"/>
<workgroup>AVTCORE WG</workgroup>
<abstract>
<t>This document specifies how an RTP session can contain media streams
with media from multiple media types such as audio, video, and text.
This has been restricted by the RTP Specification, and thus this
document updates RFC 3550 to enable this behavior for applications that
satisfy the applicability for using multiple media types in a single RTP
session.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>When the <xref target="RFC3550">Real-time Transport Protocol
(RTP)</xref> was designed, close to 20 years ago, IP networks were very
different compared to the ones in 2012 when this is written. The almost
ubiquitous deployment of Network Address Translators (NAT) and Firewalls
has increased the cost and likely-hood of communication failure when
using many different transport flows. Thus there exists a pressure to
reduce the number of concurrent transport flows.</t>
<t><xref target="RFC3550">RTP</xref> as defined recommends against
having multiple media types, like audio and video, in the same RTP
session. The motivation for this is dependent on particular usage or
dependencies on lower layer Quality of Service (QoS). When these aren't
present, there are no strong RTP reasons for not allowing multiple media
types in one RTP session. However, the <xref target="RFC4566">Session
Description Protocol (SDP)</xref>, as one of the dominant signalling
method for establishing RTP session, has enforced this rule, by not
allowing multiple media types for a given receiver destination or set of
ICE candidates, which is the most common method to determine which RTP
session the packets are intended for.</t>
<t>The fact that these limitations have been in place for so long a
time, in addition to RFC 3550 being written without fully considering
multiple media types in an RTP session, does result in a number of
considerations being needed. This document provides such considerations
regarding applicability as well as functionality, including normative
specification of behavior.</t>
<t>First, some basic definitions are provided. This is followed by a
background that discusses the motivation in more detail. A overview of
the solution of how to provide multiple media types in one RTP session
is then presented. Next is the formal applicability this specification
have followed by the normative specification. This is followed by a
discussion how some RTP/RTCP Extensions should function in the case of
multiple media types in one RTP session. A specification of the
requirements on signalling from this specification and a look how this
is realized in SDP using <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">Bundle</xref>. The
document ends with the security considerations.</t>
</section>
<section title="Definitions">
<t/>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</section>
<section title="Terminology">
<t>The following terms are used with supplied definitions:<list
style="hanging">
<t hangText="Endpoint:">A single entity sending or receiving RTP
packets. It may be decomposed into several functional blocks, but
as long as it behaves as a single RTP stack entity it is
classified as a single endpoint.</t>
<t hangText="Media Stream:">A sequence of RTP packets using a
single SSRC that together carries part or all of the content of a
specific Media Type from a specific sender source within a given
RTP session.</t>
<t hangText="Media Type:">Audio, video, text or application whose
form and meaning are defined by a specific real-time
application.</t>
<t hangText="RTP Session:">As defined by <xref target="RFC3550"/>,
the endpoints belonging to the same RTP Session are those that
share a single SSRC space. That is, those endpoints can see an
SSRC identifier transmitted by any one of the other endpoints. An
endpoint can receive an SSRC either as SSRC or as CSRC in RTP and
RTCP packets. Thus, the RTP Session scope is decided by the
endpoints' network interconnection topology, in combination with
RTP and RTCP forwarding strategies deployed by endpoints and any
interconnecting middle nodes.</t>
<t/>
</list></t>
<t/>
</section>
</section>
<section title="Motivation ">
<t>This section discusses in more detail the main motivations why
allowing multiple media types in the same RTP session is suitable.</t>
<section title="NAT and Firewalls">
<t>The existence of NATs and Firewalls at almost all Internet access
has had implications on protocols like RTP that were designed to use
multiple transport flows. First of all, the NAT/FW traversal solution
one uses needs to ensure that all these transport flows are
established. This has three different impacts:<list style="numbers">
<t>Increased delay to perform the transport flow establishment</t>
<t>The more transport flows, the more state and the more resource
consumption in the NAT and Firewalls. When the resource
consumption in NAT/FWs reaches their limits, unexpected behaviors
usually occur.</t>
<t>More transport flows means a higher risk that some transport
flow fails to be established, thus preventing the application to
communicate.</t>
</list></t>
<t>Using fewer transport flows reduces the risk of communication
failure, improved establishment behavior and less load on NAT and
Firewalls.</t>
</section>
<section title="No Transport Level QoS">
<t>Many RTP-using applications don't utilize any network level Quality
of Service functions. Nor do they expect or desire any separation in
network treatment of its media packets, independent of whether they
are audio, video or text. When an application has no such desire, it
doesn't need to provide a transport flow structure that simplifies
flow based QoS.</t>
</section>
<section title="Architectural Equality">
<t>For applications that don't desire any type of different treatment,
neither on the transport level nor in RTP or RTCP reporting, using the
same RTP session for both media types appears a reasonable choice. The
architecture should be neutral to media type, rather look at what it
provides based on the application users choice. Therefore this bias
should be removed and let the application designer make the choice if
they need multiple RTP sessions or not based on other aspects.</t>
</section>
</section>
<section title="Overview of Solution">
<t>The goal of the solution is to enable having one or more RTP
sessions, where each RTP session may contain two or more media types.
This includes having multiple RTP sessions containing a given media
type, for example having three sessions containing video and audio.</t>
<t>The solution is quite straightforward. The first step is to override
the SHOULD and SHOULD NOT language of the <xref target="RFC3550">RTP
specification</xref>. This is done by appropriate exception clauses
given that this specification is followed.</t>
<t>Within an RTP session where multiple media types have been configured
for use, a SSRC may only send one media type during its lifetime.
Different SSRCs must be used for the different media sources, the same
way multiple media sources of the same media type already have to do.
The payload type will inform a receiver which media type the SSRC is
being used for. Thus the payload type must be unique across all of the
payload configurations independent of media type that may be used in the
RTP session.</t>
<t>Some few extra considerations within the RTP sessions also needs to
be considered. RTCP bandwidth and regular reporting suppression (AVPF
and SAVPF) should be considered to be configured. Certain payload types
like FEC also need additional rules.</t>
<t>The final important part of the solution to this is to use signalling
and ensure that agreement on using multiple media types in an RTP
session exists, and how that then is configured. Thus document documents
some existing requirements, while an external reference defines how this
is accomplished in SDP.</t>
</section>
<section title="Applicability">
<t>This specification has limited applicability and any one intending to
use must ensure that their application and usage meets the below
criteria for usage.</t>
<section title="Usage of the RTP session">
<t>Before choosing to use this specification, an application
implementer needs to ensure that they don't have a need for different
RTP sessions between the media types for some reason. The main rule is
that if one expects to have equal treatment of all media packets, then
this specification might be suitable. The equal treatment include
anything from network level up to RTCP reporting and feedback. The
document <xref
target="I-D.westerlund-avtcore-multiplex-architecture">Guidance on RTP
Multiplexing Architecture</xref> gives more detailed guidance on
aspects to consider when choosing how to use RTP and specifically
sessions. RTP-using applications that need or would prefer multiple
RTP sessions, but do not require the functionalities or behaviors that
multiple transport flows give, can consider using <xref
target="I-D.westerlund-avtcore-transport-multiplexing">Multiple RTP
Sessions on a Single Lower-Layer Transport</xref>.</t>
</section>
<section title="Signalled Support">
<t>Usage of this specification is not compatible with anyone following
RFC 3550 and intending to have different RTP sessions for each media
type. Therefore there must be mutual agreement to use multiple media
types in one RTP session by all participants within an RTP session.
This agreement must in most cases be determined using signalling.</t>
<t>This requirement can be a problem for signalling solutions that
can't negotiate with all participants. For declarative signalling
solutions, mandating that the session is using multiple media types in
one RTP session can be a way of attempting to ensure that all
participants in the RTP session follow the requirement. However, for
signalling solutions that lack methods for enforcing that a receiver
supports a specific feature, this can still cause issues.</t>
</section>
<section title="Homogeneous Multi-party">
<t>In multiparty communication scenarios it is important to separate
two different cases. One case is where the RTP session contains
multiple participants in a common RTP session. This occurs for example
in Any Source Multicast (ASM) and Transport Translator topologies as
defined in <xref target="RFC5117">RTP Topologies</xref>. It may also
occur in some implementations of RTP mixers that share the same
SSRC/CSRC space across all participants. The second case is when the
RTP session is terminated in a middlebox and the other participants
sources are projected or switched into each RTP session and rewritten
on RTP header level including SSRC mappings.</t>
<t>For the first case, with a common RTP session or at least shared
SSRC/CSRC values, all participants in multiparty communication are
required to support multiple media types in an RTP session. An
participant using two or more RTP sessions towards a multiparty
session can't be collapsed into a single session with multiple media
types. The reason is that in case of multiple RTP sessions, the same
SSRC value can be use in both RTP sessions without any issues, but
when collapsed to a single session there is an SSRC collision. In
addition some collisions can't be represented in the multiple separate
RTP sessions. For example, in a session with audio and video, an SSRC
value used for video will not show up in the Audio RTP session at the
participant using multiple RTP sessions, and thus not trigger any
collision handling. Thus any application using this type of RTP
session structure must have a homogeneous support for multiple media
types in one RTP session, or be forced to insert a translator node
between that participant and the rest of the RTP session.</t>
<t>For the second case of separate RTP sessions for each multiparty
participant and a central node it is possible to have a mix of single
RTP session users and multiple RTP session users as long as one is
willing to remap the SSRCs used by a participant with multiple RTP
sessions into non-used values in the single RTP session SSRC space for
each of the participants using a single RTP session with multiple
media types. It can be noted that this type of implementation is
required to understand any type of RTP/RTCP extension being used in
the RTP sessions to correctly be able to translate them between the
RTP sessions.</t>
</section>
<section title="Reduced number of Payload Types">
<t>An RTP session with multiple media types in it have only a single
7-bit Payload Type range for all its payload types. Within the 128
available values, only 96 or less if <xref
target="RFC5761">"Multiplexing RTP Data and Control Packets on a
Single Port"</xref> is used, all the different RTP payload
configurations for all the media types must fit. For most applications
this will not be a real problem, but the limitation exists and could
be encountered.</t>
</section>
<section title="Stream Differentiation">
<t>If network level differentiation of the media streams of different
media types are desired using this specification can cause severe
limitations. All media streams in an RTP session, independent of the
media type, will be sent over the same underlying transport flow. Any
flow-based Quality of Service (QoS) mechanism will be unable to
provide differentiated treatment between different media types, e.g.
to prioritize audio over video. If that is desired, separate RTP
sessions over different underlying transport flows needs to be used.
Any marking-based QoS scheme like DiffServ is not affected unless a
network ingress marks based on flows.</t>
</section>
<section title="Non-compatible Extensions">
<t>There exist some RTP and RTCP extensions that rely on the existence
of multiple RTP sessions. If the goal of using an RTP session with
multiple media types is to have only a single RTP session, then these
extensions can't be used. If one has no need to have different RTP
sessions for the media types but is willing to have multiple RTP
sessions, one for the main media transmission and one for the
extension, they can be used. It should be noted that this assumes that
it is possible to get the extension working when the related RTP
session contains multiple media types.</t>
<t>Identified RTP/RTCP extensions that require multiple RTP Sessions
are:<list style="hanging">
<t hangText="RTP Retransmission:"><xref target="RFC4588">RTP
Retransmission</xref> has a session multiplexed mode. It also has
a SSRC multiplexed mode that can be used instead. So use the mode
that is suitable for the RTP application.</t>
<t hangText="XOR-Based FEC:">The <xref target="RFC5109">RTP
Payload Format for Generic Forward Error Correction</xref> and its
predecessor <xref target="RFC2733"/> requires a separate RTP
session unless the FEC data is carried in <xref
target="RFC2198">RTP Payload for Redundant Audio Data</xref> which
has another set of restrictions.</t>
<t hangText="">Note that the <xref
target="RFC5576">Source-Specific Media Attributes</xref>
specification defines an SDP syntax (the "FEC" semantic of the
"ssrc-group" attribute) to signal FEC relationships between
multiple media streams within a single RTP session. However, this
can't be used as the FEC repair packets are required to have the
same SSRC value as the source packets being protected. <xref
target="RFC5576"/> does not normatively update and resolve that
restriction.</t>
</list></t>
<t/>
</section>
</section>
<section title="RTP Session Specification">
<t>This section defines what needs to be done or avoided to make an RTP
session with multiple media types function without issues.</t>
<section title="RTP Session">
<t>Section 5.2 of <xref target="RFC3550">"RTP: A Transport Protocol
for Real-Time Applications"</xref> states:<list style="empty">
<t>For example, in a teleconference composed of audio and video
media encoded separately, each medium SHOULD be carried in a
separate RTP session with its own destination transport
address.</t>
<t>Separate audio and video streams SHOULD NOT be carried in a
single RTP session and demultiplexed based on the payload type or
SSRC fields.</t>
</list></t>
<t>This specification changes both of these sentences. The first
sentence is changed to:<list style="empty">
<t>For example, in a teleconference composed of audio and video
media encoded separately, each medium SHOULD be carried in a
separate RTP session with its own destination transport address,
unless specification [RFCXXXX] is followed and the application
meets the applicability constraints.</t>
</list></t>
<t>The second sentence is changed to:<list style="empty">
<t>Separate audio and video streams SHOULD NOT be carried in a
single RTP session and demultiplexed based on the payload type or
SSRC fields, unless multiplexed based on both SSRC and payload
type and usage meets what Multiple Media Types in an RTP Session
[RFCXXXX] specifies.</t>
</list></t>
<t>RFC-Editor Note: Please replace RFCXXXX with the RFC number of this
specification when assigned.</t>
<t>TBD: Discussion of the motivations in Section 5.2 of the <xref
target="RFC3550">RTP Specification</xref>.</t>
</section>
<section title="Sender Source Restrictions">
<t>A SSRC in the RTP session MUST only send one media type (audio,
video, text etc.) during the SSRC's lifetime. The main motivation is
that a given SSRC has its own RTP timestamp and sequence number
spaces. The same way that you can't send two streams of encoded audio
on the same SSRC, you can't send one audio and one video encoding on
the same SSRC. Each media encoding when made into an RTP stream needs
to have the sole control over the sequence number and timestamp space.
If not, one would not be able to detect packet loss for that
particular stream. Nor can one easily determine which clock rate a
particular SSRCs timestamp shall increase with.</t>
</section>
<section title="Payload Type Applicability">
<t>Most Payload Types have a native media type, like an audio codec is
natural belonging to the audio media type. However, there exist a
number of RTP payload types that don't have a native media type. For
example, transport robustification mechanisms like <xref
target="RFC4588">RTP Retransmission</xref> and <xref
target="RFC5109">Generic FEC</xref> inherit their media type from what
they protect. RTP Retransmission is explicitly bound to the payload
type it is protecting, and thus will inherit it. However Generic FEC
is a excellent example of an RTP payload type that has no natural
media type. The media type for what it protects is not relevant as it
is the recovered RTP packets that have a particular media type, and
thus Generic FEC is best categorized as an application media type.</t>
<t>The above discussion is relevant to what limitations exist for RTP
payload type usage within an RTP session that has multiple media
types. When it comes to Generic FEC, is an configured payload type
allowed to be used to protect both audio SSRCs and Video SSRCs? Note a
particular SSRC carrying Generic FEC will clearly only protect a
specific SSRC and thus that instance is bound to the SSRC's media
type. For this specific case, it appears possible to have one be
applicable to both. However, in cases when the signalling is setup to
enable fallback to using separate RTP sessions, then using a different
media type, e.g. application, than the media being protected can
create issues.</t>
<t>TBD: What recommendations are needed here?</t>
</section>
<section title="RTCP ">
<t>All SSRCs in an RTP session fall under the same set of RTCP
configuration parameters, such as the RR and RS bandwidth and the
trr-int parameter if AVPF or SAVPF is used. This means that at least
the regular reporting period by, and on, a source will be equal,
independent of the media type for that source. This should in most
cases not be an issue, but it may result in more frequent reporting
than is considered necessary for a particular media type or set of
media sources. Having multiple media types in one RTP session also
results in more SSRCs being present in this RTP session. This
increasing the amount of cross reporting between the SSRCs. From an
RTCP perspective, two RTP sessions with half the number of SSRCs in
each will be slightly more efficient. If someone needs either the
higher efficiency due to the lesser number of SSRCs or the fact that
one can't tailor RTCP usage per media type, they need to use
independent RTP sessions.</t>
<t>When it comes to handling multiple SSRCs in an RTP session there is
a clarification under discussion in <xref
target="I-D.lennox-avtcore-rtp-multi-stream">Real-Time Transport
Protocol (RTP) Considerations for Multi-Stream Endpoints</xref>. When
it comes to configuring RTCP the need for regular periodic reporting
needs to be weighted against any feedback or control messages being
sent. The applications using AVPF or SAVPF are RECOMMENDED to consider
setting trr-int parameter to a value suitable for the applications
needs, thus potentially reducing the need for regular reporting and
thus releasing more bandwidth for use for feedback or control.</t>
<t>Another aspect of an RTP session with multiple media types is that
the used RTCP packets, RTCP Feedback Messages, or RTCP XR metrics used
may not be applicable to all media types. Instead all RTP/RTCP
endpoints need to correlate the media type of the SSRC being
referenced in an messages/packet and only use those that apply to that
particular SSRC and its media type. Signalling solutions may have
shortcomings when it comes to indicate that a particular set of RTCP
reports or feedback messages only apply to a particular media type
within an RTP session.</t>
</section>
</section>
<section title="Extension Considerations">
<t>This section discusses the impact on some RTP/RTCP extensions due to
usage of multiple media types in on RTP session. Only extensions where
something worth noting has been included.</t>
<section title="RTP Retransmission">
<t>SSRC-multiplexed <xref target="RFC4588">RTP retransmission</xref>
is actually very straightforward. Each retransmission RTP payload type
is explicitly connected to an associated payload type. If
retransmission is only to be used with a subset of all payload types,
this is not a problem, as it will be evident from the retransmission
payload types which payload types that have retransmission enabled for
them.</t>
<t>Session-multiplexed RTP retransmission is also possible to use
where an retransmission session contains the retransmissions of the
associated payload types in the source RTP session. The only
difference to previously is that the source RTP session is one which
contains multiple media types. Thus it is even more likely that only a
subset of the source RTP session's payload types and SSRCs are
actually retransmitted.</t>
<t>Open Issue: When using SDP to signal retransmission for one RTP
session with multiple media types and one RTP session for the
retransmission data will cause a situation where one will have
multiple m= lines grouped using FID and the ones belonging to
respective RTP session being grouped using BUNDLE. This usage may
contradict both the <xref target="RFC5888">FID semantics</xref> and an
assumption in the <xref target="RFC4588">RTP retransmission
specification</xref>.</t>
</section>
<section title="Generic FEC">
<t>TBW:</t>
</section>
</section>
<section title="Signalling">
<t>The Signalling requirements</t>
<t>Establishing an RTP session with multiple media types requires
signalling. This signalling needs to fulfill the following
requirements:<list style="numbers">
<t>Ensure that any participant in the RTP session is aware that this
is an RTP session with multiple media types.</t>
<t>Ensure that the payload types in use in the RTP session are using
unique values, with no overlap between the media types.</t>
<t>Configure the RTP session level parameters, such as RTCP RR and
RS bandwidth, AVPF trr-int, underlying transport, the RTCP
extensions in use, and security parameters, commonly for the RTP
session.</t>
<t>RTP and RTCP functions that can be bound to a particular media
type should be reused when possible also for other media types,
instead of having to be configured for multiple code-points. Note:
In some cases one will not have a choice but to use multiple
configurations.</t>
</list></t>
<t/>
<section title="SDP-Based Signalling">
<t>The signalling of multiple media types in one RTP session in SDP is
specified in <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation">"Multiplexing
Negotiation Using Session Description Protocol (SDP) Port
Numbers"</xref>.</t>
</section>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document makes no request of IANA.</t>
<t>Note to RFC Editor: this section may be removed on publication as an
RFC.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>Having an RTP session with multiple media types doesn't change the
methods for securing a particular RTP session. One possible difference
is that the different media have often had different security
requirements. When combining multiple media types in one session, their
security requirements must also be combined by selecting the most
demanding for each property. Thus having multiple media types may result
in increased overhead for security for some media types to ensure that
all requirements are meet.</t>
<t>Otherwise, the recommendations for how to configure and RTP session
do not add any additional requirements compared to normal RTP, except
for the need to be able to ensure that the participants are aware that
it is a multiple media type session. If not that is ensured it can cause
issues in the RTP session for both the unaware and the aware one.
Similar issues can also be produced in an normal RTP session by creating
configurations for different end-points that doesn't match each
other.</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t>The authors would like to thank Christer Holmberg for the feedback on
the document.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include='reference.RFC.3550'?>
<?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>
<?rfc include='reference.I-D.lennox-avtcore-rtp-multi-stream'?>
</references>
<references title="Informative References">
<?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>
<?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>
<?rfc include='reference.RFC.2198'?>
<?rfc include='reference.RFC.2733'?>
<?rfc include='reference.RFC.4566'?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.5109'?>
<?rfc include='reference.RFC.5117'?>
<?rfc include='reference.RFC.5576'?>
<?rfc include='reference.RFC.5761'?>
<?rfc include='reference.RFC.5888'?>
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 19:44:11 |