One document matched: draft-westerlund-avtcore-transport-multiplexing-02.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3550 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc3711 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml">
<!ENTITY rfc5234 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml">
<!ENTITY rfc3830 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3830.xml">
<!ENTITY rfc3264 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml">
<!ENTITY rfc4567 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4567.xml">
<!ENTITY rfc4568 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4568.xml">
<!ENTITY rfc5285 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5285.xml">
<!ENTITY rfc5506 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5506.xml">
<!ENTITY rfc5761 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5761.xml">
<!ENTITY rfc5764 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5764.xml">
<!ENTITY draft-ietf-mmusic-sdp-bundle-negotiation SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mmusic-sdp-bundle-negotiation.xml">
<!ENTITY draft-rosenberg-rtcweb-rtpmux SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.rosenberg-rtcweb-rtpmux.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std"
docName="draft-westerlund-avtcore-transport-multiplexing-02"
ipr="trust200902">
<front>
<title abbrev="Multiple RTP Session on Single Transport">Multiple RTP
Sessions on a Single Lower-Layer Transport</title>
<author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 82 87</phone>
<email>magnus.westerlund@ericsson.com</email>
</address>
</author>
<author fullname="Colin Perkins" initials="C. S." surname="Perkins">
<organization>University of Glasgow</organization>
<address>
<postal>
<street>School of Computing Science</street>
<city>Glasgow</city>
<code>G12 8QQ</code>
<country>United Kingdom</country>
</postal>
<email>csp@csperkins.org</email>
</address>
</author>
<date day="12" month="March" year="2012" />
<abstract>
<t>This document specifies how multiple RTP sessions are to be
multiplexed on the same lower-layer transport, e.g. a UDP flow. It
discusses various requirements that have been raised and their
feasibility, which results in a solution with a certain applicability. A
solution is recommended and that solution is provided in more detail,
including signalling and examples.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>There has been renewed interest for having a solution that allows
multiple <xref target="RFC3550">RTP sessions</xref> to use a single
lower layer transport, such as a bi-directional UDP flow. The main
reason is the cost of doing NAT/FW traversal for each individual flow.
ICE and other NAT/FW traversal solutions are clearly capable of
attempting to open multiple flows. However, there is both increased risk
for failure and an increased cost in the creation of multiple flows. The
increased cost comes as slightly higher delay in establishing the
traversal, and the amount of consumed NAT/FW resources. The latter might
be an increasing problem in the IPv4 to IPv6 transition period.</t>
<t>This document draws up some requirements for consideration on how to
transport multiple RTP sessions over a single lower-layer transport.
These requirements will have to be weighted as the combined set of
requirements result in that no known solution exist that can fulfill
them completely.</t>
<t>A number of possible solutions are then considered and discussed with
respect to their properties. Based on that, the authors recommends a
shim layer variant as single solution, which is described in more detail
including signalling solution and examples.</t>
</section>
<section title="Conventions">
<t></t>
<section title="Terminology">
<t>Some terminology used in this document.</t>
<t><list style="hanging">
<t hangText="Multiplexing:">Unless specifically noted, all
mentioning of multiplexing in this document refer to the
multiplexing of multiple RTP Sessions on the same lower layer
transport. It is important to make this distinction as RTP does
contain a number of multiplexing points for various purposes, such
as media formats (Payload Type), media sources (SSRC), and RTP
sessions.</t>
</list></t>
</section>
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
</section>
<section title="Requirements">
<t>This section lists and discusses a number of potential requirements.
However, it is not difficult to realize that it is in fact possible to
put requirements that makes the set of feasible solutions an empty set.
It is thus necessary to consider which requirements that are essential
to fulfill and which can be compromised on to arrive at a solution.</t>
<section anchor="sec-req-multiple-session"
title="Support Use of Multiple RTP Sessions">
<t>This may at first glance appear to be an obvious requirement.
Although the authors are convinced it is a mandatory requirement for a
solution, it warrants some discussion around the implications of not
having multiple RTP sessions and instead use a single RTP session.</t>
<t>The usage of multiple RTP sessions allow separation of media
streams that have different usages or purposes in an RTP based
application, for example to separate the video of a presenter or most
important current talker from those of the listeners that not all
end-points receiver. Also separation for different processing based on
media types such as audio and video in end-points and central nodes.
Thus providing the node with the knowledge that any SSRC within the
session is supposed to be processed in a similar or same way.</t>
<t>For simpler cases, where the streams within each media type need
the same processing, it is clearly possible to find other multiplex
solutions, for example based on the Payload Type and the differences
in encoding that the payload type allows to describe. This may anyhow
be insufficient when you get into more advanced usages where you have
multiple sources of the same media type, but for different usages or
as alternatives. For example when you have one set of video sources
that shows session participants and another set of video sources that
shares an application or slides, you likely want to separate those
streams for various reasons such as control, prioritization, QoS,
methods for robustification, etc. In those cases, using the RTP
session for separation of properties is a powerful tool. A tool with
properties that need to be preserved when providing a solution for how
to use only a single lower-layer transport.</t>
<t>For more discussion of the usage of RTP sessions verses other
multiplexing we recommend <xref
target="I-D.westerlund-avtcore-multiplex-architecture">RTP
Multiplexing Architecture</xref>.</t>
</section>
<section anchor="seqreq-multi-ssrc"
title="Same SSRC Value in Multiple RTP Sessions">
<t>Two different RTP sessions being multiplexed on the same lower
layer transport need to be able to use the same SSRC value. This is a
strong requirement, for two reasons:</t>
<t><list style="numbers">
<t>To avoid mandating SSRC assignment rules that are coordinated
between the sessions. If the RTP sessions multiplexed together
must have unique SSRC values, then additional code that works
between RTP Sessions is needed in the implementations. Thus
raising the bar for implementing this solution. In addition, if
one gateways between parts of a system using this multiplexing and
parts that aren't multiplexing, the part that isn't multiplexing
must also fulfill the requirements on how SSRC is assigned or
force the gateway to translate SSRCs. Translating SSRC is actually
hard as it requires one to understand the semantics of all current
and future RTP and RTCP extensions. Otherwise a barrier for
deploying new extensions is created.</t>
<t>There are some few RTP extensions that currently rely on being
able to use the same SSRC in different RTP sessions: <list
style="symbols">
<t>XOR FEC (RFC5109)</t>
<t>RTP Retransmission in session mode (RFC4588)</t>
<t>Certain Layered Coding</t>
</list></t>
</list></t>
</section>
<section title="SRTP">
<t><xref target="RFC3711">SRTP</xref> is one of the most commonly used
security solutions for RTP. In addition, it is the only one
recommended by IETF that is integrated into RTP. This integration has
several aspects that needs to be considered when designing a solution
for multiplexing RTP sessions on the same lower layer transport.</t>
<t><list style="hanging">
<t hangText="Determining Crypto Context:">SRTP first of all needs
to know which session context a received or to-be-sent packet
relates to. It also normally relies on the lower layer transport
to identify the session. It uses the MKI, if present, to determine
which key set is to be used. Then the SSRC and sequence number are
used by most crypto suites, including the most common use of AES
Counter Mode, to actually generate the correct cipher stream.</t>
<t hangText="Unencrypted Headers:">SRTP has chosen to leave the
RTP headers and the first two 32-bit words of the first RTCP
header unencrypted, to allow for both header compression and
monitoring to work also in the presence of encryption. As these
fields are in clear text they are used in most crypto suites for
SRTP to determine how to protect or recover the plain text.</t>
</list>It is here important to contrast SRTP against a set of other
possible protection mechanisms. DTLS, TLS, and IPsec are all
protecting and encapsulating the entire RTP and RTCP packets. They
don't perform any partial operations on the RTP and RTCP packets. Any
change that is considered to be part of the RTP and RTCP packet is
transparent to them, but possibly not to SRTP. Thus the impact on SRTP
operations must be considered when defining a mechanism.</t>
</section>
<section title="Don't Redefine Used Bits">
<t>As the core of RTP is in use in many systems and has a really large
deployment story and numerous implementations, changing any of the
field definitions is highly problematic. First of all, the
implementations need to change to support this new semantics.
Secondly, you get a large transition issue when you have some session
participants that support the new semantics and some that don't.
Combing the two behaviors in the same session can force the deployment
of costly and less than perfect translation devices.</t>
</section>
<section title="Firewall Friendly">
<t>It is desirable that current firewalls will accept the solutions as
normal RTP packets. However, in the authors' opinion we can't let the
firewall stifle invention and evolution of the protocol. It is also
necessary to be aware that a change that will make most deep
inspecting firewall consider the packet as not valid RTP/RTCP will
have more difficult deployment story.</t>
</section>
<section title="Monitoring and Reporting">
<t>It is desirable that a third party monitor can still operate on the
multiplexed RTP Sessions. It is however likely that they will require
an update to correctly monitor and report on multiplexed RTP
Sessions.</t>
<t>Another type of function to consider is packet sniffers and their
selector filters. These may be impacted by a change of the fields. An
observation is that many such systems are usually quite rapidly
updated to consider new types of standardized or simply common packet
formats.</t>
</section>
<section title="Usable Also Over Multicast">
<t>It is desirable that a solution should be possible to use also when
RTP and RTCP packets are sent over multicast, both Any Source
Multicast (ASM) and Single Source Multicast (SSM). The reason for this
requirement is to allow a system using RTP to use the same
configuration regardless of the transport being done over unicast or
multicast. In addition, multicast can't be claimed to have an issue
with using multiple ports, as each multicast group has a complete port
space scoped by address.</t>
</section>
<section title="Incremental Deployment">
<t>A good solution has the property that in topologies that contains
RTP mixers or Translators, a single session participant can enable
multiplexing without having any impact on any other session
participants. Thus a node should be able to take a multiplexed packet
and then easily send it out with minimal or no modification on another
leg of the session, where each RTP session is transported over its own
lower-layer transport. It should also be as easy to do the reverse
forwarding operation.</t>
</section>
</section>
<section title="Possible Solutions">
<t>This section looks at a few possible solutions and discusses their
feasibility.</t>
<section title="Header Extension">
<t>One proposal is to define an <xref target="RFC5285">RTP header
extension</xref> that explicitly enumerates the session identifier in
each packet. This proposal has some merits regarding RTP, since it
uses an existing extension mechanism; it explicitly enumerates the
session allowing for third parties to associate the packet to a given
RTP session; and it works with SRTP as currently defined since a
header extension is by default not encrypted, and is thus readable by
the receiving stack without needing to guess which session it belongs
to and attempt to decrypt it. This approach does, however, conflict
with the requirement from <xref target="RFC5285"></xref> that "header
extensions using this specification MUST only be used for data that
can be safely ignored by the recipient", since correct processing of
the received packet depends on using the header extension to
demultiplex it to the correct RTP session.</t>
<t>Using a header extension also result in the session ID is in the
integrity protected part of the packet. Thus a translator between
multiplexed and non-multiplexed has the options:</t>
<t><list style="numbers">
<t>to be part of the security context to verify the field</t>
<t>to be part of the security context to verify the field and
remove it before forwarding the packet</t>
<t>to be outside of the security context and leave the header
extension in the packet. However, that requires successful
negotiation of the header extension, but not of the functionality,
with the receiving end-points.</t>
</list></t>
<t>The biggest existing hurdle for this solution is that there exist
no header extension field in the RTCP packets. This requires defining
a solution for RTCP that allows carrying the explicit indicator,
preferably in a position that isn't encrypted by SRTCP. However, the
current SRTCP definition does not offer such a position in the
packet.</t>
<t>Modifying the RR or SR packets is possible using profile specific
extensions. However, that has issues when it comes to deployability
and in addition any information placed there would end up in the
encrypted part.</t>
<t>Another alternative could be to define another RTCP packet type
that only contains the common header, using the 5 bits in the first
byte of the common header to carry a session id. That would allow
SRTCP to work correctly as long it accepts this new packet type being
the first in the packet. Allowing a non-SR/RR packet as the first
packet in a compound RTCP packet is also needed if an implementation
is to support <xref target="RFC5506">Reduced Size RTCP packets</xref>.
The remaining downside with this is that all stack implementations
supporting multiplexing would need to modify its RTCP compound packet
rules to include this packet type first. Thus a translator box between
supporting nodes and non-supporting nodes needs to be in the crypto
context.</t>
<t>This solution's per packet overhead is expected to be 64-bits for
RTCP. For RTP it is 64-bits if no header extension was otherwise used,
and an additional 16 bits (short header), or 24 bits plus (if needed)
padding to next 32-bits boundary if other header extensions are
used.</t>
</section>
<section anchor="sec-mux-shim" title="Multiplexing Shim">
<t>This proposal is to prefix or postfix all RTP and RTCP packets with
a session ID field. This field would be outside of the normal RTP and
RTCP packets, thus having no impact on the RTP and RTCP packets and
their processing. An additional step of demultiplexing processing
would be added prior to RTP stack processing to determine in which RTP
session context the packet shall be included. This has also no impact
on SRTP/SRTCP as the shim layer would be outside of its protection
context. The shim layer's session ID is however implicitly integrity
protected as any error in the field will result in the packet being
placed in the wrong or non-existing context, thus resulting in a
integrity failure if processed by SRTP/SRTCP.</t>
<t>This proposal is quite simple to implement in any gateway or
translating device that goes from a multiplexed to a non-multiplexed
domain or vice versa, as only an additional field needs to be added to
or removed from the packet.</t>
<t>The main downside of this proposal is that it is very likely to
trigger a firewall response from any deep packet inspection device. If
the field is prefixed, the RTP fields are not matching the heuristics
field (unless the shim is designed to look like an RTP header, in
which case the payload length is unlikely to match the expected value)
and thus are likely preventing classification of the packet as an RTP
packet. If it is postfixed, it is likely classified as an RTP packet
but may not correctly validate if the content validation is such that
the payload length is expected to match certain values. It is expected
that a postfixed shim will be less problematic than a prefixed shim in
this regard, but we are lacking hard data on this.</t>
<t>This solution's per packet overhead is 1 byte.</t>
</section>
<section anchor="sec-sol-single" title="Single Session">
<t>Given the difficulty of multiplexing several RTP sessions onto a
single lower-layer transport, it's tempting to send multiple media
streams in a single RTP session. Doing this avoids the need to
de-multiplex several sessions on a single transport, but at the cost
of losing the RTP session as a separator for different type of
streams. Lacking different RTP sessions to demultiplex incoming
packets, a receiver will have to dig deeper into the packet before
determining what to do with it. Care must be taken in that inspection.
For example, you must be careful to ensure that each real media source
uses its own SSRC in the session and that this SSRC doesn't change
media type.</t>
<t>The loss of the RTP session as a separator for different usages or
purpose would be an minor issue if the only difference between the RTP
sessions is the media type. In this case, the application could use
the Payload Type field to identify the media type. The loss of the RTP
Session functionality is however severe, if the application uses the
RTP Session for separating different treatments, contexts etc. Then
you would need additional signalling to bind the different sources to
groups which can help make the necessary distinctions.</t>
<t>However, the loss of the RTP session as separator is not the only
issue with this approach. The <xref
target="I-D.westerlund-avtcore-multiplex-architecture">RTP
Multiplexing Architecture</xref> discusses a number of issues in
Section 6.7. These include RTCP bandwidth differences, limitations in
the number of payload types, media aware RTP mixers and interactions
with Legacy end-points. </t>
<t>Additional attention should be place on this important aspect. In
multi-party situations using central nodes there exist some
difficulties in having a legacy implementation using multiple RTP
sessions interworking with an end-point having only a single RTP
session across the central node. The main reason is the fact that the
one using single session with multiple media types has only one SSRC
space, while the other end-points have multiple spaces. Thus
translation may have to occur because there is several RTP sessions
using the same SSRC value. This has both limitations, processing
overhead and the possibility of becoming an deployment obstacle for
new RTP/RTCP extensions.</t>
<t>This approach has been proposed in the RTCWeb context in <xref
target="I-D.lennox-rtcweb-rtp-media-type-mux"/> and <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>. These drafts
describe how to signal multiple media streams multiplexed into a
single RTP session, and address some of the issues raised here and in
Section 6.7 of the <xref
target="I-D.westerlund-avtcore-multiplex-architecture"> RTP
Multiplexing Architecture</xref> draft.</t>
<t>This method has several limitations that limits its usage as
solution in providing multiple RTP sessions on the same lower layer
transport. However, we acknowledge that there are some uses for which
this method may be sufficient and which can accept the methods
limitations and downsides. The RTCWEB WG has a working assumption to
support this method. For more details of this method, see the relevant
drafts under development. We do include this method in the comparison
to provide a more complete picture of the pro and cons of this
method.</t>
<t>This solution has no per packet overhead. The signalling overhead
will be a different question.</t>
</section>
<section title="Use the SRTP MKI field">
<t>This proposal is to overload the MKI SRTP/SRTCP identifier to not
only identify a particular crypto context, but also identify the
actual RTP Session. This clearly is a miss use of the MKI field,
however it appears to be with little negative implications. SRTP
already supports handling of multiple crypto contexts.</t>
<t>The two major downsides with this proposal is first the fact that
it requires using SRTP/SRTCP to multiplex multiple sessions on a
single lower layer transport. The second issue is that the session ID
parameter needs to be put into the various key-management schemes and
to make them understand that the reason to establish multiple crypto
contexts is because they are connected to various RTP Sessions.
Considering that SRTP have at least 3 used keying mechanisms, <xref
target="RFC5764">DTLS-SRTP</xref>, <xref target="RFC4568">Security
Descriptions</xref>, and <xref target="RFC3830">MIKEY</xref>, this is
not an insignificant amount of work.</t>
<t>This solution has 32-bit per packet overhead, but only if the MKI
was not already used.</t>
</section>
<section anchor="sec-sol-padding" title="Use an Octet in the Padding">
<t>The basics of this proposal is to have the RTP packet and the last
(required by RFC3550) RTCP packet in a compound to include padding, at
least 2 bytes. One byte for the padding count (last byte) and one byte
just before the padding count containing the session ID.</t>
<t>This proposal uses bytes to carry the session ID that have no
defined value and is intended to be ignored by the receiver. From that
perspective it only causes packet expansion that is supported and
handled by all existing equipment. If an implementation fails to
understand that it is required to interpret this padding byte to learn
the session ID, it will see a mostly coherent RTP session except where
SSRCs overlap or where the payload types overlap. However, reporting
on the individual sources or forwarding the RTCP RR are not completely
without merit.</t>
<t>There is one downside of this proposal and that has to do with
SRTP. To be able to determine the crypto context, it is necessary to
access to the encrypted payload of the packet. Thus, the only
mechanism available for a receiver to solve this issue is to try the
existing crypto contexts for any session on the same lower layer
transport and then use the one where the packet decrypts and verifies
correctly. Thus for transport flows with many crypto contexts, an
attacker could simply generate packets that don't validate to force
the receiver to try all crypto contexts they have rather than
immediately discard it as not matching a context. A receiver can
mitigate this somewhat by using heuristics based on the RTP header
fields to determine which context applies for a received packet, but
this is not a complete solution.</t>
<t>This solution has a 16-bit per packet overhead.</t>
</section>
<section anchor="sec-sol-ssrc" title="Redefine the SSRC field">
<t>The Rosenberg et. al. Internet draft <xref
target="I-D.rosenberg-rtcweb-rtpmux">"Multiplexing of Real-Time
Transport Protocol (RTP) Traffic for Browser based Real-Time
Communications (RTC)"</xref> proposed to redefine the SSRC field. This
has the advantage of no packet expansion. It also looks like regular
RTP. However, it has a number of implications. First of all it
prevents any RTP functionality that require the same SSRC in multiple
RTP sessions.</t>
<t>Secondly its interoperability with end-point using multiple RTP
sessions are problematic. Such interoperability will requires an SSRC
translator function in the gatewaying node to ensure that the SSRCs
fulfill the semantic rules of the different domains. That translator
is actually far from easy as it needs to understand the semantics of
all RTP and RTCP extensions that include SSRC/CSRC. This as it is
necessary to know when a particular matching 32-bit pattern is an SSRC
field and when the field is just a combination of other fields that
create the same matching 32-bit pattern. Thus there is a possibility
that such a translator becomes a obstacle in deploying future RTP/RTCP
extensions. In addition the translator actually have significant
overhead when SRTP are in use. This as a verification that the packet
is authentic, decryption, SSRC translation, encryption and finally
generation of authentication tags are required. In addition the
translator must be part of the security context.</t>
<t>This solution has no per packet overhead.</t>
</section>
</section>
<section title="Comparison">
<t>This section compares the above potential solutions with the
requirements. Motivations are provided in addition to a high level
metric of successfully, partially and failing to meet requirement. In
the end a <xref target="fig-comp-table">summary table</xref> of the high
level value are provided.</t>
<section title="Support of Multiple RTP Sessions Over Single Transport">
<t>This one is easy to determine. Only the single session proposal
fails this requirement as it is not at all designed to meet it. The
rest fully support this requirement. The main question around this
requirement is how important it is to have as discussed in <xref
target="sec-req-multiple-session"></xref>.</t>
</section>
<section title="Enable Same SSRC Value in Multiple RTP Sessions">
<t>Based on the discussion in <xref target="seqreq-multi-ssrc"></xref>
two sub-requirements have been derived.</t>
<section title="Avoid SSRC Translation in Gateways/Translation">
<t>This sub-requirement is derived based on the desire to avoid
having gateways or translators perform full SSRC translation to
minimize complexity, avoid the requirement to have gateways in
security context, and as a hinder to long-term evolution. Two of the
proposals have issues with this, due to their lack of support for
multiple 32-bit SSRC spaces and lacking possibility to have the same
SSRC value in multiple RTP sessions. The proposals that have these
properties and thus are marked as failing are the Single Session and
Redefine the SSRC field. The other proposals are all succcesful in
meeting this requirement.</t>
</section>
<section title="Support Existing Extensions">
<t>The second sub-requirement is how well the proposals support
using the existing RTP mechanisms. Here both Single Session and
Redefine the SSRC field will have clear issues as they cannot
support the same full 32-bit SSRC value in two different RTP
sessions. This is clearly an issue for the XOR based FEC. RTP
retransmission and scalable encoding are minor issues as there exist
alternatives to those mechanisms that works with the structure of
these two proposals. Thus we give them a fail. The Header Extension
gets a partial due to unclear interaction between putting in an
header extension and these mechanisms.</t>
</section>
</section>
<section title="Ensure SRTP Functions">
<t>This requirement is about ensuring both secure and efficient usage
of SRTP. The Octet in Padding field proposal gets a fail as the
receiving end-point cannot determine the intended RTP session prior to
de-encryption of the padding field. Thus a catch-22 arises which can
only be resolved by trying all session contexts and see what decrypts.
This causes a security vulnerability as an attacker can inject a
packet which does not meet any of the session contexts. The receiver
will then attempt decryption and authentication of it using all its
session contexts, increasing the amount of wasted resources by a
factor equal to the number of multiplexed sessions. Thus this proposal
gets a fail.</t>
<t>The proposal of Overloading the SRTP MKI field as session
identifier gets a partial due to the fact that it cannot use SRTP's
key-management mechanism out of the box. It forces the key-management
mechanism and the SRTP implementations to maintain the MKI-to-RTP
session bindings to maintain secure and correct function.</t>
<t>The Redefine the SSRC field gets a partial due to its need to
modify the key-management mechanisms to correctly identify the partial
SSRC space the parameters applies to. Similarly, the SRTP
implementation also needs to be updated to correctly support this
security context differentiation.</t>
<t>The header extension based solution gets a less severe partial than
Redefine the SSRC and the MKI. It will however have an issue when
being gatewayed to a domain that does not multiplex multiple RTP
sessions over the same transport. Then the gateway will require to be
in the security context to be able to add or remove the header
extension as it is in the part of the packet that is integrity
protected by SRTP.</t>
<t>The remaining two proposals do not affect SRTP mechanisms and thus
successfully meet this requirement.</t>
</section>
<section title="Don't Redefine Used Bits">
<t>This requirement is all about RTP and RTCP header fields having a
given definition should not be changed as it can cause
interoperability problems between modified and non-modified
implementations. This becomes especially problematic in RTP sessions
used for multi-party sessions.</t>
<t>Redefine the SSRC field gets a big fail on this as it redefines the
SSRC field, a core field in RTP. It has been identified that such a
change will have issues since if it gets connected to a non-modified
end-point that randomly assigns the SSRC, as supposed by RFC 3550,
those SSRCs will be distributed over different RTP sessions at the
modified end-point. Also other functions using the SSRC field, not
understanding the additional semantics of the SSRC field, is likely to
have issues.</t>
<t>Using the SRTP MKI field to identify a session is overloading that
field with double semantics. This likely has minimal negative impact
in RTP since it should be possible to have the SRTP stack use the MKI
field to both look up the security context and which output RTP
session the processed packet belongs to. However, this redefinition
clearly creates issues with the key-management scheme. That will have
to be modified to handle both this change and deal with the
interoperability issues when negotiating its usage. This gets a full
fail due to that it makes the problem someone else's, namely the RTP
implementors.</t>
<t>Defining an Octet in the Padding field redefines a field, whose
definition is to have zero value and is expected to be ignored by the
receiver according to the original semantics. Thus this is one of the
more benign modifications one can do, however this can still cause
issues in implementations that unnecessarily check the field values,
or in firewalls. This is judged to be partially meeting the
requirement.</t>
<t>The Header Extension proposal does in fact not redefine any
currently used bits in RTP. The header extension would be a correctly
identified extension with its own definition. However, it does
redefine a rule on what header extensions are for. The RTCP solution
however would have more severe impact as it would need to redefine the
standard meaning of an RTCP packet header in addition to the default
compound packet rules. Due to these issues the proposal fails to meet
this requirement.</t>
<t>The multiplexing shim and the single session both successfully
meet this requirement.</t>
</section>
<section title="Firewall Friendly ">
<t>This requirement is clearly difficult to judge as firewall
implementations are highly different in both implementation, scope of
what it investigates in packets, and set policies. A reasonable goal
is to minimize the likeliness that rules and policies intended to let
RTP media streams pass, will also let these streams through when
multiplexing RTP sessions over a single transport. The below analysis
shows that no solution is truly firewall friendly and all are judged
as being partially meeting this goal. However, the reason why it is
believed that a firewall might react to the streams are quite
different.</t>
<t>The Single Session and Redefine the SSRC field are likely the least
suspect solutions from a firewall perspective. However, as their
transport flows contain multiple SSRCs with payloads that indicate
likely multiple different media types they are still likely to make a
picky firewall block the transport. This is especially true for
firewalls that take signalling messages into account where it will
expect a particular media type in a given context. A non upgraded
firewall might in fact produce two different contexts with overlapping
transport parameters where both rules will receive media streams of
the other media type that are outside of the allowed rule. However, to
be clear if these proposals doesn't get through, none of the other will
either as they all will have this behavior.</t>
<t>The header extension proposal is potentially problematic for two
reasons. The first reason, which also other proposals has, is related
to that the same SSRC value can exist in two RTP sessions over the
same underlying flow. Anyone tracking the sequence number and
timestamp will react badly as the second media stream with the same
SSRC causes constant jumps back and forth in these fields compared to
the first stream, if packets are transmitted simultaneously for both
SSRCs. This issue can likely only be solved by having the firewalls
that like to track flows to also use the session identifier to create
context. This is possible as the header extension will be in the clear
and in the front. The second issue is that the header extension itself
may get the firewall to react. Especially very picky ones that expect
packets with certain media types to have certain packet lengths. They
are not compatible with a header extension.</t>
<t>The Multiplexing Shim shares the issue with multiple flows for the
same SSRC. Firewalls and deep packet inspection cause the shim
placement to be in question. If it is a pre-fixed shim, it prevents
the packet from looking like regular IP/UDP/RTP packets and be
correctly classified in firewalls and DPI engines. However, if one
puts it last, it is unlikely that any firewall or DPI ever will be
able to take the session context into account as it is at the end of
the packet. This as many line rate processing devices only take a
certain amount of the the headers into account.</t>
<t>The SRTP MKI field is likely the solution that has least firewall
and DPI issues, after the single RTP session. There is no additional
suspect field. The only difference from a single RTP session in the
transport flow is the fact that multiple MKI are guaranteed to be
used. However, that may occur also in a single RTP session usage. Thus
the only issues are the one shared with single session and the one
that several RTP media streams may use the same SSRC.</t>
<t>The octet in the padding field has, in addition to the issues the
SRTP MKI field has, the single issue that it redefines something that
is supposed to be zero into a value. Thus potentially causing a deeply
inspecting firewall to clamp the flow in fear of covert channel or
non-compliance.</t>
</section>
<section title="Monitoring and Reporting">
<t>The monitoring and reporting requirement considers several aspects.
How useful monitoring can one get from an existing legacy monitor, and
secondary any issues in upgrading them to handle the selected
solution. Thirdly, packet selector filters and packet sniffers
concerns are considered.</t>
<t>In general one can expect the proposals that have only a single
SSRC space to work better with legacy. Thus both Single Session and
Redefine SSRC space can gather and report data on media flows most
likely. The only potential issue is that due to the different media
types and clock rates, some failure may occur. In particular a third
party monitor may be targeted to a specific media type, like
monitoring VoIP. That monitor will have problems processing any video
packets correctly and generate the VoIP specific metrics for any video
sending SSRC. In general, no legacy solution for monitoring will be
able to correctly create the sub-contexts that each RTP
session has in the solutions, without update to handle the new
semantics. Also when it comes to the packet filtering and selector
filters, fine grained control can only be accomplished implementing the
new semantics. Therefore only the Single Session meets this
requirement fully.</t>
<t>Redefine the SSRC field is close to fully meeting the
requirement, however due to that there exist a session structure that
is hidden to anyone that is not upgraded to understand the semantics,
this only gets a partial.</t>
<t>The other proposals all can have multiple RTP sessions using the
same SSRC. This will create significant issues for any legacy third
party monitor. Only an updated monitor, or for that matter packet
selector, can pick out the individual media streams and their
associated RTCP traffic. Thus all these proposals gets a failure to
meet the requirement.</t>
</section>
<section title="Usable over Multicast">
<t>As discussed earlier the goal with having the option usable also
over multicast is to remove the need to produce different media
streams for transport over unicast and multicast. All of the proposals
successfully meet the requirement.</t>
</section>
<section title="Incremental Deployment">
<t>The possibility to deploy the usage of the multiplexing of multiple
RTP sessions over a single transport, especially in the context of
multi-party sessions, is a great benefit for any of the proposals.
Thus not all end-point implementations needs to be upgraded before one
start enabling it in the central node and any signalling.</t>
<t>Considering a centralized multi-party application where some
participants are using multiple transport flows and you want to enable
one particular participant to use the single transport to the central
node, one criteria stands out. The possibility to have one RTP session
per transport in one leg, and in the next multiplex them together with
minimal complexity and packet changes. Here there are significant
differences.</t>
<t>The Multiplexing Shim has the least overhead for this. As the
central node or gateway between deployments only needs to either add
or remove the shim identifier and then forward the packet over the
corresponding transport, either a joint one on the single transport
side, or over the individual one on the multiple transport side.</t>
<t>The SRTP MKI field proposal is almost as good, as the only main
difference is the need to coordinate the used MKIs on the
non-multiplexed legs so that there is no overlap between the RTP
sessions. And if there is, the MKI can be translated in gateway as
SRTP has no integrity protection over the MKI. Thus both multiplexing
shim and SRTP MKI field does successfully meet this requirement.</t>
<t>The Header Extension supports multiple full 32-bit SSRC spaces and
can thus handle all the RTP sessions without need for any SSRC
translation, however this proposal does run into the problem that the
gateway needs to be in the security context to be able to add or
remove the header extension when SRTP is used. In addition to the
security implications of that, there is a complexity overhead due to
the need to redo the authentication tags on all RTP/RTCP packets. Thus
it gets a partial.</t>
<t>The Octet in the Padding field share issues with the header
extension but have even higher complexities for this. The reason is
that the padding field is also encrypted. Thus to add or remove it
(although removing it may be unncessary) forces the end-point to
encrypt at least that byte also, and for ciphers that are not
stream-ciphers, the whole packet needs to be re-encrypted. Thus this
proposal gets a very weak partially meeting the requirement.</t>
<t>The Single Session and Redefine the SSRC field do not allow several
vanilla RTP sessions to be connected to these proposals. The reason is
the single 32-bit SSRC space they have. Single Session only has one
session and the Redefine the SSRC fields uses some of the bits as
session identifier. This forces the gateway to translate the SSRC
whenever it does not fulfill the rules or semantics of the multiplexed
side. For Redefine SSRC field this becomes almost constant as the
session identifier part of the SSRC must be the same over all SSRCs
from the same session. For Single Session it may only be needed when
there otherwise would be an SSRC collision between the sessions. This
further assumes that the non-multiplexed side would never use any of
the RTP mechanisms that require the same SSRC in multiple RTP
sessions, as they cannot be gatewayed at all. When translating an SSRC
there is first of all an overhead, with SRTP that includes a complete
authenticate, decrypte, encrypt and create a new authentication tag
cycle. In addition, the SSRC translation could potentially be a
deployment obstacle for new RTP/RTCP extensions required to be
understood by the translator to be correctly translated. Therefore
these two proposals gets a fail to meet the requirements.</t>
</section>
<section title="Summary and Conclusion">
<t>This section contains a summary table of the high level outcome
against the different requirements.</t>
<t>A table mapping the requirements against the ID numbers used in the
table is the following:<list style="hanging">
<t hangText="1:">Support multiple RTP sessions over one transport
flow</t>
<t hangText="2:">Enable same SSRC value in multiple RTP sessions
<list style="hanging">
<t hangText="2.1:">Avoid SSRC translation in
gateways/translators</t>
<t hangText="2.2:">Support existing extensions</t>
</list></t>
<t hangText="3:">Ensure SRTP functions</t>
<t hangText="4:">Don’t Redefine used bits</t>
<t hangText="5:">Firewall Friendly</t>
<t hangText="6:">Monitoring and Reporting should still
function</t>
<t hangText="7:">Usable over Multicast</t>
<t hangText="8:">Incremental deployment</t>
<t hangText="OH:">Overhead in Bytes. + means variable</t>
</list></t>
<figure align="center" anchor="fig-comp-table"
title="Summary Table of Evaluation (Successfully (S), Partially (P) or Fails (F) to meet requirement)">
<artwork><![CDATA[
---------------+---+---+---+---+---+---+---+---+---+----
Solution | 1 |2.1|2.2| 3 | 4 | 5 | 6 | 7 | 8 | OH
---------------+---+---+---+---+---+---+---+---+---+----
Header Ext. | S | S | P | P | F | P | F | S | P | 8+
Multiplex Shim | S | S | S | S | S | P | F | S | S | 1
Single Session | F | F | F | S | S | P | S | S | F | 0
SRTP MKI Field | S | S | S | P | F | P | F | S | S | 4
Padding Field | S | S | S | F | P | P | F | S | P | 2
Redefine SSRC | S | F | F | P | F | P | P | S | S | 0
---------------+---+---+---+---+---+---+---+---+---+----
]]></artwork>
</figure>
<t></t>
<t>Considering these options, the authors would recommend that AVTCORE
standardize a solution based on a postfixed multiplexing field, i.e. a
shim approach combined with the appropriate signalling as described in
<xref target="sec-mux-shim"></xref>.</t>
</section>
</section>
<section title="Specification">
<t>This section contains the specification of the solution based on a
SHIM, with the explicit session identifier at the end of the
encapsulated payload.</t>
<section title="Shim Layer">
<t>This solution is based on a shim layer that is inserted in the
stack between the regular RTP and RTCP packets and the transport layer
being used by the RTP sessions. Thus the layering looks like the
following:</t>
<figure title="Stack View with Session ID SHIM">
<artwork><![CDATA[
+---------------------+
| RTP / RTCP Packet |
+---------------------+
| Session ID Layer |
+---------------------+
| Transport layer |
+---------------------+]]></artwork>
</figure>
<t>The above stack is in fact a layered one as it does allow multiple
RTP Sessions to be multiplexed on top of the Session ID shim layer.
This enables the example presented in <xref
target="fig-shim-example"></xref> where four sessions, S1-S4 is sent
over the same Transport layer and where the Session ID layer will
combine and encapsulate them with the session ID on transmission and
separate and decapsulate them on reception.</t>
<figure anchor="fig-shim-example"
title="Multiple RTP Session On Top of Session ID Layer">
<artwork><![CDATA[
+-------------------+
| S1 | S2 | S3 | S4 |
+-------------------+
| Session ID Layer |
+-------------------+
| Transport layer |
+-------------------+]]></artwork>
</figure>
<t>The Session ID layer encapsulates one RTP or RTCP packet from a
given RTP session and postfixes a one byte Session ID (SID) field to
the packet. Each RTP session being multiplexed on top of a given
transport layer is assigned either a single or a pair of unique SID in
the range 0-255. The reason for assigning a pair of SIDs to a given
RTP session are for RTP Sessions that doesn't support <xref
target="RFC5761">"Multiplexing RTP Data and Control Packets on a
Single Port"</xref> to still be able to use a single 5-tuple. The
reasons for supporting this extra functionality is that RTP and RTCP
multiplexing based on the payload type/packet type fields enforces
certain restrictions on the RTP sessions. These restrictions may not
be acceptable. As this solution does not have these restrictions,
performing RTP and RTCP multiplexing in this way has benefits.</t>
<t>Each Session ID value space is scoped by the underlying transport
protocol. Common transport protocols like UDP, DCCP, TCP, and SCTP can
all be scoped by one or more 5-tuple (Transport protocol, source
address and port, destination address and port). The case of multiple
5-tuples occur in the case of multi-unicast topologies, also called
meshed multiparty RTP sessions or in case any application would need
more than 128 RTP sessions.</t>
<figure align="center" anchor="fig-srtp-packet"
title="SRTP Packet encapsulated by Session ID Layer">
<artwork><![CDATA[
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
|V=2|P|X| CC |M| PT | sequence number | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| timestamp | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| synchronization source (SSRC) identifier | |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| contributing source (CSRC) identifiers | |
| .... | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| RTP extension (OPTIONAL) | |
+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | payload ... | |
| | +-------------------------------+ |
| | | RTP padding | RTP pad count | |
+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
| ~ SRTP MKI (OPTIONAL) ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| : authentication tag (RECOMMENDED) : |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | Session ID | |
| +---------------+ |
+- Encrypted Portion* Authenticated Portion ---+
]]></artwork>
</figure>
<t></t>
<figure align="center" anchor="fig-srtcp-packet"
title="SRTCP packet encapsulated by Session ID layer">
<artwork><![CDATA[
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
|V=2|P| RC | PT=SR or RR | length | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| SSRC of sender | |
+>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| ~ sender info ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| ~ report block 1 ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| ~ report block 2 ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| ~ ... ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |V=2|P| SC | PT=SDES=202 | length | |
| +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| | SSRC/CSRC_1 | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| ~ SDES items ~ |
| +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| ~ ... ~ |
+>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| |E| SRTCP index | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
| ~ SRTCP MKI (OPTIONAL) ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| : authentication tag : |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | Session ID | |
| +---------------+ |
+-- Encrypted Portion Authenticated Portion -----+
]]></artwork>
</figure>
<t>The processing in a receiver when the Session ID layer is present
will be to<list style="numbers">
<t>Pick up the packet from the lower layer transport</t>
<t>Inspect the SID field value</t>
<t>Strip the SID field from the packet</t>
<t>Forward it to the (S)RTP Session context identified by the SID
value</t>
</list></t>
<t></t>
</section>
<section title="Signalling">
<t>The use of the Session ID layer needs to be explicitly agreed on
between the communicating parties. Each RTP Session the application
uses must in addition to the regular configuration such as payload
types, RTCP extension etc, have both the underlying 5-tuple (source
address and port, destination address and port, and transport
protocol) and the Session ID used for the particular RTP session. The
signalling requirement is to assign unique Session ID values to all
RTP Sessions being sent over the same 5-tuple. The same Session ID
shall be used for an RTP session independently of the traffic
direction. Note that nothing prevents a multi-media application from
using multiple 5-tuples if desired for some reason, in which case each
5-tuple has its own session ID value space.</t>
<t>This section defines how to negotiate the use of the Session ID
layer, using the Session Description Protocol (SDP) Offer/Answer
mechanism [RFC3264]. A new media-level SDP attribute,
‘session-mux-id’, is defined, in order to be used with the
media BUNDLE mechanism defined in <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"></xref>. The attribute
allows each media description (“m=” line) associated with
a ‘BUNDLE’ group to form separate RTP sessions.</t>
<t>The ‘session-mux-id’ attribute is included for a media
description, in order to indicate the Session ID for that particular
media description. Every media description that shares a common
attribute value is assumed to be part of a single RTP session. An SDP
Offerer MUST include the ‘session-mux-id’ attribute for
every media description associated with a ‘BUNDLE’ group.
If the SDP Answer does not contain ‘session-mux-id’
attributes, the SDP Offerer MUST NOT assume that separate RTP sessions
will be used. If the SDP Answer still describes a ‘BUNDLE’
group, the procedures in <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"></xref> apply.</t>
<t>An SDP Answerer MUST NOT include the ‘session-mux-id’
attribute in an SDP Answer, unless included in the SDP Offer.</t>
<t>The attribute has the following <xref target="RFC5234">ABNF</xref>
definition.</t>
<figure>
<artwork><![CDATA[
Session-mux-id-attr = "a=session-mux-id:" SID *SID-prop
SID = SID-value / SID-pairs
SID-value = 1*3DIGIT / "NoN"
SID-pairs = SID-value "/" SID-value ; RTP/RTCP SIDs
SID-prop = SP assignment-policy / prop-ext
prop-ext = token "=" value
assignment-policy = "policy=" ("tentative" / "fixed")
]]></artwork>
</figure>
<t>The following parameters MUST be configured as specified:</t>
<t><list style="symbols">
<t>RTP Profile SHOULD be the same, but MUST be compatible, like
AVP and AVPF.</t>
<t>RTCP bandwidth parameters are the same</t>
<t>RTP Payload type values are not overlapping</t>
</list>In declarative SDP usage, there is clearly no method for
fallback unless some other negotiation protocol is used.</t>
<t>The SID property "policy" is used in negotiation by an end-point to
indicate if the session ID values are merely a tentative suggestion or
if they must have these values. This is used when negotiating SID for
multi-party RTP sessions to support shared transports such as
multicast or RTP translators that are unable to produce renumbered
SIDs on a per end-point basis. The normal behavior is that the offer
suggest a tentative set of values, indicated by "policy=tentative".
These SHOULD be accepted by the peer unless that peer negotiate
session IDs on behalf of a centralized policy, in which case it MAY
change the value(s) in the answer. If the offer represents a policy
that does not allow changing the session ID values, it can indicate
that to the answerer by setting the policy to "fixed". This enables
the answering peer to either accept the value or indicate that there
is a conflict in who is performing the assignment by setting the SID
value to NoN (Not a Number). Offerer and answerer SHOULD always
include the policy they are operating under. Thus, in case of no
centralized behaviors, both offerer and answerer will indicate the
tentative policy.</t>
</section>
<section title="SRTP Key Management">
<t>Key management for SRTP do needs discussion as we do cause multiple
SRTP sessions to exist on the same underlying transport flow. Thus we
need to ensure that the key management mechanism still are properly
associated with the SRTP session context it intends to key. To ensure
that we do look at the three SRTP key management mechanism that IETF
has specified, one after another.</t>
<section title="Security Description">
<t><xref target="RFC4568">Session Description Protocol (SDP)
Security Descriptions for Media Streams</xref> as being based on SDP
has no issue with the RTP session multiplexing on lower layer
specified here. The reason is that the actual keying is done using a
media level SDP attribute. Thus the attribute is already associated
with a particular media description. A media description that also
will have an instance of the "a=session-mux-id" attribute carrying
the SID value/pair used with this particular crypto parameters.</t>
</section>
<section title="DTLS-SRTP">
<t><xref target="RFC5764">Datagram Transport Layer Security (DTLS)
Extension to Establish Keys for the Secure Real-time Transport
Protocol (SRTP)</xref> is a keying mechanism that works on the media
plane on the same lower layer transport that SRTP/SRTCP will be
transported over. Thus each DTLS message must be associated with the
SRTP and/or SRTCP flow it is keying.</t>
<t>The most direct solution is to use the SHIM and the SID context
identifier to be applied also on DTLS packets. Thus using the same
SID that is used with RTP and/or RTCP also for the DTLS message
intended to key that particular SRTP and/or SRTCP flow(s).</t>
</section>
<section title="MIKEY">
<t><xref target="RFC3830">MIKEY: Multimedia Internet KEYing</xref>
is a key management protocol that has several transports. In some
cases it is used directly on a transport protocol such as UDP, but
there is also a specification for how MIKEY is used with SDP "Key
Management Extensions for Session Description Protocol (SDP) and
Real Time Streaming Protocol (RTSP)" <xref
target="RFC4567"></xref>.</t>
<t>Lets start with the later, i.e. the SDP transport, which shares
the properties with Security Description in that is can be
associated with a particular media description in a SDP. As long as
one avoids using the session level attribute one can be certain to
correctly associate the key exchange with a given SRTP/SRTCP
context.</t>
<t>It does appear that MIKEY directly over a lower layer transport
protocol will have similar issues as DTLS.</t>
</section>
</section>
<section title="Examples">
<t></t>
<section title="RTP Packet with Transport Header">
<t>The below figure contains an RTP packet with SID field
encapsulated by a UDP packet (added UDP header).</t>
<figure title="SRTP Packet Encapsulated by Session ID Layer">
<artwork><![CDATA[
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
|V=2|P|X| CC |M| PT | sequence number | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| timestamp | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| synchronization source (SSRC) identifier | |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
| contributing source (CSRC) identifiers | |
| .... | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| RTP extension (OPTIONAL) | |
+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | payload ... | |
| | +-------------------------------+ |
| | | RTP padding | RTP pad count | |
+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
| ~ SRTP MKI (OPTIONAL) ~ |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| : authentication tag (RECOMMENDED) : |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| | Session ID | |
| +---------------+ |
+- Encrypted Portion* Authenticated Portion ---+
]]></artwork>
</figure>
<t></t>
</section>
<section title="SDP Offer/Answer example">
<t>This section contains SDP offer/answer examples. First one
example of successful BUNDLEing, and then two where fallback
occurs.</t>
<t>In the below SDP offer, one audio and one video is being offered.
The audio is using SID 0, and the video is using SID 1 to indicate
that they are different RTP sessions despite being offered over the
same 5-tuple.</t>
<figure>
<artwork><![CDATA[v=0
o=alice 2890844526 2890844526 IN IP4 atlanta.example.com
s=
c=IN IP4 atlanta.example.com
t=0 0
a=group:BUNDLE foo bar
m=audio 10000 RTP/AVP 0 8 97
b=AS:200
a=mid:foo
a=session-mux-id:0 policy=tentative
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
m=video 10000 RTP/AVP 31 32
b=AS:1000
a=mid:bar
a=session-mux-id:1 policy=tentative
a=rtpmap:31 H261/90000
a=rtpmap:32 MPV/90000]]></artwork>
</figure>
<t></t>
<t>The SDP answer from an end-point that supports this
BUNDLEing:</t>
<figure>
<artwork><![CDATA[v=0
o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
s=
c=IN IP4 biloxi.example.com
t=0 0
a=group:BUNDLE foo bar
m=audio 20000 RTP/AVP 0
b=AS:200
a=mid:foo
a=session-mux-id:0 policy=tentative
a=rtpmap:0 PCMU/8000
m=video 20000 RTP/AVP 32
b=AS:1000
a=mid:bar
a=session-mux-id:1 policy=tentative
a=rtpmap:32 MPV/90000]]></artwork>
</figure>
<t></t>
<t>The SDP answer from an end-point that does not support this
BUNDLEing or the general signalling of <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"></xref>.</t>
<figure>
<artwork><![CDATA[v=0
o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
s=
c=IN IP4 biloxi.example.com
t=0 0
m=audio 20000 RTP/AVP 0
b=AS:200
a=rtpmap:0 PCMU/8000
m=video 30000 RTP/AVP 32
b=AS:1000
a=rtpmap:32 MPV/90000]]></artwork>
</figure>
<t></t>
<t>The SDP answer of a client supporting <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"></xref> but not this
BUNDLEing would look like this:</t>
<figure>
<artwork><![CDATA[v=0
o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
s=
c=IN IP4 biloxi.example.com
t=0 0
a=group:BUNDLE foo bar
m=audio 20000 RTP/AVP 0
a=mid:foo
b=AS:200
a=rtpmap:0 PCMU/8000
m=video 20000 RTP/AVP 32
a=mid:bar
b=AS:1000
a=rtpmap:32 MPV/90000]]></artwork>
</figure>
<t>In this last case, the result is a sing RTP session with both
media types being established. If that isn't supported or desired,
the offerer will have to either re-invite without the BUNDLE
grouping to force different 5-tuples, or simply terminate the
session.</t>
</section>
</section>
</section>
<section title="Open Issues">
<t>This work is still in the early phase of specification. This section
contains a list of open issues where the author desires some input.</t>
<t><list style="numbers">
<t>Should RTP and RTCP multiplexing without RFC 5761 support be
included?</t>
<t>In Section 6.2 there is a discussion of which parameters that
must be configured. The scope of these rules and if they do make
sense needs additional discussion.</t>
<t>Can we provide better control so that applications that doesn't
desire fallback to single RTP session when Multiplexing shim fails
to be supported but Bundle is supported ends up with a better
alternative?<!--I think yes since it is so simple to do it, but may this upset the supporters of RFC 5761 or
is there any general rule of not duplicating functionality in existing RFC's?
--></t>
</list></t>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This document request the registration of one SDP attribute. Details
of the registration to be filled in.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>The security properties of the Session ID layer is depending on what
mechanism is used to protect the RTP and RTCP packets of a given RTP
session. If IPsec or transport layer security solutions such as DTLS or
TLS are being used then both the encapsulated RTP/RTCP packets and the
session ID layer will be protected by that security mechanism. Thus
potentially providing both confidentiality, integrity and source
authentication. If SRTP is used, the session ID layer will not be
directly protected by SRTP. However, it will be implicitly integrity
protected (assuming the RTP/RTCP packet is integrity protected) as the
only function of the field is to identify the session context. Thus any
modification of the SID field will attempt to retrieve the wrong SRTP
crypto context. If that retrieval fails, the packet will be anyway be
discarded. If it is successful, the context will not lead to successful
verification of the packet.</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t>This document is based on the input from various people, especially
in the context of the RTCWEB discussion of how to use only a single
lower layer transport. The RTP and RTCP packet figures are borrowed from
RFC3711. The SDP example is extended from the one present in <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"></xref>. The authors
would like to thank Christer Holmberg for assistance in utilizing the
BUNDLE grouping mechanism.</t>
<t>The proposal in <xref target="sec-sol-padding"></xref> is original
suggested by Colin Perkins. The idea in <xref
target="sec-sol-ssrc"></xref> is from an <xref
target="I-D.rosenberg-rtcweb-rtpmux">Internet Draft</xref> written by
Jonathan Rosenberg et. al. The proposal in <xref
target="sec-sol-single"></xref> is a result of discussion by a group of
people at IETF meeting #81 in Quebec.</t>
<t></t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc3550;
&rfc3711;
&rfc5234;
&draft-ietf-mmusic-sdp-bundle-negotiation;
</references>
<references title="Informational References">
&rfc3264;
&rfc3830;
&rfc4567;
&rfc4568;
&rfc5285;
&rfc5506;
&rfc5761;
&rfc5764;
&draft-rosenberg-rtcweb-rtpmux;
<?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>
<?rfc include='reference.I-D.lennox-rtcweb-rtp-media-type-mux'?>
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 13:31:06 |