http://stupid.domain.name/ietf/

One document matched: draft-lennox-clue-rtp-usage-00.xml
<?xml version='1.0'?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 SYSTEM
      'reference.RFC.2119.xml'>
    <!ENTITY rtp SYSTEM
      'reference.RFC.3550.xml'>
    <!ENTITY clueusecases SYSTEM
      'reference.I-D.ietf-clue-telepresence-use-cases.xml'>
    <!ENTITY cluerequirements SYSTEM
      'reference.I-D.ietf-clue-telepresence-requirements.xml'>
    <!ENTITY clueframework SYSTEM
      'reference.I-D.ietf-clue-framework.xml'>
    <!ENTITY h281 SYSTEM
      'reference.ITU.H281.1994.xml'> 
    <!ENTITY h224rtp SYSTEM
      'reference.RFC.4573.xml'>
    <!ENTITY contentattr SYSTEM
      'reference.RFC.4796.xml'>
    <!ENTITY hdrext SYSTEM
      'reference.RFC.5285.xml'>
]>

<?rfc toc="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>

<rfc category='std' ipr='trust200902' docName='draft-lennox-clue-rtp-usage-00'>
    <front>
        <title abbrev='RTP Usage for Telepresence'>
		Real-Time Transport Protocol (RTP) Usage for Telepresence Sessions
        </title>

        <author initials='J.' surname='Lennox'
                fullname='Jonathan Lennox'>
            <organization abbrev='Vidyo'>
               Vidyo, Inc.
            </organization>
            <address>
               <postal>
                   <street>433 Hackensack Avenue</street>
                   <street>Seventh Floor</street>
                   <city>Hackensack</city> <region>NJ</region>
                   <code>07601</code>
                   <country>US</country>
               </postal>
               <email>jonathan@vidyo.com</email>
            </address>
        </author>

        <author initials="A." surname="Romanow"
                fullname="Allyn Romanow">
            <organization>Cisco Systems</organization>
    
            <address>
               <postal>
				   <street> </street>
                   <city>San Jose</city> <region>CA</region>
                   <code>95134</code>
                   <country>USA</country>
               </postal>
               <email>allyn@cisco.com</email>
            </address>
        </author>

        <date />
        <area>RAI</area>
        <workgroup>RTCWEB</workgroup>

        <keyword>I-D</keyword>
        <keyword>Internet-Draft</keyword>
		<!-- TODO: more keywords -->

        <abstract>
		<t>
		  This document describes mechanisms and recommended practice
		  for transmitting the media streams of telepresence sessions
		  using the Real-Time Transport Protocol (RTP).
		</t>
        </abstract>

    </front>

<middle>

<section title='Introduction' anchor='introduction'>

<t>Telepresence systems, of the architecture described by
<xref target='I-D.ietf-clue-telepresence-use-cases' />
and <xref target='I-D.ietf-clue-telepresence-requirements' />, will send and
receive multiple media streams, where the number of streams in use is
potentially large and asymmetric between endpoints, and streams can
come and go dynamically.  These characteristics lead to a number of architectural
design choices which, while still in the scope of potential
architectures envisioned by the <xref target='RFC3550'>Real-Time
Transport Protocol</xref>, must be fairly different than those
typically implemented by the current generation of voice or video
conferencing systems.  This document makes recommendations about how
streams should be encoded and transmitted in RTP for this telepresence
architecture.</t>

</section>

<section title='Terminology'>

<t>The key words "MUST", "MUST NOT", "REQUIRED", 
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", 
and "OPTIONAL" in this document
are to be interpreted as described in <xref
target='RFC2119'>RFC 2119</xref> and indicate requirement levels for
compliant implementations.</t>

</section>

<section title='Source multiplexing'>

<t>Telepresence sessions have lots of media streams: easily dozens at a time
  (given, e.g., a continuous presence screen in a multi-point
  conference), potentially out of a
  possible pool of hundreds.  Furthermore, endpoints will have an
  asymmetric number of media streams.</t>

<t>In such an environment the usual model of existing SIP endpoints –-
  sending zero or one source (in each direction) per RTP session --
  doesn’t scale, and mapping asymmetric numbers of sources to sessions
  is needlessly complex.</t>

<t>Therefore, telepresence systems SHOULD use a single RTP session per
  media type, except where
  there's a need to give sessions different transport treatment.  All
  sources of the same media type are sent over this single RTP
  session.  This architecture (known as "source multiplexing") was
  defined by <xref target='RFC3550' />, but was used rarely until more
  recently by some Telepresence systems.  
</t>

<t>
Multiplexing multiple media streams in this way has additional
advantages.   It makes going through middle boxes  
considerably easier, as it allows Telepresence devices to work through
SIP B2BUAs that do not support multiple media lines of  
the same media type. It also simplifies NAT and firewall traversal by allowing
endpoint to deal with only a single address/port mapping  
per media type rather than multiple mappings.
</t>

<t>
During call setup, a single RTP session is negotiated for each media type.  
In SDP, only one media line is negotiated per media and multiple media
streams are sent over the same UDP channel negotiated  
using the SDP media line.
</t>

</section>

<section title='Demultiplexing sources'>

<t>
The main issue to work out is how receivers demultiplex and interpret the
multiple media streams. There are several alternatives  
that need to be analyzed in light of their advantages and
disadvantages for different scenarios. In
the <xref target='I-D.ietf-clue-framework'>CLUE Framework</xref>,
different types of audio and video captures are
distinguished. There are main and presentation video captures, and "switched" and  
"composite/mixed" synthetic captures [edt. names may chance in the Framework].
</t>

<t>In <xref target='RFC3550'>RTP</xref>, a synchroniztion source
  (SSRC) value corresponds to
  a single stream of media, which is fed into a single decoding process.  It
  therefore seems fairly clear that statically-transmitted captures,
  and probably composited/mixed captures,
  should be distinguished by SSRC values.  There may, however, be
  additional meta-data needed as well, for instance to communicate
  layout decisions or recommendations about the sources</t> 

<t>
For switched sources, however, things are less clear.  Switched
sources could be sent with an SSRC corresponding to the
switched source, with some meta-data indicating the original source
that is being sent at any given time.  Alternately, they could be sent
with an SSRC indicating the original source that is being transmitted,
with meta-data indicating that the source is being sent because it
corresponds to a receiver's requested switched source.
</t>

<t>
In each case, this meta-data could either be sent in-band in the RTP
session, or out-of-band using some sort of lightweight signaling
protocol.
</t>

<t>
If the meta-data is sent in the RTP session, it could either be sent
as the RTP CSRC, or as a new <xref target='RFC5285'>RTP header
tension</xref>.  In the semantics of RTP, the CSRC would seem to make
sense for describing composited sources, and for switched sources if the
SSRC corresponds to the switch rather than the original sources.  RTP
header extensions would probably make more sense for other cases.</t>


<t>All these potential approaches need further study to identify their
benefits and drawbacks.</t>


</section>

<section title='Transmission of presentation sources'>

<t>Most existing videoconferencing systems use separate RTP sessions
  for main and presentation video sources, distinguished by the
  <xref target='RFC4796'>SDP content attribute</xref>.  It seems
  likely that telepresence systems should not maintain this
  transport-level separation, but should instead send both main and
  presentation video over the single video RTP session, in just the
  same manner as with multiple main video sources.  However, some
  receivers need to be able to treat main and presentation video
  differently, so there will need to be a mechanism for receivers to
  reliably distinguish the types of sources.</t>

</section>

<section title='Other considerations'>

<t>As currently defined, <xref target='ITU.H281.1994'>H.281 Far-End Camera
Control</xref><xref target='RFC4573' /> does not, in SIP-based
videoconferences, support selecting among multiple remote sources
(though it does in H.323 conferences controled by an MCU, which can
assign terminal IDs to sources).  When RTP sessions contain multiple sources,
this limitation becomes pressing.  (However, this problem does not
appear to be in scope of the CLUE working group.)</t>

</section>

<section title='Security Considerations' anchor='security'>

<t>The security considerations for multiplexed RTP do not seem to be
  different than for non-multiplexed RTP.</t>

</section>

<section title='IANA Considerations' anchor='iana'>

<t>This document makes no requests of IANA.</t>

<t>Note to RFC Editor: please remove this section before publication
  as an RFC.</t>

</section>

</middle>

<back>

<references title='Normative References'>

&rfc2119;

&rtp;

</references>

<references title='Informative References'>

&clueusecases;

&cluerequirements;

&clueframework;

&h281;

&h224rtp;

&contentattr;

&hdrext;

</references>

<!--
<section title='Open issues'>

<t><list style='symbols'>

<t></t>

</list></t>

</section>
-->

<!-- 
<section title='Changes From Earlier Versions'>

<t>Note to the RFC-Editor: please remove this section prior to publication
as an RFC.</t>

<section title='Changes From Draft -00'>

<t><list style='symbols'>

<t></t>


</list></t>

</section>

</section>

-->

</back>

</rfc>
PAFTECH AB 2003-2026
2026-04-24 05:40:41