<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
]>
<?rfc toc="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="info" docName="draft-lennox-raiarea-rtp-grouping-taxonomy-00"
ipr="trust200902">
<front>
<title abbrev="RTP Grouping Taxonomy">A Taxonomy of Grouping Semantics and Mechanisms
for Real-Time Transport Protocol (RTP) Sources</title>
<author fullname="Jonathan Lennox" initials="J." surname="Lennox">
<organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address>
<postal>
<street>433 Hackensack Avenue</street>
<street>Seventh Floor</street>
<city>Hackensack</city>
<region>NJ</region>
<code>07601</code>
<country>US</country>
</postal>
<email>jonathan@vidyo.com</email>
</address>
</author>
<author fullname="Kevin Gross" initials="K." surname="Gross">
<organization abbrev="AVA">AVA Networks, LLC</organization>
<address>
<postal>
<street/>
<city>Boulder</city>
<region>CO</region>
<country>US</country>
</postal>
<email>kevin.gross@avanw.com</email>
</address>
</author>
<!-- Add more authors here! -->
<date/>
<area>RAI</area>
<keyword>I-D</keyword>
<keyword>Internet-Draft</keyword>
<!-- TODO: more keywords -->
<abstract>
<t>The terminology about, and associations among, Real-Time
Transport Protocol (RTP) sources can be complex and somewhat
opaque. This document describes a number of existing and
proposed relationships among RTP sources, and attempts to define
common terminology for discussing protocol entities and their
relationships.</t>
<t>This document is still very rough, but is submitted in the
hopes of making future discussion productive.</t>
</abstract>
</front>
<middle>
<section anchor="introduction" title="Introduction">
<t>The terminology for sources in RTP, and the understanding of
how they relate, is very confused. Different protocols use the
same terms differently, and complexities addressed at one layer
are often glossed over or ignored at another.</t>
<t>This document attempts to provide some clarity by reviewing
the semantics of various aspects of sources in RTP. As an
organizing mechanism, it approaches this by describing various
ways that RTP sources can be grouped and associated together.</t>
</section>
<section title="RTP Terminology">
<section anchor="SSRC" title="Synchronization Source">
<t>In RTP, the smallest unit of media transport is the
"synchronization source" (SSRC). Defined in
<xref target="RFC3550"/>, a synchronization source is the source
of a stream of RTP packets, identified by a 32-bit numeric SSRC
identifier carried in the RTP header of every RTP packet. All
packets from a synchronization source form part of the same
timing and sequence number space, so a receiver groups packets
by synchronization source for playback. In the most common
cases, a synchronization source is a single flow of encoded
media that can be fed to a single decoding process.</t>
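<t>The grouping of packets by synchronization source described above
can be sketched as follows. This is a minimal illustration, not a
complete RTP implementation: it assumes the fixed 12-byte header layout
of <xref target="RFC3550"/>, ignores header extensions and payloads, and
the names parse_rtp_header, group_by_ssrc, and make_packet are
hypothetical helpers introduced only for this example.</t>
<figure><artwork><![CDATA[
```python
import struct
from collections import defaultdict


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than RTP fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return {
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def group_by_ssrc(packets):
    """Group parsed headers by SSRC.

    Each group shares one sequence number and timing space.
    """
    groups = defaultdict(list)
    for packet in packets:
        header = parse_rtp_header(packet)
        groups[header["ssrc"]].append(header)
    return dict(groups)


def make_packet(ssrc, seq, timestamp, payload_type=96):
    # Hypothetical helper: builds a header-only test packet.
    return struct.pack("!BBHII", 0x80, payload_type, seq,
                       timestamp, ssrc)


packets = [
    make_packet(0x1111, 1, 1000),
    make_packet(0x2222, 7, 5000),
    make_packet(0x1111, 2, 1160),
]
groups = group_by_ssrc(packets)
```
]]></artwork></figure>
<t>In this sketch, the two packets carrying SSRC 0x1111 end up in one
group and the packet carrying SSRC 0x2222 in another, regardless of the
order in which they arrived.</t>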
<t>In some more complex cases, a synchronization source is also
used to encode repair flows and sub-flows of a layered
encoding. See <xref target="alternative"/> and
<xref target="repair"/> below.</t>
<t>(Note: as the definition above mentions, "synchronization
source" in <xref target="RFC3550"/> technically refers to the
*source* of the stream of RTP packets, not the packets
themselves. A better term for the stream could be helpful, and
the authors are open to one; however, finding a good term that
doesn't collide with terms used in other standards is very
difficult. In particular, both "stream" and "flow" are
overloaded enough that their use would probably confuse more
than it clarifies.)</t>
</section>
<section anchor="RTPsession" title="RTP Session">
<t>RTP carries sources within an "RTP session". An RTP
session is an association among a group of participants
communicating with RTP. It is a group communications channel which carries a number
of synchronization sources. Within an RTP session, every
participant learns meta-data and control information (over
RTCP) about all the synchronization sources in the session; all
synchronization source identifiers are unique within the session;
and the bandwidth of the RTCP control channel is shared by all
the synchronization sources in the session.</t>
</section>
<section anchor="CSRC" title="Contributing Source">
<t>In some cases, RTP middleboxes (known as mixers and
translators) do not forward the media flows of some
synchronization sources in an RTP session. In the case of mixers, each of the synchronization
sources used to create a mixed stream of media becomes
visible instead as a "contributing source" (CSRC),
whose media is no longer directly visible in some parts of the
RTP session, but whose meta-data is still known to all
participants through RTCP.</t>
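<t>As an illustration of how contributing sources appear on the wire,
the following sketch reads the CSRC list that a mixer places after the
fixed RTP header. It assumes the header layout of
<xref target="RFC3550"/> (a 4-bit CSRC count in the first octet,
followed by that many 32-bit identifiers after the SSRC); the names
parse_csrc_list and make_mixed_packet are hypothetical and exist only
for this example.</t>
<figure><artwork><![CDATA[
```python
import struct


def parse_csrc_list(packet: bytes):
    """Return (ssrc, [csrc, ...]) from an RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than RTP fixed header")
    # CC field: low four bits of the first octet.
    csrc_count = packet[0] & 0x0F
    ssrc = struct.unpack("!I", packet[8:12])[0]
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    fmt = "!%dI" % csrc_count
    csrcs = list(struct.unpack(fmt, packet[12:end]))
    return ssrc, csrcs


def make_mixed_packet(mixer_ssrc, contributing_ssrcs):
    # Hypothetical helper: a header-only packet such as a mixer
    # might emit. The CC field can describe at most 15 CSRCs.
    if len(contributing_ssrcs) > 15:
        raise ValueError("at most 15 CSRCs fit in one packet")
    first = 0x80 | len(contributing_ssrcs)  # version 2, CC
    header = struct.pack("!BBHII", first, 96, 1, 0, mixer_ssrc)
    tail = b"".join(struct.pack("!I", c)
                    for c in contributing_ssrcs)
    return header + tail


mixed = make_mixed_packet(0xAAAA, [0x1111, 0x2222])
ssrc, csrcs = parse_csrc_list(mixed)
```
]]></artwork></figure>
<t>Here the receiver sees a single synchronization source (the mixer,
0xAAAA) whose packets name 0x1111 and 0x2222 as contributing
sources.</t>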
</section>
<section anchor="Multimediasession" title="Multimedia Session">
<t>A set of RTP sessions can be grouped into a "multimedia
session": a group of concurrent, associated RTP sessions among a common group of
participants. A multimedia session defines logical relationships among
sources that appear in multiple RTP sessions.</t>
</section>
<section anchor="Mediasource" title="Media Source">
<t>At a higher level than this is the "media source", a single
piece of media that can be independently rendered. While in the
simplest cases a synchronization source uniquely
encodes a media source, many more complicated cases, discussed
in <xref target="alternative"/> and <xref target="repair"/> below,
associate multiple synchronization sources together to provide various
methods and options for encoding a media source. (In some of
these cases, these multiple synchronization sources are even
sent in separate RTP sessions, though always in the same
multimedia session.)</t>
<t>Unlike the
other terminology defined in this section, "media source" is not a
well-established term; existing protocols use a number of
different terms to define similar concepts, described below.
("Media source" may not be the best term for this concept;
the authors of this document are open to better ones.)</t>
</section>
</section>
<section title="Terminology used by other protocols and groups">
<section anchor="sdp" title="SDP">
<t>The <xref target="RFC4566">Session Description
Protocol</xref> provides a mechanism for describing -- and,
with <xref target="RFC3264">SDP Offer/Answer</xref>,
negotiating -- multimedia sessions.</t>
<section anchor="SDPMultimediaSession" title="SDP Multimedia Session">
<t>SDP uses the term "multimedia session" in essentially the
same manner as RTP.</t>
<section anchor="SDPMediaStream" title="Media Stream">
<t>"Media stream" is perhaps one of the most confusing terms
in SDP. It is never explicitly defined
in <xref target="RFC4566"/>; its meaning can only be inferred
from descriptions of its properties, the SDP syntax used to
describe it, and its usage in other protocols.</t>
<t>In SDP, a media stream is described by a "media
description", a block of SDP starting with an "m=" line and
followed by other SDP fields which are scoped to that media
description.</t>
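<t>The scoping rule above (each "m=" line opens a media description
that owns the fields following it) can be sketched as follows. This is
an illustration under simplifying assumptions, not a full SDP parser:
it does no validation of field syntax, the function name
split_media_descriptions is hypothetical, and the addresses, ports, and
payload types in the sample are made up.</t>
<figure><artwork><![CDATA[
```python
def split_media_descriptions(sdp: str):
    """Split SDP text into session-level lines and "m=" blocks."""
    session_lines, media_blocks = [], []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            # An "m=" line starts a new media description.
            media_blocks.append([line])
        elif media_blocks:
            # Later fields are scoped to the latest "m=" line.
            media_blocks[-1].append(line)
        else:
            session_lines.append(line)
    return session_lines, media_blocks


# Illustrative SDP only; all values are made up.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
a=sendrecv
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
"""

session_lines, media_blocks = split_media_descriptions(EXAMPLE_SDP)
```
]]></artwork></figure>
<t>In the sample, the "a=sendrecv" attribute belongs to the audio media
description and the "a=rtpmap" attribute to the video one; neither
applies to the session as a whole.</t>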
<t>When SDP is used to describe a media stream that uses RTP,
the attributes used (in the core SDP protocol) all define
attributes of an RTP session. Furthermore,
<xref target="RFC4566"/> does note that "Media streams can be
many-to-many." Thus, when SDP describes a media stream that
uses RTP, the RTP-level protocol element that is described is
an RTP session, which can carry any number of synchronization
sources (and thus media sources).</t>
<t>However, many implementations of
SDP, notably SIP devices, instead interpret SDP media streams
as containing only a single media source in each direction
(assuming unicast flows). There are a number of reasons for this
implementation choice, including the lack of a
definition of "media stream" in the SDP specification; the
poorly chosen name (the term "media stream" seems like
something that should refer to what this document calls a
"media source"); the fact that core SDP, and offer/answer,
provide no description or negotiation mechanisms for anything
more fine-grained than a media source; and the fact that most
existing implementations have no need for more than one media
source and receiver of each media type.</t>
<t>Other than in discussions of other existing terminology,
this document attempts to avoid the use of the word "stream".</t>
</section>
</section>
<section anchor="clue" title="CLUE">
<t>The CLUE working group is an ongoing effort in the IETF to
define protocols and mechanisms for interoperable telepresence
systems. Its architecture and terms are defined
in <xref target="I-D.ietf-clue-framework"/>.</t>
<section anchor="ClueCaptureScene" title="CLUE Capture Scene">
<t>In CLUE, a "Capture Scene" is a structure representing a spatial region containing
one or more Capture Devices, each capturing media
representing a portion of the region. The spatial region
represented by a scene may or may not correspond to a real
region in physical space, such as a room. A capture scene
includes attributes and one or more capture scene entries,
with each entry including one or more media captures.</t>
</section>
<section anchor="ClueCaptureSceneEntry" title="CLUE Capture Scene Entry">
<t>In CLUE, a "Capture Scene Entry" is a list of Media
Captures of the same media type that together form one way
to represent the entire Capture scene.</t>
</section>
<section anchor="ClueCapture" title="CLUE Capture">
<t>In CLUE, a "Media Capture" is a source of Media, such as
from one or more Capture Devices or constructed from other
Media streams.</t>
<t>The "Capture" is CLUE's term that is most closely
equivalent to what this document calls a "media source";
however, some usages (such as "dynamic sources" of "switched
captures") may be more complex, and are still actively being
defined.</t>
</section>
<section anchor="ClueCaptureEncoding" title="CLUE Capture Encoding">
<t>In CLUE, a "Capture Encoding" is a specific encoding of a
Media Capture, to be sent by a Media Provider to a Media
Consumer via RTP.</t>
<t>In simulcast scenarios (see
<xref target="alternative"/><!-- XXX replace with proper
subsection -->), multiple capture encodings can be used
simultaneously to transmit a single capture.</t>
</section>
</section>
<section anchor="webrtc" title="WebRTC">
<t>WebRTC is ongoing work in the IETF and W3C to define
mechanisms and APIs to allow JavaScript applications in web
browsers to participate in multimedia sessions. An overview
can be found in <xref target="I-D.ietf-rtcweb-overview"/>.</t>
<section anchor="RtcMediaStreamTrack" title="WebRTC RtcMediaStreamTrack">
<t>An "RtcMediaStreamTrack" is an object in WebRTC that maps most
closely to what this document calls a "media source",
though some of the details remain to be defined. However, as
WebRTC is defining an API, the RtcMediaStreamTrack object also
includes a good deal of meta-data about the media source, such as an ID,
enabled/disabled state, and so forth.</t>
</section>
<section anchor="RtcMediaStream" title="WebRTC RtcMediaStream">
<t>An "RtcMediaStream" in WebRTC is an object that groups a
set of RtcMediaStreamTracks that are synchronized with each other.</t>
</section>
</section>
</section>
</section>
<section title="Grouping of RTP sources">
<t>Synchronization sources can have a large variety of
relationships among them. These relationships can apply both
between sources within a single session, and between sources
that occur in multiple sessions.</t>
<t>Ways of relating synchronization sources typically involve groups: a set of
synchronization sources has some relationship that applies to all those in the
group, and no others. (Relationships that involve arbitrary
non-grouping associations among synchronization sources, such
that, e.g., A relates
to B and B to C, but A and C are unrelated, are uncommon if not
nonexistent.)</t>
<t>In many cases, the semantics of groups are not simply that
the sources form an undifferentiated group, but rather that
members of the group have certain roles.</t>
<section title="Overview of hierarchical relationships among groups">
<t>Many (though not all) associations among groups can be
described hierarchically.</t>
<t>These hierarchical groups can be divided into three large
categories: associations among independent media sources;
alternative representations of media sources; and mechanisms for
robustness and repair of a particular representation of a media source.</t>
</section>
<section title="Existing and proposed source group semantics">
<t>We can divide group semantics into several broad categories.</t>
<t>(TODO: each of the items below needs to be described in detail.)</t>
<section title="Relationships among media sources">
<t>Synchronization Contexts -- things identified by a
CNAME.</t>
<t>CLUE Scenes (and their substructures, down to Captures).</t>
<t>WebRTC MediaStreams.</t>
<t>Clock source signaling.</t>
</section>
<section anchor="alternative" title="Alternative representations of media sources">
<t>Simulcast.</t>
<t>Multi-Stream Transmission of Layered Codecs.</t>
<t>The original FID description in RFC 5888 (grouping) Section
8.</t>
</section>
<section anchor="repair" title="Robustness and Repair">
<t>Forward Error Correction (FEC). (But in some cases this
can also be across multiple media sources!)</t>
<t>Retransmission.</t>
</section>
<section title="Reporting Associations">
<t>RTP also uses SSRC values to identify receivers of media, the
source of feedback about synchronization sources (and thus, also media
sources). In some cases these receivers are also grouped.
However, these associations are not associations of senders
of media, and thus are not the same type of relationship as
the ones described above.</t>
<t>Inter-Destination Media Synchronization.</t>
<t>Reporting groups.</t>
</section>
</section>
</section>
<section title="Existing and proposed mechanisms for describing groups">
<t>SDP groups</t>
<t>SDP source groups</t>
<t>SRCNAME</t>
<t>MSID</t>
<t>CLUE</t>
<t>Application-specific SDP attributes</t>
<t>SDP "number of ports" field to create multiple sessions with SSRC alignment.</t>
</section>
<section anchor="security" title="Security Considerations">
<t>Different for each way of grouping. Is there anything we
can generalize?</t>
<t>Hopefully having a well-defined common terminology and
understanding of the complexities of the RTP architecture will
help lead us to better standards, avoiding security problems.</t>
</section>
<section title="Open Issues">
<t>Much of the terminology is still a matter of dispute.</t>
<t>It might be useful to distinguish between a single endpoint's view of a
source, or RTP session, or multimedia session, versus the full
set of sessions and every endpoint that's communicating in them,
with the signaling that established them.</t>
<t>(Sure to be many more...)</t>
</section>
<section anchor="iana" title="IANA Considerations">
<t>This document makes no request of IANA.</t>
</section>
</middle>
<back>
<!-- I don't think we would have any normative references. -->
<references title="Informative References">
<?rfc include="reference.RFC.3264"?>
<?rfc include="reference.RFC.3550"?>
<?rfc include="reference.RFC.4566"?>
<?rfc include="reference.RFC.6222"?>
<?rfc include="reference.I-D.ietf-clue-framework"?>
<?rfc include="reference.I-D.ietf-rtcweb-overview"?>
<!-- Lots more to add. -->
</references>
<!-- Add this after version -00.
<section title="Changes From Earlier Versions">
<t>Note to the RFC-Editor: please remove this section prior to
publication as an RFC.</t>
<section title="Changes From Draft -00">
<t><list style="symbols">
<t></t>
</list></t>
</section>
</section>
-->
<section title="RTP terminology">
<t>(TODO: This list of definitions needs to be cleaned up and
expanded, if it's useful. Possibly its content should simply be
merged into the appropriate sections of the main document, instead.)</t>
<section anchor="Capture" title="Capture">
<t>"Capture" is the term used in the IETF CLUE Telepresence
Framework to refer to what this document calls a "media source".</t>
</section>
<section anchor="CNAME" title="CNAME">
<t>An RTP canonical name (RTP CNAME or CNAME) is a persistent transport-level identifier for an <xref target="RTPendpoint">RTP endpoint</xref>. CNAMEs must be unique within an <xref target="RTPsession">RTP session</xref>.<!--RFC 6222 abstract--> The RTCP CNAME can either be persistent across different RTP sessions for an RTP endpoint, or be unique per session, meaning that an RTP endpoint chooses a different CNAME for each session <xref target="RFC6222"/>.<!--section 4.1--> <xref target="Receiver">Receivers</xref> require the CNAME to keep track of each participant<!--what's a participant?-->. Receivers may also require the CNAME to associate multiple data streams<!--what kind of stream?--> from a given participant in a set of related RTP sessions, for example to synchronize audio and video.</t>
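<t>The use of the CNAME to associate streams from one participant
across related RTP sessions can be sketched as follows. This is only an
illustration: the (session, SSRC, CNAME) tuples stand in for bindings a
receiver would learn from RTCP, and the function name
group_ssrcs_by_cname, the session labels, and the CNAME values are
assumptions made for this example.</t>
<figure><artwork><![CDATA[
```python
from collections import defaultdict


def group_ssrcs_by_cname(sdes_items):
    """Map each CNAME to the (session, SSRC) pairs using it."""
    contexts = defaultdict(set)
    for session, ssrc, cname in sdes_items:
        contexts[cname].add((session, ssrc))
    return {cname: sorted(members)
            for cname, members in contexts.items()}


# Hypothetical bindings, as a receiver might learn them via RTCP.
observed = [
    ("audio", 0x1111, "alice@example.com"),
    ("video", 0x2222, "alice@example.com"),
    ("audio", 0x3333, "bob@example.com"),
]
contexts = group_ssrcs_by_cname(observed)
```
]]></artwork></figure>
<t>In this sketch, the audio and video sources that share one CNAME
fall into the same group, which is the set a receiver would consider
for synchronized playback.</t>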
</section>
<section anchor="Endsystem" title="End system">
<t>An application that generates the content to be sent in RTP packets and/or consumes the content of received RTP packets. An end system can produce one or more <xref target="SSRC">synchronization sources</xref> in a particular <xref target="RTPsession">RTP session</xref>.<xref target="RFC3550"/><!--section 3--></t>
</section>
<section anchor="Group" title="Group">
</section>
<section anchor="AppMultimediasession" title="Multimedia session">
<t>A set of concurrent, associated <xref target="RTPsession">RTP sessions</xref> among a common group of participants.<xref target="RFC3550"/><!--section 3--></t>
</section>
<section anchor="MST" title="Multi-stream transmission">
<t>Multi-stream transmission (MST) is a mechanism by which different portions
of a layered encoding of a media stream are sent using separate
synchronization sources (sometimes in separate RTP sessions).
MSTs are useful for receiver control of layered media.</t>
</section>
<section anchor="Receiver" title="Receiver">
<!--used but never defined in RFC 3550-->
</section>
<section anchor="Repairstream" title="Repair stream">
<t>A repair stream is a secondary stream used for error recovery. The repair mechanisms available include
<list style="symbols">
<t>duplication of the original stream,</t>
<t>duplication of the original stream with a time offset,</t>
<t>forward error correction (FEC) techniques, and</t>
<t>retransmission of lost packets (either globally or selectively).</t>
</list><!--from IETF 85 7 November 2012 meeting notes-->
</t>
</section>
<section anchor="RTPendpoint" title="RTP endpoint">
</section>
<section anchor="AppRTPsession" title="RTP session">
<t>An RTP session (or simply "session") is an association among a set of participants communicating with RTP.<xref target="RFC3550"/><!--section 3--></t>
</section>
<section anchor="Scene" title="Scene">
<t>A scene is associated with IETF CLUE and describes a collection of <xref target="Capture">captures</xref> and the metadata required to make associations (e.g., relative spatial orientation of cameras) between the captures.</t><!--from IETF 85 7 November 2012 meeting notes-->
</section>
<section anchor="Sender" title="Sender">
<!--used but never defined in RFC 3550-->
</section>
<section anchor="Simulcast" title="Simulcast">
<!--from IETF 85 7 November 2012 meeting notes-->
</section>
<section anchor="Source" title="Source">
<t>A synchronization source (SSRC) is the source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.<xref target="RFC3550"/><!--section 3--></t>
</section>
<section anchor="AppMediastream" title="Media stream">
</section>
<section anchor="Switchedcapture" title="Switched capture">
<t>A switched capture is used in teleconference applications and is a <xref target="AppMediastream">stream</xref> whose content is selected from one of a set of sources. The source is selected based on an algorithm in the equipment producing the stream; for example, a closeup of the person currently speaking in a conference room.</t><!--from IETF 85 7 November 2012 meeting notes-->
</section>
<section anchor="Track" title="Track">
<t>A track is associated with W3C WebRTC and IETF RTCWeb projects. A track is defined by SSRC and CNAME. A track is the smallest media entity that may be independently rendered. A single audio channel is a track.</t><!--from IETF 85 7 November 2012 meeting notes-->
</section>
</section>
</back>
</rfc>