One document matched: draft-lennox-raiarea-rtp-grouping-taxonomy-00.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
]>
<?rfc toc="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="info" docName="draft-lennox-raiarea-rtp-grouping-taxonomy-00"
     ipr="trust200902">
  <front>
    <title abbrev="RTP Grouping Taxonomy">A Taxonomy of Grouping Semantics and Mechanisms
	  for Real-Time Transport Protocol (RTP) Sources</title>

    <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
      <organization abbrev="Vidyo">Vidyo, Inc.</organization>

      <address>
        <postal>
          <street>433 Hackensack Avenue</street>
          <street>Seventh Floor</street>
          <city>Hackensack</city>
          <region>NJ</region>
          <code>07601</code>
          <country>US</country>
        </postal>

        <email>jonathan@vidyo.com</email>
      </address>
    </author>

    <author fullname="Kevin Gross" initials="K." surname="Gross">
      <organization abbrev="AVA">AVA Networks, LLC</organization>

      <address>
        <postal>
          <street/>
          <city>Boulder</city>
          <region>CO</region>
          <country>US</country>
        </postal>

        <email>kevin.gross@avanw.com</email>
      </address>
    </author>

	<!-- Add more authors here! -->

    <date/>

    <area>RAI</area>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>

    <!-- TODO: more keywords -->

    <abstract>
      <t>The terminology about, and associations among, Real-Time
      Transport Protocol (RTP) sources can be complex and somewhat
      opaque.  This document describes a number of existing and
      proposed relationships among RTP sources, and attempts to define
      common terminology for discussing protocol entities and their
      relationships.</t>

	  <t>This document is still very rough, but is submitted in the
	  hopes of making future discussion productive.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">

	  <t>Terminology and understanding about sources in RTP, and how
	  they relate, is very confused.  Different protocols use the
	  same terms differently, and complexities addressed at one layer
	  are often glossed over or ignored at another.</t>

	  <t>This document attempts to provide some clarity by reviewing
	  the semantics of various aspects of sources in RTP.  As an
	  organizing mechanism, it approaches this by describing various
	  ways that RTP sources can be grouped and associated together.</t>  

    </section>

    <section title="RTP Terminology">

	  <section anchor="SSRC" title="Synchronization Source">

	  <t>In RTP, the smallest unit of media transport is the
	  "synchronization source" (SSRC).  Defined in
	  <xref target="RFC3550"/>, a synchronization source is the source
	  of a stream of RTP packets, identified by a 32-bit numeric SSRC
	  identifier carried in the RTP header of every RTP packet. All
	  packets from a synchronization source form part of the same
	  timing and sequence number space, so a receiver groups packets
	  by synchronization source for playback.  In the most common
	  cases, a synchronization source is a single flow of encoded
	  media that can be fed to a single decoding process.</t>

	  <t>In some more complex cases, a synchronization source is also
	  used to encode repair flows and sub-flows of a layered
	  encoding.  See <xref target="alternative"/> and
	  <xref target="repair"/> below.</t>

	  <t>(Note: as the definition above mentions, "synchronization
	  source" in <xref target="RFC3550"/> technically refers to the
	  *source* of the stream of RTP packets, not the packets
	  themselves.  A better term for the stream could be helpful, and
	  the authors are open to one; however, finding a good term that
	  doesn't collide with terms used in other standards is very
	  difficult.  In particular, both "stream" and "flow" are
	  overloaded enough that their use would probably confuse more
	  than it clarifies.)</t>

	  </section>

	  <section anchor="RTPsession" title="RTP Session">

	  <t>The RTP protocol transmits sources in an "RTP session".  An RTP
	  session is an association among a group of participants
	  communicating with RTP.  It is a group communications channel which carries a number
	  of synchronization sources.  Within an RTP session, every
	  participant finds out meta-data and control information (over
	  RTCP) about all the synchronization sources in the RTP session, all
	  synchronization source identifiers are unique in the RTP session,
	  and the bandwidth of the RTCP control channel is shared by all
	  the synchronization sources in the session.</t>

	  </section>

	  <section anchor="CSRC" title="Contributing Source">

	  <t>In an RTP session, in some cases,
	  the media flows of some synchronization sources
	  are not forwarded by RTP middleboxes (known as mixers and
	  translators).  In the case of mixers, each of the synchronization
	  sources that are used to create a mixed stream of media becomes
	  visible instead as a "contributing source" (CSRC), 
	  whose media is no longer directly visibile in some parts of the
	  RTP session, but whose meta-data is still known by all
	  participants through RTCP.</t>

	  </section>

      <section anchor="Multimediasession" title="Multimedia Session">
	  
	  <t>A group of RTP sessions can be grouped into a "multimedia
	  session", a group of concurrent, associated RTP sessions among a common group of
	  participants.  A multimedia session defines logical relationships among
	  sources that appear in multiple RTP sessions.</t>

	  </section>

	  <section anchor="Mediasource" title="Media Source">

	  <t>At a higher level than this is the "media source", a single
	  piece of media that can be indepentently rendered.  While in the
	  simplest cases a synchronization source uniquely 
	  encodes a media source, many more complicated cases, discussed
	  in <xref target="alternative"/> and <xref target="repair"/> below,
	  associate multiple synchronization sources together to provide various
	  methods and options for encoding a media source.  (In some of
	  these cases, these multiple synchronization sources are even
	  sent in separate RTP sessions, though always in the same
	  multimedia session.)</t>

	  <t>Unlike the
	  other terminology defined in this section, "media source" is not a
	  well-established term; existing protocols use a number of
	  different terms to define similar concepts, described below.
	  ("Media source" may not be the best term in for this concept;
	  the authors of this document are open to better ones.)</t> 

	  </section>
	  
    </section>

	<section title="Terminology used other protocols and groups">
	  <section anchor="sdp" title="SDP">

		<t>The <xref target="RFC4566">Session Description
		Protocol</xref> provides a mechanism for describing -- and,
		with <xref target="RFC3264">SDP Offer/Answer</xref>,
		negotiation -- multimedia sessions.</t>

		<section anchor="SDPMultimediaSession" title="SDP Multimedia Session">

		<t>SDP uses the term "multimedia session" in essentially the
		same manner as RTP.</t>

		<section anchor="SDPMediaStream" title="Media Stream">

		<t>"Media stream" is perhaps one of the most confusing terms
		in SDP.  It is never explicitly defined
		in <xref target="RFC4566"/>; its meaning can only be inferred
		from descriptions of its properties, the SDP syntax used to
		describe it, and its usage in other protocols.</t>

		<t>In SDP, a media stream is described my a "media
		description", a block of SDP starting with an "m=" line and
		followed by other SDP fields which are scoped to that media
		description.</t>

		<t>When SDP is used to describe a media stream that uses RTP,
		the attributes used (in the core SDP protocol) all define
		attributes of an RTP session.  Furthermore,
		<xref target="RFC4566"/> does note that "Media streams can be
		many-to-many."  Thus, when SDP describes a media stream that
		uses RTP, the RTP-level protocol element that is described is
		an RTP session, which can carry any number of synchronization
		sources (and thus media sources).</t>

		<t>However, for a number of reasons, many implementations of
		SDP, notably SIP devices, instead interpret SDP media streams
		as containing only a single media source in each direction
		(assuming unicast flows).  There are a number of reasons this
		implementation decision was chosen, including the lack of
		definition of "media stream" in the SDP specification; the
		poorly chosen name (the term "media stream" seems like
		something that should refer to what this document calls a
		"media source"); the fact that core SDP, and offer/answer,
		provide no description or negotiation mechanisms for anything
		more fine-grained than a media source; and the fact that most
		existing implementations have no need for more than one media
		source and received of each media type.</t>

		<t>Other than in discussions of other existing terminology,
		this document attempts to avoid the use of the word "stream".</t>

		</section>

	  </section>

	  <section anchor="clue" title="CLUE">

		<t>The CLUE working group is ongoing work in the IETF to
		define protocols and mechanism for interoperable telepresence
		systems.  Its architecture and terms are defined
		in <xref target="I-D.ietf-clue-framework"/>.</t>

		<section anchor="ClueCaptureScene" title="CLUE Capture Scene">

		<t>In CLUE, a "Capture Scene" is a structure representing a spatial region containing
		one or more Capture Devices, each capturing media
		representing a portion of the region. The spatial region
		represented by a scene may or may not correspond to a real
		region in physical space, such as a room.  A capture scene
		includes attributes and one or more capture scene entries,
		with each entry including one or more media captures.</t>

		</section>

		<section anchor="ClueCaptureSceneEntry" title="CLUE Capture Scene Entry">

		 <t>In CLUE, a "Capture Scene Entry" is a list of Media
		  Captures of the same media type that together form one way
		  to represent the entire Capture scene.</t>

		</section>

		<section anchor="ClueCapture" title="CLUE Capture">

		  <t>In CLUE, a "Media Capture" is a source of Media, such as
		  from one or more Capture Devices or constructed from other
		  Media streams.</t>

		<t>The "Capture" is CLUE's term that is most closely
		equivalent to what this document calls a "media source";
		however, some usages (such as "dynamic sources" of "switched
		captures") may be more complex, and are still actively being
		defined.</t>

		</section>

		<section anchor="ClueCaptureEncoding" title="CLUE Capture Encoding">

		  <t>In CLUE, a "Capture Encoding" is a specific encoding of a
		  Media Capture, to be sent by a Media Provider to a Media
		  Consumer via RTP.</t>

		  <t>In simulcast scenearios (see
		  <xref target="alternative"/><!-- XXX replace with proper
		  subsection -->), multiple capture encodings can be used
		  simultaneously to transmit a single capture.</t>

		</section>

	  </section>

	  <section anchor="webrtc" title="WebRTC">

		<t>WebRTC is ongoing work in the IETF and W3C to define
		mechanisms and APIs to allow Javascript applications in web
		browsers to participate in multimedia sessions.  An overview
		can be found in <xref target="I-D.ietf-rtcweb-overview"/>.</t>

		<section anchor="RtcMediaStreamTrack" title="WebRTC RtcMediaStreamTrack">

		<t>RtcMediaStreamTrack</t>

		<t>An "RtcMediaStreamTrack" is an object in WebRTC that maps most
		closely to what this document calls a "media source",
		though some of the details remain to be defined.  However, as
		WebRTC is defining an API, the RtcMediaStreamTrack object also
		includes a good deal of meta-data about the media source, such as an ID,
		enabled/disabled state, and so forth.</t>

		</section>

		<section anchor="RtcMediaStream" title="WebRTC RtcMediaStream">

		<t>An "RtcMediaStream" in WebRTC is an object that groups a
		set of RtcMediaStreamTracks that are synchronized with each other.</t>

		</section>

	  </section>

	</section>

	  
	</section>
	<section title="Grouping of RTP sources">

	  <t>Synchronization sources can have a large variety of
	  relationships among them.  These relationships can apply both
	  between sources within a single session, and between sources
	  that occur in multiple sessons.</t>

	  <t>Ways of relating synchronization sources typically involve groups: a set of
	  synchronizaiton sources has some relationship that applies to all those in the
	  group, and no others.  (Relationships that involve arbitrary
	  non-grouping associations among synchronization sources, such
	  that e.g., A relates
	  to B and B to C, but A and C are unrelated, are uncommon if not
	  nonexistant.)</t>

	  <t>In many cases, the semantics of groups are not simply that
	  the sources form an undifferentiated group, but rather that
	  members of the group have certain roles.</t>

	<section title="Overview of hierarchical relationships among groups">

	  <t>Many (though not all) associations among groups can be
		described hierarchically.</t>

	  <t>These hierarchichal groups can be divided into three large
	  categories: associations among independent media sources;
	  alternative representations of media sources; and mechanisms for
	  robustness and repair of a particular representation of a media source.</t>

	</section>

	<section title="Existing and proposed source group semantics">

	  <t>We can divide group semantics into several broad categories.</t>

	  <t>(TODO: each of the items below needs to be described in detail.)</t>

	  <section title="Relationships among media sources">

		<t>Synchronization Contexts -- things identified by a
		CNAME.</t>

		<t>CLUE Scenes (and their substructures, down to Captures.).</t>

		<t>WebRTC MediaStreams.</t>

		<t>Clock source signaling.</t>

	  </section>

	  <section anchor="alternative" title="Alternative representations of media sources">

		<t>Simulcast.</t>

		<t>Multi-Stream Transmission of Layered Codecs.</t>

		<t>The original FID description in RFC 5888 (grouping) Section
		8.</t>

	  </section>

	  <section anchor="repair" title="Robustness and Repair">

		<t>Forward Error Correction (FEC).  (But in some cases this
		can also be across multiple media sources!)</t>

		<t>Retransmission.</t>

	  </section>

	  <section title="Reporting Associations">

		<t>RTP also uses SSRC values to identify receivers of media, the
		source of feedback about synchronization sources (and thus, also media
		sources).  In some cases these receivers are also grouped.
		However, these associations are not associations of senders
		of media, and thus are not the same type of relationship as
		the ones described above.</t>

		<t>Inter-Domain Media Synchronization</t>

		<t>Reporting groups.</t>

	  </section>
	  
	</section>

	</section>

	<section title="Existing and proposed mechanisms for describing groups">

	  <t>SDP groups</t>

	  <t>SDP source groups</t>

	  <t>SRCNAME</t>

	  <t>MSID</t>

	  <t>CLUE</t>

	  <t>Application-specific SDP attributes</t>

	  <t>SDP "number of ports" field to create multiple sessions with SSRC alignment.</t>

	</section>

    <section anchor="security" title="Security Considerations">
	  <t>Different for each way of grouping.  Is there anything we
	  can generalize?</t>

	  <t>Hopefully having a well-defined common terminology and
	  understanding of the complexities of the RTP architecture will
	  help lead us to better standards, avoiding security problems.</t>
    </section>

    <section title="Open Issues">
	  <t>Much of the terminology is still a matter of dispute.</t>

	  <t>It might be useful to distinguish between a single endpoint's view of a
	  source, or RTP session, or multimedia session, versus the full
	  set of sessions and every endpoint that's communicating in them,
	  with the signaling that established them.</t> 

	  <t>(Sure to be many more...)</t>
    </section>

    <section anchor="iana" title="IANA Considerations">

	  <t>This document makes no request of IANA.</t>

	</section>
  </middle>

  <back>
	<!-- I don't think we would have any normative references. -->

    <references title="Informative References">
	  <?rfc include="reference.RFC.3264"?>

	  <?rfc include="reference.RFC.3550"?>
	  
	  <?rfc include="reference.RFC.4566"?>

	  <?rfc include="reference.RFC.6222"?>

	  <?rfc include="reference.I-D.ietf-clue-framework"?>

	  <?rfc include="reference.I-D.ietf-rtcweb-overview"?>

	<!-- Lots more to add. -->

    </references>

	<!-- Add this after version -00.

    <section title="Changes From Earlier Versions">
      <t>Note to the RFC-Editor: please remove this section prior to
      publication as an RFC.</t>

      <section title="Changes From Draft -00">
        <t><list style="symbols">
            <t></t>
        </list></t>
      </section>
    </section>

	-->

	<section title="RTP terminology">

	  <t>(TODO: This list of definitions needs to be cleaned up and
	  expanded, if it's useful.  Possibly its content should simply be
	  merged into the appropriate sections of the main document, instead.)</t>

	  <section anchor="Capture" title="Capture">
      <t>"Capture" is the term used in the IETF CLUE Telepresence
      Framework to refer to what this document calls a "media source".</t>
	  </section>

	  <section anchor="CNAME" title="CNAME">
      <t>An RTP canonical name (RTP CNAME or CNAME) is a persistent transport-level identifier for an <xref target="RTPendpoint">RTP endpoint</xref>. CNAMEs must be unique within an <xref target="RTPsession">RTP session</xref>.<!--RFC 6222 abstract--> The RTCP CNAME can be either persistent across different RTP sessions for an RTP endpoint or unique per session, meaning that an RTP endpoint chooses a different CNAME for each session.<xref target="RFC6222"/><!--section 4.1--> <xref target="Receiver">receivers</xref> require the CNAME to keep track of each participant<!--what's a participant?-->. Receivers may also require the CNAME to associate multiple data streams<!--what kind of stream?--> from a given participant in a set of related RTP sessions, for example to synchronize audio and video.</t>
	  </section>

	  <section anchor="Endsystem" title="End system">
      <t>An application that generates the content to be sent in RTP packets and/or consumes the content of received RTP packets.  An end system can produce one or more <xref target="SSRC">synchronization sources</xref> in a particular <xref target="RTPsession">RTP session</xref>.<xref target="RFC3550"/><!--section 3--></t>
    </section>
    
    <section anchor="Group" title="Group">
	  </section>

    <section anchor="AppMultimediasession" title="Multimedia session">
      <t>A set of concurrent, associated <xref target="RTPsession">RTP sessions</xref> among a common group of participants.<xref target="RFC3550"/><!--section 3--></t>
    </section>

	  <section anchor="MST" title="Multi-stream transmission">
      <t>Multi-stream transmission (MST) is a mechanism by which different portions
      of a layered encoding of a media stream are sent using separate
      synchronization sources (sometimes in separate RTP sessons).
       MSTs are useful for receiver control of layered media.</t>
	  </section>

	  <section anchor="Receiver" title="Receiver">
      <!--used but never defined in RFC 3550-->
	  </section>

	  <section anchor="Repairstream" title="Repair stream">
      <t>A repair stream is a secondary stream used for error recovery. The repair mecahnisms available include
        <list style="symbols">
          <t>duplication of the original stream,</t>
          <t>duplication of the original stream with a time offset,</t>
          <t>forward error correction (FEC) techniques, and.</t>
          <t>retransmission of lost packets (either globally or selectively).</t>
        </list><!--from IETF 85 7 November 2012 meeting notes-->
      </t>
	  </section>

	  <section anchor="RTPendpoint" title="RTP endpoint">
	  </section>
    
	  <section anchor="AppRTPsession" title="RTP session">
      <t>An RTP session or simply, session is an association among a set of participants communicating with RTP.<xref target="RFC3550"/><!--section 3--></t>
	  </section>

	  <section anchor="Scene" title="Scene">
      <t>A scene is associated with IETF CLUE and describes a collection of <xref target="Capture">captures</xref> and the metadata required to make associations (e.g. relative spacial orintation of cameras) between the captures.</t><!--from IETF 85 7 November 2012 meeting notes-->
	  </section>

	  <section anchor="Sender" title="Sender">
      <!--used but never defined in RFC 3550-->
	  </section>

	  <section anchor="Simulcast" title="Simulcast">
      <!--from IETF 85 7 November 2012 meeting notes-->
	  </section>

	  <section anchor="Source" title="Source">

      <t>A synchronization source (SSRC) is the source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to be dependent upon the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback.<xref target="RFC3550"/><!--section 3--></t>
	  </section>

	  <section anchor="AppMediastream" title="Media stream">
	  </section>

	  <section anchor="Switchedcapture" title="Switched capture">
      <t>A switched capture is used in teleconference appliations and is a <xref target="AppMediastream">stream</xref> whose content is selected from one of a selection of sources. The source is selected based on an algorithm in the equipment producing the stream for example, a closeup of the person currently speaking in a conference room.</t><!--from IETF 85 7 November 2012 meeting notes-->
	  </section>

	  <section anchor="Track" title="Track">
      <t>A track is associated with W3C WebRTC and IETF RTCWeb projects. A track is defined by SSRC and CNAME. A track is the smallest media entity that may be independently rendered. A single audio channel is a track.</t><!--from IETF 85 7 November 2012 meeting notes-->
	  </section>
	</section>

  </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 03:22:22