<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc autobreaks="yes"?>

<rfc category="info" docName="draft-lennox-raiarea-rtp-grouping-taxonomy-01"
     ipr="trust200902">
  <front>
    <title abbrev="RTP Grouping Taxonomy">A Taxonomy of Grouping Semantics and
    Mechanisms for Real-Time Transport Protocol (RTP) Sources</title>

    <author fullname="Jonathan Lennox" initials="J." surname="Lennox">
      <organization abbrev="Vidyo">Vidyo, Inc.</organization>

      <address>
        <postal>
          <street>433 Hackensack Avenue</street>

          <street>Seventh Floor</street>

          <city>Hackensack</city>

          <region>NJ</region>

          <code>07601</code>

          <country>US</country>
        </postal>

        <email>jonathan@vidyo.com</email>
      </address>
    </author>

    <author fullname="Kevin Gross" initials="K." surname="Gross">
      <organization abbrev="AVA">AVA Networks, LLC</organization>

      <address>
        <postal>
          <street/>

          <city>Boulder</city>

          <region>CO</region>

          <country>US</country>
        </postal>

        <email>kevin.gross@avanw.com</email>
      </address>
    </author>

    <author fullname="Suhas Nandakumar" initials="S" surname="Nandakumar">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>170 West Tasman Drive</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>US</country>
        </postal>

        <email>snandaku@cisco.com</email>
      </address>
    </author>

    <author fullname="Gonzalo Salgueiro" initials="G" surname="Salgueiro">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>7200-12 Kit Creek Road</street>

          <city>Research Triangle Park</city>

          <region>NC</region>

          <code>27709</code>

          <country>US</country>
        </postal>

        <email>gsalguei@cisco.com</email>
      </address>
    </author>

    <author fullname="Bo Burman" initials="B." surname="Burman">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>Kista</city>

          <code>SE-164 80</code>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 13 11</phone>

        <email>bo.burman@ericsson.com</email>
      </address>
    </author>


    <date year="2013"/>

    <area>Real Time Applications and Infrastructure (RAI)</area>

    <keyword>I-D</keyword>

    <keyword>Internet-Draft</keyword>


    <abstract>
      <t>The terminology about, and associations among, Real-Time Transport
      Protocol (RTP) sources can be complex and somewhat opaque. This document
      describes a number of existing and proposed relationships among RTP
      sources, and attempts to define common terminology for discussing
      protocol entities and their relationships.</t>

      <t>This document is still very rough, but is submitted in the hopes of
      making future discussion productive.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>The existing taxonomy of sources in RTP is often regarded as
      confusing and inconsistent. Consequently, a deep understanding of how
      the different terms relate to each other becomes a real challenge.
      Frequently cited examples of this confusion are (1) how different
      protocols that make use of RTP use the same terms to signify different
      things and (2) how the complexities addressed at one layer are often
      glossed over or ignored at another.</t>

      <t>This document attempts to provide some clarity by reviewing the
      semantics of various aspects of sources in RTP. As an organizing
      mechanism, it approaches this by describing various ways that RTP
      sources can be grouped and associated together.</t>
    </section>

    <section title="Concepts">
      <t>This section defines concepts that serve to identify various
      components in a given RTP usage. For each concept, an attempt is made
      to list any alternate definitions and usages that co-exist today,
      along with various characteristics that further describe the
      concept.</t>

      <t>Note: All references to ControLling mUltiple streams for
      tElepresence (CLUE) in this document map to <xref
      target="I-D.ietf-clue-framework"/> and all references to Web Real-Time
      Communications (WebRTC) map to <xref
      target="I-D.ietf-rtcweb-overview"/>.</t>

      <section anchor="endpoint" title="End Point">
        <t>A single entity sending or receiving RTP packets. It may be
        decomposed into several functional blocks, but as long as it behaves
        as a single RTP stack entity it is classified as a single "End
        Point".</t>

        <section title="Alternate Usages">
          <t>The CLUE Working Group (WG) uses the terms "Media Provider" and
          "Media Consumer" to describe the aspects of an End Point
          pertaining to its sending and receiving functionality.</t>
        </section>

        <section title="Characteristics">
          <t>End Points can be identified in several different ways. RTCP
          Canonical Names (CNAMEs) <xref target="RFC3550"/> provide a
          globally unique and stable identification mechanism for the
          duration of the Communication Session (see <xref
          target="commsession"/>), but their validity applies only within a
          synchronization context. Outside that context, a mechanism beyond
          the scope of RTP, such as an application-defined identifier, must
          be relied upon for End Point identification.</t>
        </section>
      </section>

      <section anchor="capturedevice" title="Capture Device">
        <t>The physical source of a stream of media data of one type, such
        as a camera or a microphone.</t>

        <section title="Alternate Usages">
          <t>The CLUE WG uses the term "Capture Device" to identify a physical
          capture device.</t>

          <t>WebRTC WG uses the term "Recording Device" to refer to the
          locally available capture devices in an end-system.</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>A Capture Device is identified either by
              hardware/manufacturer ID or via a session-scoped device
              identifier as mandated by the application usage.</t>

              <t>A Capture Device always corresponds to a Media Source (See
              <xref target="mediasource"/> for a definition of this term),
              but the converse is not always true. For example, the output
              of a media production function (e.g., an audio mixer) or a
              video editing function can represent data from several Media
              Sources.</t>
            </list></t>
        </section>
      </section>

      <section anchor="mediasource" title="Media Source">
        <t>A Media Source logically defines the source of a raw stream of
        media data as generated either by a single capture device or by a
        conceptual source. A Media Source represents an Audio Source or a
        Video Source.</t>

        <section title="Alternate Usages">
          <t>The CLUE WG uses the term "Media Capture" for this purpose. A
          CLUE Media Capture is identified via indexed notation. The terms
          Audio Capture and Video Capture are used to identify Audio Sources
          and Video Sources respectively. Concepts such as "Capture Scene",
          "Capture Scene Entry" and "Capture" provide a flexible framework to
          represent media captured spanning spatial regions.</t>

          <t>The WebRTC WG defines the term "RtcMediaStreamTrack" to refer
          to a Media Source. An "RtcMediaStreamTrack" is identified by its
          ID attribute.</t>

          <t>Typically a Media Source is mapped to a single m=line via the
          Session Description Protocol (SDP) <xref target="RFC4566"/> unless
          mechanisms such as Source-Specific attributes are in place <xref
          target="RFC5576"/>. In the latter case, an m=line can represent
          either multiple Media Sources or multiple Media Streams (See <xref
          target="mediastream"/> for a definition of this term).</t>
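
          <t>As an illustration (the SSRC values and CNAME below are
          hypothetical), an SDP fragment using source-specific attributes
          <xref target="RFC5576"/> to describe two Media Sources within a
          single m=line might look as follows:</t>

          <figure>
            <artwork><![CDATA[m=video 49170 RTP/AVP 96
a=rtpmap:96 VP8/90000
a=ssrc:11111 cname:user@example.com
a=ssrc:22222 cname:user@example.com
]]></artwork>
          </figure>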
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>A Media Source represents a real-time source of a raw
              stream of audio or video media data.</t>

              <t>At any point, it can represent either a physical capture
              source or a conceptual source.</t>

              <t>Typically raw media from a Media Source is compressed via the
              application of an appropriate encoding mechanism, thus creating
              an RTP payload for Media Streams (See <xref
              target="mediastream"/> for a definition of this term).</t>

              <t>Multiple transformations can be applied to the data from a
              Media Source, thus creating several Media Streams.</t>

              <t>Some notable transformations are described in <xref
              target="equivalence"/>.</t>
            </list></t>
        </section>
      </section>

      <section anchor="mediastream" title="Media Stream">
        <t>Media from a Media Source is encoded and packetized to produce one
        or more Media Streams representing a sequence of RTP packets.</t>

        <section title="Alternate Usages">
          <t>The term "Stream" is used by the CLUE WG to define an encoded
          Media Source sent via RTP. The terms "Capture Encoding" and
          "Encoding Group" are defined to capture specific details of the
          encoding scheme.</t>

          <t>RFC3550 <xref target="RFC3550"/> uses the term Source for this
          purpose.</t>

          <t>The equivalent mapping of Media Stream in SDP <xref
          target="RFC4566"/> is defined per usage. For example, each m=line
          can describe one Media Stream and hence one Media Source OR a single
          m=line can describe properties for multiple Media Streams (via <xref
          target="RFC5576"/> mechanisms for example).</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>Each Media Stream is identified by a unique Synchronization
              source (SSRC) <xref target="RFC3550"/> that is carried in every
              RTP and Real-time Transport Control Protocol (RTCP) packet
              header.</t>

              <t>At any given point, a Media Stream can have one and only
              one SSRC.</t>

              <t>Each Media Stream defines a unique RTP sequence numbering and
              timing space.</t>

              <t>Several Media Streams could potentially map to a single Media
              Source via the source transformations (See <xref
              target="equivalence"/>).</t>

              <t>Several Media Streams can be carried over a single RTP
              Session.</t>
            </list></t>
        </section>
      </section>

      <section anchor="provider" title="Media Provider">
        <t>A Media Provider is a logical component within the RTP Stack that
        is responsible for encoding the media data from one or more Media
        Sources to generate RTP Payload for the outbound Media Streams.</t>

        <section title="Alternate Usages">
          <t>Within the SDP usage, an m=line describes the necessary
          configuration required for encoding purposes.</t>

          <t>CLUE's "Capture Encoding" provides specific encoding
          configuration for this purpose.</t>

          <t>The WebRTC WG uses the term "RtcMediaStreamTrack" to identify
          the source of the media data that is encoded via the Media
          Provider.</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>A given Media Provider can encode a Media Source multiple
              times on the fly, producing various encoded
              representations.</t>
            </list></t>
        </section>
      </section>

      <section anchor="rtpsession" title="RTP Session">
        <t>An RTP Session is an association among a group of participants
        communicating with RTP. It is a group communication channel that
        can potentially carry a number of Media Streams. Within an RTP
        Session, every participant receives meta-data and control
        information (over RTCP) about all the Media Streams in the RTP
        Session. The bandwidth of the RTCP control channel is shared within
        an RTP Session.</t>

        <section title="Alternate Usages">
          <t>Within the context of SDP, a single m=line can map to a single
          RTP Session, or multiple m=lines can map to a single RTP Session.
          The latter is enabled by multiplexing schemes such as BUNDLE
          <xref target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>.</t>
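
          <t>As a sketch (the ports, payload types, and mid values are
          hypothetical), a BUNDLE offer mapping an audio and a video m=line
          onto a single RTP Session might look as follows:</t>

          <figure>
            <artwork><![CDATA[a=group:BUNDLE foo bar
m=audio 10000 RTP/AVP 0
a=mid:foo
m=video 10002 RTP/AVP 96
a=rtpmap:96 VP8/90000
a=mid:bar
]]></artwork>
          </figure>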
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>Typically an RTP Session carries one or more Media
              Streams; carrying more than one is also termed "SSRC
              Multiplexing".</t>

              <t>Each RTP Session is carried by a single underlying Media
              Transport unless multiple RTP sessions are multiplexed over a
              single Transport Flow. Such a scheme is alternatively called
              "Session Multiplexing" in the RTP context <xref
              target="I-D.westerlund-avtcore-transport-multiplexing"/>.</t>

              <t>An RTP Session shares a single SSRC space as defined in
              RFC3550 <xref target="RFC3550"/>. That is, every End Point in
              the session can see an SSRC identifier transmitted by any of
              the other End Points. An End Point can receive an SSRC either
              as an SSRC or as a Contributing Source (CSRC) in RTP and RTCP
              packets, as defined by the End Points' network interconnection
              topology.</t>

              <t>Multiple RTP Sessions can be related to one another via
              mechanisms defined in <xref target="relationships"/>.</t>
            </list></t>
        </section>
      </section>

      <section anchor="mediatransport" title="Media Transport">
        <t>A Media Transport defines an end-to-end transport association for
        carrying one or more RTP Sessions. The combination of a network
        address and port uniquely identifies such a transport association, for
        example an IP address and a UDP port.</t>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>Media Transport transmits RTP Packets from a source transport
              address to a destination transport address.</t>

              <t>RTP may depend upon the lower-layer protocol to provide
              mechanisms, such as ports, to multiplex the RTP and RTCP
              packets of an RTP Session.</t>
            </list></t>
        </section>
      </section>

      <section anchor="renderdevice" title="Rendering Device">
        <t>Represents a physical rendering device, such as a display or a
        speaker.</t>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>An End Point can potentially have multiple rendering devices
              of each type.</t>

              <t>Incoming Media Streams are decoded by one or more Media
              Renderers to provide a representation suitable for rendering the
              media data over one or more Rendering Devices, as defined by the
              application usage or system-wide configuration.</t>
            </list></t>
        </section>
      </section>

      <section anchor="renderer" title="Media Renderer">
        <t>A Media Renderer is a logical component within the RTP Stack that
        is responsible for decoding the RTP Payload within the incoming Media
        Streams to generate media data suitable for eventual rendering.</t>

        <section title="Alternate Usages">
          <t>Within the context of SDP, an m=line describes the necessary
          configuration required to decode either one or more incoming Media
          Streams.</t>

          <t>The WebRTC WG uses the term "RtcMediaStreamTrack" to qualify the
          media data decoded via the Media Renderer corresponding to the
          incoming Media Stream.</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>The output from the Media Renderer is usually rendered to
              a Rendering Device via appropriate mechanisms, as explained
              in <xref target="renderdevice"/>.</t>

              <t>Incoming Media Streams decoded by the Media Renderer are
              typically identified via the SSRC.</t>
            </list></t>
        </section>
      </section>

      <section title="Participant">
        <t>A participant is an entity reachable by a single signaling address,
        and is thus related more to the signaling context than to the media
        context.</t>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>A single signaling-addressable entity, using an
              application-specific signaling address space, for example a SIP
              URI.</t>

              <t>A participant can have several associated transport flows,
              including several separate local transport addresses for those
              transport flows.</t>

              <t>A participant can have several multimedia sessions.</t>
            </list></t>
        </section>
      </section>

      <section anchor="multimediasession" title="Multimedia Session">
        <t>A multimedia session is an association among a group of
        participants engaged in communication via one or more RTP Sessions.
        It defines logical relationships among Media Sources that appear in
        multiple RTP Sessions.</t>

        <section title="Alternate Usages">
          <t>RFC4566 <xref target="RFC4566"/> defines a multimedia session as
          a set of multimedia senders and receivers and the data streams
          flowing from senders to receivers.</t>

          <t>RFC3550 <xref target="RFC3550"/> defines it as a set of
          concurrent RTP sessions among a common group of participants. For
          example, a videoconference (which is a multimedia session) may
          contain an audio RTP session and a video RTP session.</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>Participants in RTP multimedia sessions are identified via
              mechanisms such as RTCP CNAME or other application level
              identifiers as appropriate.</t>

              <t>A multimedia session can be composed of several parallel RTP
              Sessions with potentially multiple Media Streams per RTP
              Session.</t>

              <t>Each participant in a multimedia session can have a
              multitude of Capture Devices and Rendering Devices.</t>
            </list></t>
        </section>
      </section>

      <section anchor="commsession" title="Communication Session">
        <t>A communication session is an association among a group of
        participants communicating with each other via a set of multimedia
        sessions.</t>

        <section title="Alternate Usages">
          <t>The Session Description Protocol RFC4566 <xref
          target="RFC4566"/> defines a multimedia session as a set of
          multimedia senders and receivers and the data streams flowing
          from senders to receivers. That definition, however, does not
          make clear whether a multimedia session includes both the
          sender's and the receiver's view of the same Media Stream.</t>
        </section>

        <section title="Characteristics">
          <t><list style="symbols">
              <t>Each participant in a Communication Session is identified via
              an application-specific signaling address.</t>

              <t>A Communication Session is composed of at least one
              multimedia session per participant, involving one or more
              parallel RTP Sessions with potentially multiple Media Streams
              per RTP Session.</t>
            </list> For example, in a full mesh communication, the
          Communication Session consists of a set of separate Multimedia
          Sessions between each pair of Participants. Another example is a
          centralized conference, where the Communication Session consists of
          a set of Multimedia Sessions between each Participant and the
          conference handler.</t>
        </section>
      </section>
    </section>

    <section anchor="relationships" title="Relationships">
      <t>This section provides various relationships that can co-exist between
      the aforementioned concepts in a given RTP usage. Using Unified Modeling
      Language (UML) class diagrams <xref target="UML"/>, <xref
      target="fig-media-source"/> below depicts general relations between a
      Media Source, its Media Provider(s) and the resulting Media
      Stream(s).</t>

      <t><list style="empty">
          <t>Note: The RTCP Stream related to the RTP Stream is not shown in
          the figure.</t>
        </list></t>

      <figure align="center" anchor="fig-media-source"
              title="Media Source Relations">
        <artwork align="center"><![CDATA[+--------------+  <<uses>>  +-------------------------+
| Media Source |- - - - - ->| Synchronization Context |
+--------------+            +-------------------------+
      < > 1..*
       |
       | 0..*
+--------------+
|              |<>-+ 0..*
|    Media     |   |
|   Provider   |   |
|              |---+ 0..*
+--------------+
      < > 1
       |
       | 0..*
+----------------+ 0..*     1 +-------------+
|  Media Stream  |----------<>| RTP Session |
+----------------+            +-------------+
]]></artwork>
      </figure>

      <t>Media Sources can have a large variety of relationships among
      them. These relationships can apply both between sources within a
      single RTP Session and between Media Sources that occur in multiple
      RTP Sessions. Ways of relating them typically involve groups: a set
      of Media Sources has some relationship that applies to all those in
      the group, and to no others. (Relationships that involve arbitrary
      non-grouping associations among Media Sources, such that, e.g., A
      relates to B and B to C, but A and C are unrelated, are uncommon if
      not nonexistent.) In many cases, the semantics of groups are not
      simply that the members form an undifferentiated group, but rather
      that members of the group have certain roles.</t>

      <section anchor="syncontext" title="Synchronization Context">
        <t>A Synchronization Context defines a requirement for a strong
        timing relationship between the related entities, typically
        requiring alignment of clock sources. Such a relationship can be
        identified in multiple ways, as listed below. A single Media Source
        can only belong to a single Synchronization Context, since it is
        assumed that a single Media Source can only have a single media
        clock, and requiring alignment to several Synchronization Contexts
        would effectively merge them into a single Synchronization
        Context.</t>

        <t>A single Multimedia session can contain media from one or more
        Synchronization Contexts. An example of that is a Multimedia Session
        containing one set of audio and video for communication purposes
        belonging to one Synchronization context, and another set of audio and
        video for presentation purposes (like playing a video file) that has
        no strong timing relationship and need not be strictly synchronized
        with the audio and video used for communication.</t>

        <section title="RTCP CNAME">
          <t>RFC3550 <xref target="RFC3550"/> describes inter-media
          synchronization between RTP Sessions based on RTCP CNAME, RTP,
          and Network Time Protocol (NTP) <xref target="RFC5905"/>
          timestamps.</t>
        </section>

        <section title="Clock Source Signaling">
          <t><xref target="I-D.ietf-avtcore-clksrc"/> provides a mechanism to
          signal the clock source in SDP, thus allowing a synchronized context
          to be defined.</t>
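
          <t>As a sketch (the clock source address is hypothetical, and the
          attribute names assume those proposed in <xref
          target="I-D.ietf-avtcore-clksrc"/>), two m=lines sharing an NTP
          reference clock might be described as follows:</t>

          <figure>
            <artwork><![CDATA[m=audio 50000 RTP/AVP 0
a=ts-refclk:ntp=203.0.113.10
a=mediaclk:direct=0
m=video 50002 RTP/AVP 96
a=ts-refclk:ntp=203.0.113.10
a=mediaclk:direct=0
]]></artwork>
          </figure>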
        </section>

        <section title="CLUE Scenes">
          <t>In CLUE "Capture Scene", "Capture Scene Entry" and "Captures"
          define an implied synchronization context.</t>
        </section>

        <section title="Implicitly via RtcMediaStream">
          <t>The WebRTC WG defines "RtcMediaStream" with one or more
          "RtcMediaStreamTracks". All tracks in an "RtcMediaStream" are
          intended to be synchronized when rendered.</t>
        </section>

        <section title="Explicitly via SDP Mechanisms">
          <t>RFC5888 <xref target="RFC5888"/> defines an m=line grouping
          mechanism called "Lip Synchronization (LS)" for establishing the
          synchronization requirement across m=lines when they map to
          individual sources.</t>

          <t>RFC5576 <xref target="RFC5576"/> extends the above mechanism
          to the case where multiple Media Sources are described by a
          single m=line.</t>
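
          <t>As an illustration (the mid values and payload types are
          hypothetical), an LS group tying an audio and a video m=line into
          one synchronization context might look as follows:</t>

          <figure>
            <artwork><![CDATA[a=group:LS 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=video 30002 RTP/AVP 96
a=rtpmap:96 VP8/90000
a=mid:2
]]></artwork>
          </figure>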
        </section>
      </section>

      <section anchor="containment" title="Containment Context">
        <t>A containment relationship allows multiple concepts to be
        composed into a larger concept.</t>

        <section anchor="ssrcmux" title="Media Stream Multiplexing">
          <t>Multiple Media Streams can be contained within a single RTP
          Session via a unique SSRC per Media Stream. <xref
          target="I-D.ietf-mmusic-sdp-bundle-negotiation"/> provides an
          SDP-based signaling mechanism to enable this across several
          m=lines.</t>

          <t>RFC5576 <xref target="RFC5576"/> enables the same for multiple
          Media Sources described in a single m=line.</t>
        </section>

        <section anchor="sessionmux" title="RTP Session Multiplexing">
          <t><xref target="I-D.westerlund-avtcore-transport-multiplexing"/>,
          for example, describes a mechanism that allows several RTP
          Sessions to be carried over a single underlying Media
          Transport.</t>
        </section>

        <section anchor="rtcpeerconnection"
                 title="Multiple Media Sources in a WebRTC PeerConnection">
          <t>The WebRTC WG defines a containment object named
          "RTCPeerConnection" that can potentially contain several Media
          Sources mapped to a single RTP Session or spread across several RTP
          Sessions.</t>
        </section>
      </section>

      <section anchor="equivalence" title="Equivalence Context">
        <t>In this relationship, different instances of a concept are
        treated as equivalent for the purposes of relating them to the
        Media Source.</t>

        <t><xref target="fig-rtp-stream"/> below depicts in UML notation the
        general relation between a Media Provider and its Media Stream(s),
        including the Media Stream specializations Source Stream and RTP
        Repair Stream.</t>

        <figure align="center" anchor="fig-rtp-stream"
                title="Media Stream Relations">
          <artwork align="center"><![CDATA[              +--------------+
              |              |<>-+ 0..*
              |    Media     |   |
              |   Provider   |   |
              |              |---+ 0..*
              +--------------+
                    < > 1
                     |
                     | 0..*
              +--------------+ 0..*  1 +-----------------+
              | Media Stream |<>-------| Media Transport |
              +--------------+         +-----------------+
                /\        /\
               +--+      +--+
                |          |
        +-------+          +-------+
        |                          |
+--------------+            +--------------+ 1
|    Primary   |<>----------|    Repair    |<>-+
|    Stream    | 1..*  0..* |    Stream    |---+
+--------------+            +--------------+ 0..*


]]></artwork>
        </figure>

        <t>This relation can, in combination with <xref
        target="fig-media-source"/>, be used to achieve the set of
        functionalities described below.</t>

        <section anchor="simulcast" title="Simulcast">
          <t>A Media Source represented as multiple independent Encodings
          constitutes a simulcast of that Media Source. The figure below
          represents an example of a Media Source that is encoded into three
          separate simulcast streams that are in turn sent on the same
          transport flow.</t>

          <figure align="center" anchor="fig-sim"
                  title="Example of Media Source Simulcast">
            <artwork align="center"><![CDATA[                     +----------------+
                     |  Media Source  |
                     +----------------+
                     < >    < >    < >
                      |      |      |
         +------------+      |      +--------------+
         |                   |                     |
+----------------+   +----------------+   +----------------+
| Media Provider |   | Media Provider |   | Media Provider |
+----------------+   +----------------+   +----------------+
       < >                  < >                   < >
        |                    |                     |
        |                    |                     |
+----------------+   +----------------+   +----------------+
|  Media Stream  |   |  Media Stream  |   |  Media Stream  |
+----------------+   +----------------+   +----------------+
       < >                  < >                   < >
        |                    |                     |
        +---------------+    |    +----------------+
                        |    |    |
                   +-------------------+
                   |  Media Transport  |
                   +-------------------+
]]></artwork>
          </figure>
        </section>

        <section anchor="svc" title="Layered MultiStream Transmission">
          <t>Multi-stream transmission (MST) is a mechanism by which different
          portions of a layered encoding of a media stream are sent using
          separate Media Streams (sometimes in separate RTP sessions). MSTs
          are useful for receiver control of layered media.</t>

          <t>A Media Source represented as multiple dependent Encodings
          constitutes a Media Source that has layered dependency. The figure
          below represents an example of a Media Source that is encoded into
          three dependent layers, where two layers are sent on the same
          transport flow and the third layer is sent on a separate transport
          flow.</t>

          <figure align="center" anchor="fig-ddp"
                  title="Example of Media Source Layered Dependency">
            <artwork align="center"><![CDATA[                     +----------------+
                     |  Media Source  |
                     +----------------+
                      < >   < >   < >
                       |     |     |
        +--------------+     |     +--------------+
        |                    |                    |
+----------------+   +----------------+   +---------------+
| Media Provider |<>-| Media Provider |<>-| Media Provider|
+----------------+   +----------------+   +---------------+
       < >                  < >                  < >
        |                    |                    |
        |                    |                    |
+----------------+   +----------------+   +----------------+
|  Media Stream  |   |  Media Stream  |   |  Media Stream  |
+----------------+   +----------------+   +----------------+
       < >                  < >                   < >
        |                    |                     |
        +------+      +------+                     |
               |      |                            |
         +-----------------+              +-----------------+
         | Media Transport |              | Media Transport |
         +-----------------+              +-----------------+
]]></artwork>
          </figure>
        </section>

        <section anchor="repair" title="Robustness and Repair">
          <t>A Media Source may be protected by repair streams during
          transport. Several approaches, listed below, can achieve the same
          result: <list style="symbols">
              <t>duplication of the original Media Stream,</t>

              <t>duplication of the original Media Stream with a time
              offset,</t>

              <t>forward error correction (FEC) techniques, and</t>

              <t>retransmission of lost packets (either globally or
              selectively).</t>
            </list></t>

          <t>The figure below shows an example where a Media Source is
          protected by a retransmission (RTX) flow. In this example the
          primary Media Stream and the RTX Media Stream share the same Media
          Transport.</t>
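
          <t>Retransmission is commonly signalled with the RTP retransmission
          payload format of RFC 4588 (not otherwise referenced in this
          document). The SDP fragment below is only a sketch with
          illustrative payload type numbers: payload type 97 is an RTX format
          whose "apt" (associated payload type) parameter binds it to the
          primary payload type 96, so the primary Media Stream and the RTX
          Media Stream travel in the same m= line and hence share a Media
          Transport:</t>

          <figure align="center">
            <artwork align="left"><![CDATA[m=video 49170 RTP/AVPF 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
]]></artwork>
          </figure>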

          <figure align="center" anchor="fig-rtx"
                  title="Example of Media Source Retransmission Flows">
            <artwork align="center"><![CDATA[+----------------+
|  Media Source  |
+----------------+
       < >
        |
+----------------+
| Media Provider |
+----------------+
       < >
        |
+---------------+   +-----------+
| Primary Media |<>-| RTX Media |
|    Stream     |   |  Stream   |
+---------------+   +-----------+
       < >               < >
        |                 |
        +------+   +------+
               |   |
        +-----------------+
        | Media Transport |
        +-----------------+
]]></artwork>
          </figure>

          <t>The figure below shows an example where two Media Sources are
          each protected by an individual FEC flow, as well as by one
          additional FEC flow that protects the set of both Media Sources (a
          FEC group). There are several possible ways to map those Media
          Streams to one or more Media Transports, but the mapping is omitted
          from the figure for clarity.</t>
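
          <t>When FEC flows are carried in their own m= lines, the
          association between source and repair flows can be signalled with
          the "FEC-FR" grouping semantics of RFC 5956 (not otherwise
          referenced in this document). The fragment below is only a sketch
          with illustrative ports, payload types, and FEC parameters,
          associating one source flow with one repair flow:</t>

          <figure align="center">
            <artwork align="left"><![CDATA[a=group:FEC-FR S1 R1
m=video 30000 RTP/AVP 100
a=rtpmap:100 MP2T/90000
a=mid:S1
m=application 30002 RTP/AVP 110
a=rtpmap:110 1d-interleaved-parityfec/90000
a=fmtp:110 L=5; D=10; repair-window=200000
a=mid:R1
]]></artwork>
          </figure>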

          <figure align="center" anchor="fig-fec"
                  title="Example of Media Source FEC Flows">
            <artwork align="center"><![CDATA[+----------+                                         +----------+
|   Media  |                                         |  Media   |
|  Source  |                                         |  Source  |
+----------+                                         +----------+
    < >                                                  < >
     |                                                    |
+----------+                                         +----------+
|  Media   |                                         |  Media   |
| Provider |                                         | Provider |
+----------+                                         +----------+
    < >  +-------------------+    +-------------------+  < > 
     |   |                   |    |                   |   |
     |   |                  < >  < >                  |   |
+---------+   +--------+   +--------+   +--------+   +---------+
| Primary |   |  RTP   |   |  RTP   |   |  RTP   |   | Primary |
|  Media  |<>-|  FEC   |-<>|  FEC   |<>-|  FEC   |-<>|  Media  |
| Stream  |   | Stream |   | Stream |   | Stream |   |  Stream |
+---------+   +--------+   +--------+   +--------+   +---------+
]]></artwork>
          </figure>
        </section>

        <section anchor="fid" title="SDP FID Semantics">
          <t>RFC5888 <xref target="RFC5888"/> defines an m= line grouping
          mechanism called "FID" for establishing the equivalence of Media
          Streams across the grouped m= lines.</t>

          <t>RFC5576 <xref target="RFC5576"/> extends this mechanism to the
          case where multiple Media Sources are described by a single m=
          line.</t>
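
          <t>The two mechanisms can be sketched in SDP as follows; all port
          numbers, payload types, and SSRC values are illustrative. The first
          fragment groups two m= lines with the "FID" semantics of RFC5888
          <xref target="RFC5888"/>, while the second applies the source-level
          "FID" grouping of RFC5576 <xref target="RFC5576"/> to associate a
          primary source and its retransmission source within a single m=
          line:</t>

          <figure align="center">
            <artwork align="left"><![CDATA[a=group:FID 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=audio 30002 RTP/AVP 0
a=mid:2
]]></artwork>
          </figure>

          <figure align="center">
            <artwork align="left"><![CDATA[m=video 49174 RTP/AVPF 96 97
a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=ssrc-group:FID 11111 22222
a=ssrc:11111 cname:user@example.com
a=ssrc:22222 cname:user@example.com
]]></artwork>
          </figure>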
        </section>
      </section>

      <section title="Session Context">
        <t>There are different ways to construct a Communication Session. The
        general relationships between a Communication Session, Participants,
        Multimedia Sessions, and RTP Sessions are outlined in UML notation
        below.</t>

        <figure align="center" anchor="fig-sessions" title="Session Relations">
          <artwork align="center"><![CDATA[               +---------------+
               | Communication |
               |    Session    |
               +---------------+
            0..* < >       < > 1..*
                  |         |
       +----------+         +--------+
  1..* |                             | 1..*
+-------------+ 1     0..* +--------------------+
| Participant |<>----------| Multimedia Session |
+-------------+            +--------------------+
      < > 1                         < > 1
       |                             | 0..*
       |                      +-------------+
       |                      | RTP Session |
       |                      +-------------+
       |                            < > 1
       | 0..*                        | 0..*
+-----------------+ 1   0..* +--------------+
| Media Transport |--------<>| Media Stream |
+-----------------+          +--------------+

]]></artwork>
        </figure>

        <t>Several different flavors of Session are possible. A few typical
        examples are described in the sub-sections below, but many others can
        be constructed.</t>

        <section title="Point-to-Point Session">
          <t>In this example, a single Multimedia Session is shared between
          the two Participants. That Multimedia Session contains a single RTP
          Session carrying two Media Streams from each Participant. Each
          Participant has only a single Media Transport carrying those Media
          Streams, which is the main reason why there is only a single RTP
          Session.</t>

          <figure align="center" anchor="fig-point-to-point"
                  title=" Example Point-to-Point Session">
            <artwork><![CDATA[                              +----------------+
                              | Point-to-Point |
                              |    Session     |
                              +----------------+
                               < >   < >   < >
                                |     |     |
       +------------------------+     |     +------------------------+
       |                              |                              |
+-------------+            +--------------------+            +-------------+
| Participant |<>----------| Multimedia Session |----------<>| Participant |
+-------------+            +--------------------+            +-------------+
      < >                            < >                            < >
       |                              |                              |
       | +--------------+      +-------------+      +--------------+ |
       | | Media Stream |----<>| RTP Session |<>----| Media Stream | |
       | +--------------+      +-------------+      +--------------+ |
       |     < >                 < >     < >                 < >     |
       |      |                   |       |                   |      |
+-----------------+   +--------------+ +--------------+   +-----------------+
| Media Transport |-<>| Media Stream | | Media Stream |<>-| Media Transport |
+-----------------+   +--------------+ +--------------+   +-----------------+

]]></artwork>
          </figure>
        </section>

        <section title="Full Mesh Session">
          <t>In this example, the Full Mesh Session has three Participants,
          each of which has the same characteristics as in the example in the
          previous section: a single Media Transport per peer Participant,
          resulting in a single RTP Session between each pair of
          Participants.</t>

          <figure align="center" anchor="fig-full-mesh"
                  title="Example Full Mesh Session">
            <artwork><![CDATA[+-----------+                  +-------------+                 +-----------+
|   Media   |----------------<>| Participant |<>---------------|   Media   |
| Transport |                  +-------------+                 | Transport |
+-----------+                         |                        +-----------+
    |      |         +------------+   |   +------------+         |      |
   < >    < >        | Multimedia |   |   | Multimedia |        < >    < >
+--------++--------+ |  Session   |   |   |  Session   | +--------++--------+
| Media  || Media  | +------------+   |   +------------+ | Media  || Media  |
| Stream || Stream |  < >        |    |    |        < >  | Stream || Stream |
+--------++--------+   |         |    |    |         |   +--------++--------+
    |           |      |         |    |    |         |      |          |
    |          < >     |        < >  < >  < >        |     < >         |
    |         +---------+     +---------------+     +---------+        |
    +-------<>|   RTP   |     |   Full Mesh   |     |   RTP   |<>------+
    +-------<>| Session |     |    Session    |     | Session |<>------+
    |         +---------+     +---------------+     +---------+        |
    |          < >             < >   < >   < >             < >         | 
    |           |               |     |     |               |          |
+--------++--------+            |     |     |            +--------++--------+
| Media  || Media  |            |     |     |            | Media  || Media  |
| Stream || Stream |            |     |     |            | Stream || Stream |
+--------++--------+            |     |     |            +--------++--------+
   < >    < >                   |     |     |                   < >    < >
    |      |                    |     |     |                    |      |
+-----------+                   |     |     |                   +-----------+
|   Media   |                   |     |     |                   |   Media   |
| Transport |                   |     |     |                   | Transport |
+-----------+ +-----------------+     |     +-----------------+ +-----------+
              |                       |                       |
+-------------+             +--------------------+            +-------------+
| Participant |<>-----------| Multimedia Session |----------<>| Participant |
+-------------+             +--------------------+            +-------------+
      < >                            < >                            < >
       |                              |                              |
       |   +--------+            +---------+            +--------+   |
       |   | Media  |----------<>|   RTP   |<>----------| Media  |   |
       |   | Stream |            | Session |            | Stream |   |
       |   +--------+            +---------+            +--------+   |
       |    < >                  < >     < >                 < >     |
       |     |                    |       |                   |      |
    +-----------+           +--------+ +--------+           +-----------+
    |   Media   |---------<>| Media  | | Media  |<>---------|   Media   |
    | Transport |           | Stream | | Stream |           | Transport |
    +-----------+           +--------+ +--------+           +-----------+

]]></artwork>
          </figure>
        </section>

        <section title="Centralized Conference Session">
          <t>Text to be provided</t>

          <figure align="center" anchor="fig-central-conf"
                  title="Example Centralized Conference Session">
            <artwork><![CDATA[TBD]]></artwork>
          </figure>
        </section>
      </section>
    </section>

    <section anchor="security" title="Security Considerations">
      <t>This document simply tries to clarify the confusion prevalent in RTP
      taxonomy that has arisen from inconsistent usage by the many
      technologies and protocols making use of RTP. It does not introduce any
      new security considerations beyond those already well documented for
      RTP itself <xref target="RFC3550"/> and in the respective
      specifications of the various protocols making use of it.</t>

      <t>Hopefully, a well-defined common terminology and a shared
      understanding of the complexities of the RTP architecture will lead to
      better standards and help avoid security problems.</t>
    </section>

    <section title="Acknowledgements">
      <t>This document borrows many concepts from several other documents,
      such as WebRTC <xref target="I-D.ietf-rtcweb-overview"/>, CLUE <xref
      target="I-D.ietf-clue-framework"/>, and the Multiplexing Architecture
      <xref target="I-D.westerlund-avtcore-transport-multiplexing"/>. The
      authors would like to thank the authors of all of those documents.</t>

      <t>The authors would also like to acknowledge the insights, guidance and
      contributions of Magnus Westerlund, Roni Even, Colin Perkins, Keith
      Drage, and Harald Alvestrand.</t>
    </section>

    <section title="Open Issues">
      <t>Much of the terminology is still a matter of dispute.</t>

      <t>It might be useful to distinguish between a single endpoint's view of
      a source, or RTP session, or multimedia session, versus the full set of
      sessions and every endpoint that's communicating in them, with the
      signaling that established them.</t>

      <t>(Sure to be many more...)</t>
    </section>

    <section anchor="iana" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.3550"?>

      <reference anchor="UML">
        <front>
          <title>OMG Unified Modeling Language (OMG UML), Superstructure,
          V2.2</title>

          <author>
            <organization abbrev="OMG">Object Management Group</organization>
          </author>

          <date month="February" year="2009"/>
        </front>

        <seriesInfo name="OMG" value="formal/2009-02-02"/>

        <format target="http://www.omg.org/spec/UML/2.2/Superstructure/PDF/"
                type="PDF"/>
      </reference>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.3264"?>

      <?rfc include="reference.RFC.4566"?>

      <?rfc include="reference.RFC.6222"?>

      <?rfc include="reference.RFC.5576"?>

      <?rfc include="reference.RFC.5888"?>

      <?rfc include="reference.RFC.5905"?>

      <?rfc include="reference.I-D.ietf-clue-framework"?>

      <?rfc include="reference.I-D.ietf-rtcweb-overview"?>

      <?rfc include="reference.I-D.ietf-mmusic-sdp-bundle-negotiation"?>

      <?rfc include="reference.I-D.ietf-avtcore-clksrc"?>

      <?rfc include="reference.I-D.westerlund-avtcore-transport-multiplexing"?>
    </references>

    <section title="Changes From Earlier Versions">
      <t>NOTE TO RFC EDITOR: Please remove this section prior to
      publication.</t>

      <section title="Changes From Draft -00">
        <t><list style="symbols">
            <t>Too many to list</t>

            <t>Added new authors</t>

            <t>Updated content organization and presentation</t>
          </list></t>
      </section>
    </section>
  </back>
</rfc>
