<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc autobreaks="yes"?>
<rfc category="info" docName="draft-lennox-raiarea-rtp-grouping-taxonomy-02"
ipr="trust200902">
<front>
<title abbrev="RTP Grouping Taxonomy">A Taxonomy of Grouping Semantics and
Mechanisms for Real-Time Transport Protocol (RTP) Sources</title>
<author fullname="Jonathan Lennox" initials="J." surname="Lennox">
<organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address>
<postal>
<street>433 Hackensack Avenue</street>
<street>Seventh Floor</street>
<city>Hackensack</city>
<region>NJ</region>
<code>07601</code>
<country>US</country>
</postal>
<email>jonathan@vidyo.com</email>
</address>
</author>
<author fullname="Kevin Gross" initials="K." surname="Gross">
<organization abbrev="AVA">AVA Networks, LLC</organization>
<address>
<postal>
<street/>
<city>Boulder</city>
<region>CO</region>
<country>US</country>
</postal>
<email>kevin.gross@avanw.com</email>
</address>
</author>
<author fullname="Suhas Nandakumar" initials="S" surname="Nandakumar">
<organization>Cisco Systems</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>US</country>
</postal>
<email>snandaku@cisco.com</email>
</address>
</author>
<author fullname="Gonzalo Salgueiro" initials="G" surname="Salgueiro">
<organization>Cisco Systems</organization>
<address>
<postal>
<street>7200-12 Kit Creek Road</street>
<city>Research Triangle Park</city>
<region>NC</region>
<code>27709</code>
<country>US</country>
</postal>
<email>gsalguei@cisco.com</email>
</address>
</author>
<author fullname="Bo Burman" initials="B." surname="Burman">
<organization>Ericsson</organization>
<address>
<postal>
<street>Farogatan 6</street>
<city>SE-164 80 Kista</city>
<country>Sweden</country>
</postal>
<phone>+46 10 714 13 11</phone>
<email>bo.burman@ericsson.com</email>
</address>
</author>
<!-- Add more authors here! -->
<date day="18" month="September" year="2013"/>
<area>Real Time Applications and Infrastructure (RAI)</area>
<keyword>I-D</keyword>
<keyword>Internet-Draft</keyword>
<!-- TODO: more keywords -->
<abstract>
<t>The terminology about, and associations among, Real-Time Transport
Protocol (RTP) sources can be complex and somewhat opaque. This document
describes a number of existing and proposed relationships among RTP
sources, and attempts to define common terminology for discussing
protocol entities and their relationships.</t>
<t>This document is still very rough, but is submitted in the hopes of
making future discussion productive.</t>
</abstract>
</front>
<middle>
<section anchor="introduction" title="Introduction">
<t>The existing taxonomy of sources in RTP is often regarded as
confusing and inconsistent. Consequently, a deep understanding of how
the different terms relate to each other becomes a real challenge.
Frequently cited examples of this confusion are (1) how different
protocols that make use of RTP use the same terms to signify different
things and (2) how the complexities addressed at one layer are often
glossed over or ignored at another.</t>
<t>This document attempts to provide some clarity by reviewing the
semantics of various aspects of sources in RTP. As an organizing
mechanism, it approaches this by describing various ways that RTP
sources can be grouped and associated together.</t>
<t>All non-specific references to ControLling mUltiple streams for
tElepresence (CLUE) in this document map to <xref
target="I-D.ietf-clue-framework"/> and all references to Web Real-Time
Communications (WebRTC) map to <xref
target="I-D.ietf-rtcweb-overview"/>.</t>
</section>
<section title="Concepts">
<t>This section defines concepts that serve to identify and name various
transformations and streams in a given RTP usage. For each concept, an
attempt is made to list any alternate definitions and usages that
co-exist today, along with various characteristics that further describe
the concept. These concepts are divided into two categories: one related
to the chain of streams and transformations that media can be subject
to, the other for entities involved in the communication.</t>
<section title="Media Chain">
<t>This section contains the concepts that can be involved in taking a
sequence of physical world stimuli (sound waves, photons,
key-strokes) at a sender side and transporting them to a receiver, which
may recover a sequence of physical stimuli. This chain of concepts is
of two main types, streams and transformations. Streams are time-based
sequences of samples of the physical stimulus in various
representations, while transformations change the representation of
the streams in some way.</t>
<t>The examples below are basic ones, and it is important to keep in
mind that this conceptual model enables more complex usages. Some will
be further discussed in later sections of this document. In general,
the following applies to this model:<list style="symbols">
<t>A transformation may have zero or more inputs and one or more
outputs.</t>
<t>A Stream is of some type.</t>
<t>A Stream has one source transformation and one or more sink
transformations (with the exception of <xref
target="physical-stimulus">Physical Stimulus</xref>, which can have
no source or sink transformation).</t>
<t>Streams can be forwarded from a transformation output to any
number of inputs on other transformations that support that
type.</t>
<t>If the output of a transformation is sent to multiple
transformations, those streams will be identical; it takes a
transformation to make them different.</t>
<t>There are no formal limitations on how streams are connected to
transformations; this may include loops if required by a
particular transformation.</t>
</list> It is also important to remember that this is a conceptual
model. Real-world implementations may therefore look different and
be structured differently.</t>
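<t>As an illustration only (the class names and stream types below are
hypothetical, not defined by this document), the rules above can be
sketched as a small graph model of typed streams and transformations:</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical sketch of the conceptual model: typed Streams, each
# produced by exactly one source Transformation and consumed by any
# number of sink Transformations.

class Stream:
    def __init__(self, kind, source):
        self.kind = kind          # every Stream is of some type
        self.source = source      # exactly one source transformation
        self.sinks = []           # one or more sink transformations

class Transformation:
    def __init__(self, name):
        self.name = name
        self.inputs = []          # zero or more input streams
        self.outputs = []         # one or more output streams

    def consume(self, stream):
        stream.sinks.append(self)
        self.inputs.append(stream)

    def produce(self, kind):
        stream = Stream(kind, self)
        self.outputs.append(stream)
        return stream

# Forwarding one output to two transformations: both consumers see the
# identical stream; it takes a transformation to make them different.
capture = Transformation("Media Capture")
raw = capture.produce("Raw Stream")
encoder_a = Transformation("Media Encoder A")
encoder_b = Transformation("Media Encoder B")
encoder_a.consume(raw)
encoder_b.consume(raw)
```
]]></artwork>
</figure>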
<t>To provide a basic understanding of the relationships in the chain,
we first introduce the concepts for the <xref
target="fig-sender-chain">sender side</xref>. This covers the path from
physical stimulus until media packets are emitted onto the network.</t>
<figure align="center" anchor="fig-sender-chain"
title="Sender Side Concepts in the Media Chain">
<artwork><![CDATA[ Physical Stimulus
|
V
+--------------------+
| Media Capture |
+--------------------+
|
Raw stream
V
+--------------------+
| Media Source |<- Synchronization Timing
+--------------------+
|
Source Stream
V
+--------------------+
| Media Encoder |
+--------------------+
|
Encoded Stream +-----------+
V | V
+--------------------+ | +--------------------+
| Media Packetizer | | | Media Redundancy |
+--------------------+ | +--------------------+
| | |
+------------+ Redundancy Packet Stream
Source Packet Stream |
V V
+--------------------+ +--------------------+
| Media Transport | | Media Transport |
+--------------------+ +--------------------+
]]></artwork>
</figure>
<t>In <xref target="fig-sender-chain"/> we have included a branched
chain to cover the concepts for using redundancy to improve the
reliability of the transport. The Media Transport concept is an
aggregate that is decomposed below in <xref
target="media-stream-decomposition"/>.</t>
<t>Below we review a <xref target="fig-receiver-chain">receiver media
chain</xref> matching the sender side, to look at the inverse
transformations and their attempts to recover streams that are possibly
identical to those in the sender chain. Note that the streams out of a
reverse transformation, like the Source Stream out of the Media Decoder,
are in many cases not the same as the corresponding ones on the sender
side; they are thus prefixed with "Received" to denote a potentially
modified version. The reason they differ lies in transformations that
can be irreversible. For example, lossy
source coding in the Media Encoder prevents the Source Stream out of
the Media Decoder from being the same as the one fed into the Media
Encoder. Other reasons include packet loss, or late loss, in the Media
Transport transformation that even Media Repair, if used, fails to
repair. It should be noted that some transformations are not always
present, like Media Repair, which cannot operate without Redundancy
Packet Streams.</t>
<figure align="center" anchor="fig-receiver-chain"
title="Receiver Side Concepts of the Media Chain">
<artwork><![CDATA[+--------------------+ +--------------------+
| Media Transport | | Media Transport |
+--------------------+ +--------------------+
| |
Received Packet Stream Received Redundancy PS
| |
| +-------------------+
V V
+--------------------+
| Media Repair |
+--------------------+
|
Repaired Packet Stream
V
+--------------------+
| Media Depacketizer |
+--------------------+
|
Received Encoded Stream
V
+--------------------+
| Media Decoder |
+--------------------+
|
Received Source Stream
V
+--------------------+
| Media Sink |--> Synchronization Information
+--------------------+
|
Received Raw Stream
V
+--------------------+
| Media Renderer |
+--------------------+
|
V
Physical Stimulus
]]></artwork>
</figure>
<section anchor="physical-stimulus" title="Physical Stimulus">
<t>The physical stimulus is a physical event that can be captured
and provided as media to a receiver. This includes sound waves making
up audio, photons in a visible light field, or other
excitations or interactions with sensors, like keystrokes on a
keyboard.</t>
</section>
<section anchor="media-capture" title="Media Capture">
<t>The process of transforming the <xref
target="physical-stimulus">Physical Stimulus</xref> into captured
media. The Media Capture performs a digital sampling of the
physical stimulus, usually periodically, and outputs this in some
representation as a <xref target="raw-stream">Raw Stream</xref>.
Due to the periodic sampling, or at least timed asynchronous events,
this data forms some kind of stream of media data. The Media
Capture is normally instantiated in some type of device, i.e. a media
capture device. Examples of different types of media capturing
devices are digital cameras, microphones connected to A/D
converters, or keyboards.</t>
<section title="Alternate Usages">
<t>The CLUE WG uses the term "Capture Device" to identify a
physical capture device.</t>
<t>The WebRTC WG uses the term "Recording Device" to refer to the
locally available capture devices in an end-system.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>A Media Capture is identified either by
hardware/manufacturer ID or via a session-scoped device
identifier as mandated by the application usage.</t>
</list></t>
</section>
</section>
<section anchor="raw-stream" title="Raw Stream">
<t>The time progressing stream of digitally sampled information,
usually periodically sampled, provided by a <xref
target="media-capture">Media Capture</xref>.</t>
</section>
<section anchor="media-source" title="Media Source">
<t>A Media Source is the logical source of a reference-clock
synchronized, time progressing, digital media stream, called a <xref
target="source-stream">Source Stream</xref>. This transformation
takes one or more <xref target="raw-stream">Raw Streams</xref> and
provides a Source Stream as output. This output has been
synchronized with some reference clock, even if it is just a local
system wall clock.</t>
<t>The output can be of different types. One type is directly
associated with a particular Media Capture's Raw Stream. Others are
more conceptual sources, like an <xref
target="fig-media-source-mixer">audio mix of multiple Raw
Streams</xref>, a mix of the three loudest inputs in terms of speech
activity, or a selection of a particular video based
on the current speaker, i.e. typically based on other Media
Sources.</t>
<figure align="center" anchor="fig-media-source-mixer"
title="Conceptual Media Source in form of Audio Mixer">
<artwork><![CDATA[ Raw Raw Raw
Stream Stream Stream
| | |
V V V
+--------------------------+
| Media Source |<-- Reference Clock
| Mixer |
+--------------------------+
|
V
Source Stream]]></artwork>
</figure>
<t/>
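<t>As a purely illustrative sketch (function and variable names are
hypothetical), such a conceptual Media Source could mix several Raw
Streams, already sampled against a common reference clock, into one
Source Stream:</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical sketch of a conceptual Media Source that mixes several
# Raw Streams into one Source Stream by summing time-aligned samples.

def mix(raw_streams):
    """Sum time-aligned samples from each Raw Stream, clipping to 16 bits."""
    length = min(len(s) for s in raw_streams)
    mixed = []
    for i in range(length):
        total = sum(stream[i] for stream in raw_streams)
        mixed.append(max(-32768, min(32767, total)))  # clip to int16 range
    return mixed

# Three Raw Streams (samples already aligned to the same reference clock).
source_stream = mix([[100, 200, 300], [10, 20, 30], [1, 2, 3]])
```
]]></artwork>
</figure>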
<section title="Alternate Usages">
<t>The CLUE WG uses the term "Media Capture" for this purpose. A
CLUE Media Capture is identified via indexed notation. The terms
Audio Capture and Video Capture are used to identify Audio Sources
and Video Sources respectively. Concepts such as "Capture Scene",
"Capture Scene Entry" and "Capture" provide a flexible framework
to represent media captured spanning spatial regions.</t>
<t>The WebRTC WG defines the term "RtcMediaStreamTrack" to refer
to a Media Source. An "RtcMediaStreamTrack" is identified by the
ID attribute.</t>
<!--MW: I think the below SDP is a bit misplaced. Do we need a special section to discuss
relation to SDP terminology. Or should this be focused and other interpretations
be added?-->
<t>Typically a Media Source is mapped to a single m=line via the
Session Description Protocol (SDP) <xref target="RFC4566"/> unless
mechanisms such as Source-Specific attributes are in place <xref
target="RFC5576"/>. In the latter cases, an m=line can represent
either multiple Media Sources, multiple <xref
target="packet-stream">Packet Streams</xref>, or both.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>At any point, it can represent a physically captured source
or a conceptual source.</t>
<!--MW: Put back a discussion of relation between Media Capture and Media sources?-->
</list></t>
</section>
</section>
<section anchor="source-stream" title="Source Stream">
<t>A time progressing stream of digital samples that has been
synchronized with a reference clock and comes from a particular <xref
target="media-source">Media Source</xref>.</t>
</section>
<section anchor="media-encoder" title="Media Encoder">
<t>A Media Encoder is a transform that is responsible for encoding
the media data from a <xref target="source-stream">Source
Stream</xref> into another representation, usually more compact,
that is output as an <xref target="encoded-stream">Encoded
Stream</xref>.</t>
<t>The Media Encoder step commonly includes pre-encoding
transformations, such as scaling, resampling, etc. The Media Encoder
can have a significant number of configuration options that affect
the properties of the encoded stream. These include properties such
as bit-rate, start points for decoding, resolution, bandwidth or
other fidelity-affecting properties. In many communication systems,
the actual codec used, and not only its parameters, is also an
important factor.</t>
<t>Scalable Media Encoders deserve special mention, as they produce
multiple outputs that are potentially of different types. A scalable
Media Encoder takes one input Source Stream and encodes it into
multiple output streams of two different types: at least one Encoded
Stream that is independently decodable, and one or more <xref
target="dependent-stream">Dependent Streams</xref> that require at
least one Encoded Stream, and zero or more other Dependent Streams,
in order to be decoded. A Dependent Stream's dependency is one of the
grouping relations this document discusses further in <xref
target="svc"/>.</t>
<figure align="center" anchor="fig-scalable-media-encoder"
title="Scalable Media Encoder Input and Outputs">
<artwork><![CDATA[ Source Stream
|
V
+--------------------------+
| Scalable Media Encoder |
+--------------------------+
| | ... |
V V V
Encoded Dependent Dependent
Stream Stream Stream
]]></artwork>
</figure>
<t/>
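<t>The decodability rule for Dependent Streams can be sketched as
follows (hypothetical names; a stream is decodable only if it and all
of its transitive dependencies are available):</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical sketch of checking whether a (Dependent) Stream is
# decodable: the stream itself and all transitive dependencies must
# have been received.

def decodable(stream, available, deps):
    """stream: id; available: set of received ids; deps: id -> dependency ids."""
    if stream not in available:
        return False
    return all(decodable(d, available, deps) for d in deps[stream])

# A scalable encoding: base layer L0 (Encoded Stream, no dependencies),
# enhancement layers L1 and L2 (Dependent Streams).
deps = {"L0": [], "L1": ["L0"], "L2": ["L1"]}
```
]]></artwork>
</figure>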
<section title="Alternate Usages">
<t>Within the SDP usage, an SDP media description (m=line) may
describe part of the necessary configuration required for encoding
purposes.</t>
<t>CLUE's "Capture Encoding" provides specific encoding
configuration for this purpose.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>A Media Source can be multiply encoded by different Media
Encoders to provide various encoded representations.</t>
</list></t>
</section>
</section>
<section anchor="encoded-stream" title="Encoded Stream">
<t>A stream of time synchronized encoded media that can be
independently decoded.</t>
<section title="Characteristics">
<t><list style="symbols">
<t>Due to temporal dependencies, an Encoded Stream may have
limitations in where decoding can be started. These entry
points, for example Intra frames from a video encoder, may
require identification and their generation may be event based
or configured to occur periodically.</t>
</list></t>
</section>
</section>
<section anchor="dependent-stream" title="Dependent Stream">
<t>A stream of time synchronized encoded media fragments that are
dependent on one or more <xref target="encoded-stream">Encoded
Streams</xref> and zero or more other Dependent Streams in order to
be decodable.</t>
<section title="Characteristics">
<t><list style="symbols">
<t>Each Dependent Stream has a set of dependencies. These
dependencies must be understood by the parties in a
multi-media session that intend to use a Dependent Stream.</t>
</list></t>
</section>
</section>
<section title="Media Packetizer">
<t>The transformation that takes one or more <xref
target="encoded-stream">Encoded</xref> or <xref
target="dependent-stream">Dependent Streams</xref>, puts their
content into one or more sequences of packets, normally RTP packets,
and outputs <xref target="packet-stream">Source Packet
Streams</xref>. This step includes generating both the RTP payloads
and the RTP packets.</t>
<t>The Media Packetizer can use multiple inputs when producing a
single Packet Stream. One such example is packetization when
using SVC: in the Single Stream Transport (SST) usage of the payload
format, both an Encoded Stream and Dependent Streams are
packetized into a single Source Packet Stream using a single SSRC.</t>
<t>The Media Packetizer can also produce multiple Packet Streams,
for example when Encoded and/or Dependent Streams are distributed
over multiple Packet Streams, possibly in different RTP
sessions.</t>
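<t>As an illustration of the packetization step, the following sketch
builds a minimal RTP fixed header (RFC 3550) around a payload
fragment; all field values are examples, not recommendations:</t>
<figure>
<artwork><![CDATA[
```python
# Illustrative sketch: wrapping an encoded payload fragment in a
# minimal 12-byte RTP header (RFC 3550), using an SSRC chosen by the
# Media Packetizer. Values here are examples only.
import struct

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96):
    first_byte = 2 << 6                 # version 2; no padding/extension/CSRC
    header = struct.pack("!BBHII", first_byte, payload_type,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload

pkt = rtp_packet(b"\x01\x02", seq=7, timestamp=90000, ssrc=0x1234ABCD)
```
]]></artwork>
</figure>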
<section title="Alternate Usages">
<t>An RTP sender is part of the Media Packetizer.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>The Media Packetizer selects which Synchronization
Source(s) (SSRC) <xref target="RFC3550"/> are used, and in which
RTP sessions.</t>
<t>The Media Packetizer can combine multiple Encoded or Dependent
Streams into one or more Packet Streams.</t>
</list></t>
</section>
</section>
<section anchor="packet-stream" title="Packet Stream">
<t>A stream of RTP packets containing media data, source or
redundant. The Packet Stream is identified by an SSRC belonging to a
particular RTP session. The RTP session is identified as discussed
in <xref target="rtp-session"/>.</t>
<t>A Source Packet Stream is a packet stream containing at least
some content from an Encoded Stream. Source material is any media
material that is produced for transport over RTP without any
additional redundancy applied to cope with network transport losses.
Compare this with the <xref
target="redundancy-packet-stream">Redundancy Packet
Stream</xref>.</t>
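<t>As a sketch of how Packet Streams are identified, a receiver can
demultiplex the packets of one RTP session into Packet Streams keyed
by the SSRC field of the RTP header (illustrative code, not part of
this taxonomy):</t>
<figure>
<artwork><![CDATA[
```python
# Hypothetical sketch: within one RTP session, each Packet Stream is
# identified by its SSRC, carried in bytes 8-11 of the RTP header.
import struct

def demultiplex(packets):
    """Group received RTP packets into Packet Streams keyed by SSRC."""
    streams = {}
    for pkt in packets:
        (ssrc,) = struct.unpack("!I", pkt[8:12])
        streams.setdefault(ssrc, []).append(pkt)
    return streams

# Two minimal 12-byte RTP headers with different SSRCs (example values).
p1 = struct.pack("!BBHII", 0x80, 96, 1, 0, 0x11111111)
p2 = struct.pack("!BBHII", 0x80, 96, 1, 0, 0x22222222)
streams = demultiplex([p1, p2, p1])
```
]]></artwork>
</figure>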
<section title="Alternate Usages">
<t>The term "Stream" is used by the CLUE WG to define an encoded
Media Source sent via RTP. "Capture Encoding" and "Encoding Groups"
are defined to capture specific details of the encoding
scheme.</t>
<t>RFC 3550 <xref target="RFC3550"/> uses the terms media stream,
audio stream, video stream and streams of (RTP) packets
interchangeably. It defines the SSRC as "the source of a stream
of RTP packets, ..."</t>
<t>The equivalent mapping of a Packet Stream in SDP <xref
target="RFC4566"/> is defined per usage. For example, each Media
Description (m=line) can describe one Packet Stream, properties
for multiple Packet Streams, or properties for an RTP session (via
<xref target="RFC5576"/> mechanisms, for example).</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>Each Packet Stream is identified by a unique
Synchronization source (SSRC) <xref target="RFC3550"/> that is
carried in every RTP and Real-time Transport Control Protocol
(RTCP) packet header in a specific RTP session context.</t>
<t>At any given point in time, a Packet Stream can have one
and only one SSRC.</t>
<t>Each Packet Stream defines a unique RTP sequence numbering
and timing space.</t>
<t>Several Packet Streams may map to a single Media Source via
the source transformations (see <xref
target="equivalence"/>).</t>
<t>Several Packet Streams can be carried over a single RTP
Session.</t>
</list></t>
</section>
</section>
<section anchor="media-redundancy" title="Media Redundancy">
<t>Media redundancy is a transformation that generates redundant or
repair packets sent out as a Redundancy Packet Stream to mitigate
network transport impairments, like packet loss and delay.</t>
<t>Media Redundancy comes in many flavors: it may generate
independent Repair Streams that are used in addition to the
Source Stream (<xref target="RFC4588">RTP Retransmission</xref> and
some <xref target="RFC5109">FEC</xref>), it may generate a new
Source Stream by combining redundancy information with source
information (using <xref target="RFC5109">XOR FEC</xref> as a <xref
target="RFC2198">redundancy payload</xref>), or it may completely
replace the source information with only redundancy packets.</t>
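<t>As a simplified illustration of the XOR-based flavor (in the spirit
of RFC 5109, but not a complete FEC scheme), one repair packet
protecting a group of equal-length source packets can recover a single
loss:</t>
<figure>
<artwork><![CDATA[
```python
# Simplified sketch of XOR-based redundancy: the repair packet is the
# byte-wise XOR of the protected source packets, so XORing the
# survivors with the repair packet recovers a single lost packet.

def xor_packets(packets):
    out = bytearray(len(packets[0]))
    for pkt in packets:
        for i, byte in enumerate(pkt):
            out[i] ^= byte
    return bytes(out)

source = [b"\x10\x20", b"\x01\x02", b"\xAA\xBB"]
repair = xor_packets(source)                       # Redundancy Packet Stream
# Receiver lost source[1]; XOR of the survivors and the repair recovers it.
recovered = xor_packets([source[0], source[2], repair])
```
]]></artwork>
</figure>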
</section>
<section anchor="redundancy-packet-stream"
title="Redundancy Packet Stream">
<t>A <xref target="packet-stream">Packet Stream</xref> that contains
no original source data, only redundant data that may be combined
with one or more <xref target="received-packet-stream">Received
Packet Streams</xref> to produce <xref
target="repaired-packet-stream">Repaired Packet Streams</xref>.</t>
</section>
<section anchor="media-transport" title="Media Transport">
<t>A Media Transport defines the transformation that the <xref
target="packet-stream">Packet Streams</xref> are subjected to by the
end-to-end transport from one RTP sender to one specific RTP
receiver (an RTP session may contain multiple RTP receivers per
sender). Each Media Transport is defined by a transport association
that is identified by a 5-tuple (source address, source port,
destination address, destination port, transport protocol). Each
transport association normally contains only a single RTP session,
although a proposal exists for sending <xref
target="I-D.westerlund-avtcore-transport-multiplexing">multiple RTP
sessions over one transport association</xref>.</t>
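<t>The 5-tuple identification can be sketched as follows (illustrative
types and example addresses):</t>
<figure>
<artwork><![CDATA[
```python
# Illustrative sketch: a Media Transport is identified by its 5-tuple
# transport association. Addresses and ports below are examples.
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple",
                       "src_addr src_port dst_addr dst_port protocol")

a = FiveTuple("192.0.2.1", 5004, "198.51.100.2", 5004, "UDP")
b = FiveTuple("192.0.2.1", 5004, "198.51.100.2", 5004, "UDP")
c = FiveTuple("192.0.2.1", 5006, "198.51.100.2", 5004, "UDP")
# Equal 5-tuples denote the same transport association; changing any
# element (here, the source port) identifies a distinct Media Transport.
```
]]></artwork>
</figure>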
<section title="Characteristics">
<t><list style="symbols">
<t>Media Transport transmits Packet Streams of RTP Packets
from a source transport address to a destination transport
address.</t>
</list></t>
</section>
<section anchor="media-stream-decomposition"
title="Media Stream Decomposition">
<t>The Media Transport concept sometimes needs to be decomposed
into more steps, to enable discussion of what a sender emits and how
it is transformed by the network before being received by the
receiver. We therefore also provide this <xref
target="fig-media-transport">Media Transport
decomposition</xref>.</t>
<figure align="center" anchor="fig-media-transport"
title="Decomposition of Media Transport">
<artwork><![CDATA[ Packet Stream
|
V
+--------------------------+
| Media Transport Sender |
+--------------------------+
|
Sent Packet Stream
V
+--------------------------+
| Network Transport |
+--------------------------+
|
Transported Packet Stream
V
+--------------------------+
| Media Transport Receiver |
+--------------------------+
|
V
Received Packet Stream
]]></artwork>
</figure>
<t/>
<section anchor="media-transport-sender"
title="Media Transport Sender">
<t>The first transformation within the <xref
target="media-transport">Media Transport</xref> is the Media
Transport Sender, where the sending <xref
target="end-point">End-Point</xref> takes a Packet Stream and
emits the packets onto the network using the transport
association established for this Media Transport, thus creating a
<xref target="sent-packet-stream">Sent Packet Stream</xref>. In
this process it transforms the Packet Stream in several ways.
First, the packets gain the necessary protocol headers for the
transport association, for example IP and UDP headers, thus forming
IP/UDP/RTP packets. In addition, the Media Transport Sender may
queue, pace or otherwise affect how the packets are emitted onto
the network, adding the delay, jitter and inter-packet spacing
that characterize the Sent Packet Stream.</t>
</section>
<section anchor="sent-packet-stream" title="Sent Packet Stream">
<t>The Sent Packet Stream is the Packet Stream as it enters the
first hop of the network path to its destination. The Sent
Packet Stream is identified using network transport addresses,
for example for IP/UDP the 5-tuple (source IP address, source port,
destination IP address, destination port, and protocol
(UDP)).</t>
</section>
<section anchor="network-transport" title="Network Transport">
<t>Network Transport is the transformation that the <xref
target="sent-packet-stream">Sent Packet Stream</xref> is
subjected to by traveling from the source to the destination
through the network. These transformations include loss of some
packets, varying delay on a per-packet basis, packet
duplication, and packet header or data corruption. They
produce a <xref
target="transported-packet-stream">Transported Packet
Stream</xref> at the exit of the network path.</t>
</section>
<section anchor="transported-packet-stream"
title="Transported Packet Stream">
<t>The Packet Stream that is emitted out of the network path at
the destination, subjected to the <xref
target="network-transport">Network Transport's
transformation</xref>.</t>
</section>
<section title="Media Transport Receiver">
<t>The receiving <xref target="end-point">End-Point's</xref>
transformation of the <xref
target="transported-packet-stream">Transported Packet
Stream</xref> by its reception process, which results in the <xref
target="received-packet-stream">Received Packet Stream</xref>.
This transformation includes verifying transport checksums and,
if a checksum does not match, discarding the corrupted packet.
Other transformations can include delay variations in receiving
a packet on the network interface and providing it to the
application.</t>
</section>
</section>
</section>
<section anchor="received-packet-stream"
title="Received Packet Stream">
<t>The <xref target="packet-stream">Packet Stream</xref> resulting
from the Media Transport's transformation, i.e. subjected to packet
loss, packet corruption, packet duplication and varying transmission
delay from sender to receiver.</t>
</section>
<section anchor="received-redundancy-ps"
title="Received Redundancy Packet Stream">
<t>The <xref target="redundancy-packet-stream">Redundancy Packet
Stream</xref> resulting from the Media Transport's transformation,
i.e. subjected to packet loss, packet corruption, and varying
transmission delay from sender to receiver.</t>
</section>
<section title="Media Repair">
<t>A transformation that takes as input one or more <xref
target="packet-stream">Source Packet Streams</xref> as well as <xref
target="redundancy-packet-stream">Redundancy Packet Streams</xref>,
and attempts to combine them to counter the transformations
introduced by the <xref target="media-transport">Media
Transport</xref>, minimizing the difference between the <xref
target="source-stream">Source Stream</xref> and the <xref
target="received-source-stream">Received Source Stream</xref> after
the <xref target="media-decoder">Media Decoder</xref>. The output is
a <xref target="repaired-packet-stream">Repaired Packet
Stream</xref>.</t>
</section>
<section anchor="repaired-packet-stream"
title="Repaired Packet Stream">
<t>A <xref target="received-packet-stream">Received Packet
Stream</xref> for which <xref
target="received-redundancy-ps">Received Redundancy Packet
Stream</xref> information has been used to try to re-create the
<xref target="packet-stream">Packet Stream</xref> as it was before
<xref target="media-transport">Media Transport</xref>.</t>
</section>
<section title="Media Depacketizer">
<t>A Media Depacketizer takes one or more <xref
target="packet-stream">Packet Streams</xref>, depacketizes them,
and attempts to reconstitute the <xref
target="encoded-stream">Encoded Streams</xref> or <xref
target="dependent-stream">Dependent Streams</xref> present in those
Packet Streams.</t>
</section>
<section anchor="received-encoded-stream"
title="Received Encoded Stream">
<t>The received version of an <xref target="encoded-stream">Encoded
Stream</xref>.</t>
</section>
<section anchor="media-decoder" title="Media Decoder">
<t>A Media Decoder is a transformation that is responsible for
decoding <xref target="encoded-stream">Encoded Streams</xref> and
any <xref target="dependent-stream">Dependent Streams</xref> into a
<xref target="source-stream">Source Stream</xref>.</t>
<section title="Alternate Usages">
<t>Within the context of SDP, an m=line describes the necessary
configuration and identification (RTP Payload Types) required to
decode one or more incoming Media Streams.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>A Media Decoder is the entity that has to deal with
any errors in the encoded streams that resulted from
corruption or from failures to repair packet losses, since a
media decoder generally is forced to produce some output
periodically. It thus commonly includes concealment
methods.</t>
</list></t>
</section>
</section>
<section anchor="received-source-stream"
title="Received Source Stream">
<t>The received version of a <xref target="source-stream">Source
Stream</xref>.</t>
</section>
<section anchor="media-sink" title="Media Sink">
<t>The Media Sink receives a <xref target="source-stream">Source
Stream</xref> that contains, usually periodically, sampled media
data together with associated synchronization information. Depending
on the application, this Source Stream then needs to be transformed
into a <xref target="raw-stream">Raw Stream</xref> that is sent, in
synchronization with the output from other Media Sinks, to a <xref
target="media-render">Media Render</xref>. The Media Sink may also
be connected with a <xref target="media-source">Media Source</xref>
and be used as part of a conceptual Media Source.</t>
<section title="Characteristics">
<t><list style="symbols">
<t>The Media Sink can further transform the Source Stream into
a representation that is suitable for rendering on the Media
Render, as defined by the application or system-wide
configuration. This includes sample scaling, level adjustments,
etc.</t>
</list></t>
</section>
</section>
<section title="Received Raw Stream">
<t>The received version of a <xref target="raw-stream">Raw
Stream</xref>.</t>
</section>
<section anchor="media-render" title="Media Render">
<t>A Media Render takes a <xref target="raw-stream">Raw
Stream</xref> and converts it into <xref
target="physical-stimulus">Physical Stimulus</xref> that a human
user can perceive. Examples of such devices are screens, and D/A
converters connected to amplifiers and loudspeakers.</t>
<section title="Characteristics">
<t><list style="symbols">
<t>An End Point can potentially have multiple Media Renders
for each media type.</t>
</list></t>
</section>
</section>
</section>
<section title="Communication Entities">
<t>This section contains the concepts for the entities involved in
the communication.</t>
<section anchor="end-point" title="End Point">
<t>A single addressable entity sending or receiving RTP packets. It
may be decomposed into several functional blocks, but as long as it
behaves as a single RTP stack entity it is classified as a single
"End Point".</t>
<section title="Alternate Usages">
<t>The CLUE Working Group (WG) uses the terms "Media Provider" and
"Media Consumer" to describe the aspects of an End Point pertaining
to its sending and receiving functionality.</t>
</section>
<section title="Characteristics">
<t>End Points can be identified in several different ways. While
RTCP Canonical Names (CNAMEs) <xref target="RFC3550"/> provide a
globally unique and stable identification mechanism for the
duration of the Communication Session (see <xref
target="comm-session"/>), their validity applies exclusively
within a <xref target="syncontext">Synchronization Context</xref>.
Thus one End Point can have multiple CNAMEs. Therefore, mechanisms
outside the scope of RTP, such as application defined mechanisms,
must be used to ensure End Point identification when outside this
Synchronization Context.</t>
</section>
</section>
<section anchor="rtp-session" title="RTP Session">
<t>An RTP session is an association among a group of participants
communicating with RTP. It is a group communications channel which
can potentially carry a number of Packet Streams. Within an RTP
session, every participant can find meta-data and control
information (over RTCP) about all the Packet Streams in the RTP
session. The bandwidth of the RTCP control channel is shared between
all participants within an RTP Session.</t>
<section title="Alternate Usages">
<t>Within the context of SDP, a single m=line can map to a single
RTP Session, or multiple m=lines can map to a single RTP Session.
The latter is enabled via multiplexing schemes such as BUNDLE
<xref target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>, which
allows mapping of multiple m=lines to a single RTP Session.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>An RTP Session typically carries one or more Packet
Streams.</t>
<t>An RTP Session shares a single SSRC space as defined in
RFC3550 <xref target="RFC3550"/>. That is, the End Points
participating in an RTP Session can see an SSRC identifier
transmitted by any of the other End Points. An End Point can
receive an SSRC either as SSRC or as a Contributing source
(CSRC) in RTP and RTCP packets, as defined by the endpoints'
network interconnection topology.</t>
<t>An RTP Session uses at least two <xref
target="media-transport">Media Transports</xref>, one for
sending and one for receiving. Commonly, the Media Transport
used for receiving is the reverse direction of the one used for
sending. An RTP Session may use many Media Transports, and these
define the session's network interconnection topology. A single
Media Transport normally cannot carry more than one RTP
Session, unless a solution for multiplexing multiple RTP
Sessions over a single Media Transport is used. One example of
such a scheme is <xref
target="I-D.westerlund-avtcore-transport-multiplexing">Multiple
RTP Sessions on a Single Lower-Layer Transport</xref>.</t>
<t>Multiple RTP Sessions can be related via mechanisms defined
in <xref target="relationships"/>.</t>
</list></t>
</section>
</section>
<section anchor="participant" title="Participant">
<t>A participant is an entity reachable by a single signaling
address, and is thus related more to the signaling context than to
the media context.</t>
<section title="Characteristics">
<t><list style="symbols">
<t>A single signaling-addressable entity, using an
application-specific signaling address space, for example a
SIP URI.</t>
<t>A participant can have several <xref
target="multimedia-session">Multimedia Sessions</xref>.</t>
<t>A participant can have several associated transport flows,
including several separate local transport addresses for those
transport flows.</t>
<!--MW: I can't understand what the purpose is of the last bullet regarding many
transport flows. It needs to be aligned with the rest of the concept language.
But I am unable to change it because I don't understand what one attempts
to say.
BoB: Speculatively, it is just trying to prohibit definig a Participant as
being one end of a single Media Transport. This bullet is then not needed,
as a single Multimedia Session can already have multiple Media Transports.
-->
</list></t>
</section>
</section>
<section anchor="multimedia-session" title="Multimedia Session">
<t>A multimedia session is an association among a group of
participants engaged in the communication via one or more <xref
target="rtp-session">RTP Sessions</xref>. It defines logical
relationships among <xref target="media-source">Media Sources</xref>
that appear in multiple RTP Sessions.</t>
<section title="Alternate Usages">
<t>RFC4566 <xref target="RFC4566"/> defines a multimedia session
as a set of multimedia senders and receivers and the data streams
flowing from senders to receivers.</t>
<t>RFC3550 <xref target="RFC3550"/> defines it as a set of
concurrent RTP sessions among a common group of participants. For
example, a videoconference (which is a multimedia session) may
contain an audio RTP session and a video RTP session.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>Participants and their End Points in RTP Multimedia
Sessions are identified via mechanisms such as RTCP CNAME or
other application level identifiers, as appropriate.</t>
<t>A Multimedia Session can be composed of several parallel
RTP Sessions with potentially multiple Packet Streams per RTP
Session.</t>
<t>Each participant in a Multimedia Session can have a
multitude of Media Captures and Media Rendering devices.</t>
</list></t>
</section>
</section>
<section anchor="comm-session" title="Communication Session">
<t>A Communication Session is an association among a group of
participants communicating with each other via a set of Multimedia
Sessions.</t>
<section title="Alternate Usages">
<t>The <xref target="RFC4566">Session Description Protocol
(SDP)</xref> defines a multimedia session as a set of multimedia
senders and receivers and the data streams flowing from senders to
receivers. In that definition it is, however, not clear whether a
multimedia session includes both the sender's and the receiver's
view of the same RTP Packet Stream.</t>
</section>
<section title="Characteristics">
<t><list style="symbols">
<t>Each participant in a Communication Session is identified
via an application-specific signaling address.</t>
<t>A Communication Session is composed of at least one
Multimedia Session per participant, involving one or more
parallel RTP Sessions with potentially multiple Packet Streams
per RTP Session.</t>
</list> For example, in a full mesh communication, the
Communication Session consists of a set of separate Multimedia
Sessions between each pair of Participants. Another example is a
centralized conference, where the Communication Session consists
of a set of Multimedia Sessions between each Participant and the
conference handler.</t>
</section>
</section>
</section>
</section>
<section title="Relations at Different Levels">
<t>This section uses the concepts from the previous section and
looks at different types of relationships among them. These
relationships occur at different levels and for different purposes.
The section is organized to consider the level at which a relation
is required. The reason for the relationship may exist at another
step in the media handling chain. For example, Simulcast (discussed
in <xref target="simulcast"/>) needs to determine relations at the
Packet Stream level; however, the reason to relate Packet Streams is
that multiple Media Encoders use the same Media Source, i.e., the
need to identify a common Media Source.</t>
<section title="Media Source Relations">
<t><xref target="media-source">Media Sources</xref> are commonly
grouped and related to an <xref target="end-point">End Point</xref> or
a <xref target="participant">Participant</xref>. This occurs for
several reasons, both application logic and media handling
purposes. These cases are further discussed below.</t>
<section anchor="syncontext" title="Synchronization Context">
<t>A Synchronization Context defines a requirement on a strong
timing relationship between the Media Sources, typically requiring
alignment of clock sources. Such a relationship can be identified in
multiple ways, as listed below. A single Media Source can only belong
to a single Synchronization Context, since it is assumed that a
single Media Source can only have a single media clock, and requiring
alignment to several Synchronization Contexts (and thus reference
clocks) would effectively merge those into a single Synchronization
Context.</t>
<!--MW: The following paragraph may be quite misplaced. Should be reconsidered when improving
text for the relations between RTP Sessions, Multimedia Sessions and Communication
Sessions.-->
<t>A single Multimedia Session can contain media from one or more
Synchronization Contexts. An example is a Multimedia Session
containing one set of audio and video for communication purposes,
belonging to one Synchronization Context, and another set of audio
and video for presentation purposes (like playing a video file) with
a separate Synchronization Context, which has no strong timing
relationship and need not be strictly synchronized with the audio
and video used for communication.</t>
<section title="RTCP CNAME">
<t>RFC3550 <xref target="RFC3550"/> describes Inter-media
synchronization between RTP Sessions based on RTCP CNAME, RTP and
Network Time Protocol (NTP) <xref target="RFC5905"/> formatted
timestamps of a reference clock.</t>
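<t>As an illustration only (not part of RFC3550), the timestamp
mapping this enables can be sketched as follows, assuming each
Packet Stream's most recent Sender Report provides an (NTP, RTP)
timestamp pair; all values are made up:</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch: mapping RTP timestamps from two Packet Streams
# that share a CNAME onto a common NTP wall clock, using RTCP Sender
# Report (NTP, RTP) timestamp pairs.

def rtp_to_ntp(rtp_ts, sr_ntp, sr_rtp, clock_rate):
    """Convert an RTP timestamp to NTP seconds via the latest SR pair."""
    return sr_ntp + (rtp_ts - sr_rtp) / clock_rate

# Audio at 8 kHz and video at 90 kHz in the same Synchronization Context:
audio_ntp = rtp_to_ntp(rtp_ts=168000, sr_ntp=1000.0, sr_rtp=160000,
                       clock_rate=8000)
video_ntp = rtp_to_ntp(rtp_ts=270000, sr_ntp=1000.5, sr_rtp=225000,
                       clock_rate=90000)

# The playout offset a receiver applies to render the two in sync:
offset = video_ntp - audio_ntp
```
]]></artwork></figure>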
</section>
<section title="Clock Source Signaling">
<t><xref target="I-D.ietf-avtcore-clksrc"/> provides a mechanism
to signal the clock source in SDP both for the reference clock as
well as the media clock, thus allowing a Synchronization Context
to be defined beyond the one defined by the usage of CNAME source
descriptions.</t>
</section>
<section title="CLUE Scenes">
<t>In CLUE, "Capture Scene", "Capture Scene Entry", and "Captures"
define an implied Synchronization Context.</t>
</section>
<section title="Implicitly via RtcMediaStream">
<t>The WebRTC WG defines an "RtcMediaStream" with one or more
"RtcMediaStreamTracks". All tracks in an "RtcMediaStream" are
intended to be possible to synchronize when rendered.</t>
</section>
<section title="Explicitly via SDP Mechanisms">
<t>RFC5888 <xref target="RFC5888"/> defines an m=line grouping
mechanism called "Lip Synchronization (LS)" for establishing the
synchronization requirement across m=lines when they map to
individual sources.</t>
<t>RFC5576 <xref target="RFC5576"/> extends the above mechanism
when multiple media sources are described by a single m=line.</t>
</section>
</section>
<section title="End Point">
<t>Some applications require knowledge of which Media Sources
originate from a particular <xref target="end-point">End
Point</xref>. This can support decisions such as packet routing
between parts of the topology, based on knowing the End Point
origin of the Packet Streams.</t>
<t>In RTP, this identification has been overloaded with the
Synchronization Context through the usage of the CNAME source
description item. This works for some usages, but sometimes it
breaks down, for example if an End Point has two sets of Media
Sources with different Synchronization Contexts, such as the audio
and video of the human participant as well as a set of Media
Sources of audio and video for a shared movie. Thus, an End Point
may have multiple CNAMEs. The CNAMEs, or the Media Sources
themselves, can be related to the End Point.</t>
</section>
<section title="Participant">
<t>In communication scenarios, it is commonly necessary to know
which Media Sources originate from which <xref
target="participant">Participant</xref>, for example to enable the
application to display Participant identity information correctly
associated with the Media Sources. This association is currently
handled through the signaling solution, which points at a specific
Multimedia Session where the Media Sources may be explicitly or
implicitly tied to a particular End Point.</t>
<t>Participant information becomes more problematic for Media
Sources that are generated through mixing or other conceptual
processing of Raw Streams or Source Streams that originate from
different Participants. Such Media Sources can have a dynamically
varying set of origins and Participants. RTP contains the concept
of Contributing Sources (CSRC), which carries information at the
RTP level about the immediately previous origin of the included
media content.</t>
</section>
<section title="WebRTC MediaStream">
<t>An RtcMediaStream, in addition to requiring a single
Synchronization Context as discussed above, is also an explicit
grouping of a set of Media Sources, as identified by the
RtcMediaStreamTracks within the RtcMediaStream.</t>
</section>
</section>
<section title="Packetization Time Relations">
<t>At RTP packetization time, a number of different types of
relationships can exist between <xref
target="encoded-stream">Encoded Streams</xref>, <xref
target="dependent-stream">Dependent Streams</xref> and <xref
target="packet-stream">Packet Streams</xref>. These are caused by
grouping together or distributing these different types of streams
into Packet Streams. This section looks at such relationships.</t>
<section title="Single Stream Transport of SVC">
<t><xref target="RFC6190">Scalable Video Coding</xref> has a mode of
operation where Encoded Streams and Dependent Streams from the SVC
Media Encoder are grouped together in a single Source Packet Stream
using the SVC RTP payload format.</t>
</section>
<section title="Multi-Channel Audio">
<t>There exist a number of RTP payload formats that can carry
multi-channel audio, despite the codec being a mono encoder.
Multi-channel audio can be viewed as multiple Media Sources sharing
a common Synchronization Context. These are independently encoded
by a Media Encoder, and the different Encoded Streams are then
packetized together, in a time-synchronized way, into a single
Source Packet Stream, using the codec's RTP payload format.
Examples of such codecs are <xref target="RFC3551">PCMA and
PCMU</xref>, <xref target="RFC4867">AMR</xref>, and <xref
target="RFC5404">G.719</xref>.</t>
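<t>The interleaving that such payload formats perform can be
sketched as follows; this is an illustrative example only, using the
sample-interleaved layout (left channel first) that RFC3551 defines
for 2-channel audio:</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch: time-synchronized packetization of two mono
# Encoded Streams (left and right channel) into one multi-channel
# payload by interleaving one sample per channel, left first.

def interleave(left: bytes, right: bytes) -> bytes:
    """Interleave equally long per-channel sample sequences."""
    assert len(left) == len(right)
    out = bytearray()
    for l, r in zip(left, right):
        out.append(l)  # left-channel sample
        out.append(r)  # right-channel sample
    return bytes(out)

payload = interleave(b"\x01\x02\x03", b"\xa1\xa2\xa3")
```
]]></artwork></figure>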
</section>
<section title="Redundancy Format">
<t>The <xref target="RFC2198">RTP Payload for Redundant Audio
Data</xref> defines how one can transport redundant audio data
together with primary data in the same RTP payload. The redundant
data can be a time-delayed version of the primary, or another
time-delayed Encoded Stream using a different Media Encoder to
encode the same Media Source as the primary, as depicted below in
<xref target="fig-red-rfc2198"/>.</t>
<figure align="center" anchor="fig-red-rfc2198"
title="Concept for usage of Audio Redundancy with different Media Encoders">
<artwork><![CDATA[+--------------------+
| Media Source |
+--------------------+
|
Source Stream
|
+------------------------+
| |
V V
+--------------------+ +--------------------+
| Media Encoder | | Media Encoder |
+--------------------+ +--------------------+
| |
| +------------+
Encoded Stream | Time Delay |
| +------------+
| |
| +------------------+
V V
+--------------------+
| Media Packetizer |
+--------------------+
|
V
Packet Stream ]]></artwork>
</figure>
<t>The Redundancy format thus provides the necessary meta
information to correctly relate different parts of the same Encoded
Stream or, in the case <xref target="fig-red-rfc2198">depicted
above</xref>, to relate the Received Source Stream fragments coming
out of different Media Decoders so that they can be combined into a
less erroneous Source Stream.</t>
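<t>The payload layout that carries this meta information can be
sketched as follows; an illustrative example of the RFC2198 block
headers, with made-up payload types and data:</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch of the RFC 2198 payload layout: each redundant
# block gets a 4-byte header (F=1, 7-bit payload type, 14-bit timestamp
# offset, 10-bit length); a final 1-byte header (F=0, payload type)
# precedes the primary data, which is placed last.
import struct

def red_payload(redundant, primary_pt, primary_data):
    """redundant: list of (pt, ts_offset, data) for time-delayed blocks."""
    headers = bytearray()
    blocks = bytearray()
    for pt, ts_offset, data in redundant:
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)  # F=1 redundant block header
        blocks += data
    headers.append(primary_pt)              # F=0 primary block header
    blocks += primary_data
    return bytes(headers + blocks)

p = red_payload([(0, 160, b"OLD")], primary_pt=8, primary_data=b"NEW")
```
]]></artwork></figure>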
</section>
</section>
<section title="Packet Stream Relations">
<t>This section discusses various cases of relationships among
Packet Streams. This is a common relation to handle in RTP, because
Packet Streams are separate and have their own SSRCs, implying
independent sequence number and timestamp spaces. The underlying
reasons for the Packet Stream relationships differ, as can be seen
in the cases below. The different Packet Streams can be handled
within the same RTP Session or different RTP Sessions to accomplish
different transport goals. This separation of Packet Streams is
further discussed in <xref
target="packet-stream-separation"/>.</t>
<section anchor="simulcast" title="Simulcast">
<t>A Media Source represented as multiple independent Encoded
Streams constitutes a simulcast of that Media Source. <xref
target="fig-simulcast"/> below shows an example of a Media
Source that is encoded into three separate and different Simulcast
streams, which are in turn sent on the same Media Transport flow.
When using Simulcast, the Packet Streams may share an RTP Session
and Media Transport, be separated onto different RTP Sessions and
Media Transports, or be any combination of the two. Other
considerations determine which usage is desirable, as discussed in
<xref target="packet-stream-separation"/>.</t>
<figure anchor="fig-simulcast"
title="Example of Media Source Simulcast">
<artwork align="center"><![CDATA[ +----------------+
| Media Source |
+----------------+
Source Stream |
+----------------------+----------------------+
| | |
v v v
+------------------+ +------------------+ +------------------+
| Media Encoder | | Media Encoder | | Media Encoder |
+------------------+ +------------------+ +------------------+
| Encoded | Encoded | Encoded
| Stream | Stream | Stream
v v v
+------------------+ +------------------+ +------------------+
| Media Packetizer | | Media Packetizer | | Media Packetizer |
+------------------+ +------------------+ +------------------+
| Source | Source | Source
| Packet | Packet | Packet
| Stream | Stream | Stream
+-----------------+ | +-----------------+
| | |
V V V
+-------------------+
| Media Transport |
+-------------------+
]]></artwork>
</figure>
<t>The simulcast relation between the Packet Streams is the common
Media Source. In addition to being able to identify the common Media
Source, a receiver of the Packet Stream may need to know the
configuration or encoding goals behind the produced Encoded
Stream and its properties, to enable selection of the stream
that is most useful in the application at that moment.</t>
</section>
<section anchor="svc" title="Layered Multi-Stream Transmission">
<t>Multi-stream transmission (MST) is a mechanism by which different
portions of a layered encoding of a Source Stream are sent using
separate Packet Streams (sometimes in separate RTP sessions). MSTs
are useful for receiver control of layered media.</t>
<t>A Media Source represented as an Encoded Stream and multiple
Dependent Streams constitutes a Media Source that has layered
dependency. The figure below represents an example of a Media Source
that is encoded into three dependent layers, where two layers are
sent on the same Media Transport using different Packet Streams,
i.e. SSRCs, and the third layer is sent on a separate Media
Transport, i.e. a different RTP Session.</t>
<figure align="center" anchor="fig-ddp"
title="Example of Media Source Layered Dependency">
<artwork align="center"><![CDATA[ +----------------+
| Media Source |
+----------------+
|
|
V
+---------------------------------------------------------+
| Media Encoder |
+---------------------------------------------------------+
| | |
Encoded Stream Dependent Stream Dependent Stream
| | |
V V V
+----------------+ +----------------+ +----------------+
|Media Packetizer| |Media Packetizer| |Media Packetizer|
+----------------+ +----------------+ +----------------+
| | |
Packet Stream Packet Stream Packet Stream
| | |
+------+ +------+ |
| | |
V V V
+-----------------+ +-----------------+
| Media Transport | | Media Transport |
+-----------------+ +-----------------+
]]></artwork>
</figure>
<t>The SVC MST relation needs to identify the common Media Encoder
origin for the Encoded and Dependent Streams. The SVC RTP Payload
RFC is not particularly explicit about how this relation is to be
implemented. When using different RTP Sessions, thus different Media
Transports, and as long as there is only one Packet Stream per Media
Encoder and a single Media Source in each RTP Session, common SSRC
and CNAMEs can be used to identify the common Media Source. When
multiple Packet Streams are sent from one Media Encoder in the same
RTP Session, then CNAME is the only currently specified RTP
identifier that can be used. In cases where multiple Media Encoders
use multiple Media Sources sharing Synchronization Context, and thus
having a common CNAME, additional heuristics need to be applied to
create the MST relationship between the Packet Streams.</t>
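<t>The first step of such heuristics, grouping received Packet
Streams by CNAME, can be sketched as follows (illustrative only; the
SSRC and CNAME values are made up):</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch: grouping received SSRCs by the CNAME carried in
# RTCP SDES, to find Packet Streams sharing a Synchronization Context.
from collections import defaultdict

def group_by_cname(sdes):
    """sdes: mapping of SSRC -> CNAME learned from RTCP SDES items."""
    groups = defaultdict(list)
    for ssrc, cname in sdes.items():
        groups[cname].append(ssrc)
    return dict(groups)

groups = group_by_cname({0x1111: "a@example", 0x2222: "a@example",
                         0x3333: "b@example"})
```
]]></artwork></figure>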
</section>
<section anchor="repair" title="Robustness and Repair">
<t>Packet Streams may be protected by Redundancy Packet Streams
during transport. Several approaches, listed below, can achieve the
same result: <list style="symbols">
<t>Duplication of the original Packet Stream,</t>
<t>Duplication of the original Packet Stream with a time
offset,</t>
<t>Forward Error Correction (FEC) techniques, and</t>
<t>Retransmission of lost packets (either globally or
selectively).</t>
</list></t>
<section title="RTP Retransmission">
<t>The <xref target="fig-rtx">figure below</xref> represents an
example where a Media Source's Source Packet Stream is protected
by a <xref target="RFC4588">retransmission (RTX) flow</xref>. In
this example the Source Packet Stream and the Redundancy Packet
Stream share the same Media Transport.</t>
<figure align="center" anchor="fig-rtx"
title="Example of Media Source Retransmission Flows">
<artwork align="center"><![CDATA[+--------------------+
| Media Source |
+--------------------+
|
V
+--------------------+
| Media Encoder |
+--------------------+
| Retransmission
Encoded Stream +--------+ +---- Request
V | V V
+--------------------+ | +--------------------+
| Media Packetizer | | | RTP Retransmission |
+--------------------+ | +--------------------+
| | |
+------------+ Redundancy Packet Stream
Source Packet Stream |
| |
+---------+ +---------+
| |
V V
+-----------------+
| Media Transport |
+-----------------+
]]></artwork>
</figure>
<t>The <xref target="fig-rtx">RTP Retransmission example</xref>
helps illustrate that this mechanism works purely on the Source
Packet Stream. The RTP Retransmission transform buffers the sent
Source Packet Stream and upon requests emits a retransmitted
packet with some extra payload header as a Redundancy Packet
Stream. The <xref target="RFC4588">RTP Retransmission
mechanism</xref> is specified so that there is a one to one
relation between the Source Packet Stream and the Redundancy
Packet Stream. Thus a Redundancy Packet Stream needs to be
associated with its Source Packet Stream upon being received. This
is done based on CNAME selectors and heuristics to match requested
packets for a given Source Packet Stream with the original
sequence number in the payload of any new Redundancy Packet Stream
using the RTX payload format. In cases where the Redundancy Packet
Stream is sent in a separate RTP Session from the Source Packet
Stream, these sessions are related, e.g. using the <xref
target="RFC5888">SDP Media Grouping's</xref> FID semantics.</t>
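<t>The per-packet part of this association, recovering the original
sequence number from the RTX payload, can be sketched as follows
(illustrative only; the payload bytes are made up):</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch: an RFC 4588 RTX payload starts with the 2-byte
# original sequence number (OSN), followed by the original payload.
# A receiver uses the OSN to match the retransmission to the packet it
# requested from the associated Source Packet Stream.
import struct

def recover_original(rtx_payload: bytes):
    """Split an RTX payload into (original_seq, original_payload)."""
    (osn,) = struct.unpack("!H", rtx_payload[:2])
    return osn, rtx_payload[2:]

osn, payload = recover_original(b"\x00\x2a" + b"media-bytes")
```
]]></artwork></figure>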
</section>
<section title="Forward Error Correction">
<t>The <xref target="fig-fec">figure below</xref> represents an
example where two Media Sources' Source Packet Streams are
protected by FEC. Source Packet Stream A has a Media Redundancy
transformation in FEC Encoder 1. This produces a Redundancy Packet
Stream 1 that is related only to Source Packet Stream A. FEC
Encoder 2, however, takes two Source Packet Streams (A and B) and
produces a Redundancy Packet Stream 2 that protects them together,
i.e., Redundancy Packet Stream 2 relates to two Source Packet
Streams (a FEC group). FEC decoding, when needed due to packet
loss or packet corruption at the receiver, requires knowledge
of which Source Packet Streams the FEC encoding was based
on.</t>
<t>In <xref target="fig-fec"/> all Packet Streams are sent on the
same Media Transport. This is however not the only possible
choice. Numerous combinations exist for spreading these Packet
Streams over different Media Transports to achieve the
communication application's goal.</t>
<figure align="center" anchor="fig-fec"
title="Example of FEC Flows">
<artwork align="center"><![CDATA[+--------------------+ +--------------------+
| Media Source A | | Media Source B |
+--------------------+ +--------------------+
| |
V V
+--------------------+ +--------------------+
| Media Encoder A | | Media Encoder B |
+--------------------+ +--------------------+
| |
Encoded Stream Encoded Stream
V V
+--------------------+ +--------------------+
| Media Packetizer A | | Media Packetizer B |
+--------------------+ +--------------------+
| |
Source Packet Stream A Source Packet Stream B
| |
+-----+-------+-------------+ +-------+------+
| V V V |
| +---------------+ +---------------+ |
| | FEC Encoder 1 | | FEC Encoder 2 | |
| +---------------+ +---------------+ |
| | | |
| Redundancy PS 1 Redundancy PS 2 |
V V V V
+----------------------------------------------------------+
| Media Transport |
+----------------------------------------------------------+
]]></artwork>
</figure>
<t>As FEC encoding exists in various forms, there are many methods
for relating FEC Redundancy Packet Streams with their source
information in Source Packet Streams. The <xref target="RFC5109">XOR
based RTP FEC payload format</xref> is defined in such a way that
a Redundancy Packet Stream has a one-to-one relation with a Source
Packet Stream. In fact, the RFC requires the Redundancy Packet
Stream to use the same SSRC as the Source Packet Stream. This
requires using either a separate RTP Session or the <xref
target="RFC2198">Redundancy RTP Payload format</xref>. The
underlying relation requirement for this FEC format and a
particular Redundancy Packet Stream is to know the related Source
Packet Stream, including its SSRC.</t>
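<t>The XOR principle underlying this kind of FEC can be sketched as
follows; an illustrative example only, ignoring the FEC headers that
the payload format additionally defines:</t>
<figure><artwork><![CDATA[
```python
# Hypothetical sketch: a parity payload is the XOR of the protected
# source payloads (padded to equal length), so one missing payload can
# be recovered from the parity plus the remaining payloads.

def xor_parity(payloads):
    size = max(len(p) for p in payloads)
    parity = bytearray(size)
    for p in payloads:
        for i, byte in enumerate(p.ljust(size, b"\x00")):
            parity[i] ^= byte
    return bytes(parity)

pkt_a, pkt_b = b"pkt-A", b"pkt-B"
parity = xor_parity([pkt_a, pkt_b])
# Packet B is lost; recover it from packet A and the parity:
recovered = xor_parity([pkt_a, parity])
```
]]></artwork></figure>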
<t><!--MW: Here we could ad something about FECFRAME and generalized block FEC that can
protect multiple Packet Streams with one Redundancy Packet Stream. However, that do requrie
usage of explicit Source Packet Information. --></t>
</section>
</section>
<section anchor="packet-stream-separation"
title="Packet Stream Separation">
<t>An important aspect of Packet Stream relations is the level of
separation between the Packet Streams. This section discusses some
alternatives.</t>
<section title="SSRC-Only Based Separation">
<t>When the Packet Streams that have a relationship are all sent
in the same RTP Session, and their identification and separation
from each other are based on the SSRC only, this is SSRC-Only Based
Separation.</t>
</section>
<section title="RTP Session Based Separation">
<t>Related Packet Streams can be sent in the context of
different RTP Sessions to achieve separation between the Packet
Streams. This is commonly used when the different Packet Streams
are intended for different Media Transports.</t>
<t>Several mechanisms that use RTP Session based separation
rely on it to enable an implicit grouping mechanism expressing
the relationship. Such solutions have been based on using the same
SSRC value in the different RTP Sessions to implicitly indicate
their relation. That way, no explicit RTP-level mechanism has been
needed; only signaling-level relations have been established,
using semantics from the <xref target="RFC5888">Grouping of Media
Lines framework</xref>. Examples of this are <xref
target="RFC4588">RTP Retransmission</xref>, <xref
target="RFC6190">SVC Multi-Stream Transmission</xref> and <xref
target="RFC5109">XOR-Based FEC</xref>.</t>
</section>
<section title="Multimedia Session based Separation">
<t>Packet Streams that are related and need to be associated can
be part of different Multimedia Sessions, rather than just
different RTP sessions within the same Multimedia Session context.
This puts further demand on the scope of the mechanism(s) and its
handling of identifiers used for expressing the relationships.</t>
</section>
</section>
</section>
<section title="Multiple RTP Sessions over one Media Transport">
<t><xref target="I-D.westerlund-avtcore-transport-multiplexing"/>
describes a mechanism that allows several RTP Sessions to be carried
over a single underlying Media Transport. The main reasons for doing
this relate to the impact of using one or more Media Transports:
whether a common network path or potentially different ones are
used, a reduced need for NAT/FW traversal resources, and no need
for flow-based QoS.</t>
<t>However, Multiple RTP Sessions over one Media Transport makes it
clear that a single Media Transport 5-tuple is not sufficient to
express in which RTP Session context a particular Packet Stream
exists. Complexities in the relationship between Media Transports
and RTP Sessions already exist, as one RTP Session contains multiple
Media Transports; e.g., even a Peer-to-Peer RTP Session with
RTP/RTCP Multiplexing requires two Media Transports, one in each
direction. The relationship between Media Transports and RTP
Sessions, as well as additional levels of identifiers, needs to be
considered in both signalling design and when defining
terminology.</t>
</section>
</section>
<section anchor="relationships" title="Topologies and Contexts">
<t>WARNING: This section has not yet been updated to reflect the changes
in the previous sections (1-3). This is planned to be addressed in the
next version! </t>
<t>This section provides various relationships that can co-exist between
the aforementioned concepts in a given RTP usage. Using Unified Modeling
Language (UML) class diagrams <xref target="UML"/>, <xref
target="fig-media-source"/> below depicts general relations between a
Media Source, its Media Provider(s) and the resulting Media
Stream(s).</t>
<t><list style="empty">
<t>Note: The RTCP Stream related to the RTP Stream is not shown in
the figure.</t>
</list></t>
<figure align="center" anchor="fig-media-source"
title="Media Source Relations">
<artwork align="center"><![CDATA[+--------------+ <<uses>> +-------------------------+
| Media Source |- - - - - ->| Synchronization Context |
+--------------+ +-------------------------+
< > 1..*
|
| 0..*
+-------------+
| |<>-+ 0..*
| Media | |
| Encoder | |
| |---+ 0..*
+-------------+
< > 1
|
| 0..*
+---------------+ 0..* 1 +-------------+
| Packet Stream |----------<>| RTP Session |
+---------------+ +-------------+
]]></artwork>
</figure>
<t>Media Sources can have a large variety of relationships among
them. These relationships can apply both between sources within a
single RTP Session and between Media Sources that occur in multiple
RTP Sessions. Ways of relating them typically involve groups: a set
of Media Sources has some relationship that applies to all those in
the group, and no others. (Relationships that involve arbitrary
non-grouping associations among Media Sources, such that, e.g., A
relates to B and B to C, but A and C are unrelated, are uncommon if
not nonexistent.) In many cases, the semantics of groups are not
simply that the members form an undifferentiated group, but rather
that members of the group have certain roles.</t>
<section anchor="equivalence" title="Equivalence Context">
<t>In this relationship, different instances of a concept are
treated as equivalent for the purposes of relating them to the
Media Source.</t>
<t><xref target="fig-rtp-stream"/> below depicts in UML notation the
general relation between a Media Source and its Media Stream(s),
including the Packet Stream specializations Source Packet Stream and
Redundancy Packet Stream.</t>
<figure align="center" anchor="fig-rtp-stream"
title="Media Stream Relations">
<artwork align="center"><![CDATA[ +---------------+
| |<>-+ 0..*
| Media | |
| Encoder | |
| |---+ 0..*
+---------------+
< > 1
|
| 0..*
+---------------+ 0..* 1 +-----------------+
| Packet Stream |<>-------| Media Transport |
+---------------+ +-----------------+
/\ /\
+--+ +--+
| |
+-------+ +-------+
| |
+---------------+ +---------------+ 1
| Source |<>---------| Redundancy |<>-+
| Packet Stream | 1..* 0..* | Packet Stream |---+
+---------------+ +---------------+ 0..*
]]></artwork>
</figure>
<t>This relation can, in combination with <xref
target="fig-media-source"/>, be used to achieve a set of
functionalities, described below.</t>
<section anchor="fid" title="SDP FID Semantics">
<t>RFC5888 <xref target="RFC5888"/> defines an m=line grouping
mechanism called "FID" for establishing the equivalence of the Media
Streams described by the grouped m=lines.</t>
<t>RFC5576 <xref target="RFC5576"/> extends the above mechanism to the
case where multiple Media Sources are described by a single
m=line.</t>
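<t>As an illustrative sketch (the mid values, ports, payload types,
and SSRC values below are hypothetical), an SDP fragment combining
both mechanisms could look as follows: an "a=group:FID" line relates
two m=lines per RFC5888, and an "a=ssrc-group:FID" line relates two
sources within a single m=line per RFC5576, here pairing a source
with its retransmission stream:</t>
<figure align="center">
<artwork><![CDATA[a=group:FID 1 2
m=audio 30000 RTP/AVP 0
a=mid:1
m=audio 30002 RTP/AVP 0
a=mid:2
m=video 30004 RTP/AVPF 96 97
a=mid:3
a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=ssrc-group:FID 11111 22222
]]></artwork>
</figure>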
</section>
</section>
<section title="Session Context">
<t>There are different ways to construct a Communication Session. The
general relation in UML notation between a Communication Session,
Participants, Multimedia Sessions and RTP Sessions is outlined
below.</t>
<figure align="center" anchor="fig-sessions" title="Session Relations">
<artwork align="center"><![CDATA[ +---------------+
| Communication |
| Session |
+---------------+
0..* < > < > 1..*
| |
+----------+ +--------+
1..* | | 1..*
+-------------+ 1 0..* +--------------------+
| Participant |<>----------| Multimedia Session |
+-------------+ +--------------------+
< > 1 < > 1
| | 0..*
| +-------------+
| | RTP Session |
| +-------------+
| < > 1
| 0..* | 0..*
+-----------------+ 1 0..* +--------------+
| Media Transport |--------<>| Packet Stream|
+-----------------+ +--------------+
]]></artwork>
</figure>
<t>Several different flavors of Session are possible. A few typical
examples are listed in the sub-sections below, but many others can be
constructed.</t>
<section title="Point-to-Point Session">
<t>In this example, a single Multimedia Session is shared between
the two Participants. That Multimedia Session contains a single RTP
Session with two Media Streams from each Participant. Each
Participant has only a single Media Transport, carrying those Media
Streams, which is the main reason why there is only a single RTP
Session.</t>
<figure align="center" anchor="fig-point-to-point"
title=" Example Point-to-Point Session">
<artwork><![CDATA[ +-----------------+
| Point-to-Point |
| Session |
+-----------------+
< > < > < >
| | |
+-------------------+ | +-------------------+
| | |
+-------------+ +--------------------+ +-------------+
| Participant |<>---| Multimedia Session |----<>| Participant |
+-------------+ +--------------------+ +-------------+
< > < > < >
| | |
| +-----------+ +-----------+ +-----------+ |
| | Media |---<>| RTP |<>---| Media | |
| | Stream | | Session | | Stream | |
| +-----------+ +-----------+ +-----------+ |
| < > < > < > < > |
| | | | | |
+-----------+ +-----------+ +-----------+ +-----------+
| Media |----<>| Media | | Media |<>----| Media |
| Transport | | Stream | | Stream | | Transport |
+-----------+ +-----------+ +-----------+ +-----------+
]]></artwork>
</figure>
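<t>As a hedged sketch, one way a Participant could signal such a
session in SDP is with the BUNDLE mechanism <xref
target="I-D.ietf-mmusic-sdp-bundle-negotiation"/>, placing both of its
Media Streams on a single Media Transport and thus in a single RTP
Session (the origin, addresses, ports, and payload types below are
hypothetical):</t>
<figure align="center">
<artwork><![CDATA[v=0
o=alice 2890844526 2890844526 IN IP4 198.51.100.1
s=-
t=0 0
a=group:BUNDLE audio video
m=audio 10000 RTP/AVP 0
c=IN IP4 198.51.100.1
a=mid:audio
m=video 10000 RTP/AVP 31
c=IN IP4 198.51.100.1
a=mid:video
]]></artwork>
</figure>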
</section>
<section title="Full Mesh Session">
<t>In this example, the Full Mesh Session has three Participants,
each of which has the same characteristics as the example in the
previous section; a single Media Transport per peer Participant,
resulting in a single RTP session between each pair of
Participants.</t>
<figure align="center" anchor="fig-full-mesh"
title="Example Full Mesh Session">
<artwork><![CDATA[+-----------+ +-------------+ +-----------+
| Media |------------<>| Participant |<>------------| Media |
| Transport | +-------------+ | Transport |
+-----------+ | +-----------+
| | +----------+ | +----------+ | |
< > < > |Multimedia| | |Multimedia| < > < >
+--------++--------+ | Session | | | Session | +--------++--------+
| Media || Media | +----------+ | +----------+ | Media || Media |
| Stream || Stream | < > | | | < > | Stream || Stream |
+--------++--------+ | | | | | +--------++--------+
| | | | | | | | |
| < > | < > < > < > | < > |
| +---------+ +--------------+ +---------+ |
+-------<>| RTP | | Full Mesh | | RTP |<>------+
+-------<>| Session | | Session | | Session |<>------+
| +---------+ +--------------+ +---------+ |
| < > < > < > < > < > |
| | | | | | |
+--------++--------+ | | | +--------++--------+
| Media || Media | | | | | Media || Media |
| Stream || Stream | | | | | Stream || Stream |
+--------++--------+ | | | +--------++--------+
< > < > | | | < > < >
| | | | | | |
+-----------+ | | | +-----------+
| Media | | | | | Media |
| Transport | | | | | Transport |
+-----------+ +-------------+ | +-------------+ +-----------+
| | |
+-------------+ +--------------------+ +-------------+
| Participant |<>------| Multimedia Session |-------<>| Participant |
+-------------+ +--------------------+ +-------------+
< > < > < >
| | |
| +--------+ +---------+ +--------+ |
| | Media |---------<>| RTP |<>---------| Media | |
| | Stream | | Session | | Stream | |
| +--------+ +---------+ +--------+ |
| < > < > < > < > |
| | | | | |
+-----------+ +--------+ +--------+ +-----------+
| Media |---------<>| Media | | Media |<>---------| Media |
| Transport | | Stream | | Stream | | Transport |
+-----------+ +--------+ +--------+ +-----------+
]]></artwork>
</figure>
</section>
<section title="Centralized Conference Session">
<t>Text to be provided</t>
<figure align="center" anchor="fig-central-conf"
title="Example Centralized Conference Session">
<artwork><![CDATA[TBD]]></artwork>
</figure>
</section>
</section>
</section>
<section anchor="security" title="Security Considerations">
<t>This document attempts to clarify the confusion prevalent in RTP
taxonomy that results from inconsistent usage by the many technologies
and protocols making use of RTP. It does not introduce any new
security considerations beyond those already well documented in the RTP
protocol <xref target="RFC3550"/> and in each of the respective
specifications of the various protocols making use of it.</t>
<t>A well-defined common terminology and a shared understanding of the
complexities of the RTP architecture will hopefully lead to better
standards that avoid security problems.</t>
</section>
<section title="Acknowledgements">
<t>This document borrows many concepts from several documents, such as
the WebRTC overview <xref target="I-D.ietf-rtcweb-overview"/>, the
CLUE framework <xref target="I-D.ietf-clue-framework"/>, and the
Multiplexing Architecture <xref
target="I-D.westerlund-avtcore-transport-multiplexing"/>. The authors
would like to thank the authors of each of those documents.</t>
<t>The authors would also like to acknowledge the insights, guidance and
contributions of Magnus Westerlund, Roni Even, Colin Perkins, Keith
Drage, and Harald Alvestrand.</t>
</section>
<section title="Contributors">
<t>Magnus Westerlund contributed the conceptual model for the media
chain using the transformations and streams model, including rewriting
pre-existing concepts into this model and adding missing concepts. He
also contributed the rewritten description of the relationships.</t>
</section>
<section title="Open Issues">
<t>Much of the terminology is still a matter of dispute.</t>
<t>It might be useful to distinguish between a single endpoint's view
of a Media Source, RTP Session, or Multimedia Session, and the full
set of sessions and every endpoint communicating in them, together
with the signaling that established them.</t>
<t>(Sure to be many more...)</t>
</section>
<section anchor="iana" title="IANA Considerations">
<t>This document makes no request of IANA.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.3550"?>
<reference anchor="UML">
<front>
<title>OMG Unified Modeling Language (OMG UML), Superstructure,
V2.2</title>
<author>
<organization abbrev="OMG">Object Management Group</organization>
</author>
<date month="February" year="2009"/>
</front>
<seriesInfo name="OMG" value="formal/2009-02-02"/>
<format target="http://www.omg.org/spec/UML/2.2/Superstructure/PDF/"
type="PDF"/>
</reference>
</references>
<references title="Informative References">
<?rfc include='reference.RFC.2198'?>
<?rfc include="reference.RFC.3264"?>
<?rfc include='reference.RFC.3551'?>
<?rfc include="reference.RFC.4566"?>
<?rfc include='reference.RFC.4588'?>
<?rfc include='reference.RFC.4867'?>
<?rfc include='reference.RFC.5109'?>
<?rfc include='reference.RFC.5404'?>
<?rfc include="reference.RFC.5576"?>
<?rfc include="reference.RFC.5888"?>
<?rfc include="reference.RFC.5905"?>
<?rfc include='reference.RFC.6190'?>
<?rfc include="reference.RFC.6222"?>
<?rfc include="reference.I-D.ietf-clue-framework"?>
<?rfc include="reference.I-D.ietf-rtcweb-overview"?>
<?rfc include="reference.I-D.ietf-mmusic-sdp-bundle-negotiation"?>
<?rfc include="reference.I-D.ietf-avtcore-clksrc"?>
<?rfc include="reference.I-D.westerlund-avtcore-transport-multiplexing"?>
</references>
<section title="Changes From Earlier Versions">
<t>NOTE TO RFC EDITOR: Please remove this section prior to
publication.</t>
<section title="Changes From Draft -00">
<t><list style="symbols">
<t>Too many to list</t>
<t>Added new authors</t>
<t>Updated content organization and presentation</t>
</list></t>
</section>
<section title="Changes From Draft -01">
<t><list style="symbols">
<t>Section 2 rewritten to add both streams and transformations in
the media chain.</t>
<t>Section 3 rewritten to focus on exposing relationships.</t>
</list></t>
</section>
</section>
</back>
</rfc>