<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-rtp-simulcast-02"
     ipr="trust200902" submissionType="IETF">
  <front>
    <title abbrev="RTP Simulcast">Using Simulcast in RTP Sessions</title>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 82 87</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <author fullname="Bo Burman" initials="B." surname="Burman">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 13 11</phone>

        <email>bo.burman@ericsson.com</email>
      </address>
    </author>

    <author fullname="Morgan Lindqvist" initials="M." surname="Lindqvist">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>Kista</city>

          <region/>

          <code>SE-164 80</code>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 719 00 00</phone>

        <facsimile/>

        <email>morgan.lindqvist@ericsson.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Fredrik Jansson" initials="F." surname="Jansson">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>Kista</city>

          <region/>

          <code>SE-164 80</code>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 719 00 00</phone>

        <facsimile/>

        <email>fredrik.k.jansson@ericsson.com</email>

        <uri/>
      </address>
    </author>

    <date day="25" month="February" year="2013"/>

    <abstract>
      <t>In some applications it may be necessary to send multiple media
      encodings derived from the same media source in independent RTP media
      streams. This is called Simulcast. This document discusses the best way
      of accomplishing this in RTP and how to signal it in SDP. It is
      concluded that a solution where the different simulcast versions are
      based on separate SDP media descriptions provides best support for
      simulcast. A solution is defined by making two extensions to SDP. The
      first extension consists of two new attributes in SDP that express
      capability to send or receive simulcast streams, respectively. The
      second extension describes how to group media descriptions belonging to
      the same simulcast source by using the grouping framework.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Simulcast is the act of simultaneously sending multiple different
      versions of the same media content, e.g. the same video source encoded
      with different video encoders or target resolutions. This can be done in
      several ways and for different purposes. This document focuses on the
      case where one wants to provide multiple streams with different
      encodings over <xref target="RFC3550">RTP</xref> towards an intermediary
      so that the intermediary can select which encoding to forward to other
      participants in the session, and more specifically how the grouping of
      the streams is defined. From an RTP perspective, simulcast is a specific
      application of the aspects discussed in <xref
      target="I-D.westerlund-avtcore-multiplex-architecture">RTP Multiplexing
      Architecture</xref>.</t>

      <t>The different encodings of a media content that are considered in
      this document can differ in:</t>

      <t><list style="hanging">
          <t hangText="Bit-rate:">The encodings differ in the number of bits
          spent to encode the media, and thus in the resulting quality.</t>

          <t hangText="Codec:">Different media codecs are used to ensure that
          different receivers that do not have a common set of decoders can
          decode at least one of the versions. This can include codec
          configuration options that are not compatible, like video encoder
          profiles, or the capability of receiving the transport
          packetization.</t>

          <t hangText="Sampling:">Different sampling of media, in spatial as
          well as in temporal domain, may be used to suit different rendering
          capabilities or needs at the receiving endpoints, as well as a
          method to achieve different bit-rates. For video streams, spatial
          sampling affects image resolution and temporal sampling affects
          video frame rate. For audio, spatial sampling relates to the number
          of audio channels and temporal sampling affects audio bandwidth.
          A difference in sampling may in turn result in a difference in
          bit-rate.</t>
        </list>There are different reasons for an application to provide
      multiple different encodings of a single media source. As soon as an
      application has the need to send multiple encodings, there is a
      potential need for simulcast. This need can arise even when using media
      codecs that have scalability features built in. The purpose of this
      document is to describe a few scenarios where the use of simulcast is
      motivated, elaborate on possible alternatives and available mechanisms,
      and find a suitable solution for signaling and performing RTP simulcast.
      The discussion results in a signaling proposal to support simulcast.</t>
    </section>

    <section title="Definitions">
      <t/>

      <section title="Terminology">
        <t>The following terms and abbreviations are used in this
        document:<list style="hanging">
            <t hangText="Encoding:">A particular encoding is the choice of the
            media encoder (codec) that has been used to compress the media and
            the fidelity of that encoding through the choice of sampling,
            bit-rate and other codec configuration parameters.</t>

            <t hangText="Different encodings:">An encoding is different when
            some parameter that characterizes the encoding of a particular
            media source is changed. Such a change can affect one or more of
            the following parameters: codec, codec configuration, bit-rate,
            sampling.</t>

            <t hangText="Simulcast versions:">Media streams used for simulcast
            that use different encodings and thus constitute different
            versions of the same media source.</t>
          </list></t>
      </section>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>
    </section>

    <section title="Simulcast Scenarios">
      <t>This section discusses different usage scenarios for the term
      simulcast and clarifies which of those this document focuses on. It also
      reviews why simulcast and scalable codecs can be a useful
      combination.</t>

      <section title="Simulcasting to RTP Mixer">
        <t>This scenario relates to a multi-party session where one or more
        central nodes are used to facilitate the media transport between the
        session participants. Thus, this targets the RTP Mixer Topology
        defined in <xref target="RFC5117"/> (Section 3.4: Topo-Mixer). This
        scenario is targeted for further discussion in this document.</t>

        <t>Simulcasting different media encodings of video that differ both in
        resolution and in bit-rate is highly applicable to video conferencing
        scenarios. For example, an RTP mixer selects the video of the most
        active speaker and sends that participant's video stream as a high
        resolution stream to the other participants, and in addition also
        sends a number of low resolution video streams of the other
        participants, enabling the receiving user to both display the current
        speaker in high quality and monitor the other participants in lower
        quality/resolution/size. As participants should not receive the
        stream showing themselves, the set of streams will be unique to each
        participant.</t>

        <t>A number of alternatives exist to provide both high and low
        resolutions from an RTP Mixer:<list style="hanging">
            <t hangText="Simulcast:">The clients send one stream for the low
            resolution and another for the high resolution to the RTP
            Mixer.</t>

            <t hangText="Scalable Video Coding:">The clients send one stream
            to the RTP Mixer, using a video encoder that in this stream can
            provide both the high resolution and also enables the mixer to
            extract a low resolution representation from that single
            stream.</t>

            <t hangText="Transcoding in the Mixer:">The clients send a high
            resolution stream to the RTP Mixer which performs a transcoding to
            a lower resolution stream.</t>
          </list>The Transcoding alternative requires that the RTP mixer has
        a sufficient amount of transcoding resources to produce the number of
        low resolution streams required. In the worst case, all participants'
        streams may need to be transcoded. If the resources are not available,
        a different solution is needed. There will also normally be a quality
        loss and an increase in latency associated with the transcoding
        operation.</t>

        <t>Scalable video encoding requires a more complex encoder compared to
        non-scalable encoding. Also, if the resolution difference between the
        streams is large, a scalable codec may in fact be only marginally more
        bandwidth efficient than the simulcast case where the different
        resolutions are sent as separate streams from the clients to the
        mixer. At the same time, with scalable video encoding using the
        currently available scalable video codecs, the transmission of all but
        the lowest resolution will consume more bandwidth from the mixer to
        the other participants compared to a non-scalable encoding.</t>

        <t>Simulcasting has the benefit that it is conceptually simple. It
        enables the use of any media codec that the participants agree on,
        allowing the RTP mixer to be codec-agnostic.</t>

        <figure align="center" anchor="fig-mixer-forwarding"
                title="RTP Mixer selecting from simulcast versions">
          <artwork><![CDATA[
           +------------+      +---+
+---+      |            |----->| B |
|   |=====>|            |      +---+
| A |      |   Mixer    |
|   |----->|            |      +---+
+---+      |            |=====>| C |
           +------------+      +---+
]]></artwork>
        </figure>

        <t>The sender A provides the mixer with both a high resolution version
        "===>" and a low resolution version "--->". The mixer selects
        which of its receivers should get a particular version.</t>

        <section title="Simulcast Combined with Scalable Encoding">
          <t>As explained in the previous section, a scalable codec is not
          always more bandwidth efficient than simulcast, especially in the
          path from the mixer to the receiver.</t>

          <t>There are however cases where a combination of simulcast and
          scalable encoding can be beneficial. By using simulcast in cases
          where the scalable codec is less efficient, it is possible to
          optimize the efficiency of the complete system. A good example of
          this usage would be where the video is encoded using <xref
          target="RFC6190">SVC transported in RTP</xref>, where each simulcast
          stream has a different resolution, and each SVC media stream uses
          temporal scalability and signal to noise ratio (SNR) scalability
          within that single media stream. If only resolution and temporal
          variations are needed, this can be implemented using the
          non-scalable part of H.264, as each simulcast version provides the
          different resolution, and each media stream within a simulcast
          encoding has temporal scalability through the use of non-reference
          frames.</t>
        </section>
      </section>

      <section title="Multicast Transported Simulcasted Media">
        <t>Using multicast, particularly <xref
        target="RFC3569">Source-Specific Multicast (SSM)</xref>, to distribute
        RTP/RTCP packets to a large receiver population raises some issues.
        There are at least two such issues where simulcast can
        potentially be useful.</t>

        <section title="Diversity in Receiver Population">
          <t>If there is any diversity in the receivers regarding e.g.
          capability, codec support or code base, there are potentially
          restrictions in what streams can be delivered to the receivers. If
          using the lowest common denominator over a diverse receiver
          population isn't acceptable, simulcast can be one possible solution.
          By offering different stream alternatives, it is possible to let the
          receivers choose the simulcast version that matches their
          capabilities. By using explicit signalling for simulcast, it is not
          necessary for the stream distributor to handle multiple receiver
          configurations individually for a multi-media session, nor to ensure
          that each receiver gets an encoding that matches their
          capabilities.</t>

          <t>The simulcast version granularity the receivers can select will
          be on multicast group level. Thus, this use case puts a strict
          requirement on supporting separation through different RTP sessions.
          The reason is that having a single RTP session straddle several
          multicast groups makes any reporting on the received sources very
          difficult to interpret. Using one RTP session per simulcast version
          instead provides consistency.</t>
        </section>

        <section title="Bit-rate Adaptation">
          <t>If the network paths from the media sender to the receivers can
          support different bit-rates, there is a need to support media
          streams encoded to different bit-rates. If these path differences
          are of a more static nature, for example depending primarily on the
          underlying link layers, using simulcast has an advantage over
          scalable encoding. The reason is that the efficiency of scalable
          coding will never be better than encoding to a single target rate.
          When the receiver can determine current network interface
          connectivity, it can choose a simulcast version with certainty. That
          choice will also remain correct until another network
          interface becomes the active one. This assumes that the multicast
          transmission uses dedicated resources and will thus not be congested
          due to other network traffic. To support this behavior, the
          signalling must support indication of which media streams are
          alternatives to each other, and it is also necessary to be able to
          determine the aggregate bit-rate for the selected multicast group(s)
          and compare it with the available network properties.</t>

          <t>Simulcast can also be used in more dynamic situations
          where each receiver continuously gathers reception statistics to
          detect path congestion and, based on that, may change which version
          to receive. The main issue with such usage is how to switch
          from one version to another with minimal playback interruption and
          without putting extra load on the network during the actual
          switch. Here, scalable encoding in general has better
          characteristics since scalability layers are typically
          synchronized.</t>

          <t>When comparing simulcast and scalable encoding, the trade-offs
          are different and the down-sides occur at different places.
          Simulcast will have a higher bit-rate load at a media sender and
          that will also be the case for any network path shared between
          receivers of multiple simulcast versions. However, for parts of the
          network path where there is only a single simulcast version, the
          achievable quality at a given bit-rate will be slightly higher for
          simulcast. It will also be more difficult to seamlessly switch
          between simulcast versions than between different scalable
          encodings, as simulcast actually switches from one media stream
          version to another instead of adding or removing some enhancement
          layers.</t>
        </section>
      </section>

      <section title="Same Encoding to Multiple Destinations">
        <t>One interpretation of simulcast is when one encoding is sent to
        multiple receivers. This is well supported in RTP by simply copying
        all outgoing RTP and RTCP traffic to several transport destinations,
        if the intention is to create a common RTP session. As long as all
        participants do the same, a full mesh is constructed and everyone in
        the multi-party session has a similar view of the joint RTP session.
        This is analogous to an Any Source Multicast (ASM) session but without
        the traffic optimization, as multiple copies of the same content are
        likely to have to pass over the same link.</t>

        <figure align="center" anchor="fig-full-mesh"
                title="Full Mesh / Multi-unicast">
          <artwork><![CDATA[
+---+      +---+
| A |<---->| B |
+---+      +---+
  ^         ^
   \       /
    \     /
     v   v
     +---+
     | C |
     +---+
]]></artwork>
        </figure>

        <t>As this type of simulcast is analogous to ASM usage and RTP has good
        support for ASM sessions, no further consideration is made in this
        document for this scenario.</t>
      </section>

      <section title="Different Encoding to Independent Destinations">
        <t>Another alternative interpretation of simulcast includes multiple
        destinations, where each destination gets a specifically tailored
        version, but where the destinations are independent. A typical example
        for this would be a streaming server distributing the same live
        session to a number of receivers, adapting the quality and resolution
        of the multi-media session to each receiver's capability and available
        bit-rate. This case can be solved in RTP by having independent RTP
        sessions between the sender and the receivers. Thus this case is not
        considered further.</t>
      </section>
    </section>

    <section title="Network Aspects">
      <t>The network aspects that are relevant for simulcast are:<list
          style="hanging">
          <t hangText="Quality of Service:">When using simulcast it might be
          of interest to prioritize a particular simulcast version, rather
          than applying equal treatment to all versions. For example, lower
          bit-rate versions may be prioritized over higher bit-rate versions
          to minimize congestion or packet losses in the low bit-rate
          versions. Thus, there is a benefit to using a simulcast solution that
          supports QoS as well as possible. By separating simulcast versions
          into different RTP sessions and sending those RTP sessions over
          different transport flows, a simulcast version can be prioritized by
          existing flow based QoS mechanisms. When using unicast, QoS
          mechanisms based on individual packet marking are also feasible,
          which do not require separation of simulcast versions into different
          RTP sessions to apply different QoS.</t>

          <t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
          incur more cost for NAT/FW traversal unless they can re-use the same
          transport flow, which can be achieved by either one of <xref
          target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
          multiple RTP sessions on a single lower layer transport</xref> or
          <xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">Multiplexing
          Negotiation Using SDP Port Numbers</xref>. If flow based QoS with
          any differentiation is desirable, the cost of additional transport
          flows is likely unavoidable.</t>

          <t hangText="Multicast:">Multiple RTP sessions will be required to
          enable combining simulcast with multicast. Different simulcast
          versions have to be separated to different multicast groups to allow
          a multicast receiver to pick the version it wants, rather than
          receive all of them. In this case, the only reasonable
          implementation is to use different RTP sessions for each multicast
          group so that reporting and other RTCP functions operate as
          intended.</t>
        </list></t>

      <t/>
    </section>

    <section title="Simulcast Alternatives">
      <t>Simulcast is in this document defined as the act of sending multiple
      alternative encodings of the same underlying media source. When
      transmitting multiple independent streams that originate from the same
      source, it could potentially be done in several different ways using
      RTP. A general discussion of the considerations for using the different
      RTP multiplexing alternatives can be found in <xref
      target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
      using the Multiplexing Features of RTP</xref>. Discussion and
      clarification on how to handle multiple streams in an RTP session can be
      found in <xref target="I-D.lennox-avtcore-rtp-multi-stream"/>.</t>

      <t>The below sub-sections briefly describe potential ways of achieving
      RTP media stream multiplexing and identification of which streams are
      alternative simulcast encodings of the same source. The descriptions
      also cover how each approach interacts with multiple sources (SSRCs)
      that are present in the same RTP session for reasons other than
      simulcast. Multiple SSRCs may occur for various reasons such as multiple
      participants in multipoint topologies like multicast, transport relays
      or full mesh transport simulcasting, multiple source devices such as
      multiple cameras or microphones at one end-point, or other RTP
      mechanisms such as <xref target="RFC4588">RTP Retransmission</xref>.</t>

      <section title="Using the Payload Type">
        <t>An alternative could be to use only the RTP payload type to
        identify the different simulcast streams. This could be tempting,
        since simulcast streams may differ in codec, codec configuration, or
        sampling, all of which are typically specified in SDP by a format
        number on the media line that is in turn connected to an RTP Payload
        Type. Thus all simulcast streams would be sent in the same RTP session
        using only a single SSRC per actual media source. However, as
        discussed in <xref
        target="I-D.westerlund-avtcore-multiplex-architecture">Guidelines for
        using the Multiplexing Features of RTP</xref>, using Payload Type
        Multiplexing does not generally work and is hereby dismissed as a
        potential solution.</t>
      </section>

      <section anchor="sec-single-rtp" title="Using Single RTP session">
        <t>This idea is based on using a unique SSRC for each alternative
        encoding of an actual media source within a single RTP session. The
        identification of streams and how they are specified to be related
        alternatives needs an additional mechanism, for example using <xref
        target="RFC5576">SSRC grouping</xref>, and potentially also a new SDES
        item such as SRCNAME proposed in <xref
        target="I-D.westerlund-avtext-rtcp-sdes-srcname"/> with semantics
        that indicate them as alternatives of a particular media source. When
        there are multiple actual media sources in a session, each media
        source will have to use a number of SSRCs to represent the different
        simulcast alternatives it produces. For example, if there are n
        media sources and they all produce the same number of
        simulcast versions, m, there will be n*m SSRCs in use in the RTP
        session. Each SSRC can use any of the configured payload types for
        this RTP session. All session level attributes and parameters that are
        not source specific will apply and must function with all the
        alternative encodings in use.</t>

        <t>In the currently used signaling system based on <xref
        target="RFC4566">SDP</xref> and <xref
        target="RFC3264">Offer/Answer</xref>, the properties of media streams
        are typically negotiated on media block (m-line) level. Sending
        simulcast alternatives as different SSRCs belonging to the same media
        description is likely possible to achieve, but SSRC-centric signaling
        providing the needed media stream properties is currently almost
        non-existent and it would require a considerable effort to make the
        necessary SDP extensions.</t>

        <t>A single RTP session can be described in SDP by more than a single
        m-line, like for <xref
        target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>, and it
        can re-use the same <xref target="RFC5888">m-line grouping</xref> as
        would be used for <xref target="sec-multi-rtp">multiple RTP
        sessions</xref>, but the RTP aspects described in this section will
        still apply. This would enable the same signalling expressiveness for
        multiple RTP sessions as for a single RTP session. <!--BoB: Add figure(s) that shows distribution of simulcast versions to SSRC groups and m-lines?--></t>
      </section>

      <section anchor="sec-multi-rtp" title="Using Multiple RTP sessions">
        <t>Using multiple RTP sessions means that each different simulcast
        version of an actual media source is transmitted in a separate RTP
        session, using whatever session identifier is available to distinguish
        the different versions. Since each RTP session is described by one or
        more SDP m-lines, this solution needs explicit <xref
        target="RFC5888">m-line grouping</xref> with semantics that indicate
        them as simulcast alternatives. It is also important to identify the
        SSRCs in the different sessions that are alternative encodings of the
        same media source, if there is more than one media source in
        each RTP session. This could be accomplished using the same SSRC
        across the sessions, but that is not robust against SSRC collisions
        and could potentially force cascading SSRC changes between sessions. A
        better choice would be to use different SSRCs, but relate streams
        through a new SDES item proposed in <xref
        target="I-D.westerlund-avtext-rtcp-sdes-srcname"/>. Each RTP session
        will have its own set of configured RTP payload types available for
        use with any SSRC in that session. In addition, all other attributes
        for sessions or sources can be used as normal to indicate the
        configuration of that particular alternative.<!--BoB: Add figure(s) that shows distribution of media sources and versions to sessions and m-lines--></t>
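
        <t>As a rough, non-normative sketch of this arrangement, the SDP
        fragment below describes two simulcast versions of one video source
        as separate media descriptions tied together with <xref
        target="RFC5888">m-line grouping</xref>. The semantics token
        "EXAMPLE-SIM", the ports, the payload types, the resolutions and the
        mid values are placeholders invented for this illustration; they are
        not values defined by this document.</t>

        <figure>
          <artwork><![CDATA[
a=group:EXAMPLE-SIM 1 2
m=video 49300 RTP/AVP 96
a=rtpmap:96 H264/90000
a=imageattr:96 send [x=1280,y=720]
a=mid:1
m=video 49302 RTP/AVP 97
a=rtpmap:97 H264/90000
a=imageattr:97 send [x=320,y=180]
a=mid:2
]]></artwork>
        </figure>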
      </section>

      <section title="Conclusions">
        <t>If it is at all desirable to support simulcast based on multicast,
        the solution must support using multiple RTP sessions. The main reason
        is that receiver based selection of simulcast version must be
        possible, which is accomplished in multicast through receiver
        selection of which multicast group(s) it joins. This also has the
        advantage of being able to use the existing SDP media description (m=)
        expressiveness to signal or negotiate simulcast versions.</t>

        <t>When using simulcast based on unicast, it is desirable to be able
        to use the same media description signalling expressiveness regardless
        of whether multiple RTP sessions are used. Assuming that MMUSIC decides
        to enable single RTP media stream negotiation per SDP media
        description and combine that with BUNDLE to identify RTP sessions, it
        appears that using one or more RTP sessions for simulcast over unicast
        will be able to use the same signalling solution. Thus the decision to
        use one or more RTP sessions can be taken based on other limitations,
        such as cost of NAT/FW traversal, need for flow-based QoS etc.</t>

        <t>A solution proposal for an SDP media description level signaling
        for Simulcast version parameters is outlined below.</t>
      </section>
    </section>

    <section title="Simulcast Signaling Proposal">
      <t>Signaling simulcast is about negotiating between media sender and
      receiver what the different simulcast versions should be, how to
      identify them in terms of RTP streams, and how to relate those RTP
      streams.</t>

      <t>The proposed solution consists of:<list style="symbols">
          <t>Signaling simulcast capability as SDP media level attributes in a
          first round of Offer/Answer<list style="symbols">
              <t>Separate send and receive simulcast capabilities</t>

              <t>Media properties that are supported as base for different
              simulcast versions are listed as parameters</t>
            </list></t>

          <t>Adding SDP media descriptions for the simulcast streams in a
          second round of Offer/Answer<list style="symbols">
              <t>Grouping SDP media descriptions from the same media source,
              belonging to the same simulcast, using the <xref
              target="RFC5888">SDP grouping framework</xref></t>

              <t>Separate send and receive simulcast groupings</t>

              <t>Negotiating parameters for simulcast version using regular,
              individual SDP media descriptions</t>

              <t>Identifying RTP media streams (SSRC) from same media source
              using new SDES Item <xref
              target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref></t>
            </list></t>
        </list></t>

      <t>This is further outlined below.</t>

      <section title="Simulcast Capability">
        <t>There are numerous media properties that can be varied to construct
        a set of simulcast versions. A simulcast-enabled endpoint could also
        support simulcast based on several of those properties. As long as
        those properties are relatively independent, and if each simulcast
        version needs an explicit definition (an m-line) in the SDP, this would
        lead to an exponential number of simulcast version candidates and a
        very long SDP that is likely also hard to interpret. There is thus a
        need to limit the simulcast version candidates included in the SDP to
        cover as small a set of properties as possible.</t>
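
        <t>As a rough, non-normative illustration of that growth, the
        following Python sketch enumerates the candidate versions that arise
        when every combination of independently varied properties needs its
        own media description. The property names are the ones proposed in
        this document; the alternative values listed per property are
        hypothetical and chosen only for the illustration.</t>

        <figure>
          <artwork><![CDATA[
```python
from itertools import product

# Hypothetical alternatives per media property (illustration only).
alternatives = {
    "rtpmap": ["G719/48000", "PCMA/8000"],
    "ptime": ["20", "40"],
    "crypto": ["AES_CM_128_HMAC_SHA1_80", "AES_CM_128_HMAC_SHA1_32"],
}


def candidate_versions(props):
    """Return one dict per combination of property values."""
    names = sorted(props)
    return [dict(zip(names, combo))
            for combo in product(*(props[name] for name in names))]


versions = candidate_versions(alternatives)
# Three properties with two alternatives each give 2 * 2 * 2 candidates.
print(len(versions))
```
]]></artwork>
        </figure>

        <t>Each additional independent property multiplies the number of
        candidate media descriptions, which is why the capability attributes
        proposed here list properties rather than enumerating versions.</t>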

        <t>If a legacy endpoint not supporting simulcast were to be presented
        with an SDP including media descriptions for a set of simulcast
        versions, it may not know how to correctly handle or interpret these
        "surplus" media descriptions.</t>

        <t>Based on the functionality that simulcast is intended to achieve,
        it should be clear that the reasons to send simulcast versions are not
        the same as to receive simulcast versions, seen from a single
        endpoint.</t>

        <t>For these reasons, it is proposed to define two new SDP media level
        attributes, "a=sim-send" and "a=sim-recv", which explicitly signal
        support for simulcast media transmission and simulcast media
        reception, respectively, for that media description. "a=sim-send" and
        "a=sim-recv" MAY be used independently and simultaneously. These
        attributes are also proposed to have parameters indicating the media
        properties used to create the simulcast versions. The meaning of the
        attributes on SDP session level is undefined, and they MUST NOT be
        used there.</t>

        <figure anchor="fig-abnf" title="ABNF for Simulcast">
          <artwork><![CDATA[simulcast   = "a="( "sim-send:" / "sim-recv:" ) prop-list
prop-list   = prop-entry *(WSP prop-entry)
prop-entry  = prop [ "=" q-value ]
prop        = "rtpmap"
            / "fmtp"
            / "imageattr"
            / "ptime"
            / "crypto"
            / token ; for future extensions
q-value     = ( "0" "." 1*2DIGIT )
            / ( "1" "." 1*2("0") )
            ; Values between 0.00 and 1.00
; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]

]]></artwork>
        </figure>

        <t>The media property values are taken from existing (and could likely
        be extended to cover future) SDP attributes that express media
        properties that can be varied to create different simulcast
        versions:<list style="hanging">
            <t hangText="rtpmap:">Differences in codec type, sampling rate
            (see <xref target="sec-requirements"/>), and number of
            channels</t>

            <t hangText="fmtp:">Differences in codec-specific encoding
            parameters</t>

            <t hangText="imageattr:">Differences in video resolution, aspect
            ratio, and framerate <xref target="RFC6236"/></t>

            <t hangText="ptime:">Differences in frame aggregation per
            packet</t>

            <t hangText="crypto:">Differences in encryption <xref
            target="RFC4568"/></t>

            <t hangText="...:"/>
          </list></t>

        <t>The optional q-value expresses the relative preference to base a
        simulcast version on that media property, with 1.00 meaning maximum
        (100%) preference and 0.00 meaning no (0%) preference. Several media
        properties can share the same q-value, in which case they are equally
        preferred.</t>
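
        <t>A minimal, non-normative Python sketch of how a receiver of the SDP
        might parse these attributes is shown here. It accepts any token as
        property name and at most one q-value per property, and it reports an
        omitted q-value as None, since this document does not define a
        default. The function name parse_simulcast is hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import re

# q-value: "0." plus one or two digits, or "1." plus one or two zeros.
Q_VALUE = re.compile(r"(0\.\d{1,2}|1\.0{1,2})")


def parse_simulcast(line):
    """Parse an 'a=sim-send:' or 'a=sim-recv:' line.

    Returns (direction, entries), where entries is a list of
    (property, q) tuples and q is None when no q-value was given.
    """
    match = re.fullmatch(r"a=sim-(send|recv):(.+)", line.strip())
    if match is None:
        raise ValueError("not a simulcast capability attribute")
    direction, prop_list = match.groups()
    entries = []
    for entry in prop_list.split():
        prop, sep, q_text = entry.partition("=")
        if not prop:
            raise ValueError("empty property name")
        q = None
        if sep:
            if Q_VALUE.fullmatch(q_text) is None:
                raise ValueError("bad q-value: " + q_text)
            q = float(q_text)
        entries.append((prop, q))
    return direction, entries


print(parse_simulcast("a=sim-send:imageattr=1.0 fmtp=0.8"))
```
]]></artwork>
        </figure>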

        <t>An offerer wanting to use simulcast SHALL include one or both of
        those attributes, depending on the direction(s) in which simulcast
        will be used. An offerer that receives an answer without "a=sim-send"
        or "a=sim-recv" MUST NOT define or use any simulcast alternatives
        belonging to that media description and in that direction to the
        answerer.</t>

        <t>An answerer that does not understand the concept of simulcast will
        also not know those attributes and will remove them in the SDP answer,
        as defined in existing SDP Offer/Answer procedures. An answerer that
        does understand the attributes and that wants to support simulcast in
        the indicated direction SHALL reverse directionality of the attribute,
        "sim-send" becomes "sim-recv" and vice versa, and include it in the
        answer.</t>

        <t>An offerer that intends to send simulcast alternatives and thus
        includes "a=sim-send", MUST also include at least one media property
        parameter that it intends to use to construct the simulcast
        alternatives, but it MAY include more media property parameters.
        Including multiple media property parameters in "a=sim-send" SHALL be
        interpreted as an offer to send simulcast versions covering all
        combinations thereof, but MAY be further restricted by other
        information in the SDP such as for example the number of
        simulcast-related media descriptions in the SDP or use of <xref
        target="I-D.westerlund-mmusic-max-ssrc">max-ssrc signaling</xref>.</t>

        <t>An offerer that is capable of receiving simulcast alternatives and
        thus includes "a=sim-recv", MUST also include at least one media
        property parameter that it is willing to use as discriminator between
        received simulcast alternatives, but MAY include more media property
        parameters. Including multiple media property parameters in
        "a=sim-recv" SHALL be interpreted as an offer to receive simulcast
        versions covering all combinations thereof, but MAY be further
        restricted by other information in the SDP such as for example the
        number of simulcast-related media descriptions in the SDP or use of
        <xref target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
        signaling</xref>.</t>

        <t>An answerer that lacks either the capability or the desire to use
        simulcast versions based on a certain media property parameter in a
        specific direction MUST remove that media property parameter from
        "a=sim-send" or "a=sim-recv". The answerer MUST NOT add any media
        property parameters that were not included in the offer.</t>
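
        <t>The answerer rules above can be sketched, non-normatively, as the
        following Python function. The function name and the representation
        of local policy as a dictionary are hypothetical. Omitting the
        attribute entirely when no offered property remains is an assumption
        of this sketch, not something this document specifies.</t>

        <figure>
          <artwork><![CDATA[
```python
def answer_simulcast(offered_direction, offered_props, supported):
    """Build the answerer's attribute line, or None if nothing remains.

    offered_direction: "send" or "recv", as found in the offer.
    offered_props: property names from the offer, in offer order.
    supported: dict mapping the answerer's own direction to the set of
               properties it is willing to use (hypothetical policy).
    """
    reverse = {"send": "recv", "recv": "send"}
    own_direction = reverse[offered_direction]
    usable = supported.get(own_direction, set())
    # Keep only offered properties; never add ones absent from the offer.
    kept = [prop for prop in offered_props if prop in usable]
    if not kept:
        # Assumption: with no usable property, leave the attribute out.
        return None
    return "a=sim-" + own_direction + ":" + " ".join(kept)


# The server can receive simulcast based on imageattr or ptime.
server_policy = {"recv": {"imageattr", "ptime"}}
print(answer_simulcast("send", ["imageattr", "fmtp"], server_policy))
```
]]></artwork>
        </figure>

        <t>With the offer "a=sim-send:imageattr fmtp", this yields
        "a=sim-recv:imageattr": the direction is reversed, "fmtp" is removed,
        and "ptime" is not added because it was not offered.</t>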
      </section>

      <section anchor="sec-group-m"
               title="Grouping Simulcast Media Descriptions">
        <t>To relate media descriptions holding simulcast versions, two new
        simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
        and "SimulCast Send" (SCS). There is a need to separate semantics for
        the intent to send simulcast streams from the semantics that describe
        capability to recognize and receive simulcast streams. Both semantics
        act as an indicator that simulcast is desired and that the grouped
        media descriptions (m-lines) carry simulcast versions of media
        sources. There may be multiple sets of media descriptions that carry
        simulcast versions.</t>

        <section title="Declarative Use">
          <t>When used as a declarative media description, SCR indicates the
          configured end-point's required capability to recognize and receive
          a specified set of RTP streams as simulcast streams. In the same
          fashion, SCS requests the end-point to send a specified set of RTP
          streams as simulcast streams. SCR and SCS MAY be used independently
          and at the same time and they need not specify the same or even the
          same number of media descriptions in the group.</t>
        </section>

        <section title="Offer/Answer Use">
          <t>When used in an offer, SCS indicates the offering agent's intent
          to send simulcast within the particular set of media descriptions,
          and SCR indicates the agent's capability to receive simulcast
          streams within the configured set of media descriptions.
          SCS and SCR MAY be used independently and at the same time and they
          need not specify the same or even the same number of media
          descriptions in the group. The answerer MUST change SCS to SCR and
          SCR to SCS in the answer, given that it has and wants to use the
          corresponding (reverse) capability. An answerer not supporting the
          SCS or SCR direction, or not supporting SCS or SCR grouping
          semantics at all, will remove that grouping attribute altogether,
          according to <xref target="RFC5888">the grouping framework</xref>.
          However, this case should not occur or at least be very rare due to
          the proposed <xref target="sec-two-phase">two-phase approach</xref>.
          An offerer that receives an answer indicating lack of simulcast
          support in one or both directions, where SCR and/or SCS grouping are
          removed, MUST NOT use simulcast in the non-supported
          direction(s).</t>
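
          <t>As a non-normative sketch of the answerer behavior described
          above, the following Python function derives the simulcast group
          lines of an answer from the lines of an offer. The function name
          and the "supported" argument are hypothetical. Group lines with
          other semantics are outside the scope of the sketch and are not
          returned.</t>

          <figure>
            <artwork><![CDATA[
```python
def answer_simulcast_groups(offer_lines, supported):
    """Return the simulcast 'a=group:' lines to place in the answer.

    offer_lines: SDP lines of the offer.
    supported: set of semantics ("SCS", "SCR") that the answerer has
               and wants to use, seen from its own side.
    """
    swap = {"SCS": "SCR", "SCR": "SCS"}
    prefix = "a=group:"
    result = []
    for line in offer_lines:
        if not line.startswith(prefix):
            continue
        semantics, _, mids = line[len(prefix):].partition(" ")
        own = swap.get(semantics)
        # Unsupported directions are dropped rather than echoed back.
        if own is not None and own in supported:
            result.append(prefix + own + " " + mids)
    return result


offer = ["v=0", "a=group:SCR 2 3", "m=audio 49200 RTP/AVP 96"]
print(answer_simulcast_groups(offer, {"SCS"}))
```
]]></artwork>
          </figure>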
        </section>
      </section>

      <section anchor="sec-two-phase" title="Two-Phase Negotiation">
        <t>These new "a=sim-send" and "a=sim-recv" attributes are proposed to
        be included in the SDP as a first phase in a two-phased approach,
        where the first phase involves a first SDP Offer/Answer procedure that
        only establishes simulcast capability at both the offerer and the
        answerer. This has the additional advantage of avoiding sending media
        descriptions related to simulcast to an endpoint that does not support
        simulcast. It is also not likely to incur any significant extra
        signaling round-trips, given that many other recent SDP techniques
        also make use of two Offer/Answer procedures, as long as this phased
        approach can be used in parallel with those. Such other two-phase
        techniques include <xref target="RFC5245">ICE</xref> and <xref
        target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>.</t>

        <t>Thus, the first Offer/Answer SHOULD NOT include any
        simulcast-grouped media descriptions, which SHOULD then be added in a
        second Offer/Answer phase. This second phase SHOULD be initiated by
        the simulcast receiver, meaning the endpoint that included
        "a=sim-recv" in the first phase SDP SHOULD be offerer in the second
        phase. If both endpoints are simulcast receivers, it is not possible
        to define a preferred offerer in the second phase and either endpoint
        MAY then send the offer, using regular Offer/Answer rules to handle
        race conditions.</t>
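
        <t>The choice of second phase offerer can be summarized by the
        following non-normative Python sketch. The function name and its
        return values are hypothetical. Each argument states whether that
        endpoint's own first phase SDP contained "a=sim-recv".</t>

        <figure>
          <artwork><![CDATA[
```python
def second_phase_offerer(local_has_sim_recv, remote_has_sim_recv):
    """Return who should send the second phase offer.

    Returns "local", "remote", "either" when both endpoints are
    simulcast receivers, or None when no simulcast was negotiated.
    """
    if local_has_sim_recv and remote_has_sim_recv:
        # No preferred offerer; regular Offer/Answer glare rules apply.
        return "either"
    if local_has_sim_recv:
        return "local"
    if remote_has_sim_recv:
        return "remote"
    return None


# Seen from a server that answered with "a=sim-recv" only.
print(second_phase_offerer(True, False))
```
]]></artwork>
        </figure>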

        <t>The first phase of establishing capability is not possible to use
        with declarative SDP, in which case it SHALL be by-passed, using the
        second phase media description grouping directly.</t>
      </section>

      <section anchor="sec-requirements" title="Media Stream Requirements">
        <t>When doing simulcast, the media streams that are alternatives need
        to meet certain constraints to ensure that switching between
        alternative streams is as issue-free as possible. The following
        constraints are needed:<list style="hanging">
            <t hangText="Same Clock Base:">To enable correct alignment of
            media packets on the source time-line, all alternative streams
            (SSRCs) MUST use the same underlying clock to relate their RTP
            timestamp values with the network time protocol (NTP) formatted
            sender time in the RTCP Sender Reports.</t>

            <t hangText=""/>
          </list></t>
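
        <t>To illustrate why a common clock base matters, the following
        non-normative Python sketch maps an RTP timestamp to sender wall-clock
        time using the NTP and RTP timestamp pair from one RTCP Sender Report.
        The function name and all numeric values are hypothetical, and the
        NTP time is simplified to seconds as a floating point number.</t>

        <figure>
          <artwork><![CDATA[
```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wall-clock seconds via one SR."""
    # Signed 32-bit difference, so RTP timestamp wrap-around is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


# Two simulcast SSRCs with different random RTP timestamp offsets.
# Both use a 90 kHz clock and report against the same sender clock.
high = rtp_to_wallclock(1_045_000, 1_000_000, 100.0, 90_000)
low = rtp_to_wallclock(52_000, 7_000, 100.0, 90_000)
print(high == low)
```
]]></artwork>
        </figure>

        <t>Because both streams relate their RTP timestamps to the same
        underlying clock, frames captured at the same instant map to the same
        wall-clock time, and a receiver can switch between the alternatives
        without a discontinuity on the source time-line.</t>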

        <t/>
      </section>

      <section anchor="sec-srcname" title="Relating Alternative Encodings">
        <t>To ensure that simulcast streams can be related correctly also on
        RTP level, the usage of <xref
        target="I-D.westerlund-avtext-rtcp-sdes-srcname">SDES SRCNAME</xref>
        to label and relate simulcast versions belonging to the same media
        source is RECOMMENDED.</t>
      </section>

      <section anchor="sec-max-ssrc" title="Multiple Stream handling">
        <t>When using multiple SSRC in a single media description, for example
        when using simulcast for multiple independent media sources, the
        grouping semantics SCR and SCS SHOULD be combined with the SDP
        attributes <xref
        target="I-D.westerlund-mmusic-max-ssrc">"a=max-send-ssrc" and
        "a=max-recv-ssrc"</xref> to indicate the number of simultaneous
        streams of each encoding that may be sent or that can be handled in
        the receive direction.</t>
      </section>
    </section>

    <section title="Simulcast Signaling Examples">
      <t>For brevity and clarity, the SDP in all below examples does not
      contain signaling for multiple streams, such as the ones related to
      <xref target="sec-srcname">RTP level relations</xref> or <xref
      target="sec-max-ssrc">multiple SSRC signaling</xref>.</t>

      <t>This example is for a case of client to video conference service
      using a centralized media topology with an RTP mixer. Alice and Bob
      call into a conference server for a conference call with audio and
      video sent to the RTP mixer, these clients being capable of sending a
      few video simulcast versions. The conference server also dials out to
      Fred, who is a legacy client, resulting in fallback behavior. When
      dialing out to Joe, more functionality is enabled as Joe is a client
      similar to Alice.</t>

      <figure align="center" anchor="fig-mixer-four-party"
              title="Four-party Mixer-based Conference">
        <artwork><![CDATA[
+---+      +-----------+      +---+
| A |<---->|           |<---->| B |
+---+      |           |      +---+
           |   Mixer   |
+---+      |           |      +---+
| F |<---->|           |<---->| J |
+---+      +-----------+      +---+]]></artwork>
      </figure>

      <t>Example of Media plane for RTP mixer based multi-party conference
      with 4 participants.</t>

      <section title="Alice: Desktop Client">
        <t>Alice is calling in to the mixer with an audiovisual single stream
        desktop client, only adding capability to send <xref
        target="RFC6236">video resolution</xref> ("imageattr") and framerate
        based simulcast compared to a legacy client. The first phase offer
        from Alice looks like</t>

        <figure anchor="fig-alice-first-offer"
                title="Alice First Offer for a Simulcast Conference">
          <artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:665
m=audio 49200 RTP/AVP 96 97 9 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:97 G719/48000
a=rtpmap:9 G722/8000
a=rtpmap:8 PCMA/8000
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr=1.0 fmtp=0.8
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=content:main

]]></artwork>
        </figure>

        <t>In this first phase, the only thing in the SDP that indicates
        simulcast capability is the line in the video media description
        containing the "sim-send" attribute.</t>

        <t>The answer from the server indicates both that it is simulcast
        capable and that it would like to use only video resolution
        ("imageattr") based simulcast. Should it not have been simulcast
        capable, the "a=sim-recv" line would not have been present and
        communication would have started with the media negotiated in the
        SDP.</t>

        <figure anchor="fig-alice-first-answer"
                title="Server First Answer for a Simulcast Conference">
          <artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:665
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=content:main

]]></artwork>
        </figure>

        <t>Since the server is the simulcast media receiver, it immediately
        initiates another Offer/Answer including the simulcast versions. The
        server also keeps the "sim-recv" as explicit simulcast capability
        indication in this second Offer/Answer round. Note that the
        "non-simulcast" media can be started already now, before the second
        phase Offer/Answer, with the only restriction that the simulcast
        functionality is not yet established.</t>

        <figure anchor="fig-alice-second-offer"
                title="Server Second Offer for a Simulcast Conference">
          <artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Server inviting simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:825
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-recv:imageattr
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 recv [x=320,y=180]
a=mid:3
a=recvonly
]]></artwork>
        </figure>

        <t>The server has added one additional receive-only media description
        with the simulcast version based on difference only in imageattr. That
        the two media lines are considered to be simulcast versions is seen
        from the SCR grouping tag and the two media IDs (2 and 3). The first
        video version with media ID 2 prefers 360p resolution (signaled via
        imageattr) and the second video version with media ID 3 prefers 180p
        resolution. The first video media line also acts as the single send
        video (making that media line sendrecv), while the second video media
        line is only related to simulcast transmission and is thus offered
        recvonly.</t>
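
        <t>A non-normative Python sketch of how an endpoint could find the
        media descriptions that belong to a simulcast group is given here. It
        reads session level "a=group:" lines with SCR or SCS semantics and
        maps each listed media ID to its m-line. The function name is
        hypothetical, and the SDP used is an abbreviated form of the server's
        second offer.</t>

        <figure>
          <artwork><![CDATA[
```python
def simulcast_groups(sdp_text):
    """Return a list of (semantics, [(mid, m_line), ...]) tuples."""
    groups = []
    mid_to_media = {}
    current_media = None
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            current_media = line
        elif line.startswith("a=group:") and current_media is None:
            parts = line[len("a=group:"):].split()
            if parts and parts[0] in ("SCR", "SCS"):
                groups.append((parts[0], parts[1:]))
        elif line.startswith("a=mid:") and current_media is not None:
            mid_to_media[line[len("a=mid:"):]] = current_media
    # A media ID without a matching m-line is reported with None.
    return [(sem, [(mid, mid_to_media.get(mid)) for mid in mids])
            for sem, mids in groups]


offer = """v=0
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
a=mid:1
m=video 49300 RTP/AVP 96
a=mid:2
m=video 49400 RTP/AVP 96
a=mid:3
a=recvonly
"""
for semantics, members in simulcast_groups(offer):
    print(semantics, members)
```
]]></artwork>
        </figure>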

        <t>The fact that fmtp for this second video is also different should
        be seen as a secondary effect from the change of resolution and does
        not create any kind of conflict. The capabilities of Alice's client
        are very well aligned with this and the SDP answer is
        straightforward.</t>

        <figure anchor="fig-alice-second-answer"
                title="Alice Second Answer for a Simulcast Conference">
          <artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Final answer from simulcast enabled Desktop Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:825
a=group:SCS 2 3
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
a=mid:1
m=video 49300 RTP/AVP 96
b=AS:520
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c01e
a=sim-send:imageattr
a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
a=mid:2
a=content:main
m=video 49400 RTP/AVP 96
b=AS:160
a=rtpmap:96 H264/90000
a=fmtp:96 profile-level-id=42c00d
a=imageattr:96 send [x=320,y=180]
a=mid:3
a=sendonly
]]></artwork>
        </figure>

        <t/>
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requests registration of two new SDP attributes,
      "sim-send" and "sim-recv", together with a new registry of defined
      parameters taken from existing SDP attributes, and of two new SDP
      grouping semantics, SCS and SCR.</t>

      <t>Formal registrations to be written.</t>

      <t/>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The simulcast capability attributes and parameters are vulnerable to
      attacks in signaling.</t>

      <t>A false inclusion of simulcast attributes may result in generation of
      a second phase SDP that potentially contains a large number of
      non-supported media descriptions expressing simulcast alternatives. A
      correct SDP implementation will however be able to reject any
      non-supported media descriptions and the effect from that should be
      limited.</t>

      <t>A hostile removal of the simulcast attributes will cause any second
      phase Offer/Answer to be skipped, with the result that simulcast is not
      used.</t>

      <t>The simulcast grouping semantics are vulnerable to attacks in the
      signaling.</t>

      <t>A false grouping of non-simulcast streams as simulcast would risk
      that some streams are incorrectly ignored by receivers that know
      simulcast and that are not interested in the assumed simulcast
      streams.</t>

      <t>A hostile removal of simulcast grouping will prevent streams from
      being interpreted as simulcast, which obviously prevents use of the
      simulcast functionality. It will also risk that intended simulcast
      streams are instead presented as separate, independent streams to a
      receiver.</t>

      <t>None of the above is likely to have any major consequences, and all
      can be mitigated by signaling that is at least integrity protected and
      source authenticated, which prevents an attacker from changing it.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t/>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.4566'?>

      <?rfc include='reference.RFC.4568'?>

      <?rfc include='reference.RFC.5576'?>

      <?rfc include='reference.RFC.5888'?>

      <?rfc include='reference.RFC.6236'?>

      <?rfc include='reference.I-D.westerlund-avtext-rtcp-sdes-srcname'?>

      <?rfc include='reference.I-D.westerlund-mmusic-max-ssrc'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.3264'?>

      <?rfc include='reference.RFC.3569'?>

      <?rfc include='reference.RFC.4588'?>

      <?rfc include='reference.RFC.5117'?>

      <?rfc include='reference.RFC.5245'?>

      <?rfc include='reference.RFC.6190'?>

      <?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>

      <?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>

      <?rfc include='reference.I-D.lennox-avtcore-rtp-multi-stream'?>

      <?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>
    </references>
  </back>
</rfc>