<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-westerlund-avtcore-rtp-simulcast-03"
     ipr="trust200902" submissionType="IETF">
  <front>
    <title abbrev="RTP Simulcast">Using Simulcast in RTP Sessions</title>

    <author fullname="Magnus Westerlund" initials="M." surname="Westerlund">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 82 87</phone>

        <email>magnus.westerlund@ericsson.com</email>
      </address>
    </author>

    <author fullname="Bo Burman" initials="B." surname="Burman">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Farogatan 6</street>

          <city>SE-164 80 Kista</city>

          <country>Sweden</country>
        </postal>

        <phone>+46 10 714 13 11</phone>

        <email>bo.burman@ericsson.com</email>
      </address>
    </author>

    <author fullname="Suhas Nandakumar" initials="S." surname="Nandakumar">
      <organization>Cisco</organization>

      <address>
        <postal>
          <street>170 West Tasman Drive</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>snandaku@cisco.com</email>

        <uri/>
      </address>
    </author>

    <date day="21" month="October" year="2013"/>

    <abstract>
      <t>In some application scenarios it may be desirable to send multiple
      differently encoded versions of the same Media Source in independent
      Source Packet Streams. This is called Simulcast. This document discusses
      the best way of accomplishing Simulcast in RTP and how to signal it in
      SDP. A solution is defined by making three extensions to SDP, and using
      RTP/RTCP identification methods to relate RTP Source Packet Streams. The
      first SDP extension consists of two new session level SDP attributes
      that express capability to send or receive Simulcast Source Packet
      Streams, respectively. The second SDP extension introduces an SDP media
      level attribute that groups and identifies a selected set of media level
      parameters for a specific direction, called a media configuration. The
      third SDP extension describes how to group such media configurations on
      SDP session or media level for Simulcast purposes.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="sec-intro" title="Introduction">
      <t>Most of today's multiparty video conference solutions make use of
      centralized servers to reduce the bandwidth and CPU consumption in the
      endpoints. Those servers receive Source Packet Streams from each
      participant and send some suitable set of possibly modified streams to
      the rest of the participants, which usually have heterogeneous
      capabilities (screen size, CPU, bandwidth, codec, etc.). One of the
      biggest issues is how to perform stream adaptation to different
      participants' constraints with the minimum possible impact on video
      quality and server performance.</t>

      <t>Simulcast is the act of simultaneously sending multiple different
      versions of the same media content, e.g. the same video source encoded
      with different video encoder types or image resolutions. This can be
      done in several ways and for different purposes. This document focuses
      on the case where it is desirable to provide a Media Source as multiple
      Source Packet Streams over <xref target="RFC3550">RTP</xref> towards an
      intermediary so that the intermediary can provide the wanted
      functionality by selecting which Source Packet Stream to forward to
      other participants in the session, and more specifically how the
      identification and grouping of the involved Source Packet Streams are
      done. From an RTP perspective, Simulcast is a specific application of
      the aspects discussed in <xref
      target="I-D.ietf-avtcore-multiplex-guidelines">RTP Multiplexing
      Guidelines</xref>.</t>

      <t>The purpose of this document is to describe a few scenarios that
      motivate the use of Simulcast, and to propose a suitable solution for
      signaling and performing RTP Simulcast.</t>
    </section>

    <section anchor="sec-definitions" title="Definitions">
      <t/>

      <section title="Terminology">
        <t>This document makes use of the terminology defined in <xref
        target="I-D.lennox-raiarea-rtp-grouping-taxonomy">RTP Taxonomy</xref>.
        In addition, the following terms are used:<list style="hanging">
            <t hangText="Media Configuration:">A specific set of parameter
            values applied on the encoding and packetization process that
            creates a specific Source Packet Stream. In SDP, the applicable
            parameter values are described by the joint set of "rtpmap"
            parameters, "fmtp" parameters, and the <xref
            target="sec-media-config">"config-id"</xref> parameters, including
            extensions.</t>
          </list></t>
      </section>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>
    </section>

    <section anchor="sec-use-cases" title="Use Cases">
      <t>Many use cases of Simulcast as described in this document relate to a
      multi-party Communication Session where one or more central nodes are
      used to adapt the view of the Communication Session towards individual
      Participants, and facilitate the Media Transport between Participants.
      Thus, these cases target the RTP Mixer topology defined in <xref
      target="RFC5117"/> (Section 3.4: Topo-Mixer), further elaborated and
      extended with other topologies in <xref
      target="I-D.ietf-avtcore-rtp-topologies-update"/> (Section 3.6 to
      3.9).</t>

      <t>There are two principal approaches for an RTP Mixer to provide this
      adapted view of the Communication Session to each receiving
      Participant:<list style="symbols">
          <t>Transcoding (decoding and re-encoding) received Source Packet
          Streams with characteristics adapted to each receiving Participant.
          This often includes mixing or composition of Media Sources from
          multiple Participants into a mixed Media Source originated by the
          RTP Mixer. The main advantage of this approach is that it achieves
          close to optimal adaptation to individual receiving Participants.
          The main disadvantages are that it can be computationally very
          expensive for the RTP Mixer and that it typically also degrades
          media Quality of Experience (QoE), for example through increased
          end-to-end delay, for the receiving Participants.</t>

          <t>Switching a subset of all received Source Packet Streams or
          sub-streams to each receiving Participant, where the used subset is
          typically specific to each receiving Participant. The main
          advantages of this approach are that it is computationally cheap to
          the RTP Mixer and it has very limited impact on media QoE. The main
          disadvantage is that it can be difficult to combine a subset of
          received Source Packet Streams into a perfect fit to the resource
          situation of a receiving Participant.</t>
        </list></t>

      <t>The use of Simulcast relates to the latter approach, where it is
      more important to reduce the load on the RTP Mixer and/or minimize QoE
      impact than to achieve an optimal adaptation of resource usage.</t>

      <t>A multicast/broadcast case where the receivers themselves select the
      most appropriate simulcast version and tune in to the right transport to
      receive that version is also <xref
      target="sec-multicast">considered</xref>. This enables large receiver
      populations that are heterogeneous in terms of capabilities and the
      bandwidth of the network paths used.</t>

      <t>In this section, an "RTP switch" is used as a common short term for
      the terms "switching RTP mixer", "source projecting middlebox", and
      "video switching MCU" as discussed in <xref
      target="I-D.ietf-avtcore-rtp-topologies-update"/>.</t>

      <section anchor="sec-diverse-receivers"
               title="Reaching a Diverse Set of Receivers">
        <t>The Media Sources provided by a sending Participant potentially
        need to reach several receiving Participants that differ in terms of
        available resources. A discussion on that topic is included in <xref
        target="appendix-a"/>. The receiver resources that typically differ
        include, but are not limited to:<list style="hanging">
            <t hangText="Codec:">This includes codec type (such as SDP MIME
            type) and can include codec configuration options (e.g. SDP fmtp
            parameters). Two codec resources that differ only in codec
            configuration are considered "different" if they are not
            "compatible", for example if they differ in video codec profile
            or in transport packetization configuration.</t>

            <t hangText="Sampling:">This relates to how the Media Source is
            sampled, in spatial as well as in temporal domain. For video
            streams, spatial sampling affects image resolution and temporal
            sampling affects video frame rate. For audio, spatial sampling
            relates to the number of audio channels and temporal sampling
            affects audio bandwidth. This may be used to suit different
            rendering capabilities or needs at the receiving endpoints, as
            well as a method to achieve different transport capabilities,
            bitrates and eventually QoE by controlling the amount of source
            data.</t>

            <t hangText="Bitrate:">This relates to the amount of bits spent
            per second to transmit the Media Source as a Source Packet
            Stream, which typically also affects the Quality of Experience
            (QoE) for the receiving user.</t>
          </list>Letting the sending Participant create a Simulcast of a few
        differently configured Source Packet Streams per Media Source can be a
        good trade-off when using an RTP switch as middlebox, instead of
        sending a single Source Packet Stream and using an RTP Mixer to create
        individual transcodings to each receiving Participant.</t>

        <t>This requires that the receiving Participants can be categorized in
        terms of available resources and that the sending Participant can
        choose a matching configuration for a single Source Packet Stream per
        category and Media Source.</t>

        <t>For example, assume for simplicity a set of receiving Participants
        that differ only in that some have support to receive Codec A, and the
        others have support to receive Codec B. Further assume that the
        sending participant can send both Codec A and B. It can then reach all
        receivers by creating two Simulcasted Source Packet Streams from each
        Media Source; one for Codec A and one for Codec B.</t>

        <t>In another simple example, a set of receiving Participants differ
        only in screen resolution; some are able to display video with at most
        360p resolution and some support 720p resolution. A sending
        Participant can then reach all receivers by creating a Simulcast of
        Source Packet Streams with 360p and 720p resolution for each sent
        video Media Source.</t>

        <t>In more elaborate cases, the receiving Participants differ both in
        available Sampling and Bitrate, and maybe also Codec, and it is up to
        the RTP switch to find a good trade-off in which Simulcasted stream to
        choose for each intended receiver. It is also the responsibility of
        the RTP switch to negotiate a good fit of Simulcast streams with the
        sending Participant.</t>

        <t>The maximum number of Simulcasted Source Packet Streams that can be
        sent is mainly limited by the amount of processing and uplink network
        resources available to the sending Participant.</t>
      </section>

      <section anchor="sec-application-specific"
               title="Application Specific Media Source Handling">
        <t>The application logic that controls the Communication Session may
        include special handling of some Media Sources. It is for example
        commonly the case that the media from a sending Participant is not
        sent back to itself.</t>

        <t>It is also common that a currently active speaker Participant is
        shown in larger size or higher quality than other Participants (the
        Sampling or Bitrate aspects of <xref
        target="sec-diverse-receivers"/>). Not sending the active speaker
        media back to itself means that some other Participant's media
        instead receives special handling towards the active speaker,
        typically that of the previously active speaker. This way, the
        previously active speaker is needed both in larger size (to the
        current active speaker) and in small size (to the rest of the
        Participants), which can be solved with a Simulcast from the
        previously active speaker to the RTP switch.</t>
      </section>

      <section anchor="sec-multicast"
               title="Receiver Adaptation in Multicast/Broadcast">
        <t>When using Broadcast or Multicast technology to distribute
        real-time media streams to large populations of receivers there can
        still be significant heterogeneity among the receiver population. This
        can depend on several factors:<list style="hanging">
            <t hangText="Network Bandwidth:">The network paths to individual
            receivers will vary in bandwidth, thus putting different limits
            on the bit-rates that can be received.</t>

            <t hangText="Endpoint Capabilities:">The endpoint's hardware and
            software can have varying capabilities in relation to screen
            resolution, decoding capabilities, and supported media codecs.</t>
          </list></t>

        <t>To handle these variations, a transmitter of real-time media may
        want to apply Simulcast to its Source Packet Streams and provide a set
        of media configurations, enabling the receivers to select the best fit
        from these sets themselves. The endpoint capabilities will usually
        result in a single initial choice. However, the network bandwidth can
        vary over time, which requires a client to continuously monitor its
        reception to determine if the received media streams still fit within
        the available bandwidth. If not, another Simulcast media configuration
        containing a thinner set of Source Packet Streams will have to be
        chosen.</t>

        <t>When IP multicast is used, the granularity at which a receiver
        can select among Simulcast versions is the multicast address. Thus,
        different Simulcast versions need to be put on different Media
        Transports using different multicast addresses. If
        these Simulcast versions are described using SDP, they need to be part
        of different SDP media descriptions, as SDP binds to transport on
        media description level. To enable more than the initial choice to
        function well, there is a need to enable correct mapping of Source
        Packet Streams in one Simulcast media configuration to a corresponding
        Source Packet Stream in another Simulcast media configuration on
        another multicast group.</t>
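
        <t>As a minimal sketch of the transport separation described above,
        the following SDP fragment places two Simulcast versions of one
        video Media Source in separate media descriptions, each bound to its
        own multicast address. The addresses, ports, payload types, codec,
        and bandwidth values are assumptions chosen purely for illustration.
        The fragment deliberately shows only the transport binding; it does
        not include any signaling that relates the two Source Packet Streams
        to each other.</t>

        <figure>
          <artwork><![CDATA[
m=video 49200 RTP/AVP 96
c=IN IP4 233.252.0.1/127
b=AS:2000
a=rtpmap:96 H264/90000
m=video 49202 RTP/AVP 97
c=IN IP4 233.252.0.2/127
b=AS:500
a=rtpmap:97 H264/90000
]]></artwork>
        </figure>

        <t>In such a layout, a receiver that finds the first version too
        demanding for its available bandwidth would leave the first
        multicast group and join the second.</t>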
      </section>

      <section anchor="sec-receiver-preferences"
               title="Receiver Media Source Preferences">
        <t>The application logic that controls the Communication Session may
        allow receiving Participants to apply preferences to the
        characteristics of the Source Packet Stream they receive, for example
        in terms of the aspects listed in <xref
        target="sec-diverse-receivers"/>. Sending a Simulcast of Source Packet
        Streams is one way of accommodating receivers with conflicting or
        otherwise incompatible preferences.</t>
      </section>
    </section>

    <section anchor="sec-requirements" title="Requirements">
      <t>The following requirements need to be met to support the use cases in
      previous sections:<list style="hanging">
          <t anchor="req-1" hangText="REQ-1:">Identification. It must be
          possible to identify a set of simulcasted Source Packet Streams as
          originating from the same Media Source:<list style="hanging">
              <t anchor="req-1.1" hangText="REQ-1.1:">In SDP signaling.</t>

              <t anchor="req-1.2" hangText="REQ-1.2:">On RTP/RTCP level.</t>
            </list></t>

          <t anchor="req-2" hangText="REQ-2:">Transport usage. The solution
          must work when distributing different Simulcast versions on:<list
              style="hanging">
              <t anchor="req-2.1" hangText="REQ-2.1:">Same Media Transport and
              RTP session.</t>

              <t anchor="req-2.2" hangText="REQ-2.2:">Different Media
              Transports and RTP sessions.</t>
            </list></t>

          <t anchor="req-3" hangText="REQ-3:">Capability negotiation. It must
          be possible that:<list style="hanging">
              <t anchor="req-3.1" hangText="REQ-3.1:">Sender can express
              capability of sending simulcast.</t>

              <t anchor="req-3.2" hangText="REQ-3.2:">Receiver can express
              capability of receiving simulcast.</t>

              <t anchor="req-3.3" hangText="REQ-3.3:">Sender can express
              maximum number of Simulcast versions that can be provided.</t>

              <t anchor="req-3.4" hangText="REQ-3.4:">Receiver can express
              maximum number of Simulcast versions that can be received.</t>

              <t anchor="req-3.5" hangText="REQ-3.5:">Sender can detail the
              characteristics of the Simulcast versions that can be
              provided.</t>

              <t anchor="req-3.6" hangText="REQ-3.6:">Receiver can detail the
              characteristics of the Simulcast versions that it prefers to
              receive.</t>
            </list></t>

          <t anchor="req-4" hangText="REQ-4:">Distinguishing features. It must
          be possible to have different Simulcast versions use different
          values for any combination of:<list style="hanging">
              <t anchor="req-4.1" hangText="REQ-4.1:">Codec. This includes
              both codec type and configuration options for both codec and RTP
              packetization. It also includes different layers from a scalable
              codec, but only as long as those layers are possible to identify
              on RTP level.</t>

              <t anchor="req-4.2" hangText="REQ-4.2:">Bitrate of Source Packet
              Stream.</t>

              <t anchor="req-4.3" hangText="REQ-4.3:">Sampling in spatial as
              well as in temporal domain.</t>
            </list></t>

          <t anchor="req-5" hangText="REQ-5:">Compatibility. It must be
          possible to use Simulcast in combination with other RTP mechanisms
          that generate additional Source Packet Streams:<list style="hanging">
              <t anchor="req-5.1" hangText="REQ-5.1:"><xref
              target="RFC4588">RTP Retransmission</xref>.</t>

              <t anchor="req-5.2" hangText="REQ-5.2:"><xref
              target="RFC5109">RTP Forward Error Correction</xref>.</t>
            </list></t>

          <t anchor="req-6" hangText="REQ-6:">Interoperability. The solution
          must also be usable in:<list style="hanging">
              <t anchor="req-6.1" hangText="REQ-6.1:">Interworking with
              non-simulcast legacy clients using a single Media Source per
              media type.</t>

              <t anchor="req-6.2" hangText="REQ-6.2:">WebRTC "Unified Plan"
              environment.</t>
            </list></t>
        </list></t>
    </section>

    <section anchor="sec-solution" title="Proposed Solution Overview">
      <t>Signaling Simulcast is about negotiating between media sender and
      receiver what the different Simulcast versions should be, how to
      identify them in terms of Source Packet Streams, and how to inter-relate
      those Source Packet Streams.</t>

      <t>The proposed solution consists of:<list style="symbols">
          <t>Signaling Simulcast capability in an optional, pre-stage
          Offer/Answer:<list style="symbols">
              <t>Separate send and receive Simulcast capabilities as SDP
              session level attributes.</t>

              <t>Media properties that are supported as base for different
              Simulcast versions are listed as parameters that are also
              possible to rank.</t>

              <t>Early indication of maximum number of available
              encoding/decoding resources on SDP media level.</t>
            </list></t>

          <t>Including detailed information for the Simulcast in a main
          Offer/Answer:<list style="symbols">
              <t>Including Simulcast capability indications, as described
              above, being kept from the pre-stage Offer/Answer, if any.</t>

              <t>Defining and labeling of the media configuration for each
              Simulcast version to be sent or received.</t>

              <t>The media configuration for a Simulcast version can include
              acceptable parameter ranges for parameters that are most likely
              used to distinguish Simulcast versions.</t>

              <t>Indicating the use of Simulcast, separately per direction, by
              grouping the defined media configurations, not individual
              streams, that will constitute the Simulcast.</t>

              <t>Allowing that any one of the media configurations in a
              specific Simulcast is signaled inactive from the start of the
              session. This is defined as equivalent to the affected Source
              Packet Stream being in <xref
              target="I-D.westerlund-avtext-rtp-stream-pause">PAUSED
              state</xref>.</t>

              <t>Adding and/or modifying SDP media descriptions as needed to
              accommodate the negotiated Simulcast streams.</t>

              <t>Parameter limits to the aggregate of media configurations are
              signaled by existing SDP attributes on session and media
              description level.</t>

              <t>Including media level indication of maximum number of
              available encoding/decoding resources on SDP media level. They
              MAY be modified compared to the pre-stage Offer/Answer, if
              any.</t>

              <t>Identifying which Source Packet Stream corresponds to which
              media configuration by including the configuration label as part
              of the SDES item <xref
              target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref>
              information included in the RTP and RTCP packets. The optional
              mechanism for source specific signaling defined in SRCNAME
              could be used to let a Simulcast sender pre-announce such a
              relationship before sending the Source Packet Stream.</t>
            </list></t>

          <t>Adding Simulcast information to the Source Packet Stream:<list
              style="symbols">
              <t>Identifying Source Packet Streams from same Media Source
              using the new RTCP SDES Item <xref
              target="I-D.westerlund-avtext-rtcp-sdes-srcname">SRCNAME</xref>,
              and as described there including the possibility to send the
              same information as an <xref target="RFC5285">RTP Header
              Extension</xref>.</t>

              <t>Using <xref
              target="I-D.westerlund-avtext-rtp-stream-pause">PAUSE/RESUME</xref>
              functionality to temporarily turn individual Simulcast versions
              on or off.</t>
            </list></t>
        </list></t>
    </section>

    <section anchor="sec-signaling" title="Proposed Signaling">
      <t>This section further details the signaling solution outlined <xref
      target="sec-solution">above</xref>.</t>

      <section title="Simulcast Capability">
        <t>There are numerous media properties that can be varied to construct
        a set of Simulcast versions. A Simulcast enabled endpoint could also
        support Simulcast based on several of those properties. If those
        properties are relatively independent and each Simulcast version
        needs explicit definition in the SDP, this would lead to an
        exponential number of Simulcast version candidates and a very long SDP
        that is likely also hard to interpret. There is thus a need to limit
        the Simulcast version candidates included in the SDP to cover as small
        a set of properties as possible.</t>

        <t>If a legacy endpoint not supporting Simulcast were to be presented
        with an SDP including media descriptions for a set of Simulcast
        versions, it may not know how to correctly handle or interpret these
        "surplus" media descriptions.</t>

        <t>Based on the functionality that Simulcast is intended to achieve,
        it should be clear that the reasons to send Simulcast versions are not
        the same as to receive Simulcast versions, seen from a single
        endpoint.</t>

        <t>For these reasons, it is proposed to define two new SDP session
        level attributes, "a=sim-send-cap" and "a=sim-recv-cap", which
        explicitly signal support for Simulcast media transmission and
        Simulcast media reception, respectively, for that media description.
        "a=sim-send-cap" and "a=sim-recv-cap" MAY be used independently and
        simultaneously. These attributes are also proposed to have parameters
        indicating the media properties used to create the Simulcast versions,
        and their preferred ranking. The meaning of the attributes on SDP
        media level is undefined, and they MUST NOT be used there.</t>

        <figure anchor="fig-abnf-cap" title="ABNF for Simulcast Capability">
          <artwork><![CDATA[
simulcast-cap   = "a="( "sim-send-cap:" / "sim-recv-cap:" ) 
                  cap-prop-list
cap-prop-list   = cap-prop-entry *(WSP cap-prop-entry)
cap-prop-entry  = cap-prop ["=" q-value]
cap-prop        = "rtpmap"
                / "fmtp"
                / "imageattr"
                / "framerate"
                / token ; for future extensions
q-value         = ( "0" "." 1*2DIGIT )
                / ( "1" "." 1*2("0") )
                ; Values between 0.00 and 1.00
; WSP and DIGIT defined in [RFC5234]
; token defined in [RFC4566]

]]></artwork>
        </figure>

        <t>The media property values are taken from existing (and could be
        extended to cover other or future) SDP attributes that express media
        properties that can be varied to create different Simulcast
        versions:<list style="hanging">
            <t hangText="rtpmap:">Differences in codec type, sampling rate
            (see <xref target="sec-requirements"/>), and number of
            channels.</t>

            <t hangText="fmtp:">Differences in codec-specific encoding
            parameters.</t>

            <t hangText="imageattr:">Differences in video resolution and
            aspect ratio <xref target="RFC6236"/>.</t>

            <t hangText="framerate:">Differences in framerate.</t>
          </list></t>

        <t>The optional q-value expresses the relative preference to base a
        Simulcast version on that media property, with 1.00 meaning maximum
        (100%) preference and 0.00 meaning no (0%) preference. Several media
        properties can share the same q-value, in which case they are equally
        preferred. Not including any q-value for a media property value SHALL
        default to a q-value of 1.00.</t>

        <t>The list of media properties is made extensible, to allow
        introducing additional dimensions for Simulcast versions.</t>
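
        <t>For illustration only, the following attribute lines are
        consistent with the ABNF in <xref target="fig-abnf-cap"/>. The
        selected media properties and q-values are arbitrary assumptions and
        do not represent recommended settings. The first line ranks
        "imageattr" ahead of "framerate" as a basis for sent Simulcast
        versions. In the second line, "rtpmap" carries no q-value and thus
        defaults to 1.00, so it is preferred over "fmtp".</t>

        <figure>
          <artwork><![CDATA[
a=sim-send-cap:imageattr=1.0 framerate=0.5
a=sim-recv-cap:rtpmap fmtp=0.75
]]></artwork>
        </figure>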

        <section title="Declarative Use">
          <t>When used as a declarative media description, sim-recv-cap
          indicates the configured end-point's required capability to
          recognize and receive a specified set of Source Packet Streams as
          Simulcast streams. In the same fashion, sim-send-cap requests the
          end-point to send a specified set of Source Packet Streams as
          Simulcast streams. sim-recv-cap and sim-send-cap MAY be used
          independently and at the same time and they need not specify the
          same capability properties.</t>
        </section>

        <section title="Offer/Answer Use">
          <t>An offerer wanting to use Simulcast SHALL include one or both
          of those attributes, depending on the direction(s) in which
          Simulcast is both supported and desirable. An offerer that receives
          an answer without "a=sim-send-cap" or "a=sim-recv-cap" MUST NOT
          define or use any Simulcast alternatives in that direction to the
          answerer.</t>

          <t>An answerer that does not understand the concept of Simulcast
          will also not know those attributes and will remove them in the SDP
          answer, as defined in existing SDP Offer/Answer procedures. An
          answerer that does understand the attributes and that wants to
          support Simulcast in the indicated direction SHALL reverse
          directionality of the attribute; "sim-send-cap" becomes
          "sim-recv-cap" and vice versa, and include it in the answer.</t>

          <t>An offerer that intends to send Simulcast alternatives and thus
          includes "a=sim-send-cap", MUST also include at least one media
          property parameter that it intends to use to construct the Simulcast
          alternatives, but it MAY include more media property parameters.
          Including multiple media property parameters in "a=sim-send-cap"
          SHALL be interpreted as an offer to send Simulcast versions covering
          all combinations thereof, but MAY be further restricted by other
          information in the SDP, for example the number of
          simulcast-related media descriptions in the SDP or use of <xref
          target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
          signaling</xref>.</t>

          <t>An offerer that is capable of receiving Simulcast alternatives,
          and thus includes "a=sim-recv-cap", MUST also include at least one
          media property parameter that it is willing to use as discriminator
          between received Simulcast alternatives, and MAY include more.
          Including multiple media property parameters in "a=sim-recv-cap"
          SHALL be interpreted as an offer to receive Simulcast versions
          covering all combinations thereof, but this MAY be further
          restricted by other information in the SDP, such as the number of
          simulcast-related media descriptions in the SDP or use of <xref
          target="I-D.westerlund-mmusic-max-ssrc">max-ssrc
          signaling</xref>.</t>

          <t>An answerer that either lacks the capability or does not desire
          to use Simulcast versions based on a certain media property
          parameter in a specific direction MUST remove that media property
          parameter from the corresponding "a=sim-send-cap" or
          "a=sim-recv-cap". The answerer MUST NOT add any media property
          parameters that were not included in the offer.</t>
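
          <t>As a minimal sketch of these rules, assume an offerer that can
          send Simulcast versions differing in both resolution and framerate,
          and an answerer that is only willing to receive resolution-based
          Simulcast. The "Offer:" and "Answer:" labels are annotations and
          not part of the SDP, and only the capability lines are shown. The
          answerer reverses the attribute direction and removes the
          "framerate" media property parameter:</t>

          <figure>
            <artwork><![CDATA[
Offer:
a=sim-send-cap:imageattr framerate

Answer:
a=sim-recv-cap:imageattr

]]></artwork>
          </figure>

          <t>Had the answerer not understood Simulcast at all, the Answer
          would have contained no "a=sim-recv-cap" line, and the offerer
          would not send any Simulcast alternatives.</t>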

          <t>An answerer SHOULD take the offerer's q-values into account when
          choosing which <xref target="sec-media-config">media
          configurations</xref> to include in the answer and how to <xref
          target="sec-group-config">group them</xref> into the resulting
          Simulcast(s).</t>
        </section>
      </section>

      <section anchor="sec-media-config" title="Media Configuration">
        <t>Media that constitutes a Simulcast version has certain desirable
        characteristics that are meant to suit one category of <xref
        target="sec-diverse-receivers">diverse receivers</xref>. A receiver
        that is willing to receive Simulcast streams must be given sufficient
        means to express what it is capable of and desires to receive. A
        sender that is willing to send Simulcast streams must similarly be
        given sufficient means to express what it is capable of and desires to
        send.</t>

        <t>An obvious candidate to express those characteristics is the media
        format in an SDP media description, defined by the rtpmap and fmtp
        attributes, which is typically mapped to an RTP Payload Type. Some of
        the most interesting characteristics for Simulcast purposes are
        however not included in rtpmap or fmtp, but are instead defined as
        separate attributes. Some of those individual attributes can be
        directly related to a defined media format and could form a
        configuration together with the media format, but others cannot be
        related to a specific media format, so using the existing media
        format as a common identifier for a media configuration is not fully
        sufficient.</t>

        <t>Simulcast attempts to handle senders and receivers that belong to
        the vast multi-dimensional parameter space of "media configuration"
        by sub-dividing that parameter space into manageable and meaningful
        sub-sets. Communication between a sender and a receiver can be
        established successfully only when the media configuration (sub-set)
        that is actually sent fits within the receiver's available media
        configuration sub-set. At the same time, practical and implementation
        aspects often limit the size of those sub-sets. When the receiver or
        sender sub-set is either too small or not known, the probability of
        successful communication decreases significantly. To increase the
        probability of finding a match between sender and receiver media
        configurations, it is essential that a media configuration can be a
        set instead of a single point in the parameter space, i.e. that it
        can include parameter listings and/or ranges instead of single
        values.</t>

        <t>Therefore, it is proposed to define a new media level SDP
        attribute, "a=config-id", which relates the needed parameter types
        and the corresponding value ranges that together constitute a
        Simulcast media configuration. Each SDP media description MAY contain
        zero or more config-id attributes. The meaning of the attribute on SDP
        session level is undefined, and the attribute MUST NOT be used
        there.</t>

        <figure anchor="fig-abnf-config" title="ABNF for Media Configuration">
          <artwork><![CDATA[
configuration    = "a=config-id:" config-id WSP config-dir 
                    WSP config-list
config-id        = token
config-dir       = "send"
                 / "recv"
config-list      = config-entry *(WSP config-entry)
config-entry     = "pt" "=" pt-value *("," pt-value)
                 / image-attr
                 / "framerate" "=" fr-param
                 / "b" "=" bw-mod ":" bw-value *1("-" bw-value)
                 / ext-config-id [ "=" ext-config-value ] 
                    ; for future ext
image-attr       = "imageattr" "=" resolution-list
resolution-list  = resolution-set *("," resolution-set)
ext-config-id    = token
ext-config-value = non-ws-string
pt-value         = 1*3DIGIT ; could be made more strict
resolution-set   = "[" "x=" xyrange "," "y=" xyrange *key-values "]"
key-values       = ( "," key-value )
key-value        = ( "sar=" srange )
                 / ( "par=" prange )
                 / ( "q=" qvalue )
onetonine        = "1" / "2" / "3" / "4" / "5" 
                 / "6" / "7" / "8" / "9"
xyvalue          = onetonine *5DIGIT
step             = xyvalue
xyrange          = ( "[" xyvalue ":" [ step ":" ] xyvalue "]" )
                 / ( "[" xyvalue 1*( "," xyvalue ) "]" )
                 / ( xyvalue )
spvalue          = ( "0" "." onetonine *3DIGIT )
                 / ( onetonine "." 1*4DIGIT )
srange           =  ( "[" spvalue 1*( "," spvalue ) "]" )
                 / ( "[" spvalue "-" spvalue "]" )
                 / ( spvalue )
prange           =  ( "[" spvalue "-" spvalue "]" )
qvalue           = ( "0" "." 1*2DIGIT )
                 / ( "1" "." 1*2("0") )
fr-param         = fr-value *("," fr-value)
                 / fr-value "-" fr-value
fr-value         = 1*3DIGIT [ "." 1*2DIGIT ]
bw-mod           = "AS"
                 / "TIAS"
                 / token ; for future extensions
bw-value         = 1*DIGIT
; WSP, DQUOTE and DIGIT defined in [RFC5234]
; token and non-ws-string defined in [RFC4566]

]]></artwork>
        </figure>

        <t>A media configuration is thus identified by:<list style="hanging">
            <t hangText="config-id:">A token that identifies the media
            configuration, which MUST be unique across all media
            configurations and media descriptions in the SDP.</t>

            <t hangText="config-dir:">The direction of the stream(s) that the
            media configuration applies to, as seen from the party issuing
            the SDP.</t>
          </list></t>

        <t>The media configuration MUST contain at least one and MAY contain
        more of the media configuration entries below. An entry type MUST NOT
        appear more than once in any media configuration.</t>

        <t><list style="hanging">
            <t hangText="pt:">A comma-separated list of media formats (RTP
            payload types), each of which MUST be defined within the same
            media description as config-id. This describes the allowed set of
            codecs or codec configurations for this media configuration. This
            entry MUST be present in every media configuration.</t>

            <t hangText="imageattr:">An OPTIONAL listing of preferred image
            resolutions for this media configuration. This entry MUST NOT be
            used with media types other than video and image. An imageattr
            media configuration entry MUST NOT conflict with any
            "a=imageattr" attribute present in the same media
            description.</t>

            <t hangText="framerate:">An OPTIONAL range or enumeration of
            preferred framerates for this media configuration. This entry
            MUST NOT be used with media types other than video. The high end
            of a range MUST be equal to or larger than the low end. If an
            "a=framerate" attribute is present, an enumerating framerate
            media configuration entry MUST include its value, and a framerate
            range media configuration entry MUST include its value in the
            range.</t>

            <t hangText="b:">An acceptable bandwidth range for this media
            configuration. Any one of the defined bandwidth modifiers MAY be
            used, and it MUST share semantics with the corresponding
            bandwidth modifier of the SDP bandwidth attribute. The bandwidth
            value MUST be interpreted as defined by the bandwidth modifier.
            The high end of the range MUST be equal to or larger than the low
            end. The high end of the range MUST NOT exceed the bandwidth
            parameter in the same media description, if any. The sum of
            bandwidth range low ends for all media configurations within a
            media description MUST NOT exceed the value of that media
            description's bandwidth parameter. This entry MUST be present in
            every media configuration.</t>
          </list></t>

        <t>Media configuration entry types "pt" and "b" MUST be supported by
        all implementations of this specification. Otherwise, an
        implementation MAY ignore any media configuration entry types that are
        not understood. A media configuration MAY be re-used to describe more
        than a single Source Packet Stream.</t>
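
        <t>As an illustration of the syntax and entry rules above, the
        following sketch shows a single video media description with one
        media configuration. All values, including the config-id "hi", the
        port and the payload type, are hypothetical, and the config-id line
        is folded with a backslash for presentation only, as in the
        signaling examples of this document:</t>

        <figure>
          <artwork><![CDATA[
m=video 49300 RTP/AVP 97
b=AS:1500
a=rtpmap:97 H264/90000
a=config-id:hi send pt=97 imageattr=[x=1280,y=720] \
    framerate=25-30 b=AS:500-1500

]]></artwork>
        </figure>

        <t>Here the mandatory "pt" and "b" entries are both present, "pt"
        refers to a payload type defined in the same media description, each
        entry type appears only once, and the high end of the "b" range does
        not exceed the b=AS value of the media description.</t>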

        <section title="Simulcast Limitations">
          <t>The session and media level attributes and parameters outside of
          individual media configurations (a=config-id) provide limitations
          on the set of media configurations in simultaneous use. For
          example, a media description bandwidth limitation using b=AS
          applies to all the Packet Streams sent within the scope of that
          media description, thus forcing the media configurations in use to
          share that available bandwidth. Other Packet Streams within the
          same scope, such as RTP retransmission or FEC flows, also need to
          be included in that sum.</t>

          <t>There exist a number of different limitations, and this section
          does not intend to be complete. The payload formats and their
          configurations can impose limitations; for example, video profiles
          and levels impose a joint limit on bit-rate, frame-rate and
          resolution. The bandwidth parameters on session and media
          description level apply according to their semantics and their
          level. Packetization limitations, e.g. maxptime, as well as
          recommendations apply to all the configurations within the scope
          where the parameter is defined.</t>

          <t>It is important to note that limits expressed within a media
          configuration, such as bandwidth, are not limited by the media
          description values. First, the sum of bit-rates across all media
          configurations in a media description can be greater than the media
          description limit, as not all configurations may be in simultaneous
          use. For example, only a single configuration may be enabled, which
          is then allowed to consume the full outer limit. Second, the media
          configuration directionality needs to be taken into account; for
          example, SDP receiver limitations are not applied to a sender
          configuration.</t>
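
          <t>A small worked example may help to make the bandwidth
          relationship concrete. The sketch below uses hypothetical values
          and config-ids "p" and "q" in a media description limited to 520
          kbit/s:</t>

          <figure>
            <artwork><![CDATA[
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=config-id:p recv pt=97 b=AS:150-400
a=config-id:q recv pt=97 b=AS:100-150

]]></artwork>
          </figure>

          <t>The sum of the range high ends is 400 + 150 = 550 kbit/s, which
          is more than the media description limit of 520 kbit/s. This is
          acceptable, since the two media configurations need not be used at
          their maximum at the same time, and each individual high end stays
          within 520 kbit/s. The sum of the range low ends is 150 + 100 =
          250 kbit/s, so both can be in use simultaneously. When they are,
          the Packet Streams of "p" and "q" together, including any
          retransmission or FEC flows in the same scope, have to fit within
          the 520 kbit/s.</t>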
        </section>

        <section title="Declarative Use">
          <t>When used as a declarative media description, config-id with the
          recv parameter indicates the media configuration that the
          configured end-point is required to support in order to receive a
          specified set of Source Packet Streams as Simulcast streams. In the
          same fashion, config-id with the send parameter requests the
          end-point to use the specified media configuration when sending a
          specified set of Source Packet Streams as Simulcast streams.</t>
        </section>

        <section title="Offer/Answer Use">
          <t>An offerer wanting to use Simulcast in a specific direction SHALL
          use config-id to describe the media configurations to use in that
          direction in the Offer.</t>

          <t>An answerer that receives a config-id media configuration for a
          specific direction and accepts that media configuration SHALL
          include a corresponding media configuration with the reverse
          direction in the Answer. The config-id identification value MUST be
          kept unchanged between the Offer and the Answer. An answerer that
          does not accept a specific media configuration SHALL remove it from
          the Answer.</t>

          <t>The Answer MUST keep exactly the same media configuration entry
          types in a media configuration as were present in the corresponding
          media configuration in the Offer.</t>

          <t>The answerer MAY remove values from enumerations and MAY reduce
          ranges of media configuration entries in the Answer. If the reduced
          media configuration entry relates to the answerer's send direction,
          negotiation is complete and no further action is needed. If the
          reduced media configuration entry relates to the answerer's receive
          direction, the offerer SHOULD send another Offer in which the
          related send direction media configuration is reduced at least to
          the level in the previous Answer; it MAY be reduced even more, and
          MAY be removed entirely.</t>
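
          <t>The following sketch illustrates these rules with hypothetical
          values. Only the config-id lines are shown, the "Offer:" and
          "Answer:" labels are annotations rather than SDP, and lines are
          folded with a backslash for presentation only:</t>

          <figure>
            <artwork><![CDATA[
Offer:
a=config-id:x send pt=118 imageattr=[x=320,y=180],[x=640,y=360] \
    framerate=25-50 b=AS:200-500

Answer:
a=config-id:x recv pt=118 imageattr=[x=640,y=360] \
    framerate=25-50 b=AS:350-500

]]></artwork>
          </figure>

          <t>The answerer keeps the config-id value "x", reverses the
          direction from send to recv, and keeps the same four entry types.
          It removes one value from the imageattr enumeration and narrows
          the bandwidth range. Since the reduction relates to the answerer's
          receive direction, the offerer would be expected to follow up with
          a new Offer in which its send configuration "x" is reduced at
          least as much.</t>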
        </section>
      </section>

      <section anchor="sec-group-config"
               title="Grouping Simulcast Configurations">
        <t>A set of <xref target="sec-media-config">media
        configurations</xref> is needed to describe a Simulcast. Each Source
        Packet Stream in the Simulcast shares the same Media Source, but has
        a different media configuration. Thus, the actual grouping of media
        configurations is what defines a specific Simulcast. It is proposed to
        define two new media level and session level SDP attributes,
        "a=sim-send" and "a=sim-recv", which use config-id values to group
        media configurations for the purpose of Simulcast transmission and
        reception, respectively. "a=sim-send" and "a=sim-recv" MAY be used
        independently and simultaneously. They MAY be used on session level to
        group media configurations when different Simulcast encodings of a
        Media Source are to be sent in different Media Transports and RTP
        sessions. They MAY also be used on media level to group media
        configurations when different Simulcast encodings of a Media Source
        are to be sent based on the same media description and thus use the
        same Media Transport and RTP session. When used on media level, the
        Simulcast direction MAY conflict with the general media description
        direction, but such a conflict MUST be interpreted as the Simulcast
        being effectively inhibited. For example, sim-send in a recvonly
        media description means that no Simulcast Source Packet Streams are
        sent.</t>

        <figure anchor="fig-abnf-group"
                title="ABNF for Simulcast Configuration Grouping">
          <artwork><![CDATA[
simulcast         = "a="( "sim-send:" / "sim-recv:" ) config-id-list
config-id-list    = config-item *(WSP config-item)
config-item       = config-id [":" config-param-list]
config-id         = token
config-param-list = config-param *("," config-param)
config-param      = "inactive"
                  / token ["=" param-value] ; for future extension
param-value       = 1*(value-char)
                  / DQUOTE non-ws-string DQUOTE
value-char        = token-char / %x28 / %x29 / %x2F / %x3A-3C 
                  / %x3E-40 / %x5B-5D
                  ; VCHAR except DQUOTE, "=" and ","
; WSP, DQUOTE and VCHAR defined in [RFC5234]
; token, token-char and non-ws-string defined in [RFC4566]

]]></artwork>
        </figure>

        <t>The config-id identification of a media configuration MUST be
        defined by a "config-id" attribute in any of the media descriptions
        that are part of the SDP.</t>

        <section title="Declarative Use">
          <t>When used as a declarative media description, sim-recv indicates
          the configured end-point's required ability to receive Source Packet
          Streams with the specified set of media configurations as Simulcast
          streams. In the same fashion, sim-send requests the end-point to
          send Source Packet Streams with the specified set of media
          configurations as Simulcast streams.</t>

          <t>The configuration parameter "inactive" SHALL be interpreted to
          mean that the related Source Packet Stream is in <xref
          target="I-D.westerlund-avtext-rtp-stream-pause">PAUSED state</xref>
          at the start of the session, and applicable RTP level procedures
          from that specification SHALL be applied.</t>
        </section>

        <section title="Offer/Answer Use">
          <t>An offerer wanting to send a set of Source Packet Streams as
          Simulcast streams includes sim-send in the Offer to describe which
          media configurations to use for that Simulcast. Similarly, an
          offerer wanting to receive a set of Source Packet Streams as
          Simulcast streams includes sim-recv in the Offer to describe which
          media configurations to use for that Simulcast.</t>

          <t>An answerer that receives sim-send and accepts to receive those
          media configurations as Simulcast Source Packet Streams SHALL
          include sim-recv with the accepted media configurations in the
          Answer. Similarly, an answerer that receives sim-recv and accepts
          to send those media configurations as Simulcast Source Packet
          Streams SHALL include sim-send with the accepted media
          configurations in the Answer. An answerer MAY remove media
          configurations from sim-send or sim-recv included in the Answer
          compared to the ones included in the sim-send or sim-recv in the
          Offer. The answerer MUST NOT add any media configurations to
          sim-send or sim-recv in the Answer that were not in the
          corresponding ones in the Offer.</t>

          <t>An "inactive" parameter present in the Offer MUST be kept in the
          Answer. The Answer MAY add an "inactive" parameter to any of the
          media configurations. An "inactive" parameter on a media
          configuration in "sim-recv" is equivalent to a <xref
          target="I-D.westerlund-avtext-rtp-stream-pause">PAUSE (or in some
          cases, an equivalent TMMBR 0) message</xref> being sent for the
          received Source Packet Stream at the start of the session, and
          applicable RTP level procedures from that specification SHALL be
          applied. An "inactive" parameter on a media configuration in
          "sim-send" is equivalent to the related Source Packet Stream being
          in PAUSED state at the start of the session, and applicable RTP
          level procedures SHALL be applied.</t>
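
          <t>As a sketch of the grouping rules above, assume an offerer that
          wants to send a Simulcast of three media configurations with the
          hypothetical config-ids "a", "b" and "c", where "c" starts paused.
          The corresponding config-id lines are omitted for brevity, and the
          "Offer:" and "Answer:" labels are annotations rather than SDP:</t>

          <figure>
            <artwork><![CDATA[
Offer:
a=sim-send:a b c:inactive

Answer:
a=sim-recv:b c:inactive

]]></artwork>
          </figure>

          <t>The answerer reverses sim-send to sim-recv, removes media
          configuration "a" that it does not accept, adds no media
          configuration of its own, and keeps the "inactive" parameter on
          "c".</t>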

          <t>The number of different Source Packet Streams used for a
          Simulcast related to a single media description MUST NOT exceed the
          number of listed media configurations in the corresponding sim-recv
          in that media description sent by the media receiver.</t>
        </section>
      </section>

      <section anchor="sec-srcname" title="Relating Simulcast Versions">
        <t>To ensure that Simulcast Packet Streams can be related correctly on
        RTP level, <xref target="I-D.westerlund-avtext-rtcp-sdes-srcname">SDES
        SRCNAME</xref> MUST be used to label Simulcast versions belonging to
        the same Media Source. The RTP Header Extension option of that
        specification MAY be used with Simulcast.</t>

        <t>The SRCNAME identifier for Simulcast MUST contain a first part that
        uniquely identifies the Media Source within a given CNAME, followed by
        a single "." (period) and the config-id as defined <xref
        target="sec-media-config">above</xref>.</t>

        <t>The SRCNAME parameter to <xref target="RFC5576">source-specific
        signaling</xref> ("a=ssrc") MAY be used for Source Packet Streams in
        the send direction to relate SRCNAME to SSRC already in the SDP.</t>
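
        <t>As an illustration of the SRCNAME structure, assume a sender with
        one Media Source that it identifies as "1" within its CNAME, sent as
        a Simulcast of two media configurations with the hypothetical
        config-ids "hi" and "lo". The last two lines of the sketch below are
        annotations and not SDP:</t>

        <figure>
          <artwork><![CDATA[
a=config-id:hi send pt=97 b=AS:500-1500
a=config-id:lo send pt=97 b=AS:100-500
a=sim-send:hi lo

SRCNAME of the Source Packet Stream using "hi":  1.hi
SRCNAME of the Source Packet Stream using "lo":  1.lo

]]></artwork>
        </figure>

        <t>Both SRCNAME values share the first part, which tells a receiver
        that the two Source Packet Streams are Simulcast versions of the same
        Media Source, while the part after the period identifies the media
        configuration in use.</t>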
      </section>

      <section anchor="sec-two-phase" title="Two-Phase Negotiation">
        <t>The new "a=sim-send-cap" and "a=sim-recv-cap" attributes MAY be
        included in the SDP as an optional pre-stage in a two-phase approach,
        where the pre-stage involves a first SDP Offer/Answer procedure that
        only establishes Simulcast capability at both the offerer and the
        answerer. This has the additional advantage of avoiding sending media
        descriptions related to Simulcast to an endpoint that does not support
        Simulcast. In case two Offer/Answer procedures are already used for
        other reasons, the pre-stage will not incur any significant extra
        signaling round-trips. Such other two-phase techniques include use of
        SIP OPTIONS, <xref target="RFC3311">SIP UPDATE</xref> with reliable
        provisional responses, and <xref
        target="I-D.ietf-mmusic-sdp-bundle-negotiation">BUNDLE</xref>.</t>

        <t>Thus, a pre-stage Offer/Answer SHOULD NOT include any
        simulcast-grouped media descriptions, which SHOULD instead be added
        in a main Offer/Answer phase. When using the pre-stage Offer/Answer,
        half a signaling round-trip time can sometimes be saved if the main
        phase is initiated by the Simulcast receiver, meaning that the
        endpoint that included "a=sim-recv-cap" in the pre-stage SDP is the
        offerer in the main phase. If both endpoints are Simulcast receivers,
        it does not matter which endpoint sends the main Offer; regular
        Offer/Answer rules are used to handle any race conditions.</t>

        <t>It is not possible to use a pre-stage to establish capability
        with declarative SDP, in which case the pre-stage SHALL be bypassed
        and only the main phase used directly.</t>
      </section>

      <section title="Signaling Examples">
        <t>These examples cover a client connecting to a video conference
        service that uses a centralized media topology with an RTP mixer.</t>

        <figure align="center" anchor="fig-mixer-four-party"
                title="Four-party Mixer-based Conference">
          <artwork><![CDATA[
+---+      +-----------+      +---+
| A |<---->|           |<---->| B |
+---+      |           |      +---+
           |   Mixer   |
+---+      |           |      +---+
| F |<---->|           |<---->| J |
+---+      +-----------+      +---+]]></artwork>
        </figure>

        <section anchor="sec-ex-unified-plan" title="Unified Plan Client">
          <t>Alice is calling in to the mixer with a Simulcast-enabled Unified
          Plan client capable of a single Media Source per media type. The
          only difference from a non-Simulcast client is the capability to
          send <xref target="RFC6236">video resolution</xref> ("imageattr")
          and framerate based Simulcast. Alice uses a pre-stage Offer, which
          looks like:</t>

          <figure anchor="fig-up-first-offer"
                  title="Unified Plan Simulcast Pre-Stage Offer">
            <artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:665
a=sim-send-cap:imageattr framerate
m=audio 49200 RTP/AVP 96 8
b=AS:145
a=rtpmap:96 G719/48000/2
a=rtpmap:8 PCMA/8000
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
    recv [x=640,y=360] [x=320,y=180]

]]></artwork>
          </figure>

          <t>In this pre-stage, the only thing in the SDP that indicates
          Simulcast capability is the line containing the "sim-send-cap"
          attribute, which also indicates that sent Simulcast versions can
          differ in video resolution and/or framerate.</t>

          <t>The Answer from the server indicates both that it too is
          Simulcast capable and that it would prefer to use video resolution
          ("imageattr") based Simulcast, but that it supports both video
          resolution and framerate. Should it not have been Simulcast capable,
          the "a=sim-recv-cap" line would not have been present and
          communication would have started with the media negotiated in the
          SDP.</t>

          <figure anchor="fig-up-first-answer"
                  title="Unified Plan Simulcast Pre-Stage Answer">
            <artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Answer to Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:665
a=sim-recv-cap:imageattr=1.0 framerate=0.8
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
    recv [x=640,y=360] [x=320,y=180]

]]></artwork>
          </figure>

          <t>Since the server is the Simulcast media receiver, it immediately
          initiates another Offer/Answer including details on the Simulcast
          versions. The server also keeps the "sim-recv-cap" as explicit
          Simulcast capability indication in this main Offer/Answer. Note that
          the "non-simulcast" media can be started already now, before the
          main Offer/Answer, with the only restriction that the Simulcast
          functionality is not yet established.</t>

          <figure anchor="fig-up-main-offer"
                  title="Unified Plan Simulcast Main Offer">
            <artwork><![CDATA[
v=0
o=server 823479283 1209384938 IN IP4 192.0.2.2
s=Server Inviting Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.43
b=AS:825
a=sim-recv-cap:imageattr=1.0 framerate=0.8
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:2200
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=config-id:a recv pt=97 imageattr=[x=640,y=360],[x=1280,y=720] \
    framerate=25-60 b=AS:500-2500
a=config-id:b recv pt=97 imageattr=[x=320,y=180],[x=640,y=360] \
    framerate=25-60 b=AS:150-500
a=config-id:c recv pt=97 imageattr=[x=256,y=144],[x=320,y=180] \
    framerate=10-30 b=AS:100-250
a=sim-recv:a b c

]]></artwork>
          </figure>

          <t>The server chooses to structure the Offer according to Unified
          Plan and has added three config-id lines in the video media
          description, one for each Simulcast media configuration that it is
          prepared to receive. Each media configuration refers to a defined
          media format, and lists a set of preferred video resolutions as well
          as a range of acceptable framerates, concluded by a bandwidth range.
          The Offer also includes the sim-recv attribute for those three
          media configurations, indicating that the Simulcast that the server
          is prepared to receive in this media description can include one or
          more of those media configurations.</t>

          <t>Alice's Answer is:</t>

          <figure anchor="fig-up-main-answer"
                  title="Unified Plan Simulcast Main Answer">
            <artwork><![CDATA[
v=0
o=alice 2362969037 2362969040 IN IP4 192.0.2.156
s=Final answer from Simulcast Enabled Unified Plan Client
t=0 0
c=IN IP4 192.0.2.156
b=AS:825
a=sim-send-cap:imageattr framerate
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 97
b=AS:520
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42c01e
a=config-id:b send pt=97 imageattr=[x=640,y=360] \
    framerate=25-30 b=AS:150-400
a=config-id:c send pt=97 imageattr=[x=320,y=180] \
    framerate=10-12.5 b=AS:100-150
a=sim-send:b c:inactive
a=ssrc:31053821 cname=SDIe93850aQFid9P srcname=1.b
a=ssrc:43298172 cname=SDIe93850aQFid9P srcname=1.c
a=imageattr:97 send [x=640,y=360] [x=320,y=180] \
    recv [x=640,y=360] [x=320,y=180]

]]></artwork>
          </figure>

          <t>The Simulcast capability, sim-send-cap, is kept from Alice's
          previous Offer. One of the media configurations from the server
          Offer, config-id:a, is not acceptable to Alice's client for some
          reason and is removed from the Answer. The resulting Simulcast,
          described by sim-send, thus contains two media configurations, b and
          c, where c is initially set to "inactive", which effectively means
          that it is paused from the start of the session. The media
          configuration parameter value ranges are in some cases reduced,
          which gives a more precise definition of what will actually be
          sent. This Answer SDP also includes a specification of the SSRC
          values that will be sent and what media configurations those SSRCs
          will carry, by including the srcname parameter. The first part of
          srcname, before the ".", is the Media Source identification. Both
          SSRCs share the same Media Source identification, since they are
          part of the same Simulcast. The second part, after the ".", is the
          config-id of the media configuration sent with that SSRC.</t>
        </section>

        <section anchor="sec-ex-multi-transport"
                 title="Multi-Transport Client">
          <t>Bob is calling in to the mixer with a Simulcast-enabled client
          that, like Alice's, is capable of a single Media Source per media
          type, but is also capable of sending Source Packet Streams as
          Simulcast versions on separate Media Transports. In this example,
          Bob's client knows that the server is capable of Simulcast and does
          not use any pre-stage Offer, but goes straight to the main
          Offer.</t>

          <figure anchor="fig-mt-main-offer"
                  title="Multi-Transport Simulcast Main Offer">
            <artwork><![CDATA[
v=0
o=bob 94572932847 3429478298 IN IP4 192.0.2.93
s=Offer from Simulcast Enabled Multi-Transport Client
t=0 0
c=IN IP4 192.0.2.93
b=AS:825
a=sim-send-cap:imageattr=1.0 framerate=0.9
a=sim-send:x y
m=audio 50138 RTP/AVP 101
b=AS:145
a=rtpmap:101 G719/48000/2
m=video 50226 RTP/AVP 118
b=AS:500
a=rtpmap:118 H264/90000
a=fmtp:118 profile-level-id=42c01e
a=config-id:x send pt=118 imageattr=[x=320,y=180],[x=640,y=360] \
    framerate=25-50 b=AS:200-500
a=ssrc:3929384298 cname=Nsdko39Oen828FKn srcname=M.x
a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
    recv [x=640,y=360] [x=320,y=180]
m=video 50228 RTP/AVP 119
b=AS:150
a=rtpmap:119 H264/90000
a=fmtp:119 profile-level-id=42c01e
a=config-id:y send pt=119 imageattr=[x=256,y=144],[x=320,y=180] \
    framerate=12.5-25 b=AS:100-200
a=ssrc:1923419284 cname=Nsdko39Oen828FKn srcname=M.y
a=imageattr:119 send [x=320,y=180] [x=256,y=144]
a=sendonly

]]></artwork>
          </figure>

          <t>As can be seen from above, this Offer uses sim-send on session
          level and has split the Simulcast media configurations across two
          media descriptions, in order to be able to use separate Media
          Transports and enable differentiated treatment of the two Simulcast
          streams.</t>

          <t>The server accepts this structure in its Answer:</t>

          <figure anchor="fig-mt-main-answer"
                  title="Multi-Transport Simulcast Main Answer">
            <artwork><![CDATA[
v=0
o=server 283479882 9384298374 IN IP4 192.0.2.2
s=Server Answering Simulcast Enabled Multi-Transport Client
t=0 0
c=IN IP4 192.0.2.45
b=AS:825
a=sim-recv-cap:imageattr framerate
a=sim-recv:x y
m=audio 49200 RTP/AVP 96
b=AS:145
a=rtpmap:96 G719/48000/2
m=video 49300 RTP/AVP 118
b=AS:500
a=rtpmap:118 H264/90000
a=fmtp:118 profile-level-id=42c01e
a=config-id:x recv pt=118 imageattr=[x=640,y=360] \
    framerate=25-50 b=AS:350-500
a=imageattr:118 send [x=640,y=360] [x=320,y=180] \
    recv [x=640,y=360] [x=320,y=180]
m=video 49300 RTP/AVP 119
b=AS:150
a=rtpmap:119 H264/90000
a=fmtp:119 profile-level-id=42c01e
a=config-id:y recv pt=119 imageattr=[x=256,y=144] \
    framerate=12.5-25 b=AS:120-150
a=imageattr:119 recv [x=320,y=180] [x=256,y=144]
a=recvonly

]]></artwork>
          </figure>
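
          <t>As a non-normative illustration, the following Python sketch
          shows how an implementation might collect the config-id attributes
          per media description from SDP such as the Offer and Answer above.
          The attribute layout is taken from those examples only. The names
          unfold, parse_config_ids, and SAMPLE_OFFER, and the shape of the
          returned dictionary, are assumptions made for this sketch, and a
          trailing backslash is assumed to be line folding of the artwork
          rather than part of the SDP.</t>

          <figure>
            <artwork><![CDATA[
```python
import re

# Excerpt of the Main Offer above; the raw string keeps the
# folding backslashes exactly as they appear in the artwork.
SAMPLE_OFFER = r"""m=video 50226 RTP/AVP 118
a=config-id:x send pt=118 imageattr=[x=320,y=180],[x=640,y=360] \
    framerate=25-50 b=AS:200-500
m=video 50228 RTP/AVP 119
a=config-id:y send pt=119 imageattr=[x=256,y=144],[x=320,y=180] \
    framerate=12.5-25 b=AS:100-200
"""


def unfold(sdp_text):
    """Join lines that were folded with a trailing backslash."""
    return re.sub(r"\s*\\\n\s*", " ", sdp_text)


def parse_config_ids(sdp_text):
    """Return {config-id: {"port", "direction", "params"}}."""
    configs = {}
    port = None
    for line in unfold(sdp_text).splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            port = int(line.split()[1])
        elif line.startswith("a=config-id:"):
            fields = line[len("a=config-id:"):].split()
            ident, direction = fields[0], fields[1]
            # Remaining fields are name=value pairs; split on the
            # first "=" only, since values may contain "=" as well.
            params = dict(f.split("=", 1) for f in fields[2:])
            configs[ident] = {
                "port": port,
                "direction": direction,
                "params": params,
            }
    return configs


configs = parse_config_ids(SAMPLE_OFFER)
```
]]></artwork>
          </figure>

          <t>With the excerpt above, the two configurations end up associated
          with different ports, which is what allows them to be carried on
          separate Media Transports.</t>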

          <t/>
        </section>

        <section title="Multi-Source Client">
          <t>Fred is calling in to the same conference as in the examples
          above with a three-camera, three-display system, thus capable of
          handling three separate Media Sources in each direction, where each
          Media Source is also Simulcast-enabled in the send direction. Fred's
          client is a Unified Plan client, restricted to a single Media Source
          per media description.</t>

          <figure anchor="fig-ms-main-offer"
                  title="Fred's Multi-Source Simulcast Main Offer">
            <artwork><![CDATA[
v=0
o=fred 238947129 823479223 IN IP4 192.0.2.125
s=Offer from Simulcast Enabled Multi-Source Client
t=0 0
c=IN IP4 192.0.2.125
b=AS:825
a=sim-send-cap:imageattr=1.0 framerate=0.5

m=audio 49200 RTP/AVP 98
b=AS:145
a=rtpmap:98 G719/48000/2

m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:1h send pt=100 imageattr=[x=1920,y=1080] \
    framerate=30-60 b=AS:2000-3500
a=config-id:1m send pt=100 imageattr=[x=1280,y=720] \
    framerate=15-60 b=AS:1000-2000
a=config-id:1l send pt=100 imageattr=[x=640,y=360] \
    framerate=10-60 b=AS:200-1000
a=sim-send:1h 1m 1l
a=ssrc:2397234521 cname=EkeS32892FeO29DK srcname=1.1h
a=ssrc:1023894789 cname=EkeS32892FeO29DK srcname=1.1m
a=ssrc:4029284928 cname=EkeS32892FeO29DK srcname=1.1l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
    recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]

m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:2h send pt=100 imageattr=[x=1920,y=1080] \
    framerate=30-60 b=AS:2000-3500
a=config-id:2m send pt=100 imageattr=[x=1280,y=720] \
    framerate=15-60 b=AS:1000-2000
a=config-id:2l send pt=100 imageattr=[x=640,y=360] \
    framerate=10-60 b=AS:200-1000
a=sim-send:2h 2m 2l
a=ssrc:2301017618 cname=EkeS32892FeO29DK srcname=2.2h
a=ssrc:639711316 cname=EkeS32892FeO29DK srcname=2.2m
a=ssrc:3293473905 cname=EkeS32892FeO29DK srcname=2.2l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
    recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]

m=video 49600 RTP/AVP 100
b=AS:3500
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42c02a
a=config-id:3h send pt=100 imageattr=[x=1920,y=1080] \
    framerate=30-60 b=AS:2000-3500
a=config-id:3m send pt=100 imageattr=[x=1280,y=720] \
    framerate=15-60 b=AS:1000-2000
a=config-id:3l send pt=100 imageattr=[x=640,y=360] \
    framerate=10-60 b=AS:200-1000
a=sim-send:3h 3m 3l
a=ssrc:4115355057 cname=EkeS32892FeO29DK srcname=3.3h
a=ssrc:3196538337 cname=EkeS32892FeO29DK srcname=3.3m
a=ssrc:3757973912 cname=EkeS32892FeO29DK srcname=3.3l
a=imageattr:100 send [x=1920,y=1080] [x=1280,y=720] [x=640,y=360] \
    recv [x=1920,y=1080] [x=1280,y=720] [x=640,y=360]

]]></artwork>
          </figure>

          <t>The three media descriptions for video are essentially the same,
          except that values that need to be unique are given unique values.
          The above also assumes that BUNDLE will be used across these three
          video media descriptions to create a common RTP session.</t>
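
          <t>Uniqueness of this kind can be checked mechanically. The
          following Python sketch is a non-normative illustration; the helper
          name find_duplicates and the excerpt EXCERPT are hypothetical, and
          the sketch only inspects config-id identifiers and SSRC values as
          they are written in the example above.</t>

          <figure>
            <artwork><![CDATA[
```python
from collections import Counter

# A few lines taken from Fred's Offer above.
EXCERPT = """a=config-id:1h send pt=100 imageattr=[x=1920,y=1080]
a=ssrc:2397234521 cname=EkeS32892FeO29DK srcname=1.1h
a=config-id:2h send pt=100 imageattr=[x=1920,y=1080]
a=ssrc:2301017618 cname=EkeS32892FeO29DK srcname=2.2h
a=config-id:3h send pt=100 imageattr=[x=1920,y=1080]
a=ssrc:4115355057 cname=EkeS32892FeO29DK srcname=3.3h
"""


def find_duplicates(sdp_text):
    """Return config-ids and SSRCs that occur more than once."""
    config_ids = Counter()
    ssrcs = Counter()
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=config-id:"):
            ident = line[len("a=config-id:"):].split()[0]
            config_ids[ident] += 1
        elif line.startswith("a=ssrc:"):
            ssrc = line[len("a=ssrc:"):].split()[0]
            ssrcs[ssrc] += 1
    return {
        "config-id": sorted(k for k, n in config_ids.items() if n > 1),
        "ssrc": sorted(k for k, n in ssrcs.items() if n > 1),
    }


duplicates = find_duplicates(EXCERPT)
```
]]></artwork>
          </figure>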
        </section>
      </section>
    </section>

    <section anchor="sec-network-aspects" title="Network Aspects">
      <t>Simulcast is defined as the act of sending multiple alternative
      encodings of the same underlying media source. When transmitting
      multiple independent streams that originate from the same source, it
      could potentially be done in several different ways using RTP. A general
      discussion on considerations for use of the different RTP multiplexing
      alternatives can be found in <xref
      target="I-D.ietf-avtcore-multiplex-guidelines">Guidelines for
      Multiplexing in RTP</xref>. Discussion and clarification on how to
      handle multiple streams in an RTP session can be found in <xref
      target="I-D.ietf-avtcore-rtp-multi-stream"/>.</t>

      <t>The network aspects that are relevant for Simulcast are:<list
          style="hanging">
          <t hangText="Quality of Service:">When using Simulcast it might be
          of interest to prioritize a particular Simulcast version, rather
          than applying equal treatment to all versions. For example, lower
          bit-rate versions may be prioritized over higher bit-rate versions
          to minimize congestion or packet losses in the low bit-rate
          versions. Thus, there is a benefit in using a Simulcast solution
          that supports QoS as well as possible. By separating Simulcast
          versions into different RTP sessions and sending those RTP sessions over
          different Media Transports, a Simulcast version can be prioritized
          by existing flow based QoS mechanisms. When using unicast, QoS
          mechanisms based on individual packet marking are also feasible,
          which do not require separation of Simulcast versions into different
          RTP sessions to apply different QoS.</t>

          <t hangText="NAT/FW Traversal:">Using multiple RTP sessions will
          incur more cost for NAT/FW traversal unless they can re-use the same
          transport flow, which can be achieved by either one of <xref
          target="I-D.westerlund-avtcore-transport-multiplexing">multiplexing
          multiple RTP sessions on a single lower layer transport</xref> or
          <xref target="I-D.ietf-mmusic-sdp-bundle-negotiation">Multiplexing
          Negotiation Using SDP Port Numbers</xref>. If flow based QoS with
          any differentiation is desirable, the cost for additional transport
          flows is likely necessary.</t>

          <t hangText="Multicast:">Multiple RTP sessions will be required to
          enable combining Simulcast with multicast. Different Simulcast
          versions have to be separated to different multicast groups to allow
          a multicast receiver to pick the version it wants, rather than
          receive all of them. In this case, the only reasonable
          implementation is to use different RTP sessions for each multicast
          group so that reporting and other RTCP functions operate as
          intended.</t>
        </list></t>
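
      <t>As a non-normative illustration of the flow-based approach, the
      Python sketch below opens one UDP socket per Simulcast version and
      requests a different DSCP on each, so that each Media Transport can be
      treated differently by the network. The version names, the DSCP choices
      (AF41 and AF42), and the helper names tos_byte and open_marked_socket
      are assumptions made for this sketch and are not recommendations of
      this document. Whether a marking requested through IP_TOS is applied
      and honored depends on the operating system and the network.</t>

      <figure>
        <artwork><![CDATA[
```python
import socket

# Hypothetical plan: the low bit-rate version gets the lower drop
# precedence. 34 is AF41 and 36 is AF42.
DSCP_PLAN = {"low": 34, "high": 36}


def tos_byte(dscp):
    """Place the 6-bit DSCP in the upper bits of the IPv4 TOS octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_socket(version, bind_addr=("0.0.0.0", 0)):
    """Open a UDP socket that requests the DSCP planned for version."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    tos_byte(DSCP_PLAN[version]))
    sock.bind(bind_addr)
    return sock
```
]]></artwork>
      </figure>

      <t>A sender using this sketch would call open_marked_socket once per
      Simulcast version and send each RTP session on its own socket.</t>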

      <t/>
    </section>

    <section anchor="sec-iana" title="IANA Considerations">
      <t>This document requests registration of five new SDP attributes:
      sim-send-cap, sim-recv-cap, sim-send, sim-recv, and config-id. It also
      requests creation of a new registry of defined parameters, taken from
      existing SDP attributes, for sim-send-cap, sim-recv-cap, and
      config-id.</t>

      <t>Formal registrations to be written.</t>
    </section>

    <section anchor="sec-security" title="Security Considerations">
      <t>The Simulcast capability and configuration attributes and parameters
      are vulnerable to attacks in signaling.</t>

      <t>A false inclusion of Simulcast attributes may result in generation of
      a second phase SDP that potentially contains a large number of
      non-supported media descriptions expressing Simulcast alternatives. A
      correct SDP implementation will however be able to reject any
      non-supported media descriptions and the effect from that should be
      limited.</t>

      <t>A hostile removal of the Simulcast attributes will result in any
      second phase Offer/Answer being skipped and in Simulcast not being
      used.</t>

      <t>The Simulcast grouping semantics are vulnerable to attacks in the
      signaling. Changing the set of media configurations that are used in a
      Simulcast will impact the number of Source Packet Streams.</t>

      <t>A hostile removal of Simulcast grouping will prevent streams from
      being interpreted as Simulcast, which obviously prevents use of the
      Simulcast functionality. It will also risk that intended Simulcast
      streams are instead presented as separate, independent streams to a
      receiver.</t>

      <t>None of the above is likely to have any major consequences, and all
      can be mitigated by signaling that is at least integrity protected and
      source authenticated, which prevents an attacker from changing it.</t>
    </section>

    <section title="Contributors">
      <t>Morgan Lindqvist and Fredrik Jansson, both from Ericsson,
      contributed important material to the first versions of this
      document.</t>
    </section>

    <section anchor="sec-ack" title="Acknowledgements">
      <t/>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3311'?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.4566'?>

      <?rfc include='reference.RFC.4568'?>

      <?rfc include='reference.RFC.5109'?>

      <?rfc include='reference.RFC.5234'?>

      <?rfc include='reference.RFC.5285'?>

      <?rfc include='reference.RFC.5576'?>

      <?rfc include='reference.RFC.5888'?>

      <?rfc include='reference.RFC.6236'?>

      <?rfc include='reference.I-D.westerlund-avtext-rtcp-sdes-srcname'?>

      <?rfc include='reference.I-D.westerlund-mmusic-max-ssrc'?>

      <?rfc include='reference.I-D.westerlund-avtext-rtp-stream-pause'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.3264'?>

      <?rfc include='reference.RFC.3569'?>

      <?rfc include='reference.RFC.4588'?>

      <?rfc include='reference.RFC.5117'?>

      <?rfc include='reference.RFC.5245'?>

      <?rfc include='reference.RFC.6190'?>

      <?rfc include='reference.I-D.ietf-avtcore-multiplex-guidelines'?>

      <?rfc include='reference.I-D.ietf-avtcore-rtp-multi-stream'?>

      <?rfc include='reference.I-D.westerlund-avtcore-transport-multiplexing'?>

      <?rfc include='reference.I-D.ietf-avtcore-rtp-topologies-update'?>

      <?rfc include='reference.I-D.ietf-mmusic-sdp-bundle-negotiation'?>

      <?rfc include='reference.I-D.lennox-raiarea-rtp-grouping-taxonomy'?>
    </references>

    <section anchor="appendix-a" title="Discussion on Receiver Diversity">
      <t>Receiver diversity can be handled in a number of different ways, each
      with its own advantages and disadvantages. The approaches trade off RTP
      Mixer processing requirements, bandwidth usage on the uplink from the
      sending Participant to the RTP Mixer, bandwidth usage on the downlink
      from the RTP Mixer to the receiving Participant, and media Quality of
      Experience (QoE) at the receiving Participant.</t>

      <t>The following is a listing of possible approaches:<list
          style="numbers">
          <t>Lowest Common Denominator: Create a single Source Packet Stream
          per Media Source and, assuming that everyone can receive a "simple"
          stream, adapt the characteristics of that Source Packet Stream
          already at the sending Participant to the lowest common denominator
          among all receiving Participants. Let the RTP Mixer forward this
          single Source Packet Stream to all receiving Participants. The
          advantages are low bandwidth usage on both uplink and downlink and
          low RTP Mixer processing requirements. The disadvantage is that the
          least capable receiver and/or network path dictates the (low) QoE
          for everyone else.</t>

          <t>Individual Transcoding: Create a single Source Packet Stream per
          Media Source with characteristics governed by resources available to
          the sending Participant and the network path to the RTP Mixer. Let
          the RTP Mixer transcode (decode and re-encode) that into individual
          Source Packet Streams for each receiving Participant, governed by
          the RTP Mixer resources, receiving Participant resources, and the
          network path to that Participant. The advantages are a QoE adapted
          to each Participant, although slightly lowered overall due to
          transcoding, and optimised bandwidth usage on both uplink and
          downlink. The disadvantage is (very) high RTP Mixer processing
          requirements.</t>

          <t>Individual Simulcast: Create individual Source Packet Streams of
          each Media Source to each receiving Participant, constituting a
          complete individual Simulcast. Let the RTP Mixer forward each
          individual Source Packet Stream to the targeted receiving
          Participant. The advantages are low RTP Mixer processing and
          optimised downlink bandwidth. The disadvantage is (very) high uplink
          bandwidth.</t>

          <t>Grouped Simulcast: For each Media Source, create a "suitable"
          logical grouping of receiving Participants in sub-groups with
          respect to available receiver resources, for example the resources
          listed <xref target="sec-diverse-receivers">above</xref>. Create a
          set of Source Packet Streams for this Media Source with well-chosen
          characteristics, where each Source Packet Stream in the set is a
          good-enough fit to the receiving sub-group of Participants. This set
          of Source Packet Streams constitutes a Simulcast of the Media
          Source. The size of the set and the characteristics of each Source
          Packet Stream can be adjusted to cater for various restrictions in
          the sending Participant, receiving Participants in the sub-group,
          and network path(s) to the Participants in the sub-group. Let the
          RTP Mixer forward the same Source Packet Stream to all Participants
          in a sub-group, for all Source Packet Streams and sub-groups. The
          advantages are low RTP Mixer processing, near optimum QoE, and near
          optimum downlink bandwidth. The disadvantages are high uplink
          bandwidth and arguably that downlink bandwidth and QoE are optimum
          only for a sub-group and not per individual receiving
          Participant.</t>
        </list>A summary of the advantages and disadvantages of the above four
      principal alternatives is given <xref
      target="tab-diversity">below</xref>:</t>

      <texttable anchor="tab-diversity"
                 title="Receiver Diversity Handling Comparison">
        <ttcol>Method</ttcol>

        <ttcol>Mixer CPU</ttcol>

        <ttcol>Uplink</ttcol>

        <ttcol>Downlink</ttcol>

        <ttcol>QoE</ttcol>

        <c>1</c>

        <c>Low</c>

        <c>Low</c>

        <c>Low</c>

        <c>Low</c>

        <c>2</c>

        <c>Very high</c>

        <c>Optimum</c>

        <c>Optimum</c>

        <c>Near optimum</c>

        <c>3</c>

        <c>Low</c>

        <c>Very high</c>

        <c>Optimum</c>

        <c>Optimum</c>

        <c>4</c>

        <c>Low</c>

        <c>High</c>

        <c>Near optimum</c>

        <c>Near optimum</c>
      </texttable>

      <t>The authors of this document believe that alternative 4, the Grouped
      Simulcast, can be a good tradeoff whenever supported by sufficient
      uplink resources.</t>
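
      <t>The grouping step of alternative 4 can be sketched as follows. This
      is a non-normative Python illustration that models receiver resources
      as a single bit-rate figure, which is a simplification; the function
      names assign_streams and uplink_kbps, the stream bit-rates, and the
      receiver capacities are all hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
def assign_streams(stream_kbps, receiver_kbps):
    """Map each receiver to the highest-rate stream it can receive.

    Receivers that cannot receive even the lowest-rate stream are
    mapped to None. Receivers mapped to the same stream form one
    sub-group, to which the RTP Mixer forwards that stream.
    """
    assignment = {}
    for name, capacity in receiver_kbps.items():
        fitting = [rate for rate in stream_kbps if rate <= capacity]
        assignment[name] = max(fitting) if fitting else None
    return assignment


def uplink_kbps(stream_kbps):
    """The sender transmits every Simulcast stream to the RTP Mixer."""
    return sum(stream_kbps)


# Hypothetical Simulcast of one Media Source and four receivers.
streams = [500, 1500, 3000]
receivers = {"A": 3500, "B": 1800, "C": 600, "D": 1500}
groups = assign_streams(streams, receivers)
uplink = uplink_kbps(streams)
```
]]></artwork>
      </figure>

      <t>In this sketch, receivers B and D share one stream even though B
      could receive slightly more, which reflects that downlink bandwidth and
      QoE are optimised per sub-group rather than per Participant, while the
      uplink carries the sum of all streams.</t>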
    </section>
  </back>
</rfc>