One document matched: draft-alvestrand-mmusic-msid-01.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-alvestrand-mmusic-msid-01"
     ipr="trust200902">
  <front>
    <title abbrev="MSID in SDP">Cross Session Stream Identification in the
    Session Description Protocol</title>

    <author fullname="Harald Alvestrand" initials="H. T." surname="Alvestrand">
      <organization>Google</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <region></region>

          <code>11122</code>

          <country>Sweden</country>
        </postal>

        <email>harald@alvestrand.no</email>
      </address>
    </author>

    <date day="15" month="October" year="2012" />

    <abstract>
      <t>This document specifies a grouping mechanism for RTP media streams
      that can be used to specify relations between media streams within
      different RTP sessions.</t>

      <t>This mechanism is used to signal the association between the RTP
      concept of SSRC and the WebRTC concept of "media stream" / "media stream
      track" using SDP signalling.</t>

      <t>This document is an input document for discussion. It should be
      discussed in the MMUSIC WG list, mmusic@ietf.org.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t></t>

      <section title="Structure Of This Document">
        <t>This document extends the SSRC grouping framework <xref
        target="RFC5888"></xref> by adding a new grouping relation that can
        cross RTP session boundaries.</t>

        <t><xref target="sect-why"></xref> gives the background on why a new
        mechanism is needed.</t>

        <t><xref target="sect-definition"></xref> gives the definition of the
        new mechanism.</t>

        <t><xref target="sect-media-stream"></xref> gives the application of
        the new mechanism for providing necessary semantic information for the
        association of MediaStreamTracks to MediaStreams in the WebRTC API
        .</t>
      </section>

      <section anchor="sect-why" title="Why A New Mechanism Is Needed">
        <t>When media is carried by RTP <xref target="RFC3550"></xref>, each
        RTP media stream is distinguished inside an RTP session by its SSRC;
        each RTP session is distinguished from all other RTP sessions by being
        on a different transport association (strictly speaking, 2 transport
        associations, one used for RTP and one used for RTCP, unless RTCP
        multiplexing <xref target="RFC5761"></xref> is used).</t>

        <t>There exist cases where an application using RTP and SDP needs to
        signal some relationship between RTP media streams that may be carried
        in either the same RTP session or different RTP sessions. For
        instance, there may be a need to signal a relationship between a video
        track in one RTP session and an audio track in another RTP session. In
        traditional SDP, it is not possible to signal that these two tracks
        should be carried in one session, so they are carried in different RTP
        sessions.</t>

        <t>The SSRC grouping mechanism ("a=ssrc-group") <xref
        target="RFC5576"></xref> can be used to associate RTP media streams
        when those RTP media streams are part of the same RTP session. The
        semantics of this mechanism prevent the association of RTP media
        streams that are spread across different RTP sessions.</t>

        <t>The SDP grouping framework <xref target="RFC5888"></xref> can be
        used to group RTP sessions. When an RTP session carries one and only
        one RTP media stream, it is possible to associate RTP media streams
        across different RTP sessions. However, if an RTP session has multiple
        RTP media streams, using multiple SSRCs, the SDP grouping framework
        cannot be used for this purpose.</t>

        <t>There are use cases (some of which are discussed in <xref
        target="I-D.westerlund-avtcore-multiplex-architecture"></xref> ) where
        neither of these approaches is appropriate; In those cases, a new
        mechanism is needed.</t>

        <t>In addition, there is sometimes the need for an application to
        specify some application-level information about the association
        between the SSRC and the group. This is not possible using either of
        the frameworks above.</t>
      </section>

      <section title="Application to the WEBRTC MediaStream">
        <t>The W3C WebRTC API specification <xref
        target="W3C.WD-webrtc-20120209"></xref> specifies that communication
        between WebRTC entities is done via MediaStreams, which contain
        MediaStreamTracks. A MediaStreamTrack is generally carried using a
        single SSRC in an RTP session (forming an RTP media stream. The
        collision of terminology is unfortunate.) There might possibly be
        additional SSRCs, possibly within additional RTP sessions, in order to
        support functionality like forward error correction or simulcast. This
        complication is ignored below.</t>

        <t>In the RTP specification, media streams are identified using the
        SSRC field. Streams are grouped into RTP Sessions, and also carry a
        CNAME. Neither CNAME nor RTP session correspond to a MediaStream.
        Therefore, the association of an RTP media stream to MediaStreams need
        to be explicitly signalled.</t>

        <t>The marking needs to be on a per-SSRC basis, since one RTP session
        can carry media from multiple MediaStreams, and one MediaStream can
        have media in multiple RTP sessions. This means that the <xref
        target="RFC4574"></xref> "label" attribute, which is used to label RTP
        sessions, is not usable for this purpose.</t>

        <t>The marking needs to also carry the unique identifier of the RTP
        media stream as a MediaStreamTrack within the media stream; this is
        done using a single letter to identify whether it belongs in the video
        or audio track list, and the MediaStreamTrack's position within that
        array.</t>

        <t>This usage is described in <xref
        target="sect-media-stream"></xref>.</t>
      </section>
    </section>

    <section anchor="sect-definition" title="The Msid Mechanism">
      <t>This document extends the Source-Specific Media Attributes framework
      <xref target="RFC5576"></xref> by adding a new "msid" attribute that can
      be used with the "a=ssrc" SDP attribute. This new attribute allows
      endpoints to associate RTP media streams that are carried in different
      RTP sessions, as well as allowing application-specific information to
      the association.</t>

      <t>The value of the "msid" attribute consists of an identifier and
      optional application-specific data, according to the following ABNF
      <xref target="RFC5234"></xref> grammar:</t>

      <t></t>

      <figure>
        <artwork><![CDATA[
  ; "attribute" is defined in RFC 4566.
  ; This attribute should be used with the ssrc-attr from RFC 5576.
  attribute =/ msid-attr
  msid-attr = "msid:" identifier [ " " appdata ]
  identifier = 1*64 ("0".."9" / "a".."z" / "-")
  appdata = 1*64 ("0".."9" / "a".."z" / "-")

]]></artwork>
      </figure>

      <t>An example MSID value for the SSRC 1234 might look like this:</t>

      <figure>
        <artwork><![CDATA[  a=ssrc:1234 msid:examplefoo v1

]]></artwork>
      </figure>

      <t>The identifier is a string of ASCII characters chosen from 0-9, a-z,
      A-Z and - (hyphen), consisting of between 1 and 64 characters. It MUST
      be unique among the identifier values used in the same SDP session. It
      is RECOMMENDED that is generated using a random-number generator.</t>

      <t>Application data is carried on the same line as the identifier,
      separated from the identifier by a space.</t>

      <t>The identifier uniquely identifies a group within the scope of an SDP
      description.</t>

      <t>There may be multiple msid attributes on a single SSRC. There may
      also be multiple SSRCs that have the same value for identifier and
      application data.</t>

      <t>Endpoints can update the associations between SSRCs as expressed by
      msid attributes at any time; the semantics and restrictions of such
      grouping and ungrouping are application dependent.</t>
    </section>

    <section title="The Msid-Semantic Attribute">
      <t>In order to fully reproduce the semantics of the SDP and SSRC
      grouping frameworks, a session-level attribute is defined for signalling
      the semantics associated with an msid grouping.</t>

      <t>This OPTIONAL attribute gives the group identifier and its group
      semantic; it carries the same meaning as the ssrc-group-attr of RFC 5576
      section 4.2, but uses the identifier of the group rather than a list of
      SSRC values.</t>

      <t>The ABNF of msid-semantic is:</t>

      <figure>
        <artwork><![CDATA[
  attribute =/ msid-semantic-attr
  msid-semantic-attr = "msid-semantic:" " " identifier token
  token = <as defined in RFC 4566>
   ]]></artwork>
      </figure>

      <t>The semantic field may hold values from the IANA registries
      "Semantics for the "ssrc-group" SDP Attribute" and "Semantics for the
      "group" SDP Attribute".</t>

      <t>An example msid-semantic might look like this:</t>

      <figure>
        <artwork><![CDATA[  a=msid-semantic: examplefoo LS

]]></artwork>
      </figure>

      <t></t>
    </section>

    <section anchor="sect-media-stream"
             title="Applying Msid to WebRTC Media Streams">
      <t>This section creates a new semantic for use with the framework
      defined in <xref target="sect-definition"></xref>, to be used for
      associating SSRCs representing media stream tracks with media streams as
      defined in <xref target="W3C.WD-webrtc-20120209"></xref>.</t>

      <t>The semantic token for WebRTC Media Streams is "WMS".</t>

      <t>The value of the msid corresponds to the "id" attribute of a
      MediaStream. (note: as of Jan 11, 2012, this is called "label". The word
      "label" means many other things, so the same word should not be
      used.)</t>

      <t>In a WebRTC-compatible SDP description, all SSRCs intending to be
      sent from one peer will be identified in the SDP generated by that
      entity.</t>

      <t>The appdata for a WebRTC MediaStreamTrack consists of the track type
      and the track number; the track type is encoded as the single letter "a"
      (audio) or "v" (video), and the track number is encoded as a decimal
      integer with no leading zeros. The first track is track zero, and is
      identified as "a0" for audio, and "v0" for video.</t>

      <t>If two different SSRCs have the same value for identifier and
      appdata, it means that these two SSRCs are both intended for the same
      MediaStreamTrack. This may occur if the sender wishes to use simulcast
      or forward error correction, or if the sender intends to switch between
      multiple codecs on the same MediaStreamTrack.</t>

      <t>When an SDP description is updated, a specific msid continues to
      refer to the same media stream; an msid value MUST NOT be reused for
      another media stream within a PeerConnection's lifetime.</t>

      <t>The following are the rules for handling updates of the list of SSRCs
      and their msid values.</t>

      <t><list style="symbols">
          <t>When a new msid value occurs in the description, the recipient
          can signal to its application that a new media stream has been
          added.</t>

          <t>When a description is updated to have more SSRCs with the same
          msid value, the recipient can signal to its application that new
          media stream tracks have been added to the media stream.</t>

          <t>When a description is updated to no longer list the msid value on
          a specific ssrc, the recipient can signal to its application that
          the corresponding media stream track has been closed.</t>

          <t>When a description is updated to no longer list the msid value on
          any ssrc, the recipient can signal to its application that the media
          stream has been closed.</t>
        </list>OPEN ISSUE: Exactly when should the recipient signal that the
      track is closed? When the msid value disappears from the description,
      when the SSRC disappears by the rules of <xref target="RFC3550"></xref>
      section 6.3.4 (BYE packet received) and 6.3.5 (timeout), any of the
      above, or some combination of the above?</t>

      <t></t>

      <section title="Handling of non-signalled tracks">
        <t>Pre-WebRTC entities will not send msid. This means that there will
        be some incoming RTP packets with SSRCs where the recipient does not
        know about a corresponding MediaStream id.</t>

        <t>Handling will depend on whether or not any SSRCs are signalled in
        the relevant RTP session. There are two cases:</t>

        <t><list style="symbols">
            <t>No SSRC is signalled with an msid attribute. The SDP session is
            assumed to be a backwards-compatible session. All incoming SSRCs,
            on all RTP sessions that are part of the SDP session, are assumed
            to belong to a single media stream. The identifier of this media
            stream is "default".</t>

            <t>Some SSRCs are signalled with an msid attribute. In this case,
            the session is WebRTC compatible, and the newly arrived SSRCs are
            either caused by a bug or by timing skew between the arrival of
            the media packets and the SDP description. These packets MAY be
            discarded, or they MAY be buffered for a while in order to allow
            immediate startup of the media stream when the SDP description is
            updated. The arrival of media packets MUST NOT cause a new
            MediaStreamTrack to be signalled.</t>
          </list>Note: This means that it is wise to include at least one
        a=ssrc: line with an msid attribute, even when no media streams are
        yet attached to the session. (Alternative: Mark the RTP session
        explicitly as "I will signal the media stream tracks explicitly").</t>

        <t>It follows from the above that media stream tracks in the "default"
        media stream cannot be closed by signalling; the application must
        instead signal these as closed when the SSRC disappears according to
        the rules of RFC 3550 section 6.3.4 and 6.3.5.</t>
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requests IANA to register the "msid" attribute in the
      "att-field (source level)" registry within the SDP parameters registry,
      according to the procedures of <xref target="RFC5576"></xref></t>

      <t>The required information is:</t>

      <t><list style="symbols">
          <t>Contact name, email: IETF, contacted via rtcweb@ietf.org, or a
          successor address designated by IESG</t>

          <t>Attribute name: msid</t>

          <t>Long-form attribute name: Media stream group Identifier</t>

          <t>The attribute value contains only ASCII characters, and is
          therefore not subject to the charset attribute.</t>

          <t>The attribute gives an association over a set of SSRCs,
          potentially in different RTP sessions. It can be used to signal the
          relationship between a WebRTC MediaStream and a set of SSRCs.</t>

          <t>The details of appropriate values are given in RFC XXXX.</t>
        </list></t>

      <t>This document requests IANA to register the "WMS" semantic within the
      "Semantics for the "ssrc-group" SDP Attribute" registry within the SDP
      parameters registry.</t>

      <t>The required information is:</t>

      <t><list style="symbols">
          <t>Description: WebRTC Media Stream, as given in RFC XXXX.</t>

          <t>Token: WMS</t>

          <t>Standards track reference: RFC XXXX</t>
        </list>IANA is requested to replace "RFC XXXX" with the RFC number of
      this document upon publication.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>An adversary with the ability to modify SDP descriptions has the
      ability to switch around tracks between media streams. This is a special
      case of the general security consideration that modification of SDP
      descriptions needs to be confined to entities trusted by the
      application.</t>

      <t>No attacks that are relevant to the browser's security have been
      identified that depend on this mechanism.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>This note is based on sketches from, among others, Justin Uberti and
      Cullen Jennings.</t>

      <t>Special thanks to Miguel Garcia for his work in reviewing this draft,
      with many specific language suggestions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.3550'?>

      <?rfc include='reference.RFC.5576'?>

      <?rfc include='reference.RFC.5234'?>

      <?rfc include='reference.W3C.WD-webrtc-20120209'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.4574'?>

      <?rfc include='reference.RFC.5761'?>

      <?rfc include='reference.RFC.5888'?>

      <?rfc include='reference.I-D.westerlund-avtcore-multiplex-architecture'?>
    </references>

    <section title="Design considerations, open questions and and alternatives">
      <t>This appendix should be deleted before publication as an RFC.</t>

      <t>One suggested mechanism has been to use CNAME instead of a new
      attribute. This was abandoned because CNAME identifies a synchronization
      context; one can imagine both wanting to have tracks from the same
      synchronization context in multiple media streams and wanting to have
      tracks from multiple synchronization contexts within one media
      stream.</t>

      <t>Another suggestion has been to put the msid value within an attribute
      of RTCP SR (sender report) packets. This doesn't offer the ability to
      know that you have seen all the tracks currently configured for a media
      stream.</t>

      <t>There has been a suggestion that this mechanism could be used to mute
      tracks too. This is not done at the moment.</t>

      <t>The special value "default" and the reservation of "example*" seems
      bothersome; apart from that, it's a random string. It's uncertain
      whether "example" has any benefit.</t>

      <t>An alternative to the "default" media stream is to let each new media
      stream track without a msid attribute create its own media stream. Input
      on this question is sought.</t>

      <t>Discarding of incoming data when the SDP description isn't updated
      yet (section 3) may cause clipping. However, the same issue exists when
      crypto keys aren't available. Input sought.</t>

      <t>There's been a suggestion that acceptable SSRCs should be signalled
      in a response, giving a recipient the ability to say "no" to certain
      SSRCs. This is not supported in the current version of this
      document.</t>

      <t>This specification reuses the ssrc-group semantics registry for this
      semantic, on the argument that the WMS purpose is more similar to an
      SSRC grouping than a session-level grouping, and allows values from both
      registries, on the argument that some semantics (like LS) are well
      defined for MSID. Input sought.</t>
    </section>

    <section title="Change log">
      <t>This appendix should be deleted before publication as an RFC.</t>

      <section title="Changes from rtcweb-msid-00 to -01">
        <t>Added track identifier.</t>

        <t>Added inclusion-by-reference of
        draft-lennox-mmusic-source-selection for track muting.</t>

        <t>Some rewording.</t>
      </section>

      <section title="Changes from rtcweb-msid-01 to -02">
        <t>Split document into sections describing a generic grouping
        mechanism and sections describing the application of this grouping
        mechanism to the WebRTC MediaStream concept.</t>

        <t>Removed the mechanism for muting tracks, since this is not central
        to the MSID mechanism.</t>
      </section>

      <section title="Changes from rtcweb-msid-02 to mmusic-msid-00">
        <t>Changed the draft name according to the wishes of the MMUSIC group
        chairs.</t>

        <t>Added text indicting cases where it's appropriate to have the same
        appdata for multiple SSRCs.</t>

        <t>Minor textual updates.</t>
      </section>

      <section title="Changes from mmusic-msid-00 to -01">
        <t>Increased the amount of explanatory text, much based on a review by
        Miguel Garcia.</t>

        <t>Removed references to BUNDLE, since that spec is under active
        discussion.</t>

        <t>Removed distinguished values of the MSID identifier.</t>
      </section>
    </section>
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-24 02:57:32