<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3550 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc3551 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml">
<!ENTITY rfc3711 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml">
<!ENTITY rfc3984 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3984.xml">
<!ENTITY rfc4855 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml">
<!ENTITY rfc4566 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml">
<!ENTITY rfc4585 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml">
<!ENTITY rfc5124 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5124.xml">
<!ENTITY rfc6386 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6386.xml">
<!ENTITY rfc6838 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6838.xml">
<!ENTITY rfc7201 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7201.xml">
<!ENTITY rfc7202 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7202.xml">
]>
<rfc category="std" docName="draft-ietf-payload-vp8-17" ipr="trust200902">
  <?rfc symrefs="yes" ?>

  <?rfc sortrefs="yes" ?>

  <!-- alphabetize the references -->

  <?rfc comments="no"?>

  <!-- show comments -->

  <?rfc inline="yes" ?>

  <!-- comments are inline -->

  <?rfc toc="yes" ?>

  <!-- generate table of contents -->

  <front>
    <title abbrev="RTP Payload Format for VP8">RTP Payload Format for VP8
    Video</title>

    <author fullname="Patrik Westin" initials="P." surname="Westin">
      <organization abbrev="Google">Google, Inc.</organization>

      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>

          <city>Mountain View</city>

          <region>CA</region>

          <code>94043</code>

          <country>USA</country>
        </postal>

        <email>patrik.westin@gmail.com</email>
      </address>
    </author>

    <author fullname="Henrik F Lundin" initials="H.F." surname="Lundin">
      <organization abbrev="Google">Google, Inc.</organization>

      <address>
        <postal>
          <street>Kungsbron 2</street>

          <city>Stockholm</city>

          <region/>

          <code>11122</code>

          <country>Sweden</country>
        </postal>

        <email>hlundin@google.com</email>
      </address>
    </author>

    <author fullname="Michael Glover" initials="M." surname="Glover">
      <organization abbrev="Google">Google, Inc.</organization>

      <address>
        <postal>
          <street>5 Cambridge Center</street>

          <city>Cambridge</city>

          <region>MA</region>

          <code>02142</code>

          <country>USA</country>
        </postal>
      </address>
    </author>

    <author fullname="Justin Uberti" initials="J." surname="Uberti">
      <organization abbrev="Google">Google, Inc.</organization>

      <address>
        <postal>
          <street>747 6th Street South</street>

          <city>Kirkland</city>

          <region>WA</region>

          <code>98033</code>

          <country>USA</country>
        </postal>
      </address>
    </author>

    <author fullname="Frank Galligan" initials="F." surname="Galligan">
      <organization abbrev="Google">Google, Inc.</organization>

      <address>
        <postal>
          <street>1600 Amphitheatre Parkway</street>

          <city>Mountain View</city>

          <region>CA</region>

          <code>94043</code>

          <country>USA</country>
        </postal>
      </address>
    </author>

    <date day="9" month="September" year="2015"/>

    <area>General</area>

    <workgroup>Payload Working Group</workgroup>

    <keyword>RFC</keyword>

    <keyword>Request for Comments</keyword>

    <keyword>RTP</keyword>

    <keyword>VP8</keyword>

    <keyword>WebM</keyword>

    <abstract>
      <t>This memo describes an RTP payload format for the VP8 video codec.
      The payload format has wide applicability, as it supports applications
      from low bit-rate peer-to-peer usage, to high bit-rate video
      conferences.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>This memo describes an RTP payload specification applicable to the
      transmission of video streams encoded using the VP8 video codec <xref
      target="RFC6386"/>. The format described in this document can be used
      both in peer-to-peer and video conferencing applications.</t>

      <t>VP8 is based on decomposition of frames into square
      sub-blocks of pixels known as "macroblocks" (see Section 2 of
      <xref target="RFC6386"/>).  Prediction of such sub-blocks using
      previously constructed blocks, and adjustment of such
      predictions (as well as synthesis of unpredicted blocks) is done using a
      discrete cosine transform (hereafter abbreviated as DCT). In one
      special case, however, VP8 uses a "Walsh-Hadamard" (hereafter
      abbreviated as WHT) transform instead of a DCT. An encoded VP8
      frame is divided into two or more partitions, as described in
      <xref target="RFC6386"/>. The first partition (prediction or
      mode) contains prediction mode parameters and motion vectors for
      all macroblocks. The remaining partitions all contain the
      quantized DCT/WHT coefficients for the residuals. There can be
      1, 2, 4, or 8 DCT/WHT partitions per frame, depending on encoder
      settings.</t>

      <t>In summary, the payload format described in this document enables a
      number of features in VP8, including: <list style="symbols">
          <t>Taking partition boundaries into consideration, to improve loss
          robustness and facilitate efficient packet loss concealment at the
          decoder.</t>

          <t>Temporal scalability.</t>

          <t>Advanced use of reference frames to enable efficient error
          recovery.</t>

          <t>Marking of frames that have no impact on the decoding of any
          other frame, so that these non-reference frames can be discarded in
          a server or media-aware network element if needed.</t>
        </list></t>
    </section>

    <section anchor="conventions"
             title="Conventions, Definitions and Acronyms">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>

      <t>This document uses the definitions of <xref
      target="RFC6386"/>. In particular, the following terms are
      used.</t>
      <t><list style="hanging">
            <t hangText="Key frames:"> Frames that are decoded without reference to any other frame
            in a sequence (also called intraframes and
            I-frames).</t>
            <t hangText="Interframes:"> Frames that are encoded with reference to prior frames,
            specifically all prior frames up to and including the most
            recent key frame (also called prediction frames
            and P-frames).</t>
            <t hangText="Golden and altref frames:"> alternate
            prediction frames. Blocks in an interframe may be
            predicted using blocks in the immediately previous frame
            as well as the most recent golden frame or altref frame.
            Every key frame is automatically golden and altref, and
            any interframe may optionally replace the most recent
            golden or altref frame.</t>
            <t hangText="Macroblock:"> a square array of pixels whose
            Y (luminance) dimensions are 16x16 pixels and whose U and
            V (chrominance) dimensions are 8x8 pixels.</t>
      </list></t>

      <t>Two definitions from <xref
      target="RFC4585"/> are also used in this document.</t>
      <t><list style="hanging">
            <t hangText="RPSI:"> Reference picture selection indication. A
            feedback message to let the encoder know that the decoder
            has correctly decoded a certain frame.</t>
            <t hangText="SLI:"> Slice loss indication. A feedback
            message to let a decoder inform an encoder that it has
            detected the loss or corruption of one or several
            macroblocks.</t>
      </list></t>
    </section>

    <section anchor="mediaFormatDescription" title="Media Format Description">
      <t>The VP8 codec uses three different reference frames for
      interframe prediction: the previous frame, the golden frame, and
      the altref frame.  Blocks in an interframe may be predicted
      using blocks in the immediately previous frame as well as the
      most recent golden frame or altref frame.  Every key frame is
      automatically golden and altref, and any interframe may
      optionally replace the most recent golden or altref
      frame. Golden frames and altref frames may also be used to
      increase the tolerance to dropped frames. The payload
      specification in this memo has elements that enable advanced use
      of the reference frames, e.g., for improved loss robustness.</t>

      <t>One specific use case of the three reference frame types is
      temporal scalability. By setting up the reference hierarchy in
      the appropriate way, up to five temporal layers can be
      encoded. (How to set up the reference hierarchy for temporal
      scalability is not within the scope of this memo.) Support for
      temporal scalability is provided by the optional TL0PICIDX and
      TID/Y/KEYIDX fields described in <xref target="VP8payloadDescriptor"/>.
      For a general description of temporal scalability for video coding,
      see, e.g., <xref target="Sch07"/>.</t>

      <t>Another property of the VP8 codec is that it applies data
      partitioning to the encoded data. Thus, an encoded VP8 frame can
      be divided into two or more partitions, as described in "VP8
      Data Format and Decoding Guide" <xref target="RFC6386"/>. The
      first partition (prediction or mode) contains prediction mode
      parameters and motion vectors for all macroblocks. The remaining
      partitions all contain the transform coefficients for the
      residuals. The first partition is decodable without the
      remaining residual partitions. The subsequent partitions may be
      useful even if some part of the frame is lost. Accordingly, this
      document RECOMMENDS that the frame is packetized by the sender
      with each data partition in a separate packet or packets.  This
      may be beneficial for decoder error concealment, and the payload
      format described in <xref target="payloadFormat"/> provides
      fields that allow the partitions to be identified even if the
      first partition is not available.  The sender can,
      alternatively, aggregate the data partitions into a single data
      stream and, optionally, split it into several packets without
      consideration of the partition boundaries.  The receiver can use
      the length information in the first partition to identify the
      partitions during decoding.</t>

      <t>The format specification is described in <xref target="payloadFormat"/>.
      In <xref target="RPSIandSLI"/>, a method to
      acknowledge receipt of reference frames using RTCP techniques is
      described.</t>

      <t>The payload partitioning and the acknowledging method both serve as
      motivation for three of the fields included in the payload format: the
      "PID", "1st partition size" and "PictureID" fields. The ability to
      encode a temporally scalable stream motivates the "TL0PICIDX" and "TID"
      fields.</t>
    </section>

    <section anchor="payloadFormat" title="Payload Format">
      <t>This section describes how the encoded VP8 bitstream is encapsulated
      in RTP. To handle network losses, use of RTP/AVPF <xref
      target="RFC4585"/> is RECOMMENDED. All integer fields in this
      specification are encoded as unsigned integers in network octet
      order.</t>

      <section anchor="RTPHeaderUsage" title="RTP Header Usage">
        <figure anchor="figureRTPHeader">
          <preamble>The general RTP payload format for VP8 is depicted
          below.</preamble>

          <artwork><![CDATA[
   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |V=2|P|X|  CC   |M|     PT      |       sequence number         |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                           timestamp                           |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |           synchronization source (SSRC) identifier            |
  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  |            contributing source (CSRC) identifiers             |
  |                             ....                              |
  +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
  |            VP8 payload descriptor (integer #octets)           |
  :                                                               :
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               : VP8 payload header (3 octets) |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | VP8 pyld hdr  :                                               |
  +-+-+-+-+-+-+-+-+                                               |
  :                   Octets 4..N of VP8 payload                  :
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :    OPTIONAL RTP padding       |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          ]]></artwork>

          <postamble>The VP8 payload descriptor and VP8 payload header will be
          described in <xref target="VP8payloadDescriptor"/> and 
          <xref target="VP8payloadHeader"/>. OPTIONAL RTP padding MUST NOT 
          be included unless the P bit is set. The figure specifically shows
          the format for the first packet in a frame. Subsequent packets will
          not contain the VP8 payload header, and will have later octets in
          the frame payload.</postamble>
        </figure>

        <t><list style="hanging">
            <t hangText="Marker bit (M):">MUST be set for the very last packet
            of each encoded frame in line with the normal use of the M bit in
            video formats. This enables a decoder to finish decoding the
            picture, where it otherwise may need to wait for the next packet
            to explicitly know that the frame is complete.</t>

            <t hangText="Payload type (PT):"> In line with the policy
            in Section 3 of <xref target="RFC3551"/>, applications
            using the VP8 RTP payload profile MUST assign a dynamic
            payload type number to be used in each RTP session and
            provide a mechanism to indicate the mapping. See
            <xref target="SdpParameters"/> for the mechanism to be used 
            with the Session Description Protocol (SDP)
            <xref target="RFC4566"/>.</t>

            <t hangText="Timestamp:"> The RTP timestamp indicates the
            time when the frame was sampled. The granularity of the
            clock is 90 kHz, so a delta of 1 represents 1/90,000 of a
            second.</t>

            <t>The remaining RTP Fixed Header Fields (V, P, X, CC,
            sequence number, SSRC and CSRC identifiers) are used as
            specified in Section 5.1 of <xref target="RFC3550"/>.</t>
          </list></t>
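        <figure>
          <preamble>As a non-normative illustration of the 90 kHz clock
          described above, the following Python sketch maps a frame sampling
          time to a 32-bit RTP timestamp. The function name rtp_timestamp and
          the parameter initial_offset are assumptions made for this example
          only; initial_offset stands in for the random starting value of the
          timestamp described in <xref target="RFC3550"/>.</preamble>

          <artwork><![CDATA[
```python
from fractions import Fraction

VP8_CLOCK_RATE_HZ = 90_000  # clock granularity stated above


def rtp_timestamp(sample_time_s, initial_offset=0):
    """Map a sampling time in seconds to a 32-bit RTP timestamp.

    initial_offset is a hypothetical name for the random start
    value of the timestamp; it is not defined by this memo.
    """
    ticks = round(Fraction(sample_time_s) * VP8_CLOCK_RATE_HZ)
    # RTP timestamps are 32 bits wide and wrap around.
    return (initial_offset + ticks) & 0xFFFFFFFF
```
          ]]></artwork>

          <postamble>With this mapping, consecutive frames of a 30 frames per
          second stream are 3000 timestamp units apart.</postamble>
        </figure>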
      </section>

      <section anchor="VP8payloadDescriptor" title="VP8 Payload Descriptor">
        <figure anchor="figureVP8payloadDescriptor">
          <preamble>The first octets after the RTP header are the VP8 payload
          descriptor, with the following structure. The single-octet version
          of the PictureID is illustrated on the left (M bit set to zero),
          while the dual-octet version (M bit set to one) is shown on the
          right.</preamble>

          <artwork><![CDATA[
      0 1 2 3 4 5 6 7                      0 1 2 3 4 5 6 7 
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
     |X|R|N|S|R| PID | (REQUIRED)        |X|R|N|S|R| PID | (REQUIRED)
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
X:   |I|L|T|K| RSV   | (OPTIONAL)   X:   |I|L|T|K| RSV   | (OPTIONAL)
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
I:   |M| PictureID   | (OPTIONAL)   I:   |M| PictureID   | (OPTIONAL)
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
L:   |   TL0PICIDX   | (OPTIONAL)        |   PictureID   |
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
T/K: |TID|Y| KEYIDX  | (OPTIONAL)   L:   |   TL0PICIDX   | (OPTIONAL)
     +-+-+-+-+-+-+-+-+                   +-+-+-+-+-+-+-+-+
                                    T/K: |TID|Y| KEYIDX  | (OPTIONAL)
                                         +-+-+-+-+-+-+-+-+
            ]]></artwork>
        </figure>

        <t><list style="hanging">
            <t hangText="X:">Extended control bits present. When set to one,
            the extension octet MUST be provided immediately after the
            mandatory first octet. If the bit is zero, all optional fields
            MUST be omitted. Note: this X bit is not to be confused with the
            X bit in the RTP header.</t>

            <t hangText="R:">Bit reserved for future use. MUST be set to zero
            and MUST be ignored by the receiver.</t>

            <t hangText="N:">Non-reference frame. When set to one, the frame
            can be discarded without affecting any other future or past
            frames. If the reference status of the frame is unknown, this bit
            SHOULD be set to zero to avoid discarding frames needed for
            reference. <list style="empty">
                <t>Informative note: This document does not describe how to
                determine if an encoded frame is non-reference. The reference
                status of an encoded frame is preferably provided from the
                encoder implementation.</t>
              </list></t>

            <t hangText="S:">Start of VP8 partition. SHOULD be set to 1 when
            the first payload octet of the RTP packet is the beginning of a
            new VP8 partition, and MUST NOT be 1 otherwise. The S bit MUST be
            set to 1 for the first packet of each encoded frame.</t>

            <t hangText="PID:">Partition index. Denotes which VP8 partition
            the first payload octet of the packet belongs to. The first VP8
            partition (containing modes and motion vectors) MUST be labeled
            with PID = 0. PID SHOULD be incremented by 1 for each subsequent
            partition, but MAY be kept at 0 for all packets. PID cannot be
            larger than 7. If more than one packet in an encoded frame
            contains the same PID, the S bit MUST NOT be set for any other
            packet than the first packet with that PID.</t>
          </list></t>
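        <figure>
          <preamble>The bit positions of the mandatory first octet can be
          illustrated with a short, non-normative Python sketch. The function
          name pack_first_octet and its parameter names are assumptions made
          for this example; only the bit layout is taken from the figure
          above.</preamble>

          <artwork><![CDATA[
```python
def pack_first_octet(x, n, s, pid):
    """Build the mandatory first octet |X|R|N|S|R| PID |."""
    if not 0 <= pid <= 7:
        raise ValueError("PID cannot be larger than 7")
    # Both R bits are reserved and are left at zero.
    return (bool(x) << 7) | (bool(n) << 5) | (bool(s) << 4) | pid
```
          ]]></artwork>

          <postamble>For example, the first packet of a frame that carries
          extension fields (X=1, S=1, PID=0) starts with the octet
          0x90.</postamble>
        </figure>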

        <t>When the X bit is set to 1 in the first octet, the
        Extended Control Bits field octet MUST be provided as the second
        octet. If the X bit is 0, the Extended Control Bits field octet MUST
        NOT be present, and no extensions (I, L, T, or K) are
        permitted.
        <list style="hanging">
            <t hangText="I:">PictureID present. When set to one, the
            PictureID MUST be present after the extension bit field
            and specified as below. Otherwise, PictureID MUST NOT be
            present.</t>

            <t hangText="L:">TL0PICIDX present. When set to one, the
            TL0PICIDX MUST be present and specified as below, and the
            T bit MUST be set to 1. Otherwise, TL0PICIDX MUST NOT be
            present.</t>

            <t hangText="T:">TID present. When set to one, the
            TID/Y/KEYIDX octet MUST be present. The TID|Y part of the
            octet MUST be specified as below. If K (below) is set to
            one but T is set to zero, the TID/Y/KEYIDX octet MUST be
            present, but the TID field MUST be ignored. If neither T
            nor K is set to one, the TID/Y/KEYIDX octet MUST NOT be
            present.</t>

            <t hangText="K:">KEYIDX present. When set to one, the
            TID/Y/KEYIDX octet MUST be present. The KEYIDX part of the
            octet MUST be specified as below. If T (above) is set to
            one but K is set to zero, the TID/Y/KEYIDX octet MUST be
            present, but the KEYIDX field MUST be ignored. If neither
            T nor K is set to one, the TID/Y/KEYIDX octet MUST NOT be
            present.</t>

            <t hangText="RSV:">Bits reserved for future use. MUST be set to
            zero and MUST be ignored by the receiver.</t>
          </list></t>

        <t>After the extension bit field follow the extension data fields that
        are enabled. <list style="hanging">
            <t hangText="The PictureID extension:"> If the I bit is set to 
            one, the PictureID extension field MUST be present, and MUST NOT 
            be present otherwise. The field consists of two parts: 
            <list style="hanging">
            <t hangText="M:">The most significant bit of the first octet is an
            extension flag. If M is set, the remainder of the PictureID field
            MUST contain 15 bits, else it MUST contain 7 bits. Note: this M
            bit is not to be confused with the M bit in the RTP header.</t>

            <t hangText="PictureID:">7 or 15 bits (shown left and
            right, respectively, in <xref
            target="figureVP8payloadDescriptor"/>) not including the M
            bit. This is a running index of the frames, which MAY
            start at a random value, MUST increase by 1 for each
            subsequent frame, and MUST wrap to 0 after reaching the
            maximum ID (all bits set).  The 7 or 15 bits of the
            PictureID go from most significant to least significant,
            beginning with the first bit after the M bit.  The sender
            chooses a 7 or 15 bit index and sets the M bit
            accordingly. The receiver MUST NOT assume that the number
            of bits in PictureID stays the same through the
            session. Having sent a 7-bit PictureID with all bits set
            to 1, the sender may either wrap the PictureID to 0, or
            extend to 15 bits and continue
            incrementing.</t></list></t>

            <t hangText="The TL0PICIDX extension:"> If the L bit is set to 
            one, the TL0PICIDX extension field MUST be present, and MUST NOT 
            be present otherwise. The field consists of one part: 
            <list style="hanging">
            <t hangText="TL0PICIDX:">8 bits temporal level zero
            index. TL0PICIDX is a running index for the temporal base
            layer frames, i.e., the frames with TID set to 0.  If TID
            is larger than 0, TL0PICIDX indicates which base layer
            frame the current image depends on. TL0PICIDX MUST be
            incremented when TID is 0.  The index MAY start at a
            random value, and MUST wrap to 0 after reaching the
            maximum number 255. Use of TL0PICIDX depends on the presence
			of TID. It is therefore RECOMMENDED that the TID is used 
			whenever TL0PICIDX is.</t></list></t>

            <t hangText="The TID/Y/KEYIDX extension:"> If either of
            the T or K bits is set to one, the TID/Y/KEYIDX extension
            field MUST be present. It MUST NOT be present if both T
            and K are zero. The field consists of three parts:
            <list style="hanging">
            <t hangText="TID:">2 bits temporal layer index. The TID field
            MUST be ignored by the receiver when the T bit is set equal to 0.
            The TID field indicates which temporal layer the packet
            represents. The lowest layer, i.e., the base layer, MUST have TID
            set to 0. Higher layers SHOULD increment the TID according to
            their position in the layer hierarchy.</t>

            <t hangText="Y:">1 layer sync bit. The Y bit SHOULD be set to 1 if
            the current frame depends only on the base layer (TID = 0) frame
            with TL0PICIDX equal to that of the current frame. The Y bit MUST
            be set to 0 if the current frame depends on any other frame than the
            base layer (TID = 0) frame with TL0PICIDX equal to that of the
            current frame. Additionally, the Y bit MUST be set to 0 if any frame
			following the current frame depends on a non-base layer frame older 
			than the base layer frame with TL0PICIDX equal to that of the current 
            frame. If the Y bit is set when the T bit is equal to 0,
            the current frame MUST only depend on a past base layer (TID=0)
            key frame as signaled by a change in the KEYIDX field.
            Additionally this frame MUST NOT depend on any of the three codec
            buffers (as defined by <xref target="RFC6386"/>) that have been
            updated since the last time the KEYIDX field was changed.</t><!-- <list
                style="empty">-->
                <t>Informative note: This document does not describe how to
                determine the dependency status for a frame; this information
                is preferably provided from the encoder implementation. In the
                case of unknown status, the Y bit can safely be set to 0.</t>
              <!--</list></t>-->

            <t hangText="KEYIDX:">5 bits temporal key frame index. The
            KEYIDX field MUST be ignored by the receiver when the K bit is set
            equal to 0. The KEYIDX field is a running index for key frames.
            KEYIDX MAY start at a random value, and MUST wrap to 0 after
            reaching the maximum number 31. When in use, the KEYIDX SHOULD be
            present for both key frames and interframes. The sender MUST
            increment KEYIDX for key frames which convey parameter updates
            critical to the interpretation of subsequent frames, and SHOULD
            leave the KEYIDX unchanged for key frames that do not contain
            these critical updates. If the KEYIDX is present, a receiver
            SHOULD NOT decode an interframe if it has not received and decoded
            a key frame with the same KEYIDX after the last KEYIDX
            wrap-around. </t><!--<list style="empty">-->
                <t>Informative note: This document does not describe how to
                determine if a key frame updates critical parameters; this
                information is preferably provided from the encoder
                implementation. A sender that does not have this information
                may either omit the KEYIDX field (set K equal to 0), or
                increment the KEYIDX on every key frame. The benefit with the
                latter is that any key frame loss will be detected by the
                receiver, which can signal for re-transmission or request a
                new key frame.</t>
              <!--</list></t>--></list></t>

            <t hangText="Informative note:">Implementations doing splicing of
            VP8 streams will have to make sure the rules for incrementing
            TL0PICIDX and KEYIDX are obeyed across the splice. This will
            likely require rewriting values of TL0PICIDX and KEYIDX after the
            splice.</t>
          </list></t>
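        <figure>
          <preamble>The field rules above can be summarized by a receiver-side
          parser. The following Python code is a non-normative sketch under
          stated assumptions: the names Vp8Descriptor and parse_descriptor are
          hypothetical, the input is assumed to be the RTP payload with the RTP
          header already removed, and reserved bits are ignored as required
          above. The Y bit is reported whenever the TID/Y/KEYIDX octet is
          present, since it is meaningful with T equal to 0 as well.</preamble>

          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vp8Descriptor:                 # hypothetical container
    non_reference: bool              # N bit
    partition_start: bool            # S bit
    pid: int                         # partition index, 0..7
    picture_id: Optional[int] = None
    picture_id_bits: int = 0         # 7 or 15 when present
    tl0picidx: Optional[int] = None
    tid: Optional[int] = None
    layer_sync: Optional[bool] = None  # Y bit
    keyidx: Optional[int] = None
    length: int = 1                  # descriptor size in octets


def parse_descriptor(payload: bytes) -> Vp8Descriptor:
    """Parse the descriptor at the start of an RTP payload."""
    pos = 0

    def next_octet() -> int:
        nonlocal pos
        if pos >= len(payload):
            raise ValueError("truncated VP8 payload descriptor")
        value = payload[pos]
        pos += 1
        return value

    first = next_octet()
    d = Vp8Descriptor(
        non_reference=bool(first & 0x20),
        partition_start=bool(first & 0x10),
        pid=first & 0x07,
    )
    if first & 0x80:                 # X: extension octet follows
        ext = next_octet()
        has_i, has_l = bool(ext & 0x80), bool(ext & 0x40)
        has_t, has_k = bool(ext & 0x20), bool(ext & 0x10)
        if has_i:
            pic = next_octet()
            if pic & 0x80:           # M set: 15-bit PictureID
                d.picture_id = ((pic & 0x7F) << 8) | next_octet()
                d.picture_id_bits = 15
            else:                    # M clear: 7-bit PictureID
                d.picture_id = pic & 0x7F
                d.picture_id_bits = 7
        if has_l:
            d.tl0picidx = next_octet()
        if has_t or has_k:           # one shared TID/Y/KEYIDX octet
            tk = next_octet()
            d.layer_sync = bool(tk & 0x20)
            if has_t:                # TID is ignored when T is 0
                d.tid = tk >> 6
            if has_k:                # KEYIDX is ignored when K is 0
                d.keyidx = tk & 0x1F
    d.length = pos
    return d
```
          ]]></artwork>

          <postamble>The returned length tells the caller where the VP8
          payload data begins within the RTP payload.</postamble>
        </figure>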

        <t><vspace blankLines="100"/></t>

        <!-- force a pagebreak-->
      </section>

      <section anchor="VP8payloadHeader" title="VP8 Payload Header">
        <t>The beginning of an encoded VP8 frame is referred to as an
        "uncompressed data chunk" in Section 9.1 of <xref target="RFC6386"/>, and
        also serves as a payload header in this RTP format. The codec
        bitstream format specifies two different variants of the
        uncompressed data chunk: a 3 octet version for interframes and
        a 10 octet version for key frames.  The first 3 octets are
        common to both variants. In the case of a key frame the
        remaining 7 octets are considered to be part of the remaining
        payload in this RTP format. Note that the header is present
        only in packets which have the S bit equal to one and the PID
        equal to zero in the payload descriptor. Subsequent packets
        for the same frame do not carry the payload header.</t>

        <t>The length of the first partition can always be obtained
        from the first partition size parameter in the VP8 payload
        header. The VP8 bitstream format <xref target="RFC6386"/>
        specifies that if multiple DCT/WHT partitions are produced,
        the location of each partition start is found at the end of
        the first (prediction or mode) partition. In this RTP payload
        specification, the location offsets are considered to be part
        of the first partition.</t>

        <figure anchor="figureVP8payloadHeader">
          <artwork><![CDATA[
   0 1 2 3 4 5 6 7
  +-+-+-+-+-+-+-+-+
  |Size0|H| VER |P|
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | Octets 4..N of|
  | VP8 payload   |
  :               :
  +-+-+-+-+-+-+-+-+
  | OPTIONAL RTP  |
  | padding       |
  :               :
  +-+-+-+-+-+-+-+-+
            ]]></artwork>
        </figure>

        <t>A packetizer needs access to the P bit. The remaining
        fields are not described here; they are defined in <xref
        target="RFC6386"/>.</t>

        <t><list style="hanging">
            <t hangText="P:">Inverse key frame flag. When set to 0, the current
            frame is a key frame. When set to 1, the current frame is an
            interframe. Defined in <xref target="RFC6386"/>.</t>
          </list></t>
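        <figure>
          <preamble>For illustration, the three common octets can be decoded
          as in the following non-normative Python sketch. The function name
          parse_payload_header and the dictionary keys are assumptions made
          for this example. The interpretation of the Size, H, and VER fields
          follows the frame tag layout in Section 9.1 of <xref
          target="RFC6386"/>, which remains the authoritative
          definition.</preamble>

          <artwork><![CDATA[
```python
def parse_payload_header(header: bytes) -> dict:
    """Decode the 3 common octets of the VP8 payload header.

    Field layout follows the frame tag in RFC 6386, Section 9.1.
    """
    if len(header) < 3:
        raise ValueError("VP8 payload header needs 3 octets")
    b0, b1, b2 = header[0], header[1], header[2]
    return {
        "is_key_frame": (b0 & 0x01) == 0,  # P is an inverse flag
        "version": (b0 >> 1) & 0x07,       # VER
        "show_frame": bool(b0 & 0x10),     # H
        # Size0 holds the 3 least significant bits of the size.
        "first_partition_size": (b0 >> 5) | (b1 << 3) | (b2 << 11),
    }
```
          ]]></artwork>

          <postamble>A packetizer would apply such a function only to packets
          with the S bit equal to one and PID equal to zero, since other
          packets do not carry the payload header.</postamble>
        </figure>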
      </section>

      <section title="Aggregated and Fragmented Payloads">
        <t>An encoded VP8 frame can be divided into two or more
        partitions, as described in <xref target="intro"/>.  It is
        OPTIONAL for a packetizer implementing this RTP specification
        to pay attention to the partition boundaries within an encoded
        frame.  If packetization of a frame is done without
        considering the partition boundaries, the PID field MAY be set
        to zero for all packets, and the S bit MUST NOT be set to one
        for any other packet than the first.</t>

        <t>If the preferred usage suggested in <xref
        target="mediaFormatDescription"/> is followed, with each packet
        carrying data from exactly one partition, the S bit and PID
        fields described in <xref target="VP8payloadDescriptor"/>
        SHOULD be used to indicate what the packet contains.  The PID
        field should indicate which partition the first octet of the
        payload belongs to, and the S bit indicates that the packet
        starts on a new partition.</t>

        <t>If the packetizer does not pay attention to the partition
        boundaries, one packet can contain a fragment of a partition,
        a complete partition, or an aggregate of fragments and
        partitions.  There is no explicit signaling of partition
        boundaries in the payload, so the partition lengths at the end
        of the first partition have to be used to identify the
        boundaries. Partitions MUST be aggregated in decoding order.
        Two fragments from different partitions MAY be aggregated into
        the same packet along with one or more complete
        partitions.</t>

        <t>In all cases, the payload of a packet MUST contain data
        from only one video frame.  Consequently the set of packets
        carrying the data from a particular frame will contain exactly
        one VP8 Payload Header (see <xref target="VP8payloadHeader"/>)
        carried in the first packet of the frame.  The last, or only,
        packet carrying data for the frame MUST have the M bit set in
        the RTP header.</t>
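
        <t>The following non-normative sketch shows one way a packetizer
        that ignores partition boundaries might split a frame. It assumes
        a one-octet payload descriptor (X = 0, N = 0) in which the S bit
        has the value 0x10 and PID occupies the three least significant
        bits, as in the examples in this document. The function name and
        the (marker, payload) tuple representation are hypothetical, and
        RTP header construction is left out.</t>

        <figure>
          <artwork><![CDATA[
```python
def packetize_frame(frame: bytes, max_payload: int) -> list:
    """Split one encoded frame into (marker_bit, rtp_payload) tuples."""
    if not frame:
        raise ValueError("empty frame")
    if max_payload < 2:
        raise ValueError("need room for the descriptor and some data")
    chunk = max_payload - 1  # one octet is taken by the descriptor
    packets = []
    for offset in range(0, len(frame), chunk):
        first = offset == 0
        last = offset + chunk >= len(frame)
        # S is set only on the first packet; PID stays zero throughout.
        descriptor = 0x10 if first else 0x00
        payload = bytes([descriptor]) + frame[offset:offset + chunk]
        # The M bit marks the last (or only) packet of the frame.
        packets.append((last, payload))
    return packets
```
]]></artwork>
        </figure>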
      </section>

      <section title="Example algorithms">
	<section title="Frame reconstruction algorithm">
          <t>The following is an example of a frame reconstruction
          algorithm. <list style="hanging">
          <t hangText="1:">Collect all packets with a given RTP
          timestamp.</t>

          <t hangText="2:">Go through the packets in order, sorted by sequence
          number. If packets are missing, either send a NACK as defined in <xref
          target="RFC4585"/> or decode with missing partitions; see <xref
          target="FrameReconstructionWithLoss"/> below.</t>

          <t hangText="3:">A frame is complete if it has no missing
          sequence numbers, the first packet in the frame has S=1 and
          PID=0, and the last packet in the frame has the marker bit
          set.</t>
          </list></t>
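
          <t>The completeness test in step 3 could be sketched as follows.
          This is a non-normative illustration: the Packet class and its
          field names are hypothetical, the packets are assumed to share
          one RTP timestamp already (step 1), and sequence numbers are
          compared modulo 65536 so that a frame spanning a wraparound is
          handled.</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int              # 16-bit RTP sequence number
    marker: bool = False  # RTP marker (M) bit
    s: bool = False       # S bit of the payload descriptor
    pid: int = 0          # PID field of the payload descriptor


def frame_is_complete(packets: list) -> bool:
    """Apply the three conditions of step 3 to one frame's packets."""
    if not packets:
        return False
    base = packets[0].seq

    def distance(p):
        # Signed distance from an arbitrary packet of the same frame.
        return ((p.seq - base + 0x8000) & 0xFFFF) - 0x8000

    ordered = sorted(packets, key=distance)
    for prev, cur in zip(ordered, ordered[1:]):
        if (cur.seq - prev.seq) & 0xFFFF != 1:
            return False  # a sequence number is missing
    first, last = ordered[0], ordered[-1]
    return first.s and first.pid == 0 and last.marker
```
]]></artwork>
          </figure>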
	</section>

        <section anchor="FrameReconstructionWithLoss"
                 title="Partition reconstruction algorithm">
          <t>The following is an example of a partition reconstruction
          algorithm. The algorithm only applies to the RECOMMENDED use
          case with partitions in separate packets.
          <list style="hanging">
              <t hangText="1:">Scan for the start of a new partition; S=1.</t>

              <t hangText="2:">Continue the scan to detect the end of the
              partition, that is, until a new S=1 is found (the previous
              packet was the end of the partition) or the marker bit is set.
              If a loss is detected before the end of the partition, abandon
              all packets in this partition and continue the scan, repeating
              from step 1.</t>

              <t hangText="3:">Store the packets in the complete partition and
              continue the scan, repeating from step 1, until the end of the
              frame is reached.</t>

              <t hangText="4:">Send all complete partitions to the decoder. If
              no complete partition is found, discard the whole frame.</t>
            </list></t>
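
          <t>A non-normative sketch of these steps is given below. It
          assumes that the received packets of one frame are already sorted
          by sequence number and that each packet carries data from exactly
          one partition. The Packet class, its field names, and the choice
          to return (PID, data) pairs are hypothetical. A gap in the
          sequence numbers is treated conservatively: the partition being
          collected is abandoned even if the lost packet belonged to a
          later partition.</t>

          <figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int       # 16-bit RTP sequence number
    marker: bool   # RTP marker (M) bit
    s: bool        # S bit of the payload descriptor
    pid: int       # PID field of the payload descriptor
    data: bytes    # payload after the payload descriptor


def complete_partitions(packets: list) -> list:
    """Return (pid, data) for every partition received without loss."""
    done = []
    current = None  # (pid, [chunks]) of the partition being collected
    prev_seq = None
    for p in packets:
        if prev_seq is not None and (p.seq - prev_seq) & 0xFFFF != 1:
            current = None  # loss detected: abandon this partition
        prev_seq = p.seq
        if p.s:
            if current is not None:
                done.append(current)  # previous packet ended it
            current = (p.pid, [p.data])
        elif current is not None:
            current[1].append(p.data)
        if p.marker and current is not None:
            done.append(current)  # marker bit ends the last partition
            current = None
    # An empty result means the whole frame is discarded (step 4).
    return [(pid, b"".join(chunks)) for pid, chunks in done]
```
]]></artwork>
          </figure>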
        </section>
      </section>

      <section title="Examples of VP8 RTP Stream">
        <t>A few examples of how the VP8 RTP payload can be used are included
        below.</t>

        <section title="Key frame in a single RTP packet">
          <figure>
            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 1        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 0| X = 1; S = 1; PID = 0
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  |Size0|1| VER |0| P = 0
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | VP8 payload   |
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>
        </section>

        <section title="Non-discardable VP8 interframe in a single RTP packet; no PictureID">
          <figure>
            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 1        |
  +-+-+-+-+-+-+-+-+
  |0|0|0|1|0|0 0 0| X = 0; S = 1; PID = 0
  +-+-+-+-+-+-+-+-+
  |Size0|1| VER |1| P = 1
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | VP8 payload   |
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>

          <t><vspace blankLines="100"/></t>

          <!-- force a pagebreak-->
        </section>

        <section title="VP8 partitions in separate RTP packets">
          <figure>
            <preamble>First RTP packet; complete first partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 0        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 0| X = 1; S = 1; PID = 0
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  |Size0|1| VER |1| P = 1
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | Octets 4..L of|
  | first VP8     |
  | partition     |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>

          <figure>
            <preamble>Second RTP packet; complete second partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 1        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 1| X = 1; S = 1; PID = 1
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  | Remaining VP8 |
  | partitions    |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>
        </section>

        <section title="VP8 frame fragmented across RTP packets">
          <figure>
            <preamble>First RTP packet; complete first partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 0        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 0| X = 1; S = 1; PID = 0
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  |Size0|1| VER |1| P = 1
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | Complete      |
  | first         |
  | partition     |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>

          <figure>
            <preamble>Second RTP packet; first fragment of second
            partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 0        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 1| X = 1; S = 1; PID = 1
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  | First fragment|
  | of second     |
  | partition     |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>

          <figure>
            <preamble>Third RTP packet; second fragment of second
            partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 0        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0|0 0 1| X = 1; S = 0; PID = 1
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  | Mid fragment  |
  | of second     |
  | partition     |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>

          <figure>
            <preamble>Fourth RTP packet; last fragment of second
            partition.</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 1        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0|0 0 1| X = 1; S = 0; PID = 1
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |0 0 0 1 0 0 0 1| PictureID = 17
  +-+-+-+-+-+-+-+-+
  | Last fragment |
  | of second     |
  | partition     |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>
        </section>

        <section title="VP8 frame with long PictureID">
          <figure>
            <preamble>PictureID = 4711 = 001001001100111 binary (first 7 bits:
            0010010, last 8 bits: 01100111).</preamble>

            <artwork><![CDATA[
   0 1 2 3 4 5 6 7 
  +-+-+-+-+-+-+-+-+
  |  RTP header   |
  |  M = 1        |
  +-+-+-+-+-+-+-+-+
  |1|0|0|1|0|0 0 0| X = 1; S = 1; PID = 0
  +-+-+-+-+-+-+-+-+
  |1|0|0|0|0 0 0 0| I = 1
  +-+-+-+-+-+-+-+-+
  |1 0 0 1 0 0 1 0| Long PictureID flag = 1
  |0 1 1 0 0 1 1 1| PictureID = 4711
  +-+-+-+-+-+-+-+-+
  |Size0|1| VER |1|
  +-+-+-+-+-+-+-+-+
  |     Size1     |
  +-+-+-+-+-+-+-+-+
  |     Size2     |
  +-+-+-+-+-+-+-+-+
  | Octets 4..N of|
  | VP8 payload   |
  :               :
  +-+-+-+-+-+-+-+-+
              ]]></artwork>
          </figure>
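
          <t>The octets shown above can be reproduced with the following
          non-normative sketch. The function names and the long_form
          parameter are hypothetical; the sketch only assumes that the most
          significant bit of the first PictureID octet is the long PictureID
          flag, as in the figure.</t>

          <figure>
            <artwork><![CDATA[
```python
def encode_picture_id(picture_id: int, long_form: bool) -> bytes:
    """Encode a PictureID in 7-bit (one octet) or 15-bit (two octets) form."""
    limit = 0x7FFF if long_form else 0x7F
    if not 0 <= picture_id <= limit:
        raise ValueError("PictureID out of range for the chosen form")
    if long_form:
        # The flag bit is set, followed by the 15-bit value, MSB first.
        return bytes([0x80 | (picture_id >> 8), picture_id & 0xFF])
    return bytes([picture_id])


def decode_picture_id(data: bytes) -> tuple:
    """Return (picture_id, number_of_octets_consumed)."""
    if data[0] & 0x80:
        return ((data[0] & 0x7F) << 8) | data[1], 2
    return data[0], 1
```
]]></artwork>
          </figure>

          <t>With these definitions, encoding 4711 in the long form yields
          the octets 0x92 and 0x67, which match the two PictureID rows of
          the figure.</t>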
        </section>
      </section>
    </section>

    <section anchor="RPSIandSLI" title="Using VP8 with RPSI and SLI Feedback">
      <t>The VP8 payload descriptor defined in <xref
      target="VP8payloadDescriptor"/> above contains an optional PictureID
      parameter. This parameter is included mainly to enable use of reference
      picture selection index (RPSI) and slice loss indication (SLI), both
      defined in <xref target="RFC4585"/>.</t>

      <section anchor="RPSI" title="RPSI">
        <t>The reference picture selection index is a payload-specific
        feedback message defined within the RTCP-based feedback
        format. The RPSI message is generated by a receiver and can be
        used in two ways.  Either it can signal a preferred reference
        picture when a loss has been detected by the decoder --
        preferably then a reference that the decoder knows is perfect
        -- or, it can be used as positive feedback information to
        acknowledge correct decoding of certain reference
        pictures. The positive feedback method is useful when VP8 is used
        for point-to-point (unicast) communication. The use of RPSI
        for VP8 is preferably combined with a special update pattern
        of the codec's two special reference frames -- the golden
        frame and the altref frame -- in which they are updated in an
        alternating leapfrog fashion. When a receiver has received and
        correctly decoded a golden or altref frame, and that frame had
        a PictureID in the payload descriptor, the receiver can
        acknowledge this simply by sending an RPSI message back to the
        sender. The message body (i.e., the "native RPSI bit string"
        in <xref target="RFC4585"/>) is simply the PictureID of the
        received frame.</t>
      </section>

      <section anchor="SLI" title="SLI">
        <t>The slice loss indication is another payload-specific feedback
        message defined within the RTCP-based feedback format. The SLI message
        is generated by the receiver when a loss or corruption is detected in
        a frame. The format of the SLI message is as follows <xref
        target="RFC4585"/>:</t>

        <figure anchor="figureSLIHeader">
          <artwork><![CDATA[
   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |         First           |        Number           | PictureID |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            ]]></artwork>
        </figure>

        <t>Here, First is the macroblock address (in scan order) of
        the first lost block and Number is the number of lost blocks,
        as defined in <xref target="RFC4585"/>. PictureID is the six
        least significant bits of the codec-specific picture
        identifier in which the loss or corruption has occurred. For
        VP8, this codec-specific identifier is naturally the PictureID
        of the current frame, as read from the payload descriptor. If
        the payload descriptor of the current frame does not have a
        PictureID, the receiver MAY send the last received PictureID+1
        in the SLI message. The receiver MAY set the First parameter
        to 0, and the Number parameter to the total number of
        macroblocks per frame, even though only part of the frame is
        corrupted. When the sender receives an SLI message, it can
        make use of the knowledge from the latest received RPSI
        message. Knowing that the last golden or altref frame was
        successfully received, it can encode the next frame with
        reference to that established reference.</t>
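
        <t>As a non-normative illustration, the 32-bit SLI item shown in
        the figure above (13 bits for First, 13 bits for Number, and 6 bits
        for PictureID) could be packed and unpacked as sketched below. The
        function names are hypothetical, and the surrounding RTCP feedback
        header defined in <xref target="RFC4585"/> is not included.</t>

        <figure>
          <artwork><![CDATA[
```python
def pack_sli(first: int, number: int, picture_id: int) -> bytes:
    """Pack one SLI item; only the 6 LSBs of picture_id are carried."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First must fit in 13 bits")
    if not 0 <= number < (1 << 13):
        raise ValueError("Number must fit in 13 bits")
    word = (first << 19) | (number << 6) | (picture_id & 0x3F)
    return word.to_bytes(4, "big")


def unpack_sli(item: bytes) -> tuple:
    """Return (first, number, picture_id_lsbs) from a 4-octet SLI item."""
    word = int.from_bytes(item[:4], "big")
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F
```
]]></artwork>
        </figure>

        <t>For example, a receiver reporting a whole 640x480 frame
        (1200 macroblocks) with PictureID 100 as lost would use First = 0
        and Number = 1200, and the item would carry 100 mod 64 = 36 in the
        PictureID field.</t>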
      </section>

      <section title="Example">
        <t>The use of RPSI and SLI is best illustrated in an example. In this
        example, the encoder may not update the altref frame until the last
        sent golden frame has been acknowledged with an RPSI message. If no
        acknowledgment is received within some time, a new golden frame update is
        sent instead. Once the new golden frame is established and
        acknowledged, the same rule applies when updating the altref
        frame.</t>

        <texttable anchor="table_example_timing"
                   title="Example signaling between sender and receiver">
          <ttcol align="left">Event</ttcol>

          <ttcol align="left">Sender</ttcol>

          <ttcol align="left">Receiver</ttcol>

          <ttcol align="left">Established reference</ttcol>

          <c>1000</c>

          <c>Send golden frame PictureID = 0</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Receive and decode golden frame</c>

          <c/>

          <c>1001</c>

          <c/>

          <c>Send RPSI(0)</c>

          <c/>

          <c>1002</c>

          <c>Receive RPSI(0)</c>

          <c/>

          <c>golden</c>

          <c>...</c>

          <c>(sending regular frames)</c>

          <c/>

          <c/>

          <c>1100</c>

          <c>Send altref frame PictureID = 100</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Altref corrupted or lost</c>

          <c>golden</c>

          <c>1101</c>

          <c/>

          <c>Send SLI(100)</c>

          <c>golden</c>

          <c>1102</c>

          <c>Receive SLI(100)</c>

          <c/>

          <c/>

          <c>1103</c>

          <c>Send frame with reference to golden</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Receive and decode frame (decoder state restored)</c>

          <c>golden</c>

          <c>...</c>

          <c>(sending regular frames)</c>

          <c/>

          <c/>

          <c>1200</c>

          <c>Send altref frame PictureID = 200</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Receive and decode altref frame</c>

          <c>golden</c>

          <c>1201</c>

          <c/>

          <c>Send RPSI(200)</c>

          <c/>

          <c>1202</c>

          <c>Receive RPSI(200)</c>

          <c/>

          <c>altref</c>

          <c>...</c>

          <c>(sending regular frames)</c>

          <c/>

          <c/>

          <c>1300</c>

          <c>Send golden frame PictureID = 300</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Receive and decode golden frame</c>

          <c>altref</c>

          <c>1301</c>

          <c/>

          <c>Send RPSI(300)</c>

          <c>altref</c>

          <c>1302</c>

          <c>RPSI lost</c>

          <c/>

          <c/>

          <c>1400</c>

          <c>Send golden frame PictureID = 400</c>

          <c/>

          <c/>

          <c/>

          <c/>

          <c>Receive and decode golden frame</c>

          <c>altref</c>

          <c>1401</c>

          <c/>

          <c>Send RPSI(400)</c>

          <c/>

          <c>1402</c>

          <c>Receive RPSI(400)</c>

          <c/>

          <c>golden</c>
        </texttable>

        <t>Note that the scheme is robust to loss of the feedback messages. If
        the RPSI is lost, the sender will try to update the golden (or altref)
        again after a while, without releasing the established reference.
        Also, if an SLI is lost, the receiver can keep sending SLI messages at
        any interval allowed by the RTCP sending timing restrictions as
        specified in <xref target="RFC4585"/>, as long as the picture is
        corrupted.</t>
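
        <t>The sender-side bookkeeping implied by the table can be sketched
        as follows. This is a non-normative illustration under stated
        assumptions: the class and method names are hypothetical, timers
        are omitted, and the choice to return None (meaning that no
        established reference exists yet, so something like a key frame
        would be needed) is an assumption of the sketch rather than a rule
        of this specification.</t>

        <figure>
          <artwork><![CDATA[
```python
class ReferenceTracker:
    """Track which special reference frame the receiver has acknowledged."""

    def __init__(self):
        self.established = None  # "golden", "altref", or None
        self.pending = {}        # PictureID -> buffer awaiting an RPSI

    def sent_update(self, buffer: str, picture_id: int) -> None:
        # Remember the update; the old reference stays established.
        self.pending[picture_id] = buffer

    def on_rpsi(self, picture_id: int) -> None:
        # A positive acknowledgment establishes the updated buffer.
        buffer = self.pending.pop(picture_id, None)
        if buffer is not None:
            self.established = buffer

    def on_sli(self):
        # Encode the next frame against the established reference.
        return self.established
```
]]></artwork>
        </figure>

        <t>Replaying the events of the table against this sketch, the
        established reference is "golden" after RPSI(0), remains "golden"
        when SLI(100) arrives, becomes "altref" after RPSI(200), stays
        "altref" while the RPSI for PictureID 300 is lost, and becomes
        "golden" again after RPSI(400).</t>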
      </section>
    </section>

    <section anchor="payloadFormatParameters"
             title="Payload Format Parameters">
      <t>This payload format has two optional parameters.</t>

      <section anchor="mediaTypeRegistration" title="Media Type Definition">
        <t>This registration is done using the template defined in <xref
        target="RFC6838"/> and following <xref target="RFC4855"/>. <list
            style="hanging">
            <t hangText="Type name:">video</t>

            <t hangText="Subtype name:">VP8</t>

            <t hangText="Required parameters:">None.</t>

            <t hangText="Optional parameters:"><vspace
            blankLines="0"/> These parameters are used to signal the
            capabilities of a receiver implementation. If the
            implementation is willing to receive media, both
            parameters MUST be provided. These parameters MUST NOT be
            used for any other purpose. <list style="hanging">
                <t hangText="max-fr:">The value of max-fr is an integer
                indicating the maximum frame rate in units of frames per
                second that the decoder is capable of decoding.</t>

                <t hangText="max-fs:">The value of max-fs is an integer
                indicating the maximum frame size in units of macroblocks that
                the decoder is capable of decoding.</t>

                <t>The decoder is capable of decoding this frame size as long
                as the width and height of the frame in macroblocks are less
                than int(sqrt(max-fs * 8)). For instance, a max-fs of 1200
                (capable of supporting 640x480 resolution) will support widths
                and heights up to 1552 pixels (97 macroblocks).</t>
              </list></t>

            <t hangText="Encoding considerations:"><vspace blankLines="0"/>
            This media type is framed in RTP and contains binary data; see
            Section 4.8 of <xref target="RFC6838"/>.</t>

            <t hangText="Security considerations:">See <xref
            target="securityConsiderations"/> of RFC XXXX. <vspace
            blankLines="0"/> [RFC Editor: Upon publication as an RFC, please
            replace "XXXX" with the number assigned to this document and
            remove this note.]</t>

            <t hangText="Interoperability considerations:">None.</t>

            <t hangText="Published specification:">VP8 bitstream format <xref
            target="RFC6386"/> and RFC XXXX. <vspace blankLines="0"/> [RFC
            Editor: Upon publication as an RFC, please replace "XXXX" with the
            number assigned to this document and remove this note.] <vspace
            blankLines="0"/></t>

            <t hangText="Applications which use this media type:"><vspace
            blankLines="0"/> For example: Video over IP, video
            conferencing.</t>

            <t hangText="Fragment identifier considerations:">N/A.</t>

            <t hangText="Additional information:">None.</t>

            <t
            hangText="Person &amp; email address to contact for further information:"><vspace
            blankLines="0"/> Patrik Westin, patrik.westin@gmail.com</t>

            <t hangText="Intended usage:">COMMON</t>

            <t hangText="Restrictions on usage:"><vspace blankLines="0"/> This
            media type depends on RTP framing, and hence is only defined for
            transfer via RTP <xref target="RFC3550"/>.</t>

            <t hangText="Author:">Patrik Westin, patrik.westin@gmail.com</t>

            <t hangText="Change controller:"><vspace blankLines="0"/> IETF
            Payload Working Group delegated from the IESG.</t>
          </list></t>
      </section>

      <section anchor="SdpParameters" title="SDP Parameters">
        <t>The receiver MUST ignore any fmtp parameter unspecified in this
        memo.</t>

        <section title="Mapping of Media Subtype Parameters to SDP">
          <t>The media type video/VP8 string is mapped to fields in the
          Session Description Protocol (SDP) <xref target="RFC4566"/> as
          follows: <list style="symbols">
              <t>The media name in the "m=" line of SDP MUST be video.</t>

              <t>The encoding name in the "a=rtpmap" line of SDP MUST be VP8
              (the media subtype).</t>

              <t>The clock rate in the "a=rtpmap" line MUST be 90000.</t>

              <t>The parameters "max-fs" and "max-fr" MUST be included in
              the "a=fmtp" line if the SDP is used to declare receiver
              capabilities. These parameters are expressed as a
              media subtype string, in the form of a semicolon-separated list
              of parameter=value pairs.</t>
            </list></t>

          <section title="Example">
            <t>An example of media representation in SDP is as follows:</t>

            <t>m=video 49170 RTP/AVPF 98<vspace blankLines="0"/> a=rtpmap:98
            VP8/90000<vspace blankLines="0"/> a=fmtp:98 max-fr=30;
            max-fs=3600;<vspace blankLines="0"/></t>
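
            <t>A receiver of such a description might extract the two
            parameters as in the following non-normative sketch. The
            function name is hypothetical. Parameters other than max-fr and
            max-fs are skipped, reflecting the rule that unspecified fmtp
            parameters are ignored.</t>

            <figure>
              <artwork><![CDATA[
```python
def parse_vp8_fmtp(line: str) -> dict:
    """Extract max-fr and max-fs from an 'a=fmtp:<pt> ...' line."""
    prefix, _, params = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    result = {}
    for item in params.split(";"):
        name, sep, value = item.partition("=")
        name = name.strip()
        # Unknown or malformed parameters are ignored.
        if sep and name in ("max-fr", "max-fs"):
            result[name] = int(value.strip())
    return result
```
]]></artwork>
            </figure>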
          </section>
        </section>

        <section title="Offer/Answer Considerations">
          <t>The VP8 codec offers a decode complexity that is roughly linear
          with the number of pixels encoded. The parameters "max-fr" and
          "max-fs" are defined in <xref target="mediaTypeRegistration"/>,
          where the macroblock size is 16x16 pixels as defined in <xref
          target="RFC6386"/>. The max-fs and max-fr parameters MUST be used to
          establish these limits.</t>
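
          <t>The arithmetic behind max-fs can be illustrated with the
          following non-normative sketch, which assumes the 16x16 macroblock
          size and the int(sqrt(max-fs * 8)) expression from the media type
          definition. The function names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
import math

MACROBLOCK = 16  # macroblock width and height in pixels


def frame_size_in_macroblocks(width: int, height: int) -> int:
    """Number of macroblocks needed to cover a width x height frame."""
    cols = -(-width // MACROBLOCK)   # ceiling division
    rows = -(-height // MACROBLOCK)
    return cols * rows


def dimension_bound(max_fs: int) -> tuple:
    """Return int(sqrt(max-fs * 8)) in macroblocks and in pixels."""
    mbs = math.isqrt(max_fs * 8)
    return mbs, mbs * MACROBLOCK
```
]]></artwork>
          </figure>

          <t>For a 640x480 frame, the first function gives 40 * 30 = 1200
          macroblocks, and for max-fs = 1200 the second gives 97 macroblocks,
          or 1552 pixels.</t>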
        </section>
      </section>
    </section>

    <section anchor="securityConsiderations" title="Security Considerations">
      <t>RTP packets using the payload format defined in this
      specification are subject to the security considerations
      discussed in the RTP specification <xref target="RFC3550"/>,
      and in any applicable RTP profile such as RTP/AVP <xref
      target="RFC3551"/>, RTP/AVPF <xref target="RFC4585"/>, RTP/SAVP
      <xref target="RFC3711"/> or RTP/SAVPF <xref target="RFC5124"/>.
      However, as "Securing the RTP Protocol Framework: Why RTP Does
      Not Mandate a Single Media Security Solution" <xref
      target="RFC7202"/> discusses, it is not an RTP payload format's
      responsibility to discuss or mandate what solutions are used to
      meet the basic security goals like confidentiality, integrity
      and source authenticity for RTP in general.  This responsibility
      lies with anyone using RTP in an application.  They can find
      guidance on available security mechanisms and important
      considerations in "Options for Securing RTP Sessions" <xref
      target="RFC7201"/>.  Applications SHOULD use one or more
      appropriate strong security mechanisms.  The rest of this
      Security Considerations section discusses the security-impacting
      properties of the payload format itself.</t>

      <t>This RTP payload format and its media decoder do not exhibit
      any significant non-uniformity in the receiver-side
      computational complexity for packet processing, and thus are
      unlikely to pose a denial-of-service threat due to the receipt
      of pathological data.  Nor does the RTP payload format contain
      any active content.</t>
    </section>

    <section anchor="congestionControl" title="Congestion Control">
      <t>Congestion control for RTP SHALL be used in accordance with RFC 3550
      <xref target="RFC3550"/>, and with any applicable RTP profile; e.g., RFC
      3551 <xref target="RFC3551"/>. The congestion control mechanism can, in
      a real-time encoding scenario, adapt the transmission rate by
      instructing the encoder to encode at a certain target rate. Media-aware
      network elements MAY use the information in the VP8 payload descriptor
      in <xref target="VP8payloadDescriptor"/> to identify non-reference
      frames and discard them in order to reduce network congestion. Note that
      discarding of non-reference frames cannot be done if the stream is
      encrypted (because the non-reference marker is encrypted).</t>
    </section>

    <section anchor="IANAConsiderations" title="IANA Considerations">
      <t>The IANA is requested to register the following values:<vspace
      blankLines="0"/> - Media type registration as described in <xref
      target="mediaTypeRegistration"/>.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc6386;

      &rfc2119;

      &rfc4585;

      &rfc3550;

      &rfc3551;

      &rfc4566;

      &rfc6838;

      &rfc4855;
    </references>

    <references title="Informative References">
      &rfc3711;

      &rfc5124;

      &rfc7201;

      &rfc7202;

      <reference anchor="Sch07" target="http://dx.doi.org/10.1109/TCSVT.2007.905532">
	<front>
	  <title>Overview of the Scalable Video Coding Extension of the H.264/AVC Standard</title>
	  <author initials="H." surname="Schwarz" fullname="Heiko Schwarz"><organization/></author>
	  <author initials="D." surname="Marpe" fullname="Detlev Marpe"><organization/></author>
	  <author initials="T." surname="Wiegand" fullname="Thomas Wiegand"><organization/></author>
	  <date year="2007" />
	</front>
	<seriesInfo name="Circuits and Systems for Video Technology, IEEE Transactions on," value="17(9)" />
	<seriesInfo name="DOI" value="10.1109/TCSVT.2007.905532" />
      </reference>

    </references>
  </back>
</rfc>
<!--  LocalWords:  PictureID DCT Hadamard WHT SSRC CSRC pyld hdr FI VER RPSI
 -->
<!--  LocalWords:  stPartitionSize SLI SDP AVPF SRTP IANA PID PICIDX TID
 -->
<!--  LocalWords:  RTP Amphitheatre Kungsbron WebM RTCP KEYIDX NACK
-->
<!--  LocalWords:  partId IP IETF IESG AVP SAVP SAVPF
-->
