One document matched: draft-ietf-avtext-lrr-01.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3550 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc4585 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4585.xml">
<!ENTITY rfc5104 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5104.xml">
<!ENTITY rfc6190 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6190.xml">
<!ENTITY vp8rtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-payload-vp8.xml">
<!ENTITY vp9rtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.uberti-payload-vp9.xml">
<!ENTITY h265rtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-payload-rtp-h265.xml">
<!ENTITY taxonomy SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-avtext-rtp-grouping-taxonomy.xml">
]>
<rfc category="std" docName="draft-ietf-avtext-lrr-01" ipr="trust200902">
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<!-- alphabetize the references -->
<?rfc comments="no"?>
<!-- show comments -->
<?rfc inline="yes" ?>
<!-- comments are inline -->
<?rfc toc="yes" ?>
<!-- generate table of contents -->
<front>
<title abbrev="Layer Refresh Request RTCP Feedback">The Layer Refresh Request (LRR) RTCP Feedback Message</title>
<author fullname="Jonathan Lennox" initials="J." surname="Lennox">
<organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address>
<postal>
<street>433 Hackensack Avenue</street>
<street>Seventh Floor</street>
<city>Hackensack</city>
<region>NJ</region>
<code>07601</code>
<country>US</country>
</postal>
<email>jonathan@vidyo.com</email>
</address>
</author>
<author fullname="Danny Hong" initials="D." surname="Hong">
<organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address>
<postal>
<street>433 Hackensack Avenue</street>
<street>Seventh Floor</street>
<city>Hackensack</city>
<region>NJ</region>
<code>07601</code>
<country>US</country>
</postal>
<email>danny@vidyo.com</email>
</address>
</author>
<author fullname="Justin Uberti" initials="J." surname="Uberti">
<organization abbrev="Google">Google, Inc.</organization>
<address>
<postal>
<street>747 6th Street South</street>
<city>Kirkland</city>
<region>WA</region>
<code>98033</code>
<country>USA</country>
</postal>
<email>justin@uberti.name</email>
</address>
</author>
<author fullname="Stefan Holmer" initials="S." surname="Holmer">
<organization abbrev="Google">Google, Inc.</organization>
<address>
<postal>
<street>Kungsbron 2</street>
<code>111 22</code>
<city>Stockholm</city>
<country>Sweden</country>
</postal>
<email>holmer@google.com</email>
</address>
</author>
<author fullname="Magnus Flodman" initials="M." surname="Flodman">
<organization abbrev="Google">Google, Inc.</organization>
<address>
<postal>
<street>Kungsbron 2</street>
<code>111 22</code>
<city>Stockholm</city>
<country>Sweden</country>
</postal>
<email>mflodman@google.com</email>
</address>
</author>
<date/>
<area>RAI</area>
<workgroup>Payload Working Group</workgroup>
<keyword>RFC</keyword>
<keyword>Request for Comments</keyword>
<keyword>RTP</keyword>
<abstract>
<t>This memo describes the RTCP Payload-Specific Feedback Message
"Layer Refresh Request" (LRR), which can be used to request a
state refresh of one or more substreams of a layered media
stream. It also defines its use with several scalable media
formats.</t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>This memo describes an <xref target="RFC3550">RTCP</xref> <xref target="RFC4585">Payload-Specific Feedback Message</xref>
"Layer Refresh Request" (LRR). It is designed to allow a
receiver of a layered media stream to request that one or more
of its substreams be refreshed, such that it can then be
decoded by an endpoint which previously was not receiving those
layers, without requiring that the
entire stream be refreshed (as it would be if the receiver
sent a <xref target='RFC5104'>Full Intra Request (FIR)</xref>.</t>
<t>The message is designed to be applicable both to temporally
and spatially scaled streams, and to both single-stream and
multi-stream scalability modes.</t>
</section>
<section anchor="conventions"
title="Conventions, Definitions and Acronyms">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
<section anchor="terminology"
title="Terminology">
<t>A "Layer Refresh Point" is a point in a scalable stream after
which a decoder, which previously had been able to decode only some
(possibly none) of the available layers of stream, is able to
decode a greater number of the layers.</t>
<t>For spatial (or quality) layers, layer refresh typically
requires that a spatial layer be encoded in a way that
references only lower-layer subpictures of the current picture,
not any earlier pictures of that spatial layer. Additionally,
the encoder must promise that no earlier pictures of that
spatial layer will be used as reference in the future.</t>
<t>In a layer refresh, however, other layers than the ones
requested for refresh may still maintain dependency on earlier
content of the stream. This is the difference between a layer
refresh and a <xref target='RFC5104'>Full Intra
Request</xref>. This minimizes the coding overhead of refresh
to only those parts of the stream that actually need to be
refreshed at any given time.</t>
<figure anchor="figureSpatialRefreshEnhanced">
<preamble>An illustration of spatial layer refresh of an
enhancement layer is shown below.</preamble>
<artwork><![CDATA[
... <-- S1 <-- S1 S1 <-- S1 <-- ...
| | | |
\/ \/ \/ \/
... <-- S0 <-- S0 <-- S0 <-- S0 <-- ...
1 2 3 4
]]></artwork>
<postamble>In this illustration, frame 3 is a layer refresh
point for spatial layer S1; a decoder which had previously
only been decoding spatial layer S0 would be able to
decode layer S1 starting at frame 3.</postamble>
</figure>
<figure anchor="figureSpatialRefreshBase">
<preamble>An illustration of spatial layer refresh of a base layer is shown
below.</preamble>
<artwork><![CDATA[
... <-- S1 <-- S1 <-- S1 <-- S1 <-- ...
| | | |
\/ \/ \/ \/
... <-- S0 <-- S0 S0 <-- S0 <-- ...
1 2 3 4
]]></artwork>
<postamble>In this illustration, frame 3 is a layer refresh
point for spatial layer S0; a decoder which had previously
not been decoding the stream at all could decode layer S0
starting at frame 3.</postamble>
</figure>
<t>For temporal layers, layer refresh requires that the layer be
"temporally nested", i.e. use as reference only
earlier frames of a lower temporal layer, not any earlier frames of this
temporal layer, and also promise that no future frames
of this temporal layer will reference frames of this temporal
layer before the refresh point. In many cases, the temporal
structure of the stream will mean that all frames are
temporally nested, in which case decoders will have no need to
send LRR messages for the stream.</t>
<figure anchor="figureTemporalRefresh">
<preamble>An illustration of temporal layer refresh is shown
below.</preamble>
<artwork><![CDATA[
... <----- T1 <------ T1 T1 <------ ...
/ / /
|_ |_ |_
... <-- T0 <------ T0 <------ T0 <------ T0 <--- ...
1 2 3 4 5 6 7
]]></artwork>
<postamble>In this illustration, frame 6 is a layer refresh
point for temporal layer T1; a decoder which had previously
only been decoding temporal layer T0 would be able to
decode layer T1 starting at frame 6.</postamble>
</figure>
<figure anchor="figureTemporalNesting">
<preamble>An illustration of an inherently temporally nested
stream is shown below.</preamble>
<artwork><![CDATA[
T1 T1 T1
/ / /
|_ |_ |_
... <-- T0 <------ T0 <------ T0 <------ T0 <--- ...
1 2 3 4 5 6 7
]]></artwork>
<postamble>In this illustration, the stream is temporally
nested in its ordinary structure; a decoder receiving layer
T0 can begin decoding layer T1 at any point.</postamble>
</figure>
</section>
</section>
<section anchor="layerRefreshRequest" title="Layer Refresh Request">
<t>A layer refresh frame can be requested by sending a Layer Refresh Request (LRR),
which is an <xref target="RFC4585">RTCP payload-specific feedback message</xref> asking the encoder to encode a frame
which makes it possible to upgrade to a higher layer. The LRR
contains one or two tuples, indicating the layer the decoder
wants to upgrade to, and (optionally) the currently highest
layer the decoder can decode.</t>
<t>The specific format of the tuples, and the mechanism by which
a receiver recognizes a refresh frame, is
codec-dependent. Usage for several codecs is discussed in
<xref target="codec-details"/>.</t>
<t>LRR follows the model of the <xref target="RFC5104">Full
Intra Request (FIR)</xref>(Section 3.5.1) for its
retransmission, reliability, and use in multipoint conferences.
TODO: expand these here.</t>
<t>The LRR message is identified by RTCP packet type value
PT=PSFB and FMT=TBD. The FCI field MUST contain one or more FIR entries. Each entry
applies to a different media sender, identified by its SSRC.</t>
<section anchor="MessageFormat" title="Message Format">
<t>The Feedback Control Information (FCI) for the Layer Refresh Request
consists of one or more FCI entries, the content of which is
depicted in <xref target="figureFciFormat"/>. The length of
the LRR feedback message MUST be set to
2+3*N, where N is the number of FCI entries.</t>
<figure anchor="figureFciFormat">
<artwork><![CDATA[
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SSRC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seq nr. |C| Payload Type| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Target Layer Index | Current Layer Index (opt) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
</figure>
<t>
<list style="hanging">
<t hangText="SSRC (32 bits)"> The SSRC value of the media sender that is
requested to send a layer refresh point.</t>
<t hangText="Seq nr. (8 bits)"> Command sequence number. The sequence number
space is unique for each pairing of the SSRC of command
source and the SSRC of the command target. The sequence
number SHALL be increased by 1 modulo 256 for each new
command. A repetition SHALL NOT increase the sequence
number. The initial value is arbitrary.</t>
<t hangText="C (1 bit)">A flag bit indicating whether the
"Current Layer Index" field is present in the FCI. If
this bit is false, the sender of the LRR message is
requesting refresh of all layers up to and including the
target layer.</t>
<t hangText="Payload Type (7 bits)">The RTP payload type for
which the LRR is being requested. This gives the context in
which the target layer index is to be interpreted.</t>
<t hangText="Reserved (16 bits)"> All bits SHALL be set to 0
by the sender and SHALL be ignored on reception.</t>
<t hangText="Target Layer Index (16 bits)">The target layer
for which the receiver wishes a refresh point. Its format
is dependent on the payload type field.</t>
<t hangText="Current Layer Index (16 bits)">If C is 1, the
current layer being decoded by the receiver. This message
is not requesting refresh of layers at or below this layer.
If C is 0, this field SHALL be set to 0 by the sender and
SHALL be ignored on reception.</t>
</list>
</t>
</section>
</section>
<section anchor="codec-details" title="Usage with specific codecs">
<section title="H264 SVC">
<t><xref target="RFC6190">H.264 SVC</xref> defines temporal,
dependency (spatial), and quality scalability modes.</t>
<figure anchor="figureH264SvcIndexFormat">
<artwork><![CDATA[
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R| DID | QID | TID |RES |
+---------------+---------------+
]]></artwork>
</figure>
<t><xref target="figureH264SvcIndexFormat"/> shows the format
of the layer index field for H.264 SVC streams. This is
designed to follow the same layout as the third and fourth
bytes of the H.264 SVC NAL unit extension, which carry the
stream's layer information. The "R" and "RES"
fields MUST be set to 0 on transmission and ignored on
reception. See <xref target='RFC6190'/> Section 1.1.3 for
details on the DID, QID, and TID fields.</t>
<t>A dependency or quality layer refresh of a given layer in
H.264 SVC can be identified by the "I" bit (idr_flag) in the
extended NAL unit header, present in NAL unit types 14 (prefix
NAL unit) and 20 (coded scalable slice). Layer refresh of the
base layer can also be identified by its NAL unit type of
its coded slices, which is "5" rather than "1". A dependency or
quality layer refresh is complete once this bit has been seen
on all the appropriate layers (in decoding order) above the
current layer index (if any, or beginning from the base layer
if not) through the target layer index.</t>
<t>Note that as the "I" bit in a PACSI header is set if the
corresponding bit is set in any of the aggregated NAL units it
describes; thus, it is not sufficient to identify layer
refresh when NAL units of multiple dependency or quality layers
are aggregated.</t>
<t>In H.264 SVC, temporal layer refresh information can be
determined from various Supplemental Encoding Information
(SEI) messages in the bitstream.</t>
<t>Whether an H.264 SVC stream is scalably nested can be determined from
the Scalability Information SEI message's temporal_id_nesting
flag. If this flag is set in a stream's currently applicable
Scalability Information SEI, receivers SHOULD NOT send
temporal LRR messages for that stream, as every frame is
implicitly a temporal layer refresh point. (The Scalability
Information SEI message may also be available in the signaling
negotiation of H.264 SVC, as the sprop-scalability-info
parameter.)</t>
<t>If a stream's temporal_id_nesting flag is not set, the
Temporal Level Switching Point SEI message identifies temporal
layer switching points. A temporal layer refresh is satisfied
when this SEI message is present in a frame with the target
layer index, if the message's delta_frame_num refer to a frame
with the requested current layer index. (Alternately,
temporal layer refresh can also be satisfied by a complete
state refresh, such as an IDR.) Senders which support
receiving LRR for non-scalably-nested streams MUST insert
Temporal Level Switching Point SEI messages as appropriate.</t>
</section>
<section title="VP8">
<t><xref target="I-D.ietf-payload-vp8">The VP8 RTP payload
format</xref> defines temporal scalability modes. It does not
support spatial scalability.</t>
<figure anchor="figureVP8IndexFormat">
<artwork><![CDATA[
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|TID| RES |
+---------------+---------------+
]]></artwork>
</figure>
<t><xref target="figureVP8IndexFormat"/> shows the format
of the layer index field for VP8 streams. The "RES"
fields MUST be set to 0 on transmission and ingnored on
reception. See <xref target='I-D.ietf-payload-vp8'/> Section 4.2 for
details on the TID field.</t>
<t>A VP8 layer refresh point can be identified by the presence
of the "Y" bit in the VP8 payload header. When this bit is
set, this and all subsequent frames depend only on the current
base temporal layer. On receipt of an LRR for a VP8 stream, A
sender which supports LRR MUST encode the stream so it can set the
Y bit in a packet whose temporal layer is at or below the target
layer index.</t>
<t>Note that in VP8, not every layer switch point can be
identified by the Y bit, since the Y bit implies layer switch
of all layers, not just the layer in which it is sent. Thus
the use of LRR with VP8 can result in some inefficiency in
transmision. However, this is not expected to be a major
issue for temporal structures in normal use.</t>
</section>
<section title="H265">
<t>The initial version
of <xref target="I-D.ietf-payload-rtp-h265">the H.265 payload
format</xref> defines temporal scalability, with protocol
elements reserved for spatial or other scalability modes
(which are expected to be defined in a future version of the
specification).</t>
<figure anchor="figureH265IndexFormat">
<artwork><![CDATA[
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RES | LayerId | TID |
+-------------+-----------------+
]]></artwork>
</figure>
<t><xref target="figureH265IndexFormat"/> shows the format
of the layer index field for H.265 streams. This is
designed to follow the same layout as the first and second
bytes of the H.265 NAL unit header, which carry the
stream's layer information. The "RES"
field MUST be set to 0 on transmission and ignored on
reception. See <xref target='I-D.ietf-payload-rtp-h265'/> Section 1.1.4 for
details on the LayerId and TID fields.</t>
<t>H.265 streams signal whether they are temporally nested,
using the vps_temporal_id_nesting_flag in the Video Parameter
Set (VPS), and the sps_temporal_id_nesting_flag in the Sequence
Parameter Set (SPS). If this flag is set in a stream's currently applicable
VPS or SPS, receivers SHOULD NOT send temporal LRR messages
for that stream, as every frame is implicitly a temporal layer
refresh point.</t>
<t>If a stream's sps_temporal_id_nesting_flag is not set, the
NAL unit types 2 to 5 inclusively identify temporal
layer switching points. A layer refresh to any higher
target temporal layer is satisfied when a NAL unit type of 4 or 5
with TID equal to 1 more than current TID is seen. Alternatively,
layer refresh to a target temporal layer can be incrementally
satisfied with NAL unit type of 2 or 3. In this case, given
current TID = TO and target TID = TN, layer refresh to TN is satisfied
when NAL unit type of 2 or 3 is seen for TID = T1, then TID = T2,
all the way up to TID = TN. During this incremental process, layer
referesh to TN can be completely satisfied as soon as a NAL unit type
of 2 or 3 is seen.</t>
<t>Of course, temporal layer refresh can also be satisfied whenever
any Intra Random Access Point (IRAP) NAL unit type (with values 16-23,
inclusively) is seen. An IRAP picture is similar to an IDR picture in
H.264 (NAL unit type of 5 in H.264) where decoding of the picture can start
without any older pictures.</t>
<t>In the (future) H.265 payloads that support spatial
scalability, a spatial layer refresh of a specific layer can
be identified by NAL units with the requested layer ID and NAL
unit types between 16 and 21 inclusive. A dependency or
quality layer refresh is complete once NAL units of this type have been seen
on all the appropriate layers (in decoding order) above the
current layer index (if any, or beginning from the base layer
if not) through the target layer index.</t>
</section>
<section title="VP9">
<t><xref target="I-D.uberti-payload-vp9">The RTP payload format
for VP9</xref> defines how it can be used for spatial and
temporal scalability.</t>
<figure anchor="figureVP9IndexFormat">
<artwork><![CDATA[
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-------------+-----------------+
| T |R| S | RES |
+-------------+-----------------+
]]></artwork>
</figure>
<t><xref target="figureVP9IndexFormat"/> shows the format
of the layer index field for VP9 streams. This is
designed to follow the same layout as the "L" byte
of the VP9 payload header, which carries the
stream's layer information. The "R" and "RES"
fields MUST be set to 0 on transmission and ingnored on
reception. See <xref target='I-D.uberti-payload-vp9'/> for
details on the T and S fields.</t>
<t>Identification of a layer refresh frame can be derived from the
reference IDs of each frame by backtracking the dependency chain
until reaching a point where only decodable frames are being
referenced. Therefore it's recommended for both the
flexible and the non-flexible mode that, when upgrade frames are
being encoded in response to a LRR, those packets should contain
layer indices and the reference fields so that the decoder or an
MCU can make this derivation.</t>
<t>Example:</t>
<t>LRR {1,0}, {2,1} is sent by an MCU when it is currently
relaying {1,0} to a receiver and which wants to upgrade to
{2,1}. In response the encoder should encode the next frames
in layers {1,1} and {2,1} by only referring to frames in
{1,0}, or {0,0}.</t>
<t>In the non-flexible mode, periodic upgrade frames can be
defined by the layer structure of the SS, thus periodic upgrade
frames can be automatically identified by the picture ID.</t>
</section>
</section>
<section title="Usage with different scalability transmission mechanisms">
<t>Several different mechanisms are defined for how scalable
streams can be transmitted in RTP.
The <xref target="I-D.ietf-avtext-rtp-grouping-taxonomy">RTP
Taxonomy</xref> Section 3.7 defines three mechanisms: Single RTP
Stream on a Single Media Transport (SRST), Multiple RTP Streams
on a Single Media Transport (MRST), and Multiple RTP Streams on
Multiple Media Transports (MRMT).</t>
<t>The LRR message is applicable to all these mechanisms. For
MRST and MRMT mechanisms, the "media source" field of the LRR
FCI is set to the SSRC of the RTP stream containing the layer
indicated by the Current Layer Index (if "C" is 1), or the
stream containing the base encoded stream (if "C" is 0). For
MRMT, it is sent on the RTP session on which this stream is
sent. On receipt, the sender MUST refresh all the layers
requested in the stream, simultaneously in decode order.</t>
<t>Note: arguably, for the MRST and MRMT mechanisms, FIR
feedback messages could instead be used to refresh specific individual
layers. However, the usage of FIR for MRSR/MRMT is not
explicitly specified anywhere, and if FIR is interpreted as refreshing
layers, there is no way to request an actual full, synchronized refresh of
all the layers of an MRST/MRMT layered source. Thus, the authors feel that
interpreting FIR as refreshing the entire source, and using
LRR for the individual layers, would be more useful.</t>
</section>
<section anchor="securityConsiderations" title="Security Considerations">
<t>All the security considerations of <xref target="RFC5104">FIR
feedback packets</xref> apply to LRR feedback packets as well.
Additionally, media senders receiving LRR feedback packets MUST
validate that the payload types and layer indices they are
receiving are valid for the stream they are currently sending,
and discard the requests if not.</t>
</section>
<section anchor="IANAConsiderations" title="IANA Considerations">
<t>The IANA is requested to register the following values:<vspace
blankLines="0"/> - TODO: PSFB value for LRR</t>
</section>
</middle>
<back>
<references title='Normative References'>
&vp8rtp;
&vp9rtp;
&h265rtp;
&rfc2119;
&rfc3550;
&rfc4585;
&rfc6190;
</references>
<references title='Informative References'>
&rfc5104;
&taxonomy;
</references>
</back>
</rfc>
<!-- LocalWords: PictureID DCT Hadamard WHT SSRC CSRC pyld hdr FI VER RPSI
-->
<!-- LocalWords: stPartitionSize SLI SDP AVPF SRTP IANA PID PICIDX TID
-->
| PAFTECH AB 2003-2026 | 2026-04-24 02:41:31 |