Network Working Group S. Wenger
Internet Draft Y.-K. Wang
Document: draft-wenger-avt-rtp-svc-02.txt T. Schierl
Expires: December 2006
June 2006
RTP Payload Format for SVC Video
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 5, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
INTERNET-DRAFT Scalable Video Codec RTP Payload Format February 2006
This memo describes an RTP payload format for the scalable extension
of the ITU-T Recommendation H.264 video codec, which is technically
identical to the ISO/IEC International Standard 14496-10 video
codec. The RTP payload format allows for packetization of one
or more Network Abstraction Layer Units (NALUs), produced by the
video encoder, in each RTP payload. The payload format has wide
applicability, as it supports applications from simple low bit-rate
conversational usage, to Internet video streaming with interleaved
transmission, to high bit-rate video-on-demand.
Wenger, Wang, Schierl Standards Track [page 2]
Table of Contents
RTP Payload Format for SVC Video...................................1
1. Introduction.................................................5
1.1. SVC - the scalable extensions of H.264/AVC..................5
2. Conventions..................................................5
3. The SVC Codec................................................6
3.1. Overview....................................................6
3.2. Parameter Set Concept.......................................6
3.3. Network Abstraction Layer Unit Header.......................7
4. Scope........................................................9
5. Definitions and Abbreviations...............................10
5.1. Definitions................................................10
5.2. Abbreviations..............................................11
6. RTP Payload Format..........................................11
6.1. Design Principles..........................................11
6.2. RTP Header Usage...........................................12
6.3. Common Structure of the RTP Payload Format.................12
6.4. NAL Unit Header Usage......................................12
6.5. Packetization Modes........................................13
6.6. Decoding Order Number (DON)................................13
6.7. Single NAL Unit Packet.....................................13
6.8. Aggregation Packets........................................13
6.9. Fragmentation Units (FUs)..................................13
6.10. Payload Content Scalability Information (PACSI) NAL Unit...14
7. Packetization Rules.........................................15
8. De-Packetization Process (Informative)......................16
9. Payload Format Parameters...................................16
9.1. MIME Registration..........................................16
9.2. SDP Parameters.............................................18
9.2.1. Mapping of MIME Parameters to SDP........................18
9.2.2. Usage with the SDP Offer/Answer Model....................18
9.2.3. Usage in Declarative Session Descriptions................18
9.3. Examples...................................................18
9.4. Parameter Set Considerations...............................18
10. Security Considerations.....................................19
11. Congestion Control..........................................19
12. IANA Consideration..........................................20
13. NAL Unit Reordering for Layered Multicast...................20
14. Informative Appendix: Application Examples..................20
14.1. Introduction...............................................20
14.2. Layered Multicast..........................................21
14.3. Streaming of an SVC scalable stream........................22
14.4. Multicast to MANE, SVC scalable stream to endpoint.........22
14.5. Scenarios currently not considered for complexity reasons..24
14.6. Scenarios currently not considered for being unaligned with IP
philosophy........................................................24
15. Acknowledgements............................................26
16. References..................................................26
14.7. Normative References.......................................26
14.8. Informative References.....................................26
15. Author's Addresses..........................................27
16. Intellectual Property Statement.............................27
17. Disclaimer of Validity......................................28
18. Copyright Statement.........................................28
19. RFC Editor Considerations...................................28
20. Open Issues.................................................28
21. Changes Log.................................................28
1. Introduction
1.1. SVC - the scalable extensions of H.264/AVC
This memo specifies an RTP [RFC3550] payload format for a forthcoming
new mode of the H.264/AVC video codec, known as Scalable Video Coding
(SVC). Formally, SVC will take the form of an Amendment to ISO/IEC
14496 Part 10 [MPEG4-10], and likely as one or more new Annexes of
ITU-T Rec. H.264 [H.264]. It is planned to keep the technical
alignment between the two mentioned specifications, as well as
backward compatibility with previous versions of H.264/AVC.
The current working draft of SVC is available for public review
[SVC]. Technical maturity is expected around mid-2006.
In this memo, SVC is used as an acronym for the mentioned scalable
extensions of H.264/AVC.
SVC covers all applications of H.264/AVC, that is, all forms of
digital compressed video, ranging from low bit-rate Internet streaming
applications to HDTV broadcast and Digital Cinema applications with
nearly lossless coding.
This memo tries to follow a backward compatible enhancement
philosophy similar to what the video coding standardization
committees implement, by keeping as close an alignment to the
H.264/AVC payload RFC [RFC3984] as possible. It basically documents
the enhancements relevant from an RTP transport viewpoint, defines
signaling support for SVC, and deprecates the single NAL unit mode of
RFC 3984.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
This specification uses the notion of setting and clearing a bit when
bit fields are handled. Setting a bit is the same as assigning that
bit the value of 1 (On). Clearing a bit is the same as assigning
that bit the value of 0 (Off).
3. The SVC Codec
3.1. Overview
SVC provides scalable video bitstreams. A scalable video bitstream
contains a base layer and one or more enhancement layers. An
enhancement layer may enhance the temporal resolution (i.e. the frame
rate), the spatial resolution, or the quality of the video content
represented by the lower layer or part thereof. The scalable layers
can be aggregated to a single RTP stream, or transported
independently.
The concept of video coding layer (VCL) and network abstraction layer
(NAL) is inherited from AVC. The VCL contains the signal processing
functionality of the codec: mechanisms such as transform,
quantization, motion-compensated prediction, loop filtering, and
inter-layer prediction. A coded picture of a base or enhancement
layer consists of one or more slices. The Network Abstraction Layer
(NAL) encapsulates each slice generated by the VCL into one or more
Network Abstraction Layer Units (NAL units). Please consult RFC 3984
for a more in-depth discussion of the NAL unit concept. SVC
specifies the decoding order of these NAL units.
The term "Layer" in Video Coding Layer and Network Abstraction Layer
refers to a conceptual distinction, and is closely related to syntax
layers (block, macroblock, slice, ... layers). It should not be
confused with base and enhancement layers.
The concept of scaling the visual content quality by omitting the
transport and decoding of entire enhancement layers is denoted as
coarse-grained scalability (CGS).
In some cases, the bit rate of a given enhancement layer can be
reduced by truncating bits from individual NAL units. Truncation
leads to a graceful degradation of the video quality of the
reproduced enhancement layer. This concept is known as Fine
Granularity Scalability (FGS).
3.2. Parameter Set Concept
The parameter set concept is inherited from AVC. In SVC, pictures
from different layers may use the same sequence or picture parameter
set and may also use different sequence or picture parameter sets. If
different sequence parameter sets are used, then at any time instant
during the decoding process, there may be more than one active
sequence parameter set. Any specific active sequence
parameter set remains unchanged throughout a coded video sequence in
the layer in which the active sequence parameter set is referred to.
The active picture parameter set remains unchanged within a coded
picture.
3.3. Network Abstraction Layer Unit Header
An SVC NAL unit consists of a header of one or three bytes and the
payload byte string. The header indicates the type of the NAL unit,
the (potential) presence of bit errors or syntax violations in the
NAL unit payload, information regarding the relative importance of
the NAL unit for the decoding process, and (optionally, when the
header is three bytes long) the scalable layer decoding dependency
information. This RTP payload specification is designed to be unaware
of the bit string in the NAL unit payload.
The NAL unit header co-serves as the payload header of this RTP
payload format. The payload of a NAL unit follows immediately.
The syntax and semantics of the NAL unit header are specified in
[SVC], but the essential properties of the NAL unit header are
summarized below.
The first byte of the NAL unit header has the following format (the
bit fields are the same as in H.264/AVC and RFC 3984, while the
semantics are slightly different, in a backward compatible way):
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
F: 1 bit
forbidden_zero_bit. The H.264 specification declares a value of 1 as
a syntax violation.
NRI: 2 bits
nal_ref_idc. A value of 00 indicates that the content of the NAL
unit is not used to reconstruct reference pictures for inter picture
prediction. Such NAL units can be discarded without risking the
integrity of the reference pictures in the same layer. Values
greater than 00 indicate that the decoding of the NAL unit is
required to maintain the integrity of the reference pictures. For a
slice or slice data partitioning NAL unit, a NRI value of 11
indicates that the NAL unit contains data of a key picture, as
specified in [SVC].
Informative Note: The concept of a key picture has been introduced in
SVC, and no assumption should be made that any pictures in bit
streams compliant with the 2003 and 2005 versions of H.264 follow
this rule.
Type: 5 bits
nal_unit_type. This component specifies the NAL unit payload type as
defined in table 7-1 of [SVC], and later within this memo. For a
reference of all currently defined NAL unit types and their
semantics, please refer to section 7.4.1 in [SVC].
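As an illustration of the first header byte described above, the
following minimal Python sketch splits one octet into F, NRI and Type.
It assumes that bit 0 in the diagram is the most significant bit, as
in RFC 3984. The names NalFirstByte and parse_first_byte are
hypothetical and are not defined by [SVC] or by this memo.

```python
from typing import NamedTuple


class NalFirstByte(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nri: int       # nal_ref_idc (2 bits)
    nal_type: int  # nal_unit_type (5 bits)


def parse_first_byte(octet: int) -> NalFirstByte:
    """Split the first NAL unit header byte into F, NRI and Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return NalFirstByte(
        f=(octet >> 7) & 0x01,   # bit 0 of the diagram (assumed MSB)
        nri=(octet >> 5) & 0x03,  # bits 1-2
        nal_type=octet & 0x1F,    # bits 3-7
    )


# 0x74 is binary 0 11 10100: F = 0, NRI = 3, Type = 20
header = parse_first_byte(0x74)
```

A receiver or MANE could, for example, check header.f before any
further processing and use header.nal_type to decide whether more
header bytes follow.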
Previously, NAL unit types 20 and 21 (among others) have been
reserved for future extensions. SVC is using these two NAL unit
types. They indicate the presence of two more bytes as shown below.
+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    PRID     |D| TL  | DID |QL |
+---------------+---------------+
PRID: 7 bits
simple_priority_id. This component specifies a priority identifier
for the NAL unit. When simple_priority_id is not present, it shall be
inferred to be equal to 0.
D: 1 bit
discardable_flag. A value of 1 indicates that the content of the NAL
unit with dependency_id equal to currDependencyId is not used in the
decoding process of NAL units with dependency_id larger than
currDependencyId. Such NAL units can be discarded without risking
the integrity of higher scalable layers with larger values of
dependency_id. discardable_flag equal to 0 indicates that the
decoding of the NAL unit is required to maintain the integrity of
higher scalable layers with larger values of dependency_id.
TL: 3 bits
temporal_level indicates the temporal layer (or frame rate)
hierarchy. A layer consisting of pictures with a smaller
temporal_level value has a lower frame rate.
DID: 3 bits
dependency_id denotes the inter-layer coding dependency hierarchy. At
any temporal location, a picture with a smaller dependency_id value
may be used for inter-layer prediction when coding a picture with a
larger dependency_id value, while a picture with a larger
dependency_id value is not allowed to be used for inter-layer
prediction when coding a picture with a smaller dependency_id value.
QL: 2 bits
quality_level designates the quality level hierarchy of a progressive
refinement slice. At any temporal location and with identical
dependency_id value, a quality enhancement of a picture with
quality_level value equal to ql uses the quality enhancement or base
quality information (the non-quality enhancement information of the
slice when ql = 1) of the slice with quality_level value equal to
ql-1 for inter-layer prediction. When quality_level is larger than 0,
the NAL unit contains a progressive refinement slice or part thereof.
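The two additional header bytes can be unpacked as sketched below.
This is a minimal Python illustration that assumes the field widths
given above (PRID 7 bits, D 1 bit, TL 3 bits, DID 3 bits, QL 2 bits)
and most-significant-bit-first ordering. The names SvcExtension and
parse_svc_extension are hypothetical.

```python
from typing import NamedTuple, Optional

SVC_EXTENSION_TYPES = (20, 21)  # NAL unit types that carry two more bytes


class SvcExtension(NamedTuple):
    prid: int  # simple_priority_id (7 bits)
    d: int     # discardable_flag (1 bit)
    tl: int    # temporal_level (3 bits)
    did: int   # dependency_id (3 bits)
    ql: int    # quality_level (2 bits)


def parse_svc_extension(nal_unit: bytes) -> Optional[SvcExtension]:
    """Return PRID/D/TL/DID/QL, or None for a one-byte header."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    nal_type = nal_unit[0] & 0x1F
    if nal_type not in SVC_EXTENSION_TYPES:
        return None  # one-byte header, no scalability fields
    if len(nal_unit) < 3:
        raise ValueError("truncated three-byte NAL unit header")
    b1, b2 = nal_unit[1], nal_unit[2]
    return SvcExtension(
        prid=b1 >> 1,
        d=b1 & 0x01,
        tl=b2 >> 5,
        did=(b2 >> 2) & 0x07,
        ql=b2 & 0x03,
    )


# Header bytes 0x74 0x0B 0x49: Type 20, PRID 5, D 1, TL 2, DID 2, QL 1
ext = parse_svc_extension(bytes([0x74, 0x0B, 0x49]))
```

Returning None for one-byte headers lets a caller apply whatever
default it considers appropriate, for example the inferred value of 0
for simple_priority_id.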
This memo introduces new NAL unit types, which are presented in
section 6.3. The NAL unit types defined in this memo are marked as
unspecified in [SVC]. Moreover, this specification extends the
semantics of F, NRI, PRID, D, TL, DID and QL as described in section
6.4.
4. Scope
This payload specification can only be used to carry the "naked" SVC
NAL unit stream over RTP, and not the bitstream format according to
Annex B of [SVC]. Likely, the applications of this specification
will be in the IP based multimedia communications fields including
conversational multimedia, video telephony or video conferencing,
Internet streaming and TV over IP.
This specification allows NAL units belonging to any of the following
to be encapsulated in a given RTP session:
o the base layer, or
o one or more enhancement layers, or
o the base layer and one or more enhancement layers
5. Definitions and Abbreviations
5.1. Definitions
This document uses the definitions of [SVC] and [H.264]. The
following terms, defined in [SVC], are summed up for convenience:
scalable bitstream: an SVC compliant bit stream containing a base
layer and at least one enhancement layer.
base layer: The base layer typically represents the minimal temporal
and/or spatial resolution and/or the minimal quality of an SVC
bitstream. The base layer may fully comply with [H.264]. The
base layer is independently decodable without requiring any other
layer of the SVC bitstream. If the base layer contains only NAL units
that fully conform to [H.264], the layer is called an H.264/AVC base
layer. For such a layer, the ability to signal transport priority
(simple_priority_id or temporal_level, dependency_id and
quality_level) per NAL unit may not be available.
operation point: An operation point of an SVC bitstream represents a
certain level of temporal, spatial and quality scalability. An
operation point contains all NAL units required for successfully
decoding a certain SVC enhancement layer, which represents the highest
temporal, spatial and/or quality level of the operation point.
scalable enhancement layer: an SVC enhancement layer is identified
by a certain NAL unit header value (transport priority) of
simple_priority_id or, if present, by a combination of
temporal_level, dependency_id, quality_level as defined in [SVC] and
summarized in section 3.3.
access unit: A set of NAL units pertaining to a certain temporal
location. An access unit includes the slice data of the pictures of
all scalable layers at that temporal location and possibly other
associated data e.g. SEI messages and parameter sets.
coded video sequence: A sequence of access units that consists, in
decoding order, of an instantaneous decoding refresh (IDR) access
unit followed by zero or more non-IDR access units including all
subsequent access units up to but not including any subsequent IDR
access unit.
IDR access unit: An access unit in which all the primary coded
pictures are IDR pictures.
[Edt. note: This needs to be updated according to the new adoption of
the enhancement-layer IDR (EIDR) concept in January 2006. At the time
of writing, the SVC spec update for the January JVT meeting has not
yet been available.]
IDR picture: A coded picture with the property that the decoding of
this coded picture and all the following coded pictures in decoding
order, in the same layer (i.e. with the same values of dependency_id
and quality_level, respectively), can be performed without inter
prediction from any picture prior to the coded picture in decoding
order in the same layer. An IDR picture causes a "reset" in the
decoding process of the scalable layer containing the IDR picture.
[Edt. note: This needs to be updated according to the new adoption of
the enhancement-layer IDR (EIDR) concept in January 2006. At the time
of writing, the SVC spec update for the January JVT meeting has not
yet been available.]
progressive refinement slice: A progressive refinement slice [SVC] is
contained in an SVC NAL unit and may be signaled, if extension_flag
is equal to one, by a quality_level not equal to zero. Such slices can
be truncated byte-wise from the end in NAL unit payload byte-string
order for bit-rate and quality reduction. This ability is also known
as Fine Granularity Scalability (FGS).
5.2. Abbreviations
In addition to the abbreviations defined in [RFC3984], the following
ones are defined.
CGS: Coarse Granularity Scalability
FGS: Fine Granularity Scalability
6. RTP Payload Format
6.1. Design Principles
The authors tried to follow design principles as follows:
o Backward compatibility with RFC 3984 wherever possible.
o As we expect the SVC base layer to be H.264/AVC compatible, we
assume the base layer (when transmitted in its own session) to be
encapsulated using RFC 3984. Requiring this has the desirable
side effect that it can be used by RFC 3984 legacy devices.
o MANEs are signaling aware and rely on signaling information.
In other words, MANEs have state.
o MANEs terminate RTP sessions, and create different RTP sessions
with perhaps modified content.
[Edt. Note: need to clarify this wrt. Translators and Mixers.]
o MANEs are within the security context of the RTP session.
o Packet integrity needs to be preserved end-to-end (whereby
end-to-end can mean endpoint to endpoint but also endpoint to
MANE).
o others?
6.2. RTP Header Usage
Please see section 5.1 of RFC 3984 [RFC3984].
6.3. Common Structure of the RTP Payload Format
Please see section 5.2 of RFC 3984 [RFC3984].
6.4. NAL Unit Header Usage
The structure and semantics of the NAL unit header were introduced in
section 3.3. This section specifies the semantics of F, NRI, PRID,
D, TL, DID and QL according to this specification.
The semantics of F specified in section 5.3 of [RFC3984] also apply
herein.
For NRI, for a bitstream that is compliant with AVC, the semantics
specified in section 5.3 of [RFC3984] are applicable; otherwise only
the semantics specified in [SVC] are applicable.
For PRID, in addition to the semantics specified in [SVC], according
to this RTP payload specification, values of PRID indicate the
relative transport priority, as determined by the sender, which is
typically increasing from a layer of lower to a layer of higher
importance. MANEs implementing unequal error protection can use this
information to protect more important NAL units better than less
important ones, for example by including only the more important NAL
units in a FEC protection mechanism. The transport priority
increases as the PRID value increases.
For D, MANEs can use this information to protect NAL units with D
equal to 0 better than NAL units with D equal to 1. Furthermore a
MANE can determine whether the transmission of a NAL unit is required
for successfully decoding a certain operation point of the SVC
bitstream.
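The use of D by a MANE can be sketched as follows. This is a
simplified Python illustration that looks only at D and DID and
ignores TL and QL; a real MANE would also take signaling information
into account. The function name and the target_did parameter (the
dependency_id of the operation point to be decoded) are hypothetical.

```python
def needed_for_operation_point(d: int, did: int, target_did: int) -> bool:
    """Decide whether a NAL unit must be forwarded for target_did.

    d          -- discardable_flag of the NAL unit
    did        -- dependency_id of the NAL unit
    target_did -- dependency_id of the operation point to be decoded
    """
    if did > target_did:
        # Layers above the operation point are never used for
        # inter-layer prediction of lower layers.
        return False
    if did < target_did and d == 1:
        # Discardable: not used when decoding NAL units with a larger
        # dependency_id.
        return False
    return True


# Receiver decodes dependency_id 2; a discardable unit of layer 1 can go.
forward = needed_for_operation_point(d=1, did=1, target_did=2)
```

The same unit would have to be forwarded to a receiver that decodes
dependency_id 1 itself, since D only describes the needs of higher
layers.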
For TL, DID and QL, in addition to the semantics specified in [SVC],
according to this RTP payload specification, values of TL, DID or QL
indicate the relative transport priority. MANEs can use this
information to protect more important NAL units better than less
important NAL units. A higher value of TL, DID or QL indicates a
higher priority when the other two components are identical.
Informative note: Using PRID, D, TL, DID and QL in combination
may better indicate the relative transport priority. [Edt. note:
such examples may be provided in Informative Appendix 13 in future
versions.]
6.5. Packetization Modes
Please see section 5.4 of RFC 3984 [RFC3984]. The single NAL unit
mode SHALL NOT be used.
6.6. Decoding Order Number (DON)
Please see section 5.5 of RFC 3984 [RFC3984].
6.7. Single NAL Unit Packet
Please see section 5.6 of RFC 3984 [RFC3984].
6.8. Aggregation Packets
Please see section 5.7 of RFC 3984 [RFC3984].
6.9. Fragmentation Units (FUs)
Please see section 5.8 of RFC 3984 [RFC3984].
6.10. Payload Content Scalability Information (PACSI) NAL Unit
A new NAL unit type, referred to as payload content scalability
information (PACSI) NAL unit, is specified. The PACSI NAL unit, if
present, MUST be the first NAL unit in an aggregation packet, and it
MUST NOT be present in other types of packets. The PACSI NAL unit
indicates scalability characteristics that are common for all the
remaining NAL units in the payload, thus making it easier for MANEs
to decide whether to forward or discard the packet. Senders MAY
create PACSI NAL units and receivers can ignore them.
Informative note: The NAL unit type for the PACSI NAL unit is
selected among those values that are unspecified in the H.264/AVC
specification and in RFC 3984. Thus, SVC streams having H.264/AVC
base layer and including PACSI NAL units can be processed with RFC
3984 receivers and H.264/AVC decoders.
When the first aggregation unit of an aggregation packet contains a
PACSI NAL unit, there MUST be at least one additional aggregation
unit present in the same packet. The RTP header fields are set
according to the remaining NAL units in the aggregation packet.
When a PACSI NAL unit is included in a multi-time aggregation packet,
the decoding order number for the PACSI NAL unit MUST be set to
indicate either that the PACSI NAL unit is the first NAL unit in
decoding order among the NAL units in the aggregation packet, or that
the PACSI NAL unit has an identical decoding order number to the first
NAL unit in decoding order among the remaining NAL units in the
aggregation packet.
The structure of the PACSI NAL unit is specified in Figure 1.
 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |    PRID     |D| TL  | DID |QL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1. Structure for payload content scalability information
(PACSI) NAL unit.
The semantics of the fields in the PACSI NAL unit structure are
specified in section 3.3. The values of the fields in the PACSI NAL
unit MUST be set as follows.
o The F bit MUST be set to 1 if the F bit in any remaining NAL unit
in the payload is equal to 1. Otherwise, the F bit MUST be set to
0.
o The NRI field MUST be set to the highest value of NRI field among
the remaining NAL units in the payload.
o The Type field MUST be set to 30.
o The PRID field MUST be set to the lowest value of the PRID field
among the remaining NAL units in the payload. If the PRID field is
not present in one of the remaining NAL units in the payload, the
PRID field in the PACSI NAL unit MUST be set to 0.
o The D bit MUST be set to 0 if the D bit in any remaining NAL unit
in the payload is equal to 0. Otherwise, the D bit MUST be set to
1.
o The TL field MUST be set to the lowest value of the TL field among
the remaining NAL units in the payload.
o The DID field MUST be set to the lowest value of the DID field
among the remaining NAL units in the payload.
o The QL field MUST be set to the lowest value of the QL field among
the remaining NAL units in the payload.
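The field rules above can be summarized in a short Python sketch that
derives the three PACSI header bytes from the headers of the remaining
NAL units. The names Header, _or_zero and pacsi_header are
hypothetical. The text defines a default of 0 only for an absent PRID;
treating absent D, TL, DID and QL as 0 is an assumption of this
sketch, not a rule of this memo.

```python
from typing import NamedTuple, Optional, Sequence

PACSI_TYPE = 30  # Type value of the PACSI NAL unit


class Header(NamedTuple):
    f: int
    nri: int
    # The fields below are None for NAL units with a one-byte header.
    prid: Optional[int] = None
    d: Optional[int] = None
    tl: Optional[int] = None
    did: Optional[int] = None
    ql: Optional[int] = None


def _or_zero(value: Optional[int]) -> int:
    # Assumption: an absent field counts as 0. The memo states this
    # only for PRID; for D, TL, DID and QL it is a choice of this sketch.
    return 0 if value is None else value


def pacsi_header(units: Sequence[Header]) -> bytes:
    """Build the three PACSI header bytes from the remaining NAL units."""
    if not units:
        raise ValueError("a PACSI NAL unit needs at least one other NAL unit")
    f = 1 if any(u.f == 1 for u in units) else 0
    nri = max(u.nri for u in units)                 # highest NRI
    prid = min(_or_zero(u.prid) for u in units)     # lowest PRID, 0 if absent
    d = 0 if any(_or_zero(u.d) == 0 for u in units) else 1
    tl = min(_or_zero(u.tl) for u in units)         # lowest TL
    did = min(_or_zero(u.did) for u in units)       # lowest DID
    ql = min(_or_zero(u.ql) for u in units)         # lowest QL
    return bytes([
        (f << 7) | (nri << 5) | PACSI_TYPE,
        (prid << 1) | d,
        (tl << 5) | (did << 2) | ql,
    ])


pacsi = pacsi_header([
    Header(f=0, nri=2, prid=5, d=1, tl=2, did=1, ql=0),
    Header(f=0, nri=3, prid=3, d=0, tl=1, did=1, ql=1),
])
```

Because every field is an aggregate over the whole payload, a MANE can
make a forward-or-discard decision from these three bytes without
parsing the individual aggregation units.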
7. Packetization Rules
Please see section 6 of RFC 3984 [RFC3984]. The following rules
apply in addition.
The single NAL unit mode SHALL NOT be used.
In an RTP session, the first NAL unit of an aggregation packet SHALL
have a two- or three-byte NAL unit header containing the transport
priority indicator, as described in section 3.3. Non-VCL NAL units
SHALL be transmitted out-of-band or in a separate session for the
current state of this specification. If aggregating NAL units of
different layers within one aggregation packet, the first NAL unit of
the packet MUST have the highest transport priority of all NAL units
contained in the packet. The order of NAL units within a packet is
the same as the decoding order.
8. De-Packetization Process (Informative)
Please see section 7 of RFC 3984 [RFC3984]. The following rules
apply in addition.
The single NAL unit mode SHALL NOT be used.
Layered multicast is supported by this specification. An informative
appendix on recovering NAL unit decoding order in layered multicast
can be found in section 14.
9. Payload Format Parameters
[Edt. note: this section 9 and its subsections will be updated
according to the changes listed below, a little later in the process.
For now, we just list the adjustments necessary, so not to bury any
new information in the RFC 3984 text.]
Section 8 of [RFC3984] applies with the following modification.
The sentence
"The parameters are specified here as part of the MIME subtype
registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."
is replaced with
"The parameters are specified here as part of the MIME subtype
registration for the SVC codec."
9.1. MIME Registration
The MIME subtype for the SVC codec is allocated from the IETF tree.
The receiver MUST ignore any unspecified parameter.
Media Type name: video
Media subtype name: H.264-SVC
Required parameters: none
OPTIONAL parameters:
The optional MIME parameters specified in [RFC3984] apply, in
addition to the following.
sprop-scalability-info:
This parameter MAY be used to convey the NAL unit containing the
scalability information SEI message that MUST precede any other NAL
units in decoding order. The parameter MUST NOT be used to indicate
codec capability in any capability exchange procedure. The value of
the parameter is the base64 representation of the NAL unit containing
the scalability information SEI message as specified in [SVC].
sprop-transport-priority:
This parameter MAY be used to signal the transport priority indicator
value(s) in terms of the one or two byte SVC NAL unit header
extension of one or more SVC layer(s) of one RTP session. A
transport priority indicator is base64 coded. If more than one layer
is transmitted within one RTP session, the transport priority
indicator value of each layer MUST be itemized with decreasing
importance for decoding and MUST be comma-separated.
If an H.264/AVC base layer is part of the RTP session, this parameter
SHALL NOT be used.
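One possible reading of the sprop-transport-priority value format is
sketched below in Python. It assumes that each transport priority
indicator consists of the raw NAL unit header extension bytes of one
layer, that the caller already supplies the layers in order of
decreasing importance, and that standard base64 with padding is used.
None of these details is fixed by the text above, and the function
name is hypothetical.

```python
import base64
from typing import Sequence


def sprop_transport_priority(indicators: Sequence[bytes]) -> str:
    """Encode per-layer indicators as a comma-separated base64 list.

    indicators -- header extension bytes of each layer, already
                  ordered by decreasing importance (caller's duty)
    """
    if not indicators:
        raise ValueError("at least one transport priority indicator needed")
    return ",".join(
        base64.b64encode(indicator).decode("ascii")
        for indicator in indicators
    )


# Two layers, each described by its two header extension bytes.
value = sprop_transport_priority([b"\x0b\x49", b"\x06\x24"])
```

The resulting string would be placed after
"sprop-transport-priority=" in the parameter list.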
Encoding considerations:
This type is only defined for transfer via
RTP (RFC 3550).
Security considerations:
See section 10 of this specification.
Public specification:
Please refer to section 15 of this
specification.
Additional information:
None
File extensions: none
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Intended usage: COMMON
Author:
Change controller:
IETF Audio/Video Transport working group
delegated from the IESG.
9.2. SDP Parameters
9.2.1. Mapping of MIME Parameters to SDP
The MIME media type video/SVC string is mapped to fields in the
Session Description Protocol (SDP) as follows:
* The media name in the "m=" line of SDP MUST be video.
* The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
MIME subtype).
* The clock rate in the "a=rtpmap" line MUST be 90000.
* The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
"max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
parameter-sets", "parameter-add", "packetization-mode", "sprop-
interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
"sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-size",
"sprop-transport-priority", and "sprop-scalability-info", when
present, MUST be included in the "a=fmtp" line of SDP. These
parameters are expressed as a MIME media type string, in the form
of a semicolon separated list of parameter=value pairs.
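The mapping rules above can be illustrated by a small Python sketch
that assembles the three SDP lines for one stream. The port number,
the dynamic payload type, the RTP/AVP profile and the parameter values
are illustrative assumptions only, and sdp_media_lines is a
hypothetical helper name.

```python
from typing import Dict, List


def sdp_media_lines(port: int, payload_type: int,
                    params: Dict[str, str]) -> List[str]:
    """Build the m=, a=rtpmap and a=fmtp lines for one SVC stream."""
    lines = [
        # Media name MUST be "video"; RTP/AVP is assumed here.
        f"m=video {port} RTP/AVP {payload_type}",
        # Encoding name "SVC" and clock rate 90000, as listed above.
        f"a=rtpmap:{payload_type} SVC/90000",
    ]
    if params:
        # Semicolon separated list of parameter=value pairs.
        fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {fmtp}")
    return lines


# Illustrative values only: port 49170, payload type 96.
lines = sdp_media_lines(
    49170, 96,
    {"packetization-mode": "1", "max-rcmd-nalu-size": "1400"},
)
```

With these inputs the a=fmtp line carries both parameters in the order
given; the a=fmtp line is omitted entirely when no optional parameter
is present.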
9.2.2. Usage with the SDP Offer/Answer Model
TBD.
9.2.3. Usage in Declarative Session Descriptions
TBD.
9.3. Examples
TBD.
9.4. Parameter Set Considerations
Please see section 10 of RFC 3984 [RFC3984].
10. Security Considerations
Please see section 11 of RFC 3984 [RFC3984].
11. Congestion Control
Within any given RTP session carrying payload according to this
specification, the provisions of section 12 of RFC 3984 [RFC3984]
apply.
One key motivation for the recent attention to scalable codecs has
been the increasing awareness among media codec designers of network
congestion. While CGS scalability cannot reduce congestion for the
transport path of a given RTP session, MANEs and layered multicast
technologies can be used to alleviate congestion on a larger scale.
FGS scalability can be helpful to reduce session bandwidth both end-
to-end (with pre-coded content) and in network segments, again
assuming the use of MANEs.
MANEs MAY alleviate congestion on their outgoing network path by
a) removing the NAL units belonging to the hierarchically "highest"
enhancement layer (or set of enhancement layers) from an RTP
stream carrying base and enhancement layers.
b) removing some or all bits of a given FGS NAL unit as long as the
remaining bits still form a conforming SVC NAL unit.
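Option a) can be sketched as follows, assuming that the MANE has already parsed each NAL unit and knows the layer it belongs to. The NalUnit class and its "layer" field (0 for the base layer) are hypothetical and do not reflect the SVC NAL unit header syntax. Option b) is not sketched, since conformance of a truncated FGS NAL unit depends on the SVC specification.

```python
from dataclasses import dataclass


@dataclass
class NalUnit:
    layer: int       # hypothetical layer index: 0 = base layer
    payload: bytes


def drop_highest_layers(nal_units, layers_to_drop=1):
    """Remove NAL units of the hierarchically highest enhancement layer(s).

    The base layer (layer 0) is never removed.
    """
    if not nal_units:
        return []
    highest = max(n.layer for n in nal_units)
    # Keep everything up to the new top layer, but never go below the base.
    keep_up_to = max(0, highest - layers_to_drop)
    return [n for n in nal_units if n.layer <= keep_up_to]
```

The relative order of the remaining NAL units is left unchanged.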
Edt. note: In the following paragraph, "translator" and "mixer" are
not used consistently with RFC 3550. What we think we would need is
a "mixer" that mixes only a single input in a single output (as a
mixer terminates sessions). A "Translator" (that does not terminate
the RTP session) carries certain unnecessary baggage which appears to
make it undesirable for MANEs. The following paragraph can either be
fixed into RFC 3550 style and logic (thereby removing an operation
point we consider desirable), or we would need to explain in detail
what we want to do (not really congestion control related and long).
Perhaps we refer to the detailed discussions in the CCM draft...
Added to open issues.
In both cases, the incoming RTP session is terminated in the MANE,
and a second RTP session originates at the MANE. The MANE acts as an
RTP translator. The concept of scalability keeps the implementation
and computational effort within the MANE low, and avoids expensive
and delay-intensive full transcoding (in the sense of reconstruction
and re-encoding).
When scalable layers are transported in their own RTP sessions, an
RTP receiver SHOULD unsubscribe from one or more enhancement layers
when it senses congestion, similar to what has been described in
[McCanne/Vetterli]. This behavior could perhaps be sufficient to
ease the network load to an acceptable level of congestion.
Nevertheless, the receiver MUST follow the mechanisms described in
section 12 of [RFC3984].
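A receiver-side decision of this kind might look like the sketch below. Using an observed loss fraction as the congestion signal, and the threshold of 0.05, are assumptions made for illustration; neither is taken from this memo or from RFC 3984, and the sketch does not replace the congestion control mechanisms referenced above.

```python
def layers_to_keep(subscribed_layers, loss_fraction, loss_threshold=0.05):
    """Return how many layer sessions to stay subscribed to.

    subscribed_layers counts the base layer plus all enhancement layers.
    loss_threshold is a hypothetical trigger, not a value from any spec.
    """
    if loss_fraction > loss_threshold and subscribed_layers > 1:
        # Leave the session carrying the highest enhancement layer.
        return subscribed_layers - 1
    # No congestion sensed, or only the base layer is left.
    return subscribed_layers
```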
12. IANA Considerations
[Edt. note: A new MIME type should be registered with IANA.]
13. NAL Unit Reordering for Layered Multicast
Layered multicast is described in 14.2. In layered multicast, the
base layer, one or more enhancement layers, or the base layer and one
or more enhancement layers may be transmitted within separate RTP
sessions, i.e. the NAL units required for decoding an access unit of
a certain operation point of the scalable bitstream may be
distributed in different RTP sessions. After receiving NAL units
from different RTP sessions, recovery of the NAL unit decoding
order is required.
To enable a simple NAL unit decoding order recovery process, the
following constraints shall be fulfilled in layered multicast:
* The interleaved packetization mode must be used.
* The DON values of all the NAL units, as specified in section 5.5 of
RFC 3984, shall indicate the correct NAL unit decoding order over
all the RTP sessions.
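Under these constraints, decoding order recovery reduces to a merge by DON. The sketch below follows the don_diff comparison of 16-bit DON values as we understand it from section 5.5 of RFC 3984. It assumes that all buffered NAL units lie within a window of fewer than 32768 DON values, and it represents each NAL unit as a hypothetical (don, nal_unit) pair. The function name merge_sessions is likewise an assumption.

```python
from functools import cmp_to_key


def don_diff(don_m, don_n):
    """Signed distance from DON m to DON n with 16-bit wraparound.

    A positive result means that n follows m in decoding order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n:
        d = don_n - don_m
        return d if d < 32768 else -(don_m + 65536 - don_n)
    d = don_m - don_n
    return 65536 - don_m + don_n if d >= 32768 else -d


def merge_sessions(*sessions):
    """Merge (don, nal_unit) pairs from several RTP sessions into decoding order."""
    units = [unit for session in sessions for unit in session]
    # If don_diff(a, b) is positive, b follows a, so a must sort first.
    return sorted(units, key=cmp_to_key(lambda a, b: -don_diff(a[0], b[0])))
```

A real receiver would apply such a merge to a bounded de-interleaving buffer rather than to complete sessions.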
14. Informative Appendix: Application Examples
14.1. Introduction
Scalable video coding is a concept that has been around at least
since MPEG-2 [MPEG2], which goes back as early as 1993.
Nevertheless, it has never gained wide acceptance; perhaps partly
because applications didn't materialize in the form envisioned during
standardization.
MPEG and JVT, respectively, performed a requirement analysis before
the SVC project was launched. Dozens of scenarios have been studied.
While some of the scenarios appear not to follow the most basic
design principles of the Internet -- and are therefore not
appropriate for IETF standardization -- others are clearly in the
scope of IETF work. Of these, this draft chooses the following
subset for immediate consideration. Note that we do not reference
the MPEG and JVT documents directly; partly, because at least the
MPEG documents have a limited lifespan and are not publicly
available, and partly because the language used in these documents is
inappropriately video centric and imprecise, when it comes to
protocol matters.
With these remarks, we now introduce three main application scenarios
that we consider as relevant, and that are implementable with this
specification.
14.2. Layered Multicast
This well-understood form of the use of layered coding
[McCanne/Vetterli] implies that all layers are individually conveyed
in their own RTP session using their own IP multicast address.
Receivers "tune" into the layers by subscribing to the IP multicast,
normally by using IGMP [IGMP]. Optimization forms could be
envisioned in which a number of layers are sent combined in a single
RTP session; but these optimizations are currently not considered in
this document.
Layered Multicast has the great advantage of simplicity and easy
implementation. However, it has also the great disadvantage of
utilizing many different ports. While we consider this not to be a
major problem for a professionally maintained content server,
receiving client endpoints need to open many ports to IP multicast
addresses in their firewalls. This is a practical problem from a
firewall/NAT viewpoint. Furthermore, even today IP multicast is not
as widely deployed as many wish.
We consider layered multicast an important application scenario for
three reasons. First, it is well understood and the implementation
constraints are well known. Second, there may well be large scale IP
networks outside the immediate Internet context that may wish to
employ layered multicast in the future. One possible example could
be a combination of content creation and core-network distribution
for the various mobile TV services, e.g. those being developed by
3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H]. Finally, when one base
and one enhancement layer are in use and are being conveyed
separately, that represents one operation point of layered multicast.
14.3. Streaming of an SVC scalable stream
In this scenario, a streaming server has a repository of stored SVC
coded layers for a given content. At the time of streaming, and
according to the capabilities and connectivity of the client(s), the
streaming server generates a scalable stream. This scalable stream
is served to the client(s). Both unicast and multicast serving is
possible. At the same time, the streaming server may use the same
repository of stored layers to compose different streams (with a
different set of layers) intended for different audiences.
As every endpoint receives only a single SVC RTP session, the number
of firewall pinholes can be optimized. In fact, only a single
firewall pinhole is required.
The main difference between this scenario and straightforward
simulcasting lies in the architecture and the requirements of the
streaming server, and is therefore out of the scope of IETF
standardization. However, compelling arguments can be made why such
a streaming server design makes sense. One possible argument is
related to storage space and channel bandwidth. Another is bandwidth
adaptivity without transcoding -- a considerable advantage in a
congestion controlled network. When the streaming server learns
about congestion, it can reduce sending bitrate by choosing fewer
layers when composing the layered stream. SVC is designed to
gracefully support both bandwidth rampdown and bandwidth rampup with
a considerable dynamic range. This payload format is designed to
allow for bandwidth flexibility in the mentioned sense, both for CGS
and FGS layers. While, in theory, a transcoding step could achieve a
similar dynamic range, the computational demands are impractically
high and video quality is typically lowered -- therefore, few (if
any) streaming servers implement full transcoding.
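The layer selection step described above can be sketched as follows. The per-layer bitrates and the budget are hypothetical inputs, and the policy of always sending the base layer, even when it alone exceeds the budget, is an assumption of this sketch rather than a requirement of this memo.

```python
def select_layers(layer_bitrates_kbps, budget_kbps):
    """Return the number of layers (base layer first) that fit the budget.

    layer_bitrates_kbps[0] is the rate of the base layer; each further
    entry is the additional rate of the next enhancement layer.
    """
    total = 0
    count = 0
    for rate in layer_bitrates_kbps:
        # The base layer is always included; enhancement layers are
        # added only while the cumulative rate stays within the budget.
        if count > 0 and total + rate > budget_kbps:
            break
        total += rate
        count += 1
    return count
```

When the server learns about congestion, it would call the function again with a smaller budget and compose the stream from fewer layers. No transcoding is involved.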
14.4. Multicast to MANE, SVC scalable stream to endpoint
This final scenario is a bit more complex, and designed to optimize
the network traffic in a core network, while still requiring only a
single pinhole in the endpoint's firewall. One of its key
applications is the mobile TV market.
Consider a large IP network, e.g. the core network of 3GPP.
Streaming servers within this core network can be assumed to be
professionally maintained. We assume that these servers can have
many ports open to the network and that layered multicast is a real
option. Therefore, we assume that the streaming server multicasts
SVC scalable layers, instead of simulcasting different
representations of the same content at different bit rates.
Also consider many endpoints of different classes. Some of these
endpoints may not have the processing power or the display size to
meaningfully decode all layers; others may have these capabilities.
Users of some endpoints may not wish to pay for high quality and are
happy with a base service, which may be cheaper or even free. Other
users are willing to pay for high quality. Finally, some connected
users may have a bandwidth problem in that they can't receive the
bandwidth they would want to receive -- be it through congestion,
change of service quality, or for whatever other reasons. However,
all these users have in common that they don't want to be exposed too
much, and therefore the number of firewall pinholes needs to be small.
This situation can be handled best by introducing middleboxes close
to the edge of the core network, which receive the layered multicast
streams and compose the single SVC scalable bit stream according to
the needs of the endpoint connected. These middleboxes are called
MANEs throughout this specification. In practice, we envision the
MANE to be part of (or at least physically and topologically close
to) the base station of a mobile network, where all the signaling and
media traffic necessarily are multiplexed on the same physical link.
This is why we do not worry too much about decomposition aspects of
the MANE as such.
Edt. note: In the following paragraph, Mixers and Translators need to
be clarified.
MANEs necessarily need to be fairly complex devices. They certainly
need to understand the signaling, for example, to associate the
PT octet in the RTP header with the SVC payload type. Furthermore,
they terminate the multicasted layered RTP sessions coming in from
the core network side, and create new RTP sessions (perhaps even
multicast sessions) to the endpoints connected to them. In RTP
terminology, it appears that MANEs necessarily are mixers AND
translators; a MANE first mixes the content of one or more incoming
RTP streams, and then "translates" it into the outgoing stream (which
may involve pruning FGS coded NAL units and similar tasks).
While the implementation complexity of a MANE, as discussed above, is
fairly high, the computational demands are comparatively low. In
particular, SVC and/or this specification contain means to easily
generate the correct inter-layer decoding order of NAL units. It is
also simple to identify the fine granularity scalable bits in a given
NAL unit. No serious bit-oriented processing is required and no
significant state information (beyond that of the signaling and
perhaps the SVC sequence parameter sets) needs to be kept.
Finally, another scenario with very similar properties could be
implemented in which the streaming server would send a single SVC
scalable stream (containing basically all available scalable layers)
to the MANE, and the MANE de-layers this scalable bit stream into its
individual layers, before further processing.
14.5. Scenarios currently not considered for complexity reasons
-- vacat --
14.6. Scenarios currently not considered for being unaligned with IP
philosophy
Remarks have been made that the current draft does not take into
consideration at least one application scenario which some JVT folks
consider important. In particular, their idea is to make the RTP
payload format (or the media stream itself) self-contained enough
that a stateless, non signaling aware device can "thin" an RTP
session to meet the bandwidth demands of the endpoint. They call
this device a "Router" or "Gateway", and sometimes a MANE.
Obviously, it's not a Router or Gateway in the IETF sense. To
distinguish it from a MANE as defined in RFC 3984 and in this
specification, let's call it an MDfH (Magic Device from Heaven).
To simplify discussions, let's assume point-to-point traffic only.
The endpoint has a signaling relationship with the streaming server,
but it is known that the MDfH is somewhere in the media path (e.g.
because the physical network topology ensures this). It has been
requested, at least implicitly through MPEG's and JVT's requirements
document, that the MDfH should be capable of intercepting the SVC
scalable bit stream, modifying it by dropping packets or parts
thereof, and forwarding the resulting packet stream to the receiving
endpoint.
It has been requested that this payload specification contains
protocol elements facilitating such an operation, and the argument
has been made that the NRI field of RFC 3984 serves exactly the same
purpose.
The authors of this I-D do not consider the scenario above to be
aligned with the most basic design philosophies the IETF follows, and
therefore have not addressed the comments made (except through this
section). In particular, we see the following problems with the MDfH
approach:
- As the very minimum, the MDfH would need to know which RTP streams
are carrying SVC. We don't see how this could be accomplished but
by using a static payload type. None of the IETF defined RTP
profiles envision static payload types for SVC, and even the de-
facto profiles developed by some application standard
organizations (3GPP for example) do not use this outdated concept.
Therefore, the MDfH necessarily needs to be at least "listening"
to the signaling.
- If the RTP packet payload were encrypted, it would be impossible
to interpret the payload header and/or the first bytes of the
media stream. We understand that there are crypto schemes under
discussion that encrypt only the last n bytes of an RTP payload,
but we are more than unsure that this is fully in line with the
IETF's security vision.
Even if the above two problems were overcome through
standardization outside of the IETF, we still foresee serious design
flaws:
- An MDfH can't simply dump RTP packets it doesn't want to forward.
It either needs to act as a full RTP Translator (implying that it
patches RTCP RRs and such), or it needs to patch the RTP sequence
numbers to fulfill the RTP specification. Not doing either would,
for the receiver, look like the gaps in the sequence numbers
occurred due to unintentional erasures, which has interesting
effects on congestion control (if implemented), will break pretty
much every meta-payload ever developed, and so on. (Many more
points could be made here).
- An MDfH also can't "prune" FGS packets. Again, doing so would not
be compatible with meta payloads, and would mess up RTCP RRs and
congestion control (if the congestion control is based on octet
count and not on packet count; there are discussions related to
the former at least in the context of TFRC).
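To illustrate what "patching the RTP sequence numbers" in the first point would involve, the sketch below drops selected packets and renumbers the remaining ones so that the receiver sees no gaps. The (seq, payload) tuple representation and the function name are hypothetical. The sketch deliberately ignores RTCP receiver reports, octet counts, and meta-payloads, which is exactly where the remaining problems listed above arise.

```python
def forward_with_renumbering(packets, keep):
    """Drop packets for which keep(pkt) is false and renumber the rest.

    packets is an iterable of (seq, payload) tuples in sending order.
    Outgoing sequence numbers start at the first incoming number and
    increase by one per forwarded packet, modulo 2**16.
    """
    out = []
    next_seq = None
    for seq, payload in packets:
        if next_seq is None:
            next_seq = seq
        if keep((seq, payload)):
            out.append((next_seq, payload))
            next_seq = (next_seq + 1) & 0xFFFF
    return out
```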
In summary, based on our current knowledge we are not willing to
specify protocol mechanisms that support an operation point that has
so little in common with classic RTP use.
15. Acknowledgements
Funding for the RFC Editor function is currently provided by the
Internet Society.
16. References
16.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[MPEG4-10] ISO/IEC International Standard 14496-10:2003.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for
generic audiovisual services", May 2003.
[SVC] Joint Video Team, "Joint Scalable Video Model JSVM-4 Annex
G", available from http://ftp3.itu.ch/av-arch/jvt-site/
2005_10_Nice/JVT-Q202.zip, October 2005.
[RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
M., and D. Singer, "RTP Payload Format for H.264 Video", RFC 3984,
February 2005.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
16.2. Informative References
[DVB-H] DVB - Digital Video Broadcasting (DVB); DVB-H
Implementation Guidelines, ETSI TR 102 377, 2005
[IGMP] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
Thyagarajan, "Internet Group Management Protocol, Version 3",
RFC 3376, October 2002.
[McCanne/Vetterli] McCanne, S., Jacobson, V., and M. Vetterli,
"Receiver-driven layered multicast", in Proc. of ACM SIGCOMM'96,
pages 117--130, Stanford, CA, August 1996.
[MBMS] 3GPP - Technical Specification Group Services and System
Aspects; Multimedia Broadcast/Multicast Service (MBMS);
Protocols and codecs (Release 6), December 2005
[MPEG2] ISO/IEC International Standard 13818-2:1993.
17. Authors' Addresses
Stephan Wenger Phone: +358-50-486-0637
Nokia Research Center Email: stewe@stewe.org
P.O. Box 100
FIN-33721 Tampere
Finland
Ye-Kui Wang Phone: +358-50-486-7004
Nokia Research Center Email: ye-kui.wang@nokia.com
P.O. Box 100
FIN-33721 Tampere
Finland
Thomas Schierl Phone: +49-30-31002-227
Fraunhofer HHI Email: schierl@hhi.fhg.de
Einsteinufer 37
D-10587 Berlin
Germany
18. Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
19. Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
20. Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
21. RFC Editor Considerations
none
22. Open Issues
3. Need to clarify MANE, Mixers, and Translators throughout the
document (consistently with RFC 3550).
4. Packetization rules need work once 3) is addressed
5. Alignment with JVT spec (ongoing)
23. Changes Log
From -00 to -01
- 04.02.2006, StW: Added details to scope
- 04.02.2006, StW: Added short subsection 6.1 "Design Principles"
- 04.02.2006, StW: Added section 15, "Application Examples"
- 06.02 - 03.03.2006, YkW: Various modifications throughout the document
- 13.02.2006 - 03.03.2006 , ThS: Added definitions and additional
information to section 3.3, 5.1, 7 and 8, parameters in section 9.1 and
added section 14 for NAL unit re-ordering for layered multicast. Further
modifications throughout the document
From -01 to -02
- 06.03.2006, StW: Editorial improvements
- 26.05.2006, YkW: Updated NAL unit header syntax and semantics
according to the latest draft SVC spec
- 20.06.2006, Miska/YkW: Added section 6.10 "Payload Content Scalability
Information (PACSI) NAL Unit"
- 20.06.2006, YkW: Updated the NAL unit reordering process for layered
multicast (removed the old section 14 "Informative Appendix: NAL Unit
Re-ordering for Layered Multicast" and added the new section 13 "NAL
Unit Reordering for Layered Multicast")