Network Working Group S. Wenger
Internet Draft Y.-K. Wang
Document: draft-wenger-avt-rtp-svc-00.txt
Expires: April 2006
October 2005
RTP Payload Format for SVC Video
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on March 29, 2006.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This memo describes an RTP payload format for the scalable extension
of the ITU-T Recommendation H.264 video codec, which is technically
identical to the ISO/IEC International Standard 14496-10 video
codec. The RTP payload format allows for packetization of one
or more Network Abstraction Layer Units (NALUs), produced by the
video encoder, in each RTP payload. The payload format has wide
applicability, as it supports applications from simple low bit-rate
conversational usage, to Internet video streaming with interleaved
transmission, to high bit-rate video-on-demand.
INTERNET-DRAFT Scalable Video Codec RTP Payload Format October 2005
Table of Contents
RTP Payload Format for SVC Video...............................1
1. Introduction............................................3
1.1. SVC - the scalable enhancement of H.264/AVC...............3
2. Conventions.............................................3
3. The SVC Codec ...........................................3
3.1. Overview..............................................3
3.2. Parameter Set Concept...................................4
3.3. Network Abstraction Layer Unit Header ....................4
4. Scope...................................................6
5. Definitions and Abbreviations.............................7
5.1. Definitions............................................7
5.2. Abbreviations..........................................7
6. RTP Payload Format.......................................7
6.1. RTP Header Usage.......................................7
6.2. Common Structure of the RTP Payload Format................8
6.3. NAL Unit Header Usage...................................8
6.4. Packetization Modes.....................................8
6.5. Decoding Order Number (DON).............................8
6.6. Single NAL Unit Packet..................................8
6.7. Aggregation Packets.....................................8
6.8. Fragmentation Units (FUs)...............................9
7. Packetization Rules......................................9
8. De-Packetization Process (Informative).....................9
9. Payload Format Parameters.................................9
9.1. MIME Registration......................................9
9.1.1. Mapping of MIME Parameters to SDP......................10
9.1.2. Usage with the SDP Offer/Answer Model..................11
9.1.3. Usage in Declarative Session Descriptions..............11
10. Examples...............................................11
11. Parameter Set Considerations.............................11
12. Security Considerations .................................11
13. Congestion Control......................................11
14. IANA Consideration......................................12
15. Acknowledgements........................................12
16. References.............................................12
16.1. Normative References...................................12
16.2. Informative References.................................12
17. Author's Addresses......................................12
RFC Editor Considerations ....................................13
Open Issues.................................................13
18. Changes Log............................................13
Wenger, Wang Standards Track [page 2]
1. Introduction
1.1. SVC - the scalable enhancement of H.264/AVC
This memo specifies an RTP [RFC3550] payload format for a
forthcoming new mode of the H.264/AVC video codec, known as Scalable
Video Coding (SVC). Formally, SVC will take the form of an Amendment
to ISO/IEC 14496 Part 10 [MPEG4-10], and likely as one or more new
Annexes of ITU-T Rec. H.264 [H.264]. It is planned to keep the
technical alignment between the two mentioned specifications, as
well as backward compatibility with previous versions of H.264/AVC.
The current working draft of SVC is available for public review
[SVC]. Technical maturity is expected around mid-2006, the
timeframe in which the ISO/IEC Committee Draft is due. In
this memo, SVC is used as an acronym for the mentioned scalable
extensions of H.264/AVC.
SVC covers all of H.264/AVC's applications, which span all forms of
digital compressed video, from low bit-rate Internet streaming
applications to HDTV broadcast and Digital Cinema applications with
nearly lossless coding.
This memo aims to follow a similar philosophy by keeping as close
an alignment to the H.264/AVC payload RFC [RFC3984] as possible. It
documents the enhancements relevant from an RTP transport
viewpoint, defines signaling support for SVC, and deprecates the
single NAL unit mode of RFC 3984.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].
This specification uses the notion of setting and clearing a bit
when bit fields are handled. Setting a bit is the same as assigning
that bit the value of 1 (On). Clearing a bit is the same as
assigning that bit the value of 0 (Off).
3. The SVC Codec
3.1. Overview
SVC provides scalable video bitstreams. A scalable video bitstream
contains a non-scalable base layer and one or more enhancement
layers. An enhancement layer may enhance the temporal resolution
(i.e. the frame rate), the spatial resolution, or the quality of the
video content represented by the lower layer or part thereof. The
scalable layers can be aggregated to a single RTP stream, or
transported independently.
The concepts of the video coding layer (VCL) and the network
abstraction layer (NAL) are inherited from AVC. The VCL contains
the signal processing functionality of the codec, that is,
mechanisms such as transform, quantization, motion-compensated
prediction, loop filtering, and inter-layer prediction. A coded
picture of a base or enhancement layer consists of one or more
slices. The NAL encapsulates each slice generated by the VCL into
one or more Network Abstraction Layer Units (NAL units). Please
consult RFC 3984 for a more in-depth discussion of the NAL unit
concept.
Each SVC layer is formed by NAL units, representing the coded video
bits of the layer. An RTP stream carrying only one layer would
carry NAL units belonging to that layer only. An RTP stream
carrying a complete scalable video bit stream would carry NAL units
of a base layer and one or more enhancement layers. SVC specifies
the decoding order of these NAL units.
The concept of scaling the visual content quality by omitting the
transport and decoding of entire enhancement layers is denoted as
coarse-grained scalability (CGS).
In some cases, the bit rate of a given enhancement layer can be
reduced by truncating bits from individual NAL units. Truncation
leads to a graceful degradation of the video quality of the
reproduced enhancement layer. This concept is known as fine-grained
(granularity) scalability (FGS).
3.2. Parameter Set Concept
The parameter set concept is inherited from AVC. In SVC, pictures
from different layers may use the same sequence or picture parameter
set and may also use different sequence or picture parameter sets.
If different sequence parameter sets are used, then at any time
instant during the decoding process, there may be more than one
active sequence parameter set. Any specific active sequence
parameter set remains unchanged throughout a coded video sequence,
and any active picture parameter set remains unchanged within a
coded picture.
3.3. Network Abstraction Layer Unit Header
An SVC NAL unit consists of a header of one, two or three bytes and
the payload byte string. The header indicates the type of the NAL
unit, the (potential) presence of bit errors or syntax violations in
the NAL unit payload, information regarding the relative importance
of the NAL unit for the decoding process, and (optionally, when the
header is of three bytes) the scalable layer decoding dependency
information. This RTP payload specification is designed to be
unaware of the bit string in the NAL unit payload.
The NAL unit header co-serves as the payload header of this RTP
payload format. The payload of a NAL unit follows immediately.
The syntax and semantics of the NAL unit header are specified in
[SVC], but the essential properties of the NAL unit header are
summarized below.
The first byte of the NAL unit header has the following format,
which is the same as in H.264/AVC and RFC 3984:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI| Type |
+---------------+
F: 1 bit
forbidden_zero_bit. The H.264 specification declares a value of 1
as a syntax violation.
NRI: 2 bits
nal_ref_idc. A value of 00 indicates that the content of the NAL
unit is not used to reconstruct reference pictures for inter picture
prediction. Such NAL units can be discarded without risking the
integrity of the reference pictures in the same layer. Values
greater than 00 indicate that the decoding of the NAL unit is
required to maintain the integrity of the reference pictures. For a
slice or slice data partitioning NAL unit, a NRI value of 11
indicates that the NAL unit contains data of a key picture, as
specified in [SVC].
Informative Note: The concept of a key picture has been introduced
in SVC, and no assumption should be made that any pictures in bit
streams compliant with the 2003 and 2005 versions of H.264 follow
this rule.
Type: 5 bits
nal_unit_type. This component specifies the NAL unit payload type
as defined in table 7-1 of [SVC], and later within this memo. For a
reference of all currently defined NAL unit types and their
semantics, please refer to section 7.4.1 in [SVC].
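As a non-normative illustration, the first header byte can be split
into its three fields with shifts and masks. The sketch below is in
Python; the names NalFirstByte and parse_first_byte are hypothetical
and are not defined by [SVC] or RFC 3984. It assumes bit 0 in the
diagram above is the most significant bit of the byte.

```python
from typing import NamedTuple


class NalFirstByte(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_first_byte(octet: int) -> NalFirstByte:
    """Split the first NAL unit header byte into F, NRI and Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return NalFirstByte(
        f=(octet >> 7) & 0x01,    # bit 0 of the diagram (most significant)
        nri=(octet >> 5) & 0x03,  # bits 1..2
        nal_type=octet & 0x1F,    # bits 3..7
    )


header = parse_first_byte(0x74)  # binary 0111 0100
```

For the byte 0x74 this yields F = 0, NRI = 3 (binary 11) and
Type = 20.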
Previously, NAL unit types 20 and 21 (among others) were
reserved for future extensions. SVC uses these two NAL unit
types. They indicate the presence of a second header byte that is
helpful from a transport viewpoint:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
| PRID |D|E|
+---------------+
PRID: 6 bits
simple_priority_id. This component specifies a priority identifier
for the NAL unit. When extension_flag is equal to 0,
simple_priority_id is used for inferring the values of
dependency_id, temporal_level, and quality_level. When
simple_priority_id is not present, it shall be inferred to be equal
to 0.
D: 1 bit
discardable_flag. A value of 1 indicates that the content of the
NAL unit (dependency_id = currDependencyId) is not used in the
decoding process of NAL units with dependency_id > currDependencyId.
Such NAL units can be discarded without risking the integrity of
higher scalable layers with larger values of dependency_id.
discardable_flag equal to 0 indicates that the decoding of the NAL
unit is required to maintain the integrity of higher scalable layers
with larger values of dependency_id.
E: 1 bit
extension_flag. A value of 1 indicates that the third byte of the
NAL unit header is present.
When the E-bit of the second byte is 1, then the NAL unit header
extends to a third byte:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
| TL | DID | QL|
+---------------+
TL: 3 bits
temporal_level. This component is used to indicate temporal
scalability or frame rate. A layer consisting of pictures with a
smaller temporal_level value has a lower frame rate.
DID: 3 bits
dependency_id. This component is used to indicate the inter-layer
coding dependency hierarchy. At any temporal location, a picture of
a smaller dependency_id value may be used for inter-layer prediction
for coding of a picture with a larger dependency_id value.
QL: 2 bits
quality_level. This component is used to indicate FGS layer
hierarchy. At any temporal location and with identical dependency_id
value, an FGS picture with quality_level value equal to QL uses the
FGS picture or base quality picture (the non-FGS picture when QL-1 =
0) with quality_level value equal to QL-1 for inter-layer
prediction. When QL is larger than 0, the NAL unit contains an FGS
slice or part thereof.
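The one-, two- and three-byte header layouts described above can be
decoded as sketched below. This is a non-normative Python
illustration; the names SvcNalHeader and parse_svc_header are
hypothetical. When the E bit is 0, the sketch leaves TL, DID and QL
unset rather than attempting the inference from simple_priority_id,
which is specified in [SVC] and not reproduced here.

```python
from typing import NamedTuple, Optional

EXTENDED_TYPES = (20, 21)  # NAL unit types that carry a second header byte


class SvcNalHeader(NamedTuple):
    length: int              # header length in bytes: 1, 2 or 3
    nal_type: int            # nal_unit_type from the first byte
    prid: Optional[int]      # simple_priority_id, 6 bits
    d: Optional[int]         # discardable_flag
    e: Optional[int]         # extension_flag
    tl: Optional[int]        # temporal_level, 3 bits
    did: Optional[int]       # dependency_id, 3 bits
    ql: Optional[int]        # quality_level, 2 bits


def parse_svc_header(nal: bytes) -> SvcNalHeader:
    """Decode the NAL unit header and report how many bytes it spans."""
    if not nal:
        raise ValueError("empty NAL unit")
    nal_type = nal[0] & 0x1F
    if nal_type not in EXTENDED_TYPES:
        # One-byte header, as in H.264/AVC.
        return SvcNalHeader(1, nal_type, None, None, None, None, None, None)
    if len(nal) < 2:
        raise ValueError("second header byte is missing")
    prid, d, e = nal[1] >> 2, (nal[1] >> 1) & 0x01, nal[1] & 0x01
    if e == 0:
        # Two-byte header; TL, DID and QL are inferred per [SVC].
        return SvcNalHeader(2, nal_type, prid, d, e, None, None, None)
    if len(nal) < 3:
        raise ValueError("third header byte is missing")
    tl, did, ql = nal[2] >> 5, (nal[2] >> 2) & 0x07, nal[2] & 0x03
    return SvcNalHeader(3, nal_type, prid, d, e, tl, did, ql)


hdr = parse_svc_header(bytes([0x74, 0x2B, 0x4D]))
```

For the bytes 0x74 0x2B 0x4D the result is a three-byte header with
Type = 20, PRID = 10, D = 1, E = 1, TL = 2, DID = 3 and QL = 1.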
This memo introduces new NAL unit types, which are presented in
section 6.2. The NAL unit types defined in this memo are marked as
unspecified in [SVC]. Moreover, this specification extends the
semantics of F, NRI, PRID, D, TL, DID and QL as described in section
6.3.
4. Scope
This payload specification can only be used to carry the "naked" SVC
NAL unit stream over RTP, and not the bitstream format discussed in
Annex B of SVC. The applications of this specification are likely
to be in IP-based multimedia communications, including
conversational multimedia, video telephony or video conferencing,
Internet streaming, and TV over IP.
5. Definitions and Abbreviations
5.1. Definitions
This document uses the definitions of [SVC] and [H.264]. The
following terms, defined in [SVC], are summarized for convenience:
scalable bitstream: an SVC compliant bit stream containing a base
layer and at least one enhancement layer
access unit: A set of NAL units pertaining to a certain temporal
location. An access unit includes the slice data of the pictures of
all scalable layers at that temporal location and possibly other
associated data e.g. SEI messages and parameter sets.
coded video sequence: A sequence of access units that consists, in
decoding order, of an instantaneous decoding refresh (IDR) access
unit followed by zero or more non-IDR access units including all
subsequent access units up to but not including any subsequent IDR
access unit.
IDR access unit: An access unit in which all the primary coded
pictures are IDR pictures.
IDR picture: A coded picture with the property that the decoding of
this coded picture and all the following coded pictures in decoding
order, in the same layer (i.e. with the same values of dependency_id
and quality_level, respectively), can be performed without inter
prediction from any picture prior to the coded picture in decoding
order in the same layer. An IDR picture causes a "reset" in the
decoding process of the scalable layer containing the IDR picture.
5.2. Abbreviations
In addition to the abbreviations defined in [RFC3984], the following
ones are defined.
CGS: Coarse granularity scalability
FGS: Fine granularity scalability
MANE: Media-aware network element
6. RTP Payload Format
6.1. RTP Header Usage
Please see section 5.1 of RFC3984 [RFC3984].
6.2. Common Structure of the RTP Payload Format
Please see section 5.2 of RFC3984 [RFC3984].
6.3. NAL Unit Header Usage
The structure and semantics of the NAL unit header were introduced
in section 3.3. This section specifies the semantics of F, NRI,
PRID, D, TL, DID and QL according to this specification.
The semantics of F specified in section 5.3 of [RFC3984] also
apply herein.
For NRI, for a bitstream that is compliant with AVC, the semantics
specified in section 5.3 of [RFC3984] are applicable; otherwise only
the semantics specified in SVC [SVC] are applicable.
For PRID, in addition to the semantics specified in [SVC], according
to this RTP payload specification, values of PRID indicate the
relative transport priority, as determined by the encoder, which is
typically increasing from a lower layer to a higher layer. MANEs
can use this information to protect more important NAL units better
than they do less important NAL units. The transport priority
increases as the PRID value increases.
For D, MANEs can use this information to protect NAL units with D
equal to 0 better than they do NAL units with D equal to 1.
For TL, DID and QL, in addition to the semantics specified in [SVC],
according to this RTP payload specification, values of TL, DID or QL
indicate the relative transport priority. MANEs can use this
information to protect more important NAL units better than they do
less important NAL units. A higher value of TL, DID or QL indicates
a higher priority if the other two components are identical.
Informative note: Using PRID, D, TL, DID and QL in combination
may better indicate the relative transport priority.
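As a non-normative illustration of the PRID and D semantics above, a
MANE could rank NAL units for protection as sketched below in
Python. The function names and the dictionary layout are
hypothetical. Comparing PRID first and using D only as a tie-breaker
is an assumption of this sketch; this memo does not define how the
fields are to be combined.

```python
from typing import Dict, List, Tuple


def protection_key(prid: int, d: int) -> Tuple[int, int]:
    """Return a key where a larger value means 'protect better'.

    A larger PRID means a higher transport priority, and D equal to 0
    is treated as more important than D equal to 1. Using D only to
    break ties between equal PRID values is an assumption.
    """
    return (prid, 0 if d else 1)


def most_important_first(units: List[Dict]) -> List[Dict]:
    """Order NAL unit descriptors from most to least important."""
    return sorted(
        units,
        key=lambda u: protection_key(u["prid"], u["d"]),
        reverse=True,
    )


units = [
    {"id": "a", "prid": 1, "d": 0},
    {"id": "b", "prid": 3, "d": 1},
    {"id": "c", "prid": 3, "d": 0},
]
ranked = [u["id"] for u in most_important_first(units)]
```

With the three example descriptors, the resulting order is c, b, a.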
6.4. Packetization Modes
Please see section 5.4 of RFC3984 [RFC3984]. The single NAL unit
mode SHALL NOT be used.
6.5. Decoding Order Number (DON)
Please see section 5.5 of RFC3984 [RFC3984].
6.6. Single NAL Unit Packet
Please see section 5.6 of RFC3984 [RFC3984].
6.7. Aggregation Packets
Please see section 5.7 of RFC3984 [RFC3984].
6.8. Fragmentation Units (FUs)
Please see section 5.8 of RFC3984 [RFC3984].
7. Packetization Rules
Please see section 6 of RFC3984 [RFC3984]. The single NAL unit mode
SHALL NOT be used.
8. De-Packetization Process (Informative)
Please see section 7 of RFC3984 [RFC3984]. The single NAL unit mode
SHALL NOT be used.
9. Payload Format Parameters
Edt. note: Section 9 and its subsections will be updated
according to the changes listed below later in the
process. For now, we list only the necessary adjustments, so as not
to bury any new information in the RFC 3984 text.
Section 8 of [RFC3984] applies with the following modification.
The sentence
"The parameters are specified here as part of the MIME subtype
registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."
is replaced with
"The parameters are specified here as part of the MIME subtype
registration for the SVC codec."
9.1. MIME Registration
The MIME subtype for the SVC codec is allocated from the IETF tree.
The receiver MUST ignore any unspecified parameter.
Media Type name: video
Media subtype name: H.264-SVC
Required parameters: none
OPTIONAL parameters:
The optional MIME parameters specified in [RFC3984] apply, in
addition to the following.
sprop-scalability-info:
This parameter MAY be used to convey the NAL unit containing the
scalability information SEI message that MUST precede any other NAL
units in decoding order. The parameter MUST NOT be used to indicate
codec capability in any capability exchange procedure. The value of
the parameter is the base64 representation of the NAL unit
containing the scalability information SEI message as specified in
[RFC3984].
Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550).
Security considerations:
See section 12 of this specification.
Public specification:
Please refer to section 16 of this specification.
Additional information:
None
File extensions: none
Macintosh file type code: none
Object identifier or OID: none
Person & email address to contact for further information:
Intended usage: COMMON
Author:
Change controller:
IETF Audio/Video Transport working group
delegated from the IESG.
SDP Parameters
9.1.1. Mapping of MIME Parameters to SDP
The MIME media type video/SVC string is mapped to fields in the
Session Description Protocol (SDP) as follows:
* The media name in the "m=" line of SDP MUST be video.
* The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
MIME subtype).
* The clock rate in the "a=rtpmap" line MUST be 90000.
* The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
"max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
parameter-sets", "parameter-add", "packetization-mode", "sprop-
interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
"sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-
size", and "sprop-scalability-info", when present, MUST be
included in the "a=fmtp" line of SDP. These parameters are
expressed as a MIME media type string, in the form of a semicolon
separated list of parameter=value pairs.
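The mapping rules above can be sketched as follows. This is a
non-normative Python illustration; the function name svc_sdp_lines,
the RTP/AVP profile, the port, the payload type and the placeholder
NAL unit bytes are all assumptions made for the example. The encoding
name SVC follows the rule stated in this section.

```python
import base64
from typing import Dict, List, Optional


def svc_sdp_lines(
    payload_type: int,
    port: int,
    params: Dict[str, str],
    scalability_sei_nal: Optional[bytes] = None,
) -> List[str]:
    """Build the m=, a=rtpmap and a=fmtp lines for one SVC stream."""
    fmtp = dict(params)
    if scalability_sei_nal is not None:
        # The parameter value is the base64 form of the whole NAL unit.
        fmtp["sprop-scalability-info"] = base64.b64encode(
            scalability_sei_nal
        ).decode("ascii")
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",  # media name is video
        f"a=rtpmap:{payload_type} SVC/90000",      # clock rate is 90000
    ]
    if fmtp:
        # Semicolon separated list of parameter=value pairs.
        pairs = ";".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    return lines


# The three bytes below are a placeholder, not a real SEI NAL unit.
lines = svc_sdp_lines(98, 49170, {"packetization-mode": "1"}, b"\x06\x01\x02")
```

In this sketch the optional parameters appear in the order in which
they were supplied, followed by sprop-scalability-info when a NAL
unit is given.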
9.1.2. Usage with the SDP Offer/Answer Model
TBD.
9.1.3. Usage in Declarative Session Descriptions
TBD.
10. Examples
TBD.
11. Parameter Set Considerations
Please see section 8.4 of RFC3984 [RFC3984].
12. Security Considerations
Please see section 9 of RFC3984 [RFC3984].
13. Congestion Control
Within any given RTP session carrying payload according to this
specification, the provisions of section 10 of RFC3984 [RFC3984]
apply.
One key motivation for the introduction of a scalable codec has been
the problem of network congestion as a whole. While scalability
cannot reduce congestion for the transport path of a given RTP
session, MANEs and layered multicast technologies can be used to
alleviate network-wide congestion.
MANEs MAY alleviate congestion on their outgoing network path by
a) removing the NAL units belonging to the hierarchically "highest"
enhancement layer (or set of enhancement layers) from an RTP
stream carrying base and enhancement layers.
b) removing some or all bits of a given FGS NAL unit.
In both cases, the incoming RTP session is terminated in the MANE,
and a second RTP session originates at the MANE. The MANE acts as
an RTP translator. The concept of scalability keeps the
implementation and computational effort within the MANE low, and
avoids expensive and delay-intensive full transcoding (in the sense
of reconstruction and re-encoding).
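Option a) above can be sketched as a simple filter over NAL units.
The Python code below is a non-normative illustration with
hypothetical function names. It reads dependency_id only from
three-byte headers; treating every other NAL unit as belonging to
dependency_id 0 is an assumption of the sketch, since [SVC] infers
the value from simple_priority_id when the third byte is absent.

```python
from typing import List


def dependency_id(nal: bytes) -> int:
    """Return DID from a three-byte header, else 0 (an assumption)."""
    has_second_byte = len(nal) >= 2 and (nal[0] & 0x1F) in (20, 21)
    if has_second_byte and (nal[1] & 0x01) and len(nal) >= 3:
        return (nal[2] >> 2) & 0x07
    return 0


def drop_highest_layer(nal_units: List[bytes]) -> List[bytes]:
    """Remove all NAL units of the highest dependency_id present."""
    if not nal_units:
        return []
    top = max(dependency_id(n) for n in nal_units)
    if top == 0:
        # Only the lowest layer is present, so nothing is removed.
        return list(nal_units)
    return [n for n in nal_units if dependency_id(n) < top]


base = bytes([0x65, 0xAA])               # one-byte header, type 5
layer1 = bytes([0x74, 0x05, 0x04, 0x00])  # E = 1, DID = 1
layer2 = bytes([0x74, 0x09, 0x08, 0x00])  # E = 1, DID = 2
thinned = drop_highest_layer([base, layer1, layer2])
```

Here the filter keeps base and layer1 and discards layer2. No NAL
unit payload is parsed, which is in line with the low MANE
complexity described above.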
When scalable layers are transported in their own RTP sessions, an
RTP receiver SHOULD unsubscribe from one or more enhancement layers
when it senses congestion, similar to what has been described in
[McCanne/Vetterli]. This behavior may be sufficient to reduce the
network load to an acceptable level. Nevertheless, the receiver
MUST follow the mechanisms described in section 10 of [RFC3984].
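A receiver-side reaction of this kind could look roughly as follows.
The Python sketch is non-normative; the function name, the use of a
loss fraction as the congestion signal and the 5 percent threshold
are all assumptions chosen for illustration, and none of them is
specified by this memo or by [McCanne/Vetterli].

```python
def enhancement_layers_to_keep(
    subscribed: int,
    loss_fraction: float,
    threshold: float = 0.05,  # hypothetical congestion threshold
) -> int:
    """Return how many enhancement layer sessions to stay joined to.

    'subscribed' counts enhancement layer sessions only; the base
    layer session is never left by this sketch.
    """
    if subscribed < 0:
        raise ValueError("subscribed must not be negative")
    if loss_fraction > threshold and subscribed > 0:
        return subscribed - 1  # leave the highest enhancement layer
    return subscribed


remaining = enhancement_layers_to_keep(subscribed=2, loss_fraction=0.12)
```

The sketch drops one layer per decision, so repeated congestion
reports peel off layers one at a time from the top.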
14. IANA Consideration
Edt. note: A new MIME type should be registered with IANA.
15. Acknowledgements
Funding for the RFC Editor function is currently provided by the
Internet Society.
16. References
16.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[MPEG4-10] ISO/IEC International Standard 14496-10:2003.
[H.264] ITU-T Recommendation H.264, "Advanced video coding for
generic audiovisual services", May 2003.
[SVC] Joint Video Team, "Joint Scalable Video Model JSVM-3 Annex
S", available from http://ftp3.itu.ch/av-arch/jvt-site/
2005_07_Poznan/JVT-P202r1.zip, July 2005.
[RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
M., and D. Singer, "RTP Payload Format for H.264 Video",
RFC 3984, February 2005.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
16.2. Informative References
[McCanne/Vetterli] McCanne, S., Jacobson, V., and M. Vetterli,
"Receiver-driven layered multicast", in Proc. of ACM
SIGCOMM'96, pages 117-130, Stanford, CA, August 1996.
17. Author's Addresses
Stephan Wenger Phone: +358-50-486-0637
Nokia Research Center Email: stewe@stewe.org
P.O. Box 100
FIN-33721 Tampere
Finland
Ye-Kui Wang Phone: +358-50-486-7004
Nokia Research Center Email: ye-kui.wang@nokia.com
P.O. Box 100
FIN-33721 Tampere
Finland
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
RFC Editor Considerations
none
Open Issues
18. Changes Log