Network Working Group S. Wenger
Internet Draft C. Perkins
Document: draft-ietf-avt-variable-rate-audio-00.txt
Expires: April 2005
October 2004
RTP Timestamp Frequency for Variable Rate Audio Codecs
Status of this Memo
By submitting this Internet-Draft, I certify that any applicable
patent or other IPR claims of which I am aware have been disclosed,
or will be disclosed, and any of which I become aware will be
disclosed, in accordance with RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This document is a submission of the IETF AVT WG. Comments should
be directed to the AVT WG mailing list, avt@ietf.org.
Abstract
This memo discusses the problems posed by audio codecs with variable
external sampling rates. Historically, for audio codecs, the RTP
timestamp frequency was chosen to match the sampling rate of the
audio codec. However, this choice is nowadays more difficult to
justify because of the advent of audio codecs (and, even more
importantly, practical use cases) that support multiple sample rates
and switching between them during the lifetime of an RTP session.
This Internet-Draft addresses the problem by suggesting that RTP
payload format RFCs for such codecs utilize a single, high, unified
RTP timestamp frequency.
1. Introduction
Internet Draft October 2004
One key property of an audio codec is its external input sample rate.
For many codecs, this sample rate is fixed; ITU-T G.711 [2], also
known as A-law and mu-law, for example, uses a sample rate of 8 kHz.
Other audio codecs, such as MPEG-1 audio layers 1, 2, and 3 [3], give
the user a choice between different sample rates. However, until
recently, applications never changed the sample rate during the
lifetime of an RTP session, even though doing so is technically
feasible and probably advantageous from both the user-perception and
the network point of view. At the time RTP [1] and the audio/video
profile [4] were developed, it was a reasonable design choice to use
an RTP timestamp frequency identical to the codec's input sample
rate, as this facilitates sample-exact synchronization and processing
of media data in endpoints, mixers, and translators, among other
advantages. Although neither RTP [1] nor the audio/video profile [4]
requires the codec sample rate to be the same as the RTP timestamp
frequency, this convention has generally been followed in practice.
Recently, codecs have been developed that not only support variable
sample rates but also use unannounced changes of the sample rate
(signaled in-band only) as one of their key mechanisms. Similarly,
applications have emerged that not only support variable sample rates
but, to some extent, rely on this feature. For most (if not all) of
these codecs, both the required bit rate and the user experience
scale with the selected sample rate. In the future, this property
could allow network-dictated scaling of the transmission bit rate of
an audio codec, a feature that was not available before and that
could turn out to be very useful in Internet environments, for
example to support congestion control.
For such codecs, the current convention of setting the RTP timestamp
frequency equal to the codec sample rate no longer makes much sense,
because the sample rate may change during a session, whereas the RTP
timestamp frequency is fixed for a given payload type. The purpose of
this draft is to guide developers of RTP payload format
specifications for codecs with variable sample rates toward using a
single, relatively high RTP timestamp frequency, to be specified in
this draft.
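As an illustration of how a single timestamp frequency decouples RTP
timestamps from the codec sample rate, the following Python sketch
computes RTP timestamps for a sequence of 20 ms frames whose sample
rate switches from frame to frame. It assumes a uRTR of 192 kHz, a
value this draft has not chosen (it appears only as an open question
in Section 5), and the helper name rtp_timestamps is hypothetical.
Each frame advances the 32-bit timestamp by its duration in uRTR
ticks, so frames of equal duration advance it equally, whatever their
sample rate.

```python
from fractions import Fraction

URTR_HZ = 192_000          # assumed unified RTP timestamp rate (not fixed by this draft)
RTP_TS_MODULUS = 2 ** 32   # RTP timestamps are 32-bit unsigned integers (RFC 3550)


def rtp_timestamps(frames, urtr_hz=URTR_HZ, initial_ts=0):
    """Return the RTP timestamp of each frame (hypothetical helper).

    frames: iterable of (sample_rate_hz, samples_in_frame) tuples, in order.
    Each timestamp is the previous one plus the previous frame's duration
    expressed in ticks of urtr_hz, taken modulo 2**32.
    """
    ts = initial_ts
    out = []
    for sample_rate_hz, samples in frames:
        out.append(ts % RTP_TS_MODULUS)
        ticks = Fraction(samples * urtr_hz, sample_rate_hz)
        if ticks.denominator != 1:
            # Frame boundary does not fall on a uRTR tick; a rounding rule is needed.
            raise ValueError(f"{sample_rate_hz} Hz frame is not sample-exact")
        ts += ticks.numerator
    return out


# 20 ms frames at 16, 8, 32 and 48 kHz.
frames = [(16_000, 320), (8_000, 160), (32_000, 640), (48_000, 960)]
print(rtp_timestamps(frames))  # [0, 3840, 7680, 11520]
```

With these assumed values, all four frames advance the timestamp by
3840 ticks. A 1024-sample frame at 44.1 kHz would raise the error,
since 192 kHz is not a multiple of 44.1 kHz; this is the case that
Section 3 addresses with a rounding rule.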
2. Audio codecs with variable sample rates: Examples
Examples of audio codecs with variable sample rates that could (at
least in theory) switch the sample rate on the fly, without support
from out-of-band signaling, include:
* AMR-WB+ [5] with a choice of 56 different sample rates
* VMR-WB [6] with a choice of 8 kHz and 16 kHz sample rates
* MPEG-4 AAC+ [7] with a choice of (need details here)
* Any others?
All these codecs use in-band signaling of the sample rate.
3. Rounding
Wenger, Perkins Expires April 2005 Page 2
It is possible (even likely) that no unified RTP timestamp frequency
can be found that, on the one hand, fulfills a key requirement
spelled out in Section 4.2 (namely, being low enough to make
timestamp wrap-around during erasure periods unlikely in all
practical application scenarios) and, on the other hand, is an
integer multiple of all sample rates the codecs support. It is quite
possible that codecs will be developed in the future that can choose
sample rates at a granularity of 1 Hz or even finer. Therefore, a
rounding algorithm needs to be specified for those cases in which no
sample-exact position of an audio frame exists in the RTP timestamp
numbering space. Specifying this algorithm ensures that all equipment
conforming to this draft uses the same rounding. If the selected
rounding algorithm guarantees that inaccuracies do not accumulate (as
spelled out in the requirements in Section 4.3), then even frequent
transcoding steps will not increase timing inaccuracy beyond the
unavoidable minimum.
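Why inaccuracies must not accumulate can be seen with a small
calculation. The Python sketch below assumes a uRTR of 192 kHz (not
yet decided) and 1024-sample frames at 44.1 kHz, a rate of which
192 kHz is not a multiple, so each frame lasts a non-integer
4458.23... ticks. Rounding each frame's duration separately lets the
error grow with every frame, whereas rounding the cumulative sample
count once keeps every timestamp within half a tick of the exact
value. The function names are hypothetical, and round-half-up is only
one possible rounding rule, not the one this draft will specify.

```python
URTR_HZ = 192_000  # assumed uRTR; the draft has not fixed this value


def round_div(num: int, den: int) -> int:
    """Integer division rounding half up (non-negative operands only)."""
    return (2 * num + den) // (2 * den)


def ts_per_frame_rounding(frame_index, samples_per_frame, sample_rate_hz):
    # Naive: round one frame's duration, then multiply; the error adds up.
    per_frame = round_div(samples_per_frame * URTR_HZ, sample_rate_hz)
    return frame_index * per_frame


def ts_cumulative_rounding(frame_index, samples_per_frame, sample_rate_hz):
    # Round the exact position of the frame's first sample once; the error
    # stays at most half a tick no matter how many frames precede it.
    samples = frame_index * samples_per_frame
    return round_div(samples * URTR_HZ, sample_rate_hz)


n = 1000
print(ts_per_frame_rounding(n, 1024, 44_100))   # 4458000
print(ts_cumulative_rounding(n, 1024, 44_100))  # 4458231 (exact: 4458231.29...)
```

Over these 1000 frames (about 23 s of audio), per-frame rounding lags
the exact position by 231 ticks, roughly 1.2 ms at the assumed rate,
and the gap keeps growing; rounding from the cumulative sample count
never exceeds half a tick.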
4. Requirements discussion
4.1. Requirements for this draft (general)
1) This draft MUST specify a unified RTP timestamp frequency that
fulfills the requirements of section 4.2.
2) This draft MUST specify a rounding algorithm that can be used for
non-sample-exact alignment of samples originating from more than one
audio codec, at least one of which has a variable sample rate. The
rounding algorithm MUST fulfill the requirements of Section 4.3.
3) This draft SHOULD state that its provisions MUST be used for the
design of future RTP payload formats for audio codecs with
variable sample rates.
4) This draft SHOULD state that its provisions SHOULD be considered
in the design of future RTP payload formats for non-audio codecs
that have similar problems as variable sample rate audio codecs.
5) This draft SHOULD provide an application example for a
well-understood variable sample rate codec.
4.2. Requirements for the unified RTP timestamp rate
6) The unified RTP timestamp rate (uRTR) MUST be sufficiently high
to fulfill the requirements for timestamps in RFC 3550 [1].
7) The uRTR MUST be low enough to make wrap-arounds of the RTP
timestamp during erasure periods (packet loss bursts) unlikely in
all reasonable application scenarios.
Informative note: Such scenarios include, for example, cell
handovers in wireless cellular networks, where erasure periods
of a few seconds can occur.
8) The uRTR SHOULD share the prime factors of the sample rates of
the most commonly used fixed sample rate audio codecs, each with
at least the same multiplicity (that is, the uRTR should be an
integer multiple of those sample rates), so as to allow for
sample-exact mixing of streams coded by those fixed sample rate
audio codecs.
9) The uRTR SHOULD be chosen to include a sufficiently high number
of prime factors so as to support sample-exact mixing for as many
future variable sample rate codec code points as possible.
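Requirements 7 to 9 can be checked mechanically for any candidate
uRTR. The Python sketch below factorizes a candidate rate, tests
which sample rates it covers sample-exactly (the sample rate must
divide the uRTR), and computes how long the 32-bit RTP timestamp
takes to wrap around. The candidate of 192 kHz and the list of
"common" fixed sample rates are illustrative assumptions, not choices
made by this draft; the helper names are hypothetical, and math.lcm
requires Python 3.9 or later.

```python
from math import lcm

RTP_TS_BITS = 32  # RTP timestamp field width (RFC 3550)

# Assumed set of commonly used fixed sample rates, for illustration only.
COMMON_RATES_HZ = [8_000, 16_000, 32_000, 44_100, 48_000]


def prime_factors(n: int) -> dict:
    """Return {prime: multiplicity} for n > 1, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_sample_exact(sample_rate_hz: int, urtr_hz: int) -> bool:
    # Every sample instant falls on a uRTR tick iff the rate divides the uRTR.
    return urtr_hz % sample_rate_hz == 0


def wrap_period_s(urtr_hz: int) -> float:
    # Seconds until the 32-bit timestamp wraps around.
    return 2 ** RTP_TS_BITS / urtr_hz


candidate = 192_000
print(prime_factors(candidate))  # {2: 9, 3: 1, 5: 3}
print([r for r in COMMON_RATES_HZ if not is_sample_exact(r, candidate)])  # [44100]
print(round(wrap_period_s(candidate)))  # 22370 (about 6.2 hours)

smallest_exact = lcm(*COMMON_RATES_HZ)  # smallest rate covering every listed rate
print(smallest_exact, round(wrap_period_s(smallest_exact)))  # 14112000 304
```

At the assumed 192 kHz, 44.1 kHz is the only listed rate that is not
sample-exact, because 192 kHz lacks the factor 7^2 and contains only
a single factor of 3. The smallest rate covering all listed rates,
14.112 MHz, wraps around after roughly five minutes; these figures
can be weighed against the erasure periods mentioned in requirement 7.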
4.3. Requirements for a rounding algorithm
10) The rounding algorithm MUST be applicable to all sample rates
lower than 0.5 * uRTR, where uRTR is the rate specified in this
draft.
11) The rounding algorithm MAY specify a minimum and a maximum
sample rate, expressed in units of x * uRTR. Only within this
band can it reasonably be expected that applying the rounding
algorithm does not lead to distortions audible to the typical
user.
12) The rounding algorithm MUST be simple enough to be implemented
in networking equipment without a significant computational
(cycle) burden.
13) The rounding algorithm SHOULD be implementable in fixed-point
arithmetic.
14) The rounding algorithm MAY, advantageously, be specified such
that it does not require division operations.
15) The rounding algorithm SHOULD be designed such that multiple
applications of the algorithm do not introduce errors larger
than one tick of the uRTR.
Informative note: This requirement is much more difficult to
meet than it seems at first glance. Consider a transcoding
chain from a variable sample rate codec to a 44.1 kHz codec
and back to a variable sample rate codec, where the unified
timestamp frequency does not share all prime factors of
44.1 kHz. One way out would be to revise all fixed-rate
payload specifications whose timestamp frequencies do not fit
the prime factors of the uRTR so that they use the uRTR
instead. Is this possible for 44.1 kHz, or is that timestamp
frequency fixed by RFC 3551 [4]?
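Requirements 12 to 15 can be made concrete with a sketch. The Python
code below (hypothetical helper names; 192 kHz again only an assumed
uRTR) converts a sample count to uRTR ticks without a per-packet
division: a fixed-point multiplier is precomputed once per sample
rate (here with one division; a precomputed table would avoid even
that), after which each conversion is a multiply, an add, and a
shift. For sample counts below 2^32 the result stays within one tick
of the exact value. Python integers do not overflow; a C version
would need a widened (for example 128-bit) intermediate product. The
second part illustrates the informative note: positions on a 16 kHz
codec's sample grid (every 12 ticks at the assumed uRTR) are
converted to the nearest sample of a fixed-rate codec and back, using
round-to-nearest at each step.

```python
URTR_HZ = 192_000  # assumed uRTR, for illustration only
FRAC_BITS = 32     # fractional bits of the fixed-point multiplier


def make_multiplier(sample_rate_hz: int) -> int:
    # uRTR / sample_rate in Q32 fixed point, rounded to nearest.
    # Computed once per sample rate (or taken from a precomputed table).
    return ((URTR_HZ << FRAC_BITS) + sample_rate_hz // 2) // sample_rate_hz


def samples_to_ticks(sample_count: int, multiplier: int) -> int:
    # Multiply, add one half, shift: no division on the per-packet path.
    return (sample_count * multiplier + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def round_div(num: int, den: int) -> int:
    # Exact integer rounding (half up) for non-negative operands.
    return (2 * num + den) // (2 * den)


def round_trip_error(ticks: int, fixed_rate_hz: int) -> int:
    # uRTR ticks -> nearest sample of a fixed-rate codec -> back to ticks.
    sample = round_div(ticks * fixed_rate_hz, URTR_HZ)
    return round_div(sample * URTR_HZ, fixed_rate_hz) - ticks


m441 = make_multiplier(44_100)
print(samples_to_ticks(1_024_000, m441))  # 4458231

grid_16k = range(0, 1920, 12)  # 160 consecutive 16 kHz sample positions, in ticks
print(max(abs(round_trip_error(t, 44_100)) for t in grid_16k))  # 2
print(max(abs(round_trip_error(t, 48_000)) for t in grid_16k))  # 0
```

In this sketch, positions on the 16 kHz grid survive a round trip
through 48 kHz exactly, because 48 kHz divides the assumed uRTR, while
a round trip through 44.1 kHz can shift them by two ticks, exceeding
the one-tick bound of requirement 15 when plain round-to-nearest is
used.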
5. Open issues
* Very general: is this a good idea?
* What would be a good choice for the uRTR? 192 kHz?
* Is it a good idea to require ALL future I-Ds on audio (not only
the variable clock frequency ones) to use the uRTR?
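* Or only those whose sample rates do not fit the uRTR (where "fit"
means that their prime factors are a subset of those of the uRTR)?
* Revisit the 44.1 kHz CD sample rate: is variable sample rate
support not needed there? Are there proposals for an 88.2 kHz CD
audio codec?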
6. Security Considerations
None
7. Congestion Control
None
8. IANA Considerations
None
9. Acknowledgements
None
10. Full Copyright Statement
Copyright (C) The Internet Society (2004). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11. Intellectual Property Notice
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology described
in this document or the extent to which any license under such
rights might or might not be available; nor does it represent that
it has made any independent effort to identify any such rights.
Information on the procedures with respect to rights in RFC
documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
12. References
12.1. Normative References
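[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", STD 64,
RFC 3550, July 2003.
12.2. Informative References
[2] ITU-T Recommendation G.711, "Pulse Code Modulation (PCM) of
Voice Frequencies".
[3] ISO/IEC 11172-3, "Coding of moving pictures and associated
audio for digital storage media at up to about 1,5 Mbit/s,
Part 3: Audio".
[4] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
[5] Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec.
[6] Variable-Rate Multimode Wideband (VMR-WB) codec.
[7] ISO/IEC 14496 part xxx, AAC+.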
13. Authors' Addresses
Stephan Wenger
Nokia Research Center
P.O. Box 100
FIN-33721 Tampere
Finland
Phone: +358-50-486-0637
Email: stewe@stewe.org
Colin Perkins
University of Glasgow
Department of Computing Science
17 Lilybank Gardens
Glasgow G12 8QQ
United Kingdom
Email: csp@csperkins.org
14. RFC Editor Considerations
None