<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE rfc SYSTEM "http://xml.resource.org/authoring/rfc2629.dtd" [
<!ENTITY RFC4566 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml">
<!ENTITY RFC3264 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml">
<!ENTITY RFC3890 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3890.xml">
<!ENTITY RFC3108 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3108.xml">
<!ENTITY RFC4504 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4504.xml">
<!ENTITY RFC3441 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3441.xml">
<!ENTITY RFC3952 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3952.xml">
<!ENTITY RFC4060 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4060.xml">
<!ENTITY RFC1958 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1958.xml">
<!ENTITY RFC2327 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2327.xml">
<!ENTITY RFC3267 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3267.xml">
<!ENTITY RFC3016 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3016.xml">
<!ENTITY RFC3551 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml">
]>
<?xml-stylesheet type="text/xsl" href="http://xml.resource.org/authoring/rfc2629.xslt" ?>
<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc tocdepth="4" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="yes" ?>
<rfc category="info" docName="draft-garcia-mmusic-multiple-ptimes-problem-03.txt" ipr="full3978">
<front>
<title abbrev="Multiple ptime in SDP">
Multiple Packetization Times in the Session Description Protocol (SDP):
Problem Statement, Requirements &amp; Solution
</title>
<author initials="M." surname="Willekens" fullname="Marc Willekens">
<organization>Devoteam Telecom &amp; Media</organization>
<address>
<postal>
<street></street>
<city>Herentals</city>
<region>Antwerp</region>
<code>2200</code>
<country>Belgium</country>
</postal>
<email>marc.willekens@devoteam.com</email>
</address>
</author>
<author initials="M." surname="Garcia-Martin" fullname="Miguel A. Garcia-Martin">
<organization>Ericsson</organization>
<address>
<postal>
<street>Via de los Poblados 13</street>
<city>Madrid</city>
<region></region>
<code>28033</code>
<country>Spain</country>
</postal>
<email>Miguel.A.Garcia@ericsson.com</email>
</address>
</author>
<author initials="P." surname="Xu" fullname="Peili Xu">
<organization>Huawei Technologies</organization>
<address>
<postal>
<street>Bantian</street>
<city>Longgang</city>
<region>Shenzhen</region>
<code>518129</code>
<country>China</country>
</postal>
<email>xupeili@huawei.com</email>
</address>
</author>
<date day="12" month="July" year="2008" />
<area>RAI</area>
<workgroup>MMUSIC Working Group</workgroup>
<keyword>SDP</keyword>
<keyword>ptime</keyword>
<keyword>maxptime</keyword>
<abstract>
<t>
This document provides a problem statement and requirements with respect to the
presence of a single packetization time (ptime/maxptime) attribute in SDP media
descriptions that contain several media formats (audio codecs).
Furthermore, a best common practice solution for the use of 'ptime/maxptime' is
proposed based on 'static', 'dynamic' and 'indicated' values.
Some methods already proposed as ad-hoc solutions, together with background
information, are included in an appendix.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
<xref target="RFC4566">"Session Description Protocol" (SDP)</xref>
provides a protocol to describe multimedia sessions
for the purposes of session announcement, session invitation,
and other forms of multimedia session initiation. A session
description in SDP includes the session name and purpose, the
media comprising the session, information needed to receive the
media (addresses, ports, formats, etc.) and some other
information.
</t>
<t>
In the SDP media description part, the m-line contains the
media type (e.g. audio), a transport port, a transport
protocol (e.g. RTP/AVP) and a media format description which
depends on the transport protocol.
</t>
<t>
For the transport protocol RTP/AVP or RTP/SAVP, the media
format sub-field can contain a list of RTP payload type
numbers.
See <xref target="RFC3551">
"RTP Profile for Audio and Video Conferences with Minimal Control"</xref>,
Table 4.<vspace/>
For example: "m=audio 49232 RTP/AVP 3 15 18" indicates the audio encoders
GSM, G728, and G729.
</t>
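<t>
As an illustration only (SDP itself defines no such parser), the payload
type numbers can be read from the m-line with a few lines of script; the
static payload type mapping shown is the one from Table 4 of
<xref target="RFC3551"/>, and the function names are illustrative.
</t>
<figure align="center" title="Reading payload types from an m-line (illustrative sketch)">
<artwork>
<![CDATA[
# Illustrative sketch: extract the payload type numbers from an SDP m-line.
# The static audio payload type mapping below is from RFC 3551, Table 4.
STATIC_AUDIO_PT = {3: "GSM", 15: "G728", 18: "G729"}

def parse_m_line(m_line):
    # "m=audio 49232 RTP/AVP 3 15 18" -> (media, port, proto, payload types)
    media, port, proto, *fmts = m_line[2:].split()
    return media, int(port), proto, [int(f) for f in fmts]

media, port, proto, pts = parse_m_line("m=audio 49232 RTP/AVP 3 15 18")
codecs = [STATIC_AUDIO_PT.get(pt, "dynamic") for pt in pts]
# codecs: ["GSM", "G728", "G729"]
]]>
</artwork>
</figure>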
<t>
Further, the media description part can contain additional
attribute lines that complement or modify the media
description line. Of interest for this memo, are the 'ptime'
and 'maxptime' attributes.
According to <xref target="RFC4566"/>, the 'ptime' attribute gives
the length of time in milliseconds represented by the media in
a packet, and the 'maxptime' gives the maximum amount of media
that can be encapsulated in each packet, expressed as time in
milliseconds. These attributes modify the whole media
description line, which can contain an extensive list of
payload types. In other words, these attributes are not
specific to a given codec.
</t>
<t>
<xref target="RFC4566"/> also indicates that it
should not be necessary to know 'ptime' to decode RTP or vat
audio since the 'ptime' attribute is intended as a
recommendation for the encoding/packetization of
audio. However, once more, the existing 'ptime' attribute
defines the desired packetization time for all the payload
types defined in the corresponding media description line.
</t>
<t>
End-devices can sometimes be configured with different codecs and for
each codec a different packetization time can be
indicated. However, there is no clear way to exchange this
type of information between different user agents and this can
result in lower voice quality, network problems or performance
problems in the end-devices.
</t>
</section>
<section title="Problem Statement">
<t>
The packetization time is an important parameter which helps
in reducing the packet overhead. Many voice codecs define a
certain frame length used to determine the coded voice filter
parameters and try to find a certain trade-off between the
perceived voice quality, measured by the Mean Opinion Score
(MOS), and the required bitrate. When a packet
oriented network is used for the transfer, the packet header
induces an additional overhead. As such, it makes sense to
combine different voice frame data in one packet, up to
a Maximum Transmission Unit (MTU), to find a good balance
between the required network resources, end-device resources
and the perceived voice quality influenced by packet loss,
packet delay, and jitter. When the packet size decreases, the
bandwidth efficiency is reduced. When the packet size
increases, the packetization delay can have a negative impact
on the perceived voice quality.
</t>
<t>
The <xref target="RFC3551">"RTP Profile for Audio and Video
Conferences with Minimal Control"</xref>, Table 1, indicates
the frame size and default packetization time for different
codecs. The G728 codec has a frame size of 2.5 ms/frame and
a default packetization time of 20 ms/packet. For G729
codec, the frame size is 10 ms/frame and a default
packetization time of 20 ms/packet.
</t>
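<t>
The number of frames per packet follows directly from the packetization
time divided by the frame size; a small sketch using the Table 1 values
above (illustrative only):
</t>
<figure align="center" title="Frames per packet (illustrative sketch)">
<artwork>
<![CDATA[
# frames per packet = packetization time / frame size (both in ms)
def frames_per_packet(ptime_ms, frame_ms):
    # the packetization time should be an integer multiple of the frame size
    assert ptime_ms % frame_ms == 0
    return int(ptime_ms / frame_ms)

frames_per_packet(20, 2.5)  # G728: 8 frames of 2.5 ms in a 20 ms packet
frames_per_packet(20, 10)   # G729: 2 frames of 10 ms in a 20 ms packet
]]>
</artwork>
</figure>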
<t>
When more and more audio streaming traffic is carried over
IP networks, the quality as perceived by the end-user should
be no worse than that of classical telephony services. For VoIP
service providers, it is very important that endpoints receive
audio with the best possible codec and packetization time. In
particular, the packetization time depends on the selected
codec for the audio communication and other factors, such as
the Maximum Transmission Unit (MTU) of the network and the
type of access network technology.
</t>
<t>
As such, the packetization time is clearly a function of the
codec and the network access technology. During the
establishment of a new session or a modification of an existing
session, an endpoint should be able to express its preferences
with respect to the packetization time for each codec. This would
mean that the creator of the SDP prefers the remote endpoint to
use a certain packetization time when sending media with that
codec.
</t>
<t>
<xref target="RFC4566">SDP</xref> provides the means for
expressing a packetization time that affects all the payload
types declared in the media description line. There is thus no
means to indicate the desired packetization time on a
per-payload-type basis. Implementations have been using
proprietary mechanisms for indicating the packetization time
per payload type, leading to interoperability problems.
</t>
<t>
One of these mechanisms is the 'maxmptime' attribute, defined in
<xref target="ITU.V152"/>, which indicates the supported packetization
period for all codec payload types.
</t>
<t>
Another one is the 'mptime' attribute, defined by
<xref target="PKT.PKT-SP-EC-MGCP">"PacketCable"</xref>, which indicates a
list of packetization period values the endpoint is capable of
using (sending and receiving) for this connection.
</t>
<t>
While these attributes have similar semantics, there is no interoperability
between them, creating a nightmare for the implementer who happens to be
defining a common SDP stack for different applications.
</t>
<t>
A few RTP payload format descriptions, such as: <vspace/>
<xref target="RFC3267"/>, <xref target="RFC3016"/>, and <xref target="RFC3952"/>,
indicate that the packetization time for such payload should
be indicated in the 'ptime' attribute in SDP. However, since
the 'ptime' attribute affects all payload formats included
in the media description line, it would not be possible to
create a media description line that contains all the
mentioned payload formats and different packetization
times. The workarounds range from using a single
packetization time for all payload types to creating a
media description line that contains a single payload type.
</t>
<t>
However, once more, if several payload formats are
offered in the same media description line in SDP, there is no
way to indicate different packetization times per payload format.
</t>
</section>
<section title="Requirements">
<t>
The main requirement comes from the implementation and media gateway
community making use of hardware-based solutions, e.g. DSP or FPGA
implementations with silicon constraints on the amount of buffer space.
</t>
<t>
Some implementations use the ptime/codec information for certain QoS budget
calculations.
When the packetization time is known for a codec with a certain
frame size and frame data rate, the efficiency of the throughput
can be calculated.
</t>
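<t>
As an example of such a calculation (a sketch under stated assumptions:
40 octets of uncompressed IPv4/UDP/RTP headers, G.729 frames of 10 ms
and 10 octets, link-layer overhead ignored), the payload efficiency can
be computed as follows:
</t>
<figure align="center" title="Throughput efficiency (illustrative sketch)">
<artwork>
<![CDATA[
# Sketch: bandwidth efficiency as payload octets over total packet octets.
# Assumes 40 octets of IPv4 (20) + UDP (8) + RTP (12) headers.
HEADER_OCTETS = 40

def efficiency(ptime_ms, frame_ms, octets_per_frame):
    payload = (ptime_ms // frame_ms) * octets_per_frame
    return payload / (payload + HEADER_OCTETS)

round(efficiency(20, 10, 10), 2)  # G.729 at 20 ms: 20/60  = 0.33
round(efficiency(60, 10, 10), 2)  # G.729 at 60 ms: 60/100 = 0.6
]]>
</artwork>
</figure>
<t>
A longer packetization time clearly improves the efficiency, at the cost
of additional packetization delay.
</t>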
<t>
Currently, the 'ptime' and 'maxptime' are "indication" attributes and optional.
When these parameters are used for resource reservation and for hardware
initializations, a negotiated value between the SDP offerer and SDP answerer
can become a requirement.
</t>
<t>
There could be different sources for the 'ptime/maxptime', e.g. the RTP/AVP
profile, the end-user device configuration, the network architecture, or
the receiver.
</t>
<t>
The codec and 'ptime/maxptime' can differ between the upstream and downstream directions.
</t>
</section>
<section title="BCP solution proposal">
<t>
The basic idea of this proposal is to keep the packetization time
independent from the codec and to consider the main purpose of the 'ptime'
as follows.
</t>
<t>
The 'ptime' is a parameter indicating the packetization time which is an
important parameter for the end-to-end delay of the voice signal as
indicated in the previous sections.
It is defined as a media-attribute in the SDP.
</t>
<t>
The only requirements for the use of 'ptime' or 'maxptime' are that the total
size of the packet fits in the MTU and that the packetization
time is an integer multiple of the codec frame size.
</t>
<t>
If the same session requires different kinds of streams, e.g. a
conference where some users have a narrowband connection and others
a broadband connection, different media can be defined and
allocated to different ports.
In that case, different m-lines can be defined and another 'ptime' and
'maxptime' can be indicated.
</t>
<t>
The IETF RFCs are not clear about the case where the 'ptime' or 'maxptime'
in the SDP is not an integer multiple of the frame size. What should be used
then? The default 'ptime', or the 'ptime' that is an integer multiple of
the frame size and lower than the indicated 'ptime'? In case of an indicated
'maxptime', a value as close as possible to the indicated 'ptime' but lower
than the 'maxptime'?
</t>
<t>
This proposal respects the IETF architectural principle of being
strict when sending and tolerant when receiving
<xref target="RFC1958"/>.
</t>
<section title="Sending party RTP voice payload">
<t>
The transmitting side of a connection needs to know the packetization
time it can use for the RTP payload data, i.e. how many speech frames
it can include in the RTP packet. A trade-off between the packetization
delay and the transmission efficiency has to be made and this can be a
static or a dynamic process which involves all elements in the
end-to-end chain.
</t>
<t>
As such, three different sources for determining the packetization time are
considered.
</t>
<section title="ptime(s) - Static">
<t>
Statically provisioned values in the end-device: default values or manually
configured values.
</t>
<t>
An end-device implementation must know:<vspace/>
<list style="numbers">
<t>
all the codec-specific parameters, such as:
<list style="numbers">
<t>Sampling rate (e.g. 8000 Hz).</t>
<t>Number of channels (e.g. 1).</t>
<t>Frame size in ms (e.g. 20 ms).</t>
<t>Number of encoded bits per frame (e.g. 264 bits).</t>
<t>
Number of octets required per frame (e.g. G.723.1 at 6.4 kbps has
189 bits of encoded data per 30 ms frame, giving a data rate of
189 bits/30 ms or 6.3 kbps; however, the packet data is octet-aligned,
so 3 bits are added, which results in 24 octets/frame or a data rate
of 6.4 kbps).
</t>
</list>
</t>
<t>
system-specific parameters, such as:
<list style="numbers">
<t>MTU supported by the network and by the protocol stack of the end-device.</t>
<t>Packetization time (e.g. 60 ms) and the maximum packetization time (e.g. 150 ms).</t>
<t>Supported codecs.</t>
</list>
</t>
</list>
</t>
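<t>
The static parameters above can be grouped in a simple structure; the
class and field names below are illustrative only, and the G.723.1
numbers reproduce the example given in the list:
</t>
<figure align="center" title="Static codec parameters (illustrative sketch)">
<artwork>
<![CDATA[
from dataclasses import dataclass

@dataclass
class CodecParams:          # codec-specific static parameters
    sample_rate_hz: int
    channels: int
    frame_ms: float
    bits_per_frame: int

    def octets_per_frame(self):
        # octet alignment: pad the encoded bits up to a whole octet
        return -(-self.bits_per_frame // 8)  # ceiling division

# G.723.1 at 6.3 kbps: 189 encoded bits per 30 ms frame
g7231 = CodecParams(sample_rate_hz=8000, channels=1,
                    frame_ms=30, bits_per_frame=189)
octets = g7231.octets_per_frame()        # 24 octets/frame after alignment
rate_kbps = octets * 8 / g7231.frame_ms  # 24*8/30 = 6.4 kbps on the wire
]]>
</artwork>
</figure>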
</section>
<section title="ptime(d) - Dynamic">
<t>
Dynamically provided values defined by the network architecture.
</t>
<t>
The network can indicate, as part of the device management, its supported
codecs, the 'ptime' and 'maxptime'. These values can also change based on the
dynamic behavior of the network. During heavy load on the network,
the network architecture can decide to use lower rate codecs
(for bandwidth issues) and/or higher packetization times
(for packet processing performance).
This dynamic change can be done before, during or after a session.
</t>
</section>
<section title="ptime(i) - Indicated">
<t>
Indicated values proposed by the receiving side.
</t>
<t>
The receiving side can indicate in the SDP the 'ptime' and 'maxptime' values
it wants to receive. These are optional, codec-independent media parameters
and are considered an indication only: a hint to the sending party.
</t>
</section>
<section title="ptime/maxptime algorithm">
<t>
Instead of indicating a 'ptime/maxptime' on a per-codec basis as done in
many different proposals, this draft proposes to make use of the 'ptime/maxptime'
as a common parameter coming from different sources:<vspace/>
ptime(s), ptime(d), ptime(i) and maxptime(s), maxptime(d), maxptime(i).
</t>
<t>
Depending on the available information for the 'ptime' and 'maxptime',
the packetization time "pt" used for the transmission is determined
by the following algorithm.
<list style="numbers">
<t>
Determine codec to be used, e.g. G723 based on local info or the
optional network info.
</t>
<t>
Determine coding data rate, e.g. 6.4 kbps based on local info or the
optional network info.
</t>
<t>
Based on the codec, the frame size in ms is known: fc = frame size
of the codec.
</t>
<t>
Determine the MTU size which can be used. Based on this value,
the codec frame size and datarate, a 'maxptime' related to the codec "mc"
can be calculated.
</t>
<t>
Check the ptime(s, d, i) and maxptime(s, d, i, mc).
Take the maximum value from the available set of ptime(s, d, i) that
is lower than or equal to the minimum value in the set maxptime(s, d, i, mc).
</t>
<t>
Normalize this 'ptime' value down to an integer multiple of the frame size
that is lower than or equal to this 'ptime' value and to "mc", but not
lower than the codec frame size.
</t>
</list>
</t>
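<t>
Step 4 of the list above can be sketched as follows; the 40-octet
IPv4/UDP/RTP header overhead and the function name are assumptions of
this sketch, not part of the proposal:
</t>
<figure align="center" title="Deriving mc from the MTU (illustrative sketch)">
<artwork>
<![CDATA[
# Sketch of step 4: the largest packetization time "mc" whose payload
# still fits in the MTU. Assumes 40 octets of IPv4/UDP/RTP headers.
HEADER_OCTETS = 40

def mc_from_mtu(mtu_octets, frame_ms, octets_per_frame):
    max_frames = (mtu_octets - HEADER_OCTETS) // octets_per_frame
    return max_frames * frame_ms  # 0 means not even one frame fits

mc_from_mtu(1500, 30, 24)  # G.723.1 on Ethernet: 60 frames -> mc = 1800 ms
mc_from_mtu(576, 20, 160)  # G.711, 576-octet MTU: 3 frames -> mc = 60 ms
]]>
</artwork>
</figure>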
<t>
Remark:<vspace/>
It is up to the local policy of the device to determine which 'ptime/maxptime'
sources it will use in its calculation, e.g. it is possible to disallow
the treatment of the 'ptime' indicated by the other side.
This can easily be done by including/excluding the 'ptime/maxptime' values
from the vectors used in the calculation.
</t>
<t>
The formula to calculate the packetization time for the transmission of
voice packets in the RTP payload data has the following input parameters.
</t>
<t>
<list style="numbers">
<t>
The packetization time made available from different sources.
When no value is known, the frame size of the voice codec is used.
</t>
<t>
The maximum packetization time values made available from different
sources. When no value is known, the frame size of the voice codec is used.
</t>
<t>
The frame size of the codec.
</t>
<t>
The packetization time corresponding with the selected codec,
frame size, frame datarate and the network MTU. This packetization time
has to be larger than or equal to the frame size; at least one frame must fit
in the MTU.
</t>
</list>
</t>
<t>
The function has one output parameter: the packetization time which has
to be used for the transmission: "pt". It is the frame size of the codec
multiplied by the number of frames which have to be placed in the RTP
payload based on the provided 'ptime' and 'maxptime' values.
In the formula, the maximum packetization time related to the MTU is added
to the vector which contains one or more packetization time values. The
minimum value out of this set is determined.
In the 'ptime' set "p", which contains one or more values, every value
higher than the minimum value of the 'maxptime' set "mp" is replaced by
that minimum. Then the maximum value of this set
is determined and used to calculate the number of voice frames that
can be included with that packetization time.
</t>
<t>
Some examples are provided. The first example concerns G723
with a frame size of 30 ms. When the receiver has indicated a 'ptime' of
20 ms in the SDP, the RTP packets will be sent with one voice frame of 30 ms.<vspace/>
In another example, with a G711 codec, a default 'ptime' of 20 ms and
an indicated 'ptime' of 60 ms, 3 speech frames of 20 ms can be transmitted
in one RTP packet towards the receiver, which has indicated its ability to
receive RTP packets with a 60 ms packetization time.
</t>
<t>
This "pt" is used to allocate the PCM buffer size where the voice samples
from the synchronous network interface are stored before being passed
in RTP packets towards the packet oriented network.
</t>
<t>
When the 'ptime' and 'maxptime' are lower than the frame size of the codec, no
packetization time for the transmission can be determined. An invalid value
(=0) is indicated by the algorithm. In that case, the sender has to select
another codec with a voice frame size that is lower than or equal to the 'ptime'
or 'maxptime'.
</t>
</section>
<section title="Algorithm and examples">
<section title="Codec independent parameters">
<t>
<list style="symbols">
<t>
p = vector containing all provided packetization time values such as
static, dynamic, indicated values.
</t>
<t>
mp = vector containing all provided maximum packetization time values.
</t>
</list>
</t>
<t>
At least one "p" and one "mp" value have to be provided. When no static,
dynamic or indicated values are known, the frame size of the codec "fc"
can be used.
</t>
</section>
<section title="Codec dependent parameters">
<t>
<list style="symbols">
<t>
fc = frame size of the codec
</t>
<t>
mc = max packetization time which corresponds with the selected codec,
frame size, frame datarate and the network MTU (mc > fc).
</t>
</list>
</t>
</section>
<section title="Pseudocode algorithm">
<figure align="center" title="Pseudocode algorithm">
<artwork>
<![CDATA[
pt(p,mp,fc,mc) := |mp <- stack(mp,mc)
|if cols(p)>0
| for i e 0..cols(p)-1
| p(i)<-min(mp) if p(i)>min(mp)
|otherwise
| p<-min(mp) if p>min(mp)
|nf<-1 if (nf<-floor(max(p)/fc)<=0) & (min(mp)>fc)
|fc.nf
]]>
</artwork>
</figure>
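<t>
The pseudocode above can be rendered as a short Python sketch; scalar
inputs are treated as one-element vectors, and the function returns 0
when no valid packetization time exists:
</t>
<figure align="center" title="Python rendering of the pseudocode (illustrative sketch)">
<artwork>
<![CDATA[
# pt(p, mp, fc, mc): p and mp are the ptime/maxptime values (number or
# list), fc the codec frame size in ms, mc the MTU-related maximum.
def pt(p, mp, fc, mc):
    p = list(p) if isinstance(p, (list, tuple)) else [p]
    mp = (list(mp) if isinstance(mp, (list, tuple)) else [mp]) + [mc]
    cap = min(mp)                  # mp <- stack(mp, mc); take the minimum
    p = [min(v, cap) for v in p]   # clip each ptime to the smallest maxptime
    nf = max(p) // fc              # whole frames fitting in the chosen ptime
    if nf <= 0:                    # tolerant fallback: one frame if the
        nf = 1 if cap > fc else 0  # maxptime allows it, else invalid (0)
    return fc * nf
]]>
</artwork>
</figure>
<t>
For instance, pt(20, 60, 30, 100) yields 30 and pt(20, 20, 30, 100)
yields 0, matching the pseudocode examples below.
</t>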
</section>
<section title="Pseudocode examples">
<figure align="center" title="Pseudocode examples">
<artwork>
<![CDATA[
ptime:=20 maxptime:=60 pt(ptime,maxptime,30,100)=30
ptime:=20 maxptime:=20 pt(ptime,maxptime,30,100)=0
ptime:=30 maxptime:=30 pt(ptime,maxptime,30,100)=30
ptime:=60 maxptime:=80 pt(ptime,maxptime,30,100)=60
ptime:=20 maxptime:=60 pt(ptime,maxptime,20,100)=20
ptime:=60 maxptime:=80 pt(ptime,maxptime,20,100)=60
ptime:=70 maxptime:=200 pt(ptime,maxptime,20,100)=60
ptime:=120 maxptime:=60 pt(ptime,maxptime,20,100)=60
ptime:=120 maxptime:=200 pt(ptime,maxptime,10,100)=100
ptime:=[40,50,20] maxptime:=200 pt(ptime,maxptime,10,100)=50
ptime:=[40,50,20] maxptime:=[40,50,20] pt(ptime,maxptime,10,100)=20
ptime:=[120,40] maxptime:=[150,200,100] pt(ptime,maxptime,10,100)=100
]]>
</artwork>
</figure>
</section>
</section>
</section>
<section title="Receiving party RTP voice payload">
<t>
The receiver has to make use of the information in the RTP to determine
the codec type, the frame rate and the total packetization time of the
voice payload data.
</t>
<t>
For the receiver, two parts in the data flow can be considered. First,
the packet has to be received from the packet-oriented network. On the
other side, usually a synchronous network is provided where PCM voice
samples are used.
</t>
<t>
This proposal describes a method by which the receiver can handle unknown
packetization buffer requirements, which also allows in-band changes
of the codec data rate and packetization time.
</t>
<t>
As indicated, there are different sources for the 'maxptime' and it
is already described how a 'maxptime' value can be determined for
sending it in the SDP indication. The same 'maxptime' is used for
the allocation of the PCM buffer space where the voice samples
received in the RTP packets are stored before being transmitted
towards the synchronous network, after a de-jittering. An indication is given to the
DSP hardware about the actual packetization length obtained
from the received RTP packet. When the number of samples corresponding
to the packetization length has been stored in the buffer, an interrupt
is generated and the data is transmitted without having to wait for
another RTP packet to fill up the remaining space.
</t>
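<t>
The buffer handling described above can be sketched as follows; the
8000 Hz sampling rate and one octet per PCM sample (G.711-style) are
assumptions of this sketch, and the interrupt mechanics are omitted:
</t>
<figure align="center" title="Receiver PCM buffer sizing (illustrative sketch)">
<artwork>
<![CDATA[
# Sketch: the PCM buffer is allocated from 'maxptime', while the trigger
# point follows the actual packetization length of each received packet.
SAMPLE_RATE_HZ = 8000
OCTETS_PER_SAMPLE = 1     # G.711-style PCM, one octet per sample

def pcm_buffer_octets(maxptime_ms):
    # worst case announced in the SDP indication
    return SAMPLE_RATE_HZ * OCTETS_PER_SAMPLE * maxptime_ms // 1000

def samples_in_packet(ptime_ms):
    # actual packetization length obtained from the received RTP packet
    return SAMPLE_RATE_HZ * ptime_ms // 1000

buf_octets = pcm_buffer_octets(60)  # 480 octets for a 60 ms maxptime
trigger = samples_in_packet(20)     # data ready after 160 samples
]]>
</artwork>
</figure>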
</section>
<section title="Procedures for the SDP offer/answer">
<t>
This section contains the procedures related to the calculation of the
'ptime' and 'maxptime' attributes when they are used by protocols
following the SDP offer/answer model specified in <xref target="RFC3264"/>.
</t>
<section title="Procedures for an SDP offerer">
<t>
An SDP offerer may include a 'ptime' value and a 'maxptime' value in the
SDP. These values are merely an indication of the desired packetization
times.
The same formula as for the "pt" is used to determine
the 'ptime' in the SDP. When the media line contains different codec
formats, the 'ptime' value is determined for the first codec in the format
list (i.e. the codec with the highest priority).
For the 'maxptime', the minimum value of the 'maxptime' value set is used
in the SDP and normalized to an integer multiple of the frame size of
the first codec in the list.
</t>
<t>
It is up to the local policy of the device to determine which 'ptime/maxptime'
sources it will use in its calculation, e.g. it is possible to disallow
the treatment of a certain 'ptime'. This can easily
be done by including/excluding the 'ptime/maxptime' values from the
vectors used in the calculation.
</t>
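<t>
The offerer calculation can be sketched as below; the function name is
illustrative, the selection follows the packetization time algorithm of
the earlier sections, and the fallback to a single frame is a
simplification of this sketch:
</t>
<figure align="center" title="SDP offerer ptime/maxptime (illustrative sketch)">
<artwork>
<![CDATA[
# Sketch: 'ptime' computed for the first (highest-priority) codec;
# 'maxptime' is the minimum of the maxptime set, normalized down to an
# integer multiple of that codec's frame size.
def offer_ptime_maxptime(p, mp, fc_first, mc):
    cap = min(list(mp) + [mc])               # smallest maxptime value
    nf = min(max(p), cap) // fc_first        # whole frames for the ptime
    ptime = fc_first * max(1, nf)            # at least one frame (simplified)
    maxptime = fc_first * (cap // fc_first)  # frame-size multiple
    return ptime, maxptime

offer_ptime_maxptime([20, 60], [150], fc_first=30, mc=1800)  # (60, 150)
]]>
</artwork>
</figure>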
</section>
<section title="Procedures for an SDP answerer">
<t>
An SDP answerer that receives an SDP offer may also determine the
'ptime' and 'maxptime' values to be included in the SDP answer.
These parameters are determined in the same way as
by the offerer. However, the answerer can use another local policy to
determine which 'ptime/maxptime' sources will be used in the calculation.
</t>
</section>
</section>
<section title="Advantages">
<t>
The newly proposed method has the following advantages:<vspace/>
<vspace/>
<list style="numbers">
<t>
The basic idea of the 'ptime'-related RFCs is kept. No new parameters
have to be added, and no new interpretations or semantic reordering
has to be done.
</t>
<t>
The new method is strict in sending and tolerant in receiving.
It sends with the maximum allowed 'ptime' that is lower than or equal to
the minimal 'maxptime'.
</t>
<t>
Different sources for the 'ptime' and 'maxptime' are taken into account,
even more than in the various current proposals trying to
negotiate end-to-end.
</t>
<t>
A local policy in the end-device can easily be adopted and
adapted without requiring changes in the end-to-end protocol.
</t>
<t>
The algorithm makes use of all the provided information about
'ptime', 'maxptime', codec frame size, and MTU size, and proposes the
optimal 'ptime'.
</t>
<t>
The same algorithm is used at sending and receiving side, for
SDP indications and RTP packets.
</t>
<t>
The algorithm is small and straightforward. Codec-dependent
and codec-independent parameters are clearly indicated.
</t>
</list>
</t>
</section>
</section>
<section title="Conclusion and next steps">
<t>
This memo advocates the need for a standardized mechanism to
indicate the packetization time on a per-codec basis, allowing
the creator of SDP to include several payload formats in the
same media description line with different packetization
times.
</t>
<t>
This memo encourages discussion on the MMUSIC WG mailing list
in the IETF. The ultimate goal is to define a standard
mechanism that fulfils the requirements highlighted in this
memo.
</t>
<t>
The goal is to find a solution that does not require changes to
implementations that have followed the existing RFC guidelines and
are able to receive any packetization time.
</t>
</section>
<section title="Security Considerations" anchor="sec-security">
<t>
This memo discusses a problem statement and requirements. As
such, no protocol that can suffer attacks is defined.
</t>
</section>
<section title="IANA Considerations" anchor="sec-iana">
<t>
This document does not request IANA to take any action.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&RFC4566;
&RFC3264;
</references>
<references title="Informative References">
<reference anchor="ITU.V152">
<front>
<title>Procedures for supporting voice-band data over IP networks</title>
<author fullname="ITU-T">
<organization>ITU-T</organization>
</author>
<date year="2005" month="January" />
</front>
<seriesInfo name="ITU-T Recommendation" value="V.152"/>
<format type="pdf" target="http://www.itu.int/rec/dologin_pub.asp?lang=e&amp;id=T-REC-V.152-200501-I!!PDF-E&amp;type=items"/>
</reference>
<reference anchor="ITU.G114">
<front>
<title>One-way transmission time</title>
<author fullname="ITU-T">
<organization>ITU-T</organization>
</author>
<date year="2005" month="May" />
</front>
<seriesInfo name="ITU-T Recommendation" value="G.114"/>
<format type="pdf" target="http://www.itu.int/rec/dologin_pub.asp?lang=e&amp;id=T-REC-G.114-200305-I!!PDF-E&amp;type=items"/>
</reference>
<reference anchor="PKT.PKT-SP-EC-MGCP">
<front>
<title>PacketCable Network-Based Call Signaling Protocol Specification</title>
<author fullname="PacketCable">
<organization>PacketCable</organization>
</author>
<date year="2005" month="August" day="12" />
</front>
<seriesInfo name="PacketCable" value="PKT-SP-EC-MGCP-I11-050812"/>
<format type="pdf" target="http://www.packetcable.com/downloads/specs/PKT-SP-MGCP-I11-050812.pdf" />
</reference>
<reference anchor="PKT.PKT-SP-CODEC-MEDIA">
<front>
<title>Codec and Media Specification</title>
<author fullname="PacketCable">
<organization>PacketCable</organization>
</author>
<date year="2006" month="October" day="13" />
</front>
<seriesInfo name="PacketCable" value="PKT-SP-CODEC-MEDIA-I02-061013"/>
<format type="pdf" target="http://www.packetcable.com/downloads/specs/PKT-SP-CODEC-MEDIA-I02-061013.pdf" />
</reference>
<?rfc include='reference.I-D.ietf-mmusic-sdp-capability-negotiation'?>
&RFC3890;
&RFC3108;
&RFC4504;
&RFC3441;
&RFC3952;
&RFC4060;
&RFC1958;
&RFC2327;
&RFC3267;
&RFC3016;
&RFC3551;
</references>
<section title="Related RFCs for ptime">
<t>
Many RFCs reference the 'ptime/maxptime' attributes to
give definitions, recommendations, requirements, or default values.
</t>
<t>
<xref target="RFC4566"/> defines the 'ptime' and 'maxptime' as:
</t>
<t>
<list>
<t>
a=ptime:[packet time]
</t>
<t>
"This gives the length of time in milliseconds represented by
the media in a packet. This is probably only meaningful for
audio data, but may be used with other media types if it makes
sense. It should not be necessary to know ptime to decode RTP
or vat audio, and it is intended as a recommendation for the
encoding/packetization of audio. It is a media-level
attribute, and it is not dependent on charset."
</t>
</list>
</t>
<t>
<list>
<t>
a=maxptime:[maximum packet time]
</t>
<t>
"This gives the maximum amount of media that can be encapsulated
in each packet, expressed as time in milliseconds. The time
SHALL be calculated as the sum of the time the media present in
the packet represents. For frame-based codecs, the time SHOULD
be an integer multiple of the frame size. This attribute is
probably only meaningful for audio data, but may be used with
other media types if it makes sense. It is a media-level
attribute, and it is not dependent on charset."
</t>
</list>
</t>
<t>
<list>
<t>
"Additional encoding parameters MAY be defined in the future,
but codec-specific parameters SHOULD NOT be added. Parameters
added to an "a=rtpmap:" attribute SHOULD only be those required
for a session directory to make the choice of appropriate media
to participate in a session. Codec-specific parameters should
be added in other attributes (for example, "a=fmtp:")."
</t>
</list>
</t>
<t>
<list>
<t>
"Note: RTP audio formats typically do not include information
about the number of samples per packet. If a non-default (as
defined in the RTP Audio/Video Profile) packetization is
required, the 'ptime' attribute is used as given above."
</t>
</list>
</t>
<t>
Remark:<vspace/>
'maxptime' was introduced after the release of <xref target="RFC2327"/>,
and non-updated implementations will ignore this attribute.
</t>
<t>
<xref target="RFC3264">"SDP Offer/answer model"</xref>.<vspace/>
Describes the requirements for the 'ptime' for the SDP offerer and SDP answerer. <vspace/>
If the 'ptime' attribute is present for a stream, it indicates the
desired packetization interval that the offerer would like to
receive. The 'ptime' attribute MUST be greater than zero.<vspace/>
The answerer MAY include a non-zero 'ptime' attribute for any media
stream. This indicates the packetization interval that the answerer
would like to receive.
There is no requirement for the packetization interval to be the same
in each direction for a particular stream.
</t>
<t>
<xref target="RFC3890">"SDP Transport independent bandwidth modifier"</xref>.<vspace/>
Mentions the 'ptime' as a possible candidate for deriving the bandwidth, but
it should be avoided for that purpose; the use of another parameter is
proposed instead.
</t>
<t>
<xref target="RFC3108">"SDP Conversions for ATM bearer"</xref>.<vspace/>
It is not recommended to use the 'ptime' in ATM applications since packet
period information is provided with other parameters (e.g. the profile type and
number in the 'm' line, and the 'vsel', 'dsel' and 'fsel'
attributes). Also, for AAL1 applications, 'ptime' is not
applicable and should be flagged as an error. If used in AAL2
and AAL5 applications, 'ptime' should be consistent with the
rest of the SDP description.<vspace/>
The 'vsel', 'dsel' and 'fsel' attributes refer generically
to codecs. These can be used for service-specific codec negotiation and
assignment in non-ATM as well as for ATM applications.<vspace/>
The 'vsel' attribute indicates a prioritized list of one or more
3-tuples for voice service. Each 3-tuple indicates a codec, an
optional packet length and an optional packetization period. This
complements the 'm' line information and should be consistent with
it.<vspace/>
The 'vsel' attribute refers to all directions of a connection. For a
bidirectional connection, these are the forward and backward
directions. For a unidirectional connection, this can be either the
backward or forward direction.<vspace/>
The 'vsel' attribute is not meant to be used with bidirectional
connections that have asymmetric codec configurations described in a
single SDP descriptor. For these, the 'onewaySel' attribute
should be used.<vspace/>
The 'vsel' line is structured with an encodingName, a packetLength and a
packetTime.<vspace/>
The packetLength is a decimal integer
representation of the packet length in octets. The packetTime is a
decimal integer representation of the packetization interval in
microseconds. The parameters packetLength and packetTime can be
set to "-" when not needed. Also, the entire 'vsel' media attribute
line can be omitted when not needed.<vspace/>
</t>
<t>
<xref target="RFC4504">"SIP device requirements and configuration"</xref>.<vspace/>
In some cases, certain network architectures have constraints influencing
the end devices. The desired subset of codecs supported by the device
SHOULD be configurable along with the order of preference. Service
providers SHOULD have the possibility of plugging in their own preferred codecs.
The codec settings MAY include the packet length and
other parameters like silence suppression or comfort noise
generation.
The set of available codecs will be used in the codec negotiation
according to <xref target="RFC3264"/>.<vspace/>
Example: Codecs="speex/8000;ptime=20;cng=on,gsm;ptime=30"
</t>
<t>
<xref target="RFC3441">"MGCP ATM package"</xref>.<vspace/>
Packet time changed ("ptime(#)"):<vspace/>
If armed via an R:atm/ptime, a media gateway signals a packetization
period change through an O:atm/ptime. The decimal number, in
parentheses, is optional. It is the new packetization period in
milliseconds. In AAL2 applications, the pftrans event can be used to
cover packetization period changes (and codec changes).<vspace/>
Voice codec selection (vsel): This is a prioritized list of one or
more 3-tuples describing voice service. Each vsel 3-tuple indicates
a codec, an optional packet length and an optional packetization
period.
</t>
<t>
<xref target="RFC3952">"RTP payload for iLBC"</xref>.<vspace/>
The 'maxptime' SHOULD be a multiple of
the frame size. This attribute is probably only meaningful
for audio data, but may be used with other media types if it
makes sense. It is a media attribute, and is not dependent
on charset. Note that this attribute was introduced after
<xref target="RFC2327"/>, and non-updated implementations will ignore this
attribute.<vspace/>
The 'ptime' parameter cannot be used for the purpose of specifying the iLBC
operating mode, because for certain values it is impossible to distinguish
which mode is in use (e.g., with 'ptime=60' it is impossible to tell whether
a packet carries 2 frames of 30 ms or 3 frames of 20 ms, etc.).
</t>
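The iLBC ambiguity can be made concrete by enumerating the frame combinations that fill a given 'ptime'. The helper below is our own sketch (not from RFC 3952), using the 20 ms and 30 ms iLBC frame durations:

```python
# Hypothetical helper (not from the draft): enumerate how a 'ptime' value
# could be split into iLBC 20 ms and 30 ms frames.
def ilbc_framings(ptime_ms):
    """Return all (n_20ms_frames, n_30ms_frames) combinations filling ptime_ms."""
    combos = []
    for n20 in range(ptime_ms // 20 + 1):
        rest = ptime_ms - 20 * n20
        if rest % 30 == 0:
            combos.append((n20, rest // 30))
    return combos

print(ilbc_framings(60))  # ambiguous: 2 x 30 ms or 3 x 20 ms
print(ilbc_framings(40))  # unambiguous: 2 x 20 ms
```

For ptime=60 two distinct framings exist, which is exactly why 'ptime' cannot select the iLBC mode.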
<t>
<xref target="RFC4060">"RTP payload for distributed speech recognition"</xref>.<vspace/>
If 'maxptime' is not present, it is assumed to be 80 ms.<vspace/>
Note that, since the performance of most speech recognizers is
extremely sensitive to consecutive FP losses, if the user of the
payload format expects a high packet loss ratio for the session,
it MAY consider explicitly choosing a 'maxptime' value for the
session that is shorter than the default value.
</t>
</section>
<section title="Ad-hoc solutions for multiple ptime">
<t>
In recent years, different solutions have been proposed and
implemented with the goal of making the 'ptime' a function of the codec
instead of the media, which contains a list of codecs.
The list of solutions below indicates what kinds of proposals
have already been made to address the SDP interworking issues caused by
diverging implementations and RFC interpretations, without imposing
any preference for a particular solution.
</t>
<t>
In all these proposals, a semantic grouping of the codec specific
information is made by giving a new interpretation of the sequence
of the parameters or by providing new additional attributes.
</t>
<t>
REMARK:<vspace/>
All these methods violate the basic rule stated in the RFCs that
'ptime' and 'maxptime' are media specific and NOT codec specific.
They do not solve the interworking issues; instead, they make them worse
by introducing many new interpretations and implementations, as the following
examples indicate.
</t>
<t>
To avoid a further divergence, the implementation community is strongly
asking for a standardized solution.
</t>
<section title="Method 1">
<t>
Write the rtpmap first, followed by the 'ptime' when it is related to the
codec indicated by that rtpmap.
</t>
<t>
This method tries to correlate a 'ptime' with a specific codec, but many
existing implementations will suffer from such a proposal.
Some SDP encoder implementations first write the media line, followed by the
rtpmap lines, and then the other value attributes such as ptime and fmtp.
It is therefore difficult to know to which payload type the
'ptime' is related. In the following example, it is hard to tell whether ptime:20
relates to payload type 0, 4, or both, and the interpretation of this information
by the remote end is unknown. Implementations that are fully compliant with
the existing RFCs will suffer from such new proposals.
</t>
<figure align="center" title="Method 1">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 4 0
a=rtpmap:4 G723/8000
a=rtpmap:0 PCMU/8000
a=ptime:20
a=fmtp:4 bitrate=6400 ]]>
</artwork>
</figure>
</section>
<section title="Method 2">
<t>
Grouping of all codec specific information together.
</t>
<t>
Most implementers are in favor of this proposal, i.e. writing the value
attributes associated with an rtpmap immediately after it. But this
is also a new interpretation: normally, the 'ptime' refers to all payload types
indicated in the m-line. All existing implementations will also suffer from
such a method.
</t>
<figure align="center" title="Method 2">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 4 0
a=rtpmap:4 G723/8000
a=fmtp:4 bitrate=6400
a=rtpmap:0 PCMU/8000
a=ptime:20 ]]>
</artwork>
</figure>
</section>
<section title="Method 3">
<t>
Use the 'ptime' for every codec after its rtpmap definition. This makes the
'ptime' a required parameter for each payload type. It looks obvious, but it is
not allowed according to the existing RFCs. And will the same construct be used
for the 'maxptime'?
</t>
<figure align="center" title="Method 3">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 0 18 4
a=rtpmap:18 G729/8000
a=ptime:30
a=rtpmap:0 PCMU/8000
a=ptime:40
a=rtpmap:4 G723/8000
a=ptime:60 ]]>
</artwork>
</figure>
</section>
<section title="Method 4">
<t>
Create a new 'mptime' (multiple ptime) attribute that contains different
packetization times, each one mapped to its corresponding payload type
in the preceding 'm=' line.
What will happen when the other side sends an RTP stream with a different
packetization time? Should the elements in the 'mptime' attribute be interpreted
as required values or preferred values? With this approach, RFC-compliant
implementations are also affected and have to support the new
'mptime' attribute.
</t>
<figure align="center" title="Method 4">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 0 18 4
a=mptime 40 30 60 ]]>
</artwork>
</figure>
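The positional mapping this proposal implies can be sketched as follows; the decoder is hypothetical, written only to show how the values line up with the m-line:

```python
# Hypothetical decoder for the proposed 'mptime' attribute (Method 4):
# values map positionally onto the payload types of the preceding m-line.
def map_mptime(m_line, mptime_line):
    pts = m_line.split()[3:]                       # payload types
    tokens = mptime_line.replace(":", " ").split() # tolerate ':' or space
    return dict(zip(pts, (int(t) for t in tokens[1:])))

mapping = map_mptime("m=audio 1234 RTP/AVP 0 18 4", "a=mptime 40 30 60")
print(mapping)  # {'0': 40, '18': 30, '4': 60}
```

The mapping itself is trivial; the open questions named above (required vs. preferred, asymmetric streams) are what the attribute leaves undefined.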
</section>
<section title="Method 5">
<t>
Use of a new 'x-ptime' attribute. However, SDP parsers complained
about x- prefixed attributes, and it was suggested to rather use something
without the x- prefix (e.g. 'xptime'). This is just another encoding
of method 4 and also does not solve anything.
</t>
<figure align="center" title="Method 5">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 0 8
a=x-ptime 20 30 ]]>
</artwork>
</figure>
</section>
<section title="Method 6">
<t>
Use of different m-lines with one codec per m-line.<vspace/>
However, this is a misuse, because different m-lines mean different audio streams,
not different codec options. So this is certainly against the existing
SDP concept.
</t>
<figure align="center" title="Method 6">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:40
m=audio 1234 RTP/AVP 18
a=rtpmap:18 G729/8000
a=ptime:30
m=audio 1234 RTP/AVP 4
a=rtpmap:4 G723/8000
a=ptime:60 ]]>
</artwork>
</figure>
</section>
<section title="Method 7">
<t>
Use of the 'ptime' in the 'fmtp' attribute
</t>
<figure align="center" title="Method 7">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 4 18
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=yes;ptime=20
a=maxptime:40
a=rtpmap:4 G723/8000
a=fmtp:4 bitrate=6.3;annexa=yes;ptime=30
a=maxptime:60 ]]>
</artwork>
</figure>
</section>
<section title="Method 8">
<t>
Use of the 'vsel' parameter as done for ATM bearer connections.
The following example indicates a first preference of G.729 or G.729a (both are
interoperable) as the voice encoding scheme. A packet length of 10
octets and a packetization interval of 10 ms are associated with this
codec. G726-32 is the second preference stated in this line, with an
associated packet length of 40 octets and a packetization interval of
10 ms. If the packet length and packetization interval are intended
to be omitted, then this media attribute line contains '-'.
</t>
<figure align="center" title="Method 8">
<artwork>
<![CDATA[
a=vsel:G729 10 10000 G726-32 40 10000
a=vsel:G729 - - G726-32 - -]]>
</artwork>
</figure>
</section>
<section title="Method 9">
<t>
Use of the <xref target="ITU.V152"/> 'maxmptime' (maximum multiple ptime) attribute,
which contains different packetization times, each one mapped to its
corresponding payload type in the preceding 'm=' line, to indicate the
supported packetization period for each codec payload type. This
media-level attribute defines a list of maximum packetization time values,
expressed in milliseconds, that the endpoint is capable of using (sending and
receiving) for the connection. When the 'maxmptime' attribute is present, the
'ptime' shall be ignored according to the V.152 specification. When the
'maxmptime' is absent, the value of the 'ptime' attribute, if present, shall
be taken as indicating the packetization period for all codecs present in
the 'm=' line.<vspace/>
The specification does not state what has to be done when a 'maxptime' is also
present. Does the 'maxmptime' indicate the absolute maximum that can be used
as packetization time for a certain codec, or does it indicate the preferred
packetization time? It is open to many different
interpretations, certainly in interworking scenarios.
</t>
<figure align="center" title="Method 9">
<artwork>
<![CDATA[
m=audio 3456 RTP/AVP 18 0 13 96 98 99
a=maxmptime:10 10 - - 20 20]]>
</artwork>
</figure>
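The figure's '-' placeholders decode naturally as "no maximum given". The helper below is our own sketch of V.152-style positional decoding, not code from the specification:

```python
# Sketch of V.152-style 'maxmptime' decoding (helper name is ours):
# values map positionally onto the m-line payload types, and '-' means
# no maximum packetization time is stated for that payload type.
def map_maxmptime(m_line, attr_line):
    pts = m_line.split()[3:]
    vals = attr_line.split(":", 1)[1].split()
    return {pt: (None if v == "-" else int(v)) for pt, v in zip(pts, vals)}

mapping = map_maxmptime("m=audio 3456 RTP/AVP 18 0 13 96 98 99",
                        "a=maxmptime:10 10 - - 20 20")
print(mapping)
```

Note that the decoding says nothing about precedence against 'ptime' or 'maxptime', which is exactly the ambiguity described above.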
</section>
<section title="Method 10">
<t>
Use of the PacketCable 'mptime' attribute. See
<xref target="PKT.PKT-SP-CODEC-MEDIA">"Codec and Media Specification"</xref>, which
gives a note about the 'ptime': "<xref target="RFC4566"/> defines the 'maxptime'
SDP attribute and V.152 defines the 'maxmptime' SDP attribute. The precedence
of these attributes with respect to the 'ptime' and 'mptime' attributes is not
defined at this time."<vspace/>
</t>
<t>
Remark:<vspace/>
This method is the same as method 4. However, in the <xref target="PKT.PKT-SP-CODEC-MEDIA"/>
version from 9/2006, the 'mptime' was removed and the 'maxptime' was added.
PacketCable seems to be moving away from the need for multiple packetization
times as a function of the codec, treating it more as a maximum end-to-end
delay aspect.
</t>
</section>
<section title="Method 11">
<t>
Use of the SDP capability negotiation method. See <xref
target="I-D.ietf-mmusic-sdp-capability-negotiation"/>,
which describes how additional capabilities, such as the
different supported ptimes, can be negotiated. This
could be a possible solution in certain cases, but it
requires implementations that followed the basic
ptime/maxptime concept to adapt themselves to more
restrictive implementations. It also introduces additional
complexity by adding new parameters and new semantics.
</t>
</section>
</section>
<section title="Background info">
<t>
The "Session Initiation Protocol" (SIP) is used to set up media sessions.
In the SIP INVITE message, a "Session Description Protocol" (SDP) body is
used. In the SDP media description part, the m-line contains the media
type (e.g. audio), a transport port, a transport protocol (e.g. RTP/AVP)
and a media format description depending on the transport protocol.
For the transport protocols RTP/AVP and RTP/SAVP, the media format sub-field
can contain a list of RTP payload type numbers.<vspace/>
<vspace/>
Example:
m=audio 49232 RTP/AVP 8 0 4<vspace/>
<vspace/>
The "8 0 4" is the media format, indicating a list of possible codecs
indicated by static or dynamic numbers as defined in
<xref target="RFC3551">RFC 3551</xref>.
<vspace/>
In the above example, a list of static numbers is used:<vspace/>
8 = PCMA - G.711 PCM A-law<vspace/>
0 = PCMU - G.711 PCM u-law<vspace/>
4 = G723 - G.723.1<vspace/>
</t>
<t>
PCMA and PCMU are "sample-based" codecs, while G723 is a "frame-based"
codec. All of them use a sampling rate of 8 kHz, i.e. 0.125 ms/sample.
PCMA and PCMU encode each sample in 8 bits using the A-law or u-law
logarithmic companding, resulting in a data rate of 64 kbps.
G723, however, does not operate on single samples, but on a number of
samples combined in a "frame". As such, higher compression rates
can be achieved. The G723 codec uses 240 voice samples, corresponding to a
30 ms speech frame duration. The codec compresses the data in the frame and
encodes it in 192 or 160 bits, resulting in a data rate of 6.4 or 5.3 kbps.
G723 gives the advantage of a lower bit rate at the cost of increased
voice delay: 30 ms instead of 0.125 ms.
</t>
<t>
The "International Telecommunication Union" (ITU) gives some guidelines
on acceptable end-to-end delays in <xref target="ITU.G114"/>. A delay of up to
150 ms is acceptable. Between 150 and 400 ms, there is an impact on the
perceived voice quality, but it is still acceptable. Above 400 ms it becomes
unacceptable. Echo cancellers are required for delays above 25 ms.
</t>
<t>
In "time division multiplexing" (TDM) networks, the coding delay is the
biggest contribution to the end-to-end delay. In packet-oriented networks,
however, packetization delays add to the end-to-end delay and can become an
issue. Each packet carries a certain header overhead that contributes to the
bandwidth usage, i.e. the total required bit rate. The more data packed
together, the smaller the influence of the header on the total packet and
the higher the transmission efficiency. However, combining more data in a
packet increases the end-to-end delay. As such, there is a trade-off between
bandwidth usage, amount of packet processing and end-to-end delay: packing
more data per packet improves the transmission efficiency but reduces the
quality due to the increased end-to-end delay.
</t>
<t>
An example is indicated in the following table, where G.711 (A- or u-law) is
compared with G.723.1 for different packetization delays. The headers
consist of:
</t>
<t>
<list style="symbols">
<t>RTP header: 12 bytes.</t>
<t>UDP header: 8 bytes.</t>
<t>IPv4 header: 20 bytes.</t>
<t>MAC layer: 14 bytes.</t>
<t>CRC: 4 bytes.</t>
<t>Start frame + preamble: 20 bytes.</t>
</list>
</t>
<figure align="center" title="Packet delay & Throughput">
<artwork>
<![CDATA[
Codec Packet Datarate Voice Headers Tot Payload Throughput
Delay Payload
ms kbps bytes bytes bytes % kbps
-----------------------------------------------------------------
G711 0.125 64 1 78 79 1.3 5056.0
2.5 64 20 78 98 20.4 313.6
5 64 40 78 118 33.9 188.8
10 64 80 78 158 50.6 126.4
20 64 160 78 238 67.2 95.2
30 64 240 78 318 75.5 84.8
90 64 720 78 798 90.2 70.9
200 64 1600 78 1678 95.4 67.1
-----------------------------------------------------------------
G723.1 30 6.4 24 78 102 23.5 27.2
60 6.4 48 78 126 38.1 16.8
90 6.4 72 78 150 48.0 13.3
150 6.4 120 78 198 60.6 10.6
300 6.4 240 78 318 75.5 8.5
-----------------------------------------------------------------
]]>
</artwork>
</figure>
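The table values follow directly from the listed header sizes. The helper below is our own sketch, assuming the 78-byte total header (12+8+20+14+4+20) given above:

```python
# Reproducing rows of the table above (assumption: 78-byte total header
# from the listed RTP/UDP/IPv4/MAC/CRC/preamble sizes).
HEADER_BYTES = 78

def packet_stats(rate_kbps, ptime_ms):
    payload = int(rate_kbps * ptime_ms / 8)   # kbit/s * ms / 8 = voice bytes
    total = payload + HEADER_BYTES            # bytes on the wire per packet
    efficiency = 100.0 * payload / total      # voice payload share, percent
    throughput_kbps = total * 8 / ptime_ms    # bits per ms equals kbps
    return payload, total, round(efficiency, 1), round(throughput_kbps, 1)

print(packet_stats(64, 30))   # G.711 at 30 ms  -> (240, 318, 75.5, 84.8)
print(packet_stats(6.4, 30))  # G.723.1 at 30 ms -> (24, 102, 23.5, 27.2)
```

Both results match the corresponding table rows, which makes the efficiency/delay trade-off easy to explore for other packetization times.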
<t>
For the same packetization delay of 30 ms, the data rate of G.723.1
is 10 times lower than that of G.711, but the payload efficiency is reduced
from 75.5% to 23.5%. G.723.1 only reaches the same efficiency at a
packetization delay of 300 ms! While the packet efficiency is lower,
the required bit rate on the link for G.723.1 is reduced from 84.8 kbps
to 27.2 kbps. And when several frames are packed together, e.g. 3 frames
of 30 ms, the packetization delay becomes 90 ms, resulting in fewer
packets to be routed and processed and an improved throughput data rate
of 13.3 kbps.
</t>
<t>
The frame sizes used by the different codecs are 0.125 ms (G.711), 2.5 ms
(G728), 10 ms (G729), 20 ms (G726, GSM, GSM-EFR, QCELP, LPC) and 30 ms (G723).
All of them have a default 'ptime' of 20 ms, except G723,
which has a default 'ptime' of 30 ms.
</t>
<t>
The media description part can contain additional attribute lines which
complement or modify the media description line: 'ptime' and 'maxptime'
attributes.
</t>
<t>
Example:<vspace/>
m=audio 49232 RTP/AVP 8 0 4<vspace/>
a=ptime:20<vspace/>
a=maxptime:60
</t>
<t>
<xref target="RFC3551">RFC 3551</xref> defines the default
packetization time for each codec in Table 1. PCMA and
PCMU have 20 ms as default 'ptime', and G723 has a 30 ms
default 'ptime'.
</t>
<t>
When, as in the example above, the 'ptime' value is 20, it is a wrong
value for the G723 codec, which has a frame size of 30 ms
and as such requires a minimal packetization delay of 30 ms. This causes many
different interworking problems between systems due to different
interpretations of the relevant RFCs, resulting in bad voice quality or call
setup failures.
</t>
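This consistency check can be sketched in a few lines. The frame durations come from the text above; the helper itself is our own illustration:

```python
# Illustrative check (frame sizes from the text; helper name is ours):
# a usable ptime must cover at least one codec frame, and for frame-based
# codecs it must be an integer multiple of the frame duration.
FRAME_MS = {"PCMU": 0.125, "PCMA": 0.125, "G729": 10, "G723": 30}

def ptime_ok(codec, ptime_ms):
    frame = FRAME_MS[codec]
    return ptime_ms >= frame and ptime_ms / frame == int(ptime_ms / frame)

print(ptime_ok("PCMU", 20))  # True: 160 samples fit exactly
print(ptime_ok("G723", 20))  # False: below the 30 ms frame size
print(ptime_ok("G723", 60))  # True: exactly two frames
```

A media-level ptime:20 therefore validates for PCMU/PCMA but fails for G723 in the same m-line, which is the core of the problem statement.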
<t>
In some APIs, the following functions are provided to interface with the RTP
and codec hardware layer for encoding voice samples, based on a certain codec,
in RTP packets.
<list style="numbers">
<t>
Set the encoding parameters such as codec type, payload type (for RTP)
and packetization rate. Mostly these are configuration parameters of
the device: either they are provided manually, based on guidelines from the
network architecture, or they are provided dynamically and automatically.
</t>
<t>
Next, a transmit buffer has to be allocated. The lower layer provides a function
to calculate the required buffer size as a function of the encoding parameters.
</t>
<t>
A transmit buffer is allocated with the indicated size (as a minimum) by the
application layer.
</t>
<t>
The synchronous voice data which has to be encoded is passed to the
hardware layer which encodes the data (codec and packetization) into the
provided buffer.
</t>
<t>
The buffer with the RTP data is returned to the application, which
can send it out on the host network interface towards the packet network.
</t>
</list>
</t>
<t>
For the receiving part, required API functions are:
<list style="numbers">
<t>
Set the required decoding parameters such as codec type, payload type,
initial latency in frames and jitter buffer info. Note that the packetization
time is not required, because every receiver should be able to handle up
to 200 ms, which is in fact the MTU size for which the receiver should
have the required resources.
</t>
<t>
The required buffer size to be allocated is requested from
the hardware. This size is calculated based on the size of the RTP
header and the maximum allowed payload of 200 ms.
</t>
</list>
</t>
<t>
The application can, however, decide to allocate smaller buffers if the
worst case for the expected RTP packetization time is known, i.e. by making
use of the 'maxptime' attribute.
</t>
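The receive-buffer sizing described above can be sketched as follows; the constants and helper name are our own assumptions, with 200 ms as the worst-case window when 'maxptime' is absent:

```python
# Sketch of the receive-buffer sizing described above (constants and helper
# are assumptions, not a real API): without 'maxptime' the receiver must
# provision for 200 ms of payload plus the RTP header.
RTP_HEADER_BYTES = 12
DEFAULT_MAX_MS = 200

def rx_buffer_bytes(rate_kbps, maxptime_ms=None):
    window = maxptime_ms if maxptime_ms is not None else DEFAULT_MAX_MS
    payload = int(rate_kbps * window / 8)  # kbit/s * ms / 8 = payload bytes
    return RTP_HEADER_BYTES + payload

print(rx_buffer_bytes(64))      # G.711, no maxptime: 200 ms worst case
print(rx_buffer_bytes(64, 60))  # with a=maxptime:60 the buffer shrinks
```

For 64 kbps G.711 the worst-case buffer is 1612 bytes, dropping to 492 bytes when the offer carries a=maxptime:60, which is exactly the saving the 'maxptime' attribute enables.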
<t>
Most implementations use a general-purpose host processor (GPP)
in combination with a digital signal processor (DSP) for the codec/packetization part.
The host processor interfaces with the packet-oriented world, while
the DSP interfaces with a real-time synchronous network, mostly with a
special buffer handling mechanism to avoid excessive interrupt handling.
</t>
<t>
Consider a VoIP call using G711 A-law or u-law. Most hardware
solutions use a DSP to handle the real-time processing. Most of these
DSPs have special built-in hardware functionality for PCM samples. The DSP can be
configured for A-law or u-law and for a specific clock rate. For every transmitted
or received PCM sample, the hardware can generate an interrupt, but this is of
course a big burden on the system performance. As such, the DSPs also
provide a mechanism based on an internal buffer to avoid this interrupt
burden: an interrupt is only generated when the buffer is empty or full.
The initialization of this DSP hardware for a specific call is done at
SIP INVITE SDP negotiation time.
</t>
<figure align="center" title="Example">
<artwork>
<![CDATA[
m=audio 1234 RTP/AVP 0 8 4
a=ptime:30 ]]>
</artwork>
</figure>
<t>
So, if this SDP contains PT=0,8,4 (i.e. G711u, G711A, G723) and a 'ptime'
of 30, then this 'ptime' can be used to initialize the DSP port with a buffer
size for 30 ms of PCM voice samples. When the "offerer" sends an RTP packet
for G711u or G711A using the default value of 20 ms, the DSP PCM port
waits for 30 ms before sending out the buffer.
Because only 20 ms were received in the RTP packet, it has to wait for
the next RTP packet before being able to transmit the buffer, causing a
serious degradation of the voice quality.
</t>
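The arithmetic behind this underrun is simple; the numbers below restate the scenario (8 kHz PCM, buffer sized from a=ptime:30, sender using the 20 ms default):

```python
# Arithmetic behind the scenario above: a DSP port buffered for 30 ms of
# 8 kHz PCM needs 240 samples before playout, but a default 20 ms G.711
# packet delivers only 160.
SAMPLES_PER_MS = 8  # 8 kHz sampling rate

buffer_samples = 30 * SAMPLES_PER_MS  # DSP configured from a=ptime:30
packet_samples = 20 * SAMPLES_PER_MS  # sender uses the 20 ms default
shortfall = buffer_samples - packet_samples
print(shortfall)  # 80 samples (10 ms) missing: must wait for the next packet
```

Every playout interval therefore stalls 10 ms waiting for part of the following packet, which accumulates into the audible quality degradation described above.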
<t>
This can be a problem in DSP-based solutions in media gateways between the
IP and PSTN worlds, but also in end-user internet access devices (IAD) that
provide the possibility to attach a normal analog voice phone via an RJ11 jack
(ATA, analog telephone adapter).
</t>
<t>
For this use case, certain implementers are arguing for a complete SDP
negotiation mechanism. But this is in conflict with the SDP paradigm,
where the 'ptime' is an optional parameter and is bound not to a specific
codec but to the media itself.
Different proprietary solutions are now implemented, causing even more
interworking issues.
</t>
</section>
</back>
</rfc>