<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE rfc SYSTEM "http://xml.resource.org/authoring/rfc2629.dtd" [
  <!ENTITY RFC4566 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml">
  <!ENTITY RFC3264 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml">
  <!ENTITY RFC3890 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3890.xml">
  <!ENTITY RFC3108 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3108.xml">
  <!ENTITY RFC4504 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4504.xml">
  <!ENTITY RFC3441 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3441.xml">
  <!ENTITY RFC3952 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3952.xml">
  <!ENTITY RFC4060 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4060.xml">
  <!ENTITY RFC1958 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1958.xml">
  <!ENTITY RFC2327 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2327.xml">
  <!ENTITY RFC3267 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3267.xml">
  <!ENTITY RFC3016 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3016.xml">
  <!ENTITY RFC3551 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml">
]>
<?xml-stylesheet type="text/xsl" href="http://xml.resource.org/authoring/rfc2629.xslt" ?>

<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc tocdepth="4" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="yes" ?>

<rfc category="info" docName="draft-garcia-mmusic-multiple-ptimes-problem-03.txt" ipr="full3978">
  <front>
    <title abbrev="Multiple ptime in SDP">
      Multiple Packetization Times in the Session Description Protocol (SDP):
      Problem Statement, Requirements &amp; Solution
    </title>
    <author initials="M." surname="Willekens" fullname="Marc Willekens">
      <organization>Devoteam Telecom &amp; Media</organization>
      <address>
        <postal>
          <street></street>
          <city>Herentals</city>
          <region>Antwerp</region>
          <code>2200</code>
          <country>Belgium</country>
        </postal>
        <email>marc.willekens@devoteam.com</email>
      </address>
    </author>
    <author initials="M." surname="Garcia-Martin" fullname="Miguel A. Garcia-Martin">
      <organization>Ericsson</organization>
      <address>
        <postal>
          <street>Via de los Poblados 13</street>
          <city>Madrid</city>
          <region></region>
          <code>28033</code>
          <country>Spain</country>
        </postal>
        <email>Miguel.A.Garcia@ericsson.com</email>
      </address>
    </author>
    <author initials="P." surname="Xu" fullname="Peili Xu">
      <organization>Huawei Technologies</organization>
      <address>
        <postal>
          <street>Bantian</street>
          <city>Longgang</city>
          <region>Shenzhen</region>
          <code>518129</code>
          <country>China</country>
        </postal>
        <email>xupeili@huawei.com</email>
      </address>
    </author>
    <date day="12" month="July" year="2008" />
    <area>RAI</area>
    <workgroup>MMUSIC Working Group</workgroup>
    <keyword>SDP</keyword>
    <keyword>ptime</keyword>
    <keyword>maxptime</keyword>
    <abstract>
      <t>
        This document provides a problem statement and requirements with respect to the
        presence of a single packetization time (ptime/maxptime) attribute in SDP media
        descriptions that contain several media formats (audio codecs).
        Furthermore, a Best Current Practice (BCP) solution for the use of 'ptime/maxptime'
        is proposed, based on 'static', 'dynamic' and 'indicated' values.
        Some methods already proposed as ad hoc solutions, together with background
        information, are included in an appendix.
      </t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction">
      <t>
        <xref target="RFC4566">"Session Description Protocol" (SDP)</xref>
        provides a protocol to describe multimedia sessions
        for the purposes of session announcement, session invitation,
        and other forms of multimedia session initiation. A session
        description in SDP includes the session name and purpose, the
        media comprising the session, information needed to receive the
        media (addresses, ports, formats, etc.) and some other
        information.
      </t>
      <t>
        In the SDP media description part, the m-line contains the
        media type (e.g. audio), a transport port, a transport
        protocol (e.g. RTP/AVP) and a media format description which
        depends on the transport protocol.
      </t>
      <t>
        For the transport protocol RTP/AVP or RTP/SAVP, the media
        format sub-field can contain a list of RTP payload type
        numbers.
        See <xref target="RFC3551">
        "RTP Profile for Audio and Video Conferences with Minimal Control"</xref>,
        Table 4.<vspace/>
        For example: "m=audio 49232 RTP/AVP 3 15 18" indicates the audio encoders 
        GSM, G728, and G729.
      </t>
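      <t>
        As an illustration only, the following Python sketch splits an m-line
        into its fields and maps the static payload type numbers to encoding
        names. The lookup table is a subset of Table 4 of RFC 3551, and the
        function name parse_m_line is hypothetical; payload types outside that
        subset (including dynamic ones, which need an 'rtpmap' attribute) are
        simply reported as "unknown".
      </t>
      <figure>
        <artwork><![CDATA[
```python
# Subset of the static audio payload types of RFC 3551, Table 4
STATIC_AUDIO_PT = {0: "PCMU", 3: "GSM", 4: "G723", 8: "PCMA",
                   9: "G722", 15: "G728", 18: "G729"}


def parse_m_line(line):
    """Split an SDP m-line into media, port, protocol and formats.

    Assumes a plain port (no "/<number of ports>" suffix).
    """
    if not line.startswith("m="):
        raise ValueError("not an m-line: %r" % line)
    media, port, proto, *fmts = line[2:].split()
    formats = [int(pt) for pt in fmts]
    # Anything outside the subset above is reported as "unknown".
    names = [STATIC_AUDIO_PT.get(pt, "unknown") for pt in formats]
    return {"media": media, "port": int(port), "proto": proto,
            "formats": formats, "codecs": names}


if __name__ == "__main__":
    desc = parse_m_line("m=audio 49232 RTP/AVP 3 15 18")
    print(desc["codecs"])  # ['GSM', 'G728', 'G729']
```
]]></artwork>
      </figure>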
      <t>
        Further, the media description part can contain additional
        attribute lines that complement or modify the media
        description line. Of interest for this memo are the 'ptime'
        and 'maxptime' attributes.
        According to <xref target="RFC4566"/>, the 'ptime' attribute gives
        the length of time in milliseconds represented by the media in
        a packet, and the 'maxptime' gives the maximum amount of media
        that can be encapsulated in each packet, expressed as time in
        milliseconds. These attributes modify the whole media
        description line, which can contain an extensive list of
        payload types. In other words, these attributes are not
        specific to a given codec.
      </t>
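      <t>
        To make the scope of the attribute concrete, the following Python
        sketch (a hypothetical helper, assuming a single media description
        whose first line is the m-line) attaches the media-level 'ptime' to
        every payload type listed in the m-line. Nothing in the attribute ties
        the value to one particular codec.
      </t>
      <figure>
        <artwork><![CDATA[
```python
def ptime_per_payload_type(media_description):
    """Map each payload type of one media description to its ptime.

    'a=ptime' is a media-level attribute, so the same value (or None
    when it is absent) is attached to every payload type.
    """
    lines = media_description.strip().splitlines()
    # m=<media> <port> <proto> <fmt> ...: formats start at field 4
    formats = [int(pt) for pt in lines[0].split()[3:]]
    ptime = None
    for line in lines[1:]:
        if line.startswith("a=ptime:"):
            ptime = float(line[len("a=ptime:"):])
    return {pt: ptime for pt in formats}


if __name__ == "__main__":
    sdp = "\n".join([
        "m=audio 49232 RTP/AVP 3 15 18",
        "a=ptime:20",
        "a=maxptime:40",
    ])
    print(ptime_per_payload_type(sdp))
    # {3: 20.0, 15: 20.0, 18: 20.0}
```
]]></artwork>
      </figure>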
      <t>
        <xref target="RFC4566"/> also indicates that it
        should not be necessary to know 'ptime' to decode RTP or vat
        audio since the 'ptime' attribute is intended as a
        recommendation for the encoding/packetization of
        audio. However, as noted above, the existing 'ptime' attribute
        defines the desired packetization time for all the payload
        types defined in the corresponding media description line.
      </t>
      <t>
        End-devices can sometimes be configured with different codecs and for
        each codec a different packetization time can be
        indicated. However, there is no clear way to exchange this
        type of information between different user agents and this can
        result in lower voice quality, network problems or performance
        problems in the end-devices.
      </t>
    </section>
    <section title="Problem Statement">
      <t>
        The packetization time is an important parameter which helps
        in reducing the packet overhead. Many voice codecs define a
        certain frame length used to determine the coded voice filter
        parameters and try to find a certain trade-off between the
        perceived voice quality, measured by the Mean Opinion Score
        (MOS), and the required bitrate. When a packet
        oriented network is used for the transfer, the packet header
        induces an additional overhead.  As such, it makes sense to
        combine different voice frame data in one packet, up to
        a Maximum Transmission Unit (MTU), to find a good balance
        between the required network resources, end-device resources
        and the perceived voice quality, which is influenced by packet loss,
        packet delay and jitter. When the packet size decreases, the
        bandwidth efficiency is reduced. When the packet size
        increases, the packetization delay can have a negative impact
        on the perceived voice quality.
      </t>
      <t>
        The <xref target="RFC3551">"RTP Profile for Audio and Video
        Conferences with Minimal Control"</xref>, Table 1, indicates
        the frame size and default packetization time for different
        codecs. The G728 codec has a frame size of 2.5 ms/frame and
        a default packetization time of 20 ms/packet. For the G729
        codec, the frame size is 10 ms/frame and the default
        packetization time is 20 ms/packet.
      </t>
      <t>
        As more and more audio streaming traffic is carried over
        IP networks, the quality perceived by the end user should
        be no worse than that of classical telephony services. For VoIP
        service providers, it is very important that endpoints receive
        audio with the best possible codec and packetization time. In
        particular, the packetization time depends on the selected
        codec for the audio communication and other factors, such as
        the Maximum Transmission Unit (MTU) of the network and the
        type of access network technology.
      </t>
      <t>
        As such, the packetization time is clearly a function of the
        codec and the network access technology. During the
        establishment of a new session or a modification of an existing
        session, an endpoint should be able to express its preferences
        with respect to the packetization time for each codec. This would
        mean that the creator of the SDP prefers the remote endpoint to
        use a certain packetization time when sending media with that
        codec.
      </t>
      <t>
        <xref target="RFC4566">SDP</xref> provides the means for
        expressing a packetization time that affects all the payload
        types declared in the media description line. Consequently, there is no
        means to indicate the desired packetization time on a per
        payload type basis. Implementations have been using
        proprietary mechanisms for indicating the packetization time
        per payload type, leading to interoperability problems.
      </t>
      <t>
        One of these mechanisms is the 'maxmptime' attribute, defined in
        <xref target="ITU.V152"/>, which indicates the supported packetization
        period for all codec payload types.
      </t>
      <t>
        Another one is the 'mptime' attribute, defined by
        <xref target="PKT.PKT-SP-EC-MGCP">"PacketCable"</xref>, which indicates a
        list of packetization period values the endpoint is capable of
        using (sending and receiving) for this connection.
      </t>
      <t>
        While both have similar semantics, they are not interoperable with each
        other, creating a nightmare for the implementer who happens to be
        defining a common SDP stack for different applications.
      </t>
      <t>
        A few RTP payload format descriptions, such as
        <xref target="RFC3267"/>, <xref target="RFC3016"/>, and <xref target="RFC3952"/>,
        indicate that the packetization time for such payload should
        be indicated in the 'ptime' attribute in SDP. However, since
        the 'ptime' attribute affects all payload formats included
        in the media description line, it would not be possible to
        create a media description line that contains all the
        mentioned payload formats and different packetization
        times. The workarounds range from using a single
        packetization time for all payload types to creating a
        separate media description line for each payload type.
      </t>
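      <t>
        The second workaround can be sketched as follows: one media description
        per payload type, each carrying its own 'ptime'. This is a hypothetical
        Python helper; spacing the ports by 2 (keeping each RTP port even, with
        RTCP on the next odd port) is an assumption of the sketch. Note that
        separate m-lines describe separate media streams rather than
        alternative codecs for one stream.
      </t>
      <figure>
        <artwork><![CDATA[
```python
def one_m_line_per_codec(first_port, codecs):
    """Build one audio media description per (payload type, ptime).

    codecs: list of (payload_type, ptime_ms) tuples. Ports are spaced
    by 2 so each RTP port stays even, with RTCP on port + 1.
    """
    sections = []
    for i, (pt, ptime) in enumerate(codecs):
        port = first_port + 2 * i
        sections.append("m=audio %d RTP/AVP %d\r\n" % (port, pt))
        sections.append("a=ptime:%d\r\n" % ptime)
    return "".join(sections)


if __name__ == "__main__":
    # G728 (PT 15) at 20 ms and G729 (PT 18) at 40 ms
    print(one_m_line_per_codec(49232, [(15, 20), (18, 40)]))
```
]]></artwork>
      </figure>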
      <t>
        In short, if several payload formats are
        offered in the same media description line in SDP, there is no
        way to indicate different packetization times per payload format.
      </t>


    </section>
    <section title="Requirements">
      <t>
        The main requirement comes from the implementation and media gateway
        community that uses hardware-based solutions, e.g., DSP or FPGA
        implementations with silicon constraints on the amount of buffer space.
      </t>
      <t>
        Some implementations use the ptime/codec information for QoS budget
        calculations.
        When the packetization time is known for a codec with a certain
        frame size and frame data rate, the throughput efficiency
        can be calculated.
      </t>
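      <t>
        As a worked sketch of such a calculation, the following Python function
        computes the payload share of each packet and the resulting IP-level bit
        rate. It assumes an uncompressed 40-octet IPv4/UDP/RTP header (20 + 8 +
        12 octets) and ignores link-layer overhead; the function name and the
        G729 figures used in the example (10 ms frames of 10 octets, i.e.,
        8 kbps) are illustrative.
      </t>
      <figure>
        <artwork><![CDATA[
```python
# IPv4 (20) + UDP (8) + RTP fixed header (12); assumes no compression
HEADER_OCTETS = 20 + 8 + 12


def packet_stats(frame_ms, octets_per_frame, ptime_ms,
                 header_octets=HEADER_OCTETS):
    """Return (payload efficiency, IP bit rate in kbit/s)."""
    if ptime_ms % frame_ms:
        raise ValueError("ptime is not a multiple of the frame size")
    frames = ptime_ms // frame_ms
    payload = frames * octets_per_frame
    packet = payload + header_octets
    # octets * 8 / ms gives bits per ms, i.e. kbit/s
    return payload / packet, packet * 8 / ptime_ms


if __name__ == "__main__":
    for ptime in (20, 40, 60):
        eff, kbps = packet_stats(10, 10, ptime)  # G729
        print(ptime, round(eff, 3), round(kbps, 1))
```
]]></artwork>
      </figure>
      <t>
        For G729 this gives an efficiency of about 33% and 24 kbit/s at 20 ms,
        versus 60% and about 13.3 kbit/s at 60 ms, at the cost of 40 ms of
        additional packetization delay.
      </t>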
      <t>
        Currently, the 'ptime' and 'maxptime' attributes are optional and serve only
        as an indication. When these parameters are used for resource reservation and
        hardware initialization, a value negotiated between the SDP offerer and the
        SDP answerer can become a requirement.
      </t>
      <t>
        There can be different sources for the 'ptime/maxptime' values, e.g., the
        RTP/AVP profile, the end-user device configuration, the network
        architecture, or the receiver.
      </t>
      <t>
        The codec and the 'ptime/maxptime' values can differ between the upstream and
        downstream directions.
      </t>
    </section>
    <section title="BCP solution proposal">
      <t>
        The basic idea of this proposal is to keep the packetization time
        independent of the codec and to consider the main purpose of the 'ptime'
        as follows.
      </t>
      <t>
        The 'ptime' attribute indicates the packetization time, which is an
        important factor in the end-to-end delay of the voice signal, as
        indicated in the previous sections.
        It is defined as a media-level attribute in SDP.
      </t>
      <t>
        The only requirements on the use of 'ptime' or 'maxptime' are that the
        total size of the packet fits in the MTU and that the packetization
        time is an integer multiple of the codec frame size.
      </t>
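      <t>
        Both requirements can be checked mechanically. The following Python
        sketch is a hypothetical helper; it assumes an uncompressed 40-octet
        IPv4/UDP/RTP header and, as in the examples of this memo, treats G711
        as having 20 ms frames (160 octets at 64 kbps).
      </t>
      <figure>
        <artwork><![CDATA[
```python
def ptime_is_valid(ptime_ms, frame_ms, octets_per_frame, mtu,
                   header_octets=40):
    """True if ptime is a whole number of frames and the packet fits."""
    if ptime_ms <= 0 or ptime_ms % frame_ms != 0:
        return False
    frames = ptime_ms // frame_ms
    return header_octets + frames * octets_per_frame <= mtu


if __name__ == "__main__":
    # G711: 20 ms frames of 160 octets, Ethernet-sized MTU of 1500
    for ptime in (30, 60, 180, 200):
        print(ptime, ptime_is_valid(ptime, 20, 160, 1500))
```
]]></artwork>
      </figure>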
      <t>
        If the same session requires different kinds of streams, e.g., in a
        conference where some users have a narrowband connection and others
        have a broadband connection, different media can be defined and
        allocated to different ports.
        In that case, different m-lines can be defined, each with its own 'ptime'
        and 'maxptime'.
      </t>
      <t>
        The IETF RFCs do not specify what to do when the 'ptime' or 'maxptime'
        in the SDP is not an integer multiple of the frame size. Should the default
        'ptime' be used, or the largest integer multiple of the frame size that is
        lower than the indicated 'ptime'? When a 'maxptime' is indicated, should
        the value be as close as possible to the indicated 'ptime' but lower than
        the 'maxptime'?
      </t>
      <t>
        This proposal follows the IETF architectural principle
        "be strict when sending and tolerant when receiving"
        <xref target="RFC1958"/>.
      </t>
      <section title="Sending party RTP voice payload">
        <t>
          The transmitting side of a connection needs to know the packetization
          time it can use for the RTP payload data, i.e. how many speech frames
          it can include in the RTP packet. A trade-off between the packetization
          delay and the transmission efficiency has to be made and this can be a
          static or a dynamic process which involves all elements in the
          end-to-end chain.
        </t>
        <t>
          Three different sources for determining the packetization time are
          therefore considered.
        </t>
        <section title="ptime(s) - Static">
          <t>
            Values statically provisioned in the end-device: default values or
            manually configured values.
          </t>
          <t>
            An end-device implementation must know:<vspace/>
            <list style="numbers">
              <t>
                all the codec specific parameters such as:
                <list style="numbers">
                  <t>Sampling rate (e.g. 8000 Hz).</t>
                  <t>Number of channels (e.g. 1).</t>
                  <t>Frame size in ms (e.g. 20 ms).</t>
                  <t>Number of encoded bits per frame (e.g. 264 bits).</t>
                  <t>
                    Number of required octets per frame (e.g., G.723.1 at its higher
                    rate has 189 bits of encoded data per 30 ms frame, resulting in a
                    data rate of 189 bits/30 ms = 6.3 kbps. However, the packet data is
                    octet-aligned, so 3 bits are added, which results in 24 octets/frame,
                    i.e., a data rate of 6.4 kbps).
                  </t>
                </list>
              </t>
              <t>
                system specific parameters such as:
                <list style="numbers">
                  <t>MTU supported by the network and by the protocol stack of the end-device.</t>
                  <t>Packetization time (e.g. 60 ms) and the maximum packetization time (e.g. 150 ms).</t>
                  <t>Supported codecs.</t>
                </list>
              </t>
            </list>
          </t>
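          <t>
            The octet-alignment arithmetic in the G.723.1 example can be checked
            with a few lines of Python. The helper names are hypothetical; a rate
            expressed in bits per millisecond equals kbit/s.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def octets_per_frame(bits_per_frame):
    """Round the coded bits of one frame up to whole octets."""
    return (bits_per_frame + 7) // 8


def rate_kbps(bits, frame_ms):
    """Bits per millisecond, which is the same as kbit/s."""
    return bits / frame_ms


if __name__ == "__main__":
    octets = octets_per_frame(189)              # G.723.1, 30 ms frame
    print(octets, 8 * octets - 189)             # 24 octets, 3 pad bits
    print(round(rate_kbps(189, 30), 1))         # 6.3
    print(round(rate_kbps(8 * octets, 30), 1))  # 6.4
```
]]></artwork>
          </figure>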
        </section>
        <section title="ptime(d) - Dynamic">
          <t>
            Values provided dynamically by the network architecture.
          </t>
          <t>
            The network can indicate, as part of the device management, its supported
            codecs, the 'ptime' and 'maxptime'. These values can also change based on the
            dynamic behavior of the network. During heavy load on the network,
            the network architecture can decide to use lower rate codecs
            (for bandwidth issues) and/or higher packetization times
            (for packet processing performance).
            This dynamic change can be done before, during or after a session.
          </t>
        </section>
        <section title="ptime(i) - Indicated">
          <t>
            Values indicated by the receiving side.
          </t>
          <t>
            The receiving side can indicate in the SDP the 'ptime' and 'maxptime'
            values it wants to receive. These values are optional, apply to the media
            independently of the codec, and are only a hint to the sending party.
          </t>
        </section>
        <section title="ptime/maxptime algorithm">
          <t>
            Instead of indicating a 'ptime/maxptime' on a per-codec basis as done in
            many different proposals, this draft proposes to make use of the 'ptime/maxptime'
            as a common parameter coming from different sources:<vspace/>
            ptime(s), ptime(d), ptime(i) and maxptime(s), maxptime(d), maxptime(i).
          </t>
          <t>
            Depending on the available 'ptime' and 'maxptime' information, the
            packetization time "pt" used for the transmission is determined by the
            following algorithm.
            <list style="numbers">
              <t>
                Determine the codec to be used (e.g., G723), based on local
                information or optional network information.
              </t>
              <t>
                Determine the coding data rate (e.g., 6.4 kbps), based on local
                information or optional network information.
              </t>
              <t>
                Based on the codec, the frame size in ms is known: fc = frame size
                of the codec.
              </t>
              <t>
                Determine the MTU size that can be used. Based on this value, the
                codec frame size and the data rate, the maximum packetization time
                allowed for this codec, "mc", can be calculated.
              </t>
              <t>
                Check the ptime(s, d, i) and maxptime(s, d, i, mc) values.
                Cap every value in the set ptime(s, d, i) at the minimum value of the
                set maxptime(s, d, i, mc), and take the maximum of the resulting set.
              </t>
              <t>
                Normalize this 'ptime' value to the largest integer multiple of the
                frame size that is lower than or equal to this value (and thus lower
                than or equal to "mc"), but not lower than the codec frame size.
              </t>
            </list>
          </t>
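          <t>
            Step 4 can be sketched in Python as follows. The function name is
            hypothetical, and the 40-octet IPv4/UDP/RTP header is an assumption
            (header compression, IPv6 or tunnelling would change it).
          </t>
          <figure>
            <artwork><![CDATA[
```python
def max_ptime_for_mtu(mtu, frame_ms, octets_per_frame,
                      header_octets=40):
    """Return "mc": the longest whole-frame packetization time whose
    packet still fits in the MTU, or 0 if not even one frame fits."""
    frames = (mtu - header_octets) // octets_per_frame
    return max(frames, 0) * frame_ms


if __name__ == "__main__":
    print(max_ptime_for_mtu(1500, 30, 24))   # G.723.1, 24-octet frames
    print(max_ptime_for_mtu(1500, 20, 160))  # G711, 20 ms frames
    print(max_ptime_for_mtu(500, 20, 160))   # G711, small MTU
```
]]></artwork>
          </figure>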
          <t>
            Remark:<vspace/>
            It is up to the local policy of the device to determine which
            'ptime/maxptime' sources it uses in its calculation; e.g., it is possible
            to disallow the use of the 'ptime' indicated by the other side.
            This can easily be done by including or excluding the 'ptime/maxptime'
            values from the vectors used in the calculation.
          </t>
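          <t>
            One possible way to express such a local policy is sketched below in
            Python: each source is tagged 's', 'd' or 'i', and only the tags
            allowed by the policy contribute to the "p" and "mp" vectors. The
            tags and the function name are assumptions of this sketch; the
            fallback to the codec frame size follows the description of the
            input parameters below.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def build_vectors(ptimes, maxptimes, allowed, fc):
    """Build the "p" and "mp" vectors from the allowed sources.

    ptimes / maxptimes map a source tag ('s', 'd', 'i') to a value in
    ms, or to None when that source provides nothing. When no allowed
    source provides a value, the codec frame size fc is used.
    """
    p = [v for k, v in ptimes.items()
         if k in allowed and v is not None]
    mp = [v for k, v in maxptimes.items()
          if k in allowed and v is not None]
    return (p or [fc]), (mp or [fc])


if __name__ == "__main__":
    ptimes = {"s": 20, "d": None, "i": 60}
    maxptimes = {"s": 150, "d": 100, "i": 200}
    # Policy that ignores the values indicated by the other side
    print(build_vectors(ptimes, maxptimes, {"s", "d"}, 20))
    # ([20], [150, 100])
```
]]></artwork>
          </figure>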
          <t>
            The formula to calculate the packetization time for the transmission of
            voice packets in the RTP payload data has the following input parameters:
          </t>
          <t>
            <list style="numbers">
              <t>
                The packetization time made available from different sources.
                When no value is known, the frame size of the voice codec is used.
              </t>
              <t>
                The maximum packetization time values made available from different
                sources. When no value is known, the frame size of the voice codec is used.
              </t>
              <t>
                The frame size of the codec.
              </t>
              <t>
                The packetization time corresponding to the selected codec,
                frame size, frame data rate and the network MTU. This packetization
                time has to be larger than or equal to the frame size, i.e., at least
                one frame has to fit in the MTU.
              </t>
            </list>
          </t>
          <t>
            The function has one output parameter: the packetization time "pt" to be
            used for the transmission. It is the frame size of the codec multiplied by
            the number of frames to be placed in the RTP payload, based on the provided
            'ptime' and 'maxptime' values.
            In the formula, the maximum packetization time related to the MTU ("mc") is
            added to the 'maxptime' vector "mp", which contains one or more values, and
            the minimum value of this set is determined.
            In the 'ptime' set "p", which also contains one or more values, every value
            higher than this minimum is replaced by the minimum. Then the maximum value
            of this set is determined and used to calculate the number of voice frames
            that can be included with that packetization time.
          </t>
          <t>
            Some examples follow. The first example concerns G723, which has a frame
            size of 30 ms. When the receiver has indicated a 'ptime' of 20 ms in the
            SDP, the RTP packets are sent with one voice frame of 30 ms.<vspace/>
            In another example, with a G711 codec, a default 'ptime' of 20 ms and an
            indicated 'ptime' of 60 ms, three speech frames of 20 ms can be transmitted
            in one RTP packet to the receiver, which has indicated its ability to
            receive RTP packets with a 60 ms packetization time.
          </t>
          <t>
            This "pt" is used to allocate the PCM buffer size where the voice samples
            from the synchronous network interface are stored before being passed
            in RTP packets towards the packet oriented network.
          </t>
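          <t>
            For example, the buffer size follows directly from "pt" and the
            sampling parameters. The Python sketch below assumes 16-bit linear PCM
            samples (2 octets per sample) on the synchronous interface; the
            function name and that sample width are assumptions.
          </t>
          <figure>
            <artwork><![CDATA[
```python
def pcm_buffer_octets(pt_ms, sample_rate_hz=8000, channels=1,
                      octets_per_sample=2):
    """Octets needed to hold pt_ms of linear PCM for one packet."""
    samples = pt_ms * sample_rate_hz // 1000
    return samples * channels * octets_per_sample


if __name__ == "__main__":
    print(pcm_buffer_octets(30))  # one G723 frame: 480 octets
    print(pcm_buffer_octets(60))  # three 20 ms frames: 960 octets
```
]]></artwork>
          </figure>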
          <t>
            When the 'ptime' and 'maxptime' values are lower than the frame size of
            the codec, no packetization time for the transmission can be determined,
            and the algorithm returns the invalid value 0. In that case, the sender
            has to select another codec with a voice frame size lower than or equal
            to the 'ptime' or 'maxptime'.
          </t>
        </section>
        <section title="Algorithm and examples">
          <section title="Codec independent parameters">
            <t>
              <list style="symbols">
                <t>
                  p = vector containing all provided packetization time values such as
                  static, dynamic, indicated values.
                </t>
                <t>
                  mp = vector containing all provided maximum packetization time values.
                </t>
              </list>
            </t>
            <t>
              At least one "p" value and one "mp" value have to be provided. When no static,
              dynamic or indicated values are known, the frame size of the codec "fc"
              can be used.
            </t>
          </section>
          <section title="Codec dependent parameters">
            <t>
              <list style="symbols">
                <t>
                  fc = frame size of the codec
                </t>
                <t>
                  mc = max packetization time which corresponds to the selected codec,
                  frame size, frame data rate and the network MTU (mc >= fc).
                </t>
              </list>
            </t>
          </section>
          <section title="Pseudocode algorithm">
            <figure align="center" title="Pseudocode algorithm">
              <artwork>
                <![CDATA[
pt(p,mp,fc,mc) := |mp <- stack(mp,mc)
                  |if cols(p)>0
                  |  for i in 0..cols(p)-1
                  |     p(i) <- min(mp) if p(i)>min(mp)
                  |otherwise
                  |     p <- min(mp) if p>min(mp)
                  |nf <- floor(max(p)/fc)
                  |nf <- 1 if (nf<=0) & (min(mp)>=fc)
                  |fc*nf
]]>
              </artwork>
            </figure>
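            <t>
              The pseudocode above can be rendered in Python as the following
              sketch (a non-authoritative translation; "p" and "mp" may each be a
              single value or a list). The comparison with the frame size uses
              '>=', so that one frame is still sent when the smallest 'maxptime'
              equals the frame size; this reading follows the 'lower or equal'
              wording of the algorithm steps.
            </t>
            <figure>
              <artwork><![CDATA[
```python
import math


def pt(p, mp, fc, mc):
    """Packetization time "pt" in ms, or 0 if no valid value exists.

    p:  'ptime' value or list of values (ms)
    mp: 'maxptime' value or list of values (ms)
    fc: frame size of the codec (ms)
    mc: max packetization time allowed by the MTU (ms)
    """
    p = list(p) if isinstance(p, (list, tuple)) else [p]
    mp = list(mp) if isinstance(mp, (list, tuple)) else [mp]
    limit = min(mp + [mc])        # mp <- stack(mp, mc); min(mp)
    capped = [min(v, limit) for v in p]
    nf = math.floor(max(capped) / fc)
    # Fall back to one frame if a single frame respects every limit.
    if nf <= 0 and limit >= fc:
        nf = 1
    return fc * nf


if __name__ == "__main__":
    print(pt(20, 60, 30, 100))                      # 30
    print(pt(20, 20, 30, 100))                      # 0
    print(pt(70, 200, 20, 100))                     # 60
    print(pt([120, 40], [150, 200, 100], 10, 100))  # 100
```
]]></artwork>
            </figure>
            <t>
              With this sketch, all twelve examples in the following figure produce
              the listed values. When "pt" returns 0 for the preferred codec, the
              sender moves on to another codec, as described above.
            </t>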
          </section>
          <section title="Pseudocode examples">
            <figure align="center" title="Pseudocode examples">
              <artwork>
                <![CDATA[
ptime:=20         maxptime:=60          pt(ptime,maxptime,30,100)=30
ptime:=20         maxptime:=20          pt(ptime,maxptime,30,100)=0
ptime:=30         maxptime:=30          pt(ptime,maxptime,30,100)=30
ptime:=60         maxptime:=80          pt(ptime,maxptime,30,100)=60

ptime:=20         maxptime:=60          pt(ptime,maxptime,20,100)=20
ptime:=60         maxptime:=80          pt(ptime,maxptime,20,100)=60
ptime:=70         maxptime:=200         pt(ptime,maxptime,20,100)=60
ptime:=120        maxptime:=60          pt(ptime,maxptime,20,100)=60

ptime:=120        maxptime:=200         pt(ptime,maxptime,10,100)=100
ptime:=[40,50,20] maxptime:=200         pt(ptime,maxptime,10,100)=50
ptime:=[40,50,20] maxptime:=[40,50,20]  pt(ptime,maxptime,10,100)=20
ptime:=[120,40] maxptime:=[150,200,100] pt(ptime,maxptime,10,100)=100
]]>
              </artwork>
            </figure>
          </section>
        </section>
      </section>
      <section title="Receiving party RTP voice payload">
        <t>
          The receiver has to use the information in the received RTP packets to
          determine the codec type, the frame rate and the total packetization time
          of the voice payload data.
        </t>
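        <t>
          As a sketch of where this information comes from, the following Python
          function parses the RTP header as laid out in RFC 3550, Section 5.1, to
          obtain the payload type and the payload length. For a constant-rate
          codec such as G711 (8000 samples/s, one octet per sample), the
          packetization time then follows from the payload length. The function
          names are hypothetical, and the sketch does not check the payload type
          against the negotiated formats.
        </t>
        <figure>
          <artwork><![CDATA[
```python
import struct


def rtp_payload_info(packet):
    """Return (payload type, payload length in octets) of an RTP
    packet, skipping CSRCs, header extension and padding."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    cc = packet[0] & 0x0F
    offset = 12 + 4 * cc                 # fixed header + CSRC list
    if packet[0] & 0x10:                 # X bit: header extension
        (words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * words
    end = len(packet)
    if packet[0] & 0x20:                 # P bit: last octet = count
        end -= packet[-1]
    return packet[1] & 0x7F, end - offset


def g711_ptime_ms(payload_octets):
    """G711: 8000 samples/s, one octet each, so 8 octets per ms."""
    return payload_octets / 8


if __name__ == "__main__":
    # PCMU (PT 0) packet: 12-octet header + 160 octets of payload
    pkt = bytes([0x80, 0x00]) + bytes(10) + bytes(160)
    pt_num, length = rtp_payload_info(pkt)
    print(pt_num, length, g711_ptime_ms(length))  # 0 160 20.0
```
]]></artwork>
        </figure>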
        <t>
          Two parts of the data flow can be considered at the receiver. First,
          the packet has to be received from the packet-oriented network. On the
          other side, there is usually a synchronous network where PCM voice
          samples are used.
        </t>
        <t>
          This proposal describes how the receiver can handle unknown
          packetization buffer requirements, in a way that also allows in-band
          changes of the codec data rate and packetization time.
        </t>
        <t>
          As indicated, there are different sources for the 'maxptime' and it
          is already described how a 'maxptime' value can be determined for
          sending it in the SDP indication. The same 'maxptime' is used for
          the allocation of the PCM buffer space where the voice samples
          received in the RTP packets are stored before being transmitted
          towards the synchronous network, after de-jittering. The DSP hardware is
          informed of the actual packetization length obtained from the received RTP
          packet. When the number of samples stored in the buffer corresponds to the
          packetization length, an interrupt is generated and the data is transmitted
          without waiting for another RTP packet to fill up the remaining space.
        </t>
      </section>
      <section title="Procedures for the SDP offer/answer">
        <t>
          This section contains the procedures related to the calculation of the
          'ptime' and 'maxptime' attributes when they are used by protocols
          following the SDP offer/answer model specified in <xref target="RFC3264"/>.
        </t>
        <section title="Procedures for an SDP offerer">
        <t>
          An SDP offerer may include a 'ptime' value and a 'maxptime' value in the
          SDP. These values are merely an indication of the desired packetization
          times. 
          The same formula as for the "pt" is used to determine
          the 'ptime' in the SDP. When the media line contains different codec
          formats, the 'ptime' value is determined for the first codec in the format
          list (i.e. the codec with the highest priority).
          For the 'maxptime', the minimum value of the 'maxptime' value set is used
          in the SDP and normalized to an integer multiple of the frame size of
          the first codec in the list.
        </t>
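        <t>
          The 'maxptime' rule for the offerer can be sketched in Python as
          follows. This is a hypothetical helper; the 'ptime' itself is obtained
          with the same "pt" formula described earlier and is not repeated here.
        </t>
        <figure>
          <artwork><![CDATA[
```python
def offer_maxptime(maxptimes, first_codec_frame_ms):
    """'maxptime' for the SDP offer: the minimum of the value set,
    rounded down to a multiple of the first codec's frame size.
    Returns 0 if that minimum is shorter than one frame."""
    limit = min(maxptimes)
    return (limit // first_codec_frame_ms) * first_codec_frame_ms


if __name__ == "__main__":
    print(offer_maxptime([150, 200, 100], 30))  # G723 first: 90
    print(offer_maxptime([150, 200, 100], 20))  # G711 first: 100
```
]]></artwork>
        </figure>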
        <t>
          It is up to the local policy of the device to determine which
          'ptime/maxptime' sources it uses in its calculation; e.g., it is possible
          to disallow the use of a certain 'ptime'. This can easily
          be done by including or excluding the 'ptime/maxptime' values from the
          vectors used in the calculation.
        </t>
        </section>
        <section title="Procedures for an SDP answerer">
          <t>
            An SDP answerer that receives an SDP offer may also determine the
            'ptime' and 'maxptime' values to be included in the SDP answer.
            These parameters are determined in the same way as done
            by the offerer. However, the answerer can use another local policy to
            determine which 'ptime/maxptime' sources are used in the calculation.
          </t>
        </section>
      </section>
      <section title="Advantages">
        <t>
          The proposed method has the following advantages:<vspace/>
          <vspace/>
          <list style="numbers">
            <t>
              The basic idea of the 'ptime'-related RFCs is kept. No new parameters
              have to be added, and no new interpretations or semantic reordering
              are needed.
            </t>
            <t>
              The new method is strict in sending and tolerant in receiving.
              It sends with the maximum allowed 'ptime' that is lower than or equal
              to the minimum 'maxptime'.
            </t>
            <t>
              Different sources for the 'ptime' and 'maxptime' are taken into account,
              more than in the current proposals that try to negotiate these values
              end-to-end.
            </t>
            <t>
              A local policy in the end-device can easily be adopted and
              adapted without requiring changes in the end-to-end protocol.
            </t>
            <t>
              The algorithm makes use of all the provided information about
              'ptime', 'maxptime', codec frame size and MTU size, and proposes the
              optimal 'ptime'.
            </t>
            <t>
              The same algorithm is used at sending and receiving side, for
              SDP indications and RTP packets.
            </t>
            <t>
              The algorithm is small and straightforward. Codec-dependent
              and codec-independent parameters are clearly identified.
            </t>
          </list>
        </t>
      </section>
    </section>
    <section title="Conclusion and next steps">
      <t>
        This memo argues for the need for a standardized mechanism to
        indicate the packetization time on a per-codec basis, allowing
        the creator of an SDP description to include several payload
        formats with different packetization times in the same media
        description.
      </t>
      <t>
        This memo encourages discussion on the MMUSIC WG mailing list
        in the IETF. The ultimate goal is to define a standard
        mechanism that fulfils the requirements highlighted in this
        memo.
      </t>
      <t>
        The goal is to find a solution that does not require changes to
        implementations that follow the existing RFC guidelines and
        are able to receive any packetization time.
      </t>
    </section>
    <section title="Security Considerations" anchor="sec-security">
      <t>
        This memo discusses a problem statement and requirements. As
        such, no protocol that can suffer attacks is defined.
      </t>
    </section>
    <section title="IANA Considerations" anchor="sec-iana">
      <t>
        This document does not request IANA to take any action.
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      &RFC4566;
      &RFC3264;
    </references>
    <references title="Informative References">
      <reference anchor="ITU.V152">
        <front>
          <title>Procedures for supporting voice-band data over IP networks</title>
          <author fullname="ITU-T">
            <organization>ITU-T</organization>
          </author>
          <date year="2005" month="January" />
        </front>
        <seriesInfo name="ITU-T Recommendation" value="V.152"/>
        <format type="pdf" target="http://www.itu.int/rec/dologin_pub.asp?lang=e&amp;id=T-REC-V.152-200501-I!!PDF-E&amp;type=items"/>
      </reference>
      <reference anchor="ITU.G114">
        <front>
          <title>One-way transmission time</title>
          <author fullname="ITU-T">
            <organization>ITU-T</organization>
          </author>
          <date year="2003" month="May" />
        </front>
        <seriesInfo name="ITU-T Recommendation" value="G.114"/>
        <format type="pdf" target="http://www.itu.int/rec/dologin_pub.asp?lang=e&amp;id=T-REC-G.114-200305-I!!PDF-E&amp;type=items"/>
      </reference>
      <reference anchor="PKT.PKT-SP-EC-MGCP">
        <front>
          <title>PacketCable Network-Based Call Signaling Protocol Specification</title>
          <author fullname="PacketCable">
            <organization>PacketCable</organization>
          </author>
          <date year="2005" month="August" day="12" />
        </front>
        <seriesInfo name="PacketCable" value="PKT-SP-EC-MGCP-I11-050812"/>
        <format type="pdf" target="http://www.packetcable.com/downloads/specs/PKT-SP-MGCP-I11-050812.pdf" />
      </reference>

      <reference anchor="PKT.PKT-SP-CODEC-MEDIA">
        <front>
          <title>Codec and Media Specification</title>
          <author fullname="PacketCable">
            <organization>PacketCable</organization>
          </author>
          <date year="2006" month="October" day="13" />
        </front>
        <seriesInfo name="PacketCable" value="PKT-SP-CODEC-MEDIA-I02-061013"/>
        <format type="pdf" target="http://www.packetcable.com/downloads/specs/PKT-SP-CODEC-MEDIA-I02-061013.pdf" />
      </reference>

      <?rfc include='reference.I-D.ietf-mmusic-sdp-capability-negotiation'?>
      &RFC3890;
      &RFC3108;
      &RFC4504;
      &RFC3441;
      &RFC3952;
      &RFC4060;
      &RFC1958;
      &RFC2327;
      &RFC3267;
      &RFC3016;
      &RFC3551;
    </references>
    <section title="Related RFCs for ptime">
      <t>
        Many RFCs refer to the 'ptime' and 'maxptime' attributes to give
        definitions, recommendations, requirements or default values.
      </t>
      <t>
        <xref target="RFC4566"/> defines the 'ptime' and 'maxptime' as:
      </t>
      <t>
        <list>
          <t>
            a=ptime:[packet time]
          </t>
          <t>
            "This gives the length of time in milliseconds represented by
            the media in a packet. This is probably only meaningful for
            audio data, but may be used with other media types if it makes
            sense. It should not be necessary to know ptime to decode RTP
            or vat audio, and it is intended as a recommendation for the
            encoding/packetization of audio. It is a media-level
            attribute, and it is not dependent on charset."
          </t>
        </list>
      </t>
      <t>
        <list>
          <t>
            a=maxptime:[maximum packet time]
          </t>
          <t>
            "This gives the maximum amount of media that can be encapsulated
            in each packet, expressed as time in milliseconds. The time
            SHALL be calculated as the sum of the time the media present in
            the packet represents. For frame-based codecs, the time SHOULD
            be an integer multiple of the frame size. This attribute is
            probably only meaningful for audio data, but may be used with
            other media types if it makes sense. It is a media-level
            attribute, and it is not dependent on charset."
          </t>
        </list>
      </t>
      <t>
        <list>
          <t>
            "Additional encoding parameters MAY be defined in the future,
            but codec-specific parameters SHOULD NOT be added. Parameters
            added to an "a=rtpmap:" attribute SHOULD only be those required
            for a session directory to make the choice of appropriate media
            to participate in a session. Codec-specific parameters should
            be added in other attributes (for example, "a=fmtp:")."
          </t>
        </list>
      </t>
      <t>
        <list>
          <t>
            "Note: RTP audio formats typically do not include information
            about the number of samples per packet. If a non-default (as
            defined in the RTP Audio/Video Profile) packetization is
            required, the 'ptime' attribute is used as given above."
          </t>
        </list>
      </t>

      <t>
        Remark:<vspace/>
        'maxptime' was introduced after the release of <xref target="RFC2327"/>,
        and non-updated implementations will ignore this attribute.
      </t>
 
      <t>
        <xref target="RFC3264">"SDP Offer/answer model"</xref>.<vspace/>
        Describes the requirements on the 'ptime' for the SDP offerer and SDP answerer. <vspace/>

        If the 'ptime' attribute is present for a stream, it indicates the
        desired packetization interval that the offerer would like to
        receive. The 'ptime' attribute MUST be greater than zero.<vspace/>

        The answerer MAY include a non-zero 'ptime' attribute for any media
        stream. This indicates the packetization interval that the answerer
        would like to receive.
        There is no requirement for the packetization interval to be the same
        in each direction for a particular stream.
      </t>
      <t>
        <xref target="RFC3890">"SDP Transport independent bandwidth modifier"</xref>.<vspace/>
        Mentions 'ptime' as a possible source for the packet rate needed in
        bandwidth calculations, but recommends against using it for that
        purpose and proposes another parameter, the 'maxprate' attribute, instead.
      </t>
      <t>
        <xref target="RFC3108">"SDP Conversions for ATM bearer"</xref>.<vspace/>
        It is not recommended to use the 'ptime' in ATM applications since packet
        period information is provided with other parameters (e.g. the profile type and
        number in the 'm' line, and the 'vsel', 'dsel' and 'fsel'
        attributes).  Also, for AAL1 applications, 'ptime' is not
        applicable and should be flagged as an error.  If used in AAL2
        and AAL5 applications, 'ptime' should be consistent with the
        rest of the SDP description.<vspace/>

        The 'vsel', 'dsel' and 'fsel' attributes refer generically
        to codecs.  These can be used for service-specific codec negotiation and
        assignment in non-ATM as well as for ATM applications.<vspace/>

        The 'vsel' attribute indicates a prioritized list of one or more
        3-tuples for voice service.  Each 3-tuple indicates a codec, an
        optional packet length and an optional packetization period.  This
        complements the 'm' line information and should be consistent with
        it.<vspace/>

        The 'vsel' attribute refers to all directions of a connection.  For a
        bidirectional connection, these are the forward and backward
        directions.  For a unidirectional connection, this can be either the
        backward or forward direction.<vspace/>

        The 'vsel' attribute is not meant to be used with bidirectional
        connections that have asymmetric codec configurations described in a
        single SDP descriptor.  For these, the 'onewaySel' attribute
        should be used.<vspace/>

        The 'vsel' line is structured with an encodingName, a packetLength and a
        packetTime.<vspace/>

        The packetLength is a decimal integer
        representation of the packet length in octets.  The packetTime is a
        decimal integer representation of the packetization interval in
        microseconds.  The parameters packetLength and packetTime can be
        set to "-" when not needed.  Also, the entire 'vsel' media attribute
        line can be omitted when not needed.<vspace/>
      </t>
      <t>
        <xref target="RFC4504">"SIP device requirements and configuration"</xref>.<vspace/>
        In some cases, certain network architectures have constraints influencing
        the end devices. The desired subset of codecs supported by the device
        SHOULD be configurable along with the order of preference.  Service
        providers SHOULD have the possibility of plugging in their own preferred codecs.
        The codec settings MAY include the packet length and
        other parameters like silence suppression or comfort noise
        generation.

        The set of available codecs will be used in the codec negotiation
        according to <xref target="RFC3264"/>.<vspace/>
        Example: Codecs="speex/8000;ptime=20;cng=on,gsm;ptime=30"
      </t>
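      <t>
        The configuration string in this example attaches a 'ptime' to each
        codec, which is exactly the per-codec information that a single
        media-level 'a=ptime' cannot carry. No grammar for the string is given
        here, so the following Python sketch assumes that entries are separated
        by ',' and that each entry is a codec name followed by ';'-separated
        key=value settings. The function name is hypothetical.
      </t>
      <figure>
        <artwork>
          <![CDATA[
```python
from typing import Dict, List, Tuple


def parse_codecs(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a codec configuration string into (codec, settings) pairs.

    Assumed syntax: entries separated by ',', each one a codec name
    followed by ';'-separated key=value settings. The list order is
    kept, since it expresses the order of preference.
    """
    codecs = []
    for entry in value.split(","):
        name, *settings = entry.strip().split(";")
        params: Dict[str, str] = {}
        for item in settings:
            key, _, val = item.partition("=")
            params[key.strip()] = val.strip()
        codecs.append((name.strip(), params))
    return codecs


if __name__ == "__main__":
    config = "speex/8000;ptime=20;cng=on,gsm;ptime=30"
    for codec, params in parse_codecs(config):
        print(codec, params.get("ptime"))
```
]]>
        </artwork>
      </figure>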
      <t>
        <xref target="RFC3441">"MGCP ATM package"</xref>.<vspace/>
        Packet time changed ("ptime(#)"):<vspace/>

        If armed via an R:atm/ptime, a media gateway signals a packetization
        period change through an O:atm/ptime.  The decimal number, in
        parentheses, is optional.  It is the new packetization period in
        milliseconds.  In AAL2 applications, the pftrans event can be used to
        cover packetization period changes (and codec changes).<vspace/>

        Voice codec selection (vsel): This is a prioritized list of one or
        more 3-tuples describing voice service.  Each vsel 3-tuple indicates
        a codec, an optional packet length and an optional packetization
        period.
      </t>
      <t>
        <xref target="RFC3952">"RTP payload for iLBC"</xref>.<vspace/>
        The 'maxptime' SHOULD be a multiple of
        the frame size.  This attribute is probably only meaningful
        for audio data, but may be used with other media types if it
        makes sense.  It is a media attribute, and is not dependent
        on charset.  Note that this attribute was introduced after
        <xref target="RFC2327"/>, and non-updated implementations will ignore this
        attribute.<vspace/>

        Parameter 'ptime' cannot be used for the purpose of specifying the iLBC
        operating mode, due to the fact that for certain values it will be
        impossible to distinguish which mode is about to be used (e.g., when
        'ptime=60', it would be impossible to distinguish whether the packet is
        carrying 2 frames of 30 ms or 3 frames of 20 ms, etc.).
      </t>
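      <t>
        The ambiguity can be made concrete with a short Python sketch that
        lists, for a given 'ptime', every iLBC frame size (20 ms or 30 ms) that
        divides it evenly. Whenever more than one pair is returned, the 'ptime'
        alone cannot identify the mode. The function name is hypothetical.
      </t>
      <figure>
        <artwork>
          <![CDATA[
```python
from typing import List, Tuple

ILBC_FRAME_MS = (20, 30)  # iLBC operates with 20 ms or 30 ms frames


def ilbc_interpretations(ptime_ms: int) -> List[Tuple[int, int]]:
    """Return every (frame_ms, frames_per_packet) pair matching ptime."""
    return [(frame, ptime_ms // frame)
            for frame in ILBC_FRAME_MS if ptime_ms % frame == 0]


if __name__ == "__main__":
    for ptime in (20, 30, 40, 60, 90, 120):
        print(ptime, ilbc_interpretations(ptime))
```
]]>
        </artwork>
      </figure>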
      <t>
        <xref target="RFC4060">"RTP payload for distributed speech recognition"</xref>.<vspace/>
        If 'maxptime' is not present, 'maxptime' is assumed to be 80 ms.<vspace/>

        Note: since the performance of most speech recognizers is
        extremely sensitive to consecutive FP losses, if the user of the
        payload format expects a high packet loss ratio for the session,
        it MAY explicitly choose a 'maxptime' value for the
        session that is shorter than the default value.
      </t>
    </section>
    <section title="Ad-hoc solutions for multiple ptime">
      <t>
        In recent years, different solutions have been proposed and
        implemented with the goal of making the 'ptime' a function of the codec
        instead of the media description, which contains a list of codecs.
        The solutions listed below show what kinds of proposals have already
        been made to solve the SDP interworking issues caused by differing
        implementations and RFC interpretations; listing them does not imply
        a preference for any particular solution.
      </t>
      <t>
        In all these proposals, a semantic grouping of the codec specific
        information is made by giving a new interpretation of the sequence
        of the parameters or by providing new additional attributes.
      </t>
      <t>
        REMARK:<vspace/>
        All these methods go against the basic rule in the RFCs, which
        states that 'ptime' and 'maxptime' are media-specific and NOT codec-specific.
        They do not solve the interworking issues! Instead, they make them worse
        by introducing many new interpretations and implementations, as the
        following examples show.
      </t>
      <t>
        To avoid further divergence, the implementation community is strongly
        asking for a standardized solution.
      </t>
      <section title="Method 1">
        <t>
          Write the rtpmap first, followed by the 'ptime' when it is related to the
          codec indicated by that rtpmap.
        </t>
        <t>
          This method tries to correlate a 'ptime' with a specific codec, but many
          existing implementations that are fully compliant with the current RFCs
          will suffer from such a proposal.
          Some SDP encoder implementations first write the media line, followed by the
          rtpmap lines and then the other value attributes such as 'ptime' and 'fmtp'.
          So, it is difficult to know to which payload type the
          'ptime' is related. In the following example, it is hard to tell whether
          'ptime:20' relates to payload type 0, to payload type 4 or to both, and
          how the remote end will interpret this information is unknown.
        </t>
        <figure align="center" title="Method 1">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 4 0
a=rtpmap:4 G723/8000
a=rtpmap:0 PCMU/8000
a=ptime:20
a=fmtp:4 bitrate=6400 ]]>
          </artwork>
        </figure>
      </section>
      <section title="Method 2">
        <t>
          Grouping of all codec specific information together.
        </t>
        <t>
          Most implementers are in favor of this proposal, i.e., listing the
          value attributes associated with an rtpmap immediately after it. However,
          this is also a new interpretation: normally, the 'ptime' refers to all
          payload types indicated in the m-line. All existing implementations will
          also suffer from such a method.
        </t>
        <figure align="center" title="Method 2">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 4 0
a=rtpmap:4 G723/8000
a=fmtp:4 bitrate=6400
a=rtpmap:0 PCMU/8000
a=ptime:20 ]]>
          </artwork>
        </figure>
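        <t>
          The interworking risk of Methods 1 and 2 can be shown with a small
          Python sketch that reads a media description under two
          interpretations: the RFC 4566 view, where 'a=ptime' applies to all
          payload types of the m-line, and the ad-hoc positional view, where it
          applies only to the closest preceding 'a=rtpmap'. The parser is
          deliberately minimal and hypothetical; it handles only the kinds of
          lines used in these examples.
        </t>
        <figure>
          <artwork>
            <![CDATA[
```python
from typing import Dict, List, Optional


def ptime_by_payload(sdp_lines: List[str],
                     positional: bool) -> Dict[int, int]:
    """Map payload type -> 'ptime' (ms) under one interpretation."""
    payload_types: List[int] = []
    last_rtpmap: Optional[int] = None
    result: Dict[int, int] = {}
    for line in sdp_lines:
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...: formats from field 4
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            last_rtpmap = int(line[len("a=rtpmap:"):].split()[0])
        elif line.startswith("a=ptime:"):
            ptime = int(line[len("a=ptime:"):])
            if positional:
                # Ad-hoc reading: only the most recent rtpmap counts.
                if last_rtpmap is not None:
                    result[last_rtpmap] = ptime
            else:
                # RFC 4566 reading: media-level, every format.
                for pt in payload_types:
                    result[pt] = ptime
    return result


if __name__ == "__main__":
    method1 = ["m=audio 1234 RTP/AVP 4 0", "a=rtpmap:4 G723/8000",
               "a=rtpmap:0 PCMU/8000", "a=ptime:20",
               "a=fmtp:4 bitrate=6400"]
    print(ptime_by_payload(method1, positional=False))  # {4: 20, 0: 20}
    print(ptime_by_payload(method1, positional=True))   # {0: 20}
```
]]>
          </artwork>
        </figure>
        <t>
          For the Method 1 example, the RFC 4566 reading assigns 20 ms to both
          payload types 4 and 0, whereas the positional reading assigns it only
          to payload type 0 and leaves G723 at its default, so two receivers can
          disagree about the same SDP.
        </t>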
      </section>
      <section title="Method 3">
        <t>
          Use the 'ptime' for every codec after its rtpmap definition. This makes the
          'ptime' a required parameter for each payload type. It looks obvious, but it
          is not allowed according to the existing RFCs. And would the same construct
          also be used for the 'maxptime'?
        </t>
        <figure align="center" title="Method 3">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 0 18 4

a=rtpmap:18 G729/8000
a=ptime:30

a=rtpmap:0 PCMU/8000
a=ptime:40

a=rtpmap:4 G723/8000
a=ptime:60 ]]>
          </artwork>
        </figure>
      </section>
      <section title="Method 4">
        <t>
          Create a new 'mptime' (multiple ptime) attribute that contains different
          packetization times, each one mapped to its corresponding payload type
          in the preceding 'm=' line.
          What will happen when the other side sends an RTP stream with a different
          packetization time? Should the elements in the 'mptime' attribute be
          interpreted as required values or preferred values? With this approach,
          RFC-compliant implementations are also affected and have to take the new
          'mptime' attribute into account.
        </t>
        <figure align="center" title="Method 4">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 0 18 4
a=mptime:40 30 60 ]]>
          </artwork>
        </figure>
      </section>
      <section title="Method 5">
        <t>
          Use of a new 'x-ptime' attribute. However, SDP parsers complained
          about 'x-' attributes, and it was once suggested to use a name
          without the 'x-' prefix instead (e.g., 'xptime'). This is just another
          encoding of Method 4 and does not solve anything either.
        </t>
        <figure align="center" title="Method 5">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 0 8
a=x-ptime:20 30 ]]>
          </artwork>
        </figure>

      </section>
      <section title="Method 6">
        <t>
          Use of different m-lines with one codec per m-line.<vspace/>
          However, this is a misuse, because different m-lines mean different audio
          streams and not different codec options. So, this is certainly against the
          existing SDP concept.
        </t>
        <figure align="center" title="Method 6">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:40

m=audio 1234 RTP/AVP 18
a=rtpmap:18 G729/8000
a=ptime:30

m=audio 1234 RTP/AVP 4
a=rtpmap:4 G723/8000
a=ptime:60 ]]>
          </artwork>
        </figure>
      </section>
      <section title="Method 7">
        <t>
          Use of the 'ptime' in the 'fmtp' attribute.
        </t>
        <figure align="center" title="Method 7">
          <artwork>
            <![CDATA[
m=audio 1234 RTP/AVP 4 18
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=yes;ptime=20
a=maxptime:40

a=rtpmap:4 G723/8000
a=fmtp:4 bitrate=6.3;annexa=yes;ptime=30
a=maxptime:60 ]]>
          </artwork>
        </figure>
      </section>
      <section title="Method 8">
        <t>
          Use of the 'vsel' parameter as done for ATM bearer connections.
          The following example indicates G.729 or G.729a (both are
          interoperable) as the first-preference voice encoding scheme.  A packet
          length of 10 octets and a packetization interval of 10 ms (written as
          10000 microseconds) are associated with this codec.  G726-32 is the
          second preference stated in this line, with an associated packet length
          of 40 octets and a packetization interval of 10 ms.  If the packet length
          and packetization interval are intended to be omitted, then this media
          attribute line contains '-'.
        </t>
        <figure align="center" title="Method 8">
          <artwork>
            <![CDATA[
a=vsel:G729 10 10000 G726-32 40 10000
a=vsel:G729 - - G726-32 - -]]>
          </artwork>
        </figure>
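        <t>
          A minimal Python sketch of how a receiver might read the 'vsel'
          3-tuples, converting the packetTime from microseconds to
          milliseconds and mapping '-' to "not specified". Only the simple form
          shown in this example is handled; the function name is hypothetical.
        </t>
        <figure>
          <artwork>
            <![CDATA[
```python
from typing import List, Optional, Tuple

# (codec, packet length in octets, packetization period in ms)
VselEntry = Tuple[str, Optional[int], Optional[float]]


def parse_vsel(line: str) -> List[VselEntry]:
    """Parse an 'a=vsel:' line into a list of 3-tuples."""
    fields = line.split(":", 1)[1].split()
    if len(fields) % 3 != 0:
        raise ValueError("vsel expects a list of 3-tuples")
    entries = []
    for i in range(0, len(fields), 3):
        codec, length, period = fields[i:i + 3]
        entries.append((
            codec,
            None if length == "-" else int(length),
            # packetTime is given in microseconds; convert to ms
            None if period == "-" else int(period) / 1000,
        ))
    return entries


if __name__ == "__main__":
    print(parse_vsel("a=vsel:G729 10 10000 G726-32 40 10000"))
    # [('G729', 10, 10.0), ('G726-32', 40, 10.0)]
```
]]>
          </artwork>
        </figure>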
      </section>
      <section title="Method 9">
        <t>
          Use of the <xref target="ITU.V152"/> 'maxmptime' (maximum multiple ptime)
          attribute, which contains different packetization times, each one mapped
          to its corresponding payload type described in the preceding 'm=' line, to
          indicate the supported packetization period for all codec payload
          types. This attribute is a media-level attribute and defines a list
          of maximum packetization time values, expressed in milliseconds, that the
          endpoint is capable of using (sending and receiving) for the connection.
          When the 'maxmptime' attribute is present, the 'ptime' shall be ignored
          according to the V.152 specification. When the 'maxmptime' is absent, the
          value of the 'ptime' attribute, if present, shall be taken as indicating
          the packetization period for all codecs present in the 'm=' line.<vspace/>
          The specification does not say what has to be done when a 'maxptime' is
          also present. Does the 'maxmptime' indicate the absolute maximum that can
          be used as packetization time for a certain codec, or does it indicate the
          packetization time that should be used as a preference? This is open to
          many different interpretations, particularly in interworking scenarios.
        </t>
        <figure align="center" title="Method 9">
          <artwork>
            <![CDATA[
m=audio 3456 RTP/AVP 18 0 13 96 98 99
a=maxmptime:10 10 - - 20 20]]>
          </artwork>
        </figure>
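        <t>
          The positional mapping used by 'maxmptime' (and, assuming the same
          syntax, by the 'mptime' of Methods 4 and 10) can be sketched in Python
          as below. It pairs each value with the payload type at the same
          position on the m-line and treats '-' as "no value". It deliberately
          does not resolve the open question above, i.e., whether a value is a
          limit or a preference, nor how it interacts with 'maxptime'. The
          function name is hypothetical.
        </t>
        <figure>
          <artwork>
            <![CDATA[
```python
from typing import Dict, Optional


def map_per_payload(m_line: str,
                    attr_line: str) -> Dict[int, Optional[int]]:
    """Pair each value of a per-payload list attribute with a format."""
    payload_types = [int(pt) for pt in m_line.split()[3:]]
    values = attr_line.split(":", 1)[1].split()
    if len(values) != len(payload_types):
        raise ValueError("one value per payload type is required")
    # '-' means that no value is given for that payload type
    return {pt: (None if v == "-" else int(v))
            for pt, v in zip(payload_types, values)}


if __name__ == "__main__":
    print(map_per_payload("m=audio 3456 RTP/AVP 18 0 13 96 98 99",
                          "a=maxmptime:10 10 - - 20 20"))
```
]]>
          </artwork>
        </figure>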
      </section>
      <section title="Method 10">
        <t>
          Use of the PacketCable 'mptime' attribute. See
          <xref target="PKT.PKT-SP-CODEC-MEDIA">"Codec and Media Specification"</xref>, which
          gives the following note about the 'ptime': "<xref target="RFC4566"/> defines the
          'maxptime' SDP attribute and V.152 defines the 'maxmptime' SDP attribute. The
          precedence of these attributes with respect to the 'ptime' and 'mptime' attributes
          is not defined at this time."<vspace/>
        </t>
        <t>
          Remark:<vspace/>
          This method is the same as Method 4. However, in the <xref target="PKT.PKT-SP-CODEC-MEDIA"/>
          version from 9/2006, the 'mptime' was removed and the 'maxptime' was added.
          PacketCable seems to be moving away from the need for multiple packetization
          times as a function of the codec and to be treating the issue more as a matter
          of maximum end-to-end delay.
        </t>
      </section>
      <section title="Method 11">
        <t>
          Use of SDP capabilities negotiation method. See <xref
          target="I-D.ietf-mmusic-sdp-capability-negotiation"/>
          which describes how additional capabilities can be
          negotiated, such as the different supported ptimes. This
          could be a possible solution in certain cases, but it also
          requires updates in implementations which followed the basic
          ptime/maxptime concept to adapt themselves to more
          restricted implementations. It also introduces additional
          complexity by adding new parameters and new semantics.
        </t>
      </section>
    </section>
    <section title="Background info">
      <t>
        The "Session Initiation Protocol" (SIP) is used to set up media sessions.
        The SIP INVITE message carries a "Session Description Protocol" (SDP)
        description. In the SDP media description part, the m-line contains the media
        type (e.g., audio), a transport port, a transport protocol (e.g., RTP/AVP)
        and a media format description that depends on the transport protocol.
        For the transport protocol RTP/AVP or RTP/SAVP, the media format sub-field
        can contain a list of RTP payload type numbers.<vspace/>
        <vspace/>
        Example:
        m=audio 49232 RTP/AVP 8 0 4<vspace/>
        <vspace/>
        The "8 0 4" is the media format, i.e., a list of possible codecs
        identified by static or dynamic payload type numbers as defined in
        <xref target="RFC3551">RFC 3551</xref>.
        <vspace/>
        In the above example, a list of static numbers is used:<vspace/>
        8 = PCMA - G.711 PCM A-law<vspace/>
        0 = PCMU - G.711 PCM u-law<vspace/>
        4 = G723 - G.723.1<vspace/>
      </t>
      <t>
        PCMA and PCMU are "sample-based" codecs, while G723 is a "frame-based"
        codec. All of them use a sampling rate of 8 kHz, i.e., 0.125 ms per sample.
        PCMA and PCMU encode each sample in 8 bits using the A-law or u-law
        logarithmic companding laws, resulting in a data rate of 64 kbps.
        G723, however, does not operate on single samples, but on a number of
        samples combined in a "frame". As such, higher compression rates
        can be achieved. The G723 codec uses 240 voice samples, corresponding to
        a 30 ms speech frame. The codec compresses the data in the frame and
        encodes it with 192 or 160 bits, resulting in a data rate of 6.4 or 5.3 kbps.
        G723 gives the advantage of a lower bit rate at the cost of increased
        voice delay: 30 ms instead of 0.125 ms.
      </t>
      <t>
        The "International Telecommunication Union" (ITU) gives some guidelines
        on acceptable end-to-end delays in <xref target="ITU.G114"/>. A delay of up to
        150 ms is acceptable. Between 150 and 400 ms, there is an impact on the
        perceived voice quality, but it is still acceptable. Above 400 ms, it becomes
        unacceptable. Echo cancellers are required for delays greater than 25 ms.
      </t>
      <t>
        In "time division multiplexing" (TDM) networks, the coding delay is the
        biggest part contributing to the end-to-end delay. However, in
        "Packet Oriented" networks, packetization delays are added to the
        end-to-end delay and can become an issue. Each packet has a certain
        header which contributes to the bandwidth usage, i.e. the total required
        bit-rate. The more data can be packed together, the smaller the influence
        of the header on the total payload and the higher the transmission
        efficiency. However, combining more data in a packet gives an increase
        of the end-to-end delay. As such, there is a trade-off between bandwidth
        usage, amount of packet processing and end-to-end delay. For codecs with
        a higher compression rate, putting more data in a packet to improve the
        transmission efficiency reduces quality because of the increased end-to-end delay.
      </t>
      <t>
        The following table compares G.711 (A-law or u-law) with G.723.1
        for different packetization delays. The headers
        consist of:
      </t>
      <t>
        <list style="symbols">
          <t>RTP header: 12 bytes.</t>
          <t>UDP header: 8 bytes.</t>
          <t>IPv4 header: 20 bytes.</t>
          <t>MAC layer: 14 bytes.</t>
          <t>CRC: 4 bytes.</t>
          <t>Ethernet preamble, start frame delimiter and inter-frame gap: 20 bytes.</t>
        </list>
      </t>
      <figure align="center" title="Packet delay & Throughput">
        <artwork>
          <![CDATA[
Codec  Packet Datarate Voice    Headers Tot    Payload Throughput
       Delay           Payload
       ms     kbps     bytes    bytes   bytes  %       kbps
-----------------------------------------------------------------       
G711   0.125  64          1     78        79    1.3    5056.0
         2.5  64         20     78        98   20.4     313.6 
         5    64         40     78       118   33.9     188.8
        10    64         80     78       158   50.6     126.4
        20    64        160     78       238   67.2      95.2
        30    64        240     78       318   75.5      84.8
        90    64        720     78       798   90.2      70.9
       200    64       1600     78      1678   95.4      67.1
-----------------------------------------------------------------       
G723.1  30    6.4        24     78       102   23.5      27.2
        60    6.4        48     78       126   38.1      16.8
        90    6.4        72     78       150   48.0      13.3
       150    6.4       120     78       198   60.6      10.6
       300    6.4       240     78       318   75.5       8.5
-----------------------------------------------------------------      
]]>
        </artwork>
      </figure>
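      <t>
        The table values follow from simple arithmetic: the payload per packet
        is the codec rate multiplied by the packetization delay, each packet
        adds the 78 header bytes listed above, and the link rate is the total
        packet size divided by the packetization delay. The Python sketch below
        (function name hypothetical) reproduces the rows; any differences are
        due only to rounding.
      </t>
      <figure>
        <artwork>
          <![CDATA[
```python
# Per-packet overhead in bytes, as listed above:
# RTP 12 + UDP 8 + IPv4 20 + MAC 14 + CRC 4 + preamble etc. 20
HEADER_BYTES = 12 + 8 + 20 + 14 + 4 + 20  # = 78


def packet_stats(codec_kbps: float, delay_ms: float):
    """Return (payload_bytes, total_bytes, payload_pct, link_kbps)."""
    payload = codec_kbps * delay_ms / 8   # 1 kbit/s == 1 bit/ms
    total = payload + HEADER_BYTES
    payload_pct = 100 * payload / total
    link_kbps = total * 8 / delay_ms      # bits per ms == kbit/s
    return payload, total, payload_pct, link_kbps


if __name__ == "__main__":
    rows = [("G711", 64, d) for d in (0.125, 2.5, 5, 10, 20, 30, 90, 200)]
    rows += [("G723.1", 6.4, d) for d in (30, 60, 90, 150, 300)]
    for codec, kbps, delay in rows:
        payload, total, pct, link = packet_stats(kbps, delay)
        print(f"{codec:7}{delay:8}{kbps:6}{payload:8.0f}{total:8.0f}"
              f"{pct:7.1f}{link:9.1f}")
```
]]>
        </artwork>
      </figure>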
      <t>
        For the same packetization delay of 30 ms, the data rate of G.723.1
        is 10 times lower than that of G.711, but the payload efficiency drops
        from 75.5% to 23.5%.  The same efficiency is obtained for G.723.1 only
        when the packetization delay is 300 ms!  Although the packet efficiency is
        lower, the required bit rate on the link for G.723.1 is reduced from
        84.8 kbps to 27.2 kbps. And when several frames are packed together, e.g.,
        3 frames of 30 ms, the packetization delay becomes 90 ms, resulting in
        fewer packets to be routed and processed and in a further reduced link
        bit rate of 13.3 kbps.
      </t>
      <t>
        The frame sizes used by the different codecs are 0.125 ms, i.e., one
        sample (G.711), 2.5 ms (G728), 10 ms (G729), 20 ms (GSM, GSM-EFR, QCELP,
        LPC) and 30 ms (G723); G726 is, like G.711, sample-based. All of them have
        a default 'ptime' of 20 ms, with the exception of G723, which has a default
        'ptime' of 30 ms.
      </t>
      <t>
        The media description part can contain additional attribute lines that
        complement or modify the media description line, such as the 'ptime' and
        'maxptime' attributes.
      </t>
      <t>
        Example:<vspace/>
        m=audio 49232 RTP/AVP 8 0 4<vspace/>
        a=ptime:20<vspace/>
        a=maxptime:60
      </t>
      <t>
        <xref target="RFC3551">RFC 3551</xref> defines the default
        packetization time for each codec in Table 1. The PCMA and
        PCMU have 20 ms as default 'ptime' and the G723 has a 30 ms
        default 'ptime'.
      </t>
      <t>
        When, as in the example above, the 'ptime' value is 20, it is a wrong
        value for the G723 codec, whose frame size is 30 ms and which therefore
        requires a minimum packetization delay of 30 ms. Such cases cause many
        interworking problems between systems because of different
        interpretations of the relevant RFCs, resulting in bad voice quality or
        call setup failures.
      </t>
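      <t>
        The conflict can be detected mechanically. The following Python sketch
        (function name hypothetical) checks, for each payload type on the
        m-line, whether a media-level 'ptime' is a whole number of codec frames,
        using the frame sizes given above. For the example, it flags G723
        (payload type 4) at 'ptime:20', while a 'ptime' of 30 or 60 ms would fit
        all three codecs.
      </t>
      <figure>
        <artwork>
          <![CDATA[
```python
# Frame sizes in ms from the list above, keyed by static RTP payload
# type (RFC 3551): 0 = PCMU, 8 = PCMA, 4 = G723, 18 = G729.
# G.711 is sample-based; one 0.125 ms sample is treated as a "frame".
FRAME_MS = {0: 0.125, 8: 0.125, 4: 30.0, 18: 10.0}


def incompatible_payloads(payload_types, ptime_ms):
    """Payload types for which ptime is not a whole number of frames."""
    bad = []
    for pt in payload_types:
        frames = ptime_ms / FRAME_MS[pt]
        if frames < 1 or frames != int(frames):
            bad.append(pt)
    return bad


if __name__ == "__main__":
    # m=audio 49232 RTP/AVP 8 0 4 with a=ptime:20
    print(incompatible_payloads([8, 0, 4], 20))  # [4]
    print(incompatible_payloads([8, 0, 4], 60))  # []
```
]]>
        </artwork>
      </figure>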
      <t>
        In some APIs, the following functions are provided to interface with the RTP
        and codec hardware layer for encoding voice samples, based on a certain codec,
        in RTP packets.
        <list style="numbers">
          <t>
            Set the encoding parameters such as codec type, payload type (for RTP)
            and packetization rate. These parameters are mostly configuration
            parameters of the device. They are either provided manually, based on
            guidelines from the network architecture, or provided dynamically and
            automatically.
          </t>
          <t>
            Next, a transmit buffer has to be allocated. The lower layer provides a
            function to calculate the required buffer size from the encoding parameters.
          </t>
          <t>
            A transmit buffer is allocated with the indicated size (as a minimum) by the
            application layer.
          </t>
          <t>
            The synchronous voice data to be encoded is passed to the
            hardware layer, which encodes it (codec and packetization) into the
            provided buffer.
          </t>
          <t>
            The buffer with the RTP data is returned to the application, which
            can send it out on the host network interface towards the packet network.
          </t>
        </list>
      </t>
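      <t>
        The five transmit steps can be mimicked in a few lines. In this sketch
        the codec table, the function names tx_buffer_size and transmit, and
        the 24-octet G723 frame (the 6.3 kbit/s rate) are assumptions for
        illustration; a real DSP API would perform the encoding in hardware.
        The buffer size is taken as the 12-octet fixed RTP header plus the
        payload for one packetization interval.
      </t>
      <figure>
        <artwork><![CDATA[
```python
import math

RTP_HEADER_BYTES = 12  # fixed RTP header, no CSRCs or extensions (RFC 3550)

# Hypothetical codec table: (frame duration in ms, octets per frame).
# PCMU/PCMA: one octet per 0.125 ms sample; G723: 24-octet 30 ms frame
# (6.3 kbit/s rate).
CODECS = {"PCMU": (0.125, 1), "PCMA": (0.125, 1), "G723": (30.0, 24)}


def tx_buffer_size(codec, ptime_ms):
    """Step 2: buffer size for one RTP packet given the encoding parameters."""
    frame_ms, frame_octets = CODECS[codec]
    frames = math.ceil(ptime_ms / frame_ms)
    return RTP_HEADER_BYTES + frames * frame_octets


def transmit(codec, ptime_ms, encode):
    """Steps 1 to 5 with a caller-supplied stand-in for the hardware encoder."""
    params = {"codec": codec, "ptime_ms": ptime_ms}  # step 1: parameters
    size = tx_buffer_size(codec, ptime_ms)          # step 2: query size
    buf = bytearray(size)                           # step 3: allocate
    encode(params, buf)                             # step 4: "hardware" fills it
    return bytes(buf)                               # step 5: ready to send


print(tx_buffer_size("PCMU", 20))  # 172
print(tx_buffer_size("G723", 30))  # 36
print(len(transmit("PCMA", 20, lambda params, buf: None)))  # 172
```
]]></artwork>
      </figure>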
      <t>
        For the receiving side, the required API functions are:
        <list style="numbers">
          <t>
            Set the required decoding parameters, such as codec type, payload type,
            initial latency in frames and jitter buffer information. Note that the
            packetization time is not required, because every receiver should be
            able to handle packets carrying up to 200 ms of audio, which defines the
            maximum packet size for which the receiver should have the required resources.
          </t>
          <t>
            The size of the buffer to be allocated is requested from the
            hardware layer. This size is calculated from the size of the RTP
            header and the maximum allowed payload of 200 ms.
          </t>
        </list>
      </t>
      <t>
        The application can, however, decide to allocate smaller buffers if the
        worst-case RTP packetization time is known, i.e. when the 'maxptime'
        attribute is used.
      </t>
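      <t>
        The receive-side sizing, and the saving that 'maxptime' allows, can be
        sketched as follows. The sketch assumes a hypothetical codec table (one
        octet per 0.125 ms sample for PCMU and PCMA, one 24-octet 30 ms frame
        for G723) and applies the RFC 3551 rule that, for framed encodings, the
        frame count is the packet duration divided by the frame duration,
        rounded up. The function name rx_buffer_size is hypothetical.
      </t>
      <figure>
        <artwork><![CDATA[
```python
import math

RTP_HEADER_BYTES = 12        # fixed RTP header (RFC 3550)
DEFAULT_MAX_PACKET_MS = 200  # receivers SHOULD accept up to 200 ms (RFC 3551)

# Hypothetical codec table: (frame duration in ms, octets per frame).
CODECS = {"PCMU": (0.125, 1), "PCMA": (0.125, 1), "G723": (30.0, 24)}


def rx_buffer_size(codec, maxptime_ms=None):
    """Receive buffer size in octets for one RTP packet.

    Without 'maxptime' the 200 ms worst case is assumed; with it, the
    buffer only has to hold maxptime_ms of media. The number of frames
    is rounded up for framed codecs.
    """
    limit_ms = DEFAULT_MAX_PACKET_MS if maxptime_ms is None else maxptime_ms
    frame_ms, frame_octets = CODECS[codec]
    frames = math.ceil(limit_ms / frame_ms)
    return RTP_HEADER_BYTES + frames * frame_octets


print(rx_buffer_size("PCMU"))      # 1612
print(rx_buffer_size("PCMU", 60))  # 492
print(rx_buffer_size("G723"))      # 180
print(rx_buffer_size("G723", 60))  # 60
```
]]></artwork>
      </figure>
      <t>
        With the 'a=maxptime:60' of the earlier example, the PCMU receive buffer
        in this model shrinks from 1612 to 492 octets.
      </t>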
      <t>
        Most implementations use a general purpose host processor (GPP)
        in combination with a digital signal processor (DSP) for the codec and packetization part.
        The host processor interfaces with the packet-oriented world, while
        the DSP interfaces with a real-time synchronous network, usually with a
        special buffer handling mechanism to limit the number of interrupts.
      </t>
      <t>
        Consider a VoIP call using G711 A-law or u-law. Most hardware
        solutions use a DSP to handle the real-time processing, and most of these
        DSPs have special built-in hardware functionality for PCM samples. The DSP can be
        configured for A-law or u-law and for a specific clock rate. The hardware can
        generate an interrupt for every transmitted or received PCM sample, but this
        places a heavy burden on system performance. The DSPs therefore also
        provide a mechanism based on an internal buffer: an interrupt is only
        generated when the buffer is empty or full.
        This DSP hardware is initialized for a specific call at SDP negotiation
        time, i.e. when the SDP carried in the SIP INVITE is negotiated.
      </t>
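      <t>
        The effect of the internal buffer on the interrupt load is simple
        arithmetic. The sketch below assumes one interrupt per sample without
        buffering, and one interrupt per full or empty buffer otherwise, for a
        single direction of a G711 port at 8000 Hz. The function name
        interrupts_per_second is hypothetical.
      </t>
      <figure>
        <artwork><![CDATA[
```python
SAMPLE_RATE_HZ = 8000  # G711 sampling rate


def interrupts_per_second(buffer_ms=None):
    """Interrupts per second for one direction of a PCM port.

    buffer_ms=None models one interrupt per PCM sample; otherwise one
    interrupt is raised each time a buffer holding buffer_ms
    milliseconds of samples becomes full or empty.
    """
    if buffer_ms is None:
        return float(SAMPLE_RATE_HZ)
    samples_per_buffer = SAMPLE_RATE_HZ * buffer_ms / 1000
    return SAMPLE_RATE_HZ / samples_per_buffer


print(interrupts_per_second())    # 8000.0
print(interrupts_per_second(20))  # 50.0
```
]]></artwork>
      </figure>
      <t>
        Because the buffer size is derived from the negotiated packetization
        time, it is fixed for the call once the DSP port has been initialized.
      </t>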
      <figure align="center" title="Example">
        <artwork>
          <![CDATA[
m=audio 1234 RTP/AVP 0 8 4
a=ptime:30 ]]>
        </artwork>
      </figure>
      <t>
        So, if this SDP contains payload types 0, 8 and 4 (i.e. G711u, G711A and
        G723) and a 'ptime' of 30, this 'ptime' can be used to initialize the DSP
        port with a buffer for 30 ms of PCM voice samples. When the "offerer" sends
        RTP packets for G711u or G711A using the default packetization time of
        20 ms, the DSP PCM port waits for 30 ms of samples before sending out the
        buffer. Because each RTP packet carries only 20 ms, the port has to wait
        for the next RTP packet before it can transmit the buffer, which seriously
        degrades the voice quality.
      </t>
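      <t>
        A simplified timing model illustrates this waiting. It assumes,
        hypothetically, that one RTP packet carrying 20 ms of G711 audio arrives
        exactly every 20 ms with no jitter, and that the DSP port releases a
        buffer only once 30 ms of samples have accumulated. The function name
        buffer_release_times is hypothetical.
      </t>
      <figure>
        <artwork><![CDATA[
```python
def buffer_release_times(packet_ms, buffer_ms, num_packets):
    """Times (ms) at which a DSP PCM buffer of buffer_ms is complete.

    Simplified, hypothetical model: packet k (k = 1, 2, ...) arrives at
    k * packet_ms carrying packet_ms of audio, and the DSP releases a
    buffer only once buffer_ms of samples have accumulated.
    """
    level_ms = 0
    releases = []
    for k in range(1, num_packets + 1):
        now = k * packet_ms
        level_ms += packet_ms
        while level_ms >= buffer_ms:
            releases.append(now)
            level_ms -= buffer_ms
    return releases


times = buffer_release_times(packet_ms=20, buffer_ms=30, num_packets=6)
print(times)                                      # [40, 60, 100, 120]
print([b - a for a, b in zip(times, times[1:])])  # [20, 40, 20]
print(buffer_release_times(20, 20, 3))            # [20, 40, 60]
```
]]></artwork>
      </figure>
      <t>
        In this model the 30 ms buffer is released at irregular intervals of
        20 and 40 ms instead of every 30 ms, because it must wait for the next
        packet; with a matching 20 ms buffer the intervals are a steady 20 ms.
      </t>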
      <t>
        This problem can occur in DSP-based media gateways between the IP and
        PSTN worlds, but also in end-user internet access devices (IAD) that allow
        a normal analog phone to be attached via an RJ11 jack (ATA, analog
        telephone adapter).
      </t>
      <t>
        For this use case, certain implementers argue for a complete SDP
        negotiation mechanism for 'ptime'. However, this conflicts with the SDP
        paradigm, in which 'ptime' is an optional parameter that is bound not to a
        specific codec but to the media itself.
        Meanwhile, different proprietary solutions have been implemented, causing
        even more interworking issues.
      </t>
    </section>
  </back>
</rfc>