<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [
<!ENTITY RFC2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY RFC6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>
<!ENTITY RFC7478 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7478.xml'>
<!ENTITY RFC4867 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4867.xml'>
<!ENTITY RFC3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'>
<!ENTITY audio-codecs PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtcweb-audio.xml'>
<!ENTITY overview PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtcweb-overview.xml'>
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt"?>
<rfc ipr="trust200902" docName="draft-ietf-rtcweb-audio-codecs-for-interop-05" category="info" xml:lang="en">
<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc sortrefs='yes'?>
<front>
<title abbrev='WebRTC audio codecs for interop'>
Additional WebRTC Audio Codecs for Interoperability
</title>
<author initials='S.' surname='Proust' fullname='Stephane Proust' role="editor">
<organization>Orange</organization>
<address>
<postal>
<street>2, avenue Pierre Marzin</street>
<city>Lannion</city>
<code>22307</code>
<country>France</country>
</postal>
<email>stephane.proust@orange.com</email>
</address>
</author>
<date month='February' year='2016' />
<area>Real-time Applications and Infrastructure Area</area>
<keyword>WebRTC</keyword>
<keyword>audio</keyword>
<keyword>codec</keyword>
<keyword>G.722</keyword>
<keyword>AMR</keyword>
<keyword>AMR-WB</keyword>
<abstract>
<t>To ensure a baseline level of interoperability between WebRTC endpoints,
a minimum set of required codecs is specified.
However, to maximize the likelihood of establishing a session without
the need for audio transcoding, it is also recommended to include in the
offer other suitable audio codecs that are available to the browser.
</t>
<t>
This document provides some guidelines on the suitable codecs to be considered
for WebRTC endpoints to address the most relevant interoperability use cases.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>As indicated in <xref target="I-D.ietf-rtcweb-overview" />, it has been anticipated
that WebRTC will not remain an isolated island and that some WebRTC endpoints
will need to communicate with devices used in other existing networks
with the help of a gateway. Therefore, in order to maximize the likelihood
of establishing a session without the need for audio transcoding, it is
recommended in <xref target="I-D.ietf-rtcweb-audio" /> to include in the offer other suitable audio codecs beyond those that are mandatory to implement.
This document provides some guidelines on the suitable codecs to be considered for WebRTC
endpoints to address the most relevant interoperability use cases.
</t>
<t>The codecs considered in this document are recommended to be supported and included
in the offer only by WebRTC endpoints for which interoperability with other non-WebRTC endpoints
and non-WebRTC-based services is relevant, as described in <xref target="amrwb-use-cases"/>, <xref target="amr-use-cases"/>, and <xref target="g722-use-cases"/>.
Other use cases may justify offering other additional codecs to avoid transcoding.
</t>
</section>
<section title="Definitions and abbreviations">
<t>
<list style="symbols">
<t>Legacy networks: In this document, legacy networks encompass the conversational networks that are already deployed, such as the PSTN, the PLMN, and the IP/IMS networks offering VoIP services, including the 3GPP "4G" Evolved Packet System <xref target="TS23.002"/> supporting voice over LTE radio access (VoLTE) <xref target="IR.92"/>.</t>
<t>WebRTC endpoint: A WebRTC endpoint can be a WebRTC browser or a WebRTC non-browser (also called "WebRTC device" or "WebRTC native application"), as defined in <xref target="I-D.ietf-rtcweb-overview" />.</t>
<t>AMR: Adaptive Multi-Rate</t>
<t>AMR-WB: Adaptive Multi-Rate WideBand</t>
<t>CAT-iq: Cordless Advanced Technology-internet and quality</t>
<t>DECT: Digital Enhanced Cordless Telecommunications</t>
<t>IMS: IP Multimedia Subsystem</t>
<t>LTE: Long Term Evolution (3GPP "4G" wireless data transmission standard)</t>
<t>MOS: Mean Opinion Score</t>
<t>PSTN: Public Switched Telephone Network</t>
<t>PLMN: Public Land Mobile Network</t>
<t>VoLTE: Voice Over LTE</t>
</list>
</t>
</section>
<section anchor="rationale" title="Rationale for additional WebRTC codecs">
<t>
The mandatory implementation of Opus <xref target="RFC6716" /> in WebRTC endpoints can guarantee codec interoperability
(without transcoding) at state-of-the-art voice quality (better than narrowband "PSTN"
quality) between WebRTC endpoints. WebRTC technology is also expected to be used to communicate with other types of endpoints using other technologies. It can be used, for instance,
as an access technology to VoLTE services (Voice over LTE as specified in <xref target="IR.92" />) or to interoperate with fixed or mobile circuit-switched or VoIP services, such as mobile circuit-switched voice over 3GPP 2G/3G mobile networks <xref target="TS23.002"/> or DECT-based VoIP telephony <xref target="EN300175-1"/>. Consequently, a significant number of calls are likely to occur between terminals
supporting WebRTC endpoints and other terminals, such as mobile handsets, fixed VoIP terminals, and
DECT terminals, that neither support WebRTC endpoints nor implement Opus. As a consequence, these
calls are likely to be either of low narrowband PSTN quality using G.711 <xref target="G.711"/> at both ends or affected
by transcoding operations. The drawbacks of such transcoding operations are listed below:
<list style="symbols">
<t>Degraded user experience with respect to voice quality: voice quality is significantly degraded by transcoding. For instance, the degradation is around 0.2 to 0.3 MOS for most transcoding use cases with the AMR-WB codec (<xref target="amrwb"/>) at 12.65 kbit/s and in the same range for other wideband transcoding cases. It should be stressed that if G.711 is used as a fallback codec for interoperation, wideband voice quality will be lost.
Such a reduction of audio bandwidth down to narrowband clearly degrades the user-perceived quality of service, leading to shorter and less frequent calls. Such a switch to G.711
is a less than desirable or acceptable choice for customers. If transcoding is performed between Opus and any other wideband codec, wideband communication could be maintained but with degraded quality (MOS scores of transcoding between AMR-WB at 12.65 kbit/s and Opus at 16 kbit/s in both directions are significantly lower than those of AMR-WB at 12.65 kbit/s or Opus at 16 kbit/s). Furthermore, in degraded conditions, the addition of defects, such as audio artifacts due to packet losses, and the audio effects resulting from the cascading of different packet loss recovery algorithms may result in a quality below the acceptable limit for the customers.<vspace blankLines="1" /></t>
<t>Degraded user experience with respect to conversational interactivity: the degradation of conversational interactivity is due to the increase of end-to-end latency in both directions that is introduced by the transcoding operations. Transcoding requires full de-packetization for decoding of the media stream (including mechanisms of de-jitter buffering and packet loss recovery), then re-encoding, re-packetization, and re-sending. The delays produced by all these operations are additive and may increase the end-to-end delay up to 1 second, well beyond the acceptable limit.<vspace blankLines="1" /></t>
<t>Additional costs in networks: transcoding places significant additional costs on network gateways, mainly related to codec implementation, codec licenses, deployment, testing, and validation. It must be noted that transcoding of wideband to wideband would require more CPU processing and be more costly than transcoding between narrowband codecs.<vspace blankLines="1" /></t>
</list>
</t>
</section>
<section title="Additional suitable codecs for WebRTC">
<t>
The following codecs are considered relevant with respect to the general
purpose described in <xref target="rationale"/>. This list reflects the current status of foreseen WebRTC
use cases. It is not exhaustive and remains open to the inclusion of other codecs for which
relevant use cases can be identified. These additional codecs are recommended to be included in the offer in addition to Opus and G.711, according to the foreseen interoperability cases to be addressed.
</t>
<section anchor="amrwb" title="AMR-WB">
<section title="AMR-WB General description">
<t>
The Adaptive Multi-Rate WideBand (AMR-WB) is a 3GPP-defined speech codec that
is mandatory to implement in any 3GPP terminal that supports wideband speech
communication. It is used in circuit-switched mobile telephony services
and new multimedia telephony services over IP/IMS. It is used in particular for voice over LTE as specified by GSMA in <xref target="IR.92" />. More detailed information on
AMR-WB can be found in <xref target="IR.36" />. References for AMR-WB-related
specifications, including detailed codec description and source code, are in <xref target="TS26.171"/>, <xref target="TS26.173"/>, <xref target="TS26.190"/>, and <xref target="TS26.204"/>.
</t>
</section>
<section anchor="amrwb-use-cases" title="WebRTC relevant use case for AMR-WB">
<t>
The market of personal voice communication is driven by mobile terminals.
AMR-WB is now very widely implemented in devices and networks offering "HD Voice".
A high number of calls are consequently likely to occur between WebRTC endpoints
and mobile 3GPP terminals offering AMR-WB.
The use of AMR-WB by WebRTC endpoints would consequently allow transcoding-free
interoperation with all mobile 3GPP wideband terminals. In addition,
WebRTC endpoints running on mobile terminals (smartphones) may reuse
the AMR-WB codec already implemented on these devices.
</t>
</section>
<section title="Guidelines for AMR-WB usage and implementation with WebRTC">
<t>
The payload format to be used for AMR-WB is described in <xref target="RFC4867" />, with the bandwidth-efficient format and one speech frame encapsulated in each RTP packet. Further guidelines for implementing and using AMR-WB and ensuring
interoperability with 3GPP mobile services can be found in <xref target="TS26.114" />.
In order to ensure interoperability with 4G/VoLTE as specified by GSMA,
the more specific IMS profile for voice derived from <xref target="TS26.114"/>
and specified in <xref target="IR.92" /> should be considered.
In order to maximize the likelihood of successful call establishment for WebRTC endpoints offering AMR-WB, it is important that the WebRTC endpoints:
<list style="symbols">
<t>Offer AMR in addition to AMR-WB, with AMR-WB listed first (AMR-WB being a wideband codec) as the preferred payload type with respect to other narrowband codecs (AMR, G.711, etc.) and with the bandwidth-efficient payload format preferred.</t>
<t>Be capable of operating AMR-WB with any subset of the nine codec modes and source-controlled rate operation.
Offer at least one AMR-WB configuration with parameter settings as defined in Table 6.1 of <xref target="TS26.114"/>. In order to maximize interoperability and quality, this offer does not restrict the codec modes offered. Restrictions in the use of codec modes may be included in the answer.</t>
</list>
</t>
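<t>
As an illustration of the guidelines above, the media description of an SDP offer could look like the sketch below. It is a non-normative example under stated assumptions: the port, the dynamic payload type numbers (96, 97, 111), and the placement of Opus ahead of AMR-WB are hypothetical choices; the fmtp values (mode-change-capability=2, max-red=220), ptime, and maxptime are assumed to follow Table 6.1 of <xref target="TS26.114"/> and should be checked against that table. The octet-align and mode-set parameters are omitted on purpose: per <xref target="RFC4867"/>, their absence signals the bandwidth-efficient format and leaves all codec modes available.
</t>
<figure>
<artwork><![CDATA[
m=audio 49152 UDP/TLS/RTP/SAVPF 111 96 97 0 8
a=rtpmap:111 opus/48000/2
a=rtpmap:96 AMR-WB/16000/1
a=fmtp:96 mode-change-capability=2; max-red=220
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=220
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=ptime:20
a=maxptime:240
]]></artwork>
</figure>
<t>
In this sketch, AMR-WB (payload type 96) precedes the narrowband codecs AMR (97) and G.711 (0 and 8) in the format list, which is how the preference order is expressed in SDP.
</t>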
</section>
</section>
<section title="AMR">
<section title="AMR General description">
<t>
Adaptive Multi-Rate (AMR) is a 3GPP-defined speech codec that is
mandatory to implement in any 3GPP terminal that supports voice
communication.
This includes both mobile phone calls using GSM and 3G cellular
systems as well as multimedia telephony services over IP/IMS and
4G/VoLTE, such as the GSMA voice IMS profile for VoLTE in <xref target="IR.92" />.
In addition to the impacts listed above, support of AMR can avoid
degrading the high efficiency over mobile radio access. References for AMR-related specifications, including detailed codec description and source code, are in <xref target="TS26.071"/>, <xref target="TS26.073"/>, <xref target="TS26.090"/>, and <xref target="TS26.104"/>.
</t>
</section>
<section anchor="amr-use-cases" title="WebRTC relevant use case for AMR">
<t>
A user of a WebRTC endpoint on a device integrating an AMR
module wants to communicate with another user that can only be
reached on a mobile device that only supports AMR. Although more
and more terminal devices are now "HD voice" and support AMR-WB,
a high number of legacy terminals supporting only
AMR (terminals with no wideband / HD Voice capabilities) are
still in use. The use of AMR by WebRTC endpoints would consequently
allow transcoding-free interoperation with all mobile 3GPP
terminals. In addition, WebRTC endpoints running on mobile terminals
(smartphones) may reuse the AMR codec already implemented on
these devices.
</t>
</section>
<section title="Guidelines for AMR usage and implementation with WebRTC">
<t>
The payload format to be used for AMR is described in <xref target="RFC4867"/>, with the bandwidth-efficient format and one speech frame encapsulated in each RTP packet. Further guidelines for implementing and using AMR to ensure
interoperability with 3GPP mobile services can be found in <xref target="TS26.114" />.
In order to ensure interoperability with 4G/VoLTE as specified
by GSMA, the more specific IMS profile for voice derived from
<xref target="TS26.114" /> and specified in <xref target="IR.92" /> should be considered.
In order to maximize the likelihood of successful call establishment for WebRTC endpoints offering AMR, it is important that the WebRTC endpoints:
<list style="symbols">
<t>Be capable of operating AMR with any subset of the eight codec modes and source-controlled rate operation.</t>
<t>Offer at least one configuration with parameter settings as defined in Table 6.1 and Table 6.2 of <xref target="TS26.114" />. In order to maximize interoperability and quality, this offer shall not restrict the AMR codec modes offered. Restrictions in the use of codec modes may be included in the answer.</t>
</list>
</t>
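<t>
To illustrate how mode restrictions may appear in the answer rather than in the offer, the sketch below pairs an unrestricted AMR offer with a hypothetical answer from a remote side that limits the session to four modes. It is a non-normative example: the ports, the payload type number 97, and the chosen mode-set are assumptions made for illustration, and the other codecs of a complete offer are left out for brevity. In the mode-set parameter of <xref target="RFC4867"/>, AMR modes are indexed 0 to 7 from the lowest bit rate (4.75 kbit/s) to the highest (12.2 kbit/s), so "mode-set=0,2,4,7" selects 4.75, 5.90, 7.40, and 12.2 kbit/s.
</t>
<figure>
<artwork><![CDATA[
Offer (no mode-set: all eight modes may be used)

m=audio 49152 UDP/TLS/RTP/SAVPF 97
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-change-capability=2; max-red=220
a=ptime:20
a=maxptime:240

Answer (hypothetical remote side restricting the modes)

m=audio 49160 UDP/TLS/RTP/SAVPF 97
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-set=0,2,4,7; mode-change-capability=2; max-red=220
a=ptime:20
a=maxptime:240
]]></artwork>
</figure>
<t>
Because the offering endpoint is capable of operating with any subset of the codec modes, it can accept such an answer without further negotiation.
</t>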
</section>
</section>
<section title="G.722">
<section title="G.722 General description">
<t>
G.722 <xref target="G.722" /> is an ITU-T defined wideband speech codec. G.722 was approved
by ITU-T in 1988. It is a royalty-free codec that is common in a
wide range of terminals and endpoints supporting wideband speech
and requiring low complexity. The complexity of G.722 is estimated
at 10 MIPS <xref target="EN300175-8" />, which is 2.5 to 3 times lower than that of AMR-WB.
In particular, G.722 has been chosen by ETSI DECT as the mandatory
wideband codec for New Generation DECT in order to greatly
increase the voice quality by extending the bandwidth from narrowband
to wideband. G.722 is the wideband codec required for CAT-iq
DECT certified terminals, and V2.0 of the CAT-iq specifications has
been approved by GSMA as minimum requirements for HD voice logo
usage on "fixed" devices, i.e., broadband connections using the
G.722 codec.
</t>
</section>
<section anchor="g722-use-cases" title="WebRTC relevant use case for G.722">
<t>
G.722 is the wideband codec required for DECT CAT-iq terminals.
DECT cordless phones are still widely used to offer short-range wireless connection to PSTN or VoIP services. G.722 has also been specified by ETSI in
<xref target="TS181005" /> as the mandatory wideband codec for IMS multimedia telephony
communication service and supplementary services using fixed
broadband access. The support of G.722 would consequently allow
transcoding-free IP interoperation between WebRTC endpoints and fixed
VoIP terminals, including DECT/CAT-iq terminals supporting G.722.
In addition, WebRTC endpoints running on fixed terminals implementing G.722
may reuse the G.722 codec already implemented on these devices.
</t>
</section>
<section title="Guidelines for G.722 usage and implementation">
<t>
The payload format to be used for G.722 is defined in <xref target="RFC3551"/>, with each octet of the stream of octets produced by the codec to be octet-aligned in an RTP packet. The sampling frequency for G.722 is 16 kHz, but the RTP clock rate is set to 8000 Hz in SDP to stay backward compatible with an erroneous definition in the original version of the RTP A/V profile. Further guidelines for implementing and using G.722 to ensure
interoperability with multimedia telephony services over IMS can
be found in Section 7 of <xref target="TS26.114" />. Additional information on
G.722 implementation in DECT can be found in <xref target="EN300175-8" />, and the
full codec description and C source code can be found in <xref target="G.722" />.
</t>
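<t>
The clock rate convention described above can be sketched with the following SDP fragment. This non-normative example assumes the static payload type 9 assigned to G.722 in <xref target="RFC3551"/>, G.722 operating at 64 kbit/s, and a 20 ms packetization interval; the port number is hypothetical.
</t>
<figure>
<artwork><![CDATA[
m=audio 49152 UDP/TLS/RTP/SAVPF 9
a=rtpmap:9 G722/8000
a=ptime:20
]]></artwork>
</figure>
<t>
Under these assumptions, each packet carries 20 ms of audio, that is, 320 samples at 16 kHz encoded into 160 octets (64 kbit/s x 0.020 s / 8), while the RTP timestamp advances by only 160 per packet because it is counted at the 8000 Hz clock rate.
</t>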
</section>
</section>
</section>
<section title="Security Considerations">
<t>
Security considerations for WebRTC Audio Codec and Processing Requirements can be found in <xref target="I-D.ietf-rtcweb-audio"/>. Implementors making use of the additional codecs considered in this document are advised to also refer more specifically to the "Security Considerations" sections of <xref target="RFC4867"/> (for AMR and AMR-WB) and <xref target="RFC3551"/>.
</t>
</section>
<section title="IANA Considerations">
<t>None.</t>
</section>
<section title="Acknowledgements">
<t>The authors of this document are
<list style="symbols">
<t>Stephane Proust, Orange, <eref target="stephane.proust@orange.com">stephane.proust@orange.com</eref>, </t>
<t>Espen Berger, Cisco, <eref target="espeberg@cisco.com">espeberg@cisco.com</eref>, </t>
<t>Bernhard Feiten, Deutsche Telekom, <eref target="Bernhard.Feiten@telekom.de">Bernhard.Feiten@telekom.de</eref>, </t>
<t>Bo Burman, Ericsson, <eref target="bo.burman@ericsson.com">bo.burman@ericsson.com</eref>, </t>
<t>Kalyani Bogineni, Verizon Wireless, <eref target="Kalyani.Bogineni@VerizonWireless.com">Kalyani.Bogineni@VerizonWireless.com</eref>, </t>
<t>Miao Lei, Huawei, <eref target="lei.miao@huawei.com">lei.miao@huawei.com</eref>, </t>
<t>Enrico Marocco, Telecom Italia, <eref target="enrico.marocco@telecomitalia.it">enrico.marocco@telecomitalia.it</eref>, </t>
</list>
though only the editor is listed on the front page.</t>
<t>The authors would like to thank Magnus Westerlund, Barry Dingle and Sanjay Mishra who carefully reviewed the document and helped to improve it.</t>
</section>
</middle>
<back>
<references title="Normative references">
&audio-codecs;
&RFC4867;
&RFC3551;
<reference anchor="TS26.114">
<front>
<title>IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction V13.0.0</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26114.htm</uri>
</address>
</author>
<date month="June" year="2015"></date>
</front>
</reference>
<reference anchor="IR.92">
<front>
<title>IMS Profile for Voice and SMS V9.0</title>
<author>
<organization>GSMA</organization>
<address>
<uri>http://www.gsma.com/newsroom/all-documents/ir-92-ims-profile-for-voice-and-sms/</uri>
</address>
</author>
<date month="April" year="2015"></date>
</front>
</reference>
<reference anchor="G.722">
<front>
<title>Recommendation ITU-T G.722 (2012): 7 kHz audio-coding within 64 kbit/s
</title>
<author>
<organization>ITU</organization>
<address>
<uri>http://www.itu.int/rec/T-REC-G.722-201209-I/en</uri>
</address>
</author>
<date year="2012-09"></date>
</front>
</reference>
<reference anchor="TS26.071">
<front>
<title>3GPP TS 26.071 v12.0.0: Mandatory Speech Codec speech processing functions; AMR Speech CODEC; General description.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26071.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.073">
<front>
<title>3GPP TS 26.073 v12.0.0: ANSI C code for the Adaptive Multi Rate (AMR) speech codec
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26073.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.090">
<front>
<title>3GPP TS 26.090 v12.0.0: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26090.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.104">
<front>
<title>3GPP TS 26.104 v12.0.0: ANSI C code for the floating-point Adaptive Multi Rate (AMR) speech codec.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26104.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.171">
<front>
<title>3GPP TS 26.171 v12.0.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; General description.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26171.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.173">
<front>
<title>3GPP TS 26.173 v12.1.0: ANSI-C code for the Adaptive Multi-Rate - Wideband (AMR-WB) speech codec.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26173.htm</uri>
</address>
</author>
<date year="2015-03"></date>
</front>
</reference>
<reference anchor="TS26.190">
<front>
<title>3GPP TS 26.190 v12.0.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26190.htm</uri>
</address>
</author>
<date year="2014-09"></date>
</front>
</reference>
<reference anchor="TS26.204">
<front>
<title>3GPP TS 26.204 v12.1.0: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; ANSI-C code.
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/26204.htm</uri>
</address>
</author>
<date year="2015-03"></date>
</front>
</reference>
</references>
<references title="Informative references">
&RFC6716;
&overview;
<reference anchor="IR.36">
<front>
<title>Adaptive Multirate Wide Band V3.0</title>
<author>
<organization>GSMA</organization>
<address>
<uri>http://www.gsma.com/newsroom/all-documents/official-document-ir-36-adaptive-multirate-wide-band</uri>
</address>
</author>
<date month="September" year="2014"></date>
</front>
</reference>
<reference anchor="TS181005">
<front>
<title>Telecommunications and Internet converged Services and
Protocols for Advanced Networking (TISPAN); Service and Capability Requirements V3.3.1 (2009-12)
</title>
<author>
<organization>ETSI</organization>
</author>
<date year="2009"></date>
</front>
</reference>
<reference anchor="EN300175-8">
<front>
<title>ETSI EN 300 175-8, v2.5.1: Digital Enhanced Cordless Telecommunications (DECT);
Common Interface (CI); Part 8: Speech and audio coding and transmission.
</title>
<author>
<organization>ETSI</organization>
</author>
<date year="2009"></date>
</front>
</reference>
<reference anchor="EN300175-1">
<front>
<title>ETSI EN 300 175-1, Digital Enhanced Cordless Telecommunications (DECT); Common Interface (CI); Part 1: Overview v2.5.1
</title>
<author>
<organization>ETSI</organization>
</author>
<date year="2009"></date>
</front>
</reference>
<reference anchor="G.711">
<front>
<title>Recommendation ITU-T G.711 (1988): Pulse code modulation (PCM) of voice frequencies
</title>
<author>
<organization>ITU</organization>
<address>
<uri>http://www.itu.int/rec/T-REC-G.711-198811-I/en</uri>
</address>
</author>
<date year="1988-11"></date>
</front>
</reference>
<reference anchor="TS23.002">
<front>
<title>3GPP TS 23.002 v13.3.0: Network architecture
</title>
<author>
<organization>3GPP</organization>
<address>
<uri>http://www.3gpp.org/DynaReport/23002.htm</uri>
</address>
</author>
<date year="2015-09"></date>
</front>
</reference>
</references>
</back>
</rfc>