One document matched: draft-isomaki-rtcweb-mobile-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
There has to be one entity for each item to be referenced.
An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
(Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="no" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-isomaki-rtcweb-mobile-00" ipr="trust200902">
<!-- category values: std, bcp, info, exp, and historic
ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
or pre5378Trust200902
you can add the attributes updates="NNNN" and obsoletes="NNNN"
they will automatically be output with "(if approved)" -->
<!-- ***** FRONT MATTER ***** -->
<front>
<!-- The abbreviated title is used in the page header - it is only necessary if the
full title is longer than 39 characters -->
<title abbrev="RTCWeb for Mobile">RTCweb Considerations for Mobile Devices</title>
<author initials='M.I.' surname="Isomaki" fullname='Markus Isomaki'>
<organization abbrev="Nokia">Nokia</organization>
<address>
<postal>
<street>Keilalahdentie 2-4</street>
<code>FI-02150 Espoo</code>
<country>Finland</country>
</postal>
<email>markus.isomaki@nokia.com</email>
</address>
</author>
<date year="2012" />
<!-- Meta-data Declarations -->
<area>RAI</area>
<workgroup>RTCWeb</workgroup>
<keyword>RTCWeb</keyword>
<keyword>mobile</keyword>
<!-- Keywords will be incorporated into HTML output
files in a meta tag but they have no effect on text or nroff
output. If you submit your draft to the RFC Editor, the
keywords will be used for the search engine. -->
<abstract>
<t>
Web Real-time Communications (WebRTC) aims to provide web-based applications
real-time and peer-to-peer communication capabilities. In many cases those
applications are run in mobile devices connected to different types of mobile
networks. This document gives an overview of the issues and challenges in
implementing and deploying WebRTC in mobile environments. It also gives
guidance on how to overcome those challenges.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
Web Real-time Communications (WebRTC) provides web-based applications real-time
and peer-to-peer communication capabilities. The applications can setup communication
sessions that can carry audio, video, or any application specific data. To be
reachable for incoming sessions setups or other messages, the applications must
keep persistent connectivity with their "calling site".
</t>
<t>
In the last few years, mobile devices, such as smartphones or tablets, have become
relatively powerful in terms of processing and memory. Their browsers are becoming
close to their desktop counterparts. So, from that perspective, it is feasible
to run WebRTC applications in them. However, power consumption and highly diverse
nature of the connectivity still remain as specific challenges. A lot of work is done
to address these challenges in e.g. radio technologies and hardware components, but
still by far the most important factor is how the applications and protocols and
application programming interfaces are designed.
</t>
<t>
Section 2 of this document gives an overview of the characteristics of different
mobile networks as background for further discussion. Section 3 introduces the specific
issues that WebRTC protocols and applications should take into consideration to be
mobile-friendly.
</t>
<t>
The current version of the document misses all references and lot of details. It may have
some errors. Its purpose is to get attention to the topics it raises and start discussion about them.
</t>
</section>
<section title="Common mobile networks and their properties">
<t>
The most relevant mobile networks for WebRTC at the moment are Wi-Fi and the different
variants of cellular technologies.
</t>
<t>
Many characteristics of the cellular networks are covered in Section 3 in the context of the
particular issue under discussion. The following is a very brief description of the power
consumption related properties of WCDMA/HSPA networks. The details vary, but similar principles
apply to other cellular networks, at least GPRS/EDGE and LTE.
</t>
<t>
In simplified terms, the WCDMA/HSPA radio can be in three different types of states: The power-save
state (IDLE, Cell_PCH, URA_PCH), a shared channel state (Cell_FACH) or a dedicated channel state
(Cell_DCH). The power-save states consumes about two decades less power than the dedicated channel
state, while the shared-channel state is somewhere in the middle. The state machine works so that if
a device has only small packets (upto ~200-500 bytes) to send or receive, it will allocate a
shared channel, that operates on low data rate. If there is more traffic (even a single full size
IP packet), a dedicated channel is allocated. Starting from the power-save state, the channel allocation
typically takes somewhere between 0.5 and 2 seconds, depending on the network and the exact power-save state.
Only after that, the first packet is really sent. If two cellular devices were to exchange packets with each
other starting from the power-save state, the initial IP-level RTT could be easily 3-4 seconds.
</t>
<t>
The channel
is kept for some time after the last packet has been sent or received. The dedicated channel drops to
power-save via the shared channel. The timers from dedicated to shared and shared to power-save are
network dependent, but typically somewhere between 5 and 30 seconds. So, in some networks sending a single
ping every 30 secods is enough to keep the power consumption constantly at the maximum level, while in
others the power-save state is entered much faster. The total radio power consumption does not actually
depend so much on overall volume of traffic, but on how long a dedicated or shared channel is active. So,
for instance a 1 kB keep-alive sent every 30 seconds for an hour (total ~100 kB of traffic) consumes much
more (even an order or magnitude more!) than a single 10 MB download, assuming that will finish in a minute or two.
</t>
<t>
The applications have no control over the radio states, but the Operating System and the Radio Modem
software can do something about them. In the newer specifications (and devices and networks) it is possible
for the device to explicily ask the radio channel to be abandoned even immediately after the last packet.
For instance, if the device were somehow to know that no new packets are to be sent for some time, it
could do such signaling and save power.
</t>
<t>
The bottom line is that applications and protocols should keep as long intervals between traffic as possible,
giving the radio as much low-power time as possible. The intervals that are more than a few seconds may help,
but at least intervals that are longer than 30 seconds will definitely help. On the other hand, the initial
RTT after an interval will be long. This issue is covered in Sections 3.1 and 3.2.
</t>
<t>
The other key characteristic of cellular networks is that they have long buffers and run link-layer in
"acknowledged" mode, meaning all lost packets are retransmitted. This means TCP will easily create long delays
and ruins real-time traffic. This is covered in Section 3.4.
</t>
<t>
The third characteristic is that mobile devices often change networks on the fly, typically between cellular and
Wi-Fi. Most devices only run a single interface at a time. From networking perspective this means that the
device's IP address changes, and e.g. all its TCP connections are lost. This is covered in Section 3.3.
</t>
</section>
<section title="Specific issues and how to deal with them">
<t>
</t>
<section title="Persistent connectivity to the Calling Site">
<t>
Many WebRTC apps want to be reachable for incoming sessions (JSEP Offers) or other types of
asynchronous messages. For this purpose they need some kind of a persistent communication
channel with their "Calling Site". Two standard approaches for this are WebSockets and HTTP
long-polling. In both of these cases a TCP connection is used as the underlying transport.
</t>
<t>
Most cellular networks have a firewall preventing incoming TCP connections, even when they
allocate public IPv4 or IPv6 addresses. Also NATs are becoming more popular with the exhaustion
of IPv4 address space. The firewall and NAT timers for TCP can range between 1 and 60 minutes,
depending on the network. To keep the TCP connection alive, the application needs to send
some kind of a keep-alive packets with high enough frequency to avoid the timeout.
</t>
<t>
If the WebRTC app intends to run for a long periods of time (even when the user is not actively
interacting with it), it is of utmost importance to keep this keep-alive traffic as infrequent
as possible. Every wake-up of the radio consumes a significant amount of power, even if it is
needed just for sending and receiving a couple of IP packets. It makes a huge difference, if there
are for instance 6 vs. 60 of these wake-ups every hour. A naiive application may want to make
it sure it sends frequently enough for all possible networks. That leads to unacceptable
power consumption. A smarter application will try to figure out a suitable
timeout for a given network it is using, and can save a lot of power in networks with longer
timers.
</t>
<t>
There are further strategies to manage the keep-alives so that they consume least amount of power.
It is best to send as small keep-alive messages as possible. HSPA/WCDMA networks have a special
shared radio channel (FACH) that can carry small amounts of traffic. Its power consumption is typically
less than half of the dedicated channel. Depending on the network, a packet of a couple of
hundred bytes will usually only require FACH, while a thousand byte packet will require the dedicated channel
to be activated. So, a WebSocket PING-PONG is better than an HTTP POST or GET with all the Cookies
and other headers attached. If there are multiple applications or connections to be kept alive, the Browser
or the underlying platform should offer some kind of a synchronization for them, so that the radio is
woken only once per cycle.
</t>
<t>
The most efficient approach would be to multiplex the initial incoming messages for all applications
over the same TCP connection. This would require the use of some kind of a gateway service in the network.
Such "notification" services are available on many platforms, but at the moment they are not typically
available for browsers or web applications. It would be useful to standardize or develop Javascript APIs for this
purpose. There is W3C work on Server-sent events. Also, the Open Mobile Alliance (OMA) has started work on
standardized "notification" services. Be the services standards based or proprietary, the most relevant
part to get done would be to give WebRTC and other Web applications access to them. Such services are
always subject to privacy concerns, so at minimum the messages passed over them should be end-to-end
encrypted. (Traffic analysis threats would still remain.)
</t>
</section>
<section title="Media and Data channels">
<t>
Real-time media (audio, video) is typically sent and/or received constantly, while the media channel
is established. This means radio needs to be on constantly, and there is little for the application
to do to preserve power. (Choosing a hardware accelerated video codec over a non-HW-supported one
is one thing the application may be able to influence.) At least in LTE there are techniques called
Discontinuous Transmission/Reception (DTX, DRX), that operate even in the timeframe of tens of
milliseconds and can affect power consumption e.g. for VoIP. It is an open issue if WebRTC stacks can
be somehow optimized for them.
</t>
<t>
The Data Channel may however be often low-volume or even idle for long periods of time. For instance
an IM connection may be idle for minutes or even hours. There can be many apps that want to keep
such a connection available just in case there is some traffic to be sent or received infrequently.
The WebRTC Data Channel is based on SCTP over DTLS over UDP. This means it needs keepalives in the
order of 30 seconds in cellular networks, meaning the radio will be active most of the time even if
no user traffic is sent. It is not possible to keep such a channel on for a long
time due to power consumption.
</t>
<t>
Applications can choose different strategies to deal with this problem. One approach is to avoid Data
Channels completely for low-volume or infrequent traffic and send it via the Web servers over HTTP or
WebSockets. This is probably the best approach. The other approach is to tear down the Data Channel after some timeout and re-establish it
only when new traffic needs to be sent. This may create some lag in sending the first message after
the interval. The third option is to transport the Data Channel over TCP, e.g. using a yet undefined
"HTTP tunneling fallback" mechanism. This would be almost identical to the first approach, except
that logically the application would still be using a WebRTC Data Channel. It is not yet clear if this
will be feasible due to ICE concent refreshes that may need to occur frequetly as well (every 30 seconds?).
They are sent end-to-end so one side of the Data Channel can not by itself even affect their rate.
</t>
</section>
<section title="Recovery from interface switching">
<t>
Most mobile platforms only support Internet connectivity over only one interface at a time. In
practice this is either a cellular or a Wi-Fi interface. From radio hardware perspective there
would be no need for such a limitation, but it is driven by simplicity and power preservation.
The devices typically have a hard-coded or configurable priority order for different networks.
The most common policy is that any known Wi-Fi network is always preferred over any cellular
network, but even more complex policies are possible.
</t>
<t>
When the device detects a higher priority network than the one currently in use, it will by default
attach to that network automatically. After a successful attachment to the new network, the device
turns the old network (and interface) off. In most platforms applications have no control over this.
In a typical situation the switch-over leads to a change of IP address, and for instance all
TCP connections becoming disconnected, and any state tied to them needs to be recreated.
</t>
<t>
It is important that WebRTC applications are made robust enough to survive this behavior. Many native
applications deal with it by listening to "disconnect" and "reconnect" events through the APIs they
are using. For WebRTC apps the first priority is to re-establish its "signaling" connectivity to the
"Calling Site". If that connectivity is based on a
WebSocket, the application needs to react to the "onerror" event through the WebSocket API and establish
a new connection and setup all state related to it. (Say, if the application was using SIP over
WebSockets, it might have to re-REGISTER on the SIP level.) If the disconnect was caused by interface
switching and the switch-over succeeded cleanly, it would be possible to setup the new connection
immediately. In some cases the disconnect could last longer, and the application would have to retry
the connection until connectivity is regained.
</t>
<t>
It would be advisable to make the
reconnect step as lightweight as possible in terms of RTTs required. For the browser and the web
application platform, it is important that the "disconnect" event gets propagated to the applications
as fast as possible.
</t>
<t>
For HTTP long-polling, it would similarly be important to notice that the underlying
TCP connection has become stale, and a new poll needs to be sent as quickly as possible.
</t>
<t>
The application may also attempt to update any peer-to-peer sessions it is having at the time of the
switch-over. At this point of RTCWeb standardization it is not yet clear how much control over this
the protocols and APIs will exhibit. There are many layers on which the recovery can be done. It is
possible to try to deal with it using ICE. This would require knowing when the currently used ICE
candidate becomes unusable, as it is bound to a removed interface. The failure of ICE connectivity
checks provide that information, but possibly after some delay. (Frequent connectivity checks are not
an issue as long as media is actively sent or received, but would be costly over an idle or low-volume media
channel, such as a Data Channel. If media traffic is infrequent, the speed of detection may not be that
critical for user experience anyway.) If an interface really became unusable, it would be better to have
an explicit event to signal that all ICE candidates bound to it are likely unusable as well, so the
application could act immediately. If a new interface became available, the application could restart
ICE and start using the new candidates gathered.
</t>
<t>
The PeerConnection API offers a few events for these
purposes, at least "icechange" and "renegotiationneeded". With these the application can learn about
problems with the currently used candidates. There is also a method "updateIce" by which the application
can restart the ICE candidate gathering process. It is however not yet entirely clear how these event
handlers and methods should be best used to deal with an interface change, and whether they even are a
feasible tool for dealing with it. It is also important to note that no new offers or answers could be
sent or received until the "signaling channel" (e.g. the Websocket connection) was first re-established.
</t>
<t>
If the lower-level instruments fail, the application could create a new PeerConnection, and recreate the
media channels. This would be a heavier operation, but in some cases it might still be better than
leaving the recovery entirely to the user, i.e. explicitly making a new call from the UI.
</t>
<t>
There are certain things that the underlyind platform (Operating System, Connection Manager etc.) can
also implement to make interface switching smoother for the applications. One possibility would be to
keep the old interface available for a short duration even after a new higher priority interface becomes
available. This would allow applications to deal with the change in a more proactive fashion. There are
also protocols such as Multipath TCP that could be used to switch e.g. WebSocket connections to a new
interface without always resorting to the application support.
</t>
</section>
<section title="Congestion avoidance">
<t>
Cellular mobile networks have notoriously large buffers. Their link layers also typically operate in an
"acknowledged" mode, meaning that the lost frames (or packets) are retransmitted. Retransmission creates
head of line blocking on the queue. This means packets are seldom lost, but delays grow large. The individual
users or endpoints are often isolated from each other so that the network capacity is divided among them
more or less evenly. However, all traffic to and from the same endpoint ends up in the same queue. In WebRTC
context this means that plain TCP traffic will easily ruin real-time traffic due to the buffering.
</t>
<t>
WebRTC protocols should be desinged to avoid this. If Data Channels transfer a lot of data in parallel to the
real-time streams, they should not use the loss-driven (TCP) congestion control algorithms but something that
reacts to queue growth much faster. IETF LEDBAT WG may have something to offer for this case. If the browser
wants to protect its real-time strams in general against all TCP (HTTP, WebSocket) traffic, it might be best for it to also
restrict the number of simultanous TCP connections in use, for instace to retrive a website. The HTTP 2.0 work
done in IETF HTTPBIS WG should prove helpful in this case.
</t>
<t>
Cellular networks also do have their in-built Quality of Service mechanisms that can be used to differentiate
service for different packet flows. These are not widely used in HSPA/WCDMA, but LTE may change the situation
to some extent. The QoS policy is enforced by the network, and requires a contract with the operator. It is thus
likely only available for services with some relation to the access operator. How the WebRTC application or the
browser deal with that is TBD. Technically DiffServ marking is probably the only dynamic approach to indicate
the priority of a particular flow.
</t>
</section>
</section>
<section anchor="Security" title="Security Considerations">
<t>
Not explicitly covered in this version.
</t>
</section>
<!-- <section title="Additional contributors"> -->
<!-- </section> -->
<section anchor="Acknowledgements" title="Acknowledgements">
<t>
Bernard Aboba and Göran Eriksson provided useful comments to the document. Dan Druta has worked on Web notifications in
the context of WebRTC.
</t>
</section>
<!-- Possibly a 'Contributors' section ... -->
</middle>
<!-- *****BACK MATTER ***** -->
<back>
<!-- References split into informative and normative -->
<!-- There are 2 ways to insert reference entries from the citation libraries:
1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
(for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")
Both are cited textually in the same manner: by using xref elements.
If you use the PI option, xml2rfc will, by default, try to find included files in the same
directory as the including file. You can also define the XML_LIBRARY environment variable
with a value containing a set of directories to search. These can be either in the local
filing system or remote ones accessed by http (http://domain/dir/... ).-->
<references title="References">
</references>
<!-- Change Log
-->
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 19:37:51 |