One document matched: draft-jennings-rtcweb-api-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<?rfc toc="yes"?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no"?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc sortrefs="no" ?>
<?rfc colonspace="yes" ?>
<?rfc tocdepth="4"?>
<rfc category="std" docName="draft-jennings-rtcweb-api-00"
ipr="noDerivativesTrust200902">
<front>
<title abbrev="RTC Web API Requirements">Architecture and API Requirements
for RTC Web</title>
<author fullname="Cullen Jennings" initials="C." surname="Jennings">
<organization>Cisco</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<phone>+1 408 421-9990</phone>
<email>fluffy@cisco.com</email>
</address>
</author>
<date day="7" month="March" year="2011" />
<abstract>
<t>Internet browsers and other software applications are enabling
support for real time interactive voice and video. This draft outlines a
set of IETF protocols that can be used for this purpose and describes
the overall architecture. It also identifies the requirements for an
application programming interface to control these protocols.</t>
</abstract>
</front>
<middle>
<section title="Overview">
<t>This draft describes two models of how this would work, which are
referred to as the advertisement proposal (AdProp) model and the offer
answer (OffAns) model. Both of these models are useful in various
situations, and they involve very similar code development efforts. This
draft proposes an API and protocol set standardization that supports
both models. </t>
<section title="Advertisement Proposal Model">
<t>The AdProp model standardizes a way to send media between two
browsers and standardizes an API in the browser, such that
browser-based applications can find out the media capabilities of the
browser and can tell the browser what media streams to send and
receive. We use the term "browser app" to refer to a program that is
running in the browser and using HTML, CSS, and JavaScript to control
the browser. It is assumed that the browser app could communicate with
the web server using existing approaches, and that the web server
communicates with a SIP server as a way of federating to other
websites or connecting to legacy VoIP systems. There are many
different ways this model could be used, but the diagram below covers
a fairly complex case that most other cases end up being a subset of.
More use cases are discussed in section XXX.</t>
<figure>
<artwork><![CDATA[
+-----------+ +-----------+
| Web/ | | Web/ |
| SIP | SIP | SIP |
| |-------------| |
| Server | | Server |
| | | |
+-----------+ +-----------+
/ \
/ \ Proprietary over
/ \ HTTP/Websockets
/ \
/ Proprietary over \
/ HTTP/Websockets \
/ \
+-----------+ +-----------+
|JS/HTML/CSS| |JS/HTML/CSS|
+-----------+ +-----------+
Add ^ |Prop Add ^ | Prop
| v | v
+-----------+ +-----------+
| | | |
| | | |
| Browser | ------------------------- | Browser |
| | ICE + SRTP | |
| | | |
+-----------+ +-----------+
]]></artwork>
</figure>
<t>The API for this model has two distinct phases. First there is a
Connection API that allows the browser app to use ICE to form a
connection to the other browsers. This API assumes that the browser
applications will be able to exchange ICE candidates lists by some
out-of-band means -- most likely involving passing them up to the web
servers over HTTP. The second stage is referred to as the AVT API.
This API allows the browser apps to discover which codecs and
capabilities the browser supports. It then allows the browser app to
control which media streams the browser will send and receive. The
browser describes its range of capabilities in an advertisement
object. The browser app requests that a particular set of media
streams be set up in a proposal to the browser. This is done as an
atomic request which is either accepted or not. Partial acceptance has
proven to be very difficult to deal with in the implementation of
existing systems. The general overview and advantage of the AdProp
model is discussed in <xref
target="I-D.peterson-sipcore-advprop">draft-peterson-sipcore-advprop</xref>.</t>
<t>The model above shows SIP as the protocol between the two web
servers, but the API proposed would also work using Jingle or H.323 as
the federation signaling protocol. It would also be possible to
implement the processing of SIP messages in the JavaScript in the
browser application and then somehow tunnel the SIP messaging between
the clients. XMPP over websockets has been proposed for this. The
architecture and API in this draft would support all of these
possibilities.</t>
</section>
<section title="Offer Answer Model">
<t>The OffAns model standardizes a way to send media between the
browsers, but it also selects an existing signaling protocol to
negotiate and set up the media. The browser app would indicate to the
browser that it wished to form a communication session with another
entity, and then the browser would take care of the rest. A typical
model for this is show below.</t>
<figure>
<artwork><![CDATA[
+------+ +------+ +------+ +------+
| Web | | SIP | SIP | SIP | | Web |
| Serv | | Serv |---------| Serv | | Serv |
+------+ +------+ +------+ +------+
| / \ |
|HTTP / SIP \ | HTTP
| / \ |
| /SIP \ |
| / \ |
| / \ |
| / \ |
+-----------+ +-----------+
|JS/HTML/CSS| |JS/HTML/CSS|
+-----------+ +-----------+
+-----------+ +-----------+
| | | |
| | | |
| Browser |---------------------| Browser |
| | ICE + SRTP | |
| | | |
+-----------+ +-----------+
]]></artwork>
</figure>
<t>The major goal for this API is to be extremely simple to use in
enabling a website for voice and video. On an iPhone today, one can
simply put a tel URL on the web page and the iPhone can call it. That
is a simple approach that web developers like and use. Since standards
are involved, this proposal will have to be more complex. The API
defines an HTML session element that can be used like a source element
inside of an audio or video element. It also provides a JavaScript API
to control the session and replace the user interface.</t>
</section>
<section title="Use Cases">
<section title="Facebook ">
<t>Consider the case of a social networking site that allows IM
between users and wants to also allow voice and video between them,
but does not need to federate with others. The case could easily use
the AdProp model. Assuming that it was only supported on browsers
meeting a certain minimum functionality and it always uses the same
capabilities, there is no need to even negotiate or share the
advertisements between the two browsers. The browser app simply sets
up the connection to the far end, and then uses a proposal for the
media steam that is always the same.</t>
<figure>
<artwork><![CDATA[
+-----------+
| Web |
| |
| |
| Server |
| |
+-----------+
/ \
/ \ Proprietary over
/ \ HTTP/Websockets
/ \
/ \
/ \
/ \
+-----------+ +-----------+
|JS/HTML/CSS| |JS/HTML/CSS|
+-----------+ +-----------+
Add ^ |Prop Add ^ | Prop
| v | v
+-----------+ +-----------+
| | | |
| | | |
| Browser |----------| Browser |
| | ICE+RTP | |
| | | |
+-----------+ +-----------+]]></artwork>
</figure>
</section>
<section title="Webex">
<t>TBD</t>
</section>
<section title="Amazon">
<t>Consider the case of a website that supports searching and
displays advertisements related to the search. In this case clicking
on the advertisement could directly connect the user with a sales
agent at the company associated with the advertisement.</t>
<figure>
<artwork><![CDATA[
+------+ +------+
| Web | | SIP |
| Serv | | Serv |
+------+ +------+
| / |
|HTTP / |
| / |
| /SIP |
| / |
| / |
| / | SIP
+-----------+ |
|JS/HTML/CSS| |
+-----------+ |
+-----------+ |
| | +-------+
| | | Video |
| Browser |----------| Phone |
| | ICE+SRTP | |
| | +-------+
+-----------+ ]]></artwork>
</figure>
<t>In this sort of case the people operating the web server do not
need to deploy anything special to display the advertisement, and
the company associated with the advertisement can use its existing
call center, assuming it meets the legacy VoIP requirements outlined
in section XXX.</t>
<t>The security issue of a browser sending a SIP packet to a device
that does not meet the same origin policy is discussed in the
section XXX, but the brief preview of the solution is that the SIP
messages can use CORS REF much like a HTTP does.</t>
</section>
</section>
</section>
<section title="Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>
<section title="Requirements">
<t>The section defines the set of protocols and selected subset profiles
of these protocols that a browser would need to implement, and forms the
requirements for the API to control these protocols. At a high level we
split this into connection management, transports for real time media
such as audio and video, transports for non media data, codecs support,
and signaling protocols.</t>
<t>All of the data plane sessions are set up using ICE [REF] or ICE-Lite
for security reasons, discussed in section XXX. Devices that could be
deployed behind NATs, such as a web browser, are REQUIRED to support ICE
while other devices that always deploy on public addresses can do
ICE-Lite. The only mode of ICE REQUIRED is aggressive. Real time media
is transported over RTP REF or SRTP [REF]. Support for multicast RTP is
OPTIONAL. To support ICE, implementation needs to be able to do STUN REF
and TURN REF. In addition, there is a strong interest to define a
TURN-like protocol that looks like HTTP to intermediaries, so that media
can be tunneled over HTTP. Support RTCPMUX REF is REQUIRED. RTP keep
alive is done using RTCP as described in REF. The API needs to allow the
DSCP REF for each RTP or media stream to be set. The API needs to allow
the browser app to observer and control the SSRC values in the RTP.</t>
<t>Open Issue: There is a desire to be able to pass non media type data
directly between browsers. For example, an application such as Second
Life or gaming application may wish to pass small chunks of data such as
player position with stringent real time requirements. There are several
proposals for how to do this. The session would be set up using ICE,
just as with RTP. One proposal is just to use a thin shim on top of UDP
or DTLS to demux the packets from other packets such as RTP on the same
connection. Another proposal is DTLS over DCCP over UDP with some
appropriate congestion control scheme chosen for DCCP. Another proposal
is to define a data codec to carry the data in RTP.</t>
<t>The mandatory to implement audio codecs are: PCMA, PCMU,
telephone-event, and opus [REF]. The API needs to support the following
OPTIONAL codecs: G729, G722, G7221, G723, AMR, AMR-WB, iLBC, L16 and
opus. PCMU and PMCA codecs are REQUIRED to support 1 channel with a rate
of 8000 and a ptime of 20. The mandatory to implement video codecs are:
<to be chosen by working group - leading candidates for consideration
are H.264-AVC and VP8>. The minimum profile and resolutions supported
by the mandatory to implement video codecs are TBD. The API needs to
support the following OPTIONAL codecs: H263-2000, H264, H264-SVC, raw
and VP8.</t>
<t>The signaling protocol selected here is SIP though very little
overall architecture would change if the WG decided to use Jingle REF
instead of SIP. The browser needs to implement the subset of SIP
REF3261,3263,3264 and is required to support registration, invite, ack,
cancel, bye, and update. Support for the following features is OPTIONAL:
INVITES without an offer, re-invite, forking, S/MIME and sips. Support
for the following is REQUIRED: sip over TLS, outbound proxy, 3xx
redirects, early media, multipart mine REF 5621, update, identity 4916
& 4471, rport REF 3581, SIP keep alive as described in 5626.</t>
<t>Open Issue: define a TURN like protocol to tunnel RTP over HTTP</t>
<t>Open Issues: define a RTP mux protocol to multiplex RTP on top of a
single UDP port. Would likely use SSRC as the demux code point.</t>
<t>Open Issue: Mandatory to implement video codec(s) and minimum
profile.</t>
<t>Open Issue: Mandatory to implement audio codecs.</t>
</section>
<section title="Connection API">
<t>It is expected this section will be removed from this draft and moved
to a W3C draft but it is provided for reference at this point. The straw
man API are many things including adequate error handling. The API would
likely end up using exceptions for many things. </t>
<section title="Session API">
<t>The session element can be used anywhere in HTML that the source
element could be used. Fundamentally, this is an alternative way of
setting up a source for an audio or video element.</t>
<t>Categories: None</t>
<t>Contexts in which this element can be used: same as source
element</t>
<t>Content model: Empty</t>
<t>Content attributes:<list style="hanging">
<t hangText=""></t>
<t hangText="src:">URL to destination to create session with.</t>
<t hangText=""></t>
<t hangText="aor:">Address of Record that identifies this
user.</t>
<t hangText=""></t>
<t hangText="credential:">Password or credential for the specified
AOR.</t>
<t hangText=""></t>
<t hangText="proxy:">URL for outbound proxy.</t>
</list></t>
<t>DOM interface:</t>
<figure>
<artwork><![CDATA[
interface Session : HTMLElement {
attribute double volume; // control speaker volume
attribute boolean mute; // control microphone
attribute boolean sendVideo; // control camera
attribute DOMObject videoPane;
attribute DOMString aorUrl;
attribute DOMString credential;
attribute DOMString outboundProxyURL;
readonly attribute DOMString remoteName;
readonly attribute boolean secure;
readonly attribute DOMString registrationState;
// noRegistrar, registering, registered, registrationFailed
attribute Function onRegisterStateChange;
void open( in DOMString url ); // tel or SIP URL
void close();
void accept( boolean accept );
readonly attribute DOMString sessionState;
// noSession, openingSession, acceptingSession,
// inSession, closingSession
attribute Function onSessionStateChange;
boolean sendKeyPress( in DOMString key ); // send DTMF or KPML
attribute Function onReceiveKeyPress;
};]]></artwork>
</figure>
<t></t>
<t>If the session will be able to display video, the DOM object for a
video tag must be provided in the videoPane parameter of the
constructor. If an aorUrl is provided, the session will attempt to
register for incoming calls at the server using the provided
credentials. If an outbound proxy is provided, all signaling for this
session will use that proxy. The progress of the registration can be
tracked with the onRegisterStateChange callback. The registrationState
attribute will be a string with one of the following values:
noRegistrar, registering, registered, or registrationFailed.</t>
<t>Open Issue: need to decide how to handle credentials and if they
will be in the JavaScript. Similar issues for TURN server credentials.
</t>
<t>To make a call, the open session method is called and the session
state will change to "opening session".</t>
<t>Events:</t>
<t>Exceptions:</t>
<section title="Session Example Incoming">
<t>The following HTML snippet would display a video pane with a user
interface such that when the user clicked, it would create an audio
video session by making a SIP call to "sales@example.com".</t>
<figure>
<artwork><![CDATA[
<video width='320' height='240' >
<session src="sip:sales@example.com" >
</video>]]></artwork>
</figure>
</section>
<section title="Session Example Outgoing">
<t>The following HTML snippet would register to receive calls to the
address "sip:fluffy@example.com". Furthermore it would use an
outbound SIP proxy at sip.example.com.</t>
<figure>
<artwork><![CDATA[
<video width='320' height='240' >
<session aor="sip:fluffy@example.com"
credential="password"
proxy="sip:sip.example.com" >
</video>]]></artwork>
</figure>
</section>
</section>
<section title="Connection API">
<t></t>
<figure>
<artwork><![CDATA[
[NoInterfaceObject]
interface IceCandidate {
DOMString foundation;
unsigned short component-id; // always 1 ?
DOMString transport; // udp
unsigned long priority;
DOMString type; // host, srflx, prflx, relay
DOMString addressFamily; // v4 v6
DOMString connectionAddress; // v4 or v6 ip address
unsigned short port;
};
[NoInterfaceObject]
interface IceCandidateList {
IceCandidate candiate[];
DOMString icePassword;
DOMString iceUFragment;
};
[NoInterfaceObject]
interface RelayServer {
DOMString type; // stun turn
};
[NoInterfaceObject]
interface StunServer : RelayServer {
DOMString ip;
};
[NoInterfaceObject]
interface TurnServer : RelayServer {
DOMString ip;
DOMString username;
DOMString password;
};
[Constructor(in optional RelayServer relayServers[])]
interface Connection {
attribute int keepAlivetime; // default 30 seconds
attribute RelayServer relayServers[]
readonly attribute IceCandidateList candidateList;
readonly attribute IceCandidate connectionNearEnd;
readonly attribute IceCandidate connectionFarEnd;
void open( IceCandidateList addressList );
readonly attribute DOMString state;
// creating,ready,connecting,open,closed
attribute Function onready;
attribute Function onopen;
void send(in DOMString data);
attribute Function onmessage; // implements MessageEvent interface
attribute Function onerror;
void close();
attribute Function onclose;
};]]></artwork>
</figure>
<t>The general usage for a browser that had a stun server at 192.0.2.1
would be to create a connection, wait for ICE to gather candidates and
the state to change to ready, then send the ICE candidates list to the
far side as shown in the following code.</t>
<t>Open Issue: Need to add more into this so that an application can
understand what is going on and get information to provide status and
debug problems as well as statistics. Also may need parameters to
change the algorithm. </t>
<figure>
<artwork><![CDATA[
myConn = new Connection( [ {type:"stun",ip:"192.0.2.1"} ] );
myConn.onready = function() {
myCandidates = myConn.candidateList;
// send myCandidates to far side
} ]]></artwork>
</figure>
<t>Open issue: add text around setter calling function if in that
state when set. </t>
<t>Later when the far side has sent its candidate list to this side,
the browser app calls open to start opening the connection to the
other side. Once the connection is open, the browser app can start
sending and receiving data.</t>
<figure>
<artwork><![CDATA[
myConn.open( farSideCandidateList );
myConn.onOpen = function() {
// can start sending data for far side
myConn.send( "Hello" );
}
myConn.onmessage = function(e) {
alert "Received data:" + e.data;
}]]></artwork>
</figure>
</section>
<section title="Audio Video API ">
<t>Note this section is far from complete and is more just a sketch to
get the flavor of the interface.</t>
<figure>
<artwork><![CDATA[
interface Advertisement {
CodecAd codecs[];
boolean rtcpMux; // default true
boolean rtpMux; // default true
boolean srtp; // default true
DOMString protocols[];
// RTP/AVP, RTP/AVPF, UDP/TLS/RTP/SAVP, UDP/TLS/RTP/SAPF
srtpSuites[]; // AES_CM_128_HMAC_SHA1_32
};
interface CodecAd{
string mediaType;
int clockRate;
float minBandwidth; // kbps
float maxBandwidth; // kbps
boolean canReceive;
boolean canSend;
boolean supportDscp;
};
interface TelEventDataCodecAd {
int supportCodes[]; // defaults to 0-11 if not present
};
interface AudioCodecAd : CodecAd {
int maxPacketTime; // ms
};
interface IlbcAudioCodecAd : AudioCodecAd {
int modeList [];
}
interface G729AudioCodecAd : AudioCodecAd {
boolean vadSupported;
};
interface G711uAudioCodecAd : AudioCodecAd {
// G.711 PCMU must be 1 channel at rate of 8000
};
interface G711aAudioCodecAd : AudioCodecAd {
// G.711 PCMA must be 1 channel at rate of 8000
};
interface L16AudioCodecAd : AudioCodecAd {
int rates[];
int channels[];
DOMString emphasis[];
DOMString channel-order[];
};
interface AMRAudioCodecAd : AudioCodecAd {
DOMString modeSet;
// bunch more needed here
};
interface VideoCodecAd : CodecAd {
float maxFramerate; // fps
int clockRate;
int minXsize; int maxXsize;
int minYsize; int maxYsixe;
float minPar; float maxPar; float parList[];
float minSar; float maxSar; float sarList[];
};
interface VP8CodecAd : VideoCodecAd {
int versions[];
};
interface H264CodecAd : VideoCodecAd {
unsigned short profile-levls[];
unsigned short max-recv-level;
int max-mbps;
int max-smbps;
int max-fs;
int max-cpb;
int max-dpb;
int max-br;
boolean redundant-pic-cap;
DOMString sprop-parameter-sets;
DOMString sprop-level-parameter-sets;
boolean use-level-src-parameter-sets;
boolean in-band-parameter-sets;
boolean level-asymmetry-allowed;
int packetization-modes[];
int sprop-interleaving-depth;
int sprop-deint-buf-req;
long deint-buf-cap;
int sprop-init-buf-time;
// int sprop-init-buf-time;
long max-rcmd-nalu-size;
int sar-understood;
int sar-supported;
};
interface Proposal {
StreamProp streams[];
};
interface StreamProp {
string mediaType;
int clockRate;
float minBandwidth; // kbps
float maxBandwidth; // kbps
boolean canReceive; // default true
boolean canSend; // default true
DOMString fingerprint; // RFC4572
int pTime;
DOMString protocol;
// RTP/AVP, RTP/AVPF, UDP/TLS/RTP/SAVP, UDP/TLS/RTP/SAPF
long ssrc;
int dscp;
DOMString srtpSuites;
int srtpKdr;
boolean srtpUnencryptedRtcp;
boolean srtpUnauthenticated;
DOMString srtpFecOrder; //FEC_SRTP, "SRTP_FEC"
int srtpLifetime; // log base 2 of max packets with one kety
DOMString srtpKeys[];
int srtpMki[]; // MKI corresponding to srtpKeys at same index
};
interface VideoProp : StreamProp {
int sizex;
int sizey;
float sar;
float frameRate;
};
interface AudioProp : StreamProp {
int pTime; // ms
};
interface Stats {
StreamStats steam[];
};
interface StreamStats {
// TODO RTCP stats
};
interface AVT {
attribute Connection connection;
readonly attribute Advertisement advertisement;
readonly attribute Advertisement advertisementNoVideo;
attribute DOMObject camera;
attribute DOMObject mic;
attribute HTMLVideoElement videoPane;
readonly attribute Stats stats;
readonly attribute Proposal proposal;
boolean setProposal( Proposal newProp );
};]]></artwork>
</figure>
<t>Using this interface is fairly simple. First an AVT object is
loaded and bound to an existing Connection object. It is also bound to
cameras, microphones, and speakers, Then the current advertisement can
be retried.</t>
<t>Open Issue: The SRTP keying should not be per stream. </t>
<figure>
<artwork><![CDATA[
var myAvt = org.w3c.device.load("device", "AVT", "1");
myAvt.connection = myConn; // the ICE formed connection
myAvt.camera = org.w3c.device.load("device", "camera", "1");
myAvt.mic = org.w3c.device.load("device", "mic", "1");
myAVT.videoPane = document.getElementById("myVideo");
mdAdv = myAvt.advertisement;]]></artwork>
</figure>
<t>Open Issues: What's the best way to get an AVT object? How to get
the other devices and wire them up to the AVT object?</t>
<t>Assume that the browser supports VP8 video at 720P and G.711. The
myAvt object might look like:</t>
<figure>
<artwork><![CDATA[
{
"codecs" : [
{
"mediaType" : "PCMU",
"clockRate": "8000",
"maxPacketTime" : "60"
},
{
"mediaType" : "PCMA",
"clockRate": "8000",
"maxPacketTime" : "60"
},
{
"mediaType" : "VP8",
"clockRate" : "90000",
"maxXsize" : ""1440,
"maxYsize" : "720",
"parList" : [ "1.0" ],
"versions" : [ "1" ]
} ],
"protocols" : ["RTP/AVP", "RTP/AVPF" ]
};]]></artwork>
</figure>
<t>Then, based on some knowledge about what the far end browser
supports, the system would decide that it wants to use PCMU with VP8
at a QCIF resolution and 15fps. After forming a connection to the far
end and waiting for the connection object to be in the ready state, it
would construct the following proposal object and then send that
proposal to the AVT systems as shown in the code below. Assuming the
proposal is acceptable, the setProposal returns true and (returns
false if it is not).</t>
<figure>
<artwork><![CDATA[
var proposal = {
"streams" : [
{
"mediaType" : "VP8",
"clockRate: : "90000",
"protcol" : "RTP/AVP",
"sizex" : "176",
"sizey" : "144",
"sar" : "1.0",
"frameRate" : "15",
"version" : "1"
},
{
"mediaType" : "PCMU",
"clockRate: : "8000",
"pTime" : "20",
"protcol" : "RTP/AVP"
} ]
};
if ( myAvt.setProposal( proposal ) ) {
// it worked
}]]></artwork>
</figure>
</section>
</section>
<section title="IANA Considerations">
<t>This document does not require any action of IANA.</t>
</section>
<section title="Security Considerations">
<section title="Attack Model">
<t>This architecture involves all the normal security consideration
and attack models of HTTP, SIP and RTP but introduces yet another key
issue. The assumption is that a user may browse to the attacker's
website. The other assumption is that the browser is behind a
firewall, and inside that firewall there are devices that would not
have appropriate security models for the internet. For example, there
could be SIP gateways that if sent an invite to call a 1-900 number
would do so with no authentication or authorization. Whatever
HTML/CS/Javascript is downloaded must not be able to send arbitrary
packets to hosts behind the firewall or send SIP or RTP to devices
that do not consent to communicate with the browser.</t>
</section>
<section title="Media Security">
<t>The browser MUST enforce the constraint that no RTP or other media
is sent to a given destination unless that destination completes an
ICE connectivity check and proves it knows the secret generated by the
browser. The browser must keep a list of locations it has attempted to
contact with ICE in the previous 30 seconds and not contact any
locations that have previously failed.</t>
</section>
<section title="Signaling Security">
<t>The browser stops unwanted SIP signaling by using CORS REF. The
same CORS headers used for HTTP will be added to the SIP signaling.
Before the browser sends SIP signaling, it will preflight the SIP
messaging using a SIP OPTIONS message. This is done the same ways CORS
can preflight check an HTTP request.</t>
</section>
</section>
<section title="Legacy VoIP Interoperability">
<t>There is no way to meet all the security requirements and maintain
comparability with all legacy VoIP equipment. This draft tries to
minimize the impedance mismatch. The requirements here would allow
interoperability with legacy VoIP equipment as long as that equipment
either directly supported, or was fronted by an SBC that supported, the
following: SIP CORS extension, ICE or ICE-Lite, codecs from the
mandatory to implement set, supported SIP invites containing an offer,
and supported DTMF over RTP with telephone events.</t>
<t>A substantial fraction of VoIP equipment does all of this except for
the CORS extensions. The item most commonly lacking is ICE-Lite but that
is becoming increasingly prevalent, particularly on devices designed to
sit on the edge of a domain and connect to remote UAs that may be behind
NATs. For an edge device that was willing to receive SIP call from
others, implementing the CORS is pretty trivial. When the UAS receives a
SIP options request with an Origin header, it checks whether the header
field value is on the white list, and if it is then the UAS copies the
value to the Access-Control-Allow-Origin header field value in the
response. For many situations the white list would be everything, while
for others it would be just the list of websites that are expected to
originate calls to this SIP device.</t>
</section>
<section title="Acknowledgement">
<t>Thanks to Joe Hildebrand, Matt Miller, Matthew Kaufman, Eric Rescorla
and Lyndsay Campbell for their review, comments and contributed
ideas.</t>
</section>
</middle>
<back>
<references title="Normative References">
<reference anchor="RFC2119">
<front>
<title abbrev="RFC Key Words">Key words for use in RFCs to Indicate
Requirement Levels</title>
<author fullname="Scott Bradner" initials="S." surname="Bradner">
<organization>Harvard University</organization>
<address>
<postal>
<street>1350 Mass. Ave.</street>
<street>Cambridge</street>
<street>MA 02138</street>
</postal>
<phone>- +1 617 495 3864</phone>
<email>sob@harvard.edu</email>
</address>
</author>
<date month="March" year="1997" />
<area>General</area>
<keyword>keyword</keyword>
<abstract>
<t>In many standards track documents several words are used to
signify the requirements in the specification. These words are
often capitalized. This document defines these words as they
should be interpreted in IETF documents. Authors who follow these
guidelines should incorporate this phrase near the beginning of
their document: <list style="empty">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
"SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119.</t>
</list></t>
<t>Note that the force of these words is modified by the
requirement level of the document in which they are used.</t>
</abstract>
</front>
<seriesInfo name="BCP" value="14" />
<seriesInfo name="RFC" value="2119" />
<format octets="4723"
target="http://www.rfc-editor.org/rfc/rfc2119.txt" type="TXT" />
<format octets="17491"
target="http://xml.resource.org/public/rfc/html/rfc2119.html"
type="HTML" />
<format octets="5777"
target="http://xml.resource.org/public/rfc/xml/rfc2119.xml"
type="XML" />
</reference>
</references>
<references title="Informative References">
<reference anchor="I-D.peterson-sipcore-advprop">
<front>
<title>The Advertisement/Proposal Model of Session
Description</title>
<author fullname="Jon Peterson" initials="J" surname="Peterson">
<organization></organization>
</author>
<author fullname="Cullen Jennings" initials="C" surname="Jennings">
<organization></organization>
</author>
<date day="28" month="February" year="2010" />
<abstract>
<t>In common SIP practice, a two-phase "offer/answer" exchange of
session description documents negotiates preferences, capabilities
and requested sessions. However, the structure of the session
description greatly confuses the disambiguation of these elements
and thus the clear characterization of sessions. The current work
proposes an alternative to the offer/answer model which leverages
pre-association between user agents to recast those two phases
into a less ambiguous form: an Advertisement of capabilities and
preferences which occurs in non-real-time before a session is ever
requested, which is followed during session establishment by a
unidirectional and complete Proposal of a session.</t>
</abstract>
</front>
<seriesInfo name="Internet-Draft"
value="draft-peterson-sipcore-advprop-00" />
<format target="http://www.ietf.org/internet-drafts/draft-peterson-sipcore-advprop-00.txt"
type="TXT" />
</reference>
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 13:15:33 |