Internet Engineering Task Force IMPP WG
Internet Draft Jonathan Rosenberg
Dean Willis
Robert Sparks
Ben Campbell
dynamicsoft
Henning Schulzrinne
Jonathan Lennox
Columbia U.
Bernard Aboba
Christian Huitema
David Gurle
Microsoft
Dave Oran
Cisco
draft-rosenberg-impp-im-00.txt
June 15, 2000
Expires: December, 2000
SIP Extensions for Instant Messaging
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as work in progress.
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document describes a SIP extension that supports Instant
Messaging (IM). IM is supported in SIP with a single new method. We
provide motivations on why SIP is an ideal platform for IM, why IM
should be completely separated from presence, and exactly how to
perform IM with SIP.
1 Introduction
Instant messaging is defined as the exchange of content between a set
of participants in real time. Generally, the content is short textual
messages, although that need not be the case. Generally, the messages
that are exchanged are not stored, but this also need not be the
case. IM differs from email in common usage in that instant messages
are usually grouped together into brief live conversations,
Rosenberg et. al. [Page 1]
Internet Draft IM June 15, 2000
consisting of numerous small messages sent back and forth.
Instant messaging as a service has been in existence within intranets
and IP networks for quite some time. Early implementations include
zephyr [1], the unix talk application, and IRC. More recently, IM has
been used as a service coupled with presence and buddy lists; that
is, when a friend comes online, a user can be made aware of this and
have the option of sending the friend an instant message. The
protocols for accomplishing this are all proprietary, which has
seriously hampered interoperability. Furthermore, most of these
protocols tightly couple presence and IM, due to the way in which the
service is offered.
Despite the popularity of presence coupled IM services, IM is a
separate application from presence. There are many ways to use IM
outside of presence (for example, as part of a voice communications
session). Another example is interactive games (possibly established
with SIP - SIP can establish any type of session, not just voice or
video); IM is already a common component of multiplayer online games.
Keeping it apart from presence means it can be used in such ways.
Furthermore, keeping them separate allows separate providers for IM
and for presence service. Of course, it can always be offered by the
same provider, with both protocols implemented into a single client
application.
In a similar vein, the mechanisms needed in an IM protocol are
very similar to those needed to establish an interactive session -
rapid delivery of small content to a user at their current location,
which may, in general, be dynamically changing as the user moves. The
similarity of needed function implies that an existing solution for
initiation of sessions (namely, the Session Initiation Protocol (SIP)
[2]) is an ideal base on which to build an IM protocol.
2 Motivations for Using SIP
Our first motivation for using SIP as the basis for IM is that the
problems of session initiation and instant messaging are very
similar. When provided independently of presence, the primary
challenge behind providing IM service is to deliver the instant
message to the host where the user is currently available, and if not
available, return an error code indicating such. This is exactly the
same service required for initiation of sessions, as these
invitations must also be delivered to the host where the user is
currently available. The result is that all of the application-layer
routing and personal mobility services provided by SIP are both
needed for, and directly applicable to, the delivery of IMs. In fact, by
defining IM as just a new SIP method, existing SIP proxies can route
IMs without even being aware of this extension.
SIP is a transactional service, consisting of sequences of request-
response transactions within a common context (identified by the
Call-ID). If desired, ordering of transactions can be guaranteed.
This kind of transactional service is also needed for instant
messages.
Instant messages often occur in groups; that is, one party sends an
instant message, and then there is a back and forth of messages that
form a conversation of sorts, where the conversation (aka session)
was effectively initiated by the first message. It is necessary to
provide an identifier to group these instant messages together, so
that each IM can be associated with a particular session. Since SIP
is used for session initiation, the identifiers and tools it provides
for management of the state associated with sessions are directly
applicable to instant messaging.
SIP uses MIME for transport of content. The meaning and purpose of
the content depend on the request method and on the content type.
This means that an IM service based on SIP can transport arbitrary
MIME content, which has been established as a requirement for IM [3].
SIP establishes and controls communications, generally between
humans, and thus provides numerous header fields for identification
of the users involved in the communication. IM is also designed to
enable communications between humans, and thus the same requirements
for user identification are present. The SIP header fields for this
function (To and From) are directly applicable to IM, as are the
authentication tools provided by SIP to verify those identities.
Scale is critical for IM service. Scale is primarily achieved by
removal of state from network elements, and pushing protocol
functions towards the periphery. Based on this, it is highly
desirable for it to be possible for messaging to occur directly
between participants, yet still take advantage of SIP's routing
capabilities to deliver messages. This is easily supported in SIP, as
the same requirement exists for achieving scale of session initiation
- the initial call setup messages go through network servers in order
to be routed properly, but subsequent signaling can occur end to end.
So, we can have the initial IM pass through proxies in order to be
properly routed to the recipient, and subsequent IMs can go direct.
SIP's Record-Route, Route, and Contact headers are used for this
purpose, and are applicable to IM to provide the same function.
Security is critical for IM, as it is for session initiation. SIP's
capabilities of end to end authentication and encryption, coupled
with hop by hop security mechanisms (outside of SIP itself) provide
security for session initiation, and these mechanisms will work for
IM as well.
Finally, and most importantly, both IM and voice/video are part of a
complete communications service. It is likely that many devices will
perform both IM and voice, and that these devices will have limited
memory and processing power. By using the same protocol for both
forms of communications, a reduction in memory requirements through
code reuse is obtained.
Even bigger benefits are gained by providers that wish to offer
voice, video, presence and IM. By having all of these differing
aspects of communications running off the same infrastructure,
providers can realize substantial savings in infrastructure cost,
management cost, and provisioning cost.
Furthermore, by using SIP for both IM and establishment of
communications sessions, services that integrate the two are readily
supported. For example, many IM systems allow an IM session to
transition to voice with a single click. This is trivially done if
SIP is used for IM; all of the information needed to send a SIP
INVITE request directly to the other user has already been obtained
through the IM exchanges. Furthermore, by using the same session
identifiers, the call can be associated with the IM session. This
allows the called party to know that the call was related to a
specific IM exchange. If IM were done with a different protocol, this
integration would not be possible.
For these reasons, we believe SIP is ideal for IM service. Section 6
examines each of the requirements outlined in [3] and demonstrates
how this extension meets those requirements.
3 Terminology
Most of the terminology used here is defined in RFC2778 [4]. However,
we duplicate some of the terminology from SIP in order to clarify
this document:
User Agent (UA): A UA is a piece of software which is capable of
initiating requests, and of responding to requests.
User Agent Server (UAS): A UAS is the component of a UA which
receives requests, and responds to them.
User Agent Client (UAC): A UAC is the component of a UA which
sends requests, and receives responses.
Registrar: A registrar is a SIP server which can receive and
process REGISTER requests. These requests are used to
construct address bindings.
4 Overview of Operation
When one user wishes to send an instant message to another, the
sender formulates a SIP request with a new method, called MESSAGE.
The request URI of this request is a normal SIP URL identifying the
party to whom the message is directed. This request URI is rewritten
by SIP proxies (which are very similar to HTTP proxies) as the
request travels towards the recipient. For example, a request for
sip:joe@example.com will arrive at the example.com server, which
looks up Joe in some corporate database, and then determines that Joe
can be reached internally at sip:joe@engineering.example.com. This
new address is placed in the request URI of the outgoing request, and
sent to the server for engineering.example.com. Since the request URI
is rewritten by proxies, some means is needed to convey the identity
of the original desired recipient. Thus, the sender also places the
URL for the desired recipient in the mandatory To field. The From
field identifies the originator of the message. The message must also
contain a Call-ID. In SIP, the Call-ID is used to associate a group
of requests with the same session. Here, the usage is the same; all
IMs that are part of the same session share the same Call-ID value.
Call-ID has no meaning beyond being a common identifier.
Each IM also carries a CSeq, which is a sequence number plus the name
of the method of the request (the method name is there to support SIP
features not required for IM). The CSeq uniquely identifies each IM
in the session, and increases for each subsequent IM. Each IM also
carries a Via header. Via headers contain a trace of the IP addresses
or FQDNs of the systems that the request traversed. As a request
travels from proxy to proxy towards the recipient, each adds its
address, "pushing" them into a header, much like the operation of a
stack. The stack of addresses is reflected in the response, and each
proxy "pops" the top address off, and uses that to determine where to
send the response. This allows proxies to forward UDP requests
statelessly, so that they need not even remember where the request
came from to forward the response. Finally, clients using this
extension MUST insert a Contact header into the request (Contact is
used for routing of requests in the reverse direction, from the
target of the original message to the initiator of the original
message).
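The Via "stack" behaviour described above can be sketched as follows. This is only an illustration under simplifying assumptions: each Via entry is reduced to a bare host name, and the function names are ours rather than part of SIP.

```python
def proxy_forward_request(via_stack, proxy_host):
    """Return the Via list after a proxy pushes its own address on top."""
    return [proxy_host] + list(via_stack)


def proxy_forward_response(via_stack, proxy_host):
    """Pop this proxy's entry; return (next hop, Via list to forward)."""
    if not via_stack or via_stack[0] != proxy_host:
        raise ValueError("top Via does not name this proxy")
    remaining = list(via_stack[1:])
    if not remaining:
        raise ValueError("no Via entry left to route the response to")
    # The new top entry tells the proxy where to send the response,
    # so no per-request state has to be kept in the proxy.
    return remaining[0], remaining


# The request leaves the sender with one Via; the proxy pushes its own.
request_vias = proxy_forward_request(["user1pc.domain.com"], "proxy.domain.com")
# The response reflects the same stack; the proxy pops its entry.
next_hop, response_vias = proxy_forward_response(request_vias, "proxy.domain.com")
```

Because the response carries everything needed to route it back, the proxy in this sketch holds no state between the two calls.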
The MESSAGE request MAY contain a body. The body contains the message
to be rendered by the recipient. SIP uses the standard MIME headers
(Content-Type, Content-Length, and Content-Encoding) to identify the
content.
The request MAY be sent using UDP or TCP. SIP supports both UDP and
TCP (and even SCTP [5]) transport; over UDP, reliability is guaranteed
and congestion control is provided through a simple retransmission
scheme with exponential backoff. TCP is RECOMMENDED when the
message size exceeds 1184 bytes, in order to avoid fragmentation and
the associated loss exponentiation effect. This means that a TCP
connection may be established for the first large message; it is
RECOMMENDED that the client keep this connection open and use it to
send subsequent messages destined for the same server.
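A minimal sketch of this transport choice follows. The 1184-byte threshold is derived from the overhead figures given in Section 5.4; the function name and the have_open_tcp flag are illustrative assumptions, not protocol elements.

```python
# Overhead figures as given in Section 5.4 of this document.
IPV6_MIN_LENGTH = 1280   # minimum IPv6 guaranteed length
UDP_OVERHEAD = 8
IPSEC_OVERHEAD = 48
ENCAPSULATION_OVERHEAD = 40

MAX_UDP_MESSAGE = (IPV6_MIN_LENGTH - UDP_OVERHEAD
                   - IPSEC_OVERHEAD - ENCAPSULATION_OVERHEAD)


def recommended_transport(message: bytes, have_open_tcp: bool = False) -> str:
    """Pick a transport for one MESSAGE request (illustrative only)."""
    # Reuse a connection already opened towards the same server.
    if have_open_tcp:
        return "TCP"
    # Large messages go over TCP to avoid fragmentation.
    if len(message) > MAX_UDP_MESSAGE:
        return "TCP"
    return "UDP"
```

A client would call recommended_transport() once per outgoing request and remember whether the TCP connection to that server is still open.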
The request MAY be sent to a local outbound proxy (a local outbound
proxy is a device similar to an http proxy; it receives requests
which are not destined for itself, and then forwards them towards the
final destination), or MAY be sent directly to the server in the
domain specified in the request URI. This is identical to baseline
SIP. Local outbound proxies are RECOMMENDED in order to provide
domain-based third party signatures (i.e., re-sign the request with a
key for the entire domain). These proxies SHOULD perform proxy
authentication, verifying the identity of the originator, before re-
signing.
Proxies forward the message according to configured routing logic
combined with DNS SRV record procedures. Pre-established security
associations MAY be used, or SAs MAY be established on demand. The
SAs themselves SHOULD be based on IPSec ESP in transport mode [6] to
provide privacy services for instant messages. Keys for ESP MAY be
established administratively. If administrative keys are not
available, IKE is used for key exchange [7]. If a proxy receives a
request that does not arrive over a SA, it MAY reject the request.
This decision is based on the local security policy of the proxy.
Each proxy adds its address to the Via header as it forwards the
request. Proxies MAY also record route; this means that they can
request to receive all subsequent messages for the same Call-ID. By
not record-routing, proxies will see only the initial request they
forward; all subsequent requests in the same session will bypass the
proxy, and go on a more direct path between the end systems. Record-
routing is done by inserting a header into the forwarded request
(called Record-Route) which contains the address of the proxy. Like
the Via headers, Record-Route has a "stack" property, since proxies
"push" values into the message. The entire Record-Route stack is
reflected in the response to the IM, but unlike Via, no addresses are
"popped" in the response. In this fashion, both sender and recipient
of the IM have a list of the message path for subsequent requests.
This path list is built into a Route header by the end systems, and
placed in subsequent requests. The Route header is like a loose
source route in IP, and specifies the path that the request should
take. Record-routing gives each proxy the capability to independently
decide the right trade off of scale (achieved by not record routing)
and services (generally achieved by record routing). Proxies which
are aware that they are behind a firewall, for example, can record-
route, ensuring that messages from inside to outside always come from
the proxy. Beyond the existence of firewalls, however, we see no
strong reason for proxies to Record-Route instant messages. The
decision, of course, is at the discretion of the administrator.
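The way end systems turn the recorded path into a Route can be sketched as below. We assume, following our reading of baseline SIP [2], that each proxy pushes its address onto the front of Record-Route, that the sender of the initial IM reverses the list it receives in the response, and that the recipient uses the list from the request unchanged; both append the peer's Contact when one is known. The host names p1.example.com and p2.example.com are hypothetical.

```python
def route_for_sender(record_route_in_response, peer_contact=None):
    """Route used by the sender of the initial IM for later requests."""
    # The most recently traversed proxy is first in Record-Route, so the
    # sender reverses the list to put the proxy nearest to itself first.
    route = list(reversed(record_route_in_response))
    if peer_contact:
        route.append(peer_contact)
    return route


def route_for_recipient(record_route_in_request, peer_contact=None):
    """Route used by the recipient of the initial IM for later requests."""
    # The list already starts with the proxy nearest to the recipient.
    route = list(record_route_in_request)
    if peer_contact:
        route.append(peer_contact)
    return route


# p1 is nearest the sender, p2 nearest the recipient; both record-route.
recorded = ["sip:p2.example.com", "sip:p1.example.com"]
sender_route = route_for_sender(recorded, "sip:joe@mypc.engineering.example.com")
recipient_route = route_for_recipient(recorded, "sip:user1@user1pc.domain.com")
# With no Record-Route at all, the path collapses to the Contact alone.
direct_route = route_for_recipient([], "sip:user1@user1pc.domain.com")
```

When no proxy record-routes, the resulting single-entry Route is just the peer's Contact, which is the direct end-to-end path discussed above.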
Proxies MAY have access rules which prohibit the transmission of
instant messages based on certain criteria. Typically, these criteria
will be based on the identity of the sender of the instant messages.
Establishment of these criteria in the proxy is outside the scope of
this extension. We anticipate that such access controls will often be
controlled through web pages accessible by users, mitigating the need
for standardization of a protocol for defining access rules.
Eventually, the request is forwarded to a proxy which is co-located
with a registrar. A registrar is an entity in SIP that has dynamic
application layer routing information. When a client starts up, it
sends the registrar a REGISTER request that binds an address in the
domain of the registrar to the address of the machine it is
running on. Continuing with the example above, the proxy for
engineering.example.com receives the request for Joe. Joe had
formerly registered a binding from sip:joe@engineering.example.com to
sip:joe@mypc.engineering.example.com, which contains the FQDN of the
host Joe is using. In fact, the binding established by a REGISTER can
be one to many, so that a user can indicate the ability to be
contacted at multiple hosts (laptop, PDA, cell phone). The proxy co-
located with the registrar uses this information to forward the
request once more. In fact, the proxy may fork, which means it sends
multiple copies of the request, one to each host in the binding. For
an IM, this means the message can appear at many hosts. So, a user
who has a tool running at work, goes home, and starts a tool there
can receive the IM at *both* machines. Once the user sends an IM
back, future IMs in the same session will be routed only to the
machine where the second IM came from.
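The one-to-many binding and the resulting fork can be sketched as follows. The Registrar class and its method names are illustrative only, and the second contact address (joe@homepc.example.net) is a hypothetical host added for the example.

```python
class Registrar:
    """Toy binding table: address in the registrar's domain -> contacts."""

    def __init__(self):
        self._bindings = {}

    def register(self, address, contact):
        """Record one binding, as a REGISTER request would."""
        contacts = self._bindings.setdefault(address, [])
        if contact not in contacts:
            contacts.append(contact)

    def fork_targets(self, request_uri):
        """Return every contact a co-located proxy would forward to."""
        return list(self._bindings.get(request_uri, []))


registrar = Registrar()
registrar.register("sip:joe@engineering.example.com",
                   "sip:joe@mypc.engineering.example.com")
registrar.register("sip:joe@engineering.example.com",
                   "sip:joe@homepc.example.net")  # hypothetical second host

targets = registrar.fork_targets("sip:joe@engineering.example.com")
unknown = registrar.fork_targets("sip:nobody@engineering.example.com")
```

An empty target list corresponds to the case in which no registration exists for the user, which a proxy answers with a 404 (Not Found).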
Proxies which route messages based on registrations SHOULD
additionally support the "methods" parameter in the caller
preferences specification [8]. This specification allows, among other
things, for clients to indicate in a REGISTER that they would prefer
to receive messages with specific methods. Proxies receiving requests
with a particular method forward it to the contact address which has
indicated it can handle that method. This allows for a user with a
single SIP address to use separate user agents for IM and for other
communications. Alternatively, users can use the same user agents for
both.
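A rough sketch of method-based contact selection is given below. It assumes the "methods" parameter is carried on the registered Contact as a quoted, comma-separated list, as in the REGISTER example of Section 5.7, and that a contact without the parameter accepts any method; that default is our assumption and is not taken from [8]. The phone contact is hypothetical.

```python
def contact_methods(contact):
    """Return the set of methods a contact advertises, or None if unstated."""
    for param in contact.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "methods":
            listed = value.strip().strip('"').split(",")
            # Method names are case sensitive, so they are kept as written.
            return {m.strip() for m in listed if m.strip()}
    return None


def targets_for_method(contacts, method):
    """Select the registered contacts willing to receive this method."""
    chosen = []
    for contact in contacts:
        methods = contact_methods(contact)
        # Assumption: no "methods" parameter means any method is accepted.
        if methods is None or method in methods:
            chosen.append(contact.split(";")[0])
    return chosen


registered = [
    'sip:jdrosen@im-pc.dynamicsoft.com;methods="MESSAGE"',
    'sip:jdrosen@phone.dynamicsoft.com;methods="INVITE,BYE"',  # hypothetical
]
im_targets = targets_for_method(registered, "MESSAGE")
call_targets = targets_for_method(registered, "INVITE")
```

With these two registrations, instant messages and calls for the same SIP address reach different user agents.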
If the user agent (UA) does not support IM, the MESSAGE method will
be unknown to it, and it will generate a 405 (Method Not Allowed)
response, and list the methods that are allowed. Similarly, if a user
agent only supports IM (that is, it only does instant messaging), it
rejects other requests, like INVITE, with a 405 and lists MESSAGE as
the only method in the Allow header.
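The method check described here can be sketched in a few lines. The function name and the (status, headers) return shape are illustrative choices, not part of SIP.

```python
def screen_method(method, supported_methods):
    """Return None if the method is supported, else a 405 with Allow."""
    if method in supported_methods:
        return None
    # List the methods that are allowed so the sender can adapt.
    return 405, {"Allow": ", ".join(supported_methods)}


# An IM-only user agent rejects INVITE and lists MESSAGE as allowed.
im_only_reply = screen_method("INVITE", ["MESSAGE"])
# A user agent without IM support rejects MESSAGE the same way.
no_im_reply = screen_method("MESSAGE", ["INVITE", "ACK", "BYE"])
```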
It is RECOMMENDED that a user agent place an Allow header in a
response to an INVITE, indicating its support for MESSAGE. This
allows a UA to "grey out" the IM button that would allow message
exchanges during a multimedia session.
Finally, if the message is received correctly by the user agent
(independently of whether the principal (i.e., the user) has read it),
a 200 OK response is generated, and forwarded back towards the
sender.
It is worth noting that of all the described mechanisms above,
everything is already specified by SIP, excepting the new MESSAGE
method, and some minor handling rules (namely, Contact MAY be left
out of a 200 OK to a MESSAGE request) to enable the forking of
MESSAGE. Furthermore, the above describes the majority of the SIP
capabilities needed for IM. Section 7 more fully indicates the
components of SIP that are needed, and not needed, for IM.
4.1 Message flow
An example message flow is shown in Figure 1. The message flow shows
an initial IM sent from User 1 to User 2, both users in the same
domain, "domain", through a single proxy. A second IM, sent in
response, flows directly from User 2 to User 1.
Message F1 looks like:
MESSAGE sip:user2@domain.com SIP/2.0
Via: SIP/2.0/UDP user1pc.domain.com
From: sip:user1@domain.com
To: sip:user2@domain.com
Contact: sip:user1@user1pc.domain.com
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 18
Watson, come here.
      |     F1 MESSAGE      |                       |
      |-------------------->|      F2 MESSAGE       |
      |                     |---------------------->|
      |                     |                       |
      |                     |      F3 200 OK        |
      |                     |<----------------------|
      |     F4 200 OK       |                       |
      |<--------------------|                       |
      |                     |                       |
      |                     |      F5 MESSAGE       |
      |<--------------------|-----------------------|
      |                     |                       |
      |     F6 200 OK       |                       |
      |---------------------|---------------------->|
      |                     |                       |
   User 1                 Proxy                  User 2

Figure 1: Example Message Flow

User1 forwards this message to the server for domain.com (discovered
through a combination of SRV and A record processing specified in
SIP), using UDP. The proxy receives this request, and recognizes that
it is the server for domain.com. It looks up user2 in its database
(built up through registrations), and finds a binding from
sip:user2@domain.com to sip:user2@user2pc.domain.com. It forwards the
request to user2, and does not insert the Record-Route header. The
resulting message, F2, looks like:
MESSAGE sip:user2@user2pc.domain.com SIP/2.0
Via: SIP/2.0/UDP proxy.domain.com
Via: SIP/2.0/UDP user1pc.domain.com
From: sip:user1@domain.com
To: sip:user2@domain.com
Contact: sip:user1@user1pc.domain.com
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 18
Watson, come here.
The message is received by user2, displayed, and a response is
generated, message F3, and sent to the proxy:
SIP/2.0 200 OK
Via: SIP/2.0/UDP proxy.domain.com
Via: SIP/2.0/UDP user1pc.domain.com
From: sip:user1@domain.com
To: sip:user2@domain.com
Contact: sip:user2@user2pc.domain.com
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Length: 0
Note that most of the header fields are simply reflected in the
response. The proxy receives this response, strips off the top Via,
and forwards to the address in the next Via, user1pc.domain.com, the
result being message F4:
SIP/2.0 200 OK
Via: SIP/2.0/UDP user1pc.domain.com
From: sip:user1@domain.com
To: sip:user2@domain.com
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Length: 0
Now, user2 wishes to send an IM to user1, message F5. As there are no
Record-Routes in the original IM, it can simply send the IM directly
to the address in the Contact header. Note how the To and From fields
are now reversed from the response it sent in message F3:
MESSAGE sip:user1@user1pc.domain.com SIP/2.0
Via: SIP/2.0/UDP user2pc.domain.com
To: sip:user1@domain.com
From: sip:user2@domain.com;tag=ab8asdasd9
Contact: sip:user2@user2pc.domain.com
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Type: text/plain
Content-Length: 29
My name is User2, not Watson.
This is sent directly to user1, who responds with a 200 OK in message
F6:
SIP/2.0 200 OK
Via: SIP/2.0/UDP user2pc.domain.com
To: sip:user1@domain.com
From: sip:user2@domain.com;tag=ab8asdasd9
Call-ID: asd88asd77a@1.2.3.4
CSeq: 1 MESSAGE
Content-Length: 0
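As an illustration of how a client might assemble a request such as F1, the sketch below builds the message text and computes Content-Length from the body. The function name and argument list are our own, the headers are limited to those shown in the example flow, and the body is assumed to be plain ASCII text.

```python
def build_message(request_uri, via_host, from_uri, to_uri,
                  contact, call_id, cseq, text):
    """Assemble a MESSAGE request carrying a text/plain body."""
    body = text.encode("ascii")
    lines = [
        f"MESSAGE {request_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}",
        f"From: {from_uri}",
        f"To: {to_uri}",
        f"Contact: {contact}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} MESSAGE",
        "Content-Type: text/plain",
        # Content-Length counts the bytes of the body only.
        f"Content-Length: {len(body)}",
    ]
    # Headers end with an empty line; the body follows immediately.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


f1 = build_message(
    request_uri="sip:user2@domain.com",
    via_host="user1pc.domain.com",
    from_uri="sip:user1@domain.com",
    to_uri="sip:user2@domain.com",
    contact="sip:user1@user1pc.domain.com",
    call_id="asd88asd77a@1.2.3.4",
    cseq=1,
    text="Watson, come here.",
)
```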
5 Detailed Operation
This section more formally defines the syntax and semantics of this
extension.
5.1 Method Definition
This specification defines a new SIP method, MESSAGE. The BNF for
this method is:
Message = "MESSAGE"
As with all other methods, the MESSAGE method name is case sensitive.
Tables 1 and 2 extend Tables 4 and 5 of SIP by adding an additional
column, defining the headers that can be used in MESSAGE requests and
responses.
                    where   enc.  e-e  MESSAGE
______________________________________________
Accept              R             e    o
Accept              415           e    o
Accept-Encoding     R             e    o
Accept-Encoding     415           e    o
Accept-Language     R             e    o
Accept-Language     415           e    o
Allow               200           e    o
Allow               405           e    m
Authorization       R             e    o
Authorization       r             e    o
Call-ID             gc      n     e    m
Contact             R             e    m
Contact             2xx           e    o
Contact             3xx           e    o
Contact             485           e    o
Content-Encoding    e             e    o
Content-Length      e             e    m
Content-Type        e             e    *
CSeq                gc      n     e    m
Date                g             e    o
Encryption          g       n     e    o
Expires             g             e    o
From                gc      n     e    m
Hide                R       n     h    o
Max-Forwards        R       n     e    o
Organization        g       c     h    o
Table 1: Summary of header fields, A--O
                     where            enc.  e-e  MESSAGE
________________________________________________________
Priority             R                c     e    o
Proxy-Authenticate   407              n     h    o
Proxy-Authorization  R                n     h    o
Proxy-Require        R                n     h    o
Record-Route         R                      h    o
Record-Route         2xx,401,484            h    o
Require              R                      e    o
Retry-After          R                c     e    -
Retry-After          404,413,480,486  c     e    o
                     500,503          c     e    o
                     600,603          c     e    o
Response-Key         R                c     e    o
Route                R                      h    o
Server               r                c     e    o
Subject              R                c     e    o
Timestamp            g                      e    o
To                   gc(1)            n     e    m
Unsupported          420                    e    o
User-Agent           g                c     e    o
Via                  gc(2)            n     e    m
Warning              r                      e    o
WWW-Authenticate     R                c     e    o
WWW-Authenticate     401              c     e    o
Table 2: Summary of header fields, P--Z; (1): copied with possible
addition of tag; (2): UAS removes first Via header field
5.2 UAC processing of initial MESSAGE request
A MESSAGE request MUST contain a To, From, Call-ID, CSeq, Via,
Content-Length, and Contact header, formatted as specified in [2].
All UAs MUST be prepared to send and receive MESSAGE requests with a
body of type text/plain. MESSAGE requests MAY contain an Accept
header listing the allowable MIME types which may be sent in the
response, or in subsequent requests in the reverse direction. The
absence of the Accept header implies that the only allowed MIME type
is text/plain. This simplifies operation in small devices, such as
wireless appliances, which will generally only have support for text,
but still allows any other MIME type to be used if both sides support
it. Note that multipart may be useful for IM as well; implementations
are encouraged to support multipart if possible.
MESSAGE requests MAY contain a Subject header indicating the subject
of the IM session.
As a nice implementation feature, the subject can be
displayed on the title bar of the window which contains the
text of the IM exchange.
A UAC MAY send a MESSAGE request for an existing call, established
with an INVITE. In this case, the MESSAGE request is processed
identically to the INFO method [9]. The only difference is that a
MESSAGE request is assumed to be for the purpose of instant messaging
as part of the call, whereas INFO is less specific.
Also note that it is still possible for a user to maintain separate
IM and voice/video clients, yet still receive an IM for an existing
call (the IM is delivered to the IM client, of course).
5.3 Proxy processing of MESSAGE requests
Proxies route requests with method MESSAGE the same as they would any
other SIP request (proxy routing in SIP does not depend on the
method). Note that the MESSAGE request MAY fork; this allows for
delivery of the message to several possible terminals where the user
might be.
If a MESSAGE request hits a proxy that uses registrations to route
requests, but no registration exists for the target user in the
request-URI, the request is rejected with a 404 (Not Found). This is
standard behavior for SIP.
It is RECOMMENDED that proxies always insert Record-Route into every
request, as specified in [10].
5.4 UAS processing of MESSAGE requests
As specified in RFC 2543, if a UAS receives a request with a body of
type it does not understand, it MUST respond with a 415 (Unsupported
Media Type) containing an Accept header listing those types which are
acceptable.
Servers MAY reject requests (using a 413 response code) that are too
long, where too long is a matter of local configuration. All servers
MUST accept requests which are up to 1184 bytes in length.
1184 = minimum IPv6 guaranteed length (1280 bytes) minus
UDP (8 bytes) minus IPSEC (48 bytes) minus layer one
encapsulation (40 bytes).
A UAS receiving a MESSAGE request SHOULD respond with a final
response immediately. A 200 OK is sent if the request is acceptable.
Note, however, that the UAS is not obliged to display the message to
the user either before or after responding with a 200 OK. A 200 class
response to a MESSAGE request MAY contain a body, but this will often
not be the case, since these responses are generated automatically.
Like any other SIP request, an IM MAY be redirected, or otherwise
responded to with any SIP response code. Note that a 200 OK response
to a MESSAGE request does not mean the user has read the message.
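The checks a UAS applies before answering can be sketched as below. This is a simplified illustration: the function name, the (status, headers) return shape, and the order of the two checks are our assumptions, and a missing Content-Type is treated as a request without a body.

```python
MIN_ACCEPTED_LENGTH = 1184  # every server must accept requests this long


def uas_response(content_type, request_length,
                 supported_types=("text/plain",),
                 max_length=MIN_ACCEPTED_LENGTH):
    """Choose the final response for a received MESSAGE request."""
    if max_length < MIN_ACCEPTED_LENGTH:
        raise ValueError("local limit must be at least 1184 bytes")
    # "Too long" is a matter of local configuration.
    if request_length > max_length:
        return 413, {}
    # Unknown body types are rejected, listing what is acceptable.
    if content_type is not None and content_type not in supported_types:
        return 415, {"Accept": ", ".join(supported_types)}
    # 200 OK means the request was accepted, not that the user has read it.
    return 200, {}
```

For instance, uas_response("text/html", 300) yields a 415 whose Accept header lists text/plain, while uas_response("text/plain", 300) yields a 200.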
A UAS MAY include a Contact in a 200 class response. Including a
Contact header enables end to end messaging, which is good for
efficiency. However, it rules out the possibility of effectively
supporting more than one terminal which can handle IM simultaneously.
This odd but seemingly innocuous requirement enables a very
important feature. If a user is connected at several hosts,
an initial IM will fork, and arrive at each. Each UAS
responds with a 200 OK immediately, one of which is
arbitrarily forwarded upstream towards the UAC. If another
IM is sent for the same call-leg, we still wish for this IM
to fork, since we still don't know where the user is
currently residing. This information is known when the user
sends an IM in the reverse direction. This IM will contain
a Contact, and when it arrives at the originator of the
initial MESSAGE, will update the Route so that now IMs are
delivered only to that one host where the user is residing.
A UAS constructs a set of Route headers from the Record-Route and
Contact headers in the MESSAGE request, as per the procedure defined
in [10].
A UAS which is, in fact, a message relay, storing the message and
forwarding it later on, or forwarding it into a non-SIP domain,
SHOULD return a 202 (Accepted) response indicating that the message
was accepted, but end to end delivery has not been guaranteed.
5.5 UAC processing of initial MESSAGE response
A 200 OK response to an initial IM will contain Record-Route headers;
these MUST be used to construct a Route header for use in subsequent
requests for the same call-leg (defined as the combination of remote
address, local address, and Call-ID), using the process described in
Section 6.29 of SIP [2] as if the request were INVITE. Note that the
200 OK response may not contain a Contact header.
A 400 or 500 class response indicates that the message was not
delivered successfully. A 600 response means it was delivered
successfully, but refused.
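How a sender might interpret the response class is sketched below, combining the rules above with the 202 (Accepted) case from Section 5.4. The outcome labels are illustrative names of our own choosing.

```python
def delivery_outcome(status: int) -> str:
    """Map a SIP status code for a MESSAGE request to a delivery outcome."""
    if not 100 <= status <= 699:
        raise ValueError("not a SIP status code")
    if status < 200:
        return "pending"              # provisional, no final answer yet
    if status == 202:
        return "accepted-by-relay"    # end to end delivery not guaranteed
    if status < 300:
        return "delivered"            # received; not necessarily read
    if status < 400:
        return "redirected"
    if status < 600:
        return "not-delivered"        # 400 and 500 class responses
    return "refused"                  # 600 class: delivered, but refused
```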
5.6 Subsequent MESSAGE requests
Subsequent messages follow the path established by the Route headers
computed by the UA. The CSeq header MUST be larger than a CSeq header
used in a previous request for the same call leg. It is strongly
RECOMMENDED that the CSeq number be computed as described in Section
6.17 of SIP, using a clock. This allows for the CSeq to increment
without requiring the UA to store the previous CSeq values.
MESSAGE requests for an established IM session MUST contain a tag in
the From field. Responses to an IM SHOULD contain a tag in the To
field.
For SIP experts - this represents a slightly different
operation than for INVITE. When a user sends an INVITE,
they will receive a 200 OK with a tag. Requests in the
reverse direction then contain that tag, and that tag only,
in the From field. Here, the response to IM will contain a
tag in the To field, and a MESSAGE will contain a tag in
the From field. However, the UA may receive MESSAGE
requests with tags in the From field that do not match the
tag in the 200 OK received to the initial IM. This is
because only a single 200 OK is returned to a MESSAGE
request, as opposed to multiple 200 OK for INVITE. Thus,
the UA MUST be prepared to receive MESSAGEs with many
different tags, each from a different PUA.
A UAS MUST be prepared to update the Route it has stored for an IM
session with a Contact received in a request, if that Contact is
different from one previously received, or if there was no Contact
previously.
Note that an IM effectively initiates a session. There is state at the
UA associated with that session, encapsulated in the Call-ID, Route
headers, and CSeq numbers. A UA MAY terminate this session at any
time, including after each MESSAGE. No messaging is required to
terminate it. Any associated state with the session is simply
discarded. The idempotency of SIP requests will ensure that if one
side (side A) discards session state, and the other (side B) does
not, a message from side B will appear as a new IM, and standard
processing will reconstitute the session on side A.
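The session handling described in this section can be sketched as a table keyed by call-leg. The class and method names below are hypothetical, and the stored Route is reduced to a single Contact value for brevity. The sketch shows three behaviors from the text: a MESSAGE for an unknown call-leg creates state, a new or changed Contact replaces the stored one, and discarded state is rebuilt by the next MESSAGE.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (remote address, local address, Call-ID), as the call-leg is defined above
CallLeg = Tuple[str, str, str]


@dataclass
class ImSession:
    contact: Optional[str] = None  # where subsequent requests are sent
    highest_cseq: int = -1


class SessionTable:
    def __init__(self) -> None:
        self._sessions: Dict[CallLeg, ImSession] = {}

    def on_message(self, leg: CallLeg, cseq: int, contact: Optional[str]) -> bool:
        """Process a received MESSAGE; return True if session state was created."""
        created = leg not in self._sessions
        session = self._sessions.setdefault(leg, ImSession())
        # Update the stored Contact if one arrives and differs from the old one.
        if contact is not None and contact != session.contact:
            session.contact = contact
        session.highest_cseq = max(session.highest_cseq, cseq)
        return created

    def discard(self, leg: CallLeg) -> None:
        """Terminate the session locally; no signaling is involved."""
        self._sessions.pop(leg, None)

    def contact_for(self, leg: CallLeg) -> Optional[str]:
        session = self._sessions.get(leg)
        return session.contact if session else None
```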
5.7 Caller Preferences
User agents SHOULD add the "methods" tag defined in the caller
preference specification [8] to Contact headers placed in REGISTER
requests, indicating support for the MESSAGE method. Other elements
of caller preferences MAY be supported. For example:
REGISTER sip:dynamicsoft.com SIP/2.0
Via: SIP/2.0/UDP mypc.dynamicsoft.com
To: sip:jdrosen@dynamicsoft.com
From: sip:jdrosen@dynamicsoft.com
Call-ID: asidhasd@1.2.3.4
CSeq: 39 REGISTER
Contact: sip:jdrosen@im-pc.dynamicsoft.com;methods="MESSAGE"
Content-Length: 0
Registrar/proxies which wish to offer IM service SHOULD implement the
proxy processing defined in the caller preferences specification [8].
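As a rough illustration of how a registrar or proxy might inspect the "methods" tag in a registered Contact, consider the sketch below. It assumes the tag appears as a quoted parameter, as in the REGISTER example above, and that multiple methods would be comma-separated; the second point is an assumption of this sketch, and the function names are hypothetical. The caller preferences specification [8] remains the authority on the actual syntax.

```python
import re
from typing import List

# Matches ;methods="..." anywhere in the Contact value (assumed syntax).
_METHODS_PARAM = re.compile(r';methods="([^"]*)"', re.IGNORECASE)


def contact_methods(contact_value: str) -> List[str]:
    """Return the methods listed in a Contact value, or [] if none are given."""
    match = _METHODS_PARAM.search(contact_value)
    if match is None:
        return []
    return [m.strip().upper() for m in match.group(1).split(",") if m.strip()]


def supports_message(contact_value: str) -> bool:
    """True if the Contact explicitly advertises the MESSAGE method."""
    return "MESSAGE" in contact_methods(contact_value)
```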
5.8 Security
SIP provides numerous security mechanisms which can be utilized for
instant messaging services.
5.8.1 Privacy
In order to provide privacy of instant messages, it is RECOMMENDED
that between network servers (proxies to proxies, proxies to redirect
servers), transport mode ESP [6] is used to encrypt the entire
message. TLS MAY be used instead. Coupled with persistent connections
between users, it is impossible for eavesdroppers on non-UA
connections to determine when a particular user has even sent an IM,
let alone what the content is. Of course, the content of IMs is
exposed to proxies.
Between a UAC and its local proxy, TLS [11] is RECOMMENDED.
Similarly, TLS SHOULD be used between a proxy and the UAS receiving
the IM. The proxy can determine whether TLS is supported by the
receiving client based on the transport parameter in the Contact
header of its registration. If that registration contains the token
"tls" as transport, it implies that the UAS supports TLS.
Furthermore, we allow for the Contact header in the MESSAGE request
to contain TLS as a transport. The Contact header is used to route
subsequent messages between a pair of entities. It defines the
address and transport used to communicate with the user agent for
subsequent requests in the reverse direction. If no proxies insert
Record-Route headers, the recipient of the original IM, when it
wishes to send an IM back, will use the Contact header, and establish
a direct TLS connection for the remainder of the IM communications.
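The transport check described above, in which a proxy or peer looks for the token "tls" in the transport parameter of a Contact, might look like the following sketch. It assumes the Contact is given as a bare URI with semicolon-separated parameters (no angle brackets or display name); the function names are hypothetical.

```python
def transport_of(contact_uri: str) -> str:
    """Return the lower-cased transport parameter of a bare URI, or ''."""
    for param in contact_uri.split(";")[1:]:  # skip the part before the first ';'
        name, _, value = param.partition("=")
        if name.strip().lower() == "transport":
            return value.strip().lower()
    return ""


def supports_tls(contact_uri: str) -> bool:
    """True if the Contact advertises TLS as its transport."""
    return transport_of(contact_uri) == "tls"
```

A sender would then choose a TLS connection when `supports_tls(contact)` holds and fall back to its default transport otherwise.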
If a proxy does Record-Route, the situation is different. When the
recipient of the original IM (call this participant B) sends an IM
back to the originator of the original IM (call this participant A),
this will be sent to the proxy closest to B which inserted Record-
Route. This proxy, in turn, sends the request to the proxy before it
which Record-Routed. The first proxy after A which inserted Record-
Route will then use TLS to contact A. Since we suspect that most
proxies will not insert Record-Route into instant messages,
efficient, secure, direct IM will occur frequently.
To prevent sensitive data from being observed by intermediate
proxies, SIP encryption MAY be used end to end for the transmission
of MESSAGE requests. SIP supports PGP based encryption, which does
not require the establishment of a session key for encryption of
messages within a session (basically, a new session key is
established for each message as part of the PGP encryption). Other
encryption mechanisms, such as S/MIME, can be readily defined for
SIP.
5.8.2 Message Integrity and Authenticity
It is important for the message recipient to ensure that the message
contents are actually what was sent by the originator, and that the
recipient of the IM be able to determine who the originator really
is. This is supported in SIP through end to end authentication and
message integrity. SIP provides PGP based authentication and
integrity (both challenge-response and normal signatures), as well as
HTTP basic and digest authentication.
5.8.3 Outbound authentication
When local proxies are used for transmission of outbound messages,
proxy authentication is RECOMMENDED. This is useful to verify the
identity of the originator, and prevent spoofing and spamming at the
originating network.
5.8.4 Replay Prevention
To prevent the replay of old instant messages, all signed MESSAGE
requests and responses SHOULD contain a Date header covered by the
message signature. Any message with a date older than several minutes
in the past, or which is more than several minutes in the future,
SHOULD be answered with a 400 (Incorrect Date or Time) message,
unless such messages arrive repeatedly from the same source, in which
case they MAY be discarded without sending a response. Obviously,
this replay attack prevention mechanism does not work for devices
without clocks.
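The Date check above can be sketched as follows. The text says only "several minutes", so the five-minute window is an assumption of this sketch, as are the names `MAX_SKEW` and `date_is_fresh`. A header that cannot be parsed is treated as stale.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_SKEW = timedelta(minutes=5)  # assumed value for "several minutes"


def date_is_fresh(date_header: str, now: Optional[datetime] = None) -> bool:
    """True if the Date header is within MAX_SKEW of the local clock."""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # unparseable Date header
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # treat a missing zone as UTC
    # Reject dates too far in the past or too far in the future.
    return abs(now - sent) <= MAX_SKEW
```

A receiver would answer with the 400 response described above when `date_is_fresh` returns False.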
Furthermore, all signed MESSAGE requests MUST contain a Call-ID and
CSeq header covered by the message signature. A user agent MAY store
a list of Call-ID values, and for each, the highest CSeq seen within
that Call-ID. Any message that arrives for a Call-ID that exists,
whose CSeq is lower than the highest seen so far, is discarded.
Finally, challenge-response authentication MAY be used to prevent
replay attacks.
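The Call-ID and CSeq bookkeeping described above can be sketched as a small table. The class name `ReplayGuard` is hypothetical. Following the wording of the text, only a CSeq strictly lower than the highest seen is discarded, so a retransmission carrying the same CSeq is still accepted.

```python
from typing import Dict


class ReplayGuard:
    """Track the highest CSeq seen for each Call-ID."""

    def __init__(self) -> None:
        self._highest: Dict[str, int] = {}

    def accept(self, call_id: str, cseq: int) -> bool:
        """Return False if the message should be discarded as a replay."""
        highest = self._highest.get(call_id)
        if highest is not None and cseq < highest:
            return False  # older than something already seen for this Call-ID
        self._highest[call_id] = cseq  # cseq >= highest (or first sighting)
        return True
```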
6 Requirements Evaluation
RFC 2779 [3] outlines requirements for IM and presence protocols. The
document describes both shared requirements and IM and presence
specific requirements. Examining each of the IM requirements in turn,
we also observe that they are met by this proposal:
"Requirement 2.1.1: The protocols MUST allow a PRESENCE SERVICE
to be available independent of whether an INSTANT MESSAGE
SERVICE is available, and vice-versa." This requirement is
met by the separation of presence and IM which we propose
here.
"Requirement 2.1.2. The protocols must not assume that an
INSTANT INBOX is necessarily reached by the same IDENTIFIER
as that of a PRESENTITY. Specifically, the protocols must
assume that some INSTANT INBOXes may have no associated
PRESENTITIES, and vice versa." This requirement is also
easily met by any architecture which completely separates
IM and presence as we propose.
"Requirement 2.1.3. The protocols MUST also allow an INSTANT
INBOX to be reached via the same IDENTIFIER as the
IDENTIFIER of some PRESENTITY." Same as above.
"Requirement 2.1.4. The administration and naming of ENTITIES
within a given DOMAIN MUST be able to operate independently
of actions in any other DOMAIN." This requirement is met by
SIP. SIP uses email-like identifiers which consist of a
user name at a domain. Administration of user names is done
completely within the domain, and these user names have no
defined rules or organization that needs to be known
outside of the domain in order for SIP to operate.
"Requirement 2.1.5. The protocol MUST allow for an arbitrary
number of DOMAINS within the NAMESPACE." This requirement
is met by SIP. SIP uses standard DNS domains, which are not
restricted in number.
"Requirement 2.2.1. It MUST be possible for ENTITIES in one
DOMAIN to interoperate with ENTITIES in another DOMAIN,
without the DOMAINS having previously been aware of each
other." This requirement is met by SIP, as it is essential
for establishing sessions as well. DNS SRV [12] records are
used to discover servers for a particular service within a
domain. They are a generalization of MX records, used for
email routing. SIP defines procedures for usage of DNS
records to find servers in other domains, which include
SRV lookups. This allows domains to communicate without
prior setup.
"Requirement 2.2.2: The protocol MUST be capable of meeting its
other functional and performance requirements even when
there are millions of ENTITIES within a single DOMAIN."
Whilst it is hard to judge whether this can be met by
examining the architecture of a protocol, SIP has numerous
mechanisms for achieving large scales of users within a
domain. It allows hierarchies of servers, whereby the
namespace can be partitioned among servers. Servers near
the top of the hierarchy, used solely for routing, can be
stateless, providing excellent scale.
"Requirement 2.2.3: The protocol MUST be capable of meeting its
other functional and performance requirements when there
are millions of DOMAINS within the single NAMESPACE." The
usage of DNS for dividing the namespace into domains
provides the same scale as today's email systems, which
support millions of DOMAINS.
"Requirement 2.3.5: The PRINCIPAL controlling an INSTANT INBOX
MUST be able to control which other PRINCIPALS, if any, can
send INSTANT MESSAGES to that INSTANT INBOX." This is
provided by access control mechanisms, outside the scope of
this extension.
"Requirement 2.3.6: The PRINCIPAL controlling an INSTANT INBOX
MUST be able to control which other PRINCIPALS, if any, can
read INSTANT MESSAGES from that INSTANT INBOX." This is
accomplished through authenticated registration requests.
Registrations are used to determine which user gets
delivered an instant message. Policy in proxies can allow
only certain users to register contact addresses for a
particular inbox (an inbox is defined by the address-of-
record in the To field in the registration).
"Requirement 2.4.3: The protocol MUST allow the sending of an
INSTANT MESSAGE both directly and via intermediaries, such
as PROXIES." This is fundamental to the operation of SIP.
"Requirement 2.4.4: The protocol proxying facilities and
transport practices MUST allow ADMINISTRATORS ways to
enable and disable protocol activity through existing and
commonly-deployed FIREWALLS. The protocol MUST specify how
it can be effectively filtered by such FIREWALLS." Although
SIP itself runs on port 5060 by default, any other port can
be used. It is simple to specify that IM should run on a
different port, if so desired.
"Requirement 2.5.1. The protocol MUST provide means to ensure
confidence that a received message (NOTIFICATION or INSTANT
MESSAGE) has not been corrupted or tampered with." This is
supported by SIP's PGP and S/MIME authentication mechanisms.
"Requirement 2.5.2. The protocol MUST provide means to ensure
confidence that a received message (NOTIFICATION or INSTANT
MESSAGE) has not been recorded and played back by an
adversary." This is provided by SIP's challenge response
authentication mechanisms, through timestamp-based replay
prevention, or through stateful storage of previous
transaction identifiers (the combination of To, From,
Call-ID, CSeq).
"Requirement 2.5.3. The protocol MUST provide means to ensure
that a sent message (NOTIFICATION or INSTANT MESSAGE) is
only readable by ENTITIES that the sender allows." This is
supported through SIP's end to end and hop by hop encryption
mechanisms.
"Requirement 2.5.4. The protocol MUST allow any client to use
the means to ensure non-corruption, non-playback, and
privacy, but the protocol MUST NOT require that all clients
use these means at all times." All algorithms for security
in SIP are optional.
"Requirement 4.1.1. All ENTITIES sending and receiving INSTANT
MESSAGES MUST implement at least a common base format for
INSTANT MESSAGES." We specify text/plain here.
"Requirement 4.1.2. The common base format for an INSTANT
MESSAGE MUST identify the sender and intended recipient."
This is accomplished with the To and From fields in SIP.
"Requirement 4.1.3. The common message format MUST include a
return address for the receiver to reply to the sender with
another INSTANT MESSAGE." This is done through the Contact
headers defined in SIP.
"Requirement 4.1.4. The common message format SHOULD include
standard forms of addresses or contact means for media
other than INSTANT MESSAGES, such as telephone numbers or
email addresses." SIP supports any URL format in the
Contact headers. Furthermore, the body of a MESSAGE request
can be multipart, and contain things like vCards.
"Requirement 4.1.5. The common message format MUST permit the
encoding and identification of the message payload to allow
for non-ASCII or encrypted content." MIME content labeling
is used in SIP.
"Requirement 4.1.6. The protocol must reflect best current
practices related to internationalization." SIP uses UTF-8
and is completely internationalized.
"Requirement 4.1.7. The protocol must reflect best current
practices related to accessibility." Additional
requirements are needed on what is required for
accessibility.
"Requirement 4.1.9. The working group MUST determine whether the
common message format includes fields for numbering or
identifying messages. If there are such fields, the working
group MUST define the scope within which such identifiers
are unique and the acceptable means of generating such
identifiers." This is done with the combination of Call-ID
and CSeq. The mechanisms for guaranteeing uniqueness are
specified in SIP.
"Requirement 4.1.10. The common message format SHOULD be based
on IETF-standard MIME [RFC 2045]." SIP uses MIME.
"Requirement 4.2.1. The protocol MUST include mechanisms so that
a sender can be informed of the SUCCESSFUL DELIVERY of an
INSTANT MESSAGE or reasons for failure. The working group
must determine what mechanisms apply when final delivery
status is unknown, such as when a message is relayed to
non-IMPP systems." SIP specifies notification of successful
delivery through 200 OK. When requests are delivered through
gateways, success can be indicated only through the SIP
component (if the gateway acts as a UAS/UAC) or through the
entire system (if it acts like a proxy).
"Requirement 4.3.1. The transport of INSTANT MESSAGES MUST be
sufficiently rapid to allow for comfortable conversational
exchanges of short messages." The support for end to end
messaging (i.e., without intervening proxies) allows IMs to
be delivered as rapidly as possible. The UDP reliability
mechanisms also support fast recovery from loss.
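The DNS SRV discovery mentioned under Requirement 2.2.1 leaves a client with a set of SRV records to try. The sketch below orders such records in the spirit of DNS SRV [12]: lower priority values first, with a weighted random choice among records of equal priority. It is a simplified illustration, not a complete resolver; the record type, the function name, and the example host names are hypothetical, and the records are assumed to have been fetched already.

```python
import random
from typing import Iterable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: Iterable[SrvRecord],
              rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return the records in the order a client would try them."""
    records = list(records)
    rng = rng or random.Random()
    ordered: List[SrvRecord] = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records sort first so they are rarely picked early.
        pool = sorted((r for r in records if r.priority == priority),
                      key=lambda r: r.weight)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)
            running = 0
            for index, record in enumerate(pool):
                running += record.weight
                if running >= pick:  # first record whose running sum covers pick
                    ordered.append(pool.pop(index))
                    break
    return ordered
```

A client would attempt the first record in the returned list and move to the next only if that server cannot be reached.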
7 Required SIP features
SIP contains many components and capabilities, only some of which are
needed to support instant messaging. It is a common misconception to
believe that SIP is only good for initiating phone calls. Since SIP
delegates the definition of a session to other protocols, such as the
Session Description Protocol (SDP) [13], SIP is best viewed as a
real-time rendezvous system, which allows content to be delivered
from one user, to the current location(s) where another user, the
desired target, is located. This rendezvous system can be used to
deliver invitations to sessions, as is accomplished with the INVITE
method, but other data, such as instant messages, can just as easily
be delivered.
As such, most of the generic components of SIP as they relate to
message routing are useful and needed for this extension, and most of
those related specifically to INVITE, BYE, ACK, and CANCEL processing
are not needed.
This section outlines those components needed, and those not needed,
for IM.
7.1 Needed components
The following are the SIP components needed in a user agent to
support this extension:
o Basic SIP parser, capable of generating To, From, Call-ID,
CSeq, Via, Route, Accept, Allow, Require, Record-Route,
Expires, Contact, Content-Length, and Content-Type headers, in
addition to the request and response line.
o UDP transmission mechanisms for non-INVITE requests, which is
nothing more than a periodic retransmit of a request with
exponential backoff.
o Implementation of the client and server state machine for
non-INVITE requests (used for reliable transport), documented
in Section 10.4.1 of RFC 2543.
o The ability to send SIP REGISTER requests, and process
responses, and refresh those registrations.
o Construction and usage of Route headers.
o Support the Require mechanism for protocol extension, as
defined in Section 6.30 of RFC 2543.
o Reject requests with unknown methods, returning an Allow
header in the response.
o Reject requests with unknown bodies, returning an Accept
header in the response.
o Send and process SIP responses based solely on the 100s digit.
o Send responses based on the Via header processing rules of
Section 6.40 of RFC 2543.
If a UA wishes to implement security, it needs to support the
security mechanisms defined in RFC 2543.
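The UDP retransmission component listed above, a periodic retransmit with exponential backoff, can be sketched as a schedule of gaps between packets. The starting interval, the cap, and the packet limit used as defaults here (0.5 s, 4 s, and 11 packets) are assumptions of this sketch and should be checked against Section 10.4.1 of RFC 2543; the function name is hypothetical.

```python
from typing import List


def retransmit_intervals(t1: float = 0.5, t2: float = 4.0,
                         max_packets: int = 11) -> List[float]:
    """Return the waits, in seconds, between successive transmissions."""
    intervals: List[float] = []
    interval = t1
    for _ in range(max_packets - 1):      # one gap fewer than packets sent
        intervals.append(interval)
        interval = min(interval * 2, t2)  # double each time, capped at t2
    return intervals
```

With a limit of six packets, the gaps are 0.5, 1, 2, 4, and 4 seconds; a client stops following the schedule as soon as a final response arrives.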
A proxy for IM messages has even fewer requirements:
o Parse and generate SIP messages, understanding the To, From,
Call-ID, CSeq, Via, Route, Record-Route, and Proxy-Require
headers, in addition to the request and response line.
o If co-located with a registrar, process SIP REGISTER requests
and generate responses.
o Perform the proxying functions described in Section 12 of RFC
2543; these rules mainly concern connection management, Via
processing, loop detection, and transport.
7.2 Components not needed
User agents supporting IM do not need to support the following SIP
capabilities:
o Processing of INVITE, ACK, CANCEL, BYE requests
o Support for the INVITE reliability mechanisms and state
machines
o Multiple 200 OK responses
o SDP processing
o re-INVITEs
Elimination of INVITE processing alone results in a substantial
reduction in required features.
8 Acknowledgements
The authors would like to thank the following people for their
support of the concept of SIP for IM, support for this work, and for
their useful comments and insights:
Jon Peterson Level(3) Communications
Sean Olson Ericsson
Adam Roach Ericsson
Billy Biggs University of Waterloo
Stuart Barkley UUNet
Mauricio Arango SUN
Richard Shockey Shockey Consulting LLC
Jorgen Bjorker Hotsip
Henry Sinnreich MCI Worldcom
Ronald Akers Motorola
9 Authors' Addresses
Jonathan Rosenberg
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: jdrosen@dynamicsoft.com
Dean Willis
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: dwillis@dynamicsoft.com
Robert Sparks
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: rsparks@dynamicsoft.com
Ben Campbell
dynamicsoft
200 Executive Drive
Suite 120
West Orange, NJ 07052
email: bcampbell@dynamicsoft.com
Henning Schulzrinne
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: schulzrinne@cs.columbia.edu
Jonathan Lennox
Columbia University
M/S 0401
1214 Amsterdam Ave.
New York, NY 10027-7003
email: lennox@cs.columbia.edu
Christian Huitema
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: huitema@microsoft.com
Bernard Aboba
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: bernarda@microsoft.com
David Gurle
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
email: dgurle@microsoft.com
David Oran
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
email: oran@cisco.com
10 Bibliography
[1] C. A. DellaFera, M. W. Eichin, R. S. French, D. C. Jedlinsky, J.
T. Kohl, and W. E. Sommerfeld, "The Zephyr notification service," in
USENIX Winter Conference , (Dallas, Texas), Feb. 1988.
[2] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
session initiation protocol," Request for Comments 2543, Internet
Engineering Task Force, Mar. 1999.
[3] M. Day, S. Aggarwal, G. Mohr, and J. Vincent, "Instant messaging
/ presence protocol requirements," Request for Comments 2779,
Internet Engineering Task Force, Feb. 2000.
[4] M. Day, J. Rosenberg, and H. Sugano, "A model for presence and
instant messaging," Request for Comments 2778, Internet Engineering
Task Force, Feb. 2000.
[5] J. Rosenberg and H. Schulzrinne, "SCTP as a transport for SIP,"
Internet Draft, Internet Engineering Task Force, June 2000. Work in
progress.
[6] S. Kent and R. Atkinson, "IP encapsulating security payload
(ESP)," Request for Comments 2406, Internet Engineering Task Force,
Nov. 1998.
[7] D. Harkins and D. Carrel, "The internet key exchange (IKE),"
Request for Comments 2409, Internet Engineering Task Force, Nov.
1998.
[8] H. Schulzrinne and J. Rosenberg, "SIP caller preferences and
callee capabilities," Internet Draft, Internet Engineering Task
Force, Mar. 2000. Work in progress.
[9] S. Donovan, "The SIP INFO method," Internet Draft, Internet
Engineering Task Force, Apr. 2000. Work in progress.
[10] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
session initiation protocol (draft standard)," Internet Draft,
Internet Engineering Task Force, June 2000.
[11] T. Dierks and C. Allen, "The TLS protocol version 1.0," Request
for Comments 2246, Internet Engineering Task Force, Jan. 1999.
[12] A. Gulbrandsen, P. Vixie, and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)," Request for Comments
2782, Internet Engineering Task Force, Feb. 2000.
[13] M. Handley and V. Jacobson, "SDP: session description protocol,"
Request for Comments 2327, Internet Engineering Task Force, Apr.
1998.