<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3711 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml">
<!ENTITY rfc3903 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3903.xml">
<!ENTITY rfc3261 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml">
<!ENTITY rfc3830 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3830.xml">
<!ENTITY I-D.ietf-sip-media-security-requirements SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-media-security-requirements.xml">
<!ENTITY rfc4568 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4568.xml">
<!ENTITY I-D.ietf-sipping-config-framework SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sipping-config-framework.xml">
<!ENTITY I-D.ietf-sip-sips SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-sips.xml">
<!ENTITY I-D.ietf-sip-saml SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-saml.xml">
<!ENTITY I-D.zimmermann-avt-zrtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.zimmermann-avt-zrtp.xml">
<!ENTITY rfc4117 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4117.xml">
<!ENTITY rfc4317 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4317.xml">
<!ENTITY rfc2804 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2804.xml">
<!ENTITY rfc3725 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3725.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc sortrefs="yes" ?>
<?rfc colonspace='yes' ?>
<?rfc tocindent='yes' ?>
<?rfc rfcprocack="yes"?>
<rfc category="std" docName="draft-wing-sipping-srtp-key-03" ipr="full3978">
<front>
<title abbrev="SRTP Recording with SIP">Secure Media Recording and
Transcoding with the Session Initiation Protocol</title>
<author fullname="Dan Wing" initials="D." surname="Wing">
<organization abbrev="Cisco">Cisco Systems, Inc.</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<email>dwing@cisco.com</email>
</address>
</author>
<author fullname="Francois Audet" initials="F." surname="Audet">
<organization abbrev="Nortel">Nortel</organization>
<address>
<postal>
<street>4655 Great America Parkway</street>
<city>Santa Clara</city>
<region>CA</region>
<code>95054</code>
<country>USA</country>
</postal>
<email>audet@nortel.com</email>
</address>
</author>
<author fullname="Steffen Fries" initials="S." surname="Fries">
<organization>Siemens AG</organization>
<address>
<postal>
<street>Otto-Hahn-Ring 6</street>
<city>Munich</city>
<region>Bavaria</region>
<code>81739</code>
<country>Germany</country>
</postal>
<email>steffen.fries@siemens.com</email>
</address>
</author>
<author fullname="Hannes Tschofenig" initials="H" surname="Tschofenig">
<organization>Nokia Siemens Networks</organization>
<address>
<postal>
<street>Otto-Hahn-Ring 6</street>
<city>Munich</city>
<region>Bavaria</region>
<code>81739</code>
<country>Germany</country>
</postal>
<email>Hannes.Tschofenig@nsn.com</email>
<uri>http://www.tschofenig.com</uri>
</address>
</author>
<author fullname="Alan Johnston" initials="A" surname="Johnston">
<organization>Avaya</organization>
<address>
<postal>
<street></street>
<city>St. Louis</city>
<region>MO</region>
<country>USA</country>
</postal>
<email>alan@sipstation.com</email>
</address>
</author>
<date year="2008" />
<workgroup>SIPPING Working Group</workgroup>
<abstract>
<t>Call recording is an important feature in enterprise telephony
applications. Some industries, such as financial trading, are required
to record all calls in which customers give trading orders. This poses a
particular problem for Secure RTP (SRTP) systems, as many SRTP key
exchange mechanisms do not disclose the SRTP session keys to
intermediate SIP proxies. As a result, these key exchange mechanisms
cannot be used in environments where call recording is needed.</t>
<t>This document specifies a secure mechanism for a cooperating endpoint
to disclose its SRTP master keys to an authorized party to allow secure
call recording.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>Call recording is an important feature in enterprise telephony
applications. Some industries, such as financial trading, are required
to record all calls in which customers give trading orders. In others,
calls are recorded, as the near-ubiquitous announcement says, "for
training and quality control purposes".</t>
<t>Note that the services and examples in this document are not
wiretapping as defined in <xref target="RFC2804">Raven</xref>.
Specifically, all recording done by enterprises is always announced to
both parties. Also, in most circumstances, the intent of the recording
is to protect both parties from later disagreements about what was said
during the conversation or to remedy mistakes made.</t>
<t>First, four different recording modes are discussed. Then, example
call flows show how these modes can be implemented using standard SIP
primitives. Finally, the impact of encrypted media (SRTP) is
discussed.</t>
</section>
<section title="Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"></xref> and indicate requirement levels for compliant
mechanisms.</t>
<t>The following terminology is taken directly from <xref
target="RFC3903">SIP Event State Publication Extension</xref>:</t>
<t><list style="hanging">
<t hangText="Event Publication Agent (EPA):">The User Agent Client
(UAC) that issues PUBLISH requests to publish event state.</t>
<t hangText="Event State Compositor (ESC):">The User Agent Server
(UAS) that processes PUBLISH requests, and is responsible for
compositing event state into a complete, composite event state of a
resource.</t>
<t hangText="Publication:">The act of an EPA sending a PUBLISH
request to an ESC to publish event state.</t>
</list></t>
</section>
<section anchor="sec-introduction-to-call-recording"
title="Introduction to SRTP Call Recording">
<t>This document addresses two difficulties with end-to-end encryption
of RTP (<xref target="RFC3711">SRTP</xref>): transcoding and media
recording. When peering with other networks, different codecs are
sometimes necessary (e.g., transcoding a surround-sound codec for
transmission over a highly-compressed bandwidth-constrained network). In
some environments (e.g., stock brokerages and banks) regulations and
business needs require recording calls with coworkers or with customers.
In many environments, quality problems such as echo can only be
diagnosed by listening to the call (analyzing SRTP headers is not
sufficient).</t>
<t>With an RTP stream, transcoding is accomplished by modifying SDP to
offer a different codec through a transcoding device <xref
target="RFC4117"></xref>, and call recording or monitoring can be
accomplished with an Ethernet sniffer listening for SIP and its
associated RTP, with a media relay, or with a Session Border Controller.
However, when media is encrypted end-to-end <xref
target="I-D.ietf-sip-media-security-requirements"></xref>, these
existing techniques fail because they are unable to decrypt the media
packets.</t>
<t>When a media session is encrypted with SRTP, there are three
techniques to decrypt the media for monitoring or call recording:</t>
<t><list style="numbers">
<t>The endpoint establishes a separate media stream to the recording
device, with a separate SRTP key, and sends the (mixed) media to the
recording device. This technique is often called 'active
recording'. Its disadvantages include doubled bandwidth
requirements in the network and additional processing load on the
client. Moreover, loss of the media recording facility does not
cause loss of the call (as is required in some environments).
Depending on the application requirements, it may be necessary to
establish a reliable connection to the recording device to cope with
possible packet loss on the unreliable transport typically used for
media. Because the endpoint maintains its own key with the connected
party, this technique is more secure: a malicious media recording
device cannot inject media to the connected party on behalf of the
endpoint.</t>
<t>The endpoint relays media through a device which forks a separate
media stream to the recording device. This technique is often
employed by Session Border Controllers. This relay does not, itself,
have access to the SRTP key.</t>
<t>Network monitoring devices are used to listen to the SRTP traffic
and correlate SRTP with SIP. This correlation requires cooperation
of call signaling devices if the call signaling is encrypted (e.g.,
with TLS).</t>
</list></t>
<t>This document describes cases (2) and (3), where a cooperating
endpoint publishes its SRTP master keys to an authorized party using the
<xref target="RFC3903">SIP Event State Publication Extension</xref>. The
mechanism can be described as passive recording, as the client is not
directly involved in the media recording; it merely provides the key
information to a recording device. The mechanism described in this
document allows secure disclosure of SRTP session keys to authorized
parties so that an endpoint's media stream can be transcoded or
decrypted, as needed by that environment. Technique (1) is not
considered further in this document, as it does not require disclosure
of the key used for communication between the two endpoints.</t>
</section>
<section title="Recording Modes">
<t>There are four common modes of call recording which are described in
the following sections.</t>
<section title="Always On Recording">
<t>In the Always On recording mode, for an identified endpoint, phone
number, user or agent, all calls both incoming and outgoing are
recorded. For example, a toll free call to a helpline could utilize
this mode to record the entire text of calls.</t>
</section>
<section title="Recording On Demand">
<t>In the Recording On Demand recording mode, only certain calls are
recorded. For example, in a call center application, personal or
non-call center calls by an agent might not be recorded.</t>
</section>
<section title="Required Recording">
<t>In the Required Recording mode, the requirement for recording is so
strong that if call recording resources are unavailable, the call must
not be set up or an existing call must be disconnected.</t>
</section>
<section title="Pause and Resume Recording">
<t>In the Pause and Resume Recording mode, only parts of a given call
may be recorded. For example, when the call is placed on hold,
recording may be paused, then resumed when the call is resumed.
Similarly, IVR interactions in which a user enters account numbers and
PINs should not be recorded, as the DTMF tones convey private or secure
information. Pausing can be unidirectional or bidirectional.</t>
</section>
</section>
<section title="Recording Call Flows">
<t>This section shows how these four recording modes can be
implemented.</t>
<t>In SIP call recording, the two-way RTP or SRTP media session between
two UAs is sent to a UA referred to as a Recording UA. While it is
possible for recording to be done locally in a UA, this has no impact on
the SIP call flows.</t>
<t>While it is also possible for the recording policy and decision
making to be included in an endpoint, it is more common to have a third
party control recording and cause the RTP or SRTP to be sent to the
Recording UA. In these call flows, this third party will be called the
Controller.</t>
<t>If the Controller acts as a third party call controller <xref
target="RFC3725">(3PCC)</xref>, it is possible for the Controller to
cause each UA to send an extra media stream to the Recorder. However,
for this call flow to work:</t>
<t><list style="numbers">
<t>Both UAs must support multiple media lines and streams sent to
different addresses (e.g., Section 2.4 of <xref target="RFC4317">SDP
Examples</xref>).</t>
<t>Both UAs must have twice the normal bandwidth available.</t>
<t>Both UAs must know to send the same media on both media
streams.</t>
</list></t>
<t>While 1 and 2 are possible, 3 is the most difficult. Without
additional information in the SDP, each media stream is considered a
separate media stream.</t>
<t>Alternatively, the Controller could be a combination of a SIP Proxy
and a media relay (e.g., a Session Border Controller). This media relay
would copy media streams to a second location. The protocol and
coordination between these two elements is outside the scope of this
specification. In another model discussed in Section 5, the Controller
could be a SIP Focus and a Media Server with some special logic.
Finally, the Controller could be realized as a B2BUA.</t>
<t>Using this model, there are no SIP, SDP, or bandwidth requirements on
either UA. The Controller then can cause the media received at the Media
Relay to be copied to the Recorder. An example is shown in <xref
target="fig-controller"></xref>, below where the Recorder records a call
between Alice and Bob.</t>
<figure anchor="fig-controller" title="Controller Proxy or B2BUA">
<artwork align="center"><![CDATA[
Alice Controller Bob Recorder
| | | |
| INVITE F1 | | |
|--------------->| | |
|(100 Trying) F2 | | |
|<---------------| INVITE F3 | |
| |--------------------------------->|
| | | 200 OK F4 |
| |<---------------------------------|
| | | ACK F5 |
| |--------------------------------->|
| | INVITE F6 | |
| |------------->| |
| |180 Ringing F7| |
| |<-------------| |
| 180 Ringing F8 | | |
|<---------------| 200 OK F9 | |
| |<-------------| |
| 200 OK F10 | | |
|<---------------| | |
| ACK F11 | | |
|--------------->| ACK F12 | |
| |------------->| |
| | INVITE F13 | |
| |--------------------------------->|
| | | 200 OK F14 |
| |<---------------------------------|
| | | ACK F15 |
| |--------------------------------->|
| Both way SRTP Established | |
|<==============>|<============>| |
| | SRTP From Alice |
| |=================================>|
| | SRTP From Bob |
| |=================================>|
]]></artwork>
</figure>
<t>The following sections will discuss and extend this basic call flow
for the four recording modes.</t>
<section title="Always On Recording">
<t>The Always On recording mode for the user Bob can be implemented
using the call flow of <xref target="fig-controller"></xref> if every
call made to Bob is handled in this way.</t>
</section>
<section title="Recording On Demand">
<t>In the Recording On Demand recording mode, the call flow of <xref
target="fig-controller"></xref> is used selectively - only for the
calls that need to be recorded. For the non-recorded flows, the
Controller could act as a Proxy Server and make no changes to the
signaling or media flows. By not inserting a Record-Route, the
Controller could even drop out of the SIP dialog for calls where
recording is not of interest.</t>
</section>
<section title="Required Recording">
<t>Required recording could also be implemented using <xref
target="fig-controller"></xref>, as the INVITE is sent first to the
Recorder before being sent to Bob. As a result, if the INVITE is
refused (i.e., the Recorder is unable to record the call), the INVITE
will not be forwarded to Bob and the call is refused. Also, if the
Recorder disconnects during the call or is unable to provide recording
resources (e.g., its disks are full), the BYE from the Recorder can be
used to terminate the call to Bob. This is shown in <xref
target="fig-required-recording"></xref>, below.</t>
<figure anchor="fig-required-recording"
title="Required Recording Call Flow">
<artwork align="center"><![CDATA[
Alice Controller Bob Recorder
| | | |
| Both way SRTP Established | |
|<==============>|<============>| |
| | SRTP From Alice |
| |=================================>|
| | SRTP From Bob |
| |=================================>|
| | | |
| | BYE F1 |
| |<---------------------------------|
| | 200 OK F2 |
| |--------------------------------->|
| | | |
| BYE F3 | | |
|<---------------| | |
| 200 OK F4 | | |
|--------------->| | |
]]></artwork>
</figure>
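<t>The Controller's decision logic for Required Recording can be
sketched as follows. This is a minimal, non-normative illustration in
Python: the class and callback names are hypothetical and do not
belong to any real SIP stack, the 503 status code is an assumption
(this document does not name one), and sending a BYE to both call
legs is likewise assumed.</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RequiredRecordingController:
    """Hypothetical sketch of Required Recording policy (not a SIP stack)."""

    # Returns the final SIP status code of the INVITE sent to the Recorder.
    invite_recorder: Callable[[], int]
    # Returns the final SIP status code of the INVITE sent to the callee.
    invite_callee: Callable[[], int]
    log: List[str] = field(default_factory=list)
    call_active: bool = False

    def handle_invite(self) -> int:
        """Offer the call to the Recorder first, then to the callee."""
        status = self.invite_recorder()
        if not 200 <= status < 300:
            # Recorder refused: the INVITE is never forwarded to the callee.
            self.log.append("refuse caller")
            return 503  # assumption: the draft does not name a status code
        status = self.invite_callee()
        self.call_active = 200 <= status < 300
        if not self.call_active:
            # Callee did not answer: release the recording session again.
            self.log.append("BYE recorder")
        return status

    def on_recorder_bye(self) -> None:
        """A BYE from the Recorder tears down the recorded call."""
        if self.call_active:
            self.log.append("BYE caller")
            self.log.append("BYE callee")  # assumption: both legs are released
            self.call_active = False
```
]]></artwork>
</figure>
<t>In this sketch, a refusal by the Recorder short-circuits call setup
before the callee is ever contacted, which is the property that
distinguishes Required Recording from the other modes.</t>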
</section>
<section title="Pause and Resume Recording Call Flow">
<t>The Pause and Resume recording mode can be initiated by the call
flow of <xref target="fig-controller"></xref>. When the recording is to be paused, for example,
when the caller Alice places the call on hold, the hold re-INVITE from
Alice causes the Controller to place the call to the Recorder on hold
as well. No media is sent to the Recorder until a re-INVITE starts the
recording again, as shown in <xref target="fig-pause-resume"></xref>,
below.</t>
<figure anchor="fig-pause-resume" title="Pause and Resume Call Flow">
<artwork align="center"><![CDATA[
Alice Controller Bob Recorder
| | | |
| Both way SRTP Established | |
|<==============>|<=============>| |
| | SRTP From Alice |
| |=================================>|
| | SRTP From Bob |
| |=================================>|
| INVITE (hold) F1 | |
|--------------->| INVITE (inactive) F2 |
| |--------------------------------->|
| | 200 OK (inactive) F3 |
| |<---------------------------------|
| | | ACK F4 |
| |--------------------------------->|
| |INVITE (hold) F5 |
| |------------->| |
| |200 OK (hold) F6 |
| |<-------------| |
| 200 OK (hold) F7 | |
|<---------------| | |
| ACK F8 | | |
|--------------->| ACK F9 | |
| |------------->| |
| | | |
| No SRTP Sent |
]]></artwork>
</figure>
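<t>One way the Controller might derive the "inactive" offer for the
Recorder leg is to rewrite the direction attributes of the SDP it
previously sent. The following Python sketch is illustrative only:
the function name is hypothetical, and it assumes direction
attributes appear on lines of their own.</t>
<figure>
<artwork><![CDATA[
```python
_DIRECTIONS = ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive")


def pause_recording_sdp(sdp: str) -> str:
    """Return a copy of the Recorder-leg SDP with every stream inactive.

    Hypothetical helper: existing direction attributes are dropped and
    a=inactive is appended at the end of each media section.
    """
    out = []
    in_media = False
    for line in sdp.splitlines():
        if line in _DIRECTIONS:
            continue  # drop the old direction attribute
        if line.startswith("m="):
            if in_media:
                out.append("a=inactive")  # close the previous media section
            in_media = True
        out.append(line)
    if in_media:
        out.append("a=inactive")  # close the last media section
    return "\r\n".join(out) + "\r\n"
```
]]></artwork>
</figure>
<t>Resuming would apply the inverse change, restoring the original
direction attributes in a further re-INVITE.</t>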
</section>
<section title="Conference Recording">
<t>A call flow for conference recording is shown in <xref
target="fig-alternative"></xref>, below. This call flow is similar to
the previous ones except with a focus instead of the Controller. The
Recorder SUBSCRIBEs to the focus using the conference event package to
learn of call recording events of interest to the Recorder.</t>
<t>With the subscription established by the SUBSCRIBE, the Recorder
receives NOTIFYs from the focus whenever recording events of interest
occur. For example, the Recorder is informed when Alice joins the
conference (NOTIFY F8), but recording is not initiated. When Bob
joins the conference, a further notification is sent (NOTIFY F13). In
this example, the Recorder then decides to record the call and sends
an INVITE with a Join header field to the focus (F15). The dialog
information used to construct the Join header field is obtained from
the NOTIFY F13. The Focus/Mixer then begins to stream the media to
the Recorder for the duration of the conference.</t>
<t>This model could be used for other recording modes. In this case,
the event package would be a new event package specifically tailored
to the recording application, containing all the information needed by
a Recorder to decide whether or not to record a call. The details of
this event package may be defined in a future draft. Note that CTI
(Computer Telephony Integration) protocols are presently used for
this purpose.</t>
<figure anchor="fig-alternative"
title="Conference Recording Call Flow">
<artwork align="center"><![CDATA[
Alice Focus/Mixer Bob Recorder
| | | |
| | SUBSCRIBE F1 |
| |<---------------------------------|
| | | 200 OK F2 |
| |--------------------------------->|
| | NOTIFY F3 | |
| |--------------------------------->|
| | | 200 OK F4 |
| |<---------------------------------|
| INVITE F5 | | |
|--------------->| | |
| 200 OK F6 | | |
|<---------------| | |
| ACK F7 | | |
|--------------->| | |
| SRTP | NOTIFY F8 | |
|<==============>|--------------------------------->|
| | | 200 OK F9 |
| |<---------------------------------|
| | INVITE F10 | |
| |<-------------| |
| |180 Ringing F11 |
| |------------->| |
| | 200 OK F12 | |
| |------------->| |
| | SRTP | |
| |<============>| |
| | NOTIFY F13 | |
| |--------------------------------->|
| | | 200 OK F14 |
| |<---------------------------------|
| | INVITE Join: A-B F15 |
| |<---------------------------------|
| | | 200 OK F16 |
| |--------------------------------->|
| | | ACK F17 |
| |<---------------------------------|
| | Mixed SRTP from Alice and Bob |
| |=================================>|
]]></artwork>
</figure>
</section>
</section>
<section title="Transcoding">
<t>There are similarities between transcoding and call recording,
especially technique 2 described in <xref
target="sec-introduction-to-call-recording"></xref>. An endpoint that
desires transcoding can provide its SRTP key to a transcoder and request
its services.</t>
<t>[[This section is a placeholder, and will be expanded in a later
version of this document.]]</t>
</section>
<section title="Media Considerations">
<t>The following sections will discuss considerations relating to the
media streams.</t>
<section title="Offer/Answer Considerations">
<t>For the call flows in this document, it is assumed that a single
bidirectional media stream is to be recorded. Normally, this would be
negotiated using a single media line (m= line) in the SDP with the
default direction attribute (a=sendrecv). The media stream sent from
the Controller to the Recorder could be carried in two different ways,
depending on the media handling in the Controller. In the simplest
case, each direction of the media stream between Alice and Bob is
converted to a separate unidirectional media stream sent to the
Recorder. In the INVITE from the Controller to the Recorder, for a
single recording session, there would be two media lines (m=), each
marked as send-only (a=sendonly). This has the advantage that the
Controller does not have to perform any processing on the RTP packets:
they are simply forwarded without changing SSRC or sequence numbers.
The Recorder will then mix the packets together or possibly
record the two sides of the conversation separately, if desired.</t>
<t>In the other model, the Controller can function as an RTP mixer, in
which case a single uni-directional media stream will be used with a
single media line. The Controller will need to process the RTP packets
by mixing them and including its own SSRC and sequence number in the
resulting RTP packets. The Recorder will then not have to mix them and
will not have the option of recording the two sides separately.</t>
<t>The approach of using two separate media lines is the recommended
one, as it allows for simple RTP packet processing at the Controller
and also provides recording flexibility at the Recorder. However, a
Recorder should also be able to handle the case where the Controller
performs the mixing.</t>
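<t>As an illustration of the recommended two-media-line approach, the
following Python sketch builds the kind of SDP offer the Controller
might send to the Recorder. It is not normative: the function name,
addresses, ports, session name, and payload type are illustrative
assumptions, and the keying attributes are omitted for brevity.</t>
<figure>
<artwork><![CDATA[
```python
def build_recorder_offer(controller_ip: str, alice_port: int,
                         bob_port: int, payload_type: int = 0) -> str:
    """Build an SDP offer with one sendonly m= line per call direction.

    Hypothetical helper: all values are illustrative, and the keying
    attributes for the SRTP streams are deliberately left out.
    """
    lines = [
        "v=0",
        f"o=controller 0 0 IN IP4 {controller_ip}",
        "s=recording",
        f"c=IN IP4 {controller_ip}",
        "t=0 0",
    ]
    # One unidirectional stream for media from Alice, one for media from Bob.
    for port in (alice_port, bob_port):
        lines.append(f"m=audio {port} RTP/SAVP {payload_type}")
        lines.append("a=sendonly")
    return "\r\n".join(lines) + "\r\n"
```
]]></artwork>
</figure>
<t>In the mixing model, the loop would collapse to a single m= line,
since the Controller sends only one combined stream.</t>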
</section>
<section title="Operation">
<t>For transcoding, RTP packets must be sent from and received by a
device which performs the transcoding. When the media is encrypted,
this device must be capable of decrypting the media, performing the
transcoding function, and re-encrypting the media.</t>
<t><list>
<t>ISSUE-1: should we consider providing some or all of the SIP
headers, as well? Some recording functions will need to know the
identity of the remote party. This information could be gleaned
from the SIP proxies, though, and starts to fall outside the
intended scope of this document.</t>
<t>ISSUE-2: The authors have been considering use of <xref
target="RFC3830">MIKEY</xref>, but MIKEY may not be used off the
shelf. Certain changes to the state machine may have to be made
(<xref target="RFC3830">MIKEY</xref> describes the TGK transport
rather than SRTP master key transport).</t>
</list></t>
<section title="Learning Name and Certificate of ESC">
<t>The endpoint will be configured with the AOR of its ESC (e.g.,
"transcoder@example.com"). If S/MIME is used to send the SRTP master
key to the ESC, the endpoint is additionally configured with the
certificate of its ESC.</t>
<t>The name and public key of the ESC are configured into the
endpoint. It is vital that the public key of the ESC is not changed
by an unauthorized user: once that public key is changed, subsequent
SRTP key disclosures will be encrypted with the new key. It is
RECOMMENDED that endpoints restrict changing the public key of the
disclosure device using protections similar to those for changes to
the endpoint's SIP username and SIP password.</t>
</section>
<section title="Authorization of ESC">
<t>Depending on the application, authorization of the key disclosure
and distribution to the ESC may be necessary besides the pure
transport security of the key distribution itself. This may be the
case when the <xref
target="I-D.ietf-sipping-config-framework">configuration
framework</xref> is not applied and thus the information about the
ESC is not known to the client.</t>
<t>This can be done by providing a <xref
target="I-D.ietf-sip-saml">SAML extension</xref> in the header of
the SUBSCRIBE message. The SAML assertion shall at least contain the
information about the ESC, call related information to associate the
call with the assertion (editor's note: we may also define wildcards
here to allow for recordings of all phone calls for a day,
independent of the call) and a reference to the certificate for the
ESC. The latter information is needed to transport the SRTP Session
Key to the ESC in a protected manner, as described in the section
below.</t>
<t>The signature of the SAML assertion should be produced using the
private key of the domain certificate. This certificate MUST have a
SubjAltName which matches the domain of the user agent's SIP proxy
(that is, if the SIP proxy is sip.example.com, the SubjAltName of the
domain certificate signing this SAML assertion MUST be example.com).
Here, the main focus is placed on communication of clients with the
ESC, which belongs to the client's home domain.</t>
</section>
<section title="Sending SRTP Session Keys to ESC">
<t>SDP is used to describe the media session to the ESC. However,
the existing <xref target="RFC4568">Security Descriptions</xref>
only describes the master key and parameters of the SRTP packets
being sent -- it does not describe the master key (and parameters)
of the SRTP being received, or the SSRC being transmitted. For
transcoding and media recording, both the sending key and receiving
key are needed and in some cases the SSRC is needed.</t>
<t>Thus, we hereby extend the existing crypto attribute to indicate
the SSRC. We also create a new SDP attribute, "rcrypto", which is
identical to the existing "crypto" attribute, except that it
describes the receiving keys and their SSRCs. For example:</t>
<figure anchor="sdp_example" title="Example SDP">
<artwork><![CDATA[
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
SSRC=1899
a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKKIN1Mw|2^20|1:32
SSRC=3289
a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
inline:Hw3JFWNCFqSpTqNiYRj6HmSWKMHAmO4q1KIN1OVA|2^20|1:32
SSRC=4893
]]></artwork>
<postamble></postamble>
</figure>
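<t>To make the extended attributes concrete, the following Python
sketch parses a "crypto" or "rcrypto" line with an optional trailing
SSRC. It is illustrative only: this document does not yet define a
grammar, so the regular expression, the type name, and the function
name are assumptions, as is the premise that the line breaks in the
example above are presentational and each attribute occupies one SDP
line.</t>
<figure>
<artwork><![CDATA[
```python
import re
from typing import NamedTuple, Optional

# Assumed shape (no grammar is defined yet):
#   a=(crypto|rcrypto):<tag> <suite> inline:<key>[|<lifetime>][|<mki>] [SSRC=<n>]
_ATTR = re.compile(
    r"^a=(?P<name>r?crypto):(?P<tag>\d+) (?P<suite>\S+) "
    r"inline:(?P<key>[^|\s]+)(?:\|(?P<lifetime>[^|\s]+))?"
    r"(?:\|(?P<mki>[^|\s]+))?"
    r"(?: SSRC=(?P<ssrc>\d+))?$"
)


class KeyInfo(NamedTuple):
    direction: str        # "send" for crypto, "receive" for rcrypto
    tag: int
    suite: str
    key_salt_b64: str     # base64 master key and salt, left undecoded
    ssrc: Optional[int]   # None when the attribute carries no SSRC


def parse_key_attribute(line: str) -> KeyInfo:
    """Parse one crypto/rcrypto attribute line into a KeyInfo tuple."""
    m = _ATTR.match(line.strip())
    if m is None:
        raise ValueError(f"not a crypto/rcrypto attribute: {line!r}")
    ssrc = m.group("ssrc")
    return KeyInfo(
        direction="send" if m.group("name") == "crypto" else "receive",
        tag=int(m.group("tag")),
        suite=m.group("suite"),
        key_salt_b64=m.group("key"),
        ssrc=int(ssrc) if ssrc is not None else None,
    )
```
]]></artwork>
</figure>
<t>A recorder would apply such a parser to every keying attribute in
the received SDP and select the key by direction and, where present,
by SSRC.</t>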
<t>The full SDP, including the keying information, is then sent to
the ESC. The keying information MUST be encrypted and integrity
protected. Existing mechanisms such as <xref
target="RFC3261">S/MIME</xref> and <xref
target="I-D.ietf-sip-sips">SIPS</xref> or SIP over TLS (on all hops
per administrative means) MAY be used to achieve this goal, or other
mechanisms may be defined.</t>
<t><list style="hanging">
<t hangText="[[">ISSUE-3: if an endpoint is receiving multiple
incoming streams from multiple endpoints, it will have
negotiated different keys with each of them, and all of that
traffic is coming to the same transport address on the endpoint.
Thus, we need a way to describe the different keys we're using
to/from different transport addresses. One solution is to
indicate the remote transport address. Indicating the remote
SSRC is insufficient for this task, as several SRTP keying
mechanisms do not include SSRC in their signaling (DTLS-SRTP,
ZRTP, Security Descriptions). <vspace blankLines="1" />For
example, if there were two remote peers with different keys, we
could signal it like this:<figure anchor="Issue_example_SDP"
title="Strawman solution">
<preamble></preamble>
<artwork><![CDATA[ a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
192.0.2.1:5678 SSRC=1899 SSRC=3892
a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKKIN1Mw|2^20|1:32
192.0.2.1:5678 SSRC=3289 SSRC=2813
a=crypto:1 AES_CM_128_HMAC_SHA1_80
inline:GdUJShpX1ZLEw6UzF3WSJjNzB4d1BINUAv+PSdFc|2^20|1:32
192.0.2.222:2893
a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
inline:6UzF3IN1ZLEwAv+PSdFcWUGdUJShpXSJjNzB4d1B|2^20|1:32
192.0.2.222:2893]]></artwork>
<postamble></postamble>
</figure></t>
<t hangText="]]"></t>
</list></t>
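<t>If keys were signaled per remote transport address, as in the
strawman of ISSUE-3, a recorder could index them as in the following
Python sketch. This is a hypothetical illustration of one possible
data structure, not a proposed design; all names are assumptions.</t>
<figure>
<artwork><![CDATA[
```python
from typing import Dict, NamedTuple, Optional, Tuple

Address = Tuple[str, int]  # (remote IP address, remote port)


class KeyPair(NamedTuple):
    send_key: str      # key from the a=crypto attribute
    receive_key: str   # key from the a=rcrypto attribute


class KeyTable:
    """Hypothetical per-peer key table indexed by remote transport address."""

    def __init__(self) -> None:
        self._by_peer: Dict[Address, KeyPair] = {}

    def add(self, peer: Address, send_key: str, receive_key: str) -> None:
        """Record the keys the endpoint uses with one remote peer."""
        self._by_peer[peer] = KeyPair(send_key, receive_key)

    def key_for_packet(self, peer: Address, outbound: bool) -> Optional[str]:
        """Return the key for a packet to or from `peer`, or None if unknown."""
        pair = self._by_peer.get(peer)
        if pair is None:
            return None
        return pair.send_key if outbound else pair.receive_key
```
]]></artwork>
</figure>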
</section>
<section title="Scenarios and Call Flows">
<t>The following scenarios and call flows depict the assumptions for
the provision of media key disclosure. <xref
target="topology"></xref> shows the general setup within the home
domain of the client. Note that the authors assume that the client
discloses media keys only to an entity in the client's home
network rather than to an arbitrary entity in the visited
network.</t>
<figure anchor="topology" title="Network Topology">
<artwork><![CDATA[
+----------+ +-------+ +---------+ +--------+ +----------+
| SIP User | | SIP | |SIP Proxy| | Media | | SIP |
|Agent(EPA)| | Proxy | | (ESC) | |Recorder| |User Agent|
+----------+ +-------+ +---------+ +--------+ +----------+
| | | | |
+----------+----------+-----------+-----------+]]></artwork>
</figure>
<t>Based on this setup, there are different options to realize the
key disclosure, depending on the environment. Two approaches are
distinguished below.</t>
<t><list style="hanging">
<t hangText="Publishing media keys to the ESC"><vspace
blankLines="1" /> This requires that the configuration
management provides the ESC configuration data (e.g.,
certificate, policy) in a secure way to the client. As stated
above, this configuration is outside the scope of this document,
but an example can be found in <xref
target="I-D.ietf-sipping-config-framework"></xref>. The key
disclosure in this approach uses the PUBLISH method to disclose
the key to the ESC according to a given policy. <vspace
blankLines="1" /> <figure anchor="fig-message-flow-publishing"
title="Message Flow showing Publishing of Media Keys to ESC">
<artwork><![CDATA[
+----------+ +-------+ +---------+ +--------+ +----------+
| SIP User | | SIP | |SIP Proxy| | Media | | SIP |
|Agent(EPA)| | Proxy | | (ESC) | |Recorder| |User Agent|
+----------+ +-------+ +---------+ +--------+ +----------+
| | | | |
|-REGISTER->| | | |
|<-200 OK---| | | |
| | | | |
|--INVITE-->|-------------INVITE------------->|
|<-200 Ok---|<------------200 Ok------------- |
| | | | |
|<====SRTP in both directions================>|
| | | | |
|-PUBLISH-->|-PUBLISH-->|-key----->| |
|<-200 Ok---|<--200 Ok--| | |
]]></artwork>
</figure> <vspace blankLines="1" />Note that the protocol
between the ESC and the recorder is out of scope of this
document.</t>
<t hangText="Using SAML assertions for ESC contact"><vspace
blankLines="1" /> In this approach, authorization is provided via
a SAML assertion (see <xref target="I-D.ietf-sip-saml"></xref>)
indicating which ESC is allowed to perform call recording of a
single call or a set of calls, depending on the content of the
assertion. Here, a SAML assertion is provided as part of the
SUBSCRIBE message sent from the ESC to the client. The
assertion needs to identify at least the call concerned, or a time
interval for which media recording is going to be performed. The
SAML assertion is signed with the private key associated with
the domain certificate, which is in the possession of the
authentication service. The call flow would look as follows:
<vspace blankLines="1" /> <figure anchor="fig-publish-saml"
title="Message Flow Showing Publication using SAML">
<artwork><![CDATA[
+----------+ +-------+ +---------+ +--------+ +----------+
| SIP User | |  SIP  | |SIP Proxy| | Media  | |   SIP    |
|Agent(EPA)| | Proxy | |  (ESC)  | |Recorder| |User Agent|
+----------+ +-------+ +---------+ +--------+ +----------+
     |           |           |          |          |
     |-REGISTER->|           |          |          |
     |<-200 OK---|           |          |          |
     |           |           |          |          |
     |<-SUBSCRIBE (SAML as.)-|          |          |
     |           |           |          |          |
     |--INVITE-->|-------------INVITE------------->|
     |<-200 Ok---|<------------200 Ok------------- |
     |           |           |          |          |
     |<====SRTP in both directions================>|
     |           |           |          |          |
     |--NOTIFY (SRTP data)-->|          |          |
     |           |           |          |          |
]]></artwork>
</figure></t>
</list></t>
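<figure>
<preamble>As a non-normative illustration of the first approach above
(publishing media keys to the ESC), the following Python sketch assembles
a PUBLISH request carrying an SDP body. The function name build_publish
is hypothetical, and the Via, tag, and CSeq values are placeholders
copied from the examples later in this document; none of them are
defined by this specification. Because the SDP is carried unencrypted in
this sketch, it assumes a "sips:" AOR for the ESC and refuses anything
else.</preamble>
<artwork><![CDATA[
```python
def build_publish(esc_aor, from_aor, call_id, sdp_lines, expires=3600):
    """Assemble a PUBLISH request whose body is an SDP key disclosure.

    Hypothetical helper for illustration; not defined by this document.
    """
    # Assumption of this sketch: the SDP is not S/MIME-protected, so it
    # is only published to a SIPS AOR (hop-by-hop TLS).
    if not esc_aor.startswith("sips:"):
        raise ValueError("unencrypted SDP is only published to a sips: AOR")

    # SDP lines are CRLF-terminated.
    body = "\r\n".join(sdp_lines) + "\r\n"

    headers = [
        f"PUBLISH {esc_aor} SIP/2.0",
        # Placeholder Via and tag values, as in the document's examples.
        "Via: SIP/2.0/TLS pua.example.com;branch=z9hG4bK652hsge",
        f"To: <{esc_aor}>",
        f"From: <{from_aor}>;tag=1234wxyz",
        f"Call-ID: {call_id}",
        "CSeq: 1 PUBLISH",
        "Max-Forwards: 70",
        f"Expires: {expires}",
        "Event: srtp",
        "Content-Type: application/sdp",
        # Content-Length counts the octets of the body only.
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # An empty line separates the header fields from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body
```
]]></artwork>
<postamble>A caller would pass the ESC's AOR from its configuration and
the SDP lines describing the established SRTP session; transmission,
retransmission, and response handling are deliberately left
out.</postamble>
</figure>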
</section>
</section>
</section>
<section title="Grammar">
<t>[[Grammar will be provided in a subsequent version of this
document.]]</t>
</section>
<section title="Security Considerations">
<t></t>
<section title="Incorrect ESC">
<t>Insertion of an incorrect public key for the SRTP ESC will result
in disclosure of the SRTP session key to an unauthorized party. Thus,
the UA's configuration MUST be protected to prevent such
misconfiguration. To prevent unauthorized changes to the configuration
in the end device, access to the configuration MUST be suitably
protected.</t>
</section>
<section anchor="disclosing_srtp_session_key"
title="Risks of Sharing SRTP Session Key">
<t>A party authorized to obtain the SRTP session key can listen to the
media stream and could inject data into the media stream as if it were
either party. The alternatives are worse: disclosing the device's
private key to the transcoder or media recording device, or abandoning
secure SRTP key exchange in environments that require media
transcoding or media recording. As we wish to promote the use of
secure SRTP key exchange mechanisms, disclosure of the SRTP session
key appears to be the least of these evils.</t>
</section>
<section title="Disclosure of Call Recording">
<t>Secure SRTP key exchange techniques which implement this
specification SHOULD provide a "disclosure flag", similar to that
first proposed in Appendix B of <xref
target="I-D.zimmermann-avt-zrtp"></xref>. In this way, both endpoints
can be made aware of such recording and provide appropriate alerting
to their users (via an audible, visual, or other indicator).</t>
</section>
<section title="Integrity and encryption of keying information">
<t>The mechanism described in this specification relies on protecting
and encrypting the keying information. There are well-known mechanisms
to achieve that goal.</t>
<t>Using SIPS to convey the SRTP key exposes the SRTP master key to
all SIP proxies between the Event Publication Agent (EPA, the SIP User
Agent) and the Event State Compositor (ESC). S/MIME allows disclosing
the SRTP master key to only the ESC.</t>
</section>
</section>
<section title="IANA Considerations">
<t>The new SSRC extension of the "crypto" attribute and the new
"rcrypto" attribute will be registered here.</t>
</section>
<section title="Examples">
<figure anchor="sips_example" title="Example with &quot;SIPS:&quot; AOR">
<preamble>This is an example showing a SIPS AOR for the ESC. This
relies on the SIP network providing TLS encryption of the SRTP master
keys to the ESC.</preamble>
<artwork><![CDATA[
PUBLISH sips:recorder@example.com SIP/2.0
Via: SIP/2.0/TLS pua.example.com;branch=z9hG4bK652hsge
To: <sips:recorder@example.com>
From: <sips:dan@example.com>;tag=1234wxyz
Call-ID: 81818181@pua.example.com
CSeq: 1 PUBLISH
Max-Forwards: 70
Expires: 3600
Event: srtp
Content-Type: application/sdp
Content-Length: ...

v=0
o=alice 2890844526 2890844526 IN IP4 client.atlanta.example.com
s=-
c=IN IP4 192.0.2.101
t=0 0
m=audio 49172 RTP/SAVP 0
a=crypto:1 AES_CM_128_HMAC_SHA1_80
  inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
  inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKI8K1Mw|2^20|1:32
a=rtpmap:0 PCMU/8000
]]></artwork>
<postamble></postamble>
</figure>
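<figure>
<preamble>To illustrate how a recipient such as the ESC might read the
keying material in the example above, the following non-normative Python
sketch extracts the "crypto" and "rcrypto" attribute values from an SDP
body and decodes the inline key. The function names parse_crypto and
srtp_keys_from_sdp are hypothetical. The sketch assumes each attribute
occupies a single SDP line (the example above folds it only for
display), handles only the first "inline" key parameter, and ignores any
session parameters.</preamble>
<artwork><![CDATA[
```python
import base64


def parse_crypto(value):
    """Parse the value of one a=crypto or a=rcrypto attribute.

    Hypothetical helper; expects 'tag suite inline:key|lifetime|mki'.
    """
    parts = value.split()
    tag, suite, key_params = parts[0], parts[1], parts[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the 'inline' key method is handled here")

    fields = key_info.split("|")
    raw = base64.b64decode(fields[0])
    # For AES_CM_128_HMAC_SHA1_80 the inline value is a 16-octet master
    # key concatenated with a 14-octet master salt (RFC 4568).
    if suite == "AES_CM_128_HMAC_SHA1_80" and len(raw) != 30:
        raise ValueError("expected 30 octets of key and salt")

    # Lifetime and MKI are both optional; an MKI has the form 'mki:len'.
    lifetime = mki = None
    for extra in fields[1:]:
        if ":" in extra:
            mki = extra
        else:
            lifetime = extra

    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": raw[:16],
        "master_salt": raw[16:],
        "lifetime": lifetime,
        "mki": mki,
    }


def srtp_keys_from_sdp(sdp):
    """Collect the crypto and rcrypto attributes found in an SDP body."""
    keys = {}
    for line in sdp.splitlines():
        for attr in ("crypto", "rcrypto"):
            prefix = "a=" + attr + ":"
            if line.startswith(prefix):
                keys[attr] = parse_crypto(line[len(prefix):])
    return keys
```
]]></artwork>
<postamble>Both inline values in the example above are 40 base64
characters, which decode to the 30 octets that RFC 4568 specifies for
AES_CM_128_HMAC_SHA1_80. A real implementation would also need to handle
multiple media lines and multiple key parameters.</postamble>
</figure>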
<figure anchor="s_mime_example"
title="Example with S/MIME-encrypted SDP">
<preamble>This is an example showing an S/MIME-encrypted transmission
to the recorder's AOR, recorder@example.com. The data enclosed in "*"
is encrypted with recorder@example.com's public key.</preamble>
<artwork><![CDATA[
PUBLISH sip:recorder@example.com SIP/2.0
Via: SIP/2.0/UDP pua.example.com;branch=z9hG4bK652hsge
To: <sip:recorder@example.com>
From: <sip:dan@example.com>;tag=1234wxyz
Call-ID: 81818181@pua.example.com
CSeq: 1 PUBLISH
Max-Forwards: 70
Expires: 3600
Event: srtp
Content-Type: application/pkcs7-mime;smime-type=enveloped-data;
  name=smime.p7m
Content-Transfer-Encoding: binary
Content-ID: 1234@atlanta.example.com
Content-Disposition: attachment;filename=smime.p7m;
  handling=required
Content-Length: ...

******************************************************************
* (encryptedContentInfo)                                         *
* Content-Type: application/sdp                                  *
* Content-Length: ...                                            *
*                                                                *
* v=0                                                            *
* o=alice 2890844526 2890844526 IN IP4 client.atlanta.example.com*
* s=-                                                            *
* c=IN IP4 192.0.2.101                                           *
* t=0 0                                                          *
* m=audio 49172 RTP/SAVP 0                                       *
* a=crypto:1 AES_CM_128_HMAC_SHA1_80                             *
*   inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32    *
* a=rcrypto:1 AES_CM_128_HMAC_SHA1_80                            *
*   inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKI8K1Mw|2^20|1:32    *
* a=rtpmap:0 PCMU/8000                                           *
*                                                                *
******************************************************************]]></artwork>
<postamble></postamble>
</figure>
<t></t>
</section>
<section title="Acknowledgements">
<t>Thanks to Sheldon Davis and Val Matula for suggesting improvements to
the document.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc3711;
&rfc3903;
&rfc3261;
</references>
<references title="Informational References">
&rfc3830;
&I-D.ietf-sip-media-security-requirements;
&rfc4568;
&I-D.ietf-sipping-config-framework;
&I-D.ietf-sip-sips;
&I-D.ietf-sip-saml;
&I-D.zimmermann-avt-zrtp;
&rfc4117;
&rfc4317;
&rfc2804;
&rfc3725;
</references>
<section title="Outstanding Issues">
<t>Authors' to-do list:<list style="symbols">
<t>Separate B2BUA function from media relay function in the call
flows and in the text.</t>
</list></t>
</section>
</back>
</rfc>