<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY I-D.ietf-avt-dtls-srtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-avt-dtls-srtp.xml">
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4346 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4346.xml">
<!ENTITY rfc2827 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2827.xml">
<!ENTITY I-D.ietf-mmusic-sdp-capability-negotiation SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mmusic-sdp-capability-negotiation.xml">
<!ENTITY rfc5234 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml">
<!ENTITY rfc4566 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml">
<!ENTITY rfc5117 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5117.xml">
<!ENTITY rfc2627 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2627.xml">
<!ENTITY rfc3711 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml">
<!ENTITY rfc3550 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml">
<!ENTITY rfc4086 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4086.xml">
<!ENTITY I-D.ietf-msec-gdoi-srtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-msec-gdoi-srtp.xml">
<!ENTITY rfc4568 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4568.xml">
<!ENTITY I-D.mcgrew-srtp-ekt SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.mcgrew-srtp-ekt.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc rfcprocack="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc sortrefs="yes" ?>
<?rfc colonspace='yes' ?>
<?rfc tocindent='yes' ?>
<rfc category="std" docName="draft-wing-avt-dtls-srtp-key-transport-02"
ipr="full3978">
<front>
<title abbrev="DTLS-SRTP Key Transport (KTR)">DTLS-SRTP Key
Transport</title>
<author fullname="Dan Wing" initials="D." surname="Wing">
<organization abbrev="Cisco">Cisco Systems, Inc.</organization>
<address>
<postal>
<street>170 West Tasman Drive</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<email>dwing@cisco.com</email>
</address>
</author>
<date year="2008" />
<workgroup>AVT Working Group</workgroup>
<abstract>
<t>The existing DTLS-SRTP specification allows SRTP keys to be
established between a pair of SRTP endpoints. However, when there are
more than two participants in an RTP session, DTLS-SRTP is unable to
provide a single key for all of the participants. This existing
limitation of DTLS-SRTP prevents deploying DTLS-SRTP in certain
scenarios.</t>
<t>This document describes an extension to DTLS-SRTP, called Key
Transport (KTR). This extension transports SRTP keying material from one
DTLS-SRTP peer to another, so the same SRTP keying material can be used
by multiple DTLS-SRTP peers. This extension eliminates the need to key
each SRTP session individually, allowing cost-effective deployment of
several DTLS-SRTP scenarios.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>When <xref target="I-D.ietf-avt-dtls-srtp">DTLS-SRTP</xref>
establishes <xref target="RFC3711">Secure RTP</xref> master keys, each
peer contributes part of the keying material to derive the SRTP master
key. In some scenarios it is desirable for one peer to change its SRTP
key and to transmit SRTP packets using an SRTP key that was not derived
from the DTLS key exchange. This allows one peer to significantly reduce
cryptographic operations in many scenarios as described in detail in
<xref target="sec_scenarios"></xref>.</t>
<t>The extension described in this document allows transporting an SRTP
master key from one DTLS peer to the other. Thus, DTLS Key Transport
differs from normal DTLS-SRTP in that the SRTP master key is not derived
from the TLS handshake.</t>
</section>
<section title="Notational Conventions">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"></xref>.</t>
<t>A "listener" is an endpoint that only receives an SRTP stream. A
"speaker" is an endpoint that only transmits an SRTP stream. An
endpoint can be both a listener and a speaker.</t>
</section>
<section anchor="sec_scenarios" title="Scenarios">
<t>KTR allows mixers and video switchers to avoid having to encrypt each
packet multiple times under multiple SRTP keys, by allowing a single
SRTP key to be shared with the multiple recipients that are receiving
the SRTP stream.</t>
<t>Several SRTP scenarios that benefit from KTR are described in the
following sections, using terminology from <xref target="RFC5117">RTP
Topologies</xref>.</t>
<section anchor="section_point_multipoint"
title="Point to Multipoint using the RFC 3550 mixer model">
<t>This RTP scenario is described in Section 3.4 of <xref
target="RFC5117">RTP Topologies</xref>.</t>
<t>With DTLS-SRTP, this topology is computationally expensive for the
mixer because it has to encrypt the payload uniquely for each
SRTP listener. Additionally, the architecture of a typical mixer
requires each listener's SRTP to be encrypted serially, incurring
additional delay for each successive listener. This is depicted below
in <xref target="multipoint-without-kt"></xref>.</t>
<figure anchor="multipoint-without-kt"
title="Point to Multipoint Mixer, without DTLS Key Transport">
<preamble></preamble>
<artwork align="center"><![CDATA[ +-------key=F-------+
| |
V +-------+ +------------+
+----+----+ | +--key=C->+ listener 1 |
| speaker +--key=A-->+ | +------------+
+---------+ | | +------------+
| mixer +--key=D->+ listener 2 |
+---------+ | | +------------+
| speaker +--key=B-->+ | +------------+
+----+----+ | +--key=E->+ listener 3 |
^ +---+---+ +------------+
| |
+-------key=G-------+
]]></artwork>
<postamble></postamble>
</figure>
<t>With KTR, the mixer can maintain one outbound SRTP cryptographic
context, and encrypt the SRTP once for all listeners. This is depicted
below in <xref target="multipoint-with-kt"></xref>.</t>
<figure anchor="multipoint-with-kt"
title="Point to Multipoint Mixer, with DTLS Key Transport">
<preamble>In the following figure, "=" indicates sessions where
DTLS-SRTP Key Transport is used, and "-" indicates where only
DTLS-SRTP is necessary. In this topology, only the listeners need
to support KTR for the mixer and the listeners to benefit from
KTR. In this scenario with DTLS-SRTP Key Transport, the mixer
assumes an additional role, that of the group's key server, and
provides a common group SRTP key ("C") to all of the listeners. This
group SRTP key is shared between all of the listeners. The two
speakers, however, each receive a unique stream (just as in the
scenario above), and to prevent a two-time pad (<xref
target="two-time-pad"></xref>), their content is encrypted using
different SRTP keys ("D" and "E").</preamble>
<artwork align="center"><![CDATA[
+=======key=D=======+
| |
V +---+---+ +------------+
+---------+ | +==key=C=>+ listener 1 |
| speaker +--key=A-->+ | +------------+
+---------+ | | +------------+
| mixer +==key=C=>+ listener 2 |
+---------+ | | +------------+
| speaker +--key=B-->+ | +------------+
+----+----+ | +==key=C=>+ listener 3 |
^ +---+---+ +------------+
| |
+=======key=E=======+
]]></artwork>
<postamble></postamble>
</figure>
<t>The mixer is aware of listeners leaving or joining, and the mixer
can rekey the remaining active listeners.</t>
</section>
<section anchor="scenario-multicast"
title="Point to Multipoint using Multicast">
<t>This RTP topology is described in Section 3.2 of <xref
target="RFC5117">RTP Topologies</xref>.</t>
<t>With DTLS-SRTP, this scenario is not attainable because each
listener has a unique SRTP key. For this reason, <xref
target="I-D.ietf-msec-gdoi-srtp"></xref> was developed by the MSEC
working group.</t>
<t>With KTR, this scenario is attainable because the same key can be
provided to multiple listeners, as depicted below in <xref
target="multipoint-multicast-with-kt"></xref>. This compares favorably
with <xref target="I-D.ietf-msec-gdoi-srtp"></xref> when the group
size is small enough that the speaker can perform key server functions
(i.e., perform KTR) for all of the listeners.</t>
<figure anchor="multipoint-multicast-with-kt"
title="Point to Multipoint using Multicast with Key Transport">
<preamble></preamble>
<artwork align="center"><![CDATA[ +-------+ +------------+
/ \==key=A===>+ listener 1 |
/ \ +------------+
+---------+ | multicast | +------------+
| speaker +==key=A===>+ network +==key=A==>+ listener 2 |
+---------+ | | +------------+
\ / +------------+
\ /===key=A==>+ listener 3 |
+-------+ +------------+]]></artwork>
<postamble></postamble>
</figure>
</section>
<section anchor="sec-video-switching"
title="Point to Multipoint Using Video Switching MCUs">
<t>This RTP topology is described in Section 3.5 of <xref
target="RFC5117">RTP Topologies</xref>.</t>
<t>With DTLS-SRTP, this topology is computationally expensive for the
video switcher because it has to encrypt the payload uniquely for each
SRTP listener. Additionally, the architecture of a typical video
switcher requires each listener's SRTP to be encrypted serially,
incurring additional delay for each successive listener. This is
depicted below in <xref
target="multipoint-video-without-kt"></xref>.</t>
<figure anchor="multipoint-video-without-kt"
title="Point to Multipoint Video Switching, without DTLS Key Transport">
<preamble>In the following figure, KTR is used on all sessions and
depicted by "=". In this scenario, both the speakers and listeners
must support KTR so that the switcher and the listeners can benefit
from KTR.</preamble>
<artwork align="center"><![CDATA[
+-------key=F-------+
| |
V +---+------+ +------------+
+---------+ | +==key=C=>+ listener 1 |
| speaker +==key=A==>+selected | +------------+
+---------+ | | +------------+
| switcher +==key=D=>+ listener 2 |
+---------+ | | +------------+
| speaker +==key=B==>+dropped | +------------+
+----+----+ | +==key=E=>+ listener 3 |
^ +---+------+ +------------+
| |
+-------key=G-------+]]></artwork>
<postamble></postamble>
</figure>
<t>With DTLS key transport, this becomes easier; in fact, the video
switcher does not need to decrypt the SRTP at all. It just makes its
decision (select the stream or drop the stream) and transmits the SRTP
packets to the listeners. This is depicted below in <xref
target="multipoint-video-with-kt"></xref>.</t>
<figure anchor="multipoint-video-with-kt"
title="Point to Multipoint Video Switching, with DTLS Key Transport">
<preamble></preamble>
<artwork align="center"><![CDATA[ +-------key=B-------+
| |
V +---+------+ +------------+
+----+----+ | +==key=A=>+ listener 1 |
| speaker +==key=A==>+selected | +------------+
+---------+ | | +------------+
| switcher +==key=A=>+ listener 2 |
+---------+ | | +------------+
| speaker +==key=B==>+prev.spkr | +------------+
+---------+ | +==key=A=>+ listener 3 |
^ +----------+ +------------+
| |
+-------key=A-------+]]></artwork>
<postamble></postamble>
</figure>
<t>The video switcher is aware of listeners leaving or joining. The
protocol described in this document allows the switcher to dictate, to
the speaker, that the speaker use a new encryption key. This allows
the switcher to enforce security, based on the switcher's policy
(<xref target="sec-group-policy"></xref>). This is done by the video
switcher sending a DTLS "your_new_srtp_key" message. The speaker will
respond with a DTLS "new_srtp_key" message that echoes the same key.
The "new_srtp_key" message is relayed, by the switcher, to each of the
active listeners.</t>
<t>When there are multiple speakers, as shown in <xref
target="multipoint-video-with-kt"></xref> above, each speaker
transmits with its own SRTP key. That SRTP key is derived from the
DTLS handshake with the switcher. Each speaker uses KTR to signal the
SSRC that it will use.</t>
</section>
<section anchor="sec-scaling" title="Scaling to Large Groups">
<t>This section describes how DTLS-SRTP-Key-Transport supports large
groups of listeners, both for unicast and multicast scenarios.</t>
<section anchor="sec-scaling-lkh" title="Rekeying SRTP Quickly">
<t>When a new listener is added, or an existing listener is removed,
a new SRTP master key is necessary to retain the security of the
SRTP media. Normally this causes "n" cryptographic operations for
"n" listeners. These cryptographic operations take time, and if the
group is large enough or the processor slow enough, there can be a
considerable delay before all listeners receive the new SRTP key
(and can decrypt the stream).</t>
<t>A solution to the problem is to use a subset difference based key
management scheme <xref
target="I-D.irtf-smug-subsetdifference"></xref>. In this scheme, the
key server (the speaker) can send a message so that every authorized
listener -- but no unauthorized listeners -- can decrypt the
message. The message contains the new SRTP key. The advantage of
this scheme is that subset difference allows the message to be
encrypted just once, no matter how many listeners there are.</t>
<t>An implementation of subset-difference based key management is
<xref target="RFC2627">Logical Key Hierarchy (LKH)</xref>, which is
useful for unicast and multicast. LKH is supported by primitives
defined in this document, and the LKH "NET KEY" is communicated
using the KTR primitive "lkh_net_key".</t>
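<t>To illustrate why a key hierarchy reduces rekeying work, the
following sketch lists the tree keys that have to be replaced when one
listener leaves. It is not part of this specification: it assumes a
complete binary key tree with a power-of-two number of listeners, and
the node numbering and the function name keys_to_replace are
hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def keys_to_replace(member_index, group_size):
    """Return the tree nodes whose keys a departing member held.

    Assumption: nodes are numbered like a binary heap (root = 1) and
    the leaf of member i is node group_size + i.
    """
    if group_size < 2 or group_size & (group_size - 1):
        raise ValueError("sketch assumes a power-of-two group size >= 2")
    if not 0 <= member_index < group_size:
        raise ValueError("member_index out of range")
    node = (group_size + member_index) // 2  # parent of the member's leaf
    path = []
    while node >= 1:
        path.append(node)  # this key was known to the departing member
        node //= 2
    return path
```
]]></artwork>
</figure>
<t>Under these assumptions a group of 1024 listeners needs 10 keys
replaced per departure, rather than one operation per remaining
listener.</t>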
</section>
<section anchor="sec-scaling-distributed"
title="Distributed Key Servers">
<t>Another problem with all group scenarios is that because each
listener establishes a DTLS-SRTP session with the speaker, only a
finite number of listeners can be supported (the speaker cannot
handle millions of DTLS-SRTP sessions). This is especially
problematic for multicast, but is also a problem for "large"
groups.</t>
<t>One workaround to the problem is distributing the DTLS-SRTP
keying to other devices in the network. In this scheme, one key
server is responsible for a sensible number of listeners and has
sufficient CPU power to update those listeners with new SRTP master
keys. This is done with a new SDP attribute, dtls-srtp-ktr-server,
which indicates the IP address and port of the DTLS-SRTP server
associated with the media line.</t>
<t>There would need to be some communication between the KTR servers
to communicate new SRTP keys to the listeners. This communication is
for future study.</t>
</section>
</section>
<section title="Interworking with Other SRTP Key Management Systems">
<section title="Security Descriptions">
<t>Today, <xref target="RFC4568">Security Descriptions</xref> is
used for distributing SRTP keys in several different IP PBX systems
and is expected to be used by 3GPP's Long Term Evolution (LTE). The
IP PBX systems are typically used within a single enterprise, and
LTE is used within the confines of a mobile operator's network. A
Session Border Controller is a reasonable solution to interwork
between Security Descriptions (inside the enterprise or mobile
operator) and DTLS-SRTP (outside the enterprise), and would be
placed at the edge of the enterprise network or the edge of the
mobile operator's network.</t>
<t>However, due to the way Security Descriptions and DTLS-SRTP
manage their SRTP keys, such an SBC has to authenticate, decrypt,
re-encrypt, and re-authenticate the SRTP (and SRTCP) packets in one
direction, as shown in <xref
target="interworking-expensive"></xref>, below. This is not
desirable as it increases the cost of this SBC.</t>
<figure anchor="interworking-expensive"
title="Interworking Security Descriptions and DTLS-SRTP">
<preamble></preamble>
<artwork align="center"><![CDATA[RFC4568 endpoint SBC DTLS-SRTP endpoint
| | |
1. |---key=A------------->| |
2. | |<-DTLS-SRTP handshake->|
3. |<--key=B--------------| |
4. | |<--SRTP, encrypted w/B-|
5. |<-SRTP, encrypted w/B-| |
6. |-SRTP, encrypted w/A->| |
7. | (decrypt, re-encrypt) |
8. | |-SRTP, encrypted w/C-->|
| | |]]></artwork>
<postamble></postamble>
</figure>
<t>The message flow is as follows (similar steps occur with
SRTCP):<list style="numbers">
<t>The <xref target="RFC4568">Security Descriptions</xref>
endpoint discloses its SRTP key to the SBC, using a=crypto in
its SDP.</t>
<t>SBC completes DTLS-SRTP handshake. From this handshake, the
SBC derives the SRTP key for traffic from the DTLS-SRTP endpoint
(key B) and to the DTLS-SRTP endpoint (key C).</t>
<t>The SBC communicates the SRTP encryption key (key B) to the
Security Descriptions endpoint (using a=crypto). (There is no
way, with DTLS-SRTP, to communicate the Security Descriptions
key to the DTLS-SRTP endpoint.)</t>
<t>The DTLS-SRTP endpoint sends an SRTP packet, encrypted with its
key B. This is received by the SBC.</t>
<t>The received SRTP packet is simply forwarded; the SBC does
not need to do anything with this packet as its key (key B) was
already communicated in step 3.</t>
<t>The Security Descriptions endpoint sends an SRTP packet,
encrypted with its key A.</t>
<t>The SBC has to authenticate and decrypt the SRTP packet
(using key A), and re-encrypt it and generate an HMAC (using key
C).</t>
<t>The SBC sends the new SRTP packet.</t>
</list></t>
<t>KTR can help avoid this computationally expensive operation, so the
SBC does not need to perform any per-packet operations on the SRTP
(or SRTCP) packets in either direction. With KTR the SBC can simply
forward the SRTP (and SRTCP) packets in both directions without
per-packet HMAC or cryptographic operations.</t>
<t>To accomplish this, KTR must be supported on the DTLS-SRTP
endpoint, which allows the SBC to transport the Security Descriptions
key to the KTR endpoint and send the DTLS-SRTP key to the Security
Descriptions endpoint. This works equally well for both incoming and
outgoing calls. An abbreviated message flow is shown in <xref
target="interworking-cheap"></xref>, below.</t>
<figure anchor="interworking-cheap"
title="Interworking Security Descriptions and KTR">
<preamble></preamble>
<artwork align="center"><![CDATA[RFC4568 endpoint SBC DTLS-SRTP endpoint
| | |
1. |---key=A------------->| |
2. | |<-DTLS-SRTP handshake->|
3. |<--key=B--------------| |
4. | |--new_srtp_key:A------>|
5. | |<--SRTP, encrypted w/B-|
6. |<-SRTP, encrypted w/B-| |
7. |-SRTP, encrypted w/A->| |
8. | |-SRTP, encrypted w/A-->|
| | |]]></artwork>
<postamble></postamble>
</figure>
<t>The message flow is as follows (similar steps occur with
SRTCP):<list style="numbers">
<t>Security Descriptions endpoint discloses its SRTP key to the
SBC (a=crypto).</t>
<t>SBC completes DTLS-SRTP handshake. From this handshake, the
SBC derives the SRTP key for traffic from the DTLS-SRTP endpoint
(key B) and to the DTLS-SRTP endpoint (key C).</t>
<t>The SBC communicates the SRTP encryption key (key B) to the
Security Descriptions endpoint.</t>
<t>The SBC uses KTR to tell the DTLS-SRTP endpoint which key
(key A) will be used to encrypt the packets sent to it.</t>
<t>The DTLS-SRTP endpoint sends an SRTP packet, encrypted with its
key B. This is received by the SBC.</t>
<t>The received SRTP packet is simply forwarded; the SBC does
not need to do anything with this packet as its key (key B) was
communicated in step 3.</t>
<t>The Security Descriptions endpoint sends an SRTP packet,
encrypted with its key A.</t>
<t>The received SRTP packet is simply forwarded; the SBC does
not need to do anything with this packet as its key (key A) was
communicated in step 4.</t>
</list></t>
</section>
</section>
</section>
<section anchor="sec_protocol_description" title="Protocol Description">
<t>This section describes the extension to the DTLS protocol for KTR,
which allows securely communicating the SRTP key to the DTLS peer.</t>
<section title="key_transport (KTR) extension to DTLS-SRTP">
<t>This document adds a new negotiated extension called
"key_transport", which MUST only be requested in conjunction with the
"use_srtp" extension (Section 3.2 of <xref
target="I-D.ietf-avt-dtls-srtp"></xref>). The DTLS server indicates
its support for key_transport by including key_transport in its
ServerHello message. If a DTLS client includes key_transport in its
ClientHello, but does not receive key_transport in the ServerHello,
the DTLS client MUST NOT send DTLS packets with the srtp_key_transport
content-type.</t>
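<t>The negotiation rule above can be sketched as a simple check. This
is only an illustration: the function name ktr_negotiated and the
representation of the hello extensions as sets of strings are
assumptions, not part of any DTLS API.</t>
<figure>
<artwork><![CDATA[
```python
USE_SRTP = "use_srtp"
KEY_TRANSPORT = "key_transport"


def ktr_negotiated(client_hello_exts, server_hello_exts):
    """True only if both hellos carry use_srtp and key_transport.

    Each argument is a set of extension names (a hypothetical
    representation chosen for this sketch).
    """
    offered = (USE_SRTP in client_hello_exts
               and KEY_TRANSPORT in client_hello_exts)
    accepted = (USE_SRTP in server_hello_exts
                and KEY_TRANSPORT in server_hello_exts)
    return offered and accepted
```
]]></artwork>
</figure>
<t>A client would send srtp_key_transport records only when this
check succeeds.</t>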
<t>Support for the DTLS Key Transport extension is indicated in SDP
with the "dtls-srtp-ktr" attribute. Advertising support for the extension is
necessary in SDP because in some cases it is required to establish an
SRTP call. For example, a mixer may be able to only support SRTP
listeners if those listeners implement DTLS Key Transport (because it
lacks the CPU cycles necessary to encrypt SRTP uniquely for each
listener).</t>
<figure anchor="handshake_message_flow" title="Handshake Message Flow">
<preamble>A message flow showing a DTLS client and DTLS server using
the key_transport extension</preamble>
<artwork align="center"><![CDATA[
Client Server
ClientHello + use_srtp + key_transport
-------->
ServerHello + use_srtp + key_transport
Certificate*
ServerKeyExchange*
CertificateRequest*
<-------- ServerHelloDone
Certificate*
ClientKeyExchange
CertificateVerify*
[ChangeCipherSpec]
Finished -------->
[ChangeCipherSpec]
<-------- Finished
SRTP packets <-------> SRTP packets
]]></artwork>
<postamble></postamble>
</figure>
<t>After successful negotiation of the key_transport extension, the
DTLS client and server MAY exchange SRTP packets, encrypted using keys
derived with the KDF described in <xref target="I-D.ietf-avt-dtls-srtp"></xref>. This
is normal and expected, even if Key Transport was negotiated by both
sides, as neither side may (yet) have a need to alter the SRTP key.
However, it is also possible that one (or both) peers will immediately
send a new_srtp_key message before sending any SRTP.</t>
</section>
<section title="KTR Primitives">
<t>A new protocol, called the srtp_key_transport protocol, is defined.
It uses the srtp_key_transport content-type and consists of the
following message types (primitives):</t>
<t><list style="hanging">
<t hangText="new_srtp_key_request:">Requests that the DTLS peer
choose a new key. Valid responses are new_srtp_key and
new_srtp_key_failure.</t>
<t hangText="your_new_srtp_key:">Dictates a new SRTP key for the
peer to use when the peer transmits its SRTP packets.</t>
<t hangText="new_srtp_key:">Contains the new SRTP keying material:
the master key, master salt, SSRC, rollover counter, and sequence
number. This message is sent by a DTLS endpoint that wants to
change its SRTP key beginning at the indicated sequence number.
This does not change any cryptographic parameters (a new DTLS
handshake is necessary for that), but only the SRTP key for the
associated SRTP session. This message includes the SSRC that will
be used for this key, which allows listeners to establish one SRTP
crypto-context per speaker (necessary for the video switching
scenario). The key chosen MUST be cryptographically random <xref
target="RFC4086"></xref>. This master keying material is processed
by the standard SRTP key derivation function (Section 4.3.1 of
<xref target="RFC3711">SRTP</xref>) to provide the session
keys.</t>
<t hangText="new_srtp_key_activate:">Indicates that the receiver is
prepared to receive SRTP packets encrypted with the new key.</t>
<t hangText="lkh_net_key:">The Logical Key Hierarchy NET KEY.</t>
<t hangText="new_srtp_key_failure:">Indicates a failure.</t>
</list></t>
<t>At any time, the DTLS client or DTLS server MAY send a
key_transport message, as shown in <xref
target="new_key_message_flow"></xref>. The sender of the new_srtp_key
message MAY immediately start transmitting SRTP packets with this new
key. However, to account for loss of the new_srtp_key message, it is
RECOMMENDED that the sender wait before changing to the new SRTP key
until it receives the new_srtp_key_activate message or it times out
waiting for that message. The duration of this timeout may
vary depending on the sensitivity of the content (e.g., 1 second or 10
seconds). In any case, the new_srtp_key message is retransmitted until
acknowledged by receipt of a new_srtp_key_activate message.</t>
<figure anchor="new_key_message_flow" title="New Key Message Flow">
<preamble></preamble>
<artwork align="center"><![CDATA[Client / Server Server / Client
[new_srtp_key_request] -------->
<-------- new_srtp_key
new_srtp_key_activate -------->]]></artwork>
<postamble></postamble>
</figure>
<t>The following figure shows the state machine for the protocol.</t>
<figure anchor="state_machine"
title="Key Transport protocol state machine">
<preamble></preamble>
<artwork align="center"><![CDATA[ receive new_srtp_key_request from peer
or decide to choose new SRTP key
|
|
send V
new_srtp_key +---------------+ timeout
+---------| Communicate |--------+
| | Key | |
+-------->| | |
+---------------+ |
| ^ |
receive | | +----------------+
new_key_activate| +---| send SRTP using|
| | new SRTP key |
+----------------+ +----------------+
| send SRTP using|
| new SRTP key |
+----------------+
|
V
done]]></artwork>
<postamble></postamble>
</figure>
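<t>One possible reading of the state machine above, from the point of
view of the endpoint that announces a new key, is sketched below. The
event names ("retransmit", "timeout", "activate") and the function
run_key_change are hypothetical and exist only for this
illustration.</t>
<figure>
<artwork><![CDATA[
```python
def run_key_change(events):
    """Replay a list of events and return the actions the sender takes.

    Events: "retransmit" (retransmission timer fired), "timeout"
    (gave up waiting for the acknowledgement), "activate"
    (new_srtp_key_activate received).
    """
    actions = ["send new_srtp_key"]  # entering "Communicate Key"
    using_new_key = False
    for event in events:
        if event == "retransmit":
            actions.append("send new_srtp_key")
        elif event == "timeout":
            if not using_new_key:
                using_new_key = True
                actions.append("switch to new SRTP key")
            # remain in "Communicate Key" and keep retransmitting
        elif event == "activate":
            if not using_new_key:
                actions.append("switch to new SRTP key")
            actions.append("done")
            return actions
        else:
            raise ValueError("unknown event: " + event)
    return actions
```
]]></artwork>
</figure>
<t>In this reading, a timeout moves the sender to the new key but does
not stop retransmission of new_srtp_key; only the acknowledgement ends
the exchange.</t>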
<figure anchor="datastructure" title="Data Structures">
<preamble>Using the syntax described in <xref
target="RFC4346">TLS</xref>, the following structures are
used:</preamble>
<artwork align="center"><![CDATA[
enum {
new_srtp_key_request(0),
your_new_srtp_key(1),
new_srtp_key(2),
new_srtp_key_activate(3),
lkh_net_key(4),
new_srtp_key_failure(128),
(255)
} SRTPKeyTransportType;
struct {
SRTPKeyTransportType keytrans_type;
uint24 length;
uint16 message_seq;
uint24 fragment_offset;
uint24 fragment_length;
select (SRTPKeyTransportType) {
case new_srtp_key_request: NewSRTPKeyRequest;
case your_new_srtp_key: YourNewSRTPKey;
case new_srtp_key: NewSRTPKey;
case new_srtp_key_activate: NewSRTPKeyActivate;
case lkh_net_key: LKHNetKey;
case new_srtp_key_failure: NewSRTPKeyFailure;
};
} KeyTransport;
struct {
uint random<64>; // additional entropy for peer
} NewSRTPKeyRequest;
struct {
boolean any_ssrc; // true=this key applies to any SSRC
uint32 ssrc; // SSRC used for this key.
uint key<16..32>; // change_cipher_spec decides
uint auth_tag<4..10>; // the key and auth_tag length.
uint salt<112>;
uint roc<32>;
uint sequence<16>;
uint random<64>; // random provides additional entropy
// for peer
} NewSRTPKey;
struct {
uint random<64>; // additional entropy for peer
} NewSRTPKeyActivate;
struct {
uint lkhnetkeylength; // length in bits, divided by 8
uint lkhnetkey<128..1024>; // LKH NET KEY
} LKHNetKey;
struct { } NewSRTPKeyFailure;]]></artwork>
<postamble></postamble>
</figure>
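<t>As an illustration of the KeyTransport header layout, the following
sketch serializes an unfragmented message. It assumes the usual TLS
encoding rules (big-endian integers, a one-byte enum because the
largest value is 255) and treats the message body as opaque bytes; the
function name encode_key_transport is hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
KTR_TYPES = {
    "new_srtp_key_request": 0,
    "your_new_srtp_key": 1,
    "new_srtp_key": 2,
    "new_srtp_key_activate": 3,
    "lkh_net_key": 4,
    "new_srtp_key_failure": 128,
}


def _uint(value, size):
    """Encode an unsigned integer as `size` big-endian bytes."""
    return value.to_bytes(size, "big")  # OverflowError if it does not fit


def encode_key_transport(type_name, message_seq, body):
    """Build an unfragmented KeyTransport message (12-byte header)."""
    header = (
        _uint(KTR_TYPES[type_name], 1)  # keytrans_type
        + _uint(len(body), 3)           # length (uint24)
        + _uint(message_seq, 2)         # message_seq (uint16)
        + _uint(0, 3)                   # fragment_offset (uint24)
        + _uint(len(body), 3)           # fragment_length (uint24)
    )
    return header + body
```
]]></artwork>
</figure>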
</section>
<section anchor="procedures" title="Procedures for Network Elements">
<t>A 'speaker' is an endpoint that terminates the DTLS-SRTP exchange
and also sends SRTP data towards its peer(s). This is usually
indicated by 'sendrecv' or 'sendonly'.</t>
<t>A 'listener' is an endpoint that terminates the DTLS-SRTP exchange
and also receives SRTP data from its peer(s). This is usually
indicated by 'sendrecv' or 'recvonly'.</t>
<t>As the Key Transport extension was negotiated during the DTLS-SRTP
handshake, an endpoint can send Key Transport primitives, and can
become a speaker or become a listener, at any point.</t>
<section anchor="procedure_speaker" title="Speaker">
<t>When a new speaker joins, the speaker can immediately send SRTP
using the key derived from the DTLS-SRTP handshake. There is no
scaling advantage to all of the speakers using the same key (because
their content is different), and if the speakers did use the same
key it would also introduce the risk of a two-time pad.</t>
<t>Once a speaker begins sending SRTP packets using a key
communicated via KTR (i.e., the NewSRTPKey primitive), the speaker
MUST NOT revert to using the SRTP key derived from the DTLS-SRTP
handshake.</t>
<t>If the speaker wants to use KTR, or is requested to change its
SRTP key (via the NewSRTPKeyRequest primitive), the speaker chooses
a new SRTP master key and salt, and chooses a sequence number a
reasonable distance in the future (1 second is recommended). The
speaker then sends this new key using the NewSRTPKey primitive. The
NewSRTPKey primitive message is retransmitted until acknowledged
with a NewSRTPKeyActivate message. Whether or not a NewSRTPKeyActivate
is received, the speaker changes keys at its previously chosen
sequence number.</t>
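<t>A minimal sketch of choosing the activation point follows. It
assumes the speaker knows its approximate packet rate and uses the
SRTP packet index (rollover counter times 65536, plus the sequence
number) so that sequence number wrap is handled; the function name
activation_point and its parameters are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
def activation_point(current_seq, current_roc, packets_per_second,
                     lead_seconds=1.0):
    """Return (sequence, roc) at which the new key takes effect."""
    # Number of packets expected during the lead time, at least one.
    lead_packets = max(1, round(packets_per_second * lead_seconds))
    index = (current_roc << 16) + current_seq + lead_packets
    index %= 1 << 48  # the SRTP packet index is 48 bits wide
    return index & 0xFFFF, index >> 16
```
]]></artwork>
</figure>
<t>The returned values would populate the sequence and roc fields of
NewSRTPKey.</t>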
<t>The speaker's SRTP key may be determined via DTLS-SRTP or by a KTR
primitive. In either case, the speaker's SRTP key and SSRC are
communicated to each peer.</t>
</section>
<section anchor="procedure_mixer" title="Mixer">
<t>When a new speaker joins a mixer, the speaker does not need to
support KTR, and no KTR procedures need to occur with the speaker.
This is because the mixer needs to decrypt and examine the
speaker's stream, and the mixer will mix, re-originate (with its own
SSRC) and re-encrypt the speaker's stream to the listeners.</t>
<t>The mixer functions as a speaker (<xref
target="procedure_speaker"></xref>) towards the listeners connected
to the mixer.</t>
<t>When a speaker leaves, there is no need to propagate that
information beyond the mixer.</t>
<t>When a listener joins or leaves, the mixer MUST rekey all of the
listeners based on the conference policy (<xref
target="sec-group-policy"></xref>).</t>
</section>
<section anchor="procedure_switcher" title="Switcher">
<t>When a new speaker joins, the switcher communicates the speaker's
key to all listeners using the NewSRTPKey primitive. In this way,
whenever one of the speakers becomes the active speaker, the active
speaker's SRTP can be immediately sent to all listeners.</t>
<t>In the event there are a large number of (potentially active)
speakers and it is not feasible to inform all listeners of all
speakers' keys, the switcher MAY decide to defer informing
listeners of a speaker's key until the speaker becomes the active
speaker. This can cause some clipping when a speaker becomes the
active speaker.</t>
</section>
<section anchor="procedure_listener" title="Listener">
<t>When a listener joins, the listener is provided the same SRTP
master key as the other listeners. This is done with the NewSRTPKey
primitive. SRTP master keys are associated with both an SSRC and an
RTP sequence number. A single SRTP stream might have multiple keys
active at any point in time, such as when other listeners are
joining or leaving. For example, two NewSRTPKey primitives can
indicate that for a single SSRC value, key "A" is for sequence
numbers 100-199, and key "B" is for 200-300.</t>
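<t>A listener could track such keys with a small per-SSRC schedule, as
sketched below. The sketch assumes that each key takes effect at the
packet index announced in NewSRTPKey and stays in effect until a later
key starts; it ignores the any_ssrc flag. The class name KeySchedule
and its methods are hypothetical.</t>
<figure>
<artwork><![CDATA[
```python
import bisect


class KeySchedule:
    """Maps (SSRC, packet index) to the SRTP master key in effect."""

    def __init__(self):
        self._by_ssrc = {}  # ssrc -> (sorted start indexes, keys)

    def add(self, ssrc, roc, seq, key):
        """Record a key that starts at the given roc and sequence."""
        starts, keys = self._by_ssrc.setdefault(ssrc, ([], []))
        start = (roc << 16) | seq
        pos = bisect.bisect_left(starts, start)
        if pos < len(starts) and starts[pos] == start:
            keys[pos] = key  # retransmitted or replaced announcement
        else:
            starts.insert(pos, start)
            keys.insert(pos, key)

    def key_for(self, ssrc, roc, seq):
        """Return the key for a packet, or None if none applies yet."""
        starts, keys = self._by_ssrc.get(ssrc, ([], []))
        pos = bisect.bisect_right(starts, (roc << 16) | seq)
        return keys[pos - 1] if pos else None
```
]]></artwork>
</figure>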
<t>If a listener is also a speaker, it also follows the rules of a
speaker.</t>
<t>A listener can receive an SRTP packet with an unknown SSRC, which
could be caused by either of the following:</t>
<t><list style="symbols">
<t>the speaker changed its SSRC (due to SSRC collision)</t>
<t>the speaker changed its SRTP master key</t>
</list>The listener can attempt to authenticate the packet using
the most-recently-used SRTP master key, which helps in the first
case. If the second case has occurred, the listener can only wait
until the sender (the speaker, the mixer, or the switcher) sends a
NewSRTPKey primitive.</t>
</section>
</section>
<section title="Key Transport SSRC and RTP SSRC Collisions">
<t>Per <xref target="RFC3550"></xref>, if an RTP source notices an RTP
SSRC collision, it is required to change its SSRC. If it has
negotiated support for KTR, it then MUST also send a NewSRTPKey
message indicating the new SSRC. The communication of the new SSRC is
necessary if there are multiple speakers in the video switching
scenario. However, because a speaker is not able to determine if their
audio or their video is being switched, a speaker MUST always indicate
a change in SSRC by following the procedure in this section for any
SRTP stream (audio, video, or other).</t>
<t>When this is done, in order to prevent clipping in listeners, it is
RECOMMENDED that the speaker retain the same SRTP master key and
salt.</t>
</section>
<section title="Fragmentation, Reassembly, and Retransmission">
<t>Much like the DTLS handshake itself, the KTR extension also needs
to handle fragmentation and reassembly (to send a large key) and
retransmission (to account for packet loss). This allows
communicating SRTP keys that are longer than the network MTU. The
same fields as the DTLS handshake are used to provide this function:
message_seq, fragment_offset, and fragment_length.</t>
<t>When transmitting the key transport message, the sender divides the
message into a series of N contiguous data ranges. These ranges MUST
NOT be larger than the maximum handshake fragment size and MUST
jointly contain the entire key transport message. The ranges SHOULD
NOT overlap. The sender then creates N key transport messages, all
with the same message_seq value as the original key transport message.
Each new message is labelled with the fragment_offset (the number of
bytes contained in previous fragments) and the fragment_length (the
length of this fragment). The length field in all messages is the same
as the length field of the original message. An unfragmented message
is a degenerate case with fragment_offset=0 and
fragment_length=length.</t>
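<figure>
<preamble>The sender-side rules above can be sketched in Python as follows. This is a minimal, non-normative illustration: the Fragment type, the function name fragment_message, and the max_fragment parameter (standing in for the maximum handshake fragment size) are hypothetical names chosen for this example, and the wire encoding of the fields is not shown.</preamble>
<artwork><![CDATA[
```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    message_seq: int      # same for every fragment of one message
    length: int           # length of the whole original message
    fragment_offset: int  # bytes contained in previous fragments
    fragment_length: int  # length of this fragment
    data: bytes


def fragment_message(message: bytes, message_seq: int,
                     max_fragment: int) -> List[Fragment]:
    """Split a key transport message into contiguous ranges."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    total = len(message)
    if total == 0:
        # Degenerate case: a single empty fragment.
        return [Fragment(message_seq, 0, 0, 0, b"")]
    fragments = []
    # Non-overlapping ranges that jointly cover the entire message.
    for offset in range(0, total, max_fragment):
        chunk = message[offset:offset + max_fragment]
        fragments.append(
            Fragment(message_seq, total, offset, len(chunk), chunk))
    return fragments
```
]]></artwork>
<postamble>A message that fits within max_fragment produces a single fragment with fragment_offset=0 and fragment_length=length, matching the unfragmented case described above.</postamble>
</figure>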
<t>When a DTLS implementation receives a key transport message
fragment, it MUST buffer it until it has the entire key transport
message. DTLS implementations MUST be able to handle overlapping
fragment ranges. This allows senders to retransmit key transport
messages with smaller fragment sizes during path MTU discovery.</t>
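<figure>
<preamble>The receiver-side buffering can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the class name Reassembler and its methods are hypothetical, one instance is assumed per message_seq value, and the byte-level bookkeeping is chosen for clarity rather than efficiency. Overlapping fragments are accepted and simply overwrite bytes already received.</preamble>
<artwork><![CDATA[
```python
class Reassembler:
    """Hypothetical buffer for one key transport message."""

    def __init__(self, length: int):
        self.length = length             # length field of the message
        self._buf = bytearray(length)
        self._have = [False] * length    # which bytes have arrived

    def add(self, fragment_offset: int, data: bytes) -> None:
        """Store one fragment; overlapping ranges are allowed."""
        end = fragment_offset + len(data)
        if fragment_offset < 0 or end > self.length:
            raise ValueError("fragment outside message bounds")
        self._buf[fragment_offset:end] = data
        for i in range(fragment_offset, end):
            self._have[i] = True

    def complete(self) -> bool:
        """True once every byte of the message has been received."""
        return all(self._have)

    def message(self) -> bytes:
        """Return the reassembled message once it is complete."""
        if not self.complete():
            raise ValueError("message incomplete")
        return bytes(self._buf)
```
]]></artwork>
<postamble>Because completeness is tracked per byte, a retransmission that uses smaller fragments than the first attempt still fills the same buffer.</postamble>
</figure>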
</section>
<section title="SDP extensions">
<t>Two new SDP attributes are defined, dtls-srtp-ktr and
dtls-srtp-ktr-server. dtls-srtp-ktr merely indicates the endpoint is
capable of DTLS-SRTP-KTR, and is helpful to diagnose interoperability
issues. dtls-srtp-ktr-server causes the DTLS handshake to occur with a
different host than that indicated by the c/m lines, which is useful
to help offload computational effort from the speaker (<xref
target="sec-scaling-distributed"></xref>). Either attribute can appear
at the media level or session level.</t>
<figure>
<preamble>The <xref target="RFC5234">ABNF</xref> for new <xref
target="RFC4566">SDP</xref> attributes is as follows:</preamble>
<artwork><![CDATA[
ktr-server  = "dtls-srtp-ktr-server:" port
              [space nettype space addrtype
               space connection-address]
ktr-capable = "dtls-srtp-ktr"
]]></artwork>
<postamble>Only the port is required; if the nettype is not
indicated, the network type, address type, and connection-address
are all the same as on the associated c= line.</postamble>
</figure>
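<figure>
<preamble>As a non-normative illustration of the defaulting rule above, the following Python sketch parses the value of a dtls-srtp-ktr-server attribute and falls back to the associated c= line when only the port is present. The function name parse_ktr_server and its argument conventions are hypothetical; the sketch assumes single-space-separated fields and a c= line with exactly three fields, and does not validate the address itself.</preamble>
<artwork><![CDATA[
```python
def parse_ktr_server(value: str, c_line: str):
    """Parse the text after 'a=dtls-srtp-ktr-server:'.

    `c_line` is the text after 'c=' on the associated connection line.
    Returns (port, nettype, addrtype, connection_address).
    """
    fields = value.split()
    # Either the port alone, or the port plus all three address fields.
    if len(fields) not in (1, 4):
        raise ValueError("expected port or port plus three fields")
    port = int(fields[0])
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    if len(fields) == 1:
        # Inherit network type, address type, and address from c=.
        nettype, addrtype, address = c_line.split()
    else:
        nettype, addrtype, address = fields[1:]
    return port, nettype, addrtype, address
```
]]></artwork>
<postamble>For the value "37382 IN IP4 192.0.2.2" the sketch returns the address given in the attribute; for the value "37382" alone it returns the address taken from the c= line.</postamble>
</figure>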
</section>
</section>
<section title="Examples">
<figure anchor="fig_example_simple_offer_kt_required"
title="Simple SDP offer showing Key Transport is required">
<preamble>The following example shows how Key Transport would be
requested in an offer, using "a=dtls-srtp-ktr".</preamble>
<artwork><![CDATA[
v=0
o=- 25678 753849 IN IP4 192.0.2.1
s=
c=IN IP4 192.0.2.1
t=0 0
m=audio 53456 UDP/TLS/RTP/SAVP 0
a=fingerprint:SHA-1 \
4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
a=dtls-srtp-ktr]]></artwork>
<postamble></postamble>
</figure>
<figure anchor="fig_example_offer_kt_required"
title="Example SDP offer showing Key Transport is required">
<preamble>Using the SDP syntax described in <xref
target="I-D.ietf-mmusic-sdp-capability-negotiation"></xref>, the
following figure shows an offerer that requires DTLS Key Transport in
order to set up this call as an SRTP call; otherwise, it can set up
this call as an RTP call. This is indicated by the ",2" on the
"a=pcfg" line. If the answerer does not understand "a=dtls-srtp-ktr"
but does understand DTLS-SRTP and <xref
target="I-D.ietf-mmusic-sdp-capability-negotiation"></xref>, this call
cannot be established using DTLS-SRTP; however, it can be established
using RTP.</preamble>
<artwork><![CDATA[
v=0
o=- 25678 753849 IN IP4 192.0.2.1
s=
c=IN IP4 192.0.2.1
t=0 0
m=audio 53456 RTP/AVP 0
a=tcap:1 UDP/TLS/RTP/SAVP
a=acap:1 fingerprint:SHA-1 \
4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
a=acap:2 dtls-srtp-ktr
a=pcfg:1 t=1 a=1,2 ]]></artwork>
<postamble></postamble>
</figure>
<figure anchor="fig_example_offer_kt_optional"
title="Example SDP offer showing Key Transport is optional">
<preamble>Using the SDP syntax described in <xref
target="I-D.ietf-mmusic-sdp-capability-negotiation"></xref>, the
following figure shows an offerer that indicates support for DTLS Key
Transport but does not require DTLS Key Transport in order to set up
this call as an SRTP call. This is indicated by the ",[2]" on the
"a=pcfg" line. If the answerer does not understand "a=dtls-srtp-ktr"
but does understand DTLS-SRTP and <xref
target="I-D.ietf-mmusic-sdp-capability-negotiation"></xref>, this call
can still be established using DTLS-SRTP.</preamble>
<artwork><![CDATA[
v=0
o=- 25678 753849 IN IP4 192.0.2.1
s=
c=IN IP4 192.0.2.1
t=0 0
m=audio 53456 RTP/AVP 0
a=tcap:1 UDP/TLS/RTP/SAVP
a=acap:1 fingerprint:SHA-1 \
4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
a=acap:2 dtls-srtp-ktr
a=pcfg:1 t=1 a=1,[2] ]]></artwork>
<postamble></postamble>
</figure>
<figure anchor="fig_example_ktr_server"
title="Example showing alternate key server">
<preamble>The following example shows a Key Transport offer where the
DTLS-SRTP-KTR exchange occurs with another server.</preamble>
<artwork><![CDATA[
v=0
o=- 25678 753849 IN IP4 192.0.2.1
s=
c=IN IP4 192.0.2.1
t=0 0
m=audio 53456 UDP/TLS/RTP/SAVP 0
a=fingerprint:SHA-1 \
4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
a=dtls-srtp-ktr
a=dtls-srtp-ktr-server:37382 IN IP4 192.0.2.2]]></artwork>
<postamble></postamble>
</figure>
</section>
<section anchor="security_considerations" title="Security Considerations">
<t>In the point-to-multipoint scenario, <xref
target="section_point_multipoint"></xref>, all of the listeners know the
key being used by the mixer. Any of those listeners could create SRTP
packets that are encrypted with this same key, and send those SRTP
packets to other listeners. In order to reduce the vulnerability to this
threat, it is RECOMMENDED that received SRTP packets be discarded if
their source transport address does not match the source transport
address of the associated DTLS-SRTP session. Additionally, the network
SHOULD prevent IP address spoofing <xref target="RFC2827"></xref>.</t>
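<figure>
<preamble>The recommended source address check can be sketched in Python as follows. This is a non-normative illustration: the function names are hypothetical, a transport address is assumed to be a (host, port) pair with the host given as a textual IP address, and addresses are compared after parsing so that different textual spellings of the same IPv6 address match.</preamble>
<artwork><![CDATA[
```python
import ipaddress


def same_transport_address(a, b) -> bool:
    """Compare two (host, port) pairs after normalizing the host."""
    return ((ipaddress.ip_address(a[0]), a[1]) ==
            (ipaddress.ip_address(b[0]), b[1]))


def accept_srtp_packet(packet_source, dtls_peer) -> bool:
    """Accept a packet only if it came from the DTLS-SRTP peer."""
    return same_transport_address(packet_source, dtls_peer)
```
]]></artwork>
<postamble>This check alone does not stop a listener who can spoof the source address, which is why ingress filtering is also called for.</postamble>
</figure>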
<section anchor="sec-group-policy"
title="Group Policy when Joining/Leaving">
<t>When sharing SRTP keys with several listeners, it is imperative
that the SRTP key is changed when a new listener is added or a listener
is removed. This is because a legitimate listener should only be able to
decrypt the SRTP stream while listening; the listener should not be able
to decrypt the SRTP stream prior to joining the conference or after
leaving the conference. Failing to change the key when a listener
joins (or leaves) allows that listener to decrypt SRTP traffic from
before (or after) the period in which they are an authorized participant
in the conference.</t>
<t>Policies for a specific user's access to a conference may be
different from conference to conference. For example, a
company-internal event announcing promotions might be accessible to
all employees and have no need for re-keying when listeners join or
leave the conference. As another example, a conference where a job
candidate is interviewed should be rekeyed when the job candidate
joins the conference and again when the job candidate leaves the
conference.</t>
<t>The protocol described in this document allows whichever policy is
needed for a particular situation. The protocol itself does not
enforce a certain policy; that is, the protocol itself does not ensure
the SRTP key is changed when a listener leaves (or joins) the
conference.</t>
<t>The RTP sequence number in the NewSRTPKey primitive allows the old
key to be used for a predictable period of time before switching to
the new key. This can provide sufficient time for all listeners to
learn the new SRTP key before the sender switches to the new key.</t>
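<figure>
<preamble>As a non-normative illustration, a sender might choose the RTP sequence number at which the new key takes effect by adding a lead interval to its current sequence number. The function name and its parameters in the following Python sketch are hypothetical, and a constant packet rate is assumed; the only protocol fact relied upon is that RTP sequence numbers are 16 bits wide and wrap around.</preamble>
<artwork><![CDATA[
```python
def activation_sequence(current_seq: int, packets_per_second: int,
                        lead_time_s: float) -> int:
    """Pick the sequence number at which the new key takes effect."""
    # Number of packets that will still be sent under the old key.
    lead_packets = int(packets_per_second * lead_time_s)
    # RTP sequence numbers are 16 bits and wrap around.
    return (current_seq + lead_packets) % 65536
```
]]></artwork>
<postamble>For example, at 50 packets per second, a two-second lead from sequence number 100 gives an activation sequence number of 200.</postamble>
</figure>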
</section>
<section anchor="two-time-pad" title="Two-Time Pad">
<t>[[expand this section.]]</t>
<t>In some scenarios, different data is sent to different
participants. For example, in the audio mixer scenario, the active
speaker receives a different stream than the other listeners; the
active speaker's stream does not contain the active speaker's own
input. It is critical that the same SRTP key is not used for the
different content, or else a (so-called) "two-time pad" occurs
(Section 9.1 of <xref target="RFC3711"></xref>).</t>
<t>The same SRTP key MUST NOT be used to send different data.</t>
</section>
</section>
<section title="Acknowledgements">
<t>Thanks to David McGrew for his improvements to this document and to
the underlying protocol. Thanks to Brian Weis, Sheela Rowles, and Mark
Baugher for suggesting how GDOI-SRTP's key management could be used by
DTLS-SRTP.</t>
<t>Thanks to Flemming Andreasen for the reminder regarding two-time
pads, and to John Floroiu for the reminder about salting the key.</t>
<!--
<t>Thanks to Geir Arne Sandbakken for suggesting the H.323 interworking
scenario.</t>
-->
</section>
<section title="IANA Considerations">
<t>[[This section will be completed in a future version of this
document.]]</t>
<t>To do: <list style="symbols">
<t>Register new SDP attribute "dtls-srtp-ktr"</t>
<t>Register new SDP attribute "dtls-srtp-ktr-server"</t>
<t>new TLS content-type "key_transport" (26?)</t>
</list></t>
</section>
</middle>
<back>
<references title="Normative References">
&I-D.ietf-avt-dtls-srtp;
&rfc2119;
&rfc4346;
&rfc2827;
&I-D.ietf-mmusic-sdp-capability-negotiation;
&rfc5234;
&rfc4566;
</references>
<references title="Informational References">
&rfc5117;
&rfc2627;
&rfc3711;
&rfc3550;
&rfc4086;
&I-D.ietf-msec-gdoi-srtp;
<!--
<reference anchor="H.323" target="http://www.itu.int/rec/T-REC-H.323/e">
<front>
<title>H.323 : Packet-based multimedia communications
systems</title>
<author fullname="ITU" surname="ITU">
<organization></organization>
</author>
<date month="June" year="2006" />
</front>
</reference>
<reference anchor="H.235.6"
target="http://www.itu.int/rec/T-REC-H.235.6/en">
<front>
<title>H.323 security framework: Voice encryption profile with
native H.235/H.245 key management</title>
<author fullname="ITU" surname="ITU">
<organization></organization>
</author>
<date month="September" year="2005" />
</front>
</reference>
<reference anchor="H.235.7"
target="http://www.itu.int/rec/T-REC-H.235.7/en">
<front>
<title>H.323 security framework: Usage of the MIKEY key management
protocol for the Secure Real Time Transport Protocol (SRTP) within
H.235</title>
<author fullname="ITU" surname="ITU">
<organization></organization>
</author>
<date month="September" year="2005" />
</front>
</reference>
<reference anchor="H.235.8"
target="http://www.itu.int/rec/T-REC-H.235.8/en">
<front>
<title>Recommendation H.235.8 - H.323 Security framework: Key
exchange for SRTP using secure signalling channels</title>
<author fullname="ITU" surname="ITU">
<organization></organization>
</author>
<date month="September" year="2005" />
</front>
</reference>
-->
&rfc4568;
&I-D.mcgrew-srtp-ekt;
<reference anchor="I-D.irtf-smug-subsetdifference"
target="http://tools.ietf.org/html/draft-irtf-smug-subsetdifference">
<front>
<title>Subset-Difference based Key Management for Secure
Multicast</title>
<author fullname="Jeff Lotspiech" initials="J." surname="Lotspiech">
<organization>IBM</organization>
</author>
<author fullname="Moni Naor" initials="M." surname="Naor">
<organization>Weizmann Institute</organization>
</author>
<author fullname="Dalit Naor" initials="D." surname="Naor">
<organization>IBM</organization>
</author>
</front>
</reference>
</references>
<section anchor="ekt" title="Relationship with EKT">
<t><xref target="I-D.mcgrew-srtp-ekt">Encrypted Key Transport
(EKT)</xref> uses RTCP to send new SRTP keys. For EKT to operate, it
needs to distribute its Key Encryption Key (KEK) to all authorized
listeners, and EKT describes how Security Descriptions can provide
that function. While KTR could also provide the same function, KTR as
described in this document does not support EKT.</t>
<t>This is because EKT cannot satisfy the video switching scenario
(<xref target="sec-video-switching"></xref>) when listeners are ejected
or added to the group. In order for EKT to work in that scenario, the
video switcher would have to synthesize RTCP packets on behalf of the
video sender, or the video switcher would have to tell the video sender
exactly how to generate its EKT KEK message for consumption by the
DTLS-SRTP-Key-Transport listeners -- which is something only the video
switcher should be responsible for doing. Even more complexity would be
introduced if LKH is used between the video switcher and the listeners,
because only the video switcher is aware of the group membership (the
speaker is not) and the video switcher would have to communicate LKH
hierarchical information to the speaker so the speaker could generate
the EKT message. This would distribute LKH between the speaker and the
video switcher. It is more desirable to retain LKH complexity within the
video switcher -- as is proposed in <xref
target="sec-scaling"></xref>.</t>
<t>For the other scenarios, EKT or KTR would work equally well. But EKT
still needs a way to securely communicate its Key Encryption Key to the
authorized listeners, and if KTR were used to provide that function,
there seems no value in using EKT to distribute new keys -- KTR can do
that.</t>
<t>For these reasons, KTR does not describe how it would work with
EKT.</t>
</section>
<section title="Changes">
<t>[[Note to RFC Editor: Please remove this section prior to
publication.]]</t>
<section title="Changes from -00 to -01">
<t><list style="symbols">
<t>more closely aligned with <xref target="RFC5117">RTP
Topologies</xref></t>
<t>added multicast scenario</t>
<t>added voicemail storage/retrieval scenario</t>
<t>added delete_srtp_key</t>
<t>added your_new_srtp_key</t>
<t>aligned SDP for DTLS-SRTP with
draft-ietf-mmusic-sdp-dtls-00</t>
<t>key change rules are now discussed in Security
Considerations</t>
</list></t>
</section>
<section title="Changes from -01 to -02">
<t><list style="symbols">
<t>removed voicemail storage/retrieval scenario -- SPEECHSC found
such a scenario does not work</t>
<t>Described what occurs when a speaker joins/leaves, a listener
joins/leaves, and what a mixer/switcher does (<xref
target="procedures"></xref>).</t>
<t>Removed primitives that can allow two-time pad.</t>
<t>Added scenario for interworking with Security Descriptions</t>
<t>Describe relationship with EKT (<xref target="ekt"></xref>)</t>
</list></t>
</section>
</section>
</back>
</rfc>