<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc='yes'?>
<?rfc tocdepth='5'?>

<?rfc compact='yes'?>
<?rfc subcompact='no'?>

<rfc ipr="full3978" category="std">


    <front>
        <title abbrev="ICE-TCP">
TCP Candidates with Interactive Connectivity Establishment (ICE)
</title>
    
        <author initials="J.R." surname="Rosenberg"
                fullname="Jonathan Rosenberg">
            <organization>Cisco</organization>
    
            <address>
                <postal>
                    <city>Edison</city> <region>NJ</region>
                    <country>US</country>
                </postal>
    
                <email>jdrosen@cisco.com</email>
                <uri>http://www.jdrosen.net</uri>
            </address>
        </author>
    
        <date month="July" year="2007" />
    
        <area>RAI</area>
        <workgroup>MMUSIC</workgroup>
        <keyword>SIP</keyword>
        <keyword>NAT</keyword>
        <abstract>
            <t>Interactive Connectivity Establishment (ICE) defines a
mechanism for NAT traversal for multimedia communication protocols
based on the offer/answer model of session negotiation. ICE works by
providing a set of candidate transport addresses for each media
stream, which are then validated with peer-to-peer connectivity checks
based on Session Traversal Utilities for NAT (STUN). ICE provides a
general framework for describing candidates, but only defines
UDP-based transport protocols. This specification extends ICE
to TCP-based media, including the ability to offer a mix of
TCP and UDP-based candidates for a single stream.</t>
        </abstract>
    </front>

<middle>

<!-- microsoft guys comment #1: the current tcp mechanism doesn't work
when one side is behind symmetric, other is behind not-symmetric. The
guy behind the not-symmetric nat will never get a IP address that
shows up in a candidate that it can use for the suicide attempt from
the other turn server. Proposed solution is to extend TURN to include
the mapped address in the response for tcp also. -->

<!-- stun over tcp - allow RTP/TCP mux. Use contrans to frame STUN
within TCP. This allows TCP processing to be the same as UDP as far as
ICE is concerned. This allows the hack discussed in ICE - of sending
to the validated candidate right away - to work for tcp -->

<!-- main change: allow sending media to validated candidate that you
would otherwise promote to m/c-line right away, promote it at the next
opportunity -->

<!-- for relayed addresses, make sure its ok to use UDP to the TCP
allocated address -->

<!-- make sure wording is not specific to turn/stun, only uses those
as examples on how to meet specific functional requirements -->

<!-- there is still some vague wording in here about whether there can
be an initial active candidate for tcp; clarify when you would do this
and be less vague -->

<!-- need to add example -->

<!-- january 2007 -->

<!-- per gonzalo's note mid-december, clarify connection
management. Namely, you can close a connection at any time but SHOULD
keep it open. Clarify more differences with 4145. -->


<section title="Introduction">

<t>
Interactive Connectivity Establishment (ICE) <xref
target="I-D.ietf-mmusic-ice"/> defines a mechanism for NAT traversal
for multimedia communication protocols based on the offer/answer model
<xref target="RFC3264"/> of session negotiation. ICE works by
providing a set of candidate transport addresses for each media
stream, which are then validated with peer-to-peer connectivity checks
based on Session Traversal Utilities for NAT (STUN) <xref
target="I-D.ietf-behave-rfc3489bis"/>. However, ICE only defines
procedures for UDP-based transport protocols.</t>

<t> There are many reasons why ICE support for TCP is
important. Firstly, there are media protocols that only run over
TCP. Examples of such protocols are web and application sharing and
instant messaging
<xref target="I-D.ietf-simple-message-sessions"/>. For these protocols
to work in the presence of NAT, unless they define their own NAT
traversal mechanisms, ICE support for TCP is needed. In addition, RTP
itself can run over TCP (without TLS <xref target="RFC4571"/> and with TLS
<xref target="RFC4572"/>). Typically, it is preferable to run RTP over
UDP, and not TCP. However, in a variety of network environments,
overly restrictive NAT and firewall devices prevent UDP-based
communications altogether, but general TCP-based communications are
permitted. In such environments, sending RTP over TCP, and thus
establishing the media session, may be preferable to having it fail
altogether. With this specification, agents can gather UDP and TCP
candidates for an RTP-based stream, list the UDP ones with higher
priority, and then only use the TCP-based ones if the UDP ones fail
altogether. This provides a fallback mechanism that allows multimedia
communications to be highly reliable.
</t>

<t>
The usage of RTP over TCP is particularly useful when combined
with Traversal Using Relay NAT
<xref target="I-D.ietf-behave-turn"/>. In this case, one of 
the agents would connect to its TURN server using TCP, and obtain a
TCP-based relayed candidate. It would offer 
this to its peer agent as a candidate. The answerer would initiate a
TCP connection towards the TURN server. When that connection is
established, media can flow over the connections, through the
TURN server. The benefit of this usage is that it only requires the agents
to make outbound TCP connections to a server on the public
network. This kind of operation is broadly interoperable through NAT
and firewall devices. Since it is a goal of ICE and this extension to
provide highly reliable communications that "just work" in as broad
a set of network deployments as possible, this use case is particularly
important.  </t>

<t>
The usage of RTP over TCP/TLS is also useful when communicating between
single-user agents (such as a softphone or hardphone) and an agent run
by a provider that is meant to service many users, such as a PSTN
gateway. In such a deployment, the multi-user agent would act as the
TLS server, and have a certificate. The single-user agent can
then connect and validate the certificate, but offer none of its own
(since it is not likely to have one).
</t>

<t>
This specification extends ICE by defining its usage with TCP
candidates. This specification does so by following the 
outline of ICE itself, and calling out the additions and changes
necessary in each section of ICE to support TCP candidates. 
</t>

</section>

<section title="Overview of Operation">

<t>
The usage of ICE with TCP is relatively
straightforward. The main area 
of specification is around how and when connections are opened, and
how those connections relate to candidate pairs.
</t>

<t>
When the agents perform address allocations to gather TCP-based
candidates, three types of candidates can be obtained. These are
active candidates, passive candidates, and simultaneous-open
candidates. An active candidate is one for
which the agent will attempt to open an outbound connection, but will
not receive incoming connection requests. A passive candidate is one
for which the agent will receive incoming connection attempts, but not
attempt a connection. A simultaneous-open candidate is one for which
the agent will attempt to open a connection simultaneously with its
peer. 
</t>

<t>
Because this specification requires multiple candidates for a media
stream, it is not compatible with ICE's lite implementation, and can
only be used by full implementations. 
</t>

<t>
When gathering candidates from a host interface, the agent typically
obtains active, passive, and simultaneous-open
candidates. Similarly, communications with a STUN server will provide
server reflexive and relayed versions of all three types. Connections
to the STUN server are kept open during ICE processing.
</t>

<t>
When encoding these candidates into offers and answers, the type of
the candidate is signaled. In the case of active candidates, an IP
address and port are present, but they are meaningless, as they are ignored
by the peer. As a consequence, active candidates do not need to be
physically allocated at the time of address gathering. Rather, the
physical allocations, which occur as a consequence of a connection
attempt, occur at the time of the connectivity checks.
</t>

<t> When the candidates are paired together, active candidates are
always paired with passive, and simultaneous-open candidates with each
other. When a connectivity check is to be made on a candidate pair,
each agent determines whether it is to make a connection attempt for
this pair.
</t>

<t><list style="empty"><t> Why have both active and simultaneous-open
candidates? Why not just simultaneous-open? The reason is that NAT
treatment of simultaneous opens is currently not well defined, though
specifications are being developed to address this <xref
target="I-D.ietf-behave-tcp"/>. Some NATs block the second
TCP SYN packet or improperly process the subsequent SYNACK, which will
cause the connection attempt to fail. Therefore, if only simultaneous
opens are used, connections may often fail. Alternatively, using
unidirectional opens (where one side is active and the other is
passive) is more reliable, but will always require a relay if both
sides are behind NAT. Therefore, in the spirit of the ICE philosophy,
both are tried. Simultaneous-opens are preferred since, if they do
work, they will not require a relay even when both sides are behind
different NATs.  </t>
</list></t>

<t>
The actual processing of generating connectivity checks, managing the
state of the check list, and updating the Valid list works identically
for TCP as it does for UDP.
</t>

<t>
ICE requires an agent to demultiplex STUN and application layer
traffic, since they appear on the same port. This demultiplexing is
described by ICE, and is done using the magic cookie and other fields
of the message.  Stream-oriented transports introduce another wrinkle,
since they require a way to frame the connection so that the
application and STUN packets can be extracted in order to determine
which is which. For this reason, TCP media streams
utilizing ICE use the basic framing provided in RFC 4571 <xref
target="RFC4571"/>, even if the application layer protocol is not
RTP. 
</t>
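
<t>
The framing step can be illustrated with a minimal sketch. It assumes
only what RFC 4571 defines: each packet on the connection is preceded
by a 16-bit unsigned length in network byte order. The function names
frame and deframe are hypothetical and are not defined by this
specification or by ICE.
</t>

<figure><artwork>
<![CDATA[
```python
import struct

MAX_FRAME_PAYLOAD = 0xFFFF  # the RFC 4571 length field is 16 bits


def frame(packet: bytes) -> bytes:
    """Prefix one packet (STUN or application data) with its length."""
    if len(packet) > MAX_FRAME_PAYLOAD:
        raise ValueError("packet too large for one RFC 4571 frame")
    return struct.pack("!H", len(packet)) + packet


def deframe(buffer: bytes):
    """Split a receive buffer into complete packets and leftover bytes."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # this frame has not been fully received yet
        packets.append(buffer[offset + 2 : offset + 2 + length])
        offset += 2 + length
    return packets, buffer[offset:]
```
]]></artwork></figure>

<t>
Because TCP delivers a byte stream, a receiver written along these
lines would keep the leftover bytes and prepend them to the next read
before calling deframe again.
</t>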

<t>When TLS is in use, TLS itself runs over the RFC
4571 framing shim, so that STUN runs outside of the TLS
connection. Pictorially:
</t>

<figure anchor="fig-icetcp" title="ICE TCP Stack"><artwork>
<![CDATA[
                               +----------+                               
                               |          |                               
                               |    App   |                               
                    +----------+----------+                               
                    |          |          |                               
                    |   STUN   |    TLS   |                               
                    +----------+----------+                               
                    |                     |                               
                    |      RFC 4571       |                               
                    +---------------------+                               
                    |                     |                               
                    |         TCP         |                               
                    +---------------------+                               
                    |                     |                               
                    |         IP          |                               
                    +---------------------+                               
]]></artwork></figure>

<t>The
implication of this is that, for any media stream protected by TLS,
the agent will first run ICE procedures, exchanging STUN
messages. Then, once ICE completes, TLS procedures begin. ICE and TLS
are thus "peers" in the protocol stack. The STUN messages are not sent
over the TLS connection, even ones sent for the purposes of keepalive
in the middle of the media session.
</t>

<t>
When an updated offer is generated by the controlling endpoint, the SDP
extensions for connection oriented media <xref target="RFC4145"/> are
used to signal that an existing connection should be used, rather than
opening a new one. 
</t>

</section>

<section title="Sending the Initial Offer">

<t>
The offerer MUST be a full ICE implementation.
</t>

<section anchor="sec-gather" title="Gathering Candidates">

<t>For each TCP capable media stream the agent wishes to use
(including ones, like RTP, which can either be UDP or TCP), the agent
SHOULD obtain two host candidates (each on a different port) for each
component of the media stream on each interface that the host has -
one for the simultaneous open, and one for the passive candidate. If
an agent is not capable of acting in one of these modes (for example,
the TCP connection is being used with TLS and the agent can only act
as the client), it would omit those candidates.
</t>

<t>
Providers of real-time communications services may decide that it is
preferable to have no media at all than it is to have media over
TCP. To allow for choice, it is RECOMMENDED that agents be
configurable with whether they obtain TCP candidates for real time
media. 
</t>

<t><list style="empty"> <t> Having it be configurable, and then
configuring it to be off, is far better than not having the capability
at all. An important goal of this specification is to provide a single
mechanism that can be used across all types of endpoints. As such, it
is preferable to account for provider and network variation through
configuration, instead of hard-coded limitations in an
implementation. Furthermore, network characteristics and connectivity
assumptions can, and will, change over time. Just because an agent is
communicating with a server on the public network today doesn't mean
that it won't need to communicate with one behind a NAT tomorrow. Just
because an agent is behind a NAT with endpoint independent mapping
today doesn't mean that
tomorrow its user won't pick it up and take it to a public
network access point where there is a NAT with address and port
dependent mapping properties, or one that only
allows outbound TCP. The way to handle these cases and build a
reliable system is for agents to implement a diverse set of techniques
for allocating addresses, so that at least one of them is almost
certainly going to work in any situation. Implementors should consider
very carefully any assumptions that they make about deployments before
electing not to implement one of the mechanisms for address
allocation. In particular, implementors should consider whether the
elements in the system may be mobile, and connect through different
networks with different connectivity. They should also consider
whether endpoints which are under their control, in terms of location
and network connectivity, would always be under their control. In
environments where mobility and user control are possible, a
multiplicity of techniques is essential for reliability.   </t></list>
</t>


<t>Each agent SHOULD "obtain" an active host candidate for each
component of each TCP capable media stream on each interface that the
host has. The agent does not have to actually allocate a port for
these candidates. These candidates serve as a placeholder for the
creation of the check lists.
</t>

<t>
Using each simultaneous-open and passive host TCP candidate as a base, the  
agent SHOULD obtain server reflexive candidates. In addition, the agent
SHOULD choose, amongst all host TCP candidates for a component that
have the same foundation (there will typically be two - a passive and
simultaneous-open), one of those candidates, and from it, obtain two
relayed candidates - one that will be simultaneous-open, and
one that will be passive. Based on these rules, for each host TCP
candidate, an agent will be seeking either a server reflexive
candidate, or both a server reflexive and relayed candidate:
<list style="symbols">
<t>If the agent is seeking both a server reflexive and relayed
  candidate for a host TCP candidate, it
  initiates a TCP connection from the host TCP candidate to its
  configured TURN server, and 
  through an Allocate request, obtains both at the same time. 
</t>
<t>If the agent is seeking just a server reflexive candidate for a host TCP
  candidate, the agent initiates a TCP connection from the host TCP
  candidate to its configured STUN server, and through a Binding
  Request, obtains a server reflexive candidate from the mapped
  address in the response. 
</t></list>
Once the
Allocate or Binding request has completed, the agent MUST keep the TCP
connection open until ICE processing has completed. See
<xref target="sec-impl"/> for important implementation
guidelines.
</t>

<t><list style="empty"><t>OPEN ISSUE: Do we really need S-O candidates
    from TURN servers? This would only be needed if there are NATs
    north of the TURN server. 
</t></list></t>

<t>If a media stream is UDP-based (such as RTP), an agent MAY use an
additional host TCP candidate to request a UDP-based candidate from
a TURN server. Usage of the UDP candidate from the TURN server follows the
procedures defined in ICE for UDP candidates.
</t>

<t>Each agent SHOULD "obtain" an active relayed candidate for each
component of each TCP capable media stream on each
interface that the host has. The agent does not have to actually
allocate a port for these candidates from the relay at this
time. These candidates serve as a placeholder for the creation of the
check lists.
</t>

<t> Like its UDP counterparts, TCP-based STUN transactions are paced
out at one every Ta seconds. This pacing refers strictly to STUN
transactions (both Binding and Allocate requests). If performance of
  the transaction requires establishment of a TCP connection, then the
  connection gets opened when the transaction is performed. 
</t>

</section>

<section title="Prioritization">

<t>
The transport protocol itself is a criterion for choosing one
candidate over another. If a particular media stream can run over UDP
or TCP, the UDP candidates might be preferred over the TCP
candidates. This allows ICE to use the lower latency UDP connectivity
if it exists, but fall back to TCP if UDP doesn't work.
</t>

<t>
To accomplish this, the local preference SHOULD be defined as:
</t>

<figure><artwork>
<![CDATA[
local-preference = (2^12)*(transport-pref) +
                   (2^9)*(direction-pref) +
                   (2^0)*(other-pref)
]]></artwork></figure>

<t>
Transport-pref is the relative preference for candidates with this
particular transport protocol (UDP or TCP), and direction-pref is the
preference for candidates with this particular establishment
directionality (active, passive, or simultaneous-open). Other-pref is
used as a differentiator when two candidates would otherwise have
identical local preferences. 
</t>

<t>
Transport-pref MUST be between 0
and 15, with 15 being the most preferred. Direction-pref MUST be
between 0 and 7, with 7 being the most preferred. Other-pref MUST be
between 0 and 511, with 511 being the most preferred. For RTP-based
media streams, it is RECOMMENDED that UDP have a transport-pref of 15
and TCP of 6. It is RECOMMENDED that, for all
connection-oriented media, simultaneous-open candidates have a
direction-pref of 7, active of 5 and passive of 2. If any two
candidates have the same type-preference, transport-pref, and
direction-pref, they MUST have a unique other-pref. With this
specification, the only way that can happen is with multi-homed hosts,
in which case other-pref is a preference amongst interfaces.
</t>
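
<t>
As a worked illustration, the formula and the ranges above can be
sketched as follows. The function name local_preference and the two
dictionaries are hypothetical; the numeric values are the ones
recommended in this section. The three fields occupy 4, 3, and 9 bits
respectively, so the result always fits in 16 bits.
</t>

<figure><artwork>
<![CDATA[
```python
# Values recommended in this section for RTP-based media streams.
RECOMMENDED_TRANSPORT_PREF = {"udp": 15, "tcp": 6}
RECOMMENDED_DIRECTION_PREF = {"tcp-so": 7, "tcp-act": 5, "tcp-pass": 2}


def local_preference(transport_pref: int, direction_pref: int,
                     other_pref: int = 0) -> int:
    """Combine the three preferences into one 16-bit local preference."""
    if not 0 <= transport_pref <= 15:
        raise ValueError("transport-pref must be between 0 and 15")
    if not 0 <= direction_pref <= 7:
        raise ValueError("direction-pref must be between 0 and 7")
    if not 0 <= other_pref <= 511:
        raise ValueError("other-pref must be between 0 and 511")
    # (2^12)*transport-pref + (2^9)*direction-pref + (2^0)*other-pref
    return (transport_pref << 12) + (direction_pref << 9) + other_pref


tcp = RECOMMENDED_TRANSPORT_PREF["tcp"]
for name, direction in RECOMMENDED_DIRECTION_PREF.items():
    print(name, local_preference(tcp, direction))
```
]]></artwork></figure>

<t>
With the recommended values, a TCP simultaneous-open candidate yields
28160, an active candidate 27136, and a passive candidate 25600, so
the ordering within TCP follows direction-pref while any candidate
with a higher transport-pref sorts above all of them.
</t>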

</section>

<section title="Choosing Default Candidates">

<t>
The default candidate is chosen primarily based on the likelihood of it
working with a non-ICE peer. When media streams supporting mixed modes
(both TCP and UDP) are used with ICE, it is RECOMMENDED that, for
real-time streams (such as RTP), the default candidates be
UDP-based. However, the default SHOULD NOT be the simultaneous-open
candidate. 
</t>

<t>
If a media stream is inherently TCP-based, the agent SHOULD NOT select
the simultaneous-open candidate as default.
</t>

</section>

<section title="Encoding the SDP">

<t> TCP-based candidates are encoded into a=candidate lines
identically to the UDP encoding described in
<xref target="I-D.ietf-mmusic-ice"/>. However, the transport protocol
is set to "tcp-so" for TCP simultaneous-open candidates, "tcp-act" for
TCP active candidates, and "tcp-pass" for TCP passive candidates. The
addr encoded into the candidate attribute for active
candidates MUST be set to the IP address that will be used for the
connection attempt, and the port MUST be set to 9 (i.e., Discard). For active
relayed candidates, the value for addr must be identical to the IP
address of a passive or simultaneous-open candidate from the same TURN
server.
</t>
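
<t>
The encoding rules can be sketched as below. This is an illustration
only: the field order (foundation, component ID, transport, priority,
address, port, then "typ" and the candidate type) is an assumption
taken from the a=candidate grammar in ICE, and the helper name
candidate_line, the priority value, and the addresses used are
hypothetical.
</t>

<figure><artwork>
<![CDATA[
```python
DISCARD_PORT = 9
TCP_TRANSPORTS = ("tcp-so", "tcp-act", "tcp-pass")


def candidate_line(foundation: str, component_id: int, transport: str,
                   priority: int, addr: str, port: int,
                   cand_type: str) -> str:
    """Build an a=candidate line for a TCP candidate.

    The field layout is assumed from the ICE a=candidate grammar.
    """
    if transport not in TCP_TRANSPORTS:
        raise ValueError("not a TCP candidate transport: " + transport)
    if transport == "tcp-act":
        # Active candidates always signal port 9 (Discard).
        port = DISCARD_PORT
    return (f"a=candidate:{foundation} {component_id} {transport} "
            f"{priority} {addr} {port} typ {cand_type}")


print(candidate_line("1", 1, "tcp-act", 1000, "192.0.2.1", 5000, "host"))
print(candidate_line("1", 1, "tcp-pass", 900, "192.0.2.1", 5000, "host"))
```
]]></artwork></figure>

<t>
In this sketch the port supplied for an active candidate is replaced
by 9, while passive and simultaneous-open candidates keep the port on
which the agent actually listens.
</t>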

<t> If the default candidate is TCP, the agent MUST include the
a=setup and a=connection attributes from RFC 4145
<xref target="RFC4145"/>, following the procedures defined there as if
ICE was not in use. Furthermore, the agent MUST select a default TCP
candidate matching the type in the a=setup attribute. For example, if
an agent selects its passive candidate as default in an offer, and the
media stream utilizes RFC 4145, the agent MUST include an
a=setup:passive attribute with a passive candidate, and the answerer
would utilize an active candidate with the a=setup:active
attribute. If the peer is not ICE capable, the agents will fall back
to non-ICE processing of TCP connections, which is done based on RFC
4145.
</t>

<t>
If an agent is utilizing DTLS-SRTP
<xref target="I-D.ietf-avt-dtls-srtp"/>, it MAY include a mix of UDP
and TCP candidates. The SDP MUST be constructed as described in
<xref target="I-D.fischl-mmusic-sdp-dtls"/>, including the a=setup
attribute. DTLS will be utilized regardless of whether a TCP or UDP
candidate is selected. If a TCP candidate is selected by ICE, the
directionality attributes (a=setup) are utilized strictly to determine
the direction of the DTLS handshake. Directionality of the TCP
connection establishment is determined by the ICE attributes and
procedures defined here. If an agent is securing media by running RTP
over a TLS connection, it MUST NOT include UDP candidates. The SDP
MUST be constructed as described in RFC 4572 <xref target="RFC4572"/>
and MUST include the a=setup attribute in RFC 4145
<xref target="RFC4145"/>. The directionality attributes (a=setup) are
utilized strictly to determine the direction of the TLS
handshake. Directionality of the TCP connection establishment is
determined by the ICE attributes and procedures defined here.
</t>

<t><list style="empty">
<t>OPEN ISSUE: The above paragraph assumes that DTLS-SRTP can also be
  run over TCP. Currently, that is not specified. It would need to be
  added. The alternative is that, depending on whether a TCP or UDP
  connection is selected, the next operation is either TLS with RTP,
  or DTLS with SRTP. This, however, is profoundly confusing and would
  have horrible interactions with SDPCap negotiation, since it bends
  layers. For ICE to be able to usefully select either TCP or UDP
  candidate, the processing of secure media should not vary based on
  UDP or TCP. Indeed, due to the RFC 4571 framing, DTLS-SRTP should
  happily run without any change. If we specify that, we should
  probably disallow RTP over TCP/TLS, since that would provide two
  ways of doing the same thing, and we might have interop problems.
</t></list></t>

</section>

</section>

<section title="Receiving the Initial Offer">

<section title="Forming the Check Lists">

<t>
When forming candidate pairs, the following types of candidates can be
paired with each other:
</t>

<figure><artwork>
<![CDATA[

Local             Remote
Candidate         Candidate
----------------------------
tcp-so           tcp-so
tcp-act          tcp-pass
tcp-pass         tcp-act
]]></artwork></figure>

<t>
When the agent prunes the check list, it MUST also remove any pair for
which the local candidate is tcp-pass.
</t>
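
<t>
The pairing table and the pruning rule can be sketched as follows.
The sketch models a candidate as a (transport, address) tuple, which
is a simplification made here for illustration; it ignores components,
foundations, and priorities, which are handled as in ICE. The names
form_pairs and prune are hypothetical.
</t>

<figure><artwork>
<![CDATA[
```python
from itertools import product

# (local transport, remote transport) combinations that may be paired.
COMPATIBLE = {
    ("tcp-so", "tcp-so"),
    ("tcp-act", "tcp-pass"),
    ("tcp-pass", "tcp-act"),
}


def form_pairs(local, remote):
    """Pair every local candidate with each compatible remote one."""
    return [(loc, rem) for loc, rem in product(local, remote)
            if (loc[0], rem[0]) in COMPATIBLE]


def prune(pairs):
    """Drop pairs whose local candidate is passive."""
    return [(loc, rem) for loc, rem in pairs if loc[0] != "tcp-pass"]


local = [("tcp-so", "L1"), ("tcp-act", "L2"), ("tcp-pass", "L3")]
remote = [("tcp-so", "R1"), ("tcp-act", "R2"), ("tcp-pass", "R3")]
for pair in prune(form_pairs(local, remote)):
    print(pair)
```
]]></artwork></figure>

<t>
After pruning, only pairs for which the agent itself initiates a
connection (local tcp-so or tcp-act) remain in this sketch; the
passive side is exercised when the peer connects to it.
</t>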

<t>
The remainder of check list processing works like the UDP case.
</t>

</section>

</section>

<section title="Connectivity Checks">

<section title="STUN Client Procedures">

<section title="Sending the Request">

<t>
When an agent wants to send a TCP-based connectivity check, it first
opens a TCP connection if none yet exists for the 5-tuple defined by
the candidate pair for which the check is to be sent. This connection
is opened from the local candidate of the pair to the remote candidate
of the pair. If the local candidate is tcp-act, the agent MUST open a
connection from the interface associated with that local
candidate. This connection MUST be opened from an unallocated
port. For host candidates, this is readily done by connecting from the
candidate's interface. For relayed candidates, the agent uses the
procedures in <xref target="I-D.ietf-behave-turn"/> to initiate a new
connection from the specified interface on the TURN server.
</t>

<t>
Once the connection is established, the agent MUST utilize the shim
defined in RFC 4571 <xref target="RFC4571"/> for as long as this
connection remains open. The STUN Binding requests and responses are
sent on top of this shim, so that the length field defined in RFC 4571
precedes each STUN message. If TLS or DTLS-SRTP is to be utilized for the media
session, the TLS or DTLS-SRTP handshakes will take place on top of this shim as
well. However, they take place only once ICE processing has completed. In
essence, the TLS or DTLS-SRTP handshakes are considered a part of the media
protocol. STUN is never run within the TLS or DTLS-SRTP session.
</t>

<t>
If the TCP connection cannot be established, the check is considered
to have failed, and a full-mode agent MUST update the pair state to
Failed in the check list.
</t>

<t>
Once the connection is established, client procedures are identical to
those for UDP candidates. Note that STUN responses received on an
active TCP candidate will typically produce a remote peer
reflexive candidate.
</t>


</section>

</section>

<section title="STUN Server Procedures">

<t>
An agent MUST be prepared to receive incoming TCP connection requests
on any host or relayed TCP candidate that is simultaneous-open or
passive. When the connection request is received, the agent MUST
accept it. The agent MUST utilize the framing defined in RFC 4571
<xref target="RFC4571"/> for the lifetime of this connection. Due to
this framing, the agent will receive data in discrete frames. Each
frame can carry media (such as RTP or SRTP), TLS, DTLS, or STUN
packets. The STUN packets are extracted as described in
<xref target="sec-recvmedia"/>.
</t>

<t>
Once the connection is established, STUN server procedures are identical to
those for UDP candidates. Note that STUN requests received on a
passive TCP candidate will typically produce a remote peer
reflexive candidate.
</t>

</section>

</section>

<section title="Concluding ICE Processing">

<t>
If there are TCP candidates for a media stream, a controlling agent
MUST use a regular selection algorithm. 
</t>

<t>
When ICE processing for a media stream completes, each agent SHOULD
close all TCP connections except the one between the candidate pairs
selected by ICE.
</t>

<t><list style="empty"><t>
These two rules are related; the closure of connections on completion
of ICE implies that a regular selection algorithm has to be used. This
is because aggressive selection might cause transient pairs to be
selected. Once such a pair was selected, the agents would close the
other connections, one of which may be about to be selected as a
better choice. This race condition may result in TCP connections being
accidentally closed for the pair that ICE selects. 
</t></list></t>

</section>

<section title="Subsequent Offer/Answer Exchanges">


<section title="ICE Restarts">

<t>
If an ICE restart occurs for a media stream with TCP candidate pairs
that have been selected by ICE, the agents MUST NOT close the
connections after the restart. In the offer or answer that causes the
restart, an agent MAY include a simultaneous-open candidate whose
transport address matches the previously selected candidate. If both
agents do this, the result will be a simultaneous-open candidate pair
matching an existing TCP connection. In this case, the agents MUST NOT
attempt to open a new connection (or start new TLS or DTLS-SRTP
procedures). Instead, that existing connection is reused and STUN
checks are performed.
</t>

<t>
Once the restart completes, if the selected pair does not match the
previously selected pair, the TCP connection for the previously
selected pair SHOULD be closed by the agent. 
</t>

</section>

</section>

<section title="Media Handling">

<section title="Sending Media">

<t>
When sending media, if the selected candidate pair matches an existing TCP
connection, that connection MUST be used for sending media.
</t>

<t>
The framing defined in RFC 4571 MUST be used when sending media. For
media streams that are not RTP-based and do not normally use RFC 4571,
the agent treats the media stream as a byte stream, and assumes that
it has its own framing of some sort. It then takes an
arbitrary number of bytes from the bytestream, and places that as a
payload in the RFC 4571 frames, including the length. The recipient
can extract the bytestream and apply the application-specific framing
on it. 
</t>
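
<t>
For a media stream that is not RTP-based, the sending side can be
sketched as below. The chunk size is arbitrary, as the text above
allows; the default of 1200 bytes and the function name
frames_from_bytestream are assumptions made for this illustration.
The only constraint taken from RFC 4571 is that a payload must fit in
the 16-bit length field.
</t>

<figure><artwork>
<![CDATA[
```python
import struct


def frames_from_bytestream(data: bytes, chunk_size: int = 1200):
    """Yield RFC 4571 frames carrying successive chunks of data."""
    if not 1 <= chunk_size <= 0xFFFF:
        raise ValueError("chunk_size must fit a 16-bit length field")
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # 16-bit length in network byte order, then the payload.
        yield struct.pack("!H", len(chunk)) + chunk


for wire_frame in frames_from_bytestream(b"abcdefg", chunk_size=3):
    print(wire_frame)
```
]]></artwork></figure>

<t>
The chunk boundaries carry no meaning for the application; the
recipient concatenates the payloads and applies the application's own
framing to the result.
</t>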

<t>
If TLS or DTLS-SRTP procedures are being utilized to protect the media
stream, those procedures start at the point that media is permitted to
flow, as defined in the ICE specification
<xref target="I-D.ietf-mmusic-ice"/>. The TLS or DTLS-SRTP handshakes
occur on top of the RFC 4571 shim, and are considered part of the media
stream for purposes of this specification.
</t>

</section>

<section anchor="sec-recvmedia" title="Receiving Media">

<t>
The framing defined in RFC 4571 MUST be used when receiving media. For
media streams that are not RTP-based and do not normally use RFC 4571,
the agent extracts the payload of each RFC 4571 frame, and determines
whether it is STUN or application layer data based on the procedures
in ICE <xref target="I-D.ietf-mmusic-ice"/>. If media is being
protected with DTLS-SRTP, the DTLS, RTP and STUN packets are
demultiplexed as described in Section 3.6.2 of draft-ietf-avt-dtls-srtp
<xref target="I-D.ietf-avt-dtls-srtp"/>. If media is being protected
with RTP over TLS, the TLS and STUN packets are demultiplexed by TBD.
</t>

<t><list style="empty">
<t>OPEN ISSUE: With TLS, the demultiplexing would need to be done by
  looking for the magic cookie. However, due to TLS, the data in that
  position for a TLS frame will be random. So there is a 1 in 2^32
  chance that this matches. We could do better in this particular case
  by switching from the RFC 4571 framing to the TURN framing, which
  includes a next-protocol field. This would make demux
  deterministic. 
</t></list></t>


<t>
For non-STUN data, the agent appends this to the ongoing bytestream
collected from the frames. It then parses the bytestream as if it had
been directly received over the TCP connection. This allows for
ICE-TCP to work without regard to the framing mechanism used by the
application layer protocol.
</t>
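
<t>
The receive path for an unprotected, non-RTP stream might look like
the following sketch. It assumes the STUN header layout from the STUN
specification (a 20-byte header whose first two bits are zero and
whose bytes 4 through 7 hold the magic cookie 0x2112A442). The class
and function names are hypothetical, the check is a heuristic that can
misclassify application data which happens to match, and the sketch
does not address the TLS case, which this document leaves open.
</t>

<figure><artwork>
<![CDATA[
```python
import struct

MAGIC_COOKIE = 0x2112A442
STUN_HEADER_LEN = 20


def looks_like_stun(payload: bytes) -> bool:
    """Heuristic check: header length, two zero bits, magic cookie."""
    if len(payload) < STUN_HEADER_LEN:
        return False
    if payload[0] & 0xC0:
        return False
    (cookie,) = struct.unpack_from("!I", payload, 4)
    return cookie == MAGIC_COOKIE


class FrameReceiver:
    """Deframes RFC 4571 data and separates STUN from the bytestream."""

    def __init__(self):
        self._pending = b""
        self.stun_messages = []
        self.app_stream = bytearray()

    def feed(self, data: bytes) -> None:
        self._pending += data
        while len(self._pending) >= 2:
            (length,) = struct.unpack_from("!H", self._pending, 0)
            if len(self._pending) < 2 + length:
                break  # wait for the rest of this frame
            payload = self._pending[2 : 2 + length]
            self._pending = self._pending[2 + length :]
            if looks_like_stun(payload):
                self.stun_messages.append(payload)
            else:
                # Non-STUN data is appended to the ongoing bytestream.
                self.app_stream += payload
```
]]></artwork></figure>

<t>
An application using such a receiver would hand stun_messages to its
ICE agent and parse app_stream exactly as if it had been read directly
from the TCP connection.
</t>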

</section>

</section>

<section title="Connection Management">

<section title="Connections Formed During Connectivity Checks">

<t>
Once a TCP or TCP/TLS connection is opened by ICE for the purpose of
connectivity checks, its lifecycle depends on how it is used. If the
candidate pair is selected by ICE for media, an agent SHOULD
keep the connection open until one of the following occurs:
<list style="symbols">
<t>The session terminates.</t>
<t>The media stream is removed.</t>
<t>An ICE restart takes place, resulting in the selection of a
different candidate pair.</t>
</list>
In each of these cases, the agent SHOULD close the connection when the
event occurs.
</t>

<t>
If a connection has been selected by ICE, an agent MAY close it
anyway. As described in the next paragraph, this will cause it to be
reopened almost immediately, and in the interim media cannot be
sent. Consequently, such closures have a negative effect and are NOT
RECOMMENDED. However, there may be cases where an agent needs to close
a connection for some reason.
</t>

<t>
If an agent needs to send media on the selected candidate pair, and
its TCP connection has closed, either on purpose or due to some error,
then:
<list style="symbols">
<t>If the agent's local candidate is tcp-act or tcp-so, it MUST reopen a
connection to the remote candidate of the selected pair.
</t>
<t>If the agent's local candidate is tcp-pass, the agent MUST await an
incoming connection request, and consequently, will not be able to
send media until it has been opened.
</t>
</list>
If the TCP connection is established, the framing of RFC 4571 is
utilized. If the agent opened the connection, it MUST send a STUN
connectivity check. An agent MUST be prepared to receive a
connectivity check over a connection it opened or accepted (note that
this is true in general; ICE requires that an agent be prepared to
receive a connectivity check at any time, even after ICE processing
completes). If an agent receives a connectivity check after
re-establishment of the connection, it MUST generate a triggered check
over that connection in response if it has not already sent a
check. Once an agent has sent a check and received a successful
response, the connection is considered Valid and media can be sent
(which includes a TLS or DTLS-SRTP session resumption or restart).
</t>
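<t>
The re-establishment rules above can be sketched as a small state
tracker. This is an illustrative model only. The class name
ReopenedConnection, its method names and the returned action strings
are hypothetical, and the sketch omits STUN encoding, timers and the
TLS or DTLS-SRTP resumption step.
</t>

<figure><artwork>
<![CDATA[
```python
class ReopenedConnection:
    """Tracks one re-established TCP connection for the selected pair."""

    def __init__(self, local_type):
        if local_type not in ("tcp-act", "tcp-so", "tcp-pass"):
            raise ValueError("unknown candidate type: " + local_type)
        self.local_type = local_type
        self.check_sent = False
        self.valid = False

    def on_connection_lost(self):
        """Return what the agent does when it next needs to send media."""
        self.check_sent = False
        self.valid = False
        if self.local_type in ("tcp-act", "tcp-so"):
            return "reopen"        # MUST reopen to the remote candidate
        return "await-incoming"    # tcp-pass: wait for the peer to connect

    def on_established(self, opened_by_agent):
        """Return True if a connectivity check must be sent right away."""
        if opened_by_agent:
            self.check_sent = True
            return True
        return False

    def on_check_received(self):
        """Return True if a triggered check must be sent in response."""
        if not self.check_sent:
            self.check_sent = True
            return True
        return False

    def on_success_response(self):
        """A successful response to our own check makes the pair Valid."""
        if self.check_sent:
            self.valid = True
        return self.valid
```
]]></artwork></figure>

<t>
In this model, media is sent only once the valid flag is set, which
mirrors the requirement that a check be sent and a successful response
received before the connection is considered Valid.
</t>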

<t>
If the TCP connection cannot be established, the controlling agent
SHOULD restart ICE for this media stream.
</t>

</section>

<section title="Connections formed for Gathering Candidates">

<t>
If the agent opened a connection to a STUN server for the purposes of
gathering a server reflexive candidate, that connection SHOULD be
closed by the client once ICE processing has completed. This applies
regardless of whether the candidate learned from the STUN server was
selected by ICE.
</t>

<t>
If the agent opened a connection to a TURN server for the purposes of
gathering a relayed candidate, that connection MUST be kept open by
the client for the duration of the media session if:
<list style="symbols">
<t>A relayed candidate learned from the TURN server was selected by ICE,
</t>
<t>or an active candidate established as a consequence of a Connect
  request sent through that TCP connection was selected by ICE.
</t>
</list>
Otherwise, the connection to the TURN server SHOULD be closed once ICE
processing completes.
</t>

<t>
If, despite the client's efforts, a TCP connection to a TURN server
fails during the lifetime of the media session, the client SHOULD
reconnect to the TURN server and, using the procedures defined in TURN
<xref target="I-D.ietf-behave-turn"/>, request that the allocation be
moved to the new connection by including the previously allocated
IP address and port in the Allocate request. Such a reconnection does
not require an ICE restart or any signaling to the peer.
</t>

</section>

</section>

<section title="Security Considerations">

<t>
The main threat in ICE is hijacking of connections for the purposes of
directing media streams to DoS targets or to malicious users. ICE-tcp
prevents that by only using TCP connections that have been
validated. Validation requires a STUN transaction to take place over
the connection. This transaction cannot complete without both
participants knowing a shared secret exchanged in the rendezvous
protocol used with ICE, such as SIP. This shared secret, in turn, is
protected by that protocol exchange. In the case of SIP, the usage of
the sips mechanism is RECOMMENDED. When this is done, an attacker,
even if it knows or can guess the port on which an agent is listening
for incoming TCP connections, will not be able to open a connection
and send media to the agent.
</t>

<t>
A more detailed analysis of this attack and the various ways ICE
prevents it are described in <xref
target="I-D.ietf-mmusic-ice"/>. Those considerations apply to this
specification.
</t>

</section>

<section title="IANA Considerations">

<t>
There are no IANA considerations associated with this specification.
</t>

</section>

<section title="Acknowledgements">

<t>
The authors would like to thank Tim Moore, Saikat Guha, Francois Audet and Roni
Even for the reviews and input on this document.
</t>

</section>

</middle>

<back>
<references title="Normative References">
<?rfc include="reference.I-D.ietf-behave-rfc3489bis"?>
<?rfc include="reference.RFC.3264"?>
<?rfc include="reference.RFC.4145"?>
<?rfc include="reference.RFC.4571"?>
<?rfc include="reference.RFC.4572"?>
<?rfc include="reference.I-D.ietf-mmusic-ice"?>
<?rfc include="reference.I-D.ietf-avt-dtls-srtp"?>
<?rfc include="reference.I-D.fischl-mmusic-sdp-dtls"?>
<?rfc include="reference.I-D.ietf-behave-turn"?>
</references>

<references title="Informative References">
<?rfc include="reference.I-D.ietf-behave-tcp"?>
<?rfc include="reference.I-D.ietf-simple-message-sessions"?>

</references>

</back>

<section anchor="sec-impl" title="Implementation Considerations for BSD Sockets">

<t>
This specification requires unusual handling of TCP connections, the
implementation of which in traditional BSD socket APIs is
non-trivial. 
</t>

<t>
In particular, ICE requires an agent to obtain a local TCP candidate,
bound to a local IP and port, and then, from that local port, initiate
TCP connections (to the STUN server, in order to obtain server
reflexive candidates, or to the peer as part of a connectivity check)
while being prepared to receive incoming TCP connections (for passive
and simultaneous-open candidates). A "typical" BSD socket is used
either for initiating or for receiving connections, but not for
both. The code required to allow incoming and outgoing connections on
the same local IP and port is non-obvious. The following pseudocode,
contributed by Saikat Guha, has been found to work on many platforms:
</t>

<figure><artwork>
<![CDATA[
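# sock_0 listens; sock_1 .. sock_MAX initiate outgoing connections
for i in 0 to MAX
   sock_i = socket()
   set(sock_i, SO_REUSEPORT)
   bind(sock_i, local)        # all sockets share one local IP and port

listen(sock_0)                # accept incoming connections
connect(sock_1, stun)         # learn the server reflexive candidate
connect(sock_2, remote_a)     # connectivity check to remote candidate a
connect(sock_3, remote_b)     # connectivity check to remote candidate b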
]]></artwork></figure>

<t>
The key here is that, prior to the listen() call, the full set of
sockets that need to be utilized for outgoing connections must be
allocated and bound to the local IP address and port. This number,
MAX, represents the maximum number of TCP connections to different
destinations that might need to be established from the same local
candidate. This number can be potentially large for simultaneous-open
candidates. If a request forks, ICE procedures may take place with
multiple peers. Furthermore, for each peer, connections would need to
be established to each passive or simultaneous-open candidate for the
same component. If we assume a worst case of five forked branches and,
for each peer, five simultaneous-open candidates, that results in
MAX = 5 x 5 = 25. For a passive candidate, MAX is equal to the number of STUN
servers, since the agent only initiates TCP connections on a passive
candidate to its STUN server. 
</t>
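<t>
The allocation step and the worst-case arithmetic can be sketched in
Python as follows. This is a sketch under stated assumptions rather
than a tested portable implementation. It sets SO_REUSEADDR in
addition to SO_REUSEPORT, which goes beyond the pseudocode above, and
it sets SO_REUSEPORT only where the platform defines it. Whether a
later listen() and connect() on sockets sharing a port succeeds
remains platform dependent. The function names worst_case_max and
allocate_bound_sockets are hypothetical.
</t>

<figure><artwork>
<![CDATA[
```python
import socket


def worst_case_max(forked_branches, candidates_per_peer):
    """Upper bound on outgoing connections from one local candidate."""
    return forked_branches * candidates_per_peer


def allocate_bound_sockets(local_ip, count, port=0):
    """Create `count` TCP sockets bound to the same local IP and port.

    With port=0 the operating system picks the port for the first
    socket and the remaining sockets reuse it. All sockets are bound
    before any of them is used to listen or connect.
    Returns (sockets, port).
    """
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            # Assumption: SO_REUSEADDR is added here for portability.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            if hasattr(socket, "SO_REUSEPORT"):
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            sock.bind((local_ip, port))
            port = sock.getsockname()[1]
    except OSError:
        # Release everything allocated so far before reporting failure.
        for sock in sockets:
            sock.close()
        raise
    return sockets, port
```
]]></artwork></figure>

<t>
An agent following the pseudocode would then call listen() on the
first returned socket and use each of the remaining sockets for one
outgoing connection, for example to the STUN server and to each remote
candidate. With the worst case discussed above, worst_case_max(5, 5)
gives the 25 outgoing sockets, to which the listening socket is added.
</t>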

</section>
</rfc>