<?xml version="1.0" encoding="US-ASCII"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
by Daniel M Kohn (private) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4690 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml">
]>
<rfc category="std" docName="draft-ietf-tsvwg-sctp-failover-11.txt"
ipr="trust200902">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<front>
<title abbrev="SCTP-PF">SCTP-PF: Quick Failover Algorithm in SCTP</title>
<author fullname="Yoshifumi Nishida" initials="Y.N" surname="Nishida">
<organization>GE Global Research</organization>
<address>
<postal>
<street>2623 Camino Ramon</street>
<city>San Ramon</city>
<region>CA</region>
<code>94583</code>
<country>USA</country>
</postal>
<email>nishida@wide.ad.jp</email>
</address>
</author>
<author fullname="Preethi Natarajan" initials="P.N" surname="Natarajan">
<organization>Cisco Systems</organization>
<address>
<postal>
<street>510 McCarthy Blvd</street>
<city>Milpitas</city>
<region>CA</region>
<code>95035</code>
<country>USA</country>
</postal>
<email>prenatar@cisco.com</email>
</address>
</author>
<author fullname="Armando Caro" initials="A.C" surname="Caro">
<organization>BBN Technologies</organization>
<address>
<postal>
<street>10 Moulton St.</street>
<city>Cambridge</city>
<region>MA</region>
<code>02138</code>
<country>USA</country>
</postal>
<email>acaro@bbn.com</email>
</address>
</author>
<author fullname="Paul D. Amer" initials="P.A" surname="Amer">
<organization>University of Delaware</organization>
<address>
<postal>
<street>Computer Science Department - 434 Smith Hall</street>
<city>Newark</city>
<region>DE</region>
<code>19716-2586</code>
<country>USA</country>
</postal>
<email>amer@udel.edu</email>
</address>
</author>
<author fullname="Karen E. E. Nielsen" initials="K.N" surname="Nielsen">
<organization>Ericsson</organization>
<address>
<postal>
<street>Kistavägen 25</street>
<city>Stockholm</city>
<region/>
<code>164 80</code>
<country>Sweden</country>
</postal>
<email>karen.nielsen@tieto.com</email>
</address>
</author>
<date/>
<abstract>
<t>SCTP supports multi-homing. However, when the failover operation
specified in RFC4960 is followed, there can be significant delay and
performance degradation in the data transfer path failover. To overcome this problem, this document
specifies a quick failover algorithm (SCTP-PF) based on the introduction of a Potentially Failed (PF) state in SCTP Path Management. </t>
<t>The document also specifies a dormant
state operation of SCTP. This dormant state operation is required to be
followed by an SCTP-PF implementation, but it may equally well be
applied by a standard RFC4960 SCTP implementation.</t>
<t>Additionally, the document introduces an alternative switchback mode
called Permanent Failover that will be beneficial in some situations. This mode of operation applies both to a standard RFC4960 SCTP implementation and to an SCTP-PF implementation.</t>
<t>The procedures defined in the document require only minimal
modifications to the RFC4960 specification. The procedures are
sender-side only and do not impact the SCTP receiver.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The Stream Control Transmission Protocol (SCTP) specified in <xref
target="RFC4960"/> supports multi-homing at the transport layer. SCTP's
multi-homing features include failure detection and failover procedures
to provide network interface redundancy and improved end-to-end fault
tolerance. In SCTP's current failure detection procedure, the sender
must experience Path.Max.Retrans (PMR) number of consecutive failed
timer-based retransmissions on a destination address before detecting a
path failure. Until detecting the path failure, the sender continues to
transmit data on the failed path. The prolonged time in which <xref
target="RFC4960"/> SCTP continues to use a failed path severely degrades
the performance of the protocol. To address this problem, this
document specifies a quick failover algorithm (SCTP-PF) based on the
introduction of a new Potentially Failed path state in SCTP path
management. The performance deficiencies of the <xref target="RFC4960"/>
failover operation, and the improvements obtainable from the
introduction of a Potentially Failed state in SCTP, were proposed and
documented in <xref target="NATARAJAN09"/> for Concurrent Multipath Transfer SCTP <xref target="IYENGAR06"/>.</t>
<t>While SCTP-PF can accelerate the failover process and improve performance, it
also increases the risk that an SCTP endpoint enters the dormant state, in which all destination
addresses are inactive. <xref target="RFC4960"/>
leaves the protocol operation during dormant state to implementations and
encourages them to avoid entering the state as much as possible by careful tuning of the Path.Max.Retrans (PMR)
and Association.Max.Retrans (AMR) parameters. We specify a dormant state
operation for SCTP-PF which makes SCTP-PF provide the same disruption
tolerance as <xref target="RFC4960"/> even though the dormant state may
be entered more quickly. The dormant state operation may equally well be
applied by an <xref target="RFC4960"/> implementation, where it
serves to provide added fault tolerance for situations where the tuning
of the Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR)
parameters fails to adequately prevent the association from entering
dormant state.</t>
<t>The operation after the recovery of a failed
path also impacts the performance of the protocol. With the
procedures specified in <xref target="RFC4960"/> SCTP will, after a
failover from the primary path, switch back to use the primary path for
data transfer as soon as this path becomes available again. From a
performance perspective such a forced switchback of the data
transmission path can be suboptimal as the CWND towards the original
primary destination address has to be rebuilt once data transfer
resumes, <xref target="CARO02"/>. As an optional alternative to the
switchback operation of <xref target="RFC4960"/>, this document
specifies an alternative Permanent Failover procedure which avoids such
forced switchbacks of the data transfer path. The Permanent Failover
operation was originally proposed in <xref target="CARO02"/>.</t>
<t>While SCTP-PF primarily is motivated by a desire to improve the
multi-homed operation, the feature applies also to
SCTP single-homed operation. Here the algorithm serves to provide
increased failure detection on idle associations, whereas the failover
or switchback aspects of the algorithm will not be activated. This is
discussed in more detail in Appendix C.</t>
<t>A brief description of the motivation for the introduction of the
Potentially Failed state, including a discussion of alternative
approaches to mitigate the deficiencies of the <xref target="RFC4960"/>
failover operation, is given in the Appendices. A discussion of the path
bouncing effects that might be caused by frequent switchover is also
provided there.</t>
</section>
<section title="Conventions and Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</section>
<section anchor="SCTP-PF"
title="SCTP with Potentially-Failed Destination State (SCTP-PF)">
<section title="Overview">
<t>To minimize the performance impact during failover, the sender
should stop transmitting data to a failed destination address as
early as possible. In the <xref target="RFC4960"/> SCTP path
management scheme, the sender stops transmitting data to a destination
address only after the destination address is marked inactive. This
process takes a significant amount of time as it requires the error
counter of the destination address to exceed the Path.Max.Retrans
(PMR) threshold. The issue cannot simply be mitigated by lowering
the PMR threshold, because this may result in spurious failure
detection and unnecessarily prevent the use of a preferred
primary path. In addition, because the tuning of the
Path.Max.Retrans (PMR) and the Association.Max.Retrans (AMR) parameter
values is coupled in <xref target="RFC4960"/>, lowering PMR may compromise
the fault tolerance of SCTP.</t>
<t>The solution provided in this document is to extend the SCTP path
management scheme of <xref target="RFC4960"/> by the addition of the
Potentially Failed (PF) state as an intermediate state in between the
active and inactive state of a destination address in the <xref
target="RFC4960"/> path management scheme, and let the failover of
data transfer away from a destination address be driven by the
entering of the PF state instead of by the entering of the inactive
state. Thereby SCTP may perform quick failover without compromising
the overall fault tolerance of <xref target="RFC4960"/> SCTP. At the
same time, RTO-based HEARTBEAT probing is initiated towards a
destination address once it enters PF state. Thereby SCTP may quickly
ascertain whether network connectivity towards the destination address
is broken or whether the failover was spurious. In the case where the
failover was spurious, data transfer may quickly resume towards the
original destination address.</t>
<t>The new failure detection algorithm assumes that loss detected by a
timeout implies either severe congestion or network connectivity
failure. By default, a destination address is
classified as PF at the occurrence of the first timeout.</t>
</section>
<section title="Specification of the SCTP-PF Procedures">
<t>The SCTP-PF operation is specified as follows: <list style="numbers">
<t>The sender maintains a new tunable SCTP Protocol Parameter
called PotentiallyFailed.Max.Retrans (PFMR). The PFMR defines the
new intermediate PF threshold on the destination address error
counter; when the error counter exceeds this threshold, the destination address is classified
as PF. The RECOMMENDED value of PFMR is 0, but other values MAY be
used. Setting PFMR greater than or equal to Path.Max.Retrans (PMR)
means that no PF threshold is defined for the
destination address, i.e., the destination address will not be classified as PF
prior to reaching inactive state.</t>
<t>The error counter of an active destination address is
incremented as specified in <xref target="RFC4960"/>. This means
that the error counter of the destination address will be
incremented each time the T3-rtx timer expires, or each time a
HEARTBEAT chunk is sent when idle and not acknowledged within an
RTO. When the value in the destination address error counter
exceeds PFMR, the endpoint MUST mark the destination address as in
the PF state.</t>
<t>The PFMR threshold defines the point at which the destination address is no
longer considered a good candidate for data transmission, and an
SCTP-PF sender SHOULD NOT send data to destination addresses in PF
state when alternative destination addresses in active state are
available. Specifically this means that: <list style="hanging">
<t hangText="i">When there is outbound data to send and the
destination address presently used for data transmission is in
PF state, the sender SHOULD choose a destination address in
active state, if one exists, and failover to deploy this
destination address for data transmission.</t>
<t hangText="ii">When retransmitting data that has timed out
and the sender thus by <xref target="RFC4960"/>, section
6.4.1, should attempt to pick a new destination address for
data retransmission, the sender SHOULD choose an alternate
destination transport address in active state if one
exists.</t>
<t hangText="iii">When there is outbound data to send and the
SCTP user explicitly requests to send data to a destination
address in PF state, the sender SHOULD send the data to an
alternate destination address in active state if one
exists.</t>
</list>When choosing among multiple destination addresses in
active state, the following considerations apply: <list
style="letters">
<t>An SCTP sender should comply with the <xref target="RFC4960"/>, section 6.4.1,
principles of choosing most divergent source-destination pairs
compared with, for i.: the destination address in PF state
that it performs a failover from, and for ii.: the destination
address towards which the data timed out. Rules for picking
the most divergent source-destination pair are an
implementation decision and are not specified within this
document.</t>
<t>An SCTP-PF sender MAY choose to send data to a destination
address in PF state, even if destination addresses in active
state exist, if the SCTP-PF sender has other information
available that disqualifies the destination
address in active state from being preferred. However, the
discussion of such mechanisms is outside of the scope of the
SCTP-PF operation specified in this document.</t>
</list> In all cases, the sender MUST NOT change the state of
the chosen destination address, whether this state be active or PF,
and it MUST NOT clear the error counter of the destination address
as a result of choosing the destination address for data
transmission.</t>
<t>When the destination addresses are all in PF state or some in
PF state and some in inactive state, the sender MUST choose one
destination address in PF state and transmit or retransmit data to
this destination address using the following rules: <list
style="letters">
<t>The sender SHOULD choose the destination in PF state with
the lowest error count (fewest consecutive timeouts) for data
transmission and transmit or retransmit data to this
destination.</t>
<t>When there are multiple PF destinations with the same error
count, the sender should base the choice among them
on the <xref
target="RFC4960"/>, section 6.4.1, principles of choosing most
divergent source-destination pairs when executing (potentially
consecutive) retransmission. Rules for picking the most
divergent source-destination pair are an implementation
decision and are not specified within this document.</t>
<t>A sender MAY choose to deploy other strategies than the
above when choosing among multiple PF destinations if the
SCTP-PF sender has other information available that
qualifies a particular destination address for being used. The
SCTP-PF protocol operation specified in this document makes no
assumption about the existence of such other information
and specifies the above as the default operation of an
SCTP-PF sender.</t>
</list> The sender MUST NOT change the state and the error
counter of any destination address regardless of whether it has
been chosen for transmission or not.</t>
<t>The HB.interval of the Path Heartbeat function of
<xref target="RFC4960" /> MUST be ignored for destination addresses in PF state.
Instead, HEARTBEAT chunks are sent to destination addresses in PF state
once per RTO. HEARTBEAT chunks SHOULD be sent to destination
addresses in PF state, but the sending of HEARTBEATs MUST honor
whether the Path Heartbeat function (Section 8.3 of <xref target="RFC4960" />)
is enabled for the destination address or not. I.e., if the
Path Heartbeat function is disabled for the destination address
in question, HEARTBEATs MUST NOT be sent.
Note that when the Path Heartbeat function is disabled, it may take longer to
transition a PF destination address back to active state. </t>
<t>HEARTBEATs are sent when a destination address reaches the PF state.
When a HEARTBEAT chunk is not acknowledged within the RTO, the
sender increments the error counter and exponentially backs off
the RTO value. If the error counter is less than PMR, the
sender transmits another packet containing the HEARTBEAT chunk
immediately after timeout expiration on the previous HEARTBEAT.
When data is being transmitted to a destination address in the
PF state, the transmission of a HEARTBEAT chunk MAY be omitted
if the receipt of a SACK for, or a T3-rtx timer expiration on, the
outstanding data can provide equivalent information,
for example when the data chunk has been transmitted to a single
destination only.
Likewise, the timeout of a HEARTBEAT chunk MAY be ignored if data is
outstanding towards the destination address.
</t>
<t>When the sender receives a HEARTBEAT ACK for a HEARTBEAT sent
to a destination address in PF state, the sender MUST clear the
error counter of the destination address and transition the
destination address back to active state. When the sender resumes
data transmission on the destination address, it MUST do this
following the prescriptions of Section 7.2 of <xref
target="RFC4960"/>.</t>
<t>Additional (PMR - PFMR) consecutive timeouts on a destination
address in PF state confirm the path failure, upon which the
destination address transitions to the inactive state. As
described in <xref target="RFC4960"/>, the sender (i) SHOULD
notify the ULP about this state transition, and (ii) transmit
HEARTBEAT chunks to the inactive destination address at a lower
HB.interval frequency as described in Section 8.3 of <xref
target="RFC4960"/> (when the Path Heartbeat function is enabled
for the destination address).</t>
<t>Acknowledgments for chunks that have been transmitted to
multiple destinations (i.e., a chunk which has been retransmitted
to a different destination address than the destination address to
which the chunk was first transmitted) MUST NOT clear the error
count for an inactive destination address and MUST NOT transition
a destination address in PF state back to active state, since a
sender cannot disambiguate whether the ACK was for the original
transmission or the retransmission(s). An SCTP sender MAY apply a
different approach for the error count handling based on
unequivocal information on which destination (including multiple
destination addresses) the chunk reached. This document does not
specify what such unequivocal information could consist of,
nor how it could be obtained. The
design of such an alternative approach is left to
implementations.</t>
<t>Acknowledgments for chunks that have been transmitted to one
destination address only MUST clear the error counter for the
destination address and MUST transition a destination address in
PF state back to active state. This situation can happen when new
data is sent to a destination address in the PF state. It can also
happen in situations where the destination address is in the PF
state due to the occurrence of a spurious T3-rtx timeout and
acknowledgments start to arrive for data sent prior to the occurrence
of the spurious T3-rtx timeout, and data has not yet been retransmitted
towards other destinations. This document does not specify special
handling for detection of or reaction to spurious T3-rtx timeouts,
e.g., for special operation vis-a-vis the congestion control
handling or data retransmission operation towards a destination
address which undergoes a transition from active to PF to active
state due to a spurious T3-rtx timeout. It is noted, however, that this
is an area which would benefit from additional attention,
experimentation and specification for single-homed SCTP as well as
for multi-homed SCTP protocol operation.</t>
<t>When all destination addresses are in inactive state, and SCTP
protocol operation thus is said to be in dormant state, the
prescriptions given in <xref target="dormant"/> shall be
followed.</t>
<t>The SCTP stack should provide the ULP with the means to expose
the PF state of its destinations as well as the means to notify of
state transitions from active to PF, and vice-versa. However, it is
recommended that an SCTP stack implementing SCTP-PF also allows
the ULP to be kept ignorant of the PF state of its
destinations and the associated state transitions. For this reason,
it is recommended that an SCTP stack implementing SCTP-PF also
provides the ULP with the means to suppress exposure of the PF
state and the associated state transitions.</t>
</list></t>
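<t>The threshold logic in the numbered procedures above can be sketched
in C as follows. This is a minimal illustration only, not a normative
or complete implementation: the type and function names (path_state,
struct path, classify, on_timeout, on_unambiguous_ack) are hypothetical
and do not come from <xref target="RFC4960"/> or from any SCTP stack,
and HEARTBEAT scheduling, RTO backoff and ULP notification are
omitted.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used for illustration only. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
  uint16_t error_count;  /* consecutive timeouts on this destination */
  uint16_t pfmr;         /* PotentiallyFailed.Max.Retrans */
  uint16_t pmr;          /* Path.Max.Retrans */
  enum path_state state;
};

/* Derive the state of a destination from its error counter. */
enum path_state classify(const struct path *p)
{
  assert(p != NULL);
  if (p->error_count > p->pmr)
    return PATH_INACTIVE;
  /* With PFMR >= PMR no PF threshold is defined. */
  if (p->pfmr < p->pmr && p->error_count > p->pfmr)
    return PATH_PF;
  return PATH_ACTIVE;
}

/* T3-rtx expiry, or a HEARTBEAT not acknowledged within an RTO. */
void on_timeout(struct path *p)
{
  assert(p != NULL);
  if (p->error_count < UINT16_MAX)
    p->error_count++;
  p->state = classify(p);
}

/* HEARTBEAT ACK, or an acknowledgment for a chunk that was sent to
 * this destination address only. */
void on_unambiguous_ack(struct path *p)
{
  assert(p != NULL);
  p->error_count = 0;
  p->state = PATH_ACTIVE;
}
```
]]></artwork>
</figure>
<t>With the RECOMMENDED PFMR of 0 and, as an assumed example, a PMR of
5, the first timeout moves the destination address to PF state and the
sixth consecutive timeout moves it to inactive state, which matches the
(PMR - PFMR) additional timeouts described above.</t>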
</section>
</section>
<section anchor="dormant" title="Dormant State Operation">
<t>In a situation with complete disruption of the communication
between the SCTP endpoints, the aggressive HEARTBEAT transmissions of
SCTP-PF on destination addresses in PF state may make the association
enter dormant state faster than a standard <xref target="RFC4960"/> SCTP
implementation given the same setting of Path.Max.Retrans (PMR) and
Association.Max.Retrans (AMR). For example, an SCTP association with two
destination addresses typically would reach dormant state in half the
time of an <xref target="RFC4960"/> SCTP implementation in such
situations. This is because an SCTP-PF sender will send HEARTBEATs and
data retransmissions in parallel at RTO intervals when there are
multiple destination addresses in PF state. This argument presumes that
RTO &lt;&lt; HB.interval of <xref target="RFC4960"/>. With the design
goal that SCTP-PF shall provide the same level of disruption tolerance
as an <xref target="RFC4960"/> SCTP implementation with the same
Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) setting, we
prescribe that an SCTP-PF implementation SHOULD operate as described
below in <xref target="dormant_details"/> during dormant state.</t>
<t>An SCTP-PF implementation MAY choose a different dormant state
operation than the one described below in <xref
target="dormant_details"/> provided that the solution chosen does not
compromise the fault tolerance of the SCTP-PF operation.</t>
<t>The below prescription for SCTP-PF dormant state handling SHOULD NOT
be coupled to the value of the PFMR, but solely to the activation of
SCTP-PF logic in an SCTP implementation.</t>
<t>It is noted that the below dormant state operation is considered to
provide added disruption tolerance also for an <xref target="RFC4960"/>
SCTP implementation, and that it can be sensible for an <xref
target="RFC4960"/> SCTP implementation to follow this mode of operation. For an <xref target="RFC4960"/> SCTP implementation, the
continuation of data transmission during dormant state makes the fault
tolerance of SCTP more robust in situations where some, or all,
alternative paths of an SCTP association approach, or reach, inactive
state before the primary path used for data transmission experiences
trouble.</t>
<section anchor="dormant_details" title="SCTP Dormant State Procedure">
<t><list style="letters">
<t>When the destination addresses are all in inactive state and
data is available for transfer, the sender MUST choose one
destination and transmit data to this destination address.</t>
<t>The sender MUST NOT change the state of the chosen destination
address (it remains in inactive state) and it MUST NOT clear the
error counter of the destination address as a result of choosing
the destination address for data transmission.</t>
<t>The sender SHOULD choose the destination in inactive state with
the lowest error count (fewest consecutive timeouts) for data
transmission. When there are multiple destinations with the same error
count in inactive state, the sender SHOULD attempt to pick the
most divergent source-destination pair from the last
source-destination pair where failure was observed. Rules for picking the
most divergent source-destination pair are an implementation
decision and are not specified within this document. To support
differentiation of inactive destination addresses based on their
error count, SCTP will need to allow the
destination address error counters to be incremented up to some reasonable limit
above PMR+1, thus changing the prescriptions of <xref
target="RFC4960"/>, section 8.3, in this respect. The exact limit
to apply is not specified in this document, but it is considered
reasonable to require it to be an order of magnitude higher
than the PMR value. A sender MAY choose to deploy strategies other
than the one defined here. The strategy of prioritizing the
last active destination address, i.e., the destination address
with the lowest error count, is optimal when some paths are
permanently inactive, but suboptimal when a path instability is
transient.</t>
</list></t>
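<t>The destination choice during dormant state can be sketched in C as
follows. This is an illustration under stated assumptions, not a
prescribed implementation: struct dest and the three functions are
hypothetical names, the cap of ten times (PMR + 1) is only one possible
reading of "an order of magnitude higher than the PMR value", and ties
are broken by taking the first candidate as a placeholder for the
implementation-specific "most divergent source-destination pair"
rule.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-destination record. */
struct dest {
  uint16_t error_count;  /* consecutive timeouts */
};

/* Assumed upper limit for the error counter while inactive. */
uint16_t error_cap(uint16_t pmr)
{
  uint32_t cap = ((uint32_t)pmr + 1u) * 10u;
  return cap > UINT16_MAX ? UINT16_MAX : (uint16_t)cap;
}

/* Keep counting timeouts beyond PMR+1, up to the assumed cap, so
 * that inactive destinations remain distinguishable. */
void count_timeout(struct dest *d, uint16_t pmr)
{
  assert(d != NULL);
  if (d->error_count < error_cap(pmr))
    d->error_count++;
}

/* Return the index of the inactive destination with the lowest
 * error count. Ties go to the lowest index (placeholder rule). */
size_t pick_dormant_destination(const struct dest *dests, size_t n)
{
  assert(dests != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; i++) {
    if (dests[i].error_count < dests[best].error_count)
      best = i;
  }
  return best;
}
```
]]></artwork>
</figure>
<t>Choosing a destination this way does not change its state or clear
its error counter, in line with item b of the procedure above.</t>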
</section>
</section>
<section anchor="permanent_failover" title="Permanent Failover">
<t>The objective of the Permanent Failover operation is to allow the
SCTP sender to continue data transmission on a new working path even
when the old primary destination address becomes active again. This is
achieved by having SCTP perform a switch over of the primary path to
the new working path if the error counter of the primary path exceeds a certain threshold. This mode of operation can be applied not only to SCTP-PF implementations,
but also to <xref target="RFC4960"/> implementations.
</t>
<t>The Permanent Failover operation requires only sender side changes.
The details are:</t>
<t><list style="numbers">
<t>The sender maintains a new tunable parameter, called
Primary.Switchover.Max.Retrans (PSMR). For SCTP-PF implementations, the PSMR MUST be set
greater than or equal to the PFMR value. For <xref target="RFC4960"/> implementations, the PSMR MUST be set greater than or equal to the PMR value. Implementations MUST reject
any other values of PSMR.</t>
<t>When the path error counter on a set primary path exceeds PSMR,
the SCTP implementation MUST autonomously select and set a new
primary path.</t>
<t>The primary path selected by the SCTP implementation MUST be
the path which at the given time would be chosen for data
transfer. A previously failed primary path can be used as data
transfer path as per normal path selection when the present data
transfer path fails.</t>
<t>For SCTP-PF, the recommended value of PSMR is PFMR when Permanent Failover
is used. This means that no forced switchback to a previously
failed primary path is performed. An SCTP-PF implementation of Permanent
Failover MUST support the setting of PSMR = PFMR. An
SCTP-PF implementation of Permanent Failover MAY support setting of PSMR
&gt; PFMR.</t>
<t>For <xref
target="RFC4960"/> SCTP, the recommended value of PSMR is PMR when Permanent Failover
is used. This means that no forced switchback to a previously
failed primary path is performed. An <xref
target="RFC4960"/> SCTP implementation of Permanent
Failover MUST support the setting of PSMR = PMR. An <xref
target="RFC4960"/> SCTP
implementation of Permanent Failover MAY support setting of PSMR &gt; PMR.</t>
<t>It MUST be possible to disable the Permanent Failover and
obtain the standard switchback operation of <xref
target="RFC4960"/>.</t>
</list></t>
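<t>The switchover rule in items 2 and 3 above can be sketched in C as
follows. This is a simplified illustration: struct pf_path, struct
assoc and maybe_switch_primary are hypothetical names, and the sketch
assumes that the association already tracks, in data_path, the path
that would currently be chosen for data transfer.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structures, for illustration only. */
struct pf_path {
  uint16_t error_count;  /* consecutive timeouts on this path */
};

struct assoc {
  struct pf_path *paths;
  size_t npaths;
  size_t primary;           /* index of the set primary path */
  size_t data_path;         /* index of the path currently chosen
                               for data transfer (assumed tracked) */
  uint16_t psmr;            /* Primary.Switchover.Max.Retrans */
  bool permanent_failover;  /* false: standard switchback */
};

/* Make the current data transfer path the new primary path once the
 * error counter of the set primary path exceeds PSMR.
 * Returns true if the primary path was changed. */
bool maybe_switch_primary(struct assoc *a)
{
  assert(a != NULL && a->paths != NULL);
  assert(a->primary < a->npaths && a->data_path < a->npaths);
  if (!a->permanent_failover)
    return false;
  if (a->paths[a->primary].error_count <= a->psmr)
    return false;
  if (a->data_path == a->primary)
    return false;
  a->primary = a->data_path;
  return true;
}
```
]]></artwork>
</figure>
<t>With permanent_failover set to false the function never changes the
primary path, which corresponds to the standard switchback operation
that item 6 requires to remain available.</t>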
<t>The manner of switchover operation that is optimal in a given
scenario depends on the relative quality of a set primary path versus
the quality of the alternative paths available, as well as on
the extent to which it is desired for the mode of operation to enforce
traffic distribution over a number of network paths. For example, load
distribution of traffic from multiple SCTP associations may be
enforced by distributing the set primary paths and relying on the <xref
target="RFC4960"/> switchback operation. However, as <xref
target="RFC4960"/> switchback behavior is suboptimal in certain
situations, especially in scenarios where a number of equally good
paths are available, an SCTP implementation MAY also support, as an
alternative behavior, the Permanent Failover mode of operation and MAY
enable it based on users' requests.</t>
<t>For an SCTP implementation that implements Permanent Failover, this
specification RECOMMENDS that the standard RFC4960 switchback
operation is retained as the default operation.</t>
</section>
<section title="Suggested SCTP Protocol Parameter Values">
<t>This document does not alter the <xref target="RFC4960"/> value
RECOMMENDATIONS for the SCTP Protocol Parameters defined in <xref
target="RFC4960"/>.</t>
<t>The following protocol parameter is RECOMMENDED:<list style="empty">
<t>PotentiallyFailed.Max.Retrans (PFMR) - 0</t>
</list></t>
</section>
<section title="Socket API Considerations">
<t>This section describes how the socket API defined in <xref
target="RFC6458"/> is extended to provide a way for the application to
control and observe the SCTP-PF behavior as well as the Permanent Failover function.</t>
<t>Please note that this section is informational only.</t>
<t>A socket API implementation based on <xref target="RFC6458"/> is
extended, by means of the existing SCTP_PEER_ADDR_CHANGE event, to provide
an event notification when a peer address enters or leaves the
potentially failed state. The socket API implementation is also
extended to expose the potentially failed state of a peer address in the
existing SCTP_GET_PEER_ADDR_INFO structure.</t>
<t>Furthermore, two new read/write socket options for the level
IPPROTO_SCTP with the names SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE are defined as described below. The
first socket option is used to control the values of the PFMR and PSMR
parameters described in <xref target="SCTP-PF"/> and in <xref target="permanent_failover"/>. The second one
controls the exposure of the potentially failed path state.</t>
<t>Support for the SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
added to the function sctp_opt_info().</t>
<section anchor="pf_support_api"
title="Support for the Potentially Failed Path State">
<t>As defined in <xref target="RFC6458"/>, the SCTP_PEER_ADDR_CHANGE
event is provided if the status of a peer address changes. In addition
to the state changes described in <xref target="RFC6458"/>, this event
is also provided, if a peer address enters or leaves the potentially
failed state. The notification as defined in <xref target="RFC6458"/>
uses the following structure:</t>
<figure>
<artwork><![CDATA[
struct sctp_paddr_change {
  uint16_t spc_type;
  uint16_t spc_flags;
  uint32_t spc_length;
  struct sockaddr_storage spc_aaddr;
  uint32_t spc_state;
  uint32_t spc_error;
  sctp_assoc_t spc_assoc_id;
};
]]></artwork>
</figure>
<t><xref target="RFC6458"/> defines the constants SCTP_ADDR_AVAILABLE,
SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. This
document defines in addition to that the new constant
SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
address becomes potentially failed.</t>
<t>The SCTP_GET_PEER_ADDR_INFO socket option defined in <xref
target="RFC6458"/> can be used to query the state of a peer address.
It uses the following structure:</t>
<figure>
<artwork><![CDATA[
struct sctp_paddrinfo {
  sctp_assoc_t spinfo_assoc_id;
  struct sockaddr_storage spinfo_address;
  int32_t spinfo_state;
  uint32_t spinfo_cwnd;
  uint32_t spinfo_srtt;
  uint32_t spinfo_rto;
  uint32_t spinfo_mtu;
};
]]></artwork>
</figure>
<t><xref target="RFC6458"/> defines the constants SCTP_UNCONFIRMED,
SCTP_ACTIVE, and SCTP_INACTIVE to be provided in the spinfo_state
field. This document defines in addition to that the new constant
SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
potentially failed.</t>
</section>
<section title="Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option">
<t>Applications can control the SCTP-PF behavior by getting or setting
the number of consecutive timeouts before a peer address is considered
potentially failed or unreachable. The same socket option is used by applications to set and get the number of timeouts before the primary path is
changed automatically by the Permanent Failover function. This socket option uses the level IPPROTO_SCTP
and the name SCTP_PEER_ADDR_THLDS.</t>
<t>The following structure is used to access and modify the
thresholds:</t>
<figure>
<artwork><![CDATA[
struct sctp_paddrthlds {
  sctp_assoc_t spt_assoc_id;
  struct sockaddr_storage spt_address;
  uint16_t spt_pathmaxrxt;
  uint16_t spt_pathpfthld;
  uint16_t spt_pathcpthld;
};
]]></artwork>
</figure>
<t><list style="hanging">
<t hangText="spt_assoc_id:">This parameter is ignored for
one-to-one style sockets. For one-to-many style sockets the
application may fill in an association identifier or
SCTP_FUTURE_ASSOC. It is an error to use SCTP_{CURRENT|ALL}_ASSOC
in spt_assoc_id.</t>
<t hangText="spt_address:">This specifies which peer address is of
interest. If a wild card address is provided, this socket option
applies to all current and future peer addresses.</t>
<t hangText="spt_pathmaxrxt:">Each peer address of interest is
considered unreachable if its path error counter exceeds
spt_pathmaxrxt.</t>
<t hangText="spt_pathpfthld:">Each peer address of interest is
considered Potentially Failed if its path error counter exceeds
spt_pathpfthld.</t>
<t hangText="spt_pathcpthld:">Each peer address of interest is no
longer considered the primary remote address if its path error
counter exceeds spt_pathcpthld. Using a value of 0xffff disables
the selection of a new primary peer address. If an implementation
does not support the automatic selection of a new primary
address, it should indicate an error with errno set to EINVAL if a
value different from 0xffff is used in spt_pathcpthld. For SCTP-PF,
setting spt_pathcpthld &lt; spt_pathpfthld should be rejected with errno
set to EINVAL. For <xref target="RFC4960"/> SCTP, setting
spt_pathcpthld &lt; spt_pathmaxrxt should be rejected with errno
set to EINVAL. An SCTP-PF implementation MAY support only the settings
spt_pathcpthld = spt_pathpfthld and spt_pathcpthld = 0xffff, and an <xref
target="RFC4960"/> SCTP implementation MAY support only the settings
spt_pathcpthld = spt_pathmaxrxt and spt_pathcpthld = 0xffff. In
these cases SCTP shall reject the setting of other values with errno set
to EINVAL.</t>
</list></t>
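<t>The threshold rules above can be sketched as follows. This is an
illustrative sketch only, not the socket API itself: struct ex_thresholds,
enum ex_path_state, the two function names, and the pf_enabled flag are
hypothetical names introduced for this example. The sketch assumes that
the caller knows whether SCTP-PF is in use and covers only the
lower-bound checks on spt_pathcpthld.</t>
<figure>
<artwork><![CDATA[
#include <errno.h>
#include <stdint.h>

/* Illustrative copy of the three thresholds (not the real API type). */
struct ex_thresholds {
    uint16_t pathmaxrxt;  /* mirrors spt_pathmaxrxt */
    uint16_t pathpfthld;  /* mirrors spt_pathpfthld */
    uint16_t pathcpthld;  /* mirrors spt_pathcpthld; 0xffff disables */
};

enum ex_path_state { EX_PATH_ACTIVE, EX_PATH_PF, EX_PATH_UNREACHABLE };

/* Classify a peer address from its path error counter. */
enum ex_path_state ex_classify(uint32_t error_count,
                               const struct ex_thresholds *t)
{
    if (error_count > t->pathmaxrxt)
        return EX_PATH_UNREACHABLE;
    if (error_count > t->pathpfthld)
        return EX_PATH_PF;
    return EX_PATH_ACTIVE;
}

/* Check the lower bound of pathcpthld; returns 0 or EINVAL.
 * pf_enabled is a hypothetical flag: non-zero if SCTP-PF is in use. */
int ex_check_cpthld(const struct ex_thresholds *t, int pf_enabled)
{
    uint16_t lower_bound = pf_enabled ? t->pathpfthld : t->pathmaxrxt;

    /* 0xffff is the largest uint16_t value, so it always passes. */
    return (t->pathcpthld < lower_bound) ? EINVAL : 0;
}
]]></artwork>
</figure>
<t>In this sketch, a path with spt_pathpfthld greater than or equal to
spt_pathmaxrxt never enters the Potentially Failed state, because the
unreachable check takes precedence.</t>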
</section>
<section title="Exposing the Potentially Failed Path State (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option">
<t>Applications can control the exposure of the potentially failed
path state in the SCTP_PEER_ADDR_CHANGE event and the
SCTP_GET_PEER_ADDR_INFO socket option as described in <xref
target="pf_support_api"/>. The default value is implementation
specific.</t>
<t>This socket option uses the level IPPROTO_SCTP and the name
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.</t>
<t>The following structure is used to control the exposure of the
potentially failed path state:</t>
<figure>
<artwork><![CDATA[
struct sctp_assoc_value {
sctp_assoc_t assoc_id;
uint32_t assoc_value;
};
]]></artwork>
</figure>
<t><list style="hanging">
<t hangText="assoc_id:">This parameter is ignored for one-to-one
style sockets. For one-to-many style sockets the application may
fill in an association identifier or SCTP_FUTURE_ASSOC. It is an
error to use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.</t>
<t hangText="assoc_value:">The potentially failed path state is
exposed if and only if this parameter is non-zero.</t>
</list></t>
</section>
</section>
<section title="Security Considerations">
<t>Security considerations for the use of SCTP and its APIs are
discussed in <xref target="RFC4960"/> and <xref target="RFC6458"/>.</t>
<t> The logic introduced by this document does not impact existing
on-the-wire SCTP messages. Also, this document does not introduce
any new on-the-wire SCTP messages that require new security considerations.
</t>
<t>
SCTP-PF makes SCTP more robust during primary path failure or
congestion, but also more vulnerable to network connectivity and
congestion attacks on the primary path.
SCTP-PF makes it easier for an attacker to trick SCTP into changing the data
transfer path, since the duration of time that an attacker needs to
compromise the network connectivity is much shorter than with <xref target="RFC4960" />.
However, SCTP-PF does not constitute a significant change
in the duration of time and effort an attacker needs to keep SCTP
away from the primary path. With the standard switchback operation of
<xref target="RFC4960" />, SCTP resumes data transfer on its primary path as soon as
the next HEARTBEAT succeeds.
</t>
<t>
On the other hand, usage of the Permanent Failover mechanism
does change the threat analysis. This is because attackers can force
a permanent change of the data transfer path by blocking the primary path
until the switchover of the primary path is triggered by the
Permanent Failover algorithm.
This will especially be the case when Permanent Failover is used together
with SCTP-PF with the particular setting of PSMR = PFMR = 0, as
Permanent Failover then happens already at the first RTO timeout
experienced. Users of the Permanent Failover mechanism should be
aware of this fact.
</t>
<t>
The event notifications of path state transitions from the active to the
potentially failed state and vice versa give attackers an increased
possibility to generate local events. However, it is
assumed that event notifications are rate-limited in the implementation
to address this threat.
</t>
</section>
<section title="IANA Considerations">
<t>This document does not create any new registries or modify the rules
for any existing registries managed by IANA.</t>
</section>
<section title="Acknowledgements">
<t>The authors wish to thank Michael Tuexen for his many invaluable
comments and for his very substantial support with the making of this
document.</t>
</section>
<section title="Proposed Change of Status (to be Deleted before Publication)">
<t>Initially, this work appeared to entail some changes to the Congestion
Control (CC) operation of SCTP, and for this reason the work was proposed
as Experimental. These intended changes to the CC operation have since
been judged to be irrelevant and are no longer part of the
specification. As the specification entails no other potentially harmful
features, consensus exists in the WG to bring the work forward as
Proposed Standard (PS).</t>
<t>Initially, concerns were expressed about the possibility that the
mechanism introduces path bouncing with potentially harmful network
impacts. These concerns are believed to be unfounded. This issue is
addressed in <xref target="path_bouncing"/>.</t>
<t>It is noted that the feature specified by this document is
implemented by multiple SCTP software implementations and, furthermore,
that several variants of the solution have been deployed in telecom
signaling environments for several years with good results.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.RFC.4960" ?>
</references>
<references title="Informative References">
<?rfc include="reference.RFC.6458" ?>
<reference anchor="IYENGAR06" target="">
<front>
<title>Concurrent Multipath Transfer using SCTP Multihoming over
Independent End-to-end Paths.</title>
<author fullname="" initials="J." surname="Iyengar">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="10" year="2006"/>
</front>
<seriesInfo name="IEEE/ACM Trans on Networking" value="14(5)"/>
</reference>
<reference anchor="NATARAJAN09" target="">
<front>
<title>Concurrent Multipath Transfer during Path Failure</title>
<author fullname="" initials="P." surname="Natarajan">
<organization/>
</author>
<author fullname="" initials="N." surname="Ekiz">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="5" year="2009"/>
</front>
<seriesInfo name="Computer Communications" value=""/>
</reference>
<reference anchor="JUNGMAIER02" target="">
<front>
<title>On the use of SCTP in failover scenarios</title>
<author fullname="" initials="A." surname="Jungmaier">
<organization/>
</author>
<author fullname="" initials="E." surname="Rathgeb">
<organization/>
</author>
<author fullname="" initials="M." surname="Tuexen">
<organization/>
</author>
<date month="7" year="2002"/>
</front>
<seriesInfo name="World Multiconference on Systemics, Cybernetics and Informatics"
value=""/>
</reference>
<reference anchor="GRINNEMO04" target="">
<front>
<title>Performance of SCTP-controlled failovers in M3UA-based
SIGTRAN networks</title>
<author fullname="" initials="K-J" surname="Grinnemo">
<organization/>
</author>
<author fullname="" initials="A." surname="Brunstrom">
<organization/>
</author>
<date month="4" year="2004"/>
</front>
<seriesInfo name="Advanced Simulation Technologies Conference"
value=""/>
</reference>
<reference anchor="FALLON08" target="">
<front>
<title>SCTP Switchover Performance Issues in WLAN
Environments</title>
<author fullname="" initials="S." surname="Fallon">
<organization/>
</author>
<author fullname="" initials="P." surname="Jacob">
<organization/>
</author>
<author fullname="" initials="Y." surname="Qiao">
<organization/>
</author>
<author fullname="" initials="L." surname="Murphy">
<organization/>
</author>
<author fullname="" initials="E." surname="Fallon">
<organization/>
</author>
<author fullname="" initials="A." surname="Hanley">
<organization/>
</author>
<date month="1" year="2008"/>
</front>
<seriesInfo name="IEEE CCNC" value="2008"/>
</reference>
<reference anchor="CARO04" target="">
<front>
<title>End-to-End Failover Thresholds for Transport Layer
Multihoming</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="11" year="2004"/>
</front>
<seriesInfo name="MILCOM 2004" value=""/>
</reference>
<reference anchor="CARO05" target="">
<front>
<title>End-to-End Fault Tolerance using Transport Layer
Multihoming</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<date month="1" year="2005"/>
</front>
<seriesInfo name="Ph.D Thesis, University of Delaware" value=""/>
</reference>
<reference anchor="CARO02" target="">
<front>
<title>A Two-level Threshold Recovery Mechanism for SCTP</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<author fullname="" initials="J." surname="Iyengar">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="G." surname="Heinz">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="7" year="2002"/>
</front>
<seriesInfo name="Tech report, CIS Dept, University of Delaware"
value=""/>
</reference>
</references>
<section anchor="alternative_approach"
title="Discussions of Alternative Approaches">
<t>This section lists alternative approaches for the issues described in
this document. Although these approaches do not require updating
<xref target="RFC4960"/>, we do not recommend them for the reasons
described below.</t>
<section title="Reduce Path.Max.Retrans (PMR)">
<t>Smaller values for Path.Max.Retrans shorten the failover duration, and
in fact this is recommended in some research results <xref
target="JUNGMAIER02"/> <xref target="GRINNEMO04"/> <xref
target="FALLON08"/>. However, to significantly reduce the failover time,
it is required to go down (as with PFMR) to Path.Max.Retrans=0, and with
this setting SCTP switches to another destination address already on a
single timeout, which may result in spurious failover. Spurious failover
is a problem in <xref target="RFC4960"/> SCTP, as the transmission of
HEARTBEATs on the primary path that was left, unlike in SCTP-PF, is
governed by 'HB.interval' also during the failover process. 'HB.interval'
is usually set in the order of seconds (the recommended value is 30
seconds), and when the primary path becomes inactive, the next HEARTBEAT
may be transmitted only many seconds later; with the recommended value,
only 30 seconds later. Meanwhile, the primary path may long since have
recovered, if it needed recovery at all (indeed, the failover could be
truly spurious). In such situations, post failover, an endpoint is forced
to wait in the order of many seconds before it can resume transmission on
the primary path. Furthermore, once it returns to the primary path, the
CWND needs to be rebuilt anew, a process from which the throughput has
already suffered on the alternate path. Using a smaller value for
'HB.interval' might help in this situation, but it would result in a
general waste of bandwidth, as such more frequent HEARTBEATs would be sent
also when no trouble is observed. The bandwidth overhead may be diminished
by having the ULP use a smaller 'HB.interval' only on the path that at any
given time is set to be the primary path, but this adds complexity to the
ULP.</t>
<t>In addition, smaller Path.Max.Retrans values also affect the
'Association.Max.Retrans' value. When the SCTP association's error
count exceeds the Association.Max.Retrans threshold, the SCTP sender
considers the peer endpoint unreachable and terminates the association.
Section 8.2 in <xref target="RFC4960"/> recommends that the
Association.Max.Retrans value should not be larger than the summation
of the Path.Max.Retrans of each of the destination addresses. Otherwise,
the SCTP sender considers its peer reachable even when all destinations
are INACTIVE. To avoid this dormant state operation, an <xref
target="RFC4960"/> SCTP implementation SHOULD reduce
Association.Max.Retrans accordingly whenever it reduces Path.Max.Retrans.
However, a smaller Association.Max.Retrans value compromises the fault
tolerance of SCTP, as it increases the chances of association
termination during minor congestion events.</t>
</section>
<section title="Adjust RTO-Related Parameters">
<t>As several research results indicate, we can also shorten the
duration of the failover process by adjusting RTO-related parameters <xref
target="JUNGMAIER02"/> <xref target="FALLON08"/>. During the failover
process, the RTO keeps being doubled. However, if we choose a smaller
value for RTO.max, we can stop the exponential growth of the RTO at some
point. Also, choosing smaller values for RTO.initial or RTO.min can
contribute to keeping the RTO value small.</t>
<t>Similar to reducing Path.Max.Retrans, the advantage of this
approach is that it requires no modification to the current
specification, although it needs to ignore several recommendations
described in Section 15 of <xref target="RFC4960"/>. However, this
approach requires sufficient knowledge about the network
characteristics between the endpoints. Otherwise, it can introduce
adverse side effects such as spurious timeouts.</t>
<t>The significant issue with this approach, however, is that even if
RTO.max is lowered to an optimal low value, as long as Path.Max.Retrans
is kept at the <xref target="RFC4960"/> recommended value, the reduction
of RTO.max does not reduce the failover time sufficiently to prevent
severe performance degradation during failover.</t>
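<t>The interplay between RTO doubling, RTO.max, and Path.Max.Retrans can
be made concrete with a small calculation. The following sketch is
illustrative only; the function name is hypothetical. It assumes that
data is continuously sent on the path, that every transmission times out,
that the RTO starts at the given value, doubles after each timeout, and
is capped at RTO.max, and that failover occurs once the path error
counter exceeds Path.Max.Retrans, i.e., after Path.Max.Retrans + 1
consecutive timeouts.</t>
<figure>
<artwork><![CDATA[
#include <stdint.h>

/* Time in milliseconds until the path error counter exceeds pmr,
 * under the assumptions stated above. */
uint64_t ex_failover_time_ms(uint32_t rto_ms, uint32_t rto_max_ms,
                             unsigned pmr)
{
    uint64_t total = 0;
    uint64_t rto = rto_ms < rto_max_ms ? rto_ms : rto_max_ms;

    /* pmr + 1 consecutive timeouts are needed to exceed pmr. */
    for (unsigned i = 0; i <= pmr; i++) {
        total += rto;          /* wait for the current RTO to expire */
        rto *= 2;              /* exponential backoff */
        if (rto > rto_max_ms)
            rto = rto_max_ms;  /* capped at RTO.max */
    }
    return total;
}
]]></artwork>
</figure>
<t>Under these assumptions, a starting RTO of 1000 ms with an RTO.max of
60000 ms and Path.Max.Retrans = 5 gives 63000 ms. Lowering RTO.max to
4000 ms while keeping Path.Max.Retrans = 5 still gives 19000 ms, whereas
Path.Max.Retrans = 0 gives 1000 ms.</t>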
</section>
</section>
<section anchor="path_bouncing"
title="Discussions for Path Bouncing Effect">
<t>The methods described in this document can accelerate the failover
process. Hence, they might introduce the path bouncing effect, where the
sender keeps changing the data transmission path frequently. This sounds
harmful to the data transfer; however, several research results indicate
that there is no serious problem with SCTP in terms of the path bouncing
effect <xref target="CARO04"/> <xref target="CARO05"/>.</t>
<t>There are two main reasons for this. First, SCTP is basically
designed for multipath communication, which means SCTP maintains all
path-related parameters (CWND, ssthresh, RTT, error count, etc.) for each
destination address. These parameters cannot be affected by path
bouncing. In addition, when SCTP migrates the data transfer to another
path, it starts with the minimal or the initial CWND. Hence, there is
little chance of packet reordering or duplication.</t>
<t>Second, even if all communication paths between the end nodes share
the same bottleneck, SCTP-PF results in a behavior already allowed
by <xref target="RFC4960"/>.</t>
</section>
<section anchor="sh" title="SCTP-PF for SCTP Single-homed Operation">
<t>For a single-homed SCTP association, the only tangible effect of
activating SCTP-PF operation is enhanced failure detection: potential
notification of the PF state of the sole destination address and, for
idle associations, more rapid entering, and notification, of the
inactive state of the destination address and more rapid endpoint
failure detection. It is believed that neither of these effects is
harmful, provided adequate dormant state operation is implemented, and
furthermore that they may be particularly useful for applications that
deploy multiple SCTP associations for load balancing purposes. The early
notification of the PF state may be used for preventive measures, as
entering the PF state can be used as a warning of potential congestion.
Depending on the PMR value, the aggressive HEARTBEAT transmission in PF
state may speed up the endpoint failure detection (the sole path error
counter exceeding the AMR threshold) on idle associations in cases where
a relatively large HB.interval value compared to the RTO (e.g., 30
seconds) is used.</t>
</section>
</back>
</rfc>