<?xml version="1.0" encoding="US-ASCII"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
by Daniel M Kohn (private) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4690 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml">
]>
<rfc category="std" docName="draft-ietf-tsvwg-sctp-failover-15.txt"
ipr="trust200902">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>
<front>
<title abbrev="SCTP-PF">SCTP-PF: Quick Failover Algorithm in SCTP</title>
<author fullname="Yoshifumi Nishida" initials="Y.N" surname="Nishida">
<organization>GE Global Research</organization>
<address>
<postal>
<street>2623 Camino Ramon</street>
<city>San Ramon</city>
<region>CA</region>
<code>94583</code>
<country>USA</country>
</postal>
<email>nishida@wide.ad.jp</email>
</address>
</author>
<author fullname="Preethi Natarajan" initials="P.N" surname="Natarajan">
<organization>Cisco Systems</organization>
<address>
<postal>
<street>510 McCarthy Blvd</street>
<city>Milpitas</city>
<region>CA</region>
<code>95035</code>
<country>USA</country>
</postal>
<email>prenatar@cisco.com</email>
</address>
</author>
<author fullname="Armando Caro" initials="A.C" surname="Caro">
<organization>BBN Technologies</organization>
<address>
<postal>
<street>10 Moulton St.</street>
<city>Cambridge</city>
<region>MA</region>
<code>02138</code>
<country>USA</country>
</postal>
<email>acaro@bbn.com</email>
</address>
</author>
<author fullname="Paul D. Amer" initials="P.A" surname="Amer">
<organization>University of Delaware</organization>
<address>
<postal>
<street>Computer Science Department - 434 Smith Hall</street>
<city>Newark</city>
<region>DE</region>
<code>19716-2586</code>
<country>USA</country>
</postal>
<email>amer@udel.edu</email>
</address>
</author>
<author fullname="Karen E. E. Nielsen" initials="K.N" surname="Nielsen">
<organization>Ericsson</organization>
<address>
<postal>
<street>Kistavägen 25</street>
<city>Stockholm</city>
<region/>
<code>164 80</code>
<country>Sweden</country>
</postal>
<email>karen.nielsen@tieto.com</email>
</address>
</author>
<date/>
<abstract>
<t>SCTP supports multi-homing. However, when the failover operation
specified in RFC4960 is followed, there can be significant delay and
performance degradation in the data transfer path failover. To overcome
this problem this document specifies a quick failover algorithm
(SCTP-PF) based on the introduction of a Potentially Failed (PF) state
in SCTP Path Management.</t>
<t>The document also specifies a dormant state operation of SCTP. This
dormant state operation is required to be followed by an SCTP-PF
implementation, but it may equally well be applied by a standard RFC4960
SCTP implementation.</t>
<t>Additionally, the document introduces an alternative switchback
operation mode called Primary Path Switchover that will be beneficial in
certain situations. This mode of operation applies to both a standard
RFC4960 SCTP implementation and an SCTP-PF implementation.</t>
<t>The procedures defined in the document require only minimal
modifications to the RFC4960 specification. The procedures are
sender-side only and do not impact the SCTP receiver.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>The Stream Control Transmission Protocol (SCTP) specified in <xref
target="RFC4960"/> supports multi-homing at the transport layer. SCTP's
multi-homing features include failure detection and failover procedures
to provide network interface redundancy and improved end-to-end fault
tolerance. In SCTP's current failure detection procedure, the sender
must experience Path.Max.Retrans (PMR) number of consecutive failed
timer-based retransmissions on a destination address before detecting a
path failure. Until detecting the path failure, the sender continues to
transmit data on the failed path. The prolonged time in which <xref
target="RFC4960"/> SCTP continues to use a failed path severely degrades
the performance of the protocol. To address this problem, this document
specifies a quick failover algorithm (SCTP-PF) based on the introduction
of a new Potentially Failed (PF) path state in SCTP path management. The
performance deficiencies of the <xref target="RFC4960"/> failover
operation, and the improvements obtainable from the introduction of a
Potentially Failed state in SCTP, were proposed and documented in <xref
target="NATARAJAN09"/> for Concurrent Multipath Transfer SCTP <xref
target="IYENGAR06"/>.</t>
<t>While SCTP-PF can accelerate the failover process and improve
performance, it can also increase the risk that an SCTP endpoint enters
the dormant state, in which all destination addresses are inactive. <xref
target="RFC4960"/> leaves the protocol operation during dormant state to
implementations and encourages avoiding the state as much as
possible by careful tuning of the Path.Max.Retrans (PMR) and
Association.Max.Retrans (AMR) parameters. This document specifies a
dormant state operation for SCTP-PF that provides the same disruption
tolerance as <xref target="RFC4960"/> even though the dormant state may
be entered more quickly. The dormant state operation may equally well be
applied by an <xref target="RFC4960"/> implementation, where it
provides added fault tolerance for situations in which the tuning
of the Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR)
parameters fails to adequately prevent entry into the
dormant state.</t>
<t>The operation after the recovery of a failed path also impacts the
performance of the protocol. With the procedures specified in <xref
target="RFC4960"/> SCTP will, after a failover from the primary path,
switch back to use the primary path for data transfer as soon as this
path becomes available again. From a performance perspective such a
forced switchback of the data transmission path can be suboptimal as the
CWND towards the original primary destination address has to be rebuilt
once data transfer resumes, <xref target="CARO02"/>. As an optional
alternative to the switchback operation of <xref target="RFC4960"/>,
this document specifies a Primary Path Switchover procedure
that avoids such forced switchbacks of the data transfer path. The
Primary Path Switchover operation was originally proposed in <xref
target="CARO02"/>.</t>
<t>While SCTP-PF primarily is motivated by a desire to improve the
multi-homed operation, the feature applies also to SCTP single-homed
operation. Here the algorithm serves to provide increased failure
detection on idle associations, whereas the failover or switchback
aspects of the algorithm will not be activated. This is discussed in
more detail in Appendix C.</t>
<t>A brief description of the motivation for the introduction of the
Potentially Failed state, including a discussion of alternative
approaches to mitigate the deficiencies of the <xref target="RFC4960"/>
failover operation, is given in the Appendices. A discussion of path
bouncing effects that might be caused by frequent switchovers is also
provided there.</t>
</section>
<section title="Conventions and Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</section>
<section anchor="SCTP-PF"
title="SCTP with Potentially Failed Destination State (SCTP-PF)">
<section title="Overview">
<t>To minimize the performance impact during failover, the sender
should stop transmitting data to a failed destination address as
early as possible. In the <xref target="RFC4960"/> SCTP path
management scheme, the sender stops transmitting data to a destination
address only after the destination address is marked inactive. This
process takes a significant amount of time as it requires the error
counter of the destination address to exceed the Path.Max.Retrans
(PMR) threshold. The issue cannot simply be mitigated by lowering
the PMR threshold because this may result in spurious failure
detection and unnecessarily prevent the use of a preferred
primary path. Also, due to the coupled tuning of the Path.Max.Retrans
(PMR) and the Association.Max.Retrans (AMR) parameter values in <xref
target="RFC4960"/>, lowering the PMR threshold may result in
lowering the AMR threshold, which would decrease
the fault tolerance of SCTP.</t>
<t>The solution provided in this document is to extend the SCTP path
management scheme of <xref target="RFC4960"/> by the addition of the
Potentially Failed (PF) state as an intermediate state in between the
active and inactive state of a destination address in the <xref
target="RFC4960"/> path management scheme, and let the failover of
data transfer away from a destination address be driven by the
entering of the PF state instead of by the entering of the inactive
state. Thereby SCTP may perform quick failover without negatively impacting
the overall fault tolerance of <xref target="RFC4960"/> SCTP. At the
same time, RTO-based HEARTBEAT probing is initiated towards a
destination address once it enters PF state. Thereby SCTP may quickly
ascertain whether network connectivity towards the destination address
is broken or whether the failover was spurious. In the case where the
failover was spurious, data transfer may quickly resume towards the
original destination address.</t>
<t>The new failure detection algorithm assumes that loss detected by a
timeout implies either severe congestion or network connectivity
failure. It recommends that by default a destination address is
classified as PF at the occurrence of the first timeout.</t>
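<t>As a rough illustration of the time scales involved, the following C
sketch sums the backed-off timeouts needed before the error counter of a
destination address exceeds a given threshold. The function name and the
simplifications are assumptions made for this example only: the RTO is
assumed to start at a fixed value and to double on every consecutive
timeout up to a cap, in the manner of the RFC4960 timer back-off, and
neither idle time nor HEARTBEAT scheduling is modeled.</t>
<figure>
<artwork><![CDATA[
```c
#include <stdint.h>

/*
 * Illustrative only: time (in ms) until the error counter of a
 * destination address exceeds `threshold`, assuming back-to-back
 * timeouts. The counter must exceed the threshold, so threshold + 1
 * consecutive timeouts are needed. The RTO is doubled after each
 * timeout and capped at rto_max_ms.
 */
uint64_t time_to_exceed_ms(uint32_t threshold,
                           uint32_t rto_start_ms,
                           uint32_t rto_max_ms)
{
    uint64_t total = 0;
    uint64_t rto = rto_start_ms;
    uint64_t timeouts = (uint64_t)threshold + 1;

    for (uint64_t i = 0; i < timeouts; i++) {
        total += rto;          /* wait for this timeout to fire  */
        rto *= 2;              /* back off for the next attempt  */
        if (rto > rto_max_ms)
            rto = rto_max_ms;
    }
    return total;
}
```
]]></artwork>
</figure>
<t>With a starting RTO of 1 second, a cap of 60 seconds and PMR = 5,
the sketch yields 63 seconds before the destination address is marked
inactive, whereas with PFMR = 0 the PF state is reached after the first
1-second timeout.</t>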
</section>
<section title="Specification of the SCTP-PF Procedures">
<t>The SCTP-PF operation is specified as follows: <list
style="numbers">
<t>The sender maintains a new tunable SCTP Protocol Parameter
called PotentiallyFailed.Max.Retrans (PFMR). The PFMR defines the
new intermediate PF threshold on the destination address error
counter. When this threshold is exceeded the destination address is classified
as PF. The RECOMMENDED value of PFMR is 0, but other values MAY be
used. If PFMR is set to be greater than or equal to
Path.Max.Retrans (PMR), the resulting PF threshold will be so high
that the destination address will reach the inactive state before
it can be classified as PF.</t>
<t>The error counter of an active destination address is
incremented as specified in <xref target="RFC4960"/>. This means
that the error counter of the destination address will be
incremented each time the T3-rtx timer expires, or each time a
HEARTBEAT chunk is sent when idle and not acknowledged within an
RTO. When the value in the destination address error counter
exceeds PFMR, the endpoint MUST mark the destination address as in
the PF state.</t>
<t>An SCTP-PF sender SHOULD NOT send data to destination addresses
in PF state when alternative destination addresses in active state
are available. Specifically this means that: <list style="hanging">
<t hangText="i">When there is outbound data to send and the
destination address presently used for data transmission is in
PF state, the sender SHOULD choose a destination address in
active state, if one exists, and use this
destination address for data transmission.</t>
<t hangText="ii">When retransmitting data that has timed out
and the sender thus by <xref target="RFC4960"/>, section
6.4.1, should attempt to pick a new destination address for
data retransmission, the sender SHOULD choose an alternate
destination transport address in active state if one
exists.</t>
<t hangText="iii">When there is outbound data to send and the
SCTP user explicitly requests to send data to a destination
address in PF state, the sender SHOULD send the data to an
alternate destination address in active state if one
exists.</t>
</list>When choosing among multiple destination addresses in
active state, an SCTP sender will follow the guiding principle of
Section 6.4.1 of <xref target="RFC4960"/> and choose the most
divergent source-destination pair. For case i, the comparison is
against the destination address in PF state from which the sender
fails over; for case ii, it is against the destination
address towards which the data timed out. Rules for picking
the most divergent source-destination pair are an
implementation decision and are not specified within this
document.
<vspace />
<vspace />
In all cases, the sender MUST NOT change the state of the
chosen destination address, whether this state be active or PF,
and it MUST NOT clear the error counter of the destination address
as a result of choosing the destination address for data
transmission.</t>
<t>When the destination addresses are all in PF state or some in
PF state and some in inactive state, the sender MUST choose one
destination address in PF state and transmit or retransmit data to
this destination address using the following rules: <list
style="letters">
<t>The sender SHOULD choose the destination in PF state with
the lowest error count (fewest consecutive timeouts) for data
transmission and transmit or retransmit data to this
destination.</t>
<t>When there are multiple destination addresses in PF state
with the same error count, the sender should choose among
them based on the <xref target="RFC4960"/>, Section
6.4.1, principle of choosing the most divergent
source-destination pair when executing (potentially
consecutive) retransmissions. Rules for picking the most
divergent source-destination pair are an implementation
decision and are not specified within this document.</t>
</list> The sender MUST NOT change the state and the error
counter of any destination address regardless of whether it has
been chosen for transmission or not.</t>
<t>The HB.interval of the Path Heartbeat function of <xref
target="RFC4960"/> MUST be ignored for destination addresses in PF
state. Instead HEARTBEAT chunks are sent to destination addresses
in PF state once per RTO. HEARTBEAT chunks SHOULD be sent to
destination addresses in PF state, but the sending of HEARTBEATS
MUST honor whether the Path Heartbeat function (Section 8.3 of
<xref target="RFC4960"/>) is enabled for the destination address
or not. I.e., if the Path Heartbeat function is disabled for the
destination address in question, HEARTBEATS MUST NOT be sent. Note
that when the Path Heartbeat function is disabled, it may take longer to
transition a destination address in PF state back to active
state.</t>
<t>HEARTBEATs are sent when a destination address reaches the PF
state. When a HEARTBEAT chunk is not acknowledged within the RTO,
the sender increments the error counter and exponentially backs
off the RTO value. If the error counter is less than PMR, the
sender transmits another packet containing the HEARTBEAT chunk
immediately after timeout expiration on the previous HEARTBEAT.
When data is being transmitted to a destination address in the PF
state, the transmission of a HEARTBEAT chunk MAY be omitted in
cases where the receipt of a SACK of the data or a T3-rtx timer
expiration on the data can provide equivalent information, such as
the case where the data chunk has been transmitted to a single
destination address only. Likewise, the timeout of a HEARTBEAT
chunk MAY be ignored if data is outstanding towards the
destination address.</t>
<t>When the sender receives a HEARTBEAT ACK from a HEARTBEAT sent
to a destination address in PF state, the sender SHOULD clear the
error counter of the destination address and transition the
destination address back to active state. When the sender resumes
data transmission on a destination address after a transition of
the destination address from PF to active state, it MUST do this
following the prescriptions of Section 7.2 of <xref
target="RFC4960"/>. In a situation where a HEARTBEAT ACK arrives
while there is data outstanding towards the destination address to
which the HEARTBEAT was sent, then an implementation MAY choose to
not have the HEARTBEAT ACK reset the error counter, but have the
error counter reset await the fate of the outstanding data
transmission. This situation can happen when data is sent to a
destination address in PF state.</t>
<t>Additional (PMR - PFMR) consecutive timeouts on a destination
address in PF state confirm the path failure, upon which the
destination address transitions to the inactive state. As
described in <xref target="RFC4960"/>, the sender (i) SHOULD
notify the ULP about this state transition, and (ii) transmit
HEARTBEAT chunks to the inactive destination address at a lower
HB.interval frequency as described in Section 8.3 of <xref
target="RFC4960"/> (when the Path Heartbeat function is enabled
for the destination address).</t>
<t>Acknowledgments for chunks that have been transmitted to
multiple destinations (i.e., a chunk which has been retransmitted
to a different destination address than the destination address to
which the chunk was first transmitted) SHOULD NOT clear the error
count for an inactive destination address and SHOULD NOT
move a destination address in PF state back to active state,
since a sender cannot disambiguate whether the ACK was for the
original transmission or the retransmission(s). An SCTP sender MAY
clear the error counter and move a destination address back to active
state if it has information other than the acknowledgment that uniquely
determines which destination, among multiple destination addresses,
the chunk reached. This document makes no
reference to what such information could consist of,
nor how such information could be obtained.</t>
<t>Acknowledgments for data chunks that have been transmitted to
one destination address only MUST clear the error counter for the
destination address and MUST transition a destination address in
PF state back to active state. This situation can happen when new
data is sent to a destination address in the PF state. It can also
happen in situations where the destination address is in the PF
state due to the occurrence of a spurious T3-rtx timer and
acknowledgments start to arrive for data sent prior to occurrence
of the spurious T3-rtx and data has not yet been retransmitted
towards other destinations. This document does not specify special
handling for detection of or reaction to spurious T3-rtx timeouts,
e.g., for special operation vis-a-vis the congestion control
handling or data retransmission operation towards a destination
address which undergoes a transition from active to PF to active
state due to a spurious T3-rtx timeout. But it is noted that this
is an area which would benefit from additional attention,
experimentation and specification for single-homed SCTP as well as
for multi-homed SCTP protocol operation.</t>
<t>When all destination addresses are in inactive state, and SCTP
protocol operation thus is said to be in dormant state, the
prescriptions given in <xref target="dormant"/> shall be
followed.</t>
<t>The SCTP stack SHOULD expose the PF state of its destination
addresses to the ULP as well as provide the means to notify the
ULP of state transitions of its destination addresses from active
to PF, and vice-versa. However, it is recommended that an SCTP
stack implementing SCTP-PF also allow the ULP to be kept
ignorant of the PF state of its destinations and the associated
state transitions, thus retaining the simpler state
transition model of RFC4960 in the ULP. For this reason it is
recommended that an SCTP stack implementing SCTP-PF also provide
the ULP with the means to suppress exposure of the PF state and
the associated state transitions.</t>
</list></t>
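<t>The threshold and selection rules above can be sketched in C as
follows. This is a minimal illustration under stated assumptions, not a
definitive implementation: the type and function names are hypothetical,
only the error counter thresholds and the choice of a destination for
data are modeled, and ties are broken by lowest index because the rules
for picking the most divergent source-destination pair are left to the
implementation.</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used for illustration only. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
    uint32_t error_count;      /* consecutive timeouts on this path */
    enum path_state state;
};

/*
 * State implied by the error counter: PF once the counter exceeds
 * PFMR, inactive once it exceeds PMR. If PFMR >= PMR, the PF state
 * is never reported because the inactive check comes first.
 */
enum path_state classify_path(uint32_t error_count,
                              uint32_t pfmr, uint32_t pmr)
{
    if (error_count > pmr)
        return PATH_INACTIVE;
    if (error_count > pfmr)
        return PATH_PF;
    return PATH_ACTIVE;
}

/*
 * Pick a destination for data. Keep the current one if it is active;
 * otherwise prefer any active destination; otherwise the PF
 * destination with the lowest error count. Returns -1 when every
 * destination is inactive (dormant state). The paths are const:
 * selection changes neither state nor error counters.
 */
int select_data_path(const struct path *paths, size_t n, size_t current)
{
    int best_pf = -1;

    if (current < n && paths[current].state == PATH_ACTIVE)
        return (int)current;

    for (size_t i = 0; i < n; i++) {
        if (paths[i].state == PATH_ACTIVE)
            return (int)i;
        if (paths[i].state == PATH_PF &&
            (best_pf < 0 ||
             paths[i].error_count < paths[best_pf].error_count))
            best_pf = (int)i;
    }
    return best_pf;
}
```
]]></artwork>
</figure>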
</section>
</section>
<section anchor="dormant" title="Dormant State Operation">
<t>In a situation with complete disruption of the communication in
between the SCTP Endpoints, the aggressive HEARTBEAT transmissions of
SCTP-PF on destination addresses in PF state may make the association
enter dormant state faster than a standard <xref target="RFC4960"/> SCTP
implementation given the same setting of Path.Max.Retrans (PMR) and
Association.Max.Retrans (AMR). For example, an SCTP association with two
destination addresses typically would reach dormant state in half the
time of an <xref target="RFC4960"/> SCTP implementation in such
situations. This is because an SCTP-PF sender will send HEARTBEATs and
data retransmissions in parallel at RTO intervals when there are
multiple destination addresses in PF state. This argument presumes that
RTO &lt;&lt; HB.interval of <xref target="RFC4960"/>. With the design
goal that SCTP-PF shall provide the same level of disruption tolerance
as an <xref target="RFC4960"/> SCTP implementation with the same
Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) setting, this
document prescribes that an SCTP-PF implementation SHOULD operate as described
below in <xref target="dormant_details"/> during dormant state.</t>
<t>An SCTP-PF implementation MAY choose a different dormant state
operation than the one described below in <xref
target="dormant_details"/> provided that the solution chosen does not
decrease the fault tolerance of the SCTP-PF operation.</t>
<t>The below prescription for SCTP-PF dormant state handling SHOULD NOT
be coupled to the value of the PFMR, but solely to the activation of
SCTP-PF logic in an SCTP implementation.</t>
<t>It is noted that the below dormant state operation is considered to
provide added disruption tolerance also for an <xref target="RFC4960"/>
SCTP implementation, and that it can be sensible for an <xref
target="RFC4960"/> SCTP implementation to follow this mode of operation.
For an <xref target="RFC4960"/> SCTP implementation the continuation of
data transmission during dormant state makes the fault tolerance of SCTP
more robust towards situations where some, or all, alternative paths
of an SCTP association approach, or reach, inactive state before the
primary path used for data transmission observes trouble.</t>
<section anchor="dormant_details" title="SCTP Dormant State Procedure">
<t><list style="letters">
<t>When the destination addresses are all in inactive state and
data is available for transfer, the sender MUST choose one
destination and transmit data to this destination address.</t>
<t>The sender MUST NOT change the state of the chosen destination
address (it remains in inactive state) and it MUST NOT clear the
error counter of the destination address as a result of choosing
the destination address for data transmission.</t>
<t>The sender SHOULD choose the destination in inactive state with
the lowest error count (fewest consecutive timeouts) for data
transmission. When there are multiple destinations with the same error
count in inactive state, the sender SHOULD attempt to pick the
most divergent source-destination pair from the last
source-destination pair where failure was observed. Rules for picking the
most divergent source-destination pair are an implementation
decision and are not specified within this document. To support
differentiation of inactive destination addresses based on their
error count, SCTP will need to allow the
destination address error counters to be incremented up to some reasonable limit
above PMR+1, thus changing the prescriptions of <xref
target="RFC4960"/>, Section 8.3, in this respect. The exact limit
to apply is not specified in this document, but it is considered
reasonable to require the limit to be an order of magnitude
higher than the PMR value. A sender MAY choose to deploy other
strategies than the strategy defined here. The strategy to
prioritize the last active destination address, i.e., the
destination address with the lowest error count, is optimal when
some paths are permanently inactive, but suboptimal when a path
instability is transient.</t>
</list></t>
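<t>A minimal C sketch of the dormant state selection described above
is given here. The function names are hypothetical, the tie-break by
lowest index stands in for the unspecified divergence rules, and the
saturation limit is a caller-supplied assumption, since this document
does not fix its value.</t>
<figure>
<artwork><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/*
 * All destinations are inactive: return the index of the one with
 * the lowest error count, or -1 if there are no destinations.
 * Nothing is modified, so state and counters stay as they are.
 */
int select_dormant_path(const uint32_t *error_counts, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 || error_counts[i] < error_counts[best])
            best = (int)i;
    }
    return best;
}

/*
 * Count a further timeout on an inactive destination, saturating
 * at `limit` so that inactive destinations remain comparable. The
 * limit is an assumption of this sketch, e.g. ten times PMR.
 */
uint32_t bump_error_count(uint32_t error_count, uint32_t limit)
{
    return error_count < limit ? error_count + 1 : limit;
}
```
]]></artwork>
</figure>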
</section>
</section>
<section anchor="permanent_failover" title="Primary Path Switchover">
<t>The objective of the Primary Path Switchover operation is to allow
the SCTP sender to continue data transmission on a new working path even
when the old primary destination address becomes active again. This is
achieved by having SCTP perform a switchover of the primary path to the
new working path if the error counter of the primary path exceeds a
certain threshold. This mode of operation can be applied not only to
SCTP-PF implementations, but also to <xref target="RFC4960"/>
implementations.</t>
<t>The Primary Path Switchover operation requires only sender side
changes. The details are:</t>
<t><list style="numbers">
<t>The sender maintains a new tunable parameter, called
Primary.Switchover.Max.Retrans (PSMR). For SCTP-PF implementations,
the PSMR MUST be set greater than or equal to the PFMR value. For <xref
target="RFC4960"/> implementations the PSMR MUST be set greater than or
equal to the PMR value. Implementations MUST reject any other values
of PSMR.</t>
<t>When the path error counter on a set primary path exceeds PSMR,
the SCTP implementation MUST autonomously select and set a new
primary path.</t>
<t>The primary path selected by the SCTP implementation MUST be the
path which at the given time would be chosen for data transfer. A
previously failed primary path can be used as data transfer path as
per normal path selection when the present data transfer path
fails.</t>
<t>For SCTP-PF, the recommended value of PSMR is PFMR when Primary
Path Switchover operation mode is used. This means that no forced
switchback to a previously failed primary path is performed. An
SCTP-PF implementation of Primary Path Switchover MUST support the
setting of PSMR = PFMR. An SCTP-PF implementation of Primary Path
Switchover MAY support setting of PSMR > PFMR.</t>
<t>For <xref target="RFC4960"/> SCTP, the recommended value of PSMR
is PMR when Primary Path Switchover is used. This means that no
forced switchback to a previously failed primary path is performed.
An <xref target="RFC4960"/> SCTP implementation of Primary Path
Switchover MUST support the setting of PSMR = PMR. An <xref
target="RFC4960"/> SCTP implementation of Primary Path Switchover
MAY support larger settings of PSMR > PMR.</t>
<t>It MUST be possible to disable the Primary Path Switchover
operation and obtain the standard switchback operation of <xref
target="RFC4960"/>.</t>
</list></t>
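<t>The PSMR constraints and the switchover trigger listed above can be
sketched in C as follows. The function and parameter names are
hypothetical; the sketch covers only the validity check on PSMR and the
decision that a new primary path is needed, not the selection of that
path.</t>
<figure>
<artwork><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/*
 * PSMR must be >= PFMR for SCTP-PF and >= PMR for plain RFC 4960
 * operation; any other value is to be rejected.
 */
bool psmr_is_valid(bool sctp_pf_enabled, uint32_t psmr,
                   uint32_t pfmr, uint32_t pmr)
{
    return psmr >= (sctp_pf_enabled ? pfmr : pmr);
}

/*
 * A new primary path is selected once the error counter of the
 * current primary path exceeds PSMR. With switchover disabled the
 * RFC 4960 switchback behavior is kept, so this never fires.
 */
bool must_switch_primary(bool switchover_enabled,
                         uint32_t primary_error_count, uint32_t psmr)
{
    return switchover_enabled && primary_error_count > psmr;
}
```
]]></artwork>
</figure>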
<t>The switchover behavior that is optimal in a given
scenario depends on the relative quality of a set primary path versus
the quality of alternative paths available as well as on the extent to
which it is desired for the mode of operation to enforce traffic
distribution over a number of network paths. I.e., load distribution of
traffic from multiple SCTP associations may be sought to be enforced by
distribution of the set primary paths with <xref target="RFC4960"/>
switchback operation. However as <xref target="RFC4960"/> switchback
behavior is suboptimal in certain situations, especially in scenarios
where a number of equally good paths are available, an SCTP
implementation MAY also support, as alternative behavior, the Primary
Path Switchover mode of operation and MAY enable it based on application
requests.</t>
<t>For an SCTP implementation that implements the Primary Path
Switchover operation, this specification RECOMMENDS that the standard
RFC4960 switchback operation is retained as the default operation.</t>
</section>
<section title="Suggested SCTP Protocol Parameter Values">
<t>This document does not alter the <xref target="RFC4960"/> value
RECOMMENDATIONS for the SCTP Protocol Parameters defined in <xref
target="RFC4960"/>.</t>
<t>The following protocol parameter is RECOMMENDED:<list style="empty">
<t>PotentiallyFailed.Max.Retrans (PFMR) - 0</t>
</list></t>
</section>
<section title="Socket API Considerations">
<t>This section describes how the socket API defined in <xref
target="RFC6458"/> is extended to provide a way for the application to
control and observe the SCTP-PF behavior as well as the Primary Path
Switchover function.</t>
<t>Please note that this section is informational only.</t>
<t>A socket API implementation based on <xref target="RFC6458"/> is
extended in two ways. The existing SCTP_PEER_ADDR_CHANGE event
provides a notification when a peer address enters or leaves the
potentially failed state, and the structure used by the existing
SCTP_GET_PEER_ADDR_INFO socket option exposes the potentially failed
state of a peer address.</t>
<t>Furthermore, two new read/write socket options for the level
IPPROTO_SCTP, with the names SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE, are defined as described below. The
first socket option is used to control the values of the PFMR and PSMR
parameters described in <xref target="SCTP-PF"/> and in <xref
target="permanent_failover"/>. The second one controls the exposure of
the potentially failed path state.</t>
<t>Support for the SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
added to the function sctp_opt_info().</t>
<section anchor="pf_support_api"
title="Support for the Potentially Failed Path State">
<t>As defined in <xref target="RFC6458"/>, the SCTP_PEER_ADDR_CHANGE
event is provided if the status of a peer address changes. In addition
to the state changes described in <xref target="RFC6458"/>, this event
is also provided if a peer address enters or leaves the potentially
failed state. The notification as defined in <xref target="RFC6458"/>
uses the following structure:</t>
<figure>
<artwork>
struct sctp_paddr_change {
uint16_t spc_type;
uint16_t spc_flags;
uint32_t spc_length;
struct sockaddr_storage spc_aaddr;
uint32_t spc_state;
uint32_t spc_error;
sctp_assoc_t spc_assoc_id;
};
</artwork>
</figure>
<t><xref target="RFC6458"/> defines the constants SCTP_ADDR_AVAILABLE,
SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. This
document defines in addition to that the new constant
SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
address becomes potentially failed.</t>
<t>The SCTP_GET_PEER_ADDR_INFO socket option defined in <xref
target="RFC6458"/> can be used to query the state of a peer address.
It uses the following structure:</t>
<figure>
<artwork>
struct sctp_paddrinfo {
sctp_assoc_t spinfo_assoc_id;
struct sockaddr_storage spinfo_address;
int32_t spinfo_state;
uint32_t spinfo_cwnd;
uint32_t spinfo_srtt;
uint32_t spinfo_rto;
uint32_t spinfo_mtu;
};
</artwork>
</figure>
<t><xref target="RFC6458"/> defines the constants SCTP_UNCONFIRMED,
SCTP_ACTIVE, and SCTP_INACTIVE to be provided in the spinfo_state
field. This document defines in addition to that the new constant
SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
potentially failed.</t>
</section>
<section title="Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option">
<t>Applications can control the SCTP-PF behavior by getting or setting
the number of consecutive timeouts before a peer address is considered
potentially failed or unreachable. The same socket option is used by
applications to set and get the number of timeouts before the primary
path is changed automatically by the Primary Path Switchover function.
This socket option uses the level IPPROTO_SCTP and the name
SCTP_PEER_ADDR_THLDS.</t>
<t>The following structure is used to access and modify the
thresholds:</t>
<figure>
<artwork>
struct sctp_paddrthlds {
sctp_assoc_t spt_assoc_id;
struct sockaddr_storage spt_address;
uint16_t spt_pathmaxrxt;
uint16_t spt_pathpfthld;
uint16_t spt_pathcpthld;
};
</artwork>
</figure>
<t><list style="hanging">
<t hangText="spt_assoc_id:">This parameter is ignored for
one-to-one style sockets. For one-to-many style sockets the
application may fill in an association identifier or
SCTP_FUTURE_ASSOC. It is an error to use SCTP_{CURRENT|ALL}_ASSOC
in spt_assoc_id.</t>
<t hangText="spt_address:">This specifies which peer address is of
interest. If a wild card address is provided, this socket option
applies to all current and future peer addresses.</t>
<t hangText="spt_pathmaxrxt:">Each peer address of interest is
considered unreachable if its path error counter exceeds
spt_pathmaxrxt.</t>
<t hangText="spt_pathpfthld:">Each peer address of interest is
considered Potentially Failed if its path error counter exceeds
spt_pathpfthld.</t>
<t hangText="spt_pathcpthld:">Each peer address of interest is no
longer considered the primary remote address if its path error
counter exceeds spt_pathcpthld. Using a value of 0xffff disables
the selection of a new primary peer address. If an implementation
does not support the automatic selection of a new primary
address, it should indicate an error with errno set to EINVAL if a
value different from 0xffff is used in spt_pathcpthld. For
SCTP-PF, the setting of spt_pathcpthld &lt; spt_pathpfthld should
be rejected with errno set to EINVAL. For <xref target="RFC4960"/>
SCTP, the setting of spt_pathcpthld &lt; spt_pathmaxrxt should be
rejected with errno set to EINVAL. An SCTP-PF implementation MAY
support only setting of spt_pathcpthld = spt_pathpfthld and
spt_pathcpthld = 0xffff, and an <xref target="RFC4960"/> SCTP
implementation MAY support only setting of spt_pathcpthld =
spt_pathmaxrxt and spt_pathcpthld = 0xffff. In these cases SCTP
shall reject setting of other values with errno set to EINVAL.</t>
</list></t>
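<t>As an informal illustration only, the threshold semantics described
for spt_pathmaxrxt, spt_pathpfthld, and spt_pathcpthld can be sketched in
C as shown below. The names pf_thresholds, pf_classify, and
pf_check_cpthld are hypothetical helpers introduced for this sketch; they
are not part of <xref target="RFC6458"/> or of this document. The sketch
models only the general rules and does not model an implementation that
supports just the two restricted spt_pathcpthld settings.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three threshold fields of
 * struct sctp_paddrthlds (names without the spt_ prefix). */
struct pf_thresholds {
    uint16_t pathmaxrxt;
    uint16_t pathpfthld;
    uint16_t pathcpthld;
};

/* Value that disables selection of a new primary address. */
#define PF_CPTHLD_DISABLED 0xffffu

enum pf_path_class {
    PF_PATH_OK,
    PF_PATH_POTENTIALLY_FAILED,
    PF_PATH_UNREACHABLE
};

/* Classify a peer address from its path error counter. */
enum pf_path_class
pf_classify(uint16_t error_count, const struct pf_thresholds *t)
{
    assert(t != NULL);
    if (error_count > t->pathmaxrxt)
        return PF_PATH_UNREACHABLE;
    if (error_count > t->pathpfthld)
        return PF_PATH_POTENTIALLY_FAILED;
    return PF_PATH_OK;
}

/*
 * Return 0 if pathcpthld is acceptable, EINVAL otherwise.
 * pf_enabled:           nonzero for SCTP-PF, zero for RFC 4960 SCTP.
 * switchover_supported: nonzero if automatic selection of a new
 *                       primary address is implemented.
 */
int
pf_check_cpthld(int pf_enabled, int switchover_supported,
                const struct pf_thresholds *t)
{
    uint16_t lower_bound;

    assert(t != NULL);
    if (t->pathcpthld == PF_CPTHLD_DISABLED)
        return 0;
    if (!switchover_supported)
        return EINVAL;
    lower_bound = pf_enabled ? t->pathpfthld : t->pathmaxrxt;
    return (t->pathcpthld < lower_bound) ? EINVAL : 0;
}
```
]]></artwork>
</figure>
<t>For example, with spt_pathmaxrxt = 5 and spt_pathpfthld = 0, a path
error counter of 1 classifies the address as Potentially Failed and a
counter of 6 classifies it as unreachable.</t>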
</section>
<section title="Exposing the Potentially Failed Path State (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option">
<t>Applications can control the exposure of the potentially failed
path state in the SCTP_PEER_ADDR_CHANGE event and the
SCTP_GET_PEER_ADDR_INFO socket option as described in <xref
target="pf_support_api"/>. The default value is implementation
specific.</t>
<t>This socket option uses the level IPPROTO_SCTP and the name
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.</t>
<t>The following structure is used to control the exposure of the
potentially failed path state:</t>
<figure>
<artwork>
struct sctp_assoc_value {
  sctp_assoc_t assoc_id;
  uint32_t assoc_value;
};
</artwork>
</figure>
<t><list style="hanging">
<t hangText="assoc_id:">This parameter is ignored for one-to-one
style sockets. For one-to-many style sockets the application may
fill in an association identifier or SCTP_FUTURE_ASSOC. It is an
error to use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.</t>
<t hangText="assoc_value:">The potentially failed path state is
exposed if and only if this parameter is non-zero.</t>
</list></t>
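<t>As an informal illustration only, the effect of assoc_value on the
state visible to the application might be sketched in C as follows. The
enum demo_path_state and the function demo_reported_state are
hypothetical, and the enumerator values are illustrative rather than the
values of the real spinfo_state constants. The sketch additionally
assumes that a potentially failed address is reported as active while
exposure is disabled; this section does not state that mapping, so it
should be read as an assumption.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the spinfo_state constants.
 * The numeric values are illustrative only. */
enum demo_path_state {
    DEMO_UNCONFIRMED,
    DEMO_ACTIVE,
    DEMO_POTENTIALLY_FAILED,
    DEMO_INACTIVE
};

/*
 * Map the internal path state to the state shown to the
 * application. The PF state is exposed if and only if
 * assoc_value is non-zero.
 */
enum demo_path_state
demo_reported_state(enum demo_path_state internal,
                    uint32_t assoc_value)
{
    assert((unsigned)internal <= (unsigned)DEMO_INACTIVE);
    if (internal == DEMO_POTENTIALLY_FAILED && assoc_value == 0)
        return DEMO_ACTIVE; /* assumption: hidden PF looks active */
    return internal;
}
```
]]></artwork>
</figure>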
</section>
</section>
<section title="Security Considerations">
<t>Security considerations for the use of SCTP and its APIs are
discussed in <xref target="RFC4960"/> and <xref target="RFC6458"/>.</t>
<t>The logic introduced by this document does not impact existing SCTP
messages on the wire. Also, this document does not introduce any new
SCTP messages on the wire that require new security considerations.</t>
<t>SCTP-PF makes SCTP more robust during primary path
failure/congestion, but also more vulnerable to network
connectivity/congestion attacks on the primary path. SCTP-PF makes it
easier for an attacker to trick SCTP into changing the data transfer
path, since the time during which an attacker needs to degrade network
connectivity is much shorter than with <xref target="RFC4960"/>. However,
SCTP-PF does not significantly change the time
and effort an attacker needs to keep SCTP away from the primary path.
With the standard switchback operation, <xref target="RFC4960"/> SCTP
resumes data transfer on its primary path as soon as the next HEARTBEAT
succeeds.</t>
<t>On the other hand, usage of the Primary Path Switchover mechanism
does change the threat analysis. This is because on-path attackers can
force a permanent change of the data transfer path by blocking the
primary path until the switchover of the primary path is triggered by
the Primary Path Switchover algorithm. This is especially the case
when Primary Path Switchover is used together with SCTP-PF with the
particular setting of PSMR = PFMR = 0, as Primary Path Switchover then
already happens at the first RTO timeout experienced. Users of the
Primary Path Switchover mechanism should be aware of this fact.</t>
<t>The event notification of path state transitions from active to
potentially failed state and vice versa gives attackers an increased
possibility to generate more local events. However, it is assumed that
event notifications are rate-limited in the implementation to address
this threat.</t>
</section>
<section title="IANA Considerations">
<t>This document does not create any new registries or modify the rules
for any existing registries managed by IANA.</t>
</section>
<section title="Acknowledgements">
<t>The authors wish to thank Michael Tuexen for his many invaluable
comments and for his very substantial support in preparing this
document.</t>
</section>
<section title="Proposed Change of Status (to be Deleted before Publication)">
<t>Initially, this work appeared to entail some changes to the Congestion
Control (CC) operation of SCTP, and for this reason the work was proposed
as Experimental. These intended changes to the CC operation have since
been judged to be irrelevant and are no longer part of the
specification. As the specification entails no other potentially harmful
features, consensus exists in the WG to bring the work forward as
PS.</t>
<t>Initially, concerns were expressed about the possibility that the
mechanism could introduce path bouncing with potentially harmful network
impacts. These concerns are believed to be unfounded. This issue is
addressed in Appendix B.</t>
<t>It is noted that the feature specified by this document is
implemented by multiple SCTP software implementations and furthermore that
several variants of the solution have been deployed in telephony signaling
environments for several years with good results.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.RFC.4960" ?>
</references>
<references title="Informative References">
<?rfc include="reference.RFC.6458" ?>
<reference anchor="IYENGAR06" target="">
<front>
<title>Concurrent Multipath Transfer using SCTP Multihoming over
Independent End-to-end Paths.</title>
<author fullname="" initials="J." surname="Iyengar">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="10" year="2006"/>
</front>
<seriesInfo name="IEEE/ACM Trans on Networking" value="14(5)"/>
</reference>
<reference anchor="NATARAJAN09" target="">
<front>
<title>Concurrent Multipath Transfer during Path Failure</title>
<author fullname="" initials="P." surname="Natarajan">
<organization/>
</author>
<author fullname="" initials="N." surname="Ekiz">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="5" year="2009"/>
</front>
<seriesInfo name="Computer Communications" value=""/>
</reference>
<reference anchor="JUNGMAIER02" target="">
<front>
<title>On the use of SCTP in failover scenarios</title>
<author fullname="" initials="A." surname="Jungmaier">
<organization/>
</author>
<author fullname="" initials="E." surname="Rathgeb">
<organization/>
</author>
<author fullname="" initials="M." surname="Tuexen">
<organization/>
</author>
<date month="7" year="2002"/>
</front>
<seriesInfo name="World Multiconference on Systemics, Cybernetics and Informatics"
value=""/>
</reference>
<reference anchor="GRINNEMO04" target="">
<front>
<title>Performance of SCTP-controlled failovers in M3UA-based
SIGTRAN networks</title>
<author fullname="" initials="K-J" surname="Grinnemo">
<organization/>
</author>
<author fullname="" initials="A." surname="Brunstrom">
<organization/>
</author>
<date month="4" year="2004"/>
</front>
<seriesInfo name="Advanced Simulation Technologies Conference"
value=""/>
</reference>
<reference anchor="FALLON08" target="">
<front>
<title>SCTP Switchover Performance Issues in WLAN
Environments</title>
<author fullname="" initials="S." surname="Fallon">
<organization/>
</author>
<author fullname="" initials="P." surname="Jacob">
<organization/>
</author>
<author fullname="" initials="Y." surname="Qiao">
<organization/>
</author>
<author fullname="" initials="L." surname="Murphy">
<organization/>
</author>
<author fullname="" initials="E." surname="Fallon">
<organization/>
</author>
<author fullname="" initials="A." surname="Hanley">
<organization/>
</author>
<date month="1" year="2008"/>
</front>
<seriesInfo name="IEEE CCNC" value="2008"/>
</reference>
<reference anchor="CARO04" target="">
<front>
<title>End-to-End Failover Thresholds for Transport Layer
Multihoming</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="11" year="2004"/>
</front>
<seriesInfo name="MILCOM 2004" value=""/>
</reference>
<reference anchor="CARO05" target="">
<front>
<title>End-to-End Fault Tolerance using Transport Layer
Multihoming</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<date month="1" year="2005"/>
</front>
<seriesInfo name="Ph.D Thesis, University of Delaware" value=""/>
</reference>
<reference anchor="CARO02" target="">
<front>
<title>A Two-level Threshold Recovery Mechanism for SCTP</title>
<author fullname="" initials="A." surname="Caro Jr.">
<organization/>
</author>
<author fullname="" initials="J." surname="Iyengar">
<organization/>
</author>
<author fullname="" initials="P." surname="Amer">
<organization/>
</author>
<author fullname="" initials="G." surname="Heinz">
<organization/>
</author>
<author fullname="" initials="R." surname="Stewart">
<organization/>
</author>
<date month="7" year="2002"/>
</front>
<seriesInfo name="Tech report, CIS Dept, University of Delaware"
value=""/>
</reference>
</references>
<section anchor="alternative_approach"
title="Discussions of Alternative Approaches">
<t>This section lists alternative approaches for the issues described in
this document. Although these approaches do not require updating
<xref target="RFC4960"/>, we do not recommend them for the reasons
described below.</t>
<section title="Reduce Path.Max.Retrans (PMR)">
<t>Smaller values for Path.Max.Retrans shorten the failover duration,
and in fact this is recommended in some research results <xref
target="JUNGMAIER02"/> <xref target="GRINNEMO04"/> <xref
target="FALLON08"/>. However, to reduce the failover time
significantly, Path.Max.Retrans must be lowered (as with PFMR) to 0.
With this setting SCTP switches to another destination address after
a single timeout, which may result in spurious failover. Spurious
failover is a problem in <xref target="RFC4960"/> SCTP because the
transmission of HEARTBEATs on the primary path that was left, unlike
in SCTP-PF, is governed by 'HB.interval' also during the failover
process. 'HB.interval' is usually set in the order of seconds
(the recommended value is 30 seconds), so when the primary path becomes
inactive, the next HEARTBEAT may be transmitted only many seconds
later: with the recommended value, only 30 seconds later. Meanwhile, the
primary path may long since have recovered, if it needed recovery at
all (indeed, the failover could be truly spurious). In such situations,
post failover, an endpoint is forced to wait in the order of many
seconds before it can resume transmission on the primary
path. Furthermore, once it returns to the primary path, the CWND
needs to be rebuilt anew, a process from which the throughput has
already suffered on the alternate path. Using a smaller value for
'HB.interval' might help in this situation, but it would result in a
general waste of bandwidth, as such more frequent HEARTBEATING would
also take place when there are no observed troubles. The bandwidth
overhead may be diminished by having the ULP use a smaller
'HB.interval' only on the path that at any given time is set to be
the primary path, but this adds complexity to the ULP.</t>
<t>In addition, smaller Path.Max.Retrans values also affect the
'Association.Max.Retrans' value. When the SCTP association's error
count exceeds the Association.Max.Retrans threshold, the SCTP sender
considers the peer endpoint unreachable and terminates the
association. Section 8.2 in <xref target="RFC4960"/> recommends that
the Association.Max.Retrans value should not be larger than the summation
of the Path.Max.Retrans of each of the destination addresses. Otherwise,
the SCTP sender considers its peer reachable even when all destinations
are INACTIVE. To avoid this dormant state operation, an <xref
target="RFC4960"/> SCTP implementation SHOULD reduce
Association.Max.Retrans accordingly whenever it reduces
Path.Max.Retrans. However, a smaller Association.Max.Retrans value
decreases the fault tolerance of SCTP, as it increases the chances of
association termination during minor congestion events.</t>
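<t>As an informal illustration only, the relationship between
Association.Max.Retrans (AMR) and the per-destination Path.Max.Retrans
(PMR) values can be checked with a small C sketch. The function names
pmr_sum and amr_within_recommendation are hypothetical and exist only
for this example.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of Path.Max.Retrans over all destination addresses. */
uint32_t
pmr_sum(const uint16_t *pmr, size_t n_paths)
{
    uint32_t sum = 0;

    assert(pmr != NULL || n_paths == 0);
    for (size_t i = 0; i < n_paths; i++)
        sum += pmr[i];
    return sum;
}

/*
 * Nonzero if AMR is not larger than the summation of the PMR
 * values, i.e. if the recommendation quoted above is met.
 */
int
amr_within_recommendation(uint32_t amr, const uint16_t *pmr,
                          size_t n_paths)
{
    return amr <= pmr_sum(pmr, n_paths);
}
```
]]></artwork>
</figure>
<t>With two destination addresses and the <xref target="RFC4960"/>
recommended values (PMR = 5 per path, AMR = 10), the check passes. If PMR
is reduced to 0 on both paths, the summation is 0, so AMR would also have
to be reduced to 0, and a single timeout would then terminate the
association.</t>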
</section>
<section title="Adjust RTO related parameters">
<t>As several research results indicate, we can also shorten the
duration of the failover process by adjusting RTO-related parameters <xref
target="JUNGMAIER02"/> <xref target="FALLON08"/>. During the failover
process, the RTO keeps being doubled. However, if we choose a smaller
value for RTO.max, we can stop the exponential growth of the RTO at some
point. Also, choosing smaller values for RTO.initial or RTO.min can
help to keep the RTO value small.</t>
<t>Similar to reducing Path.Max.Retrans, the advantage of this
approach is that it requires no modification to the current
specification, although it needs to ignore several recommendations
described in Section 15 of <xref target="RFC4960"/>. However, this
approach requires sufficient knowledge of the network
characteristics between the endpoints. Otherwise, it can introduce
adverse side effects such as spurious timeouts.</t>
<t>The significant issue with this approach, however, is that even if
RTO.max is lowered to an optimal low value, as long as
Path.Max.Retrans is kept at the <xref target="RFC4960"/> recommended
value, the reduction of RTO.max does not reduce the failover time
sufficiently to prevent severe performance degradation during
failover.</t>
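<t>As an informal illustration only, the interplay between
Path.Max.Retrans and RTO.max can be worked through with a small C
sketch. It uses a simplified model that is an assumption of this
example, not a statement of the protocol: the path is declared failed
after Path.Max.Retrans + 1 consecutive timeouts, the RTO in effect at
the first timeout is a given start value, and the RTO doubles after each
timeout up to RTO.max. The function name failover_time_ms is
hypothetical.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in milliseconds until the path error counter exceeds
 * max_retrans, i.e. the sum of the RTOs of (max_retrans + 1)
 * consecutive timeouts, with doubling capped at rto_max_ms.
 */
uint64_t
failover_time_ms(uint32_t rto_start_ms, uint32_t rto_max_ms,
                 uint32_t max_retrans)
{
    uint64_t total = 0;
    uint64_t rto = rto_start_ms;

    assert(rto_start_ms > 0 && rto_start_ms <= rto_max_ms);
    for (uint64_t i = 0; i <= max_retrans; i++) {
        total += rto;          /* this timeout expires */
        rto *= 2;              /* exponential backoff */
        if (rto > rto_max_ms)
            rto = rto_max_ms;  /* cap at RTO.max */
    }
    return total;
}
```
]]></artwork>
</figure>
<t>Under this model, a start RTO of 1 second with RTO.max = 60 seconds
and Path.Max.Retrans = 5 gives 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds.
Lowering RTO.max to 4 seconds while keeping Path.Max.Retrans = 5 still
gives 1 + 2 + 4 + 4 + 4 + 4 = 19 seconds, whereas Path.Max.Retrans = 0
gives 1 second.</t>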
</section>
</section>
<section anchor="path_bouncing"
title="Discussions for Path Bouncing Effect">
<t>The methods described in this document can accelerate the failover
process. Hence, they might introduce the path bouncing effect, where the
sender keeps changing the data transmission path frequently. This sounds
harmful to the data transfer; however, several research results indicate
that there is no serious problem with SCTP in terms of the path bouncing
effect <xref target="CARO04"/> <xref target="CARO05"/>.</t>
<t>There are two main reasons for this. First, SCTP is basically
designed for multipath communication, which means SCTP maintains all
path-related parameters (CWND, ssthresh, RTT, error count, etc.) per
destination address. These parameters cannot be affected by path
bouncing. In addition, when SCTP migrates the data transfer to another
path, it starts with the minimal or the initial CWND. Hence, there is
little chance of packet reordering or duplication.</t>
<t>Second, even if all communication paths between the end-nodes share
the same bottleneck, the SCTP-PF results in a behavior already allowed
by <xref target="RFC4960"/>.</t>
</section>
<section anchor="sh" title="SCTP-PF for SCTP Single-homed Operation ">
<t>For a single-homed SCTP association, the only tangible effect of
activating SCTP-PF operation is enhanced failure detection in terms
of potential notification of the PF state of the sole destination
address as well as, for idle associations, more rapid entry into, and
notification of, the inactive state of the destination address and more
rapid endpoint failure detection. It is believed that neither of these
effects is harmful, provided adequate dormant state operation is
implemented, and furthermore that they may be particularly useful for
applications that deploy multiple SCTP associations for load-balancing
purposes. The early notification of the PF state may be used for
preventive measures, as entering the PF state can be used as a
warning of potential congestion. Depending on the PMR value, the
aggressive HEARTBEAT transmission in PF state may speed up endpoint
failure detection (the sole path error counter exceeding the AMR
threshold) on idle associations in cases where the HB.interval value is
large relative to the RTO (e.g., 30 seconds).</t>
</section>
</back>
</rfc>