<?xml version="1.0" encoding="US-ASCII"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
     by Daniel M Kohn (private) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4690 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4960.xml">
]>
<rfc category="std" docName="draft-ietf-tsvwg-sctp-failover-11.txt"
     ipr="trust200902">
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

  <?rfc toc="yes" ?>

  <?rfc symrefs="yes" ?>

  <?rfc sortrefs="yes"?>

  <?rfc iprnotified="no" ?>

  <?rfc strict="yes" ?>

  <front>
    <title abbrev="SCTP-PF">SCTP-PF: Quick Failover Algorithm in SCTP</title>

    <author fullname="Yoshifumi Nishida" initials="Y.N" surname="Nishida">
      <organization>GE Global Research</organization>

      <address>
        <postal>
          <street>2623 Camino Ramon</street>

          <city>San Ramon</city>

          <region>CA</region>

          <code>94583</code>

          <country>USA</country>
        </postal>

        <email>nishida@wide.ad.jp</email>
      </address>
    </author>

    <author fullname="Preethi Natarajan" initials="P.N" surname="Natarajan">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>510 McCarthy Blvd</street>

          <city>Milpitas</city>

          <region>CA</region>

          <code>95035</code>

          <country>USA</country>
        </postal>

        <email>prenatar@cisco.com</email>
      </address>
    </author>

    <author fullname="Armando Caro" initials="A.C" surname="Caro">
      <organization>BBN Technologies</organization>

      <address>
        <postal>
          <street>10 Moulton St.</street>

          <city>Cambridge</city>

          <region>MA</region>

          <code>02138</code>

          <country>USA</country>
        </postal>

        <email>acaro@bbn.com</email>
      </address>
    </author>

    <author fullname="Paul D. Amer" initials="P.A" surname="Amer">
      <organization>University of Delaware</organization>

      <address>
        <postal>
          <street>Computer Science Department - 434 Smith Hall</street>

          <city>Newark</city>

          <region>DE</region>

          <code>19716-2586</code>

          <country>USA</country>
        </postal>

        <email>amer@udel.edu</email>
      </address>
    </author>

    <author fullname="Karen E. E. Nielsen" initials="K.N" surname="Nielsen">
      <organization>Ericsson</organization>

      <address>
        <postal>
          <street>Kistavägen 25</street>

          <city>Stockholm</city>

          <region/>

          <code>164 80</code>

          <country>Sweden</country>
        </postal>

        <email>karen.nielsen@tieto.com</email>
      </address>
    </author>

    <date/>

    <abstract>
      <t>SCTP supports multi-homing. However, when the failover operation
      specified in RFC4960 is followed, there can be significant delay and
      performance degradation in the failover of the data transfer path. To
      overcome this problem, this document specifies a quick failover
      algorithm (SCTP-PF) based on the introduction of a Potentially Failed
      (PF) state in SCTP path management.</t>

      <t>The document also specifies a dormant
      state operation of SCTP. This dormant state operation is required to be
      followed by an SCTP-PF implementation, but it may equally well be
      applied by a standard RFC4960 SCTP implementation.</t>

      <t>Additionally, the document introduces an alternative switchback mode
      called Permanent Failover that is beneficial in some situations. This
      mode of operation applies both to a standard RFC4960 SCTP
      implementation and to an SCTP-PF implementation.</t>


      <t>The procedures defined in the document require only minimal
      modifications to the RFC4960 specification. The procedures are
      sender-side only and do not impact the SCTP receiver.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>The Stream Control Transmission Protocol (SCTP) specified in <xref
      target="RFC4960"/> supports multi-homing at the transport layer. SCTP's
      multi-homing features include failure detection and failover procedures
      to provide network interface redundancy and improved end-to-end fault
      tolerance. In SCTP's current failure detection procedure, the sender
      must experience Path.Max.Retrans (PMR) number of consecutive failed
      timer-based retransmissions on a destination address before detecting a
      path failure. Until detecting the path failure, the sender continues to
      transmit data on the failed path. The prolonged time in which <xref
      target="RFC4960"/> SCTP continues to use a failed path severely degrades
      the performance of the protocol. To address this problem, this
      document specifies a quick failover algorithm (SCTP-PF) based on the
      introduction of a new Potentially Failed path state in SCTP path
      management. The performance deficiencies of the <xref target="RFC4960"/>
      failover operation, and the improvements obtainable from the
      introduction of a Potentially Failed state in SCTP, were proposed and
      documented in <xref target="NATARAJAN09"/> for Concurrent Multipath Transfer SCTP <xref target="IYENGAR06"/>.</t>

      <t>While SCTP-PF can accelerate the failover process and improve
      performance, it can also increase the risk that an SCTP endpoint enters
      the dormant state, in which all destination addresses are inactive.
      <xref target="RFC4960"/> leaves the protocol operation during dormant
      state to implementations and encourages avoiding this state as far as
      possible by careful tuning of the Path.Max.Retrans (PMR) and
      Association.Max.Retrans (AMR) parameters. We specify a dormant state
      operation for SCTP-PF that lets SCTP-PF provide the same disruption
      tolerance as <xref target="RFC4960"/> even though the dormant state may
      be entered more quickly. The dormant state operation may equally well be
      applied by an <xref target="RFC4960"/> implementation, where it serves
      to provide added fault tolerance in situations where the tuning of the
      Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) parameters
      fails to adequately prevent entering the dormant state.</t>

      <t>The operation after the recovery of a failed path also affects the
      performance of the protocol. With the procedures specified in <xref
      target="RFC4960"/>, SCTP will, after a failover from the primary path,
      switch back to the primary path for data transfer as soon as this path
      becomes available again. From a performance perspective, such a forced
      switchback of the data transmission path can be suboptimal, as the CWND
      towards the original primary destination address has to be rebuilt once
      data transfer resumes <xref target="CARO02"/>. As an optional
      alternative to the switchback operation of <xref target="RFC4960"/>,
      this document specifies a Permanent Failover procedure that avoids such
      forced switchbacks of the data transfer path. The Permanent Failover
      operation was originally proposed in <xref target="CARO02"/>.</t>

      <t>While SCTP-PF is primarily motivated by a desire to improve
      multi-homed operation, the feature also applies to single-homed SCTP
      operation. Here the algorithm serves to provide increased failure
      detection on idle associations, whereas the failover and switchback
      aspects of the algorithm are not activated. This is discussed in more
      detail in Appendix C.</t>

      <t>A brief description of the motivation for the introduction of the
      Potentially Failed state, including a discussion of alternative
      approaches to mitigate the deficiencies of the <xref target="RFC4960"/>
      failover operation, is given in the Appendices. A discussion of path
      bouncing effects that might be caused by frequent switchover is also
      provided there.</t>
    </section>

    <section title="Conventions and Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>
    </section>

    <section anchor="SCTP-PF"
             title="SCTP with Potentially-Failed Destination State (SCTP-PF)">
      <section title="Overview">
        <t>To minimize the performance impact during failover, the sender
        should avoid transmitting data to a failed destination address as
        early as possible. In the <xref target="RFC4960"/> SCTP path
        management scheme, the sender stops transmitting data to a destination
        address only after the destination address is marked inactive. This
        process takes a significant amount of time as it requires the error
        counter of the destination address to exceed the Path.Max.Retrans
        (PMR) threshold. The issue cannot simply be mitigated by lowering the
        PMR threshold: doing so may result in spurious failure detection and
        unnecessarily prevent the use of a preferred primary path, and,
        because the Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR)
        parameter values are tuned together in <xref target="RFC4960"/>, it
        may also compromise the fault tolerance of SCTP.</t>

        <t>The solution provided in this document is to extend the SCTP path
        management scheme of <xref target="RFC4960"/> by adding the
        Potentially Failed (PF) state as an intermediate state between the
        active and inactive states of a destination address, and to let entry
        into the PF state, rather than entry into the inactive state, drive
        the failover of data transfer away from a destination address.
        Thereby SCTP may perform quick failover without compromising the
        overall fault tolerance of <xref target="RFC4960"/> SCTP. At the same
        time, RTO-based HEARTBEAT probing is initiated towards a destination
        address once it enters the PF state. Thereby SCTP may quickly
        ascertain whether network connectivity towards the destination
        address is broken or whether the failover was spurious. If the
        failover was spurious, data transfer may quickly resume towards the
        original destination address.</t>

        <t>The new failure detection algorithm assumes that loss detected by a
        timeout implies either severe congestion or network connectivity
        failure, and by default it classifies a destination address as PF
        upon the first timeout.</t>
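        <t>As a rough, non-normative illustration of the difference, the
        following C sketch computes how long a sender needs before an error
        counter starting at zero first exceeds a threshold. It assumes that no
        acknowledgment arrives in between, that the RTO starts at a given
        value and is doubled after every timeout up to the RFC4960 default
        RTO.Max of 60 seconds, and that RTT measurements do not change the RTO
        meanwhile. The function name is hypothetical. With the RFC4960
        default Path.Max.Retrans of 5 and an RTO of 1 second (RTO.Min), the
        path is only marked inactive after 6 consecutive timeouts, roughly
        1+2+4+8+16+32 = 63 seconds, whereas the recommended PFMR of 0 triggers
        failover after the first timeout, roughly 1 second.</t>

        <figure>
          <artwork><![CDATA[
```c
#include <assert.h>

/* Assumed RFC 4960 default upper bound on the RTO, in milliseconds. */
#define RTO_MAX_MS 60000u

/* Hypothetical helper: total time (ms) of threshold + 1 consecutive
 * timeouts, i.e. until an error counter that starts at 0 first exceeds
 * `threshold`. The RTO starts at rto_ms and is doubled after every
 * timeout, capped at RTO.Max (RFC 4960, Section 6.3.3, rule E2). */
unsigned long time_to_exceed(unsigned threshold, unsigned rto_ms)
{
    unsigned long total = 0;

    assert(rto_ms > 0);
    for (unsigned i = 0; i <= threshold; i++) {
        total += rto_ms;                      /* wait one full RTO */
        rto_ms = (rto_ms >= RTO_MAX_MS / 2u)  /* exponential backoff */
                     ? RTO_MAX_MS
                     : rto_ms * 2u;
    }
    return total;
}
```
]]></artwork>
        </figure>

        <t>For example, time_to_exceed(5, 1000) evaluates to 63000 and
        time_to_exceed(0, 1000) to 1000 (milliseconds).</t>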
      </section>

      <section title="Specification of the SCTP-PF Procedures">
        <t>The SCTP-PF operation is specified as follows: <list style="numbers">
            <t>The sender maintains a new tunable SCTP Protocol Parameter
            called PotentiallyFailed.Max.Retrans (PFMR). The PFMR defines a
            new intermediate PF threshold for the destination address error
            counter: when the error counter exceeds this threshold, the
            destination address is classified as PF. The RECOMMENDED value of
            PFMR is 0, but other values MAY be used. Setting PFMR greater than
            or equal to Path.Max.Retrans (PMR) does not define a PF threshold
            for the destination address, i.e., the destination address will
            not be classified as PF before it reaches the inactive state.</t>

            <t>The error counter of an active destination address is
            incremented as specified in <xref target="RFC4960"/>. This means
            that the error counter of the destination address will be
            incremented each time the T3-rtx timer expires, or each time a
            HEARTBEAT chunk is sent when idle and not acknowledged within an
            RTO. When the value in the destination address error counter
            exceeds PFMR, the endpoint MUST mark the destination address as in
            the PF state.</t>

            <t>The PFMR threshold defines the point at which the destination
            address is no longer considered a good candidate for data
            transmission, and an SCTP-PF sender SHOULD NOT send data to
            destination addresses in PF state when alternative destination
            addresses in active state are available. Specifically, this means
            that: <list style="hanging">
                <t hangText="i">When there is outbound data to send and the
                destination address presently used for data transmission is in
                PF state, the sender SHOULD choose a destination address in
                active state, if one exists, and failover to deploy this
                destination address for data transmission.</t>

                <t hangText="ii">When retransmitting data that has timed out
                and the sender thus by <xref target="RFC4960"/>, section
                6.4.1, should attempt to pick a new destination address for
                data retransmission, the sender SHOULD choose an alternate
                destination transport address in active state if one
                exists.</t>

                <t hangText="iii">When there is outbound data to send and the
                SCTP user explicitly requests to send data to a destination
                address in PF state, the sender SHOULD send the data to an
                alternate destination address in active state if one
                exists.</t>
              </list>When choosing among multiple destination addresses in
            active state, the following considerations apply: <list
                style="letters">
                <t>An SCTP sender should follow the principles of Section
                6.4.1 of <xref target="RFC4960"/> for choosing the most
                divergent source-destination pair, compared with, for case i,
                the destination address in PF state from which it fails over
                and, for case ii, the destination address towards which the
                data timed out. Rules for picking the most divergent
                source-destination pair are an implementation decision and are
                not specified within this document.</t>

                <t>An SCTP-PF sender MAY choose to send data to a destination
                address in PF state, even if destination addresses in active
                state exist, if the SCTP-PF sender has other information
                available that disqualifies the destination addresses in
                active state from being preferred. However, the discussion of
                such mechanisms is outside the scope of the SCTP-PF operation
                specified in this document.</t>
              </list> In all cases, the sender MUST NOT change the state of
            the chosen destination address, whether this state is active or
            PF, and it MUST NOT clear the error counter of the destination
            address as a result of choosing the destination address for data
            transmission.</t>

            <t>When the destination addresses are all in PF state or some in
            PF state and some in inactive state, the sender MUST choose one
            destination address in PF state and transmit or retransmit data to
            this destination address using the following rules: <list
                style="letters">
                <t>The sender SHOULD choose the destination in PF state with
                the lowest error count (fewest consecutive timeouts) for data
                transmission and transmit or retransmit data to this
                destination.</t>

                <t>When there are multiple PF destinations with the same error
                count, the sender should base the choice among them on the
                principles of Section 6.4.1 of <xref target="RFC4960"/> for
                choosing the most divergent source-destination pairs when
                executing (potentially consecutive) retransmissions. Rules for
                picking the most divergent source-destination pair are an
                implementation decision and are not specified within this
                document.</t>

                <t>A sender MAY choose to deploy strategies other than the
                above when choosing among multiple PF destinations if the
                SCTP-PF sender has other information available that qualifies
                a particular destination address for use. The SCTP-PF protocol
                operation specified in this document makes no assumption that
                such information exists and specifies the above as the default
                operation of an SCTP-PF sender.</t>
              </list> The sender MUST NOT change the state or the error
            counter of any destination address as a result of this choice,
            whether or not the destination address has been chosen for
            transmission.</t>

            <t>The HB.interval of the Path Heartbeat function of <xref
            target="RFC4960"/> MUST be ignored for destination addresses in PF
            state. Instead, HEARTBEAT chunks are sent to destination addresses
            in PF state once per RTO. HEARTBEAT chunks SHOULD be sent to
            destination addresses in PF state, but the sending of HEARTBEATs
            MUST honor whether the Path Heartbeat function (Section 8.3 of
            <xref target="RFC4960"/>) is enabled for the destination address
            or not. I.e., if the Path Heartbeat function is disabled for the
            destination address in question, HEARTBEATs MUST NOT be sent.
            Note that when the Path Heartbeat function is disabled, it may take
            longer for a destination address in PF state to transition back to
            active state.</t>
        
            <t>HEARTBEATs are sent when a destination address reaches the PF
            state. When a HEARTBEAT chunk is not acknowledged within the RTO,
            the sender increments the error counter and exponentially backs
            off the RTO value. If the error counter is less than PMR, the
            sender transmits another packet containing the HEARTBEAT chunk
            immediately after timeout expiration on the previous HEARTBEAT.
            When data is being transmitted to a destination address in the PF
            state, the transmission of a HEARTBEAT chunk MAY be omitted if the
            receipt of a SACK for, or a T3-rtx timer expiration on, the
            outstanding data can provide equivalent information, for example
            when the outstanding data has been transmitted to a single
            destination address only. Likewise, the timeout of a HEARTBEAT
            chunk MAY be ignored if data is outstanding towards the
            destination address.</t>

            <t>When the sender receives a HEARTBEAT ACK from a HEARTBEAT sent
            to a destination address in PF state, the sender MUST clear the
            error counter of the destination address and transition the
            destination address back to active state. When the sender resumes
            data transmission on the destination address, it MUST do this
            following the prescriptions of Section 7.2 of <xref
            target="RFC4960"/>.</t>

            <t>Additional (PMR - PFMR) consecutive timeouts on a destination
            address in PF state confirm the path failure, upon which the
            destination address transitions to the inactive state. As
            described in <xref target="RFC4960"/>, the sender (i) SHOULD
            notify the ULP about this state transition, and (ii) transmit
            HEARTBEAT chunks to the inactive destination address at a lower
            HB.interval frequency as described in Section 8.3 of <xref
            target="RFC4960"/> (when the Path Heartbeat function is enabled
            for the destination address).</t>

            <t>Acknowledgments for chunks that have been transmitted to
            multiple destinations (i.e., a chunk which has been retransmitted
            to a different destination address than the destination address to
            which the chunk was first transmitted) MUST NOT clear the error
            count for an inactive destination address and MUST NOT transition
            a destination address in PF state back to active state, since a
            sender cannot disambiguate whether the ACK was for the original
            transmission or the retransmission(s). An SCTP sender MAY apply a
            different approach to error count handling based on unequivocal
            information about which destination (including multiple
            destination addresses) the chunk reached. This document does not
            specify what such unequivocal information could consist of, nor
            how it could be obtained. The design of such an alternative
            approach is left to implementations.</t>

            <t>Acknowledgments for chunks that have been transmitted to one
            destination address only MUST clear the error counter for the
            destination address and MUST transition a destination address in
            PF state back to active state. This situation can happen when new
            data is sent to a destination address in the PF state. It can also
            happen when the destination address is in the PF state due to a
            spurious T3-rtx timeout, and acknowledgments start to arrive for
            data that was sent before the spurious T3-rtx timeout and has not
            yet been retransmitted towards other destinations. This document
            does not specify special handling for detection of or reaction to
            spurious T3-rtx timeouts, e.g., special congestion control or data
            retransmission handling towards a destination address that
            undergoes a transition from active to PF to active state due to a
            spurious T3-rtx timeout. It is noted, however, that this area
            would benefit from additional attention, experimentation and
            specification for single-homed as well as multi-homed SCTP
            protocol operation.</t>

            <t>When all destination addresses are in inactive state, and the
            SCTP protocol operation is thus said to be in dormant state, the
            prescriptions given in <xref target="dormant"/> shall be
            followed.</t>

            <t>The SCTP stack should provide the ULP with the means to expose
            the PF state of its destinations as well as the means to notify it
            of state transitions from active to PF, and vice versa. However,
            it is recommended that an SCTP stack implementing SCTP-PF also
            allows the ULP to be kept unaware of the PF state of its
            destinations and the associated state transitions. For this
            reason, it is recommended that an SCTP stack implementing SCTP-PF
            also provides the ULP with the means to suppress exposure of the
            PF state and the associated state transitions.</t>
          </list></t>
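        <t>The per-destination bookkeeping of the steps above can be
        summarized in a minimal C sketch. It is an illustration under stated
        assumptions, not a reference implementation: the type and function
        names are hypothetical, HEARTBEAT scheduling, congestion control and
        user-requested destinations are omitted, acknowledgments for chunks
        sent to multiple destinations are assumed to leave the state
        untouched, and the "most divergent source-destination pair"
        tie-breaking is simplified to taking the first candidate in the
        list.</t>

        <figure>
          <artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-destination state; not an API defined by RFC 4960. */
enum dest_state { DEST_ACTIVE, DEST_PF, DEST_INACTIVE };

struct dest {
    enum dest_state state;
    unsigned error_count;
};

/* T3-rtx expiry, or a HEARTBEAT not acknowledged within the RTO.
 * Exceeding PFMR moves an active destination to PF; exceeding PMR moves
 * it to inactive. With PFMR >= PMR the PF state is never entered. */
void on_timeout(struct dest *d, unsigned pfmr, unsigned pmr)
{
    d->error_count++;
    if (d->error_count > pmr)
        d->state = DEST_INACTIVE;
    else if (d->error_count > pfmr && d->state == DEST_ACTIVE)
        d->state = DEST_PF;
}

/* HEARTBEAT ACK, or acknowledgment of chunks sent to this destination
 * only: clear the counter and mark the destination active. Reactivating
 * an inactive destination on a data acknowledgment is an assumption of
 * this sketch. Ambiguous acks (chunks retransmitted to other
 * destinations) must not call this function. */
void on_unambiguous_ack(struct dest *d)
{
    d->error_count = 0;
    d->state = DEST_ACTIVE;
}

/* Choose a destination for new data (case i) without changing any
 * state: keep `current` if active, else the first active destination,
 * else the PF destination with the lowest error count. Returns -1 when
 * all destinations are inactive (dormant state, handled separately). */
int choose_destination(const struct dest *d, size_t n, size_t current)
{
    int best = -1;

    assert(n > 0 && current < n);
    if (d[current].state == DEST_ACTIVE)
        return (int)current;
    for (size_t i = 0; i < n; i++)
        if (d[i].state == DEST_ACTIVE)
            return (int)i;
    for (size_t i = 0; i < n; i++)
        if (d[i].state == DEST_PF &&
            (best < 0 || d[i].error_count < d[best].error_count))
            best = (int)i;
    return best;
}
```
]]></artwork>
        </figure>

        <t>For instance, with PFMR = 0 and PMR = 5, a single timeout on
        destination 0 of a two-address association moves it to PF state, and
        choose_destination() then returns 1 while leaving destination 0 in PF
        state with its error counter unchanged. A timed-out retransmission
        (case ii) would additionally prefer an address other than the one
        towards which the data timed out.</t>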
      </section>

     
    </section>

    <section anchor="dormant" title="Dormant State Operation">
      <t>In a situation with complete disruption of the communication between
      the SCTP endpoints, the aggressive HEARTBEAT transmissions of SCTP-PF on
      destination addresses in PF state may make the association enter
      dormant state faster than a standard <xref target="RFC4960"/> SCTP
      implementation given the same setting of Path.Max.Retrans (PMR) and
      Association.Max.Retrans (AMR). For example, an SCTP association with two
      destination addresses typically would reach dormant state in half the
      time of an <xref target="RFC4960"/> SCTP implementation in such
      situations. This is because an SCTP-PF sender will send HEARTBEATs and
      data retransmissions in parallel at RTO intervals when there are
      multiple destination addresses in PF state. This argument presumes that
      the RTO is much smaller than the HB.interval of <xref
      target="RFC4960"/> (RTO &lt;&lt; HB.interval). With the design goal
      that SCTP-PF shall provide the same level of disruption tolerance as an
      <xref target="RFC4960"/> SCTP implementation with the same
      Path.Max.Retrans (PMR) and Association.Max.Retrans (AMR) setting, we
      specify that an SCTP-PF implementation SHOULD operate as described
      below in <xref target="dormant_details"/> during dormant state.</t>

      <t>An SCTP-PF implementation MAY choose a different dormant state
      operation than the one described below in <xref
      target="dormant_details"/> provided that the solution chosen does not
      compromise the fault tolerance of the SCTP-PF operation.</t>

      <t>The below prescription for SCTP-PF dormant state handling SHOULD NOT
      be coupled to the value of the PFMR, but solely to the activation of
      SCTP-PF logic in an SCTP implementation.</t>

      <t>It is noted that the below dormant state operation is considered to
      provide added disruption tolerance also for an <xref target="RFC4960"/>
      SCTP implementation, and that it can be sensible for an <xref
      target="RFC4960"/> SCTP implementation to follow this mode of
      operation. For an <xref target="RFC4960"/> SCTP implementation, the
      continuation of data transmission during dormant state makes the fault
      tolerance of SCTP more robust in situations where some, or all,
      alternative paths of an SCTP association approach, or reach, inactive
      state before the primary path used for data transmission observes
      trouble.</t>

      <section anchor="dormant_details" title="SCTP Dormant State Procedure">
        <t><list style="letters">
            <t>When the destination addresses are all in inactive state and
            data is available for transfer, the sender MUST choose one
            destination and transmit data to this destination address.</t>

            <t>The sender MUST NOT change the state of the chosen destination
            address (it remains in inactive state) and it MUST NOT clear the
            error counter of the destination address as a result of choosing
            the destination address for data transmission.</t>

            <t>The sender SHOULD choose the destination in inactive state with
            the lowest error count (fewest consecutive timeouts) for data
            transmission. When there are multiple destinations with the same
            error count in inactive state, the sender SHOULD attempt to pick
            the most divergent source-destination pair from the last
            source-destination pair where failure was observed. Rules for
            picking the most divergent source-destination pair are an
            implementation decision and are not specified within this
            document. To support differentiation of inactive destination
            addresses based on their error count, SCTP will need to allow the
            destination address error counters to be incremented up to some
            reasonable limit above PMR+1, thus changing the prescriptions of
            <xref target="RFC4960"/>, section 8.3, in this respect. The exact
            limit to apply is not specified in this document, but it is
            considered reasonable for it to be an order of magnitude higher
            than the PMR value. A sender MAY choose to deploy strategies other
            than the one defined here. The strategy of prioritizing the last
            active destination address, i.e., the destination address with
            the fewest error counts, is optimal when some paths are
            permanently inactive, but suboptimal when path instability is
            transient.</t>
          </list></t>
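        <t>Rules b and c above can be sketched in C as follows, assuming that
        the error counters are kept in an array indexed by destination. This
        is a hypothetical illustration: the function names are assumptions,
        the cap of ten times (PMR + 1) is only one possible reading of "an
        order of magnitude higher than the PMR value", and the tie-break,
        which simply moves away from the destination where failure was last
        observed, is a crude stand-in for the most divergent
        source-destination pair.</t>

        <figure>
          <artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on error counters in dormant state; the document only
 * suggests a limit an order of magnitude above PMR. */
unsigned dormant_error_cap(unsigned pmr)
{
    return 10u * (pmr + 1u);
}

/* Count a timeout on an inactive destination, letting the counter grow
 * beyond PMR+1 (unlike RFC 4960) but not beyond the cap. */
void dormant_count_timeout(unsigned *error_count, unsigned pmr)
{
    if (*error_count < dormant_error_cap(pmr))
        (*error_count)++;
}

/* Rule c: pick the inactive destination with the lowest error count.
 * On ties, move away from `last_failed`, the destination where failure
 * was last observed. Only reads the counters, as rule b requires. */
size_t dormant_choose(const unsigned *error_count, size_t n,
                      size_t last_failed)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (error_count[i] < error_count[best] ||
            (error_count[i] == error_count[best] && best == last_failed))
            best = i;
    }
    return best;
}
```
]]></artwork>
        </figure>

        <t>Letting the counters keep growing past PMR+1 is what allows the
        lowest-error-count rule to distinguish between destinations that all
        sit in inactive state.</t>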
      </section>
    </section>

 <section anchor="permanent_failover" title="Permanent Failover">
        <t>The objective of the Permanent Failover operation is to allow the
        SCTP sender to continue data transmission on a new working path even
        when the old primary destination address becomes active again. This is
        achieved by having SCTP perform a switchover of the primary path to
        the new working path if the error counter of the primary path exceeds
        a certain threshold. This mode of operation can be applied not only to
        SCTP-PF implementations but also to <xref target="RFC4960"/>
        implementations.</t>

        <t>The Permanent Failover operation requires only sender side changes.
        The details are:</t>

        <t><list style="numbers">
            <t>The sender maintains a new tunable parameter, called
            Primary.Switchover.Max.Retrans (PSMR). For SCTP-PF
            implementations, the PSMR MUST be set greater than or equal to the
            PFMR value. For <xref target="RFC4960"/> implementations, the PSMR
            MUST be set greater than or equal to the PMR value.
            Implementations MUST reject any other values of PSMR.</t>

            <t>When the path error counter on a set primary path exceeds PSMR,
            the SCTP implementation MUST autonomously select and set a new
            primary path.</t>

            <t>The primary path selected by the SCTP implementation MUST be
            the path which at the given time would be chosen for data
            transfer. A previously failed primary path can be used as data
            transfer path as per normal path selection when the present data
            transfer path fails.</t>

            <t>For SCTP-PF, the recommended value of PSMR is PFMR when
            Permanent Failover is used. This means that no forced switchback
            to a previously failed primary path is performed. An SCTP-PF
            implementation of Permanent Failover MUST support the setting of
            PSMR = PFMR. An SCTP-PF implementation of Permanent Failover MAY
            support the setting of PSMR > PFMR.</t>

            <t>For <xref target="RFC4960"/> SCTP, the recommended value of
            PSMR is PMR when Permanent Failover is used. This means that no
            forced switchback to a previously failed primary path is
            performed. An <xref target="RFC4960"/> SCTP implementation of
            Permanent Failover MUST support the setting of PSMR = PMR. An
            <xref target="RFC4960"/> SCTP implementation of Permanent Failover
            MAY support the setting of PSMR > PMR.</t>


            <t>It MUST be possible to disable the Permanent Failover and
            obtain the standard switchback operation of <xref
            target="RFC4960"/>.</t>
          </list></t>
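        <t>The numbered rules above can be sketched in C as follows. This is a
        hypothetical illustration: the configuration structure and the
        function names are assumptions, and data_path stands for whatever
        destination the sender's normal path selection would currently use
        for data transfer.</t>

        <figure>
          <artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration; field names are illustrative only. */
struct pf_config {
    bool sctp_pf;            /* SCTP-PF enabled */
    unsigned pfmr;           /* PotentiallyFailed.Max.Retrans */
    unsigned pmr;            /* Path.Max.Retrans */
    bool permanent_failover; /* false: RFC 4960 switchback */
    unsigned psmr;           /* Primary.Switchover.Max.Retrans */
};

/* Rule 1: PSMR >= PFMR with SCTP-PF, PSMR >= PMR otherwise. */
bool psmr_valid(const struct pf_config *c)
{
    return c->psmr >= (c->sctp_pf ? c->pfmr : c->pmr);
}

/* Rules 2 and 3: once the primary path's error counter exceeds PSMR,
 * the path currently chosen for data becomes the new primary.
 * Returns the index of the (possibly unchanged) primary path. */
size_t maybe_switch_primary(const struct pf_config *c, size_t primary,
                            unsigned primary_errors, size_t data_path)
{
    assert(psmr_valid(c));
    if (c->permanent_failover && primary_errors > c->psmr)
        return data_path;
    return primary;
}
```
]]></artwork>
        </figure>

        <t>With SCTP-PF and PSMR = PFMR = 0, a single timeout on the primary
        path makes the current data path the new primary, so no switchback
        occurs when the old primary recovers; with permanent_failover set to
        false, the primary is never changed and the standard switchback
        applies.</t>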

        <t>The switchover behavior that is optimal in a given scenario depends
        on the relative quality of the set primary path versus the quality of
        the available alternative paths, as well as on the extent to which the
        mode of operation is meant to enforce traffic distribution over a
        number of network paths. For example, load distribution of traffic
        from multiple SCTP associations may be enforced by distributing the
        set primary paths and relying on the <xref target="RFC4960"/>
        switchback operation. However, as the <xref target="RFC4960"/>
        switchback behavior is suboptimal in certain situations, especially
        in scenarios where a number of equally good paths are available, an
        SCTP implementation MAY also support the Permanent Failover mode of
        operation as an alternative behavior and MAY enable it based on
        users' requests.</t>

        <t>For an SCTP implementation that implements Permanent Failover, it
        is RECOMMENDED that the standard switchback operation of <xref
        target="RFC4960"/> be retained as the default operation.</t>
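        <t>The interaction between PSMR and path selection can be
        illustrated with a minimal C sketch for an association with two
        paths. This is a simplified sketch under stated assumptions, not a
        normative algorithm: all type, field, and function names are
        hypothetical, a single threshold field (thld) stands for PFMR in
        SCTP-PF or PMR in RFC 4960 SCTP, and HEARTBEAT scheduling is not
        modeled.</t>

        <figure>
          <artwork><![CDATA[
#include <stdbool.h>

/* Hypothetical two-path association state, for illustration only. */
struct pf_assoc {
    unsigned err[2];         /* path error counters */
    int primary;             /* index of the current primary path */
    bool permanent_failover; /* Permanent Failover enabled */
    unsigned psmr;           /* primary switchover threshold (PSMR) */
    unsigned thld;           /* assumed: PFMR (SCTP-PF) or PMR */
};

/* New data may be sent on a path while its counter is <= thld. */
static bool path_usable(const struct pf_assoc *a, int p)
{
    return a->err[p] <= a->thld;
}

/* A retransmission timeout occurred on path p. */
static void on_timeout(struct pf_assoc *a, int p)
{
    a->err[p]++;
    /* Permanent Failover: the alternate path becomes the primary,
     * so no forced switchback happens when the old one recovers. */
    if (a->permanent_failover && p == a->primary &&
        a->err[p] > a->psmr)
        a->primary = 1 - p;
}

/* An acknowledgement (e.g., HEARTBEAT-ACK) was received on path p. */
static void on_ack(struct pf_assoc *a, int p)
{
    a->err[p] = 0;
}

/* Path for new data: the primary if usable, else the other one.
 * Without Permanent Failover this returns to the configured primary
 * as soon as its error counter is reset (RFC 4960 switchback). */
static int transfer_path(const struct pf_assoc *a)
{
    int p = a->primary;
    return path_usable(a, p) ? p : 1 - p;
}
]]></artwork>
        </figure>

        <t>With PSMR = PFMR = 0 and Permanent Failover enabled, a single
        timeout on the primary path makes the other path the new primary,
        and a later acknowledgement on the old primary does not move data
        back. With Permanent Failover disabled, the same acknowledgement
        restores data transfer on the configured primary path.</t>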
      </section>

    <section title="Suggested SCTP Protocol Parameter Values">
      <t>This document does not alter the values RECOMMENDED in <xref
      target="RFC4960"/> for the SCTP protocol parameters defined in <xref
      target="RFC4960"/>.</t>

      <t>The following value is RECOMMENDED for the protocol parameter
      introduced by this document:<list style="empty">
          <t>PotentiallyFailed.Max.Retrans (PFMR) - 0</t>
        </list></t>
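      <t>To illustrate what the RECOMMENDED PFMR value means, the following
      C sketch classifies a destination address by its path error counter:
      the path is Potentially Failed once the counter exceeds PFMR and
      inactive once it exceeds Path.Max.Retrans (PMR). The names are
      hypothetical, and the PMR value of 5 is the default from Section 15 of
      RFC 4960; the sketch is not taken from any implementation.</t>

      <figure>
        <artwork><![CDATA[
/* Values from RFC 4960, Section 15, and the PFMR value RECOMMENDED
 * by this document. */
enum { PATH_MAX_RETRANS = 5, PF_MAX_RETRANS = 0 };

/* Hypothetical path states. */
enum path_state {
    PATH_ACTIVE,
    PATH_POTENTIALLY_FAILED,
    PATH_INACTIVE
};

/* Classify a destination by its path error counter. */
static enum path_state classify(unsigned err, unsigned pfmr,
                                unsigned pmr)
{
    if (err > pmr)
        return PATH_INACTIVE;           /* counter exceeds PMR */
    if (err > pfmr)
        return PATH_POTENTIALLY_FAILED; /* counter exceeds PFMR */
    return PATH_ACTIVE;
}
]]></artwork>
      </figure>

      <t>With PFMR = 0, the first timeout on a path (error counter 1) moves
      it to the Potentially Failed state, and, absent any acknowledgement,
      it becomes inactive only when the counter exceeds PMR, that is, at the
      sixth consecutive timeout with the RFC 4960 default.</t>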
    </section>

    <section title="Socket API Considerations">
      <t>This section describes how the socket API defined in <xref
      target="RFC6458"/> is extended to provide a way for the application to
      control and observe the SCTP-PF behavior as well as the Permanent Failover function.</t>

      <t>Please note that this section is informational only.</t>

      <t>A socket API implementation based on <xref target="RFC6458"/> is
      extended in two ways: the existing SCTP_PEER_ADDR_CHANGE event is used
      to notify the application when a peer address enters or leaves the
      potentially failed state, and the existing SCTP_GET_PEER_ADDR_INFO
      socket option is extended to expose the potentially failed state of a
      peer address.</t>

      <t>Furthermore, two new read/write socket options are defined for the
      level IPPROTO_SCTP, with the names SCTP_PEER_ADDR_THLDS and
      SCTP_EXPOSE_POTENTIALLY_FAILED_STATE, as described below. The first
      socket option is used to control the values of the PFMR and PSMR
      parameters described in <xref target="SCTP-PF"/> and in <xref
      target="permanent_failover"/>. The second one controls the exposure of
      the potentially failed path state.</t>

      <t>Support for the SCTP_PEER_ADDR_THLDS and
      SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
      added to the function sctp_opt_info().</t>

      <section anchor="pf_support_api"
               title="Support for the Potentially Failed Path State">
        <t>As defined in <xref target="RFC6458"/>, the SCTP_PEER_ADDR_CHANGE
        event is provided if the status of a peer address changes. In addition
        to the state changes described in <xref target="RFC6458"/>, this event
        is also provided if a peer address enters or leaves the potentially
        failed state. The notification as defined in <xref target="RFC6458"/>
        uses the following structure:</t>

        <figure>
          <artwork><![CDATA[
struct sctp_paddr_change {
  uint16_t spc_type;
  uint16_t spc_flags;
  uint32_t spc_length;
  struct sockaddr_storage spc_aaddr;
  uint32_t spc_state;
  uint32_t spc_error;
  sctp_assoc_t spc_assoc_id;
};
]]></artwork>
        </figure>

        <t><xref target="RFC6458"/> defines the constants SCTP_ADDR_AVAILABLE,
        SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
        SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. In
        addition, this document defines the new constant
        SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
        address enters the potentially failed state.</t>
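        <t>A minimal C sketch of the notification logic follows. It uses
        hypothetical internal path states and hypothetical stand-in
        enumerators (EV_*) for the spc_state values so that it compiles
        without an SCTP header; a real implementation would use the constants
        from its own SCTP header. As an assumption of this sketch, when the
        potentially failed state is not exposed (see
        SCTP_EXPOSE_POTENTIALLY_FAILED_STATE), transitions between the active
        and potentially failed states are not reported at all.</t>

        <figure>
          <artwork><![CDATA[
#include <stdbool.h>

/* Hypothetical internal path states. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

/* Hypothetical stand-ins for the spc_state values of interest. */
enum paddr_event {
    EV_NONE,                    /* no notification */
    EV_ADDR_AVAILABLE,          /* SCTP_ADDR_AVAILABLE */
    EV_ADDR_UNREACHABLE,        /* SCTP_ADDR_UNREACHABLE */
    EV_ADDR_POTENTIALLY_FAILED  /* SCTP_ADDR_POTENTIALLY_FAILED */
};

/* Which SCTP_PEER_ADDR_CHANGE notification, if any, to raise when a
 * peer address moves from state 'from' to state 'to'. */
static enum paddr_event paddr_change(enum path_state from,
                                     enum path_state to,
                                     bool expose_pf)
{
    if (from == to)
        return EV_NONE;
    switch (to) {
    case PATH_INACTIVE:
        return EV_ADDR_UNREACHABLE;
    case PATH_PF:
        /* Entering PF is reported only if PF is exposed. */
        return expose_pf ? EV_ADDR_POTENTIALLY_FAILED : EV_NONE;
    case PATH_ACTIVE:
        /* Leaving PF is reported only if PF is exposed. */
        if (from == PATH_PF && !expose_pf)
            return EV_NONE;
        return EV_ADDR_AVAILABLE;
    }
    return EV_NONE;
}
]]></artwork>
        </figure>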

        <t>The SCTP_GET_PEER_ADDR_INFO socket option defined in <xref
        target="RFC6458"/> can be used to query the state of a peer address.
        It uses the following structure:</t>

        <figure>
          <artwork><![CDATA[
struct sctp_paddrinfo {
  sctp_assoc_t spinfo_assoc_id;
  struct sockaddr_storage spinfo_address;
  int32_t spinfo_state;
  uint32_t spinfo_cwnd;
  uint32_t spinfo_srtt;
  uint32_t spinfo_rto;
  uint32_t spinfo_mtu;
};
]]></artwork>
        </figure>

        <t><xref target="RFC6458"/> defines the constants SCTP_UNCONFIRMED,
        SCTP_ACTIVE, and SCTP_INACTIVE to be provided in the spinfo_state
        field. In addition, this document defines the new constant
        SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
        potentially failed.</t>
      </section>

      <section title="Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option">
        <t>Applications can control the SCTP-PF behavior by getting or setting
        the number of consecutive timeouts before a peer address is considered
        potentially failed or unreachable. Applications also use the same
        socket option to get and set the number of timeouts before the
        primary path is changed automatically by the Permanent Failover
        function. This socket option uses the level IPPROTO_SCTP and the name
        SCTP_PEER_ADDR_THLDS.</t>

        <t>The following structure is used to access and modify the
        thresholds:</t>

        <figure>
          <artwork><![CDATA[
struct sctp_paddrthlds {
  sctp_assoc_t spt_assoc_id;
  struct sockaddr_storage spt_address;
  uint16_t spt_pathmaxrxt;
  uint16_t spt_pathpfthld;
  uint16_t spt_pathcpthld;
};
]]></artwork>
        </figure>

        <t><list style="hanging">
            <t hangText="spt_assoc_id:">This parameter is ignored for
            one-to-one style sockets. For one-to-many style sockets the
            application may fill in an association identifier or
            SCTP_FUTURE_ASSOC. It is an error to use SCTP_{CURRENT|ALL}_ASSOC
            in spt_assoc_id.</t>

            <t hangText="spt_address:">This specifies which peer address is of
            interest. If a wildcard address is provided, this socket option
            applies to all current and future peer addresses.</t>

            <t hangText="spt_pathmaxrxt:">Each peer address of interest is
            considered unreachable if its path error counter exceeds
            spt_pathmaxrxt.</t>

            <t hangText="spt_pathpfthld:">Each peer address of interest is
            considered Potentially Failed if its path error counter exceeds
            spt_pathpfthld.</t>

            <t hangText="spt_pathcpthld:">Each peer address of interest is no
            longer considered the primary remote address if its path error
            counter exceeds spt_pathcpthld. Using a value of 0xffff disables
            the selection of a new primary peer address. If an implementation
            does not support the automatic selection of a new primary
            address, it should indicate an error with errno set to EINVAL if a
            value different from 0xffff is used in spt_pathcpthld. For
            SCTP-PF, a setting of spt_pathcpthld &lt; spt_pathpfthld should be
            rejected with errno set to EINVAL. For <xref target="RFC4960"/>
            SCTP, a setting of spt_pathcpthld &lt; spt_pathmaxrxt should be
            rejected with errno set to EINVAL. An SCTP-PF implementation MAY
            support only the settings spt_pathcpthld = spt_pathpfthld and
            spt_pathcpthld = 0xffff, and an <xref target="RFC4960"/> SCTP
            implementation MAY support only the settings spt_pathcpthld =
            spt_pathmaxrxt and spt_pathcpthld = 0xffff. In these cases, the
            implementation shall reject other values with errno set to
            EINVAL.</t>
          </list></t>
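        <t>The rules for spt_pathcpthld can be collected into a single
        validation step. The following C sketch is a hypothetical helper,
        not part of any socket API; the capability flags describe what a
        given stack supports, and the helper returns EINVAL in the cases
        listed above and 0 otherwise.</t>

        <figure>
          <artwork><![CDATA[
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* 0xffff in spt_pathcpthld disables primary path changes. */
#define PATHCPTHLD_DISABLED 0xffff

/* Hypothetical description of what a stack supports. */
struct thlds_caps {
    bool sctp_pf;      /* SCTP-PF (true) or RFC 4960 SCTP (false) */
    bool auto_primary; /* automatic primary selection supported */
    bool restricted;   /* supports only ref and 0xffff (see below) */
};

/* Validate spt_pathcpthld; returns 0 or EINVAL. */
static int check_thlds(const struct thlds_caps *c,
                       uint16_t pathmaxrxt, uint16_t pathpfthld,
                       uint16_t pathcpthld)
{
    /* Lower bound: PF threshold for SCTP-PF, PMR otherwise. */
    uint16_t ref = c->sctp_pf ? pathpfthld : pathmaxrxt;

    if (pathcpthld == PATHCPTHLD_DISABLED)
        return 0;       /* disabling is always accepted */
    if (!c->auto_primary)
        return EINVAL;  /* only 0xffff is accepted */
    if (pathcpthld < ref)
        return EINVAL;  /* would switch before PF/unreachable */
    if (c->restricted && pathcpthld != ref)
        return EINVAL;  /* only ref or 0xffff is accepted */
    return 0;
}
]]></artwork>
        </figure>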
      </section>

      <section title="Exposing the Potentially Failed Path State (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option">
        <t>Applications can control the exposure of the potentially failed
        path state in the SCTP_PEER_ADDR_CHANGE event and the
        SCTP_GET_PEER_ADDR_INFO socket option as described in <xref
        target="pf_support_api"/>. The default value is implementation
        specific.</t>

        <t>This socket option uses the level IPPROTO_SCTP and the name
        SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.</t>

        <t>The following structure is used to control the exposure of the
        potentially failed path state:</t>

        <figure>
          <artwork><![CDATA[
struct sctp_assoc_value {
  sctp_assoc_t assoc_id;
  uint32_t assoc_value;
};
]]></artwork>
        </figure>

        <t><list style="hanging">
            <t hangText="assoc_id:">This parameter is ignored for one-to-one
            style sockets. For one-to-many style sockets the application may
            fill in an association identifier or SCTP_FUTURE_ASSOC. It is an
            error to use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.</t>

            <t hangText="assoc_value:">The potentially failed path state is
            exposed if and only if this parameter is non-zero.</t>
          </list></t>
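        <t>The following C sketch shows how a stack might process a request
        to set this option. The assoc_sel enumeration is a hypothetical
        classification of the supplied assoc_id (a specific association,
        SCTP_FUTURE_ASSOC, SCTP_CURRENT_ASSOC, or SCTP_ALL_ASSOC), and the
        choice of EINVAL for the error case is an assumption, since only "an
        error" is specified. For brevity, a single flag stands for the setting
        of whichever association the request applies to.</t>

        <figure>
          <artwork><![CDATA[
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the assoc_id in the request. */
enum assoc_sel {
    SEL_ASSOC_ID,      /* a specific association identifier */
    SEL_FUTURE_ASSOC,  /* SCTP_FUTURE_ASSOC */
    SEL_CURRENT_ASSOC, /* SCTP_CURRENT_ASSOC */
    SEL_ALL_ASSOC      /* SCTP_ALL_ASSOC */
};

/* Hypothetical per-socket state. */
struct pf_expose_cfg {
    bool one_to_many;  /* socket style */
    bool expose_pf;    /* whether the PF state is exposed */
};

/* Handle a set request for SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
 * Returns 0 on success or EINVAL (assumed error code). */
static int set_expose_pf(struct pf_expose_cfg *cfg,
                         enum assoc_sel sel, uint32_t assoc_value)
{
    /* assoc_id is ignored for one-to-one style sockets. */
    if (cfg->one_to_many &&
        (sel == SEL_CURRENT_ASSOC || sel == SEL_ALL_ASSOC))
        return EINVAL;
    /* Exposed if and only if assoc_value is non-zero. */
    cfg->expose_pf = (assoc_value != 0);
    return 0;
}
]]></artwork>
        </figure>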
      </section>
    </section>

    <section title="Security Considerations">
      <t>Security considerations for the use of SCTP and its APIs are
      discussed in <xref target="RFC4960"/> and <xref target="RFC6458"/>.</t>

      <t> The logic introduced by this document does not impact existing
       on-the-wire SCTP messages. Also, this document does not introduce 
       any new on-the-wire SCTP messages that require new security considerations.
      </t>

      <t>SCTP-PF makes SCTP not only more robust during primary path
      failure/congestion but also more vulnerable to network
      connectivity/congestion attacks on the primary path. SCTP-PF makes it
      easier for an attacker to trick SCTP into changing the data transfer
      path, since the duration of time that an attacker needs to compromise
      the network connectivity is much shorter than with <xref
      target="RFC4960"/>. However, SCTP-PF does not constitute a significant
      change in the duration of time and effort an attacker needs to keep
      SCTP away from the primary path. With the standard switchback
      operation of <xref target="RFC4960"/>, SCTP resumes data transfer on
      its primary path as soon as the next HEARTBEAT succeeds.</t>

      <t>On the other hand, usage of the Permanent Failover mechanism does
      change the threat analysis. This is because attackers can force a
      permanent change of the data transfer path by blocking the primary
      path until the switchover of the primary path is triggered by the
      Permanent Failover algorithm. This will especially be the case when
      Permanent Failover is used together with SCTP-PF with the particular
      setting of PSMR = PFMR = 0, as Permanent Failover then happens already
      at the first RTO timeout experienced. Users of the Permanent Failover
      mechanism should be aware of this fact.</t>
     
      <t>The event notification of path state transitions from the active
      to the potentially failed state, and vice versa, gives attackers an
      increased ability to generate local events. However, it is assumed
      that event notifications are rate-limited in the implementation to
      address this threat.</t>

    </section>

    <section title="IANA Considerations">
      <t>This document does not create any new registries or modify the rules
      for any existing registries managed by IANA.</t>
    </section>

    <section title="Acknowledgements">
      <t>The authors wish to thank Michael Tuexen for his many invaluable
      comments and for his very substantial support with the making of this
      document.</t>
    </section>

    <section title="Proposed Change of Status (to be Deleted before Publication)">
      <t>Initially, this work looked as if it would entail some changes to
      the Congestion Control (CC) operation of SCTP, and for this reason the
      work was proposed as Experimental. These intended changes of the CC
      operation have since been judged to be irrelevant and are no longer
      part of the specification. As the specification entails no other
      potentially harmful features, consensus exists in the WG to bring the
      work forward as PS.</t>

      <t>Initially, concerns were expressed about the possibility for the
      mechanism to introduce path bouncing with potentially harmful network
      impacts. These concerns are believed to be unfounded. This issue is
      addressed in Appendix B.</t>

      <t>It is noted that the feature specified by this document is
      implemented by multiple SCTP software implementations and,
      furthermore, that various variants of the solution have been deployed
      in telco signaling environments for several years with good
      results.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119" ?>

      <?rfc include="reference.RFC.4960" ?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.6458" ?>

      <reference anchor="IYENGAR06" target="">
        <front>
          <title>Concurrent Multipath Transfer using SCTP Multihoming over
          Independent End-to-end Paths</title>

          <author fullname="" initials="J." surname="Iyengar">
            <organization/>
          </author>

          <author fullname="" initials="P." surname="Amer">
            <organization/>
          </author>

          <author fullname="" initials="R." surname="Stewart">
            <organization/>
          </author>

          <date month="10" year="2006"/>
        </front>

        <seriesInfo name="IEEE/ACM Trans on Networking" value="14(5)"/>
      </reference>

      <reference anchor="NATARAJAN09" target="">
        <front>
          <title>Concurrent Multipath Transfer during Path Failure</title>

          <author fullname="" initials="P." surname="Natarajan">
            <organization/>
          </author>

          <author fullname="" initials="N." surname="Ekiz">
            <organization/>
          </author>

          <author fullname="" initials="P." surname="Amer">
            <organization/>
          </author>

          <author fullname="" initials="R." surname="Stewart">
            <organization/>
          </author>

          <date month="5" year="2009"/>
        </front>

        <seriesInfo name="Computer Communications" value=""/>
      </reference>

      <reference anchor="JUNGMAIER02" target="">
        <front>
          <title>On the use of SCTP in failover scenarios</title>

          <author fullname="" initials="A." surname="Jungmaier">
            <organization/>
          </author>

          <author fullname="" initials="E." surname="Rathgeb">
            <organization/>
          </author>

          <author fullname="" initials="M." surname="Tuexen">
            <organization/>
          </author>

          <date month="7" year="2002"/>
        </front>

        <seriesInfo name="World Multiconference on Systemics, Cybernetics and Informatics"
                    value=""/>
      </reference>

      <reference anchor="GRINNEMO04" target="">
        <front>
          <title>Performance of SCTP-controlled failovers in M3UA-based
          SIGTRAN networks</title>

          <author fullname="" initials="K-J" surname="Grinnemo">
            <organization/>
          </author>

          <author fullname="" initials="A." surname="Brunstrom">
            <organization/>
          </author>

          <date month="4" year="2004"/>
        </front>

        <seriesInfo name="Advanced Simulation Technologies Conference"
                    value=""/>
      </reference>

      <reference anchor="FALLON08" target="">
        <front>
          <title>SCTP Switchover Performance Issues in WLAN
          Environments</title>

          <author fullname="" initials="S." surname="Fallon">
            <organization/>
          </author>

          <author fullname="" initials="P." surname="Jacob">
            <organization/>
          </author>

          <author fullname="" initials="Y." surname="Qiao">
            <organization/>
          </author>

          <author fullname="" initials="L." surname="Murphy">
            <organization/>
          </author>

          <author fullname="" initials="E." surname="Fallon">
            <organization/>
          </author>

          <author fullname="" initials="A." surname="Hanley">
            <organization/>
          </author>

          <date month="1" year="2008"/>
        </front>

        <seriesInfo name="IEEE CCNC" value="2008"/>
      </reference>

      <reference anchor="CARO04" target="">
        <front>
          <title>End-to-End Failover Thresholds for Transport Layer
          Multihoming</title>

          <author fullname="" initials="A." surname="Caro Jr.">
            <organization/>
          </author>

          <author fullname="" initials="P." surname="Amer">
            <organization/>
          </author>

          <author fullname="" initials="R." surname="Stewart">
            <organization/>
          </author>

          <date month="11" year="2004"/>
        </front>

        <seriesInfo name="MILCOM 2004" value=""/>
      </reference>

      <reference anchor="CARO05" target="">
        <front>
          <title>End-to-End Fault Tolerance using Transport Layer
          Multihoming</title>

          <author fullname="" initials="A." surname="Caro Jr.">
            <organization/>
          </author>

          <date month="1" year="2005"/>
        </front>

        <seriesInfo name="Ph.D Thesis, University of Delaware" value=""/>
      </reference>

      <reference anchor="CARO02" target="">
        <front>
          <title>A Two-level Threshold Recovery Mechanism for SCTP</title>

          <author fullname="" initials="A." surname="Caro Jr.">
            <organization/>
          </author>

          <author fullname="" initials="J." surname="Iyengar">
            <organization/>
          </author>

          <author fullname="" initials="P." surname="Amer">
            <organization/>
          </author>

          <author fullname="" initials="G." surname="Heinz">
            <organization/>
          </author>

          <author fullname="" initials="R." surname="Stewart">
            <organization/>
          </author>

          <date month="7" year="2002"/>
        </front>

        <seriesInfo name="Tech report, CIS Dept, University of Delaware"
                    value=""/>
      </reference>
    </references>

    <section anchor="alternative_approach"
             title="Discussions of Alternative Approaches">
      <t>This section lists alternative approaches for the issues described in
      this document. Although these approaches do not require updating <xref
      target="RFC4960"/>, we do not recommend them for the reasons described
      below.</t>

      <section title="Reduce Path.Max.Retrans (PMR)">
        <t>Smaller values for Path.Max.Retrans shorten the failover
        duration, and in fact this is recommended in some research results
        <xref target="JUNGMAIER02"/> <xref target="GRINNEMO04"/> <xref
        target="FALLON08"/>. However, to significantly reduce the failover
        time, Path.Max.Retrans has to be lowered all the way to 0 (as with
        PFMR), and with this setting SCTP switches to another destination
        address already on a single timeout, which may result in spurious
        failover. Spurious failover is a problem in <xref target="RFC4960"/>
        SCTP because, unlike in SCTP-PF, the transmission of HEARTBEATs on the
        abandoned primary path is governed by 'HB.interval' also during the
        failover process. 'HB.interval' is usually set in the order of seconds
        (the recommended value is 30 seconds), and when the primary path
        becomes inactive, the next HEARTBEAT may be transmitted only many
        seconds later; with the recommended value, as much as 30 seconds
        later. Meanwhile, the primary path may have long since recovered, if
        it needed recovery at all (the failover could indeed be truly
        spurious). In such situations, post failover, an endpoint is forced
        to wait in the order of many seconds before it can resume
        transmission on the primary path. Furthermore, once it returns to the
        primary path, the CWND needs to be rebuilt anew, a process from which
        the throughput has already suffered on the alternate path. Using a
        smaller value for 'HB.interval' might help in this situation, but it
        would result in a general waste of bandwidth, as such more frequent
        HEARTBEATing would also take place when no problems are observed. The
        bandwidth overhead may be reduced by having the ULP use a smaller
        'HB.interval' only on the path that is currently set to be the
        primary path, but this adds complexity to the ULP.</t>

        <t>In addition, smaller Path.Max.Retrans values also affect the
        'Association.Max.Retrans' value. When the SCTP association's error
        count exceeds the Association.Max.Retrans threshold, the SCTP sender
        considers the peer endpoint unreachable and terminates the
        association. Section 8.2 of <xref target="RFC4960"/> recommends that
        the Association.Max.Retrans value not be larger than the sum of the
        Path.Max.Retrans values of all the destination addresses. Otherwise,
        the SCTP sender considers its peer reachable even when all
        destinations are INACTIVE. To avoid this dormant state operation, an
        <xref target="RFC4960"/> SCTP implementation SHOULD reduce
        Association.Max.Retrans accordingly whenever it reduces
        Path.Max.Retrans. However, a smaller Association.Max.Retrans value
        compromises the fault tolerance of SCTP, as it increases the chances
        of association termination during minor congestion events.</t>
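        <t>The bound from Section 8.2 of RFC 4960 is a simple sum, which makes
        it easy to see how reducing Path.Max.Retrans drags
        Association.Max.Retrans down with it. The following C sketch uses a
        hypothetical function name.</t>

        <figure>
          <artwork><![CDATA[
#include <stddef.h>

/* Largest Association.Max.Retrans value consistent with the note in
 * Section 8.2 of RFC 4960: the sum of Path.Max.Retrans over all
 * destination addresses of the peer. */
static unsigned amr_upper_bound(const unsigned *pmr, size_t ndest)
{
    unsigned sum = 0;
    for (size_t i = 0; i < ndest; i++)
        sum += pmr[i];
    return sum;
}
]]></artwork>
        </figure>

        <t>With two destination addresses and Path.Max.Retrans = 5, the bound
        is 10, which matches the RFC 4960 default Association.Max.Retrans.
        With Path.Max.Retrans = 0 on both addresses, the bound drops to 0, so
        a single retransmission timeout would be enough to terminate the
        association.</t>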
      </section>

      <section title="Adjust RTO related parameters">
        <t>As several research results indicate, the duration of the
        failover process can also be shortened by adjusting RTO-related
        parameters <xref target="JUNGMAIER02"/> <xref target="FALLON08"/>.
        During the failover process, the RTO is doubled after each timeout.
        Choosing a smaller value for RTO.max caps this exponential growth at
        a lower value. Also, choosing smaller values for RTO.initial or
        RTO.min can help to keep the RTO value small.</t>

        <t>Similar to reducing Path.Max.Retrans, the advantage of this
        approach is that it requires no modification to the current
        specification, although it requires ignoring several recommendations
        described in Section 15 of <xref target="RFC4960"/>. However, this
        approach requires sufficient knowledge of the network characteristics
        between the endpoints. Otherwise, it can introduce adverse side
        effects such as spurious timeouts.</t>

        <t>The significant issue with this approach, however, is that even if
        RTO.max is lowered to an optimal low value, as long as
        Path.Max.Retrans is kept at the value recommended by <xref
        target="RFC4960"/>, the reduction of RTO.max does not reduce the
        failover time enough to prevent severe performance degradation during
        failover.</t>
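        <t>To make this point concrete, the following C sketch computes the
        time until a path is declared inactive: Path.Max.Retrans + 1
        consecutive retransmission timeouts, with the RTO doubled after each
        timeout and capped at RTO.max. It assumes that the RTO in effect when
        the failure starts equals RTO.min (1 second in RFC 4960), and it
        ignores HEARTBEAT timing and jitter; the function name is
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
/* Seconds until a path is declared inactive after PMR + 1
 * consecutive timeouts; rto0 is the RTO when the failure starts. */
static double failover_time(double rto0, double rto_max,
                            unsigned pmr)
{
    double rto = rto0 < rto_max ? rto0 : rto_max;
    double total = 0.0;

    for (unsigned i = 0; i <= pmr; i++) {
        total += rto;        /* wait for this timeout to expire */
        rto *= 2.0;          /* back off the timer */
        if (rto > rto_max)
            rto = rto_max;   /* cap at RTO.max */
    }
    return total;
}
]]></artwork>
        </figure>

        <t>With the RFC 4960 defaults (Path.Max.Retrans = 5, RTO.max = 60
        seconds), failover_time(1.0, 60.0, 5) yields 1 + 2 + 4 + 8 + 16 + 32
        = 63 seconds. Lowering RTO.max to an illustrative 4 seconds still
        yields 1 + 2 + 4 + 4 + 4 + 4 = 19 seconds, whereas Path.Max.Retrans =
        0 yields 1 second.</t>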
      </section>

    </section>

    <section anchor="path_bouncing"
             title="Discussions for Path Bouncing Effect">
      <t>The methods described in this document can accelerate the failover
      process. Hence, they might introduce a path bouncing effect, where the
      sender keeps changing the data transmission path frequently. This might
      seem harmful to the data transfer; however, several research results
      indicate that path bouncing is not a serious problem for SCTP <xref
      target="CARO04"/> <xref target="CARO05"/>.</t>

      <t>There are two main reasons for this. First, SCTP is designed for
      multipath communication, which means that SCTP maintains all
      path-related parameters (CWND, ssthresh, RTT, error count, etc.) per
      destination address. These parameters are not affected by path
      bouncing. In addition, when SCTP migrates the data transfer to another
      path, it starts with the minimal or the initial CWND. Hence, there is
      little chance of packet reordering or duplication.</t>

      <t>Second, even if all communication paths between the end-nodes share
      the same bottleneck, the SCTP-PF results in a behavior already allowed
      by <xref target="RFC4960"/>.</t>
    </section>

    <section anchor="sh" title="SCTP-PF for SCTP Single-homed Operation">
      <t>For a single-homed SCTP association, the only tangible effects of
      activating SCTP-PF operation are enhanced failure detection, in terms
      of potential notification of the PF state of the sole destination
      address, and, for idle associations, more rapid entering, and
      notification, of the inactive state of the destination address and
      more rapid endpoint failure detection. It is believed that none of
      these effects is harmful, provided adequate dormant state operation is
      implemented, and furthermore that they may be particularly useful for
      applications that deploy multiple SCTP associations for load balancing
      purposes. The early notification of the PF state may be used for
      preventive measures, as the entering of the PF state can be used as a
      warning of potential congestion. Depending on the PMR value, the
      aggressive HEARTBEAT transmission in the PF state may speed up endpoint
      failure detection (the sole path's error counter exceeding the AMR
      threshold) on idle associations when an HB.interval value that is large
      relative to the RTO (e.g., 30 seconds) is used.</t>
    </section>
  </back>
</rfc>