Network Working Group Vipin Jain
Internet-Draft Riverstone Networks
Expires Sep 2004 Editor
March 2004
Fail Over extensions for L2TP "failover"
draft-ietf-l2tpext-failover-03
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
L2TP is a connection-oriented protocol that has shared state between
active endpoints. Some of this shared state is vital for operation
but may be rather volatile in nature, such as packet sequence numbers
used on the L2TP Control Connection. When failure of one side of a
control connection occurs, a new control connection is created and
associated with the old connection by exchanging information about
the old connection. Such a mechanism is not intended as a replacement
for an active fail over with some mirrored connection states, but as
an aid just for those parameters that are particularly difficult to
have immediately available. Protocol extensions to L2TP defined in
this document are intended to facilitate state recovery, providing
additional resiliency in an L2TP network and improving a remote
system's layer 2 connectivity.
Jain expires Sep 2004 [Page 1]
INTERNET DRAFT FAILOVER March 2004
Contributors
The following is the complete list of contributors to this document.
Vipin Jain Riverstone Networks
Paul Howard Juniper Networks
Mark Townsley Cisco Systems
Sam Henderson Cisco Systems
Ly Loi Tahoe Networks
Leo Huber Extreme Networks
Keyur Parikh Sentito Networks
Table of Contents
Status of this Memo
1.0 Introduction
2.0 Protocol Operation
2.1 Pre Failover Operation
2.2 Failover Recovery Procedure
2.2.1 Recovery Tunnel Establishment
2.2.2 Control and Data Channel Reset
2.3 Session State Synchronization
3.0 IANA Considerations
4.0 Security Considerations
5.0 Acknowledgements
6.0 Author Information
7.0 References
Appendix A
Appendix B
Appendix C
Appendix D
Terminology
Endpoint: An L2TP control connection endpoint, either LAC or LNS.
Active Endpoint: An endpoint that is currently providing service.
Backup Endpoint: A redundant endpoint standing by for the active endpoint.
Recovered Tunnel: An old tunnel that has been recovered using the
mechanism described in this document.
Recovery Tunnel: A new tunnel that is established only to recover an
old tunnel.
Failover: The action of a backup endpoint taking over the service of an
active endpoint. This could be due to administrative action or failure
of the active endpoint.
1.0 Introduction
The goal of this draft is to aid the overall resiliency of an L2TP
endpoint by introducing extensions to RFC 2661 [L2TP] that will
minimize the recovery time of the L2TP layer after a failover, while
minimizing the impact on its performance. Therefore it is assumed that
the endpoint's overall architecture is also supportive in the
resiliency effort.
To ensure proper operation of an L2TP endpoint after a failover, the
associated information of the tunnels and sessions between the endpoints
must be correct and consistent. This includes both the configured and
dynamic information. The configured information is assumed to be correct
and consistent after a failover, otherwise the tunnels and sessions would
not have been set up in the first place. The dynamic information, which
is also referred to as stateful information, changes with the
processing of the tunnel's control and data packets. Currently, the only
such information that is essential to the tunnel's operation is its
sequence numbers. For the tunnel control channel, inconsistencies
in its sequence numbers can result in the termination of the entire
tunnel. For tunnel sessions, inconsistencies in their sequence numbers,
when used, can cause significant data loss, giving the perception of
"service loss" to the end user.
Thus, an optimal resilient architecture that aims to minimize "service
loss" after a failover must make provision for the tunnel's essential
stateful information, i.e. its sequence numbers. Currently, there are
two options available. The first option is to ensure that the backup
endpoint is completely synchronized with the active endpoint with respect
to the control and data session sequence numbers. The other option is to
re-establish all the tunnels and their sessions after a failover.
The drawback of the first option is that it adds significant
performance and complexity impact to the endpoint's architecture,
especially as tunnel and session aggregation increases. The drawback of
the second option is that it increases the "service loss" time,
especially as the architecture scales.
To alleviate the above-mentioned drawbacks of the current options, this
draft introduces a mechanism to bring the dynamic stateful information
of a tunnel to a correct and consistent state after a failure. The proposed
mechanism defines the recovery of tunnels and sessions that were in
established state prior to the failure.
2.0 Protocol Operation
The failover protocol consists of three phases - pre failover,
failover recovery, and session state synchronization.
Pre failover operation allows an endpoint to specify its failover
capabilities and timer values, attributes that are used when failover
occurs.
Failover recovery starts at the failed endpoint when it initiates
a new L2TP tunnel (called the recovery tunnel) for every old tunnel that
needs recovery. The recovery tunnel serves four purposes: 1) It provides
a means of authentication and a three-way handshake to ensure both ends
agree on the failover for a given tunnel. 2) It identifies the old
tunnel that needs recovery. 3) It indicates whether the failed endpoint
would like to recover the control and/or data channel. 4) It exchanges
the Ns and Nr values to be used in the recovered tunnel on both ends.
Upon establishing the recovery tunnel, the two endpoints reset their control
and/or data channel, after which the recovery tunnel can be torn down.
The sessions that were in established state resume traffic.
The session state synchronization process allows two endpoints to agree
on the state of the various sessions in the recovered tunnel.
Inconsistencies could arise due to failure on one of the endpoints.
To synchronize, both endpoints first silently clear the sessions that
were not in established state. At this point they can allow new sessions
to be established on the recovered tunnel. Then, they use FSQ/FSR messages
(over the recovered tunnel) to obtain the state of sessions on the peer,
in order to clear stale sessions.
2.1 Pre Failover Operation
An endpoint that supports the failover protocol defined in this
document MUST include the Failover Capability AVP in SCCRQ or SCCRP
during tunnel establishment.
Failover Capability AVP
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Attribute Type [TBD]      |         Reserved          |D|C|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Recovery Time (in milliseconds)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is not
mandatory (the M-bit MUST be set to 0).
The D bit, when set, indicates that an endpoint is capable of
supporting its peer's data channel failure. The C bit, when set,
indicates that an endpoint is capable of supporting its peer's
control channel failure.
Recovery Time is the time in milliseconds an endpoint asks its
peer to wait before assuming the recovery process has failed.
This timer starts when an endpoint's control channel
timeout ([L2TP] section 5.8) is started, and is not terminated
(before expiry) until the endpoint successfully authenticates
its peer during recovery. A value of zero indicates that
the sender cannot preserve the state of sessions within the
tunnel, but is able to support its peer's failure.
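As a rough illustration of the layout above, the following Python sketch packs and parses an unhidden Failover Capability AVP. The attribute type value is a placeholder, since this document leaves it as [TBD], and the function names are illustrative only. The 6-octet AVP header (M, H, reserved bits, 10-bit Length, Vendor Id, Attribute Type) is assumed to follow [L2TP]; hidden AVPs are not handled.

```python
import struct

# Placeholder only: this draft leaves the Attribute Type as [TBD].
FAILOVER_CAPABILITY_ATTR_TYPE = 0xFFFF
IETF_VENDOR_ID = 0  # Vendor Id used for IETF-defined AVPs


def pack_failover_capability(data_ok: bool, control_ok: bool,
                             recovery_time_ms: int) -> bytes:
    """Build an unhidden Failover Capability AVP (M=0, H=0)."""
    length = 12  # 6-octet header + 2 octets (Reserved, D, C) + 4 octets
    flags_len = length & 0x03FF  # M and H clear, so only Length is set
    dc = (int(data_ok) << 1) | int(control_ok)  # D and C are the two low bits
    return struct.pack("!HHHHI", flags_len, IETF_VENDOR_ID,
                       FAILOVER_CAPABILITY_ATTR_TYPE, dc, recovery_time_ms)


def unpack_failover_capability(avp: bytes) -> dict:
    """Parse the 12-octet AVP produced by pack_failover_capability()."""
    flags_len, vendor, attr, dc, recovery = struct.unpack("!HHHHI", avp)
    return {
        "mandatory": bool(flags_len & 0x8000),
        "hidden": bool(flags_len & 0x4000),
        "length": flags_len & 0x03FF,
        "vendor_id": vendor,
        "attr_type": attr,
        "data": bool(dc & 0x2),
        "control": bool(dc & 0x1),
        "recovery_time_ms": recovery,
    }
```

A Recovery Time of zero can be packed the same way; per the text above it signals that the sender cannot preserve session state but can still support its peer's failure.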
2.2 Failover Recovery Procedure
Failover recovery procedure consists of two steps: 1) Recovery
tunnel establishment 2) Control and/or data channel reset
2.2.1 Recovery Tunnel Establishment
The failed endpoint establishes a new tunnel, called the recovery
tunnel, for every old tunnel it wishes to recover. The purpose
of the recovery tunnel is solely to recover the corresponding
old tunnel. An endpoint SHOULD NOT send any control message on
this tunnel, other than the messages to establish the tunnel
itself. To indicate failure on its end, the failed endpoint
MUST include the Tunnel Recovery AVP in the SCCRQ message of the
recovery tunnel.
Tunnel Recovery AVP
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Attribute Type [TBD]      |         Reserved          |D|C|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Recover Tunnel Id       |   Recover Remote Tunnel Id    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is
mandatory (the M-bit is set to 1).
The D bit is set when a failed endpoint would like to recover
the data channel. The C bit is set when the failed endpoint would
like to recover the control channel.
Recover Tunnel Id encodes the tunnel id that is subject to
recovery. Similarly, Recover Remote Tunnel Id encodes the
remote tunnel id corresponding to the old tunnel.
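The fields described above could be serialized roughly as follows. This is a minimal sketch: the attribute type is a placeholder because this document leaves it as [TBD], the function name is illustrative, and the AVP header layout is assumed to follow [L2TP]. Hiding is not handled.

```python
import struct

# Placeholder only: this draft leaves the Attribute Type as [TBD].
TUNNEL_RECOVERY_ATTR_TYPE = 0xFFFE
IETF_VENDOR_ID = 0


def pack_tunnel_recovery(recover_tid: int, recover_remote_tid: int,
                         data: bool = True, control: bool = True) -> bytes:
    """Build an unhidden Tunnel Recovery AVP with the M-bit set."""
    length = 12  # 6-octet header + 2 octets (Reserved, D, C) + two 16-bit ids
    flags_len = 0x8000 | length  # M=1, H=0
    dc = (int(data) << 1) | int(control)  # D and C are the two low bits
    return struct.pack("!HHHHHH", flags_len, IETF_VENDOR_ID,
                       TUNNEL_RECOVERY_ATTR_TYPE, dc,
                       recover_tid, recover_remote_tid)
```

For example, `pack_tunnel_recovery(10, 20, data=False)` would ask the peer to recover only the control channel of the old tunnel whose ids are 10 (local to the failed endpoint) and 20 (remote).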
Upon getting an SCCRQ with the Tunnel Recovery AVP, the non failed
endpoint validates Recover Tunnel Id and Recover Remote Tunnel
Id and responds with an SCCRP. It MUST terminate the tunnel
when:
- Recover Tunnel Id or Recover Remote Tunnel Id is unknown.
- The non failed endpoint did not indicate it was failover capable.
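The two termination conditions above can be sketched as a lookup on the non failed endpoint. The sketch assumes, following the example in Appendix B, that Recover Tunnel Id is the id the failed endpoint had assigned and Recover Remote Tunnel Id is the id the non failed endpoint had assigned, so the receiver looks up its own tunnel by Recover Remote Tunnel Id. The OldTunnel record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OldTunnel:
    local_tid: int          # tunnel id assigned by this (non failed) endpoint
    remote_tid: int         # tunnel id assigned by the peer
    failover_capable: bool  # True if we sent a Failover Capability AVP


def validate_recovery_sccrq(tunnels_by_local_tid: Dict[int, OldTunnel],
                            recover_tid: int,
                            recover_remote_tid: int) -> Optional[OldTunnel]:
    """Return the old tunnel to recover, or None if the recovery
    tunnel must be terminated."""
    old = tunnels_by_local_tid.get(recover_remote_tid)
    if old is None or old.remote_tid != recover_tid:
        return None  # one of the two ids is unknown
    if not old.failover_capable:
        return None  # we never advertised failover capability
    return old
```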
If non failed endpoint accepts the SCCRQ, it MAY include
Suggested Control Sequence AVP in the SCCRP.
Suggested Control Sequence AVP
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Attribute Type [TBD]      |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Suggested Ns          |         Suggested Nr          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is
not mandatory (the M-bit is set to 0).
This is an optional AVP, suggesting the Ns and Nr values to be
used by the failed endpoint. If this AVP is present in an
SCCRP message, the failed endpoint MUST set the Ns and Nr
values of the recovered tunnel to the respective suggested
values. When this AVP is not sent in SCCRP or not present in an
incoming SCCRP, the Ns and Nr values for the recovered tunnel
are set to zero. It is RECOMMENDED that the non failed endpoint
suggest the Ns and Nr values, to help avoid interference between
old control packets and the recovered tunnel's control channel.
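A minimal sketch of how the failed endpoint might derive its Ns and Nr from an SCCRP is shown below. The attribute type is a placeholder (this document leaves it as [TBD]) and the function names are illustrative. The absent-AVP case falls back to zero, as required above.

```python
import struct
from typing import Optional, Tuple

# Placeholder only: this draft leaves the Attribute Type as [TBD].
SCS_ATTR_TYPE = 0xFFFD
IETF_VENDOR_ID = 0


def pack_suggested_control_sequence(ns: int, nr: int) -> bytes:
    """Build an unhidden Suggested Control Sequence AVP (M=0, H=0)."""
    length = 12  # 6-octet header + 2 reserved octets + Ns + Nr
    return struct.pack("!HHHHHH", length, IETF_VENDOR_ID,
                       SCS_ATTR_TYPE, 0, ns, nr)


def failed_endpoint_ns_nr(scs_avp: Optional[bytes]) -> Tuple[int, int]:
    """Return the (Ns, Nr) the failed endpoint uses on the recovered tunnel."""
    if scs_avp is None:
        return (0, 0)  # no suggestion in the SCCRP: reset to zero
    _, _, _, _, ns, nr = struct.unpack("!HHHHHH", scs_avp)
    return (ns, nr)
```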
To authenticate its peer during tunnel recovery, an endpoint
MUST follow the procedure described in [L2TP] section 5.1.1
using the same secret used to authenticate the old tunnel. Not
being able to authenticate could be a reason to terminate the
new tunnel. If, for any reason, the failed endpoint could not
establish the recovery tunnel then it MUST silently clear the
recovered tunnel and sessions within, assuming the recovery
process has failed.
Any control packet received on the recovered tunnel, before
control channel reset, MUST be silently discarded.
If both endpoints fail simultaneously, then each endpoint
SHOULD follow the procedure described for a failed endpoint to
recover the tunnel and its sessions. To avoid teardown of
either one of the recovery tunnels, it is RECOMMENDED that tie
breaker AVP ([L2TP] section 4.4.3) is not used during recovery
tunnel establishment. Appendix C illustrates double failover
scenario.
2.2.2 Control and Data Channel Reset
The failed endpoint indicates in the Tunnel Recovery AVP (SCCRQ)
whether it would like to reset the control channel and/or data channel.
Control channel reset on the recovered tunnel SHOULD flush the
transmit and receive windows, and reset the control channel
sequence numbers (i.e. Ns and Nr values). The control channel
on the failed endpoint is reset upon getting a valid SCCRP, whereas
the control channel on the non failed endpoint is reset upon getting a
valid SCCCN. If the failed endpoint does not receive the Suggested
Control Sequence AVP in SCCRP then it MUST reset Ns and Nr
values to zero. Similarly, if the non failed endpoint opts not to
send the Suggested Control Sequence AVP then it MUST reset Ns and Nr
values to zero.
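The reset rules above can be sketched as a small state holder. This is an illustration under stated assumptions: the class and method names are hypothetical, and the way the non failed endpoint mirrors the suggested values (its Ns is the suggested Nr and vice versa) is inferred from the example in Appendix B rather than stated normatively.

```python
from collections import deque
from typing import Optional, Tuple


class ControlChannel:
    def __init__(self, failed: bool):
        self.failed = failed      # True on the failed endpoint
        self.ns = 0
        self.nr = 0
        self.tx_window = deque()  # sent but unacknowledged messages
        self.rx_window = deque()  # received messages awaiting processing

    def _reset(self, ns: int, nr: int) -> None:
        self.tx_window.clear()
        self.rx_window.clear()
        self.ns, self.nr = ns, nr

    def on_recovery_message(self, msg_type: str,
                            suggested: Optional[Tuple[int, int]] = None) -> bool:
        """Reset on SCCRP (failed side) or SCCCN (non failed side).

        suggested is (Suggested Ns, Suggested Nr) from the SCCRP, or None
        if no Suggested Control Sequence AVP was sent.
        """
        trigger = "SCCRP" if self.failed else "SCCCN"
        if msg_type != trigger:
            return False
        if suggested is None:
            self._reset(0, 0)
        elif self.failed:
            self._reset(*suggested)
        else:
            ns, nr = suggested
            self._reset(nr, ns)  # mirrored view, as in Appendix B
        return True
```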
Data channel reset requires no action if the data channel doesn't
use sequence numbers. If the data channel was using
sequence numbers, then the sequence numbers are reset as
follows:
- Before sending SCCRQ on the recovery tunnel, the failed endpoint
MUST stop receiving and transmitting data packets on all
sessions.
- The failed endpoint resets Ns to zero. It also sets Nr from the
Ns received in the first data packet after sending SCCCN on the
recovery tunnel.
- After resetting the Ns and Nr values, the failed endpoint can start
transmitting and receiving data.
- The non failed endpoint resets Nr to zero upon receipt of a
valid SCCCN. It doesn't reset the Ns value.
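The four steps above might look as follows for one session that uses data sequencing. The class and method names are hypothetical. How Nr advances after the first packet is accepted is left to normal [L2TP] processing and is not modeled here.

```python
class FailedEndpointDataChannel:
    def __init__(self, ns: int, nr: int):
        self.ns = ns
        self.nr = nr
        self.active = True
        self.awaiting_first_packet = False

    def before_sccrq(self) -> None:
        # Stop receiving and transmitting before SCCRQ is sent.
        self.active = False

    def after_scccn(self) -> None:
        # Ns restarts at zero; Nr is learned from the peer's next packet.
        self.ns = 0
        self.awaiting_first_packet = True

    def on_data_packet(self, pkt_ns: int) -> bool:
        """Return True if the packet may be processed."""
        if self.awaiting_first_packet:
            self.nr = pkt_ns
            self.awaiting_first_packet = False
            self.active = True  # Ns and Nr are reset: resume traffic
        return self.active


class NonFailedEndpointDataChannel:
    def __init__(self, ns: int, nr: int):
        self.ns = ns
        self.nr = nr

    def on_scccn(self) -> None:
        self.nr = 0  # Ns is deliberately left unchanged
```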
An endpoint MUST prevent establishment of new sessions until it
has cleared (or marked for clearance) the sessions that were
not in established state i.e. until after Step 1, section 2.3
is complete.
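One way to enforce this ordering is a simple gate on the recovered tunnel, sketched below. The class, the state strings and the method names are hypothetical and only illustrate the rule that new sessions are refused until non-established sessions have been cleared.

```python
from typing import Dict, List


class RecoveredTunnel:
    def __init__(self, sessions: Dict[int, str]):
        # Maps local session id to a state name such as "established".
        self.sessions = dict(sessions)
        self.accept_new_sessions = False

    def clear_non_established(self) -> List[int]:
        """Silently drop sessions that were not established (no CDN sent)."""
        dropped = [sid for sid, state in self.sessions.items()
                   if state != "established"]
        for sid in dropped:
            del self.sessions[sid]
        self.accept_new_sessions = True  # new sessions are allowed from now on
        return dropped

    def can_accept_new_session(self) -> bool:
        return self.accept_new_sessions
```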
2.3 Session State Synchronization
If failover happens while a session is being established or being
torn down, it is possible for an endpoint to consider a session in
established state, when its peer considers the same session non
existent. Two such situations occur when an endpoint fails after
sending:
- A CDN message that never made it to the peer.
- An ICCN message that never made it to the peer.
The following mechanism MUST be used to identify and clear the
sessions that exist on an endpoint but not on its peer:
Step1: After the recovery tunnel is established, the sessions that
were not in established state MUST be silently cleared (i.e.
without sending a CDN message) by each endpoint.
Step2: Both endpoints SHOULD identify the sessions that might have
been in inconsistent states, perhaps based on data channel
inactivity.
Step3: An endpoint sends a Failover Session Query (FSQ) message,
message type [TBD], to query the state of stale sessions on its
peer. An FSQ message MUST include at least one Failover Session
State (FSS) AVP. An endpoint MAY send another FSQ message on
the recovered tunnel before getting a response to its previous FSQs.
Failover Session State AVP is described as follows:
Failover Session State AVP
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|H| rsvd  |      Length       |       Vendor Id [IETF]        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Attribute Type [TBD]      |          Session Id           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Remote Session Id       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This AVP MAY be hidden (the H-bit set to 0 or 1). The AVP is
mandatory (the M-bit is set to 1).
Session Id identifies the local session id sender had assigned,
for which it would like to query the state on its peer. Remote
Session Id is the remote session id for the same session.
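The FSS AVP and the AVP list of an FSQ could be built roughly as follows. This is a sketch: the attribute type is a placeholder (this document leaves it as [TBD]), the function names are illustrative, and the Message Type AVP and L2TP control header that would precede these AVPs in a real FSQ are omitted.

```python
import struct
from typing import Iterable, Tuple

# Placeholder only: this draft leaves the Attribute Type as [TBD].
FSS_ATTR_TYPE = 0xFFFC
IETF_VENDOR_ID = 0


def pack_fss(session_id: int, remote_session_id: int) -> bytes:
    """Build an unhidden Failover Session State AVP with the M-bit set."""
    length = 10  # 6-octet header + Session Id + Remote Session Id
    return struct.pack("!HHHHH", 0x8000 | length, IETF_VENDOR_ID,
                       FSS_ATTR_TYPE, session_id, remote_session_id)


def unpack_fss(avp: bytes) -> Tuple[int, int]:
    """Return (Session Id, Remote Session Id) from a 10-octet FSS AVP."""
    _, _, _, session_id, remote_session_id = struct.unpack("!HHHHH", avp)
    return session_id, remote_session_id


def build_fsq_avps(stale: Iterable[Tuple[int, int]]) -> bytes:
    """Concatenate one FSS AVP per (local sid, remote sid) pair."""
    avps = [pack_fss(local, remote) for local, remote in stale]
    if not avps:
        raise ValueError("an FSQ must carry at least one FSS AVP")
    return b"".join(avps)
```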
Before all sessions are synchronized using the FSQ/FSR mechanism, if
an endpoint receives an ICRQ for a session it believes is already
in established state, it MUST respond to such an ICRQ with a CDN,
setting the Assigned Session ID AVP (section 4.4.4 [L2TP]) to its
local session id, and clear the session that it considered
established. An endpoint could assign least recently used session
ids to avoid this situation.
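A possible shape for this check is sketched below. It assumes the collision is detected by matching the peer's Assigned Session ID in the ICRQ against the remote session id of a session this endpoint still holds as established; that matching rule, the dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional


def handle_icrq(established_by_remote_sid: Dict[int, int],
                peer_assigned_sid: int) -> Optional[dict]:
    """Return a CDN description if the ICRQ collides with an established
    session, else None so that normal ICRQ processing can continue.

    established_by_remote_sid maps the peer's session id to our local one.
    """
    local_sid = established_by_remote_sid.pop(peer_assigned_sid, None)
    if local_sid is None:
        return None
    # The colliding session is cleared (popped above) and a CDN is sent
    # carrying our local session id in the Assigned Session ID AVP.
    return {"message": "CDN", "assigned_session_id": local_sid}
```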
When an endpoint receives an FSQ message, it responds with a
Failover Session Response (FSR) message, message type [TBD], that
encodes one FSS AVP for each FSS AVP in the FSQ. For each FSS AVP
received in the FSQ, an endpoint MUST validate the Remote Session Id
and determine if it is paired with the Session Id specified in the
message. If the FSS AVP is not valid (i.e. the session does not exist
or is paired with a different remote session id), then the Session
Id field in the FSS AVP in the response MUST be set to zero. When a
session is discovered to be paired with a mismatching session id,
the local session MUST NOT be cleared, but rather marked stale, to
be queried later using another FSQ message. An example dialogue in
Appendix D elaborates on the endpoints' behavior on mismatching
session ids.
Also, when responding to FSQ with an FSR message, Remote Session
Id in FSS AVP is always set to the received value of Session ID in
FSS AVP in FSQ message.
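The responder's rules can be sketched as a function over (Session Id, Remote Session Id) pairs. The session table layout and the function name are hypothetical. The example values in the usage note correspond to the first exchange of Appendix D with A=1, B=2, C=3.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_fsr(sessions: Dict[int, int],
              fsq_entries: Iterable[Tuple[int, int]]
              ) -> Tuple[List[Tuple[int, int]], Set[int]]:
    """Compute the FSS pairs for an FSR and the local sessions to mark stale.

    sessions maps our local session id to the remote id we believe is
    paired with it. Each FSQ entry is (Session Id, Remote Session Id) from
    the sender's view, so Remote Session Id names one of our local ids.
    """
    fsr: List[Tuple[int, int]] = []
    stale: Set[int] = set()
    for peer_sid, my_sid in fsq_entries:
        paired = sessions.get(my_sid)
        if paired == peer_sid:
            fsr.append((my_sid, peer_sid))  # pairing confirmed
        else:
            if paired is not None:
                stale.add(my_sid)           # mismatch: mark stale, do not clear
            fsr.append((0, peer_sid))       # zero Session Id means "not valid"
    return fsr, stale
```

With `sessions = {2: 3}` and an FSQ entry `(1, 2)`, the function returns `[(0, 1)]` and marks local session 2 as stale, so it can be queried later in a separate FSQ.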
When an endpoint receives an FSR message, it MUST use the Remote
Session Id field to identify the local session and silently
(without sending a CDN) clear the session if Session Id in the AVP
was zero. Otherwise it can consider the session to be in
established state and recovered.
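Processing of a received FSR might then look like the sketch below, where each entry is a (Session Id, Remote Session Id) pair taken from one FSS AVP. The session table layout and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def apply_fsr(sessions: Dict[int, int],
              fsr_entries: Iterable[Tuple[int, int]]) -> List[int]:
    """Silently clear local sessions the peer reported as not valid.

    sessions maps our local session id to the paired remote id. In an FSR,
    Remote Session Id identifies our local session and a zero Session Id
    means the peer does not have the pairing. Returns the ids cleared.
    """
    cleared: List[int] = []
    for peer_sid, my_sid in fsr_entries:
        if peer_sid == 0 and my_sid in sessions:
            del sessions[my_sid]  # no CDN is sent
            cleared.append(my_sid)
        # Otherwise the session is considered established and recovered.
    return cleared
```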
3. IANA Considerations
This document requires four new "AVP Attributes" and two new
messages to be assigned through IETF Consensus [RFC2434] as indicated
in Section 10.1 of [RFC2661]. These are:
Failover Capability AVP (section 2.1)
Tunnel Recovery AVP (section 2.2.1)
Suggested Control Sequence AVP (section 2.2.1)
Failover Session State AVP (section 2.3)
Failover Session Query Message (FSQ) (section 2.3)
Failover Session Response Message (FSR) (section 2.3)
4. Security Considerations
The failover mechanism described here leaves some room (1 in 2^32)
for an intruder to discover the old tunnel id of an existing tunnel
by trying out various possibilities in the Recover Tunnel Id and
Recover Remote Tunnel Id fields of the Tunnel Recovery AVP.
It also introduces an opportunity for an intruder to spoof the
FSQ/FSR messages and learn which sessions are active.
5. Acknowledgements
Leo Huber of Extreme Networks provided valuable suggestions to help
define the failover concept. Ly Loi reviewed the draft and provided
suggestions on improving it.
6. Author Information
Vipin Jain
Riverstone Networks
5200 Great America Parkway
Santa Clara, CA 95054
Phone: +1 408.878.0464
Email: vipinietf@yahoo.com
Paul W. Howard
Juniper Networks
10 Technology Park Drive
Westford, MA 01886
Email: phoward@juniper.net
Sam Henderson
Cisco Systems
7025 Kit Creek Rd.
PO Box 14987
Research Triangle Park, NC 27709
Email: samh@cisco.com
Keyur Parikh
Sentito Networks
2096 Gaither Road Suite 100
Rockville, MD 20850
Email: kparikh@sentito.com
W. Mark Townsley
Cisco Systems
7025 Kit Creek Road
PO Box 14987
Research Triangle Park, NC 27709
Email: townsley@cisco.com
7. References
[L2TP] Townsley, et al., "Layer Two Tunneling Protocol L2TP", RFC 2661
Appendix A
This section describes some design considerations that came up during
discussions when developing the proposal:
A.1 Backward compatibility and extensibility
- The mechanism should be backwards compatible; i.e. it should
not redefine existing behavior of [L2TP] compliant systems.
- The protocol should allow a peer to detect failover capabilities
in advance, so that it can fall back to other failover mechanisms
should the peer not support the proposed failover protocol.
- The protocol should allow future extensions to the fail-over
mechanism with ease.
A.2 Less failover recovery time
The mechanism should take the least possible time to recover from
failover (target of 3-5 seconds for 30k tunnels). Specifically it
should take the following into consideration:
- Faster recovery: by exchanging fewer messages
to recover from failover
- CPU intensiveness: the less CPU intensive a proposal is, the better
are the chances of faster recovery
- Parallel establishment of various tunnels: by keeping different
tunnel reestablishments independent of one another.
A.3 Less Payload data loss
The mechanism should have least possible impact on data flows for
sessions with sequencing enabled.
A.4 Minimum interference with pre-failure control traffic
The mechanism should define a way of clearly distinguishing the
messages that were sent before failover from those that are sent
after. Specifically, it should define a mechanism that avoids
confusion between sequence numbers that were used before and after if
the same Tunnel Id is used.
A.5 Simplicity
The simpler the protocol is, the better are its chances of being
adopted by everybody. The following would help achieve this:
- Use of existing AVPs, messages and packet formats.
- Avoid introducing special considerations and mechanisms a new
implementation would have to deal with.
- Simpler post fail-over synchronization mechanism.
A.6 Security
The mechanism should provide a way to authenticate peers when
resynchronization is happening after a failover.
A.7 Scalability
It is very important for a proposed protocol to work well for a
scalable deployment. This includes dealing with all design
considerations discussed above for scalable deployments, having
thousands of tunnels or sessions or mix of the two.
A target of 30,000 tunnels carrying 150,000 to 200,000 sessions from
300 peers was considered during the design.
Appendix B
The description below outlines the failover protocol operation for an
example tunnel. The failover protocol does not preclude an endpoint
from recovering multiple tunnels in parallel. It also allows an
endpoint to send multiple FSQs to recover quickly.
Pre Failover Exchange (section 2.1):
Endpoint Peer
(assigned tid = x, failover capable)
SCCRQ --------------------------------------> validate SCCRQ
(assigned tid = y, failover capable)
validate <-------------------------------------- send SCCRP
SCCRP, etc.
.... <after tunnel gets created, sessions are established> ....
< This Node fails >
Failed endpoint establishes recovery tunnel (section 2.2.1).
Initiate recovery tunnel establishment for the old tunnel 'x':
Failed Endpoint Peer
(assigned tid = z, Recovery AVP)
SCCRQ -----------------------------------> Detects failover
(recover tid = x, recover remote tid = y) validate SCCRQ
(Suggested Control Sequence AVP, Suggested Ns/Nr = 3/100)
validate <----------------------------------- send SCCRP
SCCRP (recover tid = y, recover remote tid = x)
reset Ns = 3, Nr = 100
on the recovered tunnel
SCCCN -----------------------------------> validate and reset
Ns = 100, Nr = 3 on
the recovered tunnel.
Terminate the recovery tunnel
tid = 'z'
StopCCN --------------------------------------> Cleanup 'w'
Session states are synchronized; both endpoints may send FSQs and
clean up stale sessions (section 2.3)
(FSS AVP for sessions s1, s2, s3..)
send FSQ -------------------------------------> compute the state
of sessions in FSQ
(FSS AVP for sessions s1, s2, s3...)
deletes <-------------------------------------- send FSR
stale sessions, if any
(FSS AVP for sessions s7, s8, s9...)
compute <-------------------------------------- send FSQ
the state of
sessions in FSQ
(FSS AVP for sessions s7, s8, s9...)
send FSR --------------------------------------> delete stale sessions,
if any
Appendix C
This section shows an example dialogue to illustrate double failure
recovery. Although the illustration assumes two endpoints failing almost
at the same time, the behavior on the two endpoints would be similar even
if the failures are interleaved.
Failed endpoint Failed endpoint
(assume old tid = A) (assume old tid = B)
Recovery AVP = (A, B)
SCCRQ --------------------------> valid SCCRQ ---+
(recovery tunnel 'C') |
|
|
Recovery AVP = (B, A) |
+- valid <-------------------------- Send SCCRQ |
| SCCRQ (recovery tunnel 'D') |
| |
| No SCS AVP |
| Validate <-------------------------- send SCCRP <---+
| SCCRP; Reset 'A'
| Ns, Nr set to zero
| |
| | No SCS AVP
+->Send SCCRP -------------------------> Validate SCCRP
| Reset 'B';
| Ns, Nr set to zero --+
| |
+-> Send SCCCN ---------------------> Validate SCCCN; |
Reset 'B' again; |
Ns, Nr set to zero |
|
Validate SCCCN <---------------------- Send SCCCN --------+
Reset 'A' again;
Ns, Nr set to zero
FSQs and FSRs for the old tunnel (A, B) are exchanged on
the recovered tunnel. This should be no different from handling
simultaneous FSQs and FSRs between two nodes when only one node
had failed.
Appendix D
Session id mismatch could not be a result of failure on one of the
endpoints. However, the failover session recovery procedure could
exacerbate the situation, resulting in a permanent mismatch in
session ids between two endpoints. The dialogue below outlines the
behavior described in section 2.3 to handle such situations
gracefully.
Failed endpoint Non failed endpoint
(assume a mismatch) (assume a mismatch)
Sid = A, Remote Sid = B Sid = B, Remote Sid = C
Sid = C, Remote Sid = D
FSS AVP (A, B)
send FSQ -------------------------> No (B, A) pair exist;
rather (B, C) exist.
If it clears B then peer doesn't
know if C is stale on other end.
Instead if it marks B stale
and queries the session state
via FSQ, C would be cleared on the
other end.
FSS AVP (0, A)
Clears A <-------------------------- send FSR
... some time later ...
FSS AVP (B, C)
No (B,C) <-------------------------- send FSQ
Mark C Stale
FSS AVP (B, 0)
Send FSR --------------------------> Clears B