Network Working Group Eric C. Rosen
Internet Draft Cisco Systems, Inc.
Expiration Date: February 2004
August 2003
Detecting and Reacting to Failures of the Full Mesh in IPLS and VPLS
draft-rosen-l2vpn-mesh-failure-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Certain L2VPN architectures [IPLS, VPLS] rely on there being a full
mesh of pseudowires [PWE3-ARCH] among a set of entities. This mesh
is used to provide a "LAN-like" service among the entities. If one
or more of these pseudowires is absent, so that there is not really a
full mesh, various higher layers (from routing to bridge control
protocols) that expect a LAN-like service may fail to work as
expected. Therefore it is desirable to have procedures that enable
the pseudowire endpoints to determine automatically whether there is
really a full mesh or not. It is also desirable to have procedures
that cause the L2VPNs to adapt to pseudowire failures. This document
proposes a set of procedures to meet these goals. Detailed protocol
encodings are not present, but will be added in future versions.
Rosen [Page 1]
Internet Draft draft-rosen-l2vpn-mesh-failure-00.txt August 2003
Contents
1 Introduction
2 Detection of Partially Connected EEs
3 Actions Taken Upon Detection
4 References
5 Author's Information
1. Introduction
IPLS [IPLS] interconnects a set of CEs. With respect to a particular
IPLS instance and a particular PE supporting that IPLS instance, the
set of CEs can be divided into the PE's "local CEs" and the PE's
"remote CEs". The local CEs are directly attached to the PE.
("Directly attached" means attached via an "Attachment Circuit" in
the sense of [L2VPN-FRAMEWORK].) The PE must ensure that each of its
local CEs is bound, by a Pseudowire (PW), to each of the remote CEs.
When this condition holds for all the PEs supporting a given IPLS
instance, we say that the IPLS instance is fully meshed.
VPLS [VPLS] interconnects a set of "VPLS Forwarders"
[L2VPN-FRAMEWORK], which are virtual entities inside PEs; for a given VPLS
instance, there is one VPLS Forwarder in a given PE. Some of these
are considered "spokes", and some are considered "hubs". In a given
VPLS instance, there must be a PW binding every hub VPLS Forwarder to
every other hub VPLS Forwarder; this means that every hub PE in the
VPLS instance must have a PW to every other hub PE in the VPLS
instance. When this condition holds, we say that the VPLS instance
is fully meshed.
We will use the term "LS" to mean "IPLS or VPLS".
In each LS instance, there is a set of "endpoint entities" (EEs). In
VPLS, the EEs are hub VPLS Forwarders inside the PEs; in IPLS, the EEs
are CEs. In either case, we say that the LS instance is "fully
meshed" if every pair of EEs which are not local to the same PE are
bound together by a PW.
(For present purposes, it does not matter whether two EEs are bound
by a single bidirectional point-to-point PW or by a pair of
unidirectional point-to-multipoint PWs.)
It is possible that a given LS instance may fail to be fully meshed.
This may happen for the following reasons:
- Configuration errors.
- Failure of the auto-discovery process.
- Failure of the control plane to properly establish all the
necessary PWs. This in turn may be due to bugs, or to resource
shortages at the PEs.
- Failure of the data plane to carry traffic correctly on all the
established PWs. This can occur if there are bugs in the
encapsulation/decapsulation procedures at the PEs, or bugs in the
forwarding procedures at intermediate nodes (especially in
technologies where the data and control planes are decoupled).
When an LS instance is not fully meshed, we will say that one or more
of its EEs are "partially connected". An EE is regarded as
"partially connected" at a particular time if one of the following
conditions holds:
- PW not established: at that time, some PW binding that EE to
another EE has not been properly established, as determined by
the PW control plane.
- PW not operational: at that time, although the control plane
indicates that all the PWs binding other EEs to the given EE are
properly established, one or more of those PWs is incapable of
passing data to the given EE for some reason. Note that
"operational" status is a unidirectional attribute.
If an LS instance is not fully meshed, then it will not be able to
provide the "LAN-like" service on which its users are depending. For
instance, if a link state routing algorithm is using its LAN
procedures over an LS instance which is not fully meshed, the
selected set of routes may have "black holes".
It is therefore desirable to have procedures which will automatically
identify any partially connected EEs, and procedures which cause the LS
instance to adapt to their presence. This document proposes a set of
procedures to meet these goals. Detailed protocol encodings are
not present, but will be added in future versions if the WG has
interest in proceeding in this direction.
2. Detection of Partially Connected EEs
Each PE in a particular LS instance must have some sort of control
plane relationship with each of the other PEs in the same LS
instance. (For the time being we ignore the situation in which PWs
are spliced together; the concepts discussed here are readily
extended to that case.)
There must be a status message, which we call the "Mesh Status"
message, which a PE sends to each of the other PEs in the same LS
instance. The Mesh Status message identifies the LS instance (by its
globally unique VPN identifier, for example), and lists the set of EE
pairs for which the originating PE has operational PWs. This message
would need to be resent whenever the list changes. As long as the
control protocol can reliably transport control messages, this
message would not have to be sent unless there is a change; in fact,
only changes would need to be sent. (However, this would require two
variants of the Mesh Status message: an "Add" and a "Remove".) A
PE's Mesh Status messages should also indicate which of the EEs are
locally attached to that PE.
Thus every PE in an LS instance maintains the Mesh Status of every
other PE supporting that same LS instance.
When the control connection to a particular remote PE is lost, the
Mesh Status of the remote PE is flushed, and no longer considered for
the purposes of Partially Connected EE Detection.
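The bookkeeping described above can be sketched as follows. This is only an illustration, not a proposed encoding: the class and field names are hypothetical, and it assumes that each reported EE pair is an ordered (from, to) pair naming the direction in which the PW is operational.

```python
from dataclasses import dataclass, field


@dataclass
class MeshStatus:
    """Mesh Status most recently learned from one PE (hypothetical layout)."""
    local_ees: set = field(default_factory=set)
    # Assumption: ordered (from_ee, to_ee) pairs, one per operational direction.
    operational: set = field(default_factory=set)


class MeshStatusTable:
    """One PE's record of every other PE's Mesh Status for one LS instance."""

    def __init__(self):
        self.by_pe = {}  # PE identifier -> MeshStatus

    def add(self, pe, local_ees=(), pairs=()):
        """Apply an "Add" variant of the Mesh Status message."""
        status = self.by_pe.setdefault(pe, MeshStatus())
        status.local_ees.update(local_ees)
        status.operational.update(pairs)

    def remove(self, pe, local_ees=(), pairs=()):
        """Apply a "Remove" variant of the Mesh Status message."""
        status = self.by_pe.get(pe)
        if status is not None:
            status.local_ees.difference_update(local_ees)
            status.operational.difference_update(pairs)

    def flush(self, pe):
        """Control connection to `pe` was lost: forget its Mesh Status."""
        self.by_pe.pop(pe, None)
```

Sending only changes keeps the messages small, but it depends on the control protocol delivering every "Add" and "Remove" reliably and in order.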
By including a pair of EEs in its Mesh Status messages, a PE is
stating that there is an OPERATIONAL PW binding the two EEs together,
not merely an established PW. Each PE is responsible for determining
whether each of its local PWs is operational in the outgoing
direction. This may require the use of some sort of per-PW test of
the data plane. It is advisable to construct the test for operational
status so as to avoid the possibility of flapping, perhaps by not
allowing a non-operational PW to return to operational status in less
than a specified time period. The test for operational status should
also ensure that a PW is not declared non-operational due to ordinary
network conditions, such as occasional packet loss, and that a PW is
not declared non-operational due to routing transients.
It is understood that it is much easier to lay down such requirements
than it is to devise procedures to meet them. The specification of
such procedures however is outside the scope of the current document.
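Purely to make the requirements concrete, one possible shape for such a test is sketched below. It is not a procedure proposed by this document. The probe mechanism, the threshold of three consecutive losses, the 30 second hold-down, and all names are assumptions chosen for illustration.

```python
class PwOperState:
    """Operational status of one PW in one direction (illustrative only)."""

    def __init__(self, fail_threshold=3, hold_down=30.0):
        self.fail_threshold = fail_threshold  # consecutive losses tolerated
        self.hold_down = hold_down            # seconds a PW must stay down
        self.operational = True               # assumption: starts operational
        self.consecutive_failures = 0
        self.last_failure = None

    def probe_result(self, ok, now):
        """Record one data plane probe result and return the current status."""
        if ok:
            self.consecutive_failures = 0
            # Return to operational only after the hold-down has elapsed,
            # so that a marginal PW does not flap.
            if (not self.operational
                    and now - self.last_failure >= self.hold_down):
                self.operational = True
        else:
            self.consecutive_failures += 1
            if not self.operational:
                # Still failing while down: restart the hold-down period.
                self.last_failure = now
            elif self.consecutive_failures >= self.fail_threshold:
                # Occasional isolated losses never reach the threshold.
                self.operational = False
                self.last_failure = now
        return self.operational
```

Requiring several consecutive losses tolerates occasional packet loss, and the hold-down damps flapping. Riding out routing transients would additionally require the threshold to span the expected convergence time.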
When a PE in a particular LS instance has received a Mesh Status
message from every other PE (that it knows about) in that instance,
it can compute the set {EE} of all the EEs in the LS instance. This
is the union of the set of EEs mentioned in all the Mesh Status
messages.
The IPLS or VPLS instance is fully meshed if and only if the
following condition holds:
For every PE p and every EE e, either e is one of p's local EEs,
or p reports an operational PW from each of its local EEs to e.
If this condition doesn't hold, there are one or more Partially
Connected EEs. The set of Partially Connected EEs is defined as
follows:
An EE e is "Partially Connected" if and only if there is some PE
p such that e is not locally attached to p, and p has a locally
attached EE e' such that there is either no operational PW from e
to e' or there is no operational PW from e' to e.
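The two definitions above can be sketched as a computation over the collected Mesh Status. The sketch assumes a hypothetical input shape: a mapping from each PE to a tuple (local EEs, set of ordered (from, to) EE pairs reported operational), including the computing PE's own entry.

```python
def all_ees(status):
    """The set {EE}: every EE mentioned in any PE's Mesh Status."""
    ees = set()
    for local, pairs in status.values():
        ees |= set(local)
        for src, dst in pairs:
            ees.update((src, dst))
    return ees


def is_fully_meshed(status):
    """True iff every PE reports an operational PW from each of its
    local EEs to every EE that is not local to it."""
    ees = all_ees(status)
    for local, pairs in status.values():
        for e in ees - set(local):
            for e_local in local:
                if (e_local, e) not in pairs:
                    return False
    return True


def partially_connected(status):
    """EEs e for which some PE p, not local to e, has a local EE e'
    with no operational PW from e to e' or from e' to e."""
    operational = set()
    for _, pairs in status.values():
        operational |= set(pairs)
    ees = all_ees(status)
    result = set()
    for local, _ in status.values():
        for e in ees - set(local):
            for e_local in local:
                if ((e, e_local) not in operational
                        or (e_local, e) not in operational):
                    result.add(e)
    return result
```

For example, with three EEs a, b and c on three different PEs, if only the PW from a to b is non-operational, this definition marks both a and b as Partially Connected and leaves c fully connected.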
If the configuration and/or auto-discovery procedures identify a set
of EEs whose local PE just happens to be down (or otherwise
unreachable), no PEs will have operational PWs for any of those EEs,
and the above procedures will not result in the determination that
there are any Partially Connected EEs. However, misconfigurations or
auto-discovery problems which cause different PEs to learn about
different sets of EEs will result in the detection of Partially
Connected EEs.
3. Actions Taken Upon Detection
Upon identification of a Partially Connected EE, an alarm should be
raised so that the network operators are aware of the situation.
In general, the LS service will not function properly if there are
Partially Connected EEs. It can however be made to function properly
if the Partially Connected EEs are removed from service entirely,
until such time as they become fully connected. In effect, once the
problematic EEs are removed from the mesh entirely, the LS service is
once again fully meshed, though with fewer EEs. Any users who connect
via the removed EEs will of course experience degraded service, if
not complete loss of service, but other users may continue to receive
service.
If a PE determines that one of its locally attached EEs is Partially
Connected, it should remove that EE from service. In the case of
VPLS, this means that an Emulated LAN interface [L2VPN-FRAMEWORK] is
brought down. In the case of IPLS, this means that the Attachment
Circuit to a particular set of CEs is brought down. PWs which are
bound to the Emulated LAN interface or Attachment Circuit should NOT
be disestablished and the testing of the data plane of such PWs
should continue.
If a PE determines that a remote EE is Partially Connected, the PE
will cease to send or receive data to or from that EE. The
corresponding PWs should NOT be disestablished, and the testing of
the data plane of such PWs should continue.
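The choice between the two reactions just described can be sketched as below. The enumeration and function names are hypothetical. In both cases the PWs stay established and data plane testing continues, so the sketch models only the change in forwarding behaviour.

```python
from enum import Enum


class Action(Enum):
    # Bring down the Emulated LAN interface (VPLS) or the
    # Attachment Circuit (IPLS) for a locally attached EE.
    REMOVE_LOCAL_EE_FROM_SERVICE = 1
    # Stop sending data to, and accepting data from, a remote EE.
    CEASE_DATA_TO_AND_FROM_REMOTE_EE = 2


def plan_actions(local_ees, partially_connected):
    """Map each Partially Connected EE to the reaction a PE takes.

    Neither reaction disestablishes a PW or stops its data plane tests.
    """
    plan = {}
    for ee in partially_connected:
        if ee in local_ees:
            plan[ee] = Action.REMOVE_LOCAL_EE_FROM_SERVICE
        else:
            plan[ee] = Action.CEASE_DATA_TO_AND_FROM_REMOTE_EE
    return plan
```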
There may be methods of returning the LS service to a full mesh which
do not require removing a Partially Connected EE from service
entirely. For example, in VPLS it may be possible to change a
Partially Connected EE from a hub to a spoke, thereby removing it
from the mesh without bringing it out of service [HUB-TO-SPOKE].
If, at some later time, an EE ceases to be Partially Connected,
normal operations can resume.
It must be understood that when an EE first becomes known, there will
be a period of time during which PEs are trying to bring up PWs to
it. From the time the first PW to/from it becomes operational to the
time the last PW to/from it becomes operational, the EE will be
detected as Partially Connected. As this is a normal transient, there
should be a specified period of time during which a newly discovered
EE may be Partially Connected before any action is taken.
Determination that a previously known EE has become Partially
Connected should cause immediate actions, however.
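A minimal sketch of this timing rule follows. It assumes that an EE counts as "previously known" once it has been observed fully connected at least once, which is one possible reading of the text. The 60 second grace period and all names are likewise illustrative.

```python
class EeTracker:
    """Decides when a Partially Connected EE should trigger action."""

    def __init__(self, grace=60.0):
        self.grace = grace        # seconds allowed for a new EE to mesh
        self.first_seen = {}      # EE -> time it first became known
        self.established = set()  # EEs seen fully connected at least once

    def observe(self, ee, is_partially_connected, now):
        """Return True if action should be taken for `ee` at time `now`."""
        self.first_seen.setdefault(ee, now)
        if not is_partially_connected:
            self.established.add(ee)
            return False
        if ee in self.established:
            # A previously known EE has degraded: act immediately.
            return True
        # A newly discovered EE: act only once the grace period expires.
        return now - self.first_seen[ee] >= self.grace
```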
If a PE detects that one of its PWs has ceased to be operational, the
remote EE does not necessarily get treated immediately as being
Partially Connected. Before declaring the EE to be Partially
Connected, the PE should wait a period of time to see if that EE
disappears from the Mesh Status messages generated by all the other
PEs. After all, a very likely cause for a PW to become non-
operational is for the remote PE to fail or to become unreachable.
As this will not result in a partial mesh, no special action needs to
be taken.
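The check made at the end of that waiting period can be sketched as follows. The input shape is an assumption: a mapping from each other PE whose Mesh Status is still held to a tuple (local EEs, set of ordered EE pairs reported operational). The Mesh Status of a PE whose control connection has been lost is assumed to have been flushed already.

```python
def still_reported(ee, others):
    """True if any remaining Mesh Status still mentions `ee`."""
    for local, pairs in others.values():
        if ee in local:
            return True
        if any(ee in pair for pair in pairs):
            return True
    return False


def verdict_after_wait(ee, others):
    """Classify a remote EE whose PW went non-operational, once the
    waiting period has expired."""
    if still_reported(ee, others):
        # Other PEs still see the EE, so the mesh really is partial.
        return "partially-connected"
    # The EE vanished everywhere, most likely because its PE failed or
    # became unreachable; the remaining mesh is still full.
    return "no-action"
```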
4. References
[HUB-TO-SPOKE] As suggested by Vach Kompella on the L2VPN mailing
list.
[IPLS] "IP over LAN Service (IPLS)", H. Shah, K. Arvind, E. Rosen, G.
Heron, V. Radoaca, draft-shah-ppvpn-ipls-02.txt, June 2003.
[L2VPN-FRAMEWORK] "L2VPN Framework", L. Andersson, E. Rosen, editors,
draft-ietf-l2vpn-l2-framework-00.txt, February 2003.
[PWE3-ARCH] "PWE3 Architecture", S. Bryant, P. Pate, editors,
draft-ietf-pwe3-arch-04.txt, June 2003.
[VPLS] "Virtual Private LAN Services over MPLS", M. Lasserre, V.
Kompella, et al., draft-ietf-l2vpn-vpls-ldp-00.txt, June 2003.
5. Author's Information
Eric C. Rosen
Cisco Systems, Inc.
1414 Massachusetts Avenue
Boxborough, MA, 01719
E-mail: erosen@cisco.com