CCAMP C. Villamizar, Ed.
Internet-Draft Infinera Corporation
Intended status: Informational March 6, 2011
Expires: September 7, 2011
Use of Multipath with MPLS-TP and MPLS
draft-villamizar-mpls-tp-multipath-01
Abstract
Many MPLS implementations have supported multipath techniques and
many MPLS deployments have used multipath techniques, particularly in
very high bandwidth applications, such as provider IP/MPLS core
networks. MPLS-TP has discouraged the use of multipath techniques.
Some degradation of MPLS-TP OAM performance cannot be avoided when
operating over current high bandwidth multipath implementations.
The tradeoffs involved in using multipath techniques with MPLS and
MPLS-TP are described. Requirements are discussed which enable full
MPLS-TP compliant LSP including full OAM capability to be carried
over MPLS LSP which are traversing multipath links. Other means of
supporting MPLS-TP coexisting with MPLS and multipath are discussed.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 7, 2011.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
Villamizar Expires September 7, 2011 [Page 1]
Internet-Draft MPLS-TP and MPLS Multipath March 2011
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Multipath Behavior of Widely Deployed Equipment . . . . . 4
1.2. New Requirements imposed by MPLS-TP . . . . . . . . . . . 5
1.3. Apparently Conflicting Requirements . . . . . . . . . . . 6
1.4. Requirements Language . . . . . . . . . . . . . . . . . . 6
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. Multipath Requirements . . . . . . . . . . . . . . . . . . . . 8
3.1. Scalability and Large Capacity Requirements . . . . . . . 8
3.2. MPLS-TP Requirements . . . . . . . . . . . . . . . . . . . 9
3.3. Discussion of Requirements . . . . . . . . . . . . . . . . 11
3.3.1. Requirements related to midpoint LSR . . . . . . . . . 11
3.3.1.1. MPLS Incoming Label Map (ILM) Size . . . . . . . . 12
3.3.1.2. ILM Size Impact on Equipment Density . . . . . . . 12
3.3.1.3. Topology Impact on ILM Size . . . . . . . . . . . 12
3.3.1.4. Multiple LSP Between Node Pairs . . . . . . . . . 13
3.3.2. Requirements related to Ingress LSR . . . . . . . . . 13
3.3.2.1. Reasons to Use MPLS/GMPLS Signaling . . . . . . . 14
3.3.2.2. MPLS Fault Response and CSPF Scaling . . . . . . . 15
3.3.3. Efficient Use of Multipath Capacity . . . . . . . . . 15
4. Multipath Current Practices . . . . . . . . . . . . . . . . . 16
4.1. Techniques Common to Multipath in Provider Networks . . . 16
4.1.1. Flow Identification . . . . . . . . . . . . . . . . . 17
4.1.2. Simple Multipath and Adaptive Multipath . . . . . . . 18
4.1.3. Traffic Split over Parallel Links . . . . . . . . . . 19
4.1.4. Traffic Split over Multiple Paths . . . . . . . . . . 19
4.2. Specific Types of Multipath . . . . . . . . . . . . . . . 20
4.2.1. ECMP Current Practices . . . . . . . . . . . . . . . . 20
4.2.2. Ethernet Link Aggregation Current Practices . . . . . 21
4.2.3. MPLS Link Bundling Current Practices . . . . . . . . . 21
5. Improving Support for MPLS-TP and Multipath Requirements . . . 22
5.1. Characteristics of MPLS-TP Multipath Solutions . . . . . . 22
5.1.1. Coexistence of MPLS and MPLS-TP . . . . . . . . . . . 23
5.1.2. Advantages and Disadvantages of Solutions . . . . . . 24
5.2. MPLS-TP Multipath Solution Set . . . . . . . . . . . . . . 25
5.2.1. MPLS as a Server Layer for MPLS-TP . . . . . . . . . . 25
5.2.2. MPLS-TP as a Server Layer for MPLS . . . . . . . . . . 26
5.2.3. Relax MPLS-TP OAM Requirements . . . . . . . . . . . . 27
5.2.3.1. MPLS-TP CC/CV OAM with Multipath . . . . . . . . . 28
5.2.3.2. MPLS-TP LM OAM with Multipath . . . . . . . . . . 29
6. Summary of Recommendations . . . . . . . . . . . . . . . . . . 30
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
8. Security Considerations . . . . . . . . . . . . . . . . . . . 32
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32
9.1. Normative References . . . . . . . . . . . . . . . . . . . 32
9.2. Informative References . . . . . . . . . . . . . . . . . . 32
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 35
1. Introduction
Today the requirement to handle large aggregations of traffic can be
met by a number of techniques which we will collectively call
multipath. Multipath applied to parallel links between the same set
of nodes includes Ethernet Link Aggregation [IEEE-802.1AX], link
bundling [RFC4201], or other aggregation techniques some of which may
be vendor specific. Multipath applied to diverse paths rather than
parallel links includes Equal Cost MultiPath (ECMP) as applied to
OSPF, ISIS, or BGP, and equal cost LSP, as described in Section 4.
Various multipath techniques have strengths and weaknesses described
in Section 4.2.
The term composite link is more general than terms such as link
aggregation (which is specific to Ethernet) or ECMP (which implies
equal cost paths within a routing protocol). The use of the term
composite link here is consistent with the broad definition in
[ITU-T.G.800]. Multipath is very similar to composite link, but
specifically excludes inverse multiplexing.
1.1. Multipath Behavior of Widely Deployed Equipment
Identical load balancing techniques are used for multipath both over
parallel links (for example IP/MPLS over Ethernet link aggregation)
and over diverse paths (for example, IP ECMP, IP/MPLS ECMP over
multiple LSP or link bundling over LSP component links).
Large aggregates of IP traffic do not provide explicit signaling to
indicate the expected traffic loads. Large aggregates of MPLS
traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which
are signaled using RSVP-TE extensions do provide explicit signaling
which includes the expected traffic load for the aggregate. LSP
which are signaled using LDP do not provide an expected traffic load.
MPLS LSP may contain other MPLS LSP arranged hierarchically. When an
MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
payload, there is no signaling associated with these client (inner)
LSP. Therefore even when using RSVP-TE signaling there may be
insufficient information provided by signaling to adequately
distribute load across a multipath link.
A set of label stack entries that is unique across the ordered set of
label numbers can safely be assumed to contain a group of (one or
more) flows. The reordering of MPLS traffic (except MPLS-TP) can
therefore be considered to be acceptable unless reordering occurs
within traffic containing a common unique set of label stack entries.
Existing load splitting techniques take advantage of this property in
addition to looking beyond the bottom of the label stack and
determining if the payload is IPv4 or IPv6 to load balance traffic
based on IP addresses.
A large aggregate of IP traffic may be subdivided into groups of
flows using a hash on the IP source and destination addresses. IP
microflows are described in [RFC2475] and clarified in [RFC3260].
For MPLS traffic that is not carrying IP, a similar hash can be
performed on the set of labels in the label stack. These techniques
subdivide traffic into groups of flows for the purpose of load
balancing traffic across the aggregated capacity of a multipath link.
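The flow identification described above can be sketched as follows.
This is a minimal illustration only, assuming CRC-32 as the hash and a
simple modulo mapping to component links. The function names
(flow_key_from_ip, flow_key_from_labels, select_component) are
hypothetical and are not taken from any implementation or from this
document.

```python
import ipaddress
import zlib
from typing import Sequence


def flow_key_from_ip(src: str, dst: str) -> bytes:
    """Build a hash key from the IP source and destination addresses."""
    return ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed


def flow_key_from_labels(label_stack: Sequence[int]) -> bytes:
    """Build a hash key from the ordered 20-bit label values in the stack."""
    return b"".join(label.to_bytes(3, "big") for label in label_stack)


def select_component(key: bytes, num_components: int) -> int:
    """Map a flow key to a component link index.

    CRC-32 and the modulo mapping are assumptions made for illustration;
    no particular hash function is recommended here.
    """
    return zlib.crc32(key) % num_components


# All packets with the same key land on the same component link, so
# ordering within a group of flows is preserved.
ip_choice = select_component(flow_key_from_ip("192.0.2.1", "198.51.100.7"), 4)
mpls_choice = select_component(flow_key_from_labels([1000, 2000]), 4)
```

Because the key is derived only from fields that are constant within a
flow, the mapping is stable per flow while different flows spread
across the component links.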
Attempting to resolve years of discussion as to whether a hash based
approach provides a sufficiently even load balance using any
particular hashing algorithm or method of distributing traffic across
a set of component links is outside of the scope of this document.
For the purpose of discussing existing widely deployed
implementations, it is sufficient to say that hash based techniques
have proven to be at least satisfactory through their widespread
deployment (which has continued to grow for more than two
decades).
The current load balancing techniques are referenced in [RFC4385] and
[RFC4928], though few specifics are provided in these two RFCs. The
use of three hash based approaches is described in [RFC2991] and
[RFC2992], though other techniques with very similar outcome are
used. A means to identify flows within pseudowires (when flows are
present, since not all PW types contain discernible flows) is
described in [I-D.ietf-pwe3-fat-pw].
1.2. New Requirements imposed by MPLS-TP
MPLS-TP OAM violates the assumption made in prior multipath
implementations that it is safe to reorder traffic within an LSP.
This assumption is common (if not universal) in multipath
implementations which use hashing techniques for load balancing. The
use of multipath can impact CC/CV (connectivity check, connectivity
verification) and LM (loss measurement) and DM (delay measurement)
[I-D.ietf-mpls-tp-oam-framework].
MPLS-TP CC/CV, DM, and LM OAM packets must take the same path as the
payload. If the label stack for the payload contains an LSP and a PW
label beneath it (one of one or more additional PW labels), then the
payload will be load split over the multipath. The OAM packets will
have a GAL label beneath the LSP label [RFC5586]. With no other
label beneath the GAL label, the OAM traffic will take only one path
and the set of PW will take multiple paths (though any one PW will
take one path if a flow label is not used).
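The divergence between OAM and payload paths can be illustrated with
the following sketch. It assumes a multipath that hashes the entire
label stack, including the GAL (label value 13 [RFC5586]), using
CRC-32. The hash choice and the names component_for and paths_used are
assumptions for illustration only.

```python
import zlib

GAL = 13  # Generic Associated Channel Label value [RFC5586]


def component_for(label_stack, num_components):
    """Pick a component link by hashing the ordered label values."""
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return zlib.crc32(key) % num_components


def paths_used(lsp_label, pw_labels, num_components):
    """Return (component used by OAM, set of components used by PW payload).

    OAM packets carry only the GAL beneath the LSP label, so they always
    hash to a single component.  Each PW hashes on its own PW label and
    may therefore be placed on a different component than the OAM.
    """
    oam = component_for([lsp_label, GAL], num_components)
    payload = {component_for([lsp_label, pw], num_components)
               for pw in pw_labels}
    return oam, payload


# Hypothetical label values: one LSP carrying 64 PW over 4 component links.
oam_component, payload_components = paths_used(5000, range(100000, 100064), 4)
```

Any PW whose component differs from oam_component is not covered by
the CC/CV, DM, or LM measurements taken on the LSP.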
With the current OAM CC/CV definition and current multipath
practices, OAM CC/CV functionality may not cover the forwarding path
for a particular PW within the LSP at any given multipath along the
path. The existing OAM CC/CV will provide a check for the condition
where the entire multipath becomes unavailable (goes down or the
particular LSP is preempted due to reduced multipath capacity).
There is no assurance that DM OAM is measuring the delay of the
forwarding path for a particular PW within the LSP with the current
OAM DM definition and current multipath practices. In addition, if
packets are reordered, OAM LM accuracy can be (and generally is)
affected.
1.3. Apparently Conflicting Requirements
The existing multipath techniques address specific requirements.
MPLS-TP requirements are in conflict with multipath, at least as
currently implemented.
The underlying requirements that motivated the current use of
multipath are not in conflict with the use of MPLS-TP. Section 3
describes these requirements in greater detail. Section 4 describes
current practices in greater detail. Section 5 describes means of
better supporting both MPLS-TP and multipath requirements.
1.4. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Definitions
Multipath
The term multipath includes all techniques in which
1. Traffic can take more than one path from one node to a
destination.
2. Individual packets take one path only.
3. Packets are neither resequenced nor subdivided and reassembled
at the receiving end.
4. The paths may be:
a. parallel links between two nodes, or
b. specific paths across a network to a destination
node, or
c. links or paths to a next hop used to reach a
common destination.
Link Bundle
Link bundling is a multipath technique specific to MPLS
[RFC4201]. Link bundling supports two modes of operation.
Either an LSP can be placed on one component link of a link
bundle, or an LSP can be load split across all members of the
bundle. There is no signaling defined which allows a per LSP
preference regarding load split, therefore whether to load split
is generally configured per bundle and applied to all LSP across
the bundle.
Link Aggregation
The term "link aggregation" generally refers to Ethernet Link
Aggregation [IEEE-802.1AX] as defined by the IEEE. Ethernet Link
Aggregation defines a Link Aggregation Control Protocol (LACP)
which coordinates inclusion of LAG members in the LAG.
Link Aggregation Group (LAG)
A group of physical Ethernet interfaces that are treated as a
logical link when using Ethernet Link Aggregation is referred to
as a Link Aggregation Group (LAG).
Equal Cost Multipath (ECMP)
Equal Cost Multipath (ECMP) is a specific form of multipath in
which the costs of the links or paths must be equal in a given
routing protocol. The load may be split equally across all
available links (or available paths), or the load may be split
proportionally to the capacity of each link (or path).
Loop Free Alternate Paths
"Loop-free alternate paths" (LFA) are defined in RFC 5714,
Section 5.2 [RFC5714] as follows. "Such a path exists when a
direct neighbor of the router adjacent to the failure has a path
to the destination that can be guaranteed not to traverse the
failure." Further detail can be found in [RFC5286]. LFA as
defined for IPFRR can be used to load balance by relaxing the
equal cost criteria of ECMP, though IPFRR defined LFA for use in
selecting protection paths. When used with IP, proportional
split is generally not used. LFA use in load balancing may be
implemented but is rare or non-existent in deployments.
Composite Link
The term Composite Link had been a registered trademark of Avici
Systems, but was abandoned in 2007. The term composite link is
now defined by the ITU in [ITU-T.G.800]. The ITU definition
includes multipath as defined here, plus inverse multiplexing
which is explicitly excluded from the definition of multipath.
Inverse Multiplexing
Inverse multiplexing either transmits whole packets and
resequences the packets at the receiving end or subdivides
packets and reassembles the packets at the receiving end.
Inverse multiplexing requires that all packets be handled by a
common egress packet processing element and is therefore not
useful for very high bandwidth applications.
Component Link
The ITU definition of composite link in [ITU-T.G.800] and the
IETF definition of link bundling in [RFC4201] both refer to an
individual link in the composite link or link bundle as a
component link. The term component link is applicable to all
multipath.
LAG Member
Ethernet Link Aggregation as defined in [IEEE-802.1AX] refers to
an individual link in a LAG as a LAG member.
3. Multipath Requirements
This section enumerates two sets of requirements. The first set
includes those requirements imposed by the need for scalability and
very large capacity links and very large capacity LSP and are
enumerated in Section 3.1. The second set of requirements are those
imposed by the needs of MPLS-TP and are enumerated in Section 3.2.
Discussion of these requirements is provided in Section 3.3.
Section 4 describes multipath techniques which are implemented and
deployed today. Section 5 enumerates derived requirements which
focus on means to support the requirements in Section 3.1 and
Section 3.2 with minimal modifications to existing multipath
techniques. A summary of recommendations is provided in Section 6.
3.1. Scalability and Large Capacity Requirements
Networks today may support thousands or tens of thousands of nodes in
total. This large number of nodes is typically arranged in tiers to
improve scalability through aggregation of signaling and aggregation
of traffic. The innermost tier, most commonly referred to as the
network core, may support interconnection of adjacent sites with
hundreds of gigabits or terabits of capacity.
The physical interface of choice today is 10GbE with migration toward
100GbE expected to begin in the near future. SONET and OTN are also
in use, but are today also limited to 10Gb/s or 40Gb/s, with 100Gb/s
availability (OTN ODU4) expected in the near future. With core link
capacities of terabits today and tens of terabits expected in the
near future, multipath is needed.
R#1 Multipath MUST support multipath links that are well in
excess of the largest component link and well in excess of the
capacity of a single packet processing element.
R#2 Multipath SHOULD support direct service bearing LSP carrying
Internet traffic within the network core with capacity in excess
of the largest component link and in excess of the capacity of a
single packet processing element.
R#3 Aggregation of LSP using hierarchy (as defined in [RFC4206]) may
be necessary to reduce the number of MPLS labels in use within a
network tier containing a large number of nodes. This
aggregation SHOULD NOT be constrained by multipath limitations.
R#4 LSP containing the aggregate of other LSP SHOULD be capable of
exceeding the capacity of the largest component link and the
capacity of a single packet processing element.
R#5 It SHOULD be possible to support load split of traffic which is
very efficient in its utilization of available capacity, subject
to some limitations due to conflicting requirements. The load
split SHOULD support sharing of total capacity across the entire
multipath where some LSP may make use of capacity set aside for
other LSP but unused. This load split SHOULD be as
free of bin packing issues as possible except when moving LSP to
other component links would conflict with other requirements.
3.2. MPLS-TP Requirements
MPLS-TP requirements related to multipath are primarily related to
prohibiting out-of-order delivery of traffic for reasons of OAM fate
sharing. Specific requirements related to OAM are provided in
"MPLS-TP OAM Framework", Section 4.6, Section 5.5.3, and Section
6.2.3 [I-D.ietf-mpls-tp-oam-framework].
The following requirement is currently met with no changes to
existing multipath implementations.
R#6 Traffic within an MPLS-TP PW MUST NOT be reordered unless
specifically allowed. This is met if a PW control word is used
[RFC4385]. Reordering may be specifically allowed using a PW
flow label [I-D.ietf-pwe3-fat-pw].
The following requirement can only be met with existing multipath
techniques using MPLS link bundling [RFC4201] if LSR are configured
to place an LSP on only a single component rather than splitting some
or all LSP across the set of components. Using link bundling with
all LSP constrained to use a single component has well known
disadvantages (see Section 4.2.3). Other forms of multipath as
currently defined do not meet this requirement (see Section 4.2).
R#7 Traffic within an MPLS-TP LSP MUST NOT be reordered if full OAM
capability is required of the MPLS-TP LSP
[I-D.ietf-mpls-tp-oam-framework].
The remaining MPLS-TP requirements are related to the scale of a
deployed MPLS-TP network and have the greatest impact on the network
core. These are practical requirements mostly related to scalability
but specific to MPLS-TP.
R#8 Service PWs and/or service bearing LSPs may form a fairly dense
mesh of LSPs from edge to edge over a very large set of nodes.
Some means MUST be available to support such usage of MPLS-TP.
See Section 3.3.1.1 for a discussion of ILM size limitations
that are relevant to this requirement.
R#9 For an MPLS-TP LSP to be fully compliant, all payload and OAM
traffic on the MPLS-TP LSP MUST traverse the same physical
path. OAM traffic taking the same path as payload (service
bearing) traffic is known as the "fate sharing" requirement
(see RFC 5860, Section 2.1.3 [RFC5860]).
R#10 For large networks, MPLS hierarchy [RFC4206] can be used to
reduce the number of LSP from the large number which would be
needed to carry all service bearing MPLS-TP LSP through the
network core. For networks configured through the management
plane, label stacking can be used to aggregate LSP, though the
signaling described in [RFC4206] is not used. Any MPLS-TP
constraints which impact this ability to aggregate LSP SHOULD
be optional. If MPLS-TP constraints must be relaxed in some
deployments, such deployments MAY be referred to as partially
MPLS-TP compliant.
R#11 For large networks using link bundling to support large
aggregations of MPLS-TP traffic, and using MPLS hierarchy, PSC
LSP (see [RFC4206]) or label stacking which are providing a
server layer within the network core and carrying many service
bearing MPLS-TP LSP SHOULD be capable of supporting capacity in
excess of any single link bundle component. In meeting this
requirement the server layer LSP need not be an MPLS-TP LSP as
long as it is capable of providing a server layer which can
support fully compliant MPLS-TP LSP.
LSP which are configured entirely from the management plane rather
than through use of a control plane need not use the MPLS PSC portion
of the hierarchy as specified in RFC 4206. However, hierarchy is still
needed in the label stack.
3.3. Discussion of Requirements
There is a tradeoff between using MPLS-TP as a server layer to gain
the benefits of MPLS-TP and using MPLS to gain the benefits of MPLS. The
benefits of MPLS-TP include the ability to run without the OSPF-TE,
ISIS-TE, and RSVP-TE control protocols, and MPLS-TP OAM. The
benefits of MPLS include more efficient use of multipath capacity due
to removal of MPLS-TP constraints.
A requirement for a very large server layer traffic flow within the
network core can be accommodated using multiple parallel MPLS-TP LSP.
This increases the number of LSP required which itself is a drawback.
This also results in a bin packing problem if the service bearing
MPLS-TP LSP do not require the same capacity and are not all small
multiples of a common capacity increment. For example, if LSP are
not all 10Gb/s, or they are not only 10Gb/s and 40Gb/s, then bin
packing problems can occur. This use of MPLS-TP can also result in
less opportunity for statistical multiplexing with very large
aggregates of lower priority non-TP IP/MPLS traffic (see
Section 4.2.3 and Section 5.2.2 for further details on bin packing
problems and loss of efficiency with MPLS-TP as a server layer).
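The bin packing problem can be illustrated with a small placement
sketch. It assumes that each LSP must be confined to one component
link and uses a first-fit decreasing heuristic. The heuristic, the
function name place_lsps, and the capacities used are illustrative
assumptions rather than a description of any deployed behavior.

```python
def place_lsps(lsp_sizes, link_capacities):
    """First-fit decreasing placement, each LSP confined to one link.

    Returns (placement, unplaced): placement maps a link index to the
    list of LSP sizes placed on it, and unplaced lists the LSP sizes
    that fit on no single link.
    """
    free = list(link_capacities)
    placement = {i: [] for i in range(len(free))}
    unplaced = []
    for size in sorted(lsp_sizes, reverse=True):
        for i, spare in enumerate(free):
            if size <= spare:
                placement[i].append(size)
                free[i] -= size
                break
        else:
            # No single component link has enough spare capacity.
            unplaced.append(size)
    return placement, unplaced


# Five 6Gb/s LSP offered to three 10Gb/s component links.  The offered
# load (30Gb/s) equals the total capacity, yet only three LSP fit.
capacities = [10, 10, 10]
placement, unplaced = place_lsps([6, 6, 6, 6, 6], capacities)
stranded = sum(capacities) - sum(sum(v) for v in placement.values())
```

In this example two LSP cannot be placed and 12Gb/s of capacity is
stranded, whereas LSP sizes that are all a common increment (for
example, all 10Gb/s) would pack without loss.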
The following subsections provide further detail related to the
requirements enumerated in Section 3.1 and Section 3.2.
3.3.1. Requirements related to midpoint LSR
Midpoint LSR must support a very large number of LSP. This places
requirements on the ILM size. If a control plane is used this also
places requirements on the speed of processing RSVP-TE messages. As
long as RSVP-TE ERO contain only strict hops, the processing is
limited to connection admission, label assignment, and forwarding
hardware programming of the label swap operation.
3.3.1.1. MPLS Incoming Label Map (ILM) Size
The MPLS label entry is 32 bits of which the label itself is 20 bits
[RFC3032]. This allows 2^20 or 1,048,576 values minus the 16
reserved label values. The Incoming Label Map (ILM) (see RFC 3031,
Section 1.11 [RFC3031]) is generally much smaller. Circa 2000, ILM
sizes of 4K-32K were common. Circa 2010, ILM sizes of 64K-256K are
more common in core LSR.
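The label stack entry layout of [RFC3032] and the resulting label
space can be sketched as follows. The function name
decode_label_stack_entry and the constant USABLE_LABELS are
hypothetical names used only for this illustration.

```python
def decode_label_stack_entry(entry: int) -> dict:
    """Split a 32-bit MPLS label stack entry into its fields [RFC3032]."""
    if not 0 <= entry < 2**32:
        raise ValueError("label stack entry must fit in 32 bits")
    return {
        "label": entry >> 12,      # 20-bit label value
        "tc": (entry >> 9) & 0x7,  # 3-bit Traffic Class field [RFC5462]
        "s": (entry >> 8) & 0x1,   # bottom of stack flag
        "ttl": entry & 0xFF,       # 8-bit time to live
    }


# 20-bit label space minus the 16 reserved label values (0 through 15).
USABLE_LABELS = 2**20 - 16

# Example entry: label 100, TC 5, bottom of stack set, TTL 64.
example = decode_label_stack_entry((100 << 12) | (5 << 9) | (1 << 8) | 64)
```

An ILM of 64K to 256K entries therefore covers only a fraction of the
1,048,560 usable label values.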
Putting a bound on ILM size has two effects. First, it allows LSR to
offer higher power and space density. Second, for deployments which
use a control plane and support restoration, speed of restoration is
dramatically improved when a smaller number of LSP are supported.
3.3.1.2. ILM Size Impact on Equipment Density
For some architectures, bounding the ILM size allows the ILM to be
supported without forwarding memory external to the forwarding IC.
This is a practical consideration as the power reduction and board
space reduction can allow an LSR to achieve higher power and space
density.
Reducing external memories reduces power consumed and therefore
reduces cooling problems. In addition there are board space
reductions. This results in reduced space as well as power.
In today's networks, which predominantly use MPLS/GMPLS OSPF-TE or
ISIS-TE and RSVP-TE signaling, the computational limitations
described in Section 3.3.2.2 are the limiting factor. Reduction in
space and power due to smaller ILM are then a secondary consequence
of the signaling scaling issue.
3.3.1.3. Topology Impact on ILM Size
In a network tier with N nodes, a worst case cutset has N/2 nodes on
either side of the cutset. Given that a full mesh of LSP
connectivity is needed in the network core, the cutset therefore
carries N^2/4 LSP. For example, if N is 400, the cutset carries a
minimum of 40,000 LSP to achieve a full mesh. If the core has over
2,000 nodes, then the cutset carries over 1,000,000 LSP. Since the
MPLS label space is only 20 bits, a full mesh within an entire
provider network with no hierarchy could easily exceed the MPLS label
number space. Use of hierarchy can solve this problem.
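The cutset arithmetic above can be checked with a short calculation.
The function name cutset_lsp_count is a hypothetical name used only
for this sketch, and the count follows the N^2/4 figure used in the
text.

```python
def cutset_lsp_count(n_nodes: int) -> int:
    """LSP crossing a worst case cutset that splits N nodes in half.

    Each of the N/2 nodes on one side needs an LSP to each of the N/2
    nodes on the other side, giving N^2/4 for even N.
    """
    half = n_nodes // 2
    return half * (n_nodes - half)


LABEL_SPACE = 2**20  # size of the 20-bit MPLS label number space

small_core = cutset_lsp_count(400)    # 40,000 LSP
large_core = cutset_lsp_count(2000)   # 1,000,000 LSP
at_limit = cutset_lsp_count(2048)     # equals the 20-bit label space
```

At 2,048 nodes the cutset count reaches 2^20, so a flat full mesh of
that size would exhaust the label number space before any additional
LSP for protection or service separation are considered.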
Typically there is more than one LSP between any pair of LSR in the
network core. Protection is one source of additional LSP. More than
one LSP may be required to carry traffic with very different
requirements. See Section 3.3.1.4.
The result is that even considering only the ILM size, the number of
nodes in a full mesh of LSP must be limited to well under 1,000. If
two links in a cutset supporting a large number of LSP incur a fault,
then the nodes bordering the remaining links in the cutset must
process a very large number of RSVP-TE PATH and RESV messages and the
connection admission requests and ILM allocation operations that are
required as a result.
3.3.1.4. Multiple LSP Between Node Pairs
A full mesh of N nodes will have N*(N-1) unidirectional LSP or
N*(N-1)/2 bidirectional LSP if there is only one LSP with any given
pair of nodes as ingress and egress. There may be more than one LSP
with any given pair of nodes as ingress and egress to meet protection
requirements or to meet certain quality of service requirements.
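The full mesh counts above can be expressed as a small calculation.
The function name full_mesh_lsp_count and the lsp_per_pair parameter
are hypothetical. The parameter is a simple multiplier standing in
for the additional LSP needed for protection or quality of service.

```python
def full_mesh_lsp_count(n_nodes, bidirectional=False, lsp_per_pair=1):
    """Number of LSP in a full mesh of N nodes.

    N*(N-1) unidirectional LSP, or N*(N-1)/2 bidirectional LSP, scaled
    by the number of LSP carried between each ingress and egress pair.
    """
    pairs = n_nodes * (n_nodes - 1)
    if bidirectional:
        pairs //= 2
    return pairs * lsp_per_pair


unidirectional = full_mesh_lsp_count(400)                     # 159,600
bidirectional = full_mesh_lsp_count(400, bidirectional=True)  # 79,800
doubled = full_mesh_lsp_count(400, lsp_per_pair=2)            # 319,200
```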
If GMPLS protection [RFC4426] is used, the number of LSP
is doubled with end-to-end (path) protection, but more than doubled
with span protection. If MPLS FRR [RFC4090] is used, the number of
LSP is increased only slightly with the (more common) facilities
backup technique, but more than doubled with the one-to-one backup
technique.
All services between a pair of core nodes may be carried over a
single unsignaled E-LSP [RFC3270] if the eight TC values [RFC5462]
are sufficient and the requirements of these services are sufficiently
similar. If more than eight PHB are required, more LSP will be
required. If services require preemption, or have different
protection needs, then multiple LSP per pair of core nodes are
required. If services have different delay requirements, this too
may require multiple LSP per pair of core nodes.
The total number of LSP at a cutset needs to be constrained for two
reasons. First the number of LSP must fit in the 20 bit label field
or the smaller number of labels supported by most LSR. Second is a
need to reduce the amount of signaling that would be required if
restoration was needed to cover a multiple fault (if restoration is
not supported multiple faults can result in otherwise avoidable
outages which persist until a physical repair or manual intervention
is completed).
3.3.2. Requirements related to Ingress LSR
Where traffic enters a provider network tier such as the core, LSR
serve as ingress to PSC LSP if hierarchy is used. If RSVP-TE
signaling is used, ingress must perform CSPF if fully dynamic MPLS
routing is used. Even when working and protection paths are
configured with explicit paths computed offline, when a multiple
fault occurs, if restoration is supported, then CSPF must be run. It
is this multiple fault scenario which generally dictates scalability.
3.3.2.1. Reasons to Use MPLS/GMPLS Signaling
Dynamic routing is necessary in order to provide restoration which is
as robust as possible in the presence of multiple faults while still
providing efficient utilization of resources.
Legacy transport networks offer protection which requires dedicated
protection resources. If resources are allocated through the
management plane, then restoration support is either not provided at
all or extremely slow at best. More modern transport equipment which
supports fast restoration requires signaling which is generally
provided using GMPLS.
IP/MPLS networks typically make use of protection which offers
sharing of protection resources or more commonly make use of zero
bandwidth allocation on protection paths. The use of zero bandwidth
allocation provides robust protection of preferred traffic as long as
preferred traffic is given queuing priority and preferred traffic
levels are low enough that adequate protection resources are
available for preferred traffic regardless of the protection path
taken. This assumption is not violated in networks which are
dominated by Internet traffic and carry a minority of preferred
traffic.
When a single fault occurs, protection should restore traffic flow
quickly, with a typical target being 45 msec. Many deployments are
configured such that LSR run CSPF after a fault to obtain a new
protection path for what is now effectively the working path, or
reroute the working LSP and then create a new protection LSP.
Multiple faults which are not accounted for by SRLG are fairly
common. In many cases, such as earthquake, bridge collapse, train
wreck, flood, it is impractical to account for the specific multiple
fault in the SRLG set. When this does occur, fast restoration is
often required for a large number of LSP for which both the working
and protect paths are affected. In this case, a long convergence
time would result in a more lengthy outage for those LSP for which
the multiple fault was service affecting.
For core Internet services and for many non-Internet core services,
an inability to reach any one point in the network from another for a
significant length of time due to a fault which is correctable, even
if it is a multiple fault, is unacceptable. These services require
restoration at some layer.
3.3.2.2. MPLS Fault Response and CSPF Scaling
For most core networks MPLS/GMPLS signaling is required at some layer
for reasons described in Section 3.3.2.1. In order for restoration
to occur quickly, scaling issues must be considered and addressed,
including network topology impacts on scaling. These scaling issues
are dominated by CSPF computations and OSPF or ISIS flooding impact.
For a given ingress in a full mesh of LSR, a fault can result in a
very large number of affected LSP. At midpoint LSR the worst case
number of connection acceptance decisions can be very large. The
computational load per LSP on connection acceptance at midpoint LSR
is small but the reflooding of available bandwidth can also
contribute significant load.
At LSP ingress, the number of CSPF computations imposes scaling
limitations. CSPF computation time is proportional to the number of
nodes in a mesh and the total number of links. If the average node
degree remains constant, then the total number of links is
proportional to the number of nodes. The result is a single CSPF
time with order N*log2(N) time complexity (where N is the number of
nodes in the mesh). If the worst case number of LSP affected by a
fault also grows proportionally to N, then the total amount of
computation is order N^2*log2(N). The amount of computation grows at
a rate of greater than the square of the growth in the number of
nodes.
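The growth described above can be illustrated numerically. The
following is a minimal sketch, assuming unit constants so that only
relative growth is meaningful. The function names are hypothetical
and are used here for illustration only.

```python
import math

def single_cspf_cost(n):
    """Relative cost of one CSPF run: order N*log2(N)."""
    return n * math.log2(n)

def worst_case_total_cost(n):
    """Relative total cost when the number of affected LSP grows with N."""
    return n * single_cspf_cost(n)

def growth_factor(n_old, n_new):
    """Ratio of total CSPF work after the mesh grows from n_old to n_new."""
    return worst_case_total_cost(n_new) / worst_case_total_cost(n_old)

# Doubling the number of nodes more than quadruples the total work.
print(growth_factor(100, 200) > 4)
```

The same ratio read in reverse suggests why constraining the size of
a full mesh reduces the worst case CSPF load by more than the square
of the reduction in nodes.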
If restoration is not supported, any multiple fault will result in a
lengthy outage. If restoration is supported, constraining the size
of a full mesh will very significantly reduce the CSPF computation
load and the reflooding overhead and very significantly improve the
worst case restoration time.
3.3.3. Efficient Use of Multipath Capacity
Multipath load split based on hashing the IP addresses or MPLS labels
is far from perfect, though it is widely implemented and widely
deployed. For the vast majority of traffic, which is predominantly
Internet traffic, the underlying assumption that traffic is quite
evenly distributed across a hash space is valid. For a mix of
Internet traffic and fairly persistent large microflows, adaptive
multipath has proven effective (see Section 4.1.2).
The bandwidth reservations of LSP carrying Internet traffic are
merely predictions of required capacity. Often a significant
percentage of traffic can shift among a set of LSP. A great deal of
efficiency is gained in the presence of such shifts through the
ability to dynamically share the available capacity on a multipath.
The introduction of a minority of higher priority (and higher gross
margin) services to predominantly Internet traffic yields an
additional opportunity to make more efficient use of capacity. These
higher priority services on average significantly underutilize their
guaranteed capacities. The average over the entire set of such
services is fairly predictable. The capacity allocated to these
services but unused can be used as Internet capacity. Some small
probability exists that these services will make use of significantly
more capacity than predicted, up to their guaranteed capacities, but
the consequence of this unlikely occurrence is a reduction in
capacity available to the Internet traffic for which capacity is not
guaranteed. This practice allows high margin services to be
delivered at substantially lower cost with very little risk to
Internet traffic and no risk at all to the higher priority services.
For the reasons above, current multipath techniques offer efficient
use of multipath capacity. Changes to multipath MUST NOT sacrifice
this efficiency where it is not necessary to meet other requirements.
4. Multipath Current Practices
Multipath takes many forms. These include the use of ECMP in various
protocols, Ethernet Link Aggregation, and Link Bundling. The
specifications for each of these forms of multipath provide limited
characterization of external behavior, where any guidance is provided
at all. This section summarizes current practices among products
which are currently or have in the past been deployed successfully in
Internet service provider networks and content provider networks.
Much of the existing information on multipath current practices is
summarized in Section 1.1. With the exception of the work in PWE3
and minimal mention in LDP, very little consideration for multipath
impact on new protocols has been documented.
This section is divided into two parts. First is documentation of
techniques common to all forms of multipath in Section 4.1. Second
is application of these techniques and unique characteristics of
specific forms of multipath in Section 4.2.
4.1. Techniques Common to Multipath in Provider Networks
There is a dramatic difference between the multipath techniques used
for pure Layer-2 Ethernet switches intended for enterprise networks
and the multipath techniques used for large provider core networks.
Many enterprise switches use only the Ethernet MAC in load balancing,
though the argument that such networks may not be carrying IP or
MPLS traffic at all is rarely cited as a reason today. The routers
and/or LSR used in large provider networks are assumed to be carrying
IP traffic and/or MPLS traffic where the MPLS traffic is
predominantly carrying IP traffic as its payload.
Most of the multipath techniques used for large provider core
networks are common across all types of multipath. This is because
the traffic being handled by multipath in large provider networks is
predominantly IP or IP over MPLS. The following paragraph is quoted
from RFC 4928, Section 2, "Current ECMP Practices" [RFC4928]:
In the early days of MPLS, the payload was almost exclusively IP.
Even today the overwhelming majority of carried traffic remains
IP. Providers of MPLS equipment sought to continue this IP ECMP
behavior. As shown above, it is not possible to know whether the
payload of an MPLS packet is IP at every place where IP ECMP needs
to be performed. Thus vendors have taken the liberty of guessing
the payload. By inspecting the first nibble beyond the label
stack, existing equipment infers that a packet is not IPv4 or IPv6
if the value of the nibble (where the IP version number would be
found) is not 0x4 or 0x6 respectively. Most deployed LSRs will
treat a packet whose first nibble is equal to 0x4 as if the
payload were IPv4 for purposes of IP ECMP.
This observation led to the specification of the PW Control Word
[RFC4385] such that the values 4 and 6 which could be mistaken for
IPv4 or IPv6 were avoided. More accurately, [RFC4928] was written to
document the reasons for this decision made in [RFC4385].
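The first nibble heuristic quoted above can be sketched as follows.
This is a minimal illustration, assuming the label stack entry layout
of a 20 bit label, 3 bit traffic class, 1 bit bottom of stack flag,
and 8 bit TTL. The function names and return strings are hypothetical
and do not describe any particular LSR implementation.

```python
def label_entry(label, bottom, tc=0, ttl=64):
    """Encode one 4 byte label stack entry (label, TC, S bit, TTL)."""
    value = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return value.to_bytes(4, "big")

def first_nibble_guess(packet):
    """Skip the label stack, then guess the payload from its first nibble."""
    offset = 0
    while True:
        if offset + 4 > len(packet):
            raise ValueError("truncated label stack")
        entry = int.from_bytes(packet[offset:offset + 4], "big")
        offset += 4
        if (entry >> 8) & 0x1:  # S bit set: this entry is the bottom of stack
            break
    if offset >= len(packet):
        return "unknown"  # nothing follows the label stack
    nibble = packet[offset] >> 4
    if nibble == 0x4:
        return "ipv4"
    if nibble == 0x6:
        return "ipv6"
    return "non-ip"
```

A payload that begins with a PW Control Word has a first nibble of
zero, so this sketch classifies it as "non-ip" and it would not be
mistaken for IPv4 or IPv6.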
4.1.1. Flow Identification
IP traffic in a large provider core network contains a very large
number of very short lived microflows (refer to the definition of
microflow in [RFC2475]). The number of flows has in the past been
estimated as many millions or many tens of millions. Many of the
flows exchange as few as two packets (DNS for example). Most contain
only tens of packets. Most flows exist for a few seconds and some
less than a second. A much smaller number of flows (though still a
large number) are longer in duration and exchange larger amounts of
data.
Attempts to isolate individual IP flows in large provider core
networks for the purpose of routing them individually have met with
resounding failure. Current practice does not attempt to isolate
individual flows, but instead isolates groups of flows. If
reordering is minimized or eliminated for groups of flows, then
reordering is minimized or eliminated for any single flow within a
group.
The method of subdividing IP traffic into groups of flows that has
been used successfully for more than two decades (since the T1-NSFNET
in 1987 or possibly prior to that) is to use a hash function over the
IP source address and destination address. Including the TCP or UDP
port numbers might be beneficial for enterprise networks but is not
necessary for large provider networks. Omitting port numbers in large
provider networks has the desirable characteristic of better
enforcing fairness among flows by eliminating or reducing the
potential of end users using multiple port numbers to defeat any
tendency toward fairness among flows.
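The grouping of flows by address pair can be sketched as follows.
This is a minimal illustration only. The use of CRC32 as the hash
function, the number of groups, and the function name are assumptions
made for this example and are not taken from any deployed
implementation.

```python
import ipaddress
import zlib

def flow_group(src, dst, groups=1024):
    """Map a source and destination address pair to a group of flows.

    Port numbers are deliberately not an input, so every microflow
    between the same pair of addresses lands in the same group.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % groups
```

Because all packets of a microflow share one address pair, they all
map to one group, and keeping a group on a single path avoids
reordering within each flow in that group.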
In large provider core networks, MPLS LSP (in contrast to IP) are
very long lived, generally carry large to very large amounts of
traffic, and are relatively few in number. In many large provider
core networks LSP which carry Internet traffic from one major core
node to another major core node, can very substantially exceed the
capacity of a multipath component link.
For MPLS traffic carrying Internet IP traffic, "taking the liberty of
guessing the payload" (as described in RFC 4928) was a matter of
necessity. The label stack simply did not provide adequate
diversity. Initially some LSR did not support this capability.
Splitting very large LSP by configuring two or more provided a
workaround (which only moved the hashing and load splitting out of
the core), however hashing based on label stack was highly
ineffective and packing LSP individually into link bundle component
links has substantial disadvantages (see Section 4.2.3).
For MPLS that is not carrying IP, the MPLS label stack is used as the
basis for the load split hash. Generally the entire label stack is
used or as few as three of the bottom labels are used. Using only
the bottom label (or only the top label) has proven unsatisfactory in
terms of splitting the load. Some forms of PW can be subdivided
which has motivated the introduction of a PW flow label
[I-D.ietf-pwe3-fat-pw].
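A hash over the label stack, using either the entire stack or a
number of the bottom labels, can be sketched as follows. This is a
minimal illustration. CRC32 and the function names are assumptions
made for this example, and labels are given as integers ordered from
the top of the stack to the bottom.

```python
import zlib

def label_stack_key(labels, bottom_count=None):
    """Build the hash key from the whole stack or from the bottom labels."""
    if bottom_count is not None and bottom_count < 1:
        raise ValueError("bottom_count must be at least 1")
    used = labels if bottom_count is None else labels[-bottom_count:]
    # Each label value is 20 bits and fits in 3 bytes.
    return b"".join(label.to_bytes(3, "big") for label in used)

def load_split_hash(labels, bottom_count=None):
    """Hash value used to select a component link."""
    return zlib.crc32(label_stack_key(labels, bottom_count))
```

Two packets that share the same outer labels but differ in an inner
label produce different keys here, whereas a hash over the top label
alone would place both on the same component link.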
4.1.2. Simple Multipath and Adaptive Multipath
Simple multipath generally relies on the mathematical probability
that given a very large number of small microflows, these microflows
will tend to be distributed evenly across a hash space. A common
simple multipath implementation assumes that all component links are
of equal capacity and performs a modulo operation across the hashed
value. An alternate simple multipath technique uses a table
generally with a power of two size, and distributes the table entries
proportionally among component links according to the capacity of
each component link.
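The two simple multipath techniques described above can be sketched
as follows. This is a minimal illustration. The function names, the
default table size, and the use of largest remainder rounding to
distribute table entries are assumptions made for this example.

```python
def modulo_select(hash_value, num_links):
    """Equal capacity case: pick a component link by modulo."""
    return hash_value % num_links

def build_split_table(capacities, table_size=256):
    """Give each component link table entries in proportion to capacity."""
    total = sum(capacities)
    quotas = [c * table_size / total for c in capacities]
    counts = [int(q) for q in quotas]
    # Hand the entries lost to truncation to the largest fractional parts.
    order = sorted(range(len(capacities)),
                   key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[:table_size - sum(counts)]:
        counts[i] += 1
    table = []
    for link, count in enumerate(counts):
        table.extend([link] * count)
    return table

def table_select(hash_value, table):
    """Unequal capacity case: index the table with the hashed value."""
    return table[hash_value % len(table)]
```

With component links of capacity 10, 10, and 40, the third link
receives roughly two thirds of the table entries and therefore
roughly two thirds of the hash space.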
An adaptive multipath technique is one where the traffic bound to
each component link is measured and the load split is adjusted
accordingly. As long as the adjustment is done within a single
network element, then no protocol extensions are required and there
are no interoperability issues.
Specific adaptive multipath techniques are outside of the scope of
this document.
4.1.3. Traffic Split over Parallel Links
The load splitting techniques defined in Section 4.1 and those
defined in Section 4.1.2 are both used in splitting traffic over
parallel links between the same pair of nodes. The best known
technique, though far from being the first, is Ethernet Link
Aggregation [IEEE-802.1AX]. This same technique had been applied
much earlier using OSPF or ISIS Equal Cost MultiPath (ECMP) over
parallel links between the same nodes. Multilink PPP [RFC1717] uses
a technique that provides inverse multiplexing. A number of vendors
had provided proprietary extensions to PPP over SONET/SDH [RFC2615]
that predated Ethernet Link Aggregation but are no longer used.
Link bundling [RFC4201] provides yet another means of handling
parallel LSP. RFC4201 explicitly allows a special value of all ones
to indicate a split across all component links of the bundle. Use of
link bundling is discussed in Section 4.2.3.
All of these techniques, including ECMP, may be used over two or more
links between a pair of nodes. The most primitive load split
algorithms may require that all links be of the same capacity and may
attempt to load balance equally. Somewhat less primitive techniques
may allow links to be unequal in capacity. Any of these techniques
can also use an adaptive multipath algorithm as described in
Section 4.1.2.
4.1.4. Traffic Split over Multiple Paths
OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of
traffic split over multiple paths that may traverse intermediate
nodes. ECMP is often incorrectly equated to only this case, and
multipath over multiple diverse paths is often incorrectly equated to
an equal division of traffic.
Many implementations are able to create more than one LSP between a
pair of nodes, where these LSP are routed diversely to better make
use of available capacity. The load on these LSP can be distributed
proportionally to the reserved bandwidth of the LSP. These multiple
LSP may be advertised as a single PSC FA and any LSP making use of
the FA may be split over these multiple LSP.
Link bundling [RFC4201] component links may themselves be LSP. When
this technique is used, any LSP which specifies the link bundle may
be split across the multiple paths of the LSP that comprise the
bundle.
Other forms of multipath may use what appear to be physical component
links that are provided by a server layer. For example, the
components of an Ethernet LAG may be provided by Ethernet PW
[RFC4448].
Techniques which spread traffic over multiple paths may use simple
multipath or adaptive multipath as described in Section 4.1.2. When
ECMP is used over an IP link or MPLS LDP LSP, visibility of available
capacity along the path is limited to the next hop only, therefore
load which is split proportionally to the capacity of the immediate
hop may not be split optimally for the entire path, even using an
adaptive multipath capable forwarding. For techniques which split
traffic over one or more LSP, the available capacity along the path
to the destination is assumed to be known through the bandwidth
reservations of the LSP.
4.2. Specific Types of Multipath
Three forms of multipath are considered here.
o ECMP
o Ethernet Link Aggregation
o MPLS Link Bundling
Of these types of multipath, the latter two can be applied to MPLS
with RSVP-TE signaling or static configurations.
4.2.1. ECMP Current Practices
Equal Cost Multipath has been available in the ISIS and OSPF link
state routing protocols for two decades or more. For example, see
[RFC1247]. ECMP is also available in BGP. ECMP is declared out of
scope in LDP, though widely implemented.
Although ECMP is not applicable to MPLS LSP setup with RSVP-TE
signaling, ECMP can be applied at an LER.
At an MPLS LER ECMP can be applied over two or more MPLS LSP with
traffic split proportionally to the LSP reserved bandwidth. This
could also be considered to be IP ECMP with an underlying MPLS LSP
server layer.
The equivalent to ECMP for an LSP setup can be achieved by creating
PSC LSP and concatenating them using link bundling, and using the
"all ones" link bundle component (see Section 4.2.3).
4.2.2. Ethernet Link Aggregation Current Practices
Ethernet link aggregation ([IEEE-802.1AX]) concatenates a set of
Ethernet member links below the Ethernet link layer, such that the
link aggregation group (LAG) appears as a single link with a single
Ethernet MAC address. The link aggregation control protocol (LACP)
coordinates membership in the LAG such that the member links can be
made unavailable to upper layers and added to the LAG on both nodes.
For IP using a link state protocol with ECMP, Ethernet link
aggregation had little effect. The load balancing on a LAG was
identical to the load balancing using ECMP over the set of member
links. ISIS only advertises the adjacencies between nodes. OSPF
advertises each link between nodes, so for IP using OSPF, link
aggregation only resulted in a reduction in routing protocol overhead
and simplification of the SPF.
For MPLS, some vendors had already implemented proprietary extensions
to PPP over SONET/SDH [RFC2615] that predated the earliest IEEE work
on link aggregation (IEEE 802.3ad) with capabilities similar to LACP.
It was not until 10GbE became widely available (about 5 years later)
that LAG was used in provider core networks, and began replacing OC-
192. MPLS link bundling implementations (prior to RFC status) also
predated Ethernet link aggregation.
A network deployment circa 2005 could either configure many Ethernet
links and use MPLS link bundling, or configure an Ethernet LAG. If
an MPLS link bundle was configured to split load over all link bundle
component links the functionality was equivalent to configuring the
set of links as a LAG. In core LSR implementations, the load split
in these two cases was identical.
4.2.3. MPLS Link Bundling Current Practices
MPLS link bundling [RFC4201] was conceived at about the time that it
was clear that OC-48 was too slow for IP core links, OC-192 was just
becoming available and would soon be too slow, and MPLS had strong
support among multiple providers. Link bundling initially solved two
problems. First, a few individual vendors had proprietary extensions
to PPP over SONET/SDH [RFC2615]. Link bundling could offer equivalent
capability and offer vendor interoperability. Second, some vendor
hardware was not capable of load splitting and therefore required
that each top level LSP be assigned a single path. Further, each
side of a link bundle could be configured differently, one could load
split and the other could place LSP on individual component links.
If LSP are placed on individual links rather than split over the
entire bundle, then bin packing problems can occur. LSP are often
large making this packing error significant. In addition, LSP
bandwidth reservations in most IP/MPLS deployments are only
predictions of expected bandwidth. With link bundling, as specified,
LSP cannot be moved from one link bundle component link to another.
If LSP are assigned to links rather than split based on IP address
pairs, there is less opportunity for one LSP to make use of unused
capacity due to other LSP being underutilized. The bin packing and loss
of opportunity to share capacity both reduce the efficiency of
capacity utilization.
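The bin packing problem can be illustrated with a small example.
This is a minimal sketch using hypothetical LSP sizes and a first fit
placement policy chosen only for illustration. Other placement
policies exist, but none can place all three LSP in this particular
example, since no component link can hold two of them.

```python
def first_fit(lsp_sizes, link_capacity, num_links):
    """Place each LSP wholly on one component link.

    Returns the sizes of the LSP that could not be placed.
    """
    free = [link_capacity] * num_links
    rejected = []
    for size in lsp_sizes:
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                break
        else:
            rejected.append(size)  # no single link has enough room
    return rejected

def fits_when_split(lsp_sizes, link_capacity, num_links):
    """With load split over the whole bundle only the total matters."""
    return sum(lsp_sizes) <= link_capacity * num_links

# Three 6 Gb/s LSP over two 10 Gb/s component links (hypothetical sizes).
print(first_fit([6, 6, 6], 10, 2))
print(fits_when_split([6, 6, 6], 10, 2))
```

In this example 18 Gb/s of reservations fit within the 20 Gb/s bundle
when split, but one LSP is rejected when each LSP must be assigned to
a single component link.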
MPLS link bundling does not currently offer an ability to select
which LSP are assigned to a single component link and which LSP are
split over the entire set of component links. Most forwarding
hardware can support this. Although an LSR could in principle be
configured to use some other attribute of an LSP to infer the
decision to load split, such as holding priority or an affinity for
an administrative attribute, no LSR software provides this
capability. Until MPLS-TP there was never a need for that
capability.
5. Improving Support for MPLS-TP and Multipath Requirements
The purpose of this section is to describe how MPLS-TP and multipath
could coexist and to define simple changes to accomplish this.
5.1. Characteristics of MPLS-TP Multipath Solutions
Three different methods to support MPLS-TP and multipath are
described. One method requires simple changes to link bundle and
LAG. One method requires no changes but has disadvantages. One
method involves no change to multipath but requires relaxation to
MPLS-TP OAM requirements.
The best solution makes MPLS over multipath a fully compliant server
layer for MPLS-TP meeting all of the requirements stated in the prior
sections but cannot be fully supported by most existing LSR without
hardware changes. The other two solutions have disadvantages but
require little or no change to existing hardware that would otherwise
support MPLS-TP. The changes are specified at the level of detail of
requirements and/or framework rather than as specific protocol
changes.
5.1.1. Coexistence of MPLS and MPLS-TP
The largest contributor of provider traffic today is the Internet.
All of this traffic is IP with some providers, but not all, using IP
over MPLS. IP is used without MPLS with ECMP and LAG and IP is used
with MPLS with all three forms of multipath described in Section 4.2,
ECMP, LAG, and link bundling.
In addition to Internet services, many providers currently offer
layer-2 and layer-3 VPN services over MPLS today. Other providers
offer native layer-2 services with an intention to migrate to MPLS-TP
for these services.
A primary purpose of migrating VPN and circuit services from layer-2
to MPLS-TP is to reduce cost relative to a dedicated layer-2
infrastructure for these services. Much of that reduction comes from
making use of infrastructure in place to support Internet traffic.
Using the capacity in place for Internet, predictive reservations can
be made for higher priority services, with guarantees possible by
transferring the risk of exceeding the predictions to the Internet
traffic through use of priority queuing. With Internet loads being
much larger, the unlikely event of predictive reservations being
exceeded would easily be absorbed. This architecture allows VPN and
circuit services to be delivered at lower cost.
IP/MPLS requires the use of multipath due to the high traffic levels.
MPLS-TP requires a single path for each LSP. With no changes, these
two requirements are in conflict. Three possible approaches are
examined in the following sections.
1. Supporting MPLS and MPLS-TP over a common server layer with
multipath support as well as MPLS-TP over an MPLS server layer
over a multipath capable server layer.
2. Supporting MPLS over an MPLS-TP server layer using multiple
MPLS-TP LSP as MPLS component links where multipath is needed.
3. Relaxing MPLS-TP OAM and documenting the limitations such that
MPLS-TP could be supported over an existing multipath server
layer.
Each of these is a separate solution. For example, if changes to
MPLS forwarding enable MPLS with multipath to support fully compliant
MPLS-TP LSP, then relaxing MPLS-TP OAM is not needed. Conversely, if
MPLS forwarding cannot be changed on specific existing equipment to
accommodate MPLS-TP, then one of the other two solutions is required.
Supporting MPLS-TP OAM at high rates also requires hardware change to
most existing LSR, therefore all of these solutions require some form
of hardware change.
5.1.2. Advantages and Disadvantages of Solutions
A desirable solution is one that meets all requirements and is highly
cost effective. An undesirable solution is one that either does not
meet all requirements or is not cost effective. The ability to use
existing hardware is also desirable. A number of solutions and the
necessary changes are discussed in the following subsections.
MPLS, which requires multipath, and MPLS-TP, which requires a single
path, could potentially coexist in the following ways.
MPLS as a Server Layer for MPLS-TP
(Section 5.2.1)
Advantages: MPLS-TP can be fully accommodated with small
signaling changes and forwarding changes. Efficient use of
capacity can be achieved.
Disadvantages: Changes to the fields over which a hash is
computed are required and therefore this method may not be
supportable with some existing hardware.
MPLS-TP as a Server Layer for MPLS
(Section 5.2.2)
Advantages: Some transport providers prefer to offer MPLS-TP due
to its ability to support familiar management and operations
procedures, involving static configuration of network
elements and inband performance monitoring and protection
activation.
Disadvantages: Multipath is moved to the client layer. High
bandwidth MPLS LSP must be supported through smaller parallel
MPLS-TP LSP. The opportunity to dynamically share capacity
of MPLS LSP is diminished when large MPLS LSP are run over
smaller MPLS-TP LSP. The use of MPLS-TP LSP across a high
bandwidth core will increase the number of LSP required and
may impact scalability.
Relax MPLS-TP OAM Requirements
(Section 5.2.3)
Advantages: Relaxing OAM requirements would allow MPLS-TP LSP to
exceed the capacity of a single component (or member) link.
MPLS over MPLS-TP becomes more practical.
Disadvantages: CC/CV requires enhancement to exercise all parts
of a multipath and would benefit from further enhancements
(see Section 5.2.3). CC/CV must be coordinated across
multiple packet processing elements. Reordering of MPLS-TP
traffic, even if not harmful to the payload itself, would
result in significant short term inaccuracy in loss reported
by OAM LM.
5.2. MPLS-TP Multipath Solution Set
Three solutions are described. As noted in Section 5.1.1 these are
three separate solutions. Each can be deployed independently. Most
importantly, neither of the first two solutions requires relaxing
MPLS-TP OAM requirements. On the other hand, these solutions are not
mutually exclusive.
5.2.1. MPLS as a Server Layer for MPLS-TP
Using MPLS with multipath as a server layer for MPLS-TP has the most
advantages with respect to the requirements, and with the exception
of inability to run on some (or most) existing hardware, has no
disadvantages. This is assuming that the protocol changes suggested
in this subsection are implemented in later IETF documents.
Supporting fully conformant MPLS-TP LSP over MPLS LSP which are
making use of multipath requires special treatment of the MPLS-TP
LSP such that those LSP, and only those LSP, are not subject to the
multipath load splitting.
MP#1 It MUST be possible to identify MPLS-TP LSP.
MP#2 It MUST be possible to completely exclude MPLS-TP LSP from the
multipath hash and load split, statically assign it to a
component link or member, and compensate for this assignment in
the MPLS multipath load split.
MP#3 In order to support one or more MPLS-TP LSP contained in an
MPLS LSP, it MUST be possible to signal the presence of MPLS-TP
LSP within an MPLS LSP.
MP#4 In order to support an MPLS LSP carrying other MPLS LSP some of
which in turn carry MPLS-TP LSP, it MUST be possible to
determine the minimum depth within the label stack at which an
MPLS-TP LSP exists and provide this depth in signaling.
MP#5 The depth within the label stack of the multipath hash for any
MPLS LSP that is carrying MPLS-TP LSP MUST be constrained for
that MPLS LSP so that the hashing does not include any
information past an MPLS-TP label.
MP#6 It MUST be possible for an LSR which is setting up an MPLS-TP
or MPLS LSP to determine at CSPF time whether a link can
support the MPLS-TP requirements of the LSP.
Some hardware which exists today can support requirement MP#2. For
example, if a table is used to support multipath and produces
satisfactory results given existing traffic patterns, and the number
of component links or members is smaller than the table by a factor
of N, then an allocation of a multiple of 1/N of a component or
member link can be set aside for MPLS-TP traffic. The MPLS-TP
traffic can be protected from degraded performance due to an
imperfect load split if the MPLS-TP traffic is given queuing priority
(using strict priority and policing or shaping at ingress or locally
or weighted queuing locally).
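One possible reading of the table example above is sketched below.
This is an illustration only, assuming equal capacity component
links, N table entries per link, and a table indexed by the hash
value modulo its length. The function name and the dictionary of
reservations are hypothetical and are not taken from any
implementation.

```python
def build_table_with_reservation(num_links, entries_per_link, reserved):
    """Build a load split table that compensates for MPLS-TP set asides.

    reserved maps a link index to the number of table entries, each
    worth 1/N of that link, which are set aside for MPLS-TP LSP that
    are statically assigned to the link and excluded from the hash.
    """
    table = []
    for link in range(num_links):
        hashed = entries_per_link - reserved.get(link, 0)
        if hashed < 0:
            raise ValueError("reservation exceeds link capacity")
        # Only the capacity left after the set aside receives hashed traffic.
        table.extend([link] * hashed)
    return table

def select_link(hash_value, table):
    """Pick a component link for traffic that is subject to the hash."""
    return table[hash_value % len(table)]
```

With four links, N equal to 8, and half of the first link set aside,
the first link receives 4 of the 28 remaining entries, so hashed
traffic is reduced on that link in proportion to the capacity given
to MPLS-TP.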
Most existing hardware cannot support requirement MP#5 but some may
be able to partially support this requirement by fixing the label
stack inspection depth to a fixed number of labels from the top. Full
support for requirement MP#5 requires that the depth over which the
hash is computed can be derived from the label number of the label on
which a label swap operation is performed.
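Requirement MP#5 can be sketched as a hash whose depth is bounded by
the position of the MPLS-TP label. This is a minimal illustration,
assuming that the depth of the shallowest MPLS-TP label is already
known, for example from signaling as in MP#4. CRC32, the function
name, and the 1-based depth parameter are assumptions made for this
example.

```python
import zlib

def constrained_hash(labels, tp_depth=None):
    """Hash the label stack without looking past an MPLS-TP label.

    labels is ordered from the top of the stack to the bottom.
    tp_depth is the 1-based depth of the shallowest MPLS-TP label,
    or None if the LSP carries no MPLS-TP LSP.
    """
    usable = labels if tp_depth is None else labels[:tp_depth]
    key = b"".join(label.to_bytes(3, "big") for label in usable)
    return zlib.crc32(key)
```

With tp_depth set, packets of the same MPLS-TP LSP hash to the same
value regardless of the PW labels or payload carried beneath the
MPLS-TP label, so the MPLS-TP LSP stays on a single path.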
5.2.2. MPLS-TP as a Server Layer for MPLS
Carrying MPLS LSP which are larger than a component link over an
MPLS-TP server layer requires that the large MPLS client layer LSP be
accommodated by multiple MPLS-TP server layer LSPs. MPLS multipath
can be used in the client layer MPLS as described in Section 4.1.4.
Creating multiple MPLS-TP server layer LSP places a greater ILM
scaling burden on the LSR (see Section 3.3.1.1 and the examples in
Section 3.3.1.3). High bandwidth MPLS cores with a smaller number of
nodes have the greatest tendency to require LSP in excess of
component links, therefore the reduction in number of nodes offsets
the impact of increasing the number of server layer LSP in parallel.
Today, only in cases where the ILM is small would this be an issue.
The most significant disadvantage of MPLS-TP as a Server Layer for
MPLS is that the MPLS-TP LSP reduce the efficiency of carrying the MPLS
client layer. The service which provides by far the largest offered
load today is Internet, for which the LSP capacity reservations are
predictions of expected load. Many of these MPLS LSP may be smaller
than component link capacity. Using MPLS-TP as a server layer
results in bin packing problems for these smaller LSP. For those LSP
that are larger than component link capacity, their capacities are not
multiples of convenient capacity increments such as 10Gb/s. Using
MPLS-TP as an underlying server layer greatly reduces the ability of
the client layer MPLS LSP to share capacity. For example, when one
MPLS LSP is underutilizing its predicted capacity, the fixed
allocation of MPLS-TP to component links may not allow another LSP to
exceed its predicted capacity. A solution which makes less efficient
use of resources may result in a less cost effective solution, due to
the amount of capital equipment cost required and an increase in
space and power required.
No requirements beyond MPLS-TP as it is currently defined are
needed to support MPLS-TP as a Server Layer for MPLS.
It is therefore viable but has some undesirable characteristics
discussed above.
5.2.3. Relax MPLS-TP OAM Requirements
If MPLS-TP OAM requirements are not fully met, as currently
specified, an LSP is not fully MPLS-TP conformant. That may be
little more than a semantic inconvenience and cannot prevent
implementations from allowing LSP which are otherwise MPLS-TP
compliant to optionally use multipath with some reduction in OAM
capability.
Regardless of whether relaxing MPLS-TP OAM requirements makes an
LSP no longer an MPLS-TP LSP, this section discusses the consequence
of using multipath with regard to MPLS-TP OAM.
If MPLS-TP over multipath is supported by relaxing MPLS-TP OAM
requirements, the requirements listed below will improve the behavior
of MPLS-TP OAM over multipath.
OAM#1 There MUST be a means of introducing entropy to MPLS-TP OAM.
OAM#2 There SHOULD be a means to focus CC/CV testing on a specific
multipath component link.
OAM#3 There MUST be a means to support LM over multipath, even if at
best a bounded long term inaccuracy is achieved.
5.2.3.1. MPLS-TP CC/CV OAM with Multipath
MPLS-TP CC/CV as currently defined has no means to exercise all paths
of a multipath. The label stack is fixed, followed by a GAL label
[RFC5586]. As is, only one path along a multipath can be exercised
when the ingress to the multipath is not also the ingress to the LSP.
For example, if the LSP is carrying PW, the PW themselves can be
spread across the multipath, but not the OAM traffic.
If CC/CV OAM is allowed to place a label below the GAL label, the
entire set of paths can be tested, though not in a deterministic
manner. This is called an entropy label. Using a different random
number in this entropy label for each OAM packet allows all links to
be exercised on a probabilistic basis.
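The probabilistic coverage obtained from a random entropy label can
be estimated as follows. This is a minimal sketch, assuming that the
multipath hash spreads entropy label values uniformly and
independently over the component links. The function names are
hypothetical. Label values 0 through 15 are avoided because they are
reserved.

```python
import random
from math import comb

def random_entropy_label(rng=random):
    """Pick a random 20 bit label value outside the reserved range."""
    return rng.randrange(16, 1 << 20)

def prob_all_links_exercised(num_links, num_packets):
    """Probability that every component link has carried an OAM packet."""
    k = num_links
    # Inclusion and exclusion over the sets of links that were never hit.
    return sum((-1) ** j * comb(k, j) * ((k - j) / k) ** num_packets
               for j in range(k + 1))

def expected_packets_to_cover(num_links):
    """Expected number of OAM packets sent before all links are exercised."""
    return num_links * sum(1 / i for i in range(1, num_links + 1))
```

Under these assumptions, two component links are both exercised after
two OAM packets with probability one half, and three packets are
needed on average to exercise both.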
The loss of an isolated OAM CC/CV packet currently has no effect. If
the loss of a single OAM packet can be noted by the sender, then the
sender can repeatedly use the same value in the entropy label. This
requires either a two way OAM or feedback to the ingress. If OAM
packets can be reordered, then a sliding window of outstanding OAM
packets is required. If OAM CC/CV packets are given high priority
(as currently specified), then delay difference should be minimal and
reordering may be non-existent if the send interval is longer than
the delay difference.
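The sender behavior described above can be sketched as follows. This
is a minimal illustration only, assuming a two way OAM in which each
reply identifies the probe it answers by a sequence number. The class
name, the sequence numbering, and the rule that a probe is considered
lost once it is more than "window" probes old are all hypothetical.

```python
import random
from collections import Counter

class EntropyProbeSender:
    """Tracks outstanding OAM probes in a sliding window and reuses
    the entropy value of any probe whose loss has been noted."""

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng or random.Random()
        self.next_seq = 0
        self.outstanding = {}        # seq -> entropy, in send order
        self.retest = []             # entropy values to send again
        self.loss_count = Counter()  # entropy -> number of losses noted

    def next_probe(self):
        """Return (seq, entropy) for the next OAM packet to send."""
        if self.retest:
            entropy = self.retest.pop(0)
        else:
            entropy = self.rng.randrange(16, 2**20)
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding[seq] = entropy
        self._expire(seq)
        return seq, entropy

    def on_reply(self, seq):
        """A reply arrived; replies may be reordered within the window."""
        self.outstanding.pop(seq, None)

    def _expire(self, newest_seq):
        # Probes that have fallen out of the window are treated as lost.
        while self.outstanding:
            oldest = next(iter(self.outstanding))
            if oldest > newest_seq - self.window:
                break
            entropy = self.outstanding.pop(oldest)
            self.loss_count[entropy] += 1
            self.retest.append(entropy)
```

If the same entropy value keeps being lost, its loss count keeps
growing, which a real implementation could use as the trigger for
declaring a component link fault.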
If a multipath component link failure has been detected locally (at a
node adjacent to the failure) and the failure corrected locally (i.e.,
segment protection) or the component link taken out of service, the
client LSP will either no longer be affected or will be
preempted. If the client LSP has been preempted, unmodified MPLS-TP
OAM is sufficient to detect this condition. The
existing BFD [RFC5884] provides this functionality.
Only in the case where a component link has failed and the server
layer has not been able to detect and correct the failure or take the
component link out of service would CC/CV OAM on the client LSP serve
any purpose. For this purpose, a relaxed OAM may be sufficient. If
the client LSP has no control over the multipath itself, the entire
multipath must be considered down if any uncorrected component link
failure is occurring at the multipath.
The CC/CV as described here can be handled by an OAM mechanism which
is bidirectional. LSP Ping provides such a mechanism [RFC4379].
Because the condition being handled by LSP Ping should be quite rare,
it may be acceptable to use a combination of BFD and LSP Ping to
provide OAM with full coverage of all types of fault, but with a
slower response to a component link failure which is not detected at
the point of the fault.
For LSR implementations which support BFD and LSP Ping "as is",
these may be viable as an optional MPLS-TP form of CC/CV OAM. A
deployment may use this option if the reliance on IP is acceptable to
the provider. Alternatively, MPLS-TP OAM could take such requirements
into consideration and provide an additional capability in BFD or
provide MPLS-TP extensions to LSP Ping.
A further small complication may occur at the OAM egress. If the
egress to the LSP is a multipath egress, then the OAM may arrive at
any of the component links at the egress. This requires that the
CC/CV OAM be forwarded within the LSR to a common packet processor in
order to be handled in hardware (or forwarded to a common CPU). This
is also true of other types of OAM.
5.2.3.2. MPLS-TP LM OAM with Multipath
MPLS-TP LM OAM makes use of the count of payload packets at an
egress. If the payload is reordered, even with no consequence to the
payload itself, some inaccuracy is introduced to the LM. Some number
of payload packets which were transmitted before the LM OAM packet
was sent may arrive after the LM packet is received and some payload
packets transmitted after the LM OAM packet may arrive before the LM
packet.
If the LSP egress is a multipath, then the LM packets may arrive at
any packet processor over which the multipath resides. The counters
from each of the egress packet processors will have to be sampled.
During the sampling interval, additional packets arrive and will be
counted. This creates an equivalent out of order problem with
respect to the LM OAM and the payload it is counting.
This error is bounded and is not cumulative. For example, if one LM
interval counts too few packets, the next LM interval will tend to
count too many. Over longer measurement periods the total error
retains the same bounds, which over longer intervals becomes less
significant.
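The claim that the error does not accumulate can be illustrated with a
short derivation. The symbols here are assumptions introduced for
illustration: T_k is the transmit count carried by the k-th LM packet,
R_k is the receive count sampled when that LM packet is processed,
R*_k is the count that would have been sampled with no reordering,
and e_k (written \epsilon_k) is the resulting miscount, assumed to
satisfy |e_k| <= B for some bound B.

```latex
\begin{align*}
R_k &= R^{*}_k + \epsilon_k, \qquad |\epsilon_k| \le B \\
L_k &= (T_k - T_{k-1}) - (R^{*}_k - R^{*}_{k-1})
      && \text{true loss in interval } k \\
\hat{L}_k &= (T_k - T_{k-1}) - (R_k - R_{k-1})
      = L_k - (\epsilon_k - \epsilon_{k-1})
      && \text{measured loss in interval } k \\
\sum_{k=1}^{n} \hat{L}_k &= \sum_{k=1}^{n} L_k - (\epsilon_n - \epsilon_0)
      && \text{the intermediate terms telescope} \\
\Bigl|\sum_{k=1}^{n} \hat{L}_k - \sum_{k=1}^{n} L_k\Bigr| &\le 2B
      && \text{independent of } n
\end{align*}
```

Under these assumptions a miscount that lowers one interval's loss
estimate raises the next by the same amount, so the total error over
any number of intervals stays within 2B packets.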
These errors are most significant when a substantial amount of
queuing delay is present (generally an indication of light
congestion) and when the queues at various component links differ in
delay. Queuing delay differences are generally milliseconds. Delay
differences of tens of milliseconds require persistent queues and
significant congestion.
The worst case errors over long intervals are reasonably well
bounded. For example, with a 10 msec delay difference, a one minute
sampling interval yields less than 0.02% uncertainty, and over a 15
minute interval the loss uncertainty is just over 0.001%. Given that
congestion is required to achieve these uncertainties, the loss due to
congestion is likely to significantly exceed these uncertainties for
all but very short measurement intervals.
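The figures above are consistent with a simple model in which the
worst case fractional uncertainty is the delay difference divided by
the measurement interval. That model, and the function name used
below, are assumptions made for illustration; the model assumes
packets arrive at a roughly steady rate over the interval.

```python
def lm_uncertainty(delay_difference_s, interval_s):
    """Worst case fractional LM uncertainty, assuming it equals the
    delay difference divided by the measurement interval."""
    return delay_difference_s / interval_s

if __name__ == "__main__":
    # 10 msec delay difference over 1 minute and 15 minute intervals.
    for name, interval_s in [("1 minute", 60), ("15 minutes", 900)]:
        pct = lm_uncertainty(0.010, interval_s) * 100
        print(f"{name}: {pct:.4f}%")
    # prints "1 minute: 0.0167%" and "15 minutes: 0.0011%"
```

The same function can be applied to other delay differences and
intervals to judge whether the resulting bound is acceptable for a
given deployment.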
When loss is zero but short term queues are formed, the queuing delay
difference is likely to be under one millisecond for the common case
of parallel links that are routed along the same fiber (using WDM).
With a 1 msec delay difference, the uncertainties for 1 minute and 15
minute samples are under 0.002% and just over 0.0001% (10^-6). The
uncertainty over a 24 hour period is about 0.0000012%, or just over
10^-8. An SLA could easily be supported where loss was guaranteed not
to exceed 10^-6 in any hour or 10^-7 in any 24 hour period. Such a
guarantee would require that the MPLS-TP LSP be given priority over
traffic that is not policed or shaped and that the MPLS-TP LSP itself
be policed or shaped.
This measurement uncertainty may or may not be acceptable to a given
deployment. Providing an option to support MPLS-TP over multipath
does introduce a bounded error to LM, but it does not remove a
provider's option not to use MPLS-TP over multipath.
6. Summary of Recommendations
Section 3 enumerates functional requirements. Section 4 describes
current practices. Section 5 enumerates functional changes to better
meet these requirements. This section provides specific
recommendations.
To support MPLS with multipath as a server layer for MPLS-TP the
following changes are required.
Recommendation #1 Provide a means in RSVP-TE for an LSP to self
identify its requirement to be treated as fully
compliant MPLS-TP (disallow reordering).
Recommendation #2 Provide a means in RSVP-TE for an LSP that is not
an MPLS-TP LSP but is directly carrying MPLS-TP
LSP to indicate that hashing may only be performed
on the first two labels and indicate the largest
MPLS-TP LSP being carried (the largest potential
microflow).
Recommendation #3 Provide a means in RSVP-TE for an LSP that is not
an MPLS-TP LSP but is carrying MPLS-TP at some
depth to indicate the maximum depth in the label
stack that hashing can operate on, and the largest
MPLS-TP LSP being carried (the largest potential
microflow).
Recommendation #4 Provide a means in OSPF-TE and ISIS-TE to indicate
the largest microflow that a multipath can
accommodate independent of the largest LSP that
can be accommodated with load splitting. An
extension to [RFC4201] which separates Maximum LSP
into two variables, with backward compatibility,
may be the most desirable solution.
The current framework documents could be improved with the following
additions.
Recommendation #5 Relax GAL specification in [RFC5586] to allow a
label below GAL to provide entropy in OAM traffic
over multipath.
Recommendation #6 Preferably in the OAM framework, acknowledge the
need for entropy in OAM in some circumstances.
Note that if no multipath exists along a path, the
entropy is not needed but does no harm. Support
optional entropy in MPLS-TP OAM through use of a
label under the GAL label.
Recommendation #7 Document the need for LSP Ping or other two way
mechanism to keep a sliding window of outstanding
packets at the sender which records the entropy
value used, notes any single loss, and sends
repeated packets for an entropy value which has
experienced a loss.
Recommendation #8 Preferably in the OAM framework, document the need
for CC/CV at a multipath egress to forward OAM
packets for an LSP that is load split through an
out of band means to a common packet processor or
CPU.
Recommendation #9 Preferably in the OAM framework, document the need
for LM at multipath egress to collect packet
counts on all packet processors that could
potentially receive packets for a given LSP.
Forwarding changes to multipath necessary to support MPLS with
multipath as a server layer for fully compliant MPLS-TP are the
following:
Forwarding #1 Store the maximum depth of multipath hash (or zero for
unconstrained depth) in the ILM.
Forwarding #2 Do not hash using the IP stack on an LSP which is
carrying MPLS-TP. An LSP on which IP headers can be
used in the hash can be identified by noting that an
LSP with a maximum depth equal to zero cannot be
carrying MPLS-TP, or it can be explicitly indicated,
independently of depth. If a CW is not used with PW,
then this indication must be explicit.
Forwarding #3 When hashing on the MPLS label stack do not hash
beyond the maximum depth of hash for a given LSP.
Forwarding #4 Exclude reserved labels from the hash on label stack.
In particular, the GAL [RFC5586] and OAM Alert Label
[RFC3429] should be skipped.
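A minimal sketch of how Forwarding #1, #3 and #4 might combine is
given below. It is illustrative only. The function names and the use
of SHA-256 in place of a hardware hash are hypothetical, and the sketch
assumes that the maximum depth is counted on the label stack as
received, before reserved labels are skipped, which this document
does not specify.

```python
import hashlib

RESERVED_LABEL_MAX = 15  # label values 0-15 are reserved (RFC 3032);
                         # this includes GAL (13) and OAM Alert (14)

def multipath_hash_input(label_stack, max_hash_depth):
    """Return the labels (top of stack first) eligible for hashing.
    max_hash_depth is the per-LSP value stored in the ILM, where
    zero means the depth is unconstrained."""
    if max_hash_depth:
        label_stack = label_stack[:max_hash_depth]
    return [l for l in label_stack if l > RESERVED_LABEL_MAX]

def select_component_link(label_stack, max_hash_depth, num_links):
    """Pick a component link from the eligible labels only."""
    labels = multipath_hash_input(label_stack, max_hash_depth)
    data = b"".join(l.to_bytes(3, "big") for l in labels)
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") % num_links
```

With a maximum depth of two, labels below the second entry (for
example PW labels inside an MPLS-TP LSP) do not influence the choice
of component link, so all traffic of that MPLS-TP LSP stays on one
link and is not reordered.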
7. IANA Considerations
This memo includes no request to IANA.
8. Security Considerations
This document specifies requirements with discussion of framework for
solutions. The requirements and framework are related to the
coexistence of MPLS/GMPLS (without MPLS-TP) when used over a packet
network, MPLS-TP, and multipath. The combination of MPLS, MPLS-TP,
and multipath does not introduce any new security threats. The
security considerations for MPLS/GMPLS and for MPLS-TP are documented
in [RFC5920] and [I-D.ietf-mpls-tp-security-framework].
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
9.2. Informative References
[I-D.ietf-mpls-tp-oam-framework]
Allan, D., Busi, I., Niven-Jenkins, B., Fulignoli, A.,
Hernandez-Valencia, E., Levrau, L., Sestito, V., Sprecher,
N., Helvoort, H., Vigoureux, M., Weingarten, Y., and R.
Winter, "Operations, Administration and Maintenance
Framework for MPLS-based Transport Networks",
draft-ietf-mpls-tp-oam-framework-11 (work in progress),
February 2011.
[I-D.ietf-mpls-tp-security-framework]
Bitar, N., Fang, L., Niven-Jenkins, B., Zhang, R.,
Mansfield, S., Daikoku, M., and L. Wang, "MPLS-TP Security
Framework", draft-ietf-mpls-tp-security-framework-00 (work
in progress), February 2011.
[I-D.ietf-pwe3-fat-pw]
Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
J., and S. Amante, "Flow Aware Transport of Pseudowires
over an MPLS PSN", draft-ietf-pwe3-fat-pw-05 (work in
progress), October 2010.
[IEEE-802.1AX]
IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE
Standard for Local and Metropolitan Area Networks - Link
Aggregation", 2008, <http://standards.ieee.org/getieee802/
download/802.1AX-2008.pdf>.
[ITU-T.G.800]
ITU-T, "Unified functional architecture of transport
networks", 2007, <http://www.itu.int/rec/T-REC-G/
recommendation.asp?parent=T-REC-G.800>.
[RFC1247] Moy, J., "OSPF Version 2", RFC 1247, July 1991.
[RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
PPP Multilink Protocol (MP)", RFC 1717, November 1994.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998.
[RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615,
June 1999.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991, November 2000.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, November 2000.
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
Label Switching Architecture", RFC 3031, January 2001.
[RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
Encoding", RFC 3032, January 2001.
[RFC3260] Grossman, D., "New Terminology and Clarifications for
Diffserv", RFC 3260, April 2002.
[RFC3270] Le Faucheur, F., Wu, L., Davie, B., Davari, S., Vaananen,
P., Krishnan, R., Cheval, P., and J. Heinanen, "Multi-
Protocol Label Switching (MPLS) Support of Differentiated
Services", RFC 3270, May 2002.
[RFC3429] Ohta, H., "Assignment of the 'OAM Alert Label' for
Multiprotocol Label Switching Architecture (MPLS)
Operation and Maintenance (OAM) Functions", RFC 3429,
November 2002.
[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
May 2005.
[RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.
[RFC4206] Kompella, K. and Y. Rekhter, "Label Switched Paths (LSP)
Hierarchy with Generalized Multi-Protocol Label Switching
(GMPLS) Traffic Engineering (TE)", RFC 4206, October 2005.
[RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol
Label Switched (MPLS) Data Plane Failures", RFC 4379,
February 2006.
[RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson,
"Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
Use over an MPLS PSN", RFC 4385, February 2006.
[RFC4426] Lang, J., Rajagopalan, B., and D. Papadimitriou,
"Generalized Multi-Protocol Label Switching (GMPLS)
Recovery Functional Specification", RFC 4426, March 2006.
[RFC4448] Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
"Encapsulation Methods for Transport of Ethernet over MPLS
Networks", RFC 4448, April 2006.
[RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
Cost Multipath Treatment in MPLS Networks", BCP 128,
RFC 4928, June 2007.
[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
Reroute: Loop-Free Alternates", RFC 5286, September 2008.
[RFC5462] Andersson, L. and R. Asati, "Multiprotocol Label Switching
(MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
Class" Field", RFC 5462, February 2009.
[RFC5586] Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic
Associated Channel", RFC 5586, June 2009.
[RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework",
RFC 5714, January 2010.
[RFC5860] Vigoureux, M., Ward, D., and M. Betts, "Requirements for
Operations, Administration, and Maintenance (OAM) in MPLS
Transport Networks", RFC 5860, May 2010.
[RFC5884] Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow,
"Bidirectional Forwarding Detection (BFD) for MPLS Label
Switched Paths (LSPs)", RFC 5884, June 2010.
[RFC5920] Fang, L., "Security Framework for MPLS and GMPLS
Networks", RFC 5920, July 2010.
Author's Address
Curtis Villamizar (editor)
Infinera Corporation
169 W. Java Drive
Sunnyvale, CA 94089
Email: cvillamizar@infinera.com