<?xml version="1.0" encoding="US-ASCII"?>
<!-- xml2rfc is available at http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

  <!ENTITY RFC1717 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1717.xml">
  <!ENTITY RFC1247 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1247.xml">
  <!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY RFC2475 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2475.xml">
  <!ENTITY RFC2615 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2615.xml">
  <!ENTITY RFC2991 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2991.xml">
  <!ENTITY RFC2992 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2992.xml">
  <!ENTITY RFC3031 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3031.xml">
  <!ENTITY RFC3032 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3032.xml">
  <!ENTITY RFC3260 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3260.xml">
  <!ENTITY RFC3270 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3270.xml">
  <!ENTITY RFC3429 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3429.xml">
  <!ENTITY RFC4090 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4090.xml">
  <!ENTITY RFC4201 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4201.xml">
  <!ENTITY RFC4206 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4206.xml">
  <!ENTITY RFC4385 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4385.xml">
  <!ENTITY RFC4379 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4379.xml">
  <!ENTITY RFC4426 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4426.xml">
  <!ENTITY RFC4448 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4448.xml">
  <!ENTITY RFC4928 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4928.xml">
  <!ENTITY RFC5286 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5286.xml">
  <!ENTITY RFC5462 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5462.xml">
  <!ENTITY RFC5586 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5586.xml">
  <!ENTITY RFC5714 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5714.xml">
  <!ENTITY RFC5860 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5860.xml">
  <!ENTITY RFC5884 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5884.xml">
  <!ENTITY RFC5920 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5920.xml">

  <!ENTITY I-D.ietf-mpls-tp-oam-framework SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-mpls-tp-oam-framework-11.xml">
  <!ENTITY I-D.ietf-pwe3-fat-pw SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-pwe3-fat-pw-05.xml">
  <!ENTITY I-D.ietf-mpls-tp-security-framework SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-ietf-mpls-tp-security-framework-00.xml">

  ]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc comments="yes"?>
<?rfc inline="yes" ?>

<rfc category="info" ipr="trust200902"
     docName="draft-villamizar-mpls-tp-multipath-01">

  <front>
    <title abbrev="MPLS-TP and MPLS Multipath">
      Use of Multipath with MPLS-TP and MPLS</title>

    <author role="editor"
	    fullname="Curtis Villamizar" initials="C." surname="Villamizar">
      <organization>Infinera Corporation</organization>
      <address>
        <postal>
          <street>169 W. Java Drive</street>
          <city>Sunnyvale, CA</city>
	  <code>94089</code>
        </postal>
        <email>cvillamizar@infinera.com</email>
      </address>
    </author>

    <date month="March" year="2011" />

    <area>Routing</area>
    <workgroup>CCAMP</workgroup>

    <keyword>MPLS</keyword>
    <keyword>composite link</keyword>
    <keyword>link aggregation</keyword>
    <keyword>ECMP</keyword>
    <keyword>link bundling</keyword>
    <keyword>MPLS-TP</keyword>

    <abstract>
      <t>
	Many MPLS implementations have supported multipath techniques
	and many MPLS deployments have used multipath techniques,
	particularly in very high bandwidth applications, such as
	provider IP/MPLS core networks.  MPLS-TP has discouraged the
	use of multipath techniques.  Some degradation of MPLS-TP OAM
	performance cannot be avoided when operating over current high
	bandwidth multipath implementations.
      </t>
      <t>
	The tradeoffs involved in using multipath techniques with MPLS
	and MPLS-TP are described.  Requirements are discussed which
	enable fully MPLS-TP compliant LSP, including full OAM
	capability, to be carried over MPLS LSP which traverse
	multipath links.  Other means of supporting MPLS-TP coexisting
	with MPLS and multipath are discussed.
      </t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>
	Today the requirement to handle large aggregations of traffic
	can be met by a number of techniques which we will
	collectively call multipath.  Multipath applied to parallel
	links between the same set of nodes includes Ethernet Link
	Aggregation <xref target="IEEE-802.1AX" />,
	<xref target="RFC4201">link bundling</xref>, or other
	aggregation techniques, some of which may be vendor specific.
	Multipath applied to diverse paths rather than parallel links
	includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS,
	or BGP, and equal cost LSP, as described in
	<xref target="multipath-practices" />.  Various multipath
	techniques have strengths and weaknesses described in
	<xref target="multipath-types" />.
      </t>
<!--

   [RFC5654 requirement 33]

   33  A solution MUST be provided to support the transport of a client
       MPLS or MPLS-TP layer network over a server MPLS or MPLS-TP layer
       network.

       A.  The level of coordination required between the client and
           server MPLS(-TP) layer networks MUST be minimized (preferably
           no coordination will be required).

       B.  The MPLS(-TP) server layer network MUST be capable of
           transporting the complete set of packets generated by the
           client MPLS(-TP) layer network, which may contain packets
           that are not MPLS packets (e.g., IP or Connectionless Network
           Protocol (CNLP) packets used by the control/management plane
           of the client MPLS(-TP) layer network).

  -->
      <t>
	The term composite link is more general than terms such as
	link aggregation (which is specific to Ethernet) or ECMP
	(which implies equal cost paths within a routing protocol).
	The use of the term composite link here is consistent with the
	broad definition in <xref target="ITU-T.G.800" />.  Multipath
	is very similar to composite link, but specifically excludes
	inverse multiplexing.
      </t>
      <section anchor="existing"
	       title="Multipath Behavior of Widely Deployed Equipment">
	<t>
	  Identical load balancing techniques are used for multipath
	  both over parallel links (for example IP/MPLS over Ethernet
	  link aggregation) and over diverse paths (for example, IP
	  ECMP, IP/MPLS ECMP over multiple LSP or link bundling over
	  LSP component links).
	</t>
	<t>
	  Large aggregates of IP traffic do not provide explicit
	  signaling to indicate the expected traffic loads.  Large
	  aggregates of MPLS traffic are carried in MPLS tunnels
	  supported by MPLS LSP.  LSP which are signaled using RSVP-TE
	  extensions do provide explicit signaling which includes the
	  expected traffic load for the aggregate.  LSP which are
	  signaled using LDP do not provide an expected traffic load.
	</t>
	<t>
	  MPLS LSP may contain other MPLS LSP arranged hierarchically.
	  When an MPLS LSR serves as a midpoint LSR in an LSP carrying
	  other LSP as payload, there is no signaling associated with
	  these client (inner) LSP.  Therefore even when using RSVP-TE
	  signaling there may be insufficient information provided by
	  signaling to adequately distribute load across a multipath
	  link.
	</t>
	<t>
	  A set of label stack entries that is unique across the
	  ordered set of label numbers can safely be assumed to
	  contain a group of (one or more) flows.  The reordering of
	  MPLS traffic (except MPLS-TP) can therefore be considered to
	  be acceptable unless reordering occurs within traffic
	  containing a common unique set of label stack entries.
	  Existing load splitting techniques take advantage of this
	  property in addition to looking beyond the bottom of the
	  label stack and determining if the payload is IPv4 or IPv6
	  to load balance traffic based on IP addresses.
	</t>
	<t>
	  A large aggregate of IP traffic may be subdivided into
	  groups of flows using a hash on the IP source and
	  destination addresses.  IP microflows are described in
	  <xref target="RFC2475" /> and clarified in
	  <xref target="RFC3260" />.  For MPLS traffic that is not
	  carrying IP, a similar hash can be performed on the set of
	  labels in the label stack.  These techniques subdivide
	  traffic into groups of flows for the purpose of load
	  balancing traffic across the aggregated capacity of a
	  multipath link.
	</t>
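	<t>
	  As an illustration only, the hash based subdivision described
	  above can be sketched as follows.  The sketch is not taken
	  from any implementation: the function names are hypothetical,
	  and SHA-256 is an assumption made for convenience (forwarding
	  hardware generally uses far cheaper hash functions).
	</t>
	<figure><artwork><![CDATA[
```python
import hashlib
import ipaddress


def component_for_ip(src, dst, n_components):
    """Pick a component link from a hash of the IP source and
    destination addresses (hypothetical helper)."""
    key = (ipaddress.ip_address(src).packed
           + ipaddress.ip_address(dst).packed)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_components


def component_for_labels(label_stack, n_components):
    """Pick a component link from a hash of the ordered set of
    20 bit label numbers (hypothetical helper)."""
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_components


if __name__ == "__main__":
    # Addresses and label values below are examples only.
    print(component_for_ip("192.0.2.1", "198.51.100.7", 4))
    print(component_for_labels([1000, 2000], 4))
```
]]></artwork></figure>
	<t>
	  Because the result depends only on the hashed fields, all
	  packets of a given group of flows select the same component
	  link, so no reordering occurs within that group.
	</t>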
	<t>
	  Attempting to resolve years of discussion as to whether a
	  hash based approach provides a sufficiently even load
	  balance using any particular hashing algorithm or method of
	  distributing traffic across a set of component links is
	  outside of the scope of this document.  For the purpose of
	  discussing existing widely deployed implementations, it is
	  sufficient to say that hash based techniques have proven to
	  be at least satisfactory through their widespread deployment,
	  which has continued to grow for more than two decades.
	</t>
	<t>
	  The current load balancing techniques are referenced in
	  <xref target="RFC4385" /> and <xref target="RFC4928" />,
	  though few specifics are provided in these two RFCs.  The
	  use of three hash based approaches is described in
	  <xref target="RFC2991" /> and <xref target="RFC2992" />,
	  though other techniques with very similar outcomes are used.
	  A means to identify flows within pseudowires (when flows are
	  present, since not all PW types contain discernible flows)
	  is described in <xref target="I-D.ietf-pwe3-fat-pw" />.
	</t>
      </section>
      <section anchor="tp-intro"
	       title="New Requirements imposed by MPLS-TP">
	<t>
	  MPLS-TP OAM violates the assumption made in prior multipath
	  implementations that it is safe to reorder traffic within an
	  LSP.  This assumption is common (if not universal) in
	  multipath implementations which use hashing techniques for
	  load balancing.  The use of multipath can impact CC/CV
	  (connectivity check, connectivity verification), LM (loss
	  measurement), and DM (delay measurement)
	  <xref target="I-D.ietf-mpls-tp-oam-framework" />.
	</t>
	<t>
	  MPLS-TP CC/CV, DM, and LM OAM packets must take the same
	  path as the payload.  If the label stack for the payload
	  contains an LSP label and a PW label beneath it (one of one
	  or more additional PW labels), then the payload will be load
	  split over the multipath.  The OAM packets will have a GAL
	  label beneath the LSP label <xref target="RFC5586" />.  With
	  no other label beneath the GAL label, the OAM traffic will
	  take only one path while the set of PW will take multiple
	  paths (though any one PW will take one path if a flow label
	  is not used).
	</t>
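	<t>
	  The divergence between payload and OAM paths described above
	  can be illustrated with a minimal sketch.  It assumes a
	  multipath that hashes on the whole label stack; the label
	  values for the LSP and PW, the number of component links, and
	  the use of SHA-256 are all hypothetical.  Only the GAL value
	  of 13 comes from <xref target="RFC5586" />.
	</t>
	<figure><artwork><![CDATA[
```python
import hashlib

GAL = 13  # Generic Associated Channel Label value (RFC 5586)


def component_for_labels(label_stack, n_components):
    """Pick a component link from a hash of the ordered label
    numbers (hypothetical helper, SHA-256 chosen for convenience)."""
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_components


# Hypothetical values: one LSP carrying 64 PW over 4 component links.
LSP_LABEL = 1000
PW_LABELS = range(2000, 2064)
N_COMPONENTS = 4

# Payload: each PW is hashed on [LSP label, PW label].
payload_links = {component_for_labels([LSP_LABEL, pw], N_COMPONENTS)
                 for pw in PW_LABELS}

# OAM: the stack is always [LSP label, GAL], so the result is fixed.
oam_link = component_for_labels([LSP_LABEL, GAL], N_COMPONENTS)

print("component links used by payload:", sorted(payload_links))
print("component link used by LSP OAM: ", oam_link)
```
]]></artwork></figure>
	<t>
	  Under these assumptions the payload is spread over more than
	  one component link, while the LSP level OAM always selects a
	  single component link and therefore exercises only the PW
	  that happen to share that link.
	</t>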
	<t>
	  With the current OAM CC/CV definition and current multipath
	  practices, OAM CC/CV functionality may not cover the
	  forwarding path for a particular PW within the LSP at any
	  given multipath along the path.  The existing OAM CC/CV will
	  provide a check for the condition where the entire multipath
	  becomes unavailable (goes down or the particular LSP is
	  preempted due to reduced multipath capacity).
	</t>
	<t>
          There is no assurance that DM OAM is measuring the delay of
	  the forwarding path for a particular PW within the LSP with
	  the current OAM DM definition and current multipath
	  practices.  In addition, if packets are reordered, OAM LM
	  accuracy can be (and generally is) affected.
	</t>
      </section>
      <section anchor="mp-conflict"
	       title="Apparently Conflicting Requirements">
	<t>
	  The existing multipath techniques address specific
	  requirements.  MPLS-TP requirements are in conflict with
	  multipath, at least as currently implemented.
	</t>
	<t>
	  The underlying requirements that motivated the current use
	  of multipath are not in conflict with the use of MPLS-TP.
	  <xref target="multipath-reqm" /> describes these
	  requirements in greater detail.
	  <xref target="multipath-practices" /> describes current
	  practices in greater detail.
	  <xref target="multipath-changes" /> describes means of
	  better supporting both MPLS-TP and multipath requirements.
	</t>
      </section>
      <section title="Requirements Language">
        <t>
	  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
          and "OPTIONAL" in this document are to be interpreted as
          described in <xref target="RFC2119">RFC 2119</xref>.
	</t>
      </section>
    </section>

    <section anchor="def" title="Definitions">
      <t><list style="hanging" hangIndent="4">
	  <t hangText="Multipath"><vspace blankLines="0" />
	    The term multipath includes all techniques in which
	    <list style="numbers">
	      <t>
		Traffic can take more than one path from one node to a
		destination.
	      </t>
	      <t>
		Individual packets take one path only.
	      </t>
	      <t>
		Packets are neither resequenced nor subdivided and
		reassembled at the receiving end.
	      </t>
	      <t>
		The paths may be:
		<list style="letters">
		  <t>
		    parallel links between two nodes, or
		  </t>
		  <t>
		    specific paths across a network to a
		    destination node, or
		  </t>
		  <t>
		    links or paths to a next hop used to
		    reach a common destination.
		  </t>
		</list>
	      </t>
	    </list>
	  </t>
	  <t hangText="Link Bundle"><vspace blankLines="0" />
	    Link bundling is a multipath technique specific to MPLS
	    <xref target="RFC4201" />.  Link bundling supports two
	    modes of operation.  Either an LSP can be placed on one
	    component link of a link bundle, or an LSP can be load
	    split across all members of the bundle.  There is no
	    signaling defined which allows a per LSP preference
	    regarding load split; therefore, whether to load split is
	    generally configured per bundle and applied to all LSP
	    across the bundle.
	  </t>
	  <t hangText="Link Aggregation"><vspace blankLines="0" />
	    The term "link aggregation" generally refers
	    to <xref target="IEEE-802.1AX">Ethernet Link
	    Aggregation</xref> as defined by the IEEE.  Ethernet Link
	    Aggregation defines a Link Aggregation Control Protocol
	    (LACP) which coordinates inclusion of LAG members in the
	    LAG.
	  </t>
	  <t hangText="Link Aggregation Group (LAG)">
	    <vspace blankLines="0" /> A group of physical Ethernet
	    interfaces that are treated as a logical link when using
	    Ethernet Link Aggregation is referred to as a Link
	    Aggregation Group (LAG).
	  </t>
	  <t hangText="Equal Cost Multipath (ECMP)">
	    <vspace blankLines="0" /> Equal Cost Multipath (ECMP) is a
	    specific form of multipath in which the costs of the links
	    or paths must be equal in a given routing protocol.  The
	    load may be split equally across all available links (or
	    available paths), or the load may be split proportionally
	    to the capacity of each link (or path).
	  </t>
	  <t hangText="Loop Free Alternate Paths">
	    <vspace blankLines="0" /> "Loop-free alternate paths"
	    (LFA) are defined in <xref target="RFC5714">RFC 5714,
	    Section 5.2</xref> as follows.  "Such a path exists when a
	    direct neighbor of the router adjacent to the failure has
	    a path to the destination that can be guaranteed not to
	    traverse the failure."  Further detail can be found in
	    <xref target="RFC5286" />.  LFA as defined for IPFRR can
	    be used to load balance by relaxing the equal cost
	    criteria of ECMP, though IPFRR defined LFA for use in
	    selecting protection paths.  When used with IP,
	    proportional split is generally not used.  Use of LFA for
	    load balancing may be implemented, but is rare or
	    non-existent in deployments.
	  </t>
	  <t hangText="Composite Link"><vspace blankLines="0" />
	    The term Composite Link had been a registered trademark of
	    Avici Systems, but was abandoned in 2007.  The term
	    composite link is now defined by the ITU in
	    <xref target="ITU-T.G.800" />.  The ITU definition
	    includes multipath as defined here, plus inverse
	    multiplexing which is explicitly excluded from the
	    definition of multipath.
	  </t>
	  <t hangText="Inverse Multiplexing"><vspace blankLines="0" />
	    Inverse multiplexing either transmits whole packets and
	    resequences the packets at the receiving end or subdivides
	    packets and reassembles the packets at the receiving end.
	    Inverse multiplexing requires that all packets be handled
	    by a common egress packet processing element and is
	    therefore not useful for very high bandwidth applications.
	  </t>
	  <t hangText="Component Link"><vspace blankLines="0" />
	    The ITU definition of composite link in
	    <xref target="ITU-T.G.800" /> and the IETF definition of
	    link bundling in <xref target="RFC4201" /> both refer to
	    an individual link in the composite link or link bundle as
	    a component link.  The term component link is applicable
	    to all multipath.
	  </t>
	  <t hangText="LAG Member"><vspace blankLines="0" />
	    Ethernet Link Aggregation as defined in
	    <xref target="IEEE-802.1AX" /> refers to an individual link in a
	    LAG as a LAG member.
	  </t>
      </list></t>
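      <t>
	The two load split options named in the ECMP definition above
	(equal split, or split proportional to capacity) can be
	sketched as follows.  This is an illustration only: the
	function name and the example capacities are hypothetical and
	are not drawn from any routing protocol specification.
      </t>
      <figure><artwork><![CDATA[
```python
from fractions import Fraction


def split_weights(capacities, proportional):
    """Return the fraction of load sent to each link or path.

    capacities   -- capacity of each link or path (any common unit)
    proportional -- True for a split proportional to capacity,
                    False for an equal split
    """
    if proportional:
        total = sum(capacities)
        return [Fraction(c, total) for c in capacities]
    return [Fraction(1, len(capacities))] * len(capacities)


if __name__ == "__main__":
    links = [10, 10, 40]  # example capacities in Gb/s
    print(split_weights(links, proportional=False))
    print(split_weights(links, proportional=True))
```
]]></artwork></figure>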
    </section>

    <section anchor="multipath-reqm"
	     title="Multipath Requirements">
      <t>
	This section enumerates two sets of requirements.  The first
	set comprises the requirements imposed by the need for
	scalability, very large capacity links, and very large
	capacity LSP, and is enumerated in <xref target="ip-mpls-reqm"
	/>.  The second set comprises the requirements imposed by the
	needs of MPLS-TP and is enumerated in <xref target="tp-reqm"
	/>.  Discussion of these requirements is provided in
	<xref target="mp-reqm-discuss" />.
      </t>
      <t>
	<xref target="multipath-practices" /> describes multipath
	techniques which are implemented and deployed today.
	<xref target="multipath-changes" /> enumerates derived
	requirements which focus on means to support the requirements
	in <xref target="ip-mpls-reqm" /> and <xref target="tp-reqm"
	/> with minimal modifications to existing multipath
	techniques.  A summary of recommendations is provided in
	<xref target="multipath-summary" />.
      </t>
      <section anchor="ip-mpls-reqm"
	       title="Scalability and Large Capacity Requirements">
	<t>
	  Networks today may support thousands or tens of thousands of
	  nodes in total.  This large number of nodes is typically
	  arranged in tiers to improve scalability through aggregation
	  of signaling and aggregation of traffic.  The innermost
	  tier, most commonly referred to as the network core, may
	  support interconnection of adjacent sites with hundreds of
	  gigabits or terabits of capacity.
	</t>
	<t>
	  The physical interface of choice today is 10GbE with
	  migration toward 100GbE expected to begin in the near
	  future.  SONET and OTN are also in use, but are today also
	  limited to 10Gb/s or 40Gb/s, with 100Gb/s availability (OTN
	  ODU4) expected in the near future.  With core link
	  capacities of terabits today and tens of terabits expected
	  in the near future, multipath is needed.
	</t>
	<t>
	  <list counter="fr" hangIndent="4" style="format R#%d">
	    <t>
	      Multipath MUST support multipath links with capacity
	      well in excess of the largest component link and well in
	      excess of the capacity of a single packet processing
	      element.
	    </t>
	    <t>
	      Multipath SHOULD support direct service bearing LSP
	      carrying Internet traffic within the network core with
	      capacity in excess of the largest component link and in
	      excess of the capacity of a single packet processing
	      element.
	    </t>
	    <t>
	      Aggregation of LSP using hierarchy (as defined in
	      <xref target="RFC4206" />) may be necessary to reduce
	      the number of MPLS labels in use within a network tier
	      containing a large number of nodes.  This aggregation
	      SHOULD NOT be constrained by multipath limitations.
	    </t>
	    <t>
	      LSP containing the aggregate of other LSP SHOULD be
	      capable of exceeding the capacity of the largest
	      component link and the capacity of a single
	      packet processing element.
	    </t>
	    <t>
	      It SHOULD be possible to support load split of traffic
	      which is very efficient in its utilization of available
	      capacity, subject to some limitations due to conflicting
	      requirements.  The load split SHOULD support sharing of
	      total capacity across the entire multipath, where some
	      LSP may make use of capacity set aside for other
	      LSP but currently unused.  This load split SHOULD be as free of
	      bin packing issues as possible except when moving LSP to
	      other component links would conflict with other
	      requirements.
	    </t>
	  </list>
	</t>
      </section>
      <section anchor="tp-reqm"
	       title="MPLS-TP Requirements">
	<t>
	  MPLS-TP requirements related to multipath primarily
	  concern prohibiting out-of-order delivery of traffic for
	  reasons of OAM fate sharing.  Specific requirements related
	  to OAM are provided in
	  <xref target="I-D.ietf-mpls-tp-oam-framework">"MPLS-TP OAM
	  Framework", Section 4.6, Section 5.5.3, and Section
	  6.2.3</xref>.
	</t>
	<t>
	  The following requirement is currently met with no changes
	  to existing multipath implementations.
	  <list counter="fr" hangIndent="4" style="format R#%d">
	    <t>
	      Traffic within an MPLS-TP PW MUST NOT be reordered
	      unless specifically allowed.  This is met if a PW
	      control word is used <xref target="RFC4385" />.
	      Reordering may be specifically allowed using a PW flow
	      label <xref target="I-D.ietf-pwe3-fat-pw" />.
	    </t>
	  </list>
	</t>
	<t>
	  The following requirement can only be met with existing
	  multipath techniques using MPLS link bundling
	  <xref target="RFC4201" /> if LSR are configured to place an
	  LSP on only a single component rather than splitting some or
	  all LSP across the set of components.  Using link bundling
	  with all LSP constrained to use a single component has well
	  known disadvantages (see <xref target="multipath-bundle"
	  />).  Other forms of multipath as currently defined do not
	  meet this requirement (see <xref target="multipath-types"
	  />).
	  <list counter="fr" hangIndent="4" style="format R#%d">
	    <t>
	      Traffic within an MPLS-TP LSP MUST NOT be reordered if
	      full OAM capability is required of the MPLS-TP LSP
	      <xref target="I-D.ietf-mpls-tp-oam-framework" />.
	    </t>
	  </list>
	</t>
	<t>
	  The remaining MPLS-TP requirements are related to the scale
	  of a deployed MPLS-TP network and have the greatest impact
	  on the network core.  These are practical requirements
	  mostly related to scalability but specific to MPLS-TP.
	  <list counter="fr" hangIndent="4" style="format R#%d">
	    <t>
	      Service PWs and/or service bearing LSPs may form a
	      fairly dense mesh of LSPs from edge to edge over a very
	      large set of nodes.  Some means MUST be available to
	      support such usage of MPLS-TP.  See
	      <xref target="mp-label-space" /> for a discussion of ILM
	      size limitations that are relevant to this requirement.
	    </t>
	    <t>
	      For an MPLS-TP LSP to be fully compliant, all payload
	      and OAM traffic on the MPLS-TP LSP MUST traverse the
	      same physical path.  OAM traffic taking the same path as
	      payload (service bearing) traffic is known as the "fate
	      sharing" requirement (see <xref target="RFC5860">RFC
	      5860, Section 2.1.3</xref>).
	    </t>
	    <t>
	      For large networks, MPLS hierarchy
	      <xref target="RFC4206" /> can be used to reduce the
	      number of LSP from the large number which would be
	      needed to carry all service bearing MPLS-TP LSP through
	      the network core.  For networks configured through the
	      management plane, label stacking can be used to
	      aggregate LSP, though the signaling described in
	      <xref target="RFC4206" /> is not used.  Any MPLS-TP
	      constraints which impact this ability to aggregate LSP
	      SHOULD be optional.  If MPLS-TP constraints must be
	      relaxed in some deployments, such deployments MAY be
	      referred to as partially MPLS-TP compliant.
	    </t>
	    <t>
	      For large networks using link bundling to support large
	      aggregations of MPLS-TP traffic, and using MPLS
	      hierarchy, PSC LSP (see <xref target="RFC4206" />) or
	      label stacking which are providing a server layer within
	      the network core and carrying many service bearing
	      MPLS-TP LSP SHOULD be capable of supporting capacity in
	      excess of any single link bundle component.  In meeting
	      this requirement the server layer LSP need not be an
	      MPLS-TP LSP as long as it is capable of providing a
	      server layer which can support fully compliant MPLS-TP
	      LSP.
	    </t>
	  </list>
	</t>
	<t>
	  LSP which are configured entirely from the management plane
	  rather than through use of a control plane need not use the
	  MPLS PSC portion of the hierarchy as specified in RFC 4206;
	  however, hierarchy is still needed in the label stack.
	</t>
      </section>
      <section anchor="mp-reqm-discuss"
	       title="Discussion of Requirements">
	<t>
	  There is a tradeoff between using MPLS-TP as a server layer,
	  to gain the benefits of MPLS-TP, and using MPLS, to gain the
	  benefits of MPLS.  The benefits of MPLS-TP include MPLS-TP
	  OAM and the ability to run without the OSPF-TE, ISIS-TE, and
	  RSVP-TE control protocols.  The benefits of MPLS include
	  more efficient use of multipath capacity due to removal of
	  MPLS-TP constraints.
	</t>
	<t>
	  A requirement for very large server layer traffic flows
	  within the network core can be accommodated using multiple
	  parallel MPLS-TP LSP.  This increases the number of LSP
	  required, which itself is a drawback.  This also results in a
	  bin packing problem if the service bearing MPLS-TP LSP do
	  not require the same capacity and are not all small
	  multiples of a common capacity increment.  For example, bin
	  packing problems can occur unless all LSP are 10Gb/s, or all
	  LSP are either 10Gb/s or 40Gb/s.  This use of
	  MPLS-TP can also result in less opportunity for statistical
	  multiplexing with very large aggregates of lower priority
	  non-TP IP/MPLS traffic (see <xref target="multipath-bundle"
	  /> and <xref target="tp-server-layer" /> for further details
	  on bin packing problems and loss of efficiency with MPLS-TP
	  as a server layer).
	</t>
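	<t>
	  The bin packing problem mentioned above can be made concrete
	  with a small sketch.  It assumes that each LSP must be placed
	  whole on one component link and that placement uses a simple
	  first fit rule.  The placement rule, the function name, and
	  the capacities are assumptions chosen for illustration, not a
	  description of any implementation.
	</t>
	<figure><artwork><![CDATA[
```python
def first_fit(lsp_sizes, link_capacity, n_links):
    """Place each LSP whole on the first component link with room.

    Returns the remaining free capacity per component link and the
    list of LSP sizes that could not be placed.
    """
    free = [link_capacity] * n_links
    unplaced = []
    for size in lsp_sizes:
        for i, room in enumerate(free):
            if size <= room:
                free[i] = room - size
                break
        else:
            # No single component link has enough room left.
            unplaced.append(size)
    return free, unplaced


if __name__ == "__main__":
    # Six 6 Gb/s LSP (36 Gb/s) offered to four 10 Gb/s components.
    lsps = [6] * 6
    free, unplaced = first_fit(lsps, link_capacity=10, n_links=4)
    print("free capacity per component:", free)
    print("LSP that did not fit:", unplaced)
```
]]></artwork></figure>
	<t>
	  In this example the aggregate capacity (40 Gb/s) exceeds the
	  offered load (36 Gb/s), yet two LSP cannot be placed because
	  the 16 Gb/s of free capacity is stranded in 4 Gb/s fragments.
	</t>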
	<t>
	  The following subsections provide further detail related to
	  the requirements enumerated in <xref target="ip-mpls-reqm"
	  /> and <xref target="tp-reqm" />.
	</t>
	<section anchor="mp-midp-impact"
		 title="Requirements related to midpoint LSR">
	  <t>
	    Midpoint LSR must support a very large number of LSP.
	    This places requirements on the ILM size.  If a control
	    plane is used this also places requirements on the speed
	    of processing RSVP-TE messages.  As long as RSVP-TE ERO
	    contain only strict hops, the processing is limited to
	    connection admission, label assignment, and forwarding
	    hardware programming of the label swap operation.
	  </t>
	  <section anchor="mp-label-space"
		   title="MPLS Incoming Label Map (ILM) Size">
	    <t>
	      The MPLS label entry is 32 bits of which the label itself
	      is 20 bits <xref target="RFC3032" />.  This allows 2^20 or
	      1,048,576 values minus the 16 reserved label values.  The
	      Incoming Label Map (ILM) (see <xref target="RFC3031">RFC
		3031, Section 1.11</xref>) is generally much smaller.
	      Circa 2000, ILM sizes of 4K-32K were common.  Circa 2010,
	      ILM sizes of 64K-256K are more common in core LSR.
	    </t>
	    <t>
	      Putting a bound on ILM size has two effects.  First, it
	      allows LSR to offer higher power and space density.
	      Second, for deployments which use a control plane and
	      support restoration, speed of restoration is dramatically
	      improved when a smaller number of LSP are supported.
	    </t>
	  </section>
	  <section anchor="mp-power"
		   title="ILM Size Impact on Equipment Density">
	    <t>
	      For some architectures, bounding the ILM size allows the
	      ILM to be supported without forwarding memory external
	      to the forwarding IC.  This is a practical consideration
	      as the power reduction and board space reduction can
	      allow an LSR to achieve higher power and space density.
	    </t>
	    <t>
	      Reducing external memories reduces power consumed and
	      therefore reduces cooling problems.  In addition there are
	      board space reductions.  This results in reduced space as
	      well as power.
	    </t>
	    <t>
	      In today's networks, which predominantly use MPLS/GMPLS
	      OSPF-TE or ISIS-TE and RSVP-TE signaling, the
	      computational limitations described in
	      <xref target="mp-cspf" /> are the limiting factor.
	      Reductions in space and power due to a smaller ILM are
	      then a secondary consequence of the signaling scaling issue.
	    </t>
	  </section>
	  <section anchor="mp-topo-ilm-size"
		   title="Topology Impact on ILM Size">
	    <t>
	      In a network tier with N nodes, a worst case cutset has
	      N/2 nodes on either side of the cutset.  Given that a full
	      mesh of LSP connectivity is needed in the network core,
	      the cutset therefore carries N^2/4 LSP.  For example, if N
	      is 400, the cutset carries a minimum of 40,000 LSP to
	      achieve a full mesh.  If the core has over 2,000 nodes,
	      then the cutset carries over 1,000,000 LSP.  Since the
	      MPLS label space is only 20 bits, a full mesh within an
	      entire provider network with no hierarchy could easily
	      exceed the MPLS label number space.  Use of hierarchy can
	      solve this problem.
	    </t>
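	    <t>
	      The arithmetic above can be checked with a short sketch.
	      It counts one LSP per node pair across the cutset, as the
	      text does, and compares the result with the 2^20 - 16
	      label values available.  The function and constant names
	      are hypothetical.
	    </t>
	    <figure><artwork><![CDATA[
```python
LABEL_SPACE = 2**20 - 16  # 20 bit label minus 16 reserved values


def cutset_lsp(n_nodes):
    """LSP crossing a worst case cutset that has half of the nodes
    on either side, with one LSP per node pair across the cutset."""
    left = n_nodes // 2
    right = n_nodes - left
    return left * right


if __name__ == "__main__":
    for n in (400, 1000, 2000, 2048):
        count = cutset_lsp(n)
        verdict = "exceeds" if count > LABEL_SPACE else "fits in"
        print(n, "nodes:", count, "LSP,", verdict, "the label space")
```
]]></artwork></figure>
	    <t>
	      With one LSP per node pair the label space itself is
	      exceeded at 2,048 nodes.  If an ILM of 256K entries is
	      taken to mean 262,144 entries, that smaller bound is
	      already reached at 1,024 nodes.
	    </t>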
	    <t>
	      Typically there is more than one LSP between any pair of
	      LSR in the network core.  Protection is one source of
	      additional LSP.  More than one LSP may be required to
	      carry traffic with very different requirements.  See
	      <xref target="mp-topo-multi-lsp" />.
	    </t>
	    <t>
	      The result is that even considering only the ILM size, the
	      number of nodes in a full mesh of LSP must be limited to
	      well under 1,000.  If two links in a cutset supporting a
	      large number of LSP incur a fault, then the nodes
	      bordering the remaining links in the cutset must process a
	      very large number of RSVP-TE PATH and RESV messages and
	      the connection admission requests and ILM allocation
	      operations that are required as a result.
	    </t>
	  </section>
          <section anchor="mp-topo-multi-lsp"
		   title="Multiple LSP Between Node Pairs">
	    <t>
	      A full mesh of N nodes will have N*(N-1) unidirectional
	      LSP or N*(N-1)/2 bidirectional LSP if there is only one
	      LSP with any given pair of nodes as ingress and egress.
	      There may be more than one LSP with any given pair of
	      nodes as ingress and egress to meet protection
	      requirements or to meet certain quality of service
	      requirements.
	    </t>
	    <t>
	      If <xref target="RFC4426">GMPLS protection</xref>
	      is used, the number of LSP is doubled with
	      end-to-end (path) protection, but more than doubled with
	      span protection.  If <xref target="RFC4090">MPLS
		FRR</xref> is used, the number of LSP is increased only
	      slightly with the (more common) facility backup
	      technique, but more than doubled with the one-to-one
	      backup technique.
	    </t>
	    <t>
	      All services between a pair of core nodes may be carried
	      over a single unsignaled E-LSP <xref target="RFC3270" />
	      if the eight <xref target="RFC5462">TC values</xref> are
	      sufficient and the requirements of these services are
	      sufficiently similar.  If more than eight PHB are
	      required, more LSP will be required.  If services
	      require preemption, or have different protection needs,
	      then multiple LSP per pair of core nodes are required.
	      If services have different delay requirements, this too
	      may require multiple LSP per pair of core nodes.
	    </t>
	    <t>
	      The total number of LSP at a cutset needs to be
	      constrained for two reasons.  First, the number of LSP must
	      fit in the 20-bit label field or the smaller number of
	      labels supported by most LSR.  Second, there is a need to
	      reduce the amount of signaling that would be required if
	      restoration were needed to cover a multiple fault (if
	      restoration is not supported, multiple faults can result in
	      otherwise avoidable outages which persist until a physical
	      repair or manual intervention is completed).
	    </t>
	  </section>
        </section>
	<section anchor="mp-ingress-impact"
		 title="Requirements Related to Ingress LSR">
	  <t>
	    Where traffic enters a provider network tier such as the
	    core, LSR serve as ingress to PSC LSP if hierarchy is
	    used.  If RSVP-TE signaling is used, ingress must perform
	    CSPF if fully dynamic MPLS routing is used.  Even when
	    working and protection paths are configured with explicit
	    paths computed offline, when a multiple fault occurs, if
	    restoration is supported, then CSPF must be run.  It is
	    this multiple fault scenario which generally dictates
	    scalability.
	  </t>
	  <section anchor="mp-signal"
		   title="Reasons to Use MPLS/GMPLS Signaling">
	    <t>
	      Dynamic routing is necessary in order to provide
	      restoration which is as robust as possible in the presence
	      of multiple faults while still providing efficient
	      utilization of resources.
	    </t>
	    <t>
	      Legacy transport networks offer protection which
	      requires dedicated protection resources.  If resources
	      are allocated through the management plane, then
	      restoration support is either not provided at all or
	      extremely slow at best.  More modern transport equipment
	      which supports fast restoration requires signaling which
	      is generally provided using GMPLS.
	    </t>
	    <t>
	      IP/MPLS networks typically make use of protection which
	      offers sharing of protection resources or more commonly
	      make use of zero bandwidth allocation on protection paths.
	      The use of zero bandwidth allocation provides robust
	      protection of preferred traffic as long as preferred
	      traffic is given queuing priority and preferred traffic
	      levels are low enough that adequate protection resources
	      are available for preferred traffic regardless of the
	      protection path taken.  This assumption is not violated in
	      networks which are dominated by Internet traffic and carry
	      a minority of preferred traffic.
	    </t>
	    <t>
	      When a single fault occurs, protection should restore
	      traffic flow quickly, with a typical target being 45 msec.
	      Many deployments are configured such that LSR run CSPF
	      after a fault to obtain a new protection path for what is
	      now effectively the working path, or reroute the working
	      LSP and then create a new protection LSP.
	    </t>
	    <t>
	      Multiple faults which are not accounted for by SRLG are
	      fairly common.  In many cases, such as an earthquake,
	      bridge collapse, train wreck, or flood, it is impractical
	      to account for the specific multiple fault in the SRLG
	      set.  When such a fault does occur, fast
	      restoration is often required for a large number of LSP
	      for which both the working and protect paths are affected.
	      In this case, a long convergence time would result in a
	      more lengthy outage for those LSP for which the multiple
	      fault was service affecting.
	    </t>
	    <t>
	      For core Internet services and for many non-Internet core
	      services, an inability to reach any one point in the
	      network from another for a significant length of time due
	      to a fault which is correctable, even if it is a multiple
	      fault, is unacceptable.  These services require
	      restoration at some layer.
	    </t>
	  </section>
	  <section anchor="mp-cspf"
		   title="MPLS Fault Response and CSPF Scaling">
	    <t>
	      For most core networks MPLS/GMPLS signaling is required at
	      some layer for reasons described in
	      <xref target="mp-signal" />.  In order for restoration to
	      occur quickly, scaling issues must be considered and
	      addressed, including network topology impacts on scaling.
	      These scaling issues are dominated by CSPF computations
	      and OSPF or ISIS flooding impact.
	    </t>
	    <t>
	      For a given ingress in a full mesh of LSR, a fault can
	      result in a very large number of affected LSP.  At
	      midpoint LSR the worst case number of connection
	      acceptance decisions can be very large.  The
	      computational load per LSP on connection acceptance at
	      midpoint LSR is small but the reflooding of available
	      bandwidth can also contribute significant load.
	    </t>
	    <t>
	      At LSP ingress, the number of CSPF computations imposes
	      scaling limitations.  CSPF computation time is
	      proportional to the number of nodes in a mesh and the
	      total number of links.  If the average node degree
	      remains constant, then the total number of links is
	      proportional to the number of nodes.  The result is a
	      single CSPF time with order N*log2(N) time complexity
	      (where N is the number of nodes in the mesh).  If the
	      worst case number of LSP affected by a fault also grows
	      proportionally to N, then the total amount of computation
	      is order N^2*log2(N).  The amount of computation therefore
	      grows at a rate greater than the square of the growth in
	      the number of nodes.
	    </t>
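	    <t>
	      As a rough illustration of this growth rate, the sketch
	      below compares the relative CSPF workload for two mesh
	      sizes.  It assumes, as the paragraph above does, that a
	      single CSPF costs on the order of N*log2(N) and that the
	      number of affected LSP grows in proportion to N.  Constant
	      factors are ignored and the function names are
	      hypothetical.
	    </t>
	    <figure>
	      <artwork><![CDATA[
```python
import math


def relative_cspf_work(n_nodes: int) -> float:
    """Relative total work after a fault: about N affected LSP, each
    needing one CSPF run of order N*log2(N).  Constants are ignored."""
    return n_nodes ** 2 * math.log2(n_nodes)


def growth_factor(n_small: int, n_large: int) -> float:
    """How much more CSPF work the larger mesh needs than the smaller."""
    return relative_cspf_work(n_large) / relative_cspf_work(n_small)


if __name__ == "__main__":
    # Doubling the mesh size more than quadruples the work.
    print(f"100 -> 200 nodes: x{growth_factor(100, 200):.2f}")
    print(f"200 -> 400 nodes: x{growth_factor(200, 400):.2f}")
```
]]></artwork>
	    </figure>
	    <t>
	      Read in the other direction, halving the size of a full
	      mesh reduces this worst case workload by more than a
	      factor of four.
	    </t>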
	    <t>
	      If restoration is not supported, any multiple fault will
	      result in a lengthy outage.  If restoration is supported,
	      constraining the size of a full mesh will very
	      significantly reduce the CSPF computation load and the
	      reflooding overhead and very significantly improve the
	      worst case restoration time.
	    </t>
	  </section>
        </section>
	<section anchor="mp-efficiency"
		 title="Efficient Use of Multipath Capacity">
	  <t>
	    Multipath load split based on hashing the IP addresses or
	    MPLS labels is far from perfect, though it is widely
	    implemented and widely deployed.  For the vast majority of
	    traffic, which is predominantly Internet traffic, the
	    underlying assumption that traffic is quite evenly
	    distributed across a hash space is valid.  For a mix of
	    Internet traffic and fairly persistent large microflows,
	    adaptive multipath has proven effective (see
	    <xref target="multipath-active" />).
	  </t>
	  <t>
	    The bandwidth reservations of LSP carrying Internet
	    traffic are merely predictions of required capacity.
	    Often a significant percentage of traffic can shift among
	    a set of LSP.  A great deal of efficiency is gained in the
	    presence of such shifts through the ability to dynamically
	    share the available capacity on a multipath.
	  </t>
	  <t>
	    The introduction of a minority of higher priority (and
	    higher gross margin) services to predominantly Internet
	    traffic yields an additional opportunity to make more
	    efficient use of capacity.  These higher priority services
	    on average significantly underutilize their guaranteed
	    capacities.  The average over the entire set of such
	    services is fairly predictable.  The capacity allocated to
	    these services but unused can be used as Internet
	    capacity.  Some small probability exists that these
	    services will make use of significantly more capacity than
	    predicted, up to their guaranteed capacities, but the
	    consequence of this unlikely occurrence is a reduction in
	    capacity available to the Internet traffic for which
	    capacity is not guaranteed.  This practice allows high
	    margin services to be delivered at substantially lower
	    cost with very little risk to Internet traffic and no risk
	    at all to the higher priority services.
	  </t>
	  <t>
	    For the reasons above, current multipath techniques offer
	    efficient use of multipath capacity.  Changes to multipath
	    MUST NOT sacrifice this efficiency where it is not
	    necessary to meet other requirements.
	  </t>
	</section>
      </section>
    </section>

    <section anchor="multipath-practices"
	     title="Multipath Current Practices">
      <t>
	Multipath takes many forms.  These include the use of ECMP in
	various protocols, Ethernet Link Aggregation, and Link
	Bundling.  The specifications for each of these forms of
	multipath provide limited characterization of external
	behavior, where any guidance is provided at all.  This
	section summarizes current practices among products which are
	currently or have in the past been deployed successfully in
	Internet service provider networks and content provider
	networks.
      </t>
      <t>
	Much of the existing information on multipath current
	practices is summarized in <xref target="existing" />.  With
	the exception of the work in PWE3 and minimal mention in LDP,
	very little consideration for multipath impact on new
	protocols has been documented.
      </t>
      <t>
	This section is divided into two parts.  First is
	documentation of techniques common to all forms of multipath
	in <xref target="multipath-common" />.  Second is application
	of these techniques and unique characteristics of specific
	forms of multipath in <xref target="multipath-types" />.
      </t>
      <section anchor="multipath-common"
	       title="Techniques Common to Multipath in Provider Networks">
	<t>
	  There is a dramatic difference between the multipath
	  techniques used for pure Layer-2 Ethernet switches intended
	  for enterprise networks and the multipath techniques used
	  for large provider core networks.  Many enterprise switches
	  use only the Ethernet MAC in load balancing, though the
	  argument that such networks may not be carrying IP or MPLS
	  traffic at all is rarely cited as a reason today.  The
	  routers and/or LSR used in large provider networks are
	  assumed to be carrying IP traffic and/or MPLS traffic where
	  the MPLS traffic is predominantly carrying IP traffic as its
	  payload.
	</t>
	<t>
	  Most of the multipath techniques used for large provider
	  core networks are common across all types of multipath.
	  This is because the traffic being handled by multipath in
	  large provider networks is predominantly IP or IP over MPLS.
	  The following paragraph is quoted from
	  <xref target="RFC4928"> RFC 4928, Section 2, "Current ECMP
	  Practices"</xref>:
	  <list style="empty">
	    <t>
	      In the early days of MPLS, the payload was almost
	      exclusively IP.  Even today the overwhelming majority of
	      carried traffic remains IP.  Providers of MPLS equipment
	      sought to continue this IP ECMP behavior.  As shown
	      above, it is not possible to know whether the payload of
	      an MPLS packet is IP at every place where IP ECMP needs
	      to be performed.  Thus vendors have taken the liberty of
	      guessing the payload.  By inspecting the first nibble
	      beyond the label stack, existing equipment infers that a
	      packet is not IPv4 or IPv6 if the value of the nibble
	      (where the IP version number would be found) is not 0x4
	      or 0x6 respectively.  Most deployed LSRs will treat a
	      packet whose first nibble is equal to 0x4 as if the
	      payload were IPv4 for purposes of IP ECMP.
	    </t>
	  </list>
	</t>
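	<t>
	  The payload guess described in the quoted paragraph can be
	  sketched as follows.  This is a minimal illustration, not
	  any vendor's implementation: it assumes a raw MPLS packet
	  held in a byte string, walks the label stack using the
	  bottom-of-stack bit of each 4-octet label stack entry, and
	  then inspects the first nibble of the payload.  The function
	  names and the helper that builds label stack entries are
	  hypothetical.
	</t>
	<figure>
	  <artwork><![CDATA[
```python
def lse(label: int, tc: int = 0, s: bool = False, ttl: int = 64) -> bytes:
    """Build one 4-octet label stack entry (hypothetical test helper):
    20-bit label, 3-bit TC, 1-bit bottom-of-stack flag, 8-bit TTL."""
    value = (label << 12) | (tc << 9) | (int(s) << 8) | ttl
    return value.to_bytes(4, "big")


def guess_payload(packet: bytes) -> str:
    """Guess the payload type from the first nibble after the stack."""
    offset = 0
    while True:
        if offset + 4 > len(packet):
            return "unknown"              # truncated label stack
        bottom = packet[offset + 2] & 0x01  # S bit is the low bit of octet 2
        offset += 4
        if bottom:
            break
    if offset >= len(packet):
        return "unknown"                  # no payload octets present
    nibble = packet[offset] >> 4
    if nibble == 0x4:
        return "ipv4"                     # treated as IPv4 for hashing
    if nibble == 0x6:
        return "ipv6"                     # treated as IPv6 for hashing
    return "non-ip"                       # fall back to label stack hash


if __name__ == "__main__":
    stack = lse(100) + lse(200, s=True)
    print(guess_payload(stack + bytes([0x45, 0x00])))
    print(guess_payload(stack + bytes([0x00, 0x00])))
```
]]></artwork>
	</figure>
	<t>
	  A payload whose first nibble is zero is classified as
	  non-IP in this sketch, so such a packet is not mistaken
	  for IPv4 or IPv6.
	</t>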
	<t>
	  This observation led to the specification of
	  the <xref target="RFC4385"> PW Control Word</xref> such that
	  the values 4 and 6 which could be mistaken for IPv4 or IPv6
	  were avoided.  More accurately, <xref target="RFC4928" />
	  was written to document the reasons for this decision made
	  in <xref target="RFC4385" />.
	</t>
	<section anchor="multipath-flow-id"
		 title="Flow Identification">
	  <t>
	    IP traffic in a large provider core network contains a
	    very large number of very short lived microflows (refer to
	    the definition of microflow in <xref target="RFC2475" />).
	    The number of flows has in the past been estimated as many
	    millions or many tens of millions.  Many of the flows
	    exchange as few as two packets (DNS for example).  Most
	    contain only tens of packets.  Most flows exist for a few
	    seconds and some less than a second.  A much smaller
	    number of flows (though still a large number) are longer
	    in duration and exchange larger amounts of data.
	  </t>
	  <t>
	    Attempts to isolate individual IP flows in large provider
	    core networks for the purpose of routing them individually
	    have met with resounding failure.  Current practice does
	    not attempt to isolate individual flows, but instead
	    isolates groups of flows.  If reordering is minimized or
	    eliminated for groups of flows, then reordering is
	    minimized or eliminated for any single flow within a group.
	  </t>
	  <t>
	    The method of subdividing IP traffic into groups of flows
	    that has been used successfully for more than two decades
	    (since the T1-NSFNET in 1987 or possibly prior to that) is
	    to use a hash function over the IP source address and
	    destination address.  Including the TCP or UDP port
	    numbers might be beneficial for enterprise networks but is
	    not necessary for large provider networks.  Omitting port
	    numbers in large provider networks has the desirable
	    characteristic of better enforcing fairness among flows by
	    eliminating or reducing the potential of end users using
	    multiple port numbers to defeat any tendency toward
	    fairness among flows.
	  </t>
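	  <t>
	    A minimal sketch of this grouping is shown below.  The
	    choice of CRC-32 as the hash function and of 1024 groups
	    are assumptions made for illustration; deployed equipment
	    uses implementation specific hash functions and group
	    counts.  Only the source and destination addresses are
	    hashed, so all flows between the same pair of addresses
	    fall into the same group regardless of port number.
	  </t>
	  <figure>
	    <artwork><![CDATA[
```python
import ipaddress
import zlib


def flow_group(src: str, dst: str, n_groups: int = 1024) -> int:
    """Map an IP source/destination address pair to a flow group.

    CRC-32 and the default group count are illustrative assumptions.
    Port numbers are deliberately not part of the hash key.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % n_groups


if __name__ == "__main__":
    # Documentation address ranges are used as sample input.
    print(flow_group("192.0.2.1", "198.51.100.7"))
    print(flow_group("2001:db8::1", "2001:db8::2"))
```
]]></artwork>
	  </figure>
	  <t>
	    Because every packet of a microflow carries the same
	    address pair, every packet of that microflow maps to the
	    same group and ordering within the microflow is preserved
	    as long as the group stays on one path.
	  </t>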
	  <t>
	    In large provider core networks, MPLS LSP (in contrast to
	    IP microflows) are very long lived, generally carry large
	    to very large amounts of traffic, and are relatively few in
	    number.  In many large provider core networks, LSP which
	    carry Internet traffic from one major core node to another
	    major core node can very substantially exceed the
	    capacity of a multipath component link.
	  </t>
	  <t>
	    For MPLS traffic carrying Internet IP traffic, "taking the
	    liberty of guessing the payload" (as described in RFC
	    4928) was a matter of necessity.  The label stack simply
	    did not provide adequate diversity.  Initially some LSR
	    did not support this capability.  Splitting very large LSP
	    by configuring two or more LSP provided a workaround (which
	    only moved the hashing and load splitting out of the
	    core).  However, hashing based on the label stack was highly
	    ineffective, and packing LSP individually into link bundle
	    component links has substantial disadvantages (see
	    <xref target="multipath-bundle" />).
	  </t>
	  <t>
	    For MPLS that is not carrying IP, the MPLS label stack is
	    used as the basis for the load split hash.  Generally the
	    entire label stack is used or as few as three of the
	    bottom labels are used.  Using only the bottom label (or
	    only the top label) has proven unsatisfactory in terms of
	    splitting the load.  Some forms of PW can be subdivided
	    which has motivated the introduction of
	    a <xref target="I-D.ietf-pwe3-fat-pw">PW flow
	    label</xref>.
	  </t>
	</section>
	<section anchor="multipath-active"
		 title="Simple Multipath and Adaptive Multipath">
	  <t>
	    Simple multipath generally relies on the mathematical
	    probability that given a very large number of small
	    microflows, these microflows will tend to be distributed
	    evenly across a hash space.  A common simple multipath
	    implementation assumes that all component links are of
	    equal capacity and performs a modulo operation on the
	    hashed value.  An alternate simple multipath technique
	    uses a table generally with a power of two size, and
	    distributes the table entries proportionally among
	    component links according to the capacity of each
	    component link.
	  </t>
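	  <t>
	    The two simple multipath techniques described above can be
	    sketched as follows.  The function names, the table size
	    of 256, and the rule used to hand out leftover table
	    entries are assumptions made for illustration and are not
	    taken from any specification.
	  </t>
	  <figure>
	    <artwork><![CDATA[
```python
def select_modulo(hash_value: int, n_links: int) -> int:
    """Equal-capacity case: component link index is hash mod link count."""
    return hash_value % n_links


def build_table(capacities: list, table_size: int = 256) -> list:
    """Fill a table with component link indexes so that each link's share
    of entries is proportional to its capacity.  Entries left over after
    integer division go to the links with the largest remainders."""
    total = sum(capacities)
    counts = [c * table_size // total for c in capacities]
    remainders = [c * table_size % total for c in capacities]
    leftover = table_size - sum(counts)
    by_remainder = sorted(range(len(capacities)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    table = []
    for link, count in enumerate(counts):
        table.extend([link] * count)
    return table


def select_table(hash_value: int, table: list) -> int:
    """Unequal-capacity case: index the table with the hash value."""
    return table[hash_value % len(table)]


if __name__ == "__main__":
    # Four component links with capacities in arbitrary but equal units.
    table = build_table([10, 10, 40, 100])
    print([table.count(link) for link in range(4)])
    print(select_modulo(0x1234, 4), select_table(0x1234, table))
```
]]></artwork>
	  </figure>
	  <t>
	    With a table, changing the load split only requires
	    rewriting table entries, which is also the natural place
	    for an adaptive technique to make its adjustments.
	  </t>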
	  <t>
	    An adaptive multipath technique is one where the traffic
	    bound to each component link is measured and the
	    load split is adjusted accordingly.  As long as the
	    adjustment is done within a single network element, then no
	    protocol extensions are required and there are no
	    interoperability issues.
	  </t>
	  <t>
	    Specific adaptive multipath techniques are outside of the
	    scope of this document.
	  </t>
	</section>
	<section anchor="multipath-links"
		 title="Traffic Split over Parallel Links">
	  <t>
	    The load splitting techniques defined in
	    <xref target="multipath-common" /> and those defined in
	    <xref target="multipath-active" /> are both used in
	    splitting traffic over parallel links between the same
	    pair of nodes.  The best known technique, though far from
	    being the first, is
	    <xref target="IEEE-802.1AX">Ethernet Link
	    Aggregation</xref>.  This same technique had been applied
	    much earlier using OSPF or ISIS Equal Cost MultiPath
	    (ECMP) over parallel links between the same
	    nodes.  <xref target="RFC1717"> Multilink PPP</xref> uses
	    a technique that provides inverse multiplexing.  A number
	    of vendors had provided proprietary extensions to
	    <xref target="RFC2615">PPP over SONET/SDH</xref> that
	    predated Ethernet Link Aggregation but are no longer used.
	  </t>
	  <t>
	    <xref target="RFC4201">Link bundling</xref> provides yet
	    another means of handling parallel LSP.  RFC 4201
	    explicitly allows a special value of all ones to indicate a
	    split across all component links of the bundle.  Use of
	    link bundling is discussed in
	    <xref target="multipath-bundle" />.
	  </t>
	  <t>
	    All of these techniques, including ECMP, may be used over
	    two or more links between a pair of nodes.  The most
	    primitive load split algorithms may require that all links
	    be of the same capacity and may attempt to load balance
	    equally.  Somewhat less primitive techniques may allow
	    links to be unequal in capacity.  Any of these techniques
	    can also use an adaptive multipath algorithm as described
	    in <xref target="multipath-active" />.
	  </t>
	</section>
	<section anchor="multipath-paths"
		 title="Traffic Split over Multiple Paths">
	  <t>
	    OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known
	    form of traffic split over multiple paths that may traverse
	    intermediate nodes.  ECMP is often incorrectly equated to
	    only this case, and multipath over multiple diverse paths is
	    often incorrectly equated to an equal division of traffic.
	  </t>
	  <t>
	    Many implementations are able to create more than one LSP
	    between a pair of nodes, where these LSP are routed
	    diversely to better make use of available capacity.  The
	    load on these LSP can be distributed proportionally to the
	    reserved bandwidth of the LSP.  These multiple LSP may be
	    advertised as a single PSC FA and any LSP making use of the
	    FA may be split over these multiple LSP.
	  </t>
	  <t>
	    <xref target="RFC4201">Link bundling</xref> component links
	    may themselves be LSP.  When this technique is used, any LSP
	    which specifies the link bundle may be split across the
	    multiple paths of the LSP that comprise the bundle.
	  </t>
	  <t>
	    Other forms of multipath may use what appear to be
	    physical component links that are provided by a server
	    layer.  For example, the components of an Ethernet LAG may
	    be provided by Ethernet PW <xref target="RFC4448" />.
	  </t>
	  <t>
	    Techniques which spread traffic over multiple paths may
	    use simple multipath or adaptive multipath as described in
	    <xref target="multipath-active" />.  When ECMP is used
	    over an IP link or MPLS LDP LSP, visibility of available
	    capacity along the path is limited to the next hop only,
	    therefore load which is split proportionally to the
	    capacity of the immediate hop may not be split optimally
	    for the entire path, even when using adaptive multipath
	    capable forwarding.  For techniques which split traffic
	    over one or more LSP, the available capacity along the
	    path to the destination is assumed to be known through the
	    bandwidth reservations of the LSP.
	  </t>
	</section>
      </section>
      <section anchor="multipath-types"
	       title="Specific Types of Multipath">
	<t>
	  Three forms of multipath are considered here.
	  <list style="symbols">
	    <t>
	      ECMP
	    </t>
	    <t>
	      Ethernet Link Aggregation
	    </t>
	    <t>
	      MPLS Link Bundling
	    </t>
	  </list>
	</t>
	<t>
	  Of these types of multipath, the latter two can be applied
	  to MPLS with RSVP-TE signaling or static configurations.
	</t>
	<section anchor="multipath-ecmp"
		 title="ECMP Current Practices">
	  <t>
	    Equal Cost Multipath has been available in the ISIS and
	    OSPF link state routing protocols for two decades or more.
	    For example, see <xref target="RFC1247" />.  ECMP is also
	    available in BGP.  ECMP is declared out of scope in LDP,
	    though widely implemented.
	  </t>
	  <t>
	    Although ECMP is not applicable to MPLS LSP set up with
	    RSVP-TE signaling, ECMP can be applied at an LER.
	  </t>
	  <t>
	    At an MPLS LER ECMP can be applied over two or more MPLS
	    LSP with traffic split proportionally to the LSP reserved
	    bandwidth.  This could also be considered to be IP ECMP
	    with an underlying MPLS LSP server layer.
	  </t>
	  <t>
	    The equivalent to ECMP for an LSP setup can be achieved by
	    creating PSC LSP and concatenating them using link
	    bundling, and using the "all ones" link bundle component
	    (see <xref target="multipath-bundle" />).
	  </t>
	</section>
	<section anchor="multipath-lag"
		 title="Ethernet Link Aggregation Current Practices">
	  <t>
	    Ethernet link aggregation (<xref target="IEEE-802.1AX" />)
	    concatenates a set of Ethernet member links below the
	    Ethernet link layer, such that the link aggregation group
	    (LAG) appears as a single link with a single Ethernet MAC
	    address.  The link aggregation control protocol (LACP)
	    coordinates membership in the LAG such that the member
	    links can be made unavailable to upper layers and added to
	    the LAG on both nodes.
	  </t>
	  <t>
	    For IP using a link state protocol with ECMP, Ethernet
	    link aggregation had little effect.  The load balancing on
	    a LAG was identical to the load balancing using ECMP over
	    the set of member links.  ISIS only advertises the
	    adjacencies between nodes.  OSPF advertises each link
	    between nodes, so for IP using OSPF, link aggregation only
	    resulted in a reduction in routing protocol overhead and
	    simplification of the SPF.
	  </t>
	  <t>
	    For MPLS, some vendors had already implemented proprietary
	    extensions to <xref target="RFC2615">PPP over
	    SONET/SDH</xref> that predated the earliest IEEE work on
	    link aggregation (IEEE 802.3ad) with capabilities similar
	    to LACP.  It was not until 10GbE became widely available
	    (about 5 years later) that LAG was used in provider core
	    networks, and began replacing OC-192.  MPLS link bundling
	    implementations (prior to RFC status) also predated
	    Ethernet link aggregation.
	  </t>
	  <t>
	    A network deployment circa 2005 could either configure
	    many Ethernet links and use MPLS link bundling, or
	    configure an Ethernet LAG.  If an MPLS link bundle was
	    configured to split load over all link bundle component
	    links the functionality was equivalent to configuring the
	    set of links as a LAG.  In core LSR implementations, the
	    load split in these two cases was identical.
	  </t>
	</section>
	<section anchor="multipath-bundle"
		 title="MPLS Link Bundling Current Practices">
	  <t>
	    MPLS link bundling <xref target="RFC4201" /> was conceived
	    at about the time that it was clear that OC-48 was too
	    slow for IP core links, OC-192 was just becoming available
	    and would soon be too slow, and MPLS had strong support
	    among multiple providers.  Link bundling initially solved
	    two problems.  First, a few individual vendors had
	    proprietary extensions to <xref target="RFC2615">PPP over
	    SONET/SDH</xref>.  Link bundling could offer equivalent
	    capability and offer vendor interoperability.  Second,
	    some vendor hardware was not capable of load splitting and
	    therefore required that each top level LSP be assigned a
	    single path.  Further, each side of a link bundle could be
	    configured differently: one could load split and the other
	    could place LSP on individual component links.
	  </t>
	  <t>
	    If LSP are placed on individual links rather than split
	    over the entire bundle, then bin packing problems can
	    occur.  LSP are often large making this packing error
	    significant.  In addition, LSP bandwidth reservations in
	    most IP/MPLS deployments are only predictions of expected
	    bandwidth.  With link bundling, as specified, LSP cannot
	    be moved from one link bundle component link to another.
	    If LSP are assigned to links rather than split based on IP
	    address pairs, there is less opportunity for one LSP to
	    make use of capacity left unused by other LSP that are
	    underutilized.  The bin packing and loss of opportunity to
	    share capacity both reduce the efficiency of capacity
	    utilization.
	  </t>
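	  <t>
	    The bin packing effect can be illustrated with a small
	    sketch.  The first-fit placement rule used here is an
	    assumption chosen for simplicity, since link bundling does
	    not mandate a particular placement algorithm, and the
	    capacities are arbitrary example values.
	  </t>
	  <figure>
	    <artwork><![CDATA[
```python
def first_fit(lsp_sizes: list, link_capacity: int, n_links: int):
    """Place each LSP whole on the first component link with enough
    free capacity.  Returns (placed, rejected, free_per_link)."""
    free = [link_capacity] * n_links
    placed, rejected = [], []
    for size in lsp_sizes:
        for link in range(n_links):
            if free[link] >= size:
                free[link] -= size
                placed.append((size, link))
                break
        else:
            rejected.append(size)   # no single link has room for this LSP
    return placed, rejected, free


if __name__ == "__main__":
    # Five LSP reserving 6 units each over a bundle of three 10-unit links.
    placed, rejected, free = first_fit([6] * 5, 10, 3)
    print("placed:", placed)
    print("rejected:", rejected)
    print("stranded capacity:", sum(free))
```
]]></artwork>
	  </figure>
	  <t>
	    In this example the bundle has 30 units of capacity and the
	    LSP reserve 30 units in total, yet two LSP cannot be placed
	    and 12 units are stranded.  Splitting every LSP across the
	    whole bundle would allow all five to be carried.
	  </t>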
	  <t>
	    MPLS link bundling does not currently offer an ability to
	    select which LSP are assigned to a single component link
	    and which LSP are split over the entire set of component
	    links.  Most forwarding hardware can support this.
	    Although an LSR could in principle be configured to use
	    some other attribute of an LSP to infer the decision to
	    load split, such as holding priority or an affinity for an
	    administrative attribute, no LSR software provides this
	    capability.  Until MPLS-TP there was never a need for that
	    capability.
	  </t>
	</section>
      </section>
    </section>

    <section anchor="multipath-changes"
	     title="Improving Support for MPLS-TP and Multipath Requirements">
      <t>
	The purpose of this section is to describe how MPLS-TP and
	multipath could coexist and to define simple changes to
	accomplish this.
      </t>
      <section anchor="mp-soln-discuss"
	       title="Characteristics of MPLS-TP Multipath Solutions">
	<t>
	  Three different methods to support MPLS-TP and multipath are
	  described.  One method requires simple changes to link
	  bundle and LAG.  One method requires no changes but has
	  disadvantages.  One method involves no change to multipath
	  but requires relaxation of MPLS-TP OAM requirements.
	</t>
	<t>
	  The best solution makes MPLS over multipath a fully
	  compliant server layer for MPLS-TP meeting all of the
	  requirements stated in the prior sections but cannot be
	  fully supported by most existing LSR without hardware
	  changes.  The other two solutions have disadvantages but
	  require little or no change to existing hardware that would
	  otherwise support MPLS-TP.  The changes are specified at the
	  level of detail of requirements and/or framework rather than
	  as specific protocol changes.
	</t>
	<section anchor="tp-coexist"
		 title="Coexistence of MPLS and MPLS-TP">
	  <t>
	    The largest contributor of provider traffic today is the
	    Internet.  All of this traffic is IP with some providers,
	    but not all, using IP over MPLS.  IP is used without MPLS
	    with ECMP and LAG and IP is used with MPLS with all three
	    forms of multipath described in
	    <xref target="multipath-types" />, ECMP, LAG, and link
	    bundling.
	  </t>
	  <t>
	    In addition to Internet services, many providers currently
	    offer layer-2 and layer-3 VPN services over MPLS today.
	    Other providers offer native layer-2 services with an
	    intention to migrate to MPLS-TP for these services.
	  </t>
	  <t>
	    A primary purpose of migrating VPN and circuit services from
	    layer-2 to MPLS-TP is to reduce cost relative to a dedicated
	    layer-2 infrastructure for these services.  Much of that
	    reduction comes from making use of infrastructure in place
	    to support Internet traffic.
	  </t>
	  <t>
	    Using the capacity in place for Internet, predictive
	    reservations can be made for higher priority services, with
	    guarantees possible by transferring the risk of exceeding
	    the predictions to the Internet traffic through use of
	    priority queuing.  With Internet loads being much larger,
	    the unlikely event of predictive reservations being exceeded
	    would easily be absorbed.  This architecture allows VPN and
	    circuit services to be delivered at lower cost.
	  </t>
	  <t>
	    IP/MPLS requires the use of multipath due to the high
	    traffic levels.  MPLS-TP requires a single path for each
	    LSP.  With no changes, these two requirements are in
	    conflict.  Three possible approaches are examined in the
	    following sections.
	    <list style="numbers">
	      <t>
		Supporting MPLS and MPLS-TP over a common server layer
		with multipath support as well as MPLS-TP over an MPLS
		server layer over a multipath capable server layer.
	      </t>
	      <t>
		Supporting MPLS over an MPLS-TP server layer using
		multiple MPLS-TP LSP as MPLS component links where
		multipath is needed.
	      </t>
	      <t>
		Relaxing MPLS-TP OAM and documenting the limitations
		such that MPLS-TP could be supported over an existing
		multipath server layer.
	      </t>
	    </list>
	  </t>
	  <t>
	    Each of these is a separate solution.  For example, if
	    changes to MPLS forwarding enable MPLS with multipath to
	    support fully compliant MPLS-TP LSP, then relaxing MPLS-TP
	    OAM is not needed.  Conversely, if MPLS forwarding cannot be
	    changed on specific existing equipment to accommodate
	    MPLS-TP, then one of the other two solutions is required.
	    Supporting MPLS-TP OAM at high rates also requires hardware
	    change to most existing LSR, therefore all of these
	    solutions require some form of hardware change.
	  </t>
	</section>
	<section anchor="coexist-solution-set"
		 title="Advantages and Disadvantages of Solutions">
	  <t>
	    A desirable solution is one that meets all requirements and
	    is highly cost effective.  An undesirable solution is one
	    that either does not meet all requirements or is not cost
	    effective.  The ability to use existing hardware is also
	    desirable.  A number of solutions and the necessary changes
	    are discussed in the following subsections.
	  </t>
	  <t>
	    MPLS, which requires multipath, and MPLS-TP, which requires
	    a single path, could potentially coexist in the following
	    ways.
	    <list style="hanging" hangIndent="4">
              <t hangText="MPLS as a Server Layer for MPLS-TP">
		<vspace blankLines="0" />
		(<xref target="mpls-server-layer" />)
		<vspace blankLines="0" />
		<list style="hanging" hangIndent="4">
		  <t hangText="Advantages:">
		    MPLS-TP can be fully accommodated with small
		    signaling changes and forwarding changes.  Efficient
		    use of capacity can be achieved.
		  </t>
		  <t hangText="Disadvantages:">
		    Changes to the fields over which a hash is computed
		    are required and therefore this method may not be
		    supportable with some existing hardware.
		  </t>
		</list>
	      </t>
	      <t hangText="MPLS-TP as a Server Layer for MPLS">
		<vspace blankLines="0" />
		(<xref target="tp-server-layer" />)
		<vspace blankLines="0" />
		<list style="hanging" hangIndent="4">
		  <t hangText="Advantages:">
		    Some transport providers prefer to offer MPLS-TP due
		    to its ability to support familiar management and
		    operations procedures, involving static
		    configuration of network elements and inband
		    performance monitoring and protection activation.
		  </t>
		  <t hangText="Disadvantages:">
		    Multipath is moved to the client layer.  High
		    bandwidth MPLS LSP must be supported through
		    smaller parallel MPLS-TP LSP.  The opportunity to
		    dynamically share capacity of MPLS LSP is diminished
		    when large MPLS LSP are run over smaller MPLS-TP
		    LSP.  The use of MPLS-TP LSP across a high bandwidth
		    core will increase the number of LSP required and
		    may impact scalability.
		  </t>
		</list>
	      </t>
	      <t hangText="Relax MPLS-TP OAM Requirements">
		<vspace blankLines="0" />
		(<xref target="relax-tp-oam" />)
		<vspace blankLines="0" />
		<list style="hanging" hangIndent="4">
		  <t hangText="Advantages:">
		    Relaxing OAM requirements would allow MPLS-TP LSP to
		    exceed the capacity of a single component (or
		    member) link.  MPLS over MPLS-TP becomes more
		    practical.
		  </t>
		  <t hangText="Disadvantages:">
		    CC/CV requires enhancement to exercise all parts of
		    a multipath and would benefit from further
		    enhancements (see <xref target="relax-tp-oam" />).
		    CC/CV must be coordinated across multiple packet
		    processing elements.  Reordering of MPLS-TP traffic,
		    even if not harmful to the payload itself, would
		    result in significant short term inaccuracy in loss
		    reported by OAM LM.
		  </t>
		</list>
	      </t>
	    </list>
	  </t>
	</section>
      </section>
      <section anchor="mp-soln-set"
	       title="MPLS-TP Multipath Solution Set">
	<t>
	  Three solutions are described.  As noted in
	  <xref target="tp-coexist" />, these are three separate
	  solutions, each of which can be deployed independently.  Most
	  importantly, neither of the first two solutions requires
	  relaxing MPLS-TP OAM requirements.  On the other hand, these
	  solutions are not mutually exclusive.
	</t>
	<section anchor="mpls-server-layer"
		 title="MPLS as a Server Layer for MPLS-TP">
	  <t>
	    Using MPLS with multipath as a server layer for MPLS-TP
	    has the most advantages with respect to the requirements,
	    and with the exception of inability to run on some (or
	    most) existing hardware, has no disadvantages.  This is
	    assuming that the protocol changes suggested in this
	    subsection are implemented in later IETF documents.
	  </t>
	  <t>
	    Supporting fully conformant MPLS-TP LSP over MPLS LSP which
	    are making use of multipath requires special treatment of
	    the MPLS-TP LSP, such that only those LSP are exempt from
	    the multipath load splitting.
	    <list counter="mp" hangIndent="4" style="format MP#%d">
	      <t>
		It MUST be possible to identify MPLS-TP LSP.  
	      </t>
	      <t>
		It MUST be possible to completely exclude MPLS-TP LSP
		from the multipath hash and load split, statically
		assign it to a component link or member, and compensate
		for this assignment in the MPLS multipath load split.
	      </t>
	      <t>
		In order to support one or more MPLS-TP LSP contained in
		an MPLS LSP, it MUST be possible to signal the presence
		of MPLS-TP LSP within an MPLS LSP.
	      </t>
	      <t>
		In order to support an MPLS LSP carrying other MPLS LSP
		some of which in turn carry MPLS-TP LSP, it MUST be
		possible to determine the minimum depth within the label
		stack at which an MPLS-TP LSP exists and provide this
		depth in signaling.
	      </t>
	      <t>
		The depth within the label stack of the multipath hash for
		any MPLS LSP that is carrying MPLS-TP LSP MUST be
		constrained for that MPLS LSP so that the hashing does
		not include any information past an MPLS-TP label.
	      </t>
	      <t>
		It must be possible for an LSR which is setting up an
		MPLS-TP or MPLS LSP to determine at CSPF time whether
		a link can support the MPLS-TP requirements of the
		LSP.
	      </t>
	    </list>
	  </t>
	  <t>
	    Some hardware which exists today can support requirement
	    MP#2.  For example, if a table is used to support multipath
	    and produces satisfactory results given existing traffic
	    patterns, and the number of component links or members is
	    smaller than the table by a factor of N, then an allocation
	    of a multiple of 1/N of a component or member link can be
	    set aside for MPLS-TP traffic.  The MPLS-TP traffic can be
	    protected from degraded performance due to an imperfect
	    load split if the MPLS-TP traffic is given queuing priority
	    (using strict priority and policing or shaping at ingress or
	    locally or weighted queuing locally).
	  </t>
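	  <t>
	    The table-based allocation described above can be sketched
	    as follows.  This is a minimal illustration under stated
	    assumptions, not a description of any existing
	    implementation: the function names, the table size, the
	    label value 1001, and the use of CRC32 as the hash are all
	    hypothetical.  The sketch shrinks the table to keep the
	    split exactly proportional; hardware with a fixed table
	    size would more likely reassign the reserved entries to
	    other links.
	  </t>
	  <figure>
	    <artwork><![CDATA[
```python
import zlib

def build_table(num_links, table_size, reserved):
    """Build a multipath table over num_links component links.

    reserved maps a link index to the number of table entries' worth
    of capacity set aside for MPLS-TP on that link.  Hashed traffic
    gets fewer entries on such links, which compensates for the
    static assignment in the load split.
    """
    per_link = table_size // num_links      # N entries per link
    table = []
    for link in range(num_links):
        table.extend([link] * (per_link - reserved.get(link, 0)))
    return table

def select_link(table, tp_assignment, top_label, hash_key):
    """Return the component link for one packet."""
    if top_label in tp_assignment:          # MPLS-TP LSP: bypass the hash
        return tp_assignment[top_label]
    return table[zlib.crc32(hash_key) % len(table)]

# 4 links, 64-entry table (N = 16); 4/16 of link 0 set aside for MPLS-TP.
table = build_table(4, 64, {0: 4})
tp_assignment = {1001: 0}                   # hypothetical MPLS-TP label
```
]]></artwork>
	  </figure>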
	  <t>
	    Most existing hardware cannot support requirement MP#5 but
	    some may be able to partially support this requirement by
	    setting the label stack inspection depth to a fixed number of
	    labels from the top.  Full support for requirement MP#5
	    requires that the depth over which the hash is computed can
	    be derived from the label number of the label on which a
	    label swap operation is performed.
	  </t>
	</section>
	<section anchor="tp-server-layer"
		 title="MPLS-TP as a Server Layer for MPLS">
	  <t>
	    Carrying MPLS LSP which are larger than a component link
	    over an MPLS-TP server layer requires that the large MPLS
	    client layer LSP be accommodated by multiple MPLS-TP server
	    layer LSPs.  MPLS multipath can be used in the client layer
	    MPLS as described in <xref target="multipath-paths" />.
	  </t>
	  <t>
	    Creating multiple MPLS-TP server layer LSP places a greater
	    ILM scaling burden on the LSR (see
	    <xref target="mp-label-space" /> and the examples in
	    <xref target="mp-topo-ilm-size" />).  High bandwidth MPLS
	    cores with a smaller number of nodes have the greatest
	    tendency to require LSP in excess of component links,
	    therefore the reduction in number of nodes offsets the
	    impact of increasing the number of server layer LSP in
	    parallel.  Today, only in cases where the ILM is small would
	    this be an issue.
	  </t>
	  <t>
	    The most significant disadvantage of MPLS-TP as a Server
	    Layer for MPLS is that it reduces the efficiency
	    of carrying the MPLS client layer.  The service which
	    provides by far the largest offered load today is Internet,
	    for which the LSP capacity reservations are predictions of
	    expected load.  Many of these MPLS LSP may be smaller than
	    component link capacity.  Using MPLS-TP as a server layer
	    results in bin packing problems for these smaller LSP.  For
	    those LSP that are larger than component link capacity,
	    their capacities are not multiples of convenient capacity
	    increments such as 10 Gb/s.  Using MPLS-TP as an underlying
	    server layer greatly reduces the ability of the client layer
	    MPLS LSP to share capacity.  For example, when one MPLS LSP
	    is underutilizing its predicted capacity, the fixed
	    allocation of MPLS-TP to component links may not allow
	    another LSP to exceed its predicted capacity.  A solution
	    which makes less efficient use of resources may result in a
	    less cost effective solution, due to the amount of capital
	    equipment cost required and an increase in space and power
	    required.
	  </t>
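	  <t>
	    The bin packing problem described above can be illustrated
	    with a small sketch.  The first-fit placement policy, the
	    function name, and the LSP sizes are assumptions chosen for
	    the example; they are not taken from any deployment.
	  </t>
	  <figure>
	    <artwork><![CDATA[
```python
def first_fit(lsp_sizes, link_capacity, num_links):
    """Place each LSP whole on the first component link with room.

    Models placement over single-path server layer LSP, where a
    client LSP assigned to one component link cannot be split.
    Returns (free capacity per link, sizes of LSP that did not fit).
    """
    free = [link_capacity] * num_links
    rejected = []
    for size in lsp_sizes:
        for i in range(num_links):
            if size <= free[i]:
                free[i] -= size
                break
        else:
            rejected.append(size)
    return free, rejected

# Hypothetical reservations in Gb/s over three 10 Gb/s component links.
lsps = [6, 6, 6, 6, 6]
free, rejected = first_fit(lsps, 10, 3)
# Three LSP are placed and two are rejected, although the 12 Gb/s
# left free in total would be enough if the load could be split.
```
]]></artwork>
	  </figure>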
	  <t>
	    No requirements beyond MPLS-TP as currently defined are
	    needed to support MPLS-TP as a
	    Server Layer for MPLS.  It is therefore viable but has the
	    undesirable characteristics discussed above.
	  </t>
	</section>
	<section anchor="relax-tp-oam"
		 title="Relax MPLS-TP OAM Requirements">
	  <t>
	    If MPLS-TP OAM requirements are not fully met, as currently
	    specified, an LSP is not fully MPLS-TP conformant.  That may
	    be little more than a semantic inconvenience and cannot
	    prevent implementations from allowing LSP which are
	    otherwise MPLS-TP compliant to optionally use multipath with
	    some reduction in OAM capability.
	  </t>
	  <t>
	    Regardless of whether relaxing MPLS-TP OAM requirements
	    makes an LSP no longer an MPLS-TP LSP, this section
	    discusses the consequence of using multipath with regard to
	    MPLS-TP OAM.
	  </t>
	  <t>
	    If MPLS-TP over multipath is supported by relaxing MPLS-TP
	    OAM requirements, the requirements listed below will
	    improve the behavior of MPLS-TP OAM over multipath.
            <list counter="oam" hangIndent="4" style="format OAM#%d">
	      <t>
		There MUST be a means of introducing entropy to
		MPLS-TP OAM.
	      </t>
	      <t>
		There SHOULD be a means to focus CC/CV testing on a
		specific multipath component link.
	      </t>
	      <t>
		There MUST be a means to support LM over multipath,
		even if at best a bounded long term inaccuracy is
		achieved.
	      </t>
	    </list>
	  </t>
	  <section anchor="relax-tp-cccvq"
		   title="MPLS-TP CC/CV OAM with Multipath">
	    <t>
	      MPLS-TP CC/CV as currently defined has no means to
	      exercise all paths of a multipath.  The label stack is
	      fixed, followed by a GAL label <xref target="RFC5586"
	      />.  As is, only one path along a multipath can be
	      exercised when the ingress to the multipath is not also
	      the ingress to the LSP.  For example, if the LSP is
	      carrying PW, the PW themselves can be spread across the
	      multipath, but not the OAM traffic.
	    </t>
	    <t>
	      If CC/CV OAM is allowed to place a label below the GAL
	      label, the entire set of paths can be tested, though not
	      in a deterministic manner.  This is called an entropy
	      label.  Using a different random number in this entropy
	      label for each OAM packet allows all links to be exercised
	      on a probabilistic basis.
	    </t>
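	    <t>
	      How quickly random entropy values exercise all component
	      links can be estimated with a short calculation.  The
	      sketch below assumes that each OAM packet is hashed
	      uniformly and independently onto one component link,
	      which is an idealization of real hash behavior; the
	      function name is hypothetical.
	    </t>
	    <figure>
	      <artwork><![CDATA[
```python
from math import comb

def coverage_probability(num_links, num_packets):
    """Probability that num_packets OAM packets, each landing uniformly
    and independently on one of num_links component links, exercise
    every link at least once (inclusion-exclusion)."""
    n, k = num_links, num_packets
    return sum((-1) ** j * comb(n, j) * ((n - j) / n) ** k
               for j in range(n + 1))
```
]]></artwork>
	    </figure>
	    <t>
	      Under these assumptions, two packets cover both links of
	      a two link multipath with probability 0.5, and the
	      probability approaches one as more packets are sent.
	    </t>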
	    <t>
	      The loss of an isolated OAM CC/CV packet currently has no
	      effect.  If the loss of a single OAM packet can be noted
	      by the sender, then the sender can repeatedly use the
	      same value in the entropy label.  This requires either a
	      two way OAM or feedback to the ingress.  If OAM packets
	      can be reordered, then a sliding window of outstanding
	      OAM packets is required.  If OAM CC/CV packets are given
	      high priority (as currently specified), then delay
	      difference should be minimal and reordering may be
	      non-existent if the send interval is longer than the
	      delay difference.
	    </t>
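	    <t>
	      The sender behavior described above might be sketched as
	      follows.  This is an illustration only: the class and
	      method names are hypothetical, loss is inferred simply
	      from a probe falling out of the window unacknowledged,
	      and the transport of probes and replies is left out.
	    </t>
	    <figure>
	      <artwork><![CDATA[
```python
import random
from collections import OrderedDict

class EntropyProbeSender:
    """Tracks outstanding two-way OAM probes and their entropy values."""

    def __init__(self, window, rng=None):
        self.window = window                 # max probes tracked at once
        self.outstanding = OrderedDict()     # sequence number -> entropy
        self.retry = []                      # entropy values to repeat
        self.next_seq = 0
        self.rng = rng or random.Random()

    def send(self):
        """Choose the entropy for the next probe; return (seq, entropy)."""
        # A probe pushed out of the window unacknowledged counts as lost,
        # and its entropy value is queued to be used again.
        while len(self.outstanding) >= self.window:
            _, lost_entropy = self.outstanding.popitem(last=False)
            self.retry.append(lost_entropy)
        if self.retry:
            entropy = self.retry.pop(0)
        else:
            # Labels are 20 bits wide; values 0-15 are reserved.
            entropy = self.rng.randrange(16, 1 << 20)
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding[seq] = entropy
        return seq, entropy

    def acknowledge(self, seq):
        """Record a reply; replies may arrive in any order."""
        self.outstanding.pop(seq, None)
```
]]></artwork>
	    </figure>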
	    <t>
	      If a multipath component link failure had been detected
	      locally (at a node adjacent to the failure) and the
	      failure corrected locally (i.e., segment protection) or
	      the component link taken out of service, the client LSP
	      would either no longer be affected or it would be
	      preempted.  If the client LSP has been preempted,
	      MPLS-TP OAM unmodified would be sufficient to detect
	      this condition.  The existing BFD <xref target="RFC5884"
	      /> provides this functionality.
	    </t>
	    <t>
	      Only in the case where a component link has failed and
	      the server layer has not been able to detect and correct
	      the failure or take the component link out of service
	      would CC/CV OAM on the client LSP serve any purpose.
	      For this purpose, a relaxed OAM may be sufficient.  If
	      the client LSP has no control over the multipath itself,
	      the entire multipath must be considered down if any
	      uncorrected component link failure is occurring at the
	      multipath.
	    </t>
	    <t>
	      The CC/CV as described here can be handled by an OAM
	      mechanism which is bidirectional.  LSP Ping provides
	      such a mechanism <xref target="RFC4379" />.  Because the
	      condition being handled by LSP ping should be quite
	      rare, it may be acceptable to use a combination of BFD
	      and MPLS ping to provide OAM with full coverage of all
	      types of fault, but with a slower response to a
	      component link failure which is not detected at the
	      point of the fault.
	    </t>
	    <t>
	      For LSR implementations which support BFD and MPLS ping
	      "as is", these may be viable as an optional MPLS-TP form
	      of CC/CV OAM.  A deployment may use this option if the
	      reliance on IP is acceptable to the provider.
	      Alternatively, MPLS-TP OAM could take such requirements
	      into consideration and provide an additional capability
	      in BFD or provide MPLS-TP extensions to MPLS ping.
	    </t>
	    <t>
	      A further small complication may occur at the OAM
	      egress.  If the egress to the LSP is a multipath egress,
	      then the OAM may arrive at any of the component links at
	      the egress.  This requires that the CC/CV OAM be
	      forwarded within the LSR to a common packet processor in
	      order to be handled in hardware (or forwarded to a
	      common CPU).  This is also true of other types of OAM.
	    </t>
	  </section>
	  <section anchor="relax-tp-lm"
		   title="MPLS-TP LM OAM with Multipath">
	    <t>
	      MPLS-TP LM OAM makes use of the count of payload packets
	      at an egress.  If the payload is reordered, even with no
	      consequence to the payload itself, some inaccuracy is
	      introduced to the LM.  Some number of payload packets
	      which were transmitted before the LM OAM packet was sent
	      may arrive after the LM packet is received and some
	      payload packets transmitted after the LM OAM packet may
	      arrive before the LM packet.
	    </t>
	    <t>
	      If the LSP egress is a multipath, then the LM packets
	      may arrive at any packet processor over which the
	      multipath resides.  The counters from each of the egress
	      packet processors will have to be sampled.  During the
	      sampling interval, additional packets arrive and will be
	      counted.  This creates an equivalent out of order
	      problem with respect to the LM OAM and the payload it is
	      counting.
	    </t>
	    <t>
	      This error is bounded and is not cumulative.  For
	      example, if one LM interval counts too few packets, the
	      next LM interval will tend to count too many.  Over longer
	      measurement periods the total error retains the same
	      bounds, which over longer intervals becomes less
	      significant.
	    </t>
	    <t>
	      These errors are most significant when a substantial
	      amount of queuing delay is present (generally an
	      indication of light congestion) and when the queues at
	      various component links differ in delay.  Queuing delay
	      differences are generally milliseconds.  Delay
	      differences of tens of milliseconds require persistent
	      queues and significant congestion.
	    </t>
	    <t>
	      The worst case errors over long intervals are reasonably
	      well bounded.  For example, with a 10 msec delay
	      difference, a one minute sampling yields less than a 0.02%
	      uncertainty and over a 15 minute interval loss uncertainty
	      is just over 0.001%.  Given that congestion is required to
	      achieve these uncertainties, the loss due to congestion is
	      likely to significantly exceed these uncertainties for all
	      but very short measurement intervals.
	    </t>
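	    <t>
	      The figures above are consistent with dividing the delay
	      difference by the measurement interval.  That reading is
	      an assumption, as is the constant packet rate it implies;
	      the function name below is hypothetical.
	    </t>
	    <figure>
	      <artwork><![CDATA[
```python
def lm_uncertainty(delay_difference_s, interval_s):
    """Worst-case fraction of an interval's packets that may be
    miscounted when payload can be reordered around the LM packet by
    up to delay_difference_s, assuming a constant packet rate."""
    return delay_difference_s / interval_s

one_minute = lm_uncertainty(0.010, 60)        # under 0.02%
fifteen_minutes = lm_uncertainty(0.010, 900)  # just over 0.001%
```
]]></artwork>
	    </figure>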
	    <t>
	      When loss is zero but short term queues are formed, the
	      queuing delay difference is likely to be under one
	      millisecond for the common case of parallel links that
	      are routed along the same fiber (using WDM).  The
	      uncertainties for 1 minute and 15 minute samples are under
	      0.002% and just over 0.0001% (10^-6).  The uncertainty
	      over a 24 hour period is about 0.0000012%, or just over 10^-8.
	      An SLA could easily be supported where loss was
	      guaranteed not to exceed 10^-6 in any hour or 10^-7 in
	      any 24 hour period.  Such a guarantee would require that
	      the MPLS-TP LSP be given priority over traffic that is not
	      policed or shaped, and that it is itself policed or shaped.
	    </t>
	    <t>
	      This measurement uncertainty may or may not be acceptable
	      to a given deployment.  Providing an option to support
	      MPLS-TP over multipath does introduce a bounded error to
	      LM but it does not remove a provider's option not to use
	      MPLS-TP over multipath.
	    </t>
	  </section>
	</section>
      </section>
    </section>

    <section anchor="multipath-summary"
	     title="Summary of Recommendations">
      <t>
	<xref target="multipath-reqm" /> enumerates functional
	requirements.
	<xref target="multipath-practices" /> describes current practices.
	<xref target="multipath-changes" /> enumerates functional
	changes to better meet these requirements.  This section
	provides specific recommendations.
      </t>
      <t>
	To support MPLS with multipath as a server layer for MPLS-TP
	the following changes are required.
	<list counter="soln" hangIndent="4"
	      style="format Recommendation #%d">
	  <t>
	    Provide a means in RSVP-TE for an LSP to self identify its
	    requirement to be treated as fully compliant MPLS-TP
	    (disallow reordering).
	  </t>
	  <t>
	    Provide a means in RSVP-TE for an LSP that is not an
	    MPLS-TP LSP but is directly carrying MPLS-TP LSP to
	    indicate that hashing may only be performed on the first
	    two labels and indicate the largest MPLS-TP LSP being
	    carried (the largest potential microflow).
	  </t>
	  <t>
	    Provide a means in RSVP-TE for an LSP that is not an
            MPLS-TP LSP but is carrying MPLS-TP at some depth to
            indicate the maximum depth in the label stack that hashing
            can operate on, and the largest MPLS-TP LSP being carried
            (the largest potential microflow).
	  </t>
	  <t>
	    Provide a means in OSPF-TE and ISIS-TE to indicate the
	    largest microflow that a multipath can accommodate
	    independent of the largest LSP that can be accommodated with
	    load splitting.  An extension to <xref target="RFC4201" />
	    which separates Maximum LSP into two variables, with
	    backward compatibility may be the most desirable solution.
	  </t>
	</list>
      </t>
      <t>
	The current framework documents could be improved with the
	following additions.
        <list counter="soln" hangIndent="4"
	      style="format Recommendation #%d">
	  <t>
	    Relax GAL specification in <xref target="RFC5586" /> to
	    allow a label below GAL to provide entropy in OAM traffic
	    over multipath.
	  </t>
	  <t>
	    Preferably in the OAM framework, acknowledge the need for
	    entropy in OAM in some circumstances.  Note that if no
	    multipath exists along a path, the entropy is not needed
	    but does no harm.  Support optional entropy in MPLS-TP OAM
	    through use of a label under the GAL label.
	  </t>
	  <t>
	    Document the need for MPLS Ping or other two way mechanism to keep
	    a sliding window of outstanding packets at the sender
	    which records the entropy value used, note any single
	    loss, and send repeated packets for an entropy value which
	    has experienced a loss.
	  </t>
	  <t>
	    Preferably in the OAM framework, document the need for
	    CC/CV at a multipath egress to forward OAM packets for an
	    LSP that is load split through an out of band means to a
	    common packet processor or CPU.
	  </t>
	  <t>
	    Preferably in the OAM framework, document the need for LM
	    at multipath egress to collect packet counts on all packet
	    processors that could potentially receive packets for a
	    given LSP.
	  </t>
	</list>
      </t>
      <t>
	Forwarding changes to multipath necessary to support MPLS with
	multipath as a server layer for fully compliant MPLS-TP are
	the following:
        <list counter="forw" hangIndent="4"
	      style="format Forwarding #%d">
	  <t>
	    Store the maximum depth of multipath hash (or zero for
	    unconstrained depth) in the ILM.
	  </t>
	  <t>
	    Do not hash using the IP stack on an LSP which is carrying
	    MPLS-TP.  An LSP on which IP headers can be used in the hash
	    can be identified implicitly, since an LSP with a maximum
	    depth equal to zero cannot be carrying MPLS-TP, or it can be
	    explicitly indicated, independently of depth.  If a CW is
	    not used with PW, then this indication must be explicit.
	  </t>
	  <t>
	    When hashing on the MPLS label stack do not hash beyond
	    the maximum depth of hash for a given LSP.
	  </t>
	  <t>
	    Exclude reserved labels from the hash on label stack.  In
	    particular, the <xref target="RFC5586">GAL</xref>
	    and <xref target="RFC3429">OAM Alert Label</xref> should
	    be skipped.
	  </t>
	</list>
      </t>
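      <t>
	The label stack portion of these forwarding changes might be
	sketched as follows.  This is an illustration under stated
	assumptions: the function name and the use of CRC32 are
	hypothetical, the depth is counted in labels starting from the
	top of the stack, and hashing on IP headers is omitted
	entirely.
      </t>
      <figure>
	<artwork><![CDATA[
```python
import zlib

RESERVED_LABEL_MAX = 15      # label values 0-15 are reserved

def hash_labels(label_stack, max_depth):
    """Return a hash over the label stack, honoring the depth limit.

    label_stack is a list of label values, top of stack first.
    max_depth is the value stored in the ILM entry for the LSP;
    zero means unconstrained depth.
    """
    if max_depth:
        considered = label_stack[:max_depth]
    else:
        considered = label_stack
    # Skip reserved labels such as GAL (13) and OAM Alert Label (14).
    usable = [label for label in considered
              if label > RESERVED_LABEL_MAX]
    key = b"".join(label.to_bytes(3, "big") for label in usable)
    return zlib.crc32(key)
```
]]></artwork>
      </figure>
      <t>
	With a maximum depth of two, labels below the second position
	do not influence the result, so all traffic within an MPLS-TP
	LSP at that depth follows one component link.
      </t>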
    </section>

    <!-- Possibly an Acknowledgements or a 'Contributors' section ... -->

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>
	This document specifies requirements with discussion of
	framework for solutions.  The requirements and framework are
	related to the coexistence of MPLS/GMPLS (without MPLS-TP)
	when used over a packet network, MPLS-TP, and multipath.  The
	combination of MPLS, MPLS-TP, and multipath does not introduce
	any new security threats.  The security considerations for
	MPLS/GMPLS and for MPLS-TP are documented in
	<xref target="RFC5920" />
	and <xref target="I-D.ietf-mpls-tp-security-framework" />.
      </t>
    </section>

<!--

  warning re use of router alert above PW label and ECMP.  rfc5085 (PW
  VCCV) 5.1.2.  Out-of-Band VCCV (Type 2)

  multipath out of scope in LDP - rfc5036 (LDP Specification) section
  "6.  Areas for Future Study"

   "rfc4928 Avoiding Equal Cost Multipath Treatment in MPLS Networks"

	document current ECMP practices
 	focus is on ECMP, though multipath is more broad
	described "ECMP behavior of currently deployed MPLS networks"
	informationsl RFC to justify definition of the PW CW

	  "While none of this is in violation of the basic service
	   offering of IP, it is detrimental to the performance of
	   various classes of applications.  It also complicates the
	   measurement, monitoring, and tracing of those flows."

  rfc4782 - Quick-Start for TCP and IP - warning about multipath for
  single connection

  rfc4385 PW3 Control Word for Use over an MPLS PSN

  rfc4379 "Detecting Multi-Protocol Label Switched (MPLS) Data Plane
  Failures" (aka MPLS-Ping) - extensive support for multipath

  rfc4378 - "A Framework for Multi-Protocol Label Switching (MPLS)
  Operations and Management (OAM)" - mentions characterization,
  including determining if MP (3.2.1.  Characterization).

  rfc4377 "Operations and Management (OAM) Requirements for
  Multi-Protocol Label Switched (MPLS) Networks"

	In "4.3.  Path Characterization"

      -  sufficient details that allow the test origin to exercise all
         path permutations related to load spreading (e.g., ECMP).

  rfc3916 "Requirements for Pseudo-Wire Emulation Edge-to-Edge (PWE3)"

  rfc3813 "Multiprotocol Label Switching (MPLS) Label Switching Router
  (LSR) Management Information Base (MIB)" - supports MP

  rfc3272 "Overview and Principles of Internet Traffic Engineering" -
  has some good glossary items and a good terminology overview.  not
  complimentary in comments about ecmp.  mentions unequal cost mp in
  "8.0 Overview of Contemporary TE Practices in Operational IP
  Networks" - doesn't quite mention proportional split across LSP.

  rfc3031 "Multiprotocol Label Switching Architecture" - mentions MP

  rfc2991, rfc2992 - describes hash used at the time for MP.  Good
  requirements section "Minimal disruption", "Fast implementation".
  Limited coverage of solutions "Modulo-N Hash", "Hash-Threshold",
  "Highest Random Weight (HRW)".  see "6.  Redundant Parallel Links"

  rfc2702 "Requirements for Traffic Engineering Over MPLS" -
  preference rules are mentioned indicating "pick one".

  rfc2329 "OSPF Standardization Report" (1998) - more mp than cidr or
  mib (note: 10 respondents).

                                       Imple-   Inter-
            Feature                    mented   operated   Deployed
              Equal-cost multipath       10       7          8

  rfc1247 (1991), rfc1583 (1994), rfc2178, rfc2328 "OSPF Version 2" 1998

  rfc1246 (1991) "Experience with the OSPF protocol" - includes mp

  rfc1126 (1989) "Goals and Functional Requirements for
  Inter-Autonomous - System Routing" - mp is desirable

  rfc4190 "Framework for Supporting Emergency Telecommunications
  Service (ETS) in IP Telephony" - mentions ecmp robustness

  rfc4041 "Requirements for Morality Sections in Routing Area Drafts"
  mentions ecmp

 -->

  </middle>

  <back>

    <references title="Normative References">

      &RFC2119;

    </references>

    <references title="Informative References">

      &RFC1717;

      &RFC1247;

      &RFC2475;

      &RFC2615;

      &RFC2991;

      &RFC2992;

      &RFC3031;

      &RFC3032;

      &RFC3260;

      &RFC3270;

      &RFC3429;

      &RFC4090;

      &RFC4201;

      &RFC4206;

      &RFC4385;

      &RFC4379;

      &RFC4426;

      &RFC4448;

      &RFC4928;

      &RFC5286;

      &RFC5462;

      &RFC5586;

      &RFC5714;

      &RFC5860;

      &RFC5884;

      &RFC5920;

      &I-D.ietf-pwe3-fat-pw;

      &I-D.ietf-mpls-tp-oam-framework;

      &I-D.ietf-mpls-tp-security-framework;

      <reference anchor="IEEE-802.1AX"
                 target="http://standards.ieee.org/getieee802/download/802.1AX-2008.pdf">
        <front>
          <title>IEEE Std 802.1AX-2008 IEEE Standard for
	    Local and Metropolitan Area Networks - Link Aggregation</title>

          <author>
            <organization>IEEE Standards Association</organization>
          </author>

          <date year="2008" />
        </front>
      </reference>

      <reference anchor="ITU-T.G.800"
                 target="http://www.itu.int/rec/T-REC-G/recommendation.asp?parent=T-REC-G.800">
        <front>
          <title>Unified functional architecture of transport
          networks</title>

          <author>
            <organization>ITU-T</organization>
          </author>

          <date year="2007" />
        </front>
      </reference>

    </references>

  </back>
</rfc>
