<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfcfoo.dtd">
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std"
     ipr="trust200902"
     updates="3031, 3107, 3209, 5036"
     docName='draft-ietf-mpls-entropy-label-06'>
  <front>
    <title abbrev="MPLS Entropy Labels">
      The Use of Entropy Labels in MPLS Forwarding
    </title>

    <author fullname="Kireeti Kompella" initials="K." surname="Kompella">
      <organization>Juniper Networks</organization>
      <address>
        <postal>
          <street>1194 N. Mathilda Ave.</street>
          <city>Sunnyvale</city>
          <region>CA</region>
          <code>94089</code>
          <country>US</country>
        </postal>
        <email>kireeti.kompella@gmail.com</email>
      </address>
    </author>

    <author fullname="John Drake" initials="J." surname="Drake">
      <organization>Juniper Networks</organization>
      <address>
        <postal>
          <street>1194 N. Mathilda Ave.</street>
          <city>Sunnyvale</city>
          <region>CA</region>
          <code>94089</code>
          <country>US</country>
        </postal>
        <email>jdrake@juniper.net</email>
      </address>
    </author>

    <author fullname="Shane Amante" initials="S." surname="Amante">
      <organization>Level 3 Communications, LLC</organization>
      <address>
        <postal>
          <street>1025 Eldorado Blvd</street>
          <city>Broomfield</city>
          <region>CO</region>
          <code>80021</code>
          <country>US</country>
        </postal>
        <email>shane@level3.net</email>
      </address>
    </author>

    <author fullname="Wim Henderickx" initials="W." surname="Henderickx">
      <organization>Alcatel-Lucent</organization>
      <address>
        <postal>
          <street>Copernicuslaan 50</street>
          <city>2018 Antwerp</city>
          <country>Belgium</country>
        </postal>
        <email>wim.henderickx@alcatel-lucent.com</email>
      </address>
    </author>

    <author fullname="Lucy Yong" initials="L." surname="Yong">
      <organization>Huawei USA</organization>
      <address>
        <postal>
          <street>5340 Legacy Dr.</street>
          <city>Plano</city>
          <region>TX</region>
          <code>75024</code>
          <country>US</country>
        </postal>
        <email>lucy.yong@huawei.com</email>
      </address>
    </author>

    <date year="2012"/>

    <area>Routing</area>

    <keyword>Internet-Draft</keyword>
    <keyword>entropy hash ecmp load balancing</keyword>

    <abstract>
      <t>
	Load balancing is a powerful tool for engineering traffic
	across a network.  This memo suggests ways of improving load
	balancing across MPLS networks using the concept of "entropy
	labels".  It defines the concept, describes why entropy labels
	are useful, enumerates properties of entropy labels that allow
	maximal benefit, and shows how they can be signaled and used
	for various applications.  This document updates RFCs 3031,
	3107, 3209 and 5036.
      </t>
    </abstract>
  </front>

<middle>
<section anchor="intro" title="Introduction">
  <t>
    Load balancing, or multi-pathing, is an attempt to balance traffic
    across a network by allowing the traffic to use multiple
    paths.  Load balancing has several benefits: it eases capacity
    planning; it can help absorb traffic surges by spreading them
    across multiple paths; it allows better resilience by offering
    alternate paths in the event of a link or node failure.
  </t>

  <t>
    As providers scale their networks, they use several techniques to
    achieve greater bandwidth between nodes.  Two widely used
    techniques are: Link Aggregation Group (LAG) and Equal-Cost
    Multi-Path (ECMP).  LAG is used to bond together several physical
    circuits between two adjacent nodes so they appear to higher-layer
    protocols as a single, higher bandwidth 'virtual' pipe.  ECMP is
    used between two nodes separated by one or more hops, to allow
    load balancing over several shortest paths in the network.  This
    is typically obtained by arranging IGP metrics such that there are
    several equal cost paths between source-destination pairs.  Both
    of these techniques may, and often do, co-exist in various parts
    of a given provider's network, depending on various choices made
    by the provider.
  </t>

  <t>
    A very important requirement when load balancing is that packets
    belonging to a given 'flow' must be mapped to the same path, i.e.,
    the same exact sequence of links across the network.  This is to
    avoid jitter, latency and re-ordering issues for the flow.  What
    constitutes a flow varies considerably.  A common example of a
    flow is a TCP session.  Other examples are an L2TP session
    corresponding to a given broadband user, or traffic within an ATM
    virtual circuit.
  </t>

  <t>
    To meet this requirement, a node uses certain fields, termed
    'keys', within a packet's header as input to a load balancing
    function (typically a hash function) that selects the path for all
    packets in a given flow.  The keys chosen for the load balancing
    function depend on the packet type; a typical set (for IP packets)
    is the IP source and destination addresses, the protocol type, and
    (for TCP and UDP traffic) the source and destination port numbers.
    An overly conservative choice of fields may lead to many flows
    mapping to the same hash value (and consequently poorer load
    balancing); an overly aggressive choice may map a flow to multiple
    values, potentially violating the above requirement.
  </t>
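  <t>
    As an illustration only, the following minimal sketch shows one
    way a node might map a flow's keys to a path.  The function name
    select_path and the use of SHA-256 are assumptions made for this
    example; real forwarding hardware typically uses a much cheaper
    hash.
  </t>
  <figure><artwork><![CDATA[
```python
import hashlib


def select_path(keys, num_paths):
    """Map a flow's keys to one of num_paths equal-cost paths."""
    data = "|".join(str(k) for k in keys).encode()
    digest = hashlib.sha256(data).digest()
    # Same keys always give the same index, so a flow stays on
    # one path.
    return int.from_bytes(digest[:4], "big") % num_paths


# Keys for a TCP flow: source, destination, protocol, ports.
flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)
path = select_path(flow, num_paths=4)
```
]]></artwork></figure>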

  <t>
    For MPLS networks, most of the same principles (and benefits)
    apply.  However, finding useful keys in a packet for the purpose
    of load balancing can be more of a challenge.  In many cases, MPLS
    encapsulation may require fairly deep inspection of packets to
    find these keys at transit Label Switching Routers (LSRs).
  </t>

  <t>
    One way to eliminate the need for this deep inspection is to have
    the ingress LSR of an MPLS Label Switched Path extract the
    appropriate keys from a given packet, input them to its load
    balancing function, and place the result in an additional label,
    termed the 'entropy label', as part of the MPLS label stack it
    pushes onto that packet.
  </t>

  <t>
    The packet's entire MPLS label stack can then be used by transit
    LSRs to perform load balancing, as the entropy label introduces
    the right level of "entropy" into the label stack.
  </t>

  <t>
    There are five key reasons why this is beneficial:
    <list style="numbers">
      <t>
	at the ingress LSR, MPLS encapsulation hasn't yet occurred, so
        deep inspection is not necessary;
      </t>

      <t>
	the ingress LSR has more context and information about
        incoming packets than transit LSRs;
      </t>

      <t>
	ingress LSRs usually operate at lower bandwidths than transit
        LSRs, allowing them to do more work per packet;
      </t>

      <t>
	transit LSRs do not need to perform deep packet inspection and
	can load balance effectively using only a packet's MPLS label
	stack; and
      </t>

      <t>
	transit LSRs, not having the full context that an ingress LSR
	does, have the hard choice between potentially misinterpreting
	fields in a packet as valid keys for load balancing (causing
	packet ordering problems) or adopting a conservative approach
	(giving rise to sub-optimal load balancing).  Entropy labels
	relieve them of making this choice.
      </t>
    </list>
  </t>

  <t>
    This memo describes why entropy labels are needed and defines the
    properties of entropy labels; in particular how they are generated
    and received, and the expected behavior of transit LSRs.  Finally,
    it describes in general how signaling works and what needs to be
    signaled, as well as specifics for the signaling of entropy labels
    for LDP (<xref target="RFC5036"/>), BGP (<xref
    target="RFC3107"/>), and RSVP-TE (<xref target="RFC3209"/>).
  </t>

  <section anchor="conv" title="Conventions used">
    <t>
      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described
      in <xref target="RFC2119"/>.
    </t>

    <t>
      The following acronyms are used:
      <list>
	<t>BoS: Bottom of Stack</t>
	<t>CE: Customer Edge device</t>
	<t>ECMP: Equal Cost Multi-Path</t>
	<t>EL: Entropy Label</t>
	<t>ELC: Entropy Label Capability</t>
	<t>ELI: Entropy Label Indicator</t>
	<t>FEC: Forwarding Equivalence Class</t>
	<t>LAG: Link Aggregation Group</t>
	<t>LER: Label Edge Router</t>
	<t>LSP: Label Switched Path</t>
	<t>LSR: Label Switching Router</t>
	<t>PE: Provider Edge Router</t>
	<t>PW: Pseudowire</t>
	<t>PHP: Penultimate Hop Popping</t>
	<t>TC: Traffic Class</t>
	<t>TTL: Time-to-Live</t>
	<t>UHP: Ultimate Hop Popping</t>
	<t>VPLS: Virtual Private LAN (Local Area Network) Service</t>
	<t>VPN: Virtual Private Network</t>
      </list>
    </t>

    <t>
      The term ingress (or egress) LSR is used interchangeably with
      ingress (or egress) LER.  The term application throughout the
      text refers to an MPLS application (such as a VPN or VPLS).
    </t>

    <t>
      A label stack (say of three labels) is denoted by &lt;L1, L2,
      L3&gt;, where L1 is the "outermost" label and L3 the innermost
      (closest to the payload).  Packet flows are depicted left to
      right, and signaling is shown right to left (unless otherwise
      indicated).
    </t>

    <t>
      The term 'label' is used both for the entire 32-bit label stack
      entry and the 20-bit label field within a label stack entry.  It
      should be clear from the context which is meant.
    </t>
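    <t>
      To make the distinction concrete, the sketch below packs and
      unpacks one 32-bit label stack entry, assuming the field layout
      of <xref target='RFC3032'/> (20-bit label, 3-bit TC, 1-bit BoS,
      8-bit TTL).  The function names pack_lse and unpack_lse are
      hypothetical and chosen only for this example.
    </t>
    <figure><artwork><![CDATA[
```python
import struct


def pack_lse(label, tc, bos, ttl):
    """Encode one label stack entry as 4 bytes, network order."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bool(bos)) << 8) | ttl
    return struct.pack("!I", word)


def unpack_lse(data):
    """Decode 4 bytes into the fields of a label stack entry."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,           # the 20-bit label field
        "tc": (word >> 9) & 0x7,
        "bos": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }
```
]]></artwork></figure>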
  </section>

  <section title="Motivation">
    <t>
      MPLS is a very successful generic forwarding substrate that
      transports several dozen types of protocols, most notably: IP,
      PWs, VPLS and IP VPNs.  Within each type of protocol, there
      typically exist several variants, each with a different set of
      load balancing keys, e.g., for IP: IPv4, IPv6, IPv6 in IPv4,
      etc.; for PWs: Ethernet, ATM, Frame-Relay, etc.  There are also
      several different types of Ethernet over PW encapsulation, ATM
      over PW encapsulation, etc. as well.  Finally, given the
      popularity of MPLS, it is likely that it will continue to be
      extended to transport new protocols.
    </t>

    <t>
      Currently, each transit LSR along the path of a given LSP has to
      try to infer the underlying protocol within an MPLS packet in
      order to extract appropriate keys for load balancing.
      Unfortunately, if the transit LSR is unable to infer the MPLS
      packet's protocol (as is often the case), it will typically use
      the topmost (or all) MPLS labels in the label stack as keys for
      the load balancing function.  The result may be an extremely
      inequitable distribution of traffic across equal-cost paths
      exiting that LSR.  This is because MPLS labels are generally
      fairly coarse-grained forwarding labels that typically describe
      a next-hop, or provide some demultiplexing and/or forwarding
      function, and do not describe the packet's underlying protocol.
    </t>

    <t>
      On the other hand, an ingress LSR (e.g., a PE router) has
      detailed knowledge of a packet's contents, typically through a
      priori configuration of the encapsulation(s) that are expected
      at a given PE-CE interface (e.g., IPv4, IPv6, VPLS, etc.).
      Ingress LSRs also have more flexible forwarding hardware.  PE routers
      need this information and these capabilities to:
      <list>
	<t>
	  a) apply the required services for the CE;
	</t>
	<t>
	  b) discern the packet's CoS forwarding treatment;
	</t>
	<t>
	  c) apply filters to forward or block traffic to/from the CE;
	</t>
	<t>
	  d) forward routing/control traffic to an onboard
	  management processor; and,
	</t>
	<t>
	  e) load-balance the traffic on its uplinks to transit LSRs
	  (e.g., P routers).
	</t>
      </list>
      By knowing the expected encapsulation types, an ingress LSR
      can apply a more specific set of payload parsing routines
      to extract the keys appropriate for a given protocol.  This
      allows for significantly improved accuracy in determining the
      appropriate load balancing behavior for each protocol.
    </t>

    <t>
      If the ingress LSR were to capture the flow information so
      gathered in a convenient form for downstream transit LSRs,
      transit LSRs could remain completely oblivious to the contents
      of each MPLS packet, and use only the captured flow information
      to perform load balancing.  In particular, there will be no
      reason to duplicate an ingress LSR's complex packet/payload
      parsing functionality in a transit LSR.  This will result in
      less complex transit LSRs, enabling them to more easily scale to
      higher forwarding rates, larger port density, lower power
      consumption, etc.  The idea in this memo is to capture this flow
      information as a label, the so-called entropy label.
    </t>

    <t>
      Ingress LSRs can also adapt more readily to new protocols and
      extract the appropriate keys to use for load balancing packets
      of those protocols.  This means that deploying new protocols or
      services in edge devices requires fewer concomitant changes in
      the core, resulting in higher edge service velocity and at the
      same time more stable core networks.
    </t>
  </section>
</section>

<section title="Approaches">
  <t>
    There are two main approaches to encoding load balancing
    information in the label stack.  The first allocates multiple
    labels for a particular Forwarding Equivalence Class (FEC).  These
    labels are equivalent in terms of forwarding semantics, but having
    multiple labels allows flexibility in assigning labels to flows
    belonging to the same FEC.  This approach has the advantage that
    the label stack has the same depth whether or not one uses
    label-based load balancing; consequently, there is no
    change to forwarding operations on transit and egress LSRs.
    However, it has a major drawback in that there is a significant
    increase in both signaling and forwarding state.
  </t>
  <t>
    The other approach encodes the load balancing information as an
    additional label in the label stack, thus increasing the depth of
    the label stack by one.  With this approach, there is minimal
    change to signaling state for a FEC; also, there is no change in
    forwarding operations in transit LSRs, and no increase of
    forwarding state in any LSR.  The only purpose of the additional
    label is to increase the entropy in the label stack, so this is
    called an "entropy label".  This memo focuses solely on this
    approach.
  </t>
  <t>
    This latter approach uses upstream generated entropy labels, which
    may conflict with downstream allocated application labels.  There
    are a few approaches to deal with this: 1) allocate a pair of
    labels for each FEC, one that must have an entropy label below it,
    and one that must not; 2) use a label (the "Entropy Label
    Indicator") to indicate that the next label is an entropy label;
    and 3) allow entropy labels only where there is no possible
    confusion.  The first doubles control and data plane state in the
    network; the last is too restrictive.  The approach taken here is
    the second.  In making both the above choices, the trade-off is to
    increase label stack depth rather than control and data plane
    state in the network.
  </t>
  <t>
    Finally, one may choose to associate ELs with MPLS tunnels (LSPs),
    or with MPLS applications (e.g., VPNs).  (What this entails is
    described in later sections.)  We take the former approach, for
    the following reasons:
    <list style='numbers'>
      <t>
	There are a small number of tunneling protocols for MPLS, but
	a large and growing number of applications.  Defining ELs on a
	tunnel basis means simpler standards, lower development,
	interoperability and testing efforts.
      </t>
      <t>
	As a consequence, there will be much less churn in the network
	as new applications (services) are defined and deployed.
      </t>
      <t>
	Processing application labels in the data plane is more
	complex than processing tunnel labels.  Thus, it is preferable
	to burden the latter rather than the former with EL
	processing.
      </t>
      <t>
	Associating ELs with tunnels makes it simpler to deal with
	hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs.
	Each layer in the hierarchy can choose independently whether
	or not it wants ELs.
      </t>
    </list>
    The cost of this approach is that ELIs will be mandatory; again,
    the trade-off is the size of the label stack.  To summarize, the
    net increase in the label stack to use entropy labels is two: one
    reserved label for the ELI, and the entropy label itself.
  </t>
</section>

<section title="Entropy Labels and Their Structure" anchor='el-struct'>
  <t>
    An entropy label (as used here) is a label:
    <list style="numbers">
      <t>that is not used for forwarding;</t>
      <t>that is not signaled; and</t>
      <t>
	whose only purpose in the label stack is to provide 'entropy'
        to improve load balancing.
      </t>
    </list>
  </t>

  <t>
    Entropy labels are generated by an ingress LSR, based entirely on
    load balancing information.  However, they MUST NOT have values in
    the reserved label space (0-15) [IANA MPLS Label Values].
  </t>
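  <t>
    One possible way to honor this constraint is sketched below: the
    load balancing output is folded into the label values above the
    reserved range.  The name entropy_label and the modulo mapping
    are assumptions of this example; an ingress LSR is free to derive
    the value differently.
  </t>
  <figure><artwork><![CDATA[
```python
RESERVED_MAX = 15      # reserved label space is 0-15
LABEL_SPACE = 2**20    # the label field is 20 bits wide


def entropy_label(lb_output):
    """Map a non-negative load balancing output to a legal EL."""
    usable = LABEL_SPACE - (RESERVED_MAX + 1)
    # The result is always in 16 .. 2**20 - 1.
    return (RESERVED_MAX + 1) + (lb_output % usable)
```
]]></artwork></figure>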

  <t>
    Since entropy labels are generated by an ingress LSR, an egress
    LSR MUST be able to distinguish unambiguously between entropy
    labels and application labels.  To accomplish this, it is REQUIRED
    that the label immediately preceding an entropy label (EL) in the
    MPLS label stack be an 'entropy label indicator' (ELI), where
    preceding means closer to the top of the label stack (farther from
    bottom of stack indication).  The ELI is a reserved label with
    value (TBD by IANA).  How to set values of the TTL, TC and 'Bottom
    of Stack' (BoS) fields (<xref target='RFC3032'/>) for the ELI and
    for ELs is discussed in <xref target='ingress-lsr'/>.
  </t>

  <t>
    Entropy labels are useful for pseudowires (<xref
    target="RFC4447"/>).  <xref target="RFC6391"/> explains how
    entropy labels can be used for RFC 4447-style pseudowires, and
    thus is complementary to this memo, which focuses on how entropy
    labels can be used for tunnels, and thus for all other MPLS
    applications.
  </t>
</section>

<section title="Data Plane Processing of Entropy Labels">
  <section anchor="egress-lsr" title="Egress LSR">
    <t>
      Suppose egress LSR Y is capable of processing entropy labels for
      a tunnel.  Y indicates this to all ingresses via signaling (see
      <xref target='sig'/>).  Y MUST be prepared to deal both with
      packets with an imposed EL and those without; the ELI will
      distinguish these cases.  If a particular ingress chooses not to
      impose an EL, Y's processing of the received label stack (which
      might be empty) is as if Y chose not to accept ELs.
    </t>
    <t>
      If an ingress X chooses to impose an EL, then Y will receive a
      tunnel termination packet with label stack &lt;TL, ELI, EL&gt;
      &lt;remaining packet header&gt;.  Y recognizes TL as the label
      it distributed to its upstreams for the tunnel, and pops it.
      (Note that TL may be the implicit null label, in which case it
      doesn't appear in the label stack.)  Y then recognizes the ELI
      and pops two labels: the ELI and the EL.  Y then processes the
      remaining packet header as normal; this may require further
      processing of tunnel termination, perhaps with further ELI+EL
      pairs.  When processing the final tunnel termination, Y MAY
      enqueue the packet based on that tunnel TL's or ELI's TC value,
      and MAY use the tunnel TL's or ELI's TTL to compute the TTL of
      the remaining packet header.  The EL's TTL MUST be ignored.
    </t>
    <t>
      If any ELI processed by Y has BoS bit set, Y MUST discard the
      packet, and MAY log an error.  The EL's BoS bit will indicate
      whether or not there are more labels in the stack.
    </t>
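    <t>
      The label handling described above can be sketched as follows.
      This is a simplified model, not a forwarding implementation:
      the label stack is a list of dictionaries, the function name
      egress_pop is hypothetical, and the ELI value is passed in as a
      parameter because this memo leaves it to be assigned by IANA.
      TC and TTL handling are omitted.
    </t>
    <figure><artwork><![CDATA[
```python
def egress_pop(stack, tunnel_label, eli):
    """Return what is left of the stack after Y ends one tunnel.

    stack: list of {"label": int, "bos": bool}, top label first.
    tunnel_label: None if Y advertised implicit null.
    eli: the ELI label value (left TBD by IANA in this memo).
    """
    rest = list(stack)
    if tunnel_label is not None:
        if not rest or rest[0]["label"] != tunnel_label:
            raise ValueError("unexpected tunnel label")
        rest = rest[1:]                  # pop TL
    if rest and rest[0]["label"] == eli:
        if rest[0]["bos"] or len(rest) < 2:
            raise ValueError("ELI has BoS set: discard packet")
        rest = rest[2:]                  # pop ELI and EL together
    return rest
```
]]></artwork></figure>
    <t>
      Calling the function again on the returned stack models further
      tunnel termination, possibly with further ELI+EL pairs.
    </t>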
  </section>

  <section anchor="ingress-lsr" title="Ingress LSR">
    <t>
      If an egress LSR Y indicates via signaling that it can process
      ELs on a particular tunnel, an ingress LSR X can choose whether
      or not to insert ELs for packets going into that tunnel.  Y MUST
      handle both cases.
    </t>

    <t>
      The steps that X performs to insert ELs are as follows:
      <list style='numbers'>
	<t>
	  On an incoming packet, identify the application to which the
	  packet belongs; based on this, pick appropriate fields as
	  input to the load balancing function; apply the load
	  balancing function to these input fields, and let LB be the
	  output.
	</t>
	<t>
	  Determine the application label AL (if any).  Push
	  &lt;AL&gt; onto the packet.
	</t>
	<t>
	  Based on the application, the load balancing output LB and
	  other factors, determine the egress LSR Y, the tunnel to Y,
	  the specific interface to the next hop, and thus the tunnel
	  label TL.  Use LB to generate the entropy label EL.
	</t>
	<t>
	  If, for the chosen tunnel, Y has not indicated that it can
	  process ELs, push &lt;TL&gt; onto the packet.  If Y has
	  indicated that it can process ELs for the tunnel, push
	  &lt;TL, ELI, EL&gt; onto the packet.  X SHOULD put the same
	  TTL and TC fields for the ELI as it does for TL.  X MAY
	  choose different values for the TTL and TC fields if it is
	  known that the ELI will not be exposed as the top label at
	  any point along the LSP (as may happen in cases where PHP is
	  used and the ELI and EL are not stripped at the penultimate
	  hop; see <xref target='php-lsr'/>).  The BoS bit for the ELI
	  MUST be zero.  The TTL for the EL MUST be zero to ensure
	  that it is not used inadvertently for forwarding.  The TC
	  for the EL may be any value.  The BoS bit for the EL depends
	  on whether or not there are more labels in the label stack.
	</t>
	<t>
	  X then determines whether further tunnel hierarchy is
	  needed; if so, X goes back to step 3, possibly with a new
	  egress Y for the new tunnel.  Otherwise, X is done, and
	  sends out the packet.
	</t>
      </list>
    </t>
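    <t>
      Step 4 above can be sketched as follows, under the assumption
      that label stack entries are modeled as dictionaries and that
      the ELI value is supplied by the caller (it is to be assigned
      by IANA).  The names lse and push_tunnel are hypothetical.  The
      sketch takes the SHOULD path of copying TL's TTL and TC to the
      ELI, and arbitrarily uses TC 0 for the EL.
    </t>
    <figure><artwork><![CDATA[
```python
def lse(label, tc, bos, ttl):
    """One label stack entry, as a plain dict."""
    return {"label": label, "tc": tc, "bos": bos, "ttl": ttl}


def push_tunnel(stack, tl, el, eli, tc, ttl, egress_elc):
    """Return the stack (top label first) after pushing one tunnel.

    stack holds what was pushed so far (e.g., the application
    label).  egress_elc is True if Y signaled that it can process
    ELs for this tunnel.
    """
    bottom = len(stack) == 0   # nothing lies below what we push
    if not egress_elc:
        return [lse(tl, tc, bottom, ttl)] + list(stack)
    return [
        lse(tl, tc, False, ttl),
        lse(eli, tc, False, ttl),  # BoS for the ELI is zero
        lse(el, 0, bottom, 0),     # TTL for the EL is zero
    ] + list(stack)
```
]]></artwork></figure>
    <t>
      Repeating the call with a new tunnel label models step 5, where
      X adds further tunnel hierarchy.
    </t>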

    <t>
      Notes:
      <list style='letters'>
	<t>
	  X computes load balancing information and generates the EL
	  based on the incoming application packet, even though the
	  signaling of EL capability is associated with tunnels.
	</t>
	<t>
	  X MAY insert several entropy labels in the stack (each, of
	  course, preceded by an ELI), potentially one for each
	  hierarchical tunnel, provided that the egress for that
	  tunnel has indicated that it can process ELs for that
	  tunnel.
	</t>
	<t>
	  X MUST NOT include an entropy label for a given tunnel
	  unless the egress LSR Y has indicated that it can process
	  entropy labels for that tunnel.
	</t>
	<t>
	  The signaling and use of entropy labels in one direction
	  (signaling from Y to X, and data path from X to Y) is
	  completely independent of the signaling and use of entropy
	  labels in the reverse direction (signaling from X to Y, and
	  data path from Y to X).
	</t>
      </list>
    </t>
  </section>

  <section anchor="transit-lsr" title="Transit LSR">
    <t>
      Transit LSRs MAY operate with no change in forwarding behavior.
      The following are suggestions for optimizations that improve
      load balancing, reduce the amount of packet data processed,
      and/or enhance backward compatibility.
    </t>
    <t>
      If a transit LSR recognizes the ELI, it MAY choose to load
      balance solely on the following label (the EL); otherwise, it
      SHOULD use as much of the whole label stack as feasible as keys
      for the load balancing function.  In any case, reserved labels
      MUST NOT be used as keys for the load balancing function.
    </t>
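    <t>
      A minimal sketch of this key selection follows.  It assumes the
      transit LSR recognizes the ELI and opts to use only the EL when
      one is present; the function name load_balance_keys is
      hypothetical, and the ELI value is a parameter because it is to
      be assigned by IANA.
    </t>
    <figure><artwork><![CDATA[
```python
RESERVED_MAX = 15  # reserved label space is 0-15


def load_balance_keys(labels, eli):
    """Pick hash keys from a list of label values, top first."""
    # An ELI is only meaningful if another label follows it.
    for i, label in enumerate(labels[:-1]):
        if label == eli:
            el = labels[i + 1]
            return [el]
    # No ELI found: use the whole stack, minus reserved labels.
    return [label for label in labels if label > RESERVED_MAX]
```
]]></artwork></figure>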
    <t>
      Some transit LSRs look beyond the label stack for better load
      balancing information.  This is a simple, backward compatible
      approach in networks where some ingress LSRs impose ELs and
      others don't.  However, this is of limited incremental value if
      an EL is indeed present, and requires more packet processing
      from the LSR.  A transit LSR MAY choose to parse the label stack
      for the presence of the ELI, and look beyond the label stack
      only if it does not find it, thus retaining the old behavior
      when needed, yet avoiding unnecessary work if not needed.
    </t>
    <t>
      As stated in <xref target='egress-lsr'/> and <xref
      target='sig'/>, an egress LSR that signals both ELC and implicit
      null MUST pop the ELI and the next label if it encounters a
      packet with the ELI as the topmost label.  Any other LSR
      (including PHP LSRs) MUST drop such packets, as per section 3.18
      of <xref target='RFC3031'/>.
    </t>
  </section>

  <section anchor='php-lsr' title='Penultimate Hop LSR'>
    <t>
      No change is needed at penultimate hop LSRs.  However, a PHP LSR
      that recognizes the ELI MAY choose to pop the ELI and following
      label (which should be an entropy label) in addition to popping
      the tunnel label, provided that doing so doesn't diminish its
      ability to load balance on the next hop.
    </t>
  </section>
</section>

<section anchor="sig" title="Signaling for Entropy Labels">
  <t>
    An egress LSR Y can signal to ingress LSR(s) its ability to
    process entropy labels (henceforth called "Entropy Label
    Capability" or ELC) on a given tunnel.  In particular, even if Y
    signals an implicit null label, indicating that PHP is to be
    performed, Y MUST be prepared to pop the ELI and EL.
  </t>

  <t>
    Note that Entropy Label Capability may be asymmetric: if LSRs X
    and Y are at opposite ends of a tunnel, X may be able to process
    entropy labels, whereas Y may not.  The signaling extensions below
    allow for this asymmetry.
  </t>

  <t>
    For an illustration of signaling and forwarding with entropy
    labels, see <xref target='sig-forw'/>.
  </t>

  <section anchor="ldp" title="LDP Signaling">
    <t>
      A new LDP TLV (<xref target="RFC5036"/>) is defined to signal an
      egress's ability to process entropy labels.  This is called the
      ELC TLV, and may appear as an Optional Parameter of the Label
      Mapping Message.
    </t>

    <t>
      The presence of the ELC TLV in a Label Mapping Message indicates
      to ingress LSRs that the egress LSR can process entropy labels
      for the associated LDP tunnel.  The ELC TLV has Type (TBD by
      IANA) and Length 0.
    </t>

    <t>
      <figure anchor="el_sub_tlv" title="Entropy Label Capability TLV">
        <preamble>
	  The structure of the ELC TLV is shown below.
	</preamble>

        <artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|U|F|        Type (TBD)         |           Length (0)          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	</artwork>
      </figure>
    </t>

    <t>
      where:
      <list style="empty">
	<t>
	  U: Unknown bit.  This bit MUST be set to 1.  If the ELC TLV
	  is not understood by the receiver, then it MUST be ignored.
	</t>
	<t>
	  F: Forward bit.  This bit MUST be set to 1.  Since
	  the ELC TLV is going to be propagated hop-by-hop, it should
	  be forwarded even by nodes that may not understand it.
	</t>
	<t>
	  Type: Type field.  To be assigned by IANA.
	</t>
	<t>
	  Length: Length field.  This field specifies the total length
	  in octets of the ELC TLV, and is currently defined to be 0.
	</t>
      </list>
    </t>
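    <t>
      As an illustration, the four octets of the ELC TLV could be
      built as sketched below.  The 14-bit width of the Type field
      follows the LDP TLV encoding of <xref target="RFC5036"/>; the
      function name encode_elc_tlv is hypothetical, and the type code
      is a parameter because it is to be assigned by IANA.
    </t>
    <figure><artwork><![CDATA[
```python
import struct


def encode_elc_tlv(tlv_type):
    """Encode the ELC TLV: U=1, F=1, 14-bit Type, Length 0."""
    if not 0 <= tlv_type < 2**14:
        raise ValueError("Type must fit in 14 bits")
    first = (1 << 15) | (1 << 14) | tlv_type  # U bit, F bit, Type
    return struct.pack("!HH", first, 0)       # Length is 0
```
]]></artwork></figure>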

    <section title='Processing the ELC TLV'>
      <t>
	An LSR that receives a Label Mapping with the ELC TLV but does
	not understand it MUST propagate it intact to its neighbors
	and MUST NOT send a notification to the sender (following the
	meaning of the U- and F-bits).
      </t>
      <t>
	An LSR X may receive multiple Label Mappings for a given FEC F
	from its neighbors.  In its turn, X may advertise a Label
	Mapping for F to its neighbors.  If X understands the ELC TLV,
	and if any of the advertisements it received for FEC F does
	not include the ELC TLV, X MUST NOT include the ELC TLV in its
	own advertisements of F.  If all the advertised Mappings for F
	include the ELC TLV, then X MUST advertise its Mapping for F
	with the ELC TLV.  If any of X's neighbors resends its
	Mapping, sends a new Mapping or Withdraws a previously
	advertised Mapping for F, X MUST re-evaluate the status of ELC
	for FEC F, and, if there is a change, X MUST re-advertise its
	Mapping for F with the updated status of ELC.
      </t>
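      <t>
	The rule for an LSR that understands the ELC TLV can be
	sketched as below.  The names elc_status and on_change are
	hypothetical, and the sketch covers only the case where X has
	received at least one Mapping for F.
      </t>
      <figure><artwork><![CDATA[
```python
def elc_status(mappings):
    """Decide whether X advertises the ELC TLV for FEC F.

    mappings: dict of neighbor -> True if that neighbor's current
    Label Mapping for F carried the ELC TLV.
    """
    if not mappings:
        raise ValueError("no Mappings received for this FEC")
    # ELC is advertised only if every received Mapping has it.
    return all(mappings.values())


def on_change(mappings, advertised):
    """Re-evaluate after a neighbor's Mapping changes.

    Returns (new_status, must_readvertise).
    """
    new_status = elc_status(mappings)
    return new_status, new_status != advertised
```
]]></artwork></figure>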
    </section>
  </section>

  <section anchor="bgp" title="BGP Signaling">
    <t>
      When BGP <xref target="RFC4271"/> is used for distributing
      Network Layer Reachability Information (NLRI) as described in,
      for example, <xref target="RFC3107"/>, the BGP UPDATE message
      may include the ELC attribute as part of the Path Attributes.
      This is an optional, transitive BGP attribute of type (to be
      assigned by IANA).  The inclusion of this attribute with an NLRI
      indicates that the advertising BGP router can process entropy
      labels as an egress LSR for all routes in that NLRI.
    </t>

    <t>
      A BGP speaker S that originates an UPDATE should include the
      ELC attribute only if both of the following are true:
      <list style='format A%d:'>
	<t>
	  S sets the BGP NEXT_HOP attribute to itself; AND
	</t>
	<t>
	  S can process entropy labels.
	</t>
      </list>
    </t>

    <t>
      Suppose a BGP speaker T receives an UPDATE U with the ELC
      attribute.  T has two choices.  T can simply re-advertise U
      with the ELC attribute if either of the following is true:
      <list style='format B%d:'>
	<t>
	  T does not change the NEXT_HOP attribute; OR
	</t>
	<t>
	  T simply swaps labels without popping the entire label
	  stack and processing the payload below.
	</t>
      </list>
      An example of the use of B1 is Route Reflectors.
    </t>

    <t>
      However, if T changes the NEXT_HOP attribute for U and in the
      data plane pops the entire label stack to process the payload, T
      MAY include an ELC attribute for UPDATE U' if both of the
      following are true:
      <list style='format C%d:'>
	<t>
	  T sets the NEXT_HOP attribute of U' to itself; AND
	</t>
	<t>
	  T can process entropy labels.
	</t>
      </list>
      Otherwise, T MUST remove the ELC attribute.
    </t>
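    <t>
      The conditions B1, B2, C1 and C2 above can be summarized in a
      small decision function.  This is a sketch of the rules as
      stated in this section, with a hypothetical function name and
      parameter names; it returns whether T is permitted to send the
      ELC attribute, not whether it must.
    </t>
    <figure><artwork><![CDATA[
```python
def may_send_elc(changes_next_hop, pops_entire_stack,
                 next_hop_self, can_process_el):
    """Return True if T may send the ELC attribute it received."""
    # B1: NEXT_HOP is unchanged (e.g., a Route Reflector).
    if not changes_next_hop:
        return True
    # B2: T only swaps labels and never processes the payload.
    if not pops_entire_stack:
        return True
    # C1 and C2: T is itself the next hop and can process ELs.
    # Otherwise the attribute has to be removed.
    return next_hop_self and can_process_el
```
]]></artwork></figure>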
  </section>

  <section title="RSVP-TE Signaling" anchor='rsvp-te'>
    <t>
      Entropy Label support is signaled in RSVP-TE <xref
      target="RFC3209"/> using the Entropy Label Capability (ELC) flag
      in the Attribute Flags TLV of the LSP_ATTRIBUTES object <xref
      target="RFC5420"/>.  The presence of the ELC flag in a Path
      message indicates that the ingress can process entropy labels in
      the upstream direction; this only makes sense for a
      bidirectional LSP and MUST be ignored otherwise.  The presence
      of the ELC flag in a Resv message indicates that the egress can
      process entropy labels in the downstream direction.
    </t>
    <t>
      The bit number for the ELC flag is to be assigned by IANA.
    </t>
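    <t>
      The interpretation of the ELC flag in Path and Resv messages
      described above can be summarized by the following Python
      sketch.  The function and parameter names are assumptions made
      for illustration; the sketch does not parse RSVP-TE messages.
    </t>
    <figure>
      <artwork><![CDATA[
```python
def el_directions(bidirectional: bool,
                  elc_in_path: bool,
                  elc_in_resv: bool) -> dict:
    """Return the directions in which entropy labels may be inserted."""
    return {
        # ELC in Resv: the egress can process ELs that the ingress
        # inserts in the downstream direction.
        "downstream": elc_in_resv,
        # ELC in Path: the ingress can process ELs in the upstream
        # direction; the flag is ignored unless the LSP is bidirectional.
        "upstream": bidirectional and elc_in_path,
    }
```
]]></artwork>
    </figure>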
  </section>

  <section title="Multicast LSPs and Entropy Labels"
	   anchor='mlsp'>
    <t>
      Multicast LSPs <xref target="RFC4875"/>, <xref
      target='RFC6388'/> typically do not use ECMP for load balancing,
      as the combination of replication and multipathing can lead to
      duplicate traffic delivery.  However, these LSPs can traverse
      bundled links <xref target="RFC4201"/> and LAGs.  In both these
      cases, load balancing is useful, and hence entropy labels can be
      of value for multicast LSPs.
    </t>

    <t>
      The methodology defined for entropy labels here will be used for
      multicast LSPs; however, the details of signaling and processing
      ELs for multicast LSPs will be specified in a companion
      document.
    </t>
  </section>
</section>

<section title="Operations, Administration, and Maintenance (OAM) and Entropy Labels">
  <t>
    Generally, OAM comprises a set of functions operating in the data
    plane to allow a network operator to monitor its network
    infrastructure and to implement mechanisms in order to enhance the
    general behavior and the level of performance of its network,
    e.g., the efficient and automatic detection, localization,
    diagnosis and handling of defects.
  </t>

  <t>
    Currently defined OAM mechanisms for MPLS include LSP
    Ping/Traceroute <xref target="RFC4379"/> and Bidirectional Forwarding
    Detection (BFD) for MPLS <xref target="RFC5884"/>.  The latter
    provides connectivity verification between the endpoints of an
    LSP, and recommends establishing a separate BFD session for every
    path between the endpoints.
  </t>

  <t>
    The LSP traceroute procedures of <xref target="RFC4379"/> allow an
    ingress LSR to obtain label ranges that can be used to send
    packets on every path to the egress LSR.  They work by having the
    ingress LSR sequentially ask the transit LSRs along a particular
    path to a given egress LSR to return a label range such that the
    inclusion of a label in that range in a packet will cause the
    replying transit LSR to send that packet out the egress interface
    for that path.  The ingress provides the label range returned by
    transit LSR N to transit LSR N + 1, which returns a label range
    that is less than or equal in span to the range provided to it.
    This process iterates until the penultimate transit LSR replies to
    the ingress LSR with a label range that is acceptable to it and to
    all LSRs along the path preceding it for forwarding a packet along
    the path.
  </t>
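  <t>
    The hop-by-hop narrowing of the label range can be pictured with
    the following Python sketch.  It is a simplification under stated
    assumptions: each transit LSR is modeled as accepting a single
    contiguous, inclusive range of labels for the path, and the
    function names are hypothetical.  The multipath information
    actually exchanged by LSP traceroute is richer than this.
  </t>
  <figure>
    <artwork><![CDATA[
```python
from typing import List, Optional, Tuple

LabelRange = Tuple[int, int]  # inclusive (low, high); a modeling assumption


def narrow(offered: LabelRange, accepted: LabelRange) -> Optional[LabelRange]:
    """Range a transit LSR returns: the part of `offered` it maps to this path."""
    low = max(offered[0], accepted[0])
    high = min(offered[1], accepted[1])
    return (low, high) if high >= low else None


def trace_path(initial: LabelRange,
               transit_ranges: List[LabelRange]) -> Optional[LabelRange]:
    """Give the range returned by LSR N to LSR N + 1, in path order.

    Returns the range acceptable to every transit LSR, or None if no
    label satisfies all of them.
    """
    current: Optional[LabelRange] = initial
    for accepted in transit_ranges:
        if current is None:
            return None
        current = narrow(current, accepted)
    return current
```
]]></artwork>
  </figure>
  <t>
    Because each step is an intersection, the returned range can never
    be wider than the range that was offered, which matches the "less
    than or equal in span" property described above.
  </t>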

  <t>
    However, the LSP traceroute procedures do not specify where in the
    label stack the value from the label range is to be placed,
    whether deep packet inspection is allowed, or, if it is, which
    keys and key values are to be used.
  </t>

  <t>
    This memo updates LSP traceroute by specifying that the value from
    the label range is to be placed in the entropy label.  Deep packet
    inspection is thus not necessary, although an LSR may use it,
    provided it does so consistently; i.e., if the label range to go
    to a given downstream LSR is computed with deep packet inspection,
    then the data path should use the same approach and the same keys.
  </t>

  <t>
    In order to have a BFD session on a given path, a value from the
    label range for that path should be used as the EL value for BFD
    packets sent on that path.
  </t>
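  <t>
    As a minimal sketch of the last point, the following Python
    fragment builds one label stack per path for BFD packets, using a
    value from that path's label range as the EL.  The names are
    hypothetical, "ELI" is a placeholder for the reserved label value
    still to be assigned, and picking the lowest value in each range
    is an arbitrary choice; any value in the range would serve.
  </t>
  <figure>
    <artwork><![CDATA[
```python
from typing import Dict, List, Tuple

ELI = "ELI"  # placeholder: the reserved label value is not yet assigned


def bfd_label_stacks(tunnel_label: str,
                     path_ranges: Dict[str, Tuple[int, int]]
                     ) -> Dict[str, List[object]]:
    """Return one label stack (top label first) per path.

    `path_ranges` maps a path identifier to the inclusive label range
    obtained for that path, e.g. from LSP traceroute.
    """
    return {
        path: [tunnel_label, ELI, low]  # EL drawn from this path's range
        for path, (low, _high) in path_ranges.items()
    }
```
]]></artwork>
  </figure>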
</section>

<section title="MPLS-TP and Entropy Labels">
  <t>
    Since MPLS-TP does not use ECMP, entropy labels are not applicable to
    an MPLS-TP deployment.
  </t>
</section>

<section title="Entropy Labels in Various Scenarios" anchor='sig-forw'>
  <t>
    This section describes the use of entropy labels in various
    scenarios.
  </t>

  <t>
    In the figures below, the following conventions are used to depict
    processing between X and Y.  Note that control plane signaling
    goes right to left, whereas data plane processing goes left to
    right.
    <figure>
      <artwork>
Protocols
Y:        <--- [L, E]                         Y signals L to X
    X ------------- Y
LS:   <L, ELI, EL>                            Label stack
X:  +<L, ELI, EL>                             X pushes <L, ELI, EL>
Y:                  -<L, ELI, EL>             Y pops <L, ELI, EL>
      </artwork>
    </figure>
    This means that Y signals to X label L for an LDP tunnel.  E can
    be one of:
    <list>
      <t>
	0: meaning egress is NOT entropy label capable, or
      </t>
      <t>
	1: meaning egress is entropy label capable.
      </t>
    </list>
    The line with LS: shows the label stack on the wire.  Below that
    is the operation that each LSR does in the data plane, where +
    means push the following label stack, - means pop the following
    label stack, L~L' means swap L with L', and * means that the
    operation is not depicted.
  </t>

  <section anchor="tunnels" title="LDP Tunnel">
    <t>
      The following illustrates several simple intra-AS LDP tunnels.
      The first diagram shows ultimate hop popping (UHP) with ingress
      inserting an EL, the second UHP with no ELs, the third
      penultimate hop popping (PHP) with ELs, the fourth PHP with no
      ELs but with an application label AL (which could, for example,
      be a VPN label), and the fifth PHP with both ELs and an
      application label AL.
    </t>
    <t>
      Note that, in all the cases below, the MPLS application does not
      matter; it may be that X pushes some more labels (perhaps for a
      VPN or VPLS) below the ones shown, and Y pops them.
      <figure title='LDP with UHP; ingress inserts ELs'>
	<artwork>
A:        <--- [TL4, 1]
B:                     <-- [TL3, 1]
...
W:                           <-- [TL1, 1]
Y:                                        <-- [TL0, 1]
    X --------------- A --------- B ... W ---------- Y
LS:    <TL4, ELI, EL>   <TL3,ELI,EL>      <TL0,ELI,EL>
X:  +<TL4, ELI, EL>
A:                    TL4~TL3
B:                                TL3~TL2
...
W:                                      TL1~TL0
Y:                                                   -<TL0, ELI, EL>
	</artwork>
      </figure>
      <figure title='LDP with UHP; ingress does not insert ELs'>
	<artwork>
A:        <--- [TL4, 1]
B:                     <-- [TL3, 1]
...
W:                           <-- [TL1, 1]
Y:                                        <-- [TL0, 1]
    X --------------- A --------- B ... W ---------- Y
LS:        <TL4>          <TL3>              <TL0>
X:  +<TL4>
A:                    TL4~TL3
B:                                TL3~TL2
...
W:                                      TL1~TL0
Y:                                                   -<TL0>
	</artwork>
      </figure>
      <figure title='LDP with PHP; ingress inserts ELs'>
	<artwork>
A:        <--- [TL4, 1]
B:                     <-- [TL3, 1]
...
W:                           <-- [TL1, 1]
Y:                                          <-- [3, 1]
    X --------------- A --------- B ... W ---------- Y
X:  +<TL4, ELI, EL>
A:                    TL4~TL3
B:                                TL3~TL2
...
W:                                      -TL1
Y:                                                   -<ELI, EL>
	</artwork>
      </figure>
      <figure title='LDP with PHP + VPN; ingress does not insert ELs'>
	<artwork>
A:        <--- [TL4, 1]
B:                     <-- [TL3, 1]
...
W:                           <-- [TL1, 1]
Y:                                          <-- [3, 1]
VPN:  <------------------------------------------ [AL]
    X --------------- A --------- B ... W ---------- Y
LS:      <TL4, AL>      <TL3, AL>             <AL>
X:  +<TL4, AL>
A:                    TL4~TL3
B:                                TL3~TL2
...
W:                                      -TL1
Y:                                                   -<AL>
	</artwork>
      </figure>
      <figure title='LDP with PHP + VPN; ingress inserts ELs'>
	<artwork>
A:        <--- [TL4, 1]
B:                        <-- [TL3, 1]
...
W:                              <-- [TL1, 1]
Y:                                             <-- [3, 1]
VPN:  <--------------------------------------------- [AL]
    X --------------- A ------------ B ... W ---------- Y
LS:  <TL4,ELI,EL,AL>  <TL3,ELI,EL,AL>        <ELI,EL,AL>
X:  +<TL4,ELI,EL,AL>
A:                    TL4~TL3
B:                                   TL3~TL2
...
W:                                         -TL1
Y:                                                      -<ELI,EL,AL>
	</artwork>
      </figure>
    </t>
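    <t>
      The data plane operations in the last figure can be replayed
      with a short Python sketch.  Labels are represented as strings,
      the helper names are hypothetical, and a single intermediate LSR
      "C" is assumed to stand in for the hops elided between B and W.
    </t>
    <figure>
      <artwork><![CDATA[
```python
def push(stack, labels):
    """+: push labels so that labels[0] becomes the top of the stack."""
    return list(labels) + stack


def swap(stack, old, new):
    """old~new: replace the top label `old` with `new`."""
    if not stack or stack[0] != old:
        raise ValueError("top label does not match")
    return [new] + stack[1:]


def pop(stack, labels):
    """-: pop exactly the given labels from the top of the stack."""
    n = len(labels)
    if stack[:n] != list(labels):
        raise ValueError("stack top does not match")
    return stack[n:]


def php_vpn_with_els():
    """Return (LSR, label stack after its operation) for each hop."""
    stack = push([], ["TL4", "ELI", "EL", "AL"])  # X: ingress
    trace = [("X", stack)]
    # "C" is an assumed stand-in for the LSRs elided in the figure.
    for lsr, old, new in [("A", "TL4", "TL3"),
                          ("B", "TL3", "TL2"),
                          ("C", "TL2", "TL1")]:
        stack = swap(stack, old, new)
        trace.append((lsr, stack))
    stack = pop(stack, ["TL1"])                   # W: penultimate hop pop
    trace.append(("W", stack))
    stack = pop(stack, ["ELI", "EL", "AL"])       # Y: egress
    trace.append(("Y", stack))
    return trace
```
]]></artwork>
    </figure>
    <t>
      In this sketch the ELI and EL stay untouched below the tunnel
      label at every transit LSR; only the egress Y removes them.
    </t>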
  </section>

  <section anchor="ldp-rsvp" title="LDP Over RSVP-TE">
    <t>
      The following illustrates "LDP over RSVP-TE" tunnels.  X and Y are
      the ingress and egress (respectively) of the LDP tunnel; A and W
      are the ingress and egress of the RSVP-TE tunnel.  It is assumed
      that both the LDP and RSVP-TE tunnels have PHP.
      <figure title="LDP over RSVP-TE Tunnels" anchor='ldp-rvsp-fig'>
	<artwork>
LDP with ELs, RSVP-TE without ELs
LDP:       <--- [L4, 1]  <------- [L3, 1]  <--- [3, 1]
RSVP-TE:                <-- [Rn, 0]
                               <-- [3, 0]
    X --------------- A --------- B ... W ---------- Y
LS:    <L4, ELI, EL>   <Rn,L3,ELI,EL> ...  <ELI, EL>
DP: +<L4, ELI, EL>    L4~<Rn, L3> *     -L1          -<ELI, EL>
	</artwork>
      </figure>
    </t>
  </section>

  <section title='MPLS Applications'>
    <t>
      An ingress LSR X must keep state per unicast tunnel as to
      whether the egress for that tunnel can process entropy labels.
      X does not have to keep state per application running over that
      tunnel.  However, X can choose on a per-application basis
      whether or not to insert ELs.  For example, X may have an
      application for which it does not wish to use ECMP (e.g.,
      circuit emulation), or for which it does not know which keys to
      use for load balancing (e.g., AppleTalk over a pseudowire).  In
      either of those cases, X may choose not to insert entropy
      labels, while still inserting entropy labels for an IP VPN
      over the same tunnel.
    </t>
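    <t>
      One way to picture this split between per-tunnel state and
      per-application choice is the following Python sketch.  The
      class, its attributes, and the application names are
      hypothetical, and "ELI" is a placeholder for the reserved label
      value still to be assigned.
    </t>
    <figure>
      <artwork><![CDATA[
```python
from typing import Dict, List, Set

ELI = "ELI"  # placeholder: the reserved label value is not yet assigned


class Ingress:
    """Ingress LSR X: per-tunnel ELC state plus local per-application policy."""

    def __init__(self) -> None:
        # Per-tunnel state: can the egress process entropy labels?
        self.egress_elc: Dict[str, bool] = {}
        # Local policy: applications for which X chooses not to insert ELs.
        self.no_el_apps: Set[str] = set()

    def label_stack(self, tunnel: str, tunnel_label: str, app: str,
                    app_labels: List[str], entropy: int) -> List[object]:
        """Return the label stack to push, top label first."""
        use_el = (self.egress_elc.get(tunnel, False)
                  and app not in self.no_el_apps)
        stack: List[object] = [tunnel_label]
        if use_el:
            stack += [ELI, entropy]  # ELI and EL sit below the tunnel label
        return stack + list(app_labels)
```
]]></artwork>
    </figure>
    <t>
      With this structure, adding "circuit-emulation" to no_el_apps
      suppresses ELs for that application only, while an IP VPN over
      the same tunnel still receives them.
    </t>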
  </section>
</section>

<section anchor="sec-con" title="Security Considerations">
  <t>
    This document describes advertisement of the capability to support
    receipt of entropy labels which an ingress LSR may insert in MPLS
    packets in order to allow transit LSRs to attain better load
    balancing across LAG and/or ECMP paths in the network.
  </t>

  <t>
    This document does not introduce new security vulnerabilities to
    LDP, BGP or RSVP-TE.  Please refer to the Security Considerations
    section of these protocols (<xref target="RFC5036"/>, <xref
    target='RFC4271'/> and <xref target='RFC3209'/>) for security
    mechanisms applicable to each.
  </t>

  <t>
    Given that there is no end-user control over the values used for
    entropy labels, there is little risk of Entropy Label forgery,
    which could cause uneven load-balancing in the network.
  </t>

  <t>
    If Entropy Label Capability is not signaled from an egress PE to
    an ingress PE, due to, for example, malicious configuration
    activity on the egress PE, then the ingress PE will fall back to
    not using entropy labels for load-balancing traffic over LAG or
    ECMP paths, which is in general no worse than the behavior
    observed in current production networks.  That said, it is
    recommended that operators monitor changes to PE configurations
    and, more importantly, the fairness of load distribution over LAG
    or ECMP paths.  If the fairness of load distribution over a set of
    paths changes, that could indicate a misconfiguration, bug, or
    other non-optimal behavior on their PEs, and they should take
    corrective action.
  </t>
</section>

<section anchor="iana-con" title="IANA Considerations">
  <section title="Reserved Label for ELI">
    <t>
      IANA is requested to allocate a reserved label for the Entropy
      Label Indicator (ELI) from the "Multiprotocol Label Switching
      Architecture (MPLS) Label Values" Registry.
    </t>
  </section>
  <section title="LDP Entropy Label Capability TLV">
    <t>
      IANA is requested to allocate the next available value from the
      IETF Consensus range (0x0001-0x07FF) in the LDP TLV Type Name
      Space Registry as the "Entropy Label Capability TLV".
    </t>
  </section>

  <section title="BGP Entropy Label Capability Attribute">
    <t>
      IANA is requested to allocate the next available Path Attribute
      Type Code from the "BGP Path Attributes" registry as the "BGP
      Entropy Label Capability Attribute".
    </t>
  </section>

  <section title="RSVP-TE Entropy Label Capability flag">
    <t>
      IANA is requested to allocate a new bit from the "Attribute
      Flags" sub-registry of the "RSVP TE Parameters" registry.
    </t>

    <t>
      <figure>
        <artwork>
Bit | Name                     | Attribute  | Attribute  | RRO
No  |                          | Flags Path | Flags Resv |    
----+--------------------------+------------+------------+-----
TBD   Entropy Label Capability       Yes          Yes       No
	</artwork>
      </figure>
    </t>
  </section>
</section>

<section title="Acknowledgments">
  <t>
    We wish to thank Ulrich Drafz for his contributions, as well as
    the entire 'hash label' team for their valuable comments and
    discussion.
  </t>
  <t>
    Sincere thanks to Nischal Sheth for his many suggestions and
    comments, and his careful reading of the document, especially with
    regard to data plane processing of entropy labels.
  </t>
</section>
</middle>

<back>
  <references title="Normative References">
    <?rfc include='reference.RFC.2119'?>
    <?rfc include='reference.RFC.3031'?>
    <?rfc include='reference.RFC.3032'?>
    <?rfc include='reference.RFC.3107'?>
    <?rfc include='reference.RFC.3209'?>
    <?rfc include='reference.RFC.5036'?>
    <?rfc include='reference.RFC.5420'?>
  </references>

  <references title="Informative References">
    <?rfc include='reference.RFC.4201'?>
    <?rfc include='reference.RFC.4271'?>
    <?rfc include='reference.RFC.4379'?>
    <?rfc include='reference.RFC.4447'?>
    <?rfc include='reference.RFC.4875'?>
    <?rfc include='reference.RFC.5884'?>
    <?rfc include='reference.RFC.6388'?>
    <?rfc include='reference.RFC.6391'?>
  </references>

  <section title="Applicability of LDP Entropy Label Capability TLV">
    <t>
      In the case of unlabeled IPv4 (Internet) traffic, the Best
      Current Practice is for an egress LSR to propagate eBGP-learned
      routes within an SP's Autonomous System after resetting the BGP
      next-hop attribute to one of its Loopback IP addresses.  That
      Loopback IP address is injected into the Service Provider's IGP
      and, concurrently, a label is assigned to it via LDP.  Thus, when
      an ingress LSR performs a forwarding lookup for a BGP
      destination, it recursively resolves the associated next-hop to a
      Loopback IP address and associated LDP label of the egress LSR.
    </t>

    <t>
      Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy
      Label Capability TLV will typically be applied only to the FEC
      for the Loopback IP address of the egress LSR and the egress LSR
      need not announce an entropy label capability for the eBGP-learned
      route.
    </t>
  </section>
</back>
</rfc>
