<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-pwe3-fat-pw-07" ipr="trust200902">
  <front>
    <title abbrev="FAT-PW">Flow Aware Transport of Pseudowires over an MPLS
    Packet Switched Network</title>

    <author fullname="Stewart Bryant" initials="S" role="editor"
            surname="Bryant">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>250 Longwater Ave</street>

          <city>Reading</city>

          <code>RG2 6GB</code>

          <country>United Kingdom</country>
        </postal>

        <phone>+44-208-824-8828</phone>

        <email>stbryant@cisco.com</email>
      </address>
    </author>

    <author fullname="Clarence Filsfils" initials="C" surname="Filsfils">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street></street>

          <city>Brussels</city>

          <country>Belgium</country>
        </postal>

        <email>cfilsfil@cisco.com</email>
      </address>
    </author>

    <author fullname="Ulrich Drafz" initials="U" surname="Drafz">
      <organization>Deutsche Telekom</organization>

      <address>
        <postal>
          <street></street>

          <city>Muenster</city>

          <country>Germany</country>
        </postal>

        <email>Ulrich.Drafz@t-com.net</email>
      </address>
    </author>

    <author fullname="Vach Kompella" initials="V" surname="Kompella">
      <organization>Alcatel-Lucent</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>vach.kompella@alcatel-lucent.com</email>
      </address>
    </author>

    <author fullname="Joe Regan" initials="J" surname="Regan">
      <organization>Alcatel-Lucent</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>joe.regan@alcatel-lucent.com</email>
      </address>
    </author>

    <author fullname="Shane Amante" initials="S" surname="Amante">
      <organization>Level 3 Communications</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>shane@castlepoint.net</email>
      </address>
    </author>

    <date year="2011" />

    <area>Internet</area>

    <workgroup>PWE3</workgroup>

    <keyword>pseudowire</keyword>

    <keyword>MPLS</keyword>

    <keyword>Internet-Draft</keyword>

    <abstract>
      <t>Where the payload of a pseudowire comprises a number of distinct
      flows, it can be desirable to carry those flows over the equal cost
      multiple paths (ECMPs) that exist in the packet switched network. Most
      forwarding engines are able to generate a hash of the MPLS label stack
      and use this mechanism to balance MPLS flows over ECMPs.</t>

      <t>This document describes a method of identifying the flows, or flow
      groups, within pseudowires such that Label Switching Routers can balance
      flows at a finer granularity than individual pseudowires. The mechanism
      uses an additional label in the MPLS label stack.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>A pseudowire (PW) <xref target="RFC3985"></xref> is normally
      transported over a single network path, even if multiple Equal Cost
      Multiple Paths (ECMPs) exist between the ingress and egress PW provider
      edge (PE) equipment <xref target="RFC4385"></xref> <xref
      target="RFC4928"></xref>. This is required to preserve the
      characteristics of the emulated service (e.g. to avoid misordering SAToP
      PW packets <xref target="RFC4553"></xref> or subjecting the packets to
      unusable inter-arrival times). The use of a single path to preserve
      order remains the default mode of operation of a PW. The new capability
      proposed in this document is an OPTIONAL mode which may be used when the
      use of ECMP is known to be beneficial (and not harmful) to the operation
      of the PW.</t>

      <t>Some PWs are used to transport large volumes of IP traffic between
      routers. One example of this is the use of an Ethernet PW to create a
      virtual direct link between a pair of routers. Such PWs may carry from
      hundreds of Mbps to Gbps of traffic. These PWs only require packet
      ordering to be preserved within the context of each individual
      transported IP flow. They do not require packet ordering to be preserved
      between all packets of all IP flows within the pseudowire.</t>

      <t>The ability to explicitly configure such a PW to leverage the
      availability of multiple ECMPs allows for better capacity planning as
      the statistical multiplexing of a larger number of smaller flows is more
      efficient than with a smaller set of larger flows.</t>

      <t>Typically, forwarding hardware can deduce that an IP payload is being
      directly carried by an MPLS label stack, and it is capable of looking at
      some fields in packets to construct hash buckets for conversations or
      flows. However, when the MPLS payload is a PW, an intermediate node has
      no information on the type of PW being carried in the packet. This
      limits the forwarder at the intermediate node to only being able to make
      an ECMP choice based on a hash of the MPLS label stack. In the case of a
      PW emulating a high bandwidth trunk, the granularity obtained by hashing
      the label stack is inadequate for satisfactory load-balancing. The
      ingress node, however, is in the special position of being able to look
      at the un-encapsulated packet and spread flows amongst any available
      ECMPs, or even any Loop-Free Alternates <xref target="RFC5286"></xref>.
      This document defines a method to introduce granularity on the hashing
      of traffic running over PWs by introducing an additional label, chosen
      by the ingress node, and placed at the bottom of the label stack.</t>

      <t>In addition to providing an indication of the flow structure for use
      in ECMP forwarding decisions, the mechanism described in the document
      may also be used to select flows for distribution over an IEEE 802.1AX
      (formerly IEEE 802.3ad) link aggregation group that has been used in an
      MPLS network.</t>

      <t>NOTE: Although Ethernet is frequently referenced as a use case in
      this document, the mechanisms described here are general
      mechanisms that may be applied to any PW type in which there are
      identifiable flows, and in which there is no requirement to preserve the
      order between those flows.</t>

      <section title="ECMP in Label Switching Routers">
        <t>Label switching routers (LSRs) commonly generate a hash of the
        label stack or some elements of the label stack as a method of
        discriminating between flows, and use this to distribute those flows
        over the available ECMPs that exist in the network. Since the label at
        the bottom of stack is usually the label most closely associated with
        the flow, this normally provides the greatest entropy, and hence is
        usually included in the hash. This document describes a method of
        adding an additional label stack entry (LSE) at the bottom of stack in
        order to facilitate the load balancing of the flows within a PW over
        the available ECMPs. A similar design for general MPLS use has also
        been proposed <xref target="I-D.kompella-mpls-entropy-label"></xref>,
        <xref target="MPLS"></xref>.</t>

        <t>An alternative method of load balancing by creating a number of PWs
        and distributing the flows amongst them was considered, but was
        rejected because:<list style="symbols">
            <t>It did not introduce as much entropy as can be introduced by
            adding an additional LSE.</t>

            <t>It required additional PWs to be set up and maintained.</t>
          </list></t>
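
        <t>As an illustration only, the following Python sketch shows how an
        LSR might select one of several equal cost paths by hashing the label
        values in the stack. The function name select_ecmp_path, the use of
        CRC-32 and the decision to hash every label are assumptions made for
        this example; real forwarders use implementation-specific hash
        functions and may include only some of the labels.</t>

        <figure>
          <artwork><![CDATA[
```python
import zlib


def select_ecmp_path(label_stack, num_paths):
    """Pick an ECMP path index from the labels in the stack (top first).

    Hypothetical helper: CRC-32 stands in for whatever hash the
    forwarding hardware actually implements.
    """
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    # A label is 20 bits wide, so three octets per label are sufficient.
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return zlib.crc32(key) % num_paths


# Two packets on the same PW (label 1001) that differ only in the
# bottom label present different hash inputs to the LSR.
tunnel, pw = 2000, 1001
print(select_ecmp_path([tunnel, pw, 70001], 4))
print(select_ecmp_path([tunnel, pw, 70002], 4))
```
]]></artwork>
        </figure>

        <t>Packets with an identical label stack always map to the same path
        in this sketch, which is the property that preserves ordering within
        a flow.</t>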
      </section>

      <section title="Flow Label">
        <t>An additional LSE <xref target="RFC3032"></xref> is interposed
        between the PW LSE and the control word, or if the control word is not
        present, between the PW LSE and the PW payload. This additional LSE is
        called the flow LSE and the label carried by the flow LSE is called
        the flow label. Indivisible flows within the PW MUST be mapped to the
        same flow label by the ingress PE. The flow label stimulates the
        correct ECMP load balancing behaviour in the packet switched network
        (PSN). On receipt of the PW packet at the egress PE (which knows a
        flow LSE is present) the flow LSE is discarded without processing.</t>

        <t>Note that the flow label MUST NOT be an MPLS reserved label (values
        in the range 0..15) <xref target="RFC3032"></xref>, but is otherwise
        unconstrained by the protocol.</t>

        <t>It is useful to give consideration to the choice of TTL value in
        the flow LSE <xref target="RFC3032"></xref>. The flow LSE is at the
        bottom of the label stack; therefore, even when penultimate hop popping
        is employed, it will always be preceded by the PW label on arrival
        at the PE. If, due to an error condition, the flow LSE becomes top of
        stack it might be examined as if it were a normal LSE, and the packet
        might then be forwarded. This can be prevented by setting the flow LSE
        TTL to 1, thereby forcing the packet to be discarded by the forwarder.
        Note that this may be a departure from considerations that apply to
        the general MPLS case.</t>

        <t>This document does not define a use for the TC bits (formerly known
        as the EXP bits) in the flow label. Future documents may define a use
        for these bits, therefore implementations conforming to this
        specification MUST set the TC bits to zero at the ingress and MUST
        ignore them at the egress.</t>
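
        <t>The constraints above can be summarised in a short Python sketch
        that encodes a flow LSE using the label stack entry layout of
        RFC 3032 (20-bit label, 3-bit TC, 1-bit bottom-of-stack flag, 8-bit
        TTL). The function and constant names are hypothetical, and the TTL
        value of 1 reflects the option discussed above rather than a
        requirement.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

RESERVED_LABEL_MAX = 15       # labels 0..15 are reserved (RFC 3032)
MAX_LABEL = (1 << 20) - 1     # the label field is 20 bits wide


def encode_flow_lse(flow_label):
    """Return the 4-octet flow LSE for the given flow label.

    Hypothetical helper illustrating the field settings described
    in the text; it is not part of any real library.
    """
    if not RESERVED_LABEL_MAX < flow_label <= MAX_LABEL:
        raise ValueError("flow label must be in the range 16..1048575")
    tc = 0    # TC bits are set to zero at the ingress
    s = 1     # the flow LSE is at the bottom of the label stack
    ttl = 1   # assumed choice: a stray flow LSE at top of stack is dropped
    word = (flow_label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)
```
]]></artwork>
        </figure>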
      </section>
    </section>

    <section title="Native Service Processing Function">
      <t>The Native Service Processing (NSP) function <xref
      target="RFC3985"></xref> is a component of a PE that has knowledge of
      the structure of the emulated service and is able to take action on the
      service outside the scope of the PW. In this case it is required that
      the NSP in the ingress PE identify flows, or groups of flows within the
      service, and indicate the flow (group) identity of each packet as it is
      passed to the pseudowire forwarder. As an example, where the PW type is
      an Ethernet, the NSP might parse the ingress Ethernet traffic and
      consider all of the IP traffic. This traffic could then be categorised
      into flows by considering all traffic with the same source and
      destination address pair to be a single indivisible flow. Since this is
      an NSP function, by definition, the method used to identify a flow is
      outside the scope of the PW design. Similarly, since the NSP is internal
      to the PE, the method of flow indication to the PW forwarder is outside
      the scope of this document.</t>
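
      <t>Purely as an example of the address-pair classification mentioned
      above, the following Python sketch derives a flow identity from the
      source and destination IP addresses of a packet. The function name
      flow_key and the tuple representation are assumptions made for this
      illustration; an NSP is free to use any other method.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress


def flow_key(src_ip, dst_ip):
    """Return a hashable identity for the flow a packet belongs to.

    All packets with the same source and destination address pair
    yield the same key and are treated as one indivisible flow.
    Hypothetical helper; works for IPv4 and IPv6 address strings.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    return (src.packed, dst.packed)
```
]]></artwork>
      </figure>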
    </section>

    <section title="Pseudowire Forwarder">
      <t>The PW forwarder must be provided with a method of mapping flows to
      load balanced paths.</t>

      <t>The forwarder must generate a label for the flow or group of flows.
      How the flow label values are determined is outside the scope of this
      document; however, the flow label allocated to a flow MUST NOT be an MPLS
      reserved label and SHOULD remain constant for the life of the flow. It
      is RECOMMENDED that the method chosen to generate the load balancing
      labels introduces a high degree of entropy in their values, to maximise
      the entropy presented to the ECMP selection mechanism in the LSRs in the
      PSN, and hence distribute the flows as evenly as possible over the
      available PSN ECMP. The forwarder at the ingress PE prepends the PW
      control word (if applicable), and then pushes the flow label, followed
      by the PW label.</t>
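
      <t>One possible way to derive a flow label that is constant for the
      life of a flow, avoids the reserved range and has well-spread values
      is sketched below in Python. The use of SHA-256, the function name
      flow_label_for and the representation of the flow identity as a string
      of octets are all assumptions made for this example; this document
      does not mandate any particular method.</t>

      <figure>
        <artwork><![CDATA[
```python
import hashlib

FIRST_UNRESERVED = 16     # labels 0..15 are reserved
LABEL_SPACE = 1 << 20     # the label field is 20 bits wide


def flow_label_for(flow_id):
    """Map a flow identity (bytes) to a label in the range 16..1048575.

    Hypothetical helper: the same flow identity always produces the
    same label, so the label stays constant for the life of the flow.
    """
    digest = hashlib.sha256(flow_id).digest()
    value = int.from_bytes(digest[:4], "big")
    return FIRST_UNRESERVED + value % (LABEL_SPACE - FIRST_UNRESERVED)
```
]]></artwork>
      </figure>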

      <t>NOTE: Although this document does not attempt to specify any hash
      algorithms, it is suggested that any such algorithm should be based on
      the assumption that there will be a high degree of entropy in the values
      assigned to the load balancing labels.</t>

      <t>The forwarder at the egress PE uses the pseudowire label to identify
      the pseudowire. From the context associated with the pseudowire label,
      the egress PE can determine whether a flow LSE is present. If a flow LSE
      is present, it MUST be checked to determine whether it carries a
      reserved label. If it is a reserved label the packet is processed
      according to the rules associated with that reserved label, otherwise
      the LSE is discarded.</t>
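
      <t>The egress behaviour just described might be sketched as follows.
      This Python fragment is illustrative only: the function name, the
      returned action strings and the assumption that the PW LSE has already
      been removed are choices made for the example, and the handling of a
      reserved label is not modelled.</t>

      <figure>
        <artwork><![CDATA[
```python
RESERVED_LABEL_MAX = 15   # labels 0..15 are reserved


def process_after_pw_label(octets, flow_lse_expected):
    """Handle the octets that follow the PW LSE at the egress PE.

    flow_lse_expected comes from the context associated with the PW
    label. Returns a hypothetical (action, octets) pair.
    """
    if not flow_lse_expected:
        return ("deliver", octets)
    if len(octets) < 4:
        raise ValueError("truncated packet: flow LSE missing")
    label = int.from_bytes(octets[:4], "big") >> 12
    if label <= RESERVED_LABEL_MAX:
        # Processed according to the rules of that reserved label.
        return ("reserved", octets)
    # Otherwise the flow LSE is simply discarded.
    return ("deliver", octets[4:])
```
]]></artwork>
      </figure>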

      <t>All other PW forwarding operations are unmodified by the inclusion of
      the flow LSE.</t>

      <section title="Encapsulation">
        <t>The PWE3 Protocol Stack Reference Model, modified to include the
        flow LSE, is shown in <xref target="PStack"></xref> below.</t>

        <figure anchor="PStack" title="PWE3 Protocol Stack Reference Model">
          <artwork><![CDATA[
   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |         Emulated Service       | (including  |
   |  VLAN)      |<==============================>|  VLAN)      |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+            Pseudowire          +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |            PSN Tunnel          |    PSN      |
   |   MPLS      |<==============================>|   MPLS      |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

]]></artwork>

          <postamble></postamble>
        </figure>

        <t>The encapsulation of a PW with a flow LSE is shown in <xref
        target="Encap"></xref> below.</t>

        <figure anchor="Encap"
                title="Encapsulation of a pseudowire with a pseudowire flow LSE">
          <artwork><![CDATA[

    +---------------------------+
    |                           |
    |  Payload                  |
    |                           |  n octets
    |                           |
    +---------------------------+
    |  Optional Control Word    |  4 octets   
    +---------------------------+
    |  Flow LSE                 |  4 octets 
    +---------------------------+
    |  PW LSE                   |  4 octets  
    +---------------------------+
    |  MPLS Tunnel LSE (s)      |  n*4 octets (four octets per LSE)
    +---------------------------+
   
    

]]></artwork>

          <postamble></postamble>
        </figure>
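
        <t>The ordering of the encapsulation on the wire (tunnel LSEs first,
        then the PW LSE, then the flow LSE, then the optional control word
        and the payload) can be illustrated with the Python sketch below.
        The helper names and the TTL value of 255 used for the tunnel and PW
        LSEs are assumptions made for this example only.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct


def lse(label, s, ttl, tc=0):
    """Encode one label stack entry (RFC 3032 layout) as 4 octets."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (s << 8) | ttl)


def encapsulate(payload, tunnel_labels, pw_label, flow_label,
                control_word=None):
    """Build the octets of a PW packet, outermost (tunnel) LSE first.

    Hypothetical helper; TTL 255 on the tunnel and PW LSEs is an
    assumption, TTL 1 on the flow LSE follows the earlier discussion.
    """
    stack = b"".join(lse(label, s=0, ttl=255) for label in tunnel_labels)
    stack += lse(pw_label, s=0, ttl=255)   # PW LSE is not bottom of stack
    stack += lse(flow_label, s=1, ttl=1)   # flow LSE carries the S bit
    return stack + (control_word or b"") + payload
```
]]></artwork>
        </figure>

        <t>Note that, with a flow LSE present, the bottom-of-stack bit moves
        from the PW LSE to the flow LSE.</t>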
      </section>
    </section>

    <section title="Signaling the Presence of the Flow Label">
      <t>When using the signalling procedures in <xref
      target="RFC4447"></xref>, a new Pseudowire Interface Parameter Sub-TLV,
      the Flow Label Sub-TLV (FL Sub-TLV), is used to synchronise the flow
      label states between the ingress and egress PEs.</t>

      <t>The absence of a FL Sub-TLV indicates that the PE is unable to
      process flow labels. A PE that is using PW signalling and that does not
      send a FL Sub-TLV MUST NOT include a flow label in the PW packet. A PE
      that is using PW signalling and which does not receive a FL Sub-TLV from
      its peer MUST NOT include a flow label in the PW packet. This preserves
      backwards compatibility with existing PW specifications.</t>

      <t>A PE that wishes to send a flow label in a PW packet MUST include in
      its label mapping message a FL Sub-TLV with T = 1 (see <xref
      target="FL-TLV-strut"></xref>).</t>

      <t>A PE that is willing to receive a flow label MUST include in its
      label mapping message a FL Sub-TLV with R = 1 (see <xref
      target="FL-TLV-strut"></xref>).</t>

      <t>A PE that receives a label mapping message containing a FL Sub-TLV
      with R = 0 MUST NOT include a flow label in the PW packet.</t>

      <t>Thus a PE sending a FL Sub-TLV with T = 1 and receiving a FL Sub-TLV
      with R = 1 MUST include a flow label in the PW packet. Under all other
      combinations of FL Sub-TLV signalling a PE MUST NOT include a flow label
      in the PW packet.</t>
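
      <t>The resulting decision can be expressed compactly. The following
      Python sketch is illustrative only; the function name and the use of
      None to represent an absent FL Sub-TLV are assumptions made for the
      example.</t>

      <figure>
        <artwork><![CDATA[
```python
def must_send_flow_label(local_t, peer_sub_tlv):
    """Decide whether this PE includes a flow label in PW packets.

    local_t      -- T bit this PE advertised in its own FL Sub-TLV,
                    or None if it sent no FL Sub-TLV
    peer_sub_tlv -- dict holding the peer's "T" and "R" bits, or None
                    if the peer's label mapping carried no FL Sub-TLV
    """
    if local_t is None or peer_sub_tlv is None:
        return False
    return local_t == 1 and peer_sub_tlv["R"] == 1
```
]]></artwork>
      </figure>

      <t>Because each direction is evaluated independently, a flow label may
      be in use in one direction of the PW and not in the other.</t>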

      <t>The signalling procedures in <xref target="RFC4447"></xref> state
      that "Processing of the interface parameters should continue when
      unknown interface parameters are encountered, and they MUST be silently
      ignored." The signalling procedure described here is therefore backwards
      compatible with existing implementations.</t>

      <t>Note that what is signalled is the desire to include the flow LSE in
      the label stack. The value of the flow label is a local matter for the
      ingress PE, and the label value itself is not signalled.</t>

      <section anchor="FL-TLV-strut" title="Structure of Flow Label Sub-TLV">
        <t>The structure of the Flow Label Sub-TLV is shown in <xref
        target="FLSubTLV"></xref>.</t>

        <figure anchor="FLSubTLV" title="Flow Label Sub-TLV">
          <preamble></preamble>

          <artwork><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FL=0x17       |    Length     |T|R|      Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>

          <postamble></postamble>
        </figure>

        <t>Where:</t>

        <t><list style="symbols">
            <t>FL (value 0x17) is the flow label sub-TLV identifier assigned
            by IANA (see <xref target="IANAsect"></xref>).</t>

            <t>Length is the length of the Sub-TLV in octets and is 4.</t>

            <t>When T=1 the PE is requesting the ability to send a PW packet
            that includes a flow label. When T=0, the PE is indicating that it
            will not send a PW packet containing a flow label.</t>

            <t>When R=1 the PE is able to receive a PW packet with a flow
            label present. When R=0 the PE is unable to receive a PW packet
            with the flow label present.</t>

            <t>Reserved bits MUST be zero on transmit and MUST be ignored on
            receive.</t>
          </list></t>
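
        <t>A minimal Python sketch of encoding and decoding this Sub-TLV is
        given below, assuming the field layout shown in the figure above
        (one octet of type, one octet of length, then T, R and 14 reserved
        bits). The function and constant names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

FL_SUB_TLV_TYPE = 0x17
FL_SUB_TLV_LENGTH = 4


def encode_fl_sub_tlv(t, r):
    """Encode the Flow Label Sub-TLV; reserved bits are sent as zero."""
    flags = (int(bool(t)) << 15) | (int(bool(r)) << 14)
    return struct.pack("!BBH", FL_SUB_TLV_TYPE, FL_SUB_TLV_LENGTH, flags)


def decode_fl_sub_tlv(data):
    """Decode the Flow Label Sub-TLV; reserved bits are ignored."""
    sub_type, length, flags = struct.unpack("!BBH", data[:4])
    if sub_type != FL_SUB_TLV_TYPE or length != FL_SUB_TLV_LENGTH:
        raise ValueError("not a Flow Label Sub-TLV")
    return {"T": (flags >> 15) & 1, "R": (flags >> 14) & 1}
```
]]></artwork>
        </figure>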
      </section>
    </section>

    <section title="Static Pseudowires">
      <t>If PWE3 signalling <xref target="RFC4447"></xref> is not in use for a
      PW, then whether the flow label is used MUST be identically provisioned
      in both PEs at the PW endpoints. If there is no provisioning support for
      this option, the default behaviour is not to include the flow label.</t>
    </section>

    <section title="Multi-Segment Pseudowires">
      <t>The flow label mechanism described in this document works on
      multi-segment PWs without requiring modification to the Switching PEs
      (S-PEs). This is because the flow LSE is transparent to the label swap
      operation, and because interface parameter Sub-TLV signalling is
      transitive.</t>
    </section>

    <section title="OAM">
      <t>The following OAM considerations apply to this method of load
      balancing.</t>

      <t>Where the OAM is only to be used to perform a basic test that the PWs
      have been configured at the PEs, <xref target="RFC5085">VCCV</xref>
      messages may be sent using any load-balanced PW path, i.e. using any
      value for the flow label.</t>

      <t>Where it is required to verify that a pseudowire is fully functional
      for all flows, a <xref target="RFC5085">VCCV</xref> connection
      verification message MUST be sent over each ECMP path to the pseudowire
      egress PE. This solution may be difficult to achieve and scales poorly.
      Under these circumstances, it may be sufficient to send VCCV messages
      using any load-balanced pseudowire path because if a failure occurs
      within the PSN the failure will normally be detected and repaired by the
      PSN. That is, the PSN's Interior Gateway Protocol (IGP) link/node
      failure detection mechanism (loss of light, bidirectional forwarding
      detection <xref target="RFC5880"></xref> or IGP hello detection), and
      the IGP convergence will naturally modify the ECMP set of network paths
      between the ingress and egress PEs. Hence the PW is only impacted
      during the normal IGP convergence time. Note that this period may be
      reduced if a fast re-route or fast convergence technology is deployed in
      the network <xref target="RFC4090"></xref>, <xref
      target="RFC5286"></xref>.</t>

      <t>If the failure is related to the individual corruption of a Label
      Forwarding Information Base (LFIB) entry in a router, then only the
      network path using that specific entry is impacted. If the PW is load
      balanced over multiple network paths, then this failure can only be
      detected if, by chance, the transported OAM flow is mapped onto the
      impacted network path, or if all paths are tested. Since testing all
      paths may present problems as noted above, other mechanisms to detect
      this type of error may need to be developed, such as an LSP self test
      technology.</t>

      <t>To troubleshoot the MPLS PSN, including multiple paths, the
      techniques described in <xref target="RFC4378"></xref> and <xref
      target="RFC4379"></xref> can be used.</t>

      <t>Where the PW OAM is carried out of band (VCCV Type 2) <xref
      target="RFC5085"></xref> it is necessary to insert an "MPLS Router Alert
      Label" in the label stack. The resultant label stack is as follows:</t>

      <t><figure anchor="OBVCCV" title="Use of Router Alert Label">
          <preamble></preamble>

          <artwork><![CDATA[  
   +-------------------------------+
   |                               |
   |      VCCV Message             |  n octets 
   |                               |
   +-------------------------------+
   |   Optional Control Word       |  4 octets   
   +-------------------------------+
   |      Flow label               |  4 octets 
   +-------------------------------+
   |      PW label                 |  4 octets  
   +-------------------------------+
   |      Router Alert label       |  4 octets 
   +-------------------------------+ 
   |      MPLS Tunnel label(s)     |  n*4 octets (four octets per label)
   +-------------------------------+ 

]]></artwork>

          <postamble></postamble>
        </figure></t>

      <t>Note that, depending on the number of labels hashed by the LSR, the
      inclusion of the Router Alert label may cause the OAM packet to be load
      balanced to a different path from that taken by the data packets with
      identical Flow and PW labels.</t>
    </section>

    <section title="Applicability of PWs using Flow Labels">
      <t>A node within the PSN is not able to perform deep-packet-inspection
      (DPI) of the PW as the PW technology is not self-describing: the
      structure of the PW payload is only known to the ingress and egress PE
      devices. The method proposed in this document provides a statistical
      mitigation of the problem of load balance in those cases where a PE is
      able to discern flows embedded in the traffic received on the attachment
      circuit.</t>

      <t>The methods described in this document are transparent to the PSN and
      as such do not require any new capability from the PSN.</t>

      <t>The requirement to load-balance over multiple PSN paths occurs when
      the ratio between the PW access speed and the PSN's core link
      bandwidth is large (e.g. >= 10%). ATM and FR are unlikely to have
      this property. Ethernet may have this property, and for that reason this
      document focuses on Ethernet. Applications for other
      high-access-bandwidth PWs (e.g. Fibre Channel) may be defined in
      the future.</t>

      <t>This design applies to MPLS PWs where it is meaningful to
      de-construct the packets presented to the ingress PE into flows. The
      mechanism described in this document promotes the distribution of flows
      within the PW over different network paths. This in turn means that
      whilst packets within a flow are delivered in order (subject to normal
      IP delivery perturbations due to topology variation), order is no longer
      maintained for all packets sent over the PW. It is not proposed to
      associate a different sequence number with each flow. If sequence number
      support is required the flow label mechanism MUST NOT be used.</t>

      <t>Where it is known that the traffic carried by the Ethernet PW is IP,
      the flows can be identified and mapped to an ECMP. Methods for doing so
      typically include hashing on the source and destination addresses, the
      protocol ID and higher-layer flow-dependent fields such as TCP/UDP
      ports, L2TPv3 Session IDs, etc.</t>

      <t>Where it is known that the traffic carried by the Ethernet PW is
      non-IP, techniques used for link bundling between Ethernet switches may
      be reused. In this case, however, the latency distribution would be
      larger than is found in the link bundle case. The acceptability of the
      increased latency is for further study. Of particular importance,
      Ethernet control frames SHOULD always be mapped to the same PSN path to
      ensure in-order delivery.</t>

      <section title="Equal Cost Multiple Paths">
        <t>ECMP in packet switched networks is statistical in nature. The
        mapping of flows to a particular path does not take into account the
        bandwidth of the flow being mapped or the current bandwidth usage of
        the members of the ECMP set. This simplification works well when the
        distribution of flows is evenly spread over the ECMP set and there are
        a large number of flows that have low bandwidth relative to the paths.
        The random allocation of a flow to a path provides a good
        approximation to an even spread of flows, provided that polarisation
        effects are avoided. The method defined in this document has the same
        statistical properties as an IP PSN.</t>

        <t>ECMP is a load-sharing mechanism that is based on sharing the load
        over a number of layer 3 paths through the PSN. Often, however, multiple
        links exist between a pair of LSRs that are considered by the IGP to
        be a single link. These are known as link bundles. The mechanism
        described in this document can also be used to distribute the flows
        within a PW over the members of the link bundle by using the flow
        label value to identify candidate flows. How that mapping takes place
        is outside the scope of this specification. Similar considerations
        apply to link aggregation groups.</t>

        <t>There is no mechanism currently defined to indicate the bandwidths
        in use by specific flows using the fields of the MPLS shim header.
        Furthermore, since the semantics of the MPLS shim header are fully
        defined in <xref target="RFC3032"></xref> and <xref
        target="RFC5462"></xref>, those fields cannot be assigned semantics to
        carry this information. This document does not define any semantics
        for use in the TTL or TC fields of the label entry that carries the
        flow label, but requires that the flow label itself be selected with a
        high degree of entropy, suggesting that the label value should not be
        overloaded with additional meaning in any subsequent
        specification.</t>

        <t>A different type of load balancing is the desire to carry a PW over
        a set of PSN links in which the bandwidth of members of the link set
        is less than the bandwidth of the PW. Proposals to address this
        problem have been made in the past <xref
        target="I-D.stein-pwe3-pwbonding"></xref>. Such a mechanism can be
        considered complementary to this mechanism.</t>
      </section>

      <section title="Link Aggregation Groups">
        <t>A Link Aggregation Group (LAG) is used to bond together several
        physical circuits between two adjacent nodes so they appear to
        higher-layer protocols as a single, higher bandwidth "virtual" pipe.
        These may co-exist in various parts of a given network. An advantage
        of LAGs is that they reduce the number of routing and signalling
        protocol adjacencies between devices, reducing control plane
        processing overhead. As with ECMP, the key problem related to LAGs is
        that due to inefficiencies in LAG load-distribution algorithms, a
        particular component of a LAG may experience congestion. The mechanism
        proposed here may be able to assist in producing a more uniform flow
        distribution.</t>

        <t>The same considerations requiring a flow to go over a single member
        of an ECMP set apply to a member of a LAG.</t>
      </section>

      <section title="Multiple RSVP-TE Paths">
        <t>In some networks it is desirable for a Label Edge Router (LER) to
        be able to load balance a PW across multiple RSVP-TE tunnels. The flow
        label mechanism described in this document may be used to provide the
        LER with the required flow information, and necessary entropy to
        provide this type of load balancing. An example of such a case is the
        use of the flow label mechanism in networks using a link bundle with
        the all ones component <xref target="RFC4201"></xref>.</t>

        <t>Methods by which the LER is configured to apply this type of ECMP
        are outside the scope of this document.</t>
      </section>

      <section title="The Single Large Flow Case">
        <t>Clearly the operator should make sure that the service offered
        using PW technology and the method described in this document does not
        exceed the maximum planned link capacity, unless it can be guaranteed
        that it conforms to the Internet traffic profile of a very large
        number of small flows.</t>

        <t>If the NSP cannot access sufficient information to distinguish
        flows, perhaps because the protocol stack requires parsing further
        into the packet than it is able to, then the functionality described
        in this document does not give any benefits. The most common case where a
        single flow dominates the traffic on a PW is when it is used to
        transport enterprise traffic. Enterprise traffic may well consist of a
        single, large TCP flow, or encrypted flows that cannot be handled by
        the methods described in this document.</t>

        <t>An operator has four options under these circumstances:</t>

        <t><list style="numbers">
            <t>The operator can choose to do nothing and the system will work
            as it does without the flow label.</t>

            <t>The operator can make the customer aware that the service
            offering has a restriction on flow bandwidth and police flows to
            that restriction. This would allow customers offering multiple
            flows to use a larger fraction of their access bandwidth, whilst
            preventing a single flow from consuming a fraction of internal
            link bandwidth that the operator considered excessive.</t>

            <t>The operator could configure the ingress PE to assign a
            constant flow label to all high bandwidth flows so that only one
            path was affected by these flows.</t>

            <t>The operator could configure the ingress PE to assign a random
            flow label to all high bandwidth flows so as to minimise the
            disruption to the network at the cost of out-of-order traffic to
            the user.</t>
          </list></t>
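
        <t>As an illustration only, options 3 and 4 above can be sketched as a
        small label selection policy. The function name, the policy names and
        the pinned label value 1000 are hypothetical and are not defined by
        this document. The sketch also assumes that the reserved label values
        0 to 15 are not used as flow labels, and that some other component has
        already decided whether a flow is high bandwidth.</t>

        <figure>
          <artwork><![CDATA[
```python
import random

FIRST_USABLE = 16          # label values 0-15 are reserved
LABEL_MAX = (1 << 20) - 1  # the label field is 20 bits wide


def flow_label(hashed, is_large, policy, rng, pinned=1000):
    """Pick a flow label under a hypothetical large-flow policy.

    hashed   -- label the classifier would normally assign
    is_large -- True if the flow was judged high bandwidth
    policy   -- "none", "constant" (option 3), "random" (option 4)
    pinned   -- arbitrary constant label used by option 3
    """
    if not is_large or policy == "none":
        return hashed
    if policy == "constant":
        # All large flows share one label, hence one path.
        return pinned
    if policy == "random":
        # A new label per packet spreads the flow over all
        # paths, so packets may arrive out of order.
        return rng.randint(FIRST_USABLE, LABEL_MAX)
    raise ValueError("unknown policy: " + policy)


rng = random.Random()
print(flow_label(5000, True, "constant", rng))
print(flow_label(5000, True, "random", rng))
```
]]></artwork>
        </figure>

        <t>In this sketch small flows always keep the label chosen by the
        classifier, so only the flows judged to be large are affected by the
        operator's choice of policy.</t>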

        <t>The issues described above are mitigated by the following two
        factors:</t>

        <t><list style="symbols">
            <t>Firstly, the customer of a high-bandwidth PW service has an
            incentive to get the best transport service because an inefficient
            use of the PSN leads to jitter and eventually to loss of the
            PW's payload.</t>

            <t>Secondly, the customer is usually able to tailor their
            applications to generate many flows in the PSN. A well-known
            example is massive data transport between servers which use many
            parallel TCP sessions. This same technique can be used by any
            transport protocol: multiple UDP ports, multiple L2TPv3 Session
            IDs, or multiple GRE keys may be used to decompose a large
            flow into smaller components. This approach may be applied to
            IPsec <xref target="RFC4301"></xref> where multiple Security
            Parameters Indexes (SPIs) may be allocated to the same security
            association.</t>
          </list></t>
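
        <t>The effect of decomposing one large transfer into several parallel
        sessions can be sketched as follows. This is a minimal illustration
        that assumes the ingress PE derives the flow label by hashing the IP
        five-tuple. The use of SHA-256, the function name and the example
        addresses and ports are assumptions made for the sketch and are not
        mandated by this document.</t>

        <figure>
          <artwork><![CDATA[
```python
import hashlib

FIRST_USABLE = 16              # label values 0-15 are reserved
LABEL_SPACE = (1 << 20) - 16   # usable 20-bit label values


def flow_label_for(src, dst, proto, sport, dport):
    """Map a five-tuple to a flow label (hypothetical hash)."""
    key = "|".join(map(str, (src, dst, proto, sport, dport)))
    digest = hashlib.sha256(key.encode()).digest()
    value = int.from_bytes(digest[:4], "big")
    return FIRST_USABLE + value % LABEL_SPACE


# One TCP session: every packet carries the same label.
single = {
    flow_label_for("192.0.2.1", "198.51.100.7", 6, 40000, 443)
    for _ in range(8)
}

# Eight parallel sessions that differ only in source port.
parallel = {
    flow_label_for("192.0.2.1", "198.51.100.7", 6, 40000 + i, 443)
    for i in range(8)
}

print(len(single), "label for the single session")
print(len(parallel), "distinct labels for the parallel sessions")
```
]]></artwork>
        </figure>

        <t>Each session still maps to exactly one label, so packet order
        within a session is preserved while the sessions themselves can be
        spread over different paths.</t>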
      </section>

      <section title="Applicability to MPLS-TP">
        <t>The MPLS Transport Profile (MPLS-TP) <xref target="RFC5654"></xref>
        requirement 44 states that "MPLS-TP MUST support mechanisms that
        ensure the integrity of the transported customer's service traffic as
        required by its associated SLA. Loss of integrity may be defined as
        packet corruption, reordering, or loss during normal network
        conditions." In addition MPLS-TP makes extensive use of the fate
        sharing between OAM and data packets, which is defeated by the flow
        LSE. The flow aware transport of a PW reorders packets and therefore
        MUST NOT be deployed in a network conforming to MPLS-TP unless the
        integrity requirements specified in the SLA can be satisfied.</t>
      </section>

      <section title="Asymmetric Operation">
        <t>The protocol defined in this document supports the asymmetric
        inclusion of the flow LSE. Asymmetric operation can be expected when
        there is asymmetry in the bandwidth requirements making it
        unprofitable for one PE to perform the flow classification, or when
        that PE is otherwise unable to perform the classification but is able
        to receive flow labeled packets from its peer. Asymmetric operation of
        the PW may also be required when one PE has a high transmission
        bandwidth requirement, but has a need to receive the entire PW on a
        single interface in order to perform a processing operation that
        requires the context of the complete PW (for example policing of the
        egress traffic).</t>
      </section>
    </section>

    <section anchor="MPLS" title="Applicability to MPLS LSPs">
      <t>An extension of this technique is to create a basis for hash
      diversity without having to peek below the label stack for IP traffic
      carried over LDP LSPs. The generalisation of this extension to MPLS has
      been described in <xref
      target="I-D.kompella-mpls-entropy-label"></xref>. This generalisation
      can be regarded as a complementary, but distinct, approach from the
      technique described in this document. While similar considerations may
      apply to the identification of flows and the allocation of flow label
      values, the flow labels are imposed by different network components, and
      the associated signalling mechanisms are different.</t>
    </section>

    <section anchor="SECCON" title="Security Considerations">
      <t>The PW generic security considerations described in <xref
      target="RFC3985"></xref> and the security considerations applicable to a
      specific PW type (for example, in the case of an Ethernet PW <xref
      target="RFC4448"></xref>) apply. The security considerations in <xref
      target="RFC5920"></xref> also apply.</t>

      <t>Section 1.2 describes considerations that apply to the TTL value used
      in the flow LSE. The use of a TTL value of one prevents the accidental
      forwarding of a packet based on the label value in the flow LSE.</t>
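
      <t>For illustration, a flow LSE carrying a TTL of one can be encoded as
      shown below. The sketch uses the standard 32-bit label stack entry
      layout (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit
      TTL). The function names and the example label value are hypothetical,
      and the sketch assumes that the flow LSE is the last entry in the label
      stack.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct


def encode_lse(label, tc=0, bottom_of_stack=True, ttl=1):
    """Pack one label stack entry into 4 network-order bytes."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("traffic class must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def decode_lse(data):
    """Unpack 4 bytes into (label, tc, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data)
    return (word >> 12, (word >> 9) & 0x7,
            bool((word >> 8) & 0x1), word & 0xFF)


# Hypothetical flow label 1000, sent with TTL = 1.
flow_lse = encode_lse(1000)
print(flow_lse.hex())
print(decode_lse(flow_lse))
```
]]></artwork>
      </figure>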
    </section>

    <section anchor="IANAsect" title="IANA Considerations">
      <t>IANA is requested to amend the PW Interface Parameters Sub-TLV type
      Registry value 0x17 (Flow Label indicator) to refer to this RFC.</t>

      <figure>
        <artwork><![CDATA[
Parameter  Length  Description
ID

0x17       4       Flow Label
]]></artwork>

        <postamble></postamble>
      </figure>
    </section>

    <section title="Congestion Considerations">
      <t>The congestion considerations applicable to PWs as described in <xref
      target="RFC3985"></xref> and any additional congestion considerations
      developed at the time of publication apply to this design.</t>

      <t>The ability to explicitly configure a PW to leverage the availability
      of multiple ECMPs is beneficial to capacity planning as, all other
      parameters being constant, the statistical multiplexing of a larger
      number of smaller flows is more efficient than with a smaller number of
      larger flows.</t>
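
      <t>The statistical multiplexing argument can be illustrated with a
      deliberately simple model, which is an assumption of this sketch and not
      part of the mechanism: a fixed mean load is split equally over n
      independent on/off flows, each active with probability p. The variance
      of the total load is then proportional to 1/n, so its coefficient of
      variation is sqrt((1 - p) / (p * n)).</t>

      <figure>
        <artwork><![CDATA[
```python
import math


def aggregate_cov(n_flows, p_on=0.5):
    """Coefficient of variation of the total offered load.

    Assumes n_flows independent on/off flows of equal peak
    rate, each active with probability p_on, sharing a fixed
    mean load (a hypothetical traffic model).
    """
    if n_flows < 1 or not 0 < p_on <= 1:
        raise ValueError("need n_flows >= 1 and 0 < p_on <= 1")
    return math.sqrt((1 - p_on) / (p_on * n_flows))


for n in (1, 10, 100, 1000):
    print(n, round(aggregate_cov(n), 3))
```
]]></artwork>
      </figure>

      <t>Under this model the relative variation of the load falls as the
      number of flows grows, which is why less headroom is needed per unit of
      mean load when a PW can be spread over many small flows.</t>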

      <t>Note that if the classification into flows is only performed on IP
      packets, the behaviour of those flows in the face of congestion will be
      as already defined by the IETF for packets of that type and no
      additional congestion processing is required.</t>

      <t>Where flows that are not IP are classified, PW congestion avoidance
      must be applied to each non-IP load balance group.</t>
    </section>

    <section title="Acknowledgements">
      <t>The authors wish to thank Mary Barns, Eric Grey, Kireeti Kompella,
      Joerg Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, Rolf
      Winter and Lucy Yong for valuable comments on this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.4447'?>

      <?rfc include='reference.RFC.4448'?>

      <?rfc include='reference.RFC.4928'?>

      <?rfc include='reference.RFC.4553'?>

      <?rfc include='reference.RFC.4385'?>

      <?rfc include='reference.RFC.4379'?>

      <?rfc include='reference.RFC.3032'?>

      <?rfc include='reference.RFC.5085'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.3985'?>

      <?rfc include='reference.RFC.4378'?>

      <?rfc include='reference.I-D.kompella-mpls-entropy-label'?>

      <?rfc include='reference.RFC.5286'?>

      <?rfc include='reference.I-D.stein-pwe3-pwbonding'?>

      <?rfc include='reference.RFC.5654'?>

      <?rfc include='reference.RFC.5880'?>

      <?rfc include='reference.RFC.4301'?>

      <?rfc include='reference.RFC.4201'?>

      <?rfc include='reference.RFC.4090'?>

      <?rfc include='reference.RFC.5462'?>

      <?rfc include='reference.RFC.5920'?>
    </references>
  </back>
</rfc>
