One document matched: draft-ietf-pwe3-fat-pw-03.xml


<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc toc="yes"?>
<rfc category="std" docName="draft-ietf-pwe3-fat-pw-03" ipr="trust200902">
  <front>
    <title abbrev="FAT-PW">Flow Aware Transport of Pseudowires over an MPLS
    PSN</title>

    <author fullname="Stewart Bryant" initials="S" role="editor"
            surname="Bryant">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>250 Longwater Ave</street>

          <city>Reading</city>

          <code>RG2 6GB</code>

          <country>United Kingdom</country>
        </postal>

        <phone>+44-208-824-8828</phone>

        <email>stbryant@cisco.com</email>
      </address>
    </author>

    <author fullname="Clarence Filsfils" initials="C" surname="Filsfils">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street></street>

          <city>Brussels</city>

          <country>Belgium</country>
        </postal>

        <email>cfilsfil@cisco.com</email>
      </address>
    </author>

    <author fullname="Ulrich Drafz" initials="U" surname="Drafz">
      <organization>Deutsche Telekom</organization>

      <address>
        <postal>
          <street></street>

          <city>Muenster</city>

          <country>Germany</country>
        </postal>

        <email>Ulrich.Drafz@t-com.net</email>
      </address>
    </author>

    <author fullname="Vach Kompella" initials="V" surname="Kompella">
      <organization>Alcatel-Lucent</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>Alcatel-Lucent vach.kompella@alcatel-lucent.com</email>
      </address>
    </author>

    <author fullname="Joe Regan" initials="J" surname="Regan">
      <organization>Alcatel-Lucent</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>joe.regan@alcatel-lucent.comRegan</email>
      </address>
    </author>

    <author fullname="Shane Amante " initials="S" surname="Amante">
      <organization>Level 3 Communications</organization>

      <address>
        <postal>
          <street></street>
        </postal>

        <email>shane@castlepoint.net</email>
      </address>
    </author>

    <date year="2010" />

    <area>Internet</area>

    <workgroup>PWE3</workgroup>

    <keyword></keyword>

    <keyword>pseudowire</keyword>

    <keyword>MPLS</keyword>

    <keyword>Internet-Draft</keyword>

    <abstract>
      <t>Where the payload carried over a pseudowire carries a number of
      identifiable flows it can in some circumstances be desirable to carry
      those flows over the equal cost multiple paths (ECMPs) that exist in the
      packet switched network. Most forwarding engines are able to hash based
      on label stacks and use this to balance flows over ECMPs. This draft
      describes a method of identifying the flows, or flow groups, to the
      label switched routers by including an additional label in the label
      stack.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>A pseudowire (PW) <xref target="RFC3985"></xref> is normally
      transported over one single network path, even if multiple Equal Cost
      Multiple Paths (ECMP) exit between the ingress and egress PW provider
      edge (PE) equipments<xref target="RFC4385"> </xref> <xref
      target="RFC4928"></xref>. This is required to preserve the
      characteristics of the emulated service (e.g. to avoid misordering SAToP
      pseudowire packets <xref target="RFC4553"></xref> or subjecting the
      packets to unusable inter-arrival times ). The use of a single path to
      preserve order remains the default mode of operation of a pseudowire
      (PW). The new capability proposed in this document is an OPTIONAL mode
      which may be used when the use of ECMP paths for is known to be
      beneficial (and not harmful) to the operation of the PW.</t>

      <t>Some pseudowires are used to transport large volumes of IP traffic
      between routers at two locations. One example of this is the use of an
      Ethernet pseudowire to create a virtual direct link between a pair of
      routers. Such pseudowire’s may carry from hundred’s of Mbps
      to Gbps of traffic. Such pseudowire’s do not require strict
      ordering to be preserved between packets of the pseudowire. They only
      require ordering to be preserved within the context of each individual
      transported IP flow. Some operators have requested the ability to
      explicitly configure such a pseudowire to leverage the availability of
      multiple ECMP paths. This allows for better capacity planning as the
      statistical multiplexing of a larger number of smaller flows is more
      efficient than with a smaller set of larger flows. Although Ethernet is
      used as an example above, the mechanisms described in this draft are
      general mechanisms that may be applied to any pseudowire type in which
      there are identifiable flows, and in which there is no requirement to
      preserve the order between those flows.</t>

      <t>Typically, forwarding hardware can deduce that an IP payload is being
      directly carried by an MPLS label stack, and is capable of looking at
      some fields in packets to construct hash buckets for conversations or
      flows. However, an intermediate node has no information on the type
      pseudowire being carried in the packet. This limits the forwarder at the
      intermediate node to only being able to make an ECMP choice based on a
      hash of the label stack. In the case of a pseudowire emulating a high
      bandwidth trunk, the granularity obtained by hashing the default label
      stack is inadequate for satisfactory load-balancing. The ingress node,
      however, is in the special position of being able to look at the
      un-encapsulated packet and spread flows amongst any available ECMP
      paths, or even any Loop-Free Alternates <xref target="RFC5286"></xref> .
      This draft proposes a method to introduce granularity on the hashing of
      traffic running over pseudowires by introducing an additional label,
      chosen by the ingress node, and placed at the bottom of the label
      stack.</t>

      <t>In addition to providing an indication of the flow structure for use
      in ECMP forwarding decisions, the mechanism described in the document
      may also be used to select flows for distribution over an 802.1ad link
      aggregation group that has been used in an MPLS network.</t>

      <section title="ECMP in Label Switched Routers">
        <t>Label switched routers commonly hash the label stack or some
        elements of the label stack as a method of discriminating between
        flows, in order to distribute those flows over the available equal
        cost multiple paths that exist in the network. Since the label at the
        bottom of stack is usually the label most closely associated with the
        flow, this normally provides the greatest entropy, and hence is
        usually included in the hash. This draft describes a method of adding
        an additional label at the bottom of stack in order to facilitate the
        load balancing of the flows within a pseudowire over the available
        ECMPs. A similar design for general MPLS use has also been proposed
        <xref target="I-D.kompella-mpls-entropy-label"></xref>, however that
        is outside the scope of this draft.</t>

        <t>An alternative method of load balancing by creating a number of
        pseudowires and distributing the flows amongst them was considered,
        but was rejected because:<list style="symbols">
            <t>It did not introduce as much entropy as the load balance label
            method.</t>

            <t>It required additional pseudowires to be set up and
            maintained.</t>
          </list></t>
      </section>

      <section title="Flow Label">
        <t>An additional label is interposed between the pseudowire label and
        the control word, or if the control word is not present, between the
        pseudowire label and the pseudowire payload. This additional label is
        called the flow label. Indivisible flows within the pseudowire MUST be
        mapped to the same flow label by the ingress PE. The flow label
        stimulates the correct ECMP load balancing behaviour in the packet
        switched network (PSN). On receipt of the pseudowire packet at the
        egress PE (which knows this additional label is present) the flow
        label is discarded without processing.</t>

        <t>Note that the flow label MUST NOT be an MPLS reserved label (values
        in the range 0..15) <xref target="RFC3032"></xref>, but is otherwise
        unconstrained by the protocol.</t>

        <t>Considerations of the TTL value are described in the Security
        section of this document. The flow label can never become the top
        label in normal operation, and hence the TTL in the flow label is
        never used to determine whether the packet should be discarded due to
        TTL expiry. Therefore there are no lower restrictions on the TTL
        value.</t>
      </section>
    </section>

    <section title="Native Service Processing Function">
      <t>The Native Service Processing (NSP) function <xref
      target="RFC3985"></xref> is a component of a PE that has knowledge of
      the structure of the emulated service and is able to take action on the
      service outside the scope of the pseudowire. In this case it is required
      that the NSP in the ingress PE identify flows, or groups of flows within
      the service, and indicate the flow (group) identity of each packet as it
      is passed to the pseudowire forwarder. As an example, where the PW type
      is an Ethernet, the NSP might parse the ingress Ethernet traffic and
      consider all of the IP traffic. This traffic could then be categorised
      into flows by considering all traffic with the same source and
      destination address pair to be a single indivisible flow. Since this is
      an NSP function, by definition, the method used to identify a flow is
      outside the scope of the pseudowire design. Similarly, since the NSP is
      internal to the PE, the method of flow indication to the pseudowire
      forwarder is outside the scope of this document.</t>
    </section>

    <section title="Pseudowire Forwarder">
      <t>The pseudowire forwarder must be provided with a method of mapping
      flows to load balanced paths.</t>

      <t>The forwarder must generate a label for the flow or group of flows.
      How the load balance label values are determined is outside the scope of
      this document, however the load balance label allocated to a flow MUST
      NOT be an MPLS reserved label and SHOULD remain constant for the life of
      the flow. It is recommended that the method chosen to generate the load
      balancing labels introduces a high degree of entropy in their values, to
      maximise the entropy presented to the ECMP path selection mechanism in
      the LSRs in the PSN, and hence distribute the flows as evenly as
      possible over the available PSN ECMP paths. The forwarder at the ingress
      PE prepends the pseudowire control word (if applicable), and then pushes
      the flow label, followed by the pseudowire label.</t>

      <t>The forwarder at the egress PE uses the pseudowire label to identify
      the pseudowire. From the context associated with the pseudowire label,
      the egress PE can determine whether a flow label is present. If a flow
      label is present, the label is discarded.</t>

      <t>All other pseudowire forwarding operations are unmodified by the
      inclusion of the flow label.</t>

      <section title="Encapsulation ">
        <t>The PWE3 Protocol Stack Reference Model modified to include flow
        label is shown in <xref target="PStack"></xref> below</t>

        <figure anchor="PStack" title="PWE3 Protocol Stack Reference Model">
          <artwork><![CDATA[
   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |         Emulated Service       | (including  |
   |  VLAN)      |<==============================>|  VLAN)      |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+            Pseudowire          +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |            PSN Tunnel          |    PSN      |
   |   MPLS      |<==============================>|   MPLS      |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

]]></artwork>

          <postamble></postamble>
        </figure>

        <t></t>

        <t>The encapsulation of a pseudowire with a flow label is shown in
        <xref target="Encap"></xref> below</t>

        <figure anchor="Encap"
                title="Encapsulation of a pseudowire with a pseudowire load balancing label">
          <artwork><![CDATA[

    +-------------------------------+
    |                               |
    |            Payload            |
    |                               |  n octets
    |                               |
    +-------------------------------+
    |   Optional Control Word       |  4 octets   
    +-------------------------------+
    |      Flow label               |  4 octets 
    +-------------------------------+
    |      PW label                 |  4 octets  
    +-------------------------------+
    |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
    +-------------------------------+
   
    

]]></artwork>

          <postamble></postamble>
        </figure>

        <t></t>
      </section>
    </section>

    <section title="Signaling the Presence of the Flow Label">
      <t>When using the signalling procedures in <xref
      target="RFC4447"></xref>, a Pseudowire Interface Parameter Flow Label
      Sub-TLV (FL Sub-TLV) type is used to synchronise the flow label states
      between the ingress and egress PEs. The presence of an FL Sub-TLV in the
      interface parameters indicates to the ingress PE that the egress PE can
      correctly process a flow label.</t>

      <t>A PE that wishes to use a flow label includes in its label mapping
      message a Flow Label Sub-TLV (FL Sub-TLV) with F = 1 (see <xref
      target="FL-TLV-strut"></xref>). A PE that can correctly process a flow
      label, and is willing to receive one, but does not wish to send a flow
      label, includes an FL Sub-TLV with F = 0.</t>

      <t>If a PE has sent an FL Sub-TLV with F = 1, and has received an FL
      Sub-TLV it MUST include a flow lablel in the label stack.</t>

      <t>If a PE has sent an FL Sub-TLV with F = 1 and does not receive an FL
      Sub-TLV it MUST send a new label mapping using an FL Sub-TLV with F =
      0.</t>

      <t>A PE that has sent an FL Sub-TLV with F = 0 MUST NOT include a flow
      lablel in the label stack.</t>

      <t>If a PE that previously did not received a label binding without a FL
      Sub-TLV receives a new a label mapping with one included, it MAY send a
      new label mapping including an FL Sub-TLV with F = 1.</t>

      <t>The signalling procedures in <xref target="RFC4447"></xref> state
      that "Processing of the interface parameters should continue when
      unknown interface parameters are encountered, and they MUST be silently
      ignored." The signalling proceedure described here is therefore
      backwards compatible with existing implementations.</t>

      <t>If PWE3 signalling <xref target="RFC4447"></xref> is not in use for a
      pseudowire, then whether the flow label is used MUST be identically
      provisioned in both PEs at the pseudowire endpoints. If there is no
      provisioning support for this option, the default behaviour is not to
      include the flow label.</t>

      <t>Note that what is signalled is the desire to include the flow label
      in the label stack. The value of the label is a local matter for the
      ingress PE, and the label value itself is not signalled.</t>

      <section anchor="FL-TLV-strut" title="Structure of Flow Label Sub-TLV">
        <t>The structure of the flow label TLV is shown in <xref
        target="FLSubTLV"></xref>.</t>

        <figure anchor="FLSubTLV" title="Flow Label Sub-TLV">
          <preamble></preamble>

          <artwork><![CDATA[ 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FL            |    Length     |F|        Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+]]></artwork>

          <postamble></postamble>
        </figure>

        <t></t>

        <t>Where:</t>

        <t><list style="symbols">
            <t>FL is the flow label sub-TLV identifier assigned by IANA.</t>

            <t>Length is the length of the TLV in octets and is 4.</t>

            <t>When F=1 a flow label MUST be pushed. When F=0 a flow label
            MUST NOT be pushed.</t>

            <t>Reserved bits MUST be zero on transmit and MUST be ignored on
            receive.</t>
          </list></t>
      </section>
    </section>

    <section title="Multi-Segment Pseudowires">
      <t>The flow label mechanism described in this document works on
      multi-segment PWs without requiring modification to the Switching PEs
      (S-PEs). This is because the flow label is transparent to the label swap
      operation, and because interface parameter Sub-TLV signalling is
      transitive.</t>
    </section>

    <section title="OAM">
      <t>The following OAM considerations apply to this method of load
      balancing.</t>

      <t>Where the OAM is only to be used to perform a basic test that the
      pseudowires have been configured at the PEs, <xref
      target="RFC5085">VCCV</xref> messages may be sent using any load balance
      pseudowire path, i.e. using any value for the flow label.</t>

      <t>Where it is required to verify that a pseudowire is fully functional
      for all flows<xref target="RFC5085">, VCCV</xref> connection
      verification message MUST be sent over each ECMP path to the pseudowire
      egress PE. This problem is difficult to solve and scales poorly. We
      believe that this problem is addressed by the following two methods:</t>

      <t><list counter="" style="numbers">
          <t>If a failure occurs within the PSN, this failure will normally be
          detected by the PSN's Interior Gateway protocol (IGP) link/node
          failure detection mechanism (loss of light, bidirectional forwarding
          detection <xref target="I-D.ietf-bfd-base"></xref> or IGP hello
          detection), and the IGP convergence will naturally modify the ECMP
          set of network paths between the Ingress and Egress PE's. Hence the
          PW is only impacted during the normal IGP convergence time.</t>

          <t>If the failure is related to the individual corruption of an
          Label Forwarding Information dataBase (LFIB) entry in a router, then
          only the network path using that specific entry is impacted. If the
          PW is load balanced over multiple network paths, then this failure
          can only be detected if, by chance, the transported OAM flow is
          mapped onto the impacted network path, or all paths are tested. This
          type of error may be better solved be solved by other means such as
          LSP self test <xref
          target="I-D.ietf-mpls-lsr-self-test"></xref>.</t>
        </list>To troubleshoot the MPLS PSN, including multiple paths, the
      techniques described in <xref target="RFC4378"></xref> and <xref
      target="RFC4379"></xref> can be used.</t>

      <t>Where the pseudowire OAM is carried out of band (VCCV Type 2) <xref
      target="RFC5085"></xref> it is necessary to insert an "MPLS Router Alert
      Label" in the label stack. The resultant label stack is a follows:</t>

      <t><figure anchor="OBVCCV" title="Use of Router Alert LAbel">
          <preamble></preamble>

          <artwork><![CDATA[  
    +-------------------------------+
    |                               |
    |            Payload            |
    |                               |  n octets
    |                               |
    +-------------------------------+
    |   Optional Control Word       |  4 octets   
    +-------------------------------+
    |      Flow label               |  4 octets 
    +-------------------------------+
    |      PW label                 |  4 octets  
    +-------------------------------+
    |      Router Alert label       |  4 octets 
    +-------------------------------+ 
    |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
    +-------------------------------+ 

]]></artwork>

          <postamble></postamble>
        </figure></t>
    </section>

    <section title="Applicability of FAT PWs">
      <t>A node within the PSN is not able to perform deep-packet-inspection
      (DPI) of the PW as the PW technology is not self-describing: the
      structure of the PW payload is only known to the ingress and egress PE
      devices. The method proposed in this document provides a statistical
      mitigation of the problem of load balance in those cased where a PE is
      able to discern flows embedded in the traffic received on the attachment
      circuit.</t>

      <t>The methods describe in this document are transparent to the PSN and
      as such do not require any new capability from the PSN.</t>

      <t>The requirement to load-balance over multiple PSN paths occurs when
      the ratio between the PW access speed and the PSN’s core link
      bandwidth is large (e.g. >= 10%). ATM and FR are unlikely to meet
      this property. Ethernet may have this property, and for that reason this
      document focuses on Ethernet. Applications for other
      high-access-bandwidth PW’s (e.g. Fibre Channel) may be defined in
      the future.</t>

      <t>This design applies to MPLS pseudowires where it is meaningful to
      de-construct the packets presented to the ingress PE into flows. The
      mechanism described in this document promotes the distribution of flows
      within the pseudowire over different network paths. This in turn means
      that whilst packets within a flow are delivered in order (subject to
      normal IP delivery perturbations due to topology variation), order is
      not maintained amongst packets of different flows. It is not proposed to
      associate a different sequence number with each flow. If sequence number
      support is required this mechanism is not applicable.</t>

      <t>Where it is known that the traffic carried by the Ethernet pseudowire
      is IP the method of identifying the flows are well known and can be
      applied. Such methods typically include hashing on the source and
      destination addresses, the protocol ID and higher-layer flow-dependent
      fields such as TCP/UDP ports, L2TPv3 Session ID’s etc.</t>

      <t>Where it is known that the traffic carried by the Ethernet pseudowire
      is non-IP, techniques used for link bundling between Ethernet switches
      may be reused. In this case however the latency distribution would be
      larger than is found in the link bundle case. The acceptability of the
      increased latency is for further study. Of particular importance the
      Ethernet control frames SHOULD always be mapped to the same PSN path to
      ensure in-order delivery.</t>

      <section title="Equal Cost Multiple Paths">
        <t>ECMP in packet switched networks is statistical in nature. The
        mapping of flows to a particular path does not take into account the
        bandwidth of the flow being mapped or the current bandwidth usage of
        the members of the ECMP set. This simplification works well when the
        distribution of flows is evenly spread over the ECMP set and there are
        a large number of flows that have low bandwidth relative to the paths.
        The random allocation of a flow to a path provides a good
        approximation to an even spread of flows, provided that polarisation
        effects are avoided. The method proposed in this document has the same
        statistical properties as an IP PSN.</t>

        <t>ECMP is a load-sharing mechanism that is based on sharing the load
        over a number of layer 3 paths through the PSN. Often however multiple
        links exist between a pair of LSRs that are considered by the IGP to
        be a single link. These are known as link bundles. The mechanism
        described in this document can also be used to distribute the flows
        within a pseudowire over the members of the link bundle by using the
        flow label value to identify candidate flows. How that mapping takes
        place is outside the scope of this specification. Similar
        considerations apply to link aggregation groups.</t>

        <t>In the ECMP case and the link bundling case the NSP may attempt to
        take bandwidth into consideration when allocating groups of flows to a
        common path. That is permitted, but it must be borne in mind that the
        semantics of a label stack entry (LSE) as defined by <xref
        target="RFC3032"></xref> cannot be modified, the value of the flow
        label cannot be modified at any point on the LSP, and the
        interpretation of bit patterns in, or values of, the flow label by an
        LSR are undefined.</t>

        <t>A different type of load balancing is the desire to carry a
        pseudowire over a set of PSN links in which the bandwidth of members
        of the link set is less than the bandwidth of the pseudowire. This
        problem is addressed in <xref
        target="I-D.stein-pwe3-pwbonding"></xref>. Such a mechanism can be
        considered complementary to this mechanism.</t>
      </section>

      <section title="Link Aggregation Groups">
        <t>A Link Aggregation Group (LAG) is used to bond together several
        physical circuits between two adjacent nodes so they appear to
        higher-layer protocols as a single, higher bandwidth "virtual" pipe.
        These may co-exist in various parts of a given network. An advantage
        of LAGs is that they reduce the number of routing and signalling
        protocol adjacencies between devices, reducing control plane
        processing overhead. As with ECMP, the key problem related to LAGs is
        that due to inefficiencies in LAG load-distribution algorithms, a
        particular component of a LAG may experience congestion. The mechanism
        proposed here may be able to assist in producing a more uniform flow
        distribution.</t>

        <t>The same considerations requiring a flow to go over a single member
        of an ECMP path set apply to a member of a LAG.</t>
      </section>

      <section title="The Single Large Flow Case">
        <t>Clearly the operator should make sure that the service offered
        using PW technology and the method described in this document does not
        exceed the maximum planned link capacity, unless it can be guaranteed
        that it conforms to the Internet traffic profile of a very large
        number of small flows.</t>

        <t>If the payload on a PW is made of a single inner flow (i.e. an
        encrypted connection between two routers), or the flow identifiers are
        too deeply buried in the packet, then the functionality described in
        this document does not give any benefits, though neither does it cause
        harm relative to the existing situation. The most common case where a
        single flow dominated the traffic on a PW is when it is used to
        transport enterprise traffic. Enterprise traffic may well consist of a
        large single TCP flows, or encrypted flows that cannot be handled by
        the methods described in this document.</t>

        <t>An operator has six options under these circumstances:</t>

        <t><list style="numbers">
            <t>The operator can do nothing and the system will work as it does
            without the flow label.</t>

            <t>The operator can make the customer aware that the service
            offering has a restriction on flow bandwidth and police flows to
            that restriction. This would allow customers offering multiple
            flows to use a larger fraction their access bandwidth, whilst
            preventing an single flow from consuming a fraction of internal
            link bandwidth that the operator considered excessive.</t>

            <t>The operator could configure the ingress PE to assign a
            constant flow label to all high bandwidth flows so that only one
            path was affected by these flows,</t>

            <t>The operator could configure the ingress PE to assign a random
            flow label to all high bandwidth flows so as to minimise the
            disruption to the network as a cost of out of order traffic to the
            user.</t>

            <t>The operator could configure the ingress to assign a label of
            special significance (such as a reserved label) to all high
            bandwidth flows so that some other action (not specified in this
            document) could be taken on the flow.</t>
          </list></t>

        <t>The issues described above are mitigated by the following two
        factors:</t>

        <t><list style="symbols">
            <t>Firstly, the customer of a high-bandwidth PW service has an
            incentive to get the best transport service because an inefficient
            use of the PSN leads to jitter and eventually to loss to the
            PW’s payload.</t>

            <t>Secondly, the customer is usually able to tailor their
            applications to generate many flows in the PSN. A well-known
            example is massive data transport between servers which use many
            parallel TCP sessions. This same technique can be used by any
            transport protocol: multiple UDP ports, multiple L2TPv3 Session
            ID’s, multiple GRE keys may be used to decompose a large
            flow into smaller components. This approach may be applied to
            IPsec <xref target="RFC4301"></xref> where multiple Security
            Parameters Indexes (SPI’s) may be allocated to the same
            security association.</t>
          </list></t>
      </section>

      <section title="MPLS-TP">
        <t>The MPLS Transport Profile (MPLS-TP) <xref target="RFC5654"></xref>
        requirement 44 states that "MPLS-TP SHOULD support mechanisms to
        enable the reserved bandwidth of a transport path to be decreased
        without impacting the existing traffic on that transport path,
        provided that the level of existing traffic is smaller than the
        reserved bandwidth following the decrease." The flow aware transport
        of a PW reorders packets (albeit in an application friendly way),
        therefore SHOULD NOT be deployed in a network conforming to the
        MPLS-TP.</t>
      </section>
    </section>

    <section title="Applicability to MPLS">
      <t>A further application of this technique would be to create a basis
      for hash diversity without having to peek below the label stack for IP
      traffic carried over LDP LSPs. Work on the generalisation of this to
      MPLS has been described in <xref
      target="I-D.kompella-mpls-entropy-label"></xref>. This is can be
      regarded as a complementary, but distinct, approach since although
      similar consideration may apply to the identification of flows and the
      allocation of flow label values, the flow labels are imposed by
      different network components, and the associated signalling mechanisms
      are different.</t>
    </section>

    <section anchor="SECCON" title="Security Considerations">
      <t>The pseudowire generic security considerations described in <xref
      target="RFC3985"></xref> and the security considerations applicable to a
      specific pseudowire type (for example, in the case of an Ethernet
      pseudowire <xref target="RFC4448"></xref> apply.</t>

      <t>The ingress PE SHOULD take steps to ensure that the load-balance
      label is not used as a covert channel.</t>

      <t>It is useful to give consideration to the choice of TTL value in the
      flow label stack entry <xref target="RFC3032"></xref>. The flow label is
      at the bottom of label stack. Therefore, even when penultimate hop
      popping is employed, it will always be will preceded by the PW label on
      arrival at the PE. The flow label TTL should therefore never be
      considered by the forwarder, and hence SHOULD be set to a value of 1.
      This will prevent the packet being inadvertently forwarded based on the
      value of the flow label. Note that this may be a departure from
      considerations that apply to the general MPLS case.</t>
    </section>

    <section title="IANA Considerations">
      <t>IANA is requested to allocate the next available values from the IETF
      Consensus range in the Pseudowire Interface Parameters Sub-TLV type
      Registry as a Flow Label indicator.</t>

      <figure>
        <artwork><![CDATA[
Parameter  Length       Description
ID

TBD         4           Flow Label
]]></artwork>

        <postamble></postamble>
      </figure>
    </section>

    <section title="Congestion Considerations">
      <t>The congestion considerations applicable to pseudowires as described
      in <xref target="RFC3985"></xref> and any additional congestion
      considerations developed at the time of publication apply to this
      design.</t>

      <t>The ability to explicitly configure a PW to leverage the availability
      of multiple ECMP paths is beneficial to capacity planning as, all other
      parameters being constant, the statistical multiplexing of a larger
      number of smaller flows is more efficient than with a smaller number of
      larger flows.</t>

      <t>Note that if the classification into flows is only performed on IP
      packets the behaviour of those flows in the face of congestion will be
      as already defined by the IETF for packets of that type and no
      additional congestion processing is required.</t>

      <t>Where flows that are not IP are classified pseudowire congestion
      avoidance must be applied to each non-IP load balance group.</t>
    </section>

    <section title="Acknowledgements">
      <t>The authors wish to thank Eric Grey, Kireeti Kompella, Joerg
      Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, and Lucy Yong
      for valuable comments on this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.4447'?>

      <?rfc include='reference.RFC.4448'?>

      <?rfc include='reference.RFC.4928'?>

      <?rfc include='reference.RFC.4553'?>

      <?rfc include='reference.RFC.4385'?>

      <?rfc include='reference.RFC.4379'?>

      <?rfc include='reference.RFC.3032'?>

      <?rfc include='reference.RFC.5085'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.3985'?>

      <?rfc include='reference.RFC.4378'?>

      <?rfc include='reference.I-D.ietf-mpls-lsr-self-test'?>

      <?rfc include='reference.I-D.kompella-mpls-entropy-label'?>

      <?rfc include='reference.RFC.5286'?>

      <?rfc include='reference.I-D.stein-pwe3-pwbonding'?>

      <?rfc include='reference.RFC.5654'?>

      <?rfc include='reference.I-D.ietf-bfd-base'?>

      <?rfc include='reference.RFC.4301'?>
    </references>
  </back>
</rfc>

PAFTECH AB 2003-20262026-04-21 22:30:23