<?xml version="1.0"?>
<?rfc toc="yes"?><?rfc tocompact="yes"?><?rfc tocdepth="3"?><?rfc tocindent="yes"?><?rfc symrefs="yes"?><?rfc sortrefs="no"?><?rfc comments="yes"?><?rfc inline="yes"?><?rfc compact="yes"?><?rfc subcompact="no"?><rfc docName="draft-ietf-l3vpn-mvpn-considerations-05" ipr="trust200811">
  <front>
    <title abbrev="Multicast VPN mandatory features">Mandatory Features in a
    Layer 3 Multicast BGP/MPLS VPN Solution</title>

    <author fullname="Thomas Morin" initials="T." role="editor" surname="Morin">
      <organization abbrev="France Telecom Orange">France Telecom - Orange
      Labs</organization>

      <address>
        <postal>
          <street>2 rue Pierre Marzin</street>

          <city>Lannion</city>

          <code>22307</code>

          <country>France</country>
        </postal>

        <email>thomas.morin@orange-ftgroup.com</email>
      </address>
    </author>

    <author fullname="Ben Niven-Jenkins" initials="B.P." role="editor" surname="Niven-Jenkins">
      <organization>BT</organization>

      <address>
        <postal>
          <street>208 Callisto House, Adastral Park</street>

          <city>Ipswich</city>

          <region>Suffolk</region>

          <code>IP5 3RE</code>

          <country>UK</country>
        </postal>

        <email>benjamin.niven-jenkins@bt.com</email>
      </address>
    </author>

    <author fullname="Yuji Kamite" initials="Y." surname="Kamite">
      <organization abbrev="NTT Communications">NTT Communications
      Corporation</organization>

      <address>
        <postal>
          <street>Tokyo Opera City Tower</street>

          <street>3-20-2 Nishi Shinjuku, Shinjuku-ku</street>

          <region>Tokyo</region>

          <code>163-1421</code>

          <country>Japan</country>
        </postal>

        <email>y.kamite@ntt.com</email>
      </address>
    </author>

    <author fullname="Raymond Zhang" initials="R." surname="Zhang">
      <organization>BT</organization>

      <address>
        <postal>
          <street>2160 E. Grand Ave.</street>

          <city>El Segundo</city>

          <code>CA 90025</code>

          <country>USA</country>
        </postal>

        <email>raymond.zhang@bt.com</email>
      </address>
    </author>

    <author fullname="Nicolai Leymann" initials="N." surname="Leymann">
      <organization>Deutsche Telekom</organization>

      <address>
        <postal>
          <street>Goslarer Ufer 35</street>

          <city>10589 Berlin</city>

          <country>Germany</country>
        </postal>

        <email>n.leymann@telekom.de</email>
      </address>
    </author>

    <author fullname="Nabil Bitar" initials="N" surname="Bitar">
      <organization>Verizon</organization>

      <address>
        <postal>
          <street>40 Sylvan Road</street>

          <city>Waltham</city>

          <region>MA</region>

          <code>02451</code>

          <country>USA</country>
        </postal>

        <email>nabil.n.bitar@verizon.com</email>
      </address>
    </author>

    <date year="2009"/>

    <abstract>
      <t>More than one set of mechanisms to support multicast in a layer 3
      BGP/MPLS VPN has been defined. These are presented in the documents that
      define them as optional building blocks.</t>

      <t>To enable interoperability between implementations, this document
      defines a subset of features that is considered mandatory for a
      multicast BGP/MPLS VPN implementation. This will help implementers and
      deployers understand which L3VPN multicast requirements are best
      satisfied by each option.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref target="RFC2119"/>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Specifications for <xref target="I-D.ietf-l3vpn-2547bis-mcast">multicast in BGP/MPLS</xref>
      include multiple alternative mechanisms for some of the required
      building blocks of the solution. However, they do not identify which of
      these mechanisms are mandatory to implement in order to ensure
      interoperability. Without a defined set of mandatory-to-implement
      mechanisms, implementations may support different, non-interoperating
      subsets of the available optional mechanisms, which is a problem for
      the many operators running multi-vendor backbones.</t>

      <t>The aim of this document is to leverage the already expressed <xref target="RFC4834">requirements</xref> and study the properties of each
      approach, to identify mechanisms that are good candidates for being part
      of a core set of mandatory mechanisms which can be used to provide a
      base for interoperable solutions.</t>

      <t>This document goes through the different building blocks of the
      solution and concludes on which mechanisms an implementation is required
      to implement. <xref target="summary"/> summarizes these
      requirements.</t>

      <t>Considering the history of the multicast VPN proposals and
      implementations, it is also useful to discuss how existing deployments
      of early implementations <xref target="I-D.rosen-vpn-mcast"/> <xref target="I-D.raggarwa-l3vpn-2547-mvpn"/> can be accommodated, and
      provide suggestions in this respect.</t>
    </section>

    <section title="Terminology">
      <t>Please refer to <xref target="I-D.ietf-l3vpn-2547bis-mcast"/>
      and <xref target="RFC4834"/>.</t>
    </section>

    <section title="Examining alternative mechanisms for MVPN functions">
      <section anchor="autodiscovery" title="MVPN auto-discovery">
        <t><xref target="I-D.ietf-l3vpn-2547bis-mcast">The current solution
        document</xref> proposes two different mechanisms for MVPN
        auto-discovery:<list style="numbers">
            <t>BGP-based auto-discovery</t>

            <t>"PIM/shared P-tunnel": discovery done through the exchange of
            PIM Hellos by C-PIM instances, across an MI-PMSI implemented with
            one shared P-tunnel per VPN (using multicast ASM, or MP2MP
            LDP)</t>
          </list>Both solutions address <xref target="RFC4834">Section 5.2.10
        of</xref>, which states that "the operation of a multicast VPN solution
        SHALL be as light as possible and providing automatic configuration
        and discovery SHOULD be a priority when designing a multicast VPN
        solution. Particularly the operational burden of setting up multicast
        on a PE or for a VR/VRF SHOULD be as low as possible".</t>

        <t>The key consideration is that PIM-based discovery is only
        applicable to deployments using a shared P-tunnel to instantiate an
        MI-PMSI. It is not applicable if only P2P, PIM-SSM, or P2MP
        mLDP/RSVP-TE P-tunnels are used because, unlike ASM and MP2MP
        P-tunnels, these types of P-tunnels cannot be built before
        auto-discovery has completed. By contrast, BGP-based auto-discovery
        places no constraint on the type of P-tunnel used. This independence
        satisfies the requirement in <xref target="RFC4834">section
        5.2.4.1 of</xref> that "a multicast VPN solution SHOULD be designed so
        that control and forwarding planes are not interdependent".</t>

        <t>Additionally, a number of service providers have chosen to use
        SSM-based P-tunnels for the default MDTs within their current
        deployments, and therefore already rely on some form of BGP-based
        auto-discovery.</t>

        <t>Moreover, when shared P-tunnels are used, the use of BGP
        auto-discovery would allow inconsistencies in the
        addresses/identifiers used for the shared P-tunnel to be detected
        (e.g. the same shared P-tunnel identifier being used for different
        VPNs with distinct BGP route targets). This is particularly attractive
        in the context of inter-AS VPNs where the impact of any
        misconfiguration could be magnified and where a single service
        provider may not operate all the ASes. Note that this technique for
        detecting some misconfiguration cases may not be usable during a
        transition period from shared-P-tunnel autodiscovery to BGP-based
        autodiscovery.</t>

        <t>Thus, the recommendation is that implementation of BGP-based
        auto-discovery be mandated, so that it is supported by all MVPN
        implementations.</t>
      </section>

      <section title="S-PMSI Signaling">
        <t><xref target="I-D.ietf-l3vpn-2547bis-mcast">The current solution
        document</xref> proposes two mechanisms for signaling that multicast
        flows will be switched to an S-PMSI:</t>

        <t><list style="numbers">
            <t>a UDP-based TLV protocol specifically for S-PMSI signaling
            (described in section 7.4.2).</t>

            <t>a BGP-based mechanism for S-PMSI signaling (described in
            section 7.4.1).</t>
          </list></t>

        <t><xref target="RFC4834">Section 5.2.10 of</xref> states that "as far
        as possible, the design of a solution SHOULD carefully consider the
        number of protocols within the core network: if any additional
        protocols are introduced compared with the unicast VPN service, the
        balance between their advantage and operational burden SHOULD be
        examined thoroughly". The UDP-based mechanism would be an additional
        protocol in the MVPN stack, which isn't the case for the BGP-based
        S-PMSI switching signaling, since (a) BGP is identified as a
        requirement for autodiscovery, and (b) the BGP-based S-PMSI switching
        signaling procedures are very similar to the autodiscovery
        procedures.</t>

        <t>Furthermore, the UDP-based S-PMSI switching signaling mechanism
        requires an MI-PMSI, while the BGP-based protocol does not. In
        practice, this means that with the UDP-based protocol a PE has to
        join all P-tunnels of all PEs in an MVPN, whereas with BGP-based
        S-PMSI switching signaling it can delay joining a P-tunnel rooted at
        a PE until traffic from that PE is needed, thus reducing the amount
        of state maintained on P routers.</t>

        <t>S-PMSI switching signaling approaches can also be compared in an
        inter-AS context (see <xref target="interas"/>). The proposed
        BGP-based approach for S-PMSI switching signaling provides a good fit
        with both the segmented and non-segmented inter-AS approaches. By
        contrast, while the UDP-based approach for S-PMSI switching signaling
        appears to be usable with segmented inter-AS tunnels, key advantages
        of the segmented approach are lost in that case:</t>

        <t><list style="symbols">
            <t>ASes can no longer independently choose when S-PMSI tunnels
            are triggered in their AS (and thus control the amount of state
            created on their P routers),</t>

            <t>ASes can no longer independently choose the tunneling
            technique for the P-tunnels used for an S-PMSI,</t>

            <t>in an inter-AS option B context, ASes are isolated from one
            another, since PEs in one AS do not exchange routing information
            directly with PEs of other ASes. This property is not preserved
            if UDP-based S-PMSI switching signaling is used. By contrast,
            BGP-based S-PMSI switching signaling does preserve this
            property.</t>
          </list></t>

        <t>Given all the above, it is the recommendation of the authors that
        BGP is the preferred solution for S-PMSI switching signaling and
        should be supported by all implementations.</t>

        <t>It is identified that, if nothing prevents S-PMSIs from being
        created at a fast pace, S-PMSI switching signaling with BGP could
        impact the Route Reflectors used for MVPN routes. However, it is also
        identified that such fast-paced behavior would impact P and PE
        routers through the signaling of S-PMSI tunnels, that this impact is
        the same regardless of the S-PMSI signaling approach used, and that
        it is certainly best avoided by setting up proper mechanisms.</t>

        <t>The UDP-based S-PMSI switching signaling protocol can also be
        considered, as an option, given that this protocol has been in
        deployment for some time. Implementations supporting both protocols
        would be expected to provide a per-VRF configuration knob to allow an
        implementation to use the UDP-based TLV protocol for S-PMSI switching
        signaling for specific VRFs in order to support the coexistence of
        both protocols (for example during migration scenarios). Apart from
        such migration-facilitating mechanisms, the authors specifically do
        not recommend extending the already proposed UDP-based TLV protocol to
        new types of P-tunnels.</t>
      </section>

      <section title="PE-PE Exchange of C-Multicast Routing">
        <t><xref target="I-D.ietf-l3vpn-2547bis-mcast">The current solution
        document</xref> proposes multiple mechanisms for PE-PE exchange of
        customer multicast routing information (C-multicast routing):<list style="numbers">
            <t>Full per-MVPN PIM peering across an MI-PMSI (described in
            section 3.4.1.1).</t>

            <t>Lightweight PIM peering across an MI-PMSI (described in section
            3.4.1.2).</t>

            <t>The unicasting of PIM C-Join/Prune messages (described in
            section 3.4.1.3).</t>

            <t>The use of BGP for carrying C-Multicast routing (described in
            section 3.4.2).</t>
          </list></t>

        <section anchor="pepe-scaling" title="PE-PE C-multicast routing scalability">
          <t>Scalability being one of the core requirements for multicast VPN,
          it is useful to compare the proposed C-multicast routing mechanisms
          from this perspective: <xref target="RFC4834">Section 4.2.4
          of</xref> recommends that "a multicast VPN solution SHOULD support
          several hundreds of PEs per multicast VPN, and MAY usefully scale up
          to thousands" and section 4.2.5 states that "a solution SHOULD scale
          up to thousands of PEs having multicast service enabled".</t>

          <t>Scalability with an increased number of VPNs per PE, or with an
          increased amount of multicast state per VPN, is also important, but
          is not the focus of this section since no differences between the
          approaches were identified in these respects: all other things
          being equal, the load on a PE due to C-multicast routing increases
          roughly linearly with the number of VPNs per PE and with the amount
          of multicast state per VPN.</t>

          <t>This section presents conclusions related to PE-PE C-multicast
          routing scalability. <xref target="PEPE-mrouting-load"/>
          explains in more detail how the PIM-based approaches and the
          BGP-based approach differ in handling the C-multicast routing load,
          and provides a quantified evaluation of the amount of state and
          messages with each approach. Many points made in this section are
          detailed in <xref target="pepe-scaling-analysis"/>.</t>

          <t>At high scales of multicast deployment, the first and third
          mechanisms require the PEs to maintain a large number of PIM
          adjacencies with other PEs of the same multicast VPN (which implies
          the regular exchange of PIM Hellos with each other) and to
          periodically refresh C-Join/Prune states. This results in a
          processing cost that increases with the number of PEs (as detailed
          in <xref target="pepe-scaling-analysis"/>). The second approach is
          less subject to this cost, and the fourth approach is not subject
          to it.</t>
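
          <t>As a rough illustration of the scaling concern above, the
          following sketch counts the PIM adjacencies that one PE maintains,
          and the Hellos it has to process, under full per-MVPN PIM peering.
          It is not part of any specification: the function names and the
          example topology are hypothetical, and the 30-second Hello period
          is the PIM-SM default, assumed here for illustration.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch only: names and figures are assumptions.
HELLO_PERIOD_S = 30  # assumed PIM-SM default Hello period


def pim_adjacencies(pes_per_vpn):
    """Adjacencies held by one PE attached to the listed MVPNs.

    pes_per_vpn: number of PEs in each MVPN the PE belongs to.
    With full per-MVPN PIM peering across an MI-PMSI, the PE
    has one adjacency with every other PE of each MVPN.
    """
    return sum(n - 1 for n in pes_per_vpn)


def hellos_received_per_second(pes_per_vpn):
    """Average rate of PIM Hellos this PE has to process."""
    return pim_adjacencies(pes_per_vpn) / HELLO_PERIOD_S


# Hypothetical PE attached to 100 MVPNs of 50 PEs each.
example = [50] * 100
print(pim_adjacencies(example))  # 4900
print(hellos_received_per_second(example))
```
]]></artwork>
          </figure>

          <t>In this simplified model, both quantities grow linearly with
          the number of PEs per MVPN.</t>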

          <t>The third mechanism would reduce the amount of C-Join/Prune
          processing for a given multicast flow for PEs that are not the
          upstream neighbor for this flow, but would require "explicit
          tracking" state to be maintained by the upstream PE. It also isn't
          compatible with the "Join suppression" mechanism. A possible way to
          reduce the amount of signaling with this approach would be the use
          of a PIM refresh-reduction mechanism. Such a mechanism, based on
          TCP, is being specified by the PIM IETF Working Group (<xref target="I-D.ietf-pim-port"/>); its use in a multicast VPN
          context has not been described in <xref target="I-D.ietf-l3vpn-2547bis-mcast"/>, but it is expected
          that this approach would provide scalability similar to that of the
          BGP-based approach without RR.</t>

          <t>The second mechanism would operate in a similar manner to full
          per-MVPN PIM peering except that PIM Hello messages are not
          transmitted and PIM C-Join/Prune refresh-reduction would be used,
          thereby improving scalability, but this approach has yet to be fully
          described. In any case, it appears to address only one of the
          factors that affect scalability as the number of PEs
          increases.</t>

          <t>The first and second mechanisms can leverage the "Join
          suppression" behavior and thus reduce the processing burden of an
          upstream PE, sparing it the processing of a Join refresh message
          for each remote PE joined to a multicast stream. This improvement
          requires all PEs of a multicast VPN to process all PIM Join and
          Prune messages sent by any other PE participating in the same
          multicast VPN, whether they are the upstream PE or not.</t>

          <t>The fourth mechanism (the use of BGP for carrying C-Multicast
          routing) would have a comparable drawback of requiring all PEs to
          process a BGP C-multicast route that is of interest only to a
          specific upstream PE. For this reason <xref target="I-D.ietf-l3vpn-2547bis-mcast-bgp">section 16</xref>
          recommends the use of the <xref target="RFC4684">Route-Target
          constrained BGP distribution</xref> mechanisms, which eliminate this
          drawback by ensuring that only the interested upstream PE receives
          a BGP C-multicast route. Specifically, when Route-Target constrained
          BGP distribution is used, the fourth mechanism reduces the total
          amount of C-multicast routing processing load put on the PEs by
          avoiding any processing of customer multicast routing information on
          the "unrelated" PEs, which are neither the joining PE nor the
          upstream PE.</t>
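
          <t>The comparison made in the preceding paragraphs can be
          summarized with a small sketch that counts how many PEs, other
          than the joining PE, process a single C-multicast Join. This is a
          deliberately simplified model of the text above, not a description
          of any implementation: the mechanism labels are hypothetical, and
          processing on route reflectors is not counted.</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch only: mechanism labels are hypothetical.
def pes_processing_join(n_pes, mechanism):
    """PEs, other than the joining PE, that process one
    C-multicast Join in an MVPN of n_pes PEs.

    Route reflectors, if any, are not counted.
    """
    if n_pes < 2:
        raise ValueError("an MVPN needs at least two PEs")
    if mechanism in ("pim-mi-pmsi", "bgp"):
        # The Join is seen by every other PE of the MVPN.
        return n_pes - 1
    if mechanism in ("pim-unicast", "bgp-rt-constrain"):
        # The Join reaches the upstream PE only.
        return 1
    raise ValueError("unknown mechanism: " + mechanism)


for name in ("pim-mi-pmsi", "bgp", "bgp-rt-constrain"):
    print(name, pes_processing_join(1000, name))
```
]]></artwork>
          </figure>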

          <t>Moreover, the fourth mechanism further reduces the total amount
          of message processing load by avoiding the use of periodic
          refreshes, and by inheriting BGP features that are expected to
          improve scalability (for instance, providing a means to offload some
          of the processing burden associated with customer multicast routing
          onto one or many BGP route-reflectors). The advantages of the fourth
          mechanism come at the cost of maintaining an amount of state linear
          with the number of PEs joined to a stream. However, the use of route
          reflectors allows this cost to be spread among multiple route
          reflectors, thus eliminating the need for a single route reflector
          to maintain all this state.</t>

          <t>However, the fourth mechanism is distinctive in that it offers
          the possibility of offloading customer multicast routing processing
          onto one or more BGP Route Reflector(s). When this possibility is
          used, the drawback is an increased processing load on the route
          reflector infrastructure. In higher-scale scenarios, it may be
          required to adapt the route reflector infrastructure to the MVPN
          routing load by using, for example:<list style="symbols">
              <t>a separation of resources for unicast and multicast VPN
              routing: using dedicated MVPN Route Reflector(s) (or using
              dedicated MVPN BGP sessions or dedicated MVPN BGP instances);</t>

              <t>the deployment of additional route reflector resources, for
              example increasing the processing resources on existing route
              reflectors or deployment of additional route reflectors.</t>
            </list>Among the above, the most straightforward approach is to
          introduce route reflectors dedicated to the MVPN service and
          dimension them according to the needs of that service (but doing so
          is not required and is left as an operator engineering
          decision).</t>
        </section>

        <section title="PE-CE multicast routing exchange scalability">
          <t>The overhead associated with the PE-CE exchange of C-multicast
          routing is independent of the mechanism chosen for PE-PE
          C-multicast routing. Its impact on overall system scalability is
          therefore not relevant when comparing the different approaches
          proposed for PE-PE C-multicast routing. This is true even if, in
          some operational contexts, the PE-CE C-multicast routing overhead
          is a significant factor in the overall system overhead.</t>
        </section>

        <section title="P-routers scalability">
          <t>Mechanisms (1) and (2) are restricted to use within multicast
          VPNs that use an MI-PMSI, thereby necessitating:<list style="hanging">
              <t hangText="">the use of a P-tunnel technique that allows
              shared P-tunnels (for example PIM-SM in ASM mode or MP2MP
              LDP)</t>

              <t hangText="or ">the use of one P-tunnel per PE per VPN, even
              for PEs that do not have sources in their directly attached
              sites for that VPN.</t>
            </list>By comparison, the fourth mechanism doesn't impose either
          of these restrictions, and when P2MP P-tunnels are used only
          necessitates the use of one P-tunnel per VPN per PE attached to a
          site with a multicast source or RP (or with a candidate BSR, if BSR
          is used).</t>

          <t>In cases where fewer PEs are connected to sources than the total
          number of PEs, the fourth mechanism reduces the amount of state
          maintained by P-routers compared to the amount required to build an
          MI-PMSI with P2MP P-tunnels. Such cases are expected to be frequent
          in multicast VPN deployments (see <xref target="RFC4834">section
          4.2.4.1 of</xref>).</t>
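
          <t>The difference in P-tunnel count can be illustrated with the
          sketch below, which assumes P2MP P-tunnels and counts tunnels for a
          single VPN. The function and parameter names are hypothetical;
          "source PEs" stands for PEs attached to a site with a multicast
          source or RP (or a candidate BSR, if BSR is used).</t>

          <figure>
            <artwork><![CDATA[
```python
# Illustrative sketch only: names are assumptions.
def p2mp_tunnels_per_vpn(n_pes, n_source_pes, needs_mi_pmsi):
    """P2MP P-tunnels needed for one VPN.

    needs_mi_pmsi: True for mechanisms that require an
    MI-PMSI (one P-tunnel per PE), False for the BGP-based
    mechanism (one P-tunnel per source PE).
    """
    if not 0 <= n_source_pes <= n_pes:
        raise ValueError("n_source_pes must be within 0..n_pes")
    return n_pes if needs_mi_pmsi else n_source_pes


# Hypothetical VPN: 200 PEs, 5 of them attached to sources.
print(p2mp_tunnels_per_vpn(200, 5, needs_mi_pmsi=True))   # 200
print(p2mp_tunnels_per_vpn(200, 5, needs_mi_pmsi=False))  # 5
```
]]></artwork>
          </figure>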
        </section>

        <section title="Impact of C-multicast routing on Inter-AS deployments">
          <t>Co-existence with unicast inter-AS VPN options, and an equal
          level of security for multicast and unicast including in an inter-AS
          context, are specifically mentioned in <xref target="RFC4834">sections 5.2.6, 5.2.8 and 5.2.12 of</xref>.</t>

          <t>In an inter-AS option B context, an isolation of ASes is obtained
          as PEs in one AS don't have (direct) exchange of routing information
          with PEs of other ASes. This property is not preserved if PIM-based
          PE-PE C-multicast routing is used. By contrast, the fourth option
          (BGP-based C-Multicast routing) does preserve this property.</t>

          <t>Additionally, the authors note that the proposed BGP-based
          approach for C-multicast routing provides a good fit with both the
          segmented and non-segmented inter-AS approaches. By contrast, though
          the PIM-based C-multicast routing is usable with segmented inter-AS
          tunnels, the inter-AS scalability advantage of the approach is lost,
          since PEs in an AS will see the C-multicast routing activity of all
          other PEs of all other ASes.</t>
        </section>

        <section title="Security and robustness">
          <t>BGP supports MD5 authentication of its peers for additional
          security, which can directly benefit multicast VPN customer
          multicast routing, whether for intra-AS or inter-AS
          communications. By contrast, with a PIM-based approach, no mechanism
          providing a comparable level of security to authenticate
          communications between remote PEs has been fully described yet
          <xref target="I-D.ietf-pim-sm-linklocal"/>, and in any case such a
          mechanism would require significant additional operations for the
          provider to be usable in a multicast VPN context.</t>

          <t>The robustness of the infrastructure, especially the existing
          infrastructure providing unicast VPN connectivity, is key. The
          C-multicast routing function, especially under load, will compete
          with the unicast routing infrastructure. With the PIM-based
          approaches, the unicast and multicast VPN routing functions are
          expected to only compete in the PE, for control plane processing
          resources. In the case of the BGP-based approach, they will compete
          on the PE for processing resources, and in the route reflectors
          (supposing they are used for MVPN routing). It is identified that in
          both cases, mechanisms will be required to arbitrate resources (e.g.
          processing priorities). With the PIM-based procedures, this
          arbitration takes place between the different control plane routing
          instances in the PE. With the BGP-based approach, it is likely to
          require using distinct BGP sessions for multicast and unicast (e.g.
          through the use of dedicated MVPN BGP route reflectors, or through
          the use of a distinct session with an existing route reflector).</t>

          <t>Multicast routing is dynamic by nature, and multicast VPN routing
          has to follow the VPN customers' multicast routing events. The
          different approaches can be compared on how they are expected to
          behave in scenarios where multicast routing in the VPNs is subject
          to intense activity. Scalability of each approach under such a
          load is detailed in <xref target="leave-join-cost"/>. The
          fourth approach (BGP-based), used in conjunction with the RT
          Constraint mechanisms <xref target="RFC4684"/>, is the only
          one having a cost for join/leave operations independent of the
          number of PEs in the VPN (with one exception detailed in <xref target="leave-join-cost"/>) and state maintenance not
          concentrated on the upstream PE.</t>

          <t>On the other hand, while the BGP-based approach is likely to
          suffer a slowdown under a load that is greater than the available
          processing resources (because of possibly congested TCP sockets),
          the PIM-based approaches would react to such a load by dropping
          messages, with failure-recovery obtained through message refreshes.
          Thus, the BGP-based approach could result in a degradation of
          join/leave latency performance typically spread evenly across all
          multicast streams being joined in that period, while the PIM-based
          approach could result in join/leave latency increased, for some
          random streams, by a multiple of the time between refreshes (e.g.
          tens of seconds), and in some cases the adjacency may time out,
          resulting in disruption of multicast streams.</t>

          <t>The behavior of the PIM-based approach under such a load is also
          harder to predict, given that the performance of the "Join
          suppression" mechanism (an important mechanism for this approach to
          scale) will itself be impeded by delays in Join processing. For
          these reasons, the BGP-based approach would be able to provide a
          smoother degradation and more predictable behavior under a highly
          dynamic load.</t>

          <t>In fact, both an "evenly spread degradation" and an "unevenly
          spread larger degradation" can be problematic, and what seems
          important is the ability for the VPN backbone operator to (a) limit
          the amount of multicast routing activity that can be triggered by a
          multicast VPN customer, and to (b) provide the best possible
          independence between distinct VPNs. It seems that both of these can
          be addressed through local implementation improvements, and that
          both the BGP-based and PIM-based approaches could be engineered to
          provide (a) and (b). It can be noted though that the BGP approach
          proposes ways to dampen C-multicast route withdrawals and/or
          advertisements, and thus already describes a way to provide (a),
          while nothing comparable has yet been described for the PIM-based
          approaches (even though it doesn't appear difficult). The PIM-based
          approaches rely on a per VPN dataplane to carry the MVPN control
          plane, and thus may benefit from this first level of separation to
          solve (b).</t>
        </section>

        <section title="C-multicast VPN join latency">
          <t><xref target="RFC4834">Section 5.1.3 of</xref> states that "the
          group join delay [...] is also considered one important QoS
          parameter. It is thus RECOMMENDED that a multicast VPN solution be
          designed appropriately in this regard". In a multicast VPN context,
          the "group join delay" of interest is the time between a CE sending a
          PIM Join to its PE and the first packet of the corresponding
          multicast stream being received by the CE.</t>

          <t>It is to be noted that the C-multicast routing procedures will
          only impact the group join latency of a given multicast stream for
          the first receiver that is located across the provider backbone from
          the multicast source-connected PE (or the first &lt;n&gt; receivers
          in the specific case where a specific UMH selection algorithm is
          used that allows &lt;n&gt; distinct UMHs to be selected by distinct
          downstream PEs).</t>

          <t>The different approaches proposed seem to have different
          characteristics in how they are expected to impact join
          latency:<list style="symbols">
              <t>the PIM-based approaches minimize the number of control plane
              processing hops between a new receiver-connected PE and the
              source-connected PE, and being datagram-based introduces minimal
              delay, thereby possibly having a join latency as good as
              possible depending on implementation efficiency</t>

              <t>under degraded conditions (packet loss, congestion, high
              control plane load) the PIM-based approach may impact the
              latency for a given multicast stream in an all-or-nothing
              manner: if a C-multicast routing PIM Join packet is lost,
              latency can reach a high value (a multiple of the periodicity of
              PIM Join refreshes)</t>

              <t>the BGP-based approach uses TCP exchanges, that may introduce
              an additional delay depending on BGP and TCP implementation, but
              which would typically result, under degraded conditions (such
              as packet loss, congestion, high control plane load), in a
              comparably lower increase of latency spread more evenly across
              the streams</t>

              <t>as shown in <xref target="PEPE-mrouting-load"/>, the
              BGP-based approach is particular in that it removes load from
              all the PEs (without putting this load on the upstream PE for a
              stream); this improvement of background load can bring improved
              performance when a PE acts as the upstream PE for a stream, and
              thus benefit join latency</t>
            </list></t>
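          <t>The "all or nothing" effect of a lost PIM Join can be
          illustrated with a rough sketch. It is only a model under stated
          assumptions, not a measurement: Join transmissions are assumed to be
          lost independently with a fixed probability, a lost Join is assumed
          to be repaired only by the next periodic refresh, and the refresh
          period is assumed to be the 60-second default periodic Join/Prune
          interval of PIM-SM. All function and constant names are
          hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
# Hypothetical model, not taken from the MVPN specifications: each PIM
# Join is lost independently with probability `loss`, and a lost Join is
# only repaired by the next periodic refresh.

PIM_JOIN_REFRESH_S = 60.0  # assumed: default PIM-SM Join/Prune period


def extra_delay_after_losses(lost_joins, refresh_s=PIM_JOIN_REFRESH_S):
    """Extra join latency when the first `lost_joins` Joins are lost."""
    if lost_joins < 0:
        raise ValueError("lost_joins must be non-negative")
    return lost_joins * refresh_s


def expected_extra_delay(loss, refresh_s=PIM_JOIN_REFRESH_S):
    """Mean extra latency in the geometric loss model.

    The k-th refresh is the first to get through with probability
    loss**k * (1 - loss), which costs k * refresh_s of extra delay;
    summing over k gives refresh_s * loss / (1 - loss).
    """
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    return refresh_s * loss / (1.0 - loss)
```
]]></artwork>
          </figure>

          <t>In this model the mean extra delay stays small at low loss
          rates, but any stream whose Join is actually lost pays at least one
          full refresh period, which is the uneven behavior described
          above.</t>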

          <t>This qualitative comparison of approaches shows that the
          BGP-based approach is designed for a smoother degradation of latency
          under degraded conditions such as packet loss, congestion, or high
          control plane load. On the other hand, the PIM-based approaches seem
          to be structurally able to reach a shorter "best-case" group join
          latency (especially compared to deployments of the BGP-based approach
          where route-reflectors are used).</t>

          <t>Doing a quantitative comparison of latencies is not possible
          without referring to specific implementations and benchmarking
          procedures, and would possibly lead to different conclusions,
          especially for best-case group join latency, for which performance is
          expected to vary with PIM and BGP implementations. We can also note
          that improving a BGP implementation for reduced latency of route
          processing would not only benefit multicast VPN group join latency,
          but the whole BGP-based routing, which means that the need for good
          BGP/RR performance is not specific to multicast VPN routing.</t>

          <t>Last, C-multicast join latency will be impacted by the overall
          load put on the control plane, and the scalability of the
          C-multicast routing approach is thus to be taken into account. As
          explained in sections <xref target="pepe-scaling"/> and <xref target="PEPE-mrouting-load"/>, the BGP-based approach will
          provide the best scalability with an increased number of PEs per
          VPN, thereby benefiting group join latency in such higher scale
          scenarios.</t>
        </section>

        <section title="Conclusion on C-multicast routing">
          <t>The first and fourth approaches are relevant contenders for
          C-multicast routing. Comparisons from a theoretical standpoint
          identify some advantages as well as possible drawbacks in the
          fourth approach. Comparisons from a practical standpoint are harder
          to make: since only limited deployment and implementation
          information is available for the fourth approach, advantages would
          be seen in the first approach, which has been applied through multiple
          deployments and shown to be operationally viable.</t>

          <t>Moreover, the first mechanism (full per-MVPN PIM peering across
          an MI-PMSI) is the mechanism used by <xref target="I-D.rosen-vpn-mcast"/> and therefore it is deployed
          and operating in MVPNs today. The fourth approach may or may not end
          up being preferred for a given deployment, but because the first
          approach has been in deployment for some time, support for this
          mechanism will in any case be helpful to facilitate an eventual
          migration from a deployment using a mechanism close to the first
          approach.</t>

          <t>Consequently, at the present time, implementations are
          recommended to support both the fourth (BGP-based) and first (Full
          per-MVPN PIM peering) mechanisms. Further experience with deployments
          of the fourth approach is needed before some best practice can be
          defined. In the meantime, this recommendation would enable service
          providers to choose between the first and the fourth mechanism,
          without this choice being constrained by vendors' implementation
          choices.</t>
        </section>
      </section>

      <section title="Encapsulation techniques for P-multicast trees">
        <t>In this section the authors will not make any restricting
        recommendations since the appropriateness of a specific provider core
        data plane technology will depend on a large number of factors, for
        example the service provider's currently deployed unicast data plane,
        many of which are service provider specific.</t>

        <t>However, implementations should not unreasonably restrict the data
        plane technology that can be used, and should not force the use of the
        same technology for different VPNs attached to a single PE. Initial
        implementations may only support a reduced set of encapsulation
        techniques and data plane technologies but this should not be a
        limiting factor that hinders future support for other encapsulation
        techniques, data plane technologies or interoperability.</t>

        <t><xref target="RFC4834">Section 5.2.4.1 of</xref> states "In a
        multicast VPN solution extending a unicast L3 PPVPN solution,
        consistency in the tunneling technology has to be favored: such a
        solution SHOULD allow the use of the same tunneling technology for
        multicast as for unicast. Deployment consistency, ease of operation
        and potential migrations are the main motivations behind this
        requirement."</t>

        <t>Current unicast VPN deployments use a variety of LDP, RSVP-TE and
        GRE/IP-Multicast for encapsulating customer packets for transport
        across the provider core of VPN services. In order to allow the same
        encapsulations to be used for unicast and multicast VPN traffic, it is
        recommended that multicast VPN standards recommend that
        implementations support, for multicast VPNs, all the P2MP variants
        of the encapsulations and signaling protocols that they support for
        unicast and for which a multipoint extension is defined, such as
        mLDP, P2MP RSVP-TE and GRE/IP-multicast.</t>

        <t>All three of the above encapsulation techniques support the
        building of P2MP multicast P-tunnels. In addition, mLDP and
        GRE/IP-ASM-Multicast implementations may also support the building of
        MP2MP multicast P-tunnels. The use of MP2MP P-tunnels may provide some
        scaling benefits to the service provider as only a single MP2MP
        P-tunnel needs to be deployed per VPN, thus reducing by an order of
        magnitude the amount of multicast state that needs to be maintained by
        P routers. This gain in state is at the expense of bandwidth
        optimization, since sites that do not have multicast receivers for
        multicast streams sourced behind a given PE group will still receive
        packets of such streams, leading to non-optimal bandwidth utilization
        across the VPN core. One thing to consider is that the use of MP2MP
        multicast P-tunnels will require additional configuration to define the
        same P-tunnel identifier or multicast ASM group address in all PEs (it
        has been noted that some auto-configuration could be possible for
        MP2MP P-tunnels, but this is not currently supported by the
        auto-discovery procedures). [ It has been noted that C-multicast
        routing schemes not covered in <xref target="I-D.ietf-l3vpn-2547bis-mcast"/> could expose different
        advantages of MP2MP multicast P-tunnels; this is out of scope of this
        document. ]</t>
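        <t>The state reduction can be made concrete with a small counting
        sketch. It assumes, purely for illustration, that with P2MP P-tunnels
        every PE of a VPN roots its own inclusive tree, that with MP2MP a
        single shared P-tunnel serves the whole VPN, and that the P router
        considered is crossed by every P-tunnel (a worst case). The function
        names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import Iterable


def p_tunnels_per_vpn(num_pes: int, mp2mp: bool) -> int:
    """Inclusive P-tunnels needed for one VPN.

    Assumption: one P2MP tree rooted at each PE of the VPN, or a
    single MP2MP P-tunnel shared by all of its PEs.
    """
    if num_pes < 1:
        raise ValueError("a VPN needs at least one PE")
    return 1 if mp2mp else num_pes


def worst_case_p_router_state(vpn_sizes: Iterable[int],
                              mp2mp: bool) -> int:
    """P-tunnel state on a P router crossed by every P-tunnel."""
    return sum(p_tunnels_per_vpn(n, mp2mp) for n in vpn_sizes)
```
]]></artwork>
        </figure>

        <t>For three VPNs with 10, 20 and 5 PEs, this count gives 35 P-tunnel
        states with P2MP P-tunnels and 3 with MP2MP P-tunnels. The ratio grows
        with the number of PEs per VPN.</t>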

        <t>MVPN services can also be supported over a unicast VPN core through
        the use of ingress PE replication whereby the ingress PE replicates
        any multicast traffic over the P2P tunnels used to support unicast
        traffic. While this option does not require the service provider to
        modify their existing P routers (in terms of protocol support) and
        does not require maintaining multicast-specific state on the P routers
        in order for the service provider to be able to deploy a multicast VPN
        service, the use of ingress PE replication obviously leads to
        non-optimal bandwidth utilization and it is therefore unlikely to be
        the long term solution chosen by service providers. However, ingress PE
        replication may be useful during some migration scenarios or where a
        service provider considers the level of multicast traffic on their
        network to be too low to justify deploying multicast specific support
        within their VPN core.</t>
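        <t>The bandwidth cost of ingress PE replication can be sketched with
        simple arithmetic. The sketch below is illustrative only: it counts
        the copies of one multicast stream that the ingress PE sends into the
        core, assuming one copy per receiver PE with ingress replication and a
        single copy with a P2MP P-tunnel. The function name and parameters are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def ingress_pe_load_mbps(stream_mbps: float, receiver_pes: int,
                         ingress_replication: bool) -> float:
    """Traffic the ingress PE sends into the core for one stream.

    With ingress replication, one copy is sent per receiver PE over
    the unicast P2P tunnels. With a P2MP P-tunnel, a single copy is
    sent and the P routers replicate it downstream.
    """
    if stream_mbps < 0:
        raise ValueError("stream_mbps must be non-negative")
    if receiver_pes < 1:
        raise ValueError("at least one receiver PE is expected")
    copies = receiver_pes if ingress_replication else 1
    return stream_mbps * copies
```
]]></artwork>
        </figure>

        <t>For a 4 Mbit/s stream with 25 receiver PEs, this gives 100 Mbit/s
        at the ingress PE with ingress replication, against 4 Mbit/s with a
        P2MP P-tunnel. This difference is why the option is more attractive
        when multicast traffic volumes are low.</t>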

        <t>All proposed approaches for control plane and dataplane can be used
        to provide aggregation amongst multicast groups within a VPN and
        amongst different multicast VPNs, and potentially reduce the amount of
        state to be maintained by P routers. However, the latter (aggregation
        amongst different multicast VPNs) will require support for
        upstream-assigned labels on the PEs. Support for upstream-assigned
        labels may require changes to the data plane processing of the PEs and
        this should be taken into consideration by service providers
        considering the use of aggregate PMSI tunnels for the specific
        platforms that the service provider has deployed.</t>
      </section>

      <section anchor="interas" title="Inter-AS deployments options">
        <t>There are a number of scenarios that lead to the requirement for
        inter-AS multicast VPNs, including:<list style="numbers">
            <t>a service provider may have a large network that they have
            segmented into a number of ASs.</t>

            <t>a service provider's multicast VPN may consist of a number of
            ASs due to acquisitions and mergers with other service
            providers.</t>

            <t>a service provider may wish to interconnect their multicast VPN
            platform with that of another service provider.</t>
          </list>The first scenario can be considered the "simplest" because
        the network is wholly managed by a single service provider under a
        single strategy and is therefore likely to use a consistent set of
        technologies across each AS.</t>

        <t>The second scenario may be more complex than the first because the
        strategy and technology choices made for each AS may have been
        different due to their differing history and the service provider may
        not have unified (or may be unwilling to unify) the strategy and
        technology choices for each AS.</t>

        <t>The third scenario is the most complex because in addition to the
        complexity of the second scenario, the ASs are managed by different
        service providers and therefore may be subject to a different trust
        model than the other scenarios.</t>

        <t><xref target="RFC4834">Section 5.2.6 of</xref> states that "a
        solution MUST support inter-AS multicast VPNs, and SHOULD support
        inter-provider multicast VPNs", "considerations about coexistence with
        unicast inter-AS VPN Options A, B and C (as described in section 10 of
        [RFC4364]) are strongly encouraged" and "a multicast VPN solution
        SHOULD provide inter-AS mechanisms requiring the least possible
        coordination between providers, and keep the need for detailed
        knowledge of providers' networks to a minimum - all this being in
        comparison with corresponding unicast VPN options".</t>

        <t><xref target="I-D.ietf-l3vpn-2547bis-mcast">Section 8 of </xref>
        addresses these requirements by proposing two approaches for MVPN
        inter-AS deployments:</t>

        <t><list style="numbers">
            <t>Non-segmented inter-AS tunnels where the multicast tunnels are
            end-to-end across ASes, so even though the PEs belonging to a
            given MVPN may be in different ASs the ASBRs play no special role
            and function merely as P routers (described in section 8.1).</t>

            <t>Segmented inter-AS tunnels where each AS constructs its own
            separate multicast tunnels which are then 'stitched' together by
            the ASBRs (described in section 8.2).</t>
          </list></t>

        <t><xref target="RFC4834">Section 5.2.6 of</xref> also states "Within
        each service provider the service provider SHOULD be able on its own
        to pick the most appropriate tunneling mechanism to carry (multicast)
        traffic among PEs (just like what is done today for unicast)". The
        segmented approach is the only one capable of meeting this
        requirement.</t>

        <t>The segmented inter-AS solution would appear to offer the largest
        degree of deployment flexibility to operators. However, the
        non-segmented inter-AS solution can simplify deployment in a
        restricted number of scenarios. Moreover, <xref target="I-D.rosen-vpn-mcast"/> only supports the non-segmented
        inter-AS solution, which is therefore likely to be useful to some
        operators for backward compatibility and
        during migration from <xref target="I-D.rosen-vpn-mcast"/> to
        <xref target="I-D.ietf-l3vpn-2547bis-mcast"/>.</t>

        <t>The applicability of segmented or non-segmented inter-AS tunnels to
        a given deployment or inter-provider interconnect will depend on a
        number of factors specific to each service provider. However, due to
        the additional deployment flexibility offered by segmented inter-AS
        tunnels, it is the recommendation of the authors that all
        implementations should support the segmented inter-AS model.
        Additionally, the authors recommend that implementations should
        consider supporting the non-segmented inter-AS model in order to
        facilitate co-existence with existing deployments, and as a feature
        allowing lighter engineering in a restricted set of scenarios,
        although it is recognized that initial implementations may only
        support one or the other.</t>
      </section>

      <section anchor="bidir" title="Bidir-PIM support">
        <t>In Bidir-PIM, the packet forwarding rules have been improved over
        PIM-SM, allowing traffic to be passed up the shared tree toward the RP
        Address (RPA). To avoid multicast packet looping, Bidir-PIM uses a
        mechanism called the designated forwarder (DF) election, which
        establishes a loop-free tree rooted at the RPA. Use of this method
        ensures that only one copy of every packet will be sent to an RPA,
        even if there are parallel equal cost paths to the RPA. To avoid loops,
        the DF election process enforces a consistent view of the DF on all
        routers on a network segment, and during periods of ambiguity or routing
        convergence traffic forwarding is suspended.</t>

        <t>In the context of a multicast VPN solution, a solution for
        Bidir-PIM support must preserve this property of avoiding
        packet loops, including in the case where mVRFs in a given MVPN don't
        have a consistent view of the routing to the C-RPL/C-RPA.</t>

        <t>The current MVPN specifications <xref target="I-D.ietf-l3vpn-2547bis-mcast"/> in section 11, define
        three methods to support Bidir-PIM, as RECOMMENDED in <xref target="RFC4834"/>:<list style="numbers">
            <t>Standard DF election procedure over an MI-PMSI</t>

            <t>VPN Backbone as the RPL (section 11.1)</t>

            <t>Partitioned Sets of PEs (section 11.2)</t>
          </list></t>

        <t>Method (1) is naturally applied to deployments using "Full per-MVPN
        PIM peering across an MI-PMSI" for C-multicast routing, but as
        indicated in <xref target="I-D.ietf-l3vpn-2547bis-mcast"/> in
        section 11, the DF Election may not work well in an MVPN environment
        and an alternative to DF election would be desirable.</t>

        <t>The advantage of method (2) and (3) is that they do not require
        running the DF election procedure among PEs.</t>

        <t>Method (2) leverages the fact that in Bidir-PIM, running the DF
        election procedure is not needed on the RPL. This approach thus has
        the benefit of simplicity of implementation, especially in a context
        where BGP-based C-multicast routing is used. However, it has the
        drawback of putting constraints on how Bidir-PIM is deployed, which may
        not always match MVPN customers' requirements.</t>

        <t>Method (3) treats an MVPN as a collection of sets of multicast
        VRFs, all PEs in a set having the same reachability information
        towards C-RPA, but distinct from PEs in other sets. Hence, with this
        method, C-Bidir packet loops in MVPN are resolved by the ability to
        partition a VPN into disjoint sets of VRFs, each having a distinct
        view of the converged network. The partitioning approach to Bidir-PIM
        requires either upstream-assigned MPLS labels (to denote the
        partition) or a unique MP2MP LSP per partition. The former is based on
        PE Distinguisher Labels that have to be distributed using
        auto-discovery BGP routes and their handling requires the support for
        upstream-assigned labels and context label lookups <xref target="RFC5331"/>. The latter, using an MP2MP LSP per partition,
        does not have these constraints but is restricted to P-tunnel types
        supporting MP2MP connectivity (such as <xref target="I-D.ietf-mpls-ldp-p2mp">mLDP</xref>).</t>
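        <t>The partitioning idea can be sketched as follows. This is a
        simplified illustration, not the specified procedure: it assumes that
        a partition is identified by the PE that each PE has selected as its
        upstream toward the C-RPA, and that each C-Bidir packet carries a
        partition tag standing in for either the PE Distinguisher Label or the
        per-partition MP2MP LSP. All names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from collections import defaultdict


def partition_by_upstream_pe(upstream_choice):
    """Group PEs by the upstream PE each selected toward the C-RPA.

    `upstream_choice` maps a PE name to the name of the PE it selected
    (a PE attached to the C-RPA side may select itself).
    """
    partitions = defaultdict(set)
    for pe, upstream in upstream_choice.items():
        partitions[upstream].add(pe)
    return dict(partitions)


def accept_bidir_packet(receiver_pe, packet_partition, upstream_choice):
    """Accept a C-Bidir packet only if it is tagged with the receiver's
    own partition; packets from other partitions are discarded, which
    is what prevents loops between PEs with diverging views."""
    return upstream_choice[receiver_pe] == packet_partition
```
]]></artwork>
        </figure>

        <t>In this sketch, two PEs that disagree on the route to the C-RPA end
        up in different partitions and never accept each other's C-Bidir
        traffic, so no DF election among PEs is needed.</t>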

        <t>This approach to C-Bidir can work with PIM-based or BGP-based
        C-multicast routing procedures, and is also generic in the sense that
        it does not impose any requirements on the Bidir-PIM service
        offering.</t>

        <t>Given the above considerations, method (3) "Partitioned Sets of
        PEs" is the RECOMMENDED approach.</t>

        <t>In the event where method (3) is not applicable (lack of support
        for upstream-assigned labels or for a P-tunnel type providing MP2MP
        connectivity), then methods (1) "Standard DF election procedure over an
        MI-PMSI" and (2) "VPN Backbone as the RPL" are RECOMMENDED as interim
        solutions, (1) having the advantage over (2) of not putting
        constraints on how Bidir-PIM is deployed and the drawbacks of only
        being applicable when PIM-based C-multicast is used and of possibly
        not working well in an MVPN environment.</t>
      </section>
    </section>

    <section title="Co-located RPs">
      <t><xref target="RFC4834">Section 5.1.10.1 of</xref> states "In the case
      of PIM-SM in ASM mode, engineering of the RP function requires the
      deployment of specific protocols and associated configurations. A
      service provider may offer to manage customers' multicast protocol
      operation on their behalf. This implies that it is necessary to consider
      cases where a customer's RPs are out-sourced (e.g. on PEs).
      Consequently, a VPN solution MAY support the hosting of the RP function
      in a VR or VRF."</t>

      <t>However, customers who have already deployed multicast within their
      networks and have therefore already deployed their own internal RPs are
      often reluctant to hand over the control of their RPs to their service
      provider and make use of a co-located RP model. Moreover, providing
      RP co-location on a PE will require the activation of MSDP or the
      processing of PIM Registers on the PE. Securing the PE routers for such
      activity requires special care and additional work, and will likely rely on
      specific features to be provided by the routers themselves.</t>

      <t>The applicability of the co-located RP model to a given MVPN will
      thus depend on a number of factors specific to each customer and service
      provider.</t>

      <t>It is therefore the recommendation that implementations should
      support a co-located RP model, but that support for a co-located RP
      model within an implementation should not restrict deployments to using
      a co-located RP model: implementations MUST support deployments when
      activation of a PIM RP function (PIM Register processing and RP-specific
      PIM procedures) or VRF MSDP instance is not required on any PE router
      and where all the RPs are deployed within the customers' networks or
      CEs.</t>
    </section>

    <section title="Existing deployments">
      <t>Some suggestions provided in this document can be used to
      incrementally modify currently deployed implementations without
      hindering these deployments and without hindering the consistency of
      the standardized solution. This can be done by providing optional
      per-VRF configuration knobs that support modes of operation compatible
      with currently deployed implementations, while at the same time using
      the recommended approach on implementations supporting the standard.</t>

      <t>In cases where this may not be easily achieved, a recommended
      approach would be to provide a per-VRF configuration knob that allows
      incremental per-VPN migration of the mechanisms used by a PE device,
      which would allow migration with some per-VPN interruption of service
      (e.g. during a maintenance window).</t>

      <t>Mechanisms allowing "live" migration by providing concurrent use of
      multiple alternatives for a given PE and a given VPN are not seen as a
      priority considering the expected implementation complexity associated
      with such mechanisms. However, if there happen to be cases where they
      could be viably implemented relatively simply, such mechanisms may help
      improve migration management.</t>
    </section>

    <section anchor="summary" title="Summary of recommendations">
      <t>The following list summarizes conclusions on the mechanisms that
      define the set of mandatory to implement mechanisms in the context of
      <xref target="I-D.ietf-l3vpn-2547bis-mcast"/>.</t>

      <t>Note well that the implementation of the non-mandatory alternative
      mechanisms is not precluded.</t>

      <t>Recommendations are:<list style="symbols">
          <t>that BGP-based auto-discovery be the mandated solution for
          auto-discovery;</t>

          <t>that BGP be the mandated solution for S-PMSI switching signaling
          ;</t>

          <t>that implementations support both the BGP-based and the full
          per-MVPN PIM peering solutions for PE-PE exchange of customer
          multicast routing until further operational experience is gained
          with both solutions;</t>

          <t>that implementations use the "Partitioned Sets of PEs" approach
          for Bidir-PIM support;</t>

          <t>that implementations implement the P2MP variants of the P2P
          protocols that they already implement, such as mLDP, P2MP RSVP-TE
          and GRE/IP-Multicast;</t>

          <t>that implementations support segmented inter-AS tunnels and
          consider supporting non-segmented inter-AS tunnels (in order to
          maintain backwards compatibility and for migration);</t>

          <t>implementations MUST support deployments when activation of a PIM
          RP function (PIM Register processing and RP-specific PIM procedures)
          or VRF MSDP instance is not required on any PE router.</t>
        </list></t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document makes no request to IANA.</t>

      <t>[ Note to RFC Editor: this section may be removed on publication as
      an RFC. ]</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This document does not by itself raise any particular security
      considerations.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>We would like to thank Adrian Farrel, Eric Rosen, Yakov Rekhter, and
      Maria Napierala for their feedback that helped shape this document.</t>

      <t>Additional credit is due to Maria Napierala for co-authoring <xref target="bidir"/> on <xref format="title" target="bidir"/>.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <reference anchor="RFC2119">
        <front>
          <title abbrev="RFC Key Words">Key words for use in RFCs to Indicate
          Requirement Levels</title>

          <author fullname="Scott Bradner" initials="S." surname="Bradner">
            <organization>Harvard University</organization>

            <address>
              <postal>
                <street>1350 Mass. Ave.</street>

                <street>Cambridge</street>

                <street>MA 02138</street>
              </postal>

              <phone>- +1 617 495 3864</phone>

              <email>sob@harvard.edu</email>
            </address>
          </author>

          <date month="March" year="1997"/>

          <area>General</area>

          <keyword>keyword</keyword>

          <abstract>
            <t>In many standards track documents several words are used to
            signify the requirements in the specification. These words are
            often capitalized. This document defines these words as they
            should be interpreted in IETF documents. Authors who follow these
            guidelines should incorporate this phrase near the beginning of
            their document: <list>
                <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
                "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
                "OPTIONAL" in this document are to be interpreted as described
                in RFC 2119.</t>
              </list></t>

            <t>Note that the force of these words is modified by the
            requirement level of the document in which they are used.</t>
          </abstract>
        </front>

        <seriesInfo name="BCP" value="14"/>

        <seriesInfo name="RFC" value="2119"/>

        <format octets="4723" target="ftp://ftp.isi.edu/in-notes/rfc2119.txt" type="TXT"/>

        <format octets="16553" target="http://xml.resource.org/public/rfc/html/rfc2119.html" type="HTML"/>

        <format octets="5703" target="http://xml.resource.org/public/rfc/xml/rfc2119.xml" type="XML"/>
      </reference>

      <?rfc include='reference.I-D.ietf-l3vpn-2547bis-mcast'?>

      <?rfc include='reference.I-D.ietf-l3vpn-2547bis-mcast-bgp'?>
    </references>

    <references title="Informative References">
      <reference anchor="RFC4834">
        <front>
          <title>Requirements for Multicast in L3 Provider-Provisioned Virtual
          Private Networks (PPVPNs)</title>

          <author fullname="Thomas Morin" initials="T" surname="Morin">
            <organization/>
          </author>

          <date day="" month="April" year="2007"/>

          <abstract>
            <t>This document presents a set of functional requirements for
            network solutions that allow the deployment of IP multicast within
            L3 Provider Provisioned Virtual Private Networks (PPVPNs). It
            specifies requirements both from the end user and service provider
            standpoints. It is intended that potential solutions specifying
            the support of IP multicast within such VPNs will use these
            requirements as guidelines.</t>
          </abstract>
        </front>

        <seriesInfo name="RFC" value="4834"/>

        <format target="http://www.ietf.org/rfc/rfc4834.txt" type="TXT"/>
      </reference>

      <?rfc include='reference.I-D.rosen-vpn-mcast'?>

      <reference anchor="I-D.raggarwa-l3vpn-2547-mvpn">
        <front>
          <title>Base Specification for Multicast in BGP/MPLS VPNs</title>

          <author fullname="Rahul Aggarwal" initials="R" surname="Aggarwal">
            <organization/>
          </author>

          <date day="22" month="June" year="2004"/>

          <abstract>
            <t>This document describes the minimal set of procedures required
            to build multi-vendor inter-operable implementations of multicast
            for BGP/MPLS VPNs. It is based on prior specifications of
            multicast for BGP/MPLS VPN specifications that have been
            implemented and deployed. The procedures described herein require
            PIM-SM as the multicast routing protocol in the SP network.</t>
          </abstract>
        </front>

        <seriesInfo name="Internet-Draft" value="draft-raggarwa-l3vpn-2547-mvpn-00"/>

        <format target="http://www.ietf.org/internet-drafts/draft-raggarwa-l3vpn-2547-mvpn-00.txt" type="TXT"/>
      </reference>

      <reference anchor="I-D.ietf-pim-sm-linklocal">
        <front>
          <title>Authentication and Confidentiality in PIM-SM Link-local
          Messages</title>

          <author fullname="John  Atwood" initials="J" surname="Atwood">
            <organization/>
          </author>

          <date day="18" month="November" year="2007"/>

          <abstract>
            <t>RFC 4601 mandates the use of IPsec to ensure authentication of
            the link-local messages in the Protocol Independent Multicast -
            Sparse Mode (PIM-SM) routing protocol. This document specifies
            mechanisms to authenticate the PIM-SM link local messages using
            the IP security (IPsec) Authentication Header (AH) or
            Encapsulating Security Payload (ESP). It specifies optional
            mechanisms to provide confidentiality using the ESP. Manual keying
            is specified as the mandatory and default group key management
            solution. To deal with issues of scalability and security that
            exist with manual keying, an optional automated group key
            management mechanism is specified.</t>
          </abstract>
        </front>

        <seriesInfo name="Internet-Draft" value="draft-ietf-pim-sm-linklocal-08"/>

        <format target="http://www.ietf.org/internet-drafts/draft-ietf-pim-sm-linklocal-02.txt" type="TXT"/>
      </reference>

      <?rfc include='reference.I-D.ietf-pim-port'?>

      <?rfc include='reference.RFC.4684'?>

      <?rfc include='reference.I-D.ietf-mpls-ldp-p2mp'?>

      <?rfc include='reference.RFC.5331'?>
    </references>

    <section anchor="PEPE-mrouting-load" title="Scalability of C-multicast routing processing load">
      <t>The main role of multicast routing is to let routers determine that
      they should start or stop forwarding a given multicast stream on a given
      link. In an MVPN context, this has to be done for each MVPN, and the
      associated function is thus named "customer-multicast routing" or
      "C-multicast routing"; its role is to let PE routers determine that
      they should start or stop forwarding the traffic of a given multicast
      stream toward the remote PEs, on some PMSI tunnel.</t>

      <t>When a "join" message is received by a PE, this PE knows that it
      should be sending traffic for the corresponding multicast group of the
      corresponding MVPN. But the reception of a "prune" message from a remote
      PE is not enough by itself for a PE to know that it should stop
      forwarding the corresponding multicast traffic: it has to make sure that
      there aren't any other PEs that still have receivers for this
      traffic.</t>

      <t>There are many ways that the "C-multicast routing" building block can
      be designed, and they differ, among other things, in how a PE determines
      when it can stop forwarding a given multicast stream toward other
      PEs:<list style="hanging">
          <t hangText="PIM LAN Procedures, by default"><vspace blankLines="0"/>By default when PIM LAN procedures are used, when a
          PE on a LAN Prunes itself from a multicast tree, all other PEs on
          that LAN check their own state to know whether they are on the tree,
          in which case they send a PIM Join message on that LAN to override
          the Prune. Thus, for each PIM Prune message, all PE routers on the
          LAN work to let the upstream PE determine the answer to the "did the
          last receiver leave?" question.</t>

          <t hangText="PIM LAN Procedures, with explicit tracking"><vspace blankLines="0"/>On a LAN, PIM LAN procedures can use an "explicit
          tracking" approach, where a PE that is the upstream router for a
          multicast stream maintains an up-to-date list of all neighbors on
          the LAN that are joined to the tree. Thus, when it receives a Leave
          message from a PIM neighbor, it instantly knows the answer to the
          "did the last receiver leave?" question.<vspace blankLines="0"/>In
          this case, the question is answered by the upstream router alone.
          The side effect of this "explicit tracking" is that "Join
          suppression" is not used: the downstream PEs will always send Joins
          toward the upstream PE, which will have to process them all.</t>

          <t hangText="BGP-based C-multicast routing"><vspace blankLines="0"/>When BGP-based procedures are used for C-multicast
          routing, if no BGP route reflector is used, the "did the last
          receiver leave?" question is answered as in the PIM "explicit
          tracking" approach.<vspace blankLines="0"/>But, when a BGP route
          reflector is used (which is expected to be the recommended
          approach), the role of maintaining an up-to-date list of the PEs
          that are part of a given multicast tree is taken care of by the
          route reflector(s). Using BGP procedures, a route reflector that
          had advertised a C-multicast Source Tree Join route for a given
          (C-S, C-G) to other route reflectors will withdraw this route when
          none of its client PEs advertises this route anymore.
          Similarly, a route reflector that had advertised this route to its
          client PEs will withdraw this route when none of its (other)
          client PEs, and none of its route reflector peers, advertises this
          route anymore. In this context, the "did the last
          receiver leave?" question can be said to be answered by the
          route reflector(s).<vspace blankLines="0"/>Furthermore, the BGP
          route distribution can leverage more than one route reflector: if
          multiple route reflectors are used with PEs being distributed (as
          clients) among these route reflectors, the "did the last receiver
          leave?" question is partly answered by each of these route
          reflectors.</t>
        </list></t>
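      <t>As a minimal sketch of the route reflector behavior described for
      the BGP-based approach, the following Python model propagates a
      C-multicast route when the first client PE advertises it, and withdraws
      it only once the last client PE has withdrawn it. The class and method
      names are hypothetical, and a single route reflector with directly
      attached client PEs is assumed; this is an illustration, not the BGP
      procedures themselves.</t>

      <figure>
        <artwork><![CDATA[
```python
class RouteReflector:
    """Toy model (hypothetical names): one RR tracking its client PEs."""

    def __init__(self):
        self.advertisers = {}  # route -> set of client PEs advertising it
        self.sent = []         # (action, route) sent toward the upstream side

    def advertise(self, pe, route):
        clients = self.advertisers.setdefault(route, set())
        if not clients:
            # First client for this route: propagate the advertisement.
            self.sent.append(("advertise", route))
        clients.add(pe)

    def withdraw(self, pe, route):
        clients = self.advertisers.get(route, set())
        clients.discard(pe)
        if not clients and route in self.advertisers:
            # The last receiver-connected client left: withdraw the route.
            del self.advertisers[route]
            self.sent.append(("withdraw", route))


rr = RouteReflector()
route = ("C-S", "C-G")
rr.advertise("PE1", route)
rr.advertise("PE2", route)
rr.withdraw("PE1", route)   # PE2 still advertises: nothing is sent
rr.withdraw("PE2", route)   # last client left: the RR withdraws
print(rr.sent)
```
]]></artwork>
      </figure>

      <t>In this simplified model, the upstream side sees one advertisement
      and one withdrawal regardless of how many client PEs joined in
      between.</t>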

      <t>We can see that answering the "did the last receiver leave?" question
      is a significant proportion of the work that the C-multicast routing
      building block has to do, and is where the approaches differ most. The
      different approaches for handling C-multicast routing can result in
      different amounts of processing, and in different ways of spreading this
      processing among the different functions. These differences can be
      better estimated by quantifying the amount of message processing and
      state maintenance.</t>

      <t>Though the type of processing, messages, and states may vary with the
      different approaches, we propose here a rough estimation of the load of
      PEs, in terms of number of messages processed and number of control
      plane states maintained. A "message processed" is a message being
      parsed, a lookup being done, and some action being taken (such as
      updating a control plane or data plane state). A "state maintained"
      is a multicast state kept in the control plane memory of a PE,
      related to an interface or a PE being subscribed to a multicast stream.
      Note that here we don't compare the data plane states on PE routers,
      which wouldn't vary between the different options chosen.</t>

      <section anchor="pepe-scaling-analysis" title="Scalability with an increased number of PEs">
        <t>The following sections aim at evaluating the processing and state
        maintenance load for an increasingly high number of PEs in a VPN.</t>

        <section anchor="one-tree" title="SSM Scenario">
          <t>The following subsections do such an estimation for each proposed
          approach for C-multicast routing, for different phases of the
          following scenario:</t>

          <t><list style="symbols">
              <t>one SSM multicast stream is considered</t>

              <t>only the intra-AS case is considered (with the segmented
              inter-AS tunnels and BGP-based C-multicast routing, #MVPN_PE and
              #R_PE should refer to the PEs of the MVPN in the AS, not to all
              PEs of the MVPN)</t>

              <t>the scenario is as follows:<list style="symbols">
                  <t>one PE joins the multicast stream (because a new
                  receiver-connected site has sent a Join on the PE-CE link),
                  followed by a number of additional PEs that also join the
                  same multicast stream, one after the other; we evaluate the
                  processing required for the addition of each PE</t>

                  <t>some period of time T passes, without any PE joining or
                  leaving (baseline)</t>

                  <t>all PEs leave, one after the other, until the last one
                  leaves; we evaluate the processing required for the leave
                  of each PE</t>
                </list></t>

              <t>the parameters used are:<list style="symbols">
                  <t>#MVPN_PE: the number of PEs in the MVPN</t>

                  <t>#R_PE: the number of PEs joining the multicast stream</t>

                  <t>#RR: the number of route reflectors</t>

                  <t>T_PIM_r: the time between two refreshes of a PIM Join
                  (default is 60s)</t>
                </list></t>
            </list></t>

          <t>The estimation unit used is the "message.equipment" (or "m.e"):
          one "message.equipment" corresponds to "one equipment processing
          one message" (10 m.e being "10 equipments each processing one
          message", or "5 messages each processed by 2 equipments", or "1
          message processed by 10 equipments", etc.). Similarly, for the amount
          of control plane state, the unit used is "state.equipment" or "s.e".
          This makes it possible to take into account the fact that a message
          (or a state) can be processed (or maintained) by more than one
          node.</t>
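          <t>As a small illustration of the "m.e" unit, the following Python
          sketch (the function name is hypothetical) computes a load from
          pairs of "number of messages" and "number of equipments processing
          each of them"; the three cases quoted in the unit definition all
          amount to 10 m.e.</t>

          <figure>
            <artwork><![CDATA[
```python
def load_in_me(batches):
    """Load in m.e for (message_count, equipments_per_message) pairs."""
    return sum(count * equipments for count, equipments in batches)


examples = {
    "10 equipments processing one message each": [(1, 1)] * 10,
    "5 messages each processed by 2 equipments": [(5, 2)],
    "1 message processed by 10 equipments": [(1, 10)],
}
for label, batches in examples.items():
    print(label, "->", load_in_me(batches), "m.e")
```
]]></artwork>
          </figure>

          <t>The same computation applies to control plane state, counted in
          "s.e".</t>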

          <t>We distinguish three different types of equipments: the upstream
          PE for the considered multicast stream, the RR (if any), and the
          other PEs (which are not the upstream PE).</t>

          <t>The numbers or orders of magnitude given in the tables in the
          following subsections are totals across all equipments of the same
          type, for each type of equipment, in the "m.e" and "s.e" units
          defined above.</t>

          <t>Additionally:<list style="symbols">
              <t>for PIM, only Join and Prune messages are counted:<list style="symbols">
                  <t>the load due to PIM Hellos can be easily computed
                  separately and only depends on the number of PEs in the
                  VPN;</t>

                  <t>message processing related to the PIM Assert mechanism is
                  also not taken into account, for sake of simplicity;</t>
                </list></t>

              <t>for BGP, all advertisements and withdrawals of C-multicast
              Source Tree Join routes are considered (Source-Active
              autodiscovery routes are not used in an SSM context); and,
              following the recommendation of <xref target="I-D.ietf-l3vpn-2547bis-mcast-bgp"/>, the case where
              the <xref target="RFC4684">RT-Constraint mechanism</xref> is
              not used is not covered.</t>
            </list></t>

          <section title="PIM LAN procedures, by default">
            <texttable style="all" title="Message processing and state maintenance - PIM LAN procedures, by default">
              <ttcol width="12%"/>

              <ttcol width="22%">upstream PE (1)</ttcol>

              <ttcol width="22%">other PEs (total across (#MVPN_PE-1)
              PEs)</ttcol>

              <ttcol width="22%">RR (none)</ttcol>

              <ttcol width="22%">total across all equipments</ttcol>

              <c>first PE joins</c>

              <c>1 m.e</c>

              <c>#MVPN_PE-1 m.e</c>

              <c>/</c>

              <c>#MVPN_PE m.e</c>

              <c>for *each* additional PE joining</c>

              <c>1 m.e</c>

              <c>#MVPN_PE-1 m.e</c>

              <c>/</c>

              <c>#MVPN_PE m.e</c>

              <c>baseline processing over a period T</c>

              <c>T/T_PIM_r m.e</c>

              <c>(T/T_PIM_r) x (#MVPN_PE-1) m.e</c>

              <c>/</c>

              <c>(T/T_PIM_r) x #MVPN_PE m.e</c>

              <c>for *each* PE leaving</c>

              <c>2 m.e</c>

              <c>2 x (#MVPN_PE-1) m.e</c>

              <c>/</c>

              <c>2 x #MVPN_PE m.e</c>

              <c>the last PE leaves</c>

              <c>1 m.e</c>

              <c>#MVPN_PE-1 m.e</c>

              <c>/</c>

              <c>#MVPN_PE m.e</c>

              <c>total for #R_PE PEs</c>

              <c>3 x #R_PE + T/T_PIM_r m.e</c>

              <c>(#MVPN_PE-1) x (3 x #R_PE + T/T_PIM_r) m.e</c>

              <c>0</c>

              <c>#MVPN_PE x (3 x #R_PE + T/T_PIM_r) m.e</c>

              <c>total state maintained</c>

              <c>1 s.e</c>

              <c>#R_PE s.e</c>

              <c>0</c>

              <c>#R_PE+1 s.e</c>
            </texttable>

            <t>We suppose here that the PIM Join suppression and Prune
            Override mechanisms are fully effective, i.e., that a Join or Prune
            message sent by a PE is instantly seen by other PEs. Strictly
            speaking, this is not true: depending on network delays and
            timing, there could be cases where more messages are exchanged, so
            the numbers given in this table are a lower bound on the number of
            PIM messages exchanged.</t>
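            <t>The per-event rows of the table can be summed with a short
            Python sketch. The function and parameter names are hypothetical,
            fully effective Join suppression and Prune Override are assumed as
            stated above, and at least one PE is assumed to join. Summing the
            rows gives 3 x #R_PE - 1 Join/Prune messages plus the refreshes,
            which the last column of the table rounds to 3 x #R_PE.</t>

            <figure>
              <artwork><![CDATA[
```python
def pim_default_load(mvpn_pe, r_pe, period, t_pim_r=60):
    """m.e totals for PIM LAN default procedures (requires r_pe >= 1)."""
    refreshes = period / t_pim_r  # Join refreshes seen during period T
    # Messages on the LAN: one Join per joining PE, a Prune plus an
    # overriding Join for each non-last leave, one Prune for the last leave.
    messages = r_pe + 2 * (r_pe - 1) + 1 + refreshes
    upstream = messages                  # the upstream PE processes them all
    others = (mvpn_pe - 1) * messages    # and so does every other PE
    return {"upstream": upstream, "others": others,
            "total": upstream + others}


print(pim_default_load(mvpn_pe=100, r_pe=10, period=600))
```
]]></artwork>
            </figure>

            <t>With 100 PEs in the MVPN, 10 receiver PEs, and T = 600s, each
            PE on the LAN processes 39 messages, whether or not it has
            receivers.</t>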
          </section>

          <section anchor="pim-explicit" title="PIM LAN procedures, with explicit tracking">
            <texttable style="all" title="Message processing and state maintenance - PIM LAN procedures, with explicit tracking">
              <ttcol/>

              <ttcol>upstream PE (1)</ttcol>

              <ttcol>other PEs (total across (#MVPN_PE-1) PEs)</ttcol>

              <ttcol>RRs (none)</ttcol>

              <ttcol>total across all equipments</ttcol>

              <c>first PE joins</c>

              <c>1 m.e</c>

              <c>1 m.e (see note below)</c>

              <c>/</c>

              <c>2 m.e</c>

              <c>for *each* additional PE joining</c>

              <c>1 m.e</c>

              <c>1 m.e (see note below)</c>

              <c>/</c>

              <c>2 m.e</c>

              <c>baseline processing over a period T</c>

              <c>(T/T_PIM_r) x #R_PE m.e</c>

              <c>(T/T_PIM_r) m.e (see note below)</c>

              <c>/</c>

              <c>(T/T_PIM_r) x #R_PE m.e</c>

              <c>for *each* PE leaving</c>

              <c>1 m.e</c>

              <c>1 m.e (see note below)</c>

              <c>/</c>

              <c>2 m.e</c>

              <c>the last PE leaves</c>

              <c>1 m.e</c>

              <c>1 m.e (see note below)</c>

              <c>/</c>

              <c>2 m.e</c>

              <c>total for #R_PE PEs</c>

              <c>#R_PE (2 + T/T_PIM_r) m.e</c>

              <c>#R_PE x ( 2 + T/T_PIM_r) m.e</c>

              <c>0</c>

              <c>#R_PE x ( 4 + T/T_PIM_r) m.e</c>

              <c>total state maintained</c>

              <c>#R_PE s.e</c>

              <c>#R_PE s.e</c>

              <c>0</c>

              <c>2 x #R_PE s.e</c>
            </texttable>

            <t>Note: in this explicit tracking mode, a given Join or Leave
            message requires processing only by the upstream PE and the PE
            sending the message; the other PEs don't have any action to
            take. It is to be noted, though, that these other PEs will still
            have to parse the PIM message, which is not zero processing. We
            make here the assumption that this is not significant.</t>
          </section>

          <section title="BGP-based C-multicast routing">
            <t>The following analysis assumes that BGP Route Reflectors (RRs)
            are used, with no hierarchy of RRs (recall that the analysis also
            assumes that Route Target Constrain mechanisms are used).</t>

            <t>Given these assumptions, a message carrying a C-multicast route
            from a downstream PE would need to be processed by the RRs that
            have that PE as their client. Due to the use of RT Constrain,
            these RRs would then send this message to only the RRs that have
            the upstream PE as client. None of the other RRs, and none of the
            other PEs will receive this message. Thus, for a message
            associated with a given MVPN the total number of RRs that would
            need to process this message only depends on the number of RRs
            that maintain C-multicast routes for that MVPN and that have
            either the receiver-connected PE, or the source-connected PE as
            their clients, and is independent of the total number of RRs or
            the total number of PEs.</t>

            <t>In practice, for a given MVPN, a PE would be a client of just 2
            RRs (for redundancy, an RR cluster would typically have 2 RRs).
            Therefore, in practice, the message would need to be processed by
            at most 4 RRs (2 RRs if both the downstream PE and the upstream PE
            are clients of the same RRs). Since RRs in different RR
            clusters have a full IBGP mesh among themselves, each RR in the RR
            cluster that contains the upstream PE would receive the message
            from each of the RRs in the RR cluster that contains the downstream
            PE. Given 2 RRs per cluster, the total number of messages
            processed by all the RRs is 6 (2 received from the downstream PE,
            plus 2 x 2 exchanged between the two clusters).</t>

            <t>Additionally, as soon as there is a receiver-connected PE in
            each RR cluster, the number of RRs processing a C-multicast route
            tends quickly toward 2 (taking into account that the peering of a
            PE to RRs will be made redundant).</t>
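            <t>The count of messages processed by the RRs can be sketched as
            follows in Python. The function and parameter names are
            hypothetical; the sketch assumes the same number of RRs in each
            cluster, a full mesh between RRs of different clusters, and that
            RRs of the same cluster are not counted as exchanging the route
            with each other.</t>

            <figure>
              <artwork><![CDATA[
```python
def rr_messages(rrs_per_cluster=2, same_cluster=False):
    """Messages processed by RRs for one C-multicast route update."""
    # Each RR serving the downstream PE receives the update from that PE.
    from_downstream_pe = rrs_per_cluster
    if same_cluster:
        # The upstream PE is a client of the same RRs: nothing more to add.
        return from_downstream_pe
    # Each RR serving the upstream PE receives one copy from every RR of
    # the downstream PE's cluster.
    between_clusters = rrs_per_cluster * rrs_per_cluster
    return from_downstream_pe + between_clusters


print(rr_messages())                   # different clusters
print(rr_messages(same_cluster=True))  # same cluster
```
]]></artwork>
            </figure>

            <t>With 2 RRs per cluster, this gives 6 messages when the two PEs
            are in different clusters and 2 when they share the same RRs.</t>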

            <texttable style="all" title="Message processing and state maintenance - BGP-based procedures">
              <ttcol/>

              <ttcol>upstream PE (1)</ttcol>

              <ttcol>other PEs (total across (#MVPN_PE-1) PEs)</ttcol>

              <ttcol>RRs (#RR)</ttcol>

              <ttcol>total across all equipments</ttcol>

              <c>first PE joins</c>

              <c>2 m.e</c>

              <c>2 m.e</c>

              <c>6 m.e</c>

              <c>10 m.e</c>

              <c>for *each* additional PE joining</c>

              <c>0</c>

              <c>2 m.e</c>

              <c>(at most) 6 m.e tending toward 2 m.e</c>

              <c>(at most) 8 m.e tending toward 4 m.e</c>

              <c>baseline processing over a period T</c>

              <c>0</c>

              <c>0</c>

              <c>0</c>

              <c>0</c>

              <c>for *each* PE leaving</c>

              <c>0</c>

              <c>2 m.e</c>

              <c>(at most) 6 m.e tending toward 2 m.e</c>

              <c>(at most) 8 m.e tending toward 4 m.e</c>

              <c>the last PE leaves</c>

              <c>2 m.e</c>

              <c>2 m.e</c>

              <c>6 m.e</c>

              <c>10 m.e</c>

              <c>total for #R_PE PEs</c>

              <c>4 m.e</c>

              <c>#R_PE x 4 m.e</c>

              <c>(at most) 6 x #R_PE m.e (tending toward 2 x #R_PE m.e)</c>

              <c>(at most) 2 x (5 x #R_PE + 2) m.e (tending toward 2 x (3 x
              #R_PE + 2) m.e)</c>

              <c>total state maintained</c>

              <c>2 s.e</c>

              <c>#R_PE s.e</c>

              <c>approx. 2 x #R_PE + #RR x #clusters s.e</c>

              <c>approx. 3 x #R_PE + #RR x #clusters + 2 s.e</c>
            </texttable>

          </section>

          <section anchor="quant-conclusion" title="Side by side orders of magnitude comparison">
            <t>This section summarizes the previous sections by considering
            the orders of magnitude when the number of PEs in a VPN
            increases.</t>

            <texttable style="all" title="Comparison of orders of magnitude for message processing and state maintenance (totals across all equipments)">
              <ttcol/>

              <ttcol>PIM LAN Procedures, default</ttcol>

              <ttcol>PIM LAN Procedures, explicit tracking</ttcol>

              <ttcol>BGP-based</ttcol>

              <c>first PE joins (in m.e)</c>

              <c>O(#MVPN_PE)</c>

              <c>O(1)</c>

              <c>O(1)</c>

              <c>for *each* additional PE joining (in m.e)</c>

              <c>O(#MVPN_PE)</c>

              <c>O(1)</c>

              <c>O(1)</c>

              <c>baseline processing over a period T (in m.e)</c>

              <c>(T/T_PIM_r) x O(#MVPN_PE)</c>

              <c>(T/T_PIM_r) x O(#R_PE)</c>

              <c>0</c>

              <c>for *each* PE leaving (in m.e)</c>

              <c>O(#MVPN_PE)</c>

              <c>O(1)</c>

              <c>O(1)</c>

              <c>the last PE leaves (in m.e)</c>

              <c>O(#MVPN_PE)</c>

              <c>O(1)</c>

              <c>O(1)</c>

              <c>total for #R_PE PEs (in m.e)</c>

              <c>O(#MVPN_PE x #R_PE) + O(#MVPN_PE x T/T_PIM_r)</c>

              <c>O(#R_PE) x (T/T_PIM_r)</c>

              <c>O(#R_PE)</c>

              <c>states (in s.e)</c>

              <c>O(#R_PE)</c>

              <c>O(#R_PE)</c>

              <c>O(#R_PE)</c>

              <c>notes</c>

              <c>(processing and state maintenance are essentially done by,
              and spread amongst, the PEs of the MVPN ; non-upstream PEs have
              processing to do)</c>

              <c>(processing and state maintenance is essentially done on the
              upstream PE)</c>

              <c>(processing and state maintenance is essentially done by, and
              spread amongst, the RRs)</c>
            </texttable>

            <t>The conclusions that can be drawn from the above are that:</t>

            <t><list style="symbols">
                <t>the PIM LAN Procedures default approach is particular in
                that any message will be processed by all PEs, including those
                that are neither upstream nor downstream for the message,
                which results in a total amount of messages to process which
                is in O(#MVPN_PE x #R_PE) ; i.e. O(#MVPN_PE ^ 2) if the
                proportion of receiver PEs is considered constant when the
                number of PEs increases ;</t>

                <t>the two PIM-based approaches refresh Join messages
                periodically; this is a linear factor that does not change
                the order of magnitude, but it can be significant for
                long-lived streams ;</t>

                <t>the BGP-based approach requires an amount of message
                processing in O(#R_PE), lower than the two other approaches,
                and which is independent of the duration of streams ;</t>

                <t>state maintenance is of the same order of magnitude for all
                approaches: O(#R_PE), but the distribution is different:<list style="symbols">
                    <t>the PIM LAN Procedures default approach fully spreads,
                    and minimizes, the amount of state (one state per PE)</t>

                    <t>the PIM LAN Procedures with explicit tracking approach
                    concentrates all state on the upstream PE</t>

                    <t>the BGP-based procedures spread all the state on the
                    set of route reflectors</t>
                  </list></t>
              </list></t>
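            <t>To make these growth rates concrete, the following Python
            sketch evaluates the "total across all equipments" formulas given
            in the last column of the three tables above. The function and
            key names are hypothetical, the BGP figure is the "at most"
            bound, and the proportion of receiver PEs is held constant at one
            half purely for illustration.</t>

            <figure>
              <artwork><![CDATA[
```python
def total_load(mvpn_pe, r_pe, period, t_pim_r=60):
    """Total m.e across all equipments, per C-multicast routing approach."""
    refreshes = period / t_pim_r
    return {
        "pim_default": mvpn_pe * (3 * r_pe + refreshes),
        "pim_explicit_tracking": r_pe * (4 + refreshes),
        "bgp_upper_bound": 2 * (5 * r_pe + 2),
    }


# Half of the PEs have receivers; the stream lasts one hour.
for n in (10, 100, 1000):
    print(n, total_load(mvpn_pe=n, r_pe=n // 2, period=3600))
```
]]></artwork>
            </figure>

            <t>Multiplying the number of PEs by 10 multiplies the PIM default
            total by roughly 100 once the Join/Prune term dominates, while the
            other two totals grow roughly tenfold.</t>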
          </section>
        </section>

        <section anchor="asm-scalability" title="ASM Scalability">
          <t>When PIM-SM is used in a VPN and an ASM multicast group is joined
          by some PEs (#R_PEs) with some sources sending toward this multicast
          group address, we can note the following:</t>

          <t>PEs will generally have to maintain one shared tree, plus one
          source tree for each source sending toward G, each tree resulting in
          an amount of processing and state maintenance similar to what is
          described in the scenario in <xref target="one-tree"> </xref>, with
          the same differences in orders of magnitude between the different
          approaches when the number of PEs is high.</t>

          <t>An exception to this is when, for a given group in a VPN, none of
          the PIM instances in the customer routers and VRFs would
          switch to the SPT (SwitchToSptDesired always false): in that case,
          the processing and state maintenance load is the one required for
          maintenance of the shared tree only. It has to be noted that this
          scenario is dependent on customer policy. To compare the resulting
          load in that case between the PIM-based approaches and the BGP-based
          approach configured to use inter-site shared trees, the scenario
          in <xref target="one-tree"> </xref> can be used with #R_PEs joining a
          (C-*,C-G) ASM group instead of an SSM group, and the same
          differences in order of magnitude remain true. In the case of the
          BGP-based approach used without inter-site shared trees, we must
          take into account the load resulting from the fact that, to build the
          C-PIM shared tree, each PE has to join the source tree to each
          source; using the notations of <xref target="one-tree"> </xref>,
          this adds an amount of load (total load across all equipments)
          that is proportional to #R_PEs and the number of sources. The order
          of magnitude with an increasing number of PEs is thus unchanged, and
          the differences in order of magnitude also remain the same.</t>

          <t>In addition to the maintenance of trees, PEs have to do some
          processing and state maintenance related to individual sources
          sending to a multicast group; the related procedures and behaviors
          may differ largely depending on which C-multicast routing protocol
          is used, how it is configured, how multicast source discovery
          mechanisms are used in the customer VPN, and which SwitchToSptDesired
          policy is used. However, the following can be observed:</t>

          <t><list style="symbols">
              <t>when BGP-based C-multicast routing is used, each PE will
              possibly have to process and maintain one BGP Source-Active
              autodiscovery route for (some or all) sources of an ASM group,
              which results in message processing and state maintenance
              (total across all the equipments) linearly dependent on the
              number of PEs in the VPN (#MVPN_PE) for each source,
              independently of the number of PEs joined to the group.
              Depending on whether or not inter-site shared trees are used,
              and depending on the SwitchToSptDesired policy in the PIM
              instances in the customer routers and VRFs, and depending on the
              relative locations of sources and RPs, this will happen for all
              (S,G) of an ASM group or only for some of them, and will be done
              in parallel to the maintenance of shared and/or source trees or
              at the first join of a PE on a source tree</t>

              <t>when PIM-based C-multicast routing is used, depending on the
              SwitchToSptDesired policy in the PIM instances in the customer
              routers and VRFs, and depending on the relative locations of
              sources and RPs, there are:<list style="symbols">
                  <t>possible control plane state transitions triggered by the
                  reception of (S,G) packets ; such events would induce
                  processing on all PEs joined to G</t>

                  <t>possible control plane state transitions triggered by the
                  reception of (S,G) packets, and possible PIM Assert messages
                  specific to (S,G) ; this would induce a message processing
                  on each PE of the VPN for each PIM Assert message</t>
                </list></t>
            </list>Given the above, the additional processing that may happen
          for each individual source sending to the group, beyond the
          maintenance of source and shared trees, does not change the orders of
          magnitude identified above.</t>
        </section>
      </section>

      <section anchor="leave-join-cost" title="Cost of PEs leaving and joining">
        <t>The quantification of message processing in <xref target="one-tree"> </xref> is based on a use case where each PE
        with receivers has joined and left once. Scalability-related
        conclusions for other patterns of changes of the set of
        receiver-connected PEs can be drawn by considering the cost of each
        approach for "a new PE joining" and "a PE leaving".</t>

        <t>For the "PIM LAN Procedures default" approach, in the case of a
        single SSM or SPT tree, the total amount of message processing across
        all nodes depends linearly on the number of PEs in the VPN when a PE
        joins such a tree. When "PIM LAN Procedures with explicit tracking"
        are used, the amount of processing is independent of the number of
        PEs.</t>

        <t>For the "BGP-based" approach:<list style="symbols">
            <t>In the case of a single SSM tree, the total amount of message
            processing across all nodes is independent of the number of PEs,
            for "a new PE joining" and "a PE leaving"; it also depends on how
            route reflectors are meshed, but not with linear dependency.</t>

            <t>In the case of an SPT tree for an ASM group, BGP has additional
            processing due to possible Source-Active autodiscovery
            routes:<list style="symbols">
                <t>when BGP-based C-multicast routing is used with inter-site
                shared trees, for the first PE joining (and last PE leaving) a
                given SPT, the processing of the corresponding Source-Active
                autodiscovery routes results in a processing cost linearly
                dependent on the number of PEs in the VPN ; for subsequent PEs
                joining (and non-last PEs leaving) there is no processing due
                to advertisement or withdrawal of Source-Active autodiscovery
                routes</t>

                <t>when BGP-based C-multicast routing is used without
                inter-site shared trees, the processing of Source-Active
                autodiscovery routes for an (S,G), happens independently of
                PEs joining and leaving the SPT for (S,G).</t>
              </list></t>
          </list></t>

        <t>In the case of a new PE having to join a shared tree for an
        ASM group G, we see the following:<list style="symbols">
            <t>the processing due to the PE joining the shared tree itself is
            the same as the processing required to setup an SSM tree, as
            described before (note that this does not happen when BGP-based
            C-multicast routing is used without inter-site shared trees)</t>

            <t>for each source for which the PE joins the SPT, the resulting
            processing cost is the same as one SPT tree, as described before ;
            <list style="symbols">
                <t>the conditions under which a PE will join the SPT for a
                given (C-S, C-G) are the same between the BGP-based with
                inter-site shared trees approach and the PIM-based approach,
                and depend solely on the SwitchToSptDesired policy in the PIM
                instances in the customer routers in the sites connected to
                the PE and/or in the VRF</t>

                <t>the conditions under which a PE will join the SPT for a
                given (C-S, C-G) differ between the BGP-based without
                inter-site shared trees approach and the PIM-based
                approach</t>

                <t>the SPT for a given (S,G) can be joined by the PE in the
                following cases:<list style="symbols">
                    <t>as soon as one router, or the VPN VRF on the PE, has
                    SwitchToSptDesired(S,G) being true</t>

                    <t>when BGP-based routing is used, and configured to not
                    use inter-site shared trees</t>
                  </list></t>

                <t>in other words, the only case where the PE will not join
                the SPT for (S,G) is when none of the routers in the sites of
                the VPN connected to the PE, nor the VPN VRF itself, will ever
                have SwitchToSptDesired(S,G) being true, with the additional
                condition, when BGP-based C-multicast routing is used, that
                inter-site shared trees are used</t>
              </list></t>
          </list></t>

        <t>Thus, when one PE joins a group G to which n sources are sending
        traffic, we note the following with regard to how the
        cost (in total amount of processing across all equipments) depends on
        the number of PEs:<list style="symbols">
            <t>in the general case (where any router in the site of the VPN
            connected to the PE, or the VRF itself, may have
            SwitchToSptDesired(S,G) being true):<list style="symbols">
                <t>for the "PIM LAN Procedure default" approach, the cost is
                linearly dependent on the number of PEs in the VPN, and
                linearly dependent on the number of sources</t>

                <t>for the "PIM LAN Procedures with explicit tracking"
                approach, the cost is linearly dependent on the number of
                sources and independent of the number of PEs in the VPN</t>

                <t>for the "BGP-based" approach, the cost is linearly
                dependent on the number of sources and, in the sub-case of
                the BGP-based approach used with inter-site shared trees, is
                also dependent on the number of PEs in the VPN, but only if
                the PE is the first to join the group or the SPT for some
                source sending to the group</t>
              </list></t>

            <t>else, under the assumption that routers in the sites of the VPN
            connected to the PE, and the VPN VRF itself, will never have
            SwitchToSptDesired(S,G) being true,
            then:<list style="symbols">
                <t>in the case of the PIM-based approaches, the cost is
                linearly dependent on the number of PEs in the VPN, and there
                is no dependency on the number of sources</t>

                <t>in the case of the BGP-based approach with inter-site
                shared trees, the cost is linearly dependent on the number of
                RRs, and there is no dependency on the number of sources</t>

                <t>in the case of the BGP-based approach without inter-site
                shared trees, the cost is linearly dependent on the number of
                RRs and on the number of sources</t>
              </list></t>
          </list>Hence, with the PIM default approach the overall cost across
        all equipment of any PE joining an ASM group G is always dependent
        on the number of PEs (the same holds for a PE that leaves), while the
        BGP-based and PIM explicit tracking approaches have a cost independent
        of the number of PEs (with the exception of the first PE joining the
        ASM group, for the BGP-based approach used without inter-site shared
        trees; in that case there is a dependency on the number of PEs).</t>

        <t>On the dependency on the number of sources: without making any
        assumption on the SwitchToSptDesired policy on PIM routers and VRFs of
        a VPN, we see that a PE joining an ASM group may induce a processing
        cost linearly dependent on the number of sources. Apart from this
        general case, under the condition where SwitchToSptDesired is
        always false on all PIM routers and VRFs of the VPN, then with the
        PIM-based approach, and with the BGP-based approach used with inter-site
        shared trees, the cost in amount of messages processed will be
        independent of the number of sources (it has to be noted that this
        condition depends on customer policy).</t>
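
        <t>The dependencies summarized in the two paragraphs above can be
        captured in a small lookup, shown here as an illustrative sketch only.
        The enumeration and function names are hypothetical, the functions
        return qualitative yes/no answers rather than cost figures, and the
        exception for the first PE joining the group in the BGP-based approach
        is deliberately not modelled.</t>

        <figure><artwork><![CDATA[
```python
from enum import Enum


class Approach(Enum):
    PIM_DEFAULT = "PIM LAN Procedure default"
    PIM_EXPLICIT = "PIM LAN Procedures with explicit tracking"
    BGP_SHARED = "BGP-based, with inter-site shared trees"
    BGP_NO_SHARED = "BGP-based, without inter-site shared trees"


def depends_on_pes(approach):
    """Does a PE join cost depend on the number of PEs?

    General case only; the first-PE-to-join exception of the
    BGP-based approach is not modelled (assumption of this sketch).
    """
    return approach is Approach.PIM_DEFAULT


def depends_on_sources(approach, spt_always_false):
    """Does a PE join cost depend on the number of sources?"""
    if not spt_always_false:
        # No assumption on SwitchToSptDesired: the cost may grow
        # linearly with the number of sources for every approach.
        return True
    # SwitchToSptDesired always false: only the BGP-based approach
    # without inter-site shared trees keeps the dependency.
    return approach is Approach.BGP_NO_SHARED
```
]]></artwork></figure>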
      </section>
    </section>

    <section title="Switching to S-PMSI">
      <t>[ the following point was fixed in version 07 of <xref target="I-D.ietf-l3vpn-2547bis-mcast"/>, and is here for reference
      only ]</t>

      <t><xref target="I-D.ietf-l3vpn-2547bis-mcast">Section 7.2.2.3 of</xref>
      proposes two approaches for how a source PE can decide when to start
      transmitting customer multicast traffic on an S-PMSI:</t>

      <t><list style="numbers">
          <t>The source PE sends multicast packets for the &lt;C-S, C-G&gt; on
          both the I-PMSI P-multicast tree and the S-PMSI P-multicast tree
          simultaneously for a pre-configured period of time, letting the
          receiver PEs select the new tree for reception, before switching to
          only the S-PMSI.</t>

          <t>The source PE waits for a pre-configured period of time after
          advertising the &lt;C-S, C-G&gt; entry bound to the S-PMSI before
          fully switching the traffic onto the S-PMSI-bound P-multicast
          tree.</t>
        </list>The first alternative has essentially two drawbacks:<list style="symbols">
          <t>&lt;C-S,C-G&gt; traffic is sent twice for some period of time,
          which would appear to be at odds with the motivation for switching
          to an S-PMSI in order to optimize the bandwidth used by the
          multicast tree for that stream.</t>

          <t>It is unlikely that the switchover can occur without packet loss
          or duplication if the transit delays of the I-PMSI P-multicast tree
          and the S-PMSI P-multicast tree differ.</t>
        </list></t>

      <t>By contrast, the second alternative has none of these drawbacks, and
      satisfies the requirement in <xref target="RFC4834">section 5.1.3
      of</xref>, which states that "[...] a multicast VPN solution SHOULD as
      much as possible ensure that client multicast traffic packets are
      neither lost nor duplicated, even when changes occur in the way a client
      multicast data stream is carried over the provider network". The second
      alternative also happens to be the one used in existing deployments.</t>

      <t>For these reasons, it is the authors' recommendation to mandate the
      implementation of the second alternative for switching to S-PMSI.</t>
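
      <t>The difference between the two alternatives can be illustrated with
      a simple timeline model. This is a sketch under stated assumptions, not
      a description of any implementation: the names are hypothetical, time
      is an abstract number, and for the first alternative the period of
      simultaneous transmission is assumed to start when the S-PMSI binding
      is advertised.</t>

      <figure><artwork><![CDATA[
```python
I_PMSI = "I-PMSI"
S_PMSI = "S-PMSI"


def trees_in_use(alternative, t, t_advertise, delay):
    """Trees carrying the (C-S, C-G) traffic at time t.

    alternative: 1 (send on both trees, then switch) or
                 2 (wait, then switch).
    t_advertise: time the S-PMSI binding is advertised.
    delay: the pre-configured period of time.
    """
    if alternative not in (1, 2):
        raise ValueError("alternative must be 1 or 2")
    if t < t_advertise:
        return {I_PMSI}          # before any switchover starts
    if t >= t_advertise + delay:
        return {S_PMSI}          # switchover complete
    # During the pre-configured period the alternatives differ.
    if alternative == 1:
        return {I_PMSI, S_PMSI}  # traffic is sent twice
    return {I_PMSI}              # traffic stays on the I-PMSI
```
]]></artwork></figure>

      <t>In this model, only the first alternative ever returns both trees,
      which is the duplicated transmission noted as its first drawback.</t>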
    </section>
  </back>
</rfc>
