Network Working Group                                         P. Francis
Internet-Draft                                                Cornell U.
Intended status: Standards Track                                   X. Xu
Expires: December 3, 2008                                         Huawei
                                                               June 2008


                    Intra-Domain Virtual Aggregation
                   draft-francis-idr-intra-va-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 3, 2008.

















Francis & Xu            Expires December 3, 2008                [Page 1]

Internet-Draft               Intra-Domain VA                   June 2008


Abstract

   Virtual Aggregation (VA) is a technique for shrinking the DFZ FIB
   size in routers (both IPv4 and IPv6).  This allows ISPs to extend the
   lifetime of existing routers, and allows router vendors to build FIBs
   with much less concern about the growth of the DFZ routing table.  VA
   does not shrink the size of the RIB.  VA may be deployed autonomously
   by an ISP (cooperation between ISPs is not required).  While VA can
   be deployed without changes to existing routers, doing so requires
   significant new management tasks.  This document describes changes to
   routers and BGP that greatly simplify the operation of VA.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Scope of this Document . . . . . . . . . . . . . . . . . .  3
     1.2.  Requirements notation  . . . . . . . . . . . . . . . . . .  4
     1.3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
     1.4.  Status as of July 2008 . . . . . . . . . . . . . . . . . .  5
   2.  Overview of Virtual Aggregation (VA) . . . . . . . . . . . . .  6
     2.1.  Mix of legacy and VA routers . . . . . . . . . . . . . . .  7
     2.2.  Summary of Tunnels and Paths . . . . . . . . . . . . . . .  8
   3.  Specification of VA  . . . . . . . . . . . . . . . . . . . . . 10
     3.1.  Requirements for VA  . . . . . . . . . . . . . . . . . . . 10
     3.2.  VA Operation . . . . . . . . . . . . . . . . . . . . . . . 10
       3.2.1.  Legacy Routers . . . . . . . . . . . . . . . . . . . . 10
       3.2.2.  Advertising and Handling Virtual Prefixes (VP) . . . . 11
       3.2.3.  Border VA Routers  . . . . . . . . . . . . . . . . . . 14
       3.2.4.  Advertising and Handling Sub-Prefixes  . . . . . . . . 14
       3.2.5.  Suppressing FIB Sub-prefix Routes  . . . . . . . . . . 15
   4.  Requirements Discussion  . . . . . . . . . . . . . . . . . . . 17
     4.1.  Response to router failure . . . . . . . . . . . . . . . . 17
     4.2.  Traffic Engineering  . . . . . . . . . . . . . . . . . . . 18
     4.3.  Incremental and safe deploy and start-up . . . . . . . . . 18
     4.4.  VA security  . . . . . . . . . . . . . . . . . . . . . . . 18
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 20
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 21
     6.1.  Properly Configured VA . . . . . . . . . . . . . . . . . . 21
     6.2.  Mis-configured VA  . . . . . . . . . . . . . . . . . . . . 21
   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
   Intellectual Property and Copyright Statements . . . . . . . . . . 25

1.  Introduction

   Constant DFZ routing table growth forces ISPs to upgrade router
   hardware when the FIB in routers is too small to hold the entire DFZ
   routing table.  It also forces router vendors to build FIBs that can
   handle expected growth in the DFZ routing table.  Virtual Aggregation
   (VA) is a method for allowing FIB size to be smaller than the full
   routing table, at the expense of a slight path length and router load
   increase [va-tech-report-08].  VA operates by organizing the IP (v4
   or v6) address space into Virtual Prefixes (VP), and using tunnels to
   aggregate the (regular) sub-prefixes within each VP.

   The changes required for BGP are relatively minor.  For instance, the
   BGP decision process for regular prefixes does not change.  Inter-
   domain BGP does not change in any way (the same AS-paths and exits
   are chosen).  Indeed, the RIB continues to hold the entire DFZ
   routing table.  The main change with VA is that routers suppress
   loading certain routes into the FIB.  In other words, while the FIB
   shrinks, the RIB stays the same size (or in fact is marginally
   larger).  While, all things being equal, it would be nice to shrink
   both the RIB and the FIB, as well as to make other improvements such
   as speeding up convergence, the changes required to do so would be
   relatively major.  This document takes the position that FIB size is
   more of a constraint than RIB size, and so relatively minor changes
   to the operation of routers and BGP can nevertheless fix an important
   problem.

   One of the most attractive things about VA is that it can be deployed
   autonomously by a single ISP---indeed it can be deployed today
   without changes in existing legacy routers (see [va-tech-report-08]).
   Doing so, however, requires significant new router management to
   configure VA.  The purpose of this document is to define new router
   operations and BGP attributes that greatly simplify the operation of
   VA, as well as improve its robustness and performance (compared to
   simply modifying the configuration of legacy routers).

1.1.  Scope of this Document

   The scope of this document is limited to Intra-domain operation of
   VA.  In other words, the case where a single ISP autonomously
   operates VA internally without any coordination with neighboring
   ISPs.

   Note that this document assumes that the VA "domain" (i.e. the unit
   of autonomy) is the AS (that is, different ASes run VA independently
   and without coordination).  The option of making confederations the
   VA domain is for further study.  For the remainder of this document,
   the terms ISP, AS, and domain are used interchangeably.

   This document applies equally to IPv4 and IPv6.

   VA may operate with a mix of upgraded routers ("VA routers") and
   legacy routers.  There are no topological restrictions placed on the
   mix of routers.

   VA makes heavy use of tunnels.  In principle, a variety of tunnels
   may be used---any tunnel that works for deploying a VPN may also be
   used for VA.  This document limits itself to the use of MPLS tunnels,
   and indeed the terms "tunnel" and "LSP" (Label Switched Path) are
   used somewhat interchangeably.  This document also generally assumes
   the use of the Label Distribution Protocol (LDP) as the default
   method of establishing LSPs [RFC5036].  Other methods of establishing
   LSPs may be used.  Future versions of this document may specify the
   use of other tunnel types.

1.2.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.3.  Terminology

   Aggregation Point Router (APR):  An Aggregation Point Router (APR) is
      a router that aggregates a Virtual Prefix (VP) by installing
      routes (into the FIB) for all of the sub-prefixes within the VP.
      APRs advertise the VP to other routers with BGP.  For each sub-
      prefix within the VP, APRs have a Label Switched Path (LSP) from
      themselves to the external peer where packets for that prefix
      should be delivered.  The egress router does penultimate hop
      popping so that the external peer does not see the MPLS header.
      Note in particular that it is the label map at the egress router,
      and not a FIB entry, that identifies to which external peer to
      send the packet.

   Install and Suppress:  The terms "install" and "suppress" are used to
      describe whether a RIB entry has been loaded into the FIB
      (installed) or not loaded into the FIB (suppressed).  In other
      words, the phrase "install a route" means "install a route into
      the FIB", and the phrase "suppress a route" means "do not install
      a route into the FIB".

   Legacy Router:  A router that does not run VA, and has no knowledge
      of VA.

   non-APR Router:  In discussing Virtual Prefixes (VPs), it is often
      necessary to distinguish between routers that are APRs for that
      VP, and routers that are not APRs for that VP (but of course may
      be APRs for other VPs not under discussion).  In these cases, the
      term "APR" will be taken to mean "a VA router that is an APR for
      the given VP", and the term "non-APR" will be taken to mean "a VA
      router that is not an APR for the given VP".  The term non-APR
      router will not be used to refer to legacy routers.

   Popular Prefix:  A popular prefix is a sub-prefix that is installed
      in a router in addition to the sub-prefixes it holds by virtue of
      being an Aggregation Point Router.  Normally popular prefixes are
      installed because a relatively large amount of traffic is
      delivered to destinations within that prefix.  The popular prefix
      allows packets to follow the shortest path.  Note that different
      routers do not need to have the same set of popular prefixes.

   Sub-Prefix:  A regular (physically aggregatable) prefix.  These are
      equivalent to the prefixes that would normally comprise the global
      routing table in the absence of VA.  A VA router will contain a
      sub-prefix entry either because the sub-prefix falls within a
      virtual prefix for which the router is an Aggregation Point, or
      because the sub-prefix is installed as a popular prefix.  Legacy
      routers hold the same sub-prefixes they hold today.

   VA router:  A router that operates Virtual Aggregation according to
      this specification.

   Virtual Prefix (VP):  A Virtual Prefix (VP) is a prefix used to
      aggregate its contained regular prefixes (sub-prefixes).  A VP is
      not physically aggregatable, and so it is aggregated at APRs
      through the use of tunnels.

1.4.  Status as of July 2008

   A "configuration-only" variant of VA (i.e. one that can be deployed
   with today's legacy routers) has been configured and tested on a
   small testbed of commercial routers, as described in
   [va-tech-report-08].  While this serves as proof that the data-plane
   portion of Virtual Aggregation works, this configuration is
   relatively complex, and there are some control-plane performance
   issues associated with the routers that we configured.  The changes
   specified by this document (i.e.  Section 3) have not yet been
   implemented, and still require vetting and testing.

2.  Overview of Virtual Aggregation (VA)

   For descriptive simplicity, this section starts by describing VA
   assuming that there are no legacy routers in the domain.  Section 2.1
   describes the additional functions required by VA routers to
   accommodate legacy routers.

   A key concept behind VA is to operate BGP as normal, and in
   particular to populate the RIB with the full DFZ routing table, but
   to suppress many or most prefixes from being loaded into the FIB.  By
   populating the RIB as normal, the changes required to BGP and to
   router operation are relatively minor.  The basic idea behind VA is
   quite simple.  The address space is partitioned into large prefixes
   --- larger than any aggregatable prefix in use today.  These prefixes
   are called virtual prefixes (VP).  Different VPs do not need to be
   the same size.  They may be a mix of /6, /7, /8 (for IPv4), and so
   on.  Each ISP can independently select the size of its VPs.

   VPs are not themselves physically aggregatable.  VA makes the VPs
   aggregatable through the use of tunnels, as follows.  Associated with
   each VP are one or more "Aggregation Point Routers" (APR).  An APR
   (for a given VP) is a router that installs routes for all sub-
   prefixes (i.e. real physically aggregatable prefixes) within the VP.
   By "install routes" here, we mean:

   1.  The route for each of the sub-prefixes is loaded into the FIB,
       and

   2.  there is a tunnel from the APR to the external peer that is the
       BGP NEXT_HOP for the route (though note that the tunnel header is
       stripped with penultimate hop popping before the packet reaches
       the external peer).

   The APR originates a BGP route to the VP.  This route is distributed
   within the domain, but not outside the domain.  With this structure
   in place, a packet transiting the ISP goes from the ingress router to
   the APR via a tunnel, and then from the APR to the external peer
   through another (penultimate hop popped) tunnel.

   Note that the AS-path is not affected at all by VA.  Furthermore, the
   external peer selected by the ISP is the same whether or not VA is
   operating.  This path may not follow the shortest path within the ISP
   (where shortest path is defined here as the path that would have been
   taken if VA were not operating), because the APR may not be on the
   shortest path between the ingress and egress routers.  When this
   happens, the packet experiences additional latency and creates extra
   load (by virtue of taking more hops than it otherwise would have).

   VA can avoid traversing the APR for selected routes by installing
   these routes in ingress routers.  In other words, even if an ingress
   router is not an APR for a given sub-prefix, it may install that sub-
   prefix into its FIB (noting again that the sub-prefix is already in
   the RIB).  Packets in this case are tunneled directly from the
   ingress to the egress, therefore taking shortest path.  These routes
   are called "Popular Prefixes", and are typically installed for sub-
   prefixes that carry a high volume of traffic.  Different routers may
   have different popular prefixes.  As such, an ISP may assign popular
   prefixes per router, per POP, or uniformly across the ISP.  A given
   router may have zero popular prefixes, or the majority of its FIB may
   consist of popular prefixes.  The effectiveness of popular prefixes
   relies on the fact that traffic volumes follow something like a
   power-law distribution: i.e. that 90% of traffic is destined to 10%
   of the destinations.  Internet traffic measurement studies over the
   years have consistently shown that traffic patterns follow this
   distribution, though there is no guarantee that they always will.
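
   As a rough illustration of how popular prefixes might be chosen for
   one router, the following Python sketch ranks sub-prefixes by
   measured traffic volume and keeps as many as fit in the FIB slots set
   aside for them.  The function names, the traffic counters, and the
   slot budget are assumptions made for this example; this document does
   not specify a selection algorithm.

```python
from typing import Dict, List


def pick_popular_prefixes(traffic_bytes: Dict[str, int],
                          fib_slots: int) -> List[str]:
    """Return up to fib_slots sub-prefixes, highest traffic volume first.

    traffic_bytes maps a sub-prefix to the bytes this router forwarded
    towards it during some measurement window (hypothetical input).
    """
    if fib_slots <= 0:
        return []
    # Sort by descending volume; break ties on the prefix string so the
    # result is deterministic.
    ranked = sorted(traffic_bytes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [prefix for prefix, _ in ranked[:fib_slots]]


def covered_share(traffic_bytes: Dict[str, int], chosen: List[str]) -> float:
    """Fraction of measured traffic that the chosen prefixes carry."""
    total = sum(traffic_bytes.values())
    if total == 0:
        return 0.0
    return sum(traffic_bytes[p] for p in chosen) / total
```

   Because each router can be given its own counters and its own budget,
   the same sketch covers per-router, per-POP, and ISP-wide assignment.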

   Note that for routing to work properly, every packet must sooner or
   later reach a router that has installed a sub-prefix route that
   matches the packet.  This would be the case for a given sub-prefix if
   every router has installed a route for that sub-prefix (which of
   course is the situation in the absence of VA).  If this is not the
   case, then there must be at least one Aggregation Point Router (APR)
   for the sub-prefix's virtual prefix (VP).  Ideally, every POP
   contains at least two APRs for every virtual prefix.  By having APRs
   in every POP, the latency imposed by routing to the APR is minimal
   (the extra hop is within the POP).  By having more than one APR,
   there is a redundant APR should one fail.  In practice it is often
   not possible to have an APR for every VP in every POP.  This is
   because some POPs may have only one or a few routers, and therefore
   there may not be enough cumulative FIB space in the POP to hold
   every sub-prefix.  Note that any router ("edge", "core", etc.) may be
   an APR.
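
   The install-or-suppress decision described in this section can be
   sketched as a single predicate applied to each RIB entry.  This is a
   minimal sketch under stated assumptions: the function name and its
   arguments are hypothetical, prefixes are represented with Python's
   standard ipaddress module, and the sets of VPs and popular prefixes
   are assumed to be known to the router already.

```python
import ipaddress


def should_install(route, all_vps, apr_vps, popular):
    """Return True to install `route` in the FIB, False to suppress it.

    route    -- ipaddress network object taken from the RIB
    all_vps  -- set of every VP advertised in the domain
    apr_vps  -- subset of all_vps for which this router is an APR
    popular  -- set of sub-prefixes chosen as popular on this router
    """
    if route in all_vps:
        return True  # routes to the VPs themselves are never suppressed
    covering = [vp for vp in all_vps
                if vp.version == route.version and route.subnet_of(vp)]
    if not covering:
        return True  # no VP covers this prefix, so nothing aggregates it
    if any(vp in apr_vps for vp in covering):
        return True  # this router is an APR for the covering VP
    return route in popular  # otherwise install only popular prefixes
```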

2.1.  Mix of legacy and VA routers

   It is important that an ISP be able to operate with a mix of "VA
   routers" (routers upgraded to operate VA as described in this
   document) and "legacy routers".  This allows ISPs to deploy VA in an
   incremental fashion and to continue to use routers that for whatever
   reason cannot be upgraded.  This document allows such a mix, and
   indeed places no topological restrictions on that mix.  It does,
   however, require that legacy routers establish and use LSPs, so that
   APRs can forward packets to them.  Specifically, when a legacy router
   is a border router, it must initiate LSPs to itself using LDP
   [RFC5036], and must use its own address as the BGP NEXT_HOP in routes
   received from external peers.

   VA prevents the routing loops that might otherwise occur when VA
   routers and legacy routers are mixed, as follows.  First of all, note
   that once a packet reaches a VA router (either because the ingress
   router is a VA router, or because a legacy router forwards the packet
   to a VA router), it will follow tunnels all the way to the egress
   router (Section 2).  If the egress router is a VA router, then the
   packet is forwarded via the LSP mapping.  If the egress router is a
   legacy router, then it will forward the packet using its FIB entry.

   If the ingress router is a legacy router, then it will forward the
   packet to the BGP NEXT_HOP via the associated tunnel.  (Note that in
   the unexpected case that some legacy router actually does not use the
   tunnel but rather forwards the packet to the IGP-resolved next hop,
   the packet will still work its way towards the egress router: it will
   either progress through a series of legacy routers (in which case the
   IGP prevents loops), or it will eventually reach a VA router.)

2.2.  Summary of Tunnels and Paths

   To summarize, the following tunnels are created:

   1.  From all routers to all APRs (noting that most VA routers are
       likely to be APRs).

   2.  From all routers to all legacy border routers.

   3.  From all routers to all external peers that are neighbors of VA
       border routers.
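
   The three classes of tunnel above amount to a set of addresses
   towards which every router needs an LSP.  The sketch below computes
   that set; the function and parameter names are illustrative only and
   the inputs are assumed to come from the ISP's own inventory.

```python
def tunnel_endpoints(aprs, legacy_border_routers, va_border_peers):
    """Return the set of addresses every router needs an LSP towards.

    aprs                  -- addresses of all APRs
    legacy_border_routers -- addresses of all legacy border routers
    va_border_peers       -- dict: VA border router -> its external peers
    """
    endpoints = set(aprs) | set(legacy_border_routers)
    for peers in va_border_peers.values():
        endpoints |= set(peers)  # tunnels end at the peers, not the border
    return endpoints
```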

   There are a number of possible paths that packets may take through an
   ISP, summarized in the following diagram.  Here, "VA" is a VA router,
   "LR" is a legacy router, the symbol "==>" represents a tunneled
   packet (through zero or more LSRs), "-->" represents an untunneled
   packet, and "(pop)" represents penultimate hop popping.  (Note that
   the external peer may actually be a legacy router or a VA router---it
   doesn't matter (and isn't known) to the ISP.)

           Ingress    Some       APR         Egress     External
           Router     Router     Router      Router     Peer
           -------    ------     ------      ------     --------
       1.    VA===================>VA=========>VA(pop)====>LR

       2.    VA===================>VA=========>LR--------->LR

       3.    VA===============================>VA(pop)====>LR

       4.    VA===============================>LR--------->LR

       5.    LR===============================>VA(pop)====>LR

       6.    LR===============================>LR--------->LR

       (the following two are not expected, but may exist with
        some legacy router)

       7.    LR------->VA (remaining path as in 1 to 4 above)

       8.    LR------->LR--------------------->LR--------->LR


   The first and second paths represent the case where the ingress
   router does not have a popular prefix for the destination, and must
   tunnel the packet to an APR.  The third and fourth paths represent
   the case where the ingress router does have a popular prefix for the
   destination, and so tunnels the packet directly to the egress.  The
   fifth and sixth paths are similar, but where the ingress is a legacy
   router (and effectively has the popular prefix by virtue of holding
   the entire DFZ routing table).  (Note that some ISPs have only
   partial routing tables in their customer-facing edge routers, and
   default route to a router that holds the full DFZ.  This case is not
   shown here.)  Finally, paths 7 and 8 represent the unexpected case
   where legacy routers use an IGP-resolved next hop rather than a
   tunnel.
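
   Paths 1 through 6 can be derived from three facts: the type of the
   ingress router, whether the ingress has a specific route installed,
   and the type of the egress router.  The following sketch encodes that
   reasoning using the diagram's notation.  The function name and the
   string encoding are assumptions made for illustration, and the
   unexpected paths 7 and 8 are deliberately not modelled.

```python
def forwarding_path(ingress: str, ingress_has_route: bool, egress: str) -> str:
    """Describe the expected path using the diagram's notation.

    ingress, egress   -- "VA" or "LR"
    ingress_has_route -- True if the ingress has the sub-prefix installed
                         (for a VA ingress, a popular prefix)
    """
    if ingress not in ("VA", "LR") or egress not in ("VA", "LR"):
        raise ValueError("router type must be 'VA' or 'LR'")
    # A legacy ingress is assumed to hold the entire DFZ routing table.
    direct = ingress == "LR" or ingress_has_route
    # A VA egress pops the label; a legacy egress forwards using its FIB.
    last_hop = "VA(pop)==>peer" if egress == "VA" else "LR-->peer"
    if direct:
        return f"{ingress}==>{last_hop}"
    return f"{ingress}==>APR==>{last_hop}"
```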
















3.  Specification of VA

   This section describes how to operate VA.  It starts with a brief
   discussion of requirements, followed by a specification of router
   support for VA.

3.1.  Requirements for VA

   While the core requirement is of course to be able to manage FIB
   size, this must be done in a way that:

   o  is robust to router failure,

   o  allows for traffic engineering,

   o  allows for existing inter-domain routing policies,

   o  operates in a predictable manner and is therefore possible to
      test, debug, and reason about performance (i.e. establish SLAs),

   o  can be safely installed, tested, and started up,

   o  can be incrementally deployed, and in particular can be operated
      in an AS with a mix of VA-capable and legacy routers,

   o  accommodates existing security mechanisms such as ingress
      filtering and DoS defense,

   o  does not introduce significant new security vulnerabilities.

   In short, operation of VA must not significantly affect the way ISPs
   operate their networks today.  Section 4 discusses the extent to
   which these requirements are met by the design presented in
   Section 3.2.  (Note that these requirements are generated by the
   authors, and do not (yet) reflect a consensus within the IDR working
   group.)

3.2.  VA Operation

   In this section, the detailed operation of VA is specified.

3.2.1.  Legacy Routers

   VA can operate with a mix of VA and legacy routers.  Although legacy
   routers have no notion of VA, they nevertheless MUST satisfy the
   following requirements:

   1.  Each legacy router MUST initiate LSPs to itself.  Specifically,
       it initiates Downstream Unsolicited tunnels to all neighbors
       using LDP [RFC5036], with its own full address (/32 if IPv4, /128
       if IPv6) as the Forwarding Equivalence Class (FEC).

   2.  When forwarding externally-received routes over iBGP, the BGP
       NEXT_HOP attribute MUST be set to the legacy router itself (the
       FEC of the LSPs).

   3.  Legacy routers MUST participate fully in LDP.  In other words,
       they MUST have all tunnels listed in Section 2.2.

   4.  Legacy routers MUST either store the entire DFZ routing table in
       their FIB, or they must be configured with a working default
       route.

   As long as legacy routers install LSPs as described here, there are
   no topological restrictions on the legacy routers.  They may be
   freely mixed with VA routers without the possibility of forming
   sustained loops (Section 2.1).  (Note that it may be possible to
   operate VA even when legacy routers do not establish LSPs.  However,
   it appears that doing so is relatively complex.  If it is determined
   that operating MPLS with legacy routers is impossible in some cases,
   the need to do so may be revisited.)

3.2.2.  Advertising and Handling Virtual Prefixes (VP)

3.2.2.1.  Limitations on Virtual Prefixes

   From the point of view of best-match routing semantics, VPs are
   treated identically to any other prefix.  In other words, if the
   longest matching prefix is a VP, then the packet is routed towards
   the VP.  If a packet matching a VP reaches an Aggregation Point
   Router (APR) for that VP, and the APR does not have a better match,
   then the packet is discarded by the APR (just as a router that
   originates any prefix will discard a packet that does not have a
   specific match within that prefix).
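
   The best-match behaviour described above can be illustrated with a
   small longest-prefix-match sketch.  The FIB here is simply a Python
   dictionary from prefix to an action string; that representation, the
   action strings, and the function name are assumptions made for the
   example and say nothing about how a real FIB is built.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match of address `dst` against `fib`.

    fib -- dict mapping ipaddress network objects to an action string
    Returns the action of the longest matching prefix, or None.
    """
    addr = ipaddress.ip_address(dst)
    matches = [n for n in fib if n.version == addr.version and addr in n]
    if not matches:
        return None
    return fib[max(matches, key=lambda n: n.prefixlen)]


net = ipaddress.ip_network
# At an APR the VP itself maps to "discard"; sub-prefixes map to tunnels.
apr_fib = {net("198.0.0.0/7"): "discard",
           net("198.51.100.0/24"): "tunnel:peer-A"}
# At a non-APR only the VP is installed and it points at the APR.
non_apr_fib = {net("198.0.0.0/7"): "tunnel:APR"}
```

   With these tables, a packet to 198.51.100.7 is tunnelled to the APR
   by the non-APR and then on to the external peer by the APR, while a
   packet to an address in the VP with no sub-prefix is discarded at the
   APR.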

   The overall semantics of VPs, however, are subtly different from
   those of real prefixes (well, maybe not so subtly).  Without VA, when
   a router originates a route for a (real) prefix, the expectation is
   that the addresses within the prefix are within the originating AS
   (or a customer of the AS).  For VPs, this is not the case.  APRs
   originate VPs whose sub-prefixes exist in different ASes.  Because of
   this, it is important that VPs not be advertised across AS
   boundaries.

   It is up to individual domains to define their own VPs.  It is
   important that VPs be "larger" (span a larger address space) than any
   real sub-prefix.  If a VP is smaller than a real prefix, then packets
   that match the real prefix will nevertheless be routed to an APR
   owning the VP, at which point the packet will be dropped if it does
   not match a sub-prefix within the VP (Section 6).
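
   An operator could audit a planned VP list offline against the
   prefixes currently in the RIB.  The sketch below only reports
   conflicts and never changes the configured VPs; the function name and
   its inputs are hypothetical.

```python
import ipaddress


def vps_not_larger_than_real(vps, real_prefixes):
    """Return (vp, real) pairs where a VP is equal to or inside a real prefix.

    Both arguments are iterables of ipaddress network objects.
    """
    problems = []
    for vp in vps:
        for real in real_prefixes:
            # subnet_of() is also True for equal prefixes, which is a
            # conflict as well because the VP must be strictly larger.
            if vp.version == real.version and vp.subnet_of(real):
                problems.append((vp, real))
    return problems
```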

3.2.2.2.  Aggregation Point Routers (APR)

   Any router may be configured as an Aggregation Point Router (APR) for
   one or more Virtual Prefixes (VP).  For each VP for which a router is
   an APR, the router does the following:

   1.  The APR MUST originate a BGP route to the VP [RFC4271].  In this
       route, the NLRI are all of the VPs for which the router is an
       APR.  The ORIGIN is set to INCOMPLETE (value 2), the AS number of
       the APR's AS is used in the AS_PATH, and the NEXT_HOP is set to
       the address of the APR.  The ATOMIC_AGGREGATE and AGGREGATOR
       attributes are not included.  The LOCAL_PREF attribute is
       optional.

   2.  The APR MUST attach a regular Extended Communities Attribute
       [RFC4360] to the route with a (newly-defined) well-known value of
       TBD "Virtual Prefix Route" (value to be assigned by IANA).  The
       Transitive Bit MUST be set to value 1 (the community is non-
       transitive across ASes).  The purpose of this attribute is
       twofold:

       1.  To ensure that other VA routers do not suppress this route
           from the FIB.

       2.  To inform other VA routers as to what part of the address
           space is covered by VPs, so that other VA routers can install
           all sub-prefixes not belonging to any VP.

       This route is sent to VA and legacy routers alike.  Note that
       legacy routers will not recognize the Extended Communities
       Attribute value, but will pass it to iBGP neighbors (because it
       is a transitive BGP attribute), and will not pass it to eBGP
       neighbors (because the T bit is set to non-transitive).  (Note
       that the NO_EXPORT Communities Attribute [RFC1997] could also be
       used to prevent transmission across AS boundaries, and indeed if
       there are legacy routers that don't recognize Extended
       Communities Attributes, then the NO_EXPORT Communities Attribute
       MUST be attached.)

   3.  The APR MUST initiate LSPs terminating at itself.  Specifically,
       it initiates Downstream Unsolicited tunnels to all neighbors
       using LDP [RFC5036], with the address that it used in the BGP
       NEXT_HOP attribute of the VP route as the FEC.  Note that VA
       routers and legacy routers alike MUST have tunnels to the APR.

   4.  If a packet is received at the APR whose best match is the VP
       (i.e. it matches the VP but not any sub-prefixes within the VP),
       then the packet MUST be discarded (see Section 3.2.2.1).
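
   The attribute settings in steps 1 and 2 can be summarised in a small
   data structure.  This is a sketch only: the class, the field names,
   and the community label are invented for illustration, and the
   "Virtual Prefix Route" value is a named placeholder because the real
   value has not yet been assigned.

```python
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_INCOMPLETE = 2  # ORIGIN value for INCOMPLETE, as stated in step 1
VP_ROUTE_COMMUNITY = "virtual-prefix-route"  # placeholder, value is TBD


@dataclass
class VpRoute:
    nlri: List[str]        # all VPs for which the router is an APR
    origin: int
    as_path: List[int]
    next_hop: str          # also used as the FEC of the LSPs in step 3
    # Each entry is (community label, non_transitive flag).
    ext_communities: List[Tuple[str, bool]]


def build_vp_route(vps: List[str], local_as: int, apr_address: str) -> VpRoute:
    """Assemble the VP route an APR originates (illustrative only)."""
    if not vps:
        raise ValueError("an APR must be configured with at least one VP")
    return VpRoute(
        nlri=sorted(vps),
        origin=ORIGIN_INCOMPLETE,
        as_path=[local_as],
        next_hop=apr_address,
        # T bit set to 1: the community is non-transitive across ASes.
        ext_communities=[(VP_ROUTE_COMMUNITY, True)],
    )
```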

3.2.2.2.1.  Selecting APRs

   An ISP is free to select APRs however it chooses.  The details of
   this are outside the scope of this document.  Nevertheless, a few
   comments are made here.  In general, APRs should be selected such
   that the distance to the nearest APR for any VP is small---ideally
   within the same POP.  Depending on the number of routers in a POP,
   and the sizes of the FIBs in the routers relative to the DFZ routing
   table size, it may not be possible for all VPs to be represented in a
   given POP.  In addition, there should be multiple APRs for each VP,
   again ideally in each POP, so that the failure of one does not unduly
   disrupt traffic.

   APRs should be statically assigned.  They may also, however, be
   dynamically assigned, for instance in response to APR failure.  For
   instance, each router may be assigned as a backup APR for some other
   APR.  If the other APR crashes (as indicated by the withdrawal of its
   routes to its VPs), the backup APR can install the appropriate sub-
   prefixes and advertise the VP as specified above.  Note that doing so
   may require it to first remove some popular prefixes from its FIB to
   make room.
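
   The backup behaviour described above, including the eviction of
   popular prefixes to make room, might look roughly as follows.  The
   FIB is modelled as a plain set of prefixes with a fixed capacity, and
   all names are assumptions for this sketch; advertising the VP
   afterwards is not modelled.

```python
def take_over_vp(fib, popular_by_rank, vp_sub_prefixes, capacity):
    """Install the failed APR's sub-prefixes, evicting popular prefixes.

    fib             -- set of prefixes currently installed
    popular_by_rank -- popular prefixes, most popular first
    vp_sub_prefixes -- sub-prefixes of the VP being taken over
    capacity        -- maximum number of FIB entries
    Returns (new_fib, evicted).  Raises RuntimeError if the sub-prefixes
    cannot fit even after every popular prefix has been evicted.
    """
    subs = set(vp_sub_prefixes)
    needed = subs - fib
    # Popular prefixes inside the VP are kept; they are needed anyway.
    evictable = [p for p in popular_by_rank if p in fib and p not in subs]
    new_fib = set(fib)
    evicted = []
    while len(new_fib) + len(needed) > capacity and evictable:
        victim = evictable.pop()  # least popular prefix goes first
        new_fib.remove(victim)
        evicted.append(victim)
    if len(new_fib) + len(needed) > capacity:
        raise RuntimeError("not enough FIB space to act as backup APR")
    return new_fib | needed, evicted
```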

   Note that, although VPs must be bigger than real prefixes, there is
   intentionally no mechanism designed to automatically ensure that this
   is the case.  Such a mechanism would be dangerous.  For instance, if
   an ISP somewhere advertised a very large prefix (a /4, say), then
   this would cause APRs to throw out all VPs that are smaller than
   this.  For this reason, VPs must be set through static configuration
   only.

3.2.2.3.  Non-APR Routers

   A non-APR router receiving one or more BGP routes for a VP knows that
   they are VPs because of the "Virtual Prefix Route" Extended
   Communities attribute.  The router MUST install one such route in its
   FIB (i.e. it MUST NOT be suppressed).  When there are multiple routes
   to APRs for a given VP, the router may select among any of the
   routes, but presumably it will select the nearest one.

   The non-APR MUST use the tunnel to forward packets to the selected
   APR.  This is both because a different non-APR may have selected a
   different APR, and because there may be a legacy router between the
   non-APR and the selected APR.  Either way, an untunneled packet could
   form a loop.

   The router MUST advertise the VP to its iBGP peers.  The router MUST
   NOT advertise the VP to eBGP peers.  (This should go without saying,
   since the Extended Communities T bit is set to be non-transitive.)
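
   As an illustration of the non-transitive marking, the sketch below
   builds a regular Extended Community as laid out in [RFC4360], where
   the T bit (0x40 of the type octet) set to 1 means non-transitive.
   The low-order type bits and the 7-octet value are placeholders, since
   no code point has been assigned, and the function names are
   hypothetical.

```python
T_BIT_NON_TRANSITIVE = 0x40   # second-highest bit of the type octet
PLACEHOLDER_TYPE_BITS = 0x3F  # NOT an assigned value; illustration only


def encode_vp_community(value: bytes = bytes(7)) -> bytes:
    """Return an 8-octet regular Extended Community, non-transitive."""
    if len(value) != 7:
        raise ValueError("a regular type carries a 7-octet value")
    type_octet = T_BIT_NON_TRANSITIVE | PLACEHOLDER_TYPE_BITS
    return bytes([type_octet]) + value


def is_non_transitive(community: bytes) -> bool:
    """True if the T bit of the type octet is set."""
    return bool(community[0] & T_BIT_NON_TRANSITIVE)
```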

   When an APR fails, routers MUST select another APR to send packets to
   (if there is one).  This happens through normal internal BGP
   convergence mechanisms.  If internal BGP convergence is not fast
   enough, the IGP may be used instead.  This requires that BGP speakers
   advertise multiple routes to a given VP, so that routers have an
   immediate choice of multiple APRs.  When an APR fails, IGP will
   quickly notify routers, which can then immediately select another APR
   (even before BGP converges).  Whether this approach is needed is for
   further study.
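
   The APR selection and failover described in this section can be
   sketched as below.  This is a minimal illustration, not a definitive
   implementation: the function name and both input structures are
   assumptions, with each candidate APR identified by the address it
   advertises and ranked by IGP cost.

```python
from typing import Dict, Optional, Set


def select_apr(vp_routes: Dict[str, int],
               igp_reachable: Set[str]) -> Optional[str]:
    """Pick the APR to which packets for one VP are tunneled.

    vp_routes maps each APR's address to its IGP cost; igp_reachable
    is the set of addresses the IGP currently reports as reachable.
    """
    candidates = {apr: cost for apr, cost in vp_routes.items()
                  if apr in igp_reachable}
    if not candidates:
        return None  # no APR is left for this VP
    # The nearest APR wins; ties are broken by address for determinism.
    return min(candidates, key=lambda apr: (candidates[apr], apr))
```

   Because the router already holds routes to several APRs, removing a
   failed APR from the reachable set yields the next nearest one without
   waiting for BGP to converge.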

3.2.3.  Border VA Routers

   VA routers that are border routers MUST do the following:

   1.  They MUST initiate LSPs to their external peers.  Specifically,
       they initiate Downstream Unsolicited tunnels to all neighbors
       using LDP [RFC5036], with the full address of their external
       peers as the FEC.

   2.  The border router MUST do penultimate-hop popping of the MPLS
       header (i.e. the external peer MUST NOT receive an MPLS header).

   3.  When forwarding externally-received routes over iBGP, the BGP
       NEXT_HOP attribute MUST be set to the external peer (i.e. the FEC
       of the LSPs).

   (Note that an alternative approach would be to use stacked labels,
   with the outer label terminating at the border router, and the inner
   label identifying the external peer and distributed in BGP as
   described in [RFC3107].  This approach requires that fewer tunnels be
   installed by LDP.  The need for this approach is for further study.)

3.2.4.  Advertising and Handling Sub-Prefixes

   Sub-prefixes are advertised and handled by BGP as normal.  VA does
   not affect this behavior.  The only difference in the handling of
   sub-prefixes is that they might not be installed in the FIB, as
   described in Section 3.2.5.

   In those cases where the route is installed, packets forwarded to
   prefixes external to the AS MUST be transmitted via the LSP
   established as described in Section 3.2.3.

3.2.5.  Suppressing FIB Sub-prefix Routes

   Any route not labeled as a VP (i.e. through the "Virtual Prefix
   Route" Extended Communities attribute) is taken to be a sub-prefix.
   The following rules are used to determine if a sub-prefix route can
   be suppressed (not loaded into the FIB).

   1.  If the router is an APR, a route for every sub-prefix within the
       VP MUST be installed.

   2.  If a non-APR router has a sub-prefix route that does not belong
       to any VP, then the route must be installed.  This may occur
       because the ISP hasn't defined a VP covering that prefix, for
       instance during an incremental deployment buildup.  It may also
       occur because all APRs for a given VP have crashed.  Note that
       when the last APR for a given VP in an AS goes down, every router
       in the AS will suddenly start installing all of the sub-prefixes
       in the FIB.  This is not a desirable situation and should be
       avoided.  One way would be for ISPs to ensure that there are
       enough APRs that this doesn't occur.  Another would be for
       routers to proactively start installing sub-prefixes when, say,
       the second-to-last APR crashes.  How this is best handled is
       outside the scope of this document.

   3.  All other sub-prefix routes MAY be suppressed.  Such "optional"
       sub-prefixes that are nevertheless installed are referred to as
       popular prefixes.
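
   The three rules above can be expressed as a single predicate.  The
   sketch below is illustrative only; the function and parameter names
   are hypothetical, and prefixes are given as strings parsed with
   Python's standard ipaddress module.

```python
import ipaddress


def may_suppress(sub_prefix, known_vps, apr_vps):
    """Return True if the sub-prefix route MAY be left out of the FIB.

    known_vps: VPs for which this router currently holds a VP route.
    apr_vps:   VPs for which this router is itself an APR.
    """
    net = ipaddress.ip_network(sub_prefix)

    def covers(vp):
        vp_net = ipaddress.ip_network(vp)
        # subnet_of() requires both networks to be the same IP version.
        return vp_net.version == net.version and net.subnet_of(vp_net)

    # Rule 1: an APR installs every sub-prefix within its VPs.
    if any(covers(vp) for vp in apr_vps):
        return False
    # Rule 2: a sub-prefix not covered by any VP must be installed.
    if not any(covers(vp) for vp in known_vps):
        return False
    # Rule 3: all other sub-prefix routes may be suppressed.
    return True
```

   A sub-prefix for which this returns True and which is installed
   anyway is, in the terminology above, a popular prefix.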

3.2.5.1.  Selecting Popular Prefixes

   Individual routers may independently choose which sub-prefixes are
   popular prefixes.  There is no need for different routers to select
   the same sub-prefixes.  There is therefore significant leeway as to
   how routers select popular prefixes.  Several basic approaches are
   outlined here.

   1.  High-volume prefixes: By installing high-volume prefixes as
       popular prefixes, the latency and load associated with the longer
       path required by VA is minimized.  The expectation here is that
       an ISP will measure its traffic volume over time (days or a few
       weeks), and statically configure high-volume prefixes as popular
       prefixes.  There is strong evidence that prefixes that are high-
       volume tend to remain high-volume over multi-day or multi-week
       timeframes (though not necessarily at short timeframes like
       minutes or seconds).  Since a static list of popular prefixes can
       be very large (nearly as large as the global routing table
       itself, especially early on), routers must be configurable for
       large lists of popular prefixes without any significant
       performance penalty (i.e. the time it takes to populate the FIB).
       High-volume prefixes may also be installed dynamically.  In other
       words, a router measures its own traffic volumes, and installs
       and removes popular prefixes in response to short term traffic
       load.  The downside of this approach is that it complicates
       debugging network problems.  If packets are being dropped
       somewhere in the network, it is more difficult to find out where
       if the selected path can change dynamically.

   2.  SLA-constrained prefixes: Even though some prefixes may be low-
       volume, packets to those prefixes may have SLAs associated with
       them (i.e. maximum latency or packet drop rate).  ISPs may wish
       to install such prefixes.

   3.  Policy-based: Even after the APR-required prefixes, high-volume
       prefixes, and SLA-constrained prefixes have been installed, there
       may still be space for additional popular prefixes.  In this
       case, an ISP may wish to populate the FIB based on some policy.
       For instance, there can be a policy saying that customer prefixes
       are installed.  Alternatively, a router could be assigned as the
       APR for additional prefixes, for instance for additional
       robustness.  Or, prefixes from specific neighbor ISPs, or
       prefixes with smaller AS-paths may be selected.
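
   One way to combine these approaches is sketched below: the FIB space
   left after the non-optional entries is filled first with
   SLA-constrained prefixes and then with prefixes ranked by measured
   volume.  The names, the input format, and the choice to place
   SLA-constrained prefixes ahead of volume-ranked ones are assumptions
   made for this example only.

```python
def choose_popular(volume_by_prefix, required, sla_prefixes,
                   fib_capacity):
    """Return the popular prefixes to install, in order of priority.

    volume_by_prefix: measured traffic volume per sub-prefix (how it
                      is measured is outside this sketch).
    required:         entries that are not optional, such as the
                      sub-prefixes an APR must hold.
    sla_prefixes:     prefixes with SLAs, installed before the
                      volume-ranked ones (a choice of this sketch).
    """
    budget = fib_capacity - len(required)
    ranked = sorted(volume_by_prefix, key=volume_by_prefix.get,
                    reverse=True)
    chosen = []
    for prefix in list(sla_prefixes) + ranked:
        if len(chosen) >= budget:
            break
        if prefix not in required and prefix not in chosen:
            chosen.append(prefix)
    return chosen
```

   Since routers need not agree on their popular prefixes, each router
   can run such a selection independently with its own measurements.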

4.  Requirements Discussion

   This section describes the extent to which VA satisfies the list of
   requirements given in Section 3.1.

4.1.  Response to router failure

   VA introduces a new failure mode in the form of Aggregation Point
   Router (APR) failure.  There are two basic approaches to protecting
   against APR failure: static APR redundancy and dynamic APR
   assignment (see Section 3.2.2.2.1).  In static APR redundancy, enough
   APRs are assigned for each Virtual Prefix (VP) so that if one goes
   down, there are others to absorb its load.  Failover to a static
   redundant APR is automatic with existing BGP mechanisms.  If an APR
   crashes, BGP will cause packets to be routed to the next nearest APR.
   Nevertheless, there are three concerns here: convergence time, load
   increase at the redundant APR, and latency increase for diverted
   flows.

   Regarding convergence time, note that, while fast-reroute mechanisms
   apply to the rerouting of packets to a given APR or egress router,
   they don't apply to APR failure.  Convergence time was discussed in
   Section 3.2.2.3, which noted that normal internal BGP convergence
   handles APR failure and that, if this is not fast enough, the IGP
   may be used instead.

   Regarding load increase, in general this is relatively small.  This
   is because substantial reductions in FIB size can be achieved with
   almost negligible increase in load.  For instance,
   [va-tech-report-08] shows that a 5x reduction in FIB size yields a
   less than one percent increase in load overall.  Given this,
   depending on the configuration of redundant APRs, failure of one APR
   increases the load of its backups by only a few percent.  This is
   well within the variation seen in normal traffic loads.

   Regarding latency increase, some flows may see a significant increase
   in delay (specifically, an increase that puts them outside their SLA
   boundary).  Normally a redundant APR would be placed within the same
   POP, and so increased latency would be minimal (assuming that the
   added load is also small, so that there is no significant queuing
   delay).  It is not always possible, however, to have an APR for every
   VP within every POP, much less a redundant APR within every POP, and
   so sometimes failure of an APR will result in significant latency
   increases for a small fraction of traffic.

4.2.  Traffic Engineering

   VA complicates traffic engineering because the placement of APRs and
   the selection of popular prefixes influence how packets flow.  (As
   noted in Section 4.1, however, the increase in load is likely to be
   minimal, and so the effect on traffic engineering should not be
   great.)  Since the majority of packets may be forwarded by popular
   prefixes (and therefore follow the shortest path), it is particularly
   important that popular prefixes be selected appropriately.  As
   discussed in Section 3.2.5.1, there are static and dynamic approaches
   to this.  [va-tech-report-08] shows that high-volume prefixes tend to
   stay high-volume for many days, and so a static strategy is probably
   adequate.  VA can operate correctly using either RSVP-TE or LDP to
   establish tunnels.

4.3.  Incremental and safe deploy and start-up

   It must be possible to install and configure VA in a safe and
   incremental fashion, as well as start it up when routers reboot.
   This document allows for a mixture of VA and legacy routers, allows a
   fraction or all of the address space to fall within virtual prefixes,
   and allows different routers to suppress different FIB entries
   (including none at all).  As a result, it is generally possible to
   deploy and test VA in an incremental fashion.  MPLS and LDP must
   first be operational everywhere; once that is done, an ISP can
   incrementally increase the number of VA routers, the number of VPs,
   and the number of suppressed FIB entries over time.

   Likewise, routers can bootstrap VA by first bringing up the IGP, then
   establishing LSPs, then establishing routes to all required
   sub-prefixes, and finally advertising VPs.
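
   The start-up ordering can be illustrated with a small tracker that
   refuses any step attempted out of order.  The step names and the
   class are hypothetical and serve only to make the sequence explicit.

```python
BOOT_STEPS = ("igp_up", "lsps_established",
              "sub_prefixes_installed", "vps_advertised")


class VaBootstrap:
    """Hypothetical tracker that enforces the start-up order."""

    def __init__(self):
        self.done = []

    def complete(self, step):
        if len(self.done) == len(BOOT_STEPS):
            raise RuntimeError("bootstrap already complete")
        expected = BOOT_STEPS[len(self.done)]
        if step != expected:
            raise RuntimeError(
                f"{step!r} attempted before {expected!r}")
        self.done.append(step)

    def ready(self):
        return len(self.done) == len(BOOT_STEPS)
```

   Advertising VPs last means that no other router selects this router
   as an APR before it is able to forward to the sub-prefixes.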

4.4.  VA security

   Regarding ingress filtering, because in VA the RIB is effectively
   unchanged, routers contain the same information they have today for
   installing ingress filters [RFC2827].  Presumably, installing an
   ingress filter in the FIB takes up some memory space.  Since ingress
   filtering is most effective at the "edge" of the network (i.e. at the
   customer interface), the number of FIB entries for ingress filtering
   should remain relatively small---equal to the number of prefixes
   owned by the customer.  Whether this is true in all cases remains for
   further study.

   Regarding DoS attacks, there are two issues that need to be
   considered.  First, does VA result in new types of DoS attacks?
   Second, does VA make it more difficult to deploy DoS defense systems?
   Regarding the first issue, one possibility is that an attacker
   targets a given router by flooding the network with traffic to
   prefixes that are not popular, and for which that router is an APR.
   This would cause a disproportionate amount of traffic to be forwarded
   to the APR(s).  While it is up to individual ISPs to decide if this
   attack is a concern, it does not strike the authors that this attack
   is likely to significantly worsen the DoS problem.

   Regarding DoS defense system deployment, more input about specific
   systems is needed.  It is the authors' understanding, however, that
   at least some of these systems use dynamically established routing
   table entries to divert victims' traffic into LSPs that carry the
   traffic to scrubbers.  The expectation is that this mechanism simply
   overrides whatever route is in place (with or without VA), and so
   the operation of VA should not limit the deployment of these types of
   DoS defense systems.  Nevertheless, more study is needed here.

5.  IANA Considerations

   This document requires the following number assignment from IANA:

   o  A regular non-transitive Extended Communities Attribute value
      meaning "Virtual Prefix Route" [RFC4360] (see Section 3.2.2.2).

6.  Security Considerations

   We consider the security implications of VA under two scenarios: one
   where VA is configured and operated correctly, and one where it is
   mis-configured.  A cornerstone of VA operation is that the basic
   behavior of BGP doesn't change, especially inter-domain.  Among other
   things, this makes it easier to reason about security.

6.1.  Properly Configured VA

   If (intra-domain) VA is configured and operated properly, then the
   external behavior of an AS does not change.  The same upstream ASes
   are selected, and the same prefixes and AS-paths are advertised.
   Therefore, a properly configured VA domain has no security impact on
   other domains.

   This document discusses intra-domain security concerns in
   Section 4.4, which argues that any new security concerns appear to be
   relatively minor.

6.2.  Mis-configured VA

   Intra-domain VA introduces the possibility that a VP is advertised
   outside of an AS.  This should in fact be a very low-probability
   event, but it is considered here nonetheless.

   If an AS leaks a large VP (i.e. larger than any real prefix), then
   the impact is minimal.  Smaller prefixes will be preferred because of
   best-match semantics, and so the only impact is that packets that
   otherwise have no matching routes will be sent to the misbehaving AS
   and dropped there.  If an AS leaks a small VP (i.e. smaller than a
   real prefix), then packets to destinations within that VP will be
   hijacked by the misbehaving AS and dropped.  This can happen with or
   without VA, and so doesn't represent a new security problem per se.
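
   The effect of best-match semantics on a leaked VP can be shown with a
   small longest-prefix-match sketch.  The prefixes and the "owner" and
   "leaker" labels are made up for illustration, and the function is not
   taken from any router implementation.

```python
import ipaddress


def best_match(dst, routes):
    """Longest-prefix match; routes maps prefix string -> origin."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), origin)
               for p, origin in routes.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


routes = {"198.51.100.0/24": "owner",   # real prefix
          "198.0.0.0/7": "leaker"}      # leaked large VP
print(best_match("198.51.100.1", routes))  # real prefix still wins
print(best_match("199.1.1.1", routes))     # previously unrouted

routes["198.51.100.0/25"] = "leaker"       # leaked small VP
print(best_match("198.51.100.1", routes))  # now drawn to the leaker
```

   In the first case only traffic that had no route at all is attracted
   by the leaked VP; in the second, the more specific leaked prefix
   takes traffic away from the real one.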

7.  Acknowledgements

   The authors would like to acknowledge the efforts of Xinyang Zhang
   and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an
   early variant of Inter-domain VA, and the efforts of Hitesh Ballani
   and Tuan Cao, who worked on the configuration-only variant of Intra-
   domain VA that works with legacy routers.  We would also like to
   thank Hitesh and Tuan, as well as Scott Brim, for their comments on
   this draft.

8.  References

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP Source
              Address Spoofing", BCP 38, RFC 2827, May 2000.

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, May 2001.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [va-tech-report-08]
              Francis, P., Ballani, H., and T. Cao, "Virtual
              Aggregation:  A Configuration-only Approach to Reducing
              FIB Size", Cornell Technical Report,
              http://hdl.handle.net/1813/11058, July 2008.

Authors' Addresses

   Paul Francis
   Cornell University
   4108 Upson Hall
   Ithaca, NY  14853
   US

   Phone: +1 607 255 9223
   Email: francis@cs.cornell.edu


   Xiaohu Xu
   Huawei Technologies
   No.3 Xinxi Rd., Shang-Di Information Industry Base, Hai-Dian District
   Beijing, Beijing  100085
   P.R.China

   Phone: +86 10 82836073
   Email: xuxh@huawei.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.










