<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
     by Daniel M Kohn (private) -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
]>

<!-- zzzz next line -->
<rfc category="info" ipr="trust200811" docName="draft-ietf-grow-va-02.txt">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes" ?>


    <front>
		<title abbrev="FIB Suppression">FIB Suppression with Virtual Aggregation</title>
	    <author initials ='P' surname ='Francis' fullname='Paul Francis'>
	           <organization abbrev="MPI-SWS">Max Planck Institute for Software Systems</organization>
            <address>
                <postal>
                    <street>Gottlieb-Daimler-Strasse</street>
                    <city>Kaiserslautern</city>
		    <!-- <region>NY</region> -->
                    <code>67633</code>
                    <country>Germany</country>
                </postal>
		<phone>+49 631 930 39600</phone>
                <email>francis@mpi-sws.org</email>
            </address> 
        </author>
	        <author initials ='X' surname ='Xu' fullname='Xiaohu Xu'>
            <organization abbrev="Huawei">Huawei Technologies</organization>
            <address>
                <postal>
<street>
No.3 Xinxi Rd., Shang-Di Information Industry Base, Hai-Dian District
</street>
                    <city>Beijing</city>
                    <region>Beijing</region>
                    <code>100085</code>
                    <country>P.R.China</country>
                </postal>
                <phone>+86 10 82836073</phone>
                <email>xuxh@huawei.com</email>
            </address>
        </author>
 
	    <author initials ='H' surname ='Ballani' fullname='Hitesh Ballani'>
            <organization abbrev="Cornell U.">Cornell University</organization>
            <address>
                <postal>
                    <street>4130 Upson Hall</street>
                    <city>Ithaca</city>
                    <region>NY</region>
                    <code>14853</code>
                    <country>US</country>
                </postal>
                <phone>+1 607 279 6780</phone>
                <email>hitesh@cs.cornell.edu</email>
            </address>
        </author>

	
 
	    <author initials ='D' surname ='Jen' fullname='Dan Jen'>
            <organization abbrev="UCLA">UCLA</organization>
            <address>
                <postal>
                    <street> 4805 Boelter Hall </street>
                    <city>Los Angeles</city>
                    <region>CA</region>
                    <code> 90095 </code>
                    <country>US</country>
                </postal>
                <phone> </phone>
                <email>jenster@cs.ucla.edu</email>
            </address>
        </author>


	 
	    <author initials ='R' surname ='Raszuk' fullname='Robert Raszuk'>
            <organization abbrev="Cisco">Cisco Systems, Inc.</organization>
            <address>
                <postal>
                    <street> 170 West Tasman Drive</street>
                    <city> San Jose</city>
                    <region> CA </region>
                    <code> 95134 </code>
                     <country> USA </country>
                </postal>
                <phone> </phone>
                <email>raszuk@cisco.com</email>
            </address>
        </author>

	
	
	    <author initials ='L' surname ='Zhang' fullname='Lixia Zhang'>
            <organization abbrev="UCLA">UCLA</organization>
            <address>
                <postal>
                    <street> 3713 Boelter Hall </street>
                    <city>Los Angeles</city>
                    <region>CA</region>
                    <code> 90095 </code>
                    <country>US</country>
                </postal>
                <phone> </phone>
                <email>lixia@cs.ucla.edu</email>
            </address>
        </author>

	
	<!-- zzzz set date zzzz -->
	<date day="8" month="March" year="2010"/>

        <abstract>
		<t> 
The continued growth in the Default Free Routing Table (DFRT)
stresses the global routing system in a number of ways.  One of the
most costly stresses is FIB size:  ISPs often must upgrade router hardware
simply because the FIB has run out of space, and router vendors must
design routers that have adequate FIB.
FIB suppression is an approach to relieving stress on the FIB by
NOT loading selected RIB entries into the FIB.
Virtual Aggregation (VA) allows ISPs to shrink the FIBs of any and all
routers, easily by an order of magnitude with negligible increase in
path length and load.
FIB suppression can be
deployed autonomously by an ISP (cooperation between ISPs is not required),
and can co-exist with legacy routers in the ISP.
</t>
        </abstract>

    </front>

    <middle>

        <section title="Introduction">

				<t> 
ISPs today manage constant DFRT growth in a number of ways.
One way, of course, is for ISPs to upgrade their
router hardware before DFRT growth
outstrips the size of the FIB.  
This is too expensive for many ISPs.  They would prefer to extend
the lifetime of routers whose FIBs can no longer hold the full DFRT.
</t>
<t>
A common approach taken by lower-tier ISPs is to default route to their
providers.  Routes to customers and peer ISPs are maintained, but everything
else defaults to the provider.  This approach has several disadvantages.
First, packets to Internet destinations may take longer-than-necessary
AS paths.  
This problem can be mitigated through careful configuration of partial
defaults, but this can require substantial configuration overhead.
A second problem with defaulting to providers is that the ISP is no longer
able to provide the full DFRT to its customers.
Finally, provider defaults prevent the ISP from detecting
martian packets.  As a result, the ISP transmits packets that could
otherwise have been dropped over its expensive provider links.
</t>
<t>
An alternative is for the ISP to maintain full routes in its core
routers, but to filter routes from edge routers that do not require
a full DFRT.
These edge routers can then default route to the core routers.
This is often possible with edge routers that interface to
customer networks.
The problem with this approach is that it cannot be used for all
edge routers.  For instance, it cannot be used for routers that
connect to transits.
It of course also does not help in cases where core routers themselves
have inadequate FIB capacity.
</t>
<t>
FIB Suppression is an approach to shrinking FIB size that requires no
changes to BGP, no changes to packet forwarding mechanisms in routers,
and relatively minor changes to control mechanisms in routers and
configuration of those mechanisms.
The core idea behind FIB suppression is to run BGP as normal, and in
particular to not shrink the RIB, but rather to not load certain RIB
entries into the FIB.
This approach minimizes changes to routers, and
in particular is simpler than more general routing architectures that
try to shrink both RIB and FIB.
With FIB suppression, there are no changes to BGP per se.  The
BGP decision process does not change.  
The selected AS-path does not change, and except on rare occasion the exit
router does not change.
ISPs can deploy FIB suppression autonomously and with no coordination with
neighboring ASes.
</t>
<t>
This document describes an approach to FIB suppression called
"Virtual Aggregation" (VA). 
VA operates by organizing the IP (v4 or v6)
address space into Virtual Prefixes (VP), and using tunnels to
aggregate the (regular) sub-prefixes within each VP.
The decrease in FIB size can be dramatic, easily
5x or 10x with only a slight path length and router load increase
<xref target="nsdi09"></xref>.  
The VPs can be organized such that all routers in an ISP see FIB size
decrease, or in such a way that "core" routers keep the full FIB, and
"edge" routers have almost no FIB (i.e. by defining a VP of 0/0).
This "core-edge" style of VA deployment is much simpler than a "full"
VA deployment, whereby multiple VPs are defined, and any router,
core or otherwise, can have reduced FIB size.
This simpler "core-edge" style of deployment is specified in a 
separate draft in order to make it more easily understandable
<xref target="I-D.ietf-grow-simple-va"></xref>.
</t>
<t>
VA has the following characteristics:
<list style="symbols">
<t>
it is robust to router failure, 
</t>
<t>
it allows for traffic engineering, 
</t>
<t>
it allows for existing inter-domain routing policies,
</t>
<t>
it operates in a predictable manner, which makes it possible to test,
debug, and reason about its performance (e.g. to establish SLAs),
</t>
<t>
it can be safely installed, tested, and started up,
</t>
<t>
it can be configured and reconfigured without service interruption,
</t>
<t>
it can be incrementally deployed, and in particular can be operated
in an AS with a mix of VA-capable and legacy routers,
</t>
<t>
it accommodates existing security mechanisms such as unicast
Reverse Path Forwarding (uRPF) ingress
filtering and DoS defense, 
</t>
<t>
it does not introduce significant new security vulnerabilities.
</t>
</list>
</t>

<section anchor="sec-scope" title="Scope of this Document">
<t>
The scope of this document is limited to intra-domain VA operation,
that is, the case where a single ISP autonomously operates VA
internally without any coordination with neighboring ISPs.  
</t>
<t>
Note that this document assumes that the VA "domain"
(i.e. the unit of autonomy) is the AS (that is,
different ASes run VA independently and without coordination).
For the remainder of this document, the terms ISP, AS, and domain are
used interchangeably.
</t>
<t>
This document applies equally to IPv4 and IPv6.
</t>
<t>
VA may operate with a mix of upgraded routers and
legacy routers.
There are no topological restrictions placed on the mix of routers.
In order to avoid loops between upgraded and legacy routers, 
packets are always tunneled by the VA routers to the BGP NEXT_HOPs
of the matched BGP routes. If a given local ASBR is a legacy router,
it must be able to terminate tunnels.
</t>
</section> <!-- "sec-scope" -->

	<section title="Requirements notation">
            <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
            "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
            and "OPTIONAL" in this document are to be interpreted as
	    described in 
	    <xref target="RFC2119"/>.
    </t>
        </section>

        <section title="Terminology">

<t>
<list style="hanging">
<t hangText="Aggregation Point Router (APR):"> 
	An Aggregation Point Router (APR) is a router that aggregates
	a Virtual Prefix (VP) by installing routes (into the FIB) for
	all of the sub-prefixes within the VP.  APRs
	advertise the VP to other routers with BGP.  For each 
	sub-prefix within the VP, APRs have a tunnel
	from themselves to the
	remote ASBR (Autonomous System Border Router)
	where packets for that prefix should be delivered.
</t>
<t hangText="Install and Suppress:"> 
	The terms "install" and "suppress" are used to describe whether
	a RIB entry has been loaded or not loaded into the FIB.
	In other words, the phrase
	"install a route" means "install a route into the FIB", and the
	phrase "suppress a route" means "do not install a route into the
	FIB". 
</t>
<t hangText="Legacy Router:"> 
	A router that does not run VA, and has no knowledge of VA.
	Legacy routers, however, must be able to terminate tunnels
	when they are local ASBRs.
</t>
<t hangText="non-APR Router:"> 
	In discussing VPs, it is often necessary to
	distinguish between routers that are APRs for that VP, and routers
	that are not APRs for that VP (but of course may be APRs for other
	VPs not under discussion).  In these cases, the term "APR" is
	taken to mean "a VA router that is an APR for the given VP",
	and the term "non-APR" is taken to mean "a VA router that is not
	an APR for the given VP".  The term non-APR router is not used
	to refer to legacy routers.
</t>
<t hangText="Popular Prefix:"> 
	A Popular Prefix is a sub-prefix that is installed in a router
	in addition to the sub-prefixes it holds by virtue of being an
	Aggregation Point Router.
	The Popular Prefix allows packets
	to follow the shortest path.  Note that different routers do not
	need to have the same set of Popular Prefixes.
</t>
<t hangText="Routing Information Base (RIB):"> 
	The term RIB is used loosely in this document to refer either
	to the Loc-RIB (as used in <xref target="RFC4271"></xref>), or to the combination of the
	Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out.
</t>
<t hangText="Sub-Prefix:"> 
	A regular (physically aggregatable) prefix.  These are equivalent to the
	prefixes that would normally comprise the DFRT in the
	absence of VA.  A VA router will contain a sub-prefix entry either
	because the sub-prefix falls within a Virtual Prefix for which
	the router is an APR, or because the sub-prefix is
	installed as a Popular Prefix.  Legacy routers hold the same
	sub-prefixes that they hold today.
</t>
<t hangText="Tunnel:"> 
	This draft specifies the use of MPLS Label Switched Paths (LSP),
	and of MPLS inner
	labels tunneled over either LSPs or IP headers.
	Other types of tunnels may be used, but are not specified here.
	This document generically uses the term tunnel to refer to any of these
	tunnel types.
</t>
<t hangText="VA router:"> 
	A router that operates Virtual Aggregation according to this
	document.
</t>
<t hangText="Virtual Prefix (VP):"> 
	A Virtual Prefix (VP) is a prefix used to aggregate its contained
	regular prefixes (sub-prefixes).  
	The sub-prefixes in a VP are not physically aggregatable, 
	and so they
	are aggregated at APRs through the use of tunnels.  
</t>
<t hangText="VP-List:"> 
       A list of defined VPs.  All routers must agree on the contents
       of this list (which
       is statically configured into every VA router).
</t>
</list></t>
        </section>


<section anchor="sec-temp" title="Temporary Sections">
<t>
This section contains temporary information, and will be removed
in the final version.
</t>

<section anchor="sec-changed" title="Document revisions">

<t>
This document was previously published as both draft-francis-idr-intra-va-01.txt and draft-francis-intra-va-01.txt.
</t>

<section anchor="sec-01-grow" title="Revisions from the 01 version of draft-ietf-grow-va">
<t>
The specification of how to use tunnels has been incorporated directly
into this draft.  Formerly the specifications were provided in separate
drafts (<xref target="I-D.ietf-grow-va-mpls"></xref>, and
<xref target="I-D.ietf-grow-va-mpls-innerlabel"></xref>).
The tunneling types specified in 
<xref target="I-D.ietf-grow-va-gre"></xref> are not included in this
draft.
</t>
<t>
The simpler "core-edge" style of deployment has been removed from
this draft and specified in a stand-alone draft 
<xref target="I-D.ietf-grow-simple-va"></xref>
to simplify its
understanding for those interested in only that style of
deployment.
</t>
<t>
Added text about usage of uRPF (strict and loose).
</t>
<t>
Added text about flapping APR failure scenario.
</t>
</section> <!-- "sec-01-grow" -->


<section anchor="sec-00-grow" title="Revisions from the 00 version of draft-ietf-grow-va">
<t>
Removed the notion that FIB suppression can be done by suppressing entries
from the Routing Table (as defined 
in Section 3.2 of <xref target="RFC4271"></xref>), an idea that was
introduced in the second version of the draft.
Suppressing from the Routing Table
breaks PIM-SM, which relies on the contents of the unicast Routing
Table to produce its forwarding table.
</t>
</section> <!-- "sec-00-grow" -->

<section anchor="sec-00-0" title="Revisions from the 00 version (of draft-francis-intra-va-00.txt)">
<t>
	Added additional authors (Jen, Raszuk, Zhang), to reflect primary
	contributors moving forwards.  In addition, a number of minor
	clarifications were made.
</t>
</section> <!-- "sec-00-0" -->

<section anchor="sec-01" title="Revisions from the 01 version (of draft-francis-idr-intra-va-01.txt)">
<t>
<t><list style="numbers">
<t>
Changed file name from draft-francis-idr-intra-va to draft-francis-intra-va.
</t>
<t>
Restructured the document to make the edge suppression mode a specific
sub-case of VA rather than a separate mode of operation.  This includes
modifying the title of the draft.
</t>
<t>
Removed MPLS tunneling details so that specific tunneling approaches
can be described in separate documents.
</t>
</list></t>
</t>
</section> <!-- "sec-01" -->

<section anchor="sec-00" title="Revisions from 00 version">
<t>
<t><list style="symbols">
<t>
Changed intended document type from STD to BCP, as per advice from
Dublin IDR meeting.
</t>
<t>
Cleaned up the MPLS language, and specified that the full-address routes to
remote ASBRs must be imported into OSPF (<xref target="spec-br"></xref>).
As per Daniel Ginsburg's email 
http://www.ietf.org/mail-archive/web/idr/current/msg02933.html.
</t>
<t>
Clarified that legacy routers must run MPLS.
As per Daniel Ginsburg's email 
http://www.ietf.org/mail-archive/web/idr/current/msg02935.html.
</t>
<t>
Fixed LOCAL_PREF bug.
As per Daniel Ginsburg's email 
http://www.ietf.org/mail-archive/web/idr/current/msg02940.html.
</t>
<t>
Removed the need for the extended communities attribute on VP routes,
and added the requirement that all VA routers be statically configured
with the complete list of VPs.
As per Daniel Ginsburg's emails
http://www.ietf.org/mail-archive/web/idr/current/msg02940.html and
http://www.ietf.org/mail-archive/web/idr/current/msg02958.html.
In addition, the procedure for adding, deleting, splitting, and
merging VPs was added.
As part of this, the possibility of having overlapping VPs was added.
</t>
<t>
Added the special case of a core-edge topology with default routes
to the edge as suggested by Robert Raszuk in email
http://www.ietf.org/mail-archive/web/idr/current/msg02948.html.
Note that this altered the structure and even title of the
document. 
</t>
<t>
Clarified that FIB suppression can be achieved by not loading entries into
the Routing Table, as suggested by Rajiv Asati in email
http://www.ietf.org/mail-archive/web/idr/current/msg03019.html.
</t>
</list></t>
</t>
</section> <!-- "sec-00" -->
</section> <!-- "sec-changed" -->

</section> <!-- "sec-temp" -->

        </section>



<section anchor="basic-operation" title="Overview of Virtual Aggregation (VA)">
<t>
For descriptive simplicity, this section starts by describing VA
assuming that there are no legacy routers in the domain.
<xref target="router-mix"></xref> overviews the additional functions
required by VA routers to accommodate legacy routers.
</t>
<t>
A key concept behind VA is to operate BGP as normal, and in particular
to populate the RIB with the full DFRT, but to suppress
many or most
prefixes from being loaded into the FIB.   By populating the RIB as normal,
we avoid any changes to BGP, and changes to router operation are
relatively minor.
The basic idea behind VA is quite simple.  
The address space is partitioned into large prefixes --- larger than
any aggregatable prefix in use today.  These prefixes are called
Virtual Prefixes (VP).  Different VPs do not need to be the same
size.  They may be a mix of /6, /7, /8 (for IPv4), and so on.  Indeed,
an ISP can define a single /0 VP, and use it for a core/edge type of
configuration <xref target="I-D.ietf-grow-simple-va"></xref>.
That is, the core routers would maintain full FIBs, and
edge routers could maintain default routes to the core routers,
and suppress as much of the FIB as they wish.
Each ISP can independently select the size of its VPs.  
</t>
<t>
VPs are not themselves topologically aggregatable. 
VA makes the VPs aggregatable
through the use of tunnels, as follows.
Associated with each VP are one or more "Aggregation Point Routers" (APR).
An APR (for a given VP) is a router that installs
routes for all
sub-prefixes (i.e. real physically aggregatable prefixes) within the VP.
Note that an APR is not a special router per se---it is an
otherwise normal router that is configured to operate as an APR.
By "install routes" here, we mean:
<t><list style="numbers">
<t>
The route for each of the sub-prefixes is loaded into the FIB, and
</t>
<t>
there is a tunnel from the APR to the BGP NEXT_HOP for the route.
</t>
</list></t></t>
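<t>
As a non-normative illustration, the following Python sketch shows one
way to compute which RIB entries an APR installs for the VPs it
aggregates.  The function name apr_fib_entries, the dictionary-based
RIB, the ("tunnel", next hop) representation of a FIB entry, and the
sample prefixes are all assumptions made for this example and are not
part of this specification.
</t>
<figure>
	<artwork><![CDATA[
```python
import ipaddress

def apr_fib_entries(rib, apr_vps):
    """Return the RIB entries that an APR for the given VPs installs.

    rib     -- dict mapping sub-prefix string -> BGP NEXT_HOP string
    apr_vps -- iterable of VP strings for which this router is an APR
    """
    vps = [ipaddress.ip_network(vp) for vp in apr_vps]
    installed = {}
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        for vp in vps:
            # Compare within one address family only, and skip the
            # VP route itself.
            if (net.version == vp.version and net != vp
                    and net.subnet_of(vp)):
                # Step 1: the route is loaded into the FIB.
                # Step 2: the entry points at a tunnel to the NEXT_HOP.
                installed[prefix] = ("tunnel", next_hop)
                break
    return installed

# Illustrative RIB; the prefixes and next hops are made up.
rib = {
    "10.1.0.0/16": "192.0.2.1",
    "10.200.0.0/16": "192.0.2.2",
    "172.16.0.0/16": "192.0.2.3",
}
fib = apr_fib_entries(rib, ["10.0.0.0/8"])
```
 ]]></artwork>
</figure>
<t>
In this sketch, only the two sub-prefixes that fall within 10.0.0.0/8
are installed; the RIB itself is left untouched.
</t>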

<t>
The APR originates a BGP route to the VP.
This route is distributed within the domain, but not outside the domain.
With this structure in place, a packet transiting the ISP goes from
the ingress router to the APR (usually via a tunnel),
and then from the APR to the BGP NEXT_HOP router via a tunnel.
VA can operate with MPLS LSPs, or with MPLS inner labels
over LSPs or IP headers.
<xref target="sec-tunnels"></xref> specifies the usage of tunnels.
</t>
<t>
The BGP NEXT_HOP can be either the local ASBR or the remote ASBR.  
In the former case, an inner label is used to tunnel packets
(<xref target="sec-inner"></xref>).
In either case, all tunnel headers are stripped by the local ASBR before
the packet is delivered to the remote ASBR.  In other words, the remote
ASBR sees a normal IP packet, and is completely unaware of the existence
of VA in the neighboring ISP.
Note that legacy ASBRs MUST set themselves as the BGP NEXT_HOP.
</t>
	
<t>
Note that the AS-path is not affected at all by VA.
This means among other things that AS-level policies are not affected by VA.
The packet may not, however,
follow the shortest path within the ISP (where shortest
path is defined here as the path that would have been taken if VA were
not operating), because the APR may not be on the shortest path between
the ingress and egress routers.
When this happens, the packet experiences additional latency and creates
extra load (by virtue of taking more hops than it otherwise would have).
Note also that, with VA, a packet may occasionally
take a different exit point than it otherwise would have.
</t>
<t>
VA can avoid traversing the APR for selected routes by installing
these routes in non-APR routers.
In other words, even if an ingress router is not an APR for a given
sub-prefix,
it MAY install that sub-prefix into its FIB.
Packets in this case are tunneled directly from the ingress to the
BGP NEXT_HOP.
These extra routes are called "Popular Prefixes", and are typically installed
for policy reasons (e.g. customer routes are always installed), or
for sub-prefixes that carry a high volume of traffic
(<xref target="spec-pop-pre"></xref>).
Different routers MAY
have different Popular Prefixes.  As such, an ISP MAY assign Popular
Prefixes per router, per POP, or uniformly across the ISP.
A given router MAY have zero Popular Prefixes, or the majority of its
FIB MAY consist of Popular Prefixes.
The effectiveness of
Popular Prefixes in reducing traffic load
relies on the fact that traffic volumes follow something
like a power-law distribution: for example, 90% of traffic may be destined to
10% of the destinations.  Internet traffic measurement
studies over the years have consistently shown that traffic patterns follow
this distribution, though there is no guarantee that they always will.
</t>
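<t>
One possible way to choose traffic-based Popular Prefixes is sketched
below in Python.  This is only an illustration: the function name
pick_popular_prefixes, the target_percent parameter, and the sample
byte counts are assumptions, and this document does not mandate any
particular selection method.
</t>
<figure>
	<artwork><![CDATA[
```python
def pick_popular_prefixes(traffic_bytes, target_percent=90):
    """Pick the highest-volume sub-prefixes until they cover
    target_percent of the measured traffic.

    traffic_bytes -- dict mapping sub-prefix string -> byte count
    """
    total = sum(traffic_bytes.values())
    chosen = []
    covered = 0
    # Visit sub-prefixes from highest to lowest traffic volume.
    ranked = sorted(traffic_bytes.items(),
                    key=lambda kv: kv[1], reverse=True)
    for prefix, volume in ranked:
        # Integer comparison avoids floating-point rounding.
        if covered * 100 >= target_percent * total:
            break
        chosen.append(prefix)
        covered += volume
    return chosen

# Illustrative measurements; the numbers are made up.
measured = {
    "10.1.0.0/16": 700,
    "10.2.0.0/16": 200,
    "10.3.0.0/16": 60,
    "10.4.0.0/16": 40,
}
popular = pick_popular_prefixes(measured)
```
 ]]></artwork>
</figure>
<t>
With the sample data, two of the four sub-prefixes already carry 90%
of the traffic, so only those two would be installed as Popular
Prefixes.
</t>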
<t>
Note that for routing to work properly, every packet must sooner or later
reach a router that has installed a sub-prefix route that matches the packet.
This would obviously be the case for a given sub-prefix if every router
has installed a route for that sub-prefix (which of course is the situation in
the absence of VA).  If this is not the case, then there MUST be at
least one Aggregation Point Router (APR) for the
sub-prefix's Virtual Prefix (VP).
Ideally, every POP contains at least two APRs
for every Virtual Prefix.  By having APRs in every POP, the latency
imposed by routing to the APR is minimal (the extra hop is within the POP).
By having more than one APR, there is a redundant APR should one fail.
In practice it is often not possible to have an APR for every VP in every
POP.  This is because some POPs may have only one or a few routers,
and therefore there may not be enough cumulative FIB space in the POP
to hold every sub-prefix.
Note that any router ("edge", "core", etc.) MAY be an APR.
</t>
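<t>
The APR coverage requirement above lends itself to a simple
configuration check, sketched below in Python.  The function name
apr_counts, the router names, and the data layout are hypothetical.
The sketch conservatively ignores the case where every router has
installed a given sub-prefix, in which no APR would be needed for it.
</t>
<figure>
	<artwork><![CDATA[
```python
def apr_counts(vp_list, apr_assignments):
    """Count, for each VP in the VP-List, how many routers are
    APRs for it.

    vp_list         -- list of VP strings
    apr_assignments -- dict mapping router name -> list of VPs
                       that the router aggregates
    """
    counts = {vp: 0 for vp in vp_list}
    for vps in apr_assignments.values():
        for vp in set(vps):      # ignore duplicate entries per router
            if vp in counts:
                counts[vp] += 1
    return counts

# Illustrative configuration; names and prefixes are made up.
vp_list = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
assignments = {
    "r1": ["10.0.0.0/8", "172.16.0.0/12"],
    "r2": ["10.0.0.0/8"],
}
counts = apr_counts(vp_list, assignments)
# VPs with no APR at all violate the requirement above.
unreachable = [vp for vp, n in counts.items() if n == 0]
# VPs with a single APR have no backup should that APR fail.
not_redundant = [vp for vp, n in counts.items() if n == 1]
```
 ]]></artwork>
</figure>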
<t>
It is important that both the contents of BGP RIBs, as well as the
contents of the Routing Table (as defined 
in Section 3.2 of <xref target="RFC4271"></xref>) not be modified by
VA (other than the introduction of routes to VPs).  This is because
PIM-SM <xref target="RFC4601"></xref>
relies on the contents of the Routing Table to build its own
trees and forwarding table.  Therefore, FIB suppression MUST take place
between the Routing Table and the actual FIB(s).
</t>

<section anchor="router-mix" title="Mix of legacy and VA routers">
<t>
It is important that an ISP be able to operate with a mix of "VA
routers" (routers upgraded to operate VA as described in this document)
and "legacy routers".
This allows ISPs to deploy VA in an incremental fashion and to continue
to use routers that for whatever reason cannot be upgraded.
This document allows such a mix, and indeed places no topological
restrictions on that mix.
It does, however, require that legacy routers (and VA routers
for that matter) are able to forward
already-tunneled packets, are able to serve as tunnel endpoints, and
are able to participate in distribution of tunnel information
required to establish themselves as tunnel endpoints.
(This is listed as Requirement R5 in the companion tunneling documents.)
Depending on the tunnel type, legacy routers MAY also be able to
initiate tunneled packets, though this capability is OPTIONAL.
(This is listed as Requirement R4 in the companion tunneling documents.)
Legacy routers MUST use their own address as the
BGP NEXT_HOP, and MUST FIB-install routes for which they are the BGP
NEXT_HOP.
<!-- 
Specifically, when a legacy router is a border router, it MUST initiate
LSPs to itself for instance using LDP,
<xref target="RFC5036"></xref>, and MUST use its own address as the
BGP NEXT_HOP in routes received from remote ASBRs.
-->
</t>
</section> <!-- "router-mix" -->

<section anchor="tunnel-sum" title="Summary of Tunnels and Paths">
<t>
To summarize, the following tunnels are created:
<t><list style="numbers">
<t>
From all VA routers to all BGP NEXT_HOP addresses (where the BGP
NEXT_HOP address is either an APR, a local ASBR, or the remote
ASBR neighbor of a VA router).  Note that this is listed as
Requirement R3 in the companion tunneling documents.
</t>
<t>
Optionally, from all legacy routers to all BGP NEXT_HOP addresses.
</t>
</list></t>
There are a number of possible paths that packets may take through an ISP,
summarized in the following diagram.
Here, "VA" is a VA router, "LR" is a legacy router,
the symbol "==>" represents a tunneled packet (through zero or more
routers), "-->" represents an
untunneled packet, and "(pop)" represents stripping the tunnel header.
The symbol "::>" represents the portion of the path where although
the tunnel is
targeted to the receiving node, the outer header has been stripped.
(Note that the remote ASBR may actually be a legacy router or a VA
router---it doesn't matter (and isn't known) to the ISP.)
</t>

<figure>
	<artwork><![CDATA[
                                          Egress
                                          Router
        Ingress    Some       APR         (Local     Remote
        Router     Router     Router      ASBR)      ASBR
        -------    ------     ------      ------     --------
    1.    VA===================>VA=========>VA(pop)::::>LR
    
    2.    VA===================>VA=========>LR--------->LR
    
    3.    VA===============================>VA(pop)::::>LR
    
    4.    VA===============================>LR--------->LR

    (The following two exist in the case where legacy routers
     can initiate tunneled packets.)
    
    5.    LR===============================>VA(pop)::::>LR
    
    6.    LR===============================>LR--------->LR
    
    (The following two exist in the case where legacy routers
     cannot initiate tunneled packets.)
    
    7.    LR------->VA (remaining paths as in 1 to 4 above)
    
    8.    LR------->LR--------------------->LR--------->LR
    
 ]]></artwork>

</figure>

<t>
The first and second paths represent the case where the ingress
router does not have a Popular Prefix for the destination, and MUST
tunnel the packet to an APR.
The third and fourth paths represent the case where the ingress
router does have a Popular Prefix for the destination, and so tunnels
the packet directly to the egress.
The fifth and sixth paths are similar to the third and fourth
paths respectively, but where the ingress is a
legacy router that can initiate tunneled packets,
and effectively has the Popular Prefix by virtue of
holding the entire DFRT.
(Note that some ISPs have only partial RIBs in their
customer-facing edge routers, and default route to a router that
holds the full DFRT.  This case is not shown here, but works perfectly well.)
Finally, paths 7 and 8 represent the case where legacy
routers cannot initiate a tunneled packet.
</t>
<t>
VA prevents the routing loops that might otherwise occur when VA
routers and legacy routers are mixed.
The trick is avoiding the case where a legacy router is forwarding
packets towards the BGP NEXT_HOP, while a VA router is forwarding
packets towards the APR, with each router thinking that the other is
on the shortest path to their respective targets.
</t>
<t>
In the first four types of path, the loop is avoided because
tunnels are used all the way to the egress.  As a result, there
is never an opportunity for a legacy router to try to route based
on the destination address unless the legacy router is the egress,
in which case it forwards the packet to the remote ASBR.
</t>
<t>
In the 5th and 6th cases, the ingress is a legacy router, but this
router can initiate tunnels and has the full FIB, and so simply
tunnels the packet to the egress router.
</t>
<t>
In the 7th and 8th cases, the legacy ingress cannot initiate
tunnels, and so forwards the packet hop-by-hop towards the
BGP NEXT_HOP.  
The packet will work its way
towards the egress router, and will either progress through a series
of legacy routers (in which case the IGP prevents loops), or it will
eventually reach a VA router, after which it will take tunnels as in
the 1st and 2nd cases.
</t>
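<t>
The eight path types in the diagram above can be summarized as a small
decision function.  The Python sketch below is illustrative only; the
function name path_case and its parameter names are assumptions
introduced for this example, and the returned number is simply the
path number used in the diagram.
</t>
<figure>
	<artwork><![CDATA[
```python
def path_case(ingress_is_va, egress_is_va, has_popular_prefix=False,
              legacy_can_tunnel=False, meets_va_router=False):
    """Return the path number (1 to 8) from the diagram.

    ingress_is_va      -- the ingress router is a VA router
    egress_is_va       -- the egress (local ASBR) is a VA router
    has_popular_prefix -- a VA ingress holds a Popular Prefix for
                          the destination
    legacy_can_tunnel  -- a legacy ingress can initiate tunnels
    meets_va_router    -- an untunneled packet from a legacy ingress
                          reaches a VA router before the egress
    """
    if ingress_is_va:
        # Paths 1 and 2 go via an APR; paths 3 and 4 go directly.
        base = 3 if has_popular_prefix else 1
        return base if egress_is_va else base + 1
    if legacy_can_tunnel:
        # The legacy ingress tunnels directly to the egress.
        return 5 if egress_is_va else 6
    # The legacy ingress forwards hop-by-hop towards the NEXT_HOP.
    return 7 if meets_va_router else 8
```
 ]]></artwork>
</figure>
<t>
For example, path_case(True, False) yields 2: a VA ingress without a
Popular Prefix tunnels to an APR, which tunnels to a legacy egress.
</t>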
</section> <!-- "tunnel-sum" -->
</section>



<section anchor="router-changes" title="Specification of VA">
<t>
This section describes in detail how to operate VA.
It starts with a brief discussion of requirements, followed by
a specification of router support for VA.
</t>



<section anchor="va-ops" title="VA Operation">

<t>
In this section, the detailed operation of VA is specified.
</t>


<section anchor="spec-legacy" title="Legacy Routers">
<t>
VA can operate with a mix of VA and legacy routers.  
To prevent the types of loops described in
<xref target="tunnel-sum"></xref>, 
however, legacy routers
MUST satisfy the following requirements:
<t><list style="numbers">
<t>
When forwarding externally-received routes over iBGP, the
BGP NEXT_HOP attribute
MUST be set to the legacy router itself.
</t>
<t>
Legacy routers MUST be able to detunnel packets addressed to
themselves at the
BGP NEXT_HOP address.  They MUST also be able to convey the tunnel
information needed by other routers to initiate tunneled packets to them.
This is listed as "Requirement R1" in the companion tunneling documents.
If a legacy router cannot detunnel and convey tunnel parameters, then the
AS cannot use VA.
<!-- 
Specifically, it initiates Downstream Unsolicited tunnels to all IGP neighbors
for instance using LDP <xref target="RFC5036"></xref>, with its own full address
(/32 if IPv4, /128 if IPv6) as the Forwarding Equivalence Class (FEC).
-->
</t>
<t>
<!-- 
Legacy routers MUST participate fully in LDP.  In other words, they
MUST have all tunnels listed in <xref target="tunnel-sum"></xref>.
-->
Legacy routers MUST be able to forward all tunneled packets.
</t>
<t>
Every legacy router MUST hold its complete FIB.  (Note, of course,
that this FIB does not necessarily need to contain the full DFRT.
This might be the case, for instance, if the router is an edge
router that defaults to a core router.)
</t>
</list></t>
</t>
<t>
As long as legacy routers participate in tunneling as described above,
there are no topological restrictions on the legacy routers.  They
may be freely mixed with VA routers without the possibility of
forming sustained loops (<xref target="tunnel-sum"></xref>).  
</t>
</section> <!-- <section anchor="spec-legacy" title="Legacy Routers"> -->

<section anchor="spec-VPs" title="Advertising and Handling Virtual Prefixes (VP)">

<section anchor="sec-know-vp" title="Distinguishing VPs from Sub-prefixes">
<t>
VA routers MUST be able to distinguish VPs from sub-prefixes.
This is primarily in order to know which routes to install.
In particular, non-APR routers MUST know which prefixes are VPs before
they receive routes for those VPs, for instance when they first boot
up.
This is in order to avoid the situation where they unnecessarily start
filling their FIBs with routes that they ultimately don't need to 
install (<xref target="spec-suppress"></xref>).
This leads to the following requirement:
</t>
<t>
It MUST be possible to statically configure the complete list of
VPs into all VA routers.
This list is known as the VP-List.
<!--
Note that this list need only convey the ranges of addresses that
are covered by VPs, not each individual VP.  For instance, if the
entire address space is covered by VPs, then the VP list need only
consist of a single entry, '0/0'.
-->
</t>
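<t>
The distinction between VPs and sub-prefixes can be illustrated with a
short sketch.  It is not part of the specification: the VP-List contents,
the function name classify(), and the three result labels are assumptions
made purely for illustration, using Python's standard ipaddress module.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress

# Hypothetical, statically configured VP-List (illustrative prefixes only).
VP_LIST = [ipaddress.ip_network(p)
           for p in ("20.0.0.0/7", "20.0.0.0/8", "64.0.0.0/6")]


def classify(prefix):
    """Classify a received route against the VP-List.

    Returns "vp" if the prefix is itself listed, "covered-sub-prefix"
    if it falls within a listed VP, and "uncovered-sub-prefix" otherwise.
    """
    net = ipaddress.ip_network(prefix)
    if net in VP_LIST:
        return "vp"
    # subnet_of() raises TypeError across IP versions, so compare versions first.
    if any(net.version == vp.version and net.subnet_of(vp) for vp in VP_LIST):
        return "covered-sub-prefix"
    return "uncovered-sub-prefix"
```
]]></artwork></figure>
<t>
Because the list is static, a router can make this classification as soon
as it boots, before any route for a VP has been received.
</t>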
</section> <!-- "sec-know-vp" -->


<section anchor="limit-vp" title="Limitations on Virtual Prefixes">
<t>
From the point of view of best-match routing semantics,
VPs are treated identically to any other prefix.  
In other words, if the longest matching prefix is a VP, then the packet
is routed towards the VP.
If a packet matching a VP reaches an Aggregation Point Router (APR)
for that VP, and the APR does not have a better matching route, then the packet
is discarded by the APR (just as a router that originates any prefix
will discard a packet that does not have a better match).
</t>
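<t>
As a rough illustration of these best-match semantics at an APR, the
following sketch performs a longest-prefix match over a toy FIB.  The
FIB contents, the action strings, and the function name lookup() are
hypothetical; a real FIB is of course not a Python dictionary.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match: return the action of the most specific entry."""
    addr = ipaddress.ip_address(dst)
    best_net, best_action = None, "no-route"
    for prefix, action in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None
                            or net.prefixlen > best_net.prefixlen):
            best_net, best_action = net, action
    return best_action


# Toy FIB of an APR for the VP 20.0.0.0/7 (all values are illustrative).
APR_FIB = {
    "20.0.0.0/7": "discard",          # the VP itself
    "20.1.0.0/16": "tunnel:asbr-1",   # sub-prefix within the VP
    "21.2.0.0/16": "tunnel:asbr-2",   # sub-prefix within the VP
}
```
]]></artwork></figure>
<t>
In this sketch a packet to 20.1.2.3 follows the sub-prefix route, while a
packet to 21.9.9.9 matches only the VP and is therefore discarded.
</t>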
<t>
The overall semantics of VPs, however, are slightly different from those
of real prefixes.
Without VA, when a router originates a route for a (real)
prefix, the expectation is that the addresses within the prefix are
within the originating AS (or a customer of the AS).
For VPs, this is not the case.   APRs originate VPs whose sub-prefixes
exist in different ASes.
Because of this, it is important that VPs not be advertised across
AS boundaries.
</t>
<t>
It is up to individual domains to define their own VPs.
VPs MUST be "larger" (span a larger address space) than any real sub-prefix.
If a VP is smaller than a real prefix, then packets that match the real
prefix will nevertheless
be routed to an APR owning the VP, at which point the packet
will be dropped if it does not match a sub-prefix within the VP
(<xref target="sec-consider"></xref>).
</t>
<t>
(Note that, in principle there are cases where a VP could be smaller
than a real prefix.  This is where the egress router to the real
prefix is a VA router.  In this case, the APR could theoretically
tunnel the packet to the appropriate remote ASBR, which would then
forward the packet correctly.  On the other hand, if the egress router
is a legacy router, then the APR could not tunnel
matching packets to the egress.
This is because the egress would view the VP as a better
match, and would loop the packet back to the APR.  For this reason
we require that VPs be larger than any real prefixes, and that
APRs never install prefixes larger than a VP in their FIBs.)
</t>
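<t>
An operator might audit a candidate VP-List offline against a snapshot of
the routing table before configuring it.  The sketch below is one possible
form of such a check, not a mechanism defined by VA; the function name
and the decision to flag a VP that is equal to a real prefix are
assumptions.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def vps_not_larger(vp_list, real_prefixes):
    """Return (vp, real) pairs where the VP is equal to, or contained in,
    a real prefix, i.e. where the VP is not the larger of the two."""
    violations = []
    for vp in map(ipaddress.ip_network, vp_list):
        for real in map(ipaddress.ip_network, real_prefixes):
            # subnet_of() is only defined within one IP version.
            if vp.version == real.version and vp.subnet_of(real):
                violations.append((str(vp), str(real)))
    return violations
```
]]></artwork></figure>
<t>
For example, with the VP 20.0.0.0/8 and the real prefixes 16.0.0.0/4 and
20.1.0.0/16, only the pair (20.0.0.0/8, 16.0.0.0/4) is reported.
</t>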
<t>
It is valid for a VP to be a subset of another VP.  For example,
20/7 and 20/8 can both be VPs.  In fact, this capability is necessary
for "splitting" a VP without temporarily increasing the FIB size in any router
(<xref target="spec-VP-manage"></xref>).
</t>
</section>

<section anchor="spec-apr" title="Aggregation Point Routers (APR)">
<t>
Any router MAY be configured as an Aggregation Point Router
(APR) for one or more Virtual Prefixes (VP).  
For each VP for which a router is an APR, the router does
the following:
<t><list style="numbers">
<t>
The APR MUST originate a BGP route to the VP <xref target="RFC4271"></xref>.
In this route, the NLRI are all of the VPs
for which the router is an APR.
This is true even for VPs that are a subset of another VP.
The ORIGIN is set to INCOMPLETE (value 2), the AS number of the
APR's AS is used in the AS_PATH, and the BGP NEXT_HOP is set to the address
of the APR.  The ATOMIC_AGGREGATE and AGGREGATOR attributes are not
included.
</t>
<t>
The APR MUST attach a NO_EXPORT Communities Attribute 
<xref target="RFC1997"></xref> to the route.
</t>
<t>
The APR MUST be able to detunnel packets addressed to
itself at its
BGP NEXT_HOP address.  It MUST also be able to convey the tunnel
information needed by other routers to initiate tunneled packets to it
(Requirement R1).
<!-- 
The APR MUST initiate LSPs terminating at itself.
Specifically, it initiates Downstream Unsolicited tunnels to all IGP neighbors
for instance using LDP <xref target="RFC5036"></xref>, with the address that it
used in the BGP NEXT_HOP attribute of the VP route as the FEC.
Note that VA routers and legacy routers alike MUST have tunnels to
the APR.
-->
</t>
<t>
If a packet is received at the APR whose best match route is the VP (i.e.
it matches the VP but not any sub-prefixes within the VP), then the
packet MUST be discarded (see <xref target="limit-vp"></xref>).
This can be accomplished by never installing a prefix larger than the VP into
the FIB, or by installing the VP as a route to /dev/null.
</t>
</list></t>
</t>
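<t>
The attributes of the route originated by an APR can be summarized in a
small sketch.  The dictionary layout and the function name
build_vp_route() are illustrative assumptions and do not correspond to
any particular BGP implementation; the numeric values are the ORIGIN code
from <xref target="RFC4271"></xref> and the well-known NO_EXPORT community
from <xref target="RFC1997"></xref>.
</t>
<figure><artwork><![CDATA[
```python
ORIGIN_INCOMPLETE = 2       # ORIGIN attribute value (RFC 4271)
NO_EXPORT = 0xFFFFFF01      # well-known community (RFC 1997)


def build_vp_route(vps, local_as, apr_address):
    """Describe the BGP route an APR originates for its VPs."""
    return {
        "nlri": list(vps),            # every VP this router is an APR for
        "origin": ORIGIN_INCOMPLETE,
        "as_path": [local_as],        # the APR's own AS number
        "next_hop": apr_address,      # tunnels terminate at this address
        "communities": [NO_EXPORT],   # keep the VP inside the AS
        # ATOMIC_AGGREGATE and AGGREGATOR are deliberately absent.
    }


# Example with documentation values (AS 64500, 192.0.2.1).
route = build_vp_route(["20.0.0.0/7", "20.0.0.0/8"], 64500, "192.0.2.1")
```
]]></artwork></figure>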

<section anchor="spec-apr-select" title="Selecting APRs">
<t>
An ISP is free to select APRs however it chooses.  The details of this
are outside the scope of this document.
Nevertheless, a few comments are made here.
In general, APRs should be selected such that the distance to
the nearest APR for any VP is small, ideally within the same PoP.
Depending on the number of routers in a PoP, and the sizes of the FIBs
in the routers relative to the DFRT size, it may not be
possible for all VPs to be represented in a given PoP.
In addition, there should be multiple APRs for each VP, again ideally
in each PoP, so that the failure of one does not unduly disrupt traffic.
</t>
<t>
Note that, although VPs MUST be larger than real prefixes, there
is intentionally no mechanism designed to automatically ensure that
this is the case.  
Such a mechanism would be dangerous.  For instance, if an ISP somewhere
advertised a very large prefix (a /4, say), then this would cause APRs
to throw out all VPs that are smaller than this.
For this reason, VPs MUST be set through static configuration only.
</t>
</section>
</section>

<section anchor="spec-other-routers" title="Non-APR Routers">
<t>
A non-APR router MUST install at least the following routes:
<t><list style="numbers">
<t>
Routes to VPs (identifiable using the VP-List).
</t>
<t>
Routes to all sub-prefixes that are not covered by any VP
in the VP-List.
</t>
</list></t>
</t>
<t>
If the non-APR has a tunnel to the BGP NEXT_HOP of any such route, it
MUST use the tunnel to forward packets to the BGP NEXT_HOP.
</t>
<t>
When an APR fails, routers must select another APR to send packets to (if
there is one).
This happens, however, through normal internal BGP convergence mechanisms.
</t>
</section>

<section anchor="spec-VP-manage" title="Adding and deleting VPs">
<t>
An ISP may from time to time wish to reconfigure its VP-List.
There are a number of reasons for this.  For instance, early in its deployment
an ISP may configure one or a small number of VPs in order to test
VA.  As the ISP gets more confident with VA, it may increase the number
of VPs.  Or, an ISP may start with a small number of large VPs (e.g.
/4's or even one /0),
and over time move to a larger number of smaller VPs in order to save even
more FIB.  In this case, the ISP will need to "split" a VP.  Finally,
since the address space is not uniformly populated with prefixes, the
ISP may want to change the size of VPs in order to balance FIB size
across routers.  This can involve both splitting and merging VPs.
Of course, an ISP must be able to modify its VP-List without
1) interrupting service to any destinations, or 2) temporarily
increasing the size of any FIB (i.e. the FIB size during the change
must be no bigger than its size either before or after the change).
</t>
<t>
Adding a VP is straightforward.  The first step is to configure the APRs
for the VP.  This causes the APRs to originate routes for the VP.
Non-APR routers will install this route according to the rules in
<xref target="spec-other-routers"></xref>
even though they do not yet recognize that the prefix is a VP.
Subsequently the VP is added to the VP-List of non-APR routers.
The non-APR routers can then start suppressing the sub-prefixes with
no loss of service.
</t>
<t>
To delete a VP, the process is reversed.  First, the VP is removed
from the VP-Lists of non-APRs.  This causes the non-APRs to install
the sub-prefixes.  After all sub-prefixes have been installed, the
VP may be removed from the APRs.
</t>
<t>
In many cases, it is desirable to split a VP.  For instance, consider
the case where two routers, Ra and Rb, are APRs for the same VP.
It would be possible to shrink the FIB in both routers by splitting
the VP into two VPs (e.g. split one /6 into two /7's), and assigning
each router to one of the VPs.
While this could in theory be done by first deleting the larger VP,
and then adding the smaller VPs,
doing so would temporarily increase the FIB size in non-APRs,
which may not have adequate space for such an increase.
For this reason, we allow overlapping VPs.
</t>
<t>
To split a VP, first the two smaller VPs are added to the
VP-Lists of all non-APR routers (in addition to the larger superset VP).
Next, the smaller VPs are added to the selected APRs (which may or may
not be APRs for the larger VP).
Because the smaller VPs are a better match than the larger VP,
this will cause the non-APR routers to forward packets to the
APRs for the smaller VPs.
Next, the larger VP can be removed from the VP-Lists of all non-APR
routers.
Finally, the larger VP can be removed from its APRs.
</t>
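<t>
The effect of this ordering on a non-APR router can be explored with a
toy model.  The sketch below is an illustration only: the prefixes, the
function name non_apr_fib(), and the modelling of the FIB as a set of
prefix strings are all assumptions.  It applies the install rules of
<xref target="spec-other-routers"></xref> at each step of the split, and
compares the result with naively deleting the larger VP first.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def non_apr_fib(vp_list, bgp_routes):
    """Routes a non-APR router installs: routes to VPs in the VP-List,
    plus routes to prefixes not covered by any VP in the VP-List."""
    vps = {ipaddress.ip_network(v) for v in vp_list}
    fib = set()
    for route in bgp_routes:
        net = ipaddress.ip_network(route)
        covered = any(net.subnet_of(vp) for vp in vps)
        if net in vps or not covered:
            fib.add(route)
    return fib


# Hypothetical IPv4 prefixes: one /6 VP split into two /7 VPs.
SUBS = ["24.1.0.0/16", "25.2.0.0/16", "26.3.0.0/16", "27.4.0.0/16"]
BIG = "24.0.0.0/6"
SMALL = ["24.0.0.0/7", "26.0.0.0/7"]

# Each step is (VP-List on the non-APR, routes present in BGP).
split_steps = [
    ([BIG], SUBS + [BIG]),                  # before the split
    ([BIG] + SMALL, SUBS + [BIG]),          # smaller VPs added to VP-List
    ([BIG] + SMALL, SUBS + [BIG] + SMALL),  # APRs originate smaller VPs
    (SMALL, SUBS + [BIG] + SMALL),          # larger VP removed from VP-List
    (SMALL, SUBS + SMALL),                  # larger VP removed from APRs
]
split_fibs = [non_apr_fib(vps, routes) for vps, routes in split_steps]
split_sizes = [len(fib) for fib in split_fibs]

# Naive alternative: delete the larger VP from the VP-List first.
naive_peak = len(non_apr_fib([], SUBS + [BIG]))
```
]]></artwork></figure>
<t>
In this model no sub-prefix is installed at any step of the split; the
only transient entries are the VP routes themselves.  The naive ordering,
by contrast, installs every sub-prefix of the larger VP.
</t>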
<t>
To merge two VPs, the new larger VP is configured in all
non-APRs.  This has no effect on FIB size or APR selection, since the
smaller VPs are better matches.  Next the larger VP is configured in its
selected APRs.  
Next the smaller VPs are deleted from all non-APRs.
Finally, the smaller VPs are deleted from their corresponding APRs.
</t>
</section> <!-- "spec-VP-manage" -->

</section>

<section anchor="spec-br" title="Border VA Routers">
<t>
A VA router that is an ASBR MUST do the following:
</t>
<t><list style="numbers">
<t>
When forwarding externally-received routes over iBGP,
if a tunnel with an inner label is used, the ASBR MUST set the
BGP NEXT_HOP attribute to itself.
Otherwise, the BGP NEXT_HOP attribute is left unchanged.
</t>
<t>
The ASBR MUST establish tunnels as described in
<xref target="sec-tunnels"></xref>.
</t>
<t>
The ASBR MUST detunnel the packet before forwarding the packet
to the remote ASBR.  In other words, the remote ASBR receives a normal
untunneled packet identical to the packet it would receive without
VA.
</t>
<t>
The ASBR MUST be able to forward the packet without a FIB 
lookup.  In other words, the tunnel information itself contains all the
information needed by the border router to know which remote ASBR
should receive the packet.
</t>
</list></t>
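<t>
The last two requirements can be sketched as follows.  The label values,
the addresses, and the representation of a tunneled packet as a
(label, packet) pair are hypothetical; the point is only that the egress
decision is a direct table lookup on tunnel information, not a FIB lookup
on the destination address.
</t>
<figure><artwork><![CDATA[
```python
# Hypothetical mapping from received tunnel label to remote ASBR.
LABEL_TO_REMOTE_ASBR = {
    1001: "192.0.2.1",
    1002: "198.51.100.1",
}


def asbr_egress(tunneled_packet):
    """Detunnel a packet and pick the remote ASBR from the label alone.

    Returns (remote_asbr, ip_packet); the IP packet is handed on
    unchanged, exactly as it would look without VA.
    """
    label, ip_packet = tunneled_packet
    remote_asbr = LABEL_TO_REMOTE_ASBR[label]  # no FIB lookup on dst
    return remote_asbr, ip_packet
```
]]></artwork></figure>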


<!-- 
They MUST initiate LSPs to their remote ASBRs.
Specifically, they initiate Downstream Unsolicited tunnels to all IGP neighbors
for instance using LDP <xref target="RFC5036"></xref>, with the full address
of their remote ASBRs (/32 for IPv4, /128 for IPv6) as the FEC.
The effect of this is that the VA borders use the received label to
know to which remote ASBR to forward an outgoing packet (i.e. without
having to do a FIB lookup), but will strip the MPLS header before
forwarding to the remote ASBR.
</t>
<t>
They MUST import the full address of the remote ASBR into the IGP
(i.e. OSPF <xref target="RFC2328"></xref>).
This is of course necessary for LDP to establish the tunnels targeted to
the remote ASBRs.
</t>
<t>
When forwarding externally-received routes over iBGP, the
BGP NEXT_HOP attribute
MUST be set to the remote ASBR (i.e. the FEC of the corresponding LSP).
</t>
</list></t>
</t>
<t>
(Note that an alternative approach would be to used stacked labels,
with the outer label terminating at the border router, and the inner
label identifying the remote ASBR and distributed in BGP as described in
<xref target="RFC3107"></xref>.  This approach requires that fewer
tunnels be installed by LDP.  The need for this approach is for
further study.)
</t>
-->
</section>

<section anchor="spec-sub-prefix" title="Advertising and Handling Sub-Prefixes">
<t>
Sub-prefixes are advertised and handled by BGP
as normal.  VA does not affect this behavior.  The only difference in the
handling of sub-prefixes is that they might not be installed in the FIB,
as described in <xref target="spec-suppress"></xref>.
</t>
<t>
In those cases where the route is installed, packets forwarded
to prefixes external to the AS MUST be transmitted via the tunnel
established as described in <xref target="spec-br"></xref>.
</t>
</section>


<section anchor="spec-suppress" title="Suppressing FIB Sub-prefix Routes">
<t>
Any route not for a known VP (i.e. not in the VP-List)
is taken to be a sub-prefix.
The following rules are used to determine if a sub-prefix route can
be suppressed.
<t><list style="numbers">
<t>
A VA router MUST NOT FIB-install a sub-prefix route for which there
is no tunnel to the BGP NEXT_HOP address.
This is to 
prevent a loop whereby the APR forwards the packet hop-by-hop
towards the next hop, but a router on the path that has FIB-suppressed
the sub-prefix forwards it back to the APR.
If there is an alternate route to the sub-prefix for which there is a
tunnel, then that route SHOULD be selected, even if it is less attractive
according to the normal BGP best path selection algorithm.
</t>
<t>
If the router is an APR, a route for every sub-prefix
within the VP MUST be FIB-installed (subject to the above limitation that
there be a tunnel).
</t>
<t>
If a non-APR router has a sub-prefix route that does not fall within any VP
(as determined by the VP-List), then the route MUST be installed.
This may occur because the ISP hasn't defined a VP covering that prefix,
for instance during an incremental deployment buildup.
</t>
<t>
If an ASBR is using strict uRPF to do ingress filtering, then it MUST
install routes for which the remote ASBR is the BGP NEXT_HOP
<xref target="RFC2827"></xref>.
Note that only an APR may do loose uRPF filtering, and then only
for routes to sub-prefixes within its VPs.
</t>
<t>
All other sub-prefix routes MAY be suppressed.  Such "optional"
sub-prefixes that are nevertheless installed are
referred to as Popular Prefixes.
Note, however, that whether or not to install a given sub-prefix
SHOULD NOT be based on whether or not there is an active route 
to a VP in the VP-List.
This avoids the situation whereby, during BGP initialization,
the router receives some sub-prefix routes before
receiving the corresponding VP route, with the result that it
installs routes in its FIB that it will only remove a short time later,
possibly even overflowing its FIB.
</t>
</list></t>
</t>
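<t>
The rules above can be read as a decision procedure.  The following
sketch is one possible reading, not a normative algorithm: the function
and parameter names are hypothetical, only a single address family is
handled, and checking the tunnel rule before all others is an
assumption, since the text states that precedence explicitly only for
APRs.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def _covered(net, vps):
    """True if net falls within any prefix in vps (same IP version)."""
    return any(net.subnet_of(ipaddress.ip_network(vp)) for vp in vps)


def fib_decision(route, vp_list, has_tunnel, apr_vps=(),
                 strict_urpf_next_hop=False):
    """Decide how a VA router treats one sub-prefix route.

    route                 sub-prefix, e.g. "20.1.0.0/16"
    vp_list               the statically configured VP-List
    has_tunnel            a tunnel to the route's BGP NEXT_HOP exists
    apr_vps               VPs for which this router is an APR
    strict_urpf_next_hop  this ASBR uses strict uRPF and the remote
                          ASBR is the route's BGP NEXT_HOP
    """
    net = ipaddress.ip_network(route)
    if not has_tunnel:
        return "must-not-install"     # rule 1
    if _covered(net, apr_vps):
        return "must-install"         # rule 2
    if not _covered(net, vp_list):
        return "must-install"         # rule 3
    if strict_urpf_next_hop:
        return "must-install"         # rule 4
    return "may-suppress"             # rule 5 (Popular Prefix candidate)
```
]]></artwork></figure>
<t>
Note that the decision depends on the VP-List only, never on whether a
route to the VP is currently active, in line with the last rule.
</t>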

<section anchor="spec-pop-pre" title="Selecting Popular Prefixes">
<t>
Individual routers MAY independently choose
which sub-prefixes are Popular Prefixes.
There is no need for different routers to install the same sub-prefixes.
There is therefore significant leeway as to how routers select Popular
Prefixes.
As a general rule, routers should fill the FIB as much as possible, because
the cost of doing so is relatively small, and more FIB entries lead to
fewer packets taking a longer path.
Broadly speaking,
an ISP may choose to fill the FIB by making routers APRs for as many VPs
as possible, or by assigning relatively few APRs and instead filling the
FIB with Popular Prefixes.
Several basic approaches to selecting Popular Prefixes are outlined here.
Router vendors are free to implement whatever approaches they want.
</t>
<t><list style="numbers">
<t>
Policy-based:  The simplest approach for network administrators is to have
broad policies that routers use to
determine which sub-prefixes are designated as popular.
An obvious policy would be a "customer routes" policy, whereby all
customer routes are installed (as identified for instance by 
appropriate community attribute tags).
Another policy would be for a router to install prefixes originated
by specific ASes.  For instance, two ISPs could mutually agree to
install each other's originated prefixes.
A third policy might be to install prefixes with the shortest AS-path.
</t>
<t>
Static list:  Another approach would be to configure static lists of
specific prefixes to install.  For instance, prefixes associated with
an SLA might be configured.  Or, a list of prefixes for the most popular
websites might be installed.
</t>
<t>
High-volume prefixes: By installing high-volume prefixes as Popular Prefixes,
the latency and load associated with the longer path required by VA
is minimized.
One approach would be for an ISP to measure its traffic volume over
time (days or a few weeks), and statically configure high-volume prefixes
as Popular Prefixes.
There is strong evidence that prefixes that are high-volume tend to
remain high-volume over multi-day or multi-week timeframes (though not
necessarily at short timeframes like minutes or seconds).
High-volume prefixes MAY also be installed dynamically.  In other words,
a router measures its own traffic volumes, and installs and removes
Popular Prefixes in response to short term traffic load.
The downside of this approach is that it complicates debugging network
problems.  If packets are being dropped somewhere in the network, it
is more difficult to find out where if the selected path can change
dynamically.
</t>
</list></t>
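<t>
As an illustration of the high-volume approach with static configuration,
the sketch below fills whatever FIB space remains after mandatory routes
with the highest-volume suppressible prefixes.  The function name, the
traffic figures, and the notion of a single numeric FIB capacity are
assumptions made for the example.
</t>
<figure><artwork><![CDATA[
```python
def select_popular(volume_by_prefix, must_install, fib_capacity):
    """Pick Popular Prefixes by measured volume, highest first.

    volume_by_prefix  measured traffic per suppressible prefix
    must_install      prefixes that occupy FIB space regardless
    fib_capacity      total number of FIB entries available
    """
    budget = fib_capacity - len(must_install)
    if budget <= 0:
        return []
    candidates = [p for p in volume_by_prefix if p not in must_install]
    candidates.sort(key=lambda p: volume_by_prefix[p], reverse=True)
    return candidates[:budget]


# Hypothetical multi-day traffic measurements (arbitrary units).
volumes = {
    "20.1.0.0/16": 5,
    "20.2.0.0/16": 50,
    "21.3.0.0/16": 20,
    "21.4.0.0/16": 1,
}
popular = select_popular(volumes, must_install={"21.4.0.0/16"},
                         fib_capacity=3)
```
]]></artwork></figure>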
</section> <!-- "spec-pop-pre" -->
</section> <!-- "spec-suppress" -->

<!--
<section anchor="sec-core-edge" title="Core-Edge Operation">
<t>
A common style of router deployment in ISPs is the "core-edge" deployment,
whereby there is a core of high-capacity routers surrounded by
potentially lower-capacity "edge" routers that may not carry the whole DFRT,
and which default route to a core router.
VA can support this style of configuration be effectively defining a
single VP as 0/0, and by defining core routers to be APRs for 0/0.
This results in core routers maintaining full FIBs, and edge routers
having potentially extremely small FIBs.
The advantage of using VA to support core-edge topologies is that, with
VA, any edge router, including those peering with other ISPs, can have
a small FIB.  Today such routers must maintain the full DFRT in order
to peer.
</t>
<t>
Vendors may wish to facilitate configuration of a core-edge style of VA
for its customers that already use a core-edge topology.  In other words,
a vendor may wish to simplify the VA configuration task so that a customer
merely needs to configure which of its routers are core and which are edge,
and the appropriate VA configuration, i.e. the VP-List, tunnels, and
Popular Prefixes, is automatically done "under the hood" so to speak.
Note that,
under a core-edge configuration, it isn't strictly speaking necessary
for core routers to advertise the 0/0 VP within BGP.  
Rather, edge routers could
rely on their default route to a core router.
</t>
</section> <!-- "sec-core-edge" -->
-->
</section> <!-- "va-ops" -->


<section anchor="sec-new-config" title="New Configuration">
<t>
VA places new configuration requirements on ISP administrators.
Namely, the administrator must:
<t><list style="numbers">
<t>
Select VPs, and configure the VP-List into all VA routers.
As a general rule, having a larger number of relatively small prefixes
gives administrators the most flexibility in terms of filling
available FIB with sub-prefixes, and in terms of balancing load across
routers.  Once an administrator has selected a VP-List, it is just
as easy to configure routers with a large list as a small list.
We can expect network operator groups like NANOG to compile good
VP-Lists that ISPs can then adopt.  A good list would be one where
the number of VPs is relatively large, say 100 or so
(noting again that each VP must be larger than
any real prefix), and the number of sub-prefixes within each VP is
roughly the same.
</t>
<t>
Select and configure APRs.
There are three primary considerations here.  First, there must be enough
APRs to handle reasonable APR failure scenarios.
Second, APR assignment should not result in router overload.
Third, particularly long paths should be avoided.
Ideally there should be two APRs for each VP within each PoP, but this
may not be possible for small PoPs.
Failing this, there should be at least two APRs in each geographical
region, so as to minimize path length increase.
Routers should have the appropriate counters to allow administrators to
know the volume of APR traffic each router is handling so as to adjust
load by adding or removing APR assignments.
</t>
<t>
Select and configure Popular Prefixes or Popular Prefix policies.
There are two general goals here.  The first is to minimize load overall
by minimizing the number of packets that take longer paths.
The second is to ensure that specific selected prefixes don't have overly
long paths.
These goals must be weighed against the administrative overhead
of configuring potentially thousands of Popular Prefixes.
As one example, a small ISP may wish to keep it simple by
doing nothing more than
indicating that customer routes should be installed.  In this case,
the administrator could otherwise assign as many APRs as possible
while leaving enough FIB space for customer routes.
As another example, a large ISP could build a management system that
takes into consideration the traffic matrix, customer SLAs,
robustness requirements, FIB sizes, topology, and router capacity,
and periodically automatically computes APR and
Popular Prefix assignments.
</t>
</list></t>
</t>
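<t>
A management system might check the APR redundancy guideline above
automatically.  The sketch below is a minimal example of such a check;
the function name, the (router, PoP, VP) tuple format, and the PoP names
are assumptions, and each router is assumed to appear at most once per
VP.
</t>
<figure><artwork><![CDATA[
```python
from collections import Counter


def under_provisioned(pops, vp_list, assignments, min_aprs=2):
    """Return sorted (pop, vp) pairs served by fewer than min_aprs APRs.

    assignments is an iterable of (router, pop, vp) tuples.
    """
    counts = Counter((pop, vp) for _router, pop, vp in assignments)
    return sorted((pop, vp)
                  for pop in pops
                  for vp in vp_list
                  if counts[(pop, vp)] < min_aprs)


# Hypothetical deployment: two PoPs, one VP.
assignments = [
    ("r1", "fra", "20.0.0.0/7"),
    ("r2", "fra", "20.0.0.0/7"),
    ("r3", "nyc", "20.0.0.0/7"),
]
gaps = under_provisioned(["fra", "nyc"], ["20.0.0.0/7"], assignments)
```
]]></artwork></figure>
<t>
Here the PoP "nyc" is reported, since it has only one APR for the VP.
</t>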
</section> <!-- "sec-new-config" -->
</section> <!-- "router-changes" -->

<section anchor="sec-tunnels" title="Usage of Tunnels">

<section anchor="sec-mpls" title="MPLS tunnels">
<t>
VA utilizes a straightforward application of MPLS.  The tunnels
are MPLS Label Switched Paths (LSP), and are signaled using either the
Label Distribution Protocol (LDP) <xref target="RFC5036"></xref> or
RSVP-TE <xref target="RFC3209"></xref>.
Both VA and legacy routers MUST participate in this signaling.
</t>
<t>
APRs and ASBRs initiate tunnels.  In both cases, 
Downstream Unsolicited tunnels are initiated to all IGP neighbors
with the full BGP NEXT_HOP address as the Forwarding Equivalence Class (FEC).
In the case of APRs, the BGP NEXT_HOP is the APR's own address.
In the case of legacy ASBRs, the BGP NEXT_HOP is the ASBR's own address.
In the case of VA ASBRs, the BGP NEXT_HOP is that of the
remote ASBR.
</t>
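<t>
The choice of FEC for each kind of tunnel initiator can be summarized in
a short sketch.  The role names and the function tunnel_fec() are
hypothetical; the full-length prefix is derived with Python's standard
ipaddress module.
</t>
<figure><artwork><![CDATA[
```python
import ipaddress


def tunnel_fec(role, own_address, remote_asbr=None):
    """Return the FEC (full-length prefix of the BGP NEXT_HOP)
    that a router advertises when initiating its tunnels."""
    if role in ("apr", "legacy-asbr"):
        next_hop = own_address      # NEXT_HOP is the router itself
    elif role == "va-asbr":
        if remote_asbr is None:
            raise ValueError("a VA ASBR needs the remote ASBR address")
        next_hop = remote_asbr      # NEXT_HOP is the remote ASBR
    else:
        raise ValueError("unknown role: " + role)
    # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
    return str(ipaddress.ip_network(next_hop))
```
]]></artwork></figure>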
<t>
Existing Penultimate
Hop Popping (PHP) mechanisms in the data plane can be used
for forwarding packets to remote ASBRs.
</t>
</section> <!-- "sec-mpls" -->

<section anchor="sec-inner" title="Usage of Inner Label">
<t>
Besides using a separate LSP to identify the remote ASBR as
described above, it is also possible to use an inner label to
identify the remote ASBR.
Either an outer label or an IP tunnel identifies
the local ASBR.
</t>

<t>
When a local ASBR advertises a route into iBGP, it sets the NEXT_HOP
to itself, and assigns a label to the route.
This label is used as the inner label, and 
identifies the remote ASBR from which the route was received
<xref target="RFC3107"></xref>.
</t>
<t>
The presence of the inner label in the iBGP update acts as the signal
to the receiving router that an inner label MUST be used in
packets tunneled to the NEXT_HOP address.
If there is an LSP established targeted to the NEXT_HOP address,
then it is used to tunnel the packet to the NEXT_HOP address.
Otherwise, an IP header addressed to the NEXT_HOP address is used.
</t>
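<t>
The encapsulation choice made by the sending router can be sketched as
follows.  The header representation as a list of (kind, value) pairs,
the label values, and the function name encapsulate() are illustrative
assumptions only.
</t>
<figure><artwork><![CDATA[
```python
def encapsulate(payload, next_hop, inner_label, lsp_labels):
    """Build the header stack for a packet tunneled to next_hop.

    inner_label  label learned from the iBGP update; it identifies
                 the remote ASBR
    lsp_labels   mapping of NEXT_HOP address to the outer label of an
                 established LSP (empty if no LSP exists)
    """
    if next_hop in lsp_labels:
        outer = ("mpls", lsp_labels[next_hop])  # LSP to the local ASBR
    else:
        outer = ("ip", next_hop)                # IP tunnel fallback
    return [outer, ("mpls", inner_label), ("payload", payload)]
```
]]></artwork></figure>
<t>
With an LSP to 192.0.2.1 the result carries two MPLS labels; without one,
the inner label is carried inside an IP header addressed to 192.0.2.1.
</t>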

</section> <!-- "sec-inner" -->
</section> <!-- "sec-tunnels" -->

<section anchor="sec-iana" title="IANA Considerations">
<t>
There are no IANA considerations.
</t>
</section>

<section anchor="sec-consider" title="Security Considerations">
<t>
We consider the security implications of VA under two scenarios, one where
VA is configured and operated correctly, and one where it is mis-configured.
A cornerstone of VA operation is that the basic behavior of BGP doesn't
change, especially inter-domain.  Among other things, this makes it easier
to reason about security.
</t>

<section anchor="sec-good-con" title="Properly Configured VA">
<t>
If VA is configured and operated properly, then the
external behavior of an AS does not change.  The same upstream ASes are
selected, and the same prefixes and AS-paths are advertised.
Therefore, a properly configured VA domain has no security impact on other
domains.
</t>
<t>
If another ISP starts advertising a prefix that is larger than a given VP,
this prefix will be ignored by APRs that have a VP that falls within
the larger prefix (<xref target="spec-apr"></xref>).
As a result, packets that might otherwise have been routed to the new
larger prefix will be dropped at the APRs.
Note that the trend in the Internet is towards large prefixes being
broken up into smaller ones, not the reverse.
Therefore, such a larger prefix is likely to be invalid.  If it is
determined without a doubt that the larger prefix is valid, then the
ISP will have to reconfigure its VPs.
</t>
<t>
VA does not change an ISP's ability to do ingress filtering using
strict uRPF (<xref target="spec-suppress"></xref>).
</t>
<t>
Regarding DoS attacks, there are two issues that need to be considered.
First, does VA result in new types of DoS attacks?  Second, does VA
make it more difficult to deploy DoS defense systems?
Regarding the first issue, 
one possibility is that an attacker targets a given
router by flooding the network with traffic to prefixes that are not
popular, and for which that router is an APR.  This would cause a
disproportionate amount of traffic to be forwarded to the APR(s).
While it is up to individual ISPs to decide if this attack is a concern,
it does not strike the authors that this attack is likely
to significantly worsen the DoS problem.
</t>
<t>
Many DoS defense
systems use dynamically established Routing Table entries to divert
victims' traffic into LSPs that carry the traffic to scrubbers.
This mechanism works with VA; it simply overrides whatever
route is in place.
This mechanism works equally well with APRs and non-APRs.
</t>
</section>

<section anchor="sec-miscon" title="Mis-configured VA">
<t>
VA introduces the possibility that a VP is advertised outside
of an AS.  This in fact should be a low probability event, but it is
considered here nonetheless.
</t>
<t>
If an AS leaks a large VP (i.e. larger than any real prefix), then the
impact is minimal.  Smaller prefixes will be preferred because of best-match
semantics, and so the only impact is that packets that otherwise have no
matching routes will be sent to the misbehaving AS and dropped there.
If an AS leaks a small VP (i.e. smaller than a real prefix), then
packets to that prefix will be hijacked by the misbehaving AS and dropped.
This can happen with or without VA, and so doesn't represent a new
security problem per se.
</t>
</section>
</section>

<section anchor="sec-ack" title="Acknowledgements">
<t>
The authors would like to acknowledge the efforts of Xinyang Zhang
and Jia Wang, who worked on CRIO (Core Router Integrated Overlay), an
early inter-domain variant of FIB suppression, and the efforts of Hitesh Ballani
and Tuan Cao, who worked on the configuration-only variant of VA
that works with legacy routers.
We would also like to thank Scott Brim,
Daniel Ginsburg, and Rajiv Asati for their helpful comments.
In particular, Daniel's comments significantly simplified the spec
(eliminating the need for a new Extended Communities Attribute).
</t>
</section> <!-- "sec-ack" -->

    </middle>

    <back>

	    <references title='Normative References'>&rfc2119;


<reference anchor='RFC3107'>
<front>
<title>Carrying Label Information in BGP-4</title>
<author initials='Y.' surname='Rekhter' fullname='Y. Rekhter'>
<organization /></author>
<author initials='E.' surname='Rosen' fullname='E. Rosen'>
<organization /></author>
<date year='2001' month='May' />
<abstract>
<t>This document specifies the way in which the label mapping information for a particular route is piggybacked in the same Border Gateway Protocol (BGP) Update message that is used to distribute the route itself. [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='3107' />
<format type='TXT' octets='16442' target='ftp://ftp.isi.edu/in-notes/rfc3107.txt' />
</reference>

<reference anchor='RFC4271'>

<front>
<title>A Border Gateway Protocol 4 (BGP-4)</title>
<author initials='Y.' surname='Rekhter' fullname='Y. Rekhter'>
<organization /></author>
<author initials='T.' surname='Li' fullname='T. Li'>
<organization /></author>
<author initials='S.' surname='Hares' fullname='S. Hares'>
<organization /></author>
<date year='2006' month='January' />
<abstract>
<t>This document discusses the Border Gateway Protocol (BGP), which is an inter-Autonomous System routing protocol.</t><t> The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of Autonomous Systems (ASes) that reachability information traverses. This information is sufficient for constructing a graph of AS connectivity for this reachability from which routing loops may be pruned, and, at the AS level, some policy decisions may be enforced.</t><t> BGP-4 provides a set of mechanisms for supporting Classless Inter-Domain Routing (CIDR). These mechanisms include support for advertising a set of destinations as an IP prefix, and eliminating the concept of network "class" within BGP. BGP-4 also introduces mechanisms that allow aggregation of routes, including aggregation of AS paths.</t><t> This document obsoletes RFC 1771. [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='4271' />
<format type='TXT' octets='222702' target='ftp://ftp.isi.edu/in-notes/rfc4271.txt' />
</reference>



<reference anchor='RFC5036'>

<front>
<title>LDP Specification</title>
<author initials='L.' surname='Andersson' fullname='L. Andersson'>
<organization /></author>
<author initials='I.' surname='Minei' fullname='I. Minei'>
<organization /></author>
<author initials='B.' surname='Thomas' fullname='B. Thomas'>
<organization /></author>
<date year='2007' month='October' />
<abstract>
<t>The architecture for Multiprotocol Label Switching (MPLS) is described in RFC 3031.  A fundamental concept in MPLS is that two Label Switching Routers (LSRs) must agree on the meaning of the labels used to forward traffic between and through them.  This common understanding is achieved by using a set of procedures, called a label distribution protocol, by which one LSR informs another of label bindings it has made.  This document defines a set of such procedures called LDP (for Label Distribution Protocol) by which LSRs distribute labels to support MPLS forwarding along normally routed paths. [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='5036' />
<format type='TXT' octets='287101' target='ftp://ftp.isi.edu/in-notes/rfc5036.txt' />
</reference>

<reference anchor='RFC1997'>

<front>
<title>BGP Communities Attribute</title>
<author initials='R.' surname='Chandrasekeran' fullname='Ravishanker Chandrasekeran'>
<organization>cisco Systems, Inc.</organization>
<address>
<postal>
<street>170 W. Tasman Dr.</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>US</country></postal>
<email>rchandra@cisco.com</email></address></author>
<author initials='P.' surname='Traina' fullname='Paul Traina'>
<organization>cisco Systems, Inc.</organization>
<address>
<postal>
<street>170 W. Tasman Dr.</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>US</country></postal>
<email>pst@cisco.com</email></address></author>
<author initials='T.' surname='Li' fullname='Tony Li'>
<organization />
<address>
<email>tli@skat.usc.edu</email></address></author>
<date year='1996' month='August' />
<abstract>
<t>Border Gateway Protocol is an inter-autonomous system routing protocol designed for TCP/IP internets. This document describes an extension to BGP which may be used to pass additional information to both neighboring and remote BGP peers. The intention of the proposed technique is to aid in policy administration and reduce the management complexity of maintaining the Internet.</t></abstract></front>

<seriesInfo name='RFC' value='1997' />
<format type='TXT' octets='8275' target='ftp://ftp.isi.edu/in-notes/rfc1997.txt' />
</reference>


<!--
<reference anchor='RFC3392'>

<front>
<title>Capabilities Advertisement with BGP-4</title>
<author initials='R.' surname='Chandra' fullname='R. Chandra'>
<organization /></author>
<author initials='J.' surname='Scudder' fullname='J. Scudder'>
<organization /></author>
<date year='2002' month='November' /></front>

<seriesInfo name='RFC' value='3392' />
<format type='TXT' octets='9885' target='ftp://ftp.isi.edu/in-notes/rfc3392.txt' />
</reference>
-->


<reference anchor='RFC2827'>

<front>
<title>Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing</title>
<author initials='P.' surname='Ferguson' fullname='P. Ferguson'>
<organization /></author>
<author initials='D.' surname='Senie' fullname='D. Senie'>
<organization /></author>
<date year='2000' month='May' />
<abstract>
<t>This paper discusses a simple, effective, and straightforward method for using ingress traffic filtering to prohibit DoS (Denial of Service) attacks which use forged IP addresses to be propagated from 'behind' an Internet Service Provider's (ISP) aggregation point.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t></abstract></front>

<seriesInfo name='BCP' value='38' />
<seriesInfo name='RFC' value='2827' />
<format type='TXT' octets='21258' target='ftp://ftp.isi.edu/in-notes/rfc2827.txt' />
</reference>


<!--
<reference anchor='RFC2328'>

<front>
<title>OSPF Version 2</title>
<author initials='J.' surname='Moy' fullname='John Moy'>
<organization>Ascend Communications, Inc.</organization>
<address>
<postal>
<street>1 Robbins Road</street>
<city>Westford</city>
<region>MA</region>
<code>01886</code></postal>
<phone>978-952-1367</phone>
<facsimile>978-392-2075</facsimile>
<email>jmoy@casc.com</email></address></author>
<date year='1998' month='April' />
<area>Routing</area>
<keyword>open shortest-path first protocol</keyword>
<keyword>routing</keyword>
<keyword>OSPF</keyword>
<abstract>
<t>

    This memo documents	version	2 of the OSPF protocol.	 OSPF is a
    link-state routing protocol.  It is	designed to be run internal to a
    single Autonomous System.  Each OSPF router	maintains an identical
    database describing	the Autonomous System's	topology.  From	this
    database, a	routing	table is calculated by constructing a shortest-
    path tree.
</t>
<t>
    OSPF recalculates routes quickly in	the face of topological	changes,
    utilizing a	minimum	of routing protocol traffic.  OSPF provides
    support for	equal-cost multipath.  An area routing capability is
    provided, enabling an additional level of routing protection and a
    reduction in routing protocol traffic.  In addition, all OSPF
    routing protocol exchanges are authenticated.
</t>
<t>
    The	differences between this memo and RFC 2178 are explained in
    Appendix G.	All differences	are backward-compatible	in nature.
    Implementations of this memo and of	RFCs 2178, 1583, and 1247 will
    interoperate.
</t>
<t>
    Please send	comments to ospf@gated.cornell.edu.
</t></abstract></front>

<seriesInfo name='STD' value='54' />
<seriesInfo name='RFC' value='2328' />
<format type='TXT' octets='447367' target='ftp://ftp.isi.edu/in-notes/rfc2328.txt' />
<format type='XML' octets='446761' target='http://xml.resource.org/public/rfc/xml/rfc2328.xml' />
</reference>
-->

  
<reference anchor='RFC3209'>

<front>
<title>RSVP-TE: Extensions to RSVP for LSP Tunnels</title>
<author initials='D.' surname='Awduche' fullname='D. Awduche'>
<organization /></author>
<author initials='L.' surname='Berger' fullname='L. Berger'>
<organization /></author>
<author initials='D.' surname='Gan' fullname='D. Gan'>
<organization /></author>
<author initials='T.' surname='Li' fullname='T. Li'>
<organization /></author>
<author initials='V.' surname='Srinivasan' fullname='V. Srinivasan'>
<organization /></author>
<author initials='G.' surname='Swallow' fullname='G. Swallow'>
<organization /></author>
<date year='2001' month='December' />
<abstract>
<t>This document describes the use of RSVP (Resource Reservation Protocol), including all the necessary extensions, to establish label-switched paths (LSPs) in MPLS (Multi-Protocol Label Switching).  Since the flow along an LSP is completely identified by the label applied at the ingress node of the path, these paths may be treated as tunnels.  A key application of LSP tunnels is traffic engineering with MPLS as specified in RFC 2702. [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='3209' />
<format type='TXT' octets='132264' target='ftp://ftp.isi.edu/in-notes/rfc3209.txt' />
</reference>


<reference anchor='RFC4601'>

<front>
<title>Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)</title>
<author initials='B.' surname='Fenner' fullname='B. Fenner'>
<organization /></author>
<author initials='M.' surname='Handley' fullname='M. Handley'>
<organization /></author>
<author initials='H.' surname='Holbrook' fullname='H. Holbrook'>
<organization /></author>
<author initials='I.' surname='Kouvelas' fullname='I. Kouvelas'>
<organization /></author>
<date year='2006' month='August' />
<abstract>
<t>This document specifies Protocol Independent Multicast - Sparse Mode (PIM-SM). PIM-SM is a multicast routing protocol that can use the underlying unicast routing information base or a separate multicast-capable routing information base. It builds unidirectional shared trees rooted at a Rendezvous Point (RP) per group, and optionally creates shortest-path trees per source.</t><t> This document obsoletes RFC 2362, an Experimental version of PIM-SM. [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='4601' />
<format type='TXT' octets='340632' target='ftp://ftp.isi.edu/in-notes/rfc4601.txt' />
</reference>




	    </references>




  <references title='Informative References'>

<reference anchor="nsdi09">
  <front>
  <title>Making Routers Last Longer with ViAggre</title> 
  <author initials="H" surname="Ballani" fullname="Hitesh Ballani">
  <organization /> 
  </author>
  <author initials="P" surname="Francis" fullname="Paul Francis">
  <organization /> 
  </author>
  <author initials="T" surname="Cao" fullname="Tuan Cao">
  <organization /> 
  </author>
  <author initials="J" surname="Wang" fullname="Jia Wang">
  <organization /> 
  </author>
  <date month="April" year="2009" /> 
  </front>
  <seriesInfo name="ACM Usenix NSDI 2009" value="http://www.usenix.org/events/nsdi09/tech/full_papers/ballani/ballani.pdf" /> 
  </reference>

  
<reference anchor='I-D.ietf-grow-va-gre'>
<front>
<title>GRE and IP-in-IP Tunnels for Virtual Aggregation</title>

<author initials='P' surname='Francis' fullname='Paul Francis'>
    <organization />
</author>

<author initials='R' surname='Raszuk' fullname='Robert Raszuk'>
    <organization />
</author>

<author initials='X' surname='Xu' fullname='Xiaohu Xu'>
    <organization />
</author>

<date month='July' day='6' year='2009' />

<abstract><t>The document "FIB Suppression with Virtual Aggregation" [I-D.grow-va] describes how FIB size may be reduced.  That draft refers generically to tunnels, and leaves it to other documents to define the tunnel establishment methods for specific tunnel types.  This document provides those definitions for GRE and IP-in-IP tunnels.</t></abstract>

</front>

<seriesInfo name='Internet-Draft' value='draft-ietf-grow-va-gre-00' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-ietf-grow-va-gre-00.txt' />
</reference>


<reference anchor='I-D.ietf-grow-va-mpls'>
<front>
<title>MPLS Tunnels for Virtual Aggregation</title>

<author initials='P' surname='Francis' fullname='Paul Francis'>
    <organization />
</author>

<author initials='X' surname='Xu' fullname='Xiaohu Xu'>
    <organization />
</author>

<date month='May' day='23' year='2009' />

<abstract><t>The document "FIB Suppression with Virtual Aggregation" [I-D.francis-intra-va] describes how FIB size may be reduced.  The latest revision of that draft refers generically to tunnels, and leaves it to other documents to define the usage and signaling methods for specific tunnel types.  This document provides those definitions for MPLS Label Switched Paths (LSP), without tag stacking.</t></abstract>

</front>

<seriesInfo name='Internet-Draft' value='draft-ietf-grow-va-mpls-00' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-ietf-grow-va-mpls-00.txt' />
</reference>


<reference anchor='I-D.ietf-grow-va-mpls-innerlabel'>
<front>
<title>Proposal to use an inner MPLS label to identify the remote ASBR VA</title>

<author initials='X' surname='Xu' fullname='Xiaohu Xu'>
    <organization />
</author>

<author initials='P' surname='Francis' fullname='Paul Francis'>
    <organization />
</author>

<date month='September' day='23' year='2009' />

<abstract><t>The draft "MPLS Tunnels for Virtual Aggregation" [I-D.ietf-grow-va-mpls] specifies how MPLS is used as the tunneling protocol for Virtual Aggregation (VA).  The -00 version of that draft specifies only one level of labels, with the result that one Label Switched Path (LSP) for every remote ASBR must be established.  For large ISPs, this can amount to a large number of LSPs.  This draft proposes adding the option of using an inner label to identify the remote ASBR.  Either an outer label or an IP tunnel is used to reach the local ASBR.  When MPLS is used as the tunneling protocol, this reduces the number of LSPs to the number of local border routers (ASBR).</t></abstract>

</front>

<seriesInfo name='Internet-Draft' value='draft-ietf-grow-va-mpls-innerlabel-00' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-ietf-grow-va-mpls-innerlabel-00.txt' />
</reference>


<reference anchor='I-D.ietf-grow-simple-va'>
<front>
<title>Simple Virtual Aggregation (S-VA)</title>

<author initials='P' surname='Francis' fullname='Paul Francis'>
    <organization />
</author>

<author initials='X' surname='Xu' fullname='Xiaohu Xu'>
    <organization />
</author>

<author initials='H' surname='Ballani' fullname='Hitesh Ballani'>
    <organization />
</author>

<author initials='R' surname='Raszuk' fullname='Robert Raszuk'>
    <organization />
</author>

<author initials='L' surname='Zhang' fullname='Lixia Zhang'>
    <organization />
</author>

<date month='March' day='1' year='2010' />

<abstract><t>The continued growth in the Default Free Routing Table (DFRT) stresses the global routing system in a number of ways.  One of the most costly stresses is FIB size: ISPs often must upgrade router hardware simply because the FIB has run out of space, and router vendors must design routers that have adequate FIB.  FIB suppression is an approach to relieving stress on the FIB by NOT loading selected RIB entries into the FIB.  Simple Virtual Aggregation (S-VA) is a simple form of Virtual Aggregation (VA) that allows any and all edge routers to shrink their FIB requirements substantially and therefore increase their useful lifetime.  S-VA does not change FIB requirements for core routers.  S-VA is extremely easy to configure---considerably more so than the various tricks done today to extend the life of edge routers.  S-VA can be deployed autonomously by an ISP (cooperation between ISPs is not required), and can co-exist with legacy routers in the ISP.</t></abstract>

</front>

<seriesInfo name='Internet-Draft' value='draft-ietf-grow-simple-va-00' />
<format type='TXT'
        target='http://www.ietf.org/internet-drafts/draft-ietf-grow-simple-va-00.txt' />
</reference>

<!--
<reference anchor='RFC4724'>

<front>
<title>Graceful Restart Mechanism for BGP</title>
<author initials='S.' surname='Sangli' fullname='S. Sangli'>
<organization /></author>
<author initials='E.' surname='Chen' fullname='E. Chen'>
<organization /></author>
<author initials='R.' surname='Fernando' fullname='R. Fernando'>
<organization /></author>
<author initials='J.' surname='Scudder' fullname='J. Scudder'>
<organization /></author>
<author initials='Y.' surname='Rekhter' fullname='Y. Rekhter'>
<organization /></author>
<date year='2007' month='January' />
<abstract>
<t>This document describes a mechanism for BGP that would help minimize the negative effects on routing caused by BGP restart. An End-of-RIB marker is specified and can be used to convey routing convergence information. A new BGP capability, termed "Graceful Restart Capability", is defined that would allow a BGP speaker to express its ability to preserve forwarding state during BGP restart. Finally, procedures are outlined for temporarily retaining routing information across a TCP session termination/re-establishment.</t><t> The mechanisms described in this document are applicable to all routers, both those with the ability to preserve forwarding state during BGP restart and those without (although the latter need to implement only a subset of the mechanisms described in this document). [STANDARDS TRACK]</t></abstract></front>

<seriesInfo name='RFC' value='4724' />
<format type='TXT' octets='32343' target='ftp://ftp.isi.edu/in-notes/rfc4724.txt' />
</reference>
-->


  </references>
	    

    </back>

</rfc>
