One document matched: draft-zhang-fibaggregation-01.txt
Differences from draft-zhang-fibaggregation-00.txt
Network Working Group B. Zhang
Internet-Draft Univ. of Arizona
Intended status: Informational L. Wang
Expires: April 29, 2010 Univ. of Memphis
X. Zhao
Univ. of Arizona
Y. Liu
Univ. of Memphis
L. Zhang
UCLA
October 26, 2009
FIB Aggregation
draft-zhang-fibaggregation-01.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 29, 2010.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
Zhang, et al. Expires April 29, 2010 [Page 1]
Internet-Draft FIB Aggregation October 2009
and restrictions with respect to this document.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Abstract
The rapid growth of Forwarding Information Base (FIB) has raised
concerns among many Internet Service Providers. One potential
solution to this problem is FIB aggregation, i.e. letting each router
aggregate its FIB entries without affecting the forwarding paths
taken by data traffic. It is a simple local software optimization
within a router, requiring no changes to routing protocols or router
hardware. To understand the effectiveness of using FIB aggregation
to extend router lifetime, in this draft we present several FIB
aggregation algorithms and evaluate their performance using routing
tables and updates collected from tens of networks. Our results show
that FIB aggregation can reduce the FIB table size by as much as 70%
with small computational overhead. We also show that the
computational overhead can be controlled through various mechanisms.
Zhang, et al. Expires April 29, 2010 [Page 2]
Internet-Draft FIB Aggregation October 2009
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. FIB Aggregation Overview . . . . . . . . . . . . . . . . . . . 6
3. FIB Aggregation Techniques and Algorithms . . . . . . . . . . 7
3.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7
3.2. Level 1 Aggregation . . . . . . . . . . . . . . . . . . . 9
3.3. Level 2 Aggregation . . . . . . . . . . . . . . . . . . . 9
3.4. Level 3 Aggregation . . . . . . . . . . . . . . . . . . . 10
3.5. Level 4 Aggregation . . . . . . . . . . . . . . . . . . . 12
3.6. Practical Considerations . . . . . . . . . . . . . . . . . 13
4. Updating Aggregated FIB . . . . . . . . . . . . . . . . . . . 14
5. Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.1. Methodology . . . . . . . . . . . . . . . . . . . . . . . 17
5.2. Table Size Reduction and Overhead . . . . . . . . . . . . 18
5.3. Routing Update Handling . . . . . . . . . . . . . . . . . 20
6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
8. Security Considerations . . . . . . . . . . . . . . . . . . . 23
9. Informative References . . . . . . . . . . . . . . . . . . . . 23
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
Zhang, et al. Expires April 29, 2010 [Page 3]
Internet-Draft FIB Aggregation October 2009
1. Introduction
The Internet routing scalability problem has raised concerns in both
industry and research communities, as documented in the report from
the IAB Workshop on Routing and Addressing [RAWS]. Several solutions
have been proposed under the IRTF RRG and IETF GROW Working Group.
To address the root cause of the scalability problem may require
changes to the Internet routing architecture and protocols. However,
deploying these architectural and protocol changes may take a long
time, as we have learned from the design and deployment of IPv6.
Thus we should also explore simple short-term solutions that can help
ISPs address problems facing them today. In particular, ISPs
urgently need a solution to reduce their forwarding table size.
Forwarding tables are derived from routing tables and router
configurations, so their size increases as routing tables grow.
However, forwarding tables use high performance memory that is more
expensive and more difficult to scale than the memory used to hold
routing tables. Therefore, their size is a more immediate concern to
ISPs as well as vendors.
This draft reports our investigation on the feasibility of a purely
local solution: FIB aggregation, which is to combine multiple entries
in the routing table to one entry in the forwarding table without
changing the next hop for data traffic. This approach is
particularly appealing because it can be done by a software upgrade
at one router at a time, and its impact is limited within the router.
It does not require changes to routing protocols or router hardware,
nor does it affect multi-homing, traffic engineering, or other
network-wide operations. It is important to note that FIB
aggregation is not a replacement for the long-term architectural
solutions; rather, FIB aggregation is a local solution that can be
quickly implemented and deployed in the short-term, and in the long
run, it can co-exist and complement architectural solutions.
The idea of FIB aggregation is not new. It was mentioned as a
potential strategy in "Preliminary Recommendation for a Routing
Architecture" [Li09RRG]. Through personal exchanges, we have learned
that one vendor has implemented a simple FIB aggregation scheme
(similar to our Level-1 aggregation). There is also a patent for a
FIB aggregation algorithm [Cain02AUTOAGG]. In this work, we evaluate
the benefits and costs of several FIB aggregation schemes. We
recognize that FIB aggregation is an opportunistic technique -- its
effectiveness depends on what prefixes are present in the table, how
many of them can be numerically represented by a single prefix, and
how many of them share the same next-hop in data forwarding. The
benefits of FIB aggregation also come with certain cost, such as
extra CPU overhead. The cost also depends on the actual aggregation
algorithms, and how routing changes are handled to update the
Zhang, et al. Expires April 29, 2010 [Page 4]
Internet-Draft FIB Aggregation October 2009
forwarding table.
We design and implement five algorithms at different aggregation
levels, each representing different tradeoffs between table size
reduction and computation complexity. We evaluate them using
publicly available routing tables from tens of networks. The results
show that the lowest-level aggregation can reduce the FIB table size
by 30% to 50%, equivalent to the full DFZ routing table size two and
half years ago, while the highest-level aggregation can reduce the
FIB table size by 70%, equivalent to the full DFZ table size eight
years ago. The computation time of one full aggregation run ranges
from 50 milliseconds to 250 milliseconds on a commodity Linux
machine. Although these numbers may not reflect the computation time
on a router, they reflect the relative speed of different levels of
aggregation. Moreover, such full aggregation is performed only when
certain thresholds are reached, so the computation time is amortized
over time.
To handle routing changes, we design and implement algorithms to
incrementally update the aggregated FIB upon a change. The full
aggregation algorithm is invoked only when the router CPU load is low
or the FIB size rises above a given threshold. The computation time
can be further reduced by not aggregating highly active prefixes,
which represent a small fraction of the routing table but are
responsible for many routing changes. The evaluation using one-month
BGP routing updates shows that, compared with un-aggregated FIB, the
computation overhead of maintaining aggregated FIB over time is
small. Overall, we believe that FIB aggregation is a viable solution
to the routing scalability problem, since it provides significant
reduction in table size, and its extra computation can be controlled
by various mechanisms, such as using different levels of aggregation,
incremental update algorithms, on-deman invocation, and not
aggregating highly active prefixes.
Another proposed approach to FIB size reduction is Virtual
Aggregation [Virtual_Aggregation]. Virtual Aggregation designates a
small set of routers (APRs) to announce short prefixes (called
virtual prefixes), so that other routers do not need to install in
their FIB more specific prefixes under those virtual prefixes; they
can simply forward packets falling with the virtual prefixes to the
APRs responsible for those virtual prefixes. It can be independently
deployed by a single ISP, and does not require changes to the routing
architecture or protocols. However, Virtual Aggregation requires
considerable changes to network-wide router configurations and
specialized routers to announce virtual prefixes. Moreover, it
introduces extra delays (path stretch) in packet delivery. To some
extent, the Level 4 FIB aggregation described in this draft can be
viewed as a local Virtual Aggregation within a router.
Zhang, et al. Expires April 29, 2010 [Page 5]
Internet-Draft FIB Aggregation October 2009
2. FIB Aggregation Overview
FIB aggregation eliminates and aggregates entries in a FIB without
affecting the delivery of data traffic. For example, it may remove
an entry for prefix P1 from the FIB if another entry for prefix P2
with the same next-hop already covers P1's address space. It may
also introduce a new entry to the FIB after removing several entries
that share the same next-hop.
To ensure packet delivery and avoid changing the path that a packet
will take, a FIB aggregation scheme should satisfy the property of
forwarding correctness: after longest match lookup, if a destination
address has a non-NULL next-hop in the un-aggregated FIB, it should
have the same next-hop in the aggregated FIB. There are two main
questions in designing FIB aggregation techniques: how to aggregate
the full FIB and how to update an aggregated FIB upon a routing
change. In the next two sections, we will describe five aggregation
algorithms and one update handling algorithm.
The effectiveness of FIB aggregation depends on which address
prefixes are present in the routing table and how these prefixes are
distributed over next-hop routers. Generally speaking, the fewer
neighbors a router has, the better aggregation it may achieve. In
the extreme case, all prefixes share the same next-hop and the
aggregation is maximized. According to Li et. al [RouterTopology],
although some routers have high degrees up to a couple of hundreds,
these routers connect to a large number of end-customers, not transit
neighbor routers. Therefore, they will still use only a small number
of next-hops, i.e., the transit neighbors, to reach most of the
address prefixes.
Besides sharing the same next-hop, prefixes also need to be
numerically aggregatable. This is possible due to two factors.
First, in IP address allocation, large blocks of Internet addresses
are first allocated to Regional Internet Registries and then they
further allocate the addresses to networks within the same region. A
router outside the region tends to use the same nexthop to reach
these address prefixes, which can then be aggregated. Second, for
prefixes split for traffic engineering or other purposes, a router
near the origin network is likely to have different next-hops, but a
router further away from the origin network is more likely to have
the same next-hop towards these numerically aggregatable prefixes.
Therefore, although FIB aggregation is opportunistic and the
aggregation degree varies from router to router, there are some
inherent properties of the Internet that can make FIB aggregation
effective. If FIB aggregation is effective in reducing table size,
its most appealing feature is that the impact is limited within a
Zhang, et al. Expires April 29, 2010 [Page 6]
Internet-Draft FIB Aggregation October 2009
router's data plane. It does not change any routing protocols, or
any router's routing decisions. Data traffic still flows on the same
router paths. Therefore, it can co-exist with almost any new routing
protocols, including those architectural solutions to the routing
scalability problem in the long run.
3. FIB Aggregation Techniques and Algorithms
We present four levels of full FIB aggregation, each aggregating the
table more but incuring more computation and other overhead. These
algorithms assume that the routing table is stored in a tree
structure. Though our implementation uses patricia trie, a tree-like
data structure widely used for routing tables since the BSD
implementation, the algorithms should apply to any tree data
structure. Note that our algorithms do not build any additional
trees just for aggregation; we simply use the existing trees that the
RIB and FIB use. For a network device that uses non-tree data
structure to implement routing tables, the general techniques
discussed here still apply. In real networks, the algorithms and
implementations can always be optimized for specific hardware,
operating systems, and routing table data structures.
3.1. Terminology
In a patricia trie-based RIB or FIB, IP prefixes and their next-hop
information are stored in certain trie nodes. The prefixes become
more specific along the tree, which facilitates longest prefix
matching. Figure 1 shows part of a patricia trie containing four
prefixes 12.0/14, 12.2/15, 12.0/16, and 12.1/16. Note that some of
the trie nodes, e.g. Node 1, may not have corresponding prefixes; we
call them "internal nodes". However, they may become real nodes when
their corresponding prefixes are advertised to this router. For
example, if a routing announcement for 12.0/15 is received, this
prefix and its next-hop information will be inserted at Node 1.
12.0/14
/ \
Node 1 12.2/15
/ \
12.0/16 12.1/16
Figure 1
Before describing the detailed aggregation algorithms, we introduce a
few more terms:
Zhang, et al. Expires April 29, 2010 [Page 7]
Internet-Draft FIB Aggregation October 2009
o Ancestor prefix and immediate ancestor prefix: given two address
prefixes P1 and P2, if P1 is a less specific prefix of P2, then P1
is called an ancestor prefix of P2. For example, 12/8 is an
ancestor prefix of 12.1/16. If P1 is the longest prefix among all
the ancestor prefixes of P2, it is called the immediate ancestor
prefix of P2.
o Nearest descendant prefixes: for a given prefix P in a patricia
trie, its nearest descendant prefixes includes all the prefixes
whose immediate ancestor prefix is P.
o Covering prefix and covered prefix: if an address prefix P has no
ancestor prefix in a RIB/FIB, then it is a covering prefix.
Otherwise, it is a covered prefix. Note that in a patricia trie
implementation, a covering prefix may have ancestor nodes that do
not have corresponding prefixes.
o Real prefix and generated prefix: if a prefix exists in an
unaggregated RIB/FIB, it is called a real prefix. If a prefix is
generated by aggregation, it is called a generated prefix.
o Hole: if a RIB/FIB entry X has a more specific prefix than another
entry Y does and they have different next-hops, then X punches a
hole in Y. For example, the entry (prefix: 12.1/16, nexthop: A)
punches a hole in (prefix: 12/8, nexthop: B).
o Sibling prefix: if two prefixes of the same length are numerically
consecutive and numerically aggregatable, we call them sibling
prefixes. For example, 12.2/16 and 12.3/16 are sibling prefixes.
In a patricia trie, sibling prefixes are children prefixes under
the same parent.
o IN-FIB and NON-FIB: an aggregation algorithm determines whether a
prefix should be placed in the FIB. If placed in the FIB, the
prefix is labeled as IN-FIB. Otherwise, it is labeled as NON-FIB.
o Postorder traversal: this is a recursive tree traversal algorithm
that examines the children nodes before the parent nodes. Suppose
in a given FIB tree, prefix P1 has two children prefixes P2 and P3
and prefix P2 has two children prefixes P4 and P5 (P3, P4, P5 are
leaf nodes), then the algorithm will visit the prefixes in the
following order: P4, P5, P2, P3, P1.
o Preorder traversal: this is a recursive tree traversal algorithm
that traverses each node, its left subtree and then its right
subtee. In the above example, a pre-order traversal will visit
the prefixes in the following order: P1, P2, P4, P5, P3.
Zhang, et al. Expires April 29, 2010 [Page 8]
Internet-Draft FIB Aggregation October 2009
3.2. Level 1 Aggregation
(A) (A)
/ \ => / \
/ \ / \
(A) ( ) ( ) ( )
Level-1: Removing covered prefixes
Figure 2
This technique is illustrated in Figure 2. The binary tree
represents part of the IP address space. Nodes labeled with letters
are prefixes in the routing table, and the letter represents the
next-hop for the prefix. Nodes in parenthesis without labels are
internal nodes that do not have their corresponding prefixes in the
FIB tree.
Level-1 aggregation removes a prefix if it shares the same next-hop
with its immediate ancestor prefix. Addresses that previously match
the removed prefix now will match the ancestor prefix and still get
the same next-hop. Previously non-routable packets, whose table
lookup ends up with NULL next-hop, will still be non-routable. In
other words, this aggregation does not introduce any new prefix nor
extra routable space into the table.
The algorithm implementing this technique simply traverses the tree
recursively from the root node in postorder. When it arrives at a
node with a prefix, it compares this prefix's next-hop with its
immediate ancestor prefix's next-hop. If they have the same next-
hop, it labels the current node NON-FIB, otherwise labels it IN-FIB.
The immediate ancestor prefix's next-hop is updated and remembered
during the tree traversal. Eventually every prefix node is labeled
as either NON-FIB or IN-FIB, and all IN-FIB prefixes comprise the
aggregated FIB. The aggregation is done recursively throughout the
entire table. The computation time is O(n), where n is the total
number of nodes in the tree.
3.3. Level 2 Aggregation
Zhang, et al. Expires April 29, 2010 [Page 9]
Internet-Draft FIB Aggregation October 2009
( ) (A)
/ \ => / \
/ \ / \
(A) (A) ( ) ( )
Level-2: Combining sibling prefixes
Figure 3
This technique is illustrated in Figure 3. In addition to performing
Level 1 aggregation, Level 2 combines sibling prefixes that share the
same next-hop into a parent prefix. If the parent node already has a
prefix with a different next-hop, then the aggregation cannot be
done. Or if the parent node already has a prefix with the same next-
hop, then it is part of Level 1 aggregation. Therefore, Level 2 is
done when the parent node has no prefix. The net result is to
introduce a new prefix to cover two existing prefixes, but there is
no extra routable space introduced, i.e. the aggregated FIB covers
the exact address space as the un-aggregated FIB.
The algorithm implementing Level 2 aggregation traverses the tree
recursively from the root node in postorder. Besides doing Level 1
aggregation, when it arrives at a node without a prefix, it compares
this node's two children. If both children have prefixes and use the
same next-hop, then both children are labeled NON-FIB, and this
current node is assigned the parent prefix and labeled IN-FIB. The
aggregation is done recursively throughout the entire table. The
computation time is O(n), where n is the total number of nodes in the
tree.
3.4. Level 3 Aggregation
(A) (A)
/ \ / \
/ \ / \
() () => () ()
/\ /\ /\ /\
/ \ / \ / \ / \
(A) () () (A) () [] [] ()
Level-3: Allowing extra space
Figure 4
This technique is illustrated in Figure 4 (nodes with square brackets
are extra routable space introduced by the aggregation). In addition
to performing the Level 1 and 2 aggregation, Level 3 aggregates a set
of non-sibling prefixes that have the same next-hop into a super
Zhang, et al. Expires April 29, 2010 [Page 10]
Internet-Draft FIB Aggregation October 2009
prefix. Between these non-sibling prefixes, non-routable space is
allowed. For example, in Figure 1c, at the bottom level of the tree,
there are two nodes with address prefixes (real nodes) sharing the
same next hop. However, these two nodes are separated in the tree by
two nodes without address prefixes. The prefixes of the two real
nodes can be aggregated into a grandparent prefix.
Level 3 aggregation must be implemented with care to ensure its
forwarding correctness. For example, in Figure 1c, two grandchildren
prefixes are aggregated into one grandparent prefix. This would be
incorrect if there is already a great-grandparent prefix (not shown
in the figure) covering the subtree with a different next-hop B,
because that means the two middle nodes at the bottom level are not
non-routable space and their next-hops would change from B to A after
the aggregation. In order to handle this case without introducing
much computation overhead, we decide that in our implementation we
only apply this type of aggregation to prefixes that do not have any
existing ancestor prefix, i.e. covering prefixes. In a typical DFZ
routing table, about half of all the prefixes are covering prefixes
and the other half are covered prefixes. The covered prefixes can be
aggregated by Level 1 and Level 2, therefore our choice does not lose
too much aggregation capability.
The algorithm implementing Level 3 aggregation traverses the tree
recursively in postorder. Besides doing Level 1 and Level 2
aggregation for all nodes, when it arrives at a covering prefix, it
checks whether this prefix has a sibling node that does not have a
prefix. If yes, it returns a pointer of this prefix node to its
parent node, which will further pass this pointer up along the tree.
Whenever an upper level node has two such pointers, one from a left
descendant and another from a right descendant, and these two
descendants have the same next-hop, then a new prefix is created at
this upper level node and labeled IN-FIB, while the two descendant
nodes are labeled NON-FIB. If the two descendants have different
next-hops, then aggregation cannot be done and they remain IN-FIB.
The computation complexity is O(n), where n is the number of nodes in
the tree.
Zhang, et al. Expires April 29, 2010 [Page 11]
Internet-Draft FIB Aggregation October 2009
3.5. Level 4 Aggregation
( ) (A)
/ \ / \
/ \ / \
() () => () ()
/\ /\ /\ /\
/ \ / \ / \ / \
(A) (B)() (A) () (B)[] ()
Level-4: Allowing holes in the aggregation
Figure 5
This technique is illustrated in Figure Figure 5. In addition to
performing Level 1, 2 and 3 aggregation, Level 4 aggregates a set of
non-sibling prefixes with the same next-hops, which have other
prefixes with different next-hops in between. For example, in Figure
1d, an address prefix with next-hop B is between the prefixes being
aggregated to a new prefix with next-hop A, punching a "hole'' in the
aggregate prefix. This type of aggregation maintains forwarding
correctness and may also introduce extra routable space as Level 3
does. For the same reason as in Level 3, our algorithm only applies
this type of aggregation to prefixes that do not have ancestor
prefixes.
The seemingly trivial difference between Level 4 and Level 3 actually
has significant implication to algorithm design. It allows the
maximum flexibility for aggregation. However, taking advantage of it
may also require significant computational time. For example, given
a set of non-sibling prefixes with different next-hops, which super-
prefix should be inserted? Which next-hop should the super-prefix
take? Finally, how should the decision be made without too much
computational complexity? In this work, we present and evaluate two
different Level 4 algorithms described as follows.
The Level 4A algorithm traverses the tree recursively once in
postorder. Besides doing Level 1, 2 and 3 aggregations, when it
arrives at a covering prefix (our level 4 algorithm only applies to
covering prefixes), it returns a pointer of this prefix node to its
parent, which will further pass this pointer up along the tree.
Whenever an upper level node receives two lists of its prefix
descendants, one from its left child and the other from its right
child, this node combines the two lists to get all its immediate
prefix descendants and their next-hops, picks the most popular next-
hop as its own next-hop and inserts a prefix at this node. All the
immediate prefix descendants that use the most popular next-hop will
be labeled NON-FIB, and other descendants are labeled IN-FIB. If
Zhang, et al. Expires April 29, 2010 [Page 12]
Internet-Draft FIB Aggregation October 2009
there are multiple next-hops tie for the most popular, then one of
them will be randomly selected. The computation time is O(n), where
n is the number of nodes in the tree.
The Level 4B algorithm is based on Herrin's proposal [Herrin]. It
differs from the Level 4A algorithm mainly in how the popular next-
hop is calculated -- the 4A algorithm considers only the immediate
prefix descendants of a node, while the 4B algorithm considers all
the prefix descendants of a node (i.e. all the prefixes in the tree
rooted at this node). It traverses the tree twice. The first step
traverses the tree recursively in postorder, which is like sweeping
all tree nodes from bottom up. During this process, the algorithm
calculates the most popular next-hop among all descendant prefixes of
a node and records this next-hop with the node unless this node
already has a prefix with a different next-hop. The second step
traverses the covering prefixes in preorder, which is like going
through all tree nodes from top down. During this process, the
algorithm tries to insert new prefixes with the most popular next-hop
from all prefix descendants (not just immediate prefix descendants as
in Level 4A), as calculated in the previous postorder tree traversal,
and label the descendant prefixes NON-FIB or IN-FIB accordingly.
When there are multiple equally popular next-hops, we randomly select
one. The computation time is O(n), where n is the number of nodes in
the tree. It tries to do a more thorough aggregation than Level 4A,
but will take longer time since it traverse the tree twice.
3.6. Practical Considerations
Our aggregation techniques ensure forwarding correctness by
aggregating prefixes with the same next-hop. In reality, there are
other types of information stored in a FIB entry, such as packet
count. Aggregating two prefixes will probably lose such fine-grained
statistics, which also happens in all other routing scalability
solutions, usually to a much wider extent. Losing more detailed
information is a necessary cost we have to pay in order to improve
overall system scalability. One way to mitigate the problem is
having a router-wide packet counter instead of per-FIB packet
counter, thus the aggregated information from all line cards are
still kept for each prefix. Another way is for the operators to
configure some important prefixes not be subject to FIB aggregation,
thus fine-grained statistics of these important prefixes will be
kept.
A side effect of Level 3 and 4 aggregations is that they introduce
new prefixes that cover previously non-routable space, therefore some
previously non-routable traffic (which would have been dropped by
this router) will be forwarded along the next-hop of the new
prefixes. This does not violate the correctness requirement since
Zhang, et al. Expires April 29, 2010 [Page 13]
Internet-Draft FIB Aggregation October 2009
all previously routable traffic is still routable and will follow the
same path. The impact of introducing extra routable space into the
FIB depends on how much traffic is destined to that address space.
In normal operational conditions, the volume of such traffic should
be negligible. However, malicious traffic such as port scanning
usually explores such non-routable space and in certain cases it may
become noticeable. Eventually these packets will be dropped, either
because they arrive at a router that does not have a route for these
packets, or because the packet's time-to-live expires. These packets
will consume bandwidth during transit and that is a new type of
overhead introduced by Level 3 aggregation. A good Level 3 or Level
4 algorithm should consider the tradeoff between table size
reduction, extra routable space introduced, and computation time.
In our evaluation, we limited the newly introduced prefixes to be no
longer than a certain length and found that setting the prefix length
limit to 15 results in a good trade-off between table size reduction
and extra routable space.
4. Updating Aggregated FIB
Internet routes change over time, thus the obvious question is how to
update the aggregated FIB when there is a change. Re-run the full
FIB aggregation will maintain the best aggregation all the time, but
it will also incur significant computation overhead since the full
FIB aggregation accesses every tree node, comparing to updating an
un-aggregated FIB which only accesses a single prefix node.
We use the combination of four mechanisms to make sure that the
computation cost of updating aggregated FIBs is under the control of
operators. First, operators can choose one of the four
aforementioned levels of full FIB aggregation that suits their
routers the best. Routers with faster CPU and fewer routing updates
can use higher level FIB aggregation, otherwise they can use lower
level FIB aggregation. Second, we design an algorithm that updates
the aggregated FIB incrementally. The algorithm tries to minimize
the number of tree nodes that have to be accessed and changed to
maintain forwarding correctness after a routing change. Third, the
full FIB aggregation is only invoked when needed, e.g., the table
size has crossed a threshold after being incrementally updated for a
while, or when the router has free CPU cycles to spare, i.e., the
router load is under a threshold. Fourth, as shown in
[HighlyActive], a small number of highly active prefixes are
responsible for a large number of routing changes. Therefore one can
keep these highly active prefixes in the FIB without aggregating them
out. Then every time they have a change, the update process does not
incur extra overhead. Operators can configure the criteria for
Zhang, et al. Expires April 29, 2010 [Page 14]
Internet-Draft FIB Aggregation October 2009
deciding which prefixes should be kept in FIB regardless of the
aggregation process. This last method will be evaluated in our
future work.
The basic algorithm of incrementally updating the aggregated FIB is
described as follows. The algorithm must satisfy three goals:
o Keep forwarding correctness;
o Adhere to the aggregation level. That is, there is no generated
prefixes at level 1 aggregation, no extra routable space at level
2, and no holes at level 3.
o Minimize the number of changes to the FIB.
Processing a routing update includes two steps: updating the labelled
RIB and updating the aggregated FIB. The second step is
straightforward as for whichever node that has changed its label from
IN-FIB to NON-FIB or vice versa we will just apply such change to the
FIB. Therefore we will focus on describing how the labelled RIB is
incrementally updated.
In general, when a prefix gets a new nexthop, its nearest descendants
need to be re-aggregated, and when a prefix is withdrawn, its nearest
descendants need to be de-aggregated. The details differ depending
on the level of aggregation. We first define a few basic operations,
and then use them to describe the incremental update algorithm for
each level.
o update-node(p): when an announcement of prefix p is received,
insert the corresponding node if it does not exist in RIB,
otherwise update its nexthop information if it's a different one.
If p existed in the RIB but was a generated prefix, label it as a
real prefix. Let A be p's nearest ancestor, if nexthop(A) ==
nexthop(p), label p as NON-FIB, otherwise IN-FIB.
o re-aggregate(p): For each D of p's nearest descendants. If
nexthop(D) != nexthop(p), label D IN-FIB. Optionally, we can also
label D NON-FIB if nexthop(D) == nexthop(p), which does not affect
forwarding correctness but reduces the FIB size at the expense of
updating more FIB nodes.
o de-aggregate(p): Let A be p's nearest ancestor. For each D of p's
nearest descendants, if nexthop(D) != nexthop(A), label D IN-FIB.
Optionally, we can also label D NON-FIB if nexthop(D) ==
nexthop(A), which does not affect forwarding correctness but
reduces the FIB size at the expense of updating more FIB nodes.
Using the above basic operations, we describe the incremental update
Zhang, et al. Expires April 29, 2010 [Page 15]
Internet-Draft FIB Aggregation October 2009
algorithm for each level of aggregation as follows.
o Level One: Upon receiving an announcement of prefix p, update-
node(p) and re-aggregate(p). Upon receiving a withdrawal of
prefix p, de-aggregate(p) and remove p from the RIB.
o Level Two: Upon receiving an announcement of prefix p, update-
node(p) and re-aggregate(p). Upon receiving withdrawal of prefix
p, de-aggregate(p) and remove p. However, if p's nearest
ancestor, A, is a generated prefix, we also need to de-
aggregate(A) and remove A in order to prevent extra routable
space. For example, in Figure 2, when one of the covered prefix
is withdrawn, its generated parent prefix needs to be de-
aggregated.
o Level Three: Upon receiving an announcement of prefix p, update-
node(p) and re-aggregate(p). During the re-aggregation, if any D
is a generated prefix, de-aggregate(D) and remove D. This is
needed since in Level 3 aggregation, a generated prefix is not
supposed to appear as a descendant of any real prefix, otherwise
the forwarding correctness may not hold (explained in Section
3.4). Upon receiving a withdrawal of prefix p, de-aggregate(p)
and remove p.
o Level Four: The processing of announcements is the same as in
Level 3. The processing of withdrawals is mostly the same as in
Level 3, except that the de-aggregation will take place even if a
node turn from generated prefix to a real prefix with the same
nexthop. In Level 4 aggregation, it is possible that a generated
prefix covers another generated prefix. Therefore when a prefix
becomes real, its descendants need to be de-aggregated to make
sure there's no generated prefixes underneath. While in Level 3
aggregation, a generated prefix does not cover another generated
prefix.
5. Evaluation
We use publicly available routing tables from tens of networks
collected by the RouteViews Oregon monitor [routeviews] to evaluate
the various FIB aggregation algorithms for their table size
reduction, computing times, and extra routable space. We also use
BGP routing updates to evaluate our FIB update algorithm in terms of
the frequency of FIB updates, the scope of the updates, computing
times, and how often the full algorithm may need to run.
Zhang, et al. Expires April 29, 2010 [Page 16]
Internet-Draft FIB Aggregation October 2009
5.1. Methodology
We evaluate different FIB aggregation algorithms using publicly
available BGP routing tables taken from the RouteViews Oregon monitor
[routeviews]. Although these routing tables contain valid next AS
hops, they either do not have next-hop router information or do not
reflect the diversity of next-hops that an operational router
typically has, since the route monitors are not operational routers.
Therefore we need to generate realistic next-hops based on known
information. Our guideline of this process is trying to overestimate
the number of next-hops so that the table reduction results reflect
the worst case scenario, and real routers are likely to have better
aggregation ratio. The specifics of generating next-hops are as
follows.
Tables downloaded from RouteViews only contain the AS path for each
prefix. From the AS path, we can extract the next-AS-hop for each
prefix. We assume that prefixes sharing the same next-AS-hop share
the same iBGP neighbor (and thus the same next-hop router at the IP
level). We use routing tables from looking glass servers, which
provider the IBGP neighbor information, to validate this assumption.
For each next-AS-hop, if there is only one iBGP neighbor, then all
the prefixes using this next-AS-hop satisfy the assumption. If there
are multiple iBGP neighbors, the one that carries the most prefixes
is called "popular,'' and the prefixes that use the popular iBGP
neighbors satisfy the assumption. We found that most routing tables
have more than 90% of prefixes satisfying the assumption. Based on
this assumption, we use next-AS-hop as the next-hop router in
evaluation. This reflects the worst case scenario since large
networks have hundreds to thousands of neighbor ASes, but the number
of real next-hops should be much smaller.
We verify the correctness of each aggregated FIB by dividing each
original RIB prefix into multiple /24 sub prefixes and look up the
/24 sub prefixes in the aggregated FIB, which should give the same
next-hop as that in the RIB. All the results from our FIB
aggregation algorithms and incremental update algorithms have been
verified using this method.
All the evaluation has been done on a Linux machine with an Intel
Core 2 Quad 2.83GHz CPU using a single thread. The algorithms are
implemented in C and no performance optimization techniques have been
attempted. The Patricia trie implementation is taken from the C
source code of Perl's Net::Patricia module [NetPatricia], which in
turn was adapted from MRTD's [mrtd] source code.
We use the public BGP routing tables to do the evaluation because
these tables come from a diverse set of networks, from tier-1 ISPs to
Zhang, et al. Expires April 29, 2010 [Page 17]
Internet-Draft FIB Aggregation October 2009
small networks. However, in operational networks, there are other
types of routes, such as VPN routes, which can be of a large number
too. The FIB aggregation algorithms can be applied to these other
types of routes as well, even though this paper does not evaluation
the effectiveness of doing so. We plan to obtain forwarding tables
from operational ISP routers to validate our results.
5.2. Table Size Reduction and Overhead
We applied our algorithms to 37 routing tables archived at RouteViews
on Dec. 31, 2008 and then calculated the ratio between the aggregated
FIB size and the original routing table size. We obtained the
following results: (1) each level of aggregation can reduce the FIB
size more than the previous level, which is as expected; (2) even
with the simple Level 1 aggregation, the FIB size can be reduced by
30% to 50%; (3) Level 4 aggregation can reduce the FIB size from 60%
to over 90% with a median around 70% -- some of the tables have
almost all the prefixes sharing the same nexthop, so they can be
aggregated into very few entries; and (4) the Level 4A algorithm is
slightly better than the Level 4B algorithm, although the difference
is almost negligible.
To evaluate the effectiveness of our algorithms over a longer period
of time, we applied them to the RouteViews routing tables from 2001
to 2008. For each year, we obtained the median aggregation ratio of
all the routing tables (see Figure 6). We observed an overall
decreasing trend in the aggregation ratio (more table size
reduction), suggesting that the FIB has become more amenable to
aggregation over the years. One possible explanation is that the
increasing practice of prefix splitting due to multi-homing and
traffic engineering has made a larger percentage of FIB entries
aggregatable. We plan to further investigate this phenomenon in our
future work.
Zhang, et al. Expires April 29, 2010 [Page 18]
Internet-Draft FIB Aggregation October 2009
+==========+======+======+======+======+======+======+======+======+
|Algorithms| 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
+----------+------+------+------+------+------+------+------+------+
| Level-1 | 0.686| 0.707| 0.686| 0.653| 0.672| 0.660| 0.669| 0.665|
+----------+------+------+------+------+------+------+------+------+
| Level-2 | 0.556| 0.564| 0.527| 0.491| 0.509| 0.491| 0.494| 0.479|
+----------+------+------+------+------+------+------+------+------+
| Level-3 | 0.498| 0.518| 0.470| 0.433| 0.462| 0.447| 0.455| 0.437|
+----------+------+------+------+------+------+------+------+------+
| Level-4A | 0.396| 0.430| 0.367| 0.347| 0.367| 0.353| 0.358| 0.343|
+----------+------+------+------+------+------+------+------+------+
| Level-4B | 0.421| 0.458| 0.389| 0.363| 0.391| 0.370| 0.366| 0.363|
+==========+======+======+======+======+======+======+======+======+
Figure 6
In order to understand the significance of the above results, we use
the size of the routing table from 1994 to 2009 to translate the
table size reduction into how many years we turn the clock back for a
router. The data is obtained from bgp.potaroo.net, a site that
tracks the growth of the BGP table size. We found that the FIB size
in 2001 is around 30% of the FIB size on 12/31/2008, which means that
if an ISP uses our Level 4 aggregation algorithm, it will still be
able to use routers that were deployed in 2001. Assuming these
routers were sufficiently provisioned when they were purchased seven
or eight years ago, they should still be able to accommodate some
further routing table growth.
We measured the computing time incurred by our algorithms to
aggregate each of the 37 routing tables (see Figure 7). The Level 1
to 3 algorithms typically require tens of milliseconds, while the
Level 4 algorithms consume at most a few hundreds milliseconds. An
operational router may have slower CPU than our commodity Linux
machine, but it has specialized hardware and software, thus it is
hard to infer a router's computing time from what we report here.
Nevertheless, the simplicity of the algorithms and the very short
computing time suggest that the computational overhead in an
operational router may be small. Moreover, our results can be used
to compare the relative speed between different aggregation
algorithms. For example, we can observe that the Level 4B algorithm
is more computing intensive than the Level 4A algorithm.
Zhang, et al. Expires April 29, 2010 [Page 19]
Internet-Draft FIB Aggregation October 2009
+==========+==========+==========+=============+
|Algorithms| Max (ms) | Min (ms) | Median (ms) |
+----------+----------+----------+-------------+
| Level-1 | 58.0 | 53.0 | 55.8 |
+----------+----------+----------+-------------+
| Level-2 | 74.0 | 63.0 | 66.0 |
+----------+----------+----------+-------------+
| Level-3 | 76.7 | 66.9 | 69.8 |
+----------+----------+----------+-------------+
| Level-4A | 120.0 | 110.0 | 115.5 |
+----------+----------+----------+-------------+
| Level-4B | 398.5 | 200.0 | 269.3 |
+==========+==========+==========+=============+
Figure 7
We also measured the amount of extra routable space (measured by the
number of /8 prefixes) introduced by three of our algorithms (see
Figure 8. Since Level 1 and 2 algorithms do not introduce new
routable space, we do not present them here. We found that the Level
3 algorithm is better than the Level 4 algorithms in this regard,
while the Level 4B algorithm covers more routable space than the
Level 4A algorithm. This is mainly because the 4B algorithm
aggregates prefixes from top to bottom, which introduces more shorter
prefixes than the 4A algorithm.
+==========+===========+===========+==============+
|Algorithms| Max (/8s) | Min (/8s) | Median (/8s) |
+----------+-----------+-----------+--------------+
| Level-3 | 6.50 | 1.49 | 1.72 |
+----------+-----------+-----------+--------------+
| Level-4A | 7.03 | 4.58 | 5.09 |
+----------+-----------+-----------+--------------+
| Level-4B | 7.19 | 6.76 | 7.14 |
+==========+===========+===========+==============+
Figure 8
5.3. Routing Update Handling
We use one month (December 2008) of BGP routing updates collected by
RouteViews from the Level 3 Communications peer router to evaluate
our incremental update handling algorithms. We summarize our results
in Figure 9. There were 7,254,478 routing updates during this month.
For each unaggregated or aggregated FIB, we counted how many of these
routing updates would cause an update to the FIB, i.e. an insertion,
removal, or a change to the next-hop of a FIB entry. There were
Zhang, et al. Expires April 29, 2010 [Page 20]
Internet-Draft FIB Aggregation October 2009
2,914,020 updates to the unaggregated FIB. We make the following
observations: (a) the RIB processing time per routing update is
increased from 0.593us to 0.806us for Level 1 aggregation (36%
increase) and 0.824us for Level 4A aggregation (42% increase) due to
the extra operations required by the update handling algorithm, i.e.
de-aggregation in certain cases; (b) the total FIB processing time is
decreased by 3% to 10%. More specifically, although there is a
slightly higher total number of affected prefixes for the aggregated
FIBs than the unaggregated FIB (6th column of Figure 9), each prefix
takes less time to update in an aggregated FIB (last column of
Figure 9), leading to a lower total FIB processing time. The lower
FIB update time per prefix is likely due to the small FIB size after
aggregation (i.e. faster prefix lookup).
+==========+=====+=====+=======+=======+=======+=====+=====+
|Algorithms|T_rib|t_rib| N_fib | n_fib | p_fib |T_fib|t_fib|
| | (s) | (us)| | | | (s) | (us)|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Original | 4.30|0.593|2914020|2914020| 1.000| 2.60|0.892|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Level-1 | 5.85|0.806|2904630|2921335| 1.005| 2.53|0.866|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Level-2 | 5.96|0.822|2901530|2940178| 1.013| 2.45|0.833|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Level-3 | 5.98|0.824|2900389|2941398| 1.014| 2.42|0.823|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Level-4A | 6.10|0.841|2897450|2942969| 1.016| 2.33|0.792|
+----------+-----+-----+-------+-------+-------+-----+-----+
| Level-4B | 6.41|0.880|2913988|3388764| 1.162| 2.61|0.770|
+==========+=====+=====+=======+=======+=======+=====+=====+
Figure 9
T_rib: total RIB processing time;
t_rib: average RIB processing time per routing update;
N_fib: total number of FIB updates;
n_fib: total number of prefixes affected in the FIB;
p_fib: average number of affected prefixes per FIB update;
T_fib: total FIB processing time;
t_fib: average FIB processing time per affected prefix.
Note that there may be fewer routing updates that cause changes to an
aggregated FIB than the unaggregated FIB (see the 4th column of
Figure 9). For example, the aggregated FIB from our Level 4A
algorithm had 16,570 fewer FIB updates than the unaggregated FIB.
This is due to two reasons. First, some of the route withdrawals may
be for prefixes already removed from the FIB by the aggregation
Zhang, et al. Expires April 29, 2010 [Page 21]
Internet-Draft FIB Aggregation October 2009
algorithm. Second, our update algorithm minimizes the number of FIB
updates at the cost of slightly increased FIB size. On the other
hand, each FIB update may affect more prefixes in the case of the
aggregated FIB (6th column of Figure 9). Nevertheless, the total
number of affected prefixes for the aggregated FIBs is less than 1%
higher than the unaggregated FIB. Since each prefix takes less time
to update in the FIB, there is an overall time saving in updating the
aggregated FIBs.
In summary, FIB aggregation can reduce both the FIB size and FIB
update time. However, these benefits come at the cost of higher RIB
processing time. How to reduce the time overhead of the update
handling algorithm in one area of our future research.
Since our update handling algorithm trades off the FIB size for fewer
changes, the FIB needs to be re-aggregated when its size reaches a
certain threshold. To estimate how frequently the re-aggregation
will be triggered, we measure the growth of the FIB size as our
algorithm handles the BGP updates during the month of Dec. 2008 (see
Figure 10). The Level 4A aggregated FIB had 122,493 entries on Dec.
1, 2008. If we set the FIB size threshold to be 150,000 entries
(about 55% of the full routing table size), the FIB would be re-
aggregated on Dec. 4, 2008. We modified our update handling
algorithm to do a full aggregation when the FIB size reaches 150,000
and found that the full aggregation was indeed triggered once every
few days (see Figure 11. Considering that each full aggregation run
takes at most a few hundred milliseconds on our commodity PC (perhaps
a little longer on a router), incurring this overhead every few days
or so should not be a concern for an ISP.
2008.12.01 122493 2008.12.02 134973 2008.12.03 144604
2008.12.04 153646 2008.12.05 161553 2008.12.06 163750
2008.12.07 165838 2008.12.08 169662 2008.12.09 174251
2008.12.10 177586 2008.12.11 182915 2008.12.12 186142
2008.12.13 188863 2008.12.14 191313 2008.12.15 192892
2008.12.16 195079 2008.12.17 197376 2008.12.18 199840
2008.12.19 200484 2008.12.20 202941 2008.12.21 203587
2008.12.22 204350 2008.12.23 205750 2008.12.24 206364
2008.12.25 208668 2008.12.26 209141 2008.12.27 209494
2008.12.28 210531 2008.12.29 211312 2008.12.30 212679
2008.12.31 212897
Figure 10
Zhang, et al. Expires April 29, 2010 [Page 22]
Internet-Draft FIB Aggregation October 2009
2008.12.01 122493 2008.12.02 134973 2008.12.03 144604
2008.12.04 118495 2008.12.05 134863 2008.12.06 140979
2008.12.07 144169 2008.12.08 105650 2008.12.09 125091
2008.12.10 135525 2008.12.11 146605 2008.12.12 119185
2008.12.13 128680 2008.12.14 135729 2008.12.15 142430
2008.12.16 147821 2008.12.17 122117 2008.12.18 135979
2008.12.19 144314 2008.12.20 114800 2008.12.21 122179
2008.12.22 129412 2008.12.23 136886 2008.12.24 141400
2008.12.25 146456 2008.12.26 149222 2008.12.27 113078
2008.12.28 120509 2008.12.29 120567 2008.12.30 136701
2008.12.31 141190
Figure 11
6. Conclusion
We have presented an in-depth analysis of FIB aggregation and our
results suggest that it is a viable short-term solution to the
problem of growing FIB table size. Our aggregation algorithms
reduces the FIB size by as much as 70% and requires no hardware
changes or network-wide software/configuration changes, thus reducing
the need for ISP router upgrades in the short term. During this
time, the research community and the industry can work on a long-term
solution to reduce both the routing table and the FIB table.
Moreover, FIB aggregation can co-exist with any long-term solution to
further reduce ISPs' operational costs.
7. Acknowledgements
The authors are part of the eFIT project team funded by the US
National Science Foundation.
8. Security Considerations
This draft is a discussion on the Internet's necessity to follow an
evolutionary path towards the future. There is no direct impact on
the Internet security.
9. Informative References
[Cain02AUTOAGG]
Cain, B., "Auto aggregation method for IP prefix/length
pairs", Patent, June 2002,
Zhang, et al. Expires April 29, 2010 [Page 23]
Internet-Draft FIB Aggregation October 2009
<http://www.freepatentsonline.com/6401130.html>.
[Herrin] Herrin, W., "Opportunistic Topological Aggregation in the
RIB-FIB Calculation?", July 2008, <http://
www.ops.ietf.org/lists/rrg/2008/threads.html#01880>.
[HighlyActive]
Oliveira, R., Izhak-Ratzin, R., Zhang, B., and L. Zhang,
"Measurement of Highly Active Prefixes in BGP".
[Li09RRG] Li, T., "Preliminary Recommendation for a Routing
Architecture", Internet Draft, March 2009, <http://
tools.ietf.org/html/draft-irtf-rrg-recommendation-02>.
[NetPatricia]
Plonka, D. and P. Prindeville, "Net-Patricia Perl Module",
<http://search.cpan.org/dist/Net-Patricia/>.
[RAWS] Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
Workshop on Routing and Addressing", Internet Draft, 2007,
<http://www.ietf.org/internet-drafts/
draft-iab-raws-report-02.txt>.
[RFC4984] Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
Workshop on Routing and Addressing", RFC 4984,
September 2007.
[RouterTopology]
Li, L., Alderson, D., Willinger, W., and J. Doyle, "A
first-principles approach to understanding the Internet's
router-level topology", ACM SIGCOMM 2004.
[Virtual_Aggregation]
Francis, P., Xu, X., and H. Billani, "FIB Suppression with
Virtual Aggregation and Default Routes",
draft-francis-idr-intra-va-01, September 2008.
[mrtd] "MRTD: The Multi-Threaded Routing Toolkit",
<http://www.mrtd.net/>.
[routeviews]
Advanced Network Technology Center and University of
Oregon, "The RouteViews Project",
<http://www.routeviews.org/>.
Zhang, et al. Expires April 29, 2010 [Page 24]
Internet-Draft FIB Aggregation October 2009
Authors' Addresses
Beichuan Zhang
Univ. of Arizona
Email: bzhang@arizona.edu
Lan Wang
Univ. of Memphis
Email: lanwang@memphis.edu
Xin Zhao
Univ. of Arizona
Email: zhaox@email.arizona.edu
Yaoqing Liu
Univ. of Memphis
Email: yliu6@memphis.edu
Lixia Zhang
UCLA
Email: lixia@cs.ucla.edu
Zhang, et al. Expires April 29, 2010 [Page 25]
| PAFTECH AB 2003-2026 | 2026-04-24 06:05:10 |