One document matched: draft-dimitri-grow-rss-00.txt
Network Working Group Dimitri Papadimitriou
Internet Draft Jim Lowe
Expires: August 2008 Alcatel-Lucent
February, 18 2008
Routing System Stability
draft-dimitri-grow-rss-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on August 18, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2008).
Abstract
Understanding the dynamics of the Internet routing system is
fundamental to ensure its robustness/stability and to improve the
mechanisms of the BGP routing protocol. This documents outlines a
program of activity for identifying, documenting and analyzing the
dynamic properties of the Internet and its routing system.
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 1]
Routing System Stability February 2008
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Document History
This is the initial version of this document.
1. Introduction
Understanding the dynamics of the Internet routing system is
fundamental to ensuring its stability and improving the mechanisms of
the BGP routing protocol [RFC4271]. Investigations on the Internet
routing system dynamics involve investigations on routing engine
resource consumption, in particular, memory and CPU.
System resource consumption depends on two items. First, there is the
size of the routing space. The greater the number of routing entries
there are, the greater the memory requirement on a routing device,
and the greater the need for increased processing and searching
capabilities to perform lookup operations. Second, the greater the
number of adjacency and peering relationships between routing
devices, the greater the dynamics associated with the routing
information updates exchanged between all these adjacencies and
peerings. This activity also increases the memory requirements for
the operation of the routing protocol.
In other words, as the routing system grows [Huston07a], so do the
requirements for routing engine memory and processing capacity. From
a routing dynamics viewpoint, minimizing the amount of BGP routing
information exchanged by routers is key to grappling with increasing
requirements on memory and CPU.
So, although current routing engines could potentially support up to
O(1M) routing table entries instabilities resulting i) from routing
protocol behavior, ii) routing protocol information exchanges, and
iii) changes in network topology may adversely affect the network's
ability to remain in a useable state for extended periods of time.
Note however that in terms of number of active routing entries, such
routing engine could at worst have to deal with O(1M) routes
within the next 5 years, see [Fuller07].
2. Objectives
The overall goal is to identify, root cause and document - in a
structured manner - occurrences of Internet routing stability
phenomena using data from operational networks.
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 2]
Routing System Stability February 2008
To help accomplish this goal, the following tasks will be undertaken.
1. Development of a methodology to process and interpret routing
table data. One guiding principle will be to be able to reproduce
phenomena previously observed at different locations. This work
will include documenting what information to collect and how it
should be archived.
2. Identification of a set of stability criteria and development of
methods for using them to provide a better understanding of the
routing system's stability. Other working groups may find this
beneficial in addition to the GROW working group.
3. Begin investigation into how routing protocol behavior and network
dynamics mutually influence each other. The nature of the
observations collected in the first task will suggest directions
to proceed with this work.
This proposed approach would allow rigor and consistency to be
brought to the study of network and routing stability. For example,
it would allow for a unified approach to the cross-validation of
techniques for looking at improving path exploration effects on the
routing system.
3. Relevance to the GROW working group charter
This effort fits into the GROW working group's charter to deal with
BGP operational issues related to routing table growth rates and the
dynamic properties of the routing system.
GROW has an advisory role to the IDR working group to provide
commentary on whether BGP is addressing relevant operational needs
and, where appropriate, suggest course corrections, which puts this
effort in a central place in the BGP investigation process.
Also, since the GROW working group community is directly linked to
the broader BGP operational community, this effort goes together with
obtaining routing table data from the field.
4. Routing system stability
In order to begin the discussion defined in work item detailed in
Section 2, point 2, this section proposes a number of definitions for
common routing and network stability terms.
The stability of a routing system is characterized by its response
(in terms of processing routing information) to inputs of finite
amplitude.
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 3]
Routing System Stability February 2008
These inputs may be classified as either internal system events, such
as routing protocol configuration changes, or as external system
events, such as routing information updates. Such events are
sometimes loosely referred to as routing "instabilities"; however,
this term should be reserved for discussion about how the routing
system responds to such events.
A routing system, which returns to its initial equilibrium state,
when disturbed by an external and/or internal event, is considered to
be stable.
A routing system, which transitions to a new equilibrium state, when
disturbed by an external and/or internal event, is considered to be
marginally stable.
Such state transitions, whether stable or marginal, should occur
before the arrival of new input events.
The magnitude of the output of a stable routing system is small
whenever the input is small. That is, a single routing information
update shall not result in output amplification. Equivalently, a
stable system's output will always decrease to zero whenever the
input events stop.
A routing system, which remains in an unending condition of
transition from one state to another when disturbed by an external or
internal event, is considered to be unstable.
The degree to which a routing system, or components thereof, can
function correctly in the presence of input events is a measure of
the robustness of the system.
A precise definition of stability requires the specification of the
following elements:
o) The system being examined: for example, a system might be
comprised of: the routing system and associated events, such as
input events, outputs, and related arrival rates.
o) A convergence metric: a metric to define the convergence
characteristics of the system.
o) A stability metric: a metric that describes the degree of
stability of the system and indicates how close the system is to
being unstable.
The convergence and stability metrics may be affected by the
following parameters:
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 4]
Routing System Stability February 2008
o) The number of routing entries (where, each entry R toward an
existing prefix D has an associated attribute set A consisting of
AS-Path, MED, and Local Preference, etc.);
o) The number of CPU cycles, C, required to process a routing entry,
and its associated memory space, M;
o) The input events and their arrival rates;
o) The output events associated with the processing of each input
event.
5. Mathematical formulation
Section 4 outlined some proposals for definitions of commonly used
stability terms applied to network and routing systems. In this
section, an initial attempt is made to build a mathematical
formulation around those concepts in order to begin the development
of more practical metrics.
Let RT be the "Routing Table" and RT(n) represent the routing table
at some time n. At time n+1, the routing table can be expressed as
the sum of two components:
RT(n+1) = RTo(n) + deltaRT(n+1) (1)
In this equation, RTo(n) is the set of routes that experience no
change between n and n+1, and deltaRT(n+1) accounts for all route
changes (additions, deletions, and changes to previously existing
routes) between n and n+1. deltaRT(n+1) itself can expressed as the
sum of two components:
deltaRT(n+1) = RTc(n+1) + RTn(n+1) (2)
In this equation, RTc(n+1) is a set of routes at time n that
experience some sort of change at time n+1. Rtn(n+1) is a set of new
routes observed at time n+1 that were not present at time n.
RTc and RTn are each composed of two parts: one due to changes in
network state (new routes appearing, changes to existing routes,
etc.), and a second attributable to routing protocol changes (BGP
session failure, BGP route attribute changes, changes to filtering
policies, etc.). Equation (1) can be expanded to account for these
separate effects. First, substitute equation (2) into equation (1):
RT(n+1) = RTo(n) + RTc(n+1) + RTn(n+1) (3)
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 5]
Routing System Stability February 2008
As was mentioned, the terms RTc(n+1) and RTn(n+1) can be further
expanded into their two constitute components:
RTc(n+1) = RTcN(n+1) + RTcR(n+1) (4)
RTn(n+1) = RTnN(n+1) + RTnR(n+1) (5)
In these two equations, "N" denotes the component due to network
topology changes, and "R" denotes the component due to routing
protocol changes.
These equations can be used as the basis for deriving the convergence
and stability metrics discussed in Section 4. However, there are a
number of issues that will need to be resolved in order to make
progress:
a) Some thought will need to be done on how to distinguish between
network and routing protocol effects;
b) Some thought needs to be given to "timescales of applicability" in
order to make assessments about what constitutes instability in a
routing system from a practical point-of-view;
c) Some thought needs to given to how a protocol can absorb network
instabilities. [RFC2902] touches on this issue and indicated that
damping the effects of route updates enhances stability, but
possibly at the cost of reachability for some prefixes.
6. Previous work on BGP and Routing system stability
There have been numerous studies of BGP dynamics over the years. In
subsequent versions of this draft, they will be summarized in this
section and general findings will be drawn.
In this version of the document, we will just outline some of the
findings surrounding recent studies concerned with interactions of
BGP with Route Flap Damping (RFD) in order to show some of the
complexity in understanding BGP dynamics.
Work began in the early 1990s on an enhancement to the BGP called
"Route Flap Damping" [RFC2439]. The purpose of RFD was to prevent or
limit sustained route oscillations that could potentially put an
undue processing load on BGP. At that time there was a belief that
the predominate cause of route oscillation was due to BGP routing
sessions going up and down because they were being carried on
circuits that were themselves persistently going up and down (see
[Huston07b] for a fuller discussion). This would result in a constant
stream of route updates and withdrawals from the affected BGP
sessions that could propagate through the entire network due to the
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 6]
Routing System Stability February 2008
network's flat addressing architecture. The first draft of the RFD
algorithm specification appeared in October 1993, updates and
revisions lead to the publication of RFC 2439, BGP Route Flap
Damping, in November 1998 [RFC2439].
Over the next several years, RIPE published three recommendations,
[RIPE178], [RIPE210] and [RIPE229] in an attempt to establish
guidelines for operators when setting RFD's user configurable
parameters. The ultimate goal was to make the deployment of RFD
consistent throughout the network because different vendors provided
different default values for RFD's various parameters, and this could
result in different damping behaviors across the network. The last of
these recommendations, [RIPE229], was published in October 2001.
In August 2002, Mao et al. [Mao02] published a paper that discussed
how the use of RFD, as specified in RFC 2439. They showed that RFD
can significantly slowdown the convergence times of relatively stable
routing entries. This abnormal behavior arises during route
withdrawal from the interaction of RFD with "BGP path exploration"
(in which in response to path failures or routing policy changes,
some BGP routers may try a sequence of transient alternate paths
before selecting a new path or declaring destination unreachability).
The NANOG 2002 presentation of Bush et al. [Bush02] succinctly
summarized the findings of Mao et al. [Mao02] and presented some
observational data to illustrate the phenomena. The overall
conclusion of this work was that it was best not to use RFD so that
the overall ability of the network to re-converge after an episode of
"BGP path exploration" was not needlessly slowed.
In May 2006, RIPE published a final set of RFD recommendations
[RIPE378] that directed operators to not use RFD due primarily to the
findings presented in [Mao02].
Recently, solutions such as EPIC [Chandrashekar05], or improving BGP
convergence through Root Cause Notification (BGP-RCN) [Pei05] have
been proposed to solve the "BGP path exploration" problem; however,
there are several details that still require consideration.
BGP stability has also been reported in [RFC4984], outcome of the
Routing and Addressing Workshop held by the Internet Architecture
Board (IAB).
7. Security Considerations
TBD.
8. References
[Chandrashekar05] J.Chandrashekar, Z.Duan, Z.-L.Zhang, and J.Krasky,
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 7]
Routing System Stability February 2008
Limiting path exploration in BGP, In Proc. IEEE INFOCOM
2005, Miami, Florida, March 2005.
[Huston07a] G.Huston, http://bgp.potaroo.net, 2007.
[Huston07b] G.Huston, "Damping BGP", June 2007,
http://www.potaroo.net/ispcol/2007-06/dampbgp.html
[Labovitz00] C.Labovitz, A.Ahuja, A.Bose, and F.Jahanian, "Delayed
Internet Routing Convergence," in Proceedings of ACM
SIGCOMM 2000.
[Li07] T.Li, G.Huston, "BGP Stability Improvements", Internet
draft, work in progress, draft-li-bgp-stability-01, June
2007.
[Mao02] Z.Mao, R.Govindan, G.Varghese, and R.Katz, "Route Flap
Damping Exacerbates Internet Routing Convergence", ACM
SIGCOMM'02, August 2002.
[Pei05] D.Pei, M.Azuma, D.Massey, and L.Zhang, "BGP-RCN:
improving BGP convergence through root cause
notification", Computer Networks, ISDN Syst. vol. 48, no.
2, pp 175-194, June 2005.
[RFC2902] S.Deering, et al., "Overview of the 1998 IAB Routing
Workshop", RFC 2902, August 2000.
[RFC2439] Villamizar, C., Chandra, R., and Govindan, R., "BGP Route
Flap Damping", RFC 2439, November 1998.
[RFC4271] Y.Rekhter, T. Li, and S.Hares, Ed., "A Border Gateway
Protocol 4 (BGP-4)", RFC 4271, January 2006.
[RFC4984] D.Meyer, Ed., L.Zhang, Ed., K.Fall, Ed., "Report from
the IAB Workshop on Routing and Addressing", RFC 4984,
September 2007.
[RIPE178] Barber, T., Doran, S., Karrenberg, D., Panigl, C., and
Schmitz, J., "RIPE Routing-WG Recommendations for
coordinated route-flap damping parameters", RIPE-178, 2
February 1998.
http://www.ripn.net:8080/nic/ripe-docs/ripe-178.txt
[RIPE210] Barber, T., Doran S., Karrenberg, Pangil, C., and
Schmitz, J., "RIPE Routing-WG Recommendation for
coordinated route-flap damping parameters", RIPE-210, 12
May 2000. http://www.ripn.net/nic/ripe-docs/ripe-210.txt
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 8]
Routing System Stability February 2008
[RIPE229] Panigl, C., Schmitz, J., Smith, P., and Vistoli, C.,
"RIPE Routing-WG Recommendations for Coordinated Route-
flap Damping Parameters", RIPE-229, 22 October 2001.
ftp://ftp.ripe.net/ripe/docs/ripe-229.txt
[RIPE378] Smith, P., and Panigl, C., "RIPE Routing Working Group
Recommendations on Route-flap Damping", RIPE-378, 11 May
2006. http://www.ripe.net/ripe/docs/ripe-378.html
[Bush02] Bush, R., Griffin, T., and Mao, Z.M., "Route flap damping
harmful?", NANOG-26, 28 October 2002.
http://www.nanog.org/mtg-0210/ppt/flap.pdf
Authors' Addresses
Dimitri Papadimitriou
Alcatel-Lucent Bell NV
Copernicuslaan 50
B-2018 Antwerpen, Belgium
Phone: +32 3 2408491
Email: dimitri.papadimitriou@alcatel-lucent.be
James Lowe
Alcatel-Lucent
600 March Road
Ottawa, Ontario
Canada, K2K 2E6
Phone: 1-613-784-1495
Email: jim.lowe@alcatel-lucent.com
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 9]
Routing System Stability February 2008
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
D.Papadimitriou & J.Lowe - Expires August 2008 [Page 10]
| PAFTECH AB 2003-2026 | 2026-04-24 10:28:40 |