Network Working Group Jerry Ash
Internet Draft Gagan L. Choudhury
<draft-ash-manral-ospf-congestion-control-00.txt> Vera D. Sapozhnikova
Expiration Date: November 2002 Mostafa Sherif
AT&T
Vishwas Manral
NetPlane Systems
Anurag Maunder
Sanera Systems
April, 2002
Congestion Avoidance & Control for OSPF Networks
<draft-ash-manral-ospf-congestion-control-00.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Based on overload and failure experience with link-state protocols, this
draft identifies means to enable link-state protocols to:
o avoid getting into congestion states wherever possible
o respond gracefully to network overloads and failures
o gracefully recover from massive loss of topology database
information
The proposed mechanisms include the following:
o throttle setups of link adjacencies
o reduce the rate of flooding of link-state advertisements (LSAs)
o prioritized treatment of Hello & LSA ack packets
o database backup & resynchronization for graceful recovery from loss
of topology database information
Table of Contents
1. Introduction & Problem Statement
2. Failure Experience & Analysis
3. Proposed Solution Methods
4. OSPF Congestion Avoidance & Control
4.1 OSPF Congestion Avoidance Mechanisms
4.1.1 Throttle Setups of Link Adjacencies
4.1.2 DSCP Marking of Hello & LSA Ack Packets
4.1.3 Use of Other Control Packets to Detect Live-ness of
Adjacency
4.2 OSPF Congestion Control Mechanisms
4.2.1 Detecting Congestion & Notifying Neighbors
4.2.2 Router Response When it Detects Congestion
4.2.2.1 Congestion Notification
4.2.2.2 Throttle Rate of LSA Flooding
4.2.2.3 Increase Adjacency Break Interval
4.2.2.4 Reduce the Rate of SPF Computation
4.2.3 Router Response if Congestion Notification Received
4.2.4 Graceful Recovery from Loss of Topology Database
Information - Database Backup & Resynchronization
5. Related Solution Methods
5.1 Optimized Link State Routing (OLSR) Protocol
5.2 Fisheye State Routing Protocol (FSR) for Ad Hoc Networks
5.3 Landmark Routing Protocol (LANMAR) for Large Scale Ad Hoc
Networks
5.4 Topology Broadcast Based on Reverse-Path Forwarding (TBRPF)
6. Security Considerations
7. Acknowledgments
8. References
9. Appendix
Appendix A - Dropping Link Adjacencies Under Extreme Congestion
Appendix B - Vendor Analysis of LS Protocol Performance
Appendix C - Analytic Modeling & Simulation Analysis of LS
Protocol Performance
10. Authors' Addresses
11. Copyright Statement
1. Introduction & Problem Statement
Congestion in control protocols has been a problem in data networks many
times in the past; [att, pappalardo1, pappalardo2] describe some good
examples. It can arise in data networks for many different reasons.
There is evidence based on previous failures that link-state protocols
such as OSPF cannot recover from large failures that can result in
widespread loss of topology database information. The problem is
aggravated when the number of routers in an area is large, which can
then lead to an overload in the flooding of topology database
information. Topology information in OSPF is distributed in Link State
Advertisements (LSAs), which are encapsulated in link state update packets
and periodically flooded to other routers in the domain through all
available links. In some instances of network overload, failure, and/or
congestion, redundancy of the flooding mechanism can overwhelm network
processors and bring the network down.
Earlier papers and contributions, such as [maunder, choudhury,
pappalardo1, pappalardo2, atm01-0101], identified issues of congestion
control and failure recovery for OSPF networks. The problem addressed
in this draft is to enable OSPF to
o avoid getting into congestion states wherever possible,
o respond gracefully to network overloads and failures, and
o gracefully recover from massive loss of topology database
information.
The need for graceful recovery mechanisms is illustrated in Section 2,
where we discuss failure experience in which all topology database
information is lost in all routers in a large network, and the
link-state protocol is unable to recover.
If a router floods control packets to its neighbors and has no
congestion avoidance and control mechanisms, it can overwhelm its
neighbors and cause congestion. To prevent this we need mechanisms to
avoid going into congestion and to recover when congestion occurs.
Congestion avoidance prevents the OSPF domain from entering a state of
congestive collapse under control traffic. Congestive collapse is a
situation where, although the network links are heavily utilized with
an overload of control messages, very little useful work is being done
because the routers are overwhelmed with non-useful work.
Congestion control and avoidance consists of automatic mechanisms to
reduce control traffic to avoid congestion, and actions taken if
congestion arises. We define both in the draft and propose specific
additional considerations for network congestion control/failure
recovery for the OSPF protocol, which include:
o throttle setups of link adjacencies
o reduce the rate of flooding of link-state advertisements (LSAs)
o prioritized treatment of Hello & LSA ack packets
o database backup & resynchronization for graceful recovery from loss
of topology database information
We also define a standard way to detect and notify congestion states
between routers.
Section 2 contains a discussion of data network outages in which these
problems have manifested themselves. As a result of these failures, a
number of congestion control/failure recovery mechanisms were
recommended, and these are reviewed in Section 3. In Section 4, we
discuss how these candidate mechanisms for control of network congestion
and failure recovery can be used. Section 5 discusses related solution
methods which have been implemented elsewhere to control congestion and
may be worth considering for OSPF.
2. Failure Experience & Analysis
AT&T has experienced serious data network outages in which recovery of
the underlying LS protocols was inadequate. For example, in the failure
in the AT&T Frame Relay Network on April 13, 1998 [att], an initial
procedural error triggered two undetected software bugs, leading to a
huge overload of control messages in the network. The result of this
control overload was the loss of all topology information, which the LS
protocol then attempted to recover using the usual Hello and LS updates.
However, the LS protocol was overwhelmed and unable to recover, and
manual means had to be used to restart the network after a long outage.
Analysis has shown that several problems then occurred to prevent the
network from recovering properly:
o Very large number of LS updates being sent to every router to
process, causing general processor overload
o Route computation based on incomplete topology recovery, causing
routes to be generated based on transient, asynchronous topology
information and then in need of frequent re-computation
o Inadequate work queue management to allow processes to complete
before more work is put into the process queue
o Inability to segment the network (into smaller "areas") to aid in
the LS protocol recovery
o Inability to access the node processors with network management
commands due to lack of necessary priority of these messages
A more recent failure occurred on February 20, 2001 in the AT&T ATM
Network, which resulted in a large overload of LS Updates, and a lengthy
network outage [att, pappalardo1, pappalardo2]. Manual procedures were
put in place to reduce LS updates flooding, which worked to stabilize
the network. It is desirable that such LS updates flooding reduction be
automatic under overload. In general, there have been a number of major
outages reported by most major carriers, and routing protocol issues have
generally been involved. Other relevant LS-network failures are reported
in [cholewka, jander].
Various networks employing LS protocols use various control messages and
mechanisms to update the LS database, not necessarily LSAs, PTSEs, or
flooding mechanisms. Based on experience, however, LS protocols
including OSPF are found to be vulnerable to loss of topology
information, such as occurred in the scenario described above [att], to
control overload while re-syncing databases, and to other
failure/overload scenarios, which make such networks more vulnerable in
the absence of adequate protection mechanisms.
Hence we are addressing a generic problem of LS protocols across a
variety of implementations, and the basic problem is prevalent in LS
protocol networks employing frame-relay, ATM, and IP based technologies.
However, the solutions given here are for OSPF networks alone.
3. Proposed Solution Methods
We propose enhancements and methods to prevent congestion arising out of
any stimulus. There are general congestion control/failure recovery
methods that have been applied in many networks, and these are
considered for enhancement to the OSPF protocol.
We propose enhancing OSPF using commonly known congestion control and
recovery mechanisms. These types of mechanisms have been implemented in
other protocols, such as EIGRP ["Enhanced Interior Gateway Routing
Protocol"] which uses a backoff mechanism based on round trip timeout.
Handling congestion in a more intelligent manner would also allow OSPF
areas to grow to larger scales; the flooding diameter as a limiting
factor could be greatly reduced, as congestion reaction mechanisms
within the protocol allow flooding to be paced to levels on par with the
capabilities of each adjacent pair of routers in the area.
The following mechanisms are proposed in this draft for congestion
control in OSPF:
o throttle setups of link adjacencies, and reduce the rate of
flooding of LSAs. This is discussed in greater detail in this
document.
o database backup, in which a topology database could be
automatically recovered from loss based on local backup mechanisms.
This is discussed further in this document.
o special marking of critical control messages (e.g., Hello and LSA
Ack packets) so that they may receive prioritized processing. This
is dealt with in detail in [maunder]; when packets are prioritized
as described there, the chances of adjacencies breaking are lowered.
Failure to process Hellos, which can lead to adjacencies breaking
down, and failure to process acknowledgements, which leads to
retransmissions, should be avoided.
The following mechanisms also help in congestion control for OSPF:
o hitless restart, which allows routes to continue to be used if
there is an uninterrupted data path, even if the control path is
interrupted due to a failure [moy1, zinin1]. This is generally
helpful in stable topologies: hitless restart prevents other
routers in the domain from changing paths to destinations through
the restarting router, and hence avoids the transient load caused
by flooding of changed LSAs and by SPF calculations.
o give necessary priority to network management control messages so
they are not locked out during times of failure. This is because
certain configurable parameters need to be updated to prevent
congestion, and giving less priority to such messages can lead to
further congestion.
o use flooding optimizations to prevent unnecessary flooding in
densely meshed topologies. This mechanism has been specified in
[moy2, zinin2], which prevent unnecessary flooding in parallel
paths. Further work for highly meshed topologies is planned to
be taken up by OSPF-WG.
o mechanisms to be able to automatically segment a large area into
smaller areas, allow these segments to recover their topology and
routing, then combine the smaller areas into the original large
area and allow that to recover. Such mechanisms are implemented
in operational networks, such as through network management scripts
to segment the network under severe overload. Currently these
procedures are triggered manually, however it would be desirable
to automate such mechanisms within OSPF (see Appendix A for further
discussion).
o provide bandwidth allocation mechanisms to limit the amount of link
bandwidth allocation to control traffic versus data traffic. This
is essentially a rate limiting mechanism, which can be and is used
in implementations. For example, this capability exists in PNNI:
RCCs are provisioned with traffic parameters (default parameters
are specified in [af-pnni-0055.000]) and traffic is shaped into the
RCC VC at the configured traffic rate.
4. OSPF Congestion Avoidance & Control
Congestion avoidance and control mechanisms are described in this
Section. While we suggest example implementation methods, it is
intended that the exact implementation details be left as optional and
up to the implementer. It is also intended that all of the methods
suggested in the draft be implemented together, rather than only a
subset of the methods.
4.1 OSPF Congestion Avoidance Mechanisms
4.1.1 Throttle Setups of Link Adjacencies
When an OSPF process in a router goes down and comes up again (or loses
a large number of adjacencies for any other reason), it will start
forming adjacencies with other routers. However, as the router reforms
these lost adjacencies, it may start breaking already existing
adjacencies due to processor overload or network congestion.
For example, some implementations may experience memory starvation when
a large number of adjacencies are being established because, per the
OSPF specification, the router is supposed to construct the LSDB summary
list containing headers of all LSAs for each adjacency when the
adjacency transitions from ExStart to Exchange. Building this LSDB
summary list may cause the OSPF process to use more memory than it uses
under normal operation; as a result, more adjacencies can be formed and
held over time than can be brought up simultaneously during a restart
condition. This type of problem can be prevented by prioritizing
adjacencies, which is further explained below.
A count, MAX_ADJACENCY_BUILD_COUNT, of the maximum number of adjacencies
that a router can bring up at any time SHOULD be kept in the OSPF main
structure. After the count of adjacencies in states between Exchange
and Loading exceeds MAX_ADJACENCY_BUILD_COUNT, new adjacencies SHOULD
NOT be brought up. Methods which MAY be used to limit the number of
adjacencies include:
o Not sending Hellos on particular interfaces, which would bring the
neighbors to the DOWN state.
o Sending Hellos, but not listing the neighbors, if the neighbors are
in state below Two-way (note that if we do not list the neighbor it
causes the adjacency to go below Two-way to One-way).
o Not letting the adjacency progress past the Two-Way state, by not
sending any database description packets and ignoring any database
description packets received. We can also park the adjacency in
ExStart.
Such a MAX_ADJACENCY_BUILD_COUNT mechanism has already been implemented
in some routers.
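As an illustration only (the draft leaves implementation details to the
implementer), the MAX_ADJACENCY_BUILD_COUNT gate above might be sketched
as follows. The class and method names are hypothetical; the counted
states follow the OSPF neighbor FSM as described in this Section.

```python
# Illustrative sketch (not from the draft): gating new adjacencies on
# MAX_ADJACENCY_BUILD_COUNT. Class and helper names are hypothetical.

BUILDING_STATES = {"Exchange", "Loading"}   # states counted by the draft

MAX_ADJACENCY_BUILD_COUNT = 4               # configurable, kept in the
                                            # OSPF main structure

class OspfInstance:
    def __init__(self, max_build=MAX_ADJACENCY_BUILD_COUNT):
        self.max_build = max_build
        self.neighbors = {}                 # neighbor id -> FSM state

    def building_count(self):
        return sum(1 for s in self.neighbors.values()
                   if s in BUILDING_STATES)

    def may_start_adjacency(self):
        """True if a new adjacency may progress past Two-Way."""
        return self.building_count() < self.max_build

    def on_two_way(self, nbr):
        # Park the adjacency in Two-Way when over the limit, e.g. by
        # withholding database description packets; otherwise let it
        # proceed toward Full (modeled here as entering Exchange).
        if self.may_start_adjacency():
            self.neighbors[nbr] = "Exchange"
        else:
            self.neighbors[nbr] = "TwoWay"

inst = OspfInstance(max_build=2)
for n in ("r1", "r2", "r3"):
    inst.on_two_way(n)
# r1 and r2 proceed; r3 is held at Two-Way until a slot frees up.
```

Parked neighbors would later be admitted as acknowledged adjacencies
reach Full and free window slots.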
4.1.2 DSCP Marking of Hello & LSA Ack Packets
A special DSCP [RFC2474] marking MUST be given to Hello and LSA Ack
packets. These are critical control messages that must receive
prioritized treatment. Failure to process Hellos, which can lead to
adjacencies breaking down and inconsistent designated router elections,
and failure to process acknowledgements, which leads to
retransmissions, MUST be avoided. By marking these packets and giving
them priority treatment in their processing, the chances of adjacencies
breaking are lowered.
The related goal is to achieve proper prioritization within routers'
queues, both transmit and receive queues. Note, however, that this will
be affected by cryptographic authentication, which has no tolerance for
packet reordering and which is possible with priority queuing.
Maintaining separate sequence number counters for different packet types
on the receiving side may solve this issue.
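A minimal sketch of the marking itself, on a sending socket: the draft
does not mandate a particular codepoint, so the choice of CS6 (commonly
used for network control traffic) is an assumption of this sketch.

```python
import socket

# Illustrative sketch: applying a DSCP marking (RFC 2474) to a sending
# socket for Hello/LSA Ack packets. The CS6 codepoint is an assumption
# of this sketch, not a value mandated by the draft.

CS6 = 48  # class selector 6

def dscp_to_tos(dscp):
    # The DSCP occupies the upper six bits of the former IPv4 TOS octet.
    return dscp << 2

def open_marked_socket(dscp=CS6):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return s
```

A real OSPF implementation sends over raw IP protocol 89 rather than
UDP; the marking call is the same.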
4.1.3 Use of Other Control Packets to Detect Live-ness of Adjacency
At present, if the control link between two routers is congested, for
example due to a high volume of LSAs, it is possible for several
consecutive Hellos to get dropped. In that case, if the Inactivity
Timer expires (i.e., the ROUTER_DEAD_INTERVAL is exceeded), the router
is assumed to be down even though it is clearly up, since LSAs are
still being received from it. To prevent this, a router SHOULD treat
the receipt of any OSPF packet from a neighbor as an indication that
the neighbor is up, so on receipt of any OSPF packet (not just a Hello)
the Inactivity Timer is reset. This behavior has been implemented in
some routers.
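A minimal sketch of the behavior above, assuming the common OSPF
default dead interval; class and method names are hypothetical.

```python
# Sketch: any OSPF packet from a neighbor, not just a Hello, resets
# the Inactivity Timer. ROUTER_DEAD_INTERVAL uses the common default.

ROUTER_DEAD_INTERVAL = 40  # seconds

class Neighbor:
    def __init__(self, now):
        self.last_heard = now

    def on_ospf_packet(self, now, pkt_type):
        # Hello, Database Description, LS Update, LS Ack: all count
        # as evidence that the neighbor is up.
        self.last_heard = now

    def is_dead(self, now):
        return now - self.last_heard > ROUTER_DEAD_INTERVAL

nbr = Neighbor(now=0)
nbr.on_ospf_packet(now=35, pkt_type="LSUpdate")  # Hellos were dropped
# At t=60 the neighbor is still considered up (60 - 35 < 40), even
# though no Hello has arrived since t=0.
```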
4.2 OSPF Congestion Control Mechanisms
4.2.1 Detecting Congestion & Notifying Neighbors
Router congestion can be detected at a router in various ways. For
example, congestion can be inferred at a router by:
o The length of internal work queues
o Missing some percentage of Hellos; for instance, missing 15 out of
the last 20 Hellos might indicate that a router is regularly
missing OSPF packets on a given link.
o Frequent retransmissions being required.
o Explicit indication of congestion from an adjacent router.
It is useful for explicit congestion notification to take place when a
router notes its input work queues are very long, or if it is losing
Hellos from another router. Detecting outbound congestion in some way
does not require notification, since we should assume the adjacent
router will detect this congestion on its own.
Three levels of congestion MUST be defined: high, low, and no
congestion.
As multiple congestion states are specified, the congestion state may
change very often, which could lead to oscillations between the high,
low, and no congestion states. In order to damp congestion state
oscillation, the congestion state SHOULD NOT be changed unless some
configurable time interval has elapsed since the last state change. For
example, any router that goes into the high congestion state MAY
continue reporting this state for at least some interval
High_Congestion_Interval, so as to avoid flapping of a router between
the high and low congestion states. The implementation details of
congestion detection and damping are left to the implementer.
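Since the details are left open, the following is only one possible
sketch of the damping: a measured state change is ignored until
High_Congestion_Interval (name from the draft; the value and the
single-interval simplification are assumptions) has elapsed since the
last state change.

```python
# Sketch of congestion-state damping with a single hold-down interval.

HIGH_CONGESTION_INTERVAL = 30  # seconds, configurable

NO, LOW, HIGH = 0, 1, 2        # the three congestion levels

class CongestionState:
    def __init__(self):
        self.state = NO
        self.last_change = None

    def report(self, measured, now):
        if measured == self.state:
            return self.state
        if (self.last_change is not None and
                now - self.last_change < HIGH_CONGESTION_INTERVAL):
            return self.state  # damp: hold the previously reported state
        self.state = measured
        self.last_change = now
        return self.state
```

A router entering the high state thus keeps reporting it for at least
one interval before it is allowed to drop back to low or no congestion.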
When a router detects congestion, it MUST send notifications to its
neighbors indicating its state of congestion. The order in which it
sends the notifications MUST follow the ADJACENCY_PRIORITY, which has
two levels: high and low. The congestion signaling MUST be done by
sending a Choke LSA or by LLS signaling (described below) to its
neighbor, indicating the level of congestion at the node.
If LLS signaling is used for congestion notification, then two new bits,
called HCS (high congestion signal) and LCS(low congestion signal) are
introduced into the Extended Options TLV in the Link-Local-Signaling
(LLS) block [zinin4]. The value of these bits is TBD (temporarily, the
values are 0x00000004 and 0x00000008 in Figure 1 below).
+---+---+---+---+---+---+---+- -+---+---+---+---+---+---+---+---+
| * | * | * | * | * | * | * |...| * | * | * | * |HCS|LCS| RS| LR|
+---+---+---+---+---+---+---+- -+---+---+---+---+---+---+---+---+
Figure 1. Bits in Extended Options TLV
OSPF routers should set the HCS-bit in the Extended Options TLV attached
to a Hello packet when the router is in the high congestion state, and
the LCS-bit when it is in the low congestion state. Only one of the
above two bits may be set in a Hello at one time. When there is no
congestion, neither of the bits is set. For a definition of the LR bit,
see [OOB Signaling], and for RS bit see [zinin1]. 'No' congestion is
sent after 'high' or 'low' congestion is sent and congestion subsides.
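The bit handling can be sketched as follows. The HCS/LCS values are the
temporary ones given above (final values TBD); LR and RS are the
existing Extended Options bits.

```python
# Sketch: encoding/decoding the proposed HCS/LCS bits in the Extended
# Options field of an LLS block. HCS/LCS values are temporary (TBD).

LR  = 0x00000001
RS  = 0x00000002
HCS = 0x00000004  # high congestion signal (value TBD)
LCS = 0x00000008  # low congestion signal (value TBD)

def congestion_bits(level):
    """Extended Options bits for 'high', 'low', or 'none'.

    At most one of HCS/LCS may be set in a Hello at one time; neither
    is set when there is no congestion.
    """
    return {"high": HCS, "low": LCS, "none": 0}[level]

def congestion_level(ext_options):
    if ext_options & HCS:
        return "high"
    if ext_options & LCS:
        return "low"
    return "none"
```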
4.2.2 Router Response When it Detects Congestion
When a router detects congestion, it SHOULD perform the actions
described in the following Sections. In addition, mechanisms are
proposed in Appendix A for dropping adjacencies under extreme
congestion, but may be controversial, so they are placed in an Appendix
for now.
4.2.2.1 Congestion Notification
Send congestion notification to neighbors using Link-Local-Signaling.
4.2.2.2 Throttle Rate of LSA Flooding
The rate at which an OSPF implementation floods information SHOULD be
able to be progressively limited based on network conditions.
Progressively limiting flooding rates can reduce packet drops and LSA
storms. The mechanism used to progressively rate limit LSA flooding can
vary between implementations, so no single implementation mechanism is
set forward in this document as required. However, a mechanism is
discussed here to illustrate how such a mechanism might work.
A configurable variable MAX_LSA_FLOOD MAY be defined for each interface,
which is the maximum number of LSAs flooded by any router at any time
for which it has not received an acknowledgement. That is, in doing the
rate limiting, the router should not send more than a given maximum
rate of packets per second and should allow for interpacket gaps. This
mechanism is similar to the Min_LSP_Transmission_Interval mechanism in
ISIS. It can use a sliding window protocol for rate control, which is
how it is already implemented in some routers.
A router SHOULD also flood LSAs in the order of the priority of the
adjacency. Intra-area LSAs that affect the SPF calculation MAY be given
priority over inter-area LSAs. Inter-area LSAs MAY be given priority
over external LSAs, which may be given priority over LSAs which do not
result in any topology changes.
When a router detects congestion from a neighbor, it MAY progressively
decrease the flooding rate through some mechanism. For example, the
router can increase its LSA_RETRANSMIT_INTERVAL by doubling the
LSA_RETRANSMIT_INTERVAL for low congestion, and quadrupling the
LSA_RETRANSMIT_INTERVAL for high congestion. Also, the congested router
SHOULD a) reduce the rate of flooding, and b) increase the shortest path
computation interval.
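Since no single mechanism is required, the following sketch merely
combines the illustrative throttles of this Section: a per-interface
window of unacknowledged LSAs (MAX_LSA_FLOOD) and a retransmit interval
that doubles under low congestion and quadruples under high congestion.
Names follow the draft; the default values and the class shape are
assumptions.

```python
# Sketch of progressive LSA flooding throttles (illustrative only).

LSA_RETRANSMIT_INTERVAL = 5   # seconds (OSPF RxmtInterval default)
MAX_LSA_FLOOD = 10            # max unacknowledged LSAs per interface

class Interface:
    def __init__(self):
        self.unacked = set()  # LSAs flooded but not yet acknowledged
        self.retransmit_interval = LSA_RETRANSMIT_INTERVAL

    def may_flood(self):
        # Sliding-window rate control: stop when the window is full.
        return len(self.unacked) < MAX_LSA_FLOOD

    def flood(self, lsa_id):
        if self.may_flood():
            self.unacked.add(lsa_id)
            return True
        return False          # defer until acks open the window

    def on_ack(self, lsa_id):
        self.unacked.discard(lsa_id)

    def on_neighbor_congestion(self, level):
        # Double the retransmit interval for low congestion,
        # quadruple it for high congestion.
        factor = {"none": 1, "low": 2, "high": 4}[level]
        self.retransmit_interval = LSA_RETRANSMIT_INTERVAL * factor

ifc = Interface()
ifc.on_neighbor_congestion("high")  # quadruple the retransmit interval
for n in range(12):
    ifc.flood(n)                    # only the first 10 fit the window
```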
4.2.2.3 Increase Adjacency Break Interval
Under high congestion, if no packet is received from a neighbor within
the ROUTER_DEAD_INTERVAL, the router SHOULD NOT break the adjacency
immediately, but MUST wait for a greater interval,
ADJACENCY_BREAK_INTERVAL, before it declares the adjacent router down.
The HELLO_INTERVAL and ROUTER_DEAD_INTERVAL can be increased at most
once at each router. The mechanism for changing these intervals needs
to avoid the problem that, if the ROUTER_DEAD_INTERVAL is changed (as
sent in Hello messages), a mismatch in ROUTER_DEAD_INTERVAL can cause
adjacencies to break (as Hellos will not be processed when received
with a different Hello interval from what is configured). If the router
detects a mismatch, the HELLO_INTERVAL or ROUTER_DEAD_INTERVAL should
be appropriately incremented or decremented. This mechanism would also
protect broadcast links on which a router has many neighbors and
perhaps one neighbor is dominating the use of bandwidth on the link.
4.2.2.4 Reduce the Rate of SPF Computation
Under high congestion, reduce the rate of SPF computation and do not do
any incremental SPF computation. This avoids overloading the processor
with SPF calculations. Also, during high congestion, the SPF
computation would generally be based on incomplete topology information
anyway, so convergence time is not affected in this case.
4.2.3 Router Response if Congestion Notification Received
When a router receives a congestion notification message, it SHOULD
perform the following actions:
o Increase the retransmit interval to the congested router,
o Throttle the rate of LSA flooding toward the congested router (see
Section 4.2.2.2), and
o Increase the adjacency break interval to the congested router (see
Section 4.2.2.3).
4.2.4 Graceful Recovery from Loss of Topology Database Information -
Database Backup & Resynchronization
There is a need to gracefully recover from massive loss of topology
database information. This need is illustrated in Section 2, where we
discuss failure experience in which all topology database information is
lost in all routers in a large network, and the link-state protocol is
unable to recover. To address this problem, OSPF database backup and LS
database synchronization SHOULD be applied. In essence, the database
needs to be recovered, and can be backed up on the local node, and the
process can use [zinin3] to resynchronize the database.
OSPF LS database synchronization is achieved via two methods:
o initial LSDB synchronization when a router is first connected to
the network, and
o asynchronous flooding, which ensures continuous LSDB
synchronization in the presence of topology changes after the
initial procedure has completed.
After two routers have established an adjacency (neighbor FSMs have
reached Full state), routers announce adjacency states in their
router-LSAs. Asynchronous flooding ensures routers' LS databases stay
in sync in the presence of topology changes.
It is sometimes necessary for OSPF routers to resynchronize their LS
databases, such as under failure when database information is lost.
Here we identify LS database backup mechanisms, and review
resynchronization procedures.
After a node fails and comes up, it needs to recover the current state
of the topology database. The database of the failed router SHOULD be
backed up before it fails, and this backup SHOULD be used to update the
database. A router SHOULD back up all the non-self-originated LSAs and
the states of the interfaces. Accordingly, when the network is
recovering, a router SHOULD generate all its own LSAs: router, network,
and summary LSAs. Do_Not_Age (DNA) LSAs need not be regenerated.
The router SHOULD only receive the LSAs that have changed between the
time it failed and the current time. If the database backup is
completely up to date, none of the LSAs would need to get updated.
After a failure, a router SHOULD base its routing on the current
database, derived as above. Some of the database information will be
somewhat out of sync, and routing will not be optimal. However, if the
database is relatively up to date, the routing would suffice until the
database re-syncs completely. What is saved is a considerable amount of
CPU processing of LS requests and LS updates.
The approach of Zinin [zinin3] is relevant here when the router
experiencing a failure uses graceful restart. As described in "OSPF
Out-of-band LSDB resynchronization" [zinin3], when database information
is lost, the OSPF standard does not allow routers to resynchronize
without actually changing the topology view of the network. That is, if
routers need to resynchronize their LS databases, they cannot do so
without putting the neighbor FSMs into the ExStart state. This
effectively causes adjacencies to be removed from the router-LSAs, and
is not acceptable in some cases. With out-of-band link-state database
(OOB LSDB) resynchronization, routers can re-establish adjacencies
after a reload, such as with a database backup, described above. To do
this, an R-bit indicates OOB LSDB resynchronization.
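The backup/restore split described above can be sketched as follows:
persist the non-self-originated LSAs (and interface states), and on
restart regenerate the self-originated LSAs rather than restoring them.
The file format and function names are assumptions of this sketch.

```python
import json
import os
import tempfile

# Sketch of local LSDB backup and restore (illustrative only).

def backup_lsdb(lsdb, self_router_id, path):
    # Back up only LSAs originated by other routers (a real
    # implementation would also save the interface states).
    saved = {k: v for k, v in lsdb.items()
             if v["adv_router"] != self_router_id}
    with open(path, "w") as f:
        json.dump(saved, f)

def restore_lsdb(self_router_id, path, regenerate_own):
    with open(path) as f:
        lsdb = json.load(f)
    # Self-originated LSAs (router, network, summary) are regenerated,
    # never restored from backup.
    lsdb.update(regenerate_own(self_router_id))
    return lsdb

lsdb = {"lsa1": {"adv_router": "1.1.1.1"},   # self-originated
        "lsa2": {"adv_router": "2.2.2.2"}}   # learned from a neighbor
path = os.path.join(tempfile.mkdtemp(), "lsdb.json")
backup_lsdb(lsdb, "1.1.1.1", path)
restored = restore_lsdb("1.1.1.1", path,
                        lambda rid: {"lsa1": {"adv_router": rid}})
```

After restore, only LSAs that changed while the router was down would
need to be fetched via OOB resynchronization.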
5. Related Solution Methods
The following is a summary of methods implemented elsewhere that help
control congestion and may be worth considering here.
5.1 Optimized Link State Routing (OLSR) Protocol [jacquet]
With Optimized Link State Routing (OLSR), a key concept is the use of
multipoint relays (MPRs). MPRs are selected routers which forward
broadcast messages during the flooding process. As such, OLSR
substantially reduces the message overhead as compared to a pure
flooding mechanism. OLSR is designed to work in a distributed manner
and thus does not depend on any central entity. Note that MPR behavior
in dynamic topologies (when the connectivity graph changes) is still
under consideration.
5.2 Fisheye State Routing Protocol (FSR) for Ad Hoc Networks [gerla1]
With Fisheye State Routing (FSR), a router periodically broadcasts LS
updates of a destination to its neighbors with a frequency that depends
on hop distance to destination. LS updates corresponding to far away
destinations are propagated with lower frequency than those for close by
destinations. FSR reduces overhead caused by routing table updates.
5.3 Landmark Routing Protocol (LANMAR) for Large Scale Ad Hoc Networks
[gerla2]
With the Landmark Routing Protocol (LANMAR), a 'landmark' is dynamically
elected in each group. Routing to the landmark is propagated throughout
the network. A router directs a packet to the landmark and then to the
destination. LANMAR reduces routing table size and routing update
overhead, since it a) summarizes routing information of remote groups,
and b) uses a truncated local routing table.
5.4 Topology Broadcast Based on Reverse-Path Forwarding (TBRPF) [sri]
With Topology Broadcast Based on Reverse-Path Forwarding (TBRPF), an
efficient Hello mechanism is introduced in which
o TBRPF Neighbor Discovery (TND) protocol allows each router to
quickly detect neighboring nodes with direct link, and
o TND uses "differential" HELLO messages which report only *changes*
in status of neighbors.
This results in Hello messages that are much smaller than other LS
routing protocols which include the IDs of *all* neighbors.
6. Security Considerations
There are no new security considerations based on proposals in this
draft.
7. Acknowledgments
The authors gratefully acknowledge the comments and suggestions from
many people. We are very appreciative of the most helpful and
constructive comments from Alex Zinin, Russ White (Cisco), and Dave
Katz (Juniper). At AT&T we thank Margaret Chiosi, Enrique Cuevas, Tom
Helstern, Steve Holmgren, Raghuram Aswatnarayan, and Mike Shapiro,
Shawn McAllister at Alcatel, Marty Albright at Worldcom, Sohel Khan at
Sprint, Ghassem Koleyni at Nortel Networks, Thomas Cornely and Greg
Wetzel at Covad, Bob Dianda at Sprint, and Jeff Parker at Axiowave.
8. References
[af-pnni-0055.000] "Private Network-Network Interface Specification
Version 1.0 (PNNI 1.0)," March 1996.
[ash] Ash, G. R., "Dynamic Routing in Telecommunications Networks,"
McGraw Hill.
[atmf00-0249] "Scalability and Reliability of large ATM networks."
[atm00-0257] "Signaling Congestion Control in PNNI Networks: The Need
and Proposed Solution Outline."
[atm00-0480] "Congestion Control/Failure Recovery in PNNI Networks."
[atm01-0101] "Proposed Mechanisms for Congestion Control/Failure
Recovery in PNNI Networks."
[att] "AT&T announces cause of frame-relay network outage," AT&T Press
Release, April 22, 1998.
[btd-cs-congestion-02.00] "Signaling Congestion Control Version 1.0",
Baseline Text
[cholewka] Cholewka, K., "MCI Outage Has Domino Effect," Inter@ctive
Week, August 20, 1999.
[choudhury] Choudhury, G., Maunder, A. S., Sapozhnikova, V., "Faster
Link-State Convergence and Improving Network Scalabiity and Stability,"
submitted for presentation at LCN 2001.
[gerla1] Gerla, M., et. al., "Fisheye State Routing Protocol (FSR) for
Ad Hoc Networks," work in progress.
[gerla2] Gerla, M., et. al., "Landmark Routing Protocol (LANMAR) for
Large Scale Ad Hoc Networks," work in progress.
[hosein1] Hosein, P., "An Improved ACC Algorithm for Telecommunication
Networks," Telecommunication Systems 0, 1998.
[hosein2] Hosein, P., "Overload Control for Real-Time Telecommunication
Databases," International Teletraffic Congress - 16, Edinburgh,
Scotland, June 1999.
[jacquet] Jacquet, et. al., "Optimized Link State Routing Protocol,"
work in progress.
[jander] Jander, M., "In Qwest Outage, ATM Takes Some Heat," Light
Reading, April 6, 2001.
[maunder] Maunder, A. S., Choudhury, G., "Explicit Marking and
Prioritized Treatment of Specific IGP Packets for Faster IGP
Convergence and Improved Network Scalability and Stability," work
in progress.
[mpls-architecture] Rosen, E., Viswanathan, A., Callon, R.,
"Multiprotocol Label Switching Architecture," RFC 3031, January 2001.
[mummert] Mummert, V. S., "Network Management and its Implementation on
the No. 4ESS," International Switching Symposium, Japan, 1976.
[moy1] Moy, J., "Hitless OSPF Restart", work in progress.
[moy2] Moy, J., "Flooding over parallel point-to-point links," work in
progress.
[moy3] Moy, J., "Flooding Over a Subset Topology," work in progress.
[murphy] Murphy, P., "OSPF Floodgates," work in progress.
[pappalardo1] Pappalardo, D., "Can one rogue switch buckle AT&T's
network?," Network World Fusion, February 23, 2001.
[pappalardo2] Pappalardo, D., "AT&T, customers grapple with ATM net
outage," Network World, February 26, 2001.
[pillayesnault] Pillay-Esnault, P., "OSPF Refresh and flooding
reduction in stable topologies," work in progress.
[Q.764] "Signalling System No. 7 - ISDN user part signalling
procedures," December 1999.
[refresh-guide] Zinin, A., "Guidelines for Efficient LSA Refreshment
in OSPF," work in progress.
[RFC2475] Blake, S., et. al., "An Architecture for Differentiated
Services," RFC 2475.
[sri] SRI authors, et. al., "Topology Broadcast Based on Reverse-Path
Forwarding (TBRPF)," work in progress.
[whitehead] Whitehead, M., "A class of overload controls based
on controlling call reject rates," ITU-T contribution D.19, February 2001.
[zinen1] Zinin, A., et. al., "OSPF Restart Signaling," work in
progress.
[zinen2] Zinin, A., et. al., "Flooding optimizations in link-state
routing protocols," work in progress.
[zinin3] Zinin, A., et. al., "OSPF Out-of-band LSDB resynchronization,"
work in progress.
[zinin4] Zinin, A., et. al., "OSPF Link-Local Signaling," work in
progress.
9. Appendix
Appendix A - Dropping Link Adjacencies Under Extreme Congestion
The following mechanisms are proposed for dropping adjacencies under
extreme congestion, but may be controversial, so they are placed in an
Appendix for now. As discussed below, there is experience in using
these mechanisms in practice, and also experimental verification that
the mechanisms are effective.
If a router is experiencing extremely high congestion, it SHOULD drop
adjacencies on all low-priority links.
It is natural to think that dropping adjacencies under congestion
itself causes some flooding and so may not help. However, this
flooding is a one-time event, while over the longer term the reduced
adjacency helps: with fewer adjacencies, the node has less work to do
for any LSA generated anywhere in the network. This has been shown by
the analytic model discussed in [choudhury]. In addition to the
analysis and simulation results, there is implementation experience
with this approach within large carrier networks. In fact, a procedure
for reducing adjacencies was used to recover the network after the
2/20/01 failure discussed in Section 2, and is also invoked before
each major upgrade.
If a router at any time has more adjacencies in Exchange or Loading
state than MAX_ADJACENCY_BUILD_COUNT, the router SHOULD drop the
lower-priority adjacencies in Exchange or Loading. In that way,
routers within a router-group experience a lower level of LSA flooding
and Hello processing. When congestion on a router subsides, it SHOULD
then gradually reestablish the adjacencies on links that were dropped.
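The threshold check above can be sketched as follows. This is a
minimal illustration in Python; the Adjacency structure, its priority
field, and the value of MAX_ADJACENCY_BUILD_COUNT are assumptions for
the example, not part of the draft.

```python
from dataclasses import dataclass

# Assumed configurable threshold; the draft does not specify a value.
MAX_ADJACENCY_BUILD_COUNT = 4

@dataclass
class Adjacency:
    neighbor_id: str
    state: str      # e.g. "Down", "ExStart", "Exchange", "Loading", "Full"
    priority: int   # higher value = more important to keep

def enforce_build_limit(adjacencies):
    """Return the adjacencies to drop so that at most
    MAX_ADJACENCY_BUILD_COUNT remain in Exchange or Loading state."""
    building = [a for a in adjacencies
                if a.state in ("Exchange", "Loading")]
    excess = len(building) - MAX_ADJACENCY_BUILD_COUNT
    if excess <= 0:
        return []
    # Drop the lowest-priority adjacencies first.
    building.sort(key=lambda a: a.priority)
    return building[:excess]
```

A router would re-run this check whenever an adjacency enters Exchange
or Loading, and later reestablish the dropped adjacencies gradually as
congestion subsides.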
When an adjacency is dropped, the router SHOULD list the link as a
stub link in its Router LSA, and if the router is the Designated
Router on a broadcast link, it SHOULD remove the neighbor from its
Network LSA. Hello messages should still list the router as a
neighbor. If the router in this state receives a 1-way Hello, it
SHOULD behave as specified in RFC 2328.
Also proposed is the concept of forming smaller 'router-groups'
within an area in order to stabilize the network under extreme
congestion, as a last resort to avoid network collapse. After the
smaller router-groups recover, they should be re-constituted into the
original area. Here again, analysis and simulation models have shown
the benefit. A large carrier network likewise plans to manually
segment the network into multiple smaller sub-networks and then
reconnect them after stabilization has been achieved.
Priorities CAN be set on interfaces to help form smaller
router-groups, or clusters: links between routers in different
router-groups SHOULD be given low priority, and links within a
router-group SHOULD be given high priority. Links within a
router-group CAN be given high priority when only one link exists
between a pair of routers; with parallel point-to-point links, one
link CAN be given high priority over the others. In this way, the
adjacency-limiting mechanism is confined to router-groups, and routers
within a router-group experience a lower level of LSA flooding and
Hello processing, thus helping to avoid the spread of congestion.
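The priority assignment above can be sketched as follows, assuming a
mapping from routers to router-groups is available; the group map and
the numeric priority values are illustrative, not from the draft.

```python
# Illustrative priority values; only their relative order matters.
HIGH, LOW = 2, 1

def link_priority(router_a, router_b, group_of):
    """Assign a priority to the link between two routers.
    group_of maps a router id to its router-group id."""
    if group_of[router_a] == group_of[router_b]:
        return HIGH   # intra-group link: keep the adjacency
    return LOW        # inter-group link: first to be dropped

# Example router-group assignment (hypothetical).
groups = {"r1": "g1", "r2": "g1", "r3": "g2"}
```

With this assignment, the adjacency-dropping mechanism preferentially
removes inter-group adjacencies, confining flooding within each
router-group.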
Under high congestion, adjacencies repeatedly form and then break,
which leads to LSA storms such that eventually no adjacencies form at
all. In this state, we instead drop adjacencies, let only a subset of
the adjacencies form, bring up the remaining adjacencies
incrementally, and in that way make the domain stable. Dropping
adjacencies may cause additional LSA flooding temporarily, but since
routers then only have to flood to a subset of the topology, the
network stabilizes sooner.
Implementations should take care to damp link flaps and provide
interface and adjacency state dampening. If an interface or a given
adjacency keeps flapping, it should be suppressed for a period of
time, and the worse the instability, the longer the hold-down period
should be.
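One common way to realize such dampening is an exponential hold-down,
sketched below; the base interval, cap, and doubling policy are
assumptions for illustration (the draft only requires that the
hold-down grow with instability).

```python
class FlapDamper:
    """Per-interface (or per-adjacency) flap dampening state."""
    BASE_HOLDDOWN = 5.0    # seconds; assumed starting penalty
    MAX_HOLDDOWN = 300.0   # seconds; assumed cap on the penalty

    def __init__(self):
        self.flap_count = 0

    def record_flap(self):
        """Register a flap and return the hold-down (in seconds)
        before the adjacency may be re-formed: the penalty doubles
        with each successive flap, up to the cap."""
        self.flap_count += 1
        return min(self.BASE_HOLDDOWN * 2 ** (self.flap_count - 1),
                   self.MAX_HOLDDOWN)

    def record_stable_period(self):
        """Decay the penalty after a period of stability, so a
        once-flapping interface eventually returns to normal."""
        self.flap_count = max(0, self.flap_count - 1)
```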
Appendix B - Vendor Analysis of LS Protocol Performance
Various vendors and service providers were asked to analyze how their
product's LS protocol recovers from a total network failure, that is,
loss of all database information in the specified scenario. The
specification of the failure presented to the vendors made the
following assumptions:
1. A 400-node network is considered
a) 100 backbone nodes
b) 3 edge nodes per backbone node (Edge single homed)
c) Each backbone node is connected to no more than 10 nodes (maximum
node adjacency is 13, which defines a sparse network)
2. There are 101 areas
a) 1 backbone area with 100 backbone nodes
b) 100 edge areas, each with 3 nodes, all homed on the backbone peer
group
3. 1,000,000 addresses are advertised in the network
4. maximum node adjacency is 13
a) sparse network
Assumptions on packet sizes and processing times varied according to
product capabilities associated with the individual estimates given
below.
The projected recovery times from three vendor/service-provider
analyses of the above scenario are as follows:
Recovery Time Estimate A - 3.5 hours
Recovery Time Estimate B - 5-15 minutes
Recovery Time Estimate C - 5.5 hours
Clearly there is large variability in the estimates; however, the
expectation is that vendor equipment recovery is not adequate under a
large failure scenario such as the one analyzed.
Appendix C - Analytic Modeling & Simulation Analysis of LS Protocol
Performance
Some analysis models have been published which reveal problematic
performance of LS protocols under various failure conditions [maunder,
atmf00-0249].
There are various simulation capabilities [NS, wandl, opnet,
scalable-networks, makesystems] for analyzing LS protocol behavior
under failure. We report below an event simulation study [choudhury]
on overloads in large-scale networks and the corresponding recovery
times.
Emulation of large-failure performance could also be obtained through
laboratory testing of actual protocol performance of vendor products
under failure conditions. However, to date no such studies have been
published on failures in large-scale networks and the corresponding
recovery times.
The results from such simulation and emulation studies are specific to
the type of equipment and network configuration modeled, and therefore
are not completely general. However, the results to date lend support
for the existence of problems with LS protocols in the face of
congestion and network failures.
A network-wide event simulation model is reported in [choudhury] to
study the impact of a TSU storm. It captures the actual congestion
seen at various nodes and accounts for propagation delay between
nodes, retransmissions in case a TSU is not acknowledged, failure of
links for TSUs delayed beyond the node-dead interval, and link
recovery following database synchronization and TSU flooding once the
TSU is processed. It approximates a real network implementation and
uses processing times that are roughly in the same order of magnitude
as measured in the real network (of the order of milliseconds).
There are two categories of IGP messages processed at each node in the
simulation. Category 1 messages are triggered by a timer and include
the Hello refresh, TSU refresh and retransmission packets. Category 2
messages are not triggered by a timer and include received Hello,
received TSU and received acknowledgements. Timer-triggered messages
are given non-preemptive priority over the other type and are queued
in the timer queue. As a result, the received Hello packets and the
received acknowledgement packets may see long queuing delays under
intense CPU overload. Table 1 below shows sample results of the
simulation study when applied to a network with about 300 nodes and
800 links. The node-adjacency varies from node-to-node and the
maximum node-adjacency is 30. The Hello interval is assumed to be 5
seconds, the minimum interval between successive SPF
(Shortest-Path-First) calculations is 1 second, and the Node-Dead
Interval ("Router-Dead Interval") is 15 seconds, i.e., a link is
declared down if no Hello packet is received for three successive
hello intervals. During the study, a TSU storm of size X is created
at time 100 seconds, where the storm size is defined as the number of
TSUs generated during the storm. Three cases are considered, with
X = 300, 600, and 900 respectively. Besides the storm, there are also
the normal once-in-thirty-minutes TSU refreshes. At any given point of
time we define a quantity "dispersion": the number of control packets
already generated in the network but not yet received and processed in
at least one node (each control packet is assumed to carry three
TSUs).
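The dispersion metric defined above can be sketched as follows, under
the stated assumption that each control packet carries three TSUs; the
function and its arguments are illustrative, not taken from the
[choudhury] model itself.

```python
import math

# Stated modeling assumption: each control packet carries three TSUs.
TSUS_PER_PACKET = 3

def dispersion(generated_tsus, processed_tsus_per_node):
    """generated_tsus: total TSUs generated so far in the network.
    processed_tsus_per_node: TSUs processed so far at each node.
    A control packet counts toward dispersion until every node has
    processed it, so the slowest node determines the count."""
    slowest = min(processed_tsus_per_node)
    outstanding_tsus = generated_tsus - slowest
    return math.ceil(outstanding_tsus / TSUS_PER_PACKET)
```

For example, if 300 TSUs have been generated and the slowest node has
processed only 240, then 60 TSUs (20 control packets) remain
outstanding, giving a dispersion of 20.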
Table 1 plots dispersion as a function of time and thereby
illustrates the impact of a TSU storm on network stability. Before
the TSU storm, the dispersion due to normal TSU refreshes remains
small. We expect the dispersion to jump to a high value right after
the storm and then come down to the pre-storm level after some period
of time (this happens with X=300 and X=600 but not X=900). In Table
1 with a TSU storm size 300, the "heavy dispersion period" lasted
about 11 seconds and no link losses were observed. With a TSU storm
of size 600, the "heavy dispersion period" lasted about 40 seconds.
Some link losses were observed a little after 15 seconds within
the "heavy dispersion period" but eventually all links recovered and
the dispersion came down to the pre-storm level. With a TSU storm of
size 900, the "heavy dispersion period" lasted throughout the
simulation period (6 minutes).
======|=====================================================================
| Table 1: DISPERSION as a FUNCTION of TIME (in sec)
| for different TSU Storm Sizes
STORM |=====================================================================
SIZE |100s 106s 110s 115s 140s 170s 230s 330s 370s
======|=====================================================================
300 | 0 39 3 1 0 1 0 0 0
------|---------------------------------------------------------------------
600 | 0 133 120 100 12 1 0 0 0
------|---------------------------------------------------------------------
900 | 0 230 215 196 101 119 224 428 488
======|=====================================================================
The generic observations are as follows:
* If the initial TSU storm size (e.g., X=300) is such that the
delays experienced by Hello packets are not big enough to cause any
link failures anywhere in the network, the network remains stable and
quickly gets back to a period of "low dispersion". These types of LSA
storms are observed quite frequently in operational networks, from
which the network easily recovers.
* If the initial TSU storm size (e.g., X=600) is such that the
delays experienced by a few Hello packets in a few nodes cause link
failures, then some secondary TSU storms are generated. However, the
secondary storms do not keep growing indefinitely and the network
remains stable and eventually gets back to a period of "low
dispersion". This type of LSA storm was observed in an operational
network triggered by a network upgrade, from which the network
recovered but with some difficulty.
* If the initial TSU storm size (e.g., X=900) is such that the
delays experienced by many Hello packets in many nodes cause link
failures, then a wave of secondary TSU storms is generated. The
network enters an unstable state and the secondary storms are
sustained indefinitely or for a very long period of time. This type
of LSA storm was observed in an operational network triggered by a
network failure (2/20/01 AT&T ATM network failure discussed in Section
2.1) from which the network did not recover without corrective steps
(manual procedures were used to reduce TSU flooding and stabilize the
network).
The results show that there is a TSU storm threshold above which the
network shows unstable behavior. It is desirable that TSU flooding
reduction be automatic under overload, and the model could be used to
assess the benefits of various control mechanisms, such as those
discussed in the draft.
10. Authors' Addresses
Jerry Ash
AT&T
Room MT D5-2A01
200 Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1-(732)-420-4578
Fax: +1-(732)-368-8659
Email: gash@att.com
Gagan L. Choudhury
AT&T
Room D5-3C21
200 S. Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1 732 420-3721
Fax: +1 732 368-1919
E-mail: gchoudhury@att.com
Vishwas Manral
NetPlane Systems
189, Prashasan Nagar,
Road number 72
Jubilee Hills
Hyderabad, India
E-mail: vishwasm@netplane.com
Anurag S. Maunder
Sanera Systems
370 San Aleso Avenue
Sunnyvale, CA 94085, USA
Ph: 408-734-6123
Fax: 408-400-0151
E-mail: amaunder@sanera.net
Vera D. Sapozhnikova
AT&T
Room C5-2C29
200 S. Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1 732 420-2653
Fax: +1 732 368-1774
E-mail: sapozhnikova@att.com
Mostafa Hashem Sherif
AT&T
Room C5-2D18
200 S. Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1 732 420-2448
Fax: +1 732 371-7234
E-mail: mhs@att.com
11. Copyright Statement
Copyright (C) The Internet Society (1998). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by
removing the copyright notice or references to the Internet Society or
other Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be followed,
or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS
IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.