Internet Draft Robert Hancock
Eleanor Hepworth
Andrew McDonald
Siemens/Roke Manor Research
Document: draft-hancock-nsis-overload-00.txt
Expires: December 2003 June 2003
Handling Overload Conditions in the NSIS Protocol Suite
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
The NSIS working group is considering protocols for signaling for
resources for a traffic flow along its path in the network. The
requirements for such signaling are being developed in [2] and a
framework in [3]. The framework describes a 2-layer protocol
architecture, with a common lower NSIS 'transport' layer protocol
(NTLP) supporting a variety of upper layer NSIS signaling layer
protocols (NSLPs).
It is an open issue where within this architecture to place the
responsibility for handling overload conditions. These conditions
relate both to overload of the IP layer itself, as well as overload
of buffer/processing resources within the NTLP/NSLPs. This note
discusses the requirements and the implications of various
approaches, and proposes a way forward.
Hancock et al. Expires - December 2003 [Page 1]
NSIS: Overload Handling June 2003
Table of Contents
1. Introduction, Scope and Terminology
1.1 Terminology; Flow and Congestion Control
2. Requirements
3. Implications of Doing Overload Handling within NSIS Protocols
4. RSVP and Other Protocol Work
5. Handling IP Overload ("Congestion Control")
6. Handling NSIS Protocol Overload
7. Security Considerations
8. Conclusions
Acknowledgments
Author's Addresses
Full Copyright Statement
1. Introduction, Scope and Terminology
The NSIS working group is considering protocols for signaling for
resources for a traffic flow along its path in the network. The
requirements for such signaling are being developed in [2] and a
framework in [3]. The framework describes a 2-layer protocol
architecture, with a common lower NSIS 'transport' layer protocol
(NTLP) supporting a variety of upper layer NSIS signaling layer
protocols (NSLPs).
It is an open issue where within this architecture to place the
responsibility for handling overload conditions; 'handling' includes
detection as well as prevention and recovery. These conditions relate
both to overload of the network (IP) layer itself, as well as
overload of buffer/processing resources within the NTLP/NSLPs. This
note discusses the requirements and the implications of various
approaches, and proposes a way forward.
These issues have been intermittently discussed on the NSIS mailing
list [4], and noted in some of the design-related drafts [5, 6, 7].
[8] provides authoritative guidance specifically on how the problem
of congestion should be approached within Internet protocol
standards, and includes many important references.
Note that this draft is specifically not about resource signaling to
manage congestion within the network when it actually occurs - for
example, traffic engineering to route data flows around congested
network areas. This is an important subject, but it is specifically
about how resource management should be done, rather than about how
signaling protocols should work. This draft includes discussion of
how to prevent signaling protocols from adding to the network
congestion problem.
After classifying the various types of signaling overload in section
1.1, section 2 describes the potential causes of overload and the
(proposed) requirements for how they should be dealt with. Section 3
describes the basic implications for protocol design and
implementation if they provide overload handling, and section 4
briefly mentions how some other protocols related to network
operation handle the problem. Section 5 discusses how to handle
network (IP layer) overload, and section 6 discusses overload within
the NSIS protocol suite itself. Security aspects are briefly
mentioned in section 7, and section 8 concludes.
1.1 Terminology; Flow and Congestion Control
Unless otherwise stated, this document follows the terminology given
in the current NSIS framework [3].
The overload problem is actually (at least) three problems:
a) Overload in the IP layer, i.e. buffer congestion which causes IP
packets to be dropped (affecting all flows, for signaling, data and
other applications).
b) Overload in the NTLP, meaning it cannot process incoming or
outgoing packets fast enough. This might be caused by processor
overload or by lower (IP) level congestion. It affects all NSIS
signaling applications, but not the rest of the network - assuming
(a) is already handled.
c) Overload in an NSLP, meaning it cannot process incoming or
outgoing packets fast enough. This might be caused by processor
overload or by lower (NTLP/IP) level congestion. It affects only this
signaling application - assuming that (a) and (b) are already
handled.
Traditionally, networking discussions draw a distinction between
congestion control - protecting the infrastructure - and flow control
- protecting the end systems. Making this distinction is somewhat
subtle in the NSIS case, since the infrastructure includes end
systems. For example, overload within the NTLP could be prevented by
NTLP-level flow control; however, it would still be seen as
equivalent to network congestion by NSLPs, and be invisible to the IP
layer (as congestion or anything else). Therefore we work in terms of
the more concrete concept of overload within particular protocol
layers. No doubt even finer distinctions could be drawn.
2. Requirements
This section summarises the potential sources of overload, and just
how critical it is to deal with them as part of protocol design.
Load/overload could originate from the following causes:
NORMAL: 'Normal' operation, as user applications initiate signaling
for their flows. (If this actually causes problems, the network or
network elements probably just need re-engineering.)
RETRY: Aggressive retry behaviour, as end-systems attempt to re-
signal for failed or failing sessions, i.e. even if the flow itself
is not active. (This sort of behaviour is felt to be a real problem
in traditional telephony networks, where the worst excesses of such
devices are curbed by regulation.)
REFRESH: Signaling refresh messages generated within the network may
cause overload, if the refresh period is not appropriately chosen.
RXMIT: Message retransmission (e.g. to achieve reliability in the
face of congestive loss) is itself a potential cause of overload, and
particularly worrying as a source of instability, since the
retransmissions themselves add to the overload.
REPAIR: If there is a path change within the network, local repair
actions could cause a flood of signaling traffic over the
neighbouring links.
While the sources of NORMAL and RETRY are end systems or their proxies, the
others are not. Therefore, it is not possible to rely only on end-to-
end load control mechanisms, unless the other sources can be
discounted.
While NORMAL and REFRESH are roughly proportional to data traffic
(and should be a small proportion of it) and hence should not usually
be a source of IP-level overload, the others are not. Hence, both
signaling element and general network overload should be handled
within the protocol design.
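
The classification above can be written down as a small table. The
following is only an illustrative sketch: the class name, field names
and helper functions are assumptions made for this example, and the
two flags simply restate the two preceding paragraphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverloadCause:
    name: str
    from_end_system: bool   # originates at end systems or their proxies
    scales_with_data: bool  # roughly proportional to data traffic


# Flags restate the text: NORMAL and RETRY come from end systems;
# NORMAL and REFRESH scale with data traffic.
CAUSES = [
    OverloadCause("NORMAL", True, True),
    OverloadCause("RETRY", True, False),
    OverloadCause("REFRESH", False, True),
    OverloadCause("RXMIT", False, False),
    OverloadCause("REPAIR", False, False),
]


def beyond_end_to_end_control(causes):
    """Causes that end-to-end load control alone cannot reach."""
    return [c.name for c in causes if not c.from_end_system]


def may_overload_ip_layer(causes):
    """Causes not bounded by the volume of data traffic."""
    return [c.name for c in causes if not c.scales_with_data]
```

Both helpers return non-empty lists, which is the reason given above
for handling signaling element and network overload within the
protocol design.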
Any of these factors, especially RETRY and REPAIR, can lead to
overload within the signaling protocol processing. The consequences
of such overload would be reduced responsiveness within the network
control plane, dropped signaling state for user sessions, and so on.
Modified operation under these circumstances is mainly signaling-
application specific; however, the signaling applications usually
need support at the protocol level to detect the overload condition
in the first place.
In the case where all nodes in the network are NSIS-aware, the IP
overload problem essentially becomes a node implementation issue
(allocation of forwarding resources on outgoing links). However, a
background assumption is that the NSIS protocols need to operate well
over large-diameter NSIS-unaware clouds.
A related issue is that causes REFRESH and REPAIR are mainly about
signaling generated in support of particular signaling applications,
rather than 'protocol maintenance' signaling. This is therefore
generated only at NSLP-aware nodes. (This is a consequence of the
design decision that the NTLP only handles message forwarding, not
state maintenance, and therefore cannot for example generate a flood
of signaling application messages on a rerouting event.)
While NSLP/NTLP overload failures are problems which are 'local' to
the NSIS activity, there is no point in even attempting to
standardise protocols which can contribute to network congestion (IP
overload) in an uncontrolled way (see the warnings in [9]).
The conclusion of this section is that overload both within the NSIS
protocols and in the IP layer needs to be handled within the NSIS
protocol designs, the latter with particular attention to robustness.
3. Implications of Doing Overload Handling within NSIS Protocols
Overload handling generally implies having a feedback channel to
complement the forward channel which carries the 'overload
generating' traffic. The nodes at each end of the feedback channel
have to be sensitive to the presence of the overload and be able to
reduce it; generally, the closer to the location of the overload the
better (e.g. end-to-end mechanisms will be inefficient at dealing
with a local overload caused by a rerouting event).
The implication of this is that an NSIS protocol that purports to
deal with overloads has to be bi-directional, and have state
information at each end which tracks the current load situation. The
more direct the feedback in the reverse direction the better.
Overload protection mechanisms are often associated with reliability
mechanisms, but they don't have to be (e.g. DCCP [10]); they can be
considered independently. Indeed, there may be a case for
unreliability within the protocol (e.g. to delete aged messages),
even though overload control is still needed.
Avoidance of congestion (IP overload) generally has to be done by
tracking packet drops at NSIS-unaware nodes. The mechanisms can vary
from very simple to very complex. At one extreme, a simple stop-and-
wait protocol will work; at the other end, the full (and growing)
sophistication of TCP can be used. More sophistication is needed as
the network length of the feedback channel and the desired throughput
performance increase. This may be a situation where there is a case
for different protocol options in different parts of the network.
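
As an illustration of the simple end of this range, the following is a
minimal sketch of a stop-and-wait sender. It is not a proposed NSIS
mechanism: the function name, the max_tries limit and the deliver()
channel model are all assumptions made for the example. The point it
shows is that at most one message is ever outstanding, so the load
offered to the path is bounded by one message per round trip.

```python
def stop_and_wait(messages, deliver, max_tries=5):
    """Send messages one at a time; never more than one is in flight.

    deliver(seq, attempt) -> bool is a hypothetical channel model that
    says whether the message and its acknowledgement both got through.
    Returns a list of (seq, attempts_used) pairs.
    """
    log = []
    for seq, _message in enumerate(messages):
        for attempt in range(1, max_tries + 1):
            if deliver(seq, attempt):
                log.append((seq, attempt))
                break  # acknowledged: move on to the next message
        else:
            # No acknowledgement after max_tries attempts: give up.
            raise TimeoutError(f"message {seq} unacknowledged")
    return log
```

For example, with a channel that loses only the first transmission of
message 1, the sender retransmits that message once and sends the
others a single time each.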
4. RSVP and Other Protocol Work
The base RSVP protocol as defined in [11] includes very limited
overload detection and management capabilities. The main aspect is
the fact that refresh intervals can be locally adjusted, but this
just allows management intervention rather than being an adaptive
mechanism within the protocol itself. RSVP extensions for reliability
were introduced in [12], accompanied by an exponential backoff
procedure to address overload cause RXMIT.
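
The general shape of an exponential backoff schedule can be sketched as
follows. This is an illustration only: the function name, parameter
names and default values are assumptions for the example and are not
taken from [12].

```python
def backoff_schedule(initial_interval, factor=2.0, max_tries=3):
    """Return the wait (in seconds) that follows each transmission.

    The wait grows by `factor` after every unacknowledged attempt, so
    retransmissions thin out while the overload persists.
    """
    waits = []
    interval = initial_interval
    for _ in range(max_tries):
        waits.append(interval)
        interval *= factor
    return waits
```

With an initial interval of 0.5 seconds and a factor of 2, three
attempts are followed by waits of 0.5, 1.0 and 2.0 seconds.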
Most end-to-end application protocols, subject to causes NORMAL and
RETRY, handle the overload control problem either by using TCP/SCTP
as transports, or with a variety of ad hoc application level
techniques applied over UDP.
Within the network, the protocols which could be victims of causes
REFRESH, RXMIT and REPAIR are non-trivial routing protocols. The most
serious potential overload cause is a flood of routing messages as a
new link is brought up. Here, OSPF uses a simple stop-and-wait
protocol, while BGP uses TCP. The situation for the NSIS protocols is
more severe, since the situation arises for any re-routing event
(even one caused by link changes in a remote part of the network),
and affects links which are already supposedly operational.
In the Diameter Base protocol, which uses TCP/SCTP as a transport,
higher layer overload is managed on a per-peer-connection basis by
the explicit signaling of "busy" indications to the originating peer
and the termination of the connection. The originating peer has the
option to switch to an alternative next hop (load sharing), which is
not possible within NSIS because the signaling has to be coupled to
the data path.
5. Handling IP Overload ("Congestion Control")
If the NTLP can generate its own messages for any of causes REFRESH,
RXMIT or REPAIR, then it has to do so in a way which cannot cause IP
layer overload; there is no other option. If this is the case, it
would seem to make sense to rely on the same mechanism (whatever it
is) to protect the IP layer from all NSIS overload causes.
However, whether the NTLP generates such messages depends on other
aspects of NTLP design and other decisions about NTLP functionality.
One could imagine a situation where a very lightweight NTLP had no
intelligence to generate messages independently of NSLP operation, in
which case protection responsibility could be pushed up to the
individual NSLPs. We can't tell whether this argument applies or not
without more detail about the proposed NTLP design.
Therefore, the question remains of whether it is sensible to allocate
the problem to the NTLP in any case. The following arguments would
seem to apply:
*) There is no need for different sorts of congestion control for
different signaling applications. (There may be different detailed
reactions to congestion, i.e. how to generate fewer messages;
however, detecting that fewer messages need to be sent is universal
across all signaling applications.) Therefore, there is no need to
solve this in a signaling-application sensitive manner.
*) Detecting the problem may be easier with closer interaction with
the lower layers. The NTLP is best placed to do this.
*) Solving the problem is hard and important. Therefore, it is better
to do it once and for all, and make life less burdensome for future
NSLP developers.
The conclusion of this set of arguments appears to be that congestion
control, i.e. protection of the IP layer from overloads caused by
NSIS protocol operation, should be an NTLP function.
6. Handling NSIS Protocol Overload
The other question is related to handling overloads within the NSIS
protocol layers themselves, i.e. when the internal resources of the
NEs are constrained. It is clear that the NSLP should be in charge of
adapting its own behaviour in response to overload situations, since
the response will be specific to the signaling application. However,
the method of detection and response depends on what overload
detection and control features the NTLP provides, and what
assumptions the NSLP can make about their presence (especially in
remote nodes). Therefore, this section aims to identify the different
options for how overload indications can be pushed up the protocol
stack and/or out to the edge of the network (where the adaptation can
take place) and how in particular the NTLP should support this.
If the conclusion of section 5 is correct (i.e. NTLP enforcing IP
layer congestion control), it is most likely that in any case there
should be a flow-controlling API between the NSIS protocol layers.
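
One possible shape for such a flow-controlling API is sketched below.
It is purely hypothetical: the class and method names are invented for
this example, and a real NTLP design might signal back-pressure in a
different way. The NSLP hands messages to a bounded queue, and a False
return tells it to hold back until the NTLP has drained some of them.

```python
from collections import deque


class NtlpSendQueue:
    """Bounded queue between an NSLP and its local NTLP (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def can_send(self):
        return len(self._queue) < self.capacity

    def send(self, message):
        """Accept a message, or return False to push back on the NSLP."""
        if not self.can_send():
            return False
        self._queue.append(message)
        return True

    def transmit_one(self):
        """Called when NTLP congestion control permits a transmission."""
        return self._queue.popleft() if self._queue else None
```

In this sketch the rate at which transmit_one() is called is set by
whatever congestion control the NTLP applies, and the NSLP sees that
rate only indirectly, through send() refusing new messages.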
For providing overload indications towards the edge nodes, there seem
to be three cases to consider. The argument depends on whether there
are intermediate nodes which are unaware of the NSLPs in use (see
Figure 1).
1) The NTLP provides the equivalent of a highly granular flow
controlled delivery service up to the next NSLP-aware node, with no
assumed constraints on NSLP behaviour. The source is explicitly
forced to throttle back the transmission of messages for the
combination of source/destination/application. The NSLP only has to
detect the condition locally; in fact, it can only send messages
which the local NTLP is prepared to deliver. This makes life very
easy for the NSLP, but NTLP design (in particular, buffer allocation
and propagation of flow control information across nodes) is hard.
                                     +------+
                                     | NE3  |
                                     |+----+|
                                     ||NSLP||
                                     |+----+|
    +------+    +------+             | ||   |
    | NE1  |    | NE2  |             |+----+|
    |+----+|    |      |      |======||NTLP||===
    ||NSLP||    |      |      |      |+----+|
    |+----+|    |      |      |      +------+
    | ||   |    |      |      |
    |+----+|    |+----+|  +------+   +------+
====||NTLP||====||NTLP||==|Router|   | NE4  |
    |+----+|    |+----+|  +------+   |+----+|
    +------+    +------+      |      ||NSLP||
                              |      |+----+|
                              |      | ||   |
                              |      |+----+|
                              |======||NTLP||====
                                     |+----+|
                                     +------+
Figure 1: Signaling with NTLP-only hops
2) The NTLP provides a flow controlled delivery service (as above),
but operates under assumptions about upper layer sending windows
which allow buffer management to be simplified. For example, if only
one message is allowed to be outstanding for a particular session at
any time, the buffer requirements can be precisely calculated.
3) The NTLP simply provides the service of delivery to the next NTLP
node, e.g. NE1->NE2, NE2->NE3 in the figure. Overload at an NSLP-
unaware intermediate node (NE2) is handled by dropping packets there
(or by more sophisticated but still IP-like behaviour). The NSLPs in
NE1 and NE3 have to detect this condition and somehow adapt
accordingly (in particular, NE1 has to be able to detect that NE3 is
overloaded but that NE4 may not be).
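
To illustrate the buffer calculation mentioned under option (2), the
worst-case requirement at a node is simply the product of the number of
sessions, the permitted sending window and the largest message size.
The function name and the figures below are assumptions chosen for the
example, not values from any NSIS design.

```python
def worst_case_buffer_bytes(sessions, max_message_bytes, window=1):
    """Upper bound on buffering when each session may have at most
    `window` messages outstanding at any time."""
    return sessions * window * max_message_bytes
```

For instance, 1000 sessions with one outstanding message of at most
1500 bytes each need no more than 1500000 bytes of buffer.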
Solutions (1) and (2) are both flow-control based, and require the
maintenance of per-source-destination information in order to support
flow control properly. For example, in figure 1, the NTLP at NE2
would have to detect overload for the signaling application at NE3
and throttle signaling messages for it from NE1, while not affecting
NE1->NE2->NE4 communications. In addition, these solutions put
complexity into the NTLP, and might infect it with knowledge about
signaling flow topologies which it should really be ignorant of.
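
The per-source-destination state that solutions (1) and (2) imply can
be sketched as follows. The class, its method names and the application
label "qos" are hypothetical; the sketch only shows why the state must
be keyed per source, destination and signaling application if
throttling NE1->NE3 is to leave NE1->NE4 untouched.

```python
class PerFlowThrottle:
    """Flow-control state an NTLP node such as NE2 would need to keep."""

    def __init__(self):
        self._blocked = set()

    def report_overload(self, source, destination, application):
        self._blocked.add((source, destination, application))

    def clear_overload(self, source, destination, application):
        self._blocked.discard((source, destination, application))

    def may_forward(self, source, destination, application):
        return (source, destination, application) not in self._blocked
```

After report_overload("NE1", "NE3", "qos"), may_forward() is False for
that triple but still True for ("NE1", "NE4", "qos"). The keys
themselves are the topology knowledge that the text argues the NTLP
should ideally not need.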
Solution (3) puts some complexity into the NSLP behaviour which could
be common to several applications; on the other hand, the flexibility
to do it differently between different applications could be
valuable. This option does not preclude the NTLP from doing flow
control, but it does place a requirement on the NSLP to cope with
lost messages at least as pathological events (although this would
have to be the case anyway, e.g. to cope with intermediate node
failure).
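
A minimal sketch of how an NSLP might detect overload under solution
(3) is to count consecutive unanswered messages per peer. Everything
here is an assumption made for illustration: the class name, the
threshold, and the premise that the NSLP gets a timeout or response
event for each message. Note that a run of losses could equally be
caused by drops at NE2 or elsewhere on the path, so this is a heuristic
rather than a definite indication.

```python
from collections import defaultdict


class PeerLossMonitor:
    """Per-peer count of consecutive unanswered messages (hypothetical)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._misses = defaultdict(int)

    def on_timeout(self, peer):
        self._misses[peer] += 1

    def on_response(self, peer):
        self._misses[peer] = 0

    def looks_overloaded(self, peer):
        return self._misses[peer] >= self.threshold
```

Keeping the counters per peer is what lets NE1 conclude that NE3
appears overloaded while NE4 does not.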
Note that these problems are mainly caused by the NSLP-unaware node,
NE2, and the fact that the NTLP cannot bypass it. In contrast, for
direct communication (e.g. NE3<->NE4) it would be very easy to
implement solution (1). Flow-controlling solutions are also
attractive because they can minimize the buffering taking place
within the network and hence improve responsiveness.
The conclusion of this argument appears to be that (3) is the
preferred approach. This conclusion is mainly driven by complexity
arguments about the NTLP, and the existence of NSLP-unaware nodes; if
both of these arguments could be dealt with, the conclusion might
well be the opposite way around.
7. Security Considerations
Malicious nodes can attack congestion control mechanisms to force
nodes into a congestion avoidance state. The NTLP design should
protect against this type of attack where the network is open to it.
Also, both NSIS overload protection approaches have to make some
assumptions about fairness at the NTLP level; however, this seems to
be unavoidable.
8. Conclusions
1. The NTLP needs to prevent network overload in the IP layer between
NTLP peers.
2. However, NSLPs need to detect and adapt to overload within the
NSIS protocols themselves.
3. Detection may take place by noting messages dropped by the NTLP,
as well as any flow control imposed by the NTLP.
References
1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996.
2 Brunner, M., "Requirements for QoS Signaling Protocols", draft-
ietf-nsis-req-07.txt (work in progress), March 2003.
3 Freytsis, I., R. E. Hancock, G. Karagiannis, J. Loughney, S. van
den Bosch, "Next Steps in Signaling: Framework", draft-ietf-nsis-
fw-02.txt (work in progress), March 2003.
4 Archive at: www.ietf.org/mail-archive/working-groups/nsis/
5 Braden, R. and B. Lindell, "A Two-Level Architecture for Internet
Signaling", draft-braden-2level-signal-arch-01.txt (work in
progress), November 2002.
6 Schulzrinne, H., H. Tschofenig, X. Fu, A. McDonald, "CASP - Cross-
Application Signaling Protocol", draft-schulzrinne-nsis-casp-
01.txt (work in progress), March 2003.
7 McDonald, A., R. Hancock, E. Hepworth, "Design Considerations for
an NSIS Transport Layer Protocol", draft-mcdonald-nsis-ntlp-
considerations-00.txt (work in progress), January 2003.
8 Floyd, S., "Congestion Control Principles", RFC 2914, September
2000.
9 http://www.ietf.org/ID-nits.html
10 http://www.ietf.org/html.charters/dccp-charter.html
11 Braden, R. et al., "Resource ReSerVation Protocol (RSVP) --
Version 1 Functional Specification", RFC 2205, September 1997.
12 Berger, L., Gan, D., Swallow, G., Pan, P., Tommasi, F. and S.
Molendini, "RSVP Refresh Overhead Reduction Extensions", RFC 2961,
April 2001.
Acknowledgments
The authors would like to thank all their colleagues and fellow
participants in the NSIS working group and internal protocol
discussions for exposing the complexities and subtleties in this
subject area. In particular, input was used from (in order of
CRC{name}) Henning Schulzrinne, Xiaoming Fu, John Loughney, Melinda
Shore, Hannes Tschofenig, Georgios Karagiannis, Ping Pan, Bob Braden,
Sven Van den Bosch, Lars Westberg, Marcus Brunner, and Ruediger Geib.
Henning in particular provided valuable education on flow control in
signaling protocols. Needless to say, the interpretation and
conclusions should be blamed only on the authors.
Author's Addresses
{Robert Hancock, Eleanor Hepworth, Andrew McDonald}
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire
SO51 0ZN
United Kingdom
email: {robert.hancock|eleanor.hepworth|andrew.mcdonald}@roke.co.uk
Full Copyright Statement
Copyright (C) The Internet Society (2003). All Rights Reserved. This
document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. This
document and the information contained herein is provided on an "AS
IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY
OR FITNESS FOR A PARTICULAR PURPOSE.