PCN Working Group Lars Westberg
INTERNET-DRAFT A. Bader
D. Partain
Ericsson
Expires: 15 November 2007 Georgios Karagiannis
University of Twente
May 15, 2007
LC-PCN - The Load Control PCN solution
<draft-westberg-pcn-load-control-00.txt>
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 15, 2007.
Copyright (C) The IETF Trust (2007).
Intended Status:
Westberg, et al. [Page 1]
INTERNET-DRAFT Load Control
Abstract
There is increasing interest in simple and scalable resource
provisioning solutions for Diffserv networks.
The Load Control PCN (LC-PCN) addresses the following issues:
1. Admission control for real time data flows in stateless Diffserv
Domains
2. Flow termination: Termination of flows in case of exceptional
events, such as severe congestion after re-routing.
Admission control in a Diffserv stateless domain is a combination of:
1. Probing, whereby a probe packet is
sent along the forwarding path in a network to determine
whether a flow can be admitted based upon the current
congestion state of the network
2. Admission control based on data marking, whereby in congestion
situations the data packets are marked to notify the egress node
that congestion has occurred on a particular ingress-to-egress
path.
The scheme provides the capability of controlling the traffic load in
the network without requiring signaling or any per-flow processing in
the core routers. The complexity of Load Control is kept to a minimum
to make implementation simple.
Table of Contents
1. Introduction
2. Terminology
3. LC-PCN Overview
4. LC-PCN Detailed Description
5. Security Consideration
6. IANA Considerations
7. Acknowledgments
8. Authors' Addresses
9. Normative References
10. Informative References
1. Introduction
The amount of traffic carried on the Internet is now greater than the
traffic on the world's telephony network. Still, Internet-based
communication services generate less income than plain old telephony
services. Enabling value-added services over the Internet is
therefore crucial for service providers. One significant class of
such value-added services requires real-time packet transportation.
It can be expected that these real-time services will be popular as
they replicate or are natural extensions of existing communication
services like telephony. Exact and reliable resource management
(e.g., admission control) is essential for achieving high utilization
in networks with real-time transportation capabilities.
The problem is difficult mainly due to scalability issues.
With the introduction of differentiated services (DS) [RFC2475], it
is now possible to provide large scale, real-time services. The basic
idea of DiffServ is that, rather than classifying packets at each
router, packets are only classified at the edge devices. The result
- the required packet treatment - is stored and carried in the packet
headers, and core routers can carry out appropriate scheduling.
The current definition of DiffServ, however, does not contain any
simple, scalable solution to the problem of resource provisioning and
control. A number of approaches to solving the problem already exist
[RFC3175], [Berson97], [Guerin97], [Stoica99], [Bernet99]. The scheme
presented in this document does not require any state aggregation and
aims at extreme simplicity and low cost of implementation along with
good scaling properties. Load control operates edge-to-edge in a DS
domain, or between two RSVP or NSIS capable routers, where only the
edge devices keep flow state and do per-flow processing. The main
purpose of Load Control is to provide a simple and scalable solution
to the resource provisioning problem.
The original Load Control concept, submitted in April 2000
[Westberg00], has been developed further into a signaling concept
named Resource Management in Diffserv (RMD). RMD was adopted by the
NSIS working group, where the protocol details were worked out for
using NSIS as the external protocol [RMD]. Recently, new drafts have
been submitted that aim to standardize a new Diffserv PHB providing
controlled load services in Diffserv domains [CL-PHB], [CL-ARCH]. The
CL PHB concept is very similar to the original two-bit marking scheme
of Load Control. In the CL PHB proposal, admission control is based on
the marking of the data packets, i.e., without sending probe packets.
This document aims to develop a common framework that could be used
both with RSVP and NSIS external protocols.
The remainder of this draft is structured as follows.
After the terminology in Section 2, we give an overview of the LC-PCN
in Section 3. In Section 4 we give a detailed
description of the LC-PCN. Section 5 discusses security issues.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119.
The following terms are used:
Edge node: a Diffserv node on the boundary of some
administrative domain.
Ingress node: An edge node that handles the traffic as it enters the
domain.
Egress node: An edge node that handles the traffic as it leaves the
domain.
Interior nodes: the set of Diffserv nodes which form an
administrative domain, excluding the edge nodes.
<<To be modified and/or extended>>
3. LC-PCN Overview
Load Control PCN (LC-PCN) is achieved by two actions: admission
control based on probing and flow termination. The LC-PCN can be
applied within either a single Diffserv domain, see Figure 1, or
multiple neighboring Diffserv domains, when a trust relationship
exists between these multiple Diffserv domains.
Ingress Egress
Node (Interior Nodes; I-Nodes) Node
| | |
| | |
V V V
+-------+ Data +------+ +------+ +------+ +------+
|-------|--------|------|------|------|-------|------|---->|------|
| | Flow | | | | | | | |
|Ingress| |I-Node| |I-Node| |I-Node| |Egress|
| | | | | | | | | |
+-------+ +------+ +------+ +------+ +------+
=================================================>
<=================================================
Signaling
Figure 1: Actors in the LC-PCN
3.1 Admission control based on probing
The admission control function based on probing can be used to
implement simple measurement-based admission control within a
Diffserv domain. In the interior nodes, the measurement-based
admission control function is configured with thresholds for the
traffic belonging to different PHBs. In this scenario an IP packet is
used as a probe packet: the DSCP field in its IP header is re-marked
when the predefined congestion threshold is exceeded.
Note that when the predefined congestion threshold is exceeded, a
node re-marks all packets. In this way the data packets are also
marked, notifying the egress node that congestion has occurred on a
particular ingress-to-egress path. The edges can then admit or reject
flows that are requesting resources. The rate of the re-marked data
packets is used to detect a congestion situation that can influence
the admission control decisions.
Note that probing can, to a certain degree, solve the ECMP (Equal
Cost Multi Path) problem associated with admission control, because
it makes it possible to identify which flows pass through the
congested node. The ECMP problem arises because flows that do not
pass through a congested interior node can belong to an aggregate
that detects congestion.
Any measures taken on such flows will not solve the congestion
problem, since these flows do not contribute to the congestion on
the interior node.
3.2 Flow termination
The flow termination function is able to terminate flows in case of
exceptional events, such as severe congestion after re-routing.
The exceptional event, i.e., severe congestion, can be detected using
a DSCP remarking approach in which the amount of packet remarking is
proportional to the amount of unavailable resources. In particular,
the Diffserv nodes mark packets whenever the measured link throughput
rate exceeds a pre-configured throughput threshold, and the
proportion of marked packets is proportional to the excess traffic
above the pre-configured throughput threshold.
The egress nodes can use the remarked packets to calculate the
percentage of throughput or bandwidth that exceeds the pre-configured
threshold. The egress node can then, in combination with the ingress
node and the sender of the traffic, and with the support of the
Diffserv domain(s), reduce the generated throughput by terminating
ongoing flows until the pre-configured throughput threshold is
satisfied.
3.3 Common Diffserv node configurations
The Diffserv nodes that support the LC-PCN, see Figure 1, must
provide the following functions:
(1) Meter + (2) Marking Action: the Diffserv nodes must be configured
with a meter and marking function that measures and remarks bytes
that are out of a configured traffic profile (e.g., bandwidth
threshold) for a corresponding PHB traffic class, to provide an
indication of a potential resource limitation to a Diffserv node
outside the domain. The traffic profile can be set according to an
engineered bandwidth limitation based on pre-configured thresholds or
based on a capacity limitation of specific links. Using an
algorithm that calculates the rate of bytes that are out of profile,
say rate_out_profile_bytes, a number of bytes, i.e.,
rate_out_profile_bytes/N, is remarked to a second DSCP, denoted
in this example as local_DSCP, that receives the same PHB as the
original DSCP.
The local_DSCP is defined to be used only locally
within the Diffserv domain. "N" is a pre-configured parameter used to
indicate the proportionality between the measured out of profile
bytes and the remarked bytes. If "N" is used in the algorithm, then
it must have the same value in all Diffserv nodes that use this
mechanism.
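The proportional remarking described above can be sketched as
follows. This is a minimal illustration only: the function name
bytes_to_remark, its parameter names and the per-interval byte
counter are assumptions of the sketch; only rate_out_profile_bytes
and "N" are taken from the text.

```python
def bytes_to_remark(received_bytes, interval_s, profile_rate, n):
    """Return the number of bytes to remark to local_DSCP in one interval.

    received_bytes: bytes metered for the PHB during the interval
    interval_s:     metering interval in seconds (assumed by this sketch)
    profile_rate:   configured traffic profile in bytes per second
    n:              the pre-configured proportionality parameter "N"
    """
    if interval_s <= 0 or n <= 0:
        raise ValueError("interval_s and n must be positive")
    measured_rate = received_bytes / interval_s
    # Rate of bytes that are out of the configured profile (never negative).
    rate_out_profile_bytes = max(0.0, measured_rate - profile_rate)
    # Only 1/N of the out-of-profile bytes is remarked.
    return rate_out_profile_bytes * interval_s / n
```

With a profile of 1000 bytes per second, N = 5 and 1500 bytes
received in one second, the sketch remarks 100 bytes; the egress
later multiplies the marked rate by N to recover the overload.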
(3) Packet Classification + (4) Scheduling: The Diffserv node SHOULD
be configured so that packets marked with either the original_DSCP
or the local_DSCP receive the same per-hop behavior treatment.
However, packets that are marked with the local_DSCP may be
classified to enter a different and larger virtual queue than the
packets marked with original_DSCP. This can ensure that the dropping
probability of local_DSCP remarked packets is lower than the dropping
probability of original_DSCP remarked packets. This classification
can be accomplished by using the packet classification function,
while the treatment of the packets in the virtual queues is
accomplished by using the scheduling function. Note that
the original_DSCP marked packets and their associated local_DSCP
packets get the same forwarding behavior. The main difference is
related to the fact that the local_DSCP packets get a lower dropping
probability compared to the original_DSCP packets. This is because
the marking information carried by the local_DSCP packets has a
higher significance for the operation of the resource unavailability
algorithm compared to the marking information carried by the
original_DSCP packets.
The two virtual queues, one for original_DSCP and another for
local_DSCP marked packets, can for example be implemented by using
one Drop Tail physical queue and by maintaining queuing information
and one queuing threshold for each of the virtual queues. The
physical queue uses the same scheduling algorithm for both, but the
length of each virtual queue defines the packet dropping probability
of that virtual queue.
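One possible reading of this arrangement is sketched below. The
class name VirtualQueueDropTail, its methods and the use of bytes as
the unit of the queuing thresholds are assumptions made for
illustration; they are not defined by this document.

```python
from collections import deque


class VirtualQueueDropTail:
    """One physical FIFO with a length counter and threshold per virtual queue."""

    def __init__(self, thresholds):
        # thresholds: virtual queue name -> maximum queued bytes (assumed unit)
        self.thresholds = dict(thresholds)
        self.lengths = {name: 0 for name in self.thresholds}
        self.fifo = deque()  # shared physical queue, served in arrival order

    def enqueue(self, vq_name, size):
        """Queue a packet; return False if it is tail-dropped."""
        if self.lengths[vq_name] + size > self.thresholds[vq_name]:
            return False
        self.lengths[vq_name] += size
        self.fifo.append((vq_name, size))
        return True

    def dequeue(self):
        """Serve the oldest packet regardless of its virtual queue."""
        if not self.fifo:
            return None
        vq_name, size = self.fifo.popleft()
        self.lengths[vq_name] -= size
        return vq_name, size
```

Giving the local_DSCP virtual queue a larger threshold than the
original_DSCP virtual queue yields the lower dropping probability
for local_DSCP packets, while both share one forwarding behavior.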
The classification of packets SHOULD be based on either the DSCP or
on a combination of IP header fields including the DSCP.
When the LC-PCN is applied in multiple neighboring Diffserv domains
between which a trust relationship exists, and a packet is received
by the edge router of another trusted domain (a new Diffserv domain
that might be managed by another operator), remarking of the
original_DSCP and local_DSCP to other DSCPs, say original_new_DSCP
and local_new_DSCP, might be necessary. This is because the operator
of the neighboring domain may use different Diffserv mapping schemes.
It is, however, assumed that SLAs exist between the operator(s) of
these Diffserv domains, and thus that the remarking rules followed in
each Diffserv domain are known. Note that the Diffserv nodes used in
the neighboring Diffserv domains should use the same classification,
meter and marking actions as described above.
3.4. Configuration of edge nodes
The edges must maintain aggregated states that encompass several
flows/calls. The size of the aggregates should be large enough to
ensure that new flows/calls belong to aggregates where ongoing calls
provide feedback for admission control decisions.
When the egress nodes receive the remarked packets, the rate of the
received marked bytes is measured per flow aggregate. Note that the
measured rate has to be multiplied by the parameter "N", see above,
in order to calculate the real rate of overload, say
real_rate_overload. This rate can be used to make handling decisions
for the flow termination functionality. Two types of handling
decisions could be supported.
For admission control, the egress node can maintain at least one
threshold, say Threshold1. If the calculated rate of remarked bytes
is higher than Threshold1, i.e., real_rate_overload > Threshold1,
then the Diffserv node can use this information as the basis of call
admission decisions for new flows. The detailed specification of this
algorithm is given in Section 4.1.4.
The ingress is configured such that when it receives a request
for reservation message, it generates a probe packet that is sent
within the Diffserv domain. The probe packet should use the same
flow ID and DSCP value as the ones used by the data packets
associated with the request for reservation message.
If the ingress node receives a response that notifies that the probe
was successfully processed, then the reservation request is admitted.
Otherwise it is rejected. Both situations are notified to the sender
of the flow.
When the flow termination procedure is also supported, at least
two pre-configured bandwidth thresholds are used, i.e., Threshold1
and Threshold2, with Threshold2 > Threshold1. The Diffserv node
should then operate in the following way.
When the calculated rate real_rate_overload > Threshold1, the same
procedure as described above is used (the situation in which only one
threshold is used). When the calculated rate is higher than
Threshold2, the Diffserv node can calculate the amount by which the
rate exceeds this threshold, see Section 4.x.x. Note that Threshold2
is used when a persistent congestion (or severe congestion) situation
occurs and ongoing calls have to be notified about it. Using this
excess rate, the egress supports the following options:
* identify ongoing flows that are part of the aggregate and are to be
terminated, and send flow termination notifications for these
ongoing sessions towards the ingress
* send the measured value(s) of the excess rate towards the ingress
If the ingress, due to the severe congestion situation, receives flow
termination notifications for certain flows, it will have to terminate
these flows within the Diffserv domain and send flow termination
notifications towards the sender of these flows. Until the severe
congestion situation is resolved, the ingress will also have to stop
admitting new flows that would be incorporated into the aggregated
state that is affected by the severe congestion situation.
Furthermore, the ingress uses the received measured excess rate to
resize the aggregated reservation state.
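The two-threshold handling at the egress can be sketched as follows.
This is a hedged illustration, not a normative algorithm: the names
EgressDecision and egress_decision and the result fields are
hypothetical; real_rate_overload, "N", Threshold1 and Threshold2 are
taken from the text.

```python
from typing import NamedTuple, Optional


class EgressDecision(NamedTuple):
    real_rate_overload: float
    block_new_flows: bool   # Threshold1 exceeded: basis for rejecting requests
    excess_rate: float      # rate above Threshold2, 0.0 if not exceeded


def egress_decision(marked_bytes: float, interval_s: float, n: float,
                    threshold1: float,
                    threshold2: Optional[float] = None) -> EgressDecision:
    """Evaluate one flow aggregate at the end of a measurement interval."""
    # The measured rate of marked bytes is multiplied by "N".
    real_rate_overload = n * marked_bytes / interval_s
    block_new_flows = real_rate_overload > threshold1
    excess_rate = 0.0
    if threshold2 is not None and real_rate_overload > threshold2:
        # Severe congestion: the excess is used to select flows for
        # termination or is reported to the ingress.
        excess_rate = real_rate_overload - threshold2
    return EgressDecision(real_rate_overload, block_new_flows, excess_rate)
```

How the excess rate is mapped onto individual flows to be terminated
is a policy choice that this sketch deliberately leaves open.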
4. LC-PCN detailed description
This section describes the details of the LC-PCN algorithms.
Sections 4.1 and 4.2 describe the "Admission control based on probing"
and "Flow termination" scenarios, respectively, for the situation in
which the end-to-end sessions use unidirectional reservations.
Sections 4.3 and 4.4 describe the two algorithms for the situation in
which the end-to-end sessions use bi-directional reservations.
4.1 Admission control based on probing for unidirectional flows
The admission control function based on probing can be used to
implement a simple measurement-based admission control within a
Diffserv domain. At interior nodes along the data path congestion
notification thresholds are set in the measurement based admission
control function for the traffic belonging to different PHBs.
4.1.1 Operation in Ingress nodes
After a trigger event, e.g., the ingress node receives a reservation
request message, the ingress node sends a probe packet, see Figure 2,
towards the egress node. Note that the probe packet should use the
same flow ID information and DSCP value as the data packets
associated with the received reservation request message.
If the ingress node receives a response that notifies that the probe
was successfully processed, then the reservation request is admitted.
Otherwise it is rejected. Both situations are notified to the sender
of the flow.
4.1.2 Operation in Interior nodes
Using standard functionalities, congestion notification thresholds
are set for the traffic belonging to different PHBs, see Section 3.
The DSCP field of all data packets and of the probe packet will be
re-marked when the corresponding "congestion notification detection"
threshold is exceeded, see A.
Note that when the data rate is higher than the congestion
notification threshold, the data packets are also remarked.
An example of the detailed operation of this procedure is described
below.
The predefined congestion notification threshold, see Section 4.2.2,
is set according to, and is usually lower than, an engineered
bandwidth limitation, i.e., the admission threshold, based on, e.g.,
an agreed Service Level Agreement or a capacity limitation of
specific links.
The difference between the congestion notification threshold and the
engineered bandwidth limitation, i.e., the admission threshold,
provides an interval in which a node already sends the signaling
information on resource limitation although the actual resource
limitation has not yet been reached. This accounts for the fact that
data packets associated with an admitted session may not yet have
arrived, and it allows the admission control process at the egress to
interpret the signaling information and reject new calls before
congestion is reached. Note that when the data rate is higher than
the preconfigured congestion notification rate, data packets are also
re-marked. To distinguish between congestion notification and severe
congestion, the following method is used:
The "encoded DSCP" marking is used for both congestion notification
and severe congestion. When this method is used and the interior node
is in the "congestion notification" state, see Section 4.2.2, the
node should remark the unmarked bytes using the "encoded DSCP".
Note that if a node starts dropping packets belonging to a PHB that
supports both the "severe congestion" and "congestion notification"
states, see Section 4.2.2, then it is considered that the packet rate
associated with this PHB is higher than the severe congestion
detection threshold and that the operation state of this node has
moved to the severe congestion state.
4.1.3 Operation in Egress nodes
When the egress receives the probe packet, which is used as a
request for reservation, it will have to perform the following
functionality.
When the operation state of the ingress/egress pair aggregate is
"congestion notification", see Section 4.2.3, the implementation of
the algorithm depends on how the congestion notification situation is
notified to the egress. As mentioned in Section 4.1.2, this is
accomplished by using the received data packets that are marked using
the "encoded DSCP". In this case, during a measurement interval T,
the egress measures the input_notified_bytes by counting the
"encoded DSCP" bytes instead of the "notified DSCP" bytes. The
incoming_congestion_rate can then be calculated as follows:
incoming_congestion_rate = N*input_notified_bytes/T
If the incoming_congestion_rate is higher than a preconfigured
congestion notification threshold, then the communication path
between ingress and egress is considered to be congested. If the
probe packet arrives at the egress in this situation, the request
should be rejected. Note that this happens only when the probe packet
is "encoded DSCP" marked. In this way it is ensured that the probe
packet passed through the congested node. This feature is very
useful when ECMP based routing is used, since it detects only flows
that pass through the congested router.
If such an ingress/egress pair aggregated state is not available when
the probe packet arrives at the egress, then this request
is accepted if the DSCP of the probe packet is unmarked. Otherwise
("encoded DSCP" marked), it is rejected.
In either case, the egress will have to notify the ingress
whether the request for reservation is admitted or rejected.
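The egress decision described in this section can be sketched as
follows. The function name admit_request and its argument layout are
hypothetical. The sketch also assumes that a marked probe is accepted
when the aggregate state exists but its congestion rate is not above
the threshold; the text does not state this case explicitly.

```python
def admit_request(probe_is_marked, aggregate_stats=None,
                  congestion_threshold=0.0):
    """Return True if the reservation request carried by a probe is admitted.

    probe_is_marked:      True if the probe arrived "encoded DSCP" marked
    aggregate_stats:      None if no ingress/egress aggregate state exists,
                          else (input_notified_bytes, T, N) for the interval
    congestion_threshold: preconfigured congestion notification threshold
    """
    if aggregate_stats is None:
        # No aggregate state: decide on the probe marking alone.
        return not probe_is_marked
    input_notified_bytes, interval_s, n = aggregate_stats
    incoming_congestion_rate = n * input_notified_bytes / interval_s
    path_congested = incoming_congestion_rate > congestion_threshold
    # Reject only if the path is congested AND the probe itself passed
    # through the congested node, which protects flows on other ECMP paths.
    return not (path_congested and probe_is_marked)
```

The result corresponds to the "response" message in Figure 2 that
the egress returns to the ingress.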
Ingress Interior Interior Egress
user | | | |
data | user data | | |
------>|----------------->| user data | |
| |---------------->| user data |
| | |----------------->|
user | | | |
data | user data | | |
------>|----------------->| user data | user data |
| |---------------->S(# marked bytes) |
| | S----------------->|
| | S(# unmarked bytes)|
| | S----------------->|
| | S |
request for reservation | S |
------->| probe packet S |
|----------------------------------->S |
| | S probe packet |
| | S----------------->|
| |response |
|<------------------------------------------------------|
response | | |
<------| | | |
Figure 2: Admission control based on probing
4.2 Flow termination for unidirectional flows
This flow termination handling method requires the following
functionalities.
4.2.1 Operation in the Ingress nodes
Upon receiving the notification message sent by the egress, the
Ingress resolves the severe congestion by a predefined policy, e.g.,
by refusing new incoming flows (sessions), terminating the affected
and notified flows (sessions), and blocking their packets or shifting
them to an alternative LC-PCN traffic class (PHB). This operation is
depicted in Figure 3, where the Ingress, for each flow (session)
to be terminated, receives a notification message.
When the Ingress receives the notification message, it starts the
termination of the flows within the LC-PCN domain by sending release
messages.
Ingress Interior Interior Egress
user | | | |
data | user data | | |
------>|----------------->| user data | user data |
| |---------------->S(# marked bytes) |
| | S----------------->|
| | S(# unmarked bytes)|
| | S----------------->|Term.
| notification for termination |flow?
|<-----------------|-----------------S------------------|YES
release | S |
| -----------------|----------------------------------->|
| | | |
Figure 3: LC-PCN flow termination handling
When the Ingress node receives a notification message that contains
the aggregate bandwidth to be released, it can use this value to
resize the aggregate accordingly.
4.2.2 Operation in the Interior nodes
The Interior node detecting severe congestion remarks data packets
passing the node. For this remarking, two additional DSCPs can be
allocated for each traffic class. One DSCP MAY be used to indicate
that the packet passed through a severely congested node. This type
of DSCP is denoted in this document as "affected DSCP".
The use of this DSCP
type eliminates the possibility that, due to, e.g., ECMP (Equal Cost
Multiple Paths) enabled routing, the egress node either does not
detect packets that passed a severely congested node or erroneously
detects packets that actually did not pass the severely congested
node. Note that this type of DSCP MUST only be used if all the nodes
within the LC-PCN domain are configured to use it. Otherwise, this
type of DSCP MUST NOT be applied. The other DSCP MUST be used to
indicate the degree of congestion by marking the bytes proportionally
to the degree of congestion. This type of DSCP is denoted in this
document as "encoded DSCP".
Note that in this document the terms marked packets or marked bytes
refer to the "encoded DSCP". The terms unmarked packets or unmarked
bytes refer to the packets, or the bytes belonging to the packets,
whose DSCP is either the "affected DSCP" or the original DSCP.
Furthermore, in the algorithm described below it is considered
that the router may drop received packets. The counting/measuring of
marked or unmarked bytes described in this section is accomplished
within measurement periods. All nodes within an LC-PCN domain use the
same, fixed measurement interval, say T seconds, which MUST be
pre-configured.
It is RECOMMENDED that the total number of additional (local and
experimental) DSCPs needed
for flow termination handling within an LC-PCN domain should be as
low as possible and it should not exceed the limit of 8.
An example of a remarking procedure is given below.
Per supported PHB, the interior node can support the operation states
depicted in Figure 4, when the per-flow congestion notification
based on probing signaling scheme is used in combination with this
flow termination type.
---------------------------------------------
| event B |
| V
---------- ------------- ----------
| Normal | event A | Congestion | event B | Severe |
| state |---------->| notification|-------->|congestion|
| | | state | | state |
---------- ------------- ----------
^ ^ | |
| | event C | |
| ----------------------- |
| event D |
------------------------------------------------
Figure 4: States of operation, flow termination combined with
congestion notification based on probing
The terms used in Figure 4 are:
Normal state: represents the normal operation conditions of the
node, i.e. no congestion
Severe congestion state: represents the state in which the interior
node is severely congested with respect to a certain PHB
Congestion notification: state where the load is relatively high,
close to the level when congestion can occur
event A: this event occurs when the incoming PHB rate is higher than
the "congestion notification detection" threshold. This threshold is
used by the admission control based on probing scheme, see
Sections 4.1 and 4.3.
event B: this event occurs when the incoming PHB rate is higher than
the "severe congestion detection" threshold.
event C: this event occurs when the incoming PHB rate is lower than
the "congestion notification detection" threshold.
event D: this event occurs when the incoming PHB rate is lower than
the "severe_congestion_restoration" threshold.
event E: this event occurs when the incoming PHB rate is lower than
the "severe congestion restoration" threshold.
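The transitions of Figure 4 can be sketched as a small state
function. This is an illustration under assumptions: the function
name next_state, the state constants and the parameter names are
hypothetical, and event D is modeled as leading back to the normal
state, as drawn in Figure 4.

```python
NORMAL = "normal"
NOTIFICATION = "congestion notification"
SEVERE = "severe congestion"


def next_state(state, phb_rate, cn_detection, severe_detection,
               severe_restoration):
    """Return the next operation state for one PHB of an interior node."""
    if state == SEVERE:
        # event D: rate fell below the "severe congestion restoration"
        # threshold
        return NORMAL if phb_rate < severe_restoration else SEVERE
    if phb_rate > severe_detection:
        # event B: reachable from the normal and the notification state
        return SEVERE
    if state == NORMAL:
        # event A: rate rose above "congestion notification detection"
        return NOTIFICATION if phb_rate > cn_detection else NORMAL
    if state == NOTIFICATION:
        # event C: rate fell below "congestion notification detection"
        return NORMAL if phb_rate < cn_detection else NOTIFICATION
    raise ValueError(f"unknown state: {state!r}")
```

Calling the function once per measurement with the incoming PHB rate
keeps a node in its current state while no event applies.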
Note that the "severe congestion detection", "severe congestion
restoration" and admission thresholds should be higher than the
"congestion notification detection" threshold, i.e.,:
"severe congestion detection" > "congestion notification detection"
and "severe congestion restoration" > "congestion notification
detection"
Furthermore, the "severe congestion detection" threshold should be
higher than or equal to the admission threshold that is used by the
reservation based and NSIS measurement based signaling schemes.
"severe congestion detection" >= admission threshold
Moreover, the "severe congestion restoration" threshold should be
lower than or equal to the "severe congestion detection" threshold
that is used by the reservation based and NSIS measurement based
signaling schemes, i.e.,:
"severe congestion restoration" <= "severe congestion detection"
During severe congestion the interior node calculates, per traffic
class (PHB), the incoming rate that is above the "severe congestion
restoration" threshold, denoted as signaled_overload_rate, in the
following way:
* A severely congested interior node should take into account that
packets might be dropped. Therefore, before queuing and possibly
dropping packets, the interior node should count the total number of
unmarked and remarked bytes received by the severely congested node;
denote this number as total_received_bytes. Note that there are
situations in which more than one interior node on the same path
becomes severely congested. Therefore, any interior node located
behind a severely congested node may receive marked bytes.
* Before queuing and possibly dropping the packets, at the end of
each measurement interval of T seconds, calculate the current
estimated overload rate, say measured_overload_rate, by using the
following equation:
measured_overload_rate =
((total_received_bytes)/T) - severe_congestion_restoration
Note that since marking is done in interior nodes, decisions are
made at egress nodes, and termination of flows is performed by
ingress nodes, there is a significant delay until the overload
information is learned by the ingress nodes (see Section 6 of
[CsTa05]). The delay consists of the trip time of data packets from
the severely congested interior node to the egress, the measurement
interval, i.e., T, and the trip time of the notification signaling
messages from egress to ingress. Moreover, before the overload
decreases at the severely congested interior node, an additional trip
time from the ingress node to the severely congested interior node
must elapse. This is because immediately before receiving the
congestion notification, the ingress may have sent out packets in the
flows that were selected for termination. That is, a terminated flow
may continue to contribute to congestion for as long as the trip time
from the ingress to the interior node. If the above were not taken
into account, interior nodes would continue marking packets until the
measured utilization falls below the severe congestion restoration
threshold. As a result, more flows would be terminated than
necessary, i.e., an over-reaction would take place. [CsTa05] provides
a solution to this problem, in which the interior nodes use a sliding
window memory to keep track of the overload signaled in a number of
previous measurement intervals. At the end of a measurement interval
T, before encoding and signaling the overload rate as "encoded DSCP"
packets, the actual overload is decreased by the sum of the already
signaled overload stored in the sliding window memory, since that
overload is already being handled in the severe congestion handling
control loop. The sliding window memory consists of an integer number
of cells, n, the maximum number of cells. Guidelines for
configuring the sliding window parameters are given in [CsTa05].
At the end of each measurement interval, the newest calculated
overload is pushed into the memory, and the oldest cell is dropped.
If Mi is the overload rate stored in the ith memory cell (i = 1..n),
then at the end of every measurement interval, the overload rate that
is signaled to the egress node, i.e., signaled_overload_rate, is
calculated as follows:
   Sum_Mi = 0
   For i = 1 to n
   {
     Sum_Mi = Sum_Mi + Mi
   }
   signaled_overload_rate = measured_overload_rate - Sum_Mi
Next, the sliding memory is updated as follows:
   for i = 1..(n-1): M(i) <- M(i+1)
   M(n) <- signaled_overload_rate
The bytes that have to be remarked to satisfy the signaled overload
rate, signaled_remarked_bytes, are calculated as follows:
   signaled_remarked_bytes = signaled_overload_rate*T/N
The signaled_remarked_bytes value is also the number of bytes in
the outgoing packets (after the dropping stage) that must be
remarked, during each measurement interval T, by a node that operates
in severe congestion mode.
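As an illustration only, the per-interval calculations above can be
sketched in Python. The names OverloadSignaler and end_of_interval
are hypothetical and not part of LC-PCN. The sketch assumes that the
sliding window memory starts with all cells set to zero and that a
negative signaled_overload_rate is clamped to zero; neither
assumption is specified above.

```python
from collections import deque


class OverloadSignaler:
    """Per-interval overload bookkeeping for one interior node (sketch)."""

    def __init__(self, restoration_rate, interval_s, window_cells, scale_n):
        self.restoration_rate = restoration_rate  # severe_congestion_restoration, bytes/s
        self.interval_s = interval_s              # measurement interval T, seconds
        self.scale_n = scale_n                    # scaling parameter N
        # Cells M1..Mn, oldest first. Assumption: no overload signaled yet.
        self.memory = deque([0.0] * window_cells, maxlen=window_cells)

    def end_of_interval(self, total_received_bytes):
        """Return (signaled_overload_rate, signaled_remarked_bytes)."""
        measured = total_received_bytes / self.interval_s - self.restoration_rate
        signaled = measured - sum(self.memory)
        # Assumption: overload that is already being handled is never
        # signaled as a negative rate.
        signaled = max(0.0, signaled)
        # Appending to a full deque drops the oldest cell (M1).
        self.memory.append(signaled)
        remarked_bytes = signaled * self.interval_s / self.scale_n
        return signaled, remarked_bytes
```

With a window of two cells, an overload that persists unchanged is
signaled once and then suppressed for the next two intervals, which
is the intended damping of the over-reaction described above.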
Note that in order to process an overload situation higher than 100%
of the maintained severe congestion threshold, all the nodes within
the domain must be configured with, and maintain, a scaling
parameter, e.g., N used in the above equation. In combination with
the marked bytes, e.g., signaled_remarked_bytes, this parameter
allows such a high overload situation to be calculated and
represented.
Note that when incoming remarked bytes are dropped, the operation of
the flow termination algorithm may be affected, e.g., the algorithm
may in certain situations become slower. An implementation of the
algorithm may ensure, as far as possible, that the incoming marked
bytes are not dropped. This could, for example, be accomplished by
using different dropping rate thresholds for marked and unmarked
bytes.
All the outgoing packets that are not marked
(i.e., by using the "encoded DSCP") have to be remarked using the
"affected DSCP" code.
4.2.3 Operation in the Egress nodes
The QNE Egress node applies a predefined
policy to solve the severe congestion situation, by selecting a
number of inter-domain (end-to-end) flows that should be terminated,
or forwarded in a lower priority queue.
Some flows belonging to the same PHB traffic class might get a
different priority than other flows belonging to the same PHB traffic
class. It is considered that this difference in priority can be
notified by a signaling protocol and that the edges can store and
maintain the priority information related to each of the end-to-end
flows. The terminated flows are selected from the flows that have the
same PHB traffic class as the PHB of the marked (as "encoded DSCP")
and "affected DSCP" (when applied in the complete LC-PCN domain)
packets and that belong to the same ingress/egress pair aggregate.
For flows associated with the same PHB traffic class the priority of
the flow plays a significant role. An example of calculating the
number of flows associated with each priority class that have to be
terminated is described below.
The states of operation in Egress nodes are similar to the ones
described in Section 4.2.2. The definition of the events, see below,
is however different from the definition of the events given in
Figure 4.
* event A: the egress node measures the rate of the incoming
"encoded DSCP" marked packets and compares it with a
predefined congestion notification detection threshold and with a
severe congestion detection threshold in the egress. Note that the
detection thresholds used in the egress for congestion notification
and flow termination may be different from the ones used in interior
nodes. When the measured rate of "encoded DSCP" bytes is higher than
the congestion notification threshold but lower than the severe
congestion threshold, then event_A is activated.
* event B: this event occurs when the egress receives packets marked
as either "encoded DSCP" or "affected DSCP". However, when the
"encoded DSCP" marking is also used for congestion notification
detection purposes, see description of event_A, then event_B is only
activated if either "affected DSCP" packets are received or if the
rate of the incoming "encoded DSCP" marked packets is higher than the
preconfigured severe congestion detection egress threshold.
* event C: this event occurs when the rate of incoming "encoded
DSCP" packets decreases below the congestion notification threshold.
* event D: this event occurs when the egress does not receive packets
marked as either "encoded DSCP" or "affected DSCP". When the
"encoded DSCP" marking is also used for congestion notification
detection, see description of event_A, event_B, event_C, then
event_D is only activated if either "affected DSCP" packets are no
longer received or if the rate of the incoming "encoded DSCP" marked
packets is lower than the preconfigured severe congestion
restoration threshold in the egress.
* event E: this event occurs when the egress does not receive packets
marked as either "encoded DSCP" or "affected DSCP".
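The threshold comparison behind event_A and event_B, for the case in
which the "encoded DSCP" marking is also used for congestion
notification, might be sketched as follows. This is a hedged
illustration: the function name and its parameters are hypothetical,
only events A and B are covered, and treating a rate exactly equal to
the severe congestion threshold as event_A is an assumption, since
the text above does not define that boundary case.

```python
def classify_egress_event(encoded_rate, affected_seen,
                          notification_threshold, severe_threshold):
    """Return "A", "B" or None for one measurement interval (sketch).

    encoded_rate: measured rate of incoming "encoded DSCP" bytes
    affected_seen: True if any "affected DSCP" packet was received
    """
    # event_B: "affected DSCP" packets received, or the "encoded DSCP"
    # rate is above the severe congestion detection threshold.
    if affected_seen or encoded_rate > severe_threshold:
        return "B"
    # event_A: rate above the congestion notification threshold but
    # not above the severe congestion threshold (equality is an
    # assumption of this sketch).
    if encoded_rate > notification_threshold:
        return "A"
    return None
```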
An example of the algorithm for calculation of the
number of flows associated with each priority class that have to be
terminated is explained by the pseudocode below.
First, when the egress operates in the severe congestion state then
the total amount of remarked bandwidth, per ingress/egress pair
reservation aggregate, associated with the PHB
traffic class, say total_congested_bandwidth, is calculated.
This bandwidth represents the severe congested
bandwidth, per ingress/egress pair, that should be terminated.
Note that the below algorithm is performed for each
ingress/egress pair reservation aggregate.
The total_congested_bandwidth can be calculated as follows:
total_congested_bandwidth = N*input_remarked_bytes/T
Here, input_remarked_bytes represents the number of marked bytes
that arrive at the egress during one measurement interval T, and N is
defined as in Section 4.2.1. The term denoted as
terminated_bandwidth is a temporary variable representing the total
bandwidth that has to be terminated, belonging to the same
PHB traffic class. The terminate_flow_bandwidth(priority_class) is
the total bandwidth associated with flows of priority class equal
to priority_class. The parameter priority_class is an integer
fulfilling
   0 =< priority_class =< Maximum_priority.
The calculate_terminate_flows(priority_class) function determines the
flows for a given priority class and per PHB that have to be
terminated. This function also calculates the term
sum_bandwidth_terminate(priority_class), which is the sum of the
bandwidth associated with the flows that will be terminated.
The constraint on finding the total number of flows that have to
be terminated is that sum_bandwidth_terminate(priority_class) should
be smaller than or approximately equal to the variable
terminate_bandwidth(priority_class).
   terminated_bandwidth = 0;
   priority_class = 0;
   while terminated_bandwidth < total_congested_bandwidth
   {
     terminate_bandwidth(priority_class) =
        total_congested_bandwidth - terminated_bandwidth;
     calculate_terminate_flows(priority_class);
     terminated_bandwidth =
        sum_bandwidth_terminate(priority_class) + terminated_bandwidth;
     priority_class = priority_class + 1;
   }
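A runnable sketch of the loop above is given here for illustration.
It is one possible reading, not a definitive implementation: the
function name and the (flow_id, priority_class, bandwidth) tuple
layout are hypothetical, flows within a class are picked greedily in
the order listed, and a flow is skipped if it would exceed the
remaining bandwidth to terminate. Unlike the pseudocode, the sketch
also stops once all priority classes have been visited.

```python
def select_flows_to_terminate(flows, input_remarked_bytes, interval_s, scale_n):
    """Pick flows per priority class for one ingress/egress aggregate.

    flows: list of (flow_id, priority_class, bandwidth) tuples
    Returns (selected_flow_ids, terminated_bandwidth).
    """
    total_congested_bandwidth = scale_n * input_remarked_bytes / interval_s
    terminated_bandwidth = 0.0
    selected = []
    # Classes are visited in increasing order, as in the pseudocode.
    for priority_class in sorted({prio for _, prio, _ in flows}):
        if terminated_bandwidth >= total_congested_bandwidth:
            break
        # terminate_bandwidth(priority_class) in the pseudocode.
        budget = total_congested_bandwidth - terminated_bandwidth
        # sum_bandwidth_terminate(priority_class) in the pseudocode.
        class_sum = 0.0
        for flow_id, prio, bandwidth in flows:
            if prio == priority_class and class_sum + bandwidth <= budget:
                selected.append(flow_id)
                class_sum += bandwidth
        terminated_bandwidth += class_sum
    return selected, terminated_bandwidth
```

For example, with N = 2, T = 1 second and 250 remarked bytes, the
bandwidth to terminate is 500 bytes per second, and flows are taken
from class 0 before class 1 until that amount is reached.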
For the end-to-end flows (sessions) that have to be terminated, the
Egress node generates and sends a notification message to the ingress
node to indicate the flow termination in the communication path.
Furthermore, for the aggregated sessions that are affected, the
Egress sends a notify message that contains the to-be-released
bandwidth associated with the aggregated reservation state.
Note that the egress should restore the original DSCP
values of the remarked packets; otherwise, multiple actions for the
same event might occur. However, this value MAY be left in its
remarked form if there is an SLA between domains under which a
downstream domain handles the remarking problem.
4.3 Admission control based on probing for bi-directional flows
This section describes the admission control scheme that uses the
admission control function based on probing when bi-directional
reservations are supported.
Ingress Interior Interior Interior Egress
user| | | | |
data| | | | |
--->| | user data | |user data |
|-------------------------------------------->S (#marked bytes)
| | | S-------------->|
| | | S(#unmarked bytes)
| | | S-------------->|
| | | S |
| | probe(re-marked DSCP) |
| | | S |
|-------------------------------------------->S |
| | | S-------------->|
| | | S |
| | response(unsuccessful) |
|<------------------------------------------------------------|
| | | S |
Figure 5: Admission control based on probing
for bi-directional admission control (congestion on path
from Ingress towards Egress)
This procedure is similar to the admission control procedure
described in Section 4.1. The main
difference is related to the location of the severely congested node,
i.e., the "forward" path (the path from Ingress towards
Egress) or the "reverse" path (the path from Egress towards
Ingress).
Figure 5 shows the scenario where the severely congested node is
located in the "forward" path. The functionality of providing
admission control is the same as the one described in Section
4.1, Figure 2.
Figure 6 shows the scenario where the congested node is located in
the "reverse" path. The probe packet sent in the "forward"
direction will not be affected by the severely congested node, while
the DSCP value in the IP header of any packet of the "reverse"
direction flow, and also of the probe packet that is
sent in the "reverse" direction, will be
remarked by the congested node. The ingress is in this way
notified that congestion occurred in the network, and it is
therefore able to reject the new initiation of the reservation.
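Seen from the ingress, the two scenarios reduce to a simple decision,
sketched below under stated assumptions. The function and parameter
names are hypothetical; the sketch assumes that the ingress learns of
forward path congestion through an unsuccessful response (Figure 5)
and of reverse path congestion through a remarked probe arriving from
the egress (Figure 6).

```python
def admit_bidirectional(forward_response_successful, reverse_probe_remarked):
    """Admit a bi-directional flow only if neither path signals congestion."""
    # Forward path congestion: the egress reports an unsuccessful response.
    # Reverse path congestion: the probe arriving at the ingress is remarked.
    return forward_response_successful and not reverse_probe_remarked
```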
Ingress Interior Interior Interior Egress
user| | | | |
data| | | | |
--->| | user data | | |
|-------------------------------------------->|user data |user
| | | |-------------->|data
| | | | |--->
| | | | |user
| | | | |data
| | | | |<---
| S | user data | |
| S user data |<--------------------------|
| user data S<---------------| | |
|<---------------S | | |
| user data S | | |
| (#marked bytes)S | | |
|<---------------S | | |
| S probe(unmarked DSCP) |
| S | | |
|----------------S------------------------------------------->|
| S probe(re-marked DSCP) |
| S<-------------------------------------------|
|<---------------S | | |
Figure 6: Admission control based on probing for
bi-directional admission control (congestion on path
Egress towards Ingress)
4.4 Flow termination handling for bi-directional flows
This section describes the flow termination handling operation for
bi-directional flows. This flow termination handling operation is
similar to the one described in Section 4.2.
Ingress Interior Interior Interior Egress
user| | | | |
data| user | | | |
--->| data | user data | |user data |
|--------------->| | S |
| |--------------------------->S (#marked bytes)
| | | S-------------->|
| | | S(#unmarked bytes)
| | | S-------------->|Term
| | | S |flow?
| | notification (terminate) |YES
|<------------------------------------------------------------|
|release (forward) | S |
|------------------------------------------------------------>|
| release (reverse) | S |
|<------------------------------------------------------------|
| | | S |
Figure 7: Flow termination handling for
bi-directional reservation (congestion on path Ingress
towards Egress)
This procedure is similar to the flow termination handling procedure
described in Section 4.2. The main difference is related to the
location of the severely congested node, i.e., the "forward" or the
"reverse" path. Note that when severe congestion occurs on, e.g., the
forward path, and flows are terminated to solve the severe
congestion in the forward path, then the reserved bandwidth
associated with the terminated bidirectional flows will also be
released. Therefore, a careful selection of the flows that have to be
terminated should take place.
A possible method of selecting the flows belonging to the same
priority type passing through the severe congestion point on a
unidirectional path can be the following:
* the egress node should select, if possible, unidirectional
flows first, instead of bidirectional flows
* the egress node should select, if possible, bidirectional flows
that reserved a relatively small amount of resources on the path
reverse to the path of congestion.
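The two selection preferences above amount to an ordering of the
candidate flows, which might be sketched as follows. The function
name and the dictionary keys are hypothetical, and ordering
bidirectional flows strictly by ascending reservation on the opposite
path is only one way to read "relatively small".

```python
def order_termination_candidates(flows):
    """Order candidate flows of one priority type for termination.

    flows: list of dicts with keys "id", "bidirectional" (bool) and
    "opposite_path_bandwidth" (reservation on the path reverse to the
    path of congestion; ignored for unidirectional flows).
    """
    def preference(flow):
        if not flow["bidirectional"]:
            # Unidirectional flows come first (False sorts before True).
            return (False, 0.0)
        # Then bidirectional flows, smallest opposite-path reservation first.
        return (True, flow["opposite_path_bandwidth"])

    return sorted(flows, key=preference)
```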
Ingress Interior Interior Interior Egress
user| | | | |
data| user | | | |
--->| data | user data | |user data |
|--------------->| | | |
| |--------------------------->|user data |user
| | | |-------------->|data
| | | | |--->
| | | user | |<---
| user data | | data |<--------------|
| (#marked bytes)| S<----------| |
|<--------------------------------S | |
| (#unmarked bytes) S | |
Term|<--------------------------------S | |
Flow? | S | |
YES | | S | |
|release (forward) S | |
|------------------------------------------------------------>|
| release (reverse) S | |
|<------------------------------------------------------------|
| | S | |
Figure 8: Flow termination handling for
bi-directional reservation (congestion on path Egress
towards Ingress)
Furthermore, a special case of this operation is associated with
severe congestion occurring simultaneously on the forward
and reverse paths. An example of this operation is given below.
Consider that the egress node selects a number of bi-directional
flows to be terminated, see Figure 9. In this case the egress will
send a notification message to the ingress for each bi-directional
flow. If the Ingress receives these notification messages and its
operational state (associated with the reverse path) is the severe
congestion state (see Figure 4), then the ingress operates in the
following way:
Ingress Interior Interior Interior Egress
user| | | | |
data| user | | | |
--->| data | #unmarked bytes| | |
|--------------->S #marked bytes | | |
| S--------------------------->| |
| | | |-------------->|data
| | | | |--->
| | | | Term.?
| NOTIFY | | |Yes
|<------------------------------------------------------------|
| | | | |data
| | | user | |<---
| user data | | data |<--------------|
| (#marked bytes)| S<----------| |
|<--------------------------------S | |
| (#unmarked bytes) S | |
Term|<--------------------------------S | |
Flow? | S | |
YES | | S | |
|release (forward) S | |
|------------------------------------------------------------>|
| release (reverse) S | |
|<------------------------------------------------------------|
Figure 9: Flow termination handling for
bi-directional reservation (congestion on both forward and
reverse direction)
* For each notification message, the Ingress should identify the
bidirectional flows that have to be terminated.
* The ingress then calculates the total bandwidth that should be
released in the reverse direction (thus not in the forward direction)
if the bidirectional flows are terminated (preempted), say
"notify_reverse_bandwidth". This bandwidth can be calculated as the
sum of the bandwidth values associated with all the end-to-end
flows that received a (flow termination) notification message.
* Furthermore, using the received marked packets (from the reverse
path) the ingress will calculate, using the algorithm used by an
egress and described in Section 4.2.3, the total bandwidth that has
to be terminated in order to solve the congestion in the reverse path
direction, say "marked_reverse_bandwidth".
* The ingress then calculates the bandwidth of the additional flows
that have to be terminated, say "additional_reverse_bandwidth", in
order to solve the severe congestion in the reverse direction, by
taking into account:
** the bandwidth in the reverse direction of the bidirectional flows
that were appointed by the egress (the ones that received a
notification message) to be preempted, i.e.,
"notify_reverse_bandwidth"
** the total amount of bandwidth in the reverse direction that has
been calculated by using the received marked packets, i.e.,
"marked_reverse_bandwidth".
This additional bandwidth can be calculated using the following
algorithm:
   IF ("marked_reverse_bandwidth" > "notify_reverse_bandwidth") THEN
     "additional_reverse_bandwidth" =
        "marked_reverse_bandwidth" - "notify_reverse_bandwidth";
   ELSE
     "additional_reverse_bandwidth" = 0
* The ingress terminates the flows that experienced severe congestion
in the "forward" path and received a (flow termination) notification
message.
* If possible, the ingress should terminate unidirectional flows that
are using the same egress-ingress reverse direction communication
path to satisfy the release of a total bandwidth up to:
"additional_reverse_bandwidth".
* If the required number of uni-directional flows (to satisfy the
above) is not available, then a number of bi-directional flows
that are using the same egress-ingress reverse direction
communication path may be selected for preemption in order to satisfy
the release of a total bandwidth up to:
"additional_reverse_bandwidth". Note that, using the guidelines given
above, the bidirectional flows that reserved a
relatively small amount of resources on the path reverse to the path
of congestion should be selected for termination first.
* Furthermore, the egress includes the to-be-released
aggregated bandwidth value in one of the notification messages.
* The Ingress receives this notification message and reads the value
of the carried to-be-released aggregated bandwidth.
Note that this value is denoted as
"aggregated_notify_reverse_bandwidth". The variables
"marked_reverse_bandwidth" and "additional_reverse_bandwidth" are
calculated using the same steps as explained for the situation in
which the QNE edges maintain per flow intra-domain QoS-NSLP states.
The size of the aggregated reservation state can be reduced in the
"forward" and "reverse" directions by using the received values by
which the aggregated bandwidth is to be reduced in the "forward" and
"reverse" directions.
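The ingress-side steps for simultaneous congestion on both paths can
be sketched as follows, for illustration only. The function names and
the candidate tuple layout are hypothetical. The sketch assumes a
greedy pick that never releases more than
"additional_reverse_bandwidth", and it takes the forward path as the
path reverse to the path of congestion when ordering bidirectional
flows.

```python
def additional_reverse_bandwidth(marked_reverse_bandwidth, notify_reverse_bandwidth):
    """Bandwidth still to be terminated in the reverse direction."""
    return max(0.0, marked_reverse_bandwidth - notify_reverse_bandwidth)


def pick_additional_flows(candidates, additional_bandwidth):
    """Pick extra flows on the reverse path (sketch).

    candidates: list of (flow_id, is_bidirectional, reverse_bandwidth,
    forward_bandwidth) tuples for flows not yet notified by the egress.
    Returns (picked_flow_ids, released_reverse_bandwidth).
    """
    # Unidirectional flows first, then bidirectional flows with the
    # smallest reservation on the forward path.
    ordered = sorted(candidates, key=lambda c: (c[1], c[3] if c[1] else 0.0))
    picked, released = [], 0.0
    for flow_id, _is_bidirectional, reverse_bw, _forward_bw in ordered:
        # Assumption: release at most additional_bandwidth in total.
        if released + reverse_bw <= additional_bandwidth:
            picked.append(flow_id)
            released += reverse_bw
    return picked, released
```

For example, with "marked_reverse_bandwidth" of 900 and
"notify_reverse_bandwidth" of 600, a further 300 units have to be
released, and unidirectional flows on the reverse path are considered
before any bidirectional flow.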
Figure 7 shows the scenario where the severely congested node is
located in the "forward" path. This scenario is very similar to the
flow termination handling scenario described in Section 4.2. The
difference is related to the release procedure, which is accomplished
in both directions, "forward" and "reverse".
Figure 8 shows the scenario where the severely congested node is
located in the "reverse" path. The main difference between this
scenario and the scenario shown in Figure 7 is that no
notification messages have to be generated by the Egress
node. This is because the (#marked and #unmarked) user data
arrives at the Ingress. The Ingress node is therefore able to
calculate the number of flows that have to be terminated or forwarded
in a lower priority queue.
5. Security Considerations
We propose the use of the DS field to provide admission control and
flow termination support within a Diffserv domain. This poses
security problems similar to those of using the DS field to
differentiate packets, as specified in [RFC2475].
<<to be extended>>
6. IANA Considerations
<< to be done>>
7. Acknowledgments
<<to be done>>
8. Authors' Addresses
Attila Bader
Ericsson Research
Ericsson Hungary Ltd.
Laborc 1, Budapest, Hungary, H-1037
EMail: Attila.Bader@ericsson.com
Lars Westberg
Ericsson Research
Torshamnsgatan 23
SE-164 80 Stockholm, Sweden
EMail: Lars.Westberg@ericsson.com
Georgios Karagiannis
University of Twente
P.O. BOX 217
7500 AE Enschede, The Netherlands
EMail: g.karagiannis@ewi.utwente.nl
David Partain
Ericsson Radio Systems AB
P.O. Box 1248
SE-581 12 Linkoping
Sweden
EMail: David.Partain@ericsson.com
9. Normative References
10. Informative References
[Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
Speer, M., and Braden, R., "Interoperation of RSVP/Intserv and
Diffserv Networks", Work in Progress, March 1999.
[Berson97] Berson, S. and Vincent, R., "Aggregation of Internet
Integrated Services State", Work in Progress, December 1997.
[CL-ARCH] Briscoe, B., et al., "An edge-to-edge Deployment model for
pre-congestion notification: Admission control over a Diffserv
region", IETF Work in Progress, October 2006.
[CL-PHB] Briscoe, B., et al., "Pre-congestion notification marking",
IETF Work in Progress, October 2006.
[CsTa05] Csaszar, A., Takacs, A., Szabo, R., and Henk, T., "Resilient
Reduced-State Resource Reservation", Journal of Communications and
Networks, Vol. 7, Nr. 4, December 2005.
[Guerin97] Guerin, R., Blake, S., and Herzog, S., "Aggregating RSVP
based QoS Requests", Work in Progress, November 1997.
[RFC3175] Baker, F., Iturralde, C., Le Faucheur, F., and Davie, B.,
"Aggregation of RSVP for IPv4 and IPv6 Reservations",
RFC 3175, 2001.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and Weiss, W., "An Architecture for Differentiated Services", RFC
2475, December 1998.
[RMD] Bader, A., et al., "RMD-QOSM: The resource management in
Diffserv QoS Model", IETF Work in Progress, March 2007.
[Stoica99] Stoica, I., et al., "Per Hop Behaviors Based on Dynamic
Packet States", Work in Progress, February 1999.
[Westberg00] Westberg, L., et al., "Load Control of Real-Time
Traffic", IETF Work in Progress, April 2000.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.