One document matched: draft-charny-ef-definition-00.txt
Internet Draft Anna Charny, ed.
Cisco Systems
Fred Baker
Cisco Systems
Jon Bennett
Riverdelta Networks
Kent Benson
Tellabs
Jean-Yves Le Boudec
EPFL
Angela Chiu
AT&T Labs
William Courtney
TRW
Bruce Davie
Cisco Systems
Shahram Davari
PMC-Sierra
Victor Firou
Nortel Networks
Charles Kalmanek
AT&T Research
K.K. Ramakrishnan
AT&T Research
Dimitrios Stiliadis
Lucent Technologies
Expires January, 2001
draft-charny-ef-definition-00.txt July 2000
EF PHB Redefined
Status of this Memo
This document is an Internet Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
Charny, ed. et al. 1
draft-charny-ef-definition.txt July, 2000
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
"working draft" or "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
This document is a product of the Diffserv working group of the
Internet Engineering Task Force. Please address comments to the
group's mailing list at diffserv@ietf.org, with a copy to the
authors.
Copyright (C) The Internet Society (1999). All Rights
Reserved.
Abstract
This document proposes text aiming at providing clarification to RFC
2598. The primary motivation for this draft is that the definition
of EF PHB given in RFC 2598 does not match the intuition of the EF
PHB, and has been determined to be unimplementable in a manner
compliant with the definition. In particular, no work-conserving
scheduler, including all of the schedulers listed in the
Implementation Examples section of RFC 2598, can comply with the
definition given in that RFC.
This draft gives a rigorous definition of EF PHB which in our
opinion preserves the spirit of the EF PHB as intended by RFC 2598
while allowing a number of reasonable compliant implementations. The
document also provides implementation guidance and lists several
possible implementation examples which conform to the definition
given in this draft.
1 Introduction
Charny, ed. et al. 2
draft-charny-ef-definition.txt July, 2000
The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to
be used to build a low-loss, low-latency, low-jitter, assured
bandwidth service. The potential benefits of this service, and
therefore the EF PHB, are enormous. Because of the great value of
this PHB, it is critical that the forwarding behavior required of
and delivered by an EF-compliant node be specific, quantifiable, and
unambiguous.
Many members of the data networking community have a strong
intuitive feel for the packet forwarding behavior required by EF.
This intuitive forwarding behavior is very useful and should form
the basis of the formal EF conformance definition. Unfortunately
however, the actual definition contained in section 2 of RFC 2598
does not fully reflect this intuitive understanding. Many of the
forwarding behaviors which are intuitively reasonable do not
actually comply with the formal definition of RFC 2598. Furthermore,
many of the schedulers believed to deliver EF-compliant behavior
cannot be used to implement the formal definition of EF since they
result in forwarding treatment which does not comply with the
definition of RFC 2598. In fact, schedulers listed in sections 2.2
and A.3 of RFC 2598 as implementation examples realizing EF do not
actually deliver the forwarding behavior defined earlier in the same
document. Moreover, it can be shown that no work-conserving
scheduler can satisfy the definition of RFC 2598.
There are several potentially serious problems with having a formal
EF definition that does not match people's intuitive understanding.
First, the understanding of what it means for a node to be EF-
compliant may vary among people. This discrepancy may arise due to
the fact that two people's intuitive understanding of the definition
may actually differ somewhat; also, someone learning about EF from
the formal definition may develop an understanding of EF at odds
with the understanding that most people currently familiar with EF
have. These discrepancies in people's understanding of EF may have
serious consequences. The resulting confusion may increase the time
and cost needed to develop equipment, cause interoperability
problems, and create mismatches between expected node and network
performance and actual performance. Second, the lack of a clear
conformance definition makes it impossible to test a piece of
equipment and declare it "conforming" or "non-conforming." Third,
the lack of a mathematically precise description of a node's
behavior makes it impossible to analytically design or evaluate
services constructed using the EF PHB or other PHBs that must
contend for resources with EF traffic. Fourth, an incorrect formal
definition of EF may lead to erroneous reasoning about the
properties of networks implementing EF.
The goal of this document is to bring the formal compliance
definition in line with what we believe is the intuitive
understanding of EF PHB. It is NOT the goal of this document to
redefine the intuitive behavior expected of EF PHB, nor it is the
Charny, ed. et al. 3
draft-charny-ef-definition.txt July, 2000
goal to change the expectations of the properties of schedulers
widely believed to deliver adequate EF PHB.
Rather, our sole goal is to introduce minimal changes to the
existing definition so that forwarding behaviors that unambiguously
satisfy the well-established intuition become formally compliant
with the definition, and schedulers widely believed suitable for
implementing EF PHB actually formally comply with the modified
definition. Hence, our goal is not to change the concept of EF PHB,
but rather to remove ambiguities and add mathematical precision to
the definition of the familiar EF PHB.
With this goal in mind, this document proposes a new definition of
EF PHB. This definition provides several benefits. First, it closely
matches what we believe is the intuitive definition of EF. Second,
the proposed definition permits EF to be implemented using many
well-known schedulers. Third, the proposed definition allows the
behavior of EF-compliant implementations to be unambiguously
expressed in mathematical terms. This means that a quantitative
guarantee can be given about a node's behavior, nodes can be tested
for compliance, and local, per-domain, and end-to-end behaviors of a
variety of compliant practical implementations can be studied
analytically.
In the remainder of section 1, we consider the definition and
intuitive content of EF PHB as given in RFC 2598. We identify
problems with that definition and show that without some change to
the current definition it is impossible to implement EF PHB with any
work-conserving scheduler. We then discuss possible simple changes
that can be made to the definition which correct some, but not all,
of the identified problems. We then argue that a different solution
is necessary to correct the remaining problems.
In section 2, we provide a new definition for EF PHB called the
"packet scale rate guarantee." We then discuss this definition,
describe some of its properties, and compare it to the better-known
rate-latency curve.
In sections 3 and 4, we demonstrate that our definition is
satisfiable by several practical schedulers such as Priority Queuing
(PQ) and Worst-Case Fair Weighted Fair Queuing (WF2Q). We also
discuss other implementations and give values for the key parameter
of the new definition for a large variety of schedulers.
Proofs of critical statements about the properties and
satisfiability of the new definition are given in the Appendices.
1.1 The RFC 2598 Definition of EF PHB and Its Intuitive Meaning
The definition of the EF PHB as given in [RFC 2598] states:
Charny, ed. et al. 4
draft-charny-ef-definition.txt July, 2000
It [the EF PHB departure rate] SHOULD average at least the
configured rate when measured over any time interval equal
to or longer than the time it takes to send an output link
MTU sized packet at the configured rate.
The intuitive content of this definition is fairly clear. On all
time scales ranging down to very small time scales, the EF aggregate
should be given at least its configured share of the output link
bandwidth. Among other things, this allows EF to support
applications that are delay- and jitter-sensitive.
However, intuition alone will not allow vendors to design compliant
schedulers capable of advertising their EF configuration to other
routers. As we show in the next section, the simplicity of the
definition is misleading in the sense that it does not actually
capture the intuition correctly under a number of circumstances.
A note is due here on the precise interpretation of the wording of
the definition. A potential cause of ambiguity is the fact that the
definition contains the word SHOULD which according to [Bra97] means
that in principle an implementation of EF PHB may under some
circumstances choose not to be strictly compliant with the specified
requirement, in which case any issues with the strict definition may
be viewed as irrelevant. However, it seems that in order for the
SHOULD to be meaningful, there should exist at least some
implementations which are strictly compliant, even if non-compliant
implementations may be chosen under some circumstances. Furthermore,
the Virtual Wire behavior aggregate [JNP2000] is defined by
replacing SHOULD by MUST in the definition of EF PHB in RFC 2598.
Therefore, in all cases the exact mathematical properties of the EF
definition and the existence of strictly compliant implementations
are of substantial interest. The remainder of this section
concentrates on the discussion of these issues in detail.
1.2 Difficulties with the RFC 2598 EF PHB Definition
A literal interpretation of the definition would consider the
behaviors given in the next two subsections as non-compliant. The
definition also unnecessarily constrains the maximum configurable
rate of an EF aggregate.
1.2.1 Perfectly-Clocked Forwarding
Consider the following stream forwarded from a router with EF-
configured rate R=C/2, where C is the output line rate. In the
illustration, E is an MTU-sized EF packet while x is a non-EF packet
or unused capacity, also of size MTU.
... E x E x E x E x E x E x...
|-----|
Charny, ed. et al. 5
draft-charny-ef-definition.txt July, 2000
The interval between the vertical bars is 3*MTU/C, which is greater
than MTU/(C/2), and so is subject to the EF PHB definition. During
this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
but only MTU bits are forwarded. Therefore, while this forwarding
pattern should be considered compliant under any reasonable
interpretation of the EF PHB, it actually does not formally comply
with the definition of RFC 2598.
Note that this forwarding pattern can occur in any work-conserving
scheduler in an ideal output-buffered architecture where EF packets
arrive in a perfectly clocked manner according to the above pattern
and are forwarded according to exactly the same pattern in the
absence of any non-EF traffic.
Trivial as this example may be, it reveals the lack of mathematical
precision in the formal definition. The fact that no work-conserving
scheduler can formally comply with the definition is unfortunate,
and appears to warrant some changes to the definition that would
correct this problem.
The underlying reason for the problem described here is quite simple
- one can only expect that the EF aggregate is served at configured
rate in some interval where there is enough backlog of EF packets to
sustain that rate. In the example above the packets come in exactly
at the rate at which they are served, and so there is no persistent
backlog. Certainly, if the input rate is even smaller than the
configured rate of the EF aggregate, there will be no backlog as
well, and a similar formal difficulty will occur.
A seemingly simple solution to this difficulty might be to require
that the EF aggregate is served at its configured rate only when the
queue is backlogged. However, as we show in the remainder of this
section, this solution does not suffice.
1.2.2 Router Internal Delay
We now argue that the example considered in the previous section is
not as trivial as it may seem at first glance.
Consider a router with EF configured rate R = C/2 as in the previous
example, but with an internal delay of 3T (where T = MTU/C) between
the time that a packet arrives at the router and the time that it is
first eligible for forwarding at the output link. Such things as
header processing, route look-up, and delay in switching through a
multi-layer fabric could cause this delay. Now suppose that EF
traffic arrives regularly at a rate of (2/3)R = C/3. The router will
perform as shown below.
EF Packet Number 1 2 3 4 5 6 ...
Arrival (at router) 0 3T 6T 9T 12T 15T ...
Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...
Departure 4T 7T 10T 13T 16T 19T ...
Charny, ed. et al. 6
draft-charny-ef-definition.txt July, 2000
Again, the output does not satisfy the RFC 2598 definition of EF
PHB. As in the previous example, the underlying reason for this
problem is that the scheduler cannot forward EF traffic faster than
it arrives. However, it can be easily seen that the existence of
internal delay causes one packet to be inside the router at all
times. An external observer will rightfully conclude that the number
of EF packets that arrived to the router is always at least one
greater than the number of EF packets that left the router, and
therefore the EF aggregate is constantly backlogged. However, while
the EF aggregate is continuously backlogged, the observed output
rate is nevertheless strictly less that the configured rate.
This example indicates that the simple addition of the condition
that EF aggregate must receive its configured rate only when the EF
aggregate is backlogged does not suffice in this case.
Yet, the problem described here is of fundamental importance in
practice. Most routers have a certain amount of internal delay. A
vendor declaring EF compliance is not expected to simultaneously
declare the details of the internals of the router. Therefore, the
existence of internal delay may cause a perfectly reasonable EF
implementation to display seemingly non-conformant behavior, which
is clearly undesirable.
1.2.3 Maximum Configurable Rate and Provisioning Efficiency
It is well understood that with any non-preemptive scheduler, the
compliant configurable rate for an EF aggregate cannot exceed C/2
[JNP2000]. This is because an MTU-sized EF packet may arrive to an
empty queue at time t just as an MTU-sized non-EF packet begins
service. The maximum number of EF bits that could be forwarded
during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
R > C/2, then this interval would be of length greater than MTU/R,
and more than MTU EF bits would have to be served during this
interval for the router to be compliant. Thus, R must be no greater
than C/2.
It can be shown that for schedulers other than PQ, such as various
implementations of WFQ, the maximum compliant configured rate may be
much smaller than 50%. For example, for SCFQ [Gol94] the maximum
configured rate cannot exceed C/N, where N is the number of queues
in the scheduler. For WRR, mentioned as compliant in section 2.2 of
RFC 2598, this limitation is even more severe. This is because in
these schedulers a packet arriving to an empty EF queue may be
forced to wait until one packet from each other queue (in the case
of SCFQ) or until several packets from each other queue (in the case
of WRR) are served before it will finally be forwarded.
While it is frequently assumed that the configured rate of EF
traffic will be substantially smaller than the link bandwidth, the
requirement that this rate should never exceed 50% of the link
Charny, ed. et al. 7
draft-charny-ef-definition.txt July, 2000
bandwidth appears unnecessarily limiting. For example, in a fully
connected mesh network, where any flow traverses a single link on
its way from source to its destination there seems no compelling
reason to limit the amount of EF traffic to 50% (or an even smaller
percentage for some schedulers) of the link bandwidth.
Another, perhaps even more striking example is the fact that even a
TDM circuit with dedicated slots cannot be configured to forward EF
packets at more than 50% of the link speed without violating RFC
2598 (unless the entire link is configured for EF). If the
configured rate of EF traffic is greater than 50% (but less than the
link speed), there will always exist an interval longer than MTU/R
in which less than the configured rate is achieved. For example,
suppose the configured rate of the EF aggregate is 2C/3. Then the
forwarding pattern of the TDM circuit might be
E E x E E x E E x ...
|---|
where only one packet is served in the marked interval of length 2T
= 2MTU/C. But at least 4/3 MTU would have to be served during this
interval by a router in compliance with the definition in RFC 2598.
The fact that even a TDM line cannot be booked over 50% by EF
traffic indicates that the restriction is artificial and
unnecessary.
1.3 The Non-trivial Nature of the Difficulties
One possibility to correct the problems discussed in the previous
sections might be to attempt to clarify the definition of the
intervals to which the definition applied or by averaging over
multiple intervals. However, an attempt to do so meets with
considerable analytical and implementation difficulties. For
example, attempting to align interval start times with some epochs
of the forwarded stream appears to require a certain degree of
global clock synchronization and is fraught with the risk of
misinterpretation and mistake in practice.
Another approach might be to allow averaging of the rates over some
larger time scale. However, it is unclear exactly what finite time
scale would suffice in all reasonable cases. Furthermore, this
approach would compromise the notion of very short-term time scale
guarantees that are the essence of EF PHB.
We also explored a combination of two simple fixes. The first is the
addition of the condition that the only intervals subject to the
definition are those that fall inside a period during which the EF
aggregate is continuously backlogged in the router (i.e., when an EF
packet is in the router). The second is the addition of an error
(latency) term that could serve as a figure-of-merit in the
advertising of EF services.
Charny, ed. et al. 8
draft-charny-ef-definition.txt July, 2000
With the addition of these two changes the candidate definition
becomes as follows:
In any interval of time (t1, t2) in which EF traffic is
continuously backlogged, at least R(t2 - t1 - E) bits of EF
traffic must be served, where R is the configured rate for the
EF aggregate and E is an implementation-specific latency term.
The "continuously backlogged" condition eliminates the insufficient-
packets-to-forward difficulty while the addition of the latency term
of size MTU/C resolves the perfectly-clocked forwarding example
(section 1.2.1), and also removes the limitation on EF configured
rate.
However, neither fix (nor the two of them together) resolves the
example of section 1.2.2. To see this, recall that in the example of
section 1.2.2 the EF aggregate is continuously backlogged, but the
service rate of the EF aggregate is consistently smaller than the
configured rate, and therefore no finite latency term will suffice
to bring the example into conformance. This appears to be a serious
problem.
Therefore, we believe that such modification, albeit attractive in
its simplicity, falls short of addressing all the problems
identified with the definition of the RFC 2598.
The next section proposes a new definition that aims at addressing
all the issues discussed so far.
2 Definition of EF PHB
2.1 The Formal Definition
The intent of EF PHB is to provide the configured service rate to
the EF aggregate at as small a timescale as possible. In order to
express this notion rigorously, we introduce the definition of
"packet time scale rate guarantee".
We first need to introduce some notation.
Let P_inp(j) and P_out(j) denote the j-th packet of the EF aggregate
arriving to and departing from a network node respectively. If
several packets of an EF aggregate arrive simultaneously from
different inputs, ties are broken arbitrarily. In the case when all
EF traffic shares a single FIFO queue, P_inp(j) and P_out(j) refer
to the same packet, but in general the j-th arrival and the
j-th departure may correspond to different packets.
Let a(j) denote the time of arrival of the last bit of P_inp(j)
to a network node. Let d(j) denote the time of departure of the
last bit of P_out(j) from the network node. Let L(j) denote the
length of P_out(j).
Charny, ed. et al. 9
draft-charny-ef-definition.txt July, 2000
We require that the indexing is chosen in such a way that the packet
P_inp(1) arriving at time a(1) sees no other packet of the EF
aggregate in the node upon arrival, and d(1) >= a(1).
Definition DEF_1
----------------
We say that a node offers to the EF aggregate a "packet time scale
rate guarantee R with latency E" if the j-th departure time
satisfies the following condition for all j>=0:
d(j) <= F(j) + E (eq_1)
where F(j) is defined iteratively by
F(0)=0, d(0) = 0
F(j)=max(a(j), min(d(j-1), FF(j-1)))+ L(j)/R for all j>0 (eq_2)
Note that the choice of indexes does not restrict when in the actual
packet stream we start the observation of the arrival and departure
process. The only restriction that is being imposed is that the
observation starts when there are no EF packets in the node. Note
also that while index j=1 corresponds to the first packet in the
observation, index j=0 does not correspond to any packet at all and
is used solely for initial conditions on d(j) and F(j) in the
recursion.
We now define EF PHB as a forwarding treatment for a particular
diffserv aggregate where the node offers to the aggregate a
packet time scale rate guarantee R with latency E, where R is a
configurable rate and E is a tolerance which depends on the
particular node characteristics.
2.2 The Meaning and Basic Properties of DEF_1.
To understand the meaning of DEF_1 we compare it with a well-known
rate-latency curve [LEB98], and argue that DEF_1 is stronger than
the rate-latency curve [LEB98] in the sense that if a scheduler
satisfies DEF_1 it also satisfies the rate-latency curve. As a
result, all the properties known for the rate-latency curve also
apply to DEF_1 above. We also argue why DEF_1 is more suitable to
reflect the intent of EF PHB than the rate-latency curve.
It is shown in [LEB98] that the rate-latency curve is equivalent to
the following definition:
Definition DEF_2:
d(j) <= F'(j) + E (eq_3)
Charny, ed. et al. 10
draft-charny-ef-definition.txt July, 2000
where
F'(0)=0,
F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_4)
It can be easily verified that DEF_1 is stronger than DEF_2 by
noticing that for all j, F'(j) >= F(j).
It is easy to see that F'(j) in the definition DEF_2 corresponds to
the time the j-th departure should have occurred should the EF
aggregate be constantly served exactly at its configured rate R.
Following the common convention, we refer to F'(j) as the "fluid
finish time" of the j-th packet to depart.
The intuitive meaning of the rate-latency curve of DEF_2 is that any
packet is served at most time E later than this packet would finish
service in the fluid model.
For a rate-latency curve DEF_2 (and hence for the stronger DEF_1) it
holds that in any interval (0,t) the EF aggregate gets close to the
desired service rate R (as long as there is enough traffic to
sustain this rate). The discrepancy between the ideal and the actual
service in this interval depends on the latency term E, which in
turn depends on the scheduling implementation. The smaller E, the
smaller the difference between the configured rate and the actual
rate achieved by the
scheduler.
While DEF_2 guarantees the desired rate to the EF aggregate in all
intervals (0,t) to within a specified error, it may nevertheless
result in large gaps in service. For example, suppose that (a large
number) N of identical EF packets of length L arrived from different
interfaces to the EF queue in the absence of any non-EF traffic.
Then any work-conserving scheduler will serve all N packets at link
speed. When the last packet is sent at time NL/C, where C is the
capacity of output link, F(N) will be equal to NL/R. Suppose now
that at time NL/C a large number of non-EF packets arrive, followed
by a single EF packet. Then the scheduler can legitimately delay
starting to send the EF packet until time F(N+1)=(N+1)L/R + E -
L/C. This means that the EF aggregate will have no service at all
in the interval (NL/C, (N+1)L/R + E - L/C). This interval can be
quite large if R is substantially smaller than C. In essence, the
EF aggregate can be "punished" by a gap in service for receiving
faster service than its configured rate at the beginning.
Definition DEF_1 alleviates this problem by introducing the term
min(d(j-1), F(j-1)) in the recursion. Essentially, this means
that the fluid finishing time is "reset" if that packet is sent too
early. As a consequence of that, for the case where the EF aggregate
is served in a FIFO order, suppose a packet arrives at time t to a
server satisfying DEF_1. The packet will be transmitted no later
Charny, ed. et al. 11
draft-charny-ef-definition.txt July, 2000
than time t + Q(t)/R + E, where Q(t) is the EF queue size at time t
(including the packet under discussion). This statement is proved in
Appendix A1.
Note finally that just as discussed in section 1.3, the latency term
E in EF_1 can still be viewed as a "figure of merit" and can be
used to compare different implementations of EF PHB.
2.3 Reordering within the EF aggregate.
The EF PHB SHOULD NOT cause packets in a microflow to be reordered.
Neither DEF_1 nor the RFC 2598 definition directly addresses how
reordering is to be avoided should packets from the same microflow
arrive at a node on different inputs. Nonetheless, because DEF_1
explicitly refers to the times of arrival and departure of EF
packets, the issue of re-ordering in microflows deserves a brief
discussion.
Section 4.5, below, discusses the effects that router internal delay
can have on the order in which EF packets are forwarded. The upshot
there is that differences in internal delay between one input and
output and another input and the same output can cause a later-
arriving packet to be delivered to the output scheduler ahead of an
earlier-arriving packet. A work-conserving scheduler may have no
alternative but to forward the two packets in reverse order of their
arrival. This shuffling of packets causes no harm, provided the
packets are not part of the same microflow. That is, this shuffling
(and any other shuffling caused by the scheduler itself, if the
scheduler is not a FIFO) will not reorder packets (in the EF PHB
sense) unless packets from the same microflow arrive on different
inputs of the node. This would imply that somewhere upstream,
packets of the same EF microflow were distributed across multiple
outputs. Such behavior is inherently destructive of microflow packet
order. It is recommended that all EF packets with the same
destination address arriving at a node be forwarded on the same
output.
2.4 Per Packet Delay.
It is important to note that just as the definition of EF PHB in
RFC 2598, DEF_1 does not in itself guarantee per-packet delay for
EF aggregate. While the definition implies that the aggregate
service is within a certain error from the desired service at the
configured rate, the definition says nothing about per-packet
delay. In particular, for non-FIFO service order for packets
within the EF aggregate it is possible in principle that a scheduler
satisfying the EF definition (both new and old) delays a given
packet an infinite amount of time. However, once the queuing
and scheduling mechanisms are specified, it should be possible to
Charny, ed. et al. 12
draft-charny-ef-definition.txt July, 2000
demonstrate per-packet delay bound for a reasonable scheduling
implementation (see section 4 for implementation examples).
For the FIFO service order within the EF aggregate the DEF_1 does
imply per-packet per-hop, per-domain, and end-to-end delay
guarantees. For example, an end-to-end delay bound for an EF
aggregate sharing an aggregate FIFO queue in a network of rate-
latency servers is derived in [CLeB2000]. This result implies that
the same per packet delay bound holds for a stronger DEF_1.
3. Satisfiability of the definition
It is important to make sure that there exist at least some
schedulers that satisfy definition DEF_1 with reasonable latency
terms.
It can be shown that the strict priority scheduler in which all EF
packets share a single FIFO queue which is served at strict non-
preemptive priority over other queues satisfies DEF_1 with the
latency term E = MTU/C where MTU is the maximum packet size and C
is the speed of output link.
The proof that PQ satisfies DEF_1 is given in appendix B.
Another scheduler that satisfies DEF_1 with a small latency term is
WF2Q described in [BZ96a]. A class-based WF2Q scheduler, in which
all EF traffic shares a single queue with the weight corresponding
to the configured rate of the EF aggregate can be shown to satisfy
DEF_1 with the latency term E = MTU/C+MTU/R.
The proof that WF2Q satisfies DEF_1 with E=MTU/C+MTU/R is given in
Appendix C.
4. Implementation considerations
4.1 General queuing and scheduling considerations.
Definition DEF_1 does not mandate a particular underlying queuing
structure. While it can be implemented using aggregate queuing,
where all packets of the EF aggregate share a single queue, it also
allows finer queuing granularity, where EF packets may be assigned
to a number of different queues.
Likewise, DEF_1 allows in principle a wide range of scheduling
algorithms, ranging from a strict priority scheduling of aggregate
EF queue, to hierarchical scheduling with per-flow queuing as
described in section 4.4 below.
Both the queuing structure and the scheduling algorithm have a
significant impact on the delay and jitter which can be provided to
the packets of the EF aggregate. It is typically more difficult to
Charny, ed. et al. 13
draft-charny-ef-definition.txt July, 2000
provide strict deterministic end-to-end delay and/or jitter
guarantees if aggregate queuing is implemented [CLeB2000]. However,
implementing and scheduling a large number of queues at high speeds
presents a significant engineering challenge, while aggregate
scheduling is very attractive due to its simplicity and scalability.
4.2 Aggregate Queuing and Scheduling Accuracy for FIFO Service of
the EF Aggregate
It can be shown that if all packets in the EF aggregate share a
single FIFO queue served by a scheduler satisfying the rate-latency
service curve, then end-to-end delay and jitter guarantees depend on
the latency term E of the scheduler [CLeB2000]. The smaller the
latency term, the better the delay and jitter bounds that can be
provided. In that respect, a strict priority queuing implementation
which has a very small latency term is a natural candidate for
implementing EF PHB. Various implementations of Weighted Fair
Queuing-like schedulers are also possible candidates for such
implementation, but the delay and jitter characteristics of these
schedulers differ substantially depending on the accuracy of the
implementation.
A widely used way of evaluating the accuracy of rate-based
scheduling implementations is to compare the output of the scheduler
with the so-called "fluid model" [Par92]. In this framework, a
given scheduler S and the reference fluid scheduler are subject to
the same arrival patterns. The accuracy of the scheduler S can be
determined by how close the time of the i-th departure in the
scheduler S is to the corresponding departure time in the fluid
scheduler. More precisely, if d(i) is the time of the i-th
departure under some scheduler S, and G(i) is the time of the i-th
departure in the reference fluid scheduler, then the accuracy of S
may be determined by two latency terms E1 and E2 such that for all
i
G(i)-E1 <= d(i)<= G(i) + E2 (eq_5)
While the term E2 determines the maximum per-hop delay bound, E1 has
an effect on the jitter at the output of the scheduler. For
example, as shown in [BZ96a], for WF2Q, E1 = MTU/R, E2= MTU/C, and
for PGPS [Par92] E2 = MTU/C as well, while E1 is linear in the
number of queues in the scheduler. It is demonstrated in [BZ96a]
that while WF2Q and PGPS have the same delay bounds, PGPS may result
in substantially burstier departure patterns.
In general, it can be shown that if a scheduler satisfies DEF_2,
then it also satisfies definition DEF_1 with the latency term E <=
E1 + E2. The proof of this statement is given in Appendix C. Note
that E1+E2 is not necessarily a tight latency bound, and for a given
scheduler a tighter bound may be obtained. That is, the fact that a
given scheduler has a large E1+E2 does not necessarily mean that is
has a large E.
Charny, ed. et al. 14
draft-charny-ef-definition.txt July, 2000
4.3 Additional examples of efficient WFQ-Like Scheduling
Implementations and their Latency Terms.
In this section we briefly discuss some schedulers that can be used
to implement EF PHB with different degrees of accuracy and with
different implementation complexity.
4.3.1 Weighted Fair Queuing (WFQ/PGPS)
For WFQ/PGPS ([DKS90],[Par92]), E2 = MTU/C just as for the case of
WF2Q. However, it can be shown that E1 can grow linearly with
the number of queues in the scheduler (which here and below is
denoted by N). The worst case complexity of WFQ is also O(N).
4.3.2.Deficit Round Robin (DRR)
For DRR [SV95], both E1 and E2 can be shown to grow linearly with
N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
the largest rate among the rate assignments of all queues in the
scheduler. The implementation complexity of DRR is O(1).
4.3.3. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
(SCFQ)
For SFQ [GVC96] and SCFQ [Gol94] both E1 and E2 can be shown to grow
linearly with N. Implementation complexity of both of these
schedulers is O(log N).
4.3.4 WF2Q+
For WF2Q+ [BZ96b], E1 = MTU/R, while E2 can grow linearly with N.
The implementation complexity of WF2Q+ is O(log N).
4.4. Hierarchical scheduling implementations
A possible implementation of EF PHB may be based on a hierarchical
scheduling framework, such as described in [FJ95]. In this
framework, different subsets of EF packets may be assigned to
different queues. The semantics of exactly how packets are
classified into different EF queues is highly implementation-
dependent. For convenience, the subset of EF packets sharing a
single queue will be referred to as "EF flows". The EF queues are
grouped in a "logical queue", which is scheduled as a single entity
along with other non-EF queues or groups of queues by a "top-level"
scheduler. It is this top-level scheduler that must satisfy DEF_1.
Once the EF aggregate (i.e the EF "logical queue") is scheduled by
this top-level scheduler, an "EF flow-level" scheduler is invoked.
As an example, a hierarchical scheduler with WF2Q at each level of
the hierarchy (as described in [BZ96b]) can be used for such a
Charny, ed. et al. 15
draft-charny-ef-definition.txt July, 2000
purpose. Alternatively, the EF "logical" queue can be served at
strict priority over all non-EF queues, while the EF queue at the
"EF flow" level can be served by some other scheduler, such as WF2Q.
In principle, hierarchical scheduling structure allows a substantial
flexibility in the choice of scheduling mechanisms at each level of
the hierarchy. Per-packet delay guarantees in such a hierarchical
scheduling framework strongly depend on the accuracy of schedulers
employed at each level of the hierarchy. In general, the more
accurate the scheduling implementation at each level, the better the
per-packet guarantee that can be provided. It can be shown that for
the scheduling hierarchy, the E1 and E2 latency terms of the
hierarchical scheduler with respect to a particular "leaf queue"
can be obtained by summing the E1 and E2 terms of the
schedulers employed at the nodes of the scheduling tree along
the ascending branch of the tree from the root to the leaf.
4.5. Effect of internal switching mechanisms
A packet passing through a router will experience delay for a number
of reasons. Two familiar components of this delay are the time the
packet sits in a buffer at an outgoing link waiting for the
scheduler to select it and the time it takes to actually transmit
the packet on the outgoing line.
There may be other components of a packet's delay through a router,
however. A router might have to do some amount of header processing
before the packet can be given to the correct output scheduler, for
example. In another case a router may have a FIFO buffer (called a
transmission queue in [FC2000]) where the packet sits after being
selected by the output scheduler but before it is transmitted. In
cases such as these, the extra delay a packet may experience can be
accounted for by absorbing it into the latency term, E, in DEF_1.
Implementing EF on a router with a multi-stage switch fabric
requires special attention. A packet may experience additional
delays due to the fact that it must compete with other traffic for
forwarding resources at multiple contention points in the core. The
delay an EF packet may experience before it even reaches the output-
link scheduler should be included in the latency term. Input-
buffered and input/output-buffered routers may also require
modification of their latency terms.
Delay in the switch core comes from two sources, both of which must
be considered. The first part of this delay is the fixed delay a
packet experiences regardless of the other traffic. This component
of the delay includes the time it takes for things such as packet
segmentation and reassembly in cell based cores, enqueueing and
dequeueing at each stage, and transmission between stages. The
second part of the switch core delay is variable and depends on the
type and amount of
other traffic traversing the core. This delay comes about if
Charny, ed. et al. 16
draft-charny-ef-definition.txt July, 2000
the stages in the core mix traffic flowing between different
input/output port pairs. Thus, EF packets must compete against
other traffic for forwarding resources in the core. Some of this
competing traffic may even be EF traffic from other aggregates.
This introduces extra delay, that can also be absorbed by the
latency term in the definition.
5. Security Considerations
This draft makes the PHB definition in [RFC2598] more rigorous, but
adds no new functions to it. As a result, it adds no security issues
to those described in that specification.
6. Appendix A.
Statement A1.
If a scheduler conforms to DEF_1, and the EF packets are served in
the FIFO order, then all EF packets that are in the system at time t
will leave the system no later than at time t+Q/R+E, where Q is the
total EF queue at time t.
Proof of A1.
Consider the time t. Let a(1)...a(n) denote the arrival times of all
EF packets that are in the system at time t. For all of these
packets a(i)<=t. Let d(0)<=t denote the previous departure time
before time t. Let d(1),...d(n) denote the first n departures
after time t, and let L(1),...L(n) denote the packet lengths
corresponding to these departures and let F(1),...F(n)
denote the corresponding "finish times" in DEF_1. We now prove
by induction that
F(i)<=t+(L(1)+...L(i))/R for all 1<=i<=n.
Base case. Since a(1)<t, it is easy to see that
F(1) = max(a(1), min(d(0), F(0)))+L(1)/R <=max(a(1),d(0))+L(1)/R <=
t+L(1)/R
Inductive step. Suppose F(i)<=t+(L(1)+...L(i))/R for all i<=j<n.
Then, recalling that for all 1<=j<=n, a(j)<t, we obtain
F(j+1)=max(a(j+1), in(d(j), F(j)))+ L(j+1)/R <=
max(a(j+1),F(j))+L(j+1)/R <= t+(L(1)+...L(j+1)/R
Therefore, the first n departures must occur no later than
F(n)+ E = t+(L(1)+....L(n))/R + E. QED.
Charny, ed. et al. 17
draft-charny-ef-definition.txt July, 2000
7. Appendix B
Statement B1.
PQ satisfies DEF_1 with E=MTU/C.
Proof of B1.
Consider any busy period of the EF queue. Let k=1 correspond to
the first packet in that busy period and assume that a(1)>=0.
We prove by induction that for all k>=1 in this busy period
d(k) <= F(k)+MTU/C
(eq_b1)
This would immediately imply Statement B1.
Base case.
For k=1,
F(1) = max(a(1), min(d(0), F(0)) + L(1)/R >= a(1) + L(1)/R >=
a(1)+L(1)/C (eq_b2)
and
d(1) <= a(1) + MTU/C + L(1)/C <= F(1) + MTU/C
where the first inequality follows from the fact that the first
packet in a PQ may wait at most for one largest packet
transmission before its own transmission begins, and the second
inequality follows from (eq_b2).
Inductive step.
Note that since EF has the highest priority, for k>1 in the busy
period of the EF queue
d(k)=d(k-1) + L(k)/C (eq_b3)
Now from the induction hypothesis
F(k-1) >= d(k-1) - MTU/C
And the definition (eq_2) of F(k) gives
F(k) >= max(a(k), min(d(k-1), d(k-1) - MTU/C))+ L(k)/C =
max(a(k), d(k-1)- MTU/C)+ L(k)/C (eq_b4)
It follows immediately from (eq_b4) that
F(k) >= d(k-1)- MTU/C + L(k)/C
Charny, ed. et al. 18
draft-charny-ef-definition.txt July, 2000
Combining with (eq_b3) demonstrates (eq_b1) and completes the
inductive step.
8. Appendix C
Statement C1.
============
If a scheduler satisfies the condition
G(i) - E1 <= d(i) <= G(i) + E2 (eq_c1)
where G(i) is the i-th finishing time of the reference fluid
scheduler, then it satisfies DEF_1 in section 2.1 with latency term
E <= E1 + E2
Proof of Statement C1.
----------------------
To prove Statement C1 we will prove that for all i >= 0
F(i) >= G(i) - E1 (eq_c2)
where F(i) is the set of finish times recursively defined by
(eq_2).
If (eq_c2) is proven, then from (eq_c1) and (eq_c2)
d(i) <= G(i) + E2 <= F(i) + E1 + E2, which means that the scheduler
satisfies DEF_1 with the latency term E = E1 + E2
Proof of (eq_c2).
-----------------
First note that in the reference GPS system, packet i starts its
service at time max ( a(i), G(i-1)) and receives a service rate at
least equal to R. Thus
G(i) <= max ( a(i), G(i-1)) + L(i)/R (eq_c3)
Now the proof of (eq_c2) proceeds by induction.
Base case
F(0)=0, G(0) = 0, so (eq_c2) trivially holds for i=0.
Inductive step.
Suppose (eq_c2) holds for all j=0,1...i-1, (i>=1)
We have both F(i-1) >= G(i-1) - E1 and d(i-1) >= G(i-1) - E1,
thus
min (F(i-1), d(i-1)) >= G(i-1) - E1 (eq_c4)
Charny, ed. et al. 19
draft-charny-ef-definition.txt July, 2000
Combining this with equation (eq_2), we obtain
F(i) >= G(i-1) - E1 + L(i)/R (eq_c5)
Again from equation (eq_2) we have
F(i) >= a(i)+ L(i)/R >= a(i) - E1 + L(i)/R (eq_c6)
Combining (eq_c5), (eq_c6) and (eq_c3) gives F(i) >= G(i)-E1,
which completes the proof of (eq_c2) and statement C1.
Statement C2.
=============
WF2Q satisfies DEF_1 with E = MTU/C + MTU/R
Proof of C2.
------------
It follows from the results of [BZ96a] that the departures in WF2Q
satisfy the condition
max(G(i-1), a(i))<= d(i) <= G(i) + MTU/C
From Equation (eq_c3) this implies that
d(i) >= G(i) - L(i)/R >= G(i) - MTU/R
Therefore (eq_c1) in Statement C1 holds with E1=MTU/R and
E2 = MTU/C. Therefore, by Statement C1, WF2Q satisfies DEF_1 with
E=MTU/C + MTU/R.
9. References
[BZ96a] J.C.R. Bennett and H. Zhang, ``WF2Q: Worst-case
Fair Weighted Fair Queuing'', INFOCOM'96, Mar, 1996
[BZ96b] J.C.R. Bennett and H. Zhang, Hierarchical
Packet Fair Queuing Algorithms. IEEE/ACM Transactions
on Networking, 5(5):675-689, Oct 1997. Also in
Proceedings of SIGCOMM'96, Aug, 1996
[RFC2475] Black, D., Blake, S., Carlson, M., Davies, E., Wang,
Z. and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998.
[LEB98] J.-Y. Le Boudec, "Application of Network Calculus To
Guaranteed Service Networks", IEEE Transactions on
Information theory, (44) 3, May 1998
Charny, ed. et al. 20
draft-charny-ef-definition.txt July, 2000
[Bra97] Bradner, S., "Key Words for Use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[CLeB2000] A. Charny, J.-Y. Le Boudec "Delay Bounds in a
Network with Aggregate Scheduling". To appear in Proc.
of QoFIS'2000, September 25-26, 2000, Berlin, Germany.
[DKS90] A. Demers, S. Keshav, and S. Shenker, "Analysis
and Simulation of a Fair Queuing Algorithm". In
Journal of Internetworking Research and Experience,
pages 3-26, October 1990. Also in Proceedings of ACM
SIGCOMM'89, pp 3-12.
[FC2000] T. Ferrari and P. F. Chimento, "A Measurement-
Based Analysis of Expedited Forwarding PHB
Mechanisms," Eighth International Workshop on Quality
of Service, Pittsburgh, PA, June 2000,
[FJ95] S. Floyd and V. Jacobson, "Link-sharing and Resource
Management Models for Packet Networks", IEEE/ACM
Transactions on Networking, Vol. 3 no. 4, pp. 365-
386,August 1995.
[Gol94] S.J. Golestani. "A Self-clocked Fair Queuing
Scheme for Broad-band Applications". In Proceedings of
IEEE INFOCOM'94, pages 636-646, Toronto, CA, April
1994.
[GVC96] P. Goyal, H.M. Vin, and H. Chen. "Start-time
Fair Queuing: A Scheduling Algorithm for Integrated
Services". In Proceedings of the ACM-SIGCOMM 96, pages
157-168, Palo Alto, CA, August 1996.
[RFC2598] V. Jacobson, K. Nichols, K. Poduri, "An Expedited
Forwarding PHB", RFC 2598, June 1999
[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
December 1998.
[JNP2000] V. Jacobson, K. Nichols, K. Poduri,
"The 'Virtual Wire' Behavior Aggregate,"
(draft-ietf-diffserv-ba-vw-00.txt), March 2000.
[Par92] A. Parekh. "A Generalized Processor Sharing
Approach to Flow Control in Integrated Services
Networks". PhD dissertation, Massachusetts Institute
of Technology, February 1992.
[SV95] M. Shreedhar and G. Varghese. "Effient Fair Queueing
Charny, ed. et al. 21
draft-charny-ef-definition.txt July, 2000
Using Deficit Round Robin". In Proceedings of
SIGCOMM'95, pages 231-243, Boston, MA, September 1995.
[Sto95] I. Stoica and H. Abdel-Wahab, "Earliest Eligible
Virtual Deadline First: A Flexible and Accurate
Mechanism for Proportional Share Resource Allocation",
Technical Report 95-22, Old Dominion University,
November 1995.
9. Authors' addresses
Anna Charny, ed.
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
acharny@cisco.edu
Fred Baker
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
fred@cisco.com
Jon Bennett
RiverDelta Networks
3 Highwood Drive East
Tewksbury, MA 01876
jcrb@riverdelta.com
Kent Benson
Tellabs Research Center
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
Kent.Benson@tellabs.com
Jean-Yves Le Boudec
ICA-EPFL, INN
Ecublens, CH-1015
Lausanne-EPFL, Switzerland
leboudec@epfl.c
Angela Chiu
AT&T Labs
100 Schulz Dr. Rm 4-204
Red Bank, NJ 07701
alchiu@att.com
Bill Courtney
TRW
Bldg. 201/3702
One Space Park
Redondo Beach, CA 90278
Charny, ed. et al. 22
draft-charny-ef-definition.txt July, 2000
bill.courtney@trw.com
Shahram Davari
PMC-Sierra Inc
555 Legget drive
Suit 834, Tower B
Ottawa, ON K2K 2X3, Canada
shahram_davari@pmc-sierra.com
Bruce Davie
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
bsd@cisco.com
Victor Firou
Nortel Networks
600 Tech Park
Billerica, MA 01821
vfirou@nortelnetworks.com
Charles Kalmanek
AT&T Labs-Research
180 Park Avenue, Room A113,
Florham Park NJ
crk@research.att.com.
K.K. Ramakrishnan
AT&T Labs-Research
Rm. A155, 180 Park Ave,
Florham Park, NJ 07932
kkrama@research.att.com
Dimitrios Stiliadis
Lucent Technologies
1380 Rodick Road
Markham, Ontario, L3R-4G5, Canada
stiliadi@bell-labs.com
10. Full Copyright
Copyright (C) The Internet Society 1999. All Rights Reserved.
This document and translations of it may be copied andFurnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
Charny, ed. et al. 23
draft-charny-ef-definition.txt July, 2000
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
draft-charny-ef-definition-00.txt Expires
January, 2001
Charny, ed. et al. 24
| PAFTECH AB 2003-2026 | 2026-04-23 03:59:33 |