One document matched: draft-briscoe-re-pcn-border-cheat-01.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<rfc category="info" docName="draft-briscoe-re-pcn-border-cheat-01"
ipr="full3978">
<?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ?>
<?rfc private="" ?>
<!-- Default private="" Produce an internal memo 2.5pp shorter than an I-D or RFC -->
<?rfc rfcprocack="yes" ?>
<!-- Default rfcprocack="no" add a short sentence acknowledging xml2rfc -->
<?rfc toc="yes" ?>
<!-- Default toc="no" No Table of Contents -->
<?rfc symrefs="yes" ?>
<!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->
<?rfc sortrefs="yes"?>
<!-- Default sortrefs="no" Don't sort references into order -->
<?rfc iprnotified="no" ?>
<!-- Default iprnotified="no" I haven't disclosed existence of IPR to IETF -->
<?rfc strict="no" ?>
<!-- Default strict="no" Don't check I-D nits -->
<?rfc compact="yes"?>
<!-- Default compact="no" Start sections on new pages -->
<?rfc subcompact="no"?>
<!-- Default subcompact="(as compact setting)" yes/no is not quite as compact as yes/yes -->
<?rfc emoticonic="yes" ?>
<!-- Default emoticonic="no" Doesn't prettify HTML format -->
<?rfc comments="yes" ?>
<!-- Default comments="no" Don't render comments -->
<front>
<title abbrev="Bulk Border Policing using Re-ECN">Emulating Border Flow
Policing using Re-ECN on Bulk Data</title>
<author fullname="Bob Briscoe" initials="B." surname="Briscoe">
<organization>BT & UCL</organization>
<address>
<postal>
<street>B54/77, Adastral Park</street>
<street>Martlesham Heath</street>
<city>Ipswich</city>
<code>IP5 3RE</code>
<country>UK</country>
</postal>
<phone>+44 1473 645196</phone>
<email>bob.briscoe@bt.com</email>
<uri>http://www.cs.ucl.ac.uk/staff/B.Briscoe/</uri>
</address>
</author>
<date day="25" month="February" year="2008" />
<area>Transport</area>
<workgroup>PCN Working Group</workgroup>
<keyword>Quality of Service</keyword>
<keyword>QoS</keyword>
<keyword>Congestion Control</keyword>
<keyword>Differentiated Services</keyword>
<keyword>Integrated Services</keyword>
<keyword>Admission Control Policing</keyword>
<keyword>Flow Rate Policing</keyword>
<keyword>Inter-domain</keyword>
<keyword>Trust</keyword>
<keyword>Theft of Service</keyword>
<keyword>Signalling</keyword>
<keyword>Protocol</keyword>
<keyword>Congestion Notification</keyword>
<keyword>Scalability</keyword>
<abstract>
<t>Scaling per flow admission control to the Internet is a hard problem.
A recently proposed approach combines Diffserv and pre-congestion
notification (PCN) to provide a service slightly better than Intserv
controlled load. It scales to networks of any size, but only if domains
trust each other to comply with admission control and rate policing.
This memo claims to solve this trust problem without losing scalability.
It describes bulk border policing that provides a sufficient emulation
of per-flow policing with the help of another recently proposed
extension to ECN, involving re-echoing ECN feedback (re-ECN). With only
passive bulk measurements at borders, sanctions can be applied against
cheating networks.</t>
</abstract>
</front>
<middle>
<!-- ================================================================ -->
<note title="Status (to be removed by the RFC Editor)">
<t>This memo is posted as an Internet-Draft with the intent to
eventually be broken down in two documents; one for the standards track
and one for informational status. But until it becomes an item of IETF
working group business the whole proposal has been kept together to aid
understanding. Only the text of <xref
target="repcn_Re-ECN-RSVP_Protocol" /> of this document requires
standardisation. The rest of the sections describe how a system might be
built from these protocols by the operators of an internetwork. Note in
particular that the policing and monitoring functions proposed for the
trust boundaries between operators would not need standardisation by the
IETF. They simply represent one way that the proposed protocols could be
used to extend the PCN architecture <xref target="I-D.ietf-pcn-architecture" /> to span
multiple domains without mutual trust between the operators.</t>
<t>To realise the system described, this document also depends on
standardisation of three other documents currently being discussed (but
not on the standards track) in the IETF Transport Area: pre-congestion
notification (PCN) marking on interior nodes <xref target="PCN" />;
feedback of aggregate PCN measurements by suitably extending the
admission control signalling protocol (e.g. RSVP) <xref
target="RSVP-ECN" />; and re-insertion of the feedback into the forward
stream of IP packets by the PCN ingress gateway in a similar way to that
proposed for a TCP source <xref target="Re-TCP" />.</t>
<t>The authors seek comments from the Internet community on whether
combining PCN and re-ECN in this way is a sufficient solution to the
problem of scaling microflow admission control to the Internet as a
whole, even though such scaling must take account of the increasing
numbers of networks and users who may all have conflicting
interests.</t>
</note>
<note title="Changes from previous drafts (to be removed by the RFC Editor)">
<t>Full diffs of incremental changes between drafts are available at
URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn></t>
<t>
<list style="hanging">
<t hangText="Changes from <draft-briscoe-re-pcn-border-cheat-00> to <draft-briscoe-re-pcn-border-cheat-01> (current version):" />
<t>Updated references.</t>
<t hangText="Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01> to <draft-briscoe-re-pcn-border-cheat-00>:" />
<t>Changed filename to associate it with the new IETF PCN w-g,
rather than the TSVWG w-g.</t>
<t>Introduction: Clarified that bulk policing only replaces per-flow
policing at interior inter-domain borders, while per-flow policing
is still needed at the access interface to the internetwork. Also
clarified that the aim is to neutralise any gains from cheating
using local bilateral contracts between neighbouring networks,
rather than merely identifying remote cheaters.</t>
<t><xref target="repcn_Traditional_Problem" />: Described the
traditional per-flow policing problem with inter-domain reservations
more precisely, particularly with respect to direction of
reservations and of traffic flows.</t>
<t>Clarified status of <xref
target="repcn_Emulating_Policing_Re-ECN" /> onwards, in particular
that policers and monitors would not need standardisation, but that
the protocol in <xref target="repcn_Re-ECN-RSVP_Protocol" /> would
require standardisation.</t>
<t><xref target="repcn_Competitive_Routing" /> on competitive
routing: Added discussion of direct incentives for a receiver to
switch to a different provider even if the provider has a
termination monopoly.</t>
<t>Clarified that "Designing in security from the start" merely
means allowing codepoint space in the PCN protocol encoding. There
is no need to actually implement inter-domain security mechanisms
for solutions confined to a single domain.</t>
<t>Updated some references and added a ref to the Security
Considerations, as well as other minor corrections and
improvements.</t>
<t hangText="Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to <draft-briscoe-tsvwg-re-ecn-border-cheat-01>:" />
<t>Added subsection on Border Accounting Mechanisms (<xref
target="repcn_Border_Accounting_Mechanisms" />)</t>
<t><xref
target="repcn_Re-ECN_Abstracted_Network_Layer_Wire_Protocol" /> on
the re-ECN wire protocol clarified and re-organised to separately
discuss re-ECN for default ECN marking and for pre-congestion
marking (PCN).</t>
<t>Router Forwarding Behaviour subsection added to re-organised
section on Protocol Operation (<xref
target="repcn_Protocol_Operation" />). Extensions section moved
within Protocol Operations.</t>
<t>Emulating Border Policing (<xref
target="repcn_Emulating_Policing_Re-ECN" />) reorganised, starting
with a new Terminology subsection heading, and a simplified overview
section. Added a large new subsection on Border Accounting
Mechanisms within a new section bringing together other subsections
on Border Mechanisms generally (<xref
target="repcn_Border_Mechanisms" />). Some text moved from old
subsections into these new ones.</t>
<t>Added section on Incremental Deployment (<xref
target="repcn_Deployment" />), drawing together relevant points
about deployment made throughout.</t>
<t>Sections on Design Rationale (<xref target="repcn_Rationale" />)
and Security Considerations (<xref
target="repcn_Security_Considerations" />) expanded with some new
material, including new attacks and their defences.</t>
<t>Suggested Border Metering Algorithms improved (<xref
target="repcn_Alg_Metering" />) for resilience to newly identified
attacks.</t>
</list>
</t>
</note>
<!-- ================================================================ -->
<section anchor="repcn_Introduction" title="Introduction">
<t>The Internet community largely lost interest in the Intserv
architecture after it was clarified that it would be unlikely to scale
to the whole Internet <xref target="RFC2208" />. Although Intserv
mechanisms proved impractical, the bandwidth reservation service it
aimed to offer is still very much required.</t>
<t>A recently proposed approach <xref target="I-D.ietf-pcn-architecture" /> combines
Diffserv and pre-congestion notification (PCN) to provide a service
slightly better than Intserv controlled load <xref
target="RFC2211" />. It scales to any size network, but only if domains
trust their neighbours to have checked that upstream customers aren't
taking more bandwidth than they reserved, either accidentally or
deliberately. This memo describes border policing measures so that one
network can protect its interests, even if networks around it are
deliberately trying to cheat. The approach provides a sufficient
emulation of flow rate policing at trust boundaries but without per-flow
processing. The emulation is not perfect, but it is sufficient to ensure
that the punishment is at least proportionate to the severity of the
cheat. Per-flow rate policing for each reservation is still expected to
be used at the access edge of the internetwork, but at the borders
between networks bulk policing can be used to emulate per-flow
policing.</t>
<t>The aim is to be able to scale controlled load service to any number
of endpoints, even though such scaling must take account of the
increasing numbers of networks and users who may all have conflicting
interests. To achieve such scaling, this memo combines two recent
proposals, both of which it briefly recaps: <list style="symbols">
<t>A deployment model for admission control over Diffserv using
pre-congestion notification <xref target="I-D.ietf-pcn-architecture" />
describes how bulk pre-congestion notification on routers within an
edge-to-edge Diffserv region can emulate the precision of per-flow
admission control to provide controlled load service without
unscalable per-flow processing;</t>
<t>Re-ECN: Adding Accountability to TCP/IP <xref
target="Re-TCP" />. The trick that addresses cheating at borders is
to recognise that border policing is mainly necessary because
cheating upstream networks will admit traffic when they shouldn't
only as long as they don't directly experience the downstream
congestion their misbehaviour can cause. The re-ECN protocol
requires upstream nodes to declare expected downstream congestion in
all forwarded packets and it makes it in their interests to declare
it honestly. Operators can then monitor downstream congestion in
bulk at borders to emulate policing.</t>
</list></t>
<t>The aim is not to enable a network to <spanx
style="emph">identify</spanx> some remote cheating party, which would
rarely be useful given the victim network would be unlikely to be able
to seek redress from a cheater in some remote part of the world with
whom no direct contractual relationship exists. Rather the aim is to
ensure that any gain from cheating will be cancelled out by penalties
applied to the cheating party by its local network. Further, the
solution ensures each of the chain of networks between the cheater and
the victim will lose out if it doesn't apply penalties to its neighbour.
Thus the solution builds on the local bilateral contractual
relationships that already exist between neighbouring networks.</t>
<t>Rather than the end-to-end arrangement used when re-ECN was specified
for the TCP transport <xref target="Re-TCP" />, this memo specifies
re-ECN in an edge-to-edge arrangement, making it applicable to the above
deployment model for admission control over Diffserv. Also, rather than
using a TCP transport for regular congestion feedback, this memo
specifies re-ECN using RSVP as the transport for feedback <xref
target="RSVP-ECN" />. A similar deployment model, but with a different
transport for signalling congestion feedback could be used (e.g.
Arumaithurai <xref target="I-D.arumaithurai-nsis-pcn" /> and
RMD <xref target="I-D.ietf-nsis-rmd" /> use NSIS).</t>
<t>This memo aims to do two things: i) define how to apply the re-ECN
protocol to the admission control over Diffserv scenario; and ii)
explain why re-ECN sufficiently emulates border policing in that
scenario. Most of the memo is taken up with the second aim; explaining
why it works. Applying re-ECN to the scenario actually involves quite a
trivial modification to the ingress gateway. That modification can be
added to gateways later, so our immediate goal is to convince everyone
to have the foresight to define the PCN wire protocol encoding to
accommodate the extended codepoints defined in this document, whether
first deployments require border policing or not. Otherwise, when we
want to add policing, we will have built ourselves a legacy problem. In
other words, we aim to convince people to "Design in security from the
start."</t>
<t>The body of this memo is structured as follows: <list style="empty">
<t><xref target="repcn_Problem" /> describes the border policing
problem. We recap the traditional, unscalable view of how to solve
the problem, and we recap the admission control solution which has
the scalability we do not want to lose when we add border
policing;</t>
<t><xref target="repcn_Re-ECN-RSVP_Protocol" /> specifies the re-ECN
protocol solution in detail;</t>
<t><xref target="repcn_Emulating_Policing_Re-ECN" /> explains how to
use the protocol to emulate border policing, and why it works;</t>
<t><xref target="repcn_Analysis" /> analyses the security of the
proposed solution;</t>
<t><xref target="repcn_Rationale" /> explains the sometimes subtle
rationale behind our design decisions;</t>
<t><xref target="repcn_Security_Considerations" /> comments on the
overall robustness of the security assumptions and lists specific
security issues.</t>
</list></t>
<t>It must be emphasised that we are not evangelical about removing
per-flow processing from borders. Network operators may choose to do
per-flow processing at their borders for their own reasons, such as to
support business models that require per-flow accounting. Our aim is to
show that per-flow processing at borders is no longer <spanx
style="emph">necessary</spanx> in order to provide end-to-end QoS using
flow admission control. Indeed, we are absolutely opposed to
standardisation of technology that embeds particular business models
into the Internet. Our aim is merely to provide a new useful metric
(downstream congestion) at trust boundaries. Given the well-known
significance of congestion in economics, operators can then use this new
metric in their interconnection contracts if they choose. This will
enable competitive evolution of new business models (for examples
see <xref target="IXQoS" />), even for sets of flows running
alongside another set across the same border but using the more
traditional model that depends on more costly per-flow processing at
each border.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Reqs_notation" title="Requirements Notation">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119" />.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Problem" title="The Problem">
<!-- ________________________________________________________________ -->
<section anchor="repcn_Traditional_Problem"
title="The Traditional Per-flow Policing Problem">
<t>If we claim to be able to emulate per-flow policing with bulk
policing at trust boundaries, we need to know exactly what we are
emulating. So, we will start from the traditional scenario with
per-flow policing at trust boundaries to explain why it has always
been considered necessary.</t>
<t>To be able to take advantage of a reservation-based service such as
controlled load, a source-destination pair must reserve resources
using a signalling protocol such as RSVP <xref
target="RFC2205" />. An RSVP signalling request refers to a flow of
packets by its flow ID tuple (filter spec <xref
target="RFC2205" />) (or its security parameter index (SPI) <xref
target="RFC2207" /> if port numbers are hidden by IPSec encryption).
Other signalling protocols use similar flow identifiers. But, it is
insufficient to merely authorise and admit a flow based on its
identifiers, for instance merely opening a pin-hole for packets with
identifiers that match an admitted flow ID. Because, once a flow is
admitted, it cannot necessarily be trusted to send packets within the
rate profile it requested.</t>
<t>The packet rate must also be policed to keep the flow within the
requested flow spec <xref target="RFC2205" />. For instance,
without data rate policing, a source-destination pair could reserve
resources for an 8kbps audio flow but the source could transmit a
6Mbps video (theft of service). More subtly, the sender could generate
bursts that were outside the profile requested.</t>
<t>In traditional architectures, per-flow packet rate-policing is
expensive and unscalable but, without it, a network is vulnerable to
such theft of service (whether malicious or accidental). Perhaps more
importantly, if flows are allowed to send more data than they were
permitted, the ability of admission control to give assurances to
other flows will break.</t>
<t>Just as sources need not be trusted to keep within the requested
flow spec, whole networks might also try to cheat. We will now set up
a concrete scenario to illustrate such cheats. Imagine reservations
for unidirectional flows, through at least two networks, an edge
network and its downstream transit provider. Imagine the edge network
charges its retail customers per reservation but also has to pay its
transit provider a charge per reservation. Typically, both its selling
and buying charges might depend on the duration and rate of each
reservation. The level of the actual selling and buying prices are
irrelevant to our discussion (most likely the network will sell at a
higher price than it buys, of course).</t>
<t>A cheating ingress network could systematically reduce the size of
its retail customers' reservation signalling requests (e.g. the
SENDER_TSPEC object in RSVP's PATH message) before forwarding them to
its transit provider and systematically reinstate the responses on the
way back (e.g. the FLOWSPEC object in RSVP's RESV message). It would
then receive an honest income from its upstream retail customer but
only pay for fraudulently smaller reservations downstream. A similar
but opposite trick (increasing the TSPEC and decreasing the FLOWSPEC)
could be perpetrated by the receiver's access network if the
reservation was paid for by the receiver.</t>
<t>Equivalently, a cheating ingress network may feed the traffic from
a number of flows into an aggregate reservation over the transit that
is smaller than the total of all the flows. Because of these fraud
possibilities, in traditional QoS reservation architectures the
downstream network polices at each border. The policer checks that the
actual sent data rate of each flow is within the signalled
reservation.</t>
<t>Reservation signalling could be authenticated end to end, but this
wouldn't prevent the aggregation cheat just described. For this
reason, and to avoid the need for a global PKI, signalling integrity
is typically only protected on a hop-by-hop basis <xref
target="RFC2747" />.</t>
<t>A variant of the above cheat is where a router in an honest
downstream network denies admission to a new reservation, but a
cheating upstream network still admits the flow. For instance, the
networks may be using Diffserv internally, but Intserv admission
control at their borders <xref target="RFC2998" />. The cheat
would only work if they were using bulk Diffserv traffic policing at
their borders, perhaps to avoid the cost/complexity of Intserv border
policing. As far as the cheating upstream network is concerned, it
gets the revenue from the reservation, but it doesn't have to pay any
downstream wholesale charges and the congestion is in someone else's
network. The cheating network may calculate that most of the flows
affected by congestion in the downstream network aren't likely to be
its own. It may also calculate that the downstream router has been
configured to deny admission to new flows in order to protect
bandwidth assigned to other network services (e.g. enterprise VPNs).
So the cheating network can steal capacity from the downstream
operator's VPNs that are probably not actually congested.</t>
<t>All the above cheats are framed in the context of RSVP's receiver
confirmed reservation model, but similar cheats are possible with
sender-initiated and other models.</t>
<t>To summarise, in traditional reservation signalling architectures,
if a network cannot trust a neighbouring upstream network to
rate-police each reservation, it has to check for itself that the data
rate fits within each of the reservations it has admitted.</t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Generic_Scenario" title="Generic Scenario">
<t>We will now describe a generic internetworking scenario that we
will use to describe and to test our bulk policing proposal. It
consists of a number of networks and endpoints that do not fully trust
each other to behave. In <xref target="repcn_Analysis" /> we will tie
down exactly what we mean by partial trust, and we will consider the
various combinations where some networks do not trust each other and
others are colluding together.</t>
<?rfc needLines="27" ?>
<figure anchor="repcn_Fig_Scenario"
title="Generic Scenario (see text for explanation of terms)">
<artwork><![CDATA[
_ ___ _____________________________________ ___ _
| | | | _|__ ______ ______ ______ _|__ | | | |
| | | | | | | | | | | | | | | | | |
| | | | | | |Inter-| |Inter-| |Inter-| | | | | | |
| | | | | | | ior | | ior | | ior | | | | | | |
| | | | | | |Domain| |Domain| |Domain| | | | | | |
| | | | | | | A | | B | | C | | | | | | |
| | | | | | | | | | | | | | | | | |
| | | | +----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | | | |
| | | | | | |B| |B| |B| |B| |B| |B| | | | |\ | |
| |==| |==|Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr |==| |=>| |
| | | | |G/W | | | | | | | | | | | | | |G/W | | |/ | |
| | | | +----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | | | |
| | | | | | | | | | | | | | | | | |
| | | | |____| |______| |______| |______| |____| | | | |
|_| |___| |_____________________________________| |___| |_|
Sx Ingress Diffserv region Egress Rx
End Access Access End
Host Network Network Host
<-------- edge-to-edge signalling ------->
(for admission control)
<-------------------end-to-end QoS signalling protocol------------->
]]></artwork>
</figure>
<t>An ingress and egress gateway (Ingr G/W and Egr G/W in <xref
target="repcn_Fig_Scenario" />) connect the interior Diffserv region
to the edge access networks where routers (not shown) use per-flow
reservation processing. Within the Diffserv region are three interior
domains, A, B and C, as well as the inward facing interfaces of the
ingress and egress gateways. An ingress and egress border router (BR)
is shown interconnecting each interior domain with the next. There may
be other interior routers (not shown) within each interior domain.</t>
<t>In two paragraphs we now briefly recap how pre-congestion
notification is intended to be used to control flow admission to a
large Diffserv region. The first paragraph describes data plane
functions and the second describes signalling in the control plane. We
omit many details from <xref target="I-D.ietf-pcn-architecture" /> including behaviour
during routing changes. For brevity here we assume other flows are
already in progress across a path through the Diffserv region before a
new one arrives, but how bootstrap works is described in <xref
target="repcn_Aggregate_Bootstrap" />.</t>
<t><xref target="repcn_Fig_Scenario" /> shows a single simplex
reserved flow from the sending (Sx) end host to the receiving (Rx) end
host. The ingress gateway polices incoming traffic within its admitted
reservation and remarks it to turn on an ECN-capable
codepoint <xref target="RFC3168" /> and the controlled load (CL)
Diffserv codepoint. Together, these codepoints define which traffic is
entitled to the enhanced scheduling of the CL behaviour aggregate on
routers within the Diffserv region. The CL PHB of interior routers
consists of a scheduling behaviour and a new ECN marking behaviour
that we call `pre-congestion notification' <xref target="PCN" />.
The CL PHB simply re-uses the definition of expedited forwarding
(EF) <xref target="RFC3246" /> for its scheduling behaviour. But
it incorporates a new ECN marking behaviour, which sets the ECN field
of an increasing number of CL packets to the admission marked (AM)
codepoint as they approach a threshold rate that is lower than the
line rate. The use of virtual queues ensures real queues have hardly
built up any congestion delay. The level of marking detected at the
egress of the Diffserv region is then used by the signalling system in
order to determine admission control as follows.</t>
<t>The end-to-end QoS signalling (e.g. RSVP) for a new reservation
takes one giant hop from ingress to egress gateway, because interior
routers within the Diffserv region are configured to ignore RSVP. The
egress gateway holds flow state because it takes part in the
end-to-end reservation. So it can classify all packets by flow and it
can identify all flows that have the same previous RSVP hop (a
CL-region-aggregate). For each CL-region-aggregate of flows in
progress, the egress gateway maintains a per-packet moving average of
the fraction of pre-congestion-marked traffic. Once an RSVP PATH
message for a new reservation has hopped across the Diffserv region
and reached the destination, an RSVP RESV message is returned. As the
RESV message passes, the egress gateway piggy-backs the relevant
pre-congestion level onto it <xref target="RSVP-ECN" />. Again,
interior routers ignore the RSVP message, but the ingress gateway
strips off the pre-congestion level. If the pre-congestion level is
above a threshold, the ingress gateway denies admission to the new
reservation, otherwise it returns the original RESV signal back
towards the data sender.</t>
<t>Once a reservation is admitted, its traffic will always receive low
delay service for the duration of the reservation. This is because
ingress gateways ensure that traffic not under a reservation cannot
pass into the Diffserv region with the CL DSCP set. So non-reserved
traffic will always be treated with a lower priority PHB at each
interior router. And even if some disaster re-routes traffic after it
has been admitted, if the traffic through any resource tips over a
fail-safe threshold, pre-congestion notification will trigger flow
pre-emption to very quickly bring every router within the whole
Diffserv region back below its operating point.</t>
<t>The whole admission control system just described deliberately
confines per-flow processing to the access edges of the network, where
it will not limit the system's scalability. But ideally we want to
extend this approach to multiple networks, to take even more advantage
of its scaling potential. We would still need per-flow processing at
the access edges of each network, but not at the high speed interfaces
where they interconnect. Even though such an admission control system
would work technically, it would gain us no scaling advantage if each
network also wanted to police the rate of each admitted flow for
itself—border routers would still have to do complex packet
operations per-flow anyway, given they don't trust upstream networks
to do their policing for them.</t>
<t>This memo describes how to emulate per-flow rate policing using
bulk mechanisms at border routers, so the full scalability potential
of pre-congestion notification is not limited by the need for per-flow
policing mechanisms at borders, which would make borders the most
cost-critical pinch-points. Then we can achieve the long sought-for
vision of secure Internet-wide bandwidth reservations without needing
per-flow processing at all in core and border routers—where
scalability is most critical.</t>
</section>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Re-ECN-RSVP_Protocol"
title="Re-ECN Protocol for an RSVP (or similar) Transport">
<!-- ________________________________________________________________ -->
<section anchor="repcn_Protocol_Overview" title="Protocol Overview">
<t>First we need to recap the way routers accumulate congestion
marking along a path. Each ECN-capable router marks some packets with
CE, the marking probability increasing with the length of the queue at
its egress link. The only difference with pre-congestion
marking <xref target="PCN" /> is that marking is based on the
length of a virtual queue, so that the real queue occupancy can remain
very low. We will use the terms congestion and pre-congestion
interchangeably in the following unless it is important to distinguish
between them.</t>
<t>With multiple ECN-capable routers on a path, the ECN field
accumulates the fraction of CE marking that each router adds. The
combined effect of the packet marking of all the routers along the
path signals congestion of the whole path to the receiver. So, for
example, if one router early in a path is marking 1% of packets and
another later in a path is marking 2%, flows that pass through both
routers will experience approximately 3% marking.</t>
<t>The packets crossing an inter-domain trust boundary within the
Diffserv region will all have come from different ingress gateways and
will all be destined for different egress gateways. We will show that
the key to policing against theft of service is for a border router to
be able to directly measure the congestion that is about to be caused
by the traffic it forwards. That is, it can measure locally the
congestion on each of the downstream paths between itself and the
egress gateways that its traffic is destined for.</t>
<t>With the original ECN protocol, if CE markings crossing the border
had been counted over a period, they would have represented the
accumulated upstream congestion that had already been experienced by
those packets. The general idea of re-ECN is for the ingress gateway
to continuously encode path congestion into the IP header where, in
this case, `path' means from ingress to egress gateway. Then at any
point on that path (e.g. between domains A & B in <xref
target="repcn_Fig_Re-ECN_Concept" /> below), IP headers can be
monitored to subtract upstream congestion from expected path
congestion in order to give the expected downstream congestion still
to be experienced until the egress gateway.</t>
<t>Importantly, it turns out that there is no need to monitor
downstream congestion on a per-flow basis. We will show that
accounting for it in bulk across all flows will be sufficient. <?rfc needLines="27" ?>
<figure anchor="repcn_Fig_Re-ECN_Concept" title="Re-ECN concept">
<artwork><![CDATA[
_____________________________________
_|__ ______ ______ ______ _|__
| | | A | | B | | C | | |
+----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+
| | |B| |B| |B| |B| |B| |B| | |
|Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr |
|G/W | | | | |: | | | | | | | | |G/W |
+----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+
| | | |: | | | | | |
|____| |______|: |______| |______| |____|
|_____________:_______________________|
:
| : |
|<-upstream-->:<-expected downstream->|
| congestion : congestion |
| u v ~= p - u |
| |
|<--- expected path congestion, p --->|
]]></artwork>
</figure></t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Re-ECN_Abstracted_Network_Layer_Wire_Protocol"
title="Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)">
<t>In this section we define the names of the various codepoints of
the re-ECN protocol when used with pre-congestion notification,
deferring description of their semantics to the following sections.
But first we recap the re-ECN wire protocol proposed in <xref
target="Re-TCP" />.</t>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Re-ECN_Recap" title="Re-ECN Recap">
<t>Re-ECN uses the two bit ECN field broadly as in
RFC3168 <xref target="RFC3168" />. It also uses a new re-ECN
extension (RE) flag. The actual position of the RE flag is different
between IPv4 & v6 headers so we will use an abstraction of the
IPv4 and v6 wire protocols by just calling it the RE flag. <xref
target="Re-TCP" /> proposes using bit 48 (currently unused) in the
IPv4 header for the RE flag, while for IPv6 it proposes an ECN
extension header.</t>
<t>Unlike the ECN field, the RE flag is intended to be set by the
sender and remain unchanged along the path, although it can be read
by network elements that understand the re-ECN protocol. In the
scenario used in this memo, the ingress gateway acts as a proxy for
the sender, setting the RE flag as permitted in the specification of
re-ECN.</t>
<t>Note that general-purpose routers do not have to read the RE
flag, only special policing elements at borders do. And no
general-purpose routers have to change the RE flag, although the
ingress and egress gateways do because in the edge-to-edge
deployment model we are using, they act as proxies for the
endpoints. Therefore the RE flag does not even have to be visible to
interior routers. So the RE flag has no implications on protocols
like MPLS. Congested label switching routers (LSRs) would have to be
able to notify their congestion with an ECN/PCN codepoint in the
MPLS shim <xref target="RFC5129" />, but like any interior IP
router, they can be oblivious to the RE flag, which need only be
read by border policing functions.</t>
<t>Although the RE flag is a separate, single bit field, it can be
read as an extension to the two-bit ECN field; the three
concatenated bits in what we will call the extended ECN field (EECN)
make eight codepoints available. When the RE flag setting is "don't
care", we use the RFC3168 names of the ECN codepoints, but <xref
target="Re-TCP" /> proposes the following six codepoint names for
when there is a need to be more specific. <?rfc needLines="25" ?>
<texttable anchor="repcn_Tab_Default_EECN_Codepoints"
title="Re-cap of Default Extended ECN Codepoints Proposed for Re-ECN">
<ttcol align="center">ECN field</ttcol>
<ttcol align="left">RFC3168 codepoint</ttcol>
<ttcol align="center">RE flag</ttcol>
<ttcol align="left">Extended ECN codepoint</ttcol>
<ttcol align="center">Re-ECN meaning</ttcol>
<c>00</c>
<c>Not-ECT</c>
<c>0</c>
<c>Not-RECT</c>
<c>Not re-ECN-capable transport</c>
<c>00</c>
<c>Not-ECT</c>
<c>1</c>
<c>FNE</c>
<c>Feedback not established</c>
<c>01</c>
<c>ECT(1)</c>
<c>0</c>
<c>Re-Echo</c>
<c>Re-echoed congestion and RECT</c>
<c>01</c>
<c>ECT(1)</c>
<c>1</c>
<c>RECT</c>
<c>Re-ECN capable transport</c>
<c>10</c>
<c>ECT(0)</c>
<c>0</c>
<c>---</c>
<c>Legacy ECN use only </c>
<c>10</c>
<c>ECT(0)</c>
<c>1</c>
<c>--CU--</c>
<c>Currently unused
</c>
<c>11</c>
<c>CE</c>
<c>0</c>
<c>CE(0)</c>
<c>Congestion experienced with Re-Echo</c>
<c>11</c>
<c>CE</c>
<c>1</c>
<c>CE(-1)</c>
<c>Congestion experienced</c>
</texttable></t>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Re-PCN"
title="Re-ECN Combined with Pre-Congestion Notification (re-PCN)">
<t>As permitted by the ECN specification <xref
target="RFC3168" />, a proposal is currently being advanced in the
IETF to define different semantics for how routers might mark the
ECN field of certain packets. The idea is to be able to notify
congestion when the router's load approaches a logical limit, rather
than the physical limit of the line. This new marking is called
pre-congestion notification <xref target="PCN" /> and we will
use the term PCN-enabled router for a router that can apply
pre-congestion notification marking to the ECN fields of
packets.</t>
<t><xref target="RFC3168" /> recommends that a packet's Diffserv
codepoint should determine which type of ECN marking it receives. A
Diffserv per-hop behaviour (PHB) can specify that routers should
apply pre-congestion notification marking to PCN-capable packets. We
will call this a PCN-enhanced PHB. A PCN-capable packet must meet
two conditions, it must carry a DSCP that maps to a PCN-enhanced PHB
and it must carry an ECN field that turns on PCN marking.</t>
<t>As an example, the controlled load (CL) PHB might specify
expedited forwarding as its scheduling behaviour and PCN marking as
its congestion marking behaviour. Then we would say the CL PHB is a
PCN-enhanced PHB, and that packets with a DSCP that maps to the CL
PHB and with ECN turned on are PCN-capable packets.</t>
<t><xref target="PCN" /> actually proposes that two logical limits
should be used for pre-congestion notification, with the higher
limit as a back-stop for dealing with anomalous events. It envisages
PCN will be used to admission control inelastic real-time traffic,
so marking at the lower limit will trigger admission control, while
at the higher limit it will trigger flow pre-emption.</t>
<t>Because it needs two types of congestion marking, PCN seems to
need five states: Not-ECT, ECT (ECN-capable transport), the ECN
Nonce, Admission Marking (AM) and Flow Pre-emption Marking (PM).
<xref target="PCN" /> proposes various alternative encodings of the
ECN field, attempting various compromises to fit these five states
into the four available ECN codepoints.</t>
<t>One of the five states to make room for is the ECN
Nonce <xref target="RFC3540" />, but the capability we describe
in this memo supersedes any need for the Nonce. The ECN Nonce is an
elegant scheme, but it only allows a sending node (or its proxy) to
detect suppression of congestion marking in the feedback loop. Thus
the Nonce requires the sender or its proxy to be trusted to respond
correctly to congestion. But this is precisely the main cheat we
want to protect against (as well as many others).</t>
<t>One of the compromise protocol encodings that <xref
target="PCN" /> explores ("Alternative 5") leaves out support for
the ECN Nonce. Therefore we use that one. This encoding of PCN
markings is shown on the left of <xref
target="repcn_Tab_PC_EECN_Codepoints" />. Note that these codepoints
of the ECN field only take on the semantics of pre-congestion
notification if they are combined with a Diffserv codepoint that the
operator has configured to cause PCN marking, by mapping it to a
PCN-enhanced PHB.</t>
<t>For the rest of this memo, we will not distinguish between
Admission Marking and Pre-emption Marking unless we need to be
specific. We will call both "congestion marking". With the above
encoding, congestion marking can be read to mean any packet with the
left-most bit of the ECN field set.</t>
<t>The re-ECN protocol can be used to control misbehaving sources
whether congestion is with respect to a logical threshold (PCN) or
the physical line rate (ECN). In either case the RE flag can be used
to create an extended ECN field. For PCN-capable packets, the 8
possible encodings of this 3-bit extended ECN (EECN) field are
defined on the right of <xref
target="repcn_Tab_PC_EECN_Codepoints" /> below. The purposes of
these different codepoints will be introduced in subsequent
sections. <?rfc needLines="26" ?> <texttable
anchor="repcn_Tab_PC_EECN_Codepoints"
title="Extended ECN Codepoints if the Diffserv codepoint uses Pre-congestion Notification (PCN)">
<ttcol align="center">ECN field</ttcol>
<ttcol align="left">PCN codepoint (Alternative 5)</ttcol>
<ttcol align="center">RE flag</ttcol>
<ttcol align="left">Extended ECN codepoint</ttcol>
<ttcol align="center">Re-ECN meaning</ttcol>
<c>00</c>
<c>Not-ECT</c>
<c>0</c>
<c>Not-RECT</c>
<c>Not re-ECN-capable transport</c>
<c>00</c>
<c>Not-ECT</c>
<c>1</c>
<c>FNE</c>
<c>Feedback not established</c>
<c>01</c>
<c>ECT(1)</c>
<c>0</c>
<c>Re-Echo</c>
<c>Re-echoed congestion and RECT</c>
<c>01</c>
<c>ECT(1)</c>
<c>1</c>
<c>RECT</c>
<c>Re-ECN capable transport</c>
<c>10</c>
<c>AM</c>
<c>0</c>
<c>AM(0)</c>
<c>Admission Marking with Re-Echo</c>
<c>10</c>
<c>AM</c>
<c>1</c>
<c>AM(-1)</c>
<c>Admission Marking </c>
<c>11</c>
<c>PM</c>
<c>0</c>
<c>PM(0)</c>
<c>Pre-emption Marking with Re-Echo</c>
<c>11</c>
<c>PM</c>
<c>1</c>
<c>PM(-1)</c>
<c>Pre-emption Marking</c>
</texttable></t>
</section>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Protocol_Operation" title="Protocol Operation">
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Protocol_Operation_Established"
title="Protocol Operation for an Established Flow">
<t>The re-ECN protocol involves a simple tweak to the action of the
gateway at the ingress edge of the CL region. In the deployment
model just described <xref target="I-D.ietf-pcn-architecture" />, for each
active traffic aggregate across the CL region (CL-region-aggregate)
the ingress gateway will hold a fairly recent
Congestion-Level-Estimate that the egress gateway will have fed back
to it, piggybacked on the signalling that sets up each flow. For
instance, one aggregate might have been experiencing 3%
pre-congestion (that is, congestion marked octets whether Admission
Marked or Pre-emption Marked). In this case, the ingress gateway
MUST clear the RE flag to <spanx style="verb">0</spanx> for the same
percentage of octets of CL-packets (3%) and set it to <spanx
style="verb">1</spanx> in the rest (97%). <xref
target="repcn_Alg_Blanking_RE" /> gives a simple pseudo-code
algorithm that the ingress gateway may use to do this.</t>
<t>The RE flag is set and cleared this way round for incremental
deployment reasons (see <xref target="Re-TCP" />). To avoid
confusion we will use the term `blanking' (rather than marking) when
the RE flag is cleared to <spanx style="verb">0</spanx>, so we will
talk of the `RE blanking fraction' as the fraction of octets with
the RE flag cleared to <spanx style="verb">0</spanx>.</t>
<?rfc needLines="17" ?>
<figure anchor="repcn_Fig_Up_Down_Congestion_Imprecise"
title="Example Extended ECN codepoint Marking fractions (Imprecise)">
<artwork><![CDATA[
^
|
| RE blanking fraction
3% | +----------------------------+====+
| | | |
2% | | | |
| | congestion marking fraction| |
1% | | +----------------------+ |
| | | |
0% +----+=====+---------------------------+------>
^ <--A---> <---B---> <---C---> ^ domain
| ^ ^ |
ingress | | egress
1.00% 2.00% marking fraction
]]></artwork>
</figure>
<t><xref target="repcn_Fig_Up_Down_Congestion_Imprecise" />
illustrates our example. The horizontal axis represents the index of
each congestible resource (typically queues) along a path through
the Internet. The two superimposed plots show the fraction of each
ECN codepoint observed along this path, assuming there are two
congested routers somewhere within domains A and C. And <xref
target="repcn_Tab_Downstream_Congestion_Example" /> below shows the
downstream pre-congestion measured at various border observation
points along the path. <xref
target="repcn_Fig_Policing_Framework" /> (later) shows the same
results of these subtractions, but in graphical form like the above
figure. The tabulated figures are actually reasonable approximations
derived from more precise formulae given in Appendix A of <xref
target="Re-TCP" />. The RE flag is not changed by interior routers,
so it can be seen that it acts as a reference against which the
congestion marking fraction can be compared along the path. <?rfc needLines="9" ?>
<texttable anchor="repcn_Tab_Downstream_Congestion_Example"
title="Downstream Congestion Measured at Example Observation Points">
<ttcol align="center">Border observation point</ttcol>
<ttcol align="center">Approximate Downstream
pre-congestion</ttcol>
<c>ingress -- A</c>
<c>3% - 0% = 3%</c>
<c>A -- B</c>
<c>3% - 1% = 2%</c>
<c>B -- C</c>
<c>3% - 1% = 2%</c>
<c>C -- egress</c>
<c>3% - 3% = 0%</c>
</texttable></t>
<t>Note that the ingress determines the RE blanking fraction for
each aggregate using the most recent feedback from the relevant
egress, arriving with each new reservation, or each refresh. These
updates arrive relatively infrequently compared to the speed with
which congestion changes. Although this feedback will always be out
of date, on average positive errors should cancel out negative over
a sufficiently long duration.</t>
<t>In summary, the network adds pre-congestion marking in the
forward data path, the egress feeds its level back to the ingress in
RSVP (or similar signalling), then the ingress gateway re-echoes it
into the forward data path by blanking the RE flag. Hence the name
re-ECN. Then at any border within the Diffserv region, the
pre-congestion marking that every passing packet will be expected to
experience downstream can be measured to be the RE blanking fraction
minus the congestion marking fraction.</t>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Aggregate_Bootstrap"
title="Aggregate Bootstrap">
<t>When a new reservation PATH message arrives at the egress, if
there are currently no flows in progress from the same ingress,
there will be no state maintaining the current level of
pre-congestion marking for the aggregate. While the reservation
signalling continues onward towards the receiving host, the egress
gateway returns an RSVP message to the ingress with a
flag <xref target="RSVP-ECN" /> asking the ingress to send a
specified number of data probes between them. This bootstrap
behaviour is all described in the deployment model <xref
target="I-D.ietf-pcn-architecture" />.</t>
<t>However, with our new re-ECN scheme, the ingress does not know
what proportion of the data probes should have the RE flag blanked,
because it has no estimate yet of pre-congestion for the path across
the Diffserv region.</t>
<t>To be conservative, following the guidance for specifying other
re-ECN transports in <xref target="Re-TCP" />, the ingress SHOULD
set the FNE codepoint of the extended ECN header in all probe
packets (<xref target="repcn_Tab_PC_EECN_Codepoints" />). As per the
deployment model, the egress gateway measures the fraction of
congestion-marked probe octets and feeds back the resulting
pre-congestion level to the ingress, piggy-backed on the returning
reservation response (RESV) for the new flow. Probe packets are
identifiable by the egress because they have the ingress as the
source and the egress as the destination in the IP header.</t>
<t>It may seem inadvisable to expect the FNE codepoint to be set on
probes, given legacy firewalls etc. might discard such packets
(because this flag had no previous legitimate use). However, in the
deployment scenarios envisaged, each domain in the Diffserv region
has to be explicitly configured to support the controlled load
service. So, before deploying the service, the operator MUST
reconfigure such a misbehaving middlebox to allow through packets
with the RE flag set.</t>
<t>Note that we have said SHOULD rather than MUST for the FNE
setting behaviour of the ingress for probe packets. This entertains
the possibility of an ingress implementation having the benefit of
other knowledge of the path, which it re-uses for a newly starting
aggregate. For instance, it may hold cached information from a
recent use of the aggregate that is still sufficiently current to be
useful.</t>
<t>It might seem pedantic worrying about these few probe packets,
but this behaviour ensures the system is safe, even if the
proportion of probe packets becomes large.</t>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Flow_Bootstrap" title="Flow Bootstrap">
<t>It might be expected that a new flow within an active aggregate
would need no special bootstrap behaviour. If there was an aggregate
already in progress between the gateways the new flow was about to
use, it would inherit the prevailing RE blanking fraction. And if
there were no active aggregate, the bootstrap behaviour for an
aggregate would be appropriate and sufficient for the new flow.</t>
<t>However, for a number of reasons, at least the first packet of
each new flow SHOULD be set to the FNE codepoint, irrespective of
whether it is joining an active aggregate or not. If the first
packet is unlikely to be reliably delivered, a number of FNE packets
MAY be sent to increase the probability that at least one is
delivered to the egress gateway.</t>
<t>If each flow does not start with an FNE packet, it will be seen
later that sanctions may be too strict at the interface before the
egress gateway. It will often be possible to apply sanctions at the
granularity of aggregates rather than flows, but in an
internetworked environment it cannot be guaranteed that aggregates
will be identifiable in remote networks. So setting FNE at the start
of each flow is a safe strategy. For instance, a remote network may
have equal cost multi-path (ECMP) routing enabled, causing different
flows between the same gateways to traverse different paths.</t>
<t>After an idle period of more than 1 second, the ingress gateway
SHOULD set the EECN field of the next packet it sends to FNE. This
allows the design of network policers to be deterministic (see <xref
target="Re-TCP" />).</t>
<t>However, if the ingress gateway can guarantee that the network(s)
that will carry the flow to its egress gateway all use a common
identifier for the aggregate (e.g. a single MPLS network without
ECMP routing), it MAY NOT set FNE when it adds a new flow to an
active aggregate. And an FNE packet need only be sent if a whole
aggregate has been idle for more than 1 second.</t>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Router_Forwarding_Behaviour"
title="Router Forwarding Behaviour">
<t>Adding re-ECN works well without modifying the forwarding
behaviour of any routers. However, below, two changes are proposed
when forwarding packets with a per-hop-behaviour that requires
pre-congestion notification:</t>
<list style="hanging">
<t hangText="Preferential drop:">When a router cannot avoid
dropping ECN-capable packets, preferential dropping of packets
with different extended ECN codepoints SHOULD be implemented
between packets within a PHB that uses PCN marking. The drop
preference order to use is defined in <xref
target="repcn_Tab_Drop_Pref" />. Note that to reduce configuration
complexity, Re-Echo and FNE MAY be given the same drop preference,
but if feasible, FNE should be dropped in preference to Re-Echo.
<?rfc needLines="22" ?> <texttable anchor="repcn_Tab_Drop_Pref"
title="Drop Preference of Extended ECN Codepoints (1 = drop 1st)">
<ttcol align="center">ECN field</ttcol>
<ttcol align="center">RE flag</ttcol>
<ttcol align="left">Extended ECN codepoint</ttcol>
<ttcol align="left">Drop Pref</ttcol>
<ttcol align="center">Re-ECN meaning</ttcol>
<c>01</c>
<c>0</c>
<c>Re-Echo</c>
<c>5/4</c>
<c>Re-echoed congestion and RECT</c>
<c>00</c>
<c>1</c>
<c>FNE</c>
<c>4</c>
<c>Feedback not established</c>
<c>01</c>
<c>1</c>
<c>RECT</c>
<c>3</c>
<c>Re-ECN capable transport</c>
<c>10</c>
<c>0</c>
<c>AM(0)</c>
<c>3</c>
<c>Admission Marking with Re-Echo</c>
<c>10</c>
<c>1</c>
<c>AM(-1)</c>
<c>3</c>
<c>Admission Marking
</c>
<c>11</c>
<c>0</c>
<c>PM(0)</c>
<c>2</c>
<c>Pre-emption Marking with Re-Echo</c>
<c>11</c>
<c>1</c>
<c>PM(-1)</c>
<c>2</c>
<c>Pre-emption Marking </c>
<c>00</c>
<c>0</c>
<c>Not-RECT</c>
<c>1</c>
<c>Not re-ECN-capable transport</c>
</texttable></t>
<t>Given this proposal is being advanced at the same time as PCN
itself, we strongly RECOMMEND that preferential drop based on
extended ECN codepoint is added to router forwarding at the same
time as PCN marking. Preferential dropping can be difficult to
implement, but we strongly RECOMMEND this security-related re-ECN
improvement where feasible as it is an effective defence against
flooding attacks.</t>
<t hangText="Marking vs. Drop:">We propose that PCN-routers SHOULD
inspect the RE flag as well as the ECN field to decide whether to
drop or mark PCN DSCPs. They MUST choose drop if the codepoint of
this extended ECN field is Not-RECT. Otherwise they SHOULD mark
(unless, of course, buffer space is exhausted).</t>
<t>A PCN-capable router MUST NOT ever congestion mark a packet
carrying the Not-RECT codepoint because the transport will only
understand drop, not congestion marking. But a PCN-capable router
can mark rather than drop an FNE packet, even though its ECN field
when looked at in isolation is '00' which appears to be a legacy
Not-ECT packet. Therefore, if a packet's RE flag is '1', even if
its ECN field is '00', a PCN-enabled router SHOULD use congestion
marking. This allows the `feedback not established' (FNE)
codepoint to be used for probe packets, in order to pick up PCN
marking when bootstrapping an aggregate.</t>
<t>ECN marking rather than dropping of FNE packets MUST only be
deployed in controlled environments, such as that in <xref
target="I-D.ietf-pcn-architecture" />, where the presence of an egress node that
understands ECN marking is assured. Congestion events might
otherwise be ignored if the receiver only understands drop, rather
than ECN marking. This is because there is no guarantee that ECN
capability has been negotiated if feedback is not established
(FNE). Also, <xref target="Re-TCP" /> places the strong condition
that a router MUST apply drop rather than marking to FNE packets
unless it can guarantee that FNE packets are rate limited either
locally or upstream.</t>
</list>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Extensions" title="Extensions">
<t>If a different signalling system, such as NSIS, were used, but it
provided admission control in a similar way, using pre-congestion
notification (e.g. Arumaithurai <xref
target="I-D.arumaithurai-nsis-pcn" /> or RMD <xref
target="I-D.ietf-nsis-rmd" />) we believe re-ECN could be used to
protect against misbehaving networks in the same way as proposed
above.</t>
</section>
</section>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Emulating_Policing_Re-ECN"
title="Emulating Border Policing with Re-ECN">
<!-- ________________________________________________________________ -->
<t>Note that the re-ECN protocol described in <xref
target="repcn_Re-ECN-RSVP_Protocol" /> above would require
standardisation, whereas operators acting in their own interests would
be expected to deploy policing and monitoring functions similar to those
proposed in the sections below without any further need for
standardisation by the IETF. Flexibility is expected in exactly how
policing and monitoring is done.</t>
<section anchor="repcn_Informal_Terminology"
title="Informal Terminology">
<t>In the rest of this memo, where the context makes it clear, we will
sometimes loosely use the term `congestion' rather than using the
stricter `downstream pre-congestion'. Also we will loosely talk of
positive or negative flows, meaning flows where the moving average of
the downstream pre-congestion metric is persistently positive or
negative. The notion of a negative metric arises because it is derived
by subtracting one metric from another. Of course actual downstream
congestion cannot be negative, only the metric can (whether due to
time lags or deliberate malice).</t>
<t>Just as we will loosely talk of positive and negative flows, we
will also talk of positive or negative packets, meaning packets that
contribute positively or negatively to downstream pre-congestion.</t>
<t>Therefore packets can be considered to have a `worth' of +1, 0 or
-1, which, when multiplied by their size, indicates their contribution
to downstream congestion. Packets will usually be sent with a worth of
0. Blanking the RE flag increments the worth of a packet to +1.
Congestion marking a packet decrements its worth (whether admission
marking or pre-emption marking). Congestion marking a previously
blanked packet cancel out the positive and negative worth of each
marking (a worth of 0). The FNE codepoint is an exception. It has the
same positive worth as a packet with the Re-Echo codepoint. The table
below specifies unambiguously the worth of each extended ECN
codepoint. Note the order is different from the previous table to
emphasise how congestion marking processes decrement the worth. <?rfc needLines="22" ?>
<texttable anchor="repcn_Tab_Worth"
title="'Worth' of Extended ECN Codepoints">
<ttcol align="center">ECN field</ttcol>
<ttcol align="center">RE flag</ttcol>
<ttcol align="left">Extended ECN codepoint</ttcol>
<ttcol align="left">Worth</ttcol>
<ttcol align="center">Re-ECN meaning</ttcol>
<c>00</c>
<c>0</c>
<c>Not-RECT</c>
<c>n/a</c>
<c>Not re-ECN-capable transport</c>
<c>01</c>
<c>0</c>
<c>Re-Echo</c>
<c>+1</c>
<c>Re-echoed congestion and RECT</c>
<c>10</c>
<c>0</c>
<c>AM(0)</c>
<c>0</c>
<c>Admission Marking with Re-Echo</c>
<c>11</c>
<c>0</c>
<c>PM(0)</c>
<c>0</c>
<c>Pre-emption Marking with Re-Echo</c>
<c>00</c>
<c>1</c>
<c>FNE</c>
<c>+1</c>
<c>Feedback not established</c>
<c>01</c>
<c>1</c>
<c>RECT</c>
<c>0</c>
<c>Re-ECN capable transport</c>
<c>10</c>
<c>1</c>
<c>AM(-1)</c>
<c>-1</c>
<c>Admission Marking
</c>
<c>11</c>
<c>1</c>
<c>PM(-1)</c>
<c>-1</c>
<c>Pre-emption Marking</c>
</texttable></t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Policing_Overview" title="Policing Overview">
<t>It will be recalled that downstream congestion can be found by
subtracting upstream congestion from path congestion. <xref
target="repcn_Fig_Policing_Framework" /> displays the difference
between the two plots in <xref
target="repcn_Fig_Up_Down_Congestion_Imprecise" /> to show downstream
pre-congestion across the same path through the Internet.</t>
<t>To emulate border policing, the general idea is for each domain to
apply penalties to its upstream neighbour in proportion to the amount
of downstream pre-congestion that the upstream network sends across
the border. That is, the penalties should be in proportion to the
height of the plot. Downward arrows in the figure show the resulting
pressure for each domain to under-declare downstream pre-congestion in
traffic they pass to the next domain, because of the penalties. <?rfc needLines="23" ?>
<figure anchor="repcn_Fig_Policing_Framework"
title="Policing Framework, showing creation of opposing pressures to under-declare and over-declare downstream pre-congestion, using penalties and sanctions">
<artwork><![CDATA[
p e n a l t i e s
/ | \
A : : :
| | <--A---> <---B---> <---C---> domain
| V : : :
3% | +-----+ | | :
| | | V V :
2% | | +----------------------+ :
| | downstream pre-congestion | :
1% | | : | :
| | : | :
0% +----+----------------------------+====+------>
: : : A :
: : : | :
ingress : : : egress
1.00% 2.00%: pre-congestion
|
sanctions
]]></artwork>
</figure></t>
<t>These penalties seem to encourage everyone to understate downstream
congestion in order to reduce the penalties they incur. But a
balancing pressure is introduced by the last domain, which applies
sanctions to flows if downstream congestion goes negative before the
egress gateway. The upward arrow at Domain C's border with the egress
gateway represents the incentive the sanctions would create to prevent
negative traffic. The same upward pressure can be applied at any
domain border (arrows not shown).</t>
<t>Any flow that persistently goes negative by the time it leaves a
domain must not have been marked correctly in the first place. A
domain that discovers such a flow can adopt a range of strategies to
protect itself. Which strategy it uses will depend on policy, because
it cannot immediately assume malice—there may be an innocent
configuration error somewhere in the system.</t>
<t>This memo does not propose to standardise any particular mechanism
to detect persistently negative flows, but <xref
target="repcn_Sanctioning_Dishonest_Marking" /> does give examples.
Note that we have used the term flow, but there will be no need to
bury into the transport layer for port numbers; identifiers visible in
the network layer will be sufficient (IP address pair, DSCP, protocol
ID). The appendix also gives a mechanism to bound the required flow
state, preventing state exhaustion attacks.</t>
<t>Of course, some domains may trust other domains to comply with
admission control without applying sanctions or penalties. In these
cases, the protocol should still be used but no penalties need be
applied. The re-ECN protocol ensures downstream pre-congestion marking
is passed on correctly whether or not penalties are applied to it, so
the system works just as well with a mixture of some domains trusting
each other and others not.</t>
<t>Providers should be free to agree the contractual terms they wish
between themselves, so this memo does not propose to standardise how
these penalties would be applied. It is sufficient to standardise the
re-ECN protocol so the downstream pre-congestion metric is available
if providers choose to use it. However, the next section (<xref
target="repcn_Pre-requisite_Contract" />) gives some examples of how
these penalties might be implemented.</t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Pre-requisite_Contract"
title="Pre-requisite Contractual Arrangements">
<t>The re-ECN protocol has been chosen to solve the policing problem
because it embeds a downstream pre-congestion metric in passing CL
traffic that is difficult to lie about and can be measured in bulk.
The ability to emulate border policing depends on network operators
choosing to use this metric as one of the elements in their contracts
with each other.</t>
<t>Already many inter-domain agreements involve a capacity and a usage
element. The usage element may be based on volume or various measures
of peak demand. We expect that those network operators who choose to
use pre-congestion notification for admission control would also be
willing to consider using this downstream pre-congestion metric as a
usage element in their interconnection contracts for admission
controlled (CL) traffic.</t>
<t>Congestion (or pre-congestion) has the dimension of [octet], being
the product of volume transferred [octet] and the congestion fraction
[dimensionless], which is the fraction of the offered load that the
network isn't able to serve (or would rather not serve in the case of
pre-congestion). Measuring downstream congestion gives a measure of
the volume transferred but modulated by congestion expected
downstream. So volume transferred during off-peak periods counts as
nearly nothing, while volume transferred at peak times counts very
highly. The re-ECN protocol allows one network to measure how much
pre-congestion has been `dumped' into it by another network. And then
in turn how much of that pre-congestion it dumped into the next
downstream network.</t>
<t><xref target="repcn_Border_Mechanisms" /> describes mechanisms for
calculating border penalties referring to <xref
target="repcn_Alg_Metering" /> for suggested metering algorithms for
downstream congestion at a border router. Conceptually, it could
hardly be simpler. It broadly involves accumulating the volume of
packets with the RE flag blanked and the volume of those with
congestion marking then subtracting the two.</t>
<t>Once this downstream pre-congestion metric is available, operators
are free to choose how they incorporate it into their interconnection
contracts <xref target="IXQoS" />. Some may include a threshold
volume of pre-congestion as a quality measure in their service level
agreement, perhaps with a penalty clause if the upstream network
exceeds this threshold over, say, a month. Others may agree a set of
tiered monthly thresholds, with increasing penalties as each threshold
is exceeded. But, it would be just as easy, and more resistant to
gaming, to do away with discrete thresholds, and instead make the
penalty rise smoothly with the volume of pre-congestion by applying a
price to pre-congestion itself. Then the usage element of the
interconnection contract would directly relate to the volume of
pre-congestion caused by the upstream network.</t>
<t>The direction of penalties and charges relative to the direction of
traffic flow is a constant source of confusion. Typically, where
capacity charges are concerned, lower tier customer networks pay
higher tier provider networks. So money flows from the edges to the
middle of the internetwork, towards greater connectivity, irrespective
of the flow of data. But we advise that penalties or charges for usage
should follow the same direction as the data flow—the direction
of control at the network layer. Otherwise a network lays itself open
to `denial of funds' attacks. So, where a tier 2 provider sends data
into a tier 3 customer network, we would expect the penalty clauses
for sending too much pre-congestion to be against the tier 2 network,
even though it is the provider.</t>
<t>It may help to remember that data will be flowing in the other
direction too. So the provider network has as much opportunity to levy
usage penalties as its customer, and it can set the price or strength
of its own penalties higher if it chooses. Usage charges in both
directions tend to cancel each other out, which confirms that
usage-charging is less to do with revenue raising and more to do with
encouraging load control discipline in order to smooth peaks and
troughs, improving utilisation and quality.</t>
<t>Further, when operators agree penalties in their interconnection
contracts for sending downstream congestion, they should make sure
that any level of negative marking only equates to zero penalty. In
other words, penalties are always paid in the same direction as the
data, and never against the data flow, even if downstream congestion
seems to be negative. This is consistent with the definition of
physical congestion; when a resource is underutilised, it is not
negatively congested. Its congestion is just zero. So, although short
periods of negative marking can be tolerated to correct temporary
over-declarations due to lags in the feedback system, persistent
downstream negative congestion can have no physical meaning and
therefore must signify a problem. The incentive for domains not to
tolerate persistently negative traffic depends on this principle that
penalties must never be paid against the data flow.</t>
<t>Also note that at the last egress of the Diffserv region, domain C
should not agree to pay any penalties to the egress gateway for
pre-congestion passed to the egress gateway. Downstream pre-congestion
to the egress gateway should have reached zero here. If domain C were
to agree to pay for any remaining downstream pre-congestion, it would
give the egress gateway an incentive to over-declare pre-congestion
feedback and take the resulting profit from domain C.</t>
<t>To focus the discussion, from now on, unless otherwise stated, we
will assume a downstream network charges its upstream neighbour in
proportion to the pre-congestion it sends (V_b in the notation of
<xref target="repcn_Alg_Metering" />). Effectively tiered thresholds
would be just more coarse-grained approximations of the fine-grained
case we choose to examine. If these neighbours had previously agreed
that the (fixed) price per octet of pre-congestion would be L, then
the bill at the end of the month would simply be the product L*V_b,
plus any fixed charges they may also have agreed.</t>
<t>We are well aware that the IETF tries to avoid standardising
technology that depends on a particular business model. Indeed, this
principle is at the heart of all our own work. Our aim here is to make
a new metric available that we believe is superior to all existing
metrics. Then, our aim is to show that border policing can at least
work with the one model we have just outlined. We assume that
operators might then experiment with the metric in other models. Of
course, operators are free to complement this pre-congestion-based
usage element of their charges with traditional capacity charging, and
we expect they will.</t>
<t>Also note well that everything we discuss in this memo only
concerns interconnection within the Diffserv region. ISPs are free to
sell or give away reservations however they want on the retail market.
But of course, interconnection charges will have a bearing on that.
Indeed, in the present scenario, the ingress gateway effectively sells
reservations on one side and buys congestion penalties on the other.
As congestion rises, one can imagine the gateway discovering that
congestion penalties have risen higher than the (probably fixed)
revenue it will earn from selling the next flow reservation. This
encourages the gateway to cut its losses by blocking new calls, which
is why we believe downstream congestion penalties can emulate per-flow
rate policing at borders, as the next section explains.</t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Emulation_Rationale_Limits"
title="Emulation of Per-Flow Rate Policing: Rationale and Limits">
<t>The important feature of charging in proportion to congestion
volume is that the penalty aggregates and disaggregates correctly
along with packet flows. This is because the penalty rises linearly
with bit rate (unless congestion is absolutely zero) and linearly with
congestion, because it is the product of them both. So if the packets
crossing a border belong to a thousand flows, and one of those flows
doubles its rate, the ingress gateway forwarding that flow will have
to put twice as much congestion marking into the packets of that flow.
And this extra congestion marking will add proportionately to the
penalties levied at every border the flow crosses in proportion to the
amount of pre-congestion remaining on the path.</t>
<t>Effectively, usage charges will continuously flow from ingress
gateways to the places generating pre-congestion marking, in
proportion to the pre-congestion marking introduced and to the data
rates from those gateways.</t>
<t>As importantly, pre-congestion itself rises super-linearly with
utilisation of a particular resource. So if someone tries to push
another flow into a path that is already signalling enough
pre-congestion to warrant admission control, the penalty will be a lot
greater than it would have been to add the same flow to a less
congested path. This makes the incentive system fairly insensitive to
the actual level of pre-congestion for triggering admission control
that each ingress chooses. The deterrent against exceeding whatever
threshold is chosen rises very quickly with a small amount of
cheating.</t>
<t>These are the properties that allow re-ECN to emulate per-flow
border policing of both rate and admission control. It is not a
perfect emulation of per-flow border policing, but we claim it is
sufficient to at least ensure the cost to others of a cheat is borne
by the cheater, because the penalties are at least proportionate to
the level of the cheat. If an edge network operator is selling
reservations at a large profit over the congestion cost, these
pre-congestion penalties will not be sufficient to ensure networks in
the middle get a share of those profits, but at least they can cover
their costs.</t>
<t>We will now explain with an example. When a whole inter-network is
operating at normal (typically very low) congestion, the
pre-congestion marking from virtual queues will be a little higher
than if the real queues had been used—still low, but more
noticeable. But low congestion levels do not imply that usage <spanx
style="emph">charges</spanx> must also be low. Usage charges will
depend on the <spanx style="emph">price</spanx> L as well.</t>
<t>If the metric of the usage element of an interconnection agreement
was changed from pure volume to pre-congested volume, one would expect
the price of pre-congestion to be arranged so that the total usage
charge remained about the same. So, if an average pre-congestion
fraction turned out to be 1/1000, one would expect that the price L
(per octet) of pre-congestion would be about 1000 times the previously
used (per octet) price for volume. We should add that a switch to
pre-congestion is unlikely to exactly maintain the same overall level
of usage charges, but this argument will be approximately true,
because usage charge will rise to at least the level the market finds
necessary to push back against usage.</t>
<t>From the above example it can be seen why a 1000x higher price will
make operators become acutely sensitive to the congestion they cause
in other networks, which is of course the desired effect; to encourage
networks to <spanx style="emph">control</spanx> the congestion they
allow their users to cause to others.</t>
<t>If any network sends even one flow at higher rate, they will
immediately have to pay proportionately more usage charges. Because
there is no knowledge of reservations within the Diffserv region, no
interior router can police whether the rate of each flow is greater
than each reservation. So the system doesn't truly emulate
rate-policing of each flow. But there is no incentive to pack a higher
rate into a reservation, because the charges are directly proportional
to rate, irrespective of the reservations.</t>
<t>However, if virtual queues start to fill on any path, even though
real queues will still be able to provide low latency service,
pre-congestion marking will rise fairly quickly. It may eventually
reach the threshold where the ingress gateway would deny admission to
new flows. If the ingress gateway cheats and continues to admit new
flows, the affected virtual queues will rapidly fill, even though the
real queues will still be little worse than they were when admission
control should have been invoked. The ingress gateway will have to pay
the penalty for such an extremely high pre-congestion level, so the
pressure to invoke admission control should become unbearable.</t>
<t>The above mechanisms protect against rational operators. In <xref
target="repcn_Fail-safes" /> we discuss how networks can protect
themselves from accidental or deliberate misconfiguration in
neighbouring networks.</t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Sanctioning_Dishonest_Marking"
title="Sanctioning Dishonest Marking">
<t>As CL traffic leaves the last network before the egress gateway
(domain C) the RE blanking fraction should match the congestion
marking fraction, when averaged over a sufficiently long duration
(perhaps ~10s to allow a few rounds of feedback through regular
signalling of new and refreshed reservations).</t>
<t>To protect itself, domain C should install a monitor at its egress.
It aims to detect flows of CL packets that are persistently negative.
If flows are positive, domain C need take no action—this simply
means an upstream network must be paying more penalties than it needs
to. <xref target="repcn_Alg_Sanction_Negative" /> gives a suggested
algorithm for the monitor, meeting the criteria below. <list
style="symbols">
<t>It SHOULD introduce minimal false positives for honest
flows;</t>
<t>It SHOULD quickly detect and sanction dishonest flows (minimal
false negatives);</t>
<t>It MUST be invulnerable to state exhaustion attacks from
malicious sources. For instance, if the dropper uses flow-state,
it should not be possible for a source to send numerous packets,
each with a different flow ID, to force the dropper to exhaust its
memory capacity;</t>
<t>It MUST introduce sufficient loss in goodput so that malicious
sources cannot play off losses in the egress dropper against
higher allowed throughput. Salvatori <xref
target="CLoop_pol" /> describes this attack, which involves the
source understating path congestion then inserting forward error
correction (FEC) packets to compensate expected losses.</t>
</list></t>
<t>Note that the monitor operates on flows but with careful design we
can avoid per-flow state. This is why we have been careful to ensure
that all flows MUST start with a packet marked with the FNE codepoint.
If a flow does not start with the FNE codepoint, a monitor is likely
to treat it unfavourably. This risk makes it worth setting the FNE
codepoint at the start of a flow, even though there is a cost to
setting FNE (positive `worth').</t>
<t>Starting flows with an FNE packet also means that a monitor will be
resistant to state exhaustion attacks from other networks, as the
monitor can then be designed to never create state unless an FNE
packet arrives. And an FNE packet counts positive, so it will cost a
lot for a network to send many of them.</t>
<t>Monitor algorithms will often maintain a moving average across
flows of the fraction of RE blanked packets. When maintaining an
average across flows, a monitor MUST ignore packets with the FNE
codepoint set. An ingress gateway sets the FNE codepoint when it does
not have the benefit of feedback from the egress. So counting packets
with FNE cleared would be likely to make the average unnecessarily
positive, providing headroom (or should we say footroom?) for
dishonest (negative) traffic.</t>
<t>If the monitor detects a persistently negative flow, it could drop
sufficient negative and neutral packets to force the flow to not be
negative. This is the approach taken for the `egress dropper' in <xref
target="Re-TCP" />, but for the scenario in this memo, where everyone
would expect everyone else to keep to the protocol, a management alarm
SHOULD be raised on detecting persistently negative traffic and any
automatic sanctions taken SHOULD be logged. Even if the chosen policy
is to take no automatic action, the cause can then be investigated
manually.</t>
<t>Then all ingresses cannot understate downstream pre-congestion
without their action being logged. So network operators can deal with
offending networks at the human level, out of band. As a last resort,
perhaps where the ingress gateway address seems to have been spoofed
in the signalling, packets can be dropped. Drops could be focused on
just sufficient packets in misbehaving flows to remove the negative
bias while doing minimal harm.</t>
<t>A future version of this memo may define a control message that
could be used to notify an offending ingress gateway (possibly via the
egress gateway) that it is sending persistently negative flows.
However, we are aware that such messages could be used to test the
sensitivity of the detection system, so currently we prefer silent
sanctions.</t>
<t>An extreme scenario would be where an ingress gateway (or set of
gateways) mounted a DoS attack against another network. If their
traffic caused sufficient congestion to lead to drop but they
understated path congestion to avoid penalties for causing high
congestion, the preferential drop recommendations in <xref
target="repcn_Router_Forwarding_Behaviour" /> would at least ensure
that these flows would always be dropped before honest flows..</t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Border_Mechanisms" title="Border Mechanisms">
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Border_Accounting_Mechanisms"
title="Border Accounting Mechanisms"><t>One of the main design goals
of re-ECN was for border security mechanisms to be as simple as
possible, otherwise they would become the pinch-points that limit
scalability of the whole internetwork. As the title of this memo
suggests, we want to avoid per-flow processing at borders. We also
want to keep to passive mechanisms that can monitor traffic in
parallel to forwarding, rather than having to filter traffic
inline—in series with forwarding. As data rates continue to
rise, we suspect that all-optical interconnection between networks
will soon be a requirement. So we want to avoid any new need for
buffering (even though border filtering is current practice for other
reasons, we don't want to make it even less likely that we will ever
get rid of it). </t> <t>So far, we have been able to keep the border
mechanisms simple, despite having had to harden them against some
subtle attacks on the re-ECN design. The mechanisms are still passive
and avoid per-flow processing, although we do use filtering as a
fail-safe to temporarily shield against extreme events in other
networks, such as accidental misconfigurations (<xref
target="repcn_Fail-safes" />). </t> <t>The basic accounting mechanism
at each border interface simply involves accumulating the volume of
packets with positive worth (Re-Echo and FNE), and subtracting the
volume of those with negative worth: AM(-1) and PM(-1). Even though
this mechanism takes no regard of flows, over an accounting period
(say a month) this subtraction will account for the downstream
congestion caused by all the flows traversing the interface, wherever
they come from, and wherever they go to. The two networks can agree to
use this metric however they wish to determine some congestion-related
penalty against the upstream network (see <xref
target="repcn_Pre-requisite_Contract" /> for examples). Although the
algorithm could hardly be simpler, it is spelled out using pseudo-code
in <xref target="repcn_Bulk_Alg_Metering" />. </t> {ToDo: Replace the
XML from here to just before "Note that the guiding principle..." with
that in draft-briscoe-tsvwg-re-ecn-border-cheat-02a_fragment.xml}
<t>Various attempts to subvert the re-ECN design have been made. In
all cases their root cause is persistently negative flows. But, after
describing these attacks we will show that we don't actually have to
get rid of all persistently negative flows in order to thwart the
attacks. </t> <t>In honest flows, downstream congestion is measured as
positive minus negative volume. So if all flows are honest (i.e. not
persistently negative), adding all positive volume and all negative
volume without regard to flows will give an aggregate measure of
downstream congestion. But such simple aggregation is only possible if
no flows are persistently negative. Unless persistently negative flows
are completely removed, they will reduce the aggregate measure of
congestion. The aggregate may still be positive overall, but not as
positive as it would have been had the negative flows been removed.
</t> <t>In <xref target="repcn_Sanctioning_Dishonest_Marking" /> we
discussed how to sanction traffic to remove, or at least to identify,
persistently negative flows. But, even if the sanction for negative
traffic is to discard it, unless it is discarded at the exact point it
goes negative, it will wrongly subtract from aggregate downstream
congestion, at least at any borders it crosses after it has gone
negative but before it is discarded. </t> <t>We rely on sanctions to
deter dishonest understatement of congestion. But even the ultimate
sanction of discard can only be effective if the sender is bothered
about the data getting through to its destination. A number of attacks
have been identified where a sender gains from sending dummy traffic
or it can attack someone or something using dummy traffic even though
it isn't communicating any information to anyone: <list
style="symbols">
<t>A network can simply create its own dummy traffic to congest
another network, perhaps causing it to lose business at no cost to
the attacking network. This is a form of denial of service
perpetrated by one network on another. The preferential drop
measures in <xref target="repcn_Router_Forwarding_Behaviour" />
provide crude protection against such attacks, but we are not
overly worried about more accurate prevention measures, because it
is already possible for networks to DoS other networks on the
general Internet, but they generally don't because of the grave
consequences of being found out. We are only concerned if re-ECN
increases the motivation for such an attack, as in the next
example.</t>
<t>A network can just generate negative traffic and send it over
its border with a neighbour to reduce the overall penalties that
it should pay to that neighbour. It could even initialise the TTL
so it expired shortly after entering the neighbouring network,
reducing the chance of detection further downstream. This attack
need not be motivated by a desire to deny service and indeed need
not cause denial of service. A network's main motivator would most
likely be to reduce the penalties it pays to a neighbour. But, the
prospect of financial gain might tempt the network into mounting a
DoS attack on the other network as well, given the gain would
offset some of the risk of being detected.</t>
</list> </t> <t>Note that we have not included DoS by Internet hosts
in the above list of attacks, because we have restricted ourselves to
a scenario with edge-to-edge admission control across a Diffserv
region. In this case, the edge ingress gateways insulate the Diffserv
region from DoS by Internet hosts. Re-ECN resists more general DoS
attacks, but this is discussed in <xref target="Re-TCP" />. </t>
<t>The first step towards a solution to all these problems with
negative flows is to be able to estimate the contribution they make to
downstream congestion at a border and to correct the measure
accordingly. Although ideally we want to remove negative flows
themselves, perhaps surprisingly, the most effective first step is to
cancel out the polluting effect negative flows have on the measure of
downstream congestion at a border. It is more important to get an
unbiased estimate of their effect, than to try to remove them all. A
suggested algorithm to give an unbiased estimate of the contribution
from negative flows to the downstream congestion measure is given in
<xref target="repcn_Inflation_Negative_Flows" />. </t> <t>Although
making an accurate assessment of the contribution from negative flows
may not be easy, just the single step of neutralising their polluting
effect on congestion metrics removes all the gains networks could
otherwise make from mounting dummy traffic attacks on each other. This
puts all networks on the same side (only with respect to negative
flows of course), rather than being pitched against each other. The
network where this flow goes negative as well as all the networks
downstream lose out from not being reimbursed for any congestion this
flow causes. So they all have an interest in getting rid of these
negative flows. Networks forwarding a flow before it goes negative
aren't strictly on the same side, but they are disinterested
bystanders—they don't care that the flow goes negative
downstream, but at least they can't actively gain from making it go
negative. The problem becomes localised so that once a flow goes
negative, all the networks from where it happens and beyond downstream
each have a small problem, each can detect it has a problem and each
can get rid of the problem if it chooses to. But negative flows can no
longer be used for any new attacks. </t> <t>Once an unbiased estimate
of the effect of negative flows can be made, the problem reduces to
detecting and preferably removing flows that have gone negative as
soon as possible. But importantly, complete eradication of negative
flows is no longer critical—best endeavours will be sufficient.
</t> <t>Note that the guiding principle behind all the above
discussion is that any gain from subverting the protocol should be
precisely neutralised, rather than punished. If a gain is punished to
a greater extent than is sufficient to neutralise it, it will most
likely open up a new vulnerability, where the amplifying effect of the
punishment mechanism can be turned on others. </t> <t>For instance, if
possible, flows should be removed as soon as they go negative, but we
do NOT RECOMMEND any attempts to discard such flows further upstream
while they are still positive. Such over-zealous push-back is
unnecessary and potentially dangerous. These flows have paid their
`fare' up to the point they go negative, so there is no harm in
delivering them that far. If someone downstream asks for a flow to be
dropped as near to the source as possible, because they say it is
going to become negative later, an upstream node cannot test the truth
of this assertion. Rather than have to authenticate such messages,
re-ECN has been designed so that flows can be dropped solely based on
locally measurable evidence. A message hinting that a flow should be
watched closely to test for negativity is fine. But not a message that
claims that a positive flow will go negative later, so it should be
dropped. . </t></section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Competitive_Routing"
title="Competitive Routing">
<t>With the above penalty system, each domain seems to have a
perverse incentive to fake pre-congestion. For instance domain B
profits from the difference between penalties it receives at its
ingress (its revenue) and those it pays at its egress (its cost). So
if B overstates internal pre-congestion it seems to increase its
profit. However, we can assume that domain A could bypass B, routing
through other domains to reach the egress. So the competitive
discipline of least-cost routing can ensure that any domain tempted
to fake pre-congestion for profit risks losing <spanx
style="emph">all</spanx> its incoming traffic. The least congested
route would eventually be able to win this competitive game, only as
long as it didn't declare more fake pre-congestion than the next
most competitive route.</t>
<t>The competitive effect of interdomain routing might be weaker
nearer to the egress. For instance, C may be the only route B can
take to reach the ultimate receiver. And if C over-penalises B, the
egress gateway and the ultimate receiver seem to have no incentive
to move their terminating attachment to another network, because
only B and those upstream of B suffer the higher penalties. However,
we must remember that we are only looking at the money flows at the
unidirectional network layer. There are likely to be all sorts of
higher level business models constructed over the top of these low
level 'sender-pays' penalties. For instance, we might expect a
session layer charging model where the session originator pays for a
pair of duplex flows, one as receiver and one as sender.
Traditionally this has been a common model for telephony and we
might expect it to be used, at least sometimes, for other media such
as video. Wherever such a model is used, the data receiver will be
directly affected if its sessions terminate through a network like C
that fakes congestion to over-penalise B. So end-customers will
experience a direct competitive pressure to switch to cheaper
networks, away from networks like C that try to over-penalise B.</t>
<t>This memo does not need to standardise any particular mechanism
for routing based on re-ECN. Goldenberg et al <xref
target="Smart_rtg" /> refers to various commercial products and
presents its own algorithms for moving traffic between multi-homed
routes based on usage charges. None of these systems require any
changes to standards protocols because the choice between the
available border gateway protocol (BGP) routes is based on a
combination of local knowledge of the charging regime and local
measurement of traffic levels. If, as we propose, charges or
penalties were based on the level of re-ECN measured in passing
traffic, a similar optimisation could be achieved without requiring
any changes to standard routing protocols.</t>
<t>We must be clear that applying pre-congestion-based routing to
this admission control system remains an open research issue.
Traffic engineering based on congestion requires careful damping to
avoid oscillations, and should not be attempted without adult
supervision :) Mortier & Pratt <xref target="ECN-BGP" />
have analysed traffic engineering based on congestion. But without
the benefit of re-ECN, they had to add a path attribute to BGP to
advertise a route's downstream congestion (actually they proposed
that BGP should advertise the charge for congestion, which we
believe wrongly embeds an assumption into BGP that the only thing to
do with congestion is charge for it).</t>
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Fail-safes" title="Fail-safes">
<t>The mechanisms described so far create incentives for rational
operators to behave. That is, one operator aims to make another
behave responsibly by applying penalties and expects a rational
response (i.e. one that trades off costs against benefits). It is
usually reasonable to assume that other network operators will
behave rationally (policy routing can avoid those that might not).
But this approach does not protect against the misconfigurations and
accidents of other operators.</t>
<t>Therefore, we propose the following two mechanisms at a network's
borders to provide "defence in depth". Both are similar: <list
style="hanging">
<t hangText="Highly positive flows:">A small sample of positive
packets should be picked randomly as they cross a border
interface. Then subsequent packets matching the same source and
destination address and DSCP should be monitored. If the
fraction of positive marking is well above a threshold (to be
determined by operational practice), a management alarm SHOULD
be raised, and the flow MAY be automatically subject to focused
drop.</t>
<t hangText="Persistently negative flows:">A small sample of
congestion marked packets should be picked randomly as they
cross a border interface. Then subsequent packets matching the
same source and destination address and DSCP should be
monitored. If the RE blanking fraction minus the congestion
marking fraction is persistently negative, a management alarm
SHOULD be raised, and the flow MAY be automatically subject to
focused drop.</t>
</list></t>
<t>Both these mechanisms rely on the fact that highly positive (or
negative) flows will appear more quickly in the sample by selecting
randomly solely from positive (or negative) packets.</t>
<t>Note that there is no assumption that <spanx
style="emph">users</spanx> behave rationally. The system is
protected from the vagaries of irrational user behaviour by the
ingress gateways, which transform internal penalties into a
deterministic, admission control mechanism that prevents users from
misbehaving, by directly engineered means.</t>
</section>
</section>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Analysis" title="Analysis">
<t>The domains in <xref target="repcn_Fig_Scenario" /> are not expected
to be completely malicious towards each other. After all, we can assume
that they are all co-operating to provide an internetworking service to
the benefit of each of them and their customers. Otherwise their routing
polices would not interconnect them in the first place. However, we
assume that they are also competitors of each other. So a network may
try to contravene our proposed protocol if it would gain or make a
competitor lose, or both, but only if it can do so without being caught.
Therefore we do not have to consider every possible random attack one
network could launch on the traffic of another, given anyway one network
can always drop or corrupt packets that it forwards on behalf of
another.</t>
<t>Therefore, we only consider new opportunities for <spanx
style="emph">gainful</spanx> attack that our proposal introduces. But to
a certain extent we can also rely on the in depth defences we have
described (<xref target="repcn_Fail-safes" /> ) intended to mitigate the
potential impact if one network accidentally misconfiguring the workings
of this protocol.</t>
<t>The ingress and egress gateways are shown in the most generic
arrangement possible in <xref target="repcn_Fig_Scenario" />, without
any surrounding network. This allows us to consider more specific cases
where these gateways and a neighbouring network are operated by the same
player. As well as cases where the same player operates neighbouring
networks, we will also consider cases where the two gateways collude as
one player and where the sender and receiver collude as one. Collusion
of other sets of domains is less likely, but we will consider such
cases. In the general case, we will assume none of the nine trust
domains across the figure fully trust any of the others.</t>
<t>As we only propose to change routers within the Diffserv region, we
assume the operators of networks outside the region will be doing
per-flow policing. That is, we assume the networks outside the Diffserv
region and the gateways around its edges can protect themselves. So
given we are proposing to remove flow policing from some networks, our
primary concern must be to protect networks that don't do per-flow
policing (the potential `victims') from those that do (the `enemy'). The
ingress and egress gateways are the only way the outer enemy can get at
the middle victim, so we can consider the gateways as the
representatives of the enemy as far as domains A, B and C are concerned.
We will call this trust scenario `edges against middles'.</t>
<!-- <t>The general arrangement is similar to Intserv over Diffserv <xref target="RFC2998" /> with per-flow reservation processing outside the Diffserv region, but interior routers within it configured to ignore flow signalling. Exactly how per-flow reservations are achieved in the outer region is not of particular concern. To be concrete, we take Intserv <xref target="RFC1633" /> as the architecture of the outer per-flow region, but other architectures may be used such as proprietary bandwidth brokers or some future signalling architecture such as NSIS <xref target="RFC4080" />, or perhaps some hybrid of these.
</t>
-->
<t>Earlier in this memo, we outlined the classic border rate policing
problem (<xref target="repcn_Problem" />). It will now be useful to
reiterate the motivations that are the root cause of the problem. The
more reservations a gateway can allow, the more revenue it receives. The
middle networks want the edges to comply with the admission control
protocol when they become so congested that their service to others
might suffer. The middle networks also want to ensure the edges cannot
steal more service from them than they are entitled to.</t>
<t>In the context of this `edges against middles' scenario, the re-ECN
protocol has two main effects: <list style="symbols">
<t>The more pre-congestion there is on a path across the Diffserv
region, the higher the ingress gateway must declare downstream
pre-congestion.</t>
<t>If the ingress gateway does not declare downstream pre-congestion
high enough on average, it will `hit the ground before the runway',
going negative and triggering sanctions, either directly against the
traffic or against the ingress gateway at a management level</t>
</list></t>
<t>An executive summary of our security analysis can be stated in three
parts, distinguished by the type of collusion considered. <list
style="hanging">
<t hangText="Neighbour-only Middle-Middle Collusion:">Here there is
no collusion or collusion is limited to neighbours in the feedback
loop. In other words, two neighbouring networks can be assumed to
act as one. Or the egress gateway might collude with domain C. Or
the ingress gateway might collude with domain A. Or ingress and
egress gateways might collude with each other.</t>
<t>In these cases where only neighbours in the feedback loop
collude, we concludes that all parties have a positive incentive to
declare downstream pre-congestion truthfully, and the ingress
gateway has a positive incentive to invoke admission control when
congestion rises above the admission threshold in any network in the
region (including its own). No party has an incentive to send more
traffic than declared in reservation signalling (even though only
the gateways read this signalling). In short, no party can gain at
the expense of another.</t>
<t hangText="Non-neighbour Middle-Middle Collusion:">In the case of
other forms of collusion between middle networks (e.g. between
domain A and C) it would be possible for say A & C to create a
tunnel between themselves so that A would gain at the expense of B.
But C would then lose the gain that A had made. Therefore the value
to A & C of colluding to mount this attack seems questionable.
It is made more questionable, because the attack can be
statistically detected by B using the second `defence in depth'
mechanism mentioned already. Note that C can defend itself from
being attacked through a tunnel by treating the tunnel end point as
a direct link to a neighbouring network (e.g. as if A were a
neighbour of C, via the tunnel), which falls back to the safety of
the neighbour-only scenario.</t>
<t hangText="Middle-Edge Collusion:">Collusion between networks or
gateways within the Diffserv region and networks or users outside
the region has not yet been fully analysed. The presence of full
per-flow policing at the ingress gateway seems to make this a less
likely source of a successful attack.</t>
</list></t>
<t>{ToDo: Due to lack of time, the full write up of the security
analysis is deferred to the next version of this memo.}</t>
<t>Finally, it is well known that the best person to analyse the
security of a system is not the designer. Therefore, our confident
claims must be hedged with doubt until others with perhaps a greater
incentive to break it have mounted a full analysis.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Deployment" title="Incremental Deployment">
<t>We believe ECN has so far not been widely deployed because it
requires widespread end system and network deployment just to achieve a
marginal improvement in performance. The ability to offer a new service
(admission control) would be a much stronger driver for ECN
deployment.</t>
<t>As stated in the introduction, the aim of this memo is to "Design in
security from the start" when admission control is based on
pre-congestion notification. The proposal has been designed so that
security can be added some time after first deployment, but only if the
PCN wire protocol encoding is defined with the foresight to accommodate
the extended set of codepoints defined in this document. Given admission
control based on pre-congestion notification requires few changes to
standards, it should be deployable fairly soon. However, re-ECN requires
a change to IP, which may take a little longer.</t>
<t>We expect that initial deployments of PCN-based admission control
will be confined to single networks, or to clubs of networks that trust
each other. The proposal in this memo will only become relevant once
networks with conflicting interests wish to interconnect their admission
controlled services, but without the scalability constraints of per-flow
border policing. It will not be possible to use re-ECN, even in a
controlled environment between consenting operators, unless it is
standardised into IP. Given the IPv4 header has limited space for
further changes, current IESG policy <xref target="RFC4727" /> is not to
allow experimental use of codepoints in the IPv4 header, as whenever an
experiment isn't taken up, the space it used tends to be impossible to
reclaim.</t>
<t>If PCN-based admission control is deployed before re-ECN is
standardised into IP, wherever a networks (or club of networks) connects
to another network (or club of networks) with conflicting interests,
they will place a gateway between the two regions that does per-flow
rate policing and admission control. If re-ECN is eventually
standardised into IP, it will be possible for these separate regions to
upgrade all their gateways to use re-ECN before removing the per-flow
policing gateways between them. Given the edge-to-edge deployment model
of PCN-based admission control, it is reasonable to imagine this
incremental deployment model without needing to cater for partial
deployment of re-ECN in just some of the gateways around one Diffserv
region.</t>
<t>Only the edge gateways around a Diffserv region have to be upgraded
to add re-ECN support, not interior routers. It is also necessary to add
the mechanisms that use re-ECN to secure a network against misbehaving
gateways and networks. Specifically, these are the border mechanisms
(<xref target="repcn_Border_Mechanisms" />) and the mechanisms to
sanction dishonest marking (<xref
target="repcn_Sanctioning_Dishonest_Marking" />).</t>
<t>We also RECOMMEND adding improvements to forwarding on interior
routers (<xref target="repcn_Router_Forwarding_Behaviour" />). But the
system works whether all, some or none are upgraded, so interior routers
may be upgraded in a piecemeal fashion at any time.</t>
</section>
{ToDo: ECN deployment for admission control can reap immediate benefits when deployed unilaterally by one network operator, without any need to change end systems. Then, as more networks interconnect, the gains increase, due to cost savings at border gateways. Further, the benefits to individual networks are immediate and considerable. {ToDo: can I reveal these results? For instance, an internal BT study has compared the average capacity per link to provide admission control across a Diffserv region with and without pre-congestion notification. The pre-congestion notification solution requires less than a quarter of the capacity to serve forecast voice traffic load.}
<!-- ================================================================ -->
<section anchor="repcn_Rationale"
title="Design Choices and Rationale"><t>The primary insight of this work
is that downstream congestion is the metric that would be most useful to
control an internetwork, and particularly to police how one network
responds to the congestion it causes in a remote network. This is the
problem that has previously made it so hard to provide scalable admission
control. </t> <t>The case for using re-feedback (a generalisation of
re-ECN) to police congestion response and provide QoS is made in <xref
target="Re-fb" />. Essentially, the insight is that congestion is a factor
that crosses layers from the physical upwards. Therefore re-feedback
polices congestion where it emerges from a physical interface between
networks. This is achieved by bringing the congestion information to the
interface, rather than examining packet addressing where there is
congestion. Then congestion crossing the physical interface at a border
can be policed at the interface, rather than policing the congestion on
packets that claim to come from an address (which may be spoofed). Also,
re-feedback works in the network layer independently of other
layers—despite its name re-feedback does not actually require
feedback. It requires a source to act conservatively before it gets
feedback. </t> <t>On the subject of lack of feedback, the feedback not
established (FNE) codepoint is motivated by arguments for a state set-up
bit in IP to prevent state exhaustion attacks. This idea was first put
forward informally by David Clark and documented by Handley and Greenhalgh
in <xref target="Steps_DoS" />. The idea is that network layer datagrams
should signal explicitly when they require state to be created in the
network layer or the layer above (e.g. at flow start). Then a node can
refuse to create any state unless a datagram declares this intent. We
believe the proposed FNE codepoint serves the same purpose as the proposed
state-set-up bit, but it has been overloaded with a more specific purpose,
using it on more packets than just the first in a flow, but never less
(i.e. it is idempotent). In effect the FNE codepoint serves the purpose of
a `soft-state set-up codepoint'. </t> <t>The re-feedback paper <xref
target="Re-fb" /> also makes the case for converting the economic
interpretation of congestion into hard engineering mechanism, which is the
basis of the approach used in this memo. The admission control gateways
around the Diffserv region use hard engineering, not incentives, to
prevent end users from sending more traffic than they have reserved.
Incentive-based mechanisms are only used between networks, because they
are expected to respond to incentives more rationally than end-users can
be expected to. However, even then, a network can use fail-safes to
protect itself from excessively unusual behaviour by neighbouring
networks, whether due to an accidental misconfiguration or malicious
intent. </t> <t>The guiding principle behind the incentive-based approach
used between networks is that any gain from subverting the protocol should
be precisely neutralised, rather than punished. If a gain is punished to a
greater extent than is sufficient to neutralise it, it will most likely
open up a new vulnerability, where the amplifying effect of the punishment
mechanism can be turned on others. </t> <t>The re-feedback paper also
makes the case against the use of congestion charging to police congestion
if it is based on classic feedback (where only upstream congestion is
visible to network elements). It argues this would open up receiving
networks to `denial of funds' attacks and would require end users to
accept dynamic pricing (which few would). </t> <t>Re-ECN has been
deliberately designed to simplify policing at the borders between
networks. These trust boundaries are the critical pinch-points that will
limit the scalability of the whole internetwork unless the overall design
minimises the complexity of security functions at these borders. The
border mechanisms described in this memo run passively in parallel to data
forwarding and they do not require per-flow processing. </t> {ToDo: Why a
step marking regime wouldn't be as effective.}</section>
<!-- ================================================================ -->
<section anchor="repcn_Security_Considerations"
title="Security Considerations"><t>This whole memo concerns the security
of a scalable admission control system. In particular the analysis
section. Below some specific security issues are mentioned that did not
belong elsewhere or which comment on the overall robustness of the
security provided by the design. </t> <t>Firstly, we must repeat the
statement of applicability in the analysis: that we only consider new
opportunities for <spanx style="emph">gainful</spanx> attack that our
proposal introduces, particularly if the attacker can avoid being
identified. Despite only involving a few bits, there is sufficient
complexity in the whole system that there are probably numerous
possibilities for other attacks. However, as far as we are aware, none
reap any benefit to the attacker. For instance, it would be possible for a
downstream network to remove the congestion markings introduced by an
upstream network, but it would only lose out on the penalties it could
apply to a downstream network. </t> <t>When one network forwards a
neighbouring network's traffic it will always be possible to cause damage
by dropping or corrupting it. Therefore we do not believe networks would
set their routing policies to interconnect in the first place if they
didn't trust the other networks not to arbitrarily damage their traffic.
</t> <t>Having said this, we do want to highlight some of the weaker parts
of our argument. We have argued that networks will be dissuaded from
faking congestion marking by the possibility that upstream networks will
route round them. As we have said, these arguments are based on fairly
delicate assumptions and will remain fairly tenuous until proved in
practice, particularly close to the egress where less competitive routing
is likely. </t> <t>We should also point out that the approach in this memo
was only designed to be robust for admission control. We do not claim the
incentives will always be strong enough to force correct flow pre-emption
behaviour. This is because a user will tend to perceive much greater loss
in value if a flow is pre-empted than if admission is denied at the start.
However, in general the incentives for correct flow pre-emption are
similar to those for admission control. </t> <t>Finally, it may seem that
the 8 codepoints that have been made available by extending the ECN field
with the RE flag have been used rather wastefully. In effect the RE flag
has been used as an orthogonal single bit in nearly all cases. The only
exception being when the ECN field is cleared to <spanx
style="verb">00</spanx>. The mapping of the codepoints in an earlier
version of this proposal used the codepoint space more efficiently, but
the scheme became vulnerable to a network operator focusing its congestion
marking to mark more positive than neutral packets in order to reduce its
penalties (see Appendix B of <xref target="Re-TCP" />). </t> <t>With the
scheme as now proposed, once the RE flag is set or cleared by the sender
or its proxy, it should not be written by the network, only read. So the
gateways can detect if any network maliciously alters the RE flag. IPSec
AH integrity checking does not cover the IPv4 option flags (they were
considered mutable—even the one we propose using for the RE flag
that was `currently unused' when IPSec was defined). But it would be
sufficient for a pair of gateways to make random checks on whether the RE
flag was the same when it reached the egress gateway as when it left the
ingress. Indeed, if IPSec AH had covered the RE flag, any network
intending to alter sufficient RE flags to make a gain would have focused
its alterations on packets without authenticating headers (AHs). </t>
<t>No cryptographic algorithms have been harmed in the making of this
proposal. </t> {ToDo: RFC2474 and SIP analogy}</section>
<!-- ================================================================ -->
<section anchor="repcn_IANA_Considerations" title="IANA Considerations">
<t>This memo includes no request to IANA.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Conclusions" title="Conclusions">
<t>This memo builds on a promising technique to solve the classic
problem of making flow admission control scale to any size network. It
involves the use of Diffserv in a deployment model that uses
pre-congestion notification feedback to control admission into a network
path <xref target="I-D.ietf-pcn-architecture" />. However as it stands, that
deployment model depends on all network domains trusting each other to
comply with the protocols, invoking admission control and flow
pre-emption when requested.</t>
<t>We propose that the congestion feedback used in that deployment model
should be re-echoed into the forward data path, by making a trivial
modification to the ingress gateway. We then explain how the resulting
downstream pre-congestion metric in packets can be monitored in bulk at
borders to sufficiently emulate flow rate policing.</t>
<t>We claim the result of combining these two approaches is an admission
control system that scales to any size network <spanx
style="emph">and</spanx> any number of interconnected networks, even if
they all act in their own interests.</t>
<t>This proposal aims to convince its readers to "Design in Security
from the start," by ensuring the PCN wire protocol encoding can
accommodate the extended set of codepoints defined in this document,
even if border policing is not needed at first. This way, we will not
build ourselves tomorrow's legacy problem.</t>
<t>Re-echoing congestion feedback is based on a principled technique
called Re-ECN <xref target="Re-TCP" />, designed to add
accountability for causing congestion to the general-purpose IP datagram
service. Re-ECN proposes to consume the last completely unused bit in
the basic IPv4 header.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Acknowledgements" title="Acknowledgements">
<t>All the following have given helpful comments and some may become
co-authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve
Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard, Carla
Di Cairano-Gilfedder (BT), Mark Handley (who identified the excess
canceled packets attack), Stephen Hailes, Adam Greenhalgh (UCL),
Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz, Kwok-Ho Chan,
Corey Alexander (Nortel), David Clark, Bill Lehr, Sharon Gillett, Steve
Bauer (MIT) (who publicised various dummy traffic attacks), Sally Floyd
(ICIR) and comments from participants in the CFP/CRN Inter-Provider QoS,
Broadband and DoS-Resistant Internet working groups.</t>
</section>
<!-- ================================================================ -->
<section anchor="repcn_Comments_Solicited" title="Comments Solicited">
<t>Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Congestion and Pre-Congestion Notification working group's
mailing list <pcn@ietf.org>, and/or to the author(s).</t>
</section>
</middle>
<back>
<!-- ================================================================ -->
<references title="Normative References">
<?rfc include="localref.I-D.briscoe-tsvwg-re-ecn-tcp" ?>
<?rfc include="localref.I-D.briscoe-tsvwg-cl-phb" ?>
<?rfc include="localref.I-D.lefaucheur-rsvp-ecn" ?>
<?rfc include="reference.RFC.2119" ?>
<?rfc include="reference.RFC.2211" ?>
<?rfc include="reference.RFC.3168" ?>
<?rfc include="reference.RFC.3246" ?>
</references>
<references title="Informative References">
<?rfc include="reference.I-D.ietf-pcn-architecture" ?>
<?rfc include="reference.RFC.5129" ?>
<?rfc include="reference.I-D.ietf-nsis-rmd.xml" ?>
<?rfc include='reference.I-D.arumaithurai-nsis-pcn'?>
<!-- <?rfc include="reference.RFC.1633" ?> -->
<?rfc include="reference.RFC.2207" ?>
<?rfc include="reference.RFC.2205" ?>
<?rfc include="reference.RFC.2208" ?>
<?rfc include="reference.RFC.2747" ?>
<?rfc include="reference.RFC.2998" ?>
<?rfc include="reference.RFC.3540" ?>
<!-- <?rfc include="reference.RFC.4080" ?> -->
<?rfc include="localref.Briscoe05d.Re-fb_policing" ?>
<?rfc include='reference.RFC.4727'?>
<?rfc include="localref.Briscoe05f.IPQoS_ix.xml" ?>
<?rfc include="localref.Golden04.Smart_routing_multihome.xml" ?>
<?rfc include="localref.Handley04.Steps_DoS_Arch.xml" ?>
<?rfc include="localref.Mortier03.Incentive_BGP.xml" ?>
<?rfc include="localref.Salvatori05a.Re-fb_closed_loop_policing.xml" ?>
</references>
<!-- ================================================================ -->
<section anchor="repcn_Implementation" title="Implementation">
<!-- ________________________________________________________________ -->
<section anchor="repcn_Alg_Blanking_RE"
title="Ingress Gateway Algorithm for Blanking the RE flag">
<t>The ingress gateway receives regular feedback reporting the
fraction of congestion marked octets for each aggregate arriving at
the egress. So for each aggregate it should blank the RE flag on the
same fraction of octets. It is more efficient to calculate the
reciprocal of this fraction when the signalling arrives, Z_0 = (1 /
Congestion-Level-Estimate). Z_0 will be the number of octets of
packets the ingress should send with the RE flag set between those it
sends with the RE flag blanked. Z_0 will also take account of the
sustainable rate reported during the flow pre-emption process, if
necessary.</t>
<t>A suitable pseudo-code algorithm for the ingress gateway is as
follows: <artwork><![CDATA[
====================================================================
B_i = 0 /* interblank volume */
for each PCN-capable packet {
b = readLength(packet) /* set b to packet size */
B_i += b /* accumulate interblank volume */
if B_i < b * Z_0 { /* test whether interblank volume... */
writeRE(1)
} else { /* ...exceeds blank RE spacing * pkt size*/
writeRE(0) /* ...and if so, clear RE */
B_i = 0 /* ...and re-set interblank volume */
}
}
====================================================================
]]></artwork></t>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Alg_Metering"
title="Downstream Congestion Metering Algorithms">
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Bulk_Alg_Metering"
title="Bulk Downstream Congestion Metering Algorithm">
<t>To meter the bulk amount of downstream pre-congestion in traffic
crossing an inter-domain border, an algorithm is needed that
accumulates the size of positive packets and subtracts the size of
negative packets. We maintain two counters: <list style="empty">
<t>V_b: accumulated pre-congestion volume</t>
<t>B: total data volume (in case it is needed)</t>
</list></t>
<t>A suitable pseudo-code algorithm for a border router is as
follows: <artwork><![CDATA[
====================================================================
V_b = 0
B = 0
for each PCN-capable packet {
b = readLength(packet) /* set b to packet size */
B += b /* accumulate total volume */
if readEECN(packet) == (Re-Echo || FNE) {
V_b += b /* increment... */
} elseif readEECN(packet) == ( AM(-1) || PM(-1) ) {
V_b -= b /* ...or decrement V_b... */
} /*...depending on EECN field */
}
====================================================================
]]></artwork></t>
<t>At the end of an accounting period this counter V_b represents
the pre-congestion volume that penalties could be applied to, as
described in <xref target="repcn_Pre-requisite_Contract" />.</t>
<t>For instance, accumulated volume of pre-congestion through a
border interface over a month might be V_b = 5PB (petabyte = 10^15
byte). This might have resulted from an average downstream
pre-congestion level of 1% on an accumulated total data volume of B
= 500PB.</t>
{ToDo: Include algorithm for precise downstream pre-congestion.}
</section>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
<section anchor="repcn_Inflation_Negative_Flows"
title="Inflation Factor for Persistently Negative Flows">
<t>The following process is suggested to complement the simple
algorithm above in order to protect against the various attacks from
persistently negative flows described in <xref
target="repcn_Border_Accounting_Mechanisms"></xref>. As explained in
that section, the most important and first step is to estimate the
contribution of persistently negative flows to the bulk volume of
downstream pre-congestion and to inflate this bulk volume as if
these flows weren't there. The process below has been designed to
give an unbiased estimate, but it may be possible to define other
processes that achieve similar ends.</t>
<t>While the above simple metering algorithm is counting the bulk of
traffic over an accounting period, the meter should also select a
subset of the whole flow ID space that is small enough to be able to
realistically measure but large enough to give a realistic sample.
Many different samples of different subsets of the ID space should
be taken at different times during the accounting period, preferably
covering the whole ID space. During each sample, the meter should
count the volume of positive packets and subtract the volume of
negative, maintaining a separate account for each flow in the
sample. It should run a lot longer than the large majority of flows,
to avoid a bias from missing the starts and ends of flows, which
tend to be positive and negative respectively.</t>
<t>Once the accounting period finishes, the meter should calculate
the total of the accounts V_{bI} for the subset of flows I in the
sample, and the total of the accounts V_{fI} excluding flows with a
negative account from the subset I. Then the weighted mean of all
these samples should be taken a_S = sum_{forall I} V_{fI} /
sum_{forall I} V_{bI}.</t>
<t>If V_b is the result of the bulk accounting algorithm over the
accounting period (<xref target="repcn_Bulk_Alg_Metering"></xref>)
it can be inflated by this factor a_S to get a good unbiased
estimate of the volume of downstream congestion over the accounting
period a_S.V_b, without being polluted by the effect of persistently
negative flows.</t>
</section>
</section>
<!-- ________________________________________________________________ -->
<section anchor="repcn_Alg_Sanction_Negative"
title="Algorithm for Sanctioning Negative Traffic">
<t>{ToDo: Write up algorithms similar to Appendix D of <xref
target="Re-TCP"></xref> for the negative flow monitor with flow
management algorithm and the variant with bounded flow state.}</t>
</section>
</section>
</back>
</rfc>| PAFTECH AB 2003-2026 | 2026-04-22 21:39:07 |