<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC1887 SYSTEM
"http://xml.resource.org/public/rfc/bibxml/reference.RFC.1887.xml">
<!ENTITY I-D.narten-radir-problem-statement SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-radir-problem-statement.xml">
<!ENTITY I-D.irtf-rrg-design-goals SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.irtf-rrg-design-goals.xml">
<!ENTITY I-D.carpenter-renum-needs-work SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.carpenter-renum-needs-work.xml">
<!ENTITY I-D.francis-intra-va SYSTEM
"http://xml.resource.org/public/rfc/bibxml3/reference.I-D.francis-intra-va.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="info" docName="draft-irtf-rrg-recommendation-02"
ipr="trust200811">
<front>
<title abbrev="RRG Recommendation">
Preliminary Recommendation for a Routing Architecture
</title>
<author fullname="Tony Li" initials="T." role="editor"
surname="Li">
<organization>Ericsson</organization>
<address>
<postal>
<street>300 Holger Way</street>
<city>San Jose</city>
<region>CA</region>
<code>95134</code>
<country>USA</country>
</postal>
<phone>+1 408 750 5160</phone>
<email>tony.li@tony.li</email>
</address>
</author>
<date month='March' year="2009" />
<area></area>
<workgroup>Internet Research Task Force</workgroup>
<keyword>routing</keyword>
<abstract>
<t>
It is commonly recognized that the Internet routing and addressing
architecture is facing challenges in scalability, multi-homing, and
inter-domain traffic engineering. This document reports the
Routing Research Group's preliminary findings from its efforts
towards developing a recommendation for a scalable routing
architecture.
</t>
<t>
This document is a work in progress.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
It is commonly recognized that the Internet routing and addressing
architecture is facing challenges in scalability, multi-homing, and
inter-domain traffic engineering. The problem being addressed has
been documented in
<xref target='I-D.narten-radir-problem-statement'/>, and the design
goals that we have agreed to can be found in
<xref target='I-D.irtf-rrg-design-goals'/>. This document reports
the Routing Research Group's (RRG's) preliminary results from its
efforts towards developing a recommendation for a scalable routing
architecture.
</t>
<t>
This document is a work in progress.
</t>
<section title="Structure of This Document">
<t>
This document describes a number of the different possible
approaches that could be taken in a new routing architecture, as
well as a summary of the current thinking of the overall group
regarding each approach.
</t>
</section>
</section>
<section title="Terminology and Abbreviations">
<t>
This section describes the common terminology used in this
document. Particular architectures and discussions frequently
define additional terms, qualify these terms or add additional
semantics.
</t>
<t>
<list style="hanging">
<t hangText="address">
An address is a name that is both an interface locator and an
endpoint identifier.
</t>
<t hangText="FIB">
Forwarding Information Base, also known as the forwarding
table. Typically, the forwarding table contains the subset of
the information in the RIB that is actually needed at
forwarding time.
</t>
<t hangText="GUID">
Globally Unique IDentifier
</t>
<t hangText="ISP">
Internet Service Provider
</t>
<t hangText="identifier">
An identifier is the name of an object; identifiers have no
topological sensitivity, and do not have to change, even if
the object changes its point(s) of attachment within the
network topology. Identifiers may have other properties,
such as the scope of their uniqueness (local or global
(default)), the probability of their uniqueness
(statistical or absolute (default)), and their lifetime
(ephemeral or permanent (default)).
</t>
<t hangText="locator">
A locator is a name that has topological sensitivity at a
given layer and must change if the point of attachment at
that layer changes. By default, a locator refers to layer
3. It is also possible to have locators at other layers.
Locators may have other properties, such as their scope
(local or global (default)) and their lifetime (ephemeral
or permanent (default)).
</t>
<t hangText="multihoming">
A site or host is multihomed if it has multiple topological
connections to the network and the locators for those
connections do not aggregate.
</t>
<t hangText="RIB">
Routing Information Base, also known as the routing table.
</t>
<t hangText="RIR">
Regional Internet Registry
</t>
<t hangText="RLOC">
A Remote LOCator is a locator with global scope.
</t>
<t hangText="SID">
Session IDentifier
</t>
<t hangText="TE">
Traffic Engineering is a technique for controlling the path that
traffic takes beyond baseline methods, such as shortest path
first IGP computations and BGP shortest AS path computation.
</t>
</list>
</t>
</section>
<section title="Taxonomies of the Solution Space">
<t>
In trying to understand the entirety of the solution space that we
are confronted with, we have made multiple attempts to divide the
space into comprehensible sectors. The entire solution space is
complex, and it seems difficult to capture all of the pertinent
dimensions of the space with only a single perspective. Different
taxonomies seem to provide insight during different discussions,
and we summarize all of them here to capture all of the useful
perspectives. Of these, we've found that <xref target="Herrin"/>
is the most useful so far and is where we will continue to focus
our efforts.
</t>
<section title="A Mechanism Taxonomy">
<t>
In this taxonomy, solutions are grouped by the primary mechanisms
that they use to achieve their goals.
</t>
<section anchor="Mechanism:Transport" title="Layer 4 Transport">
<t>
Transport solutions are characterized by their usage of
modifications solely at layer 4 to provide locator
and identifier independence. For example, if a transport
protocol supports connections across multiple addresses as a
means of supporting multi-homed hosts, and can seamlessly and
transparently shift across these addresses, then it can provide
the multi-homing support that is required.
</t>
<t>
However, in our discussions, it became clear that even with
transport level agility, host-level renumbering of sites would
still be necessary to support these types of solutions. The
consensus of the group is that such site renumbering is widely
unacceptable for operational reasons and thus, these types of
solutions are not of interest for further exploration in this
group as the primary basis for a scalable routing architecture.
The advantages of these techniques are undeniable and are
likely to complement other architectural approaches.
</t>
</section>
<section title="Translation">
<t>
Translation solutions are characterized by a translation
operation from an identifier to a locator and back to an
identifier as the packet traverses the network. Translation
approaches do not add additional encapsulations to packets
as they traverse the network, usually translating the fields in
place in the packet. Translation solutions can further
be categorized as those with separated fields for locators and
identifiers and those that continue to use a single address
field. Translation solutions also can be categorized as having
the translation done in the host or in a middle box.
</t>
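<t>
As a rough illustration of middle-box translation with a single
address field, the sketch below swaps the upper 64 bits of an IPv6
destination address for a locator prefix at ingress and restores
them at egress. The 64-bit split, the function name swap_prefix,
and the documentation prefixes are assumptions made for this
example; they are not taken from any particular proposal.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress

LOW64 = (1 << 64) - 1  # assumed: lower 64 bits are left untouched


def swap_prefix(addr, new_prefix):
    """Replace the upper 64 bits of addr with new_prefix."""
    host_bits = int(ipaddress.IPv6Address(addr)) & LOW64
    net = ipaddress.IPv6Network(new_prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    prefix_bits = int(net.network_address)
    return str(ipaddress.IPv6Address(prefix_bits | host_bits))


# Hypothetical identifier and locator prefixes.
ID_PREFIX = "2001:db8:aaaa:1::/64"
LOC_PREFIX = "2001:db8:ffff:7::/64"

dst = "2001:db8:aaaa:1::42"
in_core = swap_prefix(dst, LOC_PREFIX)      # ingress rewrite
restored = swap_prefix(in_core, ID_PREFIX)  # egress rewrite
```
]]></artwork>
</figure>
<t>
Because the field is rewritten in place, the packet size does not
change, which is the property that distinguishes translation from
encapsulation.
</t>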
</section>
<section title="Map &amp; Encap">
<t>
Map &amp; Encap solutions are characterized by a lookup
operation from the identifier to a locator and then an
encapsulation of the packet payload into a tunnel that directs
the packet across the topology.
</t>
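<t>
The two steps, mapping and encapsulation, might be sketched as
follows. The Packet type, the MAPPING table, and the function
names are hypothetical, and the static table merely stands in for
whatever mapping system a concrete proposal would use.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str         # destination identifier, or a locator
    payload: object  # user data, or an encapsulated Packet


# Hypothetical identifier-prefix to locator map.
MAPPING = {
    ipaddress.ip_network("192.0.2.0/24"): "203.0.113.1",
    ipaddress.ip_network("198.51.100.0/24"): "203.0.113.2",
}


def map_lookup(identifier):
    """Return the locator for the longest match, or None."""
    addr = ipaddress.ip_address(identifier)
    matches = [n for n in MAPPING if addr in n]
    if not matches:
        return None
    return MAPPING[max(matches, key=lambda n: n.prefixlen)]


def encapsulate(inner):
    """Wrap inner in an outer packet addressed to its locator."""
    locator = map_lookup(inner.dst)
    if locator is None:
        raise LookupError("no mapping for " + inner.dst)
    return Packet(dst=locator, payload=inner)


def decapsulate(outer):
    """Recover the original packet at the tunnel exit."""
    return outer.payload
```
]]></artwork>
</figure>
<t>
The inner packet is carried unmodified, so the hosts at either end
see only identifiers, while the core forwards on the outer locator.
</t>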
</section>
</section>
<section title="A Functional Taxonomy">
<t>
In solving a problem one must keep clearly separate the goals and
the means. Here the goal is to get a control handle on the
scalability of the routing architecture. Another important issue
to keep in mind is that any change made in one part
of the Internet must do no harm to the rest of the system.
</t>
<section title="FIB Size Reduction">
<t>
One can achieve FIB size reduction through virtual aggregation,
as explained in Paul Francis' draft
<xref target="I-D.francis-intra-va"/>.
</t>
<t>
It is worth pointing out that this approach has been discussed
in slightly different forms, e.g. a talk at NANOG 44, and used
in practice as various forms of default routes.
</t>
<t>
<!-- Added by tli -->
While reducing the FIB size is a laudable goal, alone it is
insufficient in that it does not address the RIB scalability
issue.
</t>
</section>
<section title="RIB Size Reduction">
<t>
EDITOR'S NOTE: Lixia to propose text here.
</t>
</section>
</section>
<section anchor="Herrin" title="The Herrin Taxonomy">
<t>
As part of the mailing list discussion, the group constructed a
more detailed taxonomy of possible architectures, described as a
series of strategies.
</t>
<section title="Strategy A">
<t>
Local routing is based on an address, which functions as a GUID,
SID component, and local locator, but each packet flows
through an encoder which attaches an RLOC before the packet
enters the internetwork core. Routing within the core is based
on the RLOC. Only ISPs with significant interconnection have
their own RLOCs. Fewer than 10,000 such "core ISPs" exist today
and the number is growing much more slowly than the routing
table overall. Once the packet reaches the network identified
by the RLOC, local routing by address takes over for final
delivery. Distribute RLOCs through the core via a typical
distance-vector or link-state routing protocol.
</t>
<section title="Variants">
<t>
<list style="hanging">
<t hangText="A1a">
Each core ISP has one RLOC. The RLOC's existence and
reachability are flooded to the rest of the core.
</t>
<t hangText="A1b">
Each core ISP has a small number of RLOCs for TE. The
RLOCs' existence and reachability are flooded to the rest of
the core.
</t>
<t hangText="A1c">
Each core ISP has an aggregated set of RLOCs which it may
hierarchically assign to customers downstream and/or
disaggregate for TE. The aggregated RLOC's existence and
reachability are flooded to the rest of the core.
</t>
</list>
</t>
</section>
<section title="Mapping approaches">
<t>
<list style="hanging">
<t hangText="A2a">
Addresses are statically mapped to RLOCs. Map entries are
periodically pushed towards a central or distributed
registry. The full list is periodically downloaded to the
encoders which add RLOCs to the packets.
</t>
<t hangText="A2b">
Addresses are dynamically mapped to RLOCs. Map entries are
pushed towards a central or distributed registry as they
change. The registry pushes all incremental changes in
near-real time to all encoders which add RLOCs to the
packets.
</t>
<t hangText="A2c">
Addresses are dynamically mapped to RLOCs. Map entries are
pushed towards a central or distributed registry as they
change. Encoders request and briefly cache individual
mappings from the registry as needed.
</t>
</list>
</t>
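<t>
The pull model of A2c, in which encoders request and briefly cache
individual mappings, could look roughly like the sketch below. The
MappingCache class, the query_registry callable, and the explicit
now argument are all hypothetical; the time-to-live value is an
assumption, since the text only says that entries are cached
briefly.
</t>
<figure>
<artwork><![CDATA[
```python
class MappingCache:
    """Encoder-side cache of address-to-RLOC mappings (A2c)."""

    def __init__(self, query_registry, ttl_seconds):
        self._query = query_registry  # callable: address -> RLOCs
        self._ttl = ttl_seconds       # assumed short lifetime
        self._entries = {}            # address -> (expiry, RLOCs)

    def lookup(self, address, now):
        """Return RLOCs for address, asking the registry on a
        miss or once the cached entry has expired."""
        entry = self._entries.get(address)
        if entry is not None and entry[0] > now:
            return entry[1]
        rlocs = self._query(address)
        self._entries[address] = (now + self._ttl, rlocs)
        return rlocs
```
]]></artwork>
</figure>
<t>
A short lifetime bounds how long an encoder keeps using a stale
mapping, at the price of more registry queries; A2a and A2b avoid
the queries by pushing the full map to every encoder instead.
</t>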
</section>
<section title="Failure handling approaches">
<t>
Link failures in the Internet core cause the RLOCs to be
rerouted with no change to the address to RLOC mapping.
</t>
<t>
<list style="hanging">
<t hangText="A3a">
RLOC encoders detect when particular RLOCs are no longer
reachable at all and fall back on secondary RLOCs for a
particular address. Encoders rely on active failure messages
from some system in the RLOC-specified network to indicate
that a host is no longer available via that RLOC, causing
them to fall back on secondary RLOCs for that host.
</t>
<t hangText="A3b">
Link failures which prevent parts of the RLOC's network from
reaching a destination host or set of hosts it serves cause
an external analysis element to make a dynamic change to the
address-RLOC map, depreferencing or removing the affected
RLOC. The external analysis element may be under the control
of the end-user destination network, the RLOC network or a
third party under contract to one of them.
</t>
</list>
</t>
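<t>
For A3a, the fallback decision made by an encoder might be
sketched as below. The function name and the two failure sets are
hypothetical; how an encoder learns that an RLOC or a host is
unavailable is outside the scope of this example.
</t>
<figure>
<artwork><![CDATA[
```python
def select_rloc(address, candidates, dead_rlocs, dead_pairs):
    """Return the first usable RLOC for address, or None.

    candidates is ordered by preference, primary first.
    dead_rlocs holds RLOCs that are not reachable at all.
    dead_pairs holds (address, RLOC) pairs for which a failure
    message said the host is not available via that RLOC.
    """
    for rloc in candidates:
        if rloc in dead_rlocs:
            continue
        if (address, rloc) in dead_pairs:
            continue
        return rloc
    return None


host = "192.0.2.9"                          # hypothetical address
rlocs = ["rloc-primary", "rloc-secondary"]  # hypothetical RLOCs
dead_rlocs, dead_pairs = set(), set()
before = select_rloc(host, rlocs, dead_rlocs, dead_pairs)

# A failure message arrives for this host via the primary RLOC.
dead_pairs.add((host, "rloc-primary"))
after = select_rloc(host, rlocs, dead_rlocs, dead_pairs)
```
]]></artwork>
</figure>
<t>
In A3b the same outcome is reached differently: the mapping itself
is changed, so the encoder never sees the affected RLOC as a
candidate.
</t>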
</section>
<section title="Compatibility approaches">
<t>
<list style="hanging">
<t hangText="A4a">
Create a new IP protocol. The new protocol would not be
compatible with IPv4 and IPv6.
</t>
<t hangText="A4b">
Modify the IP protocol. The modified protocol would not
be compatible with IPv4 and IPv6 as deployed.
</t>
<t hangText="A4c">
Standard IPv4 and IPv6 packets are tunnelled while they
transit the Internet core. Path-MTU issues are handled
by setting an Internet-wide maximum packet size enforced
by the encoders and assuring that all core links support
that size.
</t>
<t hangText="A4d">
Standard IPv4 and IPv6 packets are tunnelled while they
transit the Internet core. Path-MTU issues are handled
by returning packets which breach the MTU while in the
core back to the encoder who must act as a proxy by
returning a sensible packet-too-big message to the
originating host.
</t>
<t hangText="A4e">
The IPv6 address space is partitioned into end-user address
space and Internet core address space. The address to RLOC
map is symmetric. Part of the IPv6 end-user address is
swapped for the RLOC when the packet enters the Internet core
and then restored when it leaves the Internet core. Use a
different A4 variant for IPv4.
</t>
<t hangText="A4f">
The IPv6 flow label or some other component(s) of the IPv6
header are used to contain the RLOC. The flow label is set
before the packet enters the core. Non-local packets are
routed based on the flow label. Use a different A4 variant
for IPv4.
</t>
<t hangText="A4g">
Steal bits from other functions in the IPv4 header
(e.g. checksum) to make space for an RLOC. Discard those
components and set the RLOC when the packet enters the
core. Restore the original bits when the packet leaves the
core. Use a different A4 variant for IPv6.
</t>
</list>
</t>
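<t>
The path-MTU arithmetic behind A4c and A4d is simple but easy to
get wrong, so a small sketch may help. It assumes that the only
tunnel overhead is one outer IPv4 header of 20 bytes without
options; real encapsulations may add more. The function names and
the example MTU values are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```python
OUTER_HEADER = 20  # assumed overhead: one IPv4 header, no options


def max_inner_size(core_mtu, overhead=OUTER_HEADER):
    """A4c: largest packet an encoder can tunnel without
    exceeding the Internet-wide core packet size."""
    return core_mtu - overhead


def proxied_mtu(core_link_mtu, overhead=OUTER_HEADER):
    """A4d: MTU the encoder reports to the originating host after
    a tunnelled packet was returned for exceeding core_link_mtu."""
    return core_link_mtu - overhead


# Hypothetical numbers: a 1500-byte core and one 1400-byte link.
inner_limit = max_inner_size(1500)
reported = proxied_mtu(1400)
```
]]></artwork>
</figure>
<t>
In both variants the originating host must end up with a limit
that is smaller than the core limit by the size of the
encapsulation, since it knows nothing about the outer header.
</t>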
</section>
<section title="Core routing methods">
<t>
<list style="hanging">
<t hangText="A5a">
Distribute RLOCs through the Internet core via BGP.
</t>
<t hangText="A5b">
Distribute RLOCs through the Internet core via a new
distance-vector protocol.
</t>
<t hangText="A5c">
Distribute RLOCs through the Internet core via a link-state
protocol.
</t>
</list>
</t>
</section>
<section title="Major criticisms">
<t>
There don't appear to be any genuinely clean ways of
implementing strategy A. Handling path-MTU is usually a
problem since the packets in the core are different from what the
origin host would recognize. Extra bandwidth is consumed by
the ingress tunnel router figuring out whether the egress
tunnel router is still available and functioning. Border
filtering of source addresses becomes problematic.
</t>
<t>
Deployment may require heavyweight "for the public good"
relays in the non-upgraded part of the Internet to facilitate
migration.
</t>
<t>
<!-- Added by tli -->
During the transition period, it appears difficult to remove
legacy prefixes from the global routing table. The best that
can be done is to advertise aggregates of legacy prefixes
from the relays. This may have an impact on stretch.
</t>
</section>
</section>
<section anchor="StrategyB" title="Strategy B">
<t>
Assign hierarchically aggregatable locators to every
host. Assign multiple locators to each host such that in the
network topology hosts appear as stubs in multiple locations
instead of forming distant connections in the graph. Assign one
aggregated set of locators to each core ISP where a core ISP is
one which has at least half a dozen major transit or peering
links. Flood the aggregated locator's existence and
reachability to the rest of the core.
</t>
<t>
Having reduced the network topology to something relatively
close to a hierarchy, perform plain old hierarchical
aggregation on the locators. Add and remove locators to each
host dynamically during operation as needed to reflect changes
in the nearby network hierarchies.
</t>
<t>
Attach source and destination locators when the packet leaves
the host. Route first by source then by destination locator:
move up the source network hierarchy until you can move
laterally toward the destination locator in a permissioned
manner.
</t>
<t>
Identifier to locator maps are pushed from the host towards a
distributed registry as they change. Hosts request and
temporarily cache individual mappings from the registry as
needed.
</t>
<section title="Locator variants">
<t>
<list style="hanging">
<t hangText="B1a">
A hierarchically aggregated locator is dynamically
assigned to each host from each upstream path. Each
router receives a less specific prefix from upstream and
assigns a more specific prefix downstream. Link state
changes in the path to the core are satisfied by
renumbering instead of rerouting: the host abandons the
locator hierarchically associated with the old path. If a
new path is available, the host acquires a locator
hierarchically associated with the new path.
</t>
<t hangText="B1b">
A locator is an administratively-assigned loose source
route instead of a single address. The first address in
the loose source route is a universally-known waypoint
router. The last address is the final destination. Link
state changes in the path to the core are satisfied by
rerouting in the appropriate routing domain when
possible. If rerouting in the affected domain is not
possible, the host abandons the impacted locator.
</t>
<t hangText="B1c">
Semi-hierarchical locators are administratively or
automatically assigned. Local reconnection during link
state changes is accomplished with rerouting instead of
renumbering.
</t>
</list>
</t>
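<t>
The hierarchical assignment in B1a, where each router receives a
less specific prefix and hands out more specific ones, might be
sketched as follows. The delegate function, the prefix lengths,
and the provider prefixes are assumptions chosen for illustration.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress


def delegate(upstream, extra_bits, index):
    """Return the index-th more specific prefix under upstream."""
    net = ipaddress.ip_network(upstream)
    new_len = net.prefixlen + extra_bits
    if new_len > net.max_prefixlen:
        raise ValueError("prefix too long")
    if not 0 <= index < 2 ** extra_bits:
        raise ValueError("index out of range")
    step = 2 ** (net.max_prefixlen - new_len)
    base = int(net.network_address) + index * step
    return type(net)((base, new_len))


# Hypothetical prefixes held by two upstream providers.
site_a = delegate("2001:db8:a00::/40", 8, 5)
site_b = delegate("2001:db8:b00::/40", 8, 7)

# The host holds one locator per upstream path.
locators = {
    "via-a": delegate(str(site_a), 16, 1),
    "via-b": delegate(str(site_b), 16, 1),
}

# The path through provider A fails: abandon that locator.
locators.pop("via-a")
```
]]></artwork>
</figure>
<t>
Since every locator is carved out of its upstream's prefix, the
core only needs to carry the provider aggregates, and a path
change shows up at the host as a locator change rather than as a
routing update.
</t>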
</section>
<section title="Identifier variants">
<t>
<list style="hanging">
<t hangText="B2a">
Each host has a single identifier to which the locators
are attached. This identifier is used by the layer-4/5
and higher protocols to compose the SID.
</t>
<t hangText="B2b">
Each service provided by a host has a globally unique,
hierarchical identifier to which the locators are
attached. Clients initiating communication with that
service negotiate a SID which is unique only within the
scope of that service.
</t>
</list>
</t>
</section>
<section title="Major criticisms">
<t>
<list style="numbers">
<t>
This strategy is probably not compatible with UDP or TCP
though B1a/c could be compatible with IPv6's layer 3. The
replacement layer-4/5 protocols should also be coaxable
to run on top of IPv4's layer 3 in the not-yet-upgraded
part of the network.
</t>
<t>
How do firewalls work if the locators are constantly in
flux in B1a?
</t>
<t>
How is theft of service avoided in B1b?
</t>
</list>
</t>
</section>
</section>
<section title="Strategy C">
<t>
Suppress distant routes by aggregating them into sets expected
to be available in a given direction. Because locator reachability
info is not flooded, the routing tables each router must deal
with are relatively small.
</t>
<section title="Variants">
<t>
<list style="hanging">
<t hangText="C1">
Aggregate locators based on geography. All nodes within
some geographic boundary are assigned the same
locator. Routers move packets to any adjacent router
deemed to be "closer" to the locator in question.
</t>
</list>
</t>
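<t>
The forwarding rule in C1 amounts to greedy geographic forwarding.
A minimal sketch follows, assuming that locators map to plane
coordinates and that "closer" means smaller Euclidean distance;
both are assumptions, as is the choice to pick the closest
neighbor rather than any closer one.
</t>
<figure>
<artwork><![CDATA[
```python
import math


def closer_neighbor(here, neighbors, target):
    """Return the neighbor nearest to target, provided that it
    is strictly closer to target than here; otherwise None.

    here and target are (x, y) pairs; neighbors maps router
    names to (x, y) pairs.
    """
    best_name = None
    best_dist = math.dist(here, target)
    for name, pos in neighbors.items():
        d = math.dist(pos, target)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```
]]></artwork>
</figure>
<t>
The rule needs no per-destination state beyond the neighbor
positions, which is why the routing tables stay small. It also
shows where the difficulty lies: the chosen neighbor may belong to
a network that has no business relationship with the sender.
</t>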
</section>
<section title="Major criticisms">
<t>
No one has been able to construct a proposal under strategy C
without introducing constraints that are fundamentally
incompatible with the Internet's economic model. For example,
geographic aggregation has been shown to have uncorrectable
theft-of-service anomalies in networks as small as 8
autonomous systems and two geographic areas.
</t>
<t>
<!-- Added by tli -->
Fundamentally, geographic aggregation requires that there be
a per-region interconnect that functions as the deaggregation
point for the region's traffic. Funding such an interconnect
and compelling the affected ISPs to participate in the
interconnect requires external third party coercive controls.
</t>
</section>
</section>
<section anchor="StrategyD" title="Strategy D">
<t>
Use plain old BGP for the RIB. Algorithmically compress the FIB
in each router.
</t>
<section title="Variants">
<t>
<list style="hanging">
<t hangText="D1a">
Aggregate any adjacent routes that have the same next hop.
</t>
<t hangText="D1b">
Insert a /0 route into the FIB which goes to the most
popular next hop for all the routes in the RIB. Step to the
/1 level. For each /1, if most of the routes in the RIB
within that /1 go to a different next hop than the longest
route above (the /0 route), add that /1 route to the
FIB. Step to the /2 level. Repeat until all routes in the
RIB go to the correct next hop in the FIB. Unrouted space
is treated as "don't care": it will route wherever the
algorithm happens to drop it and will rely on the TTL to
take packets off the network.
</t>
</list>
</t>
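<t>
Both variants can be sketched in a few lines. The code below is
only an illustration: merge_siblings covers the simplest case of
D1a (two halves of the same covering prefix), and majority_descent
follows the level-by-level description of D1b under the added
assumptions that the RIB is IPv4 only and contains no overlapping
prefixes. All function names are hypothetical.
</t>
<figure>
<artwork><![CDATA[
```python
import ipaddress
from collections import Counter


def merge_siblings(rib):
    """D1a sketch: merge sibling prefixes sharing a next hop."""
    fib = {ipaddress.ip_network(p): nh for p, nh in rib.items()}
    changed = True
    while changed:
        changed = False
        for net in sorted(fib, key=lambda n: -n.prefixlen):
            if net not in fib or net.prefixlen == 0:
                continue
            parent = net.supernet()
            low, high = parent.subnets()
            if (low in fib and high in fib and parent not in fib
                    and fib[low] == fib[high]):
                fib[parent] = fib.pop(low)
                del fib[high]
                changed = True
    return fib


def majority_descent(rib):
    """D1b sketch for an IPv4 RIB without overlapping prefixes."""
    routes = [(ipaddress.ip_network(p), nh)
              for p, nh in rib.items()]
    fib = {}

    def descend(prefix, inherited):
        inside = [nh for n, nh in routes if n.subnet_of(prefix)]
        if not inside:
            return  # unrouted space is "don't care"
        best = Counter(inside).most_common(1)[0][0]
        if best != inherited:
            fib[prefix] = best
        if len(set(inside)) > 1:
            for half in prefix.subnets():
                descend(half, best)

    descend(ipaddress.ip_network("0.0.0.0/0"), None)
    return fib


def lookup(fib, address):
    """Longest-prefix match of address against fib."""
    addr = ipaddress.ip_address(address)
    hits = [n for n in fib if addr in n]
    return fib[max(hits, key=lambda n: n.prefixlen)]
```
]]></artwork>
</figure>
<t>
For a RIB with three /24 routes towards one next hop and an
adjacent fourth towards another, merge_siblings yields three FIB
entries while majority_descent yields two, a covering /0 plus one
exception. The second result is smaller precisely because unrouted
space is allowed to forward anywhere.
</t>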
</section>
<section title="Major criticisms">
<t>
<list style="numbers">
<t>
The RIB can grow to at most an order of magnitude larger
than the FIB before it, too, hits the wall. One order of
magnitude doesn't gain us multihoming for small
office/home office sites.
</t>
<t>
FIBs towards the edge should aggregate well with this
strategy but there's no evidence to support a conclusion
that they'd aggregate well deep in the core.
</t>
</list>
</t>
</section>
</section>
<section anchor="StrategyE" title="Strategy E">
<t>
Make no routing architecture changes. Instead, create a billing
system through which the ISPs running core routers are paid by
the ISPs announcing prefixes. Let economics suppress growth to
a survivable level.
</t>
<section title="Variants">
<t>
<list style="hanging">
<t hangText="E1a">
Everybody pays the RIRs. The RIRs pay the router
operators.
</t>
<t hangText="E1b">
Private negotiation between parties.
</t>
<t hangText="E1c">
Assisted private negotiation where router operators can
offer standardized contracts to carry prefixes and prefix
announcers can accept groups of identical contracts via
an automated third-party payment system moving funds
between the two easily.
</t>
</list>
</t>
</section>
<section title="Major criticisms">
<t>
<list style="numbers">
<t>
If it could be done without creating a massive boondoggle,
why hasn't it been done already? This has been discussed
previously and there are no obvious mechanisms to put
such a system in place without having a central authority
for the Internet.
</t>
<t>
This means giving up on a solution that genuinely enables
users and accepting one that merely keeps the Internet
viable.
</t>
</list>
</t>
</section>
</section>
<section anchor="StrategyF" title="Strategy F">
<t>
Do nothing. (See <xref target="RFC1887"/> Section 4.4.1)
</t>
<section title="Major criticisms">
<t>
It costs "everybody else" a grand total of at least $6000 per
year for each prefix you announce <xref target="BGPCost"/>.
When we give away that $6000 of value for free, it inevitably
creates a "tragedy of the commons" problem.
</t>
<t>
<!--- Added by tli -->
Given that the research group is chartered to 'do something',
this alternative does not fit within the charter.
</t>
</section>
</section>
<section anchor="StrategyG" title="Strategy G">
<t>
Change the topology so that all hosts attach to only one ISP
using IPv6 and the ISP's single set of provider assigned
addresses. <!-- tli: The following is incorrect. --> (Actual
result of <xref target="RFC1887"/> Section 4.4.3)
</t>
<section title="Major criticisms">
<t>
This strategy wasn't accepted by the operations community
because the IPv6 architecture makes renumbering every bit as
hard as in IPv4 and the multihoming described in
<xref target="RFC1887"/> Section 4.4.3 does not appear to
actually work.
<!-- tli: I don't think that this was properly -->
<!-- understood. -->
</t>
</section>
</section>
</section>
</section>
<section title="Recommendations">
<section title="No manual renumbering of end hosts">
<t>
There is clear consensus in the group that renumbering of sites
must not require manual intervention on a per-host basis, because
the management cost of such intervention does not scale. This
effectively eliminates solutions that require hosts to have
only a single locator and renumber on topological changes, or
that require hosts to maintain multiple locators manually.
</t>
<t>
This implies that <xref target="Mechanism:Transport"> transport
solutions </xref> are unacceptable unless coupled with another
mechanism that would automate the distribution and management of
host renumbering, which appears to be a major undertaking all on
its own. Further, variants of <xref target="StrategyB">Strategy
B</xref> that require manual locator assignment are similarly
unacceptable, as are other solutions that require manual locator
assignment, such as
<xref target="StrategyD">Strategy
D</xref>, <xref target="StrategyE">Strategy
E</xref>, <xref target="StrategyF">Strategy F</xref>,
and <xref target="StrategyG">Strategy G</xref>.
</t>
<t>
Some further work on improving host renumbering can be found in
<xref target="I-D.carpenter-renum-needs-work"/>.
</t>
</section>
<section title="Future progress">
<t>
The RRG should continue to prune the solution space presented
here, attempting to find the overall maximally acceptable
solution within the bounds and constraints that have been
presented. Whenever possible the research group will continue to
discuss architectural concepts and make architectural
recommendations rather than becoming embroiled in detailed
engineering implementation discussions.
</t>
<t>
The RRG should present a final recommendation by March, 2010.
</t>
</section>
</section>
<section title="Acknowledgements">
<t>
This document represents a small portion of the overall work
product of the Routing Research Group, who have developed all of
these architectural approaches and many specific proposals within
this solution space.
</t>
<t>
In particular, Bill Herrin has been instrumental in constructing
his <xref target="Herrin">taxonomy</xref>, with the input of the
entire community. This has been pivotal in helping to focus
the discussions of the group. We would also like to thank Joel
Halpern for his insights and comments.
</t>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>This memo includes no requests to IANA.</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>All solutions are required to provide security that is at least as
strong as the existing Internet routing and addressing architecture.</t>
</section>
</middle>
<back>
<references title="Normative References">
&I-D.narten-radir-problem-statement;
&I-D.irtf-rrg-design-goals;
&RFC1887;
</references>
<references title="Informative References">
&I-D.carpenter-renum-needs-work;
&I-D.francis-intra-va;
<reference anchor="BGPCost"
target="http://bill.herrin.us/network/bgpcost.html">
<front>
<title>What does a BGP Route cost?</title>
<author initials="W." surname="Herrin">
<organization></organization>
</author>
</front>
</reference>
</references>
</back>
</rfc>