One document matched: draft-jen-apt-00.txt
Network Working Group D. Jen
Internet-Draft M. Meisel
Intended status: Informational D. Massey
Expires: January 3, 2008 L. Wang
B. Zhang
L. Zhang
July 2, 2007
APT: A Practical Transit Mapping Service
draft-jen-apt-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 3, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
The size of the global routing table is a rapidly growing problem.
Several solutions have been proposed. These solutions commonly
divide the Internet into two parts, one for customers and one for
providers, where only provider addresses are globally routable.
Packets destined for customer addresses are tunneled through provider
Jen, et al. Expires January 3, 2008 [Page 1]
Internet-Draft Transit Mapping July 2007
space. For this process to work, there must be a mapping service
that can supply an appropriate provider-edge address for any given
customer address. We present a design for such a mapping service.
We adhere to a "do no harm" design philosophy: maintain all desirable
features of the current architecture without negatively affecting its
security or reliability. Our design aims to minimize delay and
prevent loss in packet encapsulation, minimize the number of new or
modified devices, and keep the level of control traffic manageable.
Table of Contents
1. Requirements Notation . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
4. The Mapping Service . . . . . . . . . . . . . . . . . . . . . 5
4.1. A Mapping Example . . . . . . . . . . . . . . . . . . . . 6
5. Multihoming Support . . . . . . . . . . . . . . . . . . . . . 7
5.1. Using Alternate ETRs During Failures . . . . . . . . . . . 8
5.1.1. Handling TS Prefix Failure . . . . . . . . . . . . . . 9
5.1.2. Handling Single TS Address Failure . . . . . . . . . . 9
5.1.3. Handling User-to-TR Link Failure . . . . . . . . . . . 10
5.2. Summary of Requirements for Multihoming Support . . . . . 11
6. Exchanging Mappings Between ASes . . . . . . . . . . . . . . . 11
6.1. In Defense of BGP . . . . . . . . . . . . . . . . . . . . 12
7. Security and Robustness . . . . . . . . . . . . . . . . . . . 13
7.1. Detecting Misconfigurations . . . . . . . . . . . . . . . 13
7.2. ICMP Mapping Packets . . . . . . . . . . . . . . . . . . . 14
7.3. Other ICMP Packets . . . . . . . . . . . . . . . . . . . . 14
7.4. Default Mapper Scalability . . . . . . . . . . . . . . . . 15
8. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 15
9. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 16
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17
11. Security Considerations . . . . . . . . . . . . . . . . . . . 17
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
12.1. Normative References . . . . . . . . . . . . . . . . . . . 17
12.2. Informative References . . . . . . . . . . . . . . . . . . 17
Appendix A. BGP Mapping Announcement Fields . . . . . . . . . . 18
Appendix B. ICMP Mapping Message Fields . . . . . . . . . . . . 19
Appendix C. ICMP Border Link Failure Fields . . . . . . . . . . 19
Appendix D. Hidden Backup Mappings . . . . . . . . . . . . . . . 19
Appendix D.1. Hidden Backup Mapping Protocol . . . . . . . . . . . 20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 21
Intellectual Property and Copyright Statements . . . . . . . . . . 23
Jen, et al. Expires January 3, 2008 [Page 2]
Internet-Draft Transit Mapping July 2007
1. Requirements Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. Introduction
The unexpected, explosive growth of the Internet is causing a greater
and greater strain on its infrastructure. This problem has been
well-documented in [RAWS][AddrAlloc]. Several solutions have been
proposed to address this problem [EFIT][CRIO][LISP], most of which
involve separating the Internet into two parts -- one for user
networks and one for transit providers. Routers in transit space
would only need to know how to route to transit prefixes, which are
stable and conducive to topological aggregation. When a packet is
sent from user address A to destination user address B, A's provider-
edge router (the ingress tunnel router, or "ITR", as defined in
[LISP]) encapsulates the packet and sends it through transit space to
B's provider-edge router (the egress tunnel router, or "ETR"). B's
ETR decapsulates the packet and forwards it to the appropriate
recipient, B.
When encapsulating a packet, A's ITR must somehow determine B's ETR's
transit-space address and include it in the outer header. In
general, any ITR must be able to map any given user-space address to
a corresponding ETR transit-space address for proper tunneling
through transit space. This illustrates the need for a mapping
service that can provide this address. The design details of this
mapping service will play a large part in determining the
effectiveness of any proposed implementation of a user/transit
provider address space separation. The mapping service also presents
an exciting opportunity to enhance the services currently offered by
the Internet, which is further reason to carefully consider how this
service should be implemented. Should mapping information be
distributed via a push or a pull model? What additional information,
if any, should be obtained along with the mapping information? Can
we satisfy the mapping requirement without sacrificing any services
or packet delivery quality?
Our answers to these questions are rooted in a "do no harm" design
philosophy: improve routing scalability without sacrificing any
desirable features in the current architecture or negatively
affecting its security and reliability. To this end, we present APT,
A Practical Transit mapping service designed with the following goals
in mind.
Jen, et al. Expires January 3, 2008 [Page 3]
Internet-Draft Transit Mapping July 2007
o Minimize delay and prevent loss in packet encapsulation.
o Minimize the number of devices that need to be modified to support
our new design.
o Minimize the number of devices that will require additional
resources or complexity.
o Keep the design modular so that the method used to propagate
mapping information is independent from the method used to
retrieve mapping information for tunneling.
APT is designed for use with eFIT [efitID][EFIT], one of the major
proposals for user/transit provider address space separation.
However, APT should be generally applicable to other proposals of the
same class.
3. Terminology
User Network (UN) - A network that pays another organization to
deliver its packets through the Internet. Each user network is a
customer of some Transit Network (see definition below). "User
network" holds the same meaning as it does in the eFIT proposal.
Transit Network (TN) - An AS whose business is to provide packet
delivery services for its customers. Transit Networks serve as
providers for user networks. As a rule of thumb, if the AS appears
in the middle of any ASPATH in a BGP route today, it is considered a
transit network.
Transit Space (TS) - The address space used by transit networks.
Nodes within a transit network are assigned TS addresses. Sometimes
the term "transit space" will refer to the non-edge area of the
Internet where TS prefixes are routable.
User Space (US) - The address space used by user networks. Nodes
within a user AS are assigned user-space addresses. Sometimes the
term "user space" will refer to the edges of the Internet whose
prefixes are not routable in transit space (though packets to those
addresses are deliverable through transit space). We assume that TS
and US addresses can be clearly distinguished.
Border Link - A link that crosses the boundary between transit space
and user space.
Default Mapper - A new device required by our mapping service. Each
transit network MUST have at least one default mapper. A default
Jen, et al. Expires January 3, 2008 [Page 4]
Internet-Draft Transit Mapping July 2007
mapper carries a complete mapping table. In other words, given any
user-space address, default mappers can return the TS address of a
provider-edge router corresponding to that address. To support the
growing trend towards multihoming, default mapping entries will map a
user-space prefix to a non-empty SET of TS destinations, all of which
have a direct connection to the destination network in user space.
Tunnel Router (TR) - These devices will replace all current provider-
edge routers, located at the provider end of border links. Like ITRs
and ETRs in LISP [LISP], TRs provide the encapsulation and
decapsulation services required for tunneling user packets through
transit space. A TR has both ITR and ETR functionality, meaning that
any TR can perform both encapsulation and decapsulation of packets.
To properly encapsulate any given user-space packet, TRs can query
the default mappers for mapping information. TRs also cache commonly
used mapping entries locally. Note that TR cache entries are NOT
identical to the mappings stored at default mappers (see the
definitions of "mapping" and "mapping entry" below). TRs are
designed to be as simple and as fast as possible, adding only what is
necessary for proper tunneling functionality.
APT Node - A general term referring to any new device type introduced
by APT. This includes both default mappers and TRs.
Router - These are ISP-owned non-border routers that exist today.
Other than minor configuration changes, these routers need no
alteration or replacement, and can be used just as they are used
currently.
Mapping - A mapping contains a user-space prefix and a non-empty SET
of ETR TS addresses associated with the prefix. Mappings also
include related information such as the user's public key and
priority rankings for each of the ETRs in the set. Default mappers
store mappings.
Mapping Entry - A mapping entry contains a user-space prefix and any
SINGLE ETR TS address associated with the prefix. Any mapping entry
is a subset of the complete mapping for its user-space prefix. TRs
store mapping entries along with an associated TTL. A mapping entry
is removed once its TTL expires.
4. The Mapping Service
To minimize the latency introduced by encapsulation, APT seeks to
store mapping information as close to the ITR as possible. However,
the global mapping table is likely to grow very large over time. To
avoid undue memory requirements for ITRs while still keeping mapping
Jen, et al. Expires January 3, 2008 [Page 5]
Internet-Draft Transit Mapping July 2007
information within reach, we introduce the concept of default
mappers.
A TR does not need to store the entire global mapping table.
Instead, it queries a default mapper for mapping information and
caches recently used mapping entries.
Default mappers are the only devices in the network that need to
store the complete global mapping table. As we will see in the
following example, TRs only use default mappers in the event of a
cache miss. This means that, given large enough caches at the TRs,
network latency will not heavily depend upon default-mapper
performance. Additionally, we propose the use of anycast to reach
default mappers within an AS. Each TN AS need only have a single
default mapper, but the use of anycast makes it easy for a TN to
deploy more. The result is a robust, scalable default mapping
system.
4.1. A Mapping Example
_ _
/ \ / \
/ A \ / B \________
\___/ \___/ |
| | | User Space
- - - - -|- - - - - - - - - - - - - -|- - - - -|- - - - - - - - - - - -
| | | Transit Space
.--+---. .--+---. |
_-| ITR1 |-_ _-| ETR1 |-_ |
/ '------' .`--. .--'. '------' .`--+--.
| ____ | X |--------| X | ____ | ETR2 |
| | M1 | '-;-' '-:-' | M2 | '-;----'
\ '-/\-' / \ '----' /
\___/ \___/ \__________/
______/ \____________________
| User Net | TS Addr | Priority |
|----------|----------|----------|
| ... | ... | ... |
|----------|----------|----------|
| B | ETR1 | 10 |
| | ETR2 | 20 |
|----------|----------|----------|
| ... | ... | ... |
'--------------------------------'
Figure 1. This is a simple topology for demonstrative purposes. A
and B are user networks addressable via user-space prefixes, ITR1,
Jen, et al. Expires January 3, 2008 [Page 6]
Internet-Draft Transit Mapping July 2007
ETR1, and ETR2 are TRs, any node labeled "X" is a router, and M1 and
M2 are default mappers. A portion of the mapping table for M1 is
shown.
In this section, we illustrate how TRs and default mappers interact
within an AS to properly tunnel user-space packets through transit
space.
In Figure 1, assume a node in network A sends a packet to a user-
space address in network B. When this packet arrives at ITR1, ITR1
looks up the destination user-space address in its mapping cache. If
a matching prefix is present in its cache, ITR1 simply encapsulates
the packet with the corresponding TS destination address and send it
across transit space. If a matching prefix is not present, ITR1 will
send the packet through its default mapper. It does this by
encapsulating the packet with the anycast address for default mappers
in its AS as the destination.
This packet will arrive at M1, the only default mapper in ITR1's AS.
When M1 receives the packet, it decapsulates it and examines the
user-space destination address. Since default mappers store the
full, global mapping table, a default mapper will always be able to
encapsulate the packet with a valid TS destination address. All
packets encapsulated by a default mapper MUST contain the default
mapper's TS address as the source address.
In addition to forwarding the packet to an appropriate ETR (ETR1, in
this case), M1 also treats the incoming packet as an implicit request
from ITR1 for mapping information. M1 responds to ITR1 with an ICMP
packet containing a mapping entry that maps B to ETR1. This allows
ITR1 to add this mapping entry to its cache so that ITR1 can tunnel
further packets destined for B directly to ETR1. The mapping entry
also contains a time to live (TTL) that is set by M1. The TTL
ensures that ITR1 will occasionally re-request this mapping
information from M1. At this time, if the mapping information
changed in any way since ITR1's prior request, M1 can respond with an
updated mapping entry. Without this TTL, ITR1's cached information
may become inaccurate over time.
5. Multihoming Support
In the example above, the observant reader may have noted that B is
multihomed. That is, B can be reached through both ETR1 and ETR2.
Multihoming provides B with both enhanced reliability in case of a
connectivity failure and the flexibility to split incoming traffic
across different ETRs.
Jen, et al. Expires January 3, 2008 [Page 7]
Internet-Draft Transit Mapping July 2007
In accordance with our design goals, all of the logic for selecting a
destination for a multihomed user is contained within default
mappers. Default mappers will store mappings containing all of the
ETRs for a given user-space prefix, and ITRs will only store a single
mapping entry per user-space prefix. When an ITR requests a mapping
entry for a multihomed user, it is up to the default mapper to decide
which one to return.
Many users will want to have some control over which ETR is used for
incoming traffic. To allow this, we let users assign a priority
value to each of the mapping entry for their prefixes, making it
available to all default mappers throughout the transit space (see
Section Section 6). The number is to be treated like a ranking -- an
ETR with a lower priority value is more preferable.
At the same time, a transit network may also has its own preference
regarding which of the ETRs to use for a given user-space prefix.
Default mappers can use a combination of locally configured routing
policies and the user priority information to choose from a set of
valid ETR addresses. Going back to Figure 1, assume that ITR1 does
not have a mapping entry for B in its cache. When A sends B a
packet, ITR1 will send the packet to M1. If M1 has no preference
between ETR1 and ETR2, it will examine the priority values in B's
mapping and select ETR1, B's most preferred ETR. M1 forwards the
packet to ETR1 and returns the appropriate mapping entry to ITR1,
which stores the mapping entry in its cache.
In the case of a priority value tie, the default mapper can break the
tie by picking the ETR to which it has the shortest path. If some
ETRs are tied in terms of both lowest priority value and shortest
path, the default mapper is free to break the tie arbitrarily. The
address of the selected ETR will be used as the destination address
when encapsulating the packet.
We envision that users will be able to manipulate their incoming
traffic load by setting appropriate priority values in their mapping.
A user who wants load balancing can assign the same priority value to
all of his mapping entries. A user who wants to have one TN as a
primary provider and another only as a backup can simply assign a
higher priority value to his ETR at his backup provider.
5.1. Using Alternate ETRs During Failures
When a network failure has caused an ETR to become unreachable, an
affected multihomed user will expect his traffic to be temporarily
routed through alternate ETRs. There are three general types of
failures that would require an ITR to use an alternate ETR: (1) an
ITR may discover via BGP that it can no longer reach the TS prefix
Jen, et al. Expires January 3, 2008 [Page 8]
Internet-Draft Transit Mapping July 2007
containing the address of the intended ETR, (2) an ITR may learn via
ICMP Destination Unreachable packets that its intended ETR is
unreachable, and (3) the link between a user network and its TR may
be down, a new problem introduced by the tunneling architecture. We
will explain how to handle each of these failure types below, using
Figure 1 as a reference. We assume that, at the time of failure, all
TN ASes are using ETR1 to reach B.
To assist in handling these failures, we include a time till retry
(TTR) for each mapping entry in every mapping stored in default
mappers. Normally, the TTR for each mapping entry is set to zero,
indicating that it is usable. Any mapping entry with a non-zero TTR
value is considered invalid. We will refer to the action of setting
a mapping entry's TTR as "invalidating the entry." Mapping entries
that map to unroutable destinations are also considered invalid. So
long as a mapping entry is invalid, default mappers will not use this
entry as a destination address or include it in mapping responses.
The role of the TTR for handling failures will become clear in the
explanations below.
5.1.1. Handling TS Prefix Failure
For failures of type (1), ITR1 has no route to ETR1. Assume a host
in network A attempts to send a packet to a host in network B. If
ITR1 does not have B's mapping in its cache, it will forward the
packet to M1 (see Section Section 4.1). If ITR1 does have B's
mapping in its cache, it will see that it has no path to ETR1, and
send the packet to M1 instead. M1 will also see that it has no route
to ETR1, and thus select the next-most-preferred ETR for B, ETR2. If
it has a route to ETR2, it sends the packet with ETR2 as the TS
destination address and replies to ITR1 with the corresponding
mapping entry. M1 can assign a relatively short TTL to the mapping
entry in its response. Once this TTL expires, ITR1 will forward the
next packet for B to the default mapper, which will respond with the
most-preferred mapping entry that is routable at that time. This
allows ITRs to quickly go back to using ETR1 once it becomes routable
again.
5.1.2. Handling Single TS Address Failure
In the second case, the TS prefix containing ETR1 is still routable
from ITR1, but ETR1 is unreachable from ITR1. Thus, ITR1 will
receive an ICMP Destination Unreachable message in response to any
packet sent to ETR1. ITR1 will need to turn to its default mapper
for an alternate TS destination address for B. M1 will send an
alternate valid mapping entry (if available) to ITR1. For this to
work, TRs MUST forward all received ICMP Destination Unreachable
messages to their default mappers. Default mappers MUST then
Jen, et al. Expires January 3, 2008 [Page 9]
Internet-Draft Transit Mapping July 2007
invalidate ALL mapping entries that map to the unreachable TS
destination address. To allow this, default mappers will have a
reverse-mapping table to go along with their mapping table. These
reverse-mapping tables map TS addresses to their corresponding user-
space prefixes. Now default mappers can look up the unreachable TS
address in their reverse-mapping tables, and temporarily invalidate
all entries that map to that TS address.
5.1.3. Handling User-to-TR Link Failure
The final case involves a failure of the link connecting ETR1 to B.
In the previous two cases, current Internet standards were in place
to allow ITR1 to know that a failure occurred. This case, on the
other hand, is a new type of failure that does not exist in today's
infrastructure. Therefore, it will require a new type of failure
message. These messages will take the form of a new ICMP message
type, which will include the user-space prefix that was not
reachable. TRs MUST be configured to forward all border link failure
ICMP messages to their default mappers, in the same fashion that TRs
forward all destination unreachable ICMP messages to their default
mappers.
Going back to our example, when ETR1 discovers it cannot forward the
packet to B due to a border link failure, it will send ITR1 an ICMP
packet of our new type stating that B's prefix is currently
unreachable. ITR1 will forward the border link failure ICMP message
to its default mapper, which will invalidate that mapping entry. If
the mapping entry is already invalid, it will reset the entry's TTR.
If the prefix has an alternate valid mapping entry, M1 will send this
mapping entry to ITR1.
Furthermore, to minimize packet losses, ETR1 should not simply drop
the packet addressed to the unreachable user network. Instead, ETR1
should send this packet to M2 in hopes of finding an alternate ETR
that can reach the user network. However, M2 will then look up a TS
destination address for B and choose ETR1 again. This is
undesirable, since we are seeking an alternative destination.
Therefore, when encapsulating packets for forwarding, default mappers
MUST check if the chosen TS destination address is the same as the TS
sender address in the packet's original TS header. If so, this
indicates that the TS-to-user link is down at this ETR. In such
cases, default mappers MUST invalidate the corresponding mapping
entry and seek an alternative.
To complete our example, ETR1 sends an ICMP message to ITR1 and also
sends the data packet to M2. M2 looks up a destination TS address
for the packet and finds ETR1. M2 then compares this TS address with
the TS address of the original sender of the packet, which is also
Jen, et al. Expires January 3, 2008 [Page 10]
Internet-Draft Transit Mapping July 2007
ETR1. Since they are the same, M2 invalidates this mapping entry and
finds an alternate destination, ETR2. M2 then forwards the packet to
ETR2.
5.2. Summary of Requirements for Multihoming Support
TR cache entries MUST include a TTL value, which will be provided by
their default mapper.
In default mappers, every TS destination address in a mapping MUST
include a time until retry (TTR). Usable mapping entries have a TTR
of zero. When a mapping entry becomes unreachable due to failures,
the TTR MUST be set to a pre-configured value. An alternate entry in
the same mapping MUST be used in place of an invalid mapping entry if
available.
Default mappers MUST be able to invalidate all mapping entries that
map to a particular TS destination address that has become
unreachable. This can be implemented using a reverse-mapping table.
We will use a new type of ICMP message to indicate border link
failure.
TRs MUST forward all ICMP destination unreachable and border link
failure messages to their default mapper.
If an ETR cannot send a packet due to a border link failure, it MUST
send this packet to its default mapper. This ETR MUST use its own TS
address as the source TS address of the packet.
Upon receipt of any data packet, default mappers MUST check if the
chosen TS destination address is the same as the TS source address in
the packet's original TS header. If so, default mappers MUST
invalidate the corresponding mapping entry and look for an alternate
ETR for the packet.
6. Exchanging Mappings Between ASes
In order for default mappers to store a full, global mapping table,
there must be some way for them to receive mappings from other ASes.
To avoid introducing latency or packet loss when encapsulating
packets, the default mappers must have a full set of mappings
available locally. To accomplish this, we distribute mappings using
a push method. Default mappers MUST regularly announce the mappings
for all of their customers to the rest of the network.
When a default mapper receives new mappings, it stores them in its
Jen, et al. Expires January 3, 2008 [Page 11]
Internet-Draft Transit Mapping July 2007
mapping table, replacing any existing mappings. When a TR receives
new mappings, it simply deletes any matching cache entries. Any
further communication with the formerly cached host will require the
use of a default mapper. This ensures that only the default mappers
need to validate mapping announcements and enforce policy.
Mapping messages will be flooded throughout the network via BGP. A
new BGP attribute will be required for this purpose.
We have selected BGP initially in order to ease incremental
deployment and minimize the changes required to existing routers.
However, mapping announcements could easily be distributed via a
different reliable broadcast protocol at a later date. Transitioning
mapping distribution to a different protocol will not affect any
other aspect of APT.
6.1. In Defense of BGP
Despite the use of BGP, mapping announcements will not cause the same
problems that BGP routing announcements do in the Internet today for
the following reasons.
First, for routing announcements, the path taken to reach each router
is a crucial piece of information. For mapping announcements, the
path taken to reach each APT node is not meaningful. This means that
only a single copy of each mapping announcement needs to reach each
APT node, providing an opportunity to prune duplicates, or even to
make use of a spanning tree. This also means that path exploration
and its repercussions do not exist for mapping announcements.
Second, mapping announcements only require processing at default
mappers and, to a lesser extent, TRs. Other routers in the network
need only pass these announcements along to their peers. Thus, the
processing burden placed on other routers by excessive routing
updates is completely avoided.
Finally, there will be far fewer mapping announcements than there are
routing announcements. TNs rarely change the addresses of their
equipment, and customers are generally under a monthly contract with
their provider TNs. Therefore, permanent mapping changes are
unlikely to occur more than once per month per customer.
Furthermore, transient failures do not cause mapping announcements in
APT. The most common cause of mapping announcements will be regular
refresh announcements, which should never need to be sent more than
every other day in most cases.
Jen, et al. Expires January 3, 2008 [Page 12]
Internet-Draft Transit Mapping July 2007
7. Security and Robustness
Using BGP to distribute mapping announcements guarantees that they
are only accepted from manually configured BGP peers. This ensures
that mapping announcements are no less secure than routing
announcements today. When applied to the eFIT architecture, however,
the security of this scheme is greatly increased. This is due to the
fact that eFIT TS addresses are not addressable from user space
[efitID][EFIT]. This turns out to be a major boon for the BGP trust
model, since only other TS nodes are valid BGP peers.
The complete separation of the eFIT TS address space provides another
security benefit: malicious users cannot attack equipment that they
cannot address. End users simply cannot affect the TS nodes that
their packets travel through within transit space.
Despite these benefits, there are some additional issues introduced
by APT. Manually configured mappings provide an opportunity for
human error, our reliance on ICMP packets provides an opportunity for
spoofing and cache poisoning, and storing the entire global mapping
table at default mappers poses a threat to long-term scalability.
The remainder of this section will address each of these issues in
turn.
7.1. Detecting Misconfigurations
Due to the fact that only TNs will have access to transit space,
false mapping updates are far more likely to be the result of
accidental misconfigurations than malicious attacks. With this in
mind, we present a simple, extensible authentication scheme that can
detect and, in some cases, prevent accidental misconfigurations.
The types of misconfigurations that could potentially be harmful are
those that result in one provider accidentally interfering with the
mapping for another provider's customer. This can happen whenever a
provider accidentally announces a mapping for the wrong user-space
prefix. These types of accidental conflicts fall into three
categories: (1) a provider announces a mapping for a prefix owned by
another provider's customer, (2) a provider announces a mapping for a
shorter user-space prefix that contains a longer user-space prefix
owned by owned by another provider's customer, and (3) a provider
announces a mapping for a longer user-space prefix that is a subset
of a shorter user-space prefix owned by another provider's customer.
The first category of conflicts is the only one that we intend to
actively prevent. Clearly, the user that owns a particular user-
space prefix should be the ultimate authority for his mapping
information. However, user networks do not announce their mappings
Jen, et al. Expires January 3, 2008 [Page 13]
Internet-Draft Transit Mapping July 2007
to the network directly, but rather through their providers. In
order to ensure a mapping update for a user-space prefix is approved
by its rightful owner, we must include some sort of user
authorization string in each announcement. To this end, we introduce
a public-key field into each mapping. This field SHOULD contain a
cryptographically valid public key, but it will only rarely need to
be used as such. In the normal case, when a default mapper receives
a new mapping announcement that would replace an existing one, it
only needs to ensure that the public key has not changed. (This
scheme is similar in spirit to the way that OpenSSH uses its
'known_hosts' file.) However, as long as all of a UN's providers
store the corresponding private key, the distribution of public keys
also introduces the possibility of using cryptographic signatures for
any number of purposes within transit space.
For the other two categories, it is less clear that such an
announcement is the result of a misconfiguration. It is possible,
for example, that the owner of a /16 user-space prefix has resold
some of the contained /24 prefixes to other UNs. In such cases, only
the administrators will know if the announcement is valid. It is for
this reason that (in the spirit of PHAS [PHAS]) we do not attempt to
prevent such changes, but only detect and notify interested parties.
Since legitimate mapping changes are infrequent, notifying interested
parties of mapping changes via e-mail is a perfectly viable option.
These notifications could also prove useful in debugging the mapping
service, or a particular provider's configuration.
7.2. ICMP Mapping Packets
ICMP mapping packets are used exclusively by default mappers to send
mapping entries to the TRs within their AS. Therefore, there is no
reason that these ICMP packets should ever need to travel between
ASes. In order to prevent cache poisoning through spoofing, these
ICMP packets simply MUST be dropped at all border routers within
transit space.
7.3. Other ICMP Packets
Our mapping service also depends on two other types of ICMP packets:
existing ICMP Destination Unreachable messages, and our new ICMP
Border Link Failure messages. Both of these packet types must
traverse AS boundaries. Again note that, under the eFIT
architecture, these packets are already more trustworthy than ICMP
packets in the current infrastructure -- they can only be generated
by hosts in transit space. However, if this level of security is
deemed insufficient, the keys used for detecting misconfigurations
could be used to cryptographically sign such packets, ensuring that
they are coming from the appropriate sender.
Jen, et al. Expires January 3, 2008 [Page 14]
Internet-Draft Transit Mapping July 2007
7.4. Default Mapper Scalability
Theoretically, the global mapping table could grow to contain a
separate mapping for every user-space prefix. In the case of IPv6
prefixes, the total number of mappings would be on the order of
10^18, far more than we can expect to be able to store on a single
device. If the global mapping table were to approach such gargantuan
proportions, a few simple changes to the default-mapper model would
allow APT to scale gracefully.
Instead of each default mapper storing the full, global mapping
table, each default mapper would store only a subset of the table.
This subset would be aggregatable by user-space prefix. For
addresses outside of this subset, a default mapper would store
mappings that mapped short, artificially aggregated prefixes to the
TS addresses of other default mappers. Like virtual prefixes in CRIO
[CRIO], the user-space prefixes in these mappings would not
necessarily correspond to actual user prefixes.
Each virtual prefix would be announced by the default mapper
responsible for the corresponding subset of the global mapping table.
In order to ensure complete coverage of the user address space, some
central authority would need to assign these virtual prefixes to
individual transit networks.
This scheme allows for a tradeoff between latency and default mapper
storage requirements. (For more discussion of the characteristics of
such a tradeoff, see [CRIO].) However, this scheme also requires
some providers to become authoritative sources for mappings owned by
other providers' customers. Both this requirement and the need to
involve a central authority could prove problematic for deployment.
Therefore, we do not recommend using this scheme unless the size of
the global mapping table demands it.
8. Incremental Deployment
Clearly, the deployment of APT will coincide with the deployment of
eFIT (or a similar architecture). Though incremental deployment of
the eFIT architecture itself is beyond the scope of this document, we
must at least show that APT will behave properly under partial
deployment.
Under the eFIT architecture, addresses outside of transit space will
not change. This means that user-space prefixes will initially share
the existing IP address space. This fact provides us with a simple
method for delivering packets to addresses for which no mapping is
available. Presumably, the only such addresses will be those
Jen, et al. Expires January 3, 2008 [Page 15]
Internet-Draft Transit Mapping July 2007
connected to providers who have not yet adopted the new architecture.
In order to deliver such packets, APT nodes can simply return them to
the old infrastructure, and they will be routed as they are today.
In order to support this feature, default mappers will respond to TRs
with an ICMP mapping packet that indicates that no entry exists for
the given user-space prefix. TRs will keep a negative cache entry
for such prefixes so that they can forward such packets directly to a
non-TS router.
In discussing incremental deployment, we must also address the issue
of how new default mappers will acquire the complete mapping table
when they are first connected to transit space. Since our mapping
service design requires that all ASes re-announce all of their
mappings at a regular interval, commissioning a new default mapper
only requires connecting it to the network and waiting for all other
TS ASes to re-announce their mappings. Yet, this introduces a
potential problem -- if there is no upper bound on the regular
refresh interval, there will be no upper bound on how long a new
default mapper needs to wait until its mapping table is complete.
Therefore, there needs to be an upper bound on the refresh interval
for mappings. An appropriate value would be once a week. This would
mean that a newly deployed default mapper would be able to reach the
entire transit space one week later (with the exception of any ASes
that failed to follow protocol).
9. Future Work
Optimally, any design paper should include an evaluation section. In
the future, we will examine traces of Internet activity to determine
the characteristics of the tradeoff between TR cache size and default
mapper workload, the amount of traffic overhead that would be
incurred by our push-based design, and any other results that the
community deems useful.
We are also considering automating user mapping updates. Under our
current design, whenever a user needs to update his mapping
information (he may add, subtract, or change providers, or change his
priority values), the user must contact his providers offline and
request that they announce the updated mapping information. It is
then up to the providers to update the mapping information. As we
have seen with DNS updates, human involvement introduces the
possibility of human error and delay. We hope to provide UNs with an
automated way to manage their mapping information.
Jen, et al. Expires January 3, 2008 [Page 16]
Internet-Draft Transit Mapping July 2007
10. IANA Considerations
This memo includes no request to IANA.
11. Security Considerations
Security considerations for APT are discussed in Section Section 7.
12. References
12.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
12.2. Informative References
[efitID] Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Proposal
for Scalable Internet Routing and Addressing", Internet Dr
aft, http://www.ietf.org/internet-drafts/
draft-wang-ietf-efit-00.txt, 2 2007.
[EFIT] Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Scalable
Routing System Design for Future Internet", SIGCOMM IPv6
Workshop , 8 2007.
[LISP] Farinacci, D., Fuller, V., and D. Oran, "Locator/ID
Separation Protocol (LISP)", Internet Draft, http://
www.ietf.org/internet-drafts/draft-farinacci-lisp-00.txt,
2007.
[PHAS] Lad, M., Massey, D., Pei, D., Wu, Y., Zhang, B., and L.
Zhang, "PHAS: A Prefix Hijack Alert System", USENIX
Security .
[AddrAlloc]
Meng, X., Xu, Z., Zhang, B., Huston, G., Lu, S., and L.
Zhang, "IPv4 Address Allocation and BGP Routing Table
Evolution", ACM SIGCOMM Computer Communication Review
(CCR) special issue on Internet Vital Statistics, Volume
35, Issue 1, p71-80.
[RAWS] Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
Workshop on Routing and Addressing", Internet Draft, http:
//www.ietf.org/internet-drafts/
draft-iab-raws-report-02.txt, 2007.
Jen, et al. Expires January 3, 2008 [Page 17]
Internet-Draft Transit Mapping July 2007
[CRIO] Zhang, X., Francis, P., Wang, J., and K. Yoshida, "Scaling
IP Routing with the Core Router-Integrated Overlay", Proc.
International Conference on Network Protocols , 11 2005.
Appendix A. BGP Mapping Announcement Fields
Address Type - This field specifies the type of user-space addresses
used for the user-space prefixes in the announcement. All user-space
prefixes in a single mapping announcement MUST be of the same address
type. Currently, this is expected to be either IPv4 or IPv6, but any
other address type is possible provided that it is supported by the
APT nodes in the ASes that wish to use it. APT nodes MUST ignore
mapping announcements for address types that they do not understand.
Total Length - This field specifies the total number of bytes used by
all mappings in the announcement. Each mapping announcement can
contain mappings for multiple prefixes, each with multiple mapping
entries.
Each mapping in the announcement is described by the following
fields:
User-space Prefix - This is the user prefix for the mapping.
Public Key - This is a public key that can be used to verify
signatures, decrypt data, and prevent misconfigurations for the
corresponding user-space prefix. See Section Section 7.1 for more
information.
Time To Live (TTL) - This is the amount of time in hours that this
mapping should persist in default mappers before being considered
obsolete and erased. This value MUST be set to at least three times
the regular refresh interval lest the corresponding user-space prefix
become unreachable. The TTL is specified in hours to prevent
misconfigurations from causing excessive mapping updates.
TS Address Count - This is the total number of TS addresses that the
corresponding user-space prefix maps to.
TS Address Set - This is a set of TS addresses, each with a priority.
The total number of addresses is specified by the previous field.
Priorities are arbitrary integers that only have meaning in reference
to each other. Addresses with lower priority values are considered
more preferable.
Jen, et al. Expires January 3, 2008 [Page 18]
Internet-Draft Transit Mapping July 2007
Appendix B. ICMP Mapping Message Fields
User-space Prefix - This prefix is used to match the input address
for mapping cache lookups.
TS Address - This is the destination that the user-space prefix maps
to.
TTL - This is the time that the entry stays in the cache. Its value
is determined by the default mapper.
Appendix C. ICMP Border Link Failure Fields
Prefix - This field contains the user-space prefix that cannot be
reached as a result of the border link failure.
Signature - This field can optionally contain a signature generated
using the UN's private key. It can then be used to verify the
legitimacy of the message.
Appendix D. Hidden Backup Mappings
As mentioned in our mapping section, our design allows users to
assign backup providers and perform traffic engineering through
appropriate assignment of their TN priority values. Of course, this
method will only prove effective if all transit networks generally
respect these priority values. This may not be the case in practice.
User networks may be negatively affected if priorities are not
respected. For example, imagine that a user has a cheap primary
provider and an expensive backup provider. If enough transit
networks ignore the UN's preference and send his traffic through the
backup provider, the financial impact on the user could be
significant. For this reason, users may not want to depend on other
ASes to respect their priority values.
In today's Internet, multihomed user networks can use BGP trickery to
hide their backup providers unless they are needed. The backup
provider simply does not announce a route to the UN's prefix unless
it receives a withdrawal for that prefix from the UN's primary
provider. At this point, the backup provider will announce its path
to the UN's prefix. Once it receives a new announcement for the
prefix from the primary provider, the backup provider withdraws its
path to the UN's prefix, putting it back into hiding.
In accordance with our "do no harm" design philosophy, we present a
Jen, et al. Expires January 3, 2008 [Page 19]
Internet-Draft Transit Mapping July 2007
method for including a hidden backup feature into APT. Hidden backup
support introduces new ICMP packets, mapping tables, and state into
APT. We leave it as an open question whether this feature should be
included at all. If transit networks are willing to respect the
priority values included with mapping entries, hidden backup support
(and its complexity) can be omitted entirely from APT.
Appendix D.1. Hidden Backup Mapping Protocol
A user would want to activate his hidden backup provider in the same
three failure situations that require switching to an alternate
provider (see Section Section 5.1). We will explain how to handle
each of these failure types.
Situation (1) is detectable by the backup provider via BGP. When the
backup provider learns that there are no routes to the UN's primary
provider, he MUST announce his own backup mapping and begin servicing
the user network. If the UN's primary provider later becomes
reachable, the backup provider MUST re-announce the original mapping.
The responsibility to re-announce the original mapping lies with the
backup provider in order to prevent route flapping from causing
mapping flapping. The backup provider SHOULD wait until the primary
provider has been stable for a set period of time before re-
announcing the original mapping. Also note that these mapping
announcements are indistinguishable from those generated by permanent
mapping changes, leaving default mappers throughout transit space no
choice but to respect them.
Situation (2) is detectable by the primary provider via IGP. When
the primary provider learns that one of his TRs is down, he MUST
inform the backup providers for the affected user networks. This
could be done via BGP flooding, but it seems excessive to flood the
entire core with a message that is only relevant to a handful of
providers. Instead of flooding, the primary provider needs to inform
the relevant backup providers directly. To support this, primary
providers MUST store a "backup-mapping table" that maps each of their
customers to their corresponding backup providers. This table should
not be very large, since each provider will only store entries for
his own customers. Furthermore, customers who do not have a hidden
backup can be excluded from the backup-mapping table.
When a TR goes down, one of the provider's default mappers can use
its reverse-mapping table (see Section Section 5.1) to determine
which user prefixes are affected. It can then use its backup-mapping
table to determine which backup providers need to be notified. The
rest of the communication will be implemented using two new ICMP
message types, "Primary Provider Failure" and "Primary Provider
Recovery". Each of these types will require an acknowledgment (ACK)
Jen, et al. Expires January 3, 2008 [Page 20]
Internet-Draft Transit Mapping July 2007
flag to ensure delivery.
Primary providers MUST send an ICMP "Primary Provider Failure"
message to each of the appropriate backup providers. These messages
MUST contain the relevant mapping entry. Upon receipt of such a
message, a backup provider MUST respond with an identical packet,
except that it MUST set the ACK flag. Then, it MUST announce a
backup mapping entry. When the customer's primary provider detects a
recovery, it MUST send an ICMP "Primary Provider Recovery" message to
the appropriate backup providers. The backup providers MUST
acknowledge the message, and re-announce the original mapping. As in
situation (1), re-announcing of the original mapping is left to the
backup providers to prevent mapping flapping.
Situation (3) is detectable by the TR whose link to a user has gone
down. The TR MUST inform his default mapper of this failure via the
new ICMP type described in Section Section 5.1.3. At this point, the
primary provider can lookup the affected user in his backup-mapping
table, and proceed as in situation (2).
The ICMP communication described above is essential to hidden backup
functionality. Thus, these messages must be secure and reliable.
Security can be achieved with public-private key cryptography (see
Section Section 7.3). For reliability, the primary provider MUST
continue to send "Primary Provider Failure" and "Primary Provider
Recovery" ICMP packets periodically until it receives an
acknowledgment from the backup provider. Backup providers MUST
always acknowledge these types of ICMP messages, regardless of the
state of the corresponding mapping.
Mapping announcements and ICMP communication will be carried out by
default mappers unless otherwise specified. Backup-mapping tables
are also stored in the default mappers.
Authors' Addresses
Dan Jen
Email: jenster@cs.ucla.edu
Michael Meisel
Email: meisel@cs.ucla.edu
Jen, et al. Expires January 3, 2008 [Page 21]
Internet-Draft Transit Mapping July 2007
Dan Massey
Email: massey@cs.colostate.edu
Lan Wang
Email: lanwang@memphis.edu
Beichuan Zhang
Email: bzhang@cs.arizona.edu
Lixia Zhang
Email: lixia@cs.ucla.edu
Jen, et al. Expires January 3, 2008 [Page 22]
Internet-Draft Transit Mapping July 2007
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).
Jen, et al. Expires January 3, 2008 [Page 23]
| PAFTECH AB 2003-2026 | 2026-04-21 18:48:15 |