Network Working Group R. Whittle
Internet-Draft First Principles
Intended status: Experimental August 19, 2008
Expires: February 20, 2009
Ivip (Internet Vastly Improved Plumbing) Architecture
draft-whittle-ivip-arch-02.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 20, 2009.
Whittle Expires February 20, 2009 [Page 1]
Internet-Draft Ivip Architecture August 2008
Abstract
Ivip (Internet Vastly Improved Plumbing) is a proposed global system
of routers and either a single database or a collection of databases which control the
tunneling of some of these routers. Database changes affect all
Ingress Tunnel Routers (ITRs) within a few seconds, controlling which
Egress Tunnel Router (ETR) they tunnel each packet to, depending on
the packet's destination address. The ETR used by a host with an
Ivip-mapped address is typically located in the same network as this
destination host. The ETR decapsulates packets and forwards them to
the destination host. A second type of ETR known as a Translating
Tunnel Router (TTR) is used for mobile-IP, with the mobile node
creating two-way tunnels to one or more nearby TTRs. Ivip enables a
subset of IPv4 and IPv6 address space to be portable (used via any
ISP which has an ETR) and to be suitable for multihoming (connection
to the Net via two or more ISPs) - without involving BGP and without
requiring any changes to host operating systems or applications.
This is a form of "locator-ID separation" and is based on some
principles derived from LISP (Locator/ID Separation Protocol). IP
addresses in the subset of address space which is subject to being
tunneled by ITRs are known as Destination Identifiers (DIDs). ITRs
and ETRs are located on ordinary BGP Reachable IP (BRIP) addresses.
The databases and ITRs map DID addresses to an ETR's BRIP address
with a granularity of a single IPv4 address or a /64 prefix for IPv6.
These two granularities are 256 and 64k times finer than is typically
possible with BGP. This proposal is intended to resolve many of the
problems discussed in the October 2006 Amsterdam IAB Routing and
Addressing Workshop (RAWS). Ivip's primary goals include the more
efficient utilisation of IPv4 space and enabling millions of end-
users to achieve portability and multihoming without involving BGP,
without fuelling the growth of the global BGP routing table, and
without requiring these end users to have ASNs or to acquire
conventional prefixes of PI (Provider Independent) BGP reachable
address space.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1. New IDs soon . . . . . . . . . . . . . . . . . . . . . . 5
1.2. Brainstorming phase . . . . . . . . . . . . . . . . . . . 5
1.3. Postal redirection analogy . . . . . . . . . . . . . . . 8
1.4. LISP and ID/LOC separation . . . . . . . . . . . . . . . 9
1.5. One way tunnels to a single ETR . . . . . . . . . . . . . 11
1.6. Anycast ITRs . . . . . . . . . . . . . . . . . . . . . . 13
1.7. Types of ETR . . . . . . . . . . . . . . . . . . . . . . 15
1.8. Types of ITR . . . . . . . . . . . . . . . . . . . . . . 15
1.8.1. ITRD - full database (push) . . . . . . . . . . . . . 16
1.8.2. ITRC - query, cache (pull) and notify . . . . . . . . 16
1.8.3. ITFH - Ingress Tunnel Function in Host . . . . . . . 16
1.9. Initial deployment . . . . . . . . . . . . . . . . . . . 17
1.9.1. Paths taken by packets . . . . . . . . . . . . . . . 19
1.9.2. Multihoming when both links are working . . . . . . . 22
1.9.3. External multihoming monitoring system . . . . . . . 22
1.9.4. Multihoming after a link fails . . . . . . . . . . . 23
1.9.5. Potential problems with internal routing systems . . 25
1.10. Ivip's intended benefits . . . . . . . . . . . . . . . . 25
1.11. Long term deployment . . . . . . . . . . . . . . . . . . 27
2. Definition of Terms, Concepts and Functions . . . . . . . . . 30
2.1. IMIP - Ivip-Mapped IP address . . . . . . . . . . . . . . 30
2.2. NIMIP - Non-Ivip-mapped IP address . . . . . . . . . . . 31
2.3. BRIP - BGP Reachable IP address . . . . . . . . . . . . . 31
2.4. UAIP - Un-Advertised IP address . . . . . . . . . . . . . 31
2.5. DID - Destination Identifier . . . . . . . . . . . . . . 31
2.6. TELOC - Tunnel Endpoint Locator . . . . . . . . . . . . . 32
2.7. IMAB - Ivip-Mapped Address Block . . . . . . . . . . . . 32
2.8. IMAB-DB - IMAB DataBase . . . . . . . . . . . . . . . . . 33
2.9. IMAB-DBD - IMAB DataBase Dump . . . . . . . . . . . . . . 34
2.10. UMUC - User Mapping Update Command . . . . . . . . . . . 35
2.11. SUMUC - Signed User Mapping Update Command . . . . . . . 35
2.12. SH/SN - Sending Host/Node . . . . . . . . . . . . . . . . 35
2.13. RH/RN - Receiving Host/Node . . . . . . . . . . . . . . . 36
2.14. IRH/IRN - Ivip-mapped Receiving Host/Node . . . . . . . . 36
2.15. MH/MN - Mobile Host/Node . . . . . . . . . . . . . . . . 36
2.16. UAS - Update Authorisation System . . . . . . . . . . . . 36
2.17. RUAS - Root Update Authorisation System . . . . . . . . . 37
2.18. US-IMAB - Update Stream specific to one IMAB . . . . . . 37
2.19. US-Complete - Update Stream for the Complete Ivip
system . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.20. Replicator . . . . . . . . . . . . . . . . . . . . . . . 38
2.21. QSD - Query Server with full Database . . . . . . . . . . 38
2.22. QSC - Query Server with Cache . . . . . . . . . . . . . . 39
2.23. ITR - Ingress Tunnel Router . . . . . . . . . . . . . . . 39
2.24. ITRD - Ingress Tunnel Router with Database . . . . . . . 39
2.25. ITRC - Ingress Tunnel Router with Cache . . . . . . . . . 40
2.26. ITFH - Ingress Tunneling Function in Host . . . . . . . . 42
2.27. ETR - Egress Tunnel Router . . . . . . . . . . . . . . . 43
2.28. ETFH - Egress Tunnel Function in Host . . . . . . . . . . 43
2.29. TTR - Translating Tunnel Router for Mobile-IP . . . . . . 43
3. The Crisis in Routing and Addressing . . . . . . . . . . . . 45
3.1. Interrelated needs and problems . . . . . . . . . . . . . 45
3.2. Constraints on possible solutions . . . . . . . . . . . . 46
4. Potential Solutions . . . . . . . . . . . . . . . . . . . . . 48
5. Comparison with LISP . . . . . . . . . . . . . . . . . . . . 49
5.1. LISP principles and mechanisms used by Ivip . . . . . . . 49
5.2. LISP principles and mechanisms not used by Ivip . . . . . 50
5.3. Additional principles and mechanisms in Ivip . . . . . . 53
6. Ivip's goals, non-goals and challenges . . . . . . . . . . . 55
7. User Interface and Update Authorities . . . . . . . . . . . . 56
8. Replicators . . . . . . . . . . . . . . . . . . . . . . . . . 63
9. Query Servers - QSD and QSC . . . . . . . . . . . . . . . . . 70
10. Ingress Tunnel (ITR) strategies . . . . . . . . . . . . . . . 71
11. Egress Tunnel (ETR) strategies . . . . . . . . . . . . . . . 78
12. Mobile-IP with TTRs . . . . . . . . . . . . . . . . . . . . . 79
13. IPv6 and longer term strategies . . . . . . . . . . . . . . . 80
14. Loose ends . . . . . . . . . . . . . . . . . . . . . . . . . 81
14.1. ETRs checking src & dest addresses . . . . . . . . . . . 81
14.1.1. Short version . . . . . . . . . . . . . . . . . . . . 81
14.1.2. ITR tunneled packet with source address of sending
host . . . . . . . . . . . . . . . . . . . . . . . . 82
14.2. Scaling the Replicator network . . . . . . . . . . . . . 96
14.3. Is fast, secure, Replication possible on the Internet? . 97
14.4. TTRs and Mobility . . . . . . . . . . . . . . . . . . . . 98
15. Security Considerations . . . . . . . . . . . . . . . . . . . 103
16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 104
17. Informative References . . . . . . . . . . . . . . . . . . . 105
Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . 106
Appendix B. The Ivip acronym . . . . . . . . . . . . . . . . . . 107
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 108
Intellectual Property and Copyright Statements . . . . . . . . . 109
1. Introduction
1.1. New IDs soon
This version 02 ID is a placeholder until I complete some further
drafts later in 2008, and rewrite this ivip-arch ID in a shorter
format.
Please refer to http://www.firstpr.com.au/ip/ivip/ for the latest
news.
The remainder of this ID is unchanged from version 00.
1.2. Brainstorming phase
The purpose of this Internet Draft is to contribute to the
development of one or more proposals to resolve the problems of what
might be called the Crisis in Routing and Addressing. Ivip is one
proposal among potentially many. Ivip is at an early stage of
development and this I-D is part of what I regard as a brainstorming
effort on the RAM mailing list. Consequently this I-D contains more
exploratory and speculative material than the architectural RFC it
may one day become. Most of the discussion below focuses on IPv4
except where noted. The goal is to develop a practical, elegant,
incrementally deployable model of Ivip for IPv4. Once a promising
model for IPv4 has been developed, full consideration should be given
to IPv6 and to what degree these two separate Ivip networks might be
integrated. Consideration should also be given to how a globally
deployed Ivip system might support IPv6 and the scalable tunneling of
IPv6 traffic over IPv4 and vice-versa.
The remainder of this long Introduction is intended primarily for
readers - such as members of the RAM list - who are already familiar
with RAWS, LISP and other proposals, and especially with
the limitations of BGP routers which are the primary reason why the
Internet needs a new routing and addressing architecture. Following
this Introduction is a section describing in detail Ivip's major
Terms, Concepts and Functions.
The three sections following this provide a fuller grounding for
readers who are new to this field, introducing the RAWS report on the
Crisis, other solutions, and a comparison with LISP. Once these
sections have been read, the Introduction should make more sense to
readers who were not yet familiar with these.
Following these are sections which contain further discussion and
diagrams regarding various deployment scenarios and about how the
ITR, ETR, TTR, Replicators and Query Server functions of Ivip can be
implemented in conventional routers, in servers, and in some cases
within hosts.
Finally, in the "Loose ends" section, is some material which I don't
have time to refine and integrate smoothly into this version of the
draft. This includes a section on ensuring ETRs are not a backdoor
around security arrangements which prevent attackers sending packets
with spoofed source addresses. There is also a section which
questions whether the Internet itself is a suitable basis for
building the fast, secure, high-volume system of RUAS servers and
Replicators. Even if it was secured with cryptographic techniques,
it would still be vulnerable to DoS attacks from botnets. The last
"Loose ends" section describes Translating Tunnel Routers and how
they may be used with Ivip's ITR system to provide much more
efficient and flexible Mobile IP connectivity than is possible with
current techniques.
While I believe the simple ITR and ETR behavior of Ivip is both
satisfactory and superior to the more complex approach of LISP
(although perhaps LISP 3 will involve simpler arrangements than
described for 1 or 1.5 in the current LISP-01 I-D), I don't feel I
have a robust enough approach to pushing the mapping data out to
ITRDs and QSDs all over the Net. If that problem can be solved, I
think Ivip has a reasonable chance of satisfying the criteria set
forth in the RRG's Design Goals for Scalable Internet Routing
[I-D.irtf-rrg-design-goals-01].
Grave problems will arise if no suitable new architectural solution
is found to the Internet's problems in routing and addressing. Ivip
is intended to facilitate a much finer splitting of IPv4 address
space than BGP allows - and therefore a much greater utilisation of
this space than is currently possible. Ivip is also intended to
provide a better approach to IP address portability and multihoming
so that fewer end-users will want to gain conventional PI address
space and further burden DFZ (Default Free Zone) routers with
additions to the global BGP routing table.
The iplane.cs.washington.edu project indicates there are
approximately 63,000 BGP routers. (Lists of alias clusters in
[iPlane].) Most of these will be transit and multihomed border
routers. The remainder are singlehomed border routers. Transit and
multihomed border routers are in the DFZ, and so need to develop a
separate routing rule for each of the 220,000 or so prefixes which
are advertised in the global BGP system. Every DFZ router needs to
communicate with each of its peers about each of these prefixes, with
messages about each prefix typically propagating across the entire
BGP system. Iljitsch van Beijnum estimates that each prefix for each
peer consumes between 60 and 240 bytes of router memory - and some
routers have dozens of peers. [van-Beijnum-BGP] Problems with the
load this places on routers, and difficulties with the stability of
the whole BGP system, are the most serious and growing problem at
present - and threaten to make many of these (probably) 50,000+
routers obsolete as the number of BGP routes grows.
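As a rough worked example of the memory figures quoted above, the
following Python sketch multiplies the per-prefix, per-peer estimate
by the approximate number of global prefixes. The peer count of 24
is purely an assumption standing in for "dozens of peers", and the
function name is illustrative only.

```python
# Rough estimate of BGP table memory, using the figures quoted in the text.
PREFIXES = 220_000                      # approximate global BGP prefixes
BYTES_PER_PREFIX_PER_PEER = (60, 240)   # van Beijnum's estimated range


def table_memory_bytes(peers: int) -> tuple[int, int]:
    """Return (low, high) estimated bytes for a router with `peers` peers."""
    low, high = BYTES_PER_PREFIX_PER_PEER
    return PREFIXES * low * peers, PREFIXES * high * peers


# ASSUMPTION: 24 peers stands in for "dozens of peers".
low, high = table_memory_bytes(24)
print(f"{low / 1e6:.1f} MB to {high / 1e6:.1f} MB")
```

Under these assumptions a single peer accounts for roughly 13 to 53
megabytes, and the total scales linearly with both the number of
peers and the number of prefixes.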
The size of these problems means that considerable resources can
justifiably be devoted to introducing a new system. So while the
problems to be overcome are daunting, the author of any such proposal
can invoke the expenditure of millions of dollars of resources with
ease, since other competing proposals will involve similar
expenditures and since inaction would result in far higher
expenditures still. However, a successful proposal must be not only
the most promising of the alternatives, but must also be
incrementally deployable. As Noel Chiappa wrote on the RRG list on
2007 July 13:
"That is *the* problem in Internet engineering these days. Any old
fool (well, sort of :-) can design a better network, or a jet
airplane; but it takes a real genius to figure out how to turn a
fabric biplane into a jet while it's flying! :-)"
Ivip requires no changes to host operating systems or applications.
Nor does it require changes to the BGP routing system. Ivip requires
new functionality within, or closely connected to, some existing BGP
and internal routers. The intention is that this can be implemented
with firmware and/or configuration changes. In principle, the entire
Ivip system could be introduced by adding specially programmed
servers, with only configuration changes to the existing routers.
However the most likely deployment scenarios involve additional
router functionality, as well as the creation of some globally
coordinated networks of servers.
Ivip ITR and ETR behavior is relatively simple. The real challenges
are in allowing end-users to securely control their part of the
mapping database, getting the database information to the ITRs
quickly and securely, implementing the ITR functions efficiently
(including in servers and sending hosts rather than routers), in
ensuring ETRs can't be used to circumvent security measures - while
ensuring that some networks will want to implement Ivip even when few
people use it or know what it is.
Please use and adapt these ideas for your own proposals and suggest
any improvements which could be made to this I-D, which was prepared
in a hurry. I intend to create a better version 01 in mid to late
August. In the meantime, please discuss this I-D on the RAM list -
http://www1.ietf.org/mailman/listinfo/ram - or via private email. It
is possible that discussions will be redirected to the RRG (IRTF
Routing Research Group) list:
http://www.irtf.org/charter?gtype=rg&group=rrg . I will attempt to
list bug-fixes and planned improvements to this I-D at
http://www.firstpr.com.au/ip/ivip/ .
1.3. Postal redirection analogy
A simple and reasonably instructive analogy to Ivip is the Post
Office's mail redirection system. Letters addressed to an original
home address are redirected from the original destination's post
office with a sticker (or within a new envelope) to a new address,
which typically involves them being delivered via a second post
office.
This often involves sub-optimal path lengths, for instance a letter
sent from Boston to an original address in San Francisco being
redirected to Manhattan. Optimal paths could be achieved - at a very
high cost - if every sorting office recognised letters with
redirected destination addresses, so the letter was redirected at its
first point of contact with the sorting and forwarding system. Ivip
does not involve every router being able to do this, but uses a
subset of routers with additional ITR (Ingress Tunnel Router)
functionality. ITRs recognise packets which need to be redirected.
They encapsulate and tunnel packets to another router (an Egress
Tunnel Router - ETR), using an address gained from a global database
for this particular block of Ivip-mapped address space.
Ivip doesn't encapsulate and tunnel packets which, in the postal
analogy, were addressed to ordinary addresses in streets which
physically exist. A postal system which is closely analogous to Ivip
would redirect every letter with a destination address in one of
multiple new artificial streets or towns, which have no physical
existence. The Post Office would create multiple "streets" such as
Twenty-seventh Virtual St in Virtualville. It then assigns, for
years or indefinitely, numbers in such streets to individuals,
families and organisations. A subset of sorting houses, through
which every letter must pass before reaching a delivery office, would
recognise every letter addressed to a street in Virtualville. For
each such letter, using a central database, these specially upgraded
sorting offices would place the letter in an envelope addressed to
one of the Post Office's delivery offices. The database query
consists of the full Virtualville address. The response consists
simply of the postal address of whichever delivery office can best
deliver the letter to its proper recipient. Whenever the proper
recipient moves to a new locality, they use a username and password
system via the Web, or via a post office, to update the central
database so their letters will be redirected to the delivery office
in the new locality. When the encapsulated letter reaches the
delivery office, the sticker or outer envelope is removed and the
office has local knowledge of how to deliver the letter to its
intended recipient.
With Mobile IP and existing postal redirection systems, the
destination typically has a physical "care-of" address (although the
Post Office's "poste restante" service does not require an address,
just identification when picking up mail from a post office). With
Ivip, the destination need not have any other IP address than its own
Ivip-mapped IP address. In the postal analogy, the delivery office
delivers letters to the correct recipient, which is not necessarily a
house with an ordinary street address. In neither Ivip nor the new
Virtualville postal routing and addressing architecture does the
system specify exactly how the final router or delivery office should
forward the packet or letter to its proper recipient. In all cases,
however, the destination does have the Ivip-mapped address, or has
the full Virtualville address emblazoned on it in some manner.
In the postal analogy, due to anti-terrorist security measures, every
initial sorting office which processes letters posted in a locality,
will not forward any letters which come from an unrecognised address.
The local system needs to recognise that letters with particular,
previously locally registered, Virtualville sender addresses should
be delivered normally, and that letters with sender addresses from
other Virtualville streets and numbers, or from any address outside
the local area, should be quarantined in a safe location where they
will await scrutiny by the Office of Homeland Security.
Packets sent from hosts with Ivip-mapped addresses only need to pass
muster in respect of their source address being locally recognised.
They don't require any special delivery system, unless of course the
destination address is Ivip-mapped too, in which case the packet will
be forwarded to an ITR, which tunnels it to an ETR which forwards it
to the destination host.
1.4. LISP and ID/LOC separation
Many proposals have been made regarding additional protocol layers to
take the place of IP addresses, which are widely regarded as
performing two functions: identifying the end-point of a
communication and specifying, as part of the address, information
about where the end-point is located. The primary goal of all these
proposals is that upper layer protocols would work with the
identifier, and continue a communication session with the end-point,
even when it becomes accessible via a different locator - for
instance due to end-point mobility or switching from one provider
network to another in a multihoming setting.
This goes beyond the functionality of the current two level DNS and
IP address system. The current system is fine for a human user or a
piece of software always commencing a communication session with a
FQDN such as www.example.org. What the current system cannot cope
with is continuing a session, such as an HTTP session over TCP, with
the remote server when that server becomes only reachable via a
different IP address. ID/LOC separation proposals generally intend
that higher layer protocols such as TCP can continue to operate on
identifiers, with a lower, new, layer of protocol software
translating these to whichever physical locators are needed to reach
the server at each moment in time.
This would allow session continuity when a multihomed host becomes
reachable only via a new IP address. It would also enable locators
to be allocated to physical sites in accordance with the dictates of
route aggregation, which makes life easier for routers, while the
allocation of identifiers need not be constrained by route
aggregation or any other constraint regarding the physical topology
of the network. However, since physical routing is done on the lower
level locators, which are still subject to topological constraints,
ID/LOC separation doesn't necessarily allow complete portability of
networks from one provider to another, since a network's internal
routing configuration is set in part by numeric locator IP addresses,
and these can't be advertised at arbitrary providers without
compromising route aggregation and/or adding a further route to the
global BGP routing table.
While Ivip is based on LISP (Locator/ID Separation Protocol)
[I-D.farinacci-lisp], which is an ID/LOC separation protocol, I am
not sure that Ivip meets all the formal requirements which proponents
of ID/LOC might have for such a protocol. In a later section
(Comparison with LISP) I attempt to list what Ivip takes from LISP,
what it leaves out and what it adds.
Some ID/LOC proposals require non-backwards compatible changes to
operating system and/or application software. Some use conventional
IP addresses for both "identifier" and "locator". For instance SHIM6
[I-D.ietf-shim6-proto] (which is still being developed) works between
IPv6 hosts with upgraded TCP/IP stacks and achieves multihoming, but
not portability, on a purely host-to-host basis without any changes
to routers or the addressing system.
LISP also uses some ordinary IP addresses for identifiers and others
for locators. LISP requires no changes to hosts, BGP routers or the
BGP routing system. It achieves its goals of portability,
multihoming and Traffic Engineering (TE) with special ITR and ETR
(Ingress and Egress) Tunnel Routers inside provider and end-user edge
networks. In the LISP variants which are most suitable for adoption,
a centralised or distributed database controls the ITRs.
Ivip is based on some of LISP's principles, including ITRs, ETRs and
using a subset of the existing address space as identifiers with the
remainder being usable as locators. Ivip does not attempt LISP's
communication between ITRs and ETRs. Nor does it involve LISP's
explicit TE functions. Ivip has a very different method of
distributing ID-LOC mapping information (instructions to ITRs on
where to tunnel packets based on their original destination address)
than is proposed in current LISP I-Ds.
Some of Ivip's ITRs are "anycast ITRs in the core" (meaning outside
provider and AS-end-user edge networks) with the mapped addresses
(identifiers, or EIDs in LISP terminology) being part of BGP
advertised prefixes. In this way, packets sent by hosts in networks
without ITRs will still be tunneled by an ITR and find their way to
hosts with Ivip-mapped addresses. This "anycast ITRs in the core"
system is an unusual form of anycast, and supports TCP and all other
protocols, because all packets are tunneled to the one destination
host. This system is believed to make Ivip much more incrementally
deployable than LISP, because without these "anycast ITRs in the
core", hosts with LISP/Ivip-mapped addresses would not be reachable
from hosts in networks which have not installed an ITR.
Ivip may have more ambitious goals than LISP regarding the fine
division of address space to serve the needs of millions of end-users
and regarding how quickly the database(s) and ITR system can respond
to user commands to change mapping of their addresses. Ivip has no
explicit TE functions, but it is intended that some TE be achievable.
For instance to achieve load balancing over two or more links to a
multihomed site which has traffic arriving on multiple Ivip-mapped
addresses, the end-user would choose, for each such Ivip-mapped
address, which ISP's ETR the packets are tunneled to and therefore
which link these packets travel on.
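The load balancing idea above can be illustrated with a minimal
Python sketch in which an end-user spreads a site's Ivip-mapped
addresses across the ETRs of two ISPs. This is only an illustration:
the function name, the round-robin policy and all addresses shown
are hypothetical, and a real end-user would presumably weight the
choice by the traffic each address receives.

```python
from itertools import cycle


def spread_mappings(addresses, etrs):
    """Assign each Ivip-mapped address to an ETR in round-robin order.

    Returns a dict of {mapped address: ETR address}; each entry is the
    mapping the end-user would set for that address.
    """
    return dict(zip(addresses, cycle(etrs)))


# HYPOTHETICAL values: four mapped addresses, one ETR per ISP.
site_addresses = ["22.22.22.1", "22.22.22.2", "22.22.22.3", "22.22.22.4"]
isp_etrs = ["192.0.2.1", "198.51.100.1"]

for mapped, etr in spread_mappings(site_addresses, isp_etrs).items():
    print(f"{mapped} -> tunnel to ETR {etr}")
```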
1.5. One way tunnels to a single ETR
If host HA has a normal BGP-reachable IP address and host HB has an
Ivip-mapped address, Ivip is only involved in tunneling packets sent
by HA to HB. The typical arrangement is for the packet to be
forwarded to an ITR which uses the packet's Destination Address (DA)
as a key to its local copy of the mapping database, with the result
being an IP address to which the packet will be tunneled. IP-in-IP
tunneling is used, with a single outer IP header added, using the
original source address. The outer header's destination address is that of an ETR,
and is provided by a copy of the database which the ITR either
contains or can query. The end-user (who runs host HB) has
previously set the database so all ITRs in the world will tunnel
packets which are addressed to host HB's Ivip-mapped address, to
whichever ETR the end-user chooses.
When the encapsulated packet arrives at the ETR, the outer IP header
is removed, and the original packet, as HA sent it and as the ITR
received it, is forwarded to host HB. (The ITR typically copies the
hop-count value from the original packet to the outer IP header and
the ETR copies it from the outer IP header to the decapsulated
packet.)
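The encapsulation and decapsulation steps just described can be
sketched in Python as follows. This is a simplified model rather
than a packet-level implementation: the Packet class and both
function names are assumptions made for illustration, and the
addresses are those used in this document's examples.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Packet:
    src: str                          # source address
    dst: str                          # destination address
    ttl: int                          # hop-count (IPv4 TTL)
    payload: Union[bytes, "Packet"]   # data, or an encapsulated packet


def itr_encapsulate(inner: Packet, etr_addr: str) -> Packet:
    """ITR: add one outer header, keeping the original source address
    and copying the hop-count from the original packet."""
    return Packet(src=inner.src, dst=etr_addr, ttl=inner.ttl, payload=inner)


def etr_decapsulate(outer: Packet) -> Packet:
    """ETR: remove the outer header and copy its hop-count into the
    decapsulated packet."""
    inner = outer.payload
    if not isinstance(inner, Packet):
        raise ValueError("not an encapsulated packet")
    return replace(inner, ttl=outer.ttl)


raw = Packet(src="7.7.7.7", dst="22.22.22.22", ttl=64, payload=b"data")
tunneled = itr_encapsulate(raw, etr_addr="54.32.1.0")
# Routers between the ITR and ETR decrement only the outer hop-count.
arriving = replace(tunneled, ttl=tunneled.ttl - 3)
delivered = etr_decapsulate(arriving)
print(delivered.dst, delivered.ttl)
```

Copying the hop-count in both directions means the hops traversed
inside the tunnel are still counted against the original packet.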
................                    ................
.      N1      .                    .      N2      .
.              .                    .              .
. HA-----ITR~~~~~BR~~~~~~TR~~~~~~BR~~~~~ETR-----HB .
.              .                    .              .
................                    ................
Figure 1: Basic left to right packet flow - ITR in N1.
Figure 1 depicts left to right flow of a packet from host HA
(7.7.7.7) to host HB (22.22.22.22). The "raw" packet, with DA =
22.22.22.22 is forwarded to the ITR in network N1. 22.22.22.22 is
part of the 22.22.0.0/16 prefix, which is one of the Ivip-Mapped
Address Blocks (IMABs) which all ITRs advertise. This means that
every ITR which is a BGP router advertises itself as the destination
for this prefix, and that every ITR which is an internal router will
inject this route into the local routing system. The one /16 prefix,
in this example, burdens the BGP system with one extra route, but can
be used to support the portable and multihoming address needs of
hundreds or thousands of end-users.
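An ITR's first decision, whether a packet's destination falls within
an advertised IMAB, might be sketched with Python's standard
ipaddress module as below. The single-entry IMAB list and the
function name are assumptions for illustration; a real ITR would
hold many IMABs and use a faster lookup structure.

```python
import ipaddress

# ASSUMPTION: one IMAB only, taken from the example in the text.
IMABS = [ipaddress.ip_network("22.22.0.0/16")]


def needs_tunneling(dst: str) -> bool:
    """Return True if dst lies inside any Ivip-Mapped Address Block."""
    addr = ipaddress.ip_address(dst)
    return any(addr in block for block in IMABS)


print(needs_tunneling("22.22.22.22"))   # HB's Ivip-mapped address
print(needs_tunneling("7.7.7.7"))       # HA's ordinary address
```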
Without Ivip or a similarly effective system, some or all of these
hundreds or thousands of end-users would get their own PI space,
totalling far more than the 65,536 addresses of 22.22.0.0/16, and
adding hundreds or thousands of routes to the global BGP routing
table.
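To make the comparison concrete, here is a small worked calculation
in Python. The figures of 1,000 end-users and a /24 of PI space each
are assumptions chosen only to illustrate the scale involved.

```python
import ipaddress

imab = ipaddress.ip_network("22.22.0.0/16")

END_USERS = 1000       # ASSUMPTION: "hundreds or thousands" of end-users
PI_PREFIX_LEN = 24     # ASSUMPTION: each end-user obtains a /24 of PI space

pi_addresses = END_USERS * 2 ** (32 - PI_PREFIX_LEN)
pi_routes = END_USERS  # one global BGP route per PI prefix

print(f"PI space:  {pi_addresses} addresses, {pi_routes} BGP routes")
print(f"One IMAB:  {imab.num_addresses} addresses, 1 BGP route")
```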
Ivip's capacity to reduce the growth of the BGP routing table and to
enable the efficient use of IPv4 space by giving end-users precisely
the number of addresses they need - not 256, 512, 1024 etc. addresses
- rests on the RIRs developing an address management policy for Ivip-
mapped address space which generally ensures that large blocks of
addresses are assigned to the Ivip system, with each block being used
to serve the needs of many end-users. It should not be difficult to
develop and implement such policies.
Ivip is intended to serve the needs of end-users who need
portability, multihoming and perhaps TE. Some of these end-users
have already gained - or in the absence of a new routing and
addressing architecture, would soon gain - an ASN and PI space to add
to the BGP routing system. Ivip is also intended to serve the needs
of end-users who do not have the resources to become an AS, gain a PI
prefix etc., but who nonetheless need portability, multihoming and
perhaps TE over multiple links to providers.
Portability of a single IP address between providers is not
ordinarily considered a high priority goal, since a single host or
NAT router and its DNS entry can easily be manually configured to a
new IP address whenever a new provider is used. However there may be
instances where an organisation has hundreds of branch offices, each
with a single or a few IP addresses, which it wishes to remain fixed
despite changing each office's singlehomed connection to one local
provider or another, so its country-wide routing system does not need
to be reconfigured frequently.
1.6. Anycast ITRs
Multiple routers, usually each with an associated server, advertising
the same prefix is known as "anycasting" [RFC1546] [ISC-Anycast].
Ivip's use of multiple anycast routers may be novel: tunneling
packets to a single tunnel endpoint, which forwards the packets to a
single host.
Each ITR either has a copy of the Ivip database (ITRD) or queries
(ITRC) a QSD server (perhaps indirectly through one or more caching
QSC servers) which does have a copy. The database's array for the
22.22.0.0 IMAB has 65,536 elements - one for each IP address. Each
element contains a 32 bit IP address. The element for 22.22.22.22
has been set by the end-user to contain the address 54.32.1.0, which
is the address of the ETR in Network N2.
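
The array lookup just described can be sketched as follows. This is
a minimal illustration in Python, not a defined Ivip data format: the
names IMAB, mapping, set_mapping and lookup are hypothetical, and the
use of 0 to mean "no mapping set" is an assumption.

```python
import ipaddress
from typing import Optional

# Hypothetical in-memory model of the mapping array for one IMAB.
IMAB = ipaddress.ip_network("22.22.0.0/16")

# One 32 bit slot per address in the IMAB. 0 = no mapping (assumption).
mapping = [0] * IMAB.num_addresses


def _index(imip: str) -> int:
    """Offset of an address within the IMAB's array."""
    addr = ipaddress.ip_address(imip)
    if addr not in IMAB:
        raise ValueError(f"{imip} is not within {IMAB}")
    return int(addr) - int(IMAB.network_address)


def set_mapping(imip: str, etr: str) -> None:
    """Store the ETR address for one Ivip-mapped address."""
    mapping[_index(imip)] = int(ipaddress.ip_address(etr))


def lookup(imip: str) -> Optional[str]:
    """Return the ETR address for an address, or None if none is set."""
    try:
        value = mapping[_index(imip)]
    except ValueError:
        return None  # not part of this IMAB
    return str(ipaddress.ip_address(value)) if value else None


# The example from the text: 22.22.22.22 is mapped to the ETR in N2.
set_mapping("22.22.22.22", "54.32.1.0")
```

With this in place, lookup("22.22.22.22") yields the ETR address,
while an address in the IMAB with no mapping set yields None.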
The ~~~~ path in Figure 2 depicts the encapsulated packet being
forwarded from the ITR, to N1's border router, to a transit router,
to N2's border router and then to its destination, the ETR in N2.
This transport of the encapsulated packet has been handled entirely
by the standard BGP system and N2's internal routing system.
In this example, the BGP system sees only a packet with the
Destination Address (DA) of 54.32.1.0. If there had been no ITR in
N1, but the TR transit router in Figure 1 was an ITR - as shown in
Figure 2 - then the BGP system would handle two different packets.
The first is a "raw" packet with DA = 22.22.22.22, which was
forwarded to the ITR function of this transit router. The second is
the encapsulated packet leaving this ITR transit router for the
border router of N2, with its outer IP header having DA = 54.32.1.0.
................                    ................
.      N1      .                    .      N2      .
.              .                    .              .
. HA-----IR------BR-----ITR~~~~~~BR~~~~~ETR-----HB .
.              .                    .              .
................                    ................
Figure 2: Basic left to right packet flow - anycast ITR in core.
In both Figures 1 and 2, the ETR removes the outer IP header,
revealing the original packet. After updating its hop-count, the ETR
forwards the decapsulated packet to the destination host. This
requires either a direct connection to the destination host or
support from N2's internal routing system. The latter involves the
routing system recognising packets with DA = this particular IP
address - 22.22.22.22 - as needing to be forwarded to this host,
while (assuming N2 has no other hosts using Ivip-mapped addresses
from this IMAB) other addresses within 22.22.0.0/16 are forwarded as
usual. "As usual" means towards any ITR inside N2 or failing that,
to N2's border router, because this prefix is one which is advertised
in BGP by multiple anycast ITRs in the core.
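
The behaviour required of N2's internal routing system amounts to
ordinary longest-prefix matching: a host route for the one Ivip-
mapped address takes precedence over the covering IMAB route. A
minimal sketch, assuming a hypothetical three-entry route table with
descriptive strings standing in for real next hops:

```python
import ipaddress

# Hypothetical internal routes for N2: (prefix, next hop description).
ROUTES = [
    (ipaddress.ip_network("22.22.22.22/32"), "destination host"),
    (ipaddress.ip_network("22.22.0.0/16"), "nearest ITR or border router"),
    (ipaddress.ip_network("0.0.0.0/0"), "border router"),
]


def next_hop(da: str) -> str:
    """Choose the matching route with the longest prefix."""
    addr = ipaddress.ip_address(da)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop
```

Here next_hop("22.22.22.22") selects the host route, while any other
address in 22.22.0.0/16 is forwarded "as usual" towards an ITR.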
A border router of a provider or AS-end-user network which is an ITR
and advertises 22.22.0.0/16 to its BGP peers in other ASes also
functions as an "anycast ITR in the core" because "raw" packets
emerging from networks with no ITR will be forwarded to this ITR
border router and be encapsulated and tunneled from there. Exactly
why a network would provide this service for packets not associated
with its network is a separate question.
A border router in N1 may be a convenient location to install ITR
functionality. A more likely arrangement is that it would not
advertise 22.22.0.0/16 or any of the other IMABs in the Ivip system
to its BGP peers outside N1 (so as not to attract packets originating
from non-ITR networks). The border ITR would internally advertise
22.22.0.0/16 and the other IMABs so that all packets addressed to an
Ivip-mapped IP address (IMIP) would be forwarded internally to this
border router ITR.
To the picture given by Figures 1 and 2, three other concepts need to
be added.
Firstly, packets sent from hosts all over the Net to 22.22.22.22 are
tunneled by ITRs to the one ETR at any one time.
Secondly, the tunnel endpoint address used by all the ITRs can be
changed globally within a short time - ideally within a few seconds -
by the end-user who controls the Ivip-mapping of 22.22.22.22.
Thirdly, if the network in which the sending host is located does not
have an ITR, the raw packet will be forwarded internally to the
border router and then forwarded through the BGP system to the
"nearest" (in BGP terms) ITR, which tunnels it to the end-user's
chosen ETR.
Packets flowing from HB to HA do not require any involvement of
Ivip. Each ITR and ETR shown in the previous and the following
diagrams also performs whatever functions an ordinary router in its
position performs. A packet sent from HB to HA is forwarded
internally to N2's border router and then through BGP routers to
the border router of N1, after which it is forwarded internally to
HA. The packet may well pass through N2's ETR, but since its DA is
not one of N2's ETR's IP addresses, that ETR forwards it normally.
The packet may well pass through N1's ITR (Figure 1) or the core-ITR
in Figure 2 - but since its DA is not within one of the Ivip system's
IMABs, both of those ITRs behave like an ordinary internal router
(Figure 1) or transit router (Figure 2) and forward the packet
normally towards HA.
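
The per-packet decision described above can be sketched as follows.
This is a simplified model under stated assumptions, not an Ivip
packet format: Packet, itr_process and etr_process are hypothetical
names, only the destination address is modeled, and the handling of
an address which is within an IMAB but has no mapping is not covered
(the sketch returns None).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the set of IMABs and the mapping database.
IMABS = [ipaddress.ip_network("22.22.0.0/16")]
MAPPING = {"22.22.22.22": "54.32.1.0"}


@dataclass
class Packet:
    da: str                           # destination address
    inner: Optional["Packet"] = None  # original packet, if encapsulated


def itr_process(pkt: Packet) -> Optional[Packet]:
    """Encapsulate packets addressed to an IMAB; pass others through."""
    da = ipaddress.ip_address(pkt.da)
    if not any(da in imab for imab in IMABS):
        return pkt  # DA not in any IMAB: act as an ordinary router
    etr = MAPPING.get(pkt.da)
    if etr is None:
        return None  # assumption: unmapped addresses are not modeled
    return Packet(da=etr, inner=pkt)  # outer header has DA = ETR


def etr_process(pkt: Packet) -> Packet:
    """Remove the outer header, if any, revealing the original packet."""
    return pkt.inner if pkt.inner is not None else pkt
```

A packet with DA = 22.22.22.22 leaves itr_process with an outer DA of
54.32.1.0. A packet whose DA is outside every IMAB, such as one sent
from HB to HA, is returned unchanged.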
If HA had an Ivip-mapped address too, then packets sent to it from HB
would also need to go via an ITR and an ETR. These are not shown in
the previous two diagrams.
1.7. Types of ETR
ETRs (with the exception of some TTRs - Translating Tunnel Routers,
for mobile destination hosts) are always located in provider or AS-
end-user networks. It is also possible for the destination host to
perform its own ETR function, which requires it to have suitable
software and a BGP-reachable care-of address.
TTRs are discussed in a section below concerning mobility.
1.8. Types of ITR
ITRs are typically located in provider or AS-end-user networks. ITRs
outside those networks - "anycast ITRs in the core" - handle packets
sent from networks which have no ITR. It is also possible to perform
the ITR function in the sending host, provided that host is not
behind NAT. The NAT router itself, assuming it is not behind NAT, is
a good place to perform the ITR function.
In this introduction, I assume each ITR handles the full range of
IMABs in the Ivip system. However, to spread load over multiple ITRs
in a single location, several could be configured so they each cover
a fraction of the total Ivip-mapped address space.
1.8.1. ITRD - full database (push)
An ITRD is an ITR which has a real-time updated copy of the full
Ivip-mapping database (or multiple databases, one for each IMAB).
Its FIB is always up-to-date, instantly tunneling all packets
received whose DA is within any one of the IMABs. An ITRD requires a
very extensive FIB and a large amount of CPU RAM. An ITRD could be
implemented in a server - but the highest performance ITRDs would
always be those with a full ASIC-based router FIB hardware system.
1.8.2. ITRC - query, cache (pull) and notify
An ITRC does not keep a full copy of the database, but queries a
nearby (ideally) Query Server which does have a full copy. Query
Servers are not described in detail in this introduction. The ITRC's
FIB only tunnels packets for which the ITRC has recently received
mapping information.
Each ITRC is informed by its Query Server if the mapping changes for
any IMIP (Ivip-mapped IP address) for which that ITRC recently
received mapping information. This cache invalidation message is known as
Notification, and is initiated by the Query Server which has a real-
time updated copy of the full database for each of the IMABs in the
Ivip system. ITRCs could be implemented with a server, but the
highest performance ITRCs will generally be routers with additional
capabilities, using their existing FIB hardware to encapsulate
packets.
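
The query, cache and Notification behaviour can be sketched as
follows. This is a minimal in-process model, not a protocol
definition: the QueryServer and ITRC classes and their methods are
hypothetical, ETR addresses are represented by labels, and the
default caching time of 600 seconds is only an example value. For
brevity the Query Server here never forgets which ITRCs queried it,
whereas a real one would only notify recent requesters.

```python
import time


class QueryServer:
    """Stand-in for a Query Server with a full copy of the database."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.requesters = {}  # IMIP -> set of ITRCs which queried it
        self.queries = 0

    def query(self, itrc, imip):
        self.queries += 1
        self.requesters.setdefault(imip, set()).add(itrc)
        return self.mapping.get(imip)

    def update(self, imip, etr):
        """Apply a mapping change and notify ITRCs which asked about it."""
        self.mapping[imip] = etr
        for itrc in self.requesters.get(imip, ()):
            itrc.notify(imip, etr)


class ITRC:
    def __init__(self, server, cache_seconds=600, clock=time.monotonic):
        self.server = server
        self.cache_seconds = cache_seconds
        self.clock = clock
        self.cache = {}  # IMIP -> (ETR, expiry time)

    def lookup(self, imip):
        """Use cached mapping if still fresh, otherwise query the server."""
        now = self.clock()
        entry = self.cache.get(imip)
        if entry is not None and entry[1] > now:
            return entry[0]
        etr = self.server.query(self, imip)
        if etr is not None:
            self.cache[imip] = (etr, now + self.cache_seconds)
        return etr

    def notify(self, imip, etr):
        """Notification: replace cached mapping, keeping its expiry time."""
        if imip in self.cache:
            self.cache[imip] = (etr, self.cache[imip][1])
```

After one lookup, a later update at the Query Server reaches the ITRC
through notify(), so the next lookup returns the new ETR without a
second query.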
1.8.3. ITFH - Ingress Tunnel Function in Host
An ITFH (Ingress Tunnel Function in Host) is an operating system
implementation of an ITRC. As such, this is an additional layer of
TCP/IP software in the upper part of the IP Layer 3 code, at the same
level chosen for SHIM6 [I-D.ietf-shim6-proto]. This is suitable for
hosts which are not behind NAT or for a NAT router itself, provided
it is not behind NAT. There is absolutely no requirement for ITFH in
Ivip, but in the longer term, if Ivip or something similar becomes
widely deployed, the most cost-effective location to perform most or
all encapsulation may be in the sending host or the NAT router.
ITFHs and ITRCs may not always be able to gain mapping information fast
enough to correctly tunnel all packets whose destinations are Ivip-
mapped. Also, they may not be able to store all this information in
their RAM, or implement all the mapping in their limited FIB
functions. These "unmatched" packets (including those which are not
novel, but which for one reason or another should be encapsulated but
have not been) may be simply forwarded normally, in which case they
will find their way to an ITRD. Alternatively, the ITRC or ITFH may
be able to identify these packets and explicitly forward or tunnel
them to a nearby ITRD.
1.9. Initial deployment
The simplest initial deployment of Ivip involves a single database,
multiple anycast ITRs in the core, and one or more ETRs in each of
multiple provider networks. Better performance would be achieved
with ITRs in provider and AS-end-user edge networks. The diagram
below assumes a single or distributed database system which controls
all ITRs. In later sections I describe how multiple databases, one
for each IMAB, are distributed over multiple systems and their
updates combined and distributed by a global Replicator system.
......... ..........
. N1 . . N3 .
. . . .
. . . /-IH5 .
. H1----\ . . / .
. BR1------ITR1-------BR3--ETR1 . Multihomed
. H2----/ . \ / \ / .\ \ . end-user
. . \ / \ / . H6 \- PE1-\ ...........
......... \ / \ / .......... \ . N5 .
\/ \/ \ . .
/\ /\ CE1---IH9 .
......... / \ / \ .......... / . \ .
. . / \ / \ . . / . \-IH10 .
. H3-ITR2 . / \ / \ . /------ETR2-/ . .
. BR2-------TR1-------BR4---H7 . ...........
. H4----/ . . \-IH8 .
. . . .
. N2 . BR4 = ITR & ETR . N4 .
......... ..........
Figure 3: Simple multihoming scenario.
The following discussion relates to Figure 3. This represents a
small section of the Internet, but we can assume it is the entire
Internet for these examples.
Networks N1 to N4 are provider (ISP) networks. N5 is the network of
an end-user. Current multihoming practice requires the end-user to
have their own PI (Provider Independent) address space, which
typically requires them to be an Autonomous System. This means they
could run BGP routers, but this is not actually required. All that
is required is that both of N5's providers have links to N5's CE1
(Customer Edge) router and that one or the other advertises N5's PI
prefix at its border routers and forwards those packets to CE1.
I assume the reader is fully familiar with this approach to
multihoming. The central challenge in devising a new routing and
addressing architecture for the Internet involves achieving
multihoming without N5 having an unnecessarily large number of IP
addresses assigned to it, and without burdening the BGP system either
with an extra advertised prefix or with changes to this advertisement
when, for instance, the link to N3 fails and N4's BR4 advertises the
prefix instead. The sections following this Introduction provide
more background information on these matters.
N1 is an unaltered provider network - it has no ITRs or ETRs.
Therefore it is not possible (except via a TTR inside or outside N1)
to have any hosts there using Ivip-mapped addresses.
N2 has an ITR but no ETR. Without an ETR (and ignoring TTRs for the
rest of this discussion) N2 cannot have any hosts with Ivip-mapped
addresses.
N3 has an ETR. The diagram shows one host IH5 with an Ivip-mapped
address. In this discussion I will assume that each host has a
single Ivip-mapped address, but it is perfectly possible for a host
to have multiple such addresses, prefixes of such addresses etc. as
well as having ordinary BGP-reachable non-Ivip-mapped addresses. N3
has a PE1 (Provider Edge) internal router which has a link to the
end-user's site.
N4's border router BR4 is both an ITR and an ETR. N4's Provider Edge
router (ETR2) with a link to the end-user's site is also an ETR. N4
has a host H7 with an ordinary address and IH8 with an Ivip-mapped
address.
N5 has an Ivip-mapped prefix: 22.22.2.0/28 - 16 IP addresses. These
are effectively PI addresses, because they have been obtained either
from the Ivip system itself, or from whichever company (perhaps an
ISP) is participating with the Ivip system and which has assigned the
IMAB 22.22.0.0/16 to the Ivip system. N5 probably pays a small
annual fee for these addresses, and may need to justify its use of
them, as pressure mounts to use IPv4 space efficiently.
N5's Ivip-mapped prefix consists of 16 contiguous IP addresses which
happen to fit on binary boundaries. Initially we will consider
multihoming for robustness, with all these 16 addresses treated as a
prefix, and all tunneled to one ETR or another. In practice, the
Ivip-system and the ITRs will tunnel packets to whatever address the
end-user chooses, subject to some areas of the address space being
off-limits for tunneling and also subject to packets never being
tunneled to any address which is Ivip-mapped. In this example, I
assume that the end-user ensures that packets addressed to their
addresses are always tunneled to an ETR of a provider they have a
commercial relationship with.
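
The two restrictions on tunnel endpoints mentioned above could be
checked along the following lines. This is a sketch only: the
function name is hypothetical, and the contents of OFF_LIMITS are
assumed examples, since the actual off-limits areas are not listed
here.

```python
import ipaddress

# The IMABs of the Ivip system (one example block).
IMABS = [ipaddress.ip_network("22.22.0.0/16")]

# Assumed examples of address space which is off-limits for tunneling.
OFF_LIMITS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("224.0.0.0/4"),
]


def is_acceptable_tunnel_endpoint(address: str) -> bool:
    """Reject endpoints which are Ivip-mapped or in off-limits space."""
    addr = ipaddress.ip_address(address)
    if any(addr in imab for imab in IMABS):
        return False  # never tunnel to an Ivip-mapped address
    if any(addr in net for net in OFF_LIMITS):
        return False
    return True
```

Such a check would accept 54.32.1.0 but reject 22.22.2.1, which is
itself Ivip-mapped.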
1.9.1. Paths taken by packets
Here I will give examples of packet flows in Figure 3.
Packets to and from hosts with ordinary BGP Reachable IP (BRIP)
addresses follow predictable paths, for instance: H1, BR1, ITR1
(acting as an ordinary transit router), BR3, H6. Packets sent by H6
to H1 follow the same path in reverse.
Packets sent by H1 to IH5 (with an Ivip-mapped address) follow this
path: H1, BR1, ITR1 (which encapsulates the packet with IP-in-IP, DA
= ETR1's IP address), BR3, ETR1 (decapsulates the packet), BR3
(assuming N3's internal routing system has an appropriate route to
handle packets with DA = IH5's Ivip-mapped address), IH5.
Packets sent from IH5 to H1 follow a simpler path, because the
destination address is an ordinary BRIP - so the packet is handled by
the usual internal and BGP systems, without involving Ivip
mechanisms: IH5, BR3, ITR1 (acting as a conventional transit router),
BR1, H1.
A packet from H3 to IH5 does not use the core-ITR ITR1, because its
network N2 has its own ITR. The path is: H3, ITR2 (which
encapsulates the packet with DA = ETR1's IP address), BR2, TR1 (or
perhaps ITR1, depending on BR2's choice of best path for the prefix
which ETR1's BRIP matches), BR3, ETR1 (which decapsulates it, to
restore the original packet with DA = IH5's Ivip-mapped address),
BR3, IH5. Packets from IH5 to H3 involve no Ivip handling and follow
a path such as IH5, BR3, TR1, BR2, ITR2 (acting as an ordinary
internal router, since the packet's DA is not part of an Ivip-mapped
address block - IMAB), H3.
A packet from H4 to IH5 would follow a similar path to that just
described, but initially it would travel to BR2, and then to ITR2.
ITR2 advertises (injects?) the routes for all the IMABs into N2's
internal routing system. BR2 forwards the packet to ITR2 for this
reason. If there was no ITR in N2 (like the situation in N1) then
BR2 would have forwarded the packet to one of its BGP peers, probably
ITR1, which also advertises the same set of IMABs. I assume that the
internal routing system route for packets addressed to any one of
these IMABs takes precedence for BR2. Once the packet reaches ITR2,
it is encapsulated and forwarded as previously described to ETR1,
where it is decapsulated and forwarded to IH5.
A packet sent from H6 to IH5 would presumably be handled by N3's
internal routing system, which presumably has a route specific for
IH5's Ivip-mapped address. If not, then the packet will be forwarded
out of N3, because N3 has no ITR, and will reach the nearest ITR,
which is the core-ITR ITR1. There it will be encapsulated and
forwarded to ETR1, to be decapsulated and forwarded through BR3 to
IH5.
Similarly, a packet from H6 to IH9 will either be handled by N3's
internal routing system - forwarded directly as a raw packet through
BR3, ETR1 (as a normal internal router), PE1, CE1 and IH9 - or be
forwarded out to ITR1, where it is encapsulated, forwarded to BR3 and
ETR1, decapsulated and forwarded to PE1, CE1 and IH9.
N2 has its own ITR, so its hosts do not rely on external ITRs such as
ITR1 when sending packets to hosts with Ivip-mapped addresses. N2
has no ETR, so it can't have any hosts with Ivip-mapped addresses.
N4 has one ITR and two ETRs, so it can have hosts with
Ivip-mapped addresses. N4's hosts don't rely on external ITRs
either when sending packets to Ivip-mapped addresses.
The hosts in N5 are all on Ivip-mapped addresses. When they send
packets to hosts with Ivip-mapped addresses which are outside N5,
these packets will need to be handled by an ITR - unless the
destination host is within whichever provider network (N3 or N4) CE1
is currently sending outgoing packets to, and that provider network's
internal routing system has a route for that destination host. If IH9 sends a
packet to IH8, while CE1 is sending outgoing packets along the link
to N3, then the raw packet will be forwarded out of N3, since N3 has
no ITR. The raw packet will be forwarded to ITR1, which will
encapsulate it and tunnel the packet to BR4, which decapsulates it
and forwards it to IH8.
If there was no core-ITR such as ITR1 nearby, these packets would
have to travel to the nearest core-ITR. This is assuming that N4's
BR4, which is an ITR, is not advertising the Ivip IMABs to its BGP
peers. If there was no nearby core-ITR and N4's BR4 was advertising
the Ivip IMABs, then in the previous example, the raw packet would be
forwarded out of BR3 and find its way to BR4, which is acting like a
core-ITR. BR4 could respond in two ways. Firstly, BR4 would look
into its database (if it was an ITRD - or use a Query Server if it
was an ITRC) and find that the Ivip mapping for this address (IH8's)
is to tunnel it to one of BR4's own addresses. It could encapsulate
it, forward it to itself and decapsulate it. Secondly, before
testing the packet against the Ivip database, BR4's FIB could first
apply local routing rules to the packet, in which case the packet
would be forwarded directly to IH8. This would be a rare, but
perfectly valid, case where a packet sent to a host with an Ivip-
mapped address completes the journey, in this case via three
networks, without actually being tunneled.
It would be a public-spirited act for N4 to make its BR4 ITR
functions available to packets arriving from its BGP peers. There
could be a number of reasons why N4 does this, including simply
wanting to encourage Ivip adoption, in the hope of saving a bunch of
money by not having to upgrade its DFZ routers as quickly as would be
required without something like Ivip. Perhaps there could be some
central collection of funds and subsidisation of core-ITRs - which
BR4 would effectively become - if it advertised the Ivip IMABs to its
BGP peers.
However, any ITR which does this MUST forward all encapsulated
packets without restriction. For instance if ITR1 was an ordinary
transit router and there was no other core-ITR anywhere close, then
BR4, acting as a core-ITR, could be handling packets which have
nothing directly to do with N4 or its customers. For instance, a
packet from H2 to IH5 would follow this path: H2, BR1, TR1, BR4
(acting as an ITR, encapsulates it), TR2 (a new name for the transit
router where ITR1 was), BR3, ETR1 (which decapsulates it), BR3 and
IH5. It would not be acceptable for N4 to make BR4 an anycast ITR
for its BGP peers and only forward encapsulated packets received from
those peers where the final destination was within N4.
N5 could have its own ITR, which would get the raw packet and
encapsulate it - but perhaps N5 doesn't want to run an ITR. Reasons
could include the capital cost, the high traffic volume of database
updates for an ITRD, or, for an ITRC, the slow response times and
extra traffic which result from querying over N5's relatively slow
link to the Query Server(s) in whichever provider network CE1 is
currently sending outgoing packets to.
This discussion has involved a lot of low-level detail but I hope it
has helped the reader understand various ways packets can flow with
Ivip.
1.9.2. Multihoming when both links are working
The end-user arranges with N3 and N4 to configure their ETRs,
internal routing systems etc. ready to accept encapsulated packets
for its 22.22.2.0/28 prefix. This also involves N3 and N4 allowing
packets with Source Addresses (SAs) from this prefix to be forwarded
normally, including out of their border routers to the BGP system.
In the case of N4, BR4 must accept outgoing packets with SAs within
22.22.2.0/28 to be forwarded to its BGP peers and to be accepted into
its ITR function (if their DA matches one of the Ivip IMABs).
N5's CE1 router accepts incoming packets with DA matching
22.22.2.0/28 on either link, and forwards them to the local network,
which is shown with two hosts IH9 and IH10 which both have Ivip-
mapped addresses.
The administrators of N3 and N4 tell the end-user (the administrator
of N5) the IP addresses of their two ETRs: ETR1 and ETR2. It would
also have been possible for N4 to have the packets decapsulated by
its BR4 which is N4's second ETR, as well as an ITR and border
router. However, this is a busy router and it makes more sense to
have ETR2 do the decapsulating work. In this case - ETR2 doing the
decapsulation - N4 doesn't have to alter its internal routing system
to forward packets for N5's prefix, because the link to CE1 is
connected directly to an interface of ETR2. N3 does need to
configure its internal routing system to handle CE1's prefix, unless
some special tunneling is used to get the decapsulated packets from
ETR1 to PE1.
1.9.3. External multihoming monitoring system
Not shown in this diagram is some kind of commercial monitoring
system, which the end-user hires to keep a constant watch on the
status of their multihoming arrangement.
Monitoring of link failure etc. is not part of Ivip. There may be an
argument for one or more IETF standardised protocols etc. for such a
monitoring system. Here we assume there is a monitoring system which
can rapidly and reliably detect any failure which affects N5's
multihoming arrangement, including for instance the failure of the
link to either ISP, the failure of either ISP's PE router, its ETR,
or the ISP's entire connection to the Net.
The monitoring system probably needs to be located entirely outside
N3, N4 or N5. In principle, it might be possible to locate it in N5,
but the whole purpose of a monitoring system is to change the Ivip
database once a fault occurs, so that the ITRs tunnel packets to an
alternative ETR which has a working link to CE1. Any such commands
need to be cryptographically secured, and a unidirectional system for
such commands to whatever accepts commands to alter the mapping
database might be vulnerable to a replay attack. I assume that the
monitoring system needs a reliable two way link to whatever Update
Authorisation Server (UAS) the end-user uses to alter the mapping of
their Ivip-mapped addresses. In that case, it is best that the
monitoring system not be in the end-user network, because at the time
of N3 link failure, two-way communication can't occur using the
current ITR tunnels, which are still to N3's ETR.
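
One way a two-way link helps against replay is a challenge-response
exchange, sketched below. This is an illustration only and not a
proposed UAS protocol: the UpdateServer class and sign function are
hypothetical, and a shared-secret HMAC is used as a simple stand-in
for whatever cryptographic scheme would really secure the commands.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, command: bytes, nonce: bytes) -> bytes:
    """Authenticate a command together with the server's nonce."""
    return hmac.new(key, nonce + command, hashlib.sha256).digest()


class UpdateServer:
    """Stand-in for a UAS which accepts mapping commands."""

    def __init__(self, key: bytes):
        self.key = key
        self.outstanding = set()  # nonces issued but not yet used

    def challenge(self) -> bytes:
        """Server-to-client direction: issue a fresh single-use nonce."""
        nonce = secrets.token_bytes(16)
        self.outstanding.add(nonce)
        return nonce

    def submit(self, command: bytes, nonce: bytes, tag: bytes) -> bool:
        """Accept a command only once per nonce and only if it verifies."""
        if nonce not in self.outstanding:
            return False  # unknown or already used nonce: replay
        self.outstanding.discard(nonce)
        expected = sign(self.key, command, nonce)
        return hmac.compare_digest(expected, tag)
```

A command captured in transit cannot be replayed later, because the
nonce it was bound to has already been consumed.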
It is conceivable that the UAS could be preconfigured to communicate
with a monitoring system at the end-user site via its own tunneling
of packets to one or more ETRs which are not currently tunneled to by
the Ivip ITRs. That would best be achieved by an IETF standardised
protocol. It is also conceivable that an external monitoring system
might accept prompt, cryptographically secured, messages from some
router or server in the N3 network the moment the link to the CE1
went down. This too could be the subject of an IETF standardised
protocol, but it would not directly involve ETRs or Ivip.
Ignoring for the moment packets sent by hosts in N3, N4 or N5,
initially all packets sent by hosts all over the world to any address
in N5's Ivip-mapped prefix are tunneled to ETR1, because that is the
ETR to which the end-user and/or the monitoring system has configured
the database to map these 22.22.2.0/28 addresses. In this example,
the end-user has given the monitoring system the private key,
username and password etc. which are necessary for the monitoring
system to automatically change the mapping, via the UAS which handles
the end-user's Ivip-mapped addresses.
1.9.4. Multihoming after a link fails
The monitoring system sends frequent probe packets to CE1, by
tunneling packets to both ETR1 and ETR2. The monitoring system
might also monitor the current state of the mapping of the end-user's
16 Ivip-mapped addresses. It could do this either by gaining a real-
time feed of database changes, or by querying a Query Server (which
would use Notify to instantly inform the monitoring system of any
change). At some point in time, the inability of the monitoring
system to receive responses to probe packets sent via ETR1 causes it
to decide this link has failed. It uses the credentials supplied by
the end-user and initiates a session with the UAS by which this end-
user controls the mapping of its addresses.
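
The decision step of such a monitoring system might look like the
following sketch. Monitoring is outside Ivip itself, so everything
here is an assumption: the FailoverMonitor class is hypothetical, and
the rule used (three consecutive unanswered probes via the active
ETR, with the most recent probe via the other ETR answered) is just
one plausible policy.

```python
class FailoverMonitor:
    """Track probe results via two ETRs and decide when to switch."""

    def __init__(self, active, standby, max_missed=3):
        self.active = active
        self.standby = standby
        self.max_missed = max_missed  # assumed threshold
        self.missed = {active: 0, standby: 0}

    def record_probe(self, etr, answered):
        """Record one probe result.

        Returns the newly active ETR if a failover is triggered,
        otherwise None.
        """
        self.missed[etr] = 0 if answered else self.missed[etr] + 1
        active_down = self.missed[self.active] >= self.max_missed
        standby_up = self.missed[self.standby] == 0
        if active_down and standby_up:
            self.active, self.standby = self.standby, self.active
            return self.active
        return None
```

When record_probe returns an ETR, the monitoring system would then
use the end-user's credentials to request the mapping change.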
Once logged in, the monitoring system could issue separate commands
to change the mapping for each of the 16 IP addresses, or a single
command for all 16 together. It changes their mapping from the IP
address of ETR1 to the IP address of ETR2. There is no particular
reason other than N5's internal networking convenience why its
addresses should be a conventional prefix on binary boundaries, as
they are in this example. The Ivip system can handle individual
addresses and arbitrary ranges of addresses with equal ease.
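
A mapping change for an arbitrary range can be sketched as follows.
The function update_range and the dictionary representation are
hypothetical, and ETRs are represented by labels rather than real
addresses. The point shown is only that a range need not fall on
binary boundaries.

```python
import ipaddress


def update_range(mapping: dict, first: str, last: str, etr: str) -> int:
    """Map every address from first to last (inclusive) to one ETR.

    Returns the number of addresses whose mapping was set.
    """
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    if end < start:
        raise ValueError("last address precedes first address")
    for value in range(start, end + 1):
        mapping[str(ipaddress.ip_address(value))] = etr
    return end - start + 1


mapping = {}
# All 16 addresses of 22.22.2.0/28 with a single command.
update_range(mapping, "22.22.2.0", "22.22.2.15", "ETR1")
# Five addresses which do not form a prefix on binary boundaries.
update_range(mapping, "22.22.2.3", "22.22.2.7", "ETR2")
```

After both calls, 22.22.2.3 to 22.22.2.7 map to ETR2 and the other
eleven addresses of the prefix still map to ETR1.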
The precise details of the UASes, databases, update streams, database
dump files, Replicators, ITRs and QSD/QSC Query Servers are detailed
in later sections of this I-D. The discussion here gives a rough
idea of what is achieved by these systems.
The command from the monitoring system - or from any other system or
a web-browser human interface session with anyone or anything with
credentials accepted by the UAS - will cause the UAS to hand down a
User Mapping Update Command (UMUC), with its signature (therefore a
SUMUC), to another UAS which delegated it the responsibility for
whatever ranges of Ivip-mapped addresses it is authoritative for.
This command causes a change in the database for the particular IMAB
which the end-user's addresses are part of. This results in the
change being incorporated within a second or so into multiple
identical UDP packets which are sent to 30 or so "Level 1
Replicators". (There may be other ways of achieving the same
results, but this is the plan I am pursuing at present.) Three or
four levels of replicators reliably propagate the changes to the
global network of ITRs and QSDs (Query Servers with a full copy of
the Database).
This causes a nearly instant (say a few seconds delay, but ideally a
fraction of a second) change in the FIBs of the ITRDs all over the
Net - so that all packets arriving with DA matching 22.22.2.0/28 will
now be tunneled to ETR2, instead of to ETR1. Some ITRCs will
recently (perhaps within some standard caching time, such as 600
seconds) have requested mapping from a QSD (perhaps via one or more
QSCs - Query Servers with Cache) and received a response concerning
any one of the 16 addresses which have just had their mapping
changed. Each such ITRC will quickly (within a fraction of a second?)
receive a Notification from the QSD (via the chain of zero or more
QSCs) which provided the response. The notification causes all these ITRCs to
change their tunneling to ETR2 as well. ITFH functions in hosts
behave and are notified in exactly the same way as just described for
ITRCs.
Connectivity is restored, as long as N4, its ETR2, the link to CE1
etc. are still working. CE1 also needs to have changed its outgoing
packet path to be via ETR2. Perhaps the monitoring system could
inform it of the change, if CE1 had not already determined that there
was a problem with the link to N3.
1.9.5. Potential problems with internal routing systems
There are some potential problems during this failure and changeover
time which I will briefly mention. I would appreciate any assistance
understanding the likely behavior of provider internal routing
systems in this situation. I understand that typically, the internal
routing system will rapidly respond to the broken link, but would
like to know more about all this.
When the link to CE1 fails (which could be due to any failure in CE1,
the link, PE1, the internal routing system etc.) can the internal
routing system of N3 be relied upon to quickly cancel the special
route it has for forwarding packets whose DA matches 22.22.2.0/28 to
PE1?
If not, then there is a potentially serious problem with hosts within
N3 not being able to send packets to N5. If N3 can't guarantee that
its internal routing system will quickly remove any such routes, and
so allow packets addressed to 22.22.2.0/28 to find their way out of
N3 like the packets addressed to the rest of the 22.22.0.0/16 IMAB
(where they will find their way to an ITR such as ITR1, or any ITR
within N3), then perhaps it would be better if N3 never made such a
route in its internal routing system. In this scenario, all packets
from hosts inside N3 to 22.22.2.0/28 would need to go via an ITR, and
ETR1 would use an explicit tunnel to get decapsulated packets to PE1.
1.10. Ivip's intended benefits
From the above examples, it can be seen that a global Ivip system, or
something similar, is capable of having large amounts of address
space assigned to it, where it can slice and dice it with very fine
resolution (single IPv4 addresses) with very rapid response times
(probably a few seconds, but perhaps less with ideal arrangements) so
that the addresses can be portable between any ISP with an ETR. This
portability directly supports multihoming which can be controlled at
a "site level" (range of IP addresses all at once) or down to an
individual host (single IP address) level. For IPv6, I envisage Ivip
mapping each /64 to a particular ETR.
This portability and multihoming - and whatever TE is possible with
Ivip - requires no changes to host operating systems or applications.
The ITFH function is a strictly optional concept, which would be
attractive for some hosts and NAT routers in the longer term but
which is not required at any time, including initial introduction.
The use of "anycast ITRs in the core" means that hosts in unaltered
provider and AS-end-user networks are all capable of sending and
receiving packets to and from hosts with Ivip-mapped addresses.
There are cost and administrative challenges in deploying the entire
Ivip system, including especially the anycast core-ITRs. However,
these costs and difficulties are arguably far less challenging than
what may be the two remaining alternatives: firstly to pay for and
ensure the installation of ITRs in every provider and AS-end-user
network as LISP is widely believed to require, or secondly, to do
nothing and allow all the routers in the DFZ to become swamped by
continued growth in the global BGP routing table, and so need
replacement with new, more expensive, models.
Ivip or something like it seems to offer the only chance we have for
efficiently using limited IPv4 address space. Ivip is unconstrained
by binary boundaries, "route aggregation" etc.
Only when addresses can be assigned according to direct need, rather
than in large chunks as they have been to date, can the address space
be used efficiently. For instance, this would allow the majority of
the 3.7 billion available IPv4 addresses ((0 to 223 inclusive, except
10 and 127) * 256 * 256 * 256 = 3.724 billion) to be actively used,
each either by an individual host or by a NAT device which supports
multiple hosts on a private network. There are no reliable estimates
of actual IPv4 address utilisation, but in early 2007, a random ping
survey indicated there were about 108 million ping-responsive hosts,
with much higher densities in some advertised prefixes. [RW ping
survey]
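The address count quoted above can be checked with a few lines of
Python. This is only a restatement of the arithmetic in the text; the
variable names are illustrative and not part of the proposal.

```python
# First-octet values 0..223 inclusive give 224 /8 blocks; the text
# excludes two of them from the count.
usable_slash8_blocks = 224 - 2

# Each /8 block holds 256 * 256 * 256 individual IPv4 addresses.
addresses_per_slash8 = 256 * 256 * 256

total_addresses = usable_slash8_blocks * addresses_per_slash8
print(total_addresses)  # 3724541952, i.e. roughly 3.7 billion
```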
Ivip can also be used to achieve some TE benefits, by steering
traffic of individual Ivip-mapped addresses to one ETR or another.
Ivip's ability to support highly efficient mobile-IP is discussed in
a later section. So too is the possibility that it could be used to
greatly facilitate highly scalable IPv6 tunneling over the existing
IPv4 system.
None of this places any further burden on the BGP system. Ivip's
benefits should greatly reduce the impetus for end-users and perhaps
providers for gaining and advertising PI addresses in the global BGP
system.
This I-D proposes changes which are pervasive and unprecedented.
There are many questions to be explored, security problems to be
resolved etc. The scope of this project goes beyond the IETF
developing protocols and recommended procedures, since it requires
cooperation amongst providers, end-users and RIRs, who must approve
of address space being used for this novel purpose.
There is nothing technically preventing one or more Ivip systems
being created today, perhaps as profitable enterprises hiring out
their IP addresses to customers - as long as RIRs approve. Although
it may be impossible and/or undesirable to prevent the creation of
multiple independent Ivip systems which behave as described here, the
rest of this I-D concentrates on the establishment of a single global
Ivip system. (Multiple Ivip systems need not know about each other -
it is not disastrous if an ETR tunnel end-point of one Ivip system's
mapping is actually an address which is Ivip-mapped in another
system.)
This introduction has provided a good general overview of Ivip, for
those with some familiarity with the crisis in routing and
addressing. Sections below contain a more comprehensive statement of
the problem space, goals and potential solutions. Following that I
explore in greater detail the various aspects of the Ivip system.
This is a very early stage of development and I hope many people will
point out faults, suggest improvements, and be inspired to create
their own proposals to these challenging problems. One luxury this
field enjoys is that we can invoke large resources and make
uncommonly bold plans - because there is a dearth of easy
alternatives and the costs of doing nothing are expected to be so
high.
1.11. Long term deployment
The above discussion primarily relates to Ivip's capacity to provide
important benefits to those who adopt it, while maintaining
reachability from hosts in networks which have made no changes, such
as installing ITRs or ETRs. The most likely deployment actions will
involve the networks of Update Authorisation Servers, Replicators,
ITRDs, ITRCs and Query Servers. Although all these functions should
be capable of being implemented in software on ordinary servers
(albeit with many gigabytes of RAM for the QSDs and ITRDs) it is
likely that most network operators will require the ITRD and ITRC
functions to be performed on existing or future router systems.
In the longer term, assuming Ivip or something similar is widely
adopted, it can be expected that there will be widely available,
auto-discovered, QSC and QSD services which can support queries from
ITRCs and the ITFH functions in hosts.
An ITFH function in a host operating system is the most cost-
effective way of performing the Ingress Tunneling function of Ivip.
The cost will be essentially zero for the software, and there is
generally plenty of CPU power and RAM available to do the work.
Assuming the Replicator network will be largely built by and shared
by providers and AS-end-users and assuming this system propagates
updates throughout the world in a few seconds, then it is possible
that the Notification arrangement will make the cheaper ITRC routers
an attractive alternative to the full database feed, large RAM, very
large FIB ITRD routers (or their server-based alternatives). If an
ITRC can get an up-to-date response to a query about any IP address
from a local QSC - in a fraction of a second - then it may be
acceptable for it to do this for every novel packet it receives. In
that case, the ITRC handles all packets without delay, providing the
performance of an ITRD without the need for a full database feed and
without the same large FIB and RAM requirements (assuming of course
that the ITRC is not attempting to handle packets addressed to
millions of Ivip-mapped addresses at once).
If ITRCs can be so successful, then so can ITFHs which have
sufficient RAM and CPU power. An ITFH costs nothing and always
achieves optimal paths, since there is no deviation from the shortest
path towards a separate ITR. An ITFH function would probably become
mandatory in any web server at a hosting company. The alternative
would be a large investment in ITRCs and/or ITRDs.
Similarly, ITFH functions in the NAT functions of DSL and HFC cable
modems would also be an effectively zero cost alternative to the
provider network deploying large numbers of ITRDs and ITRCs. The
provider would still need to maintain a responsive QSD and QSC
network. (I tend to think of this being an "in-host" function
because these modems, although technically routers, have no hardware
FIB and the ITRC function is performed entirely in software.)
The proliferation of peer-to-peer filesharing and other applications
presents something of a challenge for ITRCs and ITFHs. An ITRD has
no difficulty with this traffic, since its large FIB is ready to
encapsulate packets with any Ivip-mapped destination address.
However, a smallish ITFH function in the NAT router section of an
ADSL modem will have some limitations on memory for its cached
mapping information. A large number of hosts behind the NAT, each
firing off packets to thousands of separate Ivip-mapped host
addresses, would place a significant burden on the ITFH, including a
frequent need to contact the nearest Query Server. However,
hopefully most users behind a NAT firewall, including especially the
hundreds of millions of DSL, HFC cable and fibre home and SOHO end
users, will have no need to have their NAT on an Ivip-mapped address.
This is a highly speculative and optimistic vision for a proposal
which is less than a month old. If such widespread deployment
eventuated, the long-term stable outcome might resemble what the
proponents of ID-LOC separation have long preferred: a new layer
(ITFH) of software in the TCP/IP stacks of many hosts. However, such
changes to hosts would be purely to increase efficiency and reduce
costs, not to ensure reachability - which is already provided by a
sufficiently widely distributed system of core-ITRs.
ETR functions can also be performed in hosts, or at least in NAT
devices for hosts behind NAT. The NAT device could be an ETR for
specifically identified hosts, each with a care-of address in the
private network. In this case, the NAT ETR somewhat resembles a TTR,
since the destination host sends its outward-going packets through
the same device.
These visions of ubiquitous Ivip adoption are probably unnecessary
and unrealistic. Only a subset of hosts or end-user networks will
benefit from real portability and multihoming.
Future versions of this I-D will more fully explore the highly
promising use of the ITR system to beam packets to TTRs for mobile
IP.
Future versions of this I-D will more fully explore the potential for
using the IPv4 Ivip system for tunneling IPv6 packets in a highly
scalable fashion, for using Ivip with IPv6, and for using IPv6 Ivip
to tunnel IPv4 packets.
2. Definition of Terms, Concepts and Functions
In the context of the extensive Introduction, this is a comprehensive
set of definitions not just of new terms, but of the main concepts
and functions which make up the current Ivip proposal. I explore in
greater detail in sections below how the various forms of ITR etc.
are used, but have included considerable detail here. There is some
repetition of material from the Introduction.
Some of the terms defined here are identical or similar to those used
in LISP and in general discussion. Others are different from roughly
equivalent terms used in LISP. There has been a long discussion on
the RAM list about the precise meaning of the terms "Identifier" and
"Locator". I am trying to avoid these terms as much as possible with
Ivip, because of the evident confusion they cause. Whether an item
of information such as an IP address should be considered or referred
to as an "Identifier" or a "Locator" depends very much on the context
in which it is used - so these terms tend to describe usage, rather
than any intrinsic quality of the item.
The long Introduction above has used some of these terms, but not
all. Eventually the Introduction may be rewritten to use all these
terms consistently, and this section moved in front of that
introductory material. For now, I want the Introduction to be
accessible to readers without learning much new terminology.
However, for the more detailed description of Ivip principles and
mechanisms below, we need to use the new terms extensively.
This is quite a detailed definition of terms, which gives some
insight into the operation of the whole Ivip system.
[To do: references for LISP, APT etc. in definitions below.]
2.1. IMIP - Ivip-Mapped IP address
Within the global unicast address space of IPv4 or IPv6, a subset of
these addresses are covered by one of the one or more IMABs (Ivip
Mapped Address Blocks, as described below). Every such address is an
IMIP.
The fact that the relevant part of the Ivip database system (the
particular IMAB-DB as defined below) may contain a null entry (zero)
for this particular address (meaning to drop the packet, rather than
tunnel it somewhere) does not alter the fact that this address is an
IMIP. Similarly, if current mapping is to an unreachable address, or
to the wrong ETR, or to no ETR etc. the address is an IMIP simply
because it is within the range of one of the Ivip system's IMABs.
2.2. NIMIP - Non-Ivip-mapped IP address
Within the global unicast address space of IPv4 or IPv6, every
address which is not an IMIP (is not within one of the IMABs) is a
NIMIP.
2.3. BRIP - BGP Reachable IP address
A BRIP is an ordinary IP address which is within one of the currently
advertised BGP prefixes, excluding those prefixes which are for
IMABs, meaning they are used to advertise Ivip mapped addresses
(IMIPs).
Whether or not there is actually a host or router at this address is
not important. The criterion is that the global BGP system has an
advertisement for it, and that therefore ordinary BGP routers will
forward packets with this DA to whichever router advertises the
relevant prefix. BRIP addresses include those which are anycast by
all systems other than Ivip. For instance, I understand that some
root nameservers are implemented with multiple servers using anycast.
Those addresses are BRIPs too. (This discussion assumes a single
global Ivip system. How to define this term when there are multiple
Ivip systems, including those which are not known publicly, would be
trickier.)
2.4. UAIP - Un-Advertised IP address
Any global unicast IP address which is not part of a currently
advertised BGP prefix is a UAIP. UAIPs include addresses which have
not been allocated by the IANA to any RIR, and which have not been
assigned by an RIR (or other address assignment authority) to any
end-user. The remainder of the UAIPs are in regions of the address
space which have been assigned to a provider or AS-end-user but which
they are not, at the moment, advertising. (This assumes that no
router ever advertises a prefix its operators are not entitled to
advertise, by virtue of that prefix not having been allocated or
assigned.) [To do: link to Geoff Huston's site and my ping survey
page's table.]
2.5. DID - Destination Identifier
This is roughly synonymous with LISP's "EID" (Endpoint ID). A DID is
an IP address which is an IMIP. The IMIPs form a subset of all the
possible IP addresses. We can know that a packet's DA is within this
IMIP set, so we know this specific address refers to a DID, of some
particular IRH/IRN (Ivip-mapped Receiving Host/Node). A host or any
non-ITR router doesn't recognise this. It is one of the tasks any
kind of ITR must perform to recognise that the packet's address is in
the IMIP set, and therefore is a DID which must be used to look up
mapping - in an internal set of copies of the IMAB-DBs or via some
external Query Server.
2.6. TELOC - Tunnel Endpoint Locator
A TELOC is a BRIP address which we, or an ITR, reasonably believe is
the address of an ETR - because this address is found in the database
as the mapping for one or more IMIPs.
The ITR will encapsulate the packet, using the appropriate TELOC as
the DA of the outer IP header.
To all routers, the packet is just an ordinary packet addressed to
some BRIP address. When it arrives at its destination, the idea is
that this will be an ETR which decapsulates the original packet and
forwards it to the host with the original DID address. However, the
ITR doesn't know for sure this will happen. It simply tunnels the
packet to the TELOC.
"TELOC" is related to LISP's "RLOC" (Routing Locator), except I think
that some LISP material uses "RLOC" to refer to any IP address which
is not an EID. I think this is rather too loose a use of a single
term, so for Ivip, "BRIP" means any advertised address which is not
an IMIP. "DID" refers to the specific address of a packet, which is
an IMIP, and "TELOC" refers to a specific address to which a packet
is tunneled.
2.7. IMAB - Ivip-Mapped Address Block
(This is what I previously referred to as a "master-subnet".) An
IMAB is a contiguous range of address space for which a single RUAS
(Root Update Authorisation System) is authorised to control the
mapping, and for which it does so via a single stream of update
packets (US-IMAB) and a single IMAB-DBD (IMAB DataBase Dump) file.
While the database structure, update messages etc. work fine for
arbitrary starting and ending points for an IMAB, it is important
that the IMAB can be advertised as a single BGP prefix. A
straightforward prefix on binary boundaries can be an IMAB, such as
29.0.0.0/20. Assuming IPv4 for the rest of this definition, and
assuming a /24 limit on the longest prefix which is admitted to the
BGP system, all IMABs need to be on /24 boundaries. They should not
involve a prefix any shorter than /8.
An IMAB may straddle simple binary boundaries, as long as it is still
acceptable to be advertised within BGP. For instance 29.0.1.0/20 is
also a valid IMAB, covering 29.0.1.0 to 29.0.16.255. 29.0.1.128/20
would not do, because it straddles a /24 boundary.
It is not permissible to use a range such as 29.0.1.0 to 29.0.15.255
as an IMAB, since this does not match a full /19, /20 or /21 range.
The reason for these restrictions is that when an ITRD (full "push"
database ITR) downloads an IMAB-DB, decodes it and applies all real-
time updates to it, it is then able to handle packets for the address
range of the IMAB. At that point in time, it advertises the IMAB's
prefix to its BGP peers. In order to reduce the number of advertised
BGP routes and to reduce churn in the way they are advertised, it is
desirable for every area of address space covered by a single
database dump and by a single stream of update packets to match a
single prefix which can be advertised in BGP.
Where a single large range of contiguous addresses is for some
scaling reason handled with separate database dumps and update
streams, it should be divided into separate IMABs. This increases
the number of BGP advertised prefixes, but may be justifiable, for
instance within a large (eg. /8) prefix of IMIP space, so that ITRs
can load share by each handling a subset of the entire /8.
2.8. IMAB-DB - IMAB DataBase
This refers to the body of data which specifies the Ivip mapping of
the individual IPv4 addresses (or /64s for IPv6) for a single IMAB.
Within a RUAS (Root Update Authorisation System) there exists one or
more copies of the Master IMAB-DB for each IMAB this RUAS is
authoritative for. This is updated in real-time by Update Commands
directly from end-users or from branch and leaf UASes (Update
Authorisation Systems).
ITRDs (full database ITRs) and QSDs (Query Servers with the full
Database) maintain as best they can a real-time updated copy of each
IMAB-DB for each IMAB in the Ivip system. This is a Slave copy of
the IMAB-DB. The state of the slave copy is that it lags behind the
master, ideally by only fractions of a second, but in practice
probably by a few seconds - or more if there is congestion or lost
packets in the Replicator system.
The slave copy of the IMAB-DB directly controls the FIB of the ITRD,
and how the QSD responds to queries. (In a server-based ITRD, the
array which contains the raw mapping data is the FIB, because the
packet handling code simply indexes into the appropriate location in
the array for the appropriate IMAB, and reads the 32 bit result
there.) Changes to the IMAB-DB may cause the QSD to send
Notifications to child QSCs, ITRCs or ITFHs which previously received
query responses concerning one or more IMIPs for which the mapping
has changed.
Whereas LISP and APT carry a potentially large amount of information
for each IP address or prefix within their database system (eg.
multiple ETR addresses, TE parameters for choosing dynamically
between multiple ETRs and in the case of APT, the end-user's public
key), the Ivip database structure is extremely simple. Each element
of the database contains a single IP address: 32 bits for IPv4 or 128
bits for IPv6. Typically, this is the address of an ETR, but in fact
it could be any address, subject to certain off-limits ranges,
including the prohibition of any address which is an IMIP. In
practice, the value of the IP address would always point to a BRIP
address, not to an unadvertised UAIP address.
Consequently, the dump and the update messages for this database can
be highly compressed and easily interpreted. (Any protocol handling
these dumps or update messages should be extensible in a backwards-
compatible way to incorporate further elements, but I can't think of
a use for them at present.)
The easiest way to think of this database is an array, where location
0 refers to the first IMIP in the IMAB. It is also possible to
structure the database as a series of prefix rules, so for instance
16 contiguous addresses on binary boundaries with the same mapping
could be specified by a rule to this effect, rather than with 16
separate IP addresses. For IPv4, I will assume the database is
simply an array. For IPv6, it would probably be best to structure
the database as prefix rules, since so many more address bits may
vary over the range of the IMAB. (I guess IPv6 was designed by
people who wrote programs in high level languages, rather than
electronic hardware engineers!)
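The array view of an IPv4 IMAB-DB can be illustrated with a short
Python sketch. This is a minimal example under stated assumptions,
not a proposed implementation: the class name ImabDb, its methods and
the sample addresses are all hypothetical. It shows only the ideas
from the text: location 0 is the first IMIP of the IMAB, each element
is one 32 bit address, and zero is the null entry meaning "drop".

```python
import ipaddress
from array import array


class ImabDb:
    """Hypothetical in-memory slave copy of one IPv4 IMAB-DB."""

    def __init__(self, first_imip: str, size: int):
        self.base = int(ipaddress.IPv4Address(first_imip))
        # One unsigned slot per IMIP ("I" is 32 bits on common
        # platforms). Zero is the null entry: drop the packet.
        self.slots = array("I", [0] * size)

    def _offset(self, addr: str) -> int:
        # Location 0 refers to the first IMIP in the IMAB.
        off = int(ipaddress.IPv4Address(addr)) - self.base
        if not 0 <= off < len(self.slots):
            raise KeyError(f"{addr} is not an IMIP in this IMAB")
        return off

    def set_mapping(self, start: str, count: int, target: str) -> None:
        """Map `count` consecutive IMIPs, beginning at `start`."""
        off = self._offset(start)
        if count < 1 or off + count > len(self.slots):
            raise ValueError("range extends outside this IMAB")
        value = int(ipaddress.IPv4Address(target))
        for i in range(off, off + count):
            self.slots[i] = value

    def lookup(self, da: str):
        """Return the mapped address for `da`, or None for a null entry."""
        value = self.slots[self._offset(da)]
        return None if value == 0 else ipaddress.IPv4Address(value)


# Illustrative use: a 4096-address IMAB, with 16 IMIPs mapped to one
# made-up ETR address.
db = ImabDb("29.0.0.0", 4096)
db.set_mapping("29.0.0.16", 16, "11.22.33.44")
```

A lookup is a single index operation, which is why a server-based
ITRD could treat the array itself as its FIB.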
2.9. IMAB-DBD - IMAB DataBase Dump
This is a file, typically compressed, which carries the full contents
of the master IMAB-DB at some point in time. It is made available
quickly at multiple servers so ITRDs and QSDs can download a copy
when they boot up, or periodically afterwards.
The dump file format needs to be carefully standardised. It should
have an extendable format, and be compact for all typical data
patterns. Probably a series of binary elements followed by a long
array would be fine, all gzipped. However, maybe a specialised
compression algorithm would be more efficient, be easier to implement
at the ITRD or QSD, or provide some other benefits.
The dump file needs to specify: the format of the file, such as by
the RFC version it adheres to; the time and date it was created; a
number identifying the RUAS which generated it; a sequence number
matching such a number in an update stream packet which signifies a
dump was made at that instant; the AFI (Address Family Identifier) of
the address space covered; the starting address and range of the
address space covered; the BGP prefix which will be advertised once
the ITRD has this data loaded and fully updated (perhaps this is
redundant); finally, the array of addresses in some compressed form.
There probably needs to be a CRC as well, with the ITRD or QSD able
to ensure by some cryptographic means that the data is valid and
really originates from the RUAS.
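To make the list of fields concrete, here is a rough Python sketch of
one possible encoding. Everything about the layout is an assumption
made for illustration: the field widths, their order, the use of zlib
rather than gzip, and the function names encode_dump and decode_dump.
The advertised BGP prefix field is omitted, and the CRC here only
detects corruption; it does not provide the cryptographic check of
origin which the text calls for.

```python
import struct
import time
import zlib

# Hypothetical fixed header, network byte order, no padding:
#   format version (16 bits), creation time (64 bits, Unix seconds),
#   RUAS number (32 bits), sequence number (64 bits), AFI (16 bits),
#   starting address (32 bits, IPv4 only), number of addresses (32 bits).
HEADER = struct.Struct("!HQIQHII")
AFI_IPV4 = 1  # IANA address family number for IPv4


def encode_dump(version, ruas_id, seq, start, mappings, created=None):
    """Build a dump: header, compressed array of mappings, CRC-32."""
    created = int(time.time()) if created is None else created
    header = HEADER.pack(version, created, ruas_id, seq,
                         AFI_IPV4, start, len(mappings))
    body = zlib.compress(struct.pack(f"!{len(mappings)}I", *mappings))
    crc = zlib.crc32(header + body)
    return header + body + struct.pack("!I", crc)


def decode_dump(blob):
    """Check the CRC and return the header fields and mapping array."""
    header, body, tail = blob[:HEADER.size], blob[HEADER.size:-4], blob[-4:]
    if zlib.crc32(header + body) != struct.unpack("!I", tail)[0]:
        raise ValueError("CRC mismatch: dump is corrupt")
    version, created, ruas_id, seq, afi, start, count = HEADER.unpack(header)
    mappings = list(struct.unpack(f"!{count}I", zlib.decompress(body)))
    return {"version": version, "created": created, "ruas_id": ruas_id,
            "seq": seq, "afi": afi, "start": start, "mappings": mappings}
```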
2.10. UMUC - User Mapping Update Command
A UMUC is whatever action the end-user performs on one or more
different user-interfaces of whatever UAS (Update Authorisation
System) they use to change the mapping of their one or more IMIPs.
The system would be able to tell the user the current mapping and
also confirm that a requested change to the mapping was to an
acceptable address.
For now, I will assume that all UMUCs are for valid mapping addresses
- so a UMUC is a successfully accepted update command from the end-
user, or from some person or system with the end-user's credentials.
There probably needs to be a protocol by which a request to change to
an invalid address, for example a UAIP, is rejected with an error
message.
The command takes the form of a starting IMIP, a range, and a single
IP address to which the mapping of these one or more IMIPs will be
changed. The UMUC exists only after the UAS has verified the
credentials, the addresses and the new mapping address as being
valid. The UMUC is then ready to be handed down either to alter the
IMAB-DB itself, or to another UAS which achieves the same outcome.
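The shape of a UMUC, and the checks a UAS might make before one
exists, can be sketched as follows. This is an illustration only: the
names Umuc and make_umuc, the way the end-user's authorised range is
passed in, and the sample addresses are assumptions. Credential
checking is not shown, and the check that the new mapping is not a
UAIP is left out because it would need a view of the BGP table.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Umuc:
    """Hypothetical record: starting IMIP, range, new mapping address."""
    start_imip: ipaddress.IPv4Address
    count: int
    target: ipaddress.IPv4Address


def make_umuc(start, count, target, owned_first, owned_last, imabs):
    """Return a Umuc if the request is acceptable, else raise ValueError.

    owned_first..owned_last is the range of IMIPs this end-user is
    assumed to control; imabs is a list of IMAB prefixes as strings.
    """
    start_a = ipaddress.IPv4Address(start)
    target_a = ipaddress.IPv4Address(target)
    if count < 1:
        raise ValueError("range must cover at least one IMIP")
    end_a = start_a + (count - 1)
    first_a = ipaddress.IPv4Address(owned_first)
    last_a = ipaddress.IPv4Address(owned_last)
    if not (first_a <= start_a and end_a <= last_a):
        raise ValueError("IMIPs are outside the end-user's range")
    # The mapping address must not itself be an IMIP.
    if any(target_a in ipaddress.IPv4Network(n) for n in imabs):
        raise ValueError("mapping address must not be an IMIP")
    return Umuc(start_a, count, target_a)
```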
2.11. SUMUC - Signed User Mapping Update Command
This is the information contained in a UMUC, signed by the UAS which
accepted it from the user (or by some lower UAS in the tree), being
handed down the tree to another UAS, perhaps the RUAS of the tree, so
that the recipient UAS can verify the signature and regard the UMUC
as authoritative.
2.12. SH/SN - Sending Host/Node
The host computer, or a router, which sends the packet in question.
Other than the local network's checking the SA (Source Address) of
the packet to decide whether it is from an authorised address, there
is no difference in Ivip whether the sending host or node has an IMIP
or a BRIP address.
2.13. RH/RN - Receiving Host/Node
The host computer, or a router, with an ordinary BRIP address (or
prefix) which is intended to be the final recipient of the packet in
question.
2.14. IRH/IRN - Ivip-mapped Receiving Host/Node
The host computer, or a router, with an IMIP address (or prefix of
IMIP addresses) which is intended to be the final recipient of the
packet in question. An IRH or IRN does not need any address or
prefix other than the one it has via the Ivip system. However it may
have one or more addresses in the local network and it may have more
than one IMIP address or prefix, each perhaps using a different ETR.
2.15. MH/MN - Mobile Host/Node
A host computer, or a router, with an IMIP address (or prefix of IMIP
addresses) which it uses via one or more two-way tunnels it
establishes with one or more TTRs (Translating Tunnel Routers). A
Mobile Host or Node typically [To do - what is the proxy MIPv6 mode
where this is not true??] has a "care-of" address in the one or more
networks it is currently connected to. It also needs special
software which operates from the care-of address, running the tunnel
to and from the TTR, and connecting that tunnel with the main TCP/IP
stack in the host or node. Please see the "Loose ends - TTRs and
Mobility" section for a fuller description of how Ivip can help with
Mobile IP.
2.16. UAS - Update Authorisation System
This is a general term for a system which is operated by an
organisation and plays some role between the user making a UMUC and
the actual IMAB-DB being changed.
Some UASes accept UMUCs as their inputs. Those which do not must
accept SUMUCs from other UASes. A UAS may have end-user interfaces
and links to branch or leaf UASes higher in the tree.
Leaf UASes are at the ends of branches of a tree composed of UASes,
with a single Root UAS at the base. Each UAS SHOULD be implemented
as two or more linked but redundant servers, similar to the master
and one or more slave arrangement of nameservers, with all of them
being authoritative in terms of their interactions with other UASes
and with end-users.
2.17. RUAS - Root Update Authorisation System
A RUAS is the authoritative UAS for one or more IMABs. Therefore, it
periodically generates - say every 10 minutes - an IMAB-DBD file. It
also continually produces a stream of updates. The RUAS MUST be
implemented as two (three?) or more redundant servers in
geographically and topologically well-separated locations.
The interactions between the RUAS and its branch and leaf UASes
SHOULD be governed by some new IETF standards to ensure it is easy
and robust to run these systems and have them interoperate securely.
The set of other UASes each RUAS may interact with may be different
for the authorisation tree for each of the potentially multiple IMABs
it handles. The branch and leaf UASes in each such tree may also be
members of other trees of this RUAS (for other IMABs) and of trees
rooted in other RUASes. An RUAS may be a leaf or a branch in some
other RUAS's tree, but in that role the system and its servers only
behave as an ordinary UAS.
2.18. US-IMAB - Update Stream specific to one IMAB
This is a stream of data, at present assumed to be UDP packets (but
perhaps implemented in another way, such as a multicast system) by
which the real-time updates to the mapping data for any one IMAB are
conveyed.
One or more identical US-IMAB streams are generated for each IMAB for
which the RUAS is authoritative. So each RUAS could be
generating these streams for multiple IMABs. As described in a
section below, these streams are replicated and delivered, with high
reliability, to ITRDs and QSDs all over the Net - ideally within a
second or so.
2.19. US-Complete - Update Stream for the Complete Ivip system
This is the combined set of all US-IMAB streams which each ITRD or
QSD needs. To what extent it is simply the sum of all US-IMAB
packets simply replicated, or to what degree the first level of
replicators compacts the data to reduce the number of packets, is yet
to be determined. There are also problems to be solved when this US-
Complete is missing packets.
Theoretically, all Replicators get two copies of all US-IMAB streams,
for redundancy. Ideally, each ITRD and QSD will get two separate US-
Complete streams from two separate Replicators in widely
topologically distinct locations on the Net, to enhance robustness.
This is a crude doubling of bandwidth, but it might be better than
something more complex with lower bandwidth.
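One simple way a receiver might combine two redundant copies of a
stream is sketched below. It assumes that every update packet carries
a sequence number which increases by one, which this I-D has not yet
specified; the class name StreamMerger and its methods are likewise
hypothetical. The first copy of each packet to arrive is used, later
copies are discarded, and gaps are reported so that some recovery
action (not shown) can be taken.

```python
class StreamMerger:
    """Merge redundant copies of an update stream by sequence number."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to deliver
        self.pending = {}          # out-of-order packets awaiting a gap

    def receive(self, seq, payload):
        """Accept a packet from either copy of the stream.

        Returns the payloads which can now be applied, in order.
        """
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate from the other copy: ignore it
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers not yet seen, below the highest one held."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

With this arrangement a packet is lost to the receiver only if it is
missing from both copies of the stream.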
2.20. Replicator
A system of Replicators forms a redundant, reliable, high-speed
distribution system for update streams. The Replicator system is
only roughly described in this I-D. Its job is to get packets which
together make up at least one US-Complete stream to every ITRD and
QSD which needs it.
Replicators could be implemented in routers, but are probably best
implemented in ordinary software on a Linux/BSD etc. server. They
don't need hard drive storage and do no caching of data.
Replicators could be located within, or as stubs to, transit routers
or border routers. Within large provider or AS-end-user networks,
they would be servers or perhaps implemented in internal routers.
An ITRD or QSD could also operate as a Replicator.
2.21. QSD - Query Server with full Database
Like ITRDs, QSDs get a full feed of updates (at least one copy of US-
Complete) from one or more Replicators. Like ITRDs, when they boot,
they download individual IMAB-DBD files for each IMAB in the Ivip
system. I write more about this in a section below on ITRs. Once
their slave copies of the complete set of IMAB-DBs are up-to-date and
being continually updated, they are ready to respond to queries.
The query protocol needs to be defined, and is the same for queries
from ITRCs, ITFHs and QSCs - Query Servers which Cache.
The QSD needs to keep a record of responses sent out, and cache times
(which ideally might be a single fixed time, to make it easy to
implement). It keeps a watch on incoming changes to the many IMAB-
DBs, and if any change affects IMIPs which were covered by a response
it sent out which could be cached by another device, it sends out a
Notification to that device, with the new information.
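The record-keeping described above might look something like the
following sketch. It assumes the single fixed cache time suggested in
the text; the class name QsdResponseLog, its methods and the
600 second default are hypothetical. The query and Notification
protocols themselves are not yet defined, so the sketch only works out
which queriers would need a Notification.

```python
import time


class QsdResponseLog:
    """Track which queriers may still be caching which IMIP mappings."""

    def __init__(self, cache_time=600.0, clock=time.monotonic):
        self.cache_time = cache_time  # single fixed cache time, seconds
        self.clock = clock
        self.expiry = {}              # imip -> {querier: expiry time}

    def record_response(self, querier, imip):
        """Note that `querier` was sent the mapping for `imip` just now."""
        per_imip = self.expiry.setdefault(imip, {})
        per_imip[querier] = self.clock() + self.cache_time

    def on_mapping_change(self, imip, new_target):
        """Return (querier, imip, new_target) for each live cached copy."""
        now = self.clock()
        live = {q: t for q, t in self.expiry.get(imip, {}).items()
                if t > now}
        if live:
            self.expiry[imip] = live
        else:
            self.expiry.pop(imip, None)  # nothing cached any more
        return [(q, imip, new_target) for q in sorted(live)]
```

Queriers whose cache time has expired are dropped from the record, so
they receive no Notification and must query again when they next need
the mapping.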
A QSD could be integrated with a Replicator function, and perhaps an
ITRD function - or for that matter an ETR function too.
QSDs have no routing functions, so it would be overkill to implement
this in a router. They need a lot of memory, so the best way to
implement a QSD is probably on an ordinary server with one or more
gigabit Ethernet interfaces. No hard drive is required, except
perhaps for logging purposes.
2.22. QSC - Query Server with Cache
A QSC could be implemented in a router. It does not route packets,
but its memory and computational requirements are likely to be modest
compared to those of a QSD. There is no need for a full feed of US-
Complete data. However, there must be one or more upstream QSDs - or
perhaps QSCs with upstream QSDs.
The easiest way to implement this would be software on a modest
server, which would only need a hard drive for logging purposes.
In addition to handling queries from cache or by passing the query to
one or more upstream QSDs or QSCs, the QSC needs to keep a record of
responses sent out to its queriers - which are ITRCs, ITFHs or other
QSCs. When it receives a Notification from its upstream QSD/QSC, it
needs to look at those records and decide which of its queriers to
send the Notification to.
Small sites could use one or more QSCs for local ITRCs and ITFHs,
relying on one or more external QSDs to answer all queries. This
saves bringing a full US-Complete feed into the site and it saves on
the RAM needed for a full QSD.
2.23. ITR - Ingress Tunnel Router
A general term for a router or server which accepts packets with DA =
an IMIP and which encapsulates the packet, with the outer IP header
having a DA of some BRIP address the end-user chose as the mapping
for this IMIP. That address will presumably cause the packet to
arrive at an ETR, which decapsulates it and forwards the packet to
the Destination Node.
The ITR has a locally configured set of limits which prevent it from
tunneling packets to certain ranges of addresses, including those
defined for protecting critical infrastructure against Ivip
malfunction, and including all IMAB addresses. This set of limits is
downloaded regularly and securely, so that over time, these limits
can be altered.
2.24. ITRD - Ingress Tunnel Router with Database
An ITR with a full copy of all IMAB-DBs, updated in real time by the
US-Complete it gets from one or ideally two Replicators. The updates
alter the local copy of each IMAB-DB and cause a corresponding change
in the FIB of the router, which finds and tunnels every incoming
packet with an IMIP DA. (Unless the address in the database for that
IMIP is zero or within a banned region, in which case the packet is
dropped.)
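The per-packet decision described above can be sketched like this.
It is a minimal illustration in Python rather than anything a real
FIB would run: the dictionary stands in for the local copy of the
IMAB-DBs, and the function name, the return values and the example
banned range are assumptions.

```python
import ipaddress


def itrd_decision(mapping, banned, dest):
    """Decide what an ITRD does with a packet whose DA is dest.

    mapping: dict of IMIP (int) -> mapped address (int), 0 meaning "drop".
    banned:  list of ipaddress networks the ITR must never tunnel to.
    Returns ("forward", None) for a non-IMIP DA, ("drop", None), or
    ("tunnel", outer_da) with outer_da as a dotted-quad string.
    """
    dest_int = int(ipaddress.ip_address(dest))
    if dest_int not in mapping:
        # Not an Ivip-mapped address: ordinary forwarding applies.
        return ("forward", None)
    target = ipaddress.ip_address(mapping[dest_int])
    if int(target) == 0 or any(target in net for net in banned):
        # Mapping of zero, or a target inside a locally configured limit.
        return ("drop", None)
    return ("tunnel", str(target))
```

For example, with a hypothetical banned range of 192.0.2.0/24, an
IMIP mapped to 192.0.2.9 would be dropped, while one mapped to
198.51.100.7 would be tunneled there.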
ITRDs can be implemented in a suitable router with lots of RAM, CPU
power and a very high capacity FIB, in terms of the ability to tunnel
packets and in terms of how many rules can be applied, down to
potentially millions of /32 (IPv4) or /64 (IPv6) prefixes.
I explore in a section below how an approximately 1 gigabit ITRD
could be built using commonly available server hardware. For a well
developed Ivip system, this will require quite a few gigabytes of RAM
- since the best way to implement the database and FIB is as a series
of arrays with 32 bits (128 bits for IPv6 - urrgh!) for each mapped
address (or /64 for IPv6).
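As a rough check on the "quite a few gigabytes" figure, the arithmetic
for a flat array can be written out. The function name is
hypothetical, and the IPv6 figure of 2**30 mapped /64s is purely an
assumed example, not a projection.

```python
def table_bytes(mapped_units: int, bytes_per_entry: int) -> int:
    """RAM needed for a flat array with one fixed-size entry per mapped unit."""
    return mapped_units * bytes_per_entry


GIB = 2**30

# Upper bound for IPv4: all 2**32 addresses mapped, 32 bits (4 bytes) each.
ipv4_upper_bound = table_bytes(2**32, 4)

# Assumed IPv6 example: 2**30 mapped /64s, 128 bits (16 bytes) each.
ipv6_example = table_bytes(2**30, 16)

print(ipv4_upper_bound / GIB, "GiB for IPv4 upper bound")
print(ipv6_example / GIB, "GiB for the assumed IPv6 example")
```

The IPv4 upper bound works out to 16 GiB. A real ITRD would only need
arrays covering the IMABs actually in use, so its requirement would be
some fraction of that.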
An ITRD might also implement the Replicator, QSD and/or ETR
functions.
2.25. ITRC - Ingress Tunnel Router with Cache
An ITR without a full copy of all the IMAB-DBs - and so not requiring
a US-Complete stream from one or more Replicators.
The ITRC gains mapping information from a nearby QSD, perhaps via one
or more intermediate QSCs. It may hold every packet it receives with
an IMIP DA until it requests and receives mapping information. In
this case, it handles every packet with DA within an IMAB - generally
as quickly as a full ITRD.
Whenever an ITRC chooses to request mapping information from the one
or more QSD/QSC systems it relies upon (two separate systems might be
more robust, especially if the query and response is sent via UDP),
its request specifies a single IP address, the DID of this packet,
which it already knows is an IMIP address.
The response it receives will concern that DID address, and
potentially one or more IMIP addresses above and below this address -
all of which have the same mapping. So the response will consist of
a starting address, a range, and a TELOC IP address which will become
the DA for the encapsulated packet for any incoming packet with a DA
within this range. There may also be an explicit caching time for
this response, or perhaps a default, system-wide, constant caching
time such as 600 seconds.
The ITRC uses this mapping information, updating its FIB accordingly,
for the caching time. At the end of that time, it may choose to make
another query - which it would ordinarily only do if it is still
receiving packets within that range.
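A minimal sketch of how an ITRC might hold such responses follows.
The class name, method names and the assumption that cached ranges
never overlap are mine. The 600 second default is the example value
mentioned above, not a decided constant.

```python
import bisect
import ipaddress

DEFAULT_CACHE_SECONDS = 600  # example system-wide caching time from the text


class ItrcCache:
    """Cache of (start, range, TELOC) responses, assumed non-overlapping."""

    def __init__(self):
        self._starts = []   # sorted integer start addresses
        self._entries = {}  # start -> (count, teloc, expires)

    def store(self, start, count, teloc, now, cache_seconds=DEFAULT_CACHE_SECONDS):
        """Record a response covering count addresses beginning at start."""
        s = int(ipaddress.ip_address(start))
        if s not in self._entries:
            bisect.insort(self._starts, s)
        self._entries[s] = (count, teloc, now + cache_seconds)

    def lookup(self, dest, now):
        """Return the TELOC for dest, or None if no unexpired range covers it."""
        d = int(ipaddress.ip_address(dest))
        i = bisect.bisect_right(self._starts, d) - 1
        if i < 0:
            return None
        s = self._starts[i]
        count, teloc, expires = self._entries[s]
        if d >= s + count or now >= expires:
            # Outside the range, or the caching time has run out
            # (expired entries are simply ignored, not purged, in this sketch).
            return None
        return teloc
```

A lookup which returns None corresponds to the case where the ITRC
must send a query for that DID or hand the packet to an ITRD.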
At any time during the caching period, if the QSD which answered the
query (or provided an answer to a QSC which actually answered this
ITRC's query) recognises a change in the relevant IMAB-DB which
affects the range of addresses in the response this ITRC received to
its query, then the QSD will send a Notification. The Notification
may pass through multiple QSCs, but will reach this ITRC and any
other ITRCs which received similar responses.
ITRCs do not need a massive FIB, but if they are a router, their FIB
needs to be able to encapsulate packets and handle a substantial
number of rules, depending on the volume and nature of the traffic.
CPU involvement would be modest to substantial.
An ITRC could be implemented in a server with modest memory
requirements. It requires only modest bandwidth (compared to a full
US-Complete feed) for the queries, responses and Notifications with
its one or more parent QSDs or QSCs.
An ITRC faces some choices regarding which packets to try to gain
mapping information for. Firstly, it needs some way of identifying
incoming packets as having a DA which matches one of the IMIPs or
ranges of IMIPs which it already has mapping information for. Those
packets should be encapsulated immediately according to that mapping
information.
Secondly, the FIB needs a way of detecting which packets arrive with
IMIP DAs, but which are not currently matched by one of the existing
encapsulation rules. I guess the most advanced routers such as the
CRS-1, M120 and MX960 have such flexible ASIC and RAM FIBs that with
suitable firmware, they could do this sort of thing. I would be
surprised if lesser routers could be programmed to do this sort of
thing efficiently.
Also the router needs to reliably monitor which of its currently
cached rules are still being used by packets. Furthermore, the
router may need an efficient way of only requesting mapping
information for packets whose DA appears more than once.
If the ITRC doesn't quickly (fraction of a second) gain the mapping
information for every IMIP packet it receives, and/or if its RAM or
FIB can't hold all these rules and mappings, then it has to decide
what to do with packets which it cannot at present tunnel to the
correct address.
One option is to drop the packets - but this is unlikely to be
acceptable. Another is to let the packet be forwarded towards a peer
router which also advertises the complete set of IMAB prefixes. If
that peer is an ITRD, or this path leads to some ITRD in the core,
then it is probably acceptable to let a small proportion of packets
pass like this.
Alternatively, these untunneled packets, assuming the router can
identify every one, could be forwarded or tunneled to a nearby ITRD.
A bunch of ITRCs could therefore take most of the load, with the ITRD
instantly tunneling a fraction of the network's total DA=IMIP
packets.
An ITRC might also implement the QSC and/or an ETR function.
2.26. ITFH - Ingress Tunneling Function in Host
A host which is not behind a NAT could have additional software in
its TCP/IP stack to perform the ITRC functions described above. It
needs a good link to a nearby QSD/QSC system - so this would not be
suitable over a dialup modem or radio link.
Host software, CPU power and RAM are free, provided there is enough of
them. This would greatly reduce the load on any ITRCs and perhaps
ITRDs in the rest of the network. An ITFH function would be highly
desirable in every web server in a hosting company.
As with ITRCs, ITFHs need to have some kind of backup ITRD to handle
packets they can't tunnel. As with ITRCs, ideally the location of
two or more nearby QSDs or QSCs should be auto-discovered. Likewise
the location of two or more ITRDs if there is a way of explicitly
tunneling packets to them when the ITFH doesn't have the mapping or
FIB capacity to tunnel them itself.
The ITFH device doesn't need to be on a BRIP address (neither does an
ITRD or ITRC, but I usually assume "routers" are on BRIP addresses),
but it cannot be behind a NAT.
A host performing NAT functions for some hosts on a private network
is a good place to implement ITFH, as long as this host is not behind
NAT itself. The most common NAT situation is a DSL or cable modem
(or an optical home/SOHO adaptor). I have referred to performing
Ingress Tunneling functions in such a modem as ITFH, but such a modem
is formally a router, not a host, so this may be better described as
a purely software-based ITRC function, added by a firmware upgrade.
ITRCs and ITFHs could easily be overwhelmed by a large number of
different DA addresses inside the caching period, so they need to be
able to drop old cached mapping data when their RAM or FIB can't
handle it. They need to be in a network position where an upstream
ITRD will always find their packets. In principle, with Ivip, this
is always the case, depending on how congested the nearest "anycast
core-ITR" is.
2.27. ETR - Egress Tunnel Router
An ETR is a router or a server which receives encapsulated packets on
one of its one or more BRIP addresses, strips off the outer IP
header, copying its hop-count to the internal packet, and then by
some means ensures the resulting packet is delivered to the IRH/IRN
(the receiving host/node with an Ivip-mapped address).
There needs to be some local network management system which can tell
the IRH/IRN - or at least the end-user, by some means - where the one
or more usable ETRs are. This management system may also need to
ensure the local routing system can deliver decapsulated packets with
DA=DID to the IRH/IRN. The ETR is not necessarily the device to be
responsible for this, because ETRs can die and there should be
another available to select by the end-user changing the Ivip-mapping
of their IMIP.
Ivip ETRs don't need any fancy functions, management or protocols -
they just accept any IP-in-IP packet they get on one or more of their
BRIP addresses, decapsulate it, and - if the DA matches an address
the ETR and the local routing system is ready to handle - forward the
packet to its destination host or link to the end-user's site.
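The decapsulation step might look something like the following for
IPv4. This is a sketch under stated assumptions: it handles only
IPv4 IP-in-IP (protocol number 4), the function names are
hypothetical, and it omits the check that the inner DA is one the ETR
and local routing system are ready to handle.

```python
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4 encapsulation


def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum over an IPv4 header (even length)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decapsulate(packet: bytes) -> bytes:
    """Strip the outer IPv4 header and copy its hop-count to the inner packet."""
    if packet[0] >> 4 != 4 or packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IPv4 IP-in-IP packet")
    outer_header_len = (packet[0] & 0x0F) * 4
    outer_ttl = packet[8]

    inner = bytearray(packet[outer_header_len:])
    inner_header_len = (inner[0] & 0x0F) * 4
    inner[8] = outer_ttl  # hop-count copied from the outer header
    # The inner header changed, so its checksum must be recomputed.
    inner[10:12] = b"\x00\x00"
    struct.pack_into("!H", inner, 10, ipv4_checksum(bytes(inner[:inner_header_len])))
    return bytes(inner)
```

Nothing else in the outer header is consulted, which matches the
statement that the ETR needs no special functions or protocols.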
2.28. ETFH - Egress Tunnel Function in Host
I haven't given much thought to this. Maybe it would be useful for a
host with a local care-of address to do its own ETR functions,
rather than relying on a separate ETR. Perhaps the host has another
link to another network for multihoming. This resembles some
mobile-IP situations.
2.29. TTR - Translating Tunnel Router for Mobile-IP
A TTR behaves like an ETR as far as the Ivip system (IMAB-DBs and
ITRs) is concerned - it is simply a device with a BRIP address to
which packets are tunneled. A MN/MH establishes a two-way tunnel to
the TTR from its care-of address, which can be behind NAT. The MN/MH
may have such tunnels to other TTRs, including via different edge
networks - such as one link over WiFi, another over UMTS and a third
via an Ethernet cable. I have not looked into the details of such
tunnels, but let's say it is a two-way IP-in-IP tunnel or some other
type, perhaps with compression and encryption.
A TTR may be in the provider network with which a MN/MH has a link,
in which case the provider network probably runs it and pays for it.
In this case, the TTR will need to be authorised to forward the
packets with the source address being that of the MN/MH's IMIP
addresses.
A TTR may be outside the current provider network where the MN/MH has
its care-of address. In this case, the end-user will probably need
secured access to it, and have to pay some TTR network for using the
TTR. This TTR network might have central monitoring systems and
autodiscovery software in the MN/MH which automate the process by
which the MN/MH finds TTRs, and by which the TTR network controller
changes the mapping by issuing a UMUC to whichever UAS handles the
end-user's IMIP address(es).
Please see the "Loose ends - TTRs and Mobility" section for a fuller
description of how Ivip can help with Mobile IP.
In some ways, the TTR resembles a home-agent in Mobile IPv6.
However, there can be TTRs all over the world, inside and outside
provider networks. The MN/MH can choose two or more "nearby" TTRs
and either by itself, or more likely via some centralised monitoring
system, cause the ITRs of the world to beam packets to whichever TTR
it has the best link with.
This means highly optimal paths in both directions between
correspondent hosts anywhere and MN/MHs anywhere, including as the
MN/MH moves from one place to another.
Handover from one TTR to another in the event a radio link fails is
unlikely to be fast enough to support glitch-free VoIP, but this
still represents a tremendous boost for mobile IP. Firstly, it gives
optimal paths without any fixed home-agent. Secondly it will work
fine with IPv4!
Mobile IP is not such a big thing that a global system of ITRs and
databases etc. would be built. But we clearly need to build
something like this to avoid the entire BGP system being swamped.
Once built, it enables a great new approach to Mobile IP. I can't
see anything specific which needs to be added to Ivip itself - the
UASes, Replicators, ITRs etc. specifically for facilitating mobile
IP.
However, since mobile IP involves many changes to mapping, compared
to end-users who use Ivip for portability or multihoming without
fancy TE, mobile-IP end-users are going to be placing quite a load on
the entire Ivip system. They need to pay for this in some way, and
it could be argued that the whole Ivip system is so valuable to
mobile-IP end-users that it could be run at a profit just by charging
them for their frequent changes and generally rather low traffic
volumes.
3. The Crisis in Routing and Addressing
I don't have time to do this section properly. Obviously there needs
to be reference to the RAWS report etc. [IAB-RAWS-website] and to
work which is yet to be completed such as RADIR's "Problem Statement"
and the RRG's "Design Goals I-D".
Here are some points which attempt to summarise the situation.
3.1. Interrelated needs and problems
Internet routers cannot handle millions of separate routes.
Therefore, in order to maximise the number of separate subnets which
are advertised at border routers, while decreasing the number of
routes each router needs to implement, "route aggregation" must be
maximised.
Route aggregation can be maximised by end-users obtaining Provider
Assigned (PA, AKA Provider Aggregatable) addresses from their ISPs
rather than becoming Autonomous Systems and obtaining their own
Provider Independent (PI) addresses.
Route aggregation can also be maximised by trying to encourage ISPs
and AS-end-users to connect prefixes which are close to each other in
the address range at topologically close points in the Internet.
Route aggregation can also be maximised by giving ISPs and AS-end-
users rather large subnets of address space, infrequently, rather
than frequently giving them more numerous smaller subnets, which
would necessarily break up the address space into more divisions.
Large and medium sized end-users, such as companies and businesses,
need IP addresses which are portable, meaning PI addresses. This has
long been allowed for IPv4, but the general absence of this to date
for IPv6 has been a barrier to the adoption of IPv6. [To do: link to
ARIN policy which apparently allows this, and to proposal for RIPE
policy change.]
These end-users also need PI addresses because this is currently the
only way of achieving multihoming.
Multihoming is maximised when the prefix can be advertised in widely
different parts of the network topology, which is directly at odds
with Route Aggregation.
All end-users must become Autonomous Systems and invest heavily in
BGP expertise and routers before they can obtain PI address space and
therefore portability and multihoming. [Is this true, or can an end-
user get their own PI space without becoming an AS? I think someone
told me they could, but this surprised me.] Many such end-users and
millions of businesses and organisations with fewer resources want and
arguably need both portability and multihoming.
Consequently, the only way large end users can meet their needs for
portability and multihoming is to advertise more and more prefixes,
fuelling growth in the "global BGP routing table" to the point where
ISPs, AS-end-users and other router operators (transit providers)
fear overly rapid obsolescence of routers and very high replacement
costs for more powerful routers.
Fresh supplies of IPv4 address space are projected to run out in
2010. [To do: link to Geoff Huston's site.] Yet if routers had
always been capable of handling tens or hundreds of millions of
routes, there would be no shortage yet, because the address space
would have been handed out in smaller, more efficiently used chunks,
because there would have been no imperative to maximise route
aggregation.
Some administrative and technical changes must be made to allow and
encourage the more efficient use of the 3.7 billion IP addresses
available in the IPv4 Internet. Current utilisation rates of the
portion which has been assigned so far cannot be known for certain,
but are probably 10% or less on average, while some areas of the
address space are used with efficiencies of 20% or more. [RW ping
survey]
3.2. Constraints on possible solutions
The constraints upon solutions are quite oppressive.
It is not practical to upgrade or alter the operating systems or
application programs of any Internet computers, including servers and
desktop machines.
While some improvements in the BGP network may be achieved, the
benefits which might be achieved are not of the scale which would be
required to cope with current growth rates in the "global BGP routing
table".
Ideally, most of the BGP routers in use today will still be usable
for five or more years, but this will only occur if end-user needs
can be met while halting or drastically slowing the growth in the
"global BGP routing table".
A wholesale move to IPv6 is not possible - and IPv6 packets are
harder for routers to classify, while the problems of the growth in
the IPv6 BGP routing table are just the same as for IPv4.
Any new system must be backwards compatible with existing software,
ISP and AS-end-user edge network structure, BGP routers and the
existing BGP routing system.
Any new solution must be incrementally deployable. This means that
there must be some immediate benefit for those who first make the
effort to install the new system.
The most likely solution is a major change to the Internet's routing
and addressing architecture, in the form of an "overlay network"
which provides new flexibility for portability and multihoming - and
ideally for TE - for many more end-users without them needing to
become Autonomous Systems, gain PI space or add to the number of
routes in the global BGP routing table.
4. Potential Solutions
[To do: point to LISP I-Ds. SHIM6, eFIT - APT, Tony Li's and Geoff
Huston's BGP ID and other BGP potential improvements. Maybe list
more esoteric non-backwards compatible ID/Loc split work.]
5. Comparison with LISP
This section contains three lists of principles and mechanisms: those
in LISP which are used by Ivip, those in LISP which are not used by
Ivip, and those which are not in LISP and which are in Ivip. These
lists concern principles and mechanisms, not outcomes or functional
features of each entire system.
This is based on my potentially incorrect understanding of LISP,
based on the current I-Ds LISP-01 and LISP-CONS-01. There is also
LISP-NERD which I am not really considering here.
5.1. LISP principles and mechanisms used by Ivip
The basic concept of using some existing IP addresses to identify a
final destination host, and others to locate an ETR which can deliver
the packets to that host. This is in contrast to some Id/Loc
separation schemes which propose major alterations to the TCP/IP
stack so these two types of addressing are performed by separate
types of address which are not at all the same - and with one or both
of these being incompatible with the existing concept of an IP
address.
Changing some parts of the routing and addressing system, but not
host operating systems or applications. Both LISP and Ivip are
invisible to hosts.
Applicability in principle to IPv6, with most work at present
developing a good system for IPv4. Potential carriage of IPv4
traffic over IPv6 tunnels and vice-versa.
Intended to be incrementally deployable, to minimise the software
changes required in routers and to require no hardware changes in
routers. However, to what extent the ITR functions of encapsulation
- and for LISP, quite complex communication with ETRs - can be
implemented in existing FIB hardware would depend very much on the
model of router. (Perhaps it is fine for these communications to be
done by the main CPU. LISP's encapsulation is more complex than
Ivip's and I think it often, or always, involves a nonce - a single-
use random number.)
The ITR term and basic concept. (With Ivip the ITR is purely for
encapsulating a packet to be sent to an ETR. Ivip does not involve
messaging between ITR and ETR, either in headers or separate
packets.) ITRs are globally distributed and are close to sending
hosts - to encapsulate packets which are destined for LISP/
Ivip-mapped hosts in networks all over the Net. (LISP ITRs are in the
network of the sending host, which is the ideal location for Ivip
ITRs, but Ivip also has "anycast ITRs in the core".)
The ETR term and basic concept. (Ivip ETRs simply decapsulate
packets. They do not engage in communication with the ITR, or make
use of any information in the encapsulation header except the hop-
count, which as with LISP, is copied to the decapsulated packet.)
With both LISP and Ivip, the ETR or the local routing system needs to
be configured to deliver the decapsulated packets to the destination
host. Except for Ivip's TTR, which is a type of ETR, ETRs are always
inside provider or end-user networks, and are close to the host for
which they are decapsulating packets.
Providing portability (not LISP 1.0 or 1.5?), multihoming and TE,
without involving BGP and without requiring any changes to host
applications or operating systems. (Ivip has no explicit TE
capability and its TE goals may be more modest, and require external
control systems.) Depending on the response time of the database and
ITR system, both LISP and Ivip will ideally support mobility while
existing communication sessions continue on the same EID (DID for
Ivip) IP address.
Not altering all the BGP routers. The best place to put ETR and ITR
functions in LISP is often in border routers, which are BGP routers,
but I understand that transit routers are typically unmodified.
(Ivip requires a large number, but still only a small proportion, of
"transit" BGP routers to be ITRs - although the same effect can be
achieved if many border routers are ITRs which accept raw packets
from unmodified networks and tunnel them to wherever the packets need
to be tunneled - not just to ETRs in the network in which the border
router is located.)
Some kind of global database, distributed databases etc. to control
all ITRs at the same time. (The current Ivip proposal for how that
data is altered by users and distributed to ITRs is very different
from the current LISP proposals.)
5.2. LISP principles and mechanisms not used by Ivip
The EID and RLOC terminology. Ivip uses new terms which are
comparable, but which have somewhat different meanings. Likewise,
while Ivip may to some extent be an instance of an Id/Loc separation
system, I haven't made this a focus of the design or terminology.
LISP-00 (January 2007) used IP-in-IP encapsulation, as does Ivip.
LISP-01 (July 2007, or at least the draft I have seen) uses UDP
encapsulation instead.
Recursive tunneling and recapsulating tunnels. There is nothing in
Ivip to prevent an ETR sending packets in a tunnel to somewhere else,
but it is not contemplated in the Ivip architecture, except for TTRs,
which are ETRs with a two-way tunnel established by the mobile node.
LISP header, UDP encapsulation and UDP message packets between ITR
and ETR. Ivip uses no special header or UDP - just IP-in-IP
encapsulation. It could use UDP or some other encapsulation method,
but at present I think IP-in-IP is fine. The Ivip IP-in-IP header
contains no extra information, nonces etc. So Ivip isn't really a
protocol between ITR and ETR, but a plan for running ITRs and ETRs,
with protocols, databases, replicators etc. so the end-users can
securely and quickly alter the tunneling behavior of all the world's
ITRs.
Ivip only works with a centralised database or a set of centralised
databases.
There is no equivalent to the more ad-hoc arrangement of LISP 1.0 or
1.5, in which ETRs are the authoritative source of mapping
information and in which they communicate back to the ITR in various
ways. Ivip encapsulated packets never have the ITR's own IP
addresses as the source address, as is done for at least some
encapsulations with LISP 1.5 and perhaps other variants too. Ivip's
outer IP header has a DA of the ETR's address and the SA of the
sending host.
I think LISP includes some functions for testing and confirming
reachability between ETR and ITR. Also, I think the ETR can signal
to the ITR which of the various alternative ETRs should currently be
used or not used. Ivip doesn't involve any reachability tests or
communication about reachability. A practical multihoming
arrangement would require some separate monitoring system which
constantly tests reachability, and/or receives reports about non-
reachability, so it can send update information to the Ivip database
to select a different ETR.
I understand that with at least some LISP variants - perhaps only 1.0
or 1.5 - the ITR in a network with some sending hosts with EID (I
call them "LISP-mapped") addresses will NAT the packets sent by these
hosts to ordinary non-LISP-mapped addresses. So the packet received
by the ordinary destination host has a SA of the ITR, which is an
ordinary non-LISP-mapped address which is reachable via BGP. This
enables the destination host to send packets back in the opposite
direction, making the hosts with EID addresses reachable, as long as
these EID hosts initiate the communication. (I don't understand how
the destination host could then distinguish between the packets sent
from different sending hosts behind this NAT arrangement.) The
encapsulated packet leaving an Ivip ITR has the original SA and a new
DA (of the ETR) in its outer IP header.
LISP's explicit TE parameters in the database and which are
communicated to the ITR (Priority and Weight) may require the ITR to
make some complex decisions, which I think would be difficult or
impossible in existing FIB hardware. Although I think these TE
functions are powerful and elegant, this extra complexity in the
database and the ITR is not a part of Ivip. Ivip's database maps
each IPv4 address (or each /64 for IPv6) to a single 32 bit (128 bit
for IPv6) address. There is no other data in each element of the
database array, although "0" means "drop the packet" and it is
possible that some ranges of values, such as above 224.0.0.0, may be
used for special purposes in the future. (I have no idea what for.)
LISP is intended to handle multicast in some way. I don't yet
understand this or understand how multicast is used over the public
Internet, so Ivip at present is only concerned with global unicast
addresses.
Ivip is not concerned with the reachability of what for LISP are the
RLOC addresses of ETRs. The Ivip ITR tunnels packets to a particular
IP address for each Ivip-mapped IP address. What happens if there is
no ETR there, or no host, or no route to that address, is not defined
within Ivip. It is up to the end-user and whatever multihoming
monitoring system they use to ensure the database contains an IP
address which does something useful for them - and which does not
create problems for others.
The LISP-01 I-D only discusses LISP 1.0 and 1.5, so it is not
possible to know what encapsulation techniques are to be used for
LISP 3.x. In the RAM mailing list (2007 July 13 - msg01703) Dino
Farinacci discussed whether there would be Map-Reply messages from
ETRs to ITRs resulting from the ETR receiving a data packet: not for
CONS and NERD and perhaps for APT. It is not clear what
communication, if any, would take place between the ITR and ETR in
these 3.x variants - either in the encapsulation scheme for getting
raw packets to the ETR, or in separate message packets in either
direction. Nor is it clear whether LISP 3.x would use for the source
address of the outer header of its encapsulated packets the ITR's
address or that of the sending host. For LISP 1 and 1.5, the
encapsulation is UDP with the outer SA (source address) being one of
the ITR's RLOC addresses.
Ivip, as currently defined, uses IP-in-IP encapsulation with the SA
of the outer IP header being that of the sending host - the SA of the
inner IP header. Please see the "Loose Ends: ETRs checking src &
dest addresses" section below for discussion of why I believe it is
best for the outer SA to be the same as the inner SA - because it
greatly simplifies enforcing a particular kind of security filtering.
There are some disadvantages to this if "outer SA = inner SA" is used
with IP-in-IP encapsulation, because the tunneled packet carries no
indication of which ITR encapsulated it. If this is found to be a
serious enough problem, then Ivip could have its encapsulation
changed to use UDP, like LISP, retaining the "outer SA = inner SA",
but including some Ivip data at the start of the UDP body, containing
at least the address of the ITR. Any such use of UDP and an extra
"Ivip header" structure, which would of course have a structure which
supports adding new elements to it in the future, would add to the
overhead in every tunneled packet, and require further work at the
ITR and ETR. Such an extendable header would have the advantage of
enabling Ivip to do things in the future which are not currently
contemplated.
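For comparison, the plain IP-in-IP form with "outer SA = inner SA"
can be sketched as follows for IPv4. This is an illustration only.
The function names are hypothetical, copying the inner hop-count to
the outer header is an assumption, and MTU and fragmentation handling
are left out entirely.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4 encapsulation


def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum over an IPv4 header (even length)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner: bytes, etr_addr: str) -> bytes:
    """Prepend an outer IPv4 header with SA = inner SA and DA = the ETR."""
    total_len = 20 + len(inner)
    if total_len > 0xFFFF:
        raise ValueError("encapsulated packet would exceed the IPv4 length limit")
    inner_ttl = inner[8]     # assumption: outer hop-count starts at the inner value
    inner_src = inner[12:16]  # the sending host's address, reused as the outer SA
    outer = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,   # version/IHL, TOS, total length
        0, 0,                 # identification, flags/fragment offset
        inner_ttl, IPPROTO_IPIP, 0,  # TTL, protocol, checksum placeholder
        inner_src, socket.inet_aton(etr_addr),
    ))
    struct.pack_into("!H", outer, 10, ipv4_checksum(bytes(outer)))
    return bytes(outer) + inner
```

Note that nothing in the resulting packet identifies the ITR, which
is exactly the disadvantage discussed above.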
Most of the complexity of LISP 1 and 1.5 ITR and ETR behavior is
absent from Ivip. It is not known how complex LISP 3.x ITR and ETR
behaviour is, but LISP in general attempts explicit TE functions
which are not part of Ivip. The LISP 3.x database contains more
information and is more complex than Ivip's - which makes it more
difficult to implement full database (push) ITRs in LISP than in
Ivip. Likewise, any push mapping database distribution system for
LISP is likely to be even more challenging than Ivip's, for a given
rate of updates and number of recipient devices. This Ivip I-D is
longer than LISP's because I explore a wide variety of ways in which
Ivip could be used and because I explore the full user-control,
database and update distribution system in some detail.
5.3. Additional principles and mechanisms in Ivip
Any of these principles and mechanisms might be applied to another
system, including LISP - but some of them would only make sense in
LISP if other aspects of LISP were changed to more closely resemble
Ivip.
Ivip has three kinds of ITR function: ITRD, ITRC and ITFH, while LISP
has only the first two: the full database and the caching variety.
ITFHs could easily be added to LISP.
Ivip could have the ETR function performed in the destination host.
This resembles mobile IP in that the host must have additional TCP/IP
software and must have a conventional care-of address. This may
conflict with some goals, such as efficient use of address space.
This could easily be added to LISP.
Ivip's tree-structured UAS system, with multiple such systems feeding
update packets via a distributed replication network to ITRDs and
QSDs, is quite different from any current LISP proposal I know of.
This could easily be adopted by LISP.
I think the QSD and QSC system, with Notification (cache
invalidation) messages is unlike anything in LISP. This could be
adopted by LISP, I think.
Ivip supports TTRs for mobile IP. I don't think this is explicitly
mentioned in LISP I-Ds, but it could be adopted by LISP, as long as
the TTRs fulfilled all the ETR functions of LISP, which are more
demanding than those Ivip requires.
6. Ivip's goals, non-goals and challenges
[To do.]
7. User Interface and Update Authorities
Here are some ideas about how mapping information might be
controlled. This is different from keeping the authoritative data in
multiple locations and having to work backwards to find the
authoritative server, as in the DNS, and as described in
draft-meyer-lisp-cons. Nonetheless, the authority to change the
mapping for the IP addresses is delegated in a distributed fashion
similar to the DNS.
The entire Ivip system has a number (perhaps thousands or even tens
of thousands) of IMABs (Ivip-Mapped Address Blocks). The mapping for
each IMAB is controlled by a single body of data - an IMAB-DB. This
is maintained by a single system, typically comprising multiple
redundant servers separated from each other, called a RUAS (Root
Update Authorisation System). In the diagram below, I draw this as a
single entity, but in fact it acts as one entity while being
physically distributed over several servers.
Each RUAS can be authoritative for one or more IMABs. Once an ITRD
has a complete, real-time updated, slave copy of a particular IMAB's
IMAB-DB, it can program its FIB to match and tunnel these packets
according to every entry in the IMAB-DB. So then it can advertise
the BGP prefix for this IMAB and start accepting and tunneling
packets.
For this reason, it is desirable that there be a 1:1 relationship
between the IMAB, which is defined as a contiguous range of IMIP
address space which can be advertised as a single BGP prefix, and a
single IMAB-DB database which carries the mapping for this range.
The single IMAB-DB generates two things which are used by ITRDs and
QSDs. Firstly a regular (every 10 minutes?) IMAB-DBD dump file is
generated, and can be downloaded by any recently booted ITRD or QSD.
Secondly, there is a stream of update packets - a US-IMAB - which is
specific to this particular IMAB. The Replicator system gets this
stream to the ITRDs and QSDs which need it - and they typically need
it fast, without missing packets, and along with the US-IMAB of every
other IMAB in the Ivip system.
The idea is that for scaling purposes, especially due to problems
with FIB rule capacity and packet handling speed, it may be desirable
to split a total ITRD function over several co-located ITRDs each of
which handles a fraction of the total set of IMABs. In this case,
each ITRD needs to be able to tell its one or more Replicators to
send it the US-IMABs only for the IMABs it is handling. Likewise,
when each such ITRD boots, it only needs to download a subset of the
IMAB-DBDs to get started. The format of these US-IMAB streams is
discussed in the section on Replicators.
In this section, I depict a single tree of delegated responsibility
for the user control of mapping of one IMAB. The Root UAS at the
base of the tree is run by Company X - RUAS-X. RUAS-X could be
authoritative for other IMABs, and each such tree of delegation may
have the same set of other UAS systems, or it could be different.
Each delegation tree is separate from the delegation trees of other
IMABs, even if they look similar, because the tree includes specific
subsets of the whole IMAB address range as one of the defining
characteristics of its branches and leaves.
The initial action which leads to the database being changed is a
user generated (manually or by the user's equipment or by a system
authorised by the user) UMUC (User Mapping Update Command).
For authorising and feeding UMUCs to the RUAS-X, there is a tree as
depicted in Figure 4. Delegation of authority flows up the tree as
the total address range of the IMAB is split at each branching
junction. This tree structure involves data, in the form of SUMUCs
(Signed User Mapping Update Commands), flowing down towards the root
of the tree. (Data would also flow up the tree so each
user-interface leaf could tell end-users what their current mapping
was, could test their requests against constraints etc.) The idea is
that RUAS-X could delegate control of one or more subsets of the
IMAB's total range of addresses to some other system, which in turn
would delegate control to other systems. There would be no absolute
limit on the height (usually called depth) of these hierarchies.
Ultimately, from the point of view of the end-user, there needs to be
a username and password (or some crypto private key challenge
response system) by which they manually (via a web interface), or in
some automated way, control the DID to TELOC mapping of their one or
more Ivip-mapped addresses. The system would also tell them what the
current mapping is, and enable them to test a potential new mapping
address to see whether it was valid, given that it must be a BRIP,
and be outside certain well-known ranges which are protected from
being tunnel endpoints in order to ensure critical infrastructure
can't accidentally or maliciously be interfered with by tunneled
packets from ITRs.
The servers which handle the end-user interaction need to be leaves
of this tree structure, so as not to burden the RUAS-X database
servers themselves with this messy stuff. This enables
various companies to give different kinds of control for the Ivip-
mapping of the IP addresses their branch of the tree controls.
Figure 4 does not show RUAS-X having any user interface servers, but
it could. The simplest arrangement would be the RUAS having simply a
user-interface server and no tree of other UASes.
There would need to be IETF standardised methods by which some server
could execute a UMUC with the user-interface servers of any of these
UASes. This standardisation would be especially important for
multihoming, because some reasonably trusted company could run an
automated monitoring system, and have the credentials (username,
password, key etc.) stored in their system so their system can change
the mapping of one or more IP addresses the moment one link seems not
to be working. Also, the company which controls a particular range
of the Ivip-mapped space (such as X, Y or Z in Figure 4) may offer
such a multihoming monitoring system itself.
The tree in this example controls an IMAB with the address range
20.0.0.0 to 20.3.255.255. Let's say company X has authority (perhaps
direct from the Ivip system or because X assigned this space which it
got from an RIR to the Ivip system) over the entire range 20.0.0.0 to
20.3.255.255. It sublets to Y a quarter of this: 20.1.0.0 to
20.1.255.255. I am making these examples on binary boundaries, but
there is no reason why the divisions should be like this. It would
be just as possible for X to delegate to Y an arbitrary subset of the
whole range, or the entire range, or just one IP address.
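The delegation in this example amounts to a containment check on address ranges. Here is a minimal Python sketch of that check. The representation (inclusive pairs of 32 bit integers) and the helper names are my assumptions for illustration, not part of any Ivip specification.

```python
import ipaddress

def to_range(first: str, last: str) -> tuple[int, int]:
    """Return an inclusive (first, last) pair of 32 bit integers."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    if lo > hi:
        raise ValueError("range is reversed")
    return lo, hi

def is_within(child: tuple[int, int], parent: tuple[int, int]) -> bool:
    """True if the child range lies entirely inside the parent range."""
    return parent[0] <= child[0] and child[1] <= parent[1]

def size(r: tuple[int, int]) -> int:
    """Number of addresses in an inclusive range."""
    return r[1] - r[0] + 1

IMAB_X = to_range("20.0.0.0", "20.3.255.255")    # X's whole IMAB
RANGE_Y = to_range("20.1.0.0", "20.1.255.255")   # the part sublet to Y
SINGLE = to_range("20.2.0.7", "20.2.0.7")        # one address, no binary boundary
```

Because the ranges are plain integer pairs, nothing in the check depends on binary boundaries, so an arbitrary subset or a single address is handled the same way as a quarter of the IMAB.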
X's Root Update Authorisation Server (RUAS) has a private key for
signing all the IMAB-DBD dumps it periodically creates and makes
available. (Actually, it probably signs a message which attests to
the MD5 hash of each IMAB-DBD file.) This key is also used in some
way with a corresponding public key so Replicators and/or ITRDs and
QSDs can check that the US-IMAB they are getting has not been
corrupted. (This could be quite an onerous task.)
The rest of the Ivip system - the Replicators, ITRDs and QSDs -
neither know nor care about company Y or Z, or about any particular
end-user. All the rest of the Ivip system knows is the various
instances like RUAS-X, each an organisation with a public key for
authenticating streams of US-IMAB update packets they generate, and
the corresponding IMAB-DBD dump files, for a given subset of the
total Ivip-mapped address space. This could be any arbitrary subset,
but for simplicity I will assume that X only has authority over this
one IMAB 20.0.0.0 to 20.3.255.255.
Let's say Y delegates control of some of its space to company Z, and
that Z has an end-user U, who needs to control the mapping of one or
more IP addresses in Z's range.
Z has various interfaces by which U can do this, with its own
arrangements for authentication, for monitoring a multihoming system
and making changes automatically etc. Hopefully there would be one
or more automated, host-to-server, IETF-standardised protocols so all
end users could have standardised software for talking to whichever
company's servers they use to control the mapping of their IP
address(es).
          User-R  User-S    User-T  User-U      Multihoming
                \       \      |      |         Monitoring
                 \       \     |      |         Inc.
                  \       ...................      /
                   \------. Web interface   .-----/
                          . other protocols .
                          . etc.            .
                          .......UAS-Z.......
                                   |
  Other companies                  |
  like Y and Z                     |
                        /-----<----/
  |    |           \ | /
  |    |            \|/
  |    |           UAS-Y
   \   |             |
    \  |  /----<-----/
     \ | /
      \|/
     RUAS-X   Root Update Authorisation Server company X
       | \
       |  \
       V   \->-[Multiple web servers for IMAB-DBD files]
       |
       |
       |      Other RUASes like RUAS-X, each authoritative
       |      for mapping one or more IMABs and producing
       |      regular IMAB-DBD dumps and streams of US-IMAB
       |      update streams to securely control the ITRs
       |      and Query Servers.
        \
         \         |   |   |        /
          \        |   |   |       /
           \       |   |   |      /
            \      |   |   |     /
             \     |   |   |    /
              \    |   |   |   |
               |   |   |   |   |
               V   V   V   V   V
               |   |   |   |   |

        Each line depicts 30 or so streams
        of identical packets for each US-IMAB
        stream - one for each Level 1 Replicator,
        which are depicted in the next section.
Figure 4: Delegation tree of UASes above one RUAS.
Let's say User-U wants to change the mapping of their one IMIP - or a
range of IMIPs - to a new TELOC via a web interface. User-U does
this via Z's website, authenticating him-, her- or it-self by
whatever means Z requires, and gives the command (UMUC) to map their
IMIP to a new IP address (typically the address of another ETR).
This causes UAS-Z to generate a signed copy of this update command (a
SUMUC, according to some future IETF standard, of course) and to send
it to UAS-Y.
The SUMUC consists of three items (assuming IPv4 for simplicity): a
starting IMIP address for the range this update covers, the length of
the range (at least one address), and a new mapping value, which, like
the starting address, is a 32 bit integer. It could also include a
time in the future at which the update should be executed.
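To make the three items concrete, here is a minimal Python sketch of how the body of such a command might be encoded. The field order, the 64 bit execution time and the class name are assumptions of mine; the real format would be set by some future IETF standard.

```python
import ipaddress
import struct
from dataclasses import dataclass

# Assumed wire layout, network byte order: three 32 bit fields
# (start, length, new mapping) and one 64 bit execution time.
_FORMAT = "!IIIQ"

@dataclass(frozen=True)
class MappingCommand:
    start: int           # first IMIP address the update covers
    length: int          # number of addresses, at least one
    mapping: int         # new mapping value (a 32 bit address)
    execute_at: int = 0  # Unix time; 0 means "immediately" (assumption)

    def __post_init__(self):
        if self.length < 1:
            raise ValueError("range must cover at least one address")
        if self.start + self.length - 1 > 0xFFFFFFFF:
            raise ValueError("range runs past the end of IPv4 space")

    def pack(self) -> bytes:
        return struct.pack(_FORMAT, self.start, self.length,
                           self.mapping, self.execute_at)

    @classmethod
    def unpack(cls, data: bytes) -> "MappingCommand":
        return cls(*struct.unpack(_FORMAT, data))

def ip(text: str) -> int:
    """Convert dotted-quad text to a 32 bit integer."""
    return int(ipaddress.IPv4Address(text))
```

In this layout a command body is 20 bytes, so a good number of them fit in one packet. The signature is deliberately left out of the sketch.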
Exactly how these UASes communicate is for future consideration. I
guess TCP/IP, with multiple links, and with each set of servers which
constitutes a UAS somehow behaving as one entity.
UAS-Y trusts this SUMUC because it can authenticate UAS-Z's
signature. It strips off the signature and adds its own, before
passing the SUMUC down to the next level: RUAS-X.
RUAS-X likewise has a copy of UAS-Y's public key, so it can
authenticate the SUMUC. Within a fraction of a second of U initiating
the UMUC, the IMAB-DB is altered accordingly.
Authority is delegated up the tree, because UAS-Y will only accept
update commands if they are signed by one of its branch UASes, and
for the particular address range that UAS has been authorised to
control.
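The check each UAS performs before passing a command down the tree can be sketched as follows. This is only an illustration: Python's standard library has no public key signatures, so HMAC-SHA256 with a shared key stands in for the signature scheme, and the body layout and class name are hypothetical.

```python
import hashlib
import hmac
import struct

BODY = struct.Struct("!III")  # start, length, new mapping (assumed layout)

def sign(key: bytes, body: bytes) -> bytes:
    """Stand-in for a real signature: HMAC-SHA256 with a shared key."""
    return hmac.new(key, body, hashlib.sha256).digest()

class Uas:
    def __init__(self, own_key: bytes):
        self.own_key = own_key
        self.children = {}  # branch name -> (key, first, last)

    def delegate(self, child: str, child_key: bytes, first: int, last: int):
        """Authorise a branch UAS for an inclusive address range."""
        self.children[child] = (child_key, first, last)

    def accept(self, child: str, body: bytes, signature: bytes):
        """Verify the branch's signature and range, then re-sign the body."""
        if child not in self.children:
            raise PermissionError("unknown branch UAS")
        key, first, last = self.children[child]
        if not hmac.compare_digest(sign(key, body), signature):
            raise PermissionError("signature check failed")
        start, length, _mapping = BODY.unpack(body)
        if length < 1 or start < first or start + length - 1 > last:
            raise PermissionError("outside the delegated range")
        # Strip the branch's signature and add our own for the next level.
        return body, sign(self.own_key, body)
```

UAS-Y would run `accept()` on what UAS-Z sends, and RUAS-X would run the same check on what UAS-Y sends, with X's delegation table holding Y's range.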
User-U may have given their username and password etc. to Multihoming
Monitoring Inc. so this company can monitor their multihoming links
and change the mapping as soon as one link goes down. UAS-Z doesn't
know or care who actually makes the change - as long as they can
authenticate themselves for whatever IMIP or range of IMIPs they want
to change the mapping of.
There is no need for PKI in any of this, I think.
I believe that a pure "pull" system such as draft-meyer-lisp-cons
will be too slow to respond. draft-meyer-lisp-cons has "push"
elements, but that is not pushing data towards ITRs, just information
about where the authoritative CAR can be found. Since we are going
to build a global system of ITRs, we might as well build a really
fast way of controlling them.
8. Replicators
Please consider the following section, which depicts an unencrypted
UDP-based system for collecting and fanning out the update streams to
hundreds of thousands of ITRDs and QSDs, as being just an attempt at
finding a solution to this major engineering problem. In the "Loose
ends" section below, titled "Is fast, secure, Replication possible on
the Internet?", I suggest that a secure, fast, Replicator system will
require robust authentication of each packet's data, whether the
packets are sent via TCP or UDP. I also discuss the difficulty of
ensuring that the RUASes and first and second levels of Replicators
can withstand DDoS attacks - so perhaps this part of the system would
best be done with private network links.
This section of the Ivip design is not yet close to finding a good
approach to the problem of pushing mapping information securely and
rapidly to hundreds of thousands of ITRDs and QSDs. Hopefully this
ambitious work will inspire others to contribute ideas or develop
different and better plans.
I believe it is vital to make the system a fast, secure, push system,
rather than the likely very slow system based on querying and caching
of LISP-CONS.
Multiple companies, organisations etc. which have one or more IMABs
in the Ivip system each have their own RUAS (Root Update
Authorisation Server) system, as described in the previous section.
RUAS-X in Figure 4 is the central store of the mapping database for
at least one IMAB. RUAS-X could handle multiple separate IMABs but
the following example only considers one IPv4 IMAB.
There could be a potentially large number of such RUAS systems, maybe
hundreds or up to tens of thousands. Ideally there would be no more
than a few dozen or a few hundred.
Each RUAS periodically, say every 10 minutes, generates a compressed
IMAB-DBD "dump" file for each of its IMAB-DBs and makes it available
for download by HTTP or FTP on multiple redundant servers. Each dump
file has a timestamp and a sequence number which matches a message in
the US-IMAB stream for this IMAB.
Each RUAS continually generates a UDP stream of updates, also
timestamped - the US-IMAB - for each of its IMABs. One of the
messages in that stream may be "a dump file was generated now". Each
RUAS system generates many (say 30) identical streams from different
locations. Maybe it generates an update message packet (actually 30
identical such packets, each to a different Level 1 Replicator as
discussed below) as soon as the incoming updates fill a UDP packet or
after one second elapses. If no SUMUCs come in for ten seconds,
maybe it sends a time-stamped update message anyway, with no updates.
Each message needs a 64 bit sequence number, and a 32 bit or similar
identifier for which IMAB it is updating the mapping of.
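The packet-filling rules just described can be sketched in Python as below. The header layout (64 bit sequence number, 32 bit IMAB identifier, 64 bit millisecond timestamp), the 12 byte update size and the 1200 byte payload budget are all assumptions for illustration. Time is passed in explicitly so the logic is easy to test.

```python
import struct

HEADER = struct.Struct("!QIQ")  # sequence, IMAB id, timestamp in ms (assumed)
UPDATE_SIZE = 12                # start, length, mapping: 3 x 32 bit (assumed)
MAX_PAYLOAD = 1200              # assumed budget for updates in one UDP packet

class UpdateBatcher:
    def __init__(self, imab_id: int, max_wait: float = 1.0,
                 keepalive: float = 10.0):
        self.imab_id = imab_id
        self.max_wait = max_wait      # flush pending updates after this long
        self.keepalive = keepalive    # send an empty message after this long
        self.seq = 0
        self.pending = []
        self.first_pending_at = None
        self.last_sent_at = 0.0

    def _emit(self, now: float) -> bytes:
        header = HEADER.pack(self.seq, self.imab_id, int(now * 1000))
        message = header + b"".join(self.pending)
        self.seq += 1
        self.pending = []
        self.first_pending_at = None
        self.last_sent_at = now
        return message

    def add(self, update: bytes, now: float):
        """Queue one update; return a message if the packet is now full."""
        if len(update) != UPDATE_SIZE:
            raise ValueError("unexpected update size")
        if not self.pending:
            self.first_pending_at = now
        self.pending.append(update)
        if len(self.pending) * UPDATE_SIZE >= MAX_PAYLOAD:
            return self._emit(now)
        return None

    def tick(self, now: float):
        """Call regularly; return a message when a timer expires, else None."""
        if self.pending and now - self.first_pending_at >= self.max_wait:
            return self._emit(now)
        if not self.pending and now - self.last_sent_at >= self.keepalive:
            return self._emit(now)  # time-stamped message with no updates
        return None
```

The RUAS would send each returned message, unchanged, to every Level 1 Replicator.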
The distributed system of Replicators is configured to reliably
distribute the contents of the update streams produced by RUAS-X -
and likewise every other RUAS which has one or more IMABs in the Ivip
system.
A newly booted ITRD (Ingress Tunnel Router with full database) or QSD
(Query Server with full Database) performs the following procedure,
for each of the IMABs in the Ivip system. The ITRD or QSD is
receiving from the replicator system many individual US-IMAB streams
of updates, including the one for the IMAB this example concerns,
which is coming from RUAS-X.
The ITRD/QSD monitors the US-IMAB stream, waiting for the flag which
says a dump has been created. It then buffers all subsequent updates
in the stream, waits until the IMAB-DBD dump file is available (which
could take some seconds) and then starts to download the IMAB-DBD
file.
By the time it arrives, perhaps ten or twenty seconds of updates will
have been buffered.
The ITRD/QSD unpacks the dump file into an array in RAM which is 4
bytes for every IP address in the IMAB. (This is an IPv4
example.) It then applies the buffered updates, bringing the data
totally up-to-date with the last received update. Then it continues
to apply all subsequent update messages as they arrive from the
replicator system.
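The boot procedure above (wait for the dump flag, buffer, load the dump, replay the buffer, then apply updates live) can be sketched in Python as follows. The message tuples and method names are hypothetical, and the dump is modelled as a plain list of 32 bit mapping values.

```python
from array import array

def apply_update(table, update):
    """Write one (offset, length, mapping) update into the table."""
    offset, length, mapping = update
    for i in range(offset, offset + length):
        table[i] = mapping

class ImabCopy:
    """Local copy of one IMAB-DB, built from a dump plus the update stream."""

    def __init__(self):
        self.table = None        # one unsigned int per address once loaded
        self.waiting_for = None  # sequence number of the announced dump
        self.buffer = []         # updates received while the dump downloads

    def on_message(self, msg):
        kind = msg[0]
        if self.table is not None:
            if kind == "update":
                apply_update(self.table, msg[1])   # normal operation
        elif kind == "dump" and self.waiting_for is None:
            self.waiting_for = msg[1]              # start the download now
        elif kind == "update" and self.waiting_for is not None:
            self.buffer.append(msg[1])             # dump not here yet
        # Updates seen before the dump flag are already in the dump.

    def on_dump_loaded(self, seq, mappings):
        if seq != self.waiting_for:
            raise ValueError("dump does not match the announced one")
        # "I" is 4 bytes on common platforms (an assumption of this sketch).
        self.table = array("I", mappings)
        for update in self.buffer:
            apply_update(self.table, update)
        self.buffer.clear()
```

Updates which arrive before the dump flag are dropped on the assumption that the dump already reflects them. That relies on the dump's sequence number matching a message in the stream, as described earlier.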
At this point in time, the ITRD/QSD has an up-to-date copy of
RUAS-X's IMAB-DB for this IMAB. A QSD can start answering
queries about it. An ITRD can advertise this IMAB's BGP prefix. Soon
it will receive packets addressed to this IMAB and can encapsulate
them and forward them to its BGP peers, according to ordinary BGP-
derived FIB rules for the TELOC destination addresses in the outer IP
headers. These packets are soon forwarded to their ETRs.
It would be important to have a close or perfect match between the
address range of each IMAB and the BGP advertisement which the ITRs
make for it. We want each ITR either advertising or not advertising
a BGP prefix. We don't want excessive churn in the advertisements,
such as advertising a small subnet (longer prefix) when one IMAB's
mapping is complete and then withdrawing this to advertise a larger
subnet when an adjacent IMAB's mapping data is complete.
On the other hand, if there was a massive IMAB, like a whole /8, it
would be good for some ITRDs to advertise a subset of this, if the
total ITR load was to be split among several ITRDs by making each one
only handle a subset of Ivip-mapped address space. Also, by
splitting something big like a /8 into four or 16 smaller IMABs,
there is only a slight extra burden on the global BGP routing table,
but the process of booting, downloading IMAB-DBD files, buffering a
stream of US-IMAB updates, etc. can be done in smaller chunks,
including especially smaller allocations of RAM inside the ITRD or
QSD. This would also facilitate finer control of load balancing when
a single ITRD couldn't handle the traffic in one location, and
several ITRDs were used there with different IMABs for each one.
Periodically the ITRD or QSD could repeat the process of downloading
an IMAB-DBD dump file, buffering the US-IMAB stream (which it would
also be applying to its working copy of the IMAB-DB) and then
building a second array in RAM while the current one is being
updated. When the process was complete, it would switch to using the
second one for its queries or mapping functions, freeing the first
area of memory. Theoretically, the two bodies of data at switchover
time should be the same. This rolling complete refreshing of the
local copy of each IMAB-DB would be done for Justin - Justin Case.
Perhaps the ITRD or QSD uses non error-corrected RAM and a high-energy
particle, such as from radioactive decay, rips through one of the
chips. Even if it did use ECC RAM (much more expensive . . .),
debris from an upper-atmosphere cosmic ray impact shower can rip
through a CPU or other chip in the system and write false data there.
(I have had two occasions where a perfectly stable Pentium III and
Pentium IV system simply froze. I figure it was soft errors in the
CPUs, probably from cosmic ray debris. Burying the server
underground would help.)
An alternative might be to periodically send, as part of the US-IMAB
stream, some hash or CRC values for parts of the IMAB-DB as the RUAS
currently sees it. This can be checked at each ITRD or QSD, and if
there is a mismatch, this could trigger a complete reload as just
described.
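A minimal Python sketch of such a check follows, using CRC-32 from the standard library over fixed-size parts of the table. The chunk size, and the choice to pack values in network byte order before hashing so that both ends agree, are assumptions for illustration.

```python
import struct
import zlib
from array import array

CHUNK = 1024  # addresses covered by each checksum (assumed)

def chunk_crcs(table, chunk: int = CHUNK):
    """CRC-32 of each chunk of the mapping table, in table order."""
    crcs = []
    for i in range(0, len(table), chunk):
        part = table[i:i + chunk]
        # Pack in network byte order so the result is platform independent.
        crcs.append(zlib.crc32(struct.pack(f"!{len(part)}I", *part)))
    return crcs

def mismatched_chunks(local, advertised_crcs, chunk: int = CHUNK):
    """Indexes of chunks whose local CRC differs from the RUAS's value."""
    return [i for i, (mine, theirs)
            in enumerate(zip(chunk_crcs(local, chunk), advertised_crcs))
            if mine != theirs]
```

A non-empty result would trigger the reload. Whether to reload the whole IMAB-DB or only the affected part is a design choice this sketch leaves open.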
Since some or many of the packets coming from the RUAS systems to the
Level 1 Replicators might be short, perhaps the Level 1 Replicators
should have a way of combining shorter packets into longer ones, to
reduce the total number which need to be sent through the rest of the
Replicator system. This could be dodgy, since a single missing
packet at that point could cause some difference in the streams
leaving different Level 1 Replicators.
If there were 30 Level 1 Replicators, RUAS-X might generate streams
to every such Replicator. If RUAS-X consisted of three servers, each
could send 10 streams, or maybe more for some kind of redundancy.
(Maybe two streams to every Level 1 replicator?) Level 2 replicators
typically receive streams from two level 1 replicators for
redundancy. There could be hundreds of systems like RUAS-X feeding
UDP update streams to the Level 1 replicators. There are major
scaling problems here, but by judicious design, I hope they can be
overcome.
This removes any central system for handling the data, with all the
reliability, administrative and political dependencies that would
probably entail.
So an ITRD would boot and advertise various prefixes as it acquired
the full mapping information for each IMAB.
(See "Figure 4 Tree of UASes above one RUAS".)
       \   |   /   } Update information from end-users
        \  V  /    } directly or via child UAS systems.
         \ | /
          \|/
         RUAS-X --------->-------------------[IMAB-DBD HTTP server 1]
          /|\             \
         / | \             \----[IMAB-DBD HTTP server 2]
        /  |  \             \
       /   V   \             \-- etc.
           |                  \
           |
           |       30 UDP streams of identical realtime
           |       updates to the 30 Level 1 Replicators
           |       for each IMAB.
           |
           |
     \  \  |  /  /             Each of the 30 Level 1 Replicators gets a
      \  \ V /  /              stream from every RUAS such as RUAS-X -
       \  \|/  /               one stream for every IMAB.
     [Replicator-N]
       / / | \ \
      / /  V  \ \              Each of 30 Level 1 Replicators sends 30
     / |   |   | \             "full streams" (the sum of all the streams
               |               it receives from systems like RUAS-X) to
               |               Level 2 Replicators.
                \
                 \       /
                  \     /
                   \   /       Level 2 Replicator gets two (ideally
               [Replicator]    identical) full streams from two of the
                 / / | \ \     Level 1 Replicators. From this pair
                / /  V  \ \    of US-IMAB streams it constructs a
               / |   |   | \   single stream with (hopefully) no
                 |             missing packets. It sends that to
                /              each of 30 Level 3 Replicators.
               /
      \       /
       \     /
        \   /                  Level 3 Replicator gets two (ideally
    [Replicator]               identical) full streams from two of the
      / / | \ \                Level 2 replicators. It does the same
     / /  V  \ \               as described above - constructs a single
    / |   |   | \              complete stream and sends it to 30
   /  |   |   |  \             ITRDs or QSDs.
  /   |   |   |   \
 /    |   |   |    \           All these replicators are cheap
      |       |                diskless Linux/BSD servers with one or
      |       |                two gigabit Ethernet links. They would
      |       |                ideally be located on stub connections
      |       |                to transit routers, though the Level
  \   |       |                3 (or 4 etc. if desired) might be at
   \  |       |                the border of, or inside, provider and
    \ |       |                ASN-end-user networks.
     \|        \
    ITRD       QSD             ITRDs and QSDs ideally get two or more
               /|\             ideally identical full feeds of updates -
              / | \            so generally a missing packet from one
             / QSC \           is fine since the other stream has the
            /  /|\  \          same packet.
           /  / |
          /  /  |              Both therefore have a real-time updated
         / QSC  |              copy of all the IMAB-DB databases of all
     ITRC  /|  ITRC            RUASes like RUAS-X. Queries go up to the
          / |                  QSD or to a QSC which has a cached answer.
         /  |                  Responses go back down to the requester
     ITFH  ITRC                which is either a QSC or one of the two
                               "pull and be notified" types of caching
                               ITRs: ITRC (ITR with Cached mapping) and
                               ITFH (Ingress Tunnel Function in Host).
Figure 5: Three levels of Replicator drive ITRDs and QSDs.
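The step in Figure 5 where a Replicator builds a single stream from two (ideally identical) feeds can be sketched in Python as below. It assumes one merger per US-IMAB, that packets carry the sequence number described earlier, and that packets are forwarded in sequence order. The in-order forwarding is my assumption rather than something the text requires.

```python
class StreamMerger:
    """Combine duplicate feeds of one US-IMAB into a single ordered stream."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq  # next sequence number to forward
        self.held = {}             # out-of-order packets awaiting a gap fill

    def receive(self, seq: int, packet):
        """Accept a packet from either feed; return packets ready to forward."""
        if seq < self.next_seq or seq in self.held:
            return []              # already seen via the other feed
        self.held[seq] = packet
        ready = []
        while self.next_seq in self.held:
            ready.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A packet lost on one feed is simply supplied by the other. If both feeds lose the same packet, the merger holds later packets until the gap is filled by some other means.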
The figures quoted below are wild guesses for the example. Exactly
what the database sizes will be, the update rates, the data rates of
updates etc. depends on many factors. I want a system which can
ultimately scale to handling one or two billion IPv4 addresses, with
some of these having reasonably frequent updates due to mobility -
with those mobile end-users paying per update to help finance this
system.
The average Level 2 Replicator gets two full streams from two widely
separated Level 1 replicators. With 30 Level 1 Replicators each
sending 30 full streams, there are 900 streams, so there can be 450
Level 2 Replicators, each of which sends out 30 streams to Level 3
Replicators. The pattern continues with 15 * 450 = 6750 Level 3
Replicators, each of which has 30 output streams, with most ITRDs and
QSDs getting two streams, from widely separated Level 3 Replicators.
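The fan-out arithmetic can be checked with a few lines of Python. The function name and parameters are mine, and the defaults are the example's guessed figures.

```python
def replicator_levels(level1: int = 30, fanout: int = 30,
                      feeds: int = 2, levels: int = 3):
    """Return (replicators per level, number of ITRDs/QSDs served).

    Each device sends `fanout` streams and each recipient takes `feeds`
    streams, so every level multiplies the count by fanout / feeds.
    """
    counts = [level1]
    for _ in range(levels - 1):
        counts.append(counts[-1] * fanout // feeds)
    recipients = counts[-1] * fanout // feeds
    return counts, recipients

counts, recipients = replicator_levels()
print(counts, recipients)  # [30, 450, 6750] 101250
```

With these figures, three levels of Replicators can feed about 100,000 ITRDs and QSDs with two streams each. A fourth level would multiply that by 15 again.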
With this push to ITRDs and QSDs, pull via queries to QSDs, and the
QSDs notifying (carefully directed push) their child QSCs or ITRCs of
an update - those which recently (10 minutes?) made a query whose
answer covered one or more addresses affected by an update - the
entire global ITRD, ITRC and ITFH system should get updates within a
few seconds of the end-user making their change.
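The "carefully directed push" from a QSD to its recent queriers can be sketched in Python as follows. The 10 minute lifetime comes from the guess in the text. The class name, and the idea of recording the address range each answer covered, are assumptions for illustration.

```python
class QuerierRegistry:
    """Remember which child QSCs or ITRCs were recently sent which answers."""

    def __init__(self, lifetime: float = 600.0):
        self.lifetime = lifetime  # seconds an answer stays "recent"
        self.answers = []         # (time, requester, first, last)

    def record_answer(self, now: float, requester: str, first: int, last: int):
        """Note that `requester` was given mapping for first..last."""
        self.answers.append((now, requester, first, last))

    def to_notify(self, now: float, upd_first: int, upd_last: int):
        """Requesters whose recent answers overlap the updated range."""
        self.answers = [a for a in self.answers
                        if now - a[0] <= self.lifetime]
        return {requester for _, requester, first, last in self.answers
                if first <= upd_last and upd_first <= last}
```

When an update arrives from the Replicators, the QSD would send a Notification to each requester returned by `to_notify()`.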
There would be some agreed, centrally coordinated, system by which
the Level 1 Replicators and the ITRDs and QSDs could recognise which
RUAS systems were currently a part of the Ivip system, and the IP
address ranges of each of their IMABs.
That could be as simple as some organisation pointing to them with
DNS from a domain of theirs. It could also be in the form of an
agreement for the Replicators and ITRs to accept updates from a
generally expanding list of RUASes. This would involve central
coordination, but it doesn't involve centralised flows or storage of
data - just non-real-time configuration information which the
operators of Replicators, ITRDs and QSDs would follow.
For scaling purposes, some ITRDs may not cover the entire set of
IMABs. Then two or more ITRDs at the same site could split the load
among themselves.
In that case, an ITRD which covers a subset should be able to request
of its upstream (typically two, but maybe three or more) Replicators
only to send those US-IMAB streams for the IMABs it is advertising.
This means the packet format needs to be easy for the Replicators to
recognise and classify, which would be more complex if some or all of
the full stream packets contained data collected from separate
packets sent by separate RUAS systems to the Level 1 replicators.
In a RAM mailing list message I mentioned how a Replicator, ITRD or
QSD could request of an upstream Replicator another copy of some
packets it was missing. This sounds messy. Maybe the same result
could be achieved via making the packets available in a structured
manner, such as with file names with their sequence number in ASCII
as the name, available in the same web servers which are used to
supply IMAB-DBD dump files. These files only need to be kept for 10
or 20 minutes at most, assuming there is a dump every 10 minutes or
so.
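As a sketch of this idea, the Python below works out which sequence numbers are missing and builds a file name for each. The URL layout and the zero-padded 20 digit name (enough for any 64 bit sequence number) are purely hypothetical, as is the example host name.

```python
def missing_sequences(received, first: int, last: int):
    """Sequence numbers in first..last which were never received."""
    have = set(received)
    return [seq for seq in range(first, last + 1) if seq not in have]

def packet_url(base_url: str, imab_id: int, seq: int) -> str:
    """Hypothetical naming scheme: <base>/<IMAB id in hex>/<sequence>."""
    return f"{base_url}/{imab_id:08x}/{seq:020d}"

gaps = missing_sequences([10, 11, 13, 15], 10, 15)
urls = [packet_url("http://dumps.example.net/us-imab", 0x14000000, s)
        for s in gaps]
```

The device would then fetch each URL by ordinary HTTP, rather than asking an upstream Replicator to resend.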
9. Query Servers - QSD and QSC
I was going to write about the Query Servers here, but there is a
pretty complete description of them in the "Definition of Terms,
Concepts and Functions" section.
10. Ingress Tunnel (ITR) strategies
It should be quite attractive to make an ITRD from a mass-market
motherboard with one or two gigabit Ethernet interfaces on board, a
dual core CPU, Linux/BSD booting from a USB stick or Flash drive, and
suitable software. Depending on the number of addresses which are
Ivip-mapped, multiple gigabytes of RAM will be required, but this is
now very cheap. Perhaps a 1Gbps ITRD can be made for USD$1000 or so.
I discuss this in:
http://www1.ietf.org/mail-archive/web/ram/current/msg01628.html and
will probably write up a better version for a later version of this
I-D. Servers are not for everyone, but they are cheaper than
routers. Similar ordinary, low-cost, mass-market motherboards are
also ideal for making Query Servers and Replicators.
Here is an idea for combining numerous caching (pull) ITRCs with one
or a few full-database (push) ITRDs, to achieve generally optimal
paths whilst not delaying the first packets - as might otherwise be
the case with a caching ITRC.
However, if there is a QSD not far away, perhaps there will be no
significant delay. The problem is more that the ITRC may not want to
make a query and create an FIB entry for every unique IMAB destination
address which incoming packets might have. In that case, the job of
the backup ITRD(s) is to catch packets which the ITRCs stand back
from and, in a cricket metaphor, "let go to the (wicket) keeper".
I also discuss having a caching ITFH function as part of host
operating systems, to reduce the load on ITRs in the edge network or
beyond.
Let's say the operators of a provider or AS-end-user network are
totally hip to Ivip. Ideally, they would install ITRDs all over
their network. However this will be costly in terms of the traffic
flow of US-Complete updates to each ITRD - and in terms of the cost of
an ITRD, due to its need to decode the updates, store them, and write
them to its FIB (which must have a huge capacity) as they arrive.
Let's say the edge network is rather large, and the operators only
want to have a single full database ITRD. They could rely on one in
the core, but they want to ensure their users don't depend on
anything outside their network which might be loaded by traffic from
other not so swinging networks. The operators want to have a few
hundred ITRCs, all over their network. These query a system of QSD
and QSC query servers.
Here are some ways caching ITRCs can forward packets in the Ivip-
mapped address ranges (meaning they are addressed to one of the
IMABs) if they don't already have mapping information for them. If an
ITRC does have mapping for the address of the packet, its FIB will
encapsulate the packet and send it on its way to the ETR, wherever
this may be.
Let's say each caching ITRC has some mechanism for detecting that a
significant number of the packets it receives have a destination
address which is within one of the IMABs, but for which it has not yet
requested mapping information. Ideally this would be a counter per IP
address, but that would be extremely unwieldy or impossible, so
perhaps there is some simpler system such as a sampling scheme which
examines the destination addresses of 1% of incoming packets (when the
router's CPU has nothing else to do).
Then an algorithm searches for two packets in the last few minutes
which are for the one IP address within an IMAB, but for which the
ITRC has not yet asked for mapping information.
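One way to sketch this sampling scheme in Python is shown below. It assumes the caller only passes in packets whose destination is inside an IMAB and for which the ITRC has no mapping. The 3 minute window stands in for "the last few minutes", and the class name is hypothetical.

```python
import random

class UnmappedSampler:
    """Decide when an ITRC should query for an unmapped destination."""

    def __init__(self, rate: float = 0.01, window: float = 180.0, rng=None):
        self.rate = rate        # fraction of packets examined
        self.window = window    # seconds within which two hits must fall
        self.rng = rng or random.Random()
        self.seen = {}          # address -> time of last sampled packet
        self.requested = set()  # addresses already queried

    def observe(self, dest: int, now: float) -> bool:
        """Return True if a mapping query should now be sent for dest."""
        if dest in self.requested or self.rng.random() >= self.rate:
            return False
        previous = self.seen.get(dest)
        self.seen[dest] = now
        if previous is not None and now - previous <= self.window:
            self.requested.add(dest)
            del self.seen[dest]
            return True
        return False
```

A real implementation would also need to expire old entries in `seen` and to forget `requested` addresses when their cached mapping times out. The sketch omits both.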
Then the ITRC can ask for the mapping information for this address
and update its FIB whenever it arrives. If the address is part of a
larger subnet which has the same mapping, the response will say so,
and the FIB entry for that subnet will be in place for future
packets.
I think a distributed system for handling mapping requests such as
that of draft-meyer-lisp-cons could take several seconds to get a
reply back to this ITR. Even with the snappier system of QSDs and
QSCs we may not want to delay the packets for which the ITRC doesn't
yet have mapping information. Nor do we want to insist the ITRC
encapsulates every packet.
The task then is to organise the ITRs so that every packet gets
handled by an ITR, with the "unmatched" ones (those the ITRC - the
first ITR the packet is handled by - doesn't have mapping for) being
handled by some probably longer path than is ideal, while the bulk of
the traffic has no path delays, because the local ITRCs have already
got mapping data for the destination addresses of these "bulk"
packets.
The diagrams below use ITRCs, ITRDs, IRs (ordinary internal routers)
and BRs (border routers, which connect to the global BGP system).
I assume the ITRD and ITRC functions are performed by routers which
also do all the other things routers are expected to do in such a
location.
This plan (Figure 6) is all inside the provider or AS-end-user
network.
................
.
AS network .
.
ITRD . }
/ \ . / }
H--\ / \ . / }
\ / \ ./ }
H----ITRC0--------BR--- }
/ | \ / .\ }
H--/ | \ / . \ }
| \ / . \ }
| IR . }
| / \ . / }
H--\ | / \ . / }
\ | / \ ./ } BGP transit &
H----ITRC1--------BR--- } border routers
/ | \ / .\ } of the Internet
H--/ | \ / . \ }
| \ / . \ }
| IR . }
| / \ . / }
H--\ | / \ . / }
\ | / \ ./ }
H----ITRC2--------BR--- }
/ | \ / .\ }
H--/ | \ / . \ }
| \ / . \ }
| IR . }
| / \ . / }
H--\ | / \ . / }
\ | / \ ./ }
H----ITRC3--------BR--- }
/ .\ }
H--/ . \ }
. \ }
. }
..................
Figure 6: Internal ITRD for unmatched packets.
Plan A for Figure 6 is that every caching ITRC has its FIB set up so
that packets addressed to every IP address in the IMABs will either
be encapsulated and tunneled to the ETR near the host with the mapped
address, or will be encapsulated and tunneled to a single IP address
of the full database ITRD (at the top of the diagram).
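As a rough illustration of Plan A, the ITRC's FIB can be thought of
as a longest-prefix-match table in which every IMAB starts with a
catch-all entry pointing at the ITRD, and cached mapping adds more
specific entries pointing at ETRs. This is a sketch under my own
assumptions: the class and method names are hypothetical, and a real
FIB would use a trie or TCAM rather than a linear scan.

```python
import ipaddress


class PlanAFib:
    """Hypothetical Plan A FIB for a caching ITRC."""

    def __init__(self, imabs, itrd_address):
        # Catch-all entries: anything in an IMAB is tunneled to the ITRD
        # until more specific mapping has been cached.
        self.entries = [(ipaddress.ip_network(p), itrd_address) for p in imabs]

    def add_mapping(self, prefix, etr_address):
        """Install cached mapping for a micronet or single address."""
        self.entries.append((ipaddress.ip_network(prefix), etr_address))

    def lookup(self, dest):
        """Return ("tunnel", endpoint) or ("forward", None)."""
        addr = ipaddress.ip_address(dest)
        matches = [(net, ep) for net, ep in self.entries if addr in net]
        if not matches:
            return ("forward", None)        # not Ivip-mapped: ordinary routing
        _, endpoint = max(matches, key=lambda m: m[0].prefixlen)
        return ("tunnel", endpoint)
```

Before any mapping is cached, every packet to the IMAB is tunneled to
the ITRD. Once add_mapping() has been called for a subnet, packets to
that subnet go straight to the ETR, while the rest of the IMAB still
goes to the ITRD.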
This way, all packets addressed to an address within an IMAB, but
which are "unmatched" (and therefore not tunneled to an ETR) by the
first ITR they reach will probably go on a longer than optimal path
via ITRD, before being encapsulated and tunneled to the proper ETR.
ITRD needs to be expecting encapsulated packets arriving on one of
its IP addresses, but it would be easy for it to pop off the IP-in-IP
header and then put the packets through its FIB which will quickly
encapsulate each one addressed to an IMIP, tunneling it to its ETR.
Ideally, these initially "unmatched" packets will be a small
proportion of the total outgoing traffic addressed to IMIPs - and the
main traffic will flow through the internal routers along optimal
paths and then out via the nearest border router. In Figure 6, the
first internal routers are all ITRCs but ITRCs don't need to be the
closest router to the sending hosts - as I depict in Figure 7.
Plan B is for the routing system of the network in Figure 6 to
correctly handle each ITRC spitting out, without encapsulation, a
packet it knows is addressed to an IMIP but for which it doesn't yet
have mapping data - with the internal routing system forwarding that
packet to the one internal ITRD.
The internal routing system needs to ensure these "unmatched" packets
are always forwarded towards ITRD. It would be acceptable or
desirable if they pass through one or more further ITRCs on their way
to ITRD. If a packet did reach an ITRC which had mapping information
for it, then that would be fine, because it would be tunneled from
there.
Maybe it would work fine if each ITRC accepts packets addressed to
the IMABs and forwards those which were not mapped and encapsulated
by its FIB on a link which leads them closer to ITRD - which
advertises these IMABs. For this to work, it would be vital for none
of the border routers to announce paths for these IMABs - unless the
border router sent the packets towards the ITRD rather than out to
the Internet. The border routers would advertise (inject) routes in
the internal routing system for all the BGP announced prefixes other
than the Ivip IMABs and those prefixes for which this local network
is the destination.
So any packet sent from inside the network would eventually find its
way to an ITR.
Figure 6 could be applied to LISP. Figure 7 can't be, because LISP
doesn't involve EIDs being part of prefixes which are advertised in
BGP.
Alternatively, in Figure 7, the network has no full database ITRD and
would rely on the closest ITR(s) (presumably an ITRD) in the BGP
system to handle packets which its local ITRCs let pass without
encapsulation.
This would be cheaper for the network, and would not require a
constant inflow of update data for an ITRD. However, ITRCs work best
when there is a local QSD, so any substantial network probably needs
to bring in one or two full US-IMAB feeds to keep at least one QSD
fully updated.
................
.
AS network .
.
.
. /
H--\ . / /
\ ./ /
H----ITRC0--------BR-----TR
/ | \ / .\ \
H--/ | \ / . \ \
| \ / . \ \
| IR . TR-----ITRD---
| / \ . /
H--\ | / \ . /
\ | / \ ./ BGP transit &
H----ITRC1--------BR--- border routers
/ | \ / .\ of the Internet
H--/ | \ / . \
| \ / . \
| ITRC2 . TR----
| / \ . /
H--\ | / \ . / /
\ | / \ ./ /
H-----IR BR-----ITRD---
/ | \ / .\ /
H--/ | \ / . \ /
| \ / . \ /
| ITRC3 . TR
| / \ . /
H--\ | / \ . /
\ | / \ ./
H-----IR BR---
/ .\
H--/ . \
. \
.
..................
Figure 7: ITRCs but no ITRD in the network.
This shows how all paths taken by packets generated by hosts will
need to pass through at least one ITRC before exiting a border
router.
A feature of both Figure 6 and Figure 7 is that there are a large
number of ITRCs. The difference is the location of the ITRD to which
packets go when they are not mapped by the ITRCs, because the ITRC
hasn't yet made a query or received a response.
A smaller network which doesn't want to have either an ITRD
(expensive, because of its large RAM and massive FIB capabilities,
unless implemented in software on a server) or a QSD receiving the
full database updates, will need to rely on some external system to
answer the queries of its ITRCs (and any ITFHs in hosts).
Ideally, there would be a way that every ITRC (and ITFH) could
automatically discover:
1 - Two or more addresses (of QSDs or QSCs) to which mapping queries
should be sent.
2 - How to handle packets for which it has not yet cached any mapping
information. For instance, what IP address to tunnel them to so they
reach an ITRD, or some other way of handling them.
The ITRC would need to be able to discover how these change after
boot time too, so perhaps the information could come with a caching
time.
11. Egress Tunnel Router (ETR) strategies
[To do. I think most people understand the simple role of an ETR.
They can generally be placed at border routers, internal routers,
Provider Edge routers etc. However please see the Loose ends section
for thoughts on ETRs filtering packets. ]
12. Mobile-IP with TTRs
[To do. I will probably write some more here in the future. See the
RAM list discussions around 17 to 18 June for discussion of TTRs and
mobile-IP.]
13. IPv6 and longer term strategies
[To do.]
14. Loose ends
I plan to refine the following material and integrate it properly
with a future version of this I-D.
14.1. ETRs checking src & dest addresses
14.1.1. Short version
A short version of this section, which is based on a RAM-list message
of 2007 July 14, is:
The easiest and most robust way to enable a network to enforce on its
ETRs the rule that encapsulated packets from ITRs outside the network
must not contain inner packets with a source address (SA) which
matches one of the network's own prefixes is (along with some other
requirements) to break with convention and require ITRs to tunnel the
packet with the outer SA = the inner SA. That is, the packet sent by
the ITR to the ETR has the same source address as the sending host.
This means that no-one or nothing in the destination network (or
after the ITR), including the ETR itself, can find which ITR tunneled
the packet - unless the encapsulation method carries extra data which
includes the ITR's address, which is not the case with Ivip or with
current LISP plans.
The reason is that it is probably very difficult or perhaps
impossible to make all ETRs inside the network filter the
decapsulated packets to drop those which arrived from an external ITR
and have the inner SA matching a local prefix (this would be a packet
with a spoofed source address) - so it is better to achieve the same
desired protection by:
1 - Requiring all ITRs to set the outer SA = inner SA.
2 - Let the border routers continue to drop packets arriving from
outside the network with SA matching any one of the network's local
prefixes.
3 - Require all ETRs to drop all decapsulated packets with an SA
(inner SA) which is not identical to the SA of the outer header
(outer SA).
Having the outer SA = inner SA also has the benefit that traceroute
functions normally. The current LISP 1 and 1.5 definition has the
outer SA = ITR's address, which means the sending host gets no
traceroute results for any router between the ITR and the ETR - and
perhaps not from the ETR either.
I also explore the idea for LISP or Ivip of there being a service so
the current and recent history of the database (multiple databases
perhaps, as is the case with Ivip) can be queried to see when any
mapping - of any EID (LISP) or DID (Ivip) address - has been to a
particular (RLOC) IP address. This would be vital for debugging
problems with end-users setting the mapping incorrectly, for
determining why streams of encapsulated packets arrived in some
unwelcome fashion etc. It would be impossible to prevent this sort
of analysis of the mapping data.
14.1.2. ITR tunneled packet with source address of sending host
I tend to agree with what Iljitsch van Beijnum wrote:
"I don't think it's a good idea to have node Y send packets where the
source address is X, both because this claims that the sender is
different from his/her actual identity and because return traffic,
such as ICMP messages, will then end up at (arguably) the wrong node.
"Knowing the address of the encapsulating TR is also useful if the
decapsulating TR ever wants to get in touch with it."
However, I think there are some strong arguments for making the outer
SA = the inner SA, which is contrary to the conventional sensible
notion that any packet created by node Y should have its SA = Node
Y's address.
The general arrangement is:
HA-----ITR~~~~~~~~~~~~~~~~~~ETR------HB
Figure 8: Basic ITR-ETR tunnel.
where HB has the LISP/Ivip-mapped address (maybe HA has such an
address too, but that doesn't matter).
The current LISP-01 I-D only describes LISP 1 and 1.5, so there's no
way of knowing whether with LISP 3.x the outer SA (Source Address of
the UDP packet which encapsulates data packets when the ITR tunnels a
data packet to an ETR) will be the original packet's SA (HA's IP
address) or the ITR's address. In LISP 1 and 1.5 the outer SA is
definitely the ITR's address (see Definition of terms: ITR).
In LISP 3.x I am not sure to what degree the ITR sends messages to
the ETR, or whether the ETR sends anything back to the ITR.
Dino's recent message (RAM list msg01703) indicates that the 3.x
approaches using CONS or NERD do not involve the ETR sending a Map-
Reply message to the ITR and that perhaps with APT
(http://tools.ietf.org/html/draft-jen-apt-00) there would be such
messages.
So I am not sure whether some or all 3.x variants of LISP involve no
messages at all from the ETR to the ITR. If there is no requirement
for such messages or information exchange, then maybe the ETR doesn't
need to get any information from the ITR in data packets, or
presumably by any other means.
In that case, the ETR wouldn't need to know the address of the one or
more ITRs which are tunneling packets to it. In that case, it may be
possible for LISP to adopt the same "outer SA = inner SA" approach I
currently favor for Ivip. As far as I can tell at present, this
would have the same benefits for LISP as for Ivip - much greater ease
of a network protecting itself from a particular form of attack which
I will call "internal source address spoofing", and the retention of
the sending host's ability to fully traceroute the path taken by the
packets it sends.
If LISP or Ivip uses the ITR's address for the outer SA then HA will
find that traceroute does not produce any results for any routers
between the ITR and ETR. Whether or not it would produce a response
from the ETR depends on whether the ETR treats the decapsulated
packet just like any newly arrived packet or not.
I regard this as a substantial argument against using the ITR's
address for the outer SA.
I am assuming in all this that both LISP and Ivip ITRs and ETRs
follow the principle that the ITR copies the TTL (Time to Live) from
the packet it is encapsulating to the header of the new packet which
contains it (the IP-UDP IP header for LISP and the IP-in-IP IP header
for Ivip). Similarly, the ETR takes the TTL from the outer IP header
and copies it to the TTL field of the IP header of the decapsulated
packet. (These operations are specified for Ivip, but are not
actually specified in LISP, except for similar concepts with
recursive or re-encapsulating tunneling.)
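The TTL handling assumed above can be sketched as follows. This is a
toy model, not a wire-format implementation: the IpPacket class and
the function names are hypothetical, and only the fields relevant to
this argument are modelled.

```python
from dataclasses import dataclass


@dataclass
class IpPacket:
    src: str
    dst: str
    ttl: int
    payload: object = None


def itr_encapsulate(inner, outer_src, etr_address):
    """ITR: copy the inner TTL into the new outer header."""
    return IpPacket(src=outer_src, dst=etr_address, ttl=inner.ttl, payload=inner)


def forward_hop(packet):
    """Ordinary router: decrement TTL, or drop (return None) when it expires."""
    if packet.ttl <= 1:
        return None                 # a real router would send ICMP Time Exceeded
    return IpPacket(packet.src, packet.dst, packet.ttl - 1, packet.payload)


def etr_decapsulate(outer):
    """ETR: copy the outer TTL back into the decapsulated packet."""
    inner = outer.payload
    return IpPacket(inner.src, inner.dst, outer.ttl, inner.payload)
```

With this arrangement the hops inside the tunnel are counted against
the original packet's TTL, so a traceroute probe can expire at a
router between the ITR and the ETR. If the outer SA is the sending
host's address, the resulting ICMP message goes back to that host.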
My previous message contained some detailed thoughts on why requiring
ITRs to make the outer SA = inner SA makes it much easier for the
network in which the ETR is located to ensure that packets arriving
from outside the network and being decapsulated by its ETRs do not
produce decapsulated packets with SAs from local prefixes.
Here I develop this line of argument further - in support for the
initially unpalatable notion of having the ITR tunnel the packet with
outer SA = the packet's original SA (inner SA). I then examine what
benefits and difficulties would result from following convention and
having the ITR use one of its own addresses as the outer SA.
Ivip has no communication from ITR to ETR or from ETR to ITR. There
is no header, other than the outer header of IP-in-IP encapsulation.
Ivip could be redefined to use the ITR's address as the outer SA, but
at present I think it is best not to. Ivip could be redefined to use
UDP encapsulation as is currently defined for LISP 1 and 1.5 - then
various other items such as the ITR's address could be included in
the encapsulated packet - but I am trying to keep Ivip simple. If
the problem of not being able to find an errant ITR was considered
great enough, then Ivip could use UDP encapsulation and include the
ITR's IP address and maybe other items of information in the body of
the UDP packet, before the raw data packet itself.
In this discussion, when I refer to "network" I mean any Autonomous
System network (provider or for an end-user) in which ETRs are
deployed. I don't refer in this discussion to the edge networks of a
single-homed or multihomed end-user which relies upon LISP-mapped or
Ivip-mapped addresses. Those edge networks have different
requirements for preventing "internal source address spoofing", which
I discuss in Note 1 at the end.
Here is a description of the particular security problem I am
concerned about:
............. ....................
N1 . . N2
. .
. .
H1---ITR1~~~BR1~~~~~~~~TR~~~~~~BR2~~~~~ETR1---MN1
. / . \ \
............. / . \---H2 \--MN2
/ . \
/ . \-H3
/ .
AT1~~~~~~~~~~~~~~~~~~~TR ....................
Figure 9: Diagram for explaining SA filtering problem.
N1 has a host H1 which is sending a packet to Mapped Node MN1.
Whether MN1 is a host with one or multiple IP addresses, or a link to
a router at a multihomed end-user's site, doesn't matter. Nor does
it matter whether H1's address is Ivip-mapped or not.
Also, it probably doesn't matter whether the ETR gets the
decapsulated packet to MN1 via a direct link or via relying on N2's
internal routing system to forward the packets to MN1. This problem
only concerns packets flowing left to right in this diagram. How MN1
gets packets to H1 is a separate matter and is not affected by this
filtering and security problem.
I assume N2 has BR2 set up to drop any packet which arrives from the
outside world where the SA matches any of the one or more prefixes N2
advertises to its BGP peers. If N2 doesn't bother to do this, there
is no point in fussing over how the ETRs should filter packets to
achieve the same purpose.
I assume (for reasons discussed in my first message in this thread
and in Note 2 below) that all ETRs will drop any packet which has a
DA for any address other than the set of hosts/routers which it knows
it can deliver decapsulated packets to, with LISP/Ivip-mapped
addresses. This means that if the ETR decapsulates a packet and
finds its DA is for H2, then it drops the packet. This also means
that if it decapsulates a packet and finds its DA is for some other
LISP/Ivip-mapped address which it can't deliver packets to (either
directly or via support from the local routing system) then this
packet will be dropped too.
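The destination check assumed in the previous paragraph amounts to
something like the following. The function name and the idea of a
per-ETR list of deliverable prefixes are my assumptions for the sake
of illustration.

```python
import ipaddress


def etr_accepts_destination(inner_da, deliverable_prefixes):
    """Return True only if this ETR knows how to deliver to inner_da.

    deliverable_prefixes is the (hypothetical) configured set of
    LISP/Ivip-mapped addresses this particular ETR serves.
    """
    addr = ipaddress.ip_address(inner_da)
    return any(addr in ipaddress.ip_network(p) for p in deliverable_prefixes)
```

For example, an ETR configured only for MN1's addresses would return
False for a decapsulated packet addressed to H2, and so would drop
it.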
The purpose of this "internal source address spoofing" protection is
to stop MN1 receiving a packet with SA matching any of N2's BGP-
advertised prefixes, for instance the address of H2. Without this
protection, an attacker AT1 can easily create an encapsulated packet,
with outer DA = ETR1, inner DA = MN1 and inner SA = H2.
It is not the purpose of "internal source address spoofing"
protection to stop ETR1 from decapsulating and forwarding an inner
packet to MN1 when its SA is any LISP/Ivip-mapped address, including
some address of another host/node MN2, which happens to be "within"
N2 or potentially (via a multihomed link which may not be working)
connected to N2.
An attacker can already do this by sending a packet with a spoofed SA
to any ITR, or by generating its own encapsulated packet.
The only purpose is to prevent attackers spoofing source addresses of
the non-LISP/Ivip-mapped addresses within N2.
Attackers are assumed to be outside N2. (It's all over if an
attacker is inside N2.)
I start with some further assumptions:
A1 - Whatever new architecture is adopted - LISP, Ivip etc. - the new
architecture must not force a lower level of security than currently
exists (RRG Design Goals 3.9) and should not make it significantly
more difficult, costly or error-prone to ensure the same levels of
security are maintained.
A2 - Therefore, for networks which protect against "internal source
address spoofing" the new architecture must make it easy to maintain
this protection for packets being decapsulated by ETRs.
A3 - That the ETR needs to decapsulate packets which were
encapsulated by ITRs in this same network. I argue in Note 2 at the
end of this message why this is a reasonable requirement.
A4 - We can't expect the border router to perform deep packet
inspection on every incoming packet - for instance to find any packet
which looks like it might be intended to be decapsulated by an ETR,
and to then decapsulate it, and filter it according to the SA of its
inner packet.
So we can assume that BR2 drops any packet arriving from outside if
the SA matches, for instance, H3's (ordinary, BGP reachable) address.
14.1.2.1. Preventing SA spoofing when outer SA = inner SA
In the above framework, it becomes very easy to protect against
"internal source address spoofing" if all ITRs make their outer SA
equal to the source address of the sending host (the inner SA).
All that needs to be added is that ETRs drop any packet whose inner
SA does not match the outer SA.
This means that there are two classes of spoofed SA packets being
filtered:
1 - Those sent as ordinary packets, from outside. This is handled by
the border router's existing filters.
2 - Those in inner packets with an outer header with DA = an ETR.
This is performed in two stages - by the border router and then by
the ETR dropping packets where inner and outer SA do not match.
This seems to be a bullet-proof arrangement, and works fine with
encapsulated packets created by ITRs in the network. These are
assumed not to arise from attackers, since attackers are defined to
be outside the network.
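The two-stage filtering described in this section can be sketched as
below. This is a minimal model under my own assumptions: the function
names are hypothetical, and only the source addresses are modelled.

```python
import ipaddress


def matches_any(address, prefixes):
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


def border_router_accepts(outer_sa, local_prefixes):
    """Stage 1 (existing filter): drop packets arriving from outside
    whose SA claims to be one of the network's own prefixes."""
    return not matches_any(outer_sa, local_prefixes)


def etr_accepts(outer_sa, inner_sa):
    """Stage 2 (new rule): drop unless inner SA equals outer SA."""
    return ipaddress.ip_address(outer_sa) == ipaddress.ip_address(inner_sa)


def delivered_from_outside(outer_sa, inner_sa, local_prefixes):
    """True if an encapsulated packet from outside survives both stages."""
    return (border_router_accepts(outer_sa, local_prefixes)
            and etr_accepts(outer_sa, inner_sa))
```

An attacker who puts a local address in the inner SA must either use
a different outer SA (dropped by the ETR) or the same local address
as the outer SA (dropped by the border router). Neither stage needs
the ETR to know the network's list of prefixes.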
14.1.2.2. Preventing SA spoofing when outer SA = ITR's address
In this arrangement, the ETR can't drop packets with inner SA !=
outer SA. So there is no way the ETR can use this simple technique
to extend the border router's filtering to the inner SA.
The only alternative is for the ETR to drop the inner packet if both
these conditions are met:
1 - If the inner SA matches any of the network's BGP advertised
prefixes.
and 2 - The outer SA does not match any of these prefixes.
These two conditions would be met if the packet arrived in
encapsulated form from outside the network while pretending to be
sent from inside the network.
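For comparison, the test the ETR would need in this arrangement looks
something like the sketch below. The function names are hypothetical.
Note that, unlike the simple equality test, the ETR here must hold
the network's full list of advertised prefixes and match both
addresses against it.

```python
import ipaddress


def matches_any(address, prefixes):
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


def etr_should_drop(outer_sa, inner_sa, local_prefixes):
    """Drop if the inner packet claims a local source address while the
    encapsulated packet itself arrived from a non-local ITR address."""
    inner_is_local = matches_any(inner_sa, local_prefixes)
    outer_is_local = matches_any(outer_sa, local_prefixes)
    return inner_is_local and not outer_is_local
```

A packet tunneled by an ITR inside the network (both addresses local)
is accepted, as is a packet from an outside host via an outside ITR
(neither address local). Only the spoofing combination is dropped.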
Assuming the ETR needs to accept encapsulated packets from ITRs
inside the network (Note 2 below) then both these tests are required.
However, this is likely to be prohibitively difficult for an ETR to
perform.
Firstly, the FIB hardware of a proper router isn't necessarily able
to perform these gymnastics. (The decapsulation and "drop if inner
SA != outer SA" test is still tricky, but does not involve any
knowledge of the potentially numerous local prefixes.)
Secondly, the list of prefixes this network advertises could be very
large indeed. This would make it perhaps impossible for FIBs to cope
with such a list.
Thirdly, we want to be able to do ETR functions in servers, not just
hardware-FIB "routers", including in the destination host (if it has
a BGP-reachable care-of address). There is no way with ordinary
software of applying a huge list of rules to the inner SA to decide
if the packet should be dropped, and then applying the same set of
rules to the outer SA as well.
Finally, even if routers and software ETRs could do this, there are
serious problems with the network's control system finding all these
ETR functions and ensuring they comply with these rules.
With my idea towards the end of Note 2 below - that ETRs should know
how to deliver packets directly to the destination host, rather than
use the internal routing system (which is compatible with LISP-01
page 11 point 7: "attached destination host") - there is no need for
a general system to control all ETRs at once (as would be required if
every ETR was to decapsulate packets for any host in the network with
a LISP/Ivip-mapped address, relying on the local routing system to
get the packets there).
14.1.2.3. What is lost by making outer SA == inner SA?
By defying convention and having ITRs send tunneled packets without
their own IP address in the SA of the outer header, we lose certain
things:
1 - We can't find directly which ITR tunneled the packet, once it
left the ITR.
2 - Therefore, we can't get a message to that ITR, or to whoever runs
it.
It is an uncomfortable thing to propose such an arrangement, but here
I explore exactly what would be lost. As with all this stuff, I
could be mistaken and be missing many important things - so please
let me know what I have missed.
Ivip requires no communication from an ETR to an ITR, or vice-versa,
so nothing is lost with this arrangement.
Some variants of LISP do require this communication, so if this
"outer SA == inner SA" was adopted for LISP (for instance because it
seems to be the only practical or reasonable way of allowing a
destination network to maintain current security limits) then the
LISP header would need to contain the ITR's address.
I think the remaining reasons for wanting to know the ITR's address
are to do with coping with unwelcome packets.
Here is a possibly incomplete list of the scenarios which could lead
to the perception of unwelcome packets arriving from an ITR.
In the case of packets from an attacker, I will assume that the outer
SA (Ivip) or any "ITR address" in a LISP header contains a bogus
address, which may be part of the attack by encouraging victim V1
(whoever's host gets the unwelcome packets) to send messages to
victim V2 who probably runs an ITR, or to V3 which is whoever runs
the LISP/Ivip database system, with possible negative consequences
for further victims who might use LISP/Ivip-mapped addresses:
a - The packets are sent to an ETR but have inner DAs for LISP/
Ivip-mapped addresses which the ETR is not configured to deliver to a
destination host. This could include an address which is a BGP
reachable address, or some other LISP/Ivip-mapped address other than
the small subset of those addresses for which the current ETR can
deliver packets.
b - The packets are sent to an ETR and have inner DAs which the ETR
is configured to deliver packets for - however the flow of packets is
excessive in volume, is regarded by the destination host as
irrelevant or unwelcome etc.
c - The packets are sent to an address which is not an ETR - it may
be of a host, an ordinary router or to some address which has no
destination node.
In all these cases, if the encapsulated packets come direct from an
attacker (that is, they have not been generated by a proper ITR) then
there is no point in looking at the outer SA. That will probably not
lead to any clues about the location of the attacker. Any attempt to
complain about an attack from that outer SA address will probably
cause V1 to drag other victims into the attacker's ploy.
If the packets do come from one or more genuine ITRs, then I think
one of the following must be true:
e - The one or more ITRs are functioning properly, with fully updated
databases (an ITRD or an ITRC or ITFH with access to properly updated
query servers, and getting notifications from those query servers in
the event of a database change for some mapping information they
cached).
f - The one or more ITRs are not functioning correctly. Maybe their
FIB is broken and doesn't reflect their RIB (copy of database or
cached mapping information). Maybe they are not properly updated.
For this reason, ITRDs which for some reason are not getting updates
or which have detected some corruption should probably stop
forwarding packets. This is a tricky business I will write more
about in the future.
I also want to write about methods of detecting errant ITRs from the
sender's end, perhaps by some method such as sending commands to
ITRs, or making an ITR respond to a traceroute in some way which
indicates the address it is going to tunnel the packet to.
In the case of 'e' above, there is no point in knowing which one or
more ITR is tunneling the packets, because there is nothing wrong
with these or any other ITR - and similar problems can be expected
with packets being tunneled by any of the world's hundreds of
thousands of ITRs (or millions, with ITFHs widely deployed). The
problem is either with the contents of the mapping database(s) or
with the behavior of sending hosts. In Note 3 I discuss how to
resolve the first problem. The second problem has nothing to do with
the LISP or Ivip system, although perhaps changing the mapping to
"drop" or pointing it to some other ETR could resolve some problems.
In the case of 'f', it would be good to find the ITR which is sending
the packets which are considered unfriendly. Ivip as I am currently
proposing it would prevent anyone from finding out which one or more
broken or out-of-synch ITRs are causing the trouble.
This problem is similar to having some router out in the Net
malfunction, forwarding packets to some place they don't belong.
Generally, such packets wouldn't get to an ETR or a host, because not
even a malfunction in a local router could do this (unless the victim
host was on a single link, rather than a LAN, from the errant
router). The broken ITR is not generating the packet. Rather, it is
taking a packet which should be tunneled to some ETR which would be
happy to receive it, and tunneling it to some other address where
the packet is not welcome.
If the outer SA was the ITR's address, then victim V1 could
potentially find who runs this ITR and complain - but it could be
tricky finding out who to complain to, unless there was some global
register of ITRs, which won't include the hundreds of millions of
ITFHs on host computers if Ivip is widely implemented in the future.
There is something lost by not being able to identify the genuine ITR
which mistakenly tunneled the packets. However, if the ITR was
functioning properly, there is no point in finding out its address or
who owns it - the problem is not with the ITR but can only be
resolved by either changing the mapping information or by altering
the behaviour of sending hosts - which is no different from the
situation with unwelcome packets today without LISP or Ivip.
Even if it was possible to identify the genuine ITR which tunneled
the unwelcome packets, why should the operator of that ITR shut it
down just because V1 complains about it? Who is to say that the
complaint is not the work of an attacker?
There does need to be a way for V1 to find out the recent history of
its IP address being involved in the mapping database. I write more
about this in Note 3 below.
14.1.2.4. Note 1 - Edge networks and internal source address spoofing
A single-homed or multihomed "edge network" which uses purely LISP-
mapped or Ivip-mapped addresses has a very different set of
conditions in which it might protect against "internal source address
spoofing".
Firstly, it has no BGP connections to the Internet. It only receives
incoming packets via one or more links to provider networks. Here, I
assume that it relies upon ETRs in those provider networks (see
Figure 3 in the Ivip I-D).
These ETRs feed the inner packets (that is the packets with DA = some
LISP/Ivip-mapped address, where this is one of the edge network's
address range, such as the address of IH9 in Figure 3) directly to
the edge-network's router, Ethernet switch or whatever.
However, if the edge network gets from each of its one or more
providers one or a few of the provider's PA addresses, on which it
runs its own ETRs, and then routes only the inner packets produced by
these ETRs to the rest of the edge network, then similar principles
apply.
The edge network does not contain any ETRs, because ETRs do not
reside on addresses which are LISP/Ivip-mapped.
Therefore, the edge network doesn't need to worry about packets
emerging from ETRs within its own network.
I can think of two scenarios which require different approaches to
preventing an attacker (implicitly any host outside this edge
network) from sending packets with spoofed local addresses - meaning
addresses within the range of LISP/Ivip-mapped addresses of this edge
network.
14.1.2.4.1. No ITRs in edge network
The edge network has no ITRs, including no ITFHs. This might be
tricky to establish if ITFHs became a common part of operating
systems. However, an ITFH will always send queries to some Query
Server, so if the network's router had a way of preventing those
queries from succeeding, this would prevent any ITFH function from
working.
In this case, the edge network relies on its internal routing system
to forward packets between its own hosts.
Raw packets with a DA matching a LISP or Ivip-mapped address range
will be forwarded directly to the correct host if the DA is one of
the edge network's addresses, or otherwise to the router and out one
of the links to the provider network. There they will soon be
encapsulated by one of a series of ITRCs, an ITRD etc.
Protection against packets from the outside with spoofed local SAs
must be done by the edge network's router - it must drop any incoming
packet with a SA which matches one of the edge network's LISP/
Ivip-mapped addresses.
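A minimal sketch of the router's check described above, in Python.
The prefix shown is a documentation range standing in for the edge
network's LISP/Ivip-mapped addresses, and the function name is
hypothetical; a real router would do this in its forwarding plane.

```python
import ipaddress

# Assumption: example prefix only. A real edge network would list its
# own LISP/Ivip-mapped address ranges here.
LOCAL_MAPPED_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]


def should_drop_incoming(source_address: str) -> bool:
    """Return True if a packet arriving on a provider link claims a local SA.

    Such a packet cannot be genuine in the "no ITRs in edge network"
    scenario, because local hosts reach each other via internal routing.
    """
    sa = ipaddress.ip_address(source_address)
    return any(sa in prefix for prefix in LOCAL_MAPPED_PREFIXES)
```

The check applies only to packets arriving from the provider links,
not to packets the edge network's own hosts send to each other.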
14.1.2.4.2. Edge network contains ITRs
The edge network has its own ITRD, ITRC and/or ITFH functions - and
these may encapsulate packets which were addressed to one of the edge
network's LISP/Ivip-mapped addresses.
This is probably a bad idea, since it makes it much more difficult,
or impossible, to protect against "internal source address spoofing".
There are various reasons, which I won't explore here, why this is a
messy arrangement to be avoided. For instance, how can the edge
network's router know whether a packet decapsulated by an ETR
originated in its own network or was decapsulated from a packet sent
by an attacker?
14.1.2.5. Note 2 - ETRs must handle packets from ITRs in the same
network
(See assumption A3 above.)
In a large network, for scaling purposes, there need to be lots of
ITRs. We can't place all the ITR functionality in the border
routers.
Also, at least with Ivip, it would be advantageous to allow and
encourage caching ITR functions in sending hosts (both those on
ordinary BGP reachable IP addresses and those with Ivip-mapped
addresses). This is an ITFH function. It needs to send queries to
QSD or QSC query servers, and it doesn't necessarily have to tunnel
every packet - because those it doesn't tunnel will (in a well
designed network) be forwarded (or perhaps explicitly tunneled) to
one or more ITRCs or an ITRD which can encapsulate them. The ITFH
greatly reduces the load on ITRCs and ITRDs, without any cost to
anyone, and the path taken by the packets to the ETR is entirely
optimal, since they never need to go via an ITRC or ITRD.
(ITRCs and ITFHs may not tunnel all packets, so they need a backup of
some other ITRCs or ideally an ITRD to handle these.)
I assume ITFHs can't easily be detected (except by detecting or
blocking their requests to query servers - maybe the autodiscovery
system returns a message "ITFHs not supported here", which may be as
simple as an empty list of Query Servers) and that they can't be
highly managed, at least in terms of rapid changes to their behavior.
For instance there could be thousands or hundreds of thousands of
ITFHs in a large network, in many hosts or in DSL modem NAT functions
(I call this "function in host" because the "router" is just a CPU
and software, with no FIB hardware etc.). Nonetheless, I expect
there to be autodiscovery arrangements for an ITRC or ITFH to find
where to send queries to and perhaps where to tunnel packets it
doesn't encapsulate for some reason, but which should be - for
instance because it doesn't have mapping information for that
packet's DA.
It is probably much more robust and easier to plaster ITRs all over
the network and to encourage the adoption of ITFHs in many hosts and
DSL etc. modem-routers - than to try to ban ITFHs and centralise all
ITRs in a few places where their activities can be carefully
controlled. If all the ITRs in a network could be carefully
controlled, then it would be possible to ensure that the local
routing system took precedence over encapsulation, so that if a host
H3 wanted to send a packet to MN1, then the packet would be sent via
the local routing system to MN1, and not encapsulated and tunneled to
ETR1. However this is unlikely to be practical, because it could be
difficult or impossible to immediately change the internal routing
system and all local ITR behaviour to reflect the fact that MN1's one
or more LISP/Ivip-mapped addresses have just become reachable inside
this network, or have just become unreachable.
If the local routing system forwards packets to MN1, this must be
stopped the moment MN1 is no longer reachable - such as when MN1 has
moved and changed its mapping to some other ETR or another network's
ETR.
Even if the local routing system forwards packets to MN1, it would
still be best if any packet which is encapsulated by an ITR anywhere
is delivered properly.
Generally, I think there should be no exception to the rule "if the
raw packet finds its way to an ITR which encapsulates it according to
the current state of the database(s), then it definitely will be
delivered to the currently selected ETR". A local routing system
which attempts to get the packet to the destination host might be
acceptable, provided it changed its behavior very rapidly to reflect
the contents of the mapping database.
Overall, I think it will be best if:
1 - One or more ETRs have some explicit tunnel system for getting
decapsulated packets to the destination host, rather than relying on
the local routing system.
2 - Hosts inside the same network should rely on ITRs (and perhaps
their own ITFH function) to deliver the packets, in accordance with
the current state of the database(s) - and not have this interfered
with by the local routing system trying to take packets directly to
the destination host. (See the discussion below in "TTRs and
Mobility" about an equivalent situation where a TTR has a route to a
mobile node, and might forward a packet there directly. It would be
best if it allowed an ITR, and therefore the mapping database, to
decide where the packet should go to, since the mobile node may want
packets sent over some other link.)
3 - We don't want to concentrate all the ITR functions in border
routers. To maintain optimal path lengths within the network, we
want the packets to encounter an ITR ASAP, including in the sending
host's own ITFH function.
4 - Therefore, we need ITRs all over the network, and these ITRs will
be encapsulating packets to be sent to ETRs in the network, which
will directly get the decapsulated packets to the proper destination.
All this only applies to Autonomous System networks - not the end-
user networks which consist only of LISP/Ivip-mapped addresses.
14.1.2.6. Note 3 - A search-mapping service to debug LISP/Ivip mapping
This proposal would work with Ivip as currently defined - and perhaps
with some forms of LISP. I think there really needs to be some kind
of service like this, for LISP or Ivip.
There are some security and privacy implications of such a service
being open to anyone to query about any IP address, but there is
absolutely no way of preventing such a service being created, so
there is no point in trying to prohibit misuse of such a service.
This service could be implemented as part of the main Ivip system,
but there is no need for this to be done, since one or more separate
companies, organisations or individuals could set up their own system
to do the same job.
The service has a server which gets a full feed of the update stream
- "US-Complete" in the current Ivip I-D. It can also download the
IMAB-DBD periodic dumps of the complete set of mapping databases. It
is absolutely required that this information be freely available to
anyone, so anyone can set up their own ITRD and QSD-QSC-ITRC-ITFH
systems. No practical measures could prevent anyone from gaining
access to this information.
The service analyses the data and stores a copy of its analysis in
some database, covering the last few weeks of activity. Then, the
service is able to definitively answer queries such as:
1 - What is the history of RLOC (LISP) or TELOC (Ivip) mapping of
this particular EID (LISP) or DID (Ivip) address?
2 - What is the history of the RLOC/TELOC mapping of any EID/DID over
the past minutes, weeks etc. which involved the ITRs being told to
tunnel packets to this particular IP address?
This means that if the unwelcome packets resulted from something in
the mapping system, now or in the past, V1 could find out for sure
which one or more EID/DID addresses were involved. Then, it could
quickly establish which RUAS (Ivip Root Update Authorisation System)
the update was made through. If that RUAS considered the complaint
to be genuine, it could try to resolve it with the end-user who it
authorises (directly or via one or more branch and leaf UASes) to
control the mapping of this IP address.
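The two queries listed above could be served from a simple pair of
indexes over the update stream. The following Python sketch is only
illustrative: the class and field names are hypothetical, the store
is in memory, and it ignores expiry of entries older than a few
weeks.

```python
from collections import defaultdict
from typing import NamedTuple


class MappingChange(NamedTuple):
    timestamp: float      # when the update appeared in the stream
    mapped_address: str   # EID (LISP) or DID (Ivip)
    locator: str          # RLOC (LISP) or TELOC (Ivip)


class MappingHistory:
    """Hypothetical store fed from the full update stream."""

    def __init__(self):
        self._by_mapped = defaultdict(list)
        self._by_locator = defaultdict(list)

    def record(self, change: MappingChange) -> None:
        self._by_mapped[change.mapped_address].append(change)
        self._by_locator[change.locator].append(change)

    def history_of(self, mapped_address: str):
        """Query 1: every mapping change for one EID/DID, oldest first."""
        return sorted(self._by_mapped.get(mapped_address, []))

    def mapped_to(self, locator: str, since: float = 0.0):
        """Query 2: every change that told ITRs to tunnel to this address."""
        changes = self._by_locator.get(locator, [])
        return sorted(c for c in changes if c.timestamp >= since)
```

A recipient of unwelcome tunneled packets would call mapped_to() with
its own address to learn which EID/DID addresses were involved.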
Spies, dodgy detectives and the security authorities will be watching
the changes in the database, so anyone with something to hide (which
includes ordinary folk who are targeted by nosey authorities as well
as those with malicious intent) needs to consider how changes to
their mapping might leak information to others.
BTW, future generations will want to know why LISP, Ivip or whatever
was foisted on the Internet - because it is certain to add to the
difficulty of understanding and managing the system, with its own set
of gotchas and security problems. They will hopefully realise that
BGP couldn't be asked to do much more and that IPv6 wasn't ready for
mass adoption, and doesn't solve the major problems anyway.
14.1.2.7. Note 4 - Finding errant ITRs
The problem of finding an errant ITR is tricky or impossible for the
recipient of the tunneled packets if neither the outer packet header
nor any other information identifies the ITR. This can be resolved,
while keeping the outer SA = inner SA to help with filtering, by
adopting UDP encapsulation, with the first part of the UDP payload
being a field which carries the ITR's address, followed by the raw
data packet.
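As a rough illustration of this payload layout, the Python sketch
below prefixes the raw packet with the ITR's address. The field
format is an assumption made for the example (a bare 4-byte IPv4
address with no version or flags); this document does not define the
actual encapsulation header.

```python
import ipaddress

# Assumption: the ITR identification field is just a 4-byte IPv4 address.
ITR_FIELD_LEN = 4


def build_payload(itr_address: str, raw_packet: bytes) -> bytes:
    """UDP payload = ITR address field followed by the raw data packet."""
    return ipaddress.IPv4Address(itr_address).packed + raw_packet


def parse_payload(payload: bytes):
    """Split a UDP payload into (ITR address, raw packet)."""
    if len(payload) < ITR_FIELD_LEN:
        raise ValueError("payload too short to hold the ITR address field")
    itr = ipaddress.IPv4Address(payload[:ITR_FIELD_LEN])
    return str(itr), payload[ITR_FIELD_LEN:]
```

An ETR, or the recipient of unwelcome packets, could log the parsed
ITR address alongside the inner packet to trace an errant ITR.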
The sender of the original packets is in a better position to find
which ITR is handling them by doing a traceroute. (Unless the ITR is
an ITFH in the sending host.) A full traceroute should also show the
ETR the packets go to and the final destination host. A number of
things can make this debugging messier.
Firstly, the routing of packets to ITRCs and ITRDs is not necessarily
stable.
Secondly, the packets may at first have passed through one or more
ITRCs (and perhaps the host's own ITFH) before being encapsulated by
some ITRC or ITRD which is errant - but now the packets are handled
by the ITFH or a nearby ITRC and so never go through the errant ITR.
Thirdly, this traceroute really needs to be performed from the
original sending host, or some host in the same part of that network,
to exactly the same destination IP address - the Ivip-mapped address
of the destination host. Whoever does this test should check the
mapping of that address first, to find the correct address of the ETR
these packets are supposed to be tunneled to. If that doesn't appear
on the traceroute, then maybe the ITR is doing something wrong - such
as operating from corrupt or incomplete update information, or having
something wrong with its FIB data.
Finally, the problem might have occurred for a while when the mapping
was in one state, but now the mapping has changed and no problem can
be found.
There are going to be lots of ITRDs and ITRCs around the Net, most of
them probably not very closely managed. An errant one of these, or
an errant Query Server, will cause some packets to be dropped or at
least not sent to the correct ETR. But that behavior will only
happen for specific Ivip-mapped addresses, from some set of
sending hosts - and the fault may disappear when new mapping
information arrives or after the errant ITR decides it has lost synch
with the mapping update stream and has either taken itself offline
(letting the packets be tunneled by another ITR) or has since
reloaded the mapping data.
This is an area which will require a lot of thought.
14.2. Scaling the Replicator network
I don't know of any system resembling the Replicator system - so a
great deal of work will be required to figure out an architecture
which can reliably deliver streams of packets to hundreds of
thousands or millions of ITRDs and QSDs all over the Net.
The system described above assumes that a single Replicator can
receive two complete US-Complete streams of packets and send out some
number of copies, such as 30 or so. The idea is that, since each
Replicator will generally get two copies of each packet, one from
each of its two previous-level Replicators, it sends 30 copies of a
packet with a particular IMAB number and sequence number as soon as
the first copy is received, and then ignores the second copy
(assuming it contains the same information).
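The forward-first, ignore-second behavior can be sketched in Python
as follows. The class and method names are hypothetical, and the set
of seen packets grows without bound here; a real Replicator would
expire old entries and would also compare the payloads of the two
copies.

```python
class Replicator:
    """Sketch of one Replicator fed by two upstream Replicators."""

    def __init__(self, downstream_count: int = 30):
        self.downstream_count = downstream_count
        self._seen = set()  # (IMAB number, sequence number) already forwarded

    def receive(self, imab: int, sequence: int, payload: bytes):
        """Return the copies to send downstream; empty list for a duplicate."""
        key = (imab, sequence)
        if key in self._seen:
            return []  # second copy from the other upstream: ignore it
        self._seen.add(key)
        return [payload] * self.downstream_count
```

Whichever upstream copy arrives first is fanned out immediately, so
the loss of one upstream stream does not delay delivery.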
Replicators need to have pretty reliable levels of packet reception
and delivery - which can be difficult to ensure. There can be peaks
in the streams of packets - I am not sure how to regulate this,
except by some feedback from the first level of Replicators to the
various RUASes, causing them - or some of them - to hold back on
updates for a moment so as not to overload the Replicator system.
If the volume of updates becomes too much, the simple expedient is to
build a second parallel system of replicators, with the new system
handling updates from one subset of the RUASes and the original one
handling updates from the remainder.
The ITRD system could also be split, with one set of ITRDs handling
one set of IMABs and therefore advertising them, and another set
handling the remainder. Perhaps one set is more optimised for rapid
changes for mobile end-users, so these end-users would get Ivip-
mapped addresses in the IMABs of the higher-speed network, and their
pay-per-update fees would fund that system.
It is not obvious how a similar split could be achieved for QSDs and
QSCs. That would require two sets of ITRCs, or at least a single
ITRC or ITFH which knew two separate QSCs or QSDs to query, depending
on whether the DID in question belonged to one set of IMABs or the
other. That would be very messy.
14.3. Is fast, secure, Replication possible on the Internet?
There are probably various ways of using UDP packets for updates with
detection of missing packets and of spoofed packets. This could
limit the time incorrect data was being sent to ITRDs and there could
be various methods of recovery. In order to protect against false
information being used by the ITRD, authentication of each update
packet's data will be required. TCP could be used, such as with
"HMAC Protected TCP Connections" as suggested in LISP-CONS. I guess
this is draft-touch-tcpm-tcp-simple-auth (work in progress). But
this can be disrupted by spoofed packets.
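One possible form of per-packet authentication is an HMAC appended to
each update, sketched below in Python. This assumes a key shared
between the sender and each receiver, which this document does not
specify, and the function names are hypothetical. It addresses only
forged update data, not replayed packets or floods.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign_update(key: bytes, update: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the update data."""
    return update + hmac.new(key, update, hashlib.sha256).digest()


def verify_update(key: bytes, packet: bytes):
    """Return the update data if the tag is valid, otherwise None."""
    if len(packet) < TAG_LEN:
        return None
    update, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, update, hashlib.sha256).digest()
    return update if hmac.compare_digest(tag, expected) else None
```

An ITRD would discard any packet for which verify_update() returns
None and rely on the missing-packet detection to recover.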
Even if attacks aimed at injecting bogus mapping information into
ITRDs could be prevented, Level 1 of the Replicator system could be
disrupted by a flood of packets from a botnet. RFC 4732, Internet
Denial-of-Service Considerations [RFC4732] describes various types of
DoS, but notes that there is no absolute way of protecting against
them:
"As a result, almost all Internet services are vulnerable to denial-
of-service attacks of sufficient scale. In most cases, sufficient
scale can be achieved by compromising enough end-hosts (typically
using a virus or worm) or routers, and using those compromised hosts
to perpetrate the attack."
One way of making parts of the system invulnerable to DDoS attacks
would be to have parts of the RUAS and Replicator system
interconnected with private network links - so that the RUASes and
the first few levels of Replicators do not use Internet addresses at
all. This
adds enormously to the difficulty and cost of setting the system up.
Perhaps it is best to design a system which is as robust as possible
for deployment on the open Internet and consider using private
network links closer to the time of deployment.
14.4. TTRs and Mobility
The global ITR system of LISP, or Ivip etc. could be used to direct
packets to "Translating Tunnel Routers" (TTRs). These would be
located in multiple locations, and a mobile node would find one or
more of them topologically nearby and establish a two way tunnel to
each TTR. Each TTR would be capable of being somewhat like a home-
agent - accepting packets to be sent to the mobile node and
forwarding outgoing packets from the mobile node to the local network
and the Internet. This mobile use of Ivip does not involve the
database or the ITRs in any new type of functionality, other than
perhaps "mobility" implying a higher rate of updates than for
multihoming or simple portability, along with the general hope or
expectation that a change in the database will result in changed
tunneling very quickly - ideally in a fraction of a second.
Traditional Mobile IP involves a fixed home-agent router, and the
mobile node usually having an address from the network that router
handles. Sub-optimal paths usually result, since the correspondent
node may be near the mobile node, but both are far from the home-
agent. Traditional Mobile IP works with IPv6 and requires no new
functions in the correspondent node as long as the (typically)
suboptimal paths via the home-agent are used. New software in the
correspondent host enables it to send packets more directly to and
from the mobile node. Ivip will enable IPv4 and IPv6 correspondent
nodes with no special mobility software to have generally optimal
paths to and from the mobile node - which will require additional
mobility software.
Normal Ivip does not require the destination host to have any IP
address other than its Ivip-mapped address. Mobility usually
involves the mobile node acquiring a care-of address in whatever
network it is currently using (or in multiple networks, if it is
using multiple radio links, wired Ethernet etc.) and establishing a
tunnel from there to a home-agent. The mobile use of Ivip also
involves the
mobile node having one or more care-of addresses - which may be
behind NAT, as long as the tunnel arrangement to the TTR can be
established from behind a NAT.
Using the ITR system to direct packets from correspondent nodes all
over the Net to the currently active TTR will lead to generally
optimal, or close to optimal, paths to that TTR. Since the TTR is
typically close to the mobile node, the total path length will
generally be close to optimal.
The ability of the mobile node to choose its own TTR as it acquires
new connections to the Net means it can physically move and establish
new TTRs, and have the ITRs tunnel packets to whichever TTR it
chooses. So a mobile node could move physically across the world, if
it could maintain some kind of Internet connection, whilst retaining
all along the one Ivip-mapped address (or multiple addresses, or a
/64 for IPv6), on which long-lasting sessions could be conducted.
If the mobile node had two TTRs at one time, with the ITRs tunneling
to TTRA, it wouldn't matter that the database and ITR network might
take some seconds to change the tunneling to TTRB. As long as the
mobile node accepted incoming packets from both TTRs at once, then
there should be few problems.
Switching to another TTR because the current one is unreachable (to
the Net or from the mobile node) is likely to take a few or many
seconds - so it would not be possible to use this global Ivip network
to achieve split-second changeovers and so have only sub-second loss
of connectivity.
A mobile node would need its own mobile software to find TTRs and to
establish tunnels to them. The mobile node would also need to decide
which TTR to send its outgoing packets on. Access to TTRs would
probably involve paying a fee, unless it was within the network the
mobile node is currently connecting with. Some central system to
help mobile nodes find nearby TTRs would also be needed. This
centralised system would probably be a commercial service, not
directly connected with the Ivip system, but would have the
credentials required to alter the mapping data for the end-user's
Ivip-mapped address(es). This centralised system would probably
monitor connectivity to the mobile node via the multiple TTRs and
direct the mobile node about which one is best to send outgoing
packets on. This central system would also probably control the
mapping, so if the currently used TTR and its link to the mobile node
became non-functional, the central system would quickly change the
mapping to another TTR. In this respect, the system would be doing
the same job as a centralised multihoming monitoring and failure
detection system.
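The failure-handling decision of such a central system might look
like the Python sketch below. The function name, the reachability
table and the preference list are all hypothetical; how the central
system probes each TTR and its link to the mobile node is outside the
scope of this example.

```python
def choose_ttr(current: str, reachable: dict, preference: list) -> str:
    """Return the TTR which the mapping should point to.

    current    - the TTR the mapping points to now
    reachable  - TTR name -> True if the TTR and its link to the mobile
                 node both passed the latest probe
    preference - the mobile node's TTRs, best first
    """
    if reachable.get(current, False):
        return current  # still working: leave the mapping alone
    for ttr in preference:
        if reachable.get(ttr, False):
            return ttr  # first working alternative
    return current  # nothing better is known, so make no change
```

If the returned TTR differs from the current one, the central system
would issue a mapping update using the end-user's credentials.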
A router or server which performs TTR functions may also be an ITRC
or ITRD, at least for encapsulating and tunneling packets which are
sent by the one or more mobile nodes it connects with. Two mobile
nodes which are sending packets to each other while using the one TTR
would have their packets either routed directly within the TTR or
would have them encapsulated by the ITR function in that device and
then decapsulated by the ETR part of the TTR function. In principle,
an ITFH could be used in the mobile host, but this would add mapping
query packets to the traffic of the link, which can reasonably be
assumed to be a slow and expensive radio link in many cases. It is
better to leave the ITR function to the TTR-ITRC device, which has
connections to a nearby QSD/QSC and to an ITRD to handle packets it
doesn't yet have mapping for. The TTR could also integrate an ITRD,
but this would require it to get a continual feed of mapping updates.
Generally, the more TTRs there are and the closer they can be to
wherever mobile devices connect, the better - so an integrated ITRC
is probably the best choice.
The basic diagram of using a combined TTR and ETR is as follows.
................               ............
.     N1       .               .    N2    .
.              .               .          .
.  CN1----ITR1~~~~~BR~~~TR~~~BR~~~~~TTR1===PE==\
.              .          \    .          .     =
................           |   ............      =
                           |                      =
                           |                      =
                           |                     MN1
                .........  |       ...........    =
                .  N4   . /        .   N3    .    =
                .       ./         .         .    =
                .  TTR2==BR======BR==========PE==/
                .       .          .         .
                .........          ...........
~~~~ 1-way Ivip tunnel
==== 2-way tunnel established by Mobile Node to TTR
Figure 10: Mobile IP with two TTRs.
This shows only one correspondent node CN1, but of course any number
of correspondent nodes would be using their nearby ITRs to tunnel
packets to the currently chosen TTR, which is TTR1. Packets sent by
CN1 travel to the internal ITR, ITR1, where they are tunneled through
N1's BR (Border Router), the TR (Transit Router) and N2's BR to the
tunnel endpoint (DA of outer IP header), TTR1. There, the ETR
function of TTR1 decapsulates the raw packet and then re-encapsulates
it in whatever way is required for the 2-way tunnel to MN1.
The mobile node MN1 has established 2-way tunnels to two TTRs. TTR1
is inside the access network N2 - for instance this might be a GPRS
or UMTS cellular mobile link to N2.
MN1 also has an airport lounge or in-flight WiFi link to N3.
Alternatively, the link could be via an Ethernet cable in an office
LAN setting, or an Ethernet or WiFi link in a home which gives it an
address behind a NAT. In any of these cases, it establishes a 2-way
tunnel to TTR2 which is in a separate network from N3.
Perhaps TTR2 is operated by a commercial TTR network operator and the
end-user pays to use this TTR. The TTRs of this company could be
located all over the Net, close to various provider networks, or
located within them - and the mobile nodes find them with the help of
some centralised control system this company provides. Exactly what
sort of 2-way tunnels are established is a matter for the mobile node
and TTR to decide - this has nothing directly to do with Ivip.
Currently, the mapping for MN1's IP address, in whichever IMAB it is
located, causes the ITRs to tunnel packets to TTR1. Assuming MN1 is
currently directing its outgoing packets along the link to TTR1, the
flow of packets from MN1 follows the 2-way tunnel to TTR1, which
decapsulates the packets from the 2-way tunnel. Since CN1 is on an
ordinary BRIP address, there is no involvement of the ITR function
which TTR1 is assumed to have. From TTR1, the packet uses ordinary
BGP forwarding, via N2's BR (Border Router), the TR (Transit Router),
N1's BR, ITR1 (operating as an ordinary router, since the DA of this
packet is not within an Ivip-mapped IMAB) and to the destination:
CN1.
In general TTR1 forwards the packets it receives from MN1 to the rest
of the Net. If the packet's DA is within an Ivip IMAB then this
packet would typically be handled by the ITR function which I suggest
should usually be integrated into a TTR. An exception might be made
if the packet was addressed to some other mobile node, MN2 (not
shown), which TTR1 also has a tunnel from. If TTR1 routes those
packets directly
to the MN2, then this is the equivalent of the internal routing
system directly forwarding packets with Ivip-mapped addresses to
local hosts which have those addresses - as in point 2 of the
discussion above about "Note 2 - ETRs must handle packets from ITRs
in the same network". As noted in that discussion, this is probably
a bad idea. What really matters is the current mapping for MN2's
Ivip-mapped address, not the fact that TTR1 happens to have a link to
it.
It is best for TTR1 to have its own ITR function to decide where any
packet with DA = an Ivip-mapped address (including those with DA = a
mobile node TTR1 has a tunnel from) should be tunneled to. If the
mapping database has TTR1 as the TELOC for MN2's DID, then TTR1's ITR
function could either encapsulate the packet and tunnel it to TTR1's
own ETR function (which would lead to it being sent on the 2-way
tunnel to MN2) - or it could forward the packet to its 2-way tunnel
input system directly. If TTR1 didn't have an internal ITR function,
it would be best if it let the packet out to where it would find a
nearby ITR which would tunnel it according to the current state of
the mapping database. This may tunnel the
packet back to TTR1's ETR function - or to some other TTR - whichever
MN2, the end-user or the centralised management system has decided is
best. The mapping database and therefore the ITRs know the best way
to handle any packet addressed to MN2's Ivip-mapped address. The
fact that TTR1's "local routing system" has a link to MN2 is not as
important as the mapping information for MN2's address.
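The decision described above, where the mapping rather than the local
link determines what TTR1 does with a packet from one of its mobile
nodes, can be sketched in Python. The function name, the dictionary
standing in for the mapping database and the returned labels are all
hypothetical. Where the mapping points back at this TTR, the sketch
takes the shortcut of delivering directly on the 2-way tunnel rather
than encapsulating to its own ETR function.

```python
def next_step(dest: str, mapping: dict, own_address: str, has_itr: bool) -> str:
    """Decide how a TTR handles a packet received from a mobile node.

    mapping     - Ivip-mapped address -> current TELOC (hypothetical lookup)
    own_address - the address on which this TTR's ETR function is reached
    has_itr     - whether this TTR has an integrated ITR function
    """
    if not has_itr:
        # Let the packet out unencapsulated; a nearby ITR will tunnel it
        # according to the mapping if the DA is Ivip-mapped.
        return "forward-unencapsulated"
    teloc = mapping.get(dest)
    if teloc is None:
        return "forward-normally"  # DA is not Ivip-mapped
    if teloc == own_address:
        return "deliver-on-2way-tunnel"  # mapping points at this TTR
    return "tunnel-to:" + teloc  # mapping points at some other ETR/TTR
```

Note that the existence of a local 2-way tunnel to the destination is
never consulted on its own; only the mapping decides.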
A centralised control system, perhaps operated by the same company
which runs TTR2 and hundreds of other TTRs, is not shown here.
Suppose this system determines that it would be best to use the TTR2
link instead. It simply changes the mapping (using the credential
previously supplied by the end-user) and within a few seconds all
ITRs will be tunneling packets to TTR2 instead. The centralised
control system would probably be in regular communication with its
corresponding software in MN1. This system doesn't need to rely on
the Ivip system (database and ITRs) for this communication, since it
can easily create its own encapsulated packets and send them to TTR1
and TTR2.
15. Security Considerations
There is clearly a plethora of potential security problems with
Ivip. Any system which controls the tunneling of all packets
addressed to one or more Ivip-mapped addresses is a tempting target
for many attackers. Due to the limited time available to prepare
this draft, consideration of security matters is deferred until
subsequent versions.
16. IANA Considerations
[To do.]
17. Informative References
[I-D.farinacci-lisp]
Farinacci, D., "Locator/ID Separation Protocol (LISP)",
draft-farinacci-lisp-01 (work in progress), June 2007.
[I-D.ietf-shim6-proto]
Bagnulo, M. and E. Nordmark, "Shim6: Level 3 Multihoming
Shim Protocol for IPv6", draft-ietf-shim6-proto (work in
progress), April 2007.
[I-D.irtf-rrg-design-goals-01]
Li, T., "Design Goals for Scalable Internet Routing",
draft-irtf-rrg-design-goals (work in progress), July 2007.
[IAB-RAWS-website]
Meyers, D., "IAB Workshop on Routing and Addressing -
resources and presentations", December 2006.
[ICANN-DNS-attack]
"DNS Attack Factsheet 1.1", March 2007.
[ISC-Anycast]
Abley, J., "Hierarchical Anycast for Global Service
Distribution", March 2003.
[RFC1546] Partridge, C., Mendez, T., and W. Milliken, "Host
Anycasting Service", RFC 1546, November 1993.
[RFC4732] Handley, M., Rescorla, E., and IAB, "Internet Denial-of-
Service Considerations", RFC 4732, December 2006.
[RW ping survey]
Whittle, R., "Probing the density of ping-responsive-hosts
in each /8 IPv4 prefix and in different sizes of BGP
advertised prefix", March 2007.
[iPlane] "iPlane Datasets", July 2007.
[van-Beijnum-BGP]
van Beijnum, I., "Encoding routing information in
bitmaps", August 2001.
Appendix A. Acknowledgements
Thanks to the following people for LISP and for helping me in other
ways: Noel Chiappa, Olivier Bonaventure, Brian Carpenter, Dino
Farinacci, Vince Fuller, Joel M. Halpern, Geoff Huston, Ved Kafle,
Eliot Lear, Simon Leinen, Tony Li, Jeroen Massar, Dave Meyer, Chris
Morrow, Dave Oran, Robert Raszuk, Jason Schiller, John Scudder, K.
Sriram, Markus Stenberg, Christian Vogt and Kilian Weniger.
This I-D is the first attempt at documenting the Ivip proposal - a month
after I first began devising it. Hopefully one or more ideas within
this proposal will prove to be of lasting value.
Appendix B. The Ivip acronym
The Internet is widely known for its positive commercial, cultural
and political impacts. Perhaps in the longer term the Internet's
interpersonal benefits may become better recognised.
I have lived most of my life in Melbourne, Australia. Without the
Internet's free, open, global, person-to-person and one-to-many
communications, I would never have known my wife Tina, who comes from
Houston, Texas.
One evening we were watching Doris Day, Rock Hudson and Tony Randall
in the 1961 romp "Lover Come Back". Advertising executive Jerry
Webster (Rock Hudson) finds himself in trouble - from which he
believes he can extract himself by convincing a dancer (Edie Adams)
that he will introduce her to Hollywood by making her the star of a
promotional campaign for a hot new product. She is keen and keeps
asking him what the product is. Casting his eyes around the room, he
sees a newspaper with a headline about a VIP. "Vip!" he exclaims.
He spends the rest of the movie trying to figure out what this great
new product will be.
The next night I thought up "anycast ITRs in the core, with EIDs
advertised in BGP" to make the LISP proposal incrementally
deployable. I wanted a name for a new proposal . . .
Initial meanings for ViP (later Ivip) included "Versatile redIrection
of Packets" and some others. "Internet Vastly Improved Plumbing"
came a few days later, and is the most memorable so far. Ivip's
semantics are user extendable.
"Ivip" is brief, distinctive and easy to pronounce ("eye-vip" as in
"ivory"). Capitalisation is user-configurable, but the first
character, upper case 'i', SHOULD be capitalised, because I believe
the Internet richly deserves its name remaining a proper noun - and
to discourage pronunciation such as "ivip" as in "itch".
The capital "I" raises a potential problem with sans-serif fonts such
as Helvetica, since it is indistinguishable from lower-case "L".
This has bedevilled the 3GPP term "Iub" (capital 'i') which is far
more widely known outside the organisation as "lub" (lower-case 'L').
"IViP" looks good in print but is annoying to type. Like "iViP",
"IViP" is reminiscent of the 1990s, while Ivip is in fact a 1960s
engineering product: www.firstpr.com.au/ip/ivip/tv-ad/.
Author's Address
Robin Whittle
First Principles
Email: rw@firstpr.com.au
URI: http://www.firstpr.com.au/ip/ivip/
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.