Network Working Group R. Whittle
Internet-Draft First Principles
Intended status: Experimental January 19, 2010
Expires: July 23, 2010
Ivip Mapping Database Fast Push
draft-whittle-ivip-db-fast-push-03.txt
Abstract
Building on draft-whittle-ivip-arch-03 and later versions, this ID
describes Ivip's fast-push mapping distribution system. This system accepts
mapping changes from end-user networks or organizations they
authorise to make these changes. The mapping changes are handled by
RUAS (Root Update Authorization Server) companies who collectively
run the initial levels of a global network of Replicator servers.
This is a secure, packet-based flooding system which will propagate
the mapping changes to potentially hundreds of thousands of full
database query servers (QSDs) in ISPs and larger end-user networks
all over the world. This ID describes the overall system. The
distributed Fast Payload Replication system is described in detail in
draft-whittle-ivip-fpr.
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 23, 2010.
Copyright Notice
Whittle Expires July 23, 2010 [Page 1]
Internet-Draft Ivip DB Fast Push January 2010
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
Table of Contents
1. Introduction
1.1. Outline of the RUAS and Replicator systems
1.2. Assumptions
1.3. It may not be so daunting...
2. Goals, Non-Goals and Challenges
2.1. Goals
2.2. Non-goals
2.3. Challenges
3. Definition of Terms
3.1. SPI - Scalable PI space
3.1.1. Conventional global unicast address space
3.2. MAB - Mapped Address Block
3.3. UAB - User Address Block
3.4. Micronet
3.5. RUAS - Root Update Authorisation System
3.6. UAS - Update Authorisation System
3.7. UMUC - User Mapping Update Command
3.8. SUMUC - Signed User Mapping Update Command
3.9. MABUS - Update Stream specific to one MAB
3.10. Level 0 Replicators
3.11. Level 1 and greater Replicators
3.12. QSD - Query Server with full Database
3.13. QSC - Query Server with Cache
4. Update Authorities and User Interfaces
4.1. RUAS Outputs
4.1.1. Update packets to level 0 Replicators
4.1.2. MAB snapshots
4.1.3. Missing Payload Servers (MPSes)
4.2. Authentication of RUAS-generated data
4.2.1. Snapshot and missing packet files
4.2.2. Mapping updates
4.3. RUAS - UAS interconnection
5. Common information to be sent by the FMS
6. The Fast Payload Replication system
7. Scaling limits
8. Managing Replicators
9. Security Considerations
10. IANA Considerations
11. Informative References
Author's Address
1. Introduction
The aim of this I-D is to establish that Ivip's fast-push mapping
distribution system (FMS) is practical and desirable for very large
numbers of micronets (EIDs in LISP terminology) and high rates of change
of the mapping database. All parts of Ivip are intended to be
operated by a variety of organisations, with appropriate cooperation
- including between companies which are competing with each other.
Please refer to [I-D.whittle-ivip-arch] for an explanation of Ivip in
general. A glossary of Ivip and some general scalable routing terms
and acronyms is available in [I-D.whittle-ivip-glossary].
This is a revision of the 02 version, with a substantial
simplification of what was previously the "Launch server" system
which drove the level 1 Replicators. The Launch servers are replaced by level 0
Replicators, which are functionally identical to other relatively
simple Replicators, but use more input streams. The level 0
Replicators are fully meshed with each other and this mesh is driven
by packets from the multiple RUASes (Root Update Authorization
Servers). A new network element - a Missing Payload Server - is also
introduced. Please see [I-D.whittle-ivip-fpr] for a detailed
explanation.
1.1. Outline of the RUAS and Replicator systems
The most important part of the FMS consists of thousands (perhaps
tens of thousands in the long-term future) of essentially identical
"Replicator" servers. The system can be viewed as a tree-like
structure somewhat similar to a multicast set of routers, with
multiple such trees driven by the same data and their branches
cross-linked at multiple levels, so that the payload of a packet lost
at one replication point will be replaced by an identical payload in
another packet from another branch.
Each Replicator receives at least two streams of identical mapping
data, so it is much less likely to miss a payload than if it only
received the payloads in a single stream of packets from a single
source.
A better way to view the system is that it floods each replication
point from at least two directions from the previous level. The
items being flooded are the payloads of DTLS packets - UDP packets
whose contents are encrypted to prevent attacks involving spoofing
such packets in order to propagate them in the Replicator system.
At level 0, the Replicators flood each other and the larger number
of level 1 Replicators. The level 1 Replicators flood the larger
number of level 2 Replicators. Since the flooding unit is a packet
payload, as soon as a particular payload is received, it is
replicated, so the delay time in each point of replication will be
very short indeed - probably only milliseconds.
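The flooding rule described above (replicate the first copy of a payload as soon as it arrives, and ignore the identical copy which later arrives on another input stream) can be sketched as follows. This is a hypothetical illustration, not the Replicator design of [I-D.whittle-ivip-fpr]: it assumes duplicate payloads are byte-identical, so a SHA-256 digest can identify them, and the names `PayloadDeduplicator`, `replicate` and the `capacity` limit are invented for this example. Python lists stand in for downstream DTLS sessions.

```python
import hashlib
from collections import OrderedDict


class PayloadDeduplicator:
    """Hypothetical sketch: pass the first copy of each payload, drop later duplicates.

    Assumes duplicates are byte-identical, so a SHA-256 digest identifies a
    payload. Only the most recent `capacity` digests are remembered.
    """

    def __init__(self, capacity: int = 10000) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[bytes, None]" = OrderedDict()

    def accept(self, payload: bytes) -> bool:
        """Return True if this payload should be replicated now, False if already seen."""
        digest = hashlib.sha256(payload).digest()
        if digest in self._seen:
            return False
        self._seen[digest] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest digest
        return True


def replicate(payload: bytes, dedup: PayloadDeduplicator, outputs: list) -> int:
    """Copy a newly seen payload to every downstream output; return copies sent."""
    if not dedup.accept(payload):
        return 0
    for out in outputs:
        out.append(payload)  # stands in for sending one DTLS packet downstream
    return len(outputs)
```

Because nothing is queued or reordered, the only per-payload work is one hash and one lookup, which is consistent with the text's expectation of millisecond-scale delays at each replication point.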
In this way, it is reasonable to expect a single payload, injected
securely to the level 0 Replicators, to be fanned out to hundreds of
thousands of end-points all over the Net, in a time not much longer
than is imposed by the intervening routers and data links. If there
are 5 levels of x10 "amplification" and each involves a 20msec delay
(hopefully it would be less), the Replicators add about 100ms in total
to the transit time over the intervening links, so we can expect
global delivery of the payloads to most fibre-linked locations within
300 to 350ms. (From Melbourne, Australia, most of the distant sites in
Russia or Africa have RTTs in the 300 to 450msec range, but I found a
server apparently in Swaziland (cr1mba.swazi.net) with RTTs of 710 to
900ms.)
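The delay estimate above can be checked with a few lines of arithmetic. This sketch only restates the figures from the text (5 levels, x10 fan-out, 20ms per level); the 225ms one-way transit figure is an assumption, taken as roughly half of a 450ms RTT to a distant site, and the function names are illustrative.

```python
def fanout_endpoints(levels: int, fanout: int) -> int:
    """End-points reached when each level multiplies the number of streams by `fanout`."""
    return fanout ** levels


def delivery_estimate_ms(levels: int, per_level_delay_ms: float,
                         one_way_transit_ms: float) -> float:
    """Replication delay at every level plus the one-way transit time of the links."""
    return levels * per_level_delay_ms + one_way_transit_ms


# Figures from the text: 5 levels of x10 amplification, 20 ms per level.
print(fanout_endpoints(5, 10))            # 100000
# Assumption: a distant site with a 450 ms RTT is roughly 225 ms away one-way.
print(delivery_estimate_ms(5, 20, 225))   # 325
```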
In this way, each Replicator consumes two identical streams from
geographically and topologically different sources, and fans the
content of the streams out to some larger number of Replicators or
QSDs at the next level. This number of output streams per Replicator
may be in the tens to one hundred range, depending on the volume of
updates. Initially, it would be quite high, when update rates are
low - meaning that the initial global Replicator network could serve
the growing number of QSDs with just three or so levels of
Replicators, and with each one fanning out updates to a large number
of Replicators at the next level.
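The relationship between output streams per Replicator and the number of levels needed can be sketched with a simple model. It is an assumption, not taken from the text, that every node (Replicator or QSD) consumes exactly two streams, so each level can feed only outputs/2 times as many nodes as the level above it; the level 0 mesh traffic is ignored, and the figure of 20 level 0 Replicators is hypothetical.

```python
def levels_needed(n_qsds: int, level0_count: int, outputs_per_replicator: int,
                  inputs_per_node: int = 2) -> int:
    """Number of Replicator levels (counting level 0) needed to feed n_qsds QSDs.

    Hypothetical model: every Replicator sends `outputs_per_replicator` streams
    and every node (Replicator or QSD) consumes `inputs_per_node` of them, so
    each level can feed outputs/inputs times as many nodes as the level above.
    """
    if outputs_per_replicator <= inputs_per_node:
        raise ValueError("each level must feed more nodes than it contains")
    nodes, levels = level0_count, 1
    # While the current deepest level cannot feed every QSD, add another level.
    while nodes * outputs_per_replicator // inputs_per_node < n_qsds:
        nodes = nodes * outputs_per_replicator // inputs_per_node
        levels += 1
    return levels


# Early days: low update rates allow ~100 output streams per Replicator.
print(levels_needed(300_000, level0_count=20, outputs_per_replicator=100))  # 3
# Heavier update load: only ~20 output streams per Replicator.
print(levels_needed(300_000, level0_count=20, outputs_per_replicator=20))   # 5
```

This illustrates the point made above: when update volumes are low and each Replicator can drive many outputs, three or so levels suffice.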
After some number of levels of replication, determined by local
conditions, the streams deliver the update information to a QSD.
Ideally, each QSD will receive two streams from two geographically
dispersed Replicators. These need not be at the same level, so the
system is relatively flexible, and each Replicator will generally be
sending a complete stream of packets.
There will also be a distributed system of Missing Payload Servers
(MPSes) which receive streams from Replicators and store the payloads
for ten or so minutes. MPSes will compare notes with each other via
TCP (probably HTTP or HTTPS) and so form one or more worldwide
distributed groups which will quickly replace any payload they
missed. QSDs will query one or more MPSes - probably a close one and
a distant one - and so be able to receive missing payloads on the
hopefully rare occasions when one or more payloads are missing from
the two or more streams each QSD normally receives from Replicators.
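The storage side of a Missing Payload Server can be sketched as a time-limited cache. The text does not say how a payload is identified in a request, so the key used here, a (RUAS identifier, sequence number) pair, is a hypothetical assumption, as are the class and method names. The clock is injectable so that expiry can be tested; the exchange between MPSes is not modelled.

```python
import time
from typing import Callable, Dict, Optional, Tuple

PayloadKey = Tuple[str, int]   # hypothetical: (RUAS identifier, sequence number)


class MissingPayloadStore:
    """Sketch of an MPS cache which keeps each payload for about ten minutes."""

    def __init__(self, retention_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.retention_s = retention_s
        self.clock = clock
        self._items: Dict[PayloadKey, Tuple[float, bytes]] = {}

    def store(self, key: PayloadKey, payload: bytes) -> None:
        # Keep the first copy; later copies from other Replicators are identical.
        self._items.setdefault(key, (self.clock(), payload))

    def expire(self) -> None:
        """Drop payloads older than the retention period."""
        cutoff = self.clock() - self.retention_s
        self._items = {k: v for k, v in self._items.items() if v[0] >= cutoff}

    def lookup(self, key: PayloadKey) -> Optional[bytes]:
        """Answer a QSD's (or peer MPS's) request for a payload it missed."""
        self.expire()
        entry = self._items.get(key)
        return entry[1] if entry else None
```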
RUASes asynchronously feed packets into the fully meshed ring of
level 0 Replicators.
Snapshots of segments of the mapping database are taken regularly by
each RUAS. Each snapshot contains a complete copy of the mapping of
one MAB (Mapped Address Block) at a particular instant. At that
point in time, a hash function of the mapping data for this MAB is
generated and within a few seconds is sent to all QSDs. This enables
each QSD to verify that its copy of the mapping for this MAB is fully
up-to-date.
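The hash check can be sketched as below. The hash function and the serialisation of a MAB's mapping are not specified here, so this assumes, purely for illustration, SHA-256 over micronets sorted by start address, each packed as start, length and ETR address (4 bytes each, network byte order). The QSD must compute its hash over its mapping as it stood at the snapshot instant, i.e. before applying any later updates.

```python
import hashlib
import ipaddress
import struct
from typing import Iterable, Tuple

Micronet = Tuple[str, int, str]   # (start address, length in addresses, ETR address)


def mab_mapping_hash(micronets: Iterable[Micronet]) -> str:
    """Hash the complete IPv4 mapping of one MAB in a canonical order.

    Hypothetical format: each micronet is packed as start, length, ETR
    (4 bytes each, network byte order) and the records are sorted by start.
    """
    h = hashlib.sha256()
    records = sorted((int(ipaddress.IPv4Address(s)), n, int(ipaddress.IPv4Address(e)))
                     for s, n, e in micronets)
    for start, length, etr in records:
        h.update(struct.pack("!III", start, length, etr))
    return h.hexdigest()


def mab_is_current(local_copy: Iterable[Micronet], hash_from_ruas: str) -> bool:
    """A QSD compares its own hash with the one the RUAS sent for the same instant."""
    return mab_mapping_hash(local_copy) == hash_from_ruas
```

Sorting first makes the hash independent of the order in which a QSD happens to store its micronets.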
During initialisation, and if an error is found in the local copy of
the mapping for a particular MAB, the QSD downloads snapshots from
HTTP servers provided by the RUAS companies. The QSD buffers all
updates for the MAB which arrive after the snapshot and hash message.
Once the snapshot is downloaded and unpacked into the QSD's copy of
the mapping database, the buffered updates are applied and the
database then contains an up-to-date copy of mapping for this MAB.
Updates are then applied as they arrive from the two or more upstream
Replicators.
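The snapshot-plus-buffer procedure above can be sketched as follows. The text does not define how a QSD decides which buffered updates are already reflected in a snapshot, so this assumes a hypothetical per-MAB sequence number on each update and a `snapshot_seq` marking the snapshot instant; the mapping is simplified to a dictionary from micronet start address to ETR address.

```python
from typing import Dict, List, Tuple

Update = Tuple[int, str, str]   # hypothetical: (sequence number, micronet start, new ETR)


class MabSync:
    """Sketch of a QSD (re)loading one MAB from a snapshot while updates keep arriving."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}   # micronet start -> ETR (simplified)
        self.loading = True
        self.buffer: List[Update] = []

    def on_update(self, update: Update) -> None:
        if self.loading:
            self.buffer.append(update)      # hold until the snapshot is in place
        else:
            self._apply(update)

    def on_snapshot(self, snapshot: Dict[str, str], snapshot_seq: int) -> None:
        self.mapping = dict(snapshot)
        # Apply, in order, only the buffered updates which are newer than the snapshot.
        for update in sorted(self.buffer):
            if update[0] > snapshot_seq:
                self._apply(update)
        self.buffer.clear()
        self.loading = False

    def _apply(self, update: Update) -> None:
        _, start, etr = update
        self.mapping[start] = etr
```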
1.2. Assumptions
For the purposes of this discussion, it is assumed there will be a
single global Ivip system, with multiple organisations being
responsible for the management of the various blocks of address space
which are managed with Ivip. The system itself is intended to be
decentralised and have no single point of failure. Furthermore, it
is intended to be highly suitable for being built, operated and
expanded upon by a number of separate organisations, who cooperate
much as do the organisations which run the DNS today.
It would also be possible for an organisation to establish an Ivip-
like system, without reference to any IETF RFCs, and to conduct a
business renting out address space in small, flexible, chunks, with
portability and multihoming via any ISP who provides the requisite,
relatively simple, ETRs. The most likely such scenario involves one
or more independent Ivip-like systems operated by different companies,
primarily to support TTR mobility [TTR Mobility], but also usable for
portability, multihoming and inbound Traffic Engineering for
non-mobile end-user networks.
For simplicity, this ID assumes that Ivip development will be
coordinated into a single global system, as DNS is, following
appropriate IETF engineering work and administrative decisions in
RIRs and other relevant organisations. A development timeframe of
2010 to ca. 2014 is assumed, with widespread deployment being
achieved later in the decade, for IPv4 at least.
The IPv4 FMS is identical in principle to the IPv6 FMS. The server
software which implements the Replicators will probably remain as two
separate items, but a single server could run them both,
independently, and so be both an IPv4 and IPv6 Replicator. Each RUAS
would have both IPv4 and IPv6 sections, with separate outputs of
mapping data. The level 0 Replicator servers for IPv4 would be
physically different and independent of those for IPv6.
In addition to the global fast push database update distribution
system discussed in this ID and in [I-D.whittle-ivip-fpr], Ivip also
involves Query Servers sending "notifications" to ITRs which recently
requested mapping for a micronet whose mapping has just changed.
This is a second form of push - on a local scale - and is outlined in
[I-D.whittle-ivip-arch].
This ID concentrates on IPv4, since the future core-edge separation
architecture is more urgently required for IPv4 than for IPv6. In
principle, the same arrangements will apply for IPv6, with a
different and more verbose data format than the 12 or so bytes
required for each IPv4 mapping update. It may make sense to defer
finalisation of any future IPv6 map-encap scheme until substantial
operational experience has been gained with the IPv4 scheme.
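The update data format is left for further work, but the sizes quoted in this ID (about 12 bytes per IPv4 update, and about 32 bytes per IPv6 change) are consistent with a simple fixed layout. The layout below is hypothetical, chosen only to match those sizes: start, length and ETR address, with IPv6 starts and lengths counted in /64 units.

```python
import ipaddress
import struct


def pack_ipv4_update(start: str, length: int, etr: str) -> bytes:
    """Hypothetical 12-byte IPv4 record: start (4), length in addresses (4), ETR (4)."""
    return struct.pack("!4sI4s", ipaddress.IPv4Address(start).packed, length,
                       ipaddress.IPv4Address(etr).packed)


def pack_ipv6_update(start_prefix: str, length_64s: int, etr: str) -> bytes:
    """Hypothetical 32-byte IPv6 record: start /64 (8), length in /64s (8), ETR (16)."""
    start64 = ipaddress.IPv6Address(start_prefix).packed[:8]   # upper 64 bits only
    return struct.pack("!8sQ16s", start64, length_64s, ipaddress.IPv6Address(etr).packed)
```

A real format would also need to carry micronet creation and deletion operations, extension fields and authentication data, which this sketch omits.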
1.3. It may not be so daunting...
Ivip documentation is written with a preference for detailed
discussion over terseness. So Ivip IDs may appear rather daunting at
first. Hopefully these IDs will be clearly understandable, and the
reader will recognise that this scalable routing solution is a
momentous development, requiring detailed consideration. Ivip goes
beyond the formal RRG requirements of providing portability (the only
way of allowing free choice of alternative ISPs), multihoming and
inbound traffic engineering by also providing, with TTR mobility, a
global mobility system for both IPv4 and IPv6. While no mapping
changes are required unless the Mobile Node moves a large distance,
such as 1000km or more, it is important that the Ivip FMS be able to
scale to very large numbers of updates and cope with mapping
databases for up to 10^10 micronets.
This ID focuses on handling billions of micronets and potentially
thousands of updates a second. These data-rates may sound high
today, but domestic customers are already downloading full quality
video in real-time. By the time such large levels of adoption arise,
the bandwidth needed for these will not be a significant obstacle.
However, it is difficult to imagine a situation where more than 10
billion mapping changes are needed each year, which is an average of
320 a second. There would be peaks, but with an IPv6 mapping change
requiring about 32 bytes, this is an average of about 80kbps of
mapping data - on the order of 100kbps.
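The arithmetic behind these figures can be checked directly. With 32 bytes per change and no allowance for packet headers or authentication data, the mapping data alone comes to roughly 81kbps, so 100kbps is a rounded upper figure.

```python
changes_per_year = 10_000_000_000
seconds_per_year = 365 * 24 * 3600           # 31,536,000
bytes_per_ipv6_change = 32                   # figure from the text

rate = changes_per_year / seconds_per_year   # about 317 changes per second
bits_per_second = rate * bytes_per_ipv6_change * 8

print(round(rate))                     # 317
print(round(bits_per_second / 1000))   # 81 (kbps of mapping data)
```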
During initial deployment, the demands on the fast push system will
be far lighter than those anticipated below, so the system might
initially be somewhat simpler. In the initial stages of
introduction, there may be little need to deploy dedicated servers
for the "Replicator" functions, since the volume of updates may be so
light as to make it practical to run this software on existing
servers, such as nameservers.
Furthermore, in the early years of introduction, when there are
hundreds of thousands or a few million micronets, the low level of
update packets (compared to the highest imaginable levels
contemplated below) should enable each Replicator to fan out to many
more next-level Replicators than would be possible when hundreds of
millions or billions of micronets are handled by the system. This
would mean fewer levels of Replicators, and fewer Replicators, than
would be needed with current technology if the system were handling
billions of micronets.
This ID explores how the FMS would be structured in the most
demanding future scenarios which can be realistically expected.
Building the initial FMS for trials and early services won't be as
daunting as it may look from the diagrams and discussions below.
2. Goals, Non-Goals and Challenges
2.1. Goals
The overall goal of the fast push system is to enable end-users, who
manage the mapping of their one or more micronets of address space,
to securely, reliably and easily communicate their mapping change
command to some organisation with which they have a business
relationship, so that that change will be propagated to every QSD as
soon as possible.
"As soon as possible" means typical delay times of a few seconds,
ideally zero seconds, but in practice probably two or so seconds.
Prior to 2010-01-18, the Ivip IDs mentioned longer times than this,
but this was on the basis of "Launch" servers executing a complex
pipelined process which would take three or so seconds. This
arrangement is now replaced by fully meshed level 0 Replicators,
which have no complex protocols, pipelining or delays.
"Reliably" means that in the great majority of cases, the QSDs
receive every mapping change as expected and that, in the relatively
rare event of this being impossible due to packet loss, the QSD can
recover from this situation within one or at the most two seconds by
requesting a copy of the missing payload from one or more Missing
Payload Servers (MPSes - which were also introduced on 2010-01-18).
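How a QSD might notice a missing payload, and decide when to ask an MPS for it, can be sketched as below. This assumes, hypothetically, that each RUAS numbers its payloads consecutively so that a jump in sequence numbers reveals a gap; the short `wait_s` grace period (also an assumption) lets the other input stream deliver its copy before any request is made, keeping recovery within the one to two seconds mentioned above.

```python
from typing import Dict, List


class GapDetector:
    """Sketch: notice missing payloads and decide when to ask an MPS for them.

    Hypothetical assumption: each RUAS numbers its payloads consecutively, so a
    jump in sequence numbers reveals payloads missing from all input streams.
    """

    def __init__(self, wait_s: float = 0.5) -> None:
        self.wait_s = wait_s                     # give the other stream time to deliver
        self.next_seq: Dict[str, int] = {}
        self.missing: Dict[tuple, float] = {}    # (ruas, seq) -> time first noticed

    def received(self, ruas: str, seq: int, now: float) -> None:
        expected = self.next_seq.get(ruas, seq)
        for gap in range(expected, seq):
            self.missing.setdefault((ruas, gap), now)
        self.missing.pop((ruas, seq), None)      # a late copy fills the gap
        self.next_seq[ruas] = max(expected, seq + 1)

    def to_request(self, now: float) -> List[tuple]:
        """Payloads still missing after wait_s: request these from an MPS."""
        return sorted(k for k, t in self.missing.items() if now - t >= self.wait_s)
```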
Reliability also involves robustness against DoS attacks. This can
never be completely protected against for any device on the open
Internet, since its link(s) can easily be flooded by packets sent
from botnets etc. As mentioned in [I-D.whittle-ivip-fpr],
considerable protection from DoS attacks could be achieved by running
the level 0 and level 1 Replicators via private network links. These
levels would be owned and operated by the RUAS companies working
together. This would enable reliable feeds to hundreds or perhaps a
thousand or so level 2 Replicators all over the Net, which would limit
the disruption a DoS attack could cause.
"Securely" means that each QSD which receives the updates will be
able to instantly verify that the updates are genuine, rather than
the result of an attacker who might, for instance, send forged
packets to that device or to some other part of the fast push system.
The data format for the mapping update packets is for further work.
There will be end-to-end cryptographic protection so that the QSD can
authenticate that the mapping data originated from the RUAS which sent
it. Whether this involves authenticating each individual payload, or
combining typically multiple packet payloads into a single body of
data to be authenticated, remains to be decided. Sometimes, probably quite
frequently, the RUAS will send only a single packet of updates, so
then the entire payload would be authenticated, since there are no
other payloads to consider. The data format needs to provide for
open-ended extensions in the future and to support authentication.
In the present design, DTLS (RFC 4347) UDP packets are sent from
RUASes to Replicators, from Replicators to other Replicators and from
Replicators to QSDs and MPSes. This protects against an attacker
spoofing a packet and having it replicated or accepted by a QSD or
MPS. However, it cannot be guaranteed that no Replicator is under the
control of an attacker, who could then send packets which would be
replicated and accepted.
The most common mapping change command, as sent by the end-user, or
by some other organisation or device which has the end-user's
credentials, would involve the length of the micronet being checked
to ensure it is the same as the currently configured length of the
micronet which starts at that location. The end-user's command might
be part of an encrypted exchange involving a challenge-response
protocol and the end-user's private key. Alternatively, an encrypted
link could be used, such as via HTTPS, and a conventional username
and password given as part of the command.
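The micronet length check described above can be sketched as a guard in front of an ETR change. The table layout (micronet start as an integer address, mapped to its length and ETR address) and the function name are hypothetical; the point illustrated is that a command stating the wrong length, presumably because the end-user's view of their micronet divisions is out of date, is rejected rather than applied.

```python
from typing import Dict, Tuple

# Hypothetical current state for one MAB: micronet start -> (length, ETR address).
Table = Dict[int, Tuple[int, str]]


def apply_etr_change(table: Table, start: int, length: int, new_etr: str) -> bool:
    """Accept an ETR change only if the stated length matches the micronet at `start`."""
    current = table.get(start)
    if current is None or current[0] != length:
        return False        # no such micronet, or the command's length is stale
    table[start] = (length, new_etr)
    return True
```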
The end-user would previously have communicated directly or
indirectly with their RUAS to configure their total assigned address
space into one or more micronets. This ID concentrates on the
changes of ETR address for existing micronets, but the mapping change
packets will also contain information about how existing micronets
have been deleted and replaced by other micronets, smaller or larger
and with different start and end-points.
RUASes and the level 0 and level 1 Replicators are few in number and
will be administered carefully, so this ID does not consider
automated aids to their management and debugging. However, the rest
of the Replicators, level 2 and greater, will be numerous and
operated by a wide range of organisations. Future work will concern
maximising the degree to which the Replicator system can be robustly
and easily managed, rather than requiring a great deal of manual
configuration etc.
In order to debug the way the Ivip system is used, such as transient
erroneous or malicious mapping updates which cause packets to be
tunnelled to addresses where they are not welcome, there will need to
be a system which monitors all mapping changes and keeps a lasting
record of them. Then, aggrieved parties can search such a system for
the address at which they received the unwanted packets, and so
determine the micronet involved. This will enable the aggrieved
party to complain to the RUAS which is responsible for that micronet.
This "mapping history" function could be performed by one or multiple
separate systems, each simply taking a feed from the Replicator
system.
2.2. Non-goals
Apart from checking the ETR address against any specific exclusion
lists (such as specific prefixes, RFC 1918 private space and multicast
space) and checking that it is not part of a Mapped Address Block (MAB -
a BGP advertised prefix containing SPI space, divided into many
micronets), the entire Ivip system takes no interest in whether there
is a device at that address, whether the address is advertised in
BGP, whether there is or was an ETR at that address, whether the ETR
is reachable or whether the ETR can deliver packets to the micronet's
destination device.
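The limited checks Ivip does apply to an ETR address can be sketched with Python's standard `ipaddress` module. The exclusion list shown (RFC 1918 private space and multicast, plus an optional list of specific prefixes) follows the text; the function name and parameters are illustrative. Note that the sketch deliberately does not use `is_private`, which covers much more than RFC 1918. Nothing here tests whether an ETR exists or is reachable at that address, matching the non-goal above.

```python
import ipaddress
from typing import Iterable

# Private-use space from RFC 1918.
RFC1918 = [ipaddress.IPv4Network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def etr_address_acceptable(etr: str, mabs: Iterable[str], excluded: Iterable[str] = ()) -> bool:
    """Exclusion-list checks only: reachability of an ETR at this address is never tested."""
    addr = ipaddress.IPv4Address(etr)
    if addr.is_multicast or any(addr in net for net in RFC1918):
        return False
    if any(addr in ipaddress.IPv4Network(p) for p in excluded):
        return False
    # The ETR must be on conventional address space, not inside a MAB (SPI space).
    return not any(addr in ipaddress.IPv4Network(m) for m in mabs)
```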
These are all matters which fall under the responsibility of the end-
user network whose micronet is being mapped to this ETR address.
It is not a goal of the system to keep mapping changes secret from
any party. This would be impossible. Therefore, it cannot be a goal
of this, or probably any, core-edge separation scheme to prevent
anyone who monitors the mapping updates from inferring the movement
of an individual's device in a mobile setting. However, the mapping
only concerns the currently active TTR. MNs can still use a TTR no
matter where they are physically connected, and using a TTR hundreds
or even thousands of km distant will probably present no serious
difficulties due to path-length or lost packets. So mapping changes
need not indicate much, or anything, about the physical location of
the MN.
Replicators perform a best-effort copying of mapping update packets.
They do not store the payloads of these packets for any appreciable
time or attempt to request a payload which is missing from their two
or more input streams.
2.3. Challenges
Please refer to the Ivip Fast Payload Replication ID
[I-D.whittle-ivip-fpr] for discussion of the most difficult
challenges of the FMS. The present ID concentrates on the overall
system, including the RUASes and UASes which connect to them. Here,
the FPR system - Replicators and Missing Payload Servers - is
regarded as a subsystem.
3. Definition of Terms
3.1. SPI - Scalable PI space
Once Ivip is operational, a growing subset of the global unicast
addresses will be handled by ITRs tunnelling the packets to an ETR,
which delivers the packets to the destination. This subset is used
by end-user networks and provides portability, multihoming and
inbound traffic engineering in a manner which is highly scalable: it
does not overly burden DFZ routers.
SPI space is "mapped" by Ivip and this mapping system can divide it
into smaller sections than is possible with BGP in the DFZ, where the
granularity for IPv4 is 256 IP addresses (a /24) due to a widely
enforced convention on the lengths of routes which are accepted.
The granularity with which Ivip maps SPI space - dividing it into
micronets (described below) - is single IP addresses for IPv4 and /64
prefixes for IPv6.
3.1.1. Conventional global unicast address space
This is global unicast address space as it is used today. With Ivip,
this will be a subset of the full unicast space - the part which is
not used for SPI space. The LISP term for this is "RLOC" space.
3.2. MAB - Mapped Address Block
A MAB is a BGP advertised prefix which is used as SPI space. DITRs
(Default ITRs in the DFZ) all over the Net advertise this prefix,
tunnelling the packets to ETRs according to the current mapping for
the destination address of each packet.
A MAB could, in principle, be as large as a /8. Larger MABs are
preferred in general, because each one burdens the BGP system with
only a single advertisement, but includes the SPI space of
potentially hundreds of thousands of end-user networks. However, for
reasons discussed below - including load sharing between ITRs and
ease of initially loading snapshots of the mapping database - it may
be best if MABs are more typically in the /12 to /17 range for IPv4.
MABs do contribute to the load on the DFZ's BGP control plane, and
involve one more route in the RIB and FIB of all DFZ routers.
However, a MAB typically supports the address needs of thousands or
tens of thousands of end-user networks. This ratio is how Ivip or
any other successful core-edge separation architecture solves the
routing scaling problem. Without such an architecture, each of these
end-user networks would either require their own route (AKA "prefix")
in the DFZ, or not be able to obtain address space which was portable
and suitable for multihoming and inbound TE.
3.3. UAB - User Address Block
Each MAB typically contains address space which has been assigned by
some means to many (perhaps tens of thousands) separate end-users. A
UAB is a contiguous range of addresses within a MAB which is assigned
to one end-user. UABs are important divisions for the RUAS company,
but UABs are not specifically mentioned or needed in the mapping
update packets handled by Replicators. Nor are UABs relevant to the
operation of QSDs, QSCs (caching query servers), ITRs or ETRs.
A MAB could be assigned entirely to one end-user - as might be the
case if the end-user converted a prefix of theirs which was
previously conventional PI space to be managed as SPI space by the
Ivip system. Generally speaking, MABs are ideally large (short
prefixes) and each contains space for multiple end-users. Typically,
MABs are owned or at least administered by MAB companies, who rent
SPI space to end-user networks. Each MAB must have its mapping
handled by a single RUAS. The company which operates the MAB may
have its own RUAS. If not, it will contract the services of an RUAS
to handle mapping distribution for this MAB. Ivip is intended to
support dozens of RUASes, perhaps a hundred or so - though if there
was a need, more than this could be accommodated.
An end-user might have multiple UABs in a MAB, UABs in multiple MABs
from the same company or UABs in MABs from multiple MAB companies.
For simplicity, this ID assumes each end-user has a single UAB.
UABs are specified by starting address and length, in units as
mentioned above: IPv4 addresses or IPv6 /64s. A MAB's boundaries are
always on power-of-two boundaries of these units, since it is a
prefix advertised in the DFZ. UABs and micronets have arbitrary
starting points and lengths - they are not at all constrained by
binary "prefix" boundaries.
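The distinction drawn above, between MABs (which must be binary prefixes) and UABs or micronets (arbitrary start and length within a MAB), can be sketched for IPv4 as follows. The function names are illustrative only.

```python
import ipaddress


def range_is_prefix(start: str, length: int) -> bool:
    """True if [start, start + length) is a binary prefix, as every MAB must be."""
    first = int(ipaddress.IPv4Address(start))
    power_of_two = length > 0 and (length & (length - 1)) == 0
    return power_of_two and first % length == 0


def range_within_mab(start: str, length: int, mab: str) -> bool:
    """UABs and micronets only need to lie inside their MAB; no alignment is required."""
    net = ipaddress.IPv4Network(mab)
    first = int(ipaddress.IPv4Address(start))
    last = first + length - 1
    return (length > 0 and first >= int(net.network_address)
            and last <= int(net.broadcast_address))
```

For example, three addresses starting at 203.0.113.5 are not a prefix, yet they form a perfectly valid UAB or micronet inside the MAB 203.0.113.0/24.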
3.4. Micronet
Following Bill Herrin's suggestion, the term "micronet" refers to a
range of SPI space for which all addresses have the same mapping. In
LISP, these are known as EID prefixes. In Ivip, a micronet need not
be on binary boundaries - it is specified by a starting address and a
length, in units of single IPv4 addresses or IPv6 /64 prefixes.
An end-user could use their entire UAB as a single micronet, or they
could split it into as many micronets as they wish, and change these
divisions dynamically.
Any micronet which is mapped to zero (its ETR address is 0.0.0.0 in
IPv4) will cause ITRs to drop any packets addressed to this micronet.
A micronet can be defined within the whole or part of a contiguous
range of address space which is currently mapped to zero, by the FMS
carrying an update message specifying the new micronet's starting
address, its length, and a non-zero address for its mapping. (Future
work: decide exactly what instructions are needed and which sequences
of operations are allowable for making new micronets in place of
existing ones.)
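Because micronets have arbitrary starting points and lengths, a longest-prefix-match table is not a natural fit for looking up their mapping. One possible approach, sketched here as an assumption rather than as Ivip's specified method, is to keep non-overlapping micronets sorted by start address and use binary search. A micronet mapped to 0.0.0.0 yields None (drop); treating an address outside every micronet the same way is a further assumption of this sketch.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple


class MicronetTable:
    """Sketch of a mapping lookup for arbitrary-length, non-overlapping IPv4 micronets.

    Micronets are (start, length, ETR) with no prefix alignment; they are kept
    sorted by start so the one covering an address is found by binary search.
    """

    def __init__(self, micronets: List[Tuple[str, int, str]]) -> None:
        self.rows = sorted((int(ipaddress.IPv4Address(s)), n, e) for s, n, e in micronets)
        self.starts = [r[0] for r in self.rows]

    def etr_for(self, dest: str) -> Optional[str]:
        """ETR address to tunnel to, or None if the packet should be dropped."""
        addr = int(ipaddress.IPv4Address(dest))
        i = bisect.bisect_right(self.starts, addr) - 1   # last micronet starting <= addr
        if i < 0:
            return None
        start, length, etr = self.rows[i]
        if addr >= start + length or etr == "0.0.0.0":
            return None          # outside every micronet, or mapped to zero
        return etr
```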
3.5. RUAS - Root Update Authorisation System
Multiple RUASes collectively generate the total stream of mapping
update messages. Each RUAS is responsible for one or more MABs.
There may be a dozen to a hundred or so RUASes. Greater numbers of
RUAS companies are good for competition and innovation. Prior to
2010-01-18 it looked technically difficult to have more than a dozen
or so RUASes. With the simplified level 0 Replicator arrangement, the
number of RUASes is limited only by the number of DTLS sessions each
level 0 Replicator (or most of them) can maintain. So in principle, if
there were a need for several thousand RUASes, I am sure the
Replicator software could be made to handle this number of DTLS
sessions.
Each RUAS receives mapping updates either directly from end-user
networks (or their appointed Multihoming Mapping companies) - or
indirectly via intermediate organisations, each of which runs a UAS.
3.6. UAS - Update Authorisation System
A UAS is the system of an organisation which accepts mapping change
commands from end-users, and conveys them directly - or perhaps
indirectly via another UAS - to the RUAS which handles the relevant
MAB. An RUAS which accepts mapping update commands from end-users
does so via its own UAS system.
A UAS accepts upstream input from end-users and/or other UASes. It
generates output to downstream RUASes and/or other UASes. One UAS
may have relationships with multiple RUASes. A MAB may be assigned
to an RUAS and control of parts of this may be delegated to multiple
UASes. A single UAS may work only with a single RUAS, or with
multiple and perhaps all RUASes.
Whether the MAB itself is administratively assigned (by an RIR, or
some national Internet Registry) to the UAS or to the RUAS is not
important in a technical sense. End-users will choose address space
with care, according to the RUAS (and any UASes) it depends upon,
because the reliability of this MAB's address space will forever be
dependent on these organisations.
If the MAB is not operated by an RUAS company, then the company or
organisation which operates it can choose any RUAS to handle its
mapping. In this case, while an end-user network may choose to rent
its SPI space from this particular MAB operating company, in part
based on the reputation of the RUAS company currently chosen by the
MAB operating company, the operating company could at any time select
another RUAS company. If it did so, it would presumably arrange for
whatever UAS system its SPI-renting customers used to work with the
new RUAS. Assuming this is the case, the end-user networks would not
perceive any change, or need to alter how they control their
mapping.
The number of RUASes will probably be limited to some degree, such as
dozens or a hundred or so, to enable them to efficiently and reliably
work together with their jointly operated system of level 0 and 1
Replicators to create a single stream of updates for the entire Ivip
system. The ability of companies with UASes to act as agents for
RUAS companies and/or to have their own MABs which they contract a
RUAS to handle the mapping for, will enable a large number of
organisations to compete in the rental of SPI space.
3.7. UMUC - User Mapping Update Command
(I apologise for the muddy sounding acronym. Finding short, unused,
meaningful, pronounceable acronyms which have not already acquired
meanings in the IETF is quite a challenge!)
A UMUC is whatever action the end-user performs on one or more
different user-interfaces of whatever UAS they use to change the
mapping of their one or more micronets. The system would also be
able to tell the user the current mapping and also confirm that a
requested change to the mapping was acceptable. In other words, the
system lets end-user networks (and/or whichever Multihoming
Monitoring company they contract to control the mapping of their
micronets) "see" (server-to-human and server-to-server) how their
UAB is broken into micronets and what ETR addresses those micronets
are mapped to.
The UAS system could also provide diagnostics such as testing the
reachability of their network via one or more ETR addresses. The
system would also enable trialling mapping changes and altered
micronet boundaries without actually executing the changes - so the
end-user network operators can manually test that their proposed
changes are valid, before actually making them.
QSDs will only accept certain kinds of updates, and it is vital that
the mapping updates are applied in the order they are sent - and that
these updates are in themselves valid. For instance, it will
probably be mandatory for micronets to be mapped to an ETR address of
0.0.0.0 before being split or joined. This rule will probably apply
firstly to mapping updates arriving in QSDs and being applied to
update the local copy of a MAB's mapping database, but also to
mapping updates sent by QSDs to any querier which previously received
mapping for a micronet whose mapping has just been changed. The
querier could be a QSD or an ITR. It will be important for the UAS
to ensure the update commands it sends to the RUAS are valid
according to these constraints.
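The zero-before-split rule can be illustrated with a minimal sketch of
one MAB's mapping as a QSD or UAS might hold it. The class and method
names are hypothetical, a micronet's length is assumed to be a count
of IPv4 addresses, and only splitting is shown (joining would check
the same condition).

```python
import ipaddress

ZERO = ipaddress.IPv4Address("0.0.0.0")


class MabMapping:
    """Hypothetical mapping table for one MAB: micronet start -> [length, ETR]."""

    def __init__(self):
        self.micronets = {}  # start address as int -> [length in addresses, ETR]

    def define(self, start, length, etr):
        # Create a micronet (overlap checks omitted in this sketch).
        key = int(ipaddress.IPv4Address(start))
        self.micronets[key] = [length, ipaddress.IPv4Address(etr)]

    def set_etr(self, start, etr):
        # Change the ETR address of an existing micronet.
        key = int(ipaddress.IPv4Address(start))
        if key not in self.micronets:
            raise ValueError(f"no micronet starts at {start}")
        self.micronets[key][1] = ipaddress.IPv4Address(etr)

    def split(self, start, lengths):
        # Replace one micronet by consecutive micronets of the given lengths.
        key = int(ipaddress.IPv4Address(start))
        if key not in self.micronets:
            raise ValueError(f"no micronet starts at {start}")
        length, etr = self.micronets[key]
        if etr != ZERO:
            raise ValueError("micronet must be mapped to 0.0.0.0 before splitting")
        if sum(lengths) != length:
            raise ValueError("new micronets must exactly cover the old one")
        del self.micronets[key]
        for n in lengths:
            self.micronets[key] = [n, ZERO]  # new micronets start with zero mapping
            key += n
```

In this sketch, splitting an 8-long micronet into lengths 4, 2 and 2
is refused until set_etr(start, "0.0.0.0") has been applied.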
In addition to testing proposed changes for validity, the UAS system
should be able to combine multiple updates into a single set, to be
executed in order, but at the same time. The complete set would be
sent on the FMS as part of a single message. Ideally the message
would be in a single payload of a packet, but if not, then the data
format will recognise when a complete set of updates is spread over
two or more payloads, and ensure the complete message is ready before
executing it. For instance, a set might map an 8-long micronet's ETR
address to zero, split it into three smaller micronets and then set
the ETR address of each. This would involve 17 commands.
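One way a receiver might hold back a multi-payload set of updates
until every part has arrived is sketched below. The fragment fields
(set identifier, part index, part count) are assumptions for
illustration only; the real data format is future work.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    set_id: int     # hypothetical identifier shared by all parts of one update set
    index: int      # position of this part within the set, starting at 0
    total: int      # number of parts making up the set
    commands: list  # update commands carried in this payload, in order


class UpdateSetAssembler:
    """Hold the parts of an update set until all have arrived."""

    def __init__(self):
        self.pending = {}  # set_id -> {index: commands}

    def receive(self, frag):
        """Store one part; return the whole ordered command list once complete, else None."""
        if not 0 <= frag.index < frag.total:
            raise ValueError("part index out of range")
        parts = self.pending.setdefault(frag.set_id, {})
        parts[frag.index] = frag.commands
        if len(parts) < frag.total:
            return None
        del self.pending[frag.set_id]
        # Concatenate parts in index order so commands execute in their intended sequence.
        return [cmd for i in range(frag.total) for cmd in parts[i]]
```

Nothing from a set is executed until receive() returns the complete
list, so a QSD never applies half of a split-and-remap operation.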
When testing proposed changes, or deciding whether to accept changes
which have been ordered with the end-user network's credentials, the
UAS system would generate an error if the mapping was to a disallowed
address - multicast, SPI space, private address space or to some
other prefixes to which the Ivip system does not support the
tunnelling of packets. Similarly, an error would be generated if
the end-user attempted to change the mapping for some address space
outside their UAB, or if they defined a new micronet within that
space with non-zero mapping, or which overlapped some addresses for
which the mapping was currently non-zero.
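A UAS-side check of a proposed UMUC might look something like the
following sketch. The SPI prefix list is a hypothetical placeholder,
the UAB is assumed to be a single contiguous range, and the check for
overlap with micronets of non-zero mapping is left out for brevity.
Python's ipaddress module may treat 0.0.0.0 as private, so the zero
mapping is exempted explicitly.

```python
import ipaddress

# Hypothetical example of SPI (Mapped) space; a real UAS would hold the actual list.
SPI_PREFIXES = [ipaddress.ip_network("20.0.0.0/14")]
ZERO = ipaddress.IPv4Address("0.0.0.0")


def check_umuc(uab, start, length, etr):
    """Return error strings for a proposed change (an empty list means acceptable).

    uab is the (first, last) address of the user's UAB; length counts IPv4 addresses.
    """
    errors = []
    first = ipaddress.IPv4Address(start)
    last = first + (length - 1)
    if first < ipaddress.IPv4Address(uab[0]) or last > ipaddress.IPv4Address(uab[1]):
        errors.append("micronet lies outside the user's UAB")
    etr = ipaddress.IPv4Address(etr)
    if etr == ZERO:
        return errors  # mapping to zero (drop packets) is always allowed
    if etr.is_multicast:
        errors.append("ETR address is multicast")
    if etr.is_private:
        errors.append("ETR address is private")
    if any(etr in net for net in SPI_PREFIXES):
        errors.append("ETR address is in SPI space")
    return errors
```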
For the sake of discussion, it will be assumed that all UMUCs have
passed these validity tests at the UAS and are for valid mapping
addresses - so a UMUC is a successfully accepted update command from
the end-user, or from some person or system with the end-user's
credentials.
There could be many methods by which this command is communicated to
the UAS, including HTTPS web forms with username and password
authentication. SSL sessions might be more suitable for automated
mapping change systems, such as those of a Multihoming Monitoring
company which the end-user authorises to control the mapping of some
or all of their UAB.
In addition to authentication, the command consists of the starting
address of the micronet, the length of the micronet, and a single ETR
IP address to which this micronet's mapping will be changed.
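As an illustration only, a UMUC's three fields could be carried in a
fixed 12-byte binary form. The layout below (start address, 32-bit
length, ETR address, all in network byte order) is an assumption, not
a defined Ivip format.

```python
import ipaddress
import struct
from typing import NamedTuple

UMUC_FORMAT = "!4sI4s"  # hypothetical: start, 32-bit length, ETR; network byte order


class Umuc(NamedTuple):
    start: ipaddress.IPv4Address  # first address of the micronet
    length: int                   # number of IPv4 addresses in the micronet
    etr: ipaddress.IPv4Address    # new ETR address (0.0.0.0 means drop packets)


def encode_umuc(u: Umuc) -> bytes:
    return struct.pack(UMUC_FORMAT, u.start.packed, u.length, u.etr.packed)


def decode_umuc(data: bytes) -> Umuc:
    start, length, etr = struct.unpack(UMUC_FORMAT, data)
    return Umuc(ipaddress.IPv4Address(start), length, ipaddress.IPv4Address(etr))
```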
3.8. SUMUC - Signed User Mapping Update Command
This is the information contained in a UMUC, signed by the UAS which
accepted it from the user (or by some other UAS), being handed down
the tree to another UAS or to the RUAS of the tree, so that the
recipient UAS/RUAS can verify the signature and regard the UMUC as
authoritative.
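The signing and checking of SUMUCs can be sketched as follows. To keep
the sketch within Python's standard library, an HMAC-SHA256 tag under
a shared per-UAS key stands in for the public-key signature described
above; a real system would use a signature scheme, so that recipients
need only the signer's public key. The message layout and key table
are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def sign_umuc(uas_key: bytes, uas_id: str, umuc: bytes) -> bytes:
    """Hypothetical SUMUC layout: signer id, NUL, UMUC bytes, then the tag."""
    body = uas_id.encode() + b"\x00" + umuc
    return body + hmac.new(uas_key, body, hashlib.sha256).digest()


def verify_sumuc(keys: dict, sumuc: bytes):
    """Return the UMUC bytes if the tag checks out under the named UAS's key, else None."""
    body, tag = sumuc[:-TAG_LEN], sumuc[-TAG_LEN:]
    uas_id, sep, umuc = body.partition(b"\x00")
    key = keys.get(uas_id.decode("utf-8", errors="replace"))
    if not sep or key is None:
        return None
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return umuc if hmac.compare_digest(expected, tag) else None
```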
3.9. MABUS - Update Stream specific to one MAB
This is a stream of data by which the real-time updates to the
mapping data for any one MAB are conveyed. For the purposes of
discussion, the RUASes and the Launch system are assumed to work in a
synchronized fashion, generating a body of updates for each MAB which
are gathered together in some way over a short period of time. Prior
to 2010-01-18, I assumed the whole FMS would operate on one-second
cycles. Now, the core of the FMS - the Replicator system - is
asynchronous and the best thing would be for RUASes to send packets
along it in a reasonably even manner, but coordinated so as not to
exceed some agreed total maximum data rate in any period such as 0.1
seconds.
Mapping changes are typically not urgent to the point of not being
able to wait a second or so. So it would make sense for an RUAS to
bundle multiple updates for one MAB together, before sending them to
the FMS, either alone in a packet payload, or together with updates
for other MABs.
For the purposes of discussion, we can imagine each RUAS buffering
changes for any one MAB for up to a second in order to collect them
together. Of course, for some MABs, hours or even days may pass
without a mapping change. This discussion is intended to explore the
more demanding scenarios.
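The per-MAB buffering described above could be as simple as the
following sketch, which releases each MAB's batch once its oldest
pending update has waited one second. The class name and the
injectable clock are assumptions made for illustration and testing.

```python
import time
from collections import defaultdict


class MabBuffer:
    """Collect updates per MAB; release a MAB's batch once it is max_wait seconds old."""

    def __init__(self, max_wait=1.0, clock=time.monotonic):
        self.max_wait = max_wait
        self.clock = clock
        self.batches = defaultdict(list)  # MAB id -> pending updates, in arrival order
        self.first_seen = {}              # MAB id -> arrival time of its oldest pending update

    def add(self, mab, update):
        self.first_seen.setdefault(mab, self.clock())
        self.batches[mab].append(update)

    def due(self):
        """Remove and return {mab: updates} for every MAB whose oldest update has waited max_wait."""
        now = self.clock()
        ready = [m for m, t in self.first_seen.items() if now - t >= self.max_wait]
        out = {}
        for m in ready:
            out[m] = self.batches.pop(m)
            del self.first_seen[m]
        return out
```

MABs with no pending updates produce nothing, matching the
observation that many MABs may go hours without a mapping change.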
Each RUAS will generate one MABUS for each of its MABs. So each
second or so, the RUASes collectively generate a variable length body
of update information for every MAB in the Ivip system. Some or many
of these may contain no updates. The MABUS includes mapping changes
(altering ETR addresses of existing micronets), changes to micronet
boundaries and snapshot messages (described above). The data format
would be extensible for purposes not yet anticipated.
3.10. Level 0 Replicators
A small number (such as 8) of widely dispersed Replicators which
receive packets from all the RUASes on a continual basis, and where
each one also sends a stream of whatever it received to each other
one. This is a "fully meshed" set of Replicators. These are the
only ones to receive packets from RUASes and the only ones to drive
Replicators in the other levels.
3.11. Level 1 and greater Replicators
A cross-linked, tree-like, system of Replicators form a redundant,
reliable, high-speed distribution system for delivering mapping
updates to full database ITRs and Query Servers all over the Net.
Each Replicator receives one or more (typically two) streams of
update packets from upstream Replicators or Launch servers. These
two source streams should come from topologically well-separated
sources, ideally over two separate physical links. For instance a
Replicator in Berlin might receive its update streams from London and
Berlin, from two sources in Berlin which are in different ISP
networks, or from any combination which minimises the likelihood that
both sources will be disrupted by any one fault.
The Replicator identifies the DTLS payloads of each packet by the
"Fresh / Repeat" algorithm, which is described in:
[I-D.whittle-ivip-fpr]. The first time a packet with a particular
payload arrives at a Replicator, it is detected as being "Fresh" and
then the payload is replicated as DTLS packets to all the downstream
devices, which can be Replicators, QSDs or MPSes. When another
packet with the same payload arrives later, as it probably will from
the other input stream, the second one is recognised as a "Repeat"
and no further action is taken with it.
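The Fresh / Repeat behaviour can be approximated by remembering
digests of recently seen payloads. This is only a sketch, assuming
payload identity can be judged by a SHA-256 digest held in a bounded
table; the actual algorithm is the one specified in
[I-D.whittle-ivip-fpr] and may differ.

```python
import hashlib
from collections import OrderedDict


class FreshRepeatFilter:
    """Classify payloads as Fresh or Repeat using digests of recently seen payloads."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity     # hypothetical bound on remembered digests
        self.seen = OrderedDict()    # digest -> None, oldest first

    def is_fresh(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).digest()
        if digest in self.seen:
            return False             # Repeat: already replicated, take no action
        self.seen[digest] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # forget the oldest digest
        return True                  # Fresh: replicate downstream


def on_payload(flt, payload, downstream):
    """Replicate a Fresh payload to every downstream sender; ignore Repeats."""
    if flt.is_fresh(payload):
        for send in downstream:
            send(payload)
```

The bound matters: a payload whose digest has been forgotten would be
treated as Fresh again, so the capacity must cover the longest
plausible gap between the two copies of a payload arriving.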
At present I am assuming each Replicator will receive typically two
streams and send typically 20 streams. However, it may be possible
to have many more output streams, such as 50 or 100.
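As a rough check that a few levels suffice, the following arithmetic
assumes every device below level 0 takes two input streams and every
Replicator (including each of the 8 level 0 Replicators) sends 20
output streams, so each level has about ten times as many devices as
the one above it. These are the illustrative figures from the text,
not a specification.

```python
def devices_per_level(level0=8, inputs=2, outputs=20, levels=5):
    """Approximate number of devices at each level, starting with level 0.

    Assumes every device below level 0 takes `inputs` streams and every
    Replicator sends `outputs` streams, so each level grows by outputs / inputs.
    """
    counts = [level0]
    for _ in range(levels):
        counts.append(counts[-1] * outputs // inputs)
    return counts


print(devices_per_level())  # [8, 80, 800, 8000, 80000, 800000]
```

Under these assumptions the devices fed by level 4 Replicators number
about 800,000, consistent with reaching hundreds of thousands of QSDs;
with 50 output streams the same count is reached in fewer levels.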
Replicators could be implemented in routers, but are probably best
implemented in ordinary software on a GNU-Linux/BSD etc. COTS
(Commercial Off The Shelf) server. Replicators do not cache
information and need no hard drive storage. A server performing as a
QSD could also operate as a Replicator.
3.12. QSD - Query Server with full Database
QSDs get a full feed of updates from one or more Replicators. When
they boot, they download individual snapshot files for each MAB in
the Ivip system.
QSDs respond immediately to queries from nearby ITRs and from caching
Query Servers (QSCs) - and send notifications to these if mapping
data changes for a micronet which was the subject of a recent query.
QSDs have no routing or traffic handling functions. In a full-scale
billion-plus micronet deployment they need a lot of memory, so the
best way to implement a QSD is probably on an ordinary server with
one or more gigabit Ethernet interfaces. No hard drive is required,
except perhaps for logging purposes.
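A QSD needs to remember which queriers recently asked about each
micronet, so it can notify them of changes. A minimal sketch follows,
assuming a hypothetical fixed "recent" window of 600 seconds and
identifying a micronet by a (start address, length) pair; how long a
querier should be regarded as holding the mapping is not specified
here.

```python
import time


class NotificationTable:
    """Remember who recently queried each micronet so they can be notified of changes."""

    def __init__(self, hold_time=600.0, clock=time.monotonic):
        self.hold_time = hold_time   # hypothetical: how long a query counts as "recent"
        self.clock = clock
        self.queriers = {}           # micronet -> {querier: time of last query}

    def record_query(self, micronet, querier):
        self.queriers.setdefault(micronet, {})[querier] = self.clock()

    def on_mapping_change(self, micronet):
        """Return, sorted, the queriers that should be sent a notification for this micronet."""
        now = self.clock()
        recent = self.queriers.get(micronet, {})
        return sorted(q for q, t in recent.items() if now - t <= self.hold_time)
```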
3.13. QSC - Query Server with Cache
A QSC could be implemented in a router or more likely a COTS server.
It does not route packets, and its memory and computational
requirements will be modest compared to those of a QSD. There is no
need for a full feed of updates from the Replicator system. However,
each QSC must be able to get mapping information from one or more
upstream QSDs - or via upstream QSCs which themselves access upstream
QSDs.
The easiest way to implement a QSC would be software on a modest
server, which would only need a hard drive for logging purposes.
4. Update Authorities and User Interfaces
This section is a detailed discussion of the fast push mapping
distribution system itself, starting with the systems which accept
commands from end-users (or their authorised representatives or
systems) and prepare the information to be fanned out worldwide via
the level 0 Replicators.
This is the early stage of an ambitious design, so a number of
options are contemplated. This section of the system may not need
IETF standardised protocols, since only a small number of
organisations need to interact to make it work. The Replicators and
the data format of mapping updates do need to be standardized. The
purpose of exploring the RUAS and Launch server systems is to
estimate the difficulty of constructing them - and hopefully to show
that an approach like this is feasible and desirable. There may well
be easier approaches than the ones explored here.
Probably the closest thing to them would be the large scale systems
for managing DNS, such as for .com and other major TLDs. I don't
know anything about these and people with experience in such systems
could probably design the UAS, RUAS and perhaps Launch server systems
better than I could.
The real-time nature of these systems for controlling ITR behavior has
no precedent. Generally, the system should work on a continual
basis. However, if there is a technical problem or the system is
stopped for a few minutes to do an upgrade or whatever, the Internet
is not going to grind to a halt. In that downtime, end-user networks
which experience a multihoming failure will have to wait for their
connectivity to be restored. Likewise, end-user networks which send
mapping changes for inbound TE will have to wait. The effect on TTR
mobility would be minor, since mapping changes are not required when
the MN changes its physical connections, including when moving to an
entirely different access network. The delay in mapping changes
means that those few MNs which have chosen a new, closer, TTR will
need to wait for traffic to be tunneled to that new TTR - meaning
they will need to keep up the tunnel to the old, and now more
distant, TTR for these minutes. Normally, with mapping changes
getting to ITRs in a few seconds, the MN could terminate the tunnel
to the old TTR within a few seconds of the ITRs beginning their
tunneling to the new TTR.
The final authority to control mapping information is fully devolved
to end-users, who by means of a username and password or some other
authentication method, are able to issue commands to define micronets
within their UAB, and to map each micronet to any ETR address.
However the physical authority to control the mapping of all Mapped
space within a single MAB rests with a single RUAS. That RUAS may be
acting for a UAS which administers a MAB. The RUAS may administer
it - perhaps on behalf of another company - and may delegate control
of parts of it to one or more UASes. The RUAS may have relationships
directly to the end-users of this MAB, through its own UAS. Here we
discuss the flow of information and trust between these various
entities, in real-time, so that every second or so each RUAS
assembles a body of update information for each of its MABs.
In the diagrams below, each RUAS or UAS is depicted as a single
entity. Each such entity acts as a single functional block, but
would typically be implemented as a redundant system over several
servers.
4.1. RUAS Outputs
4.1.1. Update packets to level 0 Replicators
Each RUAS is largely autonomous in when it generates packets to be
sent to level 0 Replicators. Ideally it would spread its packets out
smoothly in time. Ideally it would also send fewer, larger packets
rather than more numerous small ones.
In future work I intend to describe a means by which the RUASes
collectively manage the data capacity of the FMS. One aspect of this
is usage fees of some kind. Since the FMS is a shared resource,
which burdens Replicators, QSDs and MPSes all over the world
according to the packets it carries, there needs to be an arrangement
whereby RUASes don't send packets for no good reason. Since RUASes
will be charging end-user networks, directly or indirectly, for each
mapping change, there will probably be some kind of traffic-based
usage fees or settlement system amongst the RUASes which collectively
run the first two or more levels of the Replicator system.
Exactly how this will be done commercially does not need to be
defined. What matters is that the technical elements can feasibly be
used in a way which supports a shared, cooperative, effort to run the
system reliably and in a way that no RUAS places unreasonable burdens
on other parties. There would probably need to be some kind of
agreement, consortium or the like for governing the FMS. The design
presented here is to show that such a system could work well, not
depend on any one RUAS or device, and that it could support a large
enough number of RUAS companies, with RUAS systems and the level 0
Replicators, physically dispersed in many countries.
Another aspect is the moment-to-moment management of the total volume
of packets sent. This would be partly a question of the number of
packets and mainly a question of their total length - in bits per
second over some short time period such as 0.1 seconds or so.
While data rates would grow over the years, at any one point in time,
the whole FMS system would have some kind of specification for the
peak data rate of the packets it carries. If this was 100kbps, then
each Replicator which accepts two input streams would need to ensure
its data links from the two upstream replicators could, in general,
handle this data rate with minimal chance of packet loss.
The operators of Replicators, QSDs and MPSes need some guidance on
peak bandwidth, and the only way to ensure the level 0 Replicators do
not send out greater than this bandwidth is some kind of real-time
demand balancing arrangement between the RUASes.
RUASes will probably have widely varying needs to send updates, and
these may change with time of day, due to a flurry of multihoming
mapping changes resulting from a network outage or for any other
reason. At each point in time, each RUAS needs a "quota" - a
quantity of data, in bytes, which is the limit of the total packets
it is allowed to send in the next time period, which may be 0.1, 0.2
or some other fraction of a second. If the RUAS needs to send more
packets than this, it should buffer the data, request a higher quota,
and only send the packets if and when it has received a higher quota.
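The quota behaviour can be sketched as a sender which passes packets
on only while they fit within the current period's byte allowance,
keeping the rest in order. For scale: with a system-wide peak of
100kbps and 0.1 second periods, all RUASes together would share 10,000
bits (1,250 bytes) per period. The class and method names are
hypothetical, and how a higher quota is requested and granted is left
open, as in the text.

```python
from collections import deque


class QuotaSender:
    """Send packets only within the current period's byte quota; buffer the rest."""

    def __init__(self, quota_bytes, send):
        self.quota = quota_bytes   # bytes this RUAS may send in the current period
        self.used = 0              # bytes already sent in the current period
        self.send = send           # callable that actually transmits a packet
        self.backlog = deque()     # packets awaiting quota, in order

    def submit(self, packet: bytes):
        self.backlog.append(packet)
        self._drain()

    def new_period(self, quota_bytes=None):
        """Start the next period (e.g. every 0.1 s), optionally with a newly granted quota."""
        if quota_bytes is not None:
            self.quota = quota_bytes
        self.used = 0
        self._drain()

    def needs_more_quota(self) -> bool:
        # A non-empty backlog is the signal to request a higher quota.
        return bool(self.backlog)

    def _drain(self):
        # Send in order while the next packet fits; never reorder mapping updates.
        while self.backlog and self.used + len(self.backlog[0]) <= self.quota:
            packet = self.backlog.popleft()
            self.used += len(packet)
            self.send(packet)
```

A packet larger than the whole quota would never be sent by this
sketch, so a real RUAS would need to request a larger quota or split
its updates across packets.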
Since the quota represents the right to use this shared resource, and
the sending of packets involves the actual use of this right, it is
likely that some kind of market forces will govern how the capacity
of the system is divided, moment-to-moment. There could be many ways
of arranging this, and it doesn't need to be standardised by the
IETF. The RUAS companies will need to work together, choose who to
accept as new RUAS companies, decide how to share the burdens of any
common infrastructure etc.
4.1.2. MAB snapshots
Every few minutes (or some other time period, as chosen by the RUAS,
but with some reasonable maximum defined by a BCP) the RUAS makes a
copy of the complete mapping information for a MAB. Snapshots for
each MAB are independent of each other, and so can be done with
different frequencies.
The snapshot is in a format which needs to be standardized, so it can
be downloaded and understood by any QSD, now and in the future. This
data format needs to be extensible to cover new kinds of mapping
information and other functions not yet anticipated - which will be
ignored by devices which are not capable of these functions.
The exact format for this is for future work, but for instance would
begin with some identifying information about the MAB, a block
defining that the following data concerns IPv4 micronet mapping
information (and snapshot announcements), with the possibility of
other blocks containing different kinds of data. Binary format would
probably be best, and the file could then be compressed with gzip
etc.
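To make the idea concrete, here is a sketch of one possible binary
snapshot layout: a header with the MAB's start address and length,
followed by typed, length-prefixed blocks, the whole file compressed
with gzip. The magic bytes, block type numbers and record layout are
all assumptions; the point shown is that a reader can skip block
types it does not understand, which keeps the format extensible.
Signing of the file is omitted.

```python
import gzip
import ipaddress
import struct

MAGIC = b"IVSN"     # hypothetical 4-byte file signature
BLOCK_IPV4 = 1      # hypothetical block type: IPv4 micronet mapping
RECORD = "!4sI4s"   # micronet start, length (addresses), ETR address


def write_snapshot(mab_start, mab_length, micronets):
    """micronets: iterable of (start, length, etr); returns gzip-compressed bytes."""
    body = b"".join(
        struct.pack(RECORD, ipaddress.IPv4Address(s).packed, n,
                    ipaddress.IPv4Address(e).packed)
        for s, n, e in micronets)
    header = MAGIC + struct.pack("!4sI", ipaddress.IPv4Address(mab_start).packed, mab_length)
    # Each block carries its type and byte length so readers can skip unknown types.
    block = struct.pack("!HI", BLOCK_IPV4, len(body)) + body
    return gzip.compress(header + block)


def read_snapshot(data):
    raw = gzip.decompress(data)
    if raw[:4] != MAGIC:
        raise ValueError("not a snapshot file")
    start, length = struct.unpack_from("!4sI", raw, 4)
    micronets, pos = [], 12
    while pos < len(raw):
        btype, blen = struct.unpack_from("!HI", raw, pos)
        pos += 6
        if btype == BLOCK_IPV4:
            for off in range(pos, pos + blen, 12):
                s, n, e = struct.unpack_from(RECORD, raw, off)
                micronets.append((str(ipaddress.IPv4Address(s)), n,
                                  str(ipaddress.IPv4Address(e))))
        pos += blen  # unknown block types are skipped entirely
    return str(ipaddress.IPv4Address(start)), length, micronets
```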
Each such file will be given a distinctive name, according to a
standardised format, which indicates at least the MAB starting
address and length, and the time of the snapshot.
The snapshot process will take a second or two to complete from the
time it is initiated, and the resulting file will be copied to a
number of servers, ideally located in a variety of locations around
the Net.
Each such server would be run by the RUAS directly, or as part of all
RUASes working together. The servers can probably be conventional
HTTP servers, so that QSDs can download the snapshots when needed.
There is scope for some careful design with DNS so that there is an
automatic structure in the domain names of these servers, enabling an
expandable system to be automatically used by QSDs without manual
configuration.
These files will be publicly available, and need to be made available
for somewhat longer than the cycle time of snapshots. So with a ten
minute snapshot cycle, the previous snapshot should be available for
a while - probably 10 minutes or so - after the new one is available.
Snapshots are downloaded by QSDs when they boot, and whenever they
suffer a disruption in mapping updates which necessitates a reload of this
part of the complete mapping database. To facilitate this, MABs
should not be too large in terms of IPv4 addresses or IPv6 /64s - or
at least should not contain too many micronets - which would make
individual snapshot files excessively large.
At boot time, or when re-synching, the QSD will monitor the update
streams for each MAB until a snapshot announcement is found. It will
then buffer all subsequent updates and download the snapshot as soon
as it is available. Once the snapshot has arrived, and been unpacked
to RAM, the buffered updates are applied to it. Then, this MAB's
part of the mapping database is up-to-date and the QSD can begin
using it to answer queries. (During the re-synching operation, the
QSD will need to tell a querier it can't answer the query, or may
buffer the query and send the same query to another QSD, passing on
the response when it arrives.)
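The re-synching procedure can be sketched as a small state machine for
one MAB. The stream item shapes, the snapshot name and the dictionary
used as the mapping table are hypothetical; the sketch also assumes
the snapshot reflects exactly the updates sent before its
announcement.

```python
class MabResync:
    """Re-synchronise one MAB's part of the mapping database in a QSD."""

    def __init__(self):
        self.state = "waiting"    # waiting -> buffering -> ready
        self.snapshot_name = None # which snapshot file to download
        self.buffered = []        # updates received after the snapshot announcement
        self.table = None         # micronet -> ETR address, once loaded

    def on_stream_item(self, kind, value):
        # Hypothetical items: ("announce", snapshot name) or ("update", (micronet, etr)).
        if self.state == "waiting":
            if kind == "announce":
                self.snapshot_name = value
                self.state = "buffering"
        elif self.state == "buffering":
            if kind == "update":
                self.buffered.append(value)
        elif kind == "update":
            micronet, etr = value
            self.table[micronet] = etr   # normal operation: apply at once

    def on_snapshot_loaded(self, table):
        # Apply, in order, every update that followed the announcement.
        for micronet, etr in self.buffered:
            table[micronet] = etr
        self.table, self.buffered, self.state = table, [], "ready"

    def can_answer_queries(self):
        return self.state == "ready"
```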
In order to reduce total path lengths for these file downloads, and
likewise for retrieving missing packets from the same servers, it
would be desirable if each QSD in a given location could access a
nearby snapshot server. It may be desirable to have every snapshot
of every MAB in a single server, or a single set of servers which are
accessed by geographically close QSDs. Anycast is not a good
technology for this, since file retrieval is best done via TCP
sessions. The servers need to be on conventional addresses, rather
than SPI addresses, so the QSDs can access them without needing to
use ITRs which themselves depend on mapping. Likewise, any DNS
servers involved in this server system need to be strictly on
conventional addresses.
Each QSD needs to be configured with, or to automatically discover,
two or more such servers - at least one of which is relatively close
- so the data can be found despite one server being down.
From the point of view of the QSD, seeking a snapshot for a given MAB
of a particular RUAS, the address to request the file from could be
made up from the RUAS identifier yyyy which is contained in the
snapshot announcement (in the stream of mapping updates),
concatenated with a locally configured "xxxxx" and
"ipv4.ivipservers.net". In the event that this server was
unavailable one or more locally configured alternatives to this
initial "xxxxx" value could be tried - including one or more for
nearby countries.
The most significant 24 bits of the MAB's starting address (probably
48 bits for IPv6, assuming this is the granularity of BGP
advertisements) would be transformed into a text string such as
150.101.072. A similar transformation of the precise time of the
snapshot would result in a second text string, and these would be
used to reliably identify the appropriate directory and file in the
server.
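One possible rendering of the server name and file path is sketched
below. The ordering of the labels, the time encoding and the
".snap.gz" suffix are assumptions; "yyyy", "xxxxx" and
"ipv4.ivipservers.net" are the illustrative placeholders used in the
text.

```python
import datetime
import ipaddress


def snapshot_host(ruas_id, local_part, family="ipv4"):
    # Hypothetical ordering: RUAS id, locally configured label, then the base domain.
    return f"{ruas_id}.{local_part}.{family}.ivipservers.net"


def mab_label(mab_start):
    """Most significant 24 bits of an IPv4 MAB start, as zero-padded dotted decimal."""
    octets = ipaddress.IPv4Address(mab_start).packed[:3]
    return ".".join(f"{o:03d}" for o in octets)


def time_label(when):
    # Hypothetical time encoding: UTC, to the second.
    return when.astimezone(datetime.timezone.utc).strftime("%Y%m%d-%H%M%S")


def snapshot_path(mab_start, when):
    return f"/{mab_label(mab_start)}/{time_label(when)}.snap.gz"
```

For a MAB starting at 150.101.72.0, mab_label() yields the
"150.101.072" string given in the text.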
4.1.3. Missing Payload Servers (MPSes)
Until 2010-01-18 I planned for QSDs to download the payloads of any
packets they had missed from one of several HTTP servers, as described
above for snapshot files - where those servers would be run by each
RUAS. This may be possible and desirable, but please see
[I-D.whittle-ivip-fpr] for a description of a distributed arrangement
of Missing Payload Servers which QSDs could access to obtain any
payloads which did not arrive via their typically two input streams
from level ~4 Replicators.
ISPs and larger end-user networks would run these MPSes and they
would be linked by HTTP or HTTPS so each could query the other,
obtaining payloads each one was missing. These TCP-based links are
not subject to any PMTU constraints, since payloads of any length
can be sent via HTTP or some other query-response protocol.
QSDs would query one or more MPSes as needed, with persistent or
temporary HTTP or HTTPS sessions.
To the extent that missing packets result from local outages, it is
more likely that a topologically distant MPS will have the payloads a
local MPS or QSD is most likely to want. So HTTP or HTTPS links
across oceans and continents would naturally be used by ISPs which
wanted to run MPSes - for mutual benefit.
4.2. Authentication of RUAS-generated data
Careful consideration must be given to how QSDs can quickly and
reliably ensure that the information they receive ostensibly from
each RUAS is genuine.
The DTLS links between Replicators and to QSDs will prevent an
attacker injecting bogus payloads into the FMS. But there's no way a
QSD could be entirely sure that all its upstream Replicators, which
could be quite numerous (2 above, 2 above each of them, 2 above each
of them etc.) are not under the control of an attacker. Being able
to direct traffic to an attacker's site, by means of altering the
mapping information in an ITR, is such a threat to security, and such
an attractive proposition for attackers, that some kind of digital
signing of the mapping update information will be required.
4.2.1. Snapshot and missing packet files
Each RUAS has a key pair and signs the MAB snapshot and missing
packet files with its private key. QSDs can verify the signature
with the RUAS's public key, subject to a PKI arrangement of
certificates, or some other simpler arrangements.
Both these types of files are only handled occasionally, so the
overhead in performing crypto operations is insignificant.
4.2.2. Mapping updates
This principle does not apply to the update information contained in
packets received from the Replicator system. The system needs to be
highly secure against attack, because even a second or two of an ITR
mapping packets to the attacker's site constitutes an unacceptable
breach.
Sometimes, possibly frequently, the RUAS will send a single packet,
and the QSD needs to be able to authenticate this information
independent of any which follows a second or two later, because it
needs to use the information immediately to update its local copy of
the mapping database. So there will frequently be a need to
authenticate individual packets.
There are multiple ways of solving this problem. I doubt anyone
would argue that it is so difficult as to warrant the abandonment of
the entire fast-push, local query server concept. With more work
later, I believe a satisfactory method can be found for the QSD to
ensure the updates are authentic before applying them.
4.3. RUAS - UAS interconnection
This section depicts a single tree of delegated responsibility for
the user control of mapping of one MAB. The Root UAS at the base of
the tree is run by Company X - RUAS-X. RUAS-X could be authoritative
for other MABs, and each such tree of delegation may have the same
set of other UAS systems, or it could be different. Each delegation
tree is separate from the delegation trees of other MABs, even if
they look similar, because the tree includes specific subsets of the
whole MAB address range as one of the defining characteristics of its
branches and leaves.
The initial action which leads to the database being changed is a
user generated (manually or by the user's equipment or by a system
authorised by the user) UMUC (User Mapping Update Command).
For authorising and feeding UMUCs to the RUAS-X, there is a tree as
depicted in Figure 1. Delegation of authority flows up the tree as
the total address range of the MAB is split at each branching
junction. This tree structure involves data, in the form of SUMUCs
(Signed User Mapping Update Commands) flowing down towards the root
of the tree. (Data would also flow up the tree so each user-
interface leaf could tell end-users what their current mapping was,
could test their requests against constraints etc.) The idea is that
RUAS-X could delegate control of one or more subsets of the MAB's
total range of addresses to some other system, which in turn could
delegate control to other systems. There would be no absolute limit
on the height (usually called depth) of these hierarchies.
The RUAS maintains the master database, for each of its MABs, of what
the mapping, division into micronets etc. actually *is*. This
information is used to inform UASes of the current state, which they
can convey to end-users and use to check the validity of requests
from these end-users. This information is also used to generate
snapshot files. As the mapping in the master database is changed,
this gives rise to actual changes which must be assembled into
MABUSes to be sent to the level 0 Replicators in the near future.
The servers which handle the end-user interaction need to be one of
the leaves of this tree structure, so as not to burden the RUAS-X
database servers themselves with details of user interaction. This
enables various companies to give different kinds of control for the
mapping of the SPI space their branch of the tree controls. Figure 1
does not show RUAS-X having any user interface servers, but it could.
The simplest arrangement would be the RUAS having simply a user-
interface server and no tree of other UASes.
There would need to be IETF standardised methods by which some server
could execute a UMUC with the user-interface servers of any of these
UASes. This standardisation would be especially important for
multihoming, because some reasonably trusted company could run an
automated monitoring system, and have the credentials (username,
password, key etc.) stored in their system so their system can change
the mapping of one or more micronets the moment one link was detected
to be faulty. It is vital that there be a standardised method by
which all multihoming monitoring companies could send these mapping
change commands (and queries about the current state of mapping) to
UASes. Also, the company (such as X, Y or Z in Figure 1) which
controls a particular range of the Mapped space may offer such a
multihoming monitoring system itself.
The tree in this example controls an MAB with the address range
20.0.0.0 to 20.3.255.255. In this example, company X has been
assigned by an RIR the entire range 20.0.0.0 to 20.3.255.255.
Company X leases to Y a quarter of this: 20.1.0.0 to 20.1.255.255.
These divisions are on binary boundaries, but they need not be. It
would be just as possible for X to delegate to Y an arbitrary subset
of the whole range, or the entire range - or just one IPv4 address or
IPv6 /64.
X's Root Update Authorisation Server (RUAS) has a private key for
signing all the MAB snapshot files it periodically creates and makes
available. The same key would be used to sign the mapping change
information for each MAB which is sent to the level 0 Replicators
and so to all QSDs.
In this example, company Y delegates control of some of its space to
company Z, and Z has an end-user U, who needs to control the mapping
of a UAB containing one or more micronets in Z's range.
User-R User-S User-T User-U Multihoming
\ \ | | Monitoring
\ \ | | Inc.
\ ................. /
\----. Web interface .---/
. other protocols .
. etc. .
....UAS-Z........
|
Other companies |
like Y and Z |
/-----<----/
| | \ | /
| | \|/
| | UAS-Y
\ | |
\ | /----<-----/
\ | /
\|/
RUAS-X Root Update Authorisation Server company X
| \
| \
V \->-[ Multiple web servers for MAB snapshot ]
|
| Other RUASes like RUAS-X, each authoritative
| for mapping one or more MABs and producing
| regular MAB snapshots and update streams
| which are sent to all level 0 Replicators.
\
\ | | | /
\ | | | /
\ | | | /
\ | | | /
\ | | | /
\ | | | |
| | | | |
V V V V V
| | | | |
Each line depicts 8 streams of packets with
identical payloads - one stream for each of
the 8 level 0 Replicators.
Figure 1: Delegation tree of UASes above one RUAS. Multiple RUASes
all drive their mapping updates to every level 0 Replicator. These
fan the packets out to hundreds of thousands of QSDs all over the
world, in a second or so.
Z has various interfaces by which U can do this, with its own
arrangements for authentication, for monitoring a multihoming system
and making changes automatically etc. Ideally there might be one or
more automated, host-to-server, IETF-standardised protocols so all
end users and their appointed multihoming monitoring companies could
have standardised software for talking to whichever company's servers
they use to control the mapping of their IP address(es).
When User-U (or a device or system with User-U's credentials) changes
the mapping of their micronet via a web interface, this is done
through Z's website, with User-U authenticating him-, her- or it-self
by whatever means Z requires. This causes UAS-Z to generate a signed
copy of this update command (a SUMUC) and to send it to UAS-Y. A
SUMUC may include multiple commands to be executed in order.
The simplest SUMUC would be a change to the ETR address of an
existing micronet. This would consist of three items (assuming IPv4
for simplicity): the starting address of the micronet this update
covers, the number of IP addresses in the micronet to be changed
(>=1) (or alternatively the last address of the micronet), and a new
mapping value: a 32-bit ETR address. The SUMUC could also include a
time in the future at which the update should be executed. In that
case, it would be stored by RUAS-X and sent to the FMS at the
appointed time.
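The three items of this simplest SUMUC can be pictured as a
fixed-size record. This ID does not define a wire format, so the
layout below (three unsigned 32-bit fields in network byte order:
start address, number of addresses, new ETR address) is purely an
assumption for illustration, as are the names encode_change and
decode_change. An ETR address of 0.0.0.0 represents zeroing the
mapping. The signature and the optional execution time are omitted.

```python
import ipaddress
import struct

# Hypothetical layout: start, length, ETR as big-endian uint32 (12 bytes).
CHANGE_FMT = "!III"

def encode_change(start, length, etr):
    """Pack a hypothetical 'change ETR address' command for one micronet."""
    start_int = int(ipaddress.IPv4Address(start))
    if length < 1:
        raise ValueError("a micronet covers at least one address")
    if start_int + length > 2 ** 32:
        raise ValueError("micronet runs past the end of IPv4 space")
    return struct.pack(CHANGE_FMT, start_int, length,
                       int(ipaddress.IPv4Address(etr)))

def decode_change(data):
    """Unpack the record back into (start, length, etr) strings/ints."""
    start, length, etr = struct.unpack(CHANGE_FMT, data)
    return (str(ipaddress.IPv4Address(start)), length,
            str(ipaddress.IPv4Address(etr)))

# Remap the 8-address micronet starting at 20.1.0.8 to a new ETR.
record = encode_change("20.1.0.8", 8, "192.0.2.1")
print(len(record), decode_change(record))  # -> 12 ('20.1.0.8', 8, '192.0.2.1')
```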
Mapping change commands would also include commands to join and split
micronets. Sequences of these commands would be sent in order, and
the UAS should check their validity before putting them into a SUMUC.
So a SUMUC consists of one or more mapping change commands concerning
a particular micronet, or perhaps a set of micronets. The commands
will be executed in order, but as if at once.
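One way to obtain "executed in order, but as if at once" semantics is
to apply the commands to a private copy of the mapping and publish
the copy only if every command succeeds. The sketch below assumes a
toy representation (a dict from a micronet's start address, as an
integer, to a (length, ETR) pair) and three hypothetical command
names, set, join and split; the real command set is not specified
here. The rule that only adjacent micronets with the same mapping may
be joined is also an assumption, chosen to match the practice of
zeroing two micronets before joining them.

```python
def apply_sumuc(table, commands):
    """Apply mapping change commands in order to a copy of `table`.

    `table` maps a micronet's start address (an int) to (length, etr).
    If any command is invalid an exception is raised and `table` is left
    untouched, so the SUMUC takes effect entirely or not at all."""
    new = dict(table)                   # values are immutable tuples
    for cmd in commands:
        op = cmd[0]
        if op == "set":                 # ("set", start, etr); "0.0.0.0" zeroes
            _, start, etr = cmd
            length, _old = new[start]   # KeyError if no such micronet
            new[start] = (length, etr)
        elif op == "join":              # ("join", start_a, start_b)
            _, a, b = cmd
            len_a, etr_a = new[a]
            len_b, etr_b = new[b]
            if a + len_a != b or etr_a != etr_b:
                raise ValueError("only adjacent micronets with equal mapping may be joined")
            del new[b]
            new[a] = (len_a + len_b, etr_a)
        elif op == "split":             # ("split", start, [len1, len2, ...])
            _, start, lengths = cmd
            total, etr = new[start]
            if not lengths or min(lengths) < 1 or sum(lengths) != total:
                raise ValueError("split lengths must exactly cover the micronet")
            del new[start]
            pos = start
            for n in lengths:           # each part keeps the original mapping
                new[pos] = (n, etr)
                pos += n
        else:
            raise ValueError(f"unknown command {op!r}")
    return new

# Zero two adjacent micronets, join them, split the result into five
# and remap two of the parts (addresses are small integers for brevity).
table = {0: (8, "192.0.2.1"), 8: (8, "192.0.2.2")}
commands = [("set", 0, "0.0.0.0"), ("set", 8, "0.0.0.0"),
            ("join", 0, 8),
            ("split", 0, [4, 4, 4, 2, 2]),
            ("set", 0, "198.51.100.1"), ("set", 4, "198.51.100.2")]
print(apply_sumuc(table, commands))
```

Working on a copy keeps the original table intact if a later command
in the sequence turns out to be invalid.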
If the SUMUC consists simply of changing a micronet's ETR address,
including zeroing it, then this will be applied by every QSD and
updates will be sent to any ITRs which need them. Multiple such
changes in the one SUMUC would cause the same effects, for multiple
micronets. However, if the changes involve a sequence of changes
affecting the same SPI addresses, the QSD will update its queriers,
which could be ITRs or QSCs, to the final state of the mapping after
the changes.
For instance a sequence of changes could zero two micronets (set
their ETR address to 0.0.0.0) and then join them into one micronet.
The resulting micronet could then be split into five micronets and
each one mapped to a different ETR address. The QSD may have a
querier which is caching the mapping for the first original micronet,
but not the other. It will send that querier updates which define
the new mapping arrangements for exactly that range of SPI addresses
which the original response covered. This avoids the ITR (or the
QSC, if that is the querier) having to be told about a larger amount
of SPI space than it was told about in the initial reply. As noted
previously, these newly defined micronets, each of which will now be
in the cache of the ITR or QSC, will be flushed from the cache at the
same time as the originally cached micronet would have been.
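The QSD's behaviour described above amounts to intersecting the new
micronets with the range originally sent to the querier and reusing
the original expiry time. This is a minimal sketch with addresses as
plain integers; the function name updates_for_querier is
hypothetical, and it assumes the new mapping is supplied as a list of
(start, length, ETR) micronets covering the affected space.

```python
def updates_for_querier(orig_start, orig_len, orig_expiry, new_micronets):
    """Return (start, length, etr, expiry) tuples describing the new mapping
    for exactly the range [orig_start, orig_start + orig_len) the querier
    was originally told about, each expiring when the original entry would."""
    orig_end = orig_start + orig_len            # exclusive end
    updates = []
    for start, length, etr in sorted(new_micronets):
        lo = max(start, orig_start)             # clip to the cached range
        hi = min(start + length, orig_end)
        if lo < hi:                             # overlaps the cached range
            updates.append((lo, hi - lo, etr, orig_expiry))
    return updates

# The querier cached the first of two original 8-address micronets
# (addresses 0..7), expiring at t=600. After the join and split, five
# micronets cover 0..15, and the first split point is not at 8.
new = [(0, 6, "198.51.100.1"), (6, 4, "198.51.100.2"), (10, 2, "198.51.100.3"),
       (12, 2, "198.51.100.4"), (14, 2, "198.51.100.5")]
print(updates_for_querier(0, 8, 600, new))
# -> [(0, 6, '198.51.100.1', 600), (6, 2, '198.51.100.2', 600)]
```

The second micronet is reported only for addresses 6 and 7, so the
querier learns nothing about SPI space outside its original reply.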
UAS-Y trusts this SUMUC because it can authenticate UAS-Z's
signature. It strips off the signature and adds its own, before
passing the SUMUC down to the next level: RUAS-X.
RUAS-X likewise has a copy of UAS-Y's public key, and within a
fraction of a second of U initiating the UMUC, the master copy of
this MAB's database in RUAS-X is altered accordingly. (This would be
a distributed, redundant database system.)
Authority is delegated up the tree: UAS-Y will only accept update
commands if they are signed by one of its branch UASes, and only for
the particular address range that UAS has been authorised to control.
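The checks each UAS performs before passing a SUMUC towards the RUAS
can be sketched as follows. Python's standard library has no
public-key signatures, so this sketch substitutes HMAC-SHA256 with
per-UAS keys purely as a stand-in for the private/public key pairs
described above. The class UAS, its methods, the payload layout
(start, length and ETR as three 32-bit integers) and the sub-range
delegated to Z are all hypothetical.

```python
import hashlib
import hmac
import struct
from ipaddress import IPv4Address

class UAS:
    """Hypothetical UAS that checks SUMUCs from its branch UASes.

    HMAC-SHA256 with per-UAS keys is only a stand-in for the public-key
    signatures described in the text."""

    def __init__(self, name, key):
        self.name = name
        self.key = key          # stand-in for this UAS's private key
        self.branches = {}      # branch name -> (key, first addr, last addr)

    def delegate(self, child, first, last):
        """Authorise `child` to control addresses first..last (inclusive)."""
        self.branches[child.name] = (child.key, int(IPv4Address(first)),
                                     int(IPv4Address(last)))

    def sign(self, payload):
        tag = hmac.new(self.key, payload, hashlib.sha256).digest()
        return (self.name, payload, tag)

    def accept(self, signed):
        """Verify a branch's SUMUC, check its range, then re-sign it."""
        sender, payload, tag = signed
        if sender not in self.branches:
            raise PermissionError(f"{sender} is not a branch of {self.name}")
        key, first, last = self.branches[sender]
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("signature check failed")
        # Hypothetical payload layout: start, length, ETR as 32-bit integers.
        start, length, _etr = struct.unpack("!III", payload)
        if length < 1 or start < first or start + length - 1 > last:
            raise PermissionError(f"range not delegated to {sender}")
        # Strip the branch's signature and add our own.
        return self.sign(payload)

x, y, z = UAS("RUAS-X", b"key-x"), UAS("UAS-Y", b"key-y"), UAS("UAS-Z", b"key-z")
x.delegate(y, "20.1.0.0", "20.1.255.255")
y.delegate(z, "20.1.16.0", "20.1.31.255")      # hypothetical sub-range for Z

payload = struct.pack("!III", int(IPv4Address("20.1.16.8")), 8,
                      int(IPv4Address("192.0.2.1")))
at_x = x.accept(y.accept(z.sign(payload)))    # RUAS-X accepts the SUMUC
print(at_x[0])                                # -> RUAS-X
```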
User-U may have given their username and password etc. to Multihoming
Monitoring Inc. so this company can monitor their multihoming links
and change the mapping as soon as one link goes down. UAS-Z doesn't
know or care who actually makes the change - as long as they can
authenticate themselves for whatever micronet they want to change the
mapping of. UAS-Z would keep an audit trail of all interactions such
as with User-U or Multihoming Monitoring Inc.
5. Common information to be sent by the FMS
In future work I will consider what common information all QSDs need
in order to reliably learn the current state of Ivip-mapped SPI
space. The most important items are the identities of the RUASes,
how each RUAS is represented in the (for instance) 7-bit "ruas" field
in the FPR header of each packet in the FMS, and the exact details of
each current MAB, including which RUAS is responsible for the mapping
of which MAB.
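To make this kind of common information concrete, the sketch below
holds a hypothetical table of RUASes keyed by their 7-bit "ruas"
code and a list of MABs, and answers the question a QSD would need to
ask: which MAB, and so which RUAS, covers a given address. The codes,
the second RUAS and its MAB are invented for illustration, and MABs
are assumed here to be single prefixes; only RUAS-X and its MAB
20.0.0.0 to 20.3.255.255 come from the earlier example.

```python
import ipaddress
from dataclasses import dataclass

RUAS_FIELD_BITS = 7     # the example width of the "ruas" field

@dataclass(frozen=True)
class Mab:
    network: ipaddress.IPv4Network   # assumed here to be a single prefix
    ruas_code: int

def build_tables(ruases, mabs):
    """Validate hypothetical common information and return lookup tables."""
    codes = {}
    for name, code in ruases.items():
        if not 0 <= code < 2 ** RUAS_FIELD_BITS:
            raise ValueError(f"{name}: code {code} does not fit in the field")
        if code in codes:
            raise ValueError(f"code {code} assigned twice")
        codes[code] = name
    table = [Mab(ipaddress.IPv4Network(net), code) for net, code in mabs]
    for i, a in enumerate(table):
        if a.ruas_code not in codes:
            raise ValueError(f"{a.network}: unknown RUAS code {a.ruas_code}")
        for b in table[i + 1:]:
            if a.network.overlaps(b.network):
                raise ValueError(f"{a.network} overlaps {b.network}")
    return codes, table

def responsible_ruas(codes, table, address):
    """Return (MAB, RUAS name) covering `address`, or None if not SPI space."""
    addr = ipaddress.IPv4Address(address)
    for mab in table:
        if addr in mab.network:
            return mab.network, codes[mab.ruas_code]
    return None

codes, table = build_tables({"RUAS-X": 1, "RUAS-W": 2},
                            [("20.0.0.0/14", 1), ("30.0.0.0/16", 2)])
print(responsible_ruas(codes, table, "20.1.2.3"))
```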
One way of doing this is for QSDs to download it periodically via
HTTPS from one or several servers which are trusted in some way and
operated either by a consortium of the RUAS companies or by
individual RUASes. Another way would be for this information to be
sent periodically on the FMS itself. Probably the best approach is
the downloaded file, with QSDs downloading the latest information on
a regular daily schedule. MABs could then be added to the Ivip
system on a day-by-day basis. There is no need to expect QSDs to set
up another MAB mapping database on the basis of a command to this
effect arriving on the FMS itself.
Some kind of distributed and secure rsync arrangement is probably a
good method of doing this.
6. The Fast Payload Replication system
Please refer to [I-D.whittle-ivip-fpr] for all details of this, the
most critical, global, part of Ivip's FMS.
7. Scaling limits
The Replicator system is scalable to any size simply by adding
Replicators. Assuming two input streams for each Replicator, N
output streams give an N/2 amplification of stream numbers per
level. N could be quite high in the early years of introduction,
when the number of micronets and updates is small by comparison with
the design target of one to ten billion micronets, with accompanying
update rates driven by their use for inbound TE for multihomed non-
mobile end-user networks and by mobile devices selecting new TTRs.
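The N/2 amplification compounds level by level. The sketch below
estimates how many Replicator levels are needed to feed a given
number of QSDs under simplifying assumptions which are not part of
this ID's design: 8 level 0 Replicators (as in Figure 1), every
Replicator producing exactly N output streams, and every Replicator
and QSD taking exactly two input streams. The function name and the
300,000 QSD target are hypothetical.

```python
def levels_needed(target_qsds, n_out, level0=8, feeds_in=2):
    """Return (levels, qsds_reachable): the number of Replicator levels
    (level 0 upwards) needed so the last level can feed `target_qsds` QSDs,
    assuming every Replicator emits n_out streams and every Replicator and
    QSD takes feeds_in streams."""
    if n_out % feeds_in or n_out // feeds_in < 2:
        raise ValueError("n_out must be a multiple of feeds_in giving amplification >= 2")
    amp = n_out // feeds_in                # the N/2 amplification per level
    levels, reach = 1, level0 * amp        # QSDs fed directly by level 0
    while reach < target_qsds:
        levels += 1                        # add another level of Replicators
        reach *= amp
    return levels, reach

print(levels_needed(300_000, 20))   # -> (5, 800000)
print(levels_needed(300_000, 40))   # -> (4, 1280000)
```

Under these assumptions, N = 20 needs five levels (0 to 4) for
300,000 QSDs, and doubling N to 40 saves one level.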
First, a maximal IPv4 example will be considered. Assume a billion
micronets, most of them for single IP addresses. Presumably most of
these will be for individual end-users, at home or with mobile
devices. The update rate for multihoming these home- and
office-based micronets will be relatively low.
The update rate due to inbound TE is impossible to predict. Being
able to steer traffic dynamically to maximise utilization of multiple
links is economically highly attractive. Market mechanisms will tend
to set prices for updates which balance competing concerns. If the
price is too low, there will be more of them and the FPR system will
need to be improved to cope with them - so the price would rise to
either reduce the number, or pay for the upgrades.
It is possible that the RUASes could collectively set prices low
enough to make a profit running their operation and many of the
Replicators - levels 0 and 1 at least, and perhaps level 2 or 3 too -
with a very high volume of TE updates. TE updates are the class of
updates with the most elastic demand. Multihoming updates are needed
urgently when they are needed, but most of the time, for any one end-
user network, none are needed. TTR mobility updates are probably
somewhat elastic. If it is expensive to choose a nearby TTR, then
people will make do with a distant one for longer, or indefinitely.
There is a potentially large market for TE changes, because if an
end-user network made lots of them, it may be able to make much
better use of less expensive links.
If RUASes collectively set mapping update prices so low that the
volume rose to quite a high level, it is possible that ISPs and end-
user networks which run QSDs may feel less and less inclined to
accept all these updates - without some financial encouragement from
the RUASes who are making money from the updates.
If this grew to the point where those operating QSDs found they had
to spend money upgrading their QSDs just to cope with the volume,
then there would be the possibility that they could instead program
their QSDs to ignore the most frequent updates which had patterns
resembling TE updates.
Then, in order for the RUASes to be able to continue charging for
these TE updates, the RUASes might need to pay QSD operators to
accept such a high level of updates. This would probably be
excessively expensive considering the number of ISPs and larger end-
user networks which would be running QSDs. So RUASes would be under
strong pressure to limit the total rate of updates to a level the
great majority of QSD operators are happy with. The price of updates
will not deter their use for multihoming service restoration - and
this would represent a small proportion of total updates. Higher
prices per update would reduce the number for TE, in a highly elastic
manner. Likewise, higher prices per update would cause mobile users
(or more directly the TTR companies, who are paying for each update)
not to change TTRs as often.
So overall, it is impossible to state with confidence what update
rates might be expected.
Even with the entire Earth's population owning a mobile device with
its own micronets, if we pick some figure, such as 1000 km, within
which there is no significant benefit in choosing a closer TTR, then
a WAG (Wild-Ass Guess) could be based on airline passenger numbers.
If we assume that each such trip would be long enough to require a
new TTR, then we would get some very approximate worst-case figure.
Statistics from the International Air Transport Association
[IATA-2009] indicate that commercial airlines carried 2.271 billion
passengers in 2008. I have not been able to find estimates for the
number of people travelling large distances by road or train, but it
is reasonable to assume these are relatively small compared to the
numbers of airline passengers. Most travel by car and train involves
trips short enough, with a return trip home, that there will be no
need to use a closer TTR during the whole trip. Truck drivers
crossing continents might be an exception, but the number of such
trips would be small compared to the 2 billion airline passenger
figure.
There could be growth in passenger numbers, and it is possible that
on long trips the aircraft's satellite link would connect to several
ground stations, with the MNs in the aircraft therefore (ideally)
changing their mapping to a new TTR near each ground station. (This
is explored in [TTR Mobility].) There are various ways of
extrapolating these figures, such as with population growth. For
simplicity, I will double the 2 billion figure and use this to
roughly include all mapping changes due to multihoming service
restoration and TE. So my WAG is 4 billion mapping changes a year.
This is about 128 updates a second.
The raw data for a change to an IPv6 micronet's ETR address is 32
bytes: 64 bits for the micronet's starting /64, another 64 bits for
its length or end, and 128 bits for the ETR address. 128 of these a
second is 4k bytes a second - 32kbps. There would be peaks and
troughs, and there could be peaks due to a major outage driving many
end-user networks to switch ETRs for multihoming service restoration.
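The arithmetic above can be checked directly. With a 365-day year,
4 billion changes a year is about 126.8 a second, which the text
rounds to 128. The struct layout (two 64-bit integers followed by a
16-byte IPv6 address) is only an assumption used to confirm the
32-byte figure; no wire format is defined here, and the addresses
are documentation examples.

```python
import ipaddress
import struct

SECONDS_PER_YEAR = 365 * 24 * 3600          # 31,536,000
updates_per_year = 4_000_000_000            # the WAG above
updates_per_second = updates_per_year / SECONDS_PER_YEAR

# Hypothetical raw record: starting /64 (64 bits), length (64 bits), ETR (128 bits).
RECORD_FMT = "!QQ16s"
record = struct.pack(RECORD_FMT,
                     0x20010DB800000000,    # upper 64 bits of 2001:db8::/64
                     1,                     # micronet length: one /64
                     ipaddress.IPv6Address("2001:db8:ffff::1").packed)
bytes_per_second = updates_per_second * len(record)

print(f"{updates_per_second:.1f} updates/s")         # -> 126.8 updates/s
print(f"{len(record)} bytes per update")             # -> 32 bytes per update
print(f"{bytes_per_second * 8 / 1000:.1f} kbit/s")   # -> 32.5 kbit/s
```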
This is a low data rate in the scheme of things. VoIP calls
typically run at 16, 32 or 64kbps for the actual voice data, plus
considerable overhead due to IP and other headers.
If there were 5 or 10 billion mobile devices, each with a micronet,
many of these would keep using the same TTR from one year to the
next. There would be a mapping change when the micronet was assigned
to a given handset, and then another when the handset was no longer
used, or replaced by another. So there would also be a significant
background level of administrative mapping changes with billions of
micronets for mobile devices.
It is hard to imagine a scenario in which the update rate would
require prohibitive volumes of data, even by today's standards, for
any substantial ISP. The flow of update packets would be somewhat
greater than this raw data rate due to the need for packing them into
some kind of robust format, having hashes of them with digital
signatures etc. The total amount of mapping data coming into an ISP
would be 2 to 4 times this due to the need for feeds from two or more
Replicators. Still, by the time such high levels of adoption could
occur, the bandwidth they require will surely not present a
significant difficulty for any ISP, or for larger end-user networks
which want to run their own ITRs and wish to have their own QSDs,
rather than relying on the QSDs of their ISPs.
8. Managing Replicators
Replicators should be easy to create and deploy. Any substantial
server with the requisite software, in a suitable location, will do
the job - but it should be well secured against attackers gaining
root access. A successful system will require some mechanisms which
ensure reliable operation with a minimal amount of configuration and
ongoing management.
In the current model, each Replicator normally receives feeds from
two upstream Replicators, and generates some figure N feeds for
downstream devices. Each Replicator should be able to request and
quickly gain a replacement feed from another upstream Replicator if
one of those it is using becomes unavailable, or unreliable.
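Replacing a failed upstream feed can be sketched as below: a
Replicator holds a locally configured list of upstream Replicators
which have agreed to feed it, keeps two of them active, and on a
failure takes the next candidate which is neither in use nor known to
be down. The class and method names are hypothetical, and failure
detection and the actual feed request are left out.

```python
class UpstreamFeeds:
    """Hypothetical sketch of a Replicator's upstream feed selection."""

    def __init__(self, candidates, wanted=2):
        self.candidates = list(candidates)  # pre-agreed upstreams, in preference order
        self.wanted = wanted                # normally two feeds
        self.down = set()
        self.active = []
        self._fill()

    def _fill(self):
        """Top up the active set from candidates not in use and not down."""
        for c in self.candidates:
            if len(self.active) >= self.wanted:
                break
            if c not in self.active and c not in self.down:
                self.active.append(c)

    def mark_failed(self, upstream):
        """Drop an unavailable or unreliable upstream and pick a replacement."""
        self.down.add(upstream)
        if upstream in self.active:
            self.active.remove(upstream)
        self._fill()
        return list(self.active)

    def mark_recovered(self, upstream):
        """Make a repaired upstream eligible again as a future replacement."""
        self.down.discard(upstream)

feeds = UpstreamFeeds(["R1-a", "R1-b", "R1-c", "R1-d"])
print(feeds.active)               # -> ['R1-a', 'R1-b']
print(feeds.mark_failed("R1-a"))  # -> ['R1-b', 'R1-c']
```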
This requires that Replicators in general be operating below
capacity, so that when others in their level fail, they can take up
the slack. This needs to be locally configured beforehand, with
upstream Replicators of organisations which have agreed to provide
the feeds, and with downstream Replicators of organisations who have
requested them.
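How far below capacity Replicators need to run can be estimated with
a simple model, assumed here rather than taken from this ID: if k
Replicators in a level share their downstream feeds evenly, each at
utilisation u, and f of them fail, each survivor must carry
u * k / (k - f) of its capacity, so u must not exceed (k - f) / k.
The function names are hypothetical.

```python
from fractions import Fraction

def max_utilisation(k, failures=1):
    """Highest even utilisation at which k Replicators in a level can absorb
    `failures` failed peers' downstream feeds without exceeding capacity,
    assuming the load is redistributed evenly among the survivors."""
    if not 0 <= failures < k:
        raise ValueError("need at least one surviving Replicator")
    return Fraction(k - failures, k)

def load_after_failures(k, utilisation, failures=1):
    """Utilisation of each surviving Replicator after redistribution."""
    if not 0 <= failures < k:
        raise ValueError("need at least one surviving Replicator")
    return Fraction(utilisation) * k / (k - failures)

print(max_utilisation(8))                       # -> 7/8
print(load_after_failures(8, Fraction(3, 4)))   # -> 6/7
```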
It is possible to imagine a sophisticated, distributed management
system for the Replicator network. This could be developed over
time, since for initial deployment, considerable manual configuration
and less automation would be acceptable.
9. Security Considerations
This ID mentions some authentication and security problems and
possible solutions to them, but full consideration of security can
only occur when the architecture is fleshed out in greater detail.
10. IANA Considerations
For future work.
11. Informative References
[I-D.whittle-ivip-arch]
Whittle, R., "Ivip (Internet Vastly Improved Plumbing)
Architecture", draft-whittle-ivip-arch-04 (work in
progress), January 2010.
[I-D.whittle-ivip-fpr]
Whittle, R., "Fast Payload Replication mapping
distribution for Ivip", draft-whittle-ivip-fpr-00 (work in
progress), January 2010.
[I-D.whittle-ivip-glossary]
Whittle, R., "Glossary of some Ivip and scalable routing
terms", draft-whittle-ivip-glossary-00 (work in progress),
January 2010.
[IATA-2009]
"Fact sheet: industry statistics", September 2009,
<http://www.iata.org/NR/rdonlyres/8BDAFB17-EED8-45D3-92E2-590CD87A3144/0/FactSheetIndustryFactsSept09.pdf>.
[TTR Mobility]
Whittle, R. and S. Russert, "TTR Mobility Extensions for
Core-Edge Separation Solutions to the Internet's Routing
Scaling Problem", August 2008,
<http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf>.
Author's Address
Robin Whittle
First Principles
Email: rw@firstpr.com.au
URI: http://www.firstpr.com.au/ip/ivip/