One document matched: draft-marques-l3vpn-end-system-01.xml


<?xml version="1.0" encoding="US-ASCII"?>

<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?rfc toc="yes"?>
<rfc ipr='trust200902' docName='draft-marques-l3vpn-end-system-01'>

<front>
<title abbrev='End-system attachements in BGP IP/VPNs'>
End-system support for BGP-signaled IP/VPNs.
</title>

<author initials='P.' surname='Marques' fullname='Pedro Marques'>
    <address>
    <email>pedro.r.marques@gmail.com</email>
    </address>
</author>

<author initials='L.' surname='Fang' fullname='Luyuan Fang'>
  <organization>Cisco Systems</organization>
  <address>
    <postal>
      <street>111 Wood Avenue South</street>
      <city>Iselin</city>
      <region>NJ</region>
      <code>08830</code>
    </postal>
    <email>lufang@cisco.com</email>
  </address>
</author>

<author initials='P.' surname='Pan' fullname='Ping Pan'>
  <organization>Infinera Corp</organization>
  <address>
    <postal>
      <street>140 Caspian Ct.</street>
      <city>Sunnyvale</city>
      <region>CA</region>
      <code>94089</code>
    </postal>
    <email>ppan@infinera.com</email>
  </address>
</author>

<date month="October" year="2011"/>
<area>Routing</area>
<abstract>
<t>Network Service Providers often use
<xref target="RFC4364">BGP/MPLS IP VPNs</xref> as the control plane for
overlay networks. That solution has proven to scale to large number of VPNs
and attachment points and is one familiar to network equipment software.</t>
<t>There is a significant interest in the industry in building overlay networks in which end-systems are themselves the direct participant, along with network equipment such as service appliances.</t>
<t>This document proposes an extension of the BGP IP VPN model to serve as
the signaling protocol for host-based overlay networks along with an XMPP interface that provides a bridge between the software concepts familiar to end-points and those familiar to network equipment.</t>
</abstract>
</front>

<middle>
<section title="Introduction">
<t>Data center applications require private networks connecting multiple
"Virtual Machines" belonging to the same administrative "user" and
between them and network elements and appliances.</t>

<t>In this context, it is a common goal, for the data-center forwarding
infrastructure to be isolated from the knowledge of the private network. The
set of routers and switches that interconnects physical machines in the
data-center is assumed to provide an IP service (with or without the use
of IEEE 802.1 technologies).</t>

<t>The Virtual Private Networks (VPNs) associated with each individual
administrative domain can be built without the knowledge of the
data-center connectivity layer as an overlay network.

This proposal leverages the technology used in the Service Provide managed VPN space and extends it to address the problem of interconnecting virtual interfaces on end-systems.

In both applications there is the need
to be able to manage at scale a very large number of VPNs and
attachement points. And in both cases there is the need to support the
interchange of traffic between different VPNs.</t>

<t>This document defines how BGP-signaled IP/VPNs can be used to interconnect
end-systems and network elements. It assumes that the forwarding layer uses
IP over GRE as defined by <xref target='RFC4023'/>. Other
transport layers such as native MPLS or 802.1ah can also be used with
the same signaling approach.</t>

<t>When this document uses the term 'Infrastructure IP' addresses, it refers to
the addresses used in the outer header of GRE packets.
In the case of a transport other than IP over GRE,
this would be the Subnetwork Point of Attachement (SNPA) address
corresponding to the
multi-access network providing connectivity to the end-systems.</t>

<t>BGP is not an interface that application software is familiar with. In order
to bridge the gap between concepts familiar to network devices and those
familiar to end-system developers,
this document defines an XMPP client interface to be used by end-systems.
It defines the procedures to interchange data between XMPP and BGP IP VPN
sessions along with the corresponding data schemas.
Networking devices may opt to receive the signaling information
directly via BGP.</t>
</section>

<section title="End-system functionality">
<t>For the purposes of this document we assume that each end-system executes
an 'Host Operating System' with the ability to:
<list>
  <t>Create virtual interfaces (typically ethernet interfaces).</t>
  <t>Associate a given virtual interface with a specific "Virtual and Routing
     Forwarding (VRF)" table.</t>
  <t>Store entries in the VRF table that map an VRF-specific IP prefix into
     a next-hop which contains a destination IP address and a 20-bit label.</t>
  <t>Encapsulate outgoing packets according <xref target='RFC4023'/> using
     the result of the VRF lookup.</t>
  <t>Associate incoming packets with a VRF according to the 20-bit label
     contained immediately after the GRE header.</t>
  <t>Expose a programmatic interface to create, update and delete VRF table
     entries.</t>
</list></t>

<t>The 'Host Operating System' may choose to associate the virtual interfaces
with specify 'Virtual Machines' or use other policies to manage the
application access to these interfaces.</t>
<t>Hosts should support the ability to associate multiple virtual interfaces
with the same VRF. The 20-bit label which is associated with a VRF is of local significance
only and SHOULD be allocated by the end-system.</t>
<t>The procedure that determines that a VRF should configured on a particular
end-system as well as which IP addresses to be associated with each interface
are outside the scope of this document. We assume that statically assigned
IP addresses are used.</t>
<t>The VRFs support IP unicast traffic only. Multicast support is subject
for further study and will be detailed in a separate document. Both IPv4 and
IPv6 are supported and the term 'IP' can refer to either version of the Internet Protocol.</t>
<t>The VRF table is populated by the signaling mechanisms described
bellow and may contain both host length (i.e. /32 and /128 for IPv4
and v6 respectively) or subnet prefixes. As an example a VPN with access
to the external networks would probably contain a default route plus a set
of host length entries for all the Virtual Machines (VMs) in the same VPN.</t>
<t>In the terminology used in the <xref target='RFC4364'>BGP-signaled IP/VPN
standard</xref>, a end-system acts as a 'Provider Edge' (PE) device in
terms of its forwarding
capabilities, with the virtual interfaces that it exposes
(for instance to virtual machines) as the 'Customer Edge' (CE) interfaces.</t>
</section>

<section title='Virtual Machine Networking'>
<t>When virtual machines are associated with a virtual interface on the
end-system, this document assumes that there is a single route entry to
a default route on the Virtual Machine (VM). Packets are then routed by
the Host OS, which imposes the VPN encapsulation header. Link-local addresses
on the virtual ethernet interface that connects the virtual machine are not
globally significant.</t>

<t>When discussing VM connectivity, it is frequent to encounter the assumption
that the VM routing table contains a subnet route entry with reachability to
all other VMs in the VPN.</t>

<texttable align='left' anchor='vm_rt_table_lan'>
<preamble>VM route table using LAN adjacency:</preamble>
<ttcol>IP prefix</ttcol><ttcol>Next-hop</ttcol><ttcol>Interface</ttcol>
<c>10.1.1.1/32</c><c>local</c><c>veth0</c>
<c>10.1/16</c><c>direct</c><c>veth0</c>
</texttable>

<t>In the scenario above, the VM assumes a direct LAN adjacency with its peers (e.g. 10.1.1.2). It uses ARP to build an L2 adjacency to its communication peers. Scaling ARP broadcasts and updating ARP entries across the data-center becomes an important problem. Often the conclusion reached is that
the VM mac-addresses must be global and constant for a given VM. That can be a problematic
requirement when interconnecting data-centers administered by different
orchestration systems.</t>

<t>This document proposes a VM routing table configuration where there are no ARP adjacencies between different VMs.</t>

<texttable align='left' anchor='vm_rt_table_l3vpn'>
<preamble>VM route table using default gateway:</preamble>
<ttcol>IP prefix</ttcol><ttcol>Next-hop</ttcol><ttcol>Interface</ttcol>
<c>10.1.1.1/32</c><c>local</c><c>veth0</c>
<c>169.254.254.254/32</c><c>direct</c><c>veth0</c>
<c>0/0</c><c>169.254.254.254</c><c>veth0</c>
</texttable>

<t>The configuration above eliminates the need for L2 adjacencies between
VMs. The VM contains a single ARP entry to its default gateway, which is
the Host OS. The Host OS performs a route lookup in the corresponding VRF
in order to route the packet. In this approach, the Host OS VRF is the
point of control that determines the destination end-system associated
with the VPN destination IP address.</t>

<t>One of the advantages of this model is that it eliminates the need
to support broadcast across a VPN.</t>

</section>

<section title="Operational Model">

<t>In the simplest case, a VPN is a collection of systems that are allowed
to exchange traffic with each other and where all the VRFs in the VPN contain
all the routing entries for the VPN. Only members of the VPN are allowed
to exchange traffic with each other. We can refer to these as symmetrical
VPNs since all VRFs contain the same routing information.</t>

<t>When end-systems join a given VPN they advertise their membership
by advertising the VPN-specific IP address associated with a
particular virtual interface as well as its binding to the
infrastructure IP address associated with the host.</t>

<t>Infrastructure addresses are routable in the underlying transport
network (e.g. the data-center network). While VPN addresses are routable
on the VPN network only.</t>

<t>End-systems subscribe to the contents of the VPN routing tables for
which they have members associated with. This information is then used
to populate the host's routing tables. It may contain both host routes
(i.e. IPv4 32-bit prefixes or IPv6 128-bit prefixes) or routes to
gateways that interconnect other networks.</t>

<t>The signaling network delivers the membership advertisements generated by the end-systems to other members of the same VPN, subjet to policy controls.</t>

<t>When a particular VM "moves" from one physical end-system to another,
its respective VPN address will be advertised by the new system and that
notification propagated to all attachment points of that VPN.</t>

<!-- TODO: move to a different section
<t>A VPN route is a triple of the form (end-system-id, address-family,
ip-prefix).
The end-system-id is a 6-octect value associated with the physical end-system
(i.e. the host machine) and which is assumed to be unique accross the
domain. This document suggests that the end-system-id be assigned based on the
IEEE 802 MAC address associated with a physical interface in the system.
The address family can be either IPv4 (numeric value 1) or IPv6
(numeric value 2) as defined in 'Address Familiy Numbers' (IANA).
</t>
-->

<t>This document assumes two types of applications that perform network
signaling functions: BGP Route Reflectors (RRs) and BGP/XMPP signaling gateways.
Both functions may be collocated in the same physical device.</t>

<t>The BGP Route Reflectors accept connection from gateways or native BGP
devices. These BGP peering sessions SHALL be support the address families:
VPN-IPv4 (1, 128), VPN-IPv6 (2, 128) and
<xref target='RFC4684'>RT-Constraint (1, 132)</xref>.</t>

<t>The XMPP signaling gateways maintain persistent connection to a subset of the
end-systems of the domain and provide a 'pubsub' API to the contents of
each specific VPN routing table.
These systems are not in the forwarding plane and do not need to be collocated
with a network device.</t>

<t>Network devices MAY have direct BGP sessions to the BGP Route Reflectors.
For instance, a router or security appliance that supports BGP/MPLS IP VPNs
over GRE may use its existing functionality to inter-operate directly with a
collection of Virtual Machines.</t>

<t>The BGP/XMPP gateways implement the VRF policy functionality that
is associated with PE routers in the pure BGP IP/VPN case. In these signaling
gateways, the 'publish-subscribe' messages from the end-systems are associated
with a VRF-specific signaling table. Each of these routing tables contains
import and export policies which provide fine grain control over the
table contents.</t>

<t>An export policy associates VPN routing information with one or
more 6 byte values known as 'Route Targets'. These 'Route Targets' are
associated with the routes as they are advertised out to other BGP
speakers.</t>
<t>Import policies, on the other hand, select via 'Route Targets',
from all the available routing information which routes should be
imported into a VPN-specific routing table.</t>
<t>A symmetrical VPN uses the
same configuration for both import and export. By controlling these policies
it is possible to selectively allow direct traffic exchanges between members of
different VPNs, assuming their respective IP addresses are non-overlapping.</t>

<figure anchor='forwarding'>
<artwork><![CDATA[
               +--------+                 +--------+
VM1 -- veth0 --| host 1 |=== [network] ===| host 2 |-- veth0 -- VM2
               +--------+                 +--------+

 IP pkt  ===> GRE encap  ===> [IP net] ===> GRE decap ===> IP pkt
           [192.168.2.1, 20]               map 20 to veth0
]]></artwork>
</figure>

<texttable align='left' anchor='vrf1'>
<ttcol>VPN IP address</ttcol><ttcol>Host address</ttcol><ttcol>label</ttcol>
<c>10.1.1.1/32</c><c>localhost</c><c>10</c>
<c>10.1.1.2/32</c><c>192.168.2.1</c><c>20</c>
<postamble>VRF table on host1</postamble>
</texttable>

<t>The figure and table above contain an example in which IP packets are
transmitted from one VPN interface (with address 10.1.1.1) to another VPN
interface (with address 10.1.1.2). As previously mentioned, the virtual
ethernet interfaces function as a CE interace in a traditional BGP-signaled
IP VPN. While the end-system provide the forwarding functionality equivalent
to a PE device.</t>

<figure anchor='signaling'>
<artwork><![CDATA[
+--------+       +-----------+       +--------+
| host 1 | <===> | signaling | <===> | BGP RR |
+--------+       | gateway   |       +--------+
                 +-----------+
]]></artwork>
</figure>

<texttable align='left' anchor='vrf1_on_gw'>
<ttcol>VPN IP address</ttcol><ttcol>SNPA</ttcol><ttcol>label</ttcol>
<ttcol>Known via</ttcol>
<c>10.1.1.1/32</c><c>192.168.1.1</c><c>10</c><c>XMPP</c>
<c>10.1.1.2/32</c><c>192.168.2.1</c><c>20</c><c>BGP</c>
<postamble>VPN Routing table on signaling gateway</postamble>
</texttable>

<t>The signaling network corresponding to the same example is depicted above.
The signaling gateway is an out-of-band system which speaks both XMPP to the
host as well as BGP to the BGP RRs. The table above represents the routing
table on the gateway that corresponds to the VPN of the example.
Host 2 would be connected to another signaling gateway which would be in turn
connected to the BGP RR mesh.
</t>
<t>The gateway is configured via an external mechanism with the parameters that
correspond to the VPNs in use by its clients along with its respective
vrf import and export policies.</t>
<t>XMPP publish request are translated into routing entries on this table,
which are then advertised via BGP, using standard BGP-signaled IP VPN mechanism.BGP learned routes are also imported into this routing table. Any changes
to its content are advertised to local XMPP clients.</t>
<t>In comparison with traditional IP VPNs, the signaling gateway is performing the PE functionality, which XMPP is used as a PE-CE routing protocol.</t>
</section>

<section title="XMPP client interface">
<t>The communication between end-systems and the signaling gateway uses the
XMPP protocol with the <xref target='pubsub'>PubSub Collection Nodes</xref>
extension in order to exchange VPN route information.</t>

<t>End-systems establish persistent XMPP sessions. These sessions MUST use
the XMPP <xref target='xmpp-ping'>Ping</xref> extension in order to detect
end-system failures.</t>

<t>An End-system MAY connect to multiple VPN-signaling gateways for
reliability. In this case it SHOULD publish its information to each of the
gateways. It MAY choose to subscribe to VPN routing information once
only from one of the available gateways.</t>

<t>The information advertised by a end-system SHOULD be deleted after a
configurable timeout, when the session closes. This timeout should default to
60 seconds.</t>

<figure>
<artwork><![CDATA[
		+---------+	      	+--------+
		| gateway | -----------	| BGP RR |
		+---------+ 	       	+--------+
                //	    \          /
              XMPP           \	      /
              //     	      \	     /
+------------+		       \    /
| end-system |		       	\  /
+------------+		      	 \/
              \\	      	 /\
              XMPP	      	/  \
                \\     	       /    \
		+---------+   /	     \  +--------+
		| gateway | -----------	| BGP RR |
		+---------+	      	+--------+
]]></artwork>
</figure>

<t>The figure above represents a typical configuration in which a end-system
is homed to multiple gateways, which are in turn connected to multiple BGP
route reflectors. In a deployment there would be a number of gateways
corresponding to the number of end-systems divided by the gateway capacity in terms of number of XMPP sessions.
While the BGP RR scale in terms of the number of gateways attached to it.</t>

<t>The XMPP "jid" used by the end-system shall be a 6-byte value that
uniquely identifies the host in the domain. This specification recommends
the use of the 802 MAC address of one of the physical ethernet interfaces
of the end-system, when present.</t>
<t>Each VPN shall be identified by a 64 ASCII character string.</t>

<t>The guest system software on an end-system SHALL establish an XMPP session
with its configured signaling gateways before creating virtual interfaces.</t>

<t>When a virtual interface is created, for instance as result of a Virtual
Machine
being instantiated on a end-system, the host operating-system
software shall generate an XMPP Publish message to the VPN-signaling
gateway.</t>

<figure>
<preamble>Publish request from end-system to gateway:</preamble>
<artwork align='left'><![CDATA[
<iq type='set'
    from='01020304abcd@domain.org'	<!-- system-id@domain.org -->
    to='network-control.domain.org'
    id='request1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <publish node='01020304abcd:vpn-ip-address/32'>
      <item>
        <entry xmlns='http://ietf.org/protocol/bgpvpn'>
          <nlri af='1'>'vpn-ip-address>/32'</nlri>
	  <snpa af='1'>'infrastructure-ip-address'</snpa>
          <version id='1'>		<!-- non-decreasing VM version # -->
	  <label>1</label>		<!-- 24 bit number -->
        </entry>
       </item>
    </publish>
  </pubsub>
</iq>

<iq type='set'
    from='01020304abcd@domain.org'
    to='network-control.domain.org'
    id='request2'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <collection node='vpn-customer-name'>
      <associate node='01020304abcd:vpn-ip-address/32'/>
    </collection>
  </pubsub>
</iq>
]]></artwork>
</figure>

<t>In the request above the node 'vpn-customer-name' is assumed to be
a collection which is implicitly created by the VPN-signaling gateway.</t>

<t>The VPN-signaling gateway will convert the information received in a
the 'publish' request into the corresponding BGP route information such
that:.</t>
<t>
<list>
  <t>It associates the specific request with a local VRF with the name
specified in the collection 'node' attribute.</t>
  <t>It Creates a route with with a 'Route Distinguisher' (RD) containing the end-system's 'system-id'
and the specified 'label' and NLRI prefix.</t>
  <t>It associates this route with the specified SNPA address.</t>
  <t>It associates the route with an extended community TDB containing the
    version number.</t>
</list>
</t>

<figure>
<preamble>Subscription request from end-system to gateway:</preamble>
<artwork align='left'><![CDATA[
<iq type='set'
    from='01020304abcd@domain.org'
    to='network-control.domain.org'
    id='sub1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <subscribe node='vpn-customer-name'/>
  </pubsub>
</iq>
]]></artwork>
</figure>

<figure>
<preamble>Update notification from gateway to end-system:</preamble>
<artwork align='left'><![CDATA[
<message to='system-id@domain.org from='network-control.domain.org>
  <event xmlns='http://jabber.org/protocol/pubsub#event'>
    <items node='vpn-customer-name'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901'>
        <entry xmlns='http://ietf.org/protocol/bgpvpn'>
          <nlri af='1'>'vpn-ip-address>/32'</nlri>
	  <snpa af='1'>'infrastructure-ip-address'</snpa>
          <version id='1'>		<!-- non-decreasing VM version # -->
	  <label>1</label>		<!-- 24 bit number -->
        </entry>
       </item>
      <item >
        ...
      </item>
    </items>
  </event>
</message>
]]></artwork>
</figure>

<t>Notifications should be generated whenever a VPN route is added, modified
or deleted.</t>
<t>Note that the Update from the signaling gateway to the end-point does
not contain the system-id of the destination end-point. When multiple
possible routes exist for a given VPN IP address, for instance because
the VM may be in the process of moving location, it is the responsibility
of the gateway to select the best path to advertise to the end-system.</t>

<t>In situations where an automated system is controlling the instantiation
of VMs it may be possible to have that system assign a non-decreasing version
number for each instantiation of that particular VM. In that case, this
number, carried in the 'version' field may be used to help gateways select
the most recent instantiation of a VM during the interval of time where
multiple routes are present.</t>

</section>

<section title="VPN NLRI">

<t>When a VPN-signaling gateway receives a request to create or modify a VPN
route is SHALL generate a BGP VPN route advertisement with the corresponding
information using the BGP address family corresponding to the address family
specified by the end-system.</t>

<t>It is assumed that the VPN-signaling gateways contain information regarding
the mapping between 'vpn-customer-names' and BGP Route Targets used to import
and export information from the associated VRFs. This mapping is known via
an out-of-band mechanism not specified in this document.</t>

<t>Whenever a VRF in the gateway contains local routing information, the
gateway shall advertise the corresponding RT-Constraint route target routes
in BGP, which perform a parallel function to the subscription requests in
XMPP.</t>

<t>The 32bit route version number defined in the XML schema is advertised into
BGP as a Extended community with type TBD.</t>

<t>Signaling gateways SHOULD use automatically assign a BGP route distinguisher per VPN routing table.</t>
</section>

<section title="Security Considerations">
<t>The signaling protocol defines the access control policies for each virtual
interface and any VM associated with it. It is important to secure the
end-system access to signaling gateways and the BGP infrastructure itself.</t>
<t>The XMPP session between end-systems and the XMPP gateways MUST use mutual authentication. One possible strategy is to distribute pre-signed certificates
to end-systems which are presented as proof of authorization to the signaling
gateway.</t>
<t>BGP sessions MUST be authenticated using a shared secret. This document
recommends that BGP speaking systems filter traffic on port 179 such that
only IP addresses which are known to participate in the BGP signaling protocol
are allowed.</t>
</section>

</middle>

<back>
  <references>
    <?rfc include="reference.RFC.4023.xml"?>
    <?rfc include="reference.RFC.4364.xml"?>
    <?rfc include="reference.RFC.4684.xml"?>
    <reference anchor='xmpp-ping'>
      <front>
	<title>XMPP Ping</title>
	<author fullname='Peter Saint-Andre'></author>
	<date month='June' year='2009'/>
      </front>
      <seriesInfo name='XEP' value='0199'/>
    </reference>

    <reference anchor='pubsub'>
      <front>
	<title>PubSub Collection Nodes</title>
	<author fullname='Peter Saint-Andre'></author>
	<author fullname='Ralph Meijer'></author>
	<author fullname='Brian Cully'></author>
	<date month='September' year='2010'/>
      </front>
      <seriesInfo name='XEP' value='0248'/>
    </reference>
  </references>
</back>

</rfc>

PAFTECH AB 2003-20262026-04-24 02:55:18