<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc comments="yes"?>
<?rfc compact="yes"?>
<?rfc inline="yes"?>
<?rfc sortrefs="yes"?>
<?rfc subcompact="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc tocompact="yes"?>

<rfc category="std" docName="draft-ymbk-idr-rs-bfd-01" ipr="noDerivativesTrust200902">
	<front>
		<title>Making Route Servers Aware of Data Link Failures at IXPs</title>

		<author fullname="Randy Bush" initials="R." surname="Bush">
			<organization>Internet Initiative Japan</organization>
			<address>
				<postal>
					<street>5147 Crystal Springs</street>
					<city>Bainbridge Island</city>
					<region>Washington</region>
					<code>98110</code>
					<country>US</country>
				</postal>
				<email>randy@psg.com</email>
			</address>
		</author>

		<author fullname="Jeffrey Haas" initials="J." surname="Haas">
			<organization>Juniper Networks, Inc.</organization>
			<address>
				<postal>
					<street>1194 N. Mathilda Ave.</street>
					<city>Sunnyvale</city>
					<region>CA</region>
					<code>94089</code>
					<country>US</country>
				</postal>
				<email>jhaas@juniper.net</email>
			</address>
		</author>

		<author fullname="John G. Scudder" initials="J." surname="Scudder">
			<organization>Juniper Networks, Inc.</organization>
			<address>
				<postal>
					<street>1194 N. Mathilda Ave.</street>
					<city>Sunnyvale</city>
					<region>CA</region>
					<code>94089</code>
					<country>US</country>
				</postal>
				<email>jgs@juniper.net</email>
			</address>
		</author>

		<author fullname="Arnold Nipper" initials="A." surname="Nipper">
			<organization>DE-CIX Management GmbH</organization>
			<address>
				<postal>
					<street>Lichtstrasse 43i</street>
					<city>Cologne</city>
					<code>50825</code>
					<country>Germany</country>
				</postal>
				<email>arnold.nipper@de-cix.net</email>
			</address>
		</author>
		
		<author fullname="Thomas King" initials="T." surname="King" role="editor">
			<organization>DE-CIX Management GmbH</organization>
			<address>
				<postal>
					<street>Lichtstrasse 43i</street>
					<city>Cologne</city>
					<code>50825</code>
					<country>Germany</country>
				</postal>
				<email>thomas.king@de-cix.net</email>
			</address>
		</author>

		<date month="March" year="2015" />

		<abstract>
			<t>
				When route servers are used, the data plane is not congruent with
				the control plane. Therefore, the peers on the Internet exchange can
				lose
				data connectivity without the control plane being aware of it,
				and
				packets are dropped on the floor. This document proposes the use
				of
				BFD between the two peering routers to detect a data plane
				failure, and then
				uses BGP next hop cost to signal the state of the
				data link to the route server(s).
			</t>

		</abstract>

		<note title="Requirements Language">

			<t>
				The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
				NOT",
				"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
				are to
				be interpreted as described in
				<xref target="RFC2119" />
				only when they appear in all upper
				case. They may also appear in
				lower or mixed case as English
				words, without normative meaning.
			</t>

		</note>

	</front>

	<middle>
		<section anchor="intro" title="Introduction">
			<t>
				In configurations (typically Internet exchanges) where EBGP routing
				information is exchanged between client routers through the agency of a route
				server <xref target="I-D.ietf-idr-ix-bgp-route-server" />, but traffic is exchanged directly, operational
				issues can arise when partial data plane connectivity exists among the route
				server client routers.
				This is because, as the data plane is not congruent with the control
				plane, the client routers on the Internet exchange can lose data
				connectivity without the
				control plane - the route server - being aware of it, and
				packets are dropped on the floor.
			</t>
			<t>
				To remedy this, two basic problems need to be solved:
			</t>
			
			<t>
				1. Client routers must have a means of verifying connectivity
				amongst themselves, and
			</t>
			<t>
				2. Client routers must have a means of communicating the knowledge so
				gained back to the route server.
			</t>
			<t>
				The first can be solved by application of Bidirectional Forwarding Detection
				<xref target="RFC5880" />. The second can be solved
				by use of BGP NH-SAFI <xref target="I-D.ietf-idr-bgp-nh-cost" />. There is a subsidiary problem that
				must also be solved, since one of the key value propositions offered by
				a route server is that client routers need not be configured to peer with each
				other:
			</t>
			<t>
				3. Client routers must have a means (other than configuration)
				to know of one another's existence.
			</t>
			<t>
				This can also be solved by an application of BGP NH-SAFI.
			</t>
			<t>
				Throughout this document, we generally assume that the route server
				being discussed is able to represent different RIBs towards
				different clients, as discussed in
				<xref target="I-D.ietf-idr-ix-bgp-route-server">section 2.3.2.1</xref>.
				These procedures (other than the use of BFD to track next hop
				reachability) have limited value if this is not the case.
			</t>

		</section>

		<section anchor="operation" title="Operation">
			<t>
				Below, we detail procedures whereby a route server tells each of its
				client routers about the other client routers (by sending it their
				next hops using NH-SAFI), the client router verifies connectivity to
				those other client routers (using BFD), and the client router
				communicates its findings back to the route server (again using
				NH-SAFI). The route server uses the received NH-SAFI routes as input
				to the route selection process it performs on behalf of the client.
			</t>

			<section anchor="discovery"
				title="Mutual Discovery of Route Server Client Routers">
				<t>
					Strictly speaking, what is needed is not for a route server client router
					to know of other (control-plane) client routers, but rather to know
					(so
					that it can validate) all the next hops the route server might
					choose to send the client router, i.e. to know of potential forwarding
					plane relationships.
				</t>

				<t>
					In effect, this requirement amounts to knowing the BGP next hops the
					route server is aware of in its Adj-RIBs-In. Fortunately, <xref target="I-D.ietf-idr-bgp-nh-cost" />
					defines a construct that contains exactly this data, the
					"Next-Hop Information Base", or NHIB, as well as procedures for a
					BGP speaker to communicate its NHIB to its peer. Thus, the problem
					can be solved by the route server advertising its NHIB to its
					client router, following those procedures.
				</t>
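				<t>
					As an illustrative sketch only (Python; the data structures and
					names are hypothetical, not defined by NH-SAFI), the route
					server's Adj-NHIB-Out can be modeled as the set of distinct next
					hops across its Adj-RIBs-In, each paired with the same fixed cost:
				</t>

```python
# Hypothetical sketch: building a route server's Adj-NHIB-Out.
# NH_UNREACHABLE stands for the reserved all-ones cost; the route
# server never advertises it. Names and values are illustrative.
NH_UNREACHABLE = 0xFFFFFFFF
FIXED_COST = 10  # any fixed cost other than all-ones is permitted

def build_adj_nhib_out(adj_ribs_in):
    """Collect every distinct BGP next hop seen across all
    Adj-RIBs-In and pair it with one fixed, commensurate cost."""
    next_hops = {route["next_hop"]
                 for rib in adj_ribs_in.values()
                 for route in rib}
    return {nh: FIXED_COST for nh in sorted(next_hops)}

adj_ribs_in = {
    "client_a": [{"prefix": "203.0.113.0/24", "next_hop": "192.0.2.1"}],
    "client_b": [{"prefix": "198.51.100.0/24", "next_hop": "192.0.2.2"},
                 {"prefix": "203.0.113.0/24", "next_hop": "192.0.2.2"}],
}
nhib_out = build_adj_nhib_out(adj_ribs_in)
```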
				<t>
					We observe that (as per NH-SAFI) the cost advertised in the
					route server's Adj-NHIB-Out need not reflect a "real" IGP cost, the
					only requirement being that the advertised costs are commensurate. A
					route server MAY choose to advertise any fixed cost other than
					all-ones (which is a reserved value in NH-SAFI). This specification
					does not suggest semantics be imputed to the NH-SAFI advertised by
					the route server and received by the client, other than "this next
				hop is present in the control plane, you might like to track it". The
				route server MUST NOT advertise a next hop as NH_UNREACHABLE.
				</t>
				
				<t>
					A route server client SHOULD use BFD <xref target="RFC5880"/>
					(or other means beyond the scope of this document) to track
					forwarding plane connectivity to each next hop listed in the
					received NH-SAFI.
				</t>

				<t>
					<!-- Comment John:
					XXX want to say something about tracking connectivity to plain old
					next hops too? Just in case the RS gets it wrong?
					 -->
				</t>
			</section>

			<section anchor="tracking" title="Tracking Connectivity">
				<t>
					For each next hop in the Adj-NHIB-In received from the route server,
					the client router SHOULD use some means to confirm that data plane
					connectivity does exist to that next hop.
				</t>
				
				<t>	
					For each next hop in the Adj-NHIB-In received from the route
					server, the client router SHOULD set up a BFD session to it,
					if one is not already available, and track the reachability of
					this next hop.
				</t>
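				<t>
					A minimal sketch of this session bookkeeping (Python; the class
					and state names are hypothetical, and session setup and teardown
					are stubbed rather than driving a real BFD state machine):
				</t>

```python
# Hypothetical sketch: keeping one BFD session per next hop listed
# in the Adj-NHIB-In received from the route server. A real
# implementation would run the RFC 5880 state machine per session.
class BfdSessionTable:
    def __init__(self):
        self.sessions = {}  # next hop -> session state ("Up"/"Down")

    def sync(self, adj_nhib_in):
        """Create sessions for newly learned next hops and tear
        down sessions for next hops no longer advertised."""
        for nh in adj_nhib_in:
            if nh not in self.sessions:
                self.sessions[nh] = "Down"  # new session, not yet Up
        for nh in list(self.sessions):
            if nh not in adj_nhib_in:
                del self.sessions[nh]

table = BfdSessionTable()
table.sync({"192.0.2.1": 10, "192.0.2.2": 10})
table.sync({"192.0.2.2": 10})  # 192.0.2.1 withdrawn by the route server
```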

				<t>
					For each next hop being tracked, a corresponding NH-SAFI route
					should be placed in the client router's own Adj-NHIB-Out to be
					advertised to the route server. Any next hop for which connectivity
					has failed should have its cost advertised as NH_UNREACHABLE. (This
					may also be done as a result of policy even if connectivity
					exists.)
					Any other next hop should have some feasible cost advertised. The
					values advertised may be all equal, or may be set according to
					policy or other implementation-specific means.
				</t>
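				<t>
					The mapping from tracked state to advertised cost can be
					sketched as follows (Python; names and the specific cost values
					are illustrative assumptions, not taken from NH-SAFI):
				</t>

```python
# Hypothetical sketch: deriving a client's Adj-NHIB-Out from tracked
# BFD state. A failed session, or policy, maps to the reserved
# all-ones NH_UNREACHABLE cost; reachable next hops get some
# feasible cost (here one fixed value; policy could vary it).
NH_UNREACHABLE = 0xFFFFFFFF
DEFAULT_COST = 1

def adj_nhib_out(bfd_state, policy_unreachable=frozenset()):
    out = {}
    for nh, state in bfd_state.items():
        if state != "Up" or nh in policy_unreachable:
            out[nh] = NH_UNREACHABLE  # connectivity failed (or policy)
        else:
            out[nh] = DEFAULT_COST    # any feasible cost
    return out

out = adj_nhib_out({"192.0.2.1": "Up", "192.0.2.2": "Down"})
```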
				
				<t>
					If the connectivity test between one client router and another
					fails, the client router that detected the failure should retest
					connectivity on a regular basis (e.g. every 5 minutes) for a
					configurable amount of time (preferably 24 hours). If connectivity
					is not restored within this period, no further testing is performed
					and the next hop remains advertised as NH_UNREACHABLE until the
					advertisement is manually changed or the client router is rebooted.
				</t>
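				<t>
					The retry schedule above can be sketched as follows (Python;
					the constants and function names are illustrative):
				</t>

```python
# Hypothetical sketch: after a connectivity failure, retest at a
# regular interval (e.g. every 5 minutes) within a configurable
# window (preferably 24 hours); afterwards stop probing and leave
# the next hop at NH_UNREACHABLE until operator action or reboot.
RETRY_INTERVAL_S = 5 * 60        # 5 minutes
RETRY_WINDOW_S = 24 * 60 * 60    # 24 hours

def should_retest(failure_time_s, now_s):
    """True while the failed next hop is still inside its
    24-hour retesting window."""
    return (now_s - failure_time_s) < RETRY_WINDOW_S

def retest_count(failure_time_s, stop_s):
    """Number of periodic probes sent within the window."""
    window = min(stop_s - failure_time_s, RETRY_WINDOW_S)
    return window // RETRY_INTERVAL_S
```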

				<!--To b replaced by section best path determination. <t>
					A client router tracking next hop reachability should also use that
					determination as input to its own bestpath determination, as per
					<xref target="RFC4271">section 9.1</xref>.
				</t>-->
			</section>
		</section>
		
		<section anchor="advertising" title="Advertising Client Router Connectivity to the Route Server">
			<t>
				As discussed above, a client router advertises its Adj-NHIB-Out
				to the route server. The route server should use this information as
				input to its own decision process when computing the peer-specific
				Adj-RIB-Out, which is then advertised to that peer. In particular,
				the route server MUST exclude any routes whose next hops the client
				has declared to be NH_UNREACHABLE. The route server MAY also treat
				the advertised cost as the "IGP cost"
				(<xref target="RFC4271">section 9.1</xref>)
				when performing this
			</t>
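			<t>
				A minimal sketch of this filtering step (Python; the route and
				NHIB representations are hypothetical):
			</t>

```python
# Hypothetical sketch: when computing the Adj-RIB-Out for a client,
# the route server drops every route whose next hop that client has
# declared NH_UNREACHABLE. (The advertised cost could additionally
# feed the "IGP cost" step of the RFC 4271 decision process.)
NH_UNREACHABLE = 0xFFFFFFFF

def adj_rib_out_for(client_nhib, candidate_routes):
    """Keep only routes whose next hop the client can reach."""
    return [r for r in candidate_routes
            if client_nhib.get(r["next_hop"]) != NH_UNREACHABLE]

routes = [{"prefix": "203.0.113.0/24", "next_hop": "192.0.2.1"},
          {"prefix": "198.51.100.0/24", "next_hop": "192.0.2.2"}]
rib_out = adj_rib_out_for(
    {"192.0.2.1": 1, "192.0.2.2": NH_UNREACHABLE}, routes)
```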
		</section>

		<section anchor="routeselectionprocess" title="Utilizing Next Hop Unreachability Information at Client Routers">

			<t>
				A client router detecting an unreachable next hop signals this information
				to the route server as described above.
				It also treats routes via that next hop as unresolvable, as per
				<xref target="RFC4271">section 9.1.2.1</xref>, and proceeds with route selection as normal.
			</t>

			<t>
				Changes in next hop reachability signaled via these mechanisms should be
				damped to avoid unnecessary route flapping.  Similar mechanisms exist in IGP implementations
				and should be applied to this scenario.
			</t>
		</section>
		
		<section anchor="recommendations" title="Recommendations for Using BFD">
				<t>
					The RECOMMENDED way for a client router to confirm that data plane
					connectivity to its next hops is available is the use of BFD in
					asynchronous mode. Echo mode MAY be used if both client routers running
					a BFD session support it. The use of authentication in BFD is
					OPTIONAL, as there is a certain level of trust between the
					operators of the client routers at a particular IXP. If trust cannot
					be assumed, the use of pair-wise keys is recommended (how this can be
					achieved is outside the scope of this document).
					The TTL/hop limit values described in
					<xref target="RFC5881">section 5</xref>
					MUST be obeyed in order to secure BFD sessions against packets
					coming from outside the IXP.
				</t>
				
				<t>
					There is an administrative interdependence between the
					functionality described in this document and BFD.
					To streamline the behaviour of different implementations, the
					following is RECOMMENDED:
					<list style='symbols'>
                		<t>If BFD is administratively shut down by the administrator of
                		a client router then the functionality described in this document
                		MUST also be administratively shut down.</t>
                		<t>If the administrator enables the functionality described in
                		this document on a client router then BFD MUST be automatically
                		enabled.</t>
            		</list>
				</t>
				
				<t>
					The following values of the BFD configuration of client routers
					(see <xref target="RFC5880">section 6.8.1</xref>) are RECOMMENDED in
					order to allow a fast detection of lost data plane connectivity:
					<list style='symbols'>
                		<t>DesiredMinTxInterval: 1,000,000 (microseconds)</t>
                		<t>RequiredMinRxInterval: 1,000,000 (microseconds)</t>
                		<t>DetectMult: 3</t>
            		</list>
				</t>
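				<t>
					With these values in asynchronous mode, a session is declared
					down after roughly DetectMult missed packet intervals, i.e.
					about three seconds. The calculation below is a simplification;
					the actual detection time depends on the intervals negotiated
					between both peers (<xref target="RFC5880">section 6.8.4</xref>):
				</t>

```python
# Simplified detection-time arithmetic for the recommended values:
# in asynchronous mode a session is declared down after roughly
# DetectMult * max(DesiredMinTxInterval, RequiredMinRxInterval),
# assuming both peers use the same (symmetric) configuration.
DESIRED_MIN_TX_US = 1_000_000   # DesiredMinTxInterval, microseconds
REQUIRED_MIN_RX_US = 1_000_000  # RequiredMinRxInterval, microseconds
DETECT_MULT = 3

detection_time_s = (DETECT_MULT
                    * max(DESIRED_MIN_TX_US, REQUIRED_MIN_RX_US) / 1e6)
```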

				<t>
            		The configuration values above are a trade-off between fast detection of lost data plane
            		connectivity and the load client routers must handle to maintain the BFD sessions. Selecting smaller
            		DesiredMinTxInterval and RequiredMinRxInterval values generates a high volume of BFD packets,
            		especially at larger IXPs with many hundreds of client routers.
            	</t>

            	<t>
            		The configuration values above are selected to ride through brief interruptions of the data plane.
            		Otherwise, if a BFD session detected a brief data plane interruption to a particular client router,
            		it would signal the route server to remove the routes of that client router, only to tell it
            		shortly afterwards to add the routes again. This is disruptive and computationally expensive
            		for the route server.
            	</t>

            	<t>
            		The configuration values above are also partially constrained by the BGP advertisement time in
            		reaction to BFD events. If the values are selected such that BFD detects data plane interruptions
            		much faster than BGP can advertise them, data plane connectivity flaps could be detected by BFD
            		without the route server being informed, because BGP cannot transport this information fast enough.
            	</t>

            	<t>
            		As discussed, finding good configuration values is hard, so a client router administrator MAY
            		select values better suited to the needs of a particular deployment.
            	</t>

            	<!-- bFD flapping handling meachanism -->
            	<!-- Comment Jeff: Whether we need to specify that the remote nexthop reachability mechanism is damped in the spec is an open question. -->
		</section>
		
		<section anchor="bootstrapping" title="Bootstrapping">
			<t>
				When the route server starts, it does not know anything about the connectivity
				state between client routers. The route server therefore optimistically assumes
				that all client routers are able to reach each other unless told otherwise.
			</t>
			
		</section>

		<section anchor="other" title="Other Considerations">
			<t>
				For purposes of routing stability, implementations may wish to apply
				hysteresis ("holddown") to next hops that have transitioned from
				reachable to unreachable and back.
			</t>
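			<t>
				A minimal sketch of such a holddown (Python; the period is an
				illustrative value, not a recommendation of this document):
			</t>

```python
# Hypothetical sketch of a simple holddown: once a next hop returns
# to reachable, delay re-advertising a feasible cost until it has
# stayed up for a holddown period, damping rapid flaps.
HOLDDOWN_S = 60  # illustrative value only

def advertise_reachable(up_since_s, now_s):
    """Only report the next hop reachable after it has been stably
    up for the holddown period; None means currently down."""
    return up_since_s is not None and (now_s - up_since_s) >= HOLDDOWN_S
```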
		</section>

	</middle>

	<back>

		<references title="Normative References">
   		<?rfc include="reference.RFC.2119"?>
   		<?rfc include="reference.RFC.2439"?>
   		<?rfc include="reference.RFC.4271"?>
   		<?rfc include="reference.RFC.5880"?>
   		<?rfc include="reference.RFC.5881"?>
   		<?rfc include="reference.I-D.ietf-idr-ix-bgp-route-server"?>
		<?rfc include="reference.I-D.ietf-idr-bgp-nh-cost"?>
		</references>

		<!--- <references title="Informative References"> <?rfc include="reference.I-D.ietf-sidr-bgpsec-protocol"?> 
			</references> -->

	</back>

</rfc>
