<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY LFA-APPLICABILITY SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6571.xml">
<!ENTITY RFC1195 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1195.xml">
<!ENTITY RFC2328 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2328.xml">
<!ENTITY ULOOP SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-microloop-analysis.xml">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- used by XSLT processors -->
<!-- OPTIONS, known as processing instructions (PIs) go here. -->
<!-- For a complete list and description of PIs,
please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable PIs that most I-Ds might want to use. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC): -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="3"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references: -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space:
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of popular PIs -->
<rfc category="std" docName="draft-litkowski-rtgwg-spf-uloop-pb-statement-02"
ipr="trust200902">
<front>
<title abbrev="spf-microloop">Link State Protocols SPF Trigger and Delay Algorithm Impact on IGP Micro-loops</title>
<author fullname="Stephane Litkowski" initials="S" surname="Litkowski">
<organization>Orange Business Service</organization>
<address>
<!-- postal><street/><city/><region/><code/><country/></postal -->
<!-- <phone/> -->
<!-- <facsimile/> -->
<email>stephane.litkowski@orange.com</email>
<!-- <uri/> -->
</address>
</author>
<date year="2015"/>
<area/>
<workgroup>Routing Area Working Group</workgroup>
<!-- <keyword/> -->
<!-- <keyword/> -->
<!-- <keyword/> -->
<!-- <keyword/> -->
<abstract>
<t>A micro-loop is a packet forwarding loop that may occur transiently
among two or more routers in a hop-by-hop packet forwarding paradigm.
</t>
<t>This document analyzes the impact of using different Link State IGP implementations in a single network with respect to micro-loops.
The analysis focuses on SPF triggers and SPF delay algorithms.</t>
</abstract>
<note title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119"/>.</t>
</note>
</front>
<middle>
<section anchor="introduction" title="Introduction">
<t>
Link State IGPs are based on a topology database on which an SPF (Shortest Path First) algorithm, such as Dijkstra's algorithm, is run to compute the optimal routing paths.
</t>
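<t>As an illustration of the SPF computation, the following Python sketch runs Dijkstra's algorithm over a toy link-state database modeled on Figure 1 of this document. It is not taken from any implementation: the LSDB dictionary, the spf() function, and the A-B metric of 1 (this metric is not shown in Figure 1) are assumptions made for the example, and equal-cost multipath is ignored.</t>
<figure><artwork><![CDATA[
```python
import heapq

# Hypothetical link-state database: node -> {neighbor: metric}.
# Topology of Figure 1; the A-B metric is not shown there and is
# assumed to be 1 for this sketch.
LSDB = {
    "A": {"B": 1, "C": 10},
    "B": {"A": 1, "D": 10},
    "C": {"A": 10, "D": 2},
    "D": {"B": 10, "C": 2},
}


def spf(lsdb, root):
    """Dijkstra SPF rooted at `root`.

    Returns (dist, first_hop): the shortest distance to each node
    and the neighbor of `root` used to reach it, i.e. the next hop
    a router would install in its FIB. ECMP is not tracked.
    """
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale entry, node already settled
        done.add(node)
        for nbr, metric in lsdb[node].items():
            cost = d + metric
            if nbr not in dist or cost < dist[nbr]:
                dist[nbr] = cost
                if node == root:
                    first_hop[nbr] = nbr  # directly connected
                else:
                    first_hop[nbr] = first_hop[node]
                heapq.heappush(heap, (cost, nbr))
    return dist, first_hop


dist, hops = spf(LSDB, "A")
print(dist["C"], hops["C"])  # 10 C
```
]]></artwork></figure>
<t>Removing the A-C adjacency from the entries of both A and C and re-running spf() from A gives C at cost 13 through first hop B, whereas B, until it recomputes, still reaches C through A at cost 11.</t>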
<t>Specifications such as IS-IS (<xref target="RFC1195"/>) propose some optimizations of the route computation (see Appendix C.1 of <xref target="RFC1195"/>), but not all implementations follow these optional optimizations.</t>
<t>We use the term "SPF trigger" for any event that leads to a new SPF computation based on the topology.</t>
<t>
Link State IGPs, such as OSPF (<xref target="RFC2328"/>) and IS-IS (<xref target="RFC1195"/>), use many timers to control router behavior during churn: SPF delay, PRC delay, LSP generation delay, LSP flooding delay, LSP retransmission interval, etc.
</t>
<t>Some of these timers are standardized in the protocol specifications; others, especially the timers related to SPF computation, are not.</t>
<t>Implementations are free to implement non-standardized timers in any way.
Even for some standardized timers, implementations may offer dynamically adjusted timers, rather than static configurable values, to help control churn.</t>
<t>We use the term "SPF delay" for the timer, present in most implementations, that makes the router wait before running an SPF computation after an SPF trigger is received.</t>
<t>
A micro-loop is a packet forwarding loop that may occur transiently
among two or more routers in a hop-by-hop packet forwarding paradigm. Micro-loops form when two routers do not update their Forwarding Information Base (FIB) for a given prefix at the same time.
The micro-loop phenomenon is described in <xref target="I-D.ietf-rtgwg-microloop-analysis"/>.
</t>
<t>
Routers have increasingly powerful control planes and data planes, which reduce the control plane to forwarding plane overhead during the convergence process. Even though the FIB update is probably still the largest contributor to convergence time
in large networks, its duration keeps decreasing and may become comparable to protocol timers. This is particularly true in small and medium networks.
</t>
<t>
In multi-vendor networks, using different implementations of a link state protocol may favor micro-loop creation during convergence because of discrepancies between timers.
Service providers already know that using similar timers across the whole network is a best practice, but this is sometimes not possible because of implementation limitations.
</t>
<t>
This document explains why it is important for service providers to have consistent implementations of Link State protocols across vendors. In particular, it analyzes the impact of using different Link State IGP implementations in a single network with respect to micro-loops.
As a first step, the analysis focuses on SPF triggers and SPF delay algorithms.
</t>
<t>This document only states the problem and defines some work items; it is not intended to provide a solution.</t>
</section>
<section anchor="problem" title="Problem statement">
<t>
<figure>
<artwork><![CDATA[
     A ---- B
     |      |
  10 |      | 10
     |      |
     C ---- D
     |  2   |
     Px     Px

       Figure 1
]]></artwork>
</figure>
In Figure 1, A primarily uses the A-C link to reach C. When the A-C link fails, IGP convergence occurs. If A converges before B, A forwards traffic to C through B; but as B has not converged yet, B loops the traffic back to A, leading to a micro-loop.
</t>
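<t>The loop can be reproduced with a minimal Python sketch that follows FIB next hops toward C. The FIB snapshots are hand-written assumptions based on Figure 1 (with the unlabeled A-B metric assumed small enough for B to prefer the path through A before the failure), and forward() is a hypothetical helper, not a router API.</t>
<figure><artwork><![CDATA[
```python
def forward(fibs, src, dst, max_hops=8):
    """Follow FIB next hops from src toward dst.

    fibs maps router -> {destination: next hop}. Returns the list
    of routers visited; it stops at dst, or as soon as a router is
    visited twice (a forwarding loop).
    """
    path = [src]
    node = src
    while node != dst and len(path) <= max_hops:
        node = fibs[node][dst]
        looped = node in path
        path.append(node)
        if looped:
            break
    return path


# A-C has failed. A has converged and now uses B; B has not yet
# converged and still uses its pre-failure next hop, A.
during = {"A": {"C": "B"}, "B": {"C": "A"}, "D": {"C": "C"}}
# Once B has converged, it forwards to C through D.
after = {"A": {"C": "B"}, "B": {"C": "D"}, "D": {"C": "C"}}

print(forward(during, "A", "C"))  # ['A', 'B', 'A']: micro-loop
print(forward(after, "A", "C"))   # ['A', 'B', 'D', 'C']
```
]]></artwork></figure>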
<t>
The micro-loop appears because of the asynchronous convergence of nodes in the network when an event occurs.
</t>
<t>Multiple factors (and combinations of these factors) may increase the probability of a micro-loop appearing:
<list style="symbols">
<t>delay of failure notification: the later B is notified of the failure relative to A, the more likely a micro-loop is to appear.</t>
<t>SPF delay: most implementations support a delay for the SPF computation in order to catch as many events as possible in a single run. If A uses an SPF delay timer of x msec and B uses an SPF delay timer of y msec with x &lt; y, B starts converging after A, leading to a potential micro-loop.</t>
<t>SPF computation time: mostly a matter of CPU power and of optimizations such as incremental SPF. If A computes SPF faster than B, a micro-loop may appear. Today's CPUs are fast enough for SPF computation time to be considered negligible (on the order of msec in a large network).</t>
<t>RIB and FIB prefix insertion speed or ordering: highly implementation dependent.</t>
</list>
</t>
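<t>The first three factors can be combined in a rough Python timing sketch. As a simplification, it assumes that a router's FIB is updated at notification delay + SPF delay + SPF computation time + RIB/FIB update time, and that a micro-loop is only possible between the moments the two routers finish converging; whether a loop actually forms still depends on the topology. The Router class and all numeric values are hypothetical.</t>
<figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Router:
    """Convergence components of one router, in msec."""
    name: str
    notification: float  # delay before learning of the failure
    spf_delay: float     # SPF delay timer
    spf_compute: float   # SPF computation time
    fib_update: float    # RIB/FIB update time

    def converged_at(self) -> float:
        """Time at which the FIB is assumed to be updated."""
        return (self.notification + self.spf_delay
                + self.spf_compute + self.fib_update)


def inconsistency_window(r1: Router, r2: Router) -> tuple:
    """Interval during which one router has converged and the
    other has not; a micro-loop is only possible inside it."""
    t1, t2 = r1.converged_at(), r2.converged_at()
    return (min(t1, t2), max(t1, t2))


# Hypothetical values: A uses SPF delay x = 50 msec and B uses
# y = 200 msec (x < y); B also learns of the failure 10 msec later.
a = Router("A", notification=0, spf_delay=50, spf_compute=5,
           fib_update=25)
b = Router("B", notification=10, spf_delay=200, spf_compute=5,
           fib_update=25)
print(inconsistency_window(a, b))  # (80, 240)
```
]]></artwork></figure>
<t>In this example B converges 160 msec after A, mostly because of the SPF delay difference; with y reduced to 50 msec, the window shrinks to the 10 msec notification difference.</t>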
<t>
This document focuses on the analysis of the SPF delay (and associated triggers).
</t>
</section>
<section anchor="spf-trigger" title="SPF trigger strategies">
<t>
Depending on the change advertised in an LSP/LSA, the topology may or may not be affected.
An implementation can decide not to run SPF (and only recompute IP reachability) if the advertised change does not affect the topology.
</t>
<t>
Different strategies exist to trigger SPF:
<list style="numbers">
<t>Always run full SPF whatever the change to process.</t>
<t>Run full SPF only when required: e.g., if a link fails, a local node runs an SPF for its local LSP update. If the LSP from the neighbor (describing the same failure)
is received after the SPF has started, the local node can decide that a new full SPF is not required, as the topology has not changed.</t>
<t>If the topology does not change, only recompute reachability.</t>
</list>
</t>
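<t>The three strategies can be contrasted with a small decision function. This is a hypothetical Python sketch: the Strategy and Change names are invented, strategy 3 is assumed to include the optimization of strategy 2, and a real implementation decides whether the topology changed by comparing LSP/LSA contents rather than by receiving a ready-made label.</t>
<figure><artwork><![CDATA[
```python
from enum import Enum


class Strategy(Enum):
    ALWAYS_FULL = 1     # 1: full SPF whatever the change
    FULL_IF_NEEDED = 2  # 2: skip a full SPF that is redundant
    PRC_IF_NO_TOPO = 3  # 3: as 2, plus reachability-only runs


class Change(Enum):
    TOPOLOGY = "topology"    # new adjacency or metric change
    DUPLICATE = "duplicate"  # topology change already seen by an
                             # SPF that has started (e.g. the
                             # neighbor's LSP for the same failure)
    PREFIX = "prefix"        # reachability change only


def computation_for(strategy: Strategy, change: Change) -> str:
    """Return 'full-spf', 'prc' or 'none' for one received change."""
    if strategy is Strategy.ALWAYS_FULL:
        return "full-spf"
    if change is Change.DUPLICATE:
        return "none"  # the running SPF already has this topology
    if (change is Change.PREFIX
            and strategy is Strategy.PRC_IF_NO_TOPO):
        return "prc"
    return "full-spf"


for s in Strategy:
    print(s.name, [computation_for(s, c) for c in Change])
```
]]></artwork></figure>
<t>Under this reading, router E in the Mixing strategies section behaves like strategy 1, while router S, which also has PRC available, behaves like strategy 3.</t>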
<t>
As pointed out in <xref target="introduction"/>, SPF optimizations are not mandatory in the specifications, which has led to multiple strategies being implemented.
</t>
</section>
<section anchor="spf-delay" title="SPF delay strategies">
<t>
Implementations of link state routing protocols use different strategies to delay SPF:
<list style="numbers">
<t>Two steps.</t>
<t>Exponential backoff.</t>
</list>
</t>
<section anchor="spf-delay-2step" title="Two step SPF delay">
<t>
The SPF delay is managed by four parameters:
<list style="symbols">
<t>Rapid delay: amount of time to wait before running SPF during the first, rapid runs.</t>
<t>Rapid runs: number of consecutive SPF runs that can use the rapid delay. When this number is exceeded, the router moves to the slow delay.</t>
<t>Slow delay: amount of time to wait before running SPF once the number of rapid runs has been exceeded.</t>
<t>Wait time: amount of time without events to wait before going back to the rapid delay.</t>
</list>
</t>
<t>
Example: Rapid delay = 50msec, Rapid runs = 3, Slow delay = 1sec, Wait time = 2sec
</t>
<figure>
<artwork>
SPF delay time
^
|
|
SD- | x xx x
|
|
|
RD- | x x x x
|
+---------------------------------> Events
| | | | || | |
&lt; wait time >
</artwork>
</figure>
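<t>The two-step behavior can be expressed as a small state machine. The following Python sketch is one possible reading of the four parameters, not a vendor implementation: it assumes that every trigger leads to its own SPF run (triggers arriving while an SPF is pending are not batched) and that the wait time is measured from the last event. The class and method names are hypothetical; all times are in msec.</t>
<figure><artwork><![CDATA[
```python
class TwoStepDelay:
    """Sketch of the two-step SPF delay (all times in msec).

    Assumed semantics: the first `rapid_runs` SPF runs use
    `rapid_delay`, later runs use `slow_delay`, and after
    `wait_time` without events the algorithm returns to
    `rapid_delay`. Every trigger is assumed to lead to its own
    SPF run (no batching of triggers while an SPF is pending).
    """

    def __init__(self, rapid_delay, rapid_runs, slow_delay,
                 wait_time):
        self.rapid_delay = rapid_delay
        self.rapid_runs = rapid_runs
        self.slow_delay = slow_delay
        self.wait_time = wait_time
        self.runs = 0           # SPF runs since last quiet period
        self.last_event = None  # time of the previous trigger

    def schedule(self, now):
        """Return the SPF delay for a trigger received at `now`."""
        if (self.last_event is not None
                and now - self.last_event >= self.wait_time):
            self.runs = 0  # quiet long enough: back to rapid delay
        self.last_event = now
        self.runs += 1
        if self.runs <= self.rapid_runs:
            return self.rapid_delay
        return self.slow_delay


# Example parameters from the text above.
timer = TwoStepDelay(rapid_delay=50, rapid_runs=3,
                     slow_delay=1000, wait_time=2000)
events = [0, 100, 200, 300, 1400, 2500, 6000]
print([timer.schedule(t) for t in events])
```
]]></artwork></figure>
<t>With these triggers the sketch returns 50, 50, 50, 1000, 1000, 1000 and 50 msec: three rapid runs, then the slow delay, then a return to the rapid delay after more than 2 sec without events.</t>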
</section>
<section anchor="spf-delay-exp" title="Exponential backoff">
<t>
The algorithm has two modes: fast mode and backoff mode. In backoff mode, the SPF delay increases exponentially at each run.
The SPF delay is managed by four parameters:
<list style="symbols">
<t>First delay: amount of time to wait before running SPF. This delay is used only when SPF is in fast mode.</t>
<t>Incremental delay: amount of time to wait before running SPF. This delay is used when SPF is in backoff mode and increases exponentially at each SPF run.</t>
<t>Maximum delay: maximum amount of time to wait before running SPF.</t>
<t>Wait time: amount of time without events to wait before going back to fast mode.</t>
</list>
</t>
<t>
Example: First delay = 50msec, Incremental delay = 50msec, Maximum delay = 1sec, Wait time = 2sec
</t>
<figure>
<artwork>
SPF delay time
^
MD- | xx x
|
|
|
|
|
| x
|
|
|
| x
|
FD- | x x x
ID |
+---------------------------------> Events
| | | | || | |
&lt; wait time >
FM->BM -------------------->FM
</artwork>
</figure>
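<t>The exponential backoff can be sketched in Python the same way. This sketch assumes one SPF run per trigger, a wait time measured from the last event, and a doubling of the incremental delay at each backoff run (the text only says that it increases exponentially). Names are hypothetical and times are in msec.</t>
<figure><artwork><![CDATA[
```python
class ExponentialBackoffDelay:
    """Sketch of the exponential backoff SPF delay (msec).

    Assumed semantics: in fast mode a trigger uses `first_delay`
    and moves the algorithm to backoff mode; in backoff mode the
    delay starts at `incremental_delay` and doubles at each run,
    capped at `maximum_delay`. After `wait_time` without events
    the algorithm returns to fast mode. Every trigger is assumed
    to lead to its own SPF run.
    """

    def __init__(self, first_delay, incremental_delay,
                 maximum_delay, wait_time):
        self.first_delay = first_delay
        self.incremental_delay = incremental_delay
        self.maximum_delay = maximum_delay
        self.wait_time = wait_time
        self.backoff_runs = None  # None means fast mode
        self.last_event = None

    def schedule(self, now):
        """Return the SPF delay for a trigger received at `now`."""
        if (self.last_event is not None
                and now - self.last_event >= self.wait_time):
            self.backoff_runs = None  # quiet period: fast mode
        self.last_event = now
        if self.backoff_runs is None:
            self.backoff_runs = 0  # FM -> BM after this run
            return self.first_delay
        delay = self.incremental_delay * 2 ** self.backoff_runs
        self.backoff_runs += 1
        return min(delay, self.maximum_delay)


# Example parameters from the text above.
timer = ExponentialBackoffDelay(first_delay=50,
                                incremental_delay=50,
                                maximum_delay=1000,
                                wait_time=2000)
events = [0, 100, 200, 400, 700, 1200, 2100, 6000]
print([timer.schedule(t) for t in events])
```
]]></artwork></figure>
<t>With these triggers the sketch returns 50, 50, 100, 200, 400, 800, 1000 and 50 msec. With the parameters used for router E in the Mixing strategies section (150/150/1000 msec), triggers at 0, 200, 400 and 1000 msec yield 150, 150, 300 and 600 msec, the values shown for E in Figures 3 and 4.</t>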
</section>
</section>
<section anchor="spf-mix" title="Mixing strategies">
<figure>
<artwork><![CDATA[
     S ---- E
     |      |
  10 |      | 10
     |      |
     D ---- A
     |  2
     Px

       Figure 2
]]></artwork>
</figure>
<t>
In Figure 2, we consider a flow of packets from S to D. S uses optimized SPF triggering (full SPF is triggered only when necessary) and a two-step SPF delay (rapid=150ms, rapid-runs=3, slow=1s). As the implementation of S is optimized, Partial Reachability Computation (PRC) is available.
PRC is delayed using a timer with the same parameters as the SPF timer.
E uses an SPF trigger strategy that always computes a full SPF, and an exponential backoff strategy for the SPF delay (start=150ms, inc=150ms, max=1s).
</t>
<t>
We also consider the following sequence of events (note: the timescale is not intended to represent a real router timescale, where jitter is introduced into all timers):
<list style="symbols">
<t>t0: a prefix is declared down in the network.</t>
<t>t0+200ms: the prefix is declared up.</t>
<t>t0+400ms: a prefix is declared down in the network.</t>
<t>t0+1000ms: the S-D link fails.</t>
</list>
</t>
<figure>
<artwork><![CDATA[
S timescale             E timescale             Event timescale
|                       |                       |
|                       |                       | <- t0 Event
| Schedule PRC (150ms)  | Schedule SPF (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | SPF starts            |
| PRC ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+180ms
|                       |                       |
|                       |                       | <- t0+200ms Event
| Schedule PRC (150ms)  | Schedule SPF (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | SPF starts            |
| PRC ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+380ms
|                       |                       | <- t0+400ms Event
| Schedule PRC (300ms)  | Schedule SPF (300ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | SPF starts            |
| PRC ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+730ms
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       | <- t0+1000ms Event
| Schedule SPF (150ms)  | Schedule SPF (600ms)  |
|                       |                       |
|                       |                       |
| SPF starts            |                       |
|                       |                       |
| SPF ends              |                       |
| RIB/FIB starts        |                       |
|                       |                       | }
| RIB/FIB ends          |                       | }
|                       |                       | }
|                       |                       | }
|                       |                       | }
|                       |                       | }
|                       |                       | } micro-loop creation
|                       |                       | }
|                       | SPF starts            | }
|                       |                       | }
|                       | SPF ends              | }
|                       | RIB/FIB starts        | }
|                       |                       | }
|                       | RIB/FIB ends          | }

                        Figure 3
]]></artwork>
</figure>
<t>
In Figure 3, we can see that, because of discrepancies in SPF management, after multiple events of different types, SPF delays are completely misaligned between nodes, leading to long-lasting micro-loops.
</t>
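<t>The misalignment at t0+1000ms can be reproduced with a short Python sketch using the parameters stated above. It assumes a wait time of 2 sec (not given for this scenario), one SPF run per trigger, and 30 msec of computation plus RIB/FIB update per run (consistent with the t0+180ms mark in Figure 3); the helper names are hypothetical. Only the final event is evaluated for S, because its earlier PRC runs use a separate timer.</t>
<figure><artwork><![CDATA[
```python
# Minimal SPF delay timers (msec). Assumed semantics: one SPF run
# per trigger; after `wait` msec without events, a timer returns
# to its initial delay.
def two_step(rapid, runs, slow, wait):
    state = {"runs": 0, "last": None}

    def schedule(now):
        if state["last"] is not None and now - state["last"] >= wait:
            state["runs"] = 0
        state["last"] = now
        state["runs"] += 1
        return rapid if state["runs"] <= runs else slow
    return schedule


def exp_backoff(first, inc, maximum, wait):
    state = {"n": None, "last": None}

    def schedule(now):
        if state["last"] is not None and now - state["last"] >= wait:
            state["n"] = None
        state["last"] = now
        if state["n"] is None:  # fast mode
            state["n"] = 0
            return first
        delay = min(inc * 2 ** state["n"], maximum)  # backoff mode
        state["n"] += 1
        return delay
    return schedule


WAIT = 2000           # assumed: no wait time is given in the text
COMPUTE_AND_FIB = 30  # assumed: t0+180ms = t0 + 150ms + 30ms

# S: the three prefix events only use its separate PRC timer, so
# the S-D failure finds its SPF timer unused.
# E: always runs a full SPF, so all events share one SPF timer.
s_spf = two_step(150, 3, 1000, WAIT)
e_spf = exp_backoff(150, 150, 1000, WAIT)

# (event time, full SPF needed on S?) for the Figure 3 sequence.
events = [(0, False), (200, False), (400, False), (1000, True)]
for t, needs_spf in events:
    e_delay = e_spf(t)
    if needs_spf:
        s_delay = s_spf(t)
        s_done = t + s_delay + COMPUTE_AND_FIB
        e_done = t + e_delay + COMPUTE_AND_FIB

print(f"S waits {s_delay}ms, E waits {e_delay}ms")
print(f"FIBs disagree from t0+{s_done}ms to t0+{e_done}ms")
```
]]></artwork></figure>
<t>Under these assumptions the sketch reports 150 msec for S and 600 msec for E, so the FIBs of S and E disagree for about 450 msec (t0+1180ms to t0+1630ms), the window marked as micro-loop creation in Figure 3.</t>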
<t>
The same issue can also appear with only a single type of event, as displayed below:
<figure>
<artwork><![CDATA[
S timescale             E timescale             Event timescale
|                       |                       |
|                       |                       | <- t0 Event remote link down
| Schedule SPF (150ms)  | Schedule SPF (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| SPF starts            | SPF starts            |
| SPF ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+180ms
|                       |                       |
|                       |                       | <- t0+200ms Event remote link down
| Schedule SPF (150ms)  | Schedule SPF (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| SPF starts            | SPF starts            |
| SPF ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+380ms
|                       |                       | <- t0+400ms Event remote link change
| Schedule SPF (150ms)  | Schedule SPF (300ms)  |
|                       |                       |
|                       |                       |
| SPF starts            |                       |
|                       |                       |
| SPF ends              |                       |
| RIB/FIB starts        |                       |
|                       | SPF starts            | }
| RIB/FIB ends          |                       | }
|                       | SPF ends              | } micro-loop creation
|                       | RIB/FIB starts        | }
|                       |                       | }
|                       | RIB/FIB ends          | t0+730ms
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       | <- t0+1000ms Event
| Schedule SPF (1s)     | Schedule SPF (600ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       | SPF starts            |
|                       |                       |
|                       | SPF ends              |
|                       | RIB/FIB starts        |
|                       |                       | }
|                       | RIB/FIB ends          | }
|                       |                       | }
|                       |                       | }
|                       |                       | } micro-loop creation
|                       |                       | }
|                       |                       | }
|                       |                       | }
| SPF starts            |                       | }
|                       |                       | }
| SPF ends              |                       | }
| RIB/FIB starts        |                       | }
|                       |                       | }
| RIB/FIB ends          |                       | t0+2030ms

                        Figure 4
]]></artwork>
</figure>
</t>
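<t>A short Python sketch applied to Figure 4, where every event is a topology change, shows the second misalignment: once S exceeds its three rapid runs it switches to its 1 sec slow delay, while E is still at 600 msec. It assumes a 2 sec wait time, one SPF run per trigger, and 30 msec of computation plus RIB/FIB update per run; the helper names are hypothetical.</t>
<figure><artwork><![CDATA[
```python
# Minimal SPF delay timers (msec). Assumed semantics: one SPF run
# per trigger; after `wait` msec without events, a timer returns
# to its initial delay.
def two_step(rapid, runs, slow, wait):
    state = {"runs": 0, "last": None}

    def schedule(now):
        if state["last"] is not None and now - state["last"] >= wait:
            state["runs"] = 0
        state["last"] = now
        state["runs"] += 1
        return rapid if state["runs"] <= runs else slow
    return schedule


def exp_backoff(first, inc, maximum, wait):
    state = {"n": None, "last": None}

    def schedule(now):
        if state["last"] is not None and now - state["last"] >= wait:
            state["n"] = None
        state["last"] = now
        if state["n"] is None:  # fast mode
            state["n"] = 0
            return first
        delay = min(inc * 2 ** state["n"], maximum)  # backoff mode
        state["n"] += 1
        return delay
    return schedule


WAIT = 2000           # assumed: no wait time is given in the text
COMPUTE_AND_FIB = 30  # assumed: t0+180ms = t0 + 150ms + 30ms

# Every event is a remote topology change, so both routers use
# their SPF timer for all four events.
s_spf = two_step(150, 3, 1000, WAIT)
e_spf = exp_backoff(150, 150, 1000, WAIT)

results = []
for t in [0, 200, 400, 1000]:
    s_done = t + s_spf(t) + COMPUTE_AND_FIB
    e_done = t + e_spf(t) + COMPUTE_AND_FIB
    results.append((t, s_done, e_done))
    print(f"event t0+{t}ms: S done t0+{s_done}ms, "
          f"E done t0+{e_done}ms, gap {abs(e_done - s_done)}ms")
```
]]></artwork></figure>
<t>The computed completion times include t0+730ms for E after the third event and t0+2030ms for S after the fourth event, matching the marks in Figure 4; the gaps between S and E are 150 msec and 400 msec for these two events.</t>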
</section>
<section anchor="workitems" title="Proposed work items">
<t>
In order to enhance the current Link State IGP behavior, the authors encourage work on the
standardization of some behaviors.
</t>
<t>The authors propose the following work items:
<list style="symbols">
<t>Standardize SPF trigger strategy.</t>
<t>Standardize the computation timer scope: a single timer for all computation operations, separate timers, etc.</t>
<t>Standardize the "slowdown" timer algorithm, including its association with a particular timer: the authors of this document do not presume that the same algorithm
must be used for all timers.</t>
</list>
</t>
<t>Using the same event sequence as in Figure 3, we may expect fewer and/or shorter micro-loops with standardized implementations.
<figure>
<artwork><![CDATA[
S timescale             E timescale             Event timescale
|                       |                       |
|                       |                       | <- t0 Event
| Schedule PRC (150ms)  | Schedule PRC (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | PRC starts            |
| PRC ends              |                       |
| RIB/FIB starts        | PRC ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+180ms
|                       |                       |
|                       |                       | <- t0+200ms Event
| Schedule PRC (150ms)  | Schedule PRC (150ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | PRC starts            |
| PRC ends              |                       |
| RIB/FIB starts        | PRC ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+380ms
|                       |                       | <- t0+400ms Event
| Schedule PRC (300ms)  | Schedule PRC (300ms)  |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
| PRC starts            | PRC starts            |
| PRC ends              |                       |
| RIB/FIB starts        | PRC ends              |
|                       | RIB/FIB starts        |
| RIB/FIB ends          |                       |
|                       | RIB/FIB ends          | t0+730ms
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       |
|                       |                       | <- t0+1000ms Event
| Schedule SPF (150ms)  | Schedule SPF (150ms)  |
|                       |                       |
|                       |                       |
| SPF starts            | SPF starts            |
|                       |                       |
| SPF ends              |                       |
| RIB/FIB starts        | SPF ends              |
|                       | RIB/FIB starts        | } micro-loop creation
| RIB/FIB ends          |                       | }
|                       | RIB/FIB ends          |
|                       |                       |
|                       |                       |

                        Figure 5
]]></artwork>
</figure>
As displayed above, other parameters, such as router computation power and flooding timers, may also influence micro-loops.
In Figure 5, we consider E to be a bit slower than S, leading to micro-loop creation. Despite this, we expect that by aligning implementations
at least on SPF trigger and SPF delay, service providers may reduce the number and/or duration of micro-loops.
</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>
This document does not introduce any security considerations.
</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements"/>
<section anchor="IANA" title="IANA Considerations">
<t>This document has no action for IANA.</t>
</section>
</middle>
<back>
<references title="Normative References">
&RFC2119;
&RFC2328;
&RFC1195;
&ULOOP;
</references>
</back>
</rfc>