One document matched: draft-litkowski-rtgwg-uloop-delay-02.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC5286 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5286.xml">
<!ENTITY RFC3906 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3906.xml">
<!ENTITY RFC4090 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4090.xml">
<!ENTITY RFC5714 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5714.xml">
<!ENTITY RFC5305 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5305.xml">
<!ENTITY RFC5715 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5715.xml">
<!ENTITY RFC3630 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3630.xml">
<!ENTITY RFC5443 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5443.xml">
<!ENTITY RFC6571 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6571.xml">
<!ENTITY REMOTE-LFA SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-remote-lfa.xml">
<!ENTITY OFIB SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-ordered-fib.xml">
<!ENTITY PLSN SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-rtgwg-microloop-analysis.xml">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> <!-- used by XSLT processors -->
<!-- OPTIONS, known as processing instructions (PIs) go here. -->
<!-- For a complete list and description of PIs,
please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable PIs that most I-Ds might want to use. -->
<?rfc strict="yes" ?> <!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC): -->
<?rfc toc="yes"?> <!-- generate a ToC -->
<?rfc tocdepth="3"?> <!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references: -->
<?rfc symrefs="yes"?> <!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?> <!-- sort the reference entries alphabetically -->
<!-- control vertical white space:
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?> <!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?> <!-- keep one blank line between list items -->
<!-- end of popular PIs -->
<rfc category="std" docName="draft-litkowski-rtgwg-uloop-delay-02" ipr="trust200902">
<front>
<title abbrev="uloop-delay">Microloop prevention by introducing a local convergence delay</title>
<author fullname="Stephane Litkowski" initials="S" surname="Litkowski">
<organization>Orange</organization>
<address>
<!-- postal><street/><city/><region/><code/><country/></postal -->
<!-- <phone/> -->
<!-- <facsimile/> -->
<email>stephane.litkowski@orange.com</email>
<!-- <uri/> -->
</address>
</author>
<author fullname="Bruno Decraene" initials="B" surname="Decraene">
<organization>Orange</organization>
<address>
<!-- postal><street/><city/><region/><code/><country/></postal -->
<!-- <phone/> -->
<!-- <facsimile/> -->
<email>bruno.decraene@orange.com</email>
<!-- <uri/> -->
</address>
</author>
<author fullname="Pierre Francois" initials="P" surname="Francois">
<organization>IMDEA Networks</organization>
<address>
<!-- postal><street/><city/><region/><code/><country/></postal -->
<!-- <phone/> -->
<!-- <facsimile/> -->
<email>pierre.francois@imdea.org</email>
<!-- <uri/> -->
</address>
</author>
<author fullname="Clarence Fils Fils" initials="C" surname="Filsfils">
<organization>Cisco Systems</organization>
<address>
<!-- postal><street/><city/><region/><code/><country/></postal -->
<!-- <phone/> -->
<!-- <facsimile/> -->
<email>cf@cisco.com</email>
<!-- <uri/> -->
</address>
</author>
<date year="2014" />
<area></area>
<workgroup>Routing Area Working Group</workgroup>
<!-- <keyword/> -->
<!-- <keyword/> -->
<!-- <keyword/> -->
<!-- <keyword/> -->
<abstract>
<t>
This document describes a mechanism for link-state routing protocols
to prevent local transient forwarding loops in case of link failure.
This mechanism
Proposes a two-steps convergence by introducing a delay between the convergence of the node adjacent to the topology change and the network wide convergence.
</t>
<t>
As this mechanism delays the IGP convergence it may only be used for planned maintenance or when fast reroute protects the traffic between the link failure and the IGP convergence.
</t>
<t>
Simulations using real network topologies have been performed and show that local loops are a significant portion (>50%) of the total forwarding loops.
</t>
</abstract>
<note title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119"/>.</t>
</note>
</front>
<middle>
<section anchor="intro" title="Introduction">
<t>
In figure 1, upon link AD down event, for the destination A, if D
updates its forwarding entry before C, a transient forwarding loop
occurs between C and D. We have a similar loop for link up event, if
C updates its forwarding entry A before D.
<figure>
<artwork>
A ------ B
| |
| |
D--------C All the links have a metric of 1 except BC=5
Figure 1
</artwork>
</figure>
</t>
</section>
<section anchor="overview" title="Overview of the solution">
<t>
This document defines a two-step convergence initiated by the router
detecting the failure and advertising the topological changes in the
IGP. This introduces a delay between the convergence of the local
router and the network wide convergence. This delay is
positive in case of "down" events and negative in case of "up"
events.
</t>
<t>
This ordered convergence, is similar to the ordered FIB proposed
defined in <xref target="I-D.ietf-rtgwg-ordered-fib"/>, but limited to only one hop
distance. As a consequence, it is simpler and becomes a local only feature not requiring interoperability; at the cost of only covering the transient forwarding loops involving this local router. The proposed mechanism also reuses some concept described
in <xref target="I-D.ietf-rtgwg-microloop-analysis"/> with some limitation and
improvements.
</t>
</section>
<section anchor="specification" title="Specification">
<section anchor="definition" title="Definitions">
<t>
This document will refer to the following existing IGP timers:
<list style="symbols">
<t>LSP_GEN_TIMER: to batch multiple local events in one single local LSP update. It is often associated with damping mechanism to slowdown reactions by incrementing the timer when multiple consecutive events are detected.</t>
<t>SPF_TIMER: to batch multiple events in one single computation. It is often associated with damping mechanism to slowdown reactions by incrementing the timer when the IGP is instable.</t>
<t>IGP_LDP_SYNC_TIMER: defined in <xref target="RFC5443"/> to give LDP some time to establish the session and learn the MPLS labels before the link is used.</t>
</list>
</t>
<t>
This document introduces the following two new timers :
<list style="symbols">
<t>ULOOP_DELAY_DOWN_TIMER: slowdown the network wide IGP convergence in case of link down events.</t>
<t>ULOOP_DELAY_UP_TIMER: slowdown the local node convergence in case of link up events.</t>
</list>
</t>
</section>
<section anchor="description-current" title="Current IGP reactions">
<t>
Upon a change of status on an adjacency/link, the existing behavior of the router advertising the event is the following:
<list style="numbers">
<t>UP/Down event is notified to IGP.</t>
<t>IGP processes the notification and postpones the reaction in LSP_GEN_TIMER msec.</t>
<t>Upon LSP_GEN_TIMER expiration, IGP updates its LSP/LSA and floods it.</t>
<t>SPF is scheduled in SPF_TIMER msec.</t>
<t>Upon SPF_TIMER expiration, SPF is computed and RIB/FIB are updated.</t>
</list>
</t>
</section>
<section anchor="description-local-events" title="Local events">
<t>
In the next sections, we will use the concept of local events versus remote events.
The notion of event we are using in this document is linked to IGP link state advertisements and not network events, as a single network event would create multiple IGP link state advertisement within the network.
</t>
<t>
A local event is a set of IGP link state advertisements describing only a change of a local component of the computing router (e.g. a link).
As opposite to a remote event being a set of IGP link state advertisements describing any other type of changes.
</t>
<t>Example :
<figure>
<artwork>
+--- E ----+--------+
| | |
A ---- B -------- C ------ D
</artwork>
</figure>
Considering computing router is B, when B-C fails. B updates its local LSP describing the link B->C being down, C does exactly the same and starts flooding. During SPF_TIMER, B and C LSPs would be taken into account. B and C LSPs are describing exactly the same event (B-C link down). For B point of view, both LSPs must be considered as a local event as they are describing the change of a local component of B (link B-C).
If C node is failing, routers B,E and D are updating and flooding their LSPs. LSPs from E and D are considered as remote events for B as they are describing a change in a component that does not belong to B. Hence the local delay mechanism will be aborted. Hence this mechanism is not applicable to node failure.
</t>
</section>
<section anchor="description-new" title="Local delay">
<section anchor="description-updown" title="Link down event">
<t>
Upon an adjacency/link down event, this document introduces a change
in step 5 in order to delay the local convergence compared to the
network wide convergence: the node SHOULD delay the forwarding entry
updates by ULOOP_DELAY_DOWN_TIMER. Such delay SHOULD only be
introduced if all the LSDB modifications processed are only reporting
down local events . Note that determining that all topological change
are only local down events requires analyzing all modified LSP/LSA as
a local link or node failure will typically be notified by multiple
nodes. If a subsequent LSP/LSA is received/updated and a new SPF
computation is triggered before the expiration of
ULOOP_DELAY_DOWN_TIMER, then the same evaluation SHOULD be performed.
</t>
<t> As a result of this addition, routers local to the failure will
converge slower than remote routers. Hence it SHOULD only be done for non urgent convergence, such as for administrative de-activation (maintenance) or when the traffic is Fast ReRouted.
</t>
</section>
<section anchor="description-downup" title="Link up event">
<t> Upon an adjacency/link up event, this document introduces the
following change in step 3 where the node SHOULD:
<list style="symbols">
<t> Firstly build a LSP/LSA with the new adjacency but setting the
metric to MAX_METRIC . It SHOULD flood it but not compute the SPF
at this time. This step is required to ensure the two way connectivity check on all nodes when computing SPF.
</t>
<t> Then build the LSP/LSA with the target metric but SHOULD delay the
flooding of this LSP/LSA by SPF_TIMER + ULOOP_DELAY_UP_TIMER.
MAX_METRIC is equal to MaxLinkMetric (0xFFFF) for OSPF and 2^24-2
(0xFFFFFE) for IS-IS.
</t>
<t> Then continue with next steps (SPF computation) without waiting
for the expiration of the above timer. In other word, only the
flooding of the LSA/LSP is delayed, not the local SPF computation.
</t>
</list>
</t>
<t> As as result of this addition, routers local to the failure will
converge faster than remote routers.
</t>
<t>
If this mechanism is used in cooperation with "LDP IGP
Synchronization" as defined in <xref target="RFC5443"/> then the mechanism defined
in RFC 5443 is applied first, followed by the mechanism defined in
this document. More precisely, the procedure defined in this
document is applied once the LDP session is considered "fully
operational" as per <xref target="RFC5443"/>.
</t>
</section>
</section>
</section>
<section anchor="use-case" title="Applicability">
<t>As previously stated, the mechanism only avoids the forwarding loops on the links between the node local to the failure and its neighbor. Forwarding loops may still occur on other links.</t>
<section anchor="use-case-working" title="Applicable case : local loops">
<t>
<figure>
<artwork>
A ------ B ----- E
| / |
| / |
G---D------------C F All the links have a metric of 1
Figure 2
</artwork>
</figure>
</t>
<t>
Let us consider the traffic from G to F. The primary path is
G->D->C->E->F. When link CE fails, if C updates its forwarding entry
for F before D, a transient loop occurs.
This is sub-optimal as C has FRR enabled and it breaks the FRR forwarding while all upstream routers are still forwarding the traffic to itself.
</t>
<t>
By implementing the mechanism defined in this document on C, when the
CE link fails, C delays the update of his forwarding entry to F, in
order to let some time for D to converge. FRR keeps protecting the traffic during this period. When the timer expires on
C, forwarding entry to F is updated. There is no transient
forwarding loop on the link CD.
</t>
</section>
<section anchor="use-case-nonworking" title="Non applicable case : remote loops">
<t>
<figure>
<artwork>
A ------ B ----- E --- H
| |
| |
G---D--------C ------F --- J ---- K
All the links have a metric of 1 except BE=15
Figure 3
</artwork>
</figure>
</t>
<t>
Let us consider the traffic from G to K. The primary path is
G->D->C->F->J->K. When the CF link fails, if C updates its forwarding
entry to K before D, a transient loop occurs between C and D.</t>
<t>
By implementing the mechanism defined in this document on C, when the
link CF fails, C delays the update of his forwarding entry to K,
letting time for D to converge. When the timer expires on C,
forwarding entry to F is updated. There is no transient forwarding
loop between C and D. However, a transient forwarding loop may still
occur between D and A. In this scenario, this mechanism is not enough
to address all the possible forwarding loops. However, it does not
create additional traffic loss. Besides, in some cases -such as when
the nodes update their FIB in the following order C, A, D, for
example because the router A is quicker than D to converge- the
mechanism may still avoid the forwarding loop that was occuring.
</t>
</section>
</section>
<section anchor="applicability" title="Simulations">
<t>
Simulations have been run on multiple service provider topologies. So far, only link down event have been tested.
</t>
<texttable anchor="simu-result-topology" title="Number of Repair/Dst that may loop">
<ttcol align="center">Topology</ttcol>
<ttcol align="center">Gain</ttcol>
<c>T1</c><c>71%</c>
<c>T2</c><c>81%</c>
<c>T3</c><c>62%</c>
<c>T4</c><c>50%</c>
<c>T5</c><c>70%</c>
<c>T6</c><c>70%</c>
<c>T7</c><c>59%</c>
<c>T8</c><c>77%</c>
</texttable>
<t>
We evaluated the efficiency of the mechanism on eight different
service provider topologies (different network size, design). The
benefit is displayed in the table above. The benefit is evaluated as
follows:
<list style="symbols">
<t>We consider a tuple (link A-B, destination D, PLR S, backup nexthop N) as a loop if upon link A-B failure, the flow from a router S upstream from A (A could be considered as PLR also) to D may loop due to convergence time difference between S and one of his neighbor N.</t>
<t>We evaluate the number of potential loop tuples in normal conditions.</t>
<t>We evaluate the number of potential loop tuples using the same topological input but taking into account that S converges after N.</t>
<t>Gain is how much loops (remote and local) we succeed to suppress.</t>
</list>
On topology 1, 71% of the transient forwarding loops created by the failure of any link are prevented by implementing the local delay. The analysis shows that all local loops are obviously solved and only remote loops are remaining.
</t>
</section>
<section anchor="Deployment" title="Deployment considerations">
<t>
Transient forwarding loops have the following drawbacks :
<list style="symbols">
<t>Limit FRR efficiency : even if FRR is activated in 50msec, as soon as PLR has converged, traffic may be affected by a transient loop.</t>
<t>It may impact traffic not directly concerned by the failure (due to link congestion).</t>
</list>
This local delay proposal is a transient forwarding loop avoidance
mechanism (like OFIB). Even if it only address local transient loops, , the efficiency versus complexity comparison of the
mechanism makes it a good solution. It is also incrementally deployable with incremental benefits, which makes it an attractive option for both vendors to implement and Service Providers to deploy.
Delaying convergence time is not an issue if we consider that the
traffic is protected during the convergence.
</t>
</section>
<section anchor="Security" title="Security Considerations">
<t>
This document does not introduce change in term of IGP security. The
operation is internal to the router. The local delay does not
increase the attack vector as an attacker could only trigger this
mechanism if he already has be ability to disable or enable an IGP
link. The local delay does not increase the negative consequences as
if an attacker has the ability to disable or enable an IGP link, it
can already harm the network by creating instability and harm the
traffic by creating forwarding packet loss and forwarding loss for
the traffic crossing that link.
</t>
</section>
<section anchor="Acknowledgements" title="Acknowledgements">
<t>
We wish to thanks the authors of <xref target="I-D.ietf-rtgwg-ordered-fib"/> for introducing the concept of ordered convergence: Mike Shand, Stewart Bryant, Stefano Previdi, and Olivier Bonaventure.
</t>
</section>
<section anchor="IANA" title="IANA Considerations">
<t>
This document has no actions for IANA.
</t>
</section>
</middle>
<back>
<references title="Normative References">
&RFC2119;
&RFC5715;
&RFC5443;
</references>
<references title="Informative References">
&OFIB;
&REMOTE-LFA;
&RFC6571;
&RFC3630;
&PLSN;
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 02:54:28 |