One document matched: draft-ietf-6man-impatient-nud-07.xml
<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc0826 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.0826.xml'>
<!ENTITY rfc2119 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc3971 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3971.xml'>
<!ENTITY rfc4861 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4861.xml'>
<!ENTITY rfc6583 PUBLIC ''
'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6583.xml'>
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc strict="no"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" ipr="trust200902" docName="draft-ietf-6man-impatient-nud-07.txt" updates="4861">
<front>
<title abbrev="NUD is too impatient">
Neighbor Unreachability Detection is too impatient
</title>
<author initials="E" surname="Nordmark" fullname="Erik Nordmark">
<organization>Arista Networks</organization>
<address>
<postal>
<street></street>
<city>Santa Clara, CA</city>
<country>USA</country>
</postal>
<email>nordmark@acm.org</email>
</address>
</author>
<author initials="I" surname="Gashinsky" fullname="Igor Gashinsky">
<organization>Yahoo!</organization>
<address>
<postal>
<street>45 W 18th St</street>
<city>New York, NY</city>
<country>USA</country>
</postal>
<email>igor@yahoo-inc.com</email>
</address>
</author>
<date month="October" year="2013"/>
<workgroup>6MAN WG</workgroup>
<keyword>6MAN</keyword>
<abstract>
<t>IPv6 Neighbor Discovery includes Neighbor Unreachability Detection. That
function is very useful when a host has an alternative neighbor, for instance when there are
multiple default routers, since it allows the host to switch to the alternative neighbor in
short time. This time is 3 seconds after the node starts probing by default. However,
if there are no alternative neighbors, this is far too impatient. This document
specifies relaxed rules for Neighbor Discovery retransmissions that allow an
implementation to choose different timeout behavior based on whether or not there are alternative
neighbors. This document updates RFC 4861.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>
IPv6 Neighbor Discovery <xref target="RFC4861"/> includes Neighbor
Unreachability Detection (NUD), which detects when a neighbor is no longer
reachable. The timeouts specified for NUD are very short (by default three transmissions
spaced one second apart). These short timeouts can be appropriate when there are alternative
neighbors to which the packets can be sent. For example, if a host has multiple
default routers in its Default Router List or if the host has a
Neighbor Cache Entry (NCE) created by a Redirect
message. In those cases, when NUD fails, the host
will try the alternative neighbor by redoing next-hop selection. That implies
picking the next router in the Default Router List or discarding the redirect,
respectively.</t>
<t>The timeouts specified in <xref target="RFC4861"/> were chosen to be
short in order to optimize for the scenarios where alternative neighbors are available.</t>
<t>However, when there is no alternative neighbor there are several benefits in making
NUD try probing for a longer time. One of those benefits is to make NUD more
robust against transient failures, such as spanning tree reconvergence and other
layer 2 issues that can take many seconds to resolve. Marking the NCE as
unreachable in that case causes additional multicast on the network. Assuming
there are IP packets to send, the lack of an NCE will result in multicast
Neighbor Solicitations being sent (to the solicited-node multicast address)
every second instead of the unicast Neighbor
Solicitations that NUD sends.</t>
<t>As a result IPv6 Neighbor Discovery is operationally more brittle than IPv4 ARP. For IPv4 there
is no mandatory time limit on the retransmission behavior for ARP
<xref target="RFC0826"/> which allows implementors to pick more robust
schemes.</t>
<t>The following constant values in <xref target="RFC4861"/> seem to have
been made part of IPv6 conformance testing: MAX_MULTICAST_SOLICIT,
MAX_UNICAST_SOLICIT, and
RETRANS_TIMER. While such strict conformance testing seems consistent
with <xref target="RFC4861"/>, it means that the standard needs to be updated
to allow IPv6 Neighbor Discovery to be as robust as
ARP.</t>
<t>This document updates RFC 4861 to relax the retransmission rules.</t>
<t>Additional motivations for making IPv6 Neighbor Discovery more robust
in the face of degenerate conditions are covered in
<xref target="RFC6583"/>.</t>
</section>
<section title="Definition Of Terms">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in <xref target="RFC2119"/>.
</t>
</section>
<section title="Protocol Updates" anchor='protocol-updates'>
<t>Discarding the NCE after after three packets spaced one second apart is only needed when
an alternative neighbor is available, such as an additional default router or a redirect.</t>
<t>If an implementation transmits more than MAX_UNICAST_SOLICIT/MAX_MULTICAST_SOLICIT
packets then it SHOULD use
exponential backoff of the retransmit timer. This is to avoid any
significant load due to a steady background level of retransmissions from
implementations that retransmit a large number of NSes before discarding the NCE.</t>
<t>Even if there is no alternative neighbor, the protocol needs to be able to
handle the case when the link-layer address of the neighbor/target has changed by
switching to multicast Neighbor Solicitations at some point in time.</t>
<t>In order to capture all the cases above this document introduces a new
UNREACHABLE state in the conceptual model described in <xref target="RFC4861"/>.
A NCE in the UNREACHABLE state retains
the link-layer address, and IPv6 packets continue to be sent to that link-layer
address. But in the UNREACHABLE state the NUD Neighbor Solicitations are
multicast (to the solicited-node multicast address),
using a timeout that follows an exponential backoff.</t>
<t>In the places where RFC4861 says to to discard/delete the NCE after
N probes (Section 7.3, 7.3.3 and Appendix C) this document instead specifies a
transition to the UNREACHABLE state.</t>
<t>If the Neighbor Cache Entry was created by a redirect, a node MAY delete the
NCE instead of changing its state to UNREACHABLE. In any case, the node SHOULD
NOT use an NCE created by a Redirect to send packets if that NCE is in
UNREACHABLE state. Packets should be sent following the next-hop selection
algorithm in <xref target="RFC4861"/>, Section 5.2,
which disregards NCEs that are not reachable.</t>
<t>The default router selection in <xref target="RFC4861"/>, Section 6.3.6
says to prefer default routers
that are "known to be reachable". For the purposes of that section, if the NCE
for the router is in UNREACHABLE state, it is not known to be reachable.
Thus the particular text in section 6.3.6 which says "in any
state other than INCOMPLETE" needs to be extended to say "in any
state other than INCOMPLETE or UNREACHABLE".</t>
<t>Apart from the use of multicast NS instead of unicast NS, and the
exponential backoff of the timer, the UNREACHABLE state works the same
as the current PROBE state.</t>
<t>A node MAY garbage collect a Neighbor Cache Entry at any time as specified
in RFC 4861. This freedom to garbage collect does not change with the
introduction of the UNREACHABLE state in the conceptual model. An
implementation MAY prefer garbage collecting UNREACHABLE NCEs over other NCEs.</t>
<t>There is a non-obvious extension to the state machine description
in Appendix C in RFC 4861 in the case for "NA, Solicited=1, Override=0. Different link-layer
address than cached". There we need to add "UNREACHABLE" to the current list
of "STALE, PROBE, Or DELAY". That is, the NCE would be unchanged.
Note that there is no corresponding change necessary to
the text in <xref target="RFC4861"/>, Section 7.2.5, since it is phrased using "Otherwise" instead of explicitly listing the three states.</t>
<t>The other state transitions described in Appendix C handle the introduction
of the UNREACHABLE state without any change, since they are described using
"not INCOMPLETE".
</t>
<t>There is also the more obvious change already described above. RFC 4861
has this:</t>
<figure>
<artwork><![CDATA[
State Event Action New state
PROBE Retransmit timeout, Discard entry -
N or more
retransmissions.
]]></artwork>
</figure>
<t>That needs to be replaced by:</t>
<figure>
<artwork><![CDATA[
State Event Action New state
PROBE Retransmit timeout, Increase timeout UNREACHABLE
N retransmissions. Send multicast NS
UNREACHABLE Retransmit timeout Increase timeout UNREACHABLE
Send multicast NS
]]></artwork>
</figure>
<t>The exponential backoff SHOULD be clamped at some reasonable
maximum retransmit timeout, such as 60 seconds (see MAX_RETRANS_TIMER below).
If there is no IPv6
packet sent using the UNREACHABLE NCE, then it is RECOMMENDED to stop the
retransmits of the multicast NS until either the NCE is garbage collected
or there are IPv6 packets sent using the NCE.
The multicast NS and associated exponential backoff
can be applied on the condition of the continued use of the NCE to send IPv6
packets to the recorded link-layer address.</t>
<t>A node can unicast the first few Neighbor Solicitation messages even
while in UNREACHABLE state, but it MUST switch to multicast Neighbor
Solicitations within 60 seconds of the initial retransmission
to be able to handle a link-layer address change
for the target. The example below shows such behavior.</t>
</section>
<section title="Example Algorithm">
<t>This section is NOT normative, but specifies a simple implementation which
conforms with this document. The implementation is described using operator
configurable values that allows it to be configured in a way to be
compatible with the retransmission behavior in <xref target="RFC4861"/>.
The operator can configure the values for MAX_UNICAST_SOLICIT, MAX_MULTICAST_SOLICIT,
RETRANS_TIMER, and the new BACKOFF_MULTIPLE, MAX_RETRANS_TIMER and MARK_UNREACHABLE.
This allows the implementation to be as simple as:
</t>
<t>
next_retrans = ($BACKOFF_MULTIPLE ^ $solicit_retrans_num) * $RetransTimer *
$JitterFactor
where solicit_retrans_num is zero for the first transmission, and JitterFactor
is a random value between MIN_RANDOM_FACTOR and MAX_RANDOM_FACTOR <xref target="RFC4861"/> to avoid any synchronization of transmissions from different hosts.
</t>
<t>
After MARK_UNREACHABLE transmissions the implementation would mark the
NCE UNREACHABLE and as result explore alternate next hops.
After MAX_UNICAST_SOLICIT the implementation would
switch to multicast NUD probes.
</t>
<t>
The behavior of this example algorithm is to have 5 attempts, with timing spacing of 0
(initial request), 1 second later, 3 seconds after the first retransmission,
then 9, then 27, and
switch to UNREACHABLE after the first three transmissions. Thus relative to
the time
of the first transmissions the retransmissions would occur at 1 second,
4 seconds, 13 seconds, and finally 40 seconds.
At 4 seconds from the first transmission the NCE would be
marked UNREACHABLE. That behavior
corresponds to:
<list>
<t>MAX_UNICAST_SOLICIT=5</t>
<t>RETRANS_TIMER=1 (default)</t>
<t>MAX_RETRANS_TIMER=60</t>
<t>BACKOFF_MULTIPLE=3</t>
<t>MARK_UNREACHABLE=3</t>
</list>
After 3 retransmissions the implementation would mark the NCE UNREACHABLE.
That results in trying an alternative neighbor, such as another default router or
ignoring a redirect as specified in <xref target="RFC4861"/>.
With the above values that would occur after 4 seconds after
the first transmission compared to the 2 seconds using the fixed scheme in
<xref target="RFC4861"/>. That
additional delay is small compared to the default 30,000 milliseconds ReachableTime.
</t>
<t>After 5 transmissions, i.e., 40 seconds after the initial transmission,
the example behavior is to switch to multicast NUD probes.
In the language of the state machine in
<xref target="RFC4861"/> that corresponds to the action "Discard entry".
Thus any attempts to send future packets would result in sending
multicast NS packets. An implementation MAY retain the backoff value as
it switches to multicast NUD probes.
The potential downside of deferring switching to multicast is that it would
take longer for NUD to handle a change in a link-layer
address i.e., the case when a host or a router changes their link-layer address
while keeping the same IPv6 address. However, <xref target="RFC4861"/> says
that a node MAY send unsolicited NS to handle that case, which is rather
infrequent in operational networks. In any case, the implementation needs
to follow the "SHOULD" in section <xref target='protocol-updates'/> to switch to
multicast solutions within 60 seconds after the initial transmission.
</t>
<t>
If BACKOFF_MULTIPLE=1, MARK_UNREACHABLE=3 and
MAX_UNICAST_SOLICIT=3, you would get the same behavior as in
<xref target="RFC4861"/>.
</t>
<t>
An implementation following this algorithm would, if the request was
not answered at first due for example to a transitory condition,
retry immediately, and then back off for progressively longer
periods. This would allow for a reasonably fast resolution time when
the transitory condition clears.
</t>
<t>
Note that RetransTimer and ReachableTime are by default set from
the protocol constants RETRANS_TIMER and REACHABLE_TIME, but are
overridden by values advertised in Router Advertisements as specified
in <xref target="RFC4861"/>. That remains the case even with the
protocol updates specified in this document.
The key values that the operator would configure are BACKOFF_MULTIPLE,
MAX_RETRANS_TIMER,
MAX_UNICAST_SOLICIT and MAX_MULTICAST_SOLICIT.
</t>
<t>It is be useful to have a maximum value for
($BACKOFF_MULTIPLE^$solicit_attempt_num)*$RetransTimer
so that the retransmissions are not too far apart. The above value
of 60 seconds for this MAX_RETRANS_TIMER is consistent with DHCPv6.</t>
</section>
<section title="Acknowledgements">
<t>The comments from Thomas Narten, Philip Homburg, Joel Jaeggli, Hemant
Singh, Tina Tsou, Suresh Krishnan, and Murray Kucherawy have helped improve this draft.</t>
</section>
<section title="Security Considerations">
<t>Relaxing the retransmission behavior for NUD is believed to have no impact on
security. In particular, it doesn't impact the application Secure Neighbor
Discovery <xref target="RFC3971"/>.</t>
</section>
<section title="IANA Considerations">
<t>This are no IANA considerations for this document.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc3971;
&rfc4861;
</references>
<references title="Informative References">
&rfc0826;
&rfc6583;
</references>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-23 11:09:20 |