One document matched: draft-gashinsky-v6nd-enhance-00.xml
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc4861 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4861.xml">
<!ENTITY rfc4033 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4033.xml">
<!ENTITY rfc4034 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4034.xml">
<!ENTITY rfc4035 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4035.xml">
<!ENTITY rfc4255 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4255.xml">
<!ENTITY rfc4398 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4398.xml">
<!ENTITY rfc4862 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4862.xml">
<!ENTITY rfc5246 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5246.xml">
<!ENTITY rfc6164 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6164.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!--
-->
<?rfc strict='yes'?>
<?rfc toc="yes"?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc strict="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="info" docName="draft-gashinsky-v6nd-enhance-00"
ipr="trust200902">
<front>
<title abbrev="Operational ND Enhance">Operational Neighbor Discovery
Problems and Enhancements.</title>
<author fullname="Warren Kumari" initials="W." surname="Kumari">
<organization>Google</organization>
<address>
<email>warren@kumari.net</email>
</address>
</author>
<author fullname="Igor" initials="I." surname="Gashinsky">
<organization>Yahoo!</organization>
<address>
<postal>
<street>45 W 18th St</street>
<city>New York</city>
<region>NY</region>
<country>USA</country>
</postal>
<email>igor@yahoo-inc.com</email>
</address>
</author>
<author fullname="Joel" initials="J." surname="Jaeggli">
<organization>Zynga</organization>
<address>
<postal>
<street>111 Evelyn</street>
<city>Sunnyvale</city>
<region>CA</region>
<country>USA</country>
</postal>
<email>jjaeggli@zynga.com</email>
</address>
</author>
<date day="29" month="June" year="2011" />
<abstract> <t>In IPv4, subnets are generally small, made just
large enough to cover the actual number of machines on the
subnet. In contrast, the default IPv6 subnet size is a /64, a
number so large it covers trillions of addresses, the
overwhelming number of which will be unassigned. Consequently,
simplistic implementations of Neighbor Discovery can be
vulnerable to denial of service attacks whereby they attempt to
perform address resolution for large numbers of unassigned
addresses. Such denial of attacks can be launched intentionally
(by an attacker), or result from legitimate operational tools
that scan networks for inventory and other purposes. As a result
of these vulnerabilities, new devices may not be able to "join"
a network, it may be impossible to establish new IPv6 flows,
and existing ipv6 transported flows may be interrupted.</t>
<t>This document describes the problem in detail and suggests
possible implementation improvements as well as operational
mitigation techniques that can in some cases to protect against
such attacks. It also discusses possible modifications to the
<xref target="RFC4861">traditional</xref> neighbor discovery
protocol itself.</t> </abstract> </front>
<middle> <section title="Introduction"> <t>This document describes
implementation issues with IPv6's Neighbor Discovery
protocol that can result in vulnerabilities when a network
is scanned, either by an intruder or through the use of
scanning tools that peform network inventory, security
audits, etc. (e.g., "nmap"). </t>
<t> This document describes the problem in detail and suggests
possible implementation improvements as well as operational
mitigation techniques that can in some cases to protect against
such attacks. It also discusses possible modifications to the
<xref target="RFC4861">traditional</xref> neighbor discovery
protocol itself.</t>
<t>The RFC series documents generally describe on-the-wire
behavior of protocols, that is, "what" is to be done by a
protocol, but not exactly "how" it is to be implemented. The exact
details of how best to implement a protocol will depend on the
overall hardware and software architecture of a particular
device. The actual "how" decisions are (correctly) left in the
hands of implementers, so long as implementations produce proper
on-the-wire behavior.</t>
<t>While reading this document, it is important to keep in mind
that discussions of how things have been implemented beyond basic
compliance with the specification is not in the scope of the
neighbor discovery RFCs.</t>
<section title="Applicability">
<t>This document is primarily intended for operators of IPV6 networks
and implementors of <xref target="RFC4861"></xref>. The Document
provides some operational consideration as well as recommendations to
increase the resilience of the Neighbor Discovery protocol.</t>
</section>
</section>
<section title="The Problem">
<t> In IPv4, subnets are generally small, made just large
enough to cover the actual number of machines on the subnet.
For example, an IPv4 /20 contains only 4096 address. In
contrast, the default IPv6 subnet size is a /64, a number so
large it covers literally billions of billions of addresses, the
overwhelming number of which will be unassigned. Consequently,
simplistic implementations of Neighbor Discovery can be
vulnerable to denial of service attacks whereby they perform
address resolution for large numbers of unassigned addresses.
Such denial of attacks can be launched intentionally (by an
attacker), or result from legitimate operational tools that scan
networks for inventory and other purposes. As a result of these
vulnerabilities, new devices may not be able to "join" a
network, it may be impossible to establish new IPv6 flows, and
existing ipv6 transport flows may be interrupted. </t>
<t> Network scans attempt to find and probe devices on a
network. Typically, scans are performed on a range of target
addresses, or all the addresses on a particular subnet. When
such probes are directed via a router, and the target
addresses are on a directly attached network, the router will
to attempt to perform address resolution on a large number of
destinations (i.e., some fraction of the 2^64 addresses on the
subnet). The process of testing for the (non)existance of
neighbors can induce a denial of service condition, where the
number of Neighbor Discovery requests overwhelms the
implementation's capacity to process them, exhausts available
memory, replaces existing in-use mappings with incomplete
entries that will never be completed, etc. The result can be
network disruption, where existing traffic may be impacted,
and devices that join the net find that address resolutions
fails.</t>
<t> In order to alleviate risk associated with this DOS
threat, some router implementations have taken steps to
rate-limit the processing rate of Neighbor Solicitations
(NS). While these mitigations do help, they do not fully
address the issue and may introduce their own set of potential
liabilities to the neighbor discovery process.</t> </section>
<section title="Terminology">
<t><list style="hanging">
<t hangText="Address Resolution">Address resolution is the
process through which a node determines the link-layer
address of a neighbor given only its IP address. In IPv6,
address resolution is performed as part of Neighbor
Discovery <xref target="RFC4861"></xref>, p60</t>
<t hangText="Forwarding Plane">That part of a router
responsible for forwarding packets. In higher-end routers,
the forwarding plane is typically implemented in specialized
hardware optimized for performance. Forwarding steps include
determining the correct outgoing interface for a packet,
decrementing its Time To Live (TTL), verifying and updating
the checksum, placing the correct link-layer header on the
packet, and forwarding it. </t>
<t hangText="Control Plane"> That part of the router
implementation that maintains the data structures that
determine where packets should be forwarded. The control
plane is typically implemented as a "slower" software
process running on a general purpose processor and is
responsible for such functions as the routing protocols,
performing management and resolving the correct link-layer
address for adjacent neighbors. The control plane "controls"
the forwarding plane by programming it with the information
needed for packet forwarding.</t>
<t hangText="Neighbor Cache">As described in <xref
target="RFC4861"></xref>, the data structure that holds the
cache of (amongst other things) IP address to link-layer
address mappings for connected nodes. The forwarding plane
accesses the Neighbor Cache on every forwarded packet. Thus
it is usually implemented in an ASIC .</t>
<t hangText="Neighbor Discovery Process">The Neighbor
Discovery Process (NDP) is that part of the control plane
that implements the Neighbor Discovery protocol. NDP is
responsible for performing address resolution and
maintaining the Neighbor Cache. When forwarding packets, the
forwarding plane accesses entries within the Neighbor
Cache. Whenever the forwarding plane processes a packet for
which the corresponding Neighbor Cache Entry is missing or
incomplete, it notifies NDP to take appropriate action
(typically via a shared queue). NDP picks up requests from
the shared queue and performs any necessary actions. In many
implementations it is also responsible for responding to
router solicitation messages, Neighbor Unreachability
Detection (NUD), etc.</t> </list></t> </section>
<section title="Background">
<t>Modern router architectures separate the forwarding of packets
(forwarding plane) from the decisions needed to decide where the
packets should go (control plane). In order to deal with the
high number of packets per second the forwarding plane is
generally implemented in hardware and is highly optimized for
the task of forwarding packets. In contrast, the NDP control
plane is mostly implemented in software processes running on a
general purpose processor. </t>
<t>When a router needs to forward an IP packet, the forwarding
plane logic performs the longest match lookup to determine where
to send the packet and what outgoing interface to use. To
deliver the packet to an adjacent node, It encapsulates the
packet in a link-layer frame (which contains a header with the
link-layer destination address). The forwarding plane logic
checks the Neighbor Cache to see if it already has a suitable
link-layer destination, and if not, places the request for the
required information into a queue, and signals the control plane
(i.e., NDP) that it needs the link-layer address resolved. </t>
<t> In order to protect NDP specifically and the control plane
generally from being overwhelmed with these requests,
appropriate steps must be taken. For example, the size and rate
of the queue might be limited. NDP running in the control plane
of the router dequeues requests and performs the address
resolution function (by performing a neighbor solicitation and
listening for a neighbor advertisement). This process is usually
also responsible for other activities needed to maintain
link-layer information, such as Neighbor Unreachability
Detection (NUD).</t>
<t>An attacker sending the appropriate packets to addresses on a given
subnet can cause the router to queue attempts to resolve so many
addresses that it crowds out attempts to resolve "legitimate" addresses
(and in many cases becomes unable to perform maintenance of existing
entries in the neighbor cache, and unable to answer Neighbor
Solicitiation). This condition can result the inability to resolve new
neighbors and loss of reachability to neighbors with existing ND-Cache
entries. During testing it was concluded that 4 simultaneous nmap
sessions from a low-end computer was sufficient to make a router's
neighbor discovery process unhappy and therefore forwarding
unusable.</t>
<t>This behavior has been observed across multiple platforms and
implementations.</t>
</section>
<section title="Neighbor Discovery Overview"> <t>When a packet
arrives at (or is generated by) a router for a destination on
an attached link, the router needs to determine the correct
link-layer address to send the packet to. The router checks
the Neighbor Cache for an existing Neighbor Cache Entry for
the neighbor, and if none exists, invokes the address
resolution portions of the <xref target="RFC4861">IPv6
Neighbor Discovery </xref> protocol to determine the
link-layer address.</t>
<t>RFC4861 Section 5.2 (Conceptual Sending Algorithm) outlines
how this process works. A very high level summary is that the
device creates a new Neighbor Cache Entry for the neighbor,
sets the state to INCOMPLETE, queues the packet and initiates
the actual address resolution process. The device then sends
out one or more Neighbor Solicitiations, and when it receives
a correpsonding Neighbor Advertisement, completes the Neighbor
Cache Entry and sends the queued packet.</t> </section>
<section title="Operational Mitigation Options">
<t>This section provides some feasible mitigation options that can be
employed today by network operators in order to protect network availability
while vendors implement more effective protection measures. It can be
stipulated that some of these options are "kludges", and are operationally
difficult to manage. They are presented, as they represent options we
currently have. It is each operator's responsibility to evaluate and
understand the impact of changes to their network due to these
measures.</t>
<section title="Filtering of unused address space.">
<t>The DOS condition is induced by making a router try to resolve
addresses on the subnet at a high rate. By carefully addressing
machines into a small portion of a subnet (such as the lowest numbered
addresses), it is possible to filter access to addresses not in that
portion. This will prevent the attacker from making the router attempt
to resolve unused addresses. For example if there are only 50 hosts
connected to an interface, you may be able to filter any address above
the first 64 addresses of that subnet by nullrouting the subnet
carrying a more specific /122 route.</t>
<t>As mentioned at the beginning of this section, it is fully
understood that this is ugly (and difficult to manage); but failing
other options, it may be a useful technique especially when responding
to an attack.</t>
<t>This solution requires that the hosts be statically or
statefully addressed (as is often done in a datacenter) and may not
interact well with networks using <xref target="RFC4862"></xref></t>
</section>
<section title="Appropriate Subnet Sizing.">
<t>By sizing subnets to reflect the number of addresses actually in
use, the problem can be avoided. For example <xref
target="RFC6164"></xref> recommends sizing the subnet for inter-router
links to only have 2 addresses. It is worth noting that this practice
is common in IPv4 networks, partly to protect against the harmful
effects of ARP flooding attacks.</t>
</section>
<section title="Routing Mitigation.">
<t>One very effective technique is to route the subnet to a discard
interface (most modern router platforms can discard traffic in
hardware / the forwarding plane) and then have individual hosts
announce routes for their IP addresses into the network (or use some
method to inject much more specific addresses into the local routing
domain). For example the network 2001:db8:1:2:3::/64 could be routed
to a discard interface on "border" routers, and then individual hosts
could announce 2001:db8:1:2:3::10/128, 2001:db8:1:2:3::66/128 into the
IGP. This is typically done by having the IP address bound to a
virtual interface on the host (for example the loopback interface),
enabling IP forwarding on the host and having it run a routing daemon.
For obvious reasons, host participation in the IGP makes many
operators uncomfortable, but can be a very powerful technique if used
in a disciplined and controlled manner.</t>
</section>
<section title="Tuning of the NDP Queue Rate Limit.">
<t>Many implementations provide a means to control the rate of
resolution of unknown addresses. By tuning this rate, it may be
possibly to amerlorate the isse, although, as with most tuning knobs
(especially those that deal with rate limiting), you may be
"completing the attack". By excissivly lowing this rate you may
negatively impact how long the device takes to learn new addresses
under normal conditions (for example, after clearing the neighbor
cache or when the router first boots) and, under attack conditions you
may be unable to resolve "legitimate" addresses sooner than if you had
just the the knob alone.</t>
<t>It is worth noting that this technique is only worth investigationg
if the device has separate queue for resolution of unknown addresses
versus maintaice of existing entries.</t>
</section>
</section>
<section title="Recommendations for Implementors.">
<t>The section provides some recommendations to implementors of IPv4
Neighbor Discovery.</t>
<t>At a high-level, implementors should program
defensively. That is, they should assume that intruders will
attempt to exploit implementation weaknesses, and should ensure
that implementations are robust to various attacks. In the case
of Neighbor Discovery, the following general considerations
apply:</t>
<t><list style="hanging">
<t hangText="Manage Resources Explicitely"> - Resources such
as processor cycles, memory, etc. are never infinite, yet
with IPv6's large subnets it is easy to cause NDP to
generate large numbers of address resolution requests for
non-existant destinations. Implementations need to limit
resources devoted to processing Neighbor Discovery requests
in a thoughtful manner.
</t>
<t hangText="Prioritize"> - Some NDP requests are more
important than others. For example, when resources are
limited, responding to Neighbor Solicitations for one's own
address is more important than initiating address resolution
requests that create new entries. Likewise, performing
Neighbor Unreachability Detection, which by definition is
only invoked on destinations that are actively being used,
is more important than creating new entries for possibly
non-existant neighbors.
</t>
</list></t>
<section title="Priortize NDP Activities">
<t>Not all Neighbor Discovery activies are equally
important. Specifically, requests to perform large numbers
of address resolutions on non-existant Neighbor Cache
Entries should not come at the expense of servicing requests
related to keeping existing, in-use entries properly
up-to-date. Thus, implementations should divide work
activities into categories having different priorities. The
following gives examples of different activities and their
importance in rough priority order.</t>
<t>1. It is critical to respond to Neighbor Solicitations for
one's own address, especially when a router. Whether for
address resolution or Neighbor Unreachability Detection,
failure to respond to Neighbor Solicitations results in
immediate problems. Failure to respond to NS requests that are
part of NUD can cause neighbors to delete the NCE for that
address, and will result in followup NS messages using
multicast. Once an entry has been flushed, existing traffic
for destinations using that entry can no longer be forwarded
until address resolution completes succesfully. In other
words, not responding to NS messages further increases the NDP
load, and causes on-going communication to fail.</t>
<t>2. It is critical to revalidate one's own existing NCEs in
need of refresh. As part of NUD, ND is required to frequently
revalidate existing, in-use entries. Failure to do so can
result in the entry being discarded. For in-use entries,
discarding the entry will almost certainly result in a
subsquent request to perform address resolution on the entry,
but this time using multicast. As above, once the entry has
been flushed, existing traffic for destinations using that
entry can no longer be forwarded until address resolution
completes succesfully. </t>
<t>3. To maintin the stability of the control plane, Neighbor
Discovery activity related to traffic sourced by the router
(as opposed to traffic being forwarded by the router) should
be given high priority. Whenever network problems occur,
debugging and making other operational changes requires being
able to query and access the router. In addition, routing
protocols may begin to react (negatively) to perceived
connectivity problems, causing addition undesirable ripple
effects. </t>
<t>4. Activities related to the sending and recieving of
Router Advertisements also impact address
resolutions. [XXX say more?]
</t>
<t>5. Traffic to unknown addresses should be given lowest
priority. Indeed, it may be useful to distinguish between
"never seen" addresses and those that have been seen before,
but that do not have a corresponding NCE. Specifically, the
conceptual processing algorithm in <xref target="RFC4861">IPv6
Neighbor Discovery </xref> calls for deleting NCEs under
certain conditions. Rather than delete them completely,
however, it might be useful to at least keep track of the fact
that an entry at one time existed, in order to prioritize
address resolution requests for such neighbors compared with
neighbors that have never been seen before.</t>
</section>
<section title="Queue Tuning.">
<t>On implementations in which requests to NDP are submitted
via a single queue, router vendors SHOULD provide operators
with means to control both the rate of link-layer address
resolution requests placed into the queue and the size of
the queue. This will allow operators to tune Neighbour
Discovery for their specific environment. The ability to
set or have per interface or subnet queue limits at a rate
below that of the global queue limit might limit the damage
to the neighbor discovery process to the taret network.</t>
<t>Setting those values must be a very careful balancing act -
the lower the rate of entry into the queue, the less load
there will be on the ND process, however, it also means that
it will take the router longer to learn legitimate
destinations. In a datacenter with 6,000 hosts attached to a
single router, setting that value to be under 1000 would mean
that resolving all of the addresses from an initial state (or
something that invalidates the address cache, such as a STP
TCN) may take over 6 seconds. Similarly, the lower the size of
the queue, the higher the likelihood of an attack being able
to knock out legitimate traffic (but less memory utilization
on the router).</t>
</section>
<section title="NDP Protocol Gratuitous NA">
<t>Per <xref target="RFC4861">RFC 4861, section 7.2.5 and 7.2.6</xref> requires
that unsolicited neighbor advertisements result in the receiver
setting it's neighbor cache entry to STALE, kicking off the resolution
of the neighbor using neighbor solicitation. If the link layer address
in an unsolicited neighbor advertisement matches that of the existing
ND cache entry, routers SHOULD retain the existing entry updating it's
status with regards to LRU retention policy.</t>
<t>Hosts MAY be configured to send unsolicited Neighbor advertisement
at a rate set at the discretion of the operators. The rate SHOULD be
appropriate to the sizing of ND cache parameters and the host count on
the subnet. An unsolicited NA rate parameter MUST NOT be enabled by
default. The unsolicted rate interval as interpreted by hosts must
jitter the value for the interval between transmissions. Hosts
receiving a neighbor solicitation requests from a router following
each of three subsequent gratuitous NA intervals MUST revert to RFC
4861 behavior.</t>
<t>Implementation of new behavior for unsolicited neighbor
advertisement would make it possible under appropriate circumstances
to greatly reduce the dependence on the neighbor solicitation process
for retaining existing ND cache entries.</t>
<t>This may impact the detection of one-way reachability.</t>
<t>It is understood that this section may need to be moved into a
separate document -- it is (currently) provided here for discussion
purposes.</t>
</section>
<section title="ND cache priming and refresh">
<t>With all of the above recommendations implemented, it should be
possible to survive a "scan attack" with very little impact to the
network, however, adding new hosts to the network (and the sending of
traffic to them) may still be negatively impacted. Traffic to those new
hosts would have to go through the unknown Neighbor Resolution queue,
which is where the attack traffic would end up as well. A solution to
this would be that any new host that joins the network would
"announce" itself, and be added to the cache, therefore not requiring
packets destined to it to go through the unknown NDP queue. This could
be done by sending a ping packet to the all-routers multicast address,
which would then trigger the router's own neighbor resolution process,
which should be in a different queue then other packets. </t>
<t>All attempts should be made to keep these addresses in
cache, since any eviction of legitimate hosts from the cache could
potentially place resolutions for them into the same queue as the
attack traffic. At present, <xref
target="RFC4861"></xref> states that there should be
MAX_UNICAST_SOLICIT (3) attempts, RETRANS_TIMER 1 second apart, so if
there is an interruption in the network or control plane
processing for longer then 3 seconds
during the refresh, the entry would be evicted from the ND Cache.
Any network event which takes longer then 3 seconds to
converge (UDLD, STP, etc may take 30+ seconds) while under an attack,
would result in ND cache eviction. If an entry is evicted during a scan,
connectivity could be lost for an extended period of time.</t>
<t>NDP refresh timers could be
revised as suggested in <eref target="http://tools.ietf.org/html/draft-nordmark-6man-impatient-nud-00">draft-nordmark-6man-impatient-nud-00</eref> and SHOULD have a
configurable value for MAX_UNICAST_SOLICIT and RETRANS_TIMER,
and include capabilities for binary/exponential backoff.</t>
<t>A suggested algorithm, which retains backward compatiblity with <xref
target="RFC4861"></xref> is: operator
configurable values for MAX_UNICAST_SOLICIT, RETRANS_TIMER, and a way
to set adaptive back-of multiple, simmilar to ipv4 -- call it
BACKOFF_MULTIPLE), so that we could implement:</t>
<t>next_retrans =
($BACKOFF_MULTIPLE^$solicit_attempt_num)*$RETRANS_TIMER + jittered
value.</t>
<t>The recommended behavior is to have 5 attempts, with timing spacing
of 0 (initial request), 1 second later, 3 seconds later, then 9, and
then 27, which represents:</t>
<t>MAX_UNICAST_SOLICIT=5</t>
<t>RETRANS_TIMER=1 (default)</t>
<t>BACKOFF_MULTIPLE=3</t>
<t>If BACKOFF_MULTIPLE=1 (which should be the default value), and
MAX_UNICAST_SOLICIT=3, you would get the backwards-compatible RFC
behavior, but operators should be able to adjust the values as necessary
to insure that they are sufficiently aggressive about retaining ND
entries in cache.</t>
<t>An Implementation following this algorithm would if the
request was not answered at first due for example to a transitory condition,
retry immediately, and then back off for progressively longer
periods. This would allow for a reasonably fast resolution time when the transitory condition clears. </t>
</section>
</section>
<section title="IANA Considerations">
<t>No IANA resources or consideration are requested in this draft.</t>
</section>
<section title="Security Considerations">
<t>This document outlines mitigation options that operators can use to
protect themselves from Denial of Service attacks. Implementation advice
to router vendors aimed at ameliorating known problems carries the risk
of previously unforeseen consequences. It is not believed that these
techniques create additional security or DOS exposure</t>
</section>
<section title="Acknowledgements">
<t>The authors would like to thank Ron Bonica, Troy Bonin, John Jason Brzozowski,
Randy Bush, Vint Cerf, Jason Fesler Erik Kline, Jared Mauch, Chris Morrow and
Suran De Silva. Special thanks to Thomas Narten for detailed review
and (even more so) for providing text!</t>
<t>Apologies for anyone we may have missed; it was not intentional.</t>
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc4861;
&rfc4398;
&rfc4862;
&rfc6164;
</references>
<references title="Informative References">
&rfc4255;
</references>
<section title="Text goes here.">
<t>TBD</t>
</section>
</back>
</rfc>
| PAFTECH AB 2003-2026 | 2026-04-24 01:50:44 |