DNS Extensions Working Group                                   N. Weaver
Internet-Draft                            International Computer Science
Intended status: Informational                                 Institute
Expires: April 3, 2009                                September 30, 2008
Comprehensive DNS Resolver Defenses Against Cache Poisoning
draft-weaver-dnsext-comprehensive-resolver-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 3, 2009.
Abstract
DNS resolvers are vulnerable to many attacks on their network
communication, ranging from blind attacks to full men-in-the-middle.
Although a full man-in-the-middle can only be countered with
cryptography, there are many layers of defenses which apply to less
powerful attackers. Of particular interest are defenses which only
require changing the DNS resolvers, not the authoritative servers or
the DNS protocols. This document begins with a taxonomy of attacker
capabilities and desires, and then discusses defenses against classes
of attackers, including detecting non-disruptive attacks, entropy
budgeting, detecting entropy stripping, semantics of duplication, and
cache policies to eliminate "race-until-win" conditions. Proposed
defenses were evaluated with traces of network behavior.
Weaver Expires April 3, 2009 [Page 1]
Internet-Draft Comprehensive DNS Resolver Defenses September 2008
Table of Contents
1. Introduction
2. A Taxonomy of Attacks
  2.1. Passive or Active Attacks
  2.2. Blind or Aware Attacks
  2.3. Non-Disruptive or Disruptive Attacks
  2.4. Transaction or Cache Attacks
  2.5. Direct Mapping or Ancillary Data Attacks
3. Race-Until-Win Blind Attacks
4. Requirements for Blind Transaction Attacks
5. General Evaluation Strategy
6. Directly Detecting Non-Disruptive Attacks
7. Entropy Budgeting
8. Entropy Stripping
9. On Duplication for Entropy Increase
10. Cache Policy, Scoping, and Ancillary Data Attacks
  10.1. Preliminary Estimates of Performance Impact
  10.2. Optimization: Only One A Record for the NS RRSet
  10.3. Optimization: Object Scope
  10.4. Optimization: Lazily Fetching the NS RRSet
  10.5. Accepting Requests for Change
11. Conclusions
12. Acknowledgements
13. IANA Considerations
14. Security Considerations
15. References
  15.1. Normative References
  15.2. Informative References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
DNS resolvers are susceptible to many attacks on their network
traffic, ranging from an attacker performing blind packet injection
to a full man-in-the-middle, capable of controlling all traffic the
resolver receives.
With the recent discovery of the Kaminsky Attack [kaminski], new
attention has been focused on securing DNS from adversaries. This
document focuses on a subset of the problem: securing DNS resolvers
without changing the DNS authoritative servers or protocols, even
when those authorities do not strictly follow the DNS specification.
This document begins with a taxonomy of attacker properties
(Section 2), observations on race-until-win blind attacks
(Section 3), the limitations of blind transaction attacks
(Section 4), the evaluation strategy used to study possible defenses
(Section 5), directly detecting non-disruptive attacks (Section 6),
entropy budgeting (Section 7), detecting entropy stripping
(Section 8), the effects of duplication on protecting the transaction
and cache while maintaining compatibility (Section 9), and finally,
the effects of cache policies which resist race-until-win attacks
(Section 10).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. A Taxonomy of Attacks
Not all attacker capabilities are equal, and different defenses are
able to block some classes of attackers. By forming a taxonomy, one
can describe the classes of attackers which each defense can detect
or mitigate.
2.1. Passive or Active Attacks
A passive attacker must wait for the targeted resolver to make
requests, while an active attacker is able to trigger specific
requests by the resolver. In general, there are numerous mechanisms
which an attacker can use to trigger requests. One must assume that,
should the attacker desire it, the attacker can be active.
2.2. Blind or Aware Attacks
A blind attacker does not know any of the entropy sources (e.g. the
Transaction ID, port selection, capitalization) of the request. An
aware attacker knows this information because he can directly observe
the request of the resolver. Increasing the entropy of a request
increases the difficulty for blind attackers, but has no effect on
aware attackers.
2.3. Non-Disruptive or Disruptive Attacks
A non-disruptive attacker is unable to block either the resolver's
request or the authority's response, while a disruptive attacker can
halt either the legitimate request or the legitimate response.
Disruptions can take the shape of a DoS attack (if the attacker is
not on the packet path), taking advantage of a failure of the
authoritative DNS server, mechanisms in the physical layer which
enable squelching a sender, or dropping packets if the attacker is a
man-in-the-middle.
2.4. Transaction or Cache Attacks
A transaction attack targets the individual transaction requested by
the end-client: the attacker needs to get his response immediately
accepted by the victim application, while a cache attack requires
that the resolver cache the attacker's result for future use.
Transaction attacks are often less powerful than cache attacks: A
transaction attack targets the single victim request, while cache
attacks can have lasting effects until the TTL expires. For blind
attacks, as discussed in Section 4, blind transaction attacks are
strictly less powerful than blind cache attacks.
2.5. Direct Mapping or Ancillary Data Attacks
A direct attack targets the immediate answer to the question asked by
the resolver. An ancillary data attack requires that the resolver
accept some data, be it an NS RRSet, a CNAME, an A record, or other
result, which is not the direct answer to the question. A
transaction attack must target the direct mapping, while cache
attacks can target ancillary data.
3. Race-Until-Win Blind Attacks
The Kaminsky attack is a blind cache attack targeting ancillary
data. The additional power of the Kaminsky attack is not a reduction
in the number of packets required for a successful blind attack, but
a reduction in time. Rather than only running one race-condition
attack every TTL, the Kaminsky attack and variations all rely on
ancillary information to achieve a "race-until-win" property: when
the attack fails, it can be immediately retried without delay instead
of waiting for the TTL to expire.
If a blind attacker targets an actual record, that race can only be
run once every TTL, as subsequent requests will simply hit in the
cache. Thus, creating any race-until-win attack on a single name
requires targeting ancillary data.
4. Requirements for Blind Transaction Attacks
A blind transaction attack is strictly harder than a blind cache
attack. In a blind cache attack, the attacker only needs to coerce
the victim into generating DNS requests. A blind transaction attack
not only requires coercing the victim into generating a DNS request,
but also requires that the victim act upon the result of that
request.
Blind transaction attacks are also less powerful than blind cache
attacks. In order to target a name with a race-until-win attack, the
attacker must be able to not only get the victim to generate a DNS
request, and act on that request, but that request must be valid for
any arbitrary subname within the target domain. If the blind
transaction attack is targeting a single name, it can never be run
with a race-until-win property, as it must target the direct mapping
and not ancillary data.
However, there does exist at least one significant blind transaction
attack which can be conducted with a "race-until-win" property:
targeting cookies contained in a web browser. In this attack, the
attacker coerces the victim into visiting a site of the attacker's
choosing. This site opens up an iFrame which points to
1.www.target.com, which the attacker attempts to poison with a blind
attack. If it fails, the JavaScript on the site creates a second
iFrame, 2.www.target.com. This process iterates until success,
whereupon the victim's browser contacts the attacker's web site,
presenting all relevant cookies. Since each attempt uses a different
name, the attacker can try continuously until successful.
Although such cookie stealing is noteworthy, any site which allows
cookies to be stolen in this manner is also trivially vulnerable to
many attack tools such as Cookie Monster [perry], and any web service
which resists such tools also resists this attack. It is unclear
whether the DNS infrastructure should be concerned with this
particular attack.
It is also unclear what other blind transaction attacks are possible
with this "race-until-win" property, but any such attack relies on
the end-application trusting arbitrary subnames and subdomains for
attacker-triggered requests. Without this property, the attacker cannot
perform a "race-until-win" blind transaction attack. Thus, blind
transaction attacks, although they need to be considered, are a far
narrower threat than blind cache attacks.
5. General Evaluation Strategy
We evaluate multiple defenses in this document using network traces
analyzed by "Bro", a network analysis environment primarily used for
intrusion detection. Bro includes a detailed analyzer for DNS
behavior, coupled to a scriptable analysis language.
The primary evaluation was conducted on traces of the ICSI border,
capturing 18,329,249 UDP external DNS requests generated by ICSI's
resolvers over a period of 19 days. This initial evaluation ignored
PTR lookups.
6. Directly Detecting Non-Disruptive Attacks
Any non-disruptive attack will, with high probability, have the
resolver receive two separate and valid responses: one from the
attacker and one from the authoritative server. This will occur on
blind attacks where the attacker does not suppress the correct
operation of the authoritative server, such as by flooding it to
achieve a denial-of-service. It will also occur if an attacker is
acting as a packet injector on a broadcast medium and is not able to
squelch the sender.
The presence of a changed duplicate response, where a second
response arriving within a short period of time differs from the
first, can thus be used to directly detect that a non-disruptive
attack has occurred on a transaction, and to mitigate any poisoning
of the cache.
The algorithm is simple: the resolver maintains a list of all
answered responses for a timeout which should be on the order of one
or two seconds. If a second response for a request arrives whose
transaction and other entropy matches the accepted response but whose
value is different, this should be treated as an attack, and all
cache entries set or dependent on the first result should be voided,
mitigating the attack's effect on the cache. Thus a transaction
attack is detected but not mitigated (as the final victim may have
acted on the result), but a cache attack is both detected and
mitigated.
Furthermore, if the resolver is a recursive resolver, it should also
forward the second response to the originator with the TTL changed to
0 seconds, which allows the initial questioner, if running the same
algorithm, to also know that the result was compromised, enabling the
end system to detect the attack on the transaction and mitigate any
effects on a local cache.
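
The detection algorithm described in this section can be sketched as
follows. This is a minimal illustration, not the Bro script used for
the evaluation or any resolver's actual implementation. The class and
method names are hypothetical, the query key is assumed to bundle the
name, type, and all entropy values of the request, and the cache is
assumed to be a plain dictionary.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AcceptedResponse:
    answer: tuple                 # value of the first accepted response
    accepted_at: float            # when that response was accepted
    cache_keys: list = field(default_factory=list)  # entries it set


class ChangedDuplicateDetector:
    """Remember recently accepted responses and flag changed duplicates."""

    def __init__(self, timeout=2.0, clock=time.monotonic):
        self.timeout = timeout    # on the order of one or two seconds
        self.clock = clock
        self.recent = {}          # query key -> AcceptedResponse

    def accept(self, key, answer, cache_keys=()):
        """Record the first accepted response for a query."""
        self.recent[key] = AcceptedResponse(
            tuple(answer), self.clock(), list(cache_keys))

    def on_late_response(self, key, answer, cache):
        """Handle a further response whose entropy matches `key`.

        Returns True, and voids the cache entries that depended on the
        first response, if the value differs from the accepted one.
        """
        self._expire()
        first = self.recent.get(key)
        if first is None or first.answer == tuple(answer):
            return False
        for cache_key in first.cache_keys:
            cache.pop(cache_key, None)
        return True

    def _expire(self):
        """Drop remembered responses older than the timeout."""
        now = self.clock()
        self.recent = {k: v for k, v in self.recent.items()
                       if now - v.accepted_at <= self.timeout}
```

A recursive resolver that sees on_late_response() return True would
additionally forward the second response to the originator with a TTL
of 0, as described above.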
We developed a Bro analysis script to detect this condition to
evaluate the false positive rate. We observed only one such site at
ICSI, a realtime DNS blackhole service. A run at Ohio State found a
few other sites, which we have not yet assessed.
As importantly, if resolvers implemented this approach, they would
fail to cache only a handful of authorities which present this
anomaly on normal results. Until the stub resolvers are updated to
treat this condition as an attack, these sites would still resolve
successfully but would not be cached.
The state-holding requirements, although nontrivial, are also
reasonable. Consider a recursive resolver (or a cluster node in a
clustered resolver implementation) that generates 10,000 outstanding
external queries per second. If the timeout is 1 second, and the
state-holding required involves 1 KB of data, the additional memory
requirements would only be 10 MB of data for such a resolver. Even
with a longer timeout or a more active resolver, the state-holding
requirements should be reasonable.
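
The arithmetic behind this estimate is simple enough to check
directly. The sketch below assumes 1 KB means 1000 bytes; the function
name is illustrative only.

```python
def state_bytes(queries_per_second, timeout_seconds, bytes_per_entry):
    """Memory needed to remember every response for the timeout."""
    return queries_per_second * timeout_seconds * bytes_per_entry


# 10,000 queries per second, 1 second timeout, 1 KB per entry.
megabytes = state_bytes(10_000, 1, 1_000) / 1_000_000
print(megabytes, "MB")
```

Because the cost is linear in both the query rate and the timeout,
doubling either one doubles the memory requirement.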
Furthermore, although the replies themselves need to be maintained,
for the most part this information is already retained in the cache.
For any data element that is cached, only a pointer to that element
needs to be stored to perform this consistency check, rather than the
full answer.
This should also not affect any resolver composed of a cluster of
systems as long as the load balancer is deterministic: always sending
the second response back to the cluster node which received the first
response. This enables each cluster node to maintain its own list of
replies to check for changed responses.
This policy would have no effect on the latency of resolvers. Until
end-user stub resolvers and applications are updated to treat this as
an attack, even sites with anomalous DNS authorities will still
resolve properly, but may experience slightly higher load due to lack
of cacheability.
7. Entropy Budgeting
An important question is "how much entropy is necessary" to prevent
blind attacks. It's obvious that 16 bits (16b) of entropy in a query
is insufficient, both to protect transactions and to protect the
cache. 32b of entropy may in practice currently suffice to mitigate
most attacks today, as attackers will prefer victims that do not
implement proper randomization.
Thus the question: assuming all resolvers implement increased entropy
defenses, how much query entropy is sufficient to protect the cache? We
argue that 40b of entropy is more than sufficient. As the
probability of attack success with K tries and N bits of entropy is
1-(1-1/2^N)^K, 40b of entropy requires nearly a terapacket of attacks
to be successful with reasonable probability.
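
As an illustration, the success probability 1-(1-1/2^N)^K can be
evaluated numerically. The sketch below uses log1p and expm1 to avoid
floating-point loss when 1/2^N is tiny; the function names are
illustrative, not taken from any resolver.

```python
import math


def success_probability(entropy_bits, tries):
    """1 - (1 - 1/2^N)^K for N bits of entropy and K tries."""
    p = 2.0 ** -entropy_bits
    return -math.expm1(tries * math.log1p(-p))


def tries_for_probability(entropy_bits, target):
    """Smallest K whose success probability reaches `target`."""
    p = 2.0 ** -entropy_bits
    return math.ceil(math.log1p(-target) / math.log1p(-p))


print(success_probability(16, 2 ** 16))   # about 0.63
print(tries_for_probability(40, 0.5))     # on the order of 10^12
```

With only 16 bits, 65,536 forged packets already give the attacker a
better-than-even chance; with 40 bits the same odds require a number of
packets on the order of a terapacket.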
40b of entropy is easily obtainable for resolvers not behind
entropy-stripping devices such as NATs (Section 8). With 16b of
entropy for the transaction ID, over 15b of entropy for source-port
randomization, and names of at least 9 characters using 0x20
randomization [0x20] (most DNS authoritative servers preserve
capitalization, so random capitalization can provide a nonce value),
a resolver reaches the necessary 40b threshold without resorting to
duplication in most cases (Section 9).
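
A minimal sketch of this budget follows. It assumes that only ASCII
letters contribute a bit under 0x20 randomization (digits, hyphens,
and dots have no case), and it takes the source-port contribution as
exactly 15 bits. The function names are hypothetical.

```python
import secrets


def randomize_case(name):
    """Apply 0x20-style random capitalization.

    Returns the re-capitalized name and the number of random bits used
    (one per ASCII letter).
    """
    out = []
    bits = 0
    for ch in name:
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if secrets.randbits(1) else ch.lower())
            bits += 1
        else:
            out.append(ch)
    return "".join(out), bits


def query_entropy_bits(name, txid_bits=16, port_bits=15):
    """Entropy budget: transaction ID + source port + capitalization."""
    _, case_bits = randomize_case(name)
    return txid_bits + port_bits + case_bits


encoded, bits = randomize_case("www.example.com")
print(encoded, bits, query_entropy_bits("www.example.com"))
```

A resolver would compare the capitalization echoed in the response
against the capitalization it sent, treating a mismatch like any other
entropy failure.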
It is possible for a resolver to use a simple count of failures
(responses which match requests and appear to come from the proper
authority but have different entropy values) to know that it is under
attack and respond with duplication (Section 9) while the attack is
ongoing, well before an attacker could have an effect on a cache
protected by large entropy. In our observations on ICSI traces we
noticed that some authoritative servers generate such mismatched
responses naturally, but the rate is
low.
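
One way such a failure count could be kept is a sliding-window
counter, sketched below. The class name, the one-second window, and
the default threshold of 1000 mismatches per second are assumptions
made for illustration.

```python
import time
from collections import deque


class MismatchMonitor:
    """Count entropy-mismatched responses inside a sliding window."""

    def __init__(self, threshold_pps=1000, window=1.0,
                 clock=time.monotonic):
        self.threshold = threshold_pps * window
        self.window = window
        self.clock = clock
        self.events = deque()     # arrival times of recent mismatches

    def record_mismatch(self):
        """Call for each response that matches a pending request and
        the expected authority but carries the wrong entropy values."""
        self.events.append(self.clock())

    def under_attack(self):
        """True while the recent mismatch count meets the threshold."""
        now = self.clock()
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

While under_attack() returns True the resolver would issue duplicated
requests, and it would return to single requests once the rate drops.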
If the resolver responded to 1k PPS of attacks with duplication, and
the entropy budget is 40b, an attacker attempting to go below the
threshold by sending 990 attacks per second would need over 2 hours
to have even a .001% chance of success, or over 80 days of a
continual attack to have just a 1% chance. Yet by requiring the
attacker to send 1k packets-per-second to trigger duplication in the
resolver, duplication which only needs to be in place while under
attack, this shouldn't prove to be an effective DOS.
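
The time bounds above can be checked with a short calculation based on
the success probability from earlier in this section. The sketch is
illustrative only and assumes the attacker sends at a constant rate.

```python
import math


def seconds_needed(entropy_bits, packets_per_second, target):
    """Seconds of sustained attack to reach success probability `target`."""
    p = 2.0 ** -entropy_bits
    tries = math.log1p(-target) / math.log1p(-p)
    return tries / packets_per_second


hours = seconds_needed(40, 990, 0.00001) / 3600   # .001% chance
days = seconds_needed(40, 990, 0.01) / 86400      # 1% chance
print(round(hours, 1), "hours;", round(days), "days")
```

Both results exceed the stated bounds of 2 hours and 80 days.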
The author initially advocated clearing the cache once an attack
threshold was reached. This was an error: voiding the cache provides
no benefit, because the attacker knows when his attack is successful
and can simply accept the voided cache and keep trying until
successful.
8. Entropy Stripping
Many network devices, such as NATs, firewalls, and mandatory proxies,
can exist between a resolver and the remote authority it is querying.
Some of these devices will rewrite important information, such as IP
addresses, UDP port numbers, or even DNS transaction IDs, fields that
the resolver relies on as sources of query entropy. These devices
are likely to be located near one or the other endpoint system.
If a DNS resolver is attempting to increase its request entropy by
using one or more of these sources, the resolver must know if these
entropy sources are being stripped from its network communication.
The simple solution is to query a specialized authority which returns
the entropy value it receives as an A record. An example of such an
authority has been temporarily operating at
{port,id,server}.nettest.icir.org, where the transaction ID and port
are returned as the least significant two bytes of the address, while
server returns the IP address of the contacting DNS resolver. An
alternate version, entropy.nettest.icir.org, returns a CNAME to a
human readable form of all three values, while respecting the
incoming capitalization to verify 0x20 operation.
When a DNS resolver starts, and when a resolver notices that its IP
address has changed, it should query such an authority, provided by
either the resolver's software developer or a third party, to
determine if it is located behind a NAT or other entropy-stripping
device. If the resolver is behind a NAT, it must then use
duplication (Section 9) to protect the cache if the remaining entropy
is not sufficient to meet the security goals (Section 7).
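
A resolver-side check against such an echo authority might look like
the sketch below. It performs no network I/O: it only decodes an A
record that was already received. The assumption that the two least
significant bytes are in network (big-endian) byte order is ours, and
the function names are hypothetical.

```python
import ipaddress


def echoed_value(a_record):
    """Decode the two least significant address bytes as a 16-bit value.

    Byte order is assumed to be big-endian.
    """
    packed = ipaddress.IPv4Address(a_record).packed
    return int.from_bytes(packed[2:], "big")


def is_stripped(sent_value, a_record):
    """True if the authority saw a different port or transaction ID
    than the resolver sent, i.e. a device rewrote it in transit."""
    return echoed_value(a_record) != sent_value


# Resolver sent from source port 53000; the authority echoed 10.0.207.8.
print(is_stripped(53000, "10.0.207.8"))
```

A single rewritten value does not show how much entropy remains, so a
resolver would probably repeat the test with several random values
before deciding whether duplication is required.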
A case where this does not provide protection is if the authority and
the attacker, not the resolver, are behind an entropy-stripping
network device. In such a case, an attacker capable of forging
packets from within the authority's network is likely able to perform
other activities far more damaging than a blind attack on DNS
requests from that authority. Also, the entropy stripping is only
affecting queries to this authority, not all queries performed by a
resolver. This also assumes that any entropy-stripper is not
malicious and would therefore not benefit from actively whitelisting
the test.
9. On Duplication for Entropy Increase
If 40b of entropy are available on the request and the resolver is
not under a significant attack, duplication is not necessary. Thus
duplication should be viewed as a fallback position for resolvers
which are behind a NAT or other entropy-stripping device, which
access authoritative servers that don't respond to 0x20 (per Dagon
et al.'s proposal [0x20]), or which are known to be under an active
blind attack.
Most authorities are deterministic to multiple queries: if two
requests for the same name are received in a short period of time,
the returned values will be identical. Issuing two identical
requests with different entropy values nearly doubles the entropy if
we require that both responses are identical before acceptance.
Specifically, if each request has K bits of entropy, two requests
which are accepted only if the responses are identical have 2K-1
bits of entropy.
Dagon et al.'s 0x20 proposal [0x20] uses this to increase query
entropy when capitalization is not preserved by an authority, as
without 0x20 the available entropy is only around 32b per query.
Similarly, duplication can be used in any case where there is
insufficient entropy, such as the impact of an entropy-stripping
network device, or if the resolver knows it is currently under a
blind attack.
Unfortunately, not all authorities are deterministic in this manner,
including some critical authoritative servers belonging to Content
Distribution Networks such as Akamai [akamai] and others that use
frequently changing DNS responses for load-balancing. If two
distinct responses are received, and the resolver randomly selects
one, this reduces by 1 bit the entropy of the request when compared
with no duplication. Thus, for purposes of the cache, it is clear
that a resolver must not cache a response when the two responses are
different.
However, it appears reasonable to return one of the values as the
answer to the transaction, as blind transaction attacks are less
powerful than blind cache attacks (Section 4). For a blind
transaction attack to work, the attacker must target a domain served
by such a volatile authority as well as coerce the victim to act on
this. Although this does leave a small window of vulnerability open,
it is probably preferable to the alternative of not resolving these
names.
By setting the TTL to 0 and returning a randomly selected response,
this enables compatibility with nondeterministic authorities without
compromising the integrity of either the resolver's cache or the
final client's cache. Since it also only reduces the transactional
entropy by 1b, it does not make transactions significantly less
secure than without duplication.
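
The duplication policy of this section can be summarized in a short
sketch. The names are hypothetical, and comparing RRSets as sorted
tuples (ignoring record order) is an assumption on our part.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    answer: tuple      # records returned to the client
    ttl: int           # TTL placed on the returned records
    cacheable: bool    # whether the resolver may cache the answer


def combine_duplicates(first, second, ttl):
    """Combine the responses to two duplicated requests."""
    a, b = tuple(sorted(first)), tuple(sorted(second))
    if a == b:
        # Identical responses: accept and cache normally.
        return Resolution(a, ttl, True)
    # Differing responses: answer the transaction with a randomly
    # selected response, but set the TTL to 0 and never cache it.
    return Resolution(secrets.choice([a, b]), 0, False)


def duplicated_entropy_bits(k):
    """Entropy of two K-bit requests that must agree: 2K-1 bits."""
    return 2 * k - 1
```

For example, two requests with 32b of entropy each yield 63b when both
responses must match, comfortably above the 40b budget of Section 7.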
The alternate approach, iterate until convergence as proposed by
Barwood [barwood], will succeed with very high probability if the
remote authority returns single answers. It fails, however, when the
authority returns randomly chosen RRSets which contain more than one
element and a complete match on the RRSet is required. If "return
one of N of the RRSets" is employed it works well.
10. Cache Policy, Scoping, and Ancillary Data Attacks
RFC 2181 section 5.4.1 [RFC2181] is the current specification for how
to accept ancillary data. It allows ancillary data, if "in
bailiwick" (from a server which should be an authority for this
name), to be cached for subsequent use, although it should not be
returned as an answer. It is this caching of ancillary data that
enables the "race-until-win" Kaminsky blind cache attack. Likewise,
before bailiwick-checking was deployed, ancillary data was used for
classic glue poisoning.
The ancillary data includes all the data beyond the direct answer
to the query (including the NS RRSet, A records associated with the
NS RRSet, A records associated with a CNAME for the direct answer, or
any other data). This data serves four purposes:
o The NS RRSet and associated A records needed to resolve the
current request.
o Items, such as an A record for a CNAME alias, which if accepted
will speed the current request's processing by removing the need
to fetch additional records.
o Items which if placed in the cache will speed subsequent lookups.
o An indication that an item in the cache is now obsolete.
This suggests that items have a different role in two scopes,
analogous to how programming languages view scope: the local scope of
the current recursive query and the global scope of the cache.
Within the local scope, for purposes of resolving the direct
question, ancillary data often must be trusted: otherwise
authoritative nameservers may not be reachable when they exist within
the domain being queried or in other cases where domains host each
other's nameservers. Yet for the purposes of resolving a query, if an
authority lies about the ancillary data, it could just as easily
lie about the direct answer, making this data no less trustworthy for
processing this answer.
Yet for purposes of inclusion into the global scope, or for returning
as the response to a query, ancillary data must not be trusted. It
is, by definition, unsolicited information and not an authoritative
response, and lies at the heart of the Kaminsky attack. If a
recursive resolver never accepts ancillary data into the cache, it
becomes impossible to target a single name with a race-until-win
blind attack.
However, a resolver may safely perform an independent fetch for any
piece of ancillary data. This second fetch must not reuse the local
scope of the previous fetch, but instead be fetched using a new local
context. This enables both the use of ancillary data in responses
(such as A records involved in CNAMEs) and faster subsequent
responses (such as obtaining the NS RRSet for subsequent lookups).
As an example, suppose the query for 'www.example.com' returns
'www.example.com CNAME server.example.com, example.com NS
ns.example.com, ns.example.com A 12.34.56.78, server.example.com A
12.34.56.90'. The resolver can successfully cache and return
'www.example.com CNAME server.example.com', but must perform
independent queries to fetch the NS RRSet and the A records for
ns.example.com and server.example.com, and it can't return
'www.example.com CNAME server.example.com, server.example.com A
12.34.56.90' to the final client until the lookup for
server.example.com completes.
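
The separation between the direct answer and ancillary data in this
example can be sketched as follows. This is an illustration only: the
record type and function name are hypothetical, and the query type is
assumed to be A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    rtype: str
    value: str


def split_response(qname, qtype, records):
    """Separate the direct answer from ancillary data.

    Returns the records that may be cached and returned, and the
    (name, type) pairs that need independent fetches in a new local
    scope before they can be cached or returned.
    """
    direct, refetch = [], []
    for rec in records:
        is_direct = (rec.name.lower() == qname.lower()
                     and rec.rtype in (qtype, "CNAME"))
        if is_direct:
            direct.append(rec)
        elif (rec.name, rec.rtype) not in refetch:
            refetch.append((rec.name, rec.rtype))
    return direct, refetch


response = [
    Record("www.example.com", "CNAME", "server.example.com"),
    Record("example.com", "NS", "ns.example.com"),
    Record("ns.example.com", "A", "12.34.56.78"),
    Record("server.example.com", "A", "12.34.56.90"),
]
direct, refetch = split_response("www.example.com", "A", response)
print(direct)
print(refetch)
```

Only the CNAME is accepted directly; the NS RRSet and both A records
are queued for independent fetches.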
As a result, all information which is either placed in the global
scope or returned to the final client will be validated directly by
querying the proper authoritative server. No race-until-win attack is
possible for inserting a name into the cache, because a name is
inserted only if it does not already exist in the cache, which limits
any attempt to once every TTL.
This is more restrictive than the policy in RFC 2181 section 5.4.1
[RFC2181]. The RFC policy allows ancillary data to be accepted into
the global scope (cache) for purposes of subsequent query processing,
which enables race-until-win attacks.
Although the RFC states that the final client should also not accept
these authority or additional records, it is unclear whether stub
resolvers follow the RFC. Thus a well structured recursive resolver
should return results which are safe even if the client does cache
additional records in violation of the RFC.
The simplest mechanism for validation is to perform an explicit
fetch, a policy being implemented in Unbound [unbound]: For CNAMEs
and A records that are not the direct answer to the query, such
records are not accepted directly but are instead fetched
independently. For the NS record and associated A records, the
separate global and local scope is approximated by validating the NS
RRSet and all elements within it using subsequent separate fetches.
There is some minor uncertainty whether this is a close-enough
approximation for the NS RRSet. There exists a short-term time
window where the NS RRSet and A records would be in the cache but
unvalidated, which may provide a narrow opportunity for abuse.
However, this policy does not require new data structures to maintain
the local scope of a recursive query, and the abuse window is
probably sufficiently small.
Note that if an explicit local scope is maintained, such a policy of
fetching all ancillary data rather than including it subsumes
bailiwick checking for accepting data into the cache, as no data is
included into the cache for any use that was not explicitly requested
from an authoritative server.
10.1. Preliminary Estimates of Performance Impact
Such a policy, although protecting from race-until-win conditions,
does impose a higher load on the DNS infrastructure. For queries
whose direct answer is not a CNAME, it does not add any latency to
query processing if the resolver returns an empty authoritative NS
RRset when the NS RRset has yet to be validated, but the number of
outstanding queries is significantly increased.
We developed a Bro analysis script to estimate the impact of this
policy. This script accepted a trace of DNS requests and replies and
estimated the number of additional queries that would be generated to
follow CNAME chains, to cache the NS RRSet and associated A records,
and to cache any A records not associated with either a CNAME chain
or the NS RRSet. It creates a model of what the resolver's own cache
looks like and uses this to estimate the load from additional
requests.
For ICSI's 18,329,249 queries, this simple policy would require 16%
additional fetches for NS records, 10% additional fetches for A
records associated with the NS RRSets, 0.7% additional fetches of A
records not associated with NS RRSets, and a trivial number of
additional CNAME fetches. Thus this policy increases the number of
requests by 27%, a nontrivial but still modest overhead. Later on,
we discuss some policies that can significantly reduce the number of
fetches while maintaining safety.
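
The total can be reproduced from the per-category figures. The sketch
below ignores the trivial number of CNAME fetches; the names are
illustrative.

```python
BASE_QUERIES = 18_329_249

# Additional fetches as a fraction of the base query count.
OVERHEAD = {
    "NS records": 0.16,
    "A records for NS RRSets": 0.10,
    "other A records": 0.007,
}


def total_overhead(fractions):
    """Sum the per-category overhead fractions."""
    return sum(fractions.values())


extra = round(BASE_QUERIES * total_overhead(OVERHEAD))
print(f"{total_overhead(OVERHEAD):.1%} overhead, about {extra:,} fetches")
```

The three categories sum to 26.7%, which rounds to the 27% quoted
above.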
Only a few queries would have their latency increased if the resolver
does not return the NS RRset to the client for the first query
because it is still unvalidated. Thus the only case where a query
would see increased latency is when the answer is a CNAME and the
record pointed to by this CNAME is in the response. Only 0.2% of ICSI's
queries showed this behavior.
If the resolver does include the NS RRSet it needs to wait for it to
be validated in case the final client caches the names for other
purposes. This would increase the latency on 16% of the queries
which require accessing an external authority to obtain the NS RRSet.
10.2. Optimization: Only One A Record for the NS RRSet
It is obviously excessive to fetch all A records for the NS RRSet:
unless there is a failure, only one nameserver needs to be contacted
to return future results, although that nameserver might be
sub-optimal from a latency viewpoint. If only one A record is fetched
for the NS RRSet rather than all A records, the number of additional
records fetched for the purpose of providing a valid name associated
with each NS RRset drops from 10% to 6%.
However, this may increase the latency of subsequent responses if the
chosen authoritative server is not the closest. This latency could
be mitigated by performing additional fetches of alternate
nameservers as requests for a domain continue to be made. One
suggestion is to use the "closest name" heuristic: to randomly select
the name which most closely matches the domain. Thus if "example.com
NS ns.example.com, ns.example2.com, ns.example3.org" needed to be
cached, ns.example.com would be fetched. If this lookup failed,
ns.example2.com would be fetched, followed by ns.example3.org,
proceeding from the most-matching to the least-matching name. It is
unclear what the latency penalty would be for this heuristic.
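The ordering in the example above can be sketched as follows. This is
only an illustration: the draft does not define "most closely
matches", so the sketch assumes it means the number of trailing labels
a nameserver name shares with the domain, and it assumes ties are
broken randomly. The function names are hypothetical.

```python
import random


def shared_suffix_labels(a: str, b: str) -> int:
    """Count how many trailing DNS labels two names have in common."""
    labels_a = a.rstrip(".").lower().split(".")[::-1]
    labels_b = b.rstrip(".").lower().split(".")[::-1]
    count = 0
    for x, y in zip(labels_a, labels_b):
        if x != y:
            break
        count += 1
    return count


def closest_name_order(zone: str, nameservers: list[str], rng=random) -> list[str]:
    """Order nameserver names from most- to least-closely matching the zone.

    Assumption: "closeness" is the number of shared trailing labels, and
    names with equal closeness are ordered randomly.
    """
    shuffled = list(nameservers)
    rng.shuffle(shuffled)
    # sorted() is stable, so the shuffle decides the order among ties.
    return sorted(
        shuffled,
        key=lambda ns: shared_suffix_labels(zone, ns),
        reverse=True,
    )


if __name__ == "__main__":
    order = closest_name_order(
        "example.com",
        ["ns.example3.org", "ns.example2.com", "ns.example.com"],
    )
    print(order)
```

A resolver using this ordering would fetch the A record for the first
name and fall back to the next name only if that lookup failed.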
10.3. Optimization: Object Scope
There exists an additional data scope, object-local scope, that can
act as an optimization. For example, the A records associated with
an NS RRSet cannot be accepted directly. Yet since they were
returned in association with a specific query for the NS RRSet, they
can be trusted solely for purposes of evaluating the NS RRset.
As an example, if the response for a query for the nameservers of
example.com is 'example.com NS ns.example.com, ns.example.com A
12.34.56.78', it is acceptable to use the A record solely for the
purposes of a subsequent lookup of a value in example.com, but not
for any other purposes. This, in programming language terms,
represents object-local scope: a data value that can only be used in
the context of another data value.
This arises from a simple observation: if the response is corrupted,
any value in the response could have been corrupted, including the
legitimate answer. Thus there is no risk in using the ancillary
data in a context where the direct answer was trusted. But
ancillary data cannot be trusted outside the context where the
direct answer is trusted, because this enables race-until-win
attacks.
There are two areas where support for object-local scope can
enhance resolver performance without compromising safety. The first
is to collapse CNAMEs and return the result, per Ohta's suggestion
(cite). In this, an authority that returns "www.example.com CNAME
a.example2.com, a.example2.com A 10.34.56.78" could be treated by the
resolver (and returned to the final client) as "www.example.com A
10.34.56.78". This acts to eliminate the latency penalty for
fetching CNAME chains.
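As an illustration of CNAME collapsing, the following minimal sketch
follows a CNAME chain using only the records contained in a single
response and rewrites the result as an A record for the original
query name. The tuple layout, the function name, and the max_depth
limit are assumptions made for this example and do not come from any
resolver implementation.

```python
def collapse_cname(qname, records, max_depth=8):
    """Collapse a CNAME chain found within ONE response.

    records: list of (owner, rtype, rdata) tuples (hypothetical layout).
    Returns a list of (qname, "A", address) tuples, or [] if the chain
    cannot be completed from this response alone.
    """
    cnames = {o.lower(): d.lower() for o, t, d in records if t == "CNAME"}
    addrs = {}
    for owner, rtype, rdata in records:
        if rtype == "A":
            addrs.setdefault(owner.lower(), []).append(rdata)

    name = qname.lower()
    # max_depth bounds the walk so a CNAME loop cannot spin forever.
    for _ in range(max_depth):
        if name in addrs:
            return [(qname, "A", a) for a in addrs[name]]
        if name not in cnames:
            return []
        name = cnames[name]
    return []


if __name__ == "__main__":
    response = [
        ("www.example.com", "CNAME", "a.example2.com"),
        ("a.example2.com", "A", "10.34.56.78"),
    ]
    print(collapse_cname("www.example.com", response))
```

Note that the collapsed result is stored and returned only under the
queried name. The sketch never creates a cache entry for
a.example2.com itself, which is what keeps the A record in
object-local scope.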
The second area where object-local scope provides significant savings
is for the NS RRset. When the NS RRset is requested, the A records
can't be considered as authoritative in general, but MAY be
considered as valid only for the use of the NS RRset for subsequent
name lookups. This completely eliminates the need to look up the A
records associated with the NS RRSet, while still preventing auxiliary
data from poisoning the cache. However, it does require changes to
cache architectures to support this notion: the NS RRset records must
include internal A records which are not exported to the rest of the
cache.
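One possible shape for such a cache is sketched below. It is a toy
model under stated assumptions, not a description of any existing
resolver: the class and method names are hypothetical, TTLs are
ignored, and the choice to prefer a directly fetched A record over
the internal one is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class NSEntry:
    """A cached NS RRset plus A records private to this entry."""
    nameservers: list[str]
    # Internal A records: usable only through this NS entry.
    glue: dict[str, list[str]] = field(default_factory=dict)


class ScopedCache:
    def __init__(self):
        self._a = {}   # directly answered A records: name -> [address]
        self._ns = {}  # zone -> NSEntry

    def store_direct_a(self, name, addresses):
        """Store A records that were the direct answer to a query."""
        self._a[name] = list(addresses)

    def store_ns_response(self, zone, nameservers, additional_a):
        """Store an NS RRset and keep its accompanying A records internal."""
        glue = {
            ns: list(additional_a[ns])
            for ns in nameservers
            if ns in additional_a
        }
        self._ns[zone] = NSEntry(list(nameservers), glue)

    def lookup_a(self, name):
        """General-purpose lookup; never exposes internal A records."""
        return self._a.get(name)

    def servers_for_zone(self, zone):
        """Addresses to use ONLY for sending queries about this zone."""
        entry = self._ns.get(zone)
        if entry is None:
            return []
        addresses = []
        for ns in entry.nameservers:
            # Assumption: prefer a directly fetched A record if one exists.
            addresses.extend(self._a.get(ns) or entry.glue.get(ns, []))
        return addresses


if __name__ == "__main__":
    cache = ScopedCache()
    cache.store_ns_response(
        "example.com",
        ["ns.example.com"],
        {"ns.example.com": ["12.34.56.78"]},
    )
    print(cache.lookup_a("ns.example.com"))
    print(cache.servers_for_zone("example.com"))
```

In this sketch a client asking for the A record of ns.example.com
still triggers a normal lookup, because the internal address is
visible only when the resolver is choosing a server for example.com.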
The savings are significant: support for object-local scope allows
the resolver to not fetch the A records for the NS entries, reducing
the total overhead from 27% to 17% for ICSI's traffic.
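The overhead figures can be checked with simple arithmetic. The
sketch below uses only the percentages reported in this section for
ICSI's traffic and ignores the trivial number of CNAME fetches. The
combined figure for the one-A-record optimization is derived here by
addition and is not a measurement reported in this document.

```python
def total_overhead(ns_fetches, ns_a_fetches, other_a_fetches):
    """Sum the additional fetches, each as a percentage of all queries."""
    return ns_fetches + ns_a_fetches + other_a_fetches


# Percentages reported for ICSI's traffic.
NS = 16.0          # additional fetches of NS RRsets
NS_A_ALL = 10.0    # all A records for each NS RRset
NS_A_ONE = 6.0     # only one A record per NS RRset (Section 10.2)
OTHER_A = 0.7      # A records not associated with an NS RRset

policies = {
    "fetch all NS A records": total_overhead(NS, NS_A_ALL, OTHER_A),
    "fetch one NS A record": total_overhead(NS, NS_A_ONE, OTHER_A),
    "object-local scope": total_overhead(NS, 0.0, OTHER_A),
}

if __name__ == "__main__":
    for name, pct in policies.items():
        print(f"{name}: {pct:.1f}% (about {round(pct)}%)")
```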
10.4. Optimization: Lazily Fetching the NS RRSet
There exists a latency/performance tradeoff in fetching the NS RRset.
Observe that, for many sites, only a single name is used. Thus,
fetching (and caching) the NS RRset represents wasted effort.
Instead of eagerly fetching the NS RRset, fetching the NS RRset and
any associated A records can be delayed until a second query is
generated.
The traffic savings are substantial. In the ICSI traces, instead of
requiring 16% additional fetches of the NS RRset, only 4% additional
fetches would have been required, with a similar reduction in the
number of A records which need to be fetched. As a consequence,
however, the lookup of the second name will be delayed, as the NS
RRset needs to be fetched first. Yet the penalty is fairly modest,
as only the second fetch (and not subsequent fetches) would incur
this latency penalty. Thus only 4% of the queries would see this
latency penalty.
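The difference between eager and lazy fetching can be illustrated
with a toy trace model. This sketch is not the Bro script used for
the measurements above. It assumes that "a second query" means a
query for a second distinct name in the same zone, it ignores TTLs,
and the trace format of (query name, zone) pairs is hypothetical.

```python
def count_ns_fetches(trace, lazy):
    """Count NS RRset fetches for a trace of (qname, zone) pairs.

    Eager: fetch the NS RRset on the first query to a zone.
    Lazy:  fetch it only once a second distinct name in the zone is seen.
    """
    threshold = 2 if lazy else 1
    names_seen = {}   # zone -> set of distinct query names
    fetched = set()   # zones whose NS RRset has been fetched
    fetches = 0
    for qname, zone in trace:
        names = names_seen.setdefault(zone, set())
        names.add(qname)
        if zone not in fetched and len(names) >= threshold:
            fetched.add(zone)
            fetches += 1
    return fetches


if __name__ == "__main__":
    trace = [
        ("www.a.example", "a.example"),
        ("www.a.example", "a.example"),   # same name again: no new fetch
        ("www.b.example", "b.example"),
        ("mail.b.example", "b.example"),  # second name: lazy fetch happens
        ("www.c.example", "c.example"),
    ]
    print("eager:", count_ns_fetches(trace, lazy=False))
    print("lazy: ", count_ns_fetches(trace, lazy=True))
```

In this small trace the eager policy fetches three NS RRsets while
the lazy policy fetches one, since only one zone is ever asked about
a second name.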
10.5. Accepting Requests for Change
The final use of ancillary records is to indicate a change request: a
statement that a previously cached value is no longer valid despite
still being within the TTL of the item. The use of ancillary data
to indicate changes is somewhat outside the protocol specification,
but is often considered essential behavior.
It is clear that a resolver must not overwrite an item in the cache
with an ancillary item for any reason, as otherwise ancillary data
can be used for a race-until-win attack using data replacement
instead of data inclusion.
It is also clear that a resolver must not void a cache entry upon a
change request for an ancillary item when the response is not in
bailiwick. Otherwise, an attacker who controls an arbitrary
authority could construct a race-until-win attack by alternating
between attacking the name and performing an unrelated query which
uses the attacker's authority to void the name.
However, a resolver may (and probably should) safely respond to an
in-bailiwick request for change by voiding the cache entry associated
with the ancillary item. A blind attacker cannot use this behavior
to create a race-until-win condition, as the attacker would have to
win a race against an arbitrary name in the same domain to void the
cache entry and then win the subsequent race to set the cache entry
to the attacker's desired value. As winning two back-to-back races
is far harder than winning a single race (the two success
probabilities multiply), the policy is safe from attack while still
enabling ancillary data to act as a notification of change.
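The three rules of this section can be summarized as a small decision
function. This is a sketch under assumptions: the cache is modeled as
a plain dictionary from name to value, responding_zone stands for the
zone of the authority that sent the response, and all names are
hypothetical.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"  # leave the cache untouched
    VOID = "void"      # drop the cached entry; refetch on next use


def is_in_bailiwick(name, zone):
    """True if name is equal to, or a subdomain of, zone."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def handle_ancillary(cache, responding_zone, name, rdata):
    """Apply the change-request policy to one ancillary record.

    The ancillary value itself is never written to the cache.
    """
    cached = cache.get(name)
    if cached is None or cached == rdata:
        return Action.IGNORE            # nothing cached, or no change
    if not is_in_bailiwick(name, responding_zone):
        return Action.IGNORE            # out of bailiwick: must not void
    del cache[name]                     # in bailiwick: void, never overwrite
    return Action.VOID


if __name__ == "__main__":
    cache = {"www.example.com": "12.34.56.78"}
    print(handle_ancillary(cache, "attacker.example", "www.example.com", "10.0.0.1"))
    print(handle_ancillary(cache, "example.com", "www.example.com", "10.0.0.2"))
    print(cache)
```

After a void, the next query for the name triggers a fresh lookup, so
an attacker must win that second race as well.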
11. Conclusions
There are many defenses which can be layered to provide robust
protection for recursive DNS resolvers.
By looking for duplicate responses to the same transaction which have
different values, all non-disruptive attacks can be directly detected,
and their effects on the cache mitigated. By forwarding the second
response, the final victim can also be notified of the attack. This
requires ~10MB of state for a resolver performing 10,000 external
queries a second.
An entropy budget of 40 bits makes blind attacks infeasible, as an
attacker requires terapackets to have a high probability of success.
By querying special authoritative DNS servers, a resolver can detect
any process which reduces this entropy, and can use duplicate
requests to restore
entropy.
For the few sites where duplication produces different results, it is
probably safe to return a randomly selected result. Although this
still enables transactional cache attacks, it is probably better to
accept this very narrow window of vulnerability to enable resolution
of key sites.
Finally, cache policy can eliminate 'race-until-win' attacks while
subsuming most bailiwick checks. By only accepting and returning the
direct answer to requests, an attacker can no longer conduct any
race-until-win attacks targeting specific names. The overhead is
reasonable: a study of our resolvers showed a 27% increase in
requests before any optimizations are applied.
These changes all involve only resolver behavior, and they also
combine to provide better protection than any one defense alone.
Increasing entropy increases the blind attacker's work in packets,
while eliminating race-until-win increases the work in time. With a
sufficient entropy budget, a resolver can detect that it is under
attack and act accordingly. Meanwhile, directly detecting
non-disruptive attacks can detect both packet injectors and many
blind injectors.
12. Acknowledgements
This work is sponsored in part by NSF Grants ITR/ANI-0205519 and NSF-
0433702. All opinions are those of the author, not the funding
institution.
This work benefited from feedback from Vern Paxson, Robin Sommer,
Christian Kreibich, Paul Vixie (who also suggested the "closest name"
heuristic), Seth Hall, Wouter Wijngaards, Dan Kaminsky, and David
Dagon.
13. IANA Considerations
None
14. Security Considerations
This text is focused on security concerns for DNS resolvers. The
security aspects of each defense are discussed as part of each
section.
15. References
15.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
15.2. Informative References
[0x20] Vixie, P. and D. Dagon, "Use of Bit 0x20 in DNS Labels to
Improve Transaction Identity", March 2008,
<http://tools.ietf.org/html/
draft-vixie-dnsext-dns0x20-00>.
[akamai] Akamai Inc, "The Akamai CDN", 2008,
<http://www.akamai.com>.
[barwood] Barwood, G., "The Birthday Defense (IETF NameDroppers
Mailing List)", September 2008, <http://www.ops.ietf.org/
lists/namedroppers/namedroppers.2008/msg01647.html>.
[kaminski]
US-CERT, "Multiple DNS implementations vulnerable to cache
poisoning", July 2008,
<http://www.kb.cert.org/vuls/id/800113>.
[perry] Perry, M., "Fully Automated Active HTTPS Cookie
Hijacking", August 2008, <http://fscked.org/blog/
fully-automated-active-https-cookie-hijacking>.
[unbound] Wijngaards, W., "Resolver Side Mitigations", August 2008,
<http://tools.ietf.org/html/
draft-wijngaards-dnsext-resolver-side-mitigation-00>.
Author's Address
Nicholas Weaver
International Computer Science Institute
1947 Center Street suite 600
Berkeley, CA 94704
USA
Phone: +1 510 666 2903
Email: nweaver@icsi.berkeley.edu
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.