How to Really Fix Your DNS

Obviously the first thing everyone should be doing is to apply the patches that the major vendors rolled out, and do it quickly.  It is no longer the time for debate in regard to whether or not you really do need to patch… the answer to that question is quite clear; Yes.  Yes you do. Stop reading this, go to your vendor right now, and get the patches. Then apply them.  This will still be here when you get back…

Unfortunately, the existing patch doesn’t really fix the problem; it just makes it much harder to attack, which is a good thing.  If you still aren’t patched, you obviously didn’t follow my instructions in the first paragraph, so I’ll reiterate: Stop reading this, go to your vendor right now, and get the patches. Then apply them.

The patches that most major vendors rolled out when this vulnerability was announced, albeit with no technical details, primarily revolve around randomizing the source port that the nameserver makes its queries from.  Without this randomization, the only other piece of random information in the DNS packet is the transaction ID, which DNS servers use to correlate queries and replies, and which also helps prevent reply-spoofing attacks by requiring that the attacker correctly guess this value.  Given the randomized hostname exploitation technique used in this attack, the attacker can force the nameserver to do as many queries as they like, which provides a birthday attack scenario for guessing the transaction ID value and succeeding in spoofing the reply.  The search space of the transaction ID is 16 bits, which provides possible values of 0-65535 within which the attacker has to guess correctly.  Given as many attempts as the attacker likes, this can take anywhere from a few seconds to a couple of minutes.  Adding source port randomization to the picture adds around another 16 bits to the equation (minus source ports already used, privileged source port range, etc.), making the time it takes to correctly guess much longer, but still not impossible.
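To put rough numbers on that, here is a back-of-the-envelope sketch in Python.  It assumes every forged reply is an independent, uniform guess over the search space, and that ports 1024-65535 are all available to the patched nameserver; both are simplifications of mine, not measurements of any real resolver.

```python
import math

def spoof_success_probability(search_space: int, forged_replies: int) -> float:
    """Chance that at least one of `forged_replies` independent guesses is right."""
    # 1 - (1 - 1/N)^k, computed with log1p/expm1 so tiny probabilities survive.
    return -math.expm1(forged_replies * math.log1p(-1.0 / search_space))

TXID_SPACE = 2 ** 16                      # transaction ID only (unpatched)
USABLE_PORTS = 2 ** 16 - 1024             # assumption: ports 1024-65535 usable
PATCHED_SPACE = TXID_SPACE * USABLE_PORTS # transaction ID plus random source port

if __name__ == "__main__":
    for replies in (1_000, 100_000, 10_000_000):
        unpatched = spoof_success_probability(TXID_SPACE, replies)
        patched = spoof_success_probability(PATCHED_SPACE, replies)
        print(f"{replies:>10} forged replies: "
              f"unpatched {unpatched:.4f}, patched {patched:.6f}")
```

The point is only the ratio: the patched search space is tens of thousands of times larger, so the same flood of forged replies buys the attacker far less, but never nothing.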

As Dan mentioned in his BlackHat Podcast yesterday, much debate will surround how to actually fix this problem in the long term.  The source port randomization is a good short-term band-aid, as it will make actually exploiting the vulnerability much more difficult; but it can still be exploited.  Some ideas have already been tossed around, and there are many good reasons why a lot of them really won’t work in all situations, but they’re still good ideas to look into:

1. Use DNSSec

This is likely The Best™ way to permanently fix the problem, but rolling out DNSSec is a complicated issue.  DNSSec was standardized close to 10 years ago, and it’s still not widely adopted.  There are many reasons for this lack of adoption and much debate regarding it, which I won’t cover here.  Do the research.  If it makes sense for you to use it, this will solve the problem.  Until of course someone finds a flaw in DNSSec…

2. Access Control

If your nameserver is intended for internal resolution services only, restrict which hosts can make queries to it. If the attacker can’t send a flood of queries to it and cause it to make recursive queries to another nameserver, it’s much, much more difficult for this attack to be successful, if not impossible. Similarly, restricting which clients can even cause a nameserver to make recursive queries can prevent this attack assuming you also are able to detect and block incoming queries from spoofed addresses which are allowed. For dual-purpose nameservers which handle both recursive resolution services for clients and authoritative resolution services for domains, most nameserver software will allow you to configure which clients can make recursive queries, such as your internal network, and which can’t, such as queries from outside your network. These types of mitigation were suggested by the US-CERT advisory for this vulnerability, and details on how to configure this for BIND can be found here.
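The decision your nameserver makes here is simple enough to sketch.  The Python below is only an illustration of the logic, not how BIND or any other nameserver implements it; the network ranges and the `may_recurse` name are made up for the example.

```python
import ipaddress

# Hypothetical internal ranges allowed to trigger recursive lookups.
# Replace these with your own networks.
RECURSION_ALLOWED = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def may_recurse(client_ip: str) -> bool:
    """Return True if this client may cause the nameserver to recurse."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in RECURSION_ALLOWED)

if __name__ == "__main__":
    for client in ("10.1.2.3", "203.0.113.7"):
        verdict = "recursive lookup" if may_recurse(client) else "refused"
        print(f"{client}: {verdict}")
```

Keep in mind that the check is only as trustworthy as the source address on the packet, which is why spoofed queries claiming to come from an allowed range still have to be filtered at your network edge.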

3. DNS over TCP

If you remove the fact that you can spoof server responses, as you can by using a connection-oriented protocol such as TCP, you remove the attack vector.  Dan claims, and I’m inclined to believe him due to his track record with DNS research, that this simply introduces too much overhead for our infrastructure to support.  I would believe this to be true for major ISPs and backbone providers, but if you’re a relatively small network (and by small, I mean in the area of Fortune 1000s or so) and maintain your own enterprise DNS hierarchy, this may actually be an option for you, and will at least help protect nameservers that query against yours, such as your internal nameservers, and should protect your domain from being poisoned in anyone else’s cache.
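For the curious, the query itself barely changes when it goes over TCP: the same DNS message is sent with a two-byte length in front of it, over a connection that had to complete a handshake first.  The sketch below builds a minimal A-record query by hand and frames it that way.  The function names are my own, and `query_over_tcp` is an untested-against-your-network illustration rather than a replacement for a real resolver library.

```python
import secrets
import socket
import struct

def build_query(hostname: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record (recursion desired)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as 2 bytes."""
    return struct.pack("!H", len(message)) + message

def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data

def query_over_tcp(server: str, hostname: str, timeout: float = 3.0) -> bytes:
    """Send one query over TCP and return the raw reply message."""
    txid = secrets.randbelow(2 ** 16)
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(hostname, txid)))
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

An attacker who can’t see your traffic now has to guess TCP sequence numbers for an established connection instead of just a transaction ID and port, and that extra connection setup is exactly the overhead Dan is worried about.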

4. Get a Second Opinion

During Dan’s BlackHat Webcast yesterday, he made reference to an email he received earlier that day, which I can only assume was the one I sent him, suggesting that nameservers could potentially get a second opinion:  when the nameserver is going to query for a hostname, and it first looks up the nameservers it needs to ask to find that hostname, there is usually more than one option for which server to ask.  Rather than just ask one of them and only ask another if the first fails to respond, why not ask two?  Or all of them?  This also makes the attack much more difficult, because now the attacker has to spoof responses from all of them (the Metasploit exploits already do this).  Dan follows his mention of this idea in the Webcast by correctly saying that this would at least double the amount of DNS traffic, and that our infrastructure, as with using DNS over TCP exclusively, can’t handle this.  He didn’t mention that my email also had a suggestion for addressing this, which is to delay the second, third, etc. request for a short, random amount of time, perhaps until there is a lull in regular traffic.  If the first request ended up poisoning the cache, the subsequent request(s) would correct it, and the nameserver doesn’t have to respond to the original query until it has verified the address via second opinion.  While this does introduce some latency, I’d rather get a lagged connection to the right host than a fast connection to a malicious host any day. This also makes the attack more difficult as the attacker then has to deal with timing his spoofed responses from each nameserver so that they arrive AFTER the queries that were sent to each of those nameservers, and since the attacker has no way to know which nameservers are queried in what order, he has no way to reliably do this.

While this may not be an option for large ISPs and backbone providers due to the duplication in bandwidth, this, like DNS over TCP, may make sense for the smaller networks that can handle the extra traffic.  However, unlike configuring your nameserver to only listen for queries via TCP, this option would require another patch to most nameserver software.  Unless you’re rolling your own, your vendor will likely have to implement this behavioral option for you.
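Here is roughly what I have in mind, as a Python sketch.  Nothing like this exists in any nameserver I know of; `resolve_with_second_opinion` and the `ask` callback (which stands in for "send one query to one authoritative server and return its addresses") are hypothetical names for illustration.

```python
import random
import time
from typing import Callable, FrozenSet, Iterable, Optional, Sequence

def resolve_with_second_opinion(
    hostname: str,
    nameservers: Sequence[str],
    ask: Callable[[str, str], Iterable[str]],
    opinions: int = 2,
    max_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[FrozenSet[str]]:
    """Ask several authoritative servers; accept the answer only if all agree."""
    # Pick servers in a random order the attacker cannot predict.
    chosen = random.sample(list(nameservers), k=min(opinions, len(nameservers)))
    answers = []
    for index, server in enumerate(chosen):
        if index > 0:
            # Delay each follow-up query by a short random interval.
            sleep(random.uniform(0, max_delay))
        answers.append(frozenset(ask(server, hostname)))
    if len(answers) >= 2 and all(a == answers[0] for a in answers):
        return answers[0]
    return None  # disagreement or no second opinion: do not cache

if __name__ == "__main__":
    def fake_ask(server: str, name: str) -> list:
        # Pretend ns1's reply was spoofed by an attacker.
        return ["203.0.113.66"] if server == "ns1" else ["192.0.2.10"]

    print(resolve_with_second_opinion("www.example.com", ["ns1", "ns2"], fake_ask))
```

One caveat with strict agreement: domains that deliberately hand out different answers from different servers (round-robin, geographic load balancing) would fail the comparison, so a real implementation would need a policy for that case.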

5. Rate-limit Inbound DNS Packets by Source Address

One of the properties of this attack is that the attacker is allowed to force the target nameserver to make as many recursive queries as they want, as fast as they can.  Slow them down a bit… No single resolver client using your nameserver really needs more than X queries in Y seconds to look anything up, do they?  By rate-limiting inbound queries, it’ll make it much harder for an attacker to cause your nameserver to make all of the recursive queries they need it to, unless the attacker is randomizing source addresses on their query packets.  In the case that they are (the Metasploit exploits have the option to do this), the attacker still has to spoof the reply to the recursive query as if it were coming from the authoritative nameserver for the domain they’re attempting to hijack.  By rate-limiting the number of inbound DNS packets by source address, most of the incoming flood of spoofed replies will be discarded.  Since the original query was caused by the attacker, it doesn’t matter if the legitimate response gets dropped either.  Take a little time to model your legitimate DNS traffic and see what those X and Y thresholds that I mention above are for inbound queries per source address.  Anything above and beyond that is likely attack traffic.  This works well for smaller networks.  However, because DNS over UDP is connectionless, accurately rate-limiting ALL inbound DNS packets, including replies, may require you to configure your nameserver to always query from a static source port; this is the exact opposite of what the patches rolled out by all the major vendors do.
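The "X packets in Y seconds" rule is a sliding-window counter per source address.  In practice you would do this in your firewall, not in Python; the sketch below just shows the logic, and the class name and thresholds are placeholders of mine.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class SourceRateLimiter:
    """Allow at most `max_packets` per `window_seconds` from each source."""

    def __init__(self, max_packets: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_packets          # the "X" threshold
        self._window = window_seconds    # the "Y" threshold
        self._clock = clock
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, source_ip: str) -> bool:
        now = self._clock()
        recent = self._seen[source_ip]
        # Forget packets that have aged out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max:
            return False                 # over the limit: drop the packet
        recent.append(now)
        return True

if __name__ == "__main__":
    limiter = SourceRateLimiter(max_packets=3, window_seconds=1.0)
    for n in range(5):
        print(n, limiter.allow("198.51.100.5"))
```

Note that a table keyed by source address grows with every new (possibly spoofed) address it sees, so anything real needs to expire idle entries or cap the table size.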

6. Don’t Cache

A cache-poisoning attack can’t be successful if there’s no cache to poison, right?  DNS will work just fine if the nameserver doesn’t cache; it’ll just be extremely unoptimized and repeat a lot of requests that it otherwise wouldn’t.  Again, for smaller, low-traffic networks this may be fine.  However, having your nameserver go make the same queries repeatedly within the TTL of the first query’s answer might make you a bad Netizen.
