Recently I received a call from our $2/month DNS service provider explaining that our DNS traffic had increased to 400 million authoritative queries per month (a high of 120 queries/second), putting us outside the scope of the basic service, and that we were required to upgrade to a premium service in the range of $1600/month. Welcome to some tech at Pinkbike.
After some investigation we determined that browser DNS prefetching was causing an increase in our authoritative DNS queries of a whopping 800%.
A single meta tag to control prefetching reduced Pinkbike DNS queries by 350 million per month. The same implementation on a larger site, deviantART, reduced authoritative DNS queries by 10 billion per month.
DNS resolution is the process of converting a domain/hostname to the IP address required to access the resource. This process takes time and adds to the perceived page loading time. DNS preresolution/prefetching is the process of figuring out the IP address of every link on the page before you even click on it, with the goal of saving the DNS resolution time when the link is clicked. DNS prefetching is a fairly recent enhancement to all the major browsers (it was added to Safari 7 months ago). After a page loads, the browser looks at all the hosts in the links on the page and, in the background, issues DNS queries to resolve those hostnames.
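To get a feel for the resolution time that prefetching tries to hide, a lookup can be timed with a few lines of Python. This is a minimal sketch, not something from our tooling: the name `timed_lookup` and the injectable `resolver` parameter are assumptions made for illustration, and the default resolver is the standard library's `socket.getaddrinfo`.

```python
import socket
import time


def timed_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return (sorted unique addresses, seconds elapsed).

    `resolver` is a hypothetical hook so the function can be exercised
    without network access; by default the system resolver is used.
    """
    start = time.perf_counter()
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the IP address for both IPv4 and IPv6 results.
    results = resolver(hostname, None)
    elapsed = time.perf_counter() - start
    addresses = sorted({info[4][0] for info in results})
    return addresses, elapsed
```

Calling `timed_lookup("www.pinkbike.com")` twice in a row will usually show a slower first call and a much faster second one, because the second answer comes out of a cache somewhere between your machine and the authoritative server.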
I set up our DNS a number of years ago, before DNS prefetching existed, so I never thought about its implications. I knew just enough to use a good DNS service which had multiple geo-located DNS servers to minimize authoritative query request times, and to make sure that TTLs were set high (in our case a week) in order to keep caches around longer. In our case we used dyndns, which has great service, performance, and an easy-to-use UI.
After receiving the information about the large number of DNS queries that we were generating, I investigated all the obvious elements.
1. DNS TTLs
DNS queries are cached at many levels. Caches may exist in your browser, OS, router and ISP. How long a cached answer stays valid is controlled by the TTL (time to live) setting on your DNS records. In our case our TTLs were set at the maximum of 1 day, so this was surely not the issue.
2. Lots of links/images on other sites
It is possible that a large number of our images are embedded on other sites, and that users viewing those sites are causing additional load on our DNS. Pinkbike uses a separate domain for static content, pinkbike.org, and since this domain is not generating a high volume of DNS queries, this is not the problem.
3. Misconfigured internal services hitting the DNS
Another potential issue was a misconfigured server. Pinkbike has a bunch of servers that communicate internally for all sorts of reasons, and perhaps one of them was not caching DNS. You can easily check this on a Linux box with the following command, which shows all the DNS queries your server is making:
dnstop -l 4 eth0
This was not the issue.
The basic DNS service we were using did not have any reporting that would let us determine which hostnames the DNS queries were for and where they were coming from. This made it difficult to figure out what the issue might be.
Upgrading to Dynect
I need to give Bobby and the guys at Dyn a plug for being so forthcoming in helping to investigate this issue. The first thing we were able to do was get a trial account on the Dynect platform, which is the “enterprise” version of dyndns. Since the same company runs both services, the transition from dyndns took one click to transfer all the data into Dynect. In addition to better performing DNS using anycast and more geo servers, you get all sorts of reporting, data, and control.
Looking at the www domain, we see that it is not the culprit behind the 120 q/s.
Now that we were able to see which hosts were getting the bulk of the queries, I was able to narrow down the problem. The bulk of the queries were the wildcard DNS queries used for user subdomains such as username.pinkbike.com. I always expected that this would be a source of additional queries, but the volume still did not make sense. After checking our analytics further, I found there were more DNS queries per subdomain than actual subdomain page loads. Pinkbike serves about 100 dynamic HTML pages per second, so 120 DNS queries per second made no sense, especially taking into account multiple layers of DNS caching.
This finally led me to look in the direction of prefetching.
Browser DNS Prefetching
I was aware of the idea of DNS prefetching but never really understood how the mechanics worked in the browser. The first thing I did was fire up Wireshark to take a look at exactly what was going on.
Some of our pages have comments, and each comment has a link to the author’s subdomain. When you have 100s of comments on a page, this means 100s of potential subdomains that can be prefetched. Of course the browsers seem to blindly prefetch all the subdomains.
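To estimate how many lookups a single page can trigger, you can count the distinct hostnames in its links. The sketch below uses only the Python standard library; the class and function names, the sample markup, and the usernames `alice` and `bob` are made up for illustration, and it only approximates what a browser does by treating every `<a href>` host other than the page's own as a prefetch candidate.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkHostCollector(HTMLParser):
    """Collect the distinct hostnames found in <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        host = urlsplit(href).hostname  # None for relative links
        if host:
            self.hosts.add(host)


def prefetch_candidates(html, page_host):
    """Hostnames a prefetching browser might look up, excluding the page's own."""
    collector = LinkHostCollector()
    collector.feed(html)
    return sorted(collector.hosts - {page_host})


# Hypothetical comment markup with author links (usernames are invented).
sample = """
<a href="http://alice.pinkbike.com/">alice</a>
<a href="http://bob.pinkbike.com/">bob</a>
<a href="http://alice.pinkbike.com/album/">alice's album</a>
<a href="/forum/">forum</a>
"""
print(prefetch_candidates(sample, "www.pinkbike.com"))
# prints ['alice.pinkbike.com', 'bob.pinkbike.com']
```

Each distinct author on a comment page adds one more hostname to this list, which is why a page with hundreds of commenters can generate hundreds of lookups from a single page view.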
Safari and Chrome prefetch DNS after the page loads. Prefetch can last many seconds after the page loads.
Firefox 3.6.x seems to bloat the prefetching even further. It seems to prefetch the AAAA (ipv6) record in addition to the A (ipv4) record for every prefetch. This may further explain the large amount of AAAA requests on all our domains. Perhaps FF does this for every hostname.
Firefox DNS prefetch is worse, as it always gets the AAAA record in addition to the A record, further increasing DNS queries. I did try the FF4 beta, though, and this appears to be corrected in the upcoming release.
IE8 / 9 early beta do not appear to do DNS prefetching.
Solution - Prefetch Control
Prefetching seems to be enabled by default in every browser, but can be disabled in the settings. You could ask all your users to disable prefetch in their browsers, but that would just be a waste of goodwill with your users. Luckily the browsers have standardized on a prefetch control meta tag that can be used on your page to disable prefetching.
The DNS prefetch control allows you to turn prefetching on and off for your whole page or for certain parts of the page. Additionally, you can force lookups on specific hostnames.
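For reference, here is a small sketch that generates the markup involved. The tag forms are the ones I understand the Chromium and Mozilla documentation to describe (an `x-dns-prefetch-control` meta tag and a `dns-prefetch` link hint); the Python helper names are hypothetical and exist only to print the tags, so check the browser documentation before relying on the exact syntax.

```python
from html import escape


def prefetch_control_meta(enabled):
    """Meta tag that switches DNS prefetching on or off for the page."""
    value = "on" if enabled else "off"
    return f'<meta http-equiv="x-dns-prefetch-control" content="{value}">'


def dns_prefetch_link(hostname):
    """Link tag asking the browser to resolve one specific hostname early."""
    return f'<link rel="dns-prefetch" href="//{escape(hostname, quote=True)}">'


def head_tags(enabled, forced_hosts=()):
    """Tags to place in <head>: the control tag, then any explicit hints."""
    tags = [prefetch_control_meta(enabled)]
    tags.extend(dns_prefetch_link(host) for host in forced_hosts)
    return "\n".join(tags)


print(head_tags(False, ["www.pinkbike.com"]))
# prints:
# <meta http-equiv="x-dns-prefetch-control" content="off">
# <link rel="dns-prefetch" href="//www.pinkbike.com">
```

How a given browser combines the page-wide switch with explicit hints may differ, so it is worth confirming the result with a packet capture rather than assuming.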
Initially I disabled prefetch on a subset of the pages, and the result was dramatic and instant.
Additional tuning resulted in a further drop.
After realizing this was the cause of the bloated DNS queries, I called up my buddies at deviantART to let them know about this discovery and asked them to implement the basic fix. Their result was also instant.
On deviantART this adds up to a decrease of about 10 billion DNS queries per month.
When I first heard of DNS prefetching I thought it was a great idea and assumed it was an optimization with absolutely no negative side effects. I did not fully understand the mechanics or the impact it could have on an organization. If 350 million queries cost Pinkbike.com $1600/month, then saving the cost of a few billion queries per month starts to be something to consider carefully.
Before you run to disable prefetching on your site, please realize that the increase in queries is directly proportional to the number of subdomains that you link to on your page. Arguably, prefetching is an effective way to decrease latency for most sites, but there are implications. I hope this article will make you aware of, and help you understand, the potential impact and behaviour of this technology.
- A browser and OS typically cache on the order of a few hundred DNS hostnames. If a user is visiting sites that have a lot of subdomain hostnames, do the valuable entries in these caches frequently get evicted by prefetched entries, so that prefetching may actually be causing increased latency?
- If I'm paying for an enterprise DNS solution which has DNS servers all over the world to minimize query time, is browser prefetching less valuable to me?
- If you're thinking of building a site with the pattern of username.example.com, this data may sway you away from it.
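The first question above can be explored with a toy simulation. This is only a sketch under loud assumptions: it models the cache as a plain least-recently-used (LRU) cache with a capacity of 200 entries, ignores TTL expiry, and uses invented hostnames. Real browser and OS caches may use different sizes and eviction policies.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache keyed by hostname (an assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, hostname):
        """Return True on a cache hit; on a miss, insert and maybe evict."""
        if hostname in self.entries:
            self.entries.move_to_end(hostname)
            return True
        self.entries[hostname] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return False


def favourites_still_cached(capacity, favourites, prefetched):
    """Count favourite hosts left in the cache after a burst of prefetches."""
    cache = LRUCache(capacity)
    for host in favourites:   # warm the cache with sites the user revisits
        cache.lookup(host)
    for host in prefetched:   # a page full of profile links gets prefetched
        cache.lookup(host)
    return sum(host in cache.entries for host in favourites)


favourites = [f"site{i}.example" for i in range(50)]
profile_links = [f"user{i}.example.com" for i in range(300)]
print(favourites_still_cached(200, favourites, profile_links))
# prints 0
```

In this model, one page with 300 distinct subdomain links is enough to push all 50 previously cached hostnames out, so the next visit to any of them pays for a fresh lookup.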