North American Network Operators Group
Re: DNS TTL adherence
On Thursday 16 Mar 2006 04:23, you wrote:
>
> You might consider the following paper from IMC 2003: "On the
> Responsiveness of DNS-based Network Control" by Jeffrey Pang, Aditya
> Akella, Anees Shaikh, Balachander Krishnamurthy, Srinivasan Seshan,
> http://www.imconf.net/imc-2004/papers/p21-pang.pdf

The results are greatly at odds with my experience. As they imply, the problem may be specifically misconfigured ISP DNS servers, which might explain why we see fewer violations if our sites aren't popular with those ISPs' users.

However, I wouldn't trust any report where control of the authoritative DNS itself wasn't explicitly monitored and reported. They may think they have updated the authoritative answers (and TTL), but in my experience when you find violators you often find that the authoritative DNS servers didn't all update as, or when, expected, or that earlier records were returned with a longer TTL from those servers.

Certainly that was the experience of moving many sites last week, where you can check the logs in real time and find which domains we messed up on by the traffic still arriving.

Looking at the 4 long-term violators for one site:

Hits  Source IP
   8  220.127.116.11   <--- ??
   1  18.104.22.168    <--- lager.netcraft.com
  15  22.214.171.124   <--- IBM Almaden Research Center
   5  126.96.36.199    <--- Fast Search & Transfer

During this period (starting 3 days after the move, with a 10 minute TTL) we saw 27234 hits (okay, not exactly a busy site) for that site on the correct server. So roughly 1 in 1000 hits during days 3 to 6 went to the old web server, and this domain had the most lost hits; most of the moved domains don't show in the old server's log at all.

I think we can safely exclude at least 21 of the 29 as "non-human" (sorry, IBM Research, if you were deeply interested in proofreading); I'm guessing those sources have made a deliberate effort to cache stale data for their own reasons.
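For anyone who wants to check the arithmetic, here is a rough sketch in Python. The hit counts are the ones quoted above; reading the "21" as the three identified sources (1 + 15 + 5) is my assumption, as are the variable and function names.

```python
# Stray hits seen on the OLD server during days 3 to 6, as quoted above.
stray_hits = {
    "unknown": 8,
    "lager.netcraft.com": 1,
    "IBM Almaden Research Center": 15,
    "Fast Search & Transfer": 5,
}
# Hits for the same site on the correct (new) server in the same period.
correct_hits = 27234

total_stray = sum(stray_hits.values())

# Assumption: the "non-human" hits are those from the three identified sources.
non_human = total_stray - stray_hits["unknown"]


def one_in(stray: int, correct: int) -> float:
    """Return N such that 1 in N of all hits (stray + correct) was stray."""
    return (stray + correct) / stray


print(f"stray hits: {total_stray}, of which non-human: {non_human}")
print(f"all stray hits: about 1 in {one_in(total_stray, correct_hits):.0f}")
print(f"unidentified only: about 1 in {one_in(stray_hits['unknown'], correct_hits):.0f}")
```

The second ratio is only indicative, since the correct-server count presumably includes non-human traffic as well.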
So I can put an upper estimate of about 1 in 1000 hits of interest going to the wrong server for our sites during days 3 to 6.

The most popular site we moved had only two DNS violators during days 3 to 6, the most notable being the same "Fast Search & Transfer" IP above. It may be that popular sites have a far worse problem by dint of exercising more caching code, but this site is far from being our most popular.

These sites were moved by reducing the TTL to a low value (10 minutes) and keeping it there for a long period of time before we actually performed the move.
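The timing behind the "lower the TTL first, then move" approach can be sketched as follows. This is only an illustration for resolvers that honour TTLs: the 10 minute value is from the move described above, while the old TTL, the dates, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_safe_move(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """A resolver that cached the record just before the TTL was lowered
    may keep it for the full OLD TTL, so wait at least that long."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def compliant_caches_clear_by(moved_at: datetime, low_ttl_seconds: int) -> datetime:
    """After the address changes, a TTL-honouring cache can serve the old
    address for at most the lowered TTL."""
    return moved_at + timedelta(seconds=low_ttl_seconds)


OLD_TTL = 86400   # hypothetical previous TTL (1 day); not stated in this message
LOW_TTL = 600     # the 10 minute TTL used for the move

lowered = datetime(2006, 3, 1, 12, 0)   # hypothetical date
moved = datetime(2006, 3, 10, 9, 0)     # hypothetical date

print("earliest safe move:", earliest_safe_move(lowered, OLD_TTL))
print("compliant caches clear by:", compliant_caches_clear_by(moved, LOW_TTL))
```

Anything still hitting the old server well after the second time is either ignoring the TTL or was handed a stale record by one of the authoritative servers.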