North American Network Operators Group

Re: ultradns reachability

  • From: Christopher L. Morrow
  • Date: Fri Jul 02 00:21:57 2004


On Thu, 1 Jul 2004, k claffy wrote:

> On Fri, Jul 02, 2004 at 02:06:59AM +0000, Christopher L. Morrow wrote:
>   On Thu, 1 Jul 2004, James Edwards wrote:
>   > http://www.cymru.com/DNS/gtlddns-o.html
>   >
>   Anycast makes the pinpointing of problems a little challenging from the
>   external perspective it seems to me.
>
> i am relieved it is only 'a little challenging'
> because i was worried it was 'sub-possible'.
> (or am i misinterpreting operational euphemisms...)

Oops, I did it again, I forgot the ":)".

So, I thought of it like this:
1) Rodney/Centergate/UltraDNS knows where all their 35000billion copies of
the 2 .org TLD boxes are, what network pieces they are connected to at
which bandwidths and the current utilization
2) Rodney/Centergate/UltraDNS knows which boxes in each location (there
could be multiple inside each pod, right?) are running their dns process
and answering at which rates
3) Rodney/Centergate/UltraDNS knows when processes die and locally stop
pushing requests to said system inside the pod
4) Rodney/Centergate/UltraDNS knows when a pod is completely down (no
systems responding inside the local pod) so they can stop routing the /24
from that pod's location

So, Rodney/Centergate/UltraDNS should know almost exactly when they have a
problem they can term 'critical'... I most probably left out some steps
above, like wedged processes or loss of outbound routing to prefixes
sending requests. I'm sure Paul/ISC has a fairly complete list of failure
modes for anycast DNS services.
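
To make points 3 and 4 (plus the wedged-process case) concrete, here is a
rough sketch of the decision a pod could make locally. This is only an
illustration of the logic as I described it, not how UltraDNS or anyone
else actually does it; the Server fields, the plan_pod name and the
ns-a/ns-b/ns-c hosts are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    process_alive: bool    # the DNS daemon is running (hypothetical check)
    answers_queries: bool  # a local test query got a good answer (hypothetical check)


def plan_pod(servers):
    """Return (names kept in local rotation, whether to keep announcing the /24)."""
    # Point 3: stop sending requests to boxes whose process died or is wedged.
    in_rotation = [s.name for s in servers if s.process_alive and s.answers_queries]
    # Point 4: if nothing in the pod answers, stop routing the /24 from here.
    announce_prefix = bool(in_rotation)
    return in_rotation, announce_prefix


pod = [
    Server("ns-a", process_alive=True, answers_queries=True),
    Server("ns-b", process_alive=True, answers_queries=False),   # wedged process
    Server("ns-c", process_alive=False, answers_queries=False),  # process died
]
print(plan_pod(pod))  # (['ns-a'], True)
```

Note that this only covers what the pod can see about itself. Loss of
outbound routing toward the prefixes sending requests would not show up
in a local check like this one.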

The problem then becomes the "Hey, .org is dead!" report: from where is it dead?
What pod are you seeing it dead from? Is it routing TO the pod from you?
FROM the pod to you? The pod itself? Stuck/stale routing information
somewhere on the path(s)? This is very complex, or seems to be to me :(
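
One partial answer to "what pod are you seeing?" is to ask the server to
identify itself with a CHAOS-class TXT query for hostname.bind, which
some name server software answers with a per-instance name. The sketch
below builds that query by hand with the standard library. I am assuming
the servers in question answer such a query at all, which they may not;
build_chaos_txt_query and ask_instance are names invented for this
example, and decoding the reply is left out.

```python
import socket
import struct


def build_chaos_txt_query(qname="hostname.bind", query_id=0x1234):
    """Build a DNS query packet: QTYPE=TXT (16), QCLASS=CH (3)."""
    # Header: ID, flags (plain query, no recursion), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero byte.
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in qname.split("."))
    question = labels + b"\x00" + struct.pack("!HH", 16, 3)
    return header + question


def ask_instance(server_ip, timeout=3.0):
    """Send the query over UDP; return the raw reply bytes, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_chaos_txt_query(), (server_ip, 53))
        try:
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
        return data
```

A timeout here still does not say whether the query was lost on the way
TO the pod, the answer was lost on the way FROM it, or the pod itself is
down, so this narrows the question only when an answer does come back.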

A good thing, oddly enough, is each of these events gives everyone more
and better information about the failure modes :)

>
> but then we've taken similar risks before and gotten
> stuff like BGP so maybe we'll be, um, just as fond of
> anycast in due time. :)
>

I think more failure modes will be investigated before that comes :)
fortunately lots of people are already investigating these, eh?

-Chris