How many networks in the world are unreachable at any given instant, and how long do their outages last? How many end-to-end BGP routes are unstable at any given time? What levels of unreachability and instability fall within a "usual range," which levels indicate serious problems, and which events stand out? Who is doing well, who is doing poorly, and what are the long-term trends?
To communicate clearly about the quality and characteristics of inter-domain routing on an Internet-wide basis, we need a common quantitative language for discussing inter-domain routing instabilities and problems. Traditional metrics such as routing table sizes and BGP update counts can show that something unusual is happening, but not much more. Given the complexity of routing on a global scale, and the distinctions between edge networks and core transit networks, no single number can flag all problems. We need a compact set of routing quality metrics that are distinguished by three properties:
The presentation will be illustrated with both the long-term behavior of routing stability and reachability and zoomed-in views of selected routing events of interest.
PDF presentation
RealVideo stream