Saturday, February 7, 2004
Topic/Presenter
Full Abstract

This tutorial introduces a few simple security tools any ISP can implement at low cost (free) to improve both reaction time and customer satisfaction during a crisis. The basics of blackhole routing and customer use of the blackhole configuration in the ISP network will be discussed. Some advanced uses for blackhole routing will also be covered. Finally, the BGP FlowSpec draft, titled "Dissemination of flow specification rules" (http://www.tcb.net/draft-marques-idr-flow-spec-00.txt), and its future within the standards process will be discussed. An overview of the draft will be provided, and operator input is sought.

Speakers
Tim Battles, AT&T
Danny McPherson, Arbor Networks
Chris Morrow, UUNET

Full Abstract

This tutorial discusses the L2 VPN-over-MPLS solutions being standardized in the IETF. The first part of the session covers the drivers, the enterprise perspective of the main models (Virtual Private Wire Services, VPWS, and Virtual Private LAN Service, VPLS) and related technology in the standards. The second section describes the building blocks common to all L2 VPN models. The third part of the tutorial discusses the specifics of VPWS and VPLS. We also discuss Layer 2 internetworking concepts, i.e., bridged and routed mode solutions, options for ARP mediation, options for MPLS encapsulation, and end to end OAM. Configuration examples from several vendors will be given for the various solutions covered in the tutorial. This session requires a basic understanding of MPLS technology, including control plane and data plane concepts. We begin with simple, generic models, and move gradually to more complex concepts to appeal to a variety of attendees: product managers, network planners/architects, system engineers, network operations staff, customer support, applications engineering, and development engineers who are new to the area of MPLS L2 VPN services. An outline of the session follows:

  1. Introduction to L2VPNs over MPLS
    • Drivers, High-level Definitions, Standards
  2. IETF Generic Models
    • Information Model/Provisioning
    • Control Plane: Auto-discovery, Signaling
    • Data Plane: Attachment Circuits, Virtual Forwarders, Pseudo-wires
    • Virtual Private Wire Service (VPWS)
      • Provisioning models (e.g., single-sided, double-sided provisioning), and related concepts (Gid, PWid FEC etc)
      • Configuration Examples
    • Layer 2 Internetworking
      • Encapsulation options: Bridged/Routed Mode, MPLS
      • Packet walkthrough, ARP Mediation
      • Service OAM
      • Configuration Example
    • Virtual Private LAN Service (VPLS)
    • LDP, BGP VPLS commonalities and differences
    • VPLS Signaling, Data Plane
    • Configuration Example
  3. Summary

Speakers
Florin Balus, Nortel
Florin Balus is an M.Sc (Artificial Intelligence) graduate from the Faculty of Electronics and Telecommunications, University of Bucharest, Romania. He has more than 10 years experience in data networking, working for the last seven years at Nortel in the Global Data Network Engineering team. During his career at Nortel he has been Network Engineering prime for various projects in the areas of ATM/FR, DSL and IP Services. He currently specializes in MPLS VPNs, focusing also on the evolution to Ethernet. Prior to joining Nortel, Florin worked for France Telecom in Europe for three years.

Mike Loomis, Nortel
Mike Loomis holds a BS in engineering from Rensselaer Polytechnic Institute and an MBA from the University of New Hampshire. Mike has eight years of data networking experience at Bay Networks and Nortel Networks. He has held a variety of product management positions in Network Management, Enterprise Switching, and Optical Ethernet business units. His recent focus has been on L2 VPNs over MPLS, the evolution toward Ethernet, and VPLS.

Sunday, February 8, 2004
Topic/Presenter
Full Abstract

This tutorial discusses MPLS VPNs in detail, concentrating on layer 3 BGP MPLS VPNs. The tutorial will cover basic L3VPN setup and the carrier scenarios outlined in "BGP/MPLS IP VPNs," as well as advanced topics that arise in the context of VPNs. The material presented is vendor-independent. The tutorial is targeted at network engineers and service providers who want to gain a deeper understanding of MPLS VPNs. A basic understanding of MPLS is assumed, but no prior knowledge of VPNs is necessary, as the tutorial will build the MPLS VPN model from scratch.

Speakers
Ina Minei, Juniper

Full Abstract

This tutorial offers an advanced exploration of Fast Reroute (FRR) techniques and operations. Fast Reroute is an important tool in engineering networks for transporting real-time data in addition to traditional Layer 2 technologies. During this tutorial we build on existing engineer skills to discuss the terminology and background for FRR. Both current FRR models, node and link protection, are examined. We'll look at the RSVP objects used, the establishment of the FRR paths, as well as operational issues. Throughout the tutorial, Juniper Networks and Cisco Systems routers and CLI commands are used to illustrate important FRR concepts.

A brief outline follows:

  1. Assumptions
  2. Why use fast reroute?
  3. FRR Terminology
    • Node protection
    • Link protection / Facility backup
    • Point of local repair (PLR)
    • Protected LSP
    • Detour LSP
    • Bypass LSP
    • Merge point
  4. Node Protection
    • Building FRR paths
    • RSVP objects
    • Fast Reroute
    • Detour
    • Record route
    • Session attribute
    • Label operations
    • Merging detours
    • Notifying ingress router of local repair
  5. Link Protection
    • Building FRR paths
    • RSVP objects
    • Fast Reroute
    • Detour
    • Record route
    • Session attribute
    • Label operations
    • RSVP operation during repair mode
    • Merging detours
    • Notifying ingress router of local repair

Speakers
Joe Soricelli, Juniper

Full Abstract

What operational pitfalls and successes did network technologies face during the last 10 years? Dino presents a practical and operational perspective on lessons learned during major deployments of the last decade.

Speakers
Dino Farinacci, Procket
Dino Farinacci has been designing and implementing networking protocols for 21 years. He has extensive experience with distance vector and link state protocol implementations, as well as multicast routing protocols, which have been his focus for the past eight years. He wrote widely deployed implementations of IS-IS, OSPF, PIM, MBGP, and MSDP when these protocols were in their infancy. A former Fellow at Cisco, Dino currently works for Procket Networks in the Routing Protocols group.

Full Abstract

Speakers
Sue Hares, NextHop

Monday, February 9, 2004
Topic/Presenter
Full Abstract

Currently, router measurements collected via SNMP do not report any statistics on the through-router delays experienced by packets. However, such delays are building blocks of the end-to-end packet delay seen by applications, which may soon be subject to SLAs. It is therefore important for operators to demonstrate compliance by compiling statistics on delays suffered over their own network, in addition to network monitoring needs. In recent research work, we have shown how, at least for store-and-forward routers, an accurate surrogate for end-to-end delays can be obtained by measuring fluctuations in output queue size. Queue sizes are available to routers which have implemented queue management strategies such as RED. By focusing on queue "busy periods," which encapsulate all the congestion behaviour at the corresponding output interface, we addressed the question of what queue statistics are best to represent delay behaviour, how this can be done in small memory and with bounded processing, and how to compactly report these back via SNMP. In the talk we will describe our main findings and recommendations. These include on-line algorithms for storing a joint measure of busy period durations (related to utilisation) and amplitude (related to delay), and a method for discretising the resulting two-dimensional data which naturally adapts to traffic conditions. Using the joint description, we deliver not just a simple average measure of delay, but a rich summary of both delay and utilisation behaviour which can be mined in post-processing at network management nodes to derive not just simple first-order statistics but also time-scale dependent metrics, such as the duration of congestion episodes which pass a given threshold.
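The busy-period idea in the abstract above can be illustrated with a minimal Python sketch. It is not the presenters' on-line algorithm: the sample format, the `BusyPeriod` record, and the fixed-edge `joint_histogram` are assumptions made for illustration, and the abstract's discretisation adapts to traffic whereas this one uses fixed bin edges.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class BusyPeriod:
    start: float      # time of the first sample with a non-empty queue
    duration: float   # time until the queue was next seen empty
    amplitude: float  # peak queue size observed during the period


def busy_periods(samples: Iterable[Tuple[float, float]]) -> List[BusyPeriod]:
    """Split (time, queue_size) samples into busy periods (hypothetical format)."""
    periods: List[BusyPeriod] = []
    start = None
    peak = 0.0
    for t, q in samples:
        if q > 0:
            if start is None:
                start, peak = t, q      # a new busy period begins
            else:
                peak = max(peak, q)     # track the peak inside the period
        elif start is not None:
            periods.append(BusyPeriod(start, t - start, peak))
            start = None
    return periods  # a period still open at the end of the data is dropped


def joint_histogram(periods: Sequence[BusyPeriod],
                    duration_edges: Sequence[float],
                    amplitude_edges: Sequence[float]) -> Counter:
    """Count busy periods per (duration bin, amplitude bin) with fixed edges."""
    counts: Counter = Counter()
    for p in periods:
        i = bisect_right(duration_edges, p.duration)
        j = bisect_right(amplitude_edges, p.amplitude)
        counts[(i, j)] += 1
    return counts
```

A small two-dimensional table of counts like this is the kind of compact summary that could be exported periodically and mined later for both delay-related (amplitude) and utilisation-related (duration) statistics.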

Speakers
Darryl Veitch, Sprintlabs
Darryl Veitch has been working in networking and tele-traffic engineering for over 10 years. He has worked within academic, industrial, and government research organisations in Australia, France, Sweden and the USA. Darryl has worked extensively both in passive and active traffic measurement and modelling, at both the theoretical and practical "layers."

Full Abstract

What tools can we create to give troubleshooters truly useful support for resolving the most complex problems more efficiently and effectively? Complex problems include, for example, inexplicable intermittent faults, runaway processes, multiple entangled faults, interactions among network elements that "shouldn't be happening," and elusive root causes. Ultimately, our project will design a generic standalone, open source troubleshooting framework with a set of integrated technologies and workspaces for data preparation, exploratory analysis within and across subsystems, and collaboration. At present, we are exploring the most fundamental issue preceding design - defining requirements by studying and modeling troubleshooters' complex problem-solving for specific classes of network problems in real-world settings. We emphasize user models strongly because if they fail to aptly represent this work and its uncertainties and adaptation to dynamic conditions, tools resulting from them will be less than useful. In this talk, we describe our findings to date in constructing these user models. To assure generalizability, we have observed several global intranet troubleshooters working on complex problems in actual work settings; interviewed high-level troubleshooters in telecommunications and the Internet about their most difficult problems; and analyzed a transcript of eight days of e-mail exchanged among 26 Internet participants as they investigated and solved a problem with an executive-level, remote conference in which video-over-IP transmission was disrupted. We briefly will identify high-level patterns of inquiry that we have found specialists perform in common across Internet, intranet, and telecommunication networks. 
Then, for one type of problem, we will discuss a complex troubleshooting scenario (a composite drawn from our findings) and abstract from it top areas of support that advanced troubleshooters demonstrably need but lack - e.g., parsing and syncing up diverse data from elements across subsystems and network domains, using at once functional and structural models of the system for complex diagnosis, and comparing current and saved views of network behavior, such as correlations, trends, and patterns. Participation of NANOG attendees is vital, and we hope you'll participate in our BOF Monday evening. Operators' suggestions and critical reviews can give crucial insight into important matches we must make between users' and tools' ways of chunking analytical moves, strategies, and pathways for specific purposes, points in time, and circumstances. This matching makes or breaks the usefulness of a tool. Attendee comments will enrich the user models that we create and assure greater usefulness in the framework we develop.

Speakers
Barbara Mirel, Univ. of Michigan
Barbara Mirel, a visiting professor and research investigator at the University of Michigan, specializes in data visualizations and usability for complex problem solving. She has been a lead human factors engineer at Lucent Technologies and Visual Insights and the head of a national healthcare task force evaluating the safety and usability of clinical information systems. She is the author of Interaction Design for Complex Problem Solving: Developing Usable and Useful Software (Morgan Kaufmann, 2003).

Full Abstract

One of the few efforts to develop a globally analyzable and secure Internet is the creation of the Internet Routing Registries (IRRs). IRRs provide a voluntary detailed repository of BGP policy information. The IRR effort has not reached its full potential for two reasons: a) extracting useful information is far from trivial, and b) the accuracy of the data is uncertain. In this presentation, we provide a brief overview of our systematic approach to analyze the policy information stored in the IRRs. There exist a number of tools to measure actual BGP routing, such as ping, traceroute, looking glass, BGP table dumps, etc. But there does not exist a tool to bridge the gap between intended policy (configuration) and actual routing. Internet Routing Registries contain the policy of a large number of networks expressed in a high-level language. These registries are often considered to be useless and outdated, based primarily on empirical evidence. To the best of our knowledge, there does not exist a tool that can analyze these policies, and check their validity or freshness. The registries are maintained manually and on a voluntary basis to a large extent, and the policies remain as simple text. Thus, analyzing the IRR is not a trivial task. The difficulties include a) RPSL is very flexible, so policies can be very complex, b) there can be many different ways to express the same policy, c) the registries can contain inaccurate, and incomplete data. At the same time, having a tool that can analyze the policy information stored in IRRs, and more specifically RPSL-based policies, is important during the configuration and operation phases. During the configuration phase we can check the registered policy for correctness. During the operation phase, we can check (offline) whether the intended policy matches the actual routing. In fact, our tool is among the first public tools to analyze the IRR policies. 
A long-term goal of the RIPE Routing Information Service is to validate the policies that Autonomous Systems register, and thus increase the robustness of BGP. Our work here is the first step in reaching this ambitious goal. Our tool, Nemecis, which stands for Network ManagEment and ConfIguration System, consists of two parts. First, we convert the policies using filters to an equivalent link-level policy. In the link-level policy, we replace the export and import filters, with a boolean matrix that describes the relation between the links for an AS. For example, if we import a route from link i, and export that route to link j, then the value of the matrix at (i,j) will be true. By converting the problem to the link level, the problem becomes independent of the different kinds of implementations of the policy, or about specific routes or sets used in the filters. This way we can concentrate on how to model the actual policy. The second part is to infer the business policies using the link-level model. This part is independent of the first one. For example, we can enrich the business relations to include more types of relations, such as backup links, without changing the link-level approach. Finally, as a validation of our method, we check whether the registered policies agree with the actual Internet routing. Our contributions can be summarized in the following points: We provide Nemecis, an efficient tool to analyze the IRR/RPSL information. Our tool can be used to parse, clean, and infer the business relations found in the Internet Routing Registries, and create an easy-to-query relational database, where the policies are stored in tables and not as simple text. Our tool can infer the policy with higher than 83% accuracy. We validate the policy from IRR against real routing tables. We consider the accuracy to be very good, if we take into account the quality of the registered policies. 
We quantify the usefulness of the IRR information: we find that 28% of the ASes have both a consistent policy and are consistent with BGP routing tables. Note though that almost all are from a single registry, RIPE. We identify common mistakes and problems in IRR registries. We discuss ways to overcome them so that the IRR can be used to automate the management and safety of Internet routing. Our ambition is to establish our tool as a foundation and inspiration for two complementary goals. First, we would like to draw the interest of experts to develop efficient RPSL-based tools. Second, we would like to motivate practitioners and the related authorities to maintain and use the IRRs more. We think that one of the ways to achieve this is by establishing the practical potential of the IRR. We view our tool as a promising first step in this direction. The presentation will be based on the following paper: "Analyzing BGP Policies: Methodology and Tool", by Georgos Siganos and Michalis Faloutsos, which will appear in IEEE INFOCOM 2004. For more details, please see http://www.cs.ucr.edu/~siganos/papers/Nemecis.pdf
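The link-level policy described in this abstract, a boolean matrix whose entry (i,j) is true when a route imported on link i is exported on link j, can be sketched in a few lines of Python. This is an illustration only, not Nemecis code: representing each link's import and export filters as plain sets of route names is an assumption, and the function and link names are hypothetical.

```python
from typing import Dict, Set


def link_level_matrix(imports: Dict[str, Set[str]],
                      exports: Dict[str, Set[str]]) -> Dict[str, Dict[str, bool]]:
    """Build matrix[i][j] = True if some route imported on link i is exported on link j.

    imports[link] and exports[link] are the sets of routes accepted from and
    announced to the neighbour on that link (a simplified stand-in for filters).
    """
    links = sorted(set(imports) | set(exports))
    return {
        i: {
            j: i != j and bool(imports.get(i, set()) & exports.get(j, set()))
            for j in links
        }
        for i in links
    }


# Hypothetical AS with three links: routes learned from the customer are
# announced on every other link, while peer and provider routes are only
# announced to the customer.
imports = {"customer": {"c1"}, "peer": {"p1"}, "provider": {"u1"}}
exports = {"customer": {"p1", "u1"}, "peer": {"c1"}, "provider": {"c1"}}
matrix = link_level_matrix(imports, exports)
```

Once the policy is in this form, the matrix no longer depends on how the filters were written, which is the property the abstract relies on for the later business-relation inference.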

Speakers
Georgos Siganos, UC Riverside

Full Abstract

NETCONF is an effort within the IETF to provide a programmatic interface to network elements. The protocol is being designed with numerous operational models in mind. This talk will focus on current development status and raise several questions that are relevant to the operator community. Please see http://ops.ietf.org/netconf/

Speakers
Eliot Lear, Cisco Systems
Eliot Lear is a consulting engineer for Cisco Systems and a co-author of the NETCONF draft specification.

Full Abstract

An accurate software clock that can be reliably and inexpensively synchronised is essential for many aspects of networking, including passive network measurement, active probing-based network measurement, and many real-time network applications. Best-effort solutions using existing PC software clocks synchronised with the standard Network Time Protocol (NTP) algorithms are neither robust enough nor accurate enough for many purposes, whereas GPS-based synchronisation is money- and effort-intensive, and therefore infeasible for large-scale measurement efforts with a large number of nodes. In this talk a CPU clock counter (TSC register)-based software clock will be described that has many intrinsic advantages, thanks to the high performance of modern off-the-shelf hardware. Principles and algorithms enabling a robust, accurate synchronisation based on the existing NTP server network will be described and illustrated using four months of real data collected in four different host-server environments. The result is an alternative remote-synchronised software clock with substantially enhanced performance. In particular, its reliability should enable many network measurement and service functions to be performed that are not feasible at present, and substantially improve the reliability of existing measurements that rely on precise timing. The technique is relatively lightweight and requires no kernel modifications as such for implementation.
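As a rough illustration of what a counter-based software clock involves, the Python sketch below converts raw counter readings into time using an estimated counter period. It is not the algorithm presented in the talk: the two-point period estimate, the `CounterClock` class, and all names are assumptions, and a real implementation would read the TSC register directly and filter out network delay noise in the NTP exchanges.

```python
def estimate_period(tsc_a: int, time_a: float, tsc_b: int, time_b: float) -> float:
    """Estimate seconds per counter tick from two (counter, server time) pairs.

    The pairs are assumed to come from two well-separated NTP exchanges;
    a wider separation dilutes the effect of timestamp noise.
    """
    if tsc_b <= tsc_a:
        raise ValueError("second counter reading must be later than the first")
    return (time_b - time_a) / (tsc_b - tsc_a)


class CounterClock:
    """Software clock of the form time = time_ref + (tsc - tsc_ref) * period."""

    def __init__(self, tsc_ref: int, time_ref: float, period: float) -> None:
        self.tsc_ref = tsc_ref
        self.time_ref = time_ref
        self.period = period

    def read(self, tsc: int) -> float:
        """Absolute time for a counter reading."""
        return self.time_ref + (tsc - self.tsc_ref) * self.period

    def interval(self, tsc_start: int, tsc_end: int) -> float:
        """Elapsed seconds between two counter readings; needs only the period."""
        return (tsc_end - tsc_start) * self.period
```

Separating `interval` from `read` reflects a general property of such clocks: time differences depend only on the period estimate, while absolute time also depends on the reference offset.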

Speakers
Darryl Veitch, Sprintlabs
Darryl Veitch has been working in networking and tele-traffic engineering for over 10 years. He has worked within academic, industrial, and government research organisations in Australia, France, Sweden and the USA. Darryl has worked extensively both in passive and active traffic measurement and modelling, at both the theoretical and practical "layers."

Full Abstract

Arbor Networks

Full Abstract

Security incidents are a daily event for Internet Service Providers. Attacks on an ISP's customers, attacks from an ISP's customers, worms, BOTNETs, and attacks on the ISP's infrastructure are now among the many "security" NOC tickets throughout the day. This increase in the volume and intensity of attacks has forced ISPs to spend constrained resources to mitigate the effects of these attacks on their operations and services. This investment has helped minimize the effects of the attacks, but it has not helped stop them at the source. Stopping attacks at their source requires rapid and effective inter-ISP cooperation. Hence, these ISP Security BOFs are also used as a face-to-face sync-up meeting for the NSP-SEC forum (see https://puck.nether.net/mailman/listinfo/nsp-security).

Speakers
Barry Raveendran Greene, Cisco Systems
Merike Kaeo

Full Abstract

In this panel, industry experts will discuss the operational impact of changes in inter-domain routing (Sue), multicast (Dino), security (Steve), and IPv6/NAT (Paul). The panel will be followed by an open forum/Q&A for all the NANOG30 anniversary speakers.

  1. 15 Years of Policy Routing, by Sue Hares
  2. NAT and IPv6, We Meet at Last, by Paul Francis
    • This talk examines the history of NAT and IPv6, describing how each has evolved, and how their independent paths have finally met in the form of Teredo. It also examines possible futures for NAT and IPv6.
  3. Where Multicast Has Been and Where It's Headed, by Dino Farinacci
    • Dino describes the early history and evolution of the multicast routing protocols.

Speakers
Moderator - Sue Hares, NextHop
As founder and CTO of NextHop Technologies, Sue Hares leads the company's technology qualification, development, and strategic planning functions. Prior to launching NextHop, Sue spent 13 years at Merit Network, Inc., where she most recently directed the Merit GateD Consortium. She was also a senior engineer at both Allen-Bradley Corp. and ADP Inc. An active participant in the design, specification and implementation of routing protocols, Sue co-chairs the IETF Inter-domain Routing working group, which is standardizing BGP. She is also a member of the NANOG program committee. Sue holds a B.S. in Computer Engineering from the University of Michigan.

Panelist - Steve Bellovin, AT&T Research
Panelist - Dino Farinacci, Procket
Dino Farinacci has been designing and implementing networking protocols for 21 years. He has extensive experience with distance vector and link state protocol implementations, as well as multicast routing protocols, which have been his focus for the past eight years. He wrote widely deployed implementations of IS-IS, OSPF, PIM, MBGP, and MSDP when these protocols were in their infancy. A former Fellow at Cisco, Dino currently works for Procket Networks in the Routing Protocols group.

Panelist - Paul Francis, Cornell University
Paul Francis is the inventor of NAT (though not NAPT), as well as an early contributor to IPv6. In his 15-year career in industry research labs (MITRE, Bellcore, NTT Labs), Paul has originated a number of interesting ideas, including Landmark Routing, Shortcut Routing across large non-broadcast networks, shared multicast trees, and application-layer multicast. Paul is currently on the faculty at Cornell University and is working on BGP scalability, IP anycast deployment, overlay multicast, DDoS prevention, network proximity addressing, and NAT.

Full Abstract

Speakers
Eric Aupperle, Merit Network
Eric Aupperle was President of Merit Network from 1988 to 2001, and is now the organization's President Emeritus. From 1987-1995, he led the NSFNET partnership with IBM, MCI, and the Michigan Strategic Fund, which re-engineered and managed the nation's first high-speed backbone network. Eric received a B.S.E. in Electrical Engineering, a B.S.E. in Mathematics, and an M.S.E. in Nuclear Engineering from the University of Michigan.

Andy Burnette, Terremark/NOTA
Susan Harris, Merit Network
Susan Harris coordinated NANOG meetings and was Senior Science Writer at Merit Network at the University of Michigan. She has been working in IT for 20 years, mostly in telecommunications and network engineering. Before discovering computers she spent her time reading Babylonian contracts and earning a Ph.D. in ancient Near Eastern History at the University of Michigan.

Full Abstract

Speakers
Diane Sidebottom, Dept. of Homeland Security
Diane Sidebottom has been counsel for the Science and Technology Directorate and the Homeland Security Advanced Research Projects Agency of the Department of Homeland Security since June 2003. Prior to coming to DHS, Diane was counsel to the Defense Advanced Research Projects Agency for nine years, where she focused on government contracting and intellectual property. Diane is a graduate of the University of Colorado and the Thomas M. Cooley Law School. She received her LL.M. in Intellectual Property from the George Washington University Law School in August 2001.

Full Abstract

Speakers
Majdi Abbas, Lattice Networks

Full Abstract

This talk reviews the history of the Internet from a Bell-Heads vs. Net-Heads point of view.

Speakers
Scott Bradner, Harvard University
Scott Bradner is a Senior Technical Consultant in the Harvard Office of the Provost, where he provides technical advice and guidance on issues relating to the Harvard data networks and new technologies. He has served in a number of roles in the IETF, as codirector of the Operational Requirements Area (1993-1997), IPng Area (1993-1996), Transport Area (1997-2003) and Sub-IP Area (2001-2003). Scott was a member of the IESG (1993-2003) and was an elected trustee of the Internet Society (1993-1999), where he still serves as the Vice President for Standards. Scott is also a trustee of ARIN.

Full Abstract

Speakers
Barbara Mirel, University of Michigan

Full Abstract

Now more than ever, Internet Service Providers are focusing on ways to increase the resiliency of their networks and, if at all possible, reduce their operating costs at the same time. Past research (Internet Service Providers and Peering, presented at NANOG 19, and A Business Case for Peering) demonstrates the economic tradeoffs of peering and highlights the simple but challenging first step: how to know who to talk with at an ISP to get peering set up. This Peering BOF focuses on this first step using "Peering Personals." We solicit Peering Coordinators (before the meeting), asking them to characterize their networks and peering policies in general ways ("content heavy" or "access (eyeball) -heavy," "Multiple Points Required" or "Will Peer anywhere," "Peering with Content OK," etc.). From the answers we will select a set of ISP Peering Coordinators to present a 2-3 minute description of their network, what they look for in a peer, etc., allowing the audience to put a face with the name of the ISP. At the end of the Peering BOF, Peering Coordinators will have time to speak with Peering Coordinators of ISPs they seek to interconnect with. The expectation is that these interactions will lead to the Peering Negotiations stage, the first step towards a more fully meshed and therefore resilient Internet.

Speakers
Bill Norton, Equinix

Full Abstract

This talk focuses on the many developing threats to the end-to-end model, and the need to find better ways to fight spam and DoS attacks without destroying the Internet to save it.

Speakers
Phil Karn, QUALCOMM
Phil Karn designed KA9Q, one of the first publicly available implementations of TCP/IP for DOS. He is a well known Internet researcher in areas ranging from security to his current work on wireless data communications.

Full Abstract

Since the first NANOG meeting ten years ago, there have been significant changes to the names of companies running various IP backbones. The multitude of mergers, acquisitions, and Chapter 11 filings over the last decade has changed the names, but not the issues. In this talk we will take a walk down memory lane and see how the NANOG community has handled the changing corporate world. Did the gut-wrenching changes at our employers help or hinder the community? Are the issues discussed ten years ago at that first NANOG meeting still relevant today?

Speakers
Moderator - Marti Levy
Panelist - John Curran, XO Communications
Panelist - Doug Humphrey, Joss Institute

Full Abstract

How many networks in the world are unreachable at any given instant, and how long are their outages? How many end-to-end BGP routes are unstable at any given time? What levels of unreachability and instability are within a "usual range," which indicate serious problems, and which events stand out? Who is doing well, who is bad, and what are the long-term trends? In order to clearly communicate about the quality and characteristics of inter-domain routing on an Internet-wide basis, we need to establish a common quantitative language to discuss interdomain routing instabilities and problems. Traditional metrics such as routing table sizes and BGP update counts can show that something unusual happens, but not much more. Given the complexity of routing on the global scale, and distinctions between edge networks and core transit networks, there cannot be a single number that flags all problems. We need a compact set of routing quality metrics that are distinguished by three properties:

  1. They offer practical operational utility; that is, they have good intuitive correspondence to observable macroscale Internet routing behaviors (beyond "flapping is bad").

  2. They must be properly conditioned, able to preserve intuition as they scale up from describing single networks, to autonomous systems, to countries or continents, to the entire Internet.

  3. They must preserve their meaning over multiple timescales, up to years, despite the inherent nonstationarity of the Internet. Meaningful trending is a serious goal.

In this presentation we will describe the set of routing quality metrics that we currently use to study global routing stability, incorporating all routing changes over all globally announced prefixes received from dozens of BGP peer routers worldwide.

The presentation will be illustrated with both long-term behavior of routing stability and reachability, and with zoom-in on various routing events of interest.

Speakers
Jim Cowie, Renesys Corporation
Andy T. Ogielski, Renesys Corporation
B.J. Premore, Renesys Corporation
Eric A. Smith, Renesys Corporation
Todd Underwood, Renesys Corporation

Full Abstract

I. Introduction Internet routing is plagued with several problems today, including chronic instability, convergence problems, and misconfigurations of routers [2]. We believe that a first step towards making BGP robust to these dynamics is by developing a systematic methodology for analyzing routing changes and inferring why they happen and where they originate. Answers to these questions can provide useful insights into the sources of anomalous routing events and instabilities. We are working towards development of a BGP health inferencing system for determining the root cause of routing changes. The health inferencing system collects and correlates route updates from multiple vantage points to determine the routing events that trigger each route update. We envision deploying our inference algorithms in data collection centers such as Routeviews [3] and RIPE, which receive streams of route updates from multiple vantage points (views). More generally, we can use a BGP health monitor to continuously infer the state of the network. Such inferences may then be used: (a) offline for network performance monitoring and troubleshooting; or (b) online to improve path selection and damping of instability. II. Inference techniques II.a Turbulent vs. quiescent periods The rate at which prefixes get updated signifies the type(s) of event that caused the stream of updates. In a turbulent period, one or a few major routing events cause several routes to simultaneously get updated. We assume that many observations in such a period are correlated (i.e., arise from the same routing event). In a quiescent period, when very few prefixes are updated, it is harder to determine which updates are caused by the same routing event. In this case, we analyze updates to each prefix in isolation. For example, suppose at view V we have 2000 prefixes which all use AS Path [V,A,B,C], and 2000 prefixes which use AS Path [V,A,B,D]. 
Suppose within a short period of time we observe updates to 2000 prefixes traversing the inter-AS link (B,C), and suppose very few other prefixes are updated. Then, it is likely that a single major event took place at (B,C). On the other hand, suppose we instead observed updates to 4000 prefixes that traverse (A,B), and of these prefixes, 2000 traverse (B,C) and 2000 traverse (B,D). Then, the major event likely took place on the link (A,B).

II.b Matching causes with observations

For every potential cause of a routing event, there exist different patterns of route updates that can be observed at a vantage point. Based on the pattern of observations, we classify the causes into equivalence classes, where each class contains different causes that might trigger the same pattern of updates. While Griffin et al. [1] have shown that matching causes with observations is a hard problem, we find that certain patterns of updates (e.g., presence of route withdrawals) can help in narrowing down the set of possible causes. For example, suppose view V's routing table contains an AS path [V,A,B,C,D] to a prefix X, and assume for simplicity that AS's are singly peered (we do not make this assumption in our design). Suppose after some time the path changes to [V,A,B,Y,C,D] and remains stable for some time. There are several possible events that could explain this change: perhaps Y advertised a lower MED to B, or perhaps the link (B,C) failed, or perhaps D changed a community attribute in the message triggering a route change at B. However, certain events could not explain this change: a failure of link (B,Y), or Y advertising a higher MED to B could not have caused this observation. In general, there are three possible explanations: either (1) some event happened on the path [B,C] to make it less desirable to B, (2) some event happened on the path [B,Y,C] to make it more desirable to B, or (3) some router on the path [C,D] changed a community attribute.
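
The turbulent-period example in II.a can be sketched as a count of updated prefixes per inter-AS link. The Python below is only an illustration under stated assumptions, not the authors' algorithm: the function names, the input format, and the ranking rule (highest fraction of a link's prefixes updated, then most updated prefixes, then the link farthest from the view) are hypothetical choices made so that the sketch reproduces the two cases described in the text.

```python
from collections import Counter

def path_links(as_path):
    """Return the inter-AS links (adjacent AS pairs) along one AS path."""
    return list(zip(as_path, as_path[1:]))

def likely_event_link(table, updated):
    """Guess the link where a single major event took place.

    table:   dict mapping prefix -> AS path seen at one view before the event
    updated: set of prefixes updated during the turbulent period
    Ranking rule (an assumption for this sketch): prefer the link with the
    highest fraction of its prefixes updated, then the most updated
    prefixes, then the greatest distance from the view.
    """
    total, hit, depth = Counter(), Counter(), {}
    for prefix, path in table.items():
        for i, link in enumerate(path_links(path)):
            total[link] += 1
            depth[link] = i
            if prefix in updated:
                hit[link] += 1
    if not hit:
        return None
    return max(hit, key=lambda l: (hit[l] / total[l], hit[l], depth[l]))

# The routing table from the example: 2000 prefixes via C, 2000 via D.
table = {f"c{i}": ("V", "A", "B", "C") for i in range(2000)}
table.update({f"d{i}": ("V", "A", "B", "D") for i in range(2000)})

only_c = {p for p in table if p.startswith("c")}
print(likely_event_link(table, only_c))      # ('B', 'C')
print(likely_event_link(table, set(table)))  # ('A', 'B')
```

From a single view, the links (V,A) and (A,B) carry exactly the same prefixes in this table, so the preference for the deeper link is a tie-break rather than an inference.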
II.c Multiple vantage points

Observing the same event from several vantage points allows us to acquire additional information about the event. By comparing similarities and differences in observations across the views, and by measuring the magnitudes of the event at each view, we can distinguish the signature of the event from effects introduced by intermediate routers along the path. For example, suppose in the previous scenario another view V2 simultaneously observed a routing change to the same prefix X from AS Path [V2,F,G,C,D] to [V2,F,Y,C,D]. Then, it is most likely that a single event caused the observations at V and V2. Moreover, it is highly likely that the event took place at C, Y, or D, and not at B.

III. Validation and results

Most ISPs do not wish to reveal the types or frequency of events taking place in their networks, making validation of our approach difficult. However, there are several well-known major events that are public knowledge, such as the spread of Internet worms, or routing problems suffered by major ISPs. In addition, we know the location where certain classes of updates are caused, for example updates pertaining to prefixes originated by the AS containing the vantage point, or updates generated by BGP Beacons [4]. We considered a large number of such updates, and found that inference was performed correctly in every case. Although we aren't able to directly validate all of our inferences using this approach, we are able to verify the correctness of a base set of rules that we used to acquire our results.

To demonstrate the utility of such a system, we apply our inference methodology to updates collected from Routeviews and RIPE over a period of 18 months. We make several observations from our analysis: We can pinpoint the location where the update was generated to a single pair of AS's for over 70% of updates.
Additionally, we output a list of potential causes for an event, but may not always be able to identify the specific cause. Our system can detect major routing anomalies, many of which were previously unknown. We detected nearly 1,400 resets per month, and found certain inter-AS links to be perennially unstable. Roughly 25% of prefixes continuously flap at least every 30 minutes, and these account for a large fraction (20%) of routing updates. Routing events in the Internet core usually trigger short-term flaps, but an event taking place at the network edge is nine times more likely to cause a long-term route change.

Bibliography:
[1] T. Griffin, "What is the sound of one route flapping?," presentation made at the Network Modeling and Simulation Summer Workshop, 2002.
[2] C. Labovitz, A. Ahuja, F. Jahanian, "Experimental study of Internet stability and wide-area network failures," in Proc. of Fault Tolerant Computing Symposium, June 1999.
[3] "Route Views Project," http://www.routeviews.org.
[4] Z. Mao, R. Bush, T. Griffin, M. Roughan, "BGP beacons," in Proc. Internet Measurement Conference, October 2003.
[5] M. Caesar, L. Subramanian, R. Katz, "BGP health monitoring: realtime analysis," web site
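
The multiple-vantage-point reasoning in II.c can be sketched as a set intersection. The Python below is a simplified illustration, not the authors' inference system: it assumes (hypothetically) that each view's candidate locations are the ASes from the last AS shared by the old and new paths onward, and that a single event explains all views.

```python
def candidate_ases(old_path, new_path):
    """ASes that could host the event, as seen from one view.

    Assumption for this sketch: the event lies at or beyond the last AS
    that the old and new paths have in common before they diverge.
    """
    i = 0
    while i < min(len(old_path), len(new_path)) and old_path[i] == new_path[i]:
        i += 1
    start = max(i - 1, 0)
    return set(old_path[start:]) | set(new_path[start:])

def common_candidates(observations):
    """Intersect the candidate sets of all views (old_path, new_path pairs)."""
    sets = [candidate_ases(old, new) for old, new in observations]
    return set.intersection(*sets)

# The two views from the example in the abstract.
view_v = (["V", "A", "B", "C", "D"], ["V", "A", "B", "Y", "C", "D"])
view_v2 = (["V2", "F", "G", "C", "D"], ["V2", "F", "Y", "C", "D"])

print(sorted(common_candidates([view_v, view_v2])))  # ['C', 'D', 'Y']
```

View V alone leaves B as a candidate; adding V2, whose paths never traverse B, removes it, which matches the conclusion that the event took place at C, Y, or D.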

Speakers
Matthew C. Caesar, UC Berkeley
Matthew Caesar is a graduate student in Computer Science at the University of California, Berkeley. His current work focuses on improving the properties of interdomain routing. He has previously worked on IP telephony and fast restoration from failures.

R.H. Katz, UC Berkeley
L. Subramanian, UC Berkeley

Full Abstract

The release of the Welchia/Nachi worm coincided with the start of the Fall semester at the UC Berkeley campus. While the virus payload was not exceptionally malicious, its mode of victim discovery had severe impacts as its self-replication was further fueled by the continuous ingress of vulnerable hosts. Welchia traffic was less prominent on more robustly connected networks, but the wireless LANs at UCB were quickly debilitated, punctuating limitations in the system's architecture and vulnerabilities in the topology. This presentation will discuss methods used to restore wireless usability and efforts to retard the proliferation of this and other worms.

Speakers
Christopher Chin, UC Berkeley
Christopher Chin is a senior member of the Network Services Team at UC Berkeley, where he and his colleagues support the data network and its associated services, and explore new technologies in their spare time. He returned to his alma mater, where he studied German and Electrical Engineering, after an extended intellectual vacation in the private sector.

Full Abstract

In fall 2003, many schools were unprepared in terms of network infrastructure and staff to deal with the overwhelming number of infected computers that suddenly arrived on their campus networks. This resulted in the shutdown of several university networks and an enormous strain on helpdesk staff. Over a single week in September, Boston University had approximately 10,000 students arrive on campus, 7,000 of whom arrived during the three-day Labor Day weekend. As with most schools, many of these computers were either exploitable or already infected with a wide variety of worms and viruses. Why did Boston University have a relatively quiet "move-in"? Using shareware tools and some minor in-house coding, Boston University deployed a system that detects, isolates, and quarantines most vulnerable systems when they attach to the network for the first time. After a host is active on the campus network, it can be returned to this quarantine if it is subsequently found to be infected or fails an active vulnerability scan. While a host is quarantined, all of its web queries are redirected to an informational web site that has customizable information, including a self-help guide and tools to patch and clean the host. This talk will detail the infrastructure, systems and software used to build this network registration and quarantining system, the modifications that were needed, its successes, its failures, and some thoughts on where to go next.

Speakers
Eric Gauthier, Boston University
Eric Gauthier is currently the senior Network Systems Engineer for Boston University's Office of Information Technology. Prior to this, he worked as a network engineer for several regional and large-scale ISPs, including Exodus Communications.

Full Abstract

We present our analysis of the December 2003 Distributed Denial-of-Service (DDoS) attack against the SCO Group. In spite of rumors that SCO faked the denial-of-service attack to implicate Linux users and garner sympathy from its critics, UCSD's Network Telescope received more than 2.8 million response packets from SCO servers, indicating that SCO responded to more than 700 million attack packets over 32 hours. The outage was also documented by Netcraft and others. We present the details of this specific analysis as well as the principles and techniques behind UCSD's network telescope observatory station.
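
The step from 2.8 million observed responses to more than 700 million attack packets can be reproduced with a short calculation. The sketch below rests on assumptions that the abstract does not state: that the telescope monitors a /8 (1/256 of the IPv4 address space), that the attack packets carried uniformly random spoofed source addresses, and that the victim answered every attack packet. The function name is hypothetical.

```python
TELESCOPE_PREFIX_LEN = 8  # assumption: the telescope monitors a /8

def estimate_attack_packets(observed_responses, prefix_len=TELESCOPE_PREFIX_LEN):
    """Scale responses seen at the telescope up to the whole IPv4 space.

    With uniformly random spoofed sources, the telescope receives a share
    of the victim's responses equal to its share of the address space.
    """
    monitored_fraction = 2 ** (32 - prefix_len) / 2 ** 32
    return observed_responses / monitored_fraction

estimate = estimate_attack_packets(2_800_000)
print(f"{estimate:,.0f}")  # 716,800,000
```

Under these assumptions each observed response stands for 256 attack packets, which is consistent with the "more than 700 million" figure quoted above.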

Speakers
David Moore, CAIDA
Colleen Shannon, CAIDA

Full Abstract

The goal of negative testing is to produce network stability and proper routing when errors are observed. This presentation will provide techniques for negative testing of BGP to ensure that BGP-speaking devices in the network can handle error conditions that may occur. The network operator is introduced to the concept of negative testing and the benefits it provides to operational networks. Recommendations for BGP negative testing are then provided covering the following topics:

  • Handling of BGP Configuration Errors
  • Responding to BGP Update Message Errors
  • BGP Convergence Due to Unplanned Network Failures
  • Next Hop Selection Due to BGP/IGP interaction
  • BGP Route Selection Criteria
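
As one illustration of what a negative test for message errors might look like, the Python sketch below builds deliberately malformed BGP message headers and checks them against the header rules of BGP-4 (16-byte all-ones marker, length between 19 and 4096, known message type). It is not taken from the presentation: the function names are hypothetical, only the four base message types are treated as valid, and the returned pairs are the Message Header Error code and subcode that a device under test would be expected to report in a NOTIFICATION.

```python
import struct

MARKER = b"\xff" * 16           # all-ones marker
MIN_LEN, MAX_LEN = 19, 4096     # allowed range of the Length field
VALID_TYPES = {1, 2, 3, 4}      # OPEN, UPDATE, NOTIFICATION, KEEPALIVE

def build_header(length, msg_type, marker=MARKER):
    """Build a 19-byte BGP header; callers may pass invalid values on purpose."""
    return marker + struct.pack("!HB", length, msg_type)

def check_header(data):
    """Return None for an acceptable header, else (error code, subcode)."""
    if len(data) < MIN_LEN:
        return (1, 2)                       # Bad Message Length
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if marker != MARKER:
        return (1, 1)                       # Connection Not Synchronized
    if not MIN_LEN <= length <= MAX_LEN:
        return (1, 2)                       # Bad Message Length
    if msg_type not in VALID_TYPES:
        return (1, 3)                       # Bad Message Type
    return None

# Negative test cases: each header is malformed in exactly one way.
cases = {
    "valid keepalive": build_header(19, 4),
    "length too small": build_header(18, 4),
    "length too large": build_header(4097, 2),
    "unknown type": build_header(19, 9),
    "bad marker": build_header(19, 4, marker=b"\x00" * 16),
}
for name, header in cases.items():
    print(name, check_header(header))
```

In a real test bed the malformed bytes would be sent to the device over an established TCP session and its NOTIFICATION compared with the expected pair; the sketch covers only the construction and the expected classification.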

Speakers
Brent Imhoff, WilTel
Brent Imhoff is currently the IP Principal Architect at WilTel Communications. Prior to working at WilTel, Brent held similar positions at Qwest and Digital Teleport. He has recently become active in the IETF Benchmarking Methodology Working Group.

Scott Poretsky, Quarry
Scott Poretsky is currently Software Quality Assurance Manager at Quarry Technologies. Prior to this, he spent six years at Avici Systems as Manager of Product Verification. Scott has been an active contributor to the IETF's Benchmarking Methodology Working Group, where he has co-authored numerous Internet Drafts. Scott earned an MSEE from the Worcester Polytechnic Institute and a BSEE from the University of Vermont.

Full Abstract

Speakers
Moderator - Celeste Anderson, LAAP/Pacific Wave
Panelist - Jay Adelson, Equinix
Panelist - Tom Bechly, MAE Services
Panelist - Itojun Hagino, DIX-IE
Panelist - Mike Hughes, LINX
Panelist - Christopher Quesada, Switch and Data
Panelist - Akio Sugeno, NYIIX/LAIIX

Tuesday, February 10, 2004
Topic/Presenter
Recordings
Full Abstract

Speakers
Susan Harris, Merit Network

Full Abstract

Scott will discuss policies Harvard University has put in place to fight virus and worm attacks that have recently hit the campus.

Speakers
Scott Bradner, Harvard University
Scott Bradner is a Senior Technical Consultant in the Harvard Office of the Provost, where he provides technical advice and guidance on issues relating to the Harvard data networks and new technologies. He has served in a number of roles in the IETF, as codirector of the Operational Requirements Area (1993-1997), IPng Area (1993-1996), Transport Area (1997-2003) and Sub-IP Area (2001-2003). Scott was a member of the IESG (1993-2003) and was an elected trustee of the Internet Society (1993-1999), where he still serves as the Vice President for Standards. Scott is also a trustee of ARIN.

Full Abstract

I. Introduction

BGP assumes that the routing information propagated by authenticated routers is correct. This assumption leaves the current infrastructure vulnerable to both accidental misconfigurations and deliberate attacks. Though BGP currently enables peers to transmit route announcements over authenticated channels, this approach only verifies who is speaking, but not what they say. For example, in 1997, a simple misconfiguration in a customer router caused it to advertise a short path to a large number of network prefixes, and this resulted in a massive black hole that disconnected significant portions of the Internet. Adversaries can inflict more extensive damage than misconfigurations. Adversaries can potentially render destinations unreachable, eavesdrop on data passing through them, or even impersonate a site. More sophisticated BGP security mechanisms have been proposed (e.g., S-BGP), but they often require an extensive cryptographic key distribution infrastructure and/or a trusted central database. Neither of these two crucial ingredients has been introduced and hence these security proposals have not moved forward towards adoption. In this paper we seek measures to secure BGP that neither need public key distribution nor rely on a trusted database. Our goal is not to achieve perfect security, but to provide much better security than exists at present through mechanisms that are easily deployable. The underlying vulnerability in BGP, which we primarily address in this paper, is the ability of an AS to propagate invalid routes that deviate from the actual Internet topology.

II. Our Approach: Listen and Whisper

The primary underlying vulnerability in BGP that we address in this presentation is the ability of an AS to create invalid routes.
There are two types of invalid routes:

Invalid routes in the Control Plane: This occurs when an AS propagates an advertisement with a fake AS path (i.e., one that does not exist in the Internet topology), causing other AS's to choose this route over genuine routes. A single malicious adversary can divert traffic to pass through it and then cause havoc by, for example, dropping packets (rendering destinations unreachable), eavesdropping (violating privacy), or impersonating end-hosts within the destination network (such as Web servers, etc.).

Invalid routes in the Data Plane: This occurs when a router forwards packets in a manner inconsistent with the routing advertisements it has received or propagated; in short, the routing path in the data plane does not match the corresponding routing path advertised in the control plane. Mao et al.** show that for nearly 8% of Internet paths, the control plane and data plane paths do not match. The prevalence of such a high mismatch ratio motivates the need for separately verifying the correctness of routes in the data plane and not merely focusing on the control plane.

The difference between an inadvertent misconfiguration and an adverse operation is intention and persistence. An adversary will make a deliberate effort to disguise the misconfiguration, to sustain it, and to hide its origin. Therefore, we present two types of mechanisms below, Listen and Whisper, that are appropriate for different types of incidents and different threat levels.

II.1 Brief description of our solutions

Listen detects invalid routes in the data plane by checking whether data sent along routes reaches the intended destination. Whisper checks for consistency in the control plane.

Whisper: The objective of the Whisper method is to defend against invalid route announcements on the control plane. The primary design principle of these protocols is to use redundant network connectivity as a substitute for secure communication channels.
The protocols verify route announcements of the same originator pair-wise. Unless an adversary controls the paths over which both route announcements were propagated, the verification yields an inconsistency. In this case, our protocols raise an alarm and flag the suspicious routes. On the other hand, if one route announcement is consistent with a valid route announcement, then two of our Whisper protocols also provide a certain level of confidence that the AS path in the first announcement is valid. The primary advantage of these protocols is that they have a negligible management, processing, and implementation overhead. Particularly, they do not require prior exchange of cryptographic keys.

Listen: The main idea behind the Listen method is to monitor the progress of TCP flows on the data plane. By doing this, a router can detect loss of connectivity that might be caused either by BGP misconfigurations or network failures. While the Listen approach only points to the existence of a reachability problem, determining the cause requires other mechanisms. The Listen technique has two distinct advantages. First, early detection of reachability problems for reasonably popular prefixes (prefixes that regularly observe non-zero traffic) can virtually eliminate the possibility of long outages due to misconfigurations. Second, it is a stand-alone technique that can be incrementally deployed: a router would benefit from implementing this technique even if it is the only one to implement it. However, this technique is not robust against attackers along the downstream path that impersonate the destinations.

II.2 Level of Protection

While both these techniques can be used in isolation, they are more useful when applied in conjunction.
The extent to which they provide protection against the three threat scenarios can be summarized as follows:

Misconfigurations and Isolated Adversaries: Whisper guarantees path integrity for route advertisements in the presence of misconfigurations or isolated adversaries; i.e., any invalid route advertisement due to a misconfiguration or isolated adversary with either a fake AS path or with any of the fields of the AS path being tampered with (e.g., addition, modification or deletion of AS's) will be detected. Path integrity also implies that an isolated adversary cannot exploit BGP policies to create favorable invalid routes. In addition, Whisper can identify the offending router if it is propagating a significant number of invalid routes. Listen detects reachability problems caused by errors in the data plane, but is only applicable for destination prefixes that observe TCP traffic. However, none of our solutions can prevent malicious nodes already on the path to a particular destination from eavesdropping, impersonating, or dropping packets. In particular, countermeasures (from isolated adversaries already along the path) can defeat Listen's attempts to detect problems on the data path.

Colluding Adversaries: None of our techniques can prevent two colluding nodes from pretending there is a direct link between them by tunneling packets. Moreover, colluding nodes can exploit the current usage of BGP policies to create large-scale outages without being detectable by either Listen or Whisper. To deal with this problem, we suggest simple modifications to the BGP policy engine which in combination with Whisper can largely restrict the damage that colluding adversaries can cause. In the absence of complete knowledge of the Internet topology, these two problems also exist in the case of heavy-weight security solutions such as S-BGP.

** "Towards an Accurate AS-Level Traceroute Tool." by Z. Morley Mao, Jennifer Rexford, Jia Wang, and Randy Katz. ACM SIGCOMM 2003.
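
The Listen idea of monitoring the progress of TCP flows per destination prefix can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, the input format (one record per observed flow, with a flag saying whether the flow made progress), and the `min_flows` threshold are all assumptions made for the example.

```python
from collections import defaultdict

def suspect_prefixes(flows, min_flows=5):
    """Flag prefixes whose observed TCP flows never make progress.

    flows:     iterable of (prefix, made_progress) pairs, one per TCP flow;
               made_progress is True if the flow got past its initial SYN
    min_flows: assumed minimum number of flows before a prefix is judged,
               so that prefixes with little traffic are not flagged
    """
    seen = defaultdict(int)
    progressed = defaultdict(int)
    for prefix, made_progress in flows:
        seen[prefix] += 1
        if made_progress:
            progressed[prefix] += 1
    return sorted(p for p in seen
                  if seen[p] >= min_flows and progressed[p] == 0)

# Hypothetical observations using documentation prefixes.
flows = (
    [("192.0.2.0/24", False)] * 6          # many flows, none complete
    + [("198.51.100.0/24", True)] * 4
    + [("198.51.100.0/24", False)] * 3     # some loss, but reachable
    + [("203.0.113.0/24", False)] * 2      # too little traffic to judge
)
print(suspect_prefixes(flows))  # ['192.0.2.0/24']
```

As the abstract notes, a result like this only signals a reachability problem; it says nothing about the cause, and a node on the path that impersonates the destination would make the flows appear healthy.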

Speakers
Lakshminarayanan Subramanian, UC Berkeley
Lakshminarayanan Subramanian is a Ph.D. student at UC Berkeley, working under the guidance of Prof. Randy Katz and Prof. Ion Stoica. His primary research interests are in the areas of inter-domain routing and overlay networking. Previously, Lakshmi worked on the problem of characterizing the properties of Internet topology using BGP routing tables. His current work focuses on improving the security of BGP.

Recordings
Full Abstract

In today's internet, BGP is extremely chatty --- the most minor connectivity change produces hundreds of updates and a significant peering loss can generate millions. While gigahertz processors and terabyte disks have made it possible to capture and record BGP events via passive peering, making sense of the deluge of data remains difficult. We have developed statistical algorithms to extract the large-scale structure of BGP event streams and visualization techniques to display that structure in operationally meaningful ways, i.e., to quickly answer questions like "what happened?", "where did it happen?" and "how does it affect me?" These tools can also be used to provide real-time views of an ISP's interdomain topology that help rapidly diagnose problems like misconfigured community tags, policy filters with unintended consequences, unexpected or unwanted backup paths, peering traffic imbalance, etc. The analysis is fast enough to run in real time on a modern processor even when dealing with, for example, the entire backbone mesh of a typical tier-1 ISP. We will describe the algorithms and show case studies from a variety of data taken on both large ISP backbones and large institutional networks.

Email note sent after the meeting from Van Jacobson: The animations from Tina Wong's "Making Sense of BGP" talk at NANOG-30 this morning are available at: http://www.packetdesign.com/technology/presentations/nanog-30/index.htm The animations are in SVG (a W3C graphics standard) and should be viewable in any web browser but you'll probably have to download an SVG plugin first (there's a link to Adobe's free plugin at the top of the web page). If you play with the stuff, we'd welcome comments and suggestions.

Speakers
Cengiz Alaettinoglu, Packet Design
Van Jacobson, Packet Design
Tina Wong, Packet Design
Tina Wong is a member of the Network Science department at Packet Design, Inc. in Palo Alto, California. Previously, she was a researcher at Hewlett Packard Laboratories, spending most of her time on assignment in Tokyo, Japan. Her interests include experimental systems building, network measurement and trace analysis, and Internet applications and services. Tina holds a Ph.D. and M.S. in Computer Science from University of California at Berkeley, and a B.S. with distinction in Computer Science from University of Washington.

Full Abstract

This talk presents an analysis of the influence of intradomain routing changes on BGP in the AT&T domestic backbone. We propose a general methodology for associating BGP update messages with events visible in OSPF. Then, we apply our methodology to streams of OSPF link-state advertisements and BGP update messages from the AT&T network.

Our analysis shows that

  1. "Hot potato" routing is sometimes a significant source of BGP updates.

  2. BGP updates can lag 60 seconds or more behind the related intradomain change, which can cause delays in forwarding-plane convergence and introduce inaccuracy in active measurements of the customer experience.

  3. The number of BGP path changes triggered by hot-potato routing has a nearly uniform distribution across destination prefixes.

  4. The fraction of BGP messages triggered by OSPF varies significantly across time and router locations, with important implications on external monitoring of BGP.

We also describe how certain network designs and operational practices increase the impact that internal OSPF events have on BGP routing.
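
One simple way to associate BGP updates with OSPF events is a time-window match, sketched below in Python. This is an illustration only and should not be read as the methodology of the talk: the function name, the use of bare timestamps, and the 180-second window are assumptions. The window is deliberately wider than 60 seconds because the analysis above reports that BGP updates can lag that far or more behind the intradomain change.

```python
import bisect

def match_updates(ospf_times, bgp_times, window=180.0):
    """Pair each BGP update with the latest OSPF event shortly before it.

    ospf_times: timestamps (seconds) of OSPF events
    bgp_times:  timestamps (seconds) of BGP updates
    window:     assumed maximum lag, in seconds, for attributing an update
    Returns a list of (bgp_time, ospf_time or None) pairs.
    """
    ospf_sorted = sorted(ospf_times)
    pairs = []
    for t in bgp_times:
        i = bisect.bisect_right(ospf_sorted, t)   # OSPF events at or before t
        if i and t - ospf_sorted[i - 1] <= window:
            pairs.append((t, ospf_sorted[i - 1]))
        else:
            pairs.append((t, None))               # no plausible OSPF trigger
    return pairs

# Hypothetical timestamps: two OSPF events and four BGP updates.
pairs = match_updates([100.0, 500.0], [110.0, 170.0, 400.0, 505.0])
print(pairs)
# [(110.0, 100.0), (170.0, 100.0), (400.0, None), (505.0, 500.0)]
```

A real classifier would also check that the BGP change is consistent with the IGP distance change it is attributed to; matching on time alone can mislabel unrelated updates that happen to fall inside the window.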

Speakers
Tim Griffin, Intel
Jennifer Rexford, AT&T
Aman Shaikh, AT&T
Renata Teixeira, UCSD
Renata Teixeira is a Ph.D. student at the University of California, San Diego, in the Department of Computer Science and Engineering. She received her B.Sc. in Computer Science and M.Sc. in Electrical Engineering from Universidade Federal do Rio de Janeiro, Brazil, in 1997 and 1999, respectively. Her research interests are in measurement and analysis of routing protocols, and in management of large IP networks.

Full Abstract

There are a number of advancing Internet Drafts within the IETF describing methods for encapsulating MPLS over IP networks. These include MPLS over IP, MPLS over GRE, MPLS over L2TPv3, MPLS over IPsec, and various combinations of each. This presentation will give an overview of the various options available, the tradeoffs of each, and how they can be used to complement an MPLS or IP core network.

Speakers
Mark Townsley, Cisco Systems
Mark Townsley is a Technical Leader for Cisco Systems, where he has worked on various VPN software development projects since 1997. Mark is Chair of the L2TP Extensions Working Group and Technical Advisor for the Pseudo Wire Emulation Edge to Edge (PWE3) Working Group in the IETF and has published numerous RFCs and technical reports. Mark holds a Bachelor's degree in Electrical Engineering (cum laude) from Auburn University and a Master's degree in Computer Science (summa cum laude) from Johns Hopkins.

Recordings
Full Abstract

On October 17, 2003, the IETF's Internet Architecture Board (IAB) posted the following note to the NANOG list:

The IAB notes that there are ISPs/ASes undertaking permanent deployment of edge-based protocol number/port number packet filtering on traffic received from eBGP peers. As a short term response to security incidents this is a prudent operational measure that limits the spread of various forms of attack, and also mitigates some level of risk associated with network vulnerabilities. For example, many ISPs installed temporary filters in response to a July 2003 security advisory for CISCO routers (http://www.cert.org/advisories/CA-2003-15.html). In the case of this incident PIM (protocol # 103) and mobile-ip4 (55) packets could trigger the vulnerability. The operational community responded with widespread deployment of filters at AS borders for these protocol numbers. Because of this, PIM and mobile-ip4 no longer work across such AS borders.

The IAB is concerned about the practice of the permanent deployment of such traffic filters, since this could block the operation of certain applications in current use, as well as limiting the potential for deployment of future applications. Such filters ultimately limit extensibility of the Internet protocol as well as the Internet itself. It is an entirely appropriate and operationally prudent response to filter at the AS border as a short term mitigation of various network vulnerabilities. However, filters at AS borders do not provide any more than a relatively short term mitigation, and certainly do not solve the real problem of eliminating all forms of exploitation of such vulnerabilities. Over time knowledge of a vulnerability spreads across the network and potential exploiters of a vulnerability will be within an ISP/AS as well as being on the outside.
The only stable and appropriate longer term operational response is to upgrade network equipment to eliminate the vulnerability, rather than attempting to configure packet filters intended to prevent externally located third parties from exploiting it. While short term traffic filters are deployed, the appropriate recommended longer term action is to:

  • To install filters to detect packets that are directed to the router itself to protect the router. (do not filter traffic that goes through the routers).
  • To update router firmware to a version known to eliminate the vulnerability

Regards, Jun-ichiro itojun Hagino, on behalf of IAB ([email protected])

(See the posting and responses at http://www.merit.edu/mail.archives/nanog/2003-10/msg01025.html).

The posting generated a certain amount of discussion, including questions about why the IAB is commenting on ISP operational issues, and whether or not ISPs should filter routes at eBGP routers. In this presentation, we will discuss the IAB's views on the importance of the Internet's extensibility/adaptability to new protocols, and the negative impact on extensibility of filtering at eBGP borders.
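
The distinction the note draws, between filtering packets addressed to the router itself and filtering traffic that merely passes through it, can be illustrated with a small decision function. The Python below is a sketch, not router configuration: the function name and the addresses (taken from documentation ranges) are hypothetical, and only the two protocol numbers come from the note.

```python
import ipaddress

# Protocol numbers named in the note: mobile-ip4 (55) and PIM (103).
VULNERABLE_PROTOCOLS = {55, 103}

def should_drop(dst, protocol, router_addresses):
    """Drop only packets that target the router itself with a risky protocol.

    dst:              destination IP address of the packet (string)
    protocol:         IP protocol number of the packet
    router_addresses: addresses configured on the router (strings)
    Transit traffic is never dropped, whatever its protocol number.
    """
    own = {ipaddress.ip_address(a) for a in router_addresses}
    to_router = ipaddress.ip_address(dst) in own
    return to_router and protocol in VULNERABLE_PROTOCOLS

router = ["192.0.2.1", "192.0.2.129"]            # hypothetical interface addresses
print(should_drop("192.0.2.1", 103, router))     # True: PIM aimed at the router
print(should_drop("198.51.100.7", 103, router))  # False: PIM in transit
print(should_drop("192.0.2.1", 6, router))       # False: TCP to the router
```

A blanket border filter would correspond to dropping on the protocol number alone, which is the practice the note argues should remain a short term measure.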

Speakers
Itojun Hagino, IETF Internet Architecture Board
Itojun is a member of the KAME project, which develops an IPv6/IPsec stack for *BSD UNIX variants. In the IETF, Itojun has been involved in IPv6 and security-related Working Groups such as IPsec, and contributed to various RFCs related to the IPv6 transition, operations, and the protocol itself. He has been a member of the IAB since March 2002.

Full Abstract

This report summarizes the IPv4, IPv6, and AS number allocations on a global basis and highlights the recent activities from the four Regional Internet Registries: ARIN, RIPE, APNIC, and LACNIC.

Speakers
Ray Plzak, ARIN
Ray Plzak is the President and CEO of ARIN.