NANOG Meeting Presentation Abstract

Move Fast, Unbreak Things!
Meeting: NANOG66
Date / Time: 2016-02-08 11:30am - 12:00pm
This item is webcast
Room: Grande Ballroom
Presenters: Petr Lapukhov, Facebook.
Abstract: Every network fails, and large networks fail more often. Often the failure is clearly visible, but every now and then a problem goes undetected by traditional monitoring systems (that is, link-down alarms or packet drop and error counters).

This talk summarizes Facebook's experience building a "black-box" fault detection and isolation system for data-center and backbone networks. The heart of the system is a high-rate active probing component that detects failures regardless of their underlying cause. One prominent aspect of the system is its aim of real-time detection, which allows practical reaction times of 10 to 20 seconds. We argue that this is likely one key feature that made the system practical and useful to operations.

In retrospect, we review the system's evolution through multiple iterations and compare the different kinds of problems that arise in the data-center, backbone, and edge segments of the network. Finally, we discuss the challenges specific to fault isolation and present our current approach, as well as our vision for future evolution.
Files: Move Fast, Unbreak Things! (PDF)
Move Fast, Unbreak Things! (YouTube)
Sponsors: None.

NANOG66 Abstracts

  • Conference Opening
    Speakers:
    Tony Tauber, Comcast; Greg Dendy, Equinix; Raj Khurana; Al Burgio, IIX.
  • Coding BOF
    Speakers:
    Matt Griswold, United Internet Exchange; Job Snijders, NTT Communications; Jesse Sowell, MIT; Elisa Jasinska, BigWave.
  • Research and Education Track
    Speakers:
    Manish Karir, QuadMetrics; Seyed K. Fayaz, Carnegie Mellon University; Alberto Dainotti, CAIDA, UC San Diego; Luca Sani, IIT-CNR; Ruwaifa Anwar, Stony Brook University; Vicente De Luca, Zendesk.
  • Peering Track
    Speakers:
    Brad Raymo, Microsoft; Aaron Hughes, 6connect; Ciprian Marginean, AMS-IX; Daniel Kopp, DE-CIX.

 
