North American Network Operators Group
Re: Followup British Telecom outage reason
They probably did. The vendor probably did also. Of course, they can't
always simulate real network conditions. Nor can your own labs. Heck,
even a small deployment on 2 or 3 routers (out of, say, 200) can't catch
everything. It is a simple fact that some bugs don't show up until it's
too late.

And cascade failures occur more often than you might think (and not
necessarily from software). Remember the AT&T frame outage? Procedural
error. How about the Netcom outage of a few years ago? Someone misplaced
a '.*' if I remember correctly. Human error of the simplest kind. I've
had a data center go offline because someone slipped and turned off one
side of a large breaker box.

These things happen. The challenge is to eliminate the ones you CAN
control. And, IMO, the industry is generally doing a good job of that.

I chalk this whole thing up to bad karma for BT.

-Wayne

On Sat, Nov 24, 2001 at 11:05:20AM +0000, Neil J. McRae wrote:
> >
> > BT is telling ISPs the reason for the multi-hour outage was
> > a software bug in the interface cards used in BT's core network.
> > BT installed a new version of the software. When that didn't fix
> > the problem, they fell back to a previous version of the software.
> >
> > BT didn't identify the vendor, but BT is identified as a "Cisco Powered
> > Network(tm)." Non-BT folks believe the problem was with GSR interface
> > cards. I can't independently confirm it.
> >
>
> I'd be surprised if it was the GSR, and in any case that doesn't
> absolve anyone. If it was a software issue, why wasn't the software
> properly tested? Why was such a critical upgrade rolled out across
> the entire network at the same time? It doesn't add up.
>
> Neil.

---
Wayne Bouchard
web@typo.org
Network Engineer
http://www.typo.org/~web/resume.html