North American Network Operators Group
[fwd] Rats take down Stanford ...
A follow-up thought on redundancy issues.

- paul

[snip]

>Date: Mon, 21 Oct 1996 12:54:05 -0700 (PDT)
>From: email@example.com
>Subject: RISKS DIGEST 18.54

[snip]

>
>Date: Fri, 18 Oct 96 11:03 EST
>From: William Hugh Murray <firstname.lastname@example.org>
>Subject: Re: Rats take down Stanford ... (RISKS-18.53)
>
>PGN's request for redundancy brings to mind the story of the infrastructure
>computer center in Trumbull, Connecticut. It is an old story but bears
>repeating.
>
>Seems that a squirrel got into a transformer and brought down the external
>power supply. The UPS kicked in, engine generators came on line, and the
>center operated in this mode for about an hour and a half. At the end of
>that time the external power was restored. The external power, the UPS, and
>the engine generators went into a deadly embrace. The whole thing came down
>and would not come back up.
>
>I take two lessons from this. First, redundancy adds some complexity and a
>lot of redundancy adds a lot of complexity. At some point the redundancy
>begins to introduce failure modes and failure events that would not have
>existed in its absence. There is an upper bound to such redundancy.
>
>Second, test redundant systems through to resumption of normal operations.
>In this case, the operators had tested to ensure that the redundant systems
>would come online in the event of a failure of the primary system. They had
>not tested to see what would happen when the primary system was restored to
>normal operation.
>
>Who would have even thought about it? I confess that I would not have.
>
>William Hugh Murray, New Canaan, Connecticut
>

[snip]