North American Network Operators Group
Re: Persistent BGP peer flapping - do you care?
Brian:

Thank you for your 2 cents. I'm gathering all the input until Sunday night, and I really appreciate your comments. At that time I'll summarize all the input to the list and suggest some ideas. I'll try to boil all the input on this problem down into a document that I can post to IDR and NANOG.

Sue

PS - I'm away from email from now until Monday a.m.

Thanks, NANOG folks!!

At 07:30 PM 1/17/2002 -0500, Dickson, Brian wrote:
Here's my two cents...

A good rule of thumb (possibly from RFC 822) is: be liberal in what you accept and strict in what you send. Applied to BGP, I would suggest that any implementation should choose a canonical form for constructing updates, but use a parser that allows for rule-bending without rule-breaking.

On the issue of existing vendor implementations, and how to build the specs to prevent meltdowns: I suspect that brand C routers were the victims during implementation testing, and perhaps the change was made to avoid that happening. The current state of affairs is very much like the classic game-theory "prisoner's dilemma".

The new spec should have two goals: discourage any implementation that can lead to meltdowns, and encourage strict adherence to the spec. In fact, the latter can be achieved via the former, if the mechanisms are well chosen.

My suggestion, rather than a back-off of BGP session resets, would be to first attempt strict interpretation (to insulate against completely insane routers), and then loose interpretation. The model is "Fool me once, shame on you; fool me twice, shame on me." On first receiving a bad update, reset. If, upon re-establishing the session, the same bad update is heard, drop the bad update but keep the session up (along with the messages back, etc.).

One additional, optional behaviour I would suggest: look at the AS path and/or path length and/or announcing router IP address. If the bad update is heard from the originator, drop the session (and either keep it down, or try one more time before requiring operator intervention). It may be the case that only these conditions strictly require a reset, and that all other situations only require the "ignore bad routes" behaviour.

Resetting BGP more than a small, finite number of times is, IMHO, a bad idea. After all, BGP is a stateful protocol, and state changes should be triggered deterministically, even if that requires operator input.

Brian Dickson
Velocita
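To make the proposed "fool me once / fool me twice" policy concrete, here is a minimal per-peer sketch in Python. It is not taken from any router implementation or from the BGP spec; every name in it (`PeerErrorPolicy`, `from_originator`, `max_resets`, the `Action` values) is hypothetical. It assumes that "the same bad update" can be recognised by its raw bytes, and it uses one of the heuristics Brian lists (the AS path) to decide whether the peer is the originator: the route counts as originated by the peer if every AS in the path, allowing for prepending, is the peer's own AS.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESET_SESSION = "reset session"
    DROP_UPDATE = "drop update, keep session up"
    HOLD_DOWN = "keep session down until operator intervenes"


def from_originator(as_path: list[int], peer_as: int) -> bool:
    """Hypothetical heuristic: the peer originated the route if the path is
    non-empty and contains only the peer's own AS (prepending allowed)."""
    return len(as_path) > 0 and set(as_path) == {peer_as}


@dataclass
class PeerErrorPolicy:
    peer_as: int
    max_resets: int = 2                    # assumed "small, finite number"
    resets: int = 0
    seen_bad: set[bytes] = field(default_factory=set)

    def on_bad_update(self, update: bytes, as_path: list[int]) -> Action:
        # Assumption: identical raw bytes mean "the same bad update".
        repeat = update in self.seen_bad
        self.seen_bad.add(update)
        if from_originator(as_path, self.peer_as):
            # Optional behaviour: the originator gets one retry; if it sends
            # the same bad update again, keep the session down.
            return Action.HOLD_DOWN if repeat else self._reset_or(Action.HOLD_DOWN)
        if repeat:
            # "Fool me twice, shame on me": ignore the update, stay up.
            return Action.DROP_UPDATE
        # "Fool me once, shame on you": strict interpretation, reset.
        return self._reset_or(Action.DROP_UPDATE)

    def _reset_or(self, fallback: Action) -> Action:
        # Cap total resets so state changes stay deterministic instead of
        # flapping the session indefinitely.
        if self.resets < self.max_resets:
            self.resets += 1
            return Action.RESET_SESSION
        return fallback
```

For example, with `policy = PeerErrorPolicy(peer_as=65001)`, a bad update arriving via path `[65002, 65003]` first triggers `RESET_SESSION`; when the same bytes arrive again after the session comes back, the result is `DROP_UPDATE`. Once `max_resets` is exhausted, non-originator errors fall back to dropping the update rather than resetting again.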