North American Network Operators Group


outages, quality monitoring, trouble tickets, etc

  • From: William Allen Simpson
  • Date: Thu Nov 23 01:24:27 1995

A bit of a rambling note, as I catch up with the unusually busy lists....

> From: Scott Huddle <huddle@mci.net>
> I consider this list a place for ISPs to discuss general policy and
> planning issues that affect all of us.  It is a very inappropriate
> place to discuss problems with a specific provider.

I firmly DISAGREE!  None of us are particularly interested in hearing
every jot and tittle about every network flap, but _BIG_ ones and their
resolution are important to bring to this list!  How else to get a
handle on what the real problems are?  How else to help each other
avoid repeating the problem in the future?

                                ----

As to the current state of the 'net, I have to agree whole-heartedly
with Hans Werner (something I rarely did when he was around here....)

The mindset in NANOG is pretty useless.  It would be nice if folks
stopped beating around the bush, worrying about "competitive" issues,
and started cooperating!  Leave the competitive posturing to the
marketing departments.

Fixing problems usually means focusing on a particular case.  If
analysis of the problems of a particular ISP/NSP sheds some light on
resolving problems of a broader scope, then a little embarrassment is a
small price to pay; it's not fatal -- failing to fix the problem is fatal!

                                ----

As to the earlier discussion about Frame Relay instead of direct links,
my experience is that F-R within a LATA between a few routers is working
reasonably well, but inter-LATA and longer-haul F-R is working poorly,
and anything with more than 5-6 routers is a disaster.

Most of my recent link problems that could be pinpointed well enough to
open a trouble ticket have been directly due to F-R, primarily at PSI.
A couple of weeks ago, they lost the entire Great Lakes area, didn't
notice for over 4.5 hours, and then took another 6 hours to fix it.
They never did tell me the final solution.

So, based on experience, I don't recommend F-R for long haul links.
It's just not good enough!

                                ----

One of the reasons that ISPs are flapping is the lack of Link Quality
Monitoring.  With LQM, you can easily tell when a link is degrading,
with very accurate reports on a per-packet or per-byte basis.  This is
particularly important for F-R links, as the switches don't seem to
tell each other when the link is down.
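To illustrate the kind of per-packet and per-byte accuracy meant here:
each end periodically reports its counters, so comparing what the peer
says it sent with what we actually received over one reporting period
gives a loss fraction directly.  This is a minimal sketch, not the PPP
LQR packet layout; the counter names are hypothetical simplifications,
and the counters are assumed to be 32-bit values that wrap.

```python
from dataclasses import dataclass

# Assumption: counters are 32-bit and wrap around to zero.
COUNTER_MOD = 2 ** 32


def counter_delta(curr: int, prev: int) -> int:
    """Increase of a wrapping counter between two samples."""
    return (curr - prev) % COUNTER_MOD


@dataclass
class Snapshot:
    """Hypothetical counters captured when one quality report arrives."""
    peer_out_packets: int  # packets the peer says it has sent us
    in_packets: int        # packets we actually received intact
    peer_out_octets: int   # octets the peer says it has sent us
    in_octets: int         # octets we actually received intact


def loss_fraction(sent_prev: int, sent_curr: int,
                  recv_prev: int, recv_curr: int) -> float:
    """Fraction of traffic lost during one reporting period."""
    sent = counter_delta(sent_curr, sent_prev)
    received = counter_delta(recv_curr, recv_prev)
    if sent == 0:
        return 0.0  # nothing was sent, so nothing could be lost
    return max(sent - received, 0) / sent


def period_loss(prev: Snapshot, curr: Snapshot) -> tuple[float, float]:
    """Return (packet loss, octet loss) between two snapshots."""
    packets = loss_fraction(prev.peer_out_packets, curr.peer_out_packets,
                            prev.in_packets, curr.in_packets)
    octets = loss_fraction(prev.peer_out_octets, curr.peer_out_octets,
                           prev.in_octets, curr.in_octets)
    return packets, octets
```

For example, if the peer reports 1000 more packets sent since the last
report but only 950 more arrived, `period_loss` reports 5% packet loss,
even if the link never went completely down.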

I was surprised to learn that some folks weren't using PPP LQM on high
speed HDLC links.  That's why we originally designed it!  PPP also runs
over F-R links, even if all you use it for is LQM.
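Once the reports are flowing, something still has to decide when the
link is "down".  The sketch below assumes a hypothetical K-out-of-N
policy (the thresholds are illustrative, not from any standard): the
link is declared bad when at least k of the last n reporting periods
exceeded a loss threshold, or when reports stop arriving at all, which
catches the F-R case where the switches never signal an outage.

```python
from collections import deque


class LinkQualityPolicy:
    """Hypothetical K-out-of-N link quality policy.

    The link is considered bad if at least k of the last n reporting
    periods exceeded loss_threshold, or if no report has arrived
    within max_silence seconds.
    """

    def __init__(self, k: int = 3, n: int = 5,
                 loss_threshold: float = 0.02, max_silence: float = 30.0):
        if not 0 < k <= n:
            raise ValueError("need 0 < k <= n")
        self.k = k
        self.loss_threshold = loss_threshold
        self.max_silence = max_silence
        self.history: deque[bool] = deque(maxlen=n)  # True = bad period
        self.last_report: float | None = None

    def record(self, loss: float, now: float) -> None:
        """Record the loss fraction measured for one reporting period."""
        self.history.append(loss > self.loss_threshold)
        self.last_report = now

    def link_ok(self, now: float) -> bool:
        """Decide whether the link should be considered usable."""
        if self.last_report is None:
            return False  # no report yet, so quality is unknown
        if now - self.last_report > self.max_silence:
            return False  # reports stopped: treat the link as down
        return sum(self.history) < self.k
```

Requiring several bad periods out of a window, instead of reacting to
a single bad report, is one way to avoid turning a noisy link into a
flapping route.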

After 4 years, we are finally getting around to advancing PPP LQM to
Draft Standard, but it is already pretty widely implemented....

Insist on LQM from your router vendors!

                                ----

Has anybody else noticed how hard it is to get a trouble ticket opened
these days?  Once upon a time, I just called the NSF NOC, and got a report to
them in real time, so the problem could be fixed quickly.  Nowadays,
NOCs seem to want you to send email with 24 or 48 hour turnaround, or go
through 2 layers of service representatives.  Pretty hard to send email
to them when their link is down, or go through "regular" support in the
middle of the night!

We really need more folks like MCI with an 800 number.  I've found them
very responsive.  But then, I've also found that they have fewer
problems than other ISPs I've dealt with lately.  Maybe that's because
they get faster problem reports?  (See, I can give compliments, too.)

Bill.Simpson@um.cc.umich.edu
          Key fingerprint =  2E 07 23 03 C5 62 70 D3  59 B1 4F 5E 1D C2 C1 A2