Portmaster Crashes

Stuart Lynne (sl@wimsey.com)
Tue, 31 Oct 1995 13:56:53 -0800 (PST)

In article <46uf29$cjn@vanbc.wimsey.com>, Brian Tao <taob@io.org> wrote:
> What sort of tools are available to figure out what caused a PM-2e
>to crash? We've had three of our Portmasters suddenly stop responding
>to pings over Ethernet and to incoming calls in the past week. In
>some (but not all) cases, the activity LED on the PM-2e has gone out
>(we are using the BNC connectors on all eleven PM-2e's). Power
>cycling the unit always brings it back to life.
>
> I don't want to get into the advantages and disadvantages of thin
>vs. thick vs. twisted-pair, but would plugging the PM-2e's into a UTP
>hub help? Are there any on-board diagnostics to help me trace what
>caused the lockup? We are running ComOS 3.1.4 with BSD/OS 2.0
>machines as login hosts.
>
> One (rather embarrassing) point: the collision rate on our
>network is currently sitting at above 50% and spikes above 100%...
>could this cause a Portmaster to hang?

If I remember right, our trouble ticket number for this problem is 6-133.

I've been watching these for months. We have a network monitor watching all
of our units that goes off if they can't be pinged; it sends us mail if a ping
fails three times at one-minute intervals. What we have been seeing is that
they frequently (an average of two to four occurrences a day on a population
of six units) just go silent and then, after a short interval (2-30 minutes),
come back to life with all activity resuming as if nothing had happened.
About once or twice a week a power cycle is required to fix it.
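
For anyone who wants to set up a similar watch, here is a rough sketch of the
monitor logic described above: ping each unit once a minute and raise an alert
when a unit misses three pings in a row. This is not the monitor we actually
run. The names PingMonitor, ping_once and run_forever are made up for
illustration, the alert callback stands in for sending mail, and ping_once
assumes a Unix-like system ping that accepts "-c 1".

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List

FAILURE_THRESHOLD = 3    # consecutive missed pings before alerting (from the post)
POLL_INTERVAL_S = 60     # one-minute polling interval (from the post)


def ping_once(host: str, timeout_s: float = 5.0) -> bool:
    """Send one ICMP echo with the system ping (assumes a Unix-like -c flag)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or ping could not be started: count it as a miss.
        return False
    return result.returncode == 0


class PingMonitor:
    """Hypothetical helper: counts consecutive ping failures per host."""

    def __init__(
        self,
        hosts: Iterable[str],
        probe: Callable[[str], bool] = ping_once,
        alert: Callable[[str], object] = print,
        threshold: int = FAILURE_THRESHOLD,
    ) -> None:
        self.hosts: List[str] = list(hosts)
        self.probe = probe
        self.alert = alert      # stand-in for "send us mail"
        self.threshold = threshold
        self.failures: Dict[str, int] = {h: 0 for h in self.hosts}

    def poll(self) -> List[str]:
        """Probe every host once; return the hosts that just hit the threshold."""
        alerted: List[str] = []
        for host in self.hosts:
            if self.probe(host):
                self.failures[host] = 0   # unit answered, reset its counter
                continue
            self.failures[host] += 1
            # Alert only at the moment the threshold is reached, so one
            # outage produces one message rather than one per minute.
            if self.failures[host] == self.threshold:
                self.alert(host)
                alerted.append(host)
        return alerted


def run_forever(monitor: PingMonitor, interval_s: float = POLL_INTERVAL_S) -> None:
    """Poll all hosts, sleep, repeat."""
    while True:
        monitor.poll()
        time.sleep(interval_s)
```

To use it, build a PingMonitor with the list of unit hostnames and pass it to
run_forever. With these settings a pause shorter than about three minutes
goes unreported, so logging every missed ping as well would give a better
picture of how often the units really go silent.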

While a unit is silent there is no activity on the serial ports. PPP/SLIP
sessions are stuck. Telnet/rlogin sessions are frozen. If you dial up to the
unit, the modem answers but there is no prompt. On the ethernet side there is
no output.

Moving the units to their own ethernet segment helped a bit. ComOS 3.1.3 made
it worse (compared to 3.1.2). 3.1.4 seems to have helped but not cured it.

We have put an ethernet sniffer on the ethernet segment two of them live on
and there doesn't seem to be anything untoward happening.

We have seen the problem with units on both Thinnet and UTP.

The timing of the pauses seems independent of usage. We see just as many in
the early AM as the PM.

-- 
Stuart Lynne <sl@wimsey.com>      604-933-1000      <http://www.wimsey.com>
PGP Fingerprint: 28 E2 A0 15 99 62 9A 00  88 EC A3 EE 2D 1C 15 68