Re: PM3 'stuck'

Jeffrey J. Mountin (sysop@mixcom.com)
Thu, 10 Jul 1997 13:24:42 -0500

At 10:19 PM 7/9/97 -0600, Ron Tapia wrote:
>Equipment:
>
> PM3
> 3 '10 port' cards
> CT1, E&M immediate start, B8ZS, ESF

Ditto here, except with four 10-port cards and one 8-port card.

> Calling in with a Cardinal 28.8
>
>Ticket: 40016 (not that I've been called or anything)
>
>I've seen a lot of lost carriers with 3.5.1b20.
>
>In addition, I've been seeing a problem where a connection will hang and
>either unhang or die due to a loss of carrier.
>
>The hangs seem to be associated with retrains (maybe renegotiations). These
>hangs can last many minutes (from recollection; I haven't timed anything
>yet) and can seem permanent (i.e., until now I assumed the connection was
>dead and hung up). Only now have I realized that my connections can
>unhang.

FYI, retraining was the mechanism used with V.32bis (14.4) and should not
happen on V.34, but for some reason certain V.34 modems will retrain rather
than renegotiate.

>It seems like my PM3 is retraining/renegotiating much too often for
>usability (or it's taking too long to do one or the other). Is it possible
>that 3.5.1b20 is a little too sensitive to line noise (for US West
>territory)?

Personally, I think the terms "retrain" and "renegotiation" should be
swapped, since a retrain is basically the same thing as the initial connect,
only much slower, and sometimes it is repeated several times in a row.
Renegotiations, on the other hand, are more like "Hey, this line is a bit
noisy, let's slow it down a notch," or just the opposite.

>Just in the course of writing this message I've experienced several hangs
>and two hangs followed by disconnects. I'm beginning to favor the
>hypothesis that the hangs and disconnects are related and are both
>related to the sensitivity of the 3.5.1b20 code to line noise.
>Is there a better explanation?
>
>Here's a full description for those interested. I'm hoping that it will
>help others that see this problem know what to look for. If other info
>would be of help, please let me know.
>
>My PPP session was "hung"; it was on m0. I was connected over another phone
>line to the PM3 in question. Here is "show m0":
>
>==========================================================
>dpm1> show m0
> State: ACTIVE
> Active Port: S0
> Transmit Rate: 28800
> Receive Rate: 26400
> Connection Type: LAPM/V42BIS
> Chars Sent: 70837038
> Chars Received: 10681725
> Retrains: 1
> Renegotiations: 0
>
> Total Calls: 197
> Modem Detects: 192
> Good Connects: 170
>
> Connection Failures
> No Modulation: 13
> No Protocol: 13
> Not Operational: 9
> Total Failed: 22
>
> Session Terminations
> Lost Carrier: 0
>Normal Disconnect: 169
>==========================================================
>
>I was typing into a telnet window and I could see the "send" light on my
>modem blinking, but I never received responses. This lasted about a
>minute. Suddenly, my connection came back, my characters were echoed in
>my telnet session and "show m0" gave:
>
>==========================================================
>dpm1> show m0
> State: ACTIVE
> Active Port: S0
> Transmit Rate: 21600
> Receive Rate: 21600
> Connection Type: LAPM/V42BIS
> Chars Sent: 70855214
> Chars Received: 10683091
> Retrains: 2
> Renegotiations: 0

What caused such a long "pause" was the retrain, which should not be done
on a V.34 connection, since renegotiations are much faster (I like to think
of them as speed shifting). No data is transmitted while either one is in
progress.
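If you want to spot this without eyeballing the counters, here is a minimal Python sketch that compares two "show m0" snapshots and reports which fields moved. It assumes the plain "Label: value" layout quoted above; the function names are made up for illustration and are not part of any PortMaster tool. The sample data is a trimmed copy of the two snapshots in this thread.

```python
# Hypothetical helpers for comparing two PM3 "show m0" snapshots.
# Assumes the plain "Label: value" layout quoted in this thread.

def parse_show_modem(text):
    """Return a dict of the numeric "Label: value" fields in a snapshot."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips State, Connection Type, etc.
            fields[label.strip()] = int(value)
    return fields


def counter_changes(before, after,
                    keys=("Transmit Rate", "Receive Rate",
                          "Retrains", "Renegotiations")):
    """Map each watched field that changed to its (old, new) values."""
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


def pause_cause(changes):
    """Name the event that most likely explains a stall between snapshots."""
    if "Retrains" in changes:
        return "retrain"
    if "Renegotiations" in changes:
        return "renegotiation"
    return "unknown"


# Trimmed copies of the first two "show m0" snapshots quoted above.
BEFORE = """\
State: ACTIVE
Transmit Rate: 28800
Receive Rate: 26400
Connection Type: LAPM/V42BIS
Retrains: 1
Renegotiations: 0
"""

AFTER = """\
State: ACTIVE
Transmit Rate: 21600
Receive Rate: 21600
Connection Type: LAPM/V42BIS
Retrains: 2
Renegotiations: 0
"""

if __name__ == "__main__":
    changes = counter_changes(parse_show_modem(BEFORE), parse_show_modem(AFTER))
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
    print("likely cause:", pause_cause(changes))
```

Run against the two snapshots above, it shows both rates dropping to 21600 and Retrains going from 1 to 2 with no renegotiation, which is the retrain stall described here.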

--snip-- (several of these)
>
>After only 1 other noticeable hang, "show m0" gives:
>
>==========================================================
>dpm1> show m0
> State: ACTIVE
> Active Port: S0
> Transmit Rate: 26400
> Receive Rate: 26400
> Connection Type: LAPM/V42BIS
> Chars Sent: 71057433
> Chars Received: 10882282
> Retrains: 3
> Renegotiations: 8

--snip--

>As I was composing this, I saw another hang, "show m0" gave:
>
>==========================================================
>dpm1> show m0
> State: READY
> Chars Sent: 71101167
> Chars Received: 10908630
> Retrains: 4
> Renegotiations: 8
>
> Total Calls: 197
> Modem Detects: 192
> Good Connects: 170
>
> Connection Failures
> No Modulation: 13
> No Protocol: 13
> Not Operational: 9
> Total Failed: 22
>
> Session Terminations
> Lost Carrier: 0
>Normal Disconnect: 170

--snip--

Amazing, it learned to renegotiate!

--snip--

>This time, instead of "unhanging", I lost carrier. I reconnected and got
>the same modem.
>
>Here is "show modems":

M0 S0 ACTIVE 28800 V42BIS LAPM 171 0 LOST CARRIER

--snip--

>It shows that M0 last disconnected because of a lost carrier.
>
>While composing this, I saw another hang followed by a loss of carrier.

Now this is what bothers me. You lost carrier, and 'sh mo' shows that too,
but 'sh m0' counts it as a "Normal Disconnect"??? We have been using the
disconnect reason logging since it first appeared, and we equate "Normal
Modem Disconnect" with a lost carrier.

Considering that you had 4 retrains in that session, it is no wonder you
lost carrier. Most modems will not tolerate more than 2-3 retrains per
session, and the result is a lost carrier.
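A quick way to watch for this is to record the Retrains value from each "show m0" you take during a session and flag the first reading past the limit. This is only a sketch: the limit of 3 is an assumption taken from the upper end of the 2-3 figure above, not a PortMaster setting, and the function name is hypothetical.

```python
from typing import Optional, Sequence

# Assumed limit: upper end of the "2-3 retrains per session" rule of thumb.
RETRAIN_LIMIT = 3


def first_over_limit(retrain_readings: Sequence[int],
                     limit: int = RETRAIN_LIMIT) -> Optional[int]:
    """Return the index of the first snapshot whose cumulative Retrains
    counter exceeds `limit`, or None if the session never exceeds it."""
    for i, count in enumerate(retrain_readings):
        if count > limit:
            return i
    return None


if __name__ == "__main__":
    # Retrains counter from the four "show m0" snapshots quoted above.
    readings = [1, 2, 3, 4]
    print(first_over_limit(readings))
```

With the readings from this thread it flags the fourth snapshot (index 3), the one taken just before the lost carrier.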

Retraining with certain modems *could* be caused by b20. I say this only
because I dealt with the same thing on an older beta flash for
SupraFaxModems; I disabled retraining, which didn't help. One must also
assume that some modems are more prone to this, since the Supra's default
setting is that it will never initiate a retrain.

Since running b20, and even on b8, I have seen very few retrains, but I have
seen several connections that renegotiated up to 80 times in less than 2
minutes, and only one of them managed to stay connected. One of these users
figured out that his Windows 95 install was trashed, but once he reinstalls
I will be debugging him again.

Try disabling retrains, or set the preference to renegotiation, and see
if that helps. Maybe update the flash too?

-------------------------------------------
Jeff Mountin - System/Network Administrator
jeff@mixcom.net

MIX Communications
Serving the Internet since 1990