We're still having a problem with multilink PPP connections, in that only
20% of the attempts succeed. The rest fail as soon as the second channel
is brought up.
In this particular pool we have four PM3's, all running ComOS 3.7.2c3 (we
had the same problem prior to c3). We are using 8 PRIs from a 5ESS switch
configured in NI-2 mode. Lucent has confirmed that all the settings
are correct in both the PM3's and the users file.
We have a mix of users testing this, using many different brands of
ISDN TAs (Motorola, Supra, 3Com, Adtran, etc.). They all have the latest
and greatest firmware upgrades. Some are on Macs, others on NT or Win '95,
and one is using a NetBlazer router. They all report the same thing: an
80% failure rate.
The time of day and the load on the PM3's, Ethernet, and RADIUS servers
make no difference at all. Each PM3 is on its own port of an etherswitch,
collisions are pretty well nonexistent, etc. It doesn't seem to matter if
the two channels hit the same or different PM3's.
We have an open ticket with Lucent, but of course every call they attempt
works perfectly. However, it appears that every call made via long
distance (LD) works for us too. That is, the 80% failure rate only applies
to local calls.
We had a Lucent tech logged in to all the PM3's simultaneously to monitor
debugging while we made numerous connections. Of course, we got 100%
successful connections and felt kinda silly. But 2 minutes after the Lucent
guy logged out and hung up, it went back to 80% failures again.
So we've been doing some tests, and this may sound weird, but the stats
are pretty convincing... It appears that simply having an administrative
login session going on each PM3 fixes the problem.
If I telnet to each PM3 and log in, then we make dozens of calls, we get
a 100% success rate. If I log out of them, we immediately return to an
80% failure rate. We've been repeating this cycle ad nauseam and the
stats are pretty distinct. After hundreds of calls we can see that not
a single failure has occurred while an admin login is present. When there
isn't a login we get 80% failures.
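
As a rough check on how unlikely a run like that would be by chance, here
is a minimal sketch in Python. It assumes each call fails independently at
the 80% rate we see without a login; the function name and the call counts
are only for illustration, not measured data.

```python
def prob_no_failures(n_calls, failure_rate=0.8):
    """Chance of n_calls successes in a row, assuming each call fails
    independently with probability failure_rate (an assumption)."""
    return (1.0 - failure_rate) ** n_calls


if __name__ == "__main__":
    # Illustrative call counts only.
    for n in (10, 50, 100):
        print(f"{n} calls, zero failures: p = {prob_no_failures(n):.3g}")
```

If the admin login made no difference, even ten clean calls in a row would
be about a one-in-ten-million event under this model, so luck does not
look like a plausible explanation.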
So I've left several telnets running to keep an admin session open on each
PM3 and voila - users report 100% success now.
By the way, the admin session doesn't actually have to be doing anything.
I simply log in and that's that - no "set console", no "set debug ...", etc.
Debugging is totally off.
We've reported this to the guy at Lucent who is working on it, but I was
wondering if anyone here has any ideas.
We did find a helpdesk document at Microsoft that describes the exact same
multilink PPP problem, which they blame on the PortMaster. They say it
has a timing problem where the call will terminate if both channels
connect within 500ms of each other. Both Microsoft and MZ have said that
this was corrected in newer versions of ComOS, yet I wonder if perhaps
there's still something similar going on.
A fact that seems to support this is that customers can get 100% successful
calls even without the admin logins present if they tie up one of their
channels with a handset, make the PPP call, then hang up the handset. The
TA then connects the second channel and all is well.
Then there's the LD issue. Correct me if I'm wrong, but wouldn't an LD
connect take slightly longer to establish as the circuit is built up? And
if the TA waits for the first channel to connect before establishing the
second, this would certainly increase the time between the two connects.
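
To make the timing theory concrete, here is a small Python sketch of the
rule as the Microsoft document describes it: a call is at risk when both
channels connect within 500ms of each other. This is only a model of the
hypothesis, not anything taken from ComOS. The function names are made up,
the sample timings are invented for illustration, and treating exactly
500ms as safe is my own assumption.

```python
WINDOW_MS = 500  # figure quoted in the Microsoft helpdesk document


def channel_gap_ms(first_connect_ms, second_connect_ms):
    """Time between the two B-channel connects, in milliseconds."""
    return abs(second_connect_ms - first_connect_ms)


def predicted_to_fail(first_connect_ms, second_connect_ms,
                      window_ms=WINDOW_MS):
    """True if the hypothesis predicts a dropped call: both channels
    connected less than window_ms apart (boundary is an assumption)."""
    return channel_gap_ms(first_connect_ms, second_connect_ms) < window_ms


if __name__ == "__main__":
    # Invented timings, for illustration only.
    scenarios = {
        "local call, channels back to back": (0, 100),
        "LD call, slower circuit setup": (0, 900),
        "handset held, then released": (0, 5000),
    }
    for name, (t1, t2) in scenarios.items():
        verdict = "fail" if predicted_to_fail(t1, t2) else "ok"
        print(f"{name}: gap {channel_gap_ms(t1, t2)}ms -> {verdict}")
```

If this model is anywhere near right, anything that widens the gap (LD
setup time, the handset trick, or a slightly slower PM3) would move a call
from the failing group to the working one.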
Now this may be a stretch, but we're at that point: could the mere presence
of an admin login introduce some extremely small amount of latency to
ComOS, enough to delay the MCPPP negotiation by a few milliseconds?
In the meantime, leaving four telnet sessions running isn't a great
solution, but it definitely appears to work. If this rings any bells
with anyone, I'd sure appreciate hearing what you have to say.
Mark