WAN port locks up

Brian 'MegaZone' Bikowicz ((no email))
Mon, 16 Oct 1995 13:13:20 -0700

Date: Mon, 16 Oct 1995 12:17:47 -0700 (PDT)
From: Charles Howes <chowes@helix.net>
To: portmaster-users@livingston.com
cc: cdr@livingston.com
Subject: Help, radiusd isn't talking to the portmaster!

Site: Sun Solaris 2.4, ComOS 3.1.2 and 3.1.4, PM2ER

Problem:

"It was working yesterday, and today it went foom!"

The first hint I had that something was wrong was when someone called
me and said that users couldn't log in, and it had been that way for
an hour and a half.

I dialed in, and sure enough, I couldn't log in.

I dialed in as !root, and tried to telnet to my host. The telnet
command just sat there. I tried to hit ^] to break out, but it
wouldn't let me. I hung up.

I dialed in as !root, telnetted to the portmaster, and then telnetted
to the host. This time, it still sat there, but I could hit ^] to
break out.

I pinged the IP address of my host; it was alive.
I pinged the hostname of my host; it hung.

I changed the DNS server.
I pinged the hostname of my host; it was alive.
I telnetted to my host; it hung.

I did a traceroute to my host; it worked.
I telnetted to each host in the chain; that worked (Cisco login prompt).
I telnetted to every Unix machine I knew of, and the sessions hung.
I telnetted to every Cisco, netblazer, or portmaster I knew of,
and the sessions worked.

I telnetted to a netblazer, then telnetted to my host; it worked.
I killed off a process that was driving the load average to 40.
I killed off and restarted radiusd.

I dialed in, but couldn't log in.

I dialed in as !root, but still couldn't telnet to my host.
After a few more attempted telnets to my host, the host started answering
"Connection refused" on every port I tried: telnet, finger,
chargen, ftp, etc.

The other sysadmin arrived at the office (I was working from home).
He rebooted, which cleared the 'Connection refused' problem.

I dialed in as !root, telnetted to the netblazer, telnetted to my host.

I added 'fopen(..., "a"); fprintf(...); fclose(...)' tracing statements all
over the place, and found the problem to be in the hostname lookups: the
portmaster's hostname was served by the name server that had crashed.

I renamed the portmaster so that its name was on my own name server. I'm
really glad I controlled the reverse-lookup domain for it, so I could change it.

Now I could telnet from the portmaster to my host.
I changed a lot of files to use the new names (/etc/raddb/clients,
/etc/hosts, and the host table on the portmaster).

I dialed in, but could not login.
I pulled a bunch of hair.

I switched the RADIUS server to another machine, running SunOS 4.1.3U1.

That worked; now I could log in.
A second portmaster, which was showing all the same symptoms as the
first, is still set to use the original host. It still doesn't
work.

Because we're getting rid of the SunOS 4.1.3U1 machine, I need to make
it work with Solaris 2.4.

From the extensive debugging output of radiusd -x and my own tracing, I
determined that the portmaster was sending a valid request, it was
being decoded correctly, and the acknowledgment was being sent, but
the portmaster still doesn't let people log in. The log on the host says

"Line S2 user xxxxx access denied"

and nothing more.

-----
Things that were changed in the past week:

The IP address of the host.
The name server.
The names of the portmasters.
The version of ComOS (upgraded to 3.1.4 last night, in vain)
The auth host (changed last night, with some success)

-----
Things I want to know:

The upgrade program for installing new versions of ComOS:
Why does it dump core on Solaris 2.4? I had to use SunOS 4.1.3U1 instead.

Why isn't the system working right?
It absolutely has to work with Solaris 2.4, and the old hosts are
being returned to Sun.

How do I determine if radiusd or the portmaster is at fault?
It would be nice if radiusd had a built-in self-test.
I've reset the shared RADIUS secret several times on both ends.
What were those debug options on the portmaster?