| I suspect that you are seeing the symptoms of two problems: one
is a bug, which has been fixed since the field test update. Under
certain circumstances, TSAM would take out a lock on a terminal
server, and then 'forget' to unlock it when the operation was done.
I'm pretty sure that exiting out of DECmcc will release those locks.
The symptom would be that any operation with that specific terminal
server would hang, but operations with other terminal servers would still
succeed.
The second problem is tougher, and one that I have been aware of for some time.
Let me give you the long story of what has happened and why it is
that way. I'm not sure there is much we can do to fix this.
As you probably know, TSAM uses a detached process to communicate with
the terminal servers. Since it requires all sorts of privileges
to start this process up, and we wanted less privileged users to be able
to use TSAM, TSAM was designed to be started up by one user (probably
SYSTEM) and other users would be able to communicate with the detached
process via a mailbox.
The problem is that TSAM cannot check whether the detached process
is up and running by actually executing a GETJPI system call. (That
would require WORLD privs.) So instead the detached process defines
a system logical MCC_TS_AM_STATE which acts as a flag to let TSAM tell
whether the detached process is ENABLED or not.
This works fine except when the detached process is hung, or there is
a discrepancy between the state the logical is in and the state of the
detached process. (Say either the logical is deassigned, but the detached
process is still running, or the logical is defined, but the detached process
is stopped.)
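
To make the four combinations concrete, here is a minimal sketch in Python. It is only a model of the reasoning above, not TSAM code: the names Diagnosis, diagnose, logical_defined and process_responsive are made up for illustration, and a hung process is treated the same as a stopped one because neither will answer the mailbox.

```python
from enum import Enum


class Diagnosis(Enum):
    # Flag and process agree: normal operation.
    ENABLED = "logical defined, process responsive"
    # Flag and process agree: commands are refused up front.
    DISABLED = "logical deassigned, process not responsive"
    # Flag says ENABLED but nobody answers: commands hang.
    STALE_LOGICAL = "logical defined, process not responsive"
    # Process is alive but the flag is gone: commands are refused.
    ORPHAN_PROCESS = "logical deassigned, process responsive"


def diagnose(logical_defined: bool, process_responsive: bool) -> Diagnosis:
    """Classify the state of the flag versus the detached process.

    Both arguments are hypothetical inputs for this sketch: the first
    stands for "MCC_TS_AM_STATE is defined", the second for "the
    detached process exists and is answering requests".
    """
    if logical_defined and process_responsive:
        return Diagnosis.ENABLED
    if logical_defined:
        return Diagnosis.STALE_LOGICAL
    if process_responsive:
        return Diagnosis.ORPHAN_PROCESS
    return Diagnosis.DISABLED


if __name__ == "__main__":
    for flag in (True, False):
        for proc in (True, False):
            print(flag, proc, diagnose(flag, proc).name)
```

The STALE_LOGICAL case is the one that produces a hang, since a command is sent to a process that will never reply.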
What has probably happened in your case is the detached process got hung,
or stopped, but the logical was still around. You then issue a command,
and that hangs waiting for a response from the detached process which is
no longer there (if the logical had been deassigned, then we wouldn't have
let you get this far.)
I don't fully understand the sequence of events you describe in your note,
but the next time things hang, please do the following, and let me know the
results:
1. From MCC:
MCC> SHOW MCC 0 TERMSERVER_AM ALL CHAR
2. From DCL:
$ SHOW PROCESS MCC_TS_AM_SRV
3. Then execute MCC_TS_AM_STARTUP.COM (Make sure that this is
from a suitably privileged account. Check the user's guide, I never
remember all of the privs required.)
4. Repeat steps 1 and 2 again. TSAM should be enabled, and the detached
process should be running.
Let me know how it goes!
-Dave
|
| Dave,
The problem that keeps coming back at our site:
We are running within a CLUSTER. Fortunately it is a homogeneous
cluster, but one composed of machines of different types.
This means that we have a variety of Ethernet interfaces such as
MNA-0, BNA-0, UNA-0, not to mention those of the satellites.
Now I remember that when I registered the TS, I had to give a Circuit name.
The manual is perhaps accurate, but this is where we ran into the problem.
I registered the TS on one node, trying to specify "Service Circuits"
(yes, Circuits with an "s"). The manual specifies: A list of service
circuits for the host to access the server. DECmcc copies the first
circuit in the list to the DECmcc database.
1). We tried this, but a list was never accepted. Only a single circuit
could be specified.
2). If it had been possible to specify a list of circuits, only the
first one would have been "kept" in the MIR. Why ?
Finally I moved to another system in the cluster, and there the
panic situation started. Everything hung .... because of this
inconsistent Service circuit.
Although it is still not very clean (in our case) to have to specify a
list of service circuits for each TS, is it at all possible to
specify a list ? Wouldn't it be nice to specify a list only once at
the TS_AM child entity level of MCC to get rid of this for each TS ?
A list of 4 service circuits has been too short in our case during the
previous years, as experienced using TSM, because our cluster is used
as a cluster is expected to be used, and a lot of BOOT MEMBERS and
SATELLITES of different kinds exhausted the list very quickly.
Thanks for the background on the mechanism of the TS AM. I am sure this
will help us next time. I also hope that my problem statement has
been useful for understanding where problems in the field can
come from.
Regards,
Dominique.
|
| For the second part of note 3155.0, about the
"Already operating" and "Normal operation has begun" messages: this also
happened when I was on a system other than the one where I did the installation
and setup of the TS AM. Could this info help you?
On my original system I now get "Normal operations has begun" each time
I execute the TS STARTUP command file.
Dominique.
|