Title: DECtp Desktop for ACMS
Moderator: UCROW::GIBSON
Created: Mon Sep 24 1990
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 859
Total number of notes: 3034
853.0. "problems with failover" by CSC32::J_HENSON (Don't get even, get ahead!) Fri May 09 1997 13:18
ACMSDI V2.2 (link date on DI server 19-Oct-1995), OpenVMS V6.1, VAX,
distributed
Healthnet is reporting a problem with failover by the DI server. Their
configuration is as follows.
- 3 FE vaxes running acmsdi. Also have regular acms users logged
in.
- clients are running on PCs using Visual Basic. The customer contact
didn't know which network transport is in use, but will find out and
let me know.
- there is a BE/application node that is used by all three DI
front end Vaxes, as well as the normal acms users.
- there is a backup BE node, and they have a system logical defined
similar to APPLICATION = NODE1::APPLICATION,NODE2::APPLICATION.
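For anyone not used to a search-list logical like the one above, the expected behavior can be sketched roughly as follows. This is only an illustration in Python of the "try each target in order" idea, not how ACMS or the DI server actually implements failover; the names `resolve_with_failover` and `is_reachable` are hypothetical and exist only for this sketch.

```python
from typing import Callable, Optional, Sequence


def resolve_with_failover(
    targets: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable target in list order, or None.

    Hypothetical helper: it models a search list, where the first
    entry is the primary and later entries are backups.
    """
    for target in targets:
        if is_reachable(target):
            return target
    return None


# The search list from the customer's logical, split into its entries.
search_list = "NODE1::APPLICATION,NODE2::APPLICATION".split(",")

# Simulate the primary back-end node having crashed.
down = {"NODE1::APPLICATION"}
chosen = resolve_with_failover(search_list, lambda t: t not in down)
print(chosen)  # NODE2::APPLICATION
```

Once the primary is reachable again, the same call returns NODE1::APPLICATION, which corresponds to the revectoring described below.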
Recently, the backend application node crashed. When this occurred,
all regular acms users and 2 of the 3 DI servers performed failover
as expected. One of the DI servers began looping (I think). When
I pressed for more detail, I was told that show proc revealed that
no I/Os were being done, and that the process was in CUR state. This
doesn't seem right to me, so the customer may be a bit confused on this.
When the main applications node was restarted, they executed acms/reprocess
application. Regular acms users were revectored to the primary
application node, and 2 of the 3 di servers did the same. The 3rd
di server remained in a loop, and would not allow new user logins,
nor would it do any work for users currently logged in. They had
to stop and restart this server in order to continue processing.
This happened a second time, and all 3 di servers exhibited the
same problems/behavior as the one problem server from the previous backend
crash. They had to cycle all 3 di servers in order to get any
work done.
That is all I know about this problem. The customer will be sending
swlup information, and I will post it when I get it, or at least provide
a pointer to it.
Any of this sound familiar to anyone? Is there any other information
I should be getting?
Thanks,
Jerry
T.R | Title | User | Personal Name | Date | Lines |
---|
853.1 | more info | CSC32::J_HENSON | Don't get even, get ahead! | Fri May 09 1997 14:29 | 21 |
| I have some additional information, but am not sure if it will help.
I did get the swlup log of the events that occurred while this
was happening. I can make that available, but don't think it
will help. The only event logged by the acmsdi server was a
login attempt with an invalid password. According to
the customer, this was logged AFTER the di server was stopped and
restarted.
While this was happening, the clients were logging error -3020, which
is that the di server has died. However, the customer assures me
that the server was running the entire time, but in a CUR state
and not performing any i/o. So, I'm confused (nothing unusual).
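One way to check the "CUR state, no I/O" report is to compare two samples of the process counters taken some time apart. The sketch below, in Python, assumes the CPU time and I/O counts have been copied by hand from two successive process displays; the `ProcSample` fields and the `looks_like_cpu_loop` helper are hypothetical names for this illustration, not part of any ACMS or OpenVMS tool.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """Counters noted from one process display (hypothetical fields)."""
    cpu_ticks: int    # accumulated CPU time, in any consistent unit
    direct_io: int    # direct I/O count
    buffered_io: int  # buffered I/O count


def looks_like_cpu_loop(first: ProcSample, second: ProcSample) -> bool:
    """True if CPU time advanced while both I/O counts stayed flat.

    That pattern is consistent with a compute-bound loop. It does not
    prove one; it only tells us which direction to look in.
    """
    cpu_advanced = second.cpu_ticks > first.cpu_ticks
    io_flat = (
        second.direct_io == first.direct_io
        and second.buffered_io == first.buffered_io
    )
    return cpu_advanced and io_flat


# Example values are made up for illustration only.
before = ProcSample(cpu_ticks=1000, direct_io=42, buffered_io=310)
after = ProcSample(cpu_ticks=4000, direct_io=42, buffered_io=310)
print(looks_like_cpu_loop(before, after))  # True
```

If the CPU time is also flat between samples, the process is more likely waiting than looping, which would contradict the CUR state the customer reported.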
I have asked them to enable client logging on both the PCs and
the vax running the di server, but don't know what else to do.
Also, they're using TCP/IP (I don't know which vendor's stack) as their
communications layer.
Jerry
|
853.2 | | UCROW::GIBSON | | Fri May 16 1997 10:40 | 14 |
| Hello Jerry,
- Can you find out the version of ACMS that is running on all the
systems?
How many nodes are involved - 5?
- Whether or not it is UCX or Multinet and the version.
- Are they running DECnet/OSI over TCP/IP, or are they using straight
DECnet between the ACMS F/E and B/E machines?
- In what way did the original application node fail (so we might be
able to try something here)?
- It sounds like all the F/E nodes are running CP agents and that they
all fail over and back OK; only the ACMSDI agents fail to do so. Right?
/Tom
|