| Phil,
I had a couple of mail exchanges with the author of the base note.
It turns out that the problem lies neither in PSW nor in the feeder
(it persists even after installing a more advanced version of the feeder,
whose source is posted in NOTED::SNS, note #416.1).
An extract from the logfile of the feeder is included below:
----------------------------------------------------------------------------
%SNS-F-DQP, HP7128 Queue scheduler is not running
%SNS-E-PRO, GSUX02 Process iidbms with UID=[7] is missing
%SNS-E-PRO, GSUV13 Process APPLIPRC with UIC=[1,4] is missing
%SNS-I-EXT, GSUV13 YTFKHVKHJGV gggg
%SNS-E-PRO, GSUV12 Process DTQMAN with UIC=[250,10] is missing
%SNS-E-PRO, GSUV12 Process DENPST with UIC=[250,10] is missing
%SNS-E-PRO, GSUV12 Process DCRGON with UIC=[250,10] is missing
%SNS-E-PRO, GSUV12 Process DBPGON with UIC=[250,10] is missing
%SNS-E-PRO, GSUV12 Process DBIGON with UIC=[250,10] is missing
%SNS-E-PRO, GSUV11 Process DTQMAN with UIC=[250,10] is missing
%SNS-E-PRO, GSUV11 Process DENPST with UIC=[250,10] is missing
%SNS-E-PRO, GSUV11 Process DCRGON with UIC=[250,10] is missing
%SNS-E-PRO, GSUV11 Process DBPGON with UIC=[250,10] is missing
%SNS-E-PRO, GSUV11 Process DBIGON with UIC=[250,10] is missing
%SNS-W-DNF, GSUV10 Disk SYS$SYSDEVICE has less than 10% (~209109) free blocks
%SNS-W-DNF, GSUV07 Disk $1$DKA200 has less than 5% (~102543) free blocks
%SNS-W-DNF, GSISUR Disk $1$DUA0 has less than 30% (~93360) free blocks
CMUserSendEvent failed with status <204>
%SNS-I-WDM, ALTAIR psw_agent error: process not running
----------------------------------------------------------------------------
Note the "CMUserSendEvent failed with status <204>" message (issued by
the feeder's error handling), which happens just after a quite harmless and
usual "disk near full" event message.
According to the definition of the CMReturnStatusCodes enumeration in
console.h, it must be a CM_EVENTSENDFAILED error, raised as a consequence
of the CM event list crash (the PC shown in the base note should help
you find the exact location within the PCM code).
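To check whether the failure always follows the same kind of event, the feeder log can be scanned for the message that immediately precedes each "CMUserSendEvent failed" line. The following is only a minimal Python sketch, not part of the feeder: it assumes the log lines follow the usual %FACILITY-SEVERITY-IDENT, text layout seen in the extract, and the function and variable names are made up for illustration.

```python
import re

# VMS-style message line: %FACILITY-SEVERITY-IDENT, text
MSG_RE = re.compile(
    r"^%(?P<fac>[A-Z0-9$_]+)-(?P<sev>[A-Z])-(?P<ident>[A-Z0-9_]+), (?P<text>.*)$"
)
FAILURE_MARK = "CMUserSendEvent failed"  # string taken from the log extract


def events_before_failures(lines):
    """Return the parsed message that immediately precedes each failure line."""
    previous = None
    found = []
    for line in lines:
        line = line.strip()
        if line.startswith(FAILURE_MARK):
            found.append(previous)  # None if no message was seen before it
            continue
        match = MSG_RE.match(line)
        if match:
            previous = match.groupdict()
    return found


# Lines copied from the log extract above.
sample = [
    "%SNS-W-DNF, GSUV07 Disk $1$DKA200 has less than 5% (~102543) free blocks",
    "%SNS-W-DNF, GSISUR Disk $1$DUA0 has less than 30% (~93360) free blocks",
    "CMUserSendEvent failed with status <204>",
    "%SNS-I-WDM, ALTAIR psw_agent error: process not running",
]

for event in events_before_failures(sample):
    if event:
        print(event["sev"], event["ident"], event["text"])
```

Run against several log files, this would show whether the failure is tied to one event type or simply follows whichever event was sent last.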
Note as well that, as stated in the base note, PCM ECO01 is
already installed.
I hope this helps you investigate the problem.
I'm at your disposal for anything concerning PSW and the PSW-to-PCM feeder.
Regards,
-- Olivier.
|
| Re -2
Phil,
Here is more information from my customer.
I hope it helps.
Thierry
The question was:
The CMUserSendEvent failed because it thought that ENS
had "gone away". Now, is the crash an ENS crash causing
the evntlist to exit, or is ENS still running after the
crash?
Answer :
Each time we stop/start the consolidator while the SNS$Feed_PCM is running,
the EventList window crashes and then the Console Notify process crashes a few
seconds later.
Show system before crash :
--------------------------
000002C2 Console Daemon HIB 6 113 0 00:00:00.73 348 380
00000097 Console Ctrl 01 HIB 6 388 0 00:00:01.26 281 454
00000098 Console Ctrl 02 HIB 6 158 0 00:00:00.70 302 364
000001DE SYSTEM_1 LEF 6 771 0 00:00:02.76 661 805 S
00000271 SNS$WATCHDOG HIB 6 15 0 00:00:00.30 236 118
00000272 SNS$CONS_538 LEF 4 847 0 00:00:02.10 386 313
00000274 SNS$Feed_PCM HIB 4 34 0 00:00:00.17 144 103
00000275 Console Notify HIB 6 98 0 00:00:00.51 330 356
00000177 SYSTEM LEF 6 10979 0 00:00:13.97 3738 577
At this time, we stop the consolidator and restart a new one.
Old messages from Watchdog are sent again to the PCM mailbox.
Show system AFTER the EventList window crash (Notify daemon disappeared) :
-------------------------------------------------------------------------
000002C2 Console Daemon HIB 6 115 0 00:00:00.79 351 383
00000097 Console Ctrl 01 LEF 6 418 0 00:00:01.33 284 457
00000098 Console Ctrl 02 LEF 6 170 0 00:00:00.77 305 367
0000015F SNS$CONS_619 HIB 4 1684 0 00:00:03.43 352 498
00000271 SNS$WATCHDOG HIB 6 15 0 00:00:00.37 285 105
00000274 SNS$Feed_PCM LEF 6 100 0 00:00:00.26 229 85
00000177 SYSTEM LEF 6 11384 0 00:00:14.59 4018 486
The EventList Window crash dump :
---------------------------------
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=003381C0, PC=00043FC8, PS=0000001B
Improperly handled condition, image exit forced.
Signal arguments: Number = 00000005
Name = 0000000C
00000000
003381C0
00043FC8
0000001B
Register dump:
R0 = 00000000003381C0 R1 = 0000000000000110 R2 = 0000000000011BC8
R3 = 0000000000CE9D40 R4 = 0000000000000000 R5 = 00000000000A4680
R6 = 00000000000A4680 R7 = 00000000000A46E0 R8 = 0000000000000004
R9 = 0000000000000001 R10 = 00000000000A2A00 R11 = 000000007F96F478
R12 = 000000007F96F268 R13 = 00000000000A61DC R14 = 0000000000000000
R15 = 0000000500000000 R16 = 0000000000000006 R17 = 000000007F96F368
R18 = 000007B400D22158 R19 = 000000007FB967C8 R20 = 0000000000000000
R21 = 0000000000000000 R22 = 0000000045555254 R23 = 0000000000D2215C
R24 = F00D000000000000 R25 = 0000000000000001 R26 = 0000000000043ED0
R27 = 000000007FB967C8 R28 = 0000000000043FC0 R29 = 000000007F96F200
SP = 000000007F96F200 PC = 0000000000043FC8 PS = 000000000000001B
The accounting record for the EventList Window subprocess :
-----------------------------------------------------------
SUBPROCESS Process Termination
------------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: SYSTEM Finish time: 17-NOV-1994 17:47:11.57
Process ID: 000001DE Start time: 17-NOV-1994 17:46:07.55
Owner ID: 00000177 Elapsed time: 0 00:01:04.01
Terminal name: Processor time: 0 00:00:02.85
Remote node addr: Priority: 4
Remote node name: Privilege <31-00>: FFFFFFFF
Remote ID: Privilege <63-32>: FFFFFFFF
Queue entry: Final status code: 1000000C
Queue name:
Job name:
Final status text: %SYSTEM-F-ACCVIO, access violation, reason mask=!XB, virtual
Page faults: 703 Direct IO: 46
Page fault reads: 137 Buffered IO: 799
Peak working set: 13040 Volumes mounted: 0
Peak page file: 41904 Images executed: 3
The accounting record for "Console Notify" process :
----------------------------------------------------
DETACHED Process Termination
----------------------------
Username: SYSTEM UIC: [SYSTEM]
Account: SYSTEM Finish time: 17-NOV-1994 17:47:54.48
Process ID: 00000275 Start time: 17-NOV-1994 17:45:50.79
Owner ID: Elapsed time: 0 00:02:03.68
Terminal name: Processor time: 0 00:00:03.55
Remote node addr: Priority: 4
Remote node name: Privilege <31-00>: FFFFFFFF
Remote ID: Privilege <63-32>: FFFFFFFF
Queue entry: Final status code: 1000000C
Queue name:
Job name:
Final status text: %SYSTEM-F-ACCVIO, access violation, reason mask=!XB, virtual
Page faults: 823 Direct IO: 906
Page fault reads: 262 Buffered IO: 434
Peak working set: 19280 Volumes mounted: 0
Peak page file: 38368 Images executed: 5
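Both accounting records show the same final status code, 1000000C. As a reading aid, here is a small Python sketch that splits a VMS condition value into its standard fields (severity in bits 0-2, message number in bits 3-15, facility in bits 16-27, inhibit-message flag in bit 28). The function name is invented for this example; it is not part of PCM or SNS.

```python
# Severity codes of a VMS condition value (low three bits).
SEVERITY = {
    0: "warning",
    1: "success",
    2: "error",
    3: "informational",
    4: "severe (fatal)",
}


def decode_condition(value):
    """Split a 32-bit VMS condition value into its documented fields."""
    return {
        "severity": SEVERITY.get(value & 0x7, "reserved"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "inhibit_message": bool(value & 0x10000000),
    }


# Final status code reported in both accounting records.
for name, field in decode_condition(0x1000000C).items():
    print(f"{name}: {field}")
```

For 1000000C this gives a fatal severity, message number 1 in facility 0 (the system facility), with the inhibit-message bit set. In other words, it is the same 0000000C (ACCVIO) condition that appears as "Name" in the signal arguments of the EventList dump.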
And the SNS$Feed_PCM logfile :
------------------------------
$ RUN CONSOLE$ACTIONS:AXP_SNS$FEED_PCM
%SNS-I-WDM, URSA psw_agent error: process not running
%SNS-F-DQP, HP7128 Queue scheduler is not running
%SNS-E-PRO, GSUX02 Process iidbms with UID=[7] is missing
%SNS-E-PRO, GSUV13 Process APPLIPRC with UIC=[1,4] is missing
%SNS-E-EXT, GSUV13 DANGER test 1
%SNS-E-EXT, GSUV13 DANGER test 2
%SNS-E-EXT, GSUV13 DANGER test 3
%SNS-E-EXT, GSUV13 DANGER test 4
%SNS-E-EXT, GSUV13 DANGER test 5
%SNS-E-EXT, GSUV13 DANGER test 6
%SNS-E-EXT, GSUV13 DANGER test 7
%SNS-E-EXT, GSUV13 DANGER test 8
%SNS-E-EXT, GSUV13 DANGER test 9
%SNS-E-EXT, GSUV13 DANGER test 10
%SNS-W-DNF, GSUV10 Disk SYS$SYSDEVICE has less than 10% (~209109) free blocks
%SNS-W-DQP, GSUV10 Queue RAJA1_LAETI (On node GSUV11) is stopped
CMUserSendEvent failed with status <204>
%SNS-W-DQP, GSUV10 Queue RAJA1_LAFAC (On node GSUV11) is stopped
---------------------------------------
|
| Re .5
Phil,
Concerning the crash, I am sending the new
CONSOLE$DAEMON.EXE_AXP to the customer. I'll let you know the result.
Now, concerning the "hang", here is more information:
the DECserver setup, and also:
GSIFI::SYS$MANAGER:OPERATOR.LOG
CONSOLE$LOGFILES:GSIFI.LOG (Problem with "OPCOM 18-NOV-1994 00:01:15.27")
The "CONSOLE MONITOR GSIFI" screen (missing data between 00:23 and 07:15)
GSIFI and GSISUR are in the same cluster.
Look at my comments, especially in the log files. They are marked by >>
Thanks
Thierry
---------------------------------------
>
> OK, as data logging stops, it looks to me as though your
> customer is losing the connection to the remote device. Is
> he getting any events at all to say that the console has
> been "lost"? If so, does the event say why? Is there any
> reason attached?
>
---------------------------------------
The DECserver and Console LAT port definition for the managed node GSIFI :
--------------------------------------------------------------------------
DECserver 700-16 V1.1 BL44-11 LAT V5.1 ROM V4.0-0 Uptime: 21 10:22:40
Address: 08-00-2B-38-EE-01 Name: GSUS02 Number: 0
Identification: SERVER GSUS02
Circuit Timer: 80 Password Limit: 3
Console Port: 1 Prompt: Local>
Inactivity Timer: 30 Queue Limit: 100
Keepalive Timer: 20 Retransmit Limit: 8
Multicast Timer: 30 Session Limit: 64
Node Limit: 200 Software: WWENG1
Service Groups: 1
Enabled Characteristics:
Announcements, Broadcast, Dump, Lock
Port 4: (Remote) Server: GSUS02
Character Size: 8 Input Speed: 9600
Flow Control: XON Output Speed: 9600
Parity: None Signal Control: Disabled
Stop Bits: Dynamic Signal Select: CTS-DSR-RTS-DTR
Access: Remote Local Switch: None
Backwards Switch: None Name: PCM_GSIFI
Break: Remote Session Limit: 4
Forwards Switch: None Type: Hard
Default Protocol: LAT
Preferred Service: None
Authorized Groups: 1
(Current) Groups: 1
Enabled Characteristics:
Failover, Input Flow Control, Lock, Loss Notification, Message Codes,
Output Flow Control, Verification
GSIFI::SYS$MANAGER:OPERATOR.LOG :
---------------------------------
%%%%%%%%%%% OPCOM 18-NOV-1994 00:00:01.24 %%%%%%%%%%%
Message from user SYSTEM on GSIFI
defragmentation process start
Process ID: 21810D16
Device: _$1$DIA6:
Time: 18-NOV-1994 00:00:01.02
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:08.42 %%%%%%%%%%% (from node GSISUR a
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:00:47.43
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = Invalid P(s) (1)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR a
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:17:34.46 %%%%%%%%%%% (from node GSISUR a
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:17:13.17
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = No additional informat
00:25:36.96, request 299 was completed by operator _GSISUR$LTA3760:
%%%%%%%%%%% OPCOM 18-NOV-1994 00:30:14.49 %%%%%%%%%%% (from node GSISUR a
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:29:52.91
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = Invalid P(s) (1)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:42:31.23 %%%%%%%%%%%
Logfile time stamp
CONSOLE$LOGFILES:GSIFI.LOG (Problem with "OPCOM 18-NOV-1994 00:01:15.27") :
----------------------------------------------------------------------------
%%%%%%%%%%% OPCOM 18-NOV-1994 00:00:01.24 %%%%%%%%%%%
Message from user SYSTEM on GSIFI
defragmentation process start
Process ID: 21810D16
Device: _$1$DIA6:
Time: 18-NOV-1994 00:00:01.02
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:08.42 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:01:57.33)
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:00:47.43
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = Invalid P(s) (1)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
>> Why is that message repeated? It is not repeated on the console!
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:17:34.46 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:18:23.36)
>> Look at the time of this message: it is later than that of the NEXT MESSAGE!
>> Why does it appear before it?
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:17:13.17
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = No additional informat
on (0)
%%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:02:04.18)
Request 299, from user EXPLOIT on GSISUR
BATCH_100, Montez la bande Y22601 (Répondre quand c'est pret)
00:25:36.96, request 299 was completed by operator _GSISUR$LTA3760:
%%%%%%%%%%% OPCOM 18-NOV-1994 00:30:14.49 %%%%%%%%%%% (from node GSISUR
18-NOV-1994 00:31:03.38)
Message from user DECNET on GSISUR
DECnet event 7.5, clear
From node 55.504 (GSXR02), 18-NOV-1994 01:29:52.91
Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = Invalid P(s) (1)
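The two anomalies commented in the log (a repeated OPCOM message and a message logged out of time order) can be counted mechanically. The sketch below is a rough Python illustration only, with invented function names. It assumes every OPCOM banner line carries its timestamp in the DD-MMM-YYYY HH:MM:SS.cc form seen above, and it compares consecutive banners only.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Matches only the OPCOM banner, not the DECnet event times inside a message.
HEADER_RE = re.compile(
    r"OPCOM\s+(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{2})")


def opcom_times(lines):
    """Extract the banner timestamp of every OPCOM message, in log order."""
    times = []
    for line in lines:
        match = HEADER_RE.search(line)
        if match:
            day, mon, year, hh, mm, ss, cc = match.groups()
            times.append(datetime(int(year), MONTHS[mon], int(day),
                                  int(hh), int(mm), int(ss), int(cc) * 10000))
    return times


def repeats_and_regressions(times):
    """Count consecutive banners with an equal time, and with an earlier time."""
    pairs = list(zip(times, times[1:]))
    repeats = sum(1 for a, b in pairs if b == a)
    regressions = sum(1 for a, b in pairs if b < a)
    return repeats, regressions


# Banner times in the order they appear in the GSIFI.LOG extract above.
stamps = ["00:00:01.24", "00:01:08.42"] + ["00:01:15.27"] * 5 + [
    "00:17:34.46", "00:01:15.27", "00:30:14.49"]
sample = [f"%%%%%%%%%%% OPCOM 18-NOV-1994 {t} %%%%%%%%%%%" for t in stamps]

print(repeats_and_regressions(opcom_times(sample)))
```

A non-zero second count corresponds to the situation flagged above, where the 00:17:34.46 message is followed by another copy of the 00:01:15.27 request.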
The "CONSOLE MONITOR GSIFI" screen (missing data between 00:23 and 07:15)
--------------------------------------------------------------------------------
*********************************************************************************
00:13:32
00:13:32 %%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from nod
00:13:32 18-NOV-1994 00:02:04.18)
00:13:32 Request 299, from user EXPLOIT on GSISUR
00:13:32 BATCH_100, Montez la bande Y22601 (R pondre quand c'est pret)
00:18:32
00:18:32 %%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from nod
00:18:32 18-NOV-1994 00:02:04.18)
00:18:32 Request 299, from user EXPLOIT on GSISUR
00:23:32 BATCH_100, Montez la bande Y22601 (R pondre quand c'est pret)
00:23:32
00:23:32 %%%%%%%%%%% OPCOM 18-NOV-1994 00:17:34.46 %%%%%%%%%%% (from nod
00:23:32 18-NOV-1994 00:18:23.36)
00:23:32 Message from user DECNET on GSISUR
00:23:32 DECnet event 7.5, clear
00:23:32 From node 55.504 (GSXR02), 18-NOV-1994 01:17:13.17
00:23:32 Module X25-PROTOCOL, DTE = 8010010061, Network = GSINET
00:23:32 Direction = Incoming, LCN = 2, Cause = 198, Diagnostic = No additional
00:23:32 on (0)
00:23:32
00:23:32
00:23:32 %%%%%%%%%%% OPCOM 18-NOV-1994 00:01:15.27 %%%%%%%%%%% (from nod
00:23:32 18-NOV-1994 00:02:04.18)
00:23:32 BATCH_100, Montez la bande YIT on GSISUR
>> No messages are displayed between 00:23 and 07:15!
07:15:44 %%%%%%%%%%% OPCOM 18-NOV-1994 06:11:52.38 %%%%%%%%%%% (from nod
07:15:44 18-NOV-1994 06:12:41.07)
07:15:44 Message from user DECNET on GSISUR
07:15:44 DECnet event 7.5, clear
07:15:44 From node 55.504 (GSXR02), 18-NOV-1994 07:11:30.76
07:15:44 Module X25-PROTOCOL, DTE = 8010010061, Network = TRANSPAC
*GSIFI - VAX 4100A cluster production Picard-Cezus | Fri Nov 18 1994 | New Data
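To put a number on the silence seen on the monitor screen, the HH:MM:SS prefix of each displayed line can be compared with that of the next line. This is only a small Python sketch under two assumptions: every data line starts with the display time, and all lines belong to the same day. The helper name is invented.

```python
import re
from datetime import timedelta

STAMP_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\b")


def largest_gap(lines):
    """Return (start, end, duration) of the longest silence between lines."""
    stamps = []
    for line in lines:
        match = STAMP_RE.match(line)
        if match:
            hours, minutes, seconds = map(int, match.groups())
            stamps.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))
    best = None
    for start, end in zip(stamps, stamps[1:]):
        if best is None or end - start > best[2]:
            best = (start, end, end - start)
    return best


# A few lines taken from the screen capture above.
sample = [
    "00:13:32 Request 299, from user EXPLOIT on GSISUR",
    "00:18:32 Request 299, from user EXPLOIT on GSISUR",
    "00:23:32 Message from user DECNET on GSISUR",
    "07:15:44 Message from user DECNET on GSISUR",
]

start, end, duration = largest_gap(sample)
print(f"no data from {start} to {end} ({duration})")
```

On these lines the longest gap runs from 00:23:32 to 07:15:44, that is 6 hours 52 minutes 12 seconds without any displayed data.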
|