| Title: | POLYCENTER Console Manager |
| Notice: | Kits, Scans, Docs on CSC32:: as PCM$KITS:,PCM$DOCS:, PCM$SCANS: |
| Moderator: | CSC32::BUTTERWORTH |
| Created: | Thu Aug 06 1992 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1541 |
| Total number of notes: | 6564 |
My customer is running PCM V1.5 ECO1 on a VAX/VMS station running V6.0.
From time to time the console processes go into a RWMBX wait state. The
corresponding mailbox does not show up as busy. Attached you can find
an SDA output of the running system. The messages in the mailboxes
seem to be error messages. No process is there to read them.
Thanks for any help,
Joerg
VAX/VMS System analyzer
SDA> show sum
Current process summary
-----------------------
Extended Indx Process name Username State Pri PCB PHD Wkset
-- PID -- ---- --------------- ----------- ------- --- -------- -------- ------
00000081 0001 SWAPPER HIB 16 826051A0 82605000 0
00000085 0005 IPCACP SYSTEM HIB 10 82C82AC0 831F4200 86
00000086 0006 ERRFMT SYSTEM HIB 8 82C830C0 832FA200 122
00000088 0008 AUDIT_SERVER AUDIT$SERVER HIB 10 82C838C0 83400200 82
00000089 0009 JOB_CONTROL SYSTEM HIB 10 82C83AC0 83483200 91
0000008A 000A QUEUE_MANAGER SYSTEM HIB 8 82C83EC0 83506200 142
0000168C 000C OPSMGR_2 OPSMGR LEF 6 82C9E440 84F1F200 94
00002B0D 000D OPCOM SYSTEM HIB 7 82C7FFC0 8337D200 138
00001790 0010 _FTA65: OPS HIB 8 82CB2180 848FB200 87
00000092 0012 NETACP DECNET HIB 10 82C87CC0 8368F200 822
00000093 0013 EVL DECNET HIB 6 82C884C0 83589200 122
00000094 0014 SNS$WATCHDOG SYSTEM HIB 5 82C896C0 8391E200 269
00000097 0017 REMACP SYSTEM HIB 9 82C818C0 839A1200 70
00000099 0019 SCHED_REMOTE SYSTEM LEF 6 82C898C0 83AA7200 54
0000009B 001B SMISERVER SYSTEM HIB 9 82C87AC0 83795200 52
0000009C 001C LATACP SYSTEM HIB 13 82C81AC0 8360C200 109
0000009E 001E UCX$INET_ACP INTERnet HIB 10 82C872C0 83818200 203
0000009F 001F DECW$SERVER_0 OPS HIB 6 82C882C0 83B2A200 5237
00001B20 0020 OPSMGR_1 OPSMGR LEF 6 82CA7D40 8414E200 97
00001DA1 0021 FIELD FIELD CUR 10 82C53E00 8435A200 1364
000000A5 0025 DECW$SESSION OPS LEF 7 82C854C0 83712200 87
000000A7 0027 DECW$MWM OPS LEF 4 82C866C0 83277200 811
000000A9 0029 VUE$OPS_3 OPS LEFO 5 82BB9940 00000000 337
000000AA 002A VUE$OPS_4 OPS LEFO 5 82C852C0 00000000 311
000000AB 002B VUE$OPS_5 OPS LEF 7 82C86CC0 83D36200 239
000000AC 002C DECW$TE_00AC OPS LEF 6 82C86AC0 83DB9200 4649
000000AD 002D _FTA8: OPS HIB 5 82C876C0 83A24200 79
000000AF 002F _FTA10: OPS HIB 6 82C880C0 83EBF200 87
000000B0 0030 Console Notify SYSTEM RWMBX 6 82C812C0 83F42200 1792
000000B1 0031 Console Daemon SYSTEM HIB 6 82C860C0 83FC5200 311
000000B2 0032 Console Ctrl 01 SYSTEM RWMBX 6 82C816C0 8389B200 3871
000000B3 0033 Console Ctrl 02 SYSTEM RWMBX 6 82C81EC0 84048200 3497
000000B4 0034 OPS OPS LEF 4 82C94580 840CB200 222
00002A37 0037 OPSMGR_3 OPSMGR HIB 6 82CA9040 843DD200 99
SDA> set proc/ind=30
SDA> show dev/add=@r5
I/O data structures
-------------------
MBA1408 MBX UCB address: 82C54900
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 1640 ORB address 82C35640
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 00000008 Byte count 0000 LNM address 82AFB140
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 00010030
*** I/O request queue is empty ***
SDA> define iob=ucb
SDA> @see_mbx
00000000 00B00321 00000002 00000001 ........!....... 82C56024
00230010 000C0042 000F0017 00000000 ........B.....#. 82C56034
72674D43 2EE6BE96 00000000 2FFC0007 .../........CMgr 82C56044
756F4620 544F4E20 656C6F73 6E6F4320 Console NOT Fou 82C56054
6567616E 616D454C 4F534E4F 4300646E nd.CONSOLEmanage 82C56064
6E65706F 206F7420 656C6261 6E550072 r.Unable to open 82C56074
73797320 726F6620 656C6F73 6E6F6320 console for sys 82C56084
59532520 2D203153 47574E43 206D6574 tem CNWGS1 - %SY 82C56094
6261202C 54524F42 412D462D 4D455453 STEM-F-ABORT, ab 82C560A4
736E6F43 20646567 616E614D 0074726F ort.Managed Cons 82C560B4
61766120 746F6E20 656E696C 20656C6F ole line not ava 82C560C4
4D430031 5347574E 4300656C 62616C69 ilable.CNWGS1.CM 82C560D4
6C6F736E 6F43006C 616E7265 746E4920 Internal.Consol 82C560E4
82C56100 00000072 6567616E 614D2065 e Manager....a.. 82C560F4
SDA> @see_mbx
0001FF30 000000B8 00000000 00000010 ............0... 82BC9924
SDA> set proc/ind=32
SDA> show dev/addr=@r5
I/O data structures
-------------------
MBA94 MBX UCB address: 82C57400
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 2737 ORB address 82BFB8C0
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 00000009 Byte count 0000 LNM address 82AFA170
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 00010030
*** I/O request queue is empty ***
SDA> @see_mbx
00000000 00B00322 00000005 00000001 ........"....... 82C8FFE4
002A0010 000C0023 000F0010 00000000 ........#.....*. 82C8FFF4
72674D43 2EE6BE51 00000000 2FFC0004 .../....Q...CMgr 82C90004
534E4F43 00746365 6E6E6F63 73694420 Disconnect.CONS 82C90014
20726573 55007265 67616E61 6D454C4F OLEmanager.User 82C90024
72662064 65746365 6E6E6F63 73696420 disconnected fr 82C90034
73550053 4541206D 65747379 73206D6F om system AES.Us 82C90044
63656E6E 6F637369 64207361 68207265 er has disconnec 82C90054
454C4F53 4E4F4320 6D6F7266 20646574 ted from CONSOLE 82C90064
49204D43 00534541 00726567 616E616D manager.AES.CM I 82C90074
20656C6F 736E6F43 006C616E 7265746E nternal.Console 82C90084
4D430031 5347574E 00726567 616E614D Manager.NWGS1.CM 82C90094
SDA> set proc/ind=33
SDA> show dev/addr=@r5
I/O data structures
-------------------
MBA90 MBX UCB address: 82C55C00
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 432 ORB address 82BF8B00
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 0000000A Byte count 0000 LNM address 82AFA070
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 00010030
*** I/O request queue is empty ***
SDA> @see_mbx
00000000 00B00323 00000001 00000001 ........#....... 82C78BE4
00220010 000C004B 000F0012 00000000 ........K.....". 82C78BF4
72674D43 2EE6BEB2 00000000 2FFC0007 .../........CMgr 82C78C04
4F430074 736F4C20 656C6F73 6E6F4320 Console Lost.CO 82C78C14
6E6F4300 72656761 6E616D45 4C4F534E NSOLEmanager.Con 82C78C24
79732072 6F66206B 6E696C20 656C6F73 sole link for sy 82C78C34
74736F6C 2031424E 43535620 6D657473 stem VSCNB1 lost 82C78C44
20646573 696E676F 6365726E 55202D20 - Unrecognised 82C78C54
646F6320 726F7272 65206D65 74737973 system error cod 82C78C64
69746365 6E6E6F43 00363138 35203A65 e: 5816.Connecti 82C78C74
67616E61 6D206F74 2074736F 6C206E6F on lost to manag 82C78C84
31424E43 5356006D 65747379 73206465 ed system.VSCNB1 82C78C94
6E6F4300 6C616E72 65746E49 204D4300 .CM Internal.Con 82C78CA4
82C92D00 72656761 6E614D20 656C6F73 sole Manager.-.. 82C78CB4
SDA> @see_mbx
0000FFD0 000000BB 00000000 00000010 ................ 82BD61A4
SDA> show proc/chan
Process index: 0033 Name: Console Ctrl 02 Extended PID: 000000B3
--------------------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DKA300:
0020 82BC9B00 DKA300:[CONSOLE.IMAGES]CONSOLE$DAEMON.EXE;6
0030 82BCCD40 DKA300:[VMS$COMMON.SYSLIB]SMGSHR.EXE;1 (section file)
0040 82BCDFC0 DKA300:[VMS$COMMON.SYSLIB]LIBRTL.EXE;3 (section file)
0050 82BCC680 DKA300:[VMS$COMMON.SYSLIB]PTD$SERVICES_SHR.EXE;1 (section file)
0060 82BCE580 DKA300:[VMS$COMMON.SYSLIB]DECC$SHR.EXE;1 (section file)
0070 82BCE640 DKA300:[VMS$COMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (section file)
0080 82BC9D40 DKA300:[VMS$COMMON.SYSLIB]UVMTHRTL.EXE;3 (section file)
0090 82BD0EC0 DKA300:[VMS$COMMON.SYSLIB]DECW$XLIBSHR.EXE;3 (section file)
00A0 82BCE280 DKA300:[VMS$COMMON.SYSLIB]VAXCRTL.EXE;3 (section file)
00B0 82BCE700 DKA300:[VMS$COMMON.SYSLIB]DECW$TRANSPORT_COMMON.EXE;1 (section file)
00C0 82BD1C80 DKA300:[VMS$COMMON.SYSLIB]DECW$DXMLIBSHR.EXE;1 (section file)
00D0 82BCCA40 DKA300:[VMS$COMMON.SYSLIB]LBRSHR.EXE;3 (section file)
00E0 82BD03C0 DKA300:[VMS$COMMON.SYSLIB]DECW$DWTLIBSHR.EXE;3 (section file)
00F0 82BD1A80 DKA300:[VMS$COMMON.SYSLIB]DECW$XMLIBSHR.EXE;1 (section file)
0100 82BD1200 DKA300:[VMS$COMMON.SYSLIB]DECW$XEXTLIBSHR.EXE;1 (section file)
0110 82BD5680 DKA300:[VMS$COMMON.SYSLIB]DECW$TERMINALSHR.EXE;1 (section file)
0120 00000000 MBA88:
0130 00000000 Busy MBA89:
0140 00000000 MBA90:
0150 00000000 Busy MBA91:
0160 00000000 LTA5578:
0170 82C04740 DKA300:[CONSOLE.LOG]AES.TIMES;1
0180 82C0F900 DKA300:[CONSOLE.LOG]AES.EVENTS;1
0190 82C1B300 DKA300:[CONSOLE.LOG]AES.LOG;1
01A0 82BF0DC0 DKA300:[CONSOLE.LOG]AXCRNA.TIMES;1
01B0 00000000 LTA5064:
01C0 82BEF740 DKA300:[CONSOLE.LOG]ATLAS1.TIMES;1
01D0 82BEF2C0 DKA300:[CONSOLE.LOG]ATLAS1.EVENTS;1
01E0 82C31080 DKA300:[CONSOLE.LOG]ATLAS1.LOG;1
01F0 00000000 Busy MBA1225:
0200 00000000 LTA5065:
0210 82C3D500 DKA300:[CONSOLE.LOG]ATLAS3.TIMES;1
0220 82C21FC0 DKA300:[CONSOLE.LOG]ATLAS3.EVENTS;1
0230 82C2F040 DKA300:[CONSOLE.LOG]ATLAS3.LOG;1
0240 00000000 Busy MBA1226:
0250 82C00480 DKA300:[CONSOLE.LOG]AXCRNA.EVENTS;1
0260 82BF4D80 DKA300:[CONSOLE.LOG]AXCRNA.LOG;1
0270 00000000 Busy MBA1953:
0280 00000000 LTA5579:
0290 82C23040 DKA300:[CONSOLE.LOG]DXCERN.TIMES;1
02A0 00000000 LTA5075:
02B0 82C19140 DKA300:[CONSOLE.LOG]AXCRNB.TIMES;1
02C0 82BF6A00 DKA300:[CONSOLE.LOG]AXCRNB.EVENTS;1
02D0 82C04C80 DKA300:[CONSOLE.LOG]AXCRNB.LOG;1
02E0 00000000 Busy MBA1234:
02F0 00000000 LTA5080:
0300 82C3FD80 DKA300:[CONSOLE.LOG]AXCRNC.TIMES;1
0310 82C28B00 DKA300:[CONSOLE.LOG]AXCRNC.EVENTS;1
0320 82C0C900 DKA300:[CONSOLE.LOG]AXCRNC.LOG;1
0330 00000000 Busy MBA1239:
0340 00000000 LTA5066:
0350 82BFB2C0 DKA300:[CONSOLE.LOG]CHORUS_TEST.TIMES;1
0360 82C30A80 DKA300:[CONSOLE.LOG]CHORUS_TEST.EVENTS;1
0370 82C3BAC0 DKA300:[CONSOLE.LOG]CHORUS_TEST.LOG;1
0380 00000000 Busy MBA1227:
0390 00000000 Busy LTA5067:
03A0 82C47C40 DKA300:[CONSOLE.LOG]CMS1.TIMES;1
03B0 82C2C400 DKA300:[CONSOLE.LOG]CMS1.EVENTS;1
03C0 82C4D580 DKA300:[CONSOLE.LOG]CMS1.LOG;1
03D0 00000000 Busy MBA1228:
03E0 00000000 Busy LTA5068:
03F0 82C2E980 DKA300:[CONSOLE.LOG]CMS2.TIMES;1
0400 82C0F9C0 DKA300:[CONSOLE.LOG]CMS2.EVENTS;1
0410 82C135C0 DKA300:[CONSOLE.LOG]CMS2.LOG;1
0420 00000000 Busy MBA1229:
0430 82C3F240 DKA300:[CONSOLE.LOG]DXCERN.EVENTS;1
0440 82C25D40 DKA300:[CONSOLE.LOG]DXCERN.LOG;1
0450 00000000 Busy MBA1954:
0460 00000000 Busy LTA5580:
0470 82C0C480 DKA300:[CONSOLE.LOG]HSC1.TIMES;1
0480 82C2FA00 DKA300:[CONSOLE.LOG]HSC1.EVENTS;1
0490 82C2FF40 DKA300:[CONSOLE.LOG]HSC1.LOG;1
04A0 00000000 Busy MBA1955:
04B0 00000000 Busy LTA5581:
04C0 82C26100 DKA300:[CONSOLE.LOG]HSC8.TIMES;1
04D0 00000000 Busy LTA5583:
04E0 82BF8740 DKA300:[CONSOLE.LOG]HSC7.TIMES;1
04F0 82C1FA40 DKA300:[CONSOLE.LOG]HSC7.EVENTS;1
0500 82BEAD00 DKA300:[CONSOLE.LOG]HSC7.LOG;1
0510 00000000 Busy MBA1964:
0520 82C06F00 DKA300:[CONSOLE.LOG]HSC8.EVENTS;1
0530 82C42600 DKA300:[CONSOLE.LOG]HSC8.LOG;1
0540 00000000 Busy MBA1956:
0570 00000000 LTA5081:
0580 82C4B6C0 DKA300:[CONSOLE.LOG]UXACB.TIMES;1
0590 82C516C0 DKA300:[CONSOLE.LOG]UXACB.EVENTS;1
05A0 82BFDF00 DKA300:[CONSOLE.LOG]UXACB.LOG;1
05B0 00000000 Busy MBA1240:
05C0 00000000 LTA5082:
05D0 82C47640 DKA300:[CONSOLE.LOG]UXCSB1.TIMES;1
05E0 82C28200 DKA300:[CONSOLE.LOG]UXCSB1.EVENTS;1
05F0 82C2F7C0 DKA300:[CONSOLE.LOG]UXCSB1.LOG;1
0600 00000000 Busy MBA1241:
0610 00000000 LTA5076:
0620 82C24FC0 DKA300:[CONSOLE.LOG]UXPLOT.TIMES;1
0630 82C44580 DKA300:[CONSOLE.LOG]UXPLOT.EVENTS;1
0640 82C33240 DKA300:[CONSOLE.LOG]UXPLOT.LOG;1
0650 00000000 Busy MBA1235:
0660 82BCD840 DKA300:[VMS$COMMON.SYSMSG]SHRIMGMSG.EXE;1 (section file)
0670 82BCEAC0 DKA300:[VMS$COMMON.SYSMSG]VAXCMSG.EXE;1 (section file)
0680 82BD4100 DKA300:[VMS$COMMON.SYSMSG]DECW$XLIBMSG.EXE;1 (section file)
0690 82BD3F00 DKA300:[VMS$COMMON.SYSMSG]DECW$TRANSPORTMSG.EXE;1 (section file)
06A0 82BD5B00 DKA300:[VMS$COMMON.SYSMSG]DECW$TERMINALMSG.EXE;1 (section file)
06B0 00000000 NLA0:
06C0 00000000 NLA0:
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 510.1 | OPG::PHILIP | And through the square window... | Thu Dec 08 1994 11:35 | 12 | |
Joerg,
It is quite normal for the PCM processes to go into RWMBX, as you
normally have lots of controller daemons feeding one notification
process. The data you see in the mailbox are not error messages, they
are in fact events!!
Are you saying that the processes never leave the resource wait state?
Cheers,
Phil
| |||||
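For readers checking the mailbox contents themselves: the dump rows in .0 print four longwords per row with the highest address on the left, so the hex has to be read right to left to recover the text. The following is a minimal sketch of that decoding, assuming the row layout shown in the dumps above (four 8-digit hex longwords first, VAX little-endian byte order). The function names are hypothetical and not part of SDA or PCM.

```python
def decode_sda_row(row: str) -> bytes:
    """Return the 16 bytes of one dump row in ascending address order."""
    # The first four whitespace-separated fields are the longwords;
    # anything after them (ASCII column, address) is ignored.
    words = row.split()[:4]
    # The rightmost longword has the lowest address, so reverse the row,
    # then emit each longword least-significant byte first (little-endian).
    return b"".join(int(w, 16).to_bytes(4, "little") for w in reversed(words))


def printable(data: bytes) -> str:
    """Render bytes the way a dump's text column does: '.' for non-printables."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# Two consecutive rows taken from the first mailbox dump in .0
rows = [
    "756F4620 544F4E20 656C6F73 6E6F4320",
    "6567616E 616D454C 4F534E4F 4300646E",
]
text = "".join(printable(decode_sda_row(r)) for r in rows)
print(text)
```

Joining consecutive rows this way makes the NUL-separated fields of each event (class, text, system name) easier to read than the row-by-row text column.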
| 510.2 | OPCO::TSG_SJM | Coming live to you from Rosebery | Thu Dec 08 1994 20:58 | 5 | |
This is the same problem we were experiencing, until I reduced the
number of systems per control process down to 5. Now PCM runs as
advertised.
Steve
| |||||
| 510.3 | OPG::PHILIP | And through the square window... | Thu Dec 08 1994 21:34 | 14 | |
Steve, et al.
>> This is the same problem we were experiencing, until I reduced the
>> number of systems per control process down to 5. Now PCM runs as
>> advertised.
In that case, could you try the new CONSOLE$DAEMON.EXE_VAX that's in
OPG::CM$KIT:? I have done a lot of work adjusting the quotas the
controller daemons are started with. Hopefully the new quotas will
alleviate some of the problems people have been experiencing, but we do
need some feedback about how it works in your environments.
Cheers,
Phil
| |||||
| 510.4 | Process stays forever in RWMBX state | LEMAN::NEUWEILER | System Support Geneva | Fri Dec 09 1994 10:34 | 6 |
The processes stay forever in the RWMBX state. I will try the new
CONSOLE$DAEMON.EXE_VAX and report back if it doesn't fix the problem.
Thanks for your help,
Joerg
| |||||
| 510.5 | Tried it... | OPCO::TSG_SJM | Coming live to you from Rosebery | Mon Dec 12 1994 20:58 | 8 |
Phil,
I have copied the new daemon image, set the number of systems per
controller back up to 16, and restarted PCM. I will let you know how
we go.
Thanks
Steve
| |||||
| 510.6 | how to limit systems per controller ? | LEMAN::NEUWEILER | System Support Geneva | Tue Dec 13 1994 11:01 | 8 |
.5
Question from my customer:
How do you limit the number of systems per controller?
Thanks for your help
Joerg
| |||||
| 510.7 | OPG::PHILIP | And through the square window... | Tue Dec 13 1994 12:23 | 8 | |
Answer to your customer:
You don't; changing this is strictly unsupported. Hopefully, the RWMBX
problems will disappear with the MUP kit.
Cheers,
Phil
| |||||
| 510.8 | Almost got it... | OPCO::TSG_SJM | Coming live to you from Rosebery | Tue Dec 13 1994 23:50 | 21 |
Phil,
We ran with the new version of the daemon image (compile date 5/12/94)
successfully for about 24 hours, but then several of the control
processes and the notify process went into RWMBX. Slowly one after the
other, the remaining control processes went into RWMBX.
I was in the process of identifying which of the processes was the
culprit, when they all went out of RWMBX. Slowly the systems began
reconnecting to PCM, and then rather quickly began disconnecting
themselves again.
I have since gone back to the previous image, and also changed that
other parameter back, as everything was running fine under that
configuration.
I think you have almost got it, considering that before it only ran for
about 4-5 hours before falling over. I'll try another image if you
want to have another go at it.
Steve
| |||||
| 510.9 | CM and OSCint | BER::MUELLER | Wed Dec 14 1994 08:35 | 26 | |
----------------------------------------------------------------------------
This entry is cross-posted in the OSCINT conference
----------------------------------------------------------------------------
Hello,
I'm running CM 1.5 ECO 1 on VMS 6.1 together with OSCint V2, and I also
have a problem with processes getting into the RWMBX state.
The following processes went into the RWMBX state about every 2 hours:
Console Notify
Console Ctrl 01
Console Ctrl 02
OSCINT$PSW_PCM
I copied over the latest CONSOLE$DAEMON.EXE mentioned in this topic and
tried to run CM with this new image. I still have the RWMBX problem, but
now only for the Console Notify and OSCINT$PSW_PCM processes.
When I run CM without OSCint it seems to be fine (at least for 1 day).
Sabine Mueller
OSC-GY
| |||||
| 510.10 | OPG::PHILIP | And through the square window... | Wed Dec 14 1994 09:51 | 19 | |
Steve,
Sorry it's not working for you. You don't happen to be running OSCint
too, do you?
Sabine,
When the processes go into RWMBX, is it as a result of a large number of
events from PSW via the PSC to PCM feeder? As you don't appear to be
having problems when the PSW feeder isn't running, that may well be the
cause of our problems!
I am clutching at straws here as I don't really know what the problem
is. I only know that we CANNOT get this stuff to behave badly on our
test configuration; however, we are running a pure PCM installation,
with no other external applications.
Cheers,
Phil
| |||||
| 510.11 | BER::MUELLER | Wed Dec 14 1994 12:45 | 14 | ||
>> When the processes go into RWMBX, is it as a result of a large number of
>> events from PSW via the PSC to PCM feeder? It would suggest that
This morning I removed the filter which does automatic repairs
(it automatically restarts printqueue), and OSCint was coming up fine.
I could reproduce the behaviour with the RWMBX when I included the
repair again.
Before this RWMBX problem appeared the autom. repair was running and
behaving normally.
- Sabine
| |||||
| 510.12 | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Wed Dec 14 1994 18:12 | 15 | |
I haven't said anything up to now, but there seem to be a lot of
occurrences of late, so here goes.
I have been able to take a brief look at only one of these situations,
as the customer had to get it back up quickly. What happens here is that
ENS backs up waiting on an action to pull stuff from a mailbox, which
can eventually cause *all* of the controller daemons to go into RWMBX.
In the one case I looked at, it appeared the action routine had
terminated abnormally and the timing was such that ENS got stuck in the
RWMBX state. I too have never been able to reproduce the exact scenario
that I dealt with. What needs to be done is for us to access a
broken system and be given the time to really debug this.
Regs,
Dan
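When a system is caught in this state, a quick first step is to list which processes are in RWMBX from a saved process summary. The following is a minimal sketch, assuming the column layout of the SDA summaries posted in this topic (PID, index, process name, optional username, state, priority, PCB, PHD, working set). The function name is hypothetical and this is not a PCM or SDA facility.

```python
def rwmbx_processes(summary_lines):
    """Return (pid, process name) pairs for processes in the RWMBX state."""
    hits = []
    for line in summary_lines:
        tokens = line.split()
        # The last five fields are state, priority, PCB, PHD and working set,
        # so the state is always the fifth token from the end.
        if len(tokens) < 8 or tokens[-5] != "RWMBX":
            continue
        # Everything between the index and the state is the process name
        # (which may contain spaces) followed by the username.
        middle = tokens[2:-5]
        name = " ".join(middle[:-1]) if len(middle) > 1 else middle[0]
        hits.append((tokens[0], name))
    return hits


# Lines copied from the first process summary in .0
sample = [
    "000000B0 0030 Console Notify SYSTEM RWMBX 6 82C812C0 83F42200 1792",
    "000000B1 0031 Console Daemon SYSTEM HIB 6 82C860C0 83FC5200 311",
    "000000B2 0032 Console Ctrl 01 SYSTEM RWMBX 6 82C816C0 8389B200 3871",
    "000000B3 0033 Console Ctrl 02 SYSTEM RWMBX 6 82C81EC0 84048200 3497",
]
stuck = rwmbx_processes(sample)
for pid, name in stuck:
    print(pid, name)
```

Parsing from the right-hand end avoids having to guess where a multi-word process name such as "Console Ctrl 01" stops.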
| |||||
| 510.13 | Access to system | OPCO::TSG_SJM | Coming live to you from Rosebery | Wed Dec 14 1994 21:47 | 13 |
Phil
Nope not currently running OSCint, but when I was I was having all
sorts of problems, these have now disappeared with the deinstallation
of OSCint. The main problem we were experiencing with OSCint was that
ENS would shutdown on a very regular basis.
If you want to look at the system running PCM here, I can arrange that.
The system running PCM isn't on the network, we can discuss access
off-line.
Cheers
Steve
| |||||
| 510.14 | available for testing | BER::MUELLER | Thu Dec 15 1994 08:53 | 11 | |
If you are interested in the CM / OSCint configuration you can
access my system. As I mentioned before, I could reproduce the 'error'.
Unfortunately, today I have to use CM to access a remote system for
installation, but tomorrow testing won't be a problem.
Just give me a call.
- Sabine
859-3334
| |||||
| 510.15 | RWMBX again | LEMAN::NEUWEILER | System Support Geneva | Fri Dec 16 1994 10:45 | 332 |
The RWMBX problem reappeared with the new CONSOLE$DAEMON.EXE
installed. There had been a temporary network problem at the customer site,
the console processes went into RWMBX state and never left it again.
At that point the customer took a crash dump.
Attached you can find an extract showing the situation:
Current process summary
-----------------------
Extended Indx Process name Username State Pri PCB PHD Wkset
-- PID -- ---- --------------- ----------- ------- --- -------- -------- ------
00000081 0001 SWAPPER HIB 16 826051A0 82605000 0
00000B02 0002 OPSMGR_1 OPSMGR LEF 6 82CC8200 844E3200 131
00000085 0005 IPCACP SYSTEM HIB 10 82C99A40 831F4200 86
00000086 0006 ERRFMT SYSTEM HIB 8 82C9A040 832FA200 113
00000088 0008 AUDIT_SERVER AUDIT$SERVER HIB 10 82C9A840 83400200 143
00000089 0009 JOB_CONTROL SYSTEM HIB 10 82C9AA40 83483200 63
0000008A 000A QUEUE_MANAGER SYSTEM HIB 8 82C9AC40 83506200 153
00000E8E 000E _RTA5: OPSMGR LEF 5 82CA1A40 84878200 1190
00000B10 0010 UCX$RSHD_BG49 WGSMONIT LEF 7 82CB2E00 84772200 288
00000091 0011 NETACP DECNET HIB 10 82C9E640 8368F200 903
00000092 0012 EVL DECNET HIB 4 82C9EE40 83589200 127
00000093 0013 SNS$WATCHDOG SYSTEM HIB 5 82CA0040 8389B200 357
00001315 0015 TAMI TAMI LEF 5 82C6BD00 847F5200 366
00000096 0016 NSCHED SYSTEM LEF 8 82C97C40 83795200 150
00000097 0017 REMACP SYSTEM HIB 9 82C99C40 83A24200 70
00000098 0018 SCHED_REMOTE SYSTEM LEF 6 82C9AE40 83AA7200 70
00001219 0019 TAMI_1 TAMI LEF 6 82D379C0 8497E200 262
0000121A 001A TAMI_2 TAMI LEF 6 82CF1780 84A84200 1027
0000131B 001B TAMI_3 TAMI HIB 5 82CE5240 83E3C200 1115
0000009C 001C SMISERVER SYSTEM HIB 9 82C9C440 8391E200 65
0000009D 001D UCX$INET_ACP INTERnet HIB 10 82C9FE40 8360C200 267
0000009E 001E OPSMGR OPSMGR LEF 6 82C99240 83712200 310
0000141F 001F DECW$TE_141F TAMI LEF 7 82CBE200 84B07200 187
000000A0 0020 LATACP SYSTEM HIB 14 82C9D840 83B2A200 117
00001421 0021 TAMI_4 TAMI LEF 5 82C92F80 84B8A200 368
000000A6 0026 DECW$SERVER_0 OPS HIB 6 82C99840 83DB9200 5471
00001627 0027 FIELD FIELD LEF 10 82C9F840 84566200 272
000000A8 0028 DECW$SESSION OPS LEF 7 82C9C640 83EBF200 852
000000AA 002A DECW$MWM OPS LEF 4 82C9DA40 83818200 959
000000AC 002C VUE$OPS_3 OPS LEF 5 82C9E440 83F42200 411
000000AD 002D VUE$OPS_4 OPS LEF 5 82C9F240 83FC5200 385
000000AE 002E DECW$TE_00AE OPS LEF 7 82C9F440 84048200 4538
000000AF 002F _FTA5: OPS LEF 6 82CA1040 83277200 225
00000130 0030 _FTA9: OPS LEF 5 82C6B700 8435A200 284
000000B1 0031 OPS OPS LEF 10 82C9EC40 8414E200 255
00000BB8 0038 _RTA4: CLUSMGR LEF 8 82CA5E00 843DD200 336
00000BB9 0039 _RTA3: OPSMGR LEF 5 82C96900 83BAD200 307
00000E41 0041 Console Notify OPSMGR RWMBX 6 82CA0C40 842D7200 1337
000013C2 0042 Console Daemon OPSMGR HIB 5 82CB1940 84254200 677
000013C3 0043 Console Ctrl 01 OPSMGR RWMBX 6 82C9DC40 8466C200 2864
000013C4 0044 Console Ctrl 02 OPSMGR RWMBX 6 82CE5580 84E19200 3004
000011C5 0045 CLUSMGR CLUSMGR LEF 5 82CE5A40 845E9200 372
000000CB 004B _FTA8: OPS LEF 5 82C9BC40 83D36200 276
000014D0 0050 SERVER_0014 OPSMGR LEF 7 82C66B00 848FB200 423
00000951 0051 BATCH_294 OPSMGR LEF 6 82CD1280 846EF200 736
00001752 0052 SERVER_0012 OPSMGR LEF 7 82C65F80 841D1200 437
000016D3 0053 SERVER_0011 OPSMGR LEF 7 82C88300 84A01200 423
00000AD4 0054 OPCOM SYSTEM HIB 7 82C9E040 8337D200 146
00000D55 0055 SERVER_000F OPSMGR LEF 7 82C82680 84D96200 435
00000CD7 0057 OPSMGR_2 OPSMGR HIB 6 82C62C00 83CB3200 331
00001358 0058 DECW$TE_1358 OPSMGR LEF 7 82C69240 83C30200 1534
000016D9 0059 OPSMGR_3 OPSMGR LEF 4 82C8A240 84460200 1184
000003E3 0063 _FTA12: OPS LEF 5 82CC2940 839A1200 254
SDA> set proc/ind=41
SDA> show proc/chan
Process index: 0041 Name: Console Notify Extended PID: 00000E41
-------------------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DKA300:
0020 82C93B80 DKA300:(3039,7,0)
0030 82BCCDC0 DKA300:(1108,3,0) (section file)
0040 82BCB600 DKA300:(191,3,0) (section file)
0050 82BCE040 DKA300:(643,3,0) (section file)
0060 82C98040 DKA300:(5266,2,0) (section file)
0070 00000000 NLA0:
0080 82C262C0 DKA300:(303,159,0)
0090 82BCC700 DKA300:(1095,3,0) (section file)
00A0 82BCE600 DKA300:(415,3,0) (section file)
00B0 82BCE6C0 DKA300:(400,3,0) (section file)
00C0 82BC9DC0 DKA300:(1232,2,0) (section file)
00D0 82BD5BC0 DKA300:(2011,2,0) (section file)
00E0 82BCE300 DKA300:(1233,2,0) (section file)
00F0 82BD1E40 DKA300:(1437,2,0) (section file)
0100 82BD6D00 DKA300:(2001,2,0) (section file)
0110 82BCCAC0 DKA300:(642,54,0) (section file)
0120 82BD6300 DKA300:(1999,2,0) (section file)
0130 82BD6A00 DKA300:(2012,2,0) (section file)
0140 82BD60C0 DKA300:(2009,2,0) (section file)
0150 82BCFC00 DKA300:(2004,2,0) (section file)
0160 00000000 DKA300:
0170 00000000 Busy MBA8360:
0180 00000000 MBA8361:
0190 00000000 Busy MBA8362:
01A0 00000000 Busy MBA8363:
01B0 00000000 MBA8364:
01C0 00000000 Busy MBA8365:
01D0 00000000 MBA8366:
01E0 00000000 Busy MBA8367:
01F0 00000000 MBA8368:
0220 00000000 MBA8372:
0230 00000000 MBA8373:
0240 00000000 MBA8378:
0250 00000000 MBA8379:
0260 00000000 MBA8380:
0270 00000000 MBA8381:
0280 00000000 Busy MBA8805:
0290 00000000 MBA8806:
0320 00000000 Busy MBA8812:
0330 00000000 MBA8813:
0340 00000000 Busy MBA8824:
0350 00000000 MBA8825:
SDA> show dev/addr=@r5
I/O data structures
-------------------
MBA8825 MBX UCB address: 82C63500
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 1864 ORB address 82C27700
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 00000008 Byte count 0000 LNM address 82AFDEA0
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 001C0041
*** I/O request queue is empty ***
SDA> set proc/ind=43
SDA> show dev/addr=@r5
I/O data structures
-------------------
MBA8378 MBX UCB address: 82C68000
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 162 ORB address 82C1C840
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 00000008 Byte count 0000 LNM address 82AFB280
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 001C0041
*** I/O request queue is empty ***
SDA> show proc/chan
Process index: 0043 Name: Console Ctrl 01 Extended PID: 000013C3
--------------------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DKA300:
0020 82BCE380 DKA300:(469,278,0)
0030 82BCCDC0 DKA300:(1108,3,0) (section file)
0040 82BCB600 DKA300:(191,3,0) (section file)
0050 82BCE040 DKA300:(643,3,0) (section file)
0060 82C98040 DKA300:(5266,2,0) (section file)
0070 82C3C340 DKA300:(3425,24,0)
0080 82C26500 DKA300:(3016,75,0)
0090 82BCC700 DKA300:(1095,3,0) (section file)
00A0 82BCE600 DKA300:(415,3,0) (section file)
00B0 82BCE6C0 DKA300:(400,3,0) (section file)
00C0 82BC9DC0 DKA300:(1232,2,0) (section file)
00D0 82BD5BC0 DKA300:(2011,2,0) (section file)
00E0 82BCE300 DKA300:(1233,2,0) (section file)
00F0 82BD1E40 DKA300:(1437,2,0) (section file)
0100 82BD6D00 DKA300:(2001,2,0) (section file)
0110 82BCCAC0 DKA300:(642,54,0) (section file)
0120 82BD6300 DKA300:(1999,2,0) (section file)
0130 82BD6A00 DKA300:(2012,2,0) (section file)
0140 82BD60C0 DKA300:(2009,2,0) (section file)
0150 82BCFC00 DKA300:(2004,2,0) (section file)
0160 00000000 DKA300:
0170 00000000 MBA8374:
0180 00000000 MBA8375:
0190 00000000 MBA8378:
01A0 00000000 Busy MBA8379:
0480 00000000 LTA5105:
0490 82C0A6C0 DKA300:(3725,5,0)
04A0 82C52180 DKA300:(3726,114,0)
04B0 82C20740 DKA300:(3720,106,0)
04C0 00000000 Busy MBA8424:
0520 00000000 LTA5109:
0530 82C05440 DKA300:(485,11,0)
0540 82C2E3C0 DKA300:(486,127,0)
0550 82BFAF40 DKA300:(492,56,0)
0560 00000000 Busy MBA8428:
0570 00000000 LTA5111:
0580 82C2F380 DKA300:(505,20,0)
0590 82C4C180 DKA300:(520,135,0)
05A0 82BF1E80 DKA300:(525,202,0)
05B0 00000000 Busy MBA8430:
06F0 82BCD8C0 DKA300:(1310,2,0) (section file)
0700 82BCEB40 DKA300:(1316,2,0) (section file)
0710 82BD9D80 DKA300:(2028,2,0) (section file)
0720 82BD96C0 DKA300:(1450,2,0) (section file)
0730 82BD90C0 DKA300:(2026,2,0) (section file)
SDA> set proc/ind=44
SDA> show dev/addr=@r5
I/O data structures
-------------------
MBA8380 MBX UCB address: 82C86F00
Device status: 00000010 online
Characteristics: 0C150001 rec,shr,avl,mbx,idv,odv
00000200 nnm
Owner UIC [000001,000004] Operation count 1613 ORB address 82BF7880
PID 00000000 Error count 0 DDB address 8286BAF8
Class/Type A0/01 Reference count 2 DDT address 8258CF08
Def. buf. size 1024 BOFF 0000 CRB address 8286C28C
DEVDEPEND 00000009 Byte count 0000 LNM address 82AFB340
DEVDEPND2 00000000 SVAPTE 00000000 I/O wait queue empty
FLCK index 28 DEVSTS 0002
DLCK address 82866800
Charge PID 001C0041
*** I/O request queue is empty ***
SDA> show proc/chan
Process index: 0044 Name: Console Ctrl 02 Extended PID: 000013C4
--------------------------------------------------------------------
Process active channels
-----------------------
Channel Window Status Device/file accessed
------- ------ ------ --------------------
0010 00000000 DKA300:
0020 82BDA840 DKA300:(469,278,0)
0030 82BCCDC0 DKA300:(1108,3,0) (section file)
0040 82BCB600 DKA300:(191,3,0) (section file)
0050 82BCE040 DKA300:(643,3,0) (section file)
0060 82C98040 DKA300:(5266,2,0) (section file)
0070 82C34A80 DKA300:(3441,97,0)
0080 82BFA040 DKA300:(3440,23,0)
0090 82BCC700 DKA300:(1095,3,0) (section file)
00A0 82BCE600 DKA300:(415,3,0) (section file)
00B0 82BCE6C0 DKA300:(400,3,0) (section file)
00C0 82BC9DC0 DKA300:(1232,2,0) (section file)
00D0 82BD5BC0 DKA300:(2011,2,0) (section file)
00E0 82BCE300 DKA300:(1233,2,0) (section file)
00F0 82BD1E40 DKA300:(1437,2,0) (section file)
0100 82BD6D00 DKA300:(2001,2,0) (section file)
0110 82BCCAC0 DKA300:(642,54,0) (section file)
0120 82BD6300 DKA300:(1999,2,0) (section file)
0130 82BD6A00 DKA300:(2012,2,0) (section file)
0140 82BD60C0 DKA300:(2009,2,0) (section file)
0150 82BCFC00 DKA300:(2004,2,0) (section file)
0160 00000000 DKA300:
0170 00000000 MBA8376:
0180 00000000 MBA8377:
0190 00000000 MBA8380:
01A0 00000000 Busy MBA8381:
02A0 00000000 LTA5094:
02B0 82BCD8C0 DKA300:(1310,2,0) (section file)
02C0 82BCEB40 DKA300:(1316,2,0) (section file)
02D0 82BD9D80 DKA300:(2028,2,0) (section file)
02E0 82BD96C0 DKA300:(1450,2,0) (section file)
02F0 82BD90C0 DKA300:(2026,2,0) (section file)
0300 82C022C0 DKA300:(3552,59,0)
0310 82C06640 DKA300:(3556,54,0)
0320 82C5D700 DKA300:(4111,4,0)
0330 00000000 Busy MBA8401:
0390 00000000 LTA5098:
03A0 82BF3E00 DKA300:(3745,43,0)
03B0 82BFDAC0 DKA300:(3746,322,0)
03C0 82C19E40 DKA300:(3744,196,0)
03D0 00000000 Busy MBA8409:
03E0 00000000 LTA5100:
03F0 82C10300 DKA300:(3749,69,0)
0400 82C42C40 DKA300:(3751,150,0)
0410 82C52E40 DKA300:(3748,203,0)
0420 00000000 Busy MBA8411:
04D0 00000000 LTA5106:
04E0 82C2E840 DKA300:(3759,341,0)
04F0 82C0C1C0 DKA300:(3761,46,0)
0500 82C03F40 DKA300:(3757,191,0)
0510 00000000 Busy MBA8425:
05C0 00000000 LTA5112:
05D0 82C46480 DKA300:(3778,58,0)
05E0 82BF8600 DKA300:(3780,304,0)
05F0 82C3A240 DKA300:(3777,130,0)
0600 00000000 Busy MBA8431:
0610 00000000 LTA5114:
0620 82C0AC00 DKA300:(3631,4,0)
0630 82C4EAC0 DKA300:(3641,4,0)
0640 82C27400 DKA300:(3620,4,0)
0650 00000000 Busy MBA8433:
0660 00000000 LTA5116:
0670 82BEA680 DKA300:(502,317,0)
0680 82BEB100 DKA300:(521,71,0)
0690 82C0BD40 DKA300:(528,16,0)
06A0 00000000 Busy MBA8435:
06F0 00000000 LTA5119:
0700 82C25900 DKA300:(4114,70,0)
0710 82C3D000 DKA300:(4115,18,0)
0720 82C61300 DKA300:(4116,8,0)
SDA> exit
| |||||
| 510.16 | AZUR::HEUSBOURG | Fri Dec 16 1994 14:02 | 34 | ||
<<< EASE::DISK$ALLIN1V24:[NOTES$LIBRARY]OSC_TOOLS_DEVELOPMENT.NOTE;1 >>>
-< *** DIGITAL INTERNAL USE ONLY *** >-
================================================================================
Note 160.4 RWMBX Problems 4 of 4
AZUR::HEUSBOURG 27 lines 16-DEC-1994 13:20
--------------------------------------------------------------------------------
Sabine, All,
OSCint Repair is an ENS action routine. On OpenVMS, ENS uses mailboxes
(MBXes) for IPC with its action routines (image mode). As soon as ENS
receives an event that matches the filter, it writes the event to the
dedicated MBX. The action routine then reads this MBX to get the events
to be processed.
What is happening is that the action routine, depending on the action
it has to perform, can end up reading the MBX more slowly than ENS
writes events into it.
Hence, when the MBX is full, ENS goes temporarily into the RWMBX state
until the action routine has read more events.
to let the action routine read more events.
To bypass this problem, we changed the OSCint Repair module to make it
read the MBX and perform the action asynchronously, so that, even
if the action takes time (for example, a long timeout on a DECnet connection),
it still continues to read the MBX and queues the events.
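The shape of that change can be sketched as two cooperating threads: one that does nothing but drain the bounded mailbox into an unbounded in-memory backlog, and one that performs the slow action. This is an illustration under stated assumptions, not the OSCint Repair implementation: the names, the `None` shutdown marker and the use of Python threads and queues are all hypothetical stand-ins.

```python
import queue
import threading
import time


def run_async_repair(events, mailbox_capacity=2, action_delay=0.01):
    """Feed events through a small mailbox to a slow action; return completed events."""
    mailbox = queue.Queue(maxsize=mailbox_capacity)  # bounded, like the MBX
    backlog = queue.Queue()                          # unbounded internal queue
    completed = []

    def reader():
        # Drain the mailbox as fast as possible so the writer rarely waits.
        while True:
            event = mailbox.get()
            backlog.put(event)
            if event is None:  # hypothetical shutdown marker
                return

    def worker():
        # Perform the (possibly slow) action without touching the mailbox.
        while True:
            event = backlog.get()
            if event is None:
                return
            time.sleep(action_delay)  # stands in for a slow repair action
            completed.append(event)

    threads = [threading.Thread(target=reader), threading.Thread(target=worker)]
    for thread in threads:
        thread.start()
    for event in list(events) + [None]:
        mailbox.put(event)  # blocks only while the mailbox is momentarily full
    for thread in threads:
        thread.join()
    return completed


events = [f"event-{i}" for i in range(8)]
completed = run_async_repair(events)
print(completed == events)
```

The trade-off is that the backlog now grows in the action routine's own memory instead of stalling the writer, so a routine that stays slow for a long time consumes memory rather than blocking ENS.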
I'm going to rebuild a kit and make it public at the beginning of next
week. I'll put "FT" somewhere in the name until I get some validation
from you after deeper tests.
Regards,
Christian.
| |||||
| 510.17 | AZUR::HEUSBOURG | Fri Dec 16 1994 14:38 | 20 | ||
RE .13
Steve,
> Nope not currently running OSCint, but when I was I was having all
> sorts of problems, these have now disappeared with the deinstallation
> of OSCint. The main problem we were experiencing with OSCint was that
> ENS would shutdown on a very regular basis.
I'll be glad to help you use OSCint, as I don't feel that the ONLY solution
is to deinstall it. OSCint uses PCM intensively, so it can reveal problems
which would otherwise take longer to appear.
OSCint is now being used on a lot of customer sites without the ENS shutdown
problem you described. So I'm a little surprised, but would be very happy to
help you run OSCint on your system if you need the functionality.
Regards,
Christian.
| |||||
| 510.18 | Thanks for the offer. | WOTVAX::ELLISM | Are you all sitting too comfybold square on your botty? - Then w | Tue Dec 20 1994 10:04 | 13 |
Christian,
Steve is leaving at the end of this month. I'll be working in there
from January/February sometime.
It was felt that, until Console Manager itself ran reliably, we should
not throw in another 'wildcard', like OSCint, to potentially confuse the
issue.
Basically, this is a problem with Console Manager, and not OSCint. When
this is resolved, then we can look at any problems OSCint may cause.
Martin
| |||||
| 510.19 | OPCO::TSG_SJM | Coming live to you from Rosebery | Wed Dec 21 1994 01:50 | 9 | |
Christian and Martin,
Martin, I'm actually finishing up today, whooppee, we seem to have a
pretty stable PCM platform at the moment, thanks to the efforts of the
PCM team. Tony is taking over from me with the POLYCENTER products, so
you may like to touch bases with him.
Cheers for the final time
Steve
| |||||