| Title: | POLYCENTER Console Manager |
| Notice: | Kits, Scans, Docs on CSC32:: as PCM$KITS:,PCM$DOCS:, PCM$SCANS: |
| Moderator: | CSC32::BUTTERWORTH |
| Created: | Thu Aug 06 1992 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1541 |
| Total number of notes: | 6564 |
SUBJECT: System hangs - reboots often necessary
SOFTWARE: OpenVMS VAX V6.1
PCM V1.5-006 (w/MUP)
PROBLEM STATEMENT:
Since adding a number of nodes to CX3PCM, there have been several
problems (many more than usual) with the PCM system.
Of late, after the addition of 30 more systems and the MUP, PCM has been
hanging and failing to restart properly.
SYMPTOMS on 6520 w/192 active nodes:
RECONFIGURE took 10-15 minutes to complete, if it completed at all
High use of console output could hang display, and PCM
PAGE UP in Log File would hang display
Restart of PCM would usually fail
UPGRADE: CX3PCM has been reconfigured from a 6520 w/DEBNI to a 6620
w/DEMNA (XMI interface) to deal with resource requirements.
ANALYSIS:
PCM is a real resource hog with PCM RECONFIGURE, high rates of logging
of console data, or moving through the log file:
If you hit "page up" several times a second, it results in about
80 DECnet packets/sec out to the NI card. A faster primary CPU
was needed to keep up with the several hundred BIO/s that result.
Needed to replace the DEBNI card with a DEMNA to deal with excessive
network I/O with multiple users
With the faster 6620 CPU, I reset the Console Daemon to deal with 16
consoles per Console Ctrl process - I had it set to 4 for the 6420 CPU,
8 for the 6520 CPU, and now the SUPPORTED 16 for the 6620 CPU. This
will hopefully help with the RWMBX problems.
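As a rough illustration of what the consoles-per-process setting means for process count, the sketch below divides the 192 active consoles by each setting mentioned above. It assumes consoles are packed evenly across Console Ctrl processes and that a partial group still needs its own process; the function name is hypothetical and is not part of PCM.

```python
import math


def controller_processes(active_consoles: int, consoles_per_process: int) -> int:
    """Estimate how many Console Ctrl processes are needed.

    Assumption: consoles are packed evenly, and any leftover
    consoles still require one extra process.
    """
    return math.ceil(active_consoles / consoles_per_process)


ACTIVE_CONSOLES = 192  # active node count quoted in this note

# Settings used on the 6420, 6520, and 6620 respectively.
for per_process in (4, 8, 16):
    count = controller_processes(ACTIVE_CONSOLES, per_process)
    print(f"{per_process:2d} consoles/process -> {count} processes")
```

Under those assumptions, going from 4 to 16 consoles per process cuts the estimated process count from 48 to 12.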
NPAGEDYN is expanding again - next reboot will up this to 8,000,000
bytes. It seems that PCM does not deal well with expanding NPAGEDYN.
Replaced the FOUR striped RA70 log disks with FOUR FASTER RA72 disks
for speed and capacity.
ISSUES with VMS Version of PCM:
The VMS version of PCM requires HUGE resources to support
200 nodes.
The terminal I/O design of PCM appears to need work to be more
efficient with DECnet I/O.
The design of the monitor output for PCM apparently requires one
DECnet packet I/O for EACH line of data that is sent to the monitor
console. PAGE UP/PAGE DOWN can create up to 120 DECnet I/O/sec per
user. This is excessive, and WILL hang the monitor window unless a
VERY FAST DECnet NI card and a FAST CPU are used to keep up with
the normal rate of movement up and down the monitor logs.
PAGE UP/DOWN requires HUGE amounts of CPU ... for a PAGE UP key rate
of 3-5 per second, it takes about 50% of a 6610 CPU to produce
the output. Clearly this seems excessive for a terminal I/O function.
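The packet rates quoted above are roughly what one packet per displayed line would produce. The sketch below works through that arithmetic. The 24-line page size is an assumption (it is not stated in this note), and the function name is hypothetical.

```python
def packets_per_second(lines_per_page: int, key_presses_per_second: int) -> int:
    """Packets per second if each redrawn line costs one DECnet packet.

    Assumption: every PAGE UP/DOWN redraws a full page of lines.
    """
    return lines_per_page * key_presses_per_second


LINES_PER_PAGE = 24  # assumed terminal page length, not taken from the note

# Key rates of 3 and 5 per second, the range quoted above.
for key_rate in (3, 5):
    rate = packets_per_second(LINES_PER_PAGE, key_rate)
    print(f"{key_rate} key presses/sec -> {rate} packets/sec")
```

With a 24-line page, 3 to 5 key presses per second gives 72 to 120 packets per second per user, in line with the "about 80" and "up to 120" figures reported here.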
With 190+ nodes, RECONFIG on a 6520 w/DEBNI took 10 minutes or more, with
both 6500 CPUs saturated. On a 6620 w/DEMNA, same reconfig takes about
30-40 seconds with saturated CPUs. This does NOT scale as expected. If the
primary CPU is not fast enough, PCM can hang on reconfig.
Jim Lind
POLYCENTER Console Manager Summary
Totals
Configured Systems: 194 User disabled: 2
Active Systems : 192 (D:000 P:000 L:192 T:000) Unreachable: 000
Active Users : 5 (Connect/Monitor: 002 C3: 003 Event sources: 013)
CM pid ........: 00000132 V1.5-006 Uptime: 0 08:15:54
ENS pid .......: 00000131 V1.5-006 Uptime: 0 08:15:56
Total bytes ...: 3.80M (0) Ave bps: 127.64
Total lines ...: 74.4K (0) Ave lpm: 149.96
Total events ..: 17 (0) Ave epm: 0.03
Total actions .: 0 (0)
Active actions : 0 Failed actions : 0
Crit: 0 Maj: 1 Min: 0 Warn: 0 Clr: 16 Ind: 0
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 560.1 | | OPG::PHILIP | And through the square window... | Mon Jan 16 1995 20:22 | 32 |
Jim,
Thanks for the info, you have obviously spent some time putting this
together, and it's always interesting to see how people are using the
software. A couple of points...
1) Yes, we know reconfigure is a pig. The daemons have undergone a complete
rewrite for V2.0; hopefully this will improve the situation.
2) The memory usage for V2.0 has been reduced as well, the various
interfaces no longer load up the database when they start, they query a
server process for the info they require.
3) I don't understand your comment about DECnet packets. Are you setting
host to your PCM system and then doing a console monitor? If so, I can't
see how we can change the situation as it is DECnet which controls the
size of the packets via the cterm protocol.
4) We have redesigned the IPC mechanism we use for V2.0 such that we are
totally unable to produce a RWMBX situation. So you shouldn't see that
problem anymore.
Now then, don't ask me when you can have V2.0 on OpenVMS; that is a question
better answered by our product manager. However, suffice to say, eventually
when we get to ship it, most of your problems will disappear.
Finally, you are really close to the "supported" limit on systems connected.
Do you see a time when you will want to exceed the 200 system limit?
Cheers,
Phil
| |||||
| 560.2 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Mon Jan 16 1995 22:34 | 9 |
> NPAGEDYN is expanding again - next reboot will up this to 8,000,000
> bytes. It seems that PCM does not deal well with expanding NPAGEDYN.
What is the value of DEFMBXBUFQUO? PCM uses mailboxes *very heavily*
and larger values for this parameter will increase pool consumption,
especially in your environment!
Regs,
Dan
| |||||
| 560.3 | DEFMBXBUFQUO at 2048 | BSS::LIND | Jim Lind; 592-4099 CX03-1/N14 CNMC-West | Sat Jan 21 1995 21:20 | 3 |
DEFMBXBUFQUO is 2048 on the CX3PCM at CXO3.
Jim Lind
| |||||
| 560.4 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Jan 24 1995 20:06 | 6 |
Considering your environment and the number of mailboxes present
in it, you could easily consume 2-3 megabytes of non-paged pool
just for PCM.
Regs,
Dan
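To put the 2-3 megabyte figure in perspective, the sketch below asks how many mailboxes, each filled to the DEFMBXBUFQUO value of 2048 bytes reported in 560.3, would account for that much pool. This is a worst-case simplification: it assumes every mailbox takes the default quota and is completely full, and it treats a megabyte as 2**20 bytes. PCM may create its mailboxes with a different quota, and the function name is hypothetical.

```python
def mailboxes_at_full_quota(pool_bytes: int, buffer_quota_bytes: int) -> int:
    """Number of completely full mailboxes that would use pool_bytes.

    Assumption: each mailbox is charged exactly its buffer quota.
    """
    return pool_bytes // buffer_quota_bytes


DEFMBXBUFQUO = 2048   # value reported for CX3PCM in 560.3
MEGABYTE = 2 ** 20    # assumed meaning of "megabyte" here

for megabytes in (2, 3):
    count = mailboxes_at_full_quota(megabytes * MEGABYTE, DEFMBXBUFQUO)
    print(f"{megabytes} MB of pool ~ {count} full mailboxes")
```

Under those assumptions, 2-3 megabytes corresponds to roughly 1,000 to 1,500 full mailboxes.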
| |||||