
Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359

5117.0. "Notification performance -> problem" by STKHLM::BERGGREN (Nils Berggren EIS/Project dpmt, Sweden DTN 876-8287) Wed May 26 1993 05:54

     Hi all,


     We're running acceptance-tests at the Swedish PTT and have
     run into serious problems regarding performance.
     
     The configuration is:
     
     VS4000/90, 128 Mb, 2*RZ58 as system-disk and MCC-disk
     respectively.  DNS, DECmcc v1.3.
     
     Process-quotas set up to or above what MCC-AUDIT suggests.
     
     The test that fails is to prove that the configuration has
     sufficient power and memory  to support 5 operators.
     
     The test-setup is as follows:
     
     1.  A process blocking 32 MB ( SYS$LCKPAG(32Mb) and
     SYS$HIBER ).  Working-set set up to hold it without paging.
     
     2.  DETmcc (Asset-SW developed in Sweden) with 400 alarm-rules
     enabled.  The rules are polling at every 10 hours, so no
     CPU-power is consumed, but memory is.  
     
     3.  A command-procedure, on another node, is generating 1
     DECnet event per second (ZERO EXECUTOR).  The events are
     sinked to the test-node.
     
     4.  5 DECmcc operators are logged in, running IMPM with the
     notification services window.  Default notification request
     enabled (Notify domain xxx).
     
     
     At this stage everything works OK.  Response time is good.
     Approx. 50-60% CPU-power left on the system.  Memory is
     enough, approx. 10-15 Mb on the free-list.
     
     When the operators create a notification request for the
     'NODE4 node REMOTE NODE node COUNTERS ZEROED' event the CPU goes
     up to 100% busy time and they're getting the events from the
     remote node almost as expected.  The notifications come in
     groups of two-three at a time.  This doesn't cause any
     objections.
     
     After a couple of minutes (with 100% CPU-busy) all operators
     are hanging, no CPU is consumed (less than 5% CPU busy), no
     other activities can be seen on the system (using MONITOR).
     Other processes, not running DECmcc, are not affected at
     all.
     
     This may continue for 60-90 seconds until the hanging
     processes are 'released' and notifications continue, trying
     to catch up.  Sometimes we're getting a message telling that
     "one or more events were lost for NODE4 node ..." (I think,
     if I remember right, that this is a CVR from the event
     manager).  When the processes are released, the CPU goes up
     to 100% busy again.
     
     
     We tried to have 5 operators issuing 'GETEVENT NODE4 node
     REMOTE node COUNTERS ZEROED, at every 00:00:00.1' and this
     didn't cause any problems.  CPU less than 40% busy, which I
     interpret as that the event manager is not the bottle-neck,
     but notification-services/FM is.

     We ran MONITOR ALL_CLASSES during the tests, and found
     nothing strange during the hangings.

     We observed that notifications can't catch up with the rate
     that the events are coming in.  In another test we did the
     following:

     1. one operator having a notify request for 'counters zeroed'
     on node Y
     
     2.  Node Y generating 100 'counters zeroed' events at a rate
     of 1/sec.
     
     3. No other user activity on the system.
     
     The time to process all 100 events and have them displayed
     on the notifications window took approx. 120 seconds.  This
     implies that notifications FM can't process 1 event/sec for
     a long time and that the event manager at one point will
     have a full queue of events to be delivered.
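
     As a rough sketch of the arithmetic behind that concern (Python,
     purely illustrative): backlog_after is a hypothetical helper, and
     the model assumes the measured processing rate stays constant
     under sustained load, which this single test does not prove.

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """Events still waiting after `seconds` of sustained input.

    Assumes constant rates (an assumption, not a measured fact):
    the backlog only grows when events arrive faster than they
    are processed.
    """
    growth_per_second = max(arrival_rate - service_rate, 0.0)
    return growth_per_second * seconds


# Figures from the test above: 100 events sent at 1/sec,
# all displayed after approx. 120 seconds.
arrival_rate = 1.0            # events per second
service_rate = 100 / 120      # about 0.83 events per second

# Under these assumptions the queue grows by roughly one event
# every 6 seconds, i.e. about 600 events after one hour.
one_hour_backlog = backlog_after(3600, arrival_rate, service_rate)
```

     How long it takes before events are actually lost depends on
     the size of each event and of the event pool, neither of which
     we have measured.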
     
     What if we have a very long flooding of events from the
     network into the MCC-system with notifications enabled?
     
     How many events can the event manager handle? At what rate?
     Can the event-queue be increased?



     We're going to do some more tests to try to find out more
     precisely under what conditions these hangings appear and
     I'll be back with further info.


     I'd appreciate any comments.  
     
     We'll try to convince the customer that the test should be
     done some other way and that this test is far away from a
     real situation, but until then... 
     
     These hangings may ruin the whole deal...

	/Nils

T.R	Title	User	Personal Name	Date	Lines
5117.1		CHEEKO::DITMARS	Pete	`Wed May 26 1993 12:56`	26
	Your FCL getevent test is slightly flawed: you want to specify a
	FOR DURATION instead of an AT EVERY.

	    GETEVENT NODE4 node REMOTE node COUNTERS ZEROED, for duration 24:00:00

	Try that and see what happens.

	Also, try running the same notification requests (not getevents)
	from the same number of FCL processes and direct the output to the
	null device.

	    notify domain blah blah blah , to file nl:

	This will take the notif PM out of the loop but still test the FM.

	Another thing to try is to close (File->Close Window not just
	iconify) the Notif PM window(s) and just let the notifications
	arrive at the map (and beep).

	A question: in your test of running 5 IMPMs, how many of your
	IMPMs are pumping their displays to other workstations?  If all 5
	IMPM sessions are running with their displays on that single
	workstation, try sending the displays elsewhere.
5117.2	Can you change the size of the Event Manager ?	TAEC::LAVILLAT		`Wed May 26 1993 13:04`	9
	Would it also help to increase the size of the Event Manager pool?
	Is it possible on VMS (i.e. does MCC_EVENT_POOL_SIZE have an
	effect)?

	My 2 cents.

	Pierre.
5117.3	Re .1	STKHLM::BERGGREN	Nils Berggren EIS/Project dpmt, Sweden DTN 876-8287	`Thu May 27 1993 03:18`	56
	Re .1

	>>> Your FCL getevent test is slightly flawed: you want to
	>>> specify a FOR DURATION instead of an AT EVERY.
	>>>
	>>>     GETEVENT NODE4 node
	>>>     REMOTE node COUNTERS ZEROED, for duration 24:00:00
	>>>
	>>> Try that and see what happens.

	I guess that this wouldn't affect the performance.

	Does NOTIF-FM do a 'GETEVENT xxx, FOR DURATION yyy' when you
	issue a notification request or does it issue the GETEVENT
	without any scheduling after processing a received event?
	I thought it was the latter, so I specified AT EVERY to try to
	do as NOTIF-FM.

	>>> Also, try running the same notification requests
	>>> (not getevents) from the same number of FCL processes
	>>> and direct the output to the null device.
	>>>
	>>>     notify domain blah blah blah , to file nl:
	>>>
	>>> This will take the notif PM out of the loop but still
	>>> test the FM.

	I'll do it.

	>>> Another thing to try is to close (File->Close Window not
	>>> just iconify) the Notif PM window(s) and just let the
	>>> notifications arrive at the map (and beep).

	guess that this is like the above test:
	  'notify domain blah blah blah , to file nl:'

	>>> A question: in your test of running 5 IMPMs, how many
	>>> of your IMPMs are pumping their displays to other
	>>> workstations?  If all 5 IMPM sessions are running
	>>> with their displays on that single workstation, try
	>>> sending the displays elsewhere.

	We're not displaying on the workstation itself.  They have 5
	VXT2000 with 14 Mb of memory each on which we're displaying
	the IMPM:s.  We've tried everything from 5 IMPM:s on 5 VXT:s
	to 5 IMPM:s on 1 VXT and we get hangings in every case.
	Memory on the VXT:s is not a bottleneck, there's 2-3 Mb of
	free memory when we have __lots__ of windows on the screen.

	/Nils
5117.4		CHEEKO::DITMARS	Pete	`Thu May 27 1993 10:30`	54
	>I guess that this wouldn't affect the performance.

	It may not.

	>Does NOTIF-FM do a 'GETEVENT xxx, FOR DURATION yyy' when you
	>issue a notification request or does it issue the GETEVENT
	>without any scheduling after processing a received event?
	>I thought it was the latter, so I specified AT EVERY to try to
	>do as NOTIF-FM.

	It's the former.  The scope of interest (FOR) gets passed through
	by the AM to the event manager (via a parameter to mcc_event_get),
	which forces it to hang onto events that arrive between calls.
	The scheduling time (AT) only affects the scheduling of the
	mcc_call to the AM.  It's possible (though I'm not sure how
	likely) that using AT would allow the event manager to drop events
	that would otherwise get stuck in the pool.

	>>>> Another thing to try is to close (File->Close Window not
	>>>> just iconify) the Notif PM window(s) and just let the
	>>>> notifications arrive at the map (and beep).
	>
	>guess that this is like the above test:
	>  'notify domain blah blah blah , to file nl:'

	Kinda, but not quite.  There's still a fair amount of processing
	that the IMPM does with each notification... just far less X calls.

	>We're not displayin on the workstation itself.  They have 5

	Hmmmm....  Same behavior if all 5 IMPMs pump their displays to the
	workstation?  (not that this is a scenario that a customer would
	use...)

	I do remember, in one test scenario, we saw a kind of "reverse
	traffic jam" behavior where the IMPM/Notif PM kind of slowed way
	down and then, by starting another PM, things sped back up again.
	During that testing, we found a major memory leak.  When we fixed
	the memory leak, the weird "traffic jam" behavior went away.  I
	never really understood what could have caused performance to
	improve when two processes were running as opposed to one, but I
	had this vague feeling that the code being executed was more
	likely to be in physical memory when two processes were using it
	than when one was.  That coupled with the fact that the virtual
	address usage of the processes was growing rapidly, I guess.

	Or, it could be that the event manager was unable to handle the
	throughput when the consumers were "slow".

	> The test-setup is as follows:
	>
	> 1.  A process blocking 32 MB ( SYS$LCKPAG(32Mb) and
	> SYS$HIBER ).  Working-set set up to hold it without paging.

	I'm not sure I understand the point of this.  Is it effectively
	taking 32Mb of physical (and virtual) memory away from the rest of
	the system?
5117.5	Block 32 Mb - make sure enough memory for expansion.	STKHLM::BERGGREN	Nils Berggren EIS/Project dpmt, Sweden DTN 876-8287	`Fri May 28 1993 03:58`	17
	re .4

	>>> I'm not sure I understand the point of this.  Is it effectively taking
	>>> 32Mb of physical (and virtual) memory away from the rest of the system?

	The blocking of 32 Mb is just to prove that 128 Mb is enough, and
	that they have some room for expansion regarding memory usage.

	It is taking 32 Mb out of memory.  The System Services manual says:

	"The Lock Pages in Memory service locks a page or range of pages
	in memory.  The specified virtual pages are forced into the
	working set and then locked in memory.  A locked page is not
	swapped out of memory if the working set of the process is swapped
	out.  These pages are not candidates for page replacement and in
	this sense are locked in the working set as well."
5117.6	Hangs if using FCL with 'TO FILE NL:' as well	STKHLM::BERGGREN	Nils Berggren EIS/Project dpmt, Sweden DTN 876-8287	`Tue Jun 01 1993 03:42`	15
	We tried FCL with 'NOTIFY ..., TO FILE NL:' and it did hang for
	30-60 seconds.  The first time I tested, it worked ok with 400+
	events, but the second time it failed...

	We also tried IMPM with the notification window closed with no
	success.

	I guess that this rules out X-traffic overhead as the problem.

	Anyone tried to reproduce the problem?  We get the hangings every
	time running 1 event/sec for more than 2 minutes and 4 or more
	operators.  We even had hangings with two operators, but that is
	not reproducible.

	/Nils
5117.7	What LOG-logicals to use?	STKHLM::BERGGREN	Nils Berggren EIS/Project dpmt, Sweden DTN 876-8287	`Wed Jun 02 1993 03:09`	14
	Are the LOG-logicals (MCC_EVENT_LOG and MCC_EVENT_TRACE) still
	supported?  If yes, what are the proper values to set?  Is it still
	1 and 180 respectively?

	Is the logical MCC_NOTIFICATION_FM_LOG the right name for
	NOTIFICATION-FM and what values should be used?

	Does it make any sense to use them to figure out when and where it
	is hanging, and what's going on?

	Any other hints on what to do to figure out what's going on (or
	rather what is NOT going on...)?

	/Nils
5117.8	Some event pool logicals	TOOK::T_HUPPER	The rest, as they say, is history.	`Fri Jun 11 1993 18:03`	11
	The event logicals MCC_EVENT_LOG and MCC_EVENT_TRACE are still
	usable, values 1 and 180 (hex) respectively.  These will cause the
	dumping of event puts and gets.  This may be a LARGE amount of
	output.  Make sure that all processes using the event pool use
	these logicals to get the whole picture.

	The event logical MCC_EVENT_POOL_SIZE can be used to set the size
	of the event pool in bytes (evenly divisible by 1024).  Minimum is
	262144, default is 524288, max is 8192000.

	Ted
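
	The pool size limits quoted above can be checked before a value is
	put into MCC_EVENT_POOL_SIZE.  A minimal sketch in Python follows;
	the function names are hypothetical, and rounding down to a
	multiple of 1024 is only one possible choice.  How DECmcc itself
	treats an out-of-range or unaligned value is not stated in this
	thread.

```python
# Limits as quoted in note 5117.8 (bytes).
MIN_POOL = 262144
DEFAULT_POOL = 524288
MAX_POOL = 8192000


def valid_pool_size(size_bytes):
    """True if the value is within the quoted range and divisible by 1024."""
    return MIN_POOL <= size_bytes <= MAX_POOL and size_bytes % 1024 == 0


def clamp_pool_size(requested_bytes):
    """Round down to a multiple of 1024, then clamp to the quoted range.

    Hypothetical helper: the rounding policy is an assumption made
    for illustration, not documented DECmcc behaviour.
    """
    rounded = (requested_bytes // 1024) * 1024
    return max(MIN_POOL, min(MAX_POOL, rounded))
```

	For example, clamp_pool_size(1000000) gives 999424, the largest
	multiple of 1024 that does not exceed the requested value.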
5117.9	Notification window missing events	NEWVAX::BUCHMAN	UNIX refugee in a VMS world	`Wed May 31 1995 00:36`	18
	A member of our test team is noticing that an operator's
	notification window can start to miss events if it has been up for
	a long time.  Is this a normal occurrence?  Should the
	notifications window not be left up for long periods?

	Thanks,
	Jim B.

	Tester's message follows:

	> When MCC has been up for hours on end, the notification window can slow
	> down.  This is indicated when there are gaps between the alarm ids
	> shown in the notification window.
	>
	> (Note: all alarms are logged to the mcc_notification.log file.)
	>
	> To fix the problem,
	> you can clear out all of the alarms in the notification window
	> periodically, or restart MCC.
5117.10	Any ideas?	NEWVAX::BUCHMAN	UNIX refugee in a VMS world	`Mon Jun 12 1995 22:29`	8
	> Note 5117.9 by NEWVAX::BUCHMAN "UNIX refugee in a VMS world" >>>
	>           -< Notification window missing events >-

	Can anyone give us an opinion here?  We have increased the pool
	size, and that helps a bit; but can we do anything else?  BTW, all
	of our events come in via SNMP traps.

	Thanks,
	Jim B.