
Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

1994.0. "Alarm rule ACCVIO" by TAVIS::PERETZ () Thu Dec 26 1991 03:53

Hi,

Yesterday evening I created 3 alarm rules. This morning I looked at the 
notification window and saw that only one rule is still firing.

The problem is that DECmcc is not telling me that there is any problem. 

SHOW STATUS command on all 3 rules is telling me that all 3 are enabled 
and running. 

SHOW COUNTERS command tells me that the last evaluation for 2 of the rules
was around midnight. This is suspicious since I know that they should run
forever.

Only by looking at the DECTERM I see two ACCESS VIOLATIONS and I guess that
they may be related to the two dead alarm rules. But I cannot know for sure.

Now this was an easy case because I expected to see notifications. Usually
operators are NOT expecting notifications, so if alarm rules are killed
for some reason NOBODY will notice it until it is too late.

My suggestion is to add an audit process to DECmcc that will NOTIFY the users
if some thread has terminated. This way the operator will at least KNOW that
there is a problem with DECmcc. This is especially important when DECmcc is
used for monitoring NON DEC networks, and the operators do not know anything
about VMS.
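
The suggestion above, a watcher that reports when a thread terminates, can be sketched in general terms. This is a minimal illustration in Python, not DECmcc code: `run_watched`, `failing_rule` and the rule name are hypothetical, and a Python exception only stands in for an ACCVIO, which is a hardware-level fault that would need a condition handler on VMS.

```python
import threading

def run_watched(rule_name, target, on_death):
    """Run target() in a thread; call on_death(rule_name, exc) if it raises."""
    def wrapper():
        try:
            target()
        except Exception as exc:
            # Report the failure instead of letting the thread die silently.
            on_death(rule_name, exc)
    thread = threading.Thread(target=wrapper, name=rule_name, daemon=True)
    thread.start()
    return thread

dead_rules = []

def failing_rule():
    # Hypothetical rule body that fails partway through its run.
    raise MemoryError("simulated fault")

def notify(name, exc):
    # A real system would raise an operator notification here.
    dead_rules.append((name, type(exc).__name__))

worker = run_watched("rule-1", failing_rule, notify)
worker.join()
print(dead_rules)
```

The point of the wrapper is that the notification names the rule that died, so the operator does not have to map a thread number on a terminal window back to a rule.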

P.S. The alarm rules were:
	1. Node4 telcom user bytes sent > 50 at every 00:01:00
	2. Node4 telcom user bytes sent > 100 at every 00:01:00
	3. Node4 telcom user bytes sent > 500 at every 00:01:00

     Rule 1 stopped after 308 evaluations.
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=
	 00000000, PC=201C0000, PSL=00F825CC
     Rule 2 stopped after 338 evaluations.
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=
	 00000000, PC=201C0000, PSL=00CB29CC

I run T1.2.4 on VAXstation 3100 M48 SPX with 32MB memory, VMS 5.4-1

Peretz Gur-El

1994.1. "More problems - dynamic memory" by TAVIS::PERETZ () Sun Dec 29 1991 05:27 (38 lines)

Another problem related to alarm rules:

I created the same three alarm rules mentioned in the previous note and let
them run for the weekend. When I check the workstation again after 2.5 days
I see:
	1. In the notification window - a list of alarm notifications of all
	   3 rules, as expected, but only until yesterday morning. At about
	   01:10:00 yesterday morning all 3 rules stopped firing. The last
	   notification is number 5965.

	2. In the DECTERM window - No information (No ACCVIO, no X Toolkit
	   Warning, nothing).
 
	3. There is a DECmcc message window about not enough dynamic memory.
	   
	4. When I try to do any SHOW command on any of the alarm rules
	   I get a DECmcc message window: C allocation error.

	5. When I try to do a SHOW command on my node4 entity I get
	   either the: C allocation error
	           or: dispatch local management module file access error during
		       probe.

Then I tried to do SHOW STATUS command on another node4 entity and received
the following message on the DECTERM window:

%DEBUGBOOT-W-VASFULL, virtual addres space is full
%XLIB-E-INSFMEM, insufficient dynamic memory
%DEBUGBOOT-W-VASFULL, virtual addres space is full
%XLIB-E-INSFMEM, insufficient dynamic memory
%Thread 190 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000
PC=201C0000, PSL=01C883CC

My PAGEDYN=NPAGEDYN=1499648. Is it enough? What is the recommended value?
Should I change other sysgen parameters?

Peretz Gur-El

1994.2. "could be a quota problem" by MOLAR::ROBERTS (Keith Roberts - DECmcc Toolkit Team) Tue Dec 31 1991 08:28 (15 lines)

    Peretz Gur-El,
    
    Are you running the t1.2.4 kit -- The Alarms FM was tested for memory
    leaks -- none were found in the t1.2.4 kit.
    
    The Notification FM maintains rule-fired information, which is stored in
    allocated dynamic memory (I suspect).  This could be causing the
    Insufficient Virtual Memory you are seeing.
    
    What is the value of the sysgen parameter: VIRTUALPAGECNT
    
    Goto Authorize, and display the user process quotas.  Post your
    results here.
    
    /keith

1994.3. "Here are the quotas" by TAVIS::PERETZ () Wed Jan 01 1992 02:54 (42 lines)

>    Are you running the t1.2.4 kit -- The Alarms FM was tested for memory
>    leaks -- none were found in the t1.2.4 kit.

Yes I am running the T1.2.4 kit.

>    Goto Authorize, and display the user process quotas.  Post your
>    results here.

$ mc authorize
UAF> sho demo

Username: DEMO                             Owner:
Account:                                   UIC:    [300,300] ([300,300])
CLI:      DCL                              Tables: DCLTABLES
Default:  TELCOM$DKB100:[DEMO]
LGICMD:   LOGIN
Flags:
Primary days:   Mon Tue Wed Thu Fri
Secondary days:                     Sat Sun
No access restrictions
Expiration:            (none)    Pwdminimum:  6   Login Fails:     0
Pwdlifetime:         30 00:00    Pwdchange:  23-DEC-1991 15:43
Last Login:  1-JAN-1992 10:40 (interactive),  1-JAN-1992 10:43 (non-interactive)
Maxjobs:         0  Fillm:       150  Bytlm:        64000
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:           0
Maxdetach:       0  BIOlm:       100  JTquota:       1024
Prclm:           4  DIOlm:       100  WSdef:         4096
Prio:            4  ASTlm:       100  WSquo:         4000
Queprio:         4  TQElm:       150  WSextent:     16000
CPU:        (none)  Enqlm:       512  Pgflquo:     100000
Authorized Privileges:
  LOG_IO SETPRV TMPMBX NETMBX PHY_IO SYSPRV
Default Privileges:
  LOG_IO SETPRV TMPMBX NETMBX PHY_IO SYSPRV

SYSGEN>  SHO VIRTUALPAGECNT
Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
--------------            -------    -------    -------  -------   ----  -------
VIRTUALPAGECNT              73536       9216       512   1000000 Pages


Peretz Gur-El

1994.4. "parameters look pretty good" by MOLAR::ROBERTS (Keith Roberts - DECmcc Toolkit Team) Thu Jan 02 1992 08:12 (29 lines)

    >>> Authorize Quotas
    
     o The Working Set values look a bit odd:
    
    	wsdef < wsquo < wsextent   ...  typically like:
    
    	512   < 4096  < 16000
    
        But I don't think your settings would cause any problems.
    
     o Page File Quota is good -- I forgot to ask what size page file
       you have?  It should be as big or larger than your PGFLQUO
    
    >>> Virtual Page Count
    
        73,536 looks good -- but should probably be higher -- about
        100,000.
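
The rules of thumb above can be written down as a small check. This is only a sketch in Python of the guidance in this reply, not a DEC tool: `check_quotas` and its parameter names are made up for illustration, and the 100,000 figure is just the approximate value suggested above.

```python
def check_quotas(wsdef, wsquo, wsextent, virtualpagecnt,
                 pgflquo=None, pagefile_pages=None, suggested_vpc=100_000):
    """Return a list of warnings for the settings discussed in this reply."""
    warnings = []
    if not (wsdef < wsquo < wsextent):
        warnings.append("working set: expected WSdef < WSquo < WSextent")
    if virtualpagecnt < suggested_vpc:
        warnings.append(
            f"VIRTUALPAGECNT {virtualpagecnt} is below {suggested_vpc}")
    # The page file size was not posted, so this check is skipped when unknown.
    if (pgflquo is not None and pagefile_pages is not None
            and pagefile_pages < pgflquo):
        warnings.append("page file is smaller than Pgflquo")
    return warnings

# Values taken from the AUTHORIZE and SYSGEN output posted in reply .3
report = check_quotas(wsdef=4096, wsquo=4000, wsextent=16000,
                      virtualpagecnt=73536, pgflquo=100000)
for line in report:
    print(line)
```

With the posted values this flags the working set ordering (WSdef 4096 is above WSquo 4000) and the VIRTUALPAGECNT value, which matches the two remarks made above.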
    
    >>> Due to the nature of the operations you were performing, your
        system may require a large memory configuration (not just physical
        memory - but authorize and sysgen parameters).
    
        Could someone from the Notification Team give us an idea of the
        memory consumption when a lot of event data occurs and accumulates
        on the map (?)
    
    /keith
    
    
1994.5. "We know of one of the problems you have noticed" by TOOK::ORENSTEIN () Fri Jan 03 1992 13:44 (11 lines)
    
    The Alarms team is also aware of one of the problems you have seen.
    
    If one of our running rules (threads) has an ACCVIO and vanishes, 
    we don't have a big brother checking on this situation.  So you will
    see that the state will never be set to DISABLE and you will not be
    able to delete the rule.  In this case, you must EXIT the MCC process
    and everything will be properly cleaned-up.
    
    		aud...
    
1994.6. "Any plans to correct it?" by TAVIS::PERETZ () Wed Jan 08 1992 11:04 (19 lines)

>    The Alarms team is also aware of one of the problems you have seen.
>    
>    If one of our running rules (threads) has an ACCVIO and vanishes, 
>    we don't have a big brother checking on this situation.  So you will
>    see that the state will never be set to DISABLE and you will not be
>    able to delete the rule.  In this case, you must EXIT the MCC process
>    and everything will be properly cleaned-up.

1. What about the other problems?

2. Exiting DECmcc surely will clean up everything, but I am sure you agree
   this is not the right solution. Are there any plans to do something:
	A. To let me know that a thread terminated, and what that means
	   from a manager's perspective (if a thread terminated I know it by
	   looking at the DECterm, but I have no idea how it affects my
	   DECmcc, i.e. which one of the many alarm rules is dead).
	B. To be able to correct the situation WITHOUT leaving DECmcc.

Peretz Gur-El 

1994.7. "Not what you want to hear..." by TOOK::ORENSTEIN () Wed Jan 08 1992 12:42 (22 lines)
    
    We are taking a hard look at memory consumption in MCC.  Once you have
    run out of dynamic memory, you will have to exit MCC and start again.
    There is no graceful way to get around this.  We are very conscious of
    this and we are doing the best we can.
    
    ALARMS will try to let you know if your thread has died, but I cannot
    guarantee that you will see this in the V1.2 product.  As to being able
    to tell if your thread has died, the only thing I can suggest is to 
    compare the "last evaluation time" with the polling time to see if
    it makes sense.  To better clarify, if your polling time is every 15
    minutes and the last evaluation time is an hour ago, you can assume
    that the thread died.
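
The comparison described above is easy to automate once the last evaluation time and polling interval of each rule have been collected. The following Python sketch assumes that data is already in hand; `find_stale_rules`, the rule names and the `slack` factor of two intervals are all hypothetical choices, not anything DECmcc provides.

```python
from datetime import datetime, timedelta

def find_stale_rules(rules, now, slack=2):
    """Return names of rules whose last evaluation is older than
    slack polling intervals; these are candidates for a dead thread."""
    return sorted(
        name
        for name, (last_eval, interval) in rules.items()
        if now - last_eval > slack * interval
    )

now = datetime(1992, 1, 9, 8, 0, 0)
rules = {
    # name: (last evaluation time, polling interval)
    "bytes_gt_50": (now - timedelta(hours=1), timedelta(minutes=15)),
    "bytes_gt_100": (now - timedelta(seconds=30), timedelta(minutes=1)),
}
stale = find_stale_rules(rules, now)
print(stale)
```

The first rule matches the example in the text (15-minute polling, last evaluated an hour ago) and is reported; the second was evaluated within its interval and is not.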
    
    Unfortunately, until I stick in the big brother to detect dead threads,
    there will be no way for you to reuse a dead rule without exiting MCC.
    
    We have recently switched over to using DECthreads (CMA) and now that
    we are in field test, we can, as a group, take a better look at dead
    thread detection.
    
    	aud...

1994.8. "1. You are right 2. It happened again" by TAVIS::PERETZ () Thu Jan 09 1992 03:18 (33 lines)

>    We are taking a hard look at memory consumption in MCC.  Once you have
>    run out of dynamic memory, you will have to exit MCC and start again.
>    There is no graceful way to get around this.  We are very conscious of
>    this and we are doing the best we can.

The question is why do I run out of dynamic memory? This happened to me
again last night. I defined 6 alarm rules that evaluate every minute
and let it run for the night. This morning I found two dead bodies:

$ mana/enter/inter=decw
%Thread 170 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000, PC
=201C0000, PSL=01B78BCC
%Thread 166 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000, PC
=201C0000, PSL=01C50FCC

So I guess there are still some leaks in ALARMS. I shall leave the process 
running and see what happens to the remaining 4 rules.

My configuration and quotas are listed in previous notes. 

>    ALARMS will try to let you know if your thread has died, but I cannot
>    guarantee that you will see this in the V1.2 product.  As to being able
>    to tell if your thread has died, the only thing I can suggest is to 
>    compare the "last evaluation time" with the polling time to see if
>    it makes sense.  To better clarify, if your polling time is every 15
>    minutes and the last evaluation time is an hour ago, you can assume
>    that the thread died.

Sure, but when you have 100 rules then it takes some time...

Peretz

1994.9. "Don't confuse ACCVIO with INSVIRMEM" by TOOK::GUERTIN (Don't fight fire with flames) Thu Jan 09 1992 07:13 (37 lines)

    I think the point here is that this looks like two problems.
    1) Memory leaks.  The software (MCC in general) allocates dynamic
    memory, but sometimes forgets to release it.  After about 2.5 days of
    fairly heavy use, it exhausts virtual memory.  You could probably get
    it to die earlier than that.  The workaround is that you have to exit
    MCC at some convenient time, perhaps during the night?  Some
    "always-running" sites have a command file or a night shift operator
    which does this at its lowest usage / least critical time.  We view
    this as an extremely critical and very high priority bug.  But
    realistically, there are just too many lines of code to clean out all
    memory leaks.
    
    2) ACCVIO.  This is a priority 1 bug which needs to be QARed and looked at
    in detail.  _Sometimes_ exhausting virtual memory will result in ACCVIOs.
    When this happens, it is almost always preceded by some sort of
    "Insufficient Virtual Memory" message.  Just an ACCVIO message with
    nothing else implies a more serious bug.  How long do you run before
    getting this message? Overnight?  If so, then chances are you have NOT
    yet exhausted dynamic memory.
    
    Please do not mix the two problems together.  It is better to assume
    you have two separate problems which sometimes show up side-by-side,
    than to assume that it is all one big gigantic problem.  I assumed from
    your first two notes that you had exhausted virtual memory, then saw
    accvios, which is a common side-effect of running out of memory.
    
    For problem #2.  You could try this (it may or may not give us more
    information).  $ DEFINE MCC_LOG 8 and then enter
    MANA/ENTER/DEBUG/INTER=DECW and type GO at the DBG prompt.  Let it run
    overnight.  Then when the DBG prompt re-appears (probably will mention
    something about an ACCVIO exception), enter DBG> SHOW IMAGE and
    DBG> SHOW CALLS, and post the results here, along with any exception
    messages.  (Note also that if you have exhausted virtual memory, the
    debugger will probably tell you so.)  
    
    -Matt.
    
1994.10. "A lot of things are going on in there" by NANOVX::ROBERTS (Keith Roberts - DECmcc Toolkit Team) Thu Jan 09 1992 08:08 (37 lines)

When you run MCC on VMS, all the MM's are loaded into 1 process; in your case:

	DECmcc Kernel

	PM:  Iconic Map
	PM:  Notification PM

	FM:  Notification FM
	FM:  Alarms FM

	AM:  DECnet Phase-4 AM

The Alarms FM was tested very thoroughly for memory leaks - some were found
and fixed for 1.2 -- the testing will be done again to be sure.

But as I said before - it may not be a 'leak' at all.  Each component above
allocates a certain amount of memory just to operate.  Typically, some of
the memory is never returned.

For example, the Alarms FM has an in-memory database which maintains the
information about executing rules (like counter and status information).
An entry is made in the database when the rule is enabled - but the memory
consumed doesn't grow during execution.

When the Rule fires an event is declared by Alarms.  The Notification FM/PM
pick this up and light up the Icon.  Also, I believe, the information
about the rule is kept in a 'list' (by Notification) which you can then examine.

If you aren't deleting the entries in this list, I imagine they grow until
all available memory is consumed.
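
To illustrate the difference between a list that grows on every rule firing and one with a retention limit, here is a minimal Python sketch. It is not how the Notification FM is implemented (that is exactly the open question here); `NotificationLog` and the limit of 1000 entries are hypothetical.

```python
from collections import deque

class NotificationLog:
    """Keep only the most recent max_entries notifications."""

    def __init__(self, max_entries):
        # A deque with maxlen silently drops the oldest entry when full.
        self.entries = deque(maxlen=max_entries)
        self.total = 0

    def record(self, text):
        self.total += 1
        self.entries.append((self.total, text))

log = NotificationLog(max_entries=1000)
for _ in range(5965):  # the last notification number reported in reply .1
    log.record("User bytes sent threshold exceeded")

print(len(log.entries), log.entries[0][0], log.entries[-1][0])
```

With a bound in place, memory use stays flat however long the rules keep firing; without one, every notification adds to the process's dynamic memory until something is deleted by hand.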

                           -- P L E A S E --

        Could someone from the Notification team help here ?


/keith

1994.11. "ACCVIO occurs first, later comes INSVIRMEM" by TAVIS::PERETZ () Thu Jan 09 1992 11:23 (24 lines)

>    2) ACCVIO.  This is a priority 1 bug which needs to be QARed and looked at
>    in detail.  _Sometimes_ exhausting virtual memory will result in ACCVIOs.
>    When this happens, it is almost always preceded by some sort of
>    "Insufficient Virtual Memory" message.  Just an ACCVIO message with
>    nothing else implies a more serious bug.  How long do you run before
>    getting this message? Overnight?  If so, then chances are you have NOT
>    yet exhausted dynamic memory.

I am afraid this is the case here. Here is the timing of the events:
	1. Started DECmcc & 6 alarm rules.
	2. 15 hours later I saw the two ACCVIOs. No other messages.
 	   4 rules are still firing 4 times per minute.
	3. 9 hours later I see a DECmcc message window:
		Notify request 1 for domain .world encountered an error
		%MCC-E-INSVIRMEM, software error: Insufficient virtual memory
	4. and from then on there are no more notifications (last one is number
	   5818).
	5. Any command from now on results in: C allocation error.

So clearly the ACCVIOs are not a result of running out of virtual memory.

The debugging session will be done next week. I have to run now.

Peretz Gur-El

1994.12. "QAR 2084" by TOOK::MINTZ (Erik Mintz, DECmcc Development) Thu Jan 09 1992 13:35 (2 lines)

Entered as QAR 2084 at priority 1

1994.13. "some clarification of what is being done." by TOOK::CALLANDER (MCC = My Constant Companion) Fri Jan 10 1992 09:57 (41 lines)

    RE: most of the previous ones....
    
    Some more information to help shed some light on why you can see an
    accvio and then keep running.
    
    When the accvio occurs it can happen in a number of places, and some
    of these are capable of telling the requestor that they died; others
    are not. When they are capable of doing so, some memory cleanup
    can occur, allowing operations to continue. The fact that an accvio
    occurs and not a clean message is a big problem. This must be addressed
    and fixed. If an insufficient virtual memory message is returned, then we
    can try to do something to clean up the mess we are in, or tell the
    user about the problem. With the ACCVIO we are simply in a bad state.
    
    As Keith said, there is an awful lot going on (note that the
    notification PM functions are built into the iconic map; they are not
    two separate modules). One of the things we did throughout the system
    to help "speed" things up was to add a lot of specialized caching. I
    personally believe that this is part of the problem we are now seeing:
    we are caching additional information away, but we are not purging
    it when we see that we are running out of memory. More
    investigation is ongoing in this area.
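
As a general illustration of a cache that purges entries instead of growing without limit, here is a small least-recently-used cache in Python. It is a sketch of the technique only: `BoundedCache` is hypothetical, it evicts on a fixed item count rather than on actual memory pressure, and it says nothing about how the MCC caches are built.

```python
from collections import OrderedDict

class BoundedCache:
    """Cache that evicts the least recently used item beyond max_items."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        return default

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)

cache = BoundedCache(max_items=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used entry
cache.put("c", 3)   # evicts "b", the least recently used entry
print(len(cache), cache.get("a"), cache.get("b"), cache.get("c"))
```

The speed benefit of caching is kept for recently used data, while the memory cost has a known ceiling.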
    
    In general Jim Swist has taken on the major responsibility in this area
    for overseeing the work/investigation for the remainder of field test.
    Any ideas I am sure are welcome.
    
    Please understand we are *NOT* trying to trivialize this problem, it is
    the number one item on everyone's agenda. I believe that the answers you
    have received so far are simply trying to point out that the problem
    "appears" from a user's view to be in alarms, but due to the complexity
    of the system that may be the symptom and not the entire problem. The
    information you are providing will be helpful in finding the accvio
    portion of the problem and helping to make MCC more stable. If you
    could also post the rules that you are running we can see if we can
    duplicate the access violation on our test systems.
    
    Thanks for the testing, and PLEASE keep the input coming in.
    
    Jill Callander
    
1994.14. "Please continue the good work" by TAVIS::PERETZ () Sun Jan 12 1992 08:51 (27 lines)

>    Please understand we are *NOT* trying to trivialize this problem, it is
>    the number one item on everyone's agenda. 

Never thought you did. 

>    I believe that the answers you
>    have received so far are simply trying to point out that the problem
>    "appears" from a user's view to be in alarms, but due to the complexity
>    of the system that may be the symptom and not the entire problem. 

I have no idea which piece of code is responsible. I mentioned "alarms" only
because it happened when I defined and activated alarm rules.

>    The
>    information you are providing will be helpful in finding the accvio
>    portion of the problem and helping to make MCC more stable. If you
>    could also post the rules that you are running we can see if we can
>    duplicate the access violation on our test systems.

The rules were quite simple:
	"User bytes sent > x at every 00:01:00"  (x=50, 100, 200, 300, 400, 500)
So I had 6 rules firing every minute and appearing in the notification window.
I run VMS 5.4-2 on VAXstation 3100 M48 w 32MB memory. My quotas are listed
in previous replies to this note. You have to wait overnight before the ACCVIO
appears.

Peretz

1994.15. "Alarms-FM may be a wild-goose chase." by TOOK::GUERTIN (Don't fight fire with flames) Mon Jan 13 1992 08:20 (5 lines)

    I just tried the exact same rules.  They ran for almost two days before
    I had to kill them (to install another version of MCC).  No accvios.
    I'm convinced the accvios have little or nothing to do with alarms.
    
    -Matt.

1994.16. "But still they come..." by TAVIS::PERETZ () Mon Jan 13 1992 10:15 (8 lines)

>    I just tried the exact same rules.  They ran for almost two days before
>    I had to kill them (to install another version of MCC).  No accvios.
>    I'm convinced the accvios have little or nothing to do with alarms.

I don't know if they are from alarms or some other code. I ran it again last
night and this morning I had again 3 ACCVIOs. There IS a problem somewhere...

Peretz

1994.17. "did you test with the Iconic Map & Notification ?" by NANOVX::ROBERTS (Keith Roberts - DECmcc Toolkit Team) Mon Jan 13 1992 13:42 (5 lines)

Matt -- did you test with the Iconic Map & Notification ?

I believe that Peretz is testing with the Map & Notification enabled.

/keith

1994.18. "No" by TOOK::GUERTIN (Don't fight fire with flames) Mon Jan 13 1992 14:01 (18 lines)

    I'm trying not to get too involved here.  The point of my testing was
    *ONLY* to test if the Alarms-FM could run the specified rules for two
    days straight without accvioing and it did.  So, I ran alarms
    "stand-alone" from FCL.  Since it seemed to be humming right along, I
    concluded that the Alarms-FM was not directly involved in the accvios.
     
    I was concerned that because of the title of this note, and the general
    discussion, some people might head in the wrong direction and/or draw
    the wrong conclusions.  There should really be two notes here, one on
    running out of virtual memory, and another on accvios when running
    Iconic Map w/Notifications enabled.  And the accvios should be easy to
    isolate (previous note).  So, I'm assuming that someone on the
    Notification team (or Iconic Map team) has this well in hand (if not
    resolved) by now.
    
    -Matt.
    (ps., If I had more time I would have run the IMPM.  Why? Did you
    reproduce it?)

1994.19. "It happens with IMPM & Notification enabled" by TAVIS::PERETZ () Tue Jan 14 1992 04:56 (5 lines)

>I believe that Peretz is testing with the Map & Notification enabled.

That's correct.

Peretz

1994.20. "" by NANOVX::ROBERTS (Keith Roberts - DECmcc Toolkit Team) Tue Jan 14 1992 09:42 (8 lines)

re: .18



>    (ps., If I had more time I would have run the IMPM.  Why? Did you
>    reproduce it?)

No - I haven't tried - but I will.  Thanks 8)