T.R | Title | User | Personal Name | Date | Lines |
---|
1994.1 | More problems - dynamic memory | TAVIS::PERETZ | | Sun Dec 29 1991 05:27 | 38 |
| Another problem related to alarm rules:
I created the same three alarm rules mentioned in the previous note and let
them run over the weekend. When I checked the workstation again after 2.5 days,
I saw:
1. In the notification window: a list of alarm notifications from all
3 rules, as expected, but only until yesterday morning. At about
01:10:00 yesterday morning all 3 rules stopped firing. The last
notification is number 5965.
2. In the DECterm window: no information (no ACCVIO, no X Toolkit
warning, nothing).
3. A DECmcc message window saying there is not enough dynamic memory.
4. When I try any SHOW command on any of the alarm rules,
I get a DECmcc message window: C allocation error.
5. When I try SHOW commands on my node4 entity, I get
either: C allocation error
or: dispatch local management module file access error during
probe.
Then I tried a SHOW STATUS command on another node4 entity and received
the following messages in the DECterm window:
%DEBUGBOOT-W-VASFULL, virtual addres space is full
%XLIB-E-INSFMEM, insufficient dynamic memory
%DEBUGBOOT-W-VASFULL, virtual addres space is full
%XLIB-E-INSFMEM, insufficient dynamic memory
%Thread 190 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000
PC=201C0000, PSL=01C883CC
My PAGEDYN and NPAGEDYN are both 1499648. Is that enough? What is the
recommended value? Should I change other SYSGEN parameters?
Peretz Gur-El
|
1994.2 | could be a quota problem | MOLAR::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Tue Dec 31 1991 08:28 | 15 |
| Peretz Gur-El,
Are you running the T1.2.4 kit? The Alarms FM was tested for memory
leaks, and none were found in the T1.2.4 kit.
The Notification FM maintains information about fired rules, which (I suspect)
is stored in allocated dynamic memory. This could be causing the
insufficient virtual memory errors you are seeing.
What is the value of the SYSGEN parameter VIRTUALPAGECNT?
Also, go to AUTHORIZE, display the user process quotas, and post your
results here.
/keith
|
1994.3 | Here are the quotas | TAVIS::PERETZ | | Wed Jan 01 1992 02:54 | 42 |
| > Are you running the t1.2.4 kit -- The Alarms FM was tested for memory
> leaks -- none were found in the t1.2.4 kit.
Yes, I am running the T1.2.4 kit.
> Goto Authorize, and display the user process quotas. Post your
> results here.
$ mc authorize
UAF> sho demo
Username: DEMO Owner:
Account: UIC: [300,300] ([300,300])
CLI: DCL Tables: DCLTABLES
Default: TELCOM$DKB100:[DEMO]
LGICMD: LOGIN
Flags:
Primary days: Mon Tue Wed Thu Fri
Secondary days: Sat Sun
No access restrictions
Expiration: (none) Pwdminimum: 6 Login Fails: 0
Pwdlifetime: 30 00:00 Pwdchange: 23-DEC-1991 15:43
Last Login: 1-JAN-1992 10:40 (interactive), 1-JAN-1992 10:43 (non-interactive)
Maxjobs: 0 Fillm: 150 Bytlm: 64000
Maxacctjobs: 0 Shrfillm: 0 Pbytlm: 0
Maxdetach: 0 BIOlm: 100 JTquota: 1024
Prclm: 4 DIOlm: 100 WSdef: 4096
Prio: 4 ASTlm: 100 WSquo: 4000
Queprio: 4 TQElm: 150 WSextent: 16000
CPU: (none) Enqlm: 512 Pgflquo: 100000
Authorized Privileges:
LOG_IO SETPRV TMPMBX NETMBX PHY_IO SYSPRV
Default Privileges:
LOG_IO SETPRV TMPMBX NETMBX PHY_IO SYSPRV
SYSGEN> SHO VIRTUALPAGECNT
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VIRTUALPAGECNT 73536 9216 512 1000000 Pages
Peretz Gur-El
|
1994.4 | parameters look pretty good | MOLAR::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Thu Jan 02 1992 08:12 | 29 |
| >>> Authorize Quotas
o The working set values look a bit odd. Typically
wsdef < wsquo < wsextent, for example:
512 < 4096 < 16000
Yours have WSdef (4096) larger than WSquo (4000). But I don't think
your settings would cause any problems.
o The page file quota is good. I forgot to ask: what size is your page
file? It should be at least as large as your PGFLQUO.
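The two rules of thumb above can be written as a quick check. This is a hypothetical helper (`check_quotas` is not part of DECmcc or AUTHORIZE). It assumes every value is in 512-byte units, which is how AUTHORIZE reports working-set and page-file quotas on a VAX:

```python
def check_quotas(wsdef, wsquo, wsextent, pgflquo, pagefile_blocks):
    """Return warnings for the rules of thumb discussed above.

    Hypothetical helper; all arguments are in 512-byte pagelets/blocks.
    """
    warnings = []
    # Rule of thumb: WSDEF < WSQUO < WSEXTENT (for example 512 < 4096 < 16000).
    if not (wsdef < wsquo < wsextent):
        warnings.append(
            f"working set ordering unusual: WSdef={wsdef}, "
            f"WSquo={wsquo}, WSextent={wsextent}")
    # The page file should be at least as large as PGFLQUO.
    if pagefile_blocks < pgflquo:
        warnings.append(
            f"page file ({pagefile_blocks} blocks) is smaller than "
            f"PGFLQUO ({pgflquo})")
    return warnings
```

The DEMO account posted in .3 has WSdef 4096, WSquo 4000 and WSextent 16000, so the first check would flag the unusual ordering. The page-file check can only be evaluated once the page file size is known.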
>>> Virtual Page Count
73,536 looks good, but it should probably be higher: about
100,000.
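For scale, VIRTUALPAGECNT counts 512-byte VAX pages. This small sketch converts the current and suggested values to bytes. `pages_to_mib` is an illustrative helper, not a VMS utility:

```python
# VIRTUALPAGECNT limits how many virtual pages one process can map.
# On VAX/VMS each page is 512 bytes.
VAX_PAGE_BYTES = 512

def pages_to_mib(pages):
    """Convert a count of 512-byte VAX pages to mebibytes."""
    return pages * VAX_PAGE_BYTES / (1024 * 1024)

for label, pages in (("current", 73536), ("suggested", 100000)):
    print(f"{label}: {pages} pages = {pages * VAX_PAGE_BYTES} bytes "
          f"(~{pages_to_mib(pages):.1f} MiB)")
```

This prints about 35.9 MiB for the current setting and about 48.8 MiB for the suggested one.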
>>> Due to the nature of the operations you were performing, your
system may require a large memory configuration (not just physical
memory - but authorize and sysgen parameters).
Could someone from the Notification team give us an idea of the
memory consumption when a lot of event data occurs and accumulates
on the map?
/keith
|
1994.5 | We know of one of the problems you have noticed | TOOK::ORENSTEIN | | Fri Jan 03 1992 13:44 | 11 |
|
The Alarms team is also aware of one of the problems you have seen.
If one of our running rules (threads) has an ACCVIO and vanishes,
we don't have a big brother checking for this situation. So the
rule's state will never be set to DISABLE, and you will not be
able to delete the rule. In this case, you must EXIT the MCC process,
and everything will be properly cleaned up.
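Here is a minimal sketch of the kind of "big brother" described above, assuming each rule runs in its own thread. Python stands in for the real DECthreads code, and a Python exception stands in for the ACCVIO. `RuleThread`, `big_brother`, and the ENABLED/DISABLED state table are hypothetical names, not the Alarms FM's actual design:

```python
import threading

class RuleThread(threading.Thread):
    """Runs one hypothetical alarm rule and records the exception that kills it."""

    def __init__(self, name, evaluate, interval_s):
        super().__init__(name=name, daemon=True)
        self.evaluate = evaluate          # callable doing one rule evaluation
        self.interval_s = interval_s      # polling interval in seconds
        self.failure = None               # exception that terminated the thread
        self.stop_event = threading.Event()

    def run(self):
        try:
            while not self.stop_event.is_set():
                self.evaluate()
                self.stop_event.wait(self.interval_s)
        except Exception as exc:          # the thread ends here, like the ACCVIO case
            self.failure = exc

def big_brother(rules, states):
    """Set every ENABLED rule whose thread has ended to DISABLED."""
    for rule in rules:
        if not rule.is_alive() and states.get(rule.name) == "ENABLED":
            states[rule.name] = "DISABLED"
            print(f"rule {rule.name} disabled: {rule.failure!r}")
```

`big_brother` could be called periodically from its own thread or timer. The dead rule's state would then become DISABLED, so the rule could be deleted without exiting the process.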
aud...
|
1994.6 | Any plans to correct it? | TAVIS::PERETZ | | Wed Jan 08 1992 11:04 | 19 |
| > The Alarms team is also aware of one of the problems you have seen.
>
> If one of our running rule (threads) has an ACCVIO and vanishes,
> we don't have a big brother checking on this situation. So you will
> see that the state will never be set to DISABLE and you will not be
> able to delete the rule. In this case, you must EXIT the MCC process
> and everything will be properly cleaned-up.
1. What about the other problems?
2. Exiting DECmcc surely will clean up everything, but I am sure you agree
this is not the right solution. Are there any plans to do something:
A. To let me know that a thread terminated, and what that means
from a manager's perspective (if a thread terminates, I know it by
looking at the DECterm, but I have no idea how it affects my
DECmcc, i.e. which one of the many alarm rules is dead).
B. To be able to correct the situation WITHOUT leaving DECmcc.
Peretz Gur-El
|
1994.7 | Not what you want to hear... | TOOK::ORENSTEIN | | Wed Jan 08 1992 12:42 | 22 |
|
We are taking a hard look at memory consumption in MCC. Once you have
run out of dynamic memory, you will have to exit MCC and start again.
There is no graceful way to get around this. We are very conscious of
this and we are doing the best we can.
ALARMS will try to let you know if your thread has died, but I cannot
guarantee that you will see this in the V1.2 product. As for telling
whether your thread has died, the only thing I can suggest is to
compare the "last evaluation time" with the polling time to see if
it makes sense. For example, if your polling interval is 15
minutes and the last evaluation time is an hour ago, you can assume
that the thread died.
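Checking this by hand gets tedious once there are many rules, so the comparison can be scripted. The sketch below assumes you can obtain each rule's polling interval and last evaluation time, for example by copying them from SHOW output. The `grace` factor of 3 missed intervals is an arbitrary choice, not a DECmcc setting, and the rule names are made up:

```python
from datetime import datetime, timedelta

def suspect_dead_rules(rules, now, grace=3):
    """Return the names of rules that look dead.

    rules: mapping of rule name -> (polling interval, last evaluation time).
    A rule is suspect when its last evaluation is more than `grace`
    polling intervals old.
    """
    return sorted(
        name
        for name, (interval, last_eval) in rules.items()
        if now - last_eval > grace * interval
    )

now = datetime(1992, 1, 8, 12, 0)
rules = {
    "hourly_gap": (timedelta(minutes=15), now - timedelta(hours=1)),
    "healthy": (timedelta(minutes=15), now - timedelta(minutes=10)),
}
print(suspect_dead_rules(rules, now))
```

With the 15-minute example above, the rule last evaluated an hour ago is reported and the one evaluated 10 minutes ago is not.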
Unfortunately, until I add the big brother to detect dead threads,
there will be no way for you to reuse a dead rule without exiting MCC.
We have recently switched over to using DECthreads (CMA), and now that
we are in field test, we can, as a group, take a better look at dead
thread detection.
aud...
|
1994.8 | 1. You are right 2.It happened again | TAVIS::PERETZ | | Thu Jan 09 1992 03:18 | 33 |
| > We are taking a hard look at memory consumption in MCC. Once you have
> run out of dynamic memory, you will have to exit MCC and start again.
> There is no graceful way to get around this. We are very conscious of
> this and we are doing the best we can.
The question is: why do I run out of dynamic memory? This happened to me
again last night. I defined 6 alarm rules that evaluate every minute
and let them run overnight. This morning I found two dead bodies:
$ mana/enter/inter=decw
%Thread 170 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000, PC=201C0000, PSL=01B78BCC
%Thread 166 terminating with exception:
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000, PC=201C0000, PSL=01C50FCC
So I guess there are still some leaks in ALARMS. I will leave the process
running and see what happens to the remaining 4 rules.
My configuration and quotas are listed in previous notes.
> ALARMS will try to let you know if your thread has died, but I can not
> guarantee that you will see this in the V1.2 product. As to being able
> to tell if your thread has died, the only thing I can suggest is to
> compare the "last evaluation time" with the polling time to see if
> it makes sense. To better clarify, if your polling time is every 15
> minutes and the last evaluation time is an hour ago, you can assume
> that the thread died.
Sure, but when you have 100 rules, that takes some time...
Peretz
|
1994.9 | Don't confuse ACCVIO with INSVIRMEM | TOOK::GUERTIN | Don't fight fire with flames | Thu Jan 09 1992 07:13 | 37 |
| I think the point here is that this looks like two problems.
1) Memory leaks. The software (MCC in general) allocates dynamic
memory but sometimes forgets to release it. After about 2.5 days of
fairly heavy use, it exhausts virtual memory. You could probably get
it to die earlier than that. The workaround is to exit MCC at some
convenient time, perhaps during the night. Some "always-running"
sites have a command file or a night-shift operator that does this
at the lowest-usage / least-critical time. We view this as an
extremely critical, very high priority bug. But realistically, there
are just too many lines of code to clean out all memory leaks.
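One rough way to pick the restart window is to sample the process's virtual memory use twice and extrapolate. This sketch assumes growth is roughly linear, which real leaks often are not. The sample numbers in the usage line are made up for illustration:

```python
def hours_until_exhausted(t1_h, used1, t2_h, used2, limit):
    """Linearly extrapolate how many hours after t2_h usage reaches `limit`.

    t1_h, t2_h: sample times in hours; used1, used2: virtual pages in use
    at those times; limit: the cap (for example VIRTUALPAGECNT).
    Returns None if usage is not growing.
    """
    rate = (used2 - used1) / (t2_h - t1_h)   # pages per hour
    if rate <= 0:
        return None
    return (limit - used2) / rate

# Hypothetical samples: 10,000 pages at hour 0, 20,000 pages at hour 10.
print(hours_until_exhausted(0, 10_000, 10, 20_000, 73_536))
```

With those made-up samples the estimate is about 53.5 hours after the second sample. A restart scheduled comfortably before that point would avoid hitting the limit, if the growth really is linear.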
2) ACCVIO. This is a priority 1 bug which needs to be QARed and looked at
in detail. _Sometimes_ exhausting virtual memory will result in ACCVIOs.
When this happens, it is almost always preceded by some sort of
"Insufficient Virtual Memory" message. An ACCVIO message with
nothing else implies a more serious bug. How long do you run before
getting this message? Overnight? If so, then chances are you have NOT
yet exhausted dynamic memory.
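The rule of thumb above can be applied mechanically to a saved DECterm log: an ACCVIO preceded by a memory-exhaustion message is probably a side effect, while an ACCVIO on its own points to a separate bug. This sketch assumes the message lines are available as strings. The set of memory-related codes is taken from the messages posted in this topic and is not exhaustive:

```python
# Message codes seen in this topic that indicate memory exhaustion.
MEMORY_CODES = ("-INSVIRMEM", "-INSFMEM", "-VASFULL")

def classify(messages):
    """Classify a sequence of VMS-style message lines.

    Returns "side-effect" if the first ACCVIO follows a memory error,
    "separate-bug" if an ACCVIO appears with no earlier memory error,
    "memory-only" if there are memory errors but no ACCVIO, else "clean".
    """
    saw_memory_error = False
    for line in messages:
        if any(code in line for code in MEMORY_CODES):
            saw_memory_error = True
        elif "-ACCVIO" in line:
            return "side-effect" if saw_memory_error else "separate-bug"
    return "memory-only" if saw_memory_error else "clean"

overnight_log = [
    "%Thread 170 terminating with exception:",
    "%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000",
]
print(classify(overnight_log))
```

The overnight log from .8 classifies as "separate-bug".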
Please do not mix the two problems together. It is better to assume
you have two separate problems which sometimes show up side-by-side,
than to assume that it is all one big gigantic problem. I assumed from
your first two notes that you had exhausted virtual memory, then saw
accvios, which is a common side-effect of running out of memory.
For problem #2, you could try the following (it may or may not give us
more information):
a) $ DEFINE MCC_LOG 8
b) Enter MANA/ENTER/DEBUG/INTER=DECW and type GO at the DBG prompt.
c) Let it run overnight.
d) When the DBG prompt reappears (it will probably mention an ACCVIO
exception), enter DBG> SHOW IMAGE and DBG> SHOW CALLS.
e) Post the results here, along with any exception messages.
(Note also that if you have exhausted virtual memory, the debugger will
probably tell you so.)
-Matt.
|
1994.10 | A lot of things are going on in there | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Thu Jan 09 1992 08:08 | 37 |
| When you run MCC on VMS, all the MMs are loaded into one process; in your case:
DECmcc Kernel
PM: Iconic Map
PM: Notification PM
FM: Notification FM
FM: Alarms FM
AM: DECnet Phase-4 AM
The Alarms FM was tested very thoroughly for memory leaks - some were found
and fixed for 1.2 -- the testing will be done again to be sure.
But as I said before - it may not be a 'leak' at all. Each component above
allocates a certain amount of memory just to operate. Typically, some of
the memory is never returned.
For example, the Alarms FM has an in-memory database which maintains the
information about executing rules (like counter and status information).
An entry is made in the database when the rule is enabled - but the memory
consumed doesn't grow during execution.
When the rule fires, Alarms declares an event. The Notification FM/PM
picks this up and lights up the icon. I also believe the information
about the rule is kept in a 'list' (by Notification) which you can then examine.
If you aren't deleting the entries in this list, I imagine it grows until
all available memory is consumed.
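This is a hypothetical bounded notification list that illustrates the growth described above. Once it holds `capacity` entries, each new notification discards the oldest, so memory use stays flat however long the rules keep firing. This is not how the Notification FM is implemented. `NotificationList` and its capacity are assumptions for the sketch:

```python
from collections import deque

class NotificationList:
    """Keeps at most `capacity` notifications, discarding the oldest."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)  # bounded: old entries fall off
        self.total = 0                         # running notification number

    def add(self, rule_name, text):
        self.total += 1
        self.entries.append((self.total, rule_name, text))

    def delete_all(self):
        """Equivalent of the user clearing the list by hand."""
        self.entries.clear()

notes = NotificationList(capacity=3)
for i in range(5):
    notes.add("rule_1", f"fired {i}")
print(notes.total, len(notes.entries), notes.entries[0][0])
```

An unbounded Python list in the same loop would hold all five entries. With rules firing every minute around the clock, that difference is the kind of growth that could exhaust memory.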
-- P L E A S E --
Could someone from the Notification team help here ?
/keith
|
1994.11 | ACCVIO occurs first, later comes INSVIRMEM | TAVIS::PERETZ | | Thu Jan 09 1992 11:23 | 24 |
| > 2) ACCVIO. This is a priority 1 bug which needs to QARed and looked at
> in detail. _Sometimes_ exhausting virtual memory will result in ACCVIOs.
> When this happens, it is almost always preceded by some sort of
> "Insufficient Virtual Memory" message. Just an ACCVIO message with
> nothing else implies a more serious bug. How long do you run before
> getting this message? Overnight? If so, then chances are you have NOT
> yet exhausted dynamic memory.
I am afraid this is the case here. Here is the timing of the events:
1. Started DECmcc & 6 alarm rules.
2. 15 hours later I saw the two ACCVIOs. No other messages.
4 rules are still firing 4 times per minute.
3. 9 hours later I see a DECmcc message window:
Notify request 1 for domain .world encountered an error
%MCC-E-INSVIRMEM, software error: Insufficient virtual memory
4. and from then on there are no more notifications (last one is number
5818).
5. Any command from now on results in: C allocation error.
So clearly the ACCVIOs are not a result of running out of virtual memory.
The debugging session will be done next week. I have to run now.
Peretz Gur-El
|
1994.12 | QAR 2084 | TOOK::MINTZ | Erik Mintz, DECmcc Development | Thu Jan 09 1992 13:35 | 2 |
| Entered as QAR 2084 at priority 1
|
1994.13 | some clarification of what is being done. | TOOK::CALLANDER | MCC = My Constant Companion | Fri Jan 10 1992 09:57 | 41 |
| RE: most of the previous ones....
Some more information to help shed some light on why you can see an
ACCVIO and then keep running.
When an ACCVIO occurs, it can happen in a number of places. Some of
these are capable of telling the requestor that they died; others
are not. When they are capable of doing so, some memory cleanup
can occur, allowing operations to continue. The fact that an ACCVIO
occurs instead of a clean error message is a big problem. This must be
addressed and fixed. If an insufficient virtual memory message is returned,
we can try to do something to clean up the mess we are in, or tell the
user about the problem. With the ACCVIO we are simply in a bad state.
Like Keith said, there is an awful lot of stuff going on (note that the
notification PM functions are built into the iconic map; they are not
two separate modules). One of the things we did throughout the system
to help "speed" things up was to add a lot of specialized caching. I
personally believe that this is part of the problem we are now seeing:
we are caching additional information away, but we are not purging
it when we see that we are running out of memory. More
investigation is ongoing in this area.
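Here is a sketch of the missing purge step described above: a cache that sheds entries when an allocation fails. The LRU policy, the 50% purge fraction, and the retry-once allocator are assumptions for illustration. `alloc` is a stand-in for whatever allocation routine the real code uses:

```python
from collections import OrderedDict

class PurgeableCache:
    """LRU cache that can shed its least recently used entries on demand."""

    def __init__(self):
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)

    def purge(self, fraction=0.5):
        """Drop the least recently used `fraction` of entries; return the count."""
        count = int(len(self.items) * fraction)
        for _ in range(count):
            self.items.popitem(last=False)
        return count

def allocate_with_purge(alloc, size, caches):
    """Call alloc(size); on MemoryError, purge every cache and retry once."""
    try:
        return alloc(size)
    except MemoryError:
        for cache in caches:
            cache.purge()
        return alloc(size)  # if this also fails, the caller gets a clean error
```

If the retry still fails, the error propagates as an ordinary failure the caller can report. That is the kind of clean message, rather than an ACCVIO, asked for above.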
In general, Jim Swist has taken on the major responsibility in this area
for overseeing the work/investigation for the remainder of field test.
Any ideas, I am sure, are welcome.
Please understand we are *NOT* trying to trivialize this problem, it is
the number one item on everyone's agenda. I believe that the answers you
have received so far are simply trying to point out that the problem
"appears" from a users view to be in alarms, but due to the complexity
of the system that may be the symptom and not the entire problem. The
information you are providing will be helpful in finding the accvio
portion of the problem and helping to make MCC more stable. If you
could also post the rules that you are running we can see if we can
duplicate the access violation on our test systems.
Thanks for the testing, and PLEASE keep the input coming in.
Jill Callander
|
1994.14 | Please continue the good work | TAVIS::PERETZ | | Sun Jan 12 1992 08:51 | 27 |
| > Please understand we are *NOT* trying to trivialize this problem, it is
> the number one item on everyone's agenda.
Never thought you did.
> I believe that the answers you
> have received so far are simply trying to point out that the problem
> "appears" from a users view to be in alarms, but due to the complexity
> of the system that may be the symptom and not the entire problem.
I have no idea which piece of code is responsible. I mentioned "alarms" only
because it happened when I defined and activated alarm rules.
> The
> information you are providing will be helpful in finding the accvio
> portion of the problem and helping to make MCC more stable. If you
> could also post the rules that you are running we can see if we can
> duplicate the access violation on our test systems.
The rules were quite simple:
"User bytes sent > x at every 00:01:00" (x=50, 100, 200, 300, 400, 500)
So I had 6 rules firing every minute and appearing in the notification window.
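For a sense of the volume involved: the six rules differ only in their threshold. Assume the counter stays above all six thresholds, as the steady stream of notifications suggests. Then every rule fires on every one-minute evaluation. A small sketch (the helper names are hypothetical):

```python
THRESHOLDS = (50, 100, 200, 300, 400, 500)  # the x values in the six rules

def firing_rules(user_bytes_sent):
    """Thresholds whose rule "User bytes sent > x" is true for this sample."""
    return [x for x in THRESHOLDS if user_bytes_sent > x]

def notifications_per_day(rules_firing, interval_minutes=1):
    """Notifications per day if each firing rule fires on every evaluation."""
    return rules_firing * (24 * 60 // interval_minutes)

print(firing_rules(250))                               # [50, 100, 200]
print(notifications_per_day(len(firing_rules(1000))))  # 8640
```

At that rate, whatever Notification keeps per fired rule accumulates at several thousand entries per day.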
I run VMS 5.4-2 on a VAXstation 3100 M48 with 32 MB of memory. My quotas are listed
in previous replies to this note. You have to wait overnight before the ACCVIO
appears.
Peretz
|
1994.15 | Alarms-FM may be a wild-goose chase. | TOOK::GUERTIN | Don't fight fire with flames | Mon Jan 13 1992 08:20 | 5 |
| I just tried the exact same rules. They ran for almost two days before
I had to kill them (to install another version of MCC). No accvios.
I'm convinced the accvios have little or nothing to do with alarms.
-Matt.
|
1994.16 | But still they come... | TAVIS::PERETZ | | Mon Jan 13 1992 10:15 | 8 |
| > I just tried the exact same rules. They ran for almost two days before
> I had to kill them (to install another version of MCC). No accvios.
> I'm convinced the accvios have little or nothing to do with alarms.
I don't know if they are from alarms or some other code. I ran it again last
night and this morning I had again 3 ACCVIOs. There IS a problem somewhere...
Peretz
|
1994.17 | did you test with the Iconic Map & Notification ? | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Mon Jan 13 1992 13:42 | 5 |
| Matt -- did you test with the Iconic Map & Notification ?
I believe that Peretz is testing with the Map & Notification enabled.
/keith
|
1994.18 | No | TOOK::GUERTIN | Don't fight fire with flames | Mon Jan 13 1992 14:01 | 18 |
| I'm trying not to get too involved here. The point of my testing was
*ONLY* to test if the Alarms-FM could run the specified rules for two
days straight without accvioing and it did. So, I ran alarms
"stand-alone" from FCL. Since it seemed to be humming right along, I
concluded that the Alarms-FM was not directly involved in the accvios.
I was concerned that because of the title of this note, and the general
discussion, some people might head in the wrong direction and/or draw
the wrong conclusions. There should really be two notes here, one on
running out of virtual memory, and another on accvios when running
Iconic Map w/Notifications enabled. And the accvios should be easy to
isolate (previous note). So, I'm assuming that someone on the
Notification team (or Iconic Map team) has this well in hand (if not
resolved) by now.
-Matt.
(ps., If I had more time I would have run the IMPM. Why? Did you
reproduce it?)
|
1994.19 | It happens with IMPM & Notification enabled | TAVIS::PERETZ | | Tue Jan 14 1992 04:56 | 5 |
| >I believe that Peretz is testing with the Map & Notification enabled.
That's correct.
Peretz
|
1994.20 | | NANOVX::ROBERTS | Keith Roberts - DECmcc Toolkit Team | Tue Jan 14 1992 09:42 | 8 |
| re: .18
> (ps., If I had more time I would have run the IMPM. Why? Did you
> reproduce it?)
No - I haven't tried - but I will. Thanks 8)
|