| I have noticed that the same thing occurs on our VAXcluster
at different times. The usual indication that ZAP has "lost" a
process on the system is that the stack size exceeds the process
count on the system. It seems to occur more frequently the longer
that ZAP has been running continuously on the system.
Might I suggest modifying the ZAPMAINT command procedure to resubmit
itself every day (sometime late at night if you don't run 24 hours).
Use this procedure to restart ZAP, which should help with the problem
of "lost" processes.
Another problem I have found is that, at times, a user will
abort from a "protected" (in ZAP.DAT) image and the system (VMS)
still thinks that the process is "executing" the image. Since
ZAP can only see what the system sees, it will think that the user
is still using the same "protected" image. One way to check this
is to run the WHO program (or a similar utility such as WHAT) to check
the image name for "unzapped" processes.
One way to solve the above problem is to put very long time limits
on those images which get "hung" instead of making them immune to
being zapped (example: 60 minutes instead of *).
Be careful of turning debug on. The log files really do get VERY
large. If the above suggestions do not help, then I will try to
simulate your problem here and see if I can find a solution.
Keith Maconi
|
|
I have experienced the same problem as in .0; however, I managed
to turn on debug and examine what is happening.
Simply telling ZAP to restart by setting ZAP$RUN_STATUS does not
clear the problem. I am currently running ZAP V3.7 on VMS V4.6.
I did see the problem using ZAP V3.6.
The debug output was comparing the offending process to the wrong
exception record from ZAP.DAT. The process was an interactive user,
connected via a terminal server.
It should have been comparing to a UIC record identifying the
user, with terminal = ALL, idle limit = 15, and image = *. However,
in this particular case it was using the exception record:
UIC = *, terminal = DETACH, idle limit = -1 (I assume this means *)
and image = *.
Comments, suggestions, or workarounds would be appreciated. I
am currently evaluating ZAP for use on this system but, as in .0,
I need 100%.
Running ZAP$MAINT cleared the problem and eventually stopped the
process in question.
Regards,
Ian
|
|
IMPORTANT NOTICE
Currently released versions of Zap can have problems "losing"
processes. This is caused by the process being scanned while
it is still logging into the system (at the username prompt). At
such time, it properly identifies the process as a DETACHed
job in [1,4].
Zap was also written not to update the MODE and TERMINAL NAME
of a process once detected. This caused it to forget about
the job until it was restarted (ZAP$RUN_STATUS = START).
I am currently testing (and field testing) a new version (3.8)
which updates the MODE and TERMINAL NAME each time it scans
a process. This should solve the problem of "lost" users.
As soon as it is available, it will be posted in this notes file.
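For anyone trying to picture the failure, here is a minimal sketch
in Python (not Zap's own source). It assumes that rules are tried
in order and the first match wins; the names Rule, RULES, pick_rule
and scan, the UIC [200,1] and the pid are all made up for
illustration. It only shows how a MODE captured once at the
username prompt keeps selecting the DETACH exception record, while
refreshing it on each scan selects the user's own record.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    uic: str         # "*" matches any UIC (assumed wildcard meaning)
    terminal: str    # "DETACH" or "ALL" (hypothetical values)
    idle_limit: int  # minutes; -1 is taken to mean "never zap"


# Hypothetical rule table modelled on the records quoted in this note.
RULES = [
    Rule(uic="*", terminal="DETACH", idle_limit=-1),
    Rule(uic="[200,1]", terminal="ALL", idle_limit=15),
]


def pick_rule(uic, mode):
    """Return the first rule matching this UIC and process mode."""
    for rule in RULES:
        uic_ok = rule.uic in ("*", uic)
        if rule.terminal == "DETACH":
            term_ok = mode == "DETACH"
        else:
            # Assumption: "ALL" covers any non-detached session.
            term_ok = mode != "DETACH"
        if uic_ok and term_ok:
            return rule
    return None


def scan(cache, pid, uic, mode, refresh):
    """Scan one process; the cache maps pid -> remembered mode.

    refresh=False mimics the old behaviour (mode stored once),
    refresh=True mimics updating the mode on every scan.
    """
    if refresh or pid not in cache:
        cache[pid] = mode
    return pick_rule(uic, cache[pid])


if __name__ == "__main__":
    for refresh in (False, True):
        cache = {}
        # First scan catches the process at the username prompt.
        scan(cache, 42, "[1,4]", "DETACH", refresh)
        # Later scan: the user is logged in and interactive.
        rule = scan(cache, 42, "[200,1]", "INTERACTIVE", refresh)
        print("refresh =", refresh, "-> idle limit", rule.idle_limit)
```

With the stale mode the interactive user is still judged against
the DETACH record and is never timed out; with the mode refreshed
the 15 minute record applies.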
Keith Maconi
|