T.R | Title | User | Personal Name | Date | Lines |
---|
518.1 | Version Information | CGOOA::VCOOKE | Vern Cooke @CTU (Western Canada CNS) | Tue Dec 13 1994 14:57 | 5 |
| Ooooops!
Forgot to mention that I am running PCM V1.5-002 with a 1.5-003 CONMON
image on a VAX/VMS 6.1 system.
................ Vern.
|
518.2 | | OPG::PHILIP | And through the square window... | Tue Dec 13 1994 17:05 | 12 |
| Vern,
Neither the C3 nor the daemons create log files unless the debug symbol is
defined, even then the information in the log files would be pretty useless
to you unless you were familiar with the internal workings of the product.
May I ask why you are looking for these log files, do you have a specific
problem.
Cheers,
Phil
|
518.3 | Description of Problems | CGOOA::VCOOKE | Vern Cooke @CTU (Western Canada CNS) | Tue Dec 13 1994 20:56 | 61 |
| Hi Phil:
Yes, I am experiencing a few problems and was looking for additional sources of
info to provide you with my questions. But, since there are no log files, here
is a description of the situation:
Last Sunday we cut over from VCS to Console Manager for 28 Customer systems we
manage out of our data centre. These systems are fairly active, with requests
every few minutes throughout the day.
Last night, Console Manager unexpectedly shut down. I talked the operator
through the process of re-starting it and hoped to look at the logs when I came in
this morning. Unfortunately, no logs! The info the operator provided was that
Console Manager displayed a shutdown message, then shut itself down.
Anyhow, when I arrived this morning, I found a number of systems unreachable. I
immediately shut down Console Manager and re-started it, watching the output
myself. All seemed to go okay, except one system was still unreachable. A "SHOW
SYSTEMS" gave the message "Port Open Failed". I looked at the associated
DECserver port and it showed a state of "Remote Idle".
Console Manager worked for the rest of the morning, then we were unable
to connect to systems - doing a "CONSOLE MONI" from DCL produced a blank
screen. I looked at the system with "SHOW SYSTEM" (in DCL) and found that the
two "Console Ctrl 01" and "Console Ctrl 02" processes were in RWMBX state. I
then shut down (CONSOLE SHUTDOWN) Console Manager and found that those two
processes would not go away. I ended up STOP/IDing them. Then, I restarted
Console Manager (@SYS$STARTUP:CONSOLE$STARTUP). This time, seven systems were
unreachable. Another shutdown/restart did not help - still seven were
unreachable. Finally, I ended up rebooting the Console Manager node. That
cleared things up. I am now able to communicate with all the systems.
So, this boils down to:
1) What would cause systems to become unreachable and why would they remain
unreachable after a Console Manager shutdown/restart? How can I fix this?
2) What causes the "Console Ctrl xx" processes to go into RWMBX state (I assume
this is abnormal from some of the other Notes I've read - I haven't had time
yet to experiment with some of the possible solutions including the -003
images)? How can I prevent this from happening?
I suspect this caused the unexpected shutdown. Correct suspicion?
3) Will the MUP I keep seeing references to fix any of these problems?
4) BONUS PROBLEM: We have defined "Request" events to alert the operator that
a mount request is outstanding. Since this text occurs at the start of the
line, we have found that the remainder of the line (intermittently) does not
appear in the ENS display AND is not properly displayed in the monitor
interface.
e.g. "Request 10127, from user BFARRELL on THOR" appears as
"Request 10127, f"
in both the monitor interface and the ENS display. The text does not appear
unless you wait for a period then VIEW the system again.
I am very concerned about the events treatment by ENS since ENS triggers
action routines passing the message text as a parameter. This becomes
serious for events dealing with system availability more so than requests.
I have read the notes and understand that Console Manager triggers events
when it sees the text and will not wait for the line to finish.
Is there any workaround to this problem?
Thank you for your help.
...... Vern.
|
518.4 | | OPG::PHILIP | And through the square window... | Tue Dec 13 1994 21:18 | 51 |
| Vern,
Thanks for the clear and concise description of your problems. Unfortunately,
I don't have too many answers for you except...
1) This shouldn't happen; PCM should reconnect to "lost" consoles.
2) I don't know what causes the RWMBX (well, I do, it's a resource
problem, it's just that we don't know exactly why). I notice in your first
note you say you are running V1.5-002. Could you copy the
CONSOLE$DAEMON.EXE_VAX image from OPG::CM$KIT: and try that? It starts
its "child" processes differently and you will get log files for them
with some semi-meaningful information, especially if you do a
define/system of CONSOLE$DEBUG to be DAEMON before you start the
software. The logs are in CONSOLE$TMP and are called CONTROLLER_xx, where xx
is the same as the number in the process name.
On top of all that, this image starts the children with quotas that
cause no problems for us in our test environment.
BTW, there is a new ENS image as well, which fixes some memory leaks; you
may want to try that too.
Should you decide to try these images, I would appreciate some feedback
letting us know if they work for you in your environment.
3) The MUP has all the fixes I have described up to now; most of the
significant fixes are also in the images I have pointed you to in (2)
above. If these do indeed fix your problem, then the MUP will too.
However, we have not put any fixes in for your specific problem, as we
have never been able to reproduce it here.
4) Ahh, now I can answer this one. We notify you of an event as soon
as the last character in the match string (regular expression or
literal) is seen on the console. When the event is passed on to
ENS, the rest of the line may not yet have been output by the managed system
(worst case), or the daemon may not yet have flushed the text to the log file.
This is a side-effect of us not waiting until the end of the line before
performing the notify (which is what I think VCS did). You can revert to
the VCS behaviour easily by changing your event definitions from
literals to regular expressions and adding the string *^ to the end of
the text to scan for. This will cause the daemon to scan for your
string, then any number of characters to the end of line. Only when the
end of line has been seen will notification take place, and as the data is
flushed to disk before the notification, you stand an extremely good
chance of the text being there for your action routine.
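As a rough illustration of the difference between the two behaviours, here is a small Python sketch. It is not PCM code and does not model PCM's own regular expression syntax (the *^ suffix); the function names and the character-by-character loop are assumptions made purely for illustration. The real truncation point (such as "Request 10127, f") depends on timing and buffering, which this sketch ignores.

```python
def notify_on_match(chars, text):
    """Literal-style event: fire as soon as the last character of `text` arrives.

    Returns the part of the current line seen at that moment, or None.
    """
    line = ""
    for ch in chars:
        if ch == "\n":
            line = ""          # a new console line starts
            continue
        line += ch
        if line.endswith(text):
            return line        # the rest of the line has not arrived yet
    return None


def notify_at_end_of_line(chars, text):
    """End-of-line-style event: fire only when the line holding `text` is complete."""
    line = ""
    for ch in chars:
        if ch == "\n":
            if text in line:
                return line    # the whole line is available to the action routine
            line = ""
        else:
            line += ch
    return None                # no complete matching line yet


# Hypothetical console stream, using the example line from this thread.
console_output = "Request 10127, from user BFARRELL on THOR\n"
early = notify_on_match(console_output, "Request")
late = notify_at_end_of_line(console_output, "Request")
print(repr(early))
print(repr(late))
```

The first call returns as soon as the match string is complete, so an action routine would receive only a fragment of the line; the second returns nothing until the end of line has been seen, which is the behaviour the *^ suffix is meant to restore.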
I hope all this helps.
Cheers,
Phil
|
518.5 | Thanks! | CGOOA::VCOOKE | Vern Cooke @CTU (Western Canada CNS) | Tue Dec 13 1994 23:23 | 26 |
| Phil:
Thank you very much for the quick response! Your answers were very
helpful:
1) Hmmm. I did notice a few extra LTA5xxx devices around after I had
shut down PCM the last time (forgot to mention it, sorry). This time
they all disappeared when I shut down PCM and reappeared when it
started. None disabled. I'll keep watching this to see if any
consistent pattern emerges.
2) Thank you for the images. I pulled them over and installed them
(manually "INSTALL REMOVE"ing the old ones first). I also defined
the CONSOLE$DEBUG logical so I could enjoy the log files! :-)
I'll let you know how these work.
3) When is the MUP (or V2.0) due out?
4) Great! I'll update my events tomorrow! Does that mean that my Request
event will be updated like this:
From: Text = Request
To: Text = Request*^
along with changing "Literal" to "Regular"?
Again, thank you for your help!
....... Vern.
|
518.6 | | OPG::PHILIP | And through the square window... | Wed Dec 14 1994 09:45 | 41 |
| Vern,
>> Thank you very much for the quick response! Your answers were very
>> helpful:
No problem, glad to be of use.
>> 1) Hmmm. I did notice a few extra LTA5xxx devices around after I had
>> shut down PCM the last time (forgot to mention it, sorry). This time
>> they all disappeared when I shut down PCM and reappeared when it
>> started. None disabled. I'll keep watching this to see if any
>> consistent pattern emerges.
These should go away at shutdown time. However, if you STOP/ID a child
controller, it won't have the chance to delete them (which is, I assume, how
you recovered from the RWMBX problem).
>> 2) Thank you for the images. I pulled them over and installed them
>> (manually "INSTALL REMOVE"ing the old ones first). I also defined
>> the CONSOLE$DEBUG logical so I could enjoy the log files! :-)
>> I'll let you know how these work.
Thank you for taking the time to try them out.
>> 3) When is the MUP (or V2.0) due out?
As soon as it's finished ;-) Actually, we hope to get it out before our
Christmas break.
>> 4) Great! I'll update my events tomorrow! Does that mean the my Request
>> event will be updated like this:
>> From: Text = Request
>> To: Text = Request*^
>> along with changing "Literal" to "Regular"?
Exactly! And by the way, the section on ENS in the PCM User's Guide shows
you all the regular expression characters.
Cheers,
Phil
|
518.7 | | ELGIN::RASOOLM | The computer in front is an ALPHA | Wed Dec 14 1994 12:03 | 12 |
| >>> 3) When is the MUP (or V2.0) due out?
> As soon as its finished ;-) Actually, we hope to get it out before our
> Christmas break.
I take that to refer to the MUP. Do you have a target date for V2.0?
Regards,
Max.
|
518.8 | Not Yet | OPG::SIMON | | Wed Dec 14 1994 12:55 | 6 |
| re .7
There is not yet an official target date for V2.0 and I do not wish to
discuss hopefuls in a public conference.
Cheers Simon...
|
518.9 | Problems Again | CGOOA::VCOOKE | Vern Cooke @CTU (Western Canada CNS) | Wed Dec 14 1994 22:53 | 123 |
| Hi Phil:
Well, we had another hang today with the new images. This time, the processes
did not enter the RWMBX state. They were not getting any CPU time and were
sitting in HIB and LEF. In other words, everything looked normal except that
nothing was running!
I did a CONSOLE SHUTDOWN and found that the Console Notify process and one of
the Ctrl processes did not stop. I have included SHOW PROCESS/ACCOUNTING and
SHOW PROCESS/QUOTA info for each of those below.
I ended up manually stopping both processes. Again, the LTA ports were left
behind. There was nothing special about these ports: LATCP showed them
connected to the target DECserver and port. This answers my previous
question (1) about the port problems in re-starting Console Manager. I
ended up manually deleting these ports using LATCP.
I was then able to successfully re-start Console Manager.
Unfortunately, though I had renamed the CONTROLLER_01.LOG and CONTROLLER_02.LOG
files, I left them in the CONSOLE$TEMP directory. The entire directory was
nicely cleaned out by Console Manager on startup, preventing me from enjoying
the log files. :-(
Please let me know what steps I should take the next time this happens. I
already plan on copying the .LOG files to another directory. Are there any SHOW
PROCESS or ANALYZE/SYSTEM things you would like me to try?
.......... Vern.
14-DEC-1994 15:12:05.28 User: SYSTEM Process ID: 00000503
Node: CTUPCM Process name: "Console Notify"
Process Quotas:
Account name: SYSTEM
CPU limit: Infinite Direct I/O limit: 100
Buffered I/O byte count quota: 300214 Buffered I/O limit: 8192
Timer queue entry quota: 255 Open file quota: 97
Paging file quota: 8086 Subprocess quota: 8
Default page fault cluster: 64 AST quota: 194
Enqueue quota: 2048 Shared file limit: 0
Max detached processes: 0 Max active jobs: 0
14-DEC-1994 15:12:27.80 User: SYSTEM Process ID: 00000503
Node: CTUPCM Process name: "Console Notify"
Accounting information:
Buffered I/O count: 32914 Peak working set size: 1655
Direct I/O count: 16838 Peak virtual size: 9577
Page faults: 1281 Mounted volumes: 0
Images activated: 3
Elapsed CPU time: 0 00:04:08.23
Connect time: 0 23:10:49.45
14-DEC-1994 15:10:27.95 User: SYSTEM Process ID: 00000510
Node: CTUPCM Process name: "Console Ctrl 02"
Process Quotas:
Account name: SYSTEM
CPU limit: Infinite Direct I/O limit: 1024
Buffered I/O byte count quota: 619446 Buffered I/O limit: 1024
Timer queue entry quota: 1023 Open file quota: 985
Paging file quota: 23368 Subprocess quota: 64
Default page fault cluster: 64 AST quota: 1004
Enqueue quota: 1024 Shared file limit: 0
Max detached processes: 0 Max active jobs: 0
14-DEC-1994 15:13:15.15 User: SYSTEM Process ID: 00000510
Node: CTUPCM Process name: "Console Ctrl 02"
Accounting information:
Buffered I/O count: 172667 Peak working set size: 6099
Direct I/O count: 88490 Peak virtual size: 14056
Page faults: 5346 Mounted volumes: 0
Images activated: 1
Elapsed CPU time: 0 00:17:11.45
Connect time: 0 23:11:29.73
14-DEC-1994 15:11:07.42 User: SYSTEM Process ID: 00000510
Node: CTUPCM Process name: "Console Ctrl 02"
Terminal:
User Identifier: [SYSTEM]
Base priority: 4
Default file spec: Not available
Devices allocated: CTUPCM$LTA5035:
CTUPCM$LTA5037:
CTUPCM$LTA5040:
CTUPCM$LTA5042:
CTUPCM$LTA5043:
CTUPCM$LTA5046:
CTUPCM$LTA5047:
CTUPCM$LTA5050:
CTUPCM$LTA5052:
CTUPCM$LTA5054:
CTUPCM$LTA5056:
CTUPCM$LTA5058:
(This CONSOLE STATUS was done after I did the SHUTDOWN and the two processes
were still left behind).
POLYCENTER Console Manager Summary
Totals
Configured Systems: 0 User disabled: 0
Active Systems : 0 (D:000 P:000 L:000 T:000) Unreachable: 000
Active Users : 5 (Connect/Monitor: 003 C3: 002 Event sources: 003)
CM pid ........: 00000000 Uptime: 0 00:00:00 (Not Running)
ENS pid .......: 00000503 V1.5-003 Uptime: 0 23:09:57
Total bytes ...: 0 (0)
Total lines ...: 0 (0)
Total events ..: 0 (0)
Total actions .: 1266 (0)
Active actions : 1 Failed actions : 0
Crit: 0 Maj: 0 Min: 0 Warn: 0 Clr: 0 Ind: 0
|
518.10 | More Info | CGOOA::VCOOKE | Vern Cooke @CTU (Western Canada CNS) | Wed Dec 14 1994 23:03 | 14 |
| Hi Phil. Me again:
I was reading 422 and it seems to describe my situation:
- I have a number of empty log files.
- I had to stop the processes manually then clean up the LTA ports.
- CONSOLE MONITOR was the command that hung.
The only item that I am not sure about is if CONSOLE$MONITOR goes into a
CPU loop. I did check the process of the user doing CONSOLE MONITOR and
that process was not clocking any CPU at all.
Just thought I would mention this.
....... Vern.
|