Conference csc32::consolemanager

Title:POLYCENTER Console Manager
Notice:Kits, Scans, Docs on CSC32:: as PCM$KITS:,PCM$DOCS:, PCM$SCANS:
Moderator:CSC32::BUTTERWORTH
Created:Thu Aug 06 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1541
Total number of notes:6564

518.0. "Looking for .LOG Files" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Tue Dec 13 1994 14:51

    Hello:
    
    I am trying to track down the .LOG files that the C3 and DAEMON
    processes create in a VMS environment. I have looked in SYS$MANAGER,
    CONSOLE$TMP and CONSOLE$LOGSFILES. The only files I am able to locate
    are in CONSOLE$LOGSFILES and relate to monitored systems. I have even
    done a DIR disk:[000000...]*DAEMON*.* on both disks in the PCM
    system to no avail!
    
    Is there some special logical that must be defined to create these log
    files? Specifically, I am after the log file that would contain info on
    LAT connection failures.
    				Any help would be appreciated,
    						.............. Vern.

518.1. "Version Information" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Tue Dec 13 1994 14:57

    Ooooops!
    
    Forgot to mention that I am running PCM V1.5-002 with a 1.5-003 CONMON
    image on a VAX/VMS 6.1 system.
    					................ Vern.

518.2. by OPG::PHILIP (And through the square window...) Tue Dec 13 1994 17:05

Vern,

  Neither the C3 nor the daemons create log files unless the debug symbol is
  defined. Even then, the information in the log files would be pretty useless
  to you unless you were familiar with the internal workings of the product.

  May I ask why you are looking for these log files? Do you have a specific
  problem?

Cheers,
Phil

518.3. "Description of Problems" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Tue Dec 13 1994 20:56

Hi Phil:

Yes, I am experiencing a few problems and was looking for additional sources of
info to include with my questions. But, since there are no log files, here is a
description of the situation:

Last Sunday we cut over from VCS to Console Manager for 28 Customer systems we
manage out of our data centre. These systems are fairly active, with requests
every few minutes throughout the day.

Last night, Console Manager unexpectedly shut down. I talked the operator
through the process of restarting it and hoped to look at the logs when I came
in this morning. Unfortunately, no logs! The info the operator provided was that
Console Manager displayed a shutdown message, then shut itself down.

Anyhow, when I arrived this morning, I found a number of systems unreachable. I
immediately shut down Console Manager and re-started it, watching the output
myself. All seemed to go okay, except one system was still unreachable. A "SHOW
SYSTEMS" gave the message "Port Open Failed". I looked at the associated
DECserver port and it showed a state of "Remote Idle".

Console Manager worked for the rest of the morning, then we were unable
to connect to systems - doing a "CONSOLE MONI" from DCL produced a blank
screen. I looked at the system with "SHOW SYSTEM" (in DCL) and found that the
two "Console Ctrl 01" and "Console Ctrl 02" processes were in RWMBX state. I
then shut down (CONSOLE SHUTDOWN) Console Manager and found that those two
processes would not go away. I ended up STOP/IDing them. Then, I restarted
Console Manager (@SYS$STARTUP:CONSOLE$STARTUP). This time, seven systems were
unreachable. Another shutdown/restart did not help - still seven were
unreachable. Finally, I ended up rebooting the Console Manager node. That
cleared things up. I am now able to communicate with all the systems.

So, this boils down to:

1) What would cause systems to become unreachable, and why would they remain
   unreachable after a Console Manager shutdown/restart? How can I fix this?
2) What causes the "Console Ctrl xx" processes to go into RWMBX state, and how
   can this be prevented? (I assume this is abnormal from some of the other
   Notes I've read. I haven't had time yet to experiment with some of the
   possible solutions, including the -003 images.)
   I suspect this caused the unexpected shutdown. Correct suspicion?
3) Will the MUP I keep seeing references to fix any of these problems?
4) BONUS PROBLEM: We have defined "Request" events to alert the operator that
   a mount request is outstanding. Since this text occurs at the start of the
   line, we have found that the remainder of the line (intermittently) does not
   appear in the ENS display AND is not properly displayed in the monitor
   interface.
   e.g. "Request 10127, from user BFARRELL on THOR" appears as
        "Request 10127, f"
   in both the monitor interface and the ENS display. The rest of the text does
   not appear unless you wait for a period, then VIEW the system again.
   I am very concerned about how ENS treats these events, since ENS triggers
   action routines, passing the message text as a parameter. This becomes
   more serious for events dealing with system availability than for requests.
   I have read the notes and understand that Console Manager triggers events
   when it sees the text and will not wait for the line to finish.
   Is there any workaround to this problem?

						Thank you for your help.
							...... Vern.

518.4. by OPG::PHILIP (And through the square window...) Tue Dec 13 1994 21:18

Vern,

  Thanks for the clear and concise description of your problems. Unfortunately,
  I don't have too many answers for you, except...

  1) This shouldn't happen; PCM should reconnect to "lost" consoles.

  2) I don't know what causes the RWMBX (well, I do: it's a resource
     problem, we just don't know exactly why). I notice in your first
     note you say you are running V1.5-002. Could you copy the
     CONSOLE$DAEMON.EXE_VAX image from OPG::CM$KIT: and try that? It starts
     its "child" processes differently, and you will get log files for them
     with some semi-meaningful information, especially if you do a
     define/system of CONSOLE$DEBUG to be DAEMON before you start the
     software. The logs are in CONSOLE$TMP and are called CONTROLLER_xx,
     where xx is the same as the number in the process name.

     On top of all that, this image starts the children with quotas that
     cause no problems for us in our test environment.

     BTW, there is a new ENS image as well, which fixes some memory leaks. You
     may want to try that too.

     Should you decide to try these images, I would appreciate some feedback
     letting us know if they work for you in your environment.

  3) The MUP has all the fixes I have described up to now, and most of the
     significant fixes are also in the images I have pointed you to in (2)
     above. If these do indeed fix your problem, then the MUP will too.
     However, we have not put in any fixes for your specific problem, as we
     have never been able to reproduce it here.

  4) Ahh, now I can answer this one. We notify you of an event as soon as the
     last character in the match string (regular expression or literal) is
     seen on the console. When the event is passed on to ENS, the rest of the
     line may not yet have been output by the managed system (worst case), or
     the daemon may not yet have flushed the text to the log file. This is a
     side effect of us not waiting until the end of the line before
     performing the notify (which is what I think VCS did).

     You can revert to the VCS behaviour easily by changing your event
     definitions from literals to regular expressions and adding the string
     *^ to the end of the text to scan for. This causes the daemon to scan
     for your string, then any number of characters up to the end of the
     line. Notification takes place only once the end of the line has been
     seen, and as the data is flushed to disk before the notification, you
     stand an extremely good chance of the text being there for your action
     routine.
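
  The difference between the two notification modes can be illustrated with a
  small Python simulation. This is only a sketch under stated assumptions, not
  PCM code: the scan() function, its wait_for_eol flag (standing in for the *^
  suffix), and the way the console output is split into chunks are all
  hypothetical.

```python
def scan(chunks, text, wait_for_eol):
    """Return the line text captured at each notification for `text`."""
    events = []
    line = ""          # current console line as received so far
    notified = False   # whether this line has already raised an event
    for chunk in chunks:
        for ch in chunk:
            if ch == "\n":
                # End of line: notify now if the match has not fired yet.
                if not notified and text in line:
                    events.append(line)
                line, notified = "", False
            else:
                line += ch
        # Early mode: notify as soon as the match is visible, capturing only
        # what has arrived so far.
        if not wait_for_eol and not notified and text in line:
            events.append(line)
            notified = True
    return events


# The mount request from the example, assumed to arrive in two pieces.
CHUNKS = ["Request 10127, f", "rom user BFARRELL on THOR\n"]

print(scan(CHUNKS, "Request", wait_for_eol=False))
# ['Request 10127, f']
print(scan(CHUNKS, "Request", wait_for_eol=True))
# ['Request 10127, from user BFARRELL on THOR']
```

  In early mode the event carries only the text that had arrived when the match
  completed, which mirrors the truncated "Request 10127, f" display. Waiting for
  the end of the line delays the notification slightly but captures the full
  message.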

  I hope all this helps.

Cheers,
Phil

518.5. "Thanks!" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Tue Dec 13 1994 23:23

    Phil:
    
    Thank you very much for the quick response! Your answers were very
    helpful:
    
    1) Hmmm. I did notice a few extra LTA5xxx devices around after I had
       shut down PCM the last time (forgot to mention it, sorry). This time
       they all disappeared when I shut down PCM and reappeared when it
       started. None disabled. I'll keep watching this to see if any
       consistent pattern emerges.
    
    2) Thank you for the images. I pulled them over and installed them
       (manually "INSTALL REMOVE"ing the old ones first). I also defined
       the CONSOLE$DEBUG logical so I could enjoy the log files! :-)
       I'll let you know how these work.
    
    3) When is the MUP (or V2.0) due out?
    
    4) Great! I'll update my events tomorrow! Does that mean that my Request
       event will be updated like this:
           From:  Text    = Request
           To:    Text    = Request*^
       along with changing "Literal" to "Regular"?
    
    Again, thank you for your help!
    						....... Vern.

518.6. by OPG::PHILIP (And through the square window...) Wed Dec 14 1994 09:45

Vern,

>>    Thank you very much for the quick response! Your answers were very
>>    helpful:

 No problem, glad to be of use.
    
>>    1) Hmmm. I did notice a few extra LTA5xxx devices around after I had
>>       shut down PCM the last time (forgot to mention it, sorry). This time
>>       they all disappeared when I shut down PCM and reappeared when it
>>       started. None disabled. I'll keep watching this to see if any
>>       consistent pattern emerges.

 These should go away at shutdown time. However, if you STOP/ID a child
 controller (which is, I assume, how you recovered from the RWMBX problem),
 it won't have the chance to delete them.
    
>>    2) Thank you for the images. I pulled them over and installed them
>>       (manually "INSTALL REMOVE"ing the old ones first). I also defined
>>       the CONSOLE$DEBUG logical so I could enjoy the log files! :-)
>>       I'll let you know how these work.

 Thank you for taking the time to try them out.
    
>>    3) When is the MUP (or V2.0) due out?

 As soon as it's finished ;-) Actually, we hope to get it out before our
 Christmas break.

>>    4) Great! I'll update my events tomorrow! Does that mean the my Request
>>       event will be updated like this:
>>    From:  Text    = Request
>>    To:    Text    = Request*^
>>       along with changing "Literal" to "Regular"?

  Exactly! And by the way, the section on ENS in the PCM User's Guide shows
  you all the regular expression characters.

Cheers,
Phil    

518.7. by ELGIN::RASOOLM (The computer in front is an ALPHA) Wed Dec 14 1994 12:03

    >>>    3) When is the MUP (or V2.0) due out?
    
    > As soon as its finished ;-) Actually, we hope to get it out before our
    > Christmas break.
    
    
    I take that to refer to the MUP. Do you have a target date for V2.0?
    
    Regards,
    
    Max.
    
518.8. "Not Yet" by OPG::SIMON Wed Dec 14 1994 12:55

    re .7
    
    There is not yet an official target date for V2.0 and I do not wish to
    discuss hopefuls in a public conference.
    
    Cheers Simon...

518.9. "Problems Again" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Wed Dec 14 1994 22:53

Hi Phil:

Well, we had another hang today with the new images. This time, the processes
did not enter the RWMBX state. They were not getting any CPU time and were
sitting in HIB and LEF. In other words, everything looked normal except that
nothing was running!

I did a CONSOLE SHUTDOWN and found that the Console Notify process and one of
the Ctrl processes did not stop. I have included SHOW PROCESS/ACCOUNTING and
SHOW PROCESS/QUOTA info for each of those below.

I ended up manually stopping both processes. Again, the LTA ports were left
behind. There was nothing special about these ports: LATCP showed them
connected to the target DECserver and port. This answers my previous
question (1) about the port problems in re-starting Console Manager. I
ended up manually deleting these ports using LATCP.

I was then able to successfully re-start Console Manager.

Unfortunately, though I had renamed the CONTROLLER_01.LOG and CONTROLLER_02.LOG
files, I left them in the CONSOLE$TMP directory. The entire directory was
nicely cleaned out by Console Manager on startup, preventing me from enjoying
the log files. :-(

Please let me know what steps I should take the next time this happens. I
already plan on copying the .LOG files to another directory. Are there any SHOW
PROCESS or ANALYZE/SYSTEM things you would like me to try?
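
Copying the controller logs out of the temporary directory before a restart
could be scripted along these lines. This is a minimal Python sketch, not
something PCM provides: the preserve_logs() helper, the archive directory
naming, and the file pattern are assumptions for illustration, and the
directory paths passed in are whatever the logicals translate to on the system
in question.

```python
import shutil
import time
from pathlib import Path


def preserve_logs(tmp_dir, archive_root, pattern="CONTROLLER_*.LOG*"):
    """Copy matching log files into a new timestamped directory.

    Returns the list of copied file paths. The originals are left in place.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(archive_root) / f"pcm-logs-{stamp}"   # hypothetical naming
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(tmp_dir).glob(pattern)):
        if src.is_file():
            # copy2 keeps the modification time, which helps when lining the
            # log contents up against the time of the hang.
            copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

Running this before the restart keeps a copy outside the directory that gets
cleaned out on startup.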

						.......... Vern.

14-DEC-1994 15:12:05.28   User: SYSTEM           Process ID:   00000503
                          Node: CTUPCM           Process name: "Console Notify"

Process Quotas:
 Account name: SYSTEM
 CPU limit:                      Infinite  Direct I/O limit:       100
 Buffered I/O byte count quota:    300214  Buffered I/O limit:    8192
 Timer queue entry quota:             255  Open file quota:         97
 Paging file quota:                  8086  Subprocess quota:         8
 Default page fault cluster:           64  AST quota:              194
 Enqueue quota:                      2048  Shared file limit:        0
 Max detached processes:                0  Max active jobs:          0


14-DEC-1994 15:12:27.80   User: SYSTEM           Process ID:   00000503
                          Node: CTUPCM           Process name: "Console Notify"

Accounting information:
 Buffered I/O count:     32914  Peak working set size:       1655
 Direct I/O count:       16838  Peak virtual size:           9577
 Page faults:             1281  Mounted volumes:                0
 Images activated:           3
 Elapsed CPU time:          0 00:04:08.23
 Connect time:              0 23:10:49.45


14-DEC-1994 15:10:27.95   User: SYSTEM           Process ID:   00000510
                          Node: CTUPCM           Process name: "Console Ctrl 02"

Process Quotas:
 Account name: SYSTEM
 CPU limit:                      Infinite  Direct I/O limit:      1024
 Buffered I/O byte count quota:    619446  Buffered I/O limit:    1024
 Timer queue entry quota:            1023  Open file quota:        985
 Paging file quota:                 23368  Subprocess quota:        64
 Default page fault cluster:           64  AST quota:             1004
 Enqueue quota:                      1024  Shared file limit:        0
 Max detached processes:                0  Max active jobs:          0

14-DEC-1994 15:13:15.15   User: SYSTEM           Process ID:   00000510
                          Node: CTUPCM           Process name: "Console Ctrl 02"

Accounting information:
 Buffered I/O count:    172667  Peak working set size:       6099
 Direct I/O count:       88490  Peak virtual size:          14056
 Page faults:             5346  Mounted volumes:                0
 Images activated:           1
 Elapsed CPU time:          0 00:17:11.45
 Connect time:              0 23:11:29.73

14-DEC-1994 15:11:07.42   User: SYSTEM           Process ID:   00000510
                          Node: CTUPCM           Process name: "Console Ctrl 02"

Terminal:
User Identifier:    [SYSTEM]
Base priority:      4
Default file spec:  Not available

Devices allocated:  CTUPCM$LTA5035:
                    CTUPCM$LTA5037:
                    CTUPCM$LTA5040:
                    CTUPCM$LTA5042:
                    CTUPCM$LTA5043:
                    CTUPCM$LTA5046:
                    CTUPCM$LTA5047:
                    CTUPCM$LTA5050:
                    CTUPCM$LTA5052:
                    CTUPCM$LTA5054:
                    CTUPCM$LTA5056:
                    CTUPCM$LTA5058:

(This CONSOLE STATUS was done after I did the SHUTDOWN and the two processes
were still left behind).

                       POLYCENTER Console Manager Summary
                                     Totals


Configured Systems:   0 User disabled:   0
Active Systems    :   0 (D:000 P:000 L:000 T:000)   Unreachable: 000
Active Users      :   5 (Connect/Monitor: 003 C3: 002 Event sources: 003)

CM pid ........: 00000000                  Uptime:   0 00:00:00 (Not Running)
ENS pid .......: 00000503 V1.5-003         Uptime:   0 23:09:57

Total bytes ...: 0            (0)
Total lines ...: 0            (0)
Total events ..: 0            (0)
Total actions .: 1266         (0)
Active actions : 1                 Failed actions : 0

   Crit: 0  Maj: 0  Min: 0  Warn: 0  Clr: 0  Ind: 0

518.10. "More Info" by CGOOA::VCOOKE (Vern Cooke @CTU (Western Canada CNS)) Wed Dec 14 1994 23:03

Hi Phil. Me again:

I was reading 422 and it seems to describe my situation:

- I have a number of empty log files.
- I had to stop the processes manually then clean up the LTA ports.
- CONSOLE MONITOR was the command that hung.

The only item that I am not sure about is if CONSOLE$MONITOR goes into a
CPU loop. I did check the process of the user doing CONSOLE MONITOR and
that process was not clocking any CPU at all.

Just thought I would mention this.
							....... Vern.