Conference csc32::consolemanager

Title: POLYCENTER Console Manager
Notice: Kits, Scans, Docs on CSC32:: as PCM$KITS:, PCM$DOCS:, PCM$SCANS:
Moderator: CSC32::BUTTERWORTH
Created: Thu Aug 06 1992
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1541
Total number of notes: 6564

999.0. "Memory Fault - core dump on monitor interface" by 52246::ALBERTO (Alberto Yanguas. Systems Integration Spain) Fri Sep 22 1995 05:10

Hello,

I'm involved in a customer management platform implementation, which includes
PCM.

I have installed PCM V1.6 ECO1 on Digital UNIX V3.2

The monitor interface (launched either from c3 or with console -c/-m) has
a persistent problem.

A few seconds after the monitor interface starts, it crashes with
"memory fault - core dumped".

It usually happens when exercising the log review by pressing the advance
and back keys. At other times it has happened a few seconds after connecting
to the system console line. Sometimes it has even happened without doing
anything, i.e. just invoking console -m and waiting for the core dump.

I have cleaned out all log files for the console lines, to start the log files
from scratch, and have restarted PCM and rebooted the system many times.

The problem is still happening.

Could you please tell me how to investigate the problem?
Is there any information that I can save for you to analyze the problem?

Thank you in advance for your support.

Regards
Alberto Yanguas
Systems Integration - Spain

999.1. "Any help please!?" by 52246::ALBERTO (Alberto Yanguas. Systems Integration Spain) Wed Sep 27 1995 10:09 (38 lines)

Hello,

Please, I need help. Could anybody answer .0?

I have seen the following behaviour:

After a core dump of the monitor/connect interface, I connect to the same
system from another terminal. Each character typed in the connect
interface is echoed twice. But the managed system responds correctly to the
typed command, and through the monitor interface the log looks correct
(i.e. without the double echo).

After a second core dump, I enter the connect interface again and each character
that I type is echoed three times, but the managed system still receives it only
once, as it interprets commands correctly.

Restarting the console daemons clears the problem, but the customer still has
a monitor/connect interface that is inoperative.

Another observation:

- Start Console Manager
- invoke a connection: console -c monti2
- just wait and see...
- after some minutes the connection interface core dumps
- shut down Console Manager

In the logfile /var/opt/console/tmp/CONSOLE_CHILD_001.LOG you can read:

Write error on Local socket /var/opt/console/tmp/CONSOLE_CTRL_MONTI2
Bad file number
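
To see which child-daemon logs contain this message, a scan along the following lines may help. It is only a sketch: the CONSOLE_CHILD_*.LOG pattern is an assumption generalized from the single file name above, and find_socket_errors is a hypothetical helper name, not a PCM command.

```shell
#!/bin/sh
# Sketch only. Assumes child-daemon logs are named CONSOLE_CHILD_*.LOG,
# generalized from the one file name quoted in this note.

# find_socket_errors [DIR]
# Prints the path of each child log in DIR that contains the
# "Write error on Local socket" message. DIR defaults to the
# directory named in this note.
find_socket_errors() {
    dir=${1:-/var/opt/console/tmp}
    for f in "$dir"/CONSOLE_CHILD_*.LOG; do
        [ -f "$f" ] || continue      # skip when the pattern matches nothing
        if grep 'Write error on Local socket' "$f" >/dev/null; then
            echo "$f"
        fi
    done
}
```

Calling find_socket_errors with no argument scans /var/opt/console/tmp; passing another directory scans that one instead.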


Please, could you give me any advice on what to do?


Regards
Alberto

999.2. "Seems same problem on note 800" by 52246::ALBERTO (Alberto Yanguas. Systems Integration Spain) Wed Sep 27 1995 10:55 (19 lines)

Hi again,

The problem described here in .0 is quite similar to the one described in
note 800, which was apparently solved with a newer field test kit.

The kits that I have installed have been taken from PCMSVR::DISK$PCM:[PCM]
and are the following:

DCROSF161.TAR;1         7520/7521     21-JUL-1995 14:40:25.00
PCM_V16_OSF.TAR;1      10620/10620    21-JUN-1995 11:40:59.00

Are these the final SSB kits?
Is there any additional patch?


Rgds
Alberto

P.S. Please, any answer from anybody?

999.3. by ZENDIA::DBIGELOW (Innovate, Integrate, Evaporate) Wed Sep 27 1995 13:36 (24 lines)
    
    Alberto,
    
         You can try the following:
    
    1. Shut down PCM. Make sure there are no lingering processes hanging
       around after you shut down PCM:
    
       ps -wax | grep console
    
       dxconsole daemons are not part of PCM, so do not kill them.
    
    2. Delete all files in /var/opt/console/tmp
    
    3. Restart PCM.
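
    The check in step 1 and the cleanup in step 2 could be sketched as below. This is a hedged illustration, not part of PCM: filter_pcm_processes and clear_pcm_tmp are hypothetical helper names, and the PCM shutdown and restart commands are deliberately left out because this thread does not name them.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical.

# filter_pcm_processes
# Reads `ps` output on stdin and prints the lines that mention "console",
# leaving out dxconsole daemons (not part of PCM) and the grep itself.
filter_pcm_processes() {
    grep console | grep -v dxconsole | grep -v grep
}

# clear_pcm_tmp DIR
# Deletes every non-directory entry directly inside DIR (files and
# sockets). Subdirectories are left alone. Refuses an empty or missing DIR.
clear_pcm_tmp() {
    dir=$1
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "clear_pcm_tmp: no such directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        if [ -d "$f" ]; then
            continue
        fi
        rm -f "$f"
    done
}
```

    After shutting PCM down, `ps -wax | filter_pcm_processes` should print nothing; only then run `clear_pcm_tmp /var/opt/console/tmp` and restart PCM.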
    
    I don't know why you are getting the characters echoed 2-3 times.
    If it is still doing this, then look into the settings on the terminal
    server. Make sure that local echo is not set.
    
    You appear to be using the latest kit version.
    
    Dave
    
999.4. "Is AlphaServer 2100 supported hardware ?" by 52246::ALBERTO (Alberto Yanguas. Systems Integration Spain) Mon Oct 02 1995 12:58 (17 lines)

Thanks for your suggestions, but that only clears the problem until
it happens again. It happens only with some console lines. The rest appear
to be stable and reliable, although no extensive testing has been done yet.

Other issue:

The customer's management platform (including PCM) is running on an
AlphaServer 2100 4/275.

This type of system does not appear in the list of supported hardware in the SPD!

Is the 2100 really not supported for running PCM software?



Rgds
Alberto

999.5. by ZENDIA::DBIGELOW (Innovate, Integrate, Evaporate) Mon Oct 02 1995 16:16 (8 lines)

    Alberto,
    
       Also, make sure that the terminal server's software is up to date.
    
    Regarding the 2100, PCM was probably released before the 2100 was.
    It should work just fine on a 2100.
    
    Dave

999.6. "Problem increase !" by 52246::ALBERTO (Alberto Yanguas. Systems Integration Spain) Wed Oct 04 1995 09:26 (27 lines)

The problem still persists and seems to be growing.

Although I will upgrade to the latest DEC UNIX V3.2C and will also convince
the customer to upgrade the terminal servers to the latest revision, I think
the problem may lie elsewhere.

The console manager daemons work fine. There are about 20 systems defined,
using telnet connections, and the console output appears correctly in
the log files and I can execute console -x without any problems.

BUT THE PROBLEM IS that I cannot maintain a CONNECT/MONITOR session for more
than a minute without the process crashing with a core dump.

Even if I just invoke console -m (and type a show systems), sometimes within
a few seconds (without doing anything else) the program core dumps.

Is it possible to check whether I've got the right programs? For example,
by checking the checksums of the programs involved when invoking console -m
or console -c?

Are the core files that I have saved of any use?

I hope for any additional help.

regards
Alberto

999.7. "Another one here...could it be resources??" by 29067::SCHLABS Thu Oct 12 1995 13:42 (8 lines)
    
    I am talking with a gentleman who is seeing the same thing, but only
    on one system.  He gets the memory fault, core dumped on the connect.
    This is a heavily loaded Oracle server.  Could resources be a problem??
    If so, what should we look at?
    
    thanks,
     greg

999.8. "Problem solved !" by 52246::ALBERTO (Alberto Yanguas. S.I. Spain) Mon Oct 16 1995 12:20 (42 lines)

After some extensive testing I think I've isolated when this failure occurs.

The console_conmon program appears to core dump (less than one minute
after it has been invoked) when either or both of the following situations
occur:

a) There are no log files (*.EVENTS, *.LOG, *.TIMES) for a configured system.

   This may happen if the files are deleted or moved to another directory,
   or when a console link is never established for a newly configured system
   (log files are not created until a console link is initially established).

b) There is a system with a telnet connection (I never tested with LAT) whose
   terminal server is unknown (i.e. not in /etc/hosts).

   This was my intentional fault, trying to emulate VCS functionality to have
   a peripheral icon on the C3 display. I defined a system with a connection
   to a dummy server name.

   To avoid that, I have redefined the connection as a pseudo-terminal and have
   a convenient script that loops to avoid link disconnections.

   Question: Is there a more direct way to represent 'peripheral' or other
   kinds of icons without them being configured systems?  (Oh... dear VCS...!)
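
Situation b) can be checked for ahead of time by testing whether each terminal server name is listed in the hosts file. The following is a minimal sketch, assuming name lookups rely only on a hosts-format file; host_is_known is a hypothetical helper name, and it does not read the PCM configuration, so the server names have to be supplied by hand.

```shell
#!/bin/sh
# Sketch only; host_is_known is a hypothetical helper, not a PCM tool.

# host_is_known NAME [HOSTS_FILE]
# Succeeds (exit status 0) if NAME appears as a host name or alias on a
# non-comment line of HOSTS_FILE (default /etc/hosts). The address column
# is ignored and the comparison is case-insensitive.
host_is_known() {
    name=$1
    hosts=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before looking at fields
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(n)) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hosts"
}
```

For example, `host_is_known myserver || echo "myserver is not in /etc/hosts"`, where myserver stands for a terminal server name from the configuration.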


After I created empty log files for the systems whose console link does not
yet work, and stopped using a dummy server for the connection information, the
core dump problem seems to have gone away, and a monitor session has lasted
several hours without any core dump.
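
The empty-log-file workaround can be sketched as follows. Two things here are assumptions: that the files are named SYSTEM.EVENTS, SYSTEM.LOG and SYSTEM.TIMES (only the extensions are given above), and that the log directory is passed in by hand, since its location depends on the configuration. ensure_log_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. Assumes log files are named <SYSTEM>.<EXT>; only the three
# extensions come from this note.

# ensure_log_files LOG_DIR SYSTEM...
# For each SYSTEM, creates an empty .EVENTS, .LOG and .TIMES file in
# LOG_DIR if it does not already exist. Existing files are never touched.
ensure_log_files() {
    logdir=$1
    shift
    for sys in "$@"; do
        for ext in EVENTS LOG TIMES; do
            f="$logdir/$sys.$ext"
            if [ ! -e "$f" ]; then
                : > "$f"             # create an empty file
                echo "created $f"
            fi
        done
    done
}
```

The new files may also need the same owner and permissions as the logs that PCM creates itself; that is not handled here.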

I'm able to reproduce the core dump if either or both of the above situations occur.

The environment is DEC UNIX V3.2C and PCM V1.6 + ECO 1.

I think these problems/workarounds could be investigated or at least documented
as a restriction.

Best Regards

Alberto

999.9. by 29067::BUTTERWORTH (Gun Control is a steady hand.) Tue Oct 17 1995 18:34 (33 lines)

>a) There are no log files (*.EVENTS, *.LOG, *.TIMES) for a configured system.

>   This may happen if the files are deleted or moved to another directory,
>   or when a console link is never established for a newly configured system
>   (log files are not created until a console link is initially established).

>b) There is a system with a telnet connection (I never tested with LAT) whose
>   terminal server is unknown (i.e. not in /etc/hosts).
    
    As you have correctly surmised, the files aren't created until we have
    successfully connected to a node at least once. Since item "b"
    indicates that you had intentionally configured a system that could
    never be connected to, it caused the symptom in item "a". The
    point is item "a" is the *real* problem and this is not the way it's
    supposed to work. It's broken!

>   This was my intentional fault, trying to emulate VCS functionality to have
>   a peripheral icon on the C3 display. I defined a system with a connection
>   to a dummy server name.

>   To avoid that, I have redefined the connection as a pseudo-terminal and have
>   a convenient script that loops to avoid link disconnections.

>   Question: Is there a more direct way to represent 'peripheral' or other
>   kinds of icons without them being configured systems?  (Oh... dear VCS...!)
    
    If I understand your question, the answer is yes. Turn on Edit mode in
    the C3, place the pointer in the background of the C3 and hold down MB3.
    You can then create a peripheral icon from that menu.


    Regards,
       Dan

999.10. "Don't you think is a bug ?" by 52246::ALBERTO (Alberto Yanguas. S.I. Spain) Mon Oct 30 1995 12:42 (45 lines)

>>a) There are no log files (*.EVENTS, *.LOG, *.TIMES) for a configured system.
>
>>   This may happen if the files are deleted or moved to another directory,
>>   or when a console link is never established for a newly configured system
>>   (log files are not created until a console link is initially established).
>
>>b) There is a system with a telnet connection (I never tested with LAT) whose
>>   terminal server is unknown (i.e. not in /etc/hosts).
>
>    As you have correctly surmised, the files aren't created until we have
>    successfully connected to a node at least once. Since item "b"
>    indicates that you had intentionally configured a system that could
>    never be connected to, it caused the symptom in item "a". The
>    point is item "a" is the *real* problem and this is not the way it's
>    supposed to work. It's broken!
>

OK, but imagine that you add a new system with valid server connection data,
then you do a console reconfiguration, but the server is unreachable.
Until the server is reachable you have situation a), and there is a
probability that a console monitor user gets a core dump that leaves the
product in a not very reliable state until a full restart of the daemons.

Also imagine that you decide to change the location of the log files, re-edit
the configuration file and restart PCM without moving the previous logfiles.
Then all 'unreachable' console connections will have missing logfiles.

I think the above is 'the way some people are supposed to work' (at least me),
although these situations do not happen every day.

I think that the console monitor program should not core dump when there are
missing logfiles. If a patch to avoid this core dump is not going to be worked on,
at least some kind of 'restrictions or known bugs' documentation could be
produced, to spare other people who might do the same things the 'frustration'.

(It took almost one month for this to be resolved, and I spent many days
testing for this problem, upgrading from UNIX V3.2B to V3.2C and
reinstalling products, because 'this was mandatory' according to my local CSC
support in order to be able to escalate the problem...!)

Regards
Alberto


999.11. by 29067::BUTTERWORTH (Gun Control is a steady hand.) Mon Oct 30 1995 17:03 (7 lines)

    Alberto,
      I think you misunderstood me. I said "it's broken" in -2 which means
    it is a bug.
    
    Regards,
       dan
    
999.12. by 52246::ALBERTO (Alberto Yanguas. S.I. Spain) Tue Oct 31 1995 14:37 (6 lines)

    Sorry, Dan. I really misunderstood you.
    Thank you for your support.

    Regards
    Alberto