
Conference noted::sns

Title:POLYCENTER System Watchdog for VMS OSF/1 ULTRIX HP-UX AIX SunOS
Notice:Wishes:406,FAQ:845,Kits-VMS:1000,UNIX:694 VMS ECO01 FT kit: 521
Moderator:AZUR::HUREZZ
Created:Fri May 15 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1033
Total number of notes:4584

1004.0. "Polycenter watchdog : creation failure" by NETRIX::"[email protected]" (Thierry FAIDHERBE) Wed Feb 19 1997 09:49

Note posted on Digital_unix conference with ID 8867....

Hi to all,

I received a problem report from a customer about Polycenter watchdog:

Problem description of call 

message : EPLZ14:Sensor /usr/opt/PSW/psw_sensor_eth
          creation failure.

4th time today.
Other agents seem to work normally.

Consolidator 2.2-03 OpenVMS Alpha 6.2-1h2
Agent : 2.2 on Digital Unix 3.2d1

We have just enabled the logfile:
 SNS> sh cons/full
Controller       : V2.2-03

Consolidator     : 135 V2.2-03
Profile          : $1$DUA1:[SYS3.SYSEXE]SNS$PROFILE.DAT;213
Log file         : SYS$SYSROOT:[SYSMGR]SNS$LOG.DAT;1  Enabled
Action routines  : Enabled
DECtalk          : Enabled
Mailbox          : Enabled
Polling interval : 180
Before setting   : Not specified
Since setting    : Not specified
Watchdog information:   
Node    Status    Class            Version  OS Version
  QPLZ11  Enabled   DEVELOPMENT       XO2.20   OSF1 V3.2 62 alpha
  QPLZ02  Enabled   DEVELOPMENT       V2.2-03  VMS V6.2
  QPLZ01  Enabled   DEVELOPMENT       V2.2-03  VMS V6.2
  EPLZ14  Enabled   PRODUCTION        XO2.20   OSF1 V3.2 41.64 alpha
  EPLZ13  Enabled   PRODUCTION        XO2.20   OSF1 V3.2 41 alpha
  EPLZ12  Enabled   PRODUCTION        XO2.20   OSF1 V3.2 41 alpha
  EPLZ11  Enabled   PRODUCTION        XO2.20   OSF1 V3.2 41 alpha
  EPLZ07  Enabled   PRODUCTION        V2.2-02  VMS V6.2-1H3
  EPLZ05  Enabled   DEFAULT           V2.2-03  VMS V6.2
  EPLZ04  Enabled   PRODUCTION        V2.2-02  VMS V6.2-1H3
  EPLZ03  Enabled   MANAGEMENT        V2.2-03  VMS V6.2-1H2
  EPLZ02  Enabled   PRODUCTION        V2.2-03  VMS V6.2
  EPLZ01  Enabled   PRODUCTION        V2.2-03  VMS V6.2

We have another problem on this system: the cron process dumps core...

dbx /usr/sbin/cron core_mob
dbx version 3.11.8
Type 'help' for help.
Core file created by program "cron"

signal Segmentation fault at >*[NLstrdlen, 0x3ff800c7810]       bis     r1,
r2,r1
(dbx) t
>  0 NLstrdlen(0x140001698, 0x1400018a0, 0x11ffff330, 0x100000018,
0x3ff800cd490) [0x3ff800c7810]
   1 _doprnt(0x140000251, 0x11ffffb30, 0x28, 0x7e04, 0x3ff800ece70)
[0x3ff800c4f38]
   2 sprintf(0x140001af8, 0x140000230, 0x14000ab40, 0x140000258, 0x53300d6c8)
[0x3ff800c7914]
   3 ex(0x43e33412a61d8290, 0x43e03411267dffff, 0x22737708a6100000,
0x27ba20006b5b4365, 0x4400041023bd5534) [0x120006d50]
   4 ex(0x4787041c4b80005c, 0x239000013f800000, 0x400034004a3c0f51,
0xf63ffff44a271791, 0x221e00b2a77d8180) [0x120006afc]


I took a look in dxbookreader and found:
  Sensor Creation failure    WDM   Not enough resources; stop and restart
                                   psw_agent

I suspect that the watchdog problem is related to the cron problem: if there is
no cron process, maybe there is a problem with process creation.

Does anybody have an explanation and/or more information about the
"Not enough resources" error message?

Kind regards,

+---++---++---++---++---++---++---+ TM  Digital Equipment Belgium
|   ||   ||   ||   ||   ||   ||   |   Multivendor Customer Services
| d || i || g || i || t || a || l |         Thierry FAIDHERBE 
|   ||   ||   ||   ||   ||   ||   |      DIGITAL Unix Support Team
+---++---++---++---++---++---++---+  Email [email protected] 
            Phone : +32 2 729 77 44  Fax : +32 2 729 77 65
           With DIGITAL Unix, ... You get what you pay for ...


[Posted by WWW Notes gateway]

1004.1. "Try the sensor independently, with trace enabled" by AZUR::HUREZ (Connectivity & Computing Services @VBE. DTN 828-5159) Wed Feb 19 1997 12:23, 16 lines

    The PSW for UNIX Agent doesn't use cron. As a matter of fact, the
    psw_agent process is a kind of dedicated cron process that itself
    schedules the sensors according to its configuration file,
    psw_agent.conf.
    
    Can you please try the following:
    
    # csh
    # setenv psw_trace on
    # psw_sensor_eth 0 -x eth.log
    
    and mail me with the output logfile...
    
    Thanks,
    
    	-- Olivier.
    
1004.2. "ETH is OK. What about psw_agent and system resources" by AZUR::HUREZ (Connectivity & Computing Services @VBE. DTN 828-5159) Tue Feb 25 1997 06:10, 25 lines

    OK, I got the logfiles... According to their contents, when the sensor
    has a chance to run, it seems to do its job correctly...
    
    Do you have a logfile for the Agent itself that would cover a period
    of time during which at least one ETH sensor creation failure happened?
    
    FYI, such a trace can be obtained using the following method:
    
      o Become superuser and kill the existing psw_agent process
    
        % su
        # ps -e -opid,ucomm | grep psw_agent | grep -v psw_agent_ | \
          grep -v grep | awk '{print "kill -USR2 " $1}' | sh
    
      o Enable the trace facility and rerun the psw_agent process,
        while specifying an output logfile (-x option)
    
        # setenv psw_trace
        # /usr/opt/psw/psw_agent -f/usr/opt/psw/psw_agent.conf -xAGENT.LOG &
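
    For readers unsure what the ps pipeline in the first step selects,
    here is a minimal sketch that can be tried on canned input before
    running anything as root. It assumes the command name is the second
    field of each line, as with `ps -e -opid,ucomm`; the function name
    agent_pids, the PIDs and the name psw_agent_helper are hypothetical
    and only serve the illustration. It uses an exact match in awk
    instead of the chained greps, which should select the same lines
    for the names shown here.

```shell
#!/bin/sh
# Print the PID of every process whose command name is exactly "psw_agent".
# Reads "PID COMMAND" lines on stdin (the layout of `ps -e -opid,ucomm`).
agent_pids() {
    awk '$2 == "psw_agent" { print $1 }'
}

# Canned sample input; PIDs and the "psw_agent_helper" name are made up.
sample='  PID COMMAND
  412 psw_agent
  437 psw_agent_helper
  501 grep'

# Only the PID of the agent itself is printed; names that merely start
# with "psw_agent_" and the header line are skipped.
printf '%s\n' "$sample" | agent_pids
```

    On a live system the same function would be fed with
    `ps -e -opid,ucomm | agent_pids`, and the resulting PIDs checked by
    eye before sending any signal.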
    
    
    Is it possible that the system in question had its process table
    nearly full at that time?
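
    One rough way to check this is to sample the number of processes
    around the time a creation failure is reported. The sketch below is
    only an illustration: the limit of 1024 and the 90% threshold are
    hypothetical placeholders, since the real maximum depends on how the
    kernel was configured, and the names count_procs, is_near_limit and
    PROC_LIMIT are made up for this example.

```shell
#!/bin/sh
# Count process lines read on stdin, skipping the ps header line.
count_procs() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# Succeed when count ($1) is at least percent ($3) of limit ($2).
is_near_limit() {
    [ $(( $1 * 100 )) -ge $(( $2 * $3 )) ]
}

# PROC_LIMIT is a placeholder; set it to the real process limit
# of the system being checked.
limit=${PROC_LIMIT:-1024}
count=$(ps -e | count_procs)
stamp=$(date '+%Y-%m-%d %H:%M:%S')

if is_near_limit "$count" "$limit" 90; then
    echo "$stamp WARNING: $count processes, limit assumed to be $limit"
else
    echo "$stamp ok: $count processes, limit assumed to be $limit"
fi
```

    Running this every few minutes and appending the output to a file
    would show whether the process count climbs toward the limit just
    before each ETH sensor creation failure.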
    
    	-- Olivier.