
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

1958.0. "Node not currently accessible." by KETJE::PACCO () Tue Dec 17 1991 12:21

    On a VAXstation 3100 M38 running DECmcc BMS V1.1, polled rules regularly
    give the exception "Node not currently accessible".  Regularly means 5-10
    times a day, where on average 2500 polls/day are being done.  Although this
    may seem "low", it is unacceptable to automatically ring somebody at
    home to tell him the node is not currently accessible when, if you look
    at the network at the same time, everything is fine.
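
To put the figures above in proportion, here is a small worked calculation. It uses only the numbers quoted in the note (5 to 10 exceptions per day, about 2500 polls per day); the function name is made up for illustration.

```python
# Figures quoted in the note: 5 to 10 exceptions/day out of about 2500 polls/day.
POLLS_PER_DAY = 2500


def failure_rate_percent(exceptions_per_day, polls_per_day=POLLS_PER_DAY):
    """Return the share of polls that raised the exception, in percent."""
    return 100.0 * exceptions_per_day / polls_per_day


low = failure_rate_percent(5)
high = failure_rate_percent(10)
print(f"{low:.1f}% to {high:.1f}% of polls raise the exception")
```

So well under one poll in a hundred fails, yet each failure still triggers a call-out.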
    
    The strange thing is that it is not reproducible at will, and that it
    appears both during the day and at night (even when traffic on the network
    is low).
    
    The entities polled are circuits on DEMSAs.  No network resource
    problems are being encountered on the DEMSAs or on the workstation,
    because all DECnet error counters remain at 0.  The number of "Maximum
    active logical links" is also far below the "Maximum logical links" allowed.
    I suspect another kind of resource problem on the workstation, something
    which perhaps was not mentioned in the release notes.  The
    quotas used are the ones described in the V1.1 release notes.  There are
    only 12 rules created and activated on the workstation.  There is only 1
    DECmcc workstation in the network.
    
    Can anybody clarify the possible causes of that NODE4 exception?
    What are, e.g., the possible VMS error codes causing that exception?  Is
    it a time-out problem?
    
    An implementation will stay or be cancelled due to this (for the customer)
    unacceptable behaviour.  I do not like to put pressure on anybody, but
    I am unable to resolve this without additional aid.  Can the code of the
    node4_am be searched to find where that exception could be generated,
    and after which VMS error?
    
    Regards,
    	Dominique.
1958.1. "exception vs rule fire" by DADA::DITMARS (Pete) Thu Dec 19 1991 14:51
Hi,

Please forgive what may be a silly question.

Does this condition cause the rule to "fire" or to encounter an exception?

If the customer is currently using the same procedure for both his "procedure" 
and the "exception handler", perhaps it would be acceptable to simply
create a different procedure for the exception handler which would
be smart enough to ignore these spurious problems.  From the V11 online help:



ENTITY

  ALARMS

    RULE

      CREATE

        Exception_Handler

          Information


            When an error occurs during the evaluation of an alarm expression,
            the error is passed, as string values, to the command procedure
            through the following parameters:

              P1 - Rule name
              P2 - Description
              P3 - Category
              P4 - Expression
              P5 - Time of error detection (Binary Absolute Time output
                   format)
              P6 - Error that occurred.  This parameter will also contain
                   the string "The rule has been disabled." if the error caused
                   the rule to be automatically disabled. There may be a number
                   of separate text strings used to describe the error and/or
                   the state of the rule.  When this is the case, they will be
                   separated by the string "<EOS>" (End Of Segment).
              P7 - Parameter
              P8 - Data file.


ENTITY ALARMS RULE CREATE Exception_Handler Subtopic?
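
The filtering idea suggested above can be sketched as follows. This is only an illustration of the decision logic, written in Python for readability; a real DECmcc exception handler is a command procedure that receives P1 to P8. The "<EOS>" separator and the "The rule has been disabled." string come from the help text; the assumption that P6 contains the text "Node not currently accessible" verbatim, and the function names, are hypothetical.

```python
EOS = "<EOS>"  # segment separator documented in the help text
DISABLED = "The rule has been disabled."  # string documented in the help text
# Assumption: the error text in P6 matches the exception name from the note title.
SPURIOUS = "Node not currently accessible"


def split_error(p6):
    """Split the P6 string into its individual, non-empty text segments."""
    return [seg.strip() for seg in p6.split(EOS) if seg.strip()]


def should_notify(p6):
    """Decide whether an exception deserves a call-out.

    Notify when P6 is empty (nothing to judge by), when the rule was
    disabled, or when any segment is something other than the known
    spurious error.
    """
    segments = split_error(p6)
    if not segments:
        return True
    if any(DISABLED in seg for seg in segments):
        return True
    return not all(SPURIOUS in seg for seg in segments)
```

With this sketch, `should_notify("Node not currently accessible.")` is false, while the same error followed by `<EOS>The rule has been disabled.` still raises a notification. Suppressing the error this way hides the symptom only; it does not explain why the node was reported as inaccessible.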
1958.2. "The unexpected alarm is an exception." by KETJE::PACCO () Fri Dec 20 1991 06:03
    Pete,
    
    The alarm rule fires an exception, but the silly thing is that there
    should be no exception (nor a normal alarm) being fired if (and I am
    assuming this) the network is stable and operational.  If the cause of
    the exception is something related to the management system (and I
    presume this), that situation is not acceptable.
    
    If the problem is in the management station, then:
    	if it is a resource problem, that should be solved.
    	if it is a DECmcc code problem, that should also be solved.
    	if it is, e.g., because the management station does not see the
    network, that would be acceptable, but then there should be hard
    evidence (with DECnet EVL) that the station was temporarily
    "disconnected".  For this last assumption, no event has been seen
    around the times when the exception happens, although events are
    logged to a sink on the management station, and at other moments
    "normal" events are being collected.
    
    Therefore I want to investigate the cause of the "spurious" exceptions.
    
    Regards,
    	Dominique.
    
1958.3. "entered as QAR 1965 in NACQAR mcc_internal database" by DADA::DITMARS (Pete) Mon Dec 23 1991 11:56
Could you send or post an exact log of what's happening, so the Phase IV folks
can pin down the cause precisely?  Jean Lee is responsible for DNA4.
1958.4. "QIO told us it was unreachable" by TOOK::PURRETTA () Mon Dec 23 1991 15:20
I looked into this.  No place in the AM do we explicitly set this
exception.  We truly returned from a QIO with SS$_UNREACHABLE and
translated it to the specialized exception which you are seeing.
As far as VMS was concerned, the node was unreachable.

We will continue to look into this problem but I figured I'd give
what info we have in the meantime.

	-- John
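
The translation described above can be pictured as a simple lookup. This is a sketch, not the access module's actual code: the table, the function name, and the fallback text are hypothetical, and the only pairing taken from this thread is SS$_UNREACHABLE with "Node not currently accessible".

```python
# Hypothetical lookup table; only the SS$_UNREACHABLE entry comes from this thread.
STATUS_TO_EXCEPTION = {
    "SS$_UNREACHABLE": "Node not currently accessible",
}


def translate_status(status_name):
    """Map a VMS status name to the exception text reported to the rule."""
    return STATUS_TO_EXCEPTION.get(status_name, f"Unmapped status: {status_name}")


print(translate_status("SS$_UNREACHABLE"))
```

The point of the sketch is that the exception is not raised on the AM's own initiative: it only relabels a status that VMS returned from the QIO.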
1958.5. "What's the network like between the two systems?" by TOOK::BIRNBAUM () Mon Dec 23 1991 17:28
One possible explanation of this behavior is that the router is unable to respond
to the management system's connection initiate message before the outgoing timer
on the management system expires.

This type of timeout can be caused by a number of factors, including network
congestion and network routing cost transitions. Could you please check the
value of the outgoing timer and the number of response timeouts on the
management system, and the value of the incoming timer on the router? Also, can
you give me some idea of the topology between the management system and the
router?

Thanks,

Bill
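
The timing race hypothesized above can be sketched as a toy model. The timer value and the delays below are made-up numbers, not measurements from this network, and the function is illustrative only; it does not reproduce DECnet's actual connect logic.

```python
def connect_outcome(response_delay_s, outgoing_timer_s):
    """Return 'connected' if the reply arrives before the outgoing timer expires."""
    if response_delay_s < outgoing_timer_s:
        return "connected"
    return "timed out"


# Hypothetical values, chosen only to show one slow reply among fast ones.
OUTGOING_TIMER_S = 30
delays_s = [0.5, 2.0, 45.0, 1.0]

outcomes = [connect_outcome(d, OUTGOING_TIMER_S) for d in delays_s]
timeouts = outcomes.count("timed out")
print(outcomes, timeouts)
```

Under this model a single slow reply, for example during congestion or a routing cost transition, is enough to fail one poll while every other poll succeeds, which would match an error that appears only occasionally.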
    
1958.6. "Problem is in DECnet." by KETJE::PACCO () Mon Dec 30 1991 10:09
    Thanks for specifying that the error "Node not currently accessible"
    unambiguously corresponds to SS$_UNREACHABLE (this could have been
    documented!)
    
    Now it's also clear that DECmcc is no longer under suspicion.  I can reproduce
    the error (with some difficulty) with NCP procedures.  At this time it looks
    like a DECrouter/X25 gateway problem.  The investigation continues.
    
    	Dominique.