
Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5280.0. "Port PE errors..." by VAXRIO::MAURO () Fri Apr 11 1997 15:30

Hi,

I have a customer with an FDDI/SCSI cluster. The original configuration
was an Ethernet/SCSI cluster. Since he installed the FDDI and disconnected
the Ethernet, both nodes have been crashing in different ways: sometimes
CLUEXIT (VC type=Dead), sometimes INVEXCEPTN.

I installed the most recent patches for LAN (ALPLAN04_062) and
LAVC (ALPLAVC01_062), and upgraded Pathworks to 50E_E01050.

The crashes continued after that. Checking the PE port information
through SDA, I found many errors on the EWxx buses of both CPUs. Using SDA
on the running system, I confirmed that these errors are still
accumulating. The NCP command SHOW LINE COUNTERS shows "Send failure:
carrier check failed" incrementing continuously. The error is 204C
(SYSTEM-F-DISCONNECT). To try to minimize the problem, I set
RECNXINTERVAL to 60.
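As a cross-check on the "204C (SYSTEM-F-DISCONNECT)" reading, the standard OpenVMS condition-value layout can be decoded mechanically. This is a minimal Python sketch, assuming the usual field layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27); the function name `decode_condition` is hypothetical and not part of any VMS library.

```python
# Hypothetical helper: decode an OpenVMS condition value into its fields.
# Assumed 32-bit layout: bits 0-2 severity, bits 3-15 message number,
# bits 16-27 facility number, bits 28-31 control bits.

SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}  # 5-7 are reserved


def decode_condition(value: int) -> dict:
    """Split a 32-bit condition value into severity, message and facility."""
    severity = value & 0x7
    message = (value >> 3) & 0x1FFF
    facility = (value >> 16) & 0xFFF
    return {
        "severity": SEVERITY.get(severity, "?"),
        "success": bool(value & 1),  # low bit set means success/informational
        "message_number": message,
        "facility": facility,  # facility 0 is SYSTEM
    }


if __name__ == "__main__":
    # 0x204C is the "Last Error" value reported by SDA in this thread.
    print(decode_condition(0x204C))
```

For 0x204C this yields severity F and facility 0 (SYSTEM), which is where the "-F-" in SYSTEM-F-DISCONNECT comes from.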

My questions are:

How can a circuit and a line, both in state OFF, log send errors?
Could PEDRIVER try to use this line even though it is in state OFF?
Would setting NISCS_MAX_PKTSZ correctly (4468) change this situation?
Do I need to put a terminator on the Ethernet interface to avoid this
situation?

I would like to stop these errors before going deeper into the crash dump
analysis.

Any help will be very welcome.

Mauro Aquino

---------
AlphaServer 8400 Model 5/350 (SIRIUS)
OpenVMS v6.2-1H3

NCP information:

Known Line Volatile Summary as of 11-APR-1997 11:37:38
 
   Line             State
 
  EWA-0             off
  EWA-1             off
  FPA-0             on

>65534  Send failure, including:
        Carrier check failed

SDA information:

Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
82033100  LCL  00-00-00-00-00-00           0
820C18C0  EWA  08-00-2B-E5-80-03        3465  0000204C  11-APR-1997 11:43:03.98
820C3980  EWB  08-00-2B-E6-20-BE        3465  0000204C  11-APR-1997 11:43:03.98
820C5980  FWA  00-00-F8-63-1C-2B           0


--------
AlphaServer 2100 5/300 (ORION)
OpenVMS v6.2-1H3

NCP information:

Known Line Volatile Summary as of 11-APR-1997 11:37:37
 
   Line             State
 
  EWA-0             off
  FEA-0             on


>65534  Send failure, including:
        Carrier check failed

SDA>sh port/add=xxxxxxxx

Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
818311C0  LCL  00-00-00-00-00-00           0
818FB200  EWA  08-00-2B-E5-F9-6A      121285  0000204C  11-APR-1997 11:44:37.99
818FD980  FRA  08-00-2B-B1-1B-40           0
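When comparing the SDA listings from the two nodes, it can help to pull out only the buses with a nonzero error count. The sketch below is a hypothetical Python helper, assuming the column layout shown in the listings pasted above (bus address, bus name, LAN address, error count, then an optional last error and time); it is not an SDA feature.

```python
import re

# Hypothetical helper: parse text captured from the SDA show port listing
# (layout assumed from the samples in this note) and report buses with errors.
ROW = re.compile(
    r"^\s*(?P<addr>[0-9A-F]{8})\s+(?P<bus>\w+)\s+"
    r"(?P<lan>[0-9A-F]{2}(?:-[0-9A-F]{2}){5})\s+(?P<count>\d+)"
    r"(?:\s+(?P<last>[0-9A-F]{8})\s+(?P<time>.+?))?\s*$"
)


def buses_with_errors(listing: str) -> list:
    """Return (bus, error count, last error) for each bus with a nonzero count."""
    result = []
    for line in listing.splitlines():
        m = ROW.match(line)  # header and dash lines do not match
        if m and int(m.group("count")) > 0:
            result.append((m.group("bus"), int(m.group("count")), m.group("last") or ""))
    return result


ORION = """\
Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
818311C0  LCL  00-00-00-00-00-00           0
818FB200  EWA  08-00-2B-E5-F9-6A      121285  0000204C  11-APR-1997 11:44:37.99
818FD980  FRA  08-00-2B-B1-1B-40           0
"""

if __name__ == "__main__":
    for bus, count, last in buses_with_errors(ORION):
        print(f"{bus}: {count} errors, last error %X{last}")
```

Run against the SIRIUS listing, the same function reports EWA and EWB with 3465 errors each, both with last error 0000204C.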
    
5280.1. by ALEPPO::mse_notbuk.mse.tay.dec.com::bowker () Fri Apr 11 1997 16:23

NCP only controls the DECnet protocols; it does not affect SCS (cluster)
traffic.

Try using LAVC$STOP_BUS (found in SYS$EXAMPLES) to stop SCS from using the EWA ports.


5280.2. by COL01::VSEMUSCHIN (Duck and Recover !) Sun Apr 13 1997 16:04

    A carrier check failure means a hardware error: something wrong
    with the transceiver (heartbeat not enabled or failed), a defective
    cable, etc. Check your network and, as the previous note says, turn
    off cluster traffic on the Ethernet, and not only because of the
    hardware failures. If your Ethernet is (or will be) connected to the
    FDDI ring, your customer could run into a dangerous situation where
    circuits (cluster circuits!) are built between Ethernet and FDDI
    ports. Such a cross-connection is very unhealthy (if this word
    exists ;-). I saw a cluster where shadowing and Pathworks could not
    coexist until the Ethernet and FDDI were separated.
    
    =Seva
5280.3. by STAR::STOCKDALE () Mon Apr 14 1997 08:58

The carrier check failure is irrelevant. Since .0 said the Ethernet was
disconnected, and since PEDRIVER will keep attempting to use the Ethernet
device, carrier check failure errors will accumulate, but they should not
cause the system to crash.

You need to look at the dumps to see why the system is crashing.

- Dick
5280.4. "When to execute LAVC$STOP_BUS?" by VAXRIO::MAURO () Wed Apr 16 1997 11:40

    .3
    
    I checked the CLUEXIT dump from one system; on the second system the
    operator aborted the dump creation. I checked everything that can
    cause a VC to close (pagepool exhausted, LAN errors, etc.), and the
    only strange thing I found was the EW errors. I have already disabled
    this bus using the LAVC$STOP_BUS procedure and I'm monitoring the
    system. The problem is that after a cold start/reboot the bus
    appeared again, showing plenty of errors. Do I need to execute this
    procedure in the system startup?
5280.5. "yes" by ALEPPO::mse_notbuk.mse.tay.dec.com::bowker () Wed Apr 16 1997 13:57

>    Do I need to execute this procedure in the system
>    startup?

Yes.