| Title: | + OpenVMS Clusters - The best clusters in the world! + | 
| Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 | 
| Moderator: | PROXY::MOORE | 
| Created: | Fri Aug 26 1988 | 
| Last Modified: | Fri Jun 06 1997 | 
| Last Successful Update: | Fri Jun 06 1997 | 
| Number of topics: | 5320 | 
| Total number of notes: | 23384 | 
Hi,
I have a customer with an FDDI/SCSI cluster. The original 
configuration was an Ethernet/SCSI cluster. Since he installed the FDDI and 
disconnected the Ethernet, he has been getting various crashes on both nodes,
sometimes CLUEXIT (VC type=Dead) and sometimes INVEXCEPTN.
I installed the most recent patches for LAN (ALPLAN04_062) and
LAVC (ALPLAVC01_062), and upgraded Pathworks for 50E_E01050. 
After that, the crashes continued to happen. Checking the port PE
information through SDA, I found plenty of errors on the EWXX buses 
on both CPUs. Using SDA on the running system, I confirmed that these
error counts are incrementing dynamically. The NCP command SHOW LINE COUNTERS
also shows "Send failure: carrier check failure" incrementing dynamically. The
error is 204C (SYSTEM-F-DISCONNECT). To try to minimize the problem, I
set RECNXINTERVAL to 60.
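For readers unfamiliar with how 204C maps to SYSTEM-F-DISCONNECT, here is a minimal Python sketch that splits an OpenVMS condition value into its standard fields (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27). The function name `decode_condition` is hypothetical and used only for illustration; the sketch decodes the bit layout and does not look up message text.

```python
# Severity codes stored in the low three bits of a condition value.
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}

def decode_condition(value):
    """Split a 32-bit condition value into its standard fields."""
    severity = value & 0x7            # bits 0-2
    message = (value >> 3) & 0x1FFF   # bits 3-15
    facility = (value >> 16) & 0xFFF  # bits 16-27
    return {
        "severity": SEVERITY.get(severity, "?"),
        "message_number": message,
        "facility_number": facility,
        "success": bool(value & 1),   # low bit set means success
    }

info = decode_condition(0x204C)
print(info)
```

For 0x204C this gives severity F and facility number 0 (the SYSTEM facility), which is consistent with the SYSTEM-F- prefix quoted above.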
My questions are:
How can a circuit and a line, both in state off, log send errors?
Could PEDRIVER try to use this line even if its state is off?
Could setting NISCS_MAX_PKTSZ correctly (4468) change this situation?
Do I need to put a terminator on the Ethernet interface to avoid this
situation?
I would like to stop those errors before going deeper into the crash dump
analysis.
Any help will be very welcome, Mauro Aquino.
---------
AlphaServer 8400 Model 5/350 (SIRIUS)
OpenVMS v6.2-1H3
NCP Information:
Known Line Volatile Summary as of 11-APR-1997 11:37:38
 
   Line             State
 
  EWA-0             off
  EWA-1             off
  FPA-0             on
>65534  Send failure, including:
        Carrier check failed
SDA Information:
Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
82033100  LCL  00-00-00-00-00-00           0
820C18C0  EWA  08-00-2B-E5-80-03        3465  0000204C  11-APR-1997 11:43:03.98
820C3980  EWB  08-00-2B-E6-20-BE        3465  0000204C  11-APR-1997 11:43:03.98
820C5980  FWA  00-00-F8-63-1C-2B           0
--------
AlphaServer 2100 5/300 (ORION)
OpenVMS v6.2-1H3
NCP Information:
Known Line Volatile Summary as of 11-APR-1997 11:37:37
 
   Line             State
 
  EWA-0             off
  FEA-0             on
>65534  Send failure, including:
        Carrier check failed
SDA>sh port/add=xxxxxxxx
Bus Addr  Bus     LAN Address    Error Count Last Error   Time of Last Error
--------  ---  ----------------- ----------- ---------- -----------------------
818311C0  LCL  00-00-00-00-00-00           0
818FB200  EWA  08-00-2B-E5-F9-6A      121285  0000204C  11-APR-1997 11:44:37.99
818FD980  FRA  08-00-2B-B1-1B-40           0
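
When comparing listings like the ones above across nodes, it can help to pull out only the buses whose error count is non-zero. The following is a rough Python sketch, assuming the column layout shown in this note; the sample rows are copied from the SIRIUS listing, and the names `ROW` and `buses_with_errors` are hypothetical.

```python
import re

# Sample rows copied from the SIRIUS listing in this note.
SDA_TEXT = """\
82033100  LCL  00-00-00-00-00-00           0
820C18C0  EWA  08-00-2B-E5-80-03        3465  0000204C  11-APR-1997 11:43:03.98
820C3980  EWB  08-00-2B-E6-20-BE        3465  0000204C  11-APR-1997 11:43:03.98
820C5980  FWA  00-00-F8-63-1C-2B           0
"""

# Bus address, bus name, LAN address, error count, optional last error code.
ROW = re.compile(
    r"^(?P<addr>[0-9A-F]{8})\s+(?P<bus>\w+)\s+"
    r"(?P<lan>(?:[0-9A-F]{2}-){5}[0-9A-F]{2})\s+"
    r"(?P<errors>\d+)(?:\s+(?P<last>[0-9A-F]{8}))?"
)

def buses_with_errors(text):
    """Return (bus, error_count, last_error) for rows with errors > 0."""
    result = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and int(m.group("errors")) > 0:
            result.append((m.group("bus"), int(m.group("errors")), m.group("last")))
    return result

failing = buses_with_errors(SDA_TEXT)
print(failing)
```

On the sample rows this reports only EWA and EWB, the two disconnected Ethernet buses; the FDDI bus FWA shows no errors.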
    
| T.R | Title | User | Personal Name | Date | Lines | 
|---|---|---|---|---|---|
| 5280.1 | ALEPPO::mse_notbuk.mse.tay.dec.com::bowker | Fri Apr 11 1997 15:23 | 5 | ||
| NCP only controls the DECnet protocols. Try using LAVC$STOP_BUS (found in SYS$EXAMPLES) to stop SCS from using the EWA ports. | |||||
| 5280.2 | COL01::VSEMUSCHIN | Duck and Recover ! | Sun Apr 13 1997 15:04 | 12 | |
|     Carrier check failure means a hardware error: something wrong
    with the transceiver (heartbeat not allowed or failed), a defective cable,
    etc. Check your network and, as the previous note says, turn off the
    cluster traffic on Ethernet, and not only because of hardware failures.
    If your Ethernet is (or will be) connected to the FDDI ring, your
    customer could run into a dangerous situation where circuits (cluster
    circuits!) are built between Ethernet and FDDI ports. Such a
    cross connection is very unhealthy (if this word exists ;-). I saw
    a cluster where shadowing and Pathworks could not coexist until
    Ethernet and FDDI were separated.
    
    =Seva
 | |||||
| 5280.3 | STAR::STOCKDALE | Mon Apr 14 1997 07:58 | 8 | ||
| The carrier check failure is irrelevant. Since .0 said the Ethernet was disconnected, and since PEDRIVER is going to keep attempting to use the Ethernet device, carrier check failure errors are going to accumulate and it should not cause the system to crash. You need to look at the dumps to see why the system is crashing. - Dick | |||||
| 5280.4 | When execute LAVC$STOP_BUS? | VAXRIO::MAURO | Wed Apr 16 1997 10:40 | 10 | |
|     .3
    
    I checked the CLUEXIT dump from one system only, since on the second the
    dump creation was aborted by the operator. I checked everything that can
    cause a VC to close (pagepool exhausted, LAN error, etc.) and the only
    strange thing I found was the EW errors. I have already disabled this bus
    using the LAVC$STOP_BUS procedure and I'm monitoring the system. The problem
    is that after a cold start/reboot the bus appeared again, showing
    plenty of errors. Do I need to execute this procedure in the system
    startup?
 | |||||
| 5280.5 | yes | ALEPPO::mse_notbuk.mse.tay.dec.com::bowker | Wed Apr 16 1997 12:57 | 4 | |
| > Do I need to execute this procedure in the system startup? Yes. | |||||