Title: | + OpenVMS Clusters - The best clusters in the world! + |
Notice: | This conference is COMPANY CONFIDENTIAL. See #1.3 |
Moderator: | PROXY::MOORE |
Created: | Fri Aug 26 1988 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 5320 |
Total number of notes: | 23384 |
Hi,

I have a customer with an FDDI/SCSI cluster. The original configuration was an Ethernet/SCSI cluster. Since he installed the FDDI and disconnected the Ethernet, he has been getting different crashes on both nodes: sometimes CLUEXIT (VC type=Dead) and sometimes INVEXCEPTN.

I installed the most recent patches for LAN (ALPLAN04_062) and LAVC (ALPLAVC01_062) and upgraded Pathworks to 50E_E01050. After that, the crashes still continue to happen.

Checking the PE port information through SDA, I found plenty of errors on the EWxx buses on both CPUs. Using SDA on the running system, I confirmed that these error counts keep increasing. The NCP command SHOW LINE COUNTERS also shows "Send failure: carrier check failure" incrementing. The error is 204C (SYSTEM-F-DISCONNECT). To try to minimize the impact, I set RECNXINTERVAL to 60.

My questions are: How can a circuit and a line, both in state off, log send errors? Could PEDRIVER try to use this line even though its state is off? Could the correct setting of NISCS_MAX_PKTSZ (4468) change this situation? Do I need to put a terminator on the Ethernet interface to avoid this situation?

I would like to stop those errors before going deeper into the crash dump analysis.

Any help will be very welcome,
Mauro Aquino.
---------
AlphaServer 8400 Model 5/350 (SIRIUS)
OpenVMS v6.2-1H3

NCP information:

    Known Line Volatile Summary as of 11-APR-1997 11:37:38

    Line      State
    EWA-0     off
    EWA-1     off
    FPA-0     on

    >65534  Send failure, including:
              Carrier check failed

SDA information:

    Bus Addr  Bus  LAN Address        Error Count  Last Error  Time of Last Error
    --------  ---  -----------------  -----------  ----------  -----------------------
    82033100  LCL  00-00-00-00-00-00            0
    820C18C0  EWA  08-00-2B-E5-80-03         3465  0000204C    11-APR-1997 11:43:03.98
    820C3980  EWB  08-00-2B-E6-20-BE         3465  0000204C    11-APR-1997 11:43:03.98
    820C5980  FWA  00-00-F8-63-1C-2B            0

--------
AlphaServer 2100 5/300 (ORION)
OpenVMS v6.2-1H3

NCP information:

    Known Line Volatile Summary as of 11-APR-1997 11:37:37

    Line      State
    EWA-0     off
    FEA-0     on

    >65534  Send failure, including:
              Carrier check failed

SDA>sh port/add=xxxxxxxx

    Bus Addr  Bus  LAN Address        Error Count  Last Error  Time of Last Error
    --------  ---  -----------------  -----------  ----------  -----------------------
    818311C0  LCL  00-00-00-00-00-00            0
    818FB200  EWA  08-00-2B-E5-F9-6A       121285  0000204C    11-APR-1997 11:44:37.99
    818FD980  FRA  08-00-2B-B1-1B-40            0
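
For anyone reading the Last Error column in the SDA listing above: the value 0000204C can be unpacked by hand into its fields. What follows is only a minimal sketch in Python, assuming the usual 32-bit OpenVMS condition value layout (severity in bits 0-2, message number field in bits 3-15, facility number in bits 16-27, control bits in bits 28-31). The function name `decode_condition` and the dictionary keys are made up for illustration and are not part of any OpenVMS tool.

```python
# Sketch only: splits a 32-bit OpenVMS condition value into its fields.
# Assumed layout: bits 0-2 severity, 3-15 message number field,
# 16-27 facility number, 28-31 control bits.
# decode_condition and the key names are hypothetical, for illustration.

SEVERITY_LETTERS = {
    0: "W",  # warning
    1: "S",  # success
    2: "E",  # error
    3: "I",  # informational
    4: "F",  # severe (fatal) error
}


def decode_condition(value: int) -> dict:
    """Return the fields of a condition value as a dictionary."""
    return {
        "severity": SEVERITY_LETTERS.get(value & 0x7, "?"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility_number": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
    }


if __name__ == "__main__":
    # Value taken from the Last Error column of the SDA output.
    print(decode_condition(0x0000204C))
```

For 0000204C this gives facility number 0 (the system facility) and severity F, which agrees with the SYSTEM-F-DISCONNECT text quoted in the note.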
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
5280.1 | ALEPPO::mse_notbuk.mse.tay.dec.com::bowker | Fri Apr 11 1997 16:23 | 5 | ||
NCP only controls the DECnet protocols. Try using LAVC$STOP_BUS (found in SYS$EXAMPLES) to stop SCS from using the EWA ports. | |||||
5280.2 | COL01::VSEMUSCHIN | Duck and Recover ! | Sun Apr 13 1997 16:04 | 12 | |
Carrier check failure means a hardware error: something wrong with the transceiver (heartbeat not allowed or failed), a defective cable, etc. Check your network and, as the previous note says, turn off the cluster traffic on Ethernet, and not only because of hardware failures. If your Ethernet is (or will be) connected to the FDDI ring, your customer could run into a dangerous situation where circuits (cluster circuits!) are built between Ethernet and FDDI ports. Such a cross connection is very unhealthy (if this word exists ;-). I saw a cluster where shadowing and Pathworks could not coexist until Ethernet and FDDI were separated. =Seva | |||||
5280.3 | STAR::STOCKDALE | Mon Apr 14 1997 08:58 | 8 | ||
The carrier check failure is irrelevant. Since .0 said the Ethernet was disconnected, and since PEDRIVER is going to keep attempting to use the Ethernet device, carrier check failure errors are going to accumulate and it should not cause the system to crash. You need to look at the dumps to see why the system is crashing. - Dick | |||||
5280.4 | When execute LAVC$STOP_BUS? | VAXRIO::MAURO | Wed Apr 16 1997 11:40 | 10 | |
.3 I checked the CLUEXIT dump from one system only, since on the second the dump creation was aborted by the operator. I checked everything that can cause a VC to close (pagepool exhausted, LAN error, etc.) and the only strange thing I found was the EW errors. I have already disabled this bus using the LAVC$STOP_BUS procedure and I'm monitoring the system. The problem is that after a cold start/reboot the bus appeared again, showing plenty of errors. Do I need to execute this procedure in the system startup? | |||||
5280.5 | yes | ALEPPO::mse_notbuk.mse.tay.dec.com::bowker | Wed Apr 16 1997 13:57 | 4 | |
> Do I need to execute this procedure in the system > startup? Yes. |