
Conference 7.286::fddi

Title:FDDI - The Next Generation
Moderator:NETCAD::STEFANI
Created:Thu Apr 27 1989
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2259
Total number of notes:8590

1575.0. "PEA0 errors on FDDi LAVc Cluster" by GIDDAY::STANISLAUS () Tue Feb 07 1995 09:17

             <<< SPEZKO::NOTESPUBLIC:[NOTES$LIBRARY]CLUSTER.NOTE;2 >>>
                                 -< + VAXclusters + >-
================================================================================
Note 4448.0            PEA0 errors on a FDDI LAVc Cluster             No replies
GIDDAY::STANISLAUS                                  169 lines   7-FEB-1995 09:15
--------------------------------------------------------------------------------
		FDDI LAVc and PEA0 errors question
		----------------------------------

		Site 1 to Site 2 distance is about 1/2 km.
		------------------------------------------


	Site 1						Site 2
	------						------

	+----+						+-----+
    +---|SVS3|						|SVS10|---+
    |	+----+						+-----+  C|
    |	      \	   +-------------------------------+   /	 I|
   C|	       \   |				   |  /		  |
   I|	+----+	+-----+				+-----+		 P|
    |---|SVS7|--|DC500|	    Dual FDDI Ring	|DC500|		 A|
   P|	+----+  +-----+				+-----+		 T|
   A|		/  | +-----+		           |		 H|
   T|	       /   +-|DB500|-----------------------+		  |
   H|	      /      +-----+					+----+
    |	+-----+		|					|HSJ5|
    +---|SVS20|	   	|					+----+
    |	+-----+	    +------------+				   |
    |			     |					   |
  +----+		  +----+				   v
  |HSJ4|		  |SVS9|				To about
  +----+		  +----+				30 disks
    |
    |
    v 
To about 30 disks



Current configuration:
----------------------

VMS version is 5.5-2.

DEMFAs are at rev 2.0.

S/W staff tell me the systems have been patched to the nth degree: the IY14
patch, the PEDRIVER patch, the FXDRIVER patch, you name it.

LRPSIZE on all systems connected to FDDI is 4541.

MSCP_BUFFER on all systems connected to FDDI is 3200. (No symptoms of
fragmentation seen with MONITOR MSCP. Of course, this was never monitored at
the time of the PEA0 errors.)


TIMVCFAIL=4000, RECNXINTERVAL=180, and SHADOW_MBR_TMO=210 to prevent CLUEXIT
bugchecks and shadow copies from occurring.
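
As a sanity check, the settings above can be confirmed on each node from
SYSGEN, and MSCP fragmentation can be watched live. This is only a sketch:
MONITOR MSCP_SERVER is the full command name behind the MONITOR MSCP
shorthand above, and the unit comments are my reading of the parameter
definitions.

    $ MCR SYSGEN
    SYSGEN> USE ACTIVE
    SYSGEN> SHOW LRPSIZE
    SYSGEN> SHOW MSCP_BUFFER
    SYSGEN> SHOW TIMVCFAIL        ! hundredths of a second; 4000 = 40 s
    SYSGEN> SHOW RECNXINTERVAL    ! seconds
    SYSGEN> SHOW SHADOW_MBR_TMO   ! seconds
    SYSGEN> EXIT
    $ MONITOR MSCP_SERVER         ! watch for fragmented requests live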


On Site 1:
----------
SVS3 is a VAX6000 with 1 vote and expected votes 5.
SVS7 is a VAX6000 with 1 vote and expected votes 5.
SVS9 is a VAX6000 with 1 vote and expected votes 5.
SVS20 is a VAX7000 with 1 vote and expected votes 5.

On Site 2:
----------
SVS10 is a VAX7000 with 1 vote and expected votes 5.


There are a few satellite systems in this cluster, connected via 10/100
bridges; they do not have votes. They are not shown on the diagram.

There are also a few more DB500s on the FDDI ring connecting to other
Ethernets around the campus, with satellites on those Ethernets. These are
not shown on the diagram either.

The quorum VAX (which will be on a 10/100 bridge) is yet to be installed.
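
For completeness, the quorum arithmetic under the current votes (a side note,
using the standard VMS formula): five voting members at 1 vote each with
expected votes 5 gives quorum = (5 + 2) / 2 = 3 (integer division). Site 1
alone holds 4 votes, so closing the VC to SVS10 never costs us quorum; the
visible damage is limited to the mount verifications described below.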

We have checked and made sure there are no FDDI-to-Ethernet loops across the
systems.


Fault Symptom:
--------------

	We see SVS3, SVS7, and SVS20 log PEA0 errors (about every hour or so),
and the closure of the VC is always to SVS10.

	For every 1 PEA0 error logged by each of the above systems, SVS10 logs
3 PEA0 errors: 1 closing the VC to SVS3, 1 closing the VC to SVS7, and 1
closing the VC to SVS20. These 3 errors occur in a burst at the same time.
This scenario causes MSCP-served and shadowed disks across the FDDI to go
into mount verification when the PEDRIVER closes the VC, immediately followed
by a mount verification completed message in the operator log file.
While all this is happening (closing the VC, losing the connection, mount
verification in progress/completed), there is a distinct pause on the system.
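
When one of these pauses is in progress, the PEA0 port and its virtual
circuits can be inspected from SDA, and the mount verification messages can
be pulled out of the operator log afterwards. A sketch only; the exact SHOW
PORTS output format varies between VMS versions, and the search string is
simply what the MOUNTVER messages happen to contain:

    $ ANALYZE/SYSTEM
    SDA> SHOW PORTS               ! PEA0 and the VC state per remote node
    SDA> EXIT
    $ SEARCH SYS$MANAGER:OPERATOR.LOG "mount verification"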

	What never happens is the following:

1) SVS3 never loses its connection to SVS7 or SVS20.

2) SVS7 never loses its connection to SVS3 or SVS20.

3) SVS20 never loses its connection to SVS3 or SVS7.

Please note that SVS3, SVS7, and SVS20 (also SVS9) are all on Site 1.
-----------

4) SVS9, which is on the Ethernet (via a 10/100 bridge), never loses its
connection to any of the above systems (SVS3, SVS7, SVS10, SVS20), and these
systems never lose their connection to SVS9.


No evidence of pool expansion or of the system hanging on pagefile or swapfile
    wait, as seen with ANALYZE/SYSTEM.

No evidence of DECnet circuit down/up messages from any node in the cluster.

No evidence of LAT session dropouts to any nodes in the cluster.

    We see some FDDI Ring Inits Received on all the nodes and very few Ring
    Inits Sent, but these inits do not occur at the same time as the PEA0
    errors. No other FDDI counters increment.
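
    The ring init counters come out of the DECnet line counters for the FDDI
    adapters. A sketch of how we read (and zero) them; MFA-0 as the line name
    for the first DEMFA is an assumption that matches this hardware:

    $ MCR NCP
    NCP> SHOW LINE MFA-0 COUNTERS     ! ring initializations sent/received
    NCP> ZERO LINE MFA-0 COUNTERS     ! so the next increment can be timed
    NCP> EXIT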
    
	Why does a system on the Ethernet in Site 1 (SVS9, via a 10/100
bridge) never close its VC to SVS10 in Site 2, while the systems on the FDDI
in Site 1 (SVS3, SVS7, and SVS20) close their VCs to SVS10 in Site 2 about
once every hour or so?

If the theory is:

loss of Connection Manager Hello Message multicast packets from SVS10 to the
other systems, then why does SVS9 (which is a member of the same cluster via
an FDDI-to-Ethernet bridge on the same FDDI ring) never fail to receive this
multicast? So I am convinced it is not loss of Connection Manager Hello
Message multicasts. Is it, then, unicast packet loss of SCA packets, where
the loss is not detected by the data link? For example, the theory that a
10/100 bridge diverts a unicast packet that is not meant for it to an
Ethernet port.

	My current action plan to eliminate the possibility of a 10/100
bridge grabbing a unicast packet that is not meant for it and diverting it
is as follows:

There are some spare fibres between Site 1 and Site 2.

We will use this spare fibre to connect SVS10 in Site 2 to the
concentrator in Site 1 to which SVS3, SVS7, and SVS20 are connected.
    
If the problem still occurs, there can only be two things wrong:

(1) the FDDI concentrator in Site 1, OR
(2) SVS10 in Site 2.

Not a big fault isolation, but at least we will know it is (a) not the FDDI
ring, (b) not other FDDI concentrators, and (c) not other FDDI bridges.
This action plan is as good as having SVS10 next to SVS3, SVS7 and SVS20 on 
the same FDDI concentrator.

Previous Configuration:
-----------------------

SVS3 and SVS7 in Site 1 were on the Ethernet just like SVS9. In that
configuration, PEA0 errors and VC failures were only happening between the
two VAX 7000 systems (SVS20 on Site 1 to SVS10 on Site 2). No PEA0 errors
were logged by the three systems on the Ethernet (SVS3, SVS7, SVS9) in our
previous configuration. Also, SVS10 and SVS20 never closed a VC to SVS3,
SVS7, or SVS9 in that configuration.

Alphonse


================================================================================
Note 1575.1          See cluster notes conference
GIDDAY::STANISLAUS                                    5 lines  25-FEB-1995 13:23
--------------------------------------------------------------------------------
    
    	For the fix, please see Cluster notes conference note numbers 4482,
    4465, and 4448. It is well explained there.
    
    Alphonse