
Conference 7.286::fddi

Title: FDDI - The Next Generation
Moderator: NETCAD::STEFANI
Created: Thu Apr 27 1989
Last Modified: Thu Jun 05 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2259
Total number of notes: 8590

932.0. "DECbridge 620 port to wait state?" by BUSSTP::JHANNAH (Jim Hannah, Telecoms & Nets, AYO) Fri Apr 16 1993 04:09

	Hi,
           I'm having a problem with 2 of my DECbridge 620's. They are
	configured in a simple ring as shown:-


                -------                      -------
        fmiclv | a    b|====================| a   b | mainbp 
	        -------                      -------
                ||                               ||
	        -------                      -------
	fmicbp | b    a|====================| b   a | mainlv
	        -------                      -------


	     Computer Room 1               Computer Room 2


	The bridges have been happily operating for about the last 7 months
	with no problems. They are all running V1.2 software.

	Yesterday afternoon, DECmcc notified me that port A and port B on
	mainlv and mainbp, respectively, had gone into a waiting state:-


	Bridge LOCAL_NS:.mainlv PHY Port 1
	AT 1993-04-16-08:32:21.883 Status

                         Phy Port State = Waiting

	My first thought was that it might be a faulty cable between the 2 
	ports, so I changed it for a known good one, but that didn't make any 
	difference. The bridge counters show the following:-

	MCC> show bridge mainlv phy port 1 all counter

	Bridge LOCAL_NS:.mainlv PHY Port 1
	AT 1993-04-16-08:36:27.125 Counters

	Examination of attributes shows:
                  Counter Creation Time = 1992-09-24-16:47:07.125
                            LEM Rejects = 360287970189639680
                        LEM Link Errors = 7349874591868649472
                            LCT Rejects = 144115188075855872
                   Connection Completed = 32
                    TNE Expired Rejects = 0
               Elasticity Buffer Errors = 144115188075855872


	MCC> show bridge mainbp phy port 2 all counter

	Bridge LOCAL_NS:.mainbp PHY Port 2
	AT 1993-04-16-08:38:11.133 Counters

	Examination of attributes shows:
                  Counter Creation Time = 1992-11-22-14:34:53.133
                            LEM Rejects = 0
                        LEM Link Errors = 0
                            LCT Rejects = 72057594037927936
                   Connection Completed = 4
                    TNE Expired Rejects = 0
               Elasticity Buffer Errors = 0

	The error counters on mainlv port 1 seem to have clocked up a very large
	number of errors. Could this have caused the port to go into a wait 
	state? Even if it didn't, I don't like to see error counters
	increasing, so what sort of things could cause these errors?

	Regards,
		Jim.

T.R    Title    User    Personal Name    Date    Lines
932.1. "time for new hardware" by COMICS::WOODWARD (Smile!) Fri Apr 16 1993 06:12 (7 lines)

Hardware fault in one of the bridges. The elasticity buffer errors imply a 
serious timing problem between the two stations, but unfortunately you can't
tell which station has gone wrong.

Time to get some spare parts...

Steve

932.2. "Which one I wonder?" by BUSSTP::JHANNAH (Jim Hannah, Telecoms & Nets, AYO) Fri Apr 16 1993 09:22 (7 lines)

    Does it mean that there is a problem in one of these two bridges which
    have their ports in the wait state, or could the problem be in any of
    the four bridges but just showing up on the counters on this one?
    
    Regards,
    		Jim.
    
932.3. by COMICS::WOODWARD (Smile!) Fri Apr 16 1993 09:44 (38 lines)

Jim,

Sorry, I should have been a bit more specific. The problem lies in one of the
2 bridges whose ports are in the wait state. FDDI (unlike 802.5) uses distributed
clocking, so every station has its own elasticity buffer to allow for clock
differences between it and its neighbours. Elasticity buffer errors at one
station mean a problem between that station and the one it's directly connected
to on the port which is showing the errors.

You can't really tell which station is at fault because, as they are each using
their own clocking, either clock being out of spec will cause errors, e.g.

Take this part of a ring:

	Station 1		Station 2
	---------		---------
    ====!   A	!===============!   B	!===	(ie A port to B port, but it
	---------		---------	  is just as true for M-S etc)

Station 2's port uses its internal clock to transmit data to Station 1 on the
primary ring.

Station 1 clocks data into its elasticity buffer using the clock recovered from
the bit stream.

Station 1 transmits data on down the ring (out of the elasticity buffer) using
its internal clock.

IF Station 2 clock is too fast OR Station 1 clock is too slow, the elasticity 
buffer will overflow.

IF Station 2 clock is too slow OR Station 1 clock is too fast the elasticity 
buffer could 'empty' in the middle of a frame.

Any of the 4 faults leads to elasticity buffer errors being logged in station 1.
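
The four cases above can be put into a rough sketch. This is a simplified model, not how any bridge actually implements it: the function name, the buffer depth of 10 bits and the frame length of 45000 code bits (a maximum-size frame after 4B/5B encoding) are illustrative assumptions, and the buffer is assumed to start each frame half full.

```python
def elasticity_buffer_outcome(upstream_ppm, local_ppm,
                              frame_bits=45000, depth_bits=10):
    """Classify what happens to a (hypothetical) elasticity buffer over one frame.

    upstream_ppm: offset of the transmitting station's clock (Station 2), in ppm.
    local_ppm:    offset of the receiving station's own clock (Station 1), in ppm.
    frame_bits and depth_bits are illustrative assumptions, not DECbridge values.
    """
    # Bits are written at the recovered (upstream) rate and read at the local
    # rate, so the fill level drifts by roughly this many bits over the frame.
    drift = frame_bits * (upstream_ppm - local_ppm) * 1e-6
    headroom = depth_bits / 2  # buffer assumed to start half full
    if drift > headroom:
        return "overflow"    # upstream too fast OR local too slow
    if drift < -headroom:
        return "underflow"   # upstream too slow OR local too fast
    return "ok"


for upstream, local in [(50, -50), (200, 0), (0, -200), (-200, 0), (0, 200)]:
    print(upstream, local, elasticity_buffer_outcome(upstream, local))
```

Note that the result depends only on the difference between the two offsets, which is why the counters at Station 1 cannot say which of the two clocks is the one out of spec.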

Hope this helps,

Steve

932.4. by KONING::KONING (Paul Koning, A-13683) Fri Apr 16 1993 15:02 (37 lines)

Well, those counters are obviously nonsense.

More specifically, they are 8-byte values with the byte order wrong.  I can't
tell who did it wrong, the management agent or the management director, but
one of them is broken.

Some quick arithmetic says that the real counters are:

                  Counter Creation Time = 1992-09-24-16:47:07.125
                            LEM Rejects = 5 
                        LEM Link Errors = 102 
                            LCT Rejects = 2 
                   Connection Completed = 32
                    TNE Expired Rejects = 0
               Elasticity Buffer Errors = 2 


	MCC> show bridge mainbp phy port 2 all counter

	Bridge LOCAL_NS:.mainbp PHY Port 2
	AT 1993-04-16-08:38:11.133 Counters

                  Counter Creation Time = 1992-11-22-14:34:53.133
                            LEM Rejects = 0
                        LEM Link Errors = 0
                            LCT Rejects = 1 
                   Connection Completed = 4
                    TNE Expired Rejects = 0
               Elasticity Buffer Errors = 0
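
For anyone who wants to repeat the arithmetic, here is a minimal sketch of the byte-order correction, assuming the counters are unsigned 8-byte integers whose bytes were simply reversed. The helper name swap64 is made up for illustration; the input values are the ones DECmcc displayed in .0.

```python
def swap64(value: int) -> int:
    """Reverse the byte order of an unsigned 64-bit integer."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


# Values as displayed by DECmcc for mainlv PHY port 1 and mainbp PHY port 2.
reported = {
    "mainlv LEM Rejects": 360287970189639680,
    "mainlv LEM Link Errors": 7349874591868649472,
    "mainlv LCT Rejects": 144115188075855872,
    "mainlv Elasticity Buffer Errors": 144115188075855872,
    "mainbp LCT Rejects": 72057594037927936,
}

for name, raw in reported.items():
    print(f"{name:>32} = {swap64(raw)}")
```

Each displayed value is a small count multiplied by 2**56, which is what a one-byte counter looks like when it lands in the wrong end of an 8-byte field.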


Given the 100-odd link errors, I'd say you have a bad link.  It may be a bad
transceiver on either end, or a cabling problem.  It's possible the Ebuf
errors are a consequence of the link errors rather than a real problem;
see if they go away once you fix the LEM problem.

	paul