
Conference 7.286::fddi

Title: FDDI - The Next Generation
Moderator: NETCAD::STEFANI
Created: Thu Apr 27 1989
Last Modified: Thu Jun 05 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2259
Total number of notes: 8590

1203.0. "PC_Trace (the full story)" by QUIVER::PARISEAU (Luc Pariseau) Tue Jan 11 1994 11:36

    This note is here to try to explain FDDI PC_Trace.

    I've been asked at least 10 times to explain this process so I
    decided to create this note so that everybody would get it.
    Please feel free to add replies if you still have questions/comments.

    WHAT : The Trace process is the method used by FDDI to recover from
    a 'major' fault.  A 'minor' fault for example would be a lost token.
    The Claim process can take care of that usually.  But if it can't...

    HOW : An example is the best way to explain...

              Concentrator
          +------------------------------+
          | A                  MAC---> B |
          | v                  ^         |
          | v                  ^         |
          | M>>>>M>>>>M>>>>M>>>>         |
          +------------------------------+
                | |  | |
                +-+  +-+
                |S|  |S|
                | |  | |
                MAC  MAC
                #1   #2

    Event #1:
	'Something' happens in the transmit path of Station #1 that causes
	all frames and tokens to be removed from the ring and IDLE symbols
	to be sent out of the station.

    Event #2:
	Since no tokens or frames are going around the ring, one of the
	TVX timers in a MAC (doesn't matter which MAC) will expire.
	TVX is normally set to 2.5 msec.  This starts a Claim process.

    Event #3:
	After TMax (165 msec) the Claim process fails.  This starts a
	Beacon Process.

    Event #4:
	After T_Stuck (7 secs) the Beacon Process fails.  At this point the
	node directly downstream from the fault is stuck beaconing (this
	would be MAC #2 in my example.)  

	Directed Beacons are now sent for 340 msec by the node that
	is stuck beaconing.  Directed Beacons have very useful information.
	The DA is a multicast (01-80-C2-00-01-00).  You can use this
	DA to filter these frames with your network analyzer and capture
	them.  (You don't have a network analyzer...then get one.)
	The SA = MLA of the node stuck beaconing.  The UNA of this
	node is part of the data in the frame.  This tells you that the
	fault is between the SA and the UNA!
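
	The analyzer filter described above can be sketched in a few lines of
	Python. This is only an illustration: it assumes the frames have
	already been decoded into dictionaries with the hypothetical keys
	'da', 'sa' and 'una', and it says nothing about the on-the-wire layout
	of a Directed Beacon or about any particular analyzer's capture format.

```python
# Multicast DA used by Directed Beacons, as quoted in the note above.
DIRECTED_BEACON_DA = "01-80-C2-00-01-00"


def normalize(addr):
    """Return a MAC address as upper-case hex pairs joined by '-'."""
    digits = "".join(ch for ch in addr if ch.isalnum()).upper()
    return "-".join(digits[i:i + 2] for i in range(0, len(digits), 2))


def fault_span(frame):
    """Return (una, sa) if the frame is a Directed Beacon, else None.

    `frame` is a hypothetical pre-decoded dict with 'da', 'sa' and 'una'
    keys. The fault lies between the UNA and the SA (the beaconing MAC).
    """
    if normalize(frame["da"]) != DIRECTED_BEACON_DA:
        return None
    return normalize(frame["una"]), normalize(frame["sa"])


# Made-up station addresses, for illustration only.
beacon = {
    "da": "01:80:c2:00:01:00",
    "sa": "08-00-2b-00-00-02",
    "una": "08-00-2b-00-00-01",
}
print(fault_span(beacon))
```

	Anything that does not carry the Directed Beacon multicast DA is
	ignored, so the same function can be run over a whole capture.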

    Event #5:
	MAC #2 INITIATES the Trace Process by sending MLS (Master Line State)
	to the concentrator.  

	The concentrator gets an interrupt that informs
	it that a Trace was received on that PHY.  The concentrator then
	"finds" (it knows its internal structure and which ports are active)
	the Upstream Neighbor of the port on which the Trace was received.
	It tells that upstream port to send MLS.  The concentrator has
	PROPAGATED the Trace.

	Now the PHY on MAC #1 receives the Trace.  It "finds" that its
	MAC is upstream from this PHY and therefore TERMINATES the Trace.
	The Trace is initiated by the MAC that is directly downstream from
	the fault.  The Trace is propagated until it reaches a MAC at which
	time it is terminated.

	The node that initiated the trace + all nodes that propagated it +
	the node that terminates it are part of the Trace Domain.

    Event #6:
	After TraceMaxExpiration (8 secs), ALL nodes in the Trace Domain will
	 go to PathTest.

	PathTest is NOT defined in the SMT spec but the intent is to find
	the problem and make sure that the node with the problem does not
	get back on the ring.
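
	Adding up the timers quoted in Events #2 through #6 gives a rough idea
	of how long it takes from the fault to PathTest. The Python sketch
	below uses the values exactly as given in this note and assumes the
	stages simply run back to back, which is a simplification; real timer
	settings depend on the implementation and on how the ring is
	configured.

```python
# Timer values exactly as quoted in the note above (in milliseconds).
STAGES = [
    ("TVX expires, Claim starts", 2.5),
    ("Claim fails (TMax), Beacon starts", 165.0),
    ("Beacon fails (T_Stuck)", 7000.0),
    ("Directed Beacons sent", 340.0),
    ("TraceMaxExpiration, PathTest starts", 8000.0),
]


def timeline(stages):
    """Return (label, cumulative_ms) pairs, assuming stages run back to back."""
    elapsed = 0.0
    result = []
    for label, duration_ms in stages:
        elapsed += duration_ms
        result.append((label, elapsed))
    return result


for label, at_ms in timeline(STAGES):
    print(f"{at_ms / 1000:8.4f} s  {label}")
```

	Under these assumptions the nodes in the Trace Domain reach PathTest
	roughly 15.5 seconds after the ring stops passing tokens.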

    ALL DEC FDDI products implement PathTest by doing a reboot which
    causes the power-up diags to run again.

    Some people complain about this.  They don't want us to reboot because
    that "causes" problems...  NO...  The reboot is the symptom NOT the cause.
    Figure out WHY you are getting Traces and FIX that.

    Luc

1203.1. "I'm cured, thanks." by 35405::MCELWEE (Opponent of Oppression) Mon Jan 17 1994 01:51

    Luc,
    
    	I am one of those who asked for an explanation which may have
    filtered down to you. Thanks if so.
    
    	Much of the confusion is due to inadequate details in the
    explanation provided in the FDDI System Level Description document,
    esp. regarding the SAS ports vs. DAS ports trace behavior, IMHO.
    
    	If you remove MAC #1 or #2 in your example, the fault domain becomes 
    MAC #X + concentrator, + the UNA DAS station if the SAS station fails.
    How this fault domain is established cannot be understood unless
    one knows the concentrator's behavior, namely that it knows the SAS MAC
    addresses via ECM and can send PHY level trace internally to that
    station if a beacon fails.
    
    	While it's not common to see only one SAS on a concentrator, an
    intermittent fault in the SAS is hard to troubleshoot if these
    details are not known.
    
    Phil

1203.2. "PC_Trace (SAS, DAS...)" by DRFIX::PARISEAU (Luc Pariseau) Mon Jan 17 1994 09:12

	The concentrator doesn't know (well, at least not DEC's concentrator)
	the MAC address of stations attached to its ports.

	Let me try to explain the Trace propagation another way.


	The node "knows" its internal configuration (how its
	port(s) and MAC(s) are connected to each other.)  When a node
	receives a Trace Signal (MLS=Master Line State) it "finds" its
	next active UPSTREAM PORT or MAC (his MAC, not the MAC of stations
	connected to it).

	If it finds a port then it sends MLS on that port.
	If it finds his MAC then it terminates the trace.


	This rule applies to any FDDI node (DAS, SAS, DAC, SAC, NAC).

	So for example:  A SAS receives MLS on its Port (it only has one).
	What is UPSTREAM from the Port --> his MAC.  So trace terminates.

	When I say UPSTREAM, I mean Ports and MACs (not just MACs which is
	usually what people mean by upstream.)
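
	The rule above can be written down as a small Python sketch. The
	model is an assumption made for illustration: a node is a list of
	elements (ports and its own MAC) in token-flow order, treated as
	circular, and the names Element, handle_trace, M1 and M2 are
	hypothetical. The concentrator layout follows the diagram in 1203.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One stop on a node's internal path: a port or the node's own MAC."""
    name: str
    kind: str          # "port" or "mac"
    active: bool = True


def handle_trace(path, rx_port):
    """Apply the rule above to a Trace (MLS) received on `rx_port`.

    `path` lists the node's elements in token-flow order and is treated as
    circular. Returns ("propagate", port_name) or ("terminate", mac_name).
    """
    names = [e.name for e in path]
    start = names.index(rx_port)
    # Walk against the token flow, skipping inactive ports.
    for step in range(1, len(path) + 1):
        upstream = path[(start - step) % len(path)]
        if upstream.kind == "mac":
            return ("terminate", upstream.name)
        if upstream.active:
            return ("propagate", upstream.name)
    raise ValueError("no active upstream port or MAC found")


sas = [Element("S", "port"), Element("MAC", "mac")]
concentrator = [
    Element("A", "port"), Element("M1", "port"),
    Element("M2", "port"), Element("MAC", "mac"), Element("B", "port"),
]
print(handle_trace(sas, "S"))            # the SAS finds his MAC
print(handle_trace(concentrator, "M2"))  # the concentrator finds a port
```

	The SAS case terminates because the only thing upstream of its one
	Port is his MAC; the concentrator case propagates because an active
	port comes before the MAC when walking upstream.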

	If you still have questions, please put an example of a configuration
	and I'll explain how the Trace Domain is established.

	Luc

1203.3. "Details." by 35405::MCELWEE (Opponent of Oppression) Mon Jan 17 1994 23:55

    An example where both the 620 and DEFCN were resetting due to a fault
    in the SAS attached to the DEFCN...
    
    Other DAS				Other DAS
    		__________________
        |A|	|B______________ A|     |B|
        | |	| |		| |     | |
    	-----------		-----------
        |620 	  |		|DEFCN	  |
    	|Bridge	  |		| 	  |
    	-----------		-----------
    				|M|
    				|S|
    				------
    				|SAS |
    				------
    
    >	So for example:  A SAS receives MLS on its Port (it only has one).
    >	What is UPSTREAM from the Port --> his MAC.  So trace terminates.
    
    	In my example, the upstream DAS is also affected, but MLS is sent 
    over the secondary ring path, correct? Not understanding how the fault 
    domain is determined made it hard to know whether the DEFCN or the SAS
    was the cause of the DEFCN and 620 resets.
    	
    Q:	Doesn't ECM "map" the ports/MACs and thus allow devices to use this to 
    "find" the next active Upstream Port or MAC?
    
    Phil

1203.4. "SAS Trace" by QUIVER::PARISEAU (Luc Pariseau) Tue Jan 18 1994 11:12

	If the DEFCN and the 620 reset (and the SAS was doing Trace correctly...
	this is VERY important) then I believe that the SAS was initiating
	the Trace (sending MLS to the DEFCN).  Then the DEFCN propagated the
	Trace (sending MLS on the next upstream active port...in this case
	the A port, which just happens to be on the secondary ring).
	The 620 received the Trace and found its MAC to be
	upstream from its B port and therefore terminated the Trace.

	So the Trace Domain included all 3 nodes and all 3 nodes should have
	done a PathTest.  So the fault may be in any of those 3 nodes.

	But let's look at what happens if the DEFCN initiates the Trace...
	The DEFCN will send MLS to the first active upstream port from
	its MAC (this would be the M port connected to the SAS).  The
	SAS SHOULD receive the MLS and terminate the Trace (because it
	has a MAC directly upstream from its S port) BUT if it doesn't
	and instead propagates it back to the DEFCN then the DEFCN will
	propagate it to the 620.  So only the DEFCN and SAS SHOULD PathTest,
	but because the SAS didn't implement Trace correctly, all three
	do PathTest.
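
	The three cases discussed in this reply can be replayed with a small
	Python model. It is a sketch under stated assumptions: the internal
	paths in PATHS are inferred from the behavior described above (they
	are not taken from any product documentation), only the cabling
	between the 620, the DEFCN and the SAS is modelled (the "Other DAS"
	stations are left out because the Trace never reaches them here), and
	the `broken` argument is a hypothetical way to mark a node that
	propagates a Trace it should have terminated.

```python
# Internal paths in token-flow order; "MAC" marks the node's own MAC.
PATHS = {
    "620":   ["A", "MAC", "B"],
    "DEFCN": ["A", "M", "MAC", "B"],
    "SAS":   ["S", "MAC"],
}
# Cabling from the diagram in 1203.3: (node, port) <-> (node, port).
LINKS = {
    ("620", "B"): ("DEFCN", "A"), ("DEFCN", "A"): ("620", "B"),
    ("DEFCN", "M"): ("SAS", "S"), ("SAS", "S"): ("DEFCN", "M"),
}


def upstream_of(node, element, skip_mac=False):
    """Return the next cabled port or "MAC" upstream of `element` in `node`."""
    path = PATHS[node]
    start = path.index(element)
    for step in range(1, len(path) + 1):
        candidate = path[(start - step) % len(path)]
        if candidate == "MAC" and skip_mac:
            continue  # a broken node ignores its own MAC
        if candidate == "MAC" or (node, candidate) in LINKS:
            return candidate
    raise ValueError("nothing upstream of " + element + " in " + node)


def trace_domain(initiator, broken=()):
    """Return nodes in the Trace Domain, in the order the Trace reaches them.

    `broken` names nodes that wrongly propagate a Trace instead of
    terminating it at their own MAC (the misbehaving SAS in the text).
    """
    domain = [initiator]
    node, element = initiator, "MAC"
    for _ in range(10):  # hop limit so a bad model cannot loop forever
        skip = node in broken and element != "MAC"
        port = upstream_of(node, element, skip_mac=skip)
        if port == "MAC":
            return domain  # Trace terminated at this node's MAC
        node, element = LINKS[(node, port)]
        if node not in domain:
            domain.append(node)
    raise RuntimeError("Trace did not terminate")


print(trace_domain("SAS"))                     # SAS initiates
print(trace_domain("DEFCN"))                   # DEFCN initiates, SAS correct
print(trace_domain("DEFCN", broken=("SAS",)))  # DEFCN initiates, SAS broken
```

	With a correct SAS, a Trace initiated by the DEFCN stays between the
	DEFCN and the SAS; marking the SAS as broken pulls the 620 into the
	Trace Domain as well.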

	We have seen problems with other vendors in the area of Trace
	so you have to keep an "open" mind.  But, the real problem is
	in one of the nodes in the Trace Domain.