
Conference 7.286::fddi

Title:	FDDI - The Next Generation

Moderator:	NETCAD::STEFANI

Created:	Thu Apr 27 1989
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2259
Total number of notes:	8590

421.0. "FDDI Maximum Access Delay" by TOOIS1::MIRGHANE () Mon Dec 16 1991 13:40

    In the Digital Technical Journal, volume 3, number 3, Raj Jain
    explains very well how the maximum access delay (MAD) is linked to
    the TTRT parameter by the formula:
    	MAD = Maximum access delay = (n-1)TTRT + 2*D
    Where n is the number of stations and D the ring latency.
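
    As a rough illustration of the formula above, here is a minimal Python
    sketch. The function name and the sample values (100 stations, a TTRT of
    8 ms, a ring latency of 1.6 ms) are assumptions chosen for illustration
    only; they do not come from the article.

```python
def max_access_delay_ms(n_stations, ttrt_ms, ring_latency_ms):
    """MAD = (n - 1) * TTRT + 2 * D, with all times in milliseconds."""
    if n_stations < 1:
        raise ValueError("a ring needs at least one station")
    return (n_stations - 1) * ttrt_ms + 2 * ring_latency_ms


# Hypothetical configuration: 100 stations, TTRT = 8 ms, D = 1.6 ms.
mad = max_access_delay_ms(n_stations=100, ttrt_ms=8.0, ring_latency_ms=1.6)
print(f"MAD = {mad:.1f} ms")
```

    With these assumed values the bound is a little under 0.8 seconds, and
    the (n-1)*TTRT term dominates the ring latency term.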
    
    This MAD value is very important for designing distributed
    FDDI real-time systems.
    
    We think, however, that this formula needs to be extended to cover
    all unsteady states on the FDDI ring.
    
    We need to know how the token is managed: what happens when the token
    is lost, for example, or what happens when the ring reconfigures
    automatically (e.g., when a dual-attachment station breaks down).
    Our first objective is to understand the causes of such unsteady states,
    in order to design systems that minimize the probability
    of such events.
    The second objective is to be able to calculate the maximum access
    delay when such an unsteady state occurs.
    
    Thanks very much for any help (pointers, documents, ...).
    
    Soumetty.

T.R	Title	User	Personal Name	Date	Lines
421.1		KONING::KONING	Paul Koning, NI1D	`Tue Dec 17 1991 11:01`	124
There are various levels of fault recovery in FDDI, some of which deal with very esoteric faults (and take a fair amount of time to do so). There are basically three levels of recovery.

1. Recovery from problems caused by line errors

This category applies to a normal working network, since some small number of line errors is normal for a working network. (Note that on fiber optic networks, the "typical" bit error rate is extremely low and may approach zero -- however, it can be non-zero even though nothing is "defective".)

Line errors can result in (a) loss of data packets, (b) loss of the token.

Data packet loss is generally detected by higher layer protocols. Once detected, the higher layer protocol normally retransmits. The delay caused by a data packet loss is therefore the sum of the higher layer timeout and the FDDI access delay. Typically, the timeout is a couple of seconds, which is far larger than the FDDI access delay in normal configurations. Packet loss affects only the user whose packet was lost; everything else continues normally. Given a bit error rate of 10^-12 (which is "typical"), 100 stations, and 4500-byte packets, you get a packet loss rate of 3.6*10^-6. In other words, less than 1 in 100,000 packets is lost.

Token loss due to bit errors occurs when a bit error hits the token and changes it to something else. There are two timers in FDDI that detect this. In general, it is the TVX (Valid Transmission) timer that will go off. Token loss of course affects everyone; transmissions stop until a new token is generated. That's why the recovery here is rather quick. TVX is typically a few milliseconds. Our stations use 2.62 ms as the default. So that's the delay from token loss until detection.

Once TVX goes off, Claim is started to recreate the token. Claim takes at most two ring delays (i.e., 2*D_Max, or 3.2 ms, worst case); after that it takes two more ring delays for the ring to return completely to normal.
So the entire process takes about 9 ms with default TVX. (There is no reason I know of ever to set TVX to any other value -- even though the FDDI standard insists that it's settable. Then again, the same thing is true for all but one or two FDDI parameters...)

2. Recovery from problems caused by link failures

This category covers things that are "broken" (as opposed to bit errors, which are "normal") -- but that are nevertheless fairly common and expected. By "link failure" I mean cable breaks, connector problems, and transceiver failures. All these cause loss of signal. Cable break is probably the most common, followed by connector problems. Transceiver failures aren't all that common, but the transmitter in particular does have a finite (though quite long) life and will degrade over time.

Two mechanisms recover from link failures.

The simplest case is a total failure, such as an unplugged connector or a backhoe digging right through your cable. That is caught by the Signal Detect function of PMD, or the Quiet Line State detection in PHY. This happens within a millisecond. After that, the affected port is removed from the ring, and the ring is reinitialized as with a token loss. So the total hit is under 10 ms.

Degraded links generally take longer to detect but are probably less common. "Degraded" here means a link with a high bit error rate -- perhaps 10^-6 or so. This can happen due to damaged cable, or partially plugged-in connectors. The Link Error Monitor (LEM) detects this situation. There is a parameter, set by default to 8 (i.e., 10^-8), which specifies what link error rate is considered "excessive". Normally you should leave it at 8. The only reason for changing it is that you have a marginal link that is very important, and you prefer to leave it on -- even though you're getting high packet and token loss rates -- rather than have it taken out of service, while waiting for it to be repaired.
LEM requires about 5-10 errors over an averaging period of a minute or so before it decides a link is bad. If it suddenly becomes bad enough to generate 10 errors in one second, it's shut off then, but if it gradually goes bad, it will probably take up to a minute or so for the problem to be detected. Note that this isn't serious, since communication does continue in the meantime. (You may lose a lot more packets than usual, or lose the token a couple of times, but that's not all that major given that it doesn't last.) Once the problem is detected, the port is removed from the ring and the ring reinitialized, as above.

3. Recovery from esoteric problems

If you think long enough and hard enough, you can come up with some cases that aren't covered by any of the above. There are two that are rather improbable but are nevertheless covered by the FDDI standard: internal station faults, and duplicate addresses.

Internal station faults are faults other than link faults. The typical example we use is a MAC whose transmit circuit is broken and always sending Idle. In that case, clearly you can't get any data through, and the token disappears. Claim also fails, and the subsequent Beacon process will not complete. (Getting to this point takes 175 ms.) Thus we speak of a "stuck beaconing" condition. After about 8 seconds of beaconing, the "Trace" process begins. This propagates a signal to each of the stations that might be the source of the problem. The signal propagation typically takes less than a second; the worst case is 7 seconds. At that point, each of the stations that received the trace signal performs a self test. There is no standard time bound for self test, of course; it depends on the product. After successful self test, the station rejoins the ring.
In the meantime, as soon as trace finishes, the other stations (those not suspected of being broken) remain in an operating ring; for them there was an interruption of up to 15 seconds (stuck-beaconing time plus trace time).

Duplication of the station address can cause the ring not to operate. (You'll have to walk through the details of the Claim procedure to see why; I'll skip that for now.) If that happens, a duplicate address test is invoked after about 1 second, which should identify the offenders and remove them from the ring quickly (within another second or so).

Note that I said "station address", i.e., "MLA". In DEC FDDI products, the station address is ALWAYS the "hardware address" and comes from a ROM. Thus the probability of duplicate address is close to zero. The familiar DECnet Phase 4 address is NOT the MLA in our FDDI products -- instead it is an "alias" address. A major reason for this is that duplication of an alias address isn't a serious problem. As many people know, duplicate DECnet node numbers -- and thus duplicate DECnet-style addresses -- happen fairly frequently. We wanted to make sure that this situation would not crash the ring, which is why alias addresses are used for it.

	paul
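
The arithmetic quoted in the reply above can be reproduced with a small Python sketch. The function names are hypothetical. The sketch assumes a packet crosses one link per station, which is one reading that reproduces the quoted 3.6*10^-6 figure, and it takes D_Max as 1.6 ms, half of the 3.2 ms quoted for two ring delays.

```python
def packet_loss_rate(bit_error_rate, n_links, packet_bytes):
    """Approximate chance that a packet is hit by at least one bit error.

    Assumes the packet crosses n_links links (here: one per station) and
    that bit_error_rate * total bits is much smaller than 1.
    """
    return bit_error_rate * n_links * packet_bytes * 8


def token_recovery_ms(tvx_ms=2.62, d_max_ms=1.6):
    """TVX detection time, plus Claim (2 ring delays), plus 2 more to settle."""
    claim_ms = 2 * d_max_ms
    settle_ms = 2 * d_max_ms
    return tvx_ms + claim_ms + settle_ms


def stuck_beacon_interruption_s(beaconing_s=8.0, trace_worst_case_s=7.0):
    """Worst-case outage seen by healthy stations: beaconing plus trace."""
    return beaconing_s + trace_worst_case_s


print(f"packet loss rate:  {packet_loss_rate(1e-12, 100, 4500):.1e}")
print(f"token recovery:    {token_recovery_ms():.2f} ms")
print(f"stuck-beacon case: {stuck_beacon_interruption_s():.0f} s")
```

With the defaults, the token recovery figure comes to roughly 9 ms and the stuck-beaconing interruption to 15 seconds, in line with the numbers in the reply.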
421.2	Thanks	TOOIS1::MIRGHANE		`Tue Dec 17 1991 14:20`	4
	Paul, thank you very much for this information. That's exactly what we needed. Soumetty.