
Conference 7.286::fddi

Title:	FDDI - The Next Generation

Moderator:	NETCAD::STEFANI

Created:	Thu Apr 27 1989
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2259
Total number of notes:	8590

1833.0. "fta0: Halt Reason (4): Network Hardware Fault revisited" by 51488::PVILJANEN (Pekka Viljanen/ CSC Finland) Mon Oct 16 1995 11:53

    I need to ask this again, because the customer has reached the "end of patience".
    
    Q1: 
    What does this error message mean ?
    
    "Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network 
    Hardware Fault"
    
    Q2: 
    After this server boots OSF v3.0, the FDDI characteristics are
    SAS and half duplex.
    
    After the "fta0: Halt Reason (4): Network Hardware Fault" event, they are
    SAS and FULL duplex.
    (it seems that this is taken from the FDDI hardware/firmware)
    
    Where can these settings be restored properly after this event?
    
    
    thanking in advance
    Pekka
    
    ps. this is cross posted in Digital_UNIX notes
    
    
  
    ************************************************************************** 

    This is a short part from kern.log file: 

Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142
Sep 10 20:49:03 lerppu vmunix: NFS server: stale file handle fs(739,1046212) file 29009 gen 33056
Sep 10 20:49:03 lerppu vmunix:  read, client address = 130.233.245.32, errno 22
Sep 10 21:00:53 lerppu vmunix: fta0: Illegal length, packet dropped; len = 7450
Sep 10 21:00:54 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:00:55 lerppu vmunix: rfs_dispatch: sendreply failed
Sep 10 21:00:55 lerppu last message repeated 31 times
Sep 10 21:00:55 lerppu vmunix: fta0: Halted.
Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network Hardware Fault
Sep 10 21:00:55 lerppu vmunix: fta0: Reinitializing network adapter...
Sep 10 21:00:55 lerppu vmunix: rfs_dispatch: sendreply failed
Sep 10 21:00:55 lerppu last message repeated 4 times
Sep 10 21:01:01 lerppu vmunix: fta0: DMA Available.
Sep 10 21:01:01 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:01:02 lerppu vmunix: fta0: Link Available.
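
    The fta0 messages above follow a fixed syslog layout, so the dropped-packet
    lengths and halt reasons can be pulled out of a full kern.log mechanically.
    This is a minimal Python sketch, assuming every line looks exactly like the
    excerpt; the function names (fta_events, oversized_lengths, halt_reasons)
    are hypothetical. FDDI_MAX_FRAME uses the 4500-byte maximum FDDI frame
    size, on the assumption that the driver's "len" counts comparable bytes.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Maximum FDDI frame size in bytes. Assumption: the driver's reported
# "len" value is directly comparable to this limit.
FDDI_MAX_FRAME = 4500

# "Mon DD HH:MM:SS host vmunix: message", as in the kern.log excerpt.
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) vmunix: (.*)$")
LEN_RE = re.compile(r"Illegal length, packet dropped; len = (\d+)")
HALT_RE = re.compile(r"Halt Reason \((\d+)\): (.*)")


@dataclass
class Event:
    timestamp: str
    message: str  # text after the "fta0: " unit prefix


def fta_events(lines: Iterable[str], unit: str = "fta0") -> Iterator[Event]:
    """Yield the messages logged by one FDDI unit, skipping everything else."""
    prefix = unit + ": "
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group(3).startswith(prefix):
            yield Event(m.group(1), m.group(3)[len(prefix):])


def oversized_lengths(events: Iterable[Event]) -> List[Tuple[str, int, int]]:
    """Return (timestamp, reported length, bytes over FDDI_MAX_FRAME)."""
    result = []
    for ev in events:
        m = LEN_RE.search(ev.message)
        if m:
            n = int(m.group(1))
            result.append((ev.timestamp, n, n - FDDI_MAX_FRAME))
    return result


def halt_reasons(events: Iterable[Event]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, halt reason code, reason text) for each halt."""
    result = []
    for ev in events:
        m = HALT_RE.search(ev.message)
        if m:
            result.append((ev.timestamp, int(m.group(1)), m.group(2)))
    return result


if __name__ == "__main__":
    sample = [
        "Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142",
        "Sep 10 21:00:53 lerppu vmunix: fta0: Illegal length, packet dropped; len = 7450",
        "Sep 10 21:00:55 lerppu vmunix: rfs_dispatch: sendreply failed",
        "Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network Hardware Fault",
    ]
    events = list(fta_events(sample))
    print(oversized_lengths(events))
    print(halt_reasons(events))
```

    On the excerpt, both dropped packets (5142 and 7450 bytes) exceed the
    4500-byte FDDI limit, by 642 and 2950 bytes respectively.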

    
    Note on the "rfs_dispatch: sendreply failed" event, from conference
    aosg::alpha_osf_ift:
    ---------------------------------------------------------------------------
    Note 9144.1              rfs_dispatch, send retry failed              
    QUABBI::"[email protected]"                          31 lines  
    2-MAR-1995 20:32
                        -< Re: rfs_dispatch, send retry failed >-
    --------------------------------------------------------------------------------
    
    NFS is complaining that it couldn't reply to a request.  It may be a
    routing problem, as in a client that managed to deliver a request but
    the server's routing table doesn't have a path back.
    
    
    rfs_* are NFS server, primarily V2, rfs3_* are NFS V3 server.
    nfs_* are NFS client, primarily V2, nfs3_* are NFS V3 client.
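
    The naming convention described above can be captured as a small lookup
    table, which helps when triaging kernel messages such as
    "rfs_dispatch: sendreply failed". A minimal Python sketch; the helper
    names nfs_role and routine_from_message are hypothetical, and the table
    encodes only the four prefixes listed in the note.

```python
from typing import List, Tuple

# Prefix conventions as described in the note (Digital UNIX NFS routines).
NFS_PREFIXES: List[Tuple[str, str]] = [
    ("rfs3_", "NFS V3 server"),
    ("rfs_", "NFS server (primarily V2)"),
    ("nfs3_", "NFS V3 client"),
    ("nfs_", "NFS client (primarily V2)"),
]


def nfs_role(routine: str) -> str:
    """Classify a kernel routine name by its NFS prefix; 'unknown' otherwise."""
    for prefix, role in NFS_PREFIXES:
        if routine.startswith(prefix):
            return role
    return "unknown"


def routine_from_message(message: str) -> str:
    """Take the routine name from a 'routine: text' kernel message."""
    return message.split(":", 1)[0].strip()


print(nfs_role(routine_from_message("rfs_dispatch: sendreply failed")))
# prints "NFS server (primarily V2)"
```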
    
    -- 
    Eric (Ric) Werme         |  [email protected]
    Digital Equipment Corp.  |  This space intentionally left blank.
    [posted by Notes-News gateway]
    
    ***************************************************************************
        
    Environment:
    
    There are boxes on the same FDDI ring:
    
    2 * OSF v3.0 rev 347 NFS v3/v2 servers DEC3000-600 fw 6.0 DEFTA fw v2.4a
    4 * OSF v3.2a NFS clients DEC3000-400 fw 6.0 DEFTA fw v2.4a
    1 * IBM RS6000/580 AIX v3.2.5 NFS client
    1 * CISCO AGS+
    1 * PC router (Schneider & Koch)

T.R	Title	User	Personal Name	Date	Lines
1833.1	so, any comments?	51900::jpt	FIS and Chips	`Thu Oct 19 1995 05:03`	9
	It would be really great if someone could give some hints on this, as after the V3.2c upgrade the FDDI controllers reset themselves to a totally useless state after a few minutes and the system hangs. This is starting to get annoying... -jari
1833.2		NPSS::WADE	Network Systems Support	`Thu Oct 19 1995 12:08`	11
	Did this start after the V3.2c upgrade, and was it running fine up to then? What changed between the time when everything was working fine and now? And have you tried using an FDDI LAN analyzer to determine whether there really are oversized packets on the wire, and if so, who is sending them? Bill
1833.3	let's see ... it may be faulty concentrator, but ...	51900::jpt	FIS and Chips	`Fri Oct 20 1995 04:50`	36
	Bill, Thanks for your comments.
	> What changed between the time when everything was working fine and
	> now?
	Well, it worked sluggishly even with V3.0, and now that they have installed V3.2c it has stopped working completely (in practice). By "in practice" I mean that after the controller reset, the controller seems to ignore all parameters set by the user and resets to some mystical state that hangs the system totally. With V3.0 the problem occurred about once a week and recovery was easier (a total hardware reset was not needed), but with V3.2c it takes a hard reset to clear things every time, and it takes only about 30 minutes to hang the system.
	My guess is that there really may be those illegal length packets, but the real problem might be that the interaction between the FDDI controller firmware and the Digital UNIX FDDI driver breaks down when this error occurs. We will try replacing the concentrator under suspicion, but it would be nice to find out why the system hangs and can't handle these errors, and of course getting this fixed would protect us against further problems.
	> And, have you tried using an FDDI lan analyzer to determine if there
	> really are oversized packets on the wire and is so who is sending them?
	Where could we get an FDDI LAN analyzer? We don't have one, and it costs like h*ll to buy one. We have tried to track one down to borrow, but the only place we've found so far is probably HP's local office ;-)
	Thanks, -jari
1833.4	PCI FDDI too	51900::jpt	FIS and Chips	`Mon Oct 23 1995 06:02`	9
	Now exactly the same problem has also been demonstrated with PCI FDDI in an AlphaServer 2100 4/275 (2 CPUs). Any hints? I couldn't find the related modules in the V3.0 source kit... I found only the error reporting, but no routine calling that module with PI_HALT_ID_K_HW_FAULT or any similar error... -jari
1833.5	they have traditionally used pfilt utilities ...	51900::jpt	FIS and Chips	`Tue Oct 24 1995 14:46`	16
	These aren't necessarily related problems, but this is a very recent patch; could these two have anything in common?
	--------- V3.2C patches ----------
	PROBLEM: (Patch ID: OSF350-041) (CLD 7AZB92029)
	********
	When writing packets using the packetfilter on FDDI, there are 14 bytes of corruption in the link layer header of the packet, so the packet appears corrupted on the FDDI ring. This fix is to the FDDI/packet filter code for an erroneous write-side bcopy, which has been corrected. This problem does not occur using Ethernet.
	It is included here so that patch #1 does not get overwritten. There is no impact (side effects, etc.) if this patch is installed when FDDI is not being used.
1833.6	I've received comments	51900::jpt	FIS and Chips	`Wed Nov 01 1995 04:08`	6
	I was (unofficially) told yesterday that this error should never occur. Our engineering should probably be aware of this now... -jari
1833.7	some facts, a suggestion	NETCAD::ROLKE	$ set terminal/script	`Wed Nov 01 1995 15:25`	49
	The common "FDDI corner" has a 68000 processor and a standard set of fancy hardware to implement FDDI for your host. The host driver talks to the 68000 processor to get the link set up. Eventually the data flows between FDDI and the host with no intervention by the 68000. This is what you want - the fancy hardware is doing all the work.
	Occasionally, however, something evil happens either to the hardware or to the 68000, causing the 68000 to "crash". When this happens the 68000 does this:
	1. disables interrupts
	2. writes an error log entry into flash
	3. updates Port Status with the halt reason
	4. changes state to "halted"
	5. interrupts the host with a "state change" reason
	Notice that the 68000 is still in control even though it is declaring the subsystem to be failed. This code is very robust and few things cause it to halt so badly that it can't report state to the host driver. A "double bus fault" will cause it to hang but those are "rare".
	The base note is describing a driver that is reporting "HW FAULT" as the halt reason noted in Step 3 above. This is good information but it is not enough to diagnose the problem from the 68000's side. What we really need to see is the result of Step 2: the error log entry. This gives a register dump at the time of the fault.
	FDDI host drivers can get the error log entry from the adapter via either port register or DMA commands. The 68000 keeps the last several error reports in flash and having a record of all of them (and not just the most recent) would be most valuable. If driver fta collects the error log it is not posted in .0.
	You mention a CISCO AGS+ in the configuration. Is this the source of these packets?
	  Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142
	I have never seen packets this big on my network! I can guarantee that MY adapters are not subjected to packets this big. This makes the elusive crash data in the error log even more intriguing.
	I know that driver fta0 won't be modified overnight to get me the crash dump data. I will suggest through other channels that we swap the customer's adapter and get the adapter with the failures back to engineering. The crash dump can then be extracted and we can make progress on analyzing this problem.
	Regards, Chuck
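	The five-step halt sequence in .7, and its point that the host driver should fetch the step-2 error log rather than only the step-3 halt reason, can be modelled as a toy state machine. This is a hypothetical Python sketch, not the adapter firmware or the fta driver: the class and field names are invented for illustration, the register contents are placeholders, and the port register/DMA commands a real driver would use to read the log are not modelled. Only halt reason code 4 from the kern.log excerpt is included.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

# Only the code that appears in the kern.log excerpt; other codes are omitted.
HALT_REASONS = {4: "Network Hardware Fault"}


class AdapterState(Enum):
    OPERATIONAL = "operational"
    HALTED = "halted"


@dataclass
class AdapterModel:
    """Hypothetical model of the adapter processor's halt path described in .7."""
    state: AdapterState = AdapterState.OPERATIONAL
    interrupts_enabled: bool = True
    flash_error_log: List[Dict[str, int]] = field(default_factory=list)
    port_status_halt_reason: Optional[int] = None
    host_interrupt: Optional[str] = None

    def crash(self, reason: int, registers: Dict[str, int]) -> None:
        self.interrupts_enabled = False               # 1. disable interrupts
        self.flash_error_log.append(dict(registers))  # 2. register dump into flash
        self.port_status_halt_reason = reason         # 3. Port Status gets halt reason
        self.state = AdapterState.HALTED              # 4. state changes to "halted"
        self.host_interrupt = "state change"          # 5. interrupt the host


def host_handle_interrupt(
    adapter: AdapterModel,
) -> Optional[Tuple[str, List[Dict[str, int]]]]:
    """Host-driver side: report the halt reason AND collect the error log.

    Reporting only the step-3 reason is what the base note shows; .7 argues
    the step-2 register dumps are what engineering actually needs.
    """
    if adapter.host_interrupt != "state change":
        return None
    if adapter.state is not AdapterState.HALTED:
        return None
    code = adapter.port_status_halt_reason
    text = HALT_REASONS.get(code, "unknown")
    return f"Halt Reason ({code}): {text}", list(adapter.flash_error_log)


adapter = AdapterModel()
adapter.crash(4, {"pc": 0x1234})  # placeholder register dump
print(host_handle_interrupt(adapter))
```

	The design choice worth noting is that the handler returns the log alongside the message: a driver that only logs the message reproduces exactly the situation in .0, where the halt reason is visible but the crash data stays on the adapter.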
1833.8	I'll continue this in mail...	51900::jpt	FIS and Chips	`Thu Nov 02 1995 07:57`	18
	> be most valuable. If driver fta collects the error log it is not posted
	> in .0.
	The reason it's not posted is that there is no entry in the error log! So we must approach it by sending the card to engineering if the problem can't be isolated otherwise.
	> I have never seen packets this big on my network! I can guarantee that MY
	> adapters are not subjected to packets this big. This makes the elusive
	> crash data in the error log even more intriguing.
	Yes, it has been proven with an FDDI analyzer too: your adapters aren't generating these packets ;-)
	Let's take this offline; I will mail you. Thanks for your great answer in .7! -jari