Conference 7.286::fddi

Title: FDDI - The Next Generation
Moderator: NETCAD::STEFANI
Created: Thu Apr 27 1989
Last Modified: Thu Jun 05 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2259
Total number of notes: 8590

1833.0. "fta0: Halt Reason (4): Network Hardware Fault revisited" by 51488::PVILJANEN (Pekka Viljanen/ CSC Finland) Mon Oct 16 1995 12:53

    I need to ask this again, because of "customer end of patience".
    
    Q1: 
    What does this error message mean ?
    
    "Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network 
    Hardware Fault"
    
    Q2: 
    After this server boots OSF v3.0, the FDDI characteristics are
    SAS and half duplex.
    
    After the "fta0: Halt Reason (4): Network Hardware Fault" event, they are
    SAS and FULL duplex.
    (It seems that this is taken from the FDDI hardware/firmware.)
    
    Where can this be set correctly after the event?
    
    
    Thanks in advance,
    Pekka
    
    PS: this is cross-posted in the Digital_UNIX notes conference.
    
    
  
    ************************************************************************** 

    This is a short excerpt from the kern.log file: 

Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142
Sep 10 20:49:03 lerppu vmunix: NFS server: stale file handle fs(739,1046212) file 29009 gen 33056
Sep 10 20:49:03 lerppu vmunix:  read, client address = 130.233.245.32, errno 22
Sep 10 21:00:53 lerppu vmunix: fta0: Illegal length, packet dropped; len = 7450
Sep 10 21:00:54 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:00:55 lerppu vmunix: rfs_dispatch: sendreply failed
Sep 10 21:00:55 lerppu last message repeated 31 times
Sep 10 21:00:55 lerppu vmunix: fta0: Halted.
Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network Hardware Fault
Sep 10 21:00:55 lerppu vmunix: fta0: Reinitializing network adapter...
Sep 10 21:00:55 lerppu vmunix: rfs_dispatch: sendreply failed
Sep 10 21:00:55 lerppu last message repeated 4 times
Sep 10 21:01:01 lerppu vmunix: fta0: DMA Available.
Sep 10 21:01:01 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:01:02 lerppu vmunix: fta0: Link Available.
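
    As a reading aid for excerpts like the one above, here is a minimal
    Python sketch that pulls the fta0 events out of the log text and
    measures how long the link was reported down. It is only an
    illustration: the function names are hypothetical, the year 1995 is
    an assumption (syslog timestamps omit it), and the month names are
    assumed to be parsed in an English/C locale.

```python
import re
from datetime import datetime

# Lines copied from the kern.log excerpt quoted above.
LOG = """\
Sep 10 21:00:53 lerppu vmunix: fta0: Illegal length, packet dropped; len = 7450
Sep 10 21:00:54 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:00:55 lerppu vmunix: fta0: Halted.
Sep 10 21:00:55 lerppu vmunix: fta0: Halt Reason (4): Network Hardware Fault
Sep 10 21:00:55 lerppu vmunix: fta0: Reinitializing network adapter...
Sep 10 21:01:01 lerppu vmunix: fta0: DMA Available.
Sep 10 21:01:01 lerppu vmunix: fta0: Link Unavailable.
Sep 10 21:01:02 lerppu vmunix: fta0: Link Available.
"""

# timestamp, host, literal "vmunix:", device name, message text
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) vmunix: (\w+): (.*)$")
HALT_RE = re.compile(r"^Halt Reason \((\d+)\): (.*)$")


def parse_events(text, year=1995, device="fta0"):
    """Return (datetime, message) pairs for one device. The year is assumed."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != device:
            continue  # skip NFS messages, "last message repeated", etc.
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((when, m.group(4)))
    return events


def halt_reasons(events):
    """Return (code, text) for every 'Halt Reason' message."""
    found = []
    for _, msg in events:
        m = HALT_RE.match(msg)
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return found


def outage_seconds(events):
    """Seconds from the first 'Link Unavailable.' to the last 'Link Available.'"""
    down = next(t for t, msg in events if msg == "Link Unavailable.")
    up = [t for t, msg in events if msg == "Link Available."][-1]
    return (up - down).total_seconds()


events = parse_events(LOG)
print(halt_reasons(events))    # [(4, 'Network Hardware Fault')]
print(outage_seconds(events))  # 8.0 for this excerpt
```

    For this excerpt the link is reported back about eight seconds after
    it first went down, which says nothing about whether the adapter came
    back with the settings it had before the halt.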

    
    A note on the "rfs_dispatch: sendreply failed" event, from the
    aosg::alpha_osf_ift conference:
    ---------------------------------------------------------------------------
    Note 9144.1              rfs_dispatch, send retry failed              
    QUABBI::"[email protected]"                          31 lines  
    2-MAR-1995 20:32
                        -< Re: rfs_dispatch, send retry failed >-
    --------------------------------------------------------------------------------
    
    NFS is complaining that it couldn't reply to a request.  It may be a
    routing problem, as in a client that managed to deliver a request but
    the server's routing table doesn't have a path back.
    
    
    rfs_* are NFS server, primarily V2, rfs3_* are NFS V3 server.
    nfs_* are NFS client, primarily V2, nfs3_* are NFS V3 client.
    
    -- 
    Eric (Ric) Werme         |  [email protected]
    Digital Equipment Corp.  |  This space intentionally left blank.
    [posted by Notes-News gateway]
    
    ***************************************************************************
        
    Environment:
    
    The following boxes are on the same FDDI ring:
    
    2 * OSF v3.0 rev 347 NFS v3/v2 servers DEC3000-600 fw 6.0 DEFTA fw v2.4a
    4 * OSF v3.2a NFS clients DEC3000-400 fw 6.0 DEFTA fw v2.4a
    1 * IBM RS6000/580 AIX v3.2.5 NFS client
    1 * CISCO AGS+
    1 * PC router (Schneider & Koch)
    
    
    
1833.1. "so, any comments?" by 51900::jpt (FIS and Chips) Thu Oct 19 1995 06:03, 9 lines
	It would be really great if someone could give some hints on this,
	because after the V3.2c upgrade the FDDI controllers reset themselves
	to a totally useless state after a few minutes and the system hangs.

	This starts to get annoying...

		-jari

1833.2. by NPSS::WADE (Network Systems Support) Thu Oct 19 1995 13:08, 11 lines
    Did this start after the V3.2c upgrade, and was it running fine up to
    then?  
    
    What changed between the time when everything was working fine and
    now?
    
    And have you tried using an FDDI LAN analyzer to determine whether there
    really are oversized packets on the wire and, if so, who is sending them?
                                                        
    Bill
    
1833.3. "let's see ... it may be faulty concentrator, but ..." by 51900::jpt (FIS and Chips) Fri Oct 20 1995 05:50, 36 lines
	Bill,

	Thanks for your comments.

>    What changed between the time when everything was working fine and
>    now?

	Well, it worked sluggishly even with V3.0, and once they installed
	V3.2c it stopped working altogether (in practice). By that I mean
	that after the controller resets, it seems to ignore all parameters
	set by the user and comes up in some mysterious state that hangs the
	system completely. With V3.0 the problem occurred about once a week
	and recovery was easier (no total hardware reset needed), but with
	V3.2c it takes a hard reset to clear things every time, and it takes
	only about 30 minutes for the system to hang.

	My guess is that the illegal-length packets may well be real, but
	that the real problem is in the interaction between the FDDI
	controller firmware and the Digital UNIX FDDI driver when this
	error occurs.

	We will try replacing the suspect concentrator, but it would be
	nice to find out why the system hangs and can't handle these errors,
	and of course getting this fixed would protect us against further
	problems.
    
>    And have you tried using an FDDI LAN analyzer to determine whether there
>    really are oversized packets on the wire and, if so, who is sending them?

	Where could we get an FDDI LAN analyzer? We don't have one, and it
	costs like h*ll to buy one. We have tried to borrow one, but the
	only place we've found so far is probably HP's local office ;-)


	Thanks,

		-jari
1833.4. "PCI FDDI too" by 51900::jpt (FIS and Chips) Mon Oct 23 1995 06:02, 9 lines
	Now exactly the same problem has also been demonstrated with a
	PCI FDDI adapter in an AlphaServer 2100 4/275 (2 CPUs).

	Any hints? I couldn't find the related modules in the V3.0 source
	kit. I found only the error reporting, but no routine that calls that
	module with PI_HALT_ID_K_HW_FAULT or any similar error...

		-jari
1833.5. "they have traditionally used pfilt utilities ..." by 51900::jpt (FIS and Chips) Tue Oct 24 1995 14:46, 16 lines
	These aren't necessarily related problems, but this is a very recent
	patch. Could the two have anything in common?

 --------- V3.2C patches ----------
PROBLEM:        (Patch ID: OSF350-041)          (CLD 7AZB92029)
********
When writing packets using the packetfilter on FDDI, there are 14 bytes
of corruption in the link layer header of the packet, so the packet
appears corrupted on the FDDI ring.  This fix is to the FDDI/packet
filter code for an erroneous write-side bcopy, which has been corrected.

This problem does not occur using Ethernet.  It is included here
so that patch #1 does not get overwritten.  There is no impact (side
effects, etc.) if this patch is installed when FDDI is not being used.

1833.6. "I've received comments" by 51900::jpt (FIS and Chips) Wed Nov 01 1995 04:08, 6 lines
	I was (unofficially) told yesterday that this error should
	never occur. Our engineering should probably be aware of this
	by now...

			-jari
1833.7. "some facts, a suggestion" by NETCAD::ROLKE ($ set terminal/script) Wed Nov 01 1995 15:25, 49 lines
The common "FDDI corner" has a 68000 processor and a standard set of
fancy hardware to implement FDDI for your host.  The host driver talks
to the 68000 processor to get the link set up.  Eventually the data 
flows between FDDI and the host with no intervention by the 68000.
This is what you want - the fancy hardware is doing all the work.

Occasionally, however, something evil happens either to the hardware
or to the 68000 causing the 68000 to "crash". When this happens the 68000
does this:

	1. disables interrupts
	2. writes an error log entry into flash
	3. updates Port Status with the halt reason
	4. changes state to "halted"
	5. interrupts the host with a "state change" reason

Notice that the 68000 is still in control even though it is declaring the
subsystem to be failed.  This code is very robust and few things cause 
it to halt so badly that it can't report state to the host driver.
A "double bus fault" will cause it to hang but those are "rare".
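
The five steps above can be sketched as a toy model. This is not adapter
firmware or driver code: the class, field and function names are all
hypothetical, the register contents are made up, and only the halt reason
number 4 comes from the log in .0. The point it illustrates is that the
host learns the halt reason from step 3 while the more detailed record
from step 2 stays on the adapter unless someone fetches it.

```python
from dataclasses import dataclass, field
from typing import Optional

HW_FAULT = 4  # halt reason number shown in the base note's log


@dataclass
class AdapterModel:
    """Toy stand-in for the adapter's 68000 side; every name is hypothetical."""
    interrupts_enabled: bool = True
    state: str = "running"
    halt_reason: Optional[int] = None                    # stands in for Port Status
    error_log: list = field(default_factory=list)        # stands in for flash
    host_interrupts: list = field(default_factory=list)  # interrupts sent to host

    def halt(self, reason: int, registers: dict) -> None:
        # 1. disable interrupts
        self.interrupts_enabled = False
        # 2. write an error log entry (register dump) into "flash"
        self.error_log.append({"reason": reason, "registers": dict(registers)})
        # 3. update Port Status with the halt reason
        self.halt_reason = reason
        # 4. change state to "halted"
        self.state = "halted"
        # 5. interrupt the host with a "state change" reason
        self.host_interrupts.append("state change")


def host_on_interrupt(adapter: AdapterModel) -> str:
    """What a host driver can report from the state and Port Status alone."""
    if adapter.state == "halted":
        return f"Halted. Halt Reason ({adapter.halt_reason})"
    return "running"


adapter = AdapterModel()
adapter.halt(HW_FAULT, {"pc": 0x1234})  # register values are invented
print(host_on_interrupt(adapter))        # the summary the host sees
print(len(adapter.error_log))            # entries still sitting on the adapter
```

In this model the error log is a list that keeps growing; the real adapter
keeps only the last several reports, as described below.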

The base note is describing a driver that is reporting "HW FAULT" as the
halt reason noted in Step 3 above.  This is good information but it is
not enough to diagnose the problem from the 68000's side.  What we
really need to see is the result of Step 2: the error log entry.  This
gives a register dump at the time of the fault.  FDDI host drivers can 
get the error log entry from the adapter via either port register or
DMA commands.  The 68000 keeps the last several error reports in flash
and having a record of all of them (and not just the most recent) would
be most valuable.  If driver fta collects the error log it is not posted
in .0.

You mention a CISCO AGS+ in the configuration.  Is this the source of these
packets?

 Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142

I have never seen packets this big on my network!  I can guarantee that MY
adapters are not subjected to packets this big.  This makes the elusive
crash data in the error log even more intriguing.
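
For scale, here is a minimal Python sketch comparing the two lengths logged
in .0 with the commonly cited FDDI maximum frame size of 4500 octets. That
constant is an assumption brought in for illustration and is not stated
anywhere in this topic, and whether the driver's "len" counts exactly the
same octets as the standard's limit is also not established here.

```python
# Assumption: commonly cited FDDI maximum frame size, in octets.
FDDI_MAX_FRAME_OCTETS = 4500

# Lengths taken from the "Illegal length, packet dropped" lines in .0.
LOGGED_LENGTHS = (5142, 7450)


def oversize_by(length, limit=FDDI_MAX_FRAME_OCTETS):
    """Return how many octets a frame exceeds the limit by (0 if it fits)."""
    return max(0, length - limit)


for length in LOGGED_LENGTHS:
    print(length, oversize_by(length))
```

Under that assumption both logged lengths are well over the limit, by 642
and 2950 octets respectively.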

I know that driver fta0 won't be modified overnight to get me the crash 
dump data.  I will suggest through other channels that we swap the customer's
adapter and get the adapter with the failures back to engineering.  The
crash dump can then be extracted and we can make progress on analyzing
this problem.

Regards,
Chuck
1833.8. "I'll continue this in mail..." by 51900::jpt (FIS and Chips) Thu Nov 02 1995 07:57, 18 lines
>be most valuable.  If driver fta collects the error log it is not posted
>in .0.

	The reason it's not posted is that there is no entry in the error
	log! So we will have to send the card to engineering if the problem
	can't be isolated otherwise.

>I have never seen packets this big on my network!  I can guarantee that MY
>adapters are not subjected to packets this big.  This makes the elusive
>crash data in the error log even more intriguing.

	Yes, it has been confirmed with an FDDI analyzer too: your adapters
	aren't generating these packets ;-)

	Let's take this offline. I will mail you. Thanks for your great
	answer in .7!

		-jari