| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1833.1 | so, any comments? | 51900::jpt | FIS and Chips | Thu Oct 19 1995 05:03 | 9 | 
|  | 
	It would be really great if someone could give some hints on this,
	as after the V3.2c upgrade the FDDI controllers reset themselves to a
	totally useless state after a few minutes and the system hangs.
	This is starting to get annoying...
		-jari
 | 
| 1833.2 |  | NPSS::WADE | Network Systems Support | Thu Oct 19 1995 12:08 | 11 | 
|  |     Did this start after the V3.2c upgrade, and was it running fine up to
    then?
    
    What changed between the time when everything was working fine and
    now?
    
    And have you tried using an FDDI LAN analyzer to determine whether there
    really are oversized packets on the wire and, if so, who is sending them?
                                                        
    Bill
    
 | 
| 1833.3 | let's see ... it may be faulty concentrator, but ... | 51900::jpt | FIS and Chips | Fri Oct 20 1995 04:50 | 36 | 
|  | 	Bill,
	Thanks for your comments.
>    What changed between the time when everything was working fine and
>    now?
	Well, it worked sluggishly even with V3.0, and now that they have
	installed V3.2c it has, in practice, stopped working entirely. In
	practice meaning that after the controller resets it seems to ignore
	all parameters set by the user and ends up in some mystical state that
	hangs the system totally. With V3.0 the problem occurred about
	once a week and recovery was easier (no total hardware reset needed),
	but with V3.2c it takes a hard reset to clear things every time, and
	the system hangs after only about 30 minutes.
	My guess is that there really may be those illegal-length packets,
	but the real problem might be in the interaction between the FDDI
	controller firmware and the Digital UNIX FDDI driver when this error
	occurs.
	We will try replacing the suspect concentrator, but it would be
	nice to find out why the system hangs and can't handle these errors, and
	of course getting this fixed would protect us against further problems.
    
>    And have you tried using an FDDI LAN analyzer to determine whether there
>    really are oversized packets on the wire and, if so, who is sending them?
	Where could we get an FDDI LAN analyzer? We don't have one, and it
	costs like h*ll to buy one. We have tried to track one down to borrow,
	but the only place we've found so far is probably HP's local office ;-)
	Thanks,
		-jari
 | 
| 1833.4 | PCI FDDI too | 51900::jpt | FIS and Chips | Mon Oct 23 1995 06:02 | 9 | 
|  | 
	Now exactly the same problem has also been demonstrated with a
	PCI FDDI adapter in an AlphaServer 2100 4/275 (2 CPUs).
	Any hints? I couldn't find the related modules in the V3.0 source kit...
	I found only the error reporting, but no routine calling that module
	with PI_HALT_ID_K_HW_FAULT or any similar error...
		-jari
 | 
| 1833.5 | they have traditionally used pfilt utilities ... | 51900::jpt | FIS and Chips | Tue Oct 24 1995 14:46 | 16 | 
|  | 
	These aren't necessarily related problems, but this is a very recent
	patch. Could the two have anything in common?
 --------- V3.2C patches ----------
PROBLEM:        (Patch ID: OSF350-041)          (CLD 7AZB92029)
********
When writing packets using the packetfilter on FDDI, there are 14 bytes
of corruption in the link layer header of the packet, so the packet
appears corrupted on the FDDI ring.  This fix is to the FDDI/packet
filter code for an erroneous write-side bcopy, which has been corrected.
This problem does not occur using Ethernet.  It is included here
so that patch #1 does not get overwritten.  There is no impact (side
effects, etc.) if this patch is installed when FDDI is not being used.
 | 
| 1833.6 | I've received comments | 51900::jpt | FIS and Chips | Wed Nov 01 1995 04:08 | 6 | 
|  | 
	I was (unofficially) told yesterday that this error should
	never occur. Our engineering should probably be aware of this
	now...
			-jari
 | 
| 1833.7 | some facts, a suggestion | NETCAD::ROLKE | $ set terminal/script | Wed Nov 01 1995 15:25 | 49 | 
|  | The common "FDDI corner" has a 68000 processor and a standard set of
fancy hardware to implement FDDI for your host.  The host driver talks
to the 68000 processor to get the link set up.  Eventually the data 
flows between FDDI and the host with no intervention by the 68000.
This is what you want - the fancy hardware is doing all the work.
Occasionally, however, something evil happens either to the hardware
or to the 68000 causing the 68000 to "crash". When this happens the 68000
does this:
	1. disables interrupts
	2. writes an error log entry into flash
	3. updates Port Status with the halt reason
	4. changes state to "halted"
	5. interrupts the host with a "state change" reason
Notice that the 68000 is still in control even though it is declaring the
subsystem to be failed.  This code is very robust and few things cause 
it to halt so badly that it can't report state to the host driver.
A "double bus fault" will cause it to hang but those are "rare".
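
The five halt steps above can be sketched as a toy model. This is only an illustration of the ordering described in this note, not the adapter firmware: the class, field, and method names are all hypothetical, and the register dump is a made-up placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    RUNNING = "running"
    HALTED = "halted"


@dataclass
class AdapterModel:
    """Toy stand-in for the adapter's on-board processor (hypothetical names)."""
    state: State = State.RUNNING
    interrupts_enabled: bool = True
    port_status_halt_reason: Optional[str] = None
    flash_error_log: list = field(default_factory=list)
    host_interrupts: list = field(default_factory=list)

    def halt(self, reason, registers):
        # 1. disable interrupts
        self.interrupts_enabled = False
        # 2. write an error log entry (register dump) into flash
        self.flash_error_log.append({"reason": reason, "registers": dict(registers)})
        # 3. update Port Status with the halt reason
        self.port_status_halt_reason = reason
        # 4. change state to "halted"
        self.state = State.HALTED
        # 5. interrupt the host with a "state change" reason
        self.host_interrupts.append("state change")


adapter = AdapterModel()
adapter.halt("HW FAULT", {"pc": 0x1234})  # placeholder register values
```

In this model the host sees the halt reason (step 3) and the interrupt (step 5), while the register dump from step 2 stays in the adapter's own log until something reads it out.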
The base note is describing a driver that is reporting "HW FAULT" as the
halt reason noted in Step 3 above.  This is good information but it is
not enough to diagnose the problem from the 68000's side.  What we
really need to see is the result of Step 2: the error log entry.  This
gives a register dump at the time of the fault.  FDDI host drivers can 
get the error log entry from the adapter via either port register or
DMA commands.  The 68000 keeps the last several error reports in flash
and having a record of all of them (and not just the most recent) would
be most valuable.  If driver fta collects the error log it is not posted
in .0.
You mention a CISCO AGS+ in the configuration.  Is this the source of these
packets?
 Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142
I have never seen packets this big on my network!  I can guarantee that MY
adapters are not subjected to packets this big.  This makes the elusive
crash data in the error log even more intriguing.
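
For anyone who wants to count these messages in their own logs, here is a minimal sketch. It assumes the message format is exactly as in the line quoted above, and it assumes 4500 bytes as the limit, which is the commonly cited FDDI maximum frame size; the function name and the regular expression are illustrative only.

```python
import re

# Assumption: 4500 bytes, the commonly cited FDDI maximum frame size.
FDDI_MAX_FRAME = 4500

# Matches the driver message format quoted above, e.g.
# "fta0: Illegal length, packet dropped; len = 5142"
ILLEGAL_LEN = re.compile(
    r"(?P<ifname>\w+): Illegal length, packet dropped; len = (?P<len>\d+)"
)


def oversized_drops(lines, limit=FDDI_MAX_FRAME):
    """Return (interface, length) for each dropped packet longer than limit."""
    hits = []
    for line in lines:
        m = ILLEGAL_LEN.search(line)
        if m and int(m.group("len")) > limit:
            hits.append((m.group("ifname"), int(m.group("len"))))
    return hits


sample = [
    "Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142",
]
print(oversized_drops(sample))
```

Grouping the results by interface and timestamp might help correlate the drops with the moment the adapter halts.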
I know that driver fta0 won't be modified overnight to get me the crash 
dump data.  I will suggest through other channels that we swap the customer's
adapter and get the adapter with the failures back to engineering.  The
crash dump can then be extracted and we can make progress on analyzing
this problem.
Regards,
Chuck
 | 
| 1833.8 | I'll continue this in mail... | 51900::jpt | FIS and Chips | Thu Nov 02 1995 07:57 | 18 | 
|  | >be most valuable.  If driver fta collects the error log it is not posted
>in .0.
	The reason it's not posted is that there is no entry in the error log!
	So we must approach it by sending the card to engineering if the problem
	can't be isolated otherwise.
>I have never seen packets this big on my network!  I can guarantee that MY
>adapters are not subjected to packets this big.  This makes the elusive
>crash data in the error log even more intriguing.
	Yes, it has been proven with an FDDI analyzer too: your adapters
	aren't generating these packets ;-)
	Let's take this offline. I will mail you. Thanks for your great
	answer in .7!
		-jari
 |