T.R | Title | User | Personal Name | Date | Lines |
---|
1833.1 | so, any comments? | 51900::jpt | FIS and Chips | Thu Oct 19 1995 06:03 | 9 |
|
It would be really great if someone could give some hints on this,
as after the V3.2c upgrade the FDDI controllers reset themselves to a
totally useless state after a few minutes and the system hangs.
This is starting to get annoying...
-jari
|
1833.2 | | NPSS::WADE | Network Systems Support | Thu Oct 19 1995 13:08 | 11 |
| Did this start after the V3.2c upgrade and it was running fine up to
then?
What changed between the time when everything was working fine and
now?
And, have you tried using an FDDI LAN analyzer to determine if there
really are oversized packets on the wire and, if so, who is sending them?
Bill
|
1833.3 | let's see ... it may be faulty concentrator, but ... | 51900::jpt | FIS and Chips | Fri Oct 20 1995 05:50 | 36 |
| Bill,
Thanks for your comments.
> What changed between the time when everything was working fine and
> now?
Well, it worked sluggishly even with V3.0, and now that they have installed
V3.2c it has stopped working totally (in practice). "In practice" means
that after the controller resets, it seems to ignore
all parameters set by the user and goes to some mystical state that
hangs the system totally. With V3.0 the problem occurred about
once a week and recovery was easier (a total hardware reset was not needed),
but with V3.2c it takes a hard reset to clear things every time, and
it takes only about 30 minutes for the system to hang.
My guess is that there really may be those illegal-length packets,
but the real problem might be that the interaction between the FDDI
controller firmware and the Digital UNIX FDDI driver has problems in
this error case.
We will try replacing the suspect concentrator, but it would be
nice to find out why the system hangs and can't handle these errors, and
of course getting this fixed would protect us against further problems.
> And, have you tried using an FDDI lan analyzer to determine if there
> really are oversized packets on the wire and if so who is sending them?
Where could we get an FDDI LAN analyzer? We don't have one, and it
costs like h*ll to buy one. We have tried to borrow one,
but the only place we've found so far is probably HP's local office ;-)
Thanks,
-jari
|
1833.4 | PCI FDDI too | 51900::jpt | FIS and Chips | Mon Oct 23 1995 06:02 | 9 |
|
Now exactly the same problem has also been demonstrated with a
PCI FDDI in an AlphaServer 2100 4/275 (2 CPUs).
Any hints? I couldn't find the related modules in the V3.0 source kit...
I found only the error reporting, but no routine calling that module
with PI_HALT_ID_K_HW_FAULT or any similar error...
-jari
|
1833.5 | they have traditionally used pfilt utilities ... | 51900::jpt | FIS and Chips | Tue Oct 24 1995 14:46 | 16 |
|
These aren't necessarily related problems, but this is a very recent
patch; could these two have anything in common?
--------- V3.2C patches ----------
PROBLEM: (Patch ID: OSF350-041) (CLD 7AZB92029)
********
When writing packets using the packetfilter on FDDI, there are 14 bytes
of corruption in the link layer header of the packet, so the packet
appears corrupted on the FDDI ring. This fix is to the FDDI/packet
filter code for an erroneous write-side bcopy, which has been corrected.
This problem does not occur using Ethernet. It is included here
so that patch #1 does not get overwritten. There is no impact (side
effects, etc.) if this patch is installed when FDDI is not being used.
|
1833.6 | I've received comments | 51900::jpt | FIS and Chips | Wed Nov 01 1995 04:08 | 6 |
|
I was (unofficially) told yesterday that this error should
never occur. Our engineering should probably be aware of this
now...
-jari
|
1833.7 | some facts, a suggestion | NETCAD::ROLKE | $ set terminal/script | Wed Nov 01 1995 15:25 | 49 |
| The common "FDDI corner" has a 68000 processor and a standard set of
fancy hardware to implement FDDI for your host. The host driver talks
to the 68000 processor to get the link set up. Eventually the data
flows between FDDI and the host with no intervention by the 68000.
This is what you want - the fancy hardware is doing all the work.
Occasionally, however, something evil happens either to the hardware
or to the 68000 causing the 68000 to "crash". When this happens the 68000
does this:
1. disables interrupts
2. writes an error log entry into flash
3. updates Port Status with the halt reason
4. changes state to "halted"
5. interrupts the host with a "state change" reason
Notice that the 68000 is still in control even though it is declaring the
subsystem to be failed. This code is very robust and few things cause
it to halt so badly that it can't report state to the host driver.
A "double bus fault" will cause it to hang but those are "rare".
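The five halt steps above can be sketched as a toy model. This is only an illustration of the ordering described in this note, not the adapter firmware: the class, field, and method names are hypothetical, and the register dump is a made-up dictionary.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class PortState(Enum):
    RUNNING = "running"
    HALTED = "halted"


@dataclass
class AdapterModel:
    """Toy model of the on-board processor; every name here is hypothetical."""
    state: PortState = PortState.RUNNING
    interrupts_enabled: bool = True
    port_status_halt_reason: Optional[str] = None
    flash_error_log: List[Dict] = field(default_factory=list)
    host_interrupts: List[str] = field(default_factory=list)

    def halt(self, reason: str, registers: Dict[str, int]) -> None:
        # 1. disable interrupts
        self.interrupts_enabled = False
        # 2. write an error log entry (register dump) into flash
        self.flash_error_log.append(
            {"reason": reason, "registers": dict(registers)}
        )
        # 3. update Port Status with the halt reason
        self.port_status_halt_reason = reason
        # 4. change state to "halted"
        self.state = PortState.HALTED
        # 5. interrupt the host with a "state change" reason
        self.host_interrupts.append("state change")


adapter = AdapterModel()
adapter.halt("HW FAULT", {"pc": 0x1234})  # register values are invented
```

In this model the host only sees the halt reason from step 3 and the interrupt from step 5; the register dump from step 2 stays in the adapter's log until something reads it out.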
The base note is describing a driver that is reporting "HW FAULT" as the
halt reason noted in Step 3 above. This is good information but it is
not enough to diagnose the problem from the 68000's side. What we
really need to see is the result of Step 2: the error log entry. This
gives a register dump at the time of the fault. FDDI host drivers can
get the error log entry from the adapter via either port register or
DMA commands. The 68000 keeps the last several error reports in flash
and having a record of all of them (and not just the most recent) would
be most valuable. If driver fta collects the error log it is not posted
in .0.
You mention a CISCO AGS+ in the configuration. Is this the source of these
packets?
Sep 10 20:33:42 lerppu vmunix: fta0: Illegal length, packet dropped; len = 5142
I have never seen packets this big on my network! I can guarantee that MY
adapters are not subjected to packets this big. This makes the elusive
crash data in the error log even more intriguing.
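For anyone collecting these messages from syslog, here is a minimal sketch that pulls the length out of a line like the one quoted above and compares it with the FDDI maximum frame size of 4500 bytes. The function name and the regular expression are assumptions made for illustration, and the exact limit the fta driver checks against is not stated in this note.

```python
import re

# FDDI maximum frame size in bytes; the driver's own threshold is an assumption.
FDDI_MAX_FRAME_BYTES = 4500

# Matches the driver message format quoted in this note.
LOG_PATTERN = re.compile(
    r"(?P<ifname>\w+): Illegal length, packet dropped; len = (?P<length>\d+)"
)


def parse_illegal_length(line):
    """Return (interface, length, bytes over the FDDI maximum), or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    length = int(match.group("length"))
    return match.group("ifname"), length, length - FDDI_MAX_FRAME_BYTES


line = ("Sep 10 20:33:42 lerppu vmunix: fta0: "
        "Illegal length, packet dropped; len = 5142")
print(parse_illegal_length(line))  # ('fta0', 5142, 642)
```

The reported length is 642 bytes over the FDDI maximum, so it cannot be a legal frame from any conforming station.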
I know that driver fta0 won't be modified overnight to get me the crash
dump data. I will suggest through other channels that we swap the customer's
adapter and get the adapter with the failures back to engineering. The
crash dump can then be extracted and we can make progress on analyzing
this problem.
Regards,
Chuck
|
1833.8 | I'll continue this in mail... | 51900::jpt | FIS and Chips | Thu Nov 02 1995 07:57 | 18 |
| >be most valuable. If driver fta collects the error log it is not posted
>in .0.
The reason it's not posted is that there is no entry in the error log!
So we must approach it by sending the card to engineering if the problem
can't be isolated otherwise.
>I have never seen packets this big on my network! I can guarantee that MY
>adapters are not subjected to packets this big. This makes the elusive
>crash data in the error log even more intriguing.
Yes, it has been proven with an FDDI analyzer too: your adapters
aren't generating these packets ;-)
Let's take this offline. I will mail you; thanks for your great
answer in .7!
-jari
|