T.R | Title | User | Personal Name | Date | Lines |
---|
2065.1 | | 19584::STOCKDALE | | Thu Jun 13 1996 08:45 | 14 |
| I can't answer your question about why the transmit timeout occurred,
but normally it's because the link became unavailable, so rather than
hold on to the outstanding transmit forever, FWDRIVER resets the DEFPA
and returns the transmit with error status.
Perhaps a SHOW LAN/FULL would provide more information.
As to the driver version question: the V6.2 driver enabled parity
checking when it shouldn't have, which caused occasional parity error
crashes. The new driver disables parity checking, and this change is
included in the V6.2-1H* versions. That sounds like a very different
problem from yours, which sounds like a network problem.
Dick
|
2065.2 | GIGASwitch crashing? | CSC32::J_SOBECKI | John Sobecki, DTN 592-4101, CXO3-2/D2 | Thu Jun 13 1996 13:10 | 20 |
| Hello,
Usually transmit timeouts are caused by the loss of the physical
connection, i.e. is the GIGAswitch crashing? I've never heard of a
DEFPA causing an FGL-4 card to go down.
Were the previous crashes the UCB R5 cleared crash? This crash seems
to not be checked in the recent LAN driver images.
The V6.2 driver should work fine under V6.2-1H2. I'd check the errorlog
on the GIGAswitch to see what's causing the transmit timeouts. If you
have more than one SCP, and the SCPs are crashing, the errorlog is
contained on the SCP itself. So if the elected SCP is the secondary
SCP, you'll need to fail back to the primary SCP to check the errorlog.
Maybe this is a new 2100A related problem. I'd IPMT the driver issue
if the crashes have returned.
Good Day,
John
|
2065.3 | Get error log from FGL4 if necessary | NPSS::RLEBLANC | | Thu Jun 13 1996 16:42 | 7 |
|
If the SCP reports the FGL-4 in question is crashing, please
also get the error log from the FGL4.
|
2065.4 | | FRSIT::MAYER | | Mon Jun 17 1996 08:20 | 10 |
| Hi,
Next we will check the GIGAswitch errorlog to see whether there are any
problems with the GIGAswitch SCP or line card.
Also a sho lan/full is available on FRSIT::GSI_SDA_LAN.TXT
Regards
Juergen Mayer
|
2065.5 | | 19584::STOCKDALE | | Tue Jun 18 1996 12:17 | 5 |
| >>Also a sho lan/full is available on FRSIT::GSI_SDA_LAN.TXT
It doesn't appear to be there.
- Dick
|
2065.6 | SDA output now available | FRSIT::MAYER | | Thu Jun 20 1996 09:15 | 5 |
| Sorry,
the sho lan/full is now available on FRSIT::GSI_SDA_LAN.TXT
Regards Juergen
|
2065.7 | | 19584::STOCKDALE | | Thu Jun 20 1996 16:57 | 63 |
| If I extract the significant information from the counters it shows that
the ring went away and came back a few times, resulting in failed transmits
(either a timeout after the ring went away or transmits while the ring was
not available). The last error CSR shows the port status register contents
at the time of the transmit timeout, showing 'link available' and nothing
else - this indicates that the FDDI appeared to be ok when the driver
declared a transmit timeout and shut down the adapter. Note that the
transmit timeout is 5-6 seconds, so the device owned the transmit for
that long before the timeout occurred.
Transmit underrun                 0  Dup tokens detected                     7
Ring inits received               5  LEM rejects                             0
DAT test failures                 0  Connections completed                  10
No work transmits          59193334  Ring avail transitions                 10
Buffer_Addr transmits             0  Ring unavail transitions                7
+00 Device interrupts     296991649  +2C Too many segments                   0
+08 Transmits failed           2779  +34 RESETs issued                       3
+0C Receive errors                0  +38 Fatal errs (soft tmo)               2
+10 Transmit timeouts             2  +3C EEPROM update tmo                   0
Fatal error count                 2  Last error CSR                   00000400
Fatal error code       3-XmtTimeout  Last fatal error          11-JUN 11:09:38
Prev error code        3-XmtTimeout  Prev fatal error           7-JUN 16:50:01
Transmit timeouts                 2  Last USB time                        None
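As a rough illustration of how the counters above can be read, here is a
minimal Python sketch. The values are copied from the counters; the dictionary
keys and function names are made up for the example, and the reading of the
single set CSR bit as 'link available' comes only from the description above,
not from the DEFPA documentation.

```python
# Values copied from the counters above.
# The dictionary keys and helper names are hypothetical.
counters = {
    "ring_inits_received": 5,
    "connections_completed": 10,
    "ring_avail_transitions": 10,
    "ring_unavail_transitions": 7,
    "transmit_timeouts": 2,
    "last_error_csr": 0x00000400,
}


def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def summarize(c):
    """Collect the two observations used in the analysis above."""
    return {
        # The ring went away at least once and came back at least once.
        "ring_flapped": c["ring_unavail_transitions"] > 0
        and c["ring_avail_transitions"] > 0,
        # Bits set in the port status register at the last fatal error.
        "csr_bits": set_bits(c["last_error_csr"]),
    }


summary = summarize(counters)
print(summary["ring_flapped"])  # True
print(summary["csr_bits"])      # [10]
```

Only one bit is set in 00000400 (bit 10), which matches the statement that the
register showed 'link available' and nothing else.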
The driver version is the V6.2-1H2 version. There is a later version in
V6.2-1H3, but it only has a bug fix for a DEFAA workaround, so although the
version is different, the code is identical (the DEFAA bug fix is in
DEFAA conditional code).
But the driver consists of a port driver plus the LAN common routines. The
LAN common routines have a couple of fixes in V6.2-1H3. One applies when
more than 11 multicast addresses are enabled (this system has exactly 11).
The other affects shared user applications, causing the first packet
received by a shared user to be lost. In this case there are no real shared
users: two users are started in shared mode, but there is only one for each
protocol type. So neither of these fixes is significant in your case.
So my guess is that there was a failure of the ring, most likely caused by
something on the ring and not by the DEFPA in the system. Perhaps a
longer timeout would have allowed the FDDI ring to recover from whatever
was going on. Given that the driver would have restarted the users
automatically immediately after the error, the cluexits shouldn't have
happened, but apparently the FDDI ring did not come back before the
reconnect interval expired, so the satellites cluexited. Increasing
the reconnect interval may give the ring enough time to recover before
the nodes give up.
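The timing argument above can be sketched as follows. This is only a
simplified model: it treats both timers as starting when the ring goes away,
the function name and the sample numbers in the usage lines are made up, and
the reconnect interval (presumably the RECNXINTERVAL system parameter) is
passed in rather than assumed, since the value in use is not stated here.

```python
def cluster_outcome(outage_s, xmt_timeout_s, reconnect_interval_s):
    """Classify a ring outage by the two timers discussed above.

    outage_s             -- how long the FDDI ring was unusable (hypothetical)
    xmt_timeout_s        -- driver transmit timeout (5-6 seconds per this note)
    reconnect_interval_s -- cluster reconnect interval configured on the nodes
    """
    if outage_s <= xmt_timeout_s:
        return "no error: ring is back before the driver times out"
    if outage_s <= reconnect_interval_s:
        return "transmit timeout and adapter reset, but nodes reconnect"
    return "transmit timeout and cluexit: ring came back too late"


# Illustrative numbers only; none of these durations come from the system.
print(cluster_outcome(3, 6, 20))
print(cluster_outcome(15, 6, 20))
print(cluster_outcome(30, 6, 20))
print(cluster_outcome(30, 6, 40))  # a longer reconnect interval rides it out
```

Under this model, raising the reconnect interval changes the last case from a
cluexit to a reconnect, but it does nothing about the transmit timeout itself.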
>> The question is, why do we get the first error on the FWA0 interface.
Because the FDDI ring became unavailable for more than 5-6 seconds.
>> Are those changes from the special driver implemented in the V6.2-1H2 one?
Yes.
>> May he use the special one from V6.2 with V6.2-1H2?
Yes, as long as he doesn't want a couple of additional bug fixes.
- Dick
|
2065.8 | | FRSIT::MAYER | | Fri Jun 21 1996 07:32 | 13 |
| Hi Dick,
I also saw the Ring Inits and Connections Completed. So I asked the customer
whether he had been plugging and unplugging the systems from the GIGAswitch.
He confirmed that he had been moving them from one GIGAswitch port to another,
but didn't remember how often.
So at the moment we don't know how many inits are "homemade" and how many are
real failures.
Since we have the counters from now as a baseline, we have to wait until the
next failure occurs.
We will also focus on the GIGAswitch counters and errorlogs.
regards Juergen
|