T.R | Title | User | Personal Name | Date | Lines |
---|
1114.1 | Need more info | LEVERS::PAGLIARO | Rich Pagliaro, Hub Products Group | Tue Jun 14 1994 12:38 | 8 |
| Before I can answer this question I need some more information. Can you
tell me what version of firmware the DECrepeater 900TMs are running?
It would also help a great deal if you dumped the error log (via the
setup menu) of the repeaters which reset and sent me the results.
Thanks,
Rich
|
1114.2 | Here is some info | UTRTSC::DENIJS | | Wed Jun 15 1994 04:41 | 20 |
| I have the versions for you so here goes:
Hub manager: HW rev. F
ROM V1.1.6
Firmware V2.2.1
Repeaters 900 TM:
slot  HW  RO  SW
1     V2  V1  V1.0G
2     V2  V1  V1.0G
3     V1  V1  V1.0G
4     V2  V1  V1.0G
5     V2  V1  V1.0G
6     V1  V1  V1.0F
I have no errorlog info but will try to get it ASAP.
Thanks for your help,
Peter.
|
1114.3 | errorlog info | UTRTSC::DENIJS | | Thu Jun 16 1994 10:35 | 56 |
| Below is the errorlog from the repeater, as requested. You see the entries
of one repeater but all repeaters have the same entries, only the entry
numbers differ.
I have one more question: since the repeaters are SNMP manageable, they must
be able to look at MAC addresses and take frames from the Ethernet. Will they
only look for their own addresses? In other words, how will the repeaters
react to broadcasts?
==============================================================================
Enter selection : 9
DECrepeater 900TM
=============================================================================
ERROR LOG
Entry = 45
Time Stamp = 0 0
Reset Count = 0
Fatal error: Line 611, File enet.c
Dump another entry y/[n]? y
Entry = 44
Time Stamp = 0 0
Reset Count = 0
Fatal error: Line 611, File enet.c
Dump another entry y/[n]? y
Entry = 43
Time Stamp = 0 0
Reset Count = 0
Fatal error: Line 611, File enet.c
Dump another entry y/[n]? y
Entry = 42
Time Stamp = 0 0
Reset Count = 0
Fatal error: Line 611, File enet.c
Dump another entry y/[n]? y
Regards,
Peter.
|
1114.4 | | LEVERS::PAGLIARO | Rich Pagliaro, Hub Products Group | Mon Jun 20 1994 15:08 | 27 |
| Peter,
Thanks for the extra info. The behaviour you are experiencing appears
to be somewhat similar to something we've seen here in our lab. In our
testing environment the behavior occurs very infrequently and we have
had difficulty determining the root cause of the problem.
You mentioned that your problem occurs less frequently when there is
less traffic. Do you know approximately what your traffic level was
when you started to experience the problem? Do you know what percentage
of that traffic was multicast/broadcast?
To answer some of your other questions:
There is no way to stop the repeater from transmitting bootp requests
other than to assign the repeater an IP address.
The repeaters "look for" frames with their own unicast destination
addresses as well as frames with multicast and broadcast destination
addresses. That is, a repeater will receive frames with
multicast/broadcast destination addresses and process them. The actual
processing, of course, depends upon the received message.
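As a rough illustration of the address filtering described above, the sketch below classifies an Ethernet destination address the way a receiving station would: all ones is broadcast, a set low-order bit in the first octet marks multicast, and anything else is unicast. The function names and the sample addresses are assumptions made for illustration; they are not taken from the repeater firmware.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) text into 6 raw octets."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    return octets


def classify_destination(dest_mac: str, own_mac: str) -> str:
    """Classify a destination address from the receiver's point of view."""
    dest = parse_mac(dest_mac)
    if dest == b"\xff" * 6:
        return "broadcast"
    if dest[0] & 0x01:          # group bit: low-order bit of the first octet
        return "multicast"
    if dest == parse_mac(own_mac):
        return "unicast-own"
    return "unicast-other"


def passed_to_management_agent(dest_mac: str, own_mac: str) -> bool:
    """Hypothetical filter: everything except other stations' unicast is received."""
    return classify_destination(dest_mac, own_mac) != "unicast-other"


if __name__ == "__main__":
    own = "08:00:2b:11:22:33"   # made-up address for the example
    for dest in ("ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01", own, "08:00:2b:aa:bb:cc"):
        print(dest, classify_destination(dest, own))
```

Under this model a repeater without an IP address still receives every broadcast and multicast frame on the segment, which is why broadcast traffic from unrelated stations can reach its management stack.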
Regards,
Rich
|
1114.5 | Some more info.... | UTRTSC::DENIJS | | Tue Jun 21 1994 08:18 | 79 |
| Rich,
Some more background info:
**************************
DECHUB 900 with 6 decrepeater 900 TM
Slot: 1 2 3 4 5 6
+-+ +-+ +-+ +-+ +-+ +-+
| | | | | | | | | | | |
-----*-------*-------*-------*-------*----------- Thin bus
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
+-+ +-+ +-+ +-+ +-+ +-+
|
| Repeater 1-5 are connected to the thinwire bus,
+-------+ | and not to any flex bus.
|DECNIS | | They have about 70 PC's connected (PCSA) to 3
| | | servers (4000), 7 VXT's.
|TCP/IP +------+
| LAT | Repeater 6 is not connected to any internal bus,
| | it just sits in the backplane for power and is
+---+---+ part of a different network.
|
/ To outside world ( 128Kb )
If we set up the terminal, connected to the hub, for displaying events we
see intermittent messages like " status: module not responding " every minute
or so on repeaters 1 to 5, 6 seems to be ok. This message is often followed by
a reset of the repeater (probably done by the hub manager because the module
is not responding). When we accidentally did an init of the hub, causing
repeater 6 to also be connected to the thin bus, we also saw the messages and
resets for slot 6.
During troubleshooting we found two ways to get rid of the problems:
1) Disconnect the DECnis from the hub or disconnect the link to
the outside world on the DECnis.
2) Give the repeaters an ip address by hand.
1) Since we were wondering why the problems stopped when we disconnected the
link on the DECnis to the outside world, we monitored the packets going
to and coming from the MAC addresses of the repeaters. The only thing we have
seen here is broadcasts from all repeaters (bootp asking for an IP address?).
The repeaters stopped sending those broadcasts when an IP address was supplied
by hand. Nothing was sent back in response to these broadcasts.
The only thing we have not monitored was whether there were broadcasts/multicasts
in response to the bootp requests, hence my question about how the repeaters
would react to broadcasts/multicasts.
The only other reason I can think of why disconnecting the DECnis could stop
the problems is that it takes away part of the Ethernet load.
The total Ethernet load during the problems was about 20-25%; we have not
measured the broadcasts/multicasts.
2) As a workaround we have now given the repeaters an IP address by hand.
At the moment I see two possible scenarios:
a) The bootp requests from the repeaters initiate some sort of
broadcast storm, keeping the repeaters too busy to respond to
the hub manager.
b) The repeaters send out a bootp request, listen for the response,
and in combination with an Ethernet load >25% are too busy to respond
to the hub manager.
Any other ideas?
I think it would be possible for us to go back to the old situation and do
some measurements; however, this will take place during the evenings and I am
not sure we will see the problems then. But we can try.
Best regards,
Peter.
|
1114.6 | | NACAD2::SLAWRENCE | | Tue Jun 21 1994 12:17 | 9 |
| The reset is not initiated by the hub in response to the 'module not
responding'; what's happening is that the module has either hung or
crashed and the hub notices it before the self test is started on the
module. The reset is the cause and the 'not responding' is an effect,
not the other way around.
You can prevent your slot 6 repeater from being added to the backplane
on a reset by creating an Ethernet backplane net (IMB) and connecting
it to that by itself.
|
1114.7 | ideas for measurement with analyzer ??? | UTRTSC::DENIJS | | Mon Jun 27 1994 06:54 | 7 |
| OK, so the repeaters hang or crash. We are planning to go back onsite
to do some measurements with the LAN analyzer. The first thing we will look
for is broadcast storms.
Any other ideas on what we can look for?
Peter.
|
1114.8 | We have our best people working on it... | LEVERS::PAGLIARO | Rich Pagliaro, Hub Products Group | Mon Jun 27 1994 12:51 | 17 |
| Peter,
We are still investigating this problem here on our end. We had another
failure occur last week and it correlated to what you have observed
regarding IP addresses. That is, repeaters assigned IP addresses did
not crash while repeaters without IP addresses did crash. Because of
this we suspect something is broken in the repeater's bootP protocol
processing.
The people actually doing the investigation here told me you might want
to look for bogus bootP responses or bogus ICMP messages.
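The thread does not define what makes a bootP response "bogus", so the following is only a sketch of the kind of sanity check one might run over captured UDP payloads. It assumes the fixed 300-octet BOOTP message layout and flags replies whose opcode, hardware type, transaction ID, client hardware address, or offered address look wrong. The function name and the choice of checks are assumptions for illustration.

```python
import struct

# Fixed BOOTP layout: op, htype, hlen, hops, xid, secs, unused,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file, vend
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s64s"
BOOTP_LENGTH = struct.calcsize(BOOTP_FORMAT)  # 300 octets
BOOTREPLY = 2


def check_bootp_reply(payload: bytes, expected_xid: int, client_mac: bytes) -> list:
    """Return a list of reasons the payload looks suspicious (empty if none)."""
    if len(payload) < BOOTP_LENGTH:
        return ["shorter than a %d-octet BOOTP message" % BOOTP_LENGTH]
    (op, htype, hlen, _hops, xid, _secs, _unused, _ciaddr, yiaddr,
     _siaddr, _giaddr, chaddr, _sname, _file, _vend) = struct.unpack(
        BOOTP_FORMAT, payload[:BOOTP_LENGTH])
    problems = []
    if op != BOOTREPLY:
        problems.append("op is not BOOTREPLY")
    if htype != 1 or hlen != 6:
        problems.append("hardware type/length is not 10 Mb Ethernet")
    if xid != expected_xid:
        problems.append("transaction ID does not match the request")
    if chaddr[:6] != client_mac:
        problems.append("client hardware address does not match")
    if yiaddr == b"\x00" * 4:
        problems.append("no address offered (yiaddr is 0.0.0.0)")
    return problems
```

A reply that passes every check returns an empty list; anything else is worth a closer look in the analyzer trace, along with any non-bootP IP traffic addressed to the broadcast address.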
I'll post results here when we find something definitive.
Regards,
Rich
|
1114.9 | Thanks for the update | UTRTSC::DENIJS | | Tue Jun 28 1994 09:00 | 11 |
| Hi Rich,
Thanks for the update. As of next week I will be on vacation, but one of
our other engineers (Ted Paehlig) will set up a session on site to do
some measurements. If he finds something interesting he will post it
here.
Best regards,
Peter.
|
1114.10 | Update...Problem solved! | NACAD::PAGLIARO | Rich Pagliaro, Hub Products Group | Thu Jul 07 1994 13:56 | 26 |
| Peter,
Good News! The bug causing the problem you witnessed has been found and
fixed. Apparently there is a bug in the repeater's UDP layer
processing. The UDP layer is designed to accept bootp responses when
the repeater has not been assigned an IP address. The problem occurs
when the repeater receives other types of IP frames when the repeater
is not assigned an IP address. The UDP layer will not process the
frame but it will also not relinquish the frame's buffer. Hence a
memory leak exists. Eventually the repeater runs out of buffers and
crashes.
We experienced this problem here in our lab when some station on the
network decided to send out SNMP requests to the broadcast address.
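The failure mode described above can be modelled in a few lines. This is a minimal simulation, not the firmware code: the pool size, the class, and the function names are hypothetical. It shows a receive path that drops non-bootP UDP frames without returning their buffers while no IP address is assigned, and the effect of releasing the buffer on the drop path.

```python
BOOTP_CLIENT_PORT = 68   # UDP port on which bootP replies arrive
SNMP_PORT = 161          # UDP port for SNMP requests


class BufferPool:
    """Fixed-size pool of receive buffers (size is an assumed value)."""

    def __init__(self, size: int) -> None:
        self.free = size

    def allocate(self) -> None:
        if self.free == 0:
            raise MemoryError("out of receive buffers")
        self.free -= 1

    def release(self) -> None:
        self.free += 1


def udp_receive(pool: BufferPool, dest_port: int,
                has_ip_address: bool, fixed: bool) -> str:
    """Model one received UDP frame passing through the UDP layer."""
    pool.allocate()                       # the frame arrives in a buffer
    if has_ip_address or dest_port == BOOTP_CLIENT_PORT:
        pool.release()                    # normal path: process, then free
        return "processed"
    if fixed:
        pool.release()                    # fixed drop path returns the buffer
    return "dropped"                      # buggy drop path leaks it


def frames_until_crash(pool_size: int, fixed: bool, limit: int = 10_000):
    """Count broadcast SNMP frames received before the pool is exhausted."""
    pool = BufferPool(pool_size)
    for count in range(1, limit + 1):
        try:
            udp_receive(pool, SNMP_PORT, has_ip_address=False, fixed=fixed)
        except MemoryError:
            return count
    return None                           # never ran out within the limit
```

With a pool of 8 buffers the leaking path fails on the ninth frame, while the path that releases the buffer never runs out. The same model also shows why assigning an IP address by hand hides the problem: every frame then takes the normal path.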
For what it's worth, this UDP code is shared by all of the repeaters as
well as the DECconcentrator 900 and DECbridge 900. The repeaters will
crash once they run out of buffers. I personally do not know how this
memory leak affects the behavior of the concentrator and bridge.
The fix to this bug will be available in the soon to be released
"60-day upgrade".
Regards,
Rich
|
1114.11 | Great News! | IJSAPL::PAEHLIG | Ted Paehlig _ Amsterdam | Fri Jul 08 1994 07:15 | 10 |
|
We won't have to bother our customer with additional measurement visits
and can tell him the good news instead!
Please keep us posted on the availability of the fix(es).
Thanks for your prompt attention to this matter.
Peter (proxy) & Ted
|
1114.12 | effect on concentrator & bridge | LEVERS::SLAWRENCE | | Mon Jul 11 1994 11:50 | 13 |
|
This problem causes the DECconcentrator 900MX to 'go silent'; it
doesn't crash, but it stops doing any management communications
(including the FDDI management frames used to create a ring map). It
does continue to pass FDDI frames normally, but won't respond to
anything for its own MAC address or as an IP server.
It too will have the fix in the 60 day upgrade.
I don't believe that this problem affects the DECbridge 900MX because
of the way the low-level bridging code passes frames up to the
management stack, but in any event it too will have the fix.
|
1114.13 | When will the 60 day upgrade be available | ZUR01::SCHNEIDERR | | Thu Jul 21 1994 11:40 | 4 |
| When will the 60 day upgrade be available?
Roland
|
1114.14 | Early August | NACAD2::PAGLIARO | Rich Pagliaro, Hub Products Group | Thu Jul 21 1994 11:44 | 3 |
| I believe some time during the first week of August.
-Rich
|
1114.15 | Customer kits available later | NAC::FORREST | | Mon Jul 25 1994 15:20 | 7 |
|
To clarify, Rich is talking about online availability. It will be
available to customers with HUBwatch V3.1 when V3.1 ships, hopefully
by the end of September.
You can run into problems if you upgrade only one module and
not the MAM or HUBwatch.
|
1114.16 | Can I upgrade | ZUR01::SCHNEIDERR | | Tue Aug 02 1994 06:08 | 8 |
| We have different problems out in the field and we are waiting for this upgrade.
If the upgrade is available (this week, isn't it?), can we upgrade the MAM and
the modules and use HUBwatch V3.0? Or do we really have to wait until HUBwatch
V3.1?
Roland
|
1114.17 | V3.0 is ok in the short run | NACAD2::HAROKOPUS | | Wed Aug 03 1994 13:22 | 10 |
| Although officially you need HUBwatch V3.1 with MAM V3.1, I don't
anticipate any problems using V3.0 until V3.1 ships.
However, there are some new modules shipping soon that are not
supported by V3.0, and V3.1 has many bug fixes, so you will want
to upgrade to V3.1 as soon as it is available.
Regards,
Bob
|