T.R | Title | User | Personal Name | Date | Lines |
---|
1786.1 | | KAOFS::S_HYNDMAN | Acronym Decoder Ring Architect | Thu Dec 15 1994 17:46 | 11 |
|
I'm not really clear on what you're trying to do: have backup
connections to bypass the bridge, or provide redundant connections for
the server? If it's the latter, why not go DAS FDDI and multi-home the
server on the ring?
Cabletron also makes redundant fiber transceivers.
Scott
|
1786.2 | It's a VAX 4500 Pathworks Server | MSDOA::REED | John Reed @CBO, DTN:367-6463, KB4FFE, SouthEast | Fri Dec 16 1994 09:17 | 23 |
| The 4000 series VAXes have Q-bus FDDI controllers as the only available
option, and the customer feels that the throughput of the ISA Ethernet
will be faster than a Q-bus attached FDDI controller.
I need to have a way to reach this file server if the DEChub900 near it
decides to crash. The customer has experienced several hub crashes
(it's on a UPS, has three DECcons and one DECbridge, with three power
supply modules) and each time the hub reboots, he loses the
connections to his file server. He wants a way to keep the PCs
running through the hub crashes.
The PCs are connected to Ethernets, on various other DEChub-mounted
DECbridge900s. I believe that the Fault Tolerant Ethernet FOT
attached to his ISA-0 Ethernet, with the primary fiber port fed to a
repeater module in one hub and the backup fiber port fed to a module
in a different hub, will work, as long as the repeater modules have
ANOTHER working node on their PORT GROUP. This will keep the spanning
tree from shutting down either port, and allow the FOT to choose the
proper path to enable. The customer would like to eventually put an
FDDI PC file server on the ring, but I think that this will be a good
starting point.
JR
|
1786.3 | | NETCAD::SLAWRENCE | | Fri Dec 16 1994 12:23 | 8 |
|
Ahh hah! The hub is crashing? It shouldn't be, so let's look at
that...
What are the firmware revs for the Hub and all modules?
Are there error log entries?
|
1786.4 | I HOPE the crashing has stopped... | MSDOA::REED | John Reed @CBO, (803) 781-9571 NIS Networker | Mon Dec 19 1994 09:13 | 31 |
| The crashing appears to have stopped after we upgraded to the most
recent revisions. (It hasn't occurred for a week now, and it used to be
several times a day.) They used to have DECcon FM 2.0.0, DECbridge900
version 1.2.1, and HUBmanager v3.0.0. It ran wonderfully, until the
imaging application on the Alphas came online. They have an Alpha
farm with Kubota(tm) graphics accelerators and funny little
transmitters on top of their screens. They wear 3-D glasses, and do
molecular modelling. The images spin around, suspended in the air in
front of your monitor. If you wear the glasses, and turn out the
lights, it would make a great lava lamp at a 60's party... They are a
medical research and design firm, with a lobby full of patent grants
and awards.
They have since upgraded to v2.8.0 on the Conc, 1.4.0 on the Bridge, and
3.1.0 on the HUB managers. They feel the problem was traffic related,
and they think the DECbridge900 "couldn't keep up with the traffic."
The customer's MIS department suffered a lot of grief during the
period when the Hubs were rebooting. The MIS staff doesn't want this
to occur again, and they see how the link to their file server is a
single point of failure. (For that matter, having a single file
server is also troublesome). So, they are planning additional fault
tolerance. They like the FDDI, and the speed, and the way that it
wraps around outages. We are trying to add to their comfort level
about the bridges, and give them some redundancy. Ethernet with STP
will not bypass a fault as quickly as FDDI (STP typically takes about
45 seconds), so their LAT sessions and Pathworks disks might time out
during a hub crash. But I hope to create a config where the users can
get back on quickly.
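
As a rough check on that 45-second figure, here is a small sketch of the arithmetic behind spanning tree recovery times. It assumes the IEEE 802.1D default timers (max age 20 s, forward delay 15 s); the bridges in this network may be set differently, and the session timeout passed to `session_survives` is a hypothetical number, not a documented LAT or Pathworks value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpTimers:
    """Spanning tree timers in seconds; defaults are the IEEE 802.1D values."""
    max_age: float = 20.0
    forward_delay: float = 15.0

    def direct_failure_outage(self) -> float:
        # The bridge sees its own link drop, so the replacement port only
        # has to pass through the listening and learning states.
        return 2 * self.forward_delay

    def indirect_failure_outage(self) -> float:
        # The failure is elsewhere: stored root information must first
        # age out (up to max_age), then listening and learning follow.
        return self.max_age + 2 * self.forward_delay


def session_survives(outage_s: float, session_timeout_s: float) -> bool:
    """True if a session that tolerates session_timeout_s of silence outlasts the outage."""
    return session_timeout_s > outage_s


timers = StpTimers()
print(timers.direct_failure_outage())    # 30.0
print(timers.indirect_failure_outage())  # 50.0
```

With these defaults the outage falls between 30 and 50 seconds, which brackets the 45 seconds quoted above. Any session that gives up sooner than that will drop during a hub crash, whichever path the FOT ends up choosing.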
JR
|
1786.5 | The crashing should never have started... | NETCAD::SLAWRENCE | | Mon Dec 19 1994 17:30 | 41 |
|
I don't know how much comfort it will add, but here's more data, for
what it's worth:
The crash you saw was very well understood here; in fact, it took out
our file servers here in DEChub Engineering before we ever released the
bridge to field test.
The original problem was in the bridge, and was - in a way - traffic
related (your customer was right). It started with a bug in the IP
fragmentation code in the bridge that occurred only if two IP packets
requiring fragmentation arrived from the FDDI _very_ close together,
such that they both were queued together in the bridge (this is a
very narrow window). It took a little while, but a few of these
crashed the bridge. Combine that with a hub manager that did not cope
well with modules that crashed too frequently, and you end
up with an unstable hub. (Without this bug they keep up just fine, by
the way.)
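
For readers unfamiliar with what the bridge has to do here, the following is a minimal sketch of standard IP fragmentation when a full-size FDDI datagram is forwarded onto Ethernet. It is not the bridge firmware and does not model the queueing race described above; the function name and constants are chosen for illustration, assuming a 20-byte IP header without options, the 4352-byte IP MTU on FDDI, and the 1500-byte Ethernet MTU.

```python
from typing import List, Tuple

IP_HEADER_LEN = 20   # bytes, IP header without options
FDDI_MTU = 4352      # bytes, IP MTU on FDDI
ETHERNET_MTU = 1500  # bytes, IP MTU on Ethernet


def fragment(payload_len: int, mtu: int,
             header_len: int = IP_HEADER_LEN) -> List[Tuple[int, int, bool]]:
    """Return (offset in 8-byte units, data bytes, more-fragments flag) per fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the IP fragment offset field counts in 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# A maximum-size FDDI datagram forwarded onto Ethernet.
for frag in fragment(FDDI_MTU - IP_HEADER_LEN, ETHERNET_MTU):
    print(frag)
# (0, 1480, True)
# (185, 1480, True)
# (370, 1372, False)
```

Each large packet from the ring becomes three Ethernet frames, so fast senders on the FDDI side generate bursts of fragmentation work, which is the situation where two such packets can end up queued together.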
The good news is that all of the above are fixed in the latest
releases.
The bad news (for your customer) is that the bugs have been fixed for
quite a while now, and they didn't get the fix.
We have spent a great deal of energy here on trying to create a set of
mechanisms that ensure that the latest releases of all our firmware are
available to the field and (where possible) directly to the customer -
but it does no good if you don't check them. What your customer had
was the very first field release of firmware - almost certain to have
at least some minor problems (in this case, unfortunately, it was
fairly serious for them because their Alphas were so fast).
We _cannot_ guarantee that you will get the latest release of firmware
when hardware is delivered to you. You should _never_ assume that it
is up to date.
We have Internet and Easynet archives for the latest firmware, and
mailing lists that you and your customers can subscribe to for release
notices. Pointers to both are in the owners manuals and/or the release
notes.
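
The check itself can be very simple. The sketch below compares installed revisions against a list of current ones, using the version numbers quoted earlier in this thread (the "before" set as installed, and the set the customer upgraded to as current). The helper names are made up for illustration, and how you obtain the current list from the archives or release notices is left out.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.2.1' or 'v3.0.0' into a tuple of integers that compares correctly."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def out_of_date(installed: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return the names of modules whose installed revision is older than the latest."""
    return sorted(
        name
        for name, version in installed.items()
        if name in latest and parse_version(version) < parse_version(latest[name])
    )


# Revisions taken from the earlier reply in this thread.
installed = {"DECcon": "2.0.0", "DECbridge900": "1.2.1", "HUBmanager": "v3.0.0"}
latest = {"DECcon": "v2.8.0", "DECbridge900": "1.4.0", "HUBmanager": "3.1.0"}

print(out_of_date(installed, latest))
# ['DECbridge900', 'DECcon', 'HUBmanager']
```

Comparing the revisions as tuples of integers rather than as strings matters once a field reaches two digits: as strings, "1.10.0" would sort before "1.4.0".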
|