Conference netcad::hub_mgnt

Title:	DEChub/HUBwatch/PROBEwatch CONFERENCE
Notice:	Firmware -2, Doc -3, Power -4, HW kits -5, firm load -6&7
Moderator:	NETCAD::COLELLADT

Created:	Wed Nov 13 1991
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4455
Total number of notes:	16761

1786.0. "Fault-Tolerant 10BaseFL FOT ??" by MSDOA::REED (John Reed @CBO, DTN:367-6463, KB4FFE, SouthEast) Thu Dec 15 1994 12:56

    Hello.
    
    I have configured a large site full of Alphas on DECConcentrator900,
    and PC's off of DECswitch900EF, and a few old DECbridge 620's.   This
    has worked nicely for my customer for a year.  But his VAX file server
    for the PC's is a 4000 with an ISA-0 ethernet and a QNA ethernet.  He
    lost a DECBridge900mx a few weeks ago, when it started rebooting for no
    good reason.  I upgraded the firmware, and it seemed to help.   
    
    He is now concerned about redundancy for his Pathworks physical
    connection to the server.   When that DECbridge rebooted, all of his
    Pathworks users got knocked off the LAN.   He has asked how to connect
    the file server redundantly to the FDDI ring.  I told him that the best
    method was a direct fiber link to a DECRepeater900FP
    port-pair.   He has two DEChub900's, that could receive a DEFMM card,
    and have spare ports of the DECbridge900mx switched to the backplane.
    
    I could feed a fiber from a master port, and a fiber from a slave port
    to the VAX's ISA-0 ethernet port, and attach a fault-tolerant FOT to
    that VAX Ethernet port.  (He is using the QNA-0 port for cluster
    traffic only).
    
    
    I called Anixter, and they sell a fault-tolerant optical transceiver
    that speaks 10BaseFL, from a vendor called MiLAN Technology.  It has
    two modular ports that can be configured each for either 10BaseT, or
    10BaseFL.  It will use the active port unless it is "disrupted" and
    then the backup port will take over. 
    
    Is it possible for me to attach this transceiver with two 10BaseFL
    inserts to a fiber from each DEChub's DECrepeater?  I know that the
    transceiver will probably perform the switching, and the two repeater
    ports will probably keep running, since the stand-by port shouldn't
    transmit. I expect that the STP from the bridge on the backup port will
    not enable that port until 45 seconds after the failure of the first
    bridge or fiber, but I expect this solution to be better than a single
    link.
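
The port selection described above can be sketched roughly as follows. This is only an illustration of the idea, not MiLAN's actual logic: the class name, the port labels, and the non-revertive policy (staying on the backup after the primary recovers) are all assumptions, since the post does not say how the transceiver behaves once the original link returns.

```python
from dataclasses import dataclass


@dataclass
class RedundantTransceiver:
    """Hypothetical model of a two-port fault-tolerant transceiver."""

    active: str = "primary"  # port currently carrying traffic

    def select_port(self, link_ok: dict) -> str:
        """Return the port to use, given per-port link status."""
        # Stay on the active port for as long as its link is up.
        if link_ok.get(self.active, False):
            return self.active
        # Active port is disrupted: switch only if the standby is usable.
        standby = "backup" if self.active == "primary" else "primary"
        if link_ok.get(standby, False):
            self.active = standby
        # Assumption: no automatic revert to the original port.
        return self.active
```

Calling `select_port({"primary": False, "backup": True})` on a fresh instance switches to the backup port. Note that this models the transceiver only; any spanning tree delay on the bridge side is separate.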
    
    We saw a chart at Network Academy that told about the protocols used by
    Digital's redundant Ethernet fiber pairs.  Do we have a planned
    fault-tolerant FOT that will operate with the same protocol used by the
    DECrepeater900fp ??   I would prefer to sell DEC product, but any port
    in a storm...
    
    JR

T.R	Title	User	Personal Name	Date	Lines
1786.1		KAOFS::S_HYNDMAN	Acronym Decoder Ring Architect	`Thu Dec 15 1994 17:46`	11
	I'm not really clear on what you're trying to do: have backup connections to bypass the bridge, or provide redundant connections for the server? If it was the latter, why not go DAS FDDI and multihome the server on the ring? Cabletron also makes redundant fiber transceivers. Scott
1786.2	It's a VAX 4500 Pathworks Server	MSDOA::REED	John Reed @CBO, DTN:367-6463, KB4FFE, SouthEast	`Fri Dec 16 1994 09:17`	23
	The 4000 series VAXes have Q-bus FDDI controllers as the only available option, and the customer feels that the throughput of the ISA Ethernet will be faster than a Q-bus attached FDDI controller. I need to have a way to reach this file server if the DEChub900 near it decides to crash.
	The customer has experienced several hub crashes (it's on a UPS, has three DECCon, and one DECbridge, with three power supply modules) and each time the hub reboots, he loses the connections to his file server. He wants a way to keep the PC's running through the HUB crashes. The PC's are connected to Ethernets, on various other DEChub mounted DECbridge900's.
	I believe that the Fault Tolerant Ethernet FOT attached to his ISA-0 Ethernet, with one Primary fiber port fed to a repeater module in one hub, and the backup fiber port fed to a module in a different hub, will work, as long as the repeater modules have ANOTHER working node on their PORT GROUP. This will keep the spanning tree from shutting down either port, and allow the FOT to choose the proper path to enable.
	The customer would like to eventually put an FDDI PC file server on the ring. But I think that this will be a good starting point. JR
1786.3		NETCAD::SLAWRENCE		`Fri Dec 16 1994 12:23`	8
	Ahh hah! The hub is crashing? It shouldn't be, so let's look at that... What are the firmware revs for the Hub and all modules? Are there error log entries?
1786.4	I HOPE the crashing has stopped...	MSDOA::REED	John Reed @CBO, (803) 781-9571 NIS Networker	`Mon Dec 19 1994 09:13`	31
	The crashing appears to have stopped after we upgraded to the most recent revisions. (It hasn't occurred for a week now, and it used to be several times a day). They used to have DECcon FM 2.0.0, DECBridge900 version 1.2.1, and HUBmanager v3.0.0. It ran wonderfully, until the imaging application on the Alphas came online.
	They have an Alpha farm with Kubota(tm) graphics accelerators and funny little transmitters on top of their screens. They wear 3-D glasses, and do molecular modelling. The images spin around, suspended in the air in front of your monitor. If you wear the glasses, and turn out the lights, it would make a great lava lamp at a 60's party... They are a medical research and design firm, with a lobby full of patent grants and awards.
	They have since upgraded to v2.8.0 on the Conc, 1.4.0 on the Bridge, and 3.1.0 on the HUB managers. They feel the problem was traffic related, and they think the DECbridge900 "couldn't keep up with the traffic."
	The customer's MIS department suffered a lot of grief during the period when the Hubs were rebooting. The MIS staff doesn't want this to occur again, and they see how the link to their file server is a single point of failure. (For that matter, having a single file server is also troublesome). So, they are planning additional fault tolerance. They like the FDDI, and the speed, and the way that it wraps around outages.
	We are trying to add to their comfort level about the bridges, and give them some redundancy. Ethernet and the STP will not bypass a fault as quickly as FDDI (STP typically takes 45 seconds), so their LAT and Pathworks disks might time out during a hub crash. But I hope to create a config where the users can get back on quickly. JR
1786.5	The crashing should never have started...	NETCAD::SLAWRENCE		`Mon Dec 19 1994 17:30`	41
	I don't know how much comfort it will add, but here's more data, for what it's worth:
	The crash you saw was very well understood here; in fact, it took out our file servers here in DEChub Engineering before we ever released the bridge to field test. The original problem was in the bridge, and was - in a way - traffic related (your customer was right). It started with a bug in the IP fragmentation code in the bridge that occurred only if two IP packets arrived from the FDDI requiring fragmentation _very_ close together such that they both were queued together in the bridge (this is a very narrow window). It took a little while, but a few of these crashed the bridge. Combine that with a hub manager that had trouble handling modules that crashed too frequently, and you end up with an unstable hub. (Without this bug they keep up just fine, by the way.)
	The good news is that all of the above are fixed in the latest releases. The bad news (for your customer) is that the bugs have been fixed for quite a while now, and they didn't get the fix.
	We have spent a great deal of energy here on trying to create a set of mechanisms that ensures that the latest releases of all our firmware are available to the field and (where possible) directly to the customer - but it does no good if you don't check them. What your customer had was the very first field release of firmware - almost certain to have at least some minor problems (in this case, unfortunately, it was fairly serious for them because their Alphas were so fast).
	We _cannot_ guarantee that you will get the latest release of firmware when hardware is delivered to you. You should _never_ assume that it is up to date. We have Internet and Easynet archives for the latest firmware, and mailing lists that you and your customers can subscribe to for release notices. Pointers to both are in the owner's manuals and/or the release notes.
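
For readers unfamiliar with why the bridge fragments at all: an IP datagram arriving from FDDI can be larger than Ethernet can carry, so it must be split before forwarding. The sketch below shows the generic IPv4 fragmentation arithmetic only; it is not the bridge's firmware. The function name and the default sizes (a 1500-byte Ethernet MTU and a 20-byte IP header with no options) are assumptions made for illustration.

```python
def fragment(payload_len, mtu=1500, ip_header=20):
    """Split an IP payload into fragments that fit the outgoing MTU.

    Returns a list of (offset_in_8_byte_units, data_length, more_fragments).
    """
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the IP fragment offset field counts in 8-byte units.
    max_data = (mtu - ip_header) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len  # "more fragments" flag
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments
```

As a worked case, assuming a full-size 4352-byte IP datagram from FDDI (4332 bytes of payload after the header), `fragment(4332)` yields three Ethernet-sized fragments carrying 1480, 1480, and 1372 bytes of data.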