
Conference netcad::hub_mgnt

Title:	DEChub/HUBwatch/PROBEwatch CONFERENCE
Notice:	Firmware -2, Doc -3, Power -4, HW kits -5, firm load -6&7
Moderator:	NETCAD::COLELLADT

Created:	Wed Nov 13 1991
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	4455
Total number of notes:	16761

776.0. "error 1801 on DECbridge 90" by CSC32::L_MORSE () Mon Feb 28 1994 18:15

    
    	On a DECbridge 90 running firmware 3.1, a SHOW REPEATER command loops,
    	reporting error 1801 every 30 seconds until the RCF connection is broken.
    
    	Question: where is this error documented?
    
    	I am trying to get a more complete problem statement.

T.R	Title	User	Personal Name	Date	Lines
776.1	an explanation	CSC32::L_MORSE		`Tue Mar 01 1994 13:39`	67
	An explanation:

	Subj: DECbridge-90 "1801" messages

	18 means "Beginning of self test." 01 means "End of self test." That no other numbers appear means that no tests were performed. The DECbridge-90 does this whenever it thinks it may have lost track of some memory, under any of the following conditions:

	- An overrun error on the Intel 82590 Ethernet controller. This is never supposed to happen, because the bus is allocated in fixed time slots and there is always enough time for the controller. However, if the controller reports this error, the bridge responds with an 1801. If this is the cause, it is either bad hardware (something stuck accessing the memory bus), or the network "traffic" is some kind of noise that is badly confusing the controller. These errors are not counted, so neither SHOW DISPLAY nor SHOW PORT will help you.

	- Two consecutive lifetime exceeded errors (indicating substantial outbound congestion failure, or a bad transceiver). Normally, a port's "lifetime exceeded" counter never increments above 0; if it is incrementing, this could be the problem. It would indicate either excessive traffic on the network (combined traffic on both wires exceeding 14,000 packets/second), or a wiring problem that makes transmitting packets difficult and causes many retries. It could also result from non-conforming Ethernet devices that are too aggressive in their back-off and retry algorithm. SUN was famous for shipping systems like that a few years back, but I assume that problem is gone by now.

	- Any system buffer unavailable event. Running phases 18 and 01 of the self test recovers any memory that may have been lost due to excessive numbers of runt frames or lifetime exceeded errors. However, this trigger is protected by a timer that prevents it from firing more often than once every 10 minutes. Normally, the system buffer unavailable counter never increments above 0; if it is moving, this may be the problem. It would indicate an unusually high number of runt frames and collision fragments, which points to improperly configured wiring. Check the "repeater count" limits, check wires for proper termination, and make sure the 180 meter limit on the length of a thinwire segment is not exceeded.

	- A ^C received on the manufacturing diagnostic console. Entering manufacturing diagnostic mode (which turns the backplane management port into a diagnostic console) normally requires powering on with the password reset button depressed. It is also possible if the ASO_L backplane pin is shorted to ground. The DEChub 90 normally does not use control characters in the hub management protocol. However, if this bridge is in a DEChub 900, the ASO_L pin might be asserted due to some hub problem, and the hub might be polling devices that respond in some binary protocol or at a different baud rate, so that the bridge interprets repeater responses as ^C.

	That it happens every 30 seconds is another hint: 30 seconds is related to the interval of spanning tree hello messages. However, bad hellos cause a port loopback test to be scheduled, which appears as 180501 or 180201, not as just 1801.

	Please post this as a reply to your note, and let me know if this helps. Is this a bridge under test, or in production use?
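	Going by the explanation above, a self-test message can be read as a run of two-digit phase numbers bracketed by 18 (begin) and 01 (end), with anything in between being tests that actually ran. The sketch below assumes exactly that format; the function name decode_selftest is hypothetical, and it does not try to name phases such as 05 or 02, since the reply only says they appear when a port loopback test has been scheduled.

```python
def decode_selftest(message: str) -> dict:
    """Split a DECbridge-90 self-test message such as "1801" or "180501"
    into two-digit phase codes.

    Assumption (inferred from reply 776.1, not from DEC documentation):
    the message is a plain run of two-digit decimal phase numbers that
    starts with 18 (begin self test) and ends with 01 (end self test).
    """
    code = message.strip()
    if len(code) < 4 or len(code) % 2 != 0 or not code.isdigit():
        raise ValueError(f"not a self-test message: {message!r}")
    phases = [code[i:i + 2] for i in range(0, len(code), 2)]
    if phases[0] != "18" or phases[-1] != "01":
        raise ValueError(f"expected 18 ... 01, got {message!r}")
    tests_run = phases[1:-1]  # phases between the begin and end markers
    return {
        "phases": phases,
        "tests_run": tests_run,
        # A bare 1801 means no tests ran: the bridge only went through
        # phases 18 and 01 to recover memory it may have lost track of.
        "memory_recovery_only": not tests_run,
    }


for msg in ("1801", "180501", "180201"):
    result = decode_selftest(msg)
    print(msg, result["tests_run"], result["memory_recovery_only"])
# 1801 [] True
# 180501 ['05'] False
# 180201 ['02'] False
```

	A bare 1801 therefore points at one of the memory-recovery triggers listed above, while 180501 or 180201 points toward the spanning tree hello / port loopback path.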
776.2	Seeing this any new ideas	CX3PST::NOTAMI::A_ANDERSON	CX03 2/H13 NSU/VAX MacGhille Aindrais	`Fri Sep 23 1994 09:34`	45
	I have a customer who is seeing this problem as well. He has the following configuration: two DEChub 90s connected via 20 m of thinwire and a management cable.

	In the top hub he has a DECbridge 90FL (rev 3.0), a DECrepeater 90C, a DECserver 90L, and a DECserver 90L+. The DECbridge 90FL is connected via its AUI port to a thickwire transceiver with heartbeat disabled. In the bottom hub he has a DECrepeater 90C and two DECserver 90TLs. He has a 5-node cluster connected to the two repeaters in this hub, plus a couple of X Window terminals.

	This had been running OK for a few months. On Sep 22 he lost connection with nodes on the backbone; the cluster on the hub was not affected. When he tried a CONNECT NODE to the DECbridge 90FL from one of the cluster members, he would get the 1801 a few times, then "target does not respond". He had a spare DB90FL, so he did a quick swap at 19:30. By 07:30 the next morning the spare DECbridge was in the same condition. Next he swapped the DECbridge 90FL for a DECbridge 90 and has not had any problems since.

	I have asked him to reinstall the DECbridge 90FL and, if the problem recurs, to remove the thinwire cable from the hub and terminate the hub, to see whether the bridge can recover. I have also asked him to try splitting the two hubs apart, each with its own DECbridge 90FL.

	Can a five-node cluster across these two hubs generate enough traffic to be a problem for a DB90FL? The cluster members are not logging any send or receive failures, and apart from losing communication with nodes outside the hub, they were not affected by the outage in any way. This tells me that the workgroup side was not having a problem.

	He has the 3.1 update on order.

	Thanks for any input,
	Alan S. Anderson
	Network Support CSC CS
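	One way to frame the traffic question is against the rules of thumb in 776.1: lifetime exceeded errors with combined traffic above about 14,000 packets/second suggest congestion, without it they suggest wiring or transceiver trouble, and a moving system buffer unavailable counter suggests runts and collision fragments. The sketch below is only a restatement of those rules, assuming you can read a per-port packet rate and those two counters off the bridge; BridgeSample, its field names, and likely_1801_causes are hypothetical, not DECbridge-90 counter names or tools.

```python
from dataclasses import dataclass
from typing import List

# Figure quoted in reply 776.1: combined traffic on both wires above
# about 14,000 packets/second can drive lifetime exceeded errors.
COMBINED_PPS_LIMIT = 14_000


@dataclass
class BridgeSample:
    """Hypothetical snapshot of values read by hand off the bridge.
    Field names are illustrative, not actual DECbridge-90 counter names."""
    pps_port1: int                  # packets/second seen on port 1
    pps_port2: int                  # packets/second seen on port 2
    lifetime_exceeded: int          # normally stays at 0
    system_buffer_unavailable: int  # normally stays at 0


def likely_1801_causes(s: BridgeSample) -> List[str]:
    """Map the counters named in 776.1 to the 1801 triggers they suggest."""
    causes = []
    if s.lifetime_exceeded > 0:
        if s.pps_port1 + s.pps_port2 > COMBINED_PPS_LIMIT:
            causes.append("lifetime exceeded: combined traffic over 14,000 pps")
        else:
            causes.append("lifetime exceeded: check wiring, transceiver, "
                          "or aggressive back-off on end nodes")
    if s.system_buffer_unavailable > 0:
        causes.append("buffer unavailable: runts/collision fragments; check "
                      "repeater count, termination, 180 m thinwire limit")
    if not causes:
        # Controller overruns are not counted (per 776.1), and the ^C
        # trigger is unrelated to these two counters.
        causes.append("counters clean: suspect controller overrun or "
                      "ASO_L/^C on the management port")
    return causes


# Illustrative numbers only, not measurements from the customer site.
print(likely_1801_causes(BridgeSample(9000, 6000, 3, 0)))
# ['lifetime exceeded: combined traffic over 14,000 pps']
```

	Since the cluster members report no send or receive failures, the counters on the bridge itself are the more useful data point here; the sketch does not replace reading them.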