
Conference netcad::hub_mgnt

Title:DEChub/HUBwatch/PROBEwatch CONFERENCE
Notice:Firmware -2, Doc -3, Power -4, HW kits -5, firm load -6&7
Moderator:NETCAD::COLELLADT
Created:Wed Nov 13 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4455
Total number of notes:16761

3220.0. "Please explain these errors" by BERFS4::NORD () Wed Jan 31 1996 10:33


	Hi you all in the world of HUB900-products,

	at several customer sites I have heard about problems with our
	DEChub Multiswitch and DECswitch 900EF, and I told the customers: Please
	do a Dump Errorlog on the console port of the Multiswitch and tell me
	what you are seeing.

	And here is what they have seen:

	1. DEChub 900 Multiswitch:

	Entry		= 7
	Time Stamp	= 0 20400
	Reset Count	= 4
	1 : PS problem! AC_OK,DC_BAD,48V_OK

	Dump another entry [Y]/N? y

	Entry		= 6
	Time Stamp	= 0 1000
	Reset Count	= 4
	2 : PS problem! AC_OK,DC_BAD,48V_OK

	Dump another entry [Y]/N? y

	Entry		= 5
	Time Stamp	= 0 1000
	Reset Count	= 3
	2 : PS problem! AC_OK,DC_BAD,48V_OK

	Dump another entry [Y]/N? y

	Entry		= 3
	Time Stamp	= 0 1000
	Reset Count	= 3
	1 : PS problem! AC_OK,DC_BAD,48V_OK

	Dump another entry [Y]/N? y

	A DECswitch 900 EF is connected to this backplane. Nothing was
	configured on this DECswitch (no virtual LAN, no bridge port to the
	backplane), yet it connects all of its ports to the backplane: all
	yellow LEDs are on permanently, and all connections between the
	connected repeaters in the backplane are broken.
 

	2. DECswitch 900 EF


DECswitch 900EF - slot 4
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 36

==============================================================================


Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 35
Timestamp     =    0    0    0
Write Count   = 48
FRU Mask      = 3
Test ID       = E03
Error Data    = SR=0009 PC=00000300 Error Code=000000F0 ProcCsr=0000

                 0:00000009  1:00000300  2:000000F0  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 
Dump another entry [Y]/N? y
Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 35
Timestamp     =    0    0    0
Write Count   = 48
FRU Mask      = 2
Test ID       = 911
Error Data    = SR=0003 PC=00000073 Error Code=00000000 ProcCsr=0000

                 0:00000003  1:00000073  2:00000000  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 
Dump another entry [Y]/N? y
Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 35
Timestamp     =    0    0    0
Write Count   = 48
FRU Mask      = 2
Test ID       = 814
Error Data    = SR=0001 PC=04000000 Error Code=55555555 ProcCsr=0000

                 0:00000001  1:04000000  2:55555555  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 
Dump another entry [Y]/N? y
Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 34
Timestamp     =    0    0    0
Write Count   = 48
FRU Mask      = 3
Test ID       = E03
Error Data    = SR=0009 PC=00000300 Error Code=000000F0 ProcCsr=0000

                 0:00000009  1:00000300  2:000000F0  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 




DECswitch 900EF - slot 5
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 34

==============================================================================


Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 28
Timestamp     =    0   1A 4DCE
Write Count   = 5
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2000 PC=03072C78 Error Code=00003000 ProcCsr=5D6D
Line #        = 1026
File          = ncsh.c
Dump another entry [Y]/N? 



	I don't know what the first DECswitch is; it looks like a switching
	router. The second DECswitch really has firmware version 1.5.2;
	I think this is an unpatched version like the one I got in August '95.


	My question is: Is there anybody out there who can explain the
	meanings of the different entries and what to do about the power supply
	(DC_BAD)?

	Any help is welcome

	Thanks a lot

	Wolfgang Nord
	MCS Berlin, Germany
T.R  Title  User  Personal Name  Date  Lines
3220.1. "probably a bad PS" by NETRIX::"[email protected]" (Chip Boyle) Wed Jan 31 1996 16:34 (15 lines)
Wolfgang,
Could you please obtain the following information from your customer:
	1) Number of power supplies in the DEChub900.
	2) Type of each power supply.
	3) Exact line card configuration in the DEChub900
		(slot position, module type, and version).
	4) What set of events cause this problem?

It appears to be a problem with a power supply, but not necessarily
with power supplies 1 & 2.  Without knowing how your customer is 
producing this bug, I would say the best thing to do is swap out
power supplies while checking the error log in between each swap.
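
To make the "check the error log between each swap" step less error-prone, the PS lines from a captured hub dump can be tallied per supply. This is only a sketch: it assumes the number before the colon is the power supply number and that the flags always have the form NAME_OK or NAME_BAD, as in the entries posted in .0. The function name `tally_ps_faults` is made up for this example.

```python
import re
from collections import Counter

# Matches hub error-log lines such as "1 : PS problem! AC_OK,DC_BAD,48V_OK".
# Assumption: the leading number identifies the power supply.
PS_LINE = re.compile(r"^\s*(\d+)\s*:\s*PS problem!\s*(\S+)\s*$")


def tally_ps_faults(log_text):
    """Count reported faults per (supply number, failing signal)."""
    faults = Counter()
    for line in log_text.splitlines():
        match = PS_LINE.match(line)
        if not match:
            continue  # not a PS line (Entry, Time Stamp, prompts, ...)
        supply = int(match.group(1))
        for flag in match.group(2).split(","):
            # Flags look like AC_OK, DC_BAD, 48V_OK; keep only the bad ones.
            signal, _, state = flag.rpartition("_")
            if state == "BAD":
                faults[(supply, signal)] += 1
    return faults


# PS lines copied from the hub dump in 3220.0.
sample = """\
Entry = 7
1 : PS problem! AC_OK,DC_BAD,48V_OK
Entry = 6
2 : PS problem! AC_OK,DC_BAD,48V_OK
Entry = 5
2 : PS problem! AC_OK,DC_BAD,48V_OK
Entry = 3
1 : PS problem! AC_OK,DC_BAD,48V_OK
"""

for (supply, signal), count in sorted(tally_ps_faults(sample).items()):
    print(f"supply {supply}: {signal} bad in {count} entries")
```

Running the same tally on a fresh dump after each swap shows whether the DC_BAD reports follow a particular supply or stay with the hub.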

Chip
[Posted by WWW Notes gateway]
3220.2. "" by NPSS::WADE (Network Systems Support) Thu Feb 01 1996 09:05 (15 lines)
    Hi Wolfgang,
    
    Regarding the DECswitch -
    
    	The first one is crashing while running diagnostics (indicated by
    the 2.1 version in the error logs) and looks to be a hardware problem so 
    I'd suggest swapping it out.
    
    	The second one with the error code = 3000 needs more investigation
    and I'll have the  firmware support engineer look at the error log. 
    You're sure they're running 1.5.2?
    
    Bill
    
    
3220.3. "More info on interpretation of error logs...." by NETCAD::BATTERSBY () Thu Feb 01 1996 09:24 (21 lines)
    To add to what Bill said, yes, the 4 entries in the first
    DECswitch are diagnostic errors.
    The 4 Test IDs indicate the following logic areas of failure:
    
    Test ID = E03   - Fddi Internal Loopback Test failure
    Test ID = 911   - lance accept phy test failure
    Test ID = 814   - biga pm dpath test 
    Test ID = E03   - Fddi Internal Loopback Test failure
    
    The first error log entry and the fourth are the same test. So at
    some point in time 3 discrete errors occurred, and then there was a
    subsequent occurrence of E03.
    The Test IDs 911 and 814 indicate a problem with the first Ethernet
    port (port 2 of the box).
    The second module, with the error code of 3000, looks like an exception
    vector code. But like Bill said, we're going to get a firmware expert's
    opinion on this one.
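
    For anyone who wants to apply the list above to a captured log, here is a small lookup sketch. The table contains only the three Test IDs decoded in this reply, with the descriptions as written here. It is not a complete list, and other modules may use different IDs. The function names are made up for the example.

```python
# Test IDs and descriptions exactly as quoted in this reply (3220.3).
# This is not a complete list; other modules may use other IDs.
TEST_IDS = {
    "E03": "Fddi Internal Loopback Test failure",
    "911": "lance accept phy test failure",
    "814": "biga pm dpath test",
}


def describe_test_id(test_id):
    """Return the description for a Test ID, or a placeholder if unknown."""
    return TEST_IDS.get(test_id.strip().upper(), "unknown (not listed in this note)")


def summarize(test_ids):
    """Collapse repeated Test IDs, keeping first-seen order."""
    seen = {}
    for tid in test_ids:
        key = tid.strip().upper()
        seen[key] = seen.get(key, 0) + 1
    return [(tid, count, describe_test_id(tid)) for tid, count in seen.items()]


# Test IDs of entries 3, 2, 1, 0 from the slot 4 dump in 3220.0.
for tid, count, text in summarize(["E03", "911", "814", "E03"]):
    print(f"{tid} x{count}: {text}")
```

    Collapsing the repeats makes it visible at a glance that the slot 4 module logged three distinct diagnostic failures, one of them twice.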
    
    Bob
    
    
3220.4. "some answer from customer with power problem" by BERFS4::NORD () Thu Feb 01 1996 11:16 (27 lines)

	Hi Chip, and all the others who are answering my questions,

	some more input from the customer with my first entry (power):

	He is using a DEChub Multiswitch 900 with one DECswitch 900 EF in slot
	8 and one DECrepeater 90T. All works fine; the DS900EF has no problem. He
	connected a PC to the DR90T, started HUBwatch, and then connected to the
	agent. At this moment the DS900EF starts sending heavy traffic on all
	ports, and only to the backplane, as if it were configured for
	backplane only. The customer connected a LANanalyzer to the DR90T to see
	what is going on on the ThinWire port in the backplane. He saw a
	utilization above 70%, but there is no protocol the LANanalyzer can
	recognize, only "wild" bits, no Ethernet address with vendor code.
	This DS900EF provides the IP service for the backplane. He has tested
	this switch on different backplanes, and the problem only occurs if
	he wants to run HUBwatch on this backplane and the switch delivers the
	IP service for this backplane.

	Any hints?

	Greetings

	Wolfgang Nord
	MCS Berlin, Germany

3220.5. "it's really v1.5.2, as HUBwatch said" by BERFS4::NORD () Thu Feb 01 1996 11:25 (20 lines)

	Hi Bill, hi Bob, hi to the rest of the world,

	yes, it's version 1.5.2; HUBwatch told me so when I put the
	switch into a backplane in our demo room and ran HUBwatch.

	Bill, as you know, some time ago we had a problem with a
	broadcast address used as a source address, which confused our switches.
	There was a version 1.5.2 then, but engineering had not changed the
	revision level; that was done at the official release date for version
	1.5.2. (Like a short hack.)

	This version was loaded into the switch by Robert Krause from NPBU, but
	I don't know where he got this old (unpatched) version.

	Greetings to Boston and the rest of the world

	Wolfgang Nord
	MCS Berlin, Germany
3220.6. "" by NPSS::WADE (Network Systems Support) Thu Feb 01 1996 12:45 (5 lines)
    There was only one 1.5.2 that went out during late Aug-95.  If it says
    1.5.2 then it is the latest image.
    
    Bill
    
3220.7. "" by NPSS::WADE (Network Systems Support) Thu Feb 01 1996 12:46 (20 lines)
    
Someone from the common code team needs to look at this error -


Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 28
Timestamp     =    0   1A 4DCE
Write Count   = 5
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2000 PC=03072C78 Error Code=00003000 ProcCsr=5D6D
Line #        = 1026
File          = ncsh.c
Dump another entry [Y]/N? 



3220.8. "some more errors" by BERFS4::NORD () Mon Feb 05 1996 10:49 (154 lines)

	Hello all of you,

	here are some more error log entries that the customer saw and wants
	to have explained:


PEswitch 900TX - slot 6
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 7

==============================================================================


Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.1
Reset Count   = 3
Timestamp     =    0    0    0
Write Count   = 5
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=4F69
Line #	      = 608
File          = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y


Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.1
Reset Count   = 3
Timestamp     =    0    0    0
Write Count   = 5
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=4F69
Line #	      = 608
File          = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y


Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.1
Reset Count   = 1
Timestamp     =    0    0    0
Write Count   = 5
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=6769
Line #	      = 608
File          = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y

Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.1
Reset Count   = 7
Timestamp     =    0    0    0
Write Count   = 5
FRU Mask      = 2
Test ID       = DEAD
Error Data    = SR=2700 PC=030390ba Error Code=00003000 ProcCsr=4f69
Dump another entry [Y]/N? 

DECswitch 900EF - slot 7
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 71

==============================================================================


Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Timestamp     =    0    0    0
Write Count   = 6808
FRU Mask      = 2
Test ID       = B01
Error Data    = SR=0010 PC=00000020 Error Code=00000002 ProcCsr=0000

                 0:00000010  1:00000020  2:00000002  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 


Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Timestamp     =    0    0    0
Write Count   = 6808
FRU Mask      = 2
Test ID       = A60
Error Data    = SR=0002 PC=00000006 Error Code=00000000 ProcCsr=0000

                 0:00000002  1:00000006  2:00000000  3:80006060
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 

Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Timestamp     =    0    0    0
Write Count   = 6807
FRU Mask      = 2
Test ID       = A50
Error Data    = SR=0002 PC=00000006 Error Code=00000000 ProcCsr=0000

                 0:00000002  1:00000006  2:00000000  3:80006060
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 

Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Timestamp     =    0    0    0
Write Count   = 6807
FRU Mask      = 2
Test ID       = 961
Error Data    = SR=0002 PC=00000043 Error Code=00000000 ProcCsr=0000

                 0:00000002  1:00000043  2:00000000  3:80006060
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 

	The customer is a little bit confused about these errors, because there
	is no documentation about "Test ID", "Timestamp" and so on, and so I'm
	unable to explain to the customer what these errors are.


	Any help is welcome

	Thanks a lot

	Wolfgang Nord
	MCS Berlin, Germany
3220.9. "Some answers on error messages...." by NETCAD::BATTERSBY () Mon Feb 05 1996 12:16 (20 lines)
    The errors seen on the DECswitch 900EF in slot 7 are all diagnostic
    errors as follows. This unit should probably be returned as there
    appear to be problems with several of the Ethernet ports.
    
    Test ID       = B01   "LANCES - All 901s Int Loopbk Test"
    Test ID       = A60   "IMBI MAC6 Int/Ext loopback Test"
    Test ID       = A50   "IMBI MAC5 Int/Ext loopback Test"
    Test ID       = 961   "LANCE P7 - Accept PHY Test"
    
    The error messages seen in the PEswitch 900TX in slot 6 appear to be
    firmware errors, and someone is looking into those.
    In general, the error codes, Test ID, Timestamp fields are primarily
    for Manufacturing personnel, Digital service personnel, and Engineering
    personnel to use for interpretation, and as such the primary intended
    audience for this information are those trained in the interpretation
    of this information. The documentation normally provided to our customers for
    products like the HUB products is not intended to provide this level of
    detailed information.
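
    For those who do have to read these logs, it can help to pull the fields out of a captured console session first, so entries can be compared side by side. The following is only a sketch: it assumes every field appears as "Name = value" on its own line and that each entry starts with "Entry #", which matches the module dumps posted in this topic. `parse_entries` is a made-up name, not part of any Digital tool.

```python
import re

# A field line looks like "Test ID       = B01"; the key may contain
# letters, spaces and '#'. Raw hex dump lines ("0:00000010 ...") start
# with a digit and are skipped.
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z #]*?)\s*=\s*(.*?)\s*$")


def parse_entries(dump_text):
    """Split a captured DUMP ERROR LOG session into one dict per entry."""
    entries = []
    current = None
    for line in dump_text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # banner, prompt, blank line or raw hex dump
        key, value = match.group(1), match.group(2)
        if key == "Entry #":
            current = {}  # assumption: "Entry #" opens a new entry
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


# Two entries copied from the slot 7 dump in 3220.8.
sample = """\
Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Test ID       = B01
Error Data    = SR=0010 PC=00000020 Error Code=00000002 ProcCsr=0000

                 0:00000010  1:00000020  2:00000002  3:00000000
Dump another entry [Y]/N?
Entry #       = 0
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 70
Test ID       = A60
Error Data    = SR=0002 PC=00000006 Error Code=00000000 ProcCsr=0000
"""

for entry in parse_entries(sample):
    print(entry["Entry #"], entry["Firmware Rev"], entry["Test ID"])
```

    The result is a plain list of dicts, so grouping by Test ID or Reset Count is a one-liner once the dump has been captured to a file.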
    
    Bob
3220.10. "some more infos needed..." by BERFS4::NORD () Fri Feb 09 1996 13:20 (139 lines)

	Hi Bob, hi to the rest of the world of HUB900-products,

	first of all:
		Many thanks for all the answers I got from you to all my
		questions; I really mean that.


	OK, I understand: don't give it to the customer, it is only for internal
	use by field service personnel/engineering.

	But I am a so-called field service engineer, and I have to look at the
	error log and interpret the meaning of "timestamp", "test id"
	and "error data".

	Concretely:
	- How to interpret "timestamp"?
	- Which "test id"s are possible?
	- Which "error codes" are possible?

	These are my main questions, and I want answers to them.

	The next question is (because you wrote: ... and as such the primary
	intended audience for this information are those trained in the
	interpretation of this information ...):

		Is there a training for service engineers, so that they can
		interpret/understand all the information the "error log" gives us?

	OK, you are sitting there and clapping your hands together over your
	head (a German phrase: "Die Hände über dem Kopf zusammenschlagen",
	used when someone is confused/astonished about something) and thinking:
	What do these Germans want! But my job is the network, and the products
	are sometimes from our 900 series. So I think I have to know whether
	it is a firmware or a hardware problem (Bill Wade knows about a problem
	we had here in Berlin with some DECswitch 900s which got
	confused by a source address that was a broadcast address, all "F"s,
	which led to DEFBA firmware version 1.5.2), and I need a helping hand
	without escalating and writing an ITMP. Which is best: that I know some
	basics, or that I have to escalate or do a swap where it wasn't needed?

	You will find attached another error log from a DECswitch 900, but
	there are some registers I can't interpret, and I hope there is
	someone "with a helping hand".

	If there is a problem with discussing this "online" (in this note),
	my mail-stop is BERFS4::NORD. I need your help!




DECswitch 900EF - slot 7
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 83

==============================================================================


Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 78
Timestamp     =    0    0   63
Write Count   = 9
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2104 PC=030395A0 Error Code=000023C0 ProcCsr=5E6D
Registers     = D0=00000000 D1=00002101 D2=00000001 D3=00002000
                D4=0004B8C0 D5=00000000 D6=00000000 D7=0000FFFF
                A0=0000002C A1=04427825 A2=0004B8B4 A3=00068D30
                A4=030020D8 A5=03020000 A6=0004B8BC A7=0004B880
Dump another entry [Y]/N? 

Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 11
Firmware Rev  = 1.5
Reset Count   = 41
Timestamp     =    0    0   28
Write Count   = 9
FRU Mask      = 0
Test ID       = 3DB
Error Data    = SR=00002000 PC=0009378A ErrorCode=00000005
Registers     = Phy1Csr     =000003DB ElmBase     =00000000 MacBase    =00000000
                CamCsr      =0000823F CamData15_00=00000000 PmCsr      =00001415
                CamData31_16=00004300 CamData47_32=00008001 PortDataA  =00000001
                RtosTimer   =00000030 RtosTimerVal=00000011 PortDataB  =00000000
                i68k68kInt  =00000000 i68k68kMask =000001FF DmaInt     =00000036
                i68kForceInt=00000000 DmaMask     =00000000 HostData   =00000000
                HostInt0Mask=00000000 HostInt0    =000000C0 PortStatus =00000500
                PortCtrlMask=00007FFF HostDmaMask =00005000 PortCtrlInt=00000000
                FmcControl  =00000032 FmcStatus   =0000E000 FmcInt     =00000000
Dump another entry [Y]/N? 

Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.4
Reset Count   = 16
Timestamp     =    0    0    0
Write Count   = 9
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2700 PC=000757A8 Error Code=00002010 ProcCsr=4769
Registers     = D0=00000002 D1=00000001 D2=00000004 D3=00000800
                D4=00000002 D5=00000000 D6=00000000 D7=0000FFFF
                A0=00075726 A1=00050238 A2=00062638 A3=0444E018
                A4=030020D8 A5=03020000 A6=0004A8C4 A7=0004A8B0
Dump another entry [Y]/N? 

Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.4
Reset Count   = 20
Timestamp     =    0    0    0
Write Count   = 9
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2710 PC=000758E0 Error Code=0000200C ProcCsr=6F69
Registers     = D0=00000001 D1=FFFFFFFF D2=00000400 D3=00000800
                D4=00000002 D5=00000000 D6=00000000 D7=0000FFFF
                A0=00051F38 A1=00050238 A2=00062638 A3=0444E018
                A4=030020D8 A5=03020000 A6=000FD803 A7=0004A8BD
Dump another entry [Y]/N? 



	Many thanks for reading these lines. I need answers to do my job,
	and I think you can help me!

	Many thanks in advance

	Wolfgang Nord
	MCS Berlin, Germany
3220.11. "" by NETCAD::MILLBRANDT (answer mam) Fri Feb 09 1996 15:32 (15 lines)
Hi Wolfgang -

The 900 Hub and its modules are a family, but a family
of individualists, and getting more so all the time.

Timestamps in errorlogs mean the same thing in most devices.
A timestamp is a count of the number of 10ms time intervals
that the device has been up.  In hex.  Don't let the spaces
in between lead you to think there is a separate field for
days or hours or minutes.  It's all one big count.
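
As a worked example of reading the field as one hex count of 10 ms intervals, here is a rough sketch. Note the assumption: it treats each space-separated group as one 16-bit word, most significant first, which fits the three-group module dumps in this topic (for example "0 1A 4DCE") but is only a guess about the layout. It deliberately rejects groups longer than four hex digits, such as the hub's "0 20400". The function names are invented for the example.

```python
from datetime import timedelta

TICKS_PER_SECOND = 100  # one tick = 10 ms, as described above


def timestamp_to_ticks(field):
    """Join the hex groups of a Timestamp field into one tick count.

    Assumption: each group is a 16-bit word, most significant first.
    """
    ticks = 0
    for group in field.split():
        if len(group) > 4:
            raise ValueError(f"group {group!r} is wider than 16 bits")
        ticks = (ticks << 16) | int(group, 16)
    return ticks


def timestamp_to_uptime(field):
    """Convert a Timestamp field to the uptime at the time of the error."""
    return timedelta(milliseconds=timestamp_to_ticks(field) * 10)


# Timestamp fields copied from the dumps in 3220.0 and 3220.10.
for field in ("   0   1A 4DCE", "   0    0   63", "   0    0   28"):
    ticks = timestamp_to_ticks(field)
    print(field.strip(), "->", ticks, "ticks,", timestamp_to_uptime(field))
```

Under that assumption the slot 5 entry in 3220.0 (0 1A 4DCE) works out to 1723854 ticks, or roughly 4 hours 47 minutes of uptime, while the entries in 3220.10 were logged within the first second after reset.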

What a test id is and what an error code is depends on
what device is doing the dumping.

	Dotsie
3220.12. "" by NPSS::WADE (Network Systems Support) Mon Mar 18 1996 11:57 (8 lines)
    Wolfgang,
    
    I need to ask again; can you confirm that the error log entry listed in 
    3220.7 was logged against DEFBA 1.5.2 and not 1.5.0?  Was the DEFBA running
    with 1.5.0 prior to installing 1.5.2? 
    
    Bill