T.R | Title | User | Personal Name | Date | Lines |
---|
3220.1 | probably a bad PS | NETRIX::"[email protected]" | Chip Boyle | Wed Jan 31 1996 16:34 | 15 |
| Wolfgang,
Could you please obtain the following information from your customer:
1) Number of power supplies in the DEChub900.
2) Type of each power supply.
3) Exact line card configuration in the DEChub900
(slot position, module type, and version).
4) What set of events cause this problem?
It appears to be a problem with a power supply, but not necessarily
with power supplies 1 & 2. Without knowing how your customer is
producing this bug, I would say the best thing to do is swap out
power supplies while checking the error log in between each swap.
Chip
|
3220.2 | | NPSS::WADE | Network Systems Support | Thu Feb 01 1996 09:05 | 15 |
| Hi Wolfgang,
Regarding the DECswitch -
The first one is crashing while running diagnostics (indicated by
the 2.1 version in the error logs) and looks to be a hardware problem so
I'd suggest swapping it out.
The second one with the error code = 3000 needs more investigation
and I'll have the firmware support engineer look at the error log.
You're sure they're running 1.5.2?
Bill
|
3220.3 | More info on interpretation of error logs.... | NETCAD::BATTERSBY | | Thu Feb 01 1996 09:24 | 21 |
| To expand on what Bill said: yes, the 4 entries in the first
DECswitch are diagnostic errors.
The 4 Test IDs indicate the following logic areas of failure:
Test ID = E03 - Fddi Internal Loopback Test failure
Test ID = 911 - lance accept phy test failure
Test ID = 814 - biga pm dpath test
Test ID = E03 - Fddi Internal Loopback Test failure
The first error log entry and the fourth are the same test. So at
some point in time 3 discrete errors occurred, and then there was a
subsequent occurrence of E03.
The Test IDs 911 and 814 indicate a problem with the first Ethernet
port (port 2 of the box).
The second module, with the error code of 3000, looks like an exception
vector code. But as Bill said, we're going to get a firmware expert's
opinion on this one.
Bob
|
3220.4 | some answer from customer with power problem | BERFS4::NORD | | Thu Feb 01 1996 11:16 | 27 |
|
Hi Chip, and all the others who are answering my questions,
some more input from the customer in my first entry (power):
He is using a DEChub Multiswitch 900 with one DECswitch 900 EF in slot
8 and one DECrepeater 90T. Everything works fine; the DS900EF has no
problem. He connected a PC to the DR90T, started HUBwatch and then
connected to the agent. At that moment, the DS900EF started sending
heavy traffic on all its ports, but only toward the backplane, as if
it were configured for backplane only. The customer connected a LAN
analyzer to the DR90T to see what was going on at the ThinWire port in
the backplane. He saw a utilization above 70%, but no protocol that
the LAN analyzer could recognize, only "wild" bits and no Ethernet
address with a vendor code.
This DS900EF provides the IP service for the backplane. He has tested
this switch on different backplanes, and the problem only occurs when
he runs HUBwatch on a backplane for which this switch provides the
IP service.
Any hints?
Greetings
Wolfgang Nord
MCS Berlin, Germany
|
3220.5 | it's really v1.5.2, as HUBwatch said | BERFS4::NORD | | Thu Feb 01 1996 11:25 | 20 |
|
Hi Bill, hi Bob, hi to the rest of the world,
yes, it is version 1.5.2. HUBwatch reported it when I moved the
switch into a backplane in our demo room and ran HUBwatch.
Bill, do you remember that some time ago we had a problem with the
broadcast address used as a source address, which was confusing our
switches? There was a version 1.5.2 then, but engineering had not
changed the revision level; that was done only at the official release
date for version 1.5.2. (Like a quick hack.)
That version was loaded into the switch by Robert Krause from NPBU, but
I don't know where he got this old (unpatched) version.
Greetings to Boston and the rest of the world
Wolfgang Nord
MCS Berlin, Germany
|
3220.6 | | NPSS::WADE | Network Systems Support | Thu Feb 01 1996 12:45 | 5 |
| There was only one 1.5.2 that went out during late Aug-95. If it says
1.5.2 then it is the latest image.
Bill
|
3220.7 | | NPSS::WADE | Network Systems Support | Thu Feb 01 1996 12:46 | 20 |
|
Someone from the common code team needs to look at this error -
Entry # = 0
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.5
Reset Count = 28
Timestamp = 0 1A 4DCE
Write Count = 5
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2000 PC=03072C78 Error Code=00003000 ProcCsr=5D6D
Line # = 1026
File = ncsh.c
Dump another entry [Y]/N?
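For comparing entries like this one across modules, it can help to split
a dump into named fields first. The following is only a minimal sketch in
Python, not a Digital tool: the names parse_entry and ENTRY_STATUS are
made up for illustration, the status names come from the legend printed in
the dump itself, and the sketch assumes every field sits on one
"name = value" line, as in the entry above.

```python
import re

# Status names taken from the legend printed in the dump itself.
ENTRY_STATUS = {0: "valid", 1: "write_error", 2: "invalid", 3: "empty", 4: "crc_error"}

SAMPLE = """\
Entry # = 0
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.5
Reset Count = 28
Timestamp = 0 1A 4DCE
Write Count = 5
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2000 PC=03072C78 Error Code=00003000 ProcCsr=5D6D
Line # = 1026
File = ncsh.c
Dump another entry [Y]/N?
"""


def parse_entry(text):
    """Split one dumped error log entry into a dict (hypothetical helper)."""
    entry = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # prompt lines such as "Dump another entry" carry no field
        key, value = key.strip(), value.strip()
        if key == "Error Data":
            # Sub-fields look like "SR=2000" or "Error Code=00003000", all in hex.
            pairs = re.findall(r"([A-Za-z][A-Za-z ]*?)=([0-9A-Fa-f]+)", value)
            entry[key] = {name.strip(): int(number, 16) for name, number in pairs}
        elif key == "Entry Status":
            # Keep only the leading number and translate it with the legend.
            entry[key] = ENTRY_STATUS.get(int(value.split()[0]), "unknown")
        else:
            entry[key] = value
    return entry


entry = parse_entry(SAMPLE)
print(entry["Test ID"], hex(entry["Error Data"]["PC"]), entry["Entry Status"])
```

The values other than Error Data and Entry Status are kept as plain
strings, since what they mean depends on the module that wrote the entry.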
|
3220.8 | some more errors | BERFS4::NORD | | Mon Feb 05 1996 10:49 | 154 |
|
Hello all of you,
here are some more error log entries that the customer saw and would
like to have explained:
PEswitch 900TX - slot 6
==============================================================================
DUMP ERROR LOG
Current Reset Count: 7
==============================================================================
Entry # = 3
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.1
Reset Count = 3
Timestamp = 0 0 0
Write Count = 5
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=4F69
Line # = 608
File = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y
Entry # = 2
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.1
Reset Count = 3
Timestamp = 0 0 0
Write Count = 5
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=4F69
Line # = 608
File = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y
Entry # = 1
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.1
Reset Count = 1
Timestamp = 0 0 0
Write Count = 5
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2700 PC=030390BA Error Code=00003000 ProcCsr=6769
Line # = 608
File = /proj1023/pe100/work/hub-mgmt/duart.c
Dump another entry [Y]/N? y
Entry # = 0
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.1
Reset Count = 7
Timestamp = 0 0 0
Write Count = 5
FRU Mask = 2
Test ID = DEAD
Error Data = SR=2700 PC=030390ba Error Code=00003000 ProcCsr=4f69
Dump another entry [Y]/N?
DECswitch 900EF - slot 7
==============================================================================
DUMP ERROR LOG
Current Reset Count: 71
==============================================================================
Entry # = 1
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 1
Firmware Rev = 2.1
Reset Count = 70
Timestamp = 0 0 0
Write Count = 6808
FRU Mask = 2
Test ID = B01
Error Data = SR=0010 PC=00000020 Error Code=00000002 ProcCsr=0000
0:00000010 1:00000020 2:00000002 3:00000000
4:00000000 5:00000000 6:00000000 7:00000000
Dump another entry [Y]/N?
Entry # = 0
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 1
Firmware Rev = 2.1
Reset Count = 70
Timestamp = 0 0 0
Write Count = 6808
FRU Mask = 2
Test ID = A60
Error Data = SR=0002 PC=00000006 Error Code=00000000 ProcCsr=0000
0:00000002 1:00000006 2:00000000 3:80006060
4:00000000 5:00000000 6:00000000 7:00000000
Dump another entry [Y]/N?
Entry # = 3
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 1
Firmware Rev = 2.1
Reset Count = 70
Timestamp = 0 0 0
Write Count = 6807
FRU Mask = 2
Test ID = A50
Error Data = SR=0002 PC=00000006 Error Code=00000000 ProcCsr=0000
0:00000002 1:00000006 2:00000000 3:80006060
4:00000000 5:00000000 6:00000000 7:00000000
Dump another entry [Y]/N?
Entry # = 2
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 1
Firmware Rev = 2.1
Reset Count = 70
Timestamp = 0 0 0
Write Count = 6807
FRU Mask = 2
Test ID = 961
Error Data = SR=0002 PC=00000043 Error Code=00000000 ProcCsr=0000
0:00000002 1:00000043 2:00000000 3:80006060
4:00000000 5:00000000 6:00000000 7:00000000
Dump another entry [Y]/N?
The customer is a little confused about these errors, because there
is no documentation about "Test ID", "Timestamp" and so on, and so I'm
unable to explain to the customer what these errors are.
Any help is welcome.
Thanks a lot
Wolfgang Nord
MCS Berlin, Germany
|
3220.9 | Some answers on error messages.... | NETCAD::BATTERSBY | | Mon Feb 05 1996 12:16 | 20 |
| The errors seen on the DECswitch 900EF in slot 7 are all diagnostic
errors as follows. This unit should probably be returned as there
appear to be problems with several of the Ethernet ports.
Test ID = B01 "LANCES - All 901s Int Loopbk Test"
Test ID = A60 "IMBI MAC6 Int/Ext loopback Test"
Test ID = A50 "IMBI MAC5 Int/Ext loopback Test"
Test ID = 961 "LANCE P7 - Accept PHY Test"
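For quick reference while reading a dump, the four IDs above can be kept
in a small lookup table. This is only a sketch in Python: the table holds
nothing beyond the IDs listed in this reply for the DECswitch 900EF, it is
not a complete list, and the names DECSWITCH_900EF_TEST_IDS and
describe_test_id are invented for the example.

```python
# Test IDs quoted in this reply for the DECswitch 900EF only.
# Other modules may use the same numbers for different tests.
DECSWITCH_900EF_TEST_IDS = {
    "B01": "LANCES - All 901s Int Loopbk Test",
    "A60": "IMBI MAC6 Int/Ext loopback Test",
    "A50": "IMBI MAC5 Int/Ext loopback Test",
    "961": "LANCE P7 - Accept PHY Test",
}


def describe_test_id(test_id):
    """Return the test name for a Test ID, or say that it is not in the table."""
    key = test_id.strip().upper()
    return DECSWITCH_900EF_TEST_IDS.get(key, f"{key}: not in this table")


for test_id in ("B01", "A60", "A50", "961"):
    print(test_id, "=", describe_test_id(test_id))
```

An ID that is missing from the table is reported as such rather than
guessed at.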
The error messages seen in the PEswitch 900TX in slot 6 appear to be
firmware errors, and someone is looking into those.
In general, the error code, Test ID, and Timestamp fields are primarily
for Manufacturing personnel, Digital service personnel, and Engineering
personnel to use for interpretation, and as such the primary intended
audience for this information are those trained in the interpretation
of this information. The documentation normally provided to our
customers for products like the HUB products is not intended to provide
this level of detailed information.
Bob
|
3220.10 | some more infos needed... | BERFS4::NORD | | Fri Feb 09 1996 13:20 | 139 |
|
Hi Bob, hi to the rest of the world of HUB900-products,
first of all:
Many thanks for all the answers you have given to my
questions; I sincerely mean that.
OK, I understand: don't give it to the customer, it is only for
internal use by field service personnel and engineering.
But I am a so-called field service engineer, and I have to look at the
error log and interpret the meaning of "timestamp", "test id"
and "error data".
Concretely:
- How do I interpret "timestamp"?
- Which "test id"s are possible?
- Which "error codes" are possible?
These are my main questions, and I would like answers to them.
The next question is (because you wrote: ... and as such the primary
intended audience for this information are those trained in the
interpretation of this information ...):
Is there a training course for service engineers, so that they can
interpret and understand all the information the error log gives us?
OK, you are sitting there clapping your hands together over your
head (a German phrase: "Die Hände über dem Kopf zusammenschlagen",
used when someone is confused or astonished about something) and
thinking: What do these Germans want! But my job is the network, and
the products are sometimes from our 900 series. So I think I have to
know whether a problem is a firmware or a hardware problem (Bill Wade
knows about a problem we had here in Berlin with some DECswitch 900s
that were confused by a source address that was a broadcast address,
all "F"s, which led to DEFBA firmware version 1.5.2), and I need a
helping hand without escalating and writing an ITMP. Which is better:
that I know some basics, or that I have to escalate or do a swap where
none was needed?
Below you will find another error log from a DECswitch 900. It
contains some registers that I can't interpret, and I hope there is
someone "with a helping hand".
If there is a problem with discussing this "online" (in this note),
my mail stop is BERFS4::NORD. I need your help!!!
DECswitch 900EF - slot 7
==============================================================================
DUMP ERROR LOG
Current Reset Count: 83
==============================================================================
Entry # = 3
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.5
Reset Count = 78
Timestamp = 0 0 63
Write Count = 9
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2104 PC=030395A0 Error Code=000023C0 ProcCsr=5E6D
Registers = D0=00000000 D1=00002101 D2=00000001 D3=00002000
D4=0004B8C0 D5=00000000 D6=00000000 D7=0000FFFF
A0=0000002C A1=04427825 A2=0004B8B4 A3=00068D30
A4=030020D8 A5=03020000 A6=0004B8BC A7=0004B880
Dump another entry [Y]/N?
Entry # = 2
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 11
Firmware Rev = 1.5
Reset Count = 41
Timestamp = 0 0 28
Write Count = 9
FRU Mask = 0
Test ID = 3DB
Error Data = SR=00002000 PC=0009378A ErrorCode=00000005
Registers = Phy1Csr =000003DB ElmBase =00000000 MacBase =00000000
CamCsr =0000823F CamData15_00=00000000 PmCsr =00001415
CamData31_16=00004300 CamData47_32=00008001 PortDataA =00000001
RtosTimer =00000030 RtosTimerVal=00000011 PortDataB =00000000
i68k68kInt =00000000 i68k68kMask =000001FF DmaInt =00000036
i68kForceInt=00000000 DmaMask =00000000 HostData =00000000
HostInt0Mask=00000000 HostInt0 =000000C0 PortStatus =00000500
PortCtrlMask=00007FFF HostDmaMask =00005000 PortCtrlInt=00000000
FmcControl =00000032 FmcStatus =0000E000 FmcInt =00000000
Dump another entry [Y]/N?
Entry # = 1
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.4
Reset Count = 16
Timestamp = 0 0 0
Write Count = 9
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2700 PC=000757A8 Error Code=00002010 ProcCsr=4769
Registers = D0=00000002 D1=00000001 D2=00000004 D3=00000800
D4=00000002 D5=00000000 D6=00000000 D7=0000FFFF
A0=00075726 A1=00050238 A2=00062638 A3=0444E018
A4=030020D8 A5=03020000 A6=0004A8C4 A7=0004A8B0
Dump another entry [Y]/N?
Entry # = 0
Entry Status = 0 [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id = 10
Firmware Rev = 1.4
Reset Count = 20
Timestamp = 0 0 0
Write Count = 9
FRU Mask = 0
Test ID = DEAD
Error Data = SR=2710 PC=000758E0 Error Code=0000200C ProcCsr=6F69
Registers = D0=00000001 D1=FFFFFFFF D2=00000400 D3=00000800
D4=00000002 D5=00000000 D6=00000000 D7=0000FFFF
A0=00051F38 A1=00050238 A2=00062638 A3=0444E018
A4=030020D8 A5=03020000 A6=000FD803 A7=0004A8BD
Dump another entry [Y]/N?
Many thanks for reading these lines. I need answers to do my job,
and I think you can help me!!!
Many thanks in advance
Wolfgang Nord
MCS Berlin, Germany
|
3220.11 | | NETCAD::MILLBRANDT | answer mam | Fri Feb 09 1996 15:32 | 15 |
| Hi Wolfgang -
The 900 Hub and its modules are a family, but a family
of individualists, and getting more so all the time.
Timestamps in errorlogs mean the same thing in most devices.
A timestamp is a count of the number of 10ms time intervals
that the device has been up. In hex. Don't let the spaces
in between lead you to think there is a separate field for
days or hours or minutes. It's all one big count.
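As a worked example, here is a minimal Python sketch of that conversion.
It rests on one assumption that is not confirmed anywhere in this note:
that each space-separated group is a 16-bit word printed without leading
zeros, most significant word first. If a device prints the groups
differently, the padding width has to change. The function name
timestamp_to_uptime is made up for the example.

```python
from datetime import timedelta

TICK_MS = 10  # one tick is a 10 ms interval, as described above


def timestamp_to_uptime(field, word_hex_digits=4):
    """Convert a Timestamp field such as '0 1A 4DCE' to (ticks, uptime).

    ASSUMPTION: each group is one 16-bit word with leading zeros dropped,
    most significant word first. This is a guess about the dump format.
    """
    groups = field.split()
    # Pad each group back to full width, then read the whole thing as one count.
    hex_string = "".join(group.zfill(word_hex_digits) for group in groups)
    ticks = int(hex_string, 16)
    return ticks, timedelta(milliseconds=ticks * TICK_MS)


ticks, uptime = timestamp_to_uptime("0 1A 4DCE")
print(ticks, uptime)  # 1723854 ticks, a little under 4 h 48 min of uptime
```

Under the same assumption, a field of "0 0 63" is 99 ticks, or just
under one second after the device came up.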
What a test id is and what an error code is depends on
what device is doing the dumping.
Dotsie
|
3220.12 | | NPSS::WADE | Network Systems Support | Mon Mar 18 1996 11:57 | 8 |
| Wolfgang,
I need to ask again; can you confirm that the error log entry listed in
3220.7 was logged against DEFBA 1.5.2 and not 1.5.0? Was the DEFBA running
with 1.5.0 prior to installing 1.5.2?
Bill
|