Title: | Ask the Storage Architecture Group |
Notice: | Check out our web page at http://www-starch.shr.dec.com |
Moderator: | SSAG::TERZA N |
Created: | Wed Oct 15 1986 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 6756 |
Total number of notes: | 25276 |
We had a problem with an AlphaServer 4100 running Digital UNIX V4.0B.
The KZPSC controlling a set of 5 drives (RAID 5) would time out
under heavy I/O doing random-access writes. We could not reproduce the
problem doing serial I/O. It turned out that the cache SIMM was the problem.
Support was unable to determine the actual problem from the error report
generated by DECevent. Does anyone have register information that would help
in determining that this was a cache problem from the error information, or
do we have to replace parts one at a time until the problem goes away?
DECevent gave more information than uerf, but turned out to be no more
useful.
The system configuration is:
AlphaServer 4100
2 KZPSC 3-channel PCI controllers + BBU + 32MB cache
Only one channel is used on each controller, and each is connected to
a BA356-SC with 16-bit I/O and 5 RZ29B-VW disks.
Each shelf is configured as a 5-member RAID-5 set.
Everything is at the latest revision.
******************************** ENTRY 144 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 2.
Timestamp of occurrence 14-MAY-1997 14:08:01
Host name silver
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 198. SWXCR RAID Controller Event
------ Device Data ------
Class x00 RAID Disk
Subsystem x20 SWXCR Mport/RAID Controller
Number of Packets 5.
------ Packet Type ------ 258. Module Name String
Routine Name xcr_cmd_timeout
------ Packet Type ------ 256. Generic String
Controller has stopped responding
------ Packet Type ------ 260. Hardware Error String
Error Type Hard Error Detected
------ Packet Type ------ 256. Generic String
Controller Softc at time of error
------ Packet Type ------ 512. SWXCR Softc(XCR_SOFTC)
*sc_bus_name xFFFFFC00006D0F10
Controller Number x00000001
Controller Version x00000000
*sc_ctrl xFFFFFC000063FBF8
I/O Handle x0000FB8000100000
Flags x00000002 Needs to be Restarted
Offset to Controller x00000000
Normal Commands Active 60.
Special Commands Active 4.
Command Slots Active 0.
*sc_act_flink xFFFFFC001FE57320
*sc_act_blink xFFFFFC001FE57708
Commands on Pending List 0.
*sc_pend_flink xFFFFFC001FE57050
*sc_pend_blink xFFFFFC001FE57050
*sc_free_flink xFFFFFC001FE570A0
*sc_free_blink xFFFFFC001FE576B8
Command Slots Available 61.
2560. Bytes Cmd Que Data ** Not Printed **
*sc_restartptr xFFFFFC001FE57A78
*sc_do_cmdptr xFFFFFC001FE57A78
******************************** ENTRY 145 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 3.
Timestamp of occurrence 14-MAY-1997 14:10:02
Host name silver
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000001
Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 198. SWXCR RAID Controller Event
------ Device Data ------
Class x00 RAID Disk
Subsystem x20 SWXCR Mport/RAID Controller
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name xcr_p_restart
------ Packet Type ------ 256. Generic String
Controller step 3 failed to clear
------ Packet Type ------ 260. Hardware Error String
Error Type Hard Error Detected
******************************** ENTRY 146 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 14-MAY-1997 14:10:02
Host name silver
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 198. SWXCR RAID Controller Event
------ Device Data ------
Class x00 RAID Disk
Subsystem x20 SWXCR Mport/RAID Controller
Number of Packets 7.
------ Packet Type ------ 258. Module Name String
Routine Name re_complete
------ Packet Type ------ 256. Generic String
I/O failed
------ Packet Type ------ 260. Hardware Error String
Error Type Hard Error Detected
------ Packet Type ------ 256. Generic String
Active XCR_COM at time of error
------ Packet Type ------ 0. SWXCR Communication Block (XCR_COM)
*my_addr xFFFFFC001CF0FA58
Controller Number x00000001
Unit Number on Controller x00000000
Function Status x00000003 Command has Timed Out
Adapter Status x00000000
SWXCR Flags x00000010 BP Points to Buffer
Received by Callback x00000001
*xcr_pdrv_ws xFFFFFC001CF0F928
*xcr_cntrl_ws xFFFFFC001CF0F950
*xcr_trns_ws xFFFFFC001CF0F900
*xcr_bp xFFFFFC0003A5B680
*xcr_cbfcnp xFFFFFC00005B9DC4
*xcr_data_ptr xFFFFFFFF9FE10000
Data Xfer Length 65536.
Number of Scatter Entries 0.
Command Data Length 0.
Block Number x01603580
Xfer Residual Length 65536.
Timeout Value in Seconds 60.
Command x00000002 Read
------ Packet Type ------ 256. Generic String
Active Controller Working Set at time of
error
------ Packet Type ------ 1. Controller/HBA Working Set(CNTRL_WS)
*cntrl_flink xFFFFFC000076CBA0
*cntrl_blink xFFFFFC000076CBA0
*cntrl_softc xFFFFFC001FE57000
*cntrl_com xFFFFFC001CF0FA58
*cntrl_cmd_slot xFFFFFC001FE57320
General Flags x00000001 Command Ready
Command Retry Count 0.
*scsi_addr x0000000000000000
dma_mapping_0 x0000000000000000
dma_mapping_1 x0000000000000000
dma_mapping_sg x0000000000000000
160. Bytes Scatter/Gather ** Not Printed **
Mask Register x00001FFF
Register 0 x82
Register 1 x12
Register 2 x80
Register 3 x40
Register 4 x80
Register 5 x35
Register 6 x60
Register 7 x00
Register 8 xA0
Register 9 xF9
Register a xF0
Register b x9C
Register c x08
Register d x00
Register e x00
Register f x00
******************************** ENTRY 147 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 5.
Timestamp of occurrence 14-MAY-1997 14:10:02
Host name silver
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 198. SWXCR RAID Controller Event
------ Device Data ------
Class x00 RAID Disk
Subsystem x20 SWXCR Mport/RAID Controller
Number of Packets 7.
------ Packet Type ------ 258. Module Name String
Routine Name re_complete
------ Packet Type ------ 256. Generic String
I/O failed
------ Packet Type ------ 260. Hardware Error String
Error Type Hard Error Detected
------ Packet Type ------ 256. Generic String
Active XCR_COM at time of error
------ Packet Type ------ 0. SWXCR Communication Block (XCR_COM)
*my_addr xFFFFFC001CF0F058
Controller Number x00000001
Unit Number on Controller x00000000
Function Status x00000003 Command has Timed Out
Adapter Status x00000000
SWXCR Flags x00000010 BP Points to Buffer
Received by Callback x00000001
*xcr_pdrv_ws xFFFFFC001CF0EF28
*xcr_cntrl_ws xFFFFFC001CF0EF50
*xcr_trns_ws xFFFFFC001CF0EF00
*xcr_bp xFFFFFC000440D040
*xcr_cbfcnp xFFFFFC00005B9DC4
*xcr_data_ptr xFFFFFC0003846000
Data Xfer Length 8192.
Number of Scatter Entries 0.
Command Data Length 0.
Block Number x00002C90
Xfer Residual Length 8192.
Timeout Value in Seconds 60.
Command x00000003 Write
------ Packet Type ------ 256. Generic String
Active Controller Working Set at time of
error
------ Packet Type ------ 1. Controller/HBA Working Set(CNTRL_WS)
*cntrl_flink xFFFFFC001CF0E7D0
*cntrl_blink xFFFFFC000076CBA0
*cntrl_softc xFFFFFC001FE57000
*cntrl_com xFFFFFC001CF0F058
*cntrl_cmd_slot xFFFFFC001FE57708
General Flags x00000001 Command Ready
Command Retry Count 0.
*scsi_addr x0000000000000000
dma_mapping_0 x0000000000000000
dma_mapping_1 x0000000000000000
dma_mapping_sg x0000000000000000
160. Bytes Scatter/Gather ** Not Printed **
Mask Register x00000FFF
Register 0 x03
Register 1 x2B
Register 2 x10
Register 3 x00
Register 4 x90
Register 5 x2C
Register 6 x00
Register 7 x00
Register 8 x00
Register 9 x60
Register a x84
Register b x83
Register c x00
Register d x00
Register e x00
Register f x00
******************************** ENTRY 150 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 8.
Timestamp of occurrence 14-MAY-1997 14:10:02
Host name silver
System type register x00000016 AlphaServer 4000 Series
Number of CPUs (mpnum) x00000002
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 3. High Priority
Entry type 198. SWXCR RAID Controller Event
------ Device Data ------
Class x00 RAID Disk
Subsystem x20 SWXCR Mport/RAID Controller
Number of Packets 7.
------ Packet Type ------ 258. Module Name String
Routine Name re_complete
------ Packet Type ------ 256. Generic String
I/O failed
------ Packet Type ------ 260. Hardware Error String
Error Type Hard Error Detected
------ Packet Type ------ 256. Generic String
Active XCR_COM at time of error
------ Packet Type ------ 0. SWXCR Communication Block (XCR_COM)
*my_addr xFFFFFC001CF0E8D8
Controller Number x00000001
Unit Number on Controller x00000000
Function Status x00000003 Command has Timed Out
Adapter Status x00000000
SWXCR Flags x00000010 BP Points to Buffer
Received by Callback x00000001
*xcr_pdrv_ws xFFFFFC001CF0E7A8
*xcr_cntrl_ws xFFFFFC001CF0E7D0
*xcr_trns_ws xFFFFFC001CF0E780
*xcr_bp xFFFFFC000440CDC0
*xcr_cbfcnp xFFFFFC00005B9DC4
*xcr_data_ptr xFFFFFFFF9FE20000
Data Xfer Length 65536.
Number of Scatter Entries 0.
Command Data Length 0.
Block Number x01603600
Xfer Residual Length 65536.
Timeout Value in Seconds 60.
Command x00000002 Read
------ Packet Type ------ 256. Generic String
Active Controller Working Set at time of
error
------ Packet Type ------ 1. Controller/HBA Working Set(CNTRL_WS)
*cntrl_flink xFFFFFC000076CBA0
*cntrl_blink xFFFFFC000076CBA0
*cntrl_softc xFFFFFC001FE57000
*cntrl_com xFFFFFC001CF0E8D8
*cntrl_cmd_slot xFFFFFC001FE57370
General Flags x00000001 Command Ready
Command Retry Count 0.
*scsi_addr x0000000000000000
dma_mapping_0 x0000000000000000
dma_mapping_1 x0000000000000000
dma_mapping_sg x0000000000000000
160. Bytes Scatter/Gather ** Not Printed **
Mask Register x00001FFF
Register 0 x82
Register 1 x14
Register 2 x80
Register 3 x40
Register 4 x00
Register 5 x36
Register 6 x60
Register 7 x00
Register 8 x20
Register 9 xE8
Register a xF0
Register b x9C
Register c x08
Register d x00
Register e x00
Register f x00
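A dump like the one above is easier to compare entry by entry once it is reduced to a few fields per event. The following is only a rough sketch in Python, not anything DECevent provides: the helper name parse_entries and the field patterns are hypothetical and assume the labels exactly as they appear in this note, and the 512-byte block size is an assumption. The sample text is cut down from entries 146 and 150.

```python
import re

# Trimmed copy of two entries from the log above (labels as they appear in this note).
SAMPLE = """\
******************************** ENTRY 146 ********************************
Timestamp of occurrence 14-MAY-1997 14:10:02
Routine Name re_complete
Function Status x00000003 Command has Timed Out
Data Xfer Length 65536.
Block Number x01603580
Command x00000002 Read
******************************** ENTRY 150 ********************************
Timestamp of occurrence 14-MAY-1997 14:10:02
Routine Name re_complete
Function Status x00000003 Command has Timed Out
Data Xfer Length 65536.
Block Number x01603600
Command x00000002 Read
"""

ENTRY_RE = re.compile(r"^\*+ ENTRY (\d+) \*+$")

# Hypothetical field patterns; each captures the value of interest.
FIELDS = {
    "routine": re.compile(r"^Routine Name\s+(\S+)"),
    "status": re.compile(r"^Function Status\s+x[0-9A-Fa-f]+\s+(.*)$"),
    "length": re.compile(r"^Data Xfer Length\s+(\d+)\."),
    "block": re.compile(r"^Block Number\s+x([0-9A-Fa-f]+)"),
    "command": re.compile(r"^Command\s+x[0-9A-Fa-f]+\s+(\S+)"),
}


def parse_entries(text):
    """Split the report on ENTRY banners and collect selected fields per entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = ENTRY_RE.match(line)
        if banner:
            current = {"entry": int(banner.group(1))}
            entries.append(current)
            continue
        if current is None:
            continue  # text before the first ENTRY banner
        for name, pattern in FIELDS.items():
            found = pattern.match(line)
            if found:
                value = found.group(1)
                if name == "length":
                    value = int(value)        # decimal byte count
                elif name == "block":
                    value = int(value, 16)    # hex block number
                current[name] = value
    return entries


entries = parse_entries(SAMPLE)
timed_out = [e for e in entries if e.get("status") == "Command has Timed Out"]

BLOCK_SIZE = 512  # assumption: 512-byte blocks
first, second = timed_out
block_delta = second["block"] - first["block"]

for e in timed_out:
    print(e["entry"], e["routine"], e["command"], hex(e["block"]), e["length"])
print("block delta:", block_delta, "blocks =", block_delta * BLOCK_SIZE, "bytes")
```

If the 512-byte block size holds, the two timed-out reads in entries 146 and 150 are 128 blocks apart, which equals their 65536-byte transfer length, so they look like back-to-back 64KB reads. That says nothing about the cache by itself, but it shows the kind of cross-entry comparison that is hard to do by eye.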
Thanks,
[Posted by WWW Notes gateway]