
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

6712.0. "Decoding KZPSC cache errors in DECevent logs" by NNTPD::"wilsonru@lrodial_port1.lro.dec.com" (Rudy Wilson) Wed May 21 1997 20:53

We had a problem with an AlphaServer 4100 running Digital UNIX V4.0B.
The KZPSC controlling a set of 5 drives (RAID 5) would time out
under heavy I/O doing random-access writes. We could not duplicate the
problem doing serial I/O. It turned out that the cache SIMM was the problem.
Support was unable to determine the actual problem from the error report
generated by DECevent. Does anyone have register information that would help
in determining that this was a cache problem from the error information, or
do we have to replace parts one at a time until the problem goes away?
DECevent gave more information than uerf, but turned out to be no more
useful.
 
The system configuration is:

Alphaserver 4100
2 KZPSC 3 ch. PCI controllers + BBU + 32MB cache 
Only one channel is used on each controller, and each is connected to
a BA356-SC with 16-bit I/O and 5 RZ29B-VW disks.
Each shelf is configured with a 5-member RAID-5 set.
Everything is at the latest revision.
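
For anyone working through the entries below, here is a minimal Python sketch that condenses a DECevent text report into one line per entry. It assumes the layout seen in this report (a label, at least two spaces, then the value); the function name `summarize` and the choice of fields are illustrative only and are not part of DECevent.

```python
import re

# Matches the banner line that starts each entry in the report.
ENTRY_RE = re.compile(r"\*+ ENTRY\s+(\d+) \*+")

# Labels to pull out, spelled exactly as they appear in the report.
FIELDS = ("Timestamp of occurrence", "Routine Name", "Function Status",
          "Command", "Block Number")


def summarize(report):
    """Return a list of dicts, one per ENTRY, holding the FIELDS found."""
    entries = []
    for raw in report.splitlines():
        line = raw.strip()
        banner = ENTRY_RE.search(line)
        if banner:
            entries.append({"entry": int(banner.group(1))})
            continue
        if not entries:
            continue  # text before the first entry
        for label in FIELDS:
            # Two or more spaces after the label keep "Command" from
            # matching "Command Slots Active" or "Command Retry Count".
            m = re.match(re.escape(label) + r"\s{2,}(.+)", line)
            if m:
                entries[-1][label] = " ".join(m.group(1).split())
    return entries
```

Calling `summarize(open("report.txt").read())` (the file name is hypothetical) gives a list that can be printed or filtered, for example to show only the entries whose `Function Status` mentions a timeout.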
                                                                              
                                                                              
                                                                              

******************************** ENTRY  144 ********************************  

                                                                              

                                                                              

Logging OS                        2. Digital UNIX                             

System Architecture               2. Alpha                                    

Event sequence number             2.                                          

Timestamp of occurrence              14-MAY-1997 14:08:01                     

Host name                            silver                                   

                                                                              

System type register      x00000016  AlphaServer 4000 Series                  

Number of CPUs (mpnum)    x00000002                                           

CPU logging event (mperr) x00000000                                           

                                                                              

Event validity                    1. O/S claims event is valid                

Event severity                    3. High Priority                            

Entry type                      198. SWXCR RAID Controller Event              

                                                                              

                                                                              

------ Device Data ------                                                     

Class                           x00  RAID Disk                                

Subsystem                       x20  SWXCR Mport/RAID Controller              

Number of Packets                 5.                                          

                                                                              

------ Packet Type ------       258. Module Name String                       

Routine Name                         xcr_cmd_timeout                          

------ Packet Type ------       256. Generic String                           

                                     Controller has stopped responding        

------ Packet Type ------       260. Hardware Error String                    

Error Type                           Hard Error Detected                      

------ Packet Type ------       256. Generic String                           

                                     Controller Softc at time of error        

------ Packet Type ------       512. SWXCR Softc(XCR_SOFTC)                   

                                                                              

*sc_bus_name              xFFFFFC00006D0F10                                   

Controller Number         x00000001                                           

Controller Version        x00000000                                           

*sc_ctrl                  xFFFFFC000063FBF8                                   

I/O Handle                x0000FB8000100000                                   

Flags                     x00000002  Needs to be Restarted                    

Offset to Controller      x00000000                                           

Normal Commands Active           60.                                          

Special Commands Active           4.                                          

Command Slots Active              0.                                          

*sc_act_flink             xFFFFFC001FE57320                                   

*sc_act_blink             xFFFFFC001FE57708                                   

Commands on Pending List          0.                                          

*sc_pend_flink            xFFFFFC001FE57050                                   

*sc_pend_blink            xFFFFFC001FE57050                                   

*sc_free_flink            xFFFFFC001FE570A0                                   

*sc_free_blink            xFFFFFC001FE576B8                                   

Command Slots Available          61.                                          

2560. Bytes Cmd Que Data             ** Not Printed **                        

*sc_restartptr            xFFFFFC001FE57A78                                   

*sc_do_cmdptr             xFFFFFC001FE57A78                                   

                                                                              

                                                                              

******************************** ENTRY  145 ********************************  

                                                                              

                                                                              

Logging OS                        2. Digital UNIX                             

System Architecture               2. Alpha                                    

Event sequence number             3.                                          

Timestamp of occurrence              14-MAY-1997 14:10:02                     

Host name                            silver                                   

                                                                              

System type register      x00000016  AlphaServer 4000 Series                  

Number of CPUs (mpnum)    x00000002                                           

CPU logging event (mperr) x00000001                                           

                                                                              

Event validity                    1. O/S claims event is valid                

Event severity                    3. High Priority                            

Entry type                      198. SWXCR RAID Controller Event              

                                                                              

                                                                              

------ Device Data ------                                                     

Class                           x00  RAID Disk                                

Subsystem                       x20  SWXCR Mport/RAID Controller              

Number of Packets                 3.                                          

                                                                              

------ Packet Type ------       258. Module Name String                       

Routine Name                         xcr_p_restart                            

------ Packet Type ------       256. Generic String                           

                                     Controller step 3 failed to clear        

------ Packet Type ------       260. Hardware Error String                    

Error Type                           Hard Error Detected                      

                                                                              

                                                                              

******************************** ENTRY  146 ********************************  

                                                                              

                                                                              

Logging OS                        2. Digital UNIX                             

System Architecture               2. Alpha                                    

Event sequence number             4.                                          

Timestamp of occurrence              14-MAY-1997 14:10:02                     

Host name                            silver                                   

                                                                              

System type register      x00000016  AlphaServer 4000 Series                  

Number of CPUs (mpnum)    x00000002                                           

CPU logging event (mperr) x00000000                                           

                                                                              

Event validity                    1. O/S claims event is valid                

Event severity                    3. High Priority                            

Entry type                      198. SWXCR RAID Controller Event              

                                                                              

                                                                              

------ Device Data ------                                                     

Class                           x00  RAID Disk                                

Subsystem                       x20  SWXCR Mport/RAID Controller              

Number of Packets                 7.                                          

                                                                              

------ Packet Type ------       258. Module Name String                       

Routine Name                         re_complete                              

------ Packet Type ------       256. Generic String                           

                                     I/O failed                               

------ Packet Type ------       260. Hardware Error String                    

Error Type                           Hard Error Detected                      

------ Packet Type ------       256. Generic String                           

                                     Active XCR_COM at time of error          

------ Packet Type ------         0. SWXCR Communication Block (XCR_COM)      

                                                                              

*my_addr                  xFFFFFC001CF0FA58                                   

Controller Number         x00000001                                           

Unit Number on Controller x00000000                                           

Function Status           x00000003  Command has Timed Out                    

Adapter Status            x00000000                                           

SWXCR Flags               x00000010  BP Points to Buffer                      

Received by Callback      x00000001                                           

*xcr_pdrv_ws              xFFFFFC001CF0F928                                   

*xcr_cntrl_ws             xFFFFFC001CF0F950                                   

*xcr_trns_ws              xFFFFFC001CF0F900                                   

*xcr_bp                   xFFFFFC0003A5B680                                   

*xcr_cbfcnp               xFFFFFC00005B9DC4                                   

*xcr_data_ptr             xFFFFFFFF9FE10000                                   

Data Xfer Length              65536.                                          

Number of Scatter Entries         0.                                          

Command Data Length               0.                                          

Block Number              x01603580                                           

Xfer Residual Length          65536.                                          

Timeout Value in Seconds         60.                                          

Command                   x00000002  Read                                     

                                                                              

------ Packet Type ------       256. Generic String                           

                                     Active Controller Working Set at time of 

                                     error                                    

------ Packet Type ------         1. Controller/HBA Working Set(CNTRL_WS)     

                                                                              

*cntrl_flink              xFFFFFC000076CBA0                                   

*cntrl_blink              xFFFFFC000076CBA0                                   

*cntrl_softc              xFFFFFC001FE57000                                   

*cntrl_com                xFFFFFC001CF0FA58                                   

*cntrl_cmd_slot           xFFFFFC001FE57320                                   

General Flags             x00000001  Command Ready                            

Command Retry Count               0.                                          

*scsi_addr                x0000000000000000                                   

dma_mapping_0             x0000000000000000                                   

dma_mapping_1             x0000000000000000                                   

dma_mapping_sg            x0000000000000000                                   

160. Bytes Scatter/Gather            ** Not Printed **                        

Mask Register             x00001FFF                                           

Register 0                      x82                                           

Register 1                      x12                                           

Register 2                      x80                                           

Register 3                      x40                                           

Register 4                      x80                                           

Register 5                      x35                                           

Register 6                      x60                                           

Register 7                      x00                                           

Register 8                      xA0                                           

Register 9                      xF9                                           

Register a                      xF0                                           

Register b                      x9C                                           

Register c                      x08                                           

Register d                      x00                                           

Register e                      x00                                           

Register f                      x00                                           

                                                                              

                                                                              

******************************** ENTRY  147 ********************************  

                                                                              

                                                                              

Logging OS                        2. Digital UNIX                             

System Architecture               2. Alpha                                    

Event sequence number             5.                                          

Timestamp of occurrence              14-MAY-1997 14:10:02                     

Host name                            silver                                   

                                                                              

System type register      x00000016  AlphaServer 4000 Series                  

Number of CPUs (mpnum)    x00000002                                           

CPU logging event (mperr) x00000000                                           

                                                                              

Event validity                    1. O/S claims event is valid                

Event severity                    3. High Priority                            

Entry type                      198. SWXCR RAID Controller Event              

                                                                              

                                                                              

------ Device Data ------                                                     

Class                           x00  RAID Disk                                

Subsystem                       x20  SWXCR Mport/RAID Controller              

Number of Packets                 7.                                          

                                                                              

------ Packet Type ------       258. Module Name String                       

Routine Name                         re_complete                              

------ Packet Type ------       256. Generic String                           

                                     I/O failed                               

------ Packet Type ------       260. Hardware Error String                    

Error Type                           Hard Error Detected                      

------ Packet Type ------       256. Generic String                           

                                     Active XCR_COM at time of error          

------ Packet Type ------         0. SWXCR Communication Block (XCR_COM)      

                                                                              

*my_addr                  xFFFFFC001CF0F058                                   

Controller Number         x00000001                                           

Unit Number on Controller x00000000                                           

Function Status           x00000003  Command has Timed Out                    

Adapter Status            x00000000                                           

SWXCR Flags               x00000010  BP Points to Buffer                      

Received by Callback      x00000001                                           

*xcr_pdrv_ws              xFFFFFC001CF0EF28                                   

*xcr_cntrl_ws             xFFFFFC001CF0EF50                                   

*xcr_trns_ws              xFFFFFC001CF0EF00                                   

*xcr_bp                   xFFFFFC000440D040                                   

*xcr_cbfcnp               xFFFFFC00005B9DC4                                   

*xcr_data_ptr             xFFFFFC0003846000                                   

Data Xfer Length               8192.                                          

Number of Scatter Entries         0.                                          

Command Data Length               0.                                          

Block Number              x00002C90                                           

Xfer Residual Length           8192.                                          

Timeout Value in Seconds         60.                                          

Command                   x00000003  Write                                    

                                                                              

------ Packet Type ------       256. Generic String                           

                                     Active Controller Working Set at time of 

                                     error                                    

------ Packet Type ------         1. Controller/HBA Working Set(CNTRL_WS)     

                                                                              

*cntrl_flink              xFFFFFC001CF0E7D0                                   

*cntrl_blink              xFFFFFC000076CBA0                                   

*cntrl_softc              xFFFFFC001FE57000                                   

*cntrl_com                xFFFFFC001CF0F058                                   

*cntrl_cmd_slot           xFFFFFC001FE57708                                   

General Flags             x00000001  Command Ready                            

Command Retry Count               0.                                          

*scsi_addr                x0000000000000000                                   

dma_mapping_0             x0000000000000000                                   

dma_mapping_1             x0000000000000000                                   

dma_mapping_sg            x0000000000000000                                   

160. Bytes Scatter/Gather            ** Not Printed **                        

Mask Register             x00000FFF                                           

Register 0                      x03                                           

Register 1                      x2B                                           

Register 2                      x10                                           

Register 3                      x00                                           

Register 4                      x90                                           

Register 5                      x2C                                           

Register 6                      x00                                           

Register 7                      x00                                           

Register 8                      x00                                           

Register 9                      x60                                           

Register a                      x84                                           

Register b                      x83                                           

Register c                      x00                                           

Register d                      x00                                           

Register e                      x00                                           

Register f                      x00                                           

                                                                              

                                                                              

******************************** ENTRY  150 ********************************  

                                                                              

                                                                              

Logging OS                        2. Digital UNIX                             

System Architecture               2. Alpha                                    

Event sequence number             8.                                          

Timestamp of occurrence              14-MAY-1997 14:10:02                     

Host name                            silver                                   

                                                                              

System type register      x00000016  AlphaServer 4000 Series                  

Number of CPUs (mpnum)    x00000002                                           

CPU logging event (mperr) x00000000                                           

                                                                              

Event validity                    1. O/S claims event is valid                

Event severity                    3. High Priority                            

Entry type                      198. SWXCR RAID Controller Event              

                                                                              

                                                                              

------ Device Data ------                                                     

Class                           x00  RAID Disk                                

Subsystem                       x20  SWXCR Mport/RAID Controller              

Number of Packets                 7.                                          

                                                                              

------ Packet Type ------       258. Module Name String                       

Routine Name                         re_complete                              

------ Packet Type ------       256. Generic String                           

                                     I/O failed                               

------ Packet Type ------       260. Hardware Error String                    

Error Type                           Hard Error Detected                      

------ Packet Type ------       256. Generic String                           

                                     Active XCR_COM at time of error          

------ Packet Type ------         0. SWXCR Communication Block (XCR_COM)      

                                                                              

*my_addr                  xFFFFFC001CF0E8D8                                   

Controller Number         x00000001                                           

Unit Number on Controller x00000000                                           

Function Status           x00000003  Command has Timed Out                    

Adapter Status            x00000000                                           

SWXCR Flags               x00000010  BP Points to Buffer                      

Received by Callback      x00000001                                           

*xcr_pdrv_ws              xFFFFFC001CF0E7A8                                   

*xcr_cntrl_ws             xFFFFFC001CF0E7D0                                   

*xcr_trns_ws              xFFFFFC001CF0E780                                   

*xcr_bp                   xFFFFFC000440CDC0                                   

*xcr_cbfcnp               xFFFFFC00005B9DC4                                   

*xcr_data_ptr             xFFFFFFFF9FE20000                                   

Data Xfer Length              65536.                                          

Number of Scatter Entries         0.                                          

Command Data Length               0.                                          

Block Number              x01603600                                           

Xfer Residual Length          65536.                                          

Timeout Value in Seconds         60.                                          

Command                   x00000002  Read                                     

                                                                              

------ Packet Type ------       256. Generic String                           

                                     Active Controller Working Set at time of 

                                     error                                    

------ Packet Type ------         1. Controller/HBA Working Set(CNTRL_WS)     

                                                                              

*cntrl_flink              xFFFFFC000076CBA0                                   

*cntrl_blink              xFFFFFC000076CBA0                                   

*cntrl_softc              xFFFFFC001FE57000                                   

*cntrl_com                xFFFFFC001CF0E8D8                                   

*cntrl_cmd_slot           xFFFFFC001FE57370                                   

General Flags             x00000001  Command Ready                            

Command Retry Count               0.                                          

*scsi_addr                x0000000000000000                                   

dma_mapping_0             x0000000000000000                                   

dma_mapping_1             x0000000000000000                                   

dma_mapping_sg            x0000000000000000                                   

160. Bytes Scatter/Gather            ** Not Printed **                        

Mask Register             x00001FFF                                           

Register 0                      x82                                           

Register 1                      x14                                           

Register 2                      x80                                           

Register 3                      x40                                           

Register 4                      x00                                           

Register 5                      x36                                           

Register 6                      x60                                           

Register 7                      x00                                           

Register 8                      x20                                           

Register 9                      xE8                                           

Register a                      xF0                                           

Register b                      x9C                                           

Register c                      x08                                           

Register d                      x00                                           

Register e                      x00                                           

Register f                      x00      
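
One pattern is visible in the entries themselves: in each CNTRL_WS packet, Register 2 multiplied by 512 equals Data Xfer Length, Registers 4 through 6 read as the low 24 bits of Block Number, the top two bits of Register 3 match the bits above that, and the low nibble of Register 0 matches the Command code. The sketch below checks this using values copied from entries 146, 147 and 150. The 512-byte block size and the interpretation of each register are assumptions drawn only from this report, not from controller documentation. If the reading holds, the dump describes the request that timed out rather than controller or cache state.

```python
SECTOR_BYTES = 512  # assumed block size; not stated anywhere in the report

# (entry, Data Xfer Length, Block Number, Command code, Register 0..7),
# copied from the XCR_COM and CNTRL_WS packets above.
ENTRIES = [
    (146, 65536, 0x01603580, 2, [0x82, 0x12, 0x80, 0x40, 0x80, 0x35, 0x60, 0x00]),
    (147, 8192, 0x00002C90, 3, [0x03, 0x2B, 0x10, 0x00, 0x90, 0x2C, 0x00, 0x00]),
    (150, 65536, 0x01603600, 2, [0x82, 0x14, 0x80, 0x40, 0x00, 0x36, 0x60, 0x00]),
]


def cross_check(xfer_len, block, command, regs):
    """Compare the register bytes with the XCR_COM fields of one entry."""
    # Registers 4..6 taken as a little-endian 24-bit value.
    low24 = regs[4] | (regs[5] << 8) | (regs[6] << 16)
    return {
        "length": regs[2] * SECTOR_BYTES == xfer_len,
        "block_low24": low24 == (block & 0xFFFFFF),
        "block_high": (regs[3] >> 6) == (block >> 24),
        "command": (regs[0] & 0x0F) == command,
    }


for entry, xfer_len, block, command, regs in ENTRIES:
    print(entry, cross_check(xfer_len, block, command, regs))
```

All four checks come out true for the three entries listed, which is at least consistent with the registers holding the outstanding command.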

Thanks,
                                      
[Posted by WWW Notes gateway]