Title: | HSZ40 Product Conference |
|
Moderator: | SSDEVO::EDMONDS |
|
Created: | Mon Apr 11 1994 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 902 |
Total number of notes: | 3319 |
851.0. "HSZ50 - KZPSA => SCSI bus reset (reason 0x6) !" by LEMAN::MARTIN_A (Be vigilant...) Wed Apr 23 1997 03:59
Hello,
I opened a note in ask_ssag (#6610) and wonder whether this error could
be linked to an HSZ50 setting.
These errors appeared shortly after the HSZ50 installation...
Your advice on the matter will be welcome.
============================
Alain MARTIN/SSG Switzerland
*********************************************************************
We have had a few SCSI bus resets at a customer site over the
last couple of days.
The customer wants to know the reason, as these errors trigger
alarms on their system... waking up the operators 8-)
I could not explain them; can anyone find a clue?
2 different buses seem to show the problem: SCSI #1 and
SCSI #2, both are KZPSAs connected to HSZ50s (FW v50Z)!
No device errors are reported!
Error Message :
****************************************************************
Bus reset request from adapter detected
(reason = 0x6)
****************************************************************
I discovered in note #6066 (ask_ssag) that reason 0x6 means the
following:
Unable_to_arbitrate
6 This code shall be used with a Reset Request
message when the adapter is not participating
in any SCSI bus traffic, has I/O requests,
and has not seen an opportunity to arbitrate
for the bus for a significant period of time.
The method in which the adapter determines
that it has been waiting too long to arbitrate
for the bus is implementation specific, and
is not required to work when commands specify
an infinite timeout period (FFFFh).
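As a small illustration of reading that message, the sketch below pulls the reason code out of a logged string and looks it up. Only code 6 comes from the definition quoted above; the names `KNOWN_REASONS` and `decode_reset_reason` are hypothetical, and any other code is reported as unknown rather than guessed.

```python
import re

# Only the code quoted in this note is listed; other codes are deliberately absent.
KNOWN_REASONS = {0x6: "Unable_to_arbitrate"}

_REASON_RE = re.compile(r"reason\s*=\s*0x([0-9A-Fa-f]+)")


def decode_reset_reason(message):
    """Return (code, name) for a 'Bus reset request' message, or None if no code is present."""
    match = _REASON_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, KNOWN_REASONS.get(code, "unknown")


if __name__ == "__main__":
    msg = "Bus reset request from adapter detected\n(reason = 0x6)"
    print(decode_reset_reason(msg))
```

Messages without a reason code, such as "A SCSI bus reset has been done", simply return `None`.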
Here are the system config and the dia outputs; I hope they show
some evidence that I am not able to interpret myself...
**************************** ENTRY 1. *******************************
----- EVENT INFORMATION -----
EVENT CLASS OPERATIONAL EVENT
OS EVENT TYPE 300. SYSTEM STARTUP
SEQUENCE NUMBER 0.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sun Apr 13 17:59:03 1997
OCCURRED ON SYSTEM chpd01
SYSTEM ID x00050018
SYSTYPE x00000000
MESSAGE PCXAL keyboard, language Francais (Suisse Romande)
Alpha boot: available memory from 0x1860000 to 0x3ffee000
Digital UNIX V3.2D-2 (Rev. 41.64); Sun Apr 13 16:40:27 MET DST 1997
physical memory = 1024.00 megabytes.
available memory = 999.61 megabytes.
using 3924 buffers containing 30.65 megabytes of memory
Master cpu at slot 0.
Firmware revision: 4.7
PALcode: OSF version 1.21
ibus0 at nexus
AlphaServer 2100A 5/250
cpu 0 EV-5 4mb b-cache
cpu 1 EV-5 4mb b-cache
cpu 2 EV-5 4mb b-cache
cpu 3 EV-5 4mb b-cache
gpc0 at ibus0
pci0 at ibus0 slot 0
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
fd0 at fdi0 unit 0
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
Initializing xcr0. Please wait.
xcr0 at eisa0
re0 at xcr0 unit 0 (unit status = ONLINE, raid level = 0)
re1 at xcr0 unit 1 (unit status = ONLINE, raid level = 0)
re2 at xcr0 unit 2 (unit status = ONLINE, raid level = JBOD)
re4 at xcr0 unit 4 (unit status = ONLINE, raid level = 0)
re5 at xcr0 unit 5 (unit status = ONLINE, raid level = 0)
pci2000 at pci0 slot 3
psiop0 at pci2000 slot 1
Loading SIOP: script 1007400, reg 81804000, data 406393c8
scsi0 at psiop0 slot 0
rz0 at scsi0 bus 0 target 0 lun 0 (DEC RZ28M (C) DEC 0616)
rz1 at scsi0 bus 0 target 1 lun 0 (DEC RZ28D (C) DEC 0008)
rz6 at scsi0 bus 0 target 6 lun 0 (DEC RRD45 (C) DEC 1645)
tu0: DECchip 21040-AA: Revision: 2.4
tu0 at pci2000 slot 6
tu0: DEC TULIP Ethernet Interface, hardware address: 00-00-F8-20-C1-70
tu0: console mode: selecting UTP (10BaseT) port
tu1: DECchip 21140-AA: Revision: 1.2
tu1 at pci2000 slot 7
tu1: DEC Fast Ethernet Interface, hardware address: 00-00-F8-03-1F-0E
tu1: console mode: selecting UTP (100BaseT) port
Initializing xcr1. Please wait.
Initializing xcr1. Please wait.
Initializing xcr1. Please wait.
Initializing xcr1. Please wait.
xcr1 at pci2000 slot 8
re8 at xcr1 unit 8 (unit status = ONLINE, raid level = 0)
re9 at xcr1 unit 9 (unit status = ONLINE, raid level = 0)
re10 at xcr1 unit 10 (unit status = ONLINE, raid level = JBOD)
re11 at xcr1 unit 11 (unit status = ONLINE, raid level = 0)
re12 at xcr1 unit 12 (unit status = ONLINE, raid level = 0)
re13 at xcr1 unit 13 (unit status = ONLINE, raid level = 0)
re14 at xcr1 unit 14 (unit status = ONLINE, raid level = JBOD)
vga0 at pci0 slot 6
1024x768 (S3TRIO )
pza0 at pci0 slot 7
pza0 firmware version: DEC P01 A10
scsi1 at pza0 slot 0
rz9 at scsi1 bus 1 target 1 lun 0 (DEC HSZ50-AX V50Z)
rz10 at scsi1 bus 1 target 2 lun 0 (DEC HSZ50-AX V50Z)
rz12 at scsi1 bus 1 target 4 lun 0 (DEC HSZ50-AX V50Z)
rz12 at scsi1 bus 1 target 4 lun 1 (DEC HSZ50-AX V50Z)
rz11 at scsi1 bus 1 target 3 lun 0 (DEC HSZ50-AX V50Z)
psiop1 at pci0 slot 8
Loading SIOP: script 1013400, reg 81b32000, data 406397c8
scsi2 at psiop1 slot 0
tz16 at scsi2 bus 2 target 0 lun 0 (DEC TZ877 (C) DEC 9B3C)
pza1 at pci0 slot 9
pza1 firmware version: DEC P01 A10
scsi3 at pza1 slot 0
rz25 at scsi3 bus 3 target 1 lun 0 (DEC HSZ50-AX V50Z)
rz26 at scsi3 bus 3 target 2 lun 0 (DEC HSZ50-AX V50Z)
rz27 at scsi3 bus 3 target 3 lun 0 (DEC HSZ50-AX V50Z)
rz28 at scsi3 bus 3 target 4 lun 0 (DEC HSZ50-AX V50Z)
rz28 at scsi3 bus 3 target 4 lun 1 (DEC HSZ50-AX V50Z)
dli: configured
************************************************************************
DECevent V2.2
******************************** ENTRY 1 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 5.
Timestamp of occurrence 22-APR-1997 13:31:32
Host name chpd01
System type register x00000018 Systype 24. Not announced yet
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------ Packet Type ------ 256. Generic String
A SCSI bus reset has been done
------ Packet Type ------ 1078. SIMport Softc(SIMPORT_SOFTC)
Packet Revision 2.
*spo_adp xFFFFFC003FA088A0
Adapter State x00002240 SIMport Thread Started
Path Inquiry info Valid
Flags - Supported Feature x00000004 Support Linked BSMs
Max # of Queued Cmds 55.
# of SCSI Channel 1.
Min KEEPALIVE Time(sec) 30.
Min # of Free Queue 3.
# of 4K Memory Segments 0.
Adap Min Data Alignment x00
# of SAC Buffers x00
CAM Version xD0
SCSI Capabilities x86
Target Mode Support x49
Miscellaneous Flags x00
HBA Engine Count 64512.
Vendor Unique Flags
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000000 00000080 3247FFFF *..G2...........l*
Private Data Size x00000000
Async Capabilities x00000000
Highest Path ID x38
SCSI Device ID x00
SIM Vendor ID (ASCII) M-q
HBA Vendor ID (ASCII) ORT/VA13DEC P01
*cam_osd_usage x2020203031412020
Max CDB Length 0.
*spo_sim_softc x000000000000000C
*waitq_head xFFFFFFFF803F1000
*waitq_tail xFFFFFC003F998840
Lock for Wait Queue x3A16E120
*spo_adap_sanity_ccb x000000000049DBF4
*spo_adap_ccb xFFFFFC003FE18B28
# 100 millsec since MIN 1071744808.
**spo_stl_nexus x0000000000000000
# LUNs in Crash Recovery 0.
******************************** ENTRY 2 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 4.
Timestamp of occurrence 22-APR-1997 13:31:32
Host name chpd01
System type register x00000018 Systype 24. Not announced yet
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 3.
Unit Number xFFFF Target = 7.
LUN = 7.
Not Defined
------- CAM Data -------
Class x33 SIMport Adapter - KZxSA
Subsystem x33 SIMport Adapter - KZxSA
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name spo_bus_reset_rspn
------ Packet Type ------ 256. Generic String
Bus reset request from adapter
detected
(reason = 0x6)
------ Packet Type ------ 1078. SIMport Softc(SIMPORT_SOFTC)
Packet Revision 2.
*spo_adp xFFFFFC003FA088A0
Adapter State x00002240 SIMport Thread Started
Path Inquiry info Valid
Flags - Supported Feature x00000004 Support Linked BSMs
Max # of Queued Cmds 55.
# of SCSI Channel 1.
Min KEEPALIVE Time(sec) 30.
Min # of Free Queue 3.
# of 4K Memory Segments 0.
Adap Min Data Alignment x00
# of SAC Buffers x00
CAM Version xD0
SCSI Capabilities x86
Target Mode Support x49
Miscellaneous Flags x00
HBA Engine Count 64512.
Vendor Unique Flags
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000000 00000080 3247FFFF *..G2...........l*
Private Data Size x00000000
Async Capabilities x00000000
Highest Path ID x38
SCSI Device ID x00
SIM Vendor ID (ASCII) M-q
HBA Vendor ID (ASCII) ORT/VA13DEC P01
*cam_osd_usage x2020203031412020
Max CDB Length 0.
*spo_sim_softc x000000000000000C
*waitq_head xFFFFFFFF803F1000
*waitq_tail xFFFFFC003F998840
Lock for Wait Queue x3A16E120
*spo_adap_sanity_ccb x000000000049DB8C
*spo_adap_ccb xFFFFFC003FE18B28
# 100 millsec since MIN 1071744808.
**spo_stl_nexus x0000000000000000
# LUNs in Crash Recovery 0.
******************************** ENTRY 3 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 3.
Timestamp of occurrence 21-APR-1997 12:31:32
Host name chpd01
System type register x00000018 Systype 24. Not announced yet
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 1.
Unit Number xFFFF Target = 7.
LUN = 7.
Not Defined
------- CAM Data -------
Class x33 SIMport Adapter - KZxSA
Subsystem x33 SIMport Adapter - KZxSA
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name spo_process_ccb
------ Packet Type ------ 256. Generic String
A SCSI bus reset has been done
------ Packet Type ------ 1078. SIMport Softc(SIMPORT_SOFTC)
Packet Revision 2.
*spo_adp xFFFFFC003FA08420
Adapter State x00002240 SIMport Thread Started
Path Inquiry info Valid
Flags - Supported Feature x00000004 Support Linked BSMs
Max # of Queued Cmds 55.
# of SCSI Channel 1.
Min KEEPALIVE Time(sec) 30.
Min # of Free Queue 3.
# of 4K Memory Segments 0.
Adap Min Data Alignment x00
# of SAC Buffers x00
CAM Version xD0
SCSI Capabilities x86
Target Mode Support x49
Miscellaneous Flags x00
HBA Engine Count 64512.
Vendor Unique Flags
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000000 00000080 3247FFFF *..G2...........l*
Private Data Size x00000000
Async Capabilities x00000000
Highest Path ID x38
SCSI Device ID x00
SIM Vendor ID (ASCII) M-q
HBA Vendor ID (ASCII) ORT/VA13DEC P01
*cam_osd_usage x2020203031412020
Max CDB Length 0.
*spo_sim_softc x000000000000000C
*waitq_head xFFFFFFFF803EB000
*waitq_tail xFFFFFC003FA20420
Lock for Wait Queue x3FF63020
*spo_adap_sanity_ccb x000000000049DBF4
*spo_adap_ccb xFFFFFC003FE07B28
# 100 millsec since MIN 1071675176.
**spo_stl_nexus x0000000000000000
# LUNs in Crash Recovery 0.
******************************** ENTRY 4 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 2.
Timestamp of occurrence 21-APR-1997 12:31:32
Host name chpd01
System type register x00000018 Systype 24. Not announced yet
Number of CPUs (mpnum) x00000004
CPU logging event (mperr) x00000000
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 1.
Unit Number xFFFF Target = 7.
LUN = 7.
Not Defined
------- CAM Data -------
Class x33 SIMport Adapter - KZxSA
Subsystem x33 SIMport Adapter - KZxSA
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name spo_bus_reset_rspn
------ Packet Type ------ 256. Generic String
Bus reset request from adapter
detected
(reason = 0x6)
------ Packet Type ------ 1078. SIMport Softc(SIMPORT_SOFTC)
Packet Revision 2.
*spo_adp xFFFFFC003FA08420
Adapter State x00002240 SIMport Thread Started
Path Inquiry info Valid
Flags - Supported Feature x00000004 Support Linked BSMs
Max # of Queued Cmds 55.
# of SCSI Channel 1.
Min KEEPALIVE Time(sec) 30.
Min # of Free Queue 3.
# of 4K Memory Segments 0.
Adap Min Data Alignment x00
# of SAC Buffers x00
CAM Version xD0
SCSI Capabilities x86
Target Mode Support x49
Miscellaneous Flags x00
HBA Engine Count 64512.
Vendor Unique Flags
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000000 00000080 3247FFFF *..G2...........l*
Private Data Size x00000000
Async Capabilities x00000000
Highest Path ID x38
SCSI Device ID x00
SIM Vendor ID (ASCII) M-q
HBA Vendor ID (ASCII) ORT/VA13DEC P01
*cam_osd_usage x2020203031412020
Max CDB Length 0.
*spo_sim_softc x000000000000000C
*waitq_head xFFFFFFFF803EB000
*waitq_tail xFFFFFC003FA20420
Lock for Wait Queue x3FF63020
*spo_adap_sanity_ccb x000000000049DB8C
*spo_adap_ccb xFFFFFC003FE07B28
# 100 millsec since MIN 1071675176.
**spo_stl_nexus x0000000000000000
# LUNs in Crash Recovery 0.
***********************************************************************
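To compare the four entries above at a glance, a short script can reduce each one to its timestamp, bus number, routine name and reason code. This is only a sketch that assumes the text layout pasted in this note (field labels such as "Timestamp of occurrence" and "Bus Number"); the function name `summarize_entries` is hypothetical and this is not a DECevent interface.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the pasted timestamps (e.g. 22-APR-1997).
MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

_ENTRY_RE = re.compile(r"\*+\s*ENTRY\s+\d+\s*\**")
_TS_RE = re.compile(
    r"Timestamp of occurrence\s+(\d{2})-([A-Z]{3})-(\d{4})\s+(\d{2}):(\d{2}):(\d{2})")
_BUS_RE = re.compile(r"Bus Number\s+(\d+)\.")
_ROUTINE_RE = re.compile(r"Routine Name\s+(\S+)")
_REASON_RE = re.compile(r"reason\s*=\s*0x([0-9A-Fa-f]+)")


def summarize_entries(report):
    """Return one dict per entry that carries a 'Timestamp of occurrence' line."""
    summaries = []
    for chunk in _ENTRY_RE.split(report)[1:]:
        ts = _TS_RE.search(chunk)
        if ts is None:
            continue  # e.g. the boot-time entry, which uses another layout
        day, mon, year, hh, mm, ss = ts.groups()
        bus = _BUS_RE.search(chunk)
        routine = _ROUTINE_RE.search(chunk)
        reason = _REASON_RE.search(chunk)
        summaries.append({
            "when": datetime(int(year), MONTHS[mon], int(day),
                             int(hh), int(mm), int(ss)),
            "bus": int(bus.group(1)) if bus else None,
            "routine": routine.group(1) if routine else None,
            "reason": int(reason.group(1), 16) if reason else None,
        })
    return summaries
```

Fields that an entry does not contain (the first entry above has no Unit Info block, for instance) come back as `None` rather than being filled in.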
T.R | Title | User | Personal Name | Date | Lines |
---|
851.1 | Believe it's KZPSA FW bug or HSZ50 incompatibility ! | PANTER::MARTIN | Be vigilant... | Thu Apr 24 1997 07:15 | 26 |
| We still think there is something wrong between the HSZ50s and the
KZPSAs!
We checked for HSZ errors reported by FMU, but there were none.
What may have caused these SCSI bus resets: the customer runs
hszterm "show disk full" from crontab twice an hour (at 01' and 31')
to make sure all the disks are there...
This has been running for 2 weeks (that is, 48 times/day), but only two
runs have caused SCSI bus resets with reason 0x6 (Unable_to_arbitrate)!
We checked for disk errors: none!
We checked the disk FW versions: all RZ28s are at rev 442D and all
RZ29Bs at rev 16.
So we believe something is wrong (either a timeout setting or a
FW bug) that causes the KZPSAs to fail to arbitrate for the SCSI bus
from time to time with the HSZ50s (FW v50Z)!
Is anybody aware of such a problem?
Looking forward to hearing from somebody...
============================
Alain MARTIN/SSG Switzerland
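
Both resets in the base note were logged at hh:31:32, which fits the 31' crontab run. A minimal sketch of that check follows; the 120-second window is an assumption rather than a measured value, and the names `seconds_after_cron` and `coincides_with_cron` are hypothetical.

```python
from datetime import datetime

CRON_MINUTES = (1, 31)   # from the note: the job runs at 01' and 31' of each hour
WINDOW_SECONDS = 120     # assumption: how long after a job start still counts


def seconds_after_cron(when, cron_minutes=CRON_MINUTES):
    """Seconds elapsed since the most recent scheduled job start."""
    # The modulo handles times before the first run of the hour,
    # where the most recent start belongs to the previous hour.
    return min(((when.minute - m) % 60) * 60 + when.second for m in cron_minutes)


def coincides_with_cron(when, window=WINDOW_SECONDS):
    """True if the event happened within `window` seconds of a job start."""
    return seconds_after_cron(when) <= window


if __name__ == "__main__":
    for event in (datetime(1997, 4, 21, 12, 31, 32),
                  datetime(1997, 4, 22, 13, 31, 32)):
        print(event, seconds_after_cron(event), coincides_with_cron(event))
```

The check only shows that the timing is consistent with the job; it does not prove that the job caused the reset.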
|
851.2 | | SSDEVO::T_GONZALES | | Mon Apr 28 1997 14:39 | 6 |
| We have usually seen these errors as a result of a SCSI bus problem.
Check termination at both ends, and also check for bad or loose
cables and the trilinks on the HSZs. Basically anything on the SCSI bus.
|
851.3 | errors come from show device full... | PANTER::MARTIN | Be vigilant... | Fri May 02 1997 09:31 | 27 |
| I agree that most of the time SCSI bus resets are caused by bad
SCSI cabling.
However, that does not seem to be our case, as both buses show the same
problem and we checked that the cables and terminators are properly
connected (we have not replaced any cable or terminator, though)!
But we were able to determine that the reason 0x6 (unable to
arbitrate) is caused by a script the customer runs every
30 minutes; this script issues the hszterm "show device full"
command.
Each time a SCSI bus reset occurred, we saw from the log that all
the disks in one of the BA350 shelves did not answer (but not always
the same shelf)!
We think this means they were busy doing something at the time of
the "show device full". Is that a reasonable assumption?
The script is there to check for disk errors that would not be
reported at the Digital UNIX level...
Do you know a better way to proceed?
Cheers,
============================
Alain MARTIN/SSG Switzerland
|