Title: Ask the Storage Architecture Group
Notice: Check out our web page at http://www-starch.shr.dec.com
Moderator: SSAG::TERZA N
Created: Wed Oct 15 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6756
Total number of notes: 25276
6382.0. "Can see HSZ40 but can't access any drives." by EVMS::PIRULO::LEDERMAN (B. Z. Lederman) Fri Feb 07 1997 06:13
I tried asking this in what seemed like the appropriate conference, but
got no response. I've looked through many notes here, and haven't seen
anything. Any ideas would be appreciated.
<<< SSDEVO::DISK$SSDEVO_SYS2:[NOTES$LIBRARY]HSZ40_PRODUCT.NOTE;1 >>>
-< HSZ40 Product Conference >-
================================================================================
Note 760.0 Can see HSZ40 but can't access any drives No replies
EVMS::PIRULO::LEDERMAN "B. Z. Lederman" 215 lines 5-FEB-1997 12:17
--------------------------------------------------------------------------------
I've been through a lot of notes in several conferences, and can't see
any similar problem to this and can't find out where to go next.
We just got a new HSZ40 delivered. At the moment, I'm using a single
cable to connect it to one KZPSA on one Alphaserver 1000. When
everything comes up, the HSZ40 is seen by the Alpha but doesn't respond
properly. Once the system boots, the disks on the HSZ40 can be seen
but can't be accessed. Any attempt to access the disks hangs the
system.
I've checked at the HSZ40 console, self test runs without errors, the
configuration looks o.k., I can test / exercise the disks without
errors, etc. There is no SCSI ID conflict that I can see (HSZ40 is at
ID 6, KZPSA is at ID 2 or ID 7). There is a "Y" adapter with a
terminator at the HSZ40 end, the cable runs to one KZPSA adaptor on one
CPU, and should have an internal terminator. I've tried this system on
two different 1000s and on one 2100 and get the same results. I can
plug a StorageWorks box into the same systems and they work, so the
controller at the CPU end is o.k.
I suppose it's possible that there is something mis-configured in the
HSZ40, but I can't imagine what: I've checked everything I can find in
the manuals, I've tried re-configuring disk drives manually and using
CFMENU, I've tried different disks, and so on, but nothing changes. My
only other thought is that the unit was bad on delivery, or that the
cable is somehow bad in a way that allows it to partially function.
Below is a log of the console from one system showing what happens.
You can see some error messages come out when the system polls the
controllers, so the problem apparently is at a low level.
Any suggestions on where to go next?
SYSTEM SHUTDOWN COMPLETE
halted CPU 0
halt code = 5
HALT instruction executed
PC = ffffffff8006df28
>>>init
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot 0 -- pka -- QLogic ISP1020
bus 2, slot 1 -- ewa -- DECchip 21140-AA
bus 0, slot 11 -- pkb -- DEC KZPSA
bus 0, slot 12 -- pkc -- DEC KZPSA
bus 0, slot 13 -- fwa -- DEC PCI FDDI
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.7-179, built on Dec 17 1996 at 14:26:45
>>>sho devi
waiting for pkc0.7.0.12.0 to poll... <-
amcsr_lo = 8 <- THESE MESSAGES OCCUR ONLY WHEN
abbrr_lo = 200a40f <- THE SYSTEM IS CONNECTED TO
dafqir_lo = 80052b1 <- THE HSZ40
dacqir_lo = 8005459 <-
asr_lo = 10 <- MORE MESSAGES COME OUT DURING
afar_lo = 0 <- THE BOOT PROCESS, BELOW
afpr_lo = 30a <-
waiting for pkc0.7.0.12.0 to poll... <-
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
dka0.0.0.2000.0 DKA0 RZ26N 0616
dka400.4.0.2000.0 DKA400 RRD45 0436
dva0.0.0.1000.0 DVA0
ewa0.0.0.2001.0 EWA0 00-00-F8-03-E6-74
fwa0.0.0.13.0 FWA0 00-00-F8-4A-A0-04
pka0.7.0.2000.0 PKA0 SCSI Bus ID 7 2.10
pkb0.7.0.11.0 PKB0 SCSI Bus ID 7 P01 A10
pkc0.7.0.12.0 PKC0 SCSI Bus ID 7 P01 A10
>>>sho conf
Digital Equipment Corporation
AlphaServer 1000A 4/***
Firmware
SRM Console: V4.7-179
ARC Console: 4.49
PALcode: VMS PALcode V5.56-6, OSF PALcode X1.45-12
Serial Rom: V2.8
Processor
DECchip (tm) 21064A-2 233MHz
Memory
64 Meg of System Memory
Bank 0 = 64 Mbytes(16 MB Per Simm) Starting at 0x00000000
Bank 1 = No Memory Detected
Bank 2 = No Memory Detected
Bank 3 = No Memory Detected
Slot Option Hose 0, Bus 0, PCI
7 Intel 82375EB Bridge to Bus 1, EISA
8 DECchip 21050-AA Bridge to Bus 2, PCI
11 DEC KZPSA pkb0.7.0.11.0 SCSI Bus ID 7
12 DEC KZPSA pkc0.7.0.12.0 SCSI Bus ID 7
13 DEC PCI FDDI fwa0.0.0.13.0 00-00-F8-4A-A0-04
Slot Option Hose 0, Bus 1, EISA
Slot Option Hose 0, Bus 2, PCI
0 QLogic ISP1020 pka0.7.0.2000.0 SCSI Bus ID 7
dka0.0.0.2000.0 RZ26N
dka400.4.0.2000.0 RRD45
1 DECchip 21140-AA ewa0.0.0.2001.0 00-00-F8-03-E6-74
>>>b
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot 0 -- pka -- QLogic ISP1020
bus 2, slot 1 -- ewa -- DECchip 21140-AA
bus 0, slot 11 -- pkb -- DEC KZPSA
bus 0, slot 12 -- pkc -- DEC KZPSA
bus 0, slot 13 -- fwa -- DEC PCI FDDI
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.7-179, built on Dec 17 1996 at 14:26:45
CPU 0 booting
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
error on pkc0.6.0.12.0, cmd = 12, sts = 48, camh->status = 19
amcsr_lo = 8
abbrr_lo = 200a40f
dafqir_lo = 802b0f9
dacqir_lo = 80290a5
asr_lo = 10
afar_lo = 0
afpr_lo = 30a
SIMport Adapter error: asr = 10, afpr = 30a
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
CAM command EXECUTE_SCSI_IO timed out
(boot dka0.0.0.2000.0 -flags 0,0)
FRU table creation disabled
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 904 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1c2000, image_start = 0, image_bytes = 71000
initializing HWRPB at 2000
initializing page table at 3ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
error on pkc0.6.0.12.0, cmd = 12, sts = 48, camh->status = 19
OpenVMS (TM) Alpha Operating System, Version V7.1
[remainder of system boot looks normal.]
$ sho dev
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DAD0: Online 0
PRFMK1$DKB601: Offline 1
PRFMK1$DKC0: Mounted 0 AXPVMSSYS 1008792 308 1
PRFMK1$DKC400: Online wrtlck 0
PRFMK1$DVA0: Online 0
Device Device Error
Name Status Count
FTA0: Offline 0
LTA0: Offline mounted 0
OPA0: Online 0
RTA0: Offline 0
RTB0: Offline 0
TTA0: Online 0
Device Device Error
Name Status Count
LRA0: Online 0
Device Device Error
Name Status Count
EWA0: Online 0
EWA2: Online 0
EWA3: Online 0
FWA0: Online 0
FWA2: Online 0
FWA4: Online 0
FWA5: Online 0
GQA0: Online 0
IKA0: Offline 0
IMA0: Offline 0
INA0: Offline 0
LAST0: Online 0
MPA0: Online 0
OPA2: Online 0
OPA3: Online 0
PKA0: Online 0
PKB0: Online 1
PKC0: Online 0
WSA0: Offline 0
WSA1: Online 0
$ sho dev/fu dkb601
Disk PRFMK1$DKB601:, device type unknown, is offline, file-oriented device,
shareable, available to cluster, error logging is enabled.
Error count 1 Operations completed 0
Owner process "" Owner UIC [SYSTEM]
Owner process ID 00000000 Dev Prot S:RWPL,O:RWPL,G:R,W
Reference count 0 Default buffer size 512
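A note for readers decoding the console output above: SRM console device names pack several fields into one dotted string. The minimal sketch below splits them, assuming the usual SRM layout of driver, adapter letter and unit number, followed by bus node (the SCSI ID on a SCSI bus), channel, logical slot and hose. The `SrmDevice` type and `parse_srm_name` helper are hypothetical illustrations, not part of any DEC tool.

```python
import re
from typing import NamedTuple


class SrmDevice(NamedTuple):
    """Fields of an SRM console device name such as 'dka400.4.0.2000.0'."""
    driver: str   # two-letter device type, e.g. "dk" (disk) or "pk" (SCSI port)
    adapter: str  # adapter letter: a, b, c, ...
    unit: int     # unit number; for SCSI disks, 100 * target + LUN
    node: int     # bus node number (the SCSI ID on a SCSI bus)
    channel: int  # channel number
    slot: int     # logical slot number
    hose: int     # hose (I/O bus) number


# Assumed layout: <driver><adapter><unit>.<node>.<channel>.<slot>.<hose>
_SRM_NAME = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_srm_name(name: str) -> SrmDevice:
    """Split an SRM device name into its fields; raise ValueError if malformed."""
    m = _SRM_NAME.match(name.strip().lower())
    if m is None:
        raise ValueError(f"not an SRM device name: {name!r}")
    driver, adapter, *nums = m.groups()
    unit, node, channel, slot, hose = (int(n) for n in nums)
    return SrmDevice(driver, adapter, unit, node, channel, slot, hose)
```

Applied to the boot error line, `parse_srm_name("pkc0.6.0.12.0")` gives node 6 on slot 12, that is, the HSZ40's target ID behind the KZPSA in slot 12, while `pkc0.7.0.12.0` is the KZPSA itself at ID 7. In this log the slot field also appears to encode the PCI bus for devices behind the bridge (2000 and 2001 for bus 2, slots 0 and 1).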
T.R | Title | User | Personal Name | Date | Lines |
6382.1 | Need a LUN | JULIET::SMITH_P | | Fri Feb 07 1997 09:15 | 17 |
| The KZPSA as shipped is at SCSI target 7. The HSZ40 is user selectable,
with up to 4 SCSI targets assigned. When you do a SHOW THIS, look for
the SCSI targets; if you did not change them, the default is "0". The
SCSI ID 6 you spoke about is slot dependent and is the SCSI ID in
relation to the disks, not the host.
If you have not created a LUN, for example by the simple operation of
using an individual disk to test with:
HSZ>init disk100
HSZ>add unit d0 disk100
you will not be able to create a volume with a file system.
Once the above is done, boot the Alpha and you should now see a disk.
Paul
|
6382.2 | Did all of that already, the problem remains. | EVMS::PIRULO::LEDERMAN | B. Z. Lederman | Fri Feb 07 1997 12:01 | 6 |
| I have already done all of the steps in .-1. I thought I had explained
that in my base note.
The problem is that the Alpha _CAN_ see the disk, but any attempts to
access it hang the system.
|
6382.3 | SHO THIS/SHO UNIT FULL | CSC32::M_DIFABIO | MOVL #OPINION,EXE$GL_BLAKHOLE | Fri Feb 07 1997 13:18 | 3 |
| Can you include a SHO THIS and SHO UNIT FULL from the HSZ?
Mark d.
|
6382.4 | Additional information. | EVMS::PIRULO::LEDERMAN | B. Z. Lederman | Mon Feb 10 1997 06:27 | 56 |
| This is what I get on the console now. (I had more disks plugged in,
they all looked the same: at the moment only one is plugged in.)
HSZ> show this
Controller:
HSZ40 ZG65009288 Firmware V30Z-2, Hardware A01
Not configured for dual-redundancy
SCSI address 7
Time: NOT SET
Host port:
SCSI target(s) (6), No preferred targets
TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
16 megabyte read cache, version 2
Cache is GOOD
Host Functionality Mode = A
HSZ> show unit full
LUN Uses
--------------------------------------------------------------
D601 DISK110
Switches:
RUN NOWRITE_PROTECT READ_CACHE
NOWRITEBACK_CACHE
MAXIMUM_CACHED_TRANSFER_SIZE = 32
State:
ONLINE to this controller
Not reserved
PREFERRED_PATH = THIS_CONTROLLER
Size: 2049853 blocks
HSZ> sho disk fu
Name Type Port Targ Lun Used by
------------------------------------------------------------------------------
DISK110 disk 1 1 0 D601
DEC RZ26 (C) DEC 392A
Switches:
NOTRANSPORTABLE
TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
Size: 2049853 blocks
Configuration being backed up on this container
This should probably be a separate note, but I first tried to do a
SET HOST /SCSI
command which is mentioned in the manuals, and I get
%DCL-W-ACTIMAGE, error activating image HSZTERM$SCSIPAD
Any suggestions on where this image is provided? Should it have been
on media supplied with the HSZ or do I have to try to track down a
CONDIST somewhere in ZKO?
Thanks again.
|
6382.5 | LUN 0 | JULIET::SMITH_P | | Mon Feb 10 1997 10:00 | 5 |
| From your response to the request for a SHOW THIS and SHOW UNITS, it
does seem you do not have a LUN 0 on SCSI target 6, but you do have a
LUN 1. Without a LUN 0 the system will not look for LUN 1. Change from
D601 to D600 and see if the symptoms change.
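To make the unit numbering concrete: elsewhere in this thread, HSZ unit D601 appears under VMS as DKB601 and D600 as DKB600, which is consistent with a unit number of 100 * target + LUN. The minimal sketch below assumes that convention; the helper names are hypothetical. It also flags any target that presents some LUN but no LUN 0, the configuration this reply recommends avoiding.

```python
from __future__ import annotations

import re


def hsz_unit_to_target_lun(unit: str) -> tuple[int, int]:
    """Split an HSZ disk unit name such as 'D601' into (SCSI target, LUN)."""
    # Assumption: unit number = 100 * target + LUN (D0 -> 0/0, D601 -> 6/1).
    m = re.fullmatch(r"[Dd](\d+)", unit.strip())
    if m is None:
        raise ValueError(f"not an HSZ disk unit name: {unit!r}")
    target, lun = divmod(int(m.group(1)), 100)
    return target, lun


def vms_disk_name(controller: str, target: int, lun: int) -> str:
    """Build an OpenVMS SCSI disk name, assuming DK<controller><100*target+LUN>."""
    # The controller letter is passed in rather than derived: in this thread the
    # HSZ40 is reached through the console's pkc but shows up under VMS as DKB.
    return f"DK{controller.upper()}{100 * target + lun}"


def missing_lun0_targets(units: list[str]) -> list[int]:
    """Return the targets that have at least one LUN defined but no LUN 0."""
    luns_by_target: dict[int, set[int]] = {}
    for u in units:
        target, lun = hsz_unit_to_target_lun(u)
        luns_by_target.setdefault(target, set()).add(lun)
    return sorted(t for t, luns in luns_by_target.items() if 0 not in luns)
```

For the configuration in the base note (D601 alone), `missing_lun0_targets(["D601"])` returns `[6]`; with D600 added it returns an empty list. As the later replies show, adding D600 did not by itself cure the hang.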
|
6382.6 | Cache Batteries | JULIET::SMITH_P | | Mon Feb 10 1997 10:02 | 4 |
| I missed it the first time, but set cache_policy=b, because your cache
battery entry does not appear in the output.
Paul
|
6382.7 | Looking worse and worse. | EVMS::65381::LEDERMAN | B. Z. Lederman | Mon Feb 10 1997 12:38 | 60 |
| | <<< Note 6382.5 by JULIET::SMITH_P >>>
| -< LUN 0 >-
|
| From your response to the request for a show this and a sho units it
| does seem you do not have a LUN 0 on SCSI target 6 but you do have a
| LUN 1. Without a LUN 0 the system will not look for LUN1. Change from
| D601 to D600 and see if the symptoms change.
This seems rather odd to me: I do not remember seeing anything like
this in the documentation. Could you supply a pointer?
In any event, I had a D600 before, and since reading your note I
created one again.
When there was only one drive at D601 the VMS system could see a
DKB601, but it was offline.
With a drive at D600 and one at D601 the VMS system sees a drive at
DKB600 which is offline, and a drive at D601 which is online. However,
any attempts to access any unit, or do a SYSMAN IO AUTO hangs the
system.
None of this looks right. We don't have any of this kind of trouble
with our much older HSJ40. I followed the system configuration manual
to the letter several times, and it ought to just plug in and run. But
NOTHING is working correctly.
(We didn't have any trouble with our HSC70 either, but of course there
was less to go wrong in the configuration.)
I am more convinced than ever that the unit is just plain defective as
delivered and that I'm going to have to call Field Service to get a
replacement or something. But I need to be able to establish that it
really isn't working before I can put in the service call.
| <<< SSAG::DISK$ARCH2:[NOTES$LIBRARY.SSAG]ASK_SSAG.NOTE;32767 >>>
| -< Ask the Storage Architecture Group >-
|================================================================================
|Note 6382.6 Can see HSZ40 but can't access any drives. 6 of 6
|JULIET::SMITH_P 4 lines 10-FEB-1997 10:02
| -< Cahce Batteries >-
|--------------------------------------------------------------------------------
| I missed it the first time but change your cache_policy=b because it
| appears your cache battery entry does not appear.
|
| Paul
Apparently, we don't have any. I opened up the unit to look, and the
spaces for the batteries are empty and there is a jumper in the
"battery disable" position.
Aren't brand-new HSZ40s supposed to have batteries?
I dug through the papers that came with it and it included what looks
like a printout of the console with a SHOW THIS_CONTROLLER FULL
command. The printout looks similar to what I get now. It does show
CACHE_POLICY = A, however. It also shows all of the licenses disabled.
We ordered it with shadowing and raid licenses, and the paperwork
included documents to that effect.
|
6382.8 | CLEAR LOST_DATA, just to be sure... | CSC32::M_DIFABIO | MOVL #OPINION,EXE$GL_BLAKHOLE | Mon Feb 10 1997 17:14 | 8 |
| Pointer to HSZTERM is CSC32::ISG$COMMON:[HSZ40.HSZTERM_VMS]. And
another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
See if the controller responds. Another thing I might do is a
CLEAR LOST_DATA D600 or D601. Sometimes there is lost data even though
the SHO UNIT output doesn't indicate it. I'd suggest doing the
CLEAR LOST_DATA D600 to at least rule that out as a possibility.
Mark d.
|
6382.9 | Batteries Required | JULIET::SMITH_P | | Mon Feb 10 1997 19:46 | 8 |
| Since you do not have any cache batteries, the jumper is installed (a
good thing), and you do not have a license for WBC (write-back cache),
set cache_policy=b. This will put the controller in write-through mode.
Also, most operating systems require a LUN 0 before they look for a LUN 1.
Paul
|
6382.10 | My controller won't CLEAR LOST_DATA | EVMS::65381::LEDERMAN | B. Z. Lederman | Tue Feb 11 1997 07:56 | 21 |
| | <<< Note 6382.8 by CSC32::M_DIFABIO "MOVL #OPINION,EXE$GL_BLAKHOLE" >>>
| -< CLEAR LOST_DATA, just to be sure... >-
|
| Pointer to HSZTERM is CSC32::ISG$COMMON:[HSZ40.HSZTERM_VMS]. And
Thanks for the pointer. Unfortunately, HSZTerm does not install
properly on OpenVMS V7.1 on the Alpha (guess the kit hasn't caught up
yet?). There is a CLI failure, error parsing 'CACHING', specified
entry not found in command tables.
| another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
| See if the controller responds. Another thing I might do is a
| CLEAR LOST_DATA D600 or D601. Sometimes there is lost data even though
This command fails on my HSZ40 console: it says LOST_DATA can't be
performed because there is no LOST_DATA.
We're going back over all of the paperwork. The system was ordered
with licenses and the paperwork that came with it says we have
licenses, but I have to see if they were loaded properly.
|
6382.11 | SCSI_INFO doesn't work either. | EVMS::65381::LEDERMAN | B. Z. Lederman | Tue Feb 11 1997 07:58 | 13 |
| | another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
I tried this, and the controller never responds. The system is just
hung forever, or until I do a Control-P on the system console. At that
point, I get an "unknown system state" error, and I have to INIT the
system to continue.
When I boot now I get an error I don't remember seeing before,
CAM command EXECUTE_SCSI_IO timed out
even if I'm booting from a local disk and not trying to access the
HSZ40.
|
6382.12 | Terminators? | CSC32::M_DIFABIO | MOVL #OPINION,EXE$GL_BLAKHOLE | Tue Feb 11 1997 10:56 | 6 |
| I don't want to drag this along through the notesfile, so I'll mail
you on a couple other things to try. One thing I'd like to know is
where termination is (for the SCSI bus) and what the terminator looks
like. (Metal with a black cover or a molded plastic terminator.)
Mark d.
|
6382.13 | Cable was too long or defective. | EVMS::PIRULO::LEDERMAN | B. Z. Lederman | Wed Feb 12 1997 10:21 | 9 |
|
Thanks to some off-line help from M. Difabio, my system is now working.
The HSZ40 was supplied with a BN21F-10 cable. This cable was either
defective, or is too long even for differential SCSI. I replaced it
with a BN21F-05 and everything appears to be working normally.
Bart.
|
6382.14 | BN21F? | USPS::FPRUSS | Frank Pruss, 202-232-7347 | Sat Feb 15 1997 05:28 | 2 |
| I don't find a BN21F-xx in the price file. I would have expected a
BN21K-xx.
|