
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

6382.0. "Can see HSZ40 but can't access any drives." by EVMS::PIRULO::LEDERMAN (B. Z. Lederman) Fri Feb 07 1997 06:13

    I tried asking this in what seemed like the appropriate conference, but
    got no response.  I've looked through many notes here, and haven't seen
    anything.  Any ideas would be appreciated.
    
      <<< SSDEVO::DISK$SSDEVO_SYS2:[NOTES$LIBRARY]HSZ40_PRODUCT.NOTE;1 >>>
                         -< HSZ40 Product Conference >-
================================================================================
Note 760.0          Can see HSZ40 but can't access any drives         No replies
EVMS::PIRULO::LEDERMAN "B. Z. Lederman"             215 lines   5-FEB-1997 12:17
--------------------------------------------------------------------------------

    I've been through a lot of notes in several conferences, but I can't
    find any similar problem and don't know where to go next.
    
    We just got a new HSZ40 delivered.  At the moment, I'm using a single
    cable to connect it to one KZPSA on one AlphaServer 1000.  When
    everything comes up, the HSZ40 is seen by the Alpha but doesn't respond
    properly.  Once the system boots, the disks on the HSZ40 can be seen
    but can't be accessed.  Any attempt to access the disks hangs the
    system.
    
    I've checked at the HSZ40 console, self test runs without errors, the
    configuration looks o.k., I can test / exercise the disks without
    errors, etc.  There is no SCSI ID conflict that I can see (HSZ40 is at
    ID 6, KZPSA is at ID 2 or ID 7).  There is a "Y" adapter with a
    terminator at the HSZ40 end; the cable runs to one KZPSA adapter on one
    CPU, which should have an internal terminator.  I've tried this system on
    two different 1000s and on one 2100 and get the same results.  I can
    plug a StorageWorks box into the same systems and they work, so the
    controller at the CPU end is o.k.
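    The SCSI ID check described above is easy to make mechanical.  A
    minimal Python sketch with a hypothetical helper name: it assumes a
    wide (16-bit) bus allows IDs 0-15 and a narrow bus 0-7, and it lets a
    device list several IDs, since the HSZ40 host port can answer on more
    than one target.

```python
from collections import defaultdict


def find_id_conflicts(bus, wide=True):
    """Check one SCSI bus for out-of-range or duplicated IDs.

    bus maps a device name to the list of SCSI IDs it answers on
    (a controller host port may answer on several target IDs).
    Returns a list of problem descriptions; an empty list means no conflict.
    """
    max_id = 15 if wide else 7          # 16-bit (wide) vs 8-bit (narrow) bus
    owners = defaultdict(list)          # SCSI ID -> devices claiming it
    problems = []
    for name, ids in bus.items():
        for scsi_id in ids:
            if not 0 <= scsi_id <= max_id:
                problems.append(f"{name}: ID {scsi_id} outside 0-{max_id}")
            owners[scsi_id].append(name)
    for scsi_id in sorted(owners):
        if len(owners[scsi_id]) > 1:
            problems.append(
                f"ID {scsi_id} claimed by {', '.join(owners[scsi_id])}")
    return problems


# The layout described in the note: HSZ40 host target 6, KZPSA at 7.
print(find_id_conflicts({"KZPSA": [7], "HSZ40": [6]}))   # []
```

    An empty list only rules out duplicate or impossible IDs; it says
    nothing about cabling or termination.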
    
    I suppose it's possible that there is something mis-configured in the
    HSZ40, but I can't imagine what: I've checked everything I can find in
    the manuals, I've tried re-configuring disk drives manually and using
    CFMENU, I've tried different disks, and so on, but nothing changes.  My
    only other thought is that the unit was bad on delivery, or that the
    cable is somehow bad in a way that allows it to partially function.
    
    Below is a log of the console from one system showing what happens. 
    You can see some error messages come out when the system polls the
    controllers, so the problem apparently is at a low level.
    
    Any suggestions on where to go next?
    
    
	SYSTEM SHUTDOWN COMPLETE

halted CPU 0

halt code = 5
HALT instruction executed
PC = ffffffff8006df28
>>>init
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot  0 -- pka -- QLogic ISP1020
bus 2, slot  1 -- ewa -- DECchip 21140-AA
bus 0, slot 11 -- pkb -- DEC KZPSA
bus 0, slot 12 -- pkc -- DEC KZPSA
bus 0, slot 13 -- fwa -- DEC PCI FDDI
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.7-179, built on Dec 17 1996 at 14:26:45
>>>sho devi
waiting for pkc0.7.0.12.0 to poll...	<-
amcsr_lo = 8				<-  THESE MESSAGES OCCUR ONLY WHEN
abbrr_lo = 200a40f			<-  THE SYSTEM IS CONNECTED TO
dafqir_lo = 80052b1			<-  THE HSZ40
dacqir_lo = 8005459			<-
asr_lo = 10				<-  MORE MESSAGES COME OUT DURING
afar_lo = 0				<-  THE BOOT PROCESS, BELOW
afpr_lo = 30a				<-
waiting for pkc0.7.0.12.0 to poll...	<-
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
dka0.0.0.2000.0            DKA0                          RZ26N  0616
dka400.4.0.2000.0          DKA400                        RRD45  0436
dva0.0.0.1000.0            DVA0                               
ewa0.0.0.2001.0            EWA0              00-00-F8-03-E6-74
fwa0.0.0.13.0              FWA0              00-00-F8-4A-A0-04
pka0.7.0.2000.0            PKA0                  SCSI Bus ID 7  2.10
pkb0.7.0.11.0              PKB0                  SCSI Bus ID 7   P01  A10    
pkc0.7.0.12.0              PKC0                  SCSI Bus ID 7   P01  A10    
>>>sho conf
                        Digital Equipment Corporation
                           AlphaServer 1000A 4/***

Firmware
SRM Console:	V4.7-179
ARC Console:	4.49
PALcode:	VMS PALcode V5.56-6, OSF PALcode X1.45-12
Serial Rom:	V2.8

Processor
DECchip (tm) 21064A-2	233MHz

Memory
     64 Meg of System Memory
     Bank 0 = 64 Mbytes(16 MB Per Simm) Starting at 0x00000000
     Bank 1 = No Memory Detected 
     Bank 2 = No Memory Detected 
     Bank 3 = No Memory Detected 


 Slot	Option			Hose 0, Bus 0, PCI
   7	Intel 82375EB       	                    	Bridge to Bus 1, EISA
   8	DECchip 21050-AA    	                    	Bridge to Bus 2, PCI
  11	DEC KZPSA           	pkb0.7.0.11.0       	SCSI Bus ID 7
  12	DEC KZPSA           	pkc0.7.0.12.0       	SCSI Bus ID 7
  13	DEC PCI FDDI        	fwa0.0.0.13.0       	00-00-F8-4A-A0-04

 Slot	Option			Hose 0, Bus 1, EISA

 Slot	Option			Hose 0, Bus 2, PCI
   0	QLogic ISP1020      	pka0.7.0.2000.0     	SCSI Bus ID 7
				dka0.0.0.2000.0     	RZ26N
				dka400.4.0.2000.0   	RRD45
   1	DECchip 21140-AA    	ewa0.0.0.2001.0     	00-00-F8-03-E6-74
>>>b
ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4.
probing hose 0, PCI
probing PCI-to-EISA bridge, bus 1
probing PCI-to-PCI bridge, bus 2
bus 2, slot  0 -- pka -- QLogic ISP1020
bus 2, slot  1 -- ewa -- DECchip 21140-AA
bus 0, slot 11 -- pkb -- DEC KZPSA
bus 0, slot 12 -- pkc -- DEC KZPSA
bus 0, slot 13 -- fwa -- DEC PCI FDDI
ed.ec.eb.....ea.e9.e8.e7.e6.e5.e4.e3.e2.e1.e0.
V4.7-179, built on Dec 17 1996 at 14:26:45

CPU 0 booting

waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
error on pkc0.6.0.12.0, cmd = 12, sts = 48, camh->status = 19
amcsr_lo = 8
abbrr_lo = 200a40f
dafqir_lo = 802b0f9
dacqir_lo = 80290a5
asr_lo = 10
afar_lo = 0
afpr_lo = 30a
SIMport Adapter error: asr = 10, afpr = 30a
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
waiting for pkc0.7.0.12.0 to poll...
CAM command EXECUTE_SCSI_IO timed out
(boot dka0.0.0.2000.0 -flags 0,0)
FRU table creation disabled
block 0 of dka0.0.0.2000.0 is a valid boot block
reading 904 blocks from dka0.0.0.2000.0
bootstrap code read in
base = 1c2000, image_start = 0, image_bytes = 71000
initializing HWRPB at 2000
initializing page table at 3ff0000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
error on pkc0.6.0.12.0, cmd = 12, sts = 48, camh->status = 19


    OpenVMS (TM) Alpha Operating System, Version V7.1    

[remainder of system boot looks normal.]


$ sho dev

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DAD0:                   Online               0
PRFMK1$DKB601:          Offline              1
PRFMK1$DKC0:            Mounted              0  AXPVMSSYS      1008792   308   1
PRFMK1$DKC400:          Online wrtlck        0
PRFMK1$DVA0:            Online               0

Device                  Device           Error
 Name                   Status           Count
FTA0:                   Offline              0
LTA0:                   Offline mounted      0
OPA0:                   Online               0
RTA0:                   Offline              0
RTB0:                   Offline              0
TTA0:                   Online               0

Device                  Device           Error
 Name                   Status           Count
LRA0:                   Online               0

Device                  Device           Error
 Name                   Status           Count
EWA0:                   Online               0
EWA2:                   Online               0
EWA3:                   Online               0
FWA0:                   Online               0
FWA2:                   Online               0
FWA4:                   Online               0
FWA5:                   Online               0
GQA0:                   Online               0
IKA0:                   Offline              0
IMA0:                   Offline              0
INA0:                   Offline              0
LAST0:                  Online               0
MPA0:                   Online               0
OPA2:                   Online               0
OPA3:                   Online               0
PKA0:                   Online               0
PKB0:                   Online               1
PKC0:                   Online               0
WSA0:                   Offline              0
WSA1:                   Online               0
$ sho dev/fu dkb601

Disk PRFMK1$DKB601:, device type unknown, is offline, file-oriented device,
    shareable, available to cluster, error logging is enabled.

    Error count                    1    Operations completed                  0
    Owner process                 ""    Owner UIC                      [SYSTEM]
    Owner process ID        00000000    Dev Prot            S:RWPL,O:RWPL,G:R,W
    Reference count                0    Default buffer size                 512
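    The boot-time errors name pkc0.6.0.12.0, while the adapter itself polls
    as pkc0.7.0.12.0.  Reading these with the usual SRM console layout
    (device class, adapter letter, unit, node, channel, slot, hose; treated
    here as an assumption) places the errors at SCSI node 6 on the KZPSA in
    slot 12, the ID the HSZ40 host port answers on.  A hypothetical Python
    decoder for such names:

```python
import re
from typing import NamedTuple


class SrmDevice(NamedTuple):
    prefix: str    # device class, e.g. "dk" (disk) or "pk" (SCSI port)
    adapter: str   # controller letter: a, b, c, ...
    unit: int      # for SCSI disks, target * 100 + LUN (assumed)
    node: int      # for SCSI devices, the SCSI ID on the bus
    channel: int
    slot: int      # logical slot; this log shows 2000/2001 for PCI bus 2
    hose: int


_SRM_NAME = re.compile(r"([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_srm_name(name):
    """Split a console name such as 'pkc0.6.0.12.0' into its fields."""
    m = _SRM_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not an SRM device name: {name!r}")
    prefix, adapter, *numbers = m.groups()
    return SrmDevice(prefix, adapter, *(int(n) for n in numbers))


err = parse_srm_name("pkc0.6.0.12.0")        # from the boot-time errors
print(err.adapter, err.node, err.slot)       # c 6 12
```

    Under the same assumption, dka400.4.0.2000.0 decodes to unit 400,
    i.e. target 4, LUN 0, which agrees with its node field of 4.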

6382.1. "Need a LUN" by JULIET::SMITH_P Fri Feb 07 1997 09:15 (17 lines)
    The KZPSA as shipped is at SCSI target 7.  The HSZ40 is user
    selectable, with up to 4 SCSI targets assigned.  When you do a SHOW
    THIS, look for the SCSI target(s); if you did not change it, the
    default is "0".  The SCSI ID 6 you spoke about is slot dependent and is
    the SCSI ID in relation to the disks, not the host.
    
    If you have not yet created a LUN, do so with a single disk for
    testing:
    
    HSZ>init disk100
    HSZ>add unit d0 disk100
    
    Without a unit, you would not be able to create a volume with a file
    system.
    
    Once this is done, boot the Alpha and you should see a disk.
    
    Paul
6382.2. "Did all of that already, the problem remains." by EVMS::PIRULO::LEDERMAN (B. Z. Lederman) Fri Feb 07 1997 12:01 (6 lines)
    I have already done all of the steps in .-1.  I thought I had explained
    that in my base note.
    
    The problem is that the Alpha _CAN_ see the disk, but any attempts to
    access it hang the system.
    
6382.3. "SHO THIS/SHO UNIT FULL" by CSC32::M_DIFABIO (MOVL #OPINION,EXE$GL_BLAKHOLE) Fri Feb 07 1997 13:18 (3 lines)
    Can you include a SHO THIS and SHO UNIT FULL from the HSZ?
    
              Mark d.
6382.4. "Additional information." by EVMS::PIRULO::LEDERMAN (B. Z. Lederman) Mon Feb 10 1997 06:27 (56 lines)
    This is what I get on the console now.  (I had more disks plugged in,
    they all looked the same: at the moment only one is plugged in.)
    
HSZ> show this
Controller:
        HSZ40 ZG65009288 Firmware V30Z-2, Hardware  A01
        Not configured for dual-redundancy
        SCSI address 7
        Time: NOT SET
Host port:
        SCSI target(s) (6), No preferred targets
        TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
        16 megabyte read cache, version 2
        Cache is GOOD
        Host Functionality Mode = A
HSZ> show unit full
    LUN                                      Uses
--------------------------------------------------------------

  D601                                       DISK110
        Switches:
          RUN                    NOWRITE_PROTECT        READ_CACHE            
          NOWRITEBACK_CACHE     
          MAXIMUM_CACHED_TRANSFER_SIZE = 32
        State:
          ONLINE to this controller
          Not reserved
          PREFERRED_PATH = THIS_CONTROLLER
        Size: 2049853 blocks
HSZ> sho disk fu
Name          Type                      Port Targ  Lun        Used by
------------------------------------------------------------------------------

DISK110       disk                         1    1    0        D601
          DEC      RZ26     (C) DEC 392A
        Switches:
          NOTRANSPORTABLE       
          TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
        Size: 2049853 blocks
        Configuration being backed up on this container
    
    This should probably be a separate note, but I first tried to do a 
    
    SET HOST /SCSI
    
    command which is mentioned in the manuals, and I get
    
    %DCL-W-ACTIMAGE, error activating image HSZTERM$SCSIPAD
    
    Any suggestions on where this image is provided?  Should it have been
    on media supplied with the HSZ or do I have to try to track down a
    CONDIST somewhere in ZKO?
    
    Thanks again.
    
6382.5. "LUN 0" by JULIET::SMITH_P Mon Feb 10 1997 10:00 (5 lines)
    From your response to the request for a show this and a sho units it
    does seem you do not have a LUN 0 on SCSI target 6 but you do have a
    LUN 1.  Without a LUN 0 the system will not look for LUN1.  Change from
    D601 to D600 and see if the symptoms change.
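    The LUN 0 point can be checked mechanically.  A minimal Python sketch,
    assuming the HSZ40 unit number encodes target * 100 + LUN (consistent
    with D601 showing up in VMS as DKB601 on host target 6) and that the
    OpenVMS name is DK, the adapter letter, and the same number.  Both
    mappings are inferred from this thread and the helper names are
    hypothetical.  The rule that a host skips higher LUNs when LUN 0 is
    missing is the claim made in this reply; the code does not verify it.

```python
import re
from collections import defaultdict


def unit_target_lun(unit):
    """Split an HSZ unit name such as 'D601' into (target, lun).

    Assumes unit number = target * 100 + LUN.
    """
    m = re.fullmatch(r"[Dd](\d+)", unit)
    if m is None:
        raise ValueError(f"unexpected unit name: {unit!r}")
    return divmod(int(m.group(1)), 100)


def targets_without_lun0(units):
    """Return the host targets that have units but no LUN 0."""
    luns = defaultdict(set)
    for unit in units:
        target, lun = unit_target_lun(unit)
        luns[target].add(lun)
    return sorted(t for t, found in luns.items() if 0 not in found)


def vms_device_name(unit, adapter):
    """Hypothetical mapping to the OpenVMS name on a given host adapter."""
    target, lun = unit_target_lun(unit)
    return f"DK{adapter.upper()}{target * 100 + lun}"


print(targets_without_lun0(["D601"]))            # [6]
print(targets_without_lun0(["D600", "D601"]))    # []
print(vms_device_name("D601", "b"))              # DKB601
```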
    
6382.6. "Cache Batteries" by JULIET::SMITH_P Mon Feb 10 1997 10:02 (4 lines)
    I missed it the first time but change your cache_policy=b because it
    appears your cache battery entry does not appear.
    
    Paul
6382.7. "Looking worse and worse." by EVMS::65381::LEDERMAN (B. Z. Lederman) Mon Feb 10 1997 12:38 (60 lines)
|                     <<< Note 6382.5 by JULIET::SMITH_P >>>
|                                   -< LUN 0 >-
|
|    From your response to the request for a show this and a sho units it
|    does seem you do not have a LUN 0 on SCSI target 6 but you do have a
|    LUN 1.  Without a LUN 0 the system will not look for LUN1.  Change from
|    D601 to D600 and see if the symptoms change.
    
    This seems rather odd to me: I do not remember seeing anything like
    this in the documentation.  Could you supply a pointer?
    
    In any event, I had a D600 before, and since reading your note I
    created one again.
    
    When there was only one drive at D601 the VMS system could see a
    DKB601, but it was offline.
    
    With a drive at D600 and one at D601, the VMS system sees a drive at
    DKB600 that is offline and a drive at DKB601 that is online.  However,
    any attempt to access either unit, or to do a SYSMAN IO AUTO, hangs the
    system.
    
    None of this looks right.  We don't have any of this kind of trouble
    with our much older HSJ40.  I followed the system configuration manual
    to the letter several times, and it ought to just plug in and run.  But
    NOTHING is working correctly.
    
    (We didn't have any trouble with our HSC70 either, but of course there
    was less to go wrong in the configuration.)
    
    I am more convinced than ever that the unit is just plain defective as
    delivered and that I'm going to have to call Field Service to get a
    replacement or something.  But I need to be able to establish that it
    really isn't working before I can put in the service call.
    
|        <<< SSAG::DISK$ARCH2:[NOTES$LIBRARY.SSAG]ASK_SSAG.NOTE;32767 >>>
|                    -< Ask the Storage Architecture Group >-
|================================================================================
|Note 6382.6        Can see HSZ40 but can't access any drives.             6 of 6
|JULIET::SMITH_P                                       4 lines  10-FEB-1997 10:02
|                              -< Cahce Batteries >-
|--------------------------------------------------------------------------------
|    I missed it the first time but change your cache_policy=b because it
|    appears your cache battery entry does not appear.
|    
|    Paul
    
    Apparently, we don't have any.  I opened up the unit to look, and the
    spaces for the batteries are empty and there is a jumper in the
    "battery disable" position.
    
    Aren't brand-new HSZ40s supposed to have batteries?
    
    I dug through the papers that came with it and it included what looks
    like a printout of the console with a SHOW THIS_CONTROLLER FULL
    command.  The printout looks similar to what I get now.  It does show
    CACHE_POLICY = A, however.  It also shows all of the licenses disabled.
    We ordered it with shadowing and raid licenses, and the paperwork
    included documents to that effect.
    
6382.8. "CLEAR LOST_DATA, just to be sure..." by CSC32::M_DIFABIO (MOVL #OPINION,EXE$GL_BLAKHOLE) Mon Feb 10 1997 17:14 (8 lines)
    Pointer to HSZTERM is CSC32::ISG$COMMON:[HSZ40.HSZTERM_VMS]. And 
    another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
    See if the controller responds. Another thing I might do is a 
    CLEAR LOST_DATA D600 or D601. Sometimes there is lost data even though
    the SHO UNIT output doesn't indicate it. I'd suggest doing the 
    CLEAR LOST_DATA D600 to at least rule that out as a possibility.
    
                             Mark d.
6382.9. "Batteries Required" by JULIET::SMITH_P Mon Feb 10 1997 19:46 (8 lines)
    Since you do not have any cache batteries, the jumper is installed (a
    good thing), and you do not have a license for write-back cache (WBC),
    set cache_policy=b.  This will put the controller in write-through mode.
    
    Also, most operating systems require a LUN 0 before they will look for
    a LUN 1.
    
    Paul
    
6382.10. "My controller won't CLEAR LOST_DATA" by EVMS::65381::LEDERMAN (B. Z. Lederman) Tue Feb 11 1997 07:56 (21 lines)
|     <<< Note 6382.8 by CSC32::M_DIFABIO "MOVL #OPINION,EXE$GL_BLAKHOLE" >>>
|                    -< CLEAR LOST_DATA, just to be sure... >-
|
|    Pointer to HSZTERM is CSC32::ISG$COMMON:[HSZ40.HSZTERM_VMS]. And 
    
    Thanks for the pointer.  Unfortunately, HSZTerm does not install
    properly on OpenVMS V7.1 on the Alpha (guess the kit hasn't caught up
    yet?).  There is a CLI failure, error parsing 'CACHING', specified
    entry not found in command tables.
    
|    another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
|    See if the controller responds. Another thing I might do is a 
|    CLEAR LOST_DATA D600 or D601. Sometimes there is lost data even though
    
    This command fails on my HSZ40 console: it says CLEAR LOST_DATA can't
    be performed because there is no lost data.
    
    We're going back over all of the paperwork.  The system was ordered
    with licenses and the paperwork that came with it says we have
    licenses, but I have to see if they were loaded properly.
    
6382.11. "SCSI_INFO doesn't work either." by EVMS::65381::LEDERMAN (B. Z. Lederman) Tue Feb 11 1997 07:58 (13 lines)
|    another thing you might do is SCSI_INFO on the DKA600 or DKA601 device.
    
    I tried this, and the controller never responds.  The system is just
    hung forever, or until I do a Control-P on the system console.  At that
    point, I get an "unknown system state" error, and I have to INIT the
    system to continue.
    
    When I boot now I get an error I don't remember seeing before,
    
    CAM command EXECUTE_SCSI_IO timed out
    
    even if I'm booting from a local disk and not trying to access the
    HSZ40.
6382.12. "Terminators?" by CSC32::M_DIFABIO (MOVL #OPINION,EXE$GL_BLAKHOLE) Tue Feb 11 1997 10:56 (6 lines)
    I don't want to drag this along through the notesfile, so I'll mail
    you a couple of other things to try.  One thing I'd like to know is
    where the SCSI bus is terminated and what the terminator looks like
    (metal with a black cover, or a molded plastic terminator).
    
                      Mark d.
6382.13. "Cable was too long or defective." by EVMS::PIRULO::LEDERMAN (B. Z. Lederman) Wed Feb 12 1997 10:21 (9 lines)
    
    Thanks to some off-line help from M. Difabio, my system is now working.
    
    The HSZ40 was supplied with a BN21F-10 cable.  This cable was either
    defective or too long even for differential SCSI.  I replaced it
    with a BN21F-05, and everything appears to be working normally.
    
    Bart.
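    Whether a cable is "too long" is simple arithmetic once every segment
    on the bus is counted.  A minimal Python sketch using the commonly
    cited SCSI-2 limits (6 m single-ended, 3 m single-ended at Fast SCSI
    rates, 25 m differential).  It assumes, without confirmation from this
    thread, that the numeric suffix of a BN21F/BN21K part number is the
    length in meters; the helper name and the segment list in the example
    are hypothetical.

```python
# Commonly cited SCSI-2 maximum bus lengths, in meters.
MAX_BUS_LENGTH_M = {
    "single-ended": 6.0,         # 5 MHz transfers
    "single-ended fast": 3.0,    # 10 MHz (Fast SCSI) transfers
    "differential": 25.0,        # high-voltage differential
}


def check_bus_length(segments_m, bus_type):
    """Sum every cable segment on the bus (external cables, Y adapters,
    internal cabling) and compare the total with the limit for bus_type.
    Returns (total_m, limit_m, within_limit)."""
    total = sum(segments_m)
    limit = MAX_BUS_LENGTH_M[bus_type]
    return total, limit, total <= limit


# Hypothetical: one 10 m external cable, nothing else counted.
print(check_bus_length([10.0], "differential"))   # (10.0, 25.0, True)
```

    The check covers only the "too long" half of the diagnosis; it cannot
    detect a defective cable.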
    
6382.14. "BN21F?" by USPS::FPRUSS (Frank Pruss, 202-232-7347) Sat Feb 15 1997 05:28 (2 lines)
    I don't find a BN21F-xx in the price file.  I would have expected a
    BN21K-xx.