
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

6400.0. "EZ32 on HSJ50 errlog analysis please ?" by KAOFS::mqsn10.mqs.dec.com::d_Ormaechea () Mon Feb 17 1997 13:11


I need help to get more information on errorlog analysis for an EZ32-VW
drive on an HSJ50 in a VAX/VMS cluster running version 5.5-2. I had two
failures on device DUA1000 (original and replacement drive) reporting an
uncorrectable ECC error. The drive goes offline because of the error. Here
are the SWEAT V2.7 and errorlog outputs that I got. They do not lead me to
any ASC/ASCQ code that could tell me whether both problems are similar. In
both cases, we had to swap the drive.

I can make both ERRLOG.SYS files available on the net for anybody who
might be interested. One more problem with these drives: we cannot easily
get the EZ32-VW swap unit from SR17, so in both cases we had to replace the
internal unit (EZ32-W). Any reason why?


Thanks in advance

Denis.

-----------------------------------------------------------------------

 Copyright Digital Equipment Corporation 1993,1994. All rights reserved. 
 StorageWorks Errorlog Analyser Tool	X2.7 


** X2.7 ******************** Entry ********************************
 ERROR SEQUENCE 632.			     Logged On: SID 13000202
 Date/Time 13-FEB-1997 12:22:22.33                      Sys_Type 02100101
 SCS Node:  TS3      
 ERL$LogMessage Entry
 I/O Sub-System, Unit HSJ010$DUA1000
        Message Type              0001	Disk MSCP Message
        MSLG$L_CMD_REF        00000000
        MSLG$W_SEQ_NUM            0003
        MSLG$B_FORMAT               02
                                        Disk Xfer log
        MSLG$B_FLAGS                00
                                        Unrecoverable error
        MSLG$W_EVENT              000B
        MSLG$Q_CNT_ID         63300559
                              012D0009
                                        Unique Identifier, 000963300559
                                        Mass Storage Controller
                                        HSJ50
        MSLG$B_CNT_SVR              50
                                        Cronic Rev V800
        MSLG$B_CNT_HVR              01
        MSLG$W_MULT_UNT           0005
        MSLG$Q_UNIT_ID        00000000
                              02FF0000
                                        Unique Identifier, 000000000000
                                        Disk Class Device (166)
                                        HSX001
        MSLG$B_UNIT_SVR             01
        MSLG$B_UNIT_HVR             34
        MSLG$B_LEVEL                01
        MSLG$B_RETRY                00
        MSLG$L_VOL_SER        05590002
        MSLG$L_HDR_CODE       000358A0
                                        LBN = 219296
        Instance Code         0328450A
        Template Type               51
                                        Disk Xfer Error Event
        Template Size               3C
        Event Time            002C0459
                              00000000
        Ancillary Info        00000000
                              000358A0
                              00000000
                                        Byte Count = 	00000000
                                        LBN = 	000358A0
        Device Locator          000501
                                        PTL = 1/5/0
        Device Type                 00
                                        Magnetic Disk
        Device Ident          32335A45
                              20202020
                              29432820
                              43454420
                                        "EZ32     (C) DEC"
        Device Serial No      20202020
                              20202020
                                        "        "
        SCSI Cmd Opcode             2A
                                        WRITE (10 byte)
        Sense Data Qual             80
                                        Sense Data from device

        Err Code & Valid            70
                                        Error during Current Command
                                        Sense Data Info Field NOT Valid
        Segment                     00
        Sense Flags                 00
                                        No Sense
        Information Field     00000000
        Additional Sense length     0A
        Command Spec          00000000
        ASCW                      0000
                                        ASC = 00, ASCQ = 00
                                        No Additional Sense Info
        FRUCode                     00
        Keyspec                 000000

Decoded Instance Code is:-
    The disk device reported standard SCSI Sense Data. Check the service 
    manual for the device for further instructions.

Repair action is:-
    If SWEAT has not interpreted the device-supplied Sense  data,  refer 
    to the device Service Manual.
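SWEAT's field list above (Err Code & Valid, Segment, Sense Flags, Information, Additional Sense length, Command Spec, ASC/ASCQ, FRU, Keyspec) follows the standard SCSI fixed-format sense layout. As a minimal sketch, assuming the 18 bytes are exactly the values SWEAT printed in that order, the Python below decodes them. It shows why the entry looks odd: a failed WRITE (opcode 2A) whose sense data reads NO SENSE with ASC/ASCQ 00/00. The helper name `decode_fixed_sense` is hypothetical and the sense-key table is partial.

```python
# Sketch: decode SCSI fixed-format sense data (response code 0x70/0x71).
# The byte values are rebuilt by hand from the SWEAT listing above; the
# helper name is hypothetical and the sense-key table is partial.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_fixed_sense(sense: bytes) -> dict:
    """Return the main fields of a fixed-format sense buffer."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F          # bit 7 of byte 0 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {response_code:#04x}")
    key = sense[2] & 0x0F                    # sense key is the low nibble of byte 2
    return {
        "current_error": response_code == 0x70,   # 0x71 would mean deferred
        "info_valid": bool(sense[0] & 0x80),
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "OTHER"),
        "information": int.from_bytes(sense[3:7], "big"),
        "additional_length": sense[7],
        "asc": sense[12],
        "ascq": sense[13],
    }


# Err Code & Valid 70, Segment 00, Sense Flags 00, Information 00000000,
# Additional Sense length 0A, Command Spec 00000000, ASC 00, ASCQ 00,
# FRU 00, Keyspec 000000  ->  8 + 0x0A = 18 bytes in total.
logged = (bytes([0x70, 0x00, 0x00]) + bytes(4) + bytes([0x0A])
          + bytes(4) + bytes([0x00, 0x00, 0x00]) + bytes(3))

if __name__ == "__main__":
    print(decode_fixed_sense(logged))
```

With real sense data from the drive, the same fields (sense key, ASC, ASCQ) are the ones to compare between the two failures.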

Error Log Report Generator					Version V5.5
 ******************************* ENTRY  *******************************
 ERROR SEQUENCE 632.                             LOGGED ON:        SID 13000202
 DATE/TIME 13-FEB-1997 12:22:22.33                            SYS_TYPE 02100101
 SYSTEM UPTIME: 4 DAYS 00:11:53
 SCS NODE: TS3                                                 VAX/VMS V5.5-2

 ERL$LOGMESSAGE ENTRY  KA66  CPU FW REV# 2.  CONSOLE FW REV# 1.0
                       XMI NODE # 1.

 I/O SUB-SYSTEM, UNIT _HSJ010$DUA1000:

       MESSAGE TYPE        0001
                                       DISK MSCP MESSAGE
       MSLG$L_CMD_REF  00000000
       MSLG$W_UNIT         03E8
                                       UNIT #1000.
       MSLG$W_SEQ_NUM      0003
                                       SEQUENCE #3.
       MSLG$B_FORMAT         02
                                       DISK TRANSFER LOG
       MSLG$B_FLAGS          00
                                       UNRECOVERABLE ERROR
       MSLG$W_EVENT        000B
                                       DRIVE ERROR
                                       UNKNOWN SUBCODE #0000(X)
       MSLG$Q_CNT_ID   63300559
                       012D0009
                                       UNIQUE IDENTIFIER, 000963300559(X)
                                       MASS STORAGE CONTROLLER
                                       MODEL = 45.
       MSLG$B_CNT_SVR        50
                                       CONTROLLER SOFTWARE VERSION #80.
       MSLG$B_CNT_HVR        01
                                       CONTROLLER HARDWARE REVISION #1.
       MSLG$W_MULT_UNT     0005
       MSLG$Q_UNIT_ID  00000000
                       02FF0000
                                       UNIQUE IDENTIFIER, 000000000000(X)
                                       DISK CLASS DEVICE (166)
                                       MODEL = 255.
       MSLG$B_UNIT_SVR       01
                                       UNIT SOFTWARE VERSION #1.
       MSLG$B_UNIT_HVR       34
                                       UNIT HARDWARE REVISION #52.
       MSLG$B_LEVEL          01
       MSLG$B_RETRY          00
       MSLG$L_VOL_SER  05590002
                                       VOLUME SERIAL #89718786.
       MSLG$L_HDR_CODE 000358A0
                                       LOGICAL BLOCK #219296.
                                       GOOD LOGICAL SECTOR

 CONTROLLER DEPENDENT INFORMATION

       LONGWORD 1.     0328450A
                                       /.E(./
       LONGWORD 2.     00003C51
                                       /Q<../
       LONGWORD 3.     00000000
                                       /..../
       LONGWORD 4.     002C0459
                                       /Y.,./
       LONGWORD 5.     00000000
                                       /..../
       LONGWORD 6.     00000000
                                       /..../
       LONGWORD 7.     000358A0
                                       /.X../
       LONGWORD 8.     00000000
                                       /..../
       LONGWORD 9.     00000501
                                       /..../
       LONGWORD 10.    32335A45
                                       /EZ32/
       LONGWORD 11.    20202020
                                       /    /
       LONGWORD 12.    29432820
                                       / (C)/
       LONGWORD 13.    43454420
                                       / DEC/
       LONGWORD 14.    20202020
                                       /    /
       LONGWORD 15.    20202020
                                       /    /
       LONGWORD 16.    0070802A
                                       /*.p./
       LONGWORD 17.    00000000
                                       /..../
       LONGWORD 18.    00000A00
                                       /..../
       LONGWORD 19.    00000000
                                       /..../
       LONGWORD 20.    00000000
                                       /..../
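Most of ERF's decodings above are plain hex-to-decimal conversions plus a byte split of the identifier quadwords, and its right-hand column is each longword printed as ASCII in little-endian (VAX memory) order. The sketch below reproduces them. The identifier layout (48-bit unique ID, then model byte, then class byte) is inferred from this ERF output, not taken from an MSCP manual, and the helper names are hypothetical. Note that the controller's unique identifier 000963300559 matches the HSJ50 serial number ZG63300559 in the configuration listing later in the thread.

```python
# Sketch: reproduce the decodings ERF prints for this entry.
# Layouts are inferred from the ERF and SWEAT output in this note;
# the helper names are hypothetical.


def decode_id_quadword(low: int, high: int) -> tuple[int, int, int]:
    """Split an MSCP identifier (low, high longword) into
    (48-bit unique identifier, model, class)."""
    unique_id = ((high & 0xFFFF) << 32) | low
    model = (high >> 16) & 0xFF
    device_class = (high >> 24) & 0xFF
    return unique_id, model, device_class


def longword_ascii(value: int) -> str:
    """ERF's right-hand column: bytes in little-endian (memory) order,
    anything outside printable ASCII shown as '.'."""
    raw = value.to_bytes(4, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    print(int("03E8", 16))        # MSLG$W_UNIT     -> UNIT #1000.
    print(int("05590002", 16))    # MSLG$L_VOL_SER  -> VOLUME SERIAL #89718786.
    print(int("000358A0", 16))    # MSLG$L_HDR_CODE -> LOGICAL BLOCK #219296.
    uid, model, cls = decode_id_quadword(0x63300559, 0x012D0009)
    print(f"{uid:012X} model={model} class={cls}")   # controller
    uid, model, cls = decode_id_quadword(0x00000000, 0x02FF0000)
    print(f"{uid:012X} model={model} class={cls}")   # unit
    words = (0x32335A45, 0x20202020, 0x29432820, 0x43454420)  # longwords 10-13
    print(repr("".join(longword_ascii(w) for w in words)))
```

Class 1 and class 2 line up with ERF's "MASS STORAGE CONTROLLER" and "DISK CLASS DEVICE" labels, and models 45 and 255 with its MODEL lines.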
      
6400.1. "Does not look right..." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Mon Feb 17 1997 15:51
The analysis shows no sense keys. It does not seem right. Maybe Chris Loane will
chime in, but for some reason you got an error with no data to support it in the
log.

As for busting open the SBB, you should not have to, and you may end up eating
the cost if the dispensation we requested on the warranty sticker has not made
it into manufacturing yet. Have your logistics folks find out why they can't
get the correct variant in stock.

Were/are the termination power jumpers installed in the device as per TIMA
BLITZ TD2041?

roger.
6400.2. by KERNEL::LOANE (Comfortably numb!!) Tue Feb 18 1997 02:31
>Decoded Instance Code is:-
>    The disk device reported standard SCSI Sense Data. Check the service 
>    manual for the device for further instructions.

    The  Instance  code  SUGGESTS  that  the HSJ is about to log all the 
    extended Sense data, but....

>       LONGWORD 16.    0070802A
>                                       /*.p./
>       LONGWORD 17.    00000000
>                                       /..../
>       LONGWORD 18.    00000A00
>                                       /..../
>       LONGWORD 19.    00000000
>                                       /..../
>       LONGWORD 20.    00000000
>                                       /..../

    ......it's  all  zero.....this  is  very   strange   (i.e.   nothing 
    useful/no  further  help).  Were there ANY other errors logged at or 
    around the same time??
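Lining ERF longwords 16-20 up against SWEAT's decode of the same entry suggests a layout: byte 0 is the SCSI opcode (2A, WRITE(10)), byte 1 is the sense data qualifier (80), and the next 18 bytes are the sense buffer starting with 70. This layout is an inference from the two listings, not from HSJ documentation. Under that assumption, the sketch below unpacks the tail and confirms that the sense key, ASC and ASCQ bytes really are all zero. `split_scsi_tail` is a hypothetical helper.

```python
# Sketch: pull the SCSI opcode, sense qualifier and raw sense bytes out of
# ERF longwords 16-20. The layout (opcode, qualifier, then 18 sense bytes)
# is an assumption inferred from matching ERF against SWEAT for this entry.
import struct


def split_scsi_tail(longwords: list[int]) -> tuple[int, int, bytes]:
    # VAX longwords are little-endian: pack each one low byte first.
    raw = b"".join(struct.pack("<I", lw) for lw in longwords)
    opcode, qualifier = raw[0], raw[1]
    sense = raw[2:20]
    return opcode, qualifier, sense


if __name__ == "__main__":
    op, qual, sense = split_scsi_tail([0x0070802A, 0, 0x00000A00, 0, 0])
    print(f"opcode={op:02X} qualifier={qual:02X} sense={sense.hex()}")
    # Sense key (low nibble of byte 2), ASC (byte 12) and ASCQ (byte 13):
    print("key/ASC/ASCQ:", sense[2] & 0x0F, sense[12], sense[13])
```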

    Chris
6400.3. "GOOD, it wasn't just me ;^)" by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Tue Feb 18 1997 07:43
You may want to hook up a printer to the console of the controller and see if
any events are being dumped to the console. It sounds like either something is
being incorrectly reported OR we are missing some magic key to unlock what IS
coming back.

roger.
6400.4. "Action plan" by KAOFS::D_ORMAECHEA (Denis Ormaechea... Montreal MCS) Tue Feb 18 1997 13:55
    The customer's merged errorlog and the text output of SWEAT V2.7 are now
    available on node MQOU27 through the DECnet account (FAL$server). I found
    out that the errorlog can be analysed on VAX/VMS 6.2 even though it comes
    from a 5.5-2 system. The file names are MSE_errlog.sys and MSE_sweat.txt.
    I will try to run DECEVENT on that file this afternoon.
    
    By the way, I checked the jumper on the drive that was replaced this
    weekend, and the jumper was missing. The only jumper present was 1-2. I
    ran SCSIpro on that drive at the office and found no grown defect list
    on the drive. I tried running read scan, write verify, etc., but I
    cannot get past block number 48000. I'm working on this right now. I ran
    format successfully but still cannot get past block 48000.
    
    The plan for tonight is to run DILX on the HSJ50 to exercise the disk
    and see if we can get more accurate information out of the test. Then,
    maybe format the drive if necessary. After that, we should move the
    drive to another slot in the BA356 and recreate the unit. FMU on both
    HSJs has not shown any problem so far. I will also hook up a printer to
    the HSJ.
    
    Regards,
    
    Denis Ormaechea
    DTN-632-7942
    
6400.5. "Troubleshooting results." by KAOFS::D_ORMAECHEA (Denis Ormaechea... Montreal MCS) Wed Feb 19 1997 19:49
--------------------------------------------------------------------------------
DENIS ORMAECHEA           <Troubleshooting results.>           19-FEB-1997 21:30
--------------------------------------------------------------------------------


18-Feb-97 Action

	I was onsite almost all day to gather all possible information in the
logs about the problem since it started. The most important information was in
the errorlog of node TS4, but the file had corrupted entries that needed to be
fixed with RMS. After fixing everything, I merged all errorlog information from
the cluster since the problem started (20-Jan-97). I brought the merged errorlog
and the SWEAT text output to the office on a tape cartridge.

	I copied the files to the MQOU27 DECnet account and asked RDC to run
DECEVENT on them to get more information. The files are called MSE_errlog.sys,
MSE_sweat.txt and MSE_decevent.txt.

	My first action onsite, at 17:30, was to run DILX on both drives to
get better information from the SCSI ASC/ASCQ status. The DUA1000 disk showed
errors within 2 minutes, with the following results:

This is the configuration:
*****************************************************************************
Controller:
        HSJ50-AX ZG63300559 Firmware V50J-2, Hardware  A01
        Configured for dual-redundancy with ZG63100486
            In dual-redundant configuration
        SCSI address 6
        Time: 18-FEB-1997 17:38:50
Host port:
        Node name: HSJ010, valid CI node 6, 16 max nodes
        System ID 420010061122
        Path A is ON
        Path B is ON
        MSCP allocation class    1
        TMSCP allocation class   1
        CI_ARBITRATION = ASYNCHRONOUS
        MAXIMUM_HOSTS = 31
        NOCI_4K_PACKET_CAPABILITY
Cache:
        128 megabyte write cache, version 3
        Cache is GOOD
        Battery is GOOD
        No unflushed data in cache
        CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
        CACHE_POLICY = A
        NOCACHE_UPS
HSJ010 > sho d1000
MSCP unit                                    Uses
--------------------------------------------------------------

  D1000                                      DISK150
        Switches:
          RUN                    NOWRITE_PROTECT        READ_CACHE            
          WRITEBACK_CACHE       
          MAXIMUM_CACHED_TRANSFER_SIZE = 32
        State:
          AVAILABLE
          No exclusive access
          PREFERRED_PATH = THIS_CONTROLLER
        Size: 523366 blocks
HSJ010 > sho disk150
Name          Type          Port Targ  Lun                    Used by
------------------------------------------------------------------------------

DISK150       disk             1    5    0                    D1000
          DEC      EZ32     (C) DEC V064
        Switches:
          NOTRANSPORTABLE       
          TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
        Size: 523366 blocks
        Configuration being backed up on this container
HSJ010 > sho d1100
MSCP unit                                    Uses
--------------------------------------------------------------

  D1100                                      DISK210
        Switches:
          RUN                    NOWRITE_PROTECT        READ_CACHE            
          WRITEBACK_CACHE       
          MAXIMUM_CACHED_TRANSFER_SIZE = 32
        State:
          ONLINE to the other controller
          No exclusive access
          PREFERRED_PATH = OTHER_CONTROLLER
        Size: 523366 blocks
HSJ010 > sho disk210
Name          Type          Port Targ  Lun                    Used by
------------------------------------------------------------------------------

DISK210       disk             2    1    0                    D1100
          DEC      EZ32     (C) DEC V064
        Switches:
          NOTRANSPORTABLE       
          TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
        Size: 523366 blocks
        Configuration being backed up on this container


These are the results:

*******************************************************************************


HSJ010 > run dilx

Disk Inline Exerciser - version 2.0

Note: DILX will only test units with a single physical device.

The Auto-Configure option will automatically select, for testing, half or
all of the disk units configured. It will perform a very thorough test with
*WRITES* enabled. Only disk units with a single physical device will be
tested. The user will only be able to select the run time and
performance summary options and whether to test a half or full configuration.
The user will not be able to specify specific units to test.
The Auto-Configure option is only recommended for initial installations.

Do you wish to perform an Auto-Configure (y/n) [n] ?

Use all defaults and run in read only mode (y/n) [y] ?n
Enter execution time limit in minutes (1:65535) [10] ?30
Enter performance summary interval in minutes (1:65535) [10] ?
Include performance statistics in performance summary (y/n) [n] ?
Display hard/soft errors (y/n) [n] ?y
Display hex dump of Error Information Packet Requester Specific 
information (y/n) [n] ?
When the hard error limit is reached, the unit will be dropped from testing.
Enter hard error limit (1:65535) [65535] ?
When the soft error limit is reached, soft errors will no longer be
displayed but testing will continue for the unit.
Enter soft error limit (1:65535) [32] ?
Enter IO queue depth (1:12) [4] ?
  *** Available tests are:
    1. Basic Function
    2. User Defined

Use the Basic Function test 99.9% of the time. The User Defined
test is for special problems only.
Enter test number (1:2) [1] ?1

 **CAUTION**
If you answer yes to the next question, user data WILL BE destroyed.

Write enable disk unit(s) to be tested (y/n) [n] ?y
The write percentage will be set automatically. 
Enter read percentage for Random IO and Data Intensive phase (0:100) [67] ?
Enter data pattern number 0=ALL, 19=USER_DEFINED, (0:19) [0] ?
Perform initial write (y/n) [n] ?y
The erase percentage will be set automatically.
Enter access percentage for Seek Intensive phase (0:100) [90] ?
Perform data compare (y/n) [n] ?y
Enter compare percentage (1:100) [5] ?50
Disk unit numbers available for testing on this controller include:
    1000
    1100
Enter unit number to be tested ?1000
Unit 1000 will be write enabled.
Do you still wish to add this unit (y/n) [n] ?y
Enter start block number (0:523365) [0] ?
Enter end block number (0:523365) [523365] ?
Unit 1000 successfully allocated for testing
Select another unit (y/n) [n] ?

   DILX testing started at: 18-FEB-1997 17:57:01
    Test will run for 30 minutes
    Type ^T(if running DILX through VCS) or ^G(in all other cases)
      to get a current performance summary
    Type ^C to terminate the DILX test prematurely
    Type ^Y to terminate DILX prematurely

Error Information Packet in hex
      Cmd Ref Number       000010D5
      Unit Number          000003E8
      Log Sequence         0000002F
      Format               02
      Flags                40
      Event Code           0000000B
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0005
      Unit ID[0]           00000000
      Unit ID[1]           02FF0000
      Unit Software Rev    01
      Unit Hardware Rev    34
      Recovery Level       01
      Retry Count          00
      Serial Number        05590004
      Header Code          00022B8F
      Instance                    0328450A
      Template Type               51
      Requestor Information Size  3C
      Sense Key                   01
      ASC                         17
      ASQ                         07

Error Information Packet in hex
      Cmd Ref Number       000010D5
      Unit Number          000003E8
      Log Sequence         00000030
      Format               02
      Flags                80
      Event Code           0000000B
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0005
      Unit ID[0]           00000000
      Unit ID[1]           02FF0000
      Unit Software Rev    01
      Unit Hardware Rev    34
      Recovery Level       01
      Retry Count          00
      Serial Number        05590004
      Header Code          00022B8F
      Instance                    0328450A
      Template Type               51
      Requestor Information Size  3C
      Sense Key                   01
      ASC                         17
      ASQ                         07

Error Information Packet in hex
      Cmd Ref Number       00000000
      Unit Number          00000000
      Log Sequence         00000032
      Format               00
      Flags                02
      Event Code           0000016A
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0000
      Instance                    03F40064
      Template Type               41
      Requestor Information Size  04

 Bad Value Added Completion Status for unit 1000, end message in hex
      Event Code                 0043
      Op Code                    21
      Cmd Ref Number             000017CE
      Byte Count                 00005A00
      Error Byte Count           00000000
      Sequence Number            0000
      Flags                      00

Error Information Packet in hex
      Cmd Ref Number       000017CE
      Unit Number          000003E8
      Log Sequence         00000031
      Format               02
      Flags                40
      Event Code           0000002B
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0005
      Unit ID[0]           00000000
      Unit ID[1]           02FF0000
      Unit Software Rev    01
      Unit Hardware Rev    34
      Recovery Level       01
      Retry Count          00
      Serial Number        05590004
      Header Code          00026B8B
      Instance                    031A4002
      Template Type               51
      Requestor Information Size  3C
      Sense Key                   04
      ASC                         B0
      ASQ                         00

Error Information Packet in hex
      Cmd Ref Number       00000000
      Unit Number          00000000
      Log Sequence         00000034
      Format               00
      Flags                02
      Event Code           0000016A
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0000
      Instance                    03F40064
      Template Type               41
      Requestor Information Size  04

Error Information Packet in hex
      Cmd Ref Number       000017CE
      Unit Number          000003E8
      Log Sequence         00000033
      Format               02
      Flags                00
      Event Code           0000012B
      Controller ID        63300559 012D0009
      Controller SW ver    50
      Controller HW ver    01
      Multi Unit Code      0005
      Unit ID[0]           00000000
      Unit ID[1]           02FF0000
      Unit Software Rev    01
      Unit Hardware Rev    34
      Recovery Level       01
      Retry Count          00
      Serial Number        05590004
      Header Code          00026B8B
      Instance                    03134002
      Template Type               51
      Requestor Information Size  3C
      Sense Key                   04
      ASC                         E0
      ASQ                         06
  The unit status and/or the unit device type changed unexpectedly.
  Unit 1000 dropped from testing

   DILX Summary at 18-FEB-1997 17:58:34
   Test minutes remaining: 29, expired: 1

Cnt err in HEX  IC:03F40064  PTL:01/05/FF  Key:06  ASC/Q:00/00  HC:0  SC:2
  Total Cntrl Errs   Hard Cnt 0   Soft Cnt 2

Unit 1000     Total IO Requests 6098
  Err in Hex: IC 0328450A  PTL:01/05/00  Key:01  ASC/Q:17/07  HC:0  SC:2
  Err in Hex: IC 031A4002  PTL:01/05/00  Key:04  ASC/Q:B0/00  HC:0  SC:1
  Err in Hex: IC 03134002  PTL:01/05/00  Key:04  ASC/Q:E0/06  HC:1  SC:0
  Total Errs   Hard Cnt 1   Soft Cnt 3
  The unit status and/or the unit device type changed unexpectedly.
  Unit 1000 dropped from testing
Reuse Parameters (stop, continue, restart, change_unit) [stop] ?

DILX - Normal Termination
************************************************************************
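The DILX Event Code values (000B, 002B, 012B, 016A, and the end-message 0043) look like MSCP status words, where the low 5 bits carry the major code and the remaining bits the subcode; ERF's decode of 000B as "DRIVE ERROR, UNKNOWN SUBCODE #0000" fits that reading. The sketch below splits them. The major-code names are quoted from memory of the MSCP status-code list and should be checked against the MSCP specification; subcodes are left numeric on purpose. Read this way, 0043 would be "unit offline", which would be consistent with DILX reporting that the unit status changed and dropping the unit.

```python
# Sketch: split MSCP status/event codes into major code (low 5 bits) and
# subcode (remaining bits). Major-code names are from memory of the MSCP
# status-code list; verify them against the MSCP specification.

MSCP_MAJOR = {
    0x00: "Success",
    0x01: "Invalid command",
    0x02: "Command aborted",
    0x03: "Unit offline",
    0x04: "Unit available",
    0x05: "Media format error",
    0x06: "Write protected",
    0x07: "Compare error",
    0x08: "Data error",
    0x09: "Host buffer access error",
    0x0A: "Controller error",
    0x0B: "Drive error",
    0x1F: "Message from internal diagnostic",
}


def split_status(code: int) -> tuple[int, int, str]:
    major = code & 0x1F        # bits 0-4
    subcode = code >> 5        # bits 5-15
    return major, subcode, MSCP_MAJOR.get(major, "Unknown")


if __name__ == "__main__":
    # Event codes from the DILX packets above, plus the end-message status.
    for code in (0x000B, 0x002B, 0x012B, 0x016A, 0x0043):
        major, sub, name = split_status(code)
        print(f"{code:04X}: major {major:#04x} ({name}), subcode {sub}")
```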
Also had these errors:


Unit 1100     Total IO Requests 1136
  Err in Hex: IC 0326450A  PTL:02/01/00  Key:03  ASC/Q:80/00  HC:1  SC:0
  Err in Hex: IC 031A4002  PTL:02/01/00  Key:04  ASC/Q:B0/00  HC:0  SC:1
  Err in Hex: IC 03134002  PTL:02/01/00  Key:04  ASC/Q:E0/06  HC:1  SC:0
  Total Errs   Hard Cnt 2   Soft Cnt 1
  The unit status and/or the unit device type changed unexpectedly.
  Unit 1100 dropped from testing
******************************************************************************
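To compare DILX runs side by side (for example the unit 1000 run at PTL 1/5/0 and the unit 1100 run at PTL 2/1/0), the summary lines can be parsed into records. The sketch below is written only against the line format shown in this note; other DILX versions may format them differently, and `parse_dilx_errors` is a hypothetical helper. It accepts both the "IC:" form of the controller line and the "IC " form of the unit lines.

```python
# Sketch: parse DILX summary error lines into records. The regular
# expression matches the lines in this note only.
import re

ERR_LINE = re.compile(
    r"IC:?\s*(?P<ic>[0-9A-F]{8})\s+"
    r"PTL:(?P<port>[0-9A-F]{2})/(?P<target>[0-9A-F]{2})/(?P<lun>[0-9A-F]{2})\s+"
    r"Key:(?P<key>[0-9A-F]{2})\s+"
    r"ASC/Q:(?P<asc>[0-9A-F]{2})/(?P<ascq>[0-9A-F]{2})\s+"
    r"HC:(?P<hard>\d+)\s+SC:(?P<soft>\d+)"
)


def parse_dilx_errors(text: str) -> list[dict]:
    records = []
    for match in ERR_LINE.finditer(text):
        fields = match.groupdict()
        # Everything except the hard/soft counts is printed in hex.
        record = {name: int(fields[name], 16)
                  for name in ("ic", "port", "target", "lun", "key", "asc", "ascq")}
        record["hard"] = int(fields["hard"])
        record["soft"] = int(fields["soft"])
        records.append(record)
    return records


SUMMARY = """
Cnt err in HEX  IC:03F40064  PTL:01/05/FF  Key:06  ASC/Q:00/00  HC:0  SC:2
  Err in Hex: IC 0328450A  PTL:01/05/00  Key:01  ASC/Q:17/07  HC:0  SC:2
  Err in Hex: IC 031A4002  PTL:01/05/00  Key:04  ASC/Q:B0/00  HC:0  SC:1
  Err in Hex: IC 03134002  PTL:01/05/00  Key:04  ASC/Q:E0/06  HC:1  SC:0
"""

if __name__ == "__main__":
    for r in parse_dilx_errors(SUMMARY):
        print(f"PTL {r['port']}/{r['target']}/{r['lun']:X}  "
              f"key {r['key']:X}  ASC/Q {r['asc']:02X}/{r['ascq']:02X}  "
              f"hard {r['hard']} soft {r['soft']}")
```

Run on both summaries, the B0/00 and E0/06 entries appear in each run, while the first entry differs (17/07 on unit 1000, 80/00 on unit 1100).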


	Troubleshooting eliminated the following:

HSJ50 controller	: by running tests from both controllers on both drives.

SCSI cables		: by interchanging drives (on different buses) and
BA356 bus		  again running DILX on both drives from both controllers;
BA356 slot		  the same drive was failing.
SCSI terminators


	After troubleshooting, I put the previously replaced drive back in the
SBB and ran the same tests. All tests ran fine. The customer ran INIT/ERASE on
both units and put them back in their respective shadow sets.

I have ordered an EZ32-VW (whole SBB swap unit) from SR17, with an ETA of 3-March.
I also contacted the customer today, and there were still no errors.

6400.6. "ok," by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Thu Feb 20 1997 11:11
Let's see...

   DILX Summary at 18-FEB-1997 17:58:34
   Test minutes remaining: 29, expired: 1

Cnt err in HEX  IC:03F40064  PTL:01/05/FF  Key:06  ASC/Q:00/00  HC:0  SC:2
  Total Cntrl Errs   Hard Cnt 0   Soft Cnt 2

Unit 1000     Total IO Requests 6098
  Err in Hex: IC 0328450A  PTL:01/05/00  Key:01  ASC/Q:17/07  HC:0  SC:2
  Err in Hex: IC 031A4002  PTL:01/05/00  Key:04  ASC/Q:B0/00  HC:0  SC:1
  Err in Hex: IC 03134002  PTL:01/05/00  Key:04  ASC/Q:E0/06  HC:1  SC:0
  Total Errs   Hard Cnt 1   Soft Cnt 3
  The unit status and/or the unit device type changed unexpectedly.
  Unit 1000 dropped from testing
Reuse Parameters (stop, continue, restart, change_unit) [stop] ?


Unit 1100     Total IO Requests 1136
  Err in Hex: IC 0326450A  PTL:02/01/00  Key:03  ASC/Q:80/00  HC:1  SC:0
  Err in Hex: IC 031A4002  PTL:02/01/00  Key:04  ASC/Q:B0/00  HC:0  SC:1
  Err in Hex: IC 03134002  PTL:02/01/00  Key:04  ASC/Q:E0/06  HC:1  SC:0
  Total Errs   Hard Cnt 2   Soft Cnt 1
  The unit status and/or the unit device type changed unexpectedly.
  Unit 1100 dropped from testing

Unit 1000 had a couple of recoverable errors, then disappeared.
Unit 1100 had one hard error, then disappeared. (E0 and B0 are HSJ events.)
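A quick way to check that reading: in SCSI-2, ASC values 80h-FFh are vendor specific, so B0 and E0 (and the 80/00 seen on unit 1100) can only be interpreted from the vendor's documentation, here presumably the HSJ's. The only standard code in either summary is 17/07, "recovered data without ECC, recommend reassignment", reported with sense key 1 (RECOVERED ERROR). The sketch lists just the few entries needed here; `describe` is a hypothetical helper, not a DILX feature.

```python
# Sketch: short meanings for the key / ASC / ASCQ triplets in the DILX
# summaries. Only a few standard entries are listed; ASC 0x80-0xFF is
# vendor specific in SCSI-2, so those need vendor documentation.

SENSE_KEYS = {0x1: "RECOVERED ERROR", 0x3: "MEDIUM ERROR",
              0x4: "HARDWARE ERROR", 0x6: "UNIT ATTENTION"}

STANDARD_ASC = {
    (0x00, 0x00): "No additional sense information",
    (0x17, 0x07): "Recovered data without ECC - recommend reassignment",
}


def describe(key: int, asc: int, ascq: int) -> str:
    key_name = SENSE_KEYS.get(key, f"key {key:X}")
    if asc >= 0x80:
        detail = "vendor-specific ASC/ASCQ"
    else:
        detail = STANDARD_ASC.get((asc, ascq), "see the SCSI ASC/ASCQ table")
    return f"{key_name}: {detail}"


if __name__ == "__main__":
    for triplet in ((0x1, 0x17, 0x07), (0x3, 0x80, 0x00),
                    (0x4, 0xB0, 0x00), (0x4, 0xE0, 0x06)):
        print("%X %02X/%02X -> %s" % (*triplet, describe(*triplet)))
```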

Seems strange. Both units are just dropping out of sight. I see from the logs
they are on different ports, so that kinda rules out a power/bus issue.

I see you did some moving around and reseating of hardware, did you change
anything or did the units just start running?

Did you have a "regular" disk to use as well, to make sure you did not have a
non-EZ problem?

If these units fail again, escalate a case to engineering and have those units
analyzed to make sure you are not fighting a symptom of "something else".

roger.
6400.7. "Confusion here... Sorry!" by KAOFS::D_ORMAECHEA (Denis Ormaechea... Montreal MCS) Fri Feb 21 1997 09:46
    
    Roger,
    
    	Let me apologize for the confusion here. With the cut and paste
    I did from my document, my intention was to show you that I had two
    kinds of error strings from DILX on the first instance code line:
    
    ASC/Q: 80/00 and 17/07.
    
    	The two units that you see in the report are actually the same
    physical drive, but in a different configuration during the
    troubleshooting. To answer your question about the hardware
    change: the unit that I was troubleshooting had a solid problem even
    after moving it around and putting it back the way it was. I put the
    original unit back in the SBB because I had it with me, and the ETA
    for the new EZ32-VW is March 3. The original unit failed on Feb 13,
    but may only have an intermittent problem (with the same symptoms),
    so I think that this unit is not reliable.
    
    	In conclusion, I think that I had 3 bad units in a row!!!!
    
    Regards,
    
    Denis
    
    
6400.8. "I hope not." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Fri Feb 21 1997 09:50
Three bad is REALLY BAD luck or I'm about to get a LOT busier ;^)

I am going on vacation next week (yes, even I do take vacations ;^) but if you
want me to look at the bad unit, send me mail offline.

roger.