
Conference ssag::ask_ssag

Title: Ask the Storage Architecture Group
Notice: Check out our web page at http://www-starch.shr.dec.com
Moderator: SSAG::TERZAN
Created: Wed Oct 15 1986
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6756
Total number of notes: 25276

6736.0. "Rz29 Alternate cylinder???" by MSAM03::RAHMAN () Fri May 30 1997 07:28

Hi,
1. I need clarification on why the total cylinder count of the RZ29B-VA
dropped by one after a "retries exhausted" error, as shown in the uerf entry.

The disk layout before the error:

-----------------------------------------------------
# /dev/rrzc33c:
type: SCSI
disk: HSZ40
label: 
flags:
bytes/sector: 512
sectors/track: 113
tracks/cylinder: 20
sectors/cylinder: 2260
cylinders: 3707
sectors/unit: 8378028
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:       32        0    unused     1024  8192       	# (Cyl.    0 - 0*)
  b:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  c:  8378028        0    unused     1024  8192       	# (Cyl.    0 - 3707*)
  d:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  e:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  f:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  g:  4188998       32    unused     1024  8192       	# (Cyl.    0*- 1853*)
  h:  4188998  4189030    unused     1024  8192       	# (Cyl. 1853*- 3707*)
-----------------------------------------------------

Then SAP R3, running on the Informix RDBMS, did some work. After some time,
the Informix DB crashed due to a chunk-offline problem.
Informix uses partitions g and h for its raw devices.


Checking uerf gave the following output:

						  uerf version 4.2-011 (122)
********************************* ENTRY     2. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                             ERROR EVENT 
OS EVENT TYPE                  199.     CAM SCSI 
SEQUENCE NUMBER                 16.
OPERATING SYSTEM                        DEC OSF/1 
OCCURRED/LOGGED ON                      Thu May 29 16:43:51 1997
OCCURRED ON SYSTEM                      posmal1 
SYSTEM ID                 x00070016
SYSTYPE                   x00000000
PROCESSOR COUNT                  2.
PROCESSOR WHO LOGGED      x00000000

----- UNIT INFORMATION -----

CLASS                         x0000     DISK 
SUBSYSTEM                     x0000     DISK 
BUS #                         x0004
                              x0112     LUN x2
                                        TARGET x1

----- CAM STRING -----

ROUTINE NAME                            cdisk_complete 

----- CAM STRING -----

                                        Retries Exhausted 

----- CAM STRING -----

ERROR TYPE                              Hard Error Detected 

----- CAM STRING -----

DEVICE NAME                             DEC     HSZ4 

----- CAM STRING -----

                                        Active CCB at time of error 

----- CAM STRING -----

                                        CCB request completed with an error 
ERROR - os_std, os_type = 11, std_type = 10


----- ENT_CCB_SCSIIO -----

*MY ADDR                  x3FE29B28
CCB LENGTH                    x00C0
FUNC CODE            x01
CAM_STATUS                    x0084     CAM_REQ_CMP_ERR 
                                        AUTOSNS_VALID 
PATH ID              4.
TARGET ID            2.
TARGET LUN           2.
CAM FLAGS                 x00000442
                                        CAM_QUEUE_ENABLE 
                                        CAM_DIR_IN 
                                        CAM_SIM_QFRZDIS 
*PDRV_PTR                 x3FE29828
*NEXT_CCB                 x00000000
*REQ_MAP                  x3FE08400
VOID (*CAM_CBFCNP)()      x00526660
*DATA_PTR                 x400A5828
DXFER_LEN                 x00002000
*SENSE_PTR                x3FE29850
SENSE_LEN            xA0
CDB_LEN              x0A
SGLIST_CNT                    x0000
CAM_SCSI_STATUS               x0002     SCSI_STAT_CHECK_CONDITION 
SENSE_RESID          x8E
RESID                     x00002000
CAM_CDB_IO           x000000100000ACD47F000028
CAM_TIMEOUT               x0000003C
MSGB_LEN                      x0000
VU_FLAGS                      x4000
TAG_ACTION           x20

----- CAM STRING -----

                                        Error, exception, or abnormal 
                                         _condition 

----- CAM STRING -----

                                        ILLEGAL REQUEST - Illegal request or 
                                         _CDB parameter 

----- ENT_SENSE_DATA -----

ERROR CODE                    x0070     CODE x70
SEGMENT              x00
SENSE KEY                     x0005     ILLEGAL REQ 
INFO BYTE 3          x00
INFO BYTE 2          x00
INFO BYTE 1          x00
INFO BYTE 0          x00
ADDITION LEN         x0A
CMD SPECIFIC 3       x00
CMD SPECIFIC 2       x00
CMD SPECIFIC 1       x00
CMD SPECIFIC 0       x00
ASC                  x21
ASQ                  x00
FRU                  x00
SENSE SPECIFIC       x0200C0
ADDITIONAL SENSE    
0000:   02000000  00000000  00000000  00000000        *................*
0010:   00000000  00000000  00000000  00000000        *................*
0020:   00000000  00000000  00000000  00000000        *................*
0030:   00000000  00000000  00000000  00000000        *................*
0040:   00000000  00000000  00000000  00000000        *................*
0050:   00000000  00000000  00000000  00000000        *................*
0060:   00000000  00000000  00000000  00000000        *................*
0070:   00000000  00000000  00000000  00000000        *................*
0080:   00000000  00000000  00000000  00000000        *................*
0090:   7E250000  00005E3C  00000000  00000000        *..%~<^..........*


****************END Of uerf extraction****************************
After the error, the following steps were carried out:
1) disklabel -z /dev/rrzc33c
2) disklabel -wr /dev/rrzc33c hsz40

The total cylinder count is now one less, i.e. before the error = 3707,
  after the error = 3706.


# /dev/rrzc33c:
type: SCSI
disk: HSZ40
label: 
flags: dynamic_geometry
bytes/sector: 512
sectors/track: 113
tracks/cylinder: 20
sectors/cylinder: 2260
cylinders: 3706
sectors/unit: 8377528
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   200000        0    unused     1024  8192       	# (Cyl.    0 - 88*)
  b:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  c:  8377528        0    unused     1024  8192       	# (Cyl.    0 - 3706*)
  d:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  e:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  f:        0        0    unused     1024  8192       	# (Cyl.    0 - -1)
  g:  4088700   200004    unused     1024  8192       	# (Cyl.   88*- 1897*)
  h:  4088700  4288708    unused     1024  8192       	# (Cyl. 1897*- 3706*)

*************End of disklabel*********************************

Can anyone explain???
I have read some notes about SCSI timeout errors on very busy disks
connected to a SCSI controller. Those notes mention that a SCSI timeout
error can have several causes:
1. Arbitration by SCSI ID priority can cause a lower-priority disk to
time out while waiting for a higher-priority disk to be serviced.
2. The application issues a long request to the disk. Informix uses raw
devices and handles the I/O requests itself.
3. The disk firmware does not cater for a longer timeout setting.
4. Others.

I have a feeling that once the OS detects the hard error (as shown in the
uerf) it marks the block as bad, and Informix therefore considers the disk
corrupted. Since this is not a real hardware fault, i.e. the bad block was
marked because of the "retries exhausted" error, the disk firmware performs
a self-recovery by creating an "alternate cylinder" and recovering the
block on this cylinder. But since all blocks of the disk are in use, the
alternate cylinder would actually be created on the Informix raw device,
corrupting the Informix pages and crashing the Informix database as a
result.

Therefore, to eliminate this possibility, I have reserved the first few
cylinders for the alternate cylinder, if my theory is correct.
I have also reserved 4 blocks (2KB) before and after partitions g and h, to
eliminate the possibility of Informix writing across the disklabel
partition boundary. The Informix block size is 2KB.


Please, I need an explanation for this strange behaviour/bug. If possible,
I also need a permanent solution for this problem.

Thanks in advance,

rahman ibrahim@MSA

T.R  Title  User  Personal Name  Date  Lines
6736.1. "from the drive side..." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Fri May 30 1997 14:49, 11 lines
No idea why you have 1 less cylinder reported.

The drive does not use any type of alternate cylinder for revectoring. It
has spare sectors pre-mapped at the factory on each track and each head that it
uses. If the drive has major media/head problems and uses up all of these
sectors (that are NOT included in the capacity of the device), any subsequent
reassign results in a Sense Key of 04, ASC of 19 (Defect List Error). Not an
Illegal Request as your log indicated.

Roger.
6736.5. "SCSI Timeout is an Issue for heavy IO" by MSAM03::RAHMAN () Sun Jun 01 1997 03:48, 152 lines
    
    Hi Roger,
    
    That is exactly the case: the block is not bad in the sense that you
    could detect it during formatting at manufacturing. It is an "ugly"
    block, i.e. bad because difficulties arise when attempting to read it.
    
    I believe the situation I am encountering is similar to the two notes
    attached below. Please analyse this situation. If there is no logical
    explanation, then the customer has the right to change Digital's
    hardware.
    I have checked /usr/include/sys/disklabel.h for the definitions of
    alternate sector and alternate cylinder, and it seems that they are not
    used in /etc/disktab. Please verify the RZ29-VA (is it a Seagate
    Barracuda?), and does the UNIX driver fail to comply with the SCSI
    commands from Seagate?
    
    MCS engineers have verified that the "suspected" disk is OK at the
    local Digital office!!
    
    I would say it is because of heavy I/O that the driver marks it as bad,
    and the alternate tracks are running out because of so many "UGLY"
    blocks.
    
    Please look into this matter more seriously. If you need info, please
    ask for it.
    I am very interested in solving this matter once and for all. Otherwise,
    tomorrow I walk into the customer selling a different vendor's box.
    
    rahman ibrahim@MSA
    SSU Malaysia.
    
    
    Topic #132: "Bad RCT causes an err on BBR?"
    
    I believe the terms "Good" block and "Bad" block in the RCT should be
    clearly understood. The term "bad" generally implies unreadable. If the
    block is deemed bad at the factory (PBN entry in the FCT) or the
    Formatter "detects" the block as bad, then it will format the header
    with header code "11", marking it unusable. If the block header is
    still "00" (Good LBN) but difficulties arise attempting to read the
    block (continued uncorrectable ECC, smashed header, etc) the block is
    again deemed bad. Alternate copies of the relative block will be
    accessed in the RCT during BBR or revector operations.
    
    There is, however, a condition I like to call "ugly". This is a block
    that is not bad but contains "bad data" with good ECC, EDC, etc.
    Alternate copies of these type blocks WILL NOT BE ACCESSED under normal
    circumstances.
    
    Example:
    
    K.SDI fails and "forgets the HOST/RCT boundary" and writes a data
    pattern into the first few blocks of the RCT during periodics, for
    example. This corrupts the first copy of the RCT control block. The
    data happens to get written with good ECC, EDC. This could have a
    variety of effects during host mount of that disk.
    
    Continuing on, problems arise and the Field Engineer determines the
    K.SDI is bad and replaces it. Good! The disk is still corrupt but the
    symptoms may not be obvious. If the corruption "clobbered" word 4 in
    the RCT (BBR control word) the symptoms appear during each attempt to
    ONLINE the disk (VMS Mount for example). If the P1 or P2 flags happen
    to be set, the system will attempt to finish a BBR that never really
    started. If the replaced LBN address field gets filled with this
    erroneous pattern, the HSC may attempt a BBR to a "non-existent" LBN
    and crash the HSC every time a mount is attempted. If undefined bits
    get set in the control word, the HSC will "data safety write-protect"
    the disk every time it is mounted. The list is endless, esp if the
    descriptor blocks become affected.
    
    The point is this: if blocks in the RCT get written with bad data but
    good ECC, then alternate copies of the blocks are NOT ACCESSED because
    the block is considered "good" (better term is readable, not
    necessarily good).
    
    I can produce these symptoms manually, and they do happen in the field,
    fortunately infrequently (I hope). We had two occasions of K.SDI
    failure in our lab (CSSE lab) that produced these very same "subtle"
    but serious problems. I saved the printout for one and use it during my
    seminar (DSA troubleshooting) to teach FE's how to deal with logical
    failures usually resulting from hardware failures.
    
    Rule of thumb: if you have experienced any hardware problem that could
    affect the R/W data path to the disk (controller, SDI, disk
    electronics), you may have experienced corruption on the media, which
    stays around "after" the HW is resolved. I call it logical recovery.
    
    Mark Himes
    CX/CSSE
    
    Topic #5752: "command timeout issue"
    
    Looks like HSJ01$DUA62 and HSJ04$DUA702 are suffering Command
    Timeouts; what rev firmware are they running? (If it's running V007,
    upgrade to 0016...if it's running 0014, then it should be OK).
    
    I've included a blitz that Roger Patenaude put out in relation to
    Command Timeouts.
    
    BTW, you should REALLY upgrade HSOF to V2.7 and SWEAT to X2.7
    
    Copyright (c) Digital Equipment Corporation 1995. All rights reserved.
    
    +---------------------------+TM
    |   |   |   |   |   |   |   |
    | d | i | g | i | t | a | l |        TIME DEPENDENT CASE
    |   |   |   |   |   |   |   |
    +---------------------------+
    
    TITLE: What are SCSI Command Timeouts Errors?
    
    AUTHOR: Roger Patenaude            DATE: August 16, 1995
    DTN: 237-3705                      TD #: 1904
    ENET: BABAGI::Patenaude            CROSS REFERENCE #'s:
    DEPT: Storage External Products    (PRISM/TIME/CLD#'s)
          Continuation Engineering
    
    INTENDED AUDIENCE: All             PRIORITY LEVEL: 2
    (U.S./EUROPE/GIA)                  (1=TIME CRITICAL,
                                        2=NON-TIME CRITICAL)
    =====================================================================
    
    PROBLEM:
    --------
    The purpose of this Blitz is to give you some insight as to what a
    SCSI "Command Timeout" error is. I've kept this very generic as more
    of an informational Blitz for a change.
    
    These errors are telling you that a specific "command" did not
    complete in a specified period of time. This can be caused by
    multiple sources and in most all cases can be recovered by the host
    system by reissuing the failed command. Some of the reasons for
    "Command Timeouts" are:
    
    1) The SCSI bus is too busy. The SCSI bus priority is designed using
       the drive's ID in arbitration with no regard for how many times
       the device wins the bus. So, if you have a bus with the highest
       priority device doing VERY heavy workload ("hogging" the bus), then
       other devices on the bus will not be able to arbitrate and win the
       bus. These devices will then have commands outstanding that they
       cannot complete. The host will then log an error "command timeout"
       and sometimes follow it with a bus reset.
    
    2) The host issued a command to a drive that took too long to
       complete. This could be due to a broken device but more common is
       that the device is doing a long command and does not have time to
       answer the host. Normal convention is the host will only ask "how
       things are proceeding" (as in the case where you issued a rewind
       to a tape drive and are waiting for it to become ready) via a Test
       Unit Ready command, but if data type (read/write) commands are
       continually issued to the unit, the first command cannot be
       completed and may time out.
    
    3) Operating system driver issues. The drivers may not be allowing
       reasonable enough time for the commands to complete. A case in
       point: VMS recently increased the command timeout values in
       MKDRIVER (TAPE) and DKDRIVER (DISK) (from 3 seconds to 10 in MK).
       This was because 3 was just too aggressive on a busy bus and command
       timeouts and bus resets were occurring under heavy load.
    
    4) Device issues. The drive may not have enough horsepower to
       complete the commands it accepted in a reasonable amount of time.
       OR, the drive may not be working on commands it has accepted
       because it is too busy. RZ28B's running version 003 code are one
       such case: the drive will optimize its seeks by working commands
       that are in the local area of the heads. One side effect is that a
       command may time out if it was not in the local area of where the
       drive is spending all its time, thus not getting serviced. RZ28B's
       running 006 do not have this issue.
    
    RESOLUTION/WORKAROUND:
    -----------------------
    For the most part these are just events and should be left alone.
    
    In the rare case where this is disruptive due to resets occurring,
    review the four points above and see how they fit into your
    environment. You may need to split heavily loaded devices between
    multiple busses, or you may need new firmware or maybe move a
    device off to another bus.
    
    ADDITIONAL COMMENTS:
    --------------------
    None.
    
    ****  DIGITAL INTERNAL USE ONLY ****
    
    
6736.6. "Man you are ALL over the place..." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Mon Jun 02 1997 12:51, 45 lines
>    That is exactly the case: the block is not bad in the sense that you
>    could detect it during formatting at manufacturing. It is an "ugly"
>    block, i.e. bad because difficulties arise when attempting to read it.

Exactly WHAT case????? You got a failure in the errorlog that said;

----- CAM STRING -----

                                        ILLEGAL REQUEST - Illegal request or 
                                         _CDB parameter 

The drive also returned status that said it got an invalid request!

How are you equating that with a note about DSDF / RCT / FCT information that
was written about SDI devices (RA81, RA82, RA90, etc...) and a note about
command timeouts???????

>    Please verify the RZ29-VA (is it a Seagate
>    Barracuda?), and does the UNIX driver fail to comply with the SCSI
>    commands from Seagate?
 
It is a Seagate drive and YOU can dig through UNIX drivers. Not I. 

>    MCS engineers have verified that the "suspected" disk is OK at the
>    local Digital office!!
>

So it's probably not the drive ;^)

>    Please look into this matter more seriously. If you need info, please
>    ask for it.

NOTES IS NOT AN ESCALATION PATH!!!!! You need to look at this more seriously and
follow proper escalation to get this looked at. Have you tried any local sales
and service support folk? (Don't answer, rhetorical question)

>    I am very interested in solving this matter once and for all. Otherwise,
>    tomorrow I walk into the customer selling a different vendor's box.

UNBELIEVABLE!!!! You have what most likely is a SOFTWARE problem and you are
about to condemn our hardware. Unbelievable is all I can say. Glad I only have
250 shares of DEC stock as of today with this mindset.

roger.
6736.7. "Help is needed......" by MSAM03::RAHMAN () Mon Jun 02 1997 21:09, 8 lines
    Thanks for your response to the problem. Oops! Sorry, this is not the
    ESCALATION path. I will be more careful next time. However, thanks for
    your time in looking into my problem.
    
    I will escalate this problem to our support people.
    
    Rahman
6736.8. "Roger is right: escalate it" by SUBSYS::BROWN (SCSI and DSSI advice given cheerfully) Tue Jun 03 1997 08:19, 19 lines
    I don't think it's clear whether this is a software problem or a 
    configuration error.  The SCSI sense data is 05/21/00, which means
    the software attempted to read a block beyond the drive's capacity.
    
    Now, we know the capacity after the error was smaller than the capacity 
    before the error.  We know the blocks being read (16 blocks, starting at 
    0x7fd4ac) were within the drive's capacity before the error, and
    outside the capacity after the error.  We don't know when the capacity
    changed, or who changed it.
    
    The obvious candidates are:
    - the Informix software
    - the HSZ40 controller
    - a bus reset, causing the drive to return to the most recently saved
    	capacity
    
    It may take a fair amount of time and engineering support to find the
    cause.  Please escalate, so the right people can be identified and 
    assigned.
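
    A minimal Python sketch of how the block range can be read out of the
    CAM_CDB_IO field in the uerf entry. It assumes that uerf prints the CDB
    as one hex value with byte 0 (the opcode) rightmost; that reading is an
    assumption, but it reproduces the READ(10) opcode, the 0x7fd4ac start
    address and the 16-block length quoted in this thread. The function
    names are illustrative only.

```python
# Decode the CDB from the uerf entry and compare the request with the
# sectors/unit values of the two disklabels posted in the base note.
# Assumption: uerf prints the CDB with byte 0 (the opcode) as the
# rightmost byte, so the byte string is reversed before decoding.
UERF_CDB = "000000100000ACD47F000028"   # CAM_CDB_IO field
SECTORS_OLD_LABEL = 8378028             # sectors/unit before the error
SECTORS_NEW_LABEL = 8377528             # sectors/unit after relabeling


def decode_read10(uerf_hex):
    """Return (opcode, lba, blocks) for a READ(10) CDB printed by uerf."""
    raw = bytes.fromhex(uerf_hex)[::-1]        # opcode first after reversal
    opcode = raw[0]
    lba = int.from_bytes(raw[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(raw[7:9], "big")   # bytes 7-8: transfer length
    return opcode, lba, blocks


def fits(lba, blocks, capacity):
    """True if every requested block lies below the unit's capacity."""
    return lba + blocks <= capacity


opcode, lba, blocks = decode_read10(UERF_CDB)
print(hex(opcode), hex(lba), lba, blocks)
print("fits old label:", fits(lba, blocks, SECTORS_OLD_LABEL))
print("fits new label:", fits(lba, blocks, SECTORS_NEW_LABEL))
```

    With the values from this thread, the request starts below the
    relabeled capacity (8377528) but its last blocks run past it, while the
    whole range fits inside the 8378028 sectors of the original label.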
6736.9. "notes collision" by WRKSYS::HOUSE (Kenny House, Workstations Engineering) Tue Jun 03 1997 08:23, 26 lines
    So far as I can tell, there are two issues in the basenote.
    
    (1)	The error log is quite explicit about the HSZ40's complaining about
        an out-of-range logical block address used by a READ(10) command. 
        The LBA requested was 8377516(decimal), although the number of
        sectors claimed in the disklabel was 8378028(decimal).
    
    (2)	Writing over the disklabel changed the geometry, so that the number
        of sectors is now 8377528(decimal).  Note that the flags now have
        "dynamic_geometry" set, too.
    
    The whole concept of a simple sector/head/track geometry is an
    industry-wide falsehood.  Zoned drives (with different number of
    sectors per track) and RAID volumes, for example, do not have this
    structure.  It would be nice, however, if all logical blocks on this
    "geometry" were addressable -- this does not seem to be the case in (1)
    above.
    
    Do SAP or Informix bypass the normal file structure to get to the raw
    drive?  Are they likely to be writing the disklabel?
    
    There is no indication of a "retry exhausted" error or "SCSI timeout"
    in the information presented in this note string to date.  Nor is there
    clear evidence of a hardware problem.
    
    -- Kenny House
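
    As a rough cross-check of points (1) and (2), the sketch below tests
    which partitions of the disklabel taken before the error would extend
    past the smaller sectors/unit value seen after relabeling. The sizes
    and offsets are copied from the base note; the helper names are made up
    for this illustration.

```python
# (size, offset) in 512-byte sectors, copied from the disklabel that was
# posted as the layout before the error. Zero-size partitions are omitted.
OLD_PARTITIONS = {
    "a": (32, 0),
    "c": (8378028, 0),
    "g": (4188998, 32),
    "h": (4188998, 4189030),
}
NEW_SECTORS_PER_UNIT = 8377528  # sectors/unit reported after relabeling


def sectors_past_end(size, offset, capacity):
    """Number of sectors of a partition that lie at or beyond `capacity`."""
    return max(0, offset + size - capacity)


def overrunning_partitions(partitions, capacity):
    """Map partition name -> sectors past the end, for partitions that overrun."""
    result = {}
    for name, (size, offset) in partitions.items():
        extra = sectors_past_end(size, offset, capacity)
        if extra:
            result[name] = extra
    return result


print(overrunning_partitions(OLD_PARTITIONS, NEW_SECTORS_PER_UNIT))
```

    On these numbers only c and h reach past the new end of the unit, and h
    is one of the two partitions Informix was using as a raw device.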
6736.10. "" by SSDEVO::ROLLOW (Dr. File System's Home for Wayward Inodes.) Tue Jun 03 1997 10:05, 13 lines
	Many database-class applications on UNIX use the raw device;
	it avoids any issues of whether the file system buffers the
	data (sync, fsync or not) and it avoids a buffer copy.  If
	you remember that disk reads and writes have to be multiples
	of the sector size, it is also easy, using the same system calls
	as reading and writing files.

	Since Digital UNIX disklabels have been around for a few years
	most vendors that use raw disks have either figured out where
	the label is and don't use it, or require the user to partition
	the disk to protect the label.  If this is the same disklabel
	that got posted to the DIGITAL_UNIX conference this morning,
	that's what that 32 sectors is in the A partition.
6736.11. "Not broken H/W" by SMURF::KNIGHT (Fred Knight) Wed Jun 04 1997 16:06, 19 lines
What most likely happened, is that some user labeled
this device BEFORE it was put into the HSZ40 (note that
there is NO dynamic geometry in the first disklabel).

Then, after installing it into the HSZ40, they just started
to use it (with the WRONG disklabel).  After the error, they
put a NEW disklabel (now a correct one) on the media (now
note that dynamic geometry IS set).  And magically, it now
works!

The only other option is the HSZ40 firmware bug that has
been BLITZed about conditions when the firmware would change
the size of a volume (not common, but still possible).

In both cases, NOTHING is broken in the H/W.  If it's case
1, then educate your customer, if case 2, use the documented
firmware workaround.

	Fred Knight
6736.12. "Hmm, did somebody INIT SAVE_CONFIG?" by SSDEVO::JACKSON (Jim Jackson) Wed Jun 04 1997 18:46, 25 lines
Sure, we've seen this type of error a bunch when folks got careless about
reusing disks.  Here's a recipe for the problem:

	1) Have a direct-connected SCSI disk.  Put a filesystem on to it.
	2) Move the disk to an HSZ40
	3) INIT the disk from the HSZ40 console
	4) ADD UNIT

At this point, the host sees a disk that has a valid filesystem on it.  The
only problem is that the last few blocks have been lopped off by the HSZ40
to contain its metadata.

One of the rules we have in our lab is if you INIT it on the HSZ, then you
have to put a new filesystem on it (VMS INIT, Unix ??).  Our documentation
has stated for eons that you should assume that an HSZ INIT destroys the
user data on the disk.

disklabel value	8378028
new value	8377528
-----------------------
difference	    500

500 blocks is exactly the number of blocks consumed by SAVE_CONFIG.  So, in
your case, it would appear that you had a JBOD with a filesystem on it, the
disk got an INIT SAVE_CONFIG, and a new filesystem was not put in place.
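
The same subtraction also accounts for the cylinder count in the base note going from 3707 to 3706. The sketch below assumes that the cylinders field is simply sectors/unit divided by sectors/cylinder with the remainder dropped; that rule is an assumption, not something confirmed in this thread, but it reproduces both labels.

```python
# Values copied from the two disklabels in the base note.
SECTORS_PER_CYLINDER = 2260
OLD_SECTORS = 8378028   # sectors/unit before the error
NEW_SECTORS = 8377528   # sectors/unit after relabeling


def whole_cylinders(sectors, per_cylinder=SECTORS_PER_CYLINDER):
    """Cylinder count, assuming partial cylinders are simply dropped."""
    return sectors // per_cylinder


lost = OLD_SECTORS - NEW_SECTORS
print("blocks lost:", lost)
print("bytes lost:", lost * 512)
print("cylinders before:", whole_cylinders(OLD_SECTORS))
print("cylinders after:", whole_cylinders(NEW_SECTORS))
```

Under that assumption, losing 500 blocks is enough to tip the whole-cylinder count down by one even though a full cylinder is 2260 sectors, because the old size sat only 208 sectors above a cylinder boundary.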