
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1860.0. "3.2g/1.3/hsz40 remove LSM mirror - hangs process" by NETRIX::"[email protected]" (decatl::johnson) Wed Feb 05 1997 11:33

I am submitting this here because I have not yet obtained the specifics
required for an IPMT from the CSC, and I do not have access to the
resources or the system for follow-up answers:


3.2G  ASE 1.3

Installed patches for LSM vold and ASE 1.3.
	I am assuming these are
		PATCH IDs: OSF375-350244, ASE130-015, & ASE130-014
                          (vold,voldisk), (lsm_dg_action), (am.o,am_scsi.o)
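
If it helps to confirm what is on the system, the installed patch subsets
can be listed with setld; the grep patterns below are only illustrative,
since I do not know the exact subset names these patch kits install:

	# setld -i | grep -i pat      # list installed subsets, filtered for patch kits
	# setld -i | grep -i ASE      # look for the ASE-related subsets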

Initially the contact blamed the behavior on the am.o patch, but his problem
statement (below) further narrows down his suspects.

The problem is that when a disk was removed from a mirrored LSM
volume, an  "# ls /filesystem"  hung indefinitely.

They then
	set the ASE service off line,
	manually put the LSM disks/disk group on line,
	mounted the /filesystem (outside of ASE),
	and repeated the test (roughly the sequence sketched below).

	This time the ls command did not hang.
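
For reference, a rough sketch of that manual sequence; the service, disk
group, volume, and disk names are taken from the output further down, and
the command options are from memory, so check the asemgr/LSM man pages
before relying on them:

	# asemgr                                   # menu-driven: set the dbfailover service off line
	# voldisk online rza33                     # manually bring the LSM disk(s) back on line
	# voldg import prod_data                   # import the disk group outside of ASE
	# volrecover -g prod_data -sb              # start/resync the volumes in the background
	# mount -t ufs /dev/vol/prod_data/proddisk07 /mnt   # mount a file system outside of ASE
	# ls /mnt                                  # repeat the test that hung before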

I have the output of checkit from this system and expect the output
from the other system soon, plus a 16-page fax of HSZ> output, but I
do not have additional specifics on when these events took place
(which syslog) or where in uerf.  There is lots of uerf info; it
looks like reservation stuff...
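
For anyone digging through the error log, the usual starting point would be
something like the following (generic commands only, since I do not yet
know the time window of the events):

	# uerf -R -o full | more                             # error log, newest entries first, full detail
	# grep -i reserv /var/adm/syslog.dated/*/*.log       # hunt for reservation messages in syslog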

Anyway, if you have any idea of the problem from this info,
it would help the issue:

=============================================================================
This is from Dave Ascher, Systems Integration, at a Hot Customer 
Production Site:
=============================================================================

    The problem which I am reporting is that when a disk was pulled
    out of the shelf (I'm afraid I cannot identify which disk at
    this time) I/O to the LSM VOLUME of which that disk was one
    of TWO mirrored PLEXes was hung indefinitely. The I/O was
    generated using an "ls" command.
    
    When the DECsafe service was set OFFLINE, the disks brought
    ONLINE manually using the "voldisk online" command, the disk
    group imported using the "sapdg import" command, the file systems
    mounted manually - the same experiment yielded the expected
    behavior - that is I/O to the VOLUME completed normally.

    We have installed the newest lsm_dg_action script (v 1.2.21.2?)
    the newest versions of asehsm and aseagent and the newest am.o
    and am_scsi.o. While I originally suspected that the problem
    is most likely in the AM driver, I now believe that asehsm
    is an equally likely culprit.
    
    Clearly, the microcode version of the HSZ or the Rev level of the disk
    or the KZPSA, etc... is not a factor as they remain constant
    in our experiment. The experiment points right at a DECsafe
    component.
    
     ......remainder deleted, dave can include it if he desires......

==============================================================================


Additional info from Sid, to help with the above email:

>  Clearly, the microcode version of the HSZ or the Rev level of the disk
>    or the KZPSA, etc... is not a factor as they remain constant

	----------------------------------------------
from email/uerf:        kzpsa (rev A10)	                    -->  
from uerf:                 bus (scsi4 through scsi8)        -->
from hsz/uerf:                hsz40  (rev 27    or 30)      --> 
from hsz:                       dual redundant              --> 
from hsz: units                  (D0-D1,D101,D202,etc____)  --> 
from hsz: disks                     (DISK110,etc) -->  
from hsz:                             rz29B's  (most at 016, some at 014)
 
	----------------------------------------------

This is some of the output from df:

sap_PRD_domain4#sapdata1   16753952   5981952   10644144    36%   /oracle/PRD/sapdata1
sap_PRD_domain5#sapdata2   16753952   5777040   10964240    35%   /oracle/PRD/sapdata2
sap_PRD_domain6#sapdata3   16753952   5294496   11446784    32%   /oracle/PRD/sapdata3
sap_PRD_domain7#sapdata4   16753952   7639360    9101920    46%   /oracle/PRD/sapdata4
sap_PRD_domain8#sapdata5   16753952   6512960   10228320    39%   /oracle/PRD/sapdata5


This is some of the output from volprint:
	----------------------------------------------

v  proddisk07   fsgen        ENABLED  ACTIVE   8376988  SELECT   -
pl proddisk07-01 proddisk07   DISABLED NODEVICE 8376988  CONCAT   -        RW
sd rza33-01     proddisk07-01 0        0        8376988  rza33        -
pl proddisk07-02 proddisk07   ENABLED  ACTIVE   8376988  CONCAT   -        RW
sd rza41-01     proddisk07-02 0        0        8376988  rza41        rza41

v  proddisk08   fsgen        ENABLED  ACTIVE   8376988  SELECT   -
pl proddisk08-01 proddisk08   ENABLED  ACTIVE   8376988  CONCAT   -        RW
sd rzb33-01     proddisk08-01 0        0        8376988  rzb33        rzb33
pl proddisk08-02 proddisk08   ENABLED  ACTIVE   8376988  CONCAT   -        RW
sd rzb41-01     proddisk08-02 0        0        8376988  rzb41        rzb41

v  proddisk09   fsgen        ENABLED  ACTIVE   8376988  SELECT   -
pl proddisk09-01 proddisk09   ENABLED  ACTIVE   8376988  CONCAT   -        RW
sd rzc33-01     proddisk09-01 0        0        8376988  rzc33        rzc33
pl proddisk09-02 proddisk09   ENABLED  ACTIVE   8376988  CONCAT   -        RW
sd rzc41-01     proddisk09-02 0        0        8376988  rzc41        rzc41

	----------------------------------------------
	I asked if the pulled disk was rza33; the answer was "don't know".
	----------------------------------------------
rza33        sliced    -            -            online
rza34        sliced    rza34        prod_data    online
rza40        sliced    rza40        prod_data    online
rzf73        sliced    rzf73        prod_data    online
rzf74        sliced    rzf74        prod_data    online
-            -         rza33        prod_data    failed was:rza33
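
Once the pulled disk is physically back in the shelf, reattaching it should
look roughly like the following; this is a sketch only (the supported path
is the replace-failed-disk option in voldiskadm, and the -k adddisk syntax
below is from memory):

	# voldisk online rza33                          # bring the physical device back on line
	# voldg -g prod_data -k adddisk rza33=rza33     # re-associate the disk media name (syntax assumed)
	# volrecover -g prod_data -sb                   # resync the NODEVICE plex in the background
	# volprint -g prod_data -ht proddisk07          # confirm both plexes return to ENABLED/ACTIVE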
	----------------------------------------------
	only one service is defined in asemgr; part of the output is
	----------------------------------------------
     Status for DISK service `dbfailover`

 Status:             Relocate:  Placement Policy:       Favored Member(s):
 on copley           no         Restrict to Member(s)   copley,pierre


        Storage configuration for DISK service `dbfailover`

Mount Table (device, mount point, type, options)
 /dev/vol/prod_data/proddisk37 NONE ufs NONE
 /dev/vol/prod_data/proddisk38 NONE ufs NONE
	... 
 /dev/vol/prod_data/proddisk20 NONE ufs NONE
 /dev/vol/prod_data/proddisk19 NONE ufs NONE

LSM Configuration
 Disk Group:      Device(s):
 prod_data        rza32 rza33 rza34 rza40 rza41 rza42 rza48 rza49 rza50 rza56
                  rza57 rza58 rza64 rza65 rza66 rza72 rza73 rza74 rzb32 rzb33
                  rzb34 rzb40 rzb41 rzb42 rzb48 rzb49 rzb50 rzb56 rzb57 rzb58
                  rzb64 rzb65 rzb66 rzb72 rzb73 rzb74 rzc32 rzc33 rzc34 rzc40
                  rzc41 rzc42 rzc48 rzc49 rzc50 rzc56 rzc57 rzc58 rzc64 rzc65
                  rzc66 rzc72 rzc73 rzc74 rzd32 rzd33 rzd34 rzd40 rzd41 rzd42
                  rzd48 rzd49 rzd50 rzd56 rzd57 rzd58 rzd64 rzd65 rzd66 rzd72
                  rzd73 rzd74 rze32 rze33 rze34 rze40 rze41 rze42 rze48 rze49
                  rze50 rze56 rze57 rze58 rze64 rze65 rze66 rze72 rze73 rze74
                  rzf32 rzf33 rzf34 rzf40 rzf41 rzf42 rzf48 rzf49 rzf50 rzf56
                  rzf57 rzf58 rzf64 rzf65 rzf66 rzf72 rzf73 rzf74
	----------------------------------------------
	 grep ASE rc.config
		ASELOGGER="1"
		export ASELOGGER
		ASE_PARTIAL_MIRRORING="ON"
		export ASE_PARTIAL_MIRRORING
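
	These variables live in /etc/rc.config and can also be read or
	changed with rcmgr; shown here only as a generic illustration:

	# rcmgr get ASE_PARTIAL_MIRRORING      # query the current setting
	# rcmgr get ASELOGGER
	# rcmgr set ASE_PARTIAL_MIRRORING ON   # change it (see the ASE docs for when this is read)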
	----------------------------------------------
drwxr-xr-x   2 root     system      8192 Oct  3 15:34 root_domain
drwxr-xr-x   2 root     system      8192 Jan 25 14:02 sap_PRD_domain0
drwxr-xr-x   2 root     system      8192 Jan 30 02:51 sap_PRD_domain1
drwxr-xr-x   2 root     system      8192 Jan 30 20:48 sap_PRD_domain2
drwxr-xr-x   2 root     system      8192 Jan 30 20:51 sap_PRD_domain3
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain4
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain5
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain6
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain7
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain8
drwxr-xr-x   2 root     system      8192 Jan 30 23:30 sap_PRD_domain9
drwxr-xr-x   2 root     system      8192 Jan 27 23:37 test_domain
/etc/fdmns/sap_PRD_domain4:
total 0


lrwxr-xr-x   1 root     system        29 Jan 30 02:51 proddisk16 -> /dev/vol/prod_data/proddisk16
lrwxr-xr-x   1 root     system        29 Jan 25 13:38 proddisk37 -> /dev/vol/prod_data/proddisk37

/etc/fdmns/sap_PRD_domain5:
total 0
lrwxr-xr-x   1 root     system        29 Jan 30 02:51 proddisk22 -> /dev/vol/prod_data/proddisk22
lrwxr-xr-x   1 root     system        29 Jan 30 18:27 proddisk43 -> /dev/vol/prod_data/proddisk43

drwxr-xr-x   2 root     system      8192 Oct  3 17:51 tmp_domain
drwxr-xr-x   2 root     system      8192 Oct  3 17:51 usr_domain
drwxr-xr-x   2 root     system      8192 Oct  3 17:51 var_domain
	-------------------------------------
This is from another person working the issue:

	Production
	----------		
	CPU	E2061-DA		REV A01
	MEMORY	MS7CC-DF		REV B01
		MS7CC-DP		REV B01
		KFTIA 	(E2054-AA) 	REV C09
		KFTHA	(E2052-AA)	REV D03
		KZPSA-BA				QUANTITY=4
		KZPSA			REV A10		QUANTITY=6
		FDDI 					QUANTITY=2

        ALPHA FIRMWARE UPGRADED WITH VERSION 3.8 CD
	QA
	--
        CPU     E2061-DA                REV A01
        MEMORY  MS7CC-FP                REV B01
                KFTIA   (E2054-AA)      REV C09
                KFTHA   (E2052-AA)      REV D02
                KZPSA-BA  				QUANTITY=4
                KZPSA			REV A10		QUANTITY=6
                FDDI					QUANTITY=2
	ALPHA FIRMWARE UPGRADED WITH VERSION 3.8 CD

	-------------------------------------     

[Posted by WWW Notes gateway]