Title:                   HSZ40 Product Conference
Moderator:               SSDEVO::EDMONDS
Created:                 Mon Apr 11 1994
Last Modified:           Fri Jun 06 1997
Last Successful Update:  Fri Jun 06 1997
Number of topics:        902
Total number of notes:   3319
793.0. "Save_config blitz" by SSDEVO::ASTOR (Subsystems Engineering Support) Tue Mar 04 1997 10:27
Copyright (c) Digital Equipment Corporation 1997. All rights reserved.
+---------------------------+TM
| | | | | | | |
| d | i | g | i | t | a | l | TIME DEPENDENT BLITZ
| | | | | | | |
+---------------------------+
BLITZ TITLE:
Possible problem with disks initialized with SAVE_CONFIG under HSOF V2.7
on HSZ40/20/SWXRC
PRIORITY LEVEL: 1
DATE: 2/21/97
TD #: 2241
AUTHOR: Kurt Astor, Tom Gonzales
DTN: 522-2478, 522-6234
EMAIL: SSDEVO::ASTOR, SSDEVO::T_GONZALES
DEPARTMENT: Subsystem Engineering Support
=================================================================
PRODUCT NAME(S): HSZ40, RA410, SWXRC
PRODUCT FAMILY(IES):
Storage _X_
Systems/OS ___
Networks ___
PC/Peripherals ___
Software Apps. ___
BLITZ TYPE:
Maintenance Tip _X_
Service Action Requested ___
IF SERVICE ACTION IS REQUESTED:
Labor Support Required ___
Material Support Required ___
Estimated time to complete activity (in hours):
Will this require a change in the field's inventory: Yes ___ No _X_
Will an FCO be associated with this advisory? Yes ___ No _X_
DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):
**********************************************************************
SYMPTOM:
There is a remote possibility that some disks attached to
HSZ40/20/SWXRC and the solution products containing them (RA410,
SC4200/4600, etc.) may have a problem in the structure of the
on-disk file system. Systems which may be affected are those
which:
1. Use disks in "JBOD" configuration (that is, disks which
are not members of controller-based storagesets such as
RAIDsets and mirrorsets)
2. Had disks initialized under HSOF V2.7Z using the SAVE_CONFIG
command AND rebooted the controller BEFORE initializing
the disk under the operating system
Note that the problem does not occur if the file system was
built on the disk before the controller was rebooted. Also,
the problem does not occur when disks are initialized using
SAVE_CONFIG and the platform operating system under HSOF V3.0Z.
Note that all 2GB and 4GB drives on Windows NT platforms are NOT
exposed to this potential problem. Drives on other platforms
meeting the above criteria have a small risk of exposure; see the
"How to Detect" section of this Blitz for procedures to determine
whether a disk is exposed.
PROBLEM STATEMENT:
When a disk being used in a JBOD configuration is initialized
with SAVE_CONFIG, the last 500 blocks on the disk are allocated
by the controller to store the configuration data. If the
controller running HSOF V2.7Z is rebooted BEFORE the disk is
initialized by the platform operating system, the controller
fails to remember the reduction in disk size and reports the
unreduced disk capacity to the operating system. When the
operating system subsequently builds the file system, the blocks
which SAVE_CONFIG will use to update the configuration data are
also included in the file system disk space, creating a potential
for both the operating system and the controller to write to the
last 500 blocks on disk.
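To make the overlap concrete, here is a minimal sketch in Python
(the 512-byte block size is implied by this blitz's "500 blocks
(256KB)" figure; the capacity value is a hypothetical example)
that computes the block range both writers can touch:

    # Sketch: locate the SAVE_CONFIG overlap region on a JBOD disk.
    BLOCK_SIZE = 512          # bytes per block
    SAVE_CONFIG_BLOCKS = 500  # blocks reserved at the end of the disk

    def overlap_region(total_blocks):
        """Return (first, last) block of the area that both the file
        system and the controller may write once the controller has
        reported the unreduced capacity to the host."""
        first = total_blocks - SAVE_CONFIG_BLOCKS
        return first, total_blocks - 1

    total = 2050860  # hypothetical capacity of a ~1GB drive, in blocks
    first, last = overlap_region(total)
    print("At-risk blocks %d-%d (%d bytes)"
          % (first, last, SAVE_CONFIG_BLOCKS * BLOCK_SIZE))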
If the file system subsequently overwrites configuration data,
the controller recognizes that the data is invalid config data
and ignores it. In this case, controller parameters must be
manually re-entered when SAVE_CONFIG tries to restore the
configuration (unless another drive contains valid config data).
Various configuration events will cause the controller to write
the config data to the SAVE_CONFIG area. If the controller
overwrites file system data, the results vary depending on the
platform operating system and the application.
If a controller with this problem is upgraded to HSOF V3.0Z
before the discrepancy between the file system and controller
views of the disk capacity is resolved, and the file system tries
to access the SAVE_CONFIG area, the controller returns an error to the
operating system. The action that the operating system will take
upon receiving this error will vary depending on the platform,
but may include rendering the entire file system or database
inaccessible.
HOW TO DETECT IF YOU HAVE THIS PROBLEM:
1. Windows NT platforms
As previously noted, 2GB and 4GB drives on Windows NT platforms are
not exposed to the problem described in this blitz. This problem
affects 1GB single-disk units in JBOD configuration with SAVE_CONFIG
data stored on them. If you are not using 1GB JBOD disk units with
SAVE_CONFIG data saved on them, do not proceed any further. Your
system is NOT at risk.
Use the following procedure to check a JBOD 1GB drive with
SAVE_CONFIG data saved on it to determine whether it is exposed:
a. Shut down the host computer and wait until shutdown is complete
b. Restart the HSZ controller(s) by pressing the heartbeat
button(s) (the green reset button)
c. Wait a minute, then start the host computer
d. After the host reboots, start up 'Disk Administrator.'
e. Determine which drive on 'Disk Admin' corresponds to the
1GB JBOD disk to be checked.
f. Check whether the JBOD disk has 1MB or more of unpartitioned
space at the end of the disk.
g. If 'f' is true, the disk does NOT have the problem described
in this blitz. Make sure that you never use the last 1MB of
space; leave it unpartitioned.
h. If 'f' is false (there is no unpartitioned space at the end of
the disk), the very last 196 blocks (100KB) on the drive
are at risk for the problem described in this blitz. See
the "Solution" section below for the recovery procedure; a
sketch of this check appears after this list.
2. Novell NetWare platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
NetWare reserves 2% of the space at the end of each disk for bad
block replacement. 500 blocks (256KB) at the end of this 2% space
will be exposed to the problem described in this blitz. A 2% space
is larger than is generally needed for replacing bad blocks. For
example, reserve space on a 4GB, 2GB, and 1GB disk is 80MB, 40MB,
and 20MB respectively. The probability of a bad block being
replaced in the last 256KB of this reserve space is very small;
however, it is possible. Use the following procedure to check a
disk in JBOD configuration to determine whether it is exposed:
a. NWSERVER> load install
b. Open "disk options"
c. Open "Modify disk partition and Hot Fix"
d. Select disk drive
e. Choose "Change Hot Fix"
f. Record "Redirection Area", this is the BadBlock size.
g. calculate 2% of the disk
h. if BadBlock size is less than (2% - 256KB) then the disk
is NOT affected.
i. if the BadBlock size is greater than (2% - 256KB) then the
disk IS at risk. See the "Solution" section below for the
recovery procedure.
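The arithmetic in steps 'g' through 'i' can be written out as a
short Python sketch (values are in bytes; the disk capacity and
Redirection Area size are hypothetical examples read off the
Hot Fix screen):

    # Sketch: NetWare exposure check from steps f-i.
    KB = 1024

    def netware_disk_at_risk(disk_bytes, redirection_bytes):
        """At risk if the BadBlock (Redirection Area) size exceeds
        2% of the disk minus the 256KB SAVE_CONFIG area."""
        threshold = disk_bytes * 2 // 100 - 256 * KB
        return redirection_bytes > threshold

    # Hypothetical 4GB disk whose Hot Fix area was reduced to 40MB:
    print(netware_disk_at_risk(4 * 10**9, 40 * 10**6))  # False: not affected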
3. Sun Solaris and SunOS platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
If you followed the installation guide, you are not at risk,
because the default partition layout reserves the last two
cylinders for diagnostic purposes. The 500 blocks in question
will always reside within those two diagnostic cylinders.
If you changed the default partition layout, AND allocated the two
diagnostic cylinders to a partition, you may be at risk.
If disks in your system are at risk of this problem, use the
following procedure to check a disk in JBOD configuration to
determine whether it is exposed:
a. Use the GUI to display the number of blocks on the unit.
Do this by selecting the LUN in question and then choosing
LUN parameters from the pull-down menu. Write down this number.
b. Use the tip command (or an RS-232 terminal) to connect to
the controller CLI. If you have problems or questions, this
command is documented in the installation guide.
c. Use the CLI command show <unitname>, substituting the actual
name of the unit in question for <unitname>.
d. If the GUI and the CLI report different sizes for the same
unit, you are at risk for the problem. See the "Solution"
section below for the recovery procedure; a sketch of this
comparison appears after this list.
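The comparison in steps 'a' through 'd' (and likewise in the
OpenVMS and DIGITAL UNIX procedures below) is a single test: the
host and the controller must agree on the unit's block count. A
minimal sketch, with hypothetical block counts:

    # Sketch: compare host- and controller-reported capacities.
    def capacity_mismatch(host_blocks, controller_blocks):
        """A JBOD unit is at risk if the two views of its size differ."""
        return host_blocks != controller_blocks

    # Hypothetical counts; note the difference is exactly the
    # 500-block SAVE_CONFIG area:
    print(capacity_mismatch(host_blocks=4110480,
                            controller_blocks=4109980))  # True: at risk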
4. OpenVMS platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
If disks in your system are at risk of this problem, use the
following procedure to check a disk in JBOD configuration to
determine whether it is exposed:
a. At the controller prompt, type SHOW DISKnnn (where nnn is
the JBOD disk in question).
b. Look for "Configuration being backed up on this container"
message.
c. Record the block size capacity displayed by the controller.
d. From the OpenVMS prompt on one of the hosts, mount the disk
in question and type the command:
$ show device/full dka200:
e. Compare the total block size obtained from the "show device"
command with the block size capacity obtained in step 'c.'
f. If the reported sizes are different, this disk is at risk for
the problem. See the "Solution" section below for the recovery
procedure.
5. DIGITAL UNIX platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
If disks in your system are at risk of this problem, use the
following procedure to check a disk in JBOD configuration to
determine whether it is exposed:
a. At the controller prompt, type SHOW DISKnnn (where nnn is
the JBOD disk in question).
b. Look for "Configuration being backed up on this container"
message.
c. Record the block size capacity displayed by the controller.
d. From the DIGITAL UNIX prompt on one of the hosts, type the following
commands (rrza18c is used in the following example as the device
in question):
# disklabel -rw /dev/rrza18c HSZ40
# disklabel -r /dev/rrza18c
# /dev/rrza18c:
e. Compare the sectors/unit value reported by disklabel with
the block size capacity obtained in step 'c.'
f. If the reported sizes are different, this disk is at risk for
the problem. See the "Solution" section below for the recovery
procedure.
6. AIX platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
If disks in your system are at risk of this problem, use the
following procedure to check a disk in JBOD configuration to
determine whether it is exposed:
AIX 4.1.4:
a. Sum the raw device as shown in the following command:
sum -r /dev/rhdiskN
b. If this operation results in a read error as shown below,
the disk is at risk for the problem. See the "Solution"
section below for the recovery procedure; a sketch of an
equivalent check appears after this section.
sum: read error on /dev/rhdiskN
AIX 3.2.5: Disks on systems which have the risk factors described
above should be regarded as at risk for the problem
described in this blitz.
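On AIX 4.1.4 the read error occurs because the host's view of the
device extends past what the controller will serve. A sketch that
probes just the suspect tail of the raw device (the device name is
hypothetical, root privileges are required, and seeking from the
end may not be supported by every raw device driver):

    # Sketch: try to read the last 500 blocks of the raw device,
    # mirroring the 'sum: read error' symptom.
    import os

    TAIL = 500 * 512  # bytes: the SAVE_CONFIG area at the end of disk

    def tail_readable(raw_device):
        """Return False (at risk) if the device tail cannot be read."""
        fd = os.open(raw_device, os.O_RDONLY)
        try:
            os.lseek(fd, -TAIL, os.SEEK_END)
            os.read(fd, TAIL)
            return True
        except OSError:
            return False
        finally:
            os.close(fd)

    print(tail_readable("/dev/rhdisk4"))  # hypothetical device name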
7. HP-UX platforms
The problem described in this blitz affects single-disk units in
JBOD configuration with SAVE_CONFIG data stored on them. If you are
not using JBOD disk units with SAVE_CONFIG data saved on them, do
not proceed any further. Your system is NOT at risk.
Disks on systems which have the risk factors described above should
be regarded as at risk for the problem described in this blitz.
SOLUTION:
1. If you are using SAVE_CONFIG to initialize JBOD disks under
HSOF V2.7, be sure to initialize the disk with the platform
file system BEFORE rebooting the controller.
2. If a customer has the risk factors for the problem as described
in the SYMPTOM and DETECTION sections above, they should use the
steps below to resolve the discrepancy between the controller and
operating system views of the disk at the earliest opportunity. Digital
recommends that the recovery process described below be
performed BEFORE upgrading the V2.7Z controller to V3.0Z. Any
files which may have been written in the SAVE_CONFIG area will
be accessible to the operating system after the restore process;
however, any such files are suspect and should be carefully
examined to ensure that the data they contain is correct, or
restored from a previous backup.
a. Back up the unit that contains SAVE_CONFIG information.
b. Unmount the file system(s) contained on that unit.
c. Delete the unit from the configuration in the controller.
d. Initialize the container from the controller without SAVE_CONFIG.
e. Add the unit back into the configuration.
f. Initialize the unit and restore it from backup. (An
illustrative controller CLI sequence for steps c-e appears
below.)
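For illustration only, the controller side of steps 'c' through 'e'
might look like the following CLI sequence; the unit number D100 and
container name DISK100 are hypothetical, so consult the HSOF CLI
reference for the exact syntax on your controller:

    HSZ> DELETE D100               (step c: delete the unit)
    HSZ> INITIALIZE DISK100        (step d: initialize without SAVE_CONFIG)
    HSZ> ADD UNIT D100 DISK100     (step e: add the unit back)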
VERIFICATION:
N/A
LARS INFORMATION: (Supplied by MCS)
Attention Service Personnel: Begin the comment field of your LARS
with the word "BLITZ" when you perform an activity associated with a
BLITZ Type "Service Action Requested".
*** DIGITAL INTERNAL USE ONLY ***
\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT PROD=HSZ40