T.R | Title | User | Personal Name | Date | Lines |
---|
323.1 | | COOKIE::FROEHLIN | Let's RAID the Internet! | Tue Feb 25 1997 09:25 | 35 |
| RAID Software uses a sys$mount call to mount the shadow sets. From that
point on RAID software only knows about the DSA devices as members of a
RAID set. All I/Os are sent to the shadowing driver by RAID$DPDRIVER.
The RAID server is not involved in any shadowing member management. It
is all done by shadowing.
> one 6 (2 member) shadowsets. It's just the stripeset with 6
> members that doesn't work. It's called DPA2 and 3 times it went
Any differences from the other RAID sets (e.g. disk device types)?
What are the values for SHADOW_MAX_COPY and SHADOW_MBR_TMO on all
nodes?
> reduce the DSA-devices to 1 member. Now the stripeset with 6 (1 member)
> shadowsets works fine but without any security.
Did they try adding one member at a time to the shadow sets with
RAID ADD/SHADOW? I mean add a 2nd member to the first shadow set,
wait until the copy has completed, then do the next one, and so on.
> Why do we get this problems?
What's in the OPERATOR.LOG and ERRLOG.SYS related to either the
controllers, the shadow set members or the shadow sets?
> Is there a limitation with 12 disk in a raid 0+1 set?
No!
> Could we have a quota problem with the Raid-server process?
No!
Guenther
|
323.2 | More info | RULLE::LINDSTROM_S | | Wed Feb 26 1997 12:06 | 27 |
|
The parameters you asked about are as follows
SHADOW_MBR_TMO = 20
SHADOW_MAX_COPY = 4
The customer is trying tonight to add 1 disk at a time.
There is nothing in the errorlog other than the entries telling
us that DPA2 went offline into mount verification.
According to operator.log all 6 DSA devices were added into DPA2.
The first hour after the reboot went without errors and 2 of the
shadow copy operations were completed. Then there were a couple of
mount verifications that came back online again. From then on DPA2
logged about 200 errors per 5 minutes. Then it got stuck in mount
verification until 'Mount verify timeout'.
This raid set is made of RZ29B with 0014 and 0016 microcode only.
We haven't seen any indication of a problem with the HSZ40, the disks or
the DSA devices. The only symptom is that DPA2 goes Offline and
into mount verification.
The 6 other raidsets with 5 dsa disk members are working without
any problem and this 6 dsa-disk raid set has worked all day
with just 1 shadowmember.
If there is any trouble with H/W or shadowing, why doesn't it show?
Have we failed to look in the right places?
Regards Sten Lindstrom
CSC Sweden
|
323.3 | | AMCFAC::RABAHY | dtn 471-5160, outside 1-810-347-5160 | Wed Feb 26 1997 12:27 | 1 |
| Are the HSZ40's in dual redundant pairs? Is the PREFER command being used?
|
323.4 | Single controller | RULLE::LINDSTROM_S | | Wed Feb 26 1997 14:13 | 7 |
|
The HSZ40 is a single controller.
By the way, how do I get the new shadowing code (note 3.15) used in V7.1?
Do I get it by installing ALPSHAD05_062?
Sten.
|
323.5 | | COOKIE::FROEHLIN | Let's RAID the Internet! | Wed Feb 26 1997 14:44 | 18 |
| Sten,
the TIMA patch kit for V7.1 has not been released yet. Is your customer
running straight V6.2 or the V6.2 Compatibility Kit (which goes with
V7.1)?
DPA2 is going into mount verification and all DSA devices look ok from
a SHOW DEVICE? Then I'm puzzled. If this happens (DPA device in mount
verification) can they still do this:
$ DUMP DSAnnn:[000000]RAID$BC1.SYS/BLOCK=(COUNT:1)
for each shadow set in this RAID set?
What about a "RAID ANALYZE/ERROR/OUTPUT=..."? Any entries? Is this
truly a standalone Alpha?
Guenther
|
323.6 | | COOKIE::FROEHLIN | Let's RAID the Internet! | Wed Feb 26 1997 14:49 | 13 |
| Still need to know the configuration. The RAID sets which work, are
the disks connected to the same controller? Could it be that just one
(maybe more) shadow set is having a problem? Keep in mind that if just
one member in the RAID set has a problem the DPA device can enter mount
verification whenever we hit the faulty RAID set member.
If they can still "play" with these disk devices they should try to
DCL INITIALIZE the disks and mount them from DCL in exactly the same
pairs as they use them in the RAID set and do some copies to/from these
shadow sets. Maybe this helps to isolate the faulty shadow set
(assuming this is the case).
Guenther
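For the pair-by-pair test suggested above, the existing pairing first has to be read off a SHOW DEVICE listing. A minimal sketch of that bookkeeping in Python, assuming member lines in the form posted in 323.7; the sample lines are copied from that listing, and the helper name shadow_pairs is hypothetical:

```python
import re
from collections import defaultdict

# Member lines in the form SHOW DEVICE prints them (sample from 323.7).
SAMPLE = """\
$1$DKC0: (GAER13) ShadowSetMember 1 (member of DSA21:)
$1$DKC1: (GAER13) ShadowSetMember 2 (member of DSA26:)
$1$DKC5: (GAER13) ShadowSetMember 1 (member of DSA22:)
$1$DKC6: (GAER13) ShadowSetMember 2 (member of DSA21:)
$1$DKC103: (GAER13) ShadowSetMember 2 (member of DSA22:)
$1$DKC302: (GAER13) ShadowSetMember 1 (member of DSA26:)
"""

# Capture the member device name and the shadow set it belongs to.
MEMBER_OF = re.compile(r"^(\S+?):\s.*\(member of (DSA\d+):\)")

def shadow_pairs(text):
    """Return {shadow set: sorted list of its member devices}."""
    pairs = defaultdict(list)
    for line in text.splitlines():
        m = MEMBER_OF.match(line.strip())
        if m:
            device, shadow_set = m.groups()
            pairs[shadow_set].append(device)
    return {name: sorted(devices) for name, devices in pairs.items()}

pairs = shadow_pairs(SAMPLE)
for shadow_set in sorted(pairs):
    print(shadow_set, "=", " + ".join(pairs[shadow_set]))
```

The output is only a checklist of which two disks to INITIALIZE and mount together; it does not issue any DCL itself.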
|
323.7 | More info again | RULLE::LINDSTROM_S | | Thu Feb 27 1997 04:59 | 207 |
|
Last night the customer added 1 member at a time and now the
raid set is complete. We will see what happens today but he feels
a little bit more confident now. Here is the disk configuration.
There are some errors logged since last boot. These errors came
up when he did a HSZ>DELE UNIT and the controller restarted. But
this is another problem I think. After that HSZ restart we still had
the same problem with the DPA2 device.
When he first came up after the reconfiguration of the HSZ all the
disks had to do a shadow full copy. Could the heavy load be part
of the problem?
Sten
*******************************************************************************
$ mc sysman para sho vaxc
%SYSMAN-I-USEACTNOD, a USE ACTIVE has been defaulted on node GAER13
Node GAER13: Parameters in use: ACTIVE
Parameter Name Current Default Minimum Maximum Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
VAXCLUSTER 0 1 0 2 Coded-value
$ sho dev dp
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DPA0: (GAER13) Online 0
DPA2: (GAER13) Mounted 0 DISK2 4650639 105 1
DPA3: (GAER13) Mounted 0 DISK3 8717625 101 1
DPA4: (GAER13) Mounted 0 DISK4 7861586 46 1
DPA8: (GAER13) Offline 0
*******************************************************************************
$ raid sho disk2
StorageWorks(TM) RAID Software V2.3 Display Time: 27-FEB-1997 10:36:23.85
Copyright Digital Equipment Corporation 1993-1996. All Rights Reserved.
RAID Array Parameters:
Current RAID Array ID: DISK2
Permanent RAID Array ID: DISK2
RAID Level: 0+1
Current State: NORMAL
RAID Array Configuration:
Member ShadowSet ShadowSet
Index Name State Members State
----- ------ ----- --------- ---------
0 _DSA21: NORMAL 2 SteadyState
1 _DSA22: NORMAL 2 SteadyState
2 _DSA23: NORMAL 2 SteadyState
3 _DSA24: NORMAL 2 SteadyState
4 _DSA25: NORMAL 2 SteadyState
5 _DSA26: NORMAL 2 SteadyState
Virtual
Unit Size Status Reads Writes Errors
------- ------ -------- ----- ------ ------
DPA0002: 50188312 ACCESS 4187604 713687 0
*******************************************************************************
$ raid sho disk3
StorageWorks(TM) RAID Software V2.3 Display Time: 27-FEB-1997 10:36:30.44
Copyright Digital Equipment Corporation 1993-1996. All Rights Reserved.
RAID Array Parameters:
Current RAID Array ID: DISK3
Permanent RAID Array ID: DISK3
RAID Level: 0+1
Current State: NORMAL
RAID Array Configuration:
Member ShadowSet ShadowSet
Index Name State Members State
----- ------ ----- --------- ---------
0 _DSA31: NORMAL 2 SteadyState
1 _DSA32: NORMAL 2 SteadyState
2 _DSA33: NORMAL 2 SteadyState
3 _DSA34: NORMAL 2 SteadyState
4 _DSA35: NORMAL 2 SteadyState
Virtual
Unit Size Status Reads Writes Errors
------- ------ -------- ----- ------ ------
DPA0003: 41824792 ACCESS 2097262 1079406 0
*******************************************************************************
$ raid sho disk4
StorageWorks(TM) RAID Software V2.3 Display Time: 27-FEB-1997 10:36:35.16
Copyright Digital Equipment Corporation 1993-1996. All Rights Reserved.
RAID Array Parameters:
Current RAID Array ID: DISK4
Permanent RAID Array ID: DISK4
RAID Level: 0+1
Current State: NORMAL
RAID Array Configuration:
Member ShadowSet ShadowSet
Index Name State Members State
----- ------ ----- --------- ---------
0 _DSA41: NORMAL 2 SteadyState
1 _DSA42: NORMAL 2 SteadyState
2 _DSA43: NORMAL 2 SteadyState
3 _DSA44: NORMAL 2 SteadyState
4 _DSA45: NORMAL 2 SteadyState
Virtual
Unit Size Status Reads Writes Errors
------- ------ -------- ----- ------ ------
DPA0004: 41824792 ACCESS 1722124 550852 0
*******************************************************************************
$ sho dev d
Device Device Error Volume Free Trans Mnt
Name Status Count Label Blocks Count Cnt
DSA0: Mounted 0 SYSDISK 3927375 453 1
DSA1: Mounted 0 DISK1 6619482 89 1
DSA5: Mounted 0 DISK5 845068 101 1
DSA6: Mounted 0 DISK6 3543084 60 1
DSA7: Mounted 0 DISK7 7740180 3 1
DSA21: Mounted 0 DISK20000000 32 2 1
DSA22: Mounted 0 DISK20000001 32 2 1
DSA23: Mounted 0 DISK20000002 32 2 1
DSA24: Mounted 0 DISK20000003 32 2 1
DSA25: Mounted 0 DISK20000004 32 2 1
DSA26: Mounted 0 DISK20000005 32 2 1
DSA31: Mounted 0 DISK30000000 32 2 1
DSA32: Mounted 0 DISK30000001 32 2 1
DSA33: Mounted 0 DISK30000002 32 2 1
DSA34: Mounted 0 DISK30000003 32 2 1
DSA35: Mounted 0 DISK30000004 32 2 1
DSA41: Mounted 0 DISK40000000 32 2 1
DSA42: Mounted 0 DISK40000001 32 2 1
DSA43: Mounted 0 DISK40000002 32 2 1
DSA44: Mounted 0 DISK40000003 32 2 1
DSA45: Mounted 0 DISK40000004 32 2 1
DPA0: (GAER13) Online 0
DPA2: (GAER13) Mounted 0 DISK2 4650541 105 1
DPA3: (GAER13) Mounted 0 DISK3 8717830 103 1
DPA4: (GAER13) Mounted 0 DISK4 7861586 46 1
DPA8: (GAER13) Offline 0
$1$DKA0: (GAER13) Online 0
$1$DKA100: (GAER13) ShadowSetMember 0 (member of DSA1:)
$1$DKA200: (GAER13) ShadowSetMember 0 (member of DSA6:)
$1$DKA300: (GAER13) ShadowSetMember 0 (member of DSA7:)
$1$DKA400: (GAER13) ShadowSetMember 0 (member of DSA5:)
$1$DKA600: (GAER13) Online 0
$1$DKB0: (GAER13) ShadowSetMember 0 (member of DSA0:)
$1$DKB100: (GAER13) ShadowSetMember 0 (member of DSA1:)
$1$DKB200: (GAER13) ShadowSetMember 0 (member of DSA6:)
$1$DKB300: (GAER13) ShadowSetMember 0 (member of DSA7:)
$1$DKB400: (GAER13) ShadowSetMember 0 (member of DSA5:)
$1$DKC0: (GAER13) ShadowSetMember 1 (member of DSA21:)
$1$DKC1: (GAER13) ShadowSetMember 2 (member of DSA26:)
$1$DKC2: (GAER13) ShadowSetMember 1 (member of DSA31:)
$1$DKC3: (GAER13) ShadowSetMember 4 (member of DSA41:)
$1$DKC4: (GAER13) ShadowSetMember 0 (member of DSA45:)
$1$DKC5: (GAER13) ShadowSetMember 1 (member of DSA22:)
$1$DKC6: (GAER13) ShadowSetMember 2 (member of DSA21:)
$1$DKC7: (GAER13) ShadowSetMember 1 (member of DSA32:)
$1$DKC100: (GAER13) ShadowSetMember 1 (member of DSA31:)
$1$DKC101: (GAER13) ShadowSetMember 2 (member of DSA42:)
$1$DKC102: (GAER13) ShadowSetMember 1 (member of DSA23:)
$1$DKC103: (GAER13) ShadowSetMember 2 (member of DSA22:)
$1$DKC104: (GAER13) ShadowSetMember 6 (member of DSA33:)
$1$DKC105: (GAER13) ShadowSetMember 1 (member of DSA32:)
$1$DKC106: (GAER13) ShadowSetMember 1 (member of DSA43:)
$1$DKC107: (GAER13) ShadowSetMember 1 (member of DSA24:)
$1$DKC200: (GAER13) ShadowSetMember 2 (member of DSA23:)
$1$DKC201: (GAER13) ShadowSetMember 2 (member of DSA34:)
$1$DKC202: (GAER13) ShadowSetMember 4 (member of DSA33:)
$1$DKC203: (GAER13) ShadowSetMember 1 (member of DSA44:)
$1$DKC204: (GAER13) ShadowSetMember 1 (member of DSA25:)
$1$DKC205: (GAER13) ShadowSetMember 2 (member of DSA24:)
$1$DKC206: (GAER13) ShadowSetMember 10 (member of DSA35:)
$1$DKC207: (GAER13) ShadowSetMember 2 (member of DSA34:)
$1$DKC300: (GAER13) ShadowSetMember 2 (member of DSA42:)
$1$DKC301: (GAER13) ShadowSetMember 1 (member of DSA44:)
$1$DKC302: (GAER13) ShadowSetMember 1 (member of DSA26:)
$1$DKC303: (GAER13) ShadowSetMember 2 (member of DSA25:)
$1$DKC304: (GAER13) ShadowSetMember 9 (member of DSA35:)
$1$DKC305: (GAER13) ShadowSetMember 7 (member of DSA41:)
$1$DKC306: (GAER13) ShadowSetMember 1 (member of DSA45:)
$1$DKC307: (GAER13) ShadowSetMember 1 (member of DSA43:)
$1$DVA0: (GAER13) Online 0
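The per-member error counts in the listing above are easier to compare when summed per shadow set. A minimal sketch in Python, assuming the line layout shown above; only a handful of lines are copied in as a sample, and the helper name errors_by_shadow_set is hypothetical:

```python
import re
from collections import defaultdict

# A few member lines copied from the SHOW DEVICE listing above.
SHOW_DEVICE = """\
$1$DKC0: (GAER13) ShadowSetMember 1 (member of DSA21:)
$1$DKC6: (GAER13) ShadowSetMember 2 (member of DSA21:)
$1$DKC104: (GAER13) ShadowSetMember 6 (member of DSA33:)
$1$DKC202: (GAER13) ShadowSetMember 4 (member of DSA33:)
$1$DKC206: (GAER13) ShadowSetMember 10 (member of DSA35:)
$1$DKC304: (GAER13) ShadowSetMember 9 (member of DSA35:)
"""

# device name, node in parentheses, status, error count, owning shadow set
MEMBER = re.compile(
    r"^(?P<dev>\S+):\s+\(\w+\)\s+ShadowSetMember\s+(?P<errors>\d+)\s+"
    r"\(member of (?P<dsa>DSA\d+):\)"
)

def errors_by_shadow_set(text):
    """Sum the member error counts for each shadow set."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = MEMBER.match(line.strip())
        if m:
            totals[m.group("dsa")] += int(m.group("errors"))
    return dict(totals)

totals = errors_by_shadow_set(SHOW_DEVICE)
# Highest error totals first.
for dsa, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{dsa}: {count}")
```

Fed the whole listing, the same function gives one total per DSA device. The counts say nothing about when the errors were logged, so errors from the HSZ restart mentioned above are included.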
|
323.8 | | COOKIE::FROEHLIN | Let's RAID the Internet! | Thu Feb 27 1997 09:59 | 6 |
| So all disks are connected to the very same controller. How about which
drives are in which shelf (SCSI bus)?
What about entries from "RAID ANALYZE/ERRORLOG SYS$ERRORLOG/OUTPUT=..."?
Guenther
|
323.9 | Is the CDDB reinit count going up? | VMSSG::JENKINS | Kevin M Jenkins VMS Support Engineering | Thu Feb 27 1997 11:38 | 8 |
| Check the CDDB reinit count to see if the controller is breaking
the connection. Use SDA: SHOW DEV DUAxx. The second screen should
have the CDDB data; the reinit field is in the middle section.
If the count is going up then you may be seeing some sort of
controller-related load situation.
Kevin
|
323.10 | No more problems | RULLE::LINDSTROM_S | | Thu Mar 06 1997 02:37 | 7 |
|
Since we got the raid set established there haven't been any more
problems. There has been no indication of H/W errors anywhere so
I don't know what conclusions to draw. We are closing this for now.
But thanks for your support anyway.
Sten.
|
323.11 | Blame DUDRIVER | ESSB::JNOLAN | John Nolan | Mon Mar 17 1997 14:58 | 8 |
|
Having experienced similar problems (though with HSJ's) I'd put the
blame on DUDRIVER. VAX(ALP)DRIV01_070 contains a fix for Shadowset
going into Mountverify and not coming out of it. Needless to say
DRIV04 is an even better kit to have than any of the earlier ones
(read the release notes for it and be thankful that you have HSZ
and not HSJ's, as I've experienced both the shadowset mount verify
and the other problem mentioned)!
|
323.12 | DRIV04 good idea, but not likely a fix | VMSSPT::JENKINS | Kevin M Jenkins VMS Support Engineering | Tue Mar 18 1997 05:59 | 8 |
|
The DRIV04 kit would be a good thing to have, however the MountVerify
problem that was fixed in DRIV01 happened only on "idle" connections
and would go away when any load was put on the devices. This problem
seems to happen only under load. This makes me wonder about something
in the controller or drives that doesn't like a heavy load.
Kevin
|