|
A low-battery condition will cause an HSZ controller failover
to occur (in dual-redundant configurations) with firmware V3.0
or later. Older versions of HSOF did not support this
(see the extract below).
HSZ controller failovers should be transparent to Windows NT
and to the NT cluster software. This is one of the functions of
the hszdisk.sys filter driver.
What was the state of D400 from the HSZ's point of view? Was
it operative? If so, could NT still see the disk partition(s)
associated with the storageset? What was the status of the
cluster failover group that was associated with D400?
You should check the FMlog files for any anomalies around the time
of the hsz failover.
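As a rough aid for that check, the sketch below pulls out only the log
lines that fall within a few minutes of the failover. The timestamp layout
(YYYY/MM/DD-HH:MM:SS somewhere in each line) and the sample lines are
assumptions made up for illustration, not the actual FMlog format; adjust
the pattern to whatever your logs really contain.

```python
import re
from datetime import datetime, timedelta

# Hypothetical timestamp layout; change both to match the real log lines.
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2})")
STAMP_FORMAT = "%Y/%m/%d-%H:%M:%S"


def lines_near(lines, event_time, window_minutes=5):
    """Return the lines whose timestamp lies within the window around event_time."""
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue  # no recognizable timestamp on this line, skip it
        stamp = datetime.strptime(match.group(1), STAMP_FORMAT)
        if abs(stamp - event_time) <= window:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    # Invented sample lines, only to show the filtering.
    sample = [
        "1997/03/12-01:00:00 routine entry",
        "1997/03/12-02:03:00 entry close to the failover",
        "1997/03/12-02:06:00 entry just outside a 5 minute window",
    ]
    failover = datetime(1997, 3, 12, 2, 0, 0)
    for hit in lines_near(sample, failover):
        print(hit)
```

Run it over the FMlog of each node with the same failover time, so the
two views of the event can be compared side by side.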
Post a note in the hsz40_product notes conference, since this
seems to be an HSZ failover issue and not a cluster issue.
Better yet, upgrade to the HSZ50. The cache battery (ECB) design
is MUCH improved.
HSZ40 Array Controller Operating Software (HSOF), Version 3.0
SPD 53.54.09
DESCRIPTION
Cache Battery Diagnostic
Software Version 3.0 checks the condition of the optional write-back cache
batteries every 24 hours. If a low capacity or failure is detected, write-back
cache data is flushed from cache and depending on the pre-defined cache policy,
selected RAIDsets and disk mirrorsets may become inoperative. In dual redundant
configurations, failover to the redundant controller will occur.
Refer to the HSZ40 Array Controller Operating System Software Release Notes,
EK-HSZ40-RN. K01, for further information.
|
|
From the HSZ point of view (during the problem), D400 was
AVAILABLE.
From the Cluster point of view, D400 was OFFLINE on BOTH systems,
and when trying to force it online, an UNKNOWN ERROR code popped up.
Looking through the FMlogs (where there is a lot of information),
the cluster software tries to fail over the D400 shared disk to the
other system, which also fails. Timeouts occur and the error threshold
is exceeded.
For a still unknown reason, at the start of the problems, BOTH
systems' FMlogs report "This system has lost connectivity to node
<other_node>".
They do have a TAPE on the shared SCSI bus and are doing a backup of
the shared disk on the other system via the network.
Although I did NOT find any statement that tapes are NOT supported
on the shared bus, I assume the release notes must be read as: tapes
are not mentioned, so they are NOT supported on the shared bus.
Hszdisk.sys is at V2.51, and we have found that version is NOT stable.
V2.71 behaves much better.
So upgrading it is at least ONE step which MUST be done.
Jan Visser.
|