Conference spezko::cluster

Title:+ OpenVMS Clusters - The best clusters in the world! +
Notice:This conference is COMPANY CONFIDENTIAL. See #1.3
Moderator:PROXY::MOORE
Created:Fri Aug 26 1988
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5320
Total number of notes:23384

5256.0. "Hang following HSC powerup" by GEC013::OAKLEY () Fri Mar 14 1997 11:51

A customer running a VMS V5.5-2 cluster experienced a hang after
field service powered on an HSC they had been repairing. The 
cluster looks like:

                 (6) HSC70
                     V830
                      |
 (CI)   -------------------------------
        |         |         |         |
       VAX A     VAX B     VAX C     VAX D     VAX E (Quorum VAX)
       6520      6520      6610      6610      VS4060
        |         |         |         |          |
        ------------------------------------------   (Ethernet)




Console messages on A, B, and C indicate they lost connection to
D. The console messages on D indicate it lost connection to A, B,
and C. (We never looked at E.) I would have expected D to exit
the cluster by crashing, but nothing happened. After waiting
nearly an hour, the customer halted D. The rest of the cluster
immediately began to work. D was booted and successfully
rejoined the cluster.

The hang occurred after the HSC was powered up (not down). All
drives on the HSC being repaired were dual-ported to another
HSC and failed over when the repair work began. During the hang 
any process that attempted a disk access was put into a wait
state. The customer believes most or all disks went into mount
verification, even those not on the repaired HSC, but I cannot
confirm this. The customer uses the "preferred path"
program in SYS$EXAMPLES to balance the HSC load.

We do not have a crash dump to examine because D was simply
halted. The error log file indicates the CI port shut down
during the hang. The customer has some more HSC work scheduled
and was wondering about the cause. I know there is very little
information here, but speculation would be welcome.
    