
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

586.0. "System hung with SYSINIT-I-LOCK on V7.1 DSSI clus." by CRLRFR::BLUNT () Tue May 13 1997 16:52

    
    Ran into an issue where a site is adding a third node to a DSSI cluster
    of Alpha 2100s.  Any TWO systems run fine, but when ANY third member is
    added, it hangs during boot with:
    
    	SYSINIT-I-LOCK, taking out lock on system device
    
    At this point, the booting system is hung.  Shutting down the cluster
    and bringing all systems up results in no difference in behavior.
    
    Anyone seen this, or have recommendations about solutions?
    
    bob
T.R  Title  User  Personal Name  Date  Lines
586.1. "No Answers, Just Questions..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 14 1997 10:28, 18 lines
   That message is from SYSINIT debugging, and should only appear when
   the DEBUG_MSG_FLAG cell is set non-zero in the SYSINIT image.  (Have
   you turned on extra logging in the bootstrap?)

   What is the DSSI bus configuration?

   Are all three nodes running V7.1?

   Shadowing?

   I will assume that you have checked the DSSI unit numbers on all DSSI
   disks, DSSI tapes, and all host DSSI controllers, making sure that all
   are set to unique values.  I will assume that all three hosts are set
   in the same host allocation non-zero class if there is storage on the
   DSSI (or shared SCSI), and that all volumes on all hosts in the same
   (host or SCSI port) allocation class have unique unit numbers.

586.2. "this it?" by CTHU26::S_BURRIDGE () Wed May 14 1997 14:38, 83 lines
    If the system is running OpenVMS 7.1 or has the ALPCLUSIO or
    ALPCOMPAT_62 patches installed, and if the error message is actually
    "%SYSINIT-I-LOCKWAIT, waiting for locks on system disk", then the
    problem may be the one described in the following:
    
    (This is mail I got from the Colorado CSC.  Apparently a STARS article
    is in the works but has not yet appeared.)
    

Hi,

This sounds exactly like a problem I worked on.
It is caused by PerfectCache.




After upgrading to 7.1, or installing the ...CLUSIO or ..COMPAT_062 kits,
boots may hang with the following message:

	 %SYSINIT-I-LOCKWAIT, waiting for locks on system disk

The new MOUNT96 code that is installed with 7.1 and with the CLUSIO and
..COMPAT_062 kits now requires the use of the DMT$ lock on the device to
ensure proper synchronization.

If PerfectCache is also running on the cluster, it too uses the DMT$
lock, which causes the MOUNT/BOOT to hang.

We are in the process of working with PerfectCache to resolve this problem.

Analysis:

Looking at [MOUNT96.LIS]SYSMOU.LIS, the module that generates the
SYSINIT-I-LOCKWAIT message, we simply do the following:

    Loop:
	attempt to get the MNT$physical_device for device_index=0
		lock in EX mode
	attempt to find the device using search_device
		This takes out the MNT$physical_device for device_index=j
			lock in EX mode
		This takes out the DMT$physical_device for device_index=j
			lock in PW mode
		This takes out the SYS$device_name in PW mode

	If any of these attempts fail with MOUN$_DEVBUSY
	then
	    release the DMT$ lock (which was taken out in search_device)
	    release the MNT$ lock (which was taken out in search_device)
	    release the MNT$ lock (for device index = 0)

	    Write the "SYSINIT-I-LOCKWAIT..." message once

	    wait, then goto loop
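
The loop above can be modelled as a toy simulation. This is only a sketch
and not the real SYSMOU code: the function names, the lock-table layout and
the bounded retry count are assumptions made for illustration, and the extra
MNT$ lock for device index 0 is left out. The compatibility table is the
standard lock manager one, and the PR holder on the DMT$ resource is taken
from the lock dump later in this mail.

```python
# Toy model of the SYSINIT/MOUNT retry loop; names here are hypothetical.
# For each requested mode, the set of already-granted modes it can coexist with.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def can_grant(requested, held_modes):
    """True if 'requested' is compatible with every mode already granted."""
    return all(h in COMPATIBLE[requested] for h in held_modes)


def try_take_locks(device, held):
    """held maps resource name -> modes granted to other processes.
    Returns False (the DEVBUSY case) if any of the three locks would block."""
    wanted = [
        ("MNT$" + device, "EX"),
        ("DMT$" + device, "PW"),
        ("SYS$" + device, "PW"),
    ]
    return all(can_grant(mode, held.get(name, [])) for name, mode in wanted)


def boot_mount(device, held, max_tries=3):
    """Retry loop. The real loop is unbounded; max_tries is an assumption
    so that the simulation terminates."""
    messages = []
    for attempt in range(max_tries):
        if try_take_locks(device, held):
            return True, messages
        if attempt == 0:
            # The message is written once, on the first failure only.
            messages.append(
                "%SYSINIT-I-LOCKWAIT, waiting for locks on system disk")
        # The real code releases its locks and waits here before looping.
    return False, messages


# A PR lock held by another process on DMT$ blocks the PW request forever.
held = {"DMT$_$1$DUA700:": ["PR"]}
ok, messages = boot_mount("_$1$DUA700:", held)
```

With the PR holder in place the loop never succeeds and the message appears
exactly once, which matches the hang seen at boot. With an empty lock table
the same call succeeds on the first pass.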

So, to determine what is causing the error we need to see the lock information,
and since this is at startup, that means that the customer will have to
force a crash when it is hung.

The above locks can be examined, and you can then follow the PIDs to find
which process is holding the lock that you are waiting for. Following is a
picture of one lock:

Lock id:  010008A0   PID:     00010028   Flags:   SYNCSTS SYSTEM  NODLCKW
Par. id:  00000000   SUBLCKs:        0            NODLCKB
LKB:      83781D80   BLKAST:  00010A98
PRIORTY:      0000

Granted at      PR   00000000-FFFFFFFF

Resource:      2431245F 24544D44    DMT$_$1$  Status:
 Length   15   003A3030 37415544    DUA700:.
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Process copy of lock 210018B6 on system 00010001
Process index: 0028   Name: .PerfectCache..   Extended PID: 20601028
--------------------------------------------------------------------
Process status:        01840011  RES,PSWAPM,PHDRES,NODELET
Required capabilities: 0000000C  QUORUM,RUN
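
The resource name in the dump above can be recovered from the hex columns.
The sketch below assumes the usual dump layout, where each row prints the
higher-addressed longword first and each longword is little-endian, so the
bytes are read right to left. The function name is hypothetical.

```python
import struct


def decode_resource(longwords_low_to_high, length):
    """Pack longwords in address order as little-endian and keep 'length' bytes."""
    raw = b"".join(struct.pack("<I", lw) for lw in longwords_low_to_high)
    return raw[:length].decode("ascii")


# Rows copied from the dump, in the order printed (left column, right column).
rows = [
    (0x2431245F, 0x24544D44),
    (0x003A3030, 0x37415544),
]

# Within each row the right-hand longword sits at the lower address.
longwords = [lw for left, right in rows for lw in (right, left)]

# "Length 15" comes from the dump itself.
name = decode_resource(longwords, 15)
print(name)
```

This yields the DMT$ resource for device _$1$DUA700:, agreeing with the
ASCII column of the dump, and the "Granted at PR" line shows the mode in
which the PerfectCache process holds it.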

    
586.3. "Valid SYSINIT Message" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 14 1997 14:58, 4 lines
   re: .2
   "SYSINIT-I-LOCK, taking out lock on system device" is a valid
   message displayed -- when debugging is turned on -- by SYSINIT.
586.4. "No debug, checking DSSI config..." by CRLRFR::BLUNT () Wed May 14 1997 15:02, 6 lines
    
    Indirectly an answer to both .1 and .2.  No, SYSINIT debugging is not
    turned on.  I will check the DSSI unit, ALLOCLASS and related questions
    Steve posted in .1
    
    bob
586.5. "boot flags?" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 14 1997 15:38, 3 lines
   ...What Alpha console boot command flags were used here?

586.6. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Wed May 14 1997 17:26, 16 lines
    I suspect that PerfectCache is running....
    
    It was very recently discovered that PerfectCache has an "unusual
    usage" of the DMT$device lock.
    
    The newly re-written MOUNT, to fix a number of synch. problems, now
    also takes out the DMT$device lock as part of the mounting process.
    
    This deadlock leads to the SYSINIT message you are seeing.
    
    The easy workaround is to turn off PerfectCache.  We are working to
    get this resolved.
    
    Cheers,
    jeff
    
586.7. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Thu May 15 1997 14:45, 15 lines
    This morning I talked with the engineer at RAXCO...  
    
    V6.0 of PerfectCache does not use the DMT$ lock.  Customers that are
    running PerfectCache and want to upgrade to V7.1 or CLUSIO01_062
    should first upgrade PerfectCache to V6.0.
    
    As a workaround, V5.0 can be disabled.
    
    We are working with Storage engineering to be sure that V6.0 gets
    shipped with the HSxxx boxes, and we are working on STARS/BLITZ
    articles to inform the field.
    
    Cheers,
    jeff
    
586.8. "" by SSDEVO::DESKO (Rick in Storage - DTN 522-3905) Thu May 15 1997 20:50, 8 lines
     > We are working with Storage engineering to be sure that V6.0 gets
     > shipped with the HSxxx boxes, and we are working on STARS/BLITZ
     > articles to inform the field.
    
    Actually, PerfectCache never shipped with HSxxx FDDI Servers.
    It has only shipped with SWXNA (the latest generation of FDDI Servers).

    Rick