Conference orarep::nomahs::rdb_60

Title: Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice: RDB_60 is archived, please use RDB_70.
Moderator: NOVA::SMITHISON
Created: Fri Mar 18 1994
Last Modified: Fri May 30 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5118
Total number of notes: 28246

5009.0. "Freeze lock not released by ALS" by jpalp1.jp.oracle.com::HKATSURA () Fri Feb 07 1997 03:56

I have a customer who is facing the following problem.  Has anyone
seen something like this?

Thanks,
Haruko Katsurai - Oracle Japan Rdb Support

[Environment]

   - Rdb 6.1A Standard 
   - Alpha VMS 6.2

[Problem]

   Database hangs.
   This has occurred twice (1/23 01:00AM and 1/26 03:00AM) but
   can't be reproduced at will.

[Situation]

   - DBR is created
   - Some node's ALS has the Freeze lock
   - ALS does not release the Freeze lock
   - DBR cannot recover
   - DB hangs
   - executed RMONSTOP.COM (has RMU/MONITOR STOP/WAIT/ABORT=FORCEX in it)
   - could not close DB
   - Rebooted VMS, whereupon RMONSTOP.COM was re-executed
   - DB closed and put back on line.

[Available information]

   * OS System Crash dump
   * RDMMON.LOG
   * will try to get Rdb bugcheck if it occurs again.

5009.1. "information from the crash dump" by jpalp1.jp.oracle.com::HKATSURA () Fri Feb 07 1997 03:56 (304 lines)

Following is some information from the crash dump.

Regards,
Haruko Katsurai

---------------------------------------------------------------------------
1.  DBR status
================
    RDM_RB_1 and RDM_RB_2 were the only ones waiting for the freeze lock.
    Other processes were waiting for the DBR.

SDA> sh proc/lock
Process index: 0199   Name: RDM_RB_1   Extended PID: 20800599
-------------------------------------------------------------
Lock data:

Lock id:  04051389   PID:     00010199   Flags:   SYNCSTS SYSTEM  PROTECT
Par. id:  04051388   SUBLCKs:        0
LKB:      85FCB080   BLKAST:  00000000
PRIORTY:      0000   RQSEQNM:     0000

Waiting for     PR   00000000-FFFFFFFF

Resource:      00000000 00000046    F.......  Status:  ASYNC   PROTCT
 Length   04   00000000 00000000    ........
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy

SDA> sh proc/lock
Process index: 01C7   Name: RDM_RB_2   Extended PID: 208005C7
-------------------------------------------------------------
Lock data:

Lock id:  1004F25B   PID:     000101C7   Flags:   SYNCSTS SYSTEM  PROTECT
Par. id:  1004DD84   SUBLCKs:        0
LKB:      853F3180   BLKAST:  00000000
PRIORTY:      0000   RQSEQNM:     05E8

Waiting for     PR   00000000-FFFFFFFF

Resource:      00000000 00000046    F.......  Status:  ASYNC   PROTCT
 Length   04   00000000 00000000    ........
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy

2.  Freeze lock information of RDM_RB_2
=======================================

SDA> sh res/lock=1004F25B
Resource database

Address of RSB:  8396C0C0  GGMODE:       CW  Status: WTFULRG
Parent RSB:      822D5340  CGMODE:       CW
Sub-RSB count:          0  FGMODE:       CW
Lock Count:            55  CSID:   00000000
BLKAST count:           1  RQSEQNM:    083B

Resource:      00000000 00000046   F.......  Valblk: 00000000 00000000
 Length    4   00000000 00000000   ........          00000000 00000000
 Exec. mode    00000000 00000000   ........
 System        00000000 00000000   ........  Seqnum: 00000000

Granted queue (Lock ID / Gr mode / Range):
 09035D91  CW 00000000-FFFFFFFF

Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range):
     *** EMPTY QUEUE ***

Waiting queue (Lock ID / Rq mode / Range):
 04051389  PR 00000000-FFFFFFFF         6005138B  PR 00000000-FFFFFFFF
 710313D1  PR 00000000-FFFFFFFF         70044850  PR 00000000-FFFFFFFF
 1203A046  PR 00000000-FFFFFFFF         2702DEDE  PR 00000000-FFFFFFFF
 2602E6DD  PR 00000000-FFFFFFFF         03006ECD  PR 00000000-FFFFFFFF
 03006ECF  PR 00000000-FFFFFFFF         09038F3C  PR 00000000-FFFFFFFF
 0D037C81  PR 00000000-FFFFFFFF         0D02D550  PR 00000000-FFFFFFFF
 03053074  PR 00000000-FFFFFFFF         0305307C  PR 00000000-FFFFFFFF
 030530AB  PR 00000000-FFFFFFFF         03053065  PR 00000000-FFFFFFFF
 03053054  PR 00000000-FFFFFFFF         030530B0  PR 00000000-FFFFFFFF
 030530B2  PR 00000000-FFFFFFFF         03053059  PR 00000000-FFFFFFFF
 03053049  PR 00000000-FFFFFFFF         0905304E  PR 00000000-FFFFFFFF
 03053091  PR 00000000-FFFFFFFF         0305302F  PR 00000000-FFFFFFFF
 0A053075  PR 00000000-FFFFFFFF         0C05309F  PR 00000000-FFFFFFFF
 030530A2  PR 00000000-FFFFFFFF         0603047C  PR 00000000-FFFFFFFF
 0305306B  PR 00000000-FFFFFFFF         0C0530A5  PR 00000000-FFFFFFFF
 030349E4  PR 00000000-FFFFFFFF         0304465A  PR 00000000-FFFFFFFF
 090530E5  PR 00000000-FFFFFFFF         090530DA  PR 00000000-FFFFFFFF
 0905309A  PR 00000000-FFFFFFFF         0905309D  PR 00000000-FFFFFFFF
 030530DF  PR 00000000-FFFFFFFF         0305302C  PR 00000000-FFFFFFFF
 0A02D1F9  PR 00000000-FFFFFFFF         03053259  PR 00000000-FFFFFFFF
 42043865  PR 00000000-FFFFFFFF         1204B8B7  PR 00000000-FFFFFFFF
 7C04AFD0  PR 00000000-FFFFFFFF         5804B7E8  PR 00000000-FFFFFFFF
 14036307  PR 00000000-FFFFFFFF         0A04B016  PR 00000000-FFFFFFFF
 6C0398B1  PR 00000000-FFFFFFFF         7102CDC5  PR 00000000-FFFFFFFF
 25041BA6  PR 00000000-FFFFFFFF         3002C8E1  PR 00000000-FFFFFFFF
 09030E84  PR 00000000-FFFFFFFF         1004F25B  PR 00000000-FFFFFFFF
 3A0336EB  PR 00000000-FFFFFFFF         3903A00A  CW 00000000-FFFFFFFF
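
For readers less familiar with the lock manager: the granted CW lock and the queued PR requests above are in incompatible modes, so every PR requester has to wait until the CW holder dequeues or demotes its lock. The Python sketch below encodes the standard OpenVMS lock mode compatibility table and applies it to the granted queue shown above. The function and variable names are illustrative only and are not part of SDA or Rdb.

```python
# Standard OpenVMS lock modes: NL, CR, CW, PR, PW, EX.
# COMPATIBLE[granted] is the set of modes that can be granted alongside it.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def is_compatible(requested, granted):
    """True if a request in mode `requested` can coexist with a `granted` lock."""
    return requested in COMPATIBLE[granted]


def blockers(requested, granted_queue):
    """Return the lock IDs on the granted queue that block `requested`."""
    return [lock_id for lock_id, mode in granted_queue
            if not is_compatible(requested, mode)]


# Granted queue copied from the SDA output above.
GRANTED = [("09035D91", "CW")]

print(blockers("PR", GRANTED))  # ['09035D91']
```

This covers only mode compatibility against the granted queue. It does not model queue ordering, which would be the likely reason the single CW request at the end of the waiting queue is also waiting.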

3.  More information on Freeze lock
====================================

SDA> sh lock 09035D91
Lock database
-------------

Lock id:  09035D91   PID:     0001008E   Flags:   SYSTEM
Par. id:  01003554   SUBLCKs:        0
LKB:      86152E80   BLKAST:  002A65A0
PRIORTY:      0000

Granted at      CW   00000000-FFFFFFFF

Resource:      00000000 00000046    F.......  Status:  BLASTQD
 Length   04   00000000 00000000    ........
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy
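
The PID field of the lock identifies its owner. In this dump, the process index printed by SDA matches the low 16 bits of the internal PID for all three processes shown (0199, 01C7, 008E). The sketch below assumes that relationship holds on this system; it is an observation from this dump, not a general rule, and the helper name is hypothetical.

```python
def process_index(internal_pid_hex):
    """Low 16 bits of an internal PID (assumption: matches the SDA process index here)."""
    return int(internal_pid_hex, 16) & 0xFFFF


# (internal PID from the lock display, process index shown by SDA)
OBSERVED = {
    "00010199": 0x0199,  # RDM_RB_1
    "000101C7": 0x01C7,  # RDM_RB_2
    "0001008E": 0x008E,  # owner of the granted CW lock 09035D91
}

for pid, index in OBSERVED.items():
    assert process_index(pid) == index

print(f"{process_index('0001008E'):04X}")  # 008E
```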

4.  Information on the Freeze lock holder (RDM_ALS_2)
=====================================================

SDA> sh proc /ind=0001008E
Process index: 008E   Name: RDM_ALS_2   Extended PID: 2080048E

Process status:        00040001  RES,PHDRES
Required capabilities: 0000000C  QUORUM,RUN

PCB address              81EFA980    JIB address              82381680
PHD address              B9B30000    Swapfile disk address    00000000
Master internal PID      0001008E    Subprocess count                0
Internal PID             0001008E    Creator internal PID     00000000
Extended PID             2080048E    Creator extended PID     00000000
State                       HIB      Termination mailbox          0045
Previous CPU Id          00000002    Current CPU Id           00000002
Previous ASNSEQ  00000000000163BD    Previous ASN     000000000000004A
Current priority               15    # of threads     0000000000000000
Initial process priority       15    Delete pending count         0
Base priority                  15    AST's active                 NONE
UIC                [00001,000004]    AST's remaining              3999
Mutex count                     0    Buffered I/O count/limit     1000/1000
Waiting EF cluster              0    Direct I/O count/limit       1000/1000
Abs time of last event   01B24C13    BUFIO byte count/limit   99999167/99999167
Event flag wait mask     FEFFFFFF    # open files allowed left    1997

5.  Info on RDM_ALS_2
=======================

SDA> sh proc
Process index: 008E   Name: RDM_ALS_2   Extended PID: 2080048E
--------------------------------------------------------------
Process status:        00040001  RES,PHDRES
Required capabilities: 0000000C  QUORUM,RUN

PCB address              81EFA980    JIB address              82381680
PHD address              B9B30000    Swapfile disk address    00000000
Master internal PID      0001008E    Subprocess count                0
Internal PID             0001008E    Creator internal PID     00000000
Extended PID             2080048E    Creator extended PID     00000000
State                       HIB      Termination mailbox          0045
Previous CPU Id          00000002    Current CPU Id           00000002
Previous ASNSEQ  00000000000163BD    Previous ASN     000000000000004A
Current priority               15    # of threads     0000000000000000
Initial process priority       15    Delete pending count         0
Base priority                  15    AST's active                 NONE
UIC                [00001,000004]    AST's remaining              3999
Mutex count                     0    Buffered I/O count/limit     1000/1000
Waiting EF cluster              0    Direct I/O count/limit       1000/1000
Abs time of last event   01B24C13    BUFIO byte count/limit   99999167/99999167
Event flag wait mask     FEFFFFFF    # open files allowed left    1997

6.  Activated images for RDM_ALS_2
===================================
Process index: 008E   Name: RDM_ALS_2   Extended PID: 2080048E
--------------------------------------------------------------


                            Process activated images
                            ------------------------

  IMCB    Start     End    Sym Vect    Type      Image Name  Major ID,Minor ID
-------- -------- -------- -------- ------------ -----------------------------
7FEFC5C8 00010000 001813FF 00000000 MAIN         RDMALS 0,0
7FEFE060 002A6000 003486E8 00336000 GLBL PRT SHR RDMPRV 1,1
7FEFE140 7FE4C000 7FEADFFF 7FE60F80 GLBL     SHR DECC$SHR 1,1
           Base      End   ImageOff     Section Type
         80566000 80684000 00000000   System Resident Code
         7FE4C000 7FE67000 00120000   Shareable Address Data
         7FE6C000 7FE75A00 00140000   Read-Write Data
         7FE7C000 7FE85400 00150000   Shareable Read-Only Data
         7FE8C000 7FE8C200 00160000   Read-Write Data
         7FE9C000 7FEA0600 00170000   Demand Zero Data
         7FEAC000 7FEADC00 00180000   Read-Write Data
7FEFEBA0 7FBBC000 7FD1DFFF 7FC0A370 GLBL     SHR DPML$SHR 1,0
         804A0000 80564600 00090000   System Resident Code
         7FBBC000 7FBF2600 00000000   Shareable Read-Only Data
         7FBFC000 7FC0DE00 00040000   Shareable Address Data
         7FC1C000 7FC1C400 00060000   Read-Write Data
         7FC2C000 7FC48200 00070000   Shareable Read-Only Data
         7FD1C000 7FD1D000 00160000   Read-Write Data
7FEFE760 7FB7C000 7FBADFFF 7FB7C540 GLBL     SHR CMA$TIS_SHR 1,1
           Base      End   ImageOff     Section Type
         8049E000 8049EE00 00020000   System Resident Code
         7FB7C000 7FB7CA00 00000000   Shareable Address Data
         7FB8C000 7FB8C200 00010000   Read-Write Data
         7FBAC000 7FBAC200 00030000   Read-Write Data
7FEFE3E0 00204000 002A47FF 00288930 GLBL     SHR SMGSHR 1,104
7FEFE4C0 00182000 00202DFF 00195950 GLBL     SHR SORTSHR 2,28
7FEFE300 7FADC000 7FB2DFFF 7FAE63C0 GLBL     SHR LIBRTL 1,1
         80400000 8048DE00 00000000   System Resident Code
         7FADC000 7FAE8800 00090000   Shareable Address Data
         7FAEC000 7FAED000 000A0000   Read-Write Data
         7FAFC000 7FB07C00 000B0000   Shareable Read-Only Data
         7FB0C000 7FB0C200 000C0000   Read-Write Data
         7FB1C000 7FB1D000 000D0000   Demand Zero Data
         7FB2C000 7FB2D600 000E0000   Read-Write Data
7FEFE220 7FB3C000 7FB6DFFF 7FB4C000 GLBL     SHR LIBOTS 1,3
           Base      End   ImageOff     Section Type
         8048E000 8049CC00 00020000   System Resident Code
         7FB3C000 7FB3E600 00000000   Shareable Read-Only Data
         7FB4C000 7FB4DC00 00010000   Shareable Address Data
         7FB6C000 7FB6C200 00030000   Read-Write Data
7FEFE5A0 B13909A0 B139D6B0 B13909A0 GLBL         SYS$BASE_IMAGE 18,4611610
7FEFE680 B1382AF8 B13840E8 B1382AF8 GLBL         SYS$PUBLIC_VECTORS 69,8183133

7.  PCB for RDM_ALS_2
=======================

SDA> for 81EFA980
81EFA980   PCB$L_SQFL                      81A8E840
81EFA984   PCB$L_SQBL                      81F76500
81EFA988   PCB$W_SIZE                          0280
81EFA98A   PCB$B_TYPE                        0C
81EFA98B                                   00
81EFA98C   PCB$L_AST_PENDING               00000000
81EFA990   PCB$Q_PHYPCB                    24CBA080
81EFA994                                   00000000
81EFA998   PCB$L_LEFC_0_SWAPPED            00000000
           PCB$Q_LEFC_SWAPPED
81EFA99C   PCB$L_LEFC_1_SWAPPED            00000000
81EFA9A0   PCB$L_ASTQFL_SPK                81EFA9A0     PCB+00020
81EFA9A4   PCB$L_ASTQBL_SPK                81EFA9A0     PCB+00020
81EFA9A8   PCB$L_ASTQFL_K                  81EFA9A8     PCB+00028
81EFA9AC   PCB$L_ASTQBL_K                  81EFA9A8     PCB+00028
81EFA9B0   PCB$L_ASTQFL_E                  81EFA9B0     PCB+00030
81EFA9B4   PCB$L_ASTQBL_E                  81EFA9B0     PCB+00030
81EFA9B8   PCB$L_ASTQFL_S                  81EFA9B8     PCB+00038
81EFA9BC   PCB$L_ASTQBL_S                  81EFA9B8     PCB+00038
81EFA9C0   PCB$L_ASTQFL_U                  81EFA9C0     PCB+00040
81EFA9C4   PCB$L_ASTQBL_U                  81EFA9C0     PCB+00040

  Process SYSTEM_1 logged out at 31-JAN-1997 13:53:31.22

8.  instruction at PC for RDM_ALS_2
===================================

SDA> ex/i @pc
SYS$PUBLIC_VECTORS_NPRO+00304:          BIS             SP,R31,R28

9.  Call Frames for RDM_ALS_2
================================

SDA> sh call
Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: 00000000 00064608
        Handler at 00000000 7FB4D460, Data = 00000000 00000020
        Return address on stack = 00000000 00064500

Registers saved on stack
------------------------
7F9A74A0  00000000 00010160  Saved R2     SYS$K_VERSION_16+00120
7F9A74A8  00000000 00041AEC  Saved R3
7F9A74B0  00000000 00000000  Saved R4
7F9A74B8  00000000 00041AC8  Saved R5
7F9A74C0  00000000 00368400  Saved R6
7F9A74C8  00000000 00000000  Saved R7
7F9A74D0  00000000 00000000  Saved R8
7F9A74D8  FFFFFFFF F38312C8  Saved R9     EXE$CATCH_ALL
7F9A74E0  00000000 7F9A7500  Saved R29

SDA> sh call/n
Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: 00000000 00063B00
        Handler at 00000000 7FB4D460, Data = 00000000 00000078
        Return address on stack = FFFFFFFF F3821C44     EXE$PROC_IMGACT_C+00384

Registers saved on stack
------------------------
7F9A7B48  00000000 7FFBF880  Saved R2     MMG$IMGHDRBUF+00080
7F9A7B50  00000000 7FFBF934  Saved R3     MMG$IMGHDRBUF+00134
7F9A7B58  FFFFFFFF 81EFA980  Saved R4     PCB
7F9A7B60  00000000 7FF40000  Saved R5
7F9A7B68  FFFFFFFF F21EE620  Saved R6
7F9A7B70  00000000 00001000  Saved R7     PRV$M_PSWAPM
7F9A7B78  00000000 7F9A7BA0  Saved R29

5009.2. by HOTRDB::PMEAD (Paul, [email protected], 719-577-8032) Fri Feb 07 1997 10:03 (8 lines)

    Haruko,
    
    You are very close to getting the right info.  We need to determine why
    the ALS is not responding.  I can't see any reason from the info
    supplied.  Could you do more SHOW CALL/NEXT commands until it returns
    an error?
    
    	Paul

5009.3. "Call Frame for ALS" by jpalp1.jp.oracle.com::HKATSURA () Tue Feb 11 1997 21:18 (71 lines)

Hi,

Paul, thank you for your response.  SHOW CALL/NEXT returns one more call
frame, and then an error beyond that.  The call frames for the other
hibernating ALSs seem to be exactly the same.

Thanks,
Haruko


9.  Call Frames for RDM_ALS_2 (modified)
========================================

SDA> sh call
Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: 00000000 00064608
        Handler at 00000000 7FB4D460, Data = 00000000 00000020
        Return address on stack = 00000000 00064500

Registers saved on stack
------------------------
7F9A74A0  00000000 00010160  Saved R2     SYS$K_VERSION_16+00120
7F9A74A8  00000000 00041AEC  Saved R3
7F9A74B0  00000000 00000000  Saved R4
7F9A74B8  00000000 00041AC8  Saved R5
7F9A74C0  00000000 00368400  Saved R6
7F9A74C8  00000000 00000000  Saved R7
7F9A74D0  00000000 00000000  Saved R8
7F9A74D8  FFFFFFFF F38312C8  Saved R9     EXE$CATCH_ALL
7F9A74E0  00000000 7F9A7500  Saved R29

SDA> sh call/n
Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: 00000000 00063B00
        Handler at 00000000 7FB4D460, Data = 00000000 00000078
        Return address on stack = FFFFFFFF F3821C44     EXE$PROC_IMGACT_C+00384

Registers saved on stack
------------------------
7F9A7B48  00000000 7FFBF880  Saved R2     MMG$IMGHDRBUF+00080
7F9A7B50  00000000 7FFBF934  Saved R3     MMG$IMGHDRBUF+00134
7F9A7B58  FFFFFFFF 81EFA980  Saved R4     PCB
7F9A7B60  00000000 7FF40000  Saved R5
7F9A7B68  FFFFFFFF F21EE620  Saved R6
7F9A7B70  00000000 00001000  Saved R7     PRV$M_PSWAPM
7F9A7B78  00000000 7F9A7BA0  Saved R29

SDA> sh call/n
Call Frame Information
----------------------
        Stack Frame Procedure Descriptor
Flags:  Base Register = FP, No Jacket, Native
        Procedure Entry: FFFFFFFF F3821B20              EXE$PROC_IMGACT_C+00260
        Handler at FFFFFFFF F38311A0
        Return address on stack = FFFFFFFF F3821B1C     EXE$PROC_IMGACT_C+0025C

Registers saved on stack
------------------------
7F9A7BC0  00000000 00000028  Saved R2
7F9A7BC8  00000000 7FF91F40  Saved R3
7F9A7BD0  FFFFFFFF F3831180  Saved R13    EXE$PRCDELMSG+00020
7F9A7BD8  00000000 00000000  Saved R29

SDA> show call/n
%SDA-E-NOTINPHYS, 00000000 : virtual data not in physical memory

5009.4. by HOTRDB::PMEAD (Paul, [email protected], 719-577-8032) Wed Feb 12 1997 12:19 (22 lines)

    Well Haruko, I'm stumped.  I don't know VMS internals that well, but
    from what I can see VMS appears to have lost the blocking AST for the
    process.  That is, the PCB shows nothing on the kernel mode blast queue
    (the FREEZE lock is a kernel mode lock), yet the lock itself shows a
    BLASTQD status.
    
    Perhaps you can find a VMS person to verify this, but it is my
    suspicion that the BLASTQD status should not be set if an AST has been
    delivered.
    
    Btw, we have seen a number of situations over the years where a $DEQ
    appears to fail for some strange reason and even though the process
    thinks it has released the lock it really has not.  In every case
    disabling VMS dynamic lock remastering has caused the problem to
    disappear.  If the customer continues to see the problem you might have
    the customer set SYSGEN parameter PE1 to a small value (like 1) to see
    if that prevents the problem from occurring.  If it does prevent the
    problem then they might want to pursue the problem with Digital.
    
    	fwiw,
    
    	Paul
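
The empty-queue observation can be checked mechanically against the PCB dump in 5009.1. This is a minimal sketch, assuming the usual absolute-queue convention in which the forward and backward links of an empty listhead both hold the listhead's own address. The values are copied from that dump; the names are illustrative only.

```python
# Per access mode: (listhead address, forward link, backward link),
# copied from the PCB$L_ASTQFL_* / PCB$L_ASTQBL_* fields in 5009.1.
AST_QUEUES = {
    "SPK": (0x81EFA9A0, 0x81EFA9A0, 0x81EFA9A0),
    "K":   (0x81EFA9A8, 0x81EFA9A8, 0x81EFA9A8),
    "E":   (0x81EFA9B0, 0x81EFA9B0, 0x81EFA9B0),
    "S":   (0x81EFA9B8, 0x81EFA9B8, 0x81EFA9B8),
    "U":   (0x81EFA9C0, 0x81EFA9C0, 0x81EFA9C0),
}


def is_empty(listhead, flink, blink):
    """An absolute queue is empty when both links point back at the listhead."""
    return flink == listhead and blink == listhead


def non_empty_queues(queues):
    """Return the access modes that have at least one AST queued."""
    return [mode for mode, links in queues.items() if not is_empty(*links)]


print(non_empty_queues(AST_QUEUES))  # []
```

An empty result means no AST is queued to the process in any mode, including kernel mode, which is the mismatch with the BLASTQD status on the lock that is described above.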

5009.5. by NOVA::MCGEE (Oracle Rdb Mission Critical Engineering) Wed Feb 12 1997 12:37 (12 lines)

    >Perhaps you can find a VMS person to verify this, but it is my
    >suspicion that the BLASTQD status should not be set if an AST has been
    >delivered.
    
    I think the BLASTQD is set when a BLAST is queued.  When it is set, it
    tells VMS not to re-blast the lock.  The flag is not cleared until the
    lock is DEQueued or demoted.
    
    Anyway, disabling dynamic lock remastering is a good test since it only
    seems to cause problems anyway.
    
    Steve.

5009.6. by jpalp1.jp.oracle.com::HKATSURA () Thu Feb 13 1997 04:35 (4 lines)

I will suggest this to the customer.
Thank you both for your input.

Haruko