[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference orarep::nomahs::dbms

Title:VAX DBMS
Notice:THIS NOTESFILE IS NOT A FORMAL SUPPORT CHANNEL
Moderator:SCARY::CHARLAND
Created:Thu Feb 20 1986
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2642
Total number of notes:11044

2594.0. "Undetected deadlock ?" by NOMAHS::SECRIST (Rdb WWS; [email protected]) Wed Feb 12 1997 17:27

[6.1A ECO 1, VMS 6.2]

Customer has two DBMS processes and the DBMS_MONITOR get into a
deadlock situation which is not resolved and the whole database
winds up hung on a locking logjam with these processes as the
blocker(s).  Out of desparation they crashed the system and a
romp through SDA follows.  I suggested DBO/BUGCHECK/PID as an
unsupported way out of it rather than taking out the whole
production machine, but I still need to track down what is
wrong.  DEADLOCK_WAIT is set to 3.  Any ideas ?

Regards,
rcs

>The following is the output from SDA showing the DBMS_MONITOR process
>waiting for a lock owned by process index 02C8.  

SDA> show proc dbms_monitor
                                               
Process index: 00A2   Name: DBMS_MONITOR   Extended PID: 35E004A2
-----------------------------------------------------------------
Status : 00040011 res,pswapm,phdres
Status2: 00000001 quantum_resched
PCB address              B53E29C0    JIB address              B46C2B00
PHD address              C33C7C00    Swapfile disk address    00000000
Master internal PID      000100A2    Subprocess count                0
Internal PID             000100A2    Creator internal PID     00000000
Extended PID             35E004A2    Creator extended PID     00000000
State                       LEF      Termination mailbox          00AA
Current priority               15    AST's enabled                KESU
Base priority                  15    AST's active                 NONE
UIC                [00001,000004]    AST's remaining               509
Mutex count                     0    Buffered I/O count/limit     1022/1024
Waiting EF cluster              0    Direct I/O count/limit        256/256
Starting wait time       10001010    BUFIO byte count/limit     496288/496288
Event flag wait mask     F7FFFFFF    # open files allowed left    1014
Local EF cluster 0       E0000001    Timer entries allowed left    127
Local EF cluster 1       00000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count         552
Global cluster 3 pointer 00000000    Global WS page count           23

SDA> show proc/lock

Process index: 00A2   Name: DBMS_MONITOR   Extended PID: 35E004A2
-----------------------------------------------------------------
Lock data:

Lock id:  5700E9B7   PID:     000100A2   Flags:   VALBLK  SYNCSTS SYSTEM
Par. id:  00000000   SUBLCKs:        0            PROTECT NODLCKW
LKB:      B3179200   BLKAST:  00000000
PRIORTY:      0000   RQSEQNM:     0000

Waiting for     PW   00000000-FFFFFFFF

Resource:      00000054 00000028    (...T...  Status:  ASYNC   PROTCT
 Length   16   00000001 35E07AC8    Hz`5....
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy


SDA> show res/lock=5700E9B7

Resource database
-----------------
Address of RSB:  B44D6080  GGMODE:       PW  Status: DIRENTR VALID   WTFULRG
Parent RSB:      00000000  CGMODE:       PW
Sub-RSB count:          0  FGMODE:       PW
Lock Count:             2  CSID:   00000000
BLKAST count:           0  RQSEQNM:    0001

Resource:      00000054 00000028   (...T...  Valblk: 00000000 00000000
 Length   16   00000001 35E07AC8   Hz`5....          00000000 00000000
 Exec. mode    00000000 00000000   ........
 System        00000000 00000000   ........  Seqnum: 00000000

Granted queue (Lock ID / Gr mode / Range):
 3200316D  PW 00000000-FFFFFFFF   <- ***********

Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range):
     *** EMPTY QUEUE ***

Waiting queue (Lock ID / Rq mode / Range):
 5700E9B7  PW 00000000-FFFFFFFF


SDA> show lock 3200316D

Lock database
-------------

Lock id:  3200316D   PID:     001E02C8   Flags:   NOQUEUE SYNCSTS SYSTEM
                              ********
Par. id:  00000000   SUBLCKs:        0
LKB:      B5952000   BLKAST:  00000000
PRIORTY:      0000

Granted at      PW   00000000-FFFFFFFF

Resource:      00000054 00000028    (...T...  Status:
 Length   16   00000001 35E07AC8    Hz`5....
 Exec. mode    00000000 0
0000000    ........
 System        00000000 00000000    ........

Local copy


>Next, process 02C8 (VA9401:BFAB) is waiting on a lock owned by process index
>038A.
                                      
SDA> show proc/in=2c8

Process index: 02C8   Name: VA9401:BFAB   Extended PID: 35E07AC8
----------------------------------------------------------------
Status : 02040001 res,phdres,inter
Status2: 00000001 quantum_resched
PCB address              B586E980    JIB address              B2F0DF40
PHD address              E6F76400    Swapfile disk address    00000000
Master internal PID      001E02C8    Subprocess count                0
Internal PID             001E02C8    Creator internal PID     00000000
Extended PID             35E07AC8    Creator extended PID     00000000
State                       LEF      Termination mailbox          0000
Current priority                5    AST's enabled                KESU
Base priority                   5    AST's active                 NONE
UIC                [00030,000036]    AST's remaining               509
Mutex count                     0    Buffered I/O count/limit      512/512
Waiting EF cluster              0    Direct I/O count/limit        128/128
Starting wait time       1A001A19    BUFIO byte count/limit      58976/58976
Event flag wait mask     FDFFFFFF    # open files allowed left     274
Local EF cluster 0       E0000001    Timer entries allowed left      9
Local EF cluster 1       80000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count        1967
Global cluster 3 pointer 00000000    Global WS page count          256


SDA> show proc/lock

Process index: 02C8   Name: VA9401:BFAB   Extended PID: 35E07AC8
----------------------------------------------------------------
Lock data:

Lock id:  61E2001B   PID:     001E02C8   Flags:   SYSTEM
Par. id:  5800E768   SUBLCKs:        0
LKB:      B3B71D80   BLKAST:  023CD2B8
PRIORTY:      0000   RQSEQNM:     2D08

Waiting for     CW   00000000-FFFFFFFF

Resource:      00000000 00000046    F.......  Status:  ASYNC   TIMOUTQ
 Length   04   00000000 00000000    ........
- Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy


SDA> show res/lock=61E2001B

Resource database
-----------------
Address of RSB:  B4954340  GGMODE:       CW  Status: VALID   WTFULRG
Parent RSB:      B44753C0  CGMODE:       CW
Sub-RSB count:          0  FGMODE:       CW
Lock Count:           163  CSID:   00000000
BLKAST count:           1  RQSEQNM:    2D7C

Resource:      00000000 00000046   F.......  Valblk: 00000000 00000000
 Length    4   00000000 00000000   ........          00000000 00000000
 Exec. mode    00000000 00000000   ........
 System        00000000 00000000   ........  Seqnum: 00000000

Granted queue (Lock ID / Gr mode / Range):
 0200B0D0  CW 00000000-FFFFFFFF    <- **********


Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range):
     *** EMPTY QUEUE ***

Waiting queue (Lock ID / Rq mode / Range):
 4F00FFD4  PR 00000000-FFFFFFFF         36F000A4  PR 00000000-FFFFFFFF
                                                

SDA> show lock 0200B0D0

Lock database
-------------

Lock id:  0200B0D0   PID:     001F038A   Flags:   SYSTEM
                              ********
Par. id:  66020125   SUBLCKs:        0
LKB:      B4397180   BLKAST:  0028E6B8
PRIORTY:      0000

Granted at      CW   00000000-FFFFFFFF

Resource:      00000000 00000046    F.......  Status:  BLASTQD
 Length   04   00000000 00000000    ........
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy


>Finally, process 038A (BFAB_35E0870E) is waiting on a lock owned by the
>DBMS_MONITOR (process index 00A2).
                                 
     
SDA> show proc/in=38a

Process index: 038A   Name: BFAB_35E0870E   Extended PID: 35E07F8A
------------------------------------------------------------------
Status : 00040023 res,delpen,respen,phdres
Status2: 00000001 quantum_resched
PCB address              B5221B40    JIB address              B45E5880
PHD address              E33A2800    Swapfile disk address    00000000
Master internal PID      0021030E    Subprocess count                1
Internal PID             001F038A    Creator internal PID     0021030E
Extended PID             35E07F8A    Creator extended PID     35E0870E
State                       LEF      Termination mailbox          1650
Current priority               10    AST's enabled                KESU
Base priority                   5    AST's active                 NONE
UIC                [00030,000036]    AST's remaining               510
Mutex count                     0    Buffered I/O count/limit      512/512
Waiting EF cluster              0    Direct I/O count/limit        128/128
Starting wait time       1A001A1A    BUFIO byte count/limit      39904/40032
Event flag wait mask     FEFFFFFF    # open files allowed left     181
Local EF cluster 0       E2000001    Timer entries allowed left     10
Local EF cluster 1       98000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count         865
Global cluster 3 pointer 00000000    Global WS page count          439


SDA> show proc/lock

Process index: 038A   Name: BFAB_35E0870E   Extended PID: 35E07F8A
------------------------------------------------------------------
Lock data:

Lock id:  7900ABB9   PID:     001F038A   Flags:   CONVERT SYNCSTS NODLCKW
Par. id:  00000000   SUBLCKs:        0
LKB:      B30F7280   BLKAST:  00000000
PRIORTY:      0000   RQSEQNM:     0001

Converting to   EX   00000000-FFFFFFFF
Granted at      NL   00000000-FFFFFFFF

Resource:      00000054 00000028    (...T...  Status:  ASYNC
 Length   16   00000001 35E07F8A    ..`5....
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy


SDA> show res/lock=7900ABB9

Resource database
-----------------
Address of RSB:  B5066B40  GGMODE:       PW  Status: VALID   CVTFULR
Parent RSB:      00000000  CGMODE:       PW
Sub-RSB count:          0  FGMODE:       PW
Lock Count:             2  CSID:   00000000
BLKAST count:           0  RQSEQNM:    0002

Resource:      00000054 00000028   (...T...  Valblk: 00000000 00000000
 Length   16   00000001 35E07F8A   ..`5....          00000000 00000000
 Exec. mode    00000000 00000000   ........
 System        00000000 00000000   ........  Seqnum: 00000000

Granted queue (Lock ID / Gr mode / Range):
 668A013F  PW 00000000-FFFFFFFF    <- ************

Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range):
 7900ABB9  NL 00000000-FFFFFFFF / EX 00000000-FFFFFFFF

Waiting queue (Lock ID / Rq mode / Range):
     *** EMPTY QUEUE ***


SDA> show lock 668A013F

Lock database
-------------

Lock id:  668A013F   PID:     000100A2   Flags:   VALBLK  SYNCSTS SYSTEM
                              ********
Par. id:  00000000   SUBLCKs:        0            PROTECT NODLCKW
LKB:      B3AA4F00   BLKAST:  00000000
PRIORTY:      0000

Granted at      PW   00000000-FFFFFFFF

Resource:      00000054 00000028    (...T...  Status:  ASYNC   PROTCT
 Length   16   00000001 35E07F8A    ..`5....
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy

    
T.RTitleUserPersonal
Name
DateLines
2594.1HOTRDB::PMEADPaul, [email protected], 719-577-8032Wed Feb 12 1997 19:1829
    Richard,
    
    We don't recognize some of those locks as being Rdb locks.  Could you
    have the customer ask VMS support if they recognize the lock shown
    below?  I think this lock is killing deadlock detection since it has
    the NODLCKW flag lit.
    
    	Paul
    
SDA> show lock 668A013F

Lock database
-------------

Lock id:  668A013F   PID:     000100A2   Flags:   VALBLK  SYNCSTS SYSTEM
                              ********
Par. id:  00000000   SUBLCKs:        0            PROTECT NODLCKW
LKB:      B3AA4F00   BLKAST:  00000000
PRIORTY:      0000

Granted at      PW   00000000-FFFFFFFF

Resource:      00000054 00000028    (...T...  Status:  ASYNC   PROTCT
 Length   16   00000001 35E07F8A    ..`5....
 Exec. mode    00000000 00000000    ........
 System        00000000 00000000    ........

Local copy
    
2594.2ORAREP::HERON::GODFRINDOracle Rdb EngineeringThu Feb 13 1997 06:5028
The T locks are the termination locks. 

>Resource:      00000054 00000028    (...T...  Status:  ASYNC   PROTCT
> Length   16   00000001 35E07AC8    Hz`5....
> Exec. mode    00000000 00000000    ........
> System        00000000 00000000    ........

Each process connected to the database holds one such lock (the resource name
includes the pid of that process). The monitor has a permanent request for all
of those locks queued (it will notice a process failure when being granted that
lock). Those locks are taken with the NODLCKWT option, and are therefore not
seen as causing deadlocks. 

The actual problem seems to be caused by process 35e07f8a. That process is
holding on to the F lock (the freeze lock). It is also waiting to be deleted:

>Process index: 038A   Name: BFAB_35E0870E   Extended PID: 35E07F8A
>------------------------------------------------------------------
>Status : 00040023 res,delpen,respen,phdres
>Status2: 00000001 quantum_resched

The freeze lock is used during database recovery. It should be given up by all
processes so that recovery can proceed. If one process does not release it,
then the database will seem to hang.

Have you got a way to find out what process 35E07F8A was doing ?

/albert
2594.3HOTRDB::PMEADPaul, [email protected], 719-577-8032Thu Feb 13 1997 10:2712
    Hmmm.  Indeed Albert is right as always.  It appears this bit of trivia
    leaked from my brain.
    
    OK.  Sorry for leading you astray.  Indeed, someone needs to determine
    why the process doesn't give up the freeze lock.  There have been tons
    of notes on this in the Rdb notesfiles.  In practically every case
    turning off VMS dynamic lock remastering has prevented the problem from
    occurring.
    
    	fwiw,
    
    	Paul
2594.4PE1 *was* 0, so...NOMAHS::SECRISTRdb WWS; [email protected]Thu Feb 13 1997 13:5346
    
    The process was running a homegrown application that performs
    a minor amount of DBMS reads in addition to other things.  This is
    the second time this problem has occurred.  Before, the suspect
    process was running a different executable (though accessing the same
    DB).
    
    PE1 across the cluster was set to 0.  The dumpfile from the crash
    shows this:
    
    SDA> exam SGN$GL_PE1
    SGN$GL_PE1:  00000000   "...."
    
    Again, I have asked the customer to try PE1 of 1 to see if the
    problem goes away.
    
    Here's the process summary from the dumpfile:
    
    Process index: 038A   Name: BFAB_35E0870E   Extended PID: 35E07F8A
    ------------------------------------------------------------------
    Status : 00040023 res,delpen,respen,phdres
    Status2: 00000001 quantum_resched
    PCB address              B5221B40    JIB address              B45E5880
    PHD address              E33A2800    Swapfile disk address    00000000
    Master internal PID      0021030E    Subprocess count                1
    Internal PID             001F038A    Creator internal PID     0021030E
    Extended PID             35E07F8A    Creator extended PID     35E0870E
    State                       LEF      Termination mailbox          1650
    Current priority               10    AST's enabled                KESU
    Base priority                   5    AST's active                 NONE
    UIC                [00030,000036]    AST's remaining               510
    Mutex count                     0    Buffered I/O count/limit  512/512
    Waiting EF cluster              0    Direct I/O count/limit    128/128
    Starting wait time       1A001A1A    BUFIO byte count/limit 39904/40032
    Event flag wait mask     FEFFFFFF    # open files allowed left     181
    Local EF cluster 0       E2000001    Timer entries allowed left     10
    Local EF cluster 1       98000000    Active page table count         0
    Global cluster 2 pointer 00000000    Process WS page count         865
    Global cluster 3 pointer 00000000    Global WS page count          439
    
    Do you see anything of interest ?  
    
    Regards,
    rcs
    
    
2594.5PE1 of 100 ?NOMAHS::SECRISTRdb WWS; [email protected]Fri Feb 14 1997 14:2612
    
    The customer finally set PE1 to 100.  I recall a Digital recommendation 
    some moons ago that if you felt compelled to set PE1 to something, to 
    set it small so at least small trees should migrate.  Since it only
    takes 1 tree to potentially hose a database if that is the root of a
    locking logjam, my inclination is to set it to -1 and shut it off
    entirely or 1 which effectively does the same thing.  Does anyone
    think 100 would improve matters at all since a lock tree could be
    fairly small (but deadly) ?
    
    Regards,
    rcs
2594.6DUCATI::LASTOVICAIs it possible to be totally partial?Fri Feb 14 1997 15:161
set it to 1.