Title: | VAX DBMS |
Notice: | THIS NOTESFILE IS NOT A FORMAL SUPPORT CHANNEL |
Moderator: | SCARY::CHARLAND |
Created: | Thu Feb 20 1986 |
Last Modified: | Tue Jun 03 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2642 |
Total number of notes: | 11044 |
[6.1A ECO 1, VMS 6.2] Customer has two DBMS processes and the DBMS_MONITOR get into a deadlock situation which is not resolved and the whole database winds up hung on a locking logjam with these processes as the blocker(s). Out of desparation they crashed the system and a romp through SDA follows. I suggested DBO/BUGCHECK/PID as an unsupported way out of it rather than taking out the whole production machine, but I still need to track down what is wrong. DEADLOCK_WAIT is set to 3. Any ideas ? Regards, rcs >The following is the output from SDA showing the DBMS_MONITOR process >waiting for a lock owned by process index 02C8. SDA> show proc dbms_monitor Process index: 00A2 Name: DBMS_MONITOR Extended PID: 35E004A2 ----------------------------------------------------------------- Status : 00040011 res,pswapm,phdres Status2: 00000001 quantum_resched PCB address B53E29C0 JIB address B46C2B00 PHD address C33C7C00 Swapfile disk address 00000000 Master internal PID 000100A2 Subprocess count 0 Internal PID 000100A2 Creator internal PID 00000000 Extended PID 35E004A2 Creator extended PID 00000000 State LEF Termination mailbox 00AA Current priority 15 AST's enabled KESU Base priority 15 AST's active NONE UIC [00001,000004] AST's remaining 509 Mutex count 0 Buffered I/O count/limit 1022/1024 Waiting EF cluster 0 Direct I/O count/limit 256/256 Starting wait time 10001010 BUFIO byte count/limit 496288/496288 Event flag wait mask F7FFFFFF # open files allowed left 1014 Local EF cluster 0 E0000001 Timer entries allowed left 127 Local EF cluster 1 00000000 Active page table count 0 Global cluster 2 pointer 00000000 Process WS page count 552 Global cluster 3 pointer 00000000 Global WS page count 23 SDA> show proc/lock Process index: 00A2 Name: DBMS_MONITOR Extended PID: 35E004A2 ----------------------------------------------------------------- Lock data: Lock id: 5700E9B7 PID: 000100A2 Flags: VALBLK SYNCSTS SYSTEM Par. id: 00000000 SUBLCKs: 0 PROTECT NODLCKW LKB: B3179200 BLKAST: 00000000 PRIORTY: 0000 RQSEQNM: 0000 Waiting for PW 00000000-FFFFFFFF Resource: 00000054 00000028 (...T... Status: ASYNC PROTCT Length 16 00000001 35E07AC8 Hz`5.... Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy SDA> show res/lock=5700E9B7 Resource database ----------------- Address of RSB: B44D6080 GGMODE: PW Status: DIRENTR VALID WTFULRG Parent RSB: 00000000 CGMODE: PW Sub-RSB count: 0 FGMODE: PW Lock Count: 2 CSID: 00000000 BLKAST count: 0 RQSEQNM: 0001 Resource: 00000054 00000028 (...T... Valblk: 00000000 00000000 Length 16 00000001 35E07AC8 Hz`5.... 00000000 00000000 Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Seqnum: 00000000 Granted queue (Lock ID / Gr mode / Range): 3200316D PW 00000000-FFFFFFFF <- *********** Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range): *** EMPTY QUEUE *** Waiting queue (Lock ID / Rq mode / Range): 5700E9B7 PW 00000000-FFFFFFFF SDA> show lock 3200316D Lock database ------------- Lock id: 3200316D PID: 001E02C8 Flags: NOQUEUE SYNCSTS SYSTEM ******** Par. id: 00000000 SUBLCKs: 0 LKB: B5952000 BLKAST: 00000000 PRIORTY: 0000 Granted at PW 00000000-FFFFFFFF Resource: 00000054 00000028 (...T... Status: Length 16 00000001 35E07AC8 Hz`5.... Exec. mode 00000000 0 0000000 ........ System 00000000 00000000 ........ Local copy >Next, process 02C8 (VA9401:BFAB) is waiting on a lock owned by process index >038A. SDA> show proc/in=2c8 Process index: 02C8 Name: VA9401:BFAB Extended PID: 35E07AC8 ---------------------------------------------------------------- Status : 02040001 res,phdres,inter Status2: 00000001 quantum_resched PCB address B586E980 JIB address B2F0DF40 PHD address E6F76400 Swapfile disk address 00000000 Master internal PID 001E02C8 Subprocess count 0 Internal PID 001E02C8 Creator internal PID 00000000 Extended PID 35E07AC8 Creator extended PID 00000000 State LEF Termination mailbox 0000 Current priority 5 AST's enabled KESU Base priority 5 AST's active NONE UIC [00030,000036] AST's remaining 509 Mutex count 0 Buffered I/O count/limit 512/512 Waiting EF cluster 0 Direct I/O count/limit 128/128 Starting wait time 1A001A19 BUFIO byte count/limit 58976/58976 Event flag wait mask FDFFFFFF # open files allowed left 274 Local EF cluster 0 E0000001 Timer entries allowed left 9 Local EF cluster 1 80000000 Active page table count 0 Global cluster 2 pointer 00000000 Process WS page count 1967 Global cluster 3 pointer 00000000 Global WS page count 256 SDA> show proc/lock Process index: 02C8 Name: VA9401:BFAB Extended PID: 35E07AC8 ---------------------------------------------------------------- Lock data: Lock id: 61E2001B PID: 001E02C8 Flags: SYSTEM Par. id: 5800E768 SUBLCKs: 0 LKB: B3B71D80 BLKAST: 023CD2B8 PRIORTY: 0000 RQSEQNM: 2D08 Waiting for CW 00000000-FFFFFFFF Resource: 00000000 00000046 F....... Status: ASYNC TIMOUTQ Length 04 00000000 00000000 ........ - Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy SDA> show res/lock=61E2001B Resource database ----------------- Address of RSB: B4954340 GGMODE: CW Status: VALID WTFULRG Parent RSB: B44753C0 CGMODE: CW Sub-RSB count: 0 FGMODE: CW Lock Count: 163 CSID: 00000000 BLKAST count: 1 RQSEQNM: 2D7C Resource: 00000000 00000046 F....... Valblk: 00000000 00000000 Length 4 00000000 00000000 ........ 00000000 00000000 Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Seqnum: 00000000 Granted queue (Lock ID / Gr mode / Range): 0200B0D0 CW 00000000-FFFFFFFF <- ********** Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range): *** EMPTY QUEUE *** Waiting queue (Lock ID / Rq mode / Range): 4F00FFD4 PR 00000000-FFFFFFFF 36F000A4 PR 00000000-FFFFFFFF SDA> show lock 0200B0D0 Lock database ------------- Lock id: 0200B0D0 PID: 001F038A Flags: SYSTEM ******** Par. id: 66020125 SUBLCKs: 0 LKB: B4397180 BLKAST: 0028E6B8 PRIORTY: 0000 Granted at CW 00000000-FFFFFFFF Resource: 00000000 00000046 F....... Status: BLASTQD Length 04 00000000 00000000 ........ Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy >Finally, process 038A (BFAB_35E0870E) is waiting on a lock owned by the >DBMS_MONITOR (process index 00A2). SDA> show proc/in=38a Process index: 038A Name: BFAB_35E0870E Extended PID: 35E07F8A ------------------------------------------------------------------ Status : 00040023 res,delpen,respen,phdres Status2: 00000001 quantum_resched PCB address B5221B40 JIB address B45E5880 PHD address E33A2800 Swapfile disk address 00000000 Master internal PID 0021030E Subprocess count 1 Internal PID 001F038A Creator internal PID 0021030E Extended PID 35E07F8A Creator extended PID 35E0870E State LEF Termination mailbox 1650 Current priority 10 AST's enabled KESU Base priority 5 AST's active NONE UIC [00030,000036] AST's remaining 510 Mutex count 0 Buffered I/O count/limit 512/512 Waiting EF cluster 0 Direct I/O count/limit 128/128 Starting wait time 1A001A1A BUFIO byte count/limit 39904/40032 Event flag wait mask FEFFFFFF # open files allowed left 181 Local EF cluster 0 E2000001 Timer entries allowed left 10 Local EF cluster 1 98000000 Active page table count 0 Global cluster 2 pointer 00000000 Process WS page count 865 Global cluster 3 pointer 00000000 Global WS page count 439 SDA> show proc/lock Process index: 038A Name: BFAB_35E0870E Extended PID: 35E07F8A ------------------------------------------------------------------ Lock data: Lock id: 7900ABB9 PID: 001F038A Flags: CONVERT SYNCSTS NODLCKW Par. id: 00000000 SUBLCKs: 0 LKB: B30F7280 BLKAST: 00000000 PRIORTY: 0000 RQSEQNM: 0001 Converting to EX 00000000-FFFFFFFF Granted at NL 00000000-FFFFFFFF Resource: 00000054 00000028 (...T... Status: ASYNC Length 16 00000001 35E07F8A ..`5.... Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy SDA> show res/lock=7900ABB9 Resource database ----------------- Address of RSB: B5066B40 GGMODE: PW Status: VALID CVTFULR Parent RSB: 00000000 CGMODE: PW Sub-RSB count: 0 FGMODE: PW Lock Count: 2 CSID: 00000000 BLKAST count: 0 RQSEQNM: 0002 Resource: 00000054 00000028 (...T... Valblk: 00000000 00000000 Length 16 00000001 35E07F8A ..`5.... 00000000 00000000 Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Seqnum: 00000000 Granted queue (Lock ID / Gr mode / Range): 668A013F PW 00000000-FFFFFFFF <- ************ Conversion queue (Lock ID / Gr mode / Range -> Rq mode / Range): 7900ABB9 NL 00000000-FFFFFFFF / EX 00000000-FFFFFFFF Waiting queue (Lock ID / Rq mode / Range): *** EMPTY QUEUE *** SDA> show lock 668A013F Lock database ------------- Lock id: 668A013F PID: 000100A2 Flags: VALBLK SYNCSTS SYSTEM ******** Par. id: 00000000 SUBLCKs: 0 PROTECT NODLCKW LKB: B3AA4F00 BLKAST: 00000000 PRIORTY: 0000 Granted at PW 00000000-FFFFFFFF Resource: 00000054 00000028 (...T... Status: ASYNC PROTCT Length 16 00000001 35E07F8A ..`5.... Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
2594.1 | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Wed Feb 12 1997 19:18 | 29 | |
Richard, We don't recognize some of those locks as being Rdb locks. Could you have the customer ask VMS support if they recognize the lock shown below? I think this lock is killing deadlock detection since it has the NODLCKW flag lit. Paul SDA> show lock 668A013F Lock database ------------- Lock id: 668A013F PID: 000100A2 Flags: VALBLK SYNCSTS SYSTEM ******** Par. id: 00000000 SUBLCKs: 0 PROTECT NODLCKW LKB: B3AA4F00 BLKAST: 00000000 PRIORTY: 0000 Granted at PW 00000000-FFFFFFFF Resource: 00000054 00000028 (...T... Status: ASYNC PROTCT Length 16 00000001 35E07F8A ..`5.... Exec. mode 00000000 00000000 ........ System 00000000 00000000 ........ Local copy | |||||
2594.2 | ORAREP::HERON::GODFRIND | Oracle Rdb Engineering | Thu Feb 13 1997 06:50 | 28 | |
The T locks are the termination locks. >Resource: 00000054 00000028 (...T... Status: ASYNC PROTCT > Length 16 00000001 35E07AC8 Hz`5.... > Exec. mode 00000000 00000000 ........ > System 00000000 00000000 ........ Each process connected to the database holds one such lock (the resource name includes the pid of that process). The monitor has a permanent request for all of those locks queued (it will notice a process failure when being granted that lock). Those locks are taken with the NODLCKWT option, and are therefore not seen as causing deadlocks. The actual problem seems to be caused by process 35e07f8a. That process is holding on to the F lock (the freeze lock). It is also waiting to be deleted: >Process index: 038A Name: BFAB_35E0870E Extended PID: 35E07F8A >------------------------------------------------------------------ >Status : 00040023 res,delpen,respen,phdres >Status2: 00000001 quantum_resched The freeze lock is used during database recovery. It should be given up by all processes so that recovery can proceed. If one process does not release it, then the database will seem to hang. Have you got a way to find out what process 35E07F8A was doing ? /albert | |||||
2594.3 | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Thu Feb 13 1997 10:27 | 12 | |
Hmmm. Indeed Albert is right as always. It appears this bit of trivia leaked from my brain. OK. Sorry for leading you astray. Indeed, someone needs to determine why the process doesn't give up the freeze lock. There have been tons of notes on this in the Rdb notesfiles. In practically every case turning off VMS dynamic lock remastering has prevented the problem from occurring. fwiw, Paul | |||||
2594.4 | PE1 *was* 0, so... | NOMAHS::SECRIST | Rdb WWS; [email protected] | Thu Feb 13 1997 13:53 | 46 |
The process was running a homegrown application that performs a minor amount of DBMS reads in addition to other things. This is the second time this problem has occurred. Before, the suspect process was running a different executable (though accessing the same DB). PE1 across the cluster was set to 0. The dumpfile from the crash shows this: SDA> exam SGN$GL_PE1 SGN$GL_PE1: 00000000 "...." Again, I have asked the customer to try PE1 of 1 to see if the problem goes away. Here's the process summary from the dumpfile: Process index: 038A Name: BFAB_35E0870E Extended PID: 35E07F8A ------------------------------------------------------------------ Status : 00040023 res,delpen,respen,phdres Status2: 00000001 quantum_resched PCB address B5221B40 JIB address B45E5880 PHD address E33A2800 Swapfile disk address 00000000 Master internal PID 0021030E Subprocess count 1 Internal PID 001F038A Creator internal PID 0021030E Extended PID 35E07F8A Creator extended PID 35E0870E State LEF Termination mailbox 1650 Current priority 10 AST's enabled KESU Base priority 5 AST's active NONE UIC [00030,000036] AST's remaining 510 Mutex count 0 Buffered I/O count/limit 512/512 Waiting EF cluster 0 Direct I/O count/limit 128/128 Starting wait time 1A001A1A BUFIO byte count/limit 39904/40032 Event flag wait mask FEFFFFFF # open files allowed left 181 Local EF cluster 0 E2000001 Timer entries allowed left 10 Local EF cluster 1 98000000 Active page table count 0 Global cluster 2 pointer 00000000 Process WS page count 865 Global cluster 3 pointer 00000000 Global WS page count 439 Do you see anything of interest ? Regards, rcs | |||||
2594.5 | PE1 of 100 ? | NOMAHS::SECRIST | Rdb WWS; [email protected] | Fri Feb 14 1997 14:26 | 12 |
The customer finally set PE1 to 100. I recall a Digital recommendation some moons ago that if you felt compelled to set PE1 to something, to set it small so at least small trees should migrate. Since it only takes 1 tree to potentially hose a database if that is the root of a locking logjam, my inclination is to set it to -1 and shut it off entirely or 1 which effectively does the same thing. Does anyone think 100 would improve matters at all since a lock tree could be fairly small (but deadly) ? Regards, rcs | |||||
2594.6 | DUCATI::LASTOVICA | Is it possible to be totally partial? | Fri Feb 14 1997 15:16 | 1 | |
set it to 1. |