Title: | dec_mls_plus |
Moderator: | SMURF::BAT |
Created: | Mon Nov 29 1993 |
Last Modified: | Thu Jun 05 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 534 |
Total number of notes: | 2544 |
Oracle called. When they run more than one Oracle database at a time, the database processes end up hung after several hours. This doesn't happen on MLS+ V3.1A, only on their V4.0A port. The hung processes all appear to be in a thread_block state within msg_dequeue. I've asked him to send me the stack trace of each one. Is it possible that something is not thread safe? How do we go about finding it?
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
527.1 | From Norman. Chris called later -- still have to call back | SMURF::BAT | Segui la tua beatitudine | Tue Jun 03 1997 21:35 | 71 |
From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 2-JUN-1997 16:56:02.03
To: smurf::bat
CC:
Subj: stack of hung problem

Barbara,

Here is the stack trace of the hung process:

[2] record output foo.tmp (0 lines)
(dbx)
(dbx)
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
  1 msg_dequeue(0xfffffc0001dcea80, 0xfffffc0007f9aa48, 0xffffffff80309020, 0xffffffff883738e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
[ another dbm: ]
  1 msg_dequeue(0xfffffc0003232a80, 0xfffffc000423a048, 0xffffffff80309020, 0xffffffff882538e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
[ yet another dbm hung on the same system: ]
  1 msg_dequeue(0xfffffc0005c71500, 0xfffffc000656e4a8, 0xffffffff80309020, 0xffffffff8816f8e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
[ yet another dbm hung on the same system: ]
  1 msg_dequeue(0xfffffc0005c71500, 0xfffffc000656e4a8, 0xffffffff80309020, 0xffffffff8816f8e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx) | |||||
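One quick way to check that every hung dbm really is blocked in the same place is to reduce each dbx trace to its chain of function names and count the distinct chains. The following is a minimal sketch in Python, assuming the frames look like the ones Norman sent (frame number, function, arguments, then a bracketed file, line and address). The names `FRAME_RE`, `parse_traces`, `signature` and `summarize` are hypothetical helpers made up for this illustration, not part of dbx, and the sample input is shortened with the argument lists left out.

```python
import re
from collections import Counter

# Matches one dbx frame, e.g.
#   > 0 thread_block() ["../kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
# The leading ">" (current-frame marker) is optional.
FRAME_RE = re.compile(
    r'^\s*>?\s*(?P<depth>\d+)\s+(?P<func>\w+)\((?P<args>[^)]*)\)\s*'
    r'\["(?P<file>[^"]+)":(?P<line>\d+),\s*(?P<pc>0x[0-9a-fA-F]+)\s*\]'
)


def parse_traces(text):
    """Split dbx output into traces; a new trace starts at each frame 0."""
    traces, current = [], []
    for raw in text.splitlines():
        m = FRAME_RE.match(raw)
        if not m:
            continue  # skip prompts, annotations and blank lines
        if m.group("depth") == "0" and current:
            traces.append(current)
            current = []
        current.append((m.group("func"), int(m.group("line")), m.group("pc")))
    if current:
        traces.append(current)
    return traces


def signature(trace):
    """Innermost-to-outermost chain of function names for one trace."""
    return " <- ".join(func for func, _line, _pc in trace)


def summarize(text):
    """Count how many traces share each call chain."""
    return Counter(signature(t) for t in parse_traces(text))


# Shortened sample input (argument lists omitted for brevity).
SAMPLE = '''
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
  1 msg_dequeue() ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap() ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
[ another dbm: ]
  1 msg_dequeue() ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap() ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
'''

for chain, count in summarize(SAMPLE).items():
    print(count, chain)
```

If all the traces collapse to a single chain, the processes are waiting in the same message receive path, which only says where they sleep and not why nothing wakes them. A trace with a different chain would be the one to look at first.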
527.2 | next step | SMURF::BAT | Segui la tua beatitudine | Tue Jun 03 1997 21:41 | 13 |
I spoke to Kris this morning to get a pointer to the right place to ask about how to find this. She said:
1. Dave Long is the right guy to ask about thread issues.
2. Try building them a debug kernel: ask them what software options they have in their kernel (ask them for their conf file), then build them a genvmunix with those options, with CFLAGS=g3 and no optimizations, and send it to them. Then, the next time these dbm's hang, they should force a crash and send the crash dump files here. | |||||
527.3 | now to build a debug kernel | SMURF::BAT | Segui la tua beatitudine | Thu Jun 05 1997 13:42 | 2 |
Norman sent me the kernel options list; I've archived it in ~ftp/pub/oracle | |||||
527.4 | what is dlm? | SMURF::BAT | Segui la tua beatitudine | Thu Jun 05 1997 14:01 | 5 |
In further discussing this hang, Norman said that he had had to remove the calls to the lock manager their code was using... he said because "they couldn't find the 'Digital Lock Manager' code in V4... e.g., /usr/include/sys/dlm.h and the dlm_detach, etc., routines... where are they?" | |||||
527.5 | I put the files in ~ftp/pub/oracle | SMURF::BAT | Segui la tua beatitudine | Thu Jun 05 1997 14:30 | 21 |
From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 3-JUN-1997 22:23:47.95
To: smurf::bat
CC: [email protected], [email protected]
Subj: Re: RE: multiple dbm hangs on thread_block

The hang still persists, and I cannot relate it to any potential Oracle problem. The only thing I can think of now is a latch problem, since the latch is implemented in assembly language and the "as" compiler seems to output a significantly smaller .o for me, whereas in 3.1 I got a much bigger object file.

Anyway, I've attached the files you need. (I need more semaphores in order to run several databases together.)

Norman Liang
Oracle Corporation | |||||
527.6 | but it is hanging on creat syscall | SMURF::BAT | Segui la tua beatitudine | Thu Jun 05 1997 14:32 | 15 |
From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 3-JUN-1997 22:28:01.30
To: smurf::bat
CC:
Subj: Re: RE: multiple dbm hangs on thread_block

One more thing you need to know: we're using a multiprocessor machine, and it's more likely to hang when I'm using the parallel query option from Oracle.

Norman Liang
Oracle Corporation | |||||
527.7 | got to get this to norman | SMURF::BAT | Segui la tua beatitudine | Thu Jun 05 1997 21:53 | 3 |
I just found out that dlm stuff is in the kernel, for TruClusters -- still don't know where the pool is. |