
Conference smurf::dec_mls_plus

Title:	dec_mls_plus

Moderator:	SMURF::BAT

Created:	Mon Nov 29 1993
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	534
Total number of notes:	2544

527.0. "Oracle: V4.0A porting problem: hung processes" by SMURF::BAT (Segui la tua beatitudine) Mon Jun 02 1997 16:16

    Oracle called.  When they run more than one Oracle database at a
    time, the databases end up hung after several hours.  This doesn't
    happen on MLS+ V3.1A; it happens only on their V4.0A port.
    
    The hung processes all appear to be in a thread_block state within msg_dequeue.
    I've asked him to send me the stack trace of each one.
    
    Is it possible that something is not thread safe?  How do we go about
    finding it?

T.R	Title	User	Personal Name	Date	Lines
527.1	From Norman. Chris called later -- still have to call back	SMURF::BAT	Segui la tua beatitudine	`Tue Jun 03 1997 20:35`	71
	From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 2-JUN-1997 16:56:02.03
	To: smurf::bat
	CC:
	Subj: stack of hung problem

	Barbara,

	Here is the stack trace of the hung process:

	[2] record output foo.tmp (0 lines)
	(dbx) (dbx) (dbx)
	> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
	  1 msg_dequeue(0xfffffc0001dcea80, 0xfffffc0007f9aa48, 0xffffffff80309020, 0xffffffff883738e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
	  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
	  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]

	[ another dbm: ]
	(dbx) (dbx)
	> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
	  1 msg_dequeue(0xfffffc0003232a80, 0xfffffc000423a048, 0xffffffff80309020, 0xffffffff882538e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
	  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
	  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]

	[ yet another dbm hung on the same system: ]
	(dbx) (dbx)
	> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
	  1 msg_dequeue(0xfffffc0005c71500, 0xfffffc000656e4a8, 0xffffffff80309020, 0xffffffff8816f8e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
	  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
	  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]

	[ yet another dbm hung on the same system: ]
	(dbx) (dbx)
	> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
	  1 msg_dequeue(0xfffffc0005c71500, 0xfffffc000656e4a8, 0xffffffff80309020, 0xffffffff8816f8e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
	  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
	  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
	(dbx)
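	Every trace in 527.1 shows the same chain: thread_block, called from msg_dequeue, from msg_receive_trap, from _Xsyscall. When more traces arrive, a small script can confirm that they all match instead of comparing them by eye. The following is only a rough sketch, not a tool that exists anywhere: the names FRAME_RE, parse_traces and summarize are made up for illustration, and it assumes the dbx output has one frame per line in the format quoted in 527.1.

```python
import re
from collections import Counter

# One dbx frame, e.g.
#   1 msg_dequeue(0x..., 0x0 ) ["../../src/kernel/kern/ipc_basics.c":869, 0xfff...]
# An optional leading ">" marks the current frame.
FRAME_RE = re.compile(
    r'^\s*>?\s*(?P<depth>\d+)\s+(?P<func>\w+)\((?P<args>[^)]*)\)\s*'
    r'\["(?P<file>[^"]+)":(?P<line>\d+),\s*(?P<pc>0x[0-9a-fA-F]+)\s*\]'
)


def parse_traces(text):
    """Split dbx output into traces; a frame at depth 0 starts a new trace."""
    traces = []
    for raw in text.splitlines():
        m = FRAME_RE.match(raw)
        if not m:
            continue  # "(dbx)" prompts and annotations are skipped
        frame = (m["func"], m["file"].rsplit("/", 1)[-1], int(m["line"]))
        if int(m["depth"]) == 0 or not traces:
            traces.append([])
        traces[-1].append(frame)
    return traces


def summarize(traces):
    """Count how many traces share each call chain (innermost frame first)."""
    return Counter(" <- ".join(func for func, _, _ in t) for t in traces)


# Two of the traces from 527.1, one frame per line.
SAMPLE = '''
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac ]
  1 msg_dequeue(0xfffffc0001dcea80, 0xfffffc0007f9aa48, 0xffffffff80309020, 0xffffffff883738e0, 0x0 ) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac ]
  1 msg_dequeue(0xfffffc0003232a80, 0xfffffc000423a048, 0xffffffff80309020, 0xffffffff882538e0, 0x0 ) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
  2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
  3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx)
'''

if __name__ == "__main__":
    for chain, count in summarize(parse_traces(SAMPLE)).items():
        print(count, chain)
```

	If every hung process collapses into a single chain, as it does for the traces so far, the useful difference between them is in the msg_dequeue arguments rather than in the call path.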
527.2	next step	SMURF::BAT	Segui la tua beatitudine	`Tue Jun 03 1997 20:41`	13
	I spoke to Kris this morning to get a pointer to the right place to ask about how to find this. She said:
	1. Dave Long is the right guy to ask about thread issues.
	2. Try building them a debug kernel: ask them what software options they have in their kernel (ask them for their conf file), then build them a genvmunix with those options and with CFLAGS=g3 and no optimizations, and send it to them. Then, the next time these dbm's hang, they should force a crash and send the crash dump files here.
527.3	now to build a debug kernel	SMURF::BAT	Segui la tua beatitudine	`Thu Jun 05 1997 12:42`	2
	Norman sent me the kernel options list; I've archived it in ~ftp/pub/oracle
527.4	what is dlm?	SMURF::BAT	Segui la tua beatitudine	`Thu Jun 05 1997 13:01`	5
	In further discussing this hang, Norman said that he had had to remove the calls to the lock manager their code was using... he said because "they couldn't find the 'Digital Lock Manager' code in V4... e.g., /usr/include/sys/dlm.h and the dlm_detach, etc., routines... where are they?"
527.5	I put the files in ~ftp/pub/oracle	SMURF::BAT	Segui la tua beatitudine	`Thu Jun 05 1997 13:30`	21
	From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 3-JUN-1997 22:23:47.95
	To: smurf::bat
	CC: [email protected], [email protected]
	Subj: Re: RE: multiple dbm hangs on thread_block

	The hung situation still persists, and I cannot relate it to any potential Oracle problem. The only thing I can think of now is a latch problem. The latch is implemented in assembly language, and the "as" compiler seems to output a significantly smaller .o for me, whereas in 3.1 I got a much bigger object file.

	Anyway, I've attached the files you need. (I need more semaphores in order to run several databases together.)

	Norman Liang
	Oracle Corporation
527.6	but it is hanging on creat syscall	SMURF::BAT	Segui la tua beatitudine	`Thu Jun 05 1997 13:32`	15
	From: US2RMC::"[email protected]" "NLIANG.US.ORACLE.COM" 3-JUN-1997 22:28:01.30
	To: smurf::bat
	CC:
	Subj: Re: RE: multiple dbm hangs on thread_block

	One more thing you need to know. We're using a multi-processor machine, and it's more likely to hang when I'm using the parallel query option from Oracle.

	Norman Liang
	Oracle Corporation
527.7	got to get this to norman	SMURF::BAT	Segui la tua beatitudine	`Thu Jun 05 1997 20:53`	3
	I just found out that dlm stuff is in the kernel, for TruClusters -- still don't know where the pool is.