Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
Notice: | Welcome to the Digital UNIX Conference |
Moderator: | SMURF::DENHAM |
Created: | Thu Mar 16 1995 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 10068 |
Total number of notes: | 35879 |
Hello all,

today one of our customers experienced a process hang due to threads which were blocked in an uninterruptible sleep. My findings indicate the following scenario:

1. Thread "A" faulted on virtual address (VA) 0x1407bc000. "u_anon_fault" acquired
   the anon cluster lock of array entry 58, i.e. ao_acla[58].acl_klock. Thread "A"
   was then blocked within "u_anon_faultpage" because the associated page was busy.

2. Thread "B" faulted on virtual address (VA) 0x1407b8000. "u_anon_fault" attempted
   to acquire the anon cluster lock of array entry 58, BUT THIS ENTRY WAS STILL HELD
   BY THREAD "A". Hence, thread "B" was blocked in an uninterruptible sleep.

3. Thread "C" attempted to terminate the process. It had to wait for thread "B",
   but that thread was blocked indefinitely.

There is one remarkable fact: the page which thread "A" had been waiting for is no longer busy, i.e. there is no I/O outstanding!

The relevant data structures are appended below. The O/S is V3.2G with patch OSF375-050 installed. I recommended replacing OSF375-050 with OSF375-056, because the only difference between the two patch versions is the kernel object "u_mape_anon.o". The revision of "u_mape_anon.c" was changed from 1.1.143.2 in OSF375-050 to 1.1.143.3 in OSF375-056.

Now my question: does "u_mape_anon.c" revision 1.1.143.3 contain the fix for the type of scenario described above? This is urgent since the problem occurred in a mission-critical environment. I must make sure that OSF375-056 solves the problem. If there are any doubts, I'll have to open an IPMT case. I would really appreciate a quick reply!

Many thanks in advance and Best Regards,

Uli Obergfell
Offsite Services Open Systems / CSC Munich
DTN    : 775-8137
E-Mail : [email protected]

--------------------------------------------------------------------------------

# ps -m -p 16046
   PID TTY      S        TIME COMMAND
 16046 ??
                U    26:54.97 /usr/users/inss7/bin/gwcs1 /usr/users/inss7/scr
                T     0:08.71
                T     1:25.64
                TW    0:00.00
                TW    0:00.00
                TW    0:00.39
                T     0:00.40
                T     0:11.90
                T     0:42.35
                T     0:00.01
                TW    0:00.01
                TW    0:00.00
                TW    0:00.03
                TW    0:00.00
                TW    0:33.11
                T     0:41.72
                H     0:00.00
                H     0:00.00
                T     0:00.01
                H     0:00.00
                H     0:00.00
                H     0:00.00
                TW    0:00.01
                TW    0:00.00
                TW    0:21.88
                TW    0:00.00
                T     0:31.79
                TW    0:00.00
                TW    0:00.00
                TW    0:00.00
                TW    0:00.01
                T    14:12.94
                T     3:21.42
                U     0:41.56   <--- Thread "B"
                T     0:08.33
                TW    0:00.09
                U     0:00.07   <--- Thread "C"
                T     0:00.11
                T     0:00.48
                H     0:01.72
                T     0:00.50
                T     0:00.45
                T     1:49.16
                T     0:39.21
                T     0:41.46
                T     0:39.48
                H     0:00.00

# dbx -k /vmunix
(dbx) set $pid=16046
(dbx) tstack
:
Thread 0xfffffc001e5ed000:                    ... Thread "B"
>  0 thread_block
   1 u_anon_fault(0xfffffc0009a43860, 0x1407b8000, ...)
                  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
                  vm_map_entry        faulting VA
   2 u_map_fault
   3 vm_fault
   4 trap
   5 _XentMM

Thread 0xfffffc001e5ed800:                    ... Thread "A"
>  0 thread_block
   1 u_anon_faultpage
   2 u_anon_fault(0xfffffc0009a43860, 0x1407bc000, 0x1, ...)
                  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
                  vm_map_entry        faulting VA
   3 u_map_fault
   4 vm_fault
   5 trap
   6 _XentMM

Thread 0xfffffc001e5ec000:                    ...
                                                  Thread "C"
>  0 thread_block
   1 thread_dowait
   2 task_dowait
   3 thread_ex_check
   4 exit
   5 rexit
   6 syscall
   7 _Xsyscall

# kdbx -k /vmunix
(kdbx) px *(struct vm_map_entry *)0xfffffc0009a43860
struct {
    vme_links = struct {
        prev = 0xfffffc0009a42960
        next = 0xfffffc0019cc2f60
        start = 0x14005e000
        end = 0x142656000
    }
    vme_map = 0xfffffc00182189c0
    vme_uobject = union {
        vm_object = 0xfffffc0016f66c80
        sub_map = 0xfffffc0016f66c80
    }
    vmet = union {
        tvme_offset = 0x0
        tvme_seg = (nil)
    }
    vme_ops = 0xfffffc0000643650
    vme_vpage = struct {
        _uvpage = union {
            _uvp = struct {
                _uvp_prot = 0x7
                _uvp_plock = 0x0
            }
            _kvp = struct {
                _kvp_prot = 0x7
                _kvp_kwire = 0x0
            }
        }
    }
    vme_faultlock = struct {
        sl_data = 0x389628
        sl_info = 0x0
        sl_cpuid = 0x0
        sl_lifms = 0x0
    }
    vme_faults = 0x3
    vmeu = union {
        uvme = struct {
            uvme_faultwait = 0x0
            uvme_keep_on_exec = 0x0
            uvme_inheritance = 0x1
            uvme_maxprot = 0x7
        }
        kvme = struct {
            kvme_faultwait = 0x0
            kvme_is_submap = 0x0
            kvme_copymap = 0x1
        }
    }
    vme_private = 0x0
}
(kdbx) px *(struct vm_anon_object *)0xfffffc0016f66c80
struct {
    ao_object = struct {
        ob_memq = (nil)
        ob_lock = struct {
            sl_data = 0x3812c0
            sl_info = 0x0
            sl_cpuid = 0x0
            sl_lifms = 0x0
        }
        ob_ops = 0xfffffc0000643888
        ob_aux_obj = (nil)
        ob_ref_count = 0x1
        ob_res_count = 0x1
        ob_size = 0x2600000
        ob_resident_pages = 0x0
        ob_flags = 0x1
        ob_type = 0x2
    }
    ao_flags = 0x0
    ao_rbase = 0x0
    ao_crefcnt = 0x1
    ao_rswanon = 0x0
    ao_swanon = (nil)
    ao_ranon = 0x12fc
    ao_bobject = (nil)
    ao_boffset = 0x0
    ao_acla = 0xffffffff807a0000
}
(kdbx) px ((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58]
struct {
    acl_klock = struct {
        akl_slock = struct {
            sl_data = 0x36ce30
            sl_info = 0x0
            sl_cpuid = 0x0
            sl_lifms = 0x0
        }
        akl_want = 0x10600001
        akl_lock = 0x3
        akl_mlock = 0x1
        akl_plock = 0x0
        akl_rpages = 0x10
        akl_anon = 0x10
        akl_pagelist = 0xfffffc0000e21f40
    }
    acl_anon = {
        [0]  0xffffffffa0dd17d0
        [1]  0xffffffffa0dd17e0
        [2]  0xffffffffa1014da0
        [3]  0xffffffffa0dd17f0
        [4]  0xffffffffa1014db0
        [5]  0xffffffffa0dd1800
        [6]  0xffffffffa1014dc0
        [7]
             0xffffffffa0dd1810
        [8]  0xffffffffa1014dd0
        [9]  0xffffffffa0dd1820
        [10] 0xffffffffa1014de0
        [11] 0xffffffffa0dd1830
        [12] 0xffffffffa1014df0
        [13] 0xffffffffa0dd1840   <--- array entry for VA 0x1407b8000
        [14] 0xffffffffa1014e00
        [15] 0xffffffffa0dd1850   <--- array entry for VA 0x1407bc000
    }
}
(kdbx) px *((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58].acl_anon[13]
struct {
    _uanonx = union {
        _an_page = 0xfffffc0000c43020
        _an_next = 0xfffffc0000c43020
    }
    _uanony = union {
        _an_bits0 = struct {
            _an_refcnt = 0x1
            _an_cowfaults = 0x0
            _an_hasswap = 0x0
            _an_type = 0x0
        }
        _an_bits1 = struct {
            _an_anon = 0x1
            _an_type1 = 0x0
        }
    }
}
(kdbx) px *(struct vm_page *)0xfffffc0000c43020
struct {
    pg_pnext = 0xfffffc0000ceaf00
    pg_pprev = 0xfffffc0000c12f60
    pg_onext = 0xfffffc0000e77380
    pg_oprev = 0xfffffc0000b82ea0
    pg_hnext = 0xfffffc0000a5a1a0
    pg_hprev = 0xfffffc0000bfc640
    pg_object = 0xfffffc001fe04bc0
    pg_offset = 0xbb08000
    pg_wire_count = 0x0
    pg_iocnt = 0x0
    pg_free = 0x0
    pg_busy = 0x0
    pg_wait = 0x0
    pg_error = 0x0
    pg_dirty = 0x0
    pg_zeroed = 0x0
    pg_reserved = 0x1
    pg_hold = 0x0
    pg_phys_addr = 0xd1b4000
    _upg = union {
        _apg = struct {
            ap_owner = 0xfffffc0016f66c80
            ap_roffset = 0x75a000
        }
        _vppg = struct {
            vp_addr = 0xfffffc0016f66c80
            vp_pfs = 0x75a000
        }
        _pkva = 0xfffffc0016f66c80
        _pg_private = {
            [0] 0xfffffc0016f66c80
            [1] 0x75a000
        }
    }
}
(kdbx) px *((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58].acl_anon[15]
struct {
    _uanonx = union {
        _an_page = 0xfffffc0000a0f4a0
        _an_next = 0xfffffc0000a0f4a0
    }
    _uanony = union {
        _an_bits0 = struct {
            _an_refcnt = 0x1
            _an_cowfaults = 0x0
            _an_hasswap = 0x0
            _an_type = 0x0
        }
        _an_bits1 = struct {
            _an_anon = 0x1
            _an_type1 = 0x0
        }
    }
}
(kdbx) px *(struct vm_page *)0xfffffc0000a0f4a0
struct {
    pg_pnext = 0xfffffc0000b30dc0
    pg_pprev = 0xfffffc0000fa3440
    pg_onext = 0xfffffc0000e21f40
    pg_oprev = 0xfffffc0000e77380
    pg_hnext = 0xfffffc0000c05a00
    pg_hprev = 0xfffffc0000d28700
    pg_object = 0xfffffc001fe04bc0
    pg_offset = 0xbb0a000
    pg_wire_count = 0x0
    pg_iocnt = 0x0
    pg_free = 0x0
    pg_busy = 0x0
    pg_wait = 0x0
    pg_error = 0x0
    pg_dirty = 0x0
    pg_zeroed = 0x0
    pg_reserved = 0x1
    pg_hold = 0x1
    pg_phys_addr = 0x15cc000
    _upg = union {
        _apg = struct {
            ap_owner = 0xfffffc0016f66c80
            ap_roffset = 0x75e000
        }
        _vppg = struct {
            vp_addr = 0xfffffc0016f66c80
            vp_pfs = 0x75e000
        }
        _pkva = 0xfffffc0016f66c80
        _pg_private = {
            [0] 0xfffffc0016f66c80
            [1] 0x75e000
        }
    }
}
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
9894.1 | V3.2G / Thread Hangs in "u_anon_faultpage" | MGOF01::UOBERGFELL | Thu May 22 1997 10:54 | 55 | |
Hello,

I've spent some more time analyzing the problem and I finally found the clue (I was even able to reproduce the problem with a small test program on a V4.0a machine). For those who are interested in the details, here's what happens:

1. Thread "A" faults on a page which is currently swapped out. The routine
   "u_anon_fault" acquires the corresponding anon cluster lock and calls
   "a_anon_getpage". "a_anon_getpage" allocates a new (free) page and initiates
   the I/O (-> read the page from the swap partition). The page is marked "busy"
   while the I/O is in progress. "u_anon_fault" then calls "u_anon_faultpage".
   The routine "u_anon_faultpage" blocks while the page is "busy" (-> wait for
   I/O completion). THE ESSENTIAL PROBLEM HERE IS THAT THIS "SLEEP" IS
   INTERRUPTIBLE (although it shouldn't be)!

2. Thread "B" faults on a page within the same anon cluster. The routine
   "u_anon_fault" attempts to acquire the lock which is currently held by
   thread "A". "u_anon_fault" blocks until the lock is released by thread "A".
   THIS "SLEEP" IS NON-INTERRUPTIBLE!

3. Thread "C" attempts to terminate the process (-> exit system call) while
   threads "A" and "B" are still blocked. In a multi-threaded process, the
   exiting thread has to ensure that the other threads enter a "safe"
   (suspended) state. It may wake up threads which are blocked in an
   interruptible sleep, but it must wait for non-interruptible threads
   (-> routine "thread_dowait"). Hence, it wakes up thread "A" but must wait
   for thread "B". THREAD "A" IMMEDIATELY ENTERS THE SUSPENDED STATE AND NEVER
   RELEASES THE ANON CLUSTER LOCK. THAT'S WHY THREAD "B" SLEEPS INDEFINITELY
   AND WHY THREAD "C" WAITS FOREVER.

The solution is straightforward: "u_anon_faultpage" must block in a NON-interruptible "sleep" while it waits for the "busy" page. By disassembling "u_anon_faultpage" from patch OSF375-056, I found that this is fixed.
I would appreciate feedback if somebody (from engineering) reads this and does not agree.

Best Regards,

Uli Obergfell