Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
Notice: | Welcome to the Digital UNIX Conference |
Moderator: | SMURF::DENHAM |
Created: | Thu Mar 16 1995 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 10068 |
Total number of notes: | 35879 |
Hello all,

today one of our customers experienced a process hang due to threads which were blocked in an uninterruptible sleep. My findings indicate the following scenario:

1. Thread "A" faulted on virtual address (VA) 0x1407bc000. "u_anon_fault" acquired
   the anon cluster lock of array entry 58, i.e. ao_acla[58].acl_klock. Thread "A"
   was then blocked within "u_anon_faultpage" because the associated page was busy.

2. Thread "B" faulted on virtual address (VA) 0x1407b8000. "u_anon_fault" attempted
   to acquire the anon cluster lock of array entry 58, BUT THIS ENTRY WAS STILL HELD
   BY THREAD "A". Hence, thread "B" was blocked in an uninterruptible sleep.

3. Thread "C" attempted to terminate the process. It had to wait for thread "B",
   but that thread was blocked indefinitely.

There is one remarkable fact: the page which thread "A" had been waiting for is no longer busy, i.e. there is no I/O outstanding!

The relevant data structures are appended below. The O/S is V3.2G with patch OSF375-050 installed. I recommended replacing OSF375-050 with OSF375-056, because the only difference between the two patch versions is the kernel object "u_mape_anon.o". The revision of "u_mape_anon.c" was changed from 1.1.143.2 in OSF375-050 to 1.1.143.3 in OSF375-056.

Now my question: does "u_mape_anon.c" revision 1.1.143.3 contain the fix for the type of scenario described above? This is urgent since the problem occurred in a mission-critical environment. I must make sure that OSF375-056 solves the problem. If there are any doubts, I'll have to open an IPMT case. I would really appreciate a quick reply!

Many thanks in advance and Best Regards,

Uli Obergfell
Offsite Services Open Systems / CSC Munich
DTN    : 775-8137
E-Mail : [email protected]

--------------------------------------------------------------------------------

# ps -m -p 16046
   PID TTY      S        TIME COMMAND
 16046 ??
                U    26:54.97 /usr/users/inss7/bin/gwcs1 /usr/users/inss7/scr
                T     0:08.71
                T     1:25.64
                TW    0:00.00
                TW    0:00.00
                TW    0:00.39
                T     0:00.40
                T     0:11.90
                T     0:42.35
                T     0:00.01
                TW    0:00.01
                TW    0:00.00
                TW    0:00.03
                TW    0:00.00
                TW    0:33.11
                T     0:41.72
                H     0:00.00
                H     0:00.00
                T     0:00.01
                H     0:00.00
                H     0:00.00
                H     0:00.00
                TW    0:00.01
                TW    0:00.00
                TW    0:21.88
                TW    0:00.00
                T     0:31.79
                TW    0:00.00
                TW    0:00.00
                TW    0:00.00
                TW    0:00.01
                T    14:12.94
                T     3:21.42
                U     0:41.56   <--- Thread "B"
                T     0:08.33
                TW    0:00.09
                U     0:00.07   <--- Thread "C"
                T     0:00.11
                T     0:00.48
                H     0:01.72
                T     0:00.50
                T     0:00.45
                T     1:49.16
                T     0:39.21
                T     0:41.46
                T     0:39.48
                H     0:00.00

# dbx -k /vmunix
(dbx) set $pid=16046
(dbx) tstack
:
Thread 0xfffffc001e5ed000:                    ... Thread "B"
>  0 thread_block
   1 u_anon_fault(0xfffffc0009a43860, 0x1407b8000, ...)
                  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
                  vm_map_entry        faulting VA
   2 u_map_fault
   3 vm_fault
   4 trap
   5 _XentMM

Thread 0xfffffc001e5ed800:                    ... Thread "A"
>  0 thread_block
   1 u_anon_faultpage
   2 u_anon_fault(0xfffffc0009a43860, 0x1407bc000, 0x1, ...)
                  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
                  vm_map_entry        faulting VA
   3 u_map_fault
   4 vm_fault
   5 trap
   6 _XentMM

Thread 0xfffffc001e5ec000:                    ...
                                                  Thread "C"
>  0 thread_block
   1 thread_dowait
   2 task_dowait
   3 thread_ex_check
   4 exit
   5 rexit
   6 syscall
   7 _Xsyscall

# kdbx -k /vmunix
(kdbx) px *(struct vm_map_entry *)0xfffffc0009a43860
struct {
    vme_links = struct {
        prev = 0xfffffc0009a42960
        next = 0xfffffc0019cc2f60
        start = 0x14005e000
        end = 0x142656000
    }
    vme_map = 0xfffffc00182189c0
    vme_uobject = union {
        vm_object = 0xfffffc0016f66c80
        sub_map = 0xfffffc0016f66c80
    }
    vmet = union {
        tvme_offset = 0x0
        tvme_seg = (nil)
    }
    vme_ops = 0xfffffc0000643650
    vme_vpage = struct {
        _uvpage = union {
            _uvp = struct {
                _uvp_prot = 0x7
                _uvp_plock = 0x0
            }
            _kvp = struct {
                _kvp_prot = 0x7
                _kvp_kwire = 0x0
            }
        }
    }
    vme_faultlock = struct {
        sl_data = 0x389628
        sl_info = 0x0
        sl_cpuid = 0x0
        sl_lifms = 0x0
    }
    vme_faults = 0x3
    vmeu = union {
        uvme = struct {
            uvme_faultwait = 0x0
            uvme_keep_on_exec = 0x0
            uvme_inheritance = 0x1
            uvme_maxprot = 0x7
        }
        kvme = struct {
            kvme_faultwait = 0x0
            kvme_is_submap = 0x0
            kvme_copymap = 0x1
        }
    }
    vme_private = 0x0
}
(kdbx) px *(struct vm_anon_object *)0xfffffc0016f66c80
struct {
    ao_object = struct {
        ob_memq = (nil)
        ob_lock = struct {
            sl_data = 0x3812c0
            sl_info = 0x0
            sl_cpuid = 0x0
            sl_lifms = 0x0
        }
        ob_ops = 0xfffffc0000643888
        ob_aux_obj = (nil)
        ob_ref_count = 0x1
        ob_res_count = 0x1
        ob_size = 0x2600000
        ob_resident_pages = 0x0
        ob_flags = 0x1
        ob_type = 0x2
    }
    ao_flags = 0x0
    ao_rbase = 0x0
    ao_crefcnt = 0x1
    ao_rswanon = 0x0
    ao_swanon = (nil)
    ao_ranon = 0x12fc
    ao_bobject = (nil)
    ao_boffset = 0x0
    ao_acla = 0xffffffff807a0000
}
(kdbx) px ((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58]
struct {
    acl_klock = struct {
        akl_slock = struct {
            sl_data = 0x36ce30
            sl_info = 0x0
            sl_cpuid = 0x0
            sl_lifms = 0x0
        }
        akl_want = 0x10600001
        akl_lock = 0x3
        akl_mlock = 0x1
        akl_plock = 0x0
        akl_rpages = 0x10
        akl_anon = 0x10
        akl_pagelist = 0xfffffc0000e21f40
    }
    acl_anon = {
        [0]  0xffffffffa0dd17d0
        [1]  0xffffffffa0dd17e0
        [2]  0xffffffffa1014da0
        [3]  0xffffffffa0dd17f0
        [4]  0xffffffffa1014db0
        [5]  0xffffffffa0dd1800
        [6]  0xffffffffa1014dc0
        [7]
             0xffffffffa0dd1810
        [8]  0xffffffffa1014dd0
        [9]  0xffffffffa0dd1820
        [10] 0xffffffffa1014de0
        [11] 0xffffffffa0dd1830
        [12] 0xffffffffa1014df0
        [13] 0xffffffffa0dd1840   <--- array entry for VA 0x1407b8000
        [14] 0xffffffffa1014e00
        [15] 0xffffffffa0dd1850   <--- array entry for VA 0x1407bc000
    }
}
(kdbx) px *((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58].acl_anon[13]
struct {
    _uanonx = union {
        _an_page = 0xfffffc0000c43020
        _an_next = 0xfffffc0000c43020
    }
    _uanony = union {
        _an_bits0 = struct {
            _an_refcnt = 0x1
            _an_cowfaults = 0x0
            _an_hasswap = 0x0
            _an_type = 0x0
        }
        _an_bits1 = struct {
            _an_anon = 0x1
            _an_type1 = 0x0
        }
    }
}
(kdbx) px *(struct vm_page *)0xfffffc0000c43020
struct {
    pg_pnext = 0xfffffc0000ceaf00
    pg_pprev = 0xfffffc0000c12f60
    pg_onext = 0xfffffc0000e77380
    pg_oprev = 0xfffffc0000b82ea0
    pg_hnext = 0xfffffc0000a5a1a0
    pg_hprev = 0xfffffc0000bfc640
    pg_object = 0xfffffc001fe04bc0
    pg_offset = 0xbb08000
    pg_wire_count = 0x0
    pg_iocnt = 0x0
    pg_free = 0x0
    pg_busy = 0x0
    pg_wait = 0x0
    pg_error = 0x0
    pg_dirty = 0x0
    pg_zeroed = 0x0
    pg_reserved = 0x1
    pg_hold = 0x0
    pg_phys_addr = 0xd1b4000
    _upg = union {
        _apg = struct {
            ap_owner = 0xfffffc0016f66c80
            ap_roffset = 0x75a000
        }
        _vppg = struct {
            vp_addr = 0xfffffc0016f66c80
            vp_pfs = 0x75a000
        }
        _pkva = 0xfffffc0016f66c80
        _pg_private = {
            [0] 0xfffffc0016f66c80
            [1] 0x75a000
        }
    }
}
(kdbx) px *((struct vm_anon_object *)0xfffffc0016f66c80).ao_acla[58].acl_anon[15]
struct {
    _uanonx = union {
        _an_page = 0xfffffc0000a0f4a0
        _an_next = 0xfffffc0000a0f4a0
    }
    _uanony = union {
        _an_bits0 = struct {
            _an_refcnt = 0x1
            _an_cowfaults = 0x0
            _an_hasswap = 0x0
            _an_type = 0x0
        }
        _an_bits1 = struct {
            _an_anon = 0x1
            _an_type1 = 0x0
        }
    }
}
(kdbx) px *(struct vm_page *)0xfffffc0000a0f4a0
struct {
    pg_pnext = 0xfffffc0000b30dc0
    pg_pprev = 0xfffffc0000fa3440
    pg_onext = 0xfffffc0000e21f40
    pg_oprev = 0xfffffc0000e77380
    pg_hnext = 0xfffffc0000c05a00
    pg_hprev = 0xfffffc0000d28700
    pg_object = 0xfffffc001fe04bc0
    pg_offset = 0xbb0a000
    pg_wire_count = 0x0
    pg_iocnt = 0x0
    pg_free = 0x0
    pg_busy = 0x0
    pg_wait = 0x0
    pg_error = 0x0
    pg_dirty = 0x0
    pg_zeroed = 0x0
    pg_reserved = 0x1
    pg_hold = 0x1
    pg_phys_addr = 0x15cc000
    _upg = union {
        _apg = struct {
            ap_owner = 0xfffffc0016f66c80
            ap_roffset = 0x75e000
        }
        _vppg = struct {
            vp_addr = 0xfffffc0016f66c80
            vp_pfs = 0x75e000
        }
        _pkva = 0xfffffc0016f66c80
        _pg_private = {
            [0] 0xfffffc0016f66c80
            [1] 0x75e000
        }
    }
}
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
9894.1 | V3.2G / Thread Hangs in "u_anon_faultpage" | MGOF01::UOBERGFELL | Thu May 22 1997 10:54 | 55 | |
Hello,

I've spent some more time analyzing the problem and I finally found the clue (I was even able to reproduce the problem with a small test program on a V4.0a machine). For those who are interested in the details, here's what happens:

1. Thread "A" faults on a page which is currently swapped out. The routine
   "u_anon_fault" acquires the corresponding anon cluster lock and calls
   "a_anon_getpage". "a_anon_getpage" allocates a new (free) page and initiates
   the I/O (-> read the page from the swap partition). The page is marked "busy"
   while the I/O is in progress. "u_anon_fault" then calls "u_anon_faultpage".
   The routine "u_anon_faultpage" blocks while the page is "busy" (-> wait for
   I/O completion). THE ESSENTIAL PROBLEM HERE IS THAT THIS "SLEEP" IS
   INTERRUPTIBLE (although it shouldn't be)!

2. Thread "B" faults on a page within the same anon cluster. The routine
   "u_anon_fault" attempts to acquire the lock which is currently held by
   thread "A". "u_anon_fault" blocks until the lock is released by thread "A".
   THIS "SLEEP" IS NON-INTERRUPTIBLE!

3. Thread "C" attempts to terminate the process (-> exit system call) while
   threads "A" and "B" are still blocked. In a multi-threaded process, the
   exiting thread has to ensure that the other threads enter a "safe"
   (suspended) state. It may wake up threads which are blocked in an
   interruptible sleep, but it must wait for non-interruptible threads
   (-> routine "thread_dowait"). Hence, it wakes up thread "A" but must wait
   for thread "B". THREAD "A" IMMEDIATELY ENTERS THE SUSPENDED STATE AND NEVER
   RELEASES THE ANON CLUSTER LOCK. THAT'S WHY THREAD "B" SLEEPS INDEFINITELY
   AND WHY THREAD "C" WAITS FOREVER.

The solution is straightforward: "u_anon_faultpage" must block in a NON-interruptible "sleep" while it waits for the "busy" page. By disassembling "u_anon_faultpage" from patch OSF375-056, I found that this is fixed.
I would appreciate feedback if somebody (from engineering) reads this and does not agree.

Best Regards,

Uli Obergfell