
Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

9772.0. "NXM 2-level scheduling" by NNTPD::"[email protected]" (John McDonald) Sun May 11 1997 17:40

Does anyone know where I can get some info on the new NXM 2-level
scheduling stuff in 4.0? I had a customer ask me about it and
I realized that I didn't understand it myself. Any help, pointers,
etc. would be appreciated.

John McDonald
Atlanta CSC

[Posted by WWW Notes gateway]
9772.1"Guide to DECthreads" +RHETT::PARKERMon May 12 1997 10:3112
    
    Hi, John.
    
    The "Guide to DECthreads", appendix A, has some good info w/
    pointers to papers also. There is also some info in here and
    in the CLT::CMA notes conference.  Maybe one of the experts
    will tell us if there are other locations...
    
    Hth,
    
    Lee
    
9772.2. "Thanx." by NNTPD::"[email protected]" (John McDonald) Mon May 12 1997 11:30 (5 lines)
Thanx for the pointer, Lee. I'll take a look...

John McDonald

[Posted by WWW Notes gateway]
9772.3. "101" by SMURF::DENHAM (Digital UNIX Kernel) Mon May 12 1997 11:44 (88 lines)
    We've been busy so long now refining and tuning the 2-level interface
    that we've "neglected" to generate any materials on how the interface
    works that are suitable for public consumption. I'll dash off a quick
    description here and hope it will be useful as a start.
    
    In point of fact, a user really doesn't need to know that much about
    how the system works. It's essentially a "classic" multilevel
    scheduler, capable of performing "library" (user-space) and kernel-mode
    context switching, with no loss of MP concurrency. Those are the
    "levels" of the scheduler. The V3.2 OS (and earlier) was a single-level
    scheduler -- the kernel did all the context switching. In V4.0, a
    second level of scheduling was added to allow the DECthreads library to
    context switch its own threads. This is really a hybrid of purely
    kernel-based multithreading (one kernel thread per user thread, i.e.,
    "one to one") and purely library-based multithreading (one kernel
    thread per multiple user threads, i.e., "many to one."). The "nxm"
    facility code you refer to is in fact a pronounceable corruption of
    M-x-N ("M by N"), which means "many to some." The motivation for adding
    the level is simple: the library can context switch threads with less
    overhead than the kernel can.
    
    In other words, when a pthread blocks on a mutex, condition variable, or
    some derivative thereof (such as a pthread_join), the context switch
    occurs completely in the library -- there is no kernel involvement.
    These user threads are run and context switched on one or more kernel
    threads created by the DECthreads library. There is usually one of
    these scheduling threads per physical processor on the system, so they
    are referred to as virtual processors (VPs).
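
    To make the library-level case concrete, here's a minimal sketch
    using only standard pthread calls (nothing DECthreads-specific; the
    thread bodies are just illustrative). With default process-
    contention-scope threads, the block and wakeup below should be
    resolved entirely in the library, on a VP:

        #include <pthread.h>

        static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
        static int flag = 0;

        /* Blocks on a condition variable; for a PCS thread this block
           and the later wakeup are library-level context switches. */
        static void *waiter(void *arg)
        {
            pthread_mutex_lock(&lock);
            while (!flag)
                pthread_cond_wait(&ready, &lock);
            pthread_mutex_unlock(&lock);
            return arg;
        }

        int main(void)
        {
            pthread_t t;

            pthread_create(&t, NULL, waiter, NULL);

            pthread_mutex_lock(&lock);
            flag = 1;
            pthread_cond_signal(&ready);  /* readies the waiter without
                                             entering the kernel */
            pthread_mutex_unlock(&lock);

            pthread_join(t, NULL);        /* join is another library-level
                                             blocking point */
            return 0;
        }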
    
    When a pthread blocks in a system call, the kernel performs the context
    switch to another kernel (Mach) thread. This means that the user thread
    *and* its VP are blocked in the kernel.
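
    For contrast, a sketch of the kernel-level case (again plain POSIX;
    the descriptor is assumed to be something slow such as a pipe or
    socket):

        #include <unistd.h>

        /* A read() that can't complete immediately blocks this pthread
           *in the kernel*; the library can't context switch past it, so
           the VP carrying the thread blocks too, until the kernel hands
           the library a replacement (see below). */
        static void *reader(void *arg)
        {
            int  fd = *(int *)arg;
            char buf[512];

            (void)read(fd, buf, sizeof buf);
            return arg;
        }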
    
    Now, if the user thread and its VP were to stay blocked in the kernel
    until the syscall completes, the threads library could completely lose
    its ability to execute (one VP blocked on a uniprocessor) or lose part
    of its ability to execute threads concurrently (one VP blocked on a
    multiprocessor). Therefore, to maintain the requested number of VPs for
    the application, the kernel "creates" a new VP thread to replace the
    one blocked in a system call. It then hands this thread off to the
    threads library in an event called an upcall, which is simply an
    asynchronous entry into the library's upcall handler, very much like a
    UNIX signal. The delivery of this new VP via the upcall also informs
    the library that one of its pthreads has been blocked in the kernel on
    a syscall. Other blocking events such as page faults cause the same
    series of events.
    
    When the user thread blocked in the kernel becomes runnable again (I/O
    completed, page faulted in, timer expired, signal delivered, etc.), it
    returns to the threads library by performing another upcall operation
    -- an unblock upcall. It performs this upcall still executing on the
    kernel thread that was once a VP. The unblock upcall causes the library
    to ready the unblocked thread to run and to return the former VP thread
    to the kernel for later reuse as a replacement VP. Other upcalls occur
    as well. For example, the threads library timeslices its threads using
    the quantum-expiration upcall, and thread cancelation events detected
    by the kernel are passed to the library via a cancelation upcall.
    
    All this happens for what POSIX calls "process contention scope"
    threads. The scheduling priorities and policies of these threads are
    meaningful only in relation to other threads in the same process. These
    are the only types of threads supported before the upcoming V4.0D
    release. With V4.0D, we're adding support for POSIX "system contention
    scope" threads. These are threads that simply don't participate in
    2-level scheduling. They don't run on VPs and are scheduled solely by
    the kernel. Their advantage is that their scheduling priorities and
    policies *are* visible to the kernel, so they compete with threads in
    other processes, making them suitable for applications with harder
    realtime requirements.
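
    The scope is requested per-thread through the standard POSIX
    attributes interface. A sketch (standard calls only; on releases
    before V4.0D I'd expect the setscope call to fail for
    PTHREAD_SCOPE_SYSTEM):

        #include <pthread.h>
        #include <stdio.h>

        static void *work(void *arg) { return arg; }

        int main(void)
        {
            pthread_attr_t attr;
            pthread_t      t;

            pthread_attr_init(&attr);

            /* PTHREAD_SCOPE_PROCESS: PCS, 2-level (NXM) scheduling on VPs.
               PTHREAD_SCOPE_SYSTEM:  SCS, scheduled directly by the kernel,
               competing system-wide on priority and policy. */
            if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
                fprintf(stderr, "system contention scope not supported\n");

            pthread_create(&t, &attr, work, NULL);
            pthread_join(t, NULL);
            pthread_attr_destroy(&attr);
            return 0;
        }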
    
    That's really most of what anyone should need to know. Under the covers,
    there's a fair amount of plumbing necessary to maintain correct pthread
    semantics for PCS threads, especially in the area of signals. The major
    technique for achieving this is a per-VP communication area shared with
    the kernel, where the relevant features of the currently executing
    pthread are directly available to the kernel. Bits such as the current
    signal mask, the currently pending per-thread signals, the cancelation
    state, and the quantum countdown value appear in this shared area
    whenever the library context switches a pthread onto a VP for
    execution.
    
    Some textbooks out there do a pretty good job of describing 2-level
    concepts. I'm sure Dave Butenhof's upcoming threads book from
    Addison-Wesley does an OK job at it too :^).
    
    Hope this is helpful,
    
    Jeff
9772.4. "Thanx." by NNTPD::"[email protected]" (John McDonald) Mon May 12 1997 17:18 (10 lines)
Jeff,

Thanx. That's exactly what I was looking for. I can understand the bits
and pieces in the source, but I need a context for them. I appreciate
it.

John McDonald
Atlanta CSC

[Posted by WWW Notes gateway]