We've been so busy refining and tuning the 2-level interface that
we've "neglected" to generate any material on how the interface
works that's suitable for public consumption. I'll dash off a quick
description here and hope it will be useful as a start.
In point of fact, a user really doesn't need to know that much about
how the system works. It's essentially a "classic" multilevel
scheduler, capable of performing "library" (user-space) and kernel-mode
context switching, with no loss of MP concurrency. Those are the
"levels" of the scheduler. The V3.2 OS (and earlier) was a single-level
scheduler -- the kernel did all the context switching. In V4.0, a
second level of scheduling was added to allow the DECthreads library to
context switch its own threads. This is really a hybrid of purely
kernel-based multithreading (one kernel thread per user thread, i.e.,
"one to one") and purely library-based multithreading (one kernel
thread per multiple user threads, i.e., "many to one"). The "nxm"
facility code you refer to is in fact a pronounceable corruption of
M-x-N ("M by N"), which means "many to some." The motivation for adding
the level is simple: the library can context switch threads with less
overhead than the kernel can.
In other words, when a pthread blocks on a mutex, condition variable, or
some derivative thereof (such as a pthread_join), the context switch
occurs completely in the library -- there is no kernel involvement.
These user threads are run and context switched on one or more kernel
threads created by the DECthreads library. There is usually one of
these scheduling threads per physical processor on the system, so they
are referred to as virtual processors (VPs).
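To make this concrete, here's a minimal example (plain POSIX calls,
nothing DECthreads-specific) of the kind of blocking that stays entirely
inside the library on this system -- the waiter parks on the condition
variable and the library switches to another pthread without entering
the kernel:

    #include <pthread.h>

    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static int flag = 0;

    static void *waiter(void *arg)
    {
        pthread_mutex_lock(&lock);
        while (!flag)
            /* On this system: a library-level context switch,
               no kernel entry. */
            pthread_cond_wait(&ready, &lock);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, waiter, NULL);
        pthread_mutex_lock(&lock);
        flag = 1;
        pthread_cond_signal(&ready);
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);
        return 0;
    }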
When a pthread blocks in a system call, the kernel performs the context
switch to another kernel (Mach) thread. This means that the user thread
*and* its VP are blocked in the kernel.
Now, if the user thread and its VP were to stay blocked in the kernel
until the syscall completes, the threads library could completely lose
its ability to execute (one VP blocked on a uniprocessor) or lose part
of its ability to execute threads concurrently (one VP blocked on a
multiprocessor). Therefore, to maintain the requested number of VPs for
the application, the kernel "creates" a new VP thread to replace the
one blocked in a system call. It then hands this thread off to the
threads library in an event called an upcall, which is simply an
asynchronous entry into the library's upcall handler, very much like a
UNIX signal. The delivery of this new VP via the upcall also informs
the library that one of its pthreads has been blocked in the kernel on
a syscall. Other blocking events such as page faults cause the same
series of events.
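Here's a sketch of that scenario, again using only standard calls: one
pthread blocks in read() (taking its VP into the kernel with it), and
the replacement VP delivered by the upcall is what keeps the rest of the
process running, even on a uniprocessor:

    #include <pthread.h>
    #include <unistd.h>

    static int fds[2];

    static void *reader(void *arg)
    {
        char c;
        /* Blocks in the kernel: this pthread *and* its VP sleep, and
           the kernel delivers a replacement VP to the library via an
           upcall. */
        read(fds[0], &c, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pipe(fds);
        pthread_create(&t, NULL, reader, NULL);
        /* Still runs, on the replacement VP, while the reader sleeps. */
        sleep(1);
        write(fds[1], "x", 1);  /* wakes the reader: see the unblock
                                   upcall described below */
        pthread_join(t, NULL);
        return 0;
    }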
When the user thread blocked in the kernel becomes runnable again (I/O
completed, page faulted in, timer expired, signal delivered, etc.), it
returns to the threads library by performing another upcall operation
-- an unblock upcall. It performs this upcall still executing on the
kernel thread that was once a VP. The unblock upcall causes the library
to ready the unblocked thread to run and to return the former VP thread
to the kernel for later reuse as a replacement VP. Other upcalls occur
as well. For example, the threads library timeslices its threads using
the quantum-expiration upcall, and thread cancelation events detected
by the kernel are passed to the library via a cancelation upcall.
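Purely as an illustration -- the names and signatures here are invented,
not the actual DECthreads internals -- the library's side of this
protocol amounts to a dispatcher along these lines:

    /* Hypothetical sketch only; these names are made up for
       illustration and are not the real DECthreads interfaces. */
    typedef enum {
        UPCALL_BLOCK,     /* pthread blocked in a syscall; this kernel
                             thread is its replacement VP */
        UPCALL_UNBLOCK,   /* a blocked pthread is runnable again */
        UPCALL_QUANTUM,   /* timeslice expired: preempt and reschedule */
        UPCALL_CANCEL     /* kernel detected a cancelation event */
    } upcall_kind;

    void upcall_handler(upcall_kind kind, void *thread_id)
    {
        switch (kind) {
        case UPCALL_BLOCK:
            /* Note which pthread is stuck in the kernel, then run the
               library scheduler on this new VP. */
            break;
        case UPCALL_UNBLOCK:
            /* Mark the pthread runnable; return the carrying kernel
               thread to the kernel for reuse as a future VP. */
            break;
        case UPCALL_QUANTUM:
            /* Context switch the current pthread in the library. */
            break;
        case UPCALL_CANCEL:
            /* Deliver cancelation to the target pthread. */
            break;
        }
    }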
All this happens for what POSIX calls "process contention scope"
threads. The scheduling priorities and policies of these threads are
meaningful only in relation to other threads in the same process. These
are the only types of threads supported before the upcoming V4.0D
release. With V4.0D, we're adding support for POSIX "system contention
scope" threads. These are threads that simply don't participate in
2-level scheduling. They don't run on VPs and are scheduled solely by
the kernel. Their advantage is that their scheduling priorities and
policies *are* visible to the kernel, so they compete with threads in
other processes, making them suitable for applications with harder
realtime requirements.
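In source terms, the difference is just the standard POSIX
contention-scope attribute (PTHREAD_SCOPE_SYSTEM will work on V4.0D and
later):

    #include <pthread.h>
    #include <stdio.h>

    static void *work(void *arg) { return NULL; }

    int main(void)
    {
        pthread_t t;
        pthread_attr_t attr;

        pthread_attr_init(&attr);
        /* PTHREAD_SCOPE_PROCESS: a 2-level (VP-scheduled) thread.
           PTHREAD_SCOPE_SYSTEM: scheduled solely by the kernel,
           competing with threads in other processes (V4.0D on). */
        if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
            fprintf(stderr, "system contention scope unsupported\n");
        pthread_create(&t, &attr, work, NULL);
        pthread_join(t, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }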
That's really most of what anyone should need to know. Under the covers,
there's a fair amount of plumbing necessary to maintain correct pthread
semantics for PCS threads, especially in the area of signals. The major
technique for achieving this is a per-VP communication area shared with
the kernel, where the relevant features of the currently executing
pthread are directly available to the kernel. Bits such as the current
signal mask, the currently pending per-thread signals, the cancelation
state, and the quantum countdown value appear in this shared area
whenever the library context switches a pthread onto a VP for
execution.
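Just to give a feel for it (the real layout is private to the library
and kernel; these field names are made up), the shared area carries
roughly this information for whichever pthread the VP is running:

    #include <signal.h>

    /* Hypothetical sketch; not the actual structure. */
    struct vp_comm_area {
        sigset_t sig_mask;      /* current pthread's signal mask */
        sigset_t sig_pending;   /* pending per-thread signals */
        int      cancel_state;  /* cancelation state for the pthread */
        int      quantum_left;  /* countdown to the quantum-expiration
                                   upcall */
    };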
Some textbooks out there do a pretty good job at describing 2-level
concepts. I'm sure Dave Butenhof's upcoming threads book from
Addison-Wesley probably does an OK job at it too :^).
Hope this is helpful,
Jeff