T.R | Title | User | Personal Name | Date | Lines |
---|
1515.1 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Thu Apr 03 1997 07:59 | 5 |
| No useful comment can possibly be made without knowing which VERSION of the
O/S you're using. At least, I can infer that you're using UNIX, which is more
information than some people manage to convey...
/dave
|
1515.2 | Lotsa possibilities... | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Apr 03 1997 11:45 | 10 |
| I'd say that it's most likely an inter-process communication issue (i.e., a
race condition resulting from the order in which your processes are running).
However, remember that the EV5 chip takes more liberties with memory writes
than did the EV4, so if you're not using mutexes around memory which is
shared between threads (or processes??) you'll be subject to
read/write-ordering problems.
Webb
|
1515.3 | 3.2G | TUXEDO::CHUBB | | Thu Apr 03 1997 13:46 | 7 |
| Thanks for the info Webb. The fact that there's a practical
software-behavior difference between EV5 and EV4 is news to me, and may
explain why we don't hit the problem on every SMP machine.
I certainly meant to include the Unix version: it's 3.2G.
-- brandon
|
1515.4 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Thu Apr 03 1997 14:07 | 24 |
| > The fact that there's a practical
> software-behavior difference between EV5 and EV4 is news to me
Just wait 'til you see an EV6. EV4 was a good uniprocessor, but a poor
multiprocessor. EV5 is pretty slick. EV6 blazes. A good part of the
improvements come from mining the deliberate looseness of the memory ordering
and latency rules in the Alpha architecture (of which EV4 took no advantage).
If you touch any shared memory without a mutex or low-level hardware
operations, you're in big trouble!
Use a mutex for every access to shared memory (not just for WRITES... also
for READS) and you'll be fine. You can skip the mutex for READS when you are
only reading ONE value and you don't care if you get the LATEST value. If you
care about getting the latest value, or you're reading more than one value,
you always need a mutex. (The next level of optimization is to figure out the
proper use of MB instructions when you care about sequence but not latency.)
The main reason I was so concerned about the version, by the way, is that
bind_to_cpu is completely meaningless for a threaded process on 4.0 (up until
4.0D comes out) -- that would have made your apparent differences based on
CPU completely coincidental. But, of course, you should always give the
version anyway!
/dave
|
1515.5 | bind_to_cpu meaningless before 4.0D? | HYDRA::SOUZA | For Internal Use Only | Sat May 10 1997 14:58 | 8 |
| re: -1
Can you elaborate on bind_to_cpu being meaningless before 4.0D?
thanks
bob
|
1515.6 | Why do you think you need to request a binding? | WTFN::SCALES | Despair is appropriate and inevitable. | Mon May 12 1997 14:52 | 15 |
| .5> Can you elaborate on bind_to_cpu being meaningless before 4.0D?
For process contention scope threads (i.e., all threads as of V4.0, and the
default threads in V4.0D) there is no way to force a particular thread to run on
a particular CPU. That is, calling bind_to_cpu() will not have the effect that
you expect. (It may or may not return an error, depending on what version you
are calling it on.)
In the next major functional release, we expect to provide a mechanism for
restricting threads' access to the available CPUs. However, as I said, there is
no way to do that in V4.0[A|B|C], and you have to resort to using system
contention scope threads to do it on V4.0D.
Webb
|
1515.7 | scheduling description? | HYDRA::SOUZA | For Internal Use Only | Mon May 12 1997 15:01 | 5 |
| Thanks.
Is there a description of how all this scheduling works anywhere?
|
1515.8 | What do you want to know? | WTFN::SCALES | Despair is appropriate and inevitable. | Mon May 12 1997 15:31 | 19 |
| .7> Is there a description of how all this scheduling works anywhere?
Which "this scheduling"? Do you mean "process contention scope" vs. "system
contention scope"? Do you mean V3 vs V4? Do you mean "how does two-level
scheduling work"? Do you mean how do threads' scheduling policies and priorities
affect their access to the processor(s)? ...
Yours is kind of an open question... Yes, I think there are descriptions of
various aspects of "thread scheduling" in the DECthreads documentation and
elsewhere in this conference. However, none of it is too detailed, since many
of the capabilities are new and most of them have been changing in various small
ways in the past three years and will continue to do so at least through the
next release of each of the major operating systems.
I don't have the luxury of being able to write a comprehensive review here, but
if you have a specific question or two, I'd be happy to try to address it.
Webb
|
1515.9 | scheduling | HYDRA::SOUZA | For Internal Use Only | Mon May 12 1997 16:07 | 18 |
| I guess a more specific question would be better.
A customer who is familiar with what Sun calls lightweight threads asked me:
How are 'user threads' (which I think is what we would call process contention
scope) mapped to 'kernel threads' (which I think is what we would call
system contention scope), and is it possible to control this mapping?
How are threads bound to a CPU, and is it possible to control the binding?
They are running Digital Unix 4.0B.
Whether these are reasonable questions is not clear.
Thanks
bob
|
1515.10 | | SMURF::DENHAM | Digital UNIX Kernel | Mon May 12 1997 18:22 | 5 |
| Well, as starters about how this stuff works in general, see
note 9772 in tle::digital_unix.
Before V4.0D, there is no way to control any mapping of a thread
to anything, kernel thread or CPU.
|
1515.11 | It's too bad that people think they need this stuff... | WTFN::SCALES | Despair is appropriate and inevitable. | Mon May 12 1997 20:37 | 69 |
| .9> A customer who is familiar with what Sun calls lightweight threads asked me
We should have a very good story to tell, relative to Sun...however, as we well
know, Sun tends to be much better at spinning things than we are...
.9> How are 'user threads' (which I think is what we would call process
.9> contention scope) mapped to 'kernel threads' (which I think is what we
.9> would call system contention scope), and is is possible to control this
.9> mapping?
Both Digital Unix (in either V3.2 or again starting in V4.0D) and current
versions of Solaris offer the ability to create a thread which is scheduled by
the kernel. It sounds like Sun calls these "kernel threads"; we use the POSIX
term, "system contention scope thread". Either way, when a processor selects
something to run, it selects the kernel-scheduled thread with the highest
scheduling precedence and switches to its context, without regard to which
process the thread is in.
Both Digital Unix (as of V4.0) and current versions of Solaris offer threads
which are scheduled by the threads library. It sounds like Sun calls these
"user threads"; we use the POSIX term, "process contention scope thread".
(Since both process contention scope and system contention scope threads can
execute application (i.e., "user") code in non-privileged (i.e., "user") mode,
the terms "user threads" and "kernel threads" can be confusing...) These
threads are executed in the context of one or more kernel-scheduled entities,
each of which selects a process contention scope thread from those available in
the process, based on the PCS threads' scheduling parameters. (Meanwhile, the
OS kernel selects which of the "kernel-scheduled" entities to run at any given
time, based on a separate set of scheduling parameters.)
The big difference between Digital Unix and Solaris is that, when using process
contention scope threads, you don't have to guess a priori how many
kernel-scheduled entities (we call them "virtual processors" (VPs)) you will
need if you're running on Digital Unix. DECthreads creates one VP (as needed)
for each processor which is available to the process. If you have fewer VPs
than that, then your process cannot take full advantage of the machine; if you
have more VPs than that, then they fight among themselves for access to the
processors and you lose throughput to the context-switch overhead. However, Sun
doesn't currently have the ability to replace VPs when they block (e.g., in a
system call, for I/O, or for page faults). This means that sometimes your
process will have too few VPs, unless you set the "concurrency level" up, in
which case at times you'll have too many. (This is one of the major selling
points of having the full two-level scheduling model -- you can have analogous
problems if you rely solely on system contention scope threads.)
Anyway, to get back toward your question... Digital's implementation of threads
is targeted at maximizing throughput. Thus, we have tried to obviate the
need for use of big hammers like binding. Instead, we try to schedule
application threads wherever and however is most efficient from a throughput
perspective. (Also, binding tends to be very sensitive to the sort of machine
you're running on, so it's not very flexible in terms of a single executable
running on different machines; whereas, DECthreads is completely adaptive.)
Nevertheless, we are sensitive to application-providers' interest in being able
to control this stuff. In V4.0D an application is able to bind a given system
contention scope thread to a specific physical processor so that it cannot run
anywhere else (using bind_to_cpu()). Not that I would recommend it, but by
careful use of this stuff (and exhaustive knowledge of your application and
system configuration), you can arrange it so that a specific system contention
scope thread can get exclusive use of a specific processor (which is a
horrendous waste, but there you go). In the following functional release of
Digital Unix, there will be a new interface which allows you to do analogous
things with process contention scope threads (although, hopefully, the interface
will prove much more flexible and effective than bind_to_cpu()). So, if your
customer really wants to shoot his application in the foot, you can now tell him
when the bullets will be available...
Webb
|
1515.12 | ask the right question, get the right answer... | HYDRA::SOUZA | For Internal Use Only | Mon May 12 1997 20:52 | 4 |
| Thanks very much, that's very helpful.
bob
|
1515.13 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Mon May 19 1997 09:34 | 59 |
| I found Webb's reply a little confusing. So even though this is a bit late (I
was out last week), I'm going to try my own spin -- perhaps more legible
given that I know more about Solaris than Webb ;-)
Solaris:
1) The kernel provides LWPs, "light weight processes", which the kernel
schedules onto processors. Solaris supports a form of realtime scheduling
control over these LWPs, though not the POSIX 1003.1b APIs, and it also
timeslices non-realtime LWPs.
2) The Solaris thread libraries (libpthread for POSIX and libthread for UI
threads) initially create an LWP for each processor available to the process.
You can create "bound" threads (THR_BOUND flag in the UI interface, or
PTHREAD_SCOPE_SYSTEM in the POSIX interface) which are permanently attached
to a new (private) LWP. Or you can create "unbound" threads (which are the
default in both interfaces).
3) The thread library does user-mode context switching of unbound threads,
when a thread blocks on a mutex, condition variable, or read/write lock,
scheduling them among the various LWPs that aren't attached to BOUND threads.
Note that there is no support for realtime scheduling OR timeslicing in the
user mode scheduler. Furthermore, when a thread blocks in the kernel, e.g.,
on a read(2) call, the LWP on which it is currently scheduled remains bound
to the thread until it returns.
4) When the last unbound LWP in a process is blocked in the kernel, the
kernel issues a special signal to allow the library to create additional
LWPs. This still, however, reduces the concurrency of the process to 1.
Digital UNIX:
1) The kernel provides Mach threads, of which the user thread library
utilizes a special subset termed "scheduler threads" to implement what we
call "virtual processors". The kernel supports full POSIX realtime scheduling
for Mach threads, but (prior to 4.0D) this isn't useful to threaded programs.
The kernel also timeslices non-realtime Mach threads.
2) The thread library (libpthread) initially creates a scheduler thread for
each processor available to the process. Prior to 4.0D, you can create only
process contention scope threads (PCS). In 4.0D and later, you can create
either PCS threads or SCS threads (system contention scope). PCS is "unbound"
in Solaris terms, while SCS is "bound". (Digital UNIX uses the POSIX terms
exclusively, while Solaris still clings to the proprietary UI thread terms.)
3) The thread library does context switching of PCS threads among the
scheduler threads it controls. Blocking on process synchronization objects
(mutexes, condition variables, etc.) occurs completely in user mode. When a
PCS thread blocks in the kernel, an "upcall" occurs -- the kernel provides a
"replacement virtual processor" on which the library immediately schedules a
new PCS thread, if any are ready to run, instead of reducing the process
concurrency. The thread library supports the full POSIX scheduling model, and
timeslicing, in user mode.
4) The process concurrency can be artificially reduced only when the process
(or user) has exceeded the allowed quota of kernel threads, so that no
additional replacement VPs can be provided by the kernel, in which case the
process simply waits until a blocked kernel call completes. (The limitation
is entirely resource-bounded, never arbitrary.)
|