T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1469.1 | You're probably running out of memory segments | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Thu Jan 23 1997 08:41 | 4 |
1469.2 | | SMAUG::GANTYALA | | Thu Jan 23 1997 11:23 | 14 |
1469.3 | | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Thu Jan 23 1997 11:41 | 5 |
1469.4 | Going back to basic ... | SMAUG::GANTYALA | | Thu Jan 23 1997 12:16 | 352 |
1469.5 | Memory (i.e., page-availability) problem: vpagemax or swap space? | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Jan 23 1997 14:49 | 54 |
1469.6 | The story of vm-mapentries and vm-vpagemax | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Jan 23 1997 15:15 | 48 |
1469.7 | vpagemax ... | SMAUG::GANTYALA | | Thu Jan 23 1997 17:13 | 19 |
1469.8 | | SMURF::DENHAM | Digital UNIX Kernel | Fri Jan 24 1997 10:19 | 11 |
| Just for the record, can we see the output of the following commands?
# /sbin/sysconfig -q proc
# /sbin/sysconfig -q generic
It would be nice if we had greater granularity in finding the
source of pthread_create failures. Might be interesting to
compare SCS vs PCS (system vs process scope) creations. Maybe
not...
Remind me, what are the threads doing after they're created?
|
1469.9 | proc, generic, vm info ... | SMAUG::GANTYALA | | Fri Jan 24 1997 11:03 | 114 |
| >> Remind me, what are the threads doing after they're created?
In the test program listed in note *.5, nothing much: each thread just
increments a counter after a five-second sleep. The actual programs,
however, implement SNA protocols for connecting to IBM mainframe applications.
>> Just for the record, can we see the output of the following commands?
>> # /sbin/sysconfig -q proc
>> # /sbin/sysconfig -q generic
$/sbin/sysconfig -q proc
proc:
max-proc-per-user = 267
max-threads-per-user = 2048
per-proc-stack-size = 2097152
max-per-proc-stack-size = 33554432
per-proc-data-size = 134217728
max-per-proc-data-size = 1073741824
max-per-proc-address-space = 1073741824
per-proc-address-space = 1073741824
autonice = 0
autonice-time = 600
autonice-penalty = 4
open-max-soft = 4096
open-max-hard = 4096
ncallout_alloc_size = 8192
round-robin-switch-rate = 0
round_robin_switch_rate = 0
sched-min-idle = 0
sched_min_idle = 0
give-boost = 1
give_boost = 1
maxusers = 1000
task-max = 277
thread-max = 8192
num-wait-queues = 64
$/sbin/sysconfig -q generic
generic:
clock-frequency = 1024
booted_kernel = vmunix
booted_args = vmunix
lockmode = 2
lockdebug = 0
locktimeout = 15
max-lock-per-thread = 16
lockmaxcycles = 0
rt_preempt_opt = 0
rt-preempt-opt = 0
cpu_enable_mask = 18446744073709551615
cpu-enable-mask = 18446744073709551615
msgbuf_size = 4096
message-buffer-size = 4096
dump-sp-threshold = 4096
lite-system = 0
physio_max_coalescing = 65536
kmem-debug = 0
old-obreak = 1
user_cfg_pt = 45000
memstr-buf-size = 0
memstr-start-addr = 0
memstr-end-addr = 0
insecure-bind = 0
$/sbin/sysconfig -q vm
vm:
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
ubc-nfsloopback = 0
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 4096
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 536870912
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 1
vm-segment-cache-max = 50
vm-page-lock-count = 64
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1
|
1469.10 | malloc() bumped into something? | WTFN::SCALES | Despair is appropriate and inevitable. | Fri Jan 24 1997 11:27 | 7 |
| Jeff, any chance the malloc() region (i.e., "the break") is bumping into
something else? I.e., he's allocated all the memory available to malloc(),
even though there are plenty of pages and plenty of VA elsewhere in the
address space?
Webb
|
1469.11 | Is it a bug in OSF1 version 4.0? | SMAUG::GANTYALA | | Fri Jan 24 1997 16:20 | 16 |
| Now I have switched from OSF1 version 4.0 464 alpha to the older OSF1 version 3.2 148 alpha.
This system is configured with maxusers 512, which means it can now have a
maximum of 8232 threads. The test program listed in note *.4 was changed to
support the older threads version.
Note: With this version change the test program is able to create any
number of threads (up to 8232, of course), with no error message
from pthread_create. However, the system response is horrible, maybe
because of the system size.
Do I need to apply any fixes to OSF1 version 4.0 464 alpha? Or is it a known
problem in OSF1 version 4.0 464 alpha? If neither is true, then what is
the explanation?
-Ramesh
|
1469.12 | | SMURF::DENHAM | Digital UNIX Kernel | Fri Jan 24 1997 17:20 | 17 |
| > Jeff, any chance the malloc() region (i.e., "the break") is bumping
> into something else? I.e., he's allocated all the memory available to
> malloc(), even though there are plenty of pages and plenty of VA
> elsewhere in the address space?
Yeah, that's a pretty good possibility. It sounds like the case
on V3.2x where we found that vm_allocate() was passing stack garbage
as the start address and shooting holes in the address space.
If the code that succeeds on V3.2 (.11) has that fix, it could
be behaving better using vm_allocate than the malloc code on V4.0
is.
RE: .11 -- This isn't a known problem on V4.0x. Looks like we'll
need to do some investigation. Feel free to file a QAR against OSF_QAR
on GORGE. Or an IPMT or something....
|
1469.13 | | COL01::LINNARTZ | | Mon Jan 27 1997 10:19 | 11 |
| Even though I can't explain in detail how the nxm scheduling is done,
have you ever tried setting your max-threads-per-user
higher than 2048?
I think that max-threads-per-user is still the Mach-thread limit
accounted by the kernel, so if some scheduling is done in
user-library land, it could amount to a value bigger than 2048 that the
OS wouldn't see.
just a wag
Pit
|
1469.14 | It's a memory problem, not a thread problem. | WTFN::SCALES | Despair is appropriate and inevitable. | Mon Jan 27 1997 11:24 | 22 |
| I'm quite confident that the bug has basically nothing to do with DECthreads or
even threads, per se. The problem is that for whatever reason there simply is
not enough memory available to your process for what you are trying to do.
That is, this is either a configuration problem, or there is a problem in the
way that the system is using memory in your configuration.
The fact that it works on V3.x and doesn't work on V4.x, while interesting, is
not really useful. Many things changed between V3 and V4, not just DECthreads
but also the way in which it gets and uses memory.
I echo Jeff's recommendation that you enter a QAR.
One other point, if you expect to have 10K threads blocking in system calls
(e.g., sleep() or read()) all at the same time, you will need 10K kernel
threads, and so you will need to set max-threads-per-user appropriately.
However, this parameter should not affect the creation of user threads, only the
application's ability to execute once they all start blocking. (I.e., this is
the _next_ problem you'll see, not the current one.)
Webb
|
1469.15 | How max-thread limit works? | SMAUG::GANTYALA | | Mon Jan 27 1997 12:22 | 32 |
| Pit, I tried increasing max-threads-per-user up to 8192 (i.e., to the same
value as max-threads). I think this parameter is dependent on max-threads.
For example, with max-threads = 1024, set max-threads-per-user to 2048. At
boot time the max-threads-per-user value appears to be checked: if it is
greater than max-threads, it is changed to max-threads - 20 (in
this case, 1004). I tried various values to check this formula. I
couldn't locate any documentation about the dependency between these parameters
in the Unix 'System Administration' and 'System Tuning' documents.
I think that in DEC/OSF1 4.x the max-threads parameter may not be a limiting
factor in creating threads in the user address space. If the task runs with
superuser privilege, the process can create more threads than the max-threads
limit. To test this statement, I changed max-threads to 1024 and executed the
sample program, which fails again at the same number (2688 + kernel threads = 2697).
So I wonder whether the max-threads limit specifies the number of kernel threads
that can exist at a given time, or is it the total number of threads in kernel
and user space?
>> RE: .11 -- This isn't a known problem on V4.0x. Looks like we'll
>> need to do some investigation. Feel filing a QAR against OSF_QAR
>> on GORGE. Or an IPMT or something....
OK, I will log this problem on OSF_QAR on GORGE
-Ramesh
|
1469.16 | | SMAUG::GANTYALA | | Mon Jan 27 1997 14:55 | 21 |
| RE: .14
>> That is, this is either a configuration problem, or there is a problem in
>> the way that the system is using memory in your configuration.
Note .9 lists all the system parameters. I don't understand now which
parameter to change.
>> One other point, if you expect to have 10K threads blocking in system calls
>> (e.g., sleep() or read()) all at the same time, you will need 10K kernel
>> threads, and so you will need to set max-threads-per-user appropriately.
>> However, this parameter should not affect the creation of user threads, only
>> the application's ability to execute once they all start blocking. (I.e.,
>> this is the _next_ problem you'll see, not the current one.)
I agree with you.
>> I echo Jeff's recommendation that you enter a QAR.
The QAR number is 51210, on GORGE.
|
1469.17 | Kernel parameters for kernel threads | WTFN::SCALES | Despair is appropriate and inevitable. | Mon Jan 27 1997 16:14 | 11 |
| .15> I wonder whether the max-threads limit specifies the number of kernel
.15> threads that can exist at a given time?
Yes; it limits the number of threads which can be created on the system.
.15> Is it the total number of threads in kernel and user space?
Nope; the kernel thread parameters do not govern user threads.
Webb
|
1469.18 | | SMAUG::GANTYALA | | Wed Jan 29 1997 13:18 | 37 |
| This time the test was carried out on a sufficiently large system with
8192 MB of physical memory and 4287 MB of swap space, running OSF1 V4.0 464.
I changed vpagemax, thread-max, and mapentries to figures similar to the
previous ones (it doesn't matter with the default values).
Again, the test program fails while creating threads at the same number, 2688,
on this machine also. When it failed, the 'ps aux | grep cthread' command
output was as printed below:
root 732 16.5 0.5 130M 44M ttyp1 R + 11:32:38 2:04:15 cthread 3000
root 730 5.8 0.5 130M 44M ttyp2 S + 11:31:47 2:37:83 cthread 3000
Also, the 'vmstat' command output is printed below:
procs memory pages intr cpu
r w u act free wire fault cow zero react pin pout in sy cs us sy id
35520 30 16K 967K 46K 50 0 56 0 0 0 5 465 746 1 65 34
It is interesting to note that each process could create up to 2688
threads (in other words, one can have 2688 threads + kernel threads per process
with this test program). Of course, all processes are running in superuser mode.
And it works this way on all machines, irrespective of their system
configuration.
My strong feeling is that this has nothing to do with the system configuration
parameters; all I can think of is the resources handled at the kernel level
per address space when creating threads. Somehow the resource manager
is not able to allocate the required memory for the process in which the
threads are created. I strongly suspect something fishy is happening in
allocating the required resources to the process.
I have been struggling with this problem since last week, with all possible
combinations I could think of. The attempt was not
fruitful, but it is very important for us to test this, because all of our
products are heavily dependent on the threads created for each
session, and these will amount to thousands depending on the load.
Suggestions and comments are welcome.
-Ramesh
-Ramesh
|
1469.19 | Your process has malloc() "fenced in". | WTFN::SCALES | Despair is appropriate and inevitable. | Wed Jan 29 1997 17:09 | 20 |
| I think that you are probably hitting the limit on how much memory your process
can allocate (i.e., valloc(), which is the function used to allocate memory for
the threads).
I expect that, if you create a thread and then loop on calls to valloc(), you'll
find that you hit an otherwise unexplainable limit, one which corresponds to the
magic 2688 number.
I suspect that this is because the process memory allocator demands a contiguous
region of memory and that that region is placed in the virtual address space
such that its expansion is bounded by some other region, such as the process
stack or shared libraries or something. (I'm not very familiar with how the
Unix process address space is laid out; perhaps someone else can comment?)
If my suppositions are true, then in order to achieve more threads, you will
have to link your program or configure your system so that there is more
virtual address space available to the process memory allocator.
Webb
|
1469.20 | You don't really want 10K threads if they're all active!! | WTFN::SCALES | Despair is appropriate and inevitable. | Wed Jan 29 1997 17:21 | 15 |
| .18> all of our products are heavy dependent on the threads which are created
.18> for each session, and these will amount to thousands depending on the load.
| BTW, you -do- understand that even though you may have thousands of threads,
you have at most only a few (1-14) processors, right? Thus, if you are running
on a uniprocessor and all of those 10K threads are busy, any given thread will
only run once every FIFTEEN MINUTES or so. (10K threads is A LOT! Think about
what things would be like with 10K processes...)
You might want to consider a design with fewer threads representing clients and
a shared pool of threads to do the actual work. With -lots- of _busy_ threads,
the context-switch overhead will put a drain on your overall performance...
Webb
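[A sketch of the pooled design Webb suggests: a small pool of workers pulling per-session work items from a shared queue, instead of one busy thread per session. The names (work_t, queue_put, queue_get) are illustrative, not from the original programs.]

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct work { struct work *next; int session; } work_t;

static work_t *head, *tail;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Called when a session has something to do: enqueue it for the pool. */
void queue_put(int session)
{
    work_t *w = malloc(sizeof *w);   /* error check omitted in this sketch */
    w->next = NULL;
    w->session = session;
    pthread_mutex_lock(&lock);
    if (tail) tail->next = w; else head = w;
    tail = w;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Block until a work item is available, then dequeue it (FIFO). */
int queue_get(void)
{
    pthread_mutex_lock(&lock);
    while (head == NULL)
        pthread_cond_wait(&nonempty, &lock);
    work_t *w = head;
    head = w->next;
    if (head == NULL) tail = NULL;
    pthread_mutex_unlock(&lock);
    int s = w->session;
    free(w);
    return s;
}

/* Each of a handful of pool threads runs this, instead of thousands of
 * per-session threads each running their own loop. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int session = queue_get();
        /* ... do the per-session protocol work here ... */
        printf("worker %lu handling session %d\n",
               (unsigned long)pthread_self(), session);
    }
    return NULL;
}
```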
|
1469.21 | Threads are GOOD aren't they? | EDSCLU::GARROD | IBM Interconnect Engineering | Wed Jan 29 1997 19:45 | 30 |
| Re .-1
You're not serious are you!!
I'd really hope that you Threads guys can write better schedulers than
anything anyone else could do.
I'd say that if what we are doing couldn't be supported with threads
then it wouldn't work with any home grown scheduler either.
By the way, I doubt that all 10,000 threads would be doing things at
once. Most of them will spend most of their time idle. Unfortunately
they'll be idle on a select() or TCP/IP read() call, which I understand
from what you said earlier would mean we'll need a kernel thread
per user thread (lots of kernel resources, I guess). I guess it would be
better to have one thread doing a passive select() (actually poll(),
since select() can't handle that many sockets) and having the
individual per-session threads waiting on a condition variable. Would
this design be much more efficient, or would the poll() with a
10,000-element list cause too many problems? This is more a question
for the UNIX kernel guys, I guess.
Thanks for the hint on address space; we'll look at that. But surely
by default the address space isn't laid out on UNIX such that you
have to specially rearrange it. I'm familiar with VMS, where you've
got the whole of P0 space to grow into and the stacks are right up at
the top of P1 space. On UNIX, is there something that gets in the
way of VA expansion?
Dave
|
1469.22 | async I/O? | FREE::CAMBRIA | We're just one PTF from never talking to VTAM again | Thu Jan 30 1997 10:42 | 37 |
| > <<< Note 1469.21 by EDSCLU::GARROD "IBM Interconnect Engineering" >>>
> -< Threads are GOOD aren't they? >-
> I'd really hope that you Threads guys can write better schedulers than
> anything anyone else could do.
>
> I'd say that if what we are doing couldn't be supported with threads
> then it wouldn't work with any home grown scheduler either.
| This depends on how you do it. Say you have a "list" of file descriptors
you are waiting on for input (as I know you are, Dave). If you were to
use the fd you select on as an index into an array of pointers to "context"
which now (because it received something) has need of a "worker" thread,
your "home grown" scheduling isn't that bad. I concede that this is a
simplistic example, but you get the point.
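[The fd-as-index dispatch Mike describes might look like this sketch; session_ctx_t and the table size are illustrative placeholders.]

```c
#include <sys/select.h>
#include <string.h>
#include <unistd.h>   /* for the pipe-based demo below */

typedef struct { int fd; void *state; } session_ctx_t;

#define MAX_FDS 1024
static session_ctx_t *contexts[MAX_FDS];   /* indexed directly by fd */

/* Wait for input on the watched set and return the context whose fd
 * became readable (the fd itself is the lookup key), or NULL on error. */
session_ctx_t *next_ready(fd_set *watched, int maxfd)
{
    fd_set ready;
    memcpy(&ready, watched, sizeof ready);  /* select() clobbers its arg */
    if (select(maxfd + 1, &ready, NULL, NULL, NULL) <= 0)
        return NULL;
    for (int fd = 0; fd <= maxfd; fd++)
        if (FD_ISSET(fd, &ready))
            return contexts[fd];
    return NULL;
}
```

The returned context would then be handed to a worker from the pool; no home-grown scheduler state beyond the array is needed.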
> I guess it would be
> better to have one thread doing a passive select() (actually poll()
> since select() can't handle that many sockets) and having the the
> individual per session threads waiting on a condition variable.Would
> this design be much more efficient or would the poll() with a 10,000
> element list cause too many problems? This is more a question
> for the UNIX kernel guys I guess.
Never mind, you do see my point :-)
Why wouldn't a thread dedicated to using 1003.1b async I/O work?
(Or even one of the "worker threads"; just before finishing with some
"context", set up the next aio_read.) This question is more for everyone
else than DaveG, and it's the reason I replied here vs. just walking to
your cube.
I've read (sorry, I forget where -- probably comp.programming.threads) that
async I/O and pthreads don't play well together. This seems like just the
right time to ask about it a) in general (i.e., what POSIX says) and b) on
Dunix.
MikeC
|
1469.23 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Thu Jan 30 1997 10:51 | 20 |
| >I've read (sorry forget where, probably comp.programming.threads) that
>async io and pthread don't play well. This seems like just the right time
>to ask about it a) in general (ie. what posix says) and b) on Dunix.
I suspect you misinterpreted something. What I've seen (and also said, in
many places) is that async I/O is more complicated to use than threads, and,
for most applications, that complication is not worthwhile.
However, in some high-I/O applications, async I/O may be exactly what you
want. Even though it's harder to use than threads, it can be much more
efficient when you've got lots of outstanding I/O requests. Just as a thread
is "cheaper" than a process, the context that's kept for an async I/O can be
cheaper than a thread.
If you've got, typically, "thousands" of outstanding I/O requests, and rarely
will more than a few require application activity at a time, you're best off
using async I/O with one or two server threads to pick up completed I/Os and
process them.
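[The pattern Dave describes could be sketched like this: many outstanding aio_read()s with one server thread collecting completions via aio_suspend(). NCONN stands in for "thousands"; error handling is trimmed, and the fds[] setup is assumed.]

```c
#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>    /* for the demo usage */
#include <unistd.h>

#define NCONN 4   /* stand-in for "thousands" of connections */

static struct aiocb cbs[NCONN];
static char bufs[NCONN][512];

/* Post one outstanding read per connection; each aiocb is the cheap
 * per-I/O context that replaces a whole blocked thread. */
void start_reads(int fds[NCONN])
{
    for (int i = 0; i < NCONN; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);   /* offset 0, no signal */
        cbs[i].aio_fildes = fds[i];
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = sizeof bufs[i];
        aio_read(&cbs[i]);
    }
}

/* One (or two) server threads run this: sleep until any I/O completes,
 * process it, and re-arm the read. */
void server_loop(void)
{
    const struct aiocb *list[NCONN];
    for (int i = 0; i < NCONN; i++) list[i] = &cbs[i];

    for (;;) {
        aio_suspend(list, NCONN, NULL);      /* wait for any completion */
        for (int i = 0; i < NCONN; i++) {
            if (aio_error(&cbs[i]) == 0) {
                ssize_t n = aio_return(&cbs[i]);
                printf("conn %d: %zd bytes ready\n", i, n);
                aio_read(&cbs[i]);           /* re-arm for the next read */
            }
        }
    }
}
```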
/dave
|
1469.24 | I'm always serious (just kidding... ;-) | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Jan 30 1997 12:26 | 16 |
| .21> Re .-1
.21>
.21> You're not serious are you!!
| Um, actually, yeah, I am. And it has nothing to do with the scheduler. It's
borne out by the premise that there are 10,000 busy threads. If they are not
all busy, then things start to look a lot more reasonable (at least in terms
of responsiveness). However, even so, 10,000 idle threads take up a great
amount of process resources and, if they are all blocked in the kernel, kernel
resources as well. The notion of using async I/O or poll() to provide work for
a (relatively!) small pool of server threads seems a lot more reasonable.
(Although, as I think about it, the poll() solution has a fairly tricky
wrinkle in it... :-} )
Webb
|
1469.25 | Try the Digital UNIX Crash Dump Analyzer ... | PTHRED::PORTANTE | Peter Portante, DTN 381-2261, (603)881-2261, MS ZKO2-3/Q18 | Fri Jan 31 1997 18:37 | 3 |
| See http://www.zk3.dec.com/~shashi/cda.html
-Peter
|
1469.26 | Creating large numbers of pthreads | NETRIX::"[email protected]" | John Piacente | Fri Feb 28 1997 13:39 | 20 |
| Webb was right (1469.14), it's a memory problem.
| For threads with a default stack size on a typically configured machine,
the maximum number created (about 2000) occurs when VSZ as reported by
ps reaches 130M. (Your mileage may vary.)
This limit is enforced by at least two kernel configuration attributes.
The first is vm-vpagemax, which is set at 16384 (times the 8192-byte page
size, about 130 MB).
The second is per-proc-data-size, which is also set at about 130 MB.
If you double both values, VSZ can grow to 260 MB, and the number of
threads can grow proportionately.
-John
[Posted by WWW Notes gateway]
|
1469.27 | | SMURF::DENHAM | Digital UNIX Kernel | Fri Feb 28 1997 14:37 | 3 |
| What a pleasure to see this data captured so neatly. John,
please use this to start a troubleshooting guide for threads
or something!
|