| I'll take a WAG and assume that the system was able to read the data
from /tmp faster than it wrote to /var/tmp, and as a result wound up
filling up the unified buffer cache with pending disk writes. With
the default setting of ubc-maxpercent=100%, the UBC can grow to 100%
of available memory. If this is happening, you've got little or no
memory left for your other processes to run.
Try tuning ubc-maxpercent down to 50% and see if the same thing
happens. To do this, add the following to /etc/sysconfigtab:
vm:
    ubc-maxpercent = 50
then reboot and try the test again.
Martin Moore
Digital UNIX Support
Atlanta CSC
|
| Try tuning ubc-maxpercent down to 50% and see if the same thing
happens. To do this, add the following to /etc/sysconfigtab:
vm:
    ubc-maxpercent = 50
then reboot and try the test again.
Well, I did that, and verified via dbx that the value of ubc_maxpercent was
indeed 50. It didn't change a thing. In fact, I did an even more basic
test: I wrote a couple-line C program that merely fills an N-byte buffer
with junk and a newline in position N, and then writes the result to stdout
until it has written M bytes. With M=100000000, N=1000 and stdout redirected
to a disk file, I got the same misbehavior: After a few seconds, the only
thing that worked on the screen was the pointer, which moved around just fine.
But any other operations (changing focus, clicking to change stacking, input
to the active window) simply hung until the disk write completed.
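For anyone who wants to reproduce this, a minimal sketch of such a writer might look like the following. It is not the original program: the function names fill_record and write_junk, and the use of 'x' as the junk byte, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fill buf with n-1 junk bytes and put a newline in the last position.
   The junk byte 'x' is an arbitrary choice, not taken from the original. */
void fill_record(char *buf, size_t n)
{
    assert(n > 0);
    memset(buf, 'x', n);
    buf[n - 1] = '\n';
}

/* Write n-byte records to out until at least m bytes have been written.
   Returns the number of bytes written, or 0 if allocation or a write fails. */
unsigned long write_junk(FILE *out, size_t n, unsigned long m)
{
    unsigned long written = 0;
    char *buf;

    if (n == 0 || (buf = malloc(n)) == NULL)
        return 0;
    fill_record(buf, n);

    while (written < m) {
        if (fwrite(buf, 1, n, out) != n) {
            free(buf);
            return 0;
        }
        written += n;
    }

    free(buf);
    return written;
}
```

Calling write_junk(stdout, 1000, 100000000) from main, with stdout redirected to a disk file, corresponds to the N=1000, M=100000000 case described above.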
Further evidence that the problem isn't with output buffers: I also wrote a
little C program that merely reads stdin and does nothing with the data. I
redirected its stdin to the 100-MB file generated above. After a few seconds,
most of the stuff on the screen was again hung. This one wasn't quite as
drastic; the hangs typically lasted 5-10 seconds, then everything woke up
and ran like crazy for a few seconds before hanging again.
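A reader of this kind can be sketched in a few lines. Again this is an illustration rather than the original program: the function name drain and the 8192-byte buffer size are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Read from in until end of file and throw the data away.
   Returns the total number of bytes read. */
unsigned long drain(FILE *in)
{
    char buf[8192];          /* buffer size is an arbitrary choice */
    unsigned long total = 0;
    size_t got;

    assert(in != NULL);
    while ((got = fread(buf, 1, sizeof buf, in)) > 0)
        total += got;
    return total;
}
```

Calling drain(stdin) from main, with stdin redirected from the 100-MB file, corresponds to the read-only test described above.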
So it's not just cp that causes the problem. A small loop that only reads
from or only writes to a disk file can hang most of the other processes on
the system.
During both of these tests, I kept a copy of top running in one window with
a 1-sec refresh time. Its output would hang at the same time as everything
else's. This could be either because top itself was hung, or because its
xterm was hung; I don't know how to distinguish these cases. But top itself
is a rather small process (RES=196K), and uses only a couple percent of the
cpu at its highest refresh rate, so the problem isn't only with big programs.
Any other ideas?
[This is a serious "showstopper" to some customers; if it can't be solved, the
next test case will probably be on an NT system, and if that also fails, then
maybe a Solaris or SGI system. ;-]
|