| First of all, I only know how vmstat collects its paging statistics; I
don't know what to say about the Polycenter tools. As far as vmstat is
concerned, the fault statistic reports the number of times the kernel fault
handler was invoked. The zero statistic reports the number of times the
zero-filled-on-demand handler was invoked. The cow statistic reports the
number of times the copy-on-write handler was invoked. There is no confusion
possible in these fields of vmstat. The pin, or page-in, statistic can be
confusing to any utility. The reason is that pages might be cached in the
UBC or a file-system-dependent cache, and this is a source of confusion.
The tool has to decide how it is going to charge a hit in the cache: either
as a page-in or as nothing. vmstat is probably more aggressive about
counting a page that is in the cache as a page-in, whereas some of the other
tools don't consider file-backed page faults to be page-ins at all.
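Roughly, the accounting looks like the sketch below. This is not the real
kernel source; the structure, the names, and the cache-accounting flag are
made up to illustrate the point: fault, zero and cow are plain per-handler
counters, while the page-in number depends on the policy a tool applies to
faults satisfied from the cache.

/*
 * Hypothetical sketch of the counter relationships described above.
 * Every entry to the fault handler bumps "fault"; the zero-fill, COW
 * and page-in counters record which sub-handler resolved it.  The
 * ambiguous case is a file-backed fault satisfied from the UBC: each
 * tool decides for itself whether to charge that as a page-in.
 */
#include <stdbool.h>
#include <stdio.h>

struct vm_counters {
    unsigned long fault;  /* kernel fault handler invocations */
    unsigned long zfod;   /* zero-filled-on-demand ("zero")   */
    unsigned long cow;    /* copy-on-write                    */
    unsigned long pin;    /* page-ins                         */
};

enum fault_kind { ZERO_FILL, COPY_ON_WRITE, FILE_BACKED };

static struct vm_counters vm;

static void
handle_fault(enum fault_kind kind, bool satisfied_from_cache,
             bool charge_cache_hit_as_pagein)
{
    vm.fault++;                         /* every handler entry */

    switch (kind) {
    case ZERO_FILL:
        vm.zfod++;                      /* unambiguous */
        break;
    case COPY_ON_WRITE:
        vm.cow++;                       /* unambiguous */
        break;
    case FILE_BACKED:
        if (!satisfied_from_cache)
            vm.pin++;                   /* real read from disk */
        else if (charge_cache_hit_as_pagein)
            vm.pin++;                   /* vmstat-style: cache hit still a page-in */
        /* else: charged as nothing (some other tools) */
        break;
    }
}

int main(void)
{
    /* Two identical file-backed faults that hit the cache, counted
     * under the two accounting policies. */
    handle_fault(FILE_BACKED, true, true);
    handle_fault(FILE_BACKED, true, false);
    printf("fault=%lu pin=%lu\n", vm.fault, vm.pin);  /* fault=2 pin=1 */
    return 0;
}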
As far as vmstat slowing down is concerned, that all depends on what the
interval is set to. Processes that sleep for a long time are likely to
become outswap candidates.
Is the system out of memory and paging out? You said that half of the 4 GB
of memory was idle; what do you mean by this, free? Does the actual
performance of this machine degrade substantially, or is it just that these
tools run more poorly when the system is under a heavier load?
Larry Woodman
[Posted by WWW Notes gateway]
|
| The problem has been fixed, but we still do not fully know what caused
the many page faults. We increased all the semaphore parameters, and the
page fault rate came way down and the CPU became 90% free. My interpretation
is that the old semaphore parameters caused many commands in many scripts to
fail. Since the scripts were coded as loops, we had many image activations.
The images activated must already have been cached, because the many scripts
all do similar things. However, something during image activation must be
counted as a page-in. The failed scripts kept triggering these page-ins,
and hence we saw up to 10k page-ins/sec.
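What we think each short-lived command was effectively doing is sketched
below. The retry structure and the exact limit that was hit are our
assumption; only the semget()/semctl() behaviour is standard System V IPC.
With the old parameters, semget() would fail (for example with ENOSPC when
the system-wide limit on semaphore sets is reached), the command would exit,
and the script's loop would run it again, so every pass was a fresh image
activation with its page-ins.

/*
 * Sketch of the suspected failure mode.  The semaphore usage here is
 * made up; only the System V semget()/semctl() calls are real.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    /* Each short-lived command grabs a private semaphore set, does
     * its work, and releases the set. */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1) {
        /* With the old (too small) semaphore parameters this is the
         * path we believe was taken: the command fails here, the
         * script's outer loop re-runs it, and every re-run is another
         * image activation with its attendant page-ins. */
        fprintf(stderr, "semget: %s\n", strerror(errno));
        return 1;
    }

    /* ... the command's real work would go here ... */

    semctl(semid, 0, IPC_RMID);         /* release the set */
    return 0;
}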
After increasing the semaphore parameters, we were able to start up
1000 Universe scripts and still had 35% of the CPU free. We had page faults
in the 10k range, but most of them were soft faults (largely demand-zero).
Page-ins ranged from 600 to 1k, which is much more acceptable.
I've looked more at the differences between vmstat, vmubc, and the
performance monitor (PMGR). Even at light load, they did not agree on page
fault rates. Personally, I have more faith in vmstat.
ref.1 The system was never out of memory. The page-out rate was zero all
the time. We had more than 2 GB on the free list. The system was
still very responsive to interactive sessions when the page fault
problem started. The CPUs were 100% busy. 'vmstat 5' refreshed
its screen at 50-second intervals instead of 5.
|