
Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9664.0. "Strange Paging Behavior on A4100 Benchmark" by TBC001::WONG () Tue Apr 29 1997 20:24

    Strange paging behavior occurred on my Alpha 4100 while running a
    Universe benchmark.  With the number of simulated users ramped up to
    230, the system was still behaving really well (15% CPU utilization,
    low I/O, page faults in the hundreds and great interactive response).
    However, at 256 users, the system became extremely sluggish.  The
    four CPUs were all maxed out, with 70% in sys mode and 30% in user mode.
    vmubc indicated that I was doing 3000+ page faults per second, Polycenter
    Performance Advisor indicated that I was doing 10000 page faults per
    second, whereas vmstat told me that I was doing 30000 page faults per
    second.  I just didn't know which one to believe anymore.  One
    thing I observed was that the vmubc screen refresh rate was not affected
    (still close to one second).  The vmstat refresh rate slowed down by up to
    tenfold.  Polycenter Data Collector missed a couple of samples every
    ten minutes.  Still, I could not explain the large discrepancies.
    
    Another interesting thing was that I had 4GB of memory and half was still
    idle at peak time.  I/O was never higher than 30 per second on any drive.
    No swapping was observed.  The UBC miss rate was close to zero (changing
    ubc-maxpercent from 100 to 50 did not seem to make a difference
    anyway).  How could the page fault rate go sky high when I had tons of
    memory?  These were interactive users repeating similar scripts!
    An interesting observation was that the page fault rate
    started doubling when the number of users exceeded 230.  It went up as
    high as 30k page faults per second, with page-in rates reaching 10k.
    It seems there was a threshold beyond which things started to
    degenerate.
    
    I was really baffled.  Your comments and suggestions are most
    appreciated.
    
    P.S.  System configuration was:
    AlphaServer 4100 with 4xcpu (400 MHz)
    			  4GB Mem
    			  1HSZ with 70G storage
    Digital UNIX 4.0B and Universe
    Home grown scripts for user simulation (with Expect)

9664.1. "Strange Paging Behavior" by NNTPD::"[email protected]" (Larry Woodman) Wed Apr 30 1997 16:23

First of all, I only know how vmstat collects its paging statistics.  I don't
know what to say about the Polycenter tools.  As far as vmstat is concerned,
the fault statistic reports the number of times that the kernel fault handler
was invoked.  The zero statistic reports the number of times the
zero-filled-on-demand handler was invoked.  The cow statistic reports the
number of times the copy-on-write handler was invoked.  There is no confusion
possible in these fields of vmstat.  The pin, or page-in, statistic can be
confusing in any utility.  The reason is that pages might already be cached in
the UBC or in a file-system-dependent cache.  The tool has to decide how it is
going to charge for hits in the cache: either as a pin or as nothing.  The
vmstat utility is probably more aggressive about counting a page that is in
the cache as a page-in, whereas some of the other tools don't consider
file-backed page faults to be page-ins at all.
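
To make the accounting difference concrete, here is a minimal sketch of two
tools looking at the same stream of faults and reporting different page-in
counts.  The event labels and the charge_cache_hits switch are made up for
illustration; they are not kernel structures or options of vmstat or any
other tool.

```python
# Hypothetical fault events: (kind, page_was_already_in_cache).
# These labels are illustrative only, not real kernel data.
events = [
    ("zero_fill", False),
    ("cow", False),
    ("file_backed", True),   # page found in the cache, no disk read
    ("file_backed", True),
    ("file_backed", False),  # page had to be read from disk
]


def count_pageins(events, charge_cache_hits):
    """Count page-ins under one of two assumed accounting policies."""
    pageins = 0
    for kind, in_cache in events:
        if kind != "file_backed":
            continue  # zero-fill and COW faults are not page-ins here
        if in_cache and not charge_cache_hits:
            continue  # this policy charges a cache hit as nothing
        pageins += 1
    return pageins


faults = len(events)
aggressive = count_pageins(events, charge_cache_hits=True)
conservative = count_pageins(events, charge_cache_hits=False)
print(faults, aggressive, conservative)
```

Both policies agree on the total number of faults; they only disagree on how
many of those faults are reported as page-ins.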

As far as vmstat slowing down is concerned, that all depends on what the
interval is set to.  Long-term sleeping processes are likely to become outswap
candidates.  Is the system out of memory and paging out?  You said that half
of the 4 GB of memory was idle; what do you mean by this, free?  Does the
actual performance of this machine degrade substantially, or is it that these
tools run more poorly when the system is under a heavier load?

Larry Woodman


[Posted by WWW Notes gateway]

9664.2. "Problem fixed" by TBC001::WONG () Thu May 01 1997 13:56

	The problem has been fixed, but we still do not fully know what
caused the many page faults.  We increased all the semaphore parameters
and saw the page faults come way down and the CPU become 90% free.  My
interpretation is that the old semaphore parameters caused many commands
in many scripts to fail.  Since the scripts were coded as loops, we had
many image activations.  The images activated must already have been cached
because the many scripts all do similar things.  However, there must be
something during image activation that is counted as a page-in.  The
failed scripts kept on triggering these page-ins, and hence we had up
to 10k page-ins/sec.

	After increasing the semaphore parameters, we were able to start
up 1000 Universe scripts and still had 35% CPU free.  We had page faults
in the 10k range, but most of them were soft faults (largely demand zero).
Page-ins ranged from 600 to 1k, which is much more acceptable.
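
As a rough check on the numbers above, the share of faults counted as
page-ins can be worked out directly.  This is only arithmetic on the figures
quoted in this thread, treating "10k range" as exactly 10000 and "30k" as
exactly 30000, which is an assumption.

```python
def pagein_fraction(pageins_per_sec, faults_per_sec):
    """Fraction of page faults that were counted as page-ins."""
    return pageins_per_sec / faults_per_sec


# Before the fix: up to 30k faults/sec with page-ins reaching 10k/sec.
before = pagein_fraction(10_000, 30_000)

# After the fix: about 10k faults/sec with 600 to 1k page-ins/sec.
after_low = pagein_fraction(600, 10_000)
after_high = pagein_fraction(1_000, 10_000)

print(f"before: {before:.0%}")
print(f"after:  {after_low:.0%} to {after_high:.0%}")
```

So roughly a third of the faults were counted as page-ins before the change,
against 6% to 10% afterwards.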

I've looked further at the differences between vmstat, vmubc and performance
monitor (PMGR).  Even at light load, they did not agree on page fault rates.
Personally, I have more faith in vmstat.

ref.1	The system was never out of memory.  The page-out rate was zero all
	the time.  We had more than 2GB on the free list.  The system
	was still very responsive to interactive sessions when the page
	fault problem started.  The CPUs were 100% busy.  'vmstat 5' refreshed
	its screen at 50-second intervals instead of 5.
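
One thing that may be worth ruling out when a tool's refresh stretches like
this is which interval its per-second rates are divided by.  I do not know
how vmstat, vmubc or PMGR actually compute their rates, so the sketch below
is only an illustration of how much the choice matters.  The counter readings
are made up and rate_per_second is a hypothetical helper, not part of any of
these tools.

```python
def rate_per_second(prev_count, curr_count, elapsed_seconds):
    """Per-second rate from two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_count - prev_count) / elapsed_seconds


# Made-up cumulative fault counter readings, for illustration only.
prev_faults, curr_faults = 1_000_000, 1_100_000

nominal_interval = 5.0   # the interval requested with 'vmstat 5'
actual_interval = 50.0   # how long the refresh really took

by_nominal = rate_per_second(prev_faults, curr_faults, nominal_interval)
by_actual = rate_per_second(prev_faults, curr_faults, actual_interval)
print(by_nominal, by_actual)
```

The same counter delta gives rates a factor of ten apart depending on whether
the requested or the measured interval is used, so comparing tools under heavy
load is safer when each sample is timestamped.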