
Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

9664.0. "Strange Paging Behavior on A4100 Benchmark" by TBC001::WONG () Tue Apr 29 1997 19:24

    Strange paging behavior occurred on my Alpha 4100 while running a
    Universe benchmark.  With the number of simulated users ramped up to
    230, the system was still behaving very well (15% CPU utilization,
    low I/O, page faults in the hundreds, and great interactive response).
    However, at 256 users, the system became extremely sluggish.  The
    four CPUs were all maxed out, with 70% in sys mode and 30% in user mode.
    vmubc indicated 3000+ page faults per second, Polycenter
    Performance Advisor indicated 10000 page faults per
    second, and vmstat reported 30000 page faults per
    second.  I no longer knew which one to believe.  One
    thing I observed was that the vmubc screen refresh rate was not affected
    (still close to one second), while the vmstat refresh rate slowed down by
    up to tenfold.  Polycenter Data Collector missed a couple of samples every
    ten minutes.  Still, I could not explain the large discrepancies.
    
    Another interesting thing was that I had 4 GB of memory and half of it
    was still idle at peak time.  I/O was never higher than 30 per second on
    any drive.  No swapping was observed.  The UBC miss rate was close to zero
    (changing ubc-maxpercent from 100 to 50 did not seem to make a difference
    anyway).  How could the page fault rate go sky high when I had tons of
    memory?  These were interactive users repeating similar scripts!
    An interesting observation was that the page fault rate
    started doubling when the number of users exceeded 230.  It went as
    high as 30k page faults per second, with page-in rates reaching 10k.
    It seems there was a threshold beyond which things started to
    degenerate.
    
    I was really baffled.  Your comments and suggestions are most
    appreciated.
    
    P.S.  The system configuration was:
    AlphaServer 4100 with 4 CPUs (400 MHz)
                          4 GB memory
                          1 HSZ with 70 GB storage
    Digital UNIX 4.0B and Universe
    Home-grown scripts for user simulation (with Expect)

T.R	Title	User	Personal Name	Date	Lines
9664.1	Strange Paging Behavior	NNTPD::"[email protected]"	Larry Woodman	`Wed Apr 30 1997 15:23`	30
	First of all, I only know how vmstat collects its paging statistics; I can't speak for the Polycenter tools.

	As far as vmstat is concerned, the fault statistic reports the number of times the kernel fault handler was invoked. The zero statistic reports the number of times the zero-filled-on-demand handler was invoked. The cow statistic reports the number of times the copy-on-write handler was invoked. There is no confusion possible in these fields of vmstat.

	The pin, or page-in, statistic can be confusing for any utility. Pages might be cached in the UBC or in a file-system-dependent cache, so the tool has to decide how to charge for hits in the cache: either as a page-in or as nothing. vmstat is probably more aggressive about counting a page that is in the cache as a page-in, whereas some of the other tools don't consider file-backed page faults to be page-ins at all.

	As far as vmstat slowing down is concerned, that all depends on what the interval is set to. Long-term sleeping processes are likely to become outswap candidates.

	Is the system out of memory and paging out? You said that half of the 4 GB of memory was idle; what do you mean by this, free? Does the actual performance of this machine degrade substantially, or is it that these tools run more poorly when the system is under a heavier load?

	Larry Woodman
	[Posted by WWW Notes gateway]
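
The accounting choice described in .1 can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Fault` record, the `tally` function, and the `count_cache_hits_as_pageins` flag are hypothetical names made up for illustration, and nothing here reflects how vmstat or any Polycenter tool is actually implemented. It only shows that the same stream of fault events can produce different page-in counts depending on how cache hits are charged, while the fault, zero, and cow counts stay the same.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """One hypothetical fault event (illustrative, not a kernel structure)."""
    kind: str             # "zero", "cow", or "file"
    cached: bool = False  # for "file" faults: page was already in a cache


def tally(faults, count_cache_hits_as_pageins):
    """Count events the way a monitoring tool might, under a chosen policy."""
    stats = Counter(faults=0, zero=0, cow=0, pin=0)
    for f in faults:
        stats["faults"] += 1          # every handler invocation is a fault
        if f.kind == "zero":
            stats["zero"] += 1        # zero-filled-on-demand
        elif f.kind == "cow":
            stats["cow"] += 1         # copy-on-write
        elif f.kind == "file":
            # Policy decision: is a cache hit charged as a page-in or not?
            if not f.cached or count_cache_hits_as_pageins:
                stats["pin"] += 1
    return dict(stats)


# Made-up event stream: two file-backed faults hit the cache, one does not.
workload = [
    Fault("zero"),
    Fault("cow"),
    Fault("file", cached=True),
    Fault("file", cached=True),
    Fault("file", cached=False),
]

aggressive = tally(workload, count_cache_hits_as_pageins=True)
conservative = tally(workload, count_cache_hits_as_pageins=False)
print("aggressive:  ", aggressive)
print("conservative:", conservative)
```

Both policies see the same five faults, but they disagree on the page-in count, which is the kind of disagreement between tools that .1 describes.
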
9664.2	Problem fixed	TBC001::WONG		`Thu May 01 1997 12:56`	26
	The problem has been fixed, but we still do not fully know what caused the many page faults. We increased all the semaphore parameters, and the page faults came way down and the CPU became 90% free.

	My interpretation is that the old semaphore parameters caused many commands in many scripts to fail. Since the scripts were coded as loops, we had many image activations. The activated images must already have been cached, because the scripts all do similar things. However, something during image activation must be counted as a page-in. The failed scripts kept triggering these page-ins, and hence we had up to 10k page-ins/sec.

	After increasing the semaphore parameters, we were able to start 1000 Universe scripts and still had 35% CPU free. We had page faults in the 10k range, but most of them were soft faults (largely demand zero). Page-ins ranged from 600 to 1k, which is much more acceptable.

	I've looked further at the differences between vmstat, vmubc, and the performance monitor (PMGR). Even at light load, they did not agree on page fault rates. Personally, I have more faith in vmstat.

	ref .1: The system was never out of memory. The page-out rate was zero the whole time. We had more than 2 GB on the free list. The system was still very responsive to interactive sessions when the page fault problem started. The CPUs were 100% used. 'vmstat 5' refreshed its screen at 50-second intervals instead of 5.
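
One thing worth checking when tools disagree under heavy load is how each turns a cumulative counter into a per-second rate. The sketch below is purely illustrative and rests on an assumption that nothing in this thread confirms: that a tool might divide by its requested interval rather than the time that actually elapsed. The function `rate_per_second` and the constants are hypothetical; the 5 and 50 second figures come from the 'vmstat 5' observation above, and the 3000 faults/sec figure is the vmubc reading, treated as the true rate only for the sake of the arithmetic. Since .2 also reports disagreement at light load, this could at most be part of the story.

```python
def rate_per_second(prev_count, curr_count, seconds):
    """Turn two samples of a cumulative counter into a per-second rate."""
    return (curr_count - prev_count) / seconds


NOMINAL_INTERVAL = 5    # seconds requested, as in 'vmstat 5'
ACTUAL_INTERVAL = 50    # seconds actually observed between refreshes
TRUE_RATE = 3000        # faults/sec; assumed true rate, for illustration only

# Counter growth over the time that really elapsed between two samples.
prev = 0
curr = prev + TRUE_RATE * ACTUAL_INTERVAL

# Dividing by the requested interval inflates the rate tenfold;
# dividing by the measured elapsed time recovers the assumed true rate.
by_nominal = rate_per_second(prev, curr, NOMINAL_INTERVAL)
by_elapsed = rate_per_second(prev, curr, ACTUAL_INTERVAL)
print(by_nominal, by_elapsed)
```

Comparing raw counter deltas over a stopwatch-measured period, rather than the rates each tool prints, would be one way to test whether this effect is present at all.
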