
Conference turris::digital_unix

Title:	DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:	Welcome to the Digital UNIX Conference
Moderator:	SMURF::DENHAM

Created:	Thu Mar 16 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	10068
Total number of notes:	35879

9981.0. "syscall statistics tool?" by BPSOF::TUBA (There are more things in heaven and earth) Thu May 29 1997 10:21

Hi,

Is there any tool out there to measure what percentage of the system time
consumed by a particular process during a given period is spent in each
individual type of system call? Something like trace or truss, with some
statistics on this included.  DU 3.2C, no possibility to recompile/relink apps.

Thanks,
Zoli

T.R	Title	User	Personal Name	Date	Lines
9981.1		KITCHE::schott	Eric R. Schott USG Product Management	`Thu May 29 1997 20:55`	10
	Hi

	I don't know if it is what you want, but maybe kernel profiling?

	Are you trying to figure out how to tune the system? If you can't change the app, why take this tack?

	Can you state your problem, and system configuration? You may get more ideas...
9981.2	description of problem	BPSOF::TUBA	There are more things in heaven and earth	`Fri May 30 1997 07:14`	47
	A TP application at a customer shows strange behaviour both under interactive workload during office hours and during batch processing overnight.

	With interactive processing, the CPU time for a single transaction is about 3 secs. With 250 users, the same transaction uses about 20 CPU secs. Most of the CPU time is spent in system mode (over 90%). Of course this is a problem in itself, and it shows poor application design and an inadequate runtime environment for this number of users. The trace output shows the same number of system calls issued in both cases. What I am investigating is why the user processes need 6-7 times more CPU time to complete the same number of system calls in the second case.

	If it were a locking problem, processes using SETLKW would be suspended while waiting for the lock, and that wait is not counted in the CPU time. With SETLKs issued repeatedly while waiting for locks, we would see an increase in the number of system calls, which doesn't happen. If it is I/O wait, it should also be counted as idle time. I would like to investigate what the extra CPU time is being used for. A run queue length of 90-100 indicates that user processes are almost always ready to use the CPUs.

	Batch processing during the night is scheduled for 6-7 CPUs. It does the same kind of operation on the data of about 200 branch offices; each process takes a different set of branch offices. Again there is over 90% system time. The workload is evenly distributed across the CPUs. At some point, there is a drastic slowdown in the processing. The number of system calls issued during a given period drops by a factor of 6-7 from the initial rate. The slowdown takes place within a minute, and all the processes slow down together. At the same time, the customer reports a similar decrease in the transaction rate (measured as the time needed to process one branch's data). The CPU load is still distributed evenly, and the processing remains that slow until the end of the batch execution.

	No paging/swapping activity can be observed, no I/O bottleneck occurs, and the UBC miss rate is close to nil. The run queue length is the same before and after the slowdown and equals the number of batch processes. With batch processing running on one CPU only, no slowdown occurs at all.

	What I would like to track down is whether a particular type of system call causes the slowdown, whether it is a fight for a common resource, and whether it could be improved with OS tuning (or the reason why it couldn't). I would appreciate any ideas about what else I should take a closer look at, parameters to check, additional tests to run, etc.

	System config: AS8400, 8 CPU, 1 GB, KZPSA's, RZ28D, LSM 0+1, DU 3.2C
	Runtime env: Magic 5.61, C-ISAM

	The same image is being executed by all users and batch processes.

	Thanks in advance
	Zoli
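The comparison asked for in .2 (same number of system calls, 6-7 times the CPU time) can be sketched in a few lines of Python once per-call timings are available from some tracer. This is only an illustration: the input format (a list of syscall name and system-CPU-seconds pairs), the function names `per_call_cost` and `slowdown_by_syscall`, and the sample numbers are all assumptions made up for the example, not the output of trace, truss, or any tool mentioned in this topic.

```python
from collections import defaultdict


def per_call_cost(samples):
    """Aggregate (syscall_name, cpu_seconds) pairs.

    Returns {name: (calls, total_seconds, share_of_total_time)}.
    The pair format is hypothetical; adapt the parsing to whatever
    the tracer in use actually emits.
    """
    calls = defaultdict(int)
    secs = defaultdict(float)
    for name, seconds in samples:
        calls[name] += 1
        secs[name] += seconds
    total = sum(secs.values())
    return {
        name: (calls[name], secs[name], secs[name] / total if total else 0.0)
        for name in calls
    }


def slowdown_by_syscall(light, heavy):
    """Ratio of mean per-call cost (heavy load / light load) per syscall."""
    a = per_call_cost(light)
    b = per_call_cost(heavy)
    ratios = {}
    for name in a.keys() & b.keys():
        mean_light = a[name][1] / a[name][0]
        mean_heavy = b[name][1] / b[name][0]
        if mean_light > 0:
            ratios[name] = mean_heavy / mean_light
    return ratios


# Made-up sample data: same calls in both runs, one call type gets slower.
light = [("read", 0.001), ("read", 0.001), ("fcntl", 0.002), ("write", 0.001)]
heavy = [("read", 0.001), ("read", 0.001), ("fcntl", 0.020), ("write", 0.001)]

ranked = sorted(slowdown_by_syscall(light, heavy).items(), key=lambda kv: -kv[1])
for name, ratio in ranked:
    print(f"{name:8s} {ratio:6.1f}x")
```

A call type whose ratio stands well above the others would be the first candidate for contention on a shared kernel resource; if all ratios rise together, the cause is more likely system-wide.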
9981.3	a couple ideas, if you're sure it's not just thrashing	SMURF::JPW	John P Williams, DUDE, USG, 381-2079	`Fri May 30 1997 08:54`	27
	There are a few profiling tools that might help, though none explicitly provides exactly the information you are asking for:

	1. The Atom-based Hiprof tool, which is not supported on V3.2C, but which is available as an unsupported ADK, via our web page:

		http://www.zk3.dec.com/dude/program_analysis/

	   Use Atom commands like this:

		atom -tool hiprof -toolargs=-calltime -all <exe>
	   or
		atom -tool hiprof -toolargs=-cputime -all <exe>

	   Then display the system-call profile with gprof:

		gprof -b -all <exe> <exe>.hiout
	   or (to see only the system calls)
		gprof -b -incobj KERNEL -excobj <exe> <exe> <exe>.hiout

	2. The kprofile tool, which is supported in V3.2C, but which needs the pfm(7) device driver configured into the kernel (i.e. "pseudo-device pfm" in the configuration file), and which needs a patch kit to make it work on EV5 and SMP systems like I think you have. I don't know the patch id(s).

	3. There are some internal, unsupported, Atom-based kernel profiling tools available. Yaacov Fenster ([email protected]) may be the best contact.
9981.4	Maybe truss would help	UNXA::KOPY		`Fri May 30 1997 16:51`	13
	Zoli,

	Digital does ship truss as part of the SVE product. It is a System V implementation and may provide the information you are looking for. However, this version of truss does not work on threaded programs; you didn't say whether or not yours is threaded, but with an MP system I guess that it might be.

	The second point to note is that SVE is a layered product with a separate software license, so it will cost $.

	Regards,
	- Walt
9981.5		SMURF::DENHAM	Digital UNIX Kernel	`Fri May 30 1997 18:07`	17
	Walt,

	I was gonna send you mail, but why not here. Any chance of getting SVE truss enhanced to deal with system calls from the realtime habitat? Right now it prints junk. Well, not junk. You can sorta figure out what happened based on the syscall return value if you know what to look for.

	We noticed this running truss on a Sybase database engine, which like Informix and Oracle uses realtime AIO system calls to do its I/O. AIO is in the realtime habitat.

	I'd be happy to provide guidance on dealing with habitats, even though they were invented down your way...

	Jeff
9981.6		KITCHE::schott	Eric R. Schott USG Product Management	`Sun Jun 01 1997 12:23`	9
	Hi

	Have you included sys_check out for the system? Can you run sys_check -perf during the "bad performance" time?

	http://www-unix.zk3.dec.com/tuning/tools/sys_check/sys_check.html
9981.7		UNXA::KOPY		`Tue Jun 03 1997 14:35`	16
	Re: .5

	Jeff,

	You asked about getting SVE truss enhanced to deal with system calls from the realtime habitat. Unfortunately, the response to this is similar to many SVE questions: since the product is essentially in maintenance mode, there are no resources allocated to address any "enhancements". And, as you probably know, most of the SVE code is encumbered, so we can't look at it. That makes hacking around impossible, so we can't pursue this unless it's a bit more formal.

	If you'd like to push it a little, you might want to enter a QAR. Although I can't promise anything, at least it would be on the list...

	- Walt
9981.8	some update	BPSOF::TUBA	There are more things in heaven and earth	`Thu Jun 05 1997 10:44`	13
	It seems that atom 3.2 can't deal with stripped objects. Is there an internal version which can do that?

	The patch I found for pfm.o works with ev5 only, we have ev56. Again, is anybody aware of a patch for du3.2 and ev56?

	If this can't be achieved, I will move the testing process under 4.0 (although Magic is not officially supported on 4.0 I would expect it to work and for test purposes it may be sufficient).

	Thanks,
	Zoli