
Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

9981.0. "syscall statistics tool?" by BPSOF::TUBA (There are more things in heaven and earth) Thu May 29 1997 11:21

Hi,

Is there any tool out there to measure what percentage of the system time
consumed by a particular process during a given period is spent in each
individual type of system call? Something like trace or truss, with some
statistics on this included.  DU 3.2C, no possibility to recompile/relink apps.

Thanks,
Zoli
T.R  Title  User  Personal Name  Date  Lines
9981.1 by KITCHE::schott (Eric R. Schott USG Product Management) Thu May 29 1997 21:55 (10 lines)

Hi

I don't know if it is what you want, but maybe kernel profiling?

Are you trying to figure out how to tune the system?  If you can't
change the app, why take this tack?

Can you state your problem, and system configuration?  You may get
more ideas...

9981.2. "description of problem" by BPSOF::TUBA (There are more things in heaven and earth) Fri May 30 1997 08:14 (47 lines)

A TP application at a customer site shows strange behaviour both under the
interactive workload during office hours and during batch processing overnight.

With interactive processing, CPU utilization for a single transaction is about 3
seconds. With 250 users, the same transaction uses about 20 CPU seconds. Most of
the CPU time (over 90%) is spent in system mode. Of course this is a problem in
itself and points to poor application design and an inadequate runtime
environment for this number of users. The trace output shows the same number of
system calls issued in both cases. What I am investigating is why the user
processes need 6-7 times more CPU time to complete the same number of system
calls in the second case. If it were a locking problem, processes using SETLKW
would be suspended while waiting for the lock, and that wait is not counted as
CPU time. With SETLKs issued repeatedly while waiting for locks, we would see an
increase in the number of system calls, which doesn't happen. If it were I/O
wait, it would be counted as idle time, not CPU time. I would like to find out
what the extra CPU time is being used for. A run queue length of 90-100
indicates that user processes are almost always ready to use the CPUs.
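
The two locking patterns contrasted above can be sketched in C. This is only an
illustration of the reasoning, not the application's code (nothing in this
thread shows how Magic or C-ISAM actually take their locks). The helper names
lock_blocking, lock_polling and unlock_file are hypothetical. The blocking
variant issues one fcntl() call and sleeps in the kernel while the lock is held
elsewhere; the polling variant issues one fcntl() call per retry, so contention
would show up as a higher system call count in a trace.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Describe a lock of the given type covering the whole file. */
static struct flock whole_file(short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fl;
}

/* Blocking acquire: a single F_SETLKW call; the caller sleeps while waiting. */
int lock_blocking(int fd)
{
    struct flock fl = whole_file(F_WRLCK);
    int rc;
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);   /* restart if a signal interrupted the wait */
    return rc;
}

/* Polling acquire: every retry is a separate F_SETLK system call.
 * *attempts receives the number of fcntl() calls issued. */
int lock_polling(int fd, int max_attempts, int *attempts)
{
    struct flock fl = whole_file(F_WRLCK);
    struct timespec pause = {0, 1000000};   /* 1 ms between retries (arbitrary) */

    assert(attempts != NULL);
    *attempts = 0;
    while (*attempts < max_attempts) {
        ++*attempts;
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 0;                       /* lock acquired */
        if (errno != EACCES && errno != EAGAIN)
            return -1;                      /* real error, not contention */
        nanosleep(&pause, NULL);
    }
    errno = EAGAIN;
    return -1;                              /* gave up while still contended */
}

/* Release the whole-file lock. */
int unlock_file(int fd)
{
    struct flock fl = whole_file(F_UNLCK);
    return fcntl(fd, F_SETLK, &fl);
}
```

With no contention both variants cost a single fcntl() call, which is why an
unchanged system call count argues against the polling pattern but says nothing
either way about time spent inside each call.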

Batch processing during the night is scheduled on 6-7 CPUs. It performs the same
kind of operation on the data of about 200 branch offices; each process takes a
different set of branch offices. Again there is over 90% system time. The
workload is evenly distributed across the CPUs. At some point, there is a
drastic slowdown in the processing. The number of system calls issued during a
given period drops by a factor of 6-7 from the initial rate. The slowdown takes
place within a minute, and all the processes slow down together. At the same
time, the customer reports a similar decrease in the transaction rate (measured
as the time needed to process one branch's data). The CPU load is still
distributed evenly, and processing remains that slow until the end of the batch
run. No paging/swapping activity can be observed, there is no I/O bottleneck,
and the UBC miss rate is close to nil. Run queue length is the same before and
after the slowdown and equals the number of batch processes. With batch
processing running on one CPU only, no slowdown occurs at all.

What I would like to track down is whether a particular type of system call
causes the slowdown, whether it is a fight for a common resource, and whether it
could be improved with OS tuning. Or the reason why it couldn't.
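
Since the production image cannot be rebuilt, one possible side experiment is a
small standalone test program that issues a suspect call in a loop and samples
its own user and system CPU time with getrusage(). The sketch below assumes
nothing about the application; measure_cpu, struct cpu_split and the sample
workload call_getppid are hypothetical names. Running the same loop on an idle
system and again while the 250-user or multi-CPU batch load is active would
show whether the per-call system time of that call type grows under load.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* User-mode and system-mode CPU seconds consumed by a measured region. */
struct cpu_split {
    double user_s;
    double sys_s;
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Call fn(arg) `iterations` times and report the CPU time this process
 * accumulated in user mode and in system mode while doing so. */
int measure_cpu(void (*fn)(void *), void *arg, long iterations,
                struct cpu_split *out)
{
    struct rusage before, after;

    assert(fn != NULL && out != NULL);
    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    for (long i = 0; i < iterations; i++)
        fn(arg);
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    out->user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys_s  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}

/* Sample workload: one inexpensive system call per iteration.
 * Replace with the call under suspicion (read, fcntl, lseek, ...). */
void call_getppid(void *unused)
{
    (void)unused;
    (void)getppid();
}
```

Dividing sys_s by the iteration count gives an average system time per call for
the test program only; it does not attribute time inside the real application.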

I would appreciate any ideas about what else I should take a closer look at,
parameters to check, additional tests to run, etc.

System config: AS8400, 8 CPU, 1 GB, KZPSA's, RZ28D, LSM 0+1, DU 3.2C
Runtime env:   Magic 5.61, C-ISAM
The same image is being executed by all users and batch processes.

Thanks in advance
Zoli

9981.3. "a couple ideas, if you're sure it's not just thrashing" by SMURF::JPW (John P Williams, DUDE, USG, 381-2079) Fri May 30 1997 09:54 (27 lines)

There are a few profiling tools that might help, though none explicitly
provides exactly the information you are asking for:

1. The Atom-based Hiprof tool, which is not supported on V3.2C,
   but which is available as an unsupported ADK, via our web page:

	http://www.zk3.dec.com/dude/program_analysis/

   Use Atom commands like this:

	atom -tool hiprof -toolargs=-calltime -all <exe>
   or
	atom -tool hiprof -toolargs=-cputime -all <exe>

   Then display the system-call profile with gprof:

	gprof -b -all <exe> <exe>.hiout
   or (to see only the system calls)
	gprof -b -incobj KERNEL -excobj <exe> <exe> <exe>.hiout

2. The kprofile tool, which is supported in V3.2C, but which needs the
   pfm(7) device driver configured into the kernel (i.e. "pseudo-device pfm"
   in the configuration file), and which needs a patch kit to make it work
   on EV5 and SMP systems like the one I think you have. I don't know the
   patch id(s).

3. There are some internal, unsupported, Atom-based kernel profiling tools
   available. Yaacov Fenster ([email protected]) may be the best contact.

9981.4. "Maybe truss would help" by UNXA::KOPY Fri May 30 1997 17:51 (13 lines)

Zoli,

Digital does ship truss as part of the SVE product.  It is a System V
implementation and may provide the information you are looking for.  However,
this version of truss does not work on threaded programs; you didn't say
whether or not yours is threaded, but with an MP system I guess that it might be.

The second point to note is that SVE is a layered product with a separate
software license, so it will cost $.

Regards,

- Walt

9981.5 by SMURF::DENHAM (Digital UNIX Kernel) Fri May 30 1997 19:07 (17 lines)

    Walt,
    
    I was gonna send you mail, but why not here. Any chance of getting
    SVE truss enhanced to deal with system calls from the realtime
    habitat? Right now it prints junk. Well, not junk. You can sorta
    figure out what happened based on the syscall return value if you know
    what to look for.
    
    We noticed this running truss on a Sybase database engine, which,
    like Informix and Oracle, uses realtime AIO system calls to do its
    I/O. AIO is in the realtime habitat.
    
    I'd be happy to provide guidance on dealing with habitats, even though
    they were invented down your way...
    
    Jeff
    
9981.6 by KITCHE::schott (Eric R. Schott USG Product Management) Sun Jun 01 1997 13:23 (9 lines)

Hi

Have you included sys_check output for the system?

Can you run sys_check -perf during the "bad performance" time?

http://www-unix.zk3.dec.com/tuning/tools/sys_check/sys_check.html


9981.7 by UNXA::KOPY Tue Jun 03 1997 15:35 (16 lines)

Re: .5

Jeff,

You asked about getting SVE truss enhanced to deal with system calls from
the realtime habitat.  Unfortunately, the response to this is similar to that
for many SVE questions: since the product is essentially in maintenance mode,
there are no resources allocated to address any "enhancements".  And, as
you probably know, most of the SVE code is encumbered so we can't look at it.
That makes hacking around impossible, so we can't pursue this unless it's a bit
more formal.

If you'd like to push it a little, you might want to enter a QAR.  Although I 
can't promise anything, at least it would be on the list...

- Walt

9981.8. "some update" by BPSOF::TUBA (There are more things in heaven and earth) Thu Jun 05 1997 11:44 (13 lines)

It seems that atom 3.2 can't deal with stripped objects. Is there an internal
version which can do that?

The patch I found for pfm.o works with EV5 only; we have EV56. Again, is anybody
aware of a patch for DU 3.2 and EV56?

If this can't be achieved, I will move the testing process under 4.0 (although
Magic is not officially supported on 4.0 I would expect it to work and for test
purposes it may be sufficient).

Thanks,
Zoli