
Conference caldec::wrl_atom

Title: ATOM Tool Development System
Moderator: CALDEC::SCHMIDT
Created: Tue Sep 07 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 309
Total number of notes: 979

292.0. "hiprof internal error" by RHETT::SHEPPARD () Tue Feb 11 1997 11:20

[UNIX V4.0A, upgraded from V3.2G, C++ V5.5]

A customer recently ported a threaded application, which ran without
problems on UNIX V3.2G, to UNIX V4.0A. He recompiled and relinked cleanly
on V4.0A.

Now he reports that the ATOM hiprof runtime profile of his application
shows much higher %CPU usage for functions in the threads-related libraries
on UNIX V4.0A than it did on V3.2G.  He wants to optimize what he can, but
finds that many of the functions are in system (or third-party) libraries.

Could this be expected behavior?

In one instance he ran the profiler on a single thread and got the
error message:
    Hiprof internal error: analysis overflow
What might cause this?

Any pointers on how to pinpoint (or work around) heavy use of
functions inside system libraries?

Thanks in advance for any info or suggestions!


                        Steve Sheppard
                        Digital Unix / Ultrix Applications Support Team
                        [email protected]


The customer's notes follow:

    "This file is the ATOM hiprof output for one thread.  I could only run
    one thread with the ATOM hiprof because it took so much memory and
    processing time.

    The flat profile shows 45% in the ids_thread_wait function.

    We are using ISIS for our IPC communications.

    The file after this one is the profiler run for 6 threads for 30 minutes.

    My application processes telemetry data for multiple satellites.
    I'm using a data generator to generate what we call minor frames (250
    bytes each) at a rate of 1 minor frame every 2 seconds. The application
    decommutates the data, converts it to engineering units, performs
    limit checking, and outputs this data to shared memory. It also
    processes memory data and writes this data to files.

    We are using C++ V5.5, Rogue Wave, and UNIX V4.0A.

    Hiprof internal error: analysis overflow
    Hiprof internal error: analysis overflow
    ...
    [etcetera, etcetera...]
    ...
    Hiprof internal error: analysis overflow
    Hiprof internal error: analysis overflow
    High Profile: X1.0-1,  3 August 1995"

Output from his profile session follows; the file is *large* and has been
truncated here for space. If you'd like to see the original, let me know
and I'll provide it via FTP.

"============================= Profile Number  1 =============================
...
Instrumentation Options (-toolargs): -cputime

Run time Options (HIPROF_ARGS):      -textout sigdump 15

1 unit = 1,000 cycles
1 cycle = 6.01 ns

Process times:
 Time in Application        =        3,880,390,946 units      (23321.150 sec)
 Hiprof Initialize Time     =               25,923 units      (    0.156 sec)
 Time in Analysis Routines  =       13,878,604,305 units      (83410.412 sec)
 Rusage time                =          182,378,023 units      ( 1096.092 sec)
 Rusage diff                =      -17,576,643,152 units      (-105635.625 sec)
 Total Number of Calls = 58,886,256
 Application Time per Call =    65896.4 cycles
 Analysis Time per Call =      235684.9 cycles
 Edge Searches per Call =           1.6

Thread times:
 Time in Application        =          777,641,543 units      ( 4673.626 sec)
 Time in Analysis Routines  =        6,008,420,429 units      (36110.607 sec)

Linkage Overhead = 2000000000 cycles
...
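
(As a sanity check on the units above: 1 unit = 1,000 cycles and
1 cycle = 6.01 ns, so 3,880,390,946 units x 1,000 x 6.01 ns is roughly
23,321 sec, which matches the "Time in Application" line. By the same
conversion, the analysis routines account for roughly 3.6 times as much
time as the application itself.)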


292.1. "Multithreaded Hiprof -cputime is broken on V4.0, V4.0A, and V4.0B" by SMURF::JPW (John P Williams, DUDE, USG, 381-2079) Wed Feb 12 1997 07:18

There is a known (and crippling) bug in Hiprof's -cputime option for
DECthreads applications, though it is fine for non-threaded applications.
The problem is that in V4.0 DECthreads started multiplexing pthreads onto
a smaller number of kernel threads, but the RPCC instruction (which Hiprof's
-cputime option uses) is only valid within a given kernel thread, not across
the pthreads that Hiprof reports on.
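
To make the failure mode concrete, here is a minimal conceptual sketch
(not Hiprof's actual source; read_cycle_counter() is just a stand-in for
the RPCC instruction) of why two counter reads stop being comparable once
the pthread can move between kernel threads:

    #include <stdio.h>

    /* Stand-in for the Alpha RPCC instruction.  The real value is a
     * per-processor cycle count, so it is only comparable between two
     * reads made in the same kernel-thread (CPU) context.
     */
    static unsigned long read_cycle_counter(void)
    {
        return 0UL;   /* stub; the real instrumentation issues RPCC here */
    }

    static void do_work(void)
    {
        /* While this runs, the V4.0 DECthreads scheduler is free to move
         * the pthread onto a different kernel thread, and therefore onto
         * a different cycle counter.
         */
    }

    int main(void)
    {
        unsigned long start, stop;

        start = read_cycle_counter();   /* read on kernel thread A          */
        do_work();
        stop  = read_cycle_counter();   /* possibly read on kernel thread B */

        /* If a reschedule happened in between, (stop - start) compares two
         * unrelated counters, so the per-pthread CPU time charged to
         * do_work() can be arbitrarily wrong -- which is why -cputime
         * figures for threaded code on V4.0/V4.0A cannot be trusted.
         */
        printf("cycles (unreliable across a reschedule): %lu\n", stop - start);
        return 0;
    }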

We are currently testing a fix for this problem in the PTmin release, which
is planned as one of the next few V4.0x releases. Until that release, Hiprof's
-cputime option should not be used for DECthreads applications. The fix depends
on new DECthreads and kernel interfaces, so a patch is impractical.

The nearest alternative is to compile the application with the -pg option,
though this covers only the main executable. Hiprof's -calltime option
will profile everything except the threads-related libraries (when used with
"atom -all"), but only in terms of instruction counts, and with more run-time
overhead. Relinking with -p profiles all shared libraries in terms of seconds
of CPU time at low overhead, but omits the call-graph part of the profile.
The uprofile tool profiles an uninstrumented program (but no libraries)
at almost no overhead.
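
Roughly, those alternatives look like the following (a sketch only; the
exact invocations are from memory and should be checked against the cc,
atom, hiprof, prof, gprof, and uprofile reference pages):

    cc -pg -o app ...   # then run and use gprof: call graph, main image only
    cc -p  -o app ...   # then run and use prof: flat CPU-time profile of
                        #   all shared libraries, no call graph
    atom app -tool hiprof -all -toolargs="-calltime"
                        # call graph in instruction counts, most libraries
                        #   (the -tool/-toolargs syntax here is an assumption)
    uprofile app        # uninstrumented program, no libraries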

The only way to further optimize a system library or third-party library
is to link the application -non_shared (i.e. use archive, not shared, libraries)
and use the -om and related ld options to perform post-link optimization.
However, until the Steel release, non_shared programs are not supported by
DECthreads, so all the programmer can do is avoid calling the library routines
that use a lot of CPU time.
292.2. "thanx!" by RHETT::SHEPPARD () Wed Feb 12 1997 12:41

    Thanks for the quick response!  I forwarded your reply to my customer
    and am awaiting further updates.
    
    --Steve