
Conference turris::digital_unix

Title:DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

8467.0. "How to determine cpu time per thread" by BLAZER::MIKELIS (Software Partner's Eng. MR01-3/F26) Wed Jan 15 1997 09:43

T.R | Title | User | Personal Name | Date | Lines
8467.1 | | SMURF::DENHAM | USG | Wed Jan 15 1997 10:27 | 25
8467.2 | Need to use a profiler | SMURF::JPW | John P Williams, DUDE, USG, 381-2079 | Wed Jan 15 1997 10:40 | 19
8467.3 | thanks | BLAZER::MIKELIS | Software Partner's Eng. MR01-3/F26 | Fri Jan 17 1997 15:29 | 0
8467.4 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Mon Jan 27 1997 08:08 | 17
Note that Solaris _lwp_info() is exactly equivalent to using the corresponding
Mach interfaces on Digital UNIX (with the exception that our Mach interfaces
are officially unsupported).

The important thing to remember is that in both operating systems, user
threads are not (by default) bound to any particular kernel thread. Thus,
getting the CPU time for a kernel thread is not an accurate indication of the
time used by any USER thread.

While Solaris 2.5 allows you to create bound threads (THR_BOUND or POSIX
System Contention Scope), Digital UNIX 4.0 doesn't. As Jeff said, we'll be
adding System Contention Scope in 4.0D. But remember that system contention
scope may be less efficient than the default for most applications... which
is why it's not the default. System Contention Scope is also much more
expensive in terms of system resources -- more kernel memory (and other
resources, like kernel thread slots) are consumed, and you may not be able to
create as many threads.
8467.5 | Straight from the wild-blue yonder! :-) | RHETT::PARKER | | Mon Jan 27 1997 17:17 | 96
    
    Thank you all for your replies. I got an email directly out of
    the blue from the customer requesting this. I think it was from
    my reply to 8460 (8460.1 - guess someone sent that to him or ?) 
    and so I worked with him a little on this.
    
    As soon as I get some free time, probably next year! :-), I'm
    going to QAR it because I told them I would. Seriously, I will
    try to do it this week!
    -----------------------------------------------------------------

Hi Folks, 

This mail is from the customer that needs/wants this type of 
functionality on Digital UNIX and why. I'm preparing to file 
a QAR requesting this be added per the input below. Of course,
it's entirely up to Product Management and Engineering as to
whether this gets added...

Lee
-------------------------------------------------------------------------

>> I don't think Digital UNIX has an exact equivalent of the Sun 
>> _lwp_info(2).

That's not good.

>> What Digital UNIX V4.0 does have is a set of profiling tools that 
>> can report on the cpu-time per procedure per thread. For general 
>> information see our Programmer's Guide.

This doesn't help.  We want to be able to report the CPU time per 
thread to the user as part of our application.  That's what _lwp_info() 
does on Solaris. This allows us to tell the user how well the application 
load balanced.

If you could put in a request for the equivalent of _lwp_info() to be 
put in a future version of Digital Unix, that would be great.

Thanks again for your help.

        -Irv Lustig, PhD                     [email protected]
        Director of Numerical Optimization   http://www.cplex.com/~irv/
        CPLEX Optimization, Inc.             http://www.cplex.com/

--------------------------------------------------------------------------
Lee:
	I've spoken with members of our company to increase the business 
case for having such a routine.  Right now, our parallel product works on 
SGI and Sun multiprocessor platforms.  Both of these platforms offer this 
capability.

We are also developing relationships with HP and IBM to support our 
parallel product line.   In order for our customers to be able to 
compare performance between vendors, this timer is a critical component 
of being able to make that comparison.  It is difficult to measure the 
lost business for not having a routine like _lwp_info(), but our 
customers tend to react negatively when we say that a vendor does not 
provide a particular capability, even though, in this case, this     
capability is just providing a performance measure.

For the business case of why we need overall good support for
parallelism, see the text below that forecasts the market performance 
in the mathematical programming market, which is our market.

        -Irv Lustig, PhD                     [email protected]
        Director of Numerical Optimization   http://www.cplex.com/~irv/
        CPLEX Optimization, Inc.             http://www.cplex.com/

>Forecast Market Performance in Math Programming
>===============================================
>A general trend in computing is the availability of near super-computer
>performance at historic server and even workstation prices.  This has
>brought the application of math programming-based technologies to historic
>highs and historic growth rates.
>
>Look at the investor web pages of three of our Value Added Resellers as 
>examples: i2 Technologies (195% growth) [www.i2.com], Manugistics (81%
>growth) [http://www.manu.com/html/ir.html], and Aspen Technologies, Inc.
>(163% growth; it recently acquired another CPLEX process-industry VAR,
>Bechtel's PIMS Group) [www.aspentech.com].
>
>This growth is the bandwagon that DEC can jump on provided it offers a
>suitable compute server solution.  The idea is to position DEC's high
>performance computer solutions in an application area with explosive
>growth.
>
>These high growth rates in our area will continue for at least 3 more
>years.  Beyond that it is tough to forecast.  In total we conservatively
>estimate we'll leverage annual high performance computing systems sales 
>to $150 million growing to $500 million.



    
    
8467.6 | straight from the Solaris man page - but where? | BBPBV1::WALLACE | john wallace @ bbp. +44 860 675093 | Tue Jan 28 1997 18:48 | 7
    Didn't I see some stuff not a million miles from here saying that the
    Solaris lwp stuff was due to go obsolete soon? In which case there's
    not a lot of point asking for it in DIGITAL UNIX - though equivalent
    functionality would surely be nice.
    
    regards
    john
8467.7 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Wed Jan 29 1997 08:50 | 20
The comparison doesn't hold water. First off, Solaris uses LWPs as "virtual
processors" for user threads. By default, user threads are context switched
essentially randomly across multiple LWPs. Getting the accumulated CPU time
for an LWP tells you nothing about the CPU usage of any user thread. There's
no advantage to this over simply looking at the accumulated CPU time of the
entire process.

LWP accumulated CPU time is useful only for BOUND threads -- which nobody
should be using routinely. (The Solaris threads implementation unfortunately
encourages people to use bound threads "improperly" to avoid deadlocks, due
to their lack of timeslicing or replacement of blocked LWPs before exhausting
the process concurrency -- Digital UNIX has neither of these problems.)

Digital UNIX 4.0 does not support bound threads; however, PTmin will add
support for the POSIX "scope" attribute, allowing threads to be created as
"system contention scope" (equivalent to the Solaris UI threads THR_BOUND flag).
Bound threads do have their own CPU time accumulated by the kernel, and there
are existing (though unsupported) Mach functions to retrieve the data. Given
sufficient business justification, we could provide a supported DECthreads
interface to a bound thread's CPU time. We currently have no plans to do so.
8467.8 | | DCETHD::BUTENHOF | Dave Butenhof, DECthreads | Wed Jan 29 1997 09:02 | 15
>    Didn't I see some stuff not a million miles from here saying that the
>    Solaris lwp stuff was due to go obsolete soon? In which case there's
>    not a lot of point asking for it in DIGITAL UNIX - though equivalent
>    functionality would surely be nice.

No, Solaris isn't going to get rid of LWPs. That's just their "kernel
thread", just like our Mach threads. Without them, they'd have no support for
multiprocessors within a process.

The note you probably remember was in response to someone asking about
per-LWP timers. Those were supported by UI threads, but not by POSIX, and Sun
has announced that nobody should rely on per-LWP timers because they may
remove support for them.

That's got nothing to do with this issue.
8467.9 | Good info!! | RHETT::PARKER | | Wed Jan 29 1997 09:30 | 60
    
    Hi Dave,
    
    Thanks for the information and clarification. I'll take some of this 
    information and use it to explain to the customer how he would really 
    be comparing apples to oranges and see how they respond. If they still 
    want me to file the QAR, I will. I have no idea if the "business case"
    outlined would provide sufficient justification for adding it to Digital 
    UNIX. Once they get their application running on an Alpha, I think the 
    performance will speak for itself.	:-)
    
    Thanks again for the explanation!
    
    Lee
    ------------------------------------------------------------------------
    For reference, here is the Solaris man page on _lwp_info(2) 
    
    _lwp_info(2)              System Calls               _lwp_info(2)
    
    NAME
         _lwp_info - return the time-accounting information of a single LWP.
    
    SYNOPSIS
         #include <sys/time.h>
         #include <sys/lwp.h>
    
         int _lwp_info(struct lwpinfo *buffer);
    
    DESCRIPTION
         _lwp_info() fills the lwpinfo structure pointed to by buffer
         with  time-accounting  information pertaining to the calling
         LWP. This call may be extended in the future to return other
         information to the lwpinfo structure as needed.  The lwpinfo
         structure in <sys/lwp.h> includes the following members:
    
                 timestruc_t lwp_utime;
                 timestruc_t lwp_stime;
    
         lwp_utime is the CPU time used while executing  instructions
         in the user space of the calling LWP.
    
         lwp_stime is the CPU time used by the system  on  behalf  of
         the calling LWP.
    
    RETURN VALUES
         Upon successful completion, _lwp_info() returns 0 and  fills
         in the lwpinfo structure pointed to by buffer.
    
    ERRORS
         If the following condition is detected, _lwp_info()  returns
         the corresponding value:
    
         EFAULT         buffer points to an illegal address.
    
    SEE ALSO
         times(2)
    
    SunOS 5.4           Last change: 31 Mar 1994                    1
     
8467.10 | What would RPCC show | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Wed Jan 29 1997 14:23 | 3
Curious -- what would RPCC show?  Cycles for the kernel thread?

								-- Chuck Newman
8467.11 | It gives you what you asked for... | WTFN::SCALES | Despair is appropriate and inevitable. | Wed Jan 29 1997 18:13 | 10
> what would RPCC show?  Cycles for the kernel thread?

RPCC would show what it's supposed to show -- the PROCESSOR cycle counter.  

If the kernel thread moves to another processor, you get weird results when trying
to directly compare the results of two RPCCs; likewise, if the user thread
moves to a different kernel thread, you'll get similarly bizarre results.


				Webb
8467.12 | RPCC is for threads, sometimes | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Thu Jan 30 1997 08:13 | 14
No, Webb, RPCC shows the "thread" cycle counter, assuming that
context switching follows the protocol in the SRM (or Architecture
Handbook, etc).  What makes this confusing is that the SRM calls a
thread a "process" (hence Read Process Cycle Counter), and the
thread that matters is the kernel thread, not the user thread,
since you can't properly perform the RPCC context switch in user
mode.  And Windows NT doesn't properly do the switch even in kernel
mode (partly because they were confused by the "Process" terminology).

The low 32 bits from RPCC is a free-running counter that does, in
fact, show time on the processOR.  But the upper 32 bits is an
offset that, when properly combined with the low 32 bits, gives
you the time for the current (kernel) thread.  This offset is updated
at each context switch to contain the right value to make this happen.
8467.13 | When (if ever) does RUSAGE_THREAD "do the right thing?" | HYDRA::NEWMAN | Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26 | Thu Jan 30 1997 09:29 | 21
What causes a user thread to migrate to a different kernel thread?
If I have "n" threads executing on "n" cpus (nothing much else happening
on the system) and all the threads are compute bound (no I/O), how likely
are the user threads to migrate?

What if the system is busy?
What if the main thread does I/O?
What if the user-threads do I/O?
What if the threads do other calls (e.g., pthread_mutex_lock)?

I'm wondering if there are classes of applications (e.g., technical
applications where SMP is used solely to do computations in parallel
using spin-locks to wait) where getrusage(RUSAGE_THREAD, &r_usage)
might be appropriate for resource usage reporting in development and/or
the shipping application.

BTW, is it intentional that RUSAGE_THREAD is defined in the
/usr/include/sys/resource.h include file but not documented in the man page
in Digital UNIX V4.0A?

								-- Chuck Newman
8467.14 | Why a user thread would migrate to another kernel thread | WTFN::SCALES | Despair is appropriate and inevitable. | Thu Jan 30 1997 12:11 | 22
Re .12:  Oh yeah, right, I remember now...thanks for the correction, Bill. :-)

.13> What causes a user thread to migrate to a different kernel thread?

Any time a user thread makes the transition from the "running" state to the
"ready" state (possibly indirectly) and back to the "running" state, it could
end up running on a different kernel thread the second time.  So, anything which
might cause the thread to block, such as issuing an I/O, incurring a page fault,
blocking for synchronization, etc. could result in a "migration".  Also,
anything which would cause the thread to yield the "virtual processor" to
another thread, such as a priority preemption, execution quantum exhaustion, or
explicit yield could result in a "migration".  Finally, it's possible that
DECthreads will need to execute internal updates or "housecleaning", which will
require preemption of a running thread, and this could result in a "migration".

So, if you have about the same number of user threads as you have physical
processors, the odds are very low that the user threads will move around much,
but they are high enough that trying to collect CPU data for user threads by
specifying their kernel threads won't yield reliable results.


				Webb
8467.15 | To close this out... | RHETT::PARKER | | Mon Feb 10 1997 16:35 | 22
    
    Well, just to close this issue: the customer that requested this
    was satisfied with Dave Butenhof's explanation that comparing this
    measure on Solaris and Digital UNIX would still be comparing apples
    and oranges.
    
    Here is his mail:
    
    Lee:
            After much internal discussion, we've decided to drop our
    requirement for a per-thread timer.  It would be really nice to have,
    in order for us to measure the efficiency of our parallel implementation,
    but we now realize the difficulties that the different vendors are
    having in providing it.
    
            Sorry to have made such a fuss earlier.
    
            -Irv Lustig                          [email protected]
            Director of Numerical Optimization   http://www.cplex.com/~irv/
            CPLEX Optimization, Inc.             http://www.cplex.com/
    
    
8467.16 | | SMURF::DENHAM | Digital UNIX Kernel | Tue Feb 11 1997 12:19 | 1
    Somebody send that guy a nice fruit basket or something!