
Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

9053.0. "Proc. Idle time: output of "w" <> output of "ps ax -O SL"" by NETRIX::"[email protected]" (Joao Miranda) Thu Mar 06 1997 06:49

Digital UNIX V4.0B  (Rev. 564)

In V3.2C there used to be a correspondence between the idle column of the "w"
command and the "SL" switch of the "ps" command.
I used that correspondence to kill all the processes of a given user
that had been sleeping for more than X minutes.
But now the output of "ps -O SL" doesn't seem to be correct:

# w
17:51  up  5:17,  9 users,  load average: 0.13, 0.08, 0.08
User     tty        from             login@    idle   JCPU   PCPU what
minimal  console                     17:47                        -ksh
beatriz  0g         LAT_08002B274928 14:22        2   2:32      1 fglgo /icnbac
root     p0         lapa             17:50                        -sh
informix p5         lapa             16:40       12   2:05        ksh -o vi
carzila  p9         lapa             15:36            1:03      7 fglgo /icnbac

icn#ps axtp9 -O sl
  PID       SL S    TTY             TIME COMMAND
 2903      890 I  + ttyp9        0:00.11 sh
 2997     8121 IW + ttyp9        0:00.09 sh
 6618     1806 S  + ttyp9        0:07.72 fglgo

And in fact, the user at ttyp9 was working when I issued both commands, so
it seems that "ps -O SL" is not working properly.

Does anybody know what the problem is?

By the way, this is the script I've mentioned above:

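#!/bin/ksh
#
# File: KILL
#
# For every terminal (except root's and the console), kill all processes
# on it when every one of them has been sleeping for at least 600 seconds.
# Relies on ksh running the last stage of a pipeline in the current shell,
# so RIP and IDLE set inside the inner loop are still visible after it.

DoIt()
{
   who -u | grep -v root | grep -v console | awk '{print $2}' |
   while read TTY
   do
      RIP=""; IDLE="TRUE"
      TTY=`echo $TTY | cut -c4-5`        # e.g. ttyp9 -> p9
      ps axt$TTY -o pid,sl | grep -v PID | awk '{print $1, $2}' |
      while read PID SL
      do
         if [ "$SL" -ge 600 ]
         then
            RIP="$RIP $PID"              # candidate for killing
         else
            IDLE=FALSE                   # at least one process is active
         fi
      done

      # Kill all (sleeping) user processes
      if [ "$IDLE" = "TRUE" ]
      then
         for i in $RIP
         do
            kill -9 $i
         done
      fi
   done
}
DoIt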

Thanks,
Joao Miranda


[Posted by WWW Notes gateway]
T.R  Title  User  Personal Name  Date  Lines
9053.1. by SMURF::DENHAM (Digital UNIX Kernel) Thu Mar 06 1997 11:58, 2 lines
    Sorry, can't get to my 3.2 machine at the moment. Can you post some
    comparable 3.2 output?
9053.2. "Output of 3.2" by NETRIX::"[email protected]" (Joao Miranda) Fri Mar 07 1997 04:33, 37 lines
Hi

Here's an example of a 3.2D-2 output:

# w
09:20  up 15:22,  2 users,  load average: 0.31, 0.18, 0.17
User     tty        from             login@    idle   JCPU   PCPU what
miranda  p1         pca015.xip.dec.c 09:11                        w
miranda  p2         pca015.xip.dec.c 09:15        5               -ksh
# ps axtp2 -O sl
  PID       SL S    TTY             TIME COMMAND
27893      314 I  + ttyp2        0:00.14 ksh
#

As you can see, there is a relation between the idle time of "w" and 
the sleep time of "ps". ("ps -O sl" is more or less 60 times the "idle"
time of "w").
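
The relation described here can be sketched as a small shell helper. This is
only an illustration: the function name sl_to_idle_minutes is hypothetical,
and it assumes that "SL" is in seconds and that "w" shows whole minutes, which
is what the V3.2D-2 figures in this note suggest.

```shell
# sl_to_idle_minutes SECONDS
# Convert a "ps -O sl" sleep value (assumed to be seconds) into the whole
# minutes that the "idle" column of "w" would be expected to show.
sl_to_idle_minutes() {
    expr "$1" / 60
}

sl_to_idle_minutes 314    # first sample in this note
sl_to_idle_minutes 799    # second sample in this note
```

For the two samples above this prints 5 and 13, matching the idle column of
the corresponding "w" listings.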

A few minutes later ...

# w
09:28  up 15:30,  2 users,  load average: 0.50, 0.22, 0.15
User     tty        from             login@    idle   JCPU   PCPU what
miranda  p1         pca015.xip.dec.c 09:11                        w
miranda  p2         pca015.xip.dec.c 09:15       13               -ksh
# ps axtp2 -O sl
  PID       SL S    TTY             TIME COMMAND
27893      799 I  + ttyp2        0:00.14 ksh
# bc
799 / 60
13
#

Is this enough ?
Thanks in advance for any reply
Joao Miranda
9053.3. by SMURF::DENHAM (Digital UNIX Kernel) Fri Mar 07 1997 08:49, 5 lines
    Yep, just what I wanted to see. Those V4.0 sleep numbers sure
    look big. Maybe we changed the units to meet some standard
    or other... :^) Sure takes some of these features a long
    time to show up.
    
9053.4. "Any other way ?" by NETRIX::"[email protected]" (Joao Miranda) Fri Mar 07 1997 09:40, 3 lines
Is there any other way to find out whether a process
has been sleeping for more than X seconds?
9053.5. by SMURF::DENHAM (Digital UNIX Kernel) Fri Mar 07 1997 13:16, 3 lines
    Are any of these processes you're looking at multithreaded?
    Add the m flag to your ps command. All threads sleep time
    gets added into the "process" sleep time ps shows...
9053.6. "ps -m -O sl" by NETRIX::"[email protected]" (Joao Miranda) Mon Mar 10 1997 04:54, 28 lines
Hi


#w
09:48  up 2 days, 14:07,  16 users,  load average: 0.95, 1.17, 1.24
User     tty        from             login@    idle   JCPU   PCPU what
root     console                     09:46                        ls -la
.....
cgeres2  p5         lapa             09:33               5      5 fglgo /icnbac
.....
#ps amxtp5 -O sl
  PID       SL S    TTY             TIME COMMAND
 2379      845 IW + ttyp5        0:00.05 sh
 2382      911 IW + ttyp5        0:00.10 sh
 2396     1729 S  + ttyp5        0:07.12 fglgo
           845 I                 0:00.07
           844 I                 0:00.00
            20 S                 0:03.24
            20 S                 0:03.81
#

What can you get from this ???!!
9053.7. ""Addition" is the wrong operation -- try "minimum"?" by WTFN::SCALES (Despair is appropriate and inevitable.) Mon Mar 10 1997 10:29, 16 lines
.5> All threads sleep time gets added into the "process" sleep time ps
.5> shows...

*grin*  I'd guess that ps _should_ be showing the smallest value for sleep
time that it finds in any of the threads (i.e., that's the easiest
approximation for the amount of time that the -process- spent sleeping).

However, that value could still be unreliable, in that while that thread was
sleeping for its short time, another thread could have been running, such
that the -process- was never asleep.

Nevertheless, it strikes me that _adding_ the sleep times is the wrong thing
to do...  :-)
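
To make the difference concrete, here is a rough shell sketch that applies
both operations to the four thread sleep times listed in the .6 output (845,
844, 20 and 20). The helper names sum_sleep and min_sleep are hypothetical, and
the sketch assumes those four thread lines all belong to the fglgo process.

```shell
# sum_sleep SL...   add up the per-thread sleep times
sum_sleep() {
    total=0
    for sl in "$@"
    do
        total=`expr "$total" + "$sl"`
    done
    echo "$total"
}

# min_sleep SL...   smallest per-thread sleep time (the suggested fudge)
min_sleep() {
    min=""
    for sl in "$@"
    do
        if [ -z "$min" ] || [ "$sl" -lt "$min" ]
        then
            min=$sl
        fi
    done
    echo "$min"
}

sum_sleep 845 844 20 20
min_sleep 845 844 20 20
```

The sum comes to 1729, which is exactly the SL value ps reported for fglgo
(PID 2396) in .6, while the minimum is 20.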


					Webb
9053.8. by DCETHD::BUTENHOF (Dave Butenhof, DECthreads) Mon Mar 10 1997 12:16, 25 lines
>Nevertheless, it strikes me that _adding_ the sleep times is the wrong thing
>to do...  :-)

Yes. Especially since the DECthreads "manager thread" (a sort of daemon that
we use for various bookkeeping functions within the thread library) spends
nearly its entire life sleeping. I would expect that for most threaded
programs, the manager thread's sleep time will closely approximate the amount
of time the program has been active -- so adding that time to the sleep time
of other threads seems "unuseful".

Prior to PTmin, the manager thread is a user-mode (process contention scope)
thread like all others, and can migrate between various kernel threads. Thus
its sleep time can be split among any number of kernel threads. (Only kernel
threads are known to, or reported by, ps.) In PTmin, the manager thread is
"system contention scope", and will run on a single "bound" kernel thread for
the life of the process, so it would be easy for ps to factor out this
particular thread in the calculation of process sleep time.

What ps SHOULD show for "process sleep time" in a multithreaded process is
the amount of time that NO (kernel) thread within the process was running. I
have no idea how easy or hard it would be for the kernel to compute this
statistic. Short of that, Webb's suggestion of using the smallest thread
sleep time seems like a reasonable fudge.
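
As a sketch of how a script could apply that fudge itself until ps changes:
treat a process as idle only if every thread's sleep time reaches the
threshold, which is the same as comparing the smallest thread sleep time
against it. The function name process_idle is hypothetical, the 600-second
threshold is taken from the script in .0, and the thread values are those
shown in .6 (assumed to belong to fglgo).

```shell
# process_idle THRESHOLD SL...
# Succeeds (exit status 0) only if every listed thread has slept for at
# least THRESHOLD seconds. With no thread values it also succeeds.
process_idle() {
    threshold=$1
    shift
    for sl in "$@"
    do
        [ "$sl" -ge "$threshold" ] || return 1
    done
    return 0
}

# Thread sleep times from the .6 output, 600-second threshold from .0
if process_idle 600 845 844 20 20
then
    echo "idle"
else
    echo "active"
fi
```

This prints "active", whereas a test against the summed value of 1729 would
have passed the 600-second threshold and treated the process as idle.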

	/dave
9053.9. by SMURF::DENHAM (Digital UNIX Kernel) Mon Mar 10 1997 16:59, 5 lines
    Good stuff, guys.
    
    Now -- somebody file a QAR/IPMT on ps so it gets fixed. I don't
    just jump into the ps code anymore. Too many hoops to jump
    through....