Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

9972.0. "UBC locking and realtime performance" by CSC64::BLAYLOCK (If at first you doubt,doubt again.) Wed May 28 1997 17:26

In the "Guide to Realtime Programming" in Section 11.2 there is
a paragraph that reads:

   "For improved realtime responsiveness, change this the value
   of /sys/conf/param.c to between 50 and 80, depending on the
   amount of file system activity done on the system.  This can
   improve system realtime latency, because when the UBC has
   consumed its maximum allocation of memory for buffering
   file data, the least recently used buffers must be flushed to
   disk if they are modified. Flushing these buffers is done with
   a simple lock held, and therefore can effect process dispatch
   latency.  The more memory that the UBC is allowed to use
   before flushing, the longer it will take to perform the flush-
   ing.  Lowering the value of the ubc_maxpercent parameter
   will cause the flushing to occur more frequently, but take less
   time."

Would it be possible to get further clarification on the 
sentence "Flushing these buffers is done with
a simple lock held, and therefore can effect process dispatch
latency."?  How often and under what circumstances is the UBC
flushed?  For how long will this simple lock be taken out?

The problem that my customer is seeing is that a realtime process
will have an 800 millisecond delay when a process (timeshare priority)
does a "dd if=core of=new_core conv=sparse".  The system itself
is a Turbolaser with 4 processors and 4GB of memory.   The core files
themselves are small (10 to 15MB).

Thanks for any assistance.

Ken
T.R  Title  User  Personal Name  Date  Lines
9972.1  Some info  28964::PARKER  Wed May 28 1997 18:09  31
                                                                   
    Hi, Ken.
    
    >> How often and under what circumstances is the UBC flushed?
    
    Every 30 seconds the update daemon syncs the disks.
    
    >> For how long will this simple lock be taken out?
    
    It depends on several factors but it could be as long as it takes
    to flush the UBC. 800 ms is not unreasonable.
    
    >>The problem that my customer is seeing is that a realtime process
    >>will have an 800 millisecond delay when a process (timeshare priority)
    >>does a "dd if=core of=new_core conv=sparse".  The system itself
    >>is a Turbolaser with 4 processors and 4GB of memory.   The core files
    >>themselves are small (10 to 15MB).
    
    Well, if it's using timeshare scheduling (i.e. non-fixed priority)
    then I would not consider the application "realtime". Have they tried
    using round-robin or fixed priority policies? Might be a good idea.
    
    I would be happy to discuss the issue with you. They could also lock
    their app into memory or try other things to help reduce the latency.
    
    Best Regards,
    
    Lee Parker                       [email protected]
    Digital UNIX Device Driver & Realtime Support
    Realtime Expertise Center        1-800-354-9000
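
The suggestion above to use a fixed-priority policy can be sketched as
follows. This is only an illustration, not the customer's code: it assumes
a Linux-style Python `os` module rather than Digital UNIX, and the helper
names `request_fifo` and `policy_name` are made up for this note. The
valid priority range is system-specific, so the sketch queries it instead
of hard-coding a number.

```python
import os


def request_fifo(priority=None):
    """Try to switch the calling process to SCHED_FIFO.

    Returns True on success and False if the system refuses
    (typically because the caller lacks the needed privilege).
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if priority is None:
        priority = lo
    # Clamp the request into the range this system accepts.
    priority = max(lo, min(hi, priority))
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True


def policy_name(pid=0):
    """Return a readable name for the scheduling policy of `pid`."""
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",  # ordinary timeshare scheduling
        os.SCHED_FIFO: "SCHED_FIFO",    # fixed priority, no time slicing
        os.SCHED_RR: "SCHED_RR",        # fixed priority, round-robin
    }
    return names.get(os.sched_getscheduler(pid), "unknown")
```

Calling `policy_name()` before and after `request_fifo()` shows whether
the change actually took effect, which is worth checking before blaming
latency on anything else.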
    
9972.2  HELIX::SONTAKKE  Thu May 29 1997 12:33  11
    I thought he said the process doing "dd if=core of=new_core
    conv=sparse" was using timeshare priority.
    
    Which version of the operating system?
    
    A few weeks ago, somebody mentioned that doing a large file copy was
    causing the system to become unresponsive during the time it took to
    copy the file.  I have not seen any update to that topic.  I wonder if
    the root cause is similar to Ken's problem.
    
    - Vikas
9972.3  More info  RHETT::PARKER  Thu May 29 1997 13:18  19
    
    Hi, Vikas.
    
    I just talked to Ken about it - yes, the dd(1) is timeshare and
    the realtime app is SCHED_FIFO. It's sending data over the net
    using sockets (I think) so maybe that's one reason it's being
    blocked? Ken is going to try to get more info using ps(1) and 
    any other tools he can find - do you all have any tools that 
    will show more? Their ubc-maxpercent is something like 3% so
    it should not take that long to sync the disks - maybe it's too
    low? Maybe Ken can either mail you more info or post it here. 
    
    I recall the other note too - I'm not sure if/how it was resolved.
    I can't seem to find it right now. 
    
    Thanks, 
    
    Lee
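
    To put rough numbers on the 3% figure: the guide quoted in .0 treats
    ubc_maxpercent as the share of memory the UBC may consume, so the
    cap scales directly with physical memory. The sketch below is only
    arithmetic. The function names are made up for this note, and the
    disk bandwidth passed to the second function is a pure assumption,
    not a measured value from the customer's system.

```python
def ubc_cap_bytes(physmem_bytes, ubc_maxpercent):
    """Upper bound on UBC size for a given physical memory and percentage."""
    return physmem_bytes * ubc_maxpercent // 100


def flush_seconds(dirty_bytes, disk_bytes_per_second):
    """Time to write `dirty_bytes` at an ASSUMED sustained disk bandwidth."""
    return dirty_bytes / disk_bytes_per_second


GIB = 1024 ** 3
MIB = 1024 ** 2

# 4 GB machine from .0, with the roughly 3% setting mentioned above.
cap_at_3 = ubc_cap_bytes(4 * GIB, 3)
# Same machine at the upper end of the guide's suggested 50 to 80 range.
cap_at_80 = ubc_cap_bytes(4 * GIB, 80)

print(f"3%  cap: {cap_at_3 / MIB:.1f} MiB")
print(f"80% cap: {cap_at_80 / MIB:.1f} MiB")
```

    At 3% the cap on a 4 GB machine is a little under 123 MiB, so a
    single 10 to 15 MB core file is a noticeable fraction of the whole
    cache. How long a flush of that much data holds things up depends on
    the real disk bandwidth, which would have to be measured.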
    
9972.4  CSC64::BLAYLOCK  If at first you doubt,doubt again.  Thu May 29 1997 14:09  31

The version of the OS is V4.0A with BL4 of USEG's patches
applied (the January kit).

The only realtime process is reading and writing over the
network (FORE ATM) via the BSD socket interface (UDP).  The
process has never lost any packets, it is only the delay
that is noticed (the packet is echoed with a time stamp).
Changing the network interface to FDDI or ethernet does
not change the delay.

Changing the process priority from 33 to 63 has no effect.
Changing the priority of the netisr thread(s) has no effect.

There are actually no real users on the system (except the
developer) and so an 800 ms delay is not noticeable except to
this one piece of the application.

Even with ps(1), getting a picture of what the RT priority
process is being blocked by is not going to be easy.  The
only time that the delay is seen is when the dd(1) occurs
(the script involved is looking for and copying core files
so the 'conv=sparse' is necessary.)

We have been looking for tools to find out what the scheduler
has been doing, but the POLYCENTER tools are not up to V4.0 yet
and none of the others (Parasite) seem to show what the scheduler
is actually doing.

Thanks to all for the response.
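
For anyone wanting to reproduce the measurement, the timestamped echo
described above can be approximated with a small loopback test. This is
a minimal sketch, not the customer's application: it runs over 127.0.0.1
instead of ATM/FDDI, the names `echo_server` and `measure_rtts` are made
up for this note, and the timestamp is simply a monotonic clock reading
packed into the datagram.

```python
import socket
import struct
import threading
import time


def echo_server(sock, count):
    """Echo `count` datagrams back to their senders unchanged."""
    for _ in range(count):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def measure_rtts(count=5, timeout=2.0):
    """Send timestamped UDP datagrams over loopback.

    Returns the round-trip time of each datagram in milliseconds.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    worker = threading.Thread(
        target=echo_server, args=(server, count), daemon=True
    )
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)
    rtts = []
    try:
        for _ in range(count):
            sent = time.monotonic()
            client.sendto(struct.pack("!d", sent), server.getsockname())
            data, _addr = client.recvfrom(64)
            (echoed,) = struct.unpack("!d", data)
            # The echoed timestamp is the send time, so the difference
            # from "now" is the full round trip.
            rtts.append((time.monotonic() - echoed) * 1000.0)
    finally:
        client.close()
        worker.join(timeout)
        server.close()
    return rtts
```

Running `measure_rtts()` in a loop while the dd(1) script fires, and
logging `max(rtts)` each time, would show whether the delay lines up
with the file copy rather than with the network path.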
9972.5  HELIX::SONTAKKE  Thu May 29 1997 17:18  1
    I just found the note; it is 9075 and it talks about mv performance.
9972.6  KITCHE::schott  Eric R. Schott USG Product Management  Thu May 29 1997 21:42  4
Setting ubc-max to 3% is a really BAD idea....

I don't know the answer, but this one at 3% can't help...