
Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

5043.0. "add 2x more areas backup time up by 5X" by M5::JHAYTER () Tue Feb 18 1997 14:44

Hello,

Rdb 6.1A, on Alpha 6.2

Customer reported that an rmu backup of approx 1000 storage areas, about 10 GB
total, took around 4 hours.  After adding 1000 more storage areas, about 2K each,
the backup took 22 hours.  Watching the backup process, they found it looping
and page faulting.

The customer reported that the process spent a lot of time looping in the range of:
00111090
001110c0
00111098

I have somewhat reproduced the behavior.  I saw a lot of CPU usage and no I/O
in the range of
000ED0B4
000ED0B8
000ED174

Then, after a bunch of IO, more looping in the range of 00110* 
after which my backup finished fairly soon.

time/count for backup of 1000 storage areas (each 46 blocks, 20 pages) 

!cpu    4675
!dio    4755
!bufio  3072
!pgflts 27055
!virtpeak  252368  
end time:   18-FEB-1997 10:55:35.58
start time: 18-FEB-1997 10:53:29.72
elapsed time (seconds) 00000125.86 = 2 min.


time/count for backup of 2000 storage areas (each 46 blocks, 20 pages) 

!cpu    31710  ** 6.8 times more
!dio    9415   ** fwiw, had done around 2400 dio when the cpu/faulting kicked in
!bufio  6103
!pgflts 7833455  ** almost 300 times more
!virtpeak  446368  
end time:   18-FEB-1997 11:45:09.09
start time: 18-FEB-1997 11:35:06.22
elapsed time (seconds) 00000602.87 = 10 min
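
A quick sanity check of the ratios flagged above (a throwaway Python sketch,
not RMU output; the figures are just copied from the two accounting summaries):

# Ratio of the 2000-area run to the 1000-area run, per counter.
runs = {
    "1000 areas": {"cpu": 4675,  "dio": 4755, "bufio": 3072,
                   "pgflts": 27055,   "elapsed_s": 125.86},
    "2000 areas": {"cpu": 31710, "dio": 9415, "bufio": 6103,
                   "pgflts": 7833455, "elapsed_s": 602.87},
}

for metric in ("cpu", "dio", "bufio", "pgflts", "elapsed_s"):
    ratio = runs["2000 areas"][metric] / runs["1000 areas"][metric]
    print(f"{metric:10s} {ratio:8.1f}x")
# cpu ~6.8x, dio ~2.0x, bufio ~2.0x, pgflts ~289.5x, elapsed ~4.8x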


The first 1.5 hours:
30 minutes of CPU
4 million pagefaults
400k I/O

The next 2 hours:
1.5 hours of CPU
200k I/O
virtually no pagefaults

Any clues as to what rmu/backup might be doing in the various CPU-intensive
stretches, and why all the page faulting?  The behavior looks like a candidate
for a bug report.

Thanks
Jerry

5043.1. by M5::LWILCOX (Chocolate in January!!) Tue Feb 18 1997 14:48
Jerry, this sounds ever so vaguely familiar from many years ago.

Does it seem to make a difference if you define the buffers logical name to
something larger?

I might be thinking of an entirely different problem, but it rang a bell.

5043.2. "Disoptimal DB design for backup speed" by NOVA::DICKSON () Tue Feb 18 1997 15:59
    How come there are so many small storage areas?   There is processing
    overhead in BACKUP involved with having large numbers of storage areas
    being backed up at once.  It takes a lot of memory too.  Each storage
    area gets its own set of buffers and its own thread with stack space.
    
    How many tapes are being written in parallel?  What type of tape drive
    is being used?
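
    For a rough feel for that per-area cost, here is a back-of-the-envelope
    Python sketch.  The 32 kB per-buffer figure is the one quoted later in
    this topic (.5); the 64 kB of per-area thread stack is purely an assumed
    placeholder, not a documented RMU number.

# Hypothetical estimate of BACKUP's virtual memory footprint per storage area.
# ASSUMPTIONS: 32 kB per disk buffer (figure quoted in .5) and a guessed
# 64 kB of thread stack per area; neither is documented RMU behaviour.
KB = 1024

def backup_footprint_mb(n_areas, buffers_per_area=1,
                        buffer_bytes=32 * KB, stack_bytes=64 * KB):
    """Estimated address space, in megabytes, for n_areas storage areas."""
    per_area = buffers_per_area * buffer_bytes + stack_bytes
    return n_areas * per_area / (1024 * 1024)

for n in (1000, 2000):
    print(f"{n} areas: roughly {backup_footprint_mb(n):.0f} MB of address space")
# Roughly 94 MB vs. 188 MB under these assumptions -- enough of a jump to
# push the process past its working set and into heavy page faulting.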

5043.3. "how much overhead" by M5::JHAYTER () Tue Feb 18 1997 16:46
Liz, upping buffers made the problem worse.


>    How come there are so many small storage areas?

I think the customer was just "testing".  I didn't feel like creating
a 10 gig db.

>There is processing
>    overhead in BACKUP involved with having large numbers of storage areas
>    being backed up at once.  It takes a lot of memory too.  Each storage
>    area gets its own set of buffers and its own thread with stack space.

That explains some of the additional pagefile needed and some of the
time.  But the overall increase in CPU and page faulting still seems
a bit out of line: 6x for CPU, 300x for pgflts, at least in my test.
  
>    How many tapes are being written in parallel?  What type of tape drive
>    is being used?

On the customer side I don't know; I'll check.  In my testing I was going to
disk.

5043.4. by DUCATI::LASTOVICA (Is it possible to be totally partial?) Tue Feb 18 1997 16:53
I think that you can assume that the increase in CPU time is
probably related to the increase in page faults.  I suspect that
RMU probably scales the number of 'somethings' based on the
number of storage areas and/or the size of each.  You might want
to try doubling the process's working set limits and see if that
helps.

5043.5. by NOVA::DICKSON () Wed Feb 19 1997 09:24
    Performance degradation stops being linear and becomes exponential
    once key resources approach saturation.  So it is entirely reasonable
    for a 2x increase in load to result in a 5x degradation in performance
    if you have pushed it over the knee in the curve.
    
        A formula useful for approximating performance is
    
            Tresponse = Tbase * (1 + u/(1-u))
    
            Where Tresponse is the time to complete an operation under load
                  Tbase is the time to complete the operation in the
                            absence of heavy load
                  u is the fraction of the resource (CPU, network, or disk
                            bandwidth for example) being used.
    
    You can see that as utilization approaches 1.0, Tresponse goes to
    infinity.
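
    To make the knee concrete, here is the slowdown factor 1 + u/(1-u)
    evaluated at a few utilizations (a small Python sketch that just plugs
    numbers into the formula above):

# Slowdown factor from the approximation above:
#   Tresponse = Tbase * (1 + u/(1-u)), which simplifies to Tbase / (1-u).
def slowdown(u):
    """Multiplier on Tbase for a resource at fractional utilization u."""
    return 1 + u / (1 - u)

for u in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"u = {u:4.2f}  ->  {slowdown(u):6.1f} x Tbase")
# u = 0.50 -> 2.0x, u = 0.80 -> 5.0x, u = 0.95 -> 20.0x: past the knee,
# a 2x increase in load can easily show up as a 5x (or worse) increase
# in elapsed time, which is what the base note saw.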
    
    2000 storage areas with even one disk buffer (32kB) each needs 64
    Megabytes of address space.  It is not surprising that you are
    getting a lot of page faults.  BACKUP attempts to read from all 2000 areas
    at once.  (This is planned to change...)
    
5043.6. "Queueing Theory 101" by NOVA::DICKSON () Wed Feb 19 1997 14:50
    For anyone trying to figure out how that formula could be correct, it
    works on probabilities.   If the resource you want to use is idle, then
    your request will be processed right away, in time Tbase. If the
    resource you want to use is busy half the time on average, then when
    your request comes along you have a 50-50 chance of having to wait for
    a previous request to complete.  The busier the resource, the greater
    the probability you will have to wait in a queue somewhere, and
    therefore the longer the queue as everybody else waits too.
    
    This formula takes all that into account, and is very useful for
    back-of-the-envelope calculations about performance scalability.
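
    For anyone who would rather see that argument empirically than take the
    formula on faith, below is a small single-server queue simulation (a
    self-contained Python sketch, not part of the original reply); the
    simulated average response time climbs toward the 1/(1-u) prediction as
    utilization approaches 1.

# Toy single-server FIFO queue: random arrivals, random service times, one
# resource.  The busier the resource, the more likely a request queues behind
# earlier ones, so the average response time grows -- the argument above.
import random

def avg_response(utilization, n_requests=200_000, mean_service=1.0, seed=42):
    """Average response time (queueing wait + service) for one resource."""
    rng = random.Random(seed)
    mean_interarrival = mean_service / utilization
    arrival = 0.0          # arrival time of the current request
    server_free_at = 0.0   # when the server next becomes idle
    total_response = 0.0
    for _ in range(n_requests):
        arrival += rng.expovariate(1.0 / mean_interarrival)  # next arrival
        start = max(arrival, server_free_at)                 # may wait in queue
        service = rng.expovariate(1.0 / mean_service)
        server_free_at = start + service
        total_response += server_free_at - arrival           # wait + service
    return total_response / n_requests

for u in (0.5, 0.8, 0.9, 0.95):
    print(f"u = {u:4.2f}: simulated ~{avg_response(u):5.1f} * Tbase "
          f"(formula says {1 / (1 - u):5.1f})")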