
Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

5043.0. "add 2x more areas backup time up by 5X" by M5::JHAYTER () Tue Feb 18 1997 14:44

Hello,

Rdb 6.1A, on Alpha 6.2

Customer reported that an rmu backup of approx 1000 storage areas, about 10 GB
total, took around 4 hours.  After adding 1000 more storage areas, about 2K each,
the backup took 22 hours.  Watching the backup process, they found it looping
and page faulting.

The customer reported that the process spent a lot of time looping in the range of:
00111090
001110c0
00111098

I have somewhat reproduced the behavior.  I saw a lot of CPU usage and no I/O
in the range of
000ED0B4
000ED0B8
000ED174

Then, after a bunch of IO, more looping in the range of 00110* 
after which my backup finished fairly soon.

time/count for backup of 1000 storage areas (each 46 blocks, 20 pages) 

!cpu    4675
!dio    4755
!bufio  3072
!pgflts 27055
!virtpeak  252368  
end time:   18-FEB-1997 10:55:35.58
start time: 18-FEB-1997 10:53:29.72
elapsed time (seconds) 00000125.86 = 2 min.


time/count for backup of 2000 storage areas (each 46 blocks, 20 pages) 

!cpu    31710  ** 6.8 times more
!dio    9415   ** fwiw, had done around 2400 dio when the cpu/faulting kicked in
!bufio  6103
!pgflts 7833455  ** almost 300 times more
!virtpeak  446368  
end time:   18-FEB-1997 11:45:09.09
start time: 18-FEB-1997 11:35:06.22
elapsed time (seconds) 00000602.87 = 10 min
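
A quick sanity check of the ratios flagged above (a throwaway Python sketch,
not RMU output; the figures are just copied from the two accounting summaries):

# Ratio of the 2000-area run to the 1000-area run, per counter.
runs = {
    "1000 areas": {"cpu": 4675,  "dio": 4755, "bufio": 3072,
                   "pgflts": 27055,   "elapsed_s": 125.86},
    "2000 areas": {"cpu": 31710, "dio": 9415, "bufio": 6103,
                   "pgflts": 7833455, "elapsed_s": 602.87},
}

for metric in ("cpu", "dio", "bufio", "pgflts", "elapsed_s"):
    ratio = runs["2000 areas"][metric] / runs["1000 areas"][metric]
    print(f"{metric:10s} {ratio:8.1f}x")
# cpu ~6.8x, dio ~2.0x, bufio ~2.0x, pgflts ~289.5x, elapsed ~4.8x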


The first 1.5 hours:
30 minutes of CPU
4 million pagefaults
400k I/O

The next 2 hours:
1.5 hours of CPU
200k I/O
virtually no pagefaults

Any clues as to what rmu/backup might be doing in the various CPU-intensive
stretches, and why all the page faulting?  The behavior looks like a candidate
for a bug report.

Thanks
Jerry

5043.1. by M5::LWILCOX (Chocolate in January!!) Tue Feb 18 1997 14:48
Jerry, this sounds ever so vaguely familiar from many years ago.

Does it seem to make a difference if you define the buffers logical name to
something larger?

I might be thinking of an entirely different problem, but it rang a bell.

5043.2. "Disoptimal DB design for backup speed" by NOVA::DICKSON () Tue Feb 18 1997 15:59
    How come there are so many small storage areas?   There is processing
    overhead in BACKUP involved with having large numbers of storage areas
    being backed up at once.  It takes a lot of memory too.  Each storage
    area gets its own set of buffers and its own thread with stack space.
    
    How many tapes are being written in parallel?  What type of tape drive
    is being used?
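
    For a rough feel for that per-area cost, here is a back-of-the-envelope
    Python sketch.  The 32 kB per-buffer figure is the one quoted later in
    this topic (.5); the 64 kB of per-area thread stack is purely an assumed
    placeholder, not a documented RMU number.

# Hypothetical estimate of BACKUP's virtual memory footprint per storage area.
# ASSUMPTIONS: 32 kB per disk buffer (figure quoted in .5) and a guessed
# 64 kB of thread stack per area; neither is documented RMU behaviour.
KB = 1024

def backup_footprint_mb(n_areas, buffers_per_area=1,
                        buffer_bytes=32 * KB, stack_bytes=64 * KB):
    """Estimated address space, in megabytes, for n_areas storage areas."""
    per_area = buffers_per_area * buffer_bytes + stack_bytes
    return n_areas * per_area / (1024 * 1024)

for n in (1000, 2000):
    print(f"{n} areas: roughly {backup_footprint_mb(n):.0f} MB of address space")
# Roughly 94 MB vs. 188 MB under these assumptions -- enough of a jump to
# push the process past its working set and into heavy page faulting.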

5043.3. "how much overhead" by M5::JHAYTER () Tue Feb 18 1997 16:46
Liz, upping buffers made the problem worse.


>    How come there are so many small storage areas?

I think the customer was just "testing".  I didn't feel like creating
a 10 gig db.

>There is processing
>    overhead in BACKUP involved with having large numbers of storage areas
>    being backed up at once.  It takes a lot of memory too.  Each storage
>    area gets its own set of buffers and its own thread with stack space.

That explains some of the additional pagefile needed and some of the
time.  But the overall increase in CPU and page faulting still seems
a bit out of line: 6x for CPU, 300x for pgflts, at least in my test.
  
>    How many tapes are being written in parallel?  What type of tape drive
>    is being used?

On the customer side I don't know; I'll check.  In my testing I was going to
disk.

5043.4. by DUCATI::LASTOVICA (Is it possible to be totally partial?) Tue Feb 18 1997 16:53
I think that you can assume that the increase in CPU time is
probably related to the increase in page faults.  I suspect that
RMU probably scales the number of 'somethings' based on the
number of storage areas and/or the size of each.  You might want
to try doubling the process's working set limits and see if that
helps.

5043.5. by NOVA::DICKSON () Wed Feb 19 1997 09:24
    Performance degradation stops being linear and becomes exponential
    once key resources approach saturation.  So it is entirely reasonable
    for a 2x increase in load to result in a 5x degradation in performance
    if you have pushed it over the knee in the curve.
    
        A formula useful for approximating performance is
    
            Tresponse = Tbase * (1 + u/(1-u))
    
            Where Tresponse is the time to complete an operation under load
                  Tbase is the time to complete the operation in the
                            absence of heavy load
                  u is the fraction of the resource (CPU, network, or disk
                            bandwidth for example) being used.
    
    You can see that as utilization approaches 1.0, Tresponse goes to
    infinity.
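
    To make the knee concrete, here is the slowdown factor 1 + u/(1-u)
    evaluated at a few utilizations (a small Python sketch that just plugs
    numbers into the formula above):

# Slowdown factor from the approximation above:
#   Tresponse = Tbase * (1 + u/(1-u)), which simplifies to Tbase / (1-u).
def slowdown(u):
    """Multiplier on Tbase for a resource at fractional utilization u."""
    return 1 + u / (1 - u)

for u in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"u = {u:4.2f}  ->  {slowdown(u):6.1f} x Tbase")
# u = 0.50 -> 2.0x, u = 0.80 -> 5.0x, u = 0.95 -> 20.0x: past the knee,
# a 2x increase in load can easily show up as a 5x (or worse) increase
# in elapsed time, which is what the base note saw.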
    
    2000 storage areas with even one disk buffer (32kB) each needs 64
    Megabytes of address space.  It is not surprising that you are
    getting a lot of page faults.  BACKUP attempts to read from all 2000 areas
    at once.  (This is planned to change...)
    
5043.6. "Queueing Theory 101" by NOVA::DICKSON () Wed Feb 19 1997 14:50
    For anyone trying to figure out how that formula could be correct, it
    works on probabilities.   If the resource you want to use is idle, then
    your request will be processed right away, in time Tbase. If the
    resource you want to use is busy half the time on average, then when
    your request comes along you have a 50-50 chance of having to wait for
    a previous request to complete.  The busier the resource, the greater
    the probability you will have to wait in a queue somewhere, and
    therefore the longer the queue as everybody else waits too.
    
    This formula takes all that into account, and is very useful for
    back-of-the-envelope calculations about performance scalability.
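
    For anyone who would rather see that argument empirically than take the
    formula on faith, below is a small single-server queue simulation (a
    self-contained Python sketch, not part of the original reply); the
    simulated average response time climbs toward the 1/(1-u) prediction as
    utilization approaches 1.

# Toy single-server FIFO queue: random arrivals, random service times, one
# resource.  The busier the resource, the more likely a request queues behind
# earlier ones, so the average response time grows -- the argument above.
import random

def avg_response(utilization, n_requests=200_000, mean_service=1.0, seed=42):
    """Average response time (queueing wait + service) for one resource."""
    rng = random.Random(seed)
    mean_interarrival = mean_service / utilization
    arrival = 0.0          # arrival time of the current request
    server_free_at = 0.0   # when the server next becomes idle
    total_response = 0.0
    for _ in range(n_requests):
        arrival += rng.expovariate(1.0 / mean_interarrival)  # next arrival
        start = max(arrival, server_free_at)                 # may wait in queue
        service = rng.expovariate(1.0 / mean_service)
        server_free_at = start + service
        total_response += server_free_at - arrival           # wait + service
    return total_response / n_requests

for u in (0.5, 0.8, 0.9, 0.95):
    print(f"u = {u:4.2f}: simulated ~{avg_response(u):5.1f} * Tbase "
          f"(formula says {1 / (1 - u):5.1f})")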