Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

5114.0. "Parallel Load Performance Question" by NOVA::SANTIAGO (I was a teenage net-random.) Thu Mar 06 1997 10:28

    I'm trying to understand parallel performance in the following scenario
    
    	- load 2.7 billion rows in a single table
    	- area is partitioned by year (6) and by a market channel (41)
    	- of the 41 various channels, 8 comprise > 50% of all rows
    	- using a thread count of 17; this number allows each of the 8
    	  hot areas to get its own executor
    	- the record itself is being read from a fixed binary file; 55
          bytes each by month (total 72 files for 6 years)
    	- commits are 10000, row_count 500
    	- performance is about 20K+ records/second (1.5 hours per file), as
    	  follows:
    
    %RMU-I-EXECSTAT0, Statistics for EXECUTOR_1:
    %RMU-I-EXECSTAT1,   Elapsed time:  01:37:25.45    CPU time:      1624.63
    %RMU-I-EXECSTAT2,   Storing time:  00:28:19.83    Rows stored:   11540432
    %RMU-I-EXECSTAT3,   Commit time:   00:02:45.76    Direct I/O:     35916
    %RMU-I-EXECSTAT4,   Idle time:     01:06:15.81    Early commits:  1
    
    my concern is the idle time; this is on an 8400 w/ 6 CPUs, currently
    seeing average utilization of 350%; I/O is spread evenly across 8 hsj
    controllers (each mapping 8 storage units, each comprised of 4-member
    rz29 stripesets)
    
    my goal is to cut this time by 40% or more; any suggestions?
    
    /los
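
As a rough check on the EXECUTOR_1 statistics above, the following sketch converts the reported times to seconds and works out how the elapsed time splits. It uses only the figures printed in the log; Python and the helper name `hms_to_seconds` are chosen for illustration and are not part of RMU.

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS.ss' string, as printed in the log, to seconds."""
    hours, minutes, seconds = text.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Figures copied from the %RMU-I-EXECSTAT lines for EXECUTOR_1.
elapsed = hms_to_seconds("01:37:25.45")
storing = hms_to_seconds("00:28:19.83")
commit = hms_to_seconds("00:02:45.76")
idle = hms_to_seconds("01:06:15.81")
rows_stored = 11540432

# Storing + commit + idle should account for nearly all of the elapsed time.
accounted = storing + commit + idle

idle_fraction = idle / elapsed            # share of elapsed time spent idle
rows_per_second = rows_stored / elapsed   # throughput of this one executor
target_elapsed = elapsed * (1 - 0.40)     # elapsed time after a 40% cut

print(f"accounted for: {accounted:.2f} of {elapsed:.2f} seconds")
print(f"idle fraction: {idle_fraction:.1%}")
print(f"rows/second (this executor): {rows_per_second:.0f}")
print(f"target elapsed: {target_elapsed / 60:.1f} minutes")
```

On these figures the executor is idle for roughly 68% of its elapsed time and stores just under 2,000 rows per second. The three reported components sum to within about 4 seconds of the elapsed time, and a 40% cut would bring the elapsed time for this file to about 58 minutes.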

5114.1. "physical area layout" by NOVA::SANTIAGO (I was a teenage net-random.) Thu Mar 06 1997 10:30 (6 lines)

    oh, forgot to mention 1 detail;
    
    there are 6 databases, one per year, mapped to by parallel query; each
    database contains 41 area files (1 per channel)
    
    /los

5114.2. by NOVA::SMITHI (Don't understate or underestimate Rdb!) Thu Mar 06 1997 12:12 (10 lines)

Is the data in the source sorted?

If so, then the load is probably delivering rows to the first executor
initially, then after a while it favours the next executor, and so on.  The _1
executor then sits idle.

rmu/load/parallel works best with unsorted data so that all executors are
active at the same time.

Ian
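
The effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: each row is assumed to be routed to the one executor that owns its storage area, and the executor count, row counts, and window size are made-up values, not RMU internals. It counts how many distinct executors receive work within each run of consecutive input rows.

```python
import random

def distinct_executors_per_window(keys, window):
    """Average number of distinct executor keys per run of `window` consecutive rows."""
    counts = [len(set(keys[i:i + window])) for i in range(0, len(keys), window)]
    return sum(counts) / len(counts)

random.seed(0)           # fixed seed so the run is repeatable
NUM_EXECUTORS = 8        # hypothetical number of executors
ROWS_PER_EXECUTOR = 1000 # hypothetical rows destined for each executor
WINDOW = 500             # hypothetical number of rows examined at a time

# Sorted input: all rows for executor 0, then all rows for executor 1, ...
sorted_keys = [e for e in range(NUM_EXECUTORS) for _ in range(ROWS_PER_EXECUTOR)]

# Unsorted input: the same rows in random order.
shuffled_keys = sorted_keys[:]
random.shuffle(shuffled_keys)

sorted_spread = distinct_executors_per_window(sorted_keys, WINDOW)
shuffled_spread = distinct_executors_per_window(shuffled_keys, WINDOW)

print("sorted input, executors fed per window:  ", sorted_spread)
print("shuffled input, executors fed per window:", shuffled_spread)
```

In this model, sorted input feeds a single executor per window while the others wait, whereas shuffled input gives every executor some rows in every window.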

5114.3. "goal is to reduce idle time?" by NOVA::SANTIAGO (I was a teenage net-random.) Thu Mar 06 1997 15:02 (20 lines)

    the input is created as follows
    
    	loop 1..31 days
    	  loop 1..n products
            loop 1..n markets
              loop 1..n channels
    
    yielding about 118M per month; the executors are usually 25-65% cpu
    busy and fairly even across them; it had been skewed until I moved the
    channel loop down
    
    I suspect there's an interaction between the number of buffers (comm)
    and the row count to make the executors busy, but I didn't want to
    create busy work due to comm loading, nor do I want stalls due to comm
    starvation
    
    I guess the question I'm really trying to answer is, is idle time bad in
    a parallel load?
    
    /los
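
The loop nest in this note can be sketched as follows to show why moving the channel loop to the innermost position evens out the input. The 31 days and 41 channels come from the thread; the product and market counts are made-up sizes, and the function names are hypothetical.

```python
from itertools import islice, product

# 31 days and 41 channels are from the thread; PRODUCTS and MARKETS are made up.
DAYS, PRODUCTS, MARKETS, CHANNELS = 31, 3, 4, 41

def channel_innermost():
    """Channel varies fastest, matching the loop nest shown in the note."""
    for day, prod, market, channel in product(
        range(DAYS), range(PRODUCTS), range(MARKETS), range(CHANNELS)
    ):
        yield channel

def channel_outermost():
    """Channel varies slowest, so the file is effectively sorted by channel."""
    for channel, day, prod, market in product(
        range(CHANNELS), range(DAYS), range(PRODUCTS), range(MARKETS)
    ):
        yield channel

def channels_in_prefix(records, n):
    """Count the distinct channels among the first n generated records."""
    return len(set(islice(records, n)))

inner = channels_in_prefix(channel_innermost(), 82)
outer = channels_in_prefix(channel_outermost(), 82)

print("channel innermost, distinct channels in first 82 records:", inner)
print("channel outermost, distinct channels in first 82 records:", outer)
```

With the channel loop innermost, any short run of records touches every channel; with it outermost, a short run touches only one.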

5114.4. by AVMSV1::EKREISLE (Erich Kreisler) Fri Mar 14 1997 08:37 (3 lines)

Did you also try increasing the buffer option?

erich