[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference heron::dw_tools

Title:Datamart & Datawarehouse tools support
Moderator:HERON::ROWLANDS
Created:Thu Nov 28 1996
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:45
Total number of notes:155

33.0. "A question on database structure" by COPCLU::SCHOUBO (No noose is good noose) Wed Apr 02 1997 10:46

Hi,

I would really appreciate any comments you could give on the following 
DataWarehouse database setup I came across at a customer site. 
My gut feeling is that they have created an unnecessarily complicated
database, but then my experience is with modelling, not tuning.

Please feel free to comment here or directly to me at Jan Schoubo @DMA.
The comments will not make much difference in the current situation, but
I need to know whether this kind of setup is really necessary when I 
- hopefully - get into other similar situations (but with Alphas :-).


THE SITUATION

I am in the process of selling Business Objects to a major Telecom customer 
running several Data Warehouse databases (One DB2 PE on IBM SP2, one 
Oracle7 on HP, no Alphas unfortunately). 
One of these databases has a rather strange structure, 
according to the customer optimized with the sole purpose of enabling loads
in the time-window available. The structure (two aspects really) makes 
the setup of Business Objects complicated, as I shall describe.

The data is international Call Detail Records (CDR) - one for every call made 
from Denmark to abroad and visa versa. We are talking about 1-2 million records
per day, stored in detail for the last month, and summed at various levels
for older data.

The database is an Oracle 7 running on an HP box of some sort.

(Fast question: Is this a BIG amount of data for a decent Alpha? Anything
to worry about, if we have a bid of this size?)


I have describe the two different aspects below, and the combination at the end.



ASPECT 1

Instead of simply one table containing all 31 x 1-2 mill records 
with a non-unique key of (year, month, day), the data is split into
31 tables (TAB1, TAB2, ...) each containing data for one day.
To further complicate things, day 1 is not in TAB1 but perhaps in TAB17, 
and a management table keeps the association between the name of the table 
and the day of the data kept in it. All the TABn tables do have 
(year, month, day) as fields, though. Other fields include A- and B-number
(to and from telephone number), time of day, length of call in seconds, 
charge for that call, whether the call completed or not, customer etc.

The first problem has to do with defining one Business Objects object
containing a field from a CDR.

This can only be done elegantly by defining a view that gathers the 
data from all the TABn tables:

CREATE VIEW VIEWTAB AS
  SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB1 UNION
  SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB2 UNION 
  ....
  SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB31;

I thought Oracle had a limit on the textual length of the code for the view,
but that does not seem to be an issue.

By defining the BO objects on this view, the BO user can simply do a query
of say Month, CallCompleted, Charge, Length and even include a condition
like Day in (5,6) to have two days worth of data.

It turns out (via experiments) that the overhead for Oracle to check all
31 tables (if there is an index on Day) is very slight. However, if there
is an index, Oracle will then read that table by index which is much slower 
than scanning all records. 

If there is no index on Day, Oracle will read all 31 tables, which is of
course much slower in the common situation of just wanting one or a few 
days worth of records in your report.



ASPECT 2

In summary tables (say summed per customer per day) they have more than 
one record per customer per day. 

The record layout is like this:

    Customer, Year, Month, Day, Kind, Value1, Value2, Value3

The kind field defines the meaning of the Valuen fields:

    Kind   Value1                Value2               Value3
    =      contains              contains             contains
    1      Charge                CallLength           (nothing)
    2      Charge within Europe  Charge to Far East   Charge to US
    3      Length within Europe  Length to Far East   Length to US
      ... etc. ...

The reason for doing this was quoted as spacesaving, as not all kinds need
be represented (in case all three values are zero, the record is not present).

Again, to make the BO definitions elegant, a view is defined for each kind,

CREATE VIEW VIEVSUM1 AS
  SELECT CUSTOMER, YEAR, MONTH, DAY, CHARGE, CALLLENGTH FROM TABSUM
    WHERE KIND = 1;

CREATE VIEW VIEVSUM2 AS
  SELECT CUSTOMER, YEAR, MONTH, DAY, CHARGEEU, CHARGEFE, CHARGEUS FROM TABSUM
    WHERE KIND = 2;

   ....

Now you can define BO objects on CHARGE, CHARGEUS, CHARGEFE etc.

This means that BO will generate joins between two views (and therefore 
two tables) whenever you run a BO query with objects from two different 
views.

Any comment on why not just have all the columns in the same record, and
hope (?) Oracle does space compression if values are zero (Rdb does :-).
In case you need to add a new information type, simply add a column to the 
tables?

ASPECT 1 and ASPECT 2

Well, turns out that the summary tables are ALSO structured as 31 tables 
for each summary! This means that although there is only 20 different 
record layouts in all, there is over 500 tables in the database!
This also means that the views mentioned in ASPECT 2 must be multiplied
by 31!

- Jan
T.RTitleUserPersonal
Name
DateLines
33.1YuccHERON::ROWLANDSRob Rowlands @vbo 828 5480Fri Apr 04 1997 16:2215
Jan,
What a crap database design - I sympathise!
Dutch PTT are building a similar DW on call records, and to get the load
throughput they are using Redbrick (renowned for it's fast loading).

Looks like they should've bought Alpha,since it should be possible to load a
couple of million records a day quite easily. In an experiment we did in Rome,
we were loading 800,000 records in about half an hour (no indexes, striped
disks, just a little Alpha 1000).

Finally I don't (yet...) feel qualified to talk about Oracle compression
capabilities of missing fields. You might try asking the oracle notes file.
What is really useful for the missing fields is vertical partiioning, which (of
course) oracle does not yet have (Informix does).
 
33.2Check thisUTROP1::utodhcp-197-240-212.uto.dec.com::olthof_hSpellchecked Henry AlthoughMon Apr 07 1997 17:3421
Well,

Dutch PTT loads 40M CDRs/day, in total 5 GB/day. Problem they had
was that records had to be inserted in the 'right place' in the table,
to allow fast retrieval. Red Brick proved to be faster than Oracle
on this. Check the reference information in the DW notesfile, note 5.95
(from the top of my head), reference PTT CDR.

Typically, when there is not enough 'space' between the steps in
aggregations/hierarchies, this leads to an explosion of summary
tables, also sometimes to an explosion of storage requirements.

If you want to discuss technical issues, contact the DBTC in M�nich
directly, either Martin Gosejacob or Klaus Grupe, both @RTO. Also the
Dutch people can help, contact Paul Laman or Jan Mulder, both @UTO.
Looks to me like the design in Holland was optimized for retrieval
(=problems with load), maybe the Danish design was optimized for
load (=problems with retrieval). It's either the cat or the dog.

Cheers,
Henny
33.3Thanks! This is a sticky issue...COPCLU::SCHOUBONo noose is good nooseTue Apr 08 1997 16:1416
Hi Rob, Henny,

Thanks for your feedback - I will feel more comfortable in the future when 
promoting DW.

As for the database in question, this could turn out to be a critical 
issue for BusinessObjects at that department. They obviously designed for 
load rather than retrieval, so we are stuck with the mess.
What it boils down to is wether it is possible to have the BO construct 
the object definitions on the fly, based on parameters, before running 
the resulting SQL...! The customer does this in programs now, but then again, 
why have a query tool, when you write your own programs anyway...

I'll get back if there are some interesting developments.

- Jan
33.4HERON::ROWLANDSRob Rowlands @vbo 828 5480Tue Apr 08 1997 18:0115
Jan,

I should say also that your favourite query tool does have a feature that helps
in your case. How about:

rename the table to 

sales_month@prompt('WHich Month','N',,)

This is what I'm seeing in a BO slide, so no guarantee exactly how it works...!
Also you could play with @variable rather than @prompt, where the variable
could be some environmental variable...

Hope this helps,
Rob
33.5Rename Table does not call the prompt...COPCLU::SCHOUBONo noose is good nooseFri Apr 11 1997 14:029
Rob,

Thanks for the suggestion. I saw the same slide yesterday, and I have 
now tried it. Whether it is a bug or me, I don't know (probably a bug :-)
but it doesn't work. When you say run to the query, it just accepts the
click, but still keeps the Data Provider window open - i.e. nothing happens.
I haven't given up, though...

- Jan