[Search for users]
[Overall Top Noters]
[List of all Conferences]
[Download this site]
Title: | Datamart & Datawarehouse tools support |
|
Moderator: | HERON::ROWLANDS |
|
Created: | Thu Nov 28 1996 |
Last Modified: | Wed Jun 04 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 45 |
Total number of notes: | 155 |
33.0. "A question on database structure" by COPCLU::SCHOUBO (No noose is good noose) Wed Apr 02 1997 10:46
Hi,
I would really appreciate any comments you could give on the following
DataWarehouse database setup I came across at a customer site.
My gut feeling is that they have created an unnecessarily complicated
database, but then my experience is with modelling, not tuning.
Please feel free to comment here or directly to me at Jan Schoubo @DMA.
The comments will not make much difference in the current situation, but
I need to know whether this kind of setup is really necessary when I
- hopefully - get into other similar situations (but with Alphas :-).
THE SITUATION
I am in the process of selling Business Objects to a major Telecom customer
running several Data Warehouse databases (One DB2 PE on IBM SP2, one
Oracle7 on HP, no Alphas unfortunately).
One of these databases has a rather strange structure,
according to the customer optimized with the sole purpose of enabling loads
in the time-window available. The structure (two aspects really) makes
the setup of Business Objects complicated, as I shall describe.
The data is international Call Detail Records (CDR) - one for every call made
from Denmark to abroad and visa versa. We are talking about 1-2 million records
per day, stored in detail for the last month, and summed at various levels
for older data.
The database is an Oracle 7 running on an HP box of some sort.
(Fast question: Is this a BIG amount of data for a decent Alpha? Anything
to worry about, if we have a bid of this size?)
I have describe the two different aspects below, and the combination at the end.
ASPECT 1
Instead of simply one table containing all 31 x 1-2 mill records
with a non-unique key of (year, month, day), the data is split into
31 tables (TAB1, TAB2, ...) each containing data for one day.
To further complicate things, day 1 is not in TAB1 but perhaps in TAB17,
and a management table keeps the association between the name of the table
and the day of the data kept in it. All the TABn tables do have
(year, month, day) as fields, though. Other fields include A- and B-number
(to and from telephone number), time of day, length of call in seconds,
charge for that call, whether the call completed or not, customer etc.
The first problem has to do with defining one Business Objects object
containing a field from a CDR.
This can only be done elegantly by defining a view that gathers the
data from all the TABn tables:
CREATE VIEW VIEWTAB AS
SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB1 UNION
SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB2 UNION
....
SELECT YEAR,MONTH,DAY,ANUM,BNUM,CHARGE,LENGTH,... FROM TAB31;
I thought Oracle had a limit on the textual length of the code for the view,
but that does not seem to be an issue.
By defining the BO objects on this view, the BO user can simply do a query
of say Month, CallCompleted, Charge, Length and even include a condition
like Day in (5,6) to have two days worth of data.
It turns out (via experiments) that the overhead for Oracle to check all
31 tables (if there is an index on Day) is very slight. However, if there
is an index, Oracle will then read that table by index which is much slower
than scanning all records.
If there is no index on Day, Oracle will read all 31 tables, which is of
course much slower in the common situation of just wanting one or a few
days worth of records in your report.
ASPECT 2
In summary tables (say summed per customer per day) they have more than
one record per customer per day.
The record layout is like this:
Customer, Year, Month, Day, Kind, Value1, Value2, Value3
The kind field defines the meaning of the Valuen fields:
Kind Value1 Value2 Value3
= contains contains contains
1 Charge CallLength (nothing)
2 Charge within Europe Charge to Far East Charge to US
3 Length within Europe Length to Far East Length to US
... etc. ...
The reason for doing this was quoted as spacesaving, as not all kinds need
be represented (in case all three values are zero, the record is not present).
Again, to make the BO definitions elegant, a view is defined for each kind,
CREATE VIEW VIEVSUM1 AS
SELECT CUSTOMER, YEAR, MONTH, DAY, CHARGE, CALLLENGTH FROM TABSUM
WHERE KIND = 1;
CREATE VIEW VIEVSUM2 AS
SELECT CUSTOMER, YEAR, MONTH, DAY, CHARGEEU, CHARGEFE, CHARGEUS FROM TABSUM
WHERE KIND = 2;
....
Now you can define BO objects on CHARGE, CHARGEUS, CHARGEFE etc.
This means that BO will generate joins between two views (and therefore
two tables) whenever you run a BO query with objects from two different
views.
Any comment on why not just have all the columns in the same record, and
hope (?) Oracle does space compression if values are zero (Rdb does :-).
In case you need to add a new information type, simply add a column to the
tables?
ASPECT 1 and ASPECT 2
Well, turns out that the summary tables are ALSO structured as 31 tables
for each summary! This means that although there is only 20 different
record layouts in all, there is over 500 tables in the database!
This also means that the views mentioned in ASPECT 2 must be multiplied
by 31!
- Jan
T.R | Title | User | Personal Name | Date | Lines |
---|
33.1 | Yucc | HERON::ROWLANDS | Rob Rowlands @vbo 828 5480 | Fri Apr 04 1997 16:22 | 15 |
| Jan,
What a crap database design - I sympathise!
Dutch PTT are building a similar DW on call records, and to get the load
throughput they are using Redbrick (renowned for it's fast loading).
Looks like they should've bought Alpha,since it should be possible to load a
couple of million records a day quite easily. In an experiment we did in Rome,
we were loading 800,000 records in about half an hour (no indexes, striped
disks, just a little Alpha 1000).
Finally I don't (yet...) feel qualified to talk about Oracle compression
capabilities of missing fields. You might try asking the oracle notes file.
What is really useful for the missing fields is vertical partiioning, which (of
course) oracle does not yet have (Informix does).
|
33.2 | Check this | UTROP1::utodhcp-197-240-212.uto.dec.com::olthof_h | Spellchecked Henry Although | Mon Apr 07 1997 17:34 | 21 |
| Well,
Dutch PTT loads 40M CDRs/day, in total 5 GB/day. Problem they had
was that records had to be inserted in the 'right place' in the table,
to allow fast retrieval. Red Brick proved to be faster than Oracle
on this. Check the reference information in the DW notesfile, note 5.95
(from the top of my head), reference PTT CDR.
Typically, when there is not enough 'space' between the steps in
aggregations/hierarchies, this leads to an explosion of summary
tables, also sometimes to an explosion of storage requirements.
If you want to discuss technical issues, contact the DBTC in M�nich
directly, either Martin Gosejacob or Klaus Grupe, both @RTO. Also the
Dutch people can help, contact Paul Laman or Jan Mulder, both @UTO.
Looks to me like the design in Holland was optimized for retrieval
(=problems with load), maybe the Danish design was optimized for
load (=problems with retrieval). It's either the cat or the dog.
Cheers,
Henny
|
33.3 | Thanks! This is a sticky issue... | COPCLU::SCHOUBO | No noose is good noose | Tue Apr 08 1997 16:14 | 16 |
| Hi Rob, Henny,
Thanks for your feedback - I will feel more comfortable in the future when
promoting DW.
As for the database in question, this could turn out to be a critical
issue for BusinessObjects at that department. They obviously designed for
load rather than retrieval, so we are stuck with the mess.
What it boils down to is wether it is possible to have the BO construct
the object definitions on the fly, based on parameters, before running
the resulting SQL...! The customer does this in programs now, but then again,
why have a query tool, when you write your own programs anyway...
I'll get back if there are some interesting developments.
- Jan
|
33.4 | | HERON::ROWLANDS | Rob Rowlands @vbo 828 5480 | Tue Apr 08 1997 18:01 | 15 |
| Jan,
I should say also that your favourite query tool does have a feature that helps
in your case. How about:
rename the table to
sales_month@prompt('WHich Month','N',,)
This is what I'm seeing in a BO slide, so no guarantee exactly how it works...!
Also you could play with @variable rather than @prompt, where the variable
could be some environmental variable...
Hope this helps,
Rob
|
33.5 | Rename Table does not call the prompt... | COPCLU::SCHOUBO | No noose is good noose | Fri Apr 11 1997 14:02 | 9 |
| Rob,
Thanks for the suggestion. I saw the same slide yesterday, and I have
now tried it. Whether it is a bug or me, I don't know (probably a bug :-)
but it doesn't work. When you say run to the query, it just accepts the
click, but still keeps the Data Provider window open - i.e. nothing happens.
I haven't given up, though...
- Jan
|