T.R | Title | User | Personal Name | Date | Lines |
---|
1248.1 | Take your time | IJSAPL::OLTHOF | Henny Olthof@UTO, 838-2021 | Thu May 06 1993 10:45 | 9 |
| Lars,
I have no immediate need for that information but it would be nice if
you could post a pointer once you are done with the writing of the
material. You know, it's hard to get competitive info these days.
Anyway, Informix is not (yet) a competitor on the same platform as Rdb.
Thanks,
Henny
|
1248.2 | INFORMIX/OnLine V5.01 Weak Points | COPCLU::BRUNSGAARD | Alarm set to Quarter past Friday | Tue May 11 1993 20:39 | 175 |
| This note is highlighting areas where INFORMIX/OnLine has weak points.
The information is presented without explanations, just the plain facts.
If you need explanations why I feel it is a weak point, or specific questions
about featues, please don't hesistate to reply here.
Whereever appropiate I compare with corresponding DEC Rdb features.
The information is based on
DEC Rdb V5.1
INFORMIX/OnLine V5.01
and my knowlege about both products at the time of writing.
This note contains information about:
System Architecture
I/O Architecture
BLOB Support
Referencial Integrity
Storage mechanisms
Transaction Control
Optimizer
System Architecture:
====================
1 database attach = 1 frontend process + 1 backend process
typically used in Client/server environments.
Precompiler support under Windows (ie native SQL is written)
Minimal *dictionary* support throught the
DEFINE ... LIKE table
command.
both in stored procedures and windows Client tools (4gl)
I/O Architecture:
==================
Page Sizes are fixed at either 2Kbytes or 4Kbytes
Size is set by the installation kit and is decided only by the INFORMIX
development group on a per platform basis.
Compare this with Rdb's 0.5-16Kbytes page size intervals
Buffer Size is fixed at Page size, except for SEQUENTIAL SCAN's where an
8 page buffer is used.
Compare this with Rdb's 0.5-32Kbytes buffer size intervals
INDEX NODE sizes are fixed at page size
Compare this with Rdb's user defined node sizes
DBspaces (Storage Areas) have predefined sizes.
Compare this with Rdb's online extensions
Logical and Physical Logs (similar in AIJ and RUJ in function) have predefined
sizes
Compare this with Rdb's online extensions
Table and Index will always share the same Tablespaces (storage areas)
Compare this with Rdb's advanced storage map posibilities for table and
indices
INDEX are always B*-tree indices.
Compare this with Rdb's SORTED and HASH indices
ROWID (DBKey) consists of 3 bytes TABLE-ID, 1 byte Row index
The actual chunk (file) the data has to be looked up in another table
Note this architectually limits the number of pages (hence rows) for a
given table + all indices to 2**24 ~ 16 mill. pages
See also BLOB Support.
Compare this with Rdb's 4byte structure and variable page sizes
BLOB Support
================
All BLOBs in a database (OnLine system) share the same BLOBspace.
Compare this with Rdb's advanced storage map posibilities for BLOBS
The ONLINE archive (backup) cannot process the BLOBspace online.
Compare this with Rdb's advanced ONLINE backup options.
There is no support for WORM devices
Storage mechanisms
===================
Rows are always UNCOMPRESSED
Dynamic structures are handled by having all static structures on a
*home* page, and all dynamic structures (VARCHAR and BLOBS) on 1 or more
separate pages.
Compare this with Rdb's advanced storage map posibilities for table
Indices are always UNCOMPRESSED
Compare this with Rdb's advanced pre/post fix compression applied
automatically and the user specified compression option.
VARCHAR's can only be 256 chars maximum.
Compare this with Rdb's 32000 chars limit
Free space is found usig special BITMAP pages that uses
2bit structure for fixed size rows less than page size
4bit structure for dynamic rows
This means one bitmap page for approx each 7000 pages
Compare this with Rdb's advanced AIP/SPAM mechanism to find free space
CLUSTER INDICES is a *one shot* operation that only sorts data
CREATE CLUSTER INDEX ... or
ALTER INDEX <name> TO CLUSTER
really doesn't apply to the index at all.
It sort the DATA rows according to the index, and places them back on disk
(now in sorted order).
However NO ATTEMPTS are done from now on by INFORMIX to keep the clustering
effect at run-time.
Compare this with Rdb's real data clustering both by hash indices and
sorted indices.
Transaction Control
====================
There is no support for ISOLATION LEVEL SERIALIZABLE transactions
There is no concept of a READ ONLY transaction,
ie a transaction that guaranties a consistent data view,
without imposing any visible locking.
Only 1 transaction can be rolled back at a time, in case of a process being
killed.
Compare this with Rdb's parallel recovery mechanism.
Referencial Integrity
=====================
PRIMARY KEY and FOREIGN KEY's create INDEX implicitly to help the checks.
Compare this with Rdb's general handling of indices required for
constraints.
CHECK constraints can only check intra-row dependencies,
ie Project start < Project end
Compare this with Rdb's generalized CHECK constraints
Optimizer
=============
The optimizer is costbased and needs information from a regular run
UPDATE STATISTICS sql command (can be run online).
Cost is calculated as #I/O's + #ROWs*0.03 (CPU cost of row processing).
Compare this with Rdb's highly advanced dynamic optimization
Information used is:
Table level:
- Number of pages
- number of rows
- Second largest and smallest value on a per column basis
Index Level:
- Number of pages used
- Number of leaves
- number of levels
- number of unique values for entire index
- degree of clustering (how *ordered* are the rows)
ORDER BY, GROUP BY, DISTINCT will always result in a temporary table
being created.
The temporary table is always placed in the same file (rootdbs) and
is fixed sized.
Only leading segments can be used in indices, there is no concept of a
key-only boolean as in Rdb.
Leading segments in a multisegmented index has the same uniqueness applied
as the entire index (ie SEX, BADGENO; using SEX alone will act as if UNIQUE).
The optimizer strategy (plan) is never remembered unless stored procedures
are used.
Compare this with Rdb's automatic automatic-once execute-many architecture
regardless of client software.
Optimizer strategy only available if source code is available and changed
Compare this with Rdb's DEBUG_FLAGS mechanism
No Global query optimization depsite the global DML capabilities
|
1248.3 | INFORMIX/OnLine V5.01 Strong Points | COPCLU::BRUNSGAARD | Alarm set to Quarter past Friday | Tue May 11 1993 20:40 | 138 |
| This note is highlighting areas where INFORMIX/OnLine has strong points.
The information is presented without explanations, just the plain facts.
If you need explanations why I feel it is a strong point, or specific questions
about featues, please don't hesistate to reply here.
Whereever appropiate I compare with corresponding DEC Rdb features.
The information is based on
DEC Rdb V5.1
INFORMIX/OnLine V5.01
and my knowlege about both products at the time of writing.
This note contains information about:
System Architecture
Buffer Management
Transaction Control
SQL, Stored Procedures and Triggers
Index structure
Optimizer
Administration tools
System Architecture:
====================
Precompiler support under Windows (ie native SQL is written)
Minimal *dictionary* support throught the
DEFINE ... LIKE table
command.
both in stored procedures and windows Client tools (4gl)
Invisible distributed DML and transaction capabilities.
BUFFER/page Management
======================
Updated Buffers are written by background processes (page cleaners)
#page cleaners can be configured individually on a system basis
Buffer locks are using latches instead of kernal-locking
Page cleanup (making freespace 1 big clump) is only done on demand basis,
ie a new row is less than FreeSpaceLeft but there is no hole where the row
fits in.
Transaction Control
=================
ISOLATION LEVEL READ UNCOMMITED is implemented
SQL, Stored Procedures and Triggers
===============================
Scroll Cursors
- Fetch absolute, next, prir, relative aso.
- cursor stability automatically
Distributed DML operations (Throught INFORMIX/Star).
- No metadata sharing between databases, nor a central place where
metadata must be kept.
- invisible integration in SQL from programs
SELECT FROM <table>, rough BNF for <table>
<table> := <table_name> | <alias>
<alias> := <distributed name> <table_name>
- used in any statement except UPDATE/INSERT/DELETE of VIEWS.
UPDATE statements with record syntax
ie. Update set * = (execute procedure ...)
Triggers implemented as SQL draft paper
- indefinit nesting of triggers (triggered by triggers...)
- Before and After events
- For Each Row event
- Insert, Update and Delete events
Very advanced Stored PRocedure support.
- Local variables on procedure or Block level (BEGIN END)
- IF-then-else
- Case statement
- While and For loops
- Exception handling
- Whenever ... Call [with resume]
- Raise Exception
- saving of SQLCODE, SQLCA and SQLDIAGNISTIC variables
- recursiuve call (20! is just as easy as Pascal)
- BLOB procedure parameters considered BY REFERENCE
all other BY VALUE
- generalized LET statement
LET (a,b) = (1,2)
- CURSOR CONTEXT retension after RETURN
ie next call by teh same client to the stored procedure will
just step further on in the data
- Precompiled query strategies with automatic recompilation if any
dependencies are REMOVED.
Recompile possible on demand by UPDATE STATISTICS FOR PROCEDURE command
- complety integrated in SQL
ie.
SELECT ...
FROM ...
WHERE col = Execute procedure <name>
- For-each construct (an automatic looping cursor)
- System callout, but no parameters can be passed back
Index structure
=================
Nonunique indices have duplicate chains included in the primary index tree
Only the ACTUAL value used in the index is locked even if several values are
placed in the same index node
SELECT * FROM table WHERE NAME = 'Lars'
will not lock any other part of the index than the one pointing to 'Lars'
not even 'Larse'.
Index node splits: keys are generally balanced, but if the key value is
the new maximum value only the new key will be inserted in the new node.
Ex.
Node 1 contains: 1, 2, 3, 4, 5
if a "6" comes along the index will split as
Node 1: same as above
Node 2: 6
Optimizer
=================
Index Only retrieval is implemented
Administration tools
=================
Online Backup (except BLOBspaces)
Very advanced Loader
several tables loaded from 1 input record
Datatype conversions supported
Specific control over errors during load
Control over COMMIT intervals
DBcreate utility to generate command procedure of database or specific items
|
1248.4 | Not so strong a point | NOVA::FISHER | US Patent 5225833 | Fri Aug 06 1993 16:24 | 36 |
| re:
Index node splits: keys are generally balanced, but if the key value is
the new maximum value only the new key will be inserted in the new node.
Ex.
Node 1 contains: 1, 2, 3, 4, 5
if a "6" comes along the index will split as
Node 1: same as above
Node 2: 6
-----------------------------------------------------
For Rdb:
Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
if a "11" comes along the index will split as
Node 1: 1 through 9
Node 2: 10, 11
[with compression we get more into a node in the first place :-) ]
I think this was an early Rdb design feature because with the Informix
algorithm, after the in-order insertions are done, the first record to be
inserted causes node splits and tree rebalancing. For example, create
the tree inserting 1.000.000 records in sorted order each node is full
[unless "full" doesn't mean "full"] now insert a record with key value
998.5 -- the node it goes into must be split, and each of the higher level
nodes must be split as well.
(Remember in V1.0 and V1.1 we created btrees by sorting the keys and
inserting them into the tree in order. V2.0 incorprated the current
build-bottom-up-withpercent-fill method. So you could say "we tried and
rejected this method") :-)
ed
|
1248.5 | Seuqnce numbres in indices are generally painfull | COPCLU::BRUNSGAARD | I swear to say the whole truth as often as I can | Tue Aug 10 1993 00:24 | 28 |
| > For Rdb:
>
> Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
>
> if a "11" comes along the index will split as
> Node 1: 1 through 9
> Node 2: 10, 11
>
>[with compression we get more into a node in the first place :-) ]
>
But this is not the way Rdb works UNLESS you are performing an ALTER/CREATE
index command.
Your above example will split as
Node 1: 1-6
Node 2: 7-11
ie leaving Node 1 approx 50% empty, again leading to special handling of sequnce
numbers in indices (Index too wide).
I do hover see you point that a split must be done immediatly if a 0 comes
along, but since the algoritm I explained was only to be used on sequence
numbers, that would be an issue.
Ex: Create index ..
usage sequence
this will tell Rdb that a special index split mechanism should be used, and
always perform a 100% fill on any Alter Index operations.
fwiw
Lars
|
1248.6 | wrongo | NOVA::FISHER | US Patent 5225833 | Tue Aug 10 1993 15:12 | 64 |
| Lars, I would not have written that example if I had not tested it
first to assure that it hasn't changed. It works that way I said it did:
RDO> def da foo.
RDO> def fie l da si lo.
%RDO-W-NOCDDUPDAT, database invoked by filename, the CDD will not be updated
RDO> def rel r. l. end.
RDO> def ind ir for r node size 100.. l. end.
RDO>
RDO> store r in r using r.l = 1 end_s
RDO> store r in r using r.l = 2 end_s
RDO> store r in r using r.l = 3 end_s
RDO> store r in r using r.l = 4 end_s
RDO> store r in r using r.l = 5 end_s
RDO> store r in r using r.l = 6 end_s
RDO> store r in r using r.l = 7 end_s
RDO> store r in r using r.l = 8 end_s
RDO> store r in r using r.l = 9 end_s
RDO> store r in r using r.l = 10 end_s
RDO> anal ind ir
Index Index Length Duplicate Duplicate
Index Name levels nodes used nodes length used
---------- ------ ----- ------ --------- -----------
Node: 39:407:0, Level:1, Length: 67/100, Sibling: null, Parent_ikey : (0)""
38:404:0 (0+5) "^0080000001" ="....."
38:404:1 (4+1) "00800000^02" ="....."
38:404:2 (4+1) "00800000^03" ="....."
38:404:3 (4+1) "00800000^04" ="....."
38:404:4 (4+1) "00800000^05" ="....."
38:404:5 (4+1) "00800000^06" ="....."
38:404:6 (4+1) "00800000^07" ="....."
38:404:7 (4+1) "00800000^08" ="....."
38:404:8 (4+1) "00800000^09" ="....."
38:404:9 (4+1) "00800000^0A" ="....."
IR 1 1 67 0 0
RDO> commit
RDO> start_t read_write
RDO> store r in r using r.l = 11 end_s
RDO> anal ind ir
Index Index Length Duplicate Duplicate
Index Name levels nodes used nodes length used
---------- ------ ----- ------ --------- -----------
Node: 39:407:0, Level:2, Length: 17/100, Sibling: null, Parent_ikey : (0)""
Node: 39:407:1, Level:1, Length: 60/100, Sibling: 39:407:2, Parent_ikey : (0)""
38:404:0 (0+5) "^0080000001" ="....."
38:404:1 (4+1) "00800000^02" ="....."
38:404:2 (4+1) "00800000^03" ="....."
38:404:3 (4+1) "00800000^04" ="....."
38:404:4 (4+1) "00800000^05" ="....."
38:404:5 (4+1) "00800000^06" ="....."
38:404:6 (4+1) "00800000^07" ="....."
38:404:7 (4+1) "00800000^08" ="....."
38:404:8 (4+1) "00800000^09" ="....."
Node: 39:407:2, Level:1, Length: 18/100, Sibling: null, Parent_ikey : (5)"008000000A"
38:404:9 (0+5) "^008000000A" ="....."
38:404:10 (4+1) "00800000^0B" ="....."
IR 2 3 95 0 0
RDO>
RDO> commit
|
1248.7 | You are right OF COURSE | COPCLU::BRUNSGAARD | I swear to say the whole truth as often as I can | Wed Aug 11 1993 14:13 | 25 |
| I declare defeat !!
It does indeed work the way you describe.
Why could I then make up such a story ?
Simply because one of the excercises in the old Rdb Physical Design and
Administration course showed (and it REALLY did, I tought it atleast 10 times
!!) the the split was approc 50-50.
Of course now I can't find the course material (probably threw it out when I
moved the last time), but the principle was to
- store 6 keys in an index, dump it
- store 1 more, dump it
- see that the split between the nodes was 4 in the left 3 in the right.
This was during 3.0-3.1 days, but maybe something did change ?
Maybe it depends on whether you had STORE IN in the index
maybe I am just imagining things ... but I am sure I have seen the behaviour I
described, otherwise I wouldn't have even tried to argue with you.
...
After thinking about this, I have actually been granted a free feature !!!
Thanks for the clarification.
Lars
|
1248.8 | Index NODES in informix are VERY different | COPCLU::BRUNSGAARD | I swear to say the whole truth as often as I can | Wed Aug 11 1993 14:29 | 51 |
| While I am digging into that rathole about index management, I thought I would
explain how Informix is actually physically treating its indices.
They are VERY different from our approach.
First of all they have the concept of an INDEX PAGE
----------
This is page with a completely different structure than a database page
In effect they threat an index page as we handle an index NODE.
However:
- Where we lock an INDEX NODE every time we update the node
ie all "close" values to the one we update are locked
they treat every distinct index value as a record on the page
Ie people can insert a 5 and a 6 at the same time, since they only share the
PAGE, not the record itself
- This actually meant that they have VERY limited locking between users in
indices !!
- It enables them to handle metadata operations much more smootly, seen from
a DDL point of view. Of course the otehrr waspects of metadata changes do not
benefit at all forom this (ie how to reorganinse online aso...)
- a NODE SPLIT is for them when a page is full. Then the page will split and
point to the next node.
The downsides of the approach are:
- More page structures to handle, ie difficult to backup/restore/recover/verify
because every page type (three in total, since BLOBS have a 3rd page style)
is different.
- Variable page sizes is not an option for Informix, since it will be much to
complicated to handle for the product.
- It is almost impossible for them to implement ISOLATION LEVEL SERIALIZABLE,
wihout severly hurting performance (much more than in the Rdb case).
Therefore they only do ISOLATION LEVEL REPEATABLE READ and lower.
- Variable index node size is NOT possible, because it would be variable page
sizes in effect.
- New index types are HARD WORK (ie dont expect hash indices or even clustering)
because it will require enourmous amount of handling in every aspect of the
administration in the product.
However it is an interesting concept, and clearly something that removes
(nearly) all the locking issues from sorted indices, with quite a few
sideeffects though.
comments/questions are always welcomed
cheers
Lars
|
1248.9 | Informix Constraint handling | COPCLU::BRUNSGAARD | I swear to say the whole truth as often as I can | Wed Aug 11 1993 14:46 | 63 |
| Another interesting concept in Informix.
Constraint optimizations:
The too have DEFERRED and IMMEDIATE constraints, again differently than ours
though.
Functionally they are equivalent to Rdb's DEFERRED and IMMEDIATE, but not VERB
TIME.
Lets take an example:
I have a primary key
1, 2, 3, 4
and I want to change this list to insert a new number 3, ie I move 4 to 5,
3 to 4 and insert a new #3.
The update is allowed in one statement (which a VERB TIME constraint in Rdb does
not allow) because of the follow algoritm.
- Every time a column is changed, the dependent constraint is checked
IMMEDIATLY (ie at time of the column change)
- If the row does NOT satisfy the constraint it is written to a temporary table
- when the entire statement is ended all rows in the temporary table is
reevaluated.
- If some still fails, then the message is retuned.
Applied to the example:
Update set id = id + 1 where id >=3 does
find row 3
Update to a 4
check constraint
error not unique, write to temp table
find next (4)
Update to 5
check constraint
ok
Statement end
Read temp table
fetch 4 (the old #3)
check constraint
now ok
Return to user with no error.
Of course for a DEFERRED constraint, they just postpone the extra constraint
check to COMMIT.
Pros:
Very simple concept to understand, and does not require inventing new queries
on behalf of the constraint, since the values are in reality always checked VERB
TIME.
A primary key will never reject an update just because of the timing of the
constraint (see above example).
Cons: a HUGE risc for performance problems if you don't take care.
Just consider the above example with 1 million rows (all written to temp table)
The solution does definitly NOT scale well for large tables, but after all that
is the fact for the basic implementation of constraints in Informix (requiring
specialised indices maintained by Informix for each contraint defined).
comments/questions welcomed
Lars
|
1248.10 | I tried to document it :) | BOUVS::OAKEY | Assume is *my* favorite acronym | Wed Aug 11 1993 17:38 | 14 |
| �<<< Note 1248.7 by COPCLU::BRUNSGAARD "I swear to say the whole truth as often as I can" >>>
� -< You are right OF COURSE >-
�It does indeed work the way you describe.
Lars,
It's even documented in the ISD that it works this way (although thanks to
Ed I did discover a misinterpretation of the code on my part that I will
fix).
I have the split taking place one entry off, but the fact that the split is
not 50/50 when the entry is the last entry in the node is documented :)
|
1248.11 | What is IRM :-) | COPCLU::BRUNSGAARD | I swear to say the whole truth as often as I can | Fri Aug 13 1993 14:40 | 6 |
| I am sorry that I cant remeber the IRM of the top of my head :-)
Anyway a NICE feature we now have (without changing anything).
Thanks for straighting me out
Lars
|