[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference ulysse::rdb_vms_competition

Title:DEC Rdb against the World
Moderator:HERON::GODFRIND
Created:Fri Jun 12 1987
Last Modified:Thu Feb 23 1995
Last Successful Update:Fri Jun 06 1997
Number of topics:1348
Total number of notes:5438

1248.0. "INFORMIX/OnLine knockoffs, if you need them" by COPCLU::BRUNSGAARD (Alarm set to Quarter past Friday) Wed May 05 1993 21:08

    I have currently sitting in on a INFORMIX/OnLine Internals class, and
    have the opportuninty to learn all the KNOCKOFFs we have against them.
    
    Please mail (or answer here) if you need the information,
    or said in another way
    
    Don't ask for it unless you need it because I will have to use a few
    evenings to write the stuff, and my evenings are full as it is already.
    
    Cheers
    Lars
    
    Ps. There are LOTS of architectual deficiencies in that product (V5.01)
T.RTitleUserPersonal
Name
DateLines
1248.1Take your timeIJSAPL::OLTHOFHenny Olthof@UTO, 838-2021Thu May 06 1993 10:459
    Lars,
    
    I have no immediate need for that information but it would be nice if
    you could post a pointer once you are done with the writing of the
    material. You know, it's hard to get competitive info these days.
    Anyway, Informix is not (yet) a competitor on the same platform as Rdb.
    
    Thanks,
    Henny
1248.2INFORMIX/OnLine V5.01 Weak Points COPCLU::BRUNSGAARDAlarm set to Quarter past FridayTue May 11 1993 20:39175
This note is highlighting areas where INFORMIX/OnLine has weak points.
The information is presented without explanations, just the plain facts.
If you need explanations why I feel it is a weak point, or specific questions
about featues, please don't hesistate to reply here.

Whereever appropiate I compare with corresponding DEC Rdb features.
The information is based on
  DEC Rdb V5.1
  INFORMIX/OnLine V5.01
  and my knowlege about both products at the time of writing.

This note contains information about:
System Architecture
I/O Architecture
BLOB Support
Referencial Integrity 
Storage mechanisms
Transaction Control
Optimizer


System Architecture:
====================
1 database attach = 1 frontend process + 1 backend process
  typically used in Client/server environments.

Precompiler support under Windows (ie native SQL is written)

Minimal *dictionary* support throught the
  DEFINE ... LIKE table
  command.
  both in stored procedures and windows Client tools (4gl)

I/O Architecture:
==================
Page Sizes are fixed at either 2Kbytes or 4Kbytes
  Size is set by the installation kit and is decided only by the INFORMIX
  development group on a per platform basis.
  Compare this with Rdb's 0.5-16Kbytes page size intervals

Buffer Size is fixed at Page size, except for SEQUENTIAL SCAN's where an
  8 page buffer is used. 
  Compare this with Rdb's 0.5-32Kbytes buffer size intervals

INDEX NODE sizes are fixed at page size
  Compare this with Rdb's user defined node sizes

DBspaces (Storage Areas) have predefined sizes.
  Compare this with Rdb's online extensions

Logical and Physical Logs (similar in AIJ and RUJ in function) have predefined
sizes
  Compare this with Rdb's online extensions

Table and Index will always share the same Tablespaces (storage areas)
  Compare this with Rdb's advanced storage map posibilities for table and
   indices

INDEX are always B*-tree indices.
  Compare this with Rdb's SORTED and HASH indices

ROWID (DBKey) consists of 3 bytes TABLE-ID, 1 byte Row index
  The actual chunk (file) the data has to be looked up in another table
  Note this architectually limits the number of pages (hence rows) for a
  given table + all indices to 2**24 ~ 16 mill. pages
  See also BLOB Support.
  Compare this with Rdb's 4byte structure and variable page sizes

BLOB Support
================
All BLOBs in a database (OnLine system) share the same BLOBspace.
  Compare this with Rdb's advanced storage map posibilities for BLOBS

The ONLINE archive (backup) cannot process the BLOBspace online.
  Compare this with Rdb's advanced ONLINE backup options.

There is no support for WORM devices

Storage mechanisms
===================
Rows are always UNCOMPRESSED
  Dynamic structures are handled by having all static structures on a
  *home* page, and all dynamic structures (VARCHAR and BLOBS) on 1 or more
  separate pages.
  Compare this with Rdb's advanced storage map posibilities for table

Indices are always UNCOMPRESSED
  Compare this with Rdb's advanced pre/post fix compression applied
  automatically and the user specified compression option.

VARCHAR's can only be 256 chars maximum.
   Compare this with Rdb's 32000 chars limit

Free space is found usig special BITMAP pages that uses
  2bit structure for fixed size rows less than page size
  4bit structure for dynamic rows
  This means one bitmap page for approx each 7000 pages
  Compare this with Rdb's advanced AIP/SPAM mechanism to find free space

CLUSTER INDICES is a *one shot* operation that only sorts data
  CREATE CLUSTER INDEX ... or
  ALTER INDEX <name> TO CLUSTER
  really doesn't apply to the index at all.
  It sort the DATA rows according to the index, and places them back on disk
  (now in sorted order).
  However NO ATTEMPTS are done from now on by INFORMIX to keep the clustering
  effect at run-time.
  Compare this with Rdb's real data clustering both by hash indices and
  sorted indices.

Transaction Control
====================
There is no support for ISOLATION LEVEL SERIALIZABLE transactions

There is no concept of a READ ONLY transaction,
   ie a transaction that guaranties a consistent data view,
   without imposing any visible locking.

Only 1 transaction can be rolled back at a time, in case of a process being
  killed.
  Compare this with Rdb's parallel recovery mechanism.

Referencial Integrity
=====================
PRIMARY KEY and FOREIGN KEY's create INDEX implicitly to help the checks.
   Compare this with Rdb's general handling of indices required for
   constraints.

CHECK constraints can only check intra-row dependencies,
   ie Project start < Project end
   Compare this with Rdb's generalized CHECK constraints


Optimizer
=============
The optimizer is costbased and needs information from a regular run
  UPDATE STATISTICS sql command (can be run online).
  Cost is calculated as #I/O's + #ROWs*0.03 (CPU cost of row processing).

  Compare this with Rdb's highly advanced dynamic optimization

  Information used is:
  Table level:
    - Number of pages
    - number of rows 
    - Second largest and smallest value on  a per column basis

  Index Level:
    - Number of pages used
    - Number of leaves
    - number of levels
    - number of unique values for entire index
    - degree of clustering (how *ordered* are the rows)

ORDER BY, GROUP BY, DISTINCT will always result in a temporary table
   being created.
   The temporary table is always placed in the same file (rootdbs) and
   is fixed sized.

Only leading segments can be used in indices, there is no concept of a
   key-only boolean as in Rdb.

Leading segments in a multisegmented index has the same uniqueness applied
  as the entire index (ie SEX, BADGENO; using SEX alone will act as if UNIQUE).

The optimizer strategy (plan) is never remembered unless stored procedures
  are used.
   Compare this with Rdb's automatic automatic-once execute-many architecture
   regardless of client software.

Optimizer strategy only available if source code is available and changed
   Compare this with Rdb's DEBUG_FLAGS mechanism

No Global query optimization depsite the global DML capabilities
    
1248.3INFORMIX/OnLine V5.01 Strong Points COPCLU::BRUNSGAARDAlarm set to Quarter past FridayTue May 11 1993 20:40138
This note is highlighting areas where INFORMIX/OnLine has strong points.
The information is presented without explanations, just the plain facts.
If you need explanations why I feel it is a strong point, or specific questions
about featues, please don't hesistate to reply here.

Whereever appropiate I compare with corresponding DEC Rdb features.
The information is based on
  DEC Rdb V5.1
  INFORMIX/OnLine V5.01
  and my knowlege about both products at the time of writing.

This note contains information about:
System Architecture
Buffer Management
Transaction Control
SQL, Stored Procedures and Triggers
Index structure
Optimizer
Administration tools

System Architecture:
====================
Precompiler support under Windows (ie native SQL is written)

Minimal *dictionary* support throught the
  DEFINE ... LIKE table
  command.
  both in stored procedures and windows Client tools (4gl)

Invisible distributed DML and transaction capabilities.

BUFFER/page Management
======================
Updated Buffers are written by background processes (page cleaners)
  #page cleaners can be configured individually on a system basis

Buffer locks are using latches instead of kernal-locking

Page cleanup (making freespace 1 big clump) is only done on demand basis,
  ie a new row is less than FreeSpaceLeft but there is no hole where the row
  fits in.

  
Transaction Control
=================
ISOLATION LEVEL READ UNCOMMITED is implemented

SQL, Stored Procedures and Triggers
===============================
Scroll Cursors
  - Fetch absolute, next, prir, relative aso.
  - cursor stability automatically

Distributed DML operations (Throught INFORMIX/Star).
  - No metadata sharing between databases, nor a central place where
    metadata must be kept.
  - invisible integration in SQL from programs
    SELECT FROM <table>, rough BNF for <table>
    <table> := <table_name> | <alias>
    <alias> := <distributed name> <table_name>
  - used in any statement except UPDATE/INSERT/DELETE of VIEWS.

UPDATE statements with record syntax
  ie. Update set * = (execute procedure ...)


Triggers implemented as SQL draft paper
  - indefinit nesting of triggers (triggered by triggers...)
  - Before and After events
  - For Each Row event
  - Insert, Update and Delete events

Very advanced Stored PRocedure support.
- Local variables on procedure or Block level (BEGIN END)
- IF-then-else
- Case statement
- While and For loops
- Exception handling
  - Whenever ... Call  [with resume]
  - Raise Exception
  - saving of SQLCODE, SQLCA and SQLDIAGNISTIC variables
- recursiuve call (20! is just as easy as Pascal)
- BLOB procedure parameters considered BY REFERENCE
  all other BY VALUE
- generalized LET statement
  LET (a,b) = (1,2)
- CURSOR CONTEXT retension after RETURN
  ie next call by teh same client to the stored procedure will
  just step further on in the data
- Precompiled query strategies with automatic recompilation if any 
  dependencies are REMOVED.
  Recompile possible on demand by UPDATE STATISTICS FOR PROCEDURE command
- complety integrated in SQL
  ie.
  SELECT ...
  FROM ...
  WHERE col = Execute procedure <name>
- For-each construct (an automatic looping cursor)
- System callout, but no parameters can be passed back


Index structure
=================
Nonunique indices have duplicate chains included in the primary index tree

Only the ACTUAL value used in the index is locked even if several values are
  placed in the same index node
  SELECT * FROM table WHERE NAME = 'Lars'
  will not lock any other part of the index than the one pointing to 'Lars'
  not even 'Larse'.

Index node splits: keys are generally balanced, but if the key value is
  the new maximum value only the new key will be inserted in the new node.
  Ex.
    Node 1 contains: 1, 2, 3, 4, 5
    
    if a "6" comes along the index will split as
    Node 1: same as above
    Node 2: 6


Optimizer
=================
Index Only retrieval is implemented


Administration tools
=================
Online Backup (except BLOBspaces)

Very advanced Loader
  several tables loaded from 1 input record
  Datatype conversions supported
  Specific control over errors during load
  Control over COMMIT intervals

DBcreate utility to generate command procedure of database or specific items
    
1248.4Not so strong a pointNOVA::FISHERUS Patent 5225833Fri Aug 06 1993 16:2436
re:

Index node splits: keys are generally balanced, but if the key value is
  the new maximum value only the new key will be inserted in the new node.
  Ex.
    Node 1 contains: 1, 2, 3, 4, 5
    
    if a "6" comes along the index will split as
    Node 1: same as above
    Node 2: 6
-----------------------------------------------------

For Rdb:

    Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    
    if a "11" comes along the index will split as
    Node 1: 1 through 9
    Node 2: 10, 11

[with compression we get more into a node in the first place :-) ]

I think this was an early Rdb design feature because with the Informix
algorithm, after the in-order insertions are done, the first record to be
inserted causes node splits and tree rebalancing.  For example, create
the tree inserting 1.000.000 records in sorted order each node is full
[unless "full" doesn't mean "full"] now insert a record with key value
998.5 -- the node it goes into must be split, and each of the higher level
nodes must be split as well.

(Remember in V1.0 and V1.1 we created btrees by sorting the keys and
inserting them into the tree in order.  V2.0 incorprated the current
build-bottom-up-withpercent-fill method.  So you could say "we tried and
rejected this method")  :-)

ed
1248.5Seuqnce numbres in indices are generally painfullCOPCLU::BRUNSGAARDI swear to say the whole truth as often as I canTue Aug 10 1993 00:2428
> For Rdb:
>
>    Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
>    
>    if a "11" comes along the index will split as
>    Node 1: 1 through 9
>    Node 2: 10, 11
>
>[with compression we get more into a node in the first place :-) ]
>
But this is not the way Rdb works UNLESS you are performing an ALTER/CREATE
index command.
Your above example will split as
Node 1: 1-6
Node 2: 7-11
ie leaving Node 1 approx 50% empty, again leading to special handling of sequnce
numbers in indices (Index too wide).

I do hover see you point that a split must be done immediatly if a 0 comes
along, but since the algoritm I explained was only to be used on sequence
numbers, that would be an issue.
Ex: Create index ..
      usage sequence
this will tell Rdb that a special index split mechanism should be used, and
always perform a 100% fill on any Alter Index operations.

fwiw
Lars
1248.6wrongoNOVA::FISHERUS Patent 5225833Tue Aug 10 1993 15:1264
    Lars, I would not have written that example if I had not tested it
    first to assure that it hasn't changed.  It works that way I said it did:
    
RDO> def da foo.
RDO> def fie l da si lo.
%RDO-W-NOCDDUPDAT, database invoked by filename, the CDD will not be updated
RDO> def rel r. l. end.
RDO> def ind ir for r node size 100.. l. end.
RDO>
RDO> store r in r using r.l = 1 end_s
RDO> store r in r using r.l = 2 end_s
RDO> store r in r using r.l = 3 end_s
RDO> store r in r using r.l = 4 end_s
RDO> store r in r using r.l = 5 end_s
RDO> store r in r using r.l = 6 end_s
RDO> store r in r using r.l = 7 end_s
RDO> store r in r using r.l = 8 end_s
RDO> store r in r using r.l = 9 end_s
RDO> store r in r using r.l = 10 end_s
RDO> anal ind ir

                                 Index   Index   Length  Duplicate  Duplicate
Index Name                       levels  nodes   used    nodes      length used
----------                       ------  -----   ------  ---------  -----------
Node: 39:407:0, Level:1, Length: 67/100, Sibling: null, Parent_ikey : (0)""
   38:404:0  (0+5) "^0080000001"  ="....."
   38:404:1  (4+1) "00800000^02"  ="....."
   38:404:2  (4+1) "00800000^03"  ="....."
   38:404:3  (4+1) "00800000^04"  ="....."
   38:404:4  (4+1) "00800000^05"  ="....."
   38:404:5  (4+1) "00800000^06"  ="....."
   38:404:6  (4+1) "00800000^07"  ="....."
   38:404:7  (4+1) "00800000^08"  ="....."
   38:404:8  (4+1) "00800000^09"  ="....."
   38:404:9  (4+1) "00800000^0A"  ="....."
IR                                    1      1       67          0            0

RDO> commit
RDO> start_t read_write
RDO> store r in r using r.l = 11 end_s
RDO> anal ind ir

                                 Index   Index   Length  Duplicate  Duplicate
Index Name                       levels  nodes   used    nodes      length used
----------                       ------  -----   ------  ---------  -----------
Node: 39:407:0, Level:2, Length: 17/100, Sibling: null, Parent_ikey : (0)""
  Node: 39:407:1, Level:1, Length: 60/100, Sibling: 39:407:2, Parent_ikey : (0)""
     38:404:0  (0+5) "^0080000001"  ="....."
     38:404:1  (4+1) "00800000^02"  ="....."
     38:404:2  (4+1) "00800000^03"  ="....."
     38:404:3  (4+1) "00800000^04"  ="....."
     38:404:4  (4+1) "00800000^05"  ="....."
     38:404:5  (4+1) "00800000^06"  ="....."
     38:404:6  (4+1) "00800000^07"  ="....."
     38:404:7  (4+1) "00800000^08"  ="....."
     38:404:8  (4+1) "00800000^09"  ="....."
  Node: 39:407:2, Level:1, Length: 18/100, Sibling: null, Parent_ikey : (5)"008000000A"
     38:404:9  (0+5) "^008000000A"  ="....."
     38:404:10  (4+1) "00800000^0B"  ="....."
IR                                    2      3       95          0            0

RDO>
RDO> commit
    
1248.7You are right OF COURSECOPCLU::BRUNSGAARDI swear to say the whole truth as often as I canWed Aug 11 1993 14:1325
I declare defeat !!

It does indeed work the way you describe.
Why could I then make up such a story ?

Simply because one of the excercises in the old Rdb Physical Design and
Administration course showed (and it REALLY did, I tought it atleast 10 times
!!) the the split was approc 50-50.

Of course now I can't find the course material (probably threw it out when I
moved the last time), but the principle was to
- store 6 keys in an index, dump it
- store 1 more, dump it
- see that the split between the nodes was 4 in the left 3 in the right.

This was during 3.0-3.1 days, but maybe something did change ?
Maybe it depends on whether you had STORE IN in the index
maybe I am just imagining things ... but I am sure I have seen the behaviour I
described, otherwise I wouldn't have even tried to argue with you.

...
After thinking about this, I have actually been granted a free feature !!!

Thanks for the clarification.
Lars
1248.8Index NODES in informix are VERY differentCOPCLU::BRUNSGAARDI swear to say the whole truth as often as I canWed Aug 11 1993 14:2951
While I am digging into that rathole about index management, I thought I would
explain how Informix is actually physically treating its indices.
They are VERY different from our approach.

First of all they have the concept of an INDEX PAGE
                                         ----------
This is page with a completely different structure than a database page
In effect they threat an index page as we handle an index NODE.
However:
- Where we lock an INDEX NODE every time we update the node
  ie all "close" values to the one we update are locked
  they treat every distinct index value as a record on the page
  Ie people can insert a 5 and a 6 at the same time, since they only share the
  PAGE, not the record itself

- This actually meant that they have VERY limited locking between users in
  indices !!
- It enables them to handle metadata operations much more smootly, seen from
  a DDL point of view. Of course the otehrr waspects of metadata changes do not
  benefit at all forom this (ie how to reorganinse online aso...)

- a NODE SPLIT is for them when a page is full. Then the page will split and
  point to the next node.

The downsides of the approach are:
- More page structures to handle, ie difficult to backup/restore/recover/verify
  because every page type (three in total, since BLOBS have a 3rd page style)
  is different.

- Variable page sizes is not an option for Informix, since it will be much to
  complicated to handle for the product.

- It is almost impossible for them to implement ISOLATION LEVEL SERIALIZABLE,
  wihout severly hurting performance (much more than in the Rdb case).
  Therefore they only do ISOLATION LEVEL REPEATABLE READ and lower.

- Variable index node size is NOT possible, because it would be variable page
  sizes in effect.

- New index types are HARD WORK (ie dont expect hash indices or even clustering)
  because it will require enourmous amount of handling in every aspect of the
  administration in the product.

However it is an interesting concept, and clearly something that removes
(nearly) all the locking issues from sorted indices, with quite a few
sideeffects though.

comments/questions are always welcomed
cheers
Lars

1248.9Informix Constraint handlingCOPCLU::BRUNSGAARDI swear to say the whole truth as often as I canWed Aug 11 1993 14:4663
Another interesting concept in Informix.

Constraint optimizations:
The too have DEFERRED and IMMEDIATE constraints, again differently than ours
though.

Functionally they are equivalent to Rdb's DEFERRED and IMMEDIATE, but not VERB
TIME.
Lets take an example:
I have a primary key
  1, 2, 3, 4

and I want to change this list to insert a new number 3, ie I move 4 to 5,
3 to 4 and insert a new #3.
The update is allowed in one statement (which a VERB TIME constraint in Rdb does
not allow) because of the follow algoritm.

- Every time a column is changed, the dependent constraint is checked
  IMMEDIATLY (ie at time of the column change)
- If the row does NOT satisfy the constraint it is written to a temporary table
- when the entire statement is ended all rows in the temporary table is
  reevaluated.
- If some still fails, then the message is retuned.

Applied to the example:
Update set id = id + 1 where id >=3 does

find row 3
Update to a 4
  check constraint
    error not unique, write to temp table
find next (4)
Update to 5
  check constraint
  ok
Statement end
Read temp table
  fetch 4 (the old #3)
  check constraint
  now ok

Return to user with no error.

Of course for a DEFERRED constraint, they just postpone the extra constraint
check to COMMIT.


Pros:
  Very simple concept to understand, and does not require inventing new queries
on behalf of the constraint, since the values are in reality always checked VERB
TIME.

A primary key will never reject an update just because of the timing of the
constraint (see above example).

Cons: a HUGE risc for performance problems if you don't take care.
Just consider the above example with 1 million rows (all written to temp table)
The solution does definitly NOT scale well for large tables, but after all that
is the fact for the basic implementation of constraints in Informix (requiring
specialised indices maintained by Informix for each contraint defined).

comments/questions welcomed
Lars
1248.10I tried to document it :)BOUVS::OAKEYAssume is *my* favorite acronymWed Aug 11 1993 17:3814
�<<< Note 1248.7 by COPCLU::BRUNSGAARD "I swear to say the whole truth as often as I can" >>>
�                          -< You are right OF COURSE >-

�It does indeed work the way you describe.

Lars,

It's even documented in the ISD that it works this way (although thanks to 
Ed I did discover a misinterpretation of the code on my part that I will 
fix).

I have the split taking place one entry off, but the fact that the split is 
not 50/50 when the entry is the last entry in the node is documented :)

1248.11What is IRM :-)COPCLU::BRUNSGAARDI swear to say the whole truth as often as I canFri Aug 13 1993 14:406
I am sorry that I cant remeber the IRM of the top of my head :-)

Anyway a NICE feature we now have (without changing anything).

Thanks for straighting me out
Lars