[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference ulysse::rdb_vms_competition

Title:	DEC Rdb against the World

Moderator:	HERON::GODFRIND

Created:	Fri Jun 12 1987
Last Modified:	Thu Feb 23 1995
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1348
Total number of notes:	5438

1248.0. "INFORMIX/OnLine knockoffs, if you need them" by COPCLU::BRUNSGAARD (Alarm set to Quarter past Friday) Wed May 05 1993 21:08

    I have currently sitting in on a INFORMIX/OnLine Internals class, and
    have the opportuninty to learn all the KNOCKOFFs we have against them.
    
    Please mail (or answer here) if you need the information,
    or said in another way
    
    Don't ask for it unless you need it because I will have to use a few
    evenings to write the stuff, and my evenings are full as it is already.
    
    Cheers
    Lars
    
    Ps. There are LOTS of architectual deficiencies in that product (V5.01)

T.R	Title	User	Personal Name	Date	Lines
1248.1	Take your time	IJSAPL::OLTHOF	Henny Olthof@UTO, 838-2021	`Thu May 06 1993 10:45`	9
	Lars, I have no immediate need for that information but it would be nice if you could post a pointer once you are done with the writing of the material. You know, it's hard to get competitive info these days. Anyway, Informix is not (yet) a competitor on the same platform as Rdb. Thanks, Henny
1248.2	INFORMIX/OnLine V5.01 Weak Points	COPCLU::BRUNSGAARD	Alarm set to Quarter past Friday	`Tue May 11 1993 20:39`	175
	This note is highlighting areas where INFORMIX/OnLine has weak points. The information is presented without explanations, just the plain facts. If you need explanations why I feel it is a weak point, or specific questions about featues, please don't hesistate to reply here. Whereever appropiate I compare with corresponding DEC Rdb features. The information is based on DEC Rdb V5.1 INFORMIX/OnLine V5.01 and my knowlege about both products at the time of writing. This note contains information about: System Architecture I/O Architecture BLOB Support Referencial Integrity Storage mechanisms Transaction Control Optimizer System Architecture: ==================== 1 database attach = 1 frontend process + 1 backend process typically used in Client/server environments. Precompiler support under Windows (ie native SQL is written) Minimal dictionary support throught the DEFINE ... LIKE table command. both in stored procedures and windows Client tools (4gl) I/O Architecture: ================== Page Sizes are fixed at either 2Kbytes or 4Kbytes Size is set by the installation kit and is decided only by the INFORMIX development group on a per platform basis. Compare this with Rdb's 0.5-16Kbytes page size intervals Buffer Size is fixed at Page size, except for SEQUENTIAL SCAN's where an 8 page buffer is used. Compare this with Rdb's 0.5-32Kbytes buffer size intervals INDEX NODE sizes are fixed at page size Compare this with Rdb's user defined node sizes DBspaces (Storage Areas) have predefined sizes. Compare this with Rdb's online extensions Logical and Physical Logs (similar in AIJ and RUJ in function) have predefined sizes Compare this with Rdb's online extensions Table and Index will always share the same Tablespaces (storage areas) Compare this with Rdb's advanced storage map posibilities for table and indices INDEX are always B-tree indices. Compare this with Rdb's SORTED and HASH indices ROWID (DBKey) consists of 3 bytes TABLE-ID, 1 byte Row index The actual chunk (file) the data has to be looked up in another table Note this architectually limits the number of pages (hence rows) for a given table + all indices to 224 ~ 16 mill. pages See also BLOB Support. Compare this with Rdb's 4byte structure and variable page sizes BLOB Support ================ All BLOBs in a database (OnLine system) share the same BLOBspace. Compare this with Rdb's advanced storage map posibilities for BLOBS The ONLINE archive (backup) cannot process the BLOBspace online. Compare this with Rdb's advanced ONLINE backup options. There is no support for WORM devices Storage mechanisms =================== Rows are always UNCOMPRESSED Dynamic structures are handled by having all static structures on a home* page, and all dynamic structures (VARCHAR and BLOBS) on 1 or more separate pages. Compare this with Rdb's advanced storage map posibilities for table Indices are always UNCOMPRESSED Compare this with Rdb's advanced pre/post fix compression applied automatically and the user specified compression option. VARCHAR's can only be 256 chars maximum. Compare this with Rdb's 32000 chars limit Free space is found usig special BITMAP pages that uses 2bit structure for fixed size rows less than page size 4bit structure for dynamic rows This means one bitmap page for approx each 7000 pages Compare this with Rdb's advanced AIP/SPAM mechanism to find free space CLUSTER INDICES is a one shot operation that only sorts data CREATE CLUSTER INDEX ... or ALTER INDEX <name> TO CLUSTER really doesn't apply to the index at all. It sort the DATA rows according to the index, and places them back on disk (now in sorted order). However NO ATTEMPTS are done from now on by INFORMIX to keep the clustering effect at run-time. Compare this with Rdb's real data clustering both by hash indices and sorted indices. Transaction Control ==================== There is no support for ISOLATION LEVEL SERIALIZABLE transactions There is no concept of a READ ONLY transaction, ie a transaction that guaranties a consistent data view, without imposing any visible locking. Only 1 transaction can be rolled back at a time, in case of a process being killed. Compare this with Rdb's parallel recovery mechanism. Referencial Integrity ===================== PRIMARY KEY and FOREIGN KEY's create INDEX implicitly to help the checks. Compare this with Rdb's general handling of indices required for constraints. CHECK constraints can only check intra-row dependencies, ie Project start < Project end Compare this with Rdb's generalized CHECK constraints Optimizer ============= The optimizer is costbased and needs information from a regular run UPDATE STATISTICS sql command (can be run online). Cost is calculated as #I/O's + #ROWs0.03 (CPU cost of row processing). Compare this with Rdb's highly advanced dynamic optimization Information used is: Table level: - Number of pages - number of rows - Second largest and smallest value on a per column basis Index Level: - Number of pages used - Number of leaves - number of levels - number of unique values for entire index - degree of clustering (how ordered* are the rows) ORDER BY, GROUP BY, DISTINCT will always result in a temporary table being created. The temporary table is always placed in the same file (rootdbs) and is fixed sized. Only leading segments can be used in indices, there is no concept of a key-only boolean as in Rdb. Leading segments in a multisegmented index has the same uniqueness applied as the entire index (ie SEX, BADGENO; using SEX alone will act as if UNIQUE). The optimizer strategy (plan) is never remembered unless stored procedures are used. Compare this with Rdb's automatic automatic-once execute-many architecture regardless of client software. Optimizer strategy only available if source code is available and changed Compare this with Rdb's DEBUG_FLAGS mechanism No Global query optimization depsite the global DML capabilities
1248.3	INFORMIX/OnLine V5.01 Strong Points	COPCLU::BRUNSGAARD	Alarm set to Quarter past Friday	`Tue May 11 1993 20:40`	138
	This note is highlighting areas where INFORMIX/OnLine has strong points. The information is presented without explanations, just the plain facts. If you need explanations why I feel it is a strong point, or specific questions about featues, please don't hesistate to reply here. Whereever appropiate I compare with corresponding DEC Rdb features. The information is based on DEC Rdb V5.1 INFORMIX/OnLine V5.01 and my knowlege about both products at the time of writing. This note contains information about: System Architecture Buffer Management Transaction Control SQL, Stored Procedures and Triggers Index structure Optimizer Administration tools System Architecture: ==================== Precompiler support under Windows (ie native SQL is written) Minimal dictionary support throught the DEFINE ... LIKE table command. both in stored procedures and windows Client tools (4gl) Invisible distributed DML and transaction capabilities. BUFFER/page Management ====================== Updated Buffers are written by background processes (page cleaners) #page cleaners can be configured individually on a system basis Buffer locks are using latches instead of kernal-locking Page cleanup (making freespace 1 big clump) is only done on demand basis, ie a new row is less than FreeSpaceLeft but there is no hole where the row fits in. Transaction Control ================= ISOLATION LEVEL READ UNCOMMITED is implemented SQL, Stored Procedures and Triggers =============================== Scroll Cursors - Fetch absolute, next, prir, relative aso. - cursor stability automatically Distributed DML operations (Throught INFORMIX/Star). - No metadata sharing between databases, nor a central place where metadata must be kept. - invisible integration in SQL from programs SELECT FROM <table>, rough BNF for <table> <table> := <table_name> \| <alias> <alias> := <distributed name> <table_name> - used in any statement except UPDATE/INSERT/DELETE of VIEWS. UPDATE statements with record syntax ie. Update set * = (execute procedure ...) Triggers implemented as SQL draft paper - indefinit nesting of triggers (triggered by triggers...) - Before and After events - For Each Row event - Insert, Update and Delete events Very advanced Stored PRocedure support. - Local variables on procedure or Block level (BEGIN END) - IF-then-else - Case statement - While and For loops - Exception handling - Whenever ... Call [with resume] - Raise Exception - saving of SQLCODE, SQLCA and SQLDIAGNISTIC variables - recursiuve call (20! is just as easy as Pascal) - BLOB procedure parameters considered BY REFERENCE all other BY VALUE - generalized LET statement LET (a,b) = (1,2) - CURSOR CONTEXT retension after RETURN ie next call by teh same client to the stored procedure will just step further on in the data - Precompiled query strategies with automatic recompilation if any dependencies are REMOVED. Recompile possible on demand by UPDATE STATISTICS FOR PROCEDURE command - complety integrated in SQL ie. SELECT ... FROM ... WHERE col = Execute procedure <name> - For-each construct (an automatic looping cursor) - System callout, but no parameters can be passed back Index structure ================= Nonunique indices have duplicate chains included in the primary index tree Only the ACTUAL value used in the index is locked even if several values are placed in the same index node SELECT * FROM table WHERE NAME = 'Lars' will not lock any other part of the index than the one pointing to 'Lars' not even 'Larse'. Index node splits: keys are generally balanced, but if the key value is the new maximum value only the new key will be inserted in the new node. Ex. Node 1 contains: 1, 2, 3, 4, 5 if a "6" comes along the index will split as Node 1: same as above Node 2: 6 Optimizer ================= Index Only retrieval is implemented Administration tools ================= Online Backup (except BLOBspaces) Very advanced Loader several tables loaded from 1 input record Datatype conversions supported Specific control over errors during load Control over COMMIT intervals DBcreate utility to generate command procedure of database or specific items
1248.4	Not so strong a point	NOVA::FISHER	US Patent 5225833	`Fri Aug 06 1993 16:24`	36
	re: Index node splits: keys are generally balanced, but if the key value is the new maximum value only the new key will be inserted in the new node. Ex. Node 1 contains: 1, 2, 3, 4, 5 if a "6" comes along the index will split as Node 1: same as above Node 2: 6 ----------------------------------------------------- For Rdb: Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 if a "11" comes along the index will split as Node 1: 1 through 9 Node 2: 10, 11 [with compression we get more into a node in the first place :-) ] I think this was an early Rdb design feature because with the Informix algorithm, after the in-order insertions are done, the first record to be inserted causes node splits and tree rebalancing. For example, create the tree inserting 1.000.000 records in sorted order each node is full [unless "full" doesn't mean "full"] now insert a record with key value 998.5 -- the node it goes into must be split, and each of the higher level nodes must be split as well. (Remember in V1.0 and V1.1 we created btrees by sorting the keys and inserting them into the tree in order. V2.0 incorprated the current build-bottom-up-withpercent-fill method. So you could say "we tried and rejected this method") :-) ed
1248.5	Seuqnce numbres in indices are generally painfull	COPCLU::BRUNSGAARD	I swear to say the whole truth as often as I can	`Tue Aug 10 1993 00:24`	28
	> For Rdb: > > Node 1 contains: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 > > if a "11" comes along the index will split as > Node 1: 1 through 9 > Node 2: 10, 11 > >[with compression we get more into a node in the first place :-) ] > But this is not the way Rdb works UNLESS you are performing an ALTER/CREATE index command. Your above example will split as Node 1: 1-6 Node 2: 7-11 ie leaving Node 1 approx 50% empty, again leading to special handling of sequnce numbers in indices (Index too wide). I do hover see you point that a split must be done immediatly if a 0 comes along, but since the algoritm I explained was only to be used on sequence numbers, that would be an issue. Ex: Create index .. usage sequence this will tell Rdb that a special index split mechanism should be used, and always perform a 100% fill on any Alter Index operations. fwiw Lars
1248.6	wrongo	NOVA::FISHER	US Patent 5225833	`Tue Aug 10 1993 15:12`	64
	Lars, I would not have written that example if I had not tested it first to assure that it hasn't changed. It works that way I said it did: RDO> def da foo. RDO> def fie l da si lo. %RDO-W-NOCDDUPDAT, database invoked by filename, the CDD will not be updated RDO> def rel r. l. end. RDO> def ind ir for r node size 100.. l. end. RDO> RDO> store r in r using r.l = 1 end_s RDO> store r in r using r.l = 2 end_s RDO> store r in r using r.l = 3 end_s RDO> store r in r using r.l = 4 end_s RDO> store r in r using r.l = 5 end_s RDO> store r in r using r.l = 6 end_s RDO> store r in r using r.l = 7 end_s RDO> store r in r using r.l = 8 end_s RDO> store r in r using r.l = 9 end_s RDO> store r in r using r.l = 10 end_s RDO> anal ind ir Index Index Length Duplicate Duplicate Index Name levels nodes used nodes length used ---------- ------ ----- ------ --------- ----------- Node: 39:407:0, Level:1, Length: 67/100, Sibling: null, Parent_ikey : (0)"" 38:404:0 (0+5) "^0080000001" ="....." 38:404:1 (4+1) "00800000^02" ="....." 38:404:2 (4+1) "00800000^03" ="....." 38:404:3 (4+1) "00800000^04" ="....." 38:404:4 (4+1) "00800000^05" ="....." 38:404:5 (4+1) "00800000^06" ="....." 38:404:6 (4+1) "00800000^07" ="....." 38:404:7 (4+1) "00800000^08" ="....." 38:404:8 (4+1) "00800000^09" ="....." 38:404:9 (4+1) "00800000^0A" ="....." IR 1 1 67 0 0 RDO> commit RDO> start_t read_write RDO> store r in r using r.l = 11 end_s RDO> anal ind ir Index Index Length Duplicate Duplicate Index Name levels nodes used nodes length used ---------- ------ ----- ------ --------- ----------- Node: 39:407:0, Level:2, Length: 17/100, Sibling: null, Parent_ikey : (0)"" Node: 39:407:1, Level:1, Length: 60/100, Sibling: 39:407:2, Parent_ikey : (0)"" 38:404:0 (0+5) "^0080000001" ="....." 38:404:1 (4+1) "00800000^02" ="....." 38:404:2 (4+1) "00800000^03" ="....." 38:404:3 (4+1) "00800000^04" ="....." 38:404:4 (4+1) "00800000^05" ="....." 38:404:5 (4+1) "00800000^06" ="....." 38:404:6 (4+1) "00800000^07" ="....." 38:404:7 (4+1) "00800000^08" ="....." 38:404:8 (4+1) "00800000^09" ="....." Node: 39:407:2, Level:1, Length: 18/100, Sibling: null, Parent_ikey : (5)"008000000A" 38:404:9 (0+5) "^008000000A" ="....." 38:404:10 (4+1) "00800000^0B" ="....." IR 2 3 95 0 0 RDO> RDO> commit
1248.7	You are right OF COURSE	COPCLU::BRUNSGAARD	I swear to say the whole truth as often as I can	`Wed Aug 11 1993 14:13`	25
	I declare defeat !! It does indeed work the way you describe. Why could I then make up such a story ? Simply because one of the excercises in the old Rdb Physical Design and Administration course showed (and it REALLY did, I tought it atleast 10 times !!) the the split was approc 50-50. Of course now I can't find the course material (probably threw it out when I moved the last time), but the principle was to - store 6 keys in an index, dump it - store 1 more, dump it - see that the split between the nodes was 4 in the left 3 in the right. This was during 3.0-3.1 days, but maybe something did change ? Maybe it depends on whether you had STORE IN in the index maybe I am just imagining things ... but I am sure I have seen the behaviour I described, otherwise I wouldn't have even tried to argue with you. ... After thinking about this, I have actually been granted a free feature !!! Thanks for the clarification. Lars
1248.8	Index NODES in informix are VERY different	COPCLU::BRUNSGAARD	I swear to say the whole truth as often as I can	`Wed Aug 11 1993 14:29`	51
	While I am digging into that rathole about index management, I thought I would explain how Informix is actually physically treating its indices. They are VERY different from our approach. First of all they have the concept of an INDEX PAGE ---------- This is page with a completely different structure than a database page In effect they threat an index page as we handle an index NODE. However: - Where we lock an INDEX NODE every time we update the node ie all "close" values to the one we update are locked they treat every distinct index value as a record on the page Ie people can insert a 5 and a 6 at the same time, since they only share the PAGE, not the record itself - This actually meant that they have VERY limited locking between users in indices !! - It enables them to handle metadata operations much more smootly, seen from a DDL point of view. Of course the otehrr waspects of metadata changes do not benefit at all forom this (ie how to reorganinse online aso...) - a NODE SPLIT is for them when a page is full. Then the page will split and point to the next node. The downsides of the approach are: - More page structures to handle, ie difficult to backup/restore/recover/verify because every page type (three in total, since BLOBS have a 3rd page style) is different. - Variable page sizes is not an option for Informix, since it will be much to complicated to handle for the product. - It is almost impossible for them to implement ISOLATION LEVEL SERIALIZABLE, wihout severly hurting performance (much more than in the Rdb case). Therefore they only do ISOLATION LEVEL REPEATABLE READ and lower. - Variable index node size is NOT possible, because it would be variable page sizes in effect. - New index types are HARD WORK (ie dont expect hash indices or even clustering) because it will require enourmous amount of handling in every aspect of the administration in the product. However it is an interesting concept, and clearly something that removes (nearly) all the locking issues from sorted indices, with quite a few sideeffects though. comments/questions are always welcomed cheers Lars
1248.9	Informix Constraint handling	COPCLU::BRUNSGAARD	I swear to say the whole truth as often as I can	`Wed Aug 11 1993 14:46`	63
	Another interesting concept in Informix. Constraint optimizations: The too have DEFERRED and IMMEDIATE constraints, again differently than ours though. Functionally they are equivalent to Rdb's DEFERRED and IMMEDIATE, but not VERB TIME. Lets take an example: I have a primary key 1, 2, 3, 4 and I want to change this list to insert a new number 3, ie I move 4 to 5, 3 to 4 and insert a new #3. The update is allowed in one statement (which a VERB TIME constraint in Rdb does not allow) because of the follow algoritm. - Every time a column is changed, the dependent constraint is checked IMMEDIATLY (ie at time of the column change) - If the row does NOT satisfy the constraint it is written to a temporary table - when the entire statement is ended all rows in the temporary table is reevaluated. - If some still fails, then the message is retuned. Applied to the example: Update set id = id + 1 where id >=3 does find row 3 Update to a 4 check constraint error not unique, write to temp table find next (4) Update to 5 check constraint ok Statement end Read temp table fetch 4 (the old #3) check constraint now ok Return to user with no error. Of course for a DEFERRED constraint, they just postpone the extra constraint check to COMMIT. Pros: Very simple concept to understand, and does not require inventing new queries on behalf of the constraint, since the values are in reality always checked VERB TIME. A primary key will never reject an update just because of the timing of the constraint (see above example). Cons: a HUGE risc for performance problems if you don't take care. Just consider the above example with 1 million rows (all written to temp table) The solution does definitly NOT scale well for large tables, but after all that is the fact for the basic implementation of constraints in Informix (requiring specialised indices maintained by Informix for each contraint defined). comments/questions welcomed Lars
1248.10	I tried to document it :)	BOUVS::OAKEY	Assume is my favorite acronym	`Wed Aug 11 1993 17:38`	14
	�<<< Note 1248.7 by COPCLU::BRUNSGAARD "I swear to say the whole truth as often as I can" >>> � -< You are right OF COURSE >- �It does indeed work the way you describe. Lars, It's even documented in the ISD that it works this way (although thanks to Ed I did discover a misinterpretation of the code on my part that I will fix). I have the split taking place one entry off, but the fact that the split is not 50/50 when the entry is the last entry in the node is documented :)
1248.11	What is IRM :-)	COPCLU::BRUNSGAARD	I swear to say the whole truth as often as I can	`Fri Aug 13 1993 14:40`	6
	I am sorry that I cant remeber the IRM of the top of my head :-) Anyway a NICE feature we now have (without changing anything). Thanks for straighting me out Lars