
Conference ulysse::rdb_vms_competition

Title:DEC Rdb against the World
Moderator:HERON::GODFRIND
Created:Fri Jun 12 1987
Last Modified:Thu Feb 23 1995
Last Successful Update:Fri Jun 06 1997
Number of topics:1348
Total number of notes:5438

881.0. "Oracle question." by SS::SPIRO () Thu Mar 07 1991 19:34

I have a question concerning Oracle's versioning mechanism and their redo log.
I know very little about Oracle so some of my terms might be off.

As I understand it, Oracle supports a versioning mechanism that is similar
to the Rdb snapshot mechanism.  

In addition, Oracle supports a form of redo recovery/logging (that is, data
does not need to be flushed to disk before transaction commit). In order to
support redo recovery, after-images must be written to a recovery log so
that after a failure, unflushed, committed updates can be redone.
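In toy form, the redo idea reads like this (a sketch of the general write-ahead-redo technique, with invented names; nothing here is Oracle's actual on-disk format):

```python
# Toy redo logging: after-images plus commit records are forced to the log,
# so committed-but-unflushed page updates survive a crash and can be redone.

class ToyDB:
    def __init__(self):
        self.disk = {}      # durable data pages
        self.cache = {}     # volatile buffer cache
        self.redo_log = []  # durable redo log (after-images + commit records)

    def update(self, txn, key, value):
        self.cache[key] = value                      # dirty page, cache only
        self.redo_log.append(("UPDATE", txn, key, value))

    def commit(self, txn):
        # the log force at commit is what makes the transaction durable
        self.redo_log.append(("COMMIT", txn, None, None))

    def crash(self):
        self.cache.clear()  # dirty pages are lost; disk and log survive

    def recover(self):
        committed = {r[1] for r in self.redo_log if r[0] == "COMMIT"}
        for kind, txn, key, value in self.redo_log:
            if kind == "UPDATE" and txn in committed:
                self.disk[key] = value   # redo the unflushed, committed update

db = ToyDB()
db.update("T1", "emp.42", "salary=100")
db.commit("T1")
db.crash()                  # the dirty page never reached disk...
db.recover()
assert db.disk["emp.42"] == "salary=100"   # ...but redo recovery restores it
```

The question in this note is whether snapshot records travel through that same log, or are rebuilt some other way.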

The question is: does Oracle also write their 'snapshot records' to the redo
log, so that after a failure, not only are the data updates redone, but the
snapshot records are also redone?  If they do write their 'snapshot records'
to the redo log, do these get propagated to their archive log (or whatever it's
called)?

Thinking cap required for the following discussion: ;-)

Now it seems to me that Oracle might not even have this problem if database
access was restricted to one node, and a user was not allowed to fail
independently of a node failure.  In other words, without cross-cluster access,
there is not the scenario of a live transaction wishing to see an earlier
version of a record, therefore they wouldn't have to redo any snapshot
records.

But with their current release which allows cross-cluster access, it 
seems they might encounter this problem.  In this instance the scenario might
be: on node 1, a read_only transaction starts (call it T1).  On node 2, a user
updates a record.  Now there are two versions of that record, the live version
and the 'snap' version.  At this point if node 2 crashes, the redo recovery
must make sure both versions of the record are redone, because T1 might
try and read the record and it should see the earlier version.
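The hazard can be restated concretely with a toy version store (invented names and integer "stamps" standing in for whatever the real mechanism uses):

```python
# Sketch of the cross-cluster hazard: after node 2 crashes, recovery must
# leave BOTH the committed live version and the pre-update snapshot version
# available, because T1 (still running on node 1) reads as of its start time.

versions = {"acct.7": [(10, "bal=500")]}    # (commit_stamp, value) history

t1_start = 15                               # T1 begins on node 1

versions["acct.7"].append((20, "bal=900"))  # node 2 updates, commits at 20

# node 2 crashes; whatever recovery does, the stamp-10 version must survive
def read_as_of(key, stamp):
    # newest version committed no later than `stamp`
    return max(v for v in versions[key] if v[0] <= stamp)[1]

assert read_as_of("acct.7", t1_start) == "bal=500"   # T1's consistent view
assert read_as_of("acct.7", 25) == "bal=900"         # new readers see update
```

If recovery redid only the live version and dropped the old one, the first assertion would be unsatisfiable.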

-peter
881.1. "degree_3, updates only from one node?" by IJSAPL::OLTHOF (Henny Olthof @UTO 838-2021) Fri Mar 08 1991 08:29
    As far as the snapshotting is concerned, Oracle does not have the
    degree_3 consistency, meaning reading the same record inside a TXN does
    not imply you get the same values (they may have been changed).
    
    As far as clusters are concerned, to my understanding there's still the
    recommendation made by Oracle to restrict updating of the database to
    one node (even with V6.2). Please correct me if I'm wrong.
    
    Henny Olthof, TP-DB Netherlands
881.2. "I didn't realize, no degree 3" by MJBOOT::WEINBROM (Jack of all trades-Master of none) Fri Mar 08 1991 19:01
    Pardon me, but no degree 3 consistency?  I didn't realize that.  I just
    assumed that Oracle had it.  You'd think that after all these years,
    I'd have learned my lesson "Never assume".
    
    However, I do have a question:
    
    Is there an advantage to NOT providing degree 3 consistency?  There
    must be a reason for not doing it.
881.3. "Oracle Cluster support - Maybe??" by TRCA03::MCMULLEN (Ken McMullen) Fri Mar 08 1991 19:37
    Feb 25th issue of Digital Review
    
    Front page article on Oracle stating that "next" month's release of
    Oracle will support VAX Clusters. They (may) have delivered on a
    product feature they started selling five years ago.
    
    The article also has a few quotes from Mr. Finkelstein. He basically
    says Oracle's cluster support will probably not work very well.
    
881.4. "Let your programmers handle it" by IJSAPL::OLTHOF (Henny Olthof @UTO 838-2021) Mon Mar 11 1991 08:17
    Re 2
    
    Reason for not implementing degree-3 is of course concurrency in the
    database (release read locks as soon as possible), also minimising the
    lock overhead. There are commercial databases available where you can
    select whether to have degree-2 or degree-3, in that way having the best
    of both worlds. To do degree-3 is a lot more work than to support
    degree-2.
    
    It's not a very serious thing IF your programmers are aware of it and
    CONSEQUENTLY take the right measures to deal with the problem in their
    application. The latter is perhaps a bigger problem: you cannot rely on
    the discipline of your programmers, so you had better make the system do
    it. The same holds for enforcing database integrity.
    
    Henny Olthof, TP-DB Netherlands
881.5. "clarification..." by NOVA::SPIRO () Mon Mar 11 1991 13:29
Henny,

If Oracle does not provide degree-3 consistency for snaps, they probably
don't even recover their snapshot records after a failure.  Which implies
that they probably don't write snapshot records to the recovery log.

True or false?

Thanks.

-peter
881.6. "read only transaction = degree 3" by HGOVC::DEANGELIS (Momuntai) Tue Mar 12 1991 04:07
Hi Pete (how're the Red Sox doing :-)),

Looking through the V6 docs, it states that by default Oracle provides "read
consistency" which:

. guarantees that the data seen by a statement (such as a query) will not 
  change during statement execution

. assures that readers of database data do not wait for writers of the same
  database blocks

ie. degree 2. 

If you want repeatable operations then you either:

1. manually lock what you want if you want to update it, or
2. specify a read only transaction

In case 1, manual locking involves using the "SELECT... FOR UPDATE" clause
or the LOCK TABLE statement.

Case 2: SET TRANSACTION READ ONLY gives you repeatable reads for the life of
the transaction, ie. degree 3.

Other snippets from the docs:

"ORACLE's read consistency model is called a `multi-version' consistency model
because multiple versions of the same table may appear to exist
simultaneously."

"The ORACLE RDBMS must create a read consistent set of data when the table is
being simultaneously queried and updated. When a read consistent view is
necessary, the RDBMS uses information in the rollback segments to generate a
read consistent view of a table's data for a query. No read locks are ever
required by ORACLE queries. This ensures that readers do not interfere with
writers and writers do not interfere with readers".

"The `snapshot' is taken during the execute phase. Though you can imagine it as
a snapshot, it is not actually a full copy of the data. Instead, when other
users have changed the queried data since the query began, ORACLE reconstructs
the older data blocks that it needs, using rollback segments. Only blocks which
have changed since the query began need  to be reconstructed. Read consistency
is thus assured and the performance impact is acceptable".

I'll look into how rollback segments work (eg in crash recovery) and post a 
reply.

John.
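The "reconstruct older blocks from rollback segments" idea quoted above can be sketched in miniature (my own toy model of the general technique, with invented names and stamps; not Oracle's actual structures):

```python
# Toy read-consistency: the current block plus a newest-first chain of undo
# (rollback) records, each stamped with its change's commit point, is enough
# to rebuild the block as of any earlier query-start point.

block = {"row1": "v3"}                     # current block contents
undo = [                                   # newest-first undo chain
    (30, {"row1": "v2"}),                  # the change at stamp 30 overwrote v2
    (20, {"row1": "v1"}),                  # the change at stamp 20 overwrote v1
]

def consistent_read(block, undo, query_start):
    view = dict(block)                     # private copy; no read locks taken
    for change_stamp, before_image in undo:
        if change_stamp > query_start:     # change is "in the future" for us
            view.update(before_image)      # roll it back in our private copy
    return view

assert consistent_read(block, undo, query_start=25) == {"row1": "v2"}
assert consistent_read(block, undo, query_start=10) == {"row1": "v1"}
assert consistent_read(block, undo, query_start=35) == {"row1": "v3"}
```

Note the structural similarity to Rdb snapshots: both keep old versions to the side and reconstruct a reader's view on demand.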
881.7. by WIBBIN::NOYCE () Tue Mar 12 1991 21:57
    Peter, it seems to me that they wouldn't need to write the snap
    records out.  All the necessary information could be in the redo
    log for the live records.  When the "redo" process is processing
    the redo log, it would have the opportunity at that point to move
    the old record to the snap area before writing the new record (from
    the redo log) into the live page.  You can probably work out what
    ordering requirements are needed to make this work reliably, and
    what checks are needed to avoid making duplicate copies (which would
    waste space).
    
    I don't have any idea how Oracle does it, but I can see a way it
    could be done cheaply...
	-- Bill
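Bill's scheme, sketched as a toy redo-apply loop (my rendering of his idea, not anything Oracle is known to do; it assumes the before-image is still on the live page, which the next reply questions):

```python
# Sketch of the scheme above: during redo replay, before overwriting a live
# record with its after-image, first copy the on-disk (old) record into the
# snapshot area -- so snapshot records need not be logged at all.

def redo_with_snap(live_pages, snap_area, redo_log):
    for key, after_image in redo_log:
        old = live_pages.get(key)
        if old is not None and old != after_image:
            snap_area.setdefault(key, []).append(old)  # preserve old version
        live_pages[key] = after_image                  # then apply the redo

live = {"r1": "old"}
snap = {}
redo_with_snap(live, snap, [("r1", "new")])
assert live["r1"] == "new" and snap["r1"] == ["old"]
```

The `old != after_image` check is one crude answer to the duplicate-copy problem Bill mentions: if the page already holds the after-image, nothing is copied.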
881.8. by SS::SPIRO () Wed Mar 13 1991 14:10
re .6 - Gidday John.  Thanks for the info.  So their read_only transactions
do provide degree 3.  It seems their rollback segments are similar to our
snapshot files. So they have a separate undo log and redo log? I wonder how 
the rollback segments are garbage collected?

re .7 - Bill, you're right, depending on the implementation it might not
be necessary to write the snap records out.  However it is somewhat tricky.
Remember the redo log contains only after-images; you're not always guaranteed
to have the before-image on the live page (which would enable you to 're-create'
the snap copy during a redo phase).  That is, the live page might have been
flushed to disk during a transaction, thereby eliminating the info needed
to store a snap copy during redo!  In this case, redo is not needed for the page
in question.

Anyway my guess is, in 6.2 (which I know nothing about), since they supposedly
have cross-cluster access, after a node failure, they probably depend on the
rollback segments, mentioned by John in 881.6, to provide the proper versioning
capability to read_only transactions still alive on different nodes. And that
they probably flush these rollback segments before any corresponding live
data page is flushed to disk.

-peter
881.9. "v6 architecture" by RIPPLE::UNWIN_SC () Tue Mar 19 1991 20:30
Oracle architecture for transactions and recovery:

Oracle maintains buffers in shared global memory:

.Database buffer pool consisting of data segment blocks
 and rollback segment blocks. Corresponds to data blocks and rollback segment
 blocks in the database file.

.Redo log buffer. Contains changes made to data segment blocks and
 rollback segment blocks during the transaction. Corresponds to redo log
 blocks in the redo log file.
 

When a user issues a read, no locks are placed; instead a read
consistent image of the database is built from the rollback segments
and data segments. The rollback segments are timestamped.

During transactions changes are not written immediately to the database
files. The following occurs:

Data, if needed, is read into the buffer pool from the database file.
When a user issues an update, new bytes are recorded in the data segment
blocks and old bytes are recorded in the rollback segment blocks. A
record of both of these changes is then recorded in the redo log buffers.

When the user issues a commit, the LGWR (log writer), an Oracle background
process, flushes the redo log buffers to disk (the redo log file).
The user is then notified
that the transaction has been committed. This action permanently records
the transaction, ie all data necessary to recover in the event of media
failure has been recorded. Note: the LGWR also flushes data to disk if
the redo log buffer fills. In some circumstances your data will be written
without a commit.

Redo log records from multiple transactions requesting to commit at the
same time are piggybacked to disk in a single write. At most there will
be one synchronous write per transaction, and on average fewer.
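That piggybacking (often called group commit) can be sketched as follows (a generic toy model, not Oracle's LGWR; the class and names are invented):

```python
# Group-commit sketch: transactions that commit close together share one
# synchronous log write, so the average write cost per transaction is below
# one -- matching "at most one synchronous write per transaction" above.

class GroupCommitLog:
    def __init__(self):
        self.pending = []   # commit records waiting for the next log flush
        self.writes = 0     # count of synchronous disk writes issued

    def request_commit(self, txn):
        self.pending.append(txn)        # queue; the caller waits for a flush

    def flush(self):
        if not self.pending:
            return []
        self.writes += 1                # ONE write covers every waiting txn
        done, self.pending = self.pending, []
        return done                     # these transactions are now durable

log = GroupCommitLog()
for t in ("T1", "T2", "T3"):
    log.request_commit(t)               # three commits arrive together
assert log.flush() == ["T1", "T2", "T3"]
assert log.writes == 1                  # one write, three commits
```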

Information in the database buffer pool is NOT written to the database
every time a transaction is committed. It is written asynchronously to 
disk using a least recently used algorithm thus decreasing disk I/O.
This data is written by the DBWR or database writer whether the data has 
been committed or not. The rollback segment block is always written
before the corresponding data segment block.

The database writer performs what Oracle calls a CHECKPOINT to 
synchronize the database and redo log files. A checkpoint occurs
when a predetermined number of redo log blocks have been written to disk
or when a redo log file fills and a log file switch occurs. Each time a 
checkpoint occurs the DBWR writes all database blocks to disk that have 
been modified since the last checkpoint, committed as well as 
noncommitted. Once the checkpoint is completed, the redo log 
entries made since the previous checkpoint are no longer needed for 
database instance recovery. The checkpoint ensures that frequently
changed data gets written to disk; without it, the LRU
algorithm might never write some hot blocks to disk.

When a checkpoint begins, the RDBMS remembers the location of the next
entry in the online redo log file, writes the modified database buffers
to the database file, and then writes the remembered location to a
"control file". If database recovery is necessary after the checkpoint,
it starts at the checkpoint location as noted in the DB file headers,
not before it. Checkpoints allow the RDBMS to determine which redo log
files to apply to the database and to allow reuse of the redo log files.
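Reduced to a toy (my own sketch of the general checkpoint idea, with an integer LSN standing in for the remembered log location):

```python
# Checkpoint sketch: once every buffer dirtied before the checkpoint is on
# disk, recovery can start replaying the redo log at the remembered position
# instead of at the beginning.

redo_log = []            # list of (lsn, key, value) after-images
checkpoint_lsn = 0       # recovery start point, as recorded durably

def write_redo(lsn, key, value):
    redo_log.append((lsn, key, value))

def take_checkpoint(disk, dirty_buffers):
    global checkpoint_lsn
    disk.update(dirty_buffers)                   # flush all modified buffers
    dirty_buffers.clear()
    checkpoint_lsn = redo_log[-1][0] if redo_log else 0

def recover(disk):
    for lsn, key, value in redo_log:
        if lsn > checkpoint_lsn:                 # earlier log already on disk
            disk[key] = value

disk, dirty = {}, {}
write_redo(1, "a", "a1"); dirty["a"] = "a1"
take_checkpoint(disk, dirty)        # "a" reaches disk; start point = lsn 1
write_redo(2, "b", "b1")            # after the checkpoint, "b" is cache-only
recover(disk)                       # replays only lsn 2
assert disk == {"a": "a1", "b": "b1"}
```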

Hope this helps
Scott


881.10. "Timestamped, how ??" by COPCLU::BRUNSGAARD (Who said I'm paranoid ??!?) Wed Mar 20 1991 10:24
    Thanks for the accurate information here.
    
    I have just one further question.
    
    Rollback segments are timestamped ?
    Using a real timestamp (wall-clock) or a generated number ?
    
    If a real timestamp, then the cluster implementation is, to say the
    least, VERY risky, as the clock is only synchronized to within one second
    (at best) between the machines.
    This means that in a real cluster environment (V6.2) people could see
    dirty data (ie non-committed data), or the recovery could fail as it
    processes rows in the wrong order.
    
    If a generated number, how is this then done ??
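    One common answer to "how is a generated number done" is a logical clock
    in the Lamport style: a counter, not wall-clock time, so ordering never
    depends on clock synchronization between nodes. A sketch of that general
    technique (not a claim about Oracle's implementation; names invented):

```python
# Lamport-style logical clock: each node keeps a counter, bumps it on every
# local event, and on any message takes max(local, remote) + 1.  Causally
# later events always get larger stamps, regardless of wall-clock skew.

class LogicalClock:
    def __init__(self):
        self.counter = 0

    def tick(self):                    # local event (e.g. a commit)
        self.counter += 1
        return self.counter

    def receive(self, remote_stamp):   # a stamp seen from another node
        self.counter = max(self.counter, remote_stamp) + 1
        return self.counter

node1, node2 = LogicalClock(), LogicalClock()
s1 = node1.tick()                      # commit on node 1
s2 = node2.receive(s1)                 # node 2 learns of it, then commits
assert s2 > s1                         # causally later => larger stamp
```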
    Lars