
Conference ulysse::rdb_vms_competition

Title:DEC Rdb against the World
Moderator:HERON::GODFRIND
Created:Fri Jun 12 1987
Last Modified:Thu Feb 23 1995
Last Successful Update:Fri Jun 06 1997
Number of topics:1348
Total number of notes:5438

881.0. "Oracle question." by SS::SPIRO () Thu Mar 07 1991 19:34

I have a question concerning Oracle's versioning mechanism and their redo log.
I know very little about Oracle so some of my terms might be off.

As I understand it, Oracle supports a versioning mechanism that is similar
to the Rdb snapshot mechanism.  

In addition, Oracle supports a form of redo recovery/logging (that is, data
does not need to be flushed to disk before transaction commit). In order to
support redo recovery, after-images must be written to a recovery log so
that after a failure, unflushed, committed updates can be redone.
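In toy form, the redo idea reads like this (a sketch of the general write-ahead-redo technique, with invented names; nothing here is Oracle's actual on-disk format):

```python
# Toy redo logging: after-images plus commit records are forced to the log,
# so committed-but-unflushed page updates survive a crash and can be redone.

class ToyDB:
    def __init__(self):
        self.disk = {}      # durable data pages
        self.cache = {}     # volatile buffer cache
        self.redo_log = []  # durable redo log (after-images + commit records)

    def update(self, txn, key, value):
        self.cache[key] = value                      # dirty page, cache only
        self.redo_log.append(("UPDATE", txn, key, value))

    def commit(self, txn):
        # the log force at commit is what makes the transaction durable
        self.redo_log.append(("COMMIT", txn, None, None))

    def crash(self):
        self.cache.clear()  # dirty pages are lost; disk and log survive

    def recover(self):
        committed = {r[1] for r in self.redo_log if r[0] == "COMMIT"}
        for kind, txn, key, value in self.redo_log:
            if kind == "UPDATE" and txn in committed:
                self.disk[key] = value   # redo the unflushed, committed update

db = ToyDB()
db.update("T1", "emp.42", "salary=100")
db.commit("T1")
db.crash()                  # the dirty page never reached disk...
db.recover()
assert db.disk["emp.42"] == "salary=100"   # ...but redo recovery restores it
```

The question in this note is whether snapshot records travel through that same log, or are rebuilt some other way.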

The question is: does Oracle also write their 'snapshot records' to the redo
log, so that after a failure, not only are the data updates redone, but the
snapshot records are also redone?  If they do write their 'snapshot records'
to the redo log, do these get propagated to their archive log (or whatever it's
called)?

Thinking cap required for the following discussion: ;-)

Now it seems to me that Oracle might not even have this problem if database
access was restricted to one node, and a user was not allowed to fail
independently of a node failure.  In other words, without cross-cluster access,
there is not the scenario of a live transaction wishing to see an earlier
version of a record, therefore they wouldn't have to redo any snapshot
records.

But with their current release which allows cross-cluster access, it 
seems they might encounter this problem.  In this instance the scenario might
be: on node 1, a read_only transaction starts (call it T1).  On node 2, a user
updates a record.  Now there are two versions of that record, the live version
and the 'snap' version.  At this point if node 2 crashes, the redo recovery
must make sure both versions of the record are redone, because T1 might
try and read the record and it should see the earlier version.
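The hazard can be restated concretely with a toy version store (invented names and integer "stamps" standing in for whatever the real mechanism uses):

```python
# Sketch of the cross-cluster hazard: after node 2 crashes, recovery must
# leave BOTH the committed live version and the pre-update snapshot version
# available, because T1 (still running on node 1) reads as of its start time.

versions = {"acct.7": [(10, "bal=500")]}    # (commit_stamp, value) history

t1_start = 15                               # T1 begins on node 1

versions["acct.7"].append((20, "bal=900"))  # node 2 updates, commits at 20

# node 2 crashes; whatever recovery does, the stamp-10 version must survive
def read_as_of(key, stamp):
    # newest version committed no later than `stamp`
    return max(v for v in versions[key] if v[0] <= stamp)[1]

assert read_as_of("acct.7", t1_start) == "bal=500"   # T1's consistent view
assert read_as_of("acct.7", 25) == "bal=900"         # new readers see update
```

If recovery redid only the live version and dropped the old one, the first assertion would be unsatisfiable.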

-peter
881.1. "degree_3, updates only from one node?" by IJSAPL::OLTHOF (Henny Olthof @UTO 838-2021) Fri Mar 08 1991 08:29
    As far as the snapshotting is concerned, Oracle does not have the
    degree_3 consistency, meaning reading the same record inside a TXN does
    not imply you get the same values (they may have been changed).
    
    As far as clusters are concerned, to my understanding there's still the
    recommendation made by Oracle to restrict updating of the database to
    one node (even with V6.2). Please correct me if I'm wrong.
    
    Henny Olthof, TP-DB Netherlands
881.2. "I didn't realize, no degree 3" by MJBOOT::WEINBROM (Jack of all trades-Master of none) Fri Mar 08 1991 19:01
    Pardon me, but no degree 3 consistency?  I didn't realize that.  I just
    assumed that Oracle had it.  You'd think that after all these years,
    I'd have learned my lesson "Never assume".
    
    However, I do have a question:
    
    Is there an advantage to NOT providing degree 3 consistency?  There
    must be a reason for not doing it.
881.3. "Oracle Cluster support - Maybe??" by TRCA03::MCMULLEN (Ken McMullen) Fri Mar 08 1991 19:37
    Feb 25th issue of Digital Review
    
    Front page article on Oracle stating that "next" month's release of
    Oracle will support VAX Clusters. They (may) have delivered on a
    product feature they started selling five years ago.
    
    The article also has a few quotes from Mr. Finkelstein. He basically
    says Oracle's cluster support will probably not work very well.
    
881.4. "Let your programmers handle it" by IJSAPL::OLTHOF (Henny Olthof @UTO 838-2021) Mon Mar 11 1991 08:17
    Re 2
    
    Reason for not implementing degree-3 is of course concurrency in the
    database (release read locks as soon as possible), also minimising the
    lock overhead. There are commercial databases available where you can
    select whether to have degree-2 or degree-3, in that way having the best
    of both worlds. To do degree-3 is a lot more work than to support
    degree-2.
    
    It's not a very serious thing IF your programmers are aware of it and
    CONSEQUENTLY take the right measures to deal with the problem in their
    application. The latter is perhaps a bigger problem: you cannot rely on
    the discipline of your programmers, so you had better make the system do
    it. The same holds for enforcing database integrity.
    
    Henny Olthof, TP-DB Netherlands
881.5. "clarification..." by NOVA::SPIRO () Mon Mar 11 1991 13:29
Henny,

If Oracle does not provide degree-3 consistency for snaps, they probably
don't even recover their snapshot records after a failure.  Which implies
that they probably don't write snapshot records to the recovery log.

True or false?

Thanks.

-peter
881.6. "read only transaction = degree 3" by HGOVC::DEANGELIS (Momuntai) Tue Mar 12 1991 04:07
Hi Pete (how're the Red Sox doing :-)),

Looking through the V6 docs, it states that by default Oracle provides "read
consistency" which:

. guarantees that the data seen by a statement (such as a query) will not 
  change during statement execution

. assures that readers of database data do not wait for writers of the same
  database blocks

ie. degree 2. 

If you want repeatable operations then you either:

1. manually lock what you want if you want to update it, or
2. specify a read only transaction

In case 1, manual locking involves using the "SELECT... FOR UPDATE" clause
or the LOCK TABLE statement.

Case 2: SET TRANSACTION READ ONLY gives you repeatable reads for the life of
the transaction, ie. degree 3.

Other snippets from the docs:

"ORACLE's read consistency model is called a `multi-version' consistency model
because multiple versions of the same table may appear to exist
simultaneously."

"The ORACLE RDBMS must create a read consistent set of data when the table is
being simultaneously queried and updated. When a read consistent view is
necessary, the RDBMS uses information in the rollback segments to generate a
read consistent view of a table's data for a query. No read locks are ever
required by ORACLE queries. This ensures that readers do not interfere with
writers and writers do not interfere with readers".

"The `snapshot' is taken during the execute phase. Though you can imagine it as
a snapshot, it is not actually a full copy of the data. Instead, when other
users have changed the queried data since the query began, ORACLE reconstructs
the older data blocks that it needs, using rollback segments. Only blocks which
have changed since the query began need  to be reconstructed. Read consistency
is thus assured and the performance impact is acceptable".

I'll look into how rollback segments work (eg in crash recovery) and post a 
reply.

John.
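The "reconstruct older blocks from rollback segments" idea quoted above can be sketched in miniature (my own toy model of the general technique, with invented names and stamps; not Oracle's actual structures):

```python
# Toy read-consistency: the current block plus a newest-first chain of undo
# (rollback) records, each stamped with its change's commit point, is enough
# to rebuild the block as of any earlier query-start point.

block = {"row1": "v3"}                     # current block contents
undo = [                                   # newest-first undo chain
    (30, {"row1": "v2"}),                  # the change at stamp 30 overwrote v2
    (20, {"row1": "v1"}),                  # the change at stamp 20 overwrote v1
]

def consistent_read(block, undo, query_start):
    view = dict(block)                     # private copy; no read locks taken
    for change_stamp, before_image in undo:
        if change_stamp > query_start:     # change is "in the future" for us
            view.update(before_image)      # roll it back in our private copy
    return view

assert consistent_read(block, undo, query_start=25) == {"row1": "v2"}
assert consistent_read(block, undo, query_start=10) == {"row1": "v1"}
assert consistent_read(block, undo, query_start=35) == {"row1": "v3"}
```

Note the structural similarity to Rdb snapshots: both keep old versions to the side and reconstruct a reader's view on demand.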
881.7. by WIBBIN::NOYCE () Tue Mar 12 1991 21:57
    Peter, it seems to me that they wouldn't need to write the snap
    records out.  All the necessary information could be in the redo
    log for the live records.  When the "redo" process is processing
    the redo log, it would have the opportunity at that point to move
    the old record to the snap area before writing the new record (from
    the redo log) into the live page.  You can probably work out what
    ordering requirements are needed to make this work reliably, and
    what checks are needed to avoid making duplicate copies (which would
    waste space).
    
    I don't have any idea how Oracle does it, but I can see a way it
    could be done cheaply...
	-- Bill
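Bill's scheme, sketched as a toy redo-apply loop (my rendering of his idea, not anything Oracle is known to do; it assumes the before-image is still on the live page, which the next reply questions):

```python
# Sketch of the scheme above: during redo replay, before overwriting a live
# record with its after-image, first copy the on-disk (old) record into the
# snapshot area -- so snapshot records need not be logged at all.

def redo_with_snap(live_pages, snap_area, redo_log):
    for key, after_image in redo_log:
        old = live_pages.get(key)
        if old is not None and old != after_image:
            snap_area.setdefault(key, []).append(old)  # preserve old version
        live_pages[key] = after_image                  # then apply the redo

live = {"r1": "old"}
snap = {}
redo_with_snap(live, snap, [("r1", "new")])
assert live["r1"] == "new" and snap["r1"] == ["old"]
```

The `old != after_image` check is one crude answer to the duplicate-copy problem Bill mentions: if the page already holds the after-image, nothing is copied.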
881.8. by SS::SPIRO () Wed Mar 13 1991 14:10
re .6 - Gidday John.  Thanks for the info.  So their read_only transactions
do provide degree 3.  It seems their rollback segments are similar to our
snapshot files. So they have a separate undo log and redo log? I wonder how 
the rollback segments are garbage collected?

re .7 - Bill, you're right, depending on the implementation it might not
be necessary to write the snap records out.  However it is somewhat tricky.
Remember the redo log contains only after-images; you're not always guaranteed
to have the before-image on the live page (which would enable you to 're-create'
the snap copy during a redo phase).  That is, the live page might have been
flushed to disk during a transaction, thereby eliminating the info needed
to store a snap copy during redo!  In this case, redo is not needed for the page
in question.

Anyway my guess is, in 6.2 (which I know nothing about), since they supposedly
have cross-cluster access, after a node failure, they probably depend on the
rollback segments, mentioned by John in 881.6, to provide the proper versioning
capability to read_only transactions still alive on different nodes. And that
they probably flush these rollback segments before any corresponding live
data page is flushed to disk.

-peter
881.9. "v6 architecture" by RIPPLE::UNWIN_SC () Tue Mar 19 1991 20:30
Oracle architecture for transactions and recovery:

Oracle maintains buffers in shared global memory:

.Database buffer pool consisting of data segment blocks
 and rollback segment blocks. Corresponds to data blocks and rollback segment
 blocks in the database file.

.Redo log buffer. Contains changes made to data segment blocks and
 rollback segment blocks during the transaction. Corresponds to redo log
 blocks in the redo log file.
 

When a user issues a read, no locks are placed; instead a read
consistent image of the database is built from the rollback segments
and data segments. The rollback segments are timestamped.

During transactions changes are not written immediately to the database
files. The following occurs:

Data, if needed, is read into the buffer pool from the database file.
When a user issues an update, new bytes are recorded in the data segment
blocks and old bytes are recorded in the rollback segment blocks. A
record of both of these changes is then recorded in the redo log buffers.

When the user issues a commit, the LGWR (log writer), an Oracle background
process, flushes the redo log buffers to disk (the redo log file).
The user is then notified
that the transaction has been committed. This action permanently records
the transaction, ie all data necessary to recover in the event of media
failure has been recorded. Note: the LGWR also flushes data to disk if
the redo log buffer fills. In some circumstances your data will be written
without a commit.

Redo log records from multiple transactions requesting to commit at the
same time are piggybacked to disk in a single write. At most there will
be one synchronous write per transaction, and on average fewer.
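That piggybacking (often called group commit) can be sketched as follows (a generic toy model, not Oracle's LGWR; the class and names are invented):

```python
# Group-commit sketch: transactions that commit close together share one
# synchronous log write, so the average write cost per transaction is below
# one -- matching "at most one synchronous write per transaction" above.

class GroupCommitLog:
    def __init__(self):
        self.pending = []   # commit records waiting for the next log flush
        self.writes = 0     # count of synchronous disk writes issued

    def request_commit(self, txn):
        self.pending.append(txn)        # queue; the caller waits for a flush

    def flush(self):
        if not self.pending:
            return []
        self.writes += 1                # ONE write covers every waiting txn
        done, self.pending = self.pending, []
        return done                     # these transactions are now durable

log = GroupCommitLog()
for t in ("T1", "T2", "T3"):
    log.request_commit(t)               # three commits arrive together
assert log.flush() == ["T1", "T2", "T3"]
assert log.writes == 1                  # one write, three commits
```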

Information in the database buffer pool is NOT written to the database
every time a transaction is committed. It is written asynchronously to 
disk using a least recently used algorithm thus decreasing disk I/O.
This data is written by the DBWR or database writer whether the data has 
been committed or not. The rollback segment block is always written
before the corresponding data segment block.

The database writer performs what Oracle calls a CHECKPOINT to 
synchronize the database and redo log files. A checkpoint occurs
when a predetermined number of redo log blocks have been written to disk
or when a redo log file fills and a log file switch occurs. Each time a 
checkpoint occurs the DBWR writes all database blocks to disk that have 
been modified since the last checkpoint, committed as well as 
noncommitted. Once the checkpoint is completed, the redo log 
entries made since the previous checkpoint are no longer needed for 
database instance recovery. The checkpoint ensures that frequently
changed data gets written to disk; without it, the LRU
algorithm might never write some hot blocks to disk.

When a checkpoint begins, the RDBMS remembers the location of the next
entry in the online redo log file, writes the modified database buffers
to the database file, and then writes the remembered location to a
"control file". If database recovery is necessary after the checkpoint,
it starts at the checkpoint location as noted in the DB file headers,
not before it. Checkpoints allow the RDBMS to determine which redo log
files to apply to the database and to allow reuse of the redo log files.
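Reduced to a toy (my own sketch of the general checkpoint idea, with an integer LSN standing in for the remembered log location):

```python
# Checkpoint sketch: once every buffer dirtied before the checkpoint is on
# disk, recovery can start replaying the redo log at the remembered position
# instead of at the beginning.

redo_log = []            # list of (lsn, key, value) after-images
checkpoint_lsn = 0       # recovery start point, as recorded durably

def write_redo(lsn, key, value):
    redo_log.append((lsn, key, value))

def take_checkpoint(disk, dirty_buffers):
    global checkpoint_lsn
    disk.update(dirty_buffers)                   # flush all modified buffers
    dirty_buffers.clear()
    checkpoint_lsn = redo_log[-1][0] if redo_log else 0

def recover(disk):
    for lsn, key, value in redo_log:
        if lsn > checkpoint_lsn:                 # earlier log already on disk
            disk[key] = value

disk, dirty = {}, {}
write_redo(1, "a", "a1"); dirty["a"] = "a1"
take_checkpoint(disk, dirty)        # "a" reaches disk; start point = lsn 1
write_redo(2, "b", "b1")            # after the checkpoint, "b" is cache-only
recover(disk)                       # replays only lsn 2
assert disk == {"a": "a1", "b": "b1"}
```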

Hope this helps
Scott


881.10. "Timestamped, how ??" by COPCLU::BRUNSGAARD (Who said I'm paranoid ??!?) Wed Mar 20 1991 10:24
    Thanks for the accurate information here.
    
    I have just one further question.
    
    Rollback segments are timestamped ?
    Using a real timestamp (wall-clock) or a generated number ?
    
    If a real timestamp, then the cluster implementation is, to say the
    least, VERY risky, as the clock is only synchronized to within one second
    (at best) between the machines.
    This means that in a real cluster environment (V6.2) people could see
    dirty data (ie non-committed data), or the recovery could fail as it
    processes rows in the wrong order.
    
    If a generated number, how is this then done ??
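    One common answer to "how is a generated number done" is a logical clock
    in the Lamport style: a counter, not wall-clock time, so ordering never
    depends on clock synchronization between nodes. A sketch of that general
    technique (not a claim about Oracle's implementation; names invented):

```python
# Lamport-style logical clock: each node keeps a counter, bumps it on every
# local event, and on any message takes max(local, remote) + 1.  Causally
# later events always get larger stamps, regardless of wall-clock skew.

class LogicalClock:
    def __init__(self):
        self.counter = 0

    def tick(self):                    # local event (e.g. a commit)
        self.counter += 1
        return self.counter

    def receive(self, remote_stamp):   # a stamp seen from another node
        self.counter = max(self.counter, remote_stamp) + 1
        return self.counter

node1, node2 = LogicalClock(), LogicalClock()
s1 = node1.tick()                      # commit on node 1
s2 = node2.receive(s1)                 # node 2 learns of it, then commits
assert s2 > s1                         # causally later => larger stamp
```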
    Lars