T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
881.1 | degree_3, updates only from one node? | IJSAPL::OLTHOF | Henny Olthof @UTO 838-2021 | Fri Mar 08 1991 08:29 | 9 |
| As far as snapshotting is concerned, Oracle does not have
degree-3 consistency, meaning that reading the same record twice inside a
transaction does not guarantee you get the same values (they may have
been changed in between).
As far as clusters are concerned, to my understanding there's still the
recommendation made by Oracle to restrict updating of the database to
one node (even with V6.2). Please correct me if I'm wrong.
Henny Olthof, TP-DB Netherlands
|
881.2 | I didn't realize, no degree 3 | MJBOOT::WEINBROM | Jack of all trades-Master of none | Fri Mar 08 1991 19:01 | 8 |
| Pardon me, but no degree 3 consistency? I didn't realize that. I just
assumed that Oracle had it. You'd think that after all these years,
I'd have learned my lesson "Never assume".
However, I do have a question:
Is there an advantage to NOT providing degree 3 consistency? There
must be a reason for not doing it.
|
881.3 | Oracle Cluster support - Maybe?? | TRCA03::MCMULLEN | Ken McMullen | Fri Mar 08 1991 19:37 | 9 |
| Feb 25th issue of Digital Review
Front page article on Oracle stating that "next" months release of
Oracle will support VAX Clusters. They (may) have delivered on a
product feature they started selling five years ago.
The article also has a few quotes from Mr. Finkelstein. He basically
says Oracle's cluster support will probably not work very well.
|
881.4 | Let your programmers handle it | IJSAPL::OLTHOF | Henny Olthof @UTO 838-2021 | Mon Mar 11 1991 08:17 | 16 |
| Re .2
The reason for not implementing degree-3 is of course concurrency in the
database (read locks are released as soon as possible), and also
minimising the lock overhead. There are commercial databases available
where you can select whether to have degree-2 or degree-3, in that way
having the best of both worlds. Implementing degree-3 is a lot more work
than supporting degree-2.
It's not a very serious thing IF your programmers are aware of it and
CONSISTENTLY take the right measures to deal with the problem in their
application. The latter is perhaps the bigger problem: you cannot rely on
the discipline of your programmers, so you had better make the system do
it. The same holds for enforcing database integrity.
Henny Olthof, TP-DB Netherlands
|
881.5 | clarification... | NOVA::SPIRO | | Mon Mar 11 1991 13:29 | 11 |
| Henny,
If Oracle does not provide degree-3 consistency for snaps, they probably
don't even recover their snapshot records after a failure, which implies
that they probably don't write snapshot records to the recovery log.
True or false?
Thanks.
-peter
|
881.6 | read only transaction = degree 3 | HGOVC::DEANGELIS | Momuntai | Tue Mar 12 1991 04:07 | 48 |
| Hi Pete (how's the Red Sox doing :-)),
Looking through the V6 docs, they state that by default Oracle provides
"read consistency", which:
. guarantees that the data seen by a statement (such as a query) will not
change during statement execution
. assures that readers of database data do not wait for writers of the same
database blocks
i.e. degree 2.
If you want repeatable operations, then you either:
1. manually lock what you want to update, or
2. specify a read-only transaction.
In case 1, manual locking involves using the "SELECT... FOR UPDATE" clause
or the LOCK TABLE statement.
Case 2: SET TRANSACTION READ ONLY gives you repeatable reads for the life
of the transaction, i.e. degree 3.
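To make the difference concrete, here's a small Python sketch of the two
behaviours (a toy model with names of my own invention, not Oracle code):
degree 2 takes a fresh snapshot per statement, degree 3 freezes the
snapshot at transaction start.

```python
class VersionedStore:
    """Keeps every committed version of each key with a commit sequence number."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_seq, value)
        self.commit_seq = 0

    def commit_write(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

    def read_as_of(self, key, seq):
        """Return the newest value committed at or before `seq`."""
        return max(v for v in self.versions[key] if v[0] <= seq)[1]

class Transaction:
    def __init__(self, store, read_only=False):
        self.store = store
        # Degree 3 (read only): freeze the snapshot at transaction start.
        self.start_seq = store.commit_seq if read_only else None

    def read(self, key):
        # Degree 2: each statement sees the latest committed data.
        seq = self.start_seq if self.start_seq is not None else self.store.commit_seq
        return self.store.read_as_of(key, seq)

store = VersionedStore()
store.commit_write("acct", 100)

deg2 = Transaction(store)                  # default "read consistency"
deg3 = Transaction(store, read_only=True)  # like SET TRANSACTION READ ONLY

first2, first3 = deg2.read("acct"), deg3.read("acct")
store.commit_write("acct", 50)             # another user commits an update
second2, second3 = deg2.read("acct"), deg3.read("acct")
```

After the concurrent update, the degree-2 reader sees the new value on its
second read while the degree-3 reader still sees the original.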
Other snippets from the docs:
"ORACLE's read consistency model is called a `multi-version' consistency model
because multiple versions of the same table may appear to exist
simultaneously."
"The ORACLE RDBMS must create a read consistent set of data when the table is
being simultaneously queried and updated. When a read consistent view is
necessary, the RDBMS uses information in the rollback segments to generate a
read consistent view of a table's data for a query. No read locks are ever
required by ORACLE queries. This ensures that readers do not interfere with
writers and writers do not interfere with readers".
"The `snapshot' is taken during the execute phase. Though you can imagine it as
a snapshot, it is not actually a full copy of the data. Instead, when other
users have changed the queried data since the query began, ORACLE reconstructs
the older data blocks that it needs, using rollback segments. Only blocks which
have changed since the query began need to be reconstructed. Read consistency
is thus assured and the performance impact is acceptable".
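The reconstruction the docs describe can be sketched like this (my own toy
model of a block and its rollback chain, not Oracle's actual structures):
walk the undo entries newest-to-oldest, undoing every change that is too
new for the query's snapshot.

```python
current_value = "v3"   # block value produced by change number 30
undo_chain = [         # newest first: (change_no of update, before-image)
    (30, "v2"),        # the update at change 30 overwrote "v2"
    (20, "v1"),        # the update at change 20 overwrote "v1"
]

def read_consistent(as_of):
    """Return the block value as it stood at change number `as_of`."""
    value = current_value
    for change_no, before_image in undo_chain:
        if change_no <= as_of:
            break              # this update was already visible at `as_of`
        value = before_image   # undo one change, moving back in time
    return value
```

Only blocks changed since the query began ever need this walk, which is
presumably why the docs call the performance impact acceptable.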
I'll look into how rollback segments work (eg in crash recovery) and post a
reply.
John.
|
881.7 | | WIBBIN::NOYCE | | Tue Mar 12 1991 21:57 | 13 |
| Peter, it seems to me that they wouldn't need to write the snap
records out. All the necessary information could be in the redo
log for the live records. When the "redo" process is processing
the redo log, it would have the opportunity at that point to move
the old record to the snap area before writing the new record (from
the redo log) into the live page. You can probably work out what
ordering requirements are needed to make this work reliably, and
what checks are needed to avoid making duplicate copies (which would
waste space).
I don't have any idea how Oracle does it, but I can see a way it
could be done cheaply...
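The scheme above can be sketched in a few lines (a toy model with my own
naming, just to show the ordering and the duplicate check, nothing to do
with what Oracle actually does):

```python
live_pages = {"r1": "old-A", "r2": "old-B"}
snap_area = {}                             # (record, redo_seq) -> before-image
redo_log = [(1, "r1", "new-A"), (2, "r1", "newer-A"), (3, "r2", "new-B")]

def replay(redo_log):
    for seq, record, after_image in redo_log:
        key = (record, seq)
        if key not in snap_area:                 # avoid duplicate copies
            snap_area[key] = live_pages[record]  # move old record to snap area...
        live_pages[record] = after_image         # ...then apply the redo record

replay(redo_log)
```

Keying the snap copies by redo sequence number makes a re-replay after a
crash idempotent: already-saved before-images are never clobbered.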
-- Bill
|
881.8 | | SS::SPIRO | | Wed Mar 13 1991 14:10 | 22 |
| re .6 - Gidday John. Thanks for the info. So their read_only transactions
do provide degree 3. It seems their rollback segments are similar to our
snapshot files. So they have a separate undo log and redo log? I wonder how
the rollback segments are garbage collected?
re .7 - Bill, you're right, depending on the implementation it might not
be necessary to write the snap records out. However it is somewhat tricky.
Remember the redo log contains only after-images; you're not always guaranteed
to have the before-image on the live page (which would enable you to 're-create'
the snap copy during a redo phase). That is, the live page might have been
flushed to disk during a transaction, thereby eliminating the info needed
to store a snap copy during redo! In this case, redo is not needed for the page
in question.
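To see the problem in miniature (my own toy model, not Oracle's actual
structures): once the live page holding the before-image has been flushed
and overwritten, nothing in an after-image-only redo log can give the old
value back, so it has to come from an undo record.

```python
redo_log = [("r1", "after-A")]      # redo holds after-images only
disk_page = {"r1": "after-A"}       # live page was flushed mid-transaction
undo_segment = {"r1": "before-A"}   # the only surviving before-image

def before_image(record):
    # Neither the redo log nor the flushed page still holds the old value,
    # so the undo (rollback) segment is the only possible source.
    return undo_segment[record]
```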
Anyway my guess is, in 6.2 (which I know nothing about), since they
supposedly have cross-cluster access, after a node failure they probably
depend on the rollback segments, mentioned by John in .6, to provide the
proper versioning capability to read-only transactions still alive on
different nodes. And they probably flush these rollback segments before
any corresponding live data page is flushed to disk.

-peter
|
881.9 | v6 architecture | RIPPLE::UNWIN_SC | | Tue Mar 19 1991 20:30 | 70 |
| Oracle architecture for transactions and recovery:
Oracle maintains buffers in shared global memory:
.Database buffer pool, consisting of data segment blocks and rollback
segment blocks. Corresponds to data blocks and rollback segment blocks
in the database file.
.Redo log buffer. Contains changes made to data segment blocks and
rollback segment blocks during the transaction. Corresponds to redo log
blocks in the redo log file.
When a user issues a read, no locks are placed; instead a read
consistent image of the database is built from the rollback segments
and data segments. The rollback segments are timestamped.
During transactions, changes are not written immediately to the database
files. The following occurs:
Data, if needed, is read into the buffer pool from the database file.
When a user issues an update, the new bytes are recorded in the data
segment blocks and the old bytes are recorded in the rollback segment
blocks. A record of both of these changes is then recorded in the redo
log buffers.
When the user issues a commit, the LGWR (Log Writer, an Oracle background
process) flushes the redo log buffers to disk (the redo log file). The
user is then notified that the transaction has been committed. This
action permanently records the transaction, i.e. all data necessary to
recover in the event of media failure has been recorded. Note: the LGWR
also flushes the buffer to disk if it fills, so in some circumstances
your data will be written without a commit.
Redo log records from multiple transactions requesting to commit at the
same time are piggybacked to disk in a single write. At most there will
be one synchronous write per transaction, and on average fewer.
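The commit path just described can be sketched in a few lines of Python
(a toy model with my own names, not Oracle code): changes accumulate in
the redo buffer, and transactions committing together share a single
synchronous write.

```python
class LogWriter:
    def __init__(self):
        self.redo_buffer = []
        self.disk_writes = 0   # count of synchronous redo-log writes

    def log_change(self, txn, before, after):
        # Both the new bytes and the old (rollback) bytes are logged.
        self.redo_buffer.append((txn, before, after))

    def commit(self, txns):
        """Flush the redo buffer once on behalf of every txn in `txns`."""
        if self.redo_buffer:
            self.disk_writes += 1    # one piggybacked write, not one per txn
            self.redo_buffer.clear()

lgwr = LogWriter()
for txn in ("T1", "T2", "T3"):
    lgwr.log_change(txn, before="old", after="new")
lgwr.commit(["T1", "T2", "T3"])      # three commits, one synchronous write
```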
Information in the database buffer pool is NOT written to the database
every time a transaction is committed. It is written asynchronously to
disk using a least-recently-used algorithm, thus decreasing disk I/O.
This data is written by the DBWR (Database Writer) whether the data has
been committed or not. The rollback segment block is always written
before the corresponding data segment block.
The database writer performs what Oracle calls a CHECKPOINT to
synchronize the database and redo log files. A checkpoint occurs when a
predetermined number of redo log blocks have been written to disk, or
when a redo log file fills and a log file switch occurs. Each time a
checkpoint occurs, the DBWR writes to disk all database blocks that have
been modified since the last checkpoint, committed as well as
uncommitted. Once the checkpoint is completed, the redo log entries made
since the previous checkpoint are no longer needed for database instance
recovery. The checkpoint thus ensures that frequently changed blocks get
written to disk; without it the LRU algorithm might never write them.
When a checkpoint begins, the RDBMS remembers the location of the next
entry in the online redo log file, writes the modified database buffers
to the database file, and then writes the remembered location to a
"control file". If database recovery is necessary after the checkpoint,
it starts at the checkpoint location as noted in the DB file headers,
not before it. Checkpoints allow the RDBMS to determine which redo log
files to apply to the database and allow reuse of the redo log files.
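The checkpoint logic can be sketched like this (again a toy model with my
own naming): flush everything dirtied since the last checkpoint, then
record the redo location so recovery can start there instead of at the
beginning of the log.

```python
class Instance:
    def __init__(self):
        self.dirty = set()        # buffers modified since last checkpoint
        self.redo = []            # redo entries written since startup
        self.checkpoint_loc = 0   # recorded in the control file

    def update(self, block, redo_entry):
        self.dirty.add(block)
        self.redo.append(redo_entry)

    def checkpoint(self):
        loc = len(self.redo)      # remember location of next redo entry
        self.dirty.clear()        # DBWR flushes all modified blocks
        self.checkpoint_loc = loc # then record the remembered location

    def recovery_start(self):
        # Recovery only needs redo written after the checkpoint location.
        return self.redo[self.checkpoint_loc:]

db = Instance()
db.update("b1", "redo-1")
db.update("b2", "redo-2")
db.checkpoint()
db.update("b1", "redo-3")
```

After the checkpoint, only "redo-3" would have to be applied during
instance recovery; everything earlier is already safely on disk.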
Hope this helps
Scott
|
881.10 | Timestamped, how ?? | COPCLU::BRUNSGAARD | Who said I'm paranoid ??!? | Wed Mar 20 1991 10:24 | 16 |
| Thanks for the accurate information here.
I have just one further question.
Rollback segments are timestamped?
Using a real timestamp (wall-clock) or a generated number?
If a real timestamp, then the cluster implementation is, to say the
least, VERY risky, as the clocks are only synchronized within one second
(at best) between the machines.
This means that in a real cluster environment (V6.2) people could see
dirty data (i.e. non-committed data), or the recovery could fail as it
processes rows in the wrong order.
If a generated number, how is this then done?
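For what it's worth, one way a cluster-wide generated number could work
without wall clocks is a Lamport-style logical counter (purely
illustrative on my part; I have no idea whether Oracle does anything like
this): each node bumps its counter on every local change, and jumps ahead
of any larger counter it sees from another node.

```python
class Node:
    def __init__(self):
        self.counter = 0

    def local_event(self):
        self.counter += 1
        return self.counter

    def receive(self, remote_counter):
        # On any cross-node message, jump ahead of the sender's counter,
        # so causally later events always get larger numbers.
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter

a, b = Node(), Node()
t1 = a.local_event()   # node A stamps a change
t2 = b.receive(t1)     # node B hears about it and stamps a later change
t3 = a.receive(t2)     # A catches up past B's counter
```

This gives ordering consistent with causality even when the machines'
clocks disagree by a second or more.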
Lars
|