| In an article on page 1 of the March 18, 1991 Digital Review, continued on page
6, they eat their words:
"DEC also takes issue with other claims about Rdb that Oracle officials made
last week, according to [Vicki] Farrell.
'Oracle made a big deal about Rdb not having row-level locking, and we do offer
row-level locking. Oracle said that we don't have group commits and we do, and
we have on-line backup, although they said we don't,' Farrell said.
When questioned about DEC's assertions, Oracle officials modified their stance.
'Yes, Rdb does row-level locking: We may[!] have made a mistake on that point.
But they also have limits on the number of locks that can be supported,' said
Ken Jacobs, Oracle's director of database marketing.
'...We had no intent to misrepresent [what Rdb can do],' Jacobs said.
'[DEC] also [does] support on-line backup, but it incurs a substantial amount of
overhead,' an Oracle spokesman said."
The "[!]" in the above is mine.
Does anybody have the text of Oracle's announcement?
Also, the article was accompanied by a pie chart from Computer Intelligence that
has no date. Does anybody know the date of this graphic? It has the following
percentages:
"DBMS MARKET SHARE AT VAX SITES"
DEC* 38%
Oracle 22%
Other 17%
Ingres 15%
IBI 5%
Compushare 3%
* Includes both Rdb and DBMS. Does not include run-time Rdb.
The Computer Intelligence numbers for June 1990, according to Michael Booth's
Sybase Competitive Fact Sheet, were
Rdb 18.4%
Oracle 17.7%
Other 21.6%
Ingres 12.6%
In-House 6.5%
Focus 3.8%
System 1032 2.7%
Cincom 1.2%
Adabas 1%
Sybase 0.6%
Note that CI did not include a separate entry for Sybase in the most
recent numbers and that they did not break out Rdb and DBMS separately.
Anybody know how Rdb and DBMS break out? I would guess that the DBMS numbers
have not changed, gone down a little maybe, so Rdb is probably 24%! That's a
larger percentage than I would have guessed for Rdb and Oracle.
Bruce
|
| Below is some mail that I wrote up to answer some questions about
Rdb/VMS. I believe there is a section on group commit (or group write)
in there. I don't believe that the actual mechanism is in the normal
doc set, unless there is a one-liner saying something like "Rdb/VMS
uses an efficient mechanism to group AIJ writes together..." It is
documented in the "Rdb/VMS Technical Handbook", which is an orderable
item; you'll have to look thru the rdb_40 notes file for the order
number. Hopefully this helps a bit.
-Jay
SYNCHRONOUS WRITE
- Database (i.e. data records as compared to log
records) I/O is done synchronously (at request
time as compared to commit time) with writes by
the application or user.
o At the I/O level we use both synchronous and
asynchronous writes for data, which I believe
may have been a confusing point during the course
of the conversations that you have had with DEC
people. During a write operation you will update
the tuple on the database page and we will defer
the write until one of three events:
1. Another user wants to update another tuple on
the database page. In this case the user holding
the page is signaled that someone wants it
and writes that page back to the disk....this
write is a synchronous write (before this
write can happen the one record updated is
journaled in the RUJ file...but that will be
discussed later).
2. Commit Time. - In the current release, all
database pages that have been updated will
be written to disk. The sequence of writes to
the journal will be discussed later, but the
write of updated database pages is done as a
synchronous operation, BUT using asynchronous
I/O's. How this is done is that we mark the
start of the buffer flush, issue all the
I/O's asynchronously and then continue in
the commit sequence when all data is written.
Thus although the operation is synchronous the
I/O is done asynchronously for each database
buffer (applications use say 100+ each). And
the total time of the operation is decreased
significantly. Tests have shown that a graph of
time vs. I/Os levels off, so that the total time
of, say, a complete flush of a buffer pool of
three buffers is approx. the same as for ten
buffers.
3. Buffer Pool Overflow - a group of pages needs
to be read into a buffer and there are no
empty buffers, thus a buffer has to be selected
for a flush to make a buffer available.
We have an internal method of partitioning
our buffer usage to select buffers containing
database pages that have been updated vs
only read. Thus we can defer the flushing of
updated pages, which defers the corresponding
journal writes.
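As a rough illustration of event 2 above (this is not Rdb/VMS code), the
sketch below issues every dirty-page write asynchronously and lets the
commit continue only once all of them have finished. The names
`flush_dirty_pages` and `write_page`, and the use of a Python thread pool
to stand in for asynchronous I/O, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def flush_dirty_pages(dirty_pages, write_page, max_workers=8):
    """Synchronous operation built from asynchronous writes.

    dirty_pages: dict of page number -> page contents (hypothetical layout).
    write_page:  callable(page_no, data) that performs one page write.
    Returns the number of pages written.
    """
    if not dirty_pages:
        return 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start of the buffer flush: issue all writes without waiting.
        futures = [pool.submit(write_page, page_no, data)
                   for page_no, data in dirty_pages.items()]
        # The commit sequence continues only after every write completes.
        wait(futures)
        for f in futures:
            f.result()  # re-raise any error from an individual write
    return len(futures)


# Usage: a dict stands in for the disk.
disk = {}
written = flush_dirty_pages({1: b"a", 2: b"b", 3: b"c"}, disk.__setitem__)
```

Because the writes overlap, the elapsed time of the flush is governed
mostly by the slowest write rather than by the sum of all writes, which is
consistent with the levelling-off described in the mail.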
ASYNCHRONOUS WRITE
- Data records are written when it is convenient for
the database engine, possibly at commit time.
o Explained above
DEFERRED WRITE
- Database (i.e. data records) are written to disk
asynchronously from application processing. This
means that data records need not be written to the
database at commit time and implies that a "log
I/O" protocol is followed. During recovery this
means that uncommitted transactions need only be
removed from the recovery log and that committed
transactions which have not been applied to the
database are then applied.
o I wouldn't have used this terminology, so maybe
this is where some confusion arose. I would
describe the above scenario as an undo/redo
recovery scheme with some type of checkpoint
interval. The checkpoint interval would define the
interval of the write operation that synchronizes
both the undo/redo log operations and the committed
data. In the current release of Rdb/VMS in the
field we do not have this recovery scenario.
If I used the term deferred write I would be
referring to our cache scheme of deferring
the write to commit time, using the I/O
scheme described above.
GROUP COMMIT
- Log I/O operations are minimized by grouping
together log data of multiple committed transactions
and writing the data to the recovery log in a single
I/O operation. The number of commits handled in this
way is usually dependent on the number of commits
ready for processing within some small time window.
This circumvents the need for sequential processing
for the individual commits, but does not
significantly delay commit processing if multiple
commits are not ready.
o This has been implemented in Rdb/VMS since V2.3
and has been improved over the years. Yes, this is
implemented as you describe. The only difference is
that we commonly call this our group write
capability rather than group commit capability.
The reason is that the grouping of writes to the
after image log is always done; the COMMIT point
is just a special case. The two cases are:
1. A user writes a number of log records (as he
updates data records) to an after image log
buffer. When the buffer becomes full (dependent
on the size of the data records and the number
being updated) the buffer needs to be flushed.
This is a rare event.
2. The commit point is reached and the after image
log record buffer is flushed.
At either point the group write mechanism is
used.
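The two flush cases can be sketched with a toy log, shown below. This is
only an illustration of the grouping idea, not the Rdb/VMS mechanism: the
class `GroupWriteLog`, its `buffer_limit` parameter and the record tuples
are all hypothetical.

```python
class GroupWriteLog:
    """Toy after-image log that groups pending records into one write."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # hypothetical buffer size, in records
        self.pending = []                 # records not yet on "disk"
        self.disk = []                    # records already written
        self.io_count = 0                 # number of simulated log I/Os

    def append(self, record):
        """Case 1: an update adds a log record; flush if the buffer is full."""
        self.pending.append(record)
        if len(self.pending) >= self.buffer_limit:
            self.flush()

    def commit(self, txn_id):
        """Case 2: queue a commit record; the flush is issued separately so
        that several committers can share it."""
        self.pending.append(("COMMIT", txn_id))

    def flush(self):
        """One I/O writes everything queued so far, for every user."""
        if self.pending:
            self.disk.extend(self.pending)
            self.pending.clear()
            self.io_count += 1


# Three users reach their commit point in the same small time window.
log = GroupWriteLog(buffer_limit=100)
for txn in (1, 2, 3):
    log.append(("UPDATE", txn))
    log.commit(txn)
log.flush()  # a single log I/O covers all three commits
```

Here six log records from three transactions reach the log in one
simulated I/O; with a small `buffer_limit` the same class also shows the
rarer buffer-full flush.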
SYNCH POINT
- A synch point is specific to a particular user's
transaction space. All database and log buffers are
flushed to their appropriate destination.
o I would use this description as our commit point
in terms of recovery log behavior. However, I would
also equate a synch point to the start of a
transaction. This is what sets up the
synchronization of a database user in terms of the
database environment. The most important part of
this is the assignment of our internal TSN number.
This is used by our recovery mechanisms for
synchronizing transactions (applying after image
logs to a database restored from a backup),
automatic space reclamation and our snapshot
versioning mechanism. Over the past few years
we have optimized this mechanism in a number of
different ways.
The first method is what we call "pre-start" of
a transaction. This happens at the commit point
of a writer within the database. When the I/O
is done to the root file to "mark" the user as
committed we optimistically believe that his
next operation will be the start of another write
operation. Thus at that time we 'pre-allocate'
the next TSN number for the user. So at the start
of the transaction the synchronous I/O that would
be needed to the root file (the coordinating
'accounting file' in our database environment) is
saved.
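A minimal sketch of the pre-start idea follows, assuming a toy root file
that simply counts its own I/Os. The classes `RootFile` and `User` and
their methods are invented for illustration and do not reflect Rdb/VMS
internals.

```python
class RootFile:
    """Toy root file that hands out TSNs and counts its own I/Os."""

    def __init__(self):
        self.next_tsn = 1
        self.io_count = 0

    def allocate_tsn(self):
        """One root-file I/O that returns a fresh TSN."""
        self.io_count += 1
        tsn = self.next_tsn
        self.next_tsn += 1
        return tsn

    def mark_committed(self, user):
        """One root-file I/O that marks the user committed and, in the
        same I/O, pre-allocates the TSN for the next transaction."""
        self.io_count += 1
        user.current_tsn = None
        user.prestarted_tsn = self.next_tsn
        self.next_tsn += 1


class User:
    def __init__(self):
        self.current_tsn = None
        self.prestarted_tsn = None

    def start(self, root):
        """Start a transaction, reusing a pre-allocated TSN when present."""
        if self.prestarted_tsn is not None:
            self.current_tsn, self.prestarted_tsn = self.prestarted_tsn, None
        else:
            self.current_tsn = root.allocate_tsn()
        return self.current_tsn

    def commit(self, root):
        root.mark_committed(self)


root = RootFile()
user = User()
first = user.start(root)    # needs a root-file I/O
user.commit(root)           # commit I/O also pre-starts the next transaction
second = user.start(root)   # no root-file I/O this time
```

In this sketch two transactions are started and one is committed using
two root-file I/Os instead of three.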
The last point I'd like to make is that Rdb/VMS
has an 'optimistic commit strategy'. Traditionally
the I/O to the root file would be defined as the
'commit point', or synch point in this context,
'commit point' being defined as the point in the
transaction past which a 'rollback' would not
happen. The optimistic commit strategy is that the
AIJ record is written to the log before the synch
point (the I/O to the root file). If there is a
crash before the root I/O, recovery scans backwards
through the AIJ looking for the commit record for
that transaction. If it finds it, it will commit
the transaction; if not, it will roll it back. This
way the wait time of the I/Os in the commit
operation is streamlined.
All of this (write to the root) is done with our
'group commit' optimization (vs. the group write
to the AIJ file described above...but using the
same concept).
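The recovery decision described for the optimistic commit strategy can
be sketched as a backward scan. The record layout (a list of
`(record_type, tsn)` tuples) and the function name are assumptions made
for this example only.

```python
def resolve_after_crash(aij_records, tsn):
    """Decide the fate of a transaction whose root-file I/O never happened.

    aij_records: list of (record_type, tsn) tuples, oldest first
                 (hypothetical layout).
    Returns "commit" if a commit record for tsn is found, else "rollback".
    """
    # Scan backwards: the commit record, if any, is near the end of the log.
    for record_type, record_tsn in reversed(aij_records):
        if record_tsn == tsn and record_type == "COMMIT":
            return "commit"
    return "rollback"


aij = [("UPDATE", 7), ("COMMIT", 7), ("UPDATE", 8)]
fate_of_7 = resolve_after_crash(aij, 7)
fate_of_8 = resolve_after_crash(aij, 8)
```

Transaction 7 has its commit record in the log and is committed;
transaction 8 does not and is rolled back.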
SYSTEM CHECKPOINT
- At a system checkpoint (possibly triggered on an
interval, when buffers are full, after N commits,
at operating system or TP monitor request, etc.),
all database and log buffers are flushed to their
appropriate destination.
o In the current release of Rdb/VMS in the field we
do not have this checkpointing mechanism.
QUESTIONS:
1. Which of the following does Rdb/VMS support?
* Described above.
2. What processing takes place at commit time?
* At commit time the order of writing is to the
RUJ file (before images of data changes), the
data area files (data files), the AIJ file (after
images of changes) and the root file (synch
point). The RUJ file write is synchronous; the
data write is a synchronous operation that uses
asynchronous I/O as described above; the AIJ
I/O is synchronous using the group write
capability described above; and the root file I/O
is synchronous using the group commit capability
described above. The start of a new transaction
does not require an I/O, because of the pre-start
transaction capability described above. What
this translates to is that the journal and root
file I/O is a "fractional I/O" during a user's
individual transaction...our measured rate is down
to .1 (10 users doing 1 TPS will have one I/O).
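The "fractional I/O" figure is simple arithmetic, spelled out below. The
function name is hypothetical, and the reading that ten transactions
share a single grouped I/O is an interpretation of the parenthetical
above.

```python
def io_per_transaction(transactions, ios):
    """Average number of journal/root I/Os charged to each transaction."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return ios / transactions


# 10 users at 1 TPS each commit 10 transactions that share one grouped I/O.
rate = io_per_transaction(transactions=10 * 1, ios=1)
```

With these inputs `rate` is 0.1, matching the ".1" quoted in the mail.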
3. What processing takes place at recovery time?
* There are two types of transaction recovery.
One is that a user types in ROLLBACK (or in VMS
he exits his image 'normally'...which will invoke
image exit handlers that call ROLLBACK for the
user). In this situation the user will re-apply
before images of updates (from the RUJ file) to
database pages that were updated, write a rollback
record to the AIJ file and terminate his
transaction in the root file. The other situation
is an abnormal termination (a node leaving the
cluster, or a user that opens the database for the
first time after a complete system crash, is just a
special case of this processing). The situation is
detected by the monitor process (really this is the
only major job this process plays in the system)
and a DBR (database recovery) process is created.
This process inherits the context of the terminated
user from the root file and rolls back the
transaction as described above, if any database
pages that were written to have been flushed to
disk. The important item to note is that we do not
need any 'corresponding log records' for a
transaction to roll back, which I know other
systems do...this results in a reduced number of
log records and faster system and media recovery
scenarios.
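A minimal sketch of the rollback path, assuming a toy page store and a
per-user RUJ held as a list of `(page_no, before_image)` pairs in the
order the updates were made. The function and its arguments are
hypothetical and only mirror the three steps named above.

```python
def rollback(pages, ruj, aij, tsn):
    """Undo one transaction by re-applying its before images.

    pages: dict of page number -> current contents (toy database).
    ruj:   list of (page_no, before_image), oldest update first.
    aij:   list that receives the rollback record.
    """
    # Apply newest-first so that the oldest before image is the one left.
    for page_no, before_image in reversed(ruj):
        pages[page_no] = before_image
    ruj.clear()                    # the undo journal is no longer needed
    aij.append(("ROLLBACK", tsn))  # rollback record in the after-image log


pages = {1: "new1", 2: "new2"}
ruj = [(1, "old1"), (2, "old2"), (1, "mid1")]
aij = []
rollback(pages, ruj, aij, tsn=5)
```

After the call both pages hold their original contents, and the only log
record written on behalf of the rollback is the rollback record itself.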
4. How can a DBA control the amount of time it takes to
recover a database after a system crash?
* This can not currently be controlled. We view
this as really an option that goes hand in hand
with what you refer to as system synch points,
and it will be implemented when the need arises.
5. What processing takes place when a database is
rolled forward?
* First the database is restored from a backup.
The user then uses the RMU/RECOVER command, which
opens the AIJ file and sequentially reads the
file. It will buffer log records for a
transaction; if the transaction is rolled back it
will discard the records buffered. If the
transaction is committed it is applied to the
database.
Rdb/VMS has a recover/rollforward capability on
a per area basis. Say one area is corrupt. The
area is deleted, and that one area is restored from
a backup file (you can back up only one area if
you want, or the restore facility will selectively
get one area out of a complete backup file).
Then apply the AIJ file to that one area...this
first gathers context out of the root file to
determine the last committed transaction within
the database environment and then applies updates
to the area being restored until that commit
point is reached.
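The sequential pass over the AIJ can be sketched as follows. The record
layout `(kind, tsn, key, value)` and the function name are assumptions
made for illustration; this is not the RMU/RECOVER implementation.

```python
def roll_forward(database, aij_records):
    """Apply committed transactions from an after-image log, in log order.

    database:    dict restored from backup (toy database).
    aij_records: iterable of (kind, tsn, key, value) tuples, where kind is
                 "UPDATE", "COMMIT" or "ROLLBACK" (hypothetical layout).
    """
    pending = {}  # tsn -> buffered updates for a still-open transaction
    for kind, tsn, key, value in aij_records:
        if kind == "UPDATE":
            pending.setdefault(tsn, []).append((key, value))
        elif kind == "ROLLBACK":
            pending.pop(tsn, None)            # discard the buffered records
        elif kind == "COMMIT":
            for k, v in pending.pop(tsn, []):
                database[k] = v               # apply the committed updates
    # Updates of transactions with no commit record are never applied.
    return database


restored = {"a": 0}
aij = [
    ("UPDATE", 1, "a", 1),
    ("UPDATE", 2, "b", 9),
    ("ROLLBACK", 2, None, None),
    ("COMMIT", 1, None, None),
    ("UPDATE", 3, "c", 5),   # transaction 3 never committed
]
result = roll_forward(restored, aij)
```

Only transaction 1 changes the restored data; the rolled-back and the
unfinished transactions leave no trace.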
6. Is the AIJ optional? If so, what are the penalties
for not using it (i.e., what can't I do)?
* Yes, it is totally optional. There are no
penalties in the current recovery schemes if it is
disabled. An undo/redo scheme, such as the one
mentioned above, would find this log mandatory.
Oh, there is one restriction: you couldn't
roll forward the database!
|