| >If I have a table T1 with fields F1, F2, F3 .. Fn, define a transfer of
>type REPLICATION on this table BUT only on fields F1 and F2, and then perform
>the SQL command UPDATE T1 SET F3=..., I observe that:
>
>-the info about the UPDATE is stored in RDB$CHANGES. OK;
> I think the overhead of filtering the UPDATE there would be too much.
>
>-at execution time, a transaction is applied to the target database.
> WHY? Field F3 is not even present in the target table on the target database!
> In many cases this can generate many transactions that are not necessary
> to keep the target DB aligned!
Giovanni, the reason has to do with how error recovery is handled for
replication update transfers. Let me develop a simplified example.
Here is a list of updates recorded to the RDB$CHANGES table. Each update can
consist of one or more inserts, updates and deletes to the same or different
tables, all performed in a single transaction.
RDB$CHANGES
-----------
update (transaction) 942
update (transaction) 943
update (transaction) 944
...
When ROR (the Replication Option for Rdb) processes the RDB$CHANGES table, it
sorts the rows by transaction, in the order the transactions were committed.
For each such source transaction, a transaction is started and committed
in the target database. In some cases, the changes made in the source database
are relevant to the transfer and result in changes applied to the target tables.
In other cases the changes do not apply to the target tables and therefore are
not made.
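In outline, that processing amounts to something like the following sketch
(Python, purely to illustrate; names like run_transfer and TargetDatabase are
invented, and ROR's actual internals are of course nothing this simple):

    from dataclasses import dataclass, field

    @dataclass
    class SourceTransaction:
        tsn: int                                  # source commit order
        changes: list = field(default_factory=list)

    class TargetDatabase:
        """Stand-in target: just records what gets applied."""
        def __init__(self):
            self.applied, self.pending = [], []
        def begin(self):
            self.pending = []
        def apply(self, change):
            self.pending.append(change)
        def commit(self):
            self.applied.extend(self.pending)

    def run_transfer(source_changes, target, relevant_tables):
        # One target transaction per source transaction, in commit order.
        for txn in sorted(source_changes, key=lambda t: t.tsn):
            target.begin()
            for change in txn.changes:
                if change["table"] in relevant_tables:
                    target.apply(change)          # relevant: apply it
            target.commit()                       # commits even if empty

    target = TargetDatabase()
    run_transfer([SourceTransaction(943, [{"table": "T2"}]),
                  SourceTransaction(942, [{"table": "T1"}])],
                 target, relevant_tables={"T1"})
    print(target.applied)                         # [{'table': 'T1'}]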
Let's assume that transaction 942 is applied to the target successfully, but
that during the transfer of transaction 943, the source machine crashes. When
the machine is rebooted, ROR restarted, and the transfer rerun, how are we to
know that transaction 942 was committed to the target database and that we
should restart with 943?
1) One way is to erase transaction 942 from RDB$CHANGES once it has been
committed in the target database. But we can't do that because some
other transfer might need to transfer the same transaction to a different
target database.
2) Another way is to record how far the transfer got, so we know where to
   continue. That is in fact what ROR does. The information could be
   written into the RDB$TRANSFERS table in the source database, but we
   don't do that because, with multiple transfers executing in parallel,
   the RDB$TRANSFERS table would be a hot spot and a source of constant
   lock contention problems.
3) Instead, ROR writes the transaction number into the RDB$VINTAGE table in
   the target database (the DDAL$TRANSFER_INFO table if the target is other
   than Oracle Rdb). Whenever ROR needs to start a new execution, it first
   reads the information from this target table to know where to continue
   reading in the RDB$CHANGES table.
Even when the target tables are not modified, RDB$VINTAGE (DDAL$TRANSFER_INFO)
is updated to note the most recent source transaction that has been processed.
If we did not do that, the next time the transfer is rerun ROR might have to
reread transactions it had already finished with the last time.
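Putting the restart question and the checkpoint together, the bookkeeping is
roughly this sketch (Python again, all names invented; target.checkpoint and
write_checkpoint stand in for the reads and writes of RDB$VINTAGE /
DDAL$TRANSFER_INFO, which this sketch only approximates):

    from dataclasses import dataclass, field

    @dataclass
    class SourceTransaction:                      # as in the earlier sketch
        tsn: int
        changes: list = field(default_factory=list)

    class Target:
        """Stand-in for the target database plus its RDB$VINTAGE row."""
        def __init__(self, checkpoint=0):
            self.checkpoint = checkpoint          # last source txn done here
            self.rows = []
        def begin(self):
            self._rows, self._ckpt = [], self.checkpoint
        def apply(self, change):
            self._rows.append(change)
        def write_checkpoint(self, tsn):
            self._ckpt = tsn
        def commit(self):                         # rows + checkpoint together
            self.rows.extend(self._rows)
            self.checkpoint = self._ckpt

    def run_transfer(source_changes, target, relevant_tables):
        last_done = target.checkpoint             # e.g. 942 after the crash
        for txn in sorted(source_changes, key=lambda t: t.tsn):
            if txn.tsn <= last_done:
                continue                          # already in the target
            target.begin()
            for change in txn.changes:
                if change["table"] in relevant_tables:
                    target.apply(change)
            # Advance the checkpoint even when no target table changed,
            # so a rerun never rereads transactions already finished.
            target.write_checkpoint(txn.tsn)
            target.commit()

    target = Target(checkpoint=942)               # 942 survived the crash
    run_transfer([SourceTransaction(n) for n in (942, 943, 944)],
                 target, relevant_tables={"T1"})
    print(target.checkpoint)                      # 944: restart point moved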
In the case you mention, we are not smart enough to realize that there was no
change in columns F1 and F2 and that therefore there's no need to modify the
target table. We go ahead and modify the target row even though there's no
change. We could be smarter and compare the before and after source row
images to see if there was a change. My guess is that in some cases this
could improve performance and in other cases it could make it worse. How might
it be worse? Let's assume you have tables with large segmented strings (varbyte
lists). Let's assume that these strings are frequently changed. Checking for
a change only to find out that there was one adds to the overall execution time.
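For what it's worth, here is a minimal sketch of that comparison, assuming the
before and after row images are available as column/value mappings (the helper
name and the F1..F3 values are made up):

    def relevant_change(before, after, transferred_columns):
        """True if a column this transfer replicates actually changed."""
        return any(before[c] != after[c] for c in transferred_columns)

    before = {"F1": 10, "F2": "abc", "F3": "old"}
    after  = {"F1": 10, "F2": "abc", "F3": "new"}  # UPDATE T1 SET F3=...

    if relevant_change(before, after, ("F1", "F2")):
        print("modify the target row")
    else:
        print("skip the row; only advance the checkpoint")

The worry is exactly the case above: on large varbyte lists that usually do
change, relevant_change pays for a full compare and then sends the update
anyway.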
Maybe this is an area that should be under customer control. That is, perhaps
customers can analyze the way their data change and decide, on a table-by-table
or perhaps transfer-by-transfer basis, whether or not such change checking
ought to be done. If a good case can be made for doing this, we could add it
to the product wish list as a future enhancement request. Where would the
comparisons be done: in Rdb when the changes are captured in RDB$CHANGES, or in
ROR when it is time to transfer the changes to the target tables? If done in
Rdb, it would be possible to reduce the amount of information written into
RDB$CHANGES. However, it would also add extra execution time. You probably
would not want to do so.
|
| hi all,
RE .1
>We could be smarter and compare the before and after source row
>images to see if there was a change.
Why is it necessary to compare the data?
Consider only the UPDATE statements.
For example:
on the source db:
UPDATE T1 SET F1=x, F3=y WHERE condition1 ...
What happens in RDB$CHANGES? One or more new rows, for sure; but what does
the encoded BLR mean? Is it
UPDATE T1 SET F1=x, F2=old_value, F3=y, F4=old_value, Fn=old_value
WHERE ROR_PRIMARY_KEY (I think) = list_ror_dbkeys
or
UPDATE T1 SET F1=x, F3=y
WHERE ROR_PRIMARY_KEY (I think) = list_ror_dbkeys
My guess is that the second way is better, for two reasons:
1) at transfer execution time, by comparing the definition of the
   transfer against the list of fields in the SET clause, it is possible
   to decide whether the statement is relevant for the target table in the
   target database. If the row is relevant then send it; otherwise, but
   optionally (I prefer this), update only RDB$VINTAGE on the target
   database (see the sketch after this list)
2) RDB$CHANGES grows less
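Something like this sketch of the check in point 1 (Python just to
illustrate; all names are invented, and whether the journalled BLR really
carries only the SET-clause column list is exactly my question):

    def statement_is_relevant(set_columns, transfer_columns):
        """Relevant iff the UPDATE touches a replicated column."""
        return bool(set(set_columns) & set(transfer_columns))

    transfer_columns = {"F1", "F2"}     # this transfer replicates F1, F2
    # UPDATE T1 SET F1=x, F3=y ...  -> touches F1, so send it
    print(statement_is_relevant({"F1", "F3"}, transfer_columns))   # True
    # UPDATE T1 SET F3=...          -> touches nothing replicated
    print(statement_is_relevant({"F3"}, transfer_columns))         # False

Note this is a pure metadata test against the transfer definition; no
before/after data comparison is needed.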
>If a good case can be made for doing this, we could add this to
>the product wish list as a future enhancement request.
WONDERFUL!!!! :-))))
Our customer has 10 transfers from one source db to one target db.
There are 32 tables. Four or five tables have 1~2M rows. Every two months,
60~80% of these rows are updated on a field not present in the target table.
The transfers are executed every night. Is that case sufficient?
>Where would the comparisons be done: in Rdb when the changes are captured
>in RDB$CHANGES, or in ROR when it is time to transfer the changes to the
>target tables?
When ROR analyzes the RDB$CHANGES. ABSOLUTELY!
RE .2
>Therefore, only the UPDATE statements which affected those columns
>would be journalled in RDB$CHANGES.
BE CAREFUL!
The overhead of ROR in the production system is 0.47%; this is a COMPUTED
value, not an estimate. :-)
We have defined a performance evaluation environment with:
-HW like the production system;
-source database for performance = the production db (RMU/RESTORE takes 4h
 each time!)
-many DTM (DEC Test Manager, I think) scripts that simulate operator
 activity (R/O & R/W Txns), submitted in 60 batch queues running concurrently
-over 12k new customers added to the source database (R/W Txns involving 4/5
 tables, one transaction = one new customer)
-the duration of the test is one and a half hours
I have run this configuration & workload twice, the first time without ROR
active, the second time with ROR active and the transfer defined (after the
initial replication).
The delta time for inserting one new customer into the source database
is +0.47%; all other activities have the same elapsed time.
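(The figure is just the relative delta of the two measured elapsed times; the
absolute numbers in this little example are made up, only the ratio is real:)

    t_base = 100.00   # elapsed per new customer, ROR inactive (made-up number)
    t_ror  = 100.47   # elapsed per new customer, ROR active   (made-up number)
    print(f"overhead = {(t_ror - t_base) / t_base:+.2%}")   # overhead = +0.47%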
This INSIGNIFICANT overhead is the "KEY" to applying ROR to this kind of
problem (replication of data), I think. At night, after the full backup
of the source database, the transfers are executed.
I prefer that you introduce overhead into the night activities.
ciao
Giovanni
|