| >If I have a table T1 with fields F1, F2, F3 .. Fn, define a transfer of
>type REPLICATION on this table BUT only on fields F1 and F2, and then perform
>the SQL command UPDATE T1 SET F3=..., I observe that:
>
>-the info about the UPDATE is stored in RDB$CHANGES. OK;
> I think the overhead of filtering the UPDATE there would be too much.
>
>-at execution time, a transaction is applied to the target database.
> WHY? Field F3 is not even present in the target table on the target database!
> In many cases this can generate many transactions that are not necessary
> to keep the target DB aligned!
Giovanni, the reason has to do with how error recovery is handled for
replication update transfers. Let me develop a simplified example.
Here is a list of updates recorded to the RDB$CHANGES table. Each update can
consist of one or more inserts, updates and deletes to the same or different
tables, all performed in a single transaction.
RDB$CHANGES
-----------
update (transaction) 942
update (transaction) 943
update (transaction) 944
...
When ROR (the Replication Option for Rdb) processes the RDB$CHANGES table, it
sorts the rows by transaction, in the order the transactions were committed.
For each such source transaction, a transaction is started and committed
in the target database. In some cases, the changes made in the source database
are relevant to the transfer and result in changes applied to the target tables.
In other cases the changes do not apply to the target tables and therefore are
not made.
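In outline, that processing amounts to something like the following sketch
(Python, purely to illustrate; names like run_transfer and TargetDatabase are
invented, and ROR's actual internals are of course nothing this simple):

    from dataclasses import dataclass, field

    @dataclass
    class SourceTransaction:
        tsn: int                                  # source commit order
        changes: list = field(default_factory=list)

    class TargetDatabase:
        """Stand-in target: just records what gets applied."""
        def __init__(self):
            self.applied, self.pending = [], []
        def begin(self):
            self.pending = []
        def apply(self, change):
            self.pending.append(change)
        def commit(self):
            self.applied.extend(self.pending)

    def run_transfer(source_changes, target, relevant_tables):
        # One target transaction per source transaction, in commit order.
        for txn in sorted(source_changes, key=lambda t: t.tsn):
            target.begin()
            for change in txn.changes:
                if change["table"] in relevant_tables:
                    target.apply(change)          # relevant: apply it
            target.commit()                       # commits even if empty

    target = TargetDatabase()
    run_transfer([SourceTransaction(943, [{"table": "T2"}]),
                  SourceTransaction(942, [{"table": "T1"}])],
                 target, relevant_tables={"T1"})
    print(target.applied)                         # [{'table': 'T1'}]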
Let's assume that transaction 942 is applied to the target successfully, but
that during the transfer of transaction 943, the source machine crashes. When
the machine is rebooted, ROR restarted, and the transfer rerun, how are we to
know that transaction 942 was committed to the target database and that we
should restart with 943?
1) One way is to erase transaction 942 from RDB$CHANGES once it has been
committed in the target database. But we can't do that because some
other transfer might need to transfer the same transaction to a different
target database.
2) Another way is to record how far the transfer got, so we know where to
   continue. That is in fact what ROR does. The information could be
   written into the RDB$TRANSFERS table in the source database, but we
   don't do that because, with multiple transfers executing in parallel,
   the RDB$TRANSFERS table would be a hot spot and a source of constant
   lock contention problems.
3) Instead, ROR writes the transaction number into the RDB$VINTAGE table in
   the target database (the DDAL$TRANSFER_INFO table if the target is other
   than Oracle Rdb). Whenever ROR needs to start a new execution, it first
   reads the information from this target table to know where to continue
   reading in the RDB$CHANGES table.
Even when the target tables are not modified, RDB$VINTAGE (DDAL$TRANSFER_INFO)
is updated to note the most recent source transaction that has been processed.
If we did not do that, the next time the transfer is rerun ROR might have to
reread transactions it had already finished with the last time.
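Putting the restart question and the checkpoint together, the bookkeeping is
roughly this sketch (Python again, all names invented; target.checkpoint and
write_checkpoint stand in for the reads and writes of RDB$VINTAGE /
DDAL$TRANSFER_INFO, which this sketch only approximates):

    from dataclasses import dataclass, field

    @dataclass
    class SourceTransaction:                      # as in the earlier sketch
        tsn: int
        changes: list = field(default_factory=list)

    class Target:
        """Stand-in for the target database plus its RDB$VINTAGE row."""
        def __init__(self, checkpoint=0):
            self.checkpoint = checkpoint          # last source txn done here
            self.rows = []
        def begin(self):
            self._rows, self._ckpt = [], self.checkpoint
        def apply(self, change):
            self._rows.append(change)
        def write_checkpoint(self, tsn):
            self._ckpt = tsn
        def commit(self):                         # rows + checkpoint together
            self.rows.extend(self._rows)
            self.checkpoint = self._ckpt

    def run_transfer(source_changes, target, relevant_tables):
        last_done = target.checkpoint             # e.g. 942 after the crash
        for txn in sorted(source_changes, key=lambda t: t.tsn):
            if txn.tsn <= last_done:
                continue                          # already in the target
            target.begin()
            for change in txn.changes:
                if change["table"] in relevant_tables:
                    target.apply(change)
            # Advance the checkpoint even when no target table changed,
            # so a rerun never rereads transactions already finished.
            target.write_checkpoint(txn.tsn)
            target.commit()

    target = Target(checkpoint=942)               # 942 survived the crash
    run_transfer([SourceTransaction(n) for n in (942, 943, 944)],
                 target, relevant_tables={"T1"})
    print(target.checkpoint)                      # 944: restart point moved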
In the case you mention, we are not smart enough to realize that there was no
change in columns F1 and F2 and that therefore there's no need to modify the
target table. We go ahead and modify the target row even though there's no
change. We could be smarter and compare the before and after source row
images to see if there was a change. My guess is that in some cases this
could improve performance and in other cases it could make it worse. How might
it be worse? Let's assume you have tables with large segmented strings (varbyte
lists). Let's assume that these strings are frequently changed. Checking for
a change only to find out that there was one adds to the overall execution time.
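For what it's worth, here is a minimal sketch of that comparison, assuming the
before and after row images are available as column/value mappings (the helper
name and the F1..F3 values are made up):

    def relevant_change(before, after, transferred_columns):
        """True if a column this transfer replicates actually changed."""
        return any(before[c] != after[c] for c in transferred_columns)

    before = {"F1": 10, "F2": "abc", "F3": "old"}
    after  = {"F1": 10, "F2": "abc", "F3": "new"}  # UPDATE T1 SET F3=...

    if relevant_change(before, after, ("F1", "F2")):
        print("modify the target row")
    else:
        print("skip the row; only advance the checkpoint")

The worry is exactly the case above: on large varbyte lists that usually do
change, relevant_change pays for a full compare and then sends the update
anyway.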
Maybe this is an area that should be under customer control. That is, perhaps
customers can analyze the way their data change and decide, on a table-by-table
or perhaps transfer-by-transfer basis, whether or not such change checking
ought to be done. If a good case can be made for doing this, we could add it
to the product wish list as a future enhancement request. Where would the
comparisons be done: in Rdb when the changes are captured in RDB$CHANGES, or in
ROR when it is time to transfer the changes to the target tables? If done in
Rdb, it would be possible to reduce the amount of information written into
RDB$CHANGES. However, it would also add extra execution time. You probably
would not want to do so.
|
| hi all,
RE .1
>We could be smarter and compare the before and after source row
>images to see if there was a change.
Why is it necessary to compare the data?
Consider only the UPDATE statements.
For example:
on the source db:
UPDATE T1 SET F1=x, F3=y WHERE condition1 ...
What happens in RDB$CHANGES? One or more new rows, for sure; but what does
the encoded BLR mean? Is it
UPDATE T1 SET F1=x, F2=old_value, F3=y, F4=old_value, Fn=old_value
WHERE ROR_PRIMARY_KEY (I think) = list_ror_dbkeys
or
UPDATE T1 SET F1=x, F3=y
WHERE ROR_PRIMARY_KEY (I think) = list_ror_dbkeys
My guess is that the second way is better, for two reasons:
1) at transfer execution time, by comparing the definition of the
   transfer against the list of fields in the SET clause, it is possible
   to decide whether the statement is relevant for the target table in the
   target database. If the row is relevant then send it; otherwise, but
   optionally (I prefer this), update only RDB$VINTAGE on the target
   database (see the sketch after this list)
2) RDB$CHANGES grows less
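Something like this sketch of the check in point 1 (Python just to
illustrate; all names are invented, and whether the journalled BLR really
carries only the SET-clause column list is exactly my question):

    def statement_is_relevant(set_columns, transfer_columns):
        """Relevant iff the UPDATE touches a replicated column."""
        return bool(set(set_columns) & set(transfer_columns))

    transfer_columns = {"F1", "F2"}     # this transfer replicates F1, F2
    # UPDATE T1 SET F1=x, F3=y ...  -> touches F1, so send it
    print(statement_is_relevant({"F1", "F3"}, transfer_columns))   # True
    # UPDATE T1 SET F3=...          -> touches nothing replicated
    print(statement_is_relevant({"F3"}, transfer_columns))         # False

Note this is a pure metadata test against the transfer definition; no
before/after data comparison is needed.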
>If a good case can be made for doing this, we could add this to
>the product wish list as a future enhancement request.
WONDERFUL!!!! :-))))
Our customer has 10 transfers from one source db to one target db.
There are 32 tables. Four or five tables have 1~2M rows. Every two months,
60~80% of these rows are updated on a field not present in the target table.
The transfers are executed every night. Is that case sufficient?
>Where would the comparisons be done: in Rdb when the changes are captured
>in RDB$CHANGES, or in ROR when it is time to transfer the changes to the
>target tables?
When ROR analyzes the RDB$CHANGES. ABSOLUTELY!
RE .2
>Therefore, only the UPDATE statements which affected those columns
>would be journalled in RDB$CHANGES.
BE CAREFUL!
The overhead of ROR in the production system is 0.47%; this is a COMPUTED
value, not an estimate. :-)
We have defined a performance evaluation environment with:
-HW like the production system;
-source database for performance = the production db (RMU/RESTORE takes 4h
 each time!)
-many DTM (DEC Test Manager, I think) scripts that simulate operator
 activity (R/O & R/W Txns), submitted in 60 batch queues running concurrently
-over 12k new customers added to the source database (R/W Txns involving 4/5
 tables, one transaction = one new customer)
-the duration of the test is one and a half hours
I have run this configuration & workload twice, the first time without ROR
active, the second time with ROR active and the transfer defined (after the
initial replication).
The delta time for inserting one new customer into the source database
is +0.47%; all other activities have the same elapsed time.
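(The figure is just the relative delta of the two measured elapsed times; the
absolute numbers in this little example are made up, only the ratio is real:)

    t_base = 100.00   # elapsed per new customer, ROR inactive (made-up number)
    t_ror  = 100.47   # elapsed per new customer, ROR active   (made-up number)
    print(f"overhead = {(t_ror - t_base) / t_base:+.2%}")   # overhead = +0.47%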
This INSIGNIFICANT overhead is the "KEY" to applying ROR to this kind of
problem (replication of data), I think. At night, after the full backup
of the source database, the transfers are executed.
I prefer that you introduce overhead into the night activities.
ciao
Giovanni
|