[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference orarep::nomahs::dec_data_distributor

Title:	The Replication Option for Rdb
Notice:	Product renamed to Replication Option for Rdb
Moderator:	BROKE::PROTEAU

Created:	Wed Mar 02 1994
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	287
Total number of notes:	1231

271.0. "Duplicate TSER row in RDB$CHANGES" by ITVMS1::GTACCHIN (Giovanni Tacchini) Thu Mar 20 1997 07:35

hi,

we have a strange situation about RDB$CHANGES.

Starting point: INVLOGTYP error in a transfer .LOG

well,

         select
             C1.RDBVMS$VINTAGE_TRANSFER_NAME,
             C1.RDB$VINTAGE_TSER,
             C1.RDB$VINTAGE_TIMESTAMP
         from RDB$VINTAGE C1
         where (C1.RDBVMS$VINTAGE_TRANSFER_ID > 0)

produce...

 RDBVMS$VINTAGE_TRANSFER_NAME      RDB$VINTAGE_TSER   RDB$VINTAGE_TIMESTAMP     
 SICC_ENT_CORP                               382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_DIENST                             382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KONTO                              382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KON_ZDL                            382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KUNDE                              382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KUN_RECHADR                        382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_MISC                               382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_MUI_VERTRAG                        382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_RECHADR                            382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_TEILNEHMER                         382288   18-MAR-1997 13:38:03.24   

that means: all changes on the source with TSER <= 382288 are replicated 
on target database.

After some check and analisys I have discovered that in RDB$CHANGES exists
many duplicate entrys for TSER.

	select 	RDB$TRANSACTION_TSER, COUNT(*)
	from	RDB$CHANGES
	where	RDB$TRANSACTION_TSER > 0
	group 	by RDB$TRANSACTION_TSER

it can be verified that about all TSER have a count=1 but there are about 50/60
TSER with count=2. The actual RDB$CHANGES cardinality is 100k~.

The first TSER duplicate value is 382289.

By mean of SELECT DBKEY FROM RDB$CHANGES & RMU/DUMP/AREA/START...
I have verified lasted time of about 1 hours btw two row in RDB$CHANGES
with the same TSER.

I have asked to the customer "What happend in that range of time?"

1) Stop ACMS application,
2) drop index defined in RDB$CHANGES for a mismake presence
3) one process has became RWAST 
4) shutdwon/reboot of system

The machine is VAX 7000 member of a cluster with other 2x VAX7000
OS is VAX/VMS V6.1, Rdb V6.0-1.

I would like to know:

1) what does it meant "(<value>)" in:
	INVLOGTYP,     invalid logical change file record type (<value>) was
         	       found

2) the possible reason for presence of duplicate values of TSER
3) REINITIALIZE TRANSFER work fine with this scenario?
4) if answer #3 is YES can I delete rows from target table before
   the reinitialize command, this in order to minimize the elapsed time 
   for transfer execution?
5) after all transfer REINITIALIZE, what does it happend with the 
   RDB$CHANGES rows?

The system is a production enviroment and the time available to manage
the problem is limited to a window in the night (3 or 4 hours) or
sunday time :-(,
fortunately the update logged in RDB$CHANGES are not important :-)


thank in advance

ciao
GIovanni

T.R	Title	User	Personal Name	Date	Lines
271.1	Yeah!	ITVMS1::GTACCHIN	Giovanni Tacchini	`Fri Mar 21 1997 08:08`	14
	Yeah, yesterday I just came back from holyday and my head & heart were elsewhere... Now I'm only looking for the answer for this question: How is possible to have a duplicate TSER in RDB$CHANGES row? sorry, thank you ciao GIovanni
271.2	Some answers to your questions	BROKE::PROTEAU	Jean-Claude Proteau	`Sun Mar 23 1997 23:56`	90
	>After some check and analisys I have discovered that in RDB$CHANGES exists >many duplicate entrys for TSER. > > select RDB$TRANSACTION_TSER, COUNT(*) > from RDB$CHANGES > where RDB$TRANSACTION_TSER > 0 > group by RDB$TRANSACTION_TSER > >it can be verified that about all TSER have a count=1 but there are about 50/60 >TSER with count=2. The actual RDB$CHANGES cardinality is 100k~. This is most interesting. It is the first time that I have heard of this type of corruption. It doesn't mean that it hasn't happened before, only that it has not been reported with enough information to point out the existence of duplicate TSER values. Whenever corruption of RDB$CHANGES is evident, we would like to analyze some parts of the database to try to determine what happened. Would it be possible to get: The database root file The RDB$SYSTEM storage area inluding snapshot file The segmented string storage area and snapshot file if different from RDB$SYSTEM The TSER value from the target databases in the RDB$VINTAGE tables. This tells us the last good transaction and, by implication, the next transaction which shows a corruption. >The first TSER duplicate value is 382289. > >By mean of SELECT DBKEY FROM RDB$CHANGES & RMU/DUMP/AREA/START... >I have verified lasted time of about 1 hours btw two row in RDB$CHANGES >with the same TSER. > >I have asked to the customer "What happend in that range of time?" Is the list of 4 items below what the customer told you in answer to your question? >1) Stop ACMS application, >2) drop index defined in RDB$CHANGES for a mismake presence When the index was dropped, were there any other applications attached to the source database? You seem to imply yes in item 3. Is the database accessed by more than one node in a cluster? If so, were there any applica- tions running on the other nodes and attached to the database when the index was dropped? >3) one process has became RWAST >4) shutdwon/reboot of system > >The machine is VAX 7000 member of a cluster with other 2x VAX7000 >OS is VAX/VMS V6.1, Rdb V6.0-1. > >I would like to know: > >1) what does it meant "(<value>)" in: > INVLOGTYP, invalid logical change file record type (<value>) was > found The error message is explained in the DDAL$MSG.DOC file in the SYS$HELP directory. In this case, it means that the information in the RDB$CHANGES table is bad. It cannot say how the data corruption occurred. That's why we would like to get copies of some of the database files to see if we can determine what might have caused the problem. >2) the possible reason for presence of duplicate values of TSER At present, I've no idea how this might have happened. You are the first to have reported such an error. We've heard of INVLOGTYP before, but customers did not send us the database files to analyze. >3) REINITIALIZE TRANSFER work fine with this scenario? Yes. >4) if answer #3 is YES can I delete rows from target table before > the reinitialize command, this in order to minimize the elapsed time > for transfer execution? I presume the transfer defintion says TO EXISTING FILENAME ... In such case, yes, you can delete the rows in the target tables before restarting the trans- fer. >5) after all transfer REINITIALIZE, what does it happend with the > RDB$CHANGES rows? The rows will initially be skipped over, and eventually they will be erased from the table. Claude
271.3		itvms1.it.oracle.com::GTACCHIN	Giovanni Tacchini	`Tue Mar 25 1997 08:38`	123
	hi Claude, The situation now is normal and Ok. On friday (21-mar-1997) from 03:30am to 07:00 am all transfer are REINITIALIZED and after, saturday, sunday... all transfer are started on regular daily basis. Now the RDB$CHANGES are purged. Fortunately, during the analisys of corruption, I have performed all your suggestion (in note 272) plus a verify via dump of AIJ file that the data in RDB$CHANGES are generated via normal user application program (not UPDATE RDB$CHANGES...). My conclusion is that the TSER value, after "X" hour, is not good. That means: isn't updated (or missing or not flushed, I don't know) the location where TSER is stored wich last value used. This is only my IMPRESSION. At this point I have: 1-RDB$CHANGES purged 2-the root file is 2783 blk 3-the RDB$SYSTEM.RDA is 1804656 blk 4-the RDB$SYSTEM.SNP is 1741956 blk (is a MF database without user object in RDB$SYSTEM, RDB$CHANGES cardinality is approx 200k~250k row before the first REPLICATION UPDATE TRANSFER after the REINITIALIZE) 5-output of :SQL> sel DBKEY, RDB$TRANSACTION_TID, RDB$TRANSACTION_SEQUENCE, RDB$TRANSACTION_TSER from rdb$changes where RDB$TRANSACTION_TSER > 0 order by RDB$TRANSACTION_TSER; (omitted) 6-sel * from RDB$TRANSFERs; (source DB) RDB$TRANSFER_NAME RDB$VINTAGE_TSER SICC_ENT_CORP 378734 SICC_ENT_DIENST 378734 SICC_ENT_KONTO 378734 SICC_ENT_KON_ZDL 378734 SICC_ENT_KUNDE 378734 SICC_ENT_MISC 378948 SICC_ENT_RECHADR 378755 SICC_ENT_KUN_RECHADR 378734 SICC_ENT_MUI_VERTRAG 378734 SICC_ENT_TEILNEHMER 378734 10 rows selected 7- select C1.RDBVMS$VINTAGE_TRANSFER_NAME, (from a view defined in target DB) C1.RDB$VINTAGE_TSER, C1.RDBVMS$VINTAGE_TRANSFER_BUSY, C1.RDB$VINTAGE_TIMESTAMP from RDB$VINTAGE C1 where (C1.RDBVMS$VINTAGE_TRANSFER_ID > 0) NAME TSER BUSY TIMESTAMP SICC_ENT_CORP 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_DIENST 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_KONTO 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_KON_ZDL 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_KUNDE 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_KUN_RECHADR 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_MISC 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_MUI_VERTRAG 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_RECHADR 382288 F 18-MAR-1997 13:38:03.24 SICC_ENT_TEILNEHMER 382288 F 18-MAR-1997 13:38:03.24 8-sel RDB$TRANSACTION_TSER, count(*) from RDB$CHANGES group by RDB$TRANSACTION_TSER; (omitted) 9-a RMU/DUMP/AREA=RDB$SYSTEM/Start... (without /OPTION=DEBUG) of two page with same TSER (omitted) 10-two AIJ file involved in the "X" hour (+/- 500k blk each) (omitted) 11-RMU/DUMP/HEADER source_database (omitted) Not all info I post here for space reason. Mind the RDB$SYSTEM.(RDA SNP) dimension, is impossible to COPY. Eventually I can only SAVE it in other disk in order to analyze the corruption. >Is the list of 4 items below what the customer told you in answer to your >question? yes >When the index was dropped, were there any other applications attached to >the source database? RMU/DUMP/USER & RMU/SHOW USER report 0 user >You seem to imply yes in item 3. sorry, the process in RWAST is a CP ACMS process, not involved, not attached. >Is the database accessed by more than one node in a cluster? in normal condition yes, at the "drop index" time no. Sorry for the my stupid question in .0 (Q5 Q4 Q3 Q1 to be precise). I wrote these in order to obtain help from EC people at GMT time :-) before my personal test. A very good things for me a this point is acquire more knowledge and/or tools in order to analyze 1500k blks of RDB$SYSTEM (when and if a new corruption happen), for example: 1-exist a INTERNALS... for ROR? 2-where is stored the LAST TSER used in a source database... etc... Thank you, ciao GIovanni
271.4	A few answers	BROKE::PROTEAU	Jean-Claude Proteau	`Mon Apr 07 1997 10:56`	23
	Giovanni, > 1-exist a INTERNALS... for ROR? Sorry, no. There is a little information in one of the appendices of the ROR handbook (appendix C, I think). > 2-where is stored the LAST TSER used in a source database. While there are applications running, the last TSER used in the source database is not on disk, it is in virtual memory. But if you look at the database at a point where there are no other users attached, the way to find out is to search RDB$CHANGES for the maximum TSER value in the RDB$TRANSACTION_TSER column. ------------------- By the way, please read note 272. It explains what information we need in case such a problem recurs. Ciao, Claude
271.5	thank...	itvms1.it.oracle.com::GTACCHIN	Giovanni Tacchini	`Wed Apr 09 1997 13:06`	9
	welcome Claude! > By the way, please read note 272. It explains what information we need > in case such a problem recurs. done :-) see my previous reply... ciao, GIovanni
271.6	Time to enter a bug report	BROKE::PROTEAU	Jean-Claude Proteau	`Fri Apr 11 1997 10:10`	6
	Well, Giovanni, do you want to pursue this problem anymore? If so, submit a bug report and provide us if you can with the information you say you gathered. Claude
271.7		itvms1.it.oracle.com::GTACCHIN	Giovanni Tacchini	`Fri Apr 11 1997 12:30`	20
	hi, >Well, Giovanni, do you want to pursue this problem anymore? NO >If so, submit a bug report and provide us if you can with the information you >say you gathered. Since RDB$CHANGES is purged, RDB$SYSTEM.(RDA,SNP) are too big and dated TODAY and AIJ files are deleted I can't manage a BUG (I think, right?). In my note 271.3 you can see the list of info that I have. With exception of item #3 (RDB$SYSTEM.RDA), #4 (RDB$SYSTEM.SNP) and #10 I can send you by MAIL (or post here) all other info. The most important is prepare the env. suitable to collect all info, in other hand reduce the RDB$SYSTEM size. IF and WHEN corruption re-happen I open a bug. thank Claude, ciao GIovanni
271.8	Wait until Wednesday	BROKE::PROTEAU	Jean-Claude Proteau	`Fri Apr 11 1997 16:08`	6
	The information to gather was suggested by Ian Smith. I'll ask him if the partial information you have might be of value. If you don't hear from us by the middle of next week, assume no. Claude