[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference orarep::nomahs::dec_data_distributor

Title:The Replication Option for Rdb
Notice:Product renamed to Replication Option for Rdb
Moderator:BROKE::PROTEAU
Created:Wed Mar 02 1994
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:287
Total number of notes:1231

271.0. "Duplicate TSER row in RDB$CHANGES" by ITVMS1::GTACCHIN (Giovanni Tacchini) Thu Mar 20 1997 07:35

hi,

we have a strange situation about RDB$CHANGES.

Starting point: INVLOGTYP error in a transfer .LOG

well,

         select
             C1.RDBVMS$VINTAGE_TRANSFER_NAME,
             C1.RDB$VINTAGE_TSER,
             C1.RDB$VINTAGE_TIMESTAMP
         from RDB$VINTAGE C1
         where (C1.RDBVMS$VINTAGE_TRANSFER_ID > 0)

produce...

 RDBVMS$VINTAGE_TRANSFER_NAME      RDB$VINTAGE_TSER   RDB$VINTAGE_TIMESTAMP     
 SICC_ENT_CORP                               382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_DIENST                             382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KONTO                              382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KON_ZDL                            382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KUNDE                              382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_KUN_RECHADR                        382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_MISC                               382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_MUI_VERTRAG                        382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_RECHADR                            382288   18-MAR-1997 13:38:03.24   
 SICC_ENT_TEILNEHMER                         382288   18-MAR-1997 13:38:03.24   

that means: all changes on the source with TSER <= 382288 are replicated 
on target database.

After some check and analisys I have discovered that in RDB$CHANGES exists
many duplicate entrys for TSER.

	select 	RDB$TRANSACTION_TSER, COUNT(*)
	from	RDB$CHANGES
	where	RDB$TRANSACTION_TSER > 0
	group 	by RDB$TRANSACTION_TSER

it can be verified that about all TSER have a count=1 but there are about 50/60
TSER with count=2. The actual RDB$CHANGES cardinality is 100k~.

The first TSER duplicate value is 382289.

By mean of SELECT DBKEY FROM RDB$CHANGES & RMU/DUMP/AREA/START...
I have verified lasted time of about 1 hours btw two row in RDB$CHANGES
with the same TSER.

I have asked to the customer "What happend in that range of time?"

1) Stop ACMS application,
2) drop index defined in RDB$CHANGES for a mismake presence
3) one process has became RWAST 
4) shutdwon/reboot of system

The machine is VAX 7000 member of a cluster with other 2x VAX7000
OS is VAX/VMS V6.1, Rdb V6.0-1.

I would like to know:

1) what does it meant "(<value>)" in:
	INVLOGTYP,     invalid logical change file record type (<value>) was
         	       found

2) the possible reason for presence of duplicate values of TSER
3) REINITIALIZE TRANSFER work fine with this scenario?
4) if answer #3 is YES can I delete rows from target table before
   the reinitialize command, this in order to minimize the elapsed time 
   for transfer execution?
5) after all transfer REINITIALIZE, what does it happend with the 
   RDB$CHANGES rows?

The system is a production enviroment and the time available to manage
the problem is limited to a window in the night (3 or 4 hours) or
sunday time :-(,
fortunately the update logged in RDB$CHANGES are not important :-)


thank in advance

ciao
GIovanni

T.RTitleUserPersonal
Name
DateLines
271.1Yeah!ITVMS1::GTACCHINGiovanni TacchiniFri Mar 21 1997 08:0814
Yeah, 

yesterday I just came back from holyday and my head & heart were elsewhere...

Now I'm only looking for the answer for this question:

How is possible to have a duplicate TSER in RDB$CHANGES row?

sorry, thank you

ciao 
GIovanni

271.2Some answers to your questionsBROKE::PROTEAUJean-Claude ProteauSun Mar 23 1997 23:5690
>After some check and analisys I have discovered that in RDB$CHANGES exists
>many duplicate entrys for TSER.
>
>	select 	RDB$TRANSACTION_TSER, COUNT(*)
>	from	RDB$CHANGES
>	where	RDB$TRANSACTION_TSER > 0
>	group 	by RDB$TRANSACTION_TSER
>
>it can be verified that about all TSER have a count=1 but there are about 50/60
>TSER with count=2. The actual RDB$CHANGES cardinality is 100k~.

This is most interesting.  It is the first time that I have heard of this type
of corruption.  It doesn't mean that it hasn't happened before, only that it
has not been reported with enough information to point out the existence of
duplicate TSER values.

Whenever corruption of RDB$CHANGES is evident, we would like to analyze some
parts of the database to try to determine what happened.  Would it be possible
to get:

The database root file
The RDB$SYSTEM storage area inluding snapshot file
The segmented string storage area and snapshot file if different from RDB$SYSTEM
The TSER value from the target databases in the RDB$VINTAGE tables.  This tells
us the last good transaction and, by implication, the next transaction which
shows a corruption.

>The first TSER duplicate value is 382289.
>
>By mean of SELECT DBKEY FROM RDB$CHANGES & RMU/DUMP/AREA/START...
>I have verified lasted time of about 1 hours btw two row in RDB$CHANGES
>with the same TSER.
>
>I have asked to the customer "What happend in that range of time?"

Is the list of 4 items below what the customer told you in answer to your
question?

>1) Stop ACMS application,
>2) drop index defined in RDB$CHANGES for a mismake presence

When the index was dropped, were there any other applications attached to
the source database?  You seem to imply yes in item 3.  Is the database
accessed by more than one node in a cluster?  If so, were there any applica-
tions running on the other nodes and attached to the database when the index
was dropped?

>3) one process has became RWAST 
>4) shutdwon/reboot of system
>
>The machine is VAX 7000 member of a cluster with other 2x VAX7000
>OS is VAX/VMS V6.1, Rdb V6.0-1.
>
>I would like to know:
>
>1) what does it meant "(<value>)" in:
>	INVLOGTYP,     invalid logical change file record type (<value>) was
>         	       found

The error message is explained in the DDAL$MSG.DOC file in the SYS$HELP
directory.  In this case, it means that the information in the RDB$CHANGES
table is bad.  It cannot say how the data corruption occurred.  That's why
we would like to get copies of some of the database files to see if we can
determine what might have caused the problem.

>2) the possible reason for presence of duplicate values of TSER

At present, I've no idea how this might have happened.  You are the first
to have reported such an error.  We've heard of INVLOGTYP before, but customers
did not send us the database files to analyze.

>3) REINITIALIZE TRANSFER work fine with this scenario?

Yes.

>4) if answer #3 is YES can I delete rows from target table before
>   the reinitialize command, this in order to minimize the elapsed time 
>   for transfer execution?

I presume the transfer defintion says TO EXISTING FILENAME ...  In such case,
yes, you can delete the rows in the target tables before restarting the trans-
fer.

>5) after all transfer REINITIALIZE, what does it happend with the 
>   RDB$CHANGES rows?

The rows will initially be skipped over, and eventually they will be erased
from the table.

Claude
271.3itvms1.it.oracle.com::GTACCHINGiovanni TacchiniTue Mar 25 1997 08:38123
hi Claude,

The situation now is normal and Ok.

On friday (21-mar-1997) from 03:30am to 07:00 am all transfer are 
REINITIALIZED and after, saturday, sunday... all transfer are started on 
regular daily basis. Now the RDB$CHANGES are purged.

Fortunately, during the analisys of corruption, I have performed all your 
suggestion (in note 272) plus a verify via dump of AIJ file that the data
in RDB$CHANGES are generated via normal user application program (not UPDATE
RDB$CHANGES...). My conclusion is that the TSER value, after "X" hour, is
not good. That means: isn't updated (or missing or not flushed, I don't know)
the location where TSER is stored wich last value used. This is only my
IMPRESSION.

At this point I have:
1-RDB$CHANGES purged
2-the root file is 2783 blk
3-the RDB$SYSTEM.RDA is 1804656 blk
4-the RDB$SYSTEM.SNP is 1741956 blk	(is a MF database without user object
					 in RDB$SYSTEM, RDB$CHANGES cardinality
					 is approx 200k~250k row before the
					 first REPLICATION UPDATE TRANSFER
                                         after the REINITIALIZE)

5-output of :SQL> sel DBKEY, RDB$TRANSACTION_TID,
                             RDB$TRANSACTION_SEQUENCE,
                             RDB$TRANSACTION_TSER 
                 from    rdb$changes
                 where   RDB$TRANSACTION_TSER > 0
                 order   by RDB$TRANSACTION_TSER;
  (omitted)

6-sel * from RDB$TRANSFERs; (source DB)
 RDB$TRANSFER_NAME                 RDB$VINTAGE_TSER   
 SICC_ENT_CORP                               378734   
 SICC_ENT_DIENST                             378734   
 SICC_ENT_KONTO                              378734   
 SICC_ENT_KON_ZDL                            378734   
 SICC_ENT_KUNDE                              378734   
 SICC_ENT_MISC                               378948   
 SICC_ENT_RECHADR                            378755   
 SICC_ENT_KUN_RECHADR                        378734   
 SICC_ENT_MUI_VERTRAG                        378734   
 SICC_ENT_TEILNEHMER                         378734   
10 rows selected

7- select C1.RDBVMS$VINTAGE_TRANSFER_NAME, (from a view defined in target DB)
          C1.RDB$VINTAGE_TSER,
          C1.RDBVMS$VINTAGE_TRANSFER_BUSY,
          C1.RDB$VINTAGE_TIMESTAMP
   from RDB$VINTAGE C1
   where (C1.RDBVMS$VINTAGE_TRANSFER_ID > 0)
 
 NAME                                     TSER   BUSY   TIMESTAMP               
 SICC_ENT_CORP                          382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_DIENST                        382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_KONTO                         382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_KON_ZDL                       382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_KUNDE                         382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_KUN_RECHADR                   382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_MISC                          382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_MUI_VERTRAG                   382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_RECHADR                       382288   F      18-MAR-1997 13:38:03.24 
 SICC_ENT_TEILNEHMER                    382288   F      18-MAR-1997 13:38:03.24 

8-sel RDB$TRANSACTION_TSER, count(*)
  from RDB$CHANGES
  group by RDB$TRANSACTION_TSER;
  (omitted)


9-a RMU/DUMP/AREA=RDB$SYSTEM/Start... (without /OPTION=DEBUG) of two page
  with same TSER
  (omitted)

10-two AIJ file involved in the "X" hour (+/- 500k blk each)
  (omitted)

11-RMU/DUMP/HEADER source_database
  (omitted)

Not all info I post here for space reason. Mind the RDB$SYSTEM.(RDA SNP)
dimension, is impossible to COPY. Eventually I can only SAVE it in other 
disk in order to analyze the corruption.


>Is the list of 4 items below what the customer told you in answer to your
>question?

yes

>When the index was dropped, were there any other applications attached to
>the source database?

RMU/DUMP/USER & RMU/SHOW USER report 0 user

>You seem to imply yes in item 3.  

sorry, the process in RWAST is a CP ACMS process, not involved, not attached.

>Is the database accessed by more than one node in a cluster?

in normal condition yes, at the "drop index" time no.

Sorry for the my stupid question in .0 (Q5 Q4 Q3 Q1 to be precise). I wrote 
these in order to obtain help from EC people at GMT time :-) before my 
personal test.

A very good things for me a this point is acquire more knowledge and/or tools
in order to analyze 1500k blks of RDB$SYSTEM (when and if a new corruption
happen), for example:

1-exist a INTERNALS... for ROR?
2-where is stored the LAST TSER used in a source database...

etc...


Thank you, ciao
GIovanni

271.4A few answersBROKE::PROTEAUJean-Claude ProteauMon Apr 07 1997 11:5623
    Giovanni,
    
    > 1-exist a INTERNALS... for ROR?
    
    Sorry, no.  There is a little information in one of the appendices of
    the ROR handbook (appendix C, I think).
    
    > 2-where is stored the LAST TSER used in a source database.
    
    While there are applications running, the last TSER used in the source
    database is not on disk, it is in virtual memory.  But if you look at
    the database at a point where there are no other users attached, the
    way to find out is to search RDB$CHANGES for the maximum TSER value in
    the RDB$TRANSACTION_TSER column.
    
    -------------------
    
    By the way, please read note 272.  It explains what information we need
    in case such a problem recurs.
    
    Ciao,
    
    Claude
271.5thank...itvms1.it.oracle.com::GTACCHINGiovanni TacchiniWed Apr 09 1997 14:069
welcome Claude!

>    By the way, please read note 272.  It explains what information we need
>    in case such a problem recurs.

done :-) see my previous reply... 
   
ciao,
GIovanni
271.6Time to enter a bug reportBROKE::PROTEAUJean-Claude ProteauFri Apr 11 1997 11:106
    
    Well, Giovanni, do you want to pursue this problem anymore?  If so,
    submit a bug report and provide us if you can with the information you
    say you gathered.
    
    Claude
271.7itvms1.it.oracle.com::GTACCHINGiovanni TacchiniFri Apr 11 1997 13:3020
hi,

>Well, Giovanni, do you want to pursue this problem anymore?  
NO

>If so, submit a bug report and provide us if you can with the information you
>say you gathered.

Since RDB$CHANGES is purged, RDB$SYSTEM.(RDA,SNP) are too big and dated TODAY
and AIJ files are deleted I can't manage a BUG (I think, right?).

In my note 271.3 you can see the list of info that I have. With exception of 
item #3 (RDB$SYSTEM.RDA), #4 (RDB$SYSTEM.SNP) and #10 I can send you by MAIL 
(or post here) all other info.

The most important is prepare the env. suitable to collect all info, in other 
hand reduce the RDB$SYSTEM size. IF and WHEN corruption re-happen I open a bug.

thank Claude, ciao
GIovanni
271.8Wait until WednesdayBROKE::PROTEAUJean-Claude ProteauFri Apr 11 1997 17:086
    
    The information to gather was suggested by Ian Smith.  I'll ask him if
    the partial information you have might be of value.  If you don't hear
    from us by the middle of next week, assume no.
    
    Claude