
Conference orarep::nomahs::rdb_60

Title: Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice: RDB_60 is archived, please use RDB_70..
Moderator: NOVA::SMITHISON
Created: Fri Mar 18 1994
Last Modified: Fri May 30 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5118
Total number of notes: 28246

5106.0. "rdb 7.0 aij busy why?" by ukvms3.uk.oracle.com::SHISCOCK (stand and deliver) Wed Mar 05 1997 12:07

    Hi,
    
    I have a database that hangs during an AIJ backup with the message
    "waiting for busy AIJ sequence 24"
    for no apparent reason. I've run through the same scenario on other
    databases and they're fine, so I've no idea why this hang is
    occurring.
    
    restored from a backup.
    Alter database to reserve 5 journal slots.
    Alter database to enable ALS, add 5 journals and enable fast commit.
    rmu/backup offline
    rmu/set after/switch     <---   empty aij
    rmu/back/after/quiet/nowait db_name aij_back_file
    
    Database is not opened so no users are in.
    
    AIJ info screen shows
    
    S_AIJ1                              24 *BACKUP NEEDED* Written Backing up
    S_AIJ2                              25     512       2 Current Accessible
    
    The backing up process is stuck in LEF. Current PC is 800A1308. There
    are no waiting or blocking locks.
    
    If I then disable fast commit it all works.
    
    
    rdb 7.0 axp/vms 7.0
    
    Any ideas,
    
    Steve
    
                                               
T.R  Title  User  Personal Name  Date  Lines
5106.1  NOVA::R_ANDERSON  Oracle Corporation (603) 881-1935  Wed Mar 05 1997 13:05  33
Hmmm - works for me...  I restored from backup, added 5 journals (20 slots), ALS
enabled, ABS disabled, fast-commit enabled  - see below...

What process does SHOW STATS "Checkpoint Information" screen indicate is holding
the AIJ journal 24???

Rick




ALL> rmu/backup mf_personnel mf_personnel.rbf
ALL> rmu/set after/switch mf_personnel
ALL> rmu/backup/after/quiet/nowait/log mf_personnel backup.aij
%RMU-I-AIJBCKBEG, beginning after-image journal backup operation
%RMU-I-OPERNOTIFY, system operator notification: Oracle Rdb Database KODH$:[R_AN
DERSON.WORK.ALS]MF_PERSONNEL.RDB;1 Event Notification
AIJ backup operation started

%RMU-I-AIJBCKSEQ, backing up after-image journal sequence number 4
%RMU-I-LOGBCKAIJ, backing up after-image journal RICK4 at 13:02:24.97
%RMU-I-LOGCREBCK, created backup file KODH$:[R_ANDERSON.WORK.ALS]BACKUP.AIJ;4
%RMU-I-AIJBCKSEQ, backing up after-image journal sequence number 5
%RMU-I-LOGBCKAIJ, backing up after-image journal RICK5 at 13:02:25.91
%RMU-I-QUIETPT, waiting for database quiet point
%RMU-I-OPERNOTIFY, system operator notification: Oracle Rdb Database KODH$:[R_AN
DERSON.WORK.ALS]MF_PERSONNEL.RDB;1 Event Notification
AIJ backup operation completed

%RMU-I-AIJBCKEND, after-image journal backup operation completed successfully
%RMU-I-LOGAIJJRN, backed up 2 after-image journals at 13:02:26.92
%RMU-I-LOGAIJBLK, backed up 508 after-image journal blocks at 13:02:26.92
ALL>
5106.2  no clues in show stats  ukvms3.uk.oracle.com::SHISCOCK  stand and deliver  Thu Mar 06 1997 03:40  13
                                                           
    Checkpoint info showed nothing.
    The checkpoint info (unsorted) just had the 1 line with the process
    id of the backup process and an entry beneath QuietVno.
    
    I've run through identical scenarios with other databases and they
    work fine.
    
    Any other ideas how I can find out why this one stalls?
    
    thanks,
    
    Steve
5106.3  NOVA::R_ANDERSON  Oracle Corporation (603) 881-1935  Thu Mar 06 1997 08:42  23
What *exactly* does the SHOW STATS "Active Stall Messages Screen" display when
the AIJ backup is waiting for available journal?

Here's something to try: 

1.  $ define/sys RDM$BIND_ABS_LOG_FILE "device:[directory]RDMABS70_PID.LOG"
2.  $ rmu/set after/backup=(automatic,backup_file=aij_back_file) db_name
3.  $ rmu/set after/switch db_name

Once the ABS backup fails, examine the logfile.  Note that the "_PID" portion of
the filename will have been replaced with the ABS process' PID, so the filename
will be something like "RDMABS70_12345678.LOG".

You should see a line (or several) that say "waiting for journal 24 (oldest
checkpoint X:Y)".  I would expect the "X:Y" to be something like "24:2".

RMU/DUMP/HEADER should list the active users, including their current checkpoint
information.  If this is NOT the case, use the RMU/DUMP/HEADER/OPTION=DEBUG
command and search for an occurrence of "CKPT_VNO = n." where "n <> -1".  This
is the process stopping the AIJ backup.  (Note that "n" should match the "X"
above).
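
A minimal sketch of that search, for anyone who has saved the dump output to a file. It assumes the debug dump prints the field as "CKPT_VNO = n." with a trailing period, as in the lines quoted later in this topic, and that -1 may also show up in its unsigned 32-bit form (4294967295). The function name and sample text are illustrative only and are not part of RMU.

```python
import re

# How -1 looks when the field is printed as an unsigned 32-bit value.
NO_CHECKPOINT = 0xFFFFFFFF

# Assumed field layout: "CKPT_VNO = <number>." (trailing period included).
CKPT_RE = re.compile(r"CKPT_VNO\s*=\s*(-?\d+)\.")

def find_active_checkpoints(dump_text):
    """Return (line_number, ckpt_vno) for every CKPT_VNO that is not -1."""
    hits = []
    for lineno, line in enumerate(dump_text.splitlines(), start=1):
        for match in CKPT_RE.finditer(line):
            vno = int(match.group(1))
            if vno not in (-1, NO_CHECKPOINT):
                hits.append((lineno, vno))
    return hits

# Sample lines in the layout quoted later in this topic.
sample = """\
CKPT_VNO = 23.            DBID = 1.
CKPT_VNO = 4294967295.    DBID = 2.
"""
print(find_active_checkpoints(sample))
```

The reported line numbers point back into the saved dump, so the surrounding entry can be read in context.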

Rick
5106.4  thanks  ukvms3.uk.oracle.com::SHISCOCK  stand and deliver  Fri Mar 07 1997 02:21  23
    
    Many thanks Rick.
    
    From the ABS log
    
     7-MAR-1997 02:12:35.66 - Oldest RCS checkpoint found 23:2
    
    which repeated itself until the backup timed out.
    
    The dump/head/opt=debug output showed the 23. There are no active
    users except the backup process.
    
    CKPT_VNO = 23.            DBID = 1.
    CKPT_VNO = 4294967295.    DBID = 2.
    
    The second CKPT_VNO value, 4294967295, had a dozen or more occurrences
    against different DBIDs. That is certainly a big gap in the numbers,
    which I presume to be incremental. Does that mean a really old
    checkpoint failed and has never been cleared?
    
    cheers,
    
    Steve
5106.5  NOVA::R_ANDERSON  Oracle Corporation (603) 881-1935  Fri Mar 07 1997 07:04  8
Ignore the 4294967295 (that's a fancy "-1" :-).  Those are "no initial
checkpoint" indications.
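
The arithmetic behind the "fancy -1" can be checked with a short sketch: 4294967295 is 2^32 - 1, which is the bit pattern of -1 when a 32-bit field is printed as unsigned. The helper name below is illustrative, and the 32-bit field width is an assumption based on the value itself.

```python
import struct

def as_signed_32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    # Pack as unsigned ("I"), then unpack the same four bytes as signed ("i").
    return struct.unpack("<i", struct.pack("<I", value))[0]

print(as_signed_32(4294967295))  # -1
print(as_signed_32(23))          # 23
```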

For the RTUPB entry with the CKPT_VNO=23, what is the process indicated by the
corresponding PID?  Could you post the entire RTUPB entry, please?  (just the
entry with the CKPT_VNO=23).

Rick
5106.6  no process reference  ukvms3.uk.oracle.com::SHISCOCK  stand and deliver  Fri Mar 07 1997 07:48  21
    
    There is no RTUPB with 23 in it. Only one entry is in use with any
    detail: the first, which belongs to the backup process.
    
    The rest are
    
    RTUPB_ENT[2.] @00C91840
    
    00000000000000000000000000000000  0000   '................'
                                      ::::  (1 duplicate line)
    
    There are only 30 user slots.
    
    A word of warning is that this database has been tested against
    the ft release of eco1 for 7.0.
    
    cheers,
    
    Steve
    
     
5106.7  NOVA::R_ANDERSON  Oracle Corporation (603) 881-1935  Mon Mar 10 1997 08:53  18
An examination of the database header dump has revealed that a few important
details were missing from previous descriptions.

1.  Record caching is enabled (a *major* missing detail).
2.  While "fast commit" was disabled (implicitly disabling record cache), 3 AIJ
journals were backed up (26 is now "current" - 23, 24 & 25 backed up).
3.  After re-enabling "fast commit" (implicitly re-enabling record cache), the
record cache checkpoint remained at "23" even though the "current" AIJ journal
is "26".
4.  The database open mode is "manual" but it appears that the AIJ backup
operation (the failing one) is being done while the database is closed.

I would recommend manually opening the database before performing the AIJ backup
operation.  Also, before starting the AIJ backup operation, verify using the
"Checkpoint Information" SHOW STATS screen that the record cache checkpoint was
advanced successfully.
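
The reasoning above can be put as a rough model. This is only a sketch of how the symptoms in this topic read, not a description of Rdb internals: it assumes a journal stays "busy" while its sequence number is greater than or equal to the journal sequence in the oldest checkpoint (the "X" in "X:Y"). Both function names are hypothetical.

```python
def parse_checkpoint(text):
    """Split an "X:Y" checkpoint string into (journal_sequence, block)."""
    seq, block = text.strip().split(":")
    return int(seq), int(block)

def journals_held(oldest_checkpoint, journal_seqs):
    """Return the journal sequence numbers the oldest checkpoint still pins.

    Assumption (inferred from this topic, not from Rdb documentation):
    a journal is held while its sequence number is >= the checkpoint's
    journal sequence number.
    """
    ckpt_seq, _ = parse_checkpoint(oldest_checkpoint)
    return [seq for seq in journal_seqs if seq >= ckpt_seq]

# Checkpoint stuck at 23:2, journals 24 and 25 present: both are held.
print(journals_held("23:2", [24, 25]))  # [24, 25]
# Once the checkpoint advances into journal 25, journal 24 is free.
print(journals_held("25:2", [24, 25]))  # [25]
```

Under this model a checkpoint that never advances past 23 keeps every later journal busy, which matches the stall reported at the start of the topic.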

Rick