
Conference orarep::nomahs::dbms

Title:VAX DBMS
Notice:THIS NOTESFILE IS NOT A FORMAL SUPPORT CHANNEL
Moderator:SCARY::CHARLAND
Created:Thu Feb 20 1986
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2642
Total number of notes:11044

2625.0. "ALS bugcheck w/AIJUTL$ABORT" by m5.us.oracle.com::LWILCOX (How about Fireworks?) Tue May 06 1997 15:41

6.1-1, ECO 1 not yet applied.  Database is configured for 5 journals, using
3, 75,000 blocks each.  FAST COMMIT disabled (as seen in the header and
in the bugchecks).  ALS enabled, OVERWRITE enabled.

Customer reports that for the past 3 nights one of her journals has become
inaccessible and she receives an ALS bugcheck w/no exception but with
AIJUTL$ABORT in the stack:


19-OCT-1995 17:05:27.89: Linked ALS (DBMS) DBM$ALPHA_STD:[KIT]
19-OCT-1995 11:34:06.42: Compiled ALS (KODA) KOD$ALPHA_V0611D:[CODE]
19-OCT-1995 11:33:47.93: Compiled KOD$LIBRARY (KODA) KOD$ALPHA_V0611D:[CODE]
================================================================================
          Stack Dump
================================================================================

Saved PC = 00133A38 : STD$DUMP_ALPHA_VMS_STACK + 00000088
Saved PC = 00119044 : KOD$BUGCHECK_DUMP + 00001014
Saved PC = 0006AACC : AIJUTL$ABORT + 000002E4
Saved PC = 00071040 : AIJUTL$SWITCH_FILE + 00000CE8
Saved PC = 000730B4 : AIJUTL$UPDATE_LEOF + 0000043C
Saved PC = 00066A40 : ALS$FLUSH_ONE_CACHE + 00000158
Saved PC = 00066540 : ALS$FLUSH + 00000180
Saved PC = 000662B8 : ALS$MAIN + 000009B8
Saved PC = 9EE69C44 : S0 address
================================================================================

One from the VAX looks like this:

   1 00000000
        Handler = 0007CC15, PSW = 0000, CALLS = 1, STACKOFFS = 0
        Saved AP = 7FE1D2D4, Saved FP = 7FE1D2B4, PC Opcode = DD
SR2 = 00173008: 00000000 00000000 00000000 00009C22 00063A24 000054FE 00000000
SR3 = 0015EA00: 000007A8 00000000 00000000 00000005 0000002C 00000000 00000000
SR4 = 00143700: 00000000 01000000 00000000 00000000 00000001 01C9007C 00000001
SR5 = 00000004
SR6 = FFFFFFFF
SR7 = 00000E10: 0001C2B3 0000454C 49465F45 5441434E 5552545F 4F492449 534F4315
SR8 = 000007A9: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
SR9 = 00000001
        16 bytes of stack data from 7FE1D2A4 to 7FE1D2B4:
0025B2100025B2200000000000000001  0000   '........ ²%..²%.'

Saved PC = 0002639F : AIJUTL$ABORT + 00000187
ARG# Argument [data...] -----------------------------------------------------
   1 00000005
   2 00000001
        Handler = 0002641D, PSW = 0000, CALLS = 1, STACKOFFS = 0
        Saved AP = 7FE1D864, Saved FP = 7FE1D828, PC Opcode = D4
SR2 = 0015FA00: FFFFFFFE FFFFFFFF 0000000A 00010001 00110001 00000001 268109D5
SR3 = 00000005
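
As an aside for anyone reading the VAX dump, the SR7 line appears to hold a
counted ASCII string.  The sketch below decodes it, assuming the dump prints
longwords with the lowest address rightmost (the ASCII column on the
"16 bytes of stack data" line suggests this) and that the first byte is a
length count.  The function names are made up for illustration; this is not
a DBMS or VMS utility.

```python
import struct

def longwords_to_bytes(dump_text: str) -> bytes:
    """Turn a row of hex longwords into bytes in ascending address order.

    Assumption: the rightmost longword sits at the lowest address, and each
    longword is little-endian (VAX byte order).
    """
    words = dump_text.split()
    return b"".join(struct.pack("<I", int(w, 16)) for w in reversed(words))

def counted_string(data: bytes) -> str:
    """Read a counted ASCII string: one length byte, then that many characters."""
    length = data[0]
    return data[1:1 + length].decode("ascii")

# Longwords copied from the SR7 line of the VAX dump above.
SR7 = "0001C2B3 0000454C 49465F45 5441434E 5552545F 4F492449 534F4315"

if __name__ == "__main__":
    print(counted_string(longwords_to_bytes(SR7)))
```

Under those assumptions the string reads COSI$IO_TRUNCATE_FILE.  Whether
that routine has anything to do with the failure is not something the dump
excerpt settles.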


Judging from argument 1 it looks like the AIJ is full.  Well, that's ok, now
we just want to switch to another and overwrite it if necessary so we can
continue processing.  There also is a recovery bugcheck dump produced that
looks like this:


******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;4

***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPIN
G.RUJ.ROB]MANDB$009B3CAB4233DBFB.RUJ;1
-RMS-E-FNF, file not found

******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;3

***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPIN
G.RUJ.ROB]MANDB$009B3CAB39F7183B.RUJ;1
-RMS-E-FNF, file not found

******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;2

***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPIN
G.RUJ.ROB]MANDB$009B3CAB39F7183B.RUJ;1
-RMS-E-FNF, file not found

******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;1
***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPIN
G.RUJ.ROB]MANDB$009B3CAB4233DBFB.RUJ;1
-RMS-E-FNF, file not found

The 4 dbr bugchecks were produced only seconds apart from one another.
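
The RUJ file names can be used as a cross-check on the timing.  The sketch
below assumes the 16 hex digits after MANDB$ are an OpenVMS 64-bit system
time (100 ns ticks since 17-NOV-1858); nothing in the dumps states this, so
treat it as a guess.  The helper name vms_time is hypothetical.

```python
from datetime import datetime, timedelta

# OpenVMS system time counts 100-nanosecond ticks from this base date.
VMS_EPOCH = datetime(1858, 11, 17)

def vms_time(hex_quadword: str) -> datetime:
    """Convert a hex quadword to a datetime, assuming it is a VMS system time."""
    ticks = int(hex_quadword, 16)
    return VMS_EPOCH + timedelta(microseconds=ticks // 10)

# Hex fields copied from the two distinct RUJ file names in the DBR bugchecks.
RUJ_A = "009B3CAB39F7183B"
RUJ_B = "009B3CAB4233DBFB"

if __name__ == "__main__":
    first, second = vms_time(RUJ_A), vms_time(RUJ_B)
    print(first, second, (second - first).total_seconds())
```

If the assumption holds, both values land in May 1997 and under 14 seconds
apart, which fits bugchecks produced only seconds apart.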

The only reason I could come up with for overwrite not to overwrite is
if fast commit is enabled, which it is not in her case.  Have I missed
something obvious?  (or something subtle...?)

Thanks,

Liz

2625.1. by m5.us.oracle.com::LWILCOX (How about Fireworks?) Tue May 06 1997 15:42 (7 lines)

One other tidbit I forgot to mention is she tried to use ROTATE and this
command appeared to hang.  She also mentioned at one time she noted a stall
message along the lines of "waiting for AIJ submission".

Thanks.

Liz

2625.2. by NOVA::R_ANDERSON (Oracle Corporation (603) 881-1935) Tue May 06 1997 21:38 (15 lines)

    For the benefit of other readers, I'll provide a synopsis of what I
    sent Liz offline.
    
    The ALS switch-over operation failed because 1 (or more) of 3 events
    occurred:
    1.  The switch-over timed out (not likely).
    2.  The switch-over was being performed by DBR (impossible :-)
    3.  A DBR was invoked after ALS was unable to locate a valid journal
    for over-writing (most likely).
    
    I'm not sure given the available information why the journals could not
    be over-written (especially because fast-commit is disabled so there
    are no checkpoint requirements), so further analysis is required.
    
    Rick

2625.3. "ACE *might* be the problem?" by M5::LWILCOX (How about Fireworks?) Wed May 21 1997 15:12 (8 lines)

After turning off the AIJ on electronic cache feature the database has not
had any problem with this.  I think I remember seeing something about that
somewhere, but can't come up with it.  Anyone else have any kind of similar
experience (bugchecks so long as ACE was on)?

Thanks.

Liz

2625.4. "More occurrences" by M5::DMACKENZ Wed May 28 1997 12:57 (28 lines)

    This problem has occurred two more times.  Today's incident was when
    AIJ cache was disabled and DBMS is 6.1-11.  Bugchecks are on their way; 
    I expect to have more information soon.
    
    The first bugcheck occurred on May 4th, DBMS 6.1-1, AIJ cache enabled.
    No exception in the bugcheck.  The AIJ_STATUS is 1 (FILACCERR, right?).
    Following is noteworthy information about each AIJ file:
    o Current:   Activated May 4th, modified, inaccessible, never backed up,
                 AIJ_STATUS=5
    o Last Used: Activated May 4th, modified, 99% full, backup in progress,
                 never backed up, AIJ_STATUS=0
    o Oldest:    Activated April 28th, modified, 99% full, backup in
                 progress, last backed up April 28th, next to be backed up,
                 AIJ_STATUS=0
    
    I've requested operator notification be enabled since it is not at this
    point.
    
    The problem was corrected (disable AIJs, drop AIJs, create AIJs, back
    up database) before today's problem was reported.  Would there be any
    clues remaining as to the cause for the AIJ being marked inaccessible
    and the reason the AIJ backups are not completing?  What does an
    AIJ_STATUS of 5 mean?  Any other insights?
    
    Thanks,
    
    Diane
                           
2625.5. "AIJ backups not completing" by M5::DMACKENZ Wed May 28 1997 14:55 (12 lines)

    I received the ALS bugcheck from May 22nd and took a look at it.  DBMS 
    is 6.1-11 and AIJ caching is disabled.  Again no exception.  Two AIJ
    files are in the process of being backed up and the third is current
    and marked full and inaccessible.
    
    It appears to me that the current AIJ file is being marked inaccessible
    because it is full and there are no AIJ files available to write to,
    since the other two are in the process of being backed up.  I'll be
    looking for clues as to why the backups aren't completing.  Would there
    be any left behind?
    
    Diane
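
The hypothesis in .5 can be written down as a toy model.  This is only a
sketch of the reasoning in this thread, not how ALS actually picks a
switch-over target: the Journal class, its fields, and the rule that a
journal with a backup in progress cannot be overwritten are all assumptions
made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Journal:
    name: str
    current: bool = False
    backup_in_progress: bool = False
    accessible: bool = True

def switch_candidates(journals: List[Journal]) -> List[Journal]:
    """Journals the (hypothetical) switch-over could move to.

    Assumed rule: with OVERWRITE enabled and fast commit disabled, any
    accessible journal other than the current one is a valid target unless
    a backup of it is still in progress.
    """
    return [
        j for j in journals
        if not j.current and j.accessible and not j.backup_in_progress
    ]

# State as described for the three journals in .4 and .5.
journals = [
    Journal("current", current=True),
    Journal("last_used", backup_in_progress=True),
    Journal("oldest", backup_in_progress=True),
]

if __name__ == "__main__":
    print([j.name for j in switch_candidates(journals)])
```

With both non-current journals stuck in backup, the model finds no target,
which is the situation described in .5.  Clearing backup_in_progress on
either journal gives it one, so the open question remains why the backups
never finish.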