| Title: | VAX DBMS |
| Notice: | THIS NOTESFILE IS NOT A FORMAL SUPPORT CHANNEL |
| Moderator: | SCARY::CHARLAND |
| Created: | Thu Feb 20 1986 |
| Last Modified: | Tue Jun 03 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2642 |
| Total number of notes: | 11044 |
6.1-1, ECO 1 not yet applied. Database is configured for 5 journals, using
3, 75,000 blocks each. FAST COMMIT disabled (as seen in the header and
in the bugchecks). ALS enabled, OVERWRITE enabled.
Customer reports that for the past 3 nights one of her journals has become
inaccessible and she receives an ALS bugcheck with no exception but with
AIJUTL$ABORT in the stack:
19-OCT-1995 17:05:27.89: Linked ALS (DBMS) DBM$ALPHA_STD:[KIT]
19-OCT-1995 11:34:06.42: Compiled ALS (KODA) KOD$ALPHA_V0611D:[CODE]
19-OCT-1995 11:33:47.93: Compiled KOD$LIBRARY (KODA) KOD$ALPHA_V0611D:[CODE]
================================================================================
Stack Dump
================================================================================
Saved PC = 00133A38 : STD$DUMP_ALPHA_VMS_STACK + 00000088
Saved PC = 00119044 : KOD$BUGCHECK_DUMP + 00001014
Saved PC = 0006AACC : AIJUTL$ABORT + 000002E4
Saved PC = 00071040 : AIJUTL$SWITCH_FILE + 00000CE8
Saved PC = 000730B4 : AIJUTL$UPDATE_LEOF + 0000043C
Saved PC = 00066A40 : ALS$FLUSH_ONE_CACHE + 00000158
Saved PC = 00066540 : ALS$FLUSH + 00000180
Saved PC = 000662B8 : ALS$MAIN + 000009B8
Saved PC = 9EE69C44 : S0 address
================================================================================
One from the VAX looks like this:
1 00000000
Handler = 0007CC15, PSW = 0000, CALLS = 1, STACKOFFS = 0
Saved AP = 7FE1D2D4, Saved FP = 7FE1D2B4, PC Opcode = DD
SR2 = 00173008: 00000000 00000000 00000000 00009C22 00063A24 000054FE 00000000
SR3 = 0015EA00: 000007A8 00000000 00000000 00000005 0000002C 00000000 00000000
SR4 = 00143700: 00000000 01000000 00000000 00000000 00000001 01C9007C 00000001
SR5 = 00000004
SR6 = FFFFFFFF
SR7 = 00000E10: 0001C2B3 0000454C 49465F45 5441434E 5552545F 4F492449 534F4315
SR8 = 000007A9: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
SR9 = 00000001
16 bytes of stack data from 7FE1D2A4 to 7FE1D2B4:
0025B2100025B2200000000000000001 0000 '........ ²%..²%.'
Saved PC = 0002639F : AIJUTL$ABORT + 00000187
ARG# Argument [data...] -----------------------------------------------------
1 00000005
2 00000001
Handler = 0002641D, PSW = 0000, CALLS = 1, STACKOFFS = 0
Saved AP = 7FE1D864, Saved FP = 7FE1D828, PC Opcode = D4
SR2 = 0015FA00: FFFFFFFE FFFFFFFF 0000000A 00010001 00110001 00000001 268109D5
SR3 = 00000005
Judging from argument 1, it looks like the AIJ is full. That is fine in
itself: we just want to switch to another journal and overwrite it if
necessary so we can continue processing. Recovery (DBR) bugcheck dumps are
also produced; they look like this:
******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;4
***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPING.RUJ.ROB]MANDB$009B3CAB4233DBFB.RUJ;1
-RMS-E-FNF, file not found
******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;3
***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPING.RUJ.ROB]MANDB$009B3CAB39F7183B.RUJ;1
-RMS-E-FNF, file not found
******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;2
***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPING.RUJ.ROB]MANDB$009B3CAB39F7183B.RUJ;1
-RMS-E-FNF, file not found
******************************
SYS$SYSROOT:[SYSEXE]DBMDBRBUG.DMP;1
***** Exception at 0007ACC4 : DBR$RECOVER + 000009C4
%DBM-F-FILACCERR, error opening run-unit journal file SYS$COMMON:[SYSMGR.SHIPPING.RUJ.ROB]MANDB$009B3CAB4233DBFB.RUJ;1
-RMS-E-FNF, file not found
The 4 DBR bugchecks were produced only seconds apart from one another.
The only reason I could come up with for OVERWRITE not to overwrite is
if fast commit is enabled, which it is not in her case. Have I missed
something obvious? (or something subtle...?)
Thanks,
Liz
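As an aside on reading the VAX dump above: the SR7 line appears to point at a counted ASCII string. The sketch below decodes it under two assumptions that are not stated in the thread: that the dump prints longwords with the lowest address on the right, and that the first byte is a length count. The helper names are made up for illustration.

```python
def decode_dump_longwords(words_hex):
    """Turn longwords printed highest-address-first into raw bytes.

    ASSUMPTION: the rightmost longword on the dump line sits at the
    lowest address, and each longword is stored little-endian.
    """
    raw = bytearray()
    for word in reversed(words_hex):
        raw += int(word, 16).to_bytes(4, "little")
    return bytes(raw)


def counted_string(raw):
    """Read a counted ASCII string: one length byte, then the text."""
    length = raw[0]
    return raw[1:1 + length].decode("ascii")


# Longwords copied from the SR7 line of the VAX dump.
SR7 = "0001C2B3 0000454C 49465F45 5441434E 5552545F 4F492449 534F4315".split()

print(counted_string(decode_dump_longwords(SR7)))
```

Under those assumptions the string reads like a routine name. What it signifies for this bugcheck is not established anywhere in the thread.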
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 2625.1 | | m5.us.oracle.com::LWILCOX | How about Fireworks? | Tue May 06 1997 14:42 | 7 |
One other tidbit I forgot to mention is she tried to use ROTATE and this
command appeared to hang. She also mentioned at one time she noted a
stall message along the lines of "waiting for AIJ submission".
Thanks.
Liz
| |||||
| 2625.2 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Tue May 06 1997 20:38 | 15 |
For the benefit of other readers, I'll provide a synopsis of what I
sent Liz offline.
The ALS switch-over operation failed because 1 (or more) of 3 events
occurred:
1. The switch-over timed out (not likely).
2. The switch-over was being performed by DBR (impossible :-)
3. A DBR was invoked after ALS was unable to locate a valid journal
for over-writing (most likely).
I'm not sure given the available information why the journals could not
be over-written (especially because fast-commit is disabled so there
are no checkpoint requirements), so further analysis is required.
Rick
| |||||
| 2625.3 | ACE *might* be the problem? | M5::LWILCOX | How about Fireworks? | Wed May 21 1997 14:12 | 8 |
After turning off the AIJ on electronic cache (ACE) feature, the database
has not had any problem with this. I think I remember seeing something
about that somewhere, but can't come up with it. Has anyone else had a
similar experience (bugchecks as long as ACE was on)?
Thanks.
Liz
| |||||
| 2625.4 | More occurrences | M5::DMACKENZ | | Wed May 28 1997 11:57 | 28 |
This problem has occurred two more times. Today's incident occurred with
AIJ cache disabled and DBMS at 6.1-11. Bugchecks are on their way;
I expect to have more information soon.
The first bugcheck occurred on May 4th, with DBMS 6.1-1 and AIJ cache
enabled. No exception in the bugcheck. The AIJ_STATUS is 1 (FILACCERR,
right?). Noteworthy information about each AIJ file follows:
o Current: Activated May 4th, modified, inaccessible, never backed up,
AIJ_STATUS=5
o Last Used: Activated May 4th, modified, 99% full, backup in progress,
never backed up, AIJ_STATUS=0
o Oldest: Activated April 28th, modified, 99% full, backup in
progress, last backed up April 28th, next to be backed up,
AIJ_STATUS=0
I've requested that operator notification be enabled, since it currently
is not.
The problem was corrected (disable AIJs, drop AIJs, create AIJs, back up
database) before today's problem was reported. Would there be any
clues remaining as to why the AIJ was marked inaccessible and why the
AIJ backups are not completing? What does an AIJ_STATUS of 5 mean?
Any other insights?
Thanks,
Diane
| |||||
| 2625.5 | AIJ backups not completing | M5::DMACKENZ | | Wed May 28 1997 13:55 | 12 |
I received the ALS bugcheck from May 22nd and took a look at it. DBMS
is 6.1-11 and AIJ caching is disabled. Again no exception. Two AIJ
files are in the process of being backed up and the third is current
and marked full and inaccessible.
It appears to me that the current AIJ file is being marked inaccessible
because it is full and there are no AIJ files available to write to,
since the other two are in the process of being backed up. I'll be
looking for clues as to why the backups aren't completing. Would there
be any left behind?
Diane
| |||||
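The hypothesis in 2625.5 can be sketched as a toy model. This is not DBMS's actual switch-over algorithm; the `Journal` class, the `pick_switch_target` function, and the rule that a journal with a backup in progress cannot be overwritten are all assumptions made only to illustrate the reasoning. The journal states are taken from the list in 2625.4, with "99% full" treated as full.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    name: str
    full: bool
    backup_in_progress: bool
    accessible: bool = True


def pick_switch_target(journals, current, overwrite=True):
    """Return a journal to switch to, or None if no candidate exists.

    ASSUMPTION (hypothetical rule, not confirmed in the thread): a journal
    whose backup is still in progress cannot be reused, even with
    overwrite enabled.
    """
    for j in journals:
        if j is current or not j.accessible:
            continue
        if j.backup_in_progress:
            continue
        if not j.full or overwrite:
            return j
    return None


# States as described in 2625.4 and 2625.5.
current = Journal("Current", full=True, backup_in_progress=False, accessible=False)
last_used = Journal("Last Used", full=True, backup_in_progress=True)
oldest = Journal("Oldest", full=True, backup_in_progress=True)
journals = [current, last_used, oldest]

# Both alternatives are tied up in backups, so nothing can be selected.
print(pick_switch_target(journals, current))

# If the oldest journal's backup were to complete, it becomes a candidate.
oldest.backup_in_progress = False
target = pick_switch_target(journals, current)
print(target.name if target else None)
```

In this model the switch-over fails only because both non-current journals are stuck in backup, which is why the question of why the backups never complete is the one worth chasing.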