
Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

4985.0. "SYSTEM-F-ACCVIO from recovery process" by ukvms3.uk.oracle.com::LWILES (Louise Wiles, UK Rdb support) Fri Jan 31 1997 05:20

    Hi,

    Rdb V6.0-14
    VAX VMS V6.1

    A customer of mine found a DBR bugcheck with no exception, but loads of
    ACCVIOs & bugcheck retries.

    There was also an ACCVIO in the monitor log for the recovery process.
    This has happened 2 or 3 times in the past month. Each time, the
    database has recovered transparently - they've not had to shut it down
    etc.

    I just wondered if anyone can shed any light on what's happening.

    The bugcheck has the following in it:

    %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
    address=00000000, PC=0007BA5E, PSL=03C00000
    Bugcheck retry count is 0, depth is 0
    %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
    address=00000000, PC=0007BA5E, PSL=03C00000
    Bugcheck retry count is 1, depth is 0

    with bugcheck retries increasing to about 100.
     
    The monitor log has the following entries:

27-JAN-1997 11:38:55.84 - received recovery image termination from 2020031D:1
  - user freed 26 global buffers, 1875 free out of 1875
  - recovery failed
  - starting delete-process shutdown of database <database>
    - "%RDMS-F-DBRABORTED, database recovery process terminated abnormally"
  - database shutdown waiting for recovery to terminate

27-JAN-1997 11:38:56.35 - received request for remote node to join
  - database <database>
  - cluster watcher waiting for MEMBIT lock

27-JAN-1997 11:38:56.38 - received recovery process termination from 2020031D:1
  - final status: "%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=0001AAA3, PC=201C0000, PSL=7FED53CC"
  - recovery failed
  - continuing shutdown of database <database>
  - database shutdown of <database>

    Thanks,
    Louise.

T.R  Title  User  Personal Name  Date  Lines
4985.1  NOVA::R_ANDERSON  Oracle Corporation (603) 881-1935  Fri Jan 31 1997 07:36  6
Hard to say what caused the problem.

The PC indicates an accvio in the bugcheck code (duh :-), so without the
bugcheck stack trace this is kind of useless...

Rick
4985.2  It's a mystery  HOTRDB::PMEAD  Paul, [email protected], 719-577-8032  Fri Jan 31 1997 09:35  5
    This looks like another case of the bugcheck code not being able to
    successfully set the stall message.  We see this off and on, but have
    never been able to determine why.  In 6.1 I hacked the code to make
    sure it could write the stall message before attempting to actually do
    it.  That code will be in the next ECO to 6.1.
4985.3  Accvio @ 201C0000 w/o DMP while RB on VAX ?  NOMAHS::SECRIST  Rdb WWS; [email protected]  Mon Feb 17 1997 10:45  26
    
    ; ...looks like another case of the bugcheck code not being able to
    ; successfully set the stall message ...
    
    So if we're bugchecking the bugcheck code I'd expect there to be
    no bugchecks ?  If we get here we're going down at whatever level
    due to yet another problem, only we're not going to have the
    information we need to diagnose it, right ?
    
    This VAX customer was dying with an access violation, reason mask 00,
    virtual address 0001AAA3, PC 201C0000, with a PSL that is garbled
    in his FAX that I can retrieve if you want it -- but we can only find 
    an entry in RDMMON.LOG and no dump files anywhere.  Is this problem 
    always at that PC on a VAX ?
    
    ; That code will be in the next ECO to 6.1.
    
    Which ECO to 6.1 would you expect this to be in ?  Could this be
    part of an ECO for V6.0A or a special patch ?  
    
    Regards,
    rcs
    
    
    
    
4985.4  HOTRDB::PMEAD  Paul, [email protected], 719-577-8032  Mon Feb 17 1997 11:38  5
    That PC doesn't make any sense to me so I can't say what is going on.
    
    The hack I put in the dumper code would be in V6.1A ECO 1 (V6.1-11).  I
    don't know of any immediate plans to begin work on that ECO at this
    point in time.
4985.5  Bug 421402  CHSR36::LCONS  Mon Feb 17 1997 11:45  4
    Could it be related to bug 421402 ?
    With no Stop/id or Ctrl Y the problem has disappeared
    
    Louis
4985.6  Are all the cases of this on VAX ?  NOMAHS::SECRIST  Rdb WWS; [email protected]  Mon Feb 17 1997 12:44  17
    
    ; That PC doesn't make any sense to me so I can't say what is going on. 
    
    That is the same PC and VA from .0 of this note and that my
    customer saw today.  They were at 6.0-14 so we're going to
    start by bringing them up to 6.0-16, but if they get this a
    lot more they're going to be interested in your hack so we
    can get to the root of the problem (it still happens
    infrequently and they can't reproduce it at will, but since
    it takes ~1.5-2 hours to roll everything back when whatever
    DOES bite them they're getting real sick of not having a 
    clue what it is).
    
    Regards,
    rcs
    
    
4985.7  HOTRDB::PMEAD  Paul, [email protected], 719-577-8032  Mon Feb 17 1997 13:18  8
>    That is the same PC and VA from .0 of this note and that my
>    customer saw today.  
    
    Oh, the one from the monitor log.  Well that is junk.  You need to make
    sure they don't have any files in SYS$SYSTEM.  If you don't find the
    RDMDBRBUG.DMP files then look for KOD$TT. files.
    
    In any case, you might want to look at the bug Louis referred to.
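
    For anyone wanting to compare ACCVIO signatures across sites (as in
    .6, where the PC and VA were matched against .0), here is a minimal
    sketch in Python. It is not an Rdb or VMS tool. The function names
    are hypothetical, and it assumes the bugcheck or RDMMON.LOG text has
    been copied off the system as plain text. It only pulls out the
    reason mask, virtual address, PC and PSL and counts identical
    signatures; it does not interpret them.

```python
import re
from collections import Counter

# Matches the ACCVIO text quoted in this note. "ad+ress" also accepts
# the misspelling "adress" that can creep into hand-copied logs.
ACCVIO_RE = re.compile(
    r"%SYSTEM-F-ACCVIO, access violation, reason mask=([0-9A-F]+), "
    r"virtual ad+ress=([0-9A-F]+), PC=([0-9A-F]+), PSL=([0-9A-F]+)",
    re.IGNORECASE,
)


def parse_accvios(text):
    """Return a list of (reason_mask, virtual_address, pc, psl) tuples."""
    # Messages wrap across lines in the logs, so collapse all runs of
    # whitespace (including newlines) to single spaces before matching.
    flat = " ".join(text.split())
    return [tuple(g.upper() for g in m.groups())
            for m in ACCVIO_RE.finditer(flat)]


def summarize(text):
    """Count how often each distinct ACCVIO signature occurs."""
    return Counter(parse_accvios(text))


if __name__ == "__main__":
    sample = """
    %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
    address=00000000, PC=0007BA5E, PSL=03C00000
    Bugcheck retry count is 0, depth is 0
    %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
    address=00000000, PC=0007BA5E, PSL=03C00000
    Bugcheck retry count is 1, depth is 0
    """
    for signature, count in summarize(sample).items():
        print(count, signature)
```

    Run over both the bugcheck dump and the monitor log, this makes it
    easy to see that the two report different signatures, which is the
    distinction drawn in .7.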