Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

5060.0. "DATACHECK while dumping AIJ info" by M5::LWILCOX (Chocolate in January!!) Fri Feb 21 1997 14:28

This is only slightly similar to 4770 so I've started a new topic.

Customer is running Rdb 6.1-04 on an Alpha running OpenVMS 6.2 with all patches
applied (according to them).  On one of the production databases, when they do
either RMU/DUMP/HEAD or RMU/DUMP/OPTIONS=DEBUG, about 1 out of 4 times it will
return

%RMU-F-FILACCERR, error reading disk file
-SYSTEM-F-DATACHECK, write check error

This does not happen on any other RMU commands or during any other kinds
of database operations.  The error happens at the point where it
is dumping the information about the currently active AIJ file.  Each time 
this error is returned the error count goes up on the AIJ device.  

I obtained the following from a dial in session:

     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 0039000C 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Read attributes: Record attributes CLEARING21_DATABASE_AIJ2.AIJ;1 (12,57,0)
%XQP, Thread #0, Lookup CLEARING21_DATABASE_AIJ2.AIJ;1 (12,57,0) Status: 00000001
%XQP, Thread #0, Physical read  (12,57,0) Status: 0000005C
%RMU-F-FILACCERR, error reading disk file
-SYSTEM-F-DATACHECK, write check error
%XQP, Thread #0, FIB contents:
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 711F0017 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Deaccess (23,28959,0) Reads: 17, Writes: 0, Status: 00000001
%XQP, Thread #0, FIB contents:
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 0039000C 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Deaccess (12,57,0) Reads: 10, Writes: 0, Status: 00000001
N
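
For anyone reading the trace above, here is a minimal sketch of how the hex
values can be picked apart.  The condition-value split (severity in bits 0-2,
message number in bits 3-15, facility in bits 16-27) is the standard OpenVMS
layout.  The FID decoding is only an assumption inferred from this trace: the
one nonzero FIB longword appears to hold the file number in its low word and
the sequence number in its high word, which matches the (12,57,0) and
(23,28959,0) printed by the XQP.  The function names are made up for
illustration.

```python
# Severity codes held in the low three bits of an OpenVMS condition value.
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def decode_condition(value):
    """Split a 32-bit OpenVMS condition value into its standard fields."""
    return {
        "severity": SEVERITY.get(value & 0x7, "?"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "success": bool(value & 1),  # low bit set means success
    }


def decode_fid_longword(longword):
    """Assumed layout: low word = file number, high word = sequence number."""
    return longword & 0xFFFF, (longword >> 16) & 0xFFFF


if __name__ == "__main__":
    # Status from the "Physical read" line, then the two FIB longwords.
    print(decode_condition(0x0000005C))
    print(decode_condition(0x00000001))
    print(decode_fid_longword(0x0039000C))
    print(decode_fid_longword(0x711F0017))
```

Read this way, the 0000005C on the physical read is a fatal-severity status
in facility 0 (SYSTEM), which lines up with the -SYSTEM-F-DATACHECK text that
follows it, while the 00000001 statuses on the lookup and deaccess lines are
plain success.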

The customer has spoken with support at Digital on this and was told:

"the miscompare error is not a disk or controller problem, but related to
any application performing hardware data checks while not locking down the 
source data buffers on the host until the compare is complete".

I've placed a call to the analyst at Digital to get some clarification on
this.  I think it's a fancy way of saying it's an Rdb problem :-).
He was going to have one of the storage engineers in on the call.

We are interested in knowing what it is that RMU/DUMP does that might be
different from other commands and database operations.  (Yes, Norm, we have
the source code).

The customer did turn off HSJ caching.  He was curious if turning off RMS
caching would be useful.  I don't know what the impact would be.

Interestingly enough, I have found out that Sybase has a customer who
has been experiencing the same problem.

I am aware that Rdb engineering will not take ownership of this problem
but is willing to assist VMS engineering once the problem is escalated on
the Digital side.  I just wanted to present this information to see if
anyone could add anything at this point.

Thanks.

Liz
T.R  Title  User  Personal Name  Date  Lines
5060.1  HOTRDB::PMEAD  Paul, [email protected], 719-577-8032  Fri Feb 21 1997 14:42  21
>"the miscompare error is not a disk or controller problem, but related to
>any application performing hardware data checks while not locking down the 
>source data buffers on the host until the compare is complete".
    
    The application is not doing data checks, VMS is.  If VMS doesn't know
    how to handle multiple processes doing I/O to the same disk blocks at
    the same time then I think VMS might want to consider enhancing the
    feature or at least documenting this shortcoming.
    
    RMU/DUMP is different in that it doesn't do interlocking on the data it
    is going to read.  It is just a reader, and it is not critical that it
    has a completely consistent view of the data on disk, so it just does
    the reads with no regard whether or not other processes are writing the
    same disk blocks at the same time.
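
    The race described above can be illustrated with a toy model.  This is
    only a sketch under stated assumptions, not how VMS or the HSJ actually
    implements data checking: it assumes a data-check read amounts to a
    transfer pass followed by a compare pass, and that nothing stops another
    process from rewriting the block in between.  Disk, DataCheckError and
    read_with_datacheck are hypothetical names.

```python
class DataCheckError(Exception):
    """Raised when the compare pass does not match the transfer pass."""


class Disk:
    """A one-block 'disk' with no locking of any kind."""

    def __init__(self, block):
        self.block = bytes(block)

    def read(self):
        return self.block

    def write(self, data):
        self.block = bytes(data)


def read_with_datacheck(disk, between_passes=None):
    """Read the block, read it again, and compare the two copies.

    between_passes is an optional callable that stands in for another
    process touching the disk between the two passes.
    """
    first = disk.read()           # transfer pass
    if between_passes is not None:
        between_passes()          # concurrent activity, e.g. a journal write
    second = disk.read()          # compare pass
    if first != second:
        raise DataCheckError("compare pass did not match transfer pass")
    return first


if __name__ == "__main__":
    disk = Disk(b"AIJ record 1")
    print(read_with_datacheck(disk))  # quiet disk: the check passes

    try:
        # A writer updates the same block between the two passes.
        read_with_datacheck(disk, lambda: disk.write(b"AIJ record 2"))
    except DataCheckError as err:
        print("spurious miscompare:", err)
```

    In this model the disk and the data are both fine; the miscompare comes
    purely from an unlocked reader overlapping a writer, which is why an
    intermittent failure on an actively written AIJ file would fit.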
    
    In my mind this is simply a "feature" of VMS, and the customer should
    ignore it.  If they insist on pursuing it then they can continue their
    discussion with DEC support.  All of this stuff should be completely
    transparent to any VMS application (like Rdb) doing I/O to the disks. 
    If it is not transparent then it is a shortcoming of the underlying I/O
    subsystem.
5060.2  m5.us.oracle.com::LWILCOX  Chocolate in January!!  Thu Feb 27 1997 13:41  1
Thanks Paul!