Conference orarep::nomahs::rdb_60

Title:	Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:	RDB_60 is archived, please use RDB_70.
Moderator:	NOVA::SMITHISON

Created:	Fri Mar 18 1994
Last Modified:	Fri May 30 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	5118
Total number of notes:	28246

5060.0. "DATACHECK while dumping AIJ info" by M5::LWILCOX (Chocolate in January!!) Fri Feb 21 1997 14:28

This is only slightly similar to 4770 so I've started a new topic.

Customer is running Rdb 6.1-04 on an Alpha running VMS 6.2 with all patches
applied (according to them).  On one of the production databases, when they do
either RMU/DUMP/HEAD or RMU/DUMP/OPTIONS=DEBUG, about 1 time in 4 it returns

%RMU-F-FILACCERR, error reading disk file
-SYSTEM-F-DATACHECK, write check error

This does not happen on any other RMU commands or during any other kinds
of database operations.  The error happens at the point where it
is dumping the information about the currently active AIJ file.  Each time 
this error is returned the error count goes up on the AIJ device.  

I obtained the following from a dial in session:

     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 0039000C 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Read attributes: Record attributes CLEARING21_DATABASE_AIJ2.AIJ;1 (12,57,0)
%XQP, Thread #0, Lookup CLEARING21_DATABASE_AIJ2.AIJ;1 (12,57,0) Status: 00000001
%XQP, Thread #0, Physical read  (12,57,0) Status: 0000005C
%RMU-F-FILACCERR, error reading disk file
-SYSTEM-F-DATACHECK, write check error
%XQP, Thread #0, FIB contents:
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 711F0017 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Deaccess (23,28959,0) Reads: 17, Writes: 0, Status: 00000001
%XQP, Thread #0, FIB contents:
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, FIB contents:
     00000000 0039000C 00000000 00000000 00000000 00000000 00000000 00000000
     00000000 00000000 00000000 00000000 00000000 00000001 00000000 00000000
     00000000 00000000 00000000 00000000
%XQP, Thread #0, Deaccess (12,57,0) Reads: 10, Writes: 0, Status: 00000001
N
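
For anyone reading the trace above, here is a minimal sketch of how the hex
values can be picked apart. It relies on the standard VMS condition value
layout (severity in bits 0-2, message number in bits 3-15, facility in bits
16-27). The helper names are made up for illustration. The file ID decoding is
an assumption: it treats the second longword of each FIB dump as two 16-bit
words (file number low, sequence number high), which happens to match the
(12,57,0) and (23,28959,0) values the XQP trace prints.

```python
# Hypothetical helpers for reading the trace; not part of RMU or VMS.

SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def decode_condition(value):
    """Split a 32-bit VMS condition value into its standard fields."""
    sev = value & 0x7
    return {
        "severity": SEVERITY.get(sev, "?"),
        "message_number": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "success": bool(value & 1),  # low bit set means success
    }


def decode_fid(longword):
    """Assumed layout: low word = file number, high word = sequence number."""
    return longword & 0xFFFF, (longword >> 16) & 0xFFFF


if __name__ == "__main__":
    print(decode_condition(0x0000005C))  # status of the failing physical read
    print(decode_condition(0x00000001))  # status of the lookup and deaccess
    print(decode_fid(0x0039000C))        # FIB longword for the AIJ file
    print(decode_fid(0x711F0017))        # FIB longword for the other file
```

Decoded this way, status 0000005C is a fatal-severity condition in facility 0
(SYSTEM), consistent with the -SYSTEM-F-DATACHECK line that follows it, and
00000001 is plain success.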

The customer has spoken with support at Digital on this and was told:

"the miscompare error is not a disk or controller problem, but related to
any application performing hardware data checks while not locking down the 
source data buffers on the host until the compare is complete".

I've placed a call to the analyst at Digital to get some clarification on
this.  I think it's a fancy way of saying it's an Rdb problem :-).
He was going to have one of the storage engineers join the call.

We are interested in knowing what it is that RMU/DUMP does that might be
different from other commands and database operations.  (Yes, Norm, we have
the source code).

The customer did turn off HSJ caching.  He was curious if turning off RMS
caching would be useful.  I don't know what the impact would be.

Interestingly enough, I have found out that Sybase has a customer who
has been experiencing the same problem.

I am aware that Rdb engineering will not take ownership of this problem
but is willing to assist VMS engineering once the problem is escalated on
the Digital side.  I just wanted to present this information to see if
anyone could add anything at this point.

Thanks.

Liz

T.R	Title	User	Personal Name	Date	Lines
5060.1		HOTRDB::PMEAD	Paul, [email protected], 719-577-8032	Fri Feb 21 1997 14:42	21

>"the miscompare error is not a disk or controller problem, but related to
>any application performing hardware data checks while not locking down the
>source data buffers on the host until the compare is complete".

The application is not doing data checks, VMS is.  If VMS doesn't know how to
handle multiple processes doing I/O to the same disk blocks at the same time
then I think VMS might want to consider enhancing the feature or at least
documenting this shortcoming.

RMU/DUMP is different in that it doesn't do interlocking on the data it is
going to read.  It is just a reader, and it is not critical that it has a
completely consistent view of the data on disk, so it just does the reads
with no regard whether or not other processes are writing the same disk
blocks at the same time.

In my mind this is simply a "feature" of VMS, and the customer should ignore
it.  If they insist on pursuing it then they can continue their discussion
with DEC support.  All of this stuff should be completely transparent to any
VMS application (like Rdb) doing I/O to the disks.  If it is not transparent
then it is a shortcoming of the underlying I/O subsystem.
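
To make the race described in this reply concrete, here is a toy model. It is
only a sketch under stated assumptions and not how VMS or the HSJ implement
data checking: the "check" is modelled as reading the block twice and
comparing the two copies, and the names ToyDisk, read_with_check and
DataCheckError are invented for the illustration. An unlocked reader sees a
miscompare only when a writer updates the same block between the two passes.

```python
class DataCheckError(Exception):
    """Toy stand-in for a data check miscompare."""


class ToyDisk:
    """A list of blocks with a read that verifies itself by re-reading."""

    def __init__(self, nblocks, block_size=8):
        self.blocks = [bytes(block_size) for _ in range(nblocks)]

    def write(self, lbn, data):
        self.blocks[lbn] = bytes(data)

    def read_with_check(self, lbn, between_passes=None):
        first = self.blocks[lbn]       # pass 1: transfer the data
        if between_passes is not None:
            between_passes()           # another process gets in here
        second = self.blocks[lbn]      # pass 2: re-read for the compare
        if first != second:
            raise DataCheckError("block %d changed during check" % lbn)
        return first


def demo():
    disk = ToyDisk(4)
    disk.write(2, b"AIJ-0001")

    # No concurrent writer: the compare succeeds.
    quiet = disk.read_with_check(2)

    # A writer updates the block between the two passes; the reader holds
    # no lock, so nothing prevents this.
    try:
        disk.read_with_check(2, between_passes=lambda: disk.write(2, b"AIJ-0002"))
        raced = "ok"
    except DataCheckError:
        raced = "miscompare"
    return quiet, raced


if __name__ == "__main__":
    print(demo())
```

In this model the data on the "disk" is never bad; the error comes purely from
timing, which would fit an intermittent failure seen only on the active AIJ
file.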
5060.2		m5.us.oracle.com::LWILCOX	Chocolate in January!!	Thu Feb 27 1997 13:41	1
	Thanks Paul!