
Conference ulysse::rdb_vms_competition

Title: DEC Rdb against the World
Moderator: HERON::GODFRIND
Created: Fri Jun 12 1987
Last Modified: Thu Feb 23 1995
Last Successful Update: Fri Jun 06 1997
Number of topics: 1348
Total number of notes: 5438

452.0. "ANOTHER ORACLE 6320 PROBLEM" by CGOS01::HSACHS () Tue Oct 10 1989 21:53

Oracle and the VAX6320 are battling it out again, this time in Winnipeg.
This problem is similar to the one at Petro Can (note 397).

Our customer is the Manitoba Telephone System (MTS), which is currently running
an Oracle V5.1.22 application on a VAX 6310 running VMS V5.1.  MTS has 
bought a VAX 6320 onto which they are going to migrate the Oracle application
(the 6310 had actually been designated for other uses).

When moving the database using SQL Plus, MTS observed the following:
	o When the 6320 was running in SMP mode, the SQL process
	remained in hibernate state indefinitely

	o When the 2nd processor was turned off, the SQL process began
	processing immediately

	o When the 2nd processor was turned on again, the SQL process
	immediately returned to a hibernate state

	o When other processes were started up, the hibernating process
	also began to start processing.  We have no statistics to show
	if the processes are running any slower than what they would be
	in the case with the second processor turned off.

The system configuration includes a VAX 6320, 2 KDB50's, 4 RA90's, 2 RA70's, 
and VMS V5.1-1.

We have checked the hardware and operating system configurations; everything
seems to be OK.  Oracle has checked the Oracle installation and given it a
clean bill of health as well.

Are there any Oracle V5.1.22 sites out there that are running fine in an
SMP environment?

Oracle feels the solution of course is to migrate to Oracle V6.  The 
VAX 6320 however has been scoped out using the DECcp (capacity planner)
service on Oracle V5.1.22 .  We feel the model would be invalidated by
the Oracle upgrade as we do not know how Oracle V6 differs from V5 as
far as system (CPU, MEM, I/O) resource consumption. 

Does anyone have any idea what may be causing our problem?

Are there any sites out there that know if Oracle V6 uses more or less
system resources than Oracle V5?  (IE: if the customer uses migration
as a solution, would the DECcp model projection still be reasonably 
accurate?)

Oracle has already supplied a local Oracle Rep to help MTS upgrade to 
V6 to see if that will make the problem go away. 

    
T.R  Title  User  Personal Name  Date  Lines
452.1. "" by SRFSUP::BREWIS () Fri Oct 13 1989 20:50 (13 lines)
    WE have a customer in Los Angeles that currently has a custom project
    tracking system developed in Oracle and it is running version 5.1.22.
    The application is in pilot mode with 5-8 users simultaneously
    accessing the database each day.  At this point, the users have
    not complained about response problems.  
    
    A few things will be happening that will give us a better measurement
    criteria.  The customer plans to add more users and projects to
    track (eventually growing to 30-50 users) AND the 3rd party consultant who
    is maintaining the application plans to upgrade to V6.  We should
    begin to see some real results in the next month.
    
    Rick
452.2. "Any other Oracle SMP sites out there?" by CGOS01::HSACHS () Tue Oct 17 1989 21:05 (12 lines)
    
    Oracle has offered 1 potential cause of the Oracle V5.1.22 problem
    at MTS and PetroCan sites as being due to the fact that they are
    using KDB50 disk controllers, and not HSC's.  I am desperately trying
    to find any Oracle V5.1.22 sites running in an SMP environment,
    especially a site that is using a KDB50 disk controller
    and where Oracle is a dedicated application on the system, with added
    bonuses if it is on a 6320 as well.   Are there any such sites
    out there?  Please respond,  Thanks.
    
    Harry
452.3. "Update on Petro-Canada 6320 situation" by CGOO01::TULLIS (Craig Tullis) Tue Oct 17 1989 21:30 (4 lines)
    I'm in a bit of a rush right now, but just to add to the .2 comment.
     Oracle is telling Petro-Canada that the problem is with RMS and
    KDBs.  I'm trying to find out more information from them at this
    point and will add to this note as the information comes in.
452.4. "Come on!" by WIBBIN::NOYCE (Bill Noyce, FORTRAN/PARALLEL) Wed Oct 18 1989 17:07 (2 lines)
    Right, next they'll tell them that the problem is the brown paint
    on top of the cabinet...
452.5. "not the paint, but maybe something" by CGOO01::TULLIS (Craig Tullis) Wed Oct 18 1989 18:57 (12 lines)
    Actually, I was worried that they would somehow find a way to point
    the finger at us, but I am beginning to think that they may have
    some reason to say we are at least somewhat to "blame".
    
    They have tried to duplicate the problem on their 6340 with HSCs
    and have had no luck.  We have seen the problem on two different
    6320s with KDBs and on an 8350 with a KDB.  If you look at that,
    at least on the surface, it seems there may be a problem when there
    are KDBs involved.  They have, apparently, isolated the problem
    to waiting for I/O completion (LEF 30).  They were going to be doing
    some further testing, but their office is in Belmont California
    (not far from San Francisco) and so that could be delayed for a while.
452.6. "Could it be the earthquake ??" by MAIL::DUNCANG (Gerry Duncan @KCO) Wed Oct 18 1989 20:29 (2 lines)
    And I'm sure that Oracle will somehow blame the earthquake for this
    problem !!
452.7. "Quake will delay solution" by CGOO01::TULLIS (Craig Tullis) Wed Oct 18 1989 21:28 (6 lines)
    Actually Gerry, they are blaming the earthquake for the delay in
    solving the problem.
    
    I will keep everyone posted as results come in.
    
    
452.8. "" by WIBBIN::NOYCE (Bill Noyce, FORTRAN/PARALLEL) Thu Oct 19 1989 15:39 (7 lines)
    Are they really using Local Event Flag #30?  This is in the "reserved
    to VMS" range...  I wonder if they follow all the necessary protocols
    for sharing event flags, including using $SYNC instead of just $WAITFR,
    and setting the event flag after the test-clear-retest sequence?
    
    The only difference I would expect with a KDB is that I/Os get started
    faster.  Maybe that's the real problem?
452.9. "Other related conferences" by TROA01::NAISH (RDB4ME Paul Naish DTN 631-3352) Thu Oct 19 1989 15:40 (6 lines)
    You may also wish to cross-post the local vs clustered disk issues
    to one or more of the following:
    
    	Clusters	ELKTRA::CLUSTER
    	Storage Arch	SSAG::ASK_SSAG
    	VAX 6000	SASE::CALYPSO
452.10. "Oracle's $QIOs not word aligned" by CIMNET::BOURDEAU (Rich Bourdeau CIM Product Marketing) Thu Oct 19 1989 21:41 (13 lines)
    
    I encountered a similar problem with CINCOM's Ultra database back in
    1985.  The problem is that the Oracle database is probably issuing
    $QIOs that are not word aligned.  It seems that the KDA, KDB, and
    RQDX3 disk controllers do not support odd byte transfers of data.  To 
    compensate for this PUDRIVER allocates its own 512 byte word aligned
    buffer.  512 bytes are then moved from the unaligned buffer to the
    aligned buffer, and single block transfers are queued until the
    original request has been satisfied.  If this is the problem the
    symptoms are very obvious.  Split I/O will be very high.  The solution
    is for the database to issue word aligned $QIOs.  This solved CINCOM's
    problem.   This may or may not be your problem, but it's worth looking
    into. 
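
    To make the arithmetic in this note concrete, here is a small sketch
    in Python.  It is only a model of the behaviour described above, not
    PUDRIVER code.  BLOCK_SIZE, is_word_aligned and transfers_needed are
    hypothetical names, and "word aligned" is assumed to mean an even
    byte address (a VAX word is 2 bytes).

```python
BLOCK_SIZE = 512  # bytes per disk block, as in the note above


def is_word_aligned(address):
    """A VAX word is 2 bytes, so a word-aligned buffer has an even address."""
    return address % 2 == 0


def transfers_needed(address, nbytes):
    """Number of transfers queued for one request (model only).

    Aligned buffer: the request goes out as a single transfer.
    Unaligned buffer: data is copied through an aligned 512-byte buffer,
    so the request is split into one transfer per block.
    """
    if is_word_aligned(address):
        return 1
    return -(-nbytes // BLOCK_SIZE)  # ceiling division


if __name__ == "__main__":
    # A 16-block write from an even address, then from an odd one.
    print(transfers_needed(0x1000, 16 * BLOCK_SIZE))
    print(transfers_needed(0x1001, 16 * BLOCK_SIZE))
```

    Under these assumptions a 16-block request from an odd address turns
    into 16 single-block transfers, which is the kind of split I/O count
    the note suggests watching for.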
452.11. "Oracle SMP test results" by CGOS01::HSACHS () Tue Nov 21 1989 20:11 (92 lines)
  
Greetings All.  There are 2 occurrences of this notes entry, they are
	BISTRO::RDB_VMS_COMPETITION  # 452
		(also see # 397 for Petro Can site with same problem)
	VAXWRK::VMSNOTES  # 2979

You can refer to the above notes for more details on the problems at both
MTS and Petro Can.

These are the results of some testing that the Manitoba Telephone System (MTS)
has done to try and determine the nature of the Oracle V5.1.22 problem they
are experiencing on a VAX 6320 with VMS V5.1-1

The MTS configuration:
	VAX6320		VMS V5.1-1		2 KDB50 controllers
	4 RA90's	Oracle V5.1.22

The problem:
	When initially trying to IMPORT the Oracle database with both
	CPU's running, the Oracle Application would go into permanent
	hibernate state.   If multiple processes were started, Oracle 
	would process, but spend large chunks of time in hibernate 
	state, still drastically affecting turn-around time.

The MTS test:
   *****MTS has requested that this test remain CONFIDENTIAL, and NOT
	be released to ORACLE. *****

	The above problem was observed when running in a stand-alone 
	system configuration.   MTS, suspecting a timing problem with
	Oracle, set up their system as a cluster to see if that would have
	any effect on the problem.  The results follow:

	With the system in a cluster configuration, and both CPU's 
	enabled:
Oracle ran fine doing EXPORTS (with NO WRITES)
When doing IMPORTS (LOTSA WRITES) Oracle initially ran fine, however, 
	some periods in hibernation were observed. As time passed, the
	time spent in hibernation increased.  At about 2 hours, more
	time was spent in hibernation than execution.  At about 2 1/2
	to 3 hours - Oracle again seemed to reach the permanent
	hibernation stage.

When the second CPU was turned off, Oracle immediately started processing
	again.  If the 2nd CPU was turned on a short while later, the
	hibernation seemed to continue where it left off when the CPU was
	turned off.   Interestingly enough, if the 2nd CPU was left off for
	a longer period of time, such as 1/2 hr, Oracle somehow seemed to
	reset itself.  i.e., turning on the 2nd CPU after having it shut
	off for about 1/2 hour, Oracle would process fine again, with only
	short periods in hibernation (and hibernation periods would begin
	growing again).

MTS described the hibernation period growth to be almost logarithmic in
	nature.

ORACLE response:
	Oracle's response has been that this is a DIGITAL RMS problem
	in conjunction with the use of a KDB50.  They have sent a test
	program to Petro Can that they said would prove this.  The initial
	tests using this test program have shown only 28/100 of a second
	difference between single-processor and SMP modes.  NO long 
	hibernation periods.

	Oracle reasons that Oracle V6 fixes the problem because 
	it doesn't use RMS calls.   Also, no sites running Oracle V5.1.22
	and using HSC's in their configuration have reported any problems.

	I've found 2 sites using 6320's and KDB50's that have not noticed
	any problems (yet), however, they are running other processes as
	well.  Oracle has not commented on why these sites do not seem to
	have problems.

REQUESTS:
	If these symptoms/observations give any ideas as to what the problem
	might be....ALL SUGGESTIONS ARE VERY WELCOME !!!  

	If I have omitted any detail that may be of help, feel free to
	E-mail me  at  CGOA01::SACHS
		for any all-in-one users:   HARRY SACHS @WNO

WARNINGS: Since multiple processes seem to knock Oracle out of Hibernation,
	and long dedicated runs (such as IMPORTS) make the problem most
	obvious....many of your current customers using Oracle V5.1.22
	may have this problem and not realize it.  Be very leery of 
	any dedicated Oracle applications on SMP platforms - Oracle support
	has not been very helpful to date in finding a resolution to
	this problem (Their stance is Oracle V6 fixes the problem, 
	pay the bucks and upgrade).
	

Harry Sachs  (at Winnipeg)
452.12. "MP-specific -- gotta be error in access to global section" by WIBBIN::NOYCE (Bill Noyce, FORTRAN/PARALLEL) Wed Nov 22 1989 15:19 (18 lines)
    This sounds to me as if somewhere deep inside Oracle, the software
    is updating shared memory from different processes without using
    an interlocked instruction.  If they use a single VAX instruction
    such as INCL to increment a memory location, this will work fine
    on a single-processor system, but on a multiprocessor it can lose
    some increments, like this:
    
    	original value = 5
    	processor 1		processor 2
    	 reads 5
    	 computes 5+1=6		 reads 5
    	 writes 6		 computes 5+1=6
    				 writes 6
    
    Or the problem could be more subtle.  But it really sounds like
    some kind of coding error in dealing with shared memory (on VMS
    that's a global section).  The general solution involves using
    interlocked instructions such as ADAWI, BBSSI, INSQTI, etc.
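
    The interleaving above can be replayed deterministically.  This
    Python sketch is a toy model only (it is not VAX code and says
    nothing about how Oracle is actually written).  Memory, run_unlocked,
    interlocked_increment and run_interlocked are hypothetical names.  It
    steps two simulated processors through read, compute and write in the
    order shown in the diagram, then repeats the run with each increment
    treated as one indivisible step, which is the effect an interlocked
    instruction is meant to provide.

```python
class Memory:
    """One shared memory location."""
    def __init__(self, value):
        self.value = value


def run_unlocked(mem):
    """Replay the diagram: both processors read before either writes back."""
    cpu1 = mem.value      # processor 1 reads the shared value
    cpu1 += 1             # processor 1 computes value + 1
    cpu2 = mem.value      # processor 2 reads the same, still-old value
    mem.value = cpu1      # processor 1 writes its result
    cpu2 += 1             # processor 2 computes the same result
    mem.value = cpu2      # processor 2 overwrites: one increment is lost
    return mem.value


def interlocked_increment(mem):
    """Model of an interlocked add: read, add and write as one step."""
    mem.value += 1


def run_interlocked(mem):
    """Same two increments, but neither can be interleaved with the other."""
    interlocked_increment(mem)   # processor 1
    interlocked_increment(mem)   # processor 2
    return mem.value


if __name__ == "__main__":
    print(run_unlocked(Memory(5)))
    print(run_interlocked(Memory(5)))
```

    Starting from 5, the unlocked replay ends at 6 and the interlocked
    one at 7, matching the lost increment in the diagram.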
452.13. "different source - different story" by CGOWGS::OAKLEY (What am I doing here...) Mon Nov 27 1989 19:45 (33 lines)
    
    I tend to agree with .12 that Oracle is suffering from loss of sync
    through use of writeable Global Sections.  
    
    I have been reproducing this problem on our NI Cluster and have found
    that the problem occurs on our 8370 with a DWBUA UDA50 (45% speed
    difference), KDB50 (25% speed difference) and NI Served RQDX3 (20%
    speed difference).  This makes the Oracle statement of not being able
    to reproduce the problem on HSC's suspect, so we are scheduling some
    time at a customer site to try it there (unless someone in DEC were to
    volunteer their system for a short test).
    
    The only test not done is to bring up the 8370 standalone and try the
    UDA and KDB that way.
    
    In our testing we found that the detached process that writes to the
    database file runs very erratically in SMP but runs smoothly in
    Uniprocessor.  It also spends most of the time in HIB with a TQE
    wakeup.  The application flips between HIB, LEF and COM in Uniprocessor
    but spends most of its time in HIB in SMP (but still does occasional
    I/O).
    
    At this time we are waiting to talk to an Oracle developer about their
    code (which appears to be written mostly in C, so it is unlikely that
    they make use of interlocked instructions).
    
    Does anybody happen to know how one would go about closely monitoring a
    detached process (a la PCA without PCA since it's detached and started by
    another process)?
    
    wayne oakley
    dtn:635-4359
    
452.14. "Don't give up on PCA" by WIBBIN::NOYCE (Bill Noyce, FORTRAN/PARALLEL) Tue Nov 28 1989 15:56 (10 lines)
    You can use PCA on a detached process, I think.
    Link the image /DEBUG=PCA$COLLECTOR, and run it in an environment
    where there's a PCA$STARTUP logical (possibly some other spelling)
    that points to a command file containing
    	set datafile ...
    	set pc_sampling
    	:
    	go
    or whatever measurements you want.
    If necessary, you could probably make PCA$STARTUP a system logical.
452.15. "Looks like this one is solved" by CGOO01::TULLIS (Craig Tullis) Thu Dec 07 1989 01:21 (9 lines)
    Well, it looks like the problem may be finally "solved".  Oracle
    has written a letter to MTS (not to Petro-Canada yet) in which they
    say that the problem lies with their I/O routines, and that to "fix"
    it would require a major effort.  So, their proposed solution is
    to have the customer upgrade to Oracle version 6.
    
    I will let Harry Sachs post any other news of MTS (they were
    considering converting to Rdb).
    
452.16. "Make the bastards squirm !!!" by SNO78C::BELAKHOV (The ORACLEBUSTER !!!) Thu Dec 07 1989 05:36 (5 lines)
    I think that Digital should suggest to the customer that, as Oracle
    has admitted the problem is theirs, Oracle should supply the V6
    upgrade for free. (:-)
    
    
452.17. "Can we get the letter ?" by MAIL::DUNCANG (Gerry Duncan @KCO) Thu Dec 07 1989 13:13 (2 lines)
    Yipee !!  Gotta' have a copy of that letter !!!  Can we get it ??
    please, please, please.
452.18. "I don't know why we couldn't" by CGOO01::TULLIS (Craig Tullis) Thu Dec 07 1989 18:21 (7 lines)
    I can try to get it.  Robin Dunn, Digital in Winnipeg, has a copy
    of it in the Digital office there.  We were copied on it, so I don't
    see why we can't get it to you Gerry.
    
    As an interesting side note:  Oracle has not been talking to our
    Customer Services person in Calgary since last Friday and they have
    also said nothing about this to Petro-Canada either.
452.19. "the finishing touch" by CGOS01::HSACHS () Fri Dec 29 1989 22:30 (51 lines)
  
Greetings All.  There are 2 occurrences of this notes entry, they are
	BISTRO::RDB_VMS_COMPETITION  # 452
		(also see # 397 for Petro Can site with same problem)
	VAXWRK::VMSNOTES  # 2979

You can refer to the above notes for more details on the problems at both
MTS and Petro Can.

This will be my final entry to summarize the results of the Oracle problem
that I have been dealing with.  

On December 5, 1989 Digital was copied on a letter which was sent to MTS by
Oracle.  The general overtone of the letter implied that the MTS problem was
still due to a problem with Digital hardware and/or software, however one
paragraph in the 3rd page states:

	Oracle developers and Digital testing has indicated,
	although inconclusively, that to actually fix this problem
	might require a major re-write of the I/O routines, requiring
	many months of effort.  Since the problem has been eliminated
	with version 6, and is not a problem common to all VAX installations
	it will be impossible to get a commitment to undertake such
	a fix.

The drivers referred to are of course Oracle's (note how easy it could be
for a reader to misinterpret this to be VMS drivers).  Basically Oracle
gave the position that the Oracle V5.1.22 would NOT be fixed.  Oracle feels
Version 6 cures the problem and that customers should upgrade.

Wayne Oakley from Calgary, who has been giving the primary technical
assistance from the Digital side,  has managed to re-create the problem
on all forms of system configurations, including an HSC configuration
which mirrored the hardware configuration at Oracle in Belmont.  Wayne
believes that Oracle is not correctly utilizing the global 
sections in VMS resulting in a synchronization problem.

Since Oracle has flatly refused to fix the Oracle V5.1.22 problem,
MTS has decided to upgrade to a 6420 and to initially run it in the single
processor mode to get the required cpu power they had anticipated from 
the 6320.  This will allow the application to run with its current load
until they upgrade to Oracle V6 (or convert to RDB???  wouldn't it be nice?
there are rifts in the MTS ranks so the conversion issue could go either
way.  If they do convert, I'll add one last entry that will indicate this.)

One final note,  Petro Can, which was the first site to experience this
wonderful problem, still has not heard any response from Oracle on the 
source of the problem as of Christmas.  I wonder what the new year will 
bring for them (keep posted to note 397).

Harry Sachs  (at Winnipeg)