
Conference ulysse::rdb_vms_competition

Title:	DEC Rdb against the World

Moderator:	HERON::GODFRIND

Created:	Fri Jun 12 1987
Last Modified:	Thu Feb 23 1995
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1348
Total number of notes:	5438

452.0. "ANOTHER ORACLE 6320 PROBLEM" by CGOS01::HSACHS () Tue Oct 10 1989 20:53

Oracle and the VAX6320 are battling it out again, this time in Winnipeg.
This problem is similar to the one at Petro Can (note 397).

Our customer, the Manitoba Telephone System (MTS), is currently running
an Oracle V5.1.22 application on a VAX 6310 running VMS V5.1.  MTS has 
bought a VAX 6320 onto which they are going to migrate the Oracle application
(the 6310 had actually been designated for other uses).

When moving the database using SQL Plus, MTS observed the following:
	o When the 6320 was running in SMP mode, the SQL process
	remained in hibernate state indefinitely

	o When the 2nd processor was turned off, the SQL process began
	processing immediately

	o When the 2nd processor was turned on again, the SQL process
	immediately returned to a hibernate state

	o When other processes were started up, the hibernating process
	also began processing.  We have no statistics to show whether
	the processes run any slower than they would with the second
	processor turned off.

The system configuration includes a VAX 6320, 2 KDB50's, 4 RA90's, 2 RA70's, 
and VMS V5.1-1.

We have checked the hardware and operating system configurations; everything
seems to be OK.  Oracle has checked the Oracle installation and given it a
clean bill of health as well.

Are there any Oracle V5.1.22 sites out there that are running fine in an
SMP environment?

Oracle feels the solution of course is to migrate to Oracle V6.  The 
VAX 6320 however has been scoped out using the DECcp (capacity planner)
service on Oracle V5.1.22.  We feel the model would be invalidated by
the Oracle upgrade as we do not know how Oracle V6 differs from V5 as
far as system (CPU, MEM, I/O) resource consumption. 

Does anyone have any idea what may be causing our problem?

Are there any sites out there that know if Oracle V6 uses more or less
system resources than Oracle V5?  (IE: if the customer uses migration
as a solution, would the DECcp model projection still be reasonably 
accurate?)

Oracle has already supplied a local Oracle Rep to help MTS upgrade to 
V6 to see if that will make the problem go away.

T.R	Title	User	Personal Name	Date	Lines
452.1		SRFSUP::BREWIS		`Fri Oct 13 1989 19:50`	13
	We have a customer in Los Angeles that currently has a custom project tracking system developed in Oracle, running version 5.1.22. The application is in pilot mode with 5-8 users simultaneously accessing the database each day. At this point, the users have not complained about response problems. A few things will be happening that will give us better measurement criteria. The customer plans to add more users and projects to track (eventually growing to 30-50 users) AND the 3rd party consultant who is maintaining the application plans to upgrade to V6. We should begin to see some real results in the next month. Rick
452.2	Any other Oracle SMP sites out there?	CGOS01::HSACHS		`Tue Oct 17 1989 20:05`	12
	Oracle has offered 1 potential cause of the Oracle V5.1.22 problem at the MTS and PetroCan sites: that they are using KDB50 disk controllers and not HSC's. I am desperately trying to find any Oracle V5.1.22 sites running in an SMP environment, especially a site that is using a KDB50 disk controller with Oracle as a dedicated application on the system; added bonus if it is on a 6320 as well. Are there any such sites out there? Please respond. Thanks. Harry
452.3	Update on Petro-Canada 6320 situation	CGOO01::TULLIS	Craig Tullis	`Tue Oct 17 1989 20:30`	4
	I'm in a bit of a rush right now, but just to add to the .2 comment. Oracle is telling Petro-Canada that the problem is with RMS and KDBs. I'm trying to find out more information from them at this point and will add to this note as the information comes in.
452.4	Come on!	WIBBIN::NOYCE	Bill Noyce, FORTRAN/PARALLEL	`Wed Oct 18 1989 16:07`	2
	Right, next they'll tell them that the problem is the brown paint on top of the cabinet...
452.5	not the paint, but maybe something	CGOO01::TULLIS	Craig Tullis	`Wed Oct 18 1989 17:57`	12
	Actually, I was worried that they would somehow find a way to point the finger at us, but I am beginning to think that they may have some reason to say we are at least somewhat to "blame". They have tried to duplicate the problem on their 6340 with HSCs and have had no luck. We have seen the problem on two different 6320s with KDBs and on an 8350 with a KDB. If you look at that, at least on the surface, it seems there may be a problem when there are KDBs involved. They have, apparently, isolated the problem to waiting for I/O completion (LEF 30). They were going to be doing some further testing, but their office is in Belmont, California (not far from San Francisco) and so that could be delayed for a while.
452.6	Could it be the earthquake ??	MAIL::DUNCANG	Gerry Duncan @KCO	`Wed Oct 18 1989 19:29`	2
	And I'm sure that Oracle will somehow blame the earthquake for this problem !!
452.7	Quake will delay solution	CGOO01::TULLIS	Craig Tullis	`Wed Oct 18 1989 20:28`	6
	Actually Gerry, they are blaming the earthquake for the delay in solving the problem. I will keep everyone posted as results come in.
452.8		WIBBIN::NOYCE	Bill Noyce, FORTRAN/PARALLEL	`Thu Oct 19 1989 14:39`	7
	Are they really using Local Event Flag #30? This is in the "reserved to VMS" range... I wonder if they follow all the necessary protocols for sharing event flags, including using $SYNC instead of just $WAITFR, and setting the event flag after the test-clear-retest sequence? The only difference I would expect with a KDB is that I/Os get started faster. Maybe that's the real problem?
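	The sharing protocol mentioned in .8 can be sketched roughly as follows. This is a toy Python model, not VMS code: `SharedFlag`, `Request`, `complete`, `naive_check` and `synch_check` are hypothetical names invented for the illustration, and the ordering in `synch_check` is only a reading of the "test-clear-retest, then set" wording above, not a transcription of any system service. The point it tries to show is that when two requests share one event flag, a set flag alone does not prove that *your* I/O finished; the status block has to be checked as well.

```python
class SharedFlag:
    """Stand-in for one event flag shared by several I/O requests."""
    def __init__(self):
        self.is_set = False


class Request:
    """Stand-in for one outstanding I/O with its own status block."""
    def __init__(self, name):
        self.name = name
        self.iosb = 0          # 0 means "not finished yet"


def complete(request, flag):
    """Model an I/O completion: fill in the status, then set the flag."""
    request.iosb = 1
    flag.is_set = True


def naive_check(request, flag):
    """Flag-only wait: treats any set flag as 'my I/O is done'."""
    return flag.is_set


def synch_check(request, flag):
    """One pass of a test-clear-retest-set check (assumed ordering)."""
    if request.iosb != 0:      # test: the status block is the real evidence
        return True
    flag.is_set = False        # clear: a set flag belonged to another request
    if request.iosb != 0:      # retest: a completion may have landed meanwhile
        flag.is_set = True     # set again so that wakeup is not lost
        return True            # (never reached in this single-threaded model)
    return False               # genuinely not finished; keep waiting


if __name__ == "__main__":
    flag = SharedFlag()
    a, b = Request("A"), Request("B")
    complete(b, flag)                      # only B has finished
    print(naive_check(a, flag))            # flag-only logic thinks A is done
    print(synch_check(a, flag))            # status-block logic knows it is not
```

	In this model the flag-only check reports A as finished as soon as B completes, which is the kind of mistake the question in .8 is probing for.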
452.9	Other related conferences	TROA01::NAISH	RDB4ME Paul Naish DTN 631-3352	`Thu Oct 19 1989 14:40`	6
	You may also wish to cross-post the local vs clustered disk issues to one or more of the following: Clusters ELKTRA::CLUSTER Storage Arch SSAG::ASK_SSAG VAX 6000 SASE::CALYPSO
452.10	Oracle's $QIOs not word aligned	CIMNET::BOURDEAU	Rich Bourdeau CIM Product Marketing	`Thu Oct 19 1989 20:41`	13
	I encountered a similar problem with CINCOM's Ultra database back in 1985. The problem is that the Oracle database is probably issuing $QIOs that are not word aligned. It seems that the KDA, KDB, and RQDX3 disk controllers do not support odd byte transfers of data. To compensate for this, PUDRIVER allocates its own 512 byte word aligned buffer. 512 bytes are then moved from the unaligned buffer to the aligned buffer, and single block transfers are queued until the original request has been satisfied. If this is the problem, the symptoms are very obvious: split I/O will be very high. The solution is for the database to issue word aligned $QIOs. This solved CINCOM's problem. This may or may not be your problem, but it's worth looking into.
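	To put a number on the split I/O effect described in .10, here is a rough Python model. It is a sketch under stated assumptions, not a description of PUDRIVER: `is_word_aligned` and `transfers_issued` are hypothetical helpers, an aligned request is assumed to go out as a single transfer, and an unaligned one is assumed to be broken into one 512 byte transfer per block, as the note describes.

```python
BLOCK_SIZE = 512  # bytes per disk block, as quoted in the note


def is_word_aligned(address):
    """A VAX word is 16 bits, so a word aligned address is an even one."""
    return address % 2 == 0


def transfers_issued(buffer_address, byte_count):
    """Estimate how many transfers one request turns into (toy model)."""
    if byte_count <= 0:
        return 0
    if is_word_aligned(buffer_address):
        return 1  # assumption: an aligned request is not split at all
    # Unaligned: one single-block transfer per 512 bytes, rounded up.
    return (byte_count + BLOCK_SIZE - 1) // BLOCK_SIZE


if __name__ == "__main__":
    # Same 8 KB request, once from an even address and once from an odd one.
    print(transfers_issued(0x2000, 8192))
    print(transfers_issued(0x2001, 8192))
```

	Under these assumptions an 8 KB write from an odd address becomes 16 single-block transfers instead of one, which is why a high split I/O count would be the first thing to look for.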
452.11	Oracle SMP test results	CGOS01::HSACHS		`Tue Nov 21 1989 20:11`	92
	Greetings All.

	There are 2 occurrences of this notes entry:

		BISTRO::RDB_VMS_COMPETITION # 452 (also see # 397 for the Petro Can site with the same problem)
		VAXWRK::VMSNOTES # 2979

	You can refer to the above notes for more details on the problems at both MTS and Petro Can.

	These are the results of some testing that the Manitoba Telephone System (MTS) has done to try to determine the nature of the Oracle V5.1.22 problem they are experiencing on a VAX 6320 with VMS V5.1-1.

	The MTS configuration:

		VAX 6320
		VMS V5.1-1
		2 KDB50 controllers
		4 RA90's
		Oracle V5.1.22

	The problem:

	When initially trying to IMPORT the Oracle database with both CPU's running, the Oracle application would go into a permanent hibernate state. If multiple processes were started, Oracle would process, but would spend large chunks of time in hibernate state, still drastically affecting turn-around time.

	The MTS test:

	*** MTS has requested that this test remain CONFIDENTIAL and NOT be released to ORACLE. ***

	The above problem was observed when running in a stand-alone system configuration. MTS, suspecting a timing problem with Oracle, set up their system as a cluster to see if that would have any effect on the problem. The results follow.

	With the system in a cluster configuration, and both CPU's enabled:

	Oracle ran fine doing EXPORTS (with NO WRITES).

	When doing IMPORTS (LOTSA WRITES), Oracle initially ran fine; however, some periods in hibernation were observed. As time passed, the time spent in hibernation increased. At about 2 hours, more time was spent in hibernation than in execution. At about 2 1/2 to 3 hours, Oracle again seemed to reach the permanent hibernation stage.

	When the second CPU was turned off, Oracle immediately started processing again. If the 2nd CPU was turned on a short while later, the hibernation seemed to continue where it left off when the CPU was turned off.

	Interestingly enough, if the 2nd CPU was left off for a longer period of time, such as 1/2 hr, Oracle somehow seemed to reset itself; i.e., when the 2nd CPU was turned on after having been shut off for about 1/2 hour, Oracle would process fine again, with only short periods in hibernation (and the hibernation periods would begin growing again). MTS described the hibernation period growth as almost logarithmic in nature.

	ORACLE response:

	Oracle's response has been that this is a DIGITAL RMS problem in conjunction with the use of a KDB50. They have sent a test program to Petro Can that they said would prove this. The initial tests using this test program have shown only 28/100 of a second difference between single-processor and SMP modes. NO long hibernation periods.

	Oracle reasons that Oracle V6 fixes the problem because it doesn't use RMS calls. Also, no sites running Oracle V5.1.22 and using HSC's in their configuration have reported any problems.

	I've found 2 sites using 6320's and KDB50's that have not noticed any problems (yet); however, they are running other processes as well. Oracle has not commented on why these sites do not seem to have problems.

	REQUESTS:

	If these symptoms/observations give any ideas as to what the problem might be....ALL SUGGESTIONS ARE VERY WELCOME !!!

	If I have omitted any detail that may be of help, feel free to E-mail me at CGOA01::SACHS (for any all-in-one users: HARRY SACHS @WNO).

	WARNINGS:

	Since multiple processes seem to knock Oracle out of hibernation, and long dedicated runs (such as IMPORTS) make the problem most obvious....many of your current customers using Oracle V5.1.22 may have this problem and not realize it. Be very leery of any dedicated Oracle applications on SMP platforms. Oracle support has not been very helpful to date in finding a resolution to this problem (their stance is that Oracle V6 fixes the problem: pay the bucks and upgrade).

	Harry Sachs (at Winnipeg)
452.12	MP-specific -- gotta be error in access to global section	WIBBIN::NOYCE	Bill Noyce, FORTRAN/PARALLEL	`Wed Nov 22 1989 15:19`	18
	This sounds to me as if somewhere deep inside Oracle, the software is updating shared memory from different processes without using an interlocked instruction. If they use a single VAX instruction such as INCL to increment a memory location, this will work fine on a single-processor system, but on a multiprocessor it can lose some increments, like this:

		original value = 5

		processor 1		processor 2
		reads 5
		computes 5+1=6		reads 5
		writes 6		computes 5+1=6
					writes 6

	Or the problem could be more subtle. But it really sounds like some kind of coding error in dealing with shared memory (on VMS that's a global section). The general solution involves using interlocked instructions such as ADAWI, BBSSI, INSQTI, etc.
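	The lost increment described in .12 can be replayed in a few lines of Python. This is only an illustration of the general race, not of Oracle's code or of VAX instructions: `lost_update` and `interlocked_update` are hypothetical names, the first function simply replays the interleaving step by step, and the lock in the second is a stand-in for making each read-modify-write indivisible, which is the job an interlocked instruction would do on the real hardware.

```python
import threading


def lost_update(initial=5):
    """Replay the interleaving from the note: both processors read the
    old value before either one writes its result back."""
    memory = {"counter": initial}
    seen_by_p1 = memory["counter"]        # processor 1 reads the old value
    seen_by_p2 = memory["counter"]        # processor 2 reads the same value
    memory["counter"] = seen_by_p1 + 1    # processor 1 writes old + 1
    memory["counter"] = seen_by_p2 + 1    # processor 2 overwrites with old + 1
    return memory["counter"]


def interlocked_update(initial=5, workers=2, increments=1000):
    """Same job with every read-modify-write made indivisible. The lock
    stands in for an interlocked instruction; it does not model one."""
    memory = {"counter": initial}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:                    # no other worker can interleave here
                memory["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory["counter"]


if __name__ == "__main__":
    print(lost_update())          # two increments attempted, one lost
    print(interlocked_update())   # every increment is counted
```

	Starting from 5, the replayed interleaving ends at 6 although two increments ran, while the protected version counts every one of them.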
452.13	different source - different story	CGOWGS::OAKLEY	What am I doing here...	`Mon Nov 27 1989 19:45`	33
	I tend to agree with .12 that Oracle is suffering from loss of sync through use of writeable Global Sections. I have been reproducing this problem on our NI Cluster and have found that the problem occurs on our 8370 with a DWBUA UDA50 (45% speed difference), KDB50 (25% speed difference) and NI Served RQDX3 (20% speed difference). This makes the Oracle statement of not being able to reproduce the problem on HSC's suspect, so we are scheduling some time at a customer site to try it there (unless someone in DEC were to volunteer their system for a short test). The only test not done is to bring up the 8370 standalone and try the UDA and KDB that way. In our testing we found that the detached process that writes to the database file runs very erratically in SMP but runs smoothly in Uniprocessor. It also spends most of the time in HIB with a TQE wakeup. The application flips between HIB, LEF and COM in Uniprocessor but spends most of its time in HIB in SMP (but still does occasional I/O). At this time we are waiting to talk to an Oracle developer about their code (which appears to be written mostly in C, so it is unlikely that they make use of interlocked instructions). Does anybody happen to know how one would go about closely monitoring a detached process (a la PCA without PCA, since it's detached and started by another process)? wayne oakley dtn:635-4359
452.14	Don't give up on PCA	WIBBIN::NOYCE	Bill Noyce, FORTRAN/PARALLEL	`Tue Nov 28 1989 15:56`	10
	You can use PCA on a detached process, I think. Link the image /DEBUG=PCA$COLLECTOR, and run it in an environment where there's a PCA$STARTUP logical (possibly some other spelling) that points to a command file containing

		set datafile ...
		set pc_sampling
		  :
		go

	or whatever measurements you want. If necessary, you could probably make PCA$STARTUP a system logical.
452.15	Looks like this one is solved	CGOO01::TULLIS	Craig Tullis	`Thu Dec 07 1989 01:21`	9
	Well, it looks like the problem may be finally "solved". Oracle has written a letter to MTS (not to Petro-Canada yet) in which they say that the problem lies with their I/O routines, and that to "fix" it would require a major effort. So, their proposed solution is to have the customer upgrade to Oracle version 6. I will let Harry Sachs post any other news of MTS (they were considering converting to Rdb).
452.16	Make the bastards squirm !!!	SNO78C::BELAKHOV	The ORACLEBUSTER !!!	`Thu Dec 07 1989 05:36`	5
	I think that Digital should suggest to the customer that, as Oracle have admitted the problem is theirs, Oracle should supply the V6 upgrade for free. (:-)
452.17	Can we get the letter ?	MAIL::DUNCANG	Gerry Duncan @KCO	`Thu Dec 07 1989 13:13`	2
	Yipee !! Gotta' have a copy of that letter !!! Can we get it ?? please, please, please.
452.18	I don't know why we couldn't	CGOO01::TULLIS	Craig Tullis	`Thu Dec 07 1989 18:21`	7
	I can try to get it. Robin Dunn, Digital in Winnipeg, has a copy of it in the Digital office there. We were copied on it, so I don't see why we can't get it to you, Gerry. As an interesting side note: Oracle has not been talking to our Customer Services person in Calgary since last Friday, and they have said nothing about this to Petro-Canada either.
452.19	the finishing touch	CGOS01::HSACHS		`Fri Dec 29 1989 22:30`	51
	Greetings All.

	There are 2 occurrences of this notes entry:

		BISTRO::RDB_VMS_COMPETITION # 452 (also see # 397 for the Petro Can site with the same problem)
		VAXWRK::VMSNOTES # 2979

	You can refer to the above notes for more details on the problems at both MTS and Petro Can.

	This will be my final entry, to summarize the results of the Oracle problem that I have been dealing with.

	On December 5, 1989 Digital was copied on a letter which was sent to MTS by Oracle. The general overtone of the letter implied that the MTS problem was still due to a problem with Digital hardware and/or software; however, one paragraph on the 3rd page states:

		Oracle developers and Digital testing has indicated, although inconclusively, that to actually fix this problem might require a major re-write of the I/O routines, requiring many months of effort. Since the problem has been eliminated with version 6, and is not a problem common to all VAX installations it will be impossible to get a commitment to undertake such a fix.

	The drivers referred to are of course Oracle's (note how easy it could be for a reader to misinterpret this to mean VMS drivers).

	Basically Oracle gave the position that Oracle V5.1.22 would NOT be fixed. Oracle feels Version 6 cures the problem and that customers should upgrade.

	Wayne Oakley from Calgary, who has been giving the primary technical assistance from the Digital side, has managed to re-create the problem on all forms of system configurations, including an HSC configuration which mirrored the hardware configuration at Oracle in Belmont. Wayne believes that Oracle is not correctly utilizing the global sections in VMS, resulting in a synchronization problem.

	Since Oracle has flatly refused to fix the Oracle V5.1.22 problem, MTS has decided to upgrade to a 6420 and to initially run it in single processor mode to get the required cpu power they had anticipated from the 6320. This will allow the application to run with its current load until they upgrade to Oracle V6 (or convert to RDB??? wouldn't it be nice? there are rifts in the MTS ranks so the conversion issue could go either way. If they do convert, I'll add one last entry that will indicate this.)

	One final note: Petro Can, which was the first site to experience this wonderful problem, still had not heard any response from Oracle on the source of the problem as of Christmas. I wonder what the new year will bring for them (keep posted to note 397).

	Harry Sachs (at Winnipeg)