T.R | Title | User | Personal Name | Date | Lines |
---|
452.1 | | SRFSUP::BREWIS | | Fri Oct 13 1989 20:50 | 13 |
| We have a customer in Los Angeles that currently has a custom project
tracking system developed in Oracle and it is running version 5.1.22.
The application is in pilot mode with 5-8 users simultaneously
accessing the database each day. At this point, the users have
not complained about response problems.
A few things will be happening that will give us better measurement
criteria. The customer plans to add more users and projects to
track (eventually growing to 30-50 users) AND the 3rd party consultant who
is maintaining the application plans to upgrade to V6. We should
begin to see some real results in the next month.
Rick
|
452.2 | Any other Oracle SMP sites out there? | CGOS01::HSACHS | | Tue Oct 17 1989 21:05 | 12 |
|
Oracle has offered one potential cause of the Oracle V5.1.22 problem
at the MTS and PetroCan sites: that they are
using KDB50 disk controllers, and not HSCs. I am desperately trying
to find any Oracle V5.1.22 sites running in an SMP environment,
especially a site that is using a KDB50 disk controller
with Oracle as a dedicated application on the system; added
bonus if it is on a 6320 as well. Are there any such sites
out there? Please respond. Thanks.
Harry
|
452.3 | Update on Petro-Canada 6320 situation | CGOO01::TULLIS | Craig Tullis | Tue Oct 17 1989 21:30 | 4 |
| I'm in a bit of a rush right now, but just to add to the .2 comment.
Oracle is telling Petro-Canada that the problem is with RMS and
KDBs. I'm trying to find out more information from them at this
point and will add to this note as the information comes in.
|
452.4 | Come on! | WIBBIN::NOYCE | Bill Noyce, FORTRAN/PARALLEL | Wed Oct 18 1989 17:07 | 2 |
| Right, next they'll tell them that the problem is the brown paint
on top of the cabinet...
|
452.5 | not the paint, but maybe something | CGOO01::TULLIS | Craig Tullis | Wed Oct 18 1989 18:57 | 12 |
| Actually, I was worried that they would somehow find a way to point
the finger at us, but I am beginning to think that they may have
some reason to say we are at least somewhat to "blame".
They have tried to duplicate the problem on their 6340 with HSCs
and have had no luck. We have seen the problem on two different
6320s with KDBs and on an 8350 with a KDB. If you look at that,
at least on the surface, it seems there may be a problem when there
are KDBs involved. They have, apparently, isolated the problem
to waiting for I/O completion (LEF 30). They were going to be doing
some further testing, but their office is in Belmont California
(not far from San Francisco) and so that could be delayed for a while.
|
452.6 | Could it be the earthquake ?? | MAIL::DUNCANG | Gerry Duncan @KCO | Wed Oct 18 1989 20:29 | 2 |
| And I'm sure that Oracle will somehow blame the earthquake for this
problem !!
|
452.7 | Quake will delay solution | CGOO01::TULLIS | Craig Tullis | Wed Oct 18 1989 21:28 | 6 |
| Actually Gerry, they are blaming the earthquake for the delay in
solving the problem.
I will keep everyone posted as results come in.
|
452.8 | | WIBBIN::NOYCE | Bill Noyce, FORTRAN/PARALLEL | Thu Oct 19 1989 15:39 | 7 |
| Are they really using Local Event Flag #30? This is in the "reserved
to VMS" range... I wonder if they follow all the necessary protocols
for sharing event flags, including using $SYNCH instead of just $WAITFR,
and setting the event flag after the test-clear-retest sequence?
The only difference I would expect with a KDB is that I/Os get started
faster. Maybe that's the real problem?
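To make the $WAITFR concern concrete, here is a minimal sketch in Python (not VMS code) of why waiting on a shared event flag alone can wake a process too early. It assumes a simplified model in which every I/O completion sets the same shared flag and fills in that request's own status block. The function names and the list-of-completions model are hypothetical, chosen only for illustration.

```python
def bare_flag_wait(completions):
    """Model of waiting on the shared flag only.

    `completions` lists request names in the order their I/Os finish.
    Every completion sets the same shared flag, so the waiter wakes on
    the first one, whether or not it is the waiter's own request.
    Returns the name of the request whose completion woke the waiter.
    """
    if not completions:
        return None  # flag never set: the waiter would stay asleep
    return completions[0]


def checked_wait(completions, my_request):
    """Model of a wait that also tests the request's own status block.

    After each wakeup the waiter checks its own status block. If it is
    still zero, the flag was set on behalf of another request, so the
    waiter clears the flag and waits again.
    """
    status_blocks = {}
    for finished in completions:
        status_blocks[finished] = 1           # completion fills in that status block
        if status_blocks.get(my_request, 0):  # is *my* I/O really done?
            return finished
    return None  # my request never completed
```

With completions arriving in the order A, B, a process waiting for B under the bare model wakes on A's completion, while the checked wait returns only once B itself has finished.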
|
452.9 | Other related conferences | TROA01::NAISH | RDB4ME Paul Naish DTN 631-3352 | Thu Oct 19 1989 15:40 | 6 |
| You may also wish to cross-post the local vs clustered disk issues
to one or more of the following:
Clusters ELKTRA::CLUSTER
Storage Arch SSAG::ASK_SSAG
VAX 6000 SASE::CALYPSO
|
452.10 | Oracle's $QIOs not word aligned | CIMNET::BOURDEAU | Rich Bourdeau CIM Product Marketing | Thu Oct 19 1989 21:41 | 13 |
|
I encountered a similar problem with CINCOM's Ultra database back in
1985. The problem is that the Oracle database is probably issuing
$QIOs that are not word aligned. It seems that the KDA, KDB, and
RQDX3 disk controllers do not support odd byte transfers of data. To
compensate for this, PUDRIVER allocates its own 512 byte word aligned
buffer. 512 bytes are then moved from the unaligned buffer to the
aligned buffer, and single block transfers are queued until the
original request has been satisfied. If this is the problem the
symptoms are very obvious. Split I/O will be very high. The solution
is for the database to issue word aligned $QIOs. This solved CINCOM's
problem. This may or may not be your problem, but it's worth looking
into.
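As a rough illustration of the arithmetic above, the following Python sketch counts how many transfers one request would turn into under the behavior described in this note. It assumes a simplified model: an aligned request goes out as one transfer, and an unaligned one is split into single 512-byte block transfers. The function names are hypothetical and this is not PUDRIVER's actual logic.

```python
BLOCK_SIZE = 512  # bytes per disk block, as in the note above


def is_word_aligned(address: int) -> bool:
    """A VAX word is 2 bytes, so a word-aligned buffer starts at an even address."""
    return address % 2 == 0


def transfers_needed(address: int, byte_count: int) -> int:
    """Transfers queued for one request under the simplified model.

    Aligned buffer: the request goes to the controller as one transfer.
    Unaligned buffer: data is copied through an aligned 512-byte buffer,
    so the request becomes one single-block transfer per 512 bytes.
    """
    if byte_count <= 0:
        return 0
    if is_word_aligned(address):
        return 1
    return -(-byte_count // BLOCK_SIZE)  # ceiling division
```

Under this model an 8192-byte request from an odd address becomes 16 single-block transfers instead of one, which is the kind of jump that would show up as a very high split I/O rate.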
|
452.11 | Oracle SMP test results | CGOS01::HSACHS | | Tue Nov 21 1989 20:11 | 92 |
|
Greetings All. There are 2 occurrences of this notes entry; they are
BISTRO::RDB_VMS_COMPETITION # 452
(also see # 397 for Petro Can site with same problem)
VAXWRK::VMSNOTES # 2979
You can refer to the above notes for more details on the problems at both
MTS and Petro Can.
These are the results of some testing that the Manitoba Telephone System (MTS)
has done to try and determine the nature of the Oracle V5.1.22 problem they
are experiencing on a VAX 6320 with VMS V5.1-1
The MTS configuration:
VAX6320 VMS V5.1-1 2 KDB50 controllers
4 RA90's Oracle V5.1.22
The problem:
When initially trying to IMPORT the Oracle database with both
CPU's running, the Oracle Application would go into permanent
hibernate state. If multiple processes were started, Oracle
would process, but spend large chunks of time in hibernate
state, still drastically affecting turn-around time.
The MTS test:
***** MTS has requested that this test remain CONFIDENTIAL and NOT
be released to ORACLE. *****
The above problem was observed when running in a stand-alone
system configuration. MTS, suspecting a timing problem with
Oracle, set up their system as a cluster to see if that would have
any effect on the problem. The results follow:
With the system in a cluster configuration, and both CPU's
enabled:
Oracle ran fine doing EXPORTS (with NO WRITES)
When doing IMPORTS (LOTSA WRITES) Oracle initially ran fine, however,
some periods in hibernation were observed. As time passed, the
time spent in hibernation increased. At about 2 hours, more
time was spent in hibernation than execution. At about 2 1/2
to 3 hours, Oracle again seemed to reach the permanent
hibernation stage.
When the second CPU was turned off, Oracle immediately started processing
again. If the 2nd CPU was turned on a short while later, the
hibernation seemed to continue where it left off when the CPU was
turned off. Interestingly enough, if the 2nd CPU was left off for
a longer period of time, such as 1/2 hr, Oracle somehow seemed to
reset itself; i.e., when the 2nd CPU was turned on after being shut
off for about 1/2 hour, Oracle would process fine again, with only
short periods in hibernation (and the hibernation periods would begin
growing again).
MTS described the hibernation period growth to be almost logarithmic in
nature.
ORACLE response:
Oracle's response has been that this is a DIGITAL RMS problem
in conjunction with the use of a KDB50. They have sent a test
program to Petro Can that they said would prove this. The initial
tests using this test program have shown only 28/100 of a second
difference between single-processor and SMP modes. NO long
hibernation periods.
Oracle reasons that Oracle V6 fixes the problem because
it doesn't use RMS calls. Also, no sites running Oracle V5.1.22
and using HSC's in their configuration have reported any problems.
I've found 2 sites using 6320's and KDB50's that have not noticed
any problems (yet), however, they are running other processes as
well. Oracle has not commented on why these sites do not seem to
have problems.
REQUESTS:
If these symptoms/observations give any ideas as to what the problem
might be....ALL SUGGESTIONS ARE VERY WELCOME !!!
If I have omitted any detail that may be of help, feel free to
E-mail me at CGOA01::SACHS
for any all-in-one users: HARRY SACHS @WNO
WARNINGS: Since multiple processes seem to knock Oracle out of hibernation,
and long dedicated runs (such as IMPORTS) make the problem most
obvious....many of your current customers using Oracle V5.1.22
may have this problem and not realize it. Be very leery of
any dedicated Oracle applications on SMP platforms. Oracle support
has not been very helpful to date in finding a resolution to
this problem (their stance is that Oracle V6 fixes the problem:
pay the bucks and upgrade).
Harry Sachs (at Winnipeg)
|
452.12 | MP-specific -- gotta be error in access to global section | WIBBIN::NOYCE | Bill Noyce, FORTRAN/PARALLEL | Wed Nov 22 1989 15:19 | 18 |
| This sounds to me as if somewhere deep inside Oracle, the software
is updating shared memory from different processes without using
an interlocked instruction. If they use a single VAX instruction
such as INCL to increment a memory location, this will work fine
on a single-processor system, but on a multiprocessor it can lose
some increments, like this:
        original value = 5
        processor 1             processor 2
        reads 5
        computes 5+1=6          reads 5
        writes 6                computes 5+1=6
                                writes 6
Or the problem could be more subtle. But it really sounds like
some kind of coding error in dealing with shared memory (on VMS
that's a global section). The general solution involves using
interlocked instructions such as ADAWI, BBSSI, INSQTI, etc.
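The lost increment shown above can be replayed deterministically. The sketch below is a Python model, not VAX code: it treats each increment as separate read, add, and write steps and runs them in a fixed order, so the result does not depend on thread timing. The function name, step names, and schedules are hypothetical and exist only for this illustration.

```python
def run_schedule(initial, schedule):
    """Replay (processor, step) pairs against one shared memory cell.

    Each processor keeps a private register. "read" loads the shared
    value, "add" increments the register, "write" stores it back.
    Returns the final value of the shared cell.
    """
    memory = initial
    registers = {}
    for cpu, step in schedule:
        if step == "read":
            registers[cpu] = memory
        elif step == "add":
            registers[cpu] += 1
        elif step == "write":
            memory = registers[cpu]
        else:
            raise ValueError(f"unknown step: {step}")
    return memory


# The interleaving from the table: processor 2 reads before processor 1 writes.
RACY = [(1, "read"), (1, "add"), (2, "read"),
        (1, "write"), (2, "add"), (2, "write")]

# What an interlocked increment guarantees: each read-add-write runs as a unit.
INTERLOCKED = [(1, "read"), (1, "add"), (1, "write"),
               (2, "read"), (2, "add"), (2, "write")]
```

Starting from 5, `run_schedule(5, RACY)` ends at 6 (one increment lost), while `run_schedule(5, INTERLOCKED)` ends at 7. On a uniprocessor a single instruction cannot be interleaved this way, which fits the problem appearing only with the second CPU enabled.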
|
452.13 | different source - different story | CGOWGS::OAKLEY | What am I doing here... | Mon Nov 27 1989 19:45 | 33 |
|
I tend to agree with .12 that Oracle is suffering from loss of sync
through use of writeable Global Sections.
I have been reproducing this problem on our NI Cluster and have found
that the problem occurs on our 8370 with a DWBUA UDA50 (45% speed
difference), KDB50 (25% speed difference) and NI Served RQDX3 (20%
speed difference). This makes the Oracle statement of not being able
to reproduce the problem on HSC's suspect, so we are scheduling some
time at a customer site to try it there (unless someone in DEC were to
volunteer their system for a short test).
The only test not done is to bring up the 8370 standalone and try the
UDA and KDB that way.
In our testing we found that the detached process that writes to the
database file runs very erratically in SMP but runs smoothly in
Uniprocessor. It also spends most of the time in HIB with a TQE
wakeup. The application flips between HIB, LEF and COM in Uniprocessor
but spends most of its time in HIB in SMP (but still does occasional
I/O).
At this time we are waiting to talk to an Oracle developer about their
code (which appears to be written mostly in C, so it is unlikely that
they make use of interlocked instructions).
Does anybody happen to know how one would go about closely monitoring a
detached process (a la PCA, but without PCA, since it's detached and started by
another process)?
wayne oakley
dtn:635-4359
|
452.14 | Don't give up on PCA | WIBBIN::NOYCE | Bill Noyce, FORTRAN/PARALLEL | Tue Nov 28 1989 15:56 | 10 |
| You can use PCA on a detached process, I think.
Link the image /DEBUG=PCA$COLLECTOR, and run it in an environment
where there's a PCA$STARTUP logical (possibly some other spelling)
that points to a command file containing
set datafile ...
set pc_sampling
:
go
or whatever measurements you want.
If necessary, you could probably make PCA$STARTUP a system logical.
|
452.15 | Looks like this one is solved | CGOO01::TULLIS | Craig Tullis | Thu Dec 07 1989 01:21 | 9 |
| Well, it looks like the problem may be finally "solved". Oracle
has written a letter to MTS (not to Petro-Canada yet) in which they
say that the problem lies with their I/O routines, and that to "fix"
it would require a major effort. So, their proposed solution is
to have the customer upgrade to Oracle version 6.
I will let Harry Sachs post any other news of MTS (they were
considering converting to Rdb).
|
452.16 | Make the bastards squirm !!! | SNO78C::BELAKHOV | The ORACLEBUSTER !!! | Thu Dec 07 1989 05:36 | 5 |
| I think that Digital should suggest to the customer that, as Oracle
has admitted the problem is theirs, Oracle should supply the V6
upgrade for free. (:-)
|
452.17 | Can we get the letter ? | MAIL::DUNCANG | Gerry Duncan @KCO | Thu Dec 07 1989 13:13 | 2 |
| Yippee !! Gotta have a copy of that letter !!! Can we get it ??
please, please, please.
|
452.18 | I don't know why we couldn't | CGOO01::TULLIS | Craig Tullis | Thu Dec 07 1989 18:21 | 7 |
| I can try to get it. Robin Dunn, Digital in Winnipeg, has a copy
of it in the Digital office there. We were copied on it, so I don't
see why we can't get it to you Gerry.
As an interesting side note: Oracle has not been talking to our
Customer Services person in Calgary since last Friday and they have
also said nothing about this to Petro-Canada either.
|
452.19 | the finishing touch | CGOS01::HSACHS | | Fri Dec 29 1989 22:30 | 51 |
|
Greetings All. There are 2 occurrences of this notes entry; they are
BISTRO::RDB_VMS_COMPETITION # 452
(also see # 397 for Petro Can site with same problem)
VAXWRK::VMSNOTES # 2979
You can refer to the above notes for more details on the problems at both
MTS and Petro Can.
This will be my final entry to summarize the results of the Oracle problem
that I have been dealing with.
On December 5, 1989 Digital was copied on a letter which was sent to MTS by
Oracle. The general overtone of the letter implied that the MTS problem was
still due to a problem with Digital hardware and/or software, however one
paragraph in the 3rd page states:
Oracle developers and Digital testing has indicated,
although inconclusively, that to actually fix this problem
might require a major re-write of the I/O routines, requiring
many months of effort. Since the problem has been eliminated
with version 6, and is not a problem common to all VAX installations
it will be impossible to get a commitment to undertake such
a fix.
The drivers referred to are of course Oracle's (note how easy it could be
for a reader to misinterpret this to be VMS drivers). Basically Oracle
gave the position that the Oracle V5.1.22 would NOT be fixed. Oracle feels
Version 6 cures the problem and that customers should upgrade.
Wayne Oakley from Calgary, who has been giving the primary technical
assistance from the Digital side, has managed to re-create the problem
on all forms of system configurations, including an HSC configuration
which mirrored the hardware configuration at Oracle in Belmont. Wayne
believes that Oracle is not correctly utilizing the global
sections in VMS resulting in a synchronization problem.
Since Oracle has flatly refused to fix the Oracle V5.1.22 problem,
MTS has decided to upgrade to a 6420 and to initially run it in the single
processor mode to get the required cpu power they had anticipated from
the 6320. This will allow the application to run with its current load
until they upgrade to Oracle V6 (or convert to RDB??? wouldn't it be nice?
there are rifts in the MTS ranks so the conversion issue could go either
way. If they do convert, I'll add one last entry that will indicate this.)
One final note, Petro Can, which was the first site to experience this
wonderful problem, still has not heard any response from Oracle on the
source of the problem as of Christmas. I wonder what the new year will
bring for them (keep posted to note 397).
Harry Sachs (at Winnipeg)
|