T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
177.1 | | AUSS::GARSON | DECcharity Program Office | Wed Feb 12 1997 21:06 | 12 |
| re .0
> Is anyone considering changing the units for this parameter in future
> VMS versions?
future => product manager
I wouldn't have thought the implied change was all that desirable.
Surely the customer can fix the application not to deadlock so much.
You might want to work out how long it takes to complete a deadlock
search.
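(Back-of-envelope only, with purely illustrative figures: a search has
to walk the wait-for graph, so its cost scales with the number of locks
it must visit. At, say, 100,000 locks outstanding and something like a
microsecond per lock visited at elevated IPL, one fruitless search
could cost on the order of 0.1 CPU seconds.)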
|
177.2 | | WHOS01::BOWERS | Dave Bowers, NSIS/IM | Thu Feb 13 1997 09:46 | 15 |
| future => product manager => name or e-mail addr?
The main problem is that the code is generated by TI's IEF CASE tool
and we really can't control it in any meaningful way. We know it's
lousy code in many ways, but we're stuck with it.
The overall system design (the other half of the problem) is likewise
cast in concrete.
The argument being made by the DBA is essentially that the 1 second
granularity was imposed when the box ran at .1% of the speed of the
current generation. On a 780, a second was a fairly brief interval. On
an 8400, it's forever.
\dave
|
177.3 | | AUSS::GARSON | DECcharity Program Office | Thu Feb 13 1997 20:28 | 37 |
| re .2
future => product manager => name or e-mail => note 7.2 in this conference
It sounds as if an IPMT should be raised for the customer.
> The argument being made by the DBA is essentially that the 1 second
> granularity was imposed when the box ran at .1% of the speed of the
> current generation. On a 780, a second was a fairly brief interval. On
> an 8400, it's forever.
The goal of DEADLOCK_WAIT was to define a time limit beyond which a
queued lock conversion or request was deemed likely to indicate
deadlock, rather than simply that a process holding the lock hadn't
finished with it, i.e. to limit the CPU time spent on *wasted* deadlock
searches.
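To make the policy concrete, here is a minimal sketch in C. This is
illustrative only, not the VMS lock manager source; the queue
representation and all names are invented.

#include <time.h>

/* Illustrative model of the DEADLOCK_WAIT policy: a waiting lock only
 * triggers a (costly) deadlock search once it has been queued longer
 * than the threshold; anything granted sooner never costs a search. */

typedef struct waiting_lock {
    time_t queued_at;               /* when the request was queued */
    struct waiting_lock *next;      /* link in the timeout queue   */
} waiting_lock;

static const double deadlock_wait = 10.0;   /* seconds (the default) */

static void run_deadlock_search(waiting_lock *w)
{
    /* placeholder: the real search walks the wait-for graph at high
     * IPL, which is exactly the cost DEADLOCK_WAIT tries to limit   */
    (void)w;
}

void scan_timeout_queue(waiting_lock *head, time_t now)
{
    for (waiting_lock *w = head; w != NULL; w = w->next)
        if (difftime(now, w->queued_at) > deadlock_wait)
            run_deadlock_search(w);
}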
It is therefore important to identify the limiting factor on
legitimate lock hold time. If there is *no* user interaction while
locks are held, then CPU is one factor but I/O may be another. Note that
this includes cluster communications I/O as well as disk I/O. All of
these have got faster, but CPU speed has increased by the greatest
factor. [If locks are held across user interaction then all bets are off.]
At the same time, since CPUs have got faster, one can afford to perform
more deadlock searches per unit time for the same CPU cost, regardless
of whether that CPU time is wasted. On the other hand, lock populations
have become larger, so each search costs more, which offsets this
effect somewhat.
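(Illustratively, with made-up numbers: if the CPU is 100x faster but
the lock population is 10x larger, each search costs about 10x less
CPU time than before, so one can afford roughly 10x the search rate,
not 100x.)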
So sub-second DEADLOCK_WAIT may be justified, though perhaps not to the
extent implied by the DBA. (I agree that a 10 second default and 1
second minimum look pretty conservative for a system doing perhaps a
thousand database transactions per second.)
At the very least they should confirm that when deadlock is not
occurring the locks are not held for a period of time exceeding what
they would propose for DEADLOCK_WAIT.
|
177.4 | | EEMELI::MOSER | Orienteers do it in the bush... | Wed Feb 19 1997 14:52 | 20 |
| whoa, I think this customer wants some more CPU cycle competition!
Why does he want to decrease DEADLOCK_WAIT? In order to be notified
of any deadlocks faster, i.e. so that a process issuing a $ENQW gets
an SS$_DEADLOCK back more quickly?
or
does he just have to burn some more CPU cycles? You will trigger many,
many more deadlock searches with a low DEADLOCK_WAIT value, and those
are very expensive: they run at high IPL, and no locks can be granted
to anybody while a search is in progress.
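(For the record, the first alternative looks like this from the
application's side. A minimal C sketch only, assuming a hypothetical
resource name and a naive retry policy; the $ENQW arguments and LKSB
layout follow the documented system service interface, everything else
is made up for illustration.)

#include <starlet.h>    /* sys$enqw */
#include <lckdef.h>     /* LCK$K_EXMODE */
#include <ssdef.h>      /* SS$_DEADLOCK, SS$_NORMAL */
#include <descrip.h>    /* $DESCRIPTOR */

/* Lock status block: with $ENQW the completion status arrives here,
 * so a deadlock victim finds SS$_DEADLOCK in lksb.status. */
static struct {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
} lksb;

int take_lock_with_retry(void)
{
    /* hypothetical resource name, for illustration only */
    $DESCRIPTOR(resnam, "MYAPP_HOT_PAGE");
    int status;

    for (;;) {
        status = sys$enqw(0, LCK$K_EXMODE, (void *)&lksb, 0, &resnam,
                          0, 0, 0, 0, 0, 0, 0);
        if (!(status & 1))
            return status;              /* service-level failure   */
        if (lksb.status != SS$_DEADLOCK)
            return lksb.status;         /* SS$_NORMAL when granted */
        /* chosen as the deadlock victim: back off and retry.  A
         * smaller DEADLOCK_WAIT only delivers this status sooner;
         * it does not make the deadlock go away. */
    }
}

The point being: lowering DEADLOCK_WAIT only changes how soon the
victim's $ENQW completes, the retry work stays the same.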
Bottom line: I don't see any valid point in lowering DEADLOCK_WAIT
below 1 sec. I occasionally lower it from the default of 10 sec down
to maybe 3 or 5 sec and then watch the timeout queue to see which
resources I might have contention on...
/cmos
|
177.5 | | AUSS::GARSON | DECcharity Program Office | Wed Feb 19 1997 17:39 | 8 |
| re .4
> does he just have to burn some more CPU cycles? you will trigger many
> many more deadlock searches with a low DEADLOCK_WAIT value.
Not unless there are locks timing out. And on the evidence submitted by
the customer, while there would be more deadlock searches, they would
not be wasted.
|
177.6 | | EEMELI::MOSER | Orienteers do it in the bush... | Thu Feb 20 1997 01:33 | 18 |
| re: .5
I'm still not convinced. Let's say your system is busy and you have
contention for a certain resource. On average a lock request has to
wait 0.7 sec for the lock to be granted. With a DEADLOCK_WAIT of 1 sec
this means that you normally wouldn't trigger a deadlock search for
this lock request, because it's removed from the timeout queue before
the next round of checks.
If DEADLOCK_WAIT were 0.5 sec, you would stumble over this lock and
trigger a search, which comes back and says "no deadlock" and takes the
lock off the queue; a fraction of a second later it would have been gone
anyway, because in the meantime the lock is granted.
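(To put rough, purely illustrative numbers on it: if such 0.7-sec waits
occur at, say, 100 per second, then with DEADLOCK_WAIT at 0.5 sec
essentially all of them cross the threshold, giving on the order of
100 no-deadlock searches per second, every one at high IPL; with
DEADLOCK_WAIT at 1 sec the same workload triggers none.)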
I call it waste if you have lots of deadlock searches but no deadlocks
found in the statistics.
/cmos
|
177.7 | | AUSS::GARSON | DECcharity Program Office | Thu Feb 20 1997 16:48 | 30 |
| re .6
> I'm still not convinced. Let's say your system is busy and you have
> contention for a certain resource. On average a lock request has to
> wait 0.7 sec for the lock to be granted.
> If DEADLOCK_WAIT were 0.5 sec, you would stumble over this lock and
> trigger a search, which comes back and says "no deadlock" and takes the
As I wrote in a prior reply...
"At the very least they should confirm that when deadlock is not occurring the
locks are not held for a period of time exceeding what they would propose for
DEADLOCK_WAIT."
How realistic is 0.7 sec for an average time that a lock is held? A
system that is supposed to complete 200 transactions per second or
even 100 transactions per second is going to struggle if that is the
correct average.
Here, by transaction I mean from the start of a database transaction to
the commit or rollback that drops all the locks. This may be less than
a business transaction but is the relevant definition in this case.
One needs to take parallelism in a system into account, so that 100
transactions per second does not mean each transaction lasts 0.01
second on average. The bottom line is that the customer should measure
the average transaction duration (or their database software can tell
them) before making a case for sub-second DEADLOCK_WAIT, but my gut
feeling is that their claim may have some merit.
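(The arithmetic is just Little's law: transactions in flight = rate x
average duration. With made-up numbers: at 100 transactions per second,
a 0.7 sec average duration would imply about 70 transactions in flight
at once; if the system actually runs only 10 or 20 in parallel, the
true average duration is more like 0.1 to 0.2 sec.)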
|
177.8 | | EEMELI::MOSER | Orienteers do it in the bush... | Fri Feb 21 1997 02:37 | 14 |
| the problem is not necessarily the hold/wait time of nicely behaving
locks, but all the others.
For example, if I want to see which directory files are busy, I just
lower DEADLOCK_WAIT and suddenly lots of F11B$ locks turn up, because
too many processes try to create/delete files in large directories,
and I can bet lots of money that those .DIR files are larger than
127 blocks.
So any of your transaction locks behaving well do not cause any
problems, but those others can trigger deadlock searches like hell,
and those searches will also hurt your transaction lock requests.
/cmos
|
177.9 | More info | WHOS01::BOWERS | Dave Bowers, NSIS/IM | Fri Feb 21 1997 11:25 | 19 |
| I went back to the DBA for more info. It appears that most of the
deadlocks are Rdb page locks. The real villain here is the application,
which was written using TI's IEF product (James Martin methodology):
1. IEF is remarkably naive regarding transaction control (like it uses
default transactions).
2. The design of the application has a primary process which writes
rows and then passes a key to a secondary process which further
processes and updates the row. This of course creates instant "hot
spots" on both data and index pages as both processes contend foir
access to the same page.
Neither of the above problems is amenable to a direct fix, so we're
looking to "tune" the system so as to minimize the mess. The good news,
if any, is that there is only this one (albeit large) application
running on the system.
\dave
|
177.10 | | EEMELI::MOSER | Orienteers do it in the bush... | Fri Feb 21 1997 13:19 | 12 |
| Do you know which Rdb pages? Always the same? Then it is pretty
likely to be an application issue.
If you're interested I have a tool which monitors the lock timeout
queue and logs information for locks hanging around there too long,
especially who they are blocking and who is waiting.
For Rdb locks it will translate them to something an Rdb expert
understands; grab it and have a look at TUIJA""::LOCK033.A
(works for VAX and Alpha and understands almost all Rdb lock types).
/cmos
|
177.11 | You must fix those deadlocks, not hide them | HERON::GODFRIND | Oracle Rdb Engineering | Fri Feb 28 1997 05:12 | 74 |
| If I may join the conversation ...
The real problem that needs fixing is the excessive number of deadlocks that
happen. Lowering DEADLOCK_WAIT would of course make it possible for VMS to
notice those deadlocks faster, but that is at the expense of potentially severe
side effects (as Christian pointed out) in terms of additional CPU usage at
high IPL.
In addition, when an Rdb process receives a deadlock error against one of its
pending page lock requests, it will demote all other page locks it may have,
which may involve writing pages back to disk that would otherwise have been
written lazily at a later stage. Also, there is a very good chance that the
same process will need to reacquire some of those locks right after giving
them up.
So, although page deadlocks in Rdb cause no application failure, and even
though adjusting DEADLOCK_WAIT to a lower value will seem to make the
application more responsive, they are still bad and should be kept to the
lowest possible level.
> <<< Note 177.9 by WHOS01::BOWERS "Dave Bowers, NSIS/IM" >>>
> I went back to the DBA for more info. It appears that most of the
> deadlocks are Rdb page locks. The real villain here is the application,
> which was written using TI's IEF product (James Martin methodology):
I myself have done extensive tuning of databases used by applications built
using IEF (although those applications did not require that level of
performance - more like the 10 to 15 TPS range).
> 1. IEF is remarkably naive regarding transaction control (like it uses
> default transactions).
That is not exactly true. Some level of control is available (via logical
names) over isolation levels and transaction mode (read-only vs. read/write)
for the readers.
> 2. The design of the application has a primary process which writes
> rows and then passes a key to a secondary process which further
> processes and updates the row. This of course creates instant "hot
> spots" on both data and index pages as both processes contend foir
> access to the same page.
> Neither of the above problems is amenable to a direct fix, so we're
> looking to "tune" the system so as to minimize the mess. The good news,
> if any, is that there is only this one (albeit large) application
> running on the system.
You may consider using alternate indexing techniques (such as hashing) to
better distribute records and index nodes and possibly avoid the contention.
Another point to watch is maybe giving up fast commit (or setting up the
primary process so that it checkpoints after each transaction). That way it
will give up its page locks after each transaction and let the secondary
process get at the pages without lock conflicts.
Also, adapting index node sizes should have an effect on contention (smaller
nodes may help).
All this is of course highly speculative. Accurate recommendations would
require a detailed look at the application and database design.
I recommend you get assistance from some experienced Rdb consultant. There are
quite a few available from Oracle and external sources.
Where is the customer located, BTW? Just send me mail offline and I may
be able to locate names for you ...
/albert
--
Albert Godfrind Oracle Rdb Engineering
Oracle Corporation Email: [email protected]
DEC European Technical Center [email protected]
950 Route des Colles Phone: +33/4/92.95.51.63
06901 Sophia-Antipolis Mobile: +33/6/09.97.27.23
France FAX: +33/4/92.95.50.50
|