T.R | Title | User | Personal Name | Date | Lines |
---|
5029.1 | | M5::JHAYTER | | Fri Feb 14 1997 14:41 | 9 |
|
> Dynamic lock remastering is disabled (PE1 was set to 25 and 50)
try using 1 or (%xFFFFFFFF) -1
> Why is Rdb not solving the deadlock in this particular case?
Rdb does not detect deadlocks. The VMS lock manager does and it notifies
Rdb.
|
5029.2 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Sat Feb 15 1997 08:04 | 4 |
| Also, Rdb handles "page" deadlocks internally - they are not normally
returned to the application.
Rick
|
5029.3 | How does it solve the deadlock? | NLVMS2::VVISSER | Vincent Visser, Oracle Rdb Support, The Netherlands | Mon Feb 17 1997 04:53 | 17 |
|
>Also, Rdb handles "page" deadlocks internally - they are not normally
>returned to the application.
>
>Rick
This is exactly what it should do. The application is not getting
any deadlock error, but when you look at the RMU/SHOW LOCK/MODE=BLOCKING
output there are deadlocks.
It looks as if Rdb doesn't correctly handle "page" deadlocks
internally.
How does it solve a deadlock with a page lock and a freeze lock
involved? Who will be chosen as the victim?
Regards,
Vincent
|
5029.4 | | ukvms3.uk.oracle.com::PJACKSON | Oracle UK Rdb Support | Mon Feb 17 1997 05:08 | 15 |
| > This is exactly what it should do. The application is not getting
> any deadlock error, but when you look at the RMU/SHOW LOCK/MODE=BLOCKING
> output there are deadlocks.
This shows that VMS has not chosen one of the lock requests to abort.
When it does the $ENQ returns an error and the request will no longer
be outstanding.
> How does it solve a deadlock with a page lock and a freeze lock
> involved? Who will be chosen as the victim?
VMS does the choosing (based on a value supplied by Rdb). Until VMS
chooses one, Rdb can do nothing.
Peter
|
5029.5 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Mon Feb 17 1997 06:13 | 6 |
| Check your DEADLOCK_WAIT sysgen parameter.
I like to have it set to "1" or "2" (the default is "10" seconds, which is
horrendous for any real-world application).
Rick
|
5029.6 | Gotta know your system and the OpenVMS lock manager... | BOUVS::OAKEY | I'll take Clueless for $500, Alex | Mon Feb 17 1997 11:31 | 31 |
| ~~Note 5029.4 Deadlock on page not resolved? 4 of 5
~~ukvms3.uk.oracle.com::PJACKSON "Oracle UK Rdb Suppo" 15 lines 17-FEB-1997 05:08
~~
~~ This shows that VMS has not chosen one of the lock requests to abort.
~~ When it does the $ENQ returns an error and the request will no longer
~~ be outstanding.
Not quite true. When OpenVMS detects a deadlock, it signals the victim but
does nothing to the pending request. It is up to the victim to $ENQ to a
lesser lock mode (or $DEQ) to remove the request from the appropriate
pending/conversion queue.
~~Note 5029.5 Deadlock on page not resolved? 5 of 5
~~NOVA::R_ANDERSON "Oracle Corporation (603) 881-1935" 6 lines 17-FEB-1997 06:13
~~
~~I like to have it set to "1" or "2" (the default is "10" seconds, which is
~~horrendous for any real-world application).
Here is where I might disagree a bit. DEADLOCK_WAIT is a SYSGEN parameter.
Tweaking it affects the entire system. Setting it to 1 or 2 will help
quickly identify true deadlocks. However, you may be causing the system to
check an excessive number of potential deadlocks in the deadlock queue that
aren't really deadlocks, just pending lock requests. You should evaluate
your system to make sure that you aren't waiting an excessive amount of
time to find real deadlocks but also to make sure you're not checking too
quickly and using up system resources checking for potential deadlocks that
aren't.
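To put rough numbers on that trade-off, here is a minimal sketch. It assumes a simplified model in which every lock wait that lasts at least DEADLOCK_WAIT seconds triggers one deadlock search; the wait durations are invented for illustration and `searches_triggered` is a hypothetical helper, not anything in OpenVMS or Rdb.

```python
def searches_triggered(wait_times_s, deadlock_wait_s):
    """Count lock waits that last at least DEADLOCK_WAIT seconds.

    Simplified model: each such wait causes one deadlock search,
    whether or not a real deadlock exists.
    """
    return sum(1 for w in wait_times_s if w >= deadlock_wait_s)


# Hypothetical lock wait durations (seconds) seen in some interval.
waits = [0.2, 0.5, 1.5, 3.0, 4.0, 12.0]

for setting in (1, 2, 10):
    count = searches_triggered(waits, setting)
    print(f"DEADLOCK_WAIT={setting}: {count} searches")
```

Feeding in wait times measured on your own system shows how many extra searches a lower setting would cost.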
|
5029.7 | | 138.3.209.29::PJACKSON | Oracle UK Rdb Support | Mon Feb 17 1997 12:05 | 9 |
| >Not quite true. When OpenVMS detects a deadlock, it signals the victim but
>does nothing to the pending request. It is up to the victim to $ENQ to a
>lesser lock mode (or $DEQ) to remove the request from the appropriate
>pending/conversion queue.
That's not what my VMS internals manual says. It says the lock request
fails.
Peter
|
5029.8 | I think we said the same thing :) | BOUVS::OAKEY | I'll take Clueless for $500, Alex | Mon Feb 17 1997 12:25 | 16 |
| ~~ <<< Note 5029.7 by 138.3.209.29::PJACKSON "Oracle UK Rdb Support" >>>
~~ That's not what my VMS internals manual says. It says the lock request
~~ fails.
Which doesn't really disagree with what I said. When you request a lock
with WAIT and the request is not immediately granted, you're placed in
either the waiting or conversion queue (depending on the previous state of
the lock) *and* the timeout queue. When you've been in the timeout queue
deadlock_wait length of time, OpenVMS will check to see if your lock
request participates in a deadlock. If so, then one of the deadlock
participators is signalled as the victim and their lock request returns a
deadlock error. That doesn't mean they're removed from the waiting or
conversion queue, you've got to $ENQ to a more permissive mode for that to
happen.
|
5029.9 | | NOVA::GODFRIND | Oracle Rdb Engineering | Mon Feb 17 1997 12:46 | 49 |
| >~~ This shows that VMS has not chosen one of the lock requests to abort.
>~~ When it does the $ENQ returns an error and the request will no longer
>~~ be outstanding.
>
>Not quite true. When OpenVMS detects a deadlock, it signals the victim but
>does nothing to the pending request. It is up to the victim to $ENQ to a
>lesser lock mode (or $DEQ) to remove the request from the appropriate
>pending/conversion queue.
Ahem. I beg to disagree (and agree with Peter). The lock request for which the
deadlock error gets reported does get removed from the queue it was waiting in
(and put back in its prior state if necessary).
However, the other locks that the victim process may have (and that are
blocking the other processes, causing the deadlock condition) do NOT get
removed automatically. It is up to the application to do the right thing
(usually roll back the current transaction).
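The distinction can be modelled in a few lines. This is only a toy sketch of the behaviour described above, not lock manager code; `Process`, `deliver_deadlock_error` and `rollback` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    name: str
    granted: set = field(default_factory=set)  # resources currently held
    waiting_for: Optional[str] = None          # resource of the pending request


def deliver_deadlock_error(victim):
    """Cancel only the victim's pending request and return its resource."""
    cancelled = victim.waiting_for
    victim.waiting_for = None  # the request leaves the wait/conversion queue
    return cancelled           # locks already granted are left untouched


def rollback(proc):
    """Application-side cleanup: give up the locks that block others."""
    proc.granted.clear()


server = Process("ACMS_1", granted={"page 1"}, waiting_for="freeze")
deliver_deadlock_error(server)
print(server.granted)  # still holds "page 1" until rollback() is called
```

Until `rollback` runs, the victim keeps the locks that caused the deadlock, which is why the application still has to act on the error.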
>~~I like to have it set to "1" or "2" (the default is "10" seconds, which is
>~~horrendous for any real-world application).
>
>Here is where I might disagree a bit. DEADLOCK_WAIT is a SYSGEN parameter.
>Tweaking it affects the entire system. Setting it to 1 or 2 will help
>quickly identify true deadlocks. However, you may be causing the system to
>check an excessive number of potential deadlocks in the deadlock queue that
>aren't really deadlocks, just pending lock requests. You should evaluate
>your system to make sure that you aren't waiting an excessive amount of
>time to find real deadlocks but also to make sure you're not checking too
>quickly and using up system resources checking for potential deadlocks that
>aren't.
I beg to agree. Deadlock searches are pretty costly - not so much that they use
CPU, but that they use kernel mode CPU at elevated IPL (IPL8), which may disturb
other system functions.
I tend to think that setting deadlock wait to a low number provides fast
relief, but does not cure the real problem. It acts like a pain killer, but you
still need to see the doctor. A large number of deadlocks (even if they are
handled internally by Rdb) is bad and needs investigating.
That said, we are straying away from the base problem. From the look of it,
two ACMS servers were waiting for the freeze lock, held by a recovery process,
which itself was waiting for a page (page #1 in some area), held by those two
processes.
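That situation can be written down as a wait-for graph. The sketch below is only an illustration: the process names and priority numbers are made up, and the "lowest priority in the cycle loses" rule is an assumption about how the deadlock priority is used, not the actual OpenVMS algorithm.

```python
def find_cycle(waits_for, start):
    """Return the processes on a wait-for cycle through start, or None."""
    path = []
    on_path = set()

    def visit(node):
        path.append(node)
        on_path.add(node)
        for blocker in waits_for.get(node, ()):
            if blocker == start:
                return True  # closed the loop back to start
            if blocker not in on_path and visit(blocker):
                return True
        path.pop()
        on_path.discard(node)
        return False

    return list(path) if visit(start) else None


# Who waits for whom: the servers want the freeze lock held by the DBR,
# and the DBR wants page 1 held by both servers.
waits_for = {
    "ACMS_1": ["DBR"],
    "ACMS_2": ["DBR"],
    "DBR": ["ACMS_1", "ACMS_2"],
}

# Hypothetical deadlock priorities; only the ordering matters here.
priority = {"ACMS_1": 1, "ACMS_2": 1, "DBR": 2}

cycle = find_cycle(waits_for, "DBR")
victim = min(cycle, key=priority.get)  # assumed rule: lowest priority loses
print(cycle, victim)
```

With the DBR ranked above the user processes, the victim in this model is one of the ACMS servers.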
I am not sure what should have happened. The DBR should have a deadlock
priority lower than the monitor but higher than all user processes, so any
deadlock error should have been reported to the ACMS servers (probably a
"deadlock on freeze" error).
|
5029.10 | | ukvms3.uk.oracle.com::PJACKSON | Oracle UK Rdb Support | Mon Feb 17 1997 12:52 | 31 |
| >~~ That's not what my VMS internals manual says. It says the lock request
>~~ fails.
>
>Which doesn't really disagree with what I said.
It does as I read it.
>When you request a lock
>with WAIT and the request is not immediately granted, you're placed in
>either the waiting or conversion queue (depending on the previous state of
>the lock) *and* the timeout queue. When you've been in the timeout queue
>deadlock_wait length of time, OpenVMS will check to see if your lock
>request participates in a deadlock. If so, then one of the deadlock
>participators is signalled as the victim and their lock request returns a
>deadlock error. That doesn't mean they're removed from the waiting or
>conversion queue, you've got to $ENQ to a more permissive mode for that to
>happen.
If the request is still queued then it has not failed - it may yet
succeed.
Two sentences earlier the manual says 'VMS resolves deadlocks by
choosing a participant in the deadlock cycle and refusing that
participant's lock request', which also seems incompatible with the
request remaining queued.
It may be that the manual is wrong. I haven't been able to find
anything more recent than 1989 - some manuals went missing in the last
office move :-(
Peter
|
5029.11 | | ukvms3.uk.oracle.com::PJACKSON | Oracle UK Rdb Support | Mon Feb 17 1997 12:56 | 9 |
| >I tend to think that setting deadlock wait to a low number provides fast
>relief, but does not cure the real problem. It acts like a pain killer, but you
>still need to see the doctor. A large number of deadlocks (even if they are
>handled internally by Rdb) is bad and needs investigating.
I normally consider deadlocks to be a side effect of a locking problem.
Fix the locking problem and the deadlocks go away by themselves.
Peter
|
5029.12 | Small nit | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Mon Feb 17 1997 13:23 | 4 |
| I don't want to lead things off on a big tangent, but it is possible
for a user process doing a rollback to have deadlock priority higher
than DBR. This can occur for brief periods on page deadlocks.
Rollbacks proceed regardless of whether DBRs are running.
|
5029.13 | Back to the real question. | NLVMS2::VVISSER | Vincent Visser, Oracle Rdb Support, The Netherlands | Mon Feb 17 1997 14:55 | 12 |
| Back to the real question.
Suppose that, because of the deadlock priority, VMS chooses the
pagelock as the victim and gives a deadlock error back to Rdb.
How does it solve this deadlock? When two pagelocks are involved it can
release all the pagelocks, but can Rdb decide to release the
freeze lock? This is the only way to get out of this situation when the
page lock has been chosen as victim.
Could it be that this is the problem why it didn't get out of the
situation?
Regards,
Vincent
|
5029.14 | | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Mon Feb 17 1997 16:45 | 12 |
| Any process that gets a deadlock on a page will flush any modified
buffers and reduce the remaining page locks to the minimum required
level to indicate that the process is still looking at a page. It then
temporarily boosts its deadlock priority to a high enough level that it
will almost always win in any deadlock conflict (even with a DBR).
This activity can iterate forever until all processes involved in the
deadlock have unmarked all of their buffers and minimized all of their
page locks. At some point there should no longer be a conflict.
As far as I know unmarking all buffers is always enough to allow the
competing process (such as a DBR) to get a copy of the page in question
and thus resolve the deadlock.
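As a rough sketch of that sequence (not Rdb source): the names `Proc` and `on_page_deadlock`, the placeholder mode "MIN" and the boosted priority value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    priority: int
    page_locks: dict = field(default_factory=dict)  # page number -> lock mode
    marked: set = field(default_factory=set)        # pages with modified buffers


def on_page_deadlock(proc, boosted_priority=100):
    """Apply the steps described above to a process hit by a page deadlock."""
    flushed = sorted(proc.marked)
    proc.marked.clear()                  # flush (write out) modified buffers
    for page in proc.page_locks:
        proc.page_locks[page] = "MIN"    # placeholder for the minimum level
    proc.priority = boosted_priority     # temporary boost for the retry
    return flushed


server = Proc("ACMS_1", priority=1, page_locks={1: "PW", 7: "PR"}, marked={1})
print(on_page_deadlock(server), server.page_locks, server.priority)
```

Once every process in the cycle has gone through this, nobody holds a page in a mode that blocks the others.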
|
5029.15 | | ukvms3.uk.oracle.com::PJACKSON | Oracle UK Rdb Support | Tue Feb 18 1997 04:24 | 17 |
| > Back to the real question.
> Suppose that, because of the deadlock priority, VMS chooses the
> pagelock as the victim and gives a deadlock error back to Rdb.
> How does it solve this deadlock? When two pagelocks are involved it can
> release all the pagelocks, but can Rdb decide to release the
> freeze lock? This is the only way to get out of this situation when the
> page lock has been chosen as victim.
> Could it be that this is the problem why it didn't get out of the
> situation?
No, because VMS has not given a deadlock back to Rdb. If it had, you
would not be able to see the deadlock situation using rmu/show locks
(assuming that Albert and I are correct).
If what you are suggesting had happened there would be no process waiting
for the page lock, and that lock request would have been rejected.
Peter
|
5029.16 | another 'deadlock'.... | NLVMS3::ADRIEL | | Thu Feb 20 1997 12:03 | 46 |
| Oracle Rdb V6.1-04 VAX/VMS V6.1
Hi,
The same customer again encountered a hang condition last night, which
could only be resolved by killing one of the processes.
An operator is warned when the (7x24) application 'hangs' for more than
30 minutes, after which he has to 'solve' the problem as quickly as possible.
Below is the RMU output from just before the ACMS process was killed.
This is the third time in a few weeks that such a 'deadlock' condition
has occurred.
We'll try to collect as much information as possible, but that's
difficult afterwards and with almost no time available to analyze
on-line.
Any further ideas? For example, is this related to the previous events?
Adri
================================================================================
SHOW LOCKS/MODE=BLOCKING Information
================================================================================
--------------------------------------------------------------------------------
Resource: page 1905
          ProcessID Process Name    Lock ID   System ID Requested Granted
          --------- --------------- --------- --------- --------- -------
Waiting:  00207639  ACMS001SP001000 579B0050  00090002  PR        NL
Blocker:  0020824C  BATCH_30....... 3B0007BB  00100001  PW        PW
.
.
.
--------------------------------------------------------------------------------
Resource: nowait signal
          ProcessID Process Name    Lock ID   System ID Requested Granted
          --------- --------------- --------- --------- --------- -------
Waiting:  0020824C  BATCH_30....... 0C001666  00090002  CW        PR
Blocker:  00207639  ACMS001SP001000 66003D3A  00100001  PR        PR
...
..
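When there is little time to analyze on-line, a saved copy of this output can be checked for processes that block each other. The sketch below assumes the line layout shown above (a "Resource:" line followed by "Waiting:" and "Blocker:" rows that start with an 8-digit hex process ID); `wait_edges` and `mutual_waits` are hypothetical helpers, not part of RMU.

```python
import re

# Rows copied from the output above; headers and rulers are omitted.
SAMPLE = """\
Resource: page 1905
Waiting: 00207639 ACMS001SP001000 579B0050 00090002 PR NL
Blocker: 0020824C BATCH_30....... 3B0007BB 00100001 PW PW
Resource: nowait signal
Waiting: 0020824C BATCH_30....... 0C001666 00090002 CW PR
Blocker: 00207639 ACMS001SP001000 66003D3A 00100001 PR PR
"""

ROW = re.compile(r"^(Waiting|Blocker):\s+([0-9A-F]{8})\b")


def wait_edges(text):
    """Return (waiter_pid, blocker_pid, resource) tuples from the output."""
    edges = []
    resource, waiters = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Resource:"):
            resource = line.split(":", 1)[1].strip()
            waiters = []
            continue
        match = ROW.match(line)
        if not match:
            continue  # header, ruler or elided line
        role, pid = match.groups()
        if role == "Waiting":
            waiters.append(pid)
        else:
            edges.extend((w, pid, resource) for w in waiters)
    return edges


def mutual_waits(edges):
    """Return pairs of process IDs that each wait on the other."""
    pairs = {(w, b) for w, b, _ in edges}
    return sorted((w, b) for w, b in pairs if (b, w) in pairs and w < b)


print(mutual_waits(wait_edges(SAMPLE)))
```

For the two resources shown, the two processes wait on each other: one on page 1905, the other on the nowait signal lock.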
|
5029.17 | | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Thu Feb 20 1997 13:06 | 10 |
| That one looks familiar. A deadlock on the nowait lock. The nowait
lock is one of the special "no deadlock search" locks.
I could swear someone reported that in this notesfile a year or so ago.
If my fuzzy memory serves me right I believe we asked to have the
problem reported.
Is your customer using fast commit? Do they use nowait txns? If so,
they might want to stop doing one or the other if this problem is
causing them a lot of grief -- at least until it can be fixed.
|