T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
4972.1 | | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Tue Jan 28 1997 09:57 | 10 |
| Nobody should get "locked out" just because the application is now
running on a different node. They could, however, see some very long
pauses. Is that what they mean by locked out?
There are tons of ways to tweak lock remastering. The simple,
big-hammer approach is to set PE1 to a low number (like 1) on all
nodes. This prevents lock trees with more than one lock in them from
getting remastered across the cluster. If they do this, they will
need to be sure they open the db on the "user" node first so that it
will become the master of the locks for the db.
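As a rough sketch only, this is how PE1 might be checked and changed from SYSGEN on one node. It assumes PE1 is a dynamic parameter on the VMS version in use, so that WRITE ACTIVE takes effect without a reboot; the MODPARAMS.DAT follow-up is likewise an assumption about how the site keeps its settings. Repeat on every node.

```dcl
$ ! Sketch: show the current PE1, then set it to 1 on the running system.
$ ! Assumption: PE1 is dynamic here, so no reboot is needed.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW PE1
SYSGEN> SET PE1 1
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$ ! Assumed follow-up: add "PE1 = 1" to SYS$SYSTEM:MODPARAMS.DAT so
$ ! the value is kept the next time AUTOGEN runs.
```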
|
4972.2 | Remember stats is node specific | BOUVS::OAKEY | I'll take Clueless for $500, Alex | Tue Jan 28 1997 10:20 | 9 |
| <<< Note 4972.0 by BROKE::BASTINE >>>
-< Lock Re-mastering and Rdb >-
>on the other node all complained about being locked out. She said she used
>the rmu/show stat's but couldn't find the blocker. She stopped the report
What was she looking at in stats? I'd run stats on the node which the
users are stalled on and see what they're stalled for and go from there.
|
4972.3 | Thanks... | BROKE::BASTINE | | Tue Jan 28 1997 10:36 | 25 |
| >What was she looking at in stats? I'd run stats on the node which the
>users are stalled on and see what they're stalled for and go from there.
She was in RMU/SHOW STAT on the users' node, and looking in the process
information. She typed L (for locks) and saw that the blocker was an
ACMS process, but she said that is NEVER the case and didn't believe it. She
said that the blockers are usually batch jobs. When she killed the query
it took a while before things got back to normal, but that batch job was the
only thing she thought could have been the culprit.
Paul, what/why would they experience "pauses"?
The customer is going to set PE1 to 1 and try the "cluster access" again. I
explained that we didn't think it would have been this batch job. Given that
the stats screen called out an ACMS process as the culprit, could it have
been coincidence that the problem occurred when running the batch job?
She said that this batch job wouldn't use ACMS, so I don't think they are
related. Anyway, thanks for your quick replies. The customer will call back
if they find it happens again. If it does, it sure would help to know why
pauses are expected.
Thanks,
Renee
|
4972.4 | | M5::LWILCOX | Chocolate in January!! | Tue Jan 28 1997 10:38 | 9 |
| <<< Note 4972.3 by BROKE::BASTINE >>>
-< Thanks... >-
>>Paul, what/why would they experience "pauses"?
Long verb perhaps?
Liz (not Paul)
|
4972.5 | | HOTRDB::LASTOVICA | Is it possible to be totally partial? | Tue Jan 28 1997 10:49 | 8 |
| When you start working in a cluster, several things happen. First,
you'll find that a 'remote' lock request (where the lock request
requires access to a remote node) can be tens or hundreds of times
slower than a local lock request. This will slow down locking, plain
and simple. Second, if global buffers are being used, having remote
accessors to the database may cause additional disk I/O. However, I
suspect that some simple detective work with RMU/SHO STAT will show the
scoop.
|
4972.6 | | HOTRDB::PMEAD | Paul, [email protected], 719-577-8032 | Tue Jan 28 1997 11:22 | 3 |
| A long pause can occur when VMS decides to migrate mastering of a lock
tree from one node in a cluster to another node. The bigger the lock
tree, the longer the pause.
|
4972.7 | vms process states | UKVMS3::SHISCOCK | stand and deliver | Tue Jan 28 1997 11:29 | 2 |
| With lock remastering you may also see processes in the RWSCS and/or
RWCLU states.
|
4972.8 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Tue Jan 28 1997 11:48 | 9 |
| >She was in RMU/SHOW STAT on the users' node, and looking in the process
>information. She typed L (for locks) and saw that the blocker was an
>ACMS process, but she said that is NEVER the case and didn't believe it.
Trust me - if SHOW STATS says it is an ACMS process, then it *is* :-)
I get this information direct from VMS, so it had better be accurate...
Rick
|
4972.9 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Tue Jan 28 1997 14:53 | 13 |
| Typical causes of "long pauses" are the following:
1. DEADLOCK_WAIT sysgen parameter set to "10" (probably should be "1" or "2").
2. Dynamic Lock remastering (set PE1 sysgen parameter to "0")
3. Lock serialization (an application problem)
4. Cluster transition (new node joining or old node leaving)
5. Doing CTRL-Y and NOT issuing STOP or EXIT command from DCL
6. Pausing the STATS screen
7. New pot of coffee ("the pause that refreshes")
8. Amnesia
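For items 1 and 2, a minimal sketch of looking at the two sysgen parameters on a node. It only displays the active values and changes nothing. SYSGEN parameters are set per node, so it would need to be run on each cluster member.

```dcl
$ ! Sketch: display the active DEADLOCK_WAIT and PE1 values on this node.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW DEADLOCK_WAIT
SYSGEN> SHOW PE1
SYSGEN> EXIT
```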
Rick
|
4972.10 | RMU/SHOW LOCKS/MODE=BLOCKING | NOVA::BRYDEN | | Tue Jan 28 1997 15:01 | 4 |
| What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on
both nodes? That could tell us what resource it is waiting for.
Dave
|
4972.11 | | NOVA::R_ANDERSON | Oracle Corporation (603) 881-1935 | Tue Jan 28 1997 15:29 | 7 |
| > What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on
> both nodes? That could tell us what resource it is waiting for.
This should not be necessary, since the local node knows about the resource by
virtue of the stall...
Rick
|
4972.12 | still.... | NOVA::BRYDEN | | Tue Jan 28 1997 19:21 | 6 |
| It would still be interesting to know what the ACMS process was
stalled on. All the user said was that ACMS was the process and
that never locks anything... maybe if we knew what was being
locked it might shed some light on the problem.
Dave
|
4972.13 | Wow!! What a great response! Thanks! | BROKE::BASTINE | | Tue Jan 28 1997 19:58 | 10 |
| Well, she hasn't called back yet, so:
1: She didn't fire it up again
2: She did fire it up again and it ran just fine
3: She found out why the ACMS process was stuck and is afraid to call back! :)
Thanks for all the answers/replies. They are truly helpful, if not to the
customer, then at least because they all confirmed what I *thought*. :)
Renee
|