Conference orarep::nomahs::rdb_60

Title: Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice: RDB_60 is archived, please use RDB_70.
Moderator: NOVA::SMITHISON
Created: Fri Mar 18 1994
Last Modified: Fri May 30 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 5118
Total number of notes: 28246

4972.0. "Lock Re-mastering and Rdb" by BROKE::BASTINE () Tue Jan 28 1997 09:51

I have a question on Lock Re-mastering, PE1 and Rdb...

I had a customer call this morning who is looking to "cluster" her Rdb database
so that she can run reports on one node in the cluster and have users run on
the other node.  Right now they all live in harmony on the 1 node.  When she
opened the database on the second node and ran one of her reports, the users
on the other node all complained about being locked out.  She said she used
the rmu/show stat's but couldn't find the blocker.  She stopped the report
running on the other node and the database users seemed to function again.

According to the customer this query will run on the same node as the users
and not lock them out, so why would running it on another node in the cluster
lock them out?  She was told a while ago that if she used Rdb in a cluster,
she would need to tweak some SYSGEN parameter.  The only thing I could think
of would be the lock re-mastering parameter (PE1).  Right now it is set
to 0 on both nodes.  If she turns it off, by setting it to -1, will that help
the above situation?  

I tried to focus on the program a bit, asking if it declares a read only or
read/write transaction and she didn't know, but said when it runs on the one
node it doesn't lock anyone out and it is the same program.

Would lock re-mastering cause what she is seeing?

Renee

4972.1. "" by HOTRDB::PMEAD (Paul, [email protected], 719-577-8032) Tue Jan 28 1997 09:57 (10 lines)

    Nobody should get "locked out" just because the application is now
    running on a different node.  They could, however, see some very long
    pauses.  Is that what they mean by locked out?
    
    There are tons of ways to tweak lock remastering.  The simple,
    big-hammer, approach is to set PE1 to a low number (like 1) on all
    nodes.  This prevents lock trees with more than one lock in them from
    getting remastered across the cluster.  If they do this then they will
    need to be sure they open the db on the "user" node first so that it
    will become the master of the locks for the db.
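
A minimal sketch of the PE1 threshold as it is described in this thread, written in Python purely for illustration. The function name and the exact comparison are assumptions taken from the posts here (0 leaves remastering unrestricted, -1 turns it off, a small positive value such as 1 keeps trees with more locks than that value from moving); it is not an OpenVMS API, so check the documentation for your VMS version before relying on it.

```python
def may_remaster(pe1: int, locks_in_tree: int) -> bool:
    """Return True if a lock tree of this size could be remastered.

    Hypothetical model of the PE1 behaviour described in this thread,
    not an OpenVMS interface.
    """
    if pe1 < 0:
        # Negative value: remastering is turned off entirely.
        return False
    if pe1 == 0:
        # Zero: no size limit, so any tree may move.
        return True
    # Positive value: only trees with at most pe1 locks may move.
    return locks_in_tree <= pe1


if __name__ == "__main__":
    # Compare the three settings discussed in the thread for a large tree.
    for pe1 in (-1, 0, 1):
        print(pe1, may_remaster(pe1, locks_in_tree=5000))
```

Under this model a database-sized lock tree stays on whichever node mastered it first once PE1 is 1, which is why the order in which the database is opened matters.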

4972.2. "Remember stats is node specific" by BOUVS::OAKEY (I'll take Clueless for $500, Alex) Tue Jan 28 1997 10:20 (9 lines)

~~                      <<< Note 4972.0 by BROKE::BASTINE >>>
~~                         -< Lock Re-mastering and Rdb >-

~~on the other node all complained about being locked out.  She said she used
~~the rmu/show stat's but couldn't find the blocker.  She stopped the report

What was she looking at in stats?  I'd run stats on the node which the 
users are stalled on and see what they're stalled for and go from there.

4972.3. "Thanks..." by BROKE::BASTINE () Tue Jan 28 1997 10:36 (25 lines)

>What was she looking at in stats?  I'd run stats on the node which the 
>users are stalled on and see what they're stalled for and go from there.

She was in RMU/SHow STAT on the users node, and looking in the process
information.  She typed L (for locks) and saw that the blocker was an
ACMS process, but she said that is NEVER the case and didn't believe it.  She
said that the blockers are usually batch jobs.  When she killed the query
it took a while before things got back to normal, but that batch job was the
only thing she thought could have been the culprit.

Paul, what/why would they experience "pauses"?

The customer is going to set PE1 to 1 and try the "cluster access" again.  I
explained that we didn't think it would have been this batch job and given
the stat's screen called out an ACMS process as the culprit, could it have
been coincidence that the problem occurred when running the batch job?

She said that this batch job wouldn't use ACMS, so I don't think they are
related.  Anyway, thanks for your quick replies.  The customer will call back
if they find it happens again.  If it does, it sure would help to know why
pauses are expected.

Thanks,
Renee

4972.4. "" by M5::LWILCOX (Chocolate in January!!) Tue Jan 28 1997 10:38 (9 lines)

                      <<< Note 4972.3 by BROKE::BASTINE >>>
                                 -< Thanks... >-


>>Paul, what/why would they experience "pauses"?

Long verb perhaps?

Liz (not Paul)

4972.5. "" by HOTRDB::LASTOVICA (Is it possible to be totally partial?) Tue Jan 28 1997 10:49 (8 lines)

    when you start working in a cluster, several things happen.  First,
    you'll find that a 'remote' lock request (where the lock request
    requires access to a remote node) can be tens or hundreds of times
    slower than a local lock request.  This will slow down locking, plain
    and simple.  Second, if global buffers are being used, having remote
    accessors to the database may cause additional disk I/O.  However, I
    suspect that some simple detective work with RMU/SHO STAT will show the
    scoop.
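
A rough back-of-the-envelope sketch of the first point, in Python for illustration. The function name, the 10-microsecond local cost, and the 100x slowdown are all assumed figures chosen to match the "tens or hundreds of times slower" range above; they are not measurements from any real cluster.

```python
def total_lock_seconds(requests: int, remote_fraction: float,
                       local_us: float = 10.0, slowdown: float = 100.0) -> float:
    """Estimate total time spent on lock requests, in seconds.

    local_us and slowdown are assumed values for illustration only.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # A remote request costs the local cost multiplied by the slowdown.
    remote_us = local_us * slowdown
    local_total = requests * (1.0 - remote_fraction) * local_us
    remote_total = requests * remote_fraction * remote_us
    # Convert microseconds to seconds.
    return (local_total + remote_total) / 1_000_000


if __name__ == "__main__":
    # One million lock requests, all local versus all remote.
    print(total_lock_seconds(1_000_000, 0.0))
    print(total_lock_seconds(1_000_000, 1.0))
```

Even with these made-up numbers, the point is that the same workload can spend far longer in locking once its lock tree is mastered on the other node.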

4972.6. "" by HOTRDB::PMEAD (Paul, [email protected], 719-577-8032) Tue Jan 28 1997 11:22 (3 lines)

    A long pause can occur when VMS decides to migrate mastering a lock
    tree from one node in a cluster to another node.  The bigger the lock
    tree the longer the pause.

4972.7. "vms process states" by UKVMS3::SHISCOCK (stand and deliver) Tue Jan 28 1997 11:29 (2 lines)

    with lock remastering you may also see process states in RWSCS and/or
    RWCLU.

4972.8. "" by NOVA::R_ANDERSON (Oracle Corporation (603) 881-1935) Tue Jan 28 1997 11:48 (9 lines)

>She was in RMU/SHow STAT on the users node, and looking in the process
>information.  She typed L (for locks) and saw that the blocker was an
>ACMS process, but she said that is NEVER the case and didn't believe it.

Trust me - if SHOW STATS says it is an ACMS process, then it *is* :-)

I get this information direct from VMS, so it had better be accurate...

Rick

4972.9. "" by NOVA::R_ANDERSON (Oracle Corporation (603) 881-1935) Tue Jan 28 1997 14:53 (13 lines)

Typical causes of "long pauses" are the following:

1.  DEADLOCK_WAIT sysgen parameter set to "10" (probably should be "1" or "2").
2.  Dynamic Lock remastering (set PE1 sysgen parameter to "0")
3.  Lock serialization (an application problem)
4.  Cluster transition (new node joining or old node leaving)
5.  Doing CTRL-Y and NOT issuing STOP or EXIT command from DCL
6.  Pausing the STATS screen
7.  New pot of coffee ("the pause that refreshes")
8.  Amnesia

Rick

4972.10. "RMU/SHOW LOCKS/MODE=BLOCKING" by NOVA::BRYDEN () Tue Jan 28 1997 15:01 (4 lines)

        What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on
        both nodes? That could tell us what resource it is waiting for.
        
        Dave

4972.11. "" by NOVA::R_ANDERSON (Oracle Corporation (603) 881-1935) Tue Jan 28 1997 15:29 (7 lines)

>        What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on
>        both nodes? That could tell us what resource it is waiting for.

This should not be necessary, since the local node knows about the resource by
virtue of the stall...

Rick

4972.12. "still...." by NOVA::BRYDEN () Tue Jan 28 1997 19:21 (6 lines)

        It would still be interesting to know what the ACMS process was
        stalled on. All the user said was that ACMS was the process and
        that never locks anything... maybe if we knew what was being
        locked it might shed some light on the problem.
        
        Dave

4972.13. "Wow!! What a great response! Thanks!" by BROKE::BASTINE () Tue Jan 28 1997 19:58 (10 lines)

Well, she hasn't called back yet, so:

1: She didn't fire it up again
2: She did fire it up again and it ran just fine
3: She found out why the ACMS process was stuck and is afraid to call back! :)

Thanks for all the answers/replies.  They are truly helpful, if not to the
customer, then by virtue that they all confirmed what I *thought*. :)

Renee