
Conference orarep::nomahs::rdb_60

Title:	Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:	RDB_60 is archived, please use RDB_70..
Moderator:	NOVA::SMITHISON

Created:	Fri Mar 18 1994
Last Modified:	Thu May 29 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	5118
Total number of notes:	28246

4972.0. "Lock Re-mastering and Rdb" by BROKE::BASTINE () Tue Jan 28 1997 09:51

I have a question on Lock Re-mastering, PE1 and Rdb...

I had a customer call this morning who is looking to "cluster" her Rdb database
so that she can run reports on one node in the cluster and have users run on
the other node.  Right now they all live in harmony on the 1 node.  When she
opened the database on the second node and ran one of her reports, the users
on the other node all complained about being locked out.  She said she used
the rmu/show stat's but couldn't find the blocker.  She stopped the report
running on the other node and the database users seemed to function again.

According to the customer this query will run on the same node as the users
and not lock them out, so why would running it on another node in the cluster
lock them out?  She was told a while ago that if she used Rdb in a cluster,
she would need to tweak some SYSGEN parameter.  The only thing I could think
of would be the lock re-mastering parameter (PE1).  Right now it is set
to 0 on both nodes.  If she turns it off, by setting it to -1, will that help
the above situation?  

I tried to focus on the program a bit, asking if it declares a read only or
read/write transaction and she didn't know, but said when it runs on the one
node it doesn't lock anyone out and it is the same program.

Would lock re-mastering cause what she is seeing?

Renee

T.R	Title	User	Personal Name	Date	Lines
4972.1		HOTRDB::PMEAD	Paul, [email protected], 719-577-8032	`Tue Jan 28 1997 09:57`	10
	Nobody should get "locked out" just because the application is now running on a different node. They could, however, see some very long pauses. Is that what they mean by locked out? There are tons of ways to tweak lock remastering. The simple, big-hammer, approach is to set PE1 to a low number (like 1) on all nodes. This prevents lock trees with more than one lock in them from getting remastered across the cluster. If they do this then they will need to be sure they open the db on the "user" node first so that it will become the master of the locks for the db.
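	A minimal sketch of the big-hammer approach described above. It assumes PE1 is a dynamic parameter on the customer's VMS version (so no reboot is needed), that the account has the privilege SYSGEN needs to write active parameters, and that the database root is called MYDB, which is a placeholder name. The SYSGEN session would be repeated on every node; the RMU/OPEN would be issued on the "user" node before any other node attaches.

```dcl
$ ! Run on each node in the cluster.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW PE1
SYSGEN> SET PE1 1
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
$ !
$ ! On the "user" node only: open the database before any other node
$ ! touches it.  MYDB is a placeholder for the real root file name.
$ RMU/OPEN MYDB
```

	WRITE ACTIVE changes only the running system. To keep the value across reboots, the usual route is a line such as PE1 = 1 in SYS$SYSTEM:MODPARAMS.DAT followed by an AUTOGEN run.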
4972.2	Remember stats is node specific	BOUVS::OAKEY	I'll take Clueless for $500, Alex	`Tue Jan 28 1997 10:20`	9
	<<< Note 4972.0 by BROKE::BASTINE >>>
	-< Lock Re-mastering and Rdb >-

	>on the other node all complained about being locked out.  She said she used
	>the rmu/show stat's but couldn't find the blocker.  She stopped the report

	What was she looking at in stats? I'd run stats on the node which the users are stalled on and see what they're stalled for and go from there.
4972.3	Thanks...	BROKE::BASTINE		`Tue Jan 28 1997 10:36`	25
	>What was she looking at in stats?  I'd run stats on the node which the
	>users are stalled on and see what they're stalled for and go from there.

	She was in RMU/SHOW STAT on the users' node, looking at the process information. She typed L (for locks) and saw that the blocker was an ACMS process, but she said that is NEVER the case and didn't believe it. She said that the blockers are usually batch jobs. When she killed the query it took a while before things got back to normal, but that batch job was the only thing she thought could have been the culprit.

	Paul, what/why would they experience "pauses"?

	The customer is going to set PE1 to 1 and try the "cluster access" again. I explained that we didn't think it was this batch job, given that the stats screen called out an ACMS process as the culprit. Could it have been coincidence that the problem occurred while the batch job was running? She said that this batch job wouldn't use ACMS, so I don't think they are related.

	Anyway, thanks for your quick replies. The customer will call back if they find it happens again. If it does, it sure would help to know why pauses are expected.

	Thanks,
	Renee
4972.4		M5::LWILCOX	Chocolate in January!!	`Tue Jan 28 1997 10:38`	9
	<<< Note 4972.3 by BROKE::BASTINE >>>
	-< Thanks... >-

	>>Paul, what/why would they experience "pauses"?

	Long verb perhaps?

	Liz (not Paul)
4972.5		HOTRDB::LASTOVICA	Is it possible to be totally partial?	`Tue Jan 28 1997 10:49`	8
	When you start working in a cluster, several things happen. First, you'll find that a 'remote' lock request (where the lock request requires access to a remote node) can be tens or hundreds of times slower than a local lock request. This will slow down locking, plain and simple. Second, if global buffers are being used, having remote accessors to the database may cause additional disk I/O. However, I suspect that some simple detective work with RMU/SHOW STAT will show the scoop.
4972.6		HOTRDB::PMEAD	Paul, [email protected], 719-577-8032	`Tue Jan 28 1997 11:22`	3
	A long pause can occur when VMS decides to migrate mastering a lock tree from one node in a cluster to another node. The bigger the lock tree the longer the pause.
4972.7	vms process states	UKVMS3::SHISCOCK	stand and deliver	`Tue Jan 28 1997 11:29`	2
	with lock remastering you may also see process states in RWSCS and/or RWCLU.
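	One rough way to look for these states while a stall is in progress, assuming an ordinary DCL session on the affected node. The scratch file name is arbitrary, and because the states are transient the check may need to be repeated a few times.

```dcl
$ ! Capture the process list, then look for cluster-related resource waits.
$ SHOW SYSTEM/OUTPUT=SYS$SCRATCH:PROCS.LIS
$ SEARCH SYS$SCRATCH:PROCS.LIS "RWSCS","RWCLU"
$ DELETE SYS$SCRATCH:PROCS.LIS;*
```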
4972.8		NOVA::R_ANDERSON	Oracle Corporation (603) 881-1935	`Tue Jan 28 1997 11:48`	9
	>She was in RMU/SHow STAT on the users node, and looking in the process
	>information.  She typed L (for locks) and saw that the blocker was an
	>ACMS process, but she said that is NEVER the case and didn't believe it.

	Trust me - if SHOW STATS says it is an ACMS process, then it is :-) I get this information direct from VMS, so it had better be accurate...

	Rick
4972.9		NOVA::R_ANDERSON	Oracle Corporation (603) 881-1935	`Tue Jan 28 1997 14:53`	13
	Typical causes of "long pauses" are the following:

	1. DEADLOCK_WAIT sysgen parameter set to "10" (probably should be "1" or "2").
	2. Dynamic Lock remastering (set PE1 sysgen parameter to "0")
	3. Lock serialization (an application problem)
	4. Cluster transition (new node joining or old node leaving)
	5. Doing CTRL-Y and NOT issuing STOP or EXIT command from DCL
	6. Pausing the STATS screen
	7. New pot of coffee ("the pause that refreshes")
	8. Amnesia

	Rick
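	The first item in the list above can be checked without changing anything. This is a sketch of a read-only SYSGEN session on one node; the same SHOW command works for any other parameter name, including PE1.

```dcl
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW DEADLOCK_WAIT
SYSGEN> EXIT
```

	SHOW reports the current value together with the default, minimum and maximum, which makes it easy to see whether a node is still running with the default setting.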
4972.10	RMU/SHOW LOCKS/MODE=BLOCKING	NOVA::BRYDEN		`Tue Jan 28 1997 15:01`	4
	What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on both nodes? That could tell us what resource it is waiting for.

	Dave
4972.11		NOVA::R_ANDERSON	Oracle Corporation (603) 881-1935	`Tue Jan 28 1997 15:29`	7
	> What happens if the customer runs RMU/SHOW LOCK/MODE=BLOCKING on
	> both nodes?  That could tell us what resource it is waiting for.

	This should not be necessary, since the local node knows about the resource by virtue of the stall...

	Rick
4972.12	still....	NOVA::BRYDEN		`Tue Jan 28 1997 19:21`	6
	It would still be interesting to know what the ACMS process was stalled on. All the user said was that ACMS was the process and that never locks anything... maybe if we knew what was being locked it might shed some light on the problem.

	Dave
4972.13	Wow!! What a great response! Thanks!	BROKE::BASTINE		`Tue Jan 28 1997 19:58`	10
	Well, she hasn't called back yet, so:

	1: She didn't fire it up again
	2: She did fire it up again and it ran just fine
	3: She found out why the ACMS process was stuck and is afraid to call back! :)

	Thanks for all the answers/replies. They are truly helpful, if not to the customer, then by virtue that they all confirmed what I thought. :)

	Renee