Conference humane::scheduler

Title:	SCHEDULER
Notice:	Welcome to the Scheduler Conference on node HUMANEril
Moderator:	RUMOR::FALEK

Created:	Sat Mar 20 1993
Last Modified:	Tue Jun 03 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1240
Total number of notes:	5017

1180.0. "Decscheduler batch jobs come frequently in RWSCS " by BACHUS::BANKEN () Wed Nov 06 1996 05:14

Config : Cluster composed of two 7000s
	 OpenVMS 6.2
	 DECscheduler 2.1b-9

Problem : DECscheduler batch jobs frequently go into the RWSCS state

Observation : Let's say that node A is the default scheduler node and that 
it should run the jobs (no load balancing, no node restriction).
For batch jobs we observed that the processes go into RWSCS from time to time.
Under SDA we saw the following locks for the NSCHED process :

NodeA_SDA> sh proc nsched/lock
 
  NSCHED_796922473 (there are many locks of this kind)

  NSCHED_MASTER (the master copy of this lock is owned by node B)

Changing the default scheduler node moves all the NSCHED_7xxxx locks to the
(new) default node, but the master copy of NSCHED_MASTER stays where it
was.
 
Can someone give some information about these locks?

Many thanks,

Alain.

T.R	Title	User	Personal Name	Date	Lines
1180.1		RUMOR::FALEK	ex-TU58 King	Wed Nov 06 1996 14:53	10
    I don't have the source code, but I seem to remember that the
    NSCHED_nnn locks (nnn is the job number in decimal?) are taken out by
    batch mode scheduler jobs when they start. A message then gets sent to
    the default NSCHED, which queues a request for that lock, with a
    completion AST that executes if the request is granted. If the job
    terminates abnormally, NSCHED gets the lock and the AST signals it to do
    the appropriate cleanup.  So, the normal situation is for each currently
    executing batch mode job to hold a lock, and for the default NSCHED
    process to have blocked requests queued for all these locks.
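
    The pattern described above (the job holds a lock, and a watcher queues
    a blocked request whose completion routine fires once the holder goes
    away) can be illustrated with a small toy model. This is only a sketch
    of the general idea, not DECscheduler code and not the VMS lock manager
    interface: ToyLockManager, enqueue, process_exit and demo are all
    hypothetical names made up for the illustration, and the on_grant
    callback merely plays the role of the completion AST. The toy also does
    not distinguish normal from abnormal termination; it releases the lock
    whenever the holder exits.

```python
from collections import defaultdict, deque


class ToyLockManager:
    """Toy stand-in for a lock manager: one exclusive holder per lock name."""

    def __init__(self):
        self.holder = {}                   # lock name -> current owner
        self.waiters = defaultdict(deque)  # lock name -> queued (owner, callback)

    def enqueue(self, name, owner, on_grant=None):
        """Request an exclusive lock; return True if granted immediately."""
        if name not in self.holder:
            self.holder[name] = owner
            if on_grant:
                on_grant(name)
            return True
        # Lock is held by someone else: the request stays queued (blocked).
        self.waiters[name].append((owner, on_grant))
        return False

    def process_exit(self, owner):
        """Release every lock held by `owner` and grant each to the next waiter."""
        for name in [n for n, o in self.holder.items() if o == owner]:
            del self.holder[name]
            if self.waiters[name]:
                next_owner, callback = self.waiters[name].popleft()
                self.holder[name] = next_owner
                if callback:
                    callback(name)  # plays the role of the completion AST


def demo():
    """Run the job/watcher scenario and return what was observed."""
    cleaned_up = []
    mgr = ToyLockManager()
    lock_name = "NSCHED_796922473"  # lock name taken from the SDA output above

    # The batch job takes out its lock when it starts.
    mgr.enqueue(lock_name, owner="batch_job")

    # The watcher (standing in for the default NSCHED) queues a request for
    # the same lock; it cannot be granted while the job is alive.
    granted_now = mgr.enqueue(lock_name, owner="nsched",
                              on_grant=cleaned_up.append)

    # The job goes away; its lock is released and the watcher's request
    # completes, which triggers the cleanup callback.
    mgr.process_exit("batch_job")
    return granted_now, cleaned_up, mgr.holder[lock_name]
```

    In this model, one blocked request per running batch job is the steady
    state, which matches the many NSCHED_7xxxx locks seen under SDA.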