
Conference humane::scheduler

Title:SCHEDULER
Notice:Welcome to the Scheduler Conference on node HUMANEril
Moderator:RUMOR::FALEK
Created:Sat Mar 20 1993
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1240
Total number of notes:5017

1203.0. "Job does not load balance, why?" by HTSC19::KENNETH () Mon Feb 03 1997 02:28

Hi,

My customer is facing a rather strange problem: jobs are not load balanced
on their cluster even though the load-balance logical is turned on (we set it
in SYS$STARTUP:SCHEDULER$STARTUP and restarted the scheduler).

Jmax was originally 6 (the default), but for testing we set it to 2.  The
logical NSCHED$LOAD_BALANCE is already set to "ON", as shown below.

Jobs still execute on node JENA even when its Jmax is reached and its rating
is 0.  We also tried making MAGNA the default node, with the same result:
jobs always execute on MAGNA (the default) and are never passed to the other
node when Jmax is reached and the rating is 0.

Is there anything else we should check?  I have no experience with
Scheduler's load balancing; I have only read the manuals and experimented
on the customer's system.  Is there anything I have missed?  Do I need to
set up proxy access on the systems?  The logs are below.  Does anyone know
why this happens?


Thanks for your help in advance.

Kenneth Leung

================================================================================


SCHEDULE> SHO STAT
Node   Version  Started              Jobs  Jmax   Log  Pri Rating
JENA   V2.1b-9  30-JAN-1997 14:12:44    2     2     5    4      0 <-- Default
MAGNA  V2.1b-9  24-JAN-1997 22:02:23    0     2     5    4   2091



SCHEDULE> SHO STAT
Node   Version  Started              Jobs  Jmax   Log  Pri Rating
JENA   V2.1b-9  30-JAN-1997 14:12:44    3     2     5    4      0 <-- Default
MAGNA  V2.1b-9  24-JAN-1997 22:02:23    0     2     5    4   2883

SCHEDULE> SHO STAT
Node   Version  Started              Jobs  Jmax   Log  Pri Rating
JENA   V2.1b-9  30-JAN-1997 14:12:44    4     2     5    4      0 <-- Default
MAGNA  V2.1b-9  24-JAN-1997 22:02:23    0     2     5    4   1963
SCHEDULE>




  "NSCHED$" = "SYS$COMMON:[NSCHED]"
  "NSCHED$CLEAR_RESTART_PARAM" = "TRUE"
  "NSCHED$DEFAULT_JOB_MAX" = "6"
  "NSCHED$DEFAULT_JOB_PRI" = "4"
  "NSCHED$LOAD_BALANCE" = "ON"
  "NSCHED$MAILBOX" = "_MBA7199:"
  "NSCHED$TERM_MAILBOX" = "_MBA7204:"
  "NSCHED$UID" = "NSCHED$:SCHEDULER$XUI.UID"
  "NSCHED_DEFAULT_SD_ACTION" = "SKIP"
  


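For reference, here is a minimal Python sketch of the dispatch decision the poster expected. It assumes (the thread does not confirm this) that the default scheduler sends a new job to the highest-rated node that still has a free job slot (Jobs < Jmax). The helper names `parse_show_status` and `expected_node` are hypothetical, and the rule is an illustration, not Scheduler's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    jobs: int
    jmax: int
    rating: int
    is_default: bool


def parse_show_status(text: str) -> list[NodeStatus]:
    """Parse node rows from SCHEDULE> SHOW STATUS output in the layout shown above."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows: Node Version Date Time Jobs Jmax Log Pri Rating [<-- Default].
        # The header row and prompts are skipped because field 4 is not a number.
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        nodes.append(
            NodeStatus(
                name=fields[0],
                jobs=int(fields[4]),
                jmax=int(fields[5]),
                rating=int(fields[8]),
                is_default=line.rstrip().endswith("Default"),
            )
        )
    return nodes


def expected_node(nodes: list[NodeStatus]) -> str | None:
    """Hypothetical rule: the highest-rated node that still has a free job slot."""
    candidates = [n for n in nodes if n.jobs < n.jmax]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.rating).name
```

If you feed it the first SHO STAT listing above, it returns MAGNA. JENA is already running 2 of its 2 allowed jobs, so under the assumed rule the next job should have gone to MAGNA. That matches what the poster expected to see.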
1203.1. "check your database" by HLFS00::ERIC_S (Eric Sonneveld MCS - B.O. IS Holland) Thu Feb 06 1997 05:28
Try a "$ schedule check/all" and then look at the output of "sched sh stat"
again.  What does "schedule sh load" tell you?

eric
1203.2. "NSCHED logicals are not set by default, is it normal?" by HTSC19::KENNETH () Wed Feb 12 1997 21:26
    Hi Eric,

    Thanks for your help.  I will be on site again tomorrow.  From last
    time's findings, I noticed that when showing the NSCHED* logicals,
    NSCHED$LBAL$CPU_WEIGHT is not shown, and neither are
    NSCHED$DEFAULT_JOB_MAX, NSCHED$DEFAULT_JOB_PRI or NSCH$LBAL$INTERVAL.
    Is this normal?  If a logical is not shown, does that mean the default
    value is used?

    Thanks again for your help.

    Kenneth Leung

1203.3. "only need to set if you don't like the defaults" by RUMOR::FALEK (ex-TU58 King) Fri Feb 14 1997 13:12
    nsched$lbal$cpu_weight does not need to be defined unless you want to
    fine-tune the algorithm a scheduler uses to calculate the service
    ratings for load balancing.  A memory calculation and a CPU calculation
    are plugged into a formula.  If nsched$lbal$cpu_weight is set, it changes
    the multiplier used for the CPU portion of the rating on that node.  One
    use for this is a cluster whose nodes are architecturally different from
    each other.  For example, take a 2-node cluster with one node that has
    six 6-VUP CPUs and another node that has one 36-VUP CPU.  The CPU portion
    of the service rating for the 6 x 6-VUP node will come out almost as high
    if it has more job slots available, and you may not want that.  You can
    tune this with the logical instead of switching to custom load balancing.
    Custom load balancing has much more overhead because it spawns a process
    before each job runs, to execute the script that decides which node the
    job should run on.
    
    
    All the load-balancing logicals have defaults that are used when they
    are not set.
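The thread does not give Scheduler's actual rating formula. It only says that a memory calculation and a CPU calculation are "plugged into a formula". The Python sketch below assumes a simple weighted sum to show how a per-node multiplier like NSCHED$LBAL$CPU_WEIGHT can shift the relative ratings of the two example nodes. The function `service_rating`, its inputs and all the numbers are hypothetical.

```python
def service_rating(cpu_part: float, memory_part: float, cpu_weight: float = 1.0) -> float:
    """Hypothetical service rating: weighted CPU portion plus memory portion.

    NOT Scheduler's real formula; it only shows how a per-node multiplier
    such as NSCHED$LBAL$CPU_WEIGHT changes the CPU share of a rating.
    """
    return cpu_weight * cpu_part + memory_part


# Made-up inputs for the 2-node example above: the 6 x 6-VUP node's CPU
# portion comes out close to the 36-VUP node's because it has more free slots.
six_cpu_node = service_rating(cpu_part=900.0, memory_part=500.0)
one_cpu_node = service_rating(cpu_part=1000.0, memory_part=500.0)

# Giving the 6 x 6-VUP node a lower CPU weight widens the gap in favor of
# the 36-VUP node.
six_cpu_node_tuned = service_rating(cpu_part=900.0, memory_part=500.0, cpu_weight=0.5)
```

With these made-up numbers, the untuned ratings are 1400 and 1500. Halving the CPU weight on the 6 x 6-VUP node drops it to 950, so the 36-VUP node is clearly preferred.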
1203.4. "a stupid question..." by RUMOR::FALEK (ex-TU58 King) Fri Feb 14 1997 13:16
    Does the command 
    
    	$ sched set load on
    
    fix it?
1203.5. "I think I have set it." by HTSC19::KENNETH () Sun Feb 16 1997 20:35
Hi,

I have set the logical NSCHED$LOAD_BALANCE to "on" in SCHEDULER$STARTUP.COM.
Is that equivalent to "$ sched set load on"?

I find that after "$ sched set load off", no "Rating" is displayed, while
the rating is displayed when I show status with it on.  So I think load
balancing is already turned on.  Is that right?

By the way, regarding .1: during my last visit on site I issued the command
"$ schedule check/all", but jobs were still submitted to the default node.

Thanks for your help again.

Kenneth Leung
1203.6. "how to see" by RUMOR::FALEK (ex-TU58 King) Mon Feb 17 1997 18:32
    The nsched$load_balance logical is checked by the first scheduler
    started in the cluster, to decide whether it should come up with load
    balancing ON or OFF.  Other schedulers that come up later copy their
    setting from the "default" scheduler.  The $ sched set load {on off}
    command, by contrast, is dynamic: it broadcasts a message to all
    schedulers telling them to switch their setting to ON or OFF.
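The propagation rules just described can be captured in a small Python toy model. Only the three rules come from the text above: the first scheduler reads the logical, later schedulers copy from the default, and set load broadcasts. The class `Cluster` and its methods are hypothetical. Treating an undefined logical as OFF is an assumption of this sketch, because the thread does not say what the built-in default is.

```python
from __future__ import annotations


class Cluster:
    """Toy model of how the load-balancing switch propagates between schedulers."""

    def __init__(self, load_balance_logical: str | None) -> None:
        # Value of NSCHED$LOAD_BALANCE as it will be seen at startup.
        self.logical = load_balance_logical
        self.schedulers: dict[str, bool] = {}
        self.default_node: str | None = None

    def start_scheduler(self, node: str) -> None:
        if self.default_node is None:
            # First scheduler in the cluster: reads the logical and becomes the
            # default. Assumption: an undefined logical is treated as OFF here.
            self.default_node = node
            self.schedulers[node] = (self.logical or "").upper() == "ON"
        else:
            # Later schedulers ignore the logical and copy the default's setting.
            self.schedulers[node] = self.schedulers[self.default_node]

    def set_load(self, on: bool) -> None:
        # Models "$ sched set load {on off}": broadcast to every running scheduler.
        for node in self.schedulers:
            self.schedulers[node] = on
```

In this model, editing NSCHED$LOAD_BALANCE and restarting one scheduler has no effect while another scheduler is still running as the default. The new value only takes effect if all schedulers are stopped and the first one is restarted, or if the broadcast command is used.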
    
    If you see rating numbers in the user interface, load balancing
    "should" be on; otherwise something is in an inconsistent state.
     
    You should try the $ sched set load on command anyway; it will do no
    harm if load balancing is already on.  The best way to really find out
    what is happening is to stop all the schedulers in the cluster.  Then
    start one scheduler on a hardcopy terminal or window where its output
    is logged.  It will become the "default" scheduler because it is
    started first, and the "default" scheduler is the one that assigns jobs
    to nodes for load balancing.  Next, start another scheduler on a node
    with a higher rating and watch the debug output to see what's happening.

    Alternatively, if you don't want to mess with this, you can do

        $ SCHED SET DEBUG ON/NODE=default_node

    run a few jobs that should be load balanced to another node, then do

        $ SCHED SET DEBUG OFF /node=

    and look at the scheduler's output, probably nsched$:nsched.log unless
    you've moved it.  Don't leave debug mode on permanently: the log file
    will get very big and hurt performance.  You should be able to see very
    quickly what's happening by looking at the end of the file.
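Only the end of the (possibly very large) debug log matters. The minimal Python sketch below reads just its last lines without holding the whole file in memory. It assumes the log is readable from wherever the script runs, and the helper names `last_lines` and `tail_file` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable


def last_lines(lines: Iterable[str], n: int = 50) -> list[str]:
    """Return the last n items of an iterable of lines.

    deque(maxlen=n) discards older lines as new ones arrive, so memory use
    stays bounded even though every line is read once.
    """
    return list(deque(lines, maxlen=n))


def tail_file(path: str, n: int = 50) -> list[str]:
    # latin-1 decodes any byte value, so unusual characters in the log
    # cannot make the scan fail part-way through.
    with open(path, encoding="latin-1") as f:
        return last_lines(f, n)
```

For example, `print("".join(tail_file("nsched.log", 100)))` would show the last 100 lines of a copy of the log (the file name here is only an illustration).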