Title: | SCHEDULER |
Notice: | Welcome to the Scheduler Conference on node HUMANE ril |
Moderator: | RUMOR::FALEK |
Created: | Sat Mar 20 1993 |
Last Modified: | Tue Jun 03 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1240 |
Total number of notes: | 5017 |
1123.0. "Load Balancing problem - is it ON or OFF or both!" by KERNEL::TITCOMBER () Fri Jun 28 1996 18:23
Can anybody shed light on the following load balancing problem?
A customer has a 4 node cluster of VAX systems all running Scheduler.
As one node is running V5.5-2 while the others are running V6.1, they
have set up Scheduler with the logical NSCHED$ pointing to a
search-listed logical, as one does for mixed architecture (VAX & Alpha)
clusters. So for instance on the V5.5-2 node:
$ sh log nsched$
"NSCHED$" = "DISK$DATA:[NSCHED]" (LNM$SYSTEM_TABLE)
= "DISK$DATA:[NSCHED.V552]"
and for V6.1:
$ sh log nsched$
"NSCHED$" = "DISK$DATA:[NSCHED]" (LNM$SYSTEM_TABLE)
= "DISK$DATA:[NSCHED.V61]"
The V5 and V6 images for Scheduler are then located in the appropriate
version-specific directory. This may not be relevant, but I mention it
for the sake of completeness.
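For reference, a search list of this shape is normally created with a
single DEFINE command that names both directories. The line below is
only a sketch for the V5.5-2 node: the exact command and qualifiers used
at the customer site are assumed, not known.
$ define/system nsched$ disk$data:[nsched],disk$data:[nsched.v552]
The translations are searched in the order given, so a file that exists
in both directories is taken from the first one.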
Anyway, the big question is: why does one of the nodes have a rating
when none of the others do?
We get the same output on all 4 nodes in the cluster:
$ sched show stat
Node    Version  Started               Jobs  Jmax  Log  Pri  Rating
PORTOS  V2.1B-7  22-JUN-1996 18:02:41     0    15    5    4          <-- Default
KERMIT  V2.1B-7  22-JUN-1996 22:40:00     1    15    5    4     871
ARAMIS  V2.1B-7  23-JUN-1996 02:42:16     0    15    5    4
GONZO   V2.1B-7  23-JUN-1996 03:02:44     0    15    5    4
$ sched show load
Load Balancing is OFF
The interesting thing is that every node reports load balancing as OFF,
in which case no rating value should be displayed for any node in the
cluster. The one node that does show a rating is one of the V6.1 nodes.
To add to the confusion, each node's log file records load balancing
being turned ON, but never OFF:
.
.
.
$ Run NSCHED$:NSCHED.EXE
Nsched Version V2.1B-7 starting...
Setting Debugging OFF
Setting Job Max to 15
Setting Logging to 5
Setting Default Job Priority to 4
Setting Load Balancing ON <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Setting Restart Params to CLEAR on job completition
Setting Remote Jobs DISABLED
Setting brkthru/notify wait to 300 seconds
%DCL-I-SUPERSEDE, previous value of NSCHED$MAILBOX has been superseded
%DCL-I-SUPERSEDE, previous value of NSCHED$TERM_MAILBOX has been superseded
CPUtype= 19 CPU_count= 2 Total pages= 669897 Meg= 342 VUPS= 64
Remote jobs not enabled
.
.
.
This problem came to light while investigating why some jobs in the
Scheduler database have been scheduled and then left in the requested
state "Run" for hours (and in some cases days), while other jobs on the
same nodes run OK.
What is going on here?
If anybody has seen this before or has any ideas or suggestions then I
would be grateful for some assistance.
Thanks in advance,
Rich
1123.1. "weird - try turning it on (or off) ..." by RUMOR::FALEK (ex-TU58 King) Mon Jul 01 1996 02:47
What happens if you do
$ sched set load on
(from an account with sysprv or oper priv) ??