
Conference humane::scheduler

Title:	SCHEDULER
Notice:	Welcome to the Scheduler Conference on node HUMANEril
Moderator:	RUMOR::FALEK

Created:	Sat Mar 20 1993
Last Modified:	Tue Jun 03 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1240
Total number of notes:	5017

1106.0. "MAIL not sent & Jobs in Slot Wait" by STKHLM::WIDMAN (Modo liceat vivere, est spes.) Wed May 22 1996 15:12

    Hi,

    I'm looking at some strange problems in (IMHO) a weird config.

    The problem(s) :    1. No mail after job completion.
                        2. No NSCHED$MAILBOX or NSCHED$TERM_MAILBOX exists.
                        3. Jobs in Slot Wait even though slots appear to be
                           available.



    Question : What's going on ?

    Some Info.:

    Cluster with three nodes
              
    BLGV07  VAX 6000-630          VMS V5.5-2
    BLGV06  VAX 6000-630          VMS V5.5-2
    BLGV04  VAX 7000-630          VMS V5.5-2

    BLGV04 has its own system disk with a 'separate' installation of the
    scheduler...



    SCHEDULE>sh sta
    Node   Version  Started              Jobs  Jmax   Log  Pri Rating
    BLGV06 V2.1B-1  12-MAY-1996 22:34:30    0    40     5    4   4659  <-- Default
    BLGV07 V2.1B-1  12-MAY-1996 22:44:33    0    40     5    4  10209
    BLGV04 V2.1B-1  12-MAY-1996 23:01:22    0    40     5    4  10457



    SCHEDULE>sho job 6/fu

    Job Name             Entry    User_name    State      Next Run Time
    --------             -----    ---------    -----      -------------
    TIME-TEST            6        BOSTROMRO    Scheduled  23-MAY-1996 00:00
    VMS_Command : show time
    Group : (none)                             Type : (none)
    Last Start Time   : 22-MAY-1996 17:27
    Last Finish Time  : 22-MAY-1996 17:27      Last Exit Status : SUCCESS
    Schedule Interval : D                      Mode   : Detached
    Mail to           : BLGV04::FIELD (Always)
    Days              : ALL
    Output File       : SHOW-TIME.LOG
    Cluster_CPU       : BLGV04                 Notify user upon completion
    Run Priority      : Default
    Max_Time Warning  : None                   Job Always retained
    Stall Notify      : None                   No Retry on Error
    Success Count     : 78                     Failure Count : 14
    Owner UIC         : [242,21225]            Restart on Crash
    No Pre or Post Function for this job
    No local jobs depend upon this job.
    This job has no Dependencies on other jobs



    From NSCHED$:BLGV04.LOG

    !
    Nsched Version V2.1B-1  starting...
    Setting Debugging OFF
    Setting Job Max to  40
    Setting Logging to  5
    Setting Default Job Priority to  4
    Setting Load Balancing ON
    Setting Restart Params to CLEAR on job completition
    Setting Remote Jobs ENABLED
    Setting brkthru/notify wait to  300  seconds
    CPUtype= 23  CPU_count= 3  Total pages= 688879   Meg= 352   VUPS= 99
    Check Requested
    Setting Debugging ON
    timer flag was clear
    timer not expired. No earlier event to set.
    sleeping
    ...
    ...
    we woke up!
    got mbx msg '>>6       '
    06:59 PM  processing record #  6  status= S   request= N
    vss$get_next_start_time: 1  cstat= 211191443  next= 23-MAY-1996 00:00:00.00
    Running Job  6  PID=2061BCC9 Count= 1  Priority= 4
    timer flag was clear
    timer not expired. No earlier event to set.
    sleeping
    we woke up!
    job #  6  finished.... count=  0
    exit status of job was 00030001
    NSCHED: LIB$SPAWN(MAIL...4 to_list='BLGV04::FIELD'
    Sending mail :
    MAIL/NOSELF/SUBJ:"Scheduler Job #6 (NAME: TIME-TEST) finished, Status: Success"  NL: "BLGV04::FIELD"
    Spawn mail failed:  28
     0  remote nodes care about job  6
    06:59 PM  processing record #  6  status= S   request=
     Now=22-MAY-1996 18:59:47.01   job_sched_time=23-MAY-1996 00:00:00.00
    job  6  is scheduled for the future
    06:59 PM  updated    record #  6  status= S    request=
    Found 0  local jobs depending on :: 6
    cluster_broadcast:---node= msg=CWJ
    timer flag was clear
    timer not expired. No earlier event to set.
    sleeping
                                                               

$ sh log ns*

(LNM$PROCESS_TABLE)

(LNM$JOB_84789EB0)

(LNM$GROUP_000001)

(LNM$SYSTEM_TABLE)

  "NSCHED$" = "SYS$COMMON:[NSCHED]"
  "NSCHED$CLEAR_RESTART_PARAM" = "TRUE"
  "NSCHED$DEFAULT_JOB_MAX" = "40"
  "NSCHED$REMOTE_SUPPORT_ENABLED" = "TRUE"
  "NSCHED$UID" = "NSCHED$:SCHEDULER$XUI.UID"
  "NSCHED_DEFAULT_SD_ACTION" = "SKIP"

(DECW$LOGICAL_NAMES)


\ Håkan Widman / CSC Sweden

BTW - I've sent ECO7 to the customer today.

1106.1. "subprocess quota ?" by RUMOR::FALEK (ex-TU58 King) Wed May 22 1996 15:35
	If you type "exit 28" at the VMS $ prompt, you see "exceeded quota". In this case it is probably the subprocess quota of the account that the scheduler runs under.
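	The log above shows two status values: the job's exit status in hex (00030001) and the MAIL spawn failure in decimal (28). Both are OpenVMS condition values, laid out as severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27 and control bits in bits 28-31. The minimal Python sketch below splits a value into those fields. The function name decode_vms_status is hypothetical and is not part of the scheduler or of VMS. It reports only the field values. Turning them into message text would still require the system message files on VMS, for example via $GETMSG.

```python
# Field layout of a 32-bit OpenVMS condition value:
#   bits 0-2   severity
#   bits 3-15  message number
#   bits 16-27 facility number (0 = SYSTEM)
#   bits 28-31 control bits
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def decode_vms_status(value: int) -> dict:
    """Split a VMS condition value into its bit fields (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return {
        "severity": SEVERITY.get(value & 0x7, "?"),
        # Low bit set means success (S) or informational (I).
        "success": bool(value & 1),
        "message": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
    }


if __name__ == "__main__":
    # Exit status of job 6, logged in hex as 00030001.
    print(decode_vms_status(0x00030001))
    # "Spawn mail failed:  28", logged in decimal.
    print(decode_vms_status(28))
```

	Under this layout, 00030001 has the low bit set, so the job itself succeeded. 28 (%X1C) decodes to severity F, facility 0 (SYSTEM), message number 3, which is SS$_EXQUOTA, the "exceeded quota" error described in the reply. The decoding does not say which quota was exceeded. That still has to be checked on the VMS side, for example the subprocess limit of the account the scheduler runs under.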