
Conference humane::scheduler

Title:	SCHEDULER
Notice:	Welcome to the Scheduler Conference on node HUMANE
Moderator:	RUMOR::FALEK

Created:	Sat Mar 20 1993
Last Modified:	Tue Jun 03 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1240
Total number of notes:	5017

1115.0. "" by BACHUS::WILLEMS (Johan Willems @BRO DTN 856-8739) Thu Jun 06 1996 09:01

    Can somebody explain why a scheduled job ran almost 5 hours too late???


    This is what I have as information
    
    The problem job is job # 146

    $ sched sh job 146/full

    Job Name             Entry    User_name    State      Next Run Time
    --------             -----    ---------    -----      -------------
    ALLOW_EOS            146      NOSTRO       Scheduled   6-JUN-1996 22:00
    VMS_Command : @NOSbil$com_dir:ALLOW_EOS.COM
    Group : NOSTRO                             Type : JOURNALIER
>>> Last Start Time   :  6-JUN-1996 02:54
>>> Last Finish Time  :  6-JUN-1996 02:56      Last Exit Status : SUCCESS
    Schedule Interval : D 22:00                Mode   : Batch
    Mail to           : SLSOPER (No Mail)
    Days              : (MON,TUE,WED,THU)
    Output File       : NOSbil$log_dir:ALLOW_EOS.LOG
    Cluster_CPU       : <Ignored>              Notify user upon completion
    Submit Queue      : NOSTRO$QUEUE
    CPULimit (x100ms) : 0                      QPriority : 100
    Max_Time Warning  : 0 00:45:00.00          Job Always retained
    Stall Notify      : 0 00:05:00.00          No Retry on Error
    Success Count     : 596                    Failure Count : 1
    Owner UIC         : [156,1]                Restart on Crash
    Send Opcom Completion Message
    Pre Function  : '@OPER$DISK:[HOZAY]RESET_SC.COM 149 169', Last Exit
    Status : SUCCESS
    Post Function : (none)
    This job has 1 local job(s) that depend upon it:
    (EOS_JOURNAL)
    All dependencies must successfully complete after:  6-JUN-1996
    02:56:12.35
    Job Dependencies: (NS_SERV_TO_20)
    Job Restricted to run NOT_ON Special Days, Action is to SKIP
    Job Restricted by Special Days Classes:
          (GDA_FIN_SEMAINE, GDA_FERIES, GDA_1_MOIS_OCTOBRE)


    $ sched sh job/full/user=* NS_SERV_TO_20

    Job Name             Entry    User_name    State      Next Run Time
    --------             -----    ---------    -----      -------------
    NS_SERV_TO_20        140      NOSTRO       Scheduled   6-JUN-1996 21:30
    VMS_Command : @NOSbil$com_dir:NS_SERV_TO_20.COM
    Group : NOSTRO                             Type : JOURNALIER
>>> Last Start Time   :  5-JUN-1996 21:32
>>> Last Finish Time  :  5-JUN-1996 21:37      Last Exit Status : SUCCESS
    Schedule Interval : D 21:30                Mode   : Batch
    Mail to           : SLSOPER (No Mail)
    Days              : (MON,TUE,WED,THU,FRI)
    Output File       : NOSbil$log_dir:NS_SERV_TO_20.LOG
    Cluster_CPU       : <Ignored>              Notify user upon completion
    Submit Queue      : NOSTRO$QUEUE
    CPULimit (x100ms) : 0                      QPriority : 100
    Max_Time Warning  : 0 00:30:00.00          Job Always retained
    Stall Notify      : 0 00:15:00.00          No Retry on Error
    Success Count     : 817                    Failure Count : 10
    Owner UIC         : [156,1]                Restart on Crash
    Send Opcom Completion Message
    No Pre or Post Function for this job
    This job has 3 local job(s) that depend upon it:
    (ALLOW_EOS, ALLOW_EOS_DBR, ALLOW_EOS_DBR__1ER)
    All dependencies must successfully complete after:  5-JUN-1996
    21:37:58.68
    Job Dependencies: (START_NOSTRO__SCHED)
    Job Restricted to run NOT_ON Special Days, Action is to SKIP
    Job Restricted by Special Days Classes:
          (GDA_FERIES)


From the event report I can see that the job was queued much too late (and was
not held by the queue)
                           JOB       EVENT                          EXIT
       DATE AND TIME      NUMBER      TYPE       NODE      PID     STATUS   ADDITIONAL INFORMATION
    --------------------  ------  ------------  ------  --------  --------  ----------------------
     5-JUN-1996 21:01:19     158  JOB QUEUED    BILUX2                      entry=1646 queue=NOSTRO$QUEUE
     5-JUN-1996 21:03:46     158  JOB START     BILUX2  00A74898
     5-JUN-1996 21:22:10     158  JOB FINISH    BILUX1  00A74898  Success
     5-JUN-1996 21:24:49          SCHED DIAG    BILUX2                      Process message: Time-out message for non-existent record. PID=1615111800
     5-JUN-1996 21:30:14     140  JOB QUEUED    BILUX2                      entry=1741 queue=NOSTRO$QUEUE
     5-JUN-1996 21:32:23     140  JOB START     BILUX2  00A488BC
     5-JUN-1996 21:37:38     163  JOB QUEUED    BILUX2                      entry=1756 queue=NOSTRO$QUEUE
     5-JUN-1996 21:37:43     163  JOB START     BILUX2  00A77CD5
     5-JUN-1996 21:37:58     140  JOB FINISH    BILUX1  00A488BC  Success
     5-JUN-1996 21:44:29     163  JOB FINISH    BILUX1  00A77CD5  Success
     5-JUN-1996 21:57:01     165  JOB QUEUED    BILUX2                      entry=1778 queue=NOSTRO$QUEUE
     5-JUN-1996 22:00:15     165  JOB START     BILUX2  00A7A0EC
     5-JUN-1996 22:07:52     165  JOB FINISH    BILUX1  00A7A0EC  Success
     6-JUN-1996 01:50:59     163  JOB QUEUED    BILUX2                      entry=419 queue=NOSTRO$QUEUE
     6-JUN-1996 01:51:03     163  JOB START     BILUX2  00A6915D
     6-JUN-1996 01:52:16     163  JOB FINISH    BILUX1  00A6915D  Success
>>>  6-JUN-1996 02:54:36     146  JOB QUEUED    BILUX2                      entry=532 queue=NOSTRO$QUEUE
>>>  6-JUN-1996 02:54:39     146  JOB START     BILUX2  00A68D8D
     6-JUN-1996 02:55:29     147  JOB QUEUED    BILUX2                      entry=533 queue=NOSTRO$QUEUE
>>>  6-JUN-1996 02:56:12     146  JOB FINISH    BILUX1  00A68D8D  Success
     6-JUN-1996 02:57:31     147  JOB START     BILUX2  00A7558E
     6-JUN-1996 03:00:08     147  JOB FINISH    BILUX1  00A7558E  Success
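
For reference, the "almost 5 hours" can be checked against the timestamps above. The following is only a sketch: it assumes the run in question was the one due at 22:00 on 5-JUN-1996 (taken from the schedule interval D 22:00), and `parse_vms` is a hypothetical helper written for this example, not part of any scheduler tooling.

```python
from datetime import datetime

# Month abbreviations as printed in VMS-style timestamps.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_vms(stamp: str) -> datetime:
    """Parse a timestamp such as '6-JUN-1996 02:54:36' (hypothetical helper)."""
    date_part, time_part = stamp.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(field) for field in time_part.split(":"))
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second)


# Assumption: the expected run time is 22:00 on 5-JUN-1996 (interval D 22:00).
expected = parse_vms("5-JUN-1996 22:00:00")
queued = parse_vms("6-JUN-1996 02:54:36")    # JOB QUEUED event for job 146
started = parse_vms("6-JUN-1996 02:54:39")   # JOB START event for job 146

delay = queued - expected        # time lost before the job reached the queue
queue_wait = started - queued    # time spent waiting in NOSTRO$QUEUE

print(f"queued late by {delay}")
print(f"waited in queue for {queue_wait.total_seconds():.0f} s")
```

Under that assumption the job reached the queue 4:54:36 late and then waited only 3 seconds in the queue, which fits the observation that the delay happened before queuing and not in the queue itself.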




Can somebody tell me where to go from here??


johan

T.R	Title	User	Personal Name	Date	Lines
1115.1	tuning ?	HLFS00::ERIC_S	Eric Sonneveld MCS - B.O. IS Holland	Fri Jun 07 1996 02:47	7
	I can only assume, having seen this several times at different customer sites, that the scheduler database is very badly tuned. Take a look elsewhere in this conference for the notes on tuning the scheduler, e.g. many deleted jobs or a big vermont_creamery (logfile).

	Eric
1115.2		BACHUS::WILLEMS	Johan Willems @BRO DTN 856-8739	Fri Jun 07 1996 07:50	18
	Eric,

	I agree that this points to tuning, but why would it affect only this job (all other jobs ran on time)?

	The customer verified the size of the log file (135000 blocks) and will create a new one. He also checked the number of deletions, and only 16 deleted jobs were found.

	The customer also told me something else interesting. The day before the problem, he found a whole bunch of jobs whose next run time had been changed to never. This job (146) was one of them. The customer reset the next run time to the correct values.

	Is this related???

	Johan
1115.3	keep logfile small - use a scheduler job for it	HLFS00::ERIC_S	Eric Sonneveld MCS - B.O. IS Holland	Sat Jun 08 1996 04:27	24
	>I can agree that you point to tuning but why should this only
	>influence this job (all other jobs ran on time).

	Could it be that this job runs more frequently than the others, so that updating the (indexed!) logfile takes longer for it than for the others?

	>Customer verified size of log file (135000 blocks) and will create
	>new one. He also checked on the number of deletions and only 16
	>deleted jobs were found.

	I wouldn't worry about the 16 deleted jobs; the logfile is the real performance bottleneck. Keep this file at 5000 blocks maximum.

	>The customer also told me some other interesting thing. The day
	>before the problem, he found a whole bunch of jobs that had their
	>next run time changed to never. this job (146) was one of them. The
	>customer reset the next run time to the correct values.
	>Is this related???

	Don't think so. "Never" is a special next run time that is set on purpose. I cannot think of any reason why the scheduler itself would set a job to never, except when a job is only allowed to run on special days and the special day calendar no longer has any valid combination with the schedule interval. That would mean there are no scheduled run times left.

	Eric
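
	To put the size advice in .3 into numbers, here is a minimal sketch. It assumes the usual 512-byte VMS disk block; the function names and the way the size is obtained are hypothetical and are not part of the scheduler.

```python
VMS_BLOCK_BYTES = 512    # size of one VMS disk block
MAX_LOG_BLOCKS = 5000    # upper limit suggested in this thread


def size_in_blocks(size_bytes: int) -> int:
    """Convert a size in bytes to whole 512-byte blocks, rounding up."""
    return (size_bytes + VMS_BLOCK_BYTES - 1) // VMS_BLOCK_BYTES


def log_needs_renewal(size_blocks: int, limit: int = MAX_LOG_BLOCKS) -> bool:
    """Return True when the log file has grown past the suggested limit."""
    return size_blocks > limit


reported_blocks = 135000  # size the customer reported for the log file

print(log_needs_renewal(reported_blocks))                  # True
print(f"{reported_blocks / MAX_LOG_BLOCKS:.0f}x the suggested maximum")
print(f"{reported_blocks * VMS_BLOCK_BYTES / 1_000_000:.1f} MB on disk")
```

	At 135000 blocks the customer's log file is 27 times the suggested maximum, roughly 69 MB. A periodic check along these lines could be run from a scheduled job, as the title of .3 suggests.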
1115.4	Special days are ok	BACHUS::WILLEMS	Johan Willems @BRO DTN 856-8739	Tue Jun 11 1996 03:36	6
	Eric,

	I checked with the customer, and all special day calendars are filled until at least the end of this year.

	Johan
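
	The special-day reasoning in this thread can be illustrated with a small sketch. This is not the scheduler's actual algorithm: `next_run`, the `ONLY_ON` mode name and the 366-day search horizon are assumptions made for this example. Only the `NOT_ON` restriction, the D 22:00 interval and the MON to THU day list come from the job display in .0.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional, Set


def next_run(after: datetime, run_at: time, weekdays: Set[int],
             special_days: Set[date], mode: str,
             horizon_days: int = 366) -> Optional[datetime]:
    """Find the next run time, or None ("never") if none exists in the horizon.

    mode "NOT_ON":  skip any day listed as a special day.
    mode "ONLY_ON": run only on days listed as special days (hypothetical name).
    weekdays uses Python numbering: Monday = 0 ... Sunday = 6.
    """
    for offset in range(horizon_days + 1):
        day = after.date() + timedelta(days=offset)
        candidate = datetime.combine(day, run_at)
        if candidate <= after or day.weekday() not in weekdays:
            continue
        is_special = day in special_days
        if mode == "NOT_ON" and is_special:
            continue
        if mode == "ONLY_ON" and not is_special:
            continue
        return candidate
    return None


mon_to_thu = {0, 1, 2, 3}
last_finish = datetime(1996, 6, 6, 2, 56, 12)   # job 146 finished Thursday 02:56

# Restricted NOT_ON special days: the next MON-THU slot at 22:00 is used.
print(next_run(last_finish, time(22, 0), mon_to_thu, set(), "NOT_ON"))

# Allowed ONLY_ON special days, but every calendar entry is in the past:
# no candidate remains, which corresponds to a next run time of "never".
print(next_run(last_finish, time(22, 0), mon_to_thu,
               {date(1996, 5, 1)}, "ONLY_ON"))
```

	With a NOT_ON restriction the sketch returns 6-JUN-1996 22:00, the same next run time shown for job 146 in .0. It only yields "never" for a job limited to special days whose calendar has run out, and job 146 is not restricted that way.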