Title: | SCHEDULER |
Notice: | Welcome to the Scheduler Conference on node HUMANE ril |
Moderator: | RUMOR::FALEK |
|
Created: | Sat Mar 20 1993 |
Last Modified: | Tue Jun 03 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1240 |
Total number of notes: | 5017 |
1115.0. "" by BACHUS::WILLEMS (Johan Willems @BRO DTN 856-8739) Thu Jun 06 1996 10:01
Can somebody explain why a scheduled job ran almost 5 hours late?
Here is the information I have.
The problem job is job # 146.
$ sched sh job 146/full
Job Name Entry User_name State Next Run Time
-------- ----- --------- ----- -------------
ALLOW_EOS 146 NOSTRO Scheduled 6-JUN-1996 22:00
VMS_Command : @NOSbil$com_dir:ALLOW_EOS.COM
Group : NOSTRO Type : JOURNALIER
>>> Last Start Time : 6-JUN-1996 02:54
>>> Last Finish Time : 6-JUN-1996 02:56 Last Exit Status : SUCCESS
Schedule Interval : D 22:00 Mode : Batch
Mail to : SLSOPER (No Mail)
Days : (MON,TUE,WED,THU)
Output File : NOSbil$log_dir:ALLOW_EOS.LOG
Cluster_CPU : <Ignored> Notify user upon completion
Submit Queue : NOSTRO$QUEUE
CPULimit (x100ms) : 0 QPriority : 100
Max_Time Warning : 0 00:45:00.00 Job Always retained
Stall Notify : 0 00:05:00.00 No Retry on Error
Success Count : 596 Failure Count : 1
Owner UIC : [156,1] Restart on Crash
Send Opcom Completion Message
Pre Function : '@OPER$DISK:[HOZAY]RESET_SC.COM 149 169', Last Exit
Status : SUCCESS
Post Function : (none)
This job has 1 local job(s) that depend upon it:
(EOS_JOURNAL)
All dependencies must successfully complete after: 6-JUN-1996
02:56:12.35
Job Dependencies: (NS_SERV_TO_20)
Job Restricted to run NOT_ON Special Days, Action is to SKIP
Job Restricted by Special Days Classes:
(GDA_FIN_SEMAINE, GDA_FERIES, GDA_1_MOIS_OCTOBRE)
$ sched sh job/full/user=* NS_SERV_TO_20
Job Name Entry User_name State Next Run Time
-------- ----- --------- ----- -------------
NS_SERV_TO_20 140 NOSTRO Scheduled 6-JUN-1996 21:30
VMS_Command : @NOSbil$com_dir:NS_SERV_TO_20.COM
Group : NOSTRO Type : JOURNALIER
>>> Last Start Time : 5-JUN-1996 21:32
>>> Last Finish Time : 5-JUN-1996 21:37 Last Exit Status : SUCCESS
Schedule Interval : D 21:30 Mode : Batch
Mail to : SLSOPER (No Mail)
Days : (MON,TUE,WED,THU,FRI)
Output File : NOSbil$log_dir:NS_SERV_TO_20.LOG
Cluster_CPU : <Ignored> Notify user upon completion
Submit Queue : NOSTRO$QUEUE
CPULimit (x100ms) : 0 QPriority : 100
Max_Time Warning : 0 00:30:00.00 Job Always retained
Stall Notify : 0 00:15:00.00 No Retry on Error
Success Count : 817 Failure Count : 10
Owner UIC : [156,1] Restart on Crash
Send Opcom Completion Message
No Pre or Post Function for this job
This job has 3 local job(s) that depend upon it:
(ALLOW_EOS, ALLOW_EOS_DBR, ALLOW_EOS_DBR__1ER)
All dependencies must successfully complete after: 5-JUN-1996
21:37:58.68
Job Dependencies: (START_NOSTRO__SCHED)
Job Restricted to run NOT_ON Special Days, Action is to SKIP
Job Restricted by Special Days Classes:
(GDA_FERIES)
From the event report I can see that the job was queued much too late (and was
not held by the queue):
DATE AND TIME         JOB NUMBER  EVENT TYPE    NODE    PID       EXIT STATUS
    ADDITIONAL INFORMATION
--------------------  ----------  ------------  ------  --------  -----------
5-JUN-1996 21:01:19 158 JOB QUEUED BILUX2
entry=1646 queue=NOSTRO$QUEUE
5-JUN-1996 21:03:46 158 JOB START BILUX2 00A74898
5-JUN-1996 21:22:10 158 JOB FINISH BILUX1 00A74898 Success
5-JUN-1996 21:24:49 SCHED DIAG BILUX2
Process message: Time-out message for non-existent
record. PID=1615111800
5-JUN-1996 21:30:14 140 JOB QUEUED BILUX2
entry=1741 queue=NOSTRO$QUEUE
5-JUN-1996 21:32:23 140 JOB START BILUX2 00A488BC
5-JUN-1996 21:37:38 163 JOB QUEUED BILUX2
entry=1756 queue=NOSTRO$QUEUE
5-JUN-1996 21:37:43 163 JOB START BILUX2 00A77CD5
5-JUN-1996 21:37:58 140 JOB FINISH BILUX1 00A488BC Success
5-JUN-1996 21:44:29 163 JOB FINISH BILUX1 00A77CD5 Success
5-JUN-1996 21:57:01 165 JOB QUEUED BILUX2
entry=1778 queue=NOSTRO$QUEUE
5-JUN-1996 22:00:15 165 JOB START BILUX2 00A7A0EC
5-JUN-1996 22:07:52 165 JOB FINISH BILUX1 00A7A0EC Success
6-JUN-1996 01:50:59 163 JOB QUEUED BILUX2
entry=419 queue=NOSTRO$QUEUE
6-JUN-1996 01:51:03 163 JOB START BILUX2 00A6915D
6-JUN-1996 01:52:16 163 JOB FINISH BILUX1 00A6915D Success
>>> 6-JUN-1996 02:54:36 146 JOB QUEUED BILUX2
entry=532 queue=NOSTRO$QUEUE
>>> 6-JUN-1996 02:54:39 146 JOB START BILUX2 00A68D8D
6-JUN-1996 02:55:29 147 JOB QUEUED BILUX2
entry=533 queue=NOSTRO$QUEUE
>>> 6-JUN-1996 02:56:12 146 JOB FINISH BILUX1 00A68D8D Success
6-JUN-1996 02:57:31 147 JOB START BILUX2 00A7558E
6-JUN-1996 03:00:08 147 JOB FINISH BILUX1 00A7558E Success
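To put a number on the delay, the QUEUED timestamps in the event report can be compared with the time the job was due. The following is a minimal Python sketch, not a Scheduler tool: the function names `parse_events` and `queue_delay` are made up for illustration, the sample text is a few lines copied from the report above, and the due time of 5-JUN-1996 22:00 is an assumption taken from the job's "Schedule Interval : D 22:00".

```python
import re
from datetime import datetime

# Month abbreviations as they appear in VMS-style dates (avoids locale issues).
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Matches lines such as ">>> 6-JUN-1996 02:54:36 146 JOB QUEUED BILUX2".
EVENT_RE = re.compile(
    r"(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\s+(\d+)\s+"
    r"JOB (QUEUED|START|FINISH)")


def parse_events(text):
    """Yield (timestamp, job_number, event) for every job event line."""
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            day, mon, year, hh, mm, ss, job, event = match.groups()
            stamp = datetime(int(year), MONTHS[mon], int(day),
                             int(hh), int(mm), int(ss))
            yield stamp, int(job), event


def queue_delay(events, job, due):
    """Return how long after `due` the job was first queued, or None."""
    for stamp, number, event in events:
        if number == job and event == "QUEUED" and stamp >= due:
            return stamp - due
    return None


# A few lines copied from the event report in this note.
REPORT = """\
5-JUN-1996 21:30:14 140 JOB QUEUED BILUX2
5-JUN-1996 21:37:58 140 JOB FINISH BILUX1 00A488BC Success
>>> 6-JUN-1996 02:54:36 146 JOB QUEUED BILUX2
>>> 6-JUN-1996 02:54:39 146 JOB START BILUX2 00A68D8D
"""

events = list(parse_events(REPORT))
# Assumed due time: 22:00 on 5-JUN-1996, from "Schedule Interval : D 22:00".
delay_146 = queue_delay(events, 146, datetime(1996, 6, 5, 22, 0, 0))
print(delay_146)  # 4:54:36
```

For comparison, the same check on job 140 (due 21:30) gives 14 seconds, and its FINISH event at 21:37:58 shows the dependency was already satisfied before job 146 was due.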
Can somebody tell me where to go from here??
johan
T.R | Title | User | Personal Name | Date | Lines |
---|
1115.1 | tuning ? | HLFS00::ERIC_S | Eric Sonneveld MCS - B.O. IS Holland | Fri Jun 07 1996 03:47 | 7 |
| I can only assume (I have seen this at several customer sites) a very badly
tuned scheduler database.
Take a look elsewhere in this conference for notes on tuning the scheduler,
for example many deleted jobs or a big vermont_creamery (log file).
Eric
|
1115.2 | | BACHUS::WILLEMS | Johan Willems @BRO DTN 856-8739 | Fri Jun 07 1996 08:50 | 18 |
| Eric,
I agree that tuning could be a factor, but why would it affect only
this job? (All other jobs ran on time.)
The customer verified the size of the log file (135000 blocks) and will
create a new one. He also checked the number of deletions, and only 16
deleted jobs were found.
The customer also told me something else interesting. The day
before the problem, he found a whole bunch of jobs that had their
next run time changed to "never". This job (146) was one of them. The
customer reset the next run time to the correct values.
Is this related?
Johan
|
1115.3 | keep logfile small - use a scheduler job for it | HLFS00::ERIC_S | Eric Sonneveld MCS - B.O. IS Holland | Sat Jun 08 1996 05:27 | 24 |
| >I agree that tuning could be a factor, but why would it affect only
>this job? (All other jobs ran on time.)
Could it be that this job runs more frequently than the others, so that
updating the (indexed!) log file takes longer for it than for the others?
>The customer verified the size of the log file (135000 blocks) and will
>create a new one. He also checked the number of deletions, and only 16
>deleted jobs were found.
I wouldn't worry about the 16 deleted jobs; the log file is the real
performance bottleneck. Keep this file at 5000 blocks at most.
>The customer also told me something else interesting. The day
>before the problem, he found a whole bunch of jobs that had their
>next run time changed to "never". This job (146) was one of them. The
>customer reset the next run time to the correct values.
>Is this related?
I don't think so. "Never" is a special next run time that is set on purpose. I
cannot think of any reason why the scheduler itself would set a job to "never",
except when a job is only allowed to run on special days and the special day
calendar no longer has any valid combination with the schedule interval. That
would mean there are no scheduled run times left.
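The "no valid combinations" case can be illustrated with a small model. This is only a sketch of the idea in Python, assuming a daily job with an allowed set of weekdays, an optional set of dates to skip, and an optional set of dates it is restricted to; the function `next_run` and all of its parameters are hypothetical and say nothing about how the scheduler actually computes the next run time.

```python
from datetime import date, datetime, time, timedelta


def next_run(after, run_at, weekdays, skip_dates, only_on=None,
             horizon_days=366):
    """Return the first run time later than `after`, or None ("never").

    weekdays   -- allowed weekday numbers, Monday=0 ... Sunday=6
    skip_dates -- dates on which the job must not run (NOT_ON, action SKIP)
    only_on    -- if given, the job may run only on these dates
    """
    for offset in range(horizon_days + 1):
        day = after.date() + timedelta(days=offset)
        candidate = datetime.combine(day, run_at)
        if candidate <= after:
            continue  # already in the past
        if day.weekday() not in weekdays or day in skip_dates:
            continue  # wrong weekday or an excluded special day
        if only_on is not None and day not in only_on:
            continue  # restricted to special days, and this is not one
        return candidate
    return None  # nothing fits inside the horizon: effectively "never"


MON_TO_THU = {0, 1, 2, 3}
after = datetime(1996, 6, 6, 2, 56)  # Thursday, just after job 146 finished

# No restrictions: the job is next due the same evening.
print(next_run(after, time(22, 0), MON_TO_THU, set()))
# 1996-06-06 22:00:00

# Restricted to a single special day that falls on a Saturday: no match.
print(next_run(after, time(22, 0), MON_TO_THU, set(),
               only_on={date(1996, 6, 8)}))
# None
```

In this model a run time of "never" appears only when the allowed weekdays and the permitted dates have nothing in common within the search horizon.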
Eric
|
1115.4 | Special day are ok | BACHUS::WILLEMS | Johan Willems @BRO DTN 856-8739 | Tue Jun 11 1996 04:36 | 6 |
| Eric,
I checked with the customer, and all special day calendars are
filled in until at least the end of this year.
Johan
|