| Title: | SCHEDULER |
| Notice: | Welcome to the Scheduler Conference on node HUMANE ril |
| Moderator: | RUMOR::FALEK |
| Created: | Sat Mar 20 1993 |
| Last Modified: | Tue Jun 03 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1240 |
| Total number of notes: | 5017 |
Hello,
Configuration: MV3190 as Scheduler server running OpenVMS 6.2 and Scheduler 2.1b-7
A cluster of agents running OpenVMS 5.5-2 and Sched Agent 2.1b-5
Problem: From time to time, a job running on an agent never gets its status updated. The job has finished
on the agent side, but on the server side its status is still 'running'.
This happens once or twice a day, with both agents. Other local and remote jobs continue to run
without problems (if dependencies allow it).
In the DBC083_REMOTE_EXECUTOR log file we sometimes find the following:
entered outer main loop
assign to mailbox failed; trying again <- could this be a problem?
Most of the jobs run very often (every 5 or 10 minutes) and take a few seconds to complete.
Here is an extract of the agent log file:
539033266: receieved job 113 from scheduler node BKS010
539033266: local username: EXPSYSTEM
539033266: remote username: EXPSYSTEM
539033266: command: @[.CHECK_CLUSTER]CHECK_CLUSTER
539033266: output file: log_EXPSYSTEM_bigsys:check_cluster_bigsys
Job stats:
cputime: 103 ticks
maxws: 1219
faults: 1248
ios: 191
elapsed: 1 secs
<- this is the job that showed the problem:
there is no "Job ends" message!
539033266: receieved job 114 from scheduler node BKS010
539033266: local username: EXPSYSTEM
539033266: remote username: EXPSYSTEM
539033266: command: @[.CHECK_DISKS]CHECK_DISKS
539033266: output file: log_EXPSYSTEM_bigsys:CHECK_DISKS
Job stats:
cputime: 94 ticks
maxws: 731
faults: 608
ios: 181
elapsed: 1 secs
Job 114 ends at Tue Apr 30 13:20:41 1996
539033266: final error on connect to socket 4 <- Can this be considered normal?
539033266: send failed
539033266: receieved job 120 from scheduler node BKS010
539033266: local username: EXPSYSTEM
539033266: remote username: EXPSYSTEM
539033266: command: @[.CHECK_QUEUE]CHECK_QUEUE
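
To find the affected jobs without reading the whole agent log by hand, one option is to pair each "receieved job N" line with its "Job N ends" line and list the jobs that have no end message. The following is only a minimal sketch based on the two line formats visible in the extract above; the function name `jobs_without_end` is hypothetical, and the assumption that every normally completed job logs exactly one "Job N ends" line is taken from this extract, not from Scheduler documentation.

```python
import re

# Patterns are inferred from the log extract above (assumption, not a documented format).
# "rec\w*ved" tolerates the log's own spelling "receieved" as well as "received".
RECEIVED = re.compile(r"rec\w*ved job (\d+) from scheduler node (\S+)")
ENDS = re.compile(r"Job (\d+) ends at (.+)")


def jobs_without_end(log_lines):
    """Return job numbers that were received but never logged a 'Job N ends' line."""
    received = []  # job numbers in order of arrival
    ended = set()  # job numbers that logged an end message
    for line in log_lines:
        match = RECEIVED.search(line)
        if match:
            received.append(int(match.group(1)))
            continue
        match = ENDS.search(line)
        if match:
            ended.add(int(match.group(1)))
    return [job for job in received if job not in ended]


if __name__ == "__main__":
    sample = [
        "539033266: receieved job 113 from scheduler node BKS010",
        "elapsed: 1 secs",
        "539033266: receieved job 114 from scheduler node BKS010",
        "Job 114 ends at Tue Apr 30 13:20:41 1996",
        "539033266: receieved job 120 from scheduler node BKS010",
    ]
    print(jobs_without_end(sample))  # [113, 120]
```

The last job in a log (120 here) is reported as well, because the extract stops before it completes; a job that is still running at the end of the file should not be counted as a failure.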
Any info will be welcome.
Best regards,
Alain.
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 1089.1 | | BACHUS::BANKEN | | Thu May 02 1996 04:48 | 10 |
Hello, What do you think about the error messages in the log files? Most of the jobs are remote jobs. Can we have confidence in Scheduler in such a configuration? Does anyone have a site where remote jobs are used intensively? Please respond, Alain. | |||||
| 1089.2 | | BACHUS::BANKEN | | Fri May 03 1996 05:49 | 10 |
Hi Scheduler Team, I can imagine that you are all busy with very important work, but on the other hand we must keep customers satisfied. This case will be IMPT'ed ... Thanks in advance for your understanding. Alain. | |||||