Title: | SCHEDULER |
Notice: | Welcome to the Scheduler Conference on node HUMANE ril |
Moderator: | RUMOR::FALEK |
Created: | Sat Mar 20 1993 |
Last Modified: | Tue Jun 03 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 1240 |
Total number of notes: | 5017 |
Hello,

Configuration:

  MV3190 as Scheduler Server running OpenVMS 6.2 and Scheduler 2.1b-7
  A cluster of Agents running OpenVMS 5.5-2 and Sched Agent 2.1b-5

Problem:

From time to time, a job running on an agent never gets its status updated: the job has finished on the agent side, but on the server side its status is still 'running'. This happens once or twice a day, on both agents. Other local and remote jobs continue to work without problems (if dependencies allow it).

In the DBC083_REMOTE_EXECUTOR log file we find the following:

    entered outer main loop
    assign to mailbox failed; trying again
        <- could this be a problem?

Most of the jobs run very often (every 5 or 10 minutes) and take a few seconds to complete.

Here is an extract of the agent log file:

    539033266: receieved job 113 from scheduler node BKS010
    539033266: local username: EXPSYSTEM
    539033266: remote username: EXPSYSTEM
    539033266: command: @[.CHECK_CLUSTER]CHECK_CLUSTER
    539033266: output file: log_EXPSYSTEM_bigsys:check_cluster_bigsys
    Job stats: cputime: 103 ticks maxws: 1219 faults: 1248 ios: 191 elapsed: 1 secs
        <- this is the job that showed the problem: there is no "Job ends" message!
    539033266: receieved job 114 from scheduler node BKS010
    539033266: local username: EXPSYSTEM
    539033266: remote username: EXPSYSTEM
    539033266: command: @[.CHECK_DISKS]CHECK_DISKS
    539033266: output file: log_EXPSYSTEM_bigsys:CHECK_DISKS
    Job stats: cputime: 94 ticks maxws: 731 faults: 608 ios: 181 elapsed: 1 secs
    Job 114 ends at Tue Apr 30 13:20:41 1996
    539033266: final error on connect to socket 4
        <- can this be considered normal?
    539033266: send failed
    539033266: receieved job 120 from scheduler node BKS010
    539033266: local username: EXPSYSTEM
    539033266: remote username: EXPSYSTEM
    539033266: command: @[.CHECK_QUEUE]CHECK_QUEUE

Any info will be welcome.

Best regards,
Alain.
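To narrow down which runs lose their status update, the agent log could be scanned for "receieved job N" entries that are never followed by a matching "Job N ends" line. The following is only a rough sketch, not a Scheduler tool: the function name `runs_without_end` is made up for illustration, the patterns are guessed from the extract above (including the log's own "receieved" spelling), and it assumes a given job number does not start again before its previous run has ended.

```python
import re

# Patterns inferred from the log extract; the real log format may differ.
RECEIVED = re.compile(r"receie?ved job (\d+) from scheduler node")
ENDED = re.compile(r"Job (\d+) ends at")


def runs_without_end(log_text):
    """Return (job_number, line_number) pairs for runs with no 'Job N ends' line.

    Assumption: a job number is not received again before its previous run
    ends, so a second 'received' for the same job means the earlier run
    never logged its end.
    """
    pending = {}   # job number -> line number of its open 'received' entry
    missing = []   # runs already known to lack an end message
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = RECEIVED.search(line)
        if match:
            job = int(match.group(1))
            if job in pending:
                missing.append((job, pending[job]))
            pending[job] = lineno
            continue
        match = ENDED.search(line)
        if match:
            pending.pop(int(match.group(1)), None)
    # Runs still open at the end of the log are reported as well; the last
    # one may simply be in progress when the log was read.
    return sorted(missing + list(pending.items()), key=lambda item: item[1])
```

Run against the extract above, this would flag job 113 and also job 120, the latter only because the extract stops before that job completes.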
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1089.1 | | BACHUS::BANKEN | | Thu May 02 1996 05:48 | 10 |
Hello, What do you think about the error messages in the log files? Most of the jobs are remote jobs. Can we have confidence in Scheduler in such a configuration? Does anyone have a site where remote jobs are used intensively? Please respond. Alain. | |||||
1089.2 | | BACHUS::BANKEN | | Fri May 03 1996 06:49 | 10 |
Hi Scheduler Team, I can imagine that you are all busy with very important work, but on the other hand we must keep customers satisfied. This case will be IMPT'ed ... Thanks in advance for your understanding. Alain.