T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
1114.1 | Backup scheduler with script command | CMGOP2::meod22dgp4.gen.meo.dec.com::mckenzie | --> dangling pointer | Wed Jun 05 1996 21:48 | 7 |
|
You can get the same effect (the ability to recreate the
database) without shutting down the scheduler by using:
schedule script job/all
FWIW
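A minimal sketch of how this could be folded into a regular backup job; the output handling is an assumption, since qualifiers and defaults vary between Scheduler versions:
$! Generate a DCL script that can recreate all of the job definitions,
$! without stopping the scheduler.
$ SCHEDULE SCRIPT JOB/ALL
$! (Assumption: capture or redirect the generated script as appropriate
$! for your version, then include it in the regular backup save set.)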
|
1114.2 | ...but why "notRunning" | KERNEL::TITCOMBER | | Fri Jun 07 1996 08:33 | 6 |
|
Thanks for that, but what I really need to know is why the job goes
into the "Not Running" state. Is it expected, or is it a problem?
Rich
|
1114.3 | "Not Running" hardly ever pops up on a normal system | HLFS00::ERIC_S | Eric Sonneveld MCS - B.O. IS Holland | Sat Jun 08 1996 05:23 | 13 |
|
"Not Running" means the scheduler has not obtained the PID details for the job (yet). This shouldn't take more than a few seconds, so most of the time you don't see it at all.
It also happens at image rundown, when the PID is gone and the scheduler has to update the status in the scheduler database.
I have no sources to hand, but from experience I remember seeing this state at startup/rundown of a scheduler job.
On a normal system (one with no performance problems) it should hardly ever be seen; if it is, that indicates poor performance on the system and/or in the scheduler itself.
Eric
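If you want to confirm that the job's process really is gone while the state shows "Not Running", standard DCL is enough; the PID below is just a placeholder for whatever PID the scheduler recorded for the job:
$! Check whether the process the scheduler recorded for the job still exists.
$ pid = "2040011C"                    ! placeholder PID, for illustration only
$ SHOW PROCESS /IDENTIFICATION='pid'
$! %SYSTEM-W-NONEXPR ("nonexistent process") here means the process is
$! gone and the scheduler simply hasn't caught up with the job's status
$! in its database yet.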
|
1114.4 | NotRunning - more explanation | RUMOR::FALEK | ex-TU58 King | Mon Jun 10 1996 15:53 | 19 |
|
The Scheduler DCL interface checks the PID associated with scheduler
jobs that are marked as currently "running" in the scheduler's
database (a disk file), and if the PID isn't found, it shows the job as
"NotRunning" rather than "Running".
If you stop the scheduler while jobs are running, the jobs continue to run
and complete normally (unless you did $ sched stop /abort), but since the
scheduler isn't running, the disk database isn't updated. The user
interface would then show "NotRunning" for the job process. When the
scheduler is restarted, it checks its database and cleans up any jobs
whose PIDs are no longer there (it would probably report the job's
completion status as a failure, "NSCHED-F-job was aborted", since it can't
tell whether the completion was normal, having missed the mailbox
termination message).
In a multi-node cluster this won't happen, provided the scheduler is running
on another node and the job is not restricted to a node that is down,
since the "default" scheduler will take over responsibility for
tracking jobs and updating the status on disk.
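A sketch of the stop/restart sequence described above, using the commands mentioned in this note; the SHOW syntax is an assumption and may differ on your version, and the restart step is whatever startup procedure your installation uses:
$! Stop the scheduler; job processes that are already running carry on
$! and complete normally (adding /ABORT would kill them instead).
$ SCHED STOP
$! While the scheduler is down the DCL interface checks the recorded
$! PIDs itself, so jobs that complete in the meantime show "NotRunning".
$ SCHED SHOW JOB               ! assumed syntax for listing job status
$! Restart the scheduler with your installation's startup procedure.
$! On restart it rereads the database, cleans up jobs whose PIDs are
$! gone, and reports them as aborted because it missed the mailbox
$! termination message.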
|
1114.5 | | RUMOR::FALEK | ex-TU58 King | Mon Jun 10 1996 15:57 | 6 |
|
PS: if jobs stay in the NotRunning state (as reported by the GUI) after the
scheduler has been restarted on the node the job was running on (or started
on any node of the VMScluster, if the job is not restricted to a particular
node), and the scheduler has had time to read through the database and
reach a more-or-less steady state, then something is wrong, since the
scheduler should have updated the job's status.
|
1114.6 | Fixed with V2.1B-7 | KERNEL::TITCOMBER | | Fri Jul 05 1996 09:35 | 9 |
|
Thanks for all the quality explanations and help. The problem was
resolved by upgrading to V2.1B-7; it has not recurred despite many
re-runs of the job.
Thanks again,
Rich
|