
Conference humane::scheduler

Title:SCHEDULER
Notice:Welcome to the Scheduler Conference on node HUMANEril
Moderator:RUMOR::FALEK
Created:Sat Mar 20 1993
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1240
Total number of notes:5017

1185.0. "NOJOB_TO_EXECUTE error in 3.0" by IB001::ANAMARIA (Ana García, MCS (Madrid)) Mon Nov 18 1996 12:29

	Hi.

	My customer is running Polycenter Scheduler for Digital UNIX v3.0 on 
Digital UNIX v3.2C.

	Installation and configuration went fine, but when he tries to start 
an already-defined job, he receives the following message:

SCHED-E-NOJOB_TO_EXECUTE, scheduler job 'job_ident' missing on node 
'machine_name' during execution

	However, the job is in the database and it does start: the state 
changes to 'running', and the GUI shows the running human figure.

	The log file for the job is created, but it contains no information.

	He has run the script associated with the job by hand and it 
works fine (in fact, it is a simple 'echo' command).
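
A hand test inherits the interactive shell's environment, which can hide differences from how a daemon launches the same command. The following is a minimal sketch, assuming Python is available on a test machine, that runs a command from `/` under a small fixed environment. The variable values mirror the agent debug log posted in reply .2; `/bin/sh` stands in for the `/bin/ksh` shown there, and `run_like_agent` is a hypothetical helper, not part of Scheduler.

```python
import subprocess

# Environment modeled on the agent debug log in reply .2.
MINIMAL_ENV = {
    "PATH": "/usr/ucb:/bin:/usr/bin:",
    "HOME": "/",
    "SHELL": "/bin/sh",  # assumption: the log shows /bin/ksh
    "USER": "root",      # a label only; it does not change the real uid
}


def run_like_agent(command):
    """Run `command` through /bin/sh from / with only MINIMAL_ENV set."""
    return subprocess.run(
        ["/bin/sh", "-c", command],
        env=MINIMAL_ENV,
        cwd="/",
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_like_agent("echo hello from the job")
    print(result.returncode, repr(result.stdout), repr(result.stderr))
```

If the command succeeds here as well, the environment is probably not the cause, which points back at how the job process itself is created.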

	He has tested with jobs started from 'root' and 'non-root' users, and 
the results are the same.

	This is a first-time installation, so the database is completely 
new and contains only the jobs being tested.

	What do you think is wrong on the customer's system?

	If you need more info, please let me know.

	Please, we need an urgent answer.

	Thank you very much in advance.

	Regards,

					Ana
1185.1. "MORE INFO." by IB001::ANAMARIA (Ana García, MCS (Madrid)) Tue Nov 19 1996 05:39 (17 lines)

    	Hi again.
    
    	I have some more information, though I don't know whether it is
    related to the problem.
    
    	The system hostname is in capital letters. In Polycenter Scheduler,
    the server name is in capital letters but the agent name (the same
    system) is in lowercase.
    
    	Do you think this could be the cause of this problem? The customer
    will test this by installing again on a test system.
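
Host names are case-insensitive in DNS, but software that compares two names as plain strings treats `MYHOST` and `myhost` as different. The sketch below contrasts the two comparisons. The names are hypothetical, and whether Scheduler compares server and agent names this way is not known from this thread.

```python
def same_name_exact(name_a, name_b):
    """Plain string comparison: case differences make the names unequal."""
    return name_a == name_b


def same_name_ignoring_case(name_a, name_b):
    """Case-insensitive comparison, as DNS itself treats host names."""
    return name_a.strip().lower() == name_b.strip().lower()


if __name__ == "__main__":
    server, agent = "MYHOST", "myhost"  # hypothetical names
    print("exact match:        ", same_name_exact(server, agent))
    print("case-insensitive:   ", same_name_ignoring_case(server, agent))
```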
    
    	Thank you very much in advance.
    
    	Regards.
    
    					Ana

1185.2. "MORE TESTS AND LOG FILES" by IB001::ANAMARIA (Ana García, MCS (Madrid)) Tue Nov 19 1996 09:17 (328 lines)

    	Hi.
    
    	The customer has changed the hostname to lowercase letters and the
    behavior is exactly the same.
    
    	He has installed DCE and Scheduler again, and the results are the
    same.
    
    	One aspect that could have something to do with it: THE SYSTEM IS USING C2.
    							    ======================
    
    
    	After the installation, he ran some of the test jobs that come
    with the kit and got the same error message as in .0. The log files
    are at the end of this note.
    
    	It looks like a system problem because, when the daemon goes to kill
    the process created to run the job, the process has already disappeared.
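
The agent log in this note reports "Error calling kill function, no such pid". That is consistent with the usual POSIX way of probing for a process: calling kill() with signal 0, which delivers nothing and fails with ESRCH when the pid no longer exists. The sketch below shows that check. Whether the agent performs exactly this test is an assumption, and `pid_exists` is a hypothetical helper.

```python
import errno
import os
import subprocess
import sys


def pid_exists(pid):
    """Return True if a process with this pid exists, probing with signal 0."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # process exists but belongs to another user
            return True
        raise
    return True


if __name__ == "__main__":
    # Start a short-lived child and wait for it, so that its pid is gone.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print("this process exists:", pid_exists(os.getpid()))
    print("exited child exists:", pid_exists(child.pid))
```

A job process that exits or is killed before the agent looks for it would produce the same "no such pid" result, so the interesting question is why the child vanishes right after the fork.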
    
    
    	Please, we need an urgent answer.
    
    	Thank you very much in advance.
    
    						Ana
    
    ***********************************************************************
    'DWPROD_AGENT.LOG' FILE   (WITH DEBUGGING LEVEL)
    ================================================
    
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_agent.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_agent.log
ProcessMain: Done processing SCHED_IP_DEBUG 26
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
ProcessMain: Found a Message ... Processing
ProcessMain: message type is 32 message is
SchedArglist {
        ACL Name = ""
        Access Privileges = 12004
        Account = "root"
        Ace = 21002
        Agent User Name = "root"
        Client User Name = "root"
        Close Partition Action = 0
        Fail Job = 0
        Failure Count = 1
        Flags = 559360
        Job Name = "FINANCIAL_REPORT"
        Job Number = 1
        Job Request = 2
        Job Retry Attempts = 100
        Job Retry Count = 0
        Job State = 8
        Job Status = 3
        Net Retry Max Attempts = 0
        Output File = ""
        Owner = "root"
        Partition = "SYSTEM"
        Pid = 11163
        Proxy User Name = "root"
        Ref Job Id = 1
        Restart Step = 0
        Retry Interval = "+0 00:15:00"
        Run Mode = 18001
        Run Priority = 0
        Running on agent = "dwprod"
        SERVER NODE = "dwprod"
        Sched Priority = 100
        Stall Job = 0
        Step Number = 1
        Success Count = 0
        Timeout Job = 0
        Trigger Job Id = 0
        Trigger Job Status = 0
        UIL Node = "dwprod"
        }

ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Received Message SCHED_IP_RUN_JOB 32
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: Run Job requested for ref job number 1 by server node dwprod.  Job was not running - run request continuing.
ProcessMain: AL_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_OWNER GetProperty returned 76972033 with value root
ProcessMain: AL_ACCOUNT GetProperty returned 76972033 with value root
ProcessMain: AL_LOGFILE GetProperty returned 76972033 with value
ProcessMain: AL_RUN_MODE GetProperty returned 76972033 with value 18001
ProcessMain: AL_EXEC_PRIORITY GetProperty returned 76972033 with value 0
ProcessMain: AL_TRIGGER_JOB_ID GetProperty returned 76972033 with value 0
ProcessMain: AL_TRIGGER_JOB_STATUS GetProperty returned 76972033 with value 0
ProcessMain: AL_RESTART_PARAM GetProperty returned 76991338  setting it to '(*&*&%giuhjl()*890biPIBUOY%^&'
ProcessMain: About to run new job
Agent30Job::Run() about to fork.
Child Process: Log File for this job will be: /dev/null
Child Process: Error File for this job will be: /dev/null
Child Process: chdir to /
Child Process: priority set to 0
Child Process: chdir to / OK
Child Process: Child PID is 17552
Child Process: process group leader set to 17552
Child Process: setting gid.
Child Process: initing groups.
ProcessMain: Run new job done function status was 76972033
Agent30Job::SaveToCheckpoint saving to file /var/sched/sched_agent_dwprod_1_r1.ckp
Child Process: setting uid.
1 SCHED_S_NORMAL=76972033
2 SCHED_IP_END_JOB=FALSE
3 PATH=/usr/ucb:/bin:/usr/bin:
4 LD_LIBRARY_PATH=/usr/shlib:/usr/ccs/lib:/usr/lib/cmplrs/cc:/usr/lib:/usr/local/lib:/usr/lib/cmplrs/cxx
5 USER=root
6 HOME=/
7 SHELL=/bin/ksh
8 SCHED_JOB_SERVER=dwprod
9 SCHED_JOBID=1
10 SCHED_REFJOB=1
11 SCHED_STEPNUMBER=0
12 SCHED_JOBOWNER=root
13 SCHED_USERNAME=root
14 SCHED_LOGFILE=
15 SCHED_RUNMODE=18001
16 SCHED_PRIORITY=0
17 SCHED_TRIGGER_JOB_ID=0
18 SCHED_TRIGGER_JOB_STATUS=0
About to MessageDispatch with SchedArglist {
        Job Number = 1
        Pid = 17552
        Ref Job Id = 1
        Running on agent = "dwprod"
        SERVER NODE = "dwprod"
        Status = 76972033
        }

19 SCHED_RESTART=FALSE
20 SCHED_AGENT30_PROC=3
Child Process: Environment String set.
Child Process: Using script /var/sched/sched_agent_job_script.ksh
Child Process: Redirecting stdout, stderr, and stdin.
Message sent.
ProcessMain: Done Processing SCHED_IP_RUN_JOB 32
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
ProcessMain: Found a Message ... Processing
ProcessMain: message type is 5 message is
SchedArglist {
        ACL Name = ""
        Access Privileges = 12004
        Account= "root"
        Ace = 21002
        Agent User Name = "root"
        Client User Name = "root"
        Close Partition Action = 0
        Command = "/usr/bin/sleep 10"
        Fail Job = 0
        Failure Count = 1
        Flags = 559360
        Job Name = "FINANCIAL_REPORT"
        Job Number = 1
        Job Request = 2
        Job Retry Attempts = 100
        Job Retry Count = 0
        Job State = 5
        Job Status = 3
        Net Retry Max Attempts = 0
        Output File = ""
        Owner = "root"
        Partition = "SYSTEM"
        Pid = 17552
        Proxy User Name = "root"
        Ref Job Id = 1
        Restart Step = 0
        Retry Interval = "+0 00:15:00"
        Run Mode = 18001
        Run Priority = 0
        Running on agent = "dwprod"
        SERVER NODE = "dwprod"
        Sched Priority = 100
        Stall Job = 0
        Step Number = 1
        Success Count = 0
        Timeout Job = 0
        UIL Node = "dwprod"
        }

ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Received Message SCHED_IP_JOB_STEP 5
ProcessMain: AL_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_COMMAND GetProperty returned 76972033 with value /usr/bin/sleep 10
ProcessMain: AL_STEP_NUMBER GetProperty returned 76972033 with value 1
ProcessMain: AL_PID GetProperty returned 76972033 with value 17552
ProcessMain: AL_EXEC_PRIORITY GetProperty returned 76972033 with value 0
ProcessMain: saving step */usr/bin/sleep 10* for dwprod 1
Agent30Job::SaveToCheckpoint saving to file /var/sched/sched_agent_dwprod_1_r1.ckp
ProcessMain: Done processing SCHED_IP_JOB_STEP
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
error>>> Error calling kill function, no such pid.
error>>> Missing job: found missing jobSchedBaseObject SchedArglist {
        Account = "root"
        Agent Job Ended = 0
        Command = "/usr/bin/sleep 10"
        Have Step = 1
        JOB START TIME = November 19, 1996 1:56:51 pm
        Job Number = 1
        Output File = ""
        Owner = "root"
        Pid = 17552
        Ref Job Id = 1
        Restart Parameter = "(*&*&%giuhjl()*890biPIBUOY%^&"
        Run Mode = 18001
        Run Priority = 0
        SERVER NODE = "dwprod"
        Step Number = 1
        Trigger Job Id = 0
        Trigger Job Status = 0
        }

ProcessMain: Found a Message ... Processing
ProcessMain: message type is 123 message is
SchedArglist {
        Ref Job Id = 1
        SERVER NODE = "dwprod"
        Status = 76991914
        }

ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Processing SCHED_IP_JOB_MISSING
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: job process missing, job will be deletedSchedBaseObject
 SchedArglist {
        Account = "root"
        Agent Job Ended = 0
        Command = "/usr/bin/sleep 10"
        Have Step = 1
        JOB START TIME = November 19, 1996 1:56:51 pm
        Job Number = 1
        Output File = ""
        Owner = "root"
        Pid = 17552
        Ref Job Id = 1
        Restart Parameter = "(*&*&%giuhjl()*890biPIBUOY%^&"
        Run Mode = 18001
        Run Priority = 0
        SERVER NODE = "dwprod"
        Step Number = 1
        Trigger Job Id = 0
        Trigger Job Status = 0
        }

Deleting checkpoint file: /var/sched/sched_agent_dwprod_1_r1.ckp
About to MessageDispatch with SchedArglist {
        Ref Job Id = 1
        SERVER NODE = "dwprod"
        Status = 76991914
        }

Message sent.
ProcessMain: Done processing SCHED_IP_JOB_MISSING
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
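
Most status values in the agent log equal 76972033, which the log itself prints as SCHED_S_NORMAL. A quick way to find the interesting lines is to scan for any status that differs from it. The sketch below does that for the three line shapes seen in this log. The regular expressions are assumptions based only on this excerpt, and the meaning of the other codes (76991338, 76991914) is not stated in this thread.

```python
import re

SCHED_S_NORMAL = 76972033  # printed as SCHED_S_NORMAL in the agent log

# Line shapes taken from this log excerpt; other log formats may differ.
STATUS_PATTERNS = [
    re.compile(r"GetProperty returned (\d+)"),
    re.compile(r"^\s*Status = (\d+)"),       # not "Job Status = ..."
    re.compile(r"status(?: was|:) (\d+)"),
]


def abnormal_statuses(lines):
    """Return (line_number, status) pairs whose status is not SCHED_S_NORMAL."""
    found = []
    for number, line in enumerate(lines, start=1):
        for pattern in STATUS_PATTERNS:
            match = pattern.search(line)
            if match and int(match.group(1)) != SCHED_S_NORMAL:
                found.append((number, int(match.group(1))))
    return found


if __name__ == "__main__":
    sample = [
        "ProcessMain: AL_SERVER_NODE GetProperty returned 76972033",
        "ProcessMain: AL_RESTART_PARAM GetProperty returned 76991338  setting it to",
        "        Job Status = 3",
        "        Status = 76991914",
        "agent30_main.cxx::ProcessMain returning status: 76972033",
    ]
    for number, status in abnormal_statuses(sample):
        print(number, status)
```

To scan a real file, pass an open file object instead of the sample list, for example `abnormal_statuses(open("/var/sched/dwprod_agent.log"))`.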
    
    
    'DWPROD_ENGINE.LOG' FILE
    ========================
    
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_engine.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_engine.log
...Number of arguments specified: 1
Command line arguments (Default = -1):
        1: process code. Default to 2.
        2: use simulated agents. Default to FALSE(0).
        3: use realtime clock. Default to TRUE(1).
        4: debug level (-1/all to 3). Default to 2.
        5: start without waiting for TXM/verification.  Default to FALSE.
        6: process code for V3 agents.
Default to 3.
        7: port number for V21 agents. Default to 5482.
        8: activate interface to V21 agents. Default to FALSE.
        9: alternative log file name.

-->Engine status initialized to: SchedArglist {
        Build Date = "Oct 16 1995"
        Component Type = "Engine"
        Database is attached = 0
        OS Type = "OSF/1"
        Scheduler Version = "V3.0-02"
        Sender Process Id = SchedProcessAddress SchedArglist {
        SCHEDULER PROCESS CODE = 2
        Server = "dwprod"
        }

        Server = "dwprod"
        Startup Time = November 19, 1996 1:03:49 pm
        }
!!! Job 1 found missing by Agent during execution. Treated as job failed.
!!! Job 1 found missing by Agent during execution. Treated as job failed.
    
    
    	'DWPROD_LISTENER.LOG' FILE
    	===========================
    
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_listener.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_listener.log
Command line arguments:
        arg 1: process code. Default to 4
        arg 2: debug level. Default to 1

...Process code for this UIL is: 4
...Debug level is: 0
    
    
    
    	'DWPROD_TXM.LOG' FILE
    	=====================
    
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_txm.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_txm.log
SchedTrace::CloseLog: closing log file...