| Hi again.
I have some information that may or may not be related to the
problem.
The system hostname is in capital letters. In Polycenter Scheduler,
the server name is in capital letters but the agent name (the same
system) is in lowercase letters.
Do you think this could be the cause of the problem? The customer
will repeat the test by reinstalling on a test system.
Thank you very much in advance.
Regards.
Ana
|
| Hi.
The customer has changed the hostname to lowercase letters and the
behavior is exactly the same.
He has installed DCE and Scheduler again, and the results are the
same.
One aspect that could be related: THE SYSTEM IS USING C2.
======================
After the installation, he ran some of the test jobs that come
with the kit and got the same error message as in .0. The log files
are at the end of this note.
It looks like a system problem because, when the daemon goes to kill
the process created to run the job, that process has already disappeared.
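
For reference, on UNIX systems a kill call that reports "no such pid" normally means the target PID no longer exists (errno ESRCH). The following is a minimal Python sketch of that kind of existence probe. It is only an illustration under that assumption, not the Scheduler agent's actual code, and the function name process_exists is hypothetical.

```python
import os
import subprocess
import sys


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permission.
        os.kill(pid, 0)
    except ProcessLookupError:
        # errno ESRCH: no process with this PID.
        return False
    except PermissionError:
        # errno EPERM: the process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    # Start a short-lived child and reap it, so its PID is gone afterwards.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print("own process exists:", process_exists(os.getpid()))
    print("reaped child exists:", process_exists(child.pid))
```

Running a probe like this against the job PID right after the fork (17552 in the trace below) could help tell whether the child exits on its own or is removed by something else on the system.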
Please, we need an urgent answer.
Thank you very much in advance.
Ana
***********************************************************************
'DWPROD_AGENT.LOG' FILE (WITH DEBUGGING LEVEL)
================================================
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_agent.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_agent.log
ProcessMain: Done processing SCHED_IP_DEBUG 26
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
ProcessMain: Found a Message ... Processing
ProcessMain: message type is 32 message is
SchedArglist {
ACL Name = ""
Access Privileges = 12004
Account = "root"
Ace = 21002
Agent User Name = "root"
Client User Name = "root"
Close Partition Action = 0
Fail Job = 0
Failure Count = 1
Flags = 559360
Job Name = "FINANCIAL_REPORT"
Job Number = 1
Job Request = 2
Job Retry Attempts = 100
Job Retry Count = 0
Job State = 8
Job Status = 3
Net Retry Max Attempts = 0
Output File = ""
Owner = "root"
Partition = "SYSTEM"
Pid = 11163
Proxy User Name = "root"
Ref Job Id = 1
Restart Step = 0
Retry Interval = "+0 00:15:00"
Run Mode = 18001
Run Priority = 0
Running on agent = "dwprod"
SERVER NODE = "dwprod"
Sched Priority = 100
Stall Job = 0
Step Number = 1
Success Count = 0
Timeout Job = 0
Trigger Job Id = 0
Trigger Job Status = 0
UIL Node = "dwprod"
}
ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Received Message SCHED_IP_RUN_JOB 32
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: Run Job requested for ref job number 1 by server node dwprod. Job was not running - run request continuing.
ProcessMain: AL_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_OWNER GetProperty returned 76972033 with value root
ProcessMain: AL_ACCOUNT GetProperty returned 76972033 with value root
ProcessMain: AL_LOGFILE GetProperty returned 76972033 with value
ProcessMain: AL_RUN_MODE GetProperty returned 76972033 with value 18001
ProcessMain: AL_EXEC_PRIORITY GetProperty returned 76972033 with value 0
ProcessMain: AL_TRIGGER_JOB_ID GetProperty returned 76972033 with value 0
ProcessMain: AL_TRIGGER_JOB_STATUS GetProperty returned 76972033 with value 0
ProcessMain: AL_RESTART_PARAM GetProperty returned 76991338 setting it to '(*&*&%giuhjl()*890biPIBUOY%^&'
ProcessMain: About to run new job
Agent30Job::Run() about to fork.
Child Process: Log File for this job will be: /dev/null
Child Process: Error File for this job will be: /dev/null
Child Process: chdir to /
Child Process: priority set to 0
Child Process: chdir to / OK
Child Process: Child PID is 17552
Child Process: process group leader set to 17552
Child Process: setting gid.
Child Process: initing groups.
ProcessMain: Run new job done function status was 76972033
Agent30Job::SaveToCheckpoint saving to file /var/sched/sched_agent_dwprod_1_r1.ckp
Child Process: setting uid.
1 SCHED_S_NORMAL=76972033
2 SCHED_IP_END_JOB=FALSE
3 PATH=/usr/ucb:/bin:/usr/bin:
4 LD_LIBRARY_PATH=/usr/shlib:/usr/ccs/lib:/usr/lib/cmplrs/cc:/usr/lib:/usr/local/lib:/usr/lib/cmplrs/cxx
5 USER=root
6 HOME=/
7 SHELL=/bin/ksh
8 SCHED_JOB_SERVER=dwprod
9 SCHED_JOBID=1
10 SCHED_REFJOB=1
11 SCHED_STEPNUMBER=0
12 SCHED_JOBOWNER=root
13 SCHED_USERNAME=root
14 SCHED_LOGFILE=
15 SCHED_RUNMODE=18001
16 SCHED_PRIORITY=0
17 SCHED_TRIGGER_JOB_ID=0
18 SCHED_TRIGGER_JOB_STATUS=0
About to MessageDispatch with SchedArglist {
Job Number = 1
Pid = 17552
Ref Job Id = 1
Running on agent = "dwprod"
SERVER NODE = "dwprod"
Status = 76972033
}
19 SCHED_RESTART=FALSE
20 SCHED_AGENT30_PROC=3
Child Process: Environment String set.
Child Process: Using script /var/sched/sched_agent_job_script.ksh
Child Process: Redirecting stdout, stderr, and stdin.
Message sent.
ProcessMain: Done Processing SCHED_IP_RUN_JOB 32
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
ProcessMain: Found a Message ... Processing
ProcessMain: message type is 5 message is
SchedArglist {
ACL Name = ""
Access Privileges = 12004
Account= "root"
Ace = 21002
Agent User Name = "root"
Client User Name = "root"
Close Partition Action = 0
Command = "/usr/bin/sleep 10"
Fail Job = 0
Failure Count = 1
Flags = 559360
Job Name = "FINANCIAL_REPORT"
Job Number = 1
Job Request = 2
Job Retry Attempts = 100
Job Retry Count = 0
Job State = 5
Job Status = 3
Net Retry Max Attempts = 0
Output File = ""
Owner = "root"
Partition = "SYSTEM"
Pid = 17552
Proxy User Name = "root"
Ref Job Id = 1
Restart Step = 0
Retry Interval = "+0 00:15:00"
Run Mode = 18001
Run Priority = 0
Running on agent = "dwprod"
SERVER NODE = "dwprod"
Sched Priority = 100
Stall Job = 0
Step Number = 1
Success Count = 0
Timeout Job = 0
UIL Node = "dwprod"
}
ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Received Message SCHED_IP_JOB_STEP 5
ProcessMain: AL_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: AL_COMMAND GetProperty returned 76972033 with value /usr/bin/sleep 10
ProcessMain: AL_STEP_NUMBER GetProperty returned 76972033 with value 1
ProcessMain: AL_PID GetProperty returned 76972033 with value 17552
ProcessMain: AL_EXEC_PRIORITY GetProperty returned 76972033 with value 0
ProcessMain: saving step */usr/bin/sleep 10* for dwprod 1
Agent30Job::SaveToCheckpoint saving to file /var/sched/sched_agent_dwprod_1_r1.ckp
ProcessMain: Done processing SCHED_IP_JOB_STEP
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
error>>> Error calling kill function, no such pid.
error>>> Missing job: found missing jobSchedBaseObject SchedArglist {
Account = "root"
Agent Job Ended = 0
Command = "/usr/bin/sleep 10"
Have Step = 1
JOB START TIME = November 19, 1996 1:56:51 pm
Job Number = 1
Output File = ""
Owner = "root"
Pid = 17552
Ref Job Id = 1
Restart Parameter = "(*&*&%giuhjl()*890biPIBUOY%^&"
Run Mode = 18001
Run Priority = 0
SERVER NODE = "dwprod"
Step Number = 1
Trigger Job Id = 0
Trigger Job Status = 0
}
ProcessMain: Found a Message ... Processing
ProcessMain: message type is 123 message is
SchedArglist {
Ref Job Id = 1
SERVER NODE = "dwprod"
Status = 76991914
}
ProcessMain: AL_SERVER_NODE GetProperty returned 76972033
ProcessMain: Processing SCHED_IP_JOB_MISSING
ProcessMain: AL_REF_JOB_ID GetProperty returned 76972033 with value 1
ProcessMain: job process missing, job will be deletedSchedBaseObject
SchedArglist {
Account = "root"
Agent Job Ended = 0
Command = "/usr/bin/sleep 10"
Have Step = 1
JOB START TIME = November 19, 1996 1:56:51 pm
Job Number = 1
Output File = ""
Owner = "root"
Pid = 17552
Ref Job Id = 1
Restart Parameter = "(*&*&%giuhjl()*890biPIBUOY%^&"
Run Mode = 18001
Run Priority = 0
SERVER NODE = "dwprod"
Step Number = 1
Trigger Job Id = 0
Trigger Job Status = 0
}
Deleting checkpoint file: /var/sched/sched_agent_dwprod_1_r1.ckp
About to MessageDispatch with SchedArglist {
Ref Job Id = 1
SERVER NODE = "dwprod"
Status = 76991914
}
Message sent.
ProcessMain: Done processing SCHED_IP_JOB_MISSING
agent30_main.cxx::ProcessMain returning status: 76972033
Exiting agent30_main.cxx::ProcessMain
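
In a trace of this length the abnormal lines are easy to miss. The sketch below is one possible way to filter them out in Python. It assumes that 76972033 is the normal status (the trace prints SCHED_S_NORMAL=76972033) and that error lines start with "error>>>", as they do above. The function name find_anomalies is hypothetical and not part of the Scheduler kit.

```python
import re

# Value printed as SCHED_S_NORMAL in the agent trace; treated here as "success".
SCHED_S_NORMAL = 76972033

# Matches e.g. "AL_OWNER GetProperty returned 76972033".
STATUS_RE = re.compile(r"(\w+) GetProperty returned (\d+)")


def find_anomalies(lines):
    """Return (line_number, text) pairs for non-normal statuses and error lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        match = STATUS_RE.search(text)
        if match and int(match.group(2)) != SCHED_S_NORMAL:
            hits.append((number, text))
        elif text.startswith("error>>>"):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Path taken from the trace header above; adjust for the system being checked.
    with open("/var/sched/dwprod_agent.log") as log:
        for number, text in find_anomalies(log):
            print(f"{number}: {text}")
```

On the trace above, a filter like this would pick out the AL_RESTART_PARAM lookup that returned 76991338 and the two "error>>>" lines. The Status = 76991914 values inside the SchedArglist dumps use a different line format and would need an extra pattern.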
'DWPROD_ENGINE.LOG' FILE
========================
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_engine.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_engine.log
...Number of arguments specified: 1
Command line arguments (Default = -1):
1: process code. Default to 2.
2: use simulated agents. Default to FALSE(0).
3: use realtime clock. Default to TRUE(1).
4: debug level (-1/all to 3). Default to 2.
5: start without waiting for TXM/verification. Default to FALSE.
6: process code for V3 agents. Default to 3.
7: port number for V21 agents. Default to 5482.
8: activate interface to V21 agents. Default to FALSE.
9: alternative log file name.
-->Engine status initialized to: SchedArglist {
Build Date = "Oct 16 1995"
Component Type = "Engine"
Database is attached = 0
OS Type = "OSF/1"
Scheduler Version = "V3.0-02"
Sender Process Id = SchedProcessAddress SchedArglist {
SCHEDULER PROCESS CODE = 2
Server = "dwprod"
}
Server = "dwprod"
Startup Time = November 19, 1996 1:03:49 pm
}
!!! Job 1 found missing by Agent during execution. Treated as job failed.
!!! Job 1 found missing by Agent during execution. Treated as job failed.
'DWPROD_LISTENER.LOG' FILE
===========================
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_listener.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_listener.log
Command line arguments:
arg 1: process code. Default to 4
arg 2: debug level. Default to 1
...Process code for this UIL is: 4
...Debug level is: 0
'DWPROD_TXM.LOG' FILE
=====================
SchedTrace::OpenSchedulerLog: Opened log file /var/sched/dwprod_txm.log
SchedTrace::OpenSchedulerLog: Start of Trace Information /var/sched/dwprod_txm.log
SchedTrace::CloseLog: closing log file...
|