Re: Note 1209.1 by HLFS00::ERIC_S
> This looks like either a busy system or a badly tuned scheduler
> environment. Look at # 1206.1
Thanks much for that, Eric. I relayed the article to the customer and
asked them to confirm each point in return. They did so, and their
reply follows.
In short, the command execution time is down to about 15 seconds, and
they don't say that they did anything to cause that. They had a couple
of freezes recently.
I'm guessing there might be another reason for the freezes, though the
history log file (VERMONT_CREAMERY.LOG), at 4200 blocks, is over the
2500-block figure the article gives as the point where trouble starts.
James Cameron
Sydney CSC.
From: SMTP%"LinscottS@..." 11-FEB-1997 08:49:09.78
To: <[email protected]>
Subj: RE: K20013 - Response Time Is Slow To DECscheduler Commands
I have added comments (preceded with ***) in the following document.
Sam
[...]
*** We have had two problems in the last week with the scheduler
freezing. At the moment it seems to be ok. It is taking about 15
seconds to action state change requests (such as SCHED HOLD commands).
>[DECsched] Response Time Is Slow To DECscheduler Commands
>SOURCE: Digital Customer Support Center
[snip...]
>o System resources are fully used. In this case other system
> processes will also be slow. Troubleshoot as a system resource
> problem and not a DECscheduler problem. Check out system
> parameters to identify where the bottleneck might be.
*** There are no significant resource problems. The scheduler database
is on a shadowed disk (spread over 2 HSJs). Users notice no problems
with response times.
On one occasion the NSCHED$ disk filled up, and we experienced major
scheduler problems. However, even with a million blocks of free space on
this disk we still have problems.
>o Debugger logfile (NSCHED$:NODENAME.LOG) is larger than 500 blocks.
*** The Debugger log files average about 20 blocks for each node in the
cluster.
>o The history log file is greater than 2500 blocks. The history
> logfile is pointed to by the logical NSCHED$LOGFILE or if
> undefined is NSCHED$:VERMONT_CREAMERY.LOG by default.
*** We have had problems with vermont_creamery.log before, and have
previously been creating a new version of the log file when it reaches
25000 blocks. Currently the log file is 4200 blocks. We have noticed
lately that starting a new version of the log file does not improve
response times.
>o DECscheduler logging may need to be reduced.
*** We log 5 events (1 job and 4 abnormal) on each node in the cluster.
We guess that this information is used by the SCHED SHO HIST command.
>o The DECscheduler database (VSS.DAT) may be fragmented from the
> number of deletes being greater than 200.
*** The number of deletes is 46.
>o The DECscheduler default node transition may be set to a slower
> system. Issue the command "$ SCHED SHOW STATUS" for the following
> output:
*** The default node transition was a VAX 6600. I have just moved it to
a VAX 7800.
>o The NSCHED priority may have been lowered. In the example
> above (#5), the "Pri" field is the default priority that all jobs
> will run at. Is this number less than four? If so, consider
> increasing this priority.
*** The default priority has always been 4 for all nodes.
>o A new node is currently bringing up DECscheduler. This may cause a
> temporary slowness in writing information to the database.
*** All nodes have been up for several days.
>o Review cluster node system times. If you're operating in a clustered
> environment, DECscheduler's performance can be greatly affected if
> the system clocks on the various nodes don't agree.
*** All nodes have the same time. NTP is used to keep the times
synchronised.
[end]
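For reference, the customer's figures can be set against the numeric
thresholds in the article. This is only a rough sketch in Python, not
anything DECscheduler provides: the names are made up for illustration,
the numbers are copied from the article and the reply above, and the
priority test follows the article's "less than four" wording.

```python
# Thresholds from the article, reported values from the customer's reply.
# CHECKS and flagged() are hypothetical names used only for this tabulation.
CHECKS = [
    # (description, reported value, article threshold, comparison)
    ("debugger log blocks", 20, 500, "gt"),    # trouble if larger than 500
    ("history log blocks", 4200, 2500, "gt"),  # trouble if greater than 2500
    ("database deletes", 46, 200, "gt"),       # trouble if greater than 200
    ("default priority", 4, 4, "lt"),          # trouble if less than four
]


def flagged(checks):
    """Return the descriptions whose reported value trips the threshold."""
    tripped = []
    for name, value, limit, comparison in checks:
        if comparison == "gt":
            is_problem = value > limit
        else:
            is_problem = value < limit
        if is_problem:
            tripped.append(name)
    return tripped


print(flagged(CHECKS))
```

Of the four numeric points, only the history log file comes out over its
threshold. The remaining points in the article (resources, logging level,
node startup, clock agreement) are not numeric and are not covered here.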