T.R | Title | User | Personal Name | Date | Lines |
---|
448.1 | | UTRTSC::utoras-198-48-146.uto.dec.com::JurVanDerBurg | Change mode to Panic! | Fri Apr 11 1997 02:10 | 8 |
| If you have DECPS data then start looking at it. It contains a lot of info
on the system's state which should point you in the right direction. It may
be possible to reduce the default 2-minute sample interval of DECPS to
check with finer granularity, which should certainly give the info you
need.
Jur.
|
448.2 | | BSS::JILSON | WFH in the Chemung River Valley | Fri Apr 11 1997 10:08 | 12 |
| You cannot change the main collection (CPD) interval. It is fixed at 2
min. You can create an alternate collection at a smaller interval.
I would be looking at just the specific evidence time periods and see if
the cpu is overworked at just those times. This conclusion is saying that
you have too many COM processes that are above their default priority AND
the process consuming the most cpu time is using > 40% AND the top process
is a high priority process. It would appear you have time periods where
the cpu cannot keep up with the demand or the top priority process is going
compute bound. Might be time to add another CPU, if possible.
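In the meantime, if you just want finer-grained numbers over the evidence
windows while an alternate collection spins up, plain MONITOR works too
(the 5-second sample interval here is only an example):

    $ MONITOR MODES,STATES/INTERVAL=5  ! CPU modes plus COM/COMO process counts
    $ MONITOR PROCESSES/TOPCPU         ! bar chart of the top CPU consumers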
Jilly
|
448.3 | Timers falling into sync? | EPS::VANDENHEUVEL | Hein | Fri Apr 11 1997 11:26 | 54 |
|
> The problem is that the system hesitates.
:
> there are no interactive users,
Can you refine that problem description please?
Who notices the hesitations, since there are no interactive
users to give feedback on, say, echo times during an edit session :-)
Should we think about sub-second hiccups while editing?
Monitor screens freezing for several seconds?
No printers going for several minutes?
Perhaps you 'simply' have a clock sync resonance problem?
Some timers that are prone to go off (milli)seconds apart
on a set time interval? For the sake of argument, let's
have 10 schedulers of sorts waking themselves up every 10
seconds, each requiring 0.5 seconds of CPU. If they
manage to do so perfectly spread out, you'll see no com queue
and the CPU 50% busy. If the timers all go off at the same
time, you'll see a com queue of 9 at that moment. Now QUANTUM
starts to play an important role. If each of those tasks is allowed
to go from start to finish without being pre-empted, then every
0.5 seconds one will be done and there will be an average com queue
of 2.5 (half of 10 processes waiting for 5 seconds every 10 seconds)
and still an average CPU busy of only 50%!
If quantum is low, then the CPU will be handed over all the time
and all tasks will finish within a quantum of each other, when
all of them are done. There would be a com queue of 9 for 5 seconds
out of every 10 and 0 for the other 5, giving an average com queue
of 4.5. Yuck.
In the latter case, even if the start times are spread, say, 2 seconds wide
around a central time, there will still be a clump of activity where
you are likely to hit a com queue, and once you hit the queue, you'll be
part of that queue and compound the problem. It only takes a little
external sync point, like a disk volume allocation lock, for them to
start syncing more and more and for the queue to spike more and more.
You might need bigger quantums to allow processes to start their
jobs and finish them. If you can tweak timers, you may try to set them
not to coincide all the time (7, 11, and 13 seconds instead of 3 times 10).
You might find an external sync event between seemingly unrelated
processes that you can spread out, for example a log file looking for space
on a disk all the time, fighting with a process creating files.
Solutions: spread activities over disks and directories. Make files
extend in serious chunks instead of the silly 5 blocks found too often
(see the one-liner below). Get a second CPU! That should flatten out
those peaks tremendously.
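For the extend-chunk point, the one-liner (256 blocks is just an
illustrative number, tune it to your disks):

    $ ! Make RMS extend files in serious chunks system-wide
    $ SET RMS_DEFAULT /SYSTEM /EXTEND_QUANTITY=256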
fwiw,
Hein.
|
448.4 | Too many scheduled jobs? | TAY2P1::HOWARD | Whoever it takes | Fri Apr 11 1997 18:39 | 39 |
| The only interactive users are a few people logging into SYSTEM to
monitor queues and check on the system. I see the hesitation, but
people printing see jobs that seem to take forever to print. People
printing the same job to the same printer via an NT server see it
finish in about a quarter of the time. Usually, I find that DECPS
gives very concrete suggestions, such as "increase NPAGEDYN" or "reduce
file fragmentation". The initial report said to run LIBDECOMP, which I
did. It also suggested rewriting the applications, which are all
standard Digital or former Digital products. I also installed
DCPS$SMB, since that is used a great deal. File fragmentation is good
to excellent on all drives.
I like the idea of adding a CPU, but that isn't going to happen unless
I can find an idle asset somewhere.
The current report gives massive lists of image activations, e.g.,
                 # of    Page Faults  Avg.   % of   % of     % of   Uptime/ Cputim/
                 activ-  per Actvtn   Ws     Direct Buffered        image   image
    Image        ations  -Soft- -Hard size   I/O    I/O      Cputim (sec)   (sec)
    ----------   ------  ------ ----- -----  ------ -------- ------ ------- -------
      . . .
    DQS$CLIENT        4     325    22   374    0.08     0.05   0.01      11    0.56
Would INSTALLing this image be likely to make a difference? 325 page
faults is not too many over a 24-hour period.
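I assume the incantation would be something like this (the image's
location is my guess -- I'd check where DQS actually puts it):

    $ ! Known image: file open, header resident, sections shared --
    $ ! cuts the cost of each activation
    $ INSTALL ADD SYS$SYSTEM:DQS$CLIENT.EXE /OPEN /HEADER_RESIDENT /SHARED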
There are a lot of Scheduler jobs. Some of them run periodically
through the day. I'm not sure why they would cause undue strain, since
they mostly look for stopped queues or error messages. Mostly DCL and
mostly running at normal priority. Is it just that the Scheduler is
waking up to check them?
.3 suggests increasing quantums. Is that what you mean? I will review
these jobs to try to keep them from running into each other.
Ben
|
448.5 | | ZIMBRA::BERNARDO | Dave Bernardo, VMS Engineering | Fri Apr 11 1997 19:22 | 4 |
| I would be tempted to reduce QUANTUM before I'd increase it...
If you do, reduce AWSTIME as well.
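For reference, the mechanics (a sketch -- the values are placeholders,
and changes to the ACTIVE system evaporate at reboot, so put the keepers
in MODPARAMS.DAT and run AUTOGEN):

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE ACTIVE
    SYSGEN> SHOW QUANTUM           ! both are dynamic parameters
    SYSGEN> SHOW AWSTIME
    SYSGEN> SET QUANTUM 10         ! placeholder value, units of 10 ms
    SYSGEN> SET AWSTIME 20         ! keep AWSTIME in step with QUANTUM
    SYSGEN> WRITE ACTIVE
    SYSGEN> EXIT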
d.
|
448.6 | How many DCPS queues? Version of DCPS? | KEIKI::WHITE | MIN(2¢,FWIW) | Sat Apr 12 1997 20:40 | 19 |
|
There is an issue with DCPS where all the symbionts wake up
every 0.1 of a second whether the queues are active or not.
This is a known problem that occurs because DCPS uses DECthreads in its
operation. DECthreads uses a timer AST which expires every 0.1 of a
second and then resets itself. This causes the process to become
computable (COM) to handle the AST.
How many DCPS queues do you have, and what version are they? This would
probably throw DECps into fits.
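A quick way to count the symbiont processes from DCL (a sketch -- it
assumes the default SYMBIONT_* process naming, so adjust the pattern for
your site, and F$GETJPI on other people's processes needs WORLD priv):

    $ ctx = ""
    $ cnt = 0
    $ tmp = F$CONTEXT("PROCESS", ctx, "PRCNAM", "SYMBIONT_*", "EQL")
    $ loop:
    $     pid = F$PID(ctx)
    $     IF pid .EQS. "" THEN GOTO done
    $     cnt = cnt + 1
    $     WRITE SYS$OUTPUT F$GETJPI(pid, "PRCNAM"), "  ", pid
    $     GOTO loop
    $ done:
    $ WRITE SYS$OUTPUT "Symbiont processes: ''cnt'"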
Bill
PS - Comet V4.3 search criteria - dcps decthreads cpu time -
were the four words used.
|
448.7 | Running 1 symbiont process per queue | TAY2P1::HOWARD | Whoever it takes | Tue Apr 15 1997 17:39 | 11 |
| Thanks for the input. I will start with the number of DCPS processes
and look at QUANTUM after that. PSPA was very happy over the weekend
after I CONVERTed NSCHED$:VSS.DAT. It went from 4200 to 52 blocks. But
Monday's report is pretty much as before. Not sure if this was
related, but it probably was a good idea anyway.
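(The compress was along these lines -- CONVERT picks up the input file's
own organization when you don't give it an FDL; I made sure the Scheduler
didn't have VSS.DAT open first:)

    $ CONVERT NSCHED$:VSS.DAT NSCHED$:VSS.DAT  ! writes a new, compressed version
    $ PURGE NSCHED$:VSS.DAT                    ! drop the bloated old copy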
>How many DCPS queues and what version are they?
There are 58 DCPS queues running DCPS V1.3. 14 were inactive last week.
Ben
|
448.8 | DCPS$MAX_STREAMS | FUNYET::ANDERSON | Exchange *this* | Tue Apr 15 1997 21:28 | 5 |
| You can reduce the number of DCPS symbiont processes by changing the value of
the logical name DCPS$MAX_STREAMS as described in DCPS$STARTUP.COM. This may
help your situation.
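Something along these lines before the symbionts start (a sketch --
match the logical name mode to whatever DCPS$STARTUP.COM uses for its
other definitions):

    $ ! Each DCPS symbiont process can serve several queues (streams);
    $ ! capping streams means 58 queues no longer need 58 processes
    $ DEFINE /SYSTEM /EXECUTIVE_MODE DCPS$MAX_STREAMS 4

Then stop and restart the DCPS queues so the new value takes effect.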
Paul
|
448.9 | Looks like DCPS$MAX_STREAMS was the solution | TAY2P1::HOWARD | Whoever it takes | Fri Apr 18 1997 18:34 | 10 |
| I set DCPS$MAX_STREAMS to 4 and PSPA is now reporting no bottleneck.
It had been 8 before the problem began, but I had removed the logical
in an attempt to get PATHWORKS working. I don't know if it helped that
problem, since I did several things at the same time to get things
going again.
I appreciate the help, because I did not see the relationship between
that and the problems.
Ben
|