T.R | Title | User | Personal Name | Date | Lines |
---|
327.1 | real problem is with choice of language | SAUTER::SAUTER | John Sauter | Tue Feb 04 1992 15:40 | 7 |
| I don't have an answer to your specific question but I do have a
general observation on this kind of problem. If the customer is
displeased because he looks at the assembly code generated by a
compiler and wishes it were better, he is using the wrong language.
If he knows what assembly code he wants, he should be coding in
the assembler.
John Sauter
|
327.2 | suggest caution | SGOUTL::BELDIN_R | Pull us together, not apart | Wed Feb 05 1992 09:54 | 8 |
| re .0
Many C instructors recommend that programmers not depend on use of the
"register" attribute for variables precisely because the results are
highly hardware dependent. It sounds like the customer is trying to
find something to criticize. Watch out.
Dick
|
327.3 | Pay attention to customers... | CIRCUS::DETLEFS | | Wed Feb 05 1992 13:13 | 30 |
| John --
I'm afraid that your "general observation"
> I don't have an answer to your specific question but I do have a
> general observation on this kind of problem. If the customer is
> displeased because he looks at the assembly code generated by a
> compiler and wishes it were better, he is using the wrong language.
> If he knows what assembly code he wants, he should be coding in
> the assembler.
doesn't make much sense to me. Taken to the extreme, it seems to imply
that we should not attempt to create optimizing compilers. Why should
the customer not expect the compiler to generate good code? If he
looks at the generated code and doesn't like it, then one could just
as well say that he is using the wrong compiler; he may well go out
and find one that gives him a more pleasing result.
Perhaps this is the source of our disagreement: I'm assuming that
if the customer can look at the assembly code and envision equivalent,
more efficient code, then the customer is fairly knowledgeable about
compiler optimizations, and is probably proposing a reasonable
optimization that the compiler didn't do. You're probably assuming
that the customer is proposing some ad-hoc hack that's applicable
only in this instance. If my assumption is true, DEC ought to listen
very carefully to the customer's feedback. If yours is true, then
your attitude is probably correct. We ought to make darn sure which
case this falls into, though.
Dave
|
327.4 | I agree there are two cases... | SAUTER::SAUTER | John Sauter | Wed Feb 05 1992 15:52 | 14 |
| I don't think a customer should expect a compiler to generate good
code, though a customer does expect his program to run fast. The
description in .0 didn't clearly indicate that the problem was
performance; it sounded like the customer felt that the code was "bad"
because it didn't make "good" use of registers. A compiler shouldn't
be obligated to generate "beautiful" code, only to arrange for the
program to run acceptably fast.
If the customer has profiled his application, discovered a hot spot,
examined the code at that spot, rewritten it in assembler, and measured
a significant improvement in the performance of his application, then
we should listen to him. If he's just complaining about the beauty of
the code, we shouldn't.
John Sauter
|
327.5 | Additional information | SANFAN::MCNICHOL_TH | | Thu Feb 06 1992 15:25 | 50 |
| G'day,
I'd like to follow up on the replies I've received to my original
note. The customer is a doctor/researcher at UC San Francisco. He
is VERY technically competent and a great DEC supporter. I'd really
like to help him, or at least let him know that there's an attempt
to listen to him. Please....can someone tell me how I go about
having someone address this? Even someone calling to clarify and
qualify his concern would help. I need to come up with some kind of
strategy. I know the resource exists, but I can really use your assistance.
From his reply below, he knows what he's doing....
Ted
******************************************************************************
Ted,
I thought you would like to know just how slowly the 5000
is running on this convolution. The program reduces to a
tight loop that is running 25 instructions, 60% lw or sw,
20% adds and 20% integer multiplies. For loop control, let's
give it an additional 15 instructions for a total of 40.
The loop is run 64x16x16x16x8 times and it takes 69 seconds.
That amounts to about 12 MIPS. The machine isn't doing anything
else, not even running a bunch of windows while this is happening.
So, where is the rest of the time going?
I've tried a couple of things that are intended to improve the
pipelining of instructions and reduce stalls, but they don't
give any improvement in performance.
I'm about to give up on this optimization and just run my
program. The program will run 6.7 hours and has to run 128
times -- so you see why I'm looking for a way to speed it up!
Steve
% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
% Received: by mts-gw.pa.dec.com; id AA22907; Thu, 6 Feb 92 08:00:02 -0800
% Received: from phye.ucsf.EDU by cgl.ucsf.EDU (5.65/GSC4.21) id AA20713 for [email protected]; Thu, 6 Feb 92 07:59:56 -080
% Received: by phye.ucsf.EDU (5.57/GSC4.19) id AA23446; Thu, 6 Feb 92 08:00:27 -080
% Date: Thu, 6 Feb 92 08:00:27 -0800
% From: [email protected]
% Message-Id: <[email protected]>
% To: sanfan::mcnichol_th
% Subject: depressingly slow ...
|
327.6 | consider the CSC | SAUTER::SAUTER | John Sauter | Thu Feb 06 1992 15:47 | 3 |
| Perhaps you should contact the Customer Support Center, or put your
customer in touch with the specialists there.
John Sauter
|
327.7 | Some analysis... | SMOP::GLOSSOP | Kent Glossop | Thu Feb 06 1992 16:59 | 63 |
| I'm a little confused by how he got 12 MIPS, but...
64*16*16*16*8 = ~2M loop iterations
If this takes 69 seconds, that's about 30400 iterations/second, or about
822 cycles/iteration on a 25MHz machine, which seems very slow indeed
if the loop really is only on the order of 40 instructions. (20+ cycles
per instruction seems very high, even in the absolute worst case.)
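For anyone who wants to redo the arithmetic, here is a minimal sketch in Python. The inputs are the figures quoted in this note and in the customer's mail (the trip count, 69 seconds, a 25MHz clock, and roughly 40 instructions per iteration). The function name `loop_rates` is just an illustrative choice. On these inputs the implied rate comes out near 1.2 MIPS rather than 12, which may be where the confusion lies.

```python
# Back-of-the-envelope check of the loop figures discussed above.
# All inputs are assumptions taken from the customer's mail and this note.
LOOP_ITERATIONS = 64 * 16 * 16 * 16 * 8   # trip count quoted by the customer
ELAPSED_SECONDS = 69                       # measured run time
CLOCK_HZ = 25_000_000                      # 25MHz machine, as assumed above
INSTRUCTIONS_PER_ITERATION = 40            # 25 in the loop body + 15 loop control


def loop_rates(iterations, seconds, clock_hz, instrs_per_iter):
    """Return (iterations/sec, cycles/iteration, cycles/instruction, MIPS)."""
    iters_per_second = iterations / seconds
    cycles_per_iter = clock_hz / iters_per_second
    cycles_per_instr = cycles_per_iter / instrs_per_iter
    mips = iterations * instrs_per_iter / seconds / 1e6
    return iters_per_second, cycles_per_iter, cycles_per_instr, mips


if __name__ == "__main__":
    ips, cpi_iter, cpi_instr, mips = loop_rates(
        LOOP_ITERATIONS, ELAPSED_SECONDS, CLOCK_HZ, INSTRUCTIONS_PER_ITERATION
    )
    print(f"iterations/second  : {ips:,.0f}")
    print(f"cycles/iteration   : {cpi_iter:.0f}")
    print(f"cycles/instruction : {cpi_instr:.1f}")
    print(f"implied MIPS       : {mips:.2f}")
```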
One thing to keep in mind is that integer multiply is very slow on MIPS
relative to the other integer instructions (which is very typical for
RISC machines.) Multiplies on MIPS R3000s take 15 cycles each if you
really can issue them back-to-back. (i.e. mult/mflo/stall/stall/mult.)
However, having said all of this, if he's right about the number of
instructions, it should come to only around 100 cycles/iteration even
for unscheduled/unoptimized code from the perspective of the processor
(presuming all cache hits for memory references).
Memory references could be the culprit.
The first possibility is if things don't fit in the on-chip cache,
many of these memory references may be going all the way to memory.
(Are they dealing with extremely large arrays by any chance?) When
things miss the cache, they basically start running at closer to
memory speeds. (i.e. what matters is the fact that you have n-nanosecond
memory, say 40/60/80 - I don't know what the DS5000 has offhand.) Note
that quite frequently the cost of each non-cached reference isn't that
bad, depending on the memory system design.
The next possibility is that the data involved is large enough that
it is paging. If that is true, it could definitely explain things.
Another possible culprit is that RISC machines tend to have a
direct-mapped cache. If things are separated by a power of two larger
than the cache size, it is possible to get (very) destructive cache
interference. (The number of iterations is suggestive that there
might be matrices with power-of-2 sizes.) If he has large power-of-2
sized matrices, one thing to try is to pad them by a relatively small
amount (say, 32 bytes) to prevent cache-line interference.
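To illustrate the interference effect, here is a small Python model of a direct-mapped cache index. The geometry (64 KiB cache, 16-byte lines), the 256 KiB array size, and the base address are all assumptions picked for illustration, not the DS5000's documented parameters. It only counts how often a[i] and b[i] land on the same cache line index. It does not model timing.

```python
# Toy model of direct-mapped cache indexing.
# ASSUMED geometry for illustration only; not the DS5000's actual cache.
CACHE_BYTES = 64 * 1024
LINE_BYTES = 16
NUM_LINES = CACHE_BYTES // LINE_BYTES


def cache_line(addr):
    """Line index that a byte address maps to in a direct-mapped cache."""
    return (addr // LINE_BYTES) % NUM_LINES


def count_conflicts(base_a, base_b, n_elems, elem_bytes=4):
    """Count indices i where a[i] and b[i] share a cache line index."""
    return sum(
        cache_line(base_a + i * elem_bytes) == cache_line(base_b + i * elem_bytes)
        for i in range(n_elems)
    )


if __name__ == "__main__":
    array_bytes = 256 * 1024          # hypothetical power-of-2 sized array
    n = array_bytes // 4              # number of 4-byte elements
    base_a = 0x10000000               # hypothetical, line-aligned base address

    # b placed immediately after a: separation is a multiple of the cache size.
    print("unpadded conflicts:", count_conflicts(base_a, base_a + array_bytes, n))
    # b shifted by 32 bytes of padding: the two arrays no longer share lines.
    print("padded conflicts  :", count_conflicts(base_a, base_a + array_bytes + 32, n))
```

In this model every element pair collides when the arrays are a multiple of the cache size apart, and none do once the second array is shifted by 32 bytes. A real loop touches memory in its own pattern, so measure before and after padding.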
--------------------
Figuring worst-case (basically unschedulable code, etc.):
Fixed-time instructions
    5 mul/add (which actually have one additional     ~80 cycles
      mflo instruction per multiply)
    15 loop control instructions                       15 cycles
That leaves basically 725 cycles for 15 memory references, or almost
50 cycles apiece, which would be 2000 nanoseconds/memory reference,
which I wouldn't believe even with the worst memory system. I might
well believe this if there is paging involved. (From the description,
it sounds like they believe that they have eliminated that, though.)
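The budget above can be rechecked with a short Python sketch. The inputs are the estimates from this note (about 822 cycles/iteration, ~80 cycles for the multiplies, 15 for loop control, 15 memory references, 25MHz clock). The function name `memory_budget` is only illustrative.

```python
# Rough cycle budget per loop iteration, using the estimates in this note.
# Every constant here is an assumption carried over from the discussion.
CLOCK_HZ = 25_000_000           # 25MHz machine
CYCLES_PER_ITERATION = 822      # derived from 69 seconds for ~2M iterations
MULTIPLY_CYCLES = 80            # 5 mul/add, including the extra mflo each
LOOP_CONTROL_CYCLES = 15        # 15 loop control instructions
MEMORY_REFERENCES = 15          # lw/sw instructions in the loop body


def memory_budget(total, multiply, loop_control, references, clock_hz):
    """Return (leftover cycles, cycles per reference, ns per reference)."""
    leftover = total - multiply - loop_control
    cycles_per_ref = leftover / references
    ns_per_cycle = 1e9 / clock_hz
    return leftover, cycles_per_ref, cycles_per_ref * ns_per_cycle


if __name__ == "__main__":
    leftover, per_ref, ns = memory_budget(
        CYCLES_PER_ITERATION, MULTIPLY_CYCLES, LOOP_CONTROL_CYCLES,
        MEMORY_REFERENCES, CLOCK_HZ,
    )
    print(f"cycles left for memory : {leftover}")
    print(f"cycles per reference   : {per_ref:.1f}")
    print(f"ns per reference       : {ns:.0f}")
```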
--------------------
I'm not sure if this helped or not, but if the loop figures are correct,
I would suspect memory references, and if the time is really this long,
I would suspect paging, because even the worst memory system should
be at least a factor of 10 faster than what the above figures
came up with.
Kent
|
327.8 | Expect good code | DREGS::BLICKSTEIN | Soaring on the wings of dawn | Wed Apr 08 1992 15:44 | 20 |
| re: .1, .4 (Sauter)
> I don't have an answer to your specific question but I do have a
> general observation on this kind of problem. If the customer is
> displeased because he looks at the assembly code generated by a
> compiler and wishes it were better, he is using the wrong language.
> If he knows what assembly code he wants, he should be coding in
> the assembler.
> I don't think a customer should expect a compiler to generate good
> code, though a customer does expect his program to run fast.
> A compiler shouldn't be obligated to generate "beautiful" code, only to
> arrange for the program to run acceptably fast.
The opinions expressed by Mr. Sauter are his own and do not necessarily
reflect those of The Language Group (TLG). ;-)
Dave Blickstein
GEM Optimizer
|