
Conference turris::languages

Title:Languages
Notice:Speaking In Tongues
Moderator:TLE::TOKLAS::FELDMAN
Created:Sat Jan 25 1986
Last Modified:Wed May 21 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:394
Total number of notes:2683

327.0. "C compiler question" by SANFAN::MCNICHOL_TH () Mon Feb 03 1992 18:48

G'day,

    I'm looking for some support for "C" on a RISC system.  I have a 
    customer who's having a problem (see below).  So, if someone
    could direct me to another notes file, or a person that may have
    some understanding of the problem and its resolution, I'd really
    appreciate the assistance.

    Thanks in advance,

         TED

**************************************************************************
**************************************************************************

Ted,

I am up against a C-compiler optimization problem that
is frustrating me.  I am trying to run a convolution
program and I can see by looking at the assembler code
generated by cc that it is having all kinds of problems
that could be resolved if the compiler were smarter
about its use of registers. 

The question is whether there is a compiler that can
be instructed better by the user to save its registers
and use them in the inside of a tight loop?

Steve

T.R  Title  User  Personal Name  Date  Lines
327.1  real problem is with choice of language  SAUTER::SAUTER  John Sauter  Tue Feb 04 1992 15:40  7
    I don't have an answer to your specific question but I do have a
    general observation on this kind of problem.  If the customer is
    displeased because he looks at the assembly code generated by a
    compiler and wishes it were better, he is using the wrong language.
    If he knows what assembly code he wants, he should be coding in
    the assembler.
        John Sauter
327.2  suggest caution  SGOUTL::BELDIN_R  Pull us together, not apart  Wed Feb 05 1992 09:54  8
    re .0
    
    Many C instructors recommend that programmers not depend on use of the
    "register" attribute for variables precisely because the results are
    highly hardware dependent.  It sounds like the customer is trying to
    find something to criticize.  Watch out.
    
    Dick
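
For readers unfamiliar with the "register" attribute discussed above: it is only a hint, and a compiler is free to ignore it. A common way to make register allocation easy in a tight loop is to accumulate into a local scalar and store the result once per output element. The sketch below is a hypothetical 1-D integer convolution, not the customer's program (which is not shown in this thread); the function name and argument layout are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical 1-D integer convolution over the "valid" region only:
 * out must have room for n - k + 1 elements.  Not the customer's code.
 */
void convolve_valid(const int *in, size_t n,
                    const int *kernel, size_t k, int *out)
{
    size_t i, j;

    if (k == 0 || n < k)
        return;                       /* nothing to compute */

    for (i = 0; i + k <= n; i++) {
        register int acc = 0;         /* hint only; the compiler may ignore it */
        const int *p = in + i;        /* hoisted base pointer for this window */

        for (j = 0; j < k; j++)
            acc += p[j] * kernel[k - 1 - j];   /* kernel is flipped: convolution */

        out[i] = acc;                 /* one store per output element */
    }
}
```

Whether this changes the generated code depends entirely on the compiler and target, which is the caution raised above.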
327.3  Pay attention to customers...  CIRCUS::DETLEFS  Wed Feb 05 1992 13:13  30
John --

I'm afraid that your "general observation"

    I don't have an answer to your specific question but I do have a
    general observation on this kind of problem.  If the customer is
    displeased because he looks at the assembly code generated by a
    compiler and wishes it were better, he is using the wrong language.
    If he knows what assembly code he wants, he should be coding in
    the assembler.

doesn't make much sense to me.  Taken to the extreme, it seems to imply
that we should not attempt to create optimizing compilers.  Why should
the customer not expect the compiler to generate good code?  If he
looks at the generated code and doesn't like it, then one could just
as well say that he is using the wrong compiler; he may well go out
and find one that gives him a more pleasing result.

Perhaps this is the source of our disagreement:  I'm assuming that
if the customer can look at the assembly code and envision equivalent,
more efficient code, then the customer is fairly knowledgeable about
compiler optimizations, and is probably proposing a reasonable
optimization that the compiler didn't do.  You're probably assuming
that the customer is proposing some ad-hoc hack that's applicable
only in this instance.  If my assumption is true, DEC ought to listen
very carefully to the customer's feedback.  If yours is true, then
your attitude is probably correct.  We ought to make darn sure which
case this falls into, though.

Dave
327.4  I agree there are two cases...  SAUTER::SAUTER  John Sauter  Wed Feb 05 1992 15:52  14
    I don't think a customer should expect a compiler to generate good
    code, though a customer does expect his program to run fast.  The
    description in .0 didn't clearly indicate that the problem was
    performance; it sounded like the customer felt that the code was "bad"
    because it didn't make "good" use of registers.  A compiler shouldn't
    be obligated to generate "beautiful" code, only to arrange to run the
    program acceptably fast.
    
    If the customer has profiled his application, discovered a hot spot,
    examined the code at that spot, rewritten it in assembler, and measured
    a significant improvement in the performance of his application, then
    we should listen to him.  If he's just complaining about the beauty of
    the code, we shouldn't.
        John Sauter  
327.5  Additional information  SANFAN::MCNICHOL_TH  Thu Feb 06 1992 15:25  50
G'day,

    I'd like to follow up on the replies I've received from my original
    note.  The customer is a doctor/researcher at UC San Francisco.  He
    is VERY technically competent and a great DEC supporter.  I'd really 
    like to help him, or at least let him know that there's an attempt
    to listen to him.  Please....can someone tell me how I go about  
    having someone address this?  Even someone calling to clarify, and  
    qualify his concern.  I need to come up with some kind of strategy,
    I know the resource exists, but I can really use your assistance.
    From his reply below, he knows what he's doing....

 Ted


******************************************************************************

Ted,

I thought you would like to know just how slowly the 5000
is running on this convolution.  The program reduces to a
tight loop that is running 25 instructions, 60% lw or sw,
20% adds and 20% integer multiplies.  For loop control, let's
give it an additional 15 instructions for a total of 40.

The loop is run 64x16x16x16x8 times and it takes 69 seconds.
That amounts to about 12 MIPS.  The machine isn't doing anything
else, not even running a bunch of windows while this is happening.

So, where is the rest of the time going?  

I've tried a couple of things that are intended to improve the
pipelining of instructions and reduce stalls, but they don't
give any improvement in performance. 

I'm about to give up on this optimization and just run my
program.  The program will run 6.7 hours and has to run 128 
times -- so you see why I'm looking for a way to speed it up!

Steve

% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
% Received: by mts-gw.pa.dec.com; id AA22907; Thu, 6 Feb 92 08:00:02 -0800
% Received: from phye.ucsf.EDU by cgl.ucsf.EDU (5.65/GSC4.21) id AA20713 for [email protected]; Thu, 6 Feb 92 07:59:56 -080
% Received: by phye.ucsf.EDU (5.57/GSC4.19) id AA23446; Thu, 6 Feb 92 08:00:27 -080
% Date: Thu, 6 Feb 92 08:00:27 -0800
% From: [email protected]
% Message-Id: <[email protected]>
% To: sanfan::mcnichol_th
% Subject: depressingly slow ...
327.6  consider the CSC  SAUTER::SAUTER  John Sauter  Thu Feb 06 1992 15:47  3
    Perhaps you should contact the Customer Support Center, or put your
    customer in touch with the specialists there.
        John Sauter
327.7  Some analysis...  SMOP::GLOSSOP  Kent Glossop  Thu Feb 06 1992 16:59  63
I'm a little confused by how he got 12 MIPS, but...

64*16*16*16*8 = ~2M loop iterations

If this takes 69 seconds, that's about 30400 iterations/second, or about
822 cycles/iteration on a 25MHz machine, which seems very slow indeed
if the loop really is only on the order of 40 instructions.  (20+ cycles
per instruction seems very high, even in the absolute worst case.)
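
The arithmetic above can be checked with a small sketch. It uses only figures from this thread (the 64x16x16x16x8 trip count, 69 seconds, a 25MHz clock, and the customer's estimate of 40 instructions per iteration); the function names are made up for illustration.

```c
#include <stdint.h>

/* Loop trip count quoted in the thread: 64 x 16 x 16 x 16 x 8. */
uint64_t loop_iterations(void)
{
    return (uint64_t)64 * 16 * 16 * 16 * 8;
}

/* Clock cycles spent per iteration for a given run time and clock rate. */
double cycles_per_iteration(double seconds, double clock_hz)
{
    return seconds * clock_hz / (double)loop_iterations();
}

/* Millions of instructions executed per second, given the
 * estimated instruction count per iteration. */
double achieved_mips(double seconds, double instr_per_iter)
{
    return (double)loop_iterations() * instr_per_iter / seconds / 1e6;
}
```

With these inputs, cycles_per_iteration(69.0, 25e6) comes to a little over 822, and achieved_mips(69.0, 40.0) comes to roughly 1.2, which is a factor of ten below the 12 MIPS quoted in .5 and may explain the confusion.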

One thing to keep in mind is that integer multiply is very slow on MIPS
relative to the other integer  instructions (which is very typical for
RISC machines.)  MIPS R3000s are 15 cycles if you really can issue them
back-to-back.  (i.e. mult/mflo/stall/stall/mult.)

However, having said all of this, if he's right about the number of
instructions, it should come to only around 100 cycles/iteration even
for unscheduled/unoptimized code from the perspective of the processor
(presuming all cache hits for memory references).

Memory references could be the culprit.

The first possibility is if things don't fit in the on-chip cache,
many of these memory references may be going all the way to memory.
(Are they dealing with extremely large arrays by any chance?)  When
things miss the cache, they basically start running at closer to
memory speeds.  (i.e. what matters is the fact that you have n-nanosecond
memory, say 40/60/80 - I don't know what the DS5000 has offhand.)  Note
that quite frequently the cost of each non-cached reference isn't that
bad, depending on the memory system design.

The next possibility is that the data involved is large enough that
it is paging.  If that is true, it could definitely explain things.

Another possible culprit is that RISC machines tend to have a
direct-mapped cache.  If things are separated by a power of two larger
than the cache size, it is possible to get (very) destructive cache
interference.  (The number of iterations is suggestive that there
might be matrices with power-of-2 sizes.)  If he has large power-of-2
sized matrices, one thing to try is to pad them by a relatively small
amount (say, 32 bytes) to prevent cache-line interference.
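
To illustrate the interference and the effect of padding, here is a minimal sketch of how a direct-mapped cache picks a line for an address. The cache size and line size below are assumptions chosen for illustration only; the DS5000's actual cache geometry is not given in this thread, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed geometry, for illustration only (not the DS5000's real figures). */
#define CACHE_BYTES 65536u
#define LINE_BYTES  16u

/* Index of the cache line an address maps to in a direct-mapped cache. */
unsigned cache_line_index(uintptr_t addr)
{
    return (unsigned)((addr % CACHE_BYTES) / LINE_BYTES);
}

/* Two addresses interfere when they lie in different memory lines
 * but compete for the same cache line. */
int lines_conflict(uintptr_t a, uintptr_t b)
{
    return a / LINE_BYTES != b / LINE_BYTES &&
           cache_line_index(a) == cache_line_index(b);
}
```

Under these assumptions, two arrays whose bases are a multiple of CACHE_BYTES apart evict each other on every alternating access, while adding 32 bytes of padding to one base moves it two lines away. The padding only helps if it is at least as large as the real line size.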

--------------------
Figuring worst-case (basically unschedulable code, etc.):

  Fixed-time instructions
     5 mul/add (each multiply actually has one       ~80 cycles
       additional mflo instruction)
    15 loop control instructions                      15 cycles

That leaves basically 725 cycles for 15 memory references, or almost
50 cycles apiece, which would be 2000 nanoseconds/memory reference,
which I wouldn't believe even with the worst memory system.  I might
well believe this if there is paging involved.  (From the description,
it sounds like they believe that they have eliminated that, though.)
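
The budget above can be written out as a short sketch. The inputs are the figures from this note (about 822 cycles per iteration, roughly 80 + 15 fixed cycles, 15 memory references, a 25MHz clock); the function names are made up for illustration.

```c
/* Cycles left over for each memory reference once the fixed-time
 * instructions (multiplies, mflo, loop control) are subtracted. */
double cycles_per_memory_ref(double total_cycles, double fixed_cycles,
                             int memory_refs)
{
    return (total_cycles - fixed_cycles) / (double)memory_refs;
}

/* Convert a cycle count to nanoseconds at the given clock rate. */
double cycles_to_ns(double cycles, double clock_hz)
{
    return cycles * 1e9 / clock_hz;
}
```

For example, cycles_per_memory_ref(822.0, 95.0, 15) is a bit under 50 cycles, and cycles_to_ns of that value at 25e6 Hz is a bit under 2000 nanoseconds, in line with the estimate above.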

--------------------
I'm not sure if this helped or not, but if the loop figures are correct,
I would suspect memory references, and if the time is really this long,
I would suspect paging, because even the worst memory system should be
at least a factor of 10 faster than what the above figures came up
with.

Kent
327.8  Expect good code  DREGS::BLICKSTEIN  Soaring on the wings of dawn  Wed Apr 08 1992 15:44  20
    re: .1, .4 (Sauter)
    
>    I don't have an answer to your specific question but I do have a
>    general observation on this kind of problem.  If the customer is
>    displeased because he looks at the assembly code generated by a
>    compiler and wishes it were better, he is using the wrong language.
>    If he knows what assembly code he wants, he should be coding in
>    the assembler.
    
>    I don't think a customer should expect a compiler to generate good
>    code, though a customer does expect his program to run fast.  
    
    > A compiler shouldn't be obligated to generate "beautiful" code, only
    > to arrange to run the program acceptably fast.
    
    The opinions expressed by Mr. Sauter are his own and do not necessarily 
    reflect those of The Language Group (TLG).  ;-)
    
    	Dave Blickstein
    	GEM Optimizer