Conference turris::languages

Title:	Languages
Notice:	Speaking In Tongues
Moderator:	TLE::TOKLAS::FELDMAN

Created:	Sat Jan 25 1986
Last Modified:	Wed May 21 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	394
Total number of notes:	2683

327.0. "C compiler question" by SANFAN::MCNICHOL_TH () Mon Feb 03 1992 18:48

G'day,

    I'm looking for some support for "C" on a RISC system.  I have a 
    customer who's having a problem (see below).  So, if someone
    could direct me to another notes file, or a person that may have
    some understanding of the problem and its resolution, I'd really
    appreciate the assistance.

    Thanks in advance,

         TED

**************************************************************************
**************************************************************************

Ted,

I am up against a C-compiler optimization problem that
is frustrating me.  I am trying to run a convolution
program and I can see by looking at the assembler code
generated by cc that it is having all kinds of problems
that could be resolved if the compiler were smarter
about its use of registers. 

The question is whether there is a compiler that can
be instructed better by the user to save its registers
and use them in the inside of a tight loop?

Steve

T.R	Title	User	Personal Name	Date	Lines
327.1	real problem is with choice of language	SAUTER::SAUTER	John Sauter	`Tue Feb 04 1992 15:40`	7
	I don't have an answer to your specific question but I do have a general observation on this kind of problem. If the customer is displeased because he looks at the assembly code generated by a compiler and wishes it were better, he is using the wrong language. If he knows what assembly code he wants, he should be coding in the assembler. John Sauter
327.2	suggest caution	SGOUTL::BELDIN_R	Pull us together, not apart	`Wed Feb 05 1992 09:54`	8
	re .0 Many C instructors recommend that programmers not depend on use of the "register" attribute for variables precisely because the results are highly hardware dependent. It sounds like the customer is trying to find something to criticize. Watch out. Dick
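A minimal C sketch of what .0 and .2 are discussing, assuming a one-dimensional integer convolution. The customer's actual loop is not shown in this topic, so the function name `convolve`, its signature, and the data layout are hypothetical. Accumulating into a local scalar gives the compiler an easy candidate for a register. The `register` keyword is only a hint, and as .2 says, its effect depends on the compiler and the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the customer's code.
 * out[i] = sum over k of in[i + k] * kern[k]
 * The caller must supply at least n_out + n_kern - 1 elements in `in`. */
void convolve(const int *in, const int *kern, int *out,
              size_t n_out, size_t n_kern)
{
    size_t i, k;

    assert(in != NULL && kern != NULL && out != NULL);

    for (i = 0; i < n_out; i++) {
        const int *p = in + i;   /* window start, computed once per output */
        register int acc = 0;    /* hint only: the compiler may ignore it */

        /* The inner loop touches memory only to load operands; the running
         * sum stays in a local, so out[i] is stored once per output. */
        for (k = 0; k < n_kern; k++)
            acc += p[k] * kern[k];

        out[i] = acc;
    }
}
```

Writing `out[i] += ...` inside the inner loop instead would force a compiler that cannot rule out aliasing between `out` and the inputs to load and store on every iteration. That is one common source of the extra `lw`/`sw` traffic people notice in generated code.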
327.3	Pay attention to customers...	CIRCUS::DETLEFS		`Wed Feb 05 1992 13:13`	30
	John -- I'm afraid that your "general observation"

	> I don't have an answer to your specific question but I do have a
	> general observation on this kind of problem. If the customer is
	> displeased because he looks at the assembly code generated by a
	> compiler and wishes it were better, he is using the wrong language.
	> If he knows what assembly code he wants, he should be coding in
	> the assembler.

	doesn't make much sense to me. Taken to the extreme, it seems to imply that we should not attempt to create optimizing compilers. Why should the customer not expect the compiler to generate good code? If he looks at the generated code and doesn't like it, then one could just as well say that he is using the wrong compiler; he may well go out and find one that gives him a more pleasing result.

	Perhaps this is the source of our disagreement: I'm assuming that if the customer can look at the assembly code and envision equivalent, more efficient code, then the customer is fairly knowledgeable about compiler optimizations, and is probably proposing a reasonable optimization that the compiler didn't do. You're probably assuming that the customer is proposing some ad-hoc hack that's applicable only in this instance. If my assumption is true, DEC ought to listen very carefully to the customer's feedback. If yours is true, then your attitude is probably correct. We ought to make darn sure which case this falls into, though.

	Dave
327.4	I agree there are two cases...	SAUTER::SAUTER	John Sauter	`Wed Feb 05 1992 15:52`	14
	I don't think a customer should expect a compiler to generate good code, though a customer does expect his program to run fast. The description in .0 didn't clearly indicate that the problem was performance; it sounded like the customer felt that the code was "bad" because it didn't make "good" use of registers. A compiler shouldn't be obligated to generate "beautiful" code, only to arrange to run the program acceptably fast.

	If the customer has profiled his application, discovered a hot spot, examined the code at that spot, rewritten it in assembler, and measured a significant improvement in the performance of his application, then we should listen to him. If he's just complaining about the beauty of the code, we shouldn't.

	John Sauter
327.5	Addition information	SANFAN::MCNICHOL_TH		`Thu Feb 06 1992 15:25`	50
	G'day,

	I'd like to follow up on the replies I've received to my original note. The customer is a doctor/researcher at UC San Francisco. He is VERY technically competent and a great DEC supporter. I'd really like to help him, or at least let him know that there's an attempt to listen to him.

	Please, can someone tell me how I go about having someone address this? Even someone calling to clarify and qualify his concern would help. I need to come up with some kind of strategy. I know the resource exists, but I can really use your assistance. From his reply below, he knows what he's doing.

	Ted

	******************************************************************************

	Ted,

	I thought you would like to know just how slowly the 5000 is running on this convolution. The program reduces to a tight loop that is running 25 instructions, 60% lw or sw, 20% adds and 20% integer multiplies. For loop control, let's give it an additional 15 instructions for a total of 40. The loop is run 64x16x16x16x8 times and it takes 69 seconds. That amounts to about 12 MIPS. The machine isn't doing anything else, not even running a bunch of windows while this is happening. So, where is the rest of the time going?

	I've tried a couple of things that are intended to improve the pipelining of instructions and reduce stalls, but they don't give any improvement in performance. I'm about to give up on this optimization and just run my program. The program will run 6.7 hours and has to run 128 times -- so you see why I'm looking for a way to speed it up!

	Steve

	% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
	% Received: by mts-gw.pa.dec.com; id AA22907; Thu, 6 Feb 92 08:00:02 -0800
	% Received: from phye.ucsf.EDU by cgl.ucsf.EDU (5.65/GSC4.21) id AA20713 for [email protected]; Thu, 6 Feb 92 07:59:56 -080
	% Received: by phye.ucsf.EDU (5.57/GSC4.19) id AA23446; Thu, 6 Feb 92 08:00:27 -080
	% Date: Thu, 6 Feb 92 08:00:27 -0800
	% From: [email protected]
	% Message-Id: <[email protected]>
	% To: sanfan::mcnichol_th
	% Subject: depressingly slow ...
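As a quick check on the figures in Steve's message, here is a small C sketch that takes his numbers at face value: 40 instructions per iteration, a trip count of 64 x 16 x 16 x 16 x 8, and 69 seconds of run time. The function names are hypothetical and used only for illustration. On those inputs the implied rate comes out near 1.2 million instructions per second, not 12, so either the quoted figure is off by a factor of ten or the stated trip count is not the whole story.

```c
#include <assert.h>

/* Loop trip count as stated in the message: 64 x 16 x 16 x 16 x 8. */
unsigned long loop_iterations(void)
{
    return 64UL * 16UL * 16UL * 16UL * 8UL;   /* 2,097,152 */
}

/* Instruction rate in millions of instructions per second, given the
 * number of loop iterations, instructions per iteration, and elapsed
 * wall-clock seconds. */
double implied_mips(unsigned long iterations, unsigned insns_per_iter,
                    double seconds)
{
    assert(seconds > 0.0);
    return (double)iterations * (double)insns_per_iter / seconds / 1.0e6;
}
```

Calling `implied_mips(loop_iterations(), 40, 69.0)` gives a value between 1.21 and 1.22.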
327.6	consider the CSC	SAUTER::SAUTER	John Sauter	`Thu Feb 06 1992 15:47`	3
	Perhaps you should contact the Customer Support Center, or put your customer in touch with the specialists there. John Sauter
327.7	Some analysis...	SMOP::GLOSSOP	Kent Glossop	`Thu Feb 06 1992 16:59`	63
	I'm a little confused by how he got 12 MIPS, but...

	64*16*16*16*8 = ~2M loop iterations

	If this takes 69 seconds, that's about 30400 iterations/second, or about 822 cycles/iteration on a 25MHz machine, which seems very slow indeed if the loop really is only on the order of 40 instructions. (20+ cycles per instruction seems very high, even in the absolute worst case.)

	One thing to keep in mind is that integer multiply is very slow on MIPS relative to the other integer instructions (which is very typical for RISC machines). On MIPS R3000s a multiply takes 15 cycles if you really can issue them back-to-back (i.e. mult/mflo/stall/stall/mult). However, having said all of this, if he's right about the number of instructions, it should come to only around 100 cycles/iteration even for unscheduled/unoptimized code from the perspective of the processor (presuming all cache hits for memory references).

	Memory references could be the culprit. The first possibility is that, if things don't fit in the on-chip cache, many of these memory references may be going all the way to memory. (Are they dealing with extremely large arrays by any chance?) When things miss the cache, they basically start running at closer to memory speeds (i.e. what matters is the fact that you have n-nanosecond memory, say 40/60/80 - I don't know what the DS5000 has offhand). Note that quite often each non-cached reference isn't that bad, depending on the memory system design.

	The next possibility is that the data involved is large enough that it is paging. If that is true, it could definitely explain things.

	Another possible culprit is that RISC machines tend to have a direct-mapped cache. If things are separated by a power of two larger than the cache size, it is possible to get (very) destructive cache interference. (The number of iterations is suggestive that there might be matrices with power-of-2 sizes.) If he has large power-of-2 sized matrices, one thing to try is to pad them by a relatively small amount (say, 32 bytes) to prevent cache-line interference.

	--------------------

	Figuring worst-case (basically unschedulable code, etc.):

	Fixed-time instructions
	    5 mul/add - which actually have one additional mflo     ~80 cycles
	      instruction per multiply
	    15 loop control instructions                             15 cycles

	That leaves basically 725 cycles for 15 memory references, or almost 50 cycles apiece, which would be 2000 nanoseconds/memory reference, which I wouldn't believe even with the worst memory system. I might well believe this if there is paging involved. (From the description, it sounds like they believe that they have eliminated that, though.)

	--------------------

	I'm not sure if this helped or not, but if the loop figures are correct, I would suspect memory references, and if the time is really this long, I would suspect paging, because even the worst memory system should be at least a factor of 10 faster than what the above figures came up with.

	Kent
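The padding suggestion in .7 can be illustrated with a small C sketch. The cache geometry below (64 KB direct-mapped, 16-byte lines) and the row size are assumptions chosen for illustration, not figures from this topic, and the helper names are hypothetical. The sketch only models which cache line a byte offset maps to. It does not measure a real machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, illustrative cache geometry; NOT taken from the thread. */
#define CACHE_BYTES 65536u
#define LINE_BYTES  16u

/* Hypothetical matrix row: 16384 four-byte elements = 64 KB, a power of two. */
#define ROW_INTS 16384u
#define PAD_INTS 8u            /* 8 * 4 bytes = 32 bytes of padding per row */

typedef int32_t plain_row[ROW_INTS];
typedef int32_t padded_row[ROW_INTS + PAD_INTS];

/* Cache line that a byte offset maps to in a direct-mapped cache. */
unsigned cache_line_index(size_t byte_offset)
{
    return (unsigned)((byte_offset % CACHE_BYTES) / LINE_BYTES);
}

/* Returns 1 when element 0 of two consecutive rows maps to the same
 * cache line (so they evict each other), 0 otherwise. */
int rows_conflict(size_t row_stride_bytes)
{
    assert(row_stride_bytes > 0);
    return cache_line_index(0) == cache_line_index(row_stride_bytes);
}
```

Under these assumptions `rows_conflict(sizeof(plain_row))` returns 1 and `rows_conflict(sizeof(padded_row))` returns 0. With the 32 bytes of padding, the same column in adjacent rows lands on different cache lines.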
327.8	Expect good code	DREGS::BLICKSTEIN	Soaring on the wings of dawn	`Wed Apr 08 1992 14:44`	20
	re: .1, .4 (Sauter)

	> I don't have an answer to your specific question but I do have a
	> general observation on this kind of problem. If the customer is
	> displeased because he looks at the assembly code generated by a
	> compiler and wishes it were better, he is using the wrong language.
	> If he knows what assembly code he wants, he should be coding in
	> the assembler.

	> I don't think a customer should expect a compiler to generate good
	> code, though a customer does expect his program to run fast.

	> A compiler shouldn't be obligated to generate "beautiful" code, only
	> to arrange to run the program acceptably fast.

	The opinions expressed by Mr. Sauter are his own and do not necessarily reflect those of The Language Group (TLG). ;-)

	Dave Blickstein
	GEM Optimizer