
Conference turris::languages

Title:Languages
Notice:Speaking In Tongues
Moderator:TLE::TOKLAS::FELDMAN
Created:Sat Jan 25 1986
Last Modified:Wed May 21 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:394
Total number of notes:2683

383.0. "keeping functional units busy" by STAR::PRAETORIUS (what does the elephant need?) Thu Sep 08 1994 17:12

     With the recent rise in popularity of multiple issue machines,
which usually have a number of functional units dedicated to floating
point operations, the thought occurred to me that integer code might
go a little faster if some of the operations could be simulated in
the floating units (presumably one would pick things that were less
latency sensitive, since latencies tend to be higher on the FP side).

     My next thought was that, since commercial multiple issue machines
have been around for 30 years (the CDC 6600 is a quad issue box), there
must've already been some interesting research on this.  Does anybody
know where I could find it?
T.R   Title   User (Personal Name)   Date   Lines
383.1. by AUSSIE::GARSON (achtentachtig kacheltjes) Thu Sep 08 1994 20:05 (6 lines)
    re .0
    
    Can't help with research pointers but I guess in a sort of way Alpha
    already does this by not having an integer divide instruction. Perhaps
    the folks who designed the Alpha chip looked at this kind of thing and
    could provide pointers.
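
    For a concrete flavor of what doing it on the FP side could look
like - a hedged sketch only, not how the Alpha software divide actually
works - an integer divide can be routed through the FP divider.  With
IEEE doubles the truncated quotient is exact as long as the dividend
stays below 2^53:

	/* Sketch: integer divide synthesized on the FP divider.
	   Exact for 0 <= a < 2^53 and b > 0: the rounding error of the
	   double divide is smaller than the gap (1/b) between the true
	   quotient and the next integer, so truncation recovers a/b. */
	static long idiv_via_fp(long a, long b)
	{
	    return (long)((double)a / (double)b);
	}
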
383.2. by SMOP::glossop (Kent Glossop) Thu Sep 08 1994 22:28 (39 lines)
This tends to be a fairly classic "phase ordering problem".  You can try
to pick FP operations for some integer code, but since the latencies are
longer, you really only want to do it when you can tell that changing
the type will actually help (which would typically be during scheduling).
Almost all (or all) production compilers in existence do code selection,
then code scheduling.  I can imagine doing scheduling, then code changes,
then re-scheduling, but...

In practice, the only thing I've seen that you might want to do is
memory/memory copies using the FP regs on occasion (and that isn't
for latency or scheduling - it's to avoid integer register pressure).
In general, the benefits seem to be *extremely* marginal.  (Much lower
than most other transforms which will be more generally applicable.)
Part of the issue is that there are very few complete expressions
that can be done in floating point without requiring additional
conversion operations to be inserted (and in most cases, that winds
up being a lose.)
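
As an illustration of the copy case - a minimal sketch, not what any
particular compiler emits - the loop below moves memory through double
temporaries, so a compiler is free to stage the data in FP registers
and leave the integer registers for the surrounding code:

	#include <stddef.h>

	/* Copy nbytes bytes through double-sized temporaries.  The
	   temporaries can live in FP registers, so the copy itself
	   adds almost no integer register pressure beyond the pointers
	   and the count.  Assumes 8-byte-aligned blocks and nbytes a
	   multiple of sizeof(double). */
	static void copy_via_fp(double *dst, const double *src, size_t nbytes)
	{
	    size_t i, n = nbytes / sizeof(double);

	    for (i = 0; i < n; i++)
	        dst[i] = src[i];    /* 8-byte FP load/store pair */
	}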

Note that one thing that GEM will already do in some limited cases is
strength reduction using floating point.  For example, if you have
an integer variable that is always converted to double in a loop,
a version of the value is kept in floating point.

A simple example is:

	double d() {
	    int i;
	    double s = 0.0;
	    for(i=0; i<100; i++)
		s += i;
	    return s;
	}

Where there's effectively a copy of "i" in the floating register set
that gets incremented by 1.0d0.  (Note that this is kind of ironic -
the best transformation in this case would be to determine that in
fact the floating point value only contained integral values, and
instead do the whole loop in integer and only convert to floating
point at the return - which would be about a 5x improvement on EV4...)
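
Written out by hand - a sketch of the strength-reduced form, not actual
GEM output - the transformation effectively turns d() into:

	/* The int->double conversion inside the loop is replaced by a
	   floating "shadow" of the loop counter, bumped by 1.0 in step
	   with i, so no cvt instruction remains in the loop body. */
	double d_reduced(void)
	{
	    double s = 0.0;
	    double i_fp = 0.0;    /* floating copy of i */
	    int i;

	    for (i = 0; i < 100; i++) {
	        s += i_fp;
	        i_fp += 1.0;
	    }
	    return s;
	}
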
383.3. "Gilding a lily" by QUARRY::reeves (Jon Reeves, UNIX compiler group) Fri Sep 09 1994 12:06 (2 lines)
Actually, the best transformation would be to precompute the result
(0 + 1 + ... + 99 = 99*100/2 = 4950) and simply turn the function into
"return 4950.0;".
383.4. by SMOP::glossop (Kent Glossop) Fri Sep 09 1994 14:41 (24 lines)
Yep.  GEM will do that for small numbers when the loop is completely
unrolled, but doesn't try to actually interpret loops.  Just for example:

              1 double d() {
              2     int i;
              3     double s = 0.0;
              4     for(i=0; i<5; i++)
              5         s += i;
              6     return s;
              7 }

d::                                                                 ; 000001
        ldah    gp, d                   ; gp, (r27)
        lda     gp, d                   ; gp, (gp)
        ldq     r28, (gp)               ; r28, (gp)                 ; 000005
        ldt     f0, (r28)               ; f0, (r28)
        ret     r26                     ; r26                       ; 000006

        .section .lita, QUAD, noexe, rd, nowrt
        .address  .lit8

        .section .lit8, QUAD, noexe, rd, nowrt
        .double   10.0000000000000

383.5. "not sure I got your drift" by STAR::PRAETORIUS (what does the elephant need?) Thu Sep 15 1994 15:17 (3 lines)
     Is the intent of .2 that it's not really a good idea, or that it's
not feasible to do with traditional compiler organization (or both or
neither)?
383.6. by SMOP::glossop (Kent Glossop) Thu Sep 15 1994 15:38 (9 lines)
Both:

    - The opportunities with current hardware appear to be very limited
      (given the lack of similar available functions in the "architectural
      functional units")

    - Attempting to exploit those (very few) opportunities might well take
      a different compiler organization in order to have a chance of being
      a net gain.
383.7. by RANGER::BRADLEY (Chuck Bradley) Fri Feb 17 1995 17:55 (8 lines)
re .0
no pointer to research, but some history.

I've heard several times of programs on the CDC 6600 using floating point
operations for indexing counted loops.  I don't remember ever hearing
whether it was a compiler trick or was only done by assembly language
programmers.

383.8. by AUSSIE::BELL (Caritas Patiens est) Sun Feb 26 1995 21:42 (7 lines)
I don't remember seeing any CDC compiler that used floating point for loop
indexing.  But the 6600s did not have an integer multiply instruction, and any
index calculation that required a multiplication used floating point
operations.  This was fixed in the Cyber 70 series, when the DXn Xn*Xn
instruction was made to do an integer multiply when both exponents were zero.

Peter.
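
The trick .8 describes rests on the same exactness argument - here is a
hedged sketch in C with IEEE doubles, where the CDC machines had 48-bit
mantissas instead:

	/* Sketch of the 6600-era workaround: multiply integers on the
	   FP multiplier.  The product is exact while it fits in the
	   mantissa (48 bits on the CDC, 53 bits for IEEE double),
	   which covers typical array-index arithmetic. */
	static long imul_via_fp(long a, long b)
	{
	    return (long)((double)a * (double)b);    /* exact while |a*b| < 2^53 */
	}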