
Conference turris::fortran

Title:Digital Fortran
Notice:Read notes 1.* for important information
Moderator:QUARK::LIONEL
Created:Thu Jun 01 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1333
Total number of notes:6734

1211.0. "Any performance implications re: automatic vs. noautomatic (i.e., stack vs. static allocation)?" by HYDRA::NEWMAN (Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26) Wed Mar 05 1997 14:54

FORTRAN routines that are going to be called in parallel need to have
their local variables declared as AUTOMATIC, either explicitly in the
source or via the AUTOMATIC compiler switch.

Aside from that, are there any *performance* implications of one over
the other?  I would guess that it would be pretty much a wash, with
perhaps a slight win for living on the stack.

I would like the opinion of others more knowledgeable than I.

								-- Chuck Newman
T.R  Title  User  Personal Name  Date  Lines
1211.1. "some issues" by GEMGRP::PIEPER Wed Mar 05 1997 15:31 (22 lines)
Automatic variables can be known to be not live on entry (they didn't exist
before) so that can help the optimizer. Automatic variables can exist entirely
in registers and need never go to memory, whereas "normal" variables may need to
be preserved across routine calls unless the compiler is SURE that they are
written before being read. Because of the way the compiler works, you can get
uninitialized variable messages for automatic variables that you wouldn't get
with statically-allocated variables.

Allocating variables statically will have *different* (data) cache behavior from
allocating them on the stack. Not necessarily better, nor necessarily worse, but
*different*.

Variables on the stack can re-use the same space, and may exhibit better
locality. On the other hand, when you put things on the stack it's much harder
to control their alignment in the cache, so they may conflict with arrays and
common blocks that aren't automatic in uncontrollable ways.

We do use -automatic for some of the specfp95 benchmarks, so it helps at least
some, for some programs. We don't use it for all the specfp95 benchmarks. And,
of course, if you really need values preserved from one invocation of a call to
another, just throwing the -automatic switch on the compiler may not be a good
way to compile your program.
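
As a minimal sketch of that last caveat, in standard Fortran: a routine that
needs a value to survive from one call to the next can say so with SAVE, rather
than depend on locals happening to be allocated statically. The module and
function names here are hypothetical, and the sketch assumes the usual behavior
that -automatic leaves explicitly SAVEd variables alone.

```fortran
module ticket_demo
  implicit none
contains
  ! Hypothetical helper: hands out 1, 2, 3, ... on successive calls.
  integer function next_ticket()
    ! SAVE states the requirement explicitly, so the counter keeps its
    ! value between calls however other locals are allocated.
    integer, save :: n_calls = 0
    n_calls = n_calls + 1
    next_ticket = n_calls
  end function next_ticket
end module ticket_demo

program demo
  use ticket_demo
  implicit none
  integer :: a, b
  a = next_ticket()
  b = next_ticket()
  ! b is one greater than a because n_calls persisted across the calls.
  print *, a, b
end program demo
```

A routine written without the SAVE, reading its local before assigning it, is
the kind of code that may stop working when the switch is thrown.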
1211.2. "fewer memory refs with automatic?" by HERON::BLOMBERG (Trapped inside the universe) Thu Mar 06 1997 02:55 (21 lines)
    
    
            I had an example a year ago that ran twice as fast with
            -automatic. Using atom, I found that it made
            considerably fewer memory references with -automatic.
    
            I never quite understood it, but I can imagine that static
            variables may need two loads to read, first the address,
            then the datum. With automatic, variables can be read
            directly off the stack. Something like:
    
            static variable:
            ldq R1,...      load address of x
            ldl R2,(R1)     load x
    
            stack variable:
            ldl R2,X(SP)    load x
    
    deux centimes,
    Ake
    
1211.3. "If SPEC jumps off a cliff, don't follow :-)" by PERFOM::HENNING Thu Mar 06 1997 07:40 (20 lines)
    Thanks .1 for the compliment, but I should point out that just because
    the spec tuning has a switch doesn't necessarily mean it's right. We
    occasionally will go try to prove the right combination of, say, the
    top 3 switches which are hypothesized to be relevant (= 8 combinations
    * 10 benchmarks * at least 12 repetitions to have statistical
    significance * about 200 seconds per benchmark = one weekend), but we
    haven't tested every combo.  And of course every new version of
    software introduces new improvements that may change how the
    combinations interact.  So there are certain to be some switches in the
    current best-known tuning that are just plain wrong.
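
For scale, the estimate above works out to 192,000 seconds, a bit over 53
hours. A minimal sketch of the arithmetic, assuming each switch is simply on or
off (the module and function names are made up for illustration):

```fortran
module tuning_budget
  implicit none
contains
  ! Seconds needed to run every on/off combination of n_switches
  ! compiler switches over a benchmark suite (hypothetical helper).
  integer function total_seconds(n_switches, n_bench, n_reps, secs_per_run)
    integer, intent(in) :: n_switches, n_bench, n_reps, secs_per_run
    total_seconds = 2**n_switches * n_bench * n_reps * secs_per_run
  end function total_seconds
end module tuning_budget

program budget
  use tuning_budget
  implicit none
  integer :: secs
  ! Figures quoted in the note: 3 switches, 10 benchmarks,
  ! 12 repetitions, about 200 seconds per run.
  secs = total_seconds(3, 10, 12, 200)
  print '(i0,a,f0.1,a)', secs, ' seconds = ', secs / 3600.0, ' hours'
end program budget
```

Each extra switch doubles the total, which is why exhaustive testing stops at a
handful of switches.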
    
    I'll add it to the list that we should go re-try -automatic; I think
    it's been at least 6 months since we played with it very much.
    
    Re: .2 - wouldn't the explanation in .1 seem more likely?  If the
    compiler feels more freedom to throw a result away, it would need fewer
    stores and reloads.  Do you still have the benchmark?  We've made some
    strides in improving the ease-of-use for IPROBE, so it would be fairly
    quick to pin down bcache misses or scache misses to specific
    instruction sequences.  Drop me an email if you're interested.
1211.4. "Part of Alfa Avio tests" by HERON::BLOMBERG (Trapped inside the universe) Thu Mar 06 1997 08:04 (8 lines)
    
    Re .3
    
    No, I don't have it any longer. It was part of some tests
    we did for Alfa Avio in Italy a year ago.
    
    /Ake