
Conference turris::fortran

Title:	Digital Fortran
Notice:	Read notes 1.* for important information
Moderator:	QUARK::LIONEL

Created:	Thu Jun 01 1995
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1333
Total number of notes:	6734

1211.0. "Any performance implications re: automatic vs. noautomatic (i.e., stack vs. static allocation)?" by HYDRA::NEWMAN (Chuck Newman, 508/467-5499 (DTN 297), MRO1-3/F26) Wed Mar 05 1997 14:54

FORTRAN routines that are going to be called in parallel need to
have their local variables declared as AUTOMATIC, either explicitly
in the source or via the AUTOMATIC compiler switch.

Aside from that, are there any *performance* implications of one over
the other?  I would guess that it would be pretty much a wash, with
perhaps a slight win for living on the stack.

I would like the opinion of others more knowledgeable than I.

								-- Chuck Newman
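
A minimal sketch of why shared static locals are a problem for routines that are active more than once at a time. It is written in C rather than Fortran, and it uses recursion as a stand-in for parallel calls (both are assumptions made so the effect is deterministic); the function names are hypothetical.

```c
#include <assert.h>

/* Static allocation: one copy of 'local' shared by every active call. */
int nested_static(int n) {
    static int local;
    assert(n >= 0);
    local = n;
    if (n > 0) {
        (void)nested_static(n - 1);  /* inner call overwrites 'local' */
    }
    return local;  /* the value left behind by the innermost call */
}

/* Automatic allocation: each call gets its own 'local' on the stack. */
int nested_auto(int n) {
    int local;
    assert(n >= 0);
    local = n;
    if (n > 0) {
        (void)nested_auto(n - 1);    /* inner call has a separate 'local' */
    }
    return local;  /* still this call's own value */
}
```

With these definitions, nested_static(3) returns 0 because the innermost call clobbered the shared variable, while nested_auto(3) returns 3. Two threads running the static version at once would interfere in the same way, only less predictably.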

T.R	Title	User	Personal Name	Date	Lines
1211.1	some issues	GEMGRP::PIEPER		`Wed Mar 05 1997 15:31`	22
	Automatic variables can be known to be not live on entry (they didn't exist before), so that can help the optimizer. Automatic variables can exist entirely in registers and need never go to memory, whereas "normal" variables may need to be preserved across routine calls unless the compiler is SURE that they are written before being read.
	Because of the way the compiler works, you can get uninitialized variable messages for automatic variables that you wouldn't get with statically-allocated variables.
	Allocating variables statically will have different (data) cache behavior from allocating them on the stack. Not necessarily better, nor necessarily worse, but different. Variables on the stack can re-use the same space, and may exhibit better locality. On the other hand, when you put things on the stack it's much harder to control their alignment in the cache, so they may conflict with arrays and common blocks that aren't automatic in uncontrollable ways.
	We do use -automatic for some of the specfp95 benchmarks, so it helps at least some, for some programs. We don't use it for all the specfp95 benchmarks.
	And, of course, if you really need values preserved from one invocation of a routine to another, just throwing the -automatic switch on the compiler may not be a good way to compile your program.
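
The last point in .1 (values preserved from one invocation to the next) can be sketched with a call counter. This is a C analogue, not Fortran, and the function names are hypothetical. The automatic version initializes its variable explicitly, because reading an uninitialized automatic variable is undefined in C; a Fortran routine compiled with -automatic that relies on the old value would simply read garbage.

```c
/* Static local: initialized once, keeps its value between calls. */
int calls_static(void) {
    static int count = 0;
    count += 1;
    return count;
}

/* Automatic local: created afresh on every call, so nothing carries over. */
int calls_auto(void) {
    int count = 0;
    count += 1;
    return count;
}
```

Calling each function three times gives 1, 2, 3 from calls_static and 1, 1, 1 from calls_auto. Code that depends on the first behavior is the kind that breaks when everything is switched to automatic allocation.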
1211.2	fewer memory refs with automatic?	HERON::BLOMBERG	Trapped inside the universe	`Thu Mar 06 1997 02:55`	21
	I had an example a year ago that ran twice as fast with -automatic. Using atom it turned out that it did considerably fewer memory references with -automatic. I never quite understood it, but I can imagine that static variables may need two loads to read: first the address, then the datum. With automatic, variables can be read directly off the stack. Something like:
	static variable:
		ldq R1,...		load address of x
		ldl R2,(R1)		load x
	stack variable:
		ldl R2,X(SP)		load x
	My two cents,
	Ake
1211.3	If SPEC jumps off a cliff, don't follow :-)	PERFOM::HENNING		`Thu Mar 06 1997 07:40`	20
	Thanks .1 for the compliment, but I should point out that just because the SPEC tuning has a switch doesn't necessarily mean it's right. We occasionally will go try to prove the right combination of, say, the top 3 switches which are hypothesized to be relevant (= 8 combinations * 10 benchmarks * at least 12 repetitions to have statistical significance * about 200 seconds per benchmark = one weekend), but we haven't tested every combo. And of course every new version of software introduces new improvements that may change how the combinations interact. So there are certain to be some switches in the current best-known tuning that are just plain wrong.
	I'll add it to the list that we should go re-try -automatic; I think it's been at least 6 months since we played with it very much.
	Re: .2 - wouldn't the explanation in .1 seem more likely? If the compiler feels more freedom to throw a result away, it would need fewer stores and reloads. Do you still have the benchmark? We've made some strides in improving the ease-of-use for IPROBE; it would be fairly quick to pin down bcache misses or scache misses to specific instruction sequences. Drop me an email if you're interested.
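
The "one weekend" estimate in .3 can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the note; the function name and its parameters are hypothetical, and it assumes each switch is simply on or off.

```c
/* Total run time, in seconds, of an exhaustive on/off sweep. */
long sweep_seconds(int switches, int benchmarks, int reps, int secs_per_run) {
    long combos = 1L << switches;  /* 2^switches on/off combinations */
    return combos * benchmarks * reps * secs_per_run;
}
```

For 3 switches, 10 benchmarks, 12 repetitions, and 200 seconds per run this gives 8 * 10 * 12 * 200 = 192000 seconds, a little over 53 hours of machine time. Each extra switch doubles that figure, which is why not every combination gets tested.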
1211.4	Part of Alfa Avio tests	HERON::BLOMBERG	Trapped inside the universe	`Thu Mar 06 1997 08:04`	8
	Re .3 No, I don't have it any longer. It was part of some tests we did for Alfa Avio in Italy a year ago. /Ake