[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::languages

Title:Languages
Notice:Speaking In Tongues
Moderator:TLE::TOKLAS::FELDMAN
Created:Sat Jan 25 1986
Last Modified:Wed May 21 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:394
Total number of notes:2683

115.0. "VAX Compilers Code Speed" by MUNICH::FROEHLIN (Guenther K. Froehlin) Mon Nov 10 1986 08:20

    Here is a list of execution speed of code produced by different
    VAX languages. This is by far not representative ! But gives
    some impression. Any comments ?
    
    If you are interested in the sources, get saveset
    
    	MUNICH""::CSC_INFO:ERATO.A
    
    Prost !
    G�nther


------------------------------------------------------------------------------
    
Seeth of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------

Execution timing in seconds !

Language ! Version  ! VAX750  ! VAX780  ! VAX785  ! VAX8200 !
---------+----------+---------+---------+---------+---------+
FORTRAN  ! V4.3-145 ! 02.14   ! 01.24   ! 00.84   ! 01.08   !
BLISS    ! V4.1-746 ! 02.49   ! 01.63   ! 01.03   ! 01.48   !
PLI      ! T2.4-397 ! 02.85   ! 01.54   ! 01.00   ! 01.40   !
PASCAL   ! V3.2-57  ! 02.69   ! 01.62   !         ! 01.51   !
ADA      ! T1.2-12  ! 03.72   ! 02.02   ! 01.29   ! 01.80   !
C        ! V2.1-007 ! 03.41   ! 02.05   ! 01.57   ! 01.80   !
COBOL    ! V3.2-42  ! 10.04   ! 05.87   ! 03.89   ! 05.67   !
BASIC    ! V2.3     ! 13.13   ! 05.84   ! 04.58   ! 07.63   !
---------+----------+---------+---------+---------+---------+

VAX750	REV: xx	FPA: ?
VAX780	REV:  7	FPA: Y
VAX785	REV:  2	FPA: Y
VAX8200	REV: xx	FPA: Y
T.RTitleUserPersonal
Name
DateLines
115.1For Internal Use OnlyTLE::FELDMANLSE, zealouslyMon Nov 10 1986 10:2813
    Just a reminder from your friendly moderator:
    
    Timing data on compilers is considered sensitive information.  The
    data published in the base note (along with everything else in this
    notes file) is For Internal Use Only. 
    
	Gary
    
    PS Anyone care to comment on the relevance of the Sieve of Eratosthenes
    as a benchmark?  My gut reaction is that very few of our customers
    need to calculate prime numbers, and that optimizations that work
    quite well for reasonable code mixes may not necessarily do well
    on the S of E.
115.2And sources, too!MINAR::BISHOPMon Nov 10 1986 11:029
    Further, when such data is given, there should also be an
    indication of the qualifiers used:  Pascal in particular
    has a default of /CHECK, which slows it down.  Thus to get
    the fastest runs with VAX Pascal, non-default qualifiers 
    must be use.
    
    I agree with Gary's implied criticism: the benchmark cited
    is a poor one.
    			-John Bishop
115.3Got it !MUNICH::FROEHLINType <RETURN>! What ye see ? R-E-T-U-R-NMon Nov 10 1986 12:322
    Thank you for your responses ! As I expect you professionals, my
    sieve will disappear into dust. But a bitter taste still remains!
115.4Certainly not USDA Grade Prime numbersNOBUGS::AMARTINAlan H. MartinMon Nov 10 1986 13:3015
I wouldn't spread those numbers far for another reason.  Here are the
first few prime numbers the Ada version found:

"
ERATO start in ADA
          1          2           3           4           5           7
          8          10          13          14          17          19
         20          22          25          28          29          32
         34          35          38          40          43          47
         49          50          52          53          55          62
         64          67          68          73          74          77
         80          82          85          88          89          94
         95          97          98         104         110         112
"
				/AHM
115.5The right corner...MUNICH::FROEHLINType <RET>! What do you see? R-E-TWed Nov 12 1986 08:0615
    .4: Thank you for your testing ! I tried the ADA version again and
        printed out the PRIME variable (after prime:=i+i+3;) and got
        prime numbers only. The algorithm doesn't calculate for 1,2
        and 3. Or did you look at the FLAGS indicator ? They do not
        reflect the the prime numbers.
    
    .1,.2: I would have liked some technical discussion of the subject
           because there is a difference in code produced by different VAX
           compilers (see 'VAX Performance Summary 1985, Chapter 6).
    
    As far as I know, VAX C and VAX PLI are using the same CODE GENERATOR
    and OPTIMIZER, but the result looks different. Things like this
    had been the reason why I posted .0 here.
    
    Guenther Froehlin, ACT/TEC, Munich, F.R.Germany
115.6Feel free to continue the discussion (FIUO)TLE::FELDMANPDS, our next successWed Nov 12 1986 10:2911
    My reply in .1 wasn't meant to stifle discussion on this subject.
    Please feel free to continue; I, too, would appreciate an explanation
    of the differences in code generated by our compilers that share
    the same back end.
    
    This discussion brings out yet another reason for timing data to be
    considered confidential: it takes an expert to design benchmarks, and
    an expert to interpret them; fudging or misinterpreting the data is far
    too easy. 
    
       Gary
115.7I'd like to see the program againNOBUGS::AMARTINAlan H. MartinWed Nov 12 1986 13:4211
Re .5:

True, I did not examine the successive values of the variable "prime".
On the other hand, in the only version of the Sieve of Eratosthenes
which I am aware of, a side effect of the algorithm is a boolean array,
where every number which is not "crossed off" is a prime.

I'd like to examine the program further, but deleted my copy.  Could you
post the latest location for the saveset?  It seems to have disappeared
from MUNICH""::CSC_INFO:.
				/AHM/THX
115.8programs available and more ?sMUNICH::FROEHLINType <RET>!What do you see?R-E-T-U-R-NThu Nov 13 1986 07:1418
    .6: Are there programers around for all the languages shown in
        .1 ? I would like to have the programs checked for correctness
        and speed (qualifiers to use etc.).
    
    .7: The saveset is back again : MUNICH::CSC_INFO:ERATO.A
    
    In general, the sieve of Eratosthenes is one of the programs in
    the set for benchmarking CPU power of computers. Among those programs
    are others like HANOI, ACKERMANN... I just want to be more sure
    about the ERATO version to compare it with other published data
    and I appreciate every help that the programs in ERATO.A are running
    as fastest as possible on a VAX.
    	One interesting thing is that the ratio between the languages
    is different on the various VAXes. How that ?
    
    gggGuenther
    
    
115.9Benchmarks are fun but often misleading...CADSYS::COOKNeilFri Nov 14 1986 01:1749
    Whilst the programs you cite are often used in benchmarking computers
    and compilers, they are not very useful for this task. The reason
    for this is alluded to in your own note - "as fastest as possible on
    a VAX". Since a benchmark which shows one machine faster than another
    makes good advertising copy, people often "hand tune" benchmarks to
    get the best possible performance out of their machines. When measured
    in MIPS, this is often refered to as "Peak MIPS", since it could not
    be sustined for a more normal workload.

    Some small programs like Eratosthenes fit into the caches of a
    significant number of machines. This will reduce the amount of (slow)
    main memory fetches required, replacing them with fetches from (fast)
    cache memory. While a cache will normally improve a machine's speed,
    running small benchmarks like these over-emphasize the benefits to
    be gained.

    Some of the most realistic benchmarks are application programs (or
    parts thereof) which have been made available for benchmarking
    purposes. The Lawrence Livermore Loops program is one.


    You wondered why the ratio between the different languages varied
    between machines. Some of the VAX compilers use different code
    generators. Some will be more efficient than others for doing
    different things as the compiler writers will have concentrated
    on different language elements. Even when two compilers share
    the same code generator, the input to the code generator is likely
    to be a little different even for the same algorithm, so different
    machine code might be generated which runs at differing speeds.
    Lastly, different VAXes take differing numbers of machine cycles
    to perform the same instrucutions, so even if they had the same
    machine cycle time, they would differ in execution time. A very
    important aspect of this, is that some machines emulate instructions
    which others execute from microcode. In some cases where a particular
    code generator is more prone to generating these instructions, it
    can have a dramatic effect on performance. COBOL is quite prone to
    this since packed decimal instructions, for example, have been
    given much less prominance on more recent implementations of the
    VAX architechture.

    Personally, I am fascinated by benchmark results (the Usenet newsgroup
    comp.arch carries results every few months) and I need to keep
    reminding myself to take them with a grain of salt, lest I get
    too optimistic or pessimistic about the latest ones.

    P.S. BYTE is very good at bad benchmarks. Small benchmarks written
    in BASIC tell a potential purchaser next to nothing about a
    particular machine, especially if the floating point instructions
    are emulated.
115.10Thanks for restoring itNOBUGS::AMARTINAlan H. MartinFri Nov 14 1986 17:3073
Re .8:

Thanks for recovering the save set.  I still haven't figured out the
algorithm, but I agree that the results are much improved when you examine
the contents of the variable "prime".  Now the only composite number
it claims is prime is 9.  Probably related to some glitch in starting the
algorithm up.  (I've got a simple but fast implementation of a prime
finding algorithm which doesn't get 2 or 3 right).

An implementation of the Sieve of Eratosthenes which uses the same
algorithm that my office mate and I regard as "the original Greek" follows.
It comes from the VAX C test system:

"
#ifndef	TIMER
#define TIMER 1
#endif

/* sieve.c */
/***************************************************************************** 
	Eratosthenes Sieve Prime Number Program (Byte September 1981) 
	modified to make more 'C-like' 
*****************************************************************************/

#define TRUE 1
#define FALSE 0
#define SIZE 8190

char flags[SIZE + 1];

main()
{
	int	timrb(), timre();
	int	i, k, iter;

#ifdef OUTPUT
#ifdef vax11c
	puts("\n\nVAX11C --- SIEVE benchmark ---\n");
#else
	puts("\n\n--- SIEVE benchmark ---\n");
#endif
#endif

#if TIMER 
	timrb();
#endif

	for (iter = 1; iter <= 500; iter++)
	{
		for (i = 0; i <= SIZE; i++)
			flags[i] = TRUE;
		for (i = 2; i <= SIZE; i++)
			if (flags[i])
				for (k = i * 2; k <= SIZE; k += i)
					flags[k] = FALSE;
	}

#if TIMER 
	timre();
#endif

#ifdef OUTPUT
	printf("flags[k]:  k = %d, %d\n", SIZE - 200, SIZE);
	for (k = SIZE - 200; k <= SIZE; k += 65)
	{
		printf("\n");
		for (i = k; i <= k + 64 && i <= SIZE; i++)
			printf(" %d", flags[i]);
	}
#endif

}
"
115.11I could still use some helpNOBUGS::AMARTINAlan H. MartinMon Nov 17 1986 14:263
I'm going to submit one of the programs to CLT::MATH (q.v.) to see if
someone can explain the algorithm works.
				/AHM/THX
115.12Some grains of salt ...TLE::MEIERBill MeierWed Nov 19 1986 16:528
re: various

Ada also has the default of /CHECK like Pascal, so comparing it to other 
compilers w/o check is bad. Thus, if you have an array (which I assume you do)
these two compilers will probably make run-time checks every time an element in
the array is refered to, costing you ...

Ada, C and PL/I all in fact do use the same code generator! The VCG.
115.13Tweaks or FORGOL?TLE::MEIERBill MeierThu Nov 20 1986 10:1311
I don't know if your intent is to write everything "FORGOL" style, but with
your Ada example, merely replacing

	for I in 1.. LIMIT loop FLAGS(I) := 1; end loop;

with the array aggregate assignment statement

	FLAGS := (others => 1);

you get a 10% performance improvement! Any true Ada programmer would express it
that way in the first place, I suspect.
115.14/NOCHECK/OPTI=SPEED_MAXMUNICH::FROEHLINType &lt;RET&gt;!What do you see?R-E-T-U-R-NThu Nov 20 1986 10:174
    As far as I can remember I switched off checks and set optimization
    to max speed for all compilations.
    
    Guenther
115.15MUNICH::FROEHLINType &lt;RET&gt;!What do you see?R-E-T-U-R-NThu Nov 20 1986 10:222
    .13: Got this one ! Thanks..
    					Guenther
115.16Better BASICWELSWS::DODDThu Nov 20 1986 11:2746
        I took a look at the BASIC version of this and the first reaction
    is nothing to do with benchmarking but...
    
    WHY do people write VAX BASIC as though they were on their Micky
    Mouse home computer??
    
    The guys in BASIC development have done a tremendous job on the
    language in terms of improving its structures and functions and
    still we see junk like this. (sorry it makes me cross!!)
    
    I include a better version - this on it's own gives about 8%
    improvement something like .13 should be do-able give me a while.
    
    The example now looks a lot like PLI so I wonder why the execution
    time is not comparable??
    
110	DECLARE BYTE FLAGS(10000)
	DECLARE WORD LIMIT,COUNTER,PRIME,J,I,K
	LIMIT=10000%
	PRINT "ERATO start in BASIC"
	CALL LIB$INIT_TIMER()
	FOR J=1% TO 10%
		COUNTER=0%
		FOR I=1% TO LIMIT
			FLAGS(I)=1%
		NEXT I
		FOR I=1% TO LIMIT
			IF FLAGS(I)=1%
			THEN
				PRIME=I+I+3%
				K=I+PRIME
				WHILE K <= LIMIT
					FLAGS(K)=0
					K=K+PRIME
				NEXT
				COUNTER=COUNTER+1%
			END IF
		NEXT I
	NEXT J
	CALL LIB$SHOW_TIMER()
	PRINT
	PRINT COUNTER;" primes"
	END
    
    

115.17Still better.BISTRO::HEINHein van den Heuvel, Valbonne.Fri Nov 21 1986 09:3551
1  re .16,                                                         
    
    	Have a look at the code BASIC generates for your example.
    	You'll find zillions of CVTxx instructions for no good reason.
    	Those can be avoided by explicit datatyping of constants.
    	BASIC V3 would allow you to specify a default datatype.
    	I guess the worst instruction in your example is in the innermost
    	loop: FLAGS(K)=0 which will genearate:
    
    					CVTWL	K(R11), R12
    					INDEX	R12, #0, #10000, #1, #0, R12
    					CVTFB	#0., FLAGS(R11)[R12]
    	
    	If you change K to LONG and =0 to ='0'B you will get
    					
    					INDEX	K(R11), #0, #10000, #1, #0, R12
    					MOVB	#0, FLAGS(R11)[R12]
    
    	Better, but not best source example included.
    	Do you all understand now why some people do not like BASIC !?
    
    	Hein.
    

1	DECLARE BYTE FLAGS(10000)
	DECLARE LONG LIMIT,COUNTER,PRIME,J,I,K
	LIMIT='10000'L
	PRINT "ERATO start in BASIC"
	CALL LIB$INIT_TIMER()
	FOR J=1% TO 10%
		COUNTER='0'L
		FOR I=1% TO LIMIT
			FLAGS(I)='1'B
		NEXT I
		FOR I=1% TO LIMIT
			IF FLAGS(I)='1'B
			THEN
				PRIME=I+I+'3'L
				K=I+PRIME
				WHILE K <= LIMIT
					FLAGS(K)='0'B
					K=K+PRIME
				NEXT
				COUNTER=COUNTER+'1'L
			END IF
		NEXT I
	NEXT J
	CALL LIB$SHOW_TIMER()
	PRINT
	PRINT COUNTER;" primes"
	END
115.18What's not to like?NOBUGS::AMARTINAlan H. MartinFri Nov 21 1986 10:204
Re .17:

Do you mean they don't like VAX BASIC the product, or BASIC the language?
				/AHM/THX
115.19TGIFBISTRO::HEINHein van den Heuvel, Valbonne.Fri Nov 21 1986 11:0823
    I like the functionality, language constructs, that VAX BASIC
    offers me, but I do not like the implementation, how it's done.
    
    I think more people feel like that.
    
    I think the language is fun. The compiler is stupid. The VAX BASIC
    engineers will ofcourse tell us that they would like to make a
    clever compiler but the stupid languange definitions do not allow for one.
    
    Eg the RESUME statement forces them the re-load registers over and over
    again to handle the case where one RESUMEs in the middle of some code.
    Hence, they say, the compiler can not optimize.
    Well, I have my doubts, but I can not substanciate (sp?) them.
    It's just that some optimizations *seem* so easy to do. They example
    in the previous reply can provide numerous examples.
    
    Look at the FOR loop implementation:
    
    FOR I = 1 to 10    --->	MOVL #1, I(R11)         }
    				DECL I(R11)             } = CLR I(R11)
    				ACBL #10, #1, I(R11)
    
    Oh well...
115.20Best Yet.CHOVAX::YOUNGBack from the Shadows Again,Fri Nov 21 1986 23:4178
    	Some people dislike BASIC for the wrong reasons.  The CVTxx
    complaints mentioned earlier were a result of the mistaken use of
    a real number literal constant intead of an integer, and a failure
    to set the default data type properly for the program (this CAN
    be done in versions earlier than v3.0).  As Hein noted, correcting
    this does result in an increase in speed.  However, the biggest
    problem of both this example, and the orginal ERATOBAS.BAS is that
    neither turns off BASICs default checking.  Also, the use of BYTEs
    were there is sufficient room for LONGwords will result in a
    performance hit, so I have changed this too.
    	
    	The end result is that while the original ERATOBAS.BAS took
    5.84 cpu seconds to complete on a 780, the following program takes
    only 2.88 seconds.  While this is still slow by some standards,
    I have found that VAX BASIC is nowhere nearly as slow as most people
    think.  I have consistently been able to bring it to within a factor
    of 2 compared to most other High Level languages (excluding Fortran),
    and I consider that a small price to pay for the use of real native
    dynamic strings.
    
    --  Barry
    
    ---------------------------------------------------------------------
    
    
1	!***************************************************************!
	!								!
	!	Seive of Erastothones Basic Benchmark program		!
	!								!
	!***************************************************************!

10	Option type = explicit
	Option inactive = subscript checking
	Option inactive = integer overflow

110	Declare long constant	Limit	=	10000
	Declare long constant	Strsiz	=	4*(limit+1)

	Map (ws) long   FLAGS (limit),    ones (limit)
	Map (ws) string flg_str = strsiz, one_str = strsiz

	DECLARE long COUNTER,PRIME,J,I,K

	! Prime the array pages.
	Flags (i) = 0%   For i = 0% to limit
	Ones (i) = 1%   For i = 0% to limit

	PRINT "ERATO start in BASIC"
	CALL LIB$INIT_TIMER()

	FOR J=1% TO 10%

		COUNTER=0%

 		Flg_str = one_str

		FOR I=1% TO LIMIT
			IF FLAGS(I)=1%
			THEN
				PRIME=I+I+3%
				K=I+PRIME
				WHILE K <= LIMIT
    					FLAGS(K)=0%
					K=K+PRIME
				NEXT
				COUNTER=COUNTER+1%
			END IF
		NEXT I

	NEXT J

	CALL LIB$SHOW_TIMER()
	PRINT
	PRINT COUNTER;" primes"
	END
    
    
    
115.21just thinkingWELSWS::DODDMon Nov 24 1986 09:1812
    The point I was trying to make was only that people lay out and
    write BASIC badly far more often than any other VAX language.
    The improvement with taking out the initialisation loops could be
    done in all of the ERATO* I looked at and would give a hefty
    performance improvement. I attempted no performance improvement
    just maintainability... Datatyping is probably the cause of most
    spurious code just take a look at a COBOL program - the slightest
    incompatibility and it's away into packed decimal hence it's sluggish
    performance on a microVAX.
                
    BASIC is a good enough language unless your major goal is absolute
    performance. 
115.22STAR::BENSONMon Nov 24 1986 17:0843
RE: .19:
    
>    Eg the RESUME statement forces them the re-load registers over and over
>    again to handle the case where one RESUMEs in the middle of some code.
>    Hence, they say, the compiler can not optimize.
>    Well, I have my doubts, but I can not substanciate (sp?) them.
>    It's just that some optimizations *seem* so easy to do. 
    
  You're right - some SEEM easy to do, until you think of all the implications.
  For example, eliminating the code you mention which loads the current line
  number address into the frame. Even if the user does explicit RESUMEs, there
  is still the problem of returning the line on which the error occurred (ERL), 
  which can even be done from a calling program. 

>    I think the language is fun. The compiler is stupid. The VAX BASIC
>    engineers will ofcourse tell us that they would like to make a
>    clever compiler but the stupid languange definitions do not allow for one.
    
  Yes, there are some language elements which cause problems for optimizations.
  However, you should also look at the improvements that have been made in the
  VAX BASIC language and compiler between V1, V2, and V3, which would give
  you a clue as to what the project goals have been. And remember that another
  implicit goal is to be a compiler that looks like an interpreter, and
  optimization can cost compile time.

  Personally, I don't doubt that the VAX BASIC group would like to add some
  optimizations (for non-interactive mode), as time permits. Flaming at the
  compiler (or the group) doesn't help. Learning how to write good BASIC code,
  and then providing input about where you think the shortcomings are and what
  language elements are most important, might.

RE: .20:

  Note that you can specify /NOLINE to further improve the code - it eliminates
  the MOVA mentioned above. Also, V3 has a CONSTANT TYPE option, so if you
  dislike adding "%" to your integer constants, you could specify

		OPTION CONSTANT TYPE = INTEGER

- Tom

  Disclaimer - I don't work on VAX BASIC so I don't speak for the group. I
	       just happen to know someone who once did. 8^)
115.23.0 updateMUNICH::FROEHLINType &lt;RET&gt;!What do you see?R-E-T-U-R-NMon Dec 01 1986 07:1835
	With all your valuable inputs, this is the last update
    	of .0 .
    
    	Servus,
    	G�nther
    	

Sieve of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------

CPU execution time in seconds !

Compiler ! VAX750  ! VAX780  ! VAX785  ! uVAX II ! VAX8200 !
---------+---------+---------+---------+---------+---------+
ADA      !  2.53   !  1.54   !  1.07   !   1.46	 !  1.30   !
BASIC    !  3.87   !  2.19   !  1.60   !   2.45	 !  1.89   !
BLISS    !  2.27   !  1.49   !  0.91   !   1.40	 !  1.64   !
C        !  3.49   !  2.14   !  1.56   !   2.14	 !  1.81   !
COBOL    ! 10.14   !  5.91   !  3.89   !  19.88	 !  5.69   !
FORTRAN  !  2.14   !  1.27   !  0.84   !   1.18	 !  1.08   !
PASCAL   !  2.98   !  1.65   !  1.06   !   1.71	 !  1.54   !
PLI      !  2.63   !  1.48   !  0.91   !   1.61	 !  1.33   !
---------+---------+---------+---------+---------+---------+

Compiler ! Version  ! Compiler Option
---------+----------+---------------------------------------+
ADA      ! T1.3-19  ! ADA    /NOCHECK/OPTIMIZE=TIME
BASIC    ! V2.4     ! BASIC  /NOCHECK/OPTIMIZE
BLISS    ! V4.1-746 ! BLISS  /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED)
C        ! V2.2-015 ! CC     /OPTIMIZE
COBOL    ! V3.4-49  ! COBOL  /NOCHECK
FORTRAN  ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE
PASCAL   ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL
PLI      ! T3.0-091 ! PLI    /NOCHECK/OPTIMIZE
---------+----------+---------------------------------------+
115.24How about a MACRO-32 version?TLE::MEIERBill MeierMon Dec 01 1986 14:468
Might be interesting to code the "FORGOL" algorithm in MACRO-32 to see what the
"theroretical" limit on execution speed, assuming of course, a good macro 
programmer can [eventually] produce code that is better than any compiler, at
least today, with the VAX instruction set.

Ie, is FORTRAN at the 99% point relative to MACRO or the 80% point, etc.

Any volunteers?
115.25Improvement still possibleOSLCSC::OLAVOlav Tollefsen, TSSC, NorwayThu Dec 04 1986 15:5442
    Declaring the flags array of type BOOLEAN will speed up the Ada version
    by 10%.  I have included the new version. 
    
    Olav
    
with TEXT_IO; use TEXT_IO;
procedure ERATOADA is
    package INT_IO is new INTEGER_IO(INTEGER);
    LIMIT : constant INTEGER:=10000;
    FLAGS : array (1..LIMIT) of BOOLEAN;
    COUNTER,PRIME,I,J,K : INTEGER;
    procedure INIT;
    pragma INTERFACE(VMS,INIT);
    pragma IMPORT_PROCEDURE(INIT,"lib$init_timer");
    procedure SHOW;
    pragma INTERFACE(VMS,SHOW);
    pragma IMPORT_PROCEDURE(SHOW,"lib$show_timer");
begin
 PUT_LINE("ERATO start in ADA");
 INIT;
 for J in 1..10
  loop
   COUNTER:=0;
   FLAGS:=(others => TRUE);
   for I in 1..LIMIT
    loop
     if FLAGS(I)
      then
       PRIME:= I+I+3;
       K:=I+PRIME;
       while K<=LIMIT
        loop
         FLAGS(K):= FALSE;
         K:=K+PRIME;
        end loop;
       COUNTER:=COUNTER+1;
      end if;
    end loop;
  end loop;
 SHOW;
 INT_IO.PUT(COUNTER);PUT_LINE(" primes");
end ERATOADA;
115.26SMOP::GLOSSOPKent GlossopSat Dec 06 1986 01:4191
Just like for BASIC, it's possible to make things go faster for other
languages if you know the compiler.  I modified the PL/I benchmark
slightly, and it now actually very slightly outperforms even the original
FORTRAN version on an 8800.  Well, that was until I modified the FORTRAN
version...  The FORTRAN version can be sped up quite a bit on the
8800 if you know that the memory is organized as longwords.  When
you know that, you know that byte writes take longer than longword
writes, since for byte writes, the system must read the old longword
and update the value, while (aligned) longword writes can just blast
the old location.  The catch is systems that cost more to write longwords
than bytes (systems with 16-bit memory, perhaps) might actually go
slower.  (Also, LOGICAL*4 allows BLBS/BLBC to be used in place of
BBS/BBC, since you know there is an entire longword available.)

I will point out that I think this is a VERY POOR benchmark.  I will
also point out that anyone who has a good knowledge of a language and
a compiler can frequently do a LOT better than J. Random Programmer
that doesn't ever bother to see what the compiler is generating and
why it is doing that.

I have marked the revised figures with asterisks.  [I ran the benchmark
10 times in a command procedure interleaving the FORTRAN and PL/I runs
and used the average of the results after verifying there were no anomalies.
This is using the original version of the FORTRAN code, not the version
specifically set up for the 8800.]

Sieve of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------

CPU execution time in seconds !

Compiler ! VAX750  ! VAX780  ! VAX785  ! uVAX II ! VAX8200 ! VAX8800 |
---------+---------+---------+---------+---------+---------+---------+
ADA      !  2.53   !  1.54   !  1.07   !   1.46	 !  1.30   !         |
BASIC    !  3.87   !  2.19   !  1.60   !   2.45	 !  1.89   !         |
BLISS    !  2.27   !  1.49   !  0.91   !   1.40	 !  1.64   !         |
C        !  3.49   !  2.14   !  1.56   !   2.14	 !  1.81   !         |
COBOL    ! 10.14   !  5.91   !  3.89   !  19.88	 !  5.69   !         |
FORTRAN  !  2.14   !  1.28*  !  0.85*  !   1.24* !  1.08   !   .273* |(.253 w/
PASCAL   !  2.98   !  1.65   !  1.06   !   1.71	 !  1.54   !         |LOGICAL*4)
PLI      !  2.??   !  1.28*  !  0.85*  !   1.26* !  1.??   !   .268* |
---------+---------+---------+---------+---------+---------+---------+

Compiler ! Version  ! Compiler Option
---------+----------+---------------------------------------+
ADA      ! T1.3-19  ! ADA    /NOCHECK/OPTIMIZE=TIME
BASIC    ! V2.4     ! BASIC  /NOCHECK/OPTIMIZE
BLISS    ! V4.1-746 ! BLISS  /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED)
C        ! V2.2-015 ! CC     /OPTIMIZE
COBOL    ! V3.4-49  ! COBOL  /NOCHECK
FORTRAN  ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE
PASCAL   ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL
PLI      ! T3.0-124 ! PLI    /NOCHECK/OPTIMIZE
---------+----------+---------------------------------------+

/*
 * FLAGS_OVERLAY is used because for some reason FLAGS = '1'b isn't
 * generating a MOVC5 instruction like I would expect.  COUNT2 is
 * used because using a variable in a PUT statement causes an up-level
 * reference, since a BEGIN block is implicitly generated for all PUT
 * statements.  (Up-level referenced variables are forced to memory.)
 */
%REPLACE LIMIT BY 10000;
ERATO:	PROC OPTIONS(MAIN);
    DCL (COUNT,COUNT2,PRIME,J,I,K) FIXED;
    DCL FLAGS(LIMIT) BIT ALIGNED;
    DCL FLAGS_OVERLAY(LIMIT) FIXED BIN(7) BASED(ADDR(FLAGS));
    DCL LIB$INIT_TIMER EXTERNAL ENTRY;
    DCL LIB$SHOW_TIMER EXTERNAL ENTRY;

    PUT LIST ('ERATO start in PLI');
    CALL LIB$INIT_TIMER;
    DO J=1 TO 10;
	COUNT=0;
	FLAGS_OVERLAY=1; /* Array assignment */
	DO I=1 TO LIMIT;
	    IF FLAGS(I) THEN DO;
		PRIME=I+I+3;
		K=I+PRIME;
		DO WHILE (K <= LIMIT);
		    FLAGS(K)='0'b;
		    K=K+PRIME;
		    END;
		COUNT=COUNT+1;
		END;
	    END;
	END;
    CALL LIB$SHOW_TIMER;
    COUNT2 = COUNT;
    PUT SKIP LIST (COUNT2,' primes');
    END ERATO;
115.27SMOP::GLOSSOPKent GlossopSat Dec 06 1986 02:0846
Also, just emphasize how much knowing about compilers, etc., can make a
difference, simply by moving the variable declarations INSIDE the program
block for C makes an enormous difference, since the variables go from
"external" (which can in theory be modified asynchronously - i.e. aliased)
to "automatic", which is not aliased unless it has its address taken.
(Aliased variables cannot be cached in registers.)  The speed of the C
code after this "trival" change is approximately the same speed as the
revised PL/I program - i.e. only slightly slower than FORTRAN, or a 40%
speed improvement over the original version.

************************************************************************
DON'T TRUST BENCHMARKS UNLESS YOU KNOW *EXACTLY* WHAT THEY ARE SUPPOSED
TO BE TESTING, AND YOU ARE CONFIDENT THAT THEY ARE REALLY TESTING WHAT
THEY CLAIM TO BE TESTING.  INTER-LANGUAGE BENCHMARKS ARE ESPECIALLY
DANGEROUS, SINCE PEOPLE RARELY KNOW ALL LANGUAGES EQUALLY WELL.
************************************************************************

The revised C program follows:

#define limit 10000

main()
{
  int counter,prime,i,j,k; /* Moved from outside the block */
  int flags[limit+1]; /* Moved from outside the block */

  printf("\nERATO start in C");
  lib$init_timer();
  for (j=1; j<=10; j++)
  { counter=0;
    for (i=1; i<=limit; i++) flags[i]=1;
    for (i=1; i<=limit; i++)
    { if (flags[i]==1)
      { prime = i + i + 3;
        k = i + prime;
        while (k <= limit)
        { flags[k]=0;
          k = k + prime;
        }
        ++counter;
      }
    }
  }
  lib$show_timer();
  printf("\n%d primes",counter);
}
115.28Correct S of E?VINO::MERRILLMon Dec 08 1986 18:2331
What is the correct definition for S of E?  The previous response
used a different algorithm than that before, so I came up with one more
it is quite fast for 1000 sieves, about .57 seconds execution time:

#define TRUE 1
#define FALSE 0
#define SIZE 8190

main()
{
int i, j, k;
/* initilize 1 & 2 to be initially prime */
int flags[SIZE + 1] = { FALSE, TRUE, TRUE, };

lib$init_timer();
for (j = 1; j <= 1000; j++) {
  for (i = 3; i <= SIZE; i += 2)	/* start with 3 */
    flags[i] = TRUE;
  for (i = 3; i <= SIZE; i += 2)
    if (flags[i])
      for (k = (i + i) + i; k <= SIZE; k += (i + i))
        flags[k] = FALSE;
  }
lib$show_timer();

printf("Prime numbers:\n");
for (i = 1; i <= SIZE; i++)
  if (flags[i])
    printf(" %d\n",i);
}
115.29SMOP::GLOSSOPKent GlossopMon Dec 08 1986 20:135
I wouldn't be surprized if there are faster algorithms - I just sped up
the one provided to make a point about inter-language benchmarking
comparisons.

Kent
115.30see it two waysMUNICH::FROEHLINType &lt;RET&gt;!What do you see?R-E-T-U-R-NTue Dec 09 1986 04:3213
    If you compare the results among the cpu types, you see how
    dangerous the SofE benchmark is. The relative speed of the produced
    code varies. And this I think is an interesting point too.
    For certain (or small) programs one VAX may be relatively better
    than the other. It seems a little bit like you can find out a
    PASCAL machine or a BLISS machine or whatsoever.
    
    I would like to know how compiler engineers put that into account.
    
    Thank you all for the interesting discussion 
    (and code improvements)!

    G�nther
115.31Optimizing for a specific processorQUARK::LIONELReality is frequently inaccurateTue Dec 09 1986 10:129
    Re .30:
    
    Our usual strategy is to optimize for the fastest processor.  However,
    of late the various processor groups are trying to woo the
    compiler developers into making their particular processor look
    good, sometimes at the expense of another processor.  It's quite
    true that a code sequence that is best for processor X is not so
    for processor Y.
    				Steve
115.32ahemBIZET::VANROGGENTue Dec 09 1986 10:254
    I found that using the algorithm provided by Knuth, although a
    bit primitive, was about four times faster than the original
    algorithm given in ERATO. It was also correct, unlike ERATO.
    I'm surprised people haven't complained more about this.
115.33Byte gets the blameTLE::RMEYERSRandy MeyersTue Dec 09 1986 14:5912
Re .32:

I believe that Byte Magazine is the culprit responsible for the
broken algorithm for finding prime numbers.  Byte popularized
that particular benchmark in that form.  I always assumed that
they wrote the program without checking to see if it worked.

In general, I have found Byte very enjoyable for its technological
howlers.  I especially liked an article on recursive descent parsing
that evaluated expressions right to left.  The author of the article
did believe this to be a drawback, but didn't see how it could be
fixed.
115.34CLT::GILBERTeager like a childTue Dec 09 1986 19:1121
    In looking at the code generated by VAX Ada, I noticed a couple
    interesting things -- the initialization of the array is done by
    a MOVC3, whereas a MOVC5 would do as well, the value 'FALSE' is
    materialized (consuming R4, whereas a literal zero would suffice
    in the one instruction that uses it), the compiler does subscript
    checking but no range propagation, so some checks are redundant,
    and it uses INSV and EXTV to set/clear and test bits in the array.

    These are a few opportunities for the compiler to generate better
    code.

    A more important aspect (as far as benchmarks go) is that the loop
    variables are 'off by 1' from what the algorithm naturally desires.
    Thus, the compiler generates some "SUBL3 #1, Ra, Rb" instructions
    that would be unnecessary if the 'correct' loop variable or induction
    variable were used.  This is due (in part) to the bit-vector being
    indexed starting at 1, instead of 0.  It's doubtful that any compiler
    would attempt to determine the 'correct' induction variable, but
    some (such as Pascal or Fortran) might 'luck' upon it.  This gives
    a significant performance improvement (on older VAX models), and is
    attributable only to chance; this doesn't make for a good benchmark.
115.35AUTHOR::WELLCOMESteveWed Mar 18 1987 11:122
    For what it's worth (not much), I tried this in MACRO-11 on a
    PRO-380 and it took 2.68 seconds.  
115.36late entry for COBOLCLT::RICEMon Mar 23 1987 11:1295
    I know this is a late entry, but I just found this note. 

    To emphasize the fact that benchmarks and benchmark data should not be
    distributed without a complete understanding of the nature of the
    program and the expected meaning of the test (re: .27) - the VAX COBOL
    numbers previously reported were inaccurate on the order of 125% to
    over 850%. A properly written VAX COBOL program (knowing something
    about the compiler helps) is much faster than the times reported in
    #.23 indicate. In this case the major difference is the use of longword
    integer numerics instead of numeric string (which the compiler then
    converts to packed decimal for most calculations - but that's another
    story). 

    I was unable to obtain the saveset, but wrote a COBOL program based on
    the BASIC version in .16 and .20. Without spending a great deal of time
    optimizing it I obtained: 

                 MicroVAX II      VAX/785

'better' COBOL      2.06            1.71

instead of
                   19.88            3.89

    In view of the age of this note - I wonder how many readers have a
    distorted view of the speed of VAX COBOL programs. Slower than FORTRAN
    - sure, but not THAT slow! And, it shouldn't be a big suprise to anyone
    that packed is slower than integer - especially on MicroVAX. 
    
    Wish I'd seen this note sooner.

-Chip
(VAX COBOL development)


*****************************************************************
IDENTIFICATION DIVISION.
PROGRAM-ID. SIEVE.
ENVIRONMENT DIVISION.    
*****************************************************************
*								*
*	Sieve of Eratosthenes COBOL Benchmark program		*
*			     (whynot)				*
*****************************************************************
DATA DIVISION.
WORKING-STORAGE SECTION.
01  LIMITT  PIC S9(9) COMP VALUE 10000.
01  STRSIZ  PIC S9(9) COMP VALUE 40004.
01  COUNTER PIC S9(9) COMP VALUE 0.
01  PRIME   PIC S9(9) COMP VALUE 0.
01  J       PIC S9(9) COMP VALUE 0.
01  I       PIC S9(9) COMP VALUE 0.
01  K       PIC S9(9) COMP VALUE 0.
01  TABLES.
    02  FLGSTR.
        03  FLAGS OCCURS 10000 TIMES PIC S9(9) COMP.
    02  ONESTR.
        03  ONES  OCCURS 10000 TIMES PIC S9(9) COMP.

PROCEDURE DIVISION.
BEGIN.
    	PERFORM VARYING I FROM 1 BY 1  UNTIL I > LIMITT
    		MOVE 0 TO FLAGS(i)
    		MOVE 1 TO ONES(i)
    	END-PERFORM.

    	DISPLAY "ERATO start in COBOL".
    	CALL "LIB$INIT_TIMER".

    	PERFORM VARYING J FROM 1 BY 1 UNTIL J = 10
    		MOVE 0 TO COUNTER
    		MOVE ONESTR TO FLGSTR 
    		PERFORM VARYING I FROM 1 BY 1 UNTIL I = LIMITT
    			IF FLAGS(I) = 1
			THEN
*    				ADD I I 3 GIVING PRIME
                                MOVE I TO PRIME
                                ADD I TO PRIME
                                ADD 3 TO PRIME
*                                display prime conversion
				ADD I PRIME GIVING K
				PERFORM UNTIL K NOT < LIMITT
    				   MOVE 0 TO FLAGS(K)
				   ADD K PRIME GIVING K
				END-PERFORM
				ADD 1 TO COUNTER
			END-IF
		END-PERFORM
	END-PERFORM.

	CALL "LIB$SHOW_TIMER".
	DISPLAY " ".
	DISPLAY COUNTER CONVERSION " primes".

    
115.37I've got what you were looking forDENTON::AMARTINAlan H. MartinTue Mar 24 1987 10:587
Re .36:

>    I was unable to obtain the saveset, ...

See DENTON""::DENTON$DUA1:[AMARTIN.ERATO]ERATO.A  (or ERATOCOB.COB on
the same area).
				/AHM
115.38got it - thanksCLT::RICETue Mar 24 1987 12:555
    re: .37 thanks. Guenther too has mailed me his COBOL version.
    Most of the improvement in my version is from the use of integer
    numeric instead of string. 
    
    -Chip
115.39another set of benchmark figuresCOOKIE::DOUCETTEChuck DoucetteTue Jun 02 1987 18:33111
	Here (after the FF) is a reproduction of some benchmarks I did recently.
I did this in preparation for a talk I gave at ZKO last Thursday in
the Pythagoras room (5/28/87) on DEC WRL Modula-2 (a descendant of Mike
Powell's compiler). I post this for two reasons - 1) to generate
interest in this compiler, and 2) I'd like suggestions as to how
to make this a fair, accurate, and complete comparison.

	If you'd like complete compiler distribution kits for ULTRIX or VMS
including these benchmarks - please send mail to Joel McCormack
(SONORA::JOEL). If you want to just play with the benchmark files themselves,
you can look in COOKIE::DISK$GZ_DISK:[DOUCETTE.NEWMOD.BENCH] at the files
BENCH.*;* and DESCRIP.MMS;0

Chuck

From:	COOKIE::DOUCETTE  "Chuck Doucette  26-May-1987 2333" 27-MAY-1987 00:07
To:	@GROUP
Subj:	I redid the DECWRL Modula-2 benchmarks - some dramatic results:

--------------------------------------------------------------------------------
|| Program		|	Berkeley	Ultrix		DEC WRL	      ||
|| Name			|	Pascal		C		Modula--2     ||
--------------------------------------------------------------------------------
|| Perm			|	2.58	     	1.40	     	1.14	      ||
|| Towers		|	2.88	     	1.60	     	1.00	      ||
|| Queens		|	1.47	     	0.49	     	0.46	      ||
|| Intmm		|	2.75	     	1.27	     	0.59	      ||
|| Mm			|	2.32	     	1.69	     	0.64	      ||
|| Puzzle		|	15.73	     	8.92	     	2.56	      ||
|| Quick		|	2.07	     	0.62	     	0.45	      ||
|| Bubble		|	3.57	     	2.98	     	0.67	      ||
|| Tree			|	3.30	     	1.76	     	1.00	      ||
|| FFT			|	5.72	     	3.62	     	1.44	      ||
|| Int Comp		|	5.27	     	2.91	     	1.39	      ||
|| Real Comp		|	7.90	     	4.49	     	2.01	      ||
--------------------------------------------------------------------------------
|| avg. improvement     |	74%	     	47%	     	N/A	      ||
--------------------------------------------------------------------------------

This is CPU seconds running on shodha w/ ULTRIX (a VAX-8200).

    The "avg. improvement" row means that for each entry, I compared
it with DEC WRL Modula-2. I figured the improvement in execution
time from the particular language and Modula-2. Then, I added
up all the improvement percentages per column, divided by the
number of entries, and got the average. This should indicate
(not accurately) the average expected improvement in execution
time (i.e. how good was the code that the compiler generated)
if you coded the same thing in Modula-2.

    N/A means either Not Applicable or Not Available.

    Other than the actual timings I got (and the machine), and the avg.
improvement line that I added - see the following excerpt from overview.doc
written by Mike Powell for details (note that speed of compilation with the
current compiler is twice as fast as the compiler in use when this was
written):

          The compiler was designed and built by Michael L. Powell and
          it  appears  to  be one of the best compilers for the VAX in
          terms of the efficiency of the generated  code.   The  error
          messages are slightly better than those from the Unix C com-
          piler and the compile time is slightly worse.   Below  is  a
          table  of  execution  times  on a set of ten benchmarks that
          John Hennessy at Stanford has assembled.  All the times  are
          in CPU seconds on a VAX 11/780 with FPA.  All compilers were
          asked to generate the best code they could and all  optional
          run  time  checks were turned off.  All versions of the pro-
          grams were as syntactically and semantically similar as pos-
          sible.

          Doing benchmarks is always tricky and is  especially  diffi-
          cult  on  a  VAX 11/780 because of the peculiar interactions
          between the  instruction  prefetch  buffer  and  instruction
          alignment  on  word  boundaries.  Variations on the order of
          10% because of changes in alignment are  not  unusual.   The
          integer and real composite numbers in the last two lines are
          Hennessy's weighted averages of the previous  lines  --  his
          attempt  to  summarize  the  results.  The above performance
          information is provided for guidance only.  Your mileage may
          vary.

--------------------------------------------------------------------------------
|| Program	    |	VAX	VAX	VAX	    DEC WRL		      ||
|| Name		    |	C	Pascal	Bliss--32   Modula--2		      ||
--------------------------------------------------------------------------------
|| Perm		    | 	2.11	1.52  	1.93		1.71		      ||
|| Towers	    | 	2.30	1.20  	2.01		1.46		      ||
|| Queens	    | 	0.60	0.75  	0.54	 	0.59		      ||
|| Intmm	    | 	0.55	0.52  	1.02	 	0.54		      ||
|| Mm		    | 	0.99	0.55  	1.03	 	0.57		      ||
|| Puzzle	    | 	3.88	3.78  	3.77	 	3.05		      ||
|| Quick	    | 	0.59	0.60  	0.60	 	0.63		      ||
|| Bubble	    | 	0.76	0.74  	0.73	 	0.85		      ||
|| Tree		    | 	2.42	3.03  	2.32	 	1.44		      ||
|| FFT		    | 	1.88	1.28  	1.02	 	0.98		      ||
|| Int Comp	    | 	2.43	2.13  	2.25	 	1.76		      ||
|| Real Comp	    | 	3.27	2.68  	N/A 	 	2.21		      ||
--------------------------------------------------------------------------------
|| avg. improvement |	21%	8%    	17%	 	N/A 		      ||
--------------------------------------------------------------------------------

This is CPU seconds running on newton w/ VMS (a VAX-11/785).

    Note that if my arithmetic is bad (in calculating avg. improvement),
please correct me. If you doubt the veracity of the results, I have all of the
sources - you are welcome to try them yourself.

Chuck Doucette

May 27,	1987
115.40if you have the BLISS versions as well...VLNVAX::DMCLUREGear up for DECworld-87!Mon Jun 22 1987 00:503
	I'd be curious to see the run time of the BLISS versions.

							-davo
115.41my last word on this subjectCOOKIE::DOUCETTEChuck Doucette - Database A/D @CXOTue Jun 23 1987 03:0419
	I just checked - they are listed. If you look in the second chart (under
the paragraphs written by Mike Powell) you will see benchmark
timings under "DEC Bliss-32". I posted a very similar to this
one in the BLISS conference (TLE::BLISS) under note 294.* and it
seemed to draw much more interest.
	I'd like to comment more on my benchark results,
but people either say a) your optimizer was tuned to these benchmarks,
b) you hand-picked these bench marks to make your compiler look
good, c) these benchmarks aren't realistic, or d) all benchmarks
are useless. I personally have faith in the benchmarks I listed
and in the compiler our group uses (it isn't MY compiler although
I have worked on it and do plug it).

	Anyhow, I've said enough. If the compiler interests you,
you are more than welcome to read about it in the conference
I started in COOKIE::MODULA2 (send me mail if you want to become a
member - browsers are ok).

Chuck
115.42TLE::BRETTTue Jun 23 1987 09:155
    
    faith: n. trust; belief; belief without proof; religion; promise;
    	 	loyalty, constancy.
    
    /Bevin