[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::languages

Title:	Languages
Notice:	Speaking In Tongues
Moderator:	TLE::TOKLAS::FELDMAN

Created:	Sat Jan 25 1986
Last Modified:	Wed May 21 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	394
Total number of notes:	2683

115.0. "VAX Compilers Code Speed" by MUNICH::FROEHLIN (Guenther K. Froehlin) Mon Nov 10 1986 08:20

    Here is a list of execution speed of code produced by different
    VAX languages. This is by far not representative ! But gives
    some impression. Any comments ?
    
    If you are interested in the sources, get saveset
    
    	MUNICH""::CSC_INFO:ERATO.A
    
    Prost !
    G�nther


------------------------------------------------------------------------------
    
Seeth of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------

Execution timing in seconds !

Language ! Version  ! VAX750  ! VAX780  ! VAX785  ! VAX8200 !
---------+----------+---------+---------+---------+---------+
FORTRAN  ! V4.3-145 ! 02.14   ! 01.24   ! 00.84   ! 01.08   !
BLISS    ! V4.1-746 ! 02.49   ! 01.63   ! 01.03   ! 01.48   !
PLI      ! T2.4-397 ! 02.85   ! 01.54   ! 01.00   ! 01.40   !
PASCAL   ! V3.2-57  ! 02.69   ! 01.62   !         ! 01.51   !
ADA      ! T1.2-12  ! 03.72   ! 02.02   ! 01.29   ! 01.80   !
C        ! V2.1-007 ! 03.41   ! 02.05   ! 01.57   ! 01.80   !
COBOL    ! V3.2-42  ! 10.04   ! 05.87   ! 03.89   ! 05.67   !
BASIC    ! V2.3     ! 13.13   ! 05.84   ! 04.58   ! 07.63   !
---------+----------+---------+---------+---------+---------+

VAX750	REV: xx	FPA: ?
VAX780	REV:  7	FPA: Y
VAX785	REV:  2	FPA: Y
VAX8200	REV: xx	FPA: Y

T.R	Title	User	Personal Name	Date	Lines
115.1	For Internal Use Only	TLE::FELDMAN	LSE, zealously	`Mon Nov 10 1986 10:28`	13
	Just a reminder from your friendly moderator: Timing data on compilers is considered sensitive information. The data published in the base note (along with everything else in this notes file) is For Internal Use Only. Gary PS Anyone care to comment on the relevance of the Sieve of Eratosthenes as a benchmark? My gut reaction is that very few of our customers need to calculate prime numbers, and that optimizations that work quite well for reasonable code mixes may not necessarily do well on the S of E.
115.2	And sources, too!	MINAR::BISHOP		`Mon Nov 10 1986 11:02`	9
	Further, when such data is given, there should also be an indication of the qualifiers used: Pascal in particular has a default of /CHECK, which slows it down. Thus to get the fastest runs with VAX Pascal, non-default qualifiers must be use. I agree with Gary's implied criticism: the benchmark cited is a poor one. -John Bishop
115.3	Got it !	MUNICH::FROEHLIN	Type <RETURN>! What ye see ? R-E-T-U-R-N	`Mon Nov 10 1986 12:32`	2
	Thank you for your responses ! As I expect you professionals, my sieve will disappear into dust. But a bitter taste still remains!
115.4	Certainly not USDA Grade Prime numbers	NOBUGS::AMARTIN	Alan H. Martin	`Mon Nov 10 1986 13:30`	15
	I wouldn't spread those numbers far for another reason. Here are the first few prime numbers the Ada version found: " ERATO start in ADA 1 2 3 4 5 7 8 10 13 14 17 19 20 22 25 28 29 32 34 35 38 40 43 47 49 50 52 53 55 62 64 67 68 73 74 77 80 82 85 88 89 94 95 97 98 104 110 112 " /AHM
115.5	The right corner...	MUNICH::FROEHLIN	Type <RET>! What do you see? R-E-T	`Wed Nov 12 1986 08:06`	15
	.4: Thank you for your testing ! I tried the ADA version again and printed out the PRIME variable (after prime:=i+i+3;) and got prime numbers only. The algorithm doesn't calculate for 1,2 and 3. Or did you look at the FLAGS indicator ? They do not reflect the the prime numbers. .1,.2: I would have liked some technical discussion of the subject because there is a difference in code produced by different VAX compilers (see 'VAX Performance Summary 1985, Chapter 6). As far as I know, VAX C and VAX PLI are using the same CODE GENERATOR and OPTIMIZER, but the result looks different. Things like this had been the reason why I posted .0 here. Guenther Froehlin, ACT/TEC, Munich, F.R.Germany
115.6	Feel free to continue the discussion (FIUO)	TLE::FELDMAN	PDS, our next success	`Wed Nov 12 1986 10:29`	11
	My reply in .1 wasn't meant to stifle discussion on this subject. Please feel free to continue; I, too, would appreciate an explanation of the differences in code generated by our compilers that share the same back end. This discussion brings out yet another reason for timing data to be considered confidential: it takes an expert to design benchmarks, and an expert to interpret them; fudging or misinterpreting the data is far too easy. Gary
115.7	I'd like to see the program again	NOBUGS::AMARTIN	Alan H. Martin	`Wed Nov 12 1986 13:42`	11
	Re .5: True, I did not examine the successive values of the variable "prime". On the other hand, in the only version of the Sieve of Eratosthenes which I am aware of, a side effect of the algorithm is a boolean array, where every number which is not "crossed off" is a prime. I'd like to examine the program further, but deleted my copy. Could you post the latest location for the saveset? It seems to have disappeared from MUNICH""::CSC_INFO:. /AHM/THX
115.8	programs available and more ?s	MUNICH::FROEHLIN	Type <RET>!What do you see?R-E-T-U-R-N	`Thu Nov 13 1986 07:14`	18
	.6: Are there programers around for all the languages shown in .1 ? I would like to have the programs checked for correctness and speed (qualifiers to use etc.). .7: The saveset is back again : MUNICH::CSC_INFO:ERATO.A In general, the sieve of Eratosthenes is one of the programs in the set for benchmarking CPU power of computers. Among those programs are others like HANOI, ACKERMANN... I just want to be more sure about the ERATO version to compare it with other published data and I appreciate every help that the programs in ERATO.A are running as fastest as possible on a VAX. One interesting thing is that the ratio between the languages is different on the various VAXes. How that ? gggGuenther
115.9	Benchmarks are fun but often misleading...	CADSYS::COOK	Neil	`Fri Nov 14 1986 01:17`	49
	Whilst the programs you cite are often used in benchmarking computers and compilers, they are not very useful for this task. The reason for this is alluded to in your own note - "as fastest as possible on a VAX". Since a benchmark which shows one machine faster than another makes good advertising copy, people often "hand tune" benchmarks to get the best possible performance out of their machines. When measured in MIPS, this is often refered to as "Peak MIPS", since it could not be sustined for a more normal workload. Some small programs like Eratosthenes fit into the caches of a significant number of machines. This will reduce the amount of (slow) main memory fetches required, replacing them with fetches from (fast) cache memory. While a cache will normally improve a machine's speed, running small benchmarks like these over-emphasize the benefits to be gained. Some of the most realistic benchmarks are application programs (or parts thereof) which have been made available for benchmarking purposes. The Lawrence Livermore Loops program is one. You wondered why the ratio between the different languages varied between machines. Some of the VAX compilers use different code generators. Some will be more efficient than others for doing different things as the compiler writers will have concentrated on different language elements. Even when two compilers share the same code generator, the input to the code generator is likely to be a little different even for the same algorithm, so different machine code might be generated which runs at differing speeds. Lastly, different VAXes take differing numbers of machine cycles to perform the same instrucutions, so even if they had the same machine cycle time, they would differ in execution time. A very important aspect of this, is that some machines emulate instructions which others execute from microcode. In some cases where a particular code generator is more prone to generating these instructions, it can have a dramatic effect on performance. COBOL is quite prone to this since packed decimal instructions, for example, have been given much less prominance on more recent implementations of the VAX architechture. Personally, I am fascinated by benchmark results (the Usenet newsgroup comp.arch carries results every few months) and I need to keep reminding myself to take them with a grain of salt, lest I get too optimistic or pessimistic about the latest ones. P.S. BYTE is very good at bad benchmarks. Small benchmarks written in BASIC tell a potential purchaser next to nothing about a particular machine, especially if the floating point instructions are emulated.
115.10	Thanks for restoring it	NOBUGS::AMARTIN	Alan H. Martin	`Fri Nov 14 1986 17:30`	73
	Re .8: Thanks for recovering the save set. I still haven't figured out the algorithm, but I agree that the results are much improved when you examine the contents of the variable "prime". Now the only composite number it claims is prime is 9. Probably related to some glitch in starting the algorithm up. (I've got a simple but fast implementation of a prime finding algorithm which doesn't get 2 or 3 right). An implementation of the Sieve of Eratosthenes which uses the same algorithm that my office mate and I regard as "the original Greek" follows. It comes from the VAX C test system: " #ifndef TIMER #define TIMER 1 #endif /* sieve.c / /************************************************************************** Eratosthenes Sieve Prime Number Program (Byte September 1981) modified to make more 'C-like' **************************************************************************/ #define TRUE 1 #define FALSE 0 #define SIZE 8190 char flags[SIZE + 1]; main() { int timrb(), timre(); int i, k, iter; #ifdef OUTPUT #ifdef vax11c puts("\n\nVAX11C --- SIEVE benchmark ---\n"); #else puts("\n\n--- SIEVE benchmark ---\n"); #endif #endif #if TIMER timrb(); #endif for (iter = 1; iter <= 500; iter++) { for (i = 0; i <= SIZE; i++) flags[i] = TRUE; for (i = 2; i <= SIZE; i++) if (flags[i]) for (k = i 2; k <= SIZE; k += i) flags[k] = FALSE; } #if TIMER timre(); #endif #ifdef OUTPUT printf("flags[k]: k = %d, %d\n", SIZE - 200, SIZE); for (k = SIZE - 200; k <= SIZE; k += 65) { printf("\n"); for (i = k; i <= k + 64 && i <= SIZE; i++) printf(" %d", flags[i]); } #endif } "
115.11	I could still use some help	NOBUGS::AMARTIN	Alan H. Martin	`Mon Nov 17 1986 14:26`	3
	I'm going to submit one of the programs to CLT::MATH (q.v.) to see if someone can explain the algorithm works. /AHM/THX
115.12	Some grains of salt ...	TLE::MEIER	Bill Meier	`Wed Nov 19 1986 16:52`	8
	re: various Ada also has the default of /CHECK like Pascal, so comparing it to other compilers w/o check is bad. Thus, if you have an array (which I assume you do) these two compilers will probably make run-time checks every time an element in the array is refered to, costing you ... Ada, C and PL/I all in fact do use the same code generator! The VCG.
115.13	Tweaks or FORGOL?	TLE::MEIER	Bill Meier	`Thu Nov 20 1986 10:13`	11
	I don't know if your intent is to write everything "FORGOL" style, but with your Ada example, merely replacing for I in 1.. LIMIT loop FLAGS(I) := 1; end loop; with the array aggregate assignment statement FLAGS := (others => 1); you get a 10% performance improvement! Any true Ada programmer would express it that way in the first place, I suspect.
115.14	/NOCHECK/OPTI=SPEED_MAX	MUNICH::FROEHLIN	Type <RET>!What do you see?R-E-T-U-R-N	`Thu Nov 20 1986 10:17`	4
	As far as I can remember I switched off checks and set optimization to max speed for all compilations. Guenther
115.15		MUNICH::FROEHLIN	Type <RET>!What do you see?R-E-T-U-R-N	`Thu Nov 20 1986 10:22`	2
	.13: Got this one ! Thanks.. Guenther
115.16	Better BASIC	WELSWS::DODD		`Thu Nov 20 1986 11:27`	46
	I took a look at the BASIC version of this and the first reaction is nothing to do with benchmarking but... WHY do people write VAX BASIC as though they were on their Micky Mouse home computer?? The guys in BASIC development have done a tremendous job on the language in terms of improving its structures and functions and still we see junk like this. (sorry it makes me cross!!) I include a better version - this on it's own gives about 8% improvement something like .13 should be do-able give me a while. The example now looks a lot like PLI so I wonder why the execution time is not comparable?? 110 DECLARE BYTE FLAGS(10000) DECLARE WORD LIMIT,COUNTER,PRIME,J,I,K LIMIT=10000% PRINT "ERATO start in BASIC" CALL LIB$INIT_TIMER() FOR J=1% TO 10% COUNTER=0% FOR I=1% TO LIMIT FLAGS(I)=1% NEXT I FOR I=1% TO LIMIT IF FLAGS(I)=1% THEN PRIME=I+I+3% K=I+PRIME WHILE K <= LIMIT FLAGS(K)=0 K=K+PRIME NEXT COUNTER=COUNTER+1% END IF NEXT I NEXT J CALL LIB$SHOW_TIMER() PRINT PRINT COUNTER;" primes" END
115.17	Still better.	BISTRO::HEIN	Hein van den Heuvel, Valbonne.	`Fri Nov 21 1986 09:35`	51
	1 re .16, Have a look at the code BASIC generates for your example. You'll find zillions of CVTxx instructions for no good reason. Those can be avoided by explicit datatyping of constants. BASIC V3 would allow you to specify a default datatype. I guess the worst instruction in your example is in the innermost loop: FLAGS(K)=0 which will genearate: CVTWL K(R11), R12 INDEX R12, #0, #10000, #1, #0, R12 CVTFB #0., FLAGS(R11)[R12] If you change K to LONG and =0 to ='0'B you will get INDEX K(R11), #0, #10000, #1, #0, R12 MOVB #0, FLAGS(R11)[R12] Better, but not best source example included. Do you all understand now why some people do not like BASIC !? Hein. 1 DECLARE BYTE FLAGS(10000) DECLARE LONG LIMIT,COUNTER,PRIME,J,I,K LIMIT='10000'L PRINT "ERATO start in BASIC" CALL LIB$INIT_TIMER() FOR J=1% TO 10% COUNTER='0'L FOR I=1% TO LIMIT FLAGS(I)='1'B NEXT I FOR I=1% TO LIMIT IF FLAGS(I)='1'B THEN PRIME=I+I+'3'L K=I+PRIME WHILE K <= LIMIT FLAGS(K)='0'B K=K+PRIME NEXT COUNTER=COUNTER+'1'L END IF NEXT I NEXT J CALL LIB$SHOW_TIMER() PRINT PRINT COUNTER;" primes" END
115.18	What's not to like?	NOBUGS::AMARTIN	Alan H. Martin	`Fri Nov 21 1986 10:20`	4
	Re .17: Do you mean they don't like VAX BASIC the product, or BASIC the language? /AHM/THX
115.19	TGIF	BISTRO::HEIN	Hein van den Heuvel, Valbonne.	`Fri Nov 21 1986 11:08`	23
	I like the functionality, language constructs, that VAX BASIC offers me, but I do not like the implementation, how it's done. I think more people feel like that. I think the language is fun. The compiler is stupid. The VAX BASIC engineers will ofcourse tell us that they would like to make a clever compiler but the stupid languange definitions do not allow for one. Eg the RESUME statement forces them the re-load registers over and over again to handle the case where one RESUMEs in the middle of some code. Hence, they say, the compiler can not optimize. Well, I have my doubts, but I can not substanciate (sp?) them. It's just that some optimizations seem so easy to do. They example in the previous reply can provide numerous examples. Look at the FOR loop implementation: FOR I = 1 to 10 ---> MOVL #1, I(R11) } DECL I(R11) } = CLR I(R11) ACBL #10, #1, I(R11) Oh well...
115.20	Best Yet.	CHOVAX::YOUNG	Back from the Shadows Again,	`Fri Nov 21 1986 23:41`	78
	Some people dislike BASIC for the wrong reasons. The CVTxx complaints mentioned earlier were a result of the mistaken use of a real number literal constant intead of an integer, and a failure to set the default data type properly for the program (this CAN be done in versions earlier than v3.0). As Hein noted, correcting this does result in an increase in speed. However, the biggest problem of both this example, and the orginal ERATOBAS.BAS is that neither turns off BASICs default checking. Also, the use of BYTEs were there is sufficient room for LONGwords will result in a performance hit, so I have changed this too. The end result is that while the original ERATOBAS.BAS took 5.84 cpu seconds to complete on a 780, the following program takes only 2.88 seconds. While this is still slow by some standards, I have found that VAX BASIC is nowhere nearly as slow as most people think. I have consistently been able to bring it to within a factor of 2 compared to most other High Level languages (excluding Fortran), and I consider that a small price to pay for the use of real native dynamic strings. -- Barry --------------------------------------------------------------------- 1 !*************************************************************! ! ! ! Seive of Erastothones Basic Benchmark program ! ! ! !************************************************************! 10 Option type = explicit Option inactive = subscript checking Option inactive = integer overflow 110 Declare long constant Limit = 10000 Declare long constant Strsiz = 4(limit+1) Map (ws) long FLAGS (limit), ones (limit) Map (ws) string flg_str = strsiz, one_str = strsiz DECLARE long COUNTER,PRIME,J,I,K ! Prime the array pages. Flags (i) = 0% For i = 0% to limit Ones (i) = 1% For i = 0% to limit PRINT "ERATO start in BASIC" CALL LIB$INIT_TIMER() FOR J=1% TO 10% COUNTER=0% Flg_str = one_str FOR I=1% TO LIMIT IF FLAGS(I)=1% THEN PRIME=I+I+3% K=I+PRIME WHILE K <= LIMIT FLAGS(K)=0% K=K+PRIME NEXT COUNTER=COUNTER+1% END IF NEXT I NEXT J CALL LIB$SHOW_TIMER() PRINT PRINT COUNTER;" primes" END
115.21	just thinking	WELSWS::DODD		`Mon Nov 24 1986 09:18`	12
	The point I was trying to make was only that people lay out and write BASIC badly far more often than any other VAX language. The improvement with taking out the initialisation loops could be done in all of the ERATO* I looked at and would give a hefty performance improvement. I attempted no performance improvement just maintainability... Datatyping is probably the cause of most spurious code just take a look at a COBOL program - the slightest incompatibility and it's away into packed decimal hence it's sluggish performance on a microVAX. BASIC is a good enough language unless your major goal is absolute performance.
115.22		STAR::BENSON		`Mon Nov 24 1986 17:08`	43
	RE: .19: > Eg the RESUME statement forces them the re-load registers over and over > again to handle the case where one RESUMEs in the middle of some code. > Hence, they say, the compiler can not optimize. > Well, I have my doubts, but I can not substanciate (sp?) them. > It's just that some optimizations seem so easy to do. You're right - some SEEM easy to do, until you think of all the implications. For example, eliminating the code you mention which loads the current line number address into the frame. Even if the user does explicit RESUMEs, there is still the problem of returning the line on which the error occurred (ERL), which can even be done from a calling program. > I think the language is fun. The compiler is stupid. The VAX BASIC > engineers will ofcourse tell us that they would like to make a > clever compiler but the stupid languange definitions do not allow for one. Yes, there are some language elements which cause problems for optimizations. However, you should also look at the improvements that have been made in the VAX BASIC language and compiler between V1, V2, and V3, which would give you a clue as to what the project goals have been. And remember that another implicit goal is to be a compiler that looks like an interpreter, and optimization can cost compile time. Personally, I don't doubt that the VAX BASIC group would like to add some optimizations (for non-interactive mode), as time permits. Flaming at the compiler (or the group) doesn't help. Learning how to write good BASIC code, and then providing input about where you think the shortcomings are and what language elements are most important, might. RE: .20: Note that you can specify /NOLINE to further improve the code - it eliminates the MOVA mentioned above. Also, V3 has a CONSTANT TYPE option, so if you dislike adding "%" to your integer constants, you could specify OPTION CONSTANT TYPE = INTEGER - Tom Disclaimer - I don't work on VAX BASIC so I don't speak for the group. I just happen to know someone who once did. 8^)
115.23	.0 update	MUNICH::FROEHLIN	Type <RET>!What do you see?R-E-T-U-R-N	`Mon Dec 01 1986 07:18`	35
	With all your valuable inputs, this is the last update of .0 . Servus, G�nther Sieve of Eratosthenes Benchmark (calculating prime numbers) ------------------------------------------------------------- CPU execution time in seconds ! Compiler ! VAX750 ! VAX780 ! VAX785 ! uVAX II ! VAX8200 ! ---------+---------+---------+---------+---------+---------+ ADA ! 2.53 ! 1.54 ! 1.07 ! 1.46 ! 1.30 ! BASIC ! 3.87 ! 2.19 ! 1.60 ! 2.45 ! 1.89 ! BLISS ! 2.27 ! 1.49 ! 0.91 ! 1.40 ! 1.64 ! C ! 3.49 ! 2.14 ! 1.56 ! 2.14 ! 1.81 ! COBOL ! 10.14 ! 5.91 ! 3.89 ! 19.88 ! 5.69 ! FORTRAN ! 2.14 ! 1.27 ! 0.84 ! 1.18 ! 1.08 ! PASCAL ! 2.98 ! 1.65 ! 1.06 ! 1.71 ! 1.54 ! PLI ! 2.63 ! 1.48 ! 0.91 ! 1.61 ! 1.33 ! ---------+---------+---------+---------+---------+---------+ Compiler ! Version ! Compiler Option ---------+----------+---------------------------------------+ ADA ! T1.3-19 ! ADA /NOCHECK/OPTIMIZE=TIME BASIC ! V2.4 ! BASIC /NOCHECK/OPTIMIZE BLISS ! V4.1-746 ! BLISS /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED) C ! V2.2-015 ! CC /OPTIMIZE COBOL ! V3.4-49 ! COBOL /NOCHECK FORTRAN ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE PASCAL ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL PLI ! T3.0-091 ! PLI /NOCHECK/OPTIMIZE ---------+----------+---------------------------------------+
115.24	How about a MACRO-32 version?	TLE::MEIER	Bill Meier	`Mon Dec 01 1986 14:46`	8
	Might be interesting to code the "FORGOL" algorithm in MACRO-32 to see what the "theroretical" limit on execution speed, assuming of course, a good macro programmer can [eventually] produce code that is better than any compiler, at least today, with the VAX instruction set. Ie, is FORTRAN at the 99% point relative to MACRO or the 80% point, etc. Any volunteers?
115.25	Improvement still possible	OSLCSC::OLAV	Olav Tollefsen, TSSC, Norway	`Thu Dec 04 1986 15:54`	42
	Declaring the flags array of type BOOLEAN will speed up the Ada version by 10%. I have included the new version. Olav with TEXT_IO; use TEXT_IO; procedure ERATOADA is package INT_IO is new INTEGER_IO(INTEGER); LIMIT : constant INTEGER:=10000; FLAGS : array (1..LIMIT) of BOOLEAN; COUNTER,PRIME,I,J,K : INTEGER; procedure INIT; pragma INTERFACE(VMS,INIT); pragma IMPORT_PROCEDURE(INIT,"lib$init_timer"); procedure SHOW; pragma INTERFACE(VMS,SHOW); pragma IMPORT_PROCEDURE(SHOW,"lib$show_timer"); begin PUT_LINE("ERATO start in ADA"); INIT; for J in 1..10 loop COUNTER:=0; FLAGS:=(others => TRUE); for I in 1..LIMIT loop if FLAGS(I) then PRIME:= I+I+3; K:=I+PRIME; while K<=LIMIT loop FLAGS(K):= FALSE; K:=K+PRIME; end loop; COUNTER:=COUNTER+1; end if; end loop; end loop; SHOW; INT_IO.PUT(COUNTER);PUT_LINE(" primes"); end ERATOADA;
115.26		SMOP::GLOSSOP	Kent Glossop	`Sat Dec 06 1986 01:41`	91
	Just like for BASIC, it's possible to make things go faster for other languages if you know the compiler. I modified the PL/I benchmark slightly, and it now actually very slightly outperforms even the original FORTRAN version on an 8800. Well, that was until I modified the FORTRAN version... The FORTRAN version can be sped up quite a bit on the 8800 if you know that the memory is organized as longwords. When you know that, you know that byte writes take longer than longword writes, since for byte writes, the system must read the old longword and update the value, while (aligned) longword writes can just blast the old location. The catch is systems that cost more to write longwords than bytes (systems with 16-bit memory, perhaps) might actually go slower. (Also, LOGICAL4 allows BLBS/BLBC to be used in place of BBS/BBC, since you know there is an entire longword available.) I will point out that I think this is a VERY POOR benchmark. I will also point out that anyone who has a good knowledge of a language and a compiler can frequently do a LOT better than J. Random Programmer that doesn't ever bother to see what the compiler is generating and why it is doing that. I have marked the revised figures with asterisks. [I ran the benchmark 10 times in a command procedure interleaving the FORTRAN and PL/I runs and used the average of the results after verifying there were no anomalies. This is using the original version of the FORTRAN code, not the version specifically set up for the 8800.] Sieve of Eratosthenes Benchmark (calculating prime numbers) ------------------------------------------------------------- CPU execution time in seconds ! Compiler ! VAX750 ! VAX780 ! VAX785 ! uVAX II ! VAX8200 ! VAX8800 \| ---------+---------+---------+---------+---------+---------+---------+ ADA ! 2.53 ! 1.54 ! 1.07 ! 1.46 ! 1.30 ! \| BASIC ! 3.87 ! 2.19 ! 1.60 ! 2.45 ! 1.89 ! \| BLISS ! 2.27 ! 1.49 ! 0.91 ! 1.40 ! 1.64 ! \| C ! 3.49 ! 2.14 ! 1.56 ! 2.14 ! 1.81 ! \| COBOL ! 10.14 ! 5.91 ! 3.89 ! 19.88 ! 5.69 ! \| FORTRAN ! 2.14 ! 1.28 ! 0.85* ! 1.24* ! 1.08 ! .273* \|(.253 w/ PASCAL ! 2.98 ! 1.65 ! 1.06 ! 1.71 ! 1.54 ! \|LOGICAL4) PLI ! 2.?? ! 1.28 ! 0.85* ! 1.26* ! 1.?? ! .268* \| ---------+---------+---------+---------+---------+---------+---------+ Compiler ! Version ! Compiler Option ---------+----------+---------------------------------------+ ADA ! T1.3-19 ! ADA /NOCHECK/OPTIMIZE=TIME BASIC ! V2.4 ! BASIC /NOCHECK/OPTIMIZE BLISS ! V4.1-746 ! BLISS /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED) C ! V2.2-015 ! CC /OPTIMIZE COBOL ! V3.4-49 ! COBOL /NOCHECK FORTRAN ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE PASCAL ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL PLI ! T3.0-124 ! PLI /NOCHECK/OPTIMIZE ---------+----------+---------------------------------------+ /* * FLAGS_OVERLAY is used because for some reason FLAGS = '1'b isn't * generating a MOVC5 instruction like I would expect. COUNT2 is * used because using a variable in a PUT statement causes an up-level * reference, since a BEGIN block is implicitly generated for all PUT * statements. (Up-level referenced variables are forced to memory.) / %REPLACE LIMIT BY 10000; ERATO: PROC OPTIONS(MAIN); DCL (COUNT,COUNT2,PRIME,J,I,K) FIXED; DCL FLAGS(LIMIT) BIT ALIGNED; DCL FLAGS_OVERLAY(LIMIT) FIXED BIN(7) BASED(ADDR(FLAGS)); DCL LIB$INIT_TIMER EXTERNAL ENTRY; DCL LIB$SHOW_TIMER EXTERNAL ENTRY; PUT LIST ('ERATO start in PLI'); CALL LIB$INIT_TIMER; DO J=1 TO 10; COUNT=0; FLAGS_OVERLAY=1; / Array assignment */ DO I=1 TO LIMIT; IF FLAGS(I) THEN DO; PRIME=I+I+3; K=I+PRIME; DO WHILE (K <= LIMIT); FLAGS(K)='0'b; K=K+PRIME; END; COUNT=COUNT+1; END; END; END; CALL LIB$SHOW_TIMER; COUNT2 = COUNT; PUT SKIP LIST (COUNT2,' primes'); END ERATO;
115.27		SMOP::GLOSSOP	Kent Glossop	`Sat Dec 06 1986 02:08`	46
	Also, just emphasize how much knowing about compilers, etc., can make a difference, simply by moving the variable declarations INSIDE the program block for C makes an enormous difference, since the variables go from "external" (which can in theory be modified asynchronously - i.e. aliased) to "automatic", which is not aliased unless it has its address taken. (Aliased variables cannot be cached in registers.) The speed of the C code after this "trival" change is approximately the same speed as the revised PL/I program - i.e. only slightly slower than FORTRAN, or a 40% speed improvement over the original version. ************************************************************************ DON'T TRUST BENCHMARKS UNLESS YOU KNOW EXACTLY WHAT THEY ARE SUPPOSED TO BE TESTING, AND YOU ARE CONFIDENT THAT THEY ARE REALLY TESTING WHAT THEY CLAIM TO BE TESTING. INTER-LANGUAGE BENCHMARKS ARE ESPECIALLY DANGEROUS, SINCE PEOPLE RARELY KNOW ALL LANGUAGES EQUALLY WELL. ************************************************************************ The revised C program follows: #define limit 10000 main() { int counter,prime,i,j,k; /* Moved from outside the block / int flags[limit+1]; / Moved from outside the block */ printf("\nERATO start in C"); lib$init_timer(); for (j=1; j<=10; j++) { counter=0; for (i=1; i<=limit; i++) flags[i]=1; for (i=1; i<=limit; i++) { if (flags[i]==1) { prime = i + i + 3; k = i + prime; while (k <= limit) { flags[k]=0; k = k + prime; } ++counter; } } } lib$show_timer(); printf("\n%d primes",counter); }
115.28	Correct S of E?	VINO::MERRILL		`Mon Dec 08 1986 18:23`	31
	What is the correct definition for S of E? The previous response used a different algorithm than that before, so I came up with one more it is quite fast for 1000 sieves, about .57 seconds execution time: #define TRUE 1 #define FALSE 0 #define SIZE 8190 main() { int i, j, k; /* initilize 1 & 2 to be initially prime / int flags[SIZE + 1] = { FALSE, TRUE, TRUE, }; lib$init_timer(); for (j = 1; j <= 1000; j++) { for (i = 3; i <= SIZE; i += 2) / start with 3 */ flags[i] = TRUE; for (i = 3; i <= SIZE; i += 2) if (flags[i]) for (k = (i + i) + i; k <= SIZE; k += (i + i)) flags[k] = FALSE; } lib$show_timer(); printf("Prime numbers:\n"); for (i = 1; i <= SIZE; i++) if (flags[i]) printf(" %d\n",i); }
115.29		SMOP::GLOSSOP	Kent Glossop	`Mon Dec 08 1986 20:13`	5
	I wouldn't be surprized if there are faster algorithms - I just sped up the one provided to make a point about inter-language benchmarking comparisons. Kent
115.30	see it two ways	MUNICH::FROEHLIN	Type <RET>!What do you see?R-E-T-U-R-N	`Tue Dec 09 1986 04:32`	13
	If you compare the results among the cpu types, you see how dangerous the SofE benchmark is. The relative speed of the produced code varies. And this I think is an interesting point too. For certain (or small) programs one VAX may be relatively better than the other. It seems a little bit like you can find out a PASCAL machine or a BLISS machine or whatsoever. I would like to know how compiler engineers put that into account. Thank you all for the interesting discussion (and code improvements)! G�nther
115.31	Optimizing for a specific processor	QUARK::LIONEL	Reality is frequently inaccurate	`Tue Dec 09 1986 10:12`	9
	Re .30: Our usual strategy is to optimize for the fastest processor. However, of late the various processor groups are trying to woo the compiler developers into making their particular processor look good, sometimes at the expense of another processor. It's quite true that a code sequence that is best for processor X is not so for processor Y. Steve
115.32	ahem	BIZET::VANROGGEN		`Tue Dec 09 1986 10:25`	4
	I found that using the algorithm provided by Knuth, although a bit primitive, was about four times faster than the original algorithm given in ERATO. It was also correct, unlike ERATO. I'm surprised people haven't complained more about this.
115.33	Byte gets the blame	TLE::RMEYERS	Randy Meyers	`Tue Dec 09 1986 14:59`	12
	Re .32: I believe that Byte Magazine is the culprit responsible for the broken algorithm for finding prime numbers. Byte popularized that particular benchmark in that form. I always assumed that they wrote the program without checking to see if it worked. In general, I have found Byte very enjoyable for its technological howlers. I especially liked an article on recursive descent parsing that evaluated expressions right to left. The author of the article did believe this to be a drawback, but didn't see how it could be fixed.
115.34		CLT::GILBERT	eager like a child	`Tue Dec 09 1986 19:11`	21
	In looking at the code generated by VAX Ada, I noticed a couple interesting things -- the initialization of the array is done by a MOVC3, whereas a MOVC5 would do as well, the value 'FALSE' is materialized (consuming R4, whereas a literal zero would suffice in the one instruction that uses it), the compiler does subscript checking but no range propagation, so some checks are redundant, and it uses INSV and EXTV to set/clear and test bits in the array. These are a few opportunities for the compiler to generate better code. A more important aspect (as far as benchmarks go) is that the loop variables are 'off by 1' from what the algorithm naturally desires. Thus, the compiler generates some "SUBL3 #1, Ra, Rb" instructions that would be unnecessary if the 'correct' loop variable or induction variable were used. This is due (in part) to the bit-vector being indexed starting at 1, instead of 0. It's doubtful that any compiler would attempt to determine the 'correct' induction variable, but some (such as Pascal or Fortran) might 'luck' upon it. This gives a significant performance improvement (on older VAX models), and is attributable only to chance; this doesn't make for a good benchmark.
115.35		AUTHOR::WELLCOME	Steve	`Wed Mar 18 1987 11:12`	2
	For what it's worth (not much), I tried this in MACRO-11 on a PRO-380 and it took 2.68 seconds.
115.36	late entry for COBOL	CLT::RICE		`Mon Mar 23 1987 11:12`	95
	I know this is a late entry, but I just found this note. To emphasize the fact that benchmarks and benchmark data should not be distributed without a complete understanding of the nature of the program and the expected meaning of the test (re: .27) - the VAX COBOL numbers previously reported were inaccurate on the order of 125% to over 850%. A properly written VAX COBOL program (knowing something about the compiler helps) is much faster than the times reported in #.23 indicate. In this case the major difference is the use of longword integer numerics instead of numeric string (which the compiler then converts to packed decimal for most calculations - but that's another story). I was unable to obtain the saveset, but wrote a COBOL program based on the BASIC version in .16 and .20. Without spending a great deal of time optimizing it I obtained: MicroVAX II VAX/785 'better' COBOL 2.06 1.71 instead of 19.88 3.89 In view of the age of this note - I wonder how many readers have a distorted view of the speed of VAX COBOL programs. Slower than FORTRAN - sure, but not THAT slow! And, it shouldn't be a big suprise to anyone that packed is slower than integer - especially on MicroVAX. Wish I'd seen this note sooner. -Chip (VAX COBOL development) *************************************************************** IDENTIFICATION DIVISION. PROGRAM-ID. SIEVE. ENVIRONMENT DIVISION. *************************************************************** * * * Sieve of Eratosthenes COBOL Benchmark program * * (whynot) * ***************************************************************** DATA DIVISION. WORKING-STORAGE SECTION. 01 LIMITT PIC S9(9) COMP VALUE 10000. 01 STRSIZ PIC S9(9) COMP VALUE 40004. 01 COUNTER PIC S9(9) COMP VALUE 0. 01 PRIME PIC S9(9) COMP VALUE 0. 01 J PIC S9(9) COMP VALUE 0. 01 I PIC S9(9) COMP VALUE 0. 01 K PIC S9(9) COMP VALUE 0. 01 TABLES. 02 FLGSTR. 03 FLAGS OCCURS 10000 TIMES PIC S9(9) COMP. 02 ONESTR. 03 ONES OCCURS 10000 TIMES PIC S9(9) COMP. PROCEDURE DIVISION. BEGIN. PERFORM VARYING I FROM 1 BY 1 UNTIL I > LIMITT MOVE 0 TO FLAGS(i) MOVE 1 TO ONES(i) END-PERFORM. DISPLAY "ERATO start in COBOL". CALL "LIB$INIT_TIMER". PERFORM VARYING J FROM 1 BY 1 UNTIL J = 10 MOVE 0 TO COUNTER MOVE ONESTR TO FLGSTR PERFORM VARYING I FROM 1 BY 1 UNTIL I = LIMITT IF FLAGS(I) = 1 THEN * ADD I I 3 GIVING PRIME MOVE I TO PRIME ADD I TO PRIME ADD 3 TO PRIME * display prime conversion ADD I PRIME GIVING K PERFORM UNTIL K NOT < LIMITT MOVE 0 TO FLAGS(K) ADD K PRIME GIVING K END-PERFORM ADD 1 TO COUNTER END-IF END-PERFORM END-PERFORM. CALL "LIB$SHOW_TIMER". DISPLAY " ". DISPLAY COUNTER CONVERSION " primes".
115.37	I've got what you were looking for	DENTON::AMARTIN	Alan H. Martin	`Tue Mar 24 1987 10:58`	7
	Re .36: > I was unable to obtain the saveset, ... See DENTON""::DENTON$DUA1:[AMARTIN.ERATO]ERATO.A (or ERATOCOB.COB on the same area). /AHM
115.38	got it - thanks	CLT::RICE		`Tue Mar 24 1987 12:55`	5
	re: .37 thanks. Guenther too has mailed me his COBOL version. Most of the improvement in my version is from the use of integer numeric instead of string. -Chip
115.39	another set of benchmark figures	COOKIE::DOUCETTE	Chuck Doucette	`Tue Jun 02 1987 18:33`	111
	Here (after the FF) is a reproduction of some benchmarks I did recently. I did this in preparation for a talk I gave at ZKO last Thursday in the Pythagoras room (5/28/87) on DEC WRL Modula-2 (a descendant of Mike Powell's compiler). I post this for two reasons - 1) to generate interest in this compiler, and 2) I'd like suggestions as to how to make this a fair, accurate, and complete comparison. If you'd like complete compiler distribution kits for ULTRIX or VMS including these benchmarks - please send mail to Joel McCormack (SONORA::JOEL). If you want to just play with the benchmark files themselves, you can look in COOKIE::DISK$GZ_DISK:[DOUCETTE.NEWMOD.BENCH] at the files BENCH.; and DESCRIP.MMS;0 Chuck From: COOKIE::DOUCETTE "Chuck Doucette 26-May-1987 2333" 27-MAY-1987 00:07 To: @GROUP Subj: I redid the DECWRL Modula-2 benchmarks - some dramatic results: -------------------------------------------------------------------------------- \|\| Program \| Berkeley Ultrix DEC WRL \|\| \|\| Name \| Pascal C Modula--2 \|\| -------------------------------------------------------------------------------- \|\| Perm \| 2.58 1.40 1.14 \|\| \|\| Towers \| 2.88 1.60 1.00 \|\| \|\| Queens \| 1.47 0.49 0.46 \|\| \|\| Intmm \| 2.75 1.27 0.59 \|\| \|\| Mm \| 2.32 1.69 0.64 \|\| \|\| Puzzle \| 15.73 8.92 2.56 \|\| \|\| Quick \| 2.07 0.62 0.45 \|\| \|\| Bubble \| 3.57 2.98 0.67 \|\| \|\| Tree \| 3.30 1.76 1.00 \|\| \|\| FFT \| 5.72 3.62 1.44 \|\| \|\| Int Comp \| 5.27 2.91 1.39 \|\| \|\| Real Comp \| 7.90 4.49 2.01 \|\| -------------------------------------------------------------------------------- \|\| avg. improvement \| 74% 47% N/A \|\| -------------------------------------------------------------------------------- This is CPU seconds running on shodha w/ ULTRIX (a VAX-8200). The "avg. improvement" row means that for each entry, I compared it with DEC WRL Modula-2. I figured the improvement in execution time from the particular language and Modula-2. Then, I added up all the improvement percentages per column, divided by the number of entries, and got the average. This should indicate (not accurately) the average expected improvement in execution time (i.e. how good was the code that the compiler generated) if you coded the same thing in Modula-2. N/A means either Not Applicable or Not Available. Other than the actual timings I got (and the machine), and the avg. improvement line that I added - see the following excerpt from overview.doc written by Mike Powell for details (note that speed of compilation with the current compiler is twice as fast as the compiler in use when this was written): The compiler was designed and built by Michael L. Powell and it appears to be one of the best compilers for the VAX in terms of the efficiency of the generated code. The error messages are slightly better than those from the Unix C com- piler and the compile time is slightly worse. Below is a table of execution times on a set of ten benchmarks that John Hennessy at Stanford has assembled. All the times are in CPU seconds on a VAX 11/780 with FPA. All compilers were asked to generate the best code they could and all optional run time checks were turned off. All versions of the pro- grams were as syntactically and semantically similar as pos- sible. Doing benchmarks is always tricky and is especially diffi- cult on a VAX 11/780 because of the peculiar interactions between the instruction prefetch buffer and instruction alignment on word boundaries. Variations on the order of 10% because of changes in alignment are not unusual. The integer and real composite numbers in the last two lines are Hennessy's weighted averages of the previous lines -- his attempt to summarize the results. The above performance information is provided for guidance only. Your mileage may vary. -------------------------------------------------------------------------------- \|\| Program \| VAX VAX VAX DEC WRL \|\| \|\| Name \| C Pascal Bliss--32 Modula--2 \|\| -------------------------------------------------------------------------------- \|\| Perm \| 2.11 1.52 1.93 1.71 \|\| \|\| Towers \| 2.30 1.20 2.01 1.46 \|\| \|\| Queens \| 0.60 0.75 0.54 0.59 \|\| \|\| Intmm \| 0.55 0.52 1.02 0.54 \|\| \|\| Mm \| 0.99 0.55 1.03 0.57 \|\| \|\| Puzzle \| 3.88 3.78 3.77 3.05 \|\| \|\| Quick \| 0.59 0.60 0.60 0.63 \|\| \|\| Bubble \| 0.76 0.74 0.73 0.85 \|\| \|\| Tree \| 2.42 3.03 2.32 1.44 \|\| \|\| FFT \| 1.88 1.28 1.02 0.98 \|\| \|\| Int Comp \| 2.43 2.13 2.25 1.76 \|\| \|\| Real Comp \| 3.27 2.68 N/A 2.21 \|\| -------------------------------------------------------------------------------- \|\| avg. improvement \| 21% 8% 17% N/A \|\| -------------------------------------------------------------------------------- This is CPU seconds running on newton w/ VMS (a VAX-11/785). Note that if my arithmetic is bad (in calculating avg. improvement), please correct me. If you doubt the veracity of the results, I have all of the sources - you are welcome to try them yourself. Chuck Doucette May 27, 1987
115.40	if you have the BLISS versions as well...	VLNVAX::DMCLURE	Gear up for DECworld-87!	`Mon Jun 22 1987 00:50`	3
	I'd be curious to see the run time of the BLISS versions. -davo
115.41	my last word on this subject	COOKIE::DOUCETTE	Chuck Doucette - Database A/D @CXO	`Tue Jun 23 1987 03:04`	19
	I just checked - they are listed. If you look in the second chart (under the paragraphs written by Mike Powell) you will see benchmark timings under "DEC Bliss-32". I posted a very similar to this one in the BLISS conference (TLE::BLISS) under note 294.* and it seemed to draw much more interest. I'd like to comment more on my benchark results, but people either say a) your optimizer was tuned to these benchmarks, b) you hand-picked these bench marks to make your compiler look good, c) these benchmarks aren't realistic, or d) all benchmarks are useless. I personally have faith in the benchmarks I listed and in the compiler our group uses (it isn't MY compiler although I have worked on it and do plug it). Anyhow, I've said enough. If the compiler interests you, you are more than welcome to read about it in the conference I started in COOKIE::MODULA2 (send me mail if you want to become a member - browsers are ok). Chuck
115.42		TLE::BRETT		`Tue Jun 23 1987 09:15`	5
	faith: n. trust; belief; belief without proof; religion; promise; loyalty, constancy. /Bevin