T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
115.1 | For Internal Use Only | TLE::FELDMAN | LSE, zealously | Mon Nov 10 1986 10:28 | 13 |
| Just a reminder from your friendly moderator:
Timing data on compilers is considered sensitive information. The
data published in the base note (along with everything else in this
notes file) is For Internal Use Only.
Gary
PS Anyone care to comment on the relevance of the Sieve of Eratosthenes
as a benchmark? My gut reaction is that very few of our customers
need to calculate prime numbers, and that optimizations that work
quite well for reasonable code mixes may not necessarily do well
on the S of E.
|
115.2 | And sources, too! | MINAR::BISHOP | | Mon Nov 10 1986 11:02 | 9 |
| Further, when such data is given, there should also be an
indication of the qualifiers used: Pascal in particular
has a default of /CHECK, which slows it down. Thus to get
the fastest runs with VAX Pascal, non-default qualifiers
must be used.
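For example, the fastest-run compile of the Pascal sieve would use something
like the following (the qualifiers are the ones shown in the summary table
later in this topic; the file name is only illustrative):
    $ PASCAL /NOCHECK /OPTIMIZE=ALL ERATOPAS
    $ LINK ERATOPAS
    $ RUN ERATOPAS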
I agree with Gary's implied criticism: the benchmark cited
is a poor one.
-John Bishop
|
115.3 | Got it ! | MUNICH::FROEHLIN | Type <RETURN>! What ye see ? R-E-T-U-R-N | Mon Nov 10 1986 12:32 | 2 |
| Thank you for your responses ! As I expected from you professionals, my
sieve will disappear into dust. But a bitter taste still remains!
|
115.4 | Certainly not USDA Grade Prime numbers | NOBUGS::AMARTIN | Alan H. Martin | Mon Nov 10 1986 13:30 | 15 |
| I wouldn't spread those numbers far for another reason. Here are the
first few prime numbers the Ada version found:
"
ERATO start in ADA
1 2 3 4 5 7
8 10 13 14 17 19
20 22 25 28 29 32
34 35 38 40 43 47
49 50 52 53 55 62
64 67 68 73 74 77
80 82 85 88 89 94
95 97 98 104 110 112
"
/AHM
|
115.5 | The right corner... | MUNICH::FROEHLIN | Type <RET>! What do you see? R-E-T | Wed Nov 12 1986 08:06 | 15 |
| .4: Thank you for your testing ! I tried the ADA version again and
printed out the PRIME variable (after prime:=i+i+3;) and got
prime numbers only. The algorithm doesn't calculate for 1,2
and 3. Or did you look at the FLAGS indicator ? They do not
reflect the prime numbers.
.1,.2: I would have liked some technical discussion of the subject
because there is a difference in code produced by different VAX
compilers (see 'VAX Performance Summary 1985', Chapter 6).
As far as I know, VAX C and VAX PLI are using the same CODE GENERATOR
and OPTIMIZER, but the result looks different. Things like this
were the reason why I posted .0 here.
Guenther Froehlin, ACT/TEC, Munich, F.R.Germany
|
115.6 | Feel free to continue the discussion (FIUO) | TLE::FELDMAN | PDS, our next success | Wed Nov 12 1986 10:29 | 11 |
| My reply in .1 wasn't meant to stifle discussion on this subject.
Please feel free to continue; I, too, would appreciate an explanation
of the differences in code generated by our compilers that share
the same back end.
This discussion brings out yet another reason for timing data to be
considered confidential: it takes an expert to design benchmarks, and
an expert to interpret them; fudging or misinterpreting the data is far
too easy.
Gary
|
115.7 | I'd like to see the program again | NOBUGS::AMARTIN | Alan H. Martin | Wed Nov 12 1986 13:42 | 11 |
| Re .5:
True, I did not examine the successive values of the variable "prime".
On the other hand, in the only version of the Sieve of Eratosthenes
which I am aware of, a side effect of the algorithm is a boolean array,
where every number which is not "crossed off" is a prime.
I'd like to examine the program further, but I deleted my copy. Could you
post the latest location for the saveset? It seems to have disappeared
from MUNICH""::CSC_INFO:.
/AHM/THX
|
115.8 | programs available and more ?s | MUNICH::FROEHLIN | Type <RET>!What do you see?R-E-T-U-R-N | Thu Nov 13 1986 07:14 | 18 |
| .6: Are there programmers around for all the languages shown in
.1 ? I would like to have the programs checked for correctness
and speed (qualifiers to use etc.).
.7: The saveset is back again : MUNICH::CSC_INFO:ERATO.A
In general, the sieve of Eratosthenes is one of the programs in
the set for benchmarking CPU power of computers. Among those programs
are others like HANOI, ACKERMANN... I just want to be more certain
about the ERATO version before comparing it with other published data,
and I would appreciate any help in getting the programs in ERATO.A to run
as fast as possible on a VAX.
One interesting thing is that the ratio between the languages
is different on the various VAXes. How is that?
gggGuenther
|
115.9 | Benchmarks are fun but often misleading... | CADSYS::COOK | Neil | Fri Nov 14 1986 01:17 | 49 |
| Whilst the programs you cite are often used in benchmarking computers
and compilers, they are not very useful for this task. The reason
for this is alluded to in your own note - "as fast as possible on
a VAX". Since a benchmark which shows one machine faster than another
makes good advertising copy, people often "hand tune" benchmarks to
get the best possible performance out of their machines. When measured
in MIPS, this is often referred to as "Peak MIPS", since it could not
be sustained for a more normal workload.
Some small programs like Eratosthenes fit into the caches of a
significant number of machines. This will reduce the amount of (slow)
main memory fetches required, replacing them with fetches from (fast)
cache memory. While a cache will normally improve a machine's speed,
running small benchmarks like these over-emphasizes the benefits to
be gained.
Some of the most realistic benchmarks are application programs (or
parts thereof) which have been made available for benchmarking
purposes. The Lawrence Livermore Loops program is one.
You wondered why the ratio between the different languages varied
between machines. Some of the VAX compilers use different code
generators. Some will be more efficient than others for doing
different things as the compiler writers will have concentrated
on different language elements. Even when two compilers share
the same code generator, the input to the code generator is likely
to be a little different even for the same algorithm, so different
machine code might be generated which runs at differing speeds.
Lastly, different VAXes take differing numbers of machine cycles
to perform the same instructions, so even if they had the same
machine cycle time, they would differ in execution time. A very
important aspect of this, is that some machines emulate instructions
which others execute from microcode. In some cases where a particular
code generator is more prone to generating these instructions, it
can have a dramatic effect on performance. COBOL is quite prone to
this since packed decimal instructions, for example, have been
given much less prominence on more recent implementations of the
VAX architecture.
Personally, I am fascinated by benchmark results (the Usenet newsgroup
comp.arch carries results every few months) and I need to keep
reminding myself to take them with a grain of salt, lest I get
too optimistic or pessimistic about the latest ones.
P.S. BYTE is very good at bad benchmarks. Small benchmarks written
in BASIC tell a potential purchaser next to nothing about a
particular machine, especially if the floating point instructions
are emulated.
|
115.10 | Thanks for restoring it | NOBUGS::AMARTIN | Alan H. Martin | Fri Nov 14 1986 17:30 | 73 |
| Re .8:
Thanks for recovering the save set. I still haven't figured out the
algorithm, but I agree that the results are much improved when you examine
the contents of the variable "prime". Now the only composite number
it claims is prime is 9. Probably related to some glitch in starting the
algorithm up. (I've got a simple but fast implementation of a prime
finding algorithm which doesn't get 2 or 3 right).
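As a minimal hand-written sketch (mine, not code from the saveset) of the
indexing the ERATO sources appear to use: FLAGS(I) stands for the odd number
2*I+3, so the pass for the prime 3 corresponds to I = 0, which a loop running
from 1 never executes -- which would account for 9 being the lone composite
that survives:
"
#include <stdio.h>

#define LIMIT 50

static char flags[LIMIT + 1];

int main(void)
{
    int i, k, prime;

    for (i = 0; i <= LIMIT; i++)          /* index i stands for the number 2*i+3   */
        flags[i] = 1;
    for (i = 1; i <= LIMIT; i++)          /* starts at 1: the pass for 3 never runs */
        if (flags[i])
        {
            prime = i + i + 3;            /* the number index i represents          */
            for (k = i + prime; k <= LIMIT; k += prime)
                flags[k] = 0;             /* crosses off 3*prime, 5*prime, ...      */
        }
    for (i = 1; i <= LIMIT; i++)
        if (flags[i])
            printf(" %d", i + i + 3);     /* prints 5 7 9 11 13 17 19 23 ...        */
    printf("\n");
    return 0;
}
"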
An implementation of the Sieve of Eratosthenes which uses the same
algorithm that my office mate and I regard as "the original Greek" follows.
It comes from the VAX C test system:
"
#ifndef TIMER
#define TIMER 1
#endif
/* sieve.c */
/*****************************************************************************
Eratosthenes Sieve Prime Number Program (Byte September 1981)
modified to make more 'C-like'
*****************************************************************************/
#define TRUE 1
#define FALSE 0
#define SIZE 8190
char flags[SIZE + 1];
main()
{
int timrb(), timre();
int i, k, iter;
#ifdef OUTPUT
#ifdef vax11c
puts("\n\nVAX11C --- SIEVE benchmark ---\n");
#else
puts("\n\n--- SIEVE benchmark ---\n");
#endif
#endif
#if TIMER
timrb();
#endif
for (iter = 1; iter <= 500; iter++)
{
for (i = 0; i <= SIZE; i++)
flags[i] = TRUE;
for (i = 2; i <= SIZE; i++)
if (flags[i])
for (k = i * 2; k <= SIZE; k += i)
flags[k] = FALSE;
}
#if TIMER
timre();
#endif
#ifdef OUTPUT
printf("flags[k]: k = %d, %d\n", SIZE - 200, SIZE);
for (k = SIZE - 200; k <= SIZE; k += 65)
{
printf("\n");
for (i = k; i <= k + 64 && i <= SIZE; i++)
printf(" %d", flags[i]);
}
#endif
}
"
|
115.11 | I could still use some help | NOBUGS::AMARTIN | Alan H. Martin | Mon Nov 17 1986 14:26 | 3 |
| I'm going to submit one of the programs to CLT::MATH (q.v.) to see if
someone can explain how the algorithm works.
/AHM/THX
|
115.12 | Some grains of salt ... | TLE::MEIER | Bill Meier | Wed Nov 19 1986 16:52 | 8 |
| re: various
Ada also has the default of /CHECK like Pascal, so comparing it to other
compilers w/o check is bad. Thus, if you have an array (which I assume you do)
these two compilers will probably make run-time checks every time an element in
the array is referred to, costing you ...
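As a rough picture of what that costs (a hand-written C illustration, not
actual compiler output and not one of the benchmark sources), a checked store
amounts to an extra compare-and-branch wrapped around every subscripted
reference, roughly:

#include <stdio.h>
#include <stdlib.h>

#define LIMIT 10000

static char flags[LIMIT + 1];

static int check(int k)              /* the per-access test /CHECK implies   */
{
    if (k < 0 || k > LIMIT)
    {
        fprintf(stderr, "subscript %d out of range\n", k);
        exit(EXIT_FAILURE);
    }
    return k;
}

int main(void)
{
    int i;

    for (i = 0; i <= LIMIT; i++)
        flags[check(i)] = 1;         /* compiling /NOCHECK removes the test  */
    printf("%d flags set\n", LIMIT + 1);
    return 0;
}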
Ada, C and PL/I all in fact do use the same code generator! The VCG.
|
115.13 | Tweaks or FORGOL? | TLE::MEIER | Bill Meier | Thu Nov 20 1986 10:13 | 11 |
| I don't know if your intent is to write everything "FORGOL" style, but with
your Ada example, merely replacing
for I in 1.. LIMIT loop FLAGS(I) := 1; end loop;
with the array aggregate assignment statement
FLAGS := (others => 1);
you get a 10% performance improvement! Any true Ada programmer would express it
that way in the first place, I suspect.
|
115.14 | /NOCHECK/OPTI=SPEED_MAX | MUNICH::FROEHLIN | Type <RET>!What do you see?R-E-T-U-R-N | Thu Nov 20 1986 10:17 | 4 |
| As far as I can remember I switched off checks and set optimization
to max speed for all compilations.
Guenther
|
115.15 | | MUNICH::FROEHLIN | Type <RET>!What do you see?R-E-T-U-R-N | Thu Nov 20 1986 10:22 | 2 |
| .13: Got this one ! Thanks..
Guenther
|
115.16 | Better BASIC | WELSWS::DODD | | Thu Nov 20 1986 11:27 | 46 |
| I took a look at the BASIC version of this and the first reaction
is nothing to do with benchmarking but...
WHY do people write VAX BASIC as though they were on their Mickey
Mouse home computer??
The guys in BASIC development have done a tremendous job on the
language in terms of improving its structures and functions and
still we see junk like this. (sorry it makes me cross!!)
I include a better version - this on its own gives about an 8%
improvement; something like .13 should be doable, give me a while.
The example now looks a lot like PLI so I wonder why the execution
time is not comparable??
110 DECLARE BYTE FLAGS(10000)
DECLARE WORD LIMIT,COUNTER,PRIME,J,I,K
LIMIT=10000%
PRINT "ERATO start in BASIC"
CALL LIB$INIT_TIMER()
FOR J=1% TO 10%
COUNTER=0%
FOR I=1% TO LIMIT
FLAGS(I)=1%
NEXT I
FOR I=1% TO LIMIT
IF FLAGS(I)=1%
THEN
PRIME=I+I+3%
K=I+PRIME
WHILE K <= LIMIT
FLAGS(K)=0
K=K+PRIME
NEXT
COUNTER=COUNTER+1%
END IF
NEXT I
NEXT J
CALL LIB$SHOW_TIMER()
PRINT
PRINT COUNTER;" primes"
END
|
115.17 | Still better. | BISTRO::HEIN | Hein van den Heuvel, Valbonne. | Fri Nov 21 1986 09:35 | 51 |
| re .16,
Have a look at the code BASIC generates for your example.
You'll find zillions of CVTxx instructions for no good reason.
Those can be avoided by explicit datatyping of constants.
BASIC V3 would allow you to specify a default datatype.
I guess the worst instruction in your example is in the innermost
loop: FLAGS(K)=0 which will generate:
CVTWL K(R11), R12
INDEX R12, #0, #10000, #1, #0, R12
CVTFB #0., FLAGS(R11)[R12]
If you change K to LONG and =0 to ='0'B you will get
INDEX K(R11), #0, #10000, #1, #0, R12
MOVB #0, FLAGS(R11)[R12]
Better, but not best source example included.
Do you all understand now why some people do not like BASIC !?
Hein.
1 DECLARE BYTE FLAGS(10000)
DECLARE LONG LIMIT,COUNTER,PRIME,J,I,K
LIMIT='10000'L
PRINT "ERATO start in BASIC"
CALL LIB$INIT_TIMER()
FOR J=1% TO 10%
COUNTER='0'L
FOR I=1% TO LIMIT
FLAGS(I)='1'B
NEXT I
FOR I=1% TO LIMIT
IF FLAGS(I)='1'B
THEN
PRIME=I+I+'3'L
K=I+PRIME
WHILE K <= LIMIT
FLAGS(K)='0'B
K=K+PRIME
NEXT
COUNTER=COUNTER+'1'L
END IF
NEXT I
NEXT J
CALL LIB$SHOW_TIMER()
PRINT
PRINT COUNTER;" primes"
END
|
115.18 | What's not to like? | NOBUGS::AMARTIN | Alan H. Martin | Fri Nov 21 1986 10:20 | 4 |
| Re .17:
Do you mean they don't like VAX BASIC the product, or BASIC the language?
/AHM/THX
|
115.19 | TGIF | BISTRO::HEIN | Hein van den Heuvel, Valbonne. | Fri Nov 21 1986 11:08 | 23 |
| I like the functionality - the language constructs - that VAX BASIC
offers me, but I do not like the implementation, how it's done.
I think more people feel that way.
I think the language is fun. The compiler is stupid. The VAX BASIC
engineers will of course tell us that they would like to make a
clever compiler but the stupid language definitions do not allow for one.
E.g. the RESUME statement forces them to re-load registers over and over
again to handle the case where one RESUMEs in the middle of some code.
Hence, they say, the compiler cannot optimize.
Well, I have my doubts, but I cannot substantiate them.
It's just that some optimizations *seem* so easy to do. The code
in the previous reply provides numerous examples.
Look at the FOR loop implementation:
FOR I = 1 to 10 ---> MOVL #1, I(R11) }
DECL I(R11) } = CLR I(R11)
ACBL #10, #1, I(R11)
Oh well...
|
115.20 | Best Yet. | CHOVAX::YOUNG | Back from the Shadows Again, | Fri Nov 21 1986 23:41 | 78 |
| Some people dislike BASIC for the wrong reasons. The CVTxx
complaints mentioned earlier were a result of the mistaken use of
a real number literal constant instead of an integer, and a failure
to set the default data type properly for the program (this CAN
be done in versions earlier than v3.0). As Hein noted, correcting
this does result in an increase in speed. However, the biggest
problem with both this example and the original ERATOBAS.BAS is that
neither turns off BASIC's default checking. Also, the use of BYTEs
where there is sufficient room for LONGwords will result in a
performance hit, so I have changed this too.
The end result is that while the original ERATOBAS.BAS took
5.84 cpu seconds to complete on a 780, the following program takes
only 2.88 seconds. While this is still slow by some standards,
I have found that VAX BASIC is nowhere nearly as slow as most people
think. I have consistently been able to bring it to within a factor
of 2 compared to most other High Level languages (excluding Fortran),
and I consider that a small price to pay for the use of real native
dynamic strings.
-- Barry
---------------------------------------------------------------------
1 !***************************************************************!
! !
! Sieve of Eratosthenes Basic Benchmark program !
! !
!***************************************************************!
10 Option type = explicit
Option inactive = subscript checking
Option inactive = integer overflow
110 Declare long constant Limit = 10000
Declare long constant Strsiz = 4*(limit+1)
Map (ws) long FLAGS (limit), ones (limit)
Map (ws) string flg_str = strsiz, one_str = strsiz
DECLARE long COUNTER,PRIME,J,I,K
! Prime the array pages.
Flags (i) = 0% For i = 0% to limit
Ones (i) = 1% For i = 0% to limit
PRINT "ERATO start in BASIC"
CALL LIB$INIT_TIMER()
FOR J=1% TO 10%
COUNTER=0%
Flg_str = one_str
FOR I=1% TO LIMIT
IF FLAGS(I)=1%
THEN
PRIME=I+I+3%
K=I+PRIME
WHILE K <= LIMIT
FLAGS(K)=0%
K=K+PRIME
NEXT
COUNTER=COUNTER+1%
END IF
NEXT I
NEXT J
CALL LIB$SHOW_TIMER()
PRINT
PRINT COUNTER;" primes"
END
|
115.21 | just thinking | WELSWS::DODD | | Mon Nov 24 1986 09:18 | 12 |
| The point I was trying to make was only that people lay out and
write BASIC badly far more often than any other VAX language.
The improvement with taking out the initialisation loops could be
done in all of the ERATO* I looked at and would give a hefty
performance improvement. I attempted no performance improvement
just maintainability... Datatyping is probably the cause of most
spurious code; just take a look at a COBOL program - the slightest
incompatibility and it's away into packed decimal, hence its sluggish
performance on a MicroVAX.
BASIC is a good enough language unless your major goal is absolute
performance.
|
115.22 | | STAR::BENSON | | Mon Nov 24 1986 17:08 | 43 |
| RE: .19:
> Eg the RESUME statement forces them the re-load registers over and over
> again to handle the case where one RESUMEs in the middle of some code.
> Hence, they say, the compiler can not optimize.
> Well, I have my doubts, but I can not substanciate (sp?) them.
> It's just that some optimizations *seem* so easy to do.
You're right - some SEEM easy to do, until you think of all the implications.
For example, eliminating the code you mention which loads the current line
number address into the frame. Even if the user does explicit RESUMEs, there
is still the problem of returning the line on which the error occurred (ERL),
which can even be done from a calling program.
> I think the language is fun. The compiler is stupid. The VAX BASIC
> engineers will ofcourse tell us that they would like to make a
> clever compiler but the stupid languange definitions do not allow for one.
Yes, there are some language elements which cause problems for optimizations.
However, you should also look at the improvements that have been made in the
VAX BASIC language and compiler between V1, V2, and V3, which would give
you a clue as to what the project goals have been. And remember that another
implicit goal is to be a compiler that looks like an interpreter, and
optimization can cost compile time.
Personally, I don't doubt that the VAX BASIC group would like to add some
optimizations (for non-interactive mode), as time permits. Flaming at the
compiler (or the group) doesn't help. Learning how to write good BASIC code,
and then providing input about where you think the shortcomings are and what
language elements are most important, might.
RE: .20:
Note that you can specify /NOLINE to further improve the code - it eliminates
the MOVA mentioned above. Also, V3 has a CONSTANT TYPE option, so if you
dislike adding "%" to your integer constants, you could specify
OPTION CONSTANT TYPE = INTEGER
- Tom
Disclaimer - I don't work on VAX BASIC so I don't speak for the group. I
just happen to know someone who once did. 8^)
|
115.23 | .0 update | MUNICH::FROEHLIN | Type <RET>!What do you see?R-E-T-U-R-N | Mon Dec 01 1986 07:18 | 35 |
| With all your valuable inputs, this is the last update
of .0 .
Servus,
Günther
Sieve of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------
CPU execution time in seconds !
Compiler ! VAX750 ! VAX780 ! VAX785 ! uVAX II ! VAX8200 !
---------+---------+---------+---------+---------+---------+
ADA ! 2.53 ! 1.54 ! 1.07 ! 1.46 ! 1.30 !
BASIC ! 3.87 ! 2.19 ! 1.60 ! 2.45 ! 1.89 !
BLISS ! 2.27 ! 1.49 ! 0.91 ! 1.40 ! 1.64 !
C ! 3.49 ! 2.14 ! 1.56 ! 2.14 ! 1.81 !
COBOL ! 10.14 ! 5.91 ! 3.89 ! 19.88 ! 5.69 !
FORTRAN ! 2.14 ! 1.27 ! 0.84 ! 1.18 ! 1.08 !
PASCAL ! 2.98 ! 1.65 ! 1.06 ! 1.71 ! 1.54 !
PLI ! 2.63 ! 1.48 ! 0.91 ! 1.61 ! 1.33 !
---------+---------+---------+---------+---------+---------+
Compiler ! Version ! Compiler Option
---------+----------+---------------------------------------+
ADA ! T1.3-19 ! ADA /NOCHECK/OPTIMIZE=TIME
BASIC ! V2.4 ! BASIC /NOCHECK/OPTIMIZE
BLISS ! V4.1-746 ! BLISS /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED)
C ! V2.2-015 ! CC /OPTIMIZE
COBOL ! V3.4-49 ! COBOL /NOCHECK
FORTRAN ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE
PASCAL ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL
PLI ! T3.0-091 ! PLI /NOCHECK/OPTIMIZE
---------+----------+---------------------------------------+
|
115.24 | How about a MACRO-32 version? | TLE::MEIER | Bill Meier | Mon Dec 01 1986 14:46 | 8 |
| Might be interesting to code the "FORGOL" algorithm in MACRO-32 to see what the
"theroretical" limit on execution speed, assuming of course, a good macro
programmer can [eventually] produce code that is better than any compiler, at
least today, with the VAX instruction set.
I.e., is FORTRAN at the 99% point relative to MACRO, or the 80% point, etc.?
Any volunteers?
|
115.25 | Improvement still possible | OSLCSC::OLAV | Olav Tollefsen, TSSC, Norway | Thu Dec 04 1986 15:54 | 42 |
| Declaring the flags array of type BOOLEAN will speed up the Ada version
by 10%. I have included the new version.
Olav
with TEXT_IO; use TEXT_IO;
procedure ERATOADA is
package INT_IO is new INTEGER_IO(INTEGER);
LIMIT : constant INTEGER:=10000;
FLAGS : array (1..LIMIT) of BOOLEAN;
COUNTER,PRIME,I,J,K : INTEGER;
procedure INIT;
pragma INTERFACE(VMS,INIT);
pragma IMPORT_PROCEDURE(INIT,"lib$init_timer");
procedure SHOW;
pragma INTERFACE(VMS,SHOW);
pragma IMPORT_PROCEDURE(SHOW,"lib$show_timer");
begin
PUT_LINE("ERATO start in ADA");
INIT;
for J in 1..10
loop
COUNTER:=0;
FLAGS:=(others => TRUE);
for I in 1..LIMIT
loop
if FLAGS(I)
then
PRIME:= I+I+3;
K:=I+PRIME;
while K<=LIMIT
loop
FLAGS(K):= FALSE;
K:=K+PRIME;
end loop;
COUNTER:=COUNTER+1;
end if;
end loop;
end loop;
SHOW;
INT_IO.PUT(COUNTER);PUT_LINE(" primes");
end ERATOADA;
|
115.26 | | SMOP::GLOSSOP | Kent Glossop | Sat Dec 06 1986 01:41 | 91 |
| Just like for BASIC, it's possible to make things go faster for other
languages if you know the compiler. I modified the PL/I benchmark
slightly, and it now actually very slightly outperforms even the original
FORTRAN version on an 8800. Well, that was until I modified the FORTRAN
version... The FORTRAN version can be sped up quite a bit on the
8800 if you know that the memory is organized as longwords. When
you know that, you know that byte writes take longer than longword
writes, since for byte writes, the system must read the old longword
and update the value, while (aligned) longword writes can just blast
the old location. The catch is that systems where longword writes cost more
than byte writes (systems with 16-bit memory, perhaps) might actually go
slower. (Also, LOGICAL*4 allows BLBS/BLBC to be used in place of
BBS/BBC, since you know there is an entire longword available.)
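A minimal hand-written illustration of the two storage choices (mine, not one
of the programs under discussion); whether the byte store really costs more
depends on the memory system, as noted above:

#include <stdio.h>

#define SIZE 8190

static char flags_byte[SIZE + 1];   /* one byte per flag                        */
static int  flags_long[SIZE + 1];   /* one aligned longword per flag            */

int main(void)
{
    int k;

    for (k = 2; k <= SIZE; k += 2)
    {
        flags_byte[k] = 0;          /* byte store: on longword-organized memory
                                       this implies read-modify-write of the
                                       containing longword                       */
        flags_long[k] = 0;          /* aligned longword store: plain overwrite   */
    }
    printf("%d %d\n", flags_byte[SIZE], flags_long[SIZE]);
    return 0;
}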
I will point out that I think this is a VERY POOR benchmark. I will
also point out that anyone who has a good knowledge of a language and
a compiler can frequently do a LOT better than J. Random Programmer
that doesn't ever bother to see what the compiler is generating and
why it is doing that.
I have marked the revised figures with asterisks. [I ran the benchmark
10 times in a command procedure interleaving the FORTRAN and PL/I runs
and used the average of the results after verifying there were no anomalies.
This is using the original version of the FORTRAN code, not the version
specifically set up for the 8800.]
Sieve of Eratosthenes Benchmark (calculating prime numbers)
-------------------------------------------------------------
CPU execution time in seconds !
Compiler ! VAX750 ! VAX780 ! VAX785 ! uVAX II ! VAX8200 ! VAX8800 |
---------+---------+---------+---------+---------+---------+---------+
ADA ! 2.53 ! 1.54 ! 1.07 ! 1.46 ! 1.30 ! |
BASIC ! 3.87 ! 2.19 ! 1.60 ! 2.45 ! 1.89 ! |
BLISS ! 2.27 ! 1.49 ! 0.91 ! 1.40 ! 1.64 ! |
C ! 3.49 ! 2.14 ! 1.56 ! 2.14 ! 1.81 ! |
COBOL ! 10.14 ! 5.91 ! 3.89 ! 19.88 ! 5.69 ! |
FORTRAN ! 2.14 ! 1.28* ! 0.85* ! 1.24* ! 1.08 ! .273* |(.253 w/
PASCAL ! 2.98 ! 1.65 ! 1.06 ! 1.71 ! 1.54 ! |LOGICAL*4)
PLI ! 2.?? ! 1.28* ! 0.85* ! 1.26* ! 1.?? ! .268* |
---------+---------+---------+---------+---------+---------+---------+
Compiler ! Version ! Compiler Option
---------+----------+---------------------------------------+
ADA ! T1.3-19 ! ADA /NOCHECK/OPTIMIZE=TIME
BASIC ! V2.4 ! BASIC /NOCHECK/OPTIMIZE
BLISS ! V4.1-746 ! BLISS /OPTIMIZE=(LEVEL:3,NOSAFE,SPEED)
C ! V2.2-015 ! CC /OPTIMIZE
COBOL ! V3.4-49 ! COBOL /NOCHECK
FORTRAN ! V4.5-220 ! FORTRAN/NOCHECK/OPTIMIZE
PASCAL ! V3.4-114 ! PASCAL /NOCHECK/OPTIMIZE=ALL
PLI ! T3.0-124 ! PLI /NOCHECK/OPTIMIZE
---------+----------+---------------------------------------+
/*
* FLAGS_OVERLAY is used because for some reason FLAGS = '1'b isn't
* generating a MOVC5 instruction like I would expect. COUNT2 is
* used because using a variable in a PUT statement causes an up-level
* reference, since a BEGIN block is implicitly generated for all PUT
* statements. (Up-level referenced variables are forced to memory.)
*/
%REPLACE LIMIT BY 10000;
ERATO: PROC OPTIONS(MAIN);
DCL (COUNT,COUNT2,PRIME,J,I,K) FIXED;
DCL FLAGS(LIMIT) BIT ALIGNED;
DCL FLAGS_OVERLAY(LIMIT) FIXED BIN(7) BASED(ADDR(FLAGS));
DCL LIB$INIT_TIMER EXTERNAL ENTRY;
DCL LIB$SHOW_TIMER EXTERNAL ENTRY;
PUT LIST ('ERATO start in PLI');
CALL LIB$INIT_TIMER;
DO J=1 TO 10;
COUNT=0;
FLAGS_OVERLAY=1; /* Array assignment */
DO I=1 TO LIMIT;
IF FLAGS(I) THEN DO;
PRIME=I+I+3;
K=I+PRIME;
DO WHILE (K <= LIMIT);
FLAGS(K)='0'b;
K=K+PRIME;
END;
COUNT=COUNT+1;
END;
END;
END;
CALL LIB$SHOW_TIMER;
COUNT2 = COUNT;
PUT SKIP LIST (COUNT2,' primes');
END ERATO;
|
115.27 | | SMOP::GLOSSOP | Kent Glossop | Sat Dec 06 1986 02:08 | 46 |
| Also, just to emphasize how much knowing about compilers, etc., can make a
difference: simply moving the variable declarations INSIDE the program
block for C makes an enormous difference, since the variables go from
"external" (which can in theory be modified asynchronously - i.e. aliased)
to "automatic", which is not aliased unless it has its address taken.
(Aliased variables cannot be cached in registers.) The speed of the C
code after this "trivial" change is approximately the same speed as the
revised PL/I program - i.e. only slightly slower than FORTRAN, or a 40%
speed improvement over the original version.
************************************************************************
DON'T TRUST BENCHMARKS UNLESS YOU KNOW *EXACTLY* WHAT THEY ARE SUPPOSED
TO BE TESTING, AND YOU ARE CONFIDENT THAT THEY ARE REALLY TESTING WHAT
THEY CLAIM TO BE TESTING. INTER-LANGUAGE BENCHMARKS ARE ESPECIALLY
DANGEROUS, SINCE PEOPLE RARELY KNOW ALL LANGUAGES EQUALLY WELL.
************************************************************************
The revised C program follows:
#define limit 10000
main()
{
int counter,prime,i,j,k; /* Moved from outside the block */
int flags[limit+1]; /* Moved from outside the block */
printf("\nERATO start in C");
lib$init_timer();
for (j=1; j<=10; j++)
{ counter=0;
for (i=1; i<=limit; i++) flags[i]=1;
for (i=1; i<=limit; i++)
{ if (flags[i]==1)
{ prime = i + i + 3;
k = i + prime;
while (k <= limit)
{ flags[k]=0;
k = k + prime;
}
++counter;
}
}
}
lib$show_timer();
printf("\n%d primes",counter);
}
|
115.28 | Correct S of E? | VINO::MERRILL | | Mon Dec 08 1986 18:23 | 31 |
|
What is the correct definition for S of E? The previous response
used a different algorithm than the one before, so I came up with one more;
it is quite fast for 1000 sieves, about .57 seconds execution time:
#define TRUE 1
#define FALSE 0
#define SIZE 8190
main()
{
int i, j, k;
/* initilize 1 & 2 to be initially prime */
int flags[SIZE + 1] = { FALSE, TRUE, TRUE, };
lib$init_timer();
for (j = 1; j <= 1000; j++) {
for (i = 3; i <= SIZE; i += 2) /* start with 3 */
flags[i] = TRUE;
for (i = 3; i <= SIZE; i += 2)
if (flags[i])
for (k = (i + i) + i; k <= SIZE; k += (i + i))
flags[k] = FALSE;
}
lib$show_timer();
printf("Prime numbers:\n");
for (i = 1; i <= SIZE; i++)
if (flags[i])
printf(" %d\n",i);
}
|
115.29 | | SMOP::GLOSSOP | Kent Glossop | Mon Dec 08 1986 20:13 | 5 |
| I wouldn't be surprised if there are faster algorithms - I just sped up
the one provided to make a point about inter-language benchmarking
comparisons.
Kent
|
115.30 | see it two ways | MUNICH::FROEHLIN | Type <RET>!What do you see?R-E-T-U-R-N | Tue Dec 09 1986 04:32 | 13 |
| If you compare the results among the cpu types, you see how
dangerous the SofE benchmark is. The relative speed of the produced
code varies. And this I think is an interesting point too.
For certain (or small) programs one VAX may be relatively better
than the other. It seems a little bit as if you could pick out a
PASCAL machine or a BLISS machine or whatsoever.
I would like to know how compiler engineers take that into account.
Thank you all for the interesting discussion
(and code improvements)!
Günther
|
115.31 | Optimizing for a specific processor | QUARK::LIONEL | Reality is frequently inaccurate | Tue Dec 09 1986 10:12 | 9 |
| Re .30:
Our usual strategy is to optimize for the fastest processor. However,
of late the various processor groups are trying to woo the
compiler developers into making their particular processor look
good, sometimes at the expense of another processor. It's quite
true that a code sequence that is best for processor X is not so
for processor Y.
Steve
|
115.32 | ahem | BIZET::VANROGGEN | | Tue Dec 09 1986 10:25 | 4 |
| I found that using the algorithm provided by Knuth, although a
bit primitive, was about four times faster than the original
algorithm given in ERATO. It was also correct, unlike ERATO.
I'm surprised people haven't complained more about this.
|
115.33 | Byte gets the blame | TLE::RMEYERS | Randy Meyers | Tue Dec 09 1986 14:59 | 12 |
| Re .32:
I believe that Byte Magazine is the culprit responsible for the
broken algorithm for finding prime numbers. Byte popularized
that particular benchmark in that form. I always assumed that
they wrote the program without checking to see if it worked.
In general, I have found Byte very enjoyable for its technological
howlers. I especially liked an article on recursive descent parsing
that evaluated expressions right to left. The author of the article
did believe this to be a drawback, but didn't see how it could be
fixed.
|
115.34 | | CLT::GILBERT | eager like a child | Tue Dec 09 1986 19:11 | 21 |
| In looking at the code generated by VAX Ada, I noticed a couple of
interesting things -- the initialization of the array is done by
a MOVC3, whereas a MOVC5 would do as well; the value 'FALSE' is
materialized (consuming R4, whereas a literal zero would suffice
in the one instruction that uses it); the compiler does subscript
checking but no range propagation, so some checks are redundant;
and it uses INSV and EXTV to set/clear and test bits in the array.
These are a few opportunities for the compiler to generate better
code.
A more important aspect (as far as benchmarks go) is that the loop
variables are 'off by 1' from what the algorithm naturally desires.
Thus, the compiler generates some "SUBL3 #1, Ra, Rb" instructions
that would be unnecessary if the 'correct' loop variable or induction
variable were used. This is due (in part) to the bit-vector being
indexed starting at 1, instead of 0. It's doubtful that any compiler
would attempt to determine the 'correct' induction variable, but
some (such as Pascal or Fortran) might 'luck' upon it. This gives
a significant performance improvement (on older VAX models), and is
attributable only to chance; this doesn't make for a good benchmark.
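In C terms, the indexing point looks something like the sketch below (mine,
not the VAX Ada compiler's output; in the Ada case the subtraction is
introduced by the compiler because FLAGS is declared with bounds 1..LIMIT):

#include <stdio.h>

#define LIMIT 10000

static char flags[LIMIT];

static void mark_one_origin(void)
{
    int i;

    for (i = 1; i <= LIMIT; i++)    /* counter runs from 1, storage origin is 0,   */
        flags[i - 1] = 1;           /* so each access pays for the i-1 (the SUBL3) */
}

static void mark_zero_origin(void)
{
    int i;

    for (i = 0; i < LIMIT; i++)     /* counter and index coincide; no subtract     */
        flags[i] = 1;
}

int main(void)
{
    mark_one_origin();
    mark_zero_origin();
    printf("%d\n", flags[0] + flags[LIMIT - 1]);
    return 0;
}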
|
115.35 | | AUTHOR::WELLCOME | Steve | Wed Mar 18 1987 11:12 | 2 |
| For what it's worth (not much), I tried this in MACRO-11 on a
PRO-380 and it took 2.68 seconds.
|
115.36 | late entry for COBOL | CLT::RICE | | Mon Mar 23 1987 11:12 | 95 |
| I know this is a late entry, but I just found this note.
To emphasize the fact that benchmarks and benchmark data should not be
distributed without a complete understanding of the nature of the
program and the expected meaning of the test (re: .27) - the VAX COBOL
numbers previously reported were inaccurate on the order of 125% to
over 850%. A properly written VAX COBOL program (knowing something
about the compiler helps) is much faster than the times reported in
#.23 indicate. In this case the major difference is the use of longword
integer numerics instead of numeric string (which the compiler then
converts to packed decimal for most calculations - but that's another
story).
I was unable to obtain the saveset, but wrote a COBOL program based on
the BASIC version in .16 and .20. Without spending a great deal of time
optimizing it I obtained:
                      MicroVAX II    VAX/785
   'better' COBOL         2.06         1.71
   instead of            19.88         3.89
In view of the age of this note - I wonder how many readers have a
distorted view of the speed of VAX COBOL programs. Slower than FORTRAN
- sure, but not THAT slow! And it shouldn't be a big surprise to anyone
that packed is slower than integer - especially on MicroVAX.
Wish I'd seen this note sooner.
-Chip
(VAX COBOL development)
*****************************************************************
IDENTIFICATION DIVISION.
PROGRAM-ID. SIEVE.
ENVIRONMENT DIVISION.
*****************************************************************
* *
* Sieve of Eratosthenes COBOL Benchmark program *
* (whynot) *
*****************************************************************
DATA DIVISION.
WORKING-STORAGE SECTION.
01 LIMITT PIC S9(9) COMP VALUE 10000.
01 STRSIZ PIC S9(9) COMP VALUE 40004.
01 COUNTER PIC S9(9) COMP VALUE 0.
01 PRIME PIC S9(9) COMP VALUE 0.
01 J PIC S9(9) COMP VALUE 0.
01 I PIC S9(9) COMP VALUE 0.
01 K PIC S9(9) COMP VALUE 0.
01 TABLES.
02 FLGSTR.
03 FLAGS OCCURS 10000 TIMES PIC S9(9) COMP.
02 ONESTR.
03 ONES OCCURS 10000 TIMES PIC S9(9) COMP.
PROCEDURE DIVISION.
BEGIN.
PERFORM VARYING I FROM 1 BY 1 UNTIL I > LIMITT
MOVE 0 TO FLAGS(i)
MOVE 1 TO ONES(i)
END-PERFORM.
DISPLAY "ERATO start in COBOL".
CALL "LIB$INIT_TIMER".
PERFORM VARYING J FROM 1 BY 1 UNTIL J = 10
MOVE 0 TO COUNTER
MOVE ONESTR TO FLGSTR
PERFORM VARYING I FROM 1 BY 1 UNTIL I = LIMITT
IF FLAGS(I) = 1
THEN
* ADD I I 3 GIVING PRIME
MOVE I TO PRIME
ADD I TO PRIME
ADD 3 TO PRIME
* display prime conversion
ADD I PRIME GIVING K
PERFORM UNTIL K NOT < LIMITT
MOVE 0 TO FLAGS(K)
ADD K PRIME GIVING K
END-PERFORM
ADD 1 TO COUNTER
END-IF
END-PERFORM
END-PERFORM.
CALL "LIB$SHOW_TIMER".
DISPLAY " ".
DISPLAY COUNTER CONVERSION " primes".
|
115.37 | I've got what you were looking for | DENTON::AMARTIN | Alan H. Martin | Tue Mar 24 1987 10:58 | 7 |
| Re .36:
> I was unable to obtain the saveset, ...
See DENTON""::DENTON$DUA1:[AMARTIN.ERATO]ERATO.A (or ERATOCOB.COB on
the same area).
/AHM
|
115.38 | got it - thanks | CLT::RICE | | Tue Mar 24 1987 12:55 | 5 |
| re: .37 thanks. Guenther too has mailed me his COBOL version.
Most of the improvement in my version is from the use of integer
numeric instead of string.
-Chip
|
115.39 | another set of benchmark figures | COOKIE::DOUCETTE | Chuck Doucette | Tue Jun 02 1987 18:33 | 111 |
| Here (after the FF) is a reproduction of some benchmarks I did recently.
I did this in preparation for a talk I gave at ZKO last Thursday in
the Pythagoras room (5/28/87) on DEC WRL Modula-2 (a descendant of Mike
Powell's compiler). I post this for two reasons - 1) to generate
interest in this compiler, and 2) I'd like suggestions as to how
to make this a fair, accurate, and complete comparison.
If you'd like complete compiler distribution kits for ULTRIX or VMS
including these benchmarks - please send mail to Joel McCormack
(SONORA::JOEL). If you want to just play with the benchmark files themselves,
you can look in COOKIE::DISK$GZ_DISK:[DOUCETTE.NEWMOD.BENCH] at the files
BENCH.*;* and DESCRIP.MMS;0
Chuck
From: COOKIE::DOUCETTE "Chuck Doucette 26-May-1987 2333" 27-MAY-1987 00:07
To: @GROUP
Subj: I redid the DECWRL Modula-2 benchmarks - some dramatic results:
--------------------------------------------------------------------------------
|| Program | Berkeley Ultrix DEC WRL ||
|| Name | Pascal C Modula--2 ||
--------------------------------------------------------------------------------
|| Perm | 2.58 1.40 1.14 ||
|| Towers | 2.88 1.60 1.00 ||
|| Queens | 1.47 0.49 0.46 ||
|| Intmm | 2.75 1.27 0.59 ||
|| Mm | 2.32 1.69 0.64 ||
|| Puzzle | 15.73 8.92 2.56 ||
|| Quick | 2.07 0.62 0.45 ||
|| Bubble | 3.57 2.98 0.67 ||
|| Tree | 3.30 1.76 1.00 ||
|| FFT | 5.72 3.62 1.44 ||
|| Int Comp | 5.27 2.91 1.39 ||
|| Real Comp | 7.90 4.49 2.01 ||
--------------------------------------------------------------------------------
|| avg. improvement | 74% 47% N/A ||
--------------------------------------------------------------------------------
This is CPU seconds running on shodha w/ ULTRIX (a VAX-8200).
The "avg. improvement" row means that for each entry, I compared
it with DEC WRL Modula-2. I figured the improvement in execution
time between the particular language and Modula-2. Then I added
up all the improvement percentages per column, divided by the
number of entries, and got the average. This should indicate
(not accurately) the average expected improvement in execution
time (i.e. how good the code the compiler generated was)
if you coded the same thing in Modula-2.
N/A means either Not Applicable or Not Available.
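As a worked example of that calculation (taking each entry's improvement to
be (other - Modula-2) / other, which is my reading of the description above):
for Perm under Ultrix C,

    (1.40 - 1.14) / 1.40 = 0.26 / 1.40, or about a 19% reduction in CPU time,

and averaging the twelve such figures down the C column gives roughly 47%,
which is the number in the last row.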
Other than the actual timings I got (and the machine), and the avg.
improvement line that I added - see the following excerpt from overview.doc
written by Mike Powell for details (note that speed of compilation with the
current compiler is twice as fast as the compiler in use when this was
written):
The compiler was designed and built by Michael L. Powell and
it appears to be one of the best compilers for the VAX in
terms of the efficiency of the generated code. The error
messages are slightly better than those from the Unix C com-
piler and the compile time is slightly worse. Below is a
table of execution times on a set of ten benchmarks that
John Hennessy at Stanford has assembled. All the times are
in CPU seconds on a VAX 11/780 with FPA. All compilers were
asked to generate the best code they could and all optional
run time checks were turned off. All versions of the pro-
grams were as syntactically and semantically similar as pos-
sible.
Doing benchmarks is always tricky and is especially diffi-
cult on a VAX 11/780 because of the peculiar interactions
between the instruction prefetch buffer and instruction
alignment on word boundaries. Variations on the order of
10% because of changes in alignment are not unusual. The
integer and real composite numbers in the last two lines are
Hennessy's weighted averages of the previous lines -- his
attempt to summarize the results. The above performance
information is provided for guidance only. Your mileage may
vary.
--------------------------------------------------------------------------------
|| Program | VAX VAX VAX DEC WRL ||
|| Name | C Pascal Bliss--32 Modula--2 ||
--------------------------------------------------------------------------------
|| Perm | 2.11 1.52 1.93 1.71 ||
|| Towers | 2.30 1.20 2.01 1.46 ||
|| Queens | 0.60 0.75 0.54 0.59 ||
|| Intmm | 0.55 0.52 1.02 0.54 ||
|| Mm | 0.99 0.55 1.03 0.57 ||
|| Puzzle | 3.88 3.78 3.77 3.05 ||
|| Quick | 0.59 0.60 0.60 0.63 ||
|| Bubble | 0.76 0.74 0.73 0.85 ||
|| Tree | 2.42 3.03 2.32 1.44 ||
|| FFT | 1.88 1.28 1.02 0.98 ||
|| Int Comp | 2.43 2.13 2.25 1.76 ||
|| Real Comp | 3.27 2.68 N/A 2.21 ||
--------------------------------------------------------------------------------
|| avg. improvement | 21% 8% 17% N/A ||
--------------------------------------------------------------------------------
This is CPU seconds running on newton w/ VMS (a VAX-11/785).
Note that if my arithmetic is bad (in calculating avg. improvement),
please correct me. If you doubt the veracity of the results, I have all of the
sources - you are welcome to try them yourself.
Chuck Doucette
May 27, 1987
|
115.40 | if you have the BLISS versions as well... | VLNVAX::DMCLURE | Gear up for DECworld-87! | Mon Jun 22 1987 00:50 | 3 |
| I'd be curious to see the run time of the BLISS versions.
-davo
|
115.41 | my last word on this subject | COOKIE::DOUCETTE | Chuck Doucette - Database A/D @CXO | Tue Jun 23 1987 03:04 | 19 |
| I just checked - they are listed. If you look in the second chart (under
the paragraphs written by Mike Powell) you will see benchmark
timings under "DEC Bliss-32". I posted a note very similar to this
one in the BLISS conference (TLE::BLISS) under note 294.* and it
seemed to draw much more interest.
I'd like to comment more on my benchmark results,
but people either say a) your optimizer was tuned to these benchmarks,
b) you hand-picked these benchmarks to make your compiler look
good, c) these benchmarks aren't realistic, or d) all benchmarks
are useless. I personally have faith in the benchmarks I listed
and in the compiler our group uses (it isn't MY compiler although
I have worked on it and do plug it).
Anyhow, I've said enough. If the compiler interests you,
you are more than welcome to read about it in the conference
I started in COOKIE::MODULA2 (send me mail if you want to become a
member - browsers are ok).
Chuck
|
115.42 | | TLE::BRETT | | Tue Jun 23 1987 09:15 | 5 |
|
faith: n. trust; belief; belief without proof; religion; promise;
loyalty, constancy.
/Bevin
|