| Can you post the program here?
Does it use a single array, or several?
What datatype (real, double precision, other) does it use?
What operations does it perform on the array?
I'm sure the "problem" is related to the relatively small
16KB direct-mapped cache, backed by significantly slower
off-chip cache, of the EV45 processor in the Alphastation 255,
but making some plausible assumptions about the answers to
the above questions I can't see an obvious cause.
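To illustrate the general effect only (this is not a diagnosis of the
program in question): in a direct-mapped cache each memory block can
live in exactly one cache line, so two addresses that are a multiple of
the cache size apart evict each other. The sketch below assumes a
32-byte line size, which is not stated in this thread, and uses
made-up array base addresses.

```python
# Minimal sketch of line selection in a direct-mapped cache.
CACHE_BYTES = 16 * 1024   # 16KB, as stated above
LINE_BYTES = 32           # ASSUMPTION: line size is not given in this thread
NUM_LINES = CACHE_BYTES // LINE_BYTES


def cache_line_index(address):
    """Return the index of the only cache line this byte address can occupy."""
    return (address // LINE_BYTES) % NUM_LINES


def conflicts(addr_a, addr_b):
    """True if the two addresses lie in different memory blocks
    that compete for the same cache line."""
    same_block = addr_a // LINE_BYTES == addr_b // LINE_BYTES
    return not same_block and cache_line_index(addr_a) == cache_line_index(addr_b)


# Hypothetical layout: two arrays whose bases are exactly 16KB apart.
base_a = 0x10000
base_b = base_a + CACHE_BYTES

# First elements of the two arrays map to the same line.
print(conflicts(base_a, base_b))
# Shifting one array by a single line removes the collision.
print(conflicts(base_a, base_b + LINE_BYTES))
```

If several arrays are accessed with the same index in one loop and
their bases happen to be spaced like this, each access can evict the
data the next one needs.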
Have you considered bidding a more modern system, such as a
PWS 433au?
Is there any reason you specify -O3 (reducing optimization
below the default for Digital Fortran)? Can you get the
customer to just leave off the -O option for us?
If this example doesn't represent the "real and bigger job",
then why does it matter how this example performs? How does
the real job perform?
Do you want an explanation, or do you want advice on how
to improve the situation?
|
| re .1
Hello. Firstly, thank you for your fast reply...
>Can you post the program here?
No. My customer told me that this program is owned by several teachers.
The customer's responses follow:
> Does it use a single array, or several?
There are several arrays. It is important to say that the same
program, which produced the table sent before, was used on all
machines tested.
> What datatype (real, double precision, other) does it use?
All double precision.
> What operations does it perform on the array?
Most of the CPU time is spent solving the following equation
12000 times:
A^{-1}B=C
This equation means the inversion of a matrix followed by a multiplication
by another matrix. As I said, this is carried out, in the present test,
12000 times. Afterwards a few calculations, which are not significant,
are done.
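Since the program itself cannot be posted, the operation described
above can be sketched roughly as follows. This is only an illustration
in Python, not the customer's Fortran code: the function names
`invert`, `matmul` and `solve_step`, the Gauss-Jordan method and the
2x2 matrices are all assumptions made for the example.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy [A | I] so the input is left untouched.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Choose the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0.0:
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def matmul(x, y):
    """Plain triple-loop matrix product of x (m by k) and y (k by n)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def solve_step(a, b):
    """One evaluation of C = A^{-1} B, as described in the post."""
    return matmul(invert(a), b)


# Hypothetical small inputs; the real matrix sizes are not given.
A = [[4.0, 7.0], [2.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(3):          # the test described above repeats this 12000 times
    C = solve_step(A, B)
print(C)
```

Each pass sweeps over whole rows of the matrices, so the memory access
pattern of these loops is what interacts with the cache.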
>
>
> Is there any reason you specify -O3 (reducing optimization
> below the default for Digital Fortran)? Can you get the
> customer to just leave off the -O option for us?
There is no reason for using the -O3 option. However, this was done on the
other machines as well. In addition, the -O option was used and no changes
at all were observed.
> If this example doesn't represent the "real and bigger job",
> then why does it matter how this example performs? How does
> the real job perform?
The input for the test that we carried out was set up for a simple and
fast calculation. The program is used in our main line of research and
usually takes several days of CPU time. For example, we are studying a
system whose calculations are being carried out on an IBM RISC/6000
model 3CT; the run started three months ago and has not finished. Our
research mainly concerns quantum scattering calculations with
rotational state resolution.
>I'm sure the "problem" is related to the relatively small
>16KB direct-mapped cache, backed by significantly slower
>off-chip cache, of the EV45 processor in the Alphastation 255,
>but making some plausible assumptions about the answers to
>the above questions I can't see an obvious cause.
>Have you considered bidding a more modern system, such as a
>PWS 433au?
No. I can't consider another machine for now.
>Do you want an explanation, or do you want advice on how
>to improve the situation?
I'd like an explanation, please, but if you can also send some hints to
improve the performance I will be very happy.
Thank you very much for your attention.
Best regards,
Marcos Tome
|