T.R | Title | User | Personal Name | Date | Lines |
---|
1261.1 | | TRACE::GILBERT | Ownership Obligates | Fri Jun 29 1990 21:14 | 23 |
| > r
> X
The fact that X is an integer doesn't help much.
Suppose (for example) that the value of X is *known* and *fixed*,
so that you have a function of just one variable, f(r) = X^r. Now
r
f( r * log B ) = B
X
Thus, by the simple multiplication of r by a constant real number,
you can use f() to compute the exp function, which is a non-trivial
function, and which would be quite awkward to compute using only
integer arithmetic.
[ Note that multiplying two reals via integer arithmetic isn't too
difficult -- extract the mantissas and multiply them, extract the
exponents and add them, and put the parts together again.]
Why would you want to do this? Also, do you know how exponentiation
is done in math libraries?
|
1261.2 | More info... | RPLACA::HARVEY | Ask me... I might | Mon Jul 02 1990 13:06 | 17 |
|
The reason I want to do this: I need to calculate a formula that
looks something like this:
P = S * D * C * Q**1.85
During the course of my program, I expect this formula to be
calculated over 90 million times. In performance testing (in VAX C)
I found that on the machine I'm on (VAXStation 3500) I can do
about 800,000 floating point multiplications per second - but only
7000 exponentiations per second! (using the "pow" function
in C).
Exponentiation is the bottleneck here. I'm also considering
going to scaled integer operations instead of floating point for
the performance improvements - hence, the request regarding
an integer exponentiation method.
|
1261.3 | Looks bad | CIVAGE::LYNN | Lynn Yarbrough @WNP DTN 427-5663 | Mon Jul 02 1990 13:19 | 10 |
| > looks something like this:
>
> P = S * D * C * Q**1.85
>
> During the course of my program, I expect this formula to be
> calculated over 90 million times.
But does Q change each time the expression is evaluated? If it changes less
frequently, you (or the optimizer) can move the exponentiation out of the
inner loop... other than that, I see little hope.
|
1261.4 | | ALLVAX::JROTH | It's a bush recording... | Mon Jul 02 1990 13:50 | 9 |
| What range of values does Q have? Is the exponent really fixed?
If you really need the speed, you could code up a much faster
approximation than calling out of line library functions.
There would be no loss of accuracy in this - in fact it could
even be slightly more accurate. (But only slightly - the library
functions are quite finely tuned.)
- Jim
|
1261.5 | | RPLACA::HARVEY | Ask me... I might | Mon Jul 02 1990 14:01 | 7 |
|
RE: .3
Yes, Q changes at every iteration.
RE: .4
Q will range in value between 0 and 50,000. The exponent is really
fixed at 1.85
|
1261.6 | 50,000 element array? | VMSDEV::HALLYB | The Smart Money was on Goliath | Mon Jul 02 1990 14:35 | 5 |
| > Q will range in value between 0 and 50,000. The exponent is really
> fixed at 1.85
1.85
But this says you should pre-build a table of X for 1 <= X <= 50000
and just look up the entry you want.
|
1261.7 | | TRACE::GILBERT | Ownership Obligates | Mon Jul 02 1990 15:07 | 13 |
| (re .-1: You beat me to it.)
And if you can't `afford' a 50000-element array, then use a table for just the odd values
of Q, and handle the even Q with a multiply:
p
Q = 2 * q (q is odd, and p is an integer < 16)
1.85 1.85 p 1.85
Q = (2 ) * q
= T[p] * Z[q div 2] (T and Z are precomputed tables)
|
1261.8 | Hmmm... that's possible, but... | RPLACA::HARVEY | Ask me... I might | Mon Jul 02 1990 15:54 | 6 |
|
Building a 50,000 element array is possible. Except that the value
of Q is now kept to 2 decimal places. Using scaled integer
(with a scaling factor of 100), space for 5,000,000 array entries would
be needed, and 5,000,000 exponentiations would have to be done at
program start. Better than 90,000,000, though...
|
1261.9 | | ALLVAX::JROTH | It's a bush recording... | Mon Jul 02 1990 16:17 | 17 |
| If the exponent is really fixed, you could still use a "massive table
lookup" with a pair of few hundred entry tables by the floating point
exponent and the first 8 or so bits of fraction to index to a set
of low order Taylor series expansions to get the actual value.
Even better might be a set of Hermite expansions. Very cheap, just
a few multiples and adds.
I've done this for inverse square root (needed in a graphics application
to normalize vectors) and obtained several-fold improvement in
performance over calling library routines - at no loss of accuracy.
Note that the code for this goes inline which saves too, even on a
RISC processor where routine calls are much cheaper.
I could post some more details if you wish...
- Jim
|
1261.10 | ??? | CHOVAX::YOUNG | Bush: 'Read my lips; I Lied' | Mon Jul 02 1990 16:31 | 4 |
| I am afraid that I do not understand .8 at all. Why would you need
5,000,000 entries for 50,000 integers?
-- Barry
|
1261.11 | | ALLVAX::JROTH | It's a bush recording... | Mon Jul 02 1990 16:34 | 6 |
| � <<< Note 1261.10 by CHOVAX::YOUNG "Bush: 'Read my lips; I Lied'" >>>
�
� I am afraid that I do not understand .8 at all. Why would you need
� 5,000,000 entries for 50,000 integers?
I think they weren't really "integers"...
|
1261.12 | We would be using scaled integers to represent floats... | RPLACA::HARVEY | Ask me... I might | Mon Jul 02 1990 21:10 | 8 |
| re: .9
The value of Q is actually a floating point number with an accuracy
of 2 decimal digits. (e.g., 3256.83) In order to represent
this floating point number as an integer (with the same level of
precision) you would have to multiply it times 100 (so it would
be 325683). 50,000 times 100 = 5,000,000 entries in the array.
Jeff
|
1261.13 | | CHOVAX::YOUNG | Bush: 'Read my lips; I Lied' | Mon Jul 02 1990 23:03 | 9 |
| OK. Getting back to the original problem...
I believe that the performance loss that you are seeing is because the
pow() function, like most C floating point routines is forcing double
precision arithmetic. If you do not need this extra precision then you
can definitely get a speed-up by using single-precision routines (like
the OTS$ routines). Check in the VAXC conference for confirmation.
-- Barry
|
1261.14 | Are we solving the right problem? | CIVAGE::LYNN | Lynn Yarbrough @WNP DTN 427-5663 | Tue Jul 03 1990 11:54 | 9 |
| I have this sneaking suspicion that we're toying with the wrong problem. If
the exponent 1.85 is known only to 2 decimals, that says to me that it is
derived from some sampled data and is thus suspect, since exponents like
that occur so rarely in nature. I would take a hard look at the original
problem/data to make sure the mathematics fit the mechanics of the problem.
I'm not sure it's faster, but you might try an exponent of 59/32=1.84375,
which can be evaluated with 7-8 multipies and 5 square roots. Maybe even
15/8=1.875 would suffice.
|
1261.15 | Accuracy? | CADSYS::COOPER | Topher Cooper | Tue Jul 03 1990 12:26 | 4 |
| Which brings us to the question: how accurate does the result have
to be?
Topher
|
1261.16 | | SSDEVO::LARY | under the Big Western Sky | Tue Jul 03 1990 13:40 | 25 |
| The accuracy question is critical to the space/speed tradeoff, especially
if five million values won't fit into your working set (a disk read is a LOT
slower than an exponentiation!).
Consider the following algorithm for N^1.85, with N accurate to 2 digits after
the decimal:
- if N <= 500, look up I=100*N in a table (of 50000 elements) of (I/100)^1.85.
- if N > 500, look up J=floor(N) in a table (of ~50000 elements) of J^1.85;
then, N^1.85 = (J+F)^1.85 = (J^1.85) * (1+F/J)^1.85, where F is the fractional
part of N, = N-J.
Since F/J is small (<1/500) you can use the first three terms of the
"binomial expansion" of (1+F/J)^1.85 [can't you? If you can't, you'll have
to figure another polynomial based on exp(1.85*ln(1+F/J)), MAPLE should be
good for this] as an approximation to (1+F/J)^1.85, namely:
2
(1+F/J)^1.85 = 1 + 1.85*(F/J) + 1.85*.85*(F/J) /2
So you have to calculate about 100000 exponentiations initially, and each
individual operation is at worst a compare, floor, table lookup, divide, three
multiplies and three adds. Result ought to be good to 7 places, since the
magnitude of the next term in the "binomial expansion" is less than 1E-9...
|
1261.17 | replies... | RPLACA::HARVEY | Ask me... I might | Tue Jul 03 1990 20:13 | 12 |
|
re: .13
Thanks, I'll check out the OTS$ routines.
re:.14
The "1.85" exponent comes from the Hazen Williams formula for
calculating pressure loss and flowrates in a hydraulic system
(a pipeline). Our customer has used
this formula extensively over the last 20 years and is pretty
convinced of its accuracy.
re: .15
2 decimal places of accuracy is fine...
|
1261.18 | Please check .16 again, it is definitely the way to go | VMSDEV::HALLYB | The Smart Money was on Goliath | Tue Jul 03 1990 20:49 | 9 |
| Re: .17 It seems to me that Richie did a dynamite job getting you
something that you can use painlessly. It is easily the best of the
suggestions entered so far, especially if you're happy with the 1.85
exponent. You might earn some brownie points if you casually mention
to your customer that one of the company's most senior (and most
talented) engineers came up with a near ideal space/time tradeoff, just
for them.
John
|
1261.19 | when you lose your desire to tweak things, you're no longer an engineer... | TRACE::GILBERT | Ownership Obligates | Tue Jul 03 1990 22:13 | 19 |
| For that last step, instead of calculating 1 + c0*(F/J) + c2*(F/J)^2,
this is better done when multiplying by the table value (which I'll call T):
2
T + T*( 1.85*(F/J) + 0.78625*(F/J) )
Originally, when you added 1, you may have lost precision. Above, more
precision is kept going into the last step.
In any case, you calculate:
2
1.85*(F/J) + 0.78625*(F/J)
as
(F/J) * (1.85 + 0.78625*(F/J))
to reduce the number of multiplies (which John mentioned in swift passing).
P.S. John, whenever you want to write Math RTL code, let us know.
|
1261.20 | No slight was intended... | RPLACA::HARVEY | Ask me... I might | Fri Jul 06 1990 12:38 | 16 |
|
Re: .18 (Re:.16)
I certainly didn't intend any slight to the solution offered by
.16. The reason I didn't mention it in my last reply
is due more to my, shall we say "lack of mathematical sophistication"
(I didn't understand it yet, and hadn't had a chance to contact
the author to explain it to me...) than to lack of appreciation.
I do appreciate what appears to be a VERY good solution, and I
promise I'll get on it as soon as I put out this most recent fire.
Thanks for everyone's help and suggestions.
Jeff
|
1261.21 | example C code... | ALLVAX::JROTH | It's a bush recording... | Fri Jul 06 1990 17:33 | 105 |
| Here is a practical example of efficient calculation of x^1.85
in single precision. It works by putting together the exponent
and a simple linearly interpolated fraction and doesn't require
large tables.
The actual code could be sped up by some careful optimization
where we extract the fields of the floating point numbers
involved, but the gains won't be orders of magnitude.
It depends on VAX floating format, but obviously the same
thing with minor change to the exponents would work on a RISC
processor.
Hope this helps...
- Jim
/*
* Demonstrate fast evaluation of x^1.85 where x is a single
* float number.
*
* x = +/- 2^e * f
* e = exponent in excess 128 form
* f = fraction in [0.5, 1) with implied normalize bit
*
*/
#include math
#include stdio
/*
* Table of precomputed values of 2^(e*1.85)
*/
static float exp_seed[256];
/*
* Table of precomputed values of f^1.85 for f in the range [1/2, 1]
* in steps of 1/256
*/
static float frac_seed[129];
static init_seeds()
{
int i;
float x, *px = &x;
for (i = 0; i < 256; i++) {
exp_seed[i] = 0.0;
if ((i-128)*1.85 > 127) continue;
exp_seed[i] = pow(2.0, 1.85*(i-128));
}
for (i = 0; i < 128; i++) {
*(int *)px = 0x4000+i;
frac_seed[i] = pow(x, 1.85);
}
frac_seed[128] = 1.0;
}
double ask_d(query)
char *query;
{
static response[132];
double answer;
while(1) {
printf(query);
gets(response);
if (sscanf(response, "%lf", &answer) > 0) return answer;
}
}
main()
{
float x, *px = &x;
float e, f, p, t;
int iexp, ifrac;
init_seeds();
loop:
x = ask_d("enter x: ");
p = pow(x, 1.85);
iexp = *(int *)px & 0x7f80; /* strip off the exponent field */
ifrac = *(int *)px & 0x7f; /* strip off the high fraction bits */
*(int *)px += 0x4000-iexp; /* remove exponent from x */
t = x; /* get interpolation between fraction entries */
*(int *)px &= ~0xffff0000;
t = (t-x)*256.0;
/* put together exponent and interpolated fraction and show relative error */
e = exp_seed[iexp >> 7 & 255];
f = frac_seed[ifrac+1]*t + frac_seed[ifrac]*(1.0-t);
printf("exact value = %g\n", p);
printf("approximated value = %g\n", e*f);
printf("relative error = %g\n", (p-f*e)/p);
goto loop;
}
|