| It's easy to see how this works by writing out the multiplication by
hand (here only 8 bits are shown):
xxxxxxxx
* yyyyyyyy
----------
xxxxxxxx add if y0 is 1
xxxxxxxx add if y1 is 1
xxxxxxxx add if y2 is 1, ...
xxxxxxxx
+ etc
----------------
zzzzzzzz00000001 desired form of product
Starting at the lsb we find that X0 and Y0 must be 1.
Set an accumulator, A, to X.
shift X left 1 place.
Now we know X0 is 1, so if A1 is 1 we must add X into the accumulator
and set Y1 to 1, else we must clear Y1 and not add X to the accumulator.
Repeat until X has been shifted left 8 bits and cannot affect the low
bits of the product; at this point A will equal 00000001 and Y will be
the multiplicative inverse mod 2^8.
Your routine has some nice embellishments; for example it shifts the
Y bits in place of the A bits saving a register. It does not matter
if you add or subtract X, since only testing the next bit in A counts
at each step because adding or subtracting xor's the lsb in any case.
Also, the LSB has been toggled at startup so that the process of
updating A simultaneously stashes the next correct bit in the Y
register.
Nice code. But it seems that there must be a faster way than simply
looping thru the bits.
Can anyone think of faster way which uses existing multiply/divide
instructions?
- Jim
|
| re .1
Oops. My memory fails me.
re .2
I think it does matter whether you add or subtract or XOR.
There must be some loop invariants, some equations relating
the input argument, R0, R1, and R2 that is true on each iteration
of the loop, and which imply that R0 holds the correct result
when we fall out of the loop. For example, at 10$, we know
(by induction) that R2 = (input-1)*2^R1 mod 2^32, and that
0 <= R1 <= 30. But what is R0?
|
| Let X = input
Y(j) = inverse of X with respect to 2^j (for j = 1, 2, 3, ...)
but Y(0) = 0
At 10$: R1(j) = j
R2(j) = 2^j * (X - 1)
R0(j) = Y(j) + (1 - X * Y(j))
The secret is that (Y) uses low-order bits, and (1 - X*Y) uses high-order
bits. Someone saw that they could be squeezed together!
There are two cases going from R0(j) to R0(j+1):
A
Y(j+1) = Y(j)
R0(j) = Y(j) mod 2^(j+1)
R0 bit j is zero
R0(j+1) = R0(j)
= Y(j) + (1 - X * Y(j))
= Y(j+1) + (1 - X * Y(j+1))
B
Y(j+1) = Y(j) + 2^j
R0(j) = Y(j) + 2^j mod 2^(j+1)
R0 bit j is one
R0(j+1) = R0(j) - R2(j)
= Y(j) + (1 - X * Y(j)) - 2^j * (X - 1)
= Y(j) + 2^j + 1 - X * (Y(j) + 2^j)
= Y(j+1) + (1 - X * Y(j+1))
John
|