From: DEC:.REO.JOLLY::HUDSON "[email protected] - UK Software
Partner Engineering 830-4121" 19-MAY-1997 14:51:56.28
To: nm%vbormc::"[email protected]"
CC: HUDSON
Subj: RE POINT #28554, TIS Software Ltd, floating numbers
Hello Noel
Thanks for your ASAP call on floating point formats.
Unfortunately I don't have a solution for you. I think that what's happening
will depend on the routines that convert ASCII to floating point and back to
an output string.
I believe that the number you mention, "1081.13", is one that can't be
represented precisely in hardware (see below). The actual value you are
storing in binary to represent "1081.13" will depend on the floating point
format (F_FLOAT, D_FLOAT, G_FLOAT, or one of the IEEE formats), and the way
that the code is translating between ASCII and floating format.
From the numbers you quote ("1081.1299999999999" and "1081.1300000000001"), I'd
guess you're using double precision (you don't tend to get that many places
from single precision). Further, I'd guess you're probably using G_FLOAT or
D_FLOAT, since they're the default with the language compilers (not IEEE),
although the IEEE formats would be subject to the same problems.
The specification for D_FLOAT says that the precision is "typically 16 decimal
digits", and G_FLOAT "typically 15 decimal digits" (Alpha Architecture Reference
Manual). Note though that on Alpha, D_FLOAT isn't implemented to the same
level of precision as on VAX, so that brings it down to 15 digits when used for
calculations.
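As a comparison, on a modern IEEE system the same limits can be read straight
out of the C compiler's <float.h>. A minimal sketch, assuming a machine whose
double is IEEE 64-bit (53 significand bits, the same effective count as
G_FLOAT):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        /* DBL_MANT_DIG: binary digits in a double's significand (53 for
           IEEE 64-bit doubles, matching G_FLOAT's effective precision).
           DBL_DIG: decimal digits guaranteed to survive a round trip (15),
           the same "typically 15 decimal digits" figure as above.       */
        printf("significand bits   : %d\n", DBL_MANT_DIG);
        printf("safe decimal digits: %d\n", DBL_DIG);
        return 0;
    }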
So assuming the ASCII value has to be translated into a binary floating point
form, you are guaranteed to lose the exact "1081.13" value.
When you turn that value back into an output string, what you see will depend
on the routine that prints the string. For example, if your
output routine "knows" that the number is only going to be correct to 15
digits, then it could round up or down, and you'd get a value of "1081.13"
regardless of whether your internal representation is slightly higher or lower
than that.
But if your output routine isn't doing any rounding, then the values it
displays are likely to be wrong in the way you've seen.
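You can see both behaviours with a small C test on any IEEE machine. This is
only a sketch, and the exact digits depend on your system's conversion
routines, but a typical run shows the rounded and unrounded views of the same
stored value:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Convert the ASCII string to binary floating point...         */
        double d = strtod("1081.13", NULL);

        /* ...and back again.  Rounded to 15 significant digits the
           error is invisible, but 17 digits expose the stored value.   */
        printf("%.15g\n", d);   /* prints 1081.13                 */
        printf("%.17g\n", d);   /* prints 1081.1300000000001 here */
        return 0;
    }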
As an example, the DEBUGGER "knows" to do this rounding:
DBG> set radix hex
DBG> dep/d_float 10000=1081.13
DBG> ex/d_float 10000
00010000: 1081.13000000000
Note that the debugger says that the number stored at address 10000 is
"1081.13000000000".
But if I look at the actual binary data and convert that back to a floating
point value by hand (well actually with a program)...
DBG> ex/hex 10000
00010000: 24284587
DBG> ex/hex 10004
00010004: 8F5CF5C2
A D_FLOAT looks like this :
 31            16|15 14    7|6     0
+----------------+-+--------+-------+
|   fraction2    |S|exponent| fract1|
+----------------+-+--------+-------+
|   fraction4    |    fraction3     |
+----------------+------------------+
 31            16|15               0
The number 8F5CF5C2 24284587 looks like this :
 31            16|15 14    7|6     0
+----------------+-+--------+-------+
|0010010000101000|+|10001011|0000111|  24284587
+----------------+-+--------+-------+
|1000111101011100| 1111010111000010 |  8F5CF5C2
+----------------+------------------+
 31            16|15               0
The most significant bit of the fraction is ALWAYS considered to be set, and
so does not need to be stored. Given that, the full fractional part is :
     <frac1>     <frac2>          <frac3>          <frac4>
+0.1 0000111 0010010000101000 1111010111000010 1000111101011100
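If it's useful, here's the "program" spelled out as a small C sketch. It
assumes a modern machine with 64-bit integers and IEEE doubles (so the final
printf is itself rounded to 53 bits), but the field extraction follows the
layout above exactly:

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        /* The two longwords from the EX/HEX output above. */
        uint32_t lo = 0x24284587;   /* fract1, exponent, S, fraction2 */
        uint32_t hi = 0x8F5CF5C2;   /* fraction3, fraction4           */

        unsigned sign     = (lo >> 15) & 0x1;     /* bit  15               */
        unsigned exponent = (lo >> 7)  & 0xFF;    /* bits 14:7, excess 128 */
        uint64_t frac1    =  lo        & 0x7F;    /* bits  6:0             */
        uint64_t frac2    = (lo >> 16) & 0xFFFF;  /* bits 31:16            */
        uint64_t frac3    =  hi        & 0xFFFF;  /* bits 15:0             */
        uint64_t frac4    = (hi >> 16) & 0xFFFF;  /* bits 31:16            */

        /* Assemble the 56 fraction bits, restoring the hidden leading 1 :
           0.1 <frac1> <frac2> <frac3> <frac4>                            */
        uint64_t fraction = (1ULL << 55)
                          | (frac1 << 48)
                          | (frac2 << 32)
                          | (frac3 << 16)
                          |  frac4;

        int true_exp = (int)exponent - 128;       /* 139 - 128 = 11 */

        /* value = 0.<fraction bits> * 2^true_exp
                 = fraction * 2^(true_exp - 56)                           */
        double value = ldexp((double)fraction, true_exp - 56);

        printf("sign %c, exponent %u (true %d)\n",
               sign ? '-' : '+', exponent, true_exp);
        printf("fraction bits : %llX\n", (unsigned long long)fraction);
        printf("value (as an IEEE double) : %.17g\n", value);
        return 0;
    }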
Which represents the sum of the following (decimal) numbers :
+0.5
+0.015625
+0.0078125
+0.00390625
+0.00048828125
+0.00006103515625
+0.0000019073486328125
+0.000000476837158203125
+0.0000000298023223876953125
+0.00000001490116119384765625
+0.000000007450580596923828125
+0.0000000037252902984619140625
+0.000000000931322574615478515625
+0.00000000023283064365386962890625
+0.000000000116415321826934814453125
+0.0000000000582076609134674072265625
+0.000000000001818989403545856475830078125
+0.00000000000045474735088646411895751953125
+0.000000000000028421709430404007434844970703125
+0.0000000000000142108547152020037174224853515625
+0.00000000000000710542735760100185871124267578125
+0.000000000000003552713678800500929355621337890625
+0.00000000000000088817841970012523233890533447265625
+0.0000000000000002220446049250313080847263336181640625
+0.00000000000000011102230246251565404236316680908203125
+0.000000000000000055511151231257827021181583404541015625
================================================================================
Total= +0.527895507812499997779553950749686919152736663818359375
================================================================================
The exponent is 139, which is held in excess-128 format, giving a true
exponent of 11 (139 - 128 = 11).
So to find the actual number, we now have to multiply the fraction by 2^11,
which works out to
0.527895507812499997779553950749686919152736663818359375
*2048.
The precise number represented is :
================================================================================
+1081.1299999999999954525264911353588104248046875
================================================================================
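You can confirm that tail exactly with integer arithmetic, no floating point
needed. Another sketch assuming 64-bit integers; the fraction constant is the
56 bits assembled above:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Stored number = fraction * 2^(11-56) = fraction / 2^45.
           Target number = 1081.13 = 108113 / 100.
           Cross-multiply to compare them exactly; both products
           fit comfortably within 64 bits.                         */
        uint64_t fraction = 0x872428F5C28F5CULL;

        uint64_t stored = fraction * 100ULL;   /* stored * (100 * 2^45) */
        uint64_t target = 108113ULL << 45;     /* target * (100 * 2^45) */

        /* Prints 16, i.e. the stored value sits 16/(100 * 2^45),
           about 4.5e-15, below 1081.13 - the tail of the number above. */
        printf("1081.13 - stored = %llu / (100 * 2^45)\n",
               (unsigned long long)(target - stored));
        return 0;
    }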
So whatever routine the debugger uses does appropriate "rounding".
I hope this answer is of some help. Let me know if you have other questions.
Regards
Nick Hudson
Digital Software Partner Engineering