T.R | Title | User | Personal Name | Date | Lines |
---|
615.1 | Obscurity is not the sole property of mathematicians | SMURF::JMARTIN | Kill a Type A today: drive 55. | Fri Nov 21 1986 08:40 | 7 |
| Since this item hasn't gathered a reply in twenty-four hours, I guess I
don't have to feel embarrassed that I didn't understand it. It appears that
the request does not have to do with the accuracy of the math RTL. Does
it have to do with passing values to a function which are outside its domain
(e.g. arcsin(42))? Does it have to do with passing as a floating point
argument something which is not a floating point number (e.g. -0.0)?
--Joe
|
615.2 | My Brain Hurts! | CHOVAX::YOUNG | Back from the Shadows Again, | Sat Nov 22 1986 00:17 | 5 |
| I concur with .1. I may be a little bit thick-headed, but this is
pretty hard to follow. Perhaps a rehash with more examples for
us slow learners?
-- Barry
|
615.3 | Further explanation. | STAR::JONES | | Mon Nov 24 1986 11:09 | 32 |
| It is a difficult thing to describe. The bottom line is that I am
trying to validate as many code paths in the math RTL routines as I
can with the minimum amount of effort. An advantage that I have is
that I can use a program called Testgen to automatically collect sets
of testcases based on the VMS usage name associated with each parameter
type of an RTL routine. In other words, if the VMS usage name for
the MTH$ASIN X parameter is floating_point, I could have a set of
testcases by that name that could be reused for other floating_point
parameters, as long as I am careful to use only testcases that are
generally acceptable to all RTL routine floating_point parameters.

I do not know much about math routine testing. I would expect the
valuable kinds of testing to be testing of failure code paths via bad
user input and accuracy testing of success code paths. In the case of
accuracy testing, I would expect that sets of testcases could be
defined that would be generally usable across many of the RTL
routines. The purpose would not be certification-level testing but
simply making sure things were not seriously broken. For example,
when each new baselevel of VMS is generated, this set of tests could
be run first to prove that the quality of the math routines had not
regressed with the last set of changes checked in. The system service
regression tests are currently used this way, and the performance
group is now using those tests to collect rough baselevel-by-baselevel
graphs of system service performance. The same performance
information could be collected for each of the math routines,
providing an early warning system for performance degradation.

It is difficult to describe the need in a notes file, but if anyone is
able to provide this type of information, I am willing to meet with
them and explain the need further. I think there is a very large
potential payback in this area for a relatively small investment of
effort.
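For instance, a reusable set might be little more than a table keyed
by the usage name. The following C sketch is hypothetical (it is not
Testgen's actual format), and the values are picked to stay inside
every routine's domain, e.g. within [-1,1] so MTH$ASIN accepts them:

    #include <stdio.h>

    /* Hypothetical: one set of testcases, keyed by a VMS usage name;
       any routine parameter carrying that usage name can reuse the
       whole set. */
    struct usage_set {
        const char   *usage_name;  /* e.g. "floating_point" */
        const double *values;      /* acceptable to ALL such parameters */
        int           count;
    };

    static const double fp_values[] = { 0.0, 0.5, -0.5, 1.0, -1.0 };
    static const struct usage_set fp_set =
        { "floating_point", fp_values, 5 };

    int main(void)
    {
        /* Drive the MTH$ASIN X parameter from the shared set. */
        for (int i = 0; i < fp_set.count; i++)
            printf("MTH$ASIN testcase: %g\n", fp_set.values[i]);
        return 0;
    }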
--Larry
|
615.4 | | BEING::POSTPISCHIL | Always mount a scratch monkey. | Mon Nov 24 1986 13:10 | 19 |
| Each math routine is different from the others. Within a group, such as
the trigonometric functions, there may be some use in applying the same
values to tests of the different functions, but overall this would not
be the case.
You will need to analyze each function or group of functions to select
the test cases. For example, for trigonometric functions, you do the
obvious things: test at pi, 0, -pi, some points in between (both
fractions of pi and not), some special values, and numbers outside the
[-pi, pi] range, both shortly outside (but within the precision of the
routine) and large numbers for which precision does not permit a
significant result. Since you must expect different answers for sine,
cosine, and such things, I would expect the testing software to be
flexible on this point, and hence probably flexible about the inputs, so
I do not know why you are looking for test cases to cover all
functions, instead of handling functions on a more individual basis.
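For instance, a sine test might start from a list like the following
C sketch. The specific values are my own choices, not a vetted suite,
and it assumes a math library that defines M_PI:

    #include <math.h>
    #include <stdio.h>

    static const double sine_inputs[] = {
        0.0, M_PI, -M_PI,          /* the boundary points themselves   */
        M_PI / 4.0, -M_PI / 6.0,   /* fractions of pi in between       */
        1.0, -2.5,                 /* in-between points not tied to pi */
        3.15, -3.15,               /* shortly outside [-pi, pi]        */
        1.0e9, -1.0e9,             /* precision permits no significant
                                      result                           */
    };

    int main(void)
    {
        for (unsigned i = 0;
             i < sizeof sine_inputs / sizeof *sine_inputs; i++)
            printf("sin(%.17g) = %.17g\n",
                   sine_inputs[i], sin(sine_inputs[i]));
        return 0;
    }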
-- edp
|
615.5 | We have test drivers unique to each routine. | STAR::JONES | | Mon Nov 24 1986 17:23 | 12 |
| We have test drivers that deal with the specifics of a particular
routine under test. Each driver is responsible for checking the
results and verifying that no ill side effects have occurred, like
corrupted registers. It is also responsible for reporting errors. So
the SIN test driver would know what the correct answers are for
specific values if there turned out to be a common set of good
testcases, say, for the trig functions, and it could even expect
hard-wired results if needed. Again, my lack of knowledge about how
good testing is performed on math-type routines is what I am trying to
remedy, and this discussion is being very helpful to me.
--Larry
|
615.6 | FP testing | EAGLE1::DANTOWITZ | David .. DTN: 226-6957 | Tue Nov 25 1986 09:25 | 12 |
|
As a side note on this topic:
My group is involved with exercising the VAX architecture (AXE).
Part of this involves testing the FP instructions!
We are presently looking into how to test the FP instructions.
Part of this is how to choose operands (FP numbers). If anyone
has any comments on this topic we would be interested. This seems
to be related to what Larry was asking (but more tightly focused).
David
|
615.7 | Maybe Latin Squares methods would help | MODEL::YARBROUGH | | Tue Nov 25 1986 09:55 | 15 |
| There was an article in one of the 1984/85 ACM Technical Publications on
testing using Latin Squares to minimize the number of tests required -
right at the moment I can't come up with it, but I was impressed by the
article, and I think it is worth digging out. Try the ACM Communications
and the S/W Eng'g Notes. I am sure it was NOT in the ACM Journal.
I think you can generate a fair set of tests for FP arithmetic by using
randomly generated signs, exponents, and number of nonzero bits in the
fraction. Verifying that the results are correct is easy for all but the
largest size numbers - just repeat the calculations with greater precision.
The biggest can be checked with MAPLE or some other extended precision
package. The best thing about randomly generated tests is their
objectivity; they will generate tests for cases people might not consider.
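A quick C sketch of that recipe (the tested operation and the ranges
are arbitrary choices of mine; the reference computation simply
repeats the multiply in double, so a message indicates a bug):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Random sign, random exponent, and a random number of nonzero
       fraction bits, per the suggestion above. */
    static float random_operand(void)
    {
        int sign  = rand() & 1;
        int expo  = (rand() % 60) - 30;     /* stay well inside range */
        int nbits = 1 + rand() % 10;        /* nonzero fraction bits  */
        uint32_t frac = 0;
        for (int i = 0; i < nbits; i++)
            frac |= 1u << (rand() % 23);    /* scatter the set bits   */
        float x = 1.0f + (float)frac / 8388608.0f;  /* 1.frac (/2^23) */
        x = ldexpf(x, expo);
        return sign ? -x : x;
    }

    int main(void)
    {
        for (long i = 0; i < 1000000; i++) {
            float a = random_operand(), b = random_operand();
            double check = (double)a * (double)b;  /* greater precision */
            if (a * b != (float)check)
                printf("multiply disagrees: %g * %g\n", a, b);
        }
        return 0;
    }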
Lynn Yarbrough
|
615.8 | Some suggestions | SQM::HALLYB | Are all the good ones taken? | Tue Nov 25 1986 10:07 | 24 |
| * G-floating numbers that are too large for D-floating representation,
to catch any accidental conversions.
* The reverse: D-floating numbers that are too precise for G-floating
conversions, although this would require an accuracy check on the output,
and that may require more thought.
* Are there any ill-formed numbers Larry should consider? I think
there's a bit pattern (^X80000000?) that is an invalid F-floating
point number. If so, wouldn't Larry want to test the Math library
routines with that as an input argument (as well as for the LIB$POLYx
coefficients, etc.)? Are there any other illegal floating numbers?
* Are there any numbers that are always valid for input to all floating
routines? I'm tempted to suggest that something random like .654321
would happen to work in all cases, but maybe an RTL wiz knows fer sure.
* Another thought -- is it worthwhile to consider a "demand zero"
number? That is, an input parameter that is located at an address
that is currently mapped demand-zero, presumably causing a page
fault upon reference, but otherwise should result in the same output
as if a "hard" zero were supplied.
John
|
615.9 | replace every instruction with CALL UNTRACE | REGINA::OSMAN | and silos to fill before I feep, and silos to fill before I feep | Tue Nov 25 1986 14:21 | 40 |
| Rather than merely trying pi, 0, -pi, and several values in between, I
suggest a different method.
Use one of those code tracer schemes. The idea is, you replace every
instruction of the routine being traced with:
CALL UNTRACE
The UNTRACE routine replaces the "CALL UNTRACE" with the correct
instructions, unwinds the stack, and transfers back to where the
CALL UNTRACE was.
Then, you run what you believe to be a thorough set of cases for the
routine. Finally, you examine the instructions. If there are any
CALL UNTRACE's left in the code, that's an area you haven't tested
thoroughly yet.
For example, if you were testing a quadratic equation routine, let's
say the routine were working with this equation:

    s1, s2 = (-b +- sqrt(b^2 - 4*a*c))/(2*a)

The routine might look something like this:

    if a eql 0 then goto linear;
    if b^2 lss 4*a*c then goto imaginary;
    if b^2 eql 4*a*c then goto only_one_solution;
    s1 = (-b + sqrt(b^2 - 4*a*c))/(2*a);
    s2 = (-b - sqrt(b^2 - 4*a*c))/(2*a);
By implanting CALL UNTRACE's in the code, then testing various cases,
then checking for CALL UNTRACE's, I'd expect them to disappear entirely
only if we are thorough enough to test ALL of the following (concrete
coefficients for each case are sketched after the list):
o quadratic that's actually only linear
o quadratic that has imaginary roots
o quadratic that has only one root
o garden-variety quadratic
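For instance (coefficients of my own choosing; each row reaches
exactly one of the four branches):

    #include <stdio.h>

    /* Hypothetical coefficients exercising each branch class above; a
       coverage pass is complete only when all four have been run. */
    static const struct { double a, b, c; const char *kind; } cases[] = {
        { 0.0,  2.0, -4.0, "actually only linear" },  /* a = 0       */
        { 1.0,  0.0,  1.0, "imaginary roots"      },  /* b^2 < 4*a*c */
        { 1.0,  2.0,  1.0, "only one root"        },  /* b^2 = 4*a*c */
        { 1.0, -3.0,  2.0, "garden-variety"       },  /* b^2 > 4*a*c */
    };

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            printf("a=%g b=%g c=%g: %s\n",
                   cases[i].a, cases[i].b, cases[i].c, cases[i].kind);
        return 0;
    }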
/Eric
|
615.10 | | CLT::GILBERT | eager like a child | Tue Nov 25 1986 14:29 | 31 |
| In testing some (D-floating) MTH$ routines, I wrote a routine that was
similar to MTH$RANDOM -- it cycled through 2^21 different 'interesting'
numbers. The format of a D-floating number is:
FFFFFFFFFFFFFFFFSEEEEEEEEFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Where S is the sign bit, Es are exponent bits, and Fs are mantissa bits
(in a slightly strange, word-oriented order). The following shows the
bits as variables:
abbbbbbbbbbbbbbcdefghijklmnnnnno
pqqqqqqqqqqqqqqrsttttttttttttttu
There are 21 variables, and the routine cycled through all 2^21 combinations
of the bits. Before using the number, the code could check for reserved
operands, and avoid calling the routine with these (condition handling took
a considerable amount of time, so I wanted to avoid it).
This is useful for MTH$ functions having a single argument (like all the
important ones), but with multiple operands, there would be far too many
test cases. Also, notice that it misses many interesting cases -- like
values near a multiple of pi/4 for the trigonometrics.
What's useful for the math routines (besides this random-number testing)
is testing near 'branch-points' -- places where the code takes different
code paths. These might be found (automatically?) by tracing through the
routine, and using a binary search.
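In C, the cycler might look like the sketch below. The variable-to-bit
mapping is my reading of the diagram above, and the reserved-operand
test (sign set, exponent zero) follows the D-floating rules:

    #include <stdint.h>
    #include <stdio.h>

    /* Set bits hi..lo of a longword when 'on' (runs here are <= 14 bits). */
    static uint32_t run(int on, int hi, int lo)
    {
        return on ? (((1u << (hi - lo + 1)) - 1) << lo) : 0;
    }

    int main(void)
    {
        unsigned long usable = 0;
        for (uint32_t v = 0; v < (1u << 21); v++) {
            int b[21];
            for (int i = 0; i < 21; i++)
                b[i] = (v >> i) & 1;          /* variables a..u */
            /* first longword: fraction<31:16>, sign<15>, exp<14:7>,
               fraction<6:0> -- variables a..o */
            uint32_t w0 = run(b[0],31,31) | run(b[1],30,17)
                        | run(b[2],16,16) | run(b[3],15,15); /* d = sign */
            for (int e = 0; e < 8; e++)       /* e..l: exponent bits */
                w0 |= run(b[4+e], 14-e, 14-e);
            w0 |= run(b[12],6,6) | run(b[13],5,1) | run(b[14],0,0);
            /* second longword: more fraction bits -- variables p..u */
            uint32_t w1 = run(b[15],31,31) | run(b[16],30,17)
                        | run(b[17],16,16) | run(b[18],15,15)
                        | run(b[19],14,1)  | run(b[20],0,0);
            if ((w0 & 0x8000u) && !(w0 & 0x7F80u))
                continue;                     /* reserved operand: skip */
            usable++;
            (void)w1;                         /* passed along with w0 */
        }
        printf("%lu usable operands\n", usable);
        return 0;
    }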
- Gilbert
|
615.11 | Use PCA for that, IF it's an issue | MODEL::YARBROUGH | | Tue Nov 25 1986 14:55 | 21 |
| >... a different method.
>
>Use one of those code tracer schemes. The idea is, you replace every
>instruction of the routine being traced with:
>
> CALL UNTRACE
>
>The UNTRACE routine replaces the "CALL UNTRACE" with the correct
>instructions, unwinds the stack, and transfers back to where the
>CALL UNTRACE was.
That's what DEC/PCA does quite nicely, without building any kludges like
"UNTRACE". Further, it only places tracepoints where they will be useful,
at branches, and not in linear code.
But I think this misses the point, which is in determining whether the
approximating polynomials (or whatever) that are used to evaluate the
function are valid approximations to the function(s) over the set of reals
intended as arguments. This issue may not have anything to do with control
flow. It may have a great deal to do with the precision with which internal
constants are represented, number of terms in the polynomial(s), etc.
|
615.12 | More on Latin Squares | MODEL::YARBROUGH | | Wed Nov 26 1986 13:28 | 4 |
| Re .7: since someone asked for more info, I dug up the article on Orthogonal
Latin Squares: it's in ACM Communications of Oct 1985, p. 1054. The article
is mostly about testing an ADA compiler, but the general method is widely
applicable.
|
615.13 | W.J. Cody for transcendental accuracy | SMURF::JMARTIN | Picture Jimmy Carter's grin now. | Mon Dec 01 1986 12:46 | 12 |
| He wrote an excellent book covering algorithms for arithmetic and
transcendental functions. I can't remember the name, but I remember borrowing
it from the Marlboro technical library once. More to the point, he is an
author of some tests written in FORTRAN at Argonne Labs. The book may identify
those programs precisely enough so that you can locate them.
I had occasion to use the tests when I was designing code generator sequences
for a processor that had some--but not all--transcendental functions in
microcode. They provided excellent coverage of accuracy as well as of extreme
and pathological values in the functions' domains, including extreme domain
values which should work but are flagged as out of range by naive algorithms.
--Joe
|
615.14 | Floating point operand generation | EAGLE1::DANTOWITZ | Happy gnu year! | Tue Jan 06 1987 16:31 | 468 |
| The following memo discusses a selection process for floating point
operands. Its intended target is the VAX floating point instruction
set, but the basic idea could be applicable to the MTH$ routines with a
little tweaking.
There are bound to be comments both technical and editorial. Please
send these to me via mail or post relevant technical comments here.
TO: Floating point interest group
SUBJECT: Floating point operand selection
0.0 Introduction
------------
This memo discusses the generation of floating point
operands for architecture verification software. In general
it is not easy to invent good test values for floating point
operations. Here I will present a set of values that should
exercise floating point operations very well. The specifics
to be covered are boundary values for floating point
operations and how to generate them. This memo assumes that
the reader has a knowledge of the formats used to represent
VAX floating point values.
0.1 Boundaries
----------
The floating point primitive operations are ADD, SUB,
MUL, DIV, MOV, and CONVERT. Other operations (e.g. POLY or
CMP) are built from these primitives (from a functional point
of view). I will target the generated test values at these
primitives.
To test the floating point instructions we should
generate values that test the boundaries of the domain and
range of the instructions. Boundary testing is important
because more often than not this is where non-trivial errors
are likely to show up. The most apparent boundaries are
overflow and underflow. These are tested by using operands
that are or produce results close to or extended past the
storage limits for the data type.
There are many boundary values for floating point
numbers. A boundary value that is not at a computational
limit is a floating point number with a fraction of 0. The
stored value is normalized with the most significant bit not
represented, so a number with a fraction of 0 represents a
value that is a power of 2.
Other boundaries include special cases and values based
on the operation being computed. An example of a special case
is the representation for a floating point value of zero.
When the sign is 0 and the exponent is 0 the value of the
floating point number is zero, no matter what value is stored
in the fraction. It is important that the instructions are
tested using floating point 0 with zero and non-zero
fractions. (A floating point 0 with a non-zero fraction is
called a "dirty zero".) When the exponent is 0 and the sign
is 1 the value is reserved and should cause a reserved operand
fault (these values should be tested as well).
Some boundaries involve the nature of the instruction.
For example ADD instructions must propagate carries properly
and SUB instructions must propagate borrows properly.
Accuracy is also a factor since rounding, chopping,
normalizing, and aligning operands are necessary steps for
many of the floating point operations.
The remainder of this memo will present a simple method
for generating floating point operands that effectively and
efficiently test floating point operations.
0.2 Classes
-------
It is important to note that to exercise an architecture
we compare two implementations of that architecture. Bugs are
found ONLY when two machines produce different results for the
same set of operations. Bugs will be found quickest if we
execute as many different classes of operations as possible
and avoid re-testing the same class of operations.
Randomly selecting as few as 10 million cases from a space
as large as 2^32, 2^64, or 2^128 values will not
sufficiently exercise an architecture. Another problem with
randomly selected data is that many of the values test similar
facets of the architecture, thus a large percentage of
redundant testing is performed. Because our goal is to
exercise an architecture in a reasonable amount of time we
must find a method to generate an effective subset of the
total test space. This section presents what should be an
effective subset.
The first class of floating point operands has the
following form:
    +/- 2^i +/- 2^j        where i and j are integers.
Adding and subtracting such values will yield a large
variation in carries and borrows during computation. By
generating operands of this form many special cases of ADD and
SUB are generated in fewer cases than they would be if random
or enumerative generation were used. MUL, DIV, and CONVERT
also benefit from testing with such values. All of the
floating point errors found or learned of by Schryer [1] were
detected with values of the above form. To achieve a bit more
testing (for CONVERT instructions and precision problems etc.)
you can add or subtract another term to obtain values of the
form:
    +/- 2^i +/- 2^j +/- 2^k        where i, j, and k are integers.
Generating values of these two forms will produce floating
point values over a very wide range of values.
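As an illustration, values of the first form can be enumerated
directly in C with ldexp; the exponent range here is arbitrary and
would be chosen per data type:

    #include <math.h>
    #include <stdio.h>

    /* Enumerate operands of the form +/- 2^i +/- 2^j. */
    int main(void)
    {
        for (int i = -5; i <= 5; i++)
            for (int j = -5; j < i; j++)   /* j < i: each pair once */
                for (int s1 = -1; s1 <= 1; s1 += 2)
                    for (int s2 = -1; s2 <= 1; s2 += 2)
                        printf("%.20g\n",
                               s1 * ldexp(1.0, i) + s2 * ldexp(1.0, j));
        return 0;
    }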
There are many ways of generating floating point numbers
over a wide range of values, so why use this method? The two
forms above are combinations of simple terms. The terms are
powers of 2, the base that is used to represent floating point
values. Computing sums and differences of powers of 2 creates
a small set of bit patterns that are very effective in
exercising arithmetic operations. You may ask, "Why didn't I
include forms that are products of simple terms?" The answer
is that such products only change the exponent, not the
fraction value, so they are already represented by the two
forms above. Quotients of simple terms are likewise mostly
represented by the two forms above. Quotients that do not
have one of the two forms would unnecessarily complicate the
generation of floating point operands: such quotients are not
likely to surface any bugs not found with the two forms above,
and they are not easily generated.
One important boundary value is not included above. This
value is any floating point number with a fraction equal to
the normalized square root of 2. This value is important
because its square should result in a power of two. This will
test overflows and underflows and the intricacies of MUL
instructions. This value is important for DIV instructions
for similar reasons.
The values above are numerically valid floating point
test values, but they also have a simple grammatical
representation in binary. A benefit of generating values of
these forms is that they can be generated with a simple
algorithm that avoids any mathematical operations. Below are
the patterns for the normalized fractions of the floating
point operands generated using the above forms. Since the
values below are normalized, i and j are positive integers and
i < j. Note that the most significant (hidden) bit is not shown. (*
and +, used below, are a notation borrowed from the study of
language theory.)
        a)  0* 1 0*       =  1/2 + 2^-i
        b)  1* 0*         =  1 - 2^-i
        c)  0* 1 0* 1 0*  =  1/2 + 2^-i + 2^-j
        d)  0* 1+ 0*      =  1/2 + 2^-i - 2^-j
        e)  1* 0* 1 0*    =  1 - 2^-i + 2^-j
        f)  1* 0 1+ 0*    =  1 - 2^-i - 2^-j
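As a sketch of how mechanically these patterns generate, here is
pattern (d) as a C fraction-bit mask. The bit indexing is my own
convention (f1 is the stored bit just below the hidden bit), and the
low-byte XOR perturbation discussed below could be applied to its
result:

    #include <stdint.h>
    #include <stdio.h>

    #define NBITS 23   /* F_floating fraction width, for example */

    /* Pattern (d), 0* 1+ 0* = 1/2 + 2^-i - 2^-j: a run of (j - i)
       ones whose top bit is fraction bit i.
       Requires 0 < i < j <= NBITS. */
    static uint32_t pattern_d(int i, int j)
    {
        return ((1u << (j - i)) - 1) << (NBITS - j + 1);
    }

    int main(void)
    {
        for (int i = 1; i <= 3; i++)
            for (int j = i + 1; j <= 4; j++)
                printf("i=%d j=%d -> %06X\n",
                       i, j, (unsigned)pattern_d(i, j));
        return 0;
    }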
Recognizing the above patterns simplifies the generation
of floating point fractions. Note that many approaches (AXE
included) rely on just the opposite process for generating
floating point fractions: the test designer sees a pretty
pattern that he/she thinks will make a "nice" test and
proceeds to generate "nice" floating point operands. This
approach may uncover bugs - given enough time - but it lacks
the necessary structure and understanding that should guide
test design. It is more likely that properly constructed
tests will uncover bugs in less time than will "nicely"
constructed tests.
To further enhance our testing we can add or subtract a
small number to a value selected from the above patterns.
Rather than adding or subtracting it may be more efficient and
just as effective to XOR the least significant 8 bits of the
fraction with a byte selected randomly or from a list.
Modifying the least significant 8 bits of the fraction should
be enough to trip over any accuracy bugs.
With this information we can now select the fraction of a
floating point operand.
The sign has only two values and its selection presents
no difficult problem. The exponent is the final part of the
floating point operand that needs to be discussed. The
previous paragraphs presented the patterns in fractions of
numbers produced by sums and differences of powers of 2
without mention of the exponent values. Relevant boundaries
for the exponent are the minimum and maximum values, half the
maximum, and twice the minimum. These are magnitudes; each
should be tried with both + and - signs.
The exponent should be generated using the same type of
algorithm as the fraction. This will generate exponents at
the boundary values noted above. Since the storage for the
exponent is excess-N a slight change in the fraction
generation algorithm must be made: the most significant bit
of the exponent should be selected independently and the other
bits should be selected using the method presented for
generating fractions. Note that the exponents are not
normalized, so using patterns a - f will produce one- and
two-term results rather than two- and three-term results.
Negative exponents have the form 0X, where X is a series
of bits. The value of the exponent (in excess-N) is X - N.
If Y = X - N then ABS(Y) = (NOT X) + 1. In other words when
we generate the bits for X and the preceding bit is 0, then
the actual value of the exponent is not -X, but -[(NOT X)+1].
We could modify the algorithm to generate exponents that are
-X if the readers think it is a matter of concern. This
slight change may not be necessary or perhaps another
variation may be suggested.
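A worked check of that excess-N point in C (N = 128, the F/D exponent
bias; the field value is an arbitrary example of mine):

    #include <stdio.h>

    int main(void)
    {
        unsigned field = 0x75;            /* 01110101: MSB 0, negative */
        int value = (int)field - 128;     /* excess-N decode: X - N    */
        unsigned x = field & 0x7F;        /* the 7 bits called X       */
        unsigned mag = ((~x) & 0x7F) + 1; /* (NOT X) + 1 = ABS(value)  */
        printf("value = %d, magnitude = %u\n", value, mag); /* -11, 11 */
        return 0;
    }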
We now have the knowledge to generate floating point
operands from scratch.
0.3 Implementation notes and analysis
---------------------------------
This section applies the previous information and
presents some ideas and notes on implementation.
Not all floating point operands should be generated from
scratch. Instead, when there are two source operands for an
operation, some percentage of the second operands should be
derived from the first operand. (The operands should
also be swapped 50% of the time to remove any bias. [An
implementation detail.]) For example once the first operand
is generated we could use the same fraction and modify the
exponent and/or the sign to produce the second operand.
Modification might be changing a few bits or replacing an
entire section of the operand (the fraction, exponent, or
sign).
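A C sketch of that derivation step, using the a/b/c percentages
recommended in the appendix; the field layout and the two stub
generators are placeholders of mine:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct fp { unsigned sign; uint32_t expo, frac; }; /* split fields */

    static uint32_t new_exponent(void) { return rand() & 0xFF; }  /* stub */
    static uint32_t new_fraction(void) { return rand() & 0x7FFFFF; } /* stub */

    static struct fp derive(struct fp first)
    {
        struct fp second = first;         /* start from operand 1 */
        int roll = rand() % 100;
        if (roll < 70) {                  /* a% = 70: all-new fields */
            second.sign = rand() & 1;
            second.expo = new_exponent();
            second.frac = new_fraction();
        } else if (roll < 95) {           /* b% = 25: modify old value */
            second.frac ^= rand() & 0xFF; /* e.g. flip low fraction bits */
        }                                 /* c% = 5: reuse unchanged */
        return second;                    /* caller swaps 50% of the time */
    }

    int main(void)
    {
        struct fp one = { 0, 0x81, 0x400000 };
        struct fp two = derive(one);
        printf("sign=%u expo=%02X frac=%06X\n",
               two.sign, (unsigned)two.expo, (unsigned)two.frac);
        return 0;
    }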
When choosing values for exponents or fractions, the
patterns a - f should be chosen from uniformly. At this
point there is no reason to choose any particular pattern
more frequently than another. Besides these patterns we also
have the normalized square root of 2 (from here on referred to
as NSQRT2). The NSQRT2 can be placed in a list of patterns
that are not generated by a - f. You may wish to include 0 in
the list with NSQRT2 because only pattern b generates a 0 (no
1's) and if there are n bits in the pattern then b only
generates a 0 once out of n+1 possible values. An
implementation would use each of the six patterns a - f x% of the
time and the special list y% of the time, where 6 * x + y = 100.
As you looked at the fraction patterns a - f you may have
noticed similarities between some of them. (a) and (d) are
quite similar (a is a subset of d), but the two should not be
combined into one pattern because the patterns are equally
important. Combining them would simplify the software, but
would generate some values less frequently. Given a length of
n bits, if pattern (a) were merged with pattern (d) then
pattern (a) would be generated ~1/nth of the number times it
would be generated if it remained a separate pattern.
You may have also looked at the patterns and noted that
they are all subsets of the general expression:

        0^p 1^q 0^r 1^s 0^t
If you are thinking to yourself "Hey, that's neat, let's just
use this general expression for all the patterns!" then you
have missed the point of the first five pages in this memo.
Patterns a - f generate values that are important boundary
values. Yes, all the patterns generated by a - f are
generated by the general expression, but the general
expression also generates many other patterns. The table
below shows the statistics for generating the fraction part of
a floating point operand. The three columns contain: the
number of patterns generated by the general expression, the
number of patterns generated by the general expression that
are NOT generated by a - f, and the ratio of the first two
columns.
                F       17550      9975    0.57
                D      455126    363103    0.80
                G      367290    289100    0.79
                H     7160245   6420645    0.90
The third column shows the fraction of values generated by
the general expression that are not generated by patterns
a - f. Introducing such a large proportion of values with
unknown properties is not a valid procedure. Doing so
basically throws away all the work already done to generate
boundary values. Even for F_floating more than half of the
values generated with the general expression are not known
boundary values. Generating a large percentage of cases with
the general expression would not be much better than randomly
generated values and is not an optimal solution.
The general expression is not without value, and it should be
present as an entry in the list of special values (along with
NSQRT2 and 0). While we're at it we could also add the NOT of
the general expression. The last interesting entry we might
add is a totally random entry. Each time the random entry is
selected a new random value would be chosen.
To generate all these patterns one routine may be written
to produce the general expression above. When generating the
patterns a - f the user could specify restrictions on p, q, r,
s, and t (e.g. to generate (a): q=1, s=0, and t=0).
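Such a routine might look like this in C (a sketch of mine; the run
lengths must sum to the field width):

    #include <stdint.h>
    #include <stdio.h>

    /* Emit 0^p 1^q 0^r 1^s 0^t, most significant bit first.
       Requires p+q+r+s+t == field width (<= 32). */
    static uint32_t runs(int p, int q, int r, int s, int t)
    {
        int lens[5] = { p, q, r, s, t };
        uint32_t v = 0;
        for (int seg = 0; seg < 5; seg++)
            for (int i = 0; i < lens[seg]; i++)
                v = (v << 1) | (seg & 1);  /* segments 1 and 3 are ones */
        return v;
    }

    int main(void)
    {
        /* pattern (a): q = 1, s = 0, t = 0 -> a single 1 bit */
        printf("%06X\n", (unsigned)runs(4, 1, 18, 0, 0));  /* 0x040000 */
        return 0;
    }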
0.4 Conclusion
----------
This algorithm is both simple and easy to implement.
Once percentages are assigned to the various choices presented
above it is easy to answer questions like "Does it generate
Z?" and "How often does it generate Z?" The list of choices
for percentages is given in the appendix.
If later on someone discovers that some value Z is not
generated and that value should be generated then the
algorithm is easily augmented by placing the value Z in the
list of special values.
The principal idea of this memo (sums and differences of
simple terms) should be applied to integer data as well as
to floating point.
Generally we don't know what values should be included in
the optimal set of test values, but the values presented here
should make the job of architecture exercising easier and more
efficient for floating point instructions.
References

[1] Schryer, N. L. (1981). A Test of a Computer's Floating-Point
Arithmetic Unit. Computing Science Technical Report No. 89,
Bell Laboratories, Murray Hill, N.J.

[2] Du Croz, J. J. and Pont, M. W. (1985 Draft). FPV,
Floating-Point Validation Package, User's Guide. Numerical
Algorithms Group Ltd, Oxford, U.K.
APPENDIX
IMPLEMENTATION RECOMMENDATIONS
The percentages that must be assigned in this algorithm
include the following:
For selecting a new exponent or fraction:
x% of the time select pattern a
x% of the time select pattern b
x% of the time select pattern c
x% of the time select pattern d
x% of the time select pattern e
x% of the time select pattern f
y% of the time select a value from the special list
6 * x + y = 100
Note that the values in the special list may have more than
one entry to favor the selection of one over another.
My recommendation (by instinct) is let x=16 and y=4. (They
don't necessarily have to be integers.)
Deciding whether to generate a new exponent, fraction, or
sign, for a second source operand:
a% of the time generate a new exponent, fraction or sign.
b% of the time modify the old value.
c% of the time use the old value.
a + b + c = 100
My recommendation is let a=70, b=25, and c=5.
The last percentage is how often to vary the least
significant 8 bits of the fraction. My recommendation
is 5%.
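One way to realize the x = 16, y = 4 split in C (a sketch; the
special list's contents are as discussed in the memo body):

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns 0..5 for patterns a..f (16% each), or 6 for the special
       list (4%): NSQRT2, 0, the general expression, its NOT, random. */
    static int pick(void)
    {
        int roll = rand() % 100;
        return roll < 96 ? roll / 16 : 6;
    }

    int main(void)
    {
        int hits[7] = { 0 };
        for (int i = 0; i < 100000; i++)
            hits[pick()]++;
        for (int k = 0; k < 7; k++)
            printf("choice %d: %d\n", k, hits[k]);
        return 0;
    }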
FLOATING POINT REPRESENTATIONS.
        Format      fraction bits    exponent bits

        F                23                8
        D                55                8
        G                52               11
        H               112               15
THE NORMALIZED SQUARE ROOT OF 2
In hex (with the most significant bit shown)
1 6A09 E667 F3BC C908 B2FB 1366 EA95 7D3E 3ADE C175
|
615.15 | | RUSURE::EDP | Always mount a scratch monkey. | Tue Dec 29 1992 11:00 | 9 |
| The previous reply, .14, is labeled confidential. However, it is six
years old and does not seem to contain any strategic information. Does
anybody object to allowing its transmission outside the company?
The author is no longer with Digital; does anybody know what group
might "own" the information?
-- edp
|