TestFloat Release 2a General Documentation

John R. Hauser
1998 December 16


-------------------------------------------------------------------------------
Introduction

TestFloat is a program for testing that a floating-point implementation
conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
All standard operations supported by the system can be tested, except for
conversions to and from decimal.  Any of the following machine formats can
be tested:  single precision, double precision, extended double precision,
and/or quadruple precision.

TestFloat actually comes in two variants:  one is a program for testing
a machine's floating-point, and the other is a program for testing
the SoftFloat software implementation of floating-point.  (Information
about SoftFloat can be found at the SoftFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.)  The
version that tests SoftFloat is expected to be of interest only to people
compiling the SoftFloat sources.  However, because the two versions share
much in common, they are discussed together in all the TestFloat
documentation.

This document explains how to use the TestFloat programs.  It does not
attempt to define or explain the IEC/IEEE Standard for floating-point.
Details about the standard are available elsewhere.

The first release of TestFloat (Release 1) was called _FloatTest_.  The old
name has been obsolete for some time.


-------------------------------------------------------------------------------
Limitations

TestFloat's output is not always easily interpreted.  Detailed knowledge
of the IEC/IEEE Standard and its vagaries is needed to use TestFloat
responsibly.

TestFloat performs relatively simple tests designed to check the fundamental
soundness of the floating-point under test.  TestFloat may also at times
manage to find rarer and more subtle bugs, but it will probably only find
such bugs by accident.  Software that purposefully seeks out various kinds
of subtle floating-point bugs can be found through links posted on the
TestFloat Web page
(`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html').


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    What TestFloat Does
    Executing TestFloat
    Functions Tested by TestFloat
        Conversion Functions
        Standard Arithmetic Functions
        Remainder and Round-to-Integer Functions
        Comparison Functions
    Interpreting TestFloat Output
    Variations Allowed by the IEC/IEEE Standard
        Underflow
        NaNs
        Conversions to Integer
    TestFloat Options
        -help
        -list
        -level <num>
        -errors <num>
        -errorstop
        -forever
        -checkNaNs
        -precision32, -precision64, -precision80
        -nearesteven, -tozero, -down, -up
        -tininessbefore, -tininessafter
    Function Sets
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

TestFloat was written by John R. Hauser.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.


-------------------------------------------------------------------------------
What TestFloat Does

TestFloat tests a system's floating-point by comparing its behavior with
that of TestFloat's own internal floating-point implemented in software.
For each operation tested, TestFloat generates a large number of test cases,
made up of simple pattern tests intermixed with weighted random inputs.
The cases generated should be adequate for testing carry chain propagations,
plus the rounding of adds, subtracts, multiplies, and simple operations like
conversions.  TestFloat makes a point of checking all boundary cases of the
arithmetic, including underflows, overflows, invalid operations, subnormal
inputs, zeros (positive and negative), infinities, and NaNs.  For the
interesting operations like adds and multiplies, literally millions of test
cases can be checked.

TestFloat is not remarkably good at testing difficult rounding cases for
divisions and square roots.  It also makes no attempt to find bugs specific
to SRT divisions and the like (such as the infamous Pentium divide bug).
Software that tests for such failures can be found through links on the
TestFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.

NOTE!
It is the responsibility of the user to verify that the discrepancies
TestFloat finds actually represent faults in the system being tested.
Advice to help with this task is provided later in this document.
Furthermore, even if TestFloat finds no fault with a floating-point
implementation, that in no way guarantees that the implementation is bug-
free.

For each operation, TestFloat can test all four rounding modes required
by the IEC/IEEE Standard.  TestFloat verifies not only that the numeric
results of an operation are correct, but also that the proper floating-point
exception flags are raised.  All five exception flags are tested, including
the inexact flag.  TestFloat does not attempt to verify that the floating-
point exception flags are actually implemented as sticky flags.

For machines that implement extended double precision with rounding
precision control (such as Intel's 80x86), TestFloat can test the add,
subtract, multiply, divide, and square root functions at all the standard
rounding precisions.  The rounding precision can be set equivalent to single
precision, to double precision, or to the full extended double precision.
Rounding precision control can only be applied to the extended double-
precision format and only for the five standard arithmetic operations:  add,
subtract, multiply, divide, and square root.  Other functions can be tested
only at full precision.

As a rule, TestFloat is not particular about the bit patterns of NaNs that
appear as function results.  Any NaN is considered as good a result as
another.  This laxness can be overridden so that TestFloat checks for
particular bit patterns within NaN results.  See the sections _Variations_
_Allowed_by_the_IEC/IEEE_Standard_ and _TestFloat_Options_ for details.

Not all IEC/IEEE Standard functions are supported by all machines.
TestFloat can only test functions that exist on the machine.  But even if
a function is supported by the machine, TestFloat may still not be able
to test the function if it is not accessible through standard ISO C (the
programming language in which TestFloat is written) and if the person who
compiled TestFloat did not provide an alternate means for TestFloat to
invoke the machine function.

TestFloat compares a machine's floating-point against the SoftFloat software
implementation of floating-point, also written by me.  SoftFloat is built
into the TestFloat executable and does not need to be supplied by the user.
If SoftFloat is wanted for some other reason (to compile a new version
of TestFloat, for instance), it can be found separately at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.

For testing SoftFloat itself, the TestFloat package includes a program that
compares SoftFloat's floating-point against _another_ software floating-
point implementation.  The second software floating-point is simpler and
slower than SoftFloat, and is completely independent of SoftFloat.  Although
the second software floating-point cannot be guaranteed to be bug-free, the
chance that it would mimic any of SoftFloat's bugs is remote.  Consequently,
an error in one or the other floating-point version should appear as an
unexpected discrepancy between the two implementations.  Note that testing
SoftFloat should only be necessary when compiling a new TestFloat executable
or when compiling SoftFloat for some other reason.


-------------------------------------------------------------------------------
Executing TestFloat

TestFloat is intended to be executed from a command line interpreter.  The
`testfloat' program is invoked as follows:

    testfloat [<option>...] <function>

Here square brackets ([]) indicate optional items, while angled brackets
(<>) denote parameters to be filled in.

The `<function>' argument is a name like `float32_add' or `float64_to_int32'.
The complete list of function names is given in the next section,
_Functions_Tested_by_TestFloat_.  It is also possible to test all machine
functions in a single invocation.  The various options to TestFloat are
detailed in the section _TestFloat_Options_ later in this document.  If
`testfloat' is executed without any arguments, a summary of TestFloat usage
is written.

TestFloat will ordinarily test a function for all four rounding modes, one
after the other.  If the rounding mode is not supposed to have any effect
on the results--for instance, some operations do not require rounding--only
the nearest/even rounding mode is checked.  For extended double-precision
operations affected by rounding precision control, TestFloat also tests all
three rounding precision modes, one after the other.  Testing can be limited
to a single rounding mode and/or rounding precision with appropriate options
(see _TestFloat_Options_).

As it executes, TestFloat writes status information to the standard error
output, which should be the screen by default.  In order for this status to
be displayed properly, the standard error stream should not be redirected
to a file.  The discrepancies TestFloat finds are written to the standard
output stream, which is easily redirected to a file if desired.  Ordinarily,
the errors TestFloat reports and the ongoing status information appear
intermixed on the same screen.

The version of TestFloat for testing SoftFloat is called `testsoftfloat'.
It is invoked the same as `testfloat',

    testsoftfloat [<option>...] <function>

and operates similarly.


-------------------------------------------------------------------------------
Functions Tested by TestFloat

TestFloat tests all operations required by the IEC/IEEE Standard except for
conversions to and from decimal.  The operations are

-- Conversions among the supported floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all supported floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format.  (The floating-
   point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

Detailed information about these functions is given below.  In the function
names used by TestFloat, single precision is called `float32', double
precision is `float64', extended double precision is `floatx80', and
quadruple precision is `float128'.  TestFloat uses the same names for
functions as SoftFloat.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats and all conversions between
a floating-point format and 32-bit and 64-bit signed integers can be tested.
The conversion functions are:

   int32_to_float32      int64_to_float32
   int32_to_float64      int64_to_float64
   int32_to_floatx80     int64_to_floatx80
   int32_to_float128     int64_to_float128

   float32_to_int32      float32_to_int64
   float64_to_int32      float64_to_int64
   floatx80_to_int32     floatx80_to_int64
   float128_to_int32     float128_to_int64

   float32_to_float64    float32_to_floatx80   float32_to_float128
   float64_to_float32    float64_to_floatx80   float64_to_float128
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
   float128_to_float32   float128_to_float64   float128_to_floatx80

These conversions all round according to the current rounding mode as
necessary.  Conversions from a smaller to a larger floating-point format are
always exact and so require no rounding.  Conversions from 32-bit integers
to double precision or to any larger floating-point format are also exact,
and likewise for conversions from 64-bit integers to extended double and
quadruple precisions.

ISO/ANSI C requires that conversions to integers be rounded toward zero.
Such conversions can be tested with the following functions that ignore any
rounding mode:

   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

TestFloat assumes that conversions from floating-point to integer should
raise the invalid exception if the source value cannot be rounded to a
representable integer of the desired size (32 or 64 bits).  If such a
conversion overflows, TestFloat expects the largest integer with the same
sign as the operand to be returned.  If the floating-point operand is a NaN,
TestFloat allows either the largest positive or largest negative integer to
be returned.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions can be tested:

   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

The extended double-precision (`floatx80') functions can be rounded to
reduced precision under rounding precision control.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder and Round-to-Integer Functions

For each format, TestFloat can test the IEC/IEEE Standard remainder and
round-to-integer functions.  The remainder functions are:

   float32_rem
   float64_rem
   floatx80_rem
   float128_rem

The round-to-integer functions are:

   float32_round_to_int
   float64_round_to_int
   floatx80_round_to_int
   float128_round_to_int

The remainder functions are always exact and so do not require rounding.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions can be tested:

   float32_eq    float32_le    float32_lt
   float64_eq    float64_le    float64_lt
   floatx80_eq   floatx80_le   floatx80_lt
   float128_eq   float128_le   float128_lt

The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than
or equal'' (<=); and `lt' stands for ``less than'' (<).

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, for their part, are defined not to raise the invalid
exception on quiet NaNs.  For completeness, the following additional
functions can be tested if supported:

   float32_eq_signaling    float32_le_quiet    float32_lt_quiet
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception should be raised for any NaN input.
Likewise, the `quiet' comparison functions should be identical to their
counterparts except that the invalid exception is not raised for quiet NaNs.

Obviously, no comparison functions ever require rounding.  Any rounding mode
is ignored.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Interpreting TestFloat Output

The ``errors'' reported by TestFloat may or may not really represent errors
in the system being tested.  For each test case tried, TestFloat performs
the same floating-point operation for the two implementations being compared
and reports any unexpected difference in the results.  The two results could
differ for several reasons:

-- The IEC/IEEE Standard allows for some variation in how conforming
   floating-point behaves.  Two implementations can occasionally give
   different results without either being incorrect.

-- The trusted floating-point emulation could be faulty.  This could be
   because there is a bug in the way the emulation is coded, or because a
   mistake was made when the code was compiled for the current system.

-- TestFloat may not work properly, reporting discrepancies that do not
   exist.

-- Lastly, the floating-point being tested could actually be faulty.

It is the responsibility of the user to determine the causes for the
discrepancies TestFloat reports.  Making this determination can require
detailed knowledge about the IEC/IEEE Standard.  Assuming TestFloat is
working properly, any differences found will be due to either the first or
last of these reasons.  Variations in the IEC/IEEE Standard that could lead
to false error reports are discussed in the section _Variations_Allowed_by_
_the_IEC/IEEE_Standard_.

For each error (or apparent error) TestFloat reports, a line of text
is written to the default output.  If a line would be longer than 79
characters, it is divided.  The first part of each error line begins in the
leftmost column, and any subsequent ``continuation'' lines are indented with
a tab.

Each error reported by `testfloat' is of the form:

    <inputs>  soft: <output-from-emulation>  syst: <output-from-system>

The `<inputs>' are the inputs to the operation.  Each output is shown as a
pair:  the result value first, followed by the exception flags.  The `soft'
label stands for ``software'' (or ``SoftFloat''), while `syst' stands for
``system,'' the machine's floating-point.

For example, two typical error lines could be

    800.7FFF00  87F.000100  soft: 001.000000 ....x  syst: 001.000000 ...ux
    081.000004  000.1FFFFF  soft: 001.000000 ....x  syst: 001.000000 ...ux

In the first line, the inputs are `800.7FFF00' and `87F.000100'.  The
internal emulation result is `001.000000' with flags `....x', and the
system result is the same but with flags `...ux'.  All the items composed of
hexadecimal digits and a single period represent floating-point values (here
single precision).  These cases were reported as errors because the flag
results differ.
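
When many error lines have been saved to a file, it can help to separate
the cases where only the flags differ from those where the result values
differ.  The sketch below is one hypothetical way to do so in C; it is not
part of TestFloat, the function names are invented for illustration, and it
assumes that each error occupies a single line (continuation lines would
have to be joined first).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper:  finds `label' (such as "soft:" or "syst:") in an
   error line and copies the result value and the flags that follow it.
   `value' must hold at least 40 characters and `flags' at least 8.
   Returns 1 on success, 0 if the label or its two fields are missing. */
int parse_result(const char *line, const char *label,
                 char *value, char *flags)
{
    const char *p;

    assert(line != NULL && label != NULL);
    p = strstr(line, label);
    if (p == NULL) return 0;
    p += strlen(label);
    return sscanf(p, "%39s %7s", value, flags) == 2;
}

/* Returns 1 if the two results on the line have identical values but
   different exception flags, 0 otherwise (including on parse failure). */
int only_flags_differ(const char *line)
{
    char soft_value[40], soft_flags[8];
    char syst_value[40], syst_flags[8];

    if (!parse_result(line, "soft:", soft_value, soft_flags)) return 0;
    if (!parse_result(line, "syst:", syst_value, syst_flags)) return 0;
    return strcmp(soft_value, syst_value) == 0
        && strcmp(soft_flags, syst_flags) != 0;
}
```

Applied to either of the two example lines above, `only_flags_differ'
returns 1, since both results are `001.000000' and only the underflow flag
differs.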

In addition to the exception flags, there are seven data types that may
be represented.  Four are floating-point types:  single precision, double
precision, extended double precision, and quadruple precision.  The
remaining three types are 32-bit and 64-bit two's-complement integers and
Boolean values (the results of comparison operations).  Boolean values are
represented as a single character, either a `0' or a `1'.  32-bit integers
are written as 8 hexadecimal digits in two's-complement form.  Thus,
`FFFFFFFF' is -1, and `7FFFFFFF' is the largest positive 32-bit integer.
64-bit integers are the same except with 16 hexadecimal digits.
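
As a small illustration of the integer notation, the following sketch
produces the 8-digit form of a 32-bit integer.  The function name is
hypothetical and the code is not taken from TestFloat; it assumes a C
compiler that provides `<stdint.h>'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper:  writes the 8 hexadecimal digits of the
   two's-complement encoding of `value' into `buf', which must hold at
   least 9 characters. */
void format_int32_bits(int32_t value, char *buf)
{
    assert(buf != NULL);
    /* Conversion to uint32_t wraps modulo 2^32, which yields the
       two's-complement bit pattern of a negative value. */
    sprintf(buf, "%08lX", (unsigned long)(uint32_t)value);
}
```

With this helper, -1 is written as `FFFFFFFF' and the largest positive
32-bit integer as `7FFFFFFF', matching the examples above.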

Floating-point values are written in a correspondingly primitive form.
Double-precision values are represented by 16 hexadecimal digits that give
the raw bits of the floating-point encoding.  A period separates the 3rd and
4th hexadecimal digits to mark the division between the exponent bits and
fraction bits.  Some notable double-precision values include:

    000.0000000000000    +0
    3FF.0000000000000     1
    400.0000000000000     2
    7FF.0000000000000    +infinity

    800.0000000000000    -0
    BFF.0000000000000    -1
    C00.0000000000000    -2
    FFF.0000000000000    -infinity

    3FE.FFFFFFFFFFFFF    largest representable number preceding +1

The following categories are easily distinguished (assuming the `x's are not
all 0):

    000.xxxxxxxxxxxxx    positive subnormal (denormalized) numbers
    7FF.xxxxxxxxxxxxx    positive NaNs
    800.xxxxxxxxxxxxx    negative subnormal numbers
    FFF.xxxxxxxxxxxxx    negative NaNs
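
The double-precision notation and the categories just listed can be
reproduced with a few lines of C.  This is a sketch only, with hypothetical
function names that do not come from TestFloat; it works on the raw 64-bit
encoding and assumes a compiler that supports `<stdint.h>' and the `ll'
length modifier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper:  writes the raw double-precision encoding `bits'
   as 3 hexadecimal digits (sign and exponent), a period, and 13
   hexadecimal digits (fraction).  `buf' must hold at least 18
   characters. */
void format_float64_bits(uint64_t bits, char *buf)
{
    unsigned sign_and_exponent = (unsigned)(bits >> 52);      /* 12 bits */
    unsigned long long fraction = bits & 0xFFFFFFFFFFFFFULL;  /* 52 bits */

    assert(buf != NULL);
    sprintf(buf, "%03X.%013llX", sign_and_exponent, fraction);
}

/* A NaN has an all-ones exponent and a nonzero fraction. */
int is_float64_nan(uint64_t bits)
{
    return ((bits >> 52) & 0x7FF) == 0x7FF
        && (bits & 0xFFFFFFFFFFFFFULL) != 0;
}

/* A subnormal has a zero exponent and a nonzero fraction. */
int is_float64_subnormal(uint64_t bits)
{
    return ((bits >> 52) & 0x7FF) == 0
        && (bits & 0xFFFFFFFFFFFFFULL) != 0;
}
```

On a machine whose C `double' uses the IEC/IEEE double-precision encoding,
the raw bits of a value can be obtained by copying the `double' into a
`uint64_t' with `memcpy'.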

Quadruple-precision values are written the same except with 4 hexadecimal
digits for the sign and exponent and 28 for the fraction.  Notable values
include:

    0000.0000000000000000000000000000    +0
    3FFF.0000000000000000000000000000     1
    4000.0000000000000000000000000000     2
    7FFF.0000000000000000000000000000    +infinity

    8000.0000000000000000000000000000    -0
    BFFF.0000000000000000000000000000    -1
    C000.0000000000000000000000000000    -2
    FFFF.0000000000000000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    largest representable number
                                             preceding +1

Extended double-precision values are a little unusual in that the leading
significand bit is not hidden as with other formats.  When correctly
encoded, the leading significand bit of an extended double-precision value
will be 0 if the value is zero or subnormal, and will be 1 otherwise.
Hence, the same values listed above appear in extended double-precision as
follows (note the leading `8' digit in the significands):

    0000.0000000000000000    +0
    3FFF.8000000000000000     1
    4000.8000000000000000     2
    7FFF.8000000000000000    +infinity

    8000.0000000000000000    -0
    BFFF.8000000000000000    -1
    C000.8000000000000000    -2
    FFFF.8000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFF    largest representable number preceding +1

The representation of single-precision values is unusual for a different
reason.  Because the subfields of standard single-precision do not fall
on neat 4-bit boundaries, single-precision outputs are slightly perturbed.
These are written as 9 hexadecimal digits, with a period separating the 3rd
and 4th hexadecimal digits.  Broken out into bits, the 9 hexadecimal digits
cover the single-precision subfields as follows:

    x000 .... ....  .  .... .... .... .... .... ....    sign       (1 bit)
    .... xxxx xxxx  .  .... .... .... .... .... ....    exponent   (8 bits)
    .... .... ....  .  0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)

As shown in this schematic, the first hexadecimal digit contains only
the sign, and will be either `0' or `8'.  The next two digits give the
biased exponent as an 8-bit integer.  This is followed by a period and
6 hexadecimal digits of fraction.  The most significant hexadecimal digit
of the fraction can be at most a `7'.
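
The schematic translates directly into shifts and masks.  The sketch below
is a hypothetical helper, not code from TestFloat, that rewrites a raw
32-bit single-precision encoding in the 9-digit form; it assumes a compiler
that provides `<stdint.h>'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper:  writes the 9-digit form of the raw
   single-precision encoding `bits' into `buf', which must hold at least
   11 characters (3 digits, a period, 6 digits, and the terminator). */
void format_float32_bits(uint32_t bits, char *buf)
{
    unsigned sign          = (unsigned)(bits >> 31);           /* 1 bit   */
    unsigned exponent      = (unsigned)((bits >> 23) & 0xFF);  /* 8 bits  */
    unsigned long fraction = (unsigned long)(bits & 0x7FFFFF); /* 23 bits */

    assert(buf != NULL);
    /* The sign occupies the top bit of the first digit, so that digit is
       either `0' or `8'.  The fraction has only 23 bits, so its first
       digit is at most `7'. */
    sprintf(buf, "%X%02X.%06lX", sign << 3, exponent, fraction);
}
```

For instance, the usual 32-bit encoding of 1 is 3F800000 in hexadecimal,
which this helper writes as `07F.000000'.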

Notable single-precision values include:

    000.000000    +0
    07F.000000     1
    080.000000     2
    0FF.000000    +infinity

    800.000000    -0
    87F.000000    -1
    880.000000    -2
    8FF.000000    -infinity

    07E.7FFFFF    largest representable number preceding +1

Again, certain categories are easily distinguished (assuming the `x's are
not all 0):

    000.xxxxxx    positive subnormal (denormalized) numbers
    0FF.xxxxxx    positive NaNs
    800.xxxxxx    negative subnormal numbers
    8FF.xxxxxx    negative NaNs

Lastly, exception flag values are represented by five characters, one
character per flag.  Each flag is written as either a letter or a period
(`.') according to whether the flag was set or not by the operation.  A
period indicates the flag was not set.  The letter used to indicate a set
flag depends on the flag:

    v    invalid flag
    z    division-by-zero flag
    o    overflow flag
    u    underflow flag
    x    inexact flag

For example, the notation `...ux' indicates that the underflow and inexact
exception flags were set and that the other three flags (invalid, division-
by-zero, and overflow) were not set.  The exception flags are always shown
following the value returned as the result of the operation.
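
The flag notation can be imitated as follows.  This sketch is an
assumption-laden illustration rather than TestFloat's own code:  it uses
the `FE_' macros of the C99 header `<fenv.h>' to name the five flags, which
presumes a C99 compiler on a machine that defines all five macros, and the
function name is hypothetical.

```c
#include <assert.h>
#include <fenv.h>
#include <string.h>

/* Hypothetical helper:  writes the five-character flag notation for the
   set of exception flags in `flags' (a bitwise OR of <fenv.h> FE_
   macros).  `buf' must hold at least 6 characters. */
void format_exception_flags(int flags, char *buf)
{
    assert(buf != NULL);
    buf[0] = (flags & FE_INVALID)   ? 'v' : '.';
    buf[1] = (flags & FE_DIVBYZERO) ? 'z' : '.';
    buf[2] = (flags & FE_OVERFLOW)  ? 'o' : '.';
    buf[3] = (flags & FE_UNDERFLOW) ? 'u' : '.';
    buf[4] = (flags & FE_INEXACT)   ? 'x' : '.';
    buf[5] = '\0';
}
```

Passing `FE_UNDERFLOW | FE_INEXACT' yields `...ux', the notation used in
the example above.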

The output from `testsoftfloat' is of the same form, except that the results
are labeled `true' and `soft':

    <inputs>  true: <simple-software-result>  soft: <SoftFloat-result>

The ``true'' result is from the simpler, slower software floating-point,
which, although not necessarily correct, is more likely to be right than
the SoftFloat (`soft') result.


-------------------------------------------------------------------------------
Variations Allowed by the IEC/IEEE Standard

The IEC/IEEE Standard admits some variation among conforming
implementations.  Because TestFloat expects the two implementations being
compared to deliver bit-for-bit identical results under most circumstances,
this leeway in the standard can result in false errors being reported if
the two implementations do not make the same choices everywhere the standard
provides an option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
5691.1SrossUnderflow
5701.1Sross
5711.1SrossThe standard specifies that the underflow exception flag is to be raised
5721.1Srosswhen two conditions are met simultaneously:  (1) _tininess_ and (2) _loss_
5731.1Sross_of_accuracy_.  A result is tiny when its magnitude is nonzero yet smaller
5741.1Srossthan any normalized floating-point number.  The standard allows tininess to
5751.1Srossbe determined either before or after a result is rounded to the destination
5761.1Srossprecision.  If tininess is detected before rounding, some borderline cases
5771.1Srosswill be flagged as underflows even though the result after rounding actually
5781.1Srosslies within the normal floating-point range.  By detecting tininess after
5791.1Srossrounding, a system can avoid some unnecessary signaling of underflow.
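
The difference between the two conventions can be illustrated for single
precision with exact rational arithmetic.  The following Python sketch is
not part of TestFloat; the helper names are invented for illustration, and
only the nearest/even rounding mode is modeled.  It examines an exact result
lying just below the smallest normalized number.

```python
from fractions import Fraction

P = 24                             # significand bits of single precision
MIN_NORMAL = Fraction(1, 2**126)   # smallest normalized single-precision value


def round_to_precision(x, p=P):
    """Round positive x to p significant bits (nearest/even), with the
    exponent range treated as unbounded."""
    # Find e such that 2**e <= x < 2**(e+1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    return round(x / ulp) * ulp    # round() on a Fraction rounds ties to even


def tiny_before_rounding(x):
    return 0 < abs(x) < MIN_NORMAL


def tiny_after_rounding(x):
    return x != 0 and round_to_precision(abs(x)) < MIN_NORMAL


# Exact result 2**-126 - 2**-151, halfway between two 24-bit values.
x = Fraction(1, 2**126) - Fraction(1, 2**151)
print(tiny_before_rounding(x), tiny_after_rounding(x))   # True False
```

The value is tiny before rounding, but it rounds up to the smallest
normalized number, so a system that detects tininess after rounding would
not regard it as tiny and would not raise the underflow flag.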

Loss of accuracy occurs when the subnormal format is not sufficient
to represent an underflowed result accurately.  The standard allows
loss of accuracy to be detected either as an _inexact_result_ or as a
_denormalization_loss_.  If loss of accuracy is detected as an inexact
result, the underflow flag is raised whenever an underflowed quantity
cannot be exactly represented in the subnormal format (that is, whenever the
inexact flag is also raised).  A denormalization loss, on the other hand,
occurs only when the subnormal format is not able to represent the result
that would have been returned if the destination format had infinite range.
Some underflowed results are inexact but do not suffer a denormalization
loss.  By detecting loss of accuracy as a denormalization loss, a system can
once again avoid some unnecessary signaling of underflow.
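
The two notions of loss of accuracy can be compared in the same way.  The
Python sketch below is an illustration only, under the assumptions of single
precision and nearest/even rounding; the function names are hypothetical and
do not correspond to anything in TestFloat.

```python
from fractions import Fraction

P = 24                                  # significand bits, single precision
SUBNORMAL_STEP = Fraction(1, 2**149)    # spacing of single-precision subnormals


def round_nearest_even(x, step):
    """Round x to the nearest multiple of step, ties to even."""
    return round(x / step) * step


def loss_of_accuracy(x):
    """Classify a tiny positive exact result x (a Fraction below 2**-126).

    Returns the pair (inexact, denormalization_loss).
    """
    delivered = round_nearest_even(x, SUBNORMAL_STEP)
    # Result that would be returned with unbounded exponent range:
    # x rounded to P significant bits.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    unbounded = round_nearest_even(x, Fraction(2) ** (e - P + 1))
    inexact = delivered != x
    # Denormalization loss: the unbounded-range result is not itself a
    # multiple of the subnormal spacing.
    denormalization_loss = (unbounded / SUBNORMAL_STEP).denominator != 1
    return inexact, denormalization_loss


a = Fraction(2**30 + 1, 2**157)   # 2**-127 * (1 + 2**-30)
b = Fraction(2**23 + 1, 2**150)   # 2**-127 * (1 + 2**-23)
print(loss_of_accuracy(a))        # (True, False)
print(loss_of_accuracy(b))        # (True, True)
```

The first value is inexact yet suffers no denormalization loss, because
rounding it to 24 bits with unbounded range gives 2**-127, which the
subnormal format represents exactly.  The second value fits in 24 bits but
needs a bit below the subnormal spacing, so both conditions hold.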

The `-tininessbefore' and `-tininessafter' options can be used to control
whether TestFloat expects tininess on underflow to be detected before or
after rounding.  (See _TestFloat_Options_ below.)  One or the other is
selected as the default when TestFloat is compiled, but these command
options allow the default to be overridden.

Most (possibly all) systems detect loss of accuracy as an inexact result.
The current version of TestFloat can only test for this case.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NaNs

The IEC/IEEE Standard gives the floating-point formats a large number of
NaN encodings and specifies that NaNs are to be returned as results under
certain conditions.  However, the standard allows an implementation almost
complete freedom over _which_ NaN to return in each situation.

By default, TestFloat does not check the bit patterns of NaN results.  When
the result of an operation should be a NaN, any NaN is considered as good
as another.  This laxness can be overridden with the `-checkNaNs' option.
(See _TestFloat_Options_ below.)  In order for this option to be sensible,
TestFloat must have been compiled so that its internal floating-point
implementation (SoftFloat) generates the proper NaN results for the system
being tested.
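
The comparison rule described above might be sketched as follows for
single-precision results held as 32-bit patterns.  This Python fragment is
an illustration, not TestFloat's actual code; the function names and the
`check_nans' parameter (standing in for `-checkNaNs') are assumptions.

```python
def is_nan32(bits):
    """True if a 32-bit pattern encodes a single-precision NaN:
    exponent field all ones and a nonzero fraction field."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def results_agree(expected, actual, check_nans=False):
    """Compare two single-precision results given as 32-bit patterns."""
    if not check_nans and is_nan32(expected) and is_nan32(actual):
        return True                # any NaN is as good as another
    return expected == actual      # otherwise require identical bits


quiet_nan = 0x7FC00000
other_nan = 0xFFC00001             # different sign and payload
print(results_agree(quiet_nan, other_nan))                    # True
print(results_agree(quiet_nan, other_nan, check_nans=True))   # False
```

An infinity (0x7F800000) has a zero fraction field, so it is never accepted
in place of a NaN under either setting.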

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversions to Integer

Conversion of a floating-point value to an integer format will fail if the
source value is a NaN or if it is too large.  The IEC/IEEE Standard does not
specify what value should be returned as the integer result in these cases.
Moreover, according to the standard, the invalid exception can be raised or
an unspecified alternative mechanism may be used to signal such cases.

TestFloat assumes that conversions to integer will raise the invalid
exception if the source value cannot be rounded to a representable integer.
When the conversion overflows, TestFloat expects the largest integer with
the same sign as the operand to be returned.  If the floating-point operand
is a NaN, TestFloat allows either the largest positive or largest negative
integer to be returned.  The current version of TestFloat provides no means
to alter these conventions.
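
These conventions can be written down as a small model for conversion to a
32-bit integer.  The Python sketch below is a reading of the rules above and
not TestFloat's implementation.  It assumes nearest/even rounding, and it
takes ``largest negative integer'' to mean the most negative value, -2**31;
the function name is hypothetical.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def acceptable_int32_results(x):
    """Return (set of acceptable integer results, invalid flag expected)
    for converting the floating-point value x to a 32-bit integer."""
    if math.isnan(x):
        return {INT32_MAX, INT32_MIN}, True      # either extreme is allowed
    if math.isinf(x):
        return {INT32_MAX if x > 0 else INT32_MIN}, True
    n = round(x)                                 # nearest, ties to even
    if n > INT32_MAX:
        return {INT32_MAX}, True
    if n < INT32_MIN:
        return {INT32_MIN}, True
    return {n}, False


print(acceptable_int32_results(2.5))     # ({2}, False)
print(acceptable_int32_results(-3e9))    # ({-2147483648}, True)
```

A result reported by the system under test would be accepted if its integer
value is in the returned set and its invalid flag matches.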

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
TestFloat Options

The `testfloat' (and `testsoftfloat') program accepts several command
options.  If mutually contradictory options are given, the last one has
priority.
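
The last-one-wins rule can be pictured with the rounding mode options.  The
Python sketch below only models the rule; it is not how TestFloat parses its
command line, and the function name is invented for illustration.

```python
ROUNDING_OPTIONS = {"-nearesteven", "-tozero", "-down", "-up"}


def selected_rounding_option(args):
    """Return the rounding option in effect, or None if none was given."""
    selected = None
    for arg in args:
        if arg in ROUNDING_OPTIONS:
            selected = arg         # a later option overrides an earlier one
    return selected


print(selected_rounding_option(["-tozero", "-level", "2", "-up"]))   # -up
```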

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-help

The `-help' option causes a summary of program usage to be written, after
which the program exits.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-list

The `-list' option causes a list of testable functions to be written,
after which the program exits.  Some machines do not implement all of the
functions TestFloat can test, and it may not be possible to test functions
that are inaccessible from the C language.

The `testsoftfloat' program does not have this option.  All SoftFloat
functions can be tested by `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-level <num>

The `-level' option sets the level of testing.  The argument to `-level' can
be either 1 or 2.  The default is level 1.  Level 2 performs many more tests
than level 1.  Testing at level 2 can take as much as a day (even longer for
`testsoftfloat'), but can reveal bugs not found by level 1.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errors <num>

The `-errors' option instructs TestFloat to report no more than the
specified number of errors for any combination of function, rounding mode,
etc.  The argument to `-errors' must be a nonnegative decimal number.  Once
the specified number of error reports has been generated, TestFloat ends the
current test and begins the next one, if any.  The default is `-errors 20'.

Counterintuitively, `-errors 0' causes TestFloat to report every error it
finds.
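
The special meaning of zero is easy to overlook, so the stopping rule is
restated here as a short Python sketch.  The function and parameter names
are assumptions made for illustration and are not taken from the TestFloat
sources.

```python
DEFAULT_MAX_ERRORS = 20    # corresponds to the default `-errors 20'


def should_end_test(errors_reported, max_errors=DEFAULT_MAX_ERRORS):
    """True once the current test has produced its quota of error reports."""
    if max_errors == 0:
        return False                       # 0 means no limit at all
    return errors_reported >= max_errors


print(should_end_test(20))                 # True
print(should_end_test(1000000, 0))         # False
```

Thus an invocation such as `testfloat -errors 0 -all' reports every error
found for every function.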

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errorstop

The `-errorstop' option causes the program to exit after the first function
for which any errors are reported.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-forever

The `-forever' option causes a single operation to be repeatedly tested.
Only one rounding mode and/or rounding precision can be tested in a single
invocation.  If not specified, the rounding mode defaults to nearest/even.
For extended double-precision operations, the rounding precision defaults
to full extended double precision.  The testing level is set to 2 by this
option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-checkNaNs

The `-checkNaNs' option causes TestFloat to verify the bitwise correctness
of NaN results.  In order for this option to be sensible, TestFloat must
have been compiled so that its internal floating-point implementation
(SoftFloat) generates the proper NaN results for the system being tested.

This option is not available to `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-precision32, -precision64, -precision80

For extended double-precision functions affected by rounding precision
control, the `-precision32' option restricts testing to only the cases
in which rounding precision is equivalent to single precision.  The other
rounding precisions are not tested.  Likewise, the `-precision64' and
`-precision80' options fix the rounding precision to be equivalent to
double precision or extended double precision, respectively.  These options
are ignored for functions not affected by rounding precision control.

These options are not available if extended double precision is not
supported by the machine or if extended double-precision functions cannot be
tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-nearesteven, -tozero, -down, -up

The `-nearesteven' option restricts testing to only the cases in which the
rounding mode is nearest/even.  The other rounding modes are not tested.
Likewise, `-tozero' forces rounding to zero; `-down' forces rounding down;
and `-up' forces rounding up.  These options are ignored for functions that
are exact and thus do not round.
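
For reference, the four rounding modes differ as in the following Python
sketch, which rounds an exact rational value to a multiple of a given step.
It is an illustration only; the function name is hypothetical, and the mode
names are the option names with the leading hyphen removed.

```python
import math
from fractions import Fraction


def round_to_step(x, step, mode):
    """Round the exact value x to a multiple of step in the given mode."""
    q = x / step
    if mode == "nearesteven":
        n = round(q)           # nearest, ties to even
    elif mode == "tozero":
        n = math.trunc(q)      # discard the fraction
    elif mode == "down":
        n = math.floor(q)      # toward minus infinity
    elif mode == "up":
        n = math.ceil(q)       # toward plus infinity
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return n * step


ULP = Fraction(1, 2**23)       # single-precision spacing in [1, 2)
x = 1 + Fraction(1, 2**25)     # exact result one quarter of the way up
for mode in ("nearesteven", "tozero", "down", "up"):
    print(mode, round_to_step(x, ULP, mode) == 1)
```

For this positive value only `up' gives a result other than 1.  For the
negated value, `down' is the mode that moves away from -1, which is why the
directed modes must be tested with operands of both signs.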

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-tininessbefore, -tininessafter

The `-tininessbefore' option indicates that the system detects tininess
on underflow before rounding.  The `-tininessafter' option indicates that
tininess is detected after rounding.  TestFloat alters its expectations
accordingly.  These options override the default selected when TestFloat was
compiled.  Choosing the wrong one of these two options should cause error
reports for some (not all) functions.

For `testsoftfloat', these options operate more like the rounding precision
and rounding mode options, in that they restrict the tests performed by
`testsoftfloat'.  By default, `testsoftfloat' tests both cases for any
function for which there is a difference.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Function Sets

Just as TestFloat can test an operation for all four rounding modes in
sequence, multiple operations can be tested with a single invocation of
TestFloat.  Three sets are recognized:  `-all1', `-all2', and `-all'.  The
set `-all1' comprises all one-operand functions; `-all2' is all two-operand
functions; and `-all' is all functions.  A function set can be used in place
of a function name in the TestFloat command line, such as

    testfloat [<option>...] -all
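
The way a function set multiplies out into individual tests can be pictured
with the Python sketch below.  It is only a model:  the function table is a
small illustrative sample whose names and operand counts are assumptions
(the `-list' option gives the real list), and the helper names are
hypothetical.

```python
from itertools import product

# Illustrative sample only; maps a function name to its operand count.
FUNCTIONS = {
    "float32_sqrt": 1,
    "float32_to_int32": 1,
    "float32_add": 2,
    "float32_mul": 2,
}
ROUNDING_MODES = ["nearesteven", "tozero", "down", "up"]


def expand_function_set(name):
    """Expand `-all1', `-all2' or `-all'; pass other names through."""
    if name == "-all":
        return list(FUNCTIONS)
    if name in ("-all1", "-all2"):
        operands = int(name[-1])
        return [f for f, n in FUNCTIONS.items() if n == operands]
    return [name]


def planned_tests(name):
    """Every (function, rounding mode) pair implied by one command line."""
    return list(product(expand_function_set(name), ROUNDING_MODES))


print(expand_function_set("-all1"))
print(len(planned_tests("-all")))
```

In this model a rounding mode option such as `-tozero' would reduce the
second factor of the product to a single entry.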


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
TestFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.