TestFloat Release 2a General Documentation

John R. Hauser
1998 December 16


-------------------------------------------------------------------------------
Introduction

TestFloat is a program for testing that a floating-point implementation
conforms to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
All standard operations supported by the system can be tested, except for
conversions to and from decimal. Any of the following machine formats can
be tested: single precision, double precision, extended double precision,
and/or quadruple precision.

TestFloat actually comes in two variants: one is a program for testing
a machine's floating-point, and the other is a program for testing
the SoftFloat software implementation of floating-point. (Information
about SoftFloat can be found at the SoftFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.) The
version that tests SoftFloat is expected to be of interest only to people
compiling the SoftFloat sources. However, because the two versions share
much in common, they are discussed together in all the TestFloat
documentation.

This document explains how to use the TestFloat programs. It does not
attempt to define or explain the IEC/IEEE Standard for floating-point.
Details about the standard are available elsewhere.

The first release of TestFloat (Release 1) was called _FloatTest_. The old
name has been obsolete for some time.


-------------------------------------------------------------------------------
Limitations

TestFloat's output is not always easily interpreted.
Detailed knowledge of the IEC/IEEE Standard and its vagaries is needed to
use TestFloat responsibly.

TestFloat performs relatively simple tests designed to check the fundamental
soundness of the floating-point under test. TestFloat may also at times
manage to find rarer and more subtle bugs, but it will probably only find
such bugs by accident. Software that purposefully seeks out various kinds
of subtle floating-point bugs can be found through links posted on the
TestFloat Web page
(`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html').


-------------------------------------------------------------------------------
Contents

    Introduction
    Limitations
    Contents
    Legal Notice
    What TestFloat Does
    Executing TestFloat
    Functions Tested by TestFloat
        Conversion Functions
        Standard Arithmetic Functions
        Remainder and Round-to-Integer Functions
        Comparison Functions
    Interpreting TestFloat Output
    Variations Allowed by the IEC/IEEE Standard
        Underflow
        NaNs
        Conversions to Integer
    TestFloat Options
        -help
        -list
        -level <num>
        -errors <num>
        -errorstop
        -forever
        -checkNaNs
        -precision32, -precision64, -precision80
        -nearesteven, -tozero, -down, -up
        -tininessbefore, -tininessafter
    Function Sets
    Contact Information



-------------------------------------------------------------------------------
Legal Notice

TestFloat was written by John R. Hauser.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.
Although reasonable effort has been made to avoid it, THIS SOFTWARE MAY
CONTAIN FAULTS THAT WILL AT TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS
SOFTWARE IS RESTRICTED TO PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE
FULL RESPONSIBILITY FOR ANY AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING
FROM ITS USE.


-------------------------------------------------------------------------------
What TestFloat Does

TestFloat tests a system's floating-point by comparing its behavior with
that of TestFloat's own internal floating-point implemented in software.
For each operation tested, TestFloat generates a large number of test cases,
made up of simple pattern tests intermixed with weighted random inputs.
The cases generated should be adequate for testing carry chain propagations,
plus the rounding of adds, subtracts, multiplies, and simple operations like
conversions. TestFloat makes a point of checking all boundary cases of the
arithmetic, including underflows, overflows, invalid operations, subnormal
inputs, zeros (positive and negative), infinities, and NaNs. For the
interesting operations like adds and multiplies, literally millions of test
cases can be checked.

TestFloat is not remarkably good at testing difficult rounding cases for
divisions and square roots. It also makes no attempt to find bugs specific
to SRT divisions and the like (such as the infamous Pentium divide bug).
Software that tests for such failures can be found through links on the
TestFloat Web page,
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.

NOTE!
It is the responsibility of the user to verify that the discrepancies
TestFloat finds actually represent faults in the system being tested.
Advice to help with this task is provided later in this document.
Furthermore, even if TestFloat finds no fault with a floating-point
implementation, that in no way guarantees that the implementation is
bug-free.

For each operation, TestFloat can test all four rounding modes required
by the IEC/IEEE Standard. TestFloat verifies not only that the numeric
results of an operation are correct, but also that the proper floating-point
exception flags are raised. All five exception flags are tested, including
the inexact flag. TestFloat does not attempt to verify that the
floating-point exception flags are actually implemented as sticky flags.

For machines that implement extended double precision with rounding
precision control (such as Intel's 80x86), TestFloat can test the add,
subtract, multiply, divide, and square root functions at all the standard
rounding precisions. The rounding precision can be set equivalent to single
precision, to double precision, or to the full extended double precision.
Rounding precision control can only be applied to the extended
double-precision format and only for the five standard arithmetic
operations: add, subtract, multiply, divide, and square root. Other
functions can be tested only at full precision.

As a rule, TestFloat is not particular about the bit patterns of NaNs that
appear as function results. Any NaN is considered as good a result as
another. This laxness can be overridden so that TestFloat checks for
particular bit patterns within NaN results.
See the sections _Variations_Allowed_by_the_IEC/IEEE_Standard_ and
_TestFloat_Options_ for details.

Not all IEC/IEEE Standard functions are supported by all machines.
TestFloat can only test functions that exist on the machine. But even if
a function is supported by the machine, TestFloat may still not be able
to test the function if it is not accessible through standard ISO C (the
programming language in which TestFloat is written) and if the person who
compiled TestFloat did not provide an alternate means for TestFloat to
invoke the machine function.

TestFloat compares a machine's floating-point against the SoftFloat software
implementation of floating-point, also written by me. SoftFloat is built
into the TestFloat executable and does not need to be supplied by the user.
If SoftFloat is wanted for some other reason (to compile a new version
of TestFloat, for instance), it can be found separately at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.

For testing SoftFloat itself, the TestFloat package includes a program that
compares SoftFloat's floating-point against _another_ software
floating-point implementation. The second software floating-point is
simpler and slower than SoftFloat, and is completely independent of
SoftFloat. Although the second software floating-point cannot be guaranteed
to be bug-free, the chance that it would mimic any of SoftFloat's bugs is
remote. Consequently, an error in one or the other floating-point version
should appear as an unexpected discrepancy between the two implementations.
Note that testing SoftFloat should only be necessary when compiling a new
TestFloat executable or when compiling SoftFloat for some other reason.


-------------------------------------------------------------------------------
Executing TestFloat

TestFloat is intended to be executed from a command line interpreter. The
`testfloat' program is invoked as follows:

    testfloat [<option>...] <function>

Here square brackets ([]) indicate optional items, while angled brackets
(<>) denote parameters to be filled in.

The `<function>' argument is a name like `float32_add' or `float64_to_int32'.
The complete list of function names is given in the next section,
_Functions_Tested_by_TestFloat_. It is also possible to test all machine
functions in a single invocation. The various options to TestFloat are
detailed in the section _TestFloat_Options_ later in this document. If
`testfloat' is executed without any arguments, a summary of TestFloat usage
is written.

TestFloat will ordinarily test a function for all four rounding modes, one
after the other. If the rounding mode is not supposed to have any effect
on the results--for instance, some operations do not require rounding--only
the nearest/even rounding mode is checked. For extended double-precision
operations affected by rounding precision control, TestFloat also tests all
three rounding precision modes, one after the other. Testing can be limited
to a single rounding mode and/or rounding precision with appropriate options
(see _TestFloat_Options_).

As it executes, TestFloat writes status information to the standard error
output, which should be the screen by default. In order for this status to
be displayed properly, the standard error stream should not be redirected
to a file. The discrepancies TestFloat finds are written to the standard
output stream, which is easily redirected to a file if desired. Ordinarily,
the errors TestFloat reports and the ongoing status information appear
intermixed on the same screen.

The version of TestFloat for testing SoftFloat is called `testsoftfloat'.
It is invoked the same as `testfloat',

    testsoftfloat [<option>...] <function>

and operates similarly.


-------------------------------------------------------------------------------
Functions Tested by TestFloat

TestFloat tests all operations required by the IEC/IEEE Standard except for
conversions to and from decimal. The operations are

-- Conversions among the supported floating-point formats, and also between
   integers (32-bit and 64-bit) and any of the floating-point formats.

-- The usual add, subtract, multiply, divide, and square root operations
   for all supported floating-point formats.

-- For each format, the floating-point remainder operation defined by the
   IEC/IEEE Standard.

-- For each floating-point format, a ``round to integer'' operation that
   rounds to the nearest integer value in the same format. (The
   floating-point formats can hold integer values, of course.)

-- Comparisons between two values in the same floating-point format.

Detailed information about these functions is given below. In the function
names used by TestFloat, single precision is called `float32', double
precision is `float64', extended double precision is `floatx80', and
quadruple precision is `float128'. TestFloat uses the same names for
functions as SoftFloat.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversion Functions

All conversions among the floating-point formats and all conversions between
a floating-point format and 32-bit and 64-bit signed integers can be tested.
The conversion functions are:

    int32_to_float32      int64_to_float32
    int32_to_float64      int64_to_float64
    int32_to_floatx80     int64_to_floatx80
    int32_to_float128     int64_to_float128

    float32_to_int32      float32_to_int64
    float64_to_int32      float64_to_int64
    floatx80_to_int32     floatx80_to_int64
    float128_to_int32     float128_to_int64

    float32_to_float64    float32_to_floatx80   float32_to_float128
    float64_to_float32    float64_to_floatx80   float64_to_float128
    floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
    float128_to_float32   float128_to_float64   float128_to_floatx80

These conversions all round according to the current rounding mode as
necessary. Conversions from a smaller to a larger floating-point format are
always exact and so require no rounding. Conversions from 32-bit integers
to double precision or to any larger floating-point format are also exact,
and likewise for conversions from 64-bit integers to extended double and
quadruple precisions.
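
The exactness claims above can be checked with a short round-trip
experiment. The following is only an illustrative sketch, not TestFloat
code. The function names are hypothetical, and it assumes that C's `float'
and `double' are the IEC/IEEE single- and double-precision formats.

```c
#include <stdint.h>

/* Returns 1 if v survives a trip through single precision unchanged.
   Single precision has a 24-bit significand, so some 32-bit integers
   must be rounded.  The comparison is done in 64 bits because the
   rounded value can be as large as 2^31. */
int int32_survives_float32(int32_t v)
{
    volatile float f = (float)v;
    return (int64_t)f == (int64_t)v;
}

/* Returns 1 if v survives a trip through double precision unchanged.
   Double precision has a 53-bit significand, so every 32-bit integer
   is represented exactly. */
int int32_survives_float64(int32_t v)
{
    volatile double d = (double)v;
    return (int64_t)d == (int64_t)v;
}
```

With these helpers, 16777216 (2^24) survives single precision but 16777217
does not, while both survive double precision. This is why
`int32_to_float32' depends on the rounding mode and `int32_to_float64' does
not.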

ISO/ANSI C requires that conversions to integers be rounded toward zero.
Such conversions can be tested with the following functions that ignore any
rounding mode:

    float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
    float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
    floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
    float128_to_int32_round_to_zero   float128_to_int64_round_to_zero

TestFloat assumes that conversions from floating-point to integer should
raise the invalid exception if the source value cannot be rounded to a
representable integer of the desired size (32 or 64 bits). If such a
conversion overflows, TestFloat expects the largest integer with the same
sign as the operand to be returned. If the floating-point operand is a NaN,
TestFloat allows either the largest positive or largest negative integer to
be returned.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standard Arithmetic Functions

The following standard arithmetic functions can be tested:

    float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
    float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
    float128_add   float128_sub   float128_mul   float128_div   float128_sqrt

The extended double-precision (`floatx80') functions can be rounded to
reduced precision under rounding precision control.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Remainder and Round-to-Integer Functions

For each format, TestFloat can test the IEC/IEEE Standard remainder and
round-to-integer functions. The remainder functions are:

    float32_rem
    float64_rem
    floatx80_rem
    float128_rem

The round-to-integer functions are:

    float32_round_to_int
    float64_round_to_int
    floatx80_round_to_int
    float128_round_to_int

The remainder functions are always exact and so do not require rounding.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Comparison Functions

The following floating-point comparison functions can be tested:

    float32_eq    float32_le    float32_lt
    float64_eq    float64_le    float64_lt
    floatx80_eq   floatx80_le   floatx80_lt
    float128_eq   float128_le   float128_lt

The abbreviation `eq' stands for ``equal'' (=); `le' stands for ``less than
or equal'' (<=); and `lt' stands for ``less than'' (<).

The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
functions raise the invalid exception if either input is any kind of NaN.
The equal functions, for their part, are defined not to raise the invalid
exception on quiet NaNs.
For completeness, the following additional functions can be tested if
supported:

    float32_eq_signaling    float32_le_quiet    float32_lt_quiet
    float64_eq_signaling    float64_le_quiet    float64_lt_quiet
    floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
    float128_eq_signaling   float128_le_quiet   float128_lt_quiet

The `signaling' equal functions are identical to the standard functions
except that the invalid exception should be raised for any NaN input.
Likewise, the `quiet' comparison functions should be identical to their
counterparts except that the invalid exception is not raised for quiet NaNs.

Obviously, no comparison functions ever require rounding. Any rounding mode
is ignored.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Interpreting TestFloat Output

The ``errors'' reported by TestFloat may or may not really represent errors
in the system being tested. For each test case tried, TestFloat performs
the same floating-point operation for the two implementations being compared
and reports any unexpected difference in the results. The two results could
differ for several reasons:

-- The IEC/IEEE Standard allows for some variation in how conforming
   floating-point behaves. Two implementations can occasionally give
   different results without either being incorrect.

-- The trusted floating-point emulation could be faulty. This could be
   because there is a bug in the way the emulation is coded, or because a
   mistake was made when the code was compiled for the current system.

-- TestFloat may not work properly, reporting discrepancies that do not
   exist.

-- Lastly, the floating-point being tested could actually be faulty.

It is the responsibility of the user to determine the causes for the
discrepancies TestFloat reports. Making this determination can require
detailed knowledge about the IEC/IEEE Standard. Assuming TestFloat is
working properly, any differences found will be due to either the first or
last of these reasons. Variations in the IEC/IEEE Standard that could lead
to false error reports are discussed in the section
_Variations_Allowed_by_the_IEC/IEEE_Standard_.

For each error (or apparent error) TestFloat reports, a line of text
is written to the default output. If a line would be longer than 79
characters, it is divided. The first part of each error line begins in the
leftmost column, and any subsequent ``continuation'' lines are indented with
a tab.

Each error reported by `testfloat' is of the form:

    <inputs>  soft: <output-from-emulation>  syst: <output-from-system>

The `<inputs>' are the inputs to the operation. Each output is shown as a
pair: the result value first, followed by the exception flags. The `soft'
label stands for ``software'' (or ``SoftFloat''), while `syst' stands for
``system,'' the machine's floating-point.

For example, two typical error lines could be

    800.7FFF00  87F.000100  soft: 001.000000 ....x  syst: 001.000000 ...ux
    081.000004  000.1FFFFF  soft: 001.000000 ....x  syst: 001.000000 ...ux

In the first line, the inputs are `800.7FFF00' and `87F.000100'.
The internal emulation result is `001.000000' with flags `....x', and the
system result is the same but with flags `...ux'. All the items composed of
hexadecimal digits and a single period represent floating-point values (here
single precision). These cases were reported as errors because the flag
results differ.

In addition to the exception flags, there are seven data types that may
be represented. Four are floating-point types: single precision, double
precision, extended double precision, and quadruple precision. The
remaining three types are 32-bit and 64-bit two's-complement integers and
Boolean values (the results of comparison operations). Boolean values are
represented as a single character, either a `0' or a `1'. 32-bit integers
are written as 8 hexadecimal digits in two's-complement form. Thus,
`FFFFFFFF' is -1, and `7FFFFFFF' is the largest positive 32-bit integer.
64-bit integers are the same except with 16 hexadecimal digits.

Floating-point values are written in a correspondingly primitive form.
Double-precision values are represented by 16 hexadecimal digits that give
the raw bits of the floating-point encoding. A period separates the 3rd and
4th hexadecimal digits to mark the division between the exponent bits and
fraction bits.
Some notable double-precision values include:

    000.0000000000000    +0
    3FF.0000000000000     1
    400.0000000000000     2
    7FF.0000000000000    +infinity

    800.0000000000000    -0
    BFF.0000000000000    -1
    C00.0000000000000    -2
    FFF.0000000000000    -infinity

    3FE.FFFFFFFFFFFFF    largest representable number preceding +1

The following categories are easily distinguished (assuming the `x's are not
all 0):

    000.xxxxxxxxxxxxx    positive subnormal (denormalized) numbers
    7FF.xxxxxxxxxxxxx    positive NaNs
    800.xxxxxxxxxxxxx    negative subnormal numbers
    FFF.xxxxxxxxxxxxx    negative NaNs

Quadruple-precision values are written the same except with 4 hexadecimal
digits for the sign and exponent and 28 for the fraction. Notable values
include:

    0000.0000000000000000000000000000    +0
    3FFF.0000000000000000000000000000     1
    4000.0000000000000000000000000000     2
    7FFF.0000000000000000000000000000    +infinity

    8000.0000000000000000000000000000    -0
    BFFF.0000000000000000000000000000    -1
    C000.0000000000000000000000000000    -2
    FFFF.0000000000000000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFFFFFFFFFFFFFF    largest representable number
                                             preceding +1

Extended double-precision values are a little unusual in that the leading
significand bit is not hidden as with other formats. When correctly
encoded, the leading significand bit of an extended double-precision value
will be 0 if the value is zero or subnormal, and will be 1 otherwise.
Hence, the same values listed above appear in extended double-precision as
follows (note the leading `8' digit in the significands):

    0000.0000000000000000    +0
    3FFF.8000000000000000     1
    4000.8000000000000000     2
    7FFF.8000000000000000    +infinity

    8000.0000000000000000    -0
    BFFF.8000000000000000    -1
    C000.8000000000000000    -2
    FFFF.8000000000000000    -infinity

    3FFE.FFFFFFFFFFFFFFFF    largest representable number preceding +1

The representation of single-precision values is unusual for a different
reason. Because the subfields of standard single-precision do not fall
on neat 4-bit boundaries, single-precision outputs are slightly perturbed.
These are written as 9 hexadecimal digits, with a period separating the 3rd
and 4th hexadecimal digits. Broken out into bits, the 9 hexadecimal digits
cover the single-precision subfields as follows:

    x000 .... ....  .  .... .... .... .... .... ....    sign       (1 bit)
    .... xxxx xxxx  .  .... .... .... .... .... ....    exponent   (8 bits)
    .... .... ....  .  0xxx xxxx xxxx xxxx xxxx xxxx    fraction  (23 bits)

As shown in this schematic, the first hexadecimal digit contains only
the sign, and will be either `0' or `8'. The next two digits give the
biased exponent as an 8-bit integer. This is followed by a period and
6 hexadecimal digits of fraction. The most significant hexadecimal digit
of the fraction can be at most a `7'.
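
As an illustration of the single-precision notation just described, the
following sketch produces the 9-digit form from a C `float'. It is not
code from TestFloat. The function name `format_float32' is hypothetical,
and the sketch assumes that `float' is a 32-bit IEC/IEEE single-precision
value.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of TestFloat): write f in the 9-digit
   "SEE.FFFFFF" form.  buf must hold 11 bytes: 10 characters plus the
   terminating NUL. */
void format_float32(float f, char buf[11])
{
    uint32_t bits;
    unsigned sign, exponent;
    unsigned long fraction;

    memcpy(&bits, &f, sizeof bits);       /* raw encoding of the float   */
    sign     = (bits >> 31) ? 8u : 0u;    /* first digit holds only sign */
    exponent = (bits >> 23) & 0xFFu;      /* 8-bit biased exponent       */
    fraction = bits & 0x7FFFFFul;         /* 23-bit fraction             */

    snprintf(buf, 11, "%X%02X.%06lX", sign, exponent, fraction);
}
```

Because the fraction has only 23 bits, its leading hexadecimal digit can
never exceed `7', as noted above. Passing 1.0f to this helper yields
`07F.000000', and -2.0f yields `880.000000'.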

Notable single-precision values include:

    000.000000    +0
    07F.000000     1
    080.000000     2
    0FF.000000    +infinity

    800.000000    -0
    87F.000000    -1
    880.000000    -2
    8FF.000000    -infinity

    07E.7FFFFF    largest representable number preceding +1

Again, certain categories are easily distinguished (assuming the `x's are
not all 0):

    000.xxxxxx    positive subnormal (denormalized) numbers
    0FF.xxxxxx    positive NaNs
    800.xxxxxx    negative subnormal numbers
    8FF.xxxxxx    negative NaNs

Lastly, exception flag values are represented by five characters, one
character per flag. Each flag is written as either a letter or a period
(`.') according to whether the flag was set or not by the operation. A
period indicates the flag was not set. The letter used to indicate a set
flag depends on the flag:

    v    invalid flag
    z    division-by-zero flag
    o    overflow flag
    u    underflow flag
    x    inexact flag

For example, the notation `...ux' indicates that the underflow and inexact
exception flags were set and that the other three flags (invalid,
division-by-zero, and overflow) were not set. The exception flags are always
shown following the value returned as the result of the operation.
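
The five-character flag notation can be reproduced from the standard C
exception macros. The sketch below is an illustration only, not the code
TestFloat uses. The function name `format_flags' is hypothetical, and the
sketch assumes a C99 `<fenv.h>' that defines all five exception macros.

```c
#include <fenv.h>

/* Hypothetical helper (not part of TestFloat): write the flag string,
   for example "...ux", into buf.  excepts is a bitwise OR of the
   <fenv.h> exception macros, such as the value returned by
   fetestexcept(FE_ALL_EXCEPT). */
void format_flags(int excepts, char buf[6])
{
    buf[0] = (excepts & FE_INVALID)   ? 'v' : '.';
    buf[1] = (excepts & FE_DIVBYZERO) ? 'z' : '.';
    buf[2] = (excepts & FE_OVERFLOW)  ? 'o' : '.';
    buf[3] = (excepts & FE_UNDERFLOW) ? 'u' : '.';
    buf[4] = (excepts & FE_INEXACT)   ? 'x' : '.';
    buf[5] = '\0';
}
```

A typical use would clear the flags with `feclearexcept(FE_ALL_EXCEPT)',
perform one operation, and then pass `fetestexcept(FE_ALL_EXCEPT)' to the
helper.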

The output from `testsoftfloat' is of the same form, except that the results
are labeled `true' and `soft':

    <inputs>  true: <simple-software-result>  soft: <SoftFloat-result>

The ``true'' result is from the simpler, slower software floating-point,
which, although not necessarily correct, is more likely to be right than
the SoftFloat (`soft') result.


-------------------------------------------------------------------------------
Variations Allowed by the IEC/IEEE Standard

The IEC/IEEE Standard admits some variation among conforming
implementations. Because TestFloat expects the two implementations being
compared to deliver bit-for-bit identical results under most circumstances,
this leeway in the standard can result in false errors being reported if
the two implementations do not make the same choices everywhere the standard
provides an option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Underflow

The standard specifies that the underflow exception flag is to be raised
when two conditions are met simultaneously: (1) _tininess_ and (2)
_loss_of_accuracy_. A result is tiny when its magnitude is nonzero yet
smaller than any normalized floating-point number. The standard allows
tininess to be determined either before or after a result is rounded to the
destination precision. If tininess is detected before rounding, some
borderline cases will be flagged as underflows even though the result after
rounding actually lies within the normal floating-point range. By detecting
tininess after rounding, a system can avoid some unnecessary signaling of
underflow.
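
The difference between the two tininess rules can be seen with a small
model. The sketch below is not how TestFloat or SoftFloat implements the
check. It is a rough illustration for single-precision results only. The
function names are hypothetical, and it assumes that `float' and `double'
are the IEC/IEEE formats, that the exact result fits in a `double', that
the rounding mode is nearest/even, and that the value lies close enough to
the underflow threshold for the scaling by 2^64 to stay in range.

```c
#include <float.h>

/* Magnitude without relying on <math.h>. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Tininess before rounding: the exact result is nonzero and smaller in
   magnitude than the smallest normalized single-precision number. */
int tiny_before_rounding(double exact)
{
    return exact != 0.0 && magnitude(exact) < FLT_MIN;
}

/* Tininess after rounding: first round the exact result to 24 bits as
   though the exponent range were unbounded, then compare to FLT_MIN.
   Scaling by 2^64 (an exact operation) lifts values near FLT_MIN into
   the normal range, so the cast to float rounds to 24 bits without any
   subnormal effects. */
int tiny_after_rounding(double exact)
{
    volatile float rounded = (float)(exact * 0x1p64);
    double unscaled = (double)rounded * 0x1p-64;

    return unscaled != 0.0 && magnitude(unscaled) < FLT_MIN;
}
```

For the borderline value 0x1.ffffffcp-127, which lies just below 2^-126,
the first function reports tininess but the second does not, because the
value rounds up to 2^-126. For 0x1p-127 both functions report tininess.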

Loss of accuracy occurs when the subnormal format is not sufficient
to represent an underflowed result accurately. The standard allows
loss of accuracy to be detected either as an _inexact_result_ or as a
_denormalization_loss_. If loss of accuracy is detected as an inexact
result, the underflow flag is raised whenever an underflowed quantity
cannot be exactly represented in the subnormal format (that is, whenever the
inexact flag is also raised). A denormalization loss, on the other hand,
occurs only when the subnormal format is not able to represent the result
that would have been returned if the destination format had infinite range.
Some underflowed results are inexact but do not suffer a denormalization
loss. By detecting loss of accuracy as a denormalization loss, a system can
once again avoid some unnecessary signaling of underflow.

The `-tininessbefore' and `-tininessafter' options can be used to control
whether TestFloat expects tininess on underflow to be detected before or
after rounding. (See _TestFloat_Options_ below.) One or the other is
selected as the default when TestFloat is compiled, but these command
options allow the default to be overridden.

Most (possibly all) systems detect loss of accuracy as an inexact result.
The current version of TestFloat can only test for this case.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NaNs

The IEC/IEEE Standard gives the floating-point formats a large number of
NaN encodings and specifies that NaNs are to be returned as results under
certain conditions.
However, the standard allows an implementation almost
complete freedom over _which_ NaN to return in each situation.

By default, TestFloat does not check the bit patterns of NaN results. When
the result of an operation should be a NaN, any NaN is considered as good
as another. This laxness can be overridden with the `-checkNaNs' option.
(See _TestFloat_Options_ below.) In order for this option to be sensible,
TestFloat must have been compiled so that its internal floating-point
implementation (SoftFloat) generates the proper NaN results for the system
being tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Conversions to Integer

Conversion of a floating-point value to an integer format will fail if the
source value is a NaN or if it is too large. The IEC/IEEE Standard does not
specify what value should be returned as the integer result in these cases.
Moreover, according to the standard, the invalid exception can be raised or
an unspecified alternative mechanism may be used to signal such cases.

TestFloat assumes that conversions to integer will raise the invalid
exception if the source value cannot be rounded to a representable integer.
When the conversion overflows, TestFloat expects the largest integer with
the same sign as the operand to be returned. If the floating-point operand
is a NaN, TestFloat allows either the largest positive or largest negative
integer to be returned. The current version of TestFloat provides no means
to alter these conventions.
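
The conventions above can be sketched in Python as follows. This is an
illustration, not TestFloat's own code. The function name is hypothetical,
the integer format is assumed to be 32 bits, the rounding mode is assumed to
be nearest/even, and ``largest negative'' is read here as the most negative
representable integer (-2^31).

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def expected_int32(x):
    """Return (set of acceptable integer results, invalid flag expected)."""
    if math.isnan(x):
        # Either extreme is accepted for a NaN operand.
        return {INT32_MAX, INT32_MIN}, True
    if math.isinf(x):
        return ({INT32_MAX} if x > 0 else {INT32_MIN}), True
    r = round(x)  # Python rounds halfway cases to the even integer
    if r > INT32_MAX:
        return {INT32_MAX}, True
    if r < INT32_MIN:
        return {INT32_MIN}, True
    return {r}, False
```

For example, under these assumptions `expected_int32(3e9)' yields the single
acceptable result 2147483647 with the invalid exception expected, while a
NaN operand yields both extremes as acceptable results.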

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
TestFloat Options

The `testfloat' (and `testsoftfloat') program accepts several command
options. If mutually contradictory options are given, the last one has
priority.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-help

The `-help' option causes a summary of program usage to be written, after
which the program exits.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-list

The `-list' option causes a list of testable functions to be written,
after which the program exits. Some machines do not implement all of the
functions TestFloat can test, and it may not be possible to test functions
that are inaccessible from the C language.

The `testsoftfloat' program does not have this option. All SoftFloat
functions can be tested by `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-level <num>

The `-level' option sets the level of testing. The argument to `-level' can
be either 1 or 2. The default is level 1. Level 2 performs many more tests
than level 1. Testing at level 2 can take as much as a day (even longer for
`testsoftfloat'), but can reveal bugs not found by level 1.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errors <num>

The `-errors' option instructs TestFloat to report no more than the
specified number of errors for any combination of function, rounding mode,
etc. The argument to `-errors' must be a nonnegative decimal number. Once
the specified number of error reports has been generated, TestFloat ends the
current test and begins the next one, if any. The default is `-errors 20'.

Counterintuitively, `-errors 0' causes TestFloat to report every error it
finds.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-errorstop

The `-errorstop' option causes the program to exit after the first function
for which any errors are reported.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-forever

The `-forever' option causes a single operation to be repeatedly tested.
Only one rounding mode and/or rounding precision can be tested in a single
invocation. If not specified, the rounding mode defaults to nearest/even.
For extended double-precision operations, the rounding precision defaults
to full extended double precision. The testing level is set to 2 by this
option.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-checkNaNs

The `-checkNaNs' option causes TestFloat to verify the bitwise correctness
of NaN results. In order for this option to be sensible, TestFloat must
have been compiled so that its internal floating-point implementation
(SoftFloat) generates the proper NaN results for the system being tested.
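
The difference between the default comparison and the `-checkNaNs'
comparison can be sketched in Python as follows. This is an illustration
only, not TestFloat's own code. The function names are hypothetical, and
results are assumed to be single-precision bit patterns held as 32-bit
integers.

```python
def is_nan32(bits):
    """True if a 32-bit pattern encodes a single-precision NaN."""
    exponent_all_ones = (bits & 0x7F800000) == 0x7F800000
    fraction_nonzero = (bits & 0x007FFFFF) != 0
    return exponent_all_ones and fraction_nonzero


def results_match(expected_bits, actual_bits, check_nans=False):
    """Compare two results; any NaN matches any NaN unless check_nans is set."""
    if not check_nans and is_nan32(expected_bits) and is_nan32(actual_bits):
        return True
    return expected_bits == actual_bits
```

With these definitions, the patterns 0x7FC00000 and 0xFFC00001 (both NaNs)
match by default but not when `check_nans' is set, whereas an infinity
(0x7F800000) never matches a NaN.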

This option is not available to `testsoftfloat'.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-precision32, -precision64, -precision80

For extended double-precision functions affected by rounding precision
control, the `-precision32' option restricts testing to only the cases
in which rounding precision is equivalent to single precision. The other
rounding precision options are not tested. Likewise, the `-precision64'
and `-precision80' options fix the rounding precision to the equivalent of
double precision or extended double precision, respectively. These options
are ignored for functions not affected by rounding precision control.

These options are not available if extended double precision is not
supported by the machine or if extended double-precision functions cannot be
tested.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-nearesteven, -tozero, -down, -up

The `-nearesteven' option restricts testing to only the cases in which the
rounding mode is nearest/even. The other rounding mode options are not
tested. Likewise, `-tozero' forces rounding to zero; `-down' forces
rounding down; and `-up' forces rounding up. These options are ignored for
functions that are exact and thus do not round.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-tininessbefore, -tininessafter

The `-tininessbefore' option indicates that the system detects tininess
on underflow before rounding. The `-tininessafter' option indicates that
tininess is detected after rounding. TestFloat alters its expectations
accordingly.
These options override the default selected when TestFloat was
compiled. Choosing the wrong one of these two options should cause error
reports for some (not all) functions.

For `testsoftfloat', these options operate more like the rounding precision
and rounding mode options, in that they restrict the tests performed by
`testsoftfloat'. By default, `testsoftfloat' tests both cases for any
function for which there is a difference.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


-------------------------------------------------------------------------------
Function Sets

Just as TestFloat can test an operation for all four rounding modes in
sequence, multiple operations can be tested with a single invocation of
TestFloat. Three sets are recognized: `-all1', `-all2', and `-all'. The
set `-all1' comprises all one-operand functions; `-all2' is all two-operand
functions; and `-all' is all functions. A function set can be used in place
of a function name in the TestFloat command line, such as

    testfloat [<option>...] -all


-------------------------------------------------------------------------------
Contact Information

At the time of this writing, the most up-to-date information about
TestFloat and the latest release can be found at the Web page `http://
HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/TestFloat.html'.
