libc/softfloat/softfloat.txt

1.2  christos $NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $
1.1     bjh21
1.1     bjh21 SoftFloat Release 2a General Documentation
1.1     bjh21
1.1     bjh21 John R. Hauser
1.1     bjh21 1998 December 13
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Introduction
1.1     bjh21
1.1     bjh21 SoftFloat is a software implementation of floating-point that conforms to
1.1     bjh21 the IEC/IEEE Standard for Binary Floating-Point Arithmetic.  As many as four
1.1     bjh21 formats are supported:  single precision, double precision, extended double
1.1     bjh21 precision, and quadruple precision.  All operations required by the standard
1.1     bjh21 are implemented, except for conversions to and from decimal.
1.1     bjh21
1.1     bjh21 This document gives information about the types defined and the routines
1.1     bjh21 implemented by SoftFloat.  It does not attempt to define or explain the
1.1     bjh21 IEC/IEEE Floating-Point Standard.  Details about the standard are available
1.1     bjh21 elsewhere.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Limitations
1.1     bjh21
1.1     bjh21 SoftFloat is written in C and is designed to work with other C code.  The
1.1     bjh21 SoftFloat header files assume an ISO/ANSI-style C compiler.  No attempt
1.2  christos has been made to accommodate compilers that are not ISO-conformant.  In
1.1     bjh21 particular, the distributed header files will not be acceptable to any
1.1     bjh21 compiler that does not recognize function prototypes.
1.1     bjh21
1.1     bjh21 Support for the extended double-precision and quadruple-precision formats
1.1     bjh21 depends on a C compiler that implements 64-bit integer arithmetic.  If the
1.1     bjh21 largest integer format supported by the C compiler is 32 bits, SoftFloat is
1.1     bjh21 limited to only single and double precisions.  When that is the case, all
1.1     bjh21 references in this document to the extended double precision, quadruple
1.1     bjh21 precision, and 64-bit integers should be ignored.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Contents
1.1     bjh21
1.1     bjh21     Introduction
1.1     bjh21     Limitations
1.1     bjh21     Contents
1.1     bjh21     Legal Notice
1.1     bjh21     Types and Functions
1.1     bjh21     Rounding Modes
1.1     bjh21     Extended Double-Precision Rounding Precision
1.1     bjh21     Exceptions and Exception Flags
1.1     bjh21     Function Details
1.1     bjh21         Conversion Functions
1.1     bjh21         Standard Arithmetic Functions
1.1     bjh21         Remainder Functions
1.1     bjh21         Round-to-Integer Functions
1.1     bjh21         Comparison Functions
1.1     bjh21         Signaling NaN Test Functions
1.1     bjh21         Raise-Exception Function
1.1     bjh21     Contact Information
1.1     bjh21
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Legal Notice
1.1     bjh21
1.1     bjh21 SoftFloat was written by John R. Hauser.  This work was made possible in
1.1     bjh21 part by the International Computer Science Institute, located at Suite 600,
1.1     bjh21 1947 Center Street, Berkeley, California 94704.  Funding was partially
1.1     bjh21 provided by the National Science Foundation under grant MIP-9311980.  The
1.1     bjh21 original version of this code was written as part of a project to build
1.1     bjh21 a fixed-point vector processor in collaboration with the University of
1.1     bjh21 California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
1.1     bjh21
1.1     bjh21 THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
1.1     bjh21 has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
1.1     bjh21 TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
1.1     bjh21 PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
1.1     bjh21 AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Types and Functions
1.1     bjh21
1.1     bjh21 When 64-bit integers are supported by the compiler, the `softfloat.h' header
1.1     bjh21 file defines four types:  `float32' (single precision), `float64' (double
1.1     bjh21 precision), `floatx80' (extended double precision), and `float128'
1.1     bjh21 (quadruple precision).  The `float32' and `float64' types are defined in
1.1     bjh21 terms of 32-bit and 64-bit integer types, respectively, while the `float128'
1.1     bjh21 type is defined as a structure of two 64-bit integers, taking into account
1.1     bjh21 the byte order of the particular machine being used.  The `floatx80' type
1.1     bjh21 is defined as a structure containing one 16-bit and one 64-bit integer, with
1.1     bjh21 the machine's byte order again determining the order of the `high' and `low'
1.1     bjh21 fields.
1.1     bjh21
1.1     bjh21 When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
1.1     bjh21 header file defines only two types:  `float32' and `float64'.  Because
1.1     bjh21 ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
1.1     bjh21 the `float32' type is identified with an appropriate integer type.  The
1.1     bjh21 `float64' type is defined as a structure of two 32-bit integers, with the
1.1     bjh21 machine's byte order determining the order of the fields.
1.1     bjh21
1.1     bjh21 In either case, the types in `softfloat.h' are defined such that if a system
1.1     bjh21 implements the usual C `float' and `double' types according to the IEC/IEEE
1.1     bjh21 Standard, then the `float32' and `float64' types should be indistinguishable
1.1     bjh21 in memory from the native `float' and `double' types.  (On the other hand,
1.1     bjh21 when `float32' or `float64' values are placed in processor registers by
1.1     bjh21 the compiler, the type of registers used may differ from those used for the
1.1     bjh21 native `float' and `double' types.)
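
The memory-layout claim above can be illustrated with a small sketch.  It
assumes a machine whose native `float' is IEC/IEEE single precision.  The
names `float32_bits', `float32_from_native', and `native_from_float32' are
hypothetical stand-ins and are not part of SoftFloat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SoftFloat's float32; the real typedef lives in softfloat.h. */
typedef uint32_t float32_bits;

/* Copy the bytes of a native float into the integer-based type. */
float32_bits float32_from_native(float f)
{
    float32_bits a;
    assert(sizeof a == sizeof f);   /* holds when float is 32 bits wide */
    memcpy(&a, &f, sizeof a);
    return a;
}

/* Copy the bytes of the integer-based type back into a native float. */
float native_from_float32(float32_bits a)
{
    float f;
    assert(sizeof a == sizeof f);
    memcpy(&f, &a, sizeof f);
    return f;
}
```

On such a machine, `float32_from_native(1.0f)' yields the bit pattern
0x3F800000.  Using memcpy rather than a pointer cast avoids aliasing problems.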
1.1     bjh21
1.1     bjh21 SoftFloat implements the following arithmetic operations:
1.1     bjh21
1.1     bjh21 -- Conversions among all the floating-point formats, and also between
1.1     bjh21    integers (32-bit and 64-bit) and any of the floating-point formats.
1.1     bjh21
1.1     bjh21 -- The usual add, subtract, multiply, divide, and square root operations
1.1     bjh21    for all floating-point formats.
1.1     bjh21
1.1     bjh21 -- For each format, the floating-point remainder operation defined by the
1.1     bjh21    IEC/IEEE Standard.
1.1     bjh21
1.1     bjh21 -- For each floating-point format, a ``round to integer'' operation that
1.1     bjh21    rounds to the nearest integer value in the same format.  (The floating-
1.1     bjh21    point formats can hold integer values, of course.)
1.1     bjh21
1.1     bjh21 -- Comparisons between two values in the same floating-point format.
1.1     bjh21
1.1     bjh21 The only functions required by the IEC/IEEE Standard that are not provided
1.1     bjh21 are conversions to and from decimal.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Rounding Modes
1.1     bjh21
1.1     bjh21 All four rounding modes prescribed by the IEC/IEEE Standard are implemented
1.1     bjh21 for all operations that require rounding.  The rounding mode is selected
1.1     bjh21 by the global variable `float_rounding_mode'.  This variable may be set
1.1     bjh21 to one of the values `float_round_nearest_even', `float_round_to_zero',
1.1     bjh21 `float_round_down', or `float_round_up'.  The rounding mode is initialized
1.1     bjh21 to nearest/even.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Extended Double-Precision Rounding Precision
1.1     bjh21
1.1     bjh21 For extended double precision (`floatx80') only, the rounding precision
1.1     bjh21 of the standard arithmetic operations is controlled by the global variable
1.1     bjh21 `floatx80_rounding_precision'.  The operations affected are:
1.1     bjh21
1.1     bjh21    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
1.1     bjh21
1.1     bjh21 When `floatx80_rounding_precision' is set to its default value of 80, these
1.1     bjh21 operations are rounded (as usual) to the full precision of the extended
1.1     bjh21 double-precision format.  Setting `floatx80_rounding_precision' to 32
1.1     bjh21 or to 64 causes the operations listed to be rounded to reduced precision
1.1     bjh21 equivalent to single precision (`float32') or to double precision
1.1     bjh21 (`float64'), respectively.  When rounding to reduced precision, additional
1.1     bjh21 bits in the result significand beyond the rounding point are set to zero.
1.1     bjh21 The consequences of setting `floatx80_rounding_precision' to a value other
1.1     bjh21 than 32, 64, or 80 are not specified.  Operations other than the ones listed
1.1     bjh21 above are not affected by `floatx80_rounding_precision'.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Exceptions and Exception Flags
1.1     bjh21
1.1     bjh21 All five exception flags required by the IEC/IEEE Standard are
1.1     bjh21 implemented.  Each flag is stored as a unique bit in the global variable
1.1     bjh21 `float_exception_flags'.  The positions of the exception flag bits within
1.1     bjh21 this variable are determined by the bit masks `float_flag_inexact',
1.1     bjh21 `float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
1.1     bjh21 `float_flag_invalid'.  The exception flags variable is initialized to all 0,
1.1     bjh21 meaning no exceptions.
1.1     bjh21
1.1     bjh21 An individual exception flag can be cleared with the statement
1.1     bjh21
1.1     bjh21     float_exception_flags &= ~ float_flag_<exception>;
1.1     bjh21
1.1     bjh21 where `<exception>' is the appropriate name.  To raise a floating-point
1.1     bjh21 exception, the SoftFloat function `float_raise' should be used (see below).
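
A common pattern built on the statement above is to test a flag and clear it
in one step.  The following is a minimal sketch only.  The flag values and
the helper `test_and_clear_flag' are assumptions made for illustration; the
real masks come from `softfloat.h' and may differ between targets.

```c
#include <assert.h>

/* Placeholder masks; the real values are defined by softfloat.h. */
enum {
    float_flag_inexact   = 1,
    float_flag_underflow = 2,
    float_flag_overflow  = 4,
    float_flag_divbyzero = 8,
    float_flag_invalid   = 16
};

/* Stand-in for the SoftFloat global of the same name. */
int float_exception_flags = 0;

/* Hypothetical helper: report whether `flag' was set, then clear it. */
int test_and_clear_flag(int flag)
{
    int was_set;
    assert(flag != 0);
    was_set = (float_exception_flags & flag) != 0;
    float_exception_flags &= ~flag;   /* the clearing statement from the text */
    return was_set;
}
```

Clearing one flag leaves the other flags untouched, because each flag
occupies its own bit.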
1.1     bjh21
1.1     bjh21 In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
1.1     bjh21 for underflow either before or after rounding.  The choice is made by
1.1     bjh21 the global variable `float_detect_tininess', which can be set to either
1.1     bjh21 `float_tininess_before_rounding' or `float_tininess_after_rounding'.
1.1     bjh21 Detecting tininess after rounding is better because it results in fewer
1.1     bjh21 spurious underflow signals.  The other option is provided for compatibility
1.1     bjh21 with some systems.  Like most systems, SoftFloat always detects loss of
1.1     bjh21 accuracy for underflow as an inexact result.
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Function Details
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Conversion Functions
1.1     bjh21
1.1     bjh21 All conversions among the floating-point formats are supported, as are all
1.1     bjh21 conversions between a floating-point format and 32-bit and 64-bit signed
1.1     bjh21 integers.  The complete set of conversion functions is:
1.1     bjh21
1.1     bjh21    int32_to_float32      int64_to_float32
1.1     bjh21    int32_to_float64      int64_to_float64
1.1     bjh21    int32_to_floatx80     int64_to_floatx80
1.1     bjh21    int32_to_float128     int64_to_float128
1.1     bjh21
1.1     bjh21    float32_to_int32      float32_to_int64
1.1     bjh21    float64_to_int32      float64_to_int64
1.1     bjh21    floatx80_to_int32     floatx80_to_int64
1.1     bjh21    float128_to_int32     float128_to_int64
1.1     bjh21
1.1     bjh21    float32_to_float64    float32_to_floatx80   float32_to_float128
1.1     bjh21    float64_to_float32    float64_to_floatx80   float64_to_float128
1.1     bjh21    floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
1.1     bjh21    float128_to_float32   float128_to_float64   float128_to_floatx80
1.1     bjh21
1.1     bjh21 Each conversion function takes one operand of the appropriate type and
1.1     bjh21 returns one result.  Conversions from a smaller to a larger floating-point
1.1     bjh21 format are always exact and so require no rounding.  Conversions from 32-bit
1.1     bjh21 integers to double precision and larger formats are also exact, and likewise
1.1     bjh21 for conversions from 64-bit integers to extended double and quadruple
1.1     bjh21 precisions.
1.1     bjh21
1.1     bjh21 Conversions from floating-point to integer raise the invalid exception if
1.1     bjh21 the source value cannot be rounded to a representable integer of the desired
1.1     bjh21 size (32 or 64 bits).  If the floating-point operand is a NaN, the largest
1.1     bjh21 positive integer is returned.  Otherwise, if the conversion overflows, the
1.1     bjh21 largest integer with the same sign as the operand is returned.
1.1     bjh21
1.1     bjh21 On conversions to integer, if the floating-point operand is not already an
1.1     bjh21 integer value, the operand is rounded according to the current rounding
1.1     bjh21 mode as specified by `float_rounding_mode'.  Because C (and perhaps other
1.1     bjh21 languages) requires that conversions to integers be rounded toward zero, the
1.1     bjh21 following functions are provided for improved speed and convenience:
1.1     bjh21
1.1     bjh21    float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
1.1     bjh21    float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
1.1     bjh21    floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
1.1     bjh21    float128_to_int32_round_to_zero   float128_to_int64_round_to_zero
1.1     bjh21
1.1     bjh21 These variant functions ignore `float_rounding_mode' and always round toward
1.1     bjh21 zero.
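
The result rules for the round-toward-zero conversions can be sketched on a
native `double' as follows.  This is not SoftFloat's code.  The function
`to_int32_saturating' and its `invalid' output parameter are hypothetical,
and the sketch reports the invalid condition through that parameter instead
of through `float_exception_flags'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the documented result rules for a
   conversion to a 32-bit integer with rounding toward zero. */
int32_t to_int32_saturating(double x, int *invalid)
{
    assert(invalid != 0);
    *invalid = 0;
    if (x != x) {                    /* NaN: largest positive integer */
        *invalid = 1;
        return INT32_MAX;
    }
    if (x >= 2147483648.0) {         /* too large: largest positive integer */
        *invalid = 1;
        return INT32_MAX;
    }
    if (x <= -2147483649.0) {        /* too small: largest negative integer */
        *invalid = 1;
        return INT32_MIN;
    }
    return (int32_t)x;               /* in range: C truncates toward zero */
}
```

The bounds are chosen so that every value that truncates to a representable
32-bit integer is accepted, including values between -2147483649 and
-2147483648.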
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Standard Arithmetic Functions
1.1     bjh21
1.1     bjh21 The following standard arithmetic functions are provided:
1.1     bjh21
1.1     bjh21    float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
1.1     bjh21    float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
1.1     bjh21    floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
1.1     bjh21    float128_add   float128_sub   float128_mul   float128_div   float128_sqrt
1.1     bjh21
1.1     bjh21 Each function takes two operands, except for `sqrt' which takes only one.
1.1     bjh21 The operands and result are all of the same type.
1.1     bjh21
1.1     bjh21 Rounding of the extended double-precision (`floatx80') functions is affected
1.1     bjh21 by the `floatx80_rounding_precision' variable, as explained above in the
1.1     bjh21 section _Extended_Double-Precision_Rounding_Precision_.
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Remainder Functions
1.1     bjh21
1.1     bjh21 For each format, SoftFloat implements the remainder function according to
1.1     bjh21 the IEC/IEEE Standard.  The remainder functions are:
1.1     bjh21
1.1     bjh21    float32_rem
1.1     bjh21    float64_rem
1.1     bjh21    floatx80_rem
1.1     bjh21    float128_rem
1.1     bjh21
1.1     bjh21 Each remainder function takes two operands.  The operands and result are all
1.1     bjh21 of the same type.  Given operands x and y, the remainder functions return
1.1     bjh21 the value x - n*y, where n is the integer closest to x/y.  If x/y is exactly
1.1     bjh21 halfway between two integers, n is the even integer closest to x/y.  The
1.1     bjh21 remainder functions are always exact and so require no rounding.
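
The definition x - n*y can be made concrete with a small sketch on native
`double' values.  The helper `rem_sketch' is hypothetical and is far simpler
than the SoftFloat routines: it assumes y is nonzero, that x/y fits easily in
a `long', and that the computed quotient x/y is exact or at least not close
to a halfway case.  (The quotient is itself rounded, so the sketch is not
reliable in general.)

```c
#include <assert.h>

/* Hypothetical helper: x - n*y with n the integer nearest x/y, ties to even.
   Only meant for small operands whose quotient is computed exactly. */
double rem_sketch(double x, double y)
{
    double q, frac;
    long n;

    assert(y != 0.0);
    q = x / y;
    assert(q > -2147483647.0 && q < 2147483647.0);

    n = (long)q;              /* truncate toward zero */
    frac = q - (double)n;     /* in (-1, 1), same sign as q */

    if (frac > 0.5 || (frac == 0.5 && n % 2 != 0))
        n += 1;               /* round up, or break the tie toward even */
    else if (frac < -0.5 || (frac == -0.5 && n % 2 != 0))
        n -= 1;               /* round down, or break the tie toward even */

    return x - (double)n * y;
}
```

For example, with x = 7 and y = 2 the quotient 3.5 is a tie, n becomes the
even integer 4, and the remainder is 7 - 4*2 = -1.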
1.1     bjh21
1.1     bjh21 Depending on the relative magnitudes of the operands, the remainder
1.1     bjh21 functions can take considerably longer to execute than the other SoftFloat
1.1     bjh21 functions.  This is inherent in the remainder operation itself and is not a
1.1     bjh21 flaw in the SoftFloat implementation.
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Round-to-Integer Functions
1.1     bjh21
1.1     bjh21 For each format, SoftFloat implements the round-to-integer function
1.1     bjh21 specified by the IEC/IEEE Standard.  The functions are:
1.1     bjh21
1.1     bjh21    float32_round_to_int
1.1     bjh21    float64_round_to_int
1.1     bjh21    floatx80_round_to_int
1.1     bjh21    float128_round_to_int
1.1     bjh21
1.1     bjh21 Each function takes a single floating-point operand and returns a result of
1.1     bjh21 the same type.  (Note that the result is not an integer type.)  The operand
1.1     bjh21 is rounded to an exact integer according to the current rounding mode, and
1.1     bjh21 the resulting integer value is returned in the same floating-point format.
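
The effect of the four rounding modes on a round-to-integer operation can be
sketched on a native `double'.  The enumeration values below are
placeholders that reuse the names from this document; the real values are
defined in `softfloat.h'.  The function `round_to_int_sketch' is
hypothetical, handles only finite operands of moderate size, and does not
preserve the sign of a zero result.

```c
#include <assert.h>

/* Placeholder values; the real constants are defined by softfloat.h. */
enum {
    float_round_nearest_even,
    float_round_to_zero,
    float_round_down,
    float_round_up
};

/* Hypothetical helper: round x to an integer value, returned as a double. */
double round_to_int_sketch(double x, int mode)
{
    long long t;
    double r, frac;

    assert(x > -4.0e15 && x < 4.0e15);   /* keeps the cast below in range */
    t = (long long)x;                    /* truncation toward zero */
    r = (double)t;
    frac = x - r;                        /* exact; same sign as x */

    switch (mode) {
    case float_round_to_zero:
        break;
    case float_round_down:               /* toward minus infinity */
        if (frac < 0.0) r -= 1.0;
        break;
    case float_round_up:                 /* toward plus infinity */
        if (frac > 0.0) r += 1.0;
        break;
    default:                             /* nearest, ties to even */
        if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
            r += 1.0;
        else if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
            r -= 1.0;
        break;
    }
    return r;
}
```

Under this sketch, 2.5 rounds to 2 in nearest/even mode while 3.5 rounds to
4, and -2.5 rounds to -3 when rounding down but to -2 when rounding up.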
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Comparison Functions
1.1     bjh21
1.1     bjh21 The following floating-point comparison functions are provided:
1.1     bjh21
1.1     bjh21    float32_eq    float32_le    float32_lt
1.1     bjh21    float64_eq    float64_le    float64_lt
1.1     bjh21    floatx80_eq   floatx80_le   floatx80_lt
1.1     bjh21    float128_eq   float128_le   float128_lt
1.1     bjh21
1.1     bjh21 Each function takes two operands of the same type and returns a 1 or 0
1.1     bjh21 representing either _true_ or _false_.  The abbreviation `eq' stands for
1.1     bjh21 ``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands
1.1     bjh21 for ``less than'' (<).
1.1     bjh21
1.1     bjh21 The standard greater-than (>), greater-than-or-equal (>=), and not-equal
1.1     bjh21 (!=) functions are easily obtained using the functions provided.  The
1.1     bjh21 not-equal function is just the logical complement of the equal function.
1.1     bjh21 The greater-than-or-equal function is identical to the less-than-or-equal
1.1     bjh21 function with the operands reversed; and the greater-than function can be
1.1     bjh21 obtained from the less-than function in the same way.
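
The derivations just described can be written as thin wrappers.  In the
sketch below, the three primitives are stand-ins built on the native `float'
type (assumed to be IEC/IEEE single precision) so that the example is
self-contained; they do not model exception flags.  The wrapper names
`float32_ne', `float32_ge', and `float32_gt' are hypothetical and are not
provided by SoftFloat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t float32;   /* stand-in for the softfloat.h type */

/* Reinterpret the bits as a native float (helper for the stand-ins). */
static float to_native(float32 a)
{
    float f;
    assert(sizeof f == sizeof a);
    memcpy(&f, &a, sizeof f);
    return f;
}

/* Stand-ins for the SoftFloat primitives; exception flags are not modelled. */
int float32_eq(float32 a, float32 b) { return to_native(a) == to_native(b); }
int float32_le(float32 a, float32 b) { return to_native(a) <= to_native(b); }
int float32_lt(float32 a, float32 b) { return to_native(a) <  to_native(b); }

/* Hypothetical derived comparisons, following the text above. */
int float32_ne(float32 a, float32 b) { return !float32_eq(a, b); }
int float32_ge(float32 a, float32 b) { return float32_le(b, a); }
int float32_gt(float32 a, float32 b) { return float32_lt(b, a); }
```

Reversing the operands, rather than negating the result, keeps the derived
functions false when either operand is a NaN.  Not-equal is the exception: it
is the complement of equal and so is true for a NaN operand.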
1.1     bjh21
1.1     bjh21 The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
1.1     bjh21 functions raise the invalid exception if either input is any kind of NaN.
1.1     bjh21 The equal functions, on the other hand, are defined not to raise the invalid
1.1     bjh21 exception on quiet NaNs.  For completeness, SoftFloat provides the following
1.1     bjh21 additional functions:
1.1     bjh21
1.1     bjh21    float32_eq_signaling    float32_le_quiet    float32_lt_quiet
1.1     bjh21    float64_eq_signaling    float64_le_quiet    float64_lt_quiet
1.1     bjh21    floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
1.1     bjh21    float128_eq_signaling   float128_le_quiet   float128_lt_quiet
1.1     bjh21
1.1     bjh21 The `signaling' equal functions are identical to the standard functions
1.1     bjh21 except that the invalid exception is raised for any NaN input.  Likewise,
1.1     bjh21 the `quiet' comparison functions are identical to their counterparts except
1.1     bjh21 that the invalid exception is not raised for quiet NaNs.
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Signaling NaN Test Functions
1.1     bjh21
1.1     bjh21 The following functions test whether a floating-point value is a signaling
1.1     bjh21 NaN:
1.1     bjh21
1.1     bjh21    float32_is_signaling_nan
1.1     bjh21    float64_is_signaling_nan
1.1     bjh21    floatx80_is_signaling_nan
1.1     bjh21    float128_is_signaling_nan
1.1     bjh21
1.1     bjh21 The functions take one operand and return 1 if the operand is a signaling
1.1     bjh21 NaN and 0 otherwise.
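
For single precision, such a test might look like the sketch below.  It
assumes the widely used convention in which a NaN is quiet when the most
significant fraction bit is 1 and signaling when that bit is 0.  The
convention is target-dependent, so the SoftFloat sources for a given machine
may test differently.  The name `is_signaling_nan32' is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the 32-bit pattern is a signaling NaN under the
   "most significant fraction bit clear" convention, 0 otherwise. */
int is_signaling_nan32(uint32_t a)
{
    uint32_t exponent  = (a >> 23) & 0xFF;    /* 8 exponent bits */
    uint32_t fraction  = a & 0x007FFFFF;      /* 23 fraction bits */
    uint32_t quiet_bit = a & 0x00400000;      /* top fraction bit */

    /* A NaN has an all-ones exponent and a nonzero fraction. */
    return exponent == 0xFF && fraction != 0 && quiet_bit == 0;
}
```

Infinity has an all-ones exponent but a zero fraction, so it is correctly
reported as not being a NaN of either kind.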
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21 Raise-Exception Function
1.1     bjh21
1.1     bjh21 SoftFloat provides a function for raising floating-point exceptions:
1.1     bjh21
1.1     bjh21     float_raise
1.1     bjh21
1.1     bjh21 The function takes a mask indicating the set of exceptions to raise.  No
1.1     bjh21 result is returned.  In addition to setting the specified exception flags,
1.1     bjh21 this function may cause a trap or abort appropriate for the current system.
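
A minimal stand-in for such a function is sketched below.  It only sets the
requested flags; any trap or abort behavior is left out.  The flag values
are placeholders chosen for illustration, and the real `float_raise' for a
given system may do more than this.

```c
#include <assert.h>

/* Placeholder masks; the real values are defined by softfloat.h. */
enum {
    float_flag_inexact   = 1,
    float_flag_underflow = 2,
    float_flag_overflow  = 4,
    float_flag_divbyzero = 8,
    float_flag_invalid   = 16,
    float_flag_all       = 31     /* hypothetical: union of the five masks */
};

/* Stand-in for the SoftFloat global of the same name. */
int float_exception_flags = 0;

/* Sketch of a flag-only float_raise: OR the mask into the sticky flags. */
void float_raise(int mask)
{
    assert((mask & ~float_flag_all) == 0);   /* reject unknown bits */
    float_exception_flags |= mask;
}
```

Because the flags are accumulated with a bitwise OR, raising an exception
that is already flagged changes nothing, and flags stay set until they are
cleared explicitly.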
1.1     bjh21
1.1     bjh21 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.1     bjh21
1.1     bjh21
1.1     bjh21 -------------------------------------------------------------------------------
1.1     bjh21 Contact Information
1.1     bjh21
1.1     bjh21 At the time of this writing, the most up-to-date information about
1.1     bjh21 SoftFloat and the latest release can be found at the Web page
1.1     bjh21 `http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.
1.1     bjh21
1.1     bjh21