How to Read Floating Point Numbers

dc.contributor.author: Clinger, William D.
dc.date.accessioned: 2023-09-18T21:33:37Z
dc.date.available: 2023-09-18T21:33:37Z
dc.date.issued: 1990-06-05
dc.description: 13 pages [en_US]
dc.description.abstract: Consider the problem of converting decimal scientific notation for a number into the best binary floating point approximation to that number, for some fixed precision. This problem cannot be solved using arithmetic of any fixed precision. Hence the IEEE Standard for Binary Floating-Point Arithmetic does not require the result of such a conversion to be the best approximation. This paper presents an efficient algorithm that always finds the best approximation. The algorithm uses a few extra bits of precision to compute an IEEE-conforming approximation while testing an intermediate result to determine whether the approximation could be other than the best. If the approximation might not be the best, then the best approximation is determined by a few simple operations on multiple-precision integers, where the precision is determined by the input. When using 64 bits of precision to compute IEEE double precision results, the algorithm avoids higher-precision arithmetic over 99% of the time. The input problem considered by this paper is the inverse of an output problem considered by Steele and White: Given a binary floating point number, print a correctly rounded decimal representation of it using the smallest number of digits that will allow the number to be read without loss of accuracy. The Steele and White algorithm assumes that the input problem is solved; an imperfect solution to the input problem, as allowed by the IEEE standard and ubiquitous in current practice, defeats the purpose of their algorithm. [en_US]
dc.identifier.uri: https://hdl.handle.net/1794/28887
dc.language.iso: en [en_US]
dc.publisher: University of Oregon [en_US]
dc.rights: Creative Commons BY-NC-ND 4.0-US [en_US]
dc.subject: IEEE Standard [en_US]
dc.subject: Binary Floating-Point Arithmetic [en_US]
dc.subject: Steele and White [en_US]
dc.title: How to Read Floating Point Numbers [en_US]
dc.type: Article [en_US]
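
The input problem stated in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper. It only contrasts a conversion done entirely in double-precision arithmetic with a reference that rounds the exact rational value once. The function names (exact_value, naive_double, best_double, error) are hypothetical, and the reference assumes CPython, where integer true division is correctly rounded.

```python
from fractions import Fraction


def exact_value(digits: int, exp10: int) -> Fraction:
    """Exact rational value of digits * 10**exp10."""
    return Fraction(digits) * Fraction(10) ** exp10


def naive_double(digits: int, exp10: int) -> float:
    """Convert using only double-precision operations.

    Each step (int -> float, the power of ten, the multiply or divide)
    may round, so the result is not guaranteed to be the best double.
    """
    if exp10 >= 0:
        return float(digits) * 10.0 ** exp10
    return float(digits) / 10.0 ** (-exp10)


def best_double(digits: int, exp10: int) -> float:
    """Reference conversion: round the exact rational exactly once.

    Assumption: CPython's int / int true division is correctly rounded,
    which is what Fraction.__float__ relies on.
    """
    return float(exact_value(digits, exp10))


def error(x: float, digits: int, exp10: int) -> Fraction:
    """Exact absolute error of the double x against the decimal input."""
    return abs(Fraction(x) - exact_value(digits, exp10))


if __name__ == "__main__":
    # Arbitrary sample inputs, chosen only for illustration.
    for digits, exp10 in [(123456789012345678, -20), (1, 23), (17, -1)]:
        naive = naive_double(digits, exp10)
        best = best_double(digits, exp10)
        print(digits, exp10, "naive == best:", naive == best)
```

The reference path uses arbitrary-precision integers for every input, which is the cost the abstract says the paper's algorithm avoids in most cases by first trying a computation with a few extra bits of precision.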

Files

Original bundle
Name: clinger_1990.pdf
Size: 5.5 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission