Arbitrary Precision Floating-Point Numbers

The BigFloat class represents floating-point numbers of arbitrary precision. The range of numbers that can be represented is roughly 10^(-646,000,000) to 10^(646,000,000). Note that this range is somewhat smaller than that of the BigInteger type. The precision can be up to about 20 billion digits.

Accuracy and precision

The accuracy of a number is a measure of how close an approximation is to its actual value. The precision of a number is a measure of the amount of memory used to represent a value.

Floating-point numbers are stored in the form mantissa × 2^exponent, where both the mantissa and the exponent are integers. Most numbers cannot be represented exactly in this format, so computing and storing the exact result of a calculation would generally require infinite time and infinite precision. Since this is impossible, we need a way to specify the desired accuracy of floating-point calculations.
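The mantissa-and-exponent form can be made concrete with the built-in Double type, which uses the same kind of binary representation with a fixed 53-bit mantissa. The sketch below is an illustration only: it uses standard .NET calls, not the BigFloat API, and the names BinaryFormDemo, Decompose and IsExact are hypothetical helpers. It assumes a positive, normal value smaller than 2^53.

```csharp
using System;
using System.Numerics;

public static class BinaryFormDemo
{
    // Splits a positive, normal double into an integer mantissa and a
    // power-of-two exponent so that value == mantissa * 2^exponent exactly.
    public static (long Mantissa, int Exponent) Decompose(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long fraction = bits & ((1L << 52) - 1);
        long mantissa = fraction | (1L << 52);      // restore the implicit leading bit
        int exponent = biasedExponent - 1023 - 52;  // remove the bias, then account for the 52 fraction bits
        return (mantissa, exponent);
    }

    // Tests with exact integer arithmetic whether the stored double equals
    // numerator/denominator. Assumes the exponent is not positive.
    public static bool IsExact(double value, long numerator, long denominator)
    {
        var (mantissa, exponent) = Decompose(value);
        // value == n/d  <=>  mantissa * d == n * 2^(-exponent)
        BigInteger left = new BigInteger(mantissa) * denominator;
        BigInteger right = new BigInteger(numerator) * BigInteger.Pow(2, -exponent);
        return left == right;
    }

    public static void Main()
    {
        var (m, e) = Decompose(0.1);
        Console.WriteLine($"0.1 is stored as {m} * 2^{e}");
        Console.WriteLine(IsExact(0.1, 1, 10)); // False: 1/10 has no finite binary expansion
        Console.WriteLine(IsExact(0.5, 1, 2));  // True: 1/2 is a power of two
    }
}
```

The value stored for 0.1 is only the nearest representable neighbor of 1/10. Adding more mantissa bits moves the stored value closer but never makes it exact, which is why an accuracy goal is needed.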

The AccuracyGoal structure is used to specify the desired accuracy of a calculation. This structure has two special values. InheritAbsolute indicates that the result should be computed with the same number of digits after the decimal point as the arguments. InheritRelative indicates that the result should be computed with the same total number of digits as the arguments. For example, 1.57 has three digits total and two after the decimal point. Computing Tan(1.57) with accuracy goal InheritAbsolute would result in 1255.77. With accuracy goal InheritRelative, the result would be 1.26e+003.
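The difference between the two ways of counting digits can be reproduced with the built-in Double type and standard numeric format strings. This is only an analogy for the Tan(1.57) example: it formats an ordinary double and does not call the BigFloat API. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class DigitCountDemo
{
    // Keeps two digits after the decimal point, as 1.57 has (absolute style).
    public static string AbsoluteStyle(double x) =>
        x.ToString("F2", CultureInfo.InvariantCulture);

    // Keeps three significant digits in total, as 1.57 has (relative style).
    public static string RelativeStyle(double x) =>
        x.ToString("e2", CultureInfo.InvariantCulture);

    public static void Main()
    {
        double t = Math.Tan(1.57);
        Console.WriteLine(AbsoluteStyle(t)); // 1255.77
        Console.WriteLine(RelativeStyle(t)); // 1.26e+003
    }
}
```

The absolute style yields six significant digits here because the result is much larger than the argument, while the relative style keeps the digit count fixed and gives up the digits after the decimal point.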

To compute a result with a specific accuracy, create an AccuracyGoal using one of two static methods. Relative creates an accuracy goal with the relative accuracy specified in decimal digits. Absolute creates an accuracy goal with the absolute accuracy specified in decimal digits. The number of digits need not be an integer.

Some operations, like casting an integer to a BigFloat, do not have operands to inherit the precision from. In such cases, the default accuracy goal is used, available through the DefaultAccuracyGoal property. The default is a relative precision of about 60 decimal digits. This property cannot be set to an inherited accuracy goal.

Rounding

When the precision of a number is reduced, a choice must be made how the information in the discarded bits will be used. The RoundingMode enumeration lists the possibilities:

Rounding Mode Values

| Field | Description |
| --- | --- |
| TowardsNearest | Numbers are rounded to the nearest value. In case of a tie, the last bit of the result is made zero. This is the default. |
| TowardsNegativeInfinity | All numbers are rounded down. |
| TowardsPositiveInfinity | All numbers are rounded up. |
| TowardsZero | Positive numbers are rounded down. Negative numbers are rounded up. |

When no rounding mode is specified, the DefaultRoundingMode is used.
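The four modes can be tried out on ordinary doubles rounded to whole numbers. The sketch below uses only standard System.Math calls as stand-ins. The mapping to the RoundingMode fields is an analogy, and RoundingModeDemo and its methods are hypothetical names, not part of the library.

```csharp
using System;

public static class RoundingModeDemo
{
    // Analogous to TowardsNearest: ties go to the even neighbor,
    // whose last binary digit is zero.
    public static double Nearest(double x) => Math.Round(x, MidpointRounding.ToEven);

    // Analogous to TowardsNegativeInfinity.
    public static double Down(double x) => Math.Floor(x);

    // Analogous to TowardsPositiveInfinity.
    public static double Up(double x) => Math.Ceiling(x);

    // Analogous to TowardsZero.
    public static double TowardsZero(double x) => Math.Truncate(x);

    public static void Main()
    {
        foreach (double x in new[] { 2.5, 3.5, -2.5, 2.7, -2.7 })
        {
            Console.WriteLine(
                $"{x}: nearest={Nearest(x)} down={Down(x)} up={Up(x)} zero={TowardsZero(x)}");
        }
    }
}
```

Note how the ties 2.5 and 3.5 round to 2 and 4 respectively under the nearest mode, and how the downward and toward-zero modes differ only for negative inputs.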

Constructing big floating-point numbers

The BigFloat type has several constructors that construct a floating-point number with the same value as the argument. You can start from 32-bit and 64-bit integers, single-precision or double-precision numbers, BigInteger values, and BigRational values.

Most rational numbers cannot be expressed exactly as a floating-point number. For this reason, a second constructor is provided that takes two additional arguments: an AccuracyGoal value that specifies the desired accuracy of the approximation, and a RoundingMode value that specifies how to round the final approximation.

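```csharp
// Exact conversions from built-in numeric types.
BigFloat a = new BigFloat(123);
BigFloat b = new BigFloat(3.141592);

// 22/7 has no finite binary expansion, so state the accuracy goal and rounding mode.
AccuracyGoal accuracyGoal = AccuracyGoal.Absolute(50);
BigFloat r = new BigFloat(new BigRational(22, 7), accuracyGoal, RoundingMode.TowardsNearest);
```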

In addition, several static methods are available. For example, Parse and TryParse create big floats from strings.

Floating-point constants

The BigFloat class provides several constants for commonly used and special floating-point numbers. These are listed in the following table:

Floating-point number constants

| Field | Description |
| --- | --- |
| Zero | The number zero. |
| One | The number one. |
| MinusOne | The number minus one. |
| MaxValue | The largest possible BigFloat. |
| MinValue | The smallest possible BigFloat. |
| PositiveInfinity | Positive infinity. |
| NegativeInfinity | Negative infinity. |
| NaN | Not-a-Number value. |

The last three values in the table above deserve special attention. They correspond to the special values defined by the IEEE-754 standard, which governs the behavior of the Single and Double types.

As the name implies, PositiveInfinity represents positive infinity. It stands in for numbers that are too large to be represented in the number format, and it is the result of certain operations like 1/0. Likewise, NegativeInfinity represents negative infinity. It stands in for negative numbers whose magnitude is too large to be represented, and it is the result of certain other operations like -1/0.

The NaN field represents Not-a-Number. It is a special value that is returned when the result of an operation is undefined. For example, dividing zero by zero and taking the square root of a negative number both result in NaN. To test whether a number is NaN, use the static IsNaN method.
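Because these special values mirror IEEE-754, the built-in Double type shows the same cases. The following sketch uses Double only, as a stand-in. It does not exercise BigFloat itself, and SpecialValuesDemo is a hypothetical name.

```csharp
using System;

public static class SpecialValuesDemo
{
    // Plain division, wrapped in a method so the operands are not compile-time constants.
    public static double Divide(double a, double b) => a / b;

    public static void Main()
    {
        double big = double.MaxValue;

        Console.WriteLine(double.IsPositiveInfinity(Divide(1.0, 0.0)));   // True: 1/0
        Console.WriteLine(double.IsNegativeInfinity(Divide(-1.0, 0.0)));  // True: -1/0
        Console.WriteLine(double.IsPositiveInfinity(big * 2.0));          // True: too large to represent
        Console.WriteLine(double.IsNaN(Divide(0.0, 0.0)));                // True: 0/0 is undefined
        Console.WriteLine(double.IsNaN(Math.Sqrt(-4.0)));                 // True: square root of a negative number
    }
}
```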

Working with floating-point numbers

You can work with BigFloat numbers like you would any built-in floating-point type. Like all other arbitrary precision types, big floats are immutable.

One complicating factor is that the precision of a BigFloat value is not constant but depends on how the value was constructed or computed. The next section goes into this factor in more depth.

Details of big floating-point arithmetic

Most operations compute a result with the same relative precision as their operands. When two or more operands are involved, the precision is the smallest of the precisions of the operands. For example, the result of multiplying two numbers with 50 and 200 digits of precision, respectively, will have a precision of 50 digits. The result is always rounded to the nearest value.

An important exception is addition and subtraction, which are calculated to be accurate to within the smaller absolute accuracy of the operands. Take care when subtracting from integers, which are stored with the default precision. For example, the result of BigFloat.One - x*x will have the default precision regardless of the value of x. To prevent this from happening, use the ExtendPrecision method. Note that this method does not modify the instance it is called on but returns a new value.

To allow for maximum flexibility, every computational method has at least two overloads. One overload uses the default accuracy goal and rounding mode. A second overload has two additional arguments that can be used to specify the rounding mode and accuracy goal of the result.

When the result of an operation cannot be represented as a finite floating-point number, the following rules apply. When the result is too large to be represented, the value PositiveInfinity is returned. When the result is too small (that is, negative and too large in magnitude), NegativeInfinity is returned. When the result is undefined, NaN is returned. When one of the operands is NaN, the result is also NaN. When one or both of the operands of a relational operator are NaN, the result is false. The one exception is the inequality operator, which returns true if both operands are NaN.
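The NaN rules above follow the IEEE-754 conventions, so they can be checked against the built-in Double type. The sketch below is an illustration with Double, not a test of BigFloat, and the helper names are hypothetical.

```csharp
using System;

public static class NaNComparisonDemo
{
    // Thin wrappers so the comparisons are evaluated at run time.
    public static bool AreEqual(double a, double b) => a == b;
    public static bool AreNotEqual(double a, double b) => a != b;
    public static bool IsLess(double a, double b) => a < b;
    public static bool IsGreaterOrEqual(double a, double b) => a >= b;

    public static void Main()
    {
        double nan = double.NaN;

        Console.WriteLine(double.IsNaN(nan + 1.0));        // True: NaN propagates through arithmetic
        Console.WriteLine(AreEqual(nan, nan));             // False
        Console.WriteLine(IsLess(nan, 1.0));               // False
        Console.WriteLine(IsGreaterOrEqual(nan, 1.0));     // False
        Console.WriteLine(AreNotEqual(nan, nan));          // True: the one exception
    }
}
```

One practical consequence is that x == x is false exactly when x is NaN, which is why a dedicated IsNaN method is the reliable test.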

Arithmetic operations

The Extreme Optimization Numerical Libraries for .NET provides methods for all basic arithmetic operators on floating-point numbers. Overloaded versions of the arithmetic operators are provided for languages that support them. These use the default values for rounding mode (towards nearest) and accuracy goal (usually inherit relative). For languages that don't support operator overloading, equivalent static (Shared in Visual Basic) methods are supplied.

Floating-point number operators and their static (Shared) method equivalents

| Operator | Static method equivalent | Description |
| --- | --- | --- |
| +x | (no equivalent) | Returns the floating-point number x. |
| -x | Negate | Returns the negation of the floating-point number x. |
| x1 + x2 | BigFloat.Add(x1, x2) | Adds the floating-point numbers x1 and x2. |
| x + a | BigFloat.Add(x, a) | Adds the floating-point number x and the real number a. |
| a + x | BigFloat.Add(a, x) | Adds the real number a to the floating-point number x. |
| x++ | (no equivalent) | Increments the floating-point number x by one. |
| x1 - x2 | BigFloat.Subtract(x1, x2) | Subtracts the floating-point number x2 from x1. |
| x - a | BigFloat.Subtract(x, a) | Subtracts the real number a from the floating-point number x. |
| a - x | BigFloat.Subtract(a, x) | Subtracts the floating-point number x from the real number a. |
| x-- | (no equivalent) | Decrements the floating-point number x by one. |
| x1 * x2 | BigFloat.Multiply(x1, x2) | Multiplies the floating-point numbers x1 and x2. |
| x * a | BigFloat.Multiply(x, a) | Multiplies the floating-point number x and the real number a. |
| a * x | BigFloat.Multiply(a, x) | Multiplies the real number a and the floating-point number x. |
| x1 / x2 | BigFloat.Divide(x1, x2) | Divides the floating-point number x1 by x2. |
| x / a | BigFloat.Divide(x, a) | Divides the floating-point number x by the real number a. |
| a / x | BigFloat.Divide(a, x) | Divides the real number a by the floating-point number x. |

In addition, the relational operators are also available. In a language that does not support custom operators, the Equals or CompareTo method can be used.

```csharp
BigFloat d = BigFloat.Exp(1);
BigFloat e = BigFloat.Log(2);
BigFloat f = 2 - 3 * (d + e);
```

Functions of floating-point numbers

The BigFloat type defines static methods for the most common mathematical functions of floating-point numbers, including logarithmic, exponential, trigonometric and hyperbolic functions.

The tables below summarize these methods and their meaning. Each of these methods is overloaded: one overload takes two additional parameters that specify the rounding mode and accuracy goal used to compute the result.

Miscellaneous functions of floating-point numbers

| Method | Description |
| --- | --- |
| Abs | The absolute value of the floating-point number x. |
| CopySign | The floating-point number x with its sign changed to match y. |
| Floor | The largest integer less than or equal to the floating-point number x. |
| Ceiling | The smallest integer greater than or equal to the floating-point number x. |
| FractionalPart | The fractional part of the floating-point number x. The result is negative if x is negative. |
| Round | The floating-point number x rounded to the specified number of digits. |
| ScaleByPowerOfTwo | The floating-point number x multiplied by the specified power of two. |
| IsPositiveInfinity | Indicates whether the floating-point number x equals positive infinity. |
| IsNegativeInfinity | Indicates whether the floating-point number x equals negative infinity. |
| IsNaN | Indicates whether the floating-point number x is Not-a-Number. |

Logarithmic and exponential functions of floating-point numbers

| Method | Description |
| --- | --- |
| Exp | The number e raised to the power x. |
| Inverse | The inverse (reciprocal) of the floating-point number x. |
| Sqrt | The square root of the floating-point number x. |
| Root | The nth root of the floating-point number x. |
| Pow | The floating-point number x1 raised to the floating-point power x2. |
| Pow | The floating-point number x raised to the integer power n. |
| Log | Natural logarithm of the floating-point number x. |
| Log | Base x1 logarithm of the floating-point number x2. |

Trigonometric functions of floating-point numbers

| Method | Description |
| --- | --- |
| GetPi | Gets the number pi to the specified accuracy. |
| SinCos | Computes the sine and cosine of the floating-point number x. |
| Sin | Sine of the floating-point number x. |
| Cos | Cosine of the floating-point number x. |
| Tan | Tangent of the floating-point number x. |
| Asin | Inverse sine of the floating-point number x. |
| Acos | Inverse cosine of the floating-point number x. |
| Atan | Inverse tangent of the floating-point number x. |
| Atan2 | Inverse tangent of the floating-point number y/x. |

Hyperbolic functions of floating-point numbers

| Method | Description |
| --- | --- |
| Sinh | Hyperbolic sine of the floating-point number x. |
| Cosh | Hyperbolic cosine of the floating-point number x. |
| Tanh | Hyperbolic tangent of the floating-point number x. |
| Asinh | Inverse hyperbolic sine of the floating-point number x. |
| Acosh | Inverse hyperbolic cosine of the floating-point number x. |
| Atanh | Inverse hyperbolic tangent of the floating-point number x. |

The following, larger example shows how to calculate the number π using the Arithmetic-Geometric Mean (AGM) formula. For details, see for example this paper.

```csharp
int digits = 100;
AccuracyGoal goal = AccuracyGoal.Absolute(digits);
BigFloat x1 = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest);
BigFloat x2 = BigFloat.One;
BigFloat S = BigFloat.Zero;
BigFloat c = BigFloat.One;   // c tracks x1^2 - x2^2, which starts at 2 - 1
int k = 0;
while (-c.GetDecimalDigits() < digits)
{
    // Accumulate 2^(k-1) * c before moving to the next pair of means.
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1);
    BigFloat aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1);   // arithmetic mean
    BigFloat gMean = BigFloat.Sqrt(x1 * x2);                    // geometric mean
    x1 = aMean;
    x2 = gMean;
    c = (x1 + x2) * (x1 - x2);
    k++;
}
BigFloat pi = x1 * x1 / (1 - S);
Console.WriteLine("Pi = {0:F100}", pi);
```
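
Read term by term, the loop above appears to evaluate the following recurrence, a rescaled form of the Gauss-Legendre (Salamin-Brent) iteration. The symbols a_k, b_k and c_k^2 are our own notation for the values held in x1, x2 and c at the start of iteration k. They are not names used by the library.

$$
\begin{aligned}
a_0 &= \sqrt{2}, \qquad b_0 = 1, \qquad c_0^2 = a_0^2 - b_0^2 = 1,\\
a_{k+1} &= \frac{a_k + b_k}{2}, \qquad b_{k+1} = \sqrt{a_k b_k}, \qquad
c_{k+1}^2 = (a_{k+1} + b_{k+1})(a_{k+1} - b_{k+1}),\\
\pi &= \lim_{n \to \infty} \frac{a_n^2}{1 - \sum_{k=0}^{n-1} 2^{k-1} c_k^2}.
\end{aligned}
$$

As a quick check, two iterations already give a_2^2 ≈ 1.43558 and a sum of about 0.54289, so the quotient is about 3.1406. The number of correct digits roughly doubles with each further iteration.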