# Arbitrary Precision Floating-Point Numbers

The BigFloat class represents floating-point numbers of arbitrary precision. The range of numbers that can be represented is roughly 10^{-646,000,000} to 10^{646,000,000}. Note that this range is somewhat smaller than that of the BigInteger type. The precision can be up to about 20 billion digits.

## Accuracy and precision

The accuracy of a number is a measure of how close an approximation is to its actual value. The precision of a number is a measure of the amount of memory used to represent a value.

Floating-point numbers are stored in the form *mantissa × 2^{exponent}*, where both the mantissa and the exponent are integers. Most numbers cannot be represented exactly in this format, so computing and storing the exact result of a calculation would generally require infinite time and infinite precision. This is clearly impossible, so we need a way to specify the desired accuracy of floating-point calculations.
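The mantissa and exponent form can be made concrete with the built-in Double type, which uses the same idea with a fixed 53-bit mantissa. The following is a minimal sketch that uses only the .NET base class library, not BigFloat itself; the class and method names are made up for this illustration, and the helper only handles finite, normal values.

```csharp
using System;

static class MantissaExponentDemo
{
    // Splits a finite, normal double into an integer mantissa and a
    // power-of-two exponent so that value == mantissa * 2^exponent exactly.
    public static (long Mantissa, int Exponent) Decompose(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long fraction = bits & 0x000FFFFFFFFFFFFFL;
        long mantissa = fraction | (1L << 52);   // restore the implicit leading 1 bit
        int exponent = biasedExponent - 1075;    // remove the bias (1023) and the 52 fraction bits
        return (negative ? -mantissa : mantissa, exponent);
    }

    static void Main()
    {
        var (m, e) = Decompose(0.1);
        Console.WriteLine($"0.1 is stored as {m} * 2^{e}");
        // 0.1 would be exact only if 10 * mantissa equalled 2^(-exponent).
        Console.WriteLine(10 * m == 1L << -e);   // False
    }
}
```

The stored value is the nearest number of this form to 0.1, which is why even a simple decimal fraction carries a small representation error.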

The AccuracyGoal structure is used to specify the desired accuracy of a calculation. This structure has two special values. InheritAbsolute indicates that the result should be computed with the same number of digits after the decimal point as the arguments. InheritRelative indicates that the result should be computed with the same total number of digits as the arguments. For example, 1.57 has three digits in total and two after the decimal point. Computing Tan(1.57) with accuracy goal InheritAbsolute would result in 1255.77. With accuracy goal InheritRelative, the result would be 1.26e+003.
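The difference between the two inherited goals in the Tan(1.57) example can be mimicked with ordinary Double formatting. This sketch does not use AccuracyGoal at all; it only shows what "two digits after the decimal point" and "three digits in total" mean for this value. The helper names are hypothetical.

```csharp
using System;
using System.Globalization;

static class InheritedAccuracyDemo
{
    // Keeps a fixed number of digits after the decimal point (absolute accuracy).
    public static string FormatAbsolute(double value, int decimals) =>
        value.ToString("F" + decimals, CultureInfo.InvariantCulture);

    // Keeps a fixed total number of significant digits (relative accuracy).
    public static string FormatRelative(double value, int significantDigits) =>
        value.ToString("e" + (significantDigits - 1), CultureInfo.InvariantCulture);

    static void Main()
    {
        double t = Math.Tan(1.57);
        Console.WriteLine(FormatAbsolute(t, 2));   // 1255.77
        Console.WriteLine(FormatRelative(t, 3));   // 1.26e+003
    }
}
```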

To compute a result with a specific accuracy, create an AccuracyGoal using one of two static methods. Relative creates an accuracy goal with the relative accuracy specified in decimal digits. Absolute creates an accuracy goal with the absolute accuracy specified in decimal digits. The number of digits need not be an integer.

Some operations, like casting an integer to a BigFloat, do not have operands to inherit the precision from. In such cases, the default accuracy goal is used, which is available through the DefaultAccuracyGoal property. The default is a relative precision of about 60 decimal digits. This property cannot be set to an inherited accuracy goal.

## Rounding

When the precision of a number is reduced, a choice must be made how the information in the discarded bits will be used. The RoundingMode enumeration lists the possibilities:

| Field | Description |
|---|---|
| TowardsNearest | Numbers are rounded to the nearest value. In case of a tie, the last bit of the result is made zero. This is the default. |
| TowardsNegativeInfinity | All numbers are rounded down. |
| TowardsPositiveInfinity | All numbers are rounded up. |
| TowardsZero | Positive numbers are rounded down. Negative numbers are rounded up. |

When no rounding mode is specified, the DefaultRoundingMode is used.
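The four rounding directions have familiar counterparts for rounding a Double to an integer in the .NET base class library. The sketch below uses those counterparts as an analogy only; the method names simply mirror the enumeration fields and are not part of the library. Rounding to even on a tie plays the same role as making the last bit zero.

```csharp
using System;

static class RoundingModeDemo
{
    // Nearest integer; ties go to the even neighbor (2.5 -> 2, 3.5 -> 4).
    public static double TowardsNearest(double x) => Math.Round(x, MidpointRounding.ToEven);

    // Always down (-2.5 -> -3).
    public static double TowardsNegativeInfinity(double x) => Math.Floor(x);

    // Always up (-2.5 -> -2).
    public static double TowardsPositiveInfinity(double x) => Math.Ceiling(x);

    // Discards the fraction (2.5 -> 2, -2.5 -> -2).
    public static double TowardsZero(double x) => Math.Truncate(x);

    static void Main()
    {
        foreach (double x in new[] { 2.5, -2.5, 3.5 })
        {
            Console.WriteLine(
                $"{x}: nearest={TowardsNearest(x)}, down={TowardsNegativeInfinity(x)}, " +
                $"up={TowardsPositiveInfinity(x)}, zero={TowardsZero(x)}");
        }
    }
}
```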

## Constructing big floating-point numbers

The BigFloat class has several constructors that construct a floating-point number with the same value as the argument. You can start from 32-bit and 64-bit integers, single-precision or double-precision numbers, BigInteger values, and BigRational values.

Most rational numbers cannot be expressed exactly as a floating-point number. For this reason, a second constructor is provided that takes two additional arguments: an AccuracyGoal value that specifies the desired accuracy of the approximation, and a RoundingMode value that specifies how to round the final approximation.

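```
// Construct values equal to an integer and to a double-precision number.
BigFloat a = new BigFloat(123);
BigFloat b = new BigFloat(3.141592);
// Approximate the rational number 22/7 to 50 digits after the decimal point.
AccuracyGoal accuracyGoal = AccuracyGoal.Absolute(50);
BigFloat r = new BigFloat(new BigRational(22, 7), accuracyGoal, RoundingMode.TowardsNearest);
```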

In addition, several static methods are available. For example, Parse and TryParse create big floats from strings.

## Floating-point constants

The BigFloat class provides several constants for commonly used and special floating-point numbers. These are listed in the following table:

| Field | Description |
|---|---|
| Zero | The number zero. |
| One | The number one. |
| | The number minus one. |
| | The largest possible BigFloat. |
| | The smallest possible BigFloat. |
| PositiveInfinity | Positive infinity. |
| NegativeInfinity | Negative infinity. |
| NaN | Not-a-Number value. |

The last three values in the above list deserve special attention. They correspond to the special values defined in the IEEE-754 standard for single and double precision floating-point numbers, which defines the behavior of the Single and Double types.

As the name implies, PositiveInfinity represents positive infinity. This value is used to represent numbers that are too large to be represented in the number format, as well as the result of certain operations like 1/0. Likewise, NegativeInfinity represents negative infinity. It is used to represent negative numbers whose magnitude is too large to be represented in the number format, as well as the result of certain other operations like -1/0.

The NaN field represents Not-a-Number. It is a special value that is returned when the result of an operation is undefined. For example, dividing zero by zero and taking the square root of a negative number both result in NaN. To test whether a number is NaN, use the static IsNaN method.
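Because these special values mirror those of the IEEE-754 types, the operations mentioned above can be tried out with the built-in Double type. This is an illustration with Double only; it assumes nothing about BigFloat beyond what the text states.

```csharp
using System;

static class SpecialValuesDemo
{
    static void Main()
    {
        double zero = 0.0;
        // 1/0 and -1/0 produce the two infinities.
        Console.WriteLine(double.IsPositiveInfinity(1.0 / zero));    // True
        Console.WriteLine(double.IsNegativeInfinity(-1.0 / zero));   // True
        // 0/0 and the square root of a negative number are undefined.
        Console.WriteLine(double.IsNaN(zero / zero));                // True
        Console.WriteLine(double.IsNaN(Math.Sqrt(-1.0)));            // True
    }
}
```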

## Working with floating-point numbers

You can work with BigFloat numbers like you would any built-in floating-point type. Like all other arbitrary precision types, big floats are immutable.

One complicating factor is that the precision of a BigFloat value is not constant but depends on how the value was constructed or computed. The next section goes into this in more depth.

#### Details of big floating-point arithmetic

Most operations compute a result with the same relative precision as their operands. When two or more operands are involved, the precision of the result is the smallest of the precisions of the arguments. For example, the result of multiplying two numbers with 50 and 200 digits of precision, respectively, will have a precision of 50 digits. The result is always rounded to the nearest value.

An important exception is addition and subtraction, which are calculated to be accurate within the smaller absolute accuracy of the operands. Care should be taken when subtracting from integers, which are stored with the default precision. For example, the result of BigFloat.One - x*x will have the default precision regardless of the value of x. To prevent this from happening, use the ExtendPrecision method. Note that this method does not modify the instance it is called on but returns a new value.
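The effect of a limited absolute accuracy in a subtraction such as 1 - x*x can be seen with the built-in types. In this sketch Double (about 16 significant digits) stands in for a low-precision operand and Decimal (28 to 29 significant digits) for an extended one. It is an analogy under those assumptions and does not call ExtendPrecision or any other BigFloat member.

```csharp
using System;

static class AbsorptionDemo
{
    // 1 - x*x evaluated with about 16 significant digits.
    public static double OneMinusSquare(double x) => 1.0 - x * x;

    // The same expression with 28 to 29 significant digits.
    public static decimal OneMinusSquare(decimal x) => 1m - x * x;

    static void Main()
    {
        // x*x = 1e-18 lies below the last digit a double near 1 can hold,
        // so the small term is lost and the result is exactly 1.
        Console.WriteLine(OneMinusSquare(1e-9) == 1.0);          // True
        // With more digits available the small term survives.
        Console.WriteLine(OneMinusSquare(0.000000001m) == 1m);   // False
    }
}
```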

To allow for maximum flexibility, every computational method has at least two overloads. One overload uses the default accuracy goal and rounding mode. A second overload has two additional arguments that can be used to specify the rounding mode and accuracy goal of the result.

When the result of an operation cannot be represented as a finite floating-point number, the following rules apply. When the result is too large to be represented, the value PositiveInfinity is returned. When the result is too small (i.e. negative and too large in magnitude), NegativeInfinity is returned. When the result is undefined, NaN is returned. When one of the operands is NaN, the result is also NaN. When one or both of the operands of a relational operator are NaN, the result is false. The one exception is the inequality operator, which returns true if both operands are NaN.
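These rules follow the same pattern as the built-in Double type, so they can be explored without the library. The sketch below uses Double only. Note that for Double the inequality operator returns true whenever at least one operand is NaN; whether BigFloat matches Double in that detail is not something this example can confirm.

```csharp
using System;

static class NaNComparisonDemo
{
    static void Main()
    {
        double a = double.NaN, b = double.NaN, big = double.MaxValue;

        // Relational operators with a NaN operand return false...
        Console.WriteLine(a == b);     // False
        Console.WriteLine(a < 1.0);    // False
        Console.WriteLine(a >= 1.0);   // False
        // ...except for inequality.
        Console.WriteLine(a != b);     // True

        // NaN propagates through arithmetic.
        Console.WriteLine(double.IsNaN(a + 1.0));                   // True
        // Results that are too large in magnitude become infinities.
        Console.WriteLine(double.IsPositiveInfinity(big * 2.0));    // True
        Console.WriteLine(double.IsNegativeInfinity(-big * 2.0));   // True
    }
}
```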

#### Arithmetic operations

The **Extreme Optimization Numerical Libraries for .NET** provide methods for all basic arithmetic operators on floating-point numbers. Overloaded versions of the arithmetic operators are provided for languages that support them. These use the default values for rounding mode (towards nearest) and accuracy goal (usually inherit relative). For languages that don't support operator overloading, equivalent static (**Shared** in Visual Basic) methods are supplied.

| Operator | Static method equivalent | Description |
|---|---|---|
| +x | (no equivalent) | Returns the floating-point number x. |
| -x | | Returns the negation of the floating-point number x. |
| x1 + x2 | BigFloat.Add(x1, x2) | Adds the floating-point numbers x1 and x2. |
| x + a | BigFloat.Add(x, a) | Adds the floating-point number x and the real number a. |
| a + x | BigFloat.Add(a, x) | Adds the real number a to the floating-point number x. |
| x++ | (no equivalent) | Increments the floating-point number x by one. |
| x1 - x2 | BigFloat.Subtract(x1, x2) | Subtracts the floating-point number x2 from x1. |
| x - a | BigFloat.Subtract(x, a) | Subtracts the real number a from the floating-point number x. |
| a - x | BigFloat.Subtract(a, x) | Subtracts the floating-point number x from the real number a. |
| x-- | (no equivalent) | Decrements the floating-point number x by one. |
| x1 * x2 | BigFloat.Multiply(x1, x2) | Multiplies the floating-point numbers x1 and x2. |
| x * a | BigFloat.Multiply(x, a) | Multiplies the floating-point number x and the real number a. |
| a * x | BigFloat.Multiply(a, x) | Multiplies the real number a and the floating-point number x. |
| x1 / x2 | BigFloat.Divide(x1, x2) | Divides the floating-point number x1 by x2. |
| x / a | BigFloat.Divide(x, a) | Divides the floating-point number x by the real number a. |
| a / x | BigFloat.Divide(a, x) | Divides the real number a by the floating-point number x. |

The relational operators are also available. In a language that does not support custom operators, the Equals or CompareTo method can be used instead.

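```
BigFloat d = BigFloat.Exp(1);   // e raised to the power 1
BigFloat e = BigFloat.Log(2);   // natural logarithm of 2
// Overloaded operators mix BigFloat values with ordinary numeric literals.
BigFloat f = 2 - 3 * (d + e);
```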
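For comparison, the same expression can be written with the static method equivalents from the table. This is a sketch rather than verified library code: it assumes the types live in the `Extreme.Mathematics` namespace and that CompareTo follows the usual .NET convention of returning a negative number when the instance is smaller. It wraps the integers in BigFloat values so that only the (x1, x2) forms of the methods are needed.

```csharp
using System;
using Extreme.Mathematics;   // assumed namespace for BigFloat

static class StaticMethodDemo
{
    static void Main()
    {
        BigFloat d = BigFloat.Exp(1);
        BigFloat e = BigFloat.Log(2);

        // Equivalent of: 2 - 3 * (d + e)
        BigFloat sum = BigFloat.Add(d, e);
        BigFloat product = BigFloat.Multiply(new BigFloat(3), sum);
        BigFloat f = BigFloat.Subtract(new BigFloat(2), product);

        // Equivalent of the relational expression: f < 0
        bool isNegative = f.CompareTo(BigFloat.Zero) < 0;
        Console.WriteLine(isNegative);
    }
}
```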

#### Functions of floating-point numbers

The BigFloat type defines static methods for the most common mathematical functions of floating-point numbers, including logarithmic, exponential, trigonometric, and hyperbolic functions.

The tables below summarize these methods and their meaning. Each of these methods is overloaded: a second overload takes two additional parameters that specify the rounding mode and accuracy goal used to compute the result.

| Method | Description |
|---|---|
| | The absolute value of the floating-point number x. |
| | The floating-point number x with its sign changed to match y. |
| | The largest integer less than or equal to the floating-point number x. |
| | The smallest integer greater than or equal to the floating-point number x. |
| | The fractional part of the floating-point number x. The result is negative if x is negative. |
| | The floating-point number x rounded to the specified number of digits. |
| ScaleByPowerOfTwo | The floating-point number x multiplied by the specified power of two. |
| | Indicates whether the floating-point number x equals positive infinity. |
| | Indicates whether the floating-point number x equals negative infinity. |
| IsNaN | Indicates whether the floating-point number x is Not-a-Number. |

| Method | Description |
|---|---|
| Exp | The number e raised to the power x. |
| | The inverse (reciprocal) of the floating-point number x. |
| Sqrt | The square root of the floating-point number x. |
| | The nth root of the floating-point number x. |
| | The floating-point number x1 raised to the power x2. |
| | The floating-point number x raised to the integer power n. |
| Log | Natural logarithm of the floating-point number x. |
| | Base x1 logarithm of the floating-point number x2. |

| Method | Description |
|---|---|
| | Gets the number pi to the specified accuracy. |
| | Computes the sine and cosine of the floating-point number x. |
| | Sine of the floating-point number x. |
| | Cosine of the floating-point number x. |
| Tan | Tangent of the floating-point number x. |
| | Inverse sine of the floating-point number x. |
| | Inverse cosine of the floating-point number x. |
| | Inverse tangent of the floating-point number x. |
| | Inverse tangent of the floating-point number y/x. |

| Method | Description |
|---|---|
| | Hyperbolic sine of the floating-point number x. |
| | Hyperbolic cosine of the floating-point number x. |
| | Hyperbolic tangent of the floating-point number x. |
| | Inverse hyperbolic sine of the floating-point number x. |
| | Inverse hyperbolic cosine of the floating-point number x. |
| | Inverse hyperbolic tangent of the floating-point number x. |

The following, larger example shows how to calculate the number π using the Arithmetic-Geometric Mean (AGM) formula. For details, see for example this paper.

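```
int digits = 100;
// Request 'digits' decimal places of absolute accuracy.
AccuracyGoal goal = AccuracyGoal.Absolute(digits);
// Start the AGM iteration from x1 = sqrt(2) and x2 = 1.
BigFloat x1 = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest);
BigFloat x2 = BigFloat.One;
// S accumulates the terms 2^(k-1) * c, where c = x1^2 - x2^2.
BigFloat S = BigFloat.Zero;
BigFloat c = BigFloat.One;
int k = 0;
// Loop until c no longer contributes at the requested number of digits.
while (-c.GetDecimalDigits() < digits)
{
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1);
    // Replace x1 and x2 by their arithmetic and geometric means.
    BigFloat aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1);
    BigFloat gMean = BigFloat.Sqrt(x1 * x2);
    x1 = aMean;
    x2 = gMean;
    // (x1 + x2) * (x1 - x2) equals x1^2 - x2^2.
    c = (x1 + x2) * (x1 - x2);
    k++;
}
BigFloat pi = x1 * x1 / (1 - S);
Console.WriteLine("Pi = {0:F100}", pi);
```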