The BigFloat class
represents floating-point numbers of arbitrary precision.
The range of numbers that can be represented is from roughly
10^-646,000,000 to 10^646,000,000.
Note that this range is somewhat smaller than that of the
BigInteger
type. The precision can be up to about 20 billion digits.
Accuracy and precision
The accuracy of a number is a measure of how close an approximation is to
its actual value. The precision of a number is a measure of the amount of memory
used to represent a value.
Floating-point numbers are stored in the form
mantissa × 2^exponent, where
both the mantissa and the exponent are integers. Most numbers cannot be represented
exactly in this format. Therefore, to compute and store the exact result of a
calculation would generally require infinite time and infinite precision.
This is clearly impossible. We need a way to specify the desired accuracy
of floating-point calculations.
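The point about inexact representation can be seen with the built-in Double type, which uses the same mantissa × 2^exponent form with a fixed 53-bit mantissa. The following is a minimal sketch that uses only standard .NET calls. The BinaryRepresentationDemo class and its Decompose helper are hypothetical names introduced for illustration and are not part of the library.

```csharp
using System;

static class BinaryRepresentationDemo
{
    // Splits a normal, finite, positive double into an integer mantissa and a
    // power-of-two exponent so that value == mantissa * 2^exponent.
    // (Hypothetical helper; zero, subnormals, infinities and NaN are not handled.)
    public static (long Mantissa, int Exponent) Decompose(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        int biasedExponent = (int)((bits >> 52) & 0x7FF);
        long fraction = bits & ((1L << 52) - 1);
        // Normal numbers carry an implicit leading 1 bit above the 52 stored bits.
        long mantissa = fraction | (1L << 52);
        // Remove the exponent bias (1023) and account for the 52 fractional bits.
        int exponent = biasedExponent - 1023 - 52;
        return (mantissa, exponent);
    }

    static void Main()
    {
        var (m, e) = Decompose(0.1);
        // 0.1 has no finite binary expansion, so the nearest representable value is stored.
        Console.WriteLine($"0.1 is stored as {m} * 2^{e}"); // 0.1 is stored as 7205759403792794 * 2^-56

        double a = 0.1, b = 0.2;
        Console.WriteLine(a + b == 0.3); // False
    }
}
```

Because the stored value is only the nearest representable neighbor of 0.1, small errors appear as soon as such values are combined, which is why an explicit accuracy goal is useful.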
The AccuracyGoal
structure is used to specify the desired accuracy of a calculation.
This structure has two special values.
InheritAbsolute
indicates that the result should be computed with the same number of digits after the
decimal point as the arguments.
InheritRelative
indicates that the result should be computed with the same total number of digits
as the arguments. For example, 1.57 has three digits total and two after the decimal point.
Computing Tan(1.57) with accuracy goal
InheritAbsolute
would result in 1255.77. With accuracy goal
InheritRelative,
the result would be 1.26e+003.
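The difference between the two inherited goals can be imitated with ordinary Double values and standard .NET format strings. This is only a sketch of the idea, not the library's implementation: AccuracyDemo, FormatAbsolute and FormatRelative are hypothetical names, and the rounding happens when formatting rather than during the computation.

```csharp
using System;
using System.Globalization;

static class AccuracyDemo
{
    // Absolute accuracy: keep a fixed number of digits after the decimal point.
    public static string FormatAbsolute(double value, int decimals) =>
        value.ToString("F" + decimals, CultureInfo.InvariantCulture);

    // Relative accuracy: keep a fixed total number of significant digits.
    public static string FormatRelative(double value, int significantDigits) =>
        value.ToString("e" + (significantDigits - 1), CultureInfo.InvariantCulture);

    static void Main()
    {
        double t = Math.Tan(1.57);
        // The argument 1.57 has two digits after the decimal point...
        Console.WriteLine(FormatAbsolute(t, 2)); // 1255.77
        // ...and three significant digits in total.
        Console.WriteLine(FormatRelative(t, 3)); // 1.26e+003
    }
}
```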
To compute a result with a specific accuracy, create an
AccuracyGoal using
one of two static methods.
Relative(Double)
creates an accuracy goal with the relative accuracy specified in decimal digits.
Absolute(Double)
creates an accuracy goal with the absolute accuracy specified in decimal digits.
The number of digits need not be an integer.
Some operations, like casting an integer to a
BigFloat,
do not have operands to inherit the precision from. In such cases, the default
accuracy goal is used, available through the
DefaultAccuracyGoal
property. The default is a relative precision of about 60 decimal digits.
This property cannot be set to an inherited accuracy goal.
Rounding
When the precision of a number is reduced, a choice must be made how the
information in the discarded bits will be used. The
RoundingMode
enumeration lists the possibilities:
Rounding Mode Values
| Field | Description |
|---|---|
| TowardsNearest | Numbers are rounded to the nearest value. In case of a tie, the last bit of the result is made zero. This is the default. |
| TowardsNegativeInfinity | All numbers are rounded down. |
| TowardsPositiveInfinity | All numbers are rounded up. |
| TowardsZero | Positive numbers are rounded down. Negative numbers are rounded up. |
When no rounding mode is specified, the
DefaultRoundingMode
is used.
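The four modes have close analogues in the standard System.Math class. The sketch below applies them to Decimal values, rounding to whole numbers instead of to a number of bits, so a tie resolves to an even last digit rather than a zero last bit. The RoundingDemo class and its method names are hypothetical and merely mirror the enumeration fields.

```csharp
using System;

static class RoundingDemo
{
    // Nearest value; ties go to the even neighbor.
    public static decimal TowardsNearest(decimal x) => Math.Round(x, MidpointRounding.ToEven);

    // Always round down.
    public static decimal TowardsNegativeInfinity(decimal x) => Math.Floor(x);

    // Always round up.
    public static decimal TowardsPositiveInfinity(decimal x) => Math.Ceiling(x);

    // Drop the fractional part: positive values go down, negative values go up.
    public static decimal TowardsZero(decimal x) => Math.Truncate(x);

    static void Main()
    {
        foreach (decimal x in new[] { 2.5m, 3.5m, -2.5m })
        {
            Console.WriteLine(
                $"{x}: nearest={TowardsNearest(x)}, down={TowardsNegativeInfinity(x)}, " +
                $"up={TowardsPositiveInfinity(x)}, zero={TowardsZero(x)}");
        }
    }
}
```

For the tie cases, 2.5 rounds to 2 and 3.5 rounds to 4 under the nearest mode, while -2.5 becomes -3 when rounding down and -2 when rounding up or towards zero.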
Constructing big floating-point numbers
The BigFloat type has several constructors
that create a floating-point number with the same value as the argument. You can start from
32 and 64 bit integers, single or double-precision numbers,
BigInteger values and
BigRational values.
Most rational numbers cannot be expressed exactly as a floating-point number.
For this reason, a second constructor is provided that takes two additional arguments:
an AccuracyGoal value that
specifies the desired accuracy of the approximation, and a
RoundingMode
value that specifies how to round the final approximation.
| C# |
|---|
BigFloat a = new BigFloat(123);
BigFloat b = new BigFloat(3.141592);
AccuracyGoal goal = AccuracyGoal.Absolute(50);
BigFloat c = new BigFloat(new BigRational(22, 7), goal);
|
| Visual Basic |
|---|
Dim a As New BigFloat(123)
Dim b As New BigFloat(3.141592)
Dim goal As AccuracyGoal = AccuracyGoal.Absolute(50)
Dim c As New BigFloat(New BigRational(22, 7), goal)
|
In addition, several static methods are available. For example,
Parse(String)
and TryParse(String, BigFloat%)
create big floats from strings.
Floating-point constants
The BigFloat
class provides several constants for commonly used and special floating-point numbers.
These are listed in the following table:
Floating-point number constants
The last three values in the above list deserve special attention.
These values correspond to the special values defined in the IEEE-754 standard
for single and double precision floating-point numbers that defines the behavior
of the Single and
Double types.
As the name implies, PositiveInfinity
represents positive infinity. This value is used to represent numbers that are too large
to be represented in the number format, as well as the result of certain operations
like 1/0. Likewise,
NegativeInfinity
represents negative infinity and is used to represent numbers that are too small (negative and too large in magnitude)
to be represented in the number format, as well as the result of certain other operations
like -1/0.
The NaN
field represents Not-a-Number. It is a special value that is returned when the
result of an operation is undefined. For example, dividing zero by zero and taking the
square root of a negative number both result in
NaN.
To test whether a number is NaN, use the static
IsNaN(BigFloat)
method.
Working with floating-point numbers
You can work with BigFloat
numbers like you would any built-in floating-point type.
Like all other arbitrary precision types, big floats are immutable.
One complicating factor is that the precision of a
BigFloat value
is not a constant but depends on how it was constructed or computed.
The next section goes into this factor in more depth.
Details of big floating-point arithmetic
Most operations compute a result with the same relative precision as their operands.
When two or more operands are involved, the precision is the smallest of the precisions
of the operands. For example, the result of multiplying two numbers with 50 and 200 digits of precision,
respectively, will have a precision of 50 digits. The result is always rounded
to the nearest value.
An important exception is addition and subtraction, which are calculated to
be accurate within the smaller absolute accuracy of the operands. Care should be
taken when subtracting from integers, which are stored with the default
precision. For example, the result of BigFloat.One - x*x will have the default precision
regardless of the value of x. To prevent this from happening,
use the ExtendPrecision(AccuracyGoal)
method. Note that this method does not modify the instance it is called on but returns
a new value.
To allow for maximum flexibility, every computational method has at least two overloads.
One overload uses the default accuracy goal and rounding mode. A second overload has two
additional arguments that can be used to specify the rounding mode and accuracy goal
of the result.
When the result of an operation cannot be represented as a finite
floating-point number, then the following rules apply.
When the result is too large to be represented, the value
PositiveInfinity
is returned. When the result is too small (i.e. negative and too large in magnitude),
NegativeInfinity
is returned. When the result is undefined,
NaN
is returned.
When one of the operands is NaN,
the result is also NaN.
When one or both of the operands of a relational operator is
NaN,
the result is false. The one exception is the
inequality operator, which returns true
if both operands are NaN.
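These rules follow the IEEE-754 conventions, so they can be tried out with the built-in Double type. The sketch below uses only standard .NET members and illustrates the rules on doubles; it does not call the library. SpecialValuesDemo and its Divide helper are hypothetical names.

```csharp
using System;

static class SpecialValuesDemo
{
    // Plain double division; dividing by zero does not throw for doubles.
    public static double Divide(double a, double b) => a / b;

    static void Main()
    {
        double posInf = Divide(1.0, 0.0);
        double negInf = Divide(-1.0, 0.0);
        double nan = Divide(0.0, 0.0);
        double otherNaN = Math.Sqrt(-1.0);

        Console.WriteLine(double.IsPositiveInfinity(posInf)); // True
        Console.WriteLine(double.IsNegativeInfinity(negInf)); // True
        Console.WriteLine(double.IsNaN(nan));                 // True
        Console.WriteLine(double.IsNaN(otherNaN));            // True

        // NaN propagates through arithmetic.
        Console.WriteLine(double.IsNaN(nan + 1.0));           // True

        // Relational operators involving NaN return false, except inequality.
        Console.WriteLine(nan == otherNaN);                   // False
        Console.WriteLine(nan < 1.0);                         // False
        Console.WriteLine(nan != otherNaN);                   // True
    }
}
```

Since equality with NaN is always false, a test such as IsNaN is the reliable way to detect it.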
Arithmetic operations
The Extreme Optimization Mathematics Library for .NET provides methods for all basic arithmetic operators
on floating-point numbers. Overloaded versions of
the arithmetic operators are provided for languages that support them. These use the
default values for rounding mode (towards nearest) and accuracy goal (usually inherit relative).
For languages that don't support operator
overloading, equivalent static (Shared in Visual Basic) methods are supplied.
Floating-point number operators and their static (Shared) method equivalents
| Operator | Static method equivalent | Description |
|---|---|---|
| +x | (no equivalent) | Returns the floating-point number x. |
| -x | Negate(BigFloat) | Returns the negation of the floating-point number x. |
| x1 + x2 | BigFloat.Add(x1, x2) | Adds the floating-point numbers x1 and x2. |
| x + a | BigFloat.Add(x, a) | Adds the floating-point number x and the real number a. |
| a + x | BigFloat.Add(a, x) | Adds the real number a to the floating-point number x. |
| x++ | (no equivalent) | Increments the floating-point number x by one. |
| x1 - x2 | BigFloat.Subtract(x1, x2) | Subtracts the floating-point number x2 from x1. |
| x - a | BigFloat.Subtract(x, a) | Subtracts the real number a from the floating-point number x. |
| a - x | BigFloat.Subtract(a, x) | Subtracts the floating-point number x from the real number a. |
| x-- | (no equivalent) | Decrements the floating-point number x by one. |
| x1 * x2 | BigFloat.Multiply(x1, x2) | Multiplies the floating-point numbers x1 and x2. |
| x * a | BigFloat.Multiply(x, a) | Multiplies the floating-point number x and the real number a. |
| a * x | BigFloat.Multiply(a, x) | Multiplies the real number a and the floating-point number x. |
| x1 / x2 | BigFloat.Divide(x1, x2) | Divides the floating-point number x1 by x2. |
| x / a | BigFloat.Divide(x, a) | Divides the floating-point number x by the real number a. |
| a / x | BigFloat.Divide(a, x) | Divides the real number a by the floating-point number x. |
In addition, the relational operators are also available. In a language
that does not support custom operators, the
Equals(BigFloat)
or CompareTo(BigFloat)
method can be used.
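The same pairing of operators with static methods exists on the standard System.Numerics.BigInteger type, which makes it a convenient stand-in for trying out both styles without the library. The sketch below uses BigInteger only. OperatorDemo, WithOperators and WithMethods are hypothetical names, and the BigFloat overloads are assumed to behave analogously as described in the table above.

```csharp
using System;
using System.Numerics;

static class OperatorDemo
{
    // (a + b) * (a - b) / b written with overloaded operators.
    public static BigInteger WithOperators(BigInteger a, BigInteger b) =>
        (a + b) * (a - b) / b;

    // The same expression written with the static method equivalents.
    public static BigInteger WithMethods(BigInteger a, BigInteger b) =>
        BigInteger.Divide(
            BigInteger.Multiply(BigInteger.Add(a, b), BigInteger.Subtract(a, b)),
            b);

    static void Main()
    {
        BigInteger a = BigInteger.Parse("123456789012345678901234567890");
        BigInteger b = new BigInteger(42);

        Console.WriteLine(WithOperators(a, b) == WithMethods(a, b)); // True

        // Relational operators have method equivalents as well.
        Console.WriteLine(a.CompareTo(b) > 0); // True
        Console.WriteLine(a.Equals(b));        // False
    }
}
```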
Functions of floating-point numbers
The BigFloat type defines static methods for the most common mathematical functions of floating-point numbers,
including logarithmic, exponential, trigonometric and hyperbolic functions.
The tables below summarize these methods and their meaning.
Each of these methods is overloaded: two parameters are available that
can be used to specify the rounding mode and accuracy goal used to compute the result.
Miscellaneous functions of floating-point numbers
Logarithmic and exponential functions of floating-point numbers
Trigonometric functions of floating-point numbers
Hyperbolic functions of floating-point numbers
The following, larger example shows how to calculate the number π using
the Arithmetic-Geometric Mean (AGM) formula. For details, see for example
this paper.
| C# |
|---|
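// Number of decimal digits to compute. This declaration is an assumption:
// it matches the accuracy goal and the F100 output format used below.
int digits = 100;
AccuracyGoal goal = AccuracyGoal.Absolute(digits);
// Start from x1 = sqrt(2) and x2 = 1.
BigFloat x1 = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest);
BigFloat x2 = BigFloat.One;
BigFloat S = BigFloat.Zero;
// c holds x1^2 - x2^2 for the current iteration.
BigFloat c = BigFloat.One;
int k = 0;
// Iterate until c has dropped below the requested number of digits.
while (-c.GetDecimalDigits() < digits)
{
    // Accumulate the weighted sum: S += c * 2^(k-1).
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1);
    // Replace x1 and x2 by their arithmetic and geometric means.
    BigFloat aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1);
    BigFloat gMean = BigFloat.Sqrt(x1 * x2);
    x1 = aMean;
    x2 = gMean;
    // (x1 + x2) * (x1 - x2) equals x1^2 - x2^2.
    c = (x1 + x2) * (x1 - x2);
    k++;
}
BigFloat pi = x1 * x1 / (1 - S);
Console.WriteLine("Pi = {0:F100}", pi);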
|
| Visual Basic |
|---|
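' Number of decimal digits to compute. This declaration is an assumption:
' it matches the accuracy goal and the F100 output format used below.
Dim digits As Integer = 100
Dim goal As AccuracyGoal = AccuracyGoal.Absolute(digits)
' Start from x1 = sqrt(2) and x2 = 1.
Dim x1 As BigFloat = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest)
Dim x2 As BigFloat = BigFloat.One
Dim S As BigFloat = BigFloat.Zero
' c holds x1^2 - x2^2 for the current iteration.
Dim c As BigFloat = BigFloat.One
Dim k As Integer = 0
' Iterate until c has dropped below the requested number of digits.
Do While (-c.GetDecimalDigits() < digits)
    ' Accumulate the weighted sum: S = S + c * 2^(k-1).
    S = S + BigFloat.ScaleByPowerOfTwo(c, k - 1)
    ' Replace x1 and x2 by their arithmetic and geometric means.
    Dim aMean As BigFloat = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1)
    Dim gMean As BigFloat = BigFloat.Sqrt(x1 * x2)
    x1 = aMean
    x2 = gMean
    ' (x1 + x2) * (x1 - x2) equals x1^2 - x2^2.
    c = (x1 + x2) * (x1 - x2)
    k = k + 1
Loop
Dim pi As BigFloat = x1 * x1 / (1 - S)
Console.WriteLine("Pi = {0:F100}", pi)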
|