Extreme Optimization™: Complexity made simple.

Numerical Components
for .NET

Arbitrary Precision Floating-Point Numbers

The BigFloat class represents floating-point numbers of arbitrary precision. The range of numbers that can be represented is roughly 10^-646,000,000 to 10^646,000,000. Note that this range is somewhat smaller than that of the BigInteger type. The precision can be up to about 20 billion digits.

Accuracy and precision

The accuracy of a number is a measure of how close an approximation is to its actual value. The precision of a number is a measure of the amount of memory used to represent a value.

Floating-point numbers are stored in the form mantissa × 2^exponent, where both the mantissa and the exponent are integers. Most numbers cannot be represented exactly in this format. Therefore, to compute and store the exact result of a calculation would generally require infinite time and infinite precision. This is clearly impossible. We need a way to specify the desired accuracy of floating-point calculations.
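The mantissa and exponent form can be illustrated with ordinary double-precision numbers. The following is a minimal Python sketch, not part of the library: the helper name decompose is hypothetical, and it uses a fixed 53-bit mantissa, whereas a BigFloat mantissa can be arbitrarily long. It shows that 0.1 is stored as a nearby binary fraction rather than as exactly one tenth.

```python
import math
from fractions import Fraction

def decompose(x: float) -> tuple[int, int]:
    """Return integers (mantissa, exponent) with x == mantissa * 2**exponent."""
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    return int(m * 2**53), e - 53   # scaling by a power of two is exact

mantissa, exponent = decompose(0.1)
stored = Fraction(mantissa) * Fraction(2) ** exponent

print(mantissa, exponent)
print(stored == Fraction(0.1))      # the decomposition reproduces the stored value
print(stored == Fraction(1, 10))    # but the stored value is not exactly 1/10
```

The same limitation applies at any finite precision: adding mantissa bits shrinks the error but does not remove it.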

The AccuracyGoal structure is used to specify the desired accuracy of a calculation. This structure has two special values. InheritAbsolute indicates that the result should be computed with the same number of digits after the decimal point as the arguments. InheritRelative indicates that the result should be computed with the same total number of digits as the arguments. For example, 1.57 has three digits total and two after the decimal point. Computing Tan(1.57) with accuracy goal InheritAbsolute would result in 1255.77. With accuracy goal InheritRelative, the result would be 1.26e+003.
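The Tan(1.57) example can be reproduced with ordinary double-precision arithmetic. This Python sketch only mimics the two inherited goals by formatting the result; the helper names are hypothetical and it does not use the library's AccuracyGoal type.

```python
import math

def format_inherit_absolute(value: float, decimals: int) -> str:
    """Keep the same number of digits after the decimal point as the argument."""
    return f"{value:.{decimals}f}"

def format_inherit_relative(value: float, significant: int) -> str:
    """Keep the same total number of significant digits as the argument."""
    return f"{value:.{significant - 1}e}"

result = math.tan(1.57)   # the argument 1.57 has 3 digits in total, 2 after the point

print(format_inherit_absolute(result, 2))   # 1255.77
print(format_inherit_relative(result, 3))   # 1.26e+03
```

Because the tangent is large near pi/2, the absolute goal keeps six significant digits while the relative goal keeps only three.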

To compute a result with a specific accuracy, create an AccuracyGoal using one of two static methods. Relative(Double) creates an accuracy goal with the relative accuracy specified in decimal digits. Absolute(Double) creates an accuracy goal with the absolute accuracy specified in decimal digits. The number of digits need not be an integer.

Some operations, like casting an integer to a BigFloat, do not have operands to inherit the precision from. In such cases, the default accuracy goal is used, available through the DefaultAccuracyGoal property. The default is a relative precision of about 60 decimal digits. This property cannot be set to an inherited accuracy goal.

Rounding

When the precision of a number is reduced, a choice must be made about how the information in the discarded bits is used. The RoundingMode enumeration lists the possibilities:

Rounding Mode Values
Field                      Description
TowardsNearest             Numbers are rounded to the nearest value. In case of a tie, the last bit of the result is made zero. This is the default.
TowardsNegativeInfinity    All numbers are rounded down.
TowardsPositiveInfinity    All numbers are rounded up.
TowardsZero                Positive numbers are rounded down. Negative numbers are rounded up.

When no rounding mode is specified, the DefaultRoundingMode is used.
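The four modes can be compared side by side. The sketch below uses Python's decimal module as a stand-in, on the assumption that its rounding constants behave analogously to the RoundingMode fields. One difference is that decimal resolves ties to an even decimal digit, whereas the table above describes ties in terms of the last binary bit. The MODES mapping and the helper name are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN

# Assumed correspondence between the RoundingMode fields and decimal's constants.
MODES = {
    "TowardsNearest": ROUND_HALF_EVEN,
    "TowardsNegativeInfinity": ROUND_FLOOR,
    "TowardsPositiveInfinity": ROUND_CEILING,
    "TowardsZero": ROUND_DOWN,
}

def round_to_integer(value: str, mode: str) -> Decimal:
    """Round a decimal string to an integer using the named mode."""
    return Decimal(value).quantize(Decimal("1"), rounding=MODES[mode])

for mode in MODES:
    print(mode, [str(round_to_integer(v, mode)) for v in ("2.5", "3.5", "-2.5")])
# TowardsNearest ['2', '4', '-2']
# TowardsNegativeInfinity ['2', '3', '-3']
# TowardsPositiveInfinity ['3', '4', '-2']
# TowardsZero ['2', '3', '-2']
```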

Constructing big floating-point numbers

The BigFloat class has several constructors that construct a floating-point number with the same value as the argument. You can start from 32-bit and 64-bit integers, single or double-precision numbers, BigInteger values and BigRational values.

Most rational numbers cannot be expressed exactly as a floating-point number. For this reason, a second constructor is provided that takes two additional arguments: an AccuracyGoal value that specifies the desired accuracy of the approximation, and a RoundingMode value that specifies how to round the final approximation.

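C#
// Construct values equal to an integer and to a double-precision number.
BigFloat a = new BigFloat(123);
BigFloat b = new BigFloat(3.141592);
// 22/7 has no exact binary representation, so supply an accuracy goal.
AccuracyGoal goal = AccuracyGoal.Absolute(50);
BigFloat c = new BigFloat(new BigRational(22, 7), goal);
Visual Basic
' Construct values equal to an integer and to a double-precision number.
Dim a As New BigFloat(123)
Dim b As New BigFloat(3.141592)
' 22/7 has no exact binary representation, so supply an accuracy goal.
Dim goal As AccuracyGoal = AccuracyGoal.Absolute(50)
Dim c As New BigFloat(New BigRational(22, 7), goal)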

In addition, several static methods are available. For example, Parse(String) and TryParse(String, BigFloat%) create big floats from strings.

Floating-point constants

The BigFloat class provides several constants for commonly used and special floating-point numbers. These are listed in the following table:

Floating-point number constants
Field               Description
Zero                The number zero.
One                 The number one.
MinusOne            The number minus one.
MaxValue            The largest possible BigFloat.
MinValue            The smallest possible BigFloat.
PositiveInfinity    Positive infinity.
NegativeInfinity    Negative infinity.
NaN                 Not-a-Number value.

The last three values in the above list deserve special attention. They correspond to the special values defined in the IEEE-754 standard, which defines the behavior of the single-precision Single and double-precision Double types.

As the name implies, PositiveInfinity represents positive infinity. This value is used to represent numbers that are too large to be represented in the number format, as well as the result of certain operations like 1/0. Likewise, NegativeInfinity represents negative infinity and is used to represent numbers that are too small to be represented in the number format, as well as the result of certain other operations like -1/0.

The NaN field represents Not-a-Number. It is a special value that is returned when the result of an operation is undefined. For example, dividing zero by zero and taking the square root of a negative number both result in NaN. To test whether a number is NaN, use the static IsNaN(BigFloat) method.

Working with floating-point numbers

You can work with BigFloat numbers like you would any built-in floating-point type. Like all other arbitrary precision types, big floats are immutable.

One complicating factor is that the precision of a BigFloat value is not fixed but depends on how the value was constructed or computed. The next section goes into this factor in more depth.

Details of big floating-point arithmetic

Most operations compute a result with the same relative precision as their operands. When two or more operands are involved, the precision of the result is the smallest of the precisions of the operands. For example, the result of multiplying two numbers with 50 and 200 digits of precision, respectively, will have a precision of 50 digits. The result is always rounded to the nearest value.

An important exception is addition and subtraction, which are calculated to be accurate within the smaller absolute accuracy of the operands. Care should be taken when subtracting from integers, which are stored with the default precision. For example, the result of BigFloat.One - x*x will have the default precision regardless of the value of x. To prevent this from happening, use the ExtendPrecision(AccuracyGoal) method. Note that this method does not modify the instance it is called on but returns a new value.
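The two inheritance rules described above can be mimicked with Python's decimal module. This is only a sketch of the idea under simplifying assumptions: precision is counted in decimal digits taken from how each operand is written, and the helper names are hypothetical rather than part of the library.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

def significant_digits(x: Decimal) -> int:
    """Number of digits stored in the coefficient of x."""
    return len(x.as_tuple().digits)

def multiply_relative(a: Decimal, b: Decimal) -> Decimal:
    """Multiply, keeping as many significant digits as the less precise operand."""
    precision = min(significant_digits(a), significant_digits(b))
    return Context(prec=precision, rounding=ROUND_HALF_EVEN).multiply(a, b)

def add_absolute(a: Decimal, b: Decimal) -> Decimal:
    """Add, keeping as many decimal places as the less accurate operand."""
    coarsest = max(a.as_tuple().exponent, b.as_tuple().exponent)
    return (a + b).quantize(Decimal(1).scaleb(coarsest), rounding=ROUND_HALF_EVEN)

print(multiply_relative(Decimal("1.23"), Decimal("4.56789")))   # 5.62
print(add_absolute(Decimal("1.5"), Decimal("2.123456")))        # 3.6
```

In both cases the extra digits of the more precise operand do not survive, which is why an operand stored at a low default precision can limit the accuracy of the whole expression.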

To allow for maximum flexibility, every computational method has at least two overloads. One overload uses the default accuracy goal and rounding mode. A second overload has two additional arguments that can be used to specify the rounding mode and accuracy goal of the result.

When the result of an operation cannot be represented as a finite floating-point number, the following rules apply. When the result is too large to be represented, the value PositiveInfinity is returned. When the result is too small (i.e. negative and too large in magnitude), NegativeInfinity is returned. When the result is undefined, NaN is returned. When one of the operands is NaN, the result is also NaN. When one or both of the operands of a relational operator is NaN, the result is false. The one exception is the inequality operator, which returns true if both operands are NaN.
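Since the special values are modeled on IEEE-754, ordinary double-precision numbers show the same pattern. The Python sketch below uses built-in floats, not BigFloat, so it illustrates the IEEE-754 behavior the text refers to rather than the library itself. Note that Python raises an exception for 1/0 instead of returning infinity, so overflow is used to produce the infinities here.

```python
import math

inf = float("inf")

overflow = 1e308 * 10            # too large to represent: positive infinity
negative_overflow = -1e308 * 10  # negative and too large in magnitude: negative infinity
undefined = inf - inf            # undefined result: NaN
propagated = undefined + 1.0     # a NaN operand makes the result NaN

print(overflow, negative_overflow, undefined, propagated)

# Relational operators with a NaN operand are false, except inequality.
print(undefined == undefined)    # False
print(undefined < 1.0)           # False
print(undefined != undefined)    # True
print(math.isnan(undefined))     # True; the reliable way to test for NaN
```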

Arithmetic operations

The Extreme Optimization Mathematics Library for .NET provides methods for all basic arithmetic operators on floating-point numbers. Overloaded versions of the arithmetic operators are provided for languages that support them. These use the default values for rounding mode (towards nearest) and accuracy goal (usually inherit relative). For languages that don't support operator overloading, equivalent static (Shared in Visual Basic) methods are supplied.

Floating-point number operators and their static (Shared) method equivalents
Operator    Static method equivalent     Description
+x          (no equivalent)              Returns the floating-point number x.
-x          Negate(BigFloat)             Returns the negation of the floating-point number x.
x1 + x2     BigFloat.Add(x1, x2)         Adds the floating-point numbers x1 and x2.
x + a       BigFloat.Add(x, a)           Adds the floating-point number x and the real number a.
a + x       BigFloat.Add(a, x)           Adds the real number a to the floating-point number x.
x++         (no equivalent)              Increments the floating-point number x by one.
x1 - x2     BigFloat.Subtract(x1, x2)    Subtracts the floating-point numbers x1 and x2.
x - a       BigFloat.Subtract(x, a)      Subtracts the real number a from the floating-point number x.
a - x       BigFloat.Subtract(a, x)      Subtracts the floating-point number x from the real number a.
x--         (no equivalent)              Decrements the floating-point number x by one.
x1 * x2     BigFloat.Multiply(x1, x2)    Multiplies the floating-point numbers x1 and x2.
x * a       BigFloat.Multiply(x, a)      Multiplies the floating-point number x and the real number a.
a * x       BigFloat.Multiply(a, x)      Multiplies the real number a and the floating-point number x.
x1 / x2     BigFloat.Divide(x1, x2)      Divides the floating-point number x1 by x2.
x / a       BigFloat.Divide(x, a)        Divides the floating-point number x by the real number a.
a / x       BigFloat.Divide(a, x)        Divides the real number a by the floating-point number x.

The relational operators are also available. In a language that does not support custom operators, the Equals(BigFloat) or CompareTo(BigFloat) method can be used.

Functions of floating-point numbers

The BigFloat type defines static methods for the most common mathematical functions of floating-point numbers, including logarithmic, exponential, trigonometric and hyperbolic functions.

The tables below summarize these methods and their meaning. Each of these methods is overloaded: two parameters are available that can be used to specify the rounding mode and accuracy goal used to compute the result.

Miscellaneous functions of floating-point numbers
Method                                Description
Abs(BigFloat)                         The absolute value of the floating-point number x.
CopySign(BigFloat, BigFloat)          The floating-point number x with its sign changed to match y.
Floor(BigFloat)                       The largest integer less than or equal to the floating-point number x.
Ceiling(BigFloat)                     The smallest integer greater than or equal to the floating-point number x.
FractionalPart(BigFloat)              The fractional part of the floating-point number x. The result is negative if x is negative.
Round(BigFloat, Int32)                The floating-point number x rounded to the specified number of digits.
ScaleByPowerOfTwo(BigFloat, Int32)    The floating-point number x multiplied by the specified power of two.
IsPositiveInfinity(BigFloat)          Indicates whether the floating-point number x equals positive infinity.
IsNegativeInfinity(BigFloat)          Indicates whether the floating-point number x equals negative infinity.
IsNaN(BigFloat)                       Indicates whether the floating-point number x is Not-a-Number.

Logarithmic and exponential functions of floating-point numbers
Method                     Description
Exp(BigFloat)              The number E raised to the power x.
Inverse(BigFloat)          The inverse (reciprocal) of the floating-point number x.
Sqrt(BigFloat)             The square root of the floating-point number x.
Root(BigFloat, Int32)      The nth root of the floating-point number x.
Pow(BigFloat, BigFloat)    The floating-point number x1 raised to the power x2.
Pow(BigFloat, Int32)       The floating-point number x raised to the integer power n.
Log(BigFloat)              Natural logarithm of the floating-point number x.
Log(BigFloat, BigFloat)    Base x1 logarithm of the floating-point number x2.

Trigonometric functions of floating-point numbers
Method                                    Description
GetPi(AccuracyGoal)                       Gets the number pi to the specified accuracy.
SinCos(BigFloat, BigFloat%, BigFloat%)    Computes the sine and cosine of the floating-point number x.
Sin(BigFloat)                             Sine of the floating-point number x.
Cos(BigFloat)                             Cosine of the floating-point number x.
Tan(BigFloat)                             Tangent of the floating-point number x.
Asin(BigFloat)                            Inverse sine of the floating-point number x.
Acos(BigFloat)                            Inverse cosine of the floating-point number x.
Atan(BigFloat)                            Inverse tangent of the floating-point number x.
Atan2(BigFloat, BigFloat)                 Inverse tangent of the floating-point number y/x.

Hyperbolic functions of floating-point numbers
Method             Description
Sinh(BigFloat)     Hyperbolic sine of the floating-point number x.
Cosh(BigFloat)     Hyperbolic cosine of the floating-point number x.
Tanh(BigFloat)     Hyperbolic tangent of the floating-point number x.
Asinh(BigFloat)    Inverse hyperbolic sine of the floating-point number x.
Acosh(BigFloat)    Inverse hyperbolic cosine of the floating-point number x.
Atanh(BigFloat)    Inverse hyperbolic tangent of the floating-point number x.

The following, larger example shows how to calculate the number π using the Arithmetic-Geometric Mean (AGM) formula. For details, see for example this paper.

C#
// Number of decimal digits to compute.
int digits = 100;
AccuracyGoal goal = AccuracyGoal.Absolute(digits);
// Start the AGM iteration from sqrt(2) and 1.
BigFloat x1 = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest);
BigFloat x2 = BigFloat.One;
BigFloat S = BigFloat.Zero;
BigFloat c = BigFloat.One;
int k = 0;
// Iterate until the correction term c is below the requested accuracy.
while (-c.GetDecimalDigits() < digits)
{
    S += BigFloat.ScaleByPowerOfTwo(c, k - 1);
    BigFloat aMean = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1);
    BigFloat gMean = BigFloat.Sqrt(x1 * x2);
    x1 = aMean;
    x2 = gMean;
    // c = x1^2 - x2^2, written as a product.
    c = (x1 + x2) * (x1 - x2);
    k++;
}
BigFloat pi = x1 * x1 / (1 - S);
Console.WriteLine("Pi = {0:F100}", pi);
Visual Basic
' Number of decimal digits to compute.
Dim digits As Integer = 100
Dim goal As AccuracyGoal = AccuracyGoal.Absolute(digits)
' Start the AGM iteration from sqrt(2) and 1.
Dim x1 As BigFloat = BigFloat.Sqrt(2, goal, RoundingMode.TowardsNearest)
Dim x2 As BigFloat = BigFloat.One
Dim S As BigFloat = BigFloat.Zero
Dim c As BigFloat = BigFloat.One
Dim k As Integer = 0
' Iterate until the correction term c is below the requested accuracy.
Do While (-c.GetDecimalDigits() < digits)
    S = S + BigFloat.ScaleByPowerOfTwo(c, k - 1)
    Dim aMean As BigFloat = BigFloat.ScaleByPowerOfTwo(x1 + x2, -1)
    Dim gMean As BigFloat = BigFloat.Sqrt(x1 * x2)
    x1 = aMean
    x2 = gMean
    ' c = x1^2 - x2^2, written as a product.
    c = (x1 + x2) * (x1 - x2)
    k = k + 1
Loop
Dim pi As BigFloat = x1 * x1 / (1 - S)
Console.WriteLine("Pi = {0:F100}", pi)
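For readers who want to check the iteration without the library, the same AGM recurrence can be sketched in Python with the standard decimal module. This is an independent illustration under stated assumptions, not the library's implementation: the function name agm_pi, the ten guard digits and the stopping threshold are choices made for this sketch.

```python
from decimal import Decimal, localcontext

def agm_pi(digits: int) -> Decimal:
    """Approximate pi with the AGM iteration, starting from sqrt(2) and 1."""
    with localcontext() as ctx:
        ctx.prec = digits + 10                    # guard digits (an assumption)
        x1 = Decimal(2).sqrt()
        x2 = Decimal(1)
        s = Decimal(0)
        c = Decimal(1)                            # c = x1^2 - x2^2 for the start values
        k = 0
        threshold = Decimal(10) ** -(digits + 5)
        while abs(c) > threshold:
            s += c * Decimal(2) ** (k - 1)        # accumulate 2^(k-1) * c
            x1, x2 = (x1 + x2) / 2, (x1 * x2).sqrt()
            c = (x1 + x2) * (x1 - x2)
            k += 1
        return x1 * x1 / (1 - s)

print(agm_pi(50))
```

Convergence is quadratic, so the number of correct digits roughly doubles with each pass through the loop and only a handful of iterations are needed even for hundreds of digits.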

Send comments on this topic to support@extremeoptimization.com

Copyright © 2003-2010, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.