Using Numerical Libraries for .NET from F#

With the release of version 5.0 of our Numerical Libraries for .NET, we’ve also considerably improved our support for the F# language. In this article I want to walk you through setting up F# Interactive to use the Extreme Optimization Numerical Libraries for .NET, and show off some of the new features. This tutorial will focus on F# 3.0 in Visual Studio 2012. Most of the functionality also works with F# 2.0 in Visual Studio 2010.

Setting up F# Interactive

The first step is to bring the Extreme Optimization Numerical Libraries for .NET assemblies into the F# Interactive environment:

#I @"c:\Program Files\Extreme Optimization\Numerical Libraries for .NET\bin\Net40\";;
#r "Extreme.Numerics.Net40.dll";;
#r "Extreme.Numerics.FSharp30.Net40.dll";;

open Extreme.Mathematics;;

fsi.AddPrinter(fun (A : Extreme.Mathematics.ISummarizable) -> A.Summarize());;

We do this by first adding the path to the Extreme Optimization binaries to the DLL search path using the #I command. (Note: the path depends on the installation folder and may be different on your machine.) We then reference the main assembly, Extreme.Numerics.Net40, and the F#-specific assembly, Extreme.Numerics.FSharp30.Net40, with the #r command. Finally, we tell fsi to use a pretty printer for objects that implement the ISummarizable interface, which prints vectors and matrices in a user-friendly form.

Working with vectors and matrices

So let’s get our hands dirty right away. Referencing the Extreme.Numerics.FSharp30.Net40 assembly automatically opens several modules that make it easier to do linear algebra in F#. We can call Vector and Matrix factory methods, or we can use some convenient F# functions:

> let A = rand 3 3;;

val A : LinearAlgebra.DenseMatrix =
  3x3 DenseMatrix
[[0.0780,0.1007,0.6917]
 [0.5422,0.8726,0.5718]
 [0.3528,0.0631,0.0772]]

> let c = !![1.0..10.0];;
val c : LinearAlgebra.DenseVector =
  [1.0000,2.0000,3.0000,4.0000,5.0000,6.0000,7.0000,8.0000,9.0000,10.0000]

Notice the !! operator, which creates a vector from a list or sequence. We can access elements of vectors and matrices using the familiar F# syntax. Slices work too, as do integer sequences and predicates:

> A.[0,1];;
val it : float = 0.100694183
> A.[1,1] <- 99.0;;
val it : unit = ()
> A.[1..2,0..1];;
val it : Matrix = 2x2 DenseMatrix
[[0.5422,99.0000]
 [0.3528,0.0631]]
> c.[1..3];;
val it : Vector = [2.0000,3.0000,4.0000]
> c.[fun x -> sin x > 0.0];;
val it : Vector = [1.0000,2.0000,3.0000,7.0000,8.0000,9.0000]
> c.[[2;4;5]];;
val it : Vector = [3.0000,5.0000,6.0000]
> c.[[for i in 1..7 -> i % 3]];;
val it : Vector = [2.0000,3.0000,1.0000,2.0000,3.0000,1.0000,2.0000]

Let’s do some simple calculations. We first define a vector b, and then use it in a few expressions:

> let b = Vector.Create(3, fun i -> 1.0 + (float i)**2.0);;
val b : LinearAlgebra.DenseVector = [1.0000,2.0000,5.0000]
> A*b;;
val it : LinearAlgebra.DenseVector = [3.7378,5.1463,0.8651]
> A.T*b;;
val it : Vector = [2.9263,2.1614,2.2214]
> 2.0 * b;;
val it : LinearAlgebra.DenseVector = [2.0000,4.0000,10.0000]
> 1.0 + b;;
val it : Vector = [2.0000,3.0000,6.0000]
> log (1.0 + b);;
val it : Vector = [0.6931,1.0986,1.7918]
> det A;;
val it : float = -0.1707208593
> inv A;;
val it : Matrix =
3x3 DenseMatrix
[[-0.1835,-0.2101,3.1982]
 [-0.9361,1.3939,-1.9355]
 [1.6027,-0.1792,-0.0791]]
> let x = A.Solve(b);;
val x : LinearAlgebra.DenseVector = [15.3876,-7.8258,0.8488]

The last command solved the system of equations with coefficient matrix A and right-hand side b. We can also work with matrix decompositions. There are two ways to do this. Let’s use the SVD as an example. First, we can get the factors of the decomposition:

> let (U,Σ,V) = A.svd();;

val Σ : LinearAlgebra.DiagonalMatrix =
  3x3 DiagonalMatrix
[[1.2980,0.0000,0.0000]
 [0.0000,0.5101,0.0000]
 [0.0000,0.0000,0.2578]]
val V : Matrix =
  3x3 DenseMatrix
[[-0.4492,0.4032,-0.7973]
 [-0.6411,0.4760,0.6020]
 [-0.6222,-0.7816,-0.0446]]
val U : Matrix =
  3x3 DenseMatrix
[[-0.4083,-0.9041,-0.1260]
 [-0.8928,0.3667,0.2616]
 [-0.1903,0.2193,-0.9569]]

Or, if we just want the singular values, we can get those as well:

> let σ = A.svd();;

val σ : LinearAlgebra.DenseVector = [1.2980,0.5101,0.2578]

The second way to work with decompositions is by creating decomposition objects. Let’s work with a non-square matrix this time:

> let C = rand 15 4;;

val C : LinearAlgebra.DenseMatrix =
  15x4 DenseMatrix
[[0.4198,0.8147,0.8530,0.5364]
 [0.9181,0.7706,0.3164,0.8773]
 [0.4768,0.1423,0.7419,0.6521]
 [0.2463,0.4579,0.3474,0.0311]
 [0.0432,0.6366,0.4928,0.2399]
 ...
 [0.3991,0.3426,0.4452,0.1276]
 [0.7867,0.3247,0.5256,0.1940]
 [0.6504,0.0943,0.1169,0.4266]
 [0.0436,0.6716,0.5230,0.4922]
 [0.5329,0.5422,0.2448,0.0547]]

> let d = (rand 15 1).GetColumn(0);;

val d : Vector =
  [0.7446,0.8430,0.6296,0.1210,0.2190,...,0.8281,0.6643,0.0355,0.7120,0.3582]

Here we see that the output is truncated: only the first and last five rows are shown. To find the least squares solution of C*y = d, we first need a decomposition object for C; here I’ll assume the GetSingularValueDecomposition method, and use the pseudo-inverse it provides. We’ll also compute the residuals and their norm:

> let svdC = C.GetSingularValueDecomposition();;

> let y = svdC.GetPseudoInverse() * d;;

val y : Vector = [0.3221,0.4458,0.3383,0.1298]

> let r = d - C * y;;

val r : Vector =
  [-0.1120,-0.0171,0.0770,-0.2840,-0.2765,...,0.3796,0.0632,-0.3110,0.1577,-0.1450]

> norm r;;
val it : float = 0.8214250504

Working with functions

You’d expect a functional language to work with functions. So let’s see what we can do.

The modules discussed in this section are not automatically opened when you load the F# compatibility assembly. The available modules are:

  • Special: special functions, such as Gamma functions, Bessel functions, etc.
  • Functional: numerical integration and differentiation, finding zeros, maxima and minima of functions.
  • Statistics: some functions that are useful when doing statistical analysis.
  • Random: random number streams.

These are only the modules with F# specific functions. The full functionality of the libraries is of course available from F# as well.

Here we’ll focus mainly on the second category. Special functions can be very handy, however, so let’s start with those:

> open Extreme.Numerics.FSharp.Special;;
> gamma 0.5;;
val it : float = 1.772453851
> BesselJ 0.0 1.0;;
val it : float = 0.7651976866
> let J0 = BesselJ 0.0;;

val J0 : (float -> float)

> J0 1.0;;
val it : float = 0.7651976866
>

Notice how we used partial application of the Bessel function to get a function that returns the Bessel function of order 0. We can do something similar in many other places, for example with binomial coefficients:

> let binom9 = binomial 9;;

val binom9 : (int -> float)

> [for i in 0..9 -> binom9 i];;
val it : float list =
  [1.0; 9.0; 36.0; 84.0; 126.0; 126.0; 84.0; 36.0; 9.0; 1.0]

Now to the functional stuff. Two of the most basic operations on functions are differentiation and integration. Differentiation is done using the d function. Integration is done using integrate1. Partial application is supported, so you can create a function that computes the numerical derivative at any value you give it.

> open Extreme.Numerics.FSharp.Functional;;
> let J1 = BesselJ 1.0;;

val J1 : (float -> float)

> d J0 0.5;;
val it : float = -0.2422684577
> J1 0.5;; // d J0 = -J1
val it : float = 0.2422684577
> let dJ0 = d J0;;

val dJ0 : (float -> float)

> dJ0 0.5;;
val it : float = -0.2422684577
> dJ0 1.5;;
val it : float = -0.5579365079

Integration works in a similar way. The bounds of the integration interval are supplied as a tuple. Infinite intervals are supported as well.

> integrate1 J1 (0.0,5.0);;
val it : float = 1.177596771
> J0 5.0 - J0 0.0;; // Should give minus the previous result:
val it : float = -1.177596771
> let iJ1 = integrate1 J1;;

val iJ1 : (float * float -> float)

> iJ1 (0.0,5.0);;
val it : float = 1.177596771
> integrate1 (fun x -> exp -x) (0.0,infinity);;
val it : float = 1.0

Finding zeros of functions is easy. Let’s quickly find the first 5 zeros of the Bessel function J0:

> [for x0 in [2.0;5.0;8.0;11.0;14.0] -> findzero (fun x -> J0 x) x0];;
val it : float list =
  [2.404825558; 5.52007811; 8.653727913; 11.79153444; 14.93091771]

Finding the minimum or maximum of a function can be difficult because most software requires that you supply the gradient of the function. Although computing derivatives is high school calculus, it’s still very error prone. Automatic differentiation comes to the rescue here. We’ll use the SymbolicMath class, which contains all kinds of functions that take advantage of symbolic computations to obtain a solution.

One of the most famous optimization problems is the Rosenbrock function. We can find the minimum of this function in just one line of code:

> SymbolicMath.FindMinimum((fun x y -> (1.0 - x)**2.0 + 100.0*(y-x**2.0)**2.0), !![-1.0;-1.0]);;
val it : SolutionReport =
  Extreme.Mathematics.SolutionReport`1[Extreme.Mathematics.Vector]
    {ConvergenceTest = null;
     Error = 2.778930274e-09;
     EvaluationsNeeded = 74;
     IterationsNeeded = 26;
     Result = [1.0000,1.0000];
     Status = Converged;}
>

It’s time now for you to play. Download the trial version of the Extreme Optimization Numerical Libraries for .NET here and have fun!

Using Quadratic Programming for Portfolio Optimization

This week’s update of the Extreme Optimization Numerical Libraries for .NET includes the ability to solve quadratic programming (QP) problems. These are optimization problems where the objective function is a quadratic function and the solution is subject to linear constraints.

Our QP solver uses an active set method, and can be used to solve programs with thousands of variables. Quadratic programming has many applications in finance, economics, agriculture, and other areas. It also serves as the core of some methods for nonlinear programming such as Sequential Quadratic Programming (SQP).

We created some QuickStart samples that demonstrate how to use the new functionality.

In this post, I’d like to illustrate an application in finance: portfolio optimization.

Say we have a set of assets. We know the historical average return and variances of each asset. We also know the correlations between the assets. With this information, we can look for the combination of assets that best meets our objectives. In this case, our goal will be to minimize the risk while achieving a minimum required return. Now let’s see how we can model this.

The unknowns in our model will be the amounts to be invested in each asset, which we’ll denote by x_i. All values together form the vector x. We will assume that all x_i ≥ 0.

We can use the variance of the value of the portfolio as a measure for the risk. A larger variance means that the probability of a large loss also increases, which is what we want to avoid. If we denote the variance-covariance matrix of the assets by R, then the variance of the portfolio value is x^T R x. This is what we want to minimize.

Now let’s look at the constraints. We obviously have a budget, so the sum of all amounts must be less than or equal to our budget. Say our budget is $10,000, then we must have Σ x_i ≤ 10,000. We also want a minimum expected return on our investment, say 10%. If r_i is the return associated with asset i, then the total expected return for our investment is Σ r_i x_i, and we want this value to be at least 10% of our total budget, or $1,000.

So, in summary, our model looks like this:

Minimize    x^T R x
Subject to  Σ x_i ≤ 10,000
            Σ r_i x_i ≥ 1,000
            x_i ≥ 0

The last thing we need to do is formulate this quadratic program in terms of the optimization classes in the Numerical Libraries for .NET. The QuickStart samples show three ways of doing this: you can define the model directly in terms of vectors and matrices; you can build the model from scratch, or you can read it from a model definition file in MPS format. Since the problem above is already pretty much expressed in terms of vectors and matrices, we’ll use that approach here.

Quadratic programming problems are represented by the QuadraticProgram class. One of the constructors lets you create a model in so-called standard form. A quadratic program in standard form looks like this:

Minimize    c_0 + c^T x + ½ x^T H x
Subject to  A_E x = b_E
            A_I x ≤ b_I
            l ≤ x ≤ u

where c_0 is a scalar, H, A_E, and A_I are matrices, and all other values are vectors. The subscript E denotes equality constraints, while the subscript I denotes inequality constraints. In our case, we don’t have equality constraints.

As a concrete example, we will work with four assets with returns of 5%, -20%, 15% and 30%, respectively. Our budget is $10,000, and the minimum return is 10%. The implementation is quite straightforward. Note that, because the right-hand side in the standard form is an upper bound, we have to flip the sign of the minimum-return constraint, which naturally has a lower-bound right-hand side.

Here’s the C# code:

// The linear term in the objective function: 
Vector c = Vector.CreateConstant(4, 0.0); 
// The quadratic term in the objective function: 
Matrix R = Matrix.CreateSymmetric(4, new double[] 
    {
        0.08,-0.05,-0.05,-0.05,
        -0.05, 0.16,-0.02,-0.02,
        -0.05,-0.02, 0.35, 0.06,
        -0.05,-0.02, 0.06, 0.35
    }, MatrixTriangle.Upper); 
// The coefficients of the constraints: 
Matrix A = Matrix.Create(2, 4, new double[] 
    { 
        1, 1, 1, 1, 
        -0.05, 0.2, -0.15, -0.30
    }, MatrixElementOrder.RowMajor);
// The right-hand sides of the constraints: 
Vector b = Vector.Create(10000, -1000);
// We're now ready to call the constructor. 
// The last parameter specifies the number of equality 
// constraints. 
QuadraticProgram qp1 = new QuadraticProgram(c, R, A, b, 0); 
// Now we can call the Solve method to run the algorithm: 
Vector x = qp1.Solve();

This is how it looks in Visual Basic:

' The linear term in the objective function: 
Dim c As Vector = Vector.CreateConstant(4, 0.0) 
' The quadratic term in the objective function: 
Dim R As Matrix = Matrix.CreateSymmetric(4, _ 
    New Double() _
        { _
            0.08, -0.05, -0.05, -0.05, _
            -0.05, 0.16, -0.02, -0.02, _
            -0.05, -0.02, 0.35, 0.06, _
            -0.05, -0.02, 0.06, 0.35 _
        }, MatrixTriangle.Upper) 
' The coefficients of the constraints: 
Dim A As Matrix = Matrix.Create(2, 4, New Double() _
    { _
        1, 1, 1, 1, _
        -0.05, 0.2, -0.15, -0.3 _
    }, MatrixElementOrder.RowMajor) 
' The right-hand sides of the constraints: 
Dim b As Vector = Vector.Create(10000, -1000) 
' We're now ready to call the constructor. 
' The last parameter specifies the number of
' equality constraints. 
Dim qp1 As New QuadraticProgram(c, R, A, b, 0) 
' Now we can call the Solve method to run the algorithm: 
Dim x As Vector = qp1.Solve()

And finally in F#:

// The linear term in the objective function:
let c = Vector.CreateConstant(4, 0.0); 
// The quadratic term in the objective function:
let R = Matrix.CreateSymmetric(4,
    [|
        0.08;-0.05;-0.05;-0.05;
        -0.05; 0.16;-0.02;-0.02;
        -0.05;-0.02; 0.35; 0.06;
        -0.05;-0.02; 0.06; 0.35
    |], MatrixTriangle.Upper) 
// The coefficients of the constraints: 
let A = Matrix.Create(2, 4, 
    [|
        1.0; 1.0; 1.0; 1.0;
        -0.05; 0.2; -0.15; -0.30
    |], MatrixElementOrder.RowMajor) 
// The right-hand sides of the constraints: 
let b = Vector.Create(10000.0, -1000.0) 
// We're now ready to call the constructor. 
// The last parameter specifies the number of 
// equality constraints. 
let qp1 = new QuadraticProgram(c, R, A, b, 0)
// Now we can call the Solve method to run the algorithm:
let x = qp1.Solve()

After we’ve run this code, the vector x will contain the optimal amounts for each asset. It gives us: $3,453, $0, $1,069, $2,223. Not surprisingly, the second asset, which has had a negative return on average, should not be part of our portfolio. However, it turns out that the asset with the lowest positive return should have the largest share.
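As a quick sanity check on those numbers (this snippet is not part of the library; the dollar amounts are simply the rounded values quoted above), we can verify that the solution satisfies both constraints:

// Verify the reported solution against the two constraints.
// The amounts are the rounded values quoted above, so the results are approximate.
double[] amounts = { 3453, 0, 1069, 2223 };
double[] assetReturns = { 0.05, -0.20, 0.15, 0.30 };
double totalInvested = 0.0, expectedReturn = 0.0;
for (int i = 0; i < amounts.Length; i++)
{
    totalInvested += amounts[i];
    expectedReturn += assetReturns[i] * amounts[i];
}
Console.WriteLine("Total invested: {0}", totalInvested);    // about 6,745: well within the budget
Console.WriteLine("Expected return: {0}", expectedReturn);  // about 1,000: the return target is met exactly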

Accurate trigonometric functions for large arguments

This week we introduced two new features in the Extreme Optimization Numerical Libraries for .NET.

Trigonometric functions with large arguments

The .NET Framework implementation of the trigonometric functions, sine, cosine, and tangent, relies on the corresponding processor instruction. This gives extremely fast performance, but may not give fully accurate results. This is the case for even fairly small arguments.

For example, Math.Sin(1000000.0) returns -0.34999350217129177 while the correct value is -0.34999350217129295. So we’ve already lost at least 2 digits. And things just go downhill from there. At 10^12, we only get 8 good digits.

For even larger arguments, something quite unexpected happens: the result of Math.Sin(1e20) is… 1e20! The argument is returned unchanged! Not only is this a completely meaningless return value; it can also cause a calculation to fail if it relies on the fact that sine and cosine are bounded by -1 and +1.
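You can see both effects with nothing but Math.Sin. Here is a tiny test; the exact output depends on your runtime and processor, and newer 64-bit runtimes that no longer use the x87 instructions may behave differently:

using System;

class SinAccuracy
{
    static void Main()
    {
        // A modest argument: the last couple of digits are already wrong.
        // The correctly rounded value is -0.34999350217129295.
        Console.WriteLine("Math.Sin(1e6)  = {0:R}", Math.Sin(1000000.0));

        // A huge argument: check whether the result is even a valid sine value.
        double s = Math.Sin(1e20);
        Console.WriteLine("Math.Sin(1e20) = {0:R} (|result| <= 1: {1})", s, Math.Abs(s) <= 1.0);
    }
}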

To understand what is going on, we need to go back to the implementation.

Math.Sin, Math.Cos and Math.Tan rely on processor instructions like fsin and fcos. These instructions work by first reducing the angle to a much smaller value, and then using a table-based approach to get the final result. Moreover, according to the documentation, the argument must be strictly between -2^63 and 2^63. The return value is not defined for arguments outside this interval. This explains the complete breakdown for large arguments.

The explanation for the gradual loss of accuracy is more subtle. As I said, the computation is done by first reducing the argument to a smaller interval, typically (–π, π]. The argument x is rewritten in the form

x = n π + f

where n is an even integer, and f is a real number with magnitude less than π.

The first step is to divide the number by π. The quotient is rounded to the nearest even integer to give us n. We get the reduced value f by subtracting this number times π from x. Because of the periodicity of the trigonometric functions, the value of the function at x is the same as the value at f, which can be calculated accurately and efficiently.
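To make this concrete, here is a little sketch of my own (not the library’s implementation) that reduces an argument the naive way in double precision, and again using System.Decimal, which carries π to 28 significant digits. Already at x = 10^6 the two reduced values disagree in their last several digits:

using System;

class ArgumentReduction
{
    // Naive reduction: n = round(x / pi), f = x - n * pi, all in double precision.
    // The product n * pi is only accurate to about 16 significant digits,
    // so f loses roughly as many digits as x has before the decimal point.
    static double ReduceDouble(double x)
    {
        double n = Math.Round(x / Math.PI);
        return x - n * Math.PI;
    }

    // The same reduction carried out in System.Decimal, with pi to 28 digits.
    // Still not fully accurate, but enough to show the difference.
    static double ReduceDecimal(double x)
    {
        const decimal pi = 3.1415926535897932384626433833m;
        decimal n = Math.Round((decimal)x / pi);
        return (double)((decimal)x - n * pi);
    }

    static void Main()
    {
        double x = 1000000.0;
        Console.WriteLine("double reduction:  {0:R}", ReduceDouble(x));
        Console.WriteLine("decimal reduction: {0:R}", ReduceDecimal(x));
    }
}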

Now, if you start with a value like 10^300, you’ll need to divide this number by π with at least 300 digits of precision, round the quotient to the nearest even number, and subtract π times this number from 10^300. To do this calculation accurately over the whole range, we would need to work with a value of π that is accurate to 1144 bits.

Even today, this is just a little bit beyond the capabilities of current hardware. Moreover, it could be considered wasteful to spend so much silicon on a calculation that is not overly common. Intel processors use just 66 bits. This leaves us with just 14 extra bits. The effects will begin to show with arguments that are larger than 2^16.

Accurate argument reduction for trigonometric functions

So, there are two problems with the standard trigonometric functions in the .NET Framework:

  1. The accuracy decreases with increasing size of the argument, losing several digits for even modest sized arguments.
  2. When the argument is very large, the functions break down completely.

To address these two problems, we’ve added Sin, Cos and Tan methods to the Elementary class. These methods perform fully accurate argument reduction so the results are accurate even for huge values of the argument.

For example, the 256-digit floating-point number 6381956970095103·2^797 is very close to a multiple of π/2. In fact, it is so close that the first 61 binary digits after the binary point of the reduced value are 0. As expected, Math.Cos gives the completely meaningless result 5.31937264832654141671e+255. Elementary.Cos returns -4.6871659242546267E-19, which is almost exactly the same as what Wolfram|Alpha gives: -4.6871659242546276E-19. The relative error is about the size of the machine precision.
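Here is what that comparison looks like in code. The argument is built as 6381956970095103·2^797, which is exactly representable as a double. (I’m assuming here that Elementary is a static class in the Extreme.Mathematics namespace, like the rest of the library.)

using System;
using Extreme.Mathematics;

class AccurateTrig
{
    static void Main()
    {
        // 6381956970095103 * 2^797: exactly representable,
        // and extremely close to a multiple of pi/2.
        double x = 6381956970095103.0 * Math.Pow(2.0, 797);

        Console.WriteLine("Math.Cos:       {0:R}", Math.Cos(x));
        Console.WriteLine("Elementary.Cos: {0:R}", Elementary.Cos(x));
    }
}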

What about performance? Thanks to some tricks with modular multiplication, it is possible to do the reduction in nearly constant time. Moreover, since the reduction only needs to be done for larger arguments, the overhead in most situations is minimal. Our benchmarks show a 4% overall slowdown when no reduction is needed. For smaller values that do require a reduction, the operation can take close to twice as long, while in the worst case, for large arguments that require a full reduction, we see a 10x slowdown.

Transcendental functions for System.Decimal

Also new in this week’s update is the DecimalMath class, which contains implementations of all functions from System.Math for the decimal type. All trigonometric, hyperbolic, exponential and logarithmic functions are supported, as well as the constants e and π.

The calculations are accurate to the full 96-bit precision of the decimal type. This is useful for situations where double precision is not enough, but going with an arbitrary-precision BigFloat would be overkill.
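As a small illustration (the exact namespace and member names here are assumptions on my part; the post only says the class mirrors System.Math for the decimal type):

using System;
using Extreme.Mathematics;

class DecimalMathExample
{
    static void Main()
    {
        // Compare double precision (about 15-16 digits) with the
        // 28-29 significant digits of System.Decimal.
        // DecimalMath.Sqrt is assumed to mirror Math.Sqrt, as described above.
        Console.WriteLine(Math.Sqrt(2.0));        // 1.4142135623730951
        Console.WriteLine(DecimalMath.Sqrt(2m));  // 1.414213562373095048801688724...
    }
}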

Double trouble: an interesting puzzle

I’ve written before about some of the strange looking behavior you can find when working with numbers on a computer. As this article explains at length, this behavior is a completely logical consequence of the need to use a finite set of values to represent potentially infinitely many numbers.

Kathy Kam of the Microsoft CLR team recently posted another interesting piece of floating-point weirdness. The question boils down to this. When you run this code:

Console.WriteLine(4.170404 == Convert.ToDouble("4.170404"));

why does it print false instead of the expected true?

The answer turns out to be not trivial and comes down to the different ways that the .NET framework and the C# compiler convert text strings to numbers.

The C# compiler uses the Windows API function VarR8FromStr to convert the literal value to a double. The documentation isn’t very precise about its inner workings. The C# spec says in section 9.4.4.3 that doubles are rounded using IEEE "round to nearest" mode.

So what happens here? First, the number is ‘normalized’: it is scaled by a power of two so that it lies between 1 and 2. The scale factor is moved to the exponent. Next, the number is rounded to double precision, which keeps 52 binary digits after the binary point (53 significant bits).

Here, 4.170404 is divided by 4, which gives 1.042601. The binary representation of this number is:

1.0000101011100111111001100010110111000110111000101…

The part that interests us starts at digit #52 after the “binary point,“ so let’s show everything after, say, the 40th digit:

4        5          6          7
1234567890 1234567890 1234567890
1110001010 1010000000 0000011001

To round to 52 binary digits, we need to round upwards:

4        5
1234567890 12
1110001010 11

This result is used to compose the double precision number.

If we follow the path of the Convert.ToDouble call, we find that it passes its string argument on to Double.Parse. This method prepares an internal structure called a NumberBuffer and eventually calls a function named NumberBufferToDouble internal to the CLR. In the Shared Source CLI implementation (Rotor) code, we find the following comment for the function that does all the hard work:

    The internal integer representation of the float number is
    UINT64 mantisa + INT exponent. The mantisa is kept normalized
    ie with the most significant one being 63-th bit of UINT64.

This is good news – extra precision is used to ensure we get the correct result. Looking further, we find that this function uses a helper function appropriately called Mul64Lossy, which multiplies two 64-bit values. In the comments, we find this:

    // it’s ok to losse some precision here – Mul64 will be called
    // at most twice during the conversion, so the error won’t propagate
    // to any of the 53 significant bits of the result

This is bad news. If you look at the binary representation of 4.170404/4 above, you’ll see that all digits from the 54th up to the 65th are zero. So it is very much possible that there was some loss of precision here, and that it did propagate to the last significant digit of the final result. The assumption made by the developer of this code is mostly right, but sometimes wrong.
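You can watch this for yourself by looking at the raw bits produced by the two conversion paths. On current runtimes the two conversions may well agree, in which case both lines print the same pattern:

using System;

class CompareConversions
{
    static void Main()
    {
        double compileTime = 4.170404;                   // converted by the C# compiler
        double runTime = Convert.ToDouble("4.170404");   // converted by the CLR at run time

        // Print the raw 64-bit patterns; a mismatch in the last hex digit
        // means the two values differ by one unit in the last place.
        Console.WriteLine("{0:x16}", BitConverter.DoubleToInt64Bits(compileTime));
        Console.WriteLine("{0:x16}", BitConverter.DoubleToInt64Bits(runTime));
        Console.WriteLine(compileTime == runTime);
    }
}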

But why risk loss of precision when it can be avoided? The (misguided) answer is: speed. The Rotor code contains a function, Mul64Precise, which doesn’t suffer from this loss of precision. However, it does use a few extra instructions to do some more shifting and multiplying. The function is only used in debug mode to verify that some internal conversion tables are correct.

In the grand scheme of things, the few extra instructions that would be used to get a correct result have only a very small effect on performance. The Convert.ToDouble method that started it all ends up spending most of its time parsing according to the specified locale, checking for currency symbols, thousands separators, etc. Only a tiny fraction of the time is spent in the Mul64 functions.

Let’s estimate how common this error is. For a conversion error to occur, the 12 bits from the 53rd to the 64th must all be zero. That’s about 1 in 4000. Also, the rounding must be affected, which gives us another factor of 2 or 4. So, as many as 1 conversion out of every 10000 may suffer from this effect!

The moral of the story: Be very careful with your assumptions about how errors will propagate. Don’t compromise for the sake of a few CPU cycles, unless performance is absolutely critical and is more important than accuracy.

Fast, Faster, Fastest

One of my favourite pastimes is getting the most out of a piece of code. Today, a comment on Rico Mariani’s latest performance challenge gave me an opportunity to play a bit: how to format the time part of a DateTime structure in the form "hh:mm:ss.mmm" in the fastest possible way.

Apparently, Alois Kraus and Greg Young have been at it for a bit. Their solution already gives us more than a five-fold increase in speed compared to the simplest solution using String.Format. But could we do even better?

As it turns out, we could. Here’s the code that we can improve:

long ticks = time.Ticks;
int hour = (int)((ticks / 0x861c46800L)) % 24;
int minute = (int)((ticks / 0x23c34600L)) % 60;
int second = (int)(((ticks / 0x989680L)) % 60L);
int ms = (int)(((ticks / 0x2710L)) % 0x3e8L);

The tick count is the number of 100 nanosecond (ns) intervals since the zero time value. For each of the hour, minute, second and millisecond parts, this code divides the number of ticks by the number of 100ns intervals in that time span, and reduces that number to the number of units in the larger time unit using a modulo. So, for example, there are 60x60x10000000 ticks in an hour, which is 0x861c46800 in hex, and there are 24 hours in a day.

What makes the above code less than optimal is that it starts from the number of ticks to compute every time part. This is a long (64 bit) value. 64-bit calculations are slower than 32-bit calculations. Moreover, divisions (or modulos) are much more expensive than multiplications.

We can fix both these issues by first finding the total number of milliseconds in the day. That number is always smaller than 100 million, so it fits in an int. We can calculate the number of hours with a simple division. We can “peel off” the hours from the total number of milliseconds in the day to find the total milliseconds remaining in the hour. From this, we can calculate the number of minutes with a simple division, and so on. The improved code looks like this:

long ticks = time.Ticks;
int ms = (int)((ticks / 10000) % 86400000);
int hour = ms / 3600000;
ms -= 3600000 * hour;
int minute = ms / 60000;
ms -= 60000 * minute;
int second = ms / 1000;
ms -= 1000 * second;

This change decreases the running time by about 28 percent from the fastest previous solution. We can shave off another 4% or so by replacing the modulo calculation by a subtraction in the code that computes the digits.

The question now is: can we do even better, still?

Once again, the answer is: Yes: by as much as another 25%!

The single most time consuming calculation is a division. Dividing by large numbers is an order of magnitude slower than multiplying. For smaller numbers, the difference is smaller, but still significant. Since we know the numbers we are dividing by in advance, we can do a little bit shifting magic and eliminate the divisions altogether.

Let’s take dividing by 10 as an example. The basic idea is to approximate the ratio 1/10 by another rational number with a power of two as the denominator. Instead of dividing, we can then multiply by the numerator, and shift by the exponent in the denominator. Since shifting chops off digits, it effectively rounds down the result of the division, so we always have to find an approximation that is larger than the ratio.

We see, for example, that 13/128 is a good approximation to 1/10. We can rewrite x/10 as (x*13) >> 7 as long as x is not too large. We run into trouble as soon as the error times x is larger than 1. In this case, that happens when x is larger than 13/(13-12.8) = 65. Fortunately, this is larger than the number of hours in a day, or the number of minutes in an hour, so we can use it for most calculations in our code. It won’t work for numbers up to 100, though, so to get the second digit of the milliseconds, we need the next approximation, 205/2048, which is good for values up to 1,000, more than we need here.

To get the first digit of the milliseconds, we need to divide by 100. We find that 41/4096 works nicely.
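These claims are easy to check by brute force over the ranges we actually need: hours, minutes and seconds are below 60, and the millisecond digits are extracted from values below 1,000 and below 100.

using System;

class ShiftDivideCheck
{
    static void Main()
    {
        int failures = 0;

        // x / 10 via 13/128, used for the two-digit hour, minute and second values.
        for (int x = 0; x < 60; x++)
            if ((x * 13) >> 7 != x / 10) failures++;

        // x / 100 via 41/4096, used to get the first millisecond digit (0..999).
        for (int x = 0; x < 1000; x++)
            if ((x * 41) >> 12 != x / 100) failures++;

        // x / 10 via 205/2048, used for the remaining two millisecond digits (0..99).
        for (int x = 0; x < 100; x++)
            if ((x * 205) >> 11 != x / 10) failures++;

        Console.WriteLine("Mismatches: {0}", failures);   // prints 0
    }
}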

Implementing this optimization, we go from (for example):

*a = (char)(hour / 10 + '0');
a++;
*a = (char)(hour % 10 + '0');

to:

int temp = (hour * 13) >> 7;
*a = (char)(temp + '0');
a++;
*a = (char)(hour - 10 * temp + '0');

Our running time for 1 million iterations goes down from 0.38s to 0.28s, a savings of almost 18% compared to the original.

The larger divisors give us a bit of a challenge. To get the number of seconds, we divide a number less than 60000 by 1000. Doing this the straight way has us multiplying by 536871, which would require a long value for the result of the multiplication. We can get around this once we realize that 1000 = 8*125. So if we shift the number of milliseconds by 3, we only need to divide by 125. As an added benefit, the numbers we’re multiplying are always less than 7500, so our multiplier can be larger. This gives us the simple expression good for numbers up to almost 4 million: ((x >> 3) * 67109) >> 23.

The same trick doesn’t work for getting the minutes and hours, but it does allow us to fit the intermediate result into a long. We can use the Math.BigMul method to perform the calculation efficiently.
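The constants in the final version below (9773437 with a shift of 38 for the hours, 2290650 with a shift of 32 for the minutes, and the 67109 expression for the seconds) can be checked the same way against plain division for every millisecond count in a day. The loop takes a moment, but it only has to convince us once:

using System;

class BigMulCheck
{
    static void Main()
    {
        int failures = 0;
        for (int ms = 0; ms < 86400000; ms++)
        {
            int hour = (int)(Math.BigMul(ms >> 7, 9773437) >> 38);
            if (hour != ms / 3600000) failures++;

            int msInHour = ms % 3600000;
            int minute = (int)(Math.BigMul(msInHour >> 5, 2290650) >> 32);
            if (minute != msInHour / 60000) failures++;

            int msInMinute = msInHour % 60000;
            int second = ((msInMinute >> 3) * 67109) >> 23;
            if (second != msInMinute / 1000) failures++;
        }
        Console.WriteLine("Mismatches: {0}", failures);   // prints 0
    }
}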

The final code is given below. It is doubtful it can be improved by much. It runs in as little as 0.221s for one million iterations, 2.5 times faster than the previous fastest code and over 25 times faster than the original.

private unsafe static string FormatFast6(DateTime time)
{
    // dateData is assumed to be a pre-allocated 12-character buffer
    // ("00:00:00.000"), so only the digits need to be written.
    fixed (char* p = dateData)
    {
        long ticks = time.Ticks;
        char* a = p;
        int ms = (int)((ticks / 10000) % 86400000);
        int hour = (int)(Math.BigMul(ms >> 7, 9773437) >> 38);
        ms -= 3600000 * hour;
        int minute = (int)((Math.BigMul(ms >> 5, 2290650)) >> 32);
        ms -= 60000 * minute;
        int second = ((ms >> 3) * 67109) >> 23;
        ms -= 1000 * second;

        int temp = (hour * 13) >> 7;
        *a = (char)(temp + '0');
        a++;
        *a = (char)(hour - 10 * temp + '0');
        a += 2;

        temp = (minute * 13) >> 7;
        *a = (char)(temp + '0');
        a++;
        *a = (char)(minute - 10 * temp + '0');
        a += 2;

        temp = (second * 13) >> 7;
        *a = (char)(temp + '0');
        a++;
        *a = (char)(second - 10 * temp + '0');
        a += 2;

        temp = (ms * 41) >> 12;
        *a = (char)(temp + '0');
        a++;
        ms -= 100 * temp;
        temp = (ms * 205) >> 11;
        *a = (char)(temp + '0');
        a++;
        *a = (char)(ms - 10 * temp + '0');
    }
    return new String(dateData);
}

Yahoo Finance miscalculates monthly average daily volume

Am I missing something here?

While testing some time series functionality in the new version of our statistics library, we came across a rather curious discrepancy. We used the historical quotes available from Yahoo Finance as a reference resource. As it turns out, comparison with our data appears to show that Yahoo miscalculates some summary statistics.

The error occurs on the Historical Prices page when using a monthly timeframe. Take the monthly data for 2005 for Microsoft’s stock (symbol MSFT). This shows an average daily volume for January of 79,642,818 shares. According to the help document, this is “the average daily volume for all trading days in the reported month.”

When we look at the daily prices for January 2005, we find 20 trading days. When we add up all the daily volumes, we find 1,521,414,280 shares changed hands that month. That should give an average daily volume of 76,070,714 shares, more than 3 million shares less than Yahoo’s figure. Why the difference?

A brief investigation showed that the difference can be explained because Yahoo includes the volume on the last trading day of the month twice. If you add the volume of Jan. 31st, the last trading day, to the total, you get 1,592,856,376. Dividing by the number of trading days (20) gives 79,642,818.8.
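In code, the double-counting hypothesis looks like this (the last-day volume of 71,442,096 is simply the difference between the two totals above):

// January 2005, MSFT: 20 trading days, total volume 1,521,414,280 shares.
long totalVolume = 1521414280;
long lastDayVolume = 1592856376 - 1521414280;   // 71,442,096, implied by the figures above
int tradingDays = 20;

Console.WriteLine((double)totalVolume / tradingDays);                    // 76,070,714: the correct average
Console.WriteLine((double)(totalVolume + lastDayVolume) / tradingDays);  // 79,642,818.8: Yahoo's figure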

When we look at other months, we find the same pattern: Yahoo consistently overstates the average daily volume for the month by a few percentage points. Each time, this difference can be explained by the double inclusion of the volume of the last trading day in the total volume for the month.

Here’s a random sample: Research in Motion for June 2000. Yahoo gives an average daily volume of 4,262,160 shares. Our calculation shows an average of 3,870,800 shares, corresponding to a total volume of 77,416,000 shares for the month. Yahoo’s figure corresponds to a total of 85,243,200 shares. The difference of 7,827,200 shares is exactly the volume for June 30th.

The weekly average daily volume appears to be correct.

I find it hard to believe that a service that is as widely used as Yahoo would show such an error. So, my question to the experts in technical analysis out there is: What am I missing???

Accessing floating-point context from .NET applications

A few days ago, I wrote about the need for the .NET framework to support floating-point exceptions. Although the solution I proposed there goes some way towards fixing the problem, it would create a few of its own as well.

The main issue is that the processor’s FPU (floating-point unit) is a global resource. Therefore it should be handled with extreme care.

For an example of what can go wrong if you don’t, we can look at the story of Managed DirectX. DirectX is the high-performance multimedia/graphics/gaming API on the Windows platform. Because performance is of the essence, the DirectX team decided to ask the processor to do its calculations in single precision by default. This is somewhat faster than the usual double precision, and it is all that is generally needed for the kinds of applications they encountered.

When people started using the .NET framework to build applications using Managed DirectX, some people found that it broke the .NET math functions. The reason: the precision setting of the FPU is global. It therefore affected all floating-point code in the application, including the calls into the .NET Base Class Libraries.

The problem with our initial solution is that it doesn’t behave nicely. It doesn’t isolate other code from changes to the floating-point state. This can affect code in unexpected ways.

So how do we fix this?

One option is to flag code that uses special floating-point settings with an attribute like “FloatingPointContextAware.” Any code that has this attribute can access floating-point state. However, any code that does not have this attribute would require the CLR to explicitly save the floating-point state.

Unfortunately, this fix suffers from a number of drawbacks:

  1. In some applications, like error analysis, you want to set global state. For example, one way to check whether an algorithm is stable is to run it twice using different rounding modes. If the two results are significantly different, you have a problem.
  2. It is overkill in most situations. If you’re going to use special floating-point features, you should be able to isolate yourself properly from any changes in floating-point state. If you have no knowledge such features exist, chances are your code won’t break if, say, the rounding mode is changed.
  3. It is a performance bottleneck. You can’t require all your client code to also have this attribute. Remember the goal was to allow ‘normal’ calculations to be done as fast as possible while making the edge cases reasonably fast as well. It is counter-productive to impose a performance penalty on every call.

The best solution I can think of is to actually treat the floating-point unit for what it is: a resource. If you want to use its special features, you should inform the system when you plan to use them, and also when you’re done using them.

We make floating-point state accessible only through instances of a FloatingPointContext class, which implements IDisposable and does the necessary bookkeeping. It saves enough of the floating-point state when it is constructed, and restores it when it is disposed. You don’t always have to save and restore the full floating-point state, including values on the stack, etc. In most cases, you only need to save the control word, which is relatively cheap.

A typical use for this class would be as follows:

using (FloatingPointContext context = new FloatingPointContext())
{
    context.RoundingMode = RoundingMode.TowardsZero;
    context.ClearExceptionFlags();
    // Do some calculations
    if (context.IsExceptionFlagRaised(FloatingPointExceptionFlag.Underflow))
        // Do some more stuff
}

I have submitted a proposal for a FloatingPointContext class to the CLR team. So far they have chosen to keep it under consideration for the next version. Let’s hope they’ll choose to implement it.