Feb 17

Am I missing something here?

While testing some time series functionality in the new version of our statistics library, we came across a rather curious discrepancy. We used the historical quotes available from Yahoo Finance as a reference resource. As it turns out, comparison with our data appears to show that Yahoo miscalculates some summary statistics.

The error occurs on the Historical Prices page when using a monthly timeframe. Take the monthly data for 2005 for Microsoft’s stock (symbol MSFT). This shows an average daily volume for January of 79,642,818 shares. According to the help document, this is “the average daily volume for all trading days in the reported month.”

When we look at the daily prices for January 2005, we find 20 trading days. When we add up all the daily volumes, we find 1,521,414,280 shares changed hands that month. That should give an average daily volume of 76,070,714 shares, more than 3 million shares less than Yahoo’s figure. Why the difference?

A brief investigation showed that the difference can be explained because Yahoo includes the volume on the last trading day of the month twice. If you add the volume of Jan. 1st to the total, we get 1,592,856,376. Dividing by the number of trading days (20) gives 79,642,818.8.

When we look at other months, we find the same pattern: Yahoo consistently overstates the average daily volume for the month by a few percentage points. Each time, this difference can be explained by the double inclusion of the volume of the last trading day in the total volume for the month.

Here’s a random sample: Research in Motion for June 2000. Yahoo gives an average daily volume of 4,262,160 shares. Our calculation shows an average of 3,870,800 shares corresponding to a total volume of 77,416,000 for the month. Yahoo’s total corresponds to 85,243,200 shares. The difference of 7,820,200 shares is exactly the volume for June 30th.

The weekly average daily volume appears to be correct.

I find it hard to believe that a service that is as widely used as Yahoo would show such an error. So, my question to the experts in technical analysis out there is: What am I missing???

Feb 10

Programming language design is a very tough discipline. You can be sure that every feature will be used and just as often abused in unexpected ways. Mistakes are close to impossible to fix.

C# is a great language. It’s my language of choice for most applications. It derives a lot of its quality from the cautious nature of its lead architect, Anders Hejlsberg, who tries to guide programmers into writing good code. That’s why unsafe code is called unsafe. Not because it is a security risk, but because it is really easy to get yourself into a buggy mess with it. It is why methods and properties have to be explicitly declared virtual, because people tend to override methods when it is not appropriate.

In some cases, however, this protecting developers against themselves can go to far. Overload resolution is one example I came across recently.

Consider a Curve class that represents a mathematical curve. It has a ValueAt method which gets the value of the curve at a specific x-value, which is passed in as a Double argument. This is a virtual method, of course, and specific types of curves, like Polynomial or CubicSpline, provide their own implementation.

Now, for polynomials in particular, it is sometimes desirable to get the value of the polynomial for a complex argument. So we define an overload that takes a DoubleComplex argument and also returns a DoubleComplex.

So far so good.

But now we want to add a conversion from Double to DoubleComplex. This is a widening conversion: every real number is also a complex number. So it is appropriate to make the conversion implicit. We can then write things like:

DoubleComplex a = 5.0;

instead of

DoubleComplex a = new DoubleComplex(5.0);

Unfortunately, this change breaks existing code. The following snippet will no longer compile:

Polynomial p = new Polynomial(1.0, 2.0, 3.0);
double y = p.ValueAt(1.0);

On the second line, we get an error message: “Cannot implicitly convert type DoubleComplex to Double.” Why? Because of the way C# resolves method overloads.

Specifically, C# considers methods declared in a type before anything else, including override methods. Section 7.3 of the C# spec (“Member lookup“) states:

First, the set of all accessible (Section 3.5) members named N declared in T and the base types (Section 7.3.1) of T is constructed. Declarations that include an override modifier are excluded from the set. If no members named N exist and are accessible, then the lookup produces no match, and the following steps are not evaluated.

In this case, because there is an implicit conversion from Double to DoubleComplex, the ValueAt(DoubleComplex) overload is applicable. Even though there is an overload whose parameters match exactly, it isn’t even considered here, because it is an override.

This highly unintuitive behavior is justified by the following two rules:

  1. Whether or not a method is overridden is an implementation detail that should be allowed to change without breaking client code.
  2. Changes to a base class that don’t break an inherited class should not break clients of the inherited class.

Even though neither of these actually applies to our example, I can understand that these rules are useful in many situations. In this case, however, it essentially hides the ValueAt(Double) overload I defined, unless I use an ugly upcast construct like

double y = ((Curve)p).ValueAt(1.0);

My problem is this: If I define an overload that takes a specific argument type, clearly my intent is for that overload to be called whenever the argument is of that exact type. This rule violates that intent.

In my view, it was a mistake to violate developer intent in this way and not give an exact signature match precedence over every other overload candidate. Unfortunately, this is one of those choices that in all likelihood is next to impossible to fix.

Visual Basic has a different set of overload resolution rules. It looks for the overload with the least widening conversion, and so would pick the correct overload in this case.

Thanks to Neal Horowitz for helping me clear up this issue.

Feb 08

A few days ago, I wrote about the need for the .NET framework to support floating-point exceptions. Although the solution I proposed there goes some way towards fixing the problem, it would create a few of its own as well.

The main issue is that the processor’s FPU (floating-point unit) is a global resource. Therefore it should be handled with extreme care.

For an example of what can go wrong if you don’t, we can look at the story of Managed DirectX. DirectX is the high-performance multimedia/graphics/gaming API on the Windows platform. Because performance is of the essence, the DirectX guys decided to ask the processor to do its calculations in single precision by default. This is somewhat faster than the default, and all that is generally needed for the kinds of applications they encountered.

When people started using the .NET framework to build applications using Managed DirectX, some people found that it broke the .NET math functions. The reason: the precision setting of the FPU is global. It therefore affected all floating-point code in the application, including the calls into the .NET Base Class Libraries.

The problem with our initial solution is that it doesn’t behave nicely. It doesn’t isolate other code from changes to the floating-point state. This can affect code in unexpected ways.

So how do we fix this?

One option is to flag code that uses special floating-point settings with an attribute like “FloatingPointContextAware.” Any code that has this attribute can access floating-point state. However, any code that does not have this attribute would require the CLR to explicitly save the floating-point state.

Unfortunately, this fix suffers from a number of drawbacks:

  1. In some applications, like error analysis, you want to set global state. For example, one way to check whether an algorithm is stable is to run it twice using different rounding modes. If the two results are significantly different, you have a problem.
  2. It is overkill in most situations. If you’re going to use special floating-point features, you should be able to isolate yourself properly from any changes in floating-point state. If you have no knowledge such features exist, chances are your code won’t break if, say, the rounding mode is changed.
  3. It is a performance bottleneck. You can’t require all your client code to also have this attribute. Remember the goal was to allow ‘normal’ calculations to be done as fast as possible while making the edge cases reasonably fast as well. It is counter-productive to impose a performance penalty on every call.

The best solution I can think of is to actually treat the floating-point unit for what it is: a resource. If you want to use its special features, you should inform the system when you plan to use them, and also when you’re done using them.

We make floating-point state accessible only through instances of a FloatingPointContext class, which implements IDisposable and does the necessary bookkeeping. It saves enough of the floating-point state when it is constructed, and restores it when it is disposed. You don’t always have to save and restore the full floating-point state, including values on the stack, etc. In most cases, you only need to save the control word, which is relatively cheap.

A typical use for this class would be as follows:

using (FloatingPointContext context = new FloatingPointContext())
{
    context.RoundingMode = RoundingMode.TowardsZero;
    context.ClearExceptionFlags();
    // Do some calculations
    if (context.IsExceptionFlagRaised(
         
FloatingPointExceptionFlag.Underflow))
      // Do some more stuff
}

I have submitted a proposal for a FloatingPointContext class to the CLR team. So far they have chosen to keep it under consideration for the next version. Let’s hope they’ll choose to implement it.

Jan 30

I’ve decided to share some of our experiences during the development of our math and statistics libraries in the hope that they may contribute to improvements in the .NET platform as the next version is being designed.

The CLR is a general-purpose runtime environment, and cannot be expected to support every application at the fastest possible speed. However, I do expect it to perform reasonably well, and if a performance hit can be avoided, then it should be. The absence of any floating-point exception mechanism incurs such a performance hit in some fairly common situations.

As an example, let’s take an implementation of complex numbers. This is a type for general use, and has to give accurate results whenever possible. For obvious reasons, we want the core operations to be as fast as possible. This means we want to inline when we can, and make our code fast, too. Most operations are fairly straightforward, but division isn’t. Let’s start with the ‘naïve’ implementation:

struct Complex
{
  private re, im;
  public static operator/(Complex z1, Complex z2)
  {
    double d = z2.re*z2.re + z2.im*z2.im;
    double resultRe = z1.re * z2.re + z1.im * z2.im;
    double resultIm = z1.im * z2.re – z1.re * z2.im;
    return new Complex(resultRe / d, resultIm / d);
  }
}

If any of the values d, resultRe, and resultIm underflow, the result will lose accuracy, because subnormal numbers by definition don’t have the full 52 bit precision. The CLR also offers no indication that underflow has occurred. This can be fixed, mostly, by modifying the above to:

public static operator/(Complex z1, Complex z2) 
{ 
  if (Math.Abs(z2.re) > Math.Abs(z2.im) 
  { 
    double t = z2.im / z2.re; 
    double d = z2.re + t * z2.im; 
    double resultRe = (z1.re + t * z1.im); 
    double resultIm = (z1.im – t * z1.re); 
    return new Complex(resultRe / d, resultIm / d); 
  } 
  else 
  { 
    double t = z2.re / z2.im; 
    double d = t * z2.re + z2.im; 
    double resultRe = (t * z1.re + z1.im); 
    double resultIm = (t * z1.im – z1.re); 
    return new Complex(resultRe / d, resultIm / d); 
  } 
}

This will give accurate results in a larger domain, but is slower because of the extra division. Worse still, some operations that one would expect to give exact results now aren’t exact. For example, if z1 = 27-21i and z2 = 9-7i, the exact result is 3, but the round-off in the division by 9 destroys the exact result.

IEEE-754 exceptions would come to the rescue here – if they were available. Exceptions (a term with a specific meaning in the IEEE-754 standard – not to be confused with CLR exceptions) raise a flag in the FPU’s status register, and can also be trapped by the operating system. We don’t need a trap here. We can do what we need to do with a flag. The code would look something like:

public static operator/(Complex z1, Complex z2) 
{ 
  FloatingPoint.ClearExceptionFlag(
    FloatingPointExceptionFlag.Underflow);
 
  double d = z2.re*z2.re + z2.im*z2.im; 
  double resultRe = z1.re * z2.re + z1.im * z2.im; 
  double resultIm = z1.im * z2.re – z1.re * z2.im; 
  if (FloatingPoint.IsExceptionFlagRaised(
    FloatingPointExceptionFlag.Underflow)
 
  { 
    // Code for the special cases. 
  } 
  return new Complex(resultRe / d, resultIm / d); 
}

Note that the CLR strategy to “continue with default values” won’t work here, because complete underflow is defaulted to 0, which can not be distinguished from the common case when the result is exactly zero (and therefore no special action is required). The only way to do it right would be a whole series of ugly comparisons, which would make the code slower and harder to read/maintain. Even if a language supported a mechanism to check for underflow (by inserting comparisons to a suitable small value before and after storing the value), this would bloat the IL, introduce unnecessary round-off (by forcing a conversion from extended to double on each operation), and slow things down unnecessarily.

This type of scenario occurs many times in numerical calculations. You perform a calculation the quick and dirty way, and if it turns out you made a mess, you try again but you’re more cautious. The complex number example is the most significant one I have come across while developing our numerical libraries.

Nearly all hardware that the CLR (or its clones) run on, supports this floating-point exception mechanism. I found it somewhat surprising that a ‘standard’ virtual execution environment would not adopt another and well-established standard (IEEE-754) for a specific subset of its functionality.

For more on the math jargon used in this post, see my article Floating-Point in .NET Part 1: Concepts and Formats.

Jan 28

Board games that is…

Once upon a time, a board game consisted of a board, some pawns and some dice. Over the years, game designers added more and more props. Dynamic game play was possible, but you had to do all the work.

Now, Philips Research have developed the Entertaible. Pawns and dice are still around, but now the board itself can change during play. My friend Peter Sels did some of the programming.

Since they’re only at the prototype stage, the price tag will likely be too high for some time, even for Santa. Maybe in a few years they’ll sell a “1000 in 1” board game for $19.95…

Dec 08

At the recent supercomputing conference in Seattle, Bill Gates gave a keynote address (transcript) where he announced that Microsoft had re-discovered its interest in scientific computing. (Microsoft’s second product was a Fortran compiler for CP/M, released in August 1977.) The motive is obviously commercial, but the statement is still significant. Microsoft has treated technical computing as a second class citizen for too long.

A couple of points I found of particular interest:

  • The primary interest appears to be in smoothing out the process of scientific discovery, with a lot of emphasis on data integration (using XML, of course), transparent distributed computing, and enhanced collaboration.
  • There is no mention at all of the nuts and bolts programming of numerical applications. He briefly mentions ‘visual programming,’ but that’s about it.

Improving workflow is one of Microsoft’s strengths. Scientific computing, on the other hand, is not, which probably explains Gates’ silence on the subject. Many people applauded DEC’s acquisition of Microsoft Fortran, which is now being phased out in favour of Intel Visual Fortran.

The .NET platform doesn’t exactly make it easy to write good numerical software. Even aside from the lack of support for rounding modes and standard floating-point exception mechanisms, the standard .NET languages (C# and Visual Basic) are not technical-computing friendly. There is no doubt in my mind that the .NET platform delivers a major improvement in developer productivity for general line-of-business applications.

Unfortunately, where technical computing is concerned, the productivity promise remains largely unfulfilled. It takes more than a keynote speech to win over the hearts of a community that has been ignored for so long. “Developers, developers, developers!”, Bill. Make writing numerical software for .NET a joy for developers.

What we need is a .NET language specifically targeted at technical computing. Will F# be it? Don Syme is obviously doing a lot of great work there. But functional programming for everyone?

Oct 02

One of the biggest issues facing developers of reusable libraries is backward compatibility. When you get to work on version 2.0 or later, you will inevitably regret some of the choices you made in the previous version. Typical issues include:

  • We should have named this method XXX instead.
  • This should have been a method, not a property.
  • We should have put these types in their own namespace.
  • We should have used an enum instead of a bool for that parameter.

Unfortunately, fixing any of these issues could break existing code. Even though you may have very good reasons to change the API, your customers won’t be happy.

I read a suggestion recently to use the Obsolete attribute to aid in the conversion. This class has a Message property that can be used to inform the user of the recommended alternative.

Now, wouldn’t it be nice if, in addition to this message, you could specify the required changes in a formal manner using attributes. With Visual Studio 2005′s refactoring support, it should then be relatively easy to write an add-in that interprets this information and performs the necessary changes for you. This would work only for relatively simple modifications. But it would still be a big help, and make the transition to compliant code a lot easier.

Aug 02

If this trend continues, and the number of blogs doubles every five months, then within the next four years, every person on this planet will have a blog…

Jul 06

With the creation of the .NET platform, Microsoft has taken the potential for developer productivity to the next level. .NET is much more than a Virtual Machine environment with an extensive class library. It is a platform for efficient programming. Here at Extreme Optimization, we have completely embraced this side of the framework.

Yet, as the comments to a recent blog entry from Brad Adams show, not everyone cares like we do. (He rightly points out that compliance with the Design Guidelines should be a requirement.)

What is surprising is that even some of the big names in component development, like Infragistics and ComponentOne, get bad marks.

Currently, Microsoft’s partnership programs for ISV’s are based solely on what the ISV is prepared to pay for the co-marketing opportunities that come with them. There is no consideration whatsoever for the quality of the products.

I believe it would be in the interest of Microsoft as well as the developer community as a whole to create some kind of “Built for .NET” logo program, with strong quality and guideline compliance requirements. This would encourage good design, and increase the effectiveness of developers across the board, giving the .NET platform a distinct advantage over its competition.

Now, software design is somewhat subjective, and the design guidelines should not be taken as absolutes. Our own libraries violate some of the guidelines, because some general assumptions simply don’t work well for numerical software. However, a core set of design rules should be mandatory (for example: require enums instead of numerical codes), while a certain number of violations of less stringent requirements would be allowed. Alternatives should be provided for violations, and the violations themselves should be marked with the Obsolete attribute. Etc., etc.

Just my 2 cents…

preload preload preload