Fast, Faster, Fastest

One of my favourite pass-times is getting the most out of a piece of code. Today, I got an opportunity to play a bit from a comment on Rico Mariani’s latest performance challenge: how to format the time part of a DateTime structure in the form “hh:mm:ss.mmm“ in the fastest possible way.

Apparently, Alois Kraus and Greg Young have been at it for a bit. Their solution already gives us more than a five-fold increase in speed compared to the simplest solution using String.Format. But could we do even better?

As it turns out, we could. Here’s the code that we can improve:

long ticks = time.Ticks;
int
hour = (int)((ticks / 0x861c46800L)) % 24;
int minute = (int
)((ticks / 0x23c34600L)) % 60;
int second = (int
)(((ticks / 0x989680L)) % 60L);
int ms = (int)(((ticks / 0x2710L)) % 0x3e8L);

The tick count is the number of 100 nanosecond (ns) intervals since the zero time value. For each of the hour, minute, second and millisecond parts, this code divides the number of ticks by the number of 100ns intervals in that time span, and reduces that number to the number of units in the larger time unit using a modulo. So, for example, there are 60x60x10000000 ticks in an hour, which is 0x861c46800 in hex, and there are 24 hours in a day.

What makes the above code less than optimal is that it starts from the number of ticks to compute every time part. This is a long (64 bit) value. 64-bit calculations are slower than 32-bit calculations. Moreover, divisions (or modulos) are much more expensive than multiplications.

We can fix both these issues by first finding the total number of milliseconds in the day. That number is always smaller than 100 million, so it fits in an int. We can calculate the number of hours with a simple division. We can “peel off” the hours from the total number of milliseconds in the day to find the total milliseconds remaining in the hour. From this, we can calculate the number of minutes with a simple division, and so on. The improved code looks like this:

long ticks = time.Ticks;
int ms = (int)((ticks / 10000) % 86400000);
int
hour = ms / 3600000;
ms -= 3600000*hour;
int
minute = ms / 60000;
ms -= 60000 * minute;
int
second = ms / 1000;
ms -= 1000*second;

This change decreases the running time by about 28 percent from the fastest previous solution. We can shave off another 4% or so by replacing the modulo calculation by a subtraction in the code that computes the digits.

The question now is: can we do even better, still?

Once again, the answer is: Yes: by as much as another 25%!

The single most time consuming calculation is a division. Dividing by large numbers is an order of magnitude slower than multiplying. For smaller numbers, the difference is smaller, but still significant. Since we know the numbers we are dividing by in advance, we can do a little bit shifting magic and eliminate the divisions altogether.

Let’s take dividing by 10 as an example. The basic idea is to approximate the ratio 1/10 by another rational number with a power of two as the denominator. Instead of dividing, we can then multiply by the numerator, and shift by the exponent in the denominator. Since shifting chops off digits, it effectively rounds down the result of the division, so we always have to find an approximation that is larger than the ratio.

We see, for example, that 13/128 is a good approximation to 1/10. We can rewrite x/10 as (x*13) >> 7 as long as x is not too large. We run into trouble as soon as the error time x is larger than 1. In this case, that happens when x is larger than 13/(13-12.8) = 65. Fortunately, this is larger than the number of hours in a day, or the number of minutes in an hour, so we can use it for most calculations in our code. It won’t work for numbers up to a 100, so to get the second digit of the millisecond, we need the next approximation, 205/2048, which is good for values up to 10,000.

To get the first digit of the milliseconds, we need to divide by 100. We find that 41/4096 works nicely.

Implementing this optimization, we go from (for example):

*a = (char)(hour / 10 + ’0′);
a++;
*a = (
char)(hour % 10 + ’0′);

to:

int temp = (hour * 13) >> 7;
*a = (
char)(temp + ’0′
);
a++;
*a = (
char)(hour – 10 * temp + ’0′);

Our running time for 1 million iterations goes down from 0.38s to 0.28s, a savings of almost 18% compared to the original.

The larger divisors give us a bit of a challenge. To get the number of seconds, we divide a number less than 60000 by 1000. Doing this the straight way has us multiplying by 536871, which would require a long value for the result of the multiplication. We can get around this once we realize that 1000 = 8*125. So if we shift the number of milliseconds by 3, we only need to divide by 125. As an added benefit, the numbers we’re multiplying are always less than 7500, so our multiplier can be larger. This gives us the simple expression good for numbers up to almost 4 million: ((x >> 3) * 67109) >> 23.

The same trick doesn’t work for getting the minutes and hours, but it does allow us to fit the intermediate result into a long. We can use the Math.BigMul method to perform the calculation efficiently.

The final code is given below. It is doubtful it can be improved by much. It runs in as little as 0.221s for one million iterations, 2.5 times faster than the previous fastest code and over 25 times faster than the original.

private unsafe static string FormatFast6(DateTime time)
{
   
fixed (char
* p = dateData)
    {
       
long
ticks = time.Ticks;
       
char
* a = p;
       
int ms = (int
)((ticks / 10000) % 86400000);
       
int hour = (int)(Math
.BigMul(ms >> 7, 9773437) >> 38);
        ms -= 3600000 * hour;
       
int minute = (int)((Math
.BigMul(ms >> 5, 2290650)) >> 32);
        ms -= 60000 * minute;
       
int
second = ((ms >> 3) * 67109) >> 23;
        ms -= 1000 * second;

       
int temp = (hour * 13) >> 7;
        *a = (
char)(temp + ’0′
);
        a++;
        *a = (
char)(hour – 10 * temp + ’0′
);
        a += 2;

        temp = (minute * 13) >> 7;
        *a = (
char)(temp + ’0′
);
        a++;
        *a = (
char)(minute – 10 * temp + ’0′
);
        a += 2;

        temp = (second * 13) >> 7;
        *a = (
char)(temp + ’0′
);
        a++;
        *a = (
char)(second – 10 * temp + ’0′
);
        a += 2;

        temp = (ms * 41) >> 12;
        *a = (
char)(temp + ’0′
);
        a++;
        ms -= 100 * temp;
        temp = (ms * 205) >> 11;
        *a = (
char)(temp + ’0′
);
        ms -= 10 * temp;
        a++;
        *a = (
char)(ms – 10 * temp + ’0′
);
    }
    
return new String
(dateData);
}

3 thoughts on “Fast, Faster, Fastest

  1. Nice article; I’m impressed. But I think there’s a minor error in the sentence that discusses multiplying by 205/2048 instead of dividing by 10. This substitution is suitable only for values up to about 1000, not 10,000 as the article claims. Incorrect results would be obtained for numerator values that were at least 205/204.8, or 205/0.2, or 1025. (To be precise, 1029 is the lowest positive value for which the substitute operations give a different result than dividing by 10.)

  2. Have you tried to replace the (temp * 10) with (temp << 3 + temp<<1) ?

    temp * 10 == (temp*8 + temp*2) = (temp << 3 + temp<<1)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>