<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Numerical .NET</title>
	<atom:link href="http://www.extremeoptimization.com/Blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.extremeoptimization.com/Blog</link>
	<description>Experiences with technical computing on the Microsoft .NET platform.</description>
	<lastBuildDate>Thu, 19 Jan 2012 05:04:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Using Quadratic Programming for Portfolio Optimization</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2011/02/using-quadratic-programming-for-portfolio-optimization-2/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2011/02/using-quadratic-programming-for-portfolio-optimization-2/#comments</comments>
		<pubDate>Sun, 27 Feb 2011 18:16:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://www.extremeoptimization.com/Blog/?p=45</guid>
		<description><![CDATA[This week’s update of the Extreme Optimization Numerical Libraries for .NET includes the ability to solve quadratic programming (QP) problems. These are optimization problems where the objective function is a quadratic function and the solution is subject to linear constraints. Our QP solver uses an active set method, and can be used to solve programs [...]]]></description>
			<content:encoded><![CDATA[<p>This week’s <a href="http://www.extremeoptimization.com/Downloads.aspx">update</a> of the Extreme Optimization Numerical Libraries for .NET includes the ability to solve quadratic programming (QP) problems. These are optimization problems where the objective function is a quadratic function and the solution is subject to linear constraints.</p>
<p>Our QP solver uses an active set method, and can be used to solve programs with thousands of variables. Quadratic programming has many applications in finance, economics, agriculture, and other areas. It also serves as the core of some methods for nonlinear programming such as Sequential Quadratic Programming (SQP).</p>
<p>We created some QuickStart samples that demonstrate how to use the new functionality:</p>
<ul>
<li><a href="http://www.extremeoptimization.com/QuickStart/QuadraticProgrammingCS.aspx">Quadratic programming in C#</a> </li>
<li><a href="http://www.extremeoptimization.com/QuickStart/QuadraticProgrammingVB.aspx">Quadratic programming in Visual Basic</a> </li>
<li><a href="http://www.extremeoptimization.com/QuickStart/QuadraticProgrammingFS.aspx">Quadratic programming in F#</a> </li>
</ul>
<p>In this post, I’d like to illustrate an application in finance: portfolio optimization.</p>
<p>Say we have a set of assets. We know the historical average return and variance of each asset. We also know the correlations between the assets. With this information, we can look for the combination of assets that best meets our objectives. In this case, our goal will be to minimize the risk while achieving a minimum return. Now let’s see how we can model this.</p>
<p>The unknowns in our model will be the amount to be invested in each asset, which we’ll denote by <em>x<sub>i</sub></em>. All values together form the vector <strong>x</strong>. We will assume that all <em>x<sub>i </sub></em>≥ 0.</p>
<p>We can use the variance of the value of the portfolio as a measure for the risk. A larger variance means that the probability of a large loss also increases, which is what we want to avoid. If we denote the variance-covariance matrix of the assets by <strong>R</strong>, then the variance of the portfolio value is <strong>x</strong><sup>T</sup><strong>Rx</strong>. This is what we want to minimize.</p>
<p>Now let’s look at the constraints. We obviously have a budget, so the sum of all amounts must be less than or equal to our budget. If our budget is $10,000, then we must have Σ <em>x<sub>i</sub></em> ≤ 10,000. We also want a minimum expected return on our investment, say 10%. If <em>r<sub>i</sub></em> is the return associated with asset <em>i</em>, then the total expected return for our investment is Σ <em>r<sub>i</sub> x<sub>i</sub></em>, and we want this value to be at least 10% of our total budget, or $1,000.</p>
<p>So, in summary, our model looks like this: </p>
<table border="0" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td valign="top" width="95">Minimize</td>
<td valign="top" width="181"><strong>x</strong><sup>T</sup><strong>Rx</strong></td>
</tr>
<tr>
<td valign="top" width="95">Subject to</td>
<td valign="top" width="181">Σ <em>x<sub>i</sub></em> ≤ 10000<br />Σ <em>r<sub>i</sub> x<sub>i</sub></em> ≥ 1000<br /><em>x<sub>i</sub></em> ≥ 0</td>
</tr>
</tbody>
</table>
<p>The last thing we need to do is formulate this quadratic program in terms of the optimization classes in the Numerical Libraries for .NET. The QuickStart samples show three ways of doing this: you can define the model directly in terms of vectors and matrices; you can build the model from scratch; or you can read it from a model definition file in MPS format. Since the problem above is already pretty much expressed in terms of vectors and matrices, we’ll use that approach here.</p>
<p>Quadratic programming problems are represented by the QuadraticProgram class. One of the constructors lets you create a model in so-called <em>standard form</em>. A quadratic program in standard form looks like this: </p>
<table border="1" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td valign="top" width="95">Minimize</td>
<td valign="top" width="181"><em>c</em><sub>0</sub> + <strong>c</strong><sup>T</sup><strong>x</strong> + ½<strong>x</strong><sup>T</sup><strong>Hx</strong></td>
</tr>
<tr>
<td valign="top" width="95">Subject to</td>
<td valign="top" width="181"><strong>A<em><sub>E</sub></em>x</strong> = <strong>b<em><sub>E</sub></em></strong><br /><strong>A<em><sub>I</sub></em>x</strong> ≤ <strong>b<em><sub>I</sub></em></strong><br /><strong>l</strong> ≤ <strong>x</strong> ≤ <strong>u</strong></td>
</tr>
</tbody>
</table>
<p>where <em>c</em><sub>0</sub> is a scalar, <strong>H</strong>, <strong>A<em><sub>E</sub></em></strong>, and <strong>A<em><sub>I</sub></em></strong> are matrices, and all other values are vectors. The subscript <em>E</em> denotes equality constraints, while the subscript <em>I</em> denotes inequality constraints. In our case, we don&#8217;t have equality constraints.</p>
<p>As a concrete example, we will work with four assets with returns of 5%, -20%, 15% and 30%, respectively. Our budget is $10,000, and the minimum return is 10%. The implementation is quite straightforward. One point deserves attention: in the standard form, the right-hand side of an inequality constraint is an upper bound, while our minimum return constraint has a lower bound. We therefore multiply that constraint by -1, so that Σ <em>r<sub>i</sub> x<sub>i</sub></em> ≥ 1000 becomes Σ (-<em>r<sub>i</sub></em>) <em>x<sub>i</sub></em> ≤ -1000. Also note that the standard form has a factor ½ in the quadratic term, so passing <strong>R</strong> as <strong>H</strong> minimizes half the variance, which has the same minimizer.</p>
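<p>To make the sign change concrete, here is a minimal sketch in plain Python, independent of the library. The helper name to_upper_bound is made up for this illustration; the numbers are the ones from the example.</p>

```python
def to_upper_bound(coefficients, lower_bound):
    """Rewrite  a.x >= lower_bound  as  (-a).x <= -lower_bound."""
    return [-a for a in coefficients], -lower_bound

returns = [0.05, -0.20, 0.15, 0.30]

# Budget constraint: already an upper bound, so it is used as is.
budget_row, budget_rhs = [1.0, 1.0, 1.0, 1.0], 10000.0
# Minimum return constraint: a lower bound, so both sides change sign.
return_row, return_rhs = to_upper_bound(returns, 1000.0)

A = [budget_row, return_row]
b = [budget_rhs, return_rhs]
print(A)
print(b)
```

<p>The two rows and the two right-hand sides are the values that go into the matrix A and the vector b in the code that follows.</p>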
<p>Here&#8217;s the C# code:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp"><span class="co1">// The linear term in the objective function: </span>
Vector c <span class="sy0">=</span> Vector<span class="sy0">.</span><span class="me1">CreateConstant</span><span class="br0">&#40;</span><span class="nu0">4</span>, <span class="nu0">0.0</span><span class="br0">&#41;</span><span class="sy0">;</span> 
<span class="co1">// The quadratic term in the objective function: </span>
Matrix R <span class="sy0">=</span> Matrix<span class="sy0">.</span><span class="me1">CreateSymmetric</span><span class="br0">&#40;</span><span class="nu0">4</span>, <span class="kw3">new</span> <span class="kw4">double</span><span class="br0">&#91;</span><span class="br0">&#93;</span> 
    <span class="br0">&#123;</span>
        <span class="nu0">0.08</span>,<span class="sy0">-</span><span class="nu0">0.05</span>,<span class="sy0">-</span><span class="nu0">0.05</span>,<span class="sy0">-</span><span class="nu0">0.05</span>,
        <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="nu0">0.16</span>,<span class="sy0">-</span><span class="nu0">0.02</span>,<span class="sy0">-</span><span class="nu0">0.02</span>,
        <span class="sy0">-</span><span class="nu0">0.05</span>,<span class="sy0">-</span><span class="nu0">0.02</span>, <span class="nu0">0.35</span>, <span class="nu0">0.06</span>,
        <span class="sy0">-</span><span class="nu0">0.05</span>,<span class="sy0">-</span><span class="nu0">0.02</span>, <span class="nu0">0.06</span>, <span class="nu0">0.35</span>
    <span class="br0">&#125;</span>, MatrixTriangle<span class="sy0">.</span><span class="me1">Upper</span><span class="br0">&#41;</span><span class="sy0">;</span> 
<span class="co1">// The coefficients of the constraints: </span>
Matrix A <span class="sy0">=</span> Matrix<span class="sy0">.</span><span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">2</span>, <span class="nu0">4</span>, <span class="kw3">new</span> <span class="kw4">double</span><span class="br0">&#91;</span><span class="br0">&#93;</span> 
    <span class="br0">&#123;</span> 
        <span class="nu0">1</span>, <span class="nu0">1</span>, <span class="nu0">1</span>, <span class="nu0">1</span>, 
        <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="nu0">0.2</span>, <span class="sy0">-</span><span class="nu0">0.15</span>, <span class="sy0">-</span><span class="nu0">0.30</span>
    <span class="br0">&#125;</span>, MatrixElementOrder<span class="sy0">.</span><span class="me1">RowMajor</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="co1">// The right-hand sides of the constraints: </span>
Vector b <span class="sy0">=</span> Vector<span class="sy0">.</span><span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">10000</span>, <span class="sy0">-</span><span class="nu0">1000</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="co1">// We're now ready to call the constructor. </span>
<span class="co1">// The last parameter specifies the number of equality </span>
<span class="co1">// constraints. </span>
QuadraticProgram qp1 <span class="sy0">=</span> <span class="kw3">new</span> QuadraticProgram<span class="br0">&#40;</span>c, R, A, b, <span class="nu0">0</span><span class="br0">&#41;</span><span class="sy0">;</span> 
<span class="co1">// Now we can call the Solve method to run the algorithm: </span>
Vector x <span class="sy0">=</span> qp1<span class="sy0">.</span><span class="me1">Solve</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div></div>

<p>This is how it looks in Visual Basic:</p>

<div class="wp_syntax"><div class="code"><pre class="vbnet"><span class="co1">' The linear term in the objective function: </span>
<span class="kw6">Dim</span> c <span class="kw2">As</span> Vector <span class="sy0">=</span> Vector.<span class="me1">CreateConstant</span><span class="br0">&#40;</span><span class="nu0">4</span>, <span class="nu0">0.0</span><span class="br0">&#41;</span> 
<span class="co1">' The quadratic term in the objective function: </span>
<span class="kw6">Dim</span> R <span class="kw2">As</span> Matrix <span class="sy0">=</span> Matrix.<span class="me1">CreateSymmetric</span><span class="br0">&#40;</span><span class="nu0">4</span>, _ 
    <span class="kw2">New</span> <span class="kw4">Double</span><span class="br0">&#40;</span><span class="br0">&#41;</span> _
        <span class="br0">&#123;</span> _
            <span class="nu0">0.08</span>, <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="sy0">-</span><span class="nu0">0.05</span>, _
            <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="nu0">0.16</span>, <span class="sy0">-</span><span class="nu0">0.02</span>, <span class="sy0">-</span><span class="nu0">0.02</span>, _
            <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="sy0">-</span><span class="nu0">0.02</span>, <span class="nu0">0.35</span>, <span class="nu0">0.06</span>, _
            <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="sy0">-</span><span class="nu0">0.02</span>, <span class="nu0">0.06</span>, <span class="nu0">0.35</span> _
        <span class="br0">&#125;</span>, MatrixTriangle.<span class="me1">Upper</span><span class="br0">&#41;</span> 
<span class="co1">' The coefficients of the constraints: </span>
<span class="kw6">Dim</span> A <span class="kw2">As</span> Matrix <span class="sy0">=</span> Matrix.<span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">2</span>, <span class="nu0">4</span>, <span class="kw2">New</span> <span class="kw4">Double</span><span class="br0">&#40;</span><span class="br0">&#41;</span> _
    <span class="br0">&#123;</span> _
        <span class="nu0">1</span>, <span class="nu0">1</span>, <span class="nu0">1</span>, <span class="nu0">1</span>, _
        <span class="sy0">-</span><span class="nu0">0.05</span>, <span class="nu0">0.2</span>, <span class="sy0">-</span><span class="nu0">0.15</span>, <span class="sy0">-</span><span class="nu0">0.3</span> _
    <span class="br0">&#125;</span>, MatrixElementOrder.<span class="me1">RowMajor</span><span class="br0">&#41;</span> 
<span class="co1">' The right-hand sides of the constraints: </span>
<span class="kw6">Dim</span> b <span class="kw2">As</span> Vector <span class="sy0">=</span> Vector.<span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">10000</span>, <span class="sy0">-</span><span class="nu0">1000</span><span class="br0">&#41;</span> 
<span class="co1">' We're now ready to call the constructor. </span>
<span class="co1">' The last parameter specifies the number of</span>
<span class="co1">' equality constraints. </span>
<span class="kw6">Dim</span> qp1 <span class="kw2">As</span> <span class="kw2">New</span> QuadraticProgram<span class="br0">&#40;</span>c, R, A, b, <span class="nu0">0</span><span class="br0">&#41;</span> 
<span class="co1">' Now we can call the Solve method to run the algorithm: </span>
<span class="kw6">Dim</span> x <span class="kw2">As</span> Vector <span class="sy0">=</span> qp1.<span class="me1">Solve</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></div></div>

<p>And finally in F#:</p>

<div class="wp_syntax"><div class="code"><pre class="fsharp"><span class="co1">// The linear term in the objective function:</span>
<span class="kw1">let</span> c <span class="sy0">=</span> Vector.<span class="me1">CreateConstant</span><span class="br0">&#40;</span><span class="nu0">4</span>, <span class="nu0">0.0</span><span class="br0">&#41;</span>
<span class="co1">// The quadratic term in the objective function:</span>
<span class="kw1">let</span> R <span class="sy0">=</span> Matrix.<span class="me1">CreateSymmetric</span><span class="br0">&#40;</span><span class="nu0">4</span>,
    <span class="br0">&#91;</span>|
        <span class="nu0">0.08</span><span class="sy0">;-</span><span class="nu0">0.05</span><span class="sy0">;-</span><span class="nu0">0.05</span><span class="sy0">;-</span><span class="nu0">0.05</span><span class="sy0">;</span>
        <span class="sy0">-</span><span class="nu0">0.05</span><span class="sy0">;</span> <span class="nu0">0.16</span><span class="sy0">;-</span><span class="nu0">0.02</span><span class="sy0">;-</span><span class="nu0">0.02</span><span class="sy0">;</span>
        <span class="sy0">-</span><span class="nu0">0.05</span><span class="sy0">;-</span><span class="nu0">0.02</span><span class="sy0">;</span> <span class="nu0">0.35</span><span class="sy0">;</span> <span class="nu0">0.06</span><span class="sy0">;</span>
        <span class="sy0">-</span><span class="nu0">0.05</span><span class="sy0">;-</span><span class="nu0">0.02</span><span class="sy0">;</span> <span class="nu0">0.06</span><span class="sy0">;</span> <span class="nu0">0.35</span>
    |<span class="br0">&#93;</span>, MatrixTriangle.<span class="me1">Upper</span><span class="br0">&#41;</span> 
<span class="co1">// The coefficients of the constraints: </span>
<span class="kw1">let</span> A <span class="sy0">=</span> Matrix.<span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">2</span>, <span class="nu0">4</span>, 
    <span class="br0">&#91;</span>|
        <span class="nu0">1.0</span><span class="sy0">;</span> <span class="nu0">1.0</span><span class="sy0">;</span> <span class="nu0">1.0</span><span class="sy0">;</span> <span class="nu0">1.0</span><span class="sy0">;</span>
        <span class="sy0">-</span><span class="nu0">0.05</span><span class="sy0">;</span> <span class="nu0">0.2</span><span class="sy0">;</span> <span class="sy0">-</span><span class="nu0">0.15</span><span class="sy0">;</span> <span class="sy0">-</span><span class="nu0">0.30</span>
    |<span class="br0">&#93;</span>, MatrixElementOrder.<span class="me1">RowMajor</span><span class="br0">&#41;</span> 
<span class="co1">// The right-hand sides of the constraints: </span>
<span class="kw1">let</span> b <span class="sy0">=</span> Vector.<span class="me1">Create</span><span class="br0">&#40;</span><span class="nu0">10000.0</span>, <span class="sy0">-</span><span class="nu0">1000.0</span><span class="br0">&#41;</span> 
<span class="co1">// We're now ready to call the constructor. </span>
<span class="co1">// The last parameter specifies the number of </span>
<span class="co1">// equality constraints. </span>
<span class="kw1">let</span> qp1 <span class="sy0">=</span> <span class="kw1">new</span> QuadraticProgram<span class="br0">&#40;</span>c, R, A, b, <span class="nu0">0</span><span class="br0">&#41;</span>
<span class="co1">// Now we can call the Solve method to run the algorithm:</span>
<span class="kw1">let</span> x <span class="sy0">=</span> qp1.<span class="me1">Solve</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></div></div>

<p>After we&#8217;ve run this code, the vector x will contain the optimal amounts for each asset. It gives us: $3,453, $0, $1,069, $2,223. Not surprisingly, the second asset, which has had a negative return on average, should not be part of our portfolio. However, it turns out that the asset with the lowest positive return should have the largest share.</p>
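<p>It is worth checking a solution like this by hand. The sketch below, in plain Python and independent of the library, plugs the rounded amounts quoted above back into the model. It assumes that the return constraint is the only active general constraint. Under that assumption, the optimality conditions require the ratio (<strong>Rx</strong>)<sub>i</sub> / <em>r<sub>i</sub></em> to be the same for every asset that is actually held. The variable names are chosen for this illustration only.</p>

```python
# Data from the example above.
R = [[0.08, -0.05, -0.05, -0.05],
     [-0.05, 0.16, -0.02, -0.02],
     [-0.05, -0.02, 0.35, 0.06],
     [-0.05, -0.02, 0.06, 0.35]]
r = [0.05, -0.20, 0.15, 0.30]
x = [3453.0, 0.0, 1069.0, 2223.0]   # amounts quoted in the post, rounded to dollars

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

invested = sum(x)                     # how much of the budget is used
expected_return = dot(r, x)           # compare with the required 1000
Rx = [dot(row, x) for row in R]       # half the gradient of x^T R x
variance = dot(x, Rx)                 # objective value x^T R x

# For assets that are held, (R x)_i / r_i should agree up to rounding.
ratios = [g / ri for g, ri, xi in zip(Rx, r, x) if xi > 0]

print(f"invested:        {invested:.0f}")
print(f"expected return: {expected_return:.1f}")
print(f"variance:        {variance:.0f}")
print("ratios:", [round(v, 1) for v in ratios])
```

<p>With the rounded amounts, only $6,745 of the budget is invested, the expected return comes out just under $1,000, and the three ratios agree to within rounding. This is consistent with the return constraint being the binding one, while the budget constraint is not.</p>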
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2011/02/using-quadratic-programming-for-portfolio-optimization-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Accurate trigonometric functions for large arguments</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2011/02/accurate-trigonometric-functions-for-large-arguments/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2011/02/accurate-trigonometric-functions-for-large-arguments/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 14:53:33 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.extremeoptimization.com/Blog/index.php/2011/02/accurate-trigonometric-functions-for-large-arguments/</guid>
		<description><![CDATA[This week we introduced two new features in the Extreme Optimization Numerical Libraries for .NET. Trigonometric functions with large arguments The .NET Framework implementation of the trigonometric functions, sine, cosine, and tangent, relies on the corresponding processor instruction. This gives extremely fast performance, but may not give fully accurate results. This is the case for [...]]]></description>
			<content:encoded><![CDATA[<p>This week we introduced two new features in the Extreme Optimization Numerical Libraries for .NET.</p>
<h3>Trigonometric functions with large arguments</h3>
<p>The .NET Framework implementation of the trigonometric functions, sine, cosine, and tangent, relies on the corresponding processor instruction. This gives extremely fast performance, but may not give fully accurate results. The loss of accuracy shows up even for fairly small arguments.</p>
<p>For example, Math.Sin(1000000.0) returns -0.34999350217129<span style="text-decoration: underline;">177</span> while the correct value is -0.34999350217129<span style="text-decoration: underline;">295</span>. So we’ve already lost at least 2 digits. And things just go downhill from there. At 10<sup>12</sup>, we only get 8 good digits.</p>
<p>For even larger arguments, something quite unexpected happens. The result of Math.Sin(1e20) is… 1e20! The argument is returned unchanged! Not only is this return value completely meaningless, it can also cause a calculation to fail if it relies on the fact that sine and cosine are bounded by -1 and +1.</p>
<p>To understand what is going on, we need to go back to the implementation.</p>
<p>Math.Sin, Math.Cos and Math.Tan rely on processor instructions like fsin and fcos. These instructions work by first reducing the angle to a much smaller value, and then using a table-based approach to get the final result. Moreover, according to the documentation, the argument must be strictly between –2<sup>63</sup> and 2<sup>63</sup>. The return value is not defined for arguments outside this interval. This explains the complete breakdown for large arguments.</p>
<p>The explanation for the gradual loss of accuracy is more subtle. As I said, the computation is done by first reducing the argument to a smaller interval, typically (–π, π]. The argument <em>x</em> is rewritten in the form</p>
<blockquote><p><em>x</em> = <em>n</em> π + <em>f</em></p></blockquote>
<p>where <em>n</em> is an even integer, and <em>f</em> is a real number with magnitude less than π.</p>
<p>The first step is to divide the number by π. The quotient is rounded to the nearest even integer to give us <em>n</em>. We get the reduced value <em>f</em> by subtracting this number times π from <em>x</em>. Because of the periodicity of the trigonometric functions, the value of the function at <em>x</em> is the same as the value at <em>f</em>, which can be calculated accurately and efficiently.</p>
<p>Now, if you start with a value like 10<sup>300</sup>, you’ll need to divide this number by π with at least 300 digits of precision, round the quotient to the nearest even number, and subtract π times this number from 10<sup>300</sup>. To do this calculation accurately over the whole range, we would need to work with a value of π that is accurate to 1144 bits.</p>
<p>Even today, this is just a little bit beyond the capabilities of current hardware. Moreover, it could be considered wasteful to spend so much silicon on a calculation that is not overly common. Intel processors use just 66 bits. This leaves us with just 14 extra bits. The effects will begin to show with arguments that are larger than 2<sup>16</sup>.</p>
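<p>The effect of a short value of π can be simulated without special hardware. The sketch below is in plain Python and is only a model of the reduction step, not what the processor literally does. It assumes that π is simply cut off after 66 bits, and it uses a period of 2π, which is the same as taking <em>n</em> even in the formula above. Exact rational arithmetic keeps everything else error-free until the final conversion to a double.</p>

```python
from fractions import Fraction
import math

# pi to 50 decimal places, ample for arguments up to about 1e30.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")
# pi cut off after 66 significant bits: 2 integer bits plus 64 fraction bits.
PI_66 = Fraction(math.floor(PI * 2**64), 2**64)

def reduce_argument(x, pi):
    """Return f such that x = k * (2 * pi) + f with k an integer and |f| <= pi."""
    x = Fraction(x)                 # a float converts to a Fraction exactly
    k = round(x / (2 * pi))
    return float(x - k * 2 * pi)

def sin_reduced(x, pi):
    return math.sin(reduce_argument(x, pi))

for x in (1e6, 1e12):
    coarse = sin_reduced(x, PI_66)
    accurate = sin_reduced(x, PI)
    print(f"x = {x:g}: 66-bit pi {coarse!r}, 50-digit pi {accurate!r}, "
          f"difference {coarse - accurate:.2e}")
```

<p>The error in the reduced argument is the error in π multiplied by the number of periods that were removed, so it grows in proportion to the argument. This is the gradual loss of accuracy described above.</p>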
<h3>Accurate argument reduction for trigonometric functions</h3>
<p>So, there are two problems with the standard trigonometric functions in the .NET Framework:</p>
<ol>
<li>The accuracy decreases with increasing size of the argument, losing several digits for even modestly sized arguments.</li>
<li>When the argument is very large, the functions break down completely.</li>
</ol>
<p>To address these two problems, we’ve added <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.Elementary.Sin.aspx">Sin</a>, <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.Elementary.Cos.aspx">Cos</a> and <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.Elementary.Tan.aspx">Tan</a> methods to the <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.Elementary.aspx">Elementary</a> class. These methods perform fully accurate argument reduction so the results are accurate even for huge values of the argument.</p>
<p>For example, the 256-digit floating-point number 6381956970095103 × 2<sup>797</sup> is very close to a multiple of π/2. In fact, it is so close that the first 61 binary digits after the binary point of the reduced value are 0. As expected, Math.Cos gives the completely meaningless result 5.31937264832654141671e+255. Elementary.Cos returns -4.6871659242546267E-19, which is almost exactly the same as what <a href="http://www.wolframalpha.com/input/?i=N%5BCos%5B6381956970095103*2%5E797%5D%5D">Wolfram|Alpha</a> gives: -4.6871659242546276E-19. The relative error is about the size of the machine precision.</p>
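<p>This example can be reproduced with nothing more than big integers. The sketch below is in plain Python and is not how Elementary.Cos is implemented. It computes π to 1200 bits with Machin's formula, reduces the argument modulo π/2 in fixed-point integer arithmetic, and only then calls the ordinary sine or cosine. The function names and the choice of 1200 bits are assumptions made for this illustration.</p>

```python
import math

def pi_scaled(bits):
    """Integer approximation of pi * 2**bits, using Machin's formula
    pi = 16*atan(1/5) - 4*atan(1/239) in fixed-point integer arithmetic."""
    guard = 32                              # extra bits absorb truncation error
    one = 1 << (bits + guard)

    def atan_inverse(n):
        # atan(1/n) = 1/n - 1/(3 n^3) + 1/(5 n^5) - ...
        power = one // n
        total = power
        divisor, sign = 3, -1
        while power:
            power //= n * n
            total += sign * (power // divisor)
            divisor += 2
            sign = -sign
        return total

    return (16 * atan_inverse(5) - 4 * atan_inverse(239)) >> guard

def cos_exact_reduction(mantissa, exponent, bits=1200):
    """cos(mantissa * 2**exponent) for non-negative integers, reducing modulo pi/2."""
    pi = pi_scaled(bits)
    twice_x = (mantissa << exponent) << (bits + 1)   # 2x, scaled by 2**bits
    q = (twice_x + pi // 2) // pi                    # nearest integer to x / (pi/2)
    r = (twice_x - q * pi) / (1 << (bits + 1))       # x - q*(pi/2), rounded to a float
    quadrant = q % 4
    if quadrant == 0:
        return math.cos(r)
    if quadrant == 1:
        return -math.sin(r)
    if quadrant == 2:
        return -math.cos(r)
    return math.sin(r)

print(cos_exact_reduction(6381956970095103, 797))
```

<p>The quotient <em>q</em> has about 850 bits here, and the reduced value starts with 61 zero bits, so π has to be known to well over 900 bits before a single correct digit of the result appears. That is why the reduction cannot be done in ordinary double precision.</p>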
<p>What about performance? Thanks to some tricks with modular multiplication, it is possible to do the reduction in nearly constant time. Moreover, since the reduction only needs to be done for larger arguments, the overhead in most situations is minimal. Our benchmarks show a 4% overall slowdown when no reduction is needed. For smaller values, the operation can take close to twice as long, while in the worst case, for large arguments that require a full reduction, we see a 10x slowdown.</p>
<h3>Transcendental functions for System.Decimal</h3>
<p>Also new in this week’s update is the <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.DecimalMath.aspx">DecimalMath</a> class, which contains implementations of all functions from System.Math for the decimal type. All trigonometric, hyperbolic, exponential and logarithmic functions are supported, as well as the constants <em>e</em> and π.</p>
<p>The calculations are accurate to the full 96-bit precision of the decimal type. This is useful for situations where double precision is not enough, but going with an arbitrary-precision <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Mathematics.BigFloat.aspx">BigFloat</a> would be overkill.</p>
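<p>To get a feel for what the extra precision buys, here is a small comparison in Python. Python's decimal module is not System.Decimal, and this is not the DecimalMath API. It is used here only as a stand-in, with the precision set to 28 significant digits, which is roughly what a 96-bit integer mantissa can hold.</p>

```python
from decimal import Decimal, localcontext
import math

with localcontext() as ctx:
    ctx.prec = 28                       # about what a 96-bit mantissa can hold
    root_decimal = Decimal(2).sqrt()
    exp_decimal = Decimal(1).exp()

root_double = math.sqrt(2.0)            # ordinary double precision

print("sqrt(2), double :", repr(root_double))
print("sqrt(2), decimal:", root_decimal)
print("e, decimal      :", exp_decimal)
```

<p>The double carries 16 to 17 significant digits, while the decimal results carry 28.</p>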
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2011/02/accurate-trigonometric-functions-for-large-arguments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New feature: Wilcoxon-Mann-Whitney and Kruskal-Wallis Tests</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2011/02/mann-whitney-and-kruskal-wallis-tests/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2011/02/mann-whitney-and-kruskal-wallis-tests/#comments</comments>
		<pubDate>Thu, 10 Feb 2011 06:00:26 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.extremeoptimization.com/Blog/?p=21</guid>
		<description><![CDATA[We just released an update to our Extreme Optimization Numerical Libraries for .NET that adds some new classes. We&#8217;ve added two non-parametric tests: the Mann-Whitney and Kruskal-Wallis tests. These are used to test the hypothesis that two or more samples were drawn from the same distribution. The Mann-Whitney test is used for two samples. The Kruskal-Wallis [...]]]></description>
			<content:encoded><![CDATA[<p>We just released an update to our Extreme Optimization Numerical Libraries for .NET that adds some new classes. We&#8217;ve added two non-parametric tests: the Mann-Whitney and Kruskal-Wallis tests. These are used to test the hypothesis that two or more samples were drawn from the same distribution. The Mann-Whitney test is used for two samples. The Kruskal-Wallis test is used when there are two or more samples.</p>
<p>For both tests, the test statistic only depends on the ranks of the observations in the combined sample, and no assumption about the distribution of the populations is made. This is the meaning of the term non-parametric in this context.</p>
<p>The <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Statistics.Tests.MannWhitneyTest.aspx">Mann-Whitney test</a>, sometimes also called the Wilcoxon-Mann-Whitney test or the Wilcoxon Rank Sum test, is often interpreted to test whether the medians of the distributions are the same. Although a difference in median is the dominant differentiator if it is present, other factors such as the shape or the spread of the distributions may also be <a href="http://www.bmj.com/content/323/7309/391.extract">significant</a>.</p>
<p>For relatively small sample sizes, and if no ties are present, we return an exact result for the Mann-Whitney test. For larger samples or when some observations have the same value, the common normal approximation is used.</p>
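<p>The difference between the exact result and the normal approximation is easy to see on a tiny example. The sketch below is in plain Python and is not the library's implementation. The exact p-value is obtained by brute-force enumeration of all possible rank assignments, which is only practical for small samples without ties, and the normal approximation omits the tie and continuity corrections. All function names are made up for this illustration.</p>

```python
from itertools import combinations
import math

def average_ranks(values):
    """1-based ranks of the values; ties receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic of the first sample: its rank sum minus the smallest possible one."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2

def exact_p_value(a, b):
    """Two-sided p-value by enumerating every choice of ranks (no ties allowed)."""
    n_a, n = len(a), len(a) + len(b)
    mean = n_a * len(b) / 2
    observed = abs(mann_whitney_u(a, b) - mean)
    extreme = total = 0
    for subset in combinations(range(1, n + 1), n_a):
        u = sum(subset) - n_a * (n_a + 1) / 2
        total += 1
        if abs(u - mean) >= observed:
            extreme += 1
    return extreme / total

def normal_p_value(a, b):
    """Two-sided p-value from the normal approximation, without corrections."""
    n_a, n_b = len(a), len(b)
    mean = n_a * n_b / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (mann_whitney_u(a, b) - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

a = [1.1, 2.3, 3.1]
b = [4.5, 5.2, 6.8, 7.0]
print(mann_whitney_u(a, b), exact_p_value(a, b), normal_p_value(a, b))
```

<p>In this example every value in the first sample is smaller than every value in the second, so U is 0. Only 2 of the 35 possible rank assignments are this extreme, which gives an exact two-sided p-value of 2/35. The normal approximation gives a noticeably smaller value, which illustrates why an exact result is preferable for small samples.</p>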
<p>The <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Statistics.Tests.KruskalWallisTest.aspx">Kruskal-Wallis test</a> is an extension of the Mann-Whitney test to more than two samples. We always use an approximation for the distribution. The most common approximation is through a chi-square distribution. We chose to go with an approximation in terms of the beta distribution that is generally more reliable, especially for smaller sample sizes. For comparison with other statistical software, the <a href="http://www.extremeoptimization.com/Documentation/Reference/Extreme.Statistics.Tests.KruskalWallisTest.ChiSquarePValue.aspx">chi-square p-value</a> is also available.</p>
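<p>For reference, the Kruskal-Wallis statistic itself is simple to compute from the ranks. The sketch below is in plain Python and is not the library's implementation. It leaves out the tie correction and shows only the common chi-square approximation, not the beta approximation mentioned above. The closed-form tail probability used at the end is valid only for three samples, where the chi-square distribution has two degrees of freedom.</p>

```python
import math

def average_ranks(values):
    """1-based ranks of the values; ties receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def kruskal_wallis_h(*samples):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    combined = [v for sample in samples for v in sample]
    ranks = average_ranks(combined)
    n = len(combined)
    total, offset = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[offset:offset + len(sample)])
        total += rank_sum ** 2 / len(sample)
        offset += len(sample)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
p_chi_square = math.exp(-h / 2)   # chi-square tail, 2 degrees of freedom only
print(h, p_chi_square)
```

<p>With three completely separated samples of three observations each, the rank sums are 6, 15 and 24, which gives H = 7.2.</p>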
<p>We created some QuickStart samples that illustrate how to use the new functionality:</p>
<ul>
<li><a href="http://www.extremeoptimization.com/QuickStart/NonParametricTestsCS.aspx">Non-parametric tests in C#</a></li>
<li><a href="http://www.extremeoptimization.com/QuickStart/NonParametricTestsVB.aspx">Non-parametric tests in Visual Basic</a></li>
<li><a href="http://www.extremeoptimization.com/QuickStart/NonParametricTestsFS.aspx">Non-parametric tests in F#</a> </li>
</ul>
<p>You can also view the documentation on <a href="http://www.extremeoptimization.com/Documentation/Statistics/Hypothesis_Tests/Non-Parametric_Tests.aspx">non-parametric tests</a>, or download the <a href="http://www.extremeoptimization.com/downloads.aspx">trial version</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2011/02/mann-whitney-and-kruskal-wallis-tests/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My .NET Framework Wish List</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2007/05/my-net-framework-wish-list/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2007/05/my-net-framework-wish-list/#comments</comments>
		<pubDate>Mon, 07 May 2007 13:03:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2007/05/07/42374.aspx</guid>
		<description><![CDATA[This is now the fifth year I&#8217;ve been writing numerical software for the .NET platform. Over these years, I&#8217;ve discovered quite a few, let&#8217;s call them &#8216;unfortunate&#8217;, design decisions that make writing solid and fast numerical code on .NET more difficult than it needs to be. What I&#8217;d like to do in the coming weeks [...]]]></description>
			<content:encoded><![CDATA[<p>This is now the fifth year I&#8217;ve been writing numerical software for the .NET platform. Over these years, I&#8217;ve discovered quite a few, let&#8217;s call them &#8216;unfortunate&#8217;, design decisions that make writing solid and fast numerical code on .NET more difficult than it needs to be.</p>
<p>What I&#8217;d like to do in the coming weeks is list some of the improvements that would make life easier for people in our specialized field of technical computing. The items mentioned in this post aren&#8217;t new: I&#8217;ve written about them before. But it&#8217;s nice to have them all collected in one spot.</p>
<p><font size=4><strong>Fix the &#8220;Inlining Problem&#8221;</strong></font></p>
<p>First on the list: the &#8220;inlining&#8221; problem. Method calls with parameters that are value types do not get inlined by the JIT. This is very unfortunate, as it eliminates most of the benefit of defining specialized value types. For example: it&#8217;s easy enough to define a complex number structure with overloaded operators and enough bells and whistles to make you deaf. Unfortunately, none of those operator calls are inlined. You end up with code that is an order of magnitude slower than it needs to be.</p>
<p>Even though it has been the top performance issue for several years, there is no indication yet that it will be fixed any time soon. You can <a href="http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93858">add your vote</a> to the already sizeable number on Microsoft&#8217;s product feedback site.</p>
<p><strong><font size=4>Support the IEEE-754 Standard</font></strong></p>
<p>Over 20 years ago, the IEEE published a <a href="http://grouper.ieee.org/groups/754/">standard for floating-point arithmetic</a> that has since been adopted by all major manufacturers of CPUs. So, platform independence can&#8217;t be an issue. Why then is it that the .NET Framework has only the most minimal support for the standard? Surely the fact that people took the time to come up with a solid standard, and the fact that it has enjoyed such wide support from hardware vendors, should be an indication that this is something useful and would greatly benefit an important segment of the developer community.</p>
<p>I&#8217;ve written about the&nbsp;benefits of <a href="http://blogs.extremeoptimization.com/jeffrey/archive/2006/01/30/6736.aspx">floating-point exceptions</a> before, and I&#8217;ve discussed my proposal for a <a href="http://blogs.extremeoptimization.com/jeffrey/archive/2006/02/08/7074.aspx">FloatingPointContext class</a>. I&#8217;ve added a suggestion to this effect in LadyBug. Please go over there and&nbsp;<a href="https://connect.microsoft.com/VisualStudio/feedback/Vote.aspx?FeedbackID=276107">vote</a> for this proposal.</p>
<p><font size=4><strong>Allow Overloading of Compound Assignment Operators</strong></font></p>
<p>This is another topic I&#8217;ve written about <a href="http://blogs.extremeoptimization.com/jeffrey/archive/2005/02/19/153.aspx">before</a>. In a nutshell: C# and VB.NET don&#8217;t support custom overloaded compound assignment operators at all. C++/CLI supports them, and purposely violates the CLI spec in the process &#8211; which is a good thing! One point I would like to add: performance isn&#8217;t the only reason to want them. Sometimes there is a semantic difference. Take a look at this code:</p>
<p><font face="Courier New" size=2>RowVector pivotRow = matrix.GetRow(pivot);<br /><font color=#0000ff>for</font> (row = pivot + 1; row &lt; rowCount; row++)<br />{<br />&nbsp;&nbsp;&nbsp;RowVector currentRow = matrix.GetRow(row);<br />&nbsp;&nbsp;&nbsp;currentRow -= factor * pivotRow;<br />}</font></p>
<p>which could be part of the code for computing the LU Decomposition of a matrix. The <font face="Courier New" size=2>GetRow</font> method returns the row of the matrix without making a copy of the data. The code inside the loop is meant to subtract a multiple of the pivot row from the current row. With the current semantics, where x&nbsp;-=&nbsp;y is equivalent to x&nbsp;=&nbsp;x&nbsp;-&nbsp;y, this code does not perform as expected: the subtraction produces a new vector that is assigned to the local variable <font face="Courier New" size=2>currentRow</font>, so the matrix itself is never modified.</p>
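<p>The difference can be reproduced with a minimal sketch. The <font face="Courier New" size=2>RowView</font> class, its <font face="Courier New" size=2>SubtractInPlace</font> method and the <font face="Courier New" size=2>Demo</font> class are hypothetical stand-ins invented for this illustration; they are not the API of any particular library.</p>

```csharp
using System;

// Hypothetical stand-in for a row vector that is a view on shared storage.
sealed class RowView
{
    private readonly double[] data;
    public RowView(double[] data) { this.data = data; }
    public double this[int i] { get { return data[i]; } }
    public int Length { get { return data.Length; } }

    // A binary operator has to return a new object with its own storage.
    public static RowView operator -(RowView a, RowView b)
    {
        var result = new double[a.Length];
        for (int i = 0; i < result.Length; i++) result[i] = a[i] - b[i];
        return new RowView(result);
    }

    // What an instance-level compound assignment could do instead.
    public void SubtractInPlace(RowView other)
    {
        for (int i = 0; i < data.Length; i++) data[i] -= other[i];
    }
}

static class Demo
{
    // Returns storage[0] after "row -= pivot" and after the in-place update.
    public static double[] Run()
    {
        double[] storage = { 5.0, 7.0 };
        var pivot = new RowView(new double[] { 1.0, 2.0 });

        var row = new RowView(storage);
        row -= pivot;                       // compiled as: row = row - pivot
        double afterCompound = storage[0];  // shared storage is untouched

        new RowView(storage).SubtractInPlace(pivot);
        double afterInPlace = storage[0];   // shared storage was updated

        return new[] { afterCompound, afterInPlace };
    }

    static void Main()
    {
        double[] r = Run();
        Console.WriteLine("after -=      : " + r[0]);
        Console.WriteLine("after in-place: " + r[1]);
    }
}
```

<p>In this sketch the first value is still 5 after the compound assignment, and only the explicit in-place call changes it to 4.</p>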
<p>What I would like to see is the CLI spec changed to match what C++/CLI does. Compound assignment operators should be instance members.</p>
<p>Still&nbsp;to come: a proposal for some (relatively) minor modifications to .NET generics to efficiently implement generic arithmetic, better support for arrays, and more.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2007/05/my-net-framework-wish-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic times two with the Dynamic Language Runtime</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2007/04/dynamic-times-two-with-the-dynamic-language-runtime/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2007/04/dynamic-times-two-with-the-dynamic-language-runtime/#comments</comments>
		<pubDate>Mon, 30 Apr 2007 14:52:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2007/04/30/42278.aspx</guid>
		<description><![CDATA[Microsoft today announced their latest addition to the .NET family: the Dynamic Language Runtime (DLR). As Jim Hugunin points out, it is based on the IronPython 1.0 codebase, but has been generalized so it can support other dynamic languages, including Visual Basic. Now, the word &#8216;dynamic&#8217; here is often misunderstood. Technically, the word dynamic refers [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft today <a href="http://blogs.msdn.com/hugunin/archive/2007/04/30/a-dynamic-language-runtime-dlr.aspx">announced</a> their latest addition to the .NET family: the Dynamic Language Runtime (DLR). As Jim Hugunin points out, it is based on the IronPython 1.0 codebase, but has been generalized so it can support other dynamic languages, including Visual Basic.</p>
<p>Now, the word &#8216;dynamic&#8217; here is often misunderstood. Technically, the word dynamic refers to the type system. The .NET CLR is statically typed: every object has a well-defined type at compile time, and all method calls and property references are resolved at compile time. Virtual methods are somewhat of an in-between case, because which code is called depends on the runtime type, a type which may not even have existed when the original code was compiled. But still, every type that overrides a method must inherit from a specific parent class.</p>
<p>In dynamic languages, the type of variables, their methods and properties may be defined at runtime. You can create new types and add properties and methods to existing types. When a method is called in a dynamic language, the <em>runtime </em>looks at the object, looks for a method that matches, and calls it. If there is no matching method, a run-time error is raised.</p>
<p>Writing code in dynamic languages can be very quick, because there is&nbsp;rarely a&nbsp;need to specify type information. It&#8217;s also very common to use dynamic languages interactively. You can execute IronPython scripts, but there&#8217;s also a Console that hosts interactive IronPython sessions.</p>
<p>And this is where it gets confusing. Because leaving out type information and interactive environments come naturally to dynamic languages, these features&nbsp;are often thought of as properties of dynamic languages. They are not.</p>
<p>Ever heard of <a href="http://research.microsoft.com/fsharp">F#</a>? It is&nbsp;a statically typed, compiled language created by <a href="http://blogs.msdn.com/dsyme/">Don Syme</a> and others at Microsoft Research. It can be used to build libraries and end-user applications, much like C# and VB. But it&nbsp;also has an interactive console and eliminates the need for most type specifications through a smart use of type inference.</p>
<p>F# is not a dynamic language in the technical sense: it is statically typed. But because it has an interactive console and you&nbsp;rarely have to specify types, it is a dynamic language in the eyes of a lot of people. In fact, at&nbsp;the Lang.NET symposium&nbsp;hosted by Microsoft last August, people were asked what their favorite dynamic language is. Many&nbsp;answered with&nbsp;F#. And these were programming language designers and compiler writers!</p>
<p>Anyway, the point I wanted to make with this post is that the new Dynamic Language Runtime has great support both for languages that are dynamic in the technical sense (dynamic types) and for features that are merely perceived as dynamic, like interactive environments. I hope the distinction between these two aspects will be clarified in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2007/04/dynamic-times-two-with-the-dynamic-language-runtime/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Latest Supercomputer Top 500</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2006/07/latest-supercomputer-top-500/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2006/07/latest-supercomputer-top-500/#comments</comments>
		<pubDate>Mon, 10 Jul 2006 15:45:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2006/07/10/17752.aspx</guid>
		<description><![CDATA[Last week, the latest edition of the list of the 500 fastest supercomputers was released. Two recent developments make this list interesting. The arrival of multicore processors. Even though their presence is still modest on the current list, expect their share to rise. Intel is targeting 32 cores on a chip by 2010. Microsoft made [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, the latest edition of the list of the <a href="http://www.top500.org/">500 fastest supercomputers</a> was released. Two recent developments make this list interesting.</p>
<ol>
<li>The arrival of multicore processors. Even though their presence is still modest on the current list, expect their share to rise. Intel is targeting 32 cores on a chip by 2010.</li>
<li>Microsoft made its entry on the scene with <a href="http://www.microsoft.com/windowsserver2003/ccs/default.mspx">Windows Compute Cluster Server 2003</a>, an enhanced Windows 2003&nbsp;Enterprise Server version tweaked for High Performance Computing. The first (and so far the only) entry on the Top500 list is at the <a href="http://access.ncsa.uiuc.edu/Releases/06.28.06_Top500_deb.html">National Center&nbsp;for Supercomputing Applications</a> at the <a href="http://www.uiuc.edu/">University of Illinois</a>. It will be interesting to see how this number grows in the coming years. At the very least, it will give some indication of the headway Microsoft is making in the HPC&nbsp;market.</li>
</ol>
<p>Some trivia:</p>
<p>The #1 spot is still held by IBM&#8217;s BlueGene/L supercomputer at&nbsp;the <a href="http://www.llnl.gov/">Lawrence Livermore National Laboratory</a>. At over 280 TFlops, this monster is faster than an IBM PC with 8087 co-processor by a factor of roughly one billion.&nbsp;That&#8217;s right: it&#8217;s as fast as 1,000,000,000 original IBM PCs!</p>
<p>The first Top500 list&nbsp;was published in <a href="http://www.top500.org/list/1993/06/">June 1993</a>.&nbsp;It&#8217;s interesting to note that one dual processor machine based on Intel&#8217;s&nbsp;latest dual-core&nbsp;processors would, at <a href="http://www.intel.com/performance/server/xeon/hpcapp.htm">34.9GFlops</a>,&nbsp;take the #2 spot on that original list. Today&#8217;s average desktop would make it into the top 100&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2006/07/latest-supercomputer-top-500/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Double trouble: an interesting puzzle</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2006/06/double-trouble-an-interesting-puzzle/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2006/06/double-trouble-an-interesting-puzzle/#comments</comments>
		<pubDate>Fri, 16 Jun 2006 09:22:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2006/06/16/16647.aspx</guid>
		<description><![CDATA[I&#8217;ve written before about some of the strange looking&#160;behavior you can find when working with&#160;numbers on a computer. As this article explains at length, this&#160;behavior is a completely logical consequence of the need to use a finite set of values to represent potentially infinitely many numbers. Kathy Kam of the Microsoft CLR team recently posted [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve written before about some of the strange looking&nbsp;behavior you can find when working with&nbsp;numbers on a computer. As <a href="http://www.extremeoptimization.com/resources/articles/FPDotNetConceptsAndFormats.aspx">this article</a> explains at length, this&nbsp;behavior is a completely logical consequence of the need to use a finite set of values to represent potentially infinitely many numbers.</p>
<p><a href="http://blogs.msdn.com/kathykam/default.aspx">Kathy Kam</a> of the Microsoft CLR team recently posted another interesting piece of <a href="http://blogs.msdn.com/kathykam/archive/2006/05/08/592888.aspx">floating-point weirdness</a>. The question boils down to this. When you run this code:</p>
<p><font face="Courier New" size=2>Console.WriteLine(4.170404 == Convert.ToDouble(<font color=#ff0000>&quot;4.170404&quot;</font>));</font></p>
<p>why does it print <font face="Courier New" color=#0000ff size=2>false</font> instead of the expected <font face="Courier New" color=#0000ff size=2>true</font>?</p>
<p>The answer turns out to be not trivial and comes down to the different ways that the .NET framework and the C# compiler convert text strings to numbers.</p>
<p>The C# compiler uses the Windows API function <font face="Courier New" size=2>VarR8FromStr</font> to convert the literal value to a double. The documentation isn&#8217;t very specific about its inner workings. The C# spec says in section 9.4.4.3 that doubles are rounded using IEEE &#8220;round to nearest&#8221; mode.</p>
<p>So what happens here? First, the number is &#8216;normalized&#8217;: the number is scaled by&nbsp;a power of two so that the number is between 1 and 2. The scale factor is moved to the exponent. Next, the number is rounded to double precision, which keeps 52 binary digits after the binary point (53 significant bits, counting the leading 1).</p>
<p>Here, 4.170404 is divided by 4, which gives 1.042601. The binary representation of this number is:</p>
<p>1.0000101011100111111001100010110111000110111000101&#8230;</p>
<p>The part that interests us starts at digit #52 after the &#8220;binary point,&#8221; so let&#8217;s show everything after, say, the 40th digit:</p>
<p><font face="Courier New" size=2>4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 7<br />1234567890 1234567890 1234567890<br />1110001010 1010000000 0000011001</font></p>
<p>To round to 52 binary digits, we need to round upwards:</p>
<p><font face="Courier New" size=2>4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5<br /></font><font face="Courier New" size=2>1234567890 12<br />1110001010 11</font></p>
<p>This result is used to compose the double precision number.</p>
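<p>To look at the two values directly, a small sketch along the following lines prints their raw bit patterns. The class name <font face="Courier New" size=2>DoubleBits</font> and the helper <font face="Courier New" size=2>Hex</font> are made up for this illustration; the only framework calls are <font face="Courier New" size=2>BitConverter.DoubleToInt64Bits</font> and <font face="Courier New" size=2>Convert.ToDouble</font>. Whether the two patterns differ depends on the compiler and runtime version in use, so no output is shown here.</p>

```csharp
using System;
using System.Globalization;

static class DoubleBits
{
    // Returns the raw IEEE-754 bit pattern of a double as 16 hex digits.
    public static string Hex(double value)
    {
        return BitConverter.DoubleToInt64Bits(value).ToString("X16");
    }

    static void Main()
    {
        // Converted by the compiler at compile time.
        double literal = 4.170404;
        // Converted by the runtime; the invariant culture fixes the decimal separator.
        double parsed = Convert.ToDouble("4.170404", CultureInfo.InvariantCulture);

        Console.WriteLine(Hex(literal));
        Console.WriteLine(Hex(parsed));
        Console.WriteLine(literal == parsed);
    }
}
```

<p>If the comparison prints false, the two hexadecimal strings will differ only in their last digit, which is the rounding difference described above.</p>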
<p>If we follow the path of the <font face="Courier New" size=2>Convert.ToDouble</font> call, we find that it passes its string argument on to <font face="Courier New" size=2>Double.Parse</font>. This&nbsp;method prepares an internal structure called a <font face="Courier New" size=2>NumberBuffer</font> and eventually calls a function named <font face="Courier New" size=2>NumberBufferToDouble</font> internal to the CLR. In the Shared Source CLI implementation (Rotor) code, we find the following comment&nbsp;for the function that does all the hard work:</p>
<p><font face="Courier New" color=#008000 size=2>&nbsp;&nbsp;&nbsp; The internal integer representation of the float number is<br />&nbsp;&nbsp;&nbsp; UINT64 mantisa + INT exponent. The mantisa is kept normalized<br />&nbsp;&nbsp;&nbsp; ie with the most significant one being 63-th bit of UINT64.</font></p>
<p>This is good news &#8211; extra precision is used to ensure we get the correct result. Looking further, we find that this function uses a helper function appropriately called <font face="Courier New" size=2>Mul64Lossy</font>, which multiplies two 64-bit values. In the comments, we find this:</p>
<p><font face="Courier New" color=#008000 size=2>&nbsp;&nbsp;&nbsp; // it&#8217;s ok to losse some precision here &#8211; Mul64 will be called<br />&nbsp;&nbsp;&nbsp; // at most twice during the conversion, so the error won&#8217;t propagate<br />&nbsp;&nbsp;&nbsp; // to any of the 53 significant bits of the result</font></p>
<p>This is bad news. If you look at the binary representation of 4.170404/4 above, you&#8217;ll see that all digits from the 54th up to the 65th are zero. So it is very much possible that there <em>was</em> some loss of precision here, and that it <em>did</em> propagate to the last significant digit of the final result. The assumption made by the developer of this code is mostly right, but sometimes wrong.</p>
<p>But why risk loss of precision when it can be avoided? The (misguided) answer is: speed. The Rotor code contains a function, <font face="Courier New" size=2>Mul64Precise</font> which&nbsp;doesn&#8217;t suffer from this loss of precision. However, it does use a few extra instructions to do some more shifting and multiplying. The function is only used in debug mode to verify that some internal conversion tables are correct.</p>
<p>In the grand scheme of things, the few extra instructions that would be used to get a correct result have only a very small effect on performance.&nbsp;The <font face="Courier New" size=2>Convert.ToDouble</font> method&nbsp;that started it all ends up spending most of its time parsing according to the specified locale, checking for currency symbols, thousands separators, etc. Only a tiny fraction of the time is spent in the <font face="Courier New" size=2>Mul64</font> functions.</p>
<p>Let&#8217;s estimate how common this error is. For a conversion error to occur, the 12 bits just past the rounding position (the 54th to the 65th digits in the example above) must all be zero. That&#8217;s about 1 in 4000. Also, the rounding must be affected, which gives us another factor of 2 or 4. So, as many as 1 conversion out of every 10000 may suffer from this effect!</p>
<p>The moral of the story: Be very careful with your assumptions about how errors will propagate. Don&#8217;t compromise for the sake of a few CPU cycles, unless performance is&nbsp;absolutely critical and is&nbsp;more important than accuracy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2006/06/double-trouble-an-interesting-puzzle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fast, Faster, Fastest</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2006/04/fast-faster-fastest/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2006/04/fast-faster-fastest/#comments</comments>
		<pubDate>Wed, 26 Apr 2006 10:05:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2006/04/26/13824.aspx</guid>
		<description><![CDATA[One of my favourite pass-times is getting the most out of a piece of code. Today, I got an opportunity to play a bit from a comment on Rico Mariani&#8217;s latest performance challenge: how to format the time part of a DateTime structure in the form &#8220;hh:mm:ss.mmm&#8220; in the fastest possible way. Apparently, Alois Kraus [...]]]></description>
			<content:encoded><![CDATA[<p>One of my favourite pass-times is getting the most out of a piece of code. Today, I got an opportunity to play a bit from a comment on Rico Mariani&#8217;s latest performance challenge: how to format the time part of a DateTime structure in the form &#8220;hh:mm:ss.mmm&#8220; in the fastest possible way.</p>
<p>Apparently, <a href="http://geekswithblogs.net/akraus1/archive/2006/04/23/76146.aspx">Alois Kraus</a> and <a href="http://geekswithblogs.net/gyoung/archive/2006/04/24/76187.aspx">Greg Young</a> have been at it for a bit. Their solution&nbsp;already gives us more than a five-fold increase in speed compared to the simplest solution using <font face="Courier New" size=2>String.Format</font>. But could we do even better?</p>
<p>As it turns out, we could. Here&#8217;s the code that we can improve:</p>
<p><font face="Courier New" size=2><font color=#0000ff>long</font> ticks = time.Ticks;<br /><font color=#0000ff>int</font> hour = (<font color=#0000ff>int</font>)((ticks / 0x861c46800L)) % 24;<br /><font color=#0000ff>int</font> minute = (<font color=#0000ff>int</font>)((ticks / 0x23c34600L)) % 60;<br /><font color=#0000ff>int</font> second = (<font color=#0000ff>int</font>)(((ticks / 0x989680L)) % 60L);<br /><font color=#0000ff>int</font> ms = (<font color=#0000ff>int</font>)(((ticks / 0x2710L)) % 0x3e8L);</font></p>
<p>The tick count is the number of 100 nanosecond (ns) intervals since the zero time value. For each of the hour, minute, second and millisecond parts, this code divides the number of ticks by the number of 100ns intervals in that time span, and reduces that number to the number of units in the larger time unit using a modulo. So, for example, there are 60x60x10000000&nbsp;ticks in an hour, which is 0x861c46800 in hex, and there are 24 hours in a day.</p>
<p>What makes the above code less than optimal is that it starts from the number of ticks to compute every time part. This is a long (64 bit) value. 64-bit calculations are slower than 32-bit calculations. Moreover, divisions (or modulos) are much more expensive than multiplications.</p>
<p>We can fix both these issues by first finding the total number of milliseconds in the day. That number is always smaller than 100 million, so it fits in an <font face="Courier New" size=2>int</font>. We can calculate the number of hours&nbsp;with a simple division. We can &#8220;peel off&#8221; the hours from the total number of milliseconds in the day to find the total milliseconds remaining in the hour. From this, we can calculate the number of minutes with a simple division, and so on. The improved code looks like this:</p>
<p><font face="Courier New" size=2><font color=#0000ff>long</font> ticks = time.Ticks;<br /><font color=#0000ff>int</font> ms = (<font color=#0000ff>int</font>)((ticks / 10000) % 86400000);<br /><font color=#0000ff>int</font> hour = ms / 3600000;<br />ms -= 3600000 * hour;<br /><font color=#0000ff>int</font> minute = ms / 60000;<br />ms -= 60000 * minute;<br /><font color=#0000ff>int</font> second = ms / 1000;<br />ms -= 1000 * second;</font></p>
<p>This change decreases the running time by about 28 percent from the fastest previous solution. We can shave off another 4% or so by replacing the modulo calculation by a subtraction in the code that computes the digits.</p>
<p>The question now is: can we do even better, still?</p>
<p>Once again, the answer is: Yes: by as much as another 25%!</p>
<p>The single most time-consuming calculation is a division. Dividing by large numbers is an order of magnitude slower than multiplying. For smaller numbers, the difference is smaller, but still significant. Since we know the numbers we are dividing by in advance, we can do a little bit-shifting magic and eliminate the divisions altogether.</p>
<p>Let&#8217;s take dividing by 10 as an example. The basic idea is to approximate the ratio 1/10 by another rational number with a power of two as the denominator.&nbsp;Instead of dividing, we can then multiply by the numerator, and shift by the exponent in the denominator. Since shifting chops off digits, it effectively rounds down the result of the division, so we always have to find an approximation that is larger than the ratio.</p>
<p>We see, for example, that&nbsp;13/128 is a good approximation to 1/10. We can rewrite <font face="Courier New" size=2>x/10</font> as <font face="Courier New" size=2>(x*13) &gt;&gt; 7</font>&nbsp;as long as <font face="Courier New" size=2>x</font>&nbsp;is not too large. We run into trouble as soon as the error times <font face="Courier New" size=2>x</font> is larger than 1. In this case, that happens when <font face="Courier New" size=2>x</font> is larger than 13/(13-12.8) = 65.&nbsp;Fortunately, this is larger than the number of hours in a day, or the number of minutes in an hour, so we can use it for most calculations in our code. It won&#8217;t work for numbers up to 100, so to get the second digit of the millisecond, we need the next approximation, 205/2048, which by the same reasoning is good for values up to about 1,000.</p>
<p>To get the first digit of the milliseconds, we need to divide by 100. We find that 41/4096 works nicely.</p>
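<p>These ranges are easy to check by brute force. The sketch below is only an illustration (the class <font face="Courier New" size=2>ShiftDivide</font> and the method <font face="Courier New" size=2>FirstMismatch</font> are names invented here): it searches for the first value at which a multiply-and-shift approximation disagrees with the true integer division.</p>

```csharp
using System;

static class ShiftDivide
{
    // Returns the smallest x in [0, limit) for which (x * mul) >> shift
    // differs from x / divisor, or -1 if the two agree on the whole range.
    public static int FirstMismatch(int mul, int shift, int divisor, int limit)
    {
        for (int x = 0; x < limit; x++)
        {
            if (((x * mul) >> shift) != x / divisor) return x;
        }
        return -1;
    }

    static void Main()
    {
        Console.WriteLine(FirstMismatch(13, 7, 10, 100));     // 13/128 for x/10
        Console.WriteLine(FirstMismatch(205, 11, 10, 2000));  // 205/2048 for x/10
        Console.WriteLine(FirstMismatch(41, 12, 100, 1000));  // 41/4096 for x/100
    }
}
```

<p>The first two calls report 69 and 1029 as the first failing inputs, and the third finds no mismatch below 1000, so each approximation is safe for the range of values it is used on here.</p>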
<p>Implementing this optimization, we go from (for example):</p>
<p><font face="Courier New" size=2>*a = (<font color=#0000ff>char</font>)(hour / 10 + <font color=#808080>'0'</font>);<br />a++;<br />*a = (<font color=#0000ff>char</font>)(hour % 10 + <font color=#808080>'0'</font>);</font></p>
<p>to:</p>
<p><font face="Courier New" size=2><font color=#0000ff>int</font> temp = (hour * 13) &gt;&gt; 7;<br />*a = (<font color=#0000ff>char</font>)(temp + <font color=#808080>'0'</font>);<br />a++;<br />*a = (<font color=#0000ff>char</font>)(hour - 10 * temp + <font color=#808080>'0'</font>);</font></p>
<p>Our running time for 1 million iterations goes down from 0.38s to 0.28s, a savings of almost 18% measured against the running time of the previous fastest solution.</p>
<p>The larger divisors give us a bit of a challenge. To get the number of seconds, we divide a number less than 60000 by 1000. Doing this the straightforward way has us multiplying by 536871, which would require a long value for the result of the multiplication. We can get around this once we realize that 1000 = 8*125. So if we shift the number of milliseconds right by 3, we only need to divide by 125. As an added benefit, the numbers we&#8217;re multiplying are always less than 7500, so our multiplier can be larger. This gives us a simple expression whose approximation is accurate for numbers up to almost 4 million:&nbsp;((x&nbsp;&gt;&gt;&nbsp;3)&nbsp;*&nbsp;67109)&nbsp;&gt;&gt;&nbsp;23. (With 32-bit arithmetic the product overflows well before that, once x reaches about 256,000, but our values stay below 60,000.)</p>
<p>The same trick doesn&#8217;t work for getting the minutes and hours, but it does allow us to fit the intermediate result into a long. We can use the <font face="Courier New" size=2>Math.BigMul</font> method to perform the calculation efficiently.</p>
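<p>Since every input range is small, the three replacements can be checked exhaustively against ordinary division. The following is a sketch for verification only; the class <font face="Courier New" size=2>TimeParts</font> and its method names are invented for this illustration, while the constants are the ones used in this post.</p>

```csharp
using System;

static class TimeParts
{
    // ms / 3600000, for ms within one day (3600000 = 2^7 * 28125).
    public static int Hours(int ms)
    {
        return (int)(Math.BigMul(ms >> 7, 9773437) >> 38);
    }

    // ms / 60000, for ms within one hour (60000 = 2^5 * 1875).
    public static int Minutes(int ms)
    {
        return (int)(Math.BigMul(ms >> 5, 2290650) >> 32);
    }

    // ms / 1000, for ms within one minute (1000 = 2^3 * 125).
    public static int Seconds(int ms)
    {
        return ((ms >> 3) * 67109) >> 23;
    }

    // Compares each replacement with plain division over its full input range.
    public static bool AllAgree()
    {
        for (int ms = 0; ms < 86400000; ms++)
            if (Hours(ms) != ms / 3600000) return false;
        for (int ms = 0; ms < 3600000; ms++)
            if (Minutes(ms) != ms / 60000) return false;
        for (int ms = 0; ms < 60000; ms++)
            if (Seconds(ms) != ms / 1000) return false;
        return true;
    }

    static void Main()
    {
        Console.WriteLine(AllAgree());
    }
}
```

<p>The exhaustive loop covers every millisecond of a day, which is cheap enough to run once as a sanity check before trusting the constants.</p>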
<p>The final code is given below. It is doubtful it can be improved by much. It runs&nbsp;in as little as 0.221s for one million iterations, 2.5 times faster than the previous fastest code and over 25 times faster than the original.</p>
<p><font color=#0000ff size=2></p>
<p><font face="Courier New">private</font></font><font face="Courier New"><font size=2> </font><font color=#0000ff size=2>unsafe</font><font size=2> </font><font color=#0000ff size=2>static</font><font size=2> </font><font color=#0000ff size=2>string</font><font size=2> FormatFast6(</font><font color=#c0c0c0 size=2>DateTime</font></font><font face="Courier New"><font size=2> time)<br />{<br />&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>fixed</font><font size=2> (</font><font color=#0000ff size=2>char</font></font><font face="Courier New"><font size=2>* p = dateData)<br />&nbsp;&nbsp;&nbsp; {<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>long</font></font><font face="Courier New"><font size=2> ticks = time.Ticks;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>char</font></font><font face="Courier New"><font size=2>* a = p;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>int</font><font size=2> ms = (</font><font color=#0000ff size=2>int</font></font><font face="Courier New"><font size=2>)((ticks / 10000) % 86400000);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>int</font><font size=2> hour = (</font><font color=#0000ff size=2>int</font><font size=2>)(</font><font color=#800080 size=2>Math</font></font><font face="Courier New"><font size=2>.BigMul(ms &gt;&gt; 7, 9773437) &gt;&gt; 38);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ms -= 3600000 * hour;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff size=2>int</font><font size=2> minute = (</font><font color=#0000ff size=2>int</font><font size=2>)((</font><font color=#800080 size=2>Math</font></font><font face="Courier New"><font size=2>.BigMul(ms &gt;&gt; 5, 2290650)) &gt;&gt; 32);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ms -= 60000 * minute;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font color=#0000ff 
size=2>int</font></font><font face="Courier New" size=2> second = ((ms &gt;&gt; 3) * 67109) &gt;&gt; 23;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ms -= 1000 * second;<br /></font><font size=2><br /><font face="Courier New">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font></font><font face="Courier New" color=#0000ff size=2>int</font><font face="Courier New"><font size=2> temp = (hour * 13) &gt;&gt; 7;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(temp + </font><font color=#808080 size=2>&#8217;0&#8242;</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a++;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(hour &#8211; 10 * temp + </font><font color=#808080 size=2>&#8217;0&#8242;</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a += 2;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; temp = (minute * 13) &gt;&gt; 7;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(temp + </font><font color=#808080 size=2>&#8217;0&#8242;</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a++;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(minute &#8211; 10 * temp + </font><font color=#808080 size=2>&#8217;0&#8242;</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a += 2;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; temp = (second * 13) &gt;&gt; 7;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(temp + </font><font color=#808080 size=2>'0'</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a++;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(second - 10 * temp + </font><font color=#808080 size=2>'0'</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a += 2;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; temp = (ms * 41) &gt;&gt; 12;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(temp + </font><font color=#808080 size=2>'0'</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a++;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ms -= 100 * temp;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; temp = (ms * 205) &gt;&gt; 11;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(temp + </font><font color=#808080 size=2>'0'</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ms -= 10 * temp;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; a++;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *a = (</font><font color=#0000ff size=2>char</font><font size=2>)(ms + </font><font color=#808080 size=2>'0'</font></font><font face="Courier New"><font size=2>);<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;&nbsp;&nbsp;</font><font color=#0000ff size=2>return</font><font size=2> </font><font color=#0000ff size=2>new</font><font size=2> </font><font color=#c0c0c0 size=2>String</font></font><font face="Courier New" size=2>(dateData);<br />}</font></p>
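<p>The constants in <font face="Courier New" size=2>FormatFast6</font> replace each integer division by a multiplication and a shift. The sketch below is not part of the original code. It is a brute-force check, assuming the input is a millisecond count within a single day, that compares the three multiply-and-shift results with plain division and remainder. The names <font face="Courier New" size=2>MagicDivisionCheck</font> and <font face="Courier New" size=2>CountMismatches</font> are made up for this illustration.</p>
<p><font face="Courier New" size=2>using System;<br />
<br />
static class MagicDivisionCheck<br />
{<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Counts the millisecond values in one day for which the<br />
&nbsp;&nbsp;&nbsp;&nbsp;// multiply-and-shift results differ from plain integer division.<br />
&nbsp;&nbsp;&nbsp;&nbsp;public static int CountMismatches()<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int mismatches = 0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (int ms = 0; ms &lt; 86400000; ms++)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Same steps and constants as in FormatFast6.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int rest = ms;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int hour = (int)(Math.BigMul(rest &gt;&gt; 7, 9773437) &gt;&gt; 38);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rest -= 3600000 * hour;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int minute = (int)(Math.BigMul(rest &gt;&gt; 5, 2290650) &gt;&gt; 32);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rest -= 60000 * minute;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int second = ((rest &gt;&gt; 3) * 67109) &gt;&gt; 23;<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Reference values from ordinary division and remainder.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (hour != ms / 3600000<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|| minute != ms / 60000 % 60<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|| second != ms / 1000 % 60)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mismatches++;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return mismatches;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;static void Main()<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(CountMismatches());<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}</font></p>
<p>A count of zero means the shortcuts agree with ordinary division for every millisecond of the day. The two-digit splits (multiply by 13, shift by 7) can be checked the same way over the range 0 to 59.</p>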
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2006/04/fast-faster-fastest/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Visual Studio as an interactive technical computing environment</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2006/02/visual-studio-as-an-interactive-technical-computing-environment/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2006/02/visual-studio-as-an-interactive-technical-computing-environment/#comments</comments>
		<pubDate>Sun, 19 Feb 2006 18:18:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2006/02/19/7355.aspx</guid>
		<description><![CDATA[A year ago, we were experimenting with an extensive sample that would illustrate the linear algebra capabilities of our math library. Unfortunately, M#, the .NET Matrix Workbench didn&#8217;t get very far. We&#8217;re technical guys. Building user interfaces just isn&#8217;t our cup of tea, and the IDE we built wasn&#8217;t stable enough to make it into [...]]]></description>
			<content:encoded><![CDATA[<p>A year ago, we were experimenting with an extensive sample that would illustrate the linear algebra capabilities of our math library. Unfortunately, <a href="http://www.extremeoptimization.com/mathematics/samples/MSharp.aspx">M#, the .NET Matrix Workbench</a>, didn&#8217;t get very far. We&#8217;re technical guys. Building user interfaces just isn&#8217;t our cup of tea, and the IDE we built wasn&#8217;t stable enough to make it into an end-user product.</p>
<p>At the time, I realized that much of the functionality needed for this kind of interactive computing environment was already present in Visual Studio .NET. For example, we already have an excellent code editor, a project workspace, and tool windows that display variables and their values. Moreover, in the <a href="http://msdn.microsoft.com/vstudio/extend/default.aspx">Visual Studio SDK</a>, we have a framework for extending that environment with visualizers for specific types of variables, IntelliSense, custom project items, and so on.</p>
<p>Plus, you have a great library (the .NET Base Class Libraries) that you can use to do just about anything you&#8217;d like.</p>
<p>In short,&nbsp;Visual Studio is the ideal starting point to build a great technical computing IDE.</p>
<p>A couple of recent news items bring this vision closer to reality. Aaron Marten <a href="http://blogs.msdn.com/aaronmar/archive/2006/02/16/533273.aspx">reports</a> that the February 2006 CTP of the Visual Studio 2005 SDK now contains a tool window that hosts an IronPython console. And just a few days ago, Don Syme gave us a <a href="http://blogs.msdn.com/dsyme/archive/2006/02/19/534925.aspx">taste</a> of what is to come in the next release of F#. The screenshot is the kind you would expect from <a href="http://www.mathworks.com/products/matlab/index.html">MATLAB</a>. (I guess I was right when I <a href="http://blogs.extremeoptimization.com/jeffrey/archive/2005/06/27/380.aspx">wrote</a> that Don gets what scientists and engineers need.)</p>
<p>Now, all we need is a MATLAB-like language for the .NET platform&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2006/02/visual-studio-as-an-interactive-technical-computing-environment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yahoo Finance miscalculates monthly average daily volume</title>
		<link>http://www.extremeoptimization.com/Blog/index.php/2006/02/yahoo-finance-miscalculates-monthly-average-daily-volume/</link>
		<comments>http://www.extremeoptimization.com/Blog/index.php/2006/02/yahoo-finance-miscalculates-monthly-average-daily-volume/#comments</comments>
		<pubDate>Fri, 17 Feb 2006 20:32:00 +0000</pubDate>
		<dc:creator>Jeffrey Sax</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.extremeoptimization.com/jeffrey/archive/2006/02/17/7272.aspx</guid>
		<description><![CDATA[Am I missing something here? While testing some time series functionality in the new version of our statistics library, we came across a rather curious discrepancy. We used the historical quotes available from&#160;Yahoo Finance as a reference resource. As it turns out, comparison with our data appears to show that Yahoo miscalculates some summary statistics. [...]]]></description>
			<content:encoded><![CDATA[<p>Am I missing something here?</p>
<p>While testing some time series functionality in the new version of our <a href="http://www.extremeoptimization.com/statistics/default.aspx">statistics library</a>, we came across a rather curious discrepancy. We used the historical quotes available from&nbsp;<a href="http://finance.yahoo.com/">Yahoo Finance</a> as a reference resource. As it turns out, comparison with our data appears to show that Yahoo miscalculates some summary statistics.</p>
<p>The error occurs on the Historical Prices page when using a monthly timeframe. Take the <a href="http://finance.yahoo.com/q/hp?s=MSFT&amp;a=00&amp;b=1&amp;c=2005&amp;d=11&amp;e=31&amp;f=2005&amp;g=m">monthly data for 2005</a>&nbsp;for Microsoft&#8217;s stock (symbol MSFT). This shows an average daily volume for January of 79,642,818 shares. According to the <a href="http://help.yahoo.com/help/us/fin/quote/quote-12.html">help document</a>, this is &#8220;the average daily volume for all trading days in the reported month.&#8221;</p>
<p>When we look at the <a href="http://finance.yahoo.com/q/hp?s=MSFT&amp;a=00&amp;b=1&amp;c=2005&amp;d=00&amp;e=31&amp;f=2005&amp;g=d">daily prices for January 2005</a>, we find 20 trading days. When we add up all the daily volumes, we find 1,521,414,280 shares changed hands that month. That should give an average daily volume of 76,070,714 shares, more than <em>3 million shares less</em> than Yahoo&#8217;s figure. Why the difference?</p>
<p>A brief investigation showed that the difference arises because Yahoo includes the volume of the last trading day of the month <u>twice</u>. If we add the volume of Jan. 31st to the total, we get 1,592,856,376. Dividing by the number of trading days (20) gives 79,642,818.8.</p>
<p>When we look at other months, we find the same pattern: Yahoo consistently overstates the average daily volume for the month by a few percentage points. Each time, this difference can be explained by the&nbsp;double inclusion of the volume of the last trading day in the total volume for the month.</p>
<p>Here&#8217;s another example, picked at random: Research in Motion for June 2000. <a href="http://finance.yahoo.com/q/hp?s=RIMM&amp;a=05&amp;b=1&amp;c=2000&amp;d=05&amp;e=30&amp;f=2000&amp;g=m">Yahoo</a> gives an average daily volume of 4,262,160 shares. Our calculation shows an average of 3,870,800 shares, corresponding to a total volume of 77,416,000 for the month. Yahoo&#8217;s average corresponds to a total of 85,243,200 shares. The difference of 7,827,200 shares is exactly the <a href="http://finance.yahoo.com/q/hp?s=RIMM&amp;a=05&amp;b=1&amp;c=2000&amp;d=05&amp;e=30&amp;f=2000&amp;g=d">volume for June 30th</a>.</p>
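<p>The arithmetic can be reproduced with a few lines of C#. This is only a sketch: the class and method names are made up for illustration, the last-day volume for MSFT is derived from the two January totals quoted above rather than read from Yahoo&#8217;s page, and the 20 trading days used for June 2000 are the count implied by the quoted figures (77,416,000 / 3,870,800), not an independent count.</p>
<p><font face="Courier New" size=2>using System;<br />
<br />
static class VolumeCheck<br />
{<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Average daily volume as documented: total volume / trading days.<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Integer division drops any fractional share.<br />
&nbsp;&nbsp;&nbsp;&nbsp;public static long Average(long totalVolume, int tradingDays)<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return totalVolume / tradingDays;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Average obtained when the last trading day is counted twice.<br />
&nbsp;&nbsp;&nbsp;&nbsp;public static long AverageWithLastDayDoubled(long totalVolume, long lastDayVolume, int tradingDays)<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return (totalVolume + lastDayVolume) / tradingDays;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;// Extra volume that a reported average implies on top of the true total.<br />
&nbsp;&nbsp;&nbsp;&nbsp;public static long ImpliedExtraVolume(long reportedAverage, long totalVolume, int tradingDays)<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return reportedAverage * tradingDays - totalVolume;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;static void Main()<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// MSFT, January 2005: totals as quoted in the text.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;long msftTotal = 1521414280;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;long msftLastDay = 1592856376 - msftTotal; // derived, not looked up<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(Average(msftTotal, 20));<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(AverageWithLastDayDoubled(msftTotal, msftLastDay, 20));<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// RIMM, June 2000: 20 trading days is an assumption (see above).<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(ImpliedExtraVolume(4262160, 77416000, 20));<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}</font></p>
<p>If the last value printed matches the daily volume shown for the final trading day of the month, the double-counting explanation holds for that month as well.</p>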
<p>The weekly average daily volume appears to be correct.</p>
<p>I find it hard to believe that a service that is as widely used as Yahoo would show such an error. So, my question to the experts in technical analysis out there is: What am I missing???</p>
]]></content:encoded>
			<wfw:commentRss>http://www.extremeoptimization.com/Blog/index.php/2006/02/yahoo-finance-miscalculates-monthly-average-daily-volume/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

