Extreme Optimization™: Complexity made simple.

Math and Statistics
Libraries for .NET

  • Home
  • Features
    • Math Library
    • Vector and Matrix Library
    • Statistics Library
    • Performance
    • Usability
  • Documentation
    • Introduction
    • Math Library User's Guide
    • Vector and Matrix Library User's Guide
    • Data Analysis Library User's Guide
    • Statistics Library User's Guide
    • Reference
  • Resources
    • Downloads
    • QuickStart Samples
    • Sample Applications
    • Frequently Asked Questions
    • Technical Support
  • Blog
  • Order
  • Company
    • About us
    • Testimonials
    • Customers
    • Press Releases
    • Careers
    • Partners
    • Contact us
Introduction
Deployment Guide
Configuration
Using Parallelism
Expand Mathematics Library User's GuideMathematics Library User's Guide
Expand Vector and Matrix Library User's GuideVector and Matrix Library User's Guide
Expand Data Analysis Library User's GuideData Analysis Library User's Guide
Expand Statistics Library User's GuideStatistics Library User's Guide
Expand Data Access Library User's GuideData Access Library User's Guide
Expand ReferenceReference
  • Extreme Optimization
    • Features
    • Solutions
    • Documentation
    • QuickStart Samples
    • Sample Applications
    • Downloads
    • Technical Support
    • Download trial
    • How to buy
    • Blog
    • Company
    • Resources
  • Documentation
    • Introduction
    • Deployment Guide
    • Configuration
    • Using Parallelism
    • Mathematics Library User's Guide
    • Vector and Matrix Library User's Guide
    • Data Analysis Library User's Guide
    • Statistics Library User's Guide
    • Data Access Library User's Guide
    • Reference
  • Mathematics Library User's Guide
    • General Classes
    • Mathematical Functions
    • Complex Numbers
    • Arbitrary Precision Arithmetic
    • Automatic Differentiation
    • Curves
    • Curve Fitting
    • Solving Equations
    • Optimization
    • Calculus
    • Fast Fourier Transforms
    • Random Numbers
    • Generic Arithmetic
    • Appendices
  • Optimization
    • One-Dimensional Optimization
    • Multidimensional Optimization
    • Minimizing Sums of Squares
    • Optimization Model Basics
    • Linear Programming
    • Quadratic Programming
    • Nonlinear Programming
  • Multidimensional Optimization
Multidimensional OptimizationExtreme Optimization Numerical Libraries for .NET Professional

Finding the minimum or maximum of a function in many variables is one of the most common problems in numerical computing. It is not surprising that a wide range of techniques exist, each with its advantages and disadvantages. The Extreme Optimization Numerical Libraries for .NET implements a large number of these techniques, to offer solutions for the broadest possible range of applications.

Available Algorithms

The Downhill Simplex Method of Nelder and Mead

The Nelder-Mead algorithm, often also called the downhill simplex method, is a simple algorithm that produces reasonable results when no derivatives are available. A simplex is a generalization of a triangle (2 dimensional) or tetrahedron (3 dimensional) to arbitrary dimensions. A simplex in n dimensions has n+1 vertices. This is the smallest number of points needed to define a region in n-dimensional space. The algorithm works by letting this simplex 'travel' downhill on the surface defined by the objective function through reflection. If no lower point can be found, the simplex is contracted towards the lowest point.

The algorithm defines its own versions of the ValueTest and SolutionTest properties. The value test succeeds if the difference of the objective function at the best and at the worst point in the simplex is less than the tolerance. The solution test succeeds if the distance between the best and the worst point in the simplex is less than the tolerance. The gradient test does not apply.

Convergence of the Nelder-Mead method is linear at best. It can fail in certain pathological situations. If derivatives are available, and the derivatives are continuous, then one of the other methods should be used.

Conjugate Gradient Optimizers

Two algorithms are available that search successively in a direction from a well-defined set of directions. For each direction, a one-dimensional search for an extremum is performed along the direction.

The conjugate gradient method, implemented by the ConjugateGradientOptimizer class, uses a series of search directions that are optimal for a quadratic objective function. The method requires that the gradient of the objective function be available. There are three common variants of the formula to generate a new search direction, giving rise to the Fletcher-Reeves and Polak-Ribière methods. The variant is selected by passing the appropriate ConjugateGradientMethod to the constructor, or setting the Method property. It is found that, in general, the Polak-Ribière method tends to converge somewhat faster. In addition, the positive variant of this method has proven convergence properties. It is the default method.

Powell's method, implemented by the PowellOptimizer class, is a variation of the conjugate gradient method that does not require knowledge of the gradient of the objective function. Instead, the directions are generated from the geometrical properties of the successive approximations. A drawback of Powell's method is that it usually requires more function evaluations, as the line searches need to be exact for the method to converge.

Quasi-Newton Methods

The full Newton method requires the calculation of second derivatives and the solution of a system of equations in every iteration. This can be very expensive and may not lead to great results, especially when the current point is far from the actual extremum.

Quasi-Newton methods use a numerical approximation to the inverse of the Hessian matrix that is maintained through each iteration. Different variations of the method use different ways to keep the approximation up to date. There are two main variations: the Davidon-Fletcher-Powell method (commonly abbreviated to DFP) and the Broyden-Fletcher-Goldfard-Shanno method (BFGS). The method is selected by passing the appropriate QuasiNewtonMethod to the constructor, or setting the Method property. The BFGS method is usually somewhat superior to the DFP method. It is the default.

The BFGS and DFP methods use a dense approximation to the inverse Hessian. As the number of variables grows larger, both the memory requirements and the computation time per iteration grow considerably. To address this, a limited memory variant of the BFGS method was developed. This is implemented by the LimitedMemoryBfgsOptimizer class. It uses only a small, fixed number of search directions to build the Hessian approximation. Between 3 and 7 is usually recommended. This number must be passed to the constructor. If not provided, a default value of 3 is used.

Defining the objective function

The function for which an extremum is to be found is called the objective function. It is accessed through the ObjectiveFunction property, which is a delegate of type FuncT, TResult. The delegate takes a VectorT argument and returns a real number.

The following code demonstrates how to write methods to implement objective functions. The objective function implemented here is the Rosenbrock function,

f(x1,x2) = (1 - x1)2 + 105(x2-x12)2

which is a famous test function for optimization.

C#
VB
C++
F#
Copy
double Rosenbrock(Vector<double> x)
{
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    return a * a + 105 * b * b;
}
Function Rosenbrock(x As Vector(Of Double)) As Double
    Dim a As Double = 1 - x(0)
    Dim b As Double = (x(1) - x(0) * x(0))
    Return a * a + 105 * b * b
End Function

No code example is currently available or this language may not be supported.

let Rosenbrock (x : Vector<float>) =
    let a = 1.0 - x.[0]
    let b = (x.[1] - x.[0] * x.[0])
    a * a + 105.0 * b * b

Some algorithms require the gradient (vector of partial derivatives) of the objective function. There are two ways to supply this function. The first is through the GradientFunction property. This property is of type FuncT, TResult. It is a method that takes a VectorT argument and also returns a VectorT. The drawback of using this property is that a new VectorT instance is created on every call.

The following code demonstrates how to write a method that calculates the gradient of an objective function. Once again, we use the Rosenbrock function as our example:

C#
VB
C++
F#
Copy
Vector<double> RosenbrockGradient(Vector<double> x)
{
    Vector<double> f = Vector.Create<double>(2);
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    f[0] = -2 * a - 420 * x[0] * b;
    f[1] = 210 * b;
    return f;
}
Function RosenbrockGradient(x As Vector(Of Double)) As Vector(Of Double)
    Dim f = Vector.Create(Of Double)(2)
    Dim a As Double = 1 - x(0)
    Dim b As Double = (x(1) - x(0) * x(0))
    f(0) = -2 * a - 420 * x(0) * b
    f(1) = 210 * b
    Return f
End Function

No code example is currently available or this language may not be supported.

let RosenbrockGradient (x : Vector<float>) =
    let f = Vector.Create<float>(2)
    let a = 1.0 - x.[0]
    let b = (x.[1] - x.[0] * x.[0])
    f.[0] <- -2.0 * a - 420.0 * x.[0] * b
    f.[1] <- 210.0 * b
    f

Alternatively, the gradient can be specified using the FastGradientFunction property. This property is of type FuncT1, T2, TResult. It is a method that takes two VectorT arguments. The first is the input vector. The second is the VectorT that is to hold the result. The return value is a reference to this result vector.

The following code demonstrates how to write a method that calculates the gradient of an objective function as a FuncT1, T2, TResult delegate.

C#
VB
C++
F#
Copy
Vector<double> RosenbrockFastGradient(Vector<double> x, Vector<double> f)
{
    if (f == null)
        f = Vector.Create<double>(2);
    double a = 1 - x[0];
    double b = (x[1] - x[0] * x[0]);
    f[0] = -2 * a - 420 * x[0] * b;
    f[1] = 210 * b;
    return f;
}
Function RosenbrockFastGradient(x As Vector(Of Double), f As Vector(Of Double)) As Vector(Of Double)
    If (f Is Nothing) Then f = Vector.Create(Of Double)(2)
    Dim a As Double = 1 - x(0)
    Dim b As Double = (x(1) - x(0) * x(0))
    f(0) = -2 * a - 420 * x(0) * b
    f(1) = 210 * b
    Return f
End Function

No code example is currently available or this language may not be supported.

let RosenbrockFastGradient (x : Vector<float>) (f : Vector<float>) =
    let a = 1.0 - x.[0]
    let b = (x.[1] - x.[0] * x.[0])
    f.[0] <- -2.0 * a - 420.0 * x.[0] * b
    f.[1] <- 210.0 * b
    f

If the gradient function is not available, or if it is very expensive to calculate, it may be preferable to either approximate it numerically, or use a method that does not require the gradient function.

The next example puts this all together. It creates a BFGS optimizer and sets it up to minimize the Rosenbrock function:

C#
VB
C++
F#
Copy
QuasiNewtonOptimizer bfgs =
    new QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs);
bfgs.ObjectiveFunction = Rosenbrock;
bfgs.FastGradientFunction = RosenbrockFastGradient;
Dim bfgs As QuasiNewtonOptimizer =
    New QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs)
bfgs.ObjectiveFunction = AddressOf Rosenbrock
bfgs.FastGradientFunction = AddressOf RosenbrockFastGradient

No code example is currently available or this language may not be supported.

let bfgs = QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs)
bfgs.ObjectiveFunction <- Func<_,_>(Rosenbrock)
bfgs.FastGradientFunction <- Func<_,_,_>(RosenbrockFastGradient)

When using .NET 4.0, it is possible to have the gradient computed using Automatic Differentiation. To do so, specify the objective function by setting the SymbolicObjectiveFunction property to a lambda expression that represents the objective function. The lambda expression should take a VectorT as its input.

Termination Criteria

Optimization algorithms in multiple dimensions commonly have two or three criteria that may each signal that an extremum has been found.

The ValueTest property returns a SimpleConvergenceTest object that represents the convergence test based on the value of the objective function. The test is successful if the change in the value of the objective function is less than the tolerance. The test returns a value of Divergent if the value of the objective function is infinite, and BadFunction when the value is NaN.

There is some danger in using this test, since some algorithms may not return a new best estimate for the extremum on each iteration. For this reason, the test is not active by default. To make it active, its Active property must be set to true.

The SolutionTest property returns a VectorConvergenceTest object that represents the convergence test based on the estimated extremum. The test is successful if the change in the approximate extremum is less than the tolerance. The test returns a value of Divergent if one of the elements of the solution is infinite, and BadFunction when one of the values is NaN.

The VectorConvergenceTest has many options to specify the details of the convergence test. By default, the error is calculated as the maximum of the errors in each of the elements of the change of the extremum from the previous iteration compared to the corresponding elements of the extremum.

Methods that use gradients have a third test available. The GradientTest property returns a VectorConvergenceTest based on the gradient of the objective function. The test succeeds if the norm of the gradient vector is less than the tolerance. This test is inactive for algorithms that do not use gradients.

Specific algorithms may supply their own convergence tests, or may modify the exact meaning of the tests discussed here. For example, the Nelder-Mead method uses slight variations of the value and solution tests that are more convenient for that particular algorithm.

The example below illustrates some of the options mentioned here. It sets the tolerance of the gradient test to 100 times the machine precision, which will succeed if the sum of the absolute values is less than the tolerance. It also explicitly specifies that the value test is not to be used.

C#
VB
C++
F#
Copy
bfgs.ValueTest.Enabled = false;
bfgs.GradientTest.Norm = VectorConvergenceNorm.SumOfAbsoluteValues;
bfgs.GradientTest.Tolerance = 100 * MachineConstants.Epsilon;
bfgs.ValueTest.Enabled = False
bfgs.GradientTest.Norm = VectorConvergenceNorm.SumOfAbsoluteValues
bfgs.GradientTest.Tolerance = 100 * MachineConstants.Epsilon

No code example is currently available or this language may not be supported.

bfgs.ValueTest.Enabled <- false
bfgs.GradientTest.Norm <- VectorConvergenceNorm.SumOfAbsoluteValues
bfgs.GradientTest.Tolerance <- 100.0 * MachineConstants.Epsilon
Line searches

Many algorithms for Multidimensional optimization work by successively solving a one-dimensional optimization problem along a search direction, and altering the search direction in each iteration. These algorithms all inherit from DirectionalOptimizer. Although general one-dimensional optimization algorithms may be used, they usually don't work well. They are either wasteful because they are too precise, or the results fail to meet certain criteria that are required to ensure convergence of the algorithm.

Specialized line search algorithms are therefore in order. The Extreme Optimization Mathematics Library for .NET provides three of them:

Class

Description

BacktrackingLineSearch

A line search that tries a full step first and back-tracks as needed.

MoreThuenteLineSearch

The classic algorithm of More and Thuente.

ParabolicLineSearch

A line search that uses quadratic interpolation.

UnitLineSearch

A line "search" that always returns a unit step.

All optimization algorithms that use line searches automatically select the appropriate line search algorithm, and set the search parameters to appropriate values. The unit line search is useful for experimental purposes, or when the objective function is known to be well behaved so that a line search is not necessary. You can use the LineSearch property to access the line search object.

Computing the Extremum

You must specify an initial guess for the solution by setting the InitialGuess property to a suitable VectorT value. Any vector type may be used.

The FindExtremum method finds the extremum. It returns a DenseVectorT that contains the best approximation to the extremum. This value is also available through the Extremum

C#
VB
C++
F#
Copy
bfgs.InitialGuess = Vector.Create(-1.2, 1.0);
bfgs.ExtremumType = ExtremumType.Minimum;
bfgs.FindExtremum();
bfgs.InitialGuess = Vector.Create(-1.2, 1.0)
bfgs.ExtremumType = ExtremumType.Minimum
bfgs.FindExtremum()

No code example is currently available or this language may not be supported.

bfgs.InitialGuess <- Vector.Create(-1.2, 1.0)
bfgs.ExtremumType <- ExtremumType.Minimum
bfgs.FindExtremum() |> ignore
Interpreting Results

After the FindExtremum method returns, a number of properties give more details about the algorithm. The Status property returns an AlgorithmStatus value that indicates how the algorithm terminated. The possible values are listed below:

AlgorithmStatus values

Value

Description

NoResult

The algorithm has not been executed.

Busy

The algorithm is still executing.

Converged

The algorithm ended normally. The desired accuracy has been achieved.

IterationLimitExceeded

The number of iterations needed to achieve the desired accuracy is greater than MaxIterations.

EvaluationLimitExceeded

The number of function evaluations needed to achieve the desired accuracy is greater than MaxEvaluations.

RoundOffError

Round-off prevented the algorithm from achieving the desired accuracy.

BadFunction

Bad behavior of the target function prevented the algorithm from achieving the desired accuracy.

Divergent

The algorithm diverges. The objective function appears to be unbounded.

Each of the ConvergenceTest objects discussed earlier exposes an Error property that returns the estimated error used in the test. The IterationsNeeded and EvaluationsNeeded properties return the number of iterations and the number of evaluations of the objective function, respectively. Algorithms that use gradients also have a GradientEvaluationsNeeded.

The final example prints a summary of the results of running the BFGS algorithm on the Rosenbrock function:

C#
VB
C++
F#
Copy
Console.WriteLine("BFGS Method:");
Console.WriteLine("  Solution: {0}", bfgs.Extremum);
Console.WriteLine("  Estimated error: {0}", bfgs.SolutionTest.Error);
Console.WriteLine("  Gradient residual: {0}", bfgs.GradientTest.Error);
Console.WriteLine("  # iterations: {0}", bfgs.IterationsNeeded);
Console.WriteLine("  # function evaluations: {0}",
    bfgs.EvaluationsNeeded);
Console.WriteLine("  # gradient evaluations: {0}",
    bfgs.GradientEvaluationsNeeded);
Console.WriteLine("BFGS Method:")
Console.WriteLine("  Solution: {0}", bfgs.Extremum)
Console.WriteLine("  Estimated error: {0}", bfgs.SolutionTest.Error)
Console.WriteLine("  Gradient residual: {0}", bfgs.GradientTest.Error)
Console.WriteLine("  # iterations: {0}", bfgs.IterationsNeeded)
Console.WriteLine("  # function evaluations: {0}",
    bfgs.EvaluationsNeeded)
Console.WriteLine("  # gradient evaluations: {0}",
    bfgs.GradientEvaluationsNeeded)

No code example is currently available or this language may not be supported.

Console.WriteLine("BFGS Method:")
Console.WriteLine("  Solution: 0", bfgs.Extremum)
Console.WriteLine("  Estimated error: 0", bfgs.SolutionTest.Error)
Console.WriteLine("  Gradient residual: 0", bfgs.GradientTest.Error)
Console.WriteLine("  # iterations: 0", bfgs.IterationsNeeded)
Console.WriteLine("  # function evaluations: 0",
    bfgs.EvaluationsNeeded)
Console.WriteLine("  # gradient evaluations: 0",
    bfgs.GradientEvaluationsNeeded)
References

Fletcher, R. Practical Methods of Optimization, John Wiley & Sons, New York, 1987.

Fletcher, R. and Reeves, C.M., Function Minimization by Conjugate Gradients, Comp, J. 7, 149-154, 1964.

Moré and D. Thuente, Line Search Algorithms with Guaranteed Sufficient Decrease, ACM Transactions on Mathematical Software , Vol. 20, No. 3, pp. 286--307, 1994.

Nelder, J. A. and Mead, R. "A Simplex Method for Function Minimization." Comput. J. 7, 308-313, 1965.

Polak, E., and Ribière, G., Note sur la Convergence de Methodes de Directions Conjugees, Revue Francaise d'Informatique et de Recherche Operationnelle, 3, 35-43, 1969.

Powell, M.J.D., An Efficient Method for Finding the Minimum of a Function of Several Variables Without Calculating Derivatives, Comp. J. 7, 155-162, 1964.

Copyright (c) 2004-2017 ExoAnalytics Inc.

Send comments on this topic to support@extremeoptimization.com

Copyright © 2004-2018, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.