Finding the minimum or maximum of a function in many variables is
one of the most common problems in numerical computing.
It is not surprising that a wide range of techniques exist,
each with its advantages and disadvantages.
The Extreme Optimization Numerical Libraries for .NET
implement a large number of these techniques,
offering solutions for the broadest possible range of applications.
The Downhill Simplex Method of Nelder and Mead
The Nelder-Mead algorithm, often also called the downhill simplex method, is a simple algorithm that
produces reasonable results when no derivatives are available. A simplex is a generalization of a triangle (2
dimensional) or tetrahedron (3 dimensional) to arbitrary dimensions. A simplex in n dimensions has
n+1 vertices. This is the smallest number of points needed to define a region in n-dimensional
space. The algorithm works by letting this simplex 'travel' downhill over the surface defined by the objective
function, mainly by reflecting the worst (highest) vertex through the opposite face of the simplex. If no lower point
can be found, the simplex is contracted towards the lowest point.
The algorithm defines its own versions of the ValueTest and SolutionTest properties. The value
test succeeds if the difference between the values of the objective function at the best and the worst points of the
simplex is less than the tolerance. The solution test succeeds if the distance between the best and the worst points
of the simplex is less than the tolerance. The gradient test does not apply.
Convergence of the Nelder-Mead method is linear at best. It can fail in certain pathological situations. If
derivatives are available, and the derivatives are continuous, then one of the other methods should be used.
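As a quick sketch of how such an optimizer is used, the snippet below minimizes a two-variable function with the downhill simplex method. The class name NelderMeadOptimizer is an assumption made here; ObjectiveFunction, InitialGuess and FindExtremum are the common optimizer members described later in this section.
// Sketch only: the class name NelderMeadOptimizer is assumed.
var simplex = new NelderMeadOptimizer();
// No derivatives are needed; a lambda suffices as the objective.
simplex.ObjectiveFunction =
    x => (1 - x[0]) * (1 - x[0]) + 105 * (x[1] - x[0] * x[0]) * (x[1] - x[0] * x[0]);
simplex.InitialGuess = Vector.Create(-1.2, 1.0);
simplex.FindExtremum();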
Conjugate Gradient Optimizers
Two algorithms are available that search successively along directions taken from a well-defined set. In each
iteration, a one-dimensional search for an extremum is performed along the current direction.
The conjugate gradient method, implemented by the ConjugateGradientOptimizer class, uses a series
of search directions that are optimal for a quadratic objective function. The method requires that the gradient of
the objective function be available. There are three common variants of the formula that generates a new search
direction: the Fletcher-Reeves method, the Polak-Ribière method, and a 'positive' variant of the latter. The variant
is selected by passing the appropriate ConjugateGradientMethod to the constructor, or by setting the Method property.
In general, the Polak-Ribière method tends to converge somewhat faster than the Fletcher-Reeves method. In addition,
the positive variant of the Polak-Ribière method has proven convergence properties. It is the default method.
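For example, a conjugate gradient optimizer that uses the Polak-Ribière update could be configured as follows. This is a minimal sketch; the enum member name is assumed here to mirror the method name.
// Sketch: the member name ConjugateGradientMethod.PolakRibiere is assumed.
var cg = new ConjugateGradientOptimizer();
cg.Method = ConjugateGradientMethod.PolakRibiere;
// The gradient is required; Rosenbrock and RosenbrockGradient are
// defined in the section on objective functions below.
cg.ObjectiveFunction = Rosenbrock;
cg.GradientFunction = RosenbrockGradient;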
Powell's method, implemented by the PowellOptimizer class, is a variation of the conjugate
gradient method that does not require knowledge of the gradient of the objective function. Instead, the directions
are generated from the geometrical properties of the successive approximations. A drawback of Powell's method is that
it usually requires more function evaluations, as the line searches need to be exact for the method to converge.
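Since no gradient is involved, setting up Powell's method requires only the objective function. A minimal sketch, using the Rosenbrock function defined in the section on objective functions below:
var powell = new PowellOptimizer();
powell.ObjectiveFunction = Rosenbrock;  // function values only, no gradient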
Quasi-Newton Methods
The full Newton method requires the calculation of second derivatives
and the solution of a system of equations in every iteration.
This can be very expensive, and may yield little improvement
when the current point is far from the actual extremum.
Quasi-Newton methods use a numerical approximation
to the inverse of the Hessian matrix that is maintained through each
iteration. Different variations of the method use different ways
to keep the approximation up to date. There are two main variations:
the Davidon-Fletcher-Powell method (commonly abbreviated to DFP)
and the Broyden-Fletcher-Goldfarb-Shanno method (BFGS).
The method is selected by passing the appropriate QuasiNewtonMethod
to the constructor, or by setting the Method property.
The BFGS method is usually somewhat superior to the DFP method. It is the default.
The BFGS and DFP methods use a dense approximation to the inverse Hessian.
As the number of variables grows larger, both the memory requirements and
the computation time per iteration grow considerably.
To address this, a limited memory variant of the
BFGS method was developed. This is implemented by the
LimitedMemoryBfgsOptimizer
class.
It uses only a small, fixed number of search directions to build
the Hessian approximation. A number between 3 and 7 is usually recommended.
This number may be passed to the constructor; if it is not provided,
a default value of 3 is used.
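For instance, a limited memory optimizer that keeps 5 vectors could be set up as follows. This is a minimal sketch; the integer constructor argument follows from the description above.
// Keep 5 vectors to build the inverse Hessian approximation.
var lbfgs = new LimitedMemoryBfgsOptimizer(5);
// With the parameterless constructor, the default of 3 is used.
var lbfgsDefault = new LimitedMemoryBfgsOptimizer();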
Defining the Objective Function
The function for which an extremum is to be found is called the objective function. It is accessed
through the ObjectiveFunction property,
which is a delegate of type Func<T, TResult>. The delegate takes a
Vector<T> argument and returns a real number.
The following code demonstrates how to write methods to implement objective functions. The objective function
implemented here is the Rosenbrock function,
f(x₁, x₂) = (1 − x₁)² + 105(x₂ − x₁²)²
which is a famous test function for optimization.
double Rosenbrock(Vector<double> x)
{
double a = 1 - x[0];
double b = (x[1] - x[0] * x[0]);
return a * a + 105 * b * b;
}
Function Rosenbrock(x As Vector(Of Double)) As Double
Dim a As Double = 1 - x(0)
Dim b As Double = (x(1) - x(0) * x(0))
Return a * a + 105 * b * b
End Function
let Rosenbrock (x : Vector<float>) =
let a = 1.0 - x.[0]
let b = (x.[1] - x.[0] * x.[0])
a * a + 105.0 * b * b
Some algorithms require the gradient (the vector of partial derivatives) of the objective function. There are two
ways to supply this function. The first is through the GradientFunction property. This
property is of type Func<T, TResult>: a method that takes a Vector<T> argument and
returns a Vector<T>. The drawback of using this property is
that a new Vector<T> instance is created on every call.
The following code demonstrates how to write a method that calculates the gradient of an objective function. Once
again, we use the Rosenbrock function as our example:
Vector<double> RosenbrockGradient(Vector<double> x)
{
Vector<double> f = Vector.Create<double>(2);
double a = 1 - x[0];
double b = (x[1] - x[0] * x[0]);
f[0] = -2 * a - 420 * x[0] * b;
f[1] = 210 * b;
return f;
}
Function RosenbrockGradient(x As Vector(Of Double)) As Vector(Of Double)
Dim f = Vector.Create(Of Double)(2)
Dim a As Double = 1 - x(0)
Dim b As Double = (x(1) - x(0) * x(0))
f(0) = -2 * a - 420 * x(0) * b
f(1) = 210 * b
Return f
End Function
let RosenbrockGradient (x : Vector<float>) =
let f = Vector.Create<float>(2)
let a = 1.0 - x.[0]
let b = (x.[1] - x.[0] * x.[0])
f.[0] <- -2.0 * a - 420.0 * x.[0] * b
f.[1] <- 210.0 * b
f
Alternatively, the gradient can be specified using the FastGradientFunction
property. This property is of type Func<T1, T2, TResult>. It is a method that
takes two Vector<T> arguments. The first is the input
vector. The second is the Vector<T> that is to hold the
result. The return value is a reference to this result vector.
The following code demonstrates how to write a method that calculates the gradient of an objective function as a
Func<T1, T2, TResult> delegate.
Vector<double> RosenbrockFastGradient(Vector<double> x, Vector<double> f)
{
if (f == null)
f = Vector.Create<double>(2);
double a = 1 - x[0];
double b = (x[1] - x[0] * x[0]);
f[0] = -2 * a - 420 * x[0] * b;
f[1] = 210 * b;
return f;
}
Function RosenbrockFastGradient(x As Vector(Of Double), f As Vector(Of Double)) As Vector(Of Double)
If (f Is Nothing) Then f = Vector.Create(Of Double)(2)
Dim a As Double = 1 - x(0)
Dim b As Double = (x(1) - x(0) * x(0))
f(0) = -2 * a - 420 * x(0) * b
f(1) = 210 * b
Return f
End Function
let RosenbrockFastGradient (x : Vector<float>) (f : Vector<float>) =
    // The result vector may be null on the first call; create it if necessary.
    let f = if isNull f then Vector.Create<float>(2) else f
    let a = 1.0 - x.[0]
    let b = (x.[1] - x.[0] * x.[0])
    f.[0] <- -2.0 * a - 420.0 * x.[0] * b
    f.[1] <- 210.0 * b
    f
If the gradient function is not available, or if it is very expensive to calculate, it may be preferable to either
approximate it numerically, or use a method that does not require the gradient function.
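As an illustration of the first option, the helper below approximates the gradient of the Rosenbrock function by central differences, written in the same shape as the FastGradientFunction delegate. It is not a library method; both the helper and its step size are choices made for this sketch.
// Illustrative helper (not part of the library): approximates the
// gradient by central differences, reusing the output vector.
Vector<double> RosenbrockNumericalGradient(Vector<double> x, Vector<double> f)
{
    if (f == null)
        f = Vector.Create<double>(2);
    double h = 1e-6;  // finite difference step size (arbitrary choice)
    for (int i = 0; i < 2; i++)
    {
        double xi = x[i];
        x[i] = xi + h; double fPlus = Rosenbrock(x);
        x[i] = xi - h; double fMinus = Rosenbrock(x);
        x[i] = xi;     // restore the original coordinate
        f[i] = (fPlus - fMinus) / (2 * h);
    }
    return f;
}
Note that this costs two function evaluations per variable on every call, which is one reason an analytic gradient is preferable when it is available.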
The next example puts this all together. It creates a BFGS optimizer and sets it up to minimize the Rosenbrock
function:
QuasiNewtonOptimizer bfgs =
new QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs);
bfgs.ObjectiveFunction = Rosenbrock;
bfgs.FastGradientFunction = RosenbrockFastGradient;
Dim bfgs As QuasiNewtonOptimizer =
New QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs)
bfgs.ObjectiveFunction = AddressOf Rosenbrock
bfgs.FastGradientFunction = AddressOf RosenbrockFastGradient
let bfgs = QuasiNewtonOptimizer(QuasiNewtonMethod.Bfgs)
bfgs.ObjectiveFunction <- Func<_,_>(Rosenbrock)
bfgs.FastGradientFunction <- Func<_,_,_>(RosenbrockFastGradient)
When using .NET 4.0, it is possible to have the gradient computed using automatic differentiation.
To do so, specify the objective function by setting the SymbolicObjectiveFunction
property to a lambda expression that represents the objective function.
The lambda expression should take a Vector<T> as its input.
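For example, the Rosenbrock function could be supplied as follows. This is a sketch: the property is assumed here to accept a lambda expression of the same shape as the objective function delegate; see the reference documentation for the exact type.
// Assumption: SymbolicObjectiveFunction accepts a lambda expression
// (an expression tree) that maps a Vector<double> to a double.
bfgs.SymbolicObjectiveFunction =
    x => (1 - x[0]) * (1 - x[0])
        + 105 * (x[1] - x[0] * x[0]) * (x[1] - x[0] * x[0]);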
Termination Criteria
Optimization algorithms in multiple dimensions commonly have two or three criteria, each of which may signal that an
extremum has been found.
The ValueTest property
returns a SimpleConvergenceTest object that represents
the convergence test based on the value of the objective function. The test is successful if the change in the value
of the objective function is less than the tolerance. The test returns a value of Divergent if the value
of the objective function is infinite, and BadFunction when the value is NaN.
There is some danger in using this test, since some algorithms may not return a new best estimate for the extremum
on each iteration. For this reason, the test is not active by default. To make it active, its Enabled property must be set to true.
The SolutionTest
property returns a VectorConvergenceTest<T> object that
represents the convergence test based on the estimated extremum. The test is successful if the change in the
approximate extremum is less than the tolerance. The test returns a value of Divergent if one of the
elements of the solution is infinite, and BadFunction when one of the values is NaN.
The VectorConvergenceTest<T> class has many options to specify
the details of the convergence test. By default, the error is calculated as
the maximum over all elements of the change in the extremum since the previous iteration,
relative to the corresponding elements of the extremum.
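For instance, to require that every component of the solution be accurate to roughly eight digits, the tolerance of the solution test can be tightened. A minimal sketch, using the bfgs optimizer created earlier:
// Require a relative change of at most 1e-8 in each element.
bfgs.SolutionTest.Tolerance = 1e-8;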
Methods that use gradients have a third test available. The GradientTest property returns a
VectorConvergenceTest<T> based on the gradient of the
objective function. The test succeeds if the norm of the gradient vector is less than the tolerance. This test is
inactive for algorithms that do not use gradients.
Specific algorithms may supply their own convergence tests, or may modify the exact meaning of the tests discussed
here. For example, the Nelder-Mead method uses slight variations of the value and solution tests that are more
convenient for that particular algorithm.
The example below illustrates some of the options mentioned here. It disables the value test explicitly, instructs
the gradient test to use the sum of the absolute values of the elements as its norm, and sets its tolerance to 100
times the machine precision.
bfgs.ValueTest.Enabled = false;
bfgs.GradientTest.Norm = VectorConvergenceNorm.SumOfAbsoluteValues;
bfgs.GradientTest.Tolerance = 100 * MachineConstants.Epsilon;
bfgs.ValueTest.Enabled = False
bfgs.GradientTest.Norm = VectorConvergenceNorm.SumOfAbsoluteValues
bfgs.GradientTest.Tolerance = 100 * MachineConstants.Epsilon
bfgs.ValueTest.Enabled <- false
bfgs.GradientTest.Norm <- VectorConvergenceNorm.SumOfAbsoluteValues
bfgs.GradientTest.Tolerance <- 100.0 * MachineConstants.Epsilon
Line Searches
Many algorithms for multidimensional optimization work by successively solving a one-dimensional optimization
problem along a search direction, and altering the search direction in each iteration. These algorithms all inherit
from DirectionalOptimizer. Although general one-dimensional optimization algorithms may be used, they usually don't
work well: they are either wasteful because they are too precise, or the results fail to meet certain criteria that
are required to ensure convergence of the overall algorithm.
Specialized line search algorithms are therefore in order. The Extreme Optimization Numerical Libraries for .NET
provide three of them.
All optimization algorithms that use line searches automatically select the appropriate line search algorithm, and
set the search parameters to appropriate values. The unit line search is useful for experimental purposes, or when
the objective function is known to be well behaved so that a line search is not necessary. You can use the LineSearch property to access the line
search object.
Running the Algorithm
You must specify an initial guess for the solution by setting the InitialGuess property to a suitable
Vector<T> value. Any vector type may be used. The ExtremumType property specifies whether a minimum or a
maximum is sought.
The FindExtremum method finds the extremum. It returns a DenseVector<T> that contains the best approximation to the
extremum. This value is also available through the Extremum property:
bfgs.InitialGuess = Vector.Create(-1.2, 1.0);
bfgs.ExtremumType = ExtremumType.Minimum;
bfgs.FindExtremum();
bfgs.InitialGuess = Vector.Create(-1.2, 1.0)
bfgs.ExtremumType = ExtremumType.Minimum
bfgs.FindExtremum()
bfgs.InitialGuess <- Vector.Create(-1.2, 1.0)
bfgs.ExtremumType <- ExtremumType.Minimum
bfgs.FindExtremum() |> ignore
Interpreting the Results
After the FindExtremum method returns, a number of properties give more details about the algorithm.
The Status property returns an AlgorithmStatus value that indicates how the algorithm terminated. The
possible values are listed below:
AlgorithmStatus values

| Value | Description |
|---|---|
| NoResult | The algorithm has not been executed. |
| Busy | The algorithm is still executing. |
| Converged | The algorithm ended normally. The desired accuracy has been achieved. |
| IterationLimitExceeded | The number of iterations needed to achieve the desired accuracy is greater than MaxIterations. |
| EvaluationLimitExceeded | The number of function evaluations needed to achieve the desired accuracy is greater than MaxEvaluations. |
| RoundOffError | Round-off error prevented the algorithm from achieving the desired accuracy. |
| BadFunction | Bad behavior of the objective function prevented the algorithm from achieving the desired accuracy. |
| Divergent | The algorithm diverged. The objective function appears to be unbounded. |
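In code, the status is typically checked right after FindExtremum returns; a minimal sketch:
// Verify that the desired accuracy was actually achieved.
if (bfgs.Status != AlgorithmStatus.Converged)
    Console.WriteLine("No convergence: {0}", bfgs.Status);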
Each of the ConvergenceTest objects discussed earlier exposes an
Error property that returns the estimated error used in the test.
The IterationsNeeded
and EvaluationsNeeded
properties return the number of iterations and the number of evaluations of the objective function, respectively.
Algorithms that use gradients also have a GradientEvaluationsNeeded property.
The final example prints a summary of the results of running the BFGS algorithm
on the Rosenbrock function:
Console.WriteLine("BFGS Method:");
Console.WriteLine(" Solution: {0}", bfgs.Extremum);
Console.WriteLine(" Estimated error: {0}", bfgs.SolutionTest.Error);
Console.WriteLine(" Gradient residual: {0}", bfgs.GradientTest.Error);
Console.WriteLine(" # iterations: {0}", bfgs.IterationsNeeded);
Console.WriteLine(" # function evaluations: {0}",
bfgs.EvaluationsNeeded);
Console.WriteLine(" # gradient evaluations: {0}",
bfgs.GradientEvaluationsNeeded);
Console.WriteLine("BFGS Method:")
Console.WriteLine(" Solution: {0}", bfgs.Extremum)
Console.WriteLine(" Estimated error: {0}", bfgs.SolutionTest.Error)
Console.WriteLine(" Gradient residual: {0}", bfgs.GradientTest.Error)
Console.WriteLine(" # iterations: {0}", bfgs.IterationsNeeded)
Console.WriteLine(" # function evaluations: {0}",
bfgs.EvaluationsNeeded)
Console.WriteLine(" # gradient evaluations: {0}",
bfgs.GradientEvaluationsNeeded)
Console.WriteLine("BFGS Method:")
Console.WriteLine(" Solution: 0", bfgs.Extremum)
Console.WriteLine(" Estimated error: 0", bfgs.SolutionTest.Error)
Console.WriteLine(" Gradient residual: 0", bfgs.GradientTest.Error)
Console.WriteLine(" # iterations: 0", bfgs.IterationsNeeded)
Console.WriteLine(" # function evaluations: 0",
bfgs.EvaluationsNeeded)
Console.WriteLine(" # gradient evaluations: 0",
bfgs.GradientEvaluationsNeeded)
References
Fletcher, R. Practical Methods of Optimization, John Wiley & Sons, New York, 1987.
Fletcher, R. and Reeves, C.M. "Function Minimization by Conjugate Gradients." Comp. J. 7, 149-154, 1964.
Moré, J.J. and Thuente, D.J. "Line Search Algorithms with Guaranteed Sufficient Decrease." ACM Transactions on Mathematical Software, Vol. 20, No. 3, 286-307, 1994.
Nelder, J.A. and Mead, R. "A Simplex Method for Function Minimization." Comput. J. 7, 308-313, 1965.
Polak, E. and Ribière, G. "Note sur la Convergence de Méthodes de Directions Conjuguées." Revue Française d'Informatique et de Recherche Opérationnelle, 3, 35-43, 1969.
Powell, M.J.D. "An Efficient Method for Finding the Minimum of a Function of Several Variables Without Calculating Derivatives." Comp. J. 7, 155-162, 1964.