Logistic Regression
Logistic regression is a technique for modeling an outcome that can take one of two possible values
as a function of one or more independent variables.
The Extreme Optimization Numerical Libraries for .NET support logistic
regression through the LogisticRegressionModel class.
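For reference, the standard form of the logistic model can be written as follows. The symbols used here (p for the probability of the outcome coded as 1, the coefficients β, and the independent variables x) are notation introduced for illustration and are not library identifiers.

```latex
p = P(y = 1 \mid x_1, \ldots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```

The coefficients β correspond to the regression parameters, with β_0 the intercept (constant term).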
Constructing Logistic Regression Models
The LogisticRegressionModel class has four constructors.
The first constructor takes two parameters. The first is a NumericalVariable that
represents the dependent variable. The second is an array of NumericalVariable objects that
represent the independent variables.
| C# | Copy Code |
NumericalVariable dependent = new NumericalVariable("y", yData);
NumericalVariable independent1 = new NumericalVariable("x1", x1Data);
NumericalVariable independent2 = new NumericalVariable("x2", x2Data);
LogisticRegressionModel model1 = new LogisticRegressionModel(dependent, new NumericalVariable[] { independent1, independent2 }); |
| Visual Basic | Copy Code |
Dim dependent As NumericalVariable = New NumericalVariable("y", yData)
Dim independent1 As NumericalVariable = New NumericalVariable("x1", x1Data)
Dim independent2 As NumericalVariable = New NumericalVariable("x2", x2Data)
Dim model1 As LogisticRegressionModel = _
New LogisticRegressionModel(dependent, New NumericalVariable() {independent1, independent2}) |
The second constructor takes three parameters. The first parameter is a
VariableCollection object that contains the variables to be used in the
regression. The second parameter is a string containing the name of the
dependent variable. The third parameter is an array of strings containing
the names of the independent variables. All the names must exist in
the collection specified by the first parameter. All variables must be of type NumericalVariable.
| C# | Copy Code |
VariableCollection variables = new VariableCollection();
variables.Add(dependent);
variables.Add(independent1);
variables.Add(independent2);
LogisticRegressionModel model2 = new LogisticRegressionModel(variables, "y", new string[] {"x1", "x2"}); |
| Visual Basic | Copy Code |
Dim variables As VariableCollection = New VariableCollection()
variables.Add(dependent)
variables.Add(independent1)
variables.Add(independent2)
Dim model2 As LogisticRegressionModel = _
New LogisticRegressionModel(variables, "y", New String() {"x1", "x2"}) |
The third constructor also takes three parameters. The first parameter is
a DataTable object that contains the data for the regression analysis. The
second parameter is a string containing the name of the column that contains the
data for the dependent variable. The third parameter is an array of strings containing the
names of the columns that contain the data for the independent variables. All columns
must be numerical or convertible to numerical values.
| C# | Copy Code |
DataTable table = new DataTable();
// Fill data table with data from some datasource.
LogisticRegressionModel model3 = new LogisticRegressionModel(table, "y", new string[] {"x1", "x2"}); |
| Visual Basic | Copy Code |
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model3 As LogisticRegressionModel = _
New LogisticRegressionModel(table, "y", New String() {"x1", "x2"}) |
The fourth constructor takes two arguments. The first is a Vector
containing the data of the dependent variable. The second is a Matrix
whose columns contain the data for each independent variable. The length of the vector
must equal the number of rows of the matrix.
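A minimal sketch of the fourth constructor follows. The names yVector and xMatrix are hypothetical and are assumed to have been created elsewhere; how they are built is not covered in this section.

```csharp
// Assumption: yVector is a Vector holding the dependent variable's data, and
// xMatrix is a Matrix with one column per independent variable. Both are
// created elsewhere, and yVector has as many elements as xMatrix has rows.
LogisticRegressionModel model4 = new LogisticRegressionModel(yVector, xMatrix);
```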
Computing the Regression
The Compute
method performs the actual analysis. Most properties and methods throw an exception when they are accessed before
the Compute method is called. You can verify that the model has been calculated by inspecting the
Computed property.
| C# | Copy Code |
model1.Compute(); |
| Visual Basic | Copy Code |
model1.Compute() |
The PredictedValues property returns a Vector
that contains the values of the dependent variable as predicted by the model.
The Residuals property returns a vector containing the difference between the
actual and the predicted values of the dependent variable. Both vectors contain
one element for each observation.
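The following sketch shows one way to guard against accessing results too early and to list the predicted values and residuals. It assumes that Vector exposes a Length property and an integer indexer, which this section does not state.

```csharp
// Compute the model only if that has not been done yet.
if (!model1.Computed)
    model1.Compute();

Vector predicted = model1.PredictedValues;
Vector residuals = model1.Residuals;

// Assumption: Vector has a Length property and an integer indexer.
for (int i = 0; i < predicted.Length; i++)
    Console.WriteLine("{0}: predicted = {1}, residual = {2}",
        i, predicted[i], residuals[i]);
```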
Regression Parameters
The LogisticRegressionModel class's Parameters
property returns a ParameterCollection
object that contains the parameters of the regression model. The members of
this collection are of type Parameter.
Regression parameters are created by the model. You cannot create them
directly.
Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable
associated with it.
A logistic regression model has as many parameters as there are independent variables, plus one for
the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0.
The name of the intercept parameter can be retrieved or set through the InterceptParameterName property.
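As an illustration, the sketch below accesses the intercept by index and then lists every parameter of the model from the earlier example. It assumes that model1 has been computed, that it includes an intercept, and that ParameterCollection can be enumerated with foreach; the last point is an assumption not confirmed in this section.

```csharp
// The intercept, when included, is the first parameter (index 0).
Parameter intercept = model1.Parameters[0];
Console.WriteLine("{0} = {1}", intercept.Name, intercept.Value);

// Assumption: ParameterCollection supports enumeration with foreach.
foreach (Parameter parameter in model1.Parameters)
    Console.WriteLine("{0} = {1}", parameter.Name, parameter.Value);
```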
By convention, the parameter estimates of a logistic regression are assumed to follow a normal distribution. This is an approximation
that is valid if the number of observations is large enough. Parameters of other regression models are assumed to follow a
Student's t distribution, which approaches the normal distribution as the number of observations grows.
The Parameter class has four useful properties. The Value property
returns the numerical value of the parameter, while the StandardError
property returns the standard deviation of the parameter's distribution.
The Statistic
property returns the value of the z-statistic corresponding to the hypothesis
that the parameter equals zero. The PValue
property returns the corresponding p-value. A high p-value indicates that
the variable associated with the parameter does not make a
significant contribution to explaining the data. The p-value always corresponds
to a two-tailed test. The following
example prints the properties of the parameter associated with the variable x1 in our
earlier example:
| C# | Copy Code |
Parameter x1Parameter = model1.Parameters["x1"];
Console.WriteLine("Name: {0}", x1Parameter.Name);
Console.WriteLine("Value: {0}", x1Parameter.Value);
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError);
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value: {0}", x1Parameter.PValue); |
| Visual Basic | Copy Code |
Dim x1Parameter As Parameter = model1.Parameters("x1")
Console.WriteLine("Name: {0}", x1Parameter.Name)
Console.WriteLine("Value: {0}", x1Parameter.Value)
Console.WriteLine("St.Err.: {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value: {0}", x1Parameter.PValue) |
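The relationship between the printed quantities can be summarized with the standard formulas for a z-test. The notation is introduced here for illustration: the estimate of parameter j with its standard error, and Φ for the cumulative distribution function of the standard normal distribution.

```latex
z_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\qquad
p_j = 2\,\bigl(1 - \Phi(\lvert z_j \rvert)\bigr)
```

The factor 2 reflects the two-tailed nature of the test.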
The Parameter class has one method: GetConfidenceInterval.
This method takes one parameter: a confidence level between 0 and 1. A value of
0.95 corresponds to a confidence level of 95%. The method returns the confidence
interval for the parameter at the specified confidence level as an Interval
structure.
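A short sketch of retrieving a 95% confidence interval for the x1 parameter follows. The LowerBound and UpperBound property names on Interval are assumptions not stated in this section.

```csharp
// Confidence level 0.95 corresponds to a 95% confidence interval.
Interval interval = x1Parameter.GetConfidenceInterval(0.95);

// Assumption: Interval exposes LowerBound and UpperBound properties.
Console.WriteLine("95% confidence interval: [{0}, {1}]",
    interval.LowerBound, interval.UpperBound);
```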
Verifying the Quality of the Regression
Because a logistic regression model is computed by directly maximizing the likelihood function,
it does not have the same range of diagnostic values that are available for linear regression models.
The GetLogLikelihood
method returns the logarithm of the likelihood of the computed model. The regression parameters were
computed by maximizing this value.
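For a binary outcome coded as 0 or 1, the log-likelihood takes the standard form below. The notation is introduced here for illustration: y_i is the observed outcome for observation i, and p_i is the probability the model assigns to the outcome 1 for that observation.

```latex
\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\bigr],
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}}
```

Since every term is the logarithm of a probability, the log-likelihood is never positive; values closer to zero indicate a better fit to the observed data.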
Likelihood Ratio Test
The quality of the model can be measured using a likelihood ratio test, available through the GetLikelihoodRatioTest
method. Without parameters, this method returns a hypothesis test for the hypothesis that all parameters
that correspond to independent variables are zero. It compares the likelihood of the current model with a model
that only has a constant term. The resulting statistic follows a chi-squared distribution.
This method returns a SimpleHypothesisTest
object that can be used to verify the hypothesis at any significance level.
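In its usual textbook form, the likelihood ratio statistic for this test is written as follows. The notation is introduced here for illustration: ℓ_1 is the log-likelihood of the current model, ℓ_0 is that of the model with only a constant term, and k is the number of independent variables.

```latex
G = 2\,(\ell_1 - \ell_0) \;\sim\; \chi^2_k \quad \text{under the null hypothesis}
```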
The GetLikelihoodRatioTest method has a second overload that takes another
LogisticRegressionModel object as its only parameter. This model must be nested
inside the current model: the dependent variable must be the same, and all independent variables in the nested model
must also be in the current model. The test indicates whether the inclusion of additional independent variables in the
current model produced a significantly better model.
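Both overloads can be sketched as follows, reusing the variables from the first constructor example. The Statistic and PValue property names on SimpleHypothesisTest are assumptions not stated in this section, and reducedModel is a hypothetical nested model that keeps only x1.

```csharp
// Test of the current model against a model with only a constant term.
SimpleHypothesisTest overallTest = model1.GetLikelihoodRatioTest();
// Assumption: the test object exposes Statistic and PValue properties.
Console.WriteLine("Chi-squared statistic: {0}", overallTest.Statistic);
Console.WriteLine("p-value: {0}", overallTest.PValue);

// A nested model: same dependent variable, subset of independent variables.
LogisticRegressionModel reducedModel = new LogisticRegressionModel(
    dependent, new NumericalVariable[] { independent1 });
reducedModel.Compute();

// Does adding x2 give a significantly better model than x1 alone?
SimpleHypothesisTest nestedTest = model1.GetLikelihoodRatioTest(reducedModel);
Console.WriteLine("p-value (x2 added): {0}", nestedTest.PValue);
```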
Wald Test
The Wald test is a generalization of the z-test on individual parameters to multiple parameters, and takes into account
correlations between the parameter estimates. The test statistic follows a chi-squared distribution
with degrees of freedom equal to the number of parameters included in the test.
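In its usual textbook form, the Wald statistic for a set S of q parameters is written as follows. The notation is introduced here for illustration: the vector of estimates of the parameters in S, and the corresponding block of the estimated covariance matrix of the parameter estimates.

```latex
W = \hat{\beta}_S^{\mathsf{T}} \, \bigl(\hat{V}_{SS}\bigr)^{-1} \, \hat{\beta}_S \;\sim\; \chi^2_q
\quad \text{under the null hypothesis } \beta_S = 0
```

For a single parameter (q = 1) this reduces to the square of the z-statistic.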
Without parameters, the GetWaldTest
method returns the Wald test for the entire model, including the constant term. This method returns a SimpleHypothesisTest
object that can be used to verify the hypothesis at any significance level.
The GetWaldTest method can take one parameter, which must be an array of integers specifying the
indexes of the parameters that are to be included in the test.
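Both forms of the Wald test can be sketched as follows. The sketch assumes that model1 includes an intercept, so that indexes 1 and 2 refer to the parameters for x1 and x2; the PValue property name on SimpleHypothesisTest is an assumption not stated in this section.

```csharp
// Wald test for the entire model, including the constant term.
SimpleHypothesisTest fullWaldTest = model1.GetWaldTest();
// Assumption: the test object exposes a PValue property.
Console.WriteLine("p-value (all parameters): {0}", fullWaldTest.PValue);

// Wald test restricted to the parameters at indexes 1 and 2
// (x1 and x2 when the intercept occupies index 0).
SimpleHypothesisTest partialWaldTest = model1.GetWaldTest(new int[] { 1, 2 });
Console.WriteLine("p-value (x1 and x2): {0}", partialWaldTest.PValue);
```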
Copyright 2004-2008,
Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M
Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual
Studio.NET, and the Visual Studio Logo are registered trademarks of Microsoft Corporation