Extreme Optimization User's Guide

Logistic Regression

Logistic regression is a technique to analyze a situation where the outcome can have two possible values. The Extreme Optimization Numerical Libraries for .NET supports logistic regression through the LogisticRegressionModel class.

Constructing Logistic Regression Models

The LogisticRegressionModel class has four constructors.

The first constructor takes two parameters. The first is a NumericalVariable that represents the dependent variable. The second is an array of NumericalVariable objects that represent the independent variables.

C#
NumericalVariable dependent = new NumericalVariable("y", yData);
NumericalVariable independent1 = new NumericalVariable("x1", x1Data);
NumericalVariable independent2 = new NumericalVariable("x2", x2Data);
// Pass the independent variables as an array.
LogisticRegressionModel model1 = new LogisticRegressionModel(dependent,
    new NumericalVariable[] { independent1, independent2 });
Visual Basic
Dim dependent As NumericalVariable = New NumericalVariable("y", yData)
Dim independent1 As NumericalVariable = New NumericalVariable("x1", x1Data)
Dim independent2 As NumericalVariable = New NumericalVariable("x2", x2Data)
' Pass the independent variables as an array.
Dim model1 As LogisticRegressionModel = _
    New LogisticRegressionModel(dependent, _
        New NumericalVariable() {independent1, independent2})

The second constructor takes three parameters. The first parameter is a VariableCollection object that contains the variables to be used in the regression. The second parameter is a string containing the name of the dependent variable. The third parameter is an array of strings containing the names of the independent variables. All the names must exist in the collection specified by the first parameter. All variables must be of type NumericalVariable.

C#
VariableCollection variables = new VariableCollection();
variables.Add(dependent);
variables.Add(independent1);
variables.Add(independent2);
LogisticRegressionModel model2 = new LogisticRegressionModel(variables, "y",
    new string[] {"x1", "x2"});
Visual Basic
Dim variables As VariableCollection = New VariableCollection()
variables.Add(dependent)
variables.Add(independent1)
variables.Add(independent2)
Dim model2 As LogisticRegressionModel = _
    New LogisticRegressionModel(variables, "y", New String() {"x1", "x2"})

The third constructor also takes three parameters. The first parameter is a DataTable object that contains the data for the regression analysis. The second parameter is a string containing the name of the column that contains the data for the dependent variable. The third parameter is an array of strings containing the names of the columns that contain the data for the independent variables. All columns must be numerical or convertible to numerical values.

C#
DataTable table = new DataTable();
// Fill data table with data from some datasource.
LogisticRegressionModel model3 = new LogisticRegressionModel(table, "y",
    new string[] {"x1", "x2"});
Visual Basic
Dim table As DataTable = New DataTable()
' Fill data table with data from some datasource.
Dim model3 As LogisticRegressionModel = _
    New LogisticRegressionModel(table, "y", New String() {"x1", "x2"})

The fourth constructor takes two arguments. The first is a Vector containing the data of the dependent variable. The second is a Matrix whose columns contain the data for each independent variable. The length of the vector must equal the number of rows of the matrix.

Computing the Regression

The Compute method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Compute method is called. You can verify that the model has been calculated by inspecting the Computed property.

C#
model1.Compute();
Visual Basic
model1.Compute()

The PredictedValues property returns a Vector that contains the values of the dependent variable as predicted by the model. The Residuals property returns a vector containing the difference between the actual and the predicted values of the dependent variable. Both vectors contain one element for each observation.
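To make the relationship between these two vectors concrete, here is a minimal sketch in Python that does not use the library. It assumes the usual logistic model, in which each predicted value is a probability obtained by applying the logistic function to a linear combination of the independent variables, and it takes each residual as the actual value minus the predicted value. The parameter values and data are made up for illustration; whether the library scales its results in exactly this way is an assumption.

```python
import math

def logistic(t):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def predict(intercept, slopes, rows):
    """Predicted probability for each observation (one row per observation)."""
    return [logistic(intercept + sum(b * x for b, x in zip(slopes, row)))
            for row in rows]

# Hypothetical fitted parameters and data, for illustration only.
intercept, slopes = -1.0, [0.5, 2.0]
rows = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
actual = [0.0, 0.0, 1.0]

predicted = predict(intercept, slopes, rows)
# One residual per observation: actual minus predicted.
residuals = [y - p for y, p in zip(actual, predicted)]
```

Both lists have one element per observation, which mirrors the shape described above.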

Regression Parameters

The LogisticRegressionModel class' Parameters property returns a ParameterCollection object that contains the parameters of the regression model. The members of this collection are of type Parameter. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A logistic regression model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0. The name of the intercept parameter can be retrieved or set through the InterceptParameterName property.

By convention, the parameter estimates of a logistic regression are assumed to follow a normal distribution. This is an approximation that is valid if the number of observations is large enough. Parameters for other models are assumed to follow a Student's t distribution, which is very close to a normal distribution when the number of observations is large.

The Parameter class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the parameter associated with the variable x1 from our earlier example:

C#
Parameter x1Parameter = model1.Parameters["x1"];
Console.WriteLine("Name:        {0}", x1Parameter.Name);
Console.WriteLine("Value:       {0}", x1Parameter.Value);
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError);
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic);
Console.WriteLine("p-value:     {0}", x1Parameter.PValue);
Visual Basic
Dim x1Parameter As Parameter = model1.Parameters("x1")
Console.WriteLine("Name:        {0}", x1Parameter.Name)
Console.WriteLine("Value:       {0}", x1Parameter.Value)
Console.WriteLine("St.Err.:     {0}", x1Parameter.StandardError)
Console.WriteLine("z-statistic: {0}", x1Parameter.Statistic)
Console.WriteLine("p-value:     {0}", x1Parameter.PValue)
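For readers who want to check these numbers by hand, the following Python sketch shows how a z-statistic and a two-tailed p-value are conventionally derived from a parameter value and its standard error under the normal approximation. It is independent of the library, the function names are hypothetical, and the input numbers are invented for illustration.

```python
import math

def z_statistic(value, standard_error):
    """z-statistic for the hypothesis that the parameter equals zero."""
    return value / standard_error

def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical estimate and standard error, for illustration only.
z = z_statistic(1.2, 0.5)
p = two_tailed_p_value(z)
```

A parameter whose estimate is small relative to its standard error gives a z-statistic near zero and a p-value near 1.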

The Parameter class has one method: GetConfidenceInterval. This method takes one parameter: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
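As an illustration of what such an interval represents, here is a minimal Python sketch that builds a confidence interval from a parameter value and its standard error. It assumes the normal approximation described earlier for logistic regression parameters; it does not call the library, and the function name and input numbers are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(value, standard_error, confidence_level):
    """Normal-approximation confidence interval for a parameter estimate."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # Critical value that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return (value - z * standard_error, value + z * standard_error)

# Hypothetical estimate 1.2 with standard error 0.5, at the 95% level.
lower, upper = confidence_interval(1.2, 0.5, 0.95)
```

If the interval excludes zero at a given confidence level, the two-tailed test on the same parameter rejects the hypothesis that it equals zero at the matching significance level.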

Verifying the Quality of the Regression

Because a logistic regression model is computed by directly maximizing the likelihood function, it does not offer the same range of diagnostic values that is available for linear regression models.

The GetLogLikelihood method returns the logarithm of the likelihood of the computed model. The regression parameters were computed by maximizing this value.
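The quantity being maximized can be sketched in a few lines of Python. This is the standard log-likelihood for a two-valued outcome, written without the library; it assumes the outcomes are coded as 0 and 1 and that the predicted values are probabilities strictly between 0 and 1. The data are invented for illustration.

```python
import math

def log_likelihood(actual, predicted):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
               for y, p in zip(actual, predicted))

# Hypothetical outcomes (0 or 1) and predicted probabilities.
actual = [1, 0, 1, 1]
predicted = [0.8, 0.3, 0.6, 0.9]
ll = log_likelihood(actual, predicted)
```

The value is always negative, and predictions that agree more closely with the observed outcomes give a value closer to zero.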

Likelihood Ratio Test

The quality of the model can be measured using a likelihood ratio test, available through the GetLikelihoodRatioTest method. Without parameters, this method returns a hypothesis test for the hypothesis that all parameters that correspond to independent variables are zero. It compares the likelihood of the current model with a model that only has a constant term. The resulting statistic follows a chi-squared distribution. This method returns a SimpleHypothesisTest object that can be used to verify the hypothesis at any significance level.

The GetLikelihoodRatioTest method has a second overload that takes another LogisticRegressionModel object as its only parameter. This model must be nested inside the current model: the dependent variable must be the same, and all independent variables in the nested model must also be in the current model. The test indicates whether the inclusion of additional independent variables in the current model produced a significantly better model.
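The arithmetic behind a likelihood ratio test is short enough to sketch in Python. The statistic is twice the difference between the log-likelihoods of the larger and the nested model, and it is compared against a chi-squared distribution whose degrees of freedom equal the number of additional parameters. The code below is an illustration that does not use the library; the function names are hypothetical, the log-likelihood values are invented, and the chi-squared tail function is a simple implementation for integer degrees of freedom.

```python
import math

def chi_squared_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-squared with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # closed form for df = 1
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^-z / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(ll_full, ll_nested, extra_parameters):
    """Return (statistic, p-value) for comparing two nested models."""
    statistic = 2.0 * (ll_full - ll_nested)
    return statistic, chi_squared_sf(statistic, extra_parameters)

# Hypothetical log-likelihoods; the full model has two extra variables.
statistic, p_value = likelihood_ratio_test(-40.0, -45.0, 2)
```

A small p-value suggests that the additional variables improve the fit by more than chance alone would explain.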

Wald Test

The Wald test generalizes the z-test on an individual parameter to several parameters at once, and takes into account the correlations between the parameter estimates. The test statistic follows a chi-squared distribution with degrees of freedom equal to the number of parameters included in the test.

Without parameters, the GetWaldTest method returns the Wald test for the entire model, including the constant term. This method returns a SimpleHypothesisTest object that can be used to verify the hypothesis at any significance level.

The GetWaldTest method can take one parameter, which must be an array of integers specifying the indexes of the parameters that are to be included in the test.
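To show what a Wald statistic over a subset of parameters looks like numerically, here is a minimal Python sketch that does not use the library. It assumes the textbook form of the statistic, W = b' V^-1 b, where b holds the selected parameter values and V is the matching block of their covariance matrix. The function names, parameter values and covariance matrix are hypothetical, and whether the library computes its statistic in exactly this way is an assumption.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def wald_statistic(values, covariance, indexes):
    """W = b' V^-1 b for the parameters selected by indexes."""
    b = [values[i] for i in indexes]
    v = [[covariance[i][j] for j in indexes] for i in indexes]
    return sum(bi * xi for bi, xi in zip(b, solve(v, b)))

# Hypothetical estimates (index 0 is the intercept) and covariance matrix.
values = [-1.0, 1.2, 0.4]
covariance = [[0.30, 0.00, 0.00],
              [0.00, 0.25, 0.05],
              [0.00, 0.05, 0.16]]

w = wald_statistic(values, covariance, [1, 2])
# With 2 degrees of freedom the chi-squared upper tail is exp(-W / 2).
p_value = math.exp(-w / 2.0)
# For a single parameter, W reduces to the square of its z-statistic.
w_single = wald_statistic(values, covariance, [1])
```

The list of indexes plays the same role as the integer array described above: it selects which parameters take part in the test, and its length gives the degrees of freedom.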
