Logistic Regression

Extreme Optimization Numerical Libraries for .NET Professional

Logistic regression is a technique to analyze a situation where the outcome can have two possible values (binomial logistic regression). The model estimates the probability that an observation leads to either of the outcomes. A generalization to more than two outcomes, in the form of nominal logistic regression, is possible. The LogisticRegressionModel class implements logistic regression.
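For reference, the binomial model is conventionally written as below. The notation is introduced here for illustration and is not taken from the library: x_1, ..., x_k are the independent variables, \beta_0 is the intercept, and \beta_1, ..., \beta_k are the remaining regression parameters.

```latex
P(y = 1 \mid x_1, \ldots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

Equivalently, the log-odds \ln\bigl(P / (1 - P)\bigr) is a linear function of the independent variables.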

Constructing Logistic Regression Models

The LogisticRegressionModel class has four constructors.

The first constructor takes two arguments. The first is an ICategoricalVector that represents the dependent variable. The second is a parameter array of Vector<T> objects that represent the independent variables. This creates a binary logistic regression model.

C#
var dependent = Vector.CreateCategorical(yData);
var independent1 = Vector.Create(x1Data);
var independent2 = Vector.Create(x2Data);
var model1 = new LogisticRegressionModel(dependent, independent1, independent2);

VB
Dim dependent = Vector.CreateCategorical(yData)
Dim independent1 = Vector.Create(x1Data)
Dim independent2 = Vector.Create(x2Data)
Dim model1 = New LogisticRegressionModel(dependent, independent1, independent2)

F#
let dependent = Vector.CreateCategorical(yData)
let independent1 = Vector.Create(x1Data)
let independent2 = Vector.Create(x2Data)
let model1 = new LogisticRegressionModel(dependent, independent1, independent2)

The second constructor takes three arguments. The first argument is an IDataFrame (a DataFrame<R, C> or Matrix<T>) that contains the variables to be used in the regression. The second argument is a string containing the name of the dependent variable. The third argument is a parameter array of strings containing the names of the independent variables. All the names must exist in the column index of the data frame specified by the first parameter.

C#
var dataFrame = DataFrame.FromColumns(new Dictionary<string, object>()
    { { "y", dependent }, { "x1", independent1 }, { "x2", independent2 } });
var model2 = new LogisticRegressionModel(dataFrame, "y", "x1", "x2");

VB
Dim frame = DataFrame.FromColumns(New Dictionary(Of String, Object)() From
      {{"y", dependent}, {"x1", independent1}, {"x2", independent2}})
Dim model2 = New LogisticRegressionModel(frame, "y", "x1", "x2")

F#
let columns = Dictionary<string,obj>()
columns.Add("y", dependent)
columns.Add("x1", independent1)
columns.Add("x2", independent2)
let dataFrame = DataFrame.FromColumns<string>(columns)
let model2 = new LogisticRegressionModel(dataFrame, "y", "x1", "x2")

The third and fourth constructors are like the first two, but take arrays instead of parameter arrays, and also take two optional arguments. The first optional argument specifies a vector with weights for the observations. The second optional argument specifies the method: binary or nominal.

Computing the Regression

The Fit method performs the actual analysis. Most properties and methods throw an exception when they are accessed before the Fit method is called. You can verify that the model has been calculated by inspecting the Fitted property.

C#
model1.Fit();

VB
model1.Fit()

F#
model1.Fit()

The Predictions property returns a CategoricalVector<T> that contains the values of the dependent variable as predicted by the model. The PredictedProbabilities property returns a Matrix<T> that gives the probability of each outcome for each observation. A related property, PredictedLogProbabilities, returns the natural logarithm of the predicted probabilities. The ProbabilityResiduals property returns a matrix containing the differences between the actual outcomes (0 or 1) and the predicted probabilities.
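The properties described above could be inspected as in the following sketch. It continues the earlier example and assumes that model1 has already been fitted. How the rows and columns of the matrices map to observations and outcomes is an assumption here and should be checked against the reference documentation.

```csharp
using System;

// Continues the earlier example: model1 must have been fitted with model1.Fit().
var predictions = model1.Predictions;                 // predicted outcome for each observation
var probabilities = model1.PredictedProbabilities;    // probability of each outcome for each observation
var logProbabilities = model1.PredictedLogProbabilities;
var residuals = model1.ProbabilityResiduals;          // actual (0 or 1) minus predicted probability

// Assumption: one row per observation and one column per outcome.
Console.WriteLine("Probability matrix: {0} x {1}",
    probabilities.RowCount, probabilities.ColumnCount);
Console.WriteLine(predictions);
```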

Regression Parameters

The Parameters property of the LogisticRegressionModel class returns a ParameterVector<T> object that contains the parameters of the regression model. The elements of this vector are of type Parameter<T>. Regression parameters are created by the model. You cannot create them directly.

Parameters can be accessed by numerical index or by name. The name of a parameter is usually the name of the variable associated with it.

A logistic regression model has as many parameters as there are independent variables, plus one for the intercept (constant term) when it is included. The intercept, if present, is the first parameter in the collection, with index 0.

The parameters of a logistic regression are assumed to have a normal distribution. This is an approximation that is valid if the number of observations is large enough. Parameters for other models are assumed to have a Student's t distribution, which approaches the normal distribution as the number of degrees of freedom grows and is practically indistinguishable from it for large samples.

The Parameter<T> class has four useful properties. The Value property returns the numerical value of the parameter, while the StandardError property returns the standard deviation of the parameter's distribution.

The Statistic property returns the value of the z-statistic corresponding to the hypothesis that the parameter equals zero. The PValue property returns the corresponding p-value. A high p-value indicates that the variable associated with the parameter does not make a significant contribution to explaining the data. The p-value always corresponds to a two-tailed test. The following example prints the properties of the slope parameter of our earlier example:

C#
// Parameter 0 is the intercept; parameter 1 belongs to x1.
var slope = model1.Parameters[1];
Console.WriteLine("Value:       {0}", slope.Value);
Console.WriteLine("St.Err.:     {0}", slope.StandardError);
Console.WriteLine("z-statistic: {0}", slope.Statistic);
Console.WriteLine("p-value:     {0}", slope.PValue);

VB
' Parameter 0 is the intercept; parameter 1 belongs to x1.
Dim slope = model1.Parameters(1)
Console.WriteLine("Value:       {0}", slope.Value)
Console.WriteLine("St.Err.:     {0}", slope.StandardError)
Console.WriteLine("z-statistic: {0}", slope.Statistic)
Console.WriteLine("p-value:     {0}", slope.PValue)

F#
// Parameter 0 is the intercept; parameter 1 belongs to x1.
let slope = model1.Parameters.[1]
printfn "Value:       %A" slope.Value
printfn "St.Err.:     %A" slope.StandardError
printfn "z-statistic: %A" slope.Statistic
printfn "p-value:     %A" slope.PValue
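Under the normal approximation described above, the z-statistic and two-tailed p-value of a single parameter are conventionally defined as follows. This is the textbook formulation, assumed rather than confirmed to match the library's computation; \hat{\beta}_j denotes the estimated parameter, \operatorname{SE} its standard error, and \Phi the standard normal cumulative distribution function.

```latex
z = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
```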

The Parameter<T> class has one method: GetConfidenceInterval. This method takes one argument: a confidence level between 0 and 1. A value of 0.95 corresponds to a confidence level of 95%. The method returns the confidence interval for the parameter at the specified confidence level as an Interval structure.
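A minimal sketch of requesting a confidence interval, continuing the earlier example with a fitted model1. The LowerBound and UpperBound property names on the returned interval are assumptions that are not stated on this page.

```csharp
using System;

// Parameter 1 is the coefficient of x1 (index 0 is the intercept).
var slope = model1.Parameters[1];

// 0.95 requests a 95% confidence interval.
var interval = slope.GetConfidenceInterval(0.95);

// Assumption: the interval exposes LowerBound and UpperBound.
Console.WriteLine("95% confidence interval: {0} - {1}",
    interval.LowerBound, interval.UpperBound);
```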

Verifying the Quality of the Regression

Because a logistic regression model is computed by directly maximizing the likelihood function, it does not offer the same range of diagnostic values that is available for linear regression models.

The LogLikelihood method returns the logarithm of the likelihood of the computed model. The regression parameters were computed by maximizing this value.
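For the unweighted binary case, the quantity being maximized has the standard form below. The notation is introduced here for illustration: y_i is the observed outcome (0 or 1) of observation i, x_i is its vector of independent variables including a leading 1 for the intercept, and p_i is the predicted probability. How observation weights enter the sum is not stated on this page.

```latex
\ell(\beta) = \sum_{i=1}^{n} \bigl[\, y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \,\bigr],
\qquad
p_i = \frac{1}{1 + e^{-x_i^{\top} \beta}}
```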

Likelihood Ratio Test

The quality of the model can be measured using a likelihood ratio test, available through the GetLikelihoodRatioTest method. Without parameters, this method returns a test for the hypothesis that all parameters that correspond to independent variables are zero. It compares the likelihood of the current model with a model that only has a constant term. The resulting statistic follows a chi-squared distribution. This method returns a SimpleHypothesisTest that can be used to verify the hypothesis at any significance level.
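In the usual formulation, the test statistic is twice the difference between the two log-likelihoods. The symbols are illustrative: \ell_{\text{model}} is the log-likelihood of the current model, \ell_{\text{null}} that of the constant-only model, and k the number of independent variables in a binary model. The chi-squared distribution holds asymptotically under the null hypothesis.

```latex
G = 2\,\bigl(\ell_{\text{model}} - \ell_{\text{null}}\bigr) \sim \chi^2_{k}
```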

The GetLikelihoodRatioTest method has a second overload that takes another LogisticRegressionModel as its only argument. This model must be nested inside the current model: the dependent variable must be the same, and all independent variables in the nested model must also be in the current model. The test indicates whether the inclusion of additional independent variables in the current model produced a significantly better model.
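Both overloads could be used as in the sketch below, which continues the earlier example with a fitted model1. The nested model is built with the first constructor using only independent1. Reading the result through Statistic and PValue properties is an assumption about the returned test object.

```csharp
using System;
using Extreme.Statistics;

// Test the whole model against a constant-only model.
var overallTest = model1.GetLikelihoodRatioTest();
Console.WriteLine("Chi-squared: {0}", overallTest.Statistic);
Console.WriteLine("p-value:     {0}", overallTest.PValue);

// A nested model: same dependent variable, a subset of the independent variables.
var nestedModel = new LogisticRegressionModel(dependent, independent1);
nestedModel.Fit();

// Does adding independent2 give a significantly better model?
var nestedTest = model1.GetLikelihoodRatioTest(nestedModel);
Console.WriteLine("p-value (x2 added): {0}", nestedTest.PValue);
```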

Wald Test

The Wald test is a generalization of the z-test on individual parameters to multiple parameters, and takes into account interactions between the parameters. The test statistic follows a chi-squared distribution with degrees of freedom equal to the number of parameters included in the test.
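The Wald statistic is commonly written as the quadratic form below. The notation is illustrative: \hat{\beta}_S is the vector of the q estimated parameters included in the test and \hat{V}_{SS} is the corresponding block of their estimated covariance matrix. For a single parameter (q = 1) it reduces to the square of the z-statistic.

```latex
W = \hat{\beta}_S^{\top} \bigl(\hat{V}_{SS}\bigr)^{-1} \hat{\beta}_S \sim \chi^2_{q}
```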

Without parameters, the GetWaldTest method returns the Wald test for the entire model, including the constant term. This method returns a SimpleHypothesisTest object that can be used to verify the hypothesis at any significance level.

The GetWaldTest method can take one parameter, which must be an array of integers specifying the indexes of the parameters that are to be included in the test.
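A short sketch of both forms, continuing the earlier example with a fitted model1. Indexes 1 and 2 are taken to refer to the parameters for x1 and x2, since the intercept has index 0. The Statistic and PValue properties on the returned test are assumed.

```csharp
using System;

// Wald test for the entire model, including the constant term.
var fullTest = model1.GetWaldTest();
Console.WriteLine("Wald statistic (all): {0}", fullTest.Statistic);
Console.WriteLine("p-value (all):        {0}", fullTest.PValue);

// Wald test restricted to the parameters of x1 and x2.
var slopesTest = model1.GetWaldTest(new int[] { 1, 2 });
Console.WriteLine("p-value (x1, x2):     {0}", slopesTest.PValue);
```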

Copyright (c) 2004-2023 ExoAnalytics Inc.
