Extreme Optimization™: Complexity made simple.

Math and Statistics
Libraries for .NET

Non-Parametric Tests

Extreme Optimization Numerical Libraries for .NET Professional

A non-parametric test is a hypothesis test that does not make any assumptions about the distribution of the samples.

The Mann-Whitney Test

The Mann-Whitney test, also known as the Wilcoxon rank sum test or the Wilcoxon-Mann-Whitney test, tests the hypothesis that two samples were drawn from the same distribution. The test relies only on the relative ranks of the observations in the combined sample. It does not rely on any properties of the distributions.

The null hypothesis is that the samples were drawn from the same distribution. Location is the dominant factor in the comparison, but if the two distributions have different shapes, this may also affect the result. The alternative hypothesis for the two-tailed test (the default) is that the two samples were drawn from different distributions.

For the one-tailed test, the alternative hypothesis is roughly that the population from which the first sample was drawn has a median that is either less than (lower tailed) or greater than (upper tailed) the median of the population from which the second sample was drawn.

The Mann-Whitney test is implemented by the MannWhitneyTest class. It has three constructors. The first constructor takes no arguments. In this case, all test parameters must be provided by setting the properties of the MannWhitneyTest object, and the samples the test is to be applied to must be specified by setting the Sample1 and Sample2 properties. The second constructor takes two VectorT objects that represent the samples the test is to be applied to.

The third constructor also takes two arguments. The first is once again a VectorT, but this time it contains the observations from the two samples combined. The second argument must be a CategoricalVectorT with two levels that specifies which sample each observation in the first argument belongs to.
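The combined layout accepted by the third constructor can be illustrated without the library. The following Python sketch is only an illustration of the data layout, not the library's implementation; the helper name split_by_group and the labels "M" and "F" are hypothetical. It splits a combined list of observations into two samples using a parallel list of two-level group labels.

```python
from collections import defaultdict


def split_by_group(values, groups):
    """Split combined observations into one list per group label.

    Hypothetical helper: it mimics the 'combined vector plus grouping
    variable' layout described above, outside of the .NET library.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    samples = defaultdict(list)
    for value, label in zip(values, groups):
        samples[label].append(value)
    return dict(samples)


# Onset ages from the diabetes example in this section, in combined form.
ages = [19, 22, 16, 29, 24, 20, 11, 17, 12]
sex = ["M", "M", "M", "M", "M", "F", "F", "F", "F"]

samples = split_by_group(ages, sex)
print(samples["M"])  # observations labelled "M"
print(samples["F"])  # observations labelled "F"
```

Both layouts carry the same information; the combined form is convenient when the data already sits in a table with one column of values and one column of group labels.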

The distribution of the Mann-Whitney statistic U can be computed exactly. For larger samples, an approximation in terms of the normal distribution can be used. If the observations don't have distinct ranks (i.e. there are ties), then the exact calculations are not available. The exact test is used by default if there are no ties, and if the product of the two sample sizes is less than or equal to 1600. You can override the default behaviour by setting the Exactness property. Note that the 'exact' calculation gives incorrect results if ties are present.
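The default rule described above, together with the usual normal approximation for U, can be sketched in a few lines of Python. This is a library-independent sketch under stated assumptions: the function names are hypothetical, and the approximation shown uses the textbook mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12 without tie or continuity corrections, which may differ from what the library applies.

```python
import math


def has_ties(sample1, sample2):
    """True if any value occurs more than once in the combined sample."""
    combined = list(sample1) + list(sample2)
    return len(set(combined)) != len(combined)


def default_is_exact(sample1, sample2, max_product=1600):
    """Rule as described in the text: exact only when there are no ties
    and the product of the sample sizes is at most 1600."""
    return (not has_ties(sample1, sample2)
            and len(sample1) * len(sample2) <= max_product)


def normal_approx_p_value(u, n1, n2):
    """Two-tailed p-value for U from the normal approximation.

    Assumption: no tie correction and no continuity correction.
    """
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


print(default_is_exact([19, 22, 16, 29, 24], [20, 11, 17, 12]))
print(default_is_exact([1, 2], [2, 3]))  # the value 2 is tied
```

For very small samples the normal approximation can differ noticeably from the exact p-value, which is one reason to prefer the exact calculation when it is available.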

Example

In this example, we examine whether the onset of type II diabetes is different for males and females. The onset in our sample is as follows:

  • Males: 19 22 16 29 24

  • Females: 20 11 17 12

The code below performs the exact Mann-Whitney test on these samples:

C#:

var males = Vector.Create(19, 22, 16, 29, 24);
var females = Vector.Create(20, 11, 17, 12);
var mwTest = new MannWhitneyTest<int>(males, females);
Console.WriteLine("Test statistic: {0:F4}", mwTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", mwTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    mwTest.Reject() ? "yes" : "no");

VB:

Dim males = Vector.Create(19, 22, 16, 29, 24)
Dim females = Vector.Create(20, 11, 17, 12)
Dim mwTest = New MannWhitneyTest(Of Integer)(males, females)
Console.WriteLine("Test statistic: {0:F4}", mwTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", mwTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}",
    IIf(mwTest.Reject(), "yes", "no"))

F#:

let males = Vector.Create(19, 22, 16, 29, 24)
let females = Vector.Create(20, 11, 17, 12)
let mwTest = MannWhitneyTest<int>(males, females)
printfn "Test statistic: %.4f" mwTest.Statistic
printfn "P-value:        %.4f" mwTest.PValue
printfn "Reject null hypothesis? %s"
    (if mwTest.Reject() then "yes" else "no")

The value of the U-statistic is 3, giving a two-tailed p-value of 0.1111. As a result, the hypothesis that on average, type II diabetes starts around the same age in males and females is not rejected at the 0.05 significance level.
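The reported values can be cross-checked by hand. The Python sketch below is independent of the library and is not its implementation; the function names are hypothetical. It computes U from the rank sums and obtains the exact two-tailed p-value by enumerating every possible assignment of ranks to the first sample. It assumes there are no ties and is only practical for small samples.

```python
from itertools import combinations


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) for two samples, assuming no tied values."""
    combined = sorted(list(sample1) + list(sample2))
    rank = {value: i + 1 for i, value in enumerate(combined)}
    n1, n2 = len(sample1), len(sample2)
    rank_sum1 = sum(rank[v] for v in sample1)
    u1 = rank_sum1 - n1 * (n1 + 1) // 2
    return u1, n1 * n2 - u1


def exact_two_tailed_p(sample1, sample2):
    """Exact two-tailed p-value by brute-force enumeration of rank sets."""
    n1, n2 = len(sample1), len(sample2)
    u_obs = min(mann_whitney_u(sample1, sample2))
    total = 0
    extreme = 0
    # Under the null hypothesis every choice of n1 ranks is equally likely.
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u1 = sum(ranks) - n1 * (n1 + 1) // 2
        total += 1
        if min(u1, n1 * n2 - u1) <= u_obs:
            extreme += 1
    return u_obs, extreme / total


males = [19, 22, 16, 29, 24]
females = [20, 11, 17, 12]
u, p = exact_two_tailed_p(males, females)
print(f"U = {u}, p = {p:.4f}")  # prints "U = 3, p = 0.1111"
```

Of the 126 equally likely rank assignments, 14 give a U at least as extreme as the observed value, so the p-value is 14/126.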

The Kruskal-Wallis Test

The Kruskal-Wallis test is a generalization of the Mann-Whitney test to more than two samples. It tests the hypothesis that the supplied samples were all drawn from the same distribution. The test relies only on the relative ranks of the observations in the combined sample. It does not rely on any properties of the distributions.

The null hypothesis is that the samples were drawn from the same distribution. Location is the dominant factor in the comparison, but if the distributions have different shapes, this may also affect the result. The alternative hypothesis is that the samples were not all drawn from the same distribution.

The Kruskal-Wallis test is implemented by the KruskalWallisTest class. It has three constructors. The first constructor takes no arguments. All test parameters must be provided by setting the properties of the KruskalWallisTest object. The samples the test is to be applied to must be specified through the Samples property.

The second constructor takes an array of VectorT objects that represent the samples the test is to be applied to. The third constructor takes two arguments. The first is a VectorT that contains the observations from all the samples combined. The second argument must be a CategoricalVectorT that specifies which sample each observation in the first argument belongs to.

Although the distribution of the Kruskal-Wallis statistic can be computed exactly, doing so is impractical except for very small samples. We use an approximation in terms of the beta distribution, which is more accurate than the more commonly used approximation in terms of a chi-square distribution.

Example

In this example, we examine whether different teaching methods have an effect on a standardized exam score. The exam scores for students that were taught using each of the three methods were as follows:

  • Method 1: 94, 87, 90, 74, 86, 97

  • Method 2: 82, 85, 79, 84, 61, 72, 80

  • Method 3: 89, 68, 72, 76, 69, 65

The code below performs the Kruskal-Wallis test on these samples:

C#:

var method1 = Vector.Create<double>(94, 87, 90, 74, 86, 97);
var method2 = Vector.Create<double>(82, 85, 79, 84, 61, 72, 80);
var method3 = Vector.Create<double>(89, 68, 72, 76, 69, 65);
var kwTest = new KruskalWallisTest(method1, method2, method3);
Console.WriteLine("Test statistic: {0:F4}", kwTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", kwTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    kwTest.Reject() ? "yes" : "no");

VB:

Dim method1 = Vector.Create(Of Double)(94, 87, 90, 74, 86, 97)
Dim method2 = Vector.Create(Of Double)(82, 85, 79, 84, 61, 72, 80)
Dim method3 = Vector.Create(Of Double)(89, 68, 72, 76, 69, 65)
Dim kwTest = New KruskalWallisTest(method1, method2, method3)
Console.WriteLine("Test statistic: {0:F4}", kwTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", kwTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}",
    IIf(kwTest.Reject(), "yes", "no"))

F#:

let method1 = Vector.Create(94., 87., 90., 74., 86., 97.)
let method2 = Vector.Create(82., 85., 79., 84., 61., 72., 80.)
let method3 = Vector.Create(89., 68., 72., 76., 69., 65.)
let kwTest = KruskalWallisTest(method1, method2, method3)
printfn "Test statistic: %.4f" kwTest.Statistic
printfn "P-value:        %.4f" kwTest.PValue
printfn "Reject null hypothesis? %s"
    (if kwTest.Reject() then "yes" else "no")

The value of the Kruskal-Wallis statistic is 7.5023, giving a p-value of 0.0235. As a result, the hypothesis that on average, the teaching method has no effect on exam scores is rejected at the 0.05 significance level.
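The statistic can be reproduced with a short, library-independent Python sketch; the function names are hypothetical and this is not the library's implementation. It assigns midranks to tied values, applies the standard tie correction, and, as a simplifying assumption, uses the common chi-square approximation rather than the beta approximation mentioned earlier. With three groups there are 2 degrees of freedom, for which the chi-square upper tail has the closed form exp(-H/2).

```python
import math


def midranks(values):
    """Map each distinct value to its average rank in the sorted list."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; store their mean.
        ranks[ordered[i]] = (i + 1 + j) / 2.0
        i = j
    return ranks


def kruskal_wallis_h(samples):
    """Tie-corrected Kruskal-Wallis H statistic."""
    combined = [v for sample in samples for v in sample]
    n = len(combined)
    rank = midranks(combined)
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank[v] for v in sample) ** 2 / len(sample) for sample in samples
    ) - 3.0 * (n + 1)
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


method1 = [94, 87, 90, 74, 86, 97]
method2 = [82, 85, 79, 84, 61, 72, 80]
method3 = [89, 68, 72, 76, 69, 65]

h = kruskal_wallis_h([method1, method2, method3])
p_chi2 = math.exp(-h / 2.0)  # chi-square upper tail, valid for 2 d.f. only
print(f"H = {h:.4f}, chi-square p = {p_chi2:.4f}")
```

The only tie in this data is the score 72, which appears in methods 2 and 3, so the tie correction changes the statistic only slightly.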

The Runs Test

The runs test (also called the Wald–Wolfowitz test) is a test of randomness. It compares the number of runs of the same value in a sample to what would be expected in a random sample. For numerical data, it uses the runs of values that are above or below a cut point. The test relies only on the sequence of runs of the same value. It does not rely on any properties of the distributions.

The null hypothesis is that the observations were drawn randomly. The alternative hypothesis is that the observations were not drawn randomly. For a one-tailed test, the alternative hypothesis is that the observations tend to occur in groups (lower tailed) or that the observations tend to alternate (upper tailed).
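The idea behind the test can be sketched in Python without the library. This is a textbook version of the Wald–Wolfowitz statistic under stated assumptions, not the library's implementation: the function name is hypothetical, values equal to the cut point are dropped, the p-value comes from the normal approximation, and the two short sequences are made-up illustrations rather than real data.

```python
import math


def runs_test(values, cut_point):
    """Return (number of runs, z statistic, two-tailed p-value).

    Assumptions: values equal to the cut point are ignored, and the
    normal approximation is used for the p-value.
    """
    above = [v > cut_point for v in values if v != cut_point]
    n_above = sum(above)
    n_below = len(above) - n_above
    if n_above == 0 or n_below == 0:
        raise ValueError("need values on both sides of the cut point")
    # A new run starts wherever consecutive observations switch sides.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n_above + n_below
    product = 2.0 * n_above * n_below
    mean = product / n + 1.0
    variance = product * (product - n) / (n * n * (n - 1.0))
    if variance == 0.0:
        raise ValueError("sample too small for the normal approximation")
    z = (runs - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p


# Hypothetical sequences, for illustration only.
alternating = [1, 2, 1, 2, 1, 2, 1, 2]
grouped = [1, 1, 1, 1, 2, 2, 2, 2]
print(runs_test(alternating, 1.5))  # many runs, positive z
print(runs_test(grouped, 1.5))      # few runs, negative z
```

The sign of the statistic matches the tails described above: fewer runs than expected (grouping) gives a negative value, and more runs than expected (alternation) gives a positive value.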

The runs test is implemented by the RunsTest class. It has three constructors. The first constructor takes no arguments. All test parameters must be provided by setting the properties of the RunsTest object. The samples the test is to be applied to must be specified by setting the Sample1 and Sample2 properties. The second constructor takes two vector objects that represent the samples the test is to be applied to.

The third constructor also takes two arguments. The first is once again a VectorT that contains the observations from the two samples combined. The second argument must be a CategoricalVectorT with two levels that specifies which sample the observations in the first variable belong to.

Example

In this example, we investigate a series of 23 estimates for the density of the Earth by Henry Cavendish. We want to know if there is a correlation between successive observations, which would mean not all measurements were made independently.

The value of the test statistic is -1.7477, corresponding to a p-value of 0.08051. As a result, the hypothesis that the measurements are uncorrelated is not rejected at the 0.05 significance level.

Copyright (c) 2004-2016 ExoAnalytics Inc.

Copyright © 2004-2018, Extreme Optimization. All rights reserved.
Extreme Optimization, Complexity made simple, M#, and M Sharp are trademarks of ExoAnalytics Inc.
Microsoft, Visual C#, Visual Basic, Visual Studio, Visual Studio.NET, and the Optimized for Visual Studio logo
are registered trademarks of Microsoft Corporation.