Non-Parametric Tests in IronPython QuickStart Sample

Illustrates how to perform non-parametric tests like the Wilcoxon-Mann-Whitney test and the Kruskal-Wallis test in IronPython.

```Python
import numerics

from System import Array

from Extreme.Mathematics import *
from Extreme.Statistics import *
from Extreme.Statistics.Tests import *

# Demonstrates how to use non-parametric hypothesis tests 
# like the Mann-Whitney (Wilcoxon) rank sum test and the
# Kruskal-Wallis test.

#
# Mann-Whitney test
#

print "Mann-Whitney Test"

# The Mann-Whitney test compares two samples to see if they were
# drawn from the same distribution.

# We use an example from McDonald et al. (1996), who compared
# the geographic variation in oyster DNA to the variation in
# proteins. A significant difference in the samples would suggest
# that natural selection played a role in the oyster diversification.

# There are two ways to create a test with multiple samples.
            
# The first is to put all the data in one variable,
# and use a second variable to group the data in the first.
print "\nUsing grouping variable:"

values = NumericalVariable(Array[float]([ \
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003, \
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163, \
    0.004, 0.049, 0.006, 0.058, -0.002, 0.015, \
    0.044, 0.024 ]))

DNA = 1
Protein = 2

groups = CategoricalVariable([ \
    DNA, DNA, DNA, DNA, DNA, DNA, \
    Protein, Protein, Protein, Protein, Protein, Protein, \
    Protein, Protein, Protein, Protein, Protein, Protein, \
    Protein, Protein ])

# With this data, we can create the test:
mw = MannWhitneyTest(values, groups)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value:        {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:F2}".format(mw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

# We can get the outcome for the 0.01 significance level by explicitly
# passing the significance level as a parameter to the Reject method:
print "Significance level:     {0:F2}".format(0.01)
print "Reject null hypothesis?", "yes" if mw.Reject(0.01) else "no"


# The second way is to put each sample in its own variable:
print "\nUsing multiple variables:"

dnaValues = NumericalVariable(Array[float]([ \
    -0.005, 0.116,-0.006, 0.095, 0.053, 0.003 ]))
proteinValues = NumericalVariable(Array[float]([ \
    -0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004, \
    0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024 ]))

# With this data, we can create the test:
mw = MannWhitneyTest(dnaValues, proteinValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(mw.Statistic)
print "P-value:        {0:.4f}".format(mw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:F2}".format(mw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if mw.Reject() else "no"

#
# Kruskal-Wallis test
#

print "\nKruskal-Wallis Test\n"

# The Kruskal-Wallis test is a generalization of the Mann-Whitney test
# to more than 2 groups.

# The following example was taken from the NIST Engineering Statistics Handbook
# at http://www.itl.nist.gov/div898/handbook/prc/section4/prc41.htm
            
# The data represents percentage quarterly growth 
# in 4 investment funds:
aValues = NumericalVariable(Array[float]([ 4.2, 4.6, 3.9, 4.0 ]))
bValues = NumericalVariable(Array[float]([ 3.3, 2.4, 2.6, 3.8, 2.8 ]))
cValues = NumericalVariable(Array[float]([ 1.9, 2.4, 2.1, 2.7, 1.8 ]))
dValues = NumericalVariable(Array[float]([ 3.5, 3.1, 3.7, 4.1, 4.4 ]))

# We simply pass these variables to the constructor:
kw = KruskalWallisTest(aValues, bValues, cValues, dValues)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(kw.Statistic)
print "P-value:        {0:.4f}".format(kw.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:F2}".format(kw.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if kw.Reject() else "no"

#
# Runs test
#

print "\nRuns Test\n"

# The runs test is a test of randomness.

# It compares the lengths of runs of the same value
# in a sample to what would be expected.

# For numerical data, it uses the runs of successively
# increasing or decreasing values instead.

Male = 1
Female = 2

genders = CategoricalVariable([ \
    Male, Male, Male, Female, Female, Female, \
    Male, Male, Male, Male, Female, Female, \
    Male, Male, Male, Female, Female, Female, \
    Female, Female, Female, Female, Male, Male, \
    Female, Male, Male, Female, Female, Female, \
    Female ])

rt = RunsTest(genders)

# We can obtain the value of the test statistic through the Statistic property,
# and the corresponding P-value through the PValue property:
print "Test statistic: {0:.4f}".format(rt.Statistic)
print "P-value:        {0:.4f}".format(rt.PValue)

# The significance level is the default value of 0.05:
print "Significance level:     {0:F2}".format(rt.SignificanceLevel)
# We can now print the outcome of the test:
print "Reject null hypothesis?", "yes" if rt.Reject() else "no"

```
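Both the Mann-Whitney and the Kruskal-Wallis test work on the ranks of the pooled observations rather than on the values themselves. The following is a minimal sketch in plain Python, independent of the library, of how the rank sums and the textbook statistics can be computed by hand for the data used above. The helper names (`average_ranks`, `rank_sums`, `mann_whitney_u`, `kruskal_wallis_h`) are hypothetical and not part of any library. The sketch applies no tie correction, and the library may report a different but equivalent form of the statistic (for example the rank sum or the other of the two U values), so its output need not match these numbers exactly.

```python
from __future__ import division, print_function

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block [start, end] over all values equal to the first.
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def rank_sums(samples):
    """Rank the pooled observations and return the rank sum of each sample."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)
    sums, pos = [], 0
    for sample in samples:
        sums.append(sum(ranks[pos:pos + len(sample)]))
        pos += len(sample)
    return sums

def mann_whitney_u(x, y):
    """Return the pair (U_x, U_y); the two values always add up to len(x)*len(y)."""
    r_x, _ = rank_sums([x, y])
    u_x = r_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x

def kruskal_wallis_h(samples):
    """Return the Kruskal-Wallis H statistic without tie correction."""
    n = sum(len(sample) for sample in samples)
    total = sum(r * r / len(s) for r, s in zip(rank_sums(samples), samples))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Same data as in the sample above.
dna = [-0.005, 0.116, -0.006, 0.095, 0.053, 0.003]
protein = [-0.005, 0.016, 0.041, 0.016, 0.066, 0.163, 0.004,
           0.049, 0.006, 0.058, -0.002, 0.015, 0.044, 0.024]
funds = [[4.2, 4.6, 3.9, 4.0],
         [3.3, 2.4, 2.6, 3.8, 2.8],
         [1.9, 2.4, 2.1, 2.7, 1.8],
         [3.5, 3.1, 3.7, 4.1, 4.4]]

print("Mann-Whitney U (DNA, protein):", mann_whitney_u(dna, protein))
print("Kruskal-Wallis rank sums:     ", rank_sums(funds))
print("Kruskal-Wallis H:              {0:.4f}".format(kruskal_wallis_h(funds)))
```

Sharing one ranking helper between the two statistics reflects the relationship stated in the sample: the Kruskal-Wallis test generalizes the Mann-Whitney test to more than two groups, and both start from the same pooled ranking.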