Extreme Optimization User's Guide

Testing Means

There are two common tests of the hypothesis that a sample comes from a population with a specified mean. The one sample z test is used when the standard deviation or variance of the population is known. The one sample t test is used when the population variance is not known. The t test also has a two sample version, which tests whether the difference between the means of the populations underlying two samples equals a given value.

The One Sample z Test

The one sample z test is used to test the hypothesis that a sample comes from a population with a specified mean when the variance or standard deviation is known. The test is based on the assumption that the sample is randomly selected from the population, and that the population itself follows a normal distribution. If either of these assumptions is violated, the reliability of the z test may be compromised.

The null hypothesis is always that the population underlying the sample has a mean equal to the proposed mean. The alternative hypothesis depends on whether a one-tailed or a two-tailed test is performed.

For the one-tailed test, the alternative hypothesis is that the population from which the sample was drawn has a mean that is either less than (lower tailed) or greater than (upper tailed) the proposed mean. For the two-tailed version, the alternative hypothesis is that the mean of the population does not equal the proposed mean.
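To illustrate how the choice of alternative changes the p-value, here is a minimal sketch in plain .NET (not the library) that converts a standard normal test statistic into lower-tailed, upper-tailed and two-tailed p-values. The helper names are hypothetical. Because System.Math has no error function, the normal distribution function uses the Abramowitz and Stegun 7.1.26 approximation to erf.

```csharp
using System;

// Standard normal CDF, using the Abramowitz-Stegun 7.1.26 approximation of erf
// (absolute error below about 1.5e-7). System.Math has no Erf method.
static double NormalCdf(double z)
{
    double x = Math.Abs(z) / Math.Sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
    double erf = 1.0 - poly * Math.Exp(-x * x);   // erf(|z| / sqrt(2))
    return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
}

// Hypothetical helpers: p-value of a standard normal statistic z
// for each alternative hypothesis.
static double LowerTailedP(double z) => NormalCdf(z);                // H1: mean < proposed mean
static double UpperTailedP(double z) => 1.0 - NormalCdf(z);          // H1: mean > proposed mean
static double TwoTailedP(double z) => 2.0 * NormalCdf(-Math.Abs(z)); // H1: mean != proposed mean

double zStat = 1.96;
Console.WriteLine($"Lower-tailed p: {LowerTailedP(zStat):F4}");
Console.WriteLine($"Upper-tailed p: {UpperTailedP(zStat):F4}");
Console.WriteLine($"Two-tailed p:   {TwoTailedP(zStat):F4}");
```

For z = 1.96, the two-tailed p-value is about 0.05, twice the upper-tailed value, because both tails count as evidence against the null hypothesis.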

The one sample z test is implemented by the OneSampleZTest class. It has four constructors in all, which can be grouped into two pairs.

The first pair of constructors takes 4 or 5 arguments. The first two arguments are the sample mean and the sample size. The next two arguments are the population mean and the population standard deviation. If present, the fifth argument is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.
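For reference, the one sample z statistic is the standard textbook formula below. Here x̄ is the sample mean, n is the sample size, μ0 is the proposed population mean and σ is the known population standard deviation. Because the statistic needs only these four values, the test can be constructed without the raw data. Under the null hypothesis, z follows a standard normal distribution.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```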

The second pair of constructors takes 3 or 4 arguments. The first argument is a NumericalVariable that contains the sample data. The next two arguments are once again the population mean and standard deviation. The fourth argument, if present, is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

Example

The test scores of a class on a national test are as follows:

62, 77, 61, 94, 75, 82, 86, 83, 64, 84, 68, 82, 72, 71, 85, 66, 61, 79, 81, 73

We want to investigate whether the mean score of this class is significantly different from the national average of 79.3. The population standard deviation is known to be 7.3. The following code performs the test:

C#
// Scores of the class on the national test.
double[] group1Data = new double[]
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
     68, 82, 72, 71, 85, 66, 61, 79, 81, 73};
NumericalVariable group1Results = new NumericalVariable("Class 1", group1Data);
// Proposed mean 79.3 and known population standard deviation 7.3;
// the test is two-tailed by default.
OneSampleZTest zTest = new OneSampleZTest(group1Results, 79.3, 7.3);
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", zTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    zTest.Reject() ? "yes" : "no");
Visual Basic
Dim group1Data As Double() = New Double() _
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
    68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
Dim group1Results As NumericalVariable = _
    New NumericalVariable("Class 1", group1Data)
Dim zTest As OneSampleZTest = New OneSampleZTest(group1Results, 79.3, 7.3)
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", zTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}", _
    IIf(zTest.Reject(), "yes", "no"))

The value of the z-statistic turns out to be -2.4505, giving a p-value of 0.0143. As a result, the hypothesis that, on average, the students in this class score no differently from the national average is rejected at the 0.05 level.
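As a cross-check, this minimal sketch recomputes the z statistic directly from the scores using only System.Linq, without the library. It assumes the textbook formula z = (x̄ - μ0) / (σ / √n), and the variable names are illustrative only.

```csharp
using System;
using System.Linq;

// Scores of the class from the example above.
double[] scores = { 62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
                    68, 82, 72, 71, 85, 66, 61, 79, 81, 73 };
double populationMean = 79.3;   // national average (proposed mean)
double populationSigma = 7.3;   // known population standard deviation

int n = scores.Length;
double sampleMean = scores.Average();
// Standard error of the mean when sigma is known: sigma / sqrt(n).
double standardError = populationSigma / Math.Sqrt(n);
double z = (sampleMean - populationMean) / standardError;

Console.WriteLine($"n = {n}, mean = {sampleMean:F2}, z = {z:F4}");
```

It prints a mean of 75.30 and z = -2.4505, which agrees with the library's result.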

Using pre-calculated values for the mean and sample size, the above example would look like this:

C#
// Summary statistics computed beforehand from the class scores.
double mean = 75.3;
int sampleSize = 20;
OneSampleZTest zTest = new OneSampleZTest(mean, sampleSize, 79.3, 7.3);
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", zTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    zTest.Reject() ? "yes" : "no");
Visual Basic
Dim mean As Double = 75.3
Dim sampleSize As Integer = 20
Dim zTest As OneSampleZTest = New OneSampleZTest(mean, sampleSize, 79.3, 7.3)
Console.WriteLine("Test statistic: {0:F4}", zTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", zTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}", _
    IIf(zTest.Reject(), "yes", "no"))

Once a OneSampleZTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain a 95% confidence interval around the mean, the code would be:

C#
Interval meanInterval = zTest.GetConfidenceInterval();
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}", 
    meanInterval.LowerBound, meanInterval.UpperBound);
Visual Basic
Dim meanInterval As Interval = zTest.GetConfidenceInterval()
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}", _
    meanInterval.LowerBound, meanInterval.UpperBound)

The 95% confidence interval for the mean runs from 72.1 to 78.5.
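These bounds agree with the usual two-sided interval for a mean when the standard deviation is known. In this interval, z_0.975 ≈ 1.96 is the 97.5th percentile of the standard normal distribution. We assume the library uses this textbook form.

```latex
\bar{x} \pm z_{0.975}\,\frac{\sigma}{\sqrt{n}}
  = 75.3 \pm 1.96 \times \frac{7.3}{\sqrt{20}}
  \approx 75.3 \pm 3.20
  \quad\Rightarrow\quad [72.1,\ 78.5]
```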

The One Sample t Test

The one sample t test is used to test the hypothesis that a sample comes from a population with a specified mean when the variance or standard deviation is not known. The test is based on the assumption that the sample is randomly selected from the population, and that the population itself follows a normal distribution. If either of these assumptions is violated, the reliability of the t test may be compromised.

The null hypothesis is always that the population underlying the sample has a mean equal to the proposed mean. The alternative hypothesis depends on whether a one-tailed or a two-tailed test is performed.

For the one-tailed test, the alternative hypothesis is that the population from which the sample was drawn has a mean that is either less than (lower tailed) or greater than (upper tailed) the proposed mean. For the two-tailed version, the alternative hypothesis is that the mean of the population does not equal the proposed mean.

The one sample t test is implemented by the OneSampleTTest class. It has five constructors in all. The first constructor takes no arguments. When it is used, the source data must be specified by setting properties of the object.

The remaining four can be grouped into two pairs. The first pair of constructors takes 3 or 4 arguments. The first two arguments are the sample mean and the sample size. The next argument is the population mean. If present, the fourth argument is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.

The second pair of constructors takes 2 or 3 arguments. The first argument is a NumericalVariable that contains the sample data. The next argument is once again the population mean. The third argument, if present, is a HypothesisType value that specifies whether the test is one-tailed or two-tailed. The default value is HypothesisType.TwoTailed.
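For reference, here is the standard textbook form of the one sample t statistic. We assume the library follows it. The statistic has the same form as the z statistic, except that the unknown population standard deviation σ is replaced by the sample standard deviation s. As a result, it depends on the spread of the data as well as on the sample mean and size. Under the null hypothesis, t follows a Student t distribution with n - 1 degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```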

Example

We use the same data as in the earlier example for the one sample z test, but this time we assume the standard deviation of the population is not known.

C#
double[] group1Data = new double[]
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
     68, 82, 72, 71, 85, 66, 61, 79, 81, 73};
NumericalVariable group1Results = new NumericalVariable("Class 1", group1Data);
// Only the proposed mean is given; the standard deviation
// is estimated from the sample.
OneSampleTTest tTest = new OneSampleTTest(group1Results, 79.3);
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", tTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    tTest.Reject() ? "yes" : "no");
Visual Basic
Dim group1Data As Double() = New Double() _
    {62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
    68, 82, 72, 71, 85, 66, 61, 79, 81, 73}
Dim group1Results As NumericalVariable = _
    New NumericalVariable("Class 1", group1Data)
Dim tTest As OneSampleTTest = New OneSampleTTest(group1Results, 79.3)
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", tTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}", _
    IIf(tTest.Reject(), "yes", "no"))

The value of the t-statistic is -1.8800, giving a p-value of 0.0755. As a result, the hypothesis that, on average, the students in this class score no differently from the national average is not rejected at the 0.05 level.
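Here is a minimal sketch, in plain .NET without the library, of how this statistic follows from the raw scores. It assumes the textbook t statistic with the sample standard deviation computed using an n - 1 denominator. The variable names are illustrative.

```csharp
using System;
using System.Linq;

double[] scores = { 62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
                    68, 82, 72, 71, 85, 66, 61, 79, 81, 73 };
double hypothesizedMean = 79.3;

int n = scores.Length;
double mean = scores.Average();
// Sample standard deviation with the n - 1 (Bessel) denominator.
double sumSq = scores.Sum(x => (x - mean) * (x - mean));
double s = Math.Sqrt(sumSq / (n - 1));
// The t statistic uses the estimated s in place of the unknown sigma.
double t = (mean - hypothesizedMean) / (s / Math.Sqrt(n));
int degreesOfFreedom = n - 1;

Console.WriteLine($"s = {s:F4}, t = {t:F4}, df = {degreesOfFreedom}");
```

It reproduces t = -1.8800 with 19 degrees of freedom. Computing the p-value would additionally require the t distribution function, which .NET does not provide.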

The one sample t test can also be constructed from summary statistics rather than from the raw data. The corresponding code for the above example would look like this:

C#
// Summary statistics computed beforehand from the class scores.
double mean = 75.3;
int sampleSize = 20;
OneSampleTTest tTest = new OneSampleTTest(mean, sampleSize, 79.3);
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic);
Console.WriteLine("P-value:        {0:F4}", tTest.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    tTest.Reject() ? "yes" : "no");
Visual Basic
Dim mean As Double = 75.3
Dim sampleSize As Integer = 20
Dim tTest As OneSampleTTest = New OneSampleTTest(mean, sampleSize, 79.3)
Console.WriteLine("Test statistic: {0:F4}", tTest.Statistic)
Console.WriteLine("P-value:        {0:F4}", tTest.PValue)
Console.WriteLine("Reject null hypothesis? {0}", _
    IIf(tTest.Reject(), "yes", "no"))

Once a OneSampleTTest object has been created, you can access other properties and methods common to all hypothesis test classes. For instance, to obtain a 95% confidence interval around the mean, the code would be:

C#
Interval meanInterval = tTest.GetConfidenceInterval();
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}",
    meanInterval.LowerBound, meanInterval.UpperBound);
Visual Basic
Dim meanInterval As Interval = tTest.GetConfidenceInterval()
Console.WriteLine("95% Confidence interval for the mean: {0:F1} - {1:F1}", _
    meanInterval.LowerBound, meanInterval.UpperBound)

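Note that this interval, from 70.8 to 79.8, is wider than the one obtained with the one sample z test. Because the population standard deviation is not known, it must be estimated from the sample. The t distribution's heavier tails reflect the extra uncertainty this introduces. In this example, the sample standard deviation (about 9.5) is also larger than the value of 7.3 assumed for the z test, and this widens the interval further.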
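The bounds agree with the textbook interval x̄ ± t·s/√n. In this interval, t_0.975,19 ≈ 2.093 is the 97.5th percentile of the t distribution with 19 degrees of freedom, and s ≈ 9.515 is the sample standard deviation of the scores. We assume the library uses this form.

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  = 75.3 \pm 2.093 \times \frac{9.515}{\sqrt{20}}
  \approx 75.3 \pm 4.45
  \quad\Rightarrow\quad [70.8,\ 79.8]
```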

The Two Sample t Test

The two sample t test is used to test the hypothesis that two samples are drawn from populations with the same mean. The test is based on the assumption that the samples are randomly selected from the populations, and that the populations themselves follow a normal distribution. The classic form of the test adds a third assumption: the variances of the populations underlying the two samples are equal. If any of these assumptions is violated, the reliability of the t test may be compromised.

The null hypothesis is always that the difference between the means of the populations from which the samples were taken is equal to a specific value, which may be zero. The alternative hypothesis depends on whether a one-tailed or a two-tailed test is performed.

For the one-tailed test, the alternative hypothesis is that the difference between the means is less than (lower tailed) or greater than (upper tailed) the proposed value. For the two-tailed version, the alternative hypothesis is that the difference between the means of the two populations does not equal the proposed value.

There is a further distinction between paired and unpaired tests. In an unpaired test, the two samples are independent of each other: they are measurements taken on two separate groups of subjects. In a paired test, the two samples are two measurements on each subject from a single group, so every value in one sample is matched with a value in the other. For example, comparing the test scores of two different groups requires an unpaired test, while comparing the scores of a single group on two different tests requires a paired test.

Similarly, heart rates measured in two independent groups of subjects form independent samples, so the mean heart rates of the two groups should be compared using the unpaired test. Two sets of heart rate measurements on the same subjects, taken before and after some physical activity, are dependent, and their means should be compared using the paired test.
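A paired test reduces to a one sample t test on the differences. The standard textbook formula is shown below. We assume, but have not confirmed, that TwoSampleTTest computes the paired statistic this way. Here x_i and y_i are the two measurements on subject i, n is the number of subjects and Δ0 is the proposed difference between the means.

```latex
d_i = x_i - y_i, \qquad
t = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}, \qquad
s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}
```

The statistic has n - 1 degrees of freedom. Because each subject serves as its own control, differences between subjects cancel out of the d_i. This is why the paired test can detect smaller effects than the unpaired test on the same measurements.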

For unpaired tests, another distinction is whether the variances of the two populations are assumed to be equal. Assuming equal variances leads to simpler formulas.
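For reference, the two textbook forms of the unpaired statistic are shown below, for independent samples of sizes n1 and n2. Δ0 is the proposed difference μ1 - μ2. We have not confirmed that TwoSampleTTest uses exactly these formulas. When equal variances are assumed, the two sample variances are pooled and the statistic has n1 + n2 - 2 degrees of freedom. Without that assumption, the Welch form uses each variance separately, and its degrees of freedom ν are approximated by the Welch-Satterthwaite formula.

```latex
% Equal variances (pooled)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{s_p \sqrt{1/n_1 + 1/n_2}}

% Unequal variances (Welch)
t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad
\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
  {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```

When the two samples have the same size, the two forms give the same value of t and differ only in their degrees of freedom.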

The two sample t test is implemented by the TwoSampleTTest class. There are five constructors in all, reflecting the different variations of the test.

The first constructor takes no arguments. All test parameters must be provided by setting the properties of the TwoSampleTTest object.

Each of the remaining four constructors takes as its first two arguments NumericalVariable objects that represent the samples the test is to be applied to. The simplest of these takes only these two arguments. It creates an unpaired test for equality of the means, with the variances estimated from the sample data. The next constructor takes a third parameter that specifies the proposed difference between the two means. This value is positive if the mean of the first sample is greater than the mean of the second sample. If it is omitted, the difference is taken to be zero.

The last two constructors are similar to these two, but take two additional parameters. The first additional parameter is a SamplePairing value that specifies whether the test is paired or unpaired. A value of SamplePairing.Paired produces a paired test. A value of SamplePairing.Unpaired produces an unpaired test.

The second additional parameter is a VarianceAssumption value, which is only meaningful for unpaired tests. A value of VarianceAssumption.AssumeEqual indicates that the variances of the two populations should be assumed to be equal, which results in somewhat simpler calculations.

Example of an unpaired test

Once again, we use the same data as before. However, this time we compare the results of one group of students to the results of a second group of students, with these test scores:

61, 80, 98, 90, 94, 65, 79, 75, 74, 86, 76, 85, 78, 72, 76, 79, 65, 92, 76, 80

The code below performs the unpaired two-sample t-test:

C#
// Scores of the second class. group1Results is the NumericalVariable
// for the first class, created as in the earlier examples.
double[] group2Data = new double[]
    {61, 80, 98, 90, 94, 65, 79, 75, 74, 86,
     76, 85, 78, 72, 76, 79, 65, 92, 76, 80};
NumericalVariable group2Results = new NumericalVariable("Class 2", group2Data);
// Unpaired test; VarianceAssumption.None does not assume equal variances.
TwoSampleTTest tTest2 = new TwoSampleTTest(group1Results, group2Results,
    SamplePairing.Unpaired, VarianceAssumption.None);
Console.WriteLine("Test statistic: {0:F4}", tTest2.Statistic);
Console.WriteLine("P-value:        {0:F4}", tTest2.PValue);
Console.WriteLine("Reject null hypothesis? {0}",
    tTest2.Reject() ? "yes" : "no");
Visual Basic
Dim group2Data As Double() = New Double() _
    {61, 80, 98, 90, 94, 65, 79, 75, 74, 86, _
        76, 85, 78, 72, 76, 79, 65, 92, 76, 80}
Dim group2Results As NumericalVariable = _
    New NumericalVariable("Class 2", group2Data)
Dim tTest2 As TwoSampleTTest = New TwoSampleTTest(group1Results, group2Results, _
    SamplePairing.Unpaired, VarianceAssumption.None)
Console.WriteLine("Test statistic: {0:F4}", tTest2.Statistic)
Console.WriteLine("P-value:        {0:F4}", tTest2.PValue)
Console.WriteLine("Reject null hypothesis? {0}", _
    IIf(tTest2.Reject(), "yes", "no"))

For these data, the value of the unpaired t-statistic is -1.2326, giving a two-tailed p-value of about 0.23. As a result, the hypothesis that, on average, the students in the first group score no differently from the students in the second group is not rejected at the 0.05 level.
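This minimal sketch computes the unpaired statistic directly from the two score lists in plain .NET, without the library. It uses the unequal-variance (Welch) standard error and a proposed difference of zero. For contrast, it also computes the paired statistic, which treats the lists as matched pairs and runs a one sample t statistic on the element-wise differences. The paired form would be appropriate only if each score in one list belonged to the same student as the corresponding score in the other. The helper and variable names are hypothetical.

```csharp
using System;
using System.Linq;

double[] class1 = { 62, 77, 61, 94, 75, 82, 86, 83, 64, 84,
                    68, 82, 72, 71, 85, 66, 61, 79, 81, 73 };
double[] class2 = { 61, 80, 98, 90, 94, 65, 79, 75, 74, 86,
                    76, 85, 78, 72, 76, 79, 65, 92, 76, 80 };

// Unbiased sample variance (n - 1 denominator).
static double Variance(double[] x)
{
    double m = x.Average();
    return x.Sum(v => (v - m) * (v - m)) / (x.Length - 1);
}

// Unpaired (Welch) statistic for a proposed difference of zero.
double se = Math.Sqrt(Variance(class1) / class1.Length
                    + Variance(class2) / class2.Length);
double tUnpaired = (class1.Average() - class2.Average()) / se;

// Paired statistic: one sample t on the differences, testing mean zero.
double[] diffs = class1.Zip(class2, (a, b) => a - b).ToArray();
double tPaired = diffs.Average() / Math.Sqrt(Variance(diffs) / diffs.Length);

Console.WriteLine($"Unpaired t = {tUnpaired:F4}, paired t = {tPaired:F4}");
```

Both statistics share the same numerator, the difference of the means. Only the standard error differs: the paired form measures variability within pairs, and the unpaired form measures variability within each group.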
