Seventh lecture: Hypothesis testing

Introduction
First example
The t-test
Second example
Two-sample t-test
Recommended problems

Introduction

Suppose that we measure a quantity drawn from an unknown distribution. We want to decide whether the distribution has a certain property (e.g. is its mean 100?).

First example

Let us consider boxes of matches. The boxes are filled by weight, so not every box contains the same number of matches. The label often reads 100 $\pm$ 5%. We want to test the hypothesis that the boxes contain 100 matches on average. To do so, we open 10 boxes. Suppose we counted the following numbers: 91, 96, 102, 104, 92, 100, 108, 101, 96, 101. First we calculate the sample mean:

meres1=[91, 96,102, 104,92, 100, 108, 101, 96, 101];
mean(meres1)

ans =
   99.1000

The t-test

Suppose that the number of matches is normally distributed. Then we can use the t-test to test the expected value. The null hypothesis is that the expected value equals the given number (here 100). The output is 0 if the test fails to reject the null hypothesis and 1 if it rejects it.

ttest(meres1,100)

ans =
     0
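
To see what ttest computes, the test statistic can be reproduced by hand. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available for tcdf; the variable names mu0, n, se, tstat and p are chosen here for illustration only.

```matlab
meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101];
mu0 = 100;                          % hypothesized expected value (null hypothesis)
n = numel(meres1);                  % sample size
se = std(meres1) / sqrt(n);         % standard error of the sample mean
tstat = (mean(meres1) - mu0) / se;  % t statistic with n-1 degrees of freedom
p = 2 * tcdf(-abs(tstat), n - 1);   % two-sided p-value

% ttest returns the same quantities as additional outputs
[h, pTest, ci, stats] = ttest(meres1, mu0);
```

The null hypothesis is rejected (h is 1) only when the p-value is below the significance level. Here tstat is about -0.53, which is far from the rejection region.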

The default significance level for ttest is 5%. Change this only if really necessary.

The one-sided (left-tailed) test has the alternative hypothesis that the expected value is less than 100:

ttest(meres1,100,'Tail', 'left')

ans =
     0
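
The one-sided p-value can be sketched by hand as well. This assumes the same t statistic as in the two-sided case and uses tcdf from the Statistics and Machine Learning Toolbox; the names pLeft and pBoth are illustrative.

```matlab
meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101];
n = numel(meres1);
tstat = (mean(meres1) - 100) / (std(meres1) / sqrt(n));

pLeft = tcdf(tstat, n - 1);             % P(T <= tstat): left-tailed p-value
pBoth = 2 * tcdf(-abs(tstat), n - 1);   % two-sided p-value for comparison

% The second output of ttest is the p-value of the requested test
[hLeft, pLeftTest] = ttest(meres1, 100, 'Tail', 'left');
```

Because the sample mean is below 100, tstat is negative and the left-tailed p-value is exactly half of the two-sided one.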

However, if we instead test the null hypothesis that the expected value
is 98:

ttest(meres1,98)

ans =
     0
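
One way to see why neither 100 nor 98 is rejected is to look at the 95% confidence interval for the expected value. The sketch below computes it by hand, assuming tinv from the Statistics and Machine Learning Toolbox; ciManual and tcrit are illustrative names.

```matlab
meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101];
n = numel(meres1);
se = std(meres1) / sqrt(n);                      % standard error of the mean
tcrit = tinv(0.975, n - 1);                      % two-sided critical value at the 5% level
ciManual = mean(meres1) + [-1, 1] * tcrit * se;  % lower and upper bound

% The third output of ttest is the same interval
[~, ~, ci] = ttest(meres1, 100);
```

The interval runs roughly from 95.3 to 102.9, so it contains both 98 and 100. A two-sided t-test at the 5% level rejects a hypothesized mean exactly when it lies outside this interval.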

This shows that we cannot distinguish the expected values 100 and 98 using only 10 measurements at this significance level. We can calculate the number of measurements needed if we know the standard deviation (let it be 5):

sampsizepwr('t',[100,5],98)

ans =
    68

By default the sampsizepwr function uses a power of 90% (the probability of rejecting the null hypothesis when the expected value is in fact 98), but we may change that:

sampsizepwr('t',[100,5],98,0.95)

ans =
    84
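
The sample sizes above can be cross-checked with the usual normal-approximation formula n ≈ ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2. This is a rough sketch, not what sampsizepwr does internally; it assumes norminv from the Statistics and Machine Learning Toolbox, and all variable names are illustrative.

```matlab
mu0 = 100; mu1 = 98; sigma = 5;   % null mean, alternative mean, standard deviation
alpha = 0.05;                     % two-sided significance level
pwr = 0.90;                       % desired power
zAlpha = norminv(1 - alpha/2);    % normal quantile for the significance level
zPower = norminv(pwr);            % normal quantile for the power
nApprox = ceil(((zAlpha + zPower) * sigma / (mu0 - mu1))^2);
```

The approximation gives 66, slightly less than the 68 reported by sampsizepwr, because it uses normal quantiles instead of the t distribution.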

The sampsizepwr function can also calculate the power for a given number of samples, that is, the probability with which we can distinguish the two expected values:

sampsizepwr('t',[100,5],98,[],10)

ans =
    0.2051
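
This power can be checked by simulation. The sketch below assumes normally distributed data with true mean 98 and standard deviation 5, and relies on ttest testing each column of a matrix separately; the seed and the number of trials are arbitrary choices.

```matlab
rng(1);                                  % fix the seed for reproducibility
nSamples = 10;                           % measurements per experiment
nTrials = 20000;                         % number of simulated experiments
X = 98 + 5 * randn(nSamples, nTrials);   % each column is one experiment
h = ttest(X, 100);                       % column-wise test of the null hypothesis mean = 100
powerEstimate = mean(h);                 % fraction of experiments that reject
```

The estimate should be close to the value returned by sampsizepwr: with only 10 boxes, roughly one experiment in five would detect the difference.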

Second example

Suppose we have 20 shrimps. Nine of them are in the first aquarium, where the temperature is 25 °C. We measured the length of their "pregnancy" (the number of days from the appearance of the larvae among their legs until they release the larvae) and got the following results: 19, 17, 23, 23, 18, 20, 25, 22, 18 days. In the second aquarium the temperature is 21 °C and the measured lengths were: 18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27 days. We want to decide whether the temperature influences the length of the pregnancy.

Two-sample t-test

We want to test the equality of the expected values, supposing that both samples come from a normal distribution. We can use the ttest2 function for that. By default, ttest2 also assumes that the two samples have equal variances.

akv25=[19, 17, 23, 23, 18, 20, 25, 22, 18];
mean(akv25)

ans =
   20.5556

akv21=[18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27 ];
mean(akv21)

ans =
   24.0909

ttest2(akv21, akv25)

ans =
     1
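
The statistic behind this result can be reproduced by hand. The following is a minimal sketch of the pooled-variance two-sample t-test, assuming tcdf from the Statistics and Machine Learning Toolbox; the names n1, n2, df and sp2 are illustrative. The last call shows Welch's variant, which drops the equal-variance assumption.

```matlab
akv25 = [19, 17, 23, 23, 18, 20, 25, 22, 18];
akv21 = [18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27];
n1 = numel(akv21);
n2 = numel(akv25);
df = n1 + n2 - 2;                                          % degrees of freedom
sp2 = ((n1 - 1) * var(akv21) + (n2 - 1) * var(akv25)) / df; % pooled variance
tstat = (mean(akv21) - mean(akv25)) / sqrt(sp2 * (1/n1 + 1/n2));
p = 2 * tcdf(-abs(tstat), df);                             % two-sided p-value

% ttest2 returns the same statistic and p-value
[h, pTest, ~, stats] = ttest2(akv21, akv25);

% Welch's test, not assuming equal variances
[hWelch, pWelch] = ttest2(akv21, akv25, 'Vartype', 'unequal');
```

Here tstat is about 2.11 with 18 degrees of freedom, only slightly beyond the critical value of about 2.10, so the rejection is a narrow one.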

At the 5% significance level the test rejects the null hypothesis of equal expected values, so the data indicate that the water temperature influences the length of the pregnancy.