Seventh lecture: Hypothesis testing
Introduction
Let us suppose that we measure a quantity originating from an unknown distribution. We want to decide whether the distribution has a certain property (e.g. whether its mean is 100).
First example
Let us consider a box of matches. The boxes are filled by weight, so not every box contains the same number of matches. Boxes are often labelled "100 ± 5%". We want to test the hypothesis that the boxes contain 100 matches on average. To do that, we open 10 boxes. Let us suppose we counted the following numbers: 91, 96, 102, 104, 92, 100, 108, 101, 96, 101. Their mean is:
meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101]; mean(meres1)
ans = 99.1000
The t-test
Let us suppose that the number of matches is normally distributed; then we can use the t-test to test the expected value. ttest returns 0 if the null hypothesis (here: the expected value is 100) cannot be rejected, and 1 if it is rejected.
ttest(meres1,100)
ans = 0
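To see what ttest computes, here is a minimal sketch of the two-sided one-sample t-test in Python, using only the standard library. It assumes the usual statistic t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom. The helper names (t_pdf, t_cdf, one_sample_ttest) are hypothetical, and the CDF of the t-distribution is approximated by Simpson's rule rather than taken from a statistics library.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_ttest(data, mu0, alpha=0.05):
    """Two-sided test of H0: mean == mu0. Returns (h, t, p); h mirrors ttest's output."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    t = (statistics.mean(data) - mu0) / (s / math.sqrt(n))
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    return int(p < alpha), t, p


meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101]
h, t, p = one_sample_ttest(meres1, 100)
print(h, round(t, 3), round(p, 3))  # h = 0: H0 (mean 100) is not rejected
```

The statistic is only about -0.53, far from the two-sided 5% critical value of roughly 2.26 for 9 degrees of freedom, so the p-value is large.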
The default significance level of ttest is 5% (it can be set with the 'Alpha' name-value argument); change it only if it is really necessary.
In the one-sided (left-tailed) test, the alternative hypothesis is that the expected value is less than 100:
ttest(meres1,100,'Tail', 'left')
ans = 0
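The one-sided test differs from the two-sided one only in how the p-value is read off the t-distribution. The sketch below is in Python with the standard library only; the function names are hypothetical, the t CDF is approximated by Simpson's rule, and the tail argument merely imitates the 'left' / 'right' / 'both' values of MATLAB's 'Tail' option.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_ttest(data, mu0, tail="both", alpha=0.05):
    """Test H0: mean == mu0 against the alternative chosen by tail.

    tail="left":  H1 is mean < mu0
    tail="right": H1 is mean > mu0
    tail="both":  H1 is mean != mu0
    """
    n = len(data)
    t = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    if tail == "left":
        p = t_cdf(t, n - 1)  # probability of a statistic at least this small
    elif tail == "right":
        p = 1 - t_cdf(t, n - 1)  # probability of a statistic at least this large
    else:
        p = 2 * (1 - t_cdf(abs(t), n - 1))
    return int(p < alpha), p


meres1 = [91, 96, 102, 104, 92, 100, 108, 101, 96, 101]
h_left, p_left = one_sample_ttest(meres1, 100, tail="left")
print(h_left, round(p_left, 3))  # h = 0: no evidence that the mean is below 100
```

Because the sample mean lies below 100, the left-tailed p-value is exactly half of the two-sided one.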
However, the null hypothesis that the expected value is 98 is not rejected either:
ttest(meres1,98)
ans = 0
This shows that with only 10 measurements we cannot distinguish between the expected values 100 and 98 at this significance level. If we know the standard deviation (let it be 5), we can calculate the number of measurements needed:
sampsizepwr('t',[100,5],98)
ans = 68
By default, sampsizepwr returns the sample size at which the test has a power of 0.90 (i.e. it detects the difference with 90% probability), but we can change that:
sampsizepwr('t',[100,5],98,0.95)
ans = 84
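These sample sizes can be checked roughly with the normal-approximation formula n ≈ ((z_{1-α/2} + z_{power}) σ / δ)², where δ = |μ1 - μ0|. This Python sketch is an assumption for illustration, not what sampsizepwr does internally: it treats σ as known and ignores the opposite tail, so it gives slightly smaller values than the t-based answers 68 and 84. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_sample_size(mu0, sigma, mu1, power=0.90, alpha=0.05):
    """Normal-approximation sample size for a two-sided one-sample test.

    n = ((z_{1-alpha/2} + z_{power}) * sigma / |mu1 - mu0|) ** 2, rounded up.
    Treats sigma as known and ignores the opposite tail.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    delta = abs(mu1 - mu0)
    n = ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)


print(approx_sample_size(100, 5, 98))        # 66 (sampsizepwr, t-based: 68)
print(approx_sample_size(100, 5, 98, 0.95))  # 82 (sampsizepwr, t-based: 84)
```

The gap of about two observations comes from replacing the normal quantiles with t quantiles, whose tails are heavier for small samples.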
Conversely, given the number of samples, sampsizepwr can calculate the power of the test, i.e. the probability with which the test distinguishes the expected value 98 from 100:
sampsizepwr('t',[100,5],98,[],10)
ans = 0.2051
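The power of 0.2051 can also be estimated by simulation: draw many samples of 10 from a normal distribution with mean 98 and standard deviation 5, run the two-sided t-test of μ0 = 100 on each, and count how often the null hypothesis is rejected. This is a Python sketch using only the standard library; the critical value 2.262157 (t-distribution, 9 degrees of freedom) is a hard-coded table value, and the number of trials and the seed are arbitrary choices.

```python
import math
import random

N = 10  # boxes opened per simulated experiment
T_CRIT = 2.262157  # two-sided 5% critical value of t with N - 1 = 9 df (table value)


def simulated_power(mu0=100, sigma=5, mu1=98, trials=20000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects H0: mean == mu0
    when the data really come from a normal distribution with mean mu1."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu1, sigma) for _ in range(N)]
        m = sum(sample) / N
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (N - 1))
        t = (m - mu0) / (s / math.sqrt(N))
        if abs(t) > T_CRIT:
            rejections += 1
    return rejections / trials


print(simulated_power())  # close to sampsizepwr's 0.2051
```

With 20000 trials the estimate should land within about 0.01 of 0.2051: with 10 boxes, a true mean of 98 is detected only about one time in five. Setting mu1=100 instead gives a rejection rate near 0.05, the significance level.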
Second example
Let us suppose we have 20 shrimps. 9 of them are in the first aquarium, where the temperature is 25 °C. We measured the length of their "pregnancy" (the number of days elapsed from the appearance of the larvae among their legs until they release the larvae) and got the following results: 19, 17, 23, 23, 18, 20, 25, 22, 18 days. In the second aquarium the temperature is 21 °C, and the measured lengths for the other 11 shrimps were: 18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27 days. We want to decide whether the temperature influences the length of their pregnancy.
Two-sample t-test
We want to test whether the two expected values are equal, assuming that both samples come from normal distributions (by default, ttest2 also assumes that the two variances are equal). We can use the ttest2 function for that:
akv25 = [19, 17, 23, 23, 18, 20, 25, 22, 18]; mean(akv25)
ans = 20.5556
akv21 = [18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27]; mean(akv21)
ans = 24.0909
ttest2(akv21, akv25)
ans = 1
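The pooled two-sample t-test behind this result can be sketched in Python with the standard library only. It assumes equal variances (the ttest2 default), computes t = (x̄ - ȳ) / (s_p · sqrt(1/n_x + 1/n_y)) with n_x + n_y - 2 = 18 degrees of freedom, and approximates the t CDF with Simpson's rule; the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sample_ttest(x, y, alpha=0.05):
    """Two-sided pooled-variance test of H0: mean(x) == mean(y).

    Assumes both samples are normal with equal variances. Returns (h, t, p, df).
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    sp = math.sqrt(ss / df)  # pooled standard deviation
    t = (mx - my) / (sp * math.sqrt(1 / nx + 1 / ny))
    p = 2 * (1 - t_cdf(abs(t), df))
    return int(p < alpha), t, p, df


akv25 = [19, 17, 23, 23, 18, 20, 25, 22, 18]
akv21 = [18, 19, 26, 22, 19, 27, 24, 29, 23, 31, 27]
h, t, p, df = two_sample_ttest(akv21, akv25)
print(h, round(t, 3), round(p, 4), df)  # h = 1: the means differ at the 5% level
```

Here t ≈ 2.11 is only slightly above the critical value of about 2.10 for 18 degrees of freedom, so the rejection is borderline (p just under 0.05).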
At the 5% significance level we reject the null hypothesis, so with 95% confidence we can say that the water temperature influences the length of the pregnancy.
Recommended problems
Problem 1. We want to check whether the boxes contain 90 tissues. Let us suppose that the standard deviation is 4.5.
- How many boxes have to be opened to be sure, at the 95% confidence level, that there are 90 tissues in a box on average and not 89?
- And on 99% confidence level?
- Answer these two questions as a customer (we only care whether there are fewer),
- and as a quality controller (every difference is a problem).
Problem 2. We want to test the effectiveness of a new antipyretic. We measured the temperatures (°C) of 10 people before they took the drug: 38.1, 39.2, 37.9, 38.3, 39.5, 39.4, 38.5, 39.1, 38.4, 39.1. Half an hour after taking the drug we measured: 37.3, 36.9, 37.3, 37.2, 38.2, 37.4, 36.5, 37.3, 37.4, 36.8. What is our null hypothesis? What is the alternative hypothesis? What sort of test should we apply?