Parametric Hypothesis Testing
2004/3/17
z-test
t-test
Unpaired data: two-sample t-test
More than two groups: one-way analysis of variance (ANOVA)
Complex data
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A. et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
Cancer Genomics Program at the Whitehead Institute for Genome Research:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
The null hypothesis:
There are four types of small round blue cell tumours of childhood: neuroblastoma (NB), non-Hodgkin lymphoma (NHL), rhabdomyosarcoma (RMS) and Ewing tumours (EWS). Sixty-three samples from these tumours (12, 8, 20 and 23 in each of the groups, respectively) have been hybridised to microarrays.
We want to identify genes that are differentially expressed in one or more of these four groups.
The significance level (alpha) is related to the degree of certainty you require in
order to reject the null hypothesis in favor of the alternative.
Decide in advance to reject the null hypothesis if the probability of observing your sampled result is less than the
significance level.
For a typical significance level of 5%, the notation is alpha = 0.05. For this significance level, the probability of
incorrectly rejecting the null hypothesis when it is actually true is 5%.
If you need more protection from this error, then choose a lower value of alpha.
The p-value is the probability of observing the given sample result under the
assumption that the null hypothesis is true.
More on SRBCT:
http://www.thedoctorsdoctor.com/diseases/small_round_blue_cell_tumor.htm
If the p-value is less than alpha, then you reject the null hypothesis.
For example, if alpha = 0.05 and the p-value is 0.03, then you reject the null hypothesis.
Suppose, in our example, that 1.15 lies inside a 95% confidence interval for the mean. That is equivalent to being unable to reject the null hypothesis at a significance level of 0.05.
Conversely, if the 100(1 - alpha)% confidence interval does not contain 1.15, then you reject the null hypothesis at the alpha level of significance.
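This duality between the confidence interval and the test can be sketched in Python (the slides reference MATLAB; scipy.stats stands in here, and the sample values and the hypothesised mean of 1.15 are illustrative, not real data):

```python
import numpy as np
from scipy import stats

# Invented sample, for illustration only
x = np.array([1.02, 1.31, 0.95, 1.20, 1.10, 1.25, 1.08, 1.18])
mu0 = 1.15  # hypothesised mean, echoing the slides' example value

# One-sample t-test of H0: mean = mu0
t_stat, p_value = stats.ttest_1samp(x, mu0)

# 95% confidence interval for the mean, built by hand
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)
half_width = stats.t.ppf(0.975, df=n - 1) * se
ci = (x.mean() - half_width, x.mean() + half_width)

# mu0 inside the 95% CI  <=>  cannot reject H0 at alpha = 0.05
inside = ci[0] <= mu0 <= ci[1]
print(p_value > 0.05, inside)
```

The two boolean results always agree: failing to reject at alpha = 0.05 is the same statement as mu0 lying inside the 95% interval.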
Question: If I do a t-test on a pair of samples and fail to reject the null hypothesis, does this mean there is no significant difference?
Answer: Maybe yes, maybe no.
For two-sample t-test, power is the probability of rejecting the hypothesis that
the means are equal when they are in fact not equal. Power is one minus the
probability of Type-II error.
Example
H0: the gene is not differentially expressed.
The power of the test depends upon the sample size, the magnitudes of the
variances, the alpha level, and the actual difference between the two
population means.
Usually you would only consider the power of a test when you failed to reject
the null hypothesis.
High power is desirable (0.7 to 1.0). High power means that there is a high
probability of rejecting the null hypothesis when the null hypothesis is false.
This is a critical measure of precision in hypothesis testing and needs to be
considered with care.
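A rough way to see how power depends on the sample size, the variances, the alpha level, and the true difference is Monte Carlo simulation. The sketch below is illustrative (not from the slides); the per-group size, difference, and variance are made-up values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n = 20        # per-group sample size (assumed for illustration)
delta = 1.0   # true difference between the population means
sigma = 1.0   # common standard deviation

n_sim = 2000
rejections = 0
for _ in range(n_sim):
    # Draw data where H0 is in fact false (means differ by delta)
    x = rng.normal(0.0, sigma, n)
    y = rng.normal(delta, sigma, n)
    _, p = stats.ttest_ind(x, y)
    if p < alpha:
        rejections += 1

# Estimated power: P(reject H0 | H0 false) = 1 - P(Type-II error)
power = rejections / n_sim
print(round(power, 2))
```

Rerunning with smaller n, a smaller delta, or a larger sigma shows the power falling, which is exactly the dependence described above.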
Two measurements are independent if knowing the value of one measurement gives no information about the value of the other.
For any gene, the measurements of expression in two different patients are independent.
Replicate measurements from the same patient (e.g. replicate features on an array) are not independent.
MATLAB Statistics Toolbox functions for these tests:
jbtest — Jarque-Bera test for normality
lillietest — Lilliefors test for normality
kstest — Kolmogorov-Smirnov test of the distribution of one sample
kstest2 — Kolmogorov-Smirnov test comparing two samples
ztest — hypothesis test for the mean of one sample with known variance
ttest — hypothesis test for the mean of one sample with unknown variance
ttest2 — hypothesis test for the difference in means of two samples (unpaired)
signtest — sign test for paired data
signrank — Wilcoxon signed rank test for paired data
ranksum — Wilcoxon rank sum test that two populations are identical (unpaired; Mann-Whitney test)
anova1 — one-way analysis of variance
Conclude: this gene has been significantly downregulated following chemotherapy at the 1% level.
The gene metallothionein IB is on the Affymetrix array used for the leukemia data.
For a paired t-test, it is the distribution of the differences that must be normal.
For an unpaired t-test, the distributions of both data sets must be normal.
Steps: the data are log-transformed, then the unpaired t-test gives t = -3.4177, p = 0.0016.
Conclude that the expression of metallothionein IB is significantly higher in AML than in ALL at the 1% level.
Note:
If the two populations are symmetric, and if the variances are equal, then the t-test may be used.
If the two populations are symmetric and the variances are not equal, then use the two-sample unequal-variance t-test (Welch's t-test).
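In Python, scipy.stats.ttest_ind plays the role of MATLAB's ttest2, and equal_var=False gives Welch's unequal-variance test. The expression values below are invented for illustration, not the real leukemia data:

```python
import numpy as np
from scipy import stats

# Hypothetical expression values for one gene in two unpaired groups,
# log-transformed as the slides recommend
aml = np.log([820, 950, 1100, 760, 1300, 990, 870])
all_ = np.log([410, 520, 380, 610, 450, 500, 430, 560])

# Standard (pooled-variance) two-sample t-test, as in MATLAB's ttest2
t_pooled, p_pooled = stats.ttest_ind(aml, all_)

# Welch's t-test, used when the variances cannot be assumed equal
t_welch, p_welch = stats.ttest_ind(aml, all_, equal_var=False)

print(p_pooled < 0.01, p_welch < 0.01)
```

With this made-up data both variants reject at the 1% level; in practice Welch's test is the safer default when the group variances look different.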
Example of data that are not normally distributed: t-tests are not an appropriate analysis of these data.
Histogram of the difference of expression of diubiquitin in 20 breast cancer patients.
Unlogged: the distribution is not normal; there are two outliers, with values of approximately -28 and -11. (A t-test gives p = 0.03, which is not significant at the 1% level, because the standard error of the mean is so high.)
Logged: the distribution is normal; the outliers have been pulled in. (A t-test gives p = 0.001, which is significant at the 1% level, because the standard error is much lower. In both cases the mean difference is less than zero.)
Jarque-Bera test for goodness-of-fit to a normal distribution
The Jarque-Bera test evaluates the hypothesis that X has a normal distribution
with unspecified mean and variance, against the alternative that X does not have
a normal distribution.
The test is based on the sample skewness and kurtosis of X. For a true normal
distribution, the sample skewness should be near 0 and the sample kurtosis
should be near 3.
The Jarque-Bera test determines whether the sample skewness and kurtosis are unusually different from their expected values, as measured by a chi-square statistic.
The Jarque-Bera test is an asymptotic test and should not be used with small samples; you may want to use lillietest in place of jbtest for small samples.
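scipy.stats.jarque_bera is a Python counterpart of MATLAB's jbtest. A small sketch on simulated data (large samples, since the test is asymptotic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_data = rng.normal(size=2000)       # skewness near 0, kurtosis near 3
skewed_data = rng.exponential(size=2000)  # strongly right-skewed

# Each call returns the chi-square statistic and its p-value
jb_n, p_n = stats.jarque_bera(normal_data)
jb_s, p_s = stats.jarque_bera(skewed_data)

# The skewed sample yields a much larger statistic and a tiny p-value
print(p_n, p_s)
```

The exponential sample's skewness and excess kurtosis are far from the normal values, so its statistic dwarfs the one from the normal sample.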
MATLAB functions: jbtest; for small samples, lillietest.
"
2
0
Kolmogorov-Smirnov test of the distribution of one sample: kstest. The two-sample version is kstest2.
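scipy.stats.kstest and scipy.stats.ks_2samp are Python counterparts of kstest and kstest2; a sketch on simulated samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=500)           # drawn from a standard normal
y = rng.uniform(-1, 1, size=500)   # drawn from a uniform distribution

# One-sample test of x against the standard normal CDF (kstest)
stat1, p1 = stats.kstest(x, "norm")

# Two-sample test: are x and y drawn from the same distribution? (kstest2)
stat2, p2 = stats.ks_2samp(x, y)

print(p1, p2)
```

The one-sample test finds nothing unusual about x, while the two-sample test easily separates the normal sample from the uniform one.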
Null hypothesis: the populations from which the two samples were drawn have the same median.
The data from the two groups are combined and given ranks (1 for the smallest, 2 for the second smallest, ...).
The ranks for the larger group are summed, and that sum is compared against a precomputed table to obtain a p-value.
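The ranking step can be sketched directly, alongside scipy.stats.ranksums (a counterpart of MATLAB's ranksum); the two small samples are made up:

```python
import numpy as np
from scipy import stats

a = np.array([1.1, 1.9, 2.3, 3.0])
b = np.array([4.2, 4.8, 5.5, 6.1, 7.0])

# Combine the groups and rank the pooled values (1 = smallest)
combined = np.concatenate([a, b])
ranks = stats.rankdata(combined)
rank_sum_b = ranks[len(a):].sum()  # rank sum of the larger group

# scipy compares the rank sum against its null distribution for us
stat, p = stats.ranksums(a, b)
print(rank_sum_b, p)
```

Here every value in b exceeds every value in a, so b receives the top five ranks (5 through 9, summing to 35) and the test rejects at the 5% level.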
Unlogged: p-value = 0.00032; logged: p-value = 0.00048.
The test gives a significant result in both cases. The Wilcoxon test is robust to outliers and so gives a significant result even on the unlogged data.
The bootstrap data sets look like the real data, in that they have similar values, but are biologically nonsense because the values have been randomized.
Aim: to compare some property of the real data with the distribution of the same property in random data sets.
Bootstrapping the sample/data:
With replacement: different individuals in the bootstrap data can share the same value from the real data.
Without replacement: each of the real values is used only once in the bootstrap data.
Bootstrap analyses are more appropriate for microarray analysis than either the t-test or classical non-parametric tests:
they do not require that the data are normally distributed;
they are robust to noise and experimental artifacts.
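A minimal sketch of a randomization test in this spirit: the observed mean difference is compared with its distribution in resampled (with replacement) "nonsense" data sets. The values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical expression of one gene before and after treatment
before = np.array([5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.4, 5.8])
after = np.array([6.8, 7.1, 7.5, 6.9, 8.0, 7.3, 7.7, 7.2])
observed = after.mean() - before.mean()

pooled = np.concatenate([before, after])
n_boot = 5000
count = 0
for _ in range(n_boot):
    # Sampling WITH replacement: a real value may appear several times
    fake = rng.choice(pooled, size=pooled.size, replace=True)
    diff = fake[len(before):].mean() - fake[:len(before)].mean()
    if abs(diff) >= abs(observed):
        count += 1

# Fraction of randomized data sets at least as extreme as the real data
p_boot = count / n_boot
print(p_boot)
```

No normality assumption is made anywhere: the reference distribution is built entirely from the randomized data sets.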
Comparison of the three approaches:
t-test: easy, powerful, widely implemented.
Classical non-parametric tests: easy, robust, widely implemented, but less powerful.
Bootstrap analysis: robust and powerful.
By the very definition of a p-value, each gene would have a 1% chance of having a p-value of
less than 0.01, and thus be significant at the 1% level.
Because there are 10000 genes on this imaginary microarray, we would expect to find 100
significant genes at this level.
Similarly, we would expect to find 10 genes with a p-value less than 0.001, and 1 gene with
p-value less than 0.0001.
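The arithmetic behind these expected counts is just the number of genes times the threshold, assuming no gene is truly differentially expressed:

```python
# Expected number of "significant" genes by chance alone:
# N genes, each with probability t of a p-value below t under H0
n_genes = 10000
for threshold in (0.01, 0.001, 0.0001):
    expected = n_genes * threshold
    print(threshold, expected)
```

This reproduces the slide's figures: 100, 10, and 1 expected false positives at the three thresholds.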
Note:
Because of this loss of power, classical non-parametric statistics have not become popular for use with microarray data; instead, bootstrap methods tend to be preferred.
In the breast cancer data set with 9216 genes, even if the chemotherapy had no effect whatsoever, we would expect to find 92 differentially expressed genes with p-values less than 0.01, simply because of the large number of genes being analyzed.
How do we know that the genes that appear to be differentially expressed are truly differentially expressed, and are not just artifacts introduced because we are analyzing a large number of genes?
Is this gene truly differentially expressed, or could it be a false positive result?
The permutation test is a test where the null hypothesis allows the inference to be reduced to a randomization problem.
The process of randomization makes it possible to ascribe a probability distribution to the differences in outcome that are possible under H0.
The outcome data are analyzed many times (once for each acceptable assignment that could have been possible under H0) and then compared with the observed result, without dependence on additional distributional or model-based assumptions.
Perform a permutation test (general):
1. Compute the test statistic on the observed data.
2. Permute the group labels in a way consistent with H0.
3. Recompute the test statistic for each permuted data set.
4. Take as the p-value the proportion of permuted statistics at least as extreme as the observed one.
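The four steps above can be sketched for a difference in group means; the data are made up, and the labels are shuffled (sampling without replacement, in contrast to the bootstrap):

```python
import numpy as np

rng = np.random.default_rng(4)
group1 = np.array([2.1, 2.5, 1.9, 2.8, 2.4])
group2 = np.array([3.3, 3.9, 3.1, 3.6, 3.8])

# Step 1: the observed test statistic
observed = group2.mean() - group1.mean()

pooled = np.concatenate([group1, group2])
n1 = len(group1)
n_perm = 5000
extreme = 0
for _ in range(n_perm):
    # Steps 2-3: relabel the pooled values and recompute the statistic
    perm = rng.permutation(pooled)
    diff = perm[n1:].mean() - perm[:n1].mean()
    if abs(diff) >= abs(observed):
        extreme += 1

# Step 4: p-value = fraction of permutations at least as extreme
p_perm = extreme / n_perm
print(p_perm)
```

With these separated groups the exact two-sided p-value is 2/252 (only the original labelling and its mirror are as extreme), and the Monte Carlo estimate lands close to that.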
References
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman and Hall.
Jarque, C. M. and Bera, A. K. (1980). Efficient tests for normality, homoscedasticity, and serial independence of regression residuals. Economics Letters 6, 255-259.
Kerr, M. K., Martin, M., and Churchill, G. A. (2000). Analysis of variance for gene expression microarray data. Journal of Computational Biology 7, 819-837.
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association.
Martinez, W. L. (2002). Computational Statistics Handbook with MATLAB. Boca Raton: Chapman & Hall/CRC.
Runyon, R. P. (1977). Nonparametric Statistics: A Contemporary Approach. Reading, Mass.: Addison-Wesley.
Statistics Toolbox User's Guide, The MathWorks Inc.
http://www.mathworks.com/access/helpdesk/help/toolbox/stats/stats.shtml
Stekel, D. (2003). Microarray Bioinformatics. New York: Cambridge University Press.
Tsai, C. A., Chen, Y. J. and Chen, J. (2003). Testing for differentially expressed genes with microarray data. Nucleic Acids Research 31, No. 9, e52.
Turner, J. R. and Thayer, J. F. (2001). Introduction to Analysis of Variance: Design, Analysis, & Interpretation. Thousand Oaks, Calif.: Sage Publications.
E-mail: hmwu@stat.sinica.edu.tw
Website: http://www.sinica.edu.tw/~hmwu/