
Statistical Inference, Decision Theory and Hypothesis Testing  

       

 
It is often necessary to make decisions about the characteristics of populations from the knowledge
of sample quantities, called sample statistics. The study of these statistical decisions, the
process behind them, and the degree of certainty attributed to them, is called statistical inference.
 
When making a statistical decision, assumptions about the populations in question must be made.
That is to say, statements about the population's unknown quantities, or population parameters,
are formulated, which may be true or false. These assumptions are called statistical
hypotheses and the process of testing them is referred to as hypothesis testing.
 
A null hypothesis, commonly denoted H0, is a statistical hypothesis formulated for the purpose of
rejection and usually implies equality. It is assumed to be true until evidence indicates otherwise.
For example, if we want to test whether a certain brand of chocolate, A, tastes better than all the
other brands, B, the null hypothesis will assume that it does not, as H0: μA = μB (where μA and μB
denote the mean taste ratings of brand A and of the other brands).
 
An alternative hypothesis, commonly denoted H1 or Ha, represents an alternative to the null
hypothesis, or a claim that can only be true when the null hypothesis is false. Depending on the
nature of the null hypothesis, three alternative hypotheses are possible:
 
• H1: μ ≠ μ0 as an alternative to H0: μ = μ0, in which case the statistical test is said to be a
two-tailed test.

• H1: μ < μ0 as an alternative to H0: μ ≥ μ0, in which case the statistical test is said to be a
left-tailed test.

• H1: μ > μ0 as an alternative to H0: μ ≤ μ0, in which case the statistical test is said to be a
right-tailed test.
 
Left- and right-tailed tests are also referred to as one-tailed
tests.
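The kind of test determines which tail area serves as the P-value. A minimal sketch in Python (standard library only; the function names are illustrative, and the usual tail-area convention for P-values is used):

```python
# Sketch: P-values for the three kinds of test, built on the standard normal CDF.
# All names (phi, p_value) are illustrative, not from any statistics library.
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, tail):
    """P-value of a z statistic for tail in {'two', 'left', 'right'}."""
    if tail == "left":
        return phi(z)                  # area to the left of z
    if tail == "right":
        return 1.0 - phi(z)            # area to the right of z
    return 2.0 * (1.0 - phi(abs(z)))   # both tails: twice the outer area

print(round(p_value(1.96, "two"), 3))   # a statistic at the two-tailed .05 boundary
```

A statistic of ±1.96 sits exactly at the two-tailed 5% boundary, so its two-tailed P-value comes out at (approximately) 0.05.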
 
Statistical decisions are not error-free. The error of rejecting a null hypothesis (holding it false) when
it is true is called a type I error, whereas the error of not rejecting a null hypothesis (holding it true)
when it is false is called a type II error.
 
              H0 is true          H0 is false
Rejected      Type I error        Correct decision
Not rejected  Correct decision    Type II error
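The type I error rate can be made concrete with a small Monte Carlo sketch (standard library only; all names and the population values mu0 = 74, sigma = 10 are illustrative). When the null hypothesis is true, every rejection is a type I error, and the long-run rejection rate should sit near the significance level:

```python
import math
import random

random.seed(42)

def z_stat(sample, mu0, sigma):
    """z statistic for a sample mean when the population sigma is known."""
    n = len(sample)
    xbar = sum(sample) / n
    return (xbar - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z_crit = 1.96              # two-tailed critical point for alpha = 0.05
mu0, sigma, n = 74, 10, 30
trials = 20_000

# H0 is TRUE here: every sample really comes from a population with mean mu0,
# so every rejection is, by definition, a type I error.
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    if abs(z_stat(sample, mu0, sigma)) > z_crit:
        rejections += 1

rate = rejections / trials
print(rate)                # close to the significance level alpha = 0.05
```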
    
 
The probability of making a type I error in a hypothesis test is called the significance level, α, of the
test. The point at which the significance level takes place is called the critical point, the point past
which a null hypothesis is rejected. The complement of the significance level, 1 − α, is called
the confidence level of the test.
 
The probability of making a type II error, usually denoted β, depends on the chosen significance
level, the sample size and the true value of the parameter under consideration. The complement
of β, 1 − β, represents the probability of not making a type II error and is called the power of the test.
 
The performance of a test is characterized by its significance level and power, but because the true
value of the parameter under consideration is rarely known, power values must be reported in
curves, referred to as power curves, where power is calculated for a range of values of the
parameter under consideration.
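A power curve can be tabulated directly for a simple z-test. The sketch below (standard library only; the hypothesized mean, sigma, sample size and critical value are made-up illustrative numbers) computes the power of a right-tailed z-test over a range of candidate true means:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_right_tailed(mu_true, mu0=74.0, sigma=10.0, n=30, z_crit=1.645):
    """Power of a right-tailed z-test when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # H0 is rejected when xbar > mu0 + z_crit * se; the power is the
    # probability of that event under the true mean mu_true.
    return 1.0 - phi(z_crit - (mu_true - mu0) / se)

# Tabulate a power curve over a range of candidate true means.
for mu in (74, 75, 76, 77, 78, 79, 80):
    print(mu, round(power_right_tailed(mu), 3))
```

At the hypothesized value itself the power equals the significance level, and it rises toward 1 as the true mean moves away from it, which is exactly the shape a power curve reports.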
 
At this time, Auguri™ does not report power values.

 
The P-value of a hypothesis test represents the smallest significance level at which a null
hypothesis can be rejected for the resulting test statistic.
 
There are two methods for rejecting a null hypothesis in favor of an alternative. The first method is
referred to as the critical point method, which indicates the rejection of a null hypothesis when the
resulting statistic falls outside the confidence level region, or acceptance region, past the critical
point, in what is called the rejection region. The second method, or P-value method, indicates the
rejection of a null hypothesis when the reported P-value is smaller than the significance level of the
test (see the following table and figures).
 
 
Reject a null hypothesis if
 
Test          Critical Point                                P-Value

Two-Tailed    Computed statistic < Left critical point      Reported left P-value(1) < Significance level
              OR Computed statistic > Right critical point  OR Reported right P-value(1) > 2 − Significance level

Left-Tailed   Computed statistic < Left critical point      Reported left P-value < Significance level

Right-Tailed  Computed statistic > Right critical point     Reported right P-value > 1 − Significance level
(1) Reported two-tailed P-values are doubled in two-tailed tests.
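The two methods always agree, because comparing the statistic to the critical point is equivalent to comparing the corresponding tail area to the significance level. A minimal sketch for a right-tailed z-test (standard library only; function names are illustrative, and the P-value rule is written in the usual "tail area < α" form rather than the reported-value convention of the table above):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

alpha = 0.05
z_crit = 1.645   # right-tailed critical point for alpha = 0.05

def reject_by_critical_point(z):
    """Critical point method: statistic falls in the rejection region."""
    return z > z_crit

def reject_by_p_value(z):
    """P-value method: right-tail area is smaller than alpha."""
    return (1.0 - phi(z)) < alpha

# The two methods give the same decision for any test statistic.
for z in (-1.0, 0.0, 1.5, 1.65, 2.3):
    assert reject_by_critical_point(z) == reject_by_p_value(z)
print("methods agree")
```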
 
 
Graphical representation of statistical tests, where white areas under the curve equal the confidence level,
shaded areas equal the significance level, and dotted lines the critical points. Significance levels and P-
values correspond to areas at the tails, whereas critical points and test statistics correspond to points in the
abscissa.
 

One- and Two-Tailed Tests


In the previous example, you tested a research hypothesis that predicted not only
that the sample mean would be different from the population mean but that it would
be different in a specific direction: it would be lower. This test is called
a directional or one-tailed test because the region of rejection is entirely within
one tail of the distribution.
Some hypotheses predict only that one value will be different from another, without
additionally predicting which will be higher. The test of such a hypothesis is
non-directional or two-tailed because an extreme test statistic in either tail of the
distribution (positive or negative) will lead to the rejection of the null hypothesis of
no difference.

Suppose that you suspect that a particular class's performance on a proficiency test
is not representative of those people who have taken the test. The national mean
score on the test is 74.

The research hypothesis is:

 The mean score of the class on the test is not 74.

Or, in notation: Ha: μ ≠ 74

The null hypothesis is:

 The mean score of the class on the test is 74.

In notation: H0: μ = 74

As in the last example, you decide to use a 95 percent probability level for the test.
Both tests have a region of rejection, then, of five percent, or .05. In this example,
however, the rejection region must be split between both tails of the distribution
—.025 in the upper tail and .025 in the lower tail—because your hypothesis specifies
only a difference, not a direction, as shown in Figure 1 (a). You will reject the null
hypothesis of no difference if the class sample mean is either much higher or much
lower than the population mean of 74. In the previous example, only a sample mean
much lower than the population mean would have led to the rejection of the null
hypothesis. 
Figure 1 Comparison of (a) a two-tailed test and (b) a one-tailed test, at the same
probability level (95%).

The decision of whether to use a one- or a two-tailed test is important because a
test statistic that falls in the region of rejection in a one-tailed test may not do so in
a two-tailed test, even though both tests use the same probability level. Suppose
the class sample mean in your example was 77, and its corresponding z-score was
computed to be 1.80. In order to reject the null hypothesis, the test statistic must
be either smaller than −1.96 or greater than 1.96. It is not, so you cannot reject the
null hypothesis. Refer to Figure 1 (a).
Suppose, however, you had a reason to expect that the class would perform better
on the proficiency test than the population, and you did a one-tailed test instead.
For this test, the rejection region of .05 would be entirely within the upper tail. The
critical z-value for a probability of .05 in the upper tail is 1.65. Your computed test
statistic of z = 1.80 exceeds the critical value and falls in the region of rejection, so
you reject the null hypothesis and say that your suspicion that the class was better
than the population was supported. See Figure 1 (b).

In practice, you should use a one-tailed test only when you have good reason to
expect that the difference will be in a particular direction. A two-tailed test is more
conservative than a one-tailed test because a two-tailed test takes a more extreme
test statistic to reject the null hypothesis.
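The two decisions in this example can be verified numerically. A minimal sketch (Python standard library only; only the stated z-score of 1.80 is used, since the raw class data are not given):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.80   # computed test statistic from the example

# Two-tailed test at the .05 level: critical values are -1.96 and 1.96.
two_tailed_reject = abs(z) > 1.96          # False: cannot reject H0
two_tailed_p = 2.0 * (1.0 - phi(abs(z)))   # about .072, above .05

# One-tailed (right) test at the .05 level: critical value is 1.65.
one_tailed_reject = z > 1.65               # True: reject H0
one_tailed_p = 1.0 - phi(z)                # about .036, below .05

print(two_tailed_reject, round(two_tailed_p, 3))
print(one_tailed_reject, round(one_tailed_p, 3))
```

The same z-score of 1.80 is significant one-tailed but not two-tailed, which is precisely the conservatism the paragraph above describes.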

