
Chi-squared test

From Wikipedia, the free encyclopedia

"Chi-squared test" is often shorthand for Pearson's chi-squared test. A chi-squared test (also written chi-square test or X² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true, or any test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough. Some examples of chi-squared tests where the chi-squared distribution is only approximately valid:

- Pearson's chi-squared test, also known as the chi-squared goodness-of-fit test or chi-squared test for independence. When the chi-squared test is mentioned without any modifiers or without other precluding context, this test is usually meant (for an exact test used in place of the chi-squared test, see Fisher's exact test).
- Yates's correction for continuity, also known as Yates's chi-squared test.
- Cochran-Mantel-Haenszel chi-squared test.
- McNemar's test, used in certain 2 x 2 tables with pairing.
- Linear-by-linear association chi-squared test.
- The portmanteau test in time-series analysis, testing for the presence of autocorrelation.
- Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).

One case where the distribution of the test statistic is an exact chi-squared distribution is the test that the variance of a normally distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.

Contents

1 Chi-squared test for variance in a normal population
2 See also
3 References
4 External links

Chi-squared test for variance in a normal population


If a sample of size n is taken from a population having a normal distribution, then there is a well-known result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-squared distribution with n - 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T at a significance level of 5% is the interval 9.59 to 34.17.
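A minimal Python sketch of this variance test. The 5% acceptance region for 20 degrees of freedom (9.59 to 34.17) is hard-coded from the figures quoted above rather than computed, since computing chi-squared quantiles would need a stats library:

```python
def variance_test_statistic(sample, nominal_variance):
    """T = (sum of squares about the sample mean) / (nominal variance).

    Under the null hypothesis that a normal population has the nominal
    variance, T follows a chi-squared distribution with n - 1 degrees
    of freedom.
    """
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / nominal_variance

# For a sample of size 21 (20 degrees of freedom) at the 5% level,
# the acceptance region quoted in the text is 9.59 < T < 34.17.
def accept_null(t_stat, lower=9.59, upper=34.17):
    return lower < t_stat < upper
```

The interval bounds would change with the sample size and significance level; in practice one would look them up in a chi-squared table or compute them with a statistics package.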

The Chi Square Statistic


Types of Data:

There are basically two types of random variables and they yield two types of data: numerical and categorical. A chi square (X²) statistic is used to investigate whether distributions of categorical variables differ from one another. Basically, categorical variables yield data in categories and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences between these two variables.

Data Type     Question Type                             Possible Responses
Categorical   What is your sex?                         male or female
Numerical     Discrete - How many cars do you own?      two or three
Numerical     Continuous - How tall are you?            72 inches

Notice that discrete data arise from a counting process, while continuous data arise from a measuring process. The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: chi square tests can only be used on actual counts and not on percentages, proportions, means, etc.)
2 x 2 Contingency Table

There are several types of chi square tests depending on the way the data was collected and the hypothesis being tested. We'll begin with the simplest case: a 2 x 2 contingency table. If we set the 2 x 2 table to the general notation shown below in Table 1, using the letters a, b, c, and d to denote the contents of the cells, then we would have the following table:
Table 1. General notation for a 2 x 2 contingency table.

               Variable 1
Variable 2     Category 1    Category 2    Total
Data type 1        a             c         a + c
Data type 2        b             d         b + d
Totals           a + b         c + d       a + b + c + d = N

For a 2 x 2 contingency table the Chi Square statistic is calculated by the formula:

    X² = N(ad - cb)² / [(a + b)(c + d)(a + c)(b + d)]

Note: the four components of the denominator are the four totals from the table columns and rows.

Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug. You conduct the study and collect the following data:

Ho: The proportion of animals whose heart rate increased is independent of drug treatment.
Ha: The proportion of animals whose heart rate increased is associated with drug treatment.

Table 2. Hypothetical drug trial results.

               Heart Rate    No Heart Rate    Total
               Increased     Increase
Treated            36             14            50
Not treated        30             25            55
Total              66             39           105

Applying the formula above we get:

    Chi square = 105[(36)(25) - (14)(30)]² / [(50)(55)(39)(66)] = 3.418

Before we can proceed we need to know how many degrees of freedom we have. When a comparison is made between one sample and another, a simple rule is that the degrees of freedom equal (number of columns minus one) x (number of rows minus one), not counting the totals for rows or columns. For our data this gives (2 - 1) x (2 - 1) = 1.

We now have our chi square statistic (X² = 3.418), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row, we find our value of X² (3.418) lies between 2.706 and 3.841. The corresponding probability is between the 0.10 and 0.05 probability levels. That means that the p-value is above 0.05 (it is actually 0.065). Since a p-value of 0.065 is greater than the conventionally accepted significance level of 0.05 (i.e. p > 0.05), we fail to reject the null hypothesis. In other words, there is no statistically significant difference in the proportion of animals whose heart rate increased.

What would happen if the number of control animals whose heart rate increased dropped to 29 instead of 30 and, consequently, the number of controls whose heart rate did not increase changed from 25 to 26? Try it. Notice that the new X² value is 4.125, and this value exceeds the table value of 3.841 (at 1 degree of freedom and an alpha level of 0.05). This means that p < 0.05 (it is now 0.04), and we reject the null hypothesis in favor of the alternative hypothesis: the heart rate of animals is different between the treatment groups. When p < 0.05 we generally refer to this as a significant difference.

Table 3. Chi Square distribution table.

          probability level (alpha)
Df     0.5      0.10     0.05     0.02     0.01     0.001
1     0.455    2.706    3.841    5.412    6.635    10.827
2     1.386    4.605    5.991    7.824    9.210    13.815
3     2.366    6.251    7.815    9.837   11.345    16.268
4     3.357    7.779    9.488   11.668   13.277    18.465
5     4.351    9.236   11.070   13.388   15.086    20.517

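The 2 x 2 shortcut calculation used in the drug-trial example, with the denominator built from the four marginal totals as noted above, can be sketched in Python:

```python
def chi2_2x2(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 contingency table.

    Cell layout follows Table 1: a and c form the first row, b and d
    the second, so X^2 = N(ad - cb)^2 / [(a+b)(c+d)(a+c)(b+d)].
    """
    n = a + b + c + d
    return n * (a * d - c * b) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Drug trial from Table 2: treated 36/14, not treated 30/25.
chi2 = chi2_2x2(36, 30, 14, 25)   # about 3.418, matching the worked example
```

Changing the control counts to 29 and 26, as suggested above, reproduces the 4.125 value from the text.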

Chi Square Goodness of Fit (One Sample Test)

This test allows us to compare a collection of categorical data with some theoretical expected distribution. This test is often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory. Suppose you performed a simple monohybrid cross between two individuals that were heterozygous for the trait of interest.
Aa x Aa

The results of your cross are shown in Table 4.

Table 4. Results of a monohybrid cross between two heterozygotes for the 'a' gene.

            A      a     Totals
A          10     33       43
a          42     15       57
Totals     52     48      100

The phenotypic ratio is 85 of the A-type and 15 of the a-type (homozygous recessive). In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are our results different?

Calculate the chi square statistic X² by completing the following steps:

1. For each observed number in the table, subtract the corresponding expected number (O - E).
2. Square the difference [(O - E)²].
3. Divide the squares obtained for each cell in the table by the expected number for that cell [(O - E)²/E].
4. Sum all the values for (O - E)²/E. This is the chi square statistic.

For our example, the calculation would be:

          Observed   Expected   O - E   (O - E)²   (O - E)²/E
A-type       85         75        10      100         1.33
a-type       15         25       -10      100         4.0
Total       100        100                            5.33

X² = 5.33

We now have our chi square statistic (X² = 5.33), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row, we find our value of X² (5.33) lies between 3.841 and 5.412. The corresponding probability is 0.02 < P < 0.05. This is smaller than the conventionally accepted significance level of 0.05 or 5%, so the null hypothesis that the two distributions are the same is rejected. In other words, when the computed X² statistic exceeds the critical value in the table for a 0.05 probability level, we can reject the null hypothesis of equal distributions. Since our X² statistic (5.33) exceeded the critical value for the 0.05 probability level (3.841), we can reject the null hypothesis that the observed values of our cross are the same as the theoretical distribution of a 3:1 ratio.
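The four goodness-of-fit steps reduce to a few lines of Python; this sketch uses the observed and expected counts from the monohybrid-cross example:

```python
def chi2_goodness_of_fit(observed, expected):
    """Sum (O - E)^2 / E over all categories (the four steps above)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Monohybrid cross: 85 A-type and 15 a-type observed, versus the
# 3:1 Mendelian expectation of 75 and 25.
stat = chi2_goodness_of_fit([85, 15], [75, 25])   # about 5.33
```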


To put this into context, it means that we do not have a 3:1 ratio of A-type to aa offspring.

Chi Square Test of Independence

For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test of independence the null and alternative hypotheses are:

Ho: The two categorical variables are independent.
Ha: The two categorical variables are related.

We can use the equation

    Chi Square = the sum over all cells of (fo - fe)²/fe

Here fo denotes the frequency of the observed data and fe is the frequency of the expected values. The general table would look something like the one below:
                Category I    Category II    Category III    Row Totals
Sample A            a              b               c          a + b + c
Sample B            d              e               f          d + e + f
Sample C            g              h               i          g + h + i
Column Totals     a+d+g          b+e+h           c+f+i        a+b+c+d+e+f+g+h+i = N

Now we need to calculate the expected value for each cell in the table, and we can do that using the row total times the column total divided by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N. Once the expected values have been calculated for each cell, we can use the same procedure as before for a simple 2 x 2 table.

Suppose you have the following categorical data set.


Table. Incidence of three types of malaria in three tropical regions.

             Asia   Africa   South America   Totals
Malaria A     31      14          45            90
Malaria B      2       5          53            60
Malaria C     53      45           2           100
Totals        86      64         100           250

We could now set up the following table:


Observed   Expected   |O - E|   (O - E)²   (O - E)²/E
   31        30.96      0.04      0.0016    0.0000516
   14        23.04      9.04     81.72      3.546
   45        36.00      9.00     81.00      2.25
    2        20.64     18.64    347.45     16.83
    5        15.36     10.36    107.33      6.99
   53        24.00     29.00    841.00     35.04
   53        34.40     18.60    345.96     10.06
   45        25.60     19.40    376.36     14.70
    2        40.00     38.00   1444.00     36.10

Chi Square = 125.516
Degrees of Freedom = (c - 1)(r - 1) = (2)(2) = 4
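The whole r x c procedure (expected values from row total x column total / N, then summing the cell contributions) can be sketched in Python, applied here to the malaria table:

```python
def chi2_independence(table):
    """Chi-square test of independence for an r x c table of counts.

    Each expected cell value is (row total)(column total)/N; returns
    the chi-square statistic and degrees of freedom (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i, row in enumerate(table)
               for j, obs in enumerate(row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

malaria = [[31, 14, 45],   # Malaria A in Asia, Africa, South America
           [2, 5, 53],     # Malaria B
           [53, 45, 2]]    # Malaria C
chi2, df = chi2_independence(malaria)   # about 125.52 with df = 4
```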


Reject Ho because 125.516 is greater than 9.488 (for alpha = 0.05)

Thus, we would reject the null hypothesis that there is no relationship between location and type of malaria. Our data tell us there is a relationship between type of malaria and location, but that is all it says.

This page was created as part of the Mathbeans Project. The java applets were created by David Eck and modified by Jim Ryan. The Mathbeans Project is funded by a grant from the National Science Foundation DUE-9950473.

D. The Chi-Square Test

About the Chi-Square Test


Generally speaking, the chi-square test is a statistical test used to examine differences with categorical variables. There are a number of features of the social world we characterize through categorical variables - religion, political preference, etc. To examine hypotheses using such variables, use the chi-square test. The chi-square test is used in two similar but distinct circumstances:

a. for estimating how closely an observed distribution matches an expected distribution (we'll refer to this as the goodness-of-fit test)
b. for estimating whether two random variables are independent
The Goodness-of-Fit Test

One of the more interesting goodness-of-fit applications of the chi-square test is to examine issues of fairness and cheating in games of chance, such as cards, dice, and roulette. Since such games usually involve wagering, there is significant incentive for people to try to rig the games and allegations of missing cards, "loaded" dice, and "sticky" roulette wheels are all too common. So how can the goodness-of-fit test be used to examine cheating in gambling? It is easier to describe the process through an example. Take the example of dice. Most dice used in wagering have six sides, with each side having a value of one, two, three, four, five, or six. If the die being used is fair, then the chance of any particular number coming up is the same: 1 in 6. However, if the die is loaded, then certain numbers will have a greater likelihood of appearing, while others will have a lower likelihood.
One night at the Tunisian Nights Casino, renowned gambler Jeremy Turner (a.k.a. The Missouri Master) is having a fantastic night at the craps table. In two hours of playing, he's racked up $30,000 in winnings and is showing no sign of stopping. Crowds are gathering around him to watch his streak - and The Missouri Master is telling anyone within earshot that his good luck is due to the fact that he's using the casino's lucky pair of "bruiser dice," so named because one is black and the other blue.

Unbeknownst to Turner, however, a casino statistician has been quietly watching his rolls and marking down the values of each roll, noting the values of the black and blue dice separately. After 60 rolls, the statistician has become convinced that the blue die is loaded.

Value on Blue Die   Observed Frequency   Expected Frequency
       1                    16                  10
       2                     5                  10
       3                     9                  10
       4                     7                  10
       5                     6                  10
       6                    17                  10
     Total                  60                  60

At first glance, this table would appear to be strong evidence that the blue die was, indeed, loaded. There are more 1's and 6's than expected, and fewer of the other numbers. However, it's possible that such differences occurred by chance. The chi-square statistic can be used to estimate the likelihood that the values observed on the blue die occurred by chance.

The key idea of the chi-square test is a comparison of observed and expected values. How many of something were expected and how many were observed in some process? In this case, we would expect 10 of each number to have appeared and we observed those values in the left column.
With these sets of figures, we calculate the chi-square statistic as follows:

    chi-square = the sum over all categories of (observed - expected)²/expected

Using this formula with the values in the table above gives us a value of 13.6. Lastly, to determine the significance level we need to know the "degrees of freedom." In the case of the chi-square goodness-of-fit test, the number of degrees of freedom is equal to the number of terms used in calculating chi-square minus one. There were six terms in the chi-square for this problem; therefore, the number of degrees of freedom is five.

We then compare the value calculated in the formula above to a standard set of tables. The value returned from the table is 1.8%. We interpret this as meaning that if the die was fair (or not loaded), then the chance of getting a chi-square statistic as large or larger than the one calculated above is only 1.8%. In other words, there's only a very slim chance that these rolls came from a fair die. The Missouri Master is in serious trouble.
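The dice comparison above is a one-line computation in Python, using the observed and expected frequencies from the table:

```python
observed = [16, 5, 9, 7, 6, 17]   # blue-die counts for faces 1..6
expected = [10] * 6               # a fair die: 60 rolls / 6 faces

# Sum of (observed - expected)^2 / expected over the six faces.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1            # six terms minus one
# chi2 is 13.6 with 5 degrees of freedom, matching the text
```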
Recap

To recap the steps used in calculating a goodness-of-fit test with chi-square:


1. Establish hypotheses.
2. Calculate the chi-square statistic. Doing so requires knowing:
   - The number of observations
   - Expected values
   - Observed values
3. Assess the significance level. Doing so requires knowing the number of degrees of freedom.
4. Finally, decide whether to accept or reject the null hypothesis.
Testing Independence

The other primary use of the chi-square test is to examine whether two variables are independent or not. What does it mean to be independent, in this sense? It means that the two factors are not related. Typically in social science research, we're interested in finding factors that are related: education and income, occupation and prestige, age and voting behavior. In this case, the chi-square can be used to assess whether two variables are independent or not. More generally, we say that variable Y is "not correlated with" or "independent of" the variable X if more of one is not associated with more of another. If two categorical variables are correlated, their values tend to move together, either in the same direction or in the opposite.
Example

Return to the example discussed at the introduction to chi-square, in which we want to know whether boys or girls get into trouble more often in school. Below is the table documenting the number of boys and girls who got into trouble in school:

         Got in Trouble   No Trouble   Total
Boys           46             71        117
Girls          37             83        120
Total          83            154        237

To examine statistically whether boys got in trouble in school more often, we need to frame the question in terms of hypotheses.

1. Establish Hypotheses

As in the goodness-of-fit chi-square test, the first step of the chi-square test for independence is to establish hypotheses. The null hypothesis is that the two variables are independent - or, in this particular case that the likelihood of getting in trouble is the same for boys and girls. The alternative hypothesis to be tested is that the likelihood of getting in trouble is not the same for boys and girls.
Cautionary Note

It is important to keep in mind that the chi-square test only tests whether two variables are independent. It cannot address questions of which is greater or less. Using the chi-square test, we cannot evaluate directly the hypothesis that boys get in trouble more than girls; rather, the test (strictly speaking) can only test whether the two variables are independent or not.
2. Calculate the expected value for each cell of the table

As with the goodness-of-fit example described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values. How many of something were expected and how many were observed in some process? In the case of tabular data, however, we usually do not know what the distribution should look like (as we did with rolls of dice). Rather, in this use of the chi-square test, expected values are calculated based on the row and column totals from the table. The expected value for each cell of the table can be calculated using the following formula:

    expected value = (row total x column total) / N

For example, in the table comparing the number of boys and girls in trouble, the expected count for the number of boys who got in trouble is:

    expected = (117 x 83) / 237 = 40.97

The first step, then, in calculating the chi-square statistic in a test for independence is generating the expected value for each cell of the table. Presented in the table below are the expected values (in parentheses) for each cell:
         Got in Trouble   No Trouble    Total
Boys       46 (40.97)      71 (76.02)    117
Girls      37 (42.03)      83 (77.97)    120
Total          83             154        237

3. Calculate Chi-square statistic

With these sets of figures, we calculate the chi-square statistic as follows:

    chi-square = the sum over all cells of (observed - expected)²/expected

In the example above, we get a chi-square statistic equal to:

    chi-square = (46 - 40.97)²/40.97 + (71 - 76.02)²/76.02 + (37 - 42.03)²/42.03 + (83 - 77.97)²/77.97 ≈ 1.87

4. Assess significance level

Lastly, to determine the significance level we need to know the "degrees of freedom." In the case of the chi-square test of independence, the number of degrees of freedom is equal to the number of columns in the table minus one, multiplied by the number of rows in the table minus one. In this table, there were two rows and two columns. Therefore, the number of degrees of freedom is:

    df = (2 - 1) x (2 - 1) = 1

We then compare the value calculated in the formula above to a standard set of tables. The value returned from the table is about p = 0.17. Thus, we cannot reject the null hypothesis, and we conclude that boys are not significantly more likely to get in trouble in school than girls.
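Steps 2 through 4 for the boys/girls table can be sketched in Python, with the expected counts built from row total x column total / N:

```python
table = [[46, 71],   # boys: got in trouble, no trouble
         [37, 83]]   # girls
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over the four cells,
# where expected = row total * column total / N.
chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)
# chi2 is about 1.87 with df = 1, below the 3.841 needed at alpha = 0.05
```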
Recap

To recap the steps used in calculating a chi-square test for independence:

1. Establish hypotheses.
2. Calculate expected values for each cell of the table.
3. Calculate the chi-square statistic. Doing so requires knowing:
   a. The number of observations
   b. Observed values
4. Assess the significance level. Doing so requires knowing the number of degrees of freedom.
5. Finally, decide whether to accept or reject the null hypothesis.

Student's t-test
From Wikipedia, the free encyclopedia

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported. It can be used to determine if two sets of data are significantly different from each other, and is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t distribution.

Contents

1 History
2 Uses
3 Assumptions
4 Unpaired and paired two-sample t-tests
  4.1 Independent samples
  4.2 Paired samples
  4.3 Overlapping samples
5 Calculations
  5.1 One-sample t-test
  5.2 Slope of a regression line
  5.3 Independent two-sample t-test
    5.3.1 Equal sample sizes, equal variance
    5.3.2 Unequal sample sizes, equal variance
    5.3.3 Unequal (or equal) sample sizes, unequal variances
  5.4 Dependent t-test for paired samples
6 Worked examples
  6.1 Unequal variances
  6.2 Equal variances
7 Alternatives to the t-test for location problems
8 Multivariate testing
  8.1 One-sample T² test
  8.2 Two-sample T² test
9 Software implementations
10 See also
11 Notes
12 References
13 Further reading
14 External links
  14.1 Online calculators

History
The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name).[1][2][3] Gosset had been hired due to Claude Guinness's policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes.[2] Gosset devised the t-test as a cheap way to monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was known to fellow statisticians.[4]

Uses
Among the most frequently used t-tests are:

- A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
- A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.[5]
- A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test:[5][6] see paired difference test.
- A test of whether the slope of a regression line differs significantly from 0.

Assumptions
Most t-test statistics have the form t = Z/s, where Z and s are functions of the data. Typically, Z is designed to be sensitive to the alternative hypothesis (i.e., its magnitude tends to be larger when the alternative hypothesis is true), whereas s is a scaling parameter that allows the distribution of t to be determined. As an example, in the one-sample t-test Z = X̄√n/σ, where X̄ is the sample mean of the data, n is the sample size, and σ is the population standard deviation of the data; s in the one-sample t-test is σ̂/σ, where σ̂ is the sample standard deviation.

The assumptions underlying a t-test are that

- Z follows a standard normal distribution under the null hypothesis;
- s² follows a χ² distribution with p degrees of freedom under the null hypothesis, where p is a positive constant;
- Z and s are independent.

In a specific type of t-test, these conditions are consequences of the population being studied, and of the way in which the data are sampled. For example, in the t-test comparing the means of two independent samples, the following assumptions should be met:

- Each of the two populations being compared should follow a normal distribution. This can be tested using a normality test, such as the Shapiro-Wilk or Kolmogorov-Smirnov test, or it can be assessed graphically using a normal quantile plot.
- If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using the F test, Levene's test, Bartlett's test, or the Brown-Forsythe test; or assessable graphically using a Q-Q plot). If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances.[7] Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.
- The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results.

Unpaired and paired two-sample t-tests


Two-sample t-tests for a difference in mean involve independent samples, paired samples and overlapping samples. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared.[8] In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.
Independent samples

The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here; if we contacted 100 people by phone and obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.
Paired samples

Main article: Paired difference test

Paired samples t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical example of the repeated measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure lowering medication. By comparing the same patient's numbers before and after treatment, we are effectively using each patient as their own control. That way the correct rejection of the null hypothesis (here: of no difference made by the treatment) can become much more likely, with statistical power increasing simply because the random between-patient variation has now been eliminated.

Note however that an increase of statistical power comes at a price: more tests are required, each subject having to be tested twice. Because half of the sample now depends on the other half, the paired version of Student's t-test has only n/2 - 1 degrees of freedom (with n being the total number of observations). Pairs become individual test units, and the sample has to be doubled to achieve the same number of degrees of freedom.

A paired samples t-test based on a "matched-pairs sample" results from an unpaired sample that is subsequently used to form a paired sample, by using additional variables that were measured along with the variable of interest.[9] The matching is carried out by identifying pairs of values consisting of one observation from each of the two samples, where the pair is similar in terms of other measured variables. This approach is sometimes used in observational studies to reduce or eliminate the effects of confounding factors. Paired samples t-tests are often referred to as "dependent samples t-tests" (as are t-tests on overlapping samples).
Overlapping samples

An overlapping samples t-test is used when there are paired samples with data missing in one or the other sample (e.g., due to selection of "Don't know" options in questionnaires, or because respondents are randomly assigned to a subset of questions). These tests are widely used in commercial survey research (e.g., by polling companies) and are available in many standard crosstab software packages.

Calculations
Explicit expressions that can be used to carry out various t-tests are given below. In each case, the formula for a test statistic that either exactly follows or closely approximates a t-distribution under the null hypothesis is given. Also, the appropriate degrees of freedom are given in each case. Each of these statistics can be used to carry out either a one-tailed test or a two-tailed test. Once a t value is determined, a p-value can be found using a table of values from Student's t-distribution. If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis is rejected in favor of the alternative hypothesis.

One-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

    t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation and n is the sample size. The degrees of freedom used in this test is n − 1.
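As a quick sketch, the statistic can be computed directly from a sample. The data and the null value 5.0 below are invented for illustration only:

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0, with n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sample; test H0: mean == 5.0
t, df = one_sample_t([5.1, 4.9, 5.3, 5.2, 4.8, 5.4], 5.0)
```

The resulting t is then compared with the t-distribution on df degrees of freedom.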
Slope of a regression line

Suppose one is fitting the model

    Y_i = α + β x_i + ε_i

where the x_i, i = 1, ..., n are known, α and β are unknown, the ε_i are independent identically normally distributed random errors with expected value 0 and unknown variance σ², and the Y_i, i = 1, ..., n are observed. It is desired to test the null hypothesis that the slope β is equal to some specified value β0 (often taken to be 0, in which case the null hypothesis is that x and y are unrelated). Let α̂ and β̂ be the least-squares estimators, and let SE(β̂) denote the standard error of β̂. Then

    t = (β̂ − β0) / SE(β̂)

has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope coefficient,

    SE(β̂) = √( Σ_i (y_i − ŷ_i)² / (n − 2) ) / √( Σ_i (x_i − x̄)² ),

can be written in terms of the residuals. Let

    ε̂_i = y_i − ŷ_i = y_i − (α̂ + β̂ x_i),     SSR = Σ_i ε̂_i²  (the sum of squared residuals).

Then t is given by:

    t = (β̂ − β0) · √(n − 2) / √( SSR / Σ_i (x_i − x̄)² )
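A minimal sketch of the slope test with invented data (scipy.stats.linregress performs an equivalent computation):

```python
import math

def slope_t(x, y, beta0=0.0):
    """t statistic for H0: slope == beta0 in simple linear regression; df = n - 2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    ssr = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return (beta_hat - beta0) / se_beta, n - 2

# Hypothetical data with a clear upward trend
t, df = slope_t([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1])
```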
Independent two-sample t-test

Equal sample sizes, equal variance

This test is only used when both:


the two sample sizes (that is, the number n of participants of each group) are equal;
it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as follows:

    t = (X̄1 − X̄2) / ( s_p · √(2/n) )

where

    s_p = √( (s²_X1 + s²_X2) / 2 )

Here s_p is the grand standard deviation (or pooled standard deviation), 1 = group one, 2 = group two. The denominator of t is the standard error of the difference between two means. For significance testing, the degrees of freedom for this test is 2n − 2, where n is the number of participants in each group.
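A sketch of this pooled, equal-n statistic (the two small samples are invented):

```python
import math
import statistics

def pooled_t_equal_n(x1, x2):
    """Equal-sample-size, equal-variance two-sample t statistic; df = 2n - 2."""
    n = len(x1)
    sp = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(2 / n))
    return t, 2 * n - 2

# Hypothetical samples with the same spread but shifted means
t, df = pooled_t_equal_n([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```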

Unequal sample sizes, equal variance

This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) The t statistic to test whether the means are different can be calculated as follows:

    t = (X̄1 − X̄2) / ( s_p · √(1/n1 + 1/n2) )

where

    s_p = √( ((n1 − 1)s²_X1 + (n2 − 1)s²_X2) / (n1 + n2 − 2) )

Note that the formulae above are generalizations of the case where both samples have equal sizes (substitute n for n1 and n2). s_p is an estimator of the common standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. In these formulae, n_i = number of participants in group i, i = 1 or 2. n_i − 1 is the number of degrees of freedom for either group, and the total sample size minus two (that is, n1 + n2 − 2) is the total number of degrees of freedom, which is used in significance testing.
Unequal (or equal) sample sizes, unequal variances

This test, also known as Welch's t-test, is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately. The t statistic to test whether the population means are different is calculated as:

    t = (X̄1 − X̄2) / s_Δ̄

where

    s_Δ̄ = √( s1²/n1 + s2²/n2 )

Here s_i² is the unbiased estimator of the variance of each of the two samples, and n_i = number of participants in group i, i = 1 or 2. Note that in this case s_Δ̄² is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t-distribution with the degrees of freedom calculated using

    d.f. = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

This is known as the Welch–Satterthwaite equation. The true distribution of the test statistic actually depends (slightly) on the two unknown population variances (see Behrens–Fisher problem).
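The statistic and the Welch–Satterthwaite degrees of freedom can be sketched together (data invented; when the variances and sample sizes are equal, df reduces to 2n − 2):

```python
import math
import statistics

def welch_t(x1, x2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    se2 = v1 / n1 + v2 / n2                      # squared standard error of the difference
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# With equal variances and equal n, df comes out to 2n - 2 = 8
t, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```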
Dependent t-test for paired samples

This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired". This is an example of a paired difference test.

For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group: see table). The average (X̄_D) and standard deviation (s_D) of those differences are used in the equation:

    t = (X̄_D − μ0) / (s_D / √n)

The constant μ0 is non-zero if you want to test whether the average of the differences is significantly different from μ0. The degrees of freedom used is n − 1, where n is the number of pairs.
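A sketch of the paired statistic; the pre/post blood-pressure readings below are invented for illustration:

```python
import math
import statistics

def paired_t(pre, post, mu0=0.0):
    """Paired t statistic on the per-pair differences; df = n - 1 for n pairs."""
    d = [b - a for a, b in zip(pre, post)]
    n = len(d)
    t = (statistics.mean(d) - mu0) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

# Hypothetical readings for five patients before and after medication
t, df = paired_t([140, 152, 138, 145, 150], [132, 148, 135, 139, 144])
```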
Example of matched pairs

    Pair   Name    Age   Test
    1      John    35    250
    1      Jane    36    340
    2      Jimmy   22    460
    2      Jessy   21    200

Worked examples
Let A1 denote a set obtained by taking 6 random samples out of a larger set:

and let A2 denote a second set obtained similarly:

These could be, for example, the weights of screws that were chosen out of a bucket. We will carry out tests of the null hypothesis that the means of the populations from which the two samples were taken are equal. The difference between the two sample means, each denoted by X̄_i, which appears in the numerator for all the two-sample testing approaches discussed above, is

    X̄1 − X̄2 = 0.095.

The sample standard deviations for the two samples are approximately 0.05 and 0.11, respectively. For such small samples, a test of equality between the two population variances would not be very powerful. Since the sample sizes are equal, the two forms of the two-sample t-test will perform similarly in this example.
Unequal variances

If the approach for unequal variances (discussed above) is followed, the results are

    √( s1²/n1 + s2²/n2 ) ≈ 0.0485

and

    d.f. ≈ 7.03.

The test statistic is approximately 1.959. The two-tailed test p-value is approximately 0.091 and the one-tailed p-value is approximately 0.045.
Equal variances

If the approach for equal variances (discussed above) is followed, the results are

    s_p ≈ 0.084

and

    d.f. = 10.

Since the sample sizes are equal (both are 6), the test statistic is again approximately equal to 1.959. Since the degrees of freedom is different from what it is in the unequal variances test, the p-values will differ slightly from what was found above. Here, the two-tailed p-value is approximately 0.078, and the one-tailed p-value is approximately 0.039. Thus if there is good reason to believe that the population variances are equal, the results become somewhat more suggestive of a difference in the mean weights for the two populations of screws.
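The figures above can be checked with a short script. The raw samples are not reproduced in this copy, so the values below are assumed; they are chosen to be consistent with the quoted summary statistics (sample SDs of about 0.05 and 0.11, mean difference 0.095):

```python
import math
import statistics

A1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]  # assumed sample 1
A2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98]  # assumed sample 2

n = 6
v1, v2 = statistics.variance(A1), statistics.variance(A2)
diff = statistics.mean(A1) - statistics.mean(A2)

# Welch (unequal variances) statistic and Welch-Satterthwaite df
se = math.sqrt(v1 / n + v2 / n)
t_welch = diff / se
df_welch = (v1 / n + v2 / n) ** 2 / ((v1 / n) ** 2 / (n - 1) + (v2 / n) ** 2 / (n - 1))

# Pooled (equal variances) statistic; with equal n the t value coincides
sp = math.sqrt((v1 + v2) / 2)
t_pooled = diff / (sp * math.sqrt(2 / n))
```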

Alternatives to the t-test for location problems


The t-test provides an exact test for the equality of the means of two normal populations with unknown, but equal, variances. (Welch's t-test is a nearly exact test for the case where the data are normal but the variances may differ.) For moderately large samples and a one-tailed test, the t-test is relatively robust to moderate violations of the normality assumption.[10]

For exactness, the t-test and Z-test require normality of the sample means, and the t-test additionally requires that the sample variance follows a scaled χ² distribution, and that the sample mean and sample variance be statistically independent. Normality of the individual data values is not required if these conditions are met. By the central limit theorem, sample means of moderately large samples are often well-approximated by a normal distribution even if the data are not normally distributed. For non-normal data, the distribution of the sample variance may deviate substantially from a χ² distribution. However, if the sample size is large, Slutsky's theorem implies that the distribution of the sample variance has little effect on the distribution of the test statistic. If the data are substantially non-normal and the sample size is small, the t-test can give misleading results. See Location test for Gaussian scale mixture distributions for some theory related to one particular family of non-normal distributions.

When the normality assumption does not hold, a non-parametric alternative to the t-test can often have better statistical power. For example, for two independent samples, when the data distributions are asymmetric (that is, the distributions are skewed) or the distributions have large tails, the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) can have three to four times higher power than the t-test.[10][11][12] The nonparametric counterpart to the paired samples t-test is the Wilcoxon signed-rank test for paired samples.
For a discussion on choosing between the t and nonparametric alternatives, see Sawilowsky.[13] One-way analysis of variance generalizes the two-sample t-test when the data belong to more than two groups.

Multivariate testing
Main article: Hotelling's T-squared distribution

A generalization of Student's t statistic, called Hotelling's T-square statistic, allows for the testing of hypotheses on multiple (often correlated) measures within the same sample. For instance, a researcher might submit a number of subjects to a personality test consisting of multiple personality scales (e.g. the Minnesota Multiphasic Personality Inventory). Because measures of this type are usually highly correlated, it is not advisable[citation needed] to conduct separate univariate t-tests to test hypotheses, as these would neglect the covariance among measures and inflate the chance of falsely rejecting at least one hypothesis (Type I error). In this case a single multivariate test is preferable for hypothesis testing. Hotelling's T 2 statistic follows a T 2 distribution. However, in practice the distribution is rarely used,[citation needed] and the statistic is instead converted to an F distribution.
One-sample T 2 test

For a one-sample multivariate test, the hypothesis is that the mean vector (μ) is equal to a given vector (μ0). The test statistic is defined as:[citation needed]

    T² = n (x̄ − μ0)' S⁻¹ (x̄ − μ0)

where n is the sample size, x̄ is the vector of column means and S is a sample covariance matrix.

Two-sample T 2 test

For a two-sample multivariate test, the hypothesis is that the mean vectors (μ1, μ2) of two samples are equal. The test statistic is defined as[citation needed]

    T² = ( n1 n2 / (n1 + n2) ) (x̄1 − x̄2)' Ŝ⁻¹ (x̄1 − x̄2)

where Ŝ is the pooled sample covariance matrix.
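A minimal pure-Python sketch of the one-sample statistic for two-dimensional data (a 2×2 covariance matrix keeps the inverse explicit; in practice numpy would be used, and the data below are invented):

```python
import statistics

def hotelling_one_sample_2d(data, mu0):
    """One-sample Hotelling T-squared for 2-D data: T2 = n (xbar-mu0)' S^-1 (xbar-mu0)."""
    n = len(data)
    xbar = [statistics.mean(col) for col in zip(*data)]
    # Sample covariance matrix S (2x2, n - 1 denominator)
    d = [(x - xbar[0], y - xbar[1]) for x, y in data]
    s11 = sum(a * a for a, _ in d) / (n - 1)
    s22 = sum(b * b for _, b in d) / (n - 1)
    s12 = sum(a * b for a, b in d) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Apply S^-1 to the mean difference, then form the quadratic form
    dx, dy = xbar[0] - mu0[0], xbar[1] - mu0[1]
    qx = (s22 * dx - s12 * dy) / det
    qy = (-s12 * dx + s11 * dy) / det
    return n * (dx * qx + dy * qy)

# Invented bivariate sample
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 4.0), (5.0, 6.0)]
t2_at_mean = hotelling_one_sample_2d(data, (3.0, 4.0))   # 0 at the sample mean
t2_away = hotelling_one_sample_2d(data, (0.0, 0.0))      # positive elsewhere
```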

Software implementations
Most spreadsheet programs and statistics packages, such as QtiPlot, OpenOffice.org Calc, LibreOffice Calc, Microsoft Excel, SAS, SPSS, Stata, DAP, gretl, R, Python ([1]), PSPP, and Minitab, include implementations of Student's t-test.
    Language/Program   Function                                               Notes
    Microsoft Excel    TTEST(array1, array2, tails, type)                     See [2]
    OpenOffice.org     TTEST(data1; data2; mode; type)
    Python             scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)    See [3]

t-test example
Problem: Sam Sleepresearcher hypothesizes that people who are allowed to sleep for only four hours will score significantly lower than people who are allowed to sleep for eight hours on a cognitive skills test. He brings sixteen participants into his sleep lab and randomly assigns them to one of two groups. In one group he has participants sleep for eight hours and in the other group he has them sleep for four. The next morning he administers the SCAT (Sam's Cognitive Ability Test) to all participants. (Scores on the SCAT range from 1-9, with high scores representing better performance.)

SCAT scores:

    x (8 hours)   (x − Mx)²   y (4 hours)   (y − My)²
    5             0           8             16
    7             4           1             9
    5             0           4             0
    3             4           6             4
    5             0           6             4
    3             4           4             0
    3             4           1             9
    9             16          2             4

    Mx = 5,  Σx = 40,  Σ(x − Mx)² = 32
    My = 4,  Σy = 32,  Σ(y − My)² = 46

    t = (Mx − My) / √( ((Σ(x − Mx)² + Σ(y − My)²) / (n1 + n2 − 2)) · (1/n1 + 1/n2) )
      = (5 − 4) / √( (78/14) · (1/8 + 1/8) )
      ≈ 0.85

(According to the t significance/probability table with df = 14, t must be at least 2.145 to reach p < .05, so this difference is not statistically significant.) Interpretation: Sam's hypothesis was not confirmed. He did not find a significant difference in cognitive test performance between those who slept for four hours and those who slept for eight hours.
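The arithmetic can be checked with a short script using the pooled-variance form of the two-sample test:

```python
import math

x = [5, 7, 5, 3, 5, 3, 3, 9]   # 8 hours sleep group
y = [8, 1, 4, 6, 6, 4, 1, 2]   # 4 hours sleep group

n = len(x)
mx, my = sum(x) / n, sum(y) / n
ssx = sum((v - mx) ** 2 for v in x)
ssy = sum((v - my) ** 2 for v in y)
sp2 = (ssx + ssy) / (2 * n - 2)               # pooled variance, df = 14
t = (mx - my) / math.sqrt(sp2 * (2 / n))
# |t| < 2.145 (the df = 14 critical value), so the difference is not significant
```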


Some Examples of Statistical Analysis Using a t-Test


Example #1 A researcher wishes to learn if a certain drug slows the growth of tumors. She obtained mice with tumors and randomly divided them into two groups. She then injected one group of mice with the drug and used the second group as a control. After 2 weeks, she sacrificed the mice and weighed the tumors. The weight of tumors for each group of mice is below. The researcher is interested in learning if the drug reduces the growth of tumors. Her hypothesis is: The mean weight of tumors from mice in group A will be less than the mean weight of tumors from mice in group B.

    Group A (treated with drug):     0.72 0.68 0.69 0.66 0.57 0.66 0.70 0.63 0.71 0.73    Mean = 0.675
    Group B (control, not treated):  0.71 0.83 0.89 0.57 0.68 0.74 0.75 0.67 0.80 0.78    Mean = 0.742

A t-test can be used to test the probability that the two means do not differ. The alternative is that tumors from the group treated with the drug will weigh less than tumors from the control group. This is a one-tailed test because the researcher is interested in whether the drug decreased tumor size, not merely in whether it changed tumor size in either direction.
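The calculation can be sketched directly from the table values (pooled form, since the two groups have equal size):

```python
import math
import statistics

group_a = [0.72, 0.68, 0.69, 0.66, 0.57, 0.66, 0.70, 0.63, 0.71, 0.73]  # treated
group_b = [0.71, 0.83, 0.89, 0.57, 0.68, 0.74, 0.75, 0.67, 0.80, 0.78]  # control

n = 10
sp2 = (statistics.variance(group_a) + statistics.variance(group_b)) / 2  # pooled (equal n)
t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(sp2 * 2 / n)
# df = 18; one-tailed critical value t(0.05, 18) is about 1.734,
# so t < -1.734 supports the hypothesis that the drug group's tumors weigh less
```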

The values from the table above are entered into the spreadsheet as shown below.

The t-test shows that tumors from the drug group were significantly smaller than the tumors from the control group because p < 0.05. The researcher therefore accepts her hypothesis that the drug reduces the growth of tumors.

Example #2 A researcher wishes to learn whether the pH of soil affects seed germination of a particular herb found in forests near her home. She filled ten flower pots with acid soil (pH 5.5) and ten flower pots with neutral soil (pH 7.0) and planted 100 seeds in each pot. The number of seeds that germinated in each pot is below.

    Acid soil (pH 5.5):     42 45 40 37 41 41 48 50 45 46    Mean = 43.5
    Neutral soil (pH 7.0):  43 51 56 40 32 54 51 55 50 48    Mean = 48

The researcher is testing whether soil pH affects germination of the herb. Her hypothesis is: The mean germination at pH 5.5 is different than the mean germination at pH 7.0. A t-test can be used to test the probability that the two means do not differ. The alternative is that the means differ; one of them is greater than the other. This is a two-tailed test because the researcher is interested in whether soil acidity changes the germination percentage; she does not specify whether it increases or decreases germination. Notice that a 2 is entered for the number of tails below.
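The two-tailed comparison can be sketched the same way as the previous example:

```python
import math
import statistics

acid = [42, 45, 40, 37, 41, 41, 48, 50, 45, 46]       # pH 5.5
neutral = [43, 51, 56, 40, 32, 54, 51, 55, 50, 48]    # pH 7.0

n = 10
sp2 = (statistics.variance(acid) + statistics.variance(neutral)) / 2  # pooled (equal n)
t = (statistics.mean(acid) - statistics.mean(neutral)) / math.sqrt(sp2 * 2 / n)
# df = 18; two-tailed critical value t(0.025, 18) is about 2.101,
# so |t| < 2.101 corresponds to p > 0.05
```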

The t-test shows that the mean germination of the two groups does not differ significantly because p > 0.05. The researcher concludes that pH does not affect germination of the herb.

Example #3
Suppose that a researcher wished to learn if a particular chemical is toxic to a certain species of beetle. She believes that the chemical might interfere with the beetles' reproduction. She obtained beetles and divided them into two groups. She then fed one group of beetles the chemical and used the second group as a control. After 2 weeks, she counted the number of eggs produced by each beetle in each group. The egg counts for each group of beetles are below.

    Group 1 (fed chemical):               33 31 34 38 32 28    Mean = 32.7
    Group 2 (not fed chemical, control):  35 42 43 41          Mean = 40.3

The researcher believes that the chemical interferes with beetle reproduction. She suspects that the chemical reduces egg production. Her hypothesis is: The mean number of eggs in group 1 is less than the mean number of group 2. A t-test can be used to test the probability that the two means do not differ. The alternative is that the mean of group 1 is less than the mean of group 2. This is a 1-tailed test because her hypothesis proposes that group 2 will have greater reproduction than group 1. If she had proposed that the two groups would have different reproduction but was not sure which group would be greater, then it would be a 2-tailed test. Notice that a 1 is entered for the number of tails below. The results of her t-test are copied below.

The researcher concludes that the mean of group 1 is significantly less than the mean for group 2 because p < 0.05. She accepts her hypothesis that the chemical reduces egg production because group 1 had significantly fewer eggs than the control.
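Because the two groups have different sizes here, the pooled variance uses the general unequal-n form:

```python
import math
import statistics

fed = [33, 31, 34, 38, 32, 28]   # group 1, fed chemical
control = [35, 42, 43, 41]       # group 2, control

n1, n2 = len(fed), len(control)
sp2 = ((n1 - 1) * statistics.variance(fed) +
       (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
t = (statistics.mean(fed) - statistics.mean(control)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
# df = 8; one-tailed critical value t(0.05, 8) is about 1.860,
# so t < -1.860 supports the hypothesis that group 1 produces fewer eggs
```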

The One-Sample t Test


We are going to use an example for the one-sample t-test about whether the academic staff at the UNE psychology department publish differently than the national average of 8 publications per staff member between 1974 and 1996. A random sample of size 5 was taken from the 17 members of academic staff here at UNE. The data obtained were: 3, 21, 1, 15, and 7. The sample mean is x̄ = 9.4 and the sample standard deviation, s, is 8.41. The question is whether this sample mean of 9.4 gives us reason to believe that UNE academic staff publish differently than the national average of 8. Descriptively it might seem so, but had a different random sample of size 5 been collected, the sample mean would have been different. What we need is a way of determining whether the difference between 9.4 and 8 is a real difference or merely an apparent difference. If it can be considered a real difference, we have evidence to say that UNE academic staff publish differently than the national average of 8 publications each. However, if the difference of 1.4 could easily be explained as a chance fluctuation, then we cannot say that UNE staff publish differently than the national average. The one-sample t test gives us a way of answering that question. The data

    Staff Member   Publications
    1              7
    2              3
    3              44
    4              4
    5              21
    6              11
    7              10
    8              3
    9              0
    10             6
    11             0
    12             1
    13             2
    14             15
    15             15
    16             0
    17             28

    Mean = 10, SD = 11.52

Figure 6.6 Data for one-sample t-test example.

First, we specify the null hypothesis to be tested: H0: μ = 8. The alternative hypothesis is non-directional or "two-tailed": H1: μ ≠ 8. We've taken a random sample of size 5 from our population (whose mean we are pretending for this example we don't know) and computed the sample mean. Now, we ask a simple question: What is the probability of getting a sample mean of 9.4 or larger OR 6.6 or smaller if the population mean is actually 8? Before I suggest a way of answering this question, let me first talk about why we are

interested in the "6.6 or smaller." The alternative hypothesis is phrased non-directionally or "two-tailed." This means that we have no particular interest in whether the actual population mean is larger or smaller than the null hypothesised value of 8. What we are interested in is how probable it is to get a sample mean that deviates 1.4 units (from 9.4 minus 8) from the null hypothesised value of the population. Because we have no interest in the direction of the deviation, we consider 1.4 units in the opposite direction as well. This is the nature of a non-directional hypothesis test. Of interest is the probability of getting a deviation from the null hypothesised population mean this large or larger, regardless of the direction of the difference.

In deciding whether or not the data are consistent with the null hypothesis, we need to consider sample mean deviations on either side of the null hypothesised population mean. We call this a two-tailed test. Had we phrased the alternative hypothesis directionally, because we were interested in a particular direction of the deviation, we'd conduct a one-tailed or directional test and only consider deviations in that direction of interest when computing the probability. For example, if we were interested in the alternative that UNE staff publish more than the national average, our alternative hypothesis would be H1: μ > 8 and we'd be interested only in the probability of getting a sample mean of 9.4 or larger. We would ignore evidence in the other direction even if the deviation in the other direction is substantial.

The issue of whether to use two-tailed tests or one-tailed tests is a thorny one. Experts around the world cannot agree. For our purposes, we will always be using two-tailed tests. In the School of Psychology at UNE, two-tailed tests are the standard approach even when the research hypotheses are phrased in a directional manner.
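The UNE example can be worked through with the one-sample formula from earlier in the document:

```python
import math
import statistics

sample = [3, 21, 1, 15, 7]   # publications of the 5 sampled staff
mu0 = 8                      # national average under H0

n = len(sample)
t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
# df = 4; two-tailed critical value t(0.025, 4) is about 2.776,
# so |t| < 2.776 means the difference of 1.4 is easily explained by chance
```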

F-test
From Wikipedia, the free encyclopedia Jump to: navigation, search

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.[1]

Contents

1 Common examples of F-tests
  1.1 F-test of the equality of two variances
2 Formula and calculation
  2.1 Multiple-comparison ANOVA problems
  2.2 Regression problems
3 One-way ANOVA example
4 ANOVA's robustness with respect to Type I errors for departures from population normality
5 References
6 External links

Common examples of F-tests


Examples of F-tests include:

The hypothesis that the means of several normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).
The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.
The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
Scheffé's method for multiple comparisons adjustment in linear models.

F-test of the equality of two variances

Main article: F-test of equality of variances

This F-test is sensitive to non-normality.[2][3] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.[4]

Formula and calculation


Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares

reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others: if the F-test is performed at level α, we cannot state that the treatment pair with the greatest mean difference is significantly different at level α. The formula for the one-way ANOVA F-test statistic is

    F = (explained variance) / (unexplained variance)

or

    F = (between-group variability) / (within-group variability).

The "explained variance", or "between-group variability", is

    Σ_i n_i (Ȳ_i − Ȳ)² / (K − 1)

where Ȳ_i denotes the sample mean in the ith group, n_i is the number of observations in the ith group, Ȳ denotes the overall mean of the data, and K denotes the number of groups.

The "unexplained variance", or "within-group variability", is

    Σ_ij (Y_ij − Ȳ_i)² / (N − K)

where Y_ij is the jth observation in the ith out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1, N − K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value. Note that when there are only two groups for the one-way ANOVA F-test, F = t², where t is the Student's t statistic.
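The two mean squares and their ratio can be sketched in a few lines (scipy.stats.f_oneway computes the same statistic):

```python
def one_way_f(groups):
    """One-way ANOVA F statistic with (K - 1, N - K) degrees of freedom."""
    K = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (K - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (N - K)
    return between / within

# With two groups, F equals t squared (t = -2 for these samples, so F = 4)
F = one_way_f([[1, 2, 3, 4, 5], [3, 4, 5, 6, 7]])
```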
Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has p1 parameters, and model 2 has p2 parameters, where p2 > p1, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. (We use the convention that any constant parameter in a model is included when counting the parameters. For instance, the simple linear model y = mx + b has p = 2 under this convention.) The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test. If there are n data points to estimate parameters of both models from, then one can calculate the F statistic, given by

    F = ( (RSS1 − RSS2) / (p2 − p1) ) / ( RSS2 / (n − p2) )

where RSSi is the residual sum of squares of model i. If the regression model has been calculated with weights, then replace RSSi with χ², the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution, with (p2 − p1, n − p2) degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
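A toy sketch of the nested comparison, testing a constant-only model (p1 = 1) against a straight line (p2 = 2) on invented data:

```python
def nested_f(rss1, p1, rss2, p2, n):
    """F statistic for nested least-squares models; df = (p2 - p1, n - p2)."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))

# Invented data; model 1: y = b, model 2: y = m*x + b
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)
ybar = sum(y) / n
rss1 = sum((yi - ybar) ** 2 for yi in y)                 # constant model residuals
xbar = sum(x) / n
m = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b = ybar - m * xbar
rss2 = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))  # line residuals
F = nested_f(rss1, 1, rss2, 2, n)
```

For this comparison, F equals the square of the slope t statistic computed earlier, as expected for a single added parameter.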

One-way ANOVA example


Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could

write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.

    a1:  6  8  4  5  3  4
    a2:  8 12  9 11  6  8
    a3: 13  9 11  8  7 12

The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:

Step 1: Calculate the mean within each group:

    Ȳ1 = (6 + 8 + 4 + 5 + 3 + 4)/6 = 5
    Ȳ2 = (8 + 12 + 9 + 11 + 6 + 8)/6 = 9
    Ȳ3 = (13 + 9 + 11 + 8 + 7 + 12)/6 = 10

Step 2: Calculate the overall mean:

    Ȳ = (Ȳ1 + Ȳ2 + Ȳ3)/a = (5 + 9 + 10)/3 = 8

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squares:

    S_B = n(Ȳ1 − Ȳ)² + n(Ȳ2 − Ȳ)² + n(Ȳ3 − Ȳ)² = 6(5 − 8)² + 6(9 − 8)² + 6(10 − 8)² = 84

where n is the number of data values per group. The between-group degrees of freedom is one less than the number of groups,

    f_B = 3 − 1 = 2,

so the between-group mean square value is

    MS_B = 84/2 = 42.

Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group:

    a1: 6 − 5 = 1,   8 − 5 = 3,    4 − 5 = −1,  5 − 5 = 0,   3 − 5 = −2,  4 − 5 = −1
    a2: 8 − 9 = −1,  12 − 9 = 3,   9 − 9 = 0,   11 − 9 = 2,  6 − 9 = −3,  8 − 9 = −1
    a3: 13 − 10 = 3, 9 − 10 = −1,  11 − 10 = 1, 8 − 10 = −2, 7 − 10 = −3, 12 − 10 = 2

The within-group sum of squares is the sum of squares of all 18 values in this table:

    S_W = 16 + 24 + 28 = 68.

The within-group degrees of freedom is

    f_W = a(n − 1) = 3(6 − 1) = 15,

thus the within-group mean square value is

    MS_W = 68/15 ≈ 4.5.

Step 5: The F-ratio is

    F = MS_B / MS_W = 42/4.5 ≈ 9.3.

The critical value is the number that the test statistic must exceed to reject the test. In this case, Fcrit(2,15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002. After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is √(4.5/6 + 4.5/6) ≈ 1.2. Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
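The five steps above can be reproduced with a short script:

```python
a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
a3 = [13, 9, 11, 8, 7, 12]
groups = [a1, a2, a3]

K = len(groups)                                  # number of groups
N = sum(len(g) for g in groups)                  # total observations
grand = sum(sum(g) for g in groups) / N          # overall mean
means = [sum(g) / len(g) for g in groups]        # Step 1
ms_between = sum(len(g) * (m - grand) ** 2
                 for g, m in zip(groups, means)) / (K - 1)   # Steps 3: 84/2 = 42
ms_within = sum((x - m) ** 2
                for g, m in zip(groups, means) for x in g) / (N - K)  # Step 4: 68/15
F = ms_between / ms_within                       # Step 5
```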

Note F(x, y) denotes an F-distribution with x degrees of freedom in the numerator and y degrees of freedom in the denominator.

ANOVA's robustness with respect to Type I errors for departures from population normality
The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance. None of these F-tests, however, are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts.[5] Furthermore, if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely.[6] For nonparametric alternatives in the factorial layout, see Sawilowsky.[7] For more discussion see ANOVA on ranks.

F Test Example

An F test is generally defined as the ratio of the variances of two given sets of values. First calculate the standard deviation and variance of each set. The standard deviation is represented by the symbol σ, and the variance is the square of the standard deviation. The F test is then

    F = (variance of set 1) / (variance of set 2).

For example, calculate the F test for the sets 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.

Variance of the first set (10, 20, 30, 40, 50):

    Total inputs (N) = 5
    Mean (xm) = (x1 + x2 + ... + xN)/N = 150/5 = 30
    SD = √( (1/(N − 1)) · ((x1 − xm)² + (x2 − xm)² + ... + (xN − xm)²) )
       = √( (1/4) · ((10 − 30)² + (20 − 30)² + (30 − 30)² + (40 − 30)² + (50 − 30)²) )
       = √( (1/4) · (400 + 100 + 0 + 100 + 400) )
       = √250 ≈ 15.8114
    Variance = SD² = 250

Variance of the second set (5, 10, 15, 20, 25):

    Total inputs (N) = 5
    Mean (xm) = 75/5 = 15
    SD = √( (1/4) · ((5 − 15)² + (10 − 15)² + (15 − 15)² + (20 − 15)² + (25 − 15)²) )
       = √( (1/4) · (100 + 25 + 0 + 25 + 100) )
       = √62.5 ≈ 7.9057
    Variance = SD² = 62.5

To calculate the F test:

    F = (variance of 10, 20, 30, 40, 50) / (variance of 5, 10, 15, 20, 25) = 250/62.5 = 4.

The required F test value is 4.
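The walkthrough above reduces to two variance calls:

```python
import statistics

set1 = [10, 20, 30, 40, 50]
set2 = [5, 10, 15, 20, 25]

# statistics.variance uses the same N - 1 denominator as the worked example
f = statistics.variance(set1) / statistics.variance(set2)
```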

Two-Sample F-Test
In order to compare two methods, it is often important to know whether the variabilities of both methods are the same. To compare two variances v1 and v2, one calculates the ratio of the two variances. This ratio is called the F-statistic (in honor of R. A. Fisher) and follows an F-distribution:

F = v1/v2

The null hypothesis H0 assumes that the variances are equal, so the ratio F is one. The alternative hypothesis H1 assumes that v1 and v2 differ, so the ratio deviates from unity. The F-test rests on two assumptions: (1) the samples are normally distributed, and (2) the samples are independent of each other. When these assumptions are fulfilled and H0 is true, the statistic F follows an F-distribution. The following is a decision table for the application of an F-test. To obtain an F-quantile, or an associated probability, refer to an F table or to the distribution calculator of Teach/Me.

Remarks:

When the normality assumption is not fulfilled, one should use a non-parametric method; in general the F-test is more sensitive to deviations from normality than the t-test. The F-test can be used to check the equal-variance assumption needed for the two-sample t-test, but non-rejection of H0 does not imply that the assumption of equal variances is valid, since the probability of a Type II error is unknown.

Example: Suppose you have two series of measurements, one with 10 observations and one with 13 observations. The variance of the first series is 0.88, and the variance of the second series is 1.79. Is the variance of the second series significantly larger than the variance of the first series (at a significance level of 0.05)? To check this, we take as the null hypothesis that the variance of the second series is not larger than the variance of the first. The alternative hypothesis is that the second variance is indeed larger than the first. Next we calculate the F statistic: F = 1.79/0.88 = 2.034. Now we compare the F statistic with the critical value at the 5 percent level of significance, with 13 - 1 = 12 numerator and 10 - 1 = 9 denominator degrees of freedom. Using the distribution calculator we find a critical value of 3.073. Since F is only 2.034, we cannot reject the null hypothesis: the second variance is not significantly larger than the first.
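The decision steps of this example can be sketched in Python. The critical value 3.073 is taken from the text (0.05 right tail, 12 and 9 degrees of freedom) rather than computed, to avoid depending on a statistics package.

```python
# One-sided F-test of H0: var2 <= var1 against H1: var2 > var1,
# using the numbers from the example above.
n1, var1 = 10, 0.88   # first series: 10 observations
n2, var2 = 13, 1.79   # second series: 13 observations

# Larger (second) variance in the numerator for a right-tail test
f_stat = var2 / var1                 # ~ 2.034
df_num, df_den = n2 - 1, n1 - 1      # 12 and 9

# Critical value F(0.05; 12, 9), as quoted in the text
f_crit = 3.073

reject_h0 = f_stat > f_crit
print(round(f_stat, 3), reject_h0)   # 2.034 False
```

Since 2.034 < 3.073, the test fails to reject H0, matching the conclusion above.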

Stats: F-Test
The F-distribution is formed by the ratio of two independent chi-squared variables, each divided by its respective degrees of freedom. Since F is formed from chi-squared variables, many of the chi-squared properties carry over to the F-distribution.

The F-values are all non-negative.
The distribution is non-symmetric.
The mean is approximately 1.
There are two independent degrees of freedom, one for the numerator and one for the denominator.
There are many different F-distributions, one for each pair of degrees of freedom.
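These properties can be checked by simulation with the standard library: an F variate is the ratio of two independent chi-squared variates (sums of squared standard normals), each divided by its degrees of freedom. The degrees of freedom 5 and 20 are arbitrary illustration choices.

```python
import random

random.seed(42)

def chi_squared(df):
    """One chi-squared variate: sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

d1, d2 = 5, 20   # arbitrary numerator/denominator degrees of freedom
samples = [(chi_squared(d1) / d1) / (chi_squared(d2) / d2)
           for _ in range(20000)]

# All F-values are non-negative, and the mean is approximately 1
# (exactly d2/(d2-2) = 20/18 ~ 1.11 when d2 > 2).
print(min(samples) >= 0)
print(abs(sum(samples) / len(samples) - d2 / (d2 - 2)) < 0.1)
```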

F-Test
The F-test is designed to test whether two population variances are equal. It does this by examining the ratio of the two variances: if the variances are equal, the ratio will be 1.

All hypothesis testing is done under the assumption that the null hypothesis is true. If the null hypothesis is true, then the F test statistic given above simplifies dramatically: the ratio of the sample variances is the test statistic used. If that ratio deviates far from 1, we reject the null hypothesis that the population variances are equal.

There are several different F-tables, one for each level of significance. So, find the correct level of significance first, and then look up the numerator degrees of freedom and the denominator degrees of freedom to find the critical value. You will notice that the tables only give levels of significance for right-tail tests. Because the F-distribution is not symmetric and has no negative values, you may not simply take the opposite of the right critical value to find the left critical value. To find a left critical value, reverse the degrees of freedom, look up the right critical value, and then take the reciprocal of that value. For example, the critical value with 0.05 on the left with 12 numerator and 15 denominator degrees of freedom is found by taking the reciprocal of the critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of freedom.
Avoiding Left Critical Values

Since the left critical values are a pain to calculate, they are often avoided altogether. This is the procedure followed in the textbook. You can force the F-test into a right-tail test by placing the sample with the larger variance in the numerator and the sample with the smaller variance in the denominator. It does not matter which sample has the larger sample size, only which sample has the larger variance. The numerator degrees of freedom are those of the sample with the larger variance (since it is in the numerator), and the denominator degrees of freedom are those of the sample with the smaller variance (since it is in the denominator). If a two-tail test is being conducted, you still have to divide alpha by 2, but you look up and compare only the right critical value.

Assumptions / Notes

The larger variance should always be placed in the numerator.
The test statistic is F = s1^2 / s2^2, where s1^2 > s2^2.
Divide alpha by 2 for a two-tail test and then find the right critical value.
If standard deviations are given instead of variances, they must be squared.

When the exact degrees of freedom aren't given in the table, go with the value with the larger critical value (this happens to be the smaller degrees of freedom), so that you are less likely to reject in error (Type I error).
The populations from which the samples were obtained must be normal.
The samples must be independent.
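The procedure in the notes above, with the larger variance in the numerator and the degrees of freedom ordered to match, can be sketched as a small helper. The function name and the sample data are illustrative, not from the source.

```python
import statistics

def f_test_statistic(sample_a, sample_b):
    """Return (F, df_num, df_den) with the larger sample variance in
    the numerator, forcing a right-tail test as described above."""
    va = statistics.variance(sample_a)
    vb = statistics.variance(sample_b)
    if va >= vb:
        return va / vb, len(sample_a) - 1, len(sample_b) - 1
    return vb / va, len(sample_b) - 1, len(sample_a) - 1

# The sample with the larger variance supplies the numerator degrees
# of freedom, regardless of which sample has more observations.
a = [3.1, 2.9, 3.4, 3.0, 3.2]          # 5 values, small spread
b = [2.0, 4.5, 1.5, 5.0, 3.5, 2.5]     # 6 values, larger spread
f, df_num, df_den = f_test_statistic(a, b)
print(f > 1, df_num, df_den)           # True 5 4
```

Because the ratio is always at least 1, only the right-tail critical value (with alpha halved for a two-tail test) needs to be looked up.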
