
Hypothesis Testing for Pearson r

Assumptions
Data originated from a random sample.
Data are interval/ratio.
Both variables are normally distributed.
Linear relationship and homoscedasticity.
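For concreteness, here is a minimal Python sketch of the Pearson r hypothesis test (not part of the original notes). It uses SciPy's pearsonr, which returns the sample correlation and the two-sided p-value for the null hypothesis that the population correlation is zero; the data values are hypothetical.

```python
# Minimal sketch of a hypothesis test for Pearson r (hypothetical data).
from scipy import stats

x = [2.1, 3.4, 4.0, 5.2, 6.1, 7.3, 8.0, 9.4]   # interval/ratio variable 1
y = [1.9, 3.0, 4.4, 4.9, 6.5, 7.0, 8.2, 9.1]   # interval/ratio variable 2

r, p_value = stats.pearsonr(x, y)   # H0: population correlation rho = 0
print(f"r = {r:.3f}, p = {p_value:.4f}")
```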

Homoscedasticity
From Wikipedia, the free encyclopedia

[Figure: plot with random data showing homoscedasticity.]

In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used.[1]

The assumption of homoscedasticity simplifies mathematical and computational treatment. Serious violations of homoscedasticity (assuming a distribution of data is homoscedastic when it is actually heteroscedastic) may result in overestimating the goodness of fit as measured by the Pearson coefficient.


Assumptions of a regression model

As used in describing simple linear regression analysis, one assumption of the fitted model (to ensure that the least-squares estimators are each a best linear unbiased estimator of the respective population parameters, by the Gauss–Markov theorem) is that the standard deviations of the error terms are constant and do not depend on the x-value. Consequently, each probability distribution for y (response variable) has the same standard deviation regardless of the x-value (predictor). In short, this assumption is homoscedasticity. Homoscedasticity is not required for the estimates to be unbiased, consistent, and asymptotically normal.
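One common way to check this assumption in practice is to test the regression residuals for constant variance. The sketch below (not from the original text) uses the Breusch–Pagan test from statsmodels on simulated data with constant error variance; the data and model are purely illustrative.

```python
# Hedged sketch: checking homoscedasticity of OLS residuals with the
# Breusch-Pagan test (statsmodels). Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 200)   # constant error variance

X = sm.add_constant(x)                         # design matrix with intercept
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")  # large p -> no evidence of heteroscedasticity
```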

Hypothesis Test for Regression Slope


This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y. The test focuses on the slope of the regression line Y = β0 + β1X, where β0 is a constant, β1 is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable. If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.

Test Requirements
The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met:
The dependent variable Y has a linear relationship to the independent variable X.
For each value of X, the probability distribution of Y has the same standard deviation σ.
For any given value of X, the Y values are independent.
For any given value of X, the Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is acceptable if the sample size is large.

Previously, we described how to verify that regression requirements are met.
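A minimal sketch of this slope test (not from the original lesson) is shown below. SciPy's linregress fits the least-squares line and reports the standard error of the slope and the two-sided p-value for the null hypothesis β1 = 0; the data values are hypothetical.

```python
# Minimal sketch: test whether the regression slope differs from zero.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 2.8, 4.5, 4.1, 5.9, 6.3, 7.8, 8.1]   # hypothetical data

result = stats.linregress(x, y)                 # p-value tests H0: beta1 = 0
print(f"slope b1 = {result.slope:.3f}, SE = {result.stderr:.3f}, p = {result.pvalue:.4f}")
```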

Regression Slope: Confidence Interval


This lesson describes how to construct a confidence interval around the slope of a regression line. We focus on the equation for simple linear regression, which is ŷ = b0 + b1x, where b0 is a constant, b1 is the slope (also called the regression coefficient), x is the value of the independent variable, and ŷ is the predicted value of the dependent variable.

Estimation Requirements
The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met:
The dependent variable Y has a linear relationship to the independent variable X.
For each value of X, the probability distribution of Y has the same standard deviation σ.
For any given value of X, the Y values are independent.
For any given value of X, the Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is acceptable if the sample size is large.

Previously, we described how to verify that regression requirements are met.
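As an illustration (not from the original lesson), a 95% confidence interval for the slope can be built as b1 ± t*·SE(b1) with n − 2 degrees of freedom. The sketch below uses SciPy with hypothetical data.

```python
# Sketch: 95% confidence interval for the regression slope (hypothetical data).
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 2.8, 4.5, 4.1, 5.9, 6.3, 7.8, 8.1]

res = stats.linregress(x, y)
df = len(x) - 2
t_crit = stats.t.ppf(0.975, df)              # critical value for 95% confidence
lower = res.slope - t_crit * res.stderr
upper = res.slope + t_crit * res.stderr
print(f"b1 = {res.slope:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```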

Prerequisites for Regression


Simple linear regression is appropriate when the following conditions are satisfied:
The dependent variable Y has a linear relationship to the independent variable X. To check this, make sure that the XY scatterplot is linear and that the residual plot shows a random pattern.
For each value of X, the probability distribution of Y has the same standard deviation σ. When this condition is satisfied, the variability of the residuals will be relatively constant across all values of X, which is easily checked in a residual plot.
For any given value of X, the Y values are independent, as indicated by a random pattern on the residual plot.
For any given value of X, the Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is acceptable if the sample size is large. A histogram or a dotplot will show the shape of the distribution.
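The graphical checks mentioned above can be produced in a few lines. The sketch below (not from the original text) draws an XY scatterplot, a residual plot, and a histogram with matplotlib; the data are simulated for illustration.

```python
# Hedged sketch of the graphical regression checks (simulated data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 1.0 + 0.8 * x + rng.normal(0, 1.0, 100)

res = stats.linregress(x, y)
residuals = y - (res.intercept + res.slope * x)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, y)
axes[0].set_title("XY scatterplot")            # check linearity
axes[1].scatter(x, residuals)
axes[1].axhline(0, color="gray")
axes[1].set_title("Residuals vs X")            # check constant spread, random pattern
axes[2].hist(y, bins=15)
axes[2].set_title("Histogram of Y")            # check rough normality
plt.tight_layout()
plt.show()
```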


Hypothesis Test for a Mean


This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:
The sampling method is simple random sampling.
The sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply:
The population distribution is normal.
The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
The sample data are moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
The sample size is greater than 40, without outliers.
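A minimal sketch of this test (not from the original lesson) uses SciPy's one-sample t-test; the sample values and the hypothesized mean are hypothetical.

```python
# Minimal sketch: one-sample t-test of a mean (hypothetical data).
from scipy import stats

sample = [9.8, 10.1, 10.4, 9.7, 10.3, 9.9, 10.2, 10.0, 10.5, 9.6]
t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)   # H0: mu = 10.0
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```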

Two-sample t-test

The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:
The sampling method for each sample is simple random sampling.
The samples are independent.
Each population is at least 10 times larger than its respective sample.
Each sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normal if any of the following conditions apply:
The population distribution is normal.
The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
The sample data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
The sample size is greater than 40, without outliers.
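For illustration (not from the original lesson), the sketch below runs an independent two-sample t-test with SciPy; equal_var=False gives the Welch variant, and the group values are hypothetical.

```python
# Sketch: two-sample t-test for independent samples (hypothetical data).
from scipy import stats

group_a = [23.1, 25.4, 22.8, 26.0, 24.5, 23.9, 25.1]
group_b = [21.0, 22.3, 20.8, 23.1, 21.9, 22.5, 20.4]

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # H0: mu_a = mu_b
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```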

Hypothesis Test: Difference Between Paired Means


This lesson explains how to conduct a hypothesis test for the difference between paired means. The test procedure, called the matched-pairs t-test, is appropriate when the following conditions are met:
The sampling method for each sample is simple random sampling.
The test is conducted on paired data. (As a result, the data sets are not independent.)
Each sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normal if any of the following conditions apply:
The population distribution is normal.
The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
The sample data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
The sample size is greater than 40, without outliers.
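A minimal sketch of the matched-pairs t-test (not from the original lesson) is shown below; it works on the differences between paired observations, and the before/after values are hypothetical.

```python
# Sketch: matched-pairs t-test on paired observations (hypothetical data).
from scipy import stats

before = [82, 75, 90, 68, 77, 85, 79, 88]
after  = [85, 74, 93, 72, 80, 86, 83, 90]

t_stat, p_value = stats.ttest_rel(before, after)   # H0: mean paired difference = 0
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```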

Hypothesis Test for a Proportion


This lesson explains how to conduct a hypothesis test of a proportion, when the following conditions are met:
The sampling method is simple random sampling.
Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
The sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes and 5 failures are enough.)
The population size is at least 10 times as big as the sample size.
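As an illustration (not from the original lesson), the large-sample test of a proportion can be run with statsmodels; the counts below are hypothetical and satisfy the 10-successes/10-failures requirement.

```python
# Hedged sketch: large-sample z-test of a proportion (hypothetical counts).
from statsmodels.stats.proportion import proportions_ztest

successes, n = 62, 100                                  # observed successes, sample size
z_stat, p_value = proportions_ztest(count=successes, nobs=n, value=0.5)  # H0: p = 0.5
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```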

Hypothesis Test of a Proportion, Given a Small Sample

In the previous lesson, we showed how to conduct a hypothesis test for a proportion when the sample included at least 10 successes and 10 failures. This requirement serves two purposes: it guarantees that the sample size will be at least 20 when the proportion is 0.5, and it ensures that the minimum acceptable sample size increases as the proportion becomes more extreme. When the sample does not include at least 10 successes and 10 failures, the sample size is too small to justify the hypothesis testing approach presented in the previous lesson. This lesson describes how to test a hypothesis about a proportion when the sample size is small, as long as the sample includes at least one success and one failure. The key steps are:
Formulate the hypotheses to be tested. This means stating the null hypothesis and the alternative hypothesis.
Determine the sampling distribution of the proportion. If the sample proportion is the outcome of a binomial experiment, the sampling distribution will be binomial. If it is the outcome of a hypergeometric experiment, the sampling distribution will be hypergeometric.
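For the binomial case, one concrete option (not from the original lesson) is SciPy's exact binomial test; the counts below are hypothetical, and scipy.stats.binomtest requires SciPy 1.7 or later.

```python
# Sketch: exact binomial test for a small-sample proportion (hypothetical counts).
from scipy import stats

result = stats.binomtest(k=4, n=12, p=0.5, alternative="two-sided")  # 4 successes in 12 trials, H0: p = 0.5
print(f"p-value = {result.pvalue:.4f}")
```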
