Homoscedasticity
From Wikipedia, the free encyclopedia
In statistics, a sequence or vector of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used.[1] The assumption of homoscedasticity simplifies mathematical and computational treatment. Serious violations of homoscedasticity (assuming a distribution of data is homoscedastic when in actuality it is heteroscedastic) may result in overestimating the goodness of fit as measured by the Pearson coefficient.
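The definition above can be illustrated with a small simulation (synthetic data, not from the article): two collections of samples, one in which every group shares the same variance and one in which the spread grows from group to group.

```python
import random
import statistics

random.seed(42)

# Homoscedastic: every group of observations is drawn with the same
# standard deviation, so the group variances are all roughly equal.
homo = [[random.gauss(0, 2) for _ in range(500)] for _ in range(4)]

# Heteroscedastic: the standard deviation grows from group to group,
# so the group variances differ sharply.
hetero = [[random.gauss(0, sd) for _ in range(500)] for sd in (1, 2, 4, 8)]

print([round(statistics.variance(g), 1) for g in homo])    # roughly equal
print([round(statistics.variance(g), 1) for g in hetero])  # clearly unequal
```

Estimated variances fluctuate around the true values, which is why the homoscedastic group variances are only approximately, not exactly, equal.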
Assumptions of a regression model
As used in describing simple linear regression analysis, one assumption of the fitted model (to ensure that the least-squares estimators are each a best linear unbiased estimator of the respective population parameters, by the Gauss-Markov theorem) is that the standard deviations of the error terms are constant and do not depend on the x-value. Consequently, each probability distribution for y (the response variable) has the same standard deviation regardless of the x-value (the predictor). In short, this assumption is homoscedasticity. Homoscedasticity is not required for the estimates to be unbiased, consistent, and asymptotically normal.
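A common informal check of this assumption is to fit the least-squares line and compare the spread of the residuals across the x-range. The sketch below (hypothetical simulated data satisfying the assumption) computes the slope and intercept directly from the usual least-squares formulas and then compares the residual standard deviation in the lower and upper halves of x.

```python
import random
import statistics

random.seed(0)

# Simulated data meeting the homoscedasticity assumption:
# the error SD is constant (1.0) regardless of x.
x = [i / 10 for i in range(200)]
y = [3.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Least-squares estimates of slope and intercept.
n = len(x)
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx
intercept = my - slope * mx

# Under homoscedasticity, the residual spread should be similar
# in the lower and upper halves of the x-range.
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
lower = statistics.stdev(resid[: n // 2])
upper = statistics.stdev(resid[n // 2 :])
print(round(slope, 2), round(intercept, 2), round(lower, 2), round(upper, 2))
```

If the data were heteroscedastic, the two residual standard deviations would diverge systematically rather than agree up to sampling noise.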
Test Requirements
The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met:
- The dependent variable Y has a linear relationship to the independent variable X.
- For each value of X, the probability distribution of Y has the same standard deviation σ.
- For any given value of X, the Y values are independent.
- For any given value of X, the Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large.
Estimation Requirements
The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met:
- The dependent variable Y has a linear relationship to the independent variable X.
- For each value of X, the probability distribution of Y has the same standard deviation σ.
- For any given value of X, the Y values are independent, as indicated by a random pattern on the residual plot.
- For any given value of X, the Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large. A histogram or a dotplot will show the shape of the distribution.
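The "symmetric and unimodal" check can be done numerically as well as visually. As a rough sketch (hypothetical data, not from the lesson), the snippet below computes the sample skewness, which is near 0 for a symmetric distribution, and prints a crude text histogram to show the shape:

```python
import random
import statistics
from collections import Counter

random.seed(7)
values = [random.gauss(50, 5) for _ in range(300)]

# Sample skewness: near 0 for a symmetric distribution,
# clearly positive or negative for a skewed one.
m = statistics.fmean(values)
s = statistics.stdev(values)
skew = sum(((v - m) / s) ** 3 for v in values) / len(values)
print("skewness:", round(skew, 2))

# A crude text histogram (bin width 2) to eyeball the shape.
bins = Counter(int(v // 2) * 2 for v in values)
for edge in sorted(bins):
    print(f"{edge:>3} | {'*' * bins[edge]}")
```

For normally distributed data the histogram should look roughly bell-shaped, with the star counts peaking near the mean and tapering off on both sides.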
Assumptions: Hypothesis Testing for Pearson r
- Data originated from a random sample.
- Data are interval/ratio.
- Both variables are normally distributed.
- Linear relationship and homoscedasticity.
Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply:
- The population distribution is normal.
- The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
- The sample size is greater than 40, without outliers.
Two-sample t-test
The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:
- The sampling method for each sample is simple random sampling.
- The samples are independent.
- Each population is at least 10 times larger than its respective sample.
- Each sample is drawn from a normal or near-normal population.
Generally, the sampling distribution will be approximately normal if any of the following conditions apply:
- The population distribution is normal.
- The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The sample data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
- The sample size is greater than 40, without outliers.
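When these conditions are met, one common form of the test is the pooled two-sample t statistic, which additionally assumes equal population variances. A minimal sketch with hypothetical scores for two independent groups:

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled two-sample t statistic (assumes equal population variances).

    Returns the statistic and its degrees of freedom, n1 + n2 - 2.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / na + 1 / nb))                # standard error
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, na + nb - 2

# Hypothetical independent samples (e.g., scores from two groups).
group1 = [85, 90, 78, 92, 88, 76, 81, 89]
group2 = [79, 84, 72, 80, 83, 70, 74, 78]
t, df = two_sample_t(group1, group2)
print(round(t, 2), df)
```

When the equal-variance assumption is doubtful, Welch's version of the t-test, which does not pool the variances, is the usual alternative.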
In the previous lesson, we showed how to conduct a hypothesis test for a proportion when the sample included at least 10 successes and 10 failures. This requirement serves two purposes:
- It guarantees that the sample size will be at least 20 when the proportion is 0.5.
- It ensures that the minimum acceptable sample size increases as the proportion becomes more extreme.
When the sample does not include at least 10 successes and 10 failures, the sample size is too small to justify the hypothesis-testing approach presented in the previous lesson. This lesson describes how to test a hypothesis about a proportion when the sample size is small, as long as the sample includes at least one success and one failure. The key steps are:
- Formulate the hypotheses to be tested. This means stating the null hypothesis and the alternative hypothesis.
- Determine the sampling distribution of the proportion. If the sample proportion is the outcome of a binomial experiment, the sampling distribution will be binomial. If it is the outcome of a hypergeometric experiment, the sampling distribution will be hypergeometric.
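For the binomial case, the small-sample p-value is computed exactly from the binomial distribution rather than from a normal approximation. A sketch (the counts 7 successes in 9 trials are a made-up example, not from the lesson):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_p_value(successes, n, p0):
    """One-sided p-value P(X >= successes) under H0: p = p0, exact binomial."""
    return sum(binom_pmf(k, n, p0) for k in range(successes, n + 1))

# Example: 7 successes in 9 trials; H0: p = 0.5 vs Ha: p > 0.5.
print(round(binomial_p_value(7, 9, 0.5), 4))  # 0.0898
```

Because the p-value here (about 0.09) exceeds a 0.05 significance level, this hypothetical sample would not justify rejecting H0. The hypergeometric case proceeds the same way, with the hypergeometric PMF substituted for the binomial one.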