Some assumptions we might make when solving problems in the other sciences:
Physics: There is no air resistance.
Ecology: Foxes and rabbits are the only animals.
Epidemiology: People only die of disease or old age.
Oceanography: Seawater has the same composition everywhere.
Archaeology: At a given site, older objects are deeper in the ground than younger objects.
etc.
In AP Statistics, nearly all assumptions are of three types:
The sample is representative of the population.
The sample is large enough that the distribution of some statistic is approximately equal to its limiting distribution.
Modeling assumptions. (In AP Statistics, these arise in the regression context.)
When we extrapolate information from a sample to a population, we are naturally assuming that the sample is representative of the population in some way. In particular, let's suppose that X is some random variable whose distribution over the population is f(x). We will be observing data whose distribution is not f(x), but rather g(x|S), the conditional distribution of X given membership in the sample. Is it fair to observe g(x|S) and treat it as if it were f(x)? Under what conditions are the conditional and unconditional distributions of X the same?
The distributions f(x) and g(x|S) will be the same if and only if X and S are independent, that is, if the value of the random variable and the element's membership in the sample are completely unrelated to one another. Can we guarantee that? Of course we can. If membership in the sample is completely random, then it is independent of anything we can think of. That's why we like random samples so much. They allow us to treat the X values in our sample as if they had the same distribution as those in the population. Random sampling permits inference.
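The independence condition can be illustrated with a small simulation. This is only a sketch under assumed numbers: the Normal(50, 10) population, the inclusion probabilities, and the function name `compare_samples` are all hypothetical and chosen for illustration. It draws one sample whose membership ignores X and one whose membership depends on X, then compares the sample means with the population mean.

```python
import random
import statistics


def compare_samples(seed=0, population_size=100_000):
    """Return (population mean, random-sample mean, biased-sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: X is roughly Normal(50, 10).
    population = [rng.gauss(50, 10) for _ in range(population_size)]

    # Membership independent of X: every element has the same 1.25% chance.
    random_sample = [x for x in population if rng.random() < 0.0125]

    # Membership depends on X: elements with X > 50 are four times as
    # likely to be included as the rest (2% versus 0.5%).
    biased_sample = [
        x for x in population
        if rng.random() < (0.02 if x > 50 else 0.005)
    ]

    return (
        statistics.fmean(population),
        statistics.fmean(random_sample),
        statistics.fmean(biased_sample),
    )


pop_mean, random_mean, biased_mean = compare_samples()
print(f"population mean:     {pop_mean:.2f}")
print(f"random-sample mean:  {random_mean:.2f}")
print(f"biased-sample mean:  {biased_mean:.2f}")
```

With membership independent of X, the sample mean should land close to the population mean; when membership favors large X, the sample mean should sit noticeably above it.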
Assumptions for Statistical Inference
Floyd Bullard
Overview; Random samples; Limiting distributions of statistics; Modeling assumptions: linear regression; When assumptions aren't met; Conclusion
A Problem
But there's a problem. Random samples are hard to come by. So we often assume for the sake of inference that our sample is random even though we know for a fact it isn't. Is that okay? What will happen if the assumption is really quite wrong?
Alice's project
A student named Alice wants to estimate the proportion of students in her school who can name her state's two U.S. Senators. She plans to sample 100 students and ask them to name the two senators. She'll use the sample proportion she gets to construct a confidence interval estimate of the population proportion. How should she get her sample?
Here are some ways she might sample 100 students:
Include all the students in her classes until she gets 100.
Include her friends and her friends' friends.
Send out an all-school email and include the first 100 students who reply.
Stand outside the school in the morning and include every fifth student until she has 100.
Robert's project
Robert's school is considering starting school a half hour later in the morning and ending a half hour later in the afternoon. Robert wants to estimate the proportion of students in the school who would be in favor of this. Would Alice's sampling method work for him?
You plan to guide your students through a class project in which they will estimate the quality of five brands of paper towels. (The students will determine how to define quality.) You buy one roll of each of five brands of paper towels and bring them to class. The students take six towels of each brand and measure each one's quality. Parallel boxplots of the brands' quality scores give an idea of which brands are better than others. What assumption are you and your students making? Is it justified?
Capture/recapture
Forty squirrels are captured in a park and tagged. A month later, fifty squirrels in the park are captured, and ten are found to be tagged. That's 20% of the second sample, so we might assume that the forty tagged squirrels are about 20% of the population, and that $\hat{N} = 5 \times 40 = 200$ is a good estimate of the number of squirrels in the park. What assumptions are being made here? Are they reasonable? What will the effect be on the population size estimator $\hat{N}$ if they are not reasonable?
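The arithmetic behind the estimate can be written as a short function. This is a minimal sketch; the function and parameter names are hypothetical, and the formula is just the proportion argument in the text (tagged fraction in the second sample taken as the tagged fraction in the park).

```python
def capture_recapture_estimate(tagged_first, caught_second, tagged_in_second):
    """Estimate population size from a two-stage capture/recapture study.

    Assumes the tagged fraction of the second sample equals the tagged
    fraction of the whole population.
    """
    if tagged_in_second == 0:
        raise ValueError("No tagged animals recaptured; estimate is undefined.")
    return tagged_first * caught_second / tagged_in_second


# The squirrel example: 40 tagged, 50 caught later, 10 of those tagged.
print(capture_recapture_estimate(40, 50, 10))
```

Because the recaptured count sits in the denominator, anything that makes tagged squirrels more or less likely to be caught the second time moves the estimate directly.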
The upshot: In practice, we often do not have the luxury of true random samples. We may make the assumption that a sample is a simple random sample (SRS) so that we may extrapolate its properties to the population. Whether this is reasonable or not depends on whether we believe that sample membership and the properties of interest are more or less independent of one another. Reasonable people may disagree about whether the assumption is justified.
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$$
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \sim t(n^*)$$
$$\sum_i \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(\text{df})$$
We rely on sample sizes being large enough to justify using a limiting distribution. How do we know what's large enough?
Proportions
For proportions, we often require that $np$ and $n(1-p)$ both be at least 10 (or sometimes 5). At least one text requires the single condition that $np(1-p) > 5$. Where did these come from?
Let's require that the mean of $\hat{p}$ (which is $p$) be at least three standard deviations (one standard deviation is $\sqrt{p(1-p)/n}$) above 0.
$$p > 3\sqrt{p(1-p)/n}$$
$$p^2 > 9\,p(1-p)/n$$
$$np^2 > 9\,p(1-p)$$
$$np > 9(1-p)$$
Note that this is guaranteed by $np > 10$. (Do you see why?) And $np > 5$ would guarantee that $p > 2\sqrt{p(1-p)/n}$.
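The claim can also be checked numerically. The sketch below is not a proof; it simply scans a grid of n and p values (the grid sizes and function names are arbitrary choices made here) and confirms that whenever np exceeds the threshold, p lies more than k standard deviations above 0.

```python
import math


def sd_of_p_hat(p, n):
    """Standard deviation of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def mean_is_k_sds_above_zero(p, n, k=3):
    """True if p exceeds k standard deviations of the sample proportion."""
    return p > k * sd_of_p_hat(p, n)


def check_rule(threshold, k, max_n=300, steps=200):
    """Scan a grid; return False if np > threshold ever fails to
    guarantee that p is more than k standard deviations above 0."""
    for n in range(1, max_n + 1):
        for i in range(1, steps):
            p = i / steps
            if n * p > threshold and not mean_is_k_sds_above_zero(p, n, k):
                return False
    return True


print(check_rule(threshold=10, k=3))   # np > 10 versus three SDs
print(check_rule(threshold=5, k=2))    # np > 5 versus two SDs
# A case that violates the rule of thumb: n = 20, p = 0.2, so np = 4.
print(mean_is_k_sds_above_zero(0.2, 20))
```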
The requirement $n(1-p) > 10$ will similarly ensure that $p$ is at least three standard deviations below 1. If we are comparing two proportions, then both must obey this rule of thumb.
Means
$\bar{X}$ will have an approximately normal distribution (and hence $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ will have an approximately $t(n-1)$ distribution) if the sample size $n$ is large enough.
Here's a common rule of thumb.
If $n \le 10$ and the data display no obvious outliers or skew, then continue with inference using the t distribution; but the inference still relies on the assumption that the population is approximately normal.
If $10 < n \le 40$ and the data display at most one or two outliers and no severe skew, then continue with inference using the t distribution; the population need not be approximately normal.
If $n > 40$, then except for extraordinarily severe skew (which would be indicated by numerous outliers), inference using the t distribution is okay. (But you might question whether inference on such a population's mean is what you really want to be doing.)
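The three cases of the rule of thumb can be laid out as a small decision function. This is a sketch only: judging outliers and skew from a plot is subjective, and the function name, the outlier count input, and the skew labels ("none", "moderate", "severe", "extreme") are hypothetical conventions introduced here, not part of the rule itself.

```python
def t_procedure_advice(n, outliers, skew):
    """Apply the rule of thumb for t procedures.

    n        -- sample size
    outliers -- number of outliers judged from a plot of the data
    skew     -- one of "none", "moderate", "severe", "extreme"
                (hypothetical labels for a visual judgment)
    Returns (proceed, note).
    """
    if skew not in ("none", "moderate", "severe", "extreme"):
        raise ValueError(f"unknown skew label: {skew!r}")

    if n <= 10:
        ok = outliers == 0 and skew == "none"
        note = "relies on the population being approximately normal"
    elif n <= 40:
        ok = outliers <= 2 and skew in ("none", "moderate")
        note = "population need not be approximately normal"
    else:
        ok = skew != "extreme"
        note = "okay unless the skew is extraordinarily severe"
    return ok, note


print(t_procedure_advice(8, 0, "none"))
print(t_procedure_advice(25, 2, "moderate"))
print(t_procedure_advice(100, 12, "extreme"))
```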
Linear regression
Our third type of assumption is the modeling assumption. We choose a mathematical model that we think will describe the underlying phenomenon that generated our data. If the model is very poor, then our inference will be meaningless. The only example of this students see in AP Statistics is the linear regression model, which is:
$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad \text{where } e_i \overset{\text{iid}}{\sim} N(0, \sigma).$$
In this model there are three parameters to be estimated: $\beta_0$, $\beta_1$, and $\sigma$.
Another way of stating the model is:
$$y_i \sim N(\beta_0 + \beta_1 x_i, \sigma)$$
In other words, the means of the y's have a linear relationship with the x's, but there is variability in the actual y data about those means: normally distributed errors with constant variability across all values of x.
To check whether the model is reasonable, we:
Look at the residuals from the linear regression to see whether there is a pattern.
Verify that the residuals are of roughly constant magnitude for all x's.
Check to see whether the residuals appear to be approximately normally distributed.
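The model and the residual checks can be tried out on simulated data. The sketch below is illustrative only: the parameter values, the uniform spread of x, and all function names are assumptions made here. It generates data from the stated model, fits a least-squares line, and compares the residual spread for small x and large x as a crude numeric stand-in for looking at a residual plot.

```python
import random
import statistics


def simulate_line(n, beta0, beta1, sigma, seed=0):
    """Generate (xs, ys) from y = beta0 + beta1*x + e, with e ~ N(0, sigma)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys, intercept, slope):
    """Observed y minus fitted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_by_half(xs, res):
    """Residual standard deviation for the smaller-x and larger-x halves."""
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.stdev(low), statistics.stdev(high)


xs, ys = simulate_line(n=200, beta0=2.0, beta1=0.5, sigma=1.0)
b0, b1 = least_squares(xs, ys)
res = residuals(xs, ys, b0, b1)
sd_low, sd_high = spread_by_half(xs, res)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
print(f"residual SD, small x: {sd_low:.2f}; large x: {sd_high:.2f}")
```

When the data really do come from the model, the two residual standard deviations should be similar; a large difference would suggest non-constant variability. A plot of the residuals remains the more informative check.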
If a sample is assumed to be random when in fact there is an association between sample membership and a measured variable of interest, then the sampling procedure is biased. Conclusions will tend to systematically overestimate or underestimate the parameters of interest.
Figure: These four distributions were used to simulate random samples of different sizes.
Figure: Panels plotted against sample size (horizontal axis, up to 50), with vertical scales running from 0.85 to 0.95.
Figure: A quantity on a vertical scale from 0.03 to 0.1, plotted against sample size (horizontal axis, 20 to 120).
Figure: Radial velocity plotted against time.
A sample problem
Suppose the following is an AP exam question. 8 apples are randomly sampled from an orchard, and their weights in pounds are measured to be 0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38. Estimate the mean weight of apples in the orchard, including a reasonable margin of error.
The rubric requires students to check inferential assumptions. Suppose one student writes the following:
$np > 10$ and $n(1-p) > 10$
$n < 40$, assume normality.
How do you think the rubric will score this response for the check of assumptions?
Figure: Normal probability plot of the apple weights.
Normal probability plot looks roughly linear, so assumption is reasonable. We continue with the construction of a 95% t confidence interval...
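For reference, the interval itself can be computed from the eight weights given in the question. This is a sketch, not the rubric's solution: the critical value 2.365 is an assumption taken from a standard t table (97.5th percentile, 7 degrees of freedom), and the function name is hypothetical.

```python
import math
import statistics

# Apple weights in pounds, as given in the sample problem.
weights = [0.44, 0.43, 0.33, 0.56, 0.50, 0.50, 0.45, 0.38]

# Assumption: t* for 95% confidence with n - 1 = 7 degrees of freedom,
# read from a standard t table.
T_STAR_DF7_95 = 2.365


def t_interval(data, t_star):
    """Return (sample mean, margin of error) for a one-sample t interval."""
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return mean, t_star * standard_error


mean, margin = t_interval(weights, T_STAR_DF7_95)
print(f"{mean:.3f} +/- {margin:.3f} pounds")
```

The interval is only as trustworthy as the two assumptions behind it: that the eight apples are a random sample, and that apple weights in the orchard are approximately normal.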
Three types of assumptions in AP Statistics:
Sample is random. (Reasonableness cannot be checked with the data.)
Sample is large enough to assume a limiting distribution for a statistic.
Linear model is appropriate for bivariate data.
Checking assumptions is a big part of all statistics. This should be a part of every inference problem students do all year, not just something they study in an isolated unit.