
Week 8

Week 8 Inference about a Population

Objectives

Overview
This module introduces us to statistical inference about population parameters. The problem objective we are considering is "Describe a Population". First, we will look at inference (hypothesis tests and confidence intervals) about the population mean when we do not have any information about the population standard deviation. Without this information, we will not be able to use z as our test statistic. Instead, we will use a t-statistic and the t-distribution that was introduced back in Chapter 8. Then, we will look at inference (hypothesis tests and confidence intervals) about the population variance. We will use a chi-square statistic and the chi-square distribution we first saw in Chapter 8. Next, we will consider inference (hypothesis tests and confidence intervals) about a population proportion. This time, the sampling distribution comes from a binomial variable, but we will make a normality assumption so we can use our z-distribution.

WHAT TO DO THIS WEEK


1. Review the Module Notes and Readings.
2. Complete any suggested review exercises and review the solutions.
3. Use the flow charts to decide what test to do!
4. Discuss with your team and work together to understand the material. Remember that the best way to learn is to teach someone else!!

Conduct hypothesis tests BY HAND to make inferences about a population mean with an unknown population standard deviation, using the critical value method
Use EXCEL Data Analysis Tools and the p-value method for hypothesis tests about a population mean with an unknown population standard deviation
Determine and interpret confidence intervals for a population mean with unknown population standard deviation (BY HAND and using EXCEL)
Conduct hypothesis tests BY HAND to make inferences about population variances, using the critical value method
Use EXCEL Data Analysis Tools and the p-value method for hypothesis tests about population variances
Determine and interpret confidence intervals for population variances (BY HAND and using EXCEL)
Conduct hypothesis tests BY HAND to make inferences about population proportions, using the critical value method
Use EXCEL Data Analysis Tools and the p-value method for hypothesis tests about population proportions
Determine and interpret confidence intervals for population proportions (BY HAND and using EXCEL)
Use appropriate descriptive statistics to check the required conditions for these types of inference.
Use Flow Charts to determine which test is appropriate.

Readings
Chapter 12

Module Notes

Course RoadMap Revisited


As promised, now that we know the fundamentals of estimation and hypothesis testing, we will spend the rest of the semester applying all of our skills to answer different research questions! Before we get started, I want to introduce you to some flow charts we will be using throughout the rest of the semester to help with what will soon become the most difficult part of statistics for you: deciding what test to do! Remember that for descriptive statistics, we talked about two initial decisions we need to make to decide what approach to use: problem objective and data type. That is also where we start with inferential statistics. As we study the statistical inference tools that we will cover in this course, we will determine first the problem objective addressed by each one, then the data type, and take it from there. The figure below shows the five problem objectives we will cover:

Each of these problem objectives directs you to a separate flowchart that will guide you in making the decision as to the appropriate inferential technique to answer the question you are addressing. For each of these (and the one above), I include a link to a WORD document so you can easily print each of them out for easy reference. The first objective we will consider is "Describe a Population". Here is the Flow Chart for that objective:

Next we look at "Compare Two Populations". This Flow Chart is the most complex of them all and takes two pages. The link for the first one contains both pages. It is formatted landscape, so it should print reasonably well.


Compare Two or More Populations:


Analyze for Relationship between Two Variables:

Analyze for Relationship between Two or more Variables:

So, we start with Inference about a Population, which answers inferential questions with the objective to Describe a Population.


Chapter 12 Inference about a Population


In this chapter, we address inference about one population. We will look at inference about the population mean, about the population variance, and about the population proportion. Inferences about the population mean and about the population variance are used when the data are interval. With nominal data, we would draw inferences about the population proportion. Now that you have the concepts of hypothesis testing and confidence intervals nailed, in order to conduct these other tests, you just need to know the appropriate test statistic and how to identify the critical value. The hard part is not in conducting the test itself; it is in determining what test to do! So, let's revisit the flow chart that will guide us for this problem objective:

12.1 Inference about a Population Mean when the Standard Deviation is Unknown
In the above flow chart, we have already answered the question about problem objective; the next question is "Data type". This test is used when we have interval data, so look at that section of the chart. When we go there, we see another question: "Type of descriptive measurement". The mean is a measure of central location, so we go to that section of the chart.

This test is very similar to what we did in the last chapter, but is more realistic since, if we do not know the population mean μ (thus the need for the test!), we often would not know the population standard deviation σ either. But, in order to use the standardized test statistic (our z-statistic), we need it. So, if we want to do inference about a mean, and will be calculating x̄, but do not have σ so that we can calculate z to use as our test statistic, what do we do? Well, what can we do if we do not know the population parameter? We can estimate it using the sample statistic (do a point estimate)!! So, we can calculate the sample standard deviation, s, and use it in place of σ! Very cool, huh? :-)

But, if we use s, the estimate of σ, in our z formula, the resulting statistic is no longer normally distributed. Instead, it is "Student t distributed". William S. Gosset first described this distribution and, since he published under the pseudonym "Student" (he did this work while working at the Guinness Brewery), it is called the Student t distribution. We discussed the t-distribution back in Chapter 8. Remember, it is a family of distributions that differ based on degrees of freedom, ν (Greek letter nu), where ν = n-1. Degrees of freedom is also commonly symbolized as df, so I will tend to use that (easier than the insert symbols step :-)). The formula for calculating t is the same as for z except that σ is replaced with s (since σ is unknown!!):

t = (x̄ - μ) / (s / √n)

Now, we still can do both types of inference: hypothesis tests and estimation. Since Hypothesis tests are more complicated, we will tackle those first!

Hypothesis Tests
Since the t-distribution is symmetrical like the z-distribution, the same picture used for z can be used to describe the problem. The only difference is that the critical value will now be the t-statistic that cuts off α (or α/2 for a two-tailed test) in the tail of the distribution.


Since we are testing the population mean, there are 3 possible tests -- upper-tailed, lower-tailed, and two-tailed tests. We will continue with our seven step process for hypothesis tests, and we can still use either the critical value method or the p-value method to make our decision. They would be applied the same way. A key difference is that we can only approximate the p-value when conducting the test by hand (since our t-table, recall from wayyyyy back in Chapter 8, only has a limited selection of tail areas). So, from here on out, we will follow this convention: we will use the critical value method for conducting tests by hand and the p-value method when we use the computer output. As I suspect I mentioned, most of the more advanced statistical software only provides p-values, not critical values, so it is commonly accepted practice to use the p-value method whenever using statistical software. For now, we will practice with both methods, as the critical value method is typically more helpful for really understanding the underlying concepts needed to apply these tools effectively later. Below is a summary of this test like the one I developed in the last chapter:

Before we do an example, let's talk about the t-distribution for a moment. How is it different from z? In calculating our t-statistic, remember that we have used TWO pieces of data that we estimated from our sample: s and x̄. So, now we have two pieces of data that will likely vary from sample to sample. This means that we now have a distribution with more variability. The t-distribution has more variability than z and the degrees of freedom takes sample size into account. And remember that a larger sample size helps reduce the sampling error.

Now, let's go through Example 12.1. Here's the problem:

Currently (2007) most products manufactured from recycled material are considerably more expensive than those manufactured from material found in the earth. Newspapers are an exception. It can be profitable to recycle newspaper, but a major expense is the collection from homes. In recent years a number of companies have gone into the business of collecting used newspapers from households and recycling them. A financial analyst for one such company has recently computed that the firm would make a profit if the mean weekly newspaper collection from each household exceeded 2.0 pounds. In a study to determine the feasibility of building a recycling plant, a random sample of 148 households was drawn from a large community, and the weekly weight of newspapers discarded for recycling for each household was recorded. Do these data provide sufficient evidence to allow the analyst to conclude that a recycling plant would be profitable?

First, we need to go through our flow chart to see what test we should do. We have already gone through the chart and found that we are looking at central location with interval data. The next question asks what type of inference we need. The key to answering this question is how the question is worded. You are given background information and then asked "Do these data provide sufficient evidence to allow the analyst to conclude that a recycling plant would be profitable?" Anytime you are asked about "sufficient evidence" or can we "conclude" or can we "infer", you will need a hypothesis test to answer your question. So, that leads us to what is called the "t-test of μ". Since this is a hypothesis test, we will follow our seven step process. I will first go through it by-hand using the critical value rule, then I will go through it with Excel and the p-value rule.


Example 12.1 Using the Critical Value Method


1. Decide what test you are doing/what test statistic you will use
t-test of μ

2. State your hypotheses
H0: μ = 2
H1: μ > 2
Since the plant will be profitable IF the mean collection from each household is greater than 2.0, we want the upper-tailed alternative hypothesis shown above.

3. State your significance level
Let α = 0.01
Due to the very high capital costs of building a plant, the manager feels the cost of making a Type I error is high, so we want to limit our probability of making a Type I error to 1%.

4. State your rejection rule
With α = 0.01 and df = 148-1 = 147, the value of t that cuts off 1% in the upper tail is 2.351 (check back to Chapter 8 if you do not recall how to use the t-distribution; we use the closest degrees of freedom in the table), so our rule is, "Reject the null hypothesis if t > 2.351". (Notice that the sign in our rejection rule matches up with our alternate hypothesis. This will always be true when using the critical value method for t-tests.)

5. Calculate your test statistic
We go to our sample data and calculate x̄ to be 2.18 and s to be 0.98. We now convert our x̄ to t:
t = (x̄ - μ) / (s / √n)
Note that the μ in this formula is really μx̄, the mean of the sampling distribution of x̄, since we are using our sampling distribution. We do not have μx̄, but you should recall that it is equal to the population mean μ. At the very beginning of this test, we stated a null hypothesis that said μ = 2. Since we are assuming this to be true, we can use that! We get:
t = (2.18 - 2) / (0.98 / √148) = 2.23

6. Compare to your rejection rule and make a decision about the null hypothesis
Since 2.23 is NOT greater than 2.351, we do NOT reject the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is not enough evidence to conclude that the recycling plant will be profitable. (Notice that this conclusion is simply the alternative hypothesis stated in terms of the problem.)

Using the p-value method: Since we can only approximate the p-value for the t-statistic by hand, we will use the p-value method only when we use the computer.
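The by-hand arithmetic in Example 12.1 can be double-checked with a few lines of Python. This is only a sketch: it reuses the sample statistics and the table critical value quoted in these notes (the raw data file is not reproduced here), and the variable names are my own.

```python
import math

# Sample statistics and hypothesized mean, as reported in Example 12.1
n = 148          # sample size
x_bar = 2.18     # sample mean (pounds per household per week)
s = 0.98         # sample standard deviation
mu_0 = 2.0       # value of the mean under the null hypothesis

# Critical value taken from the t-table as quoted in the notes
# (alpha = 0.01, closest tabled degrees of freedom to df = 147)
t_critical = 2.351

# t = (x_bar - mu) / (s / sqrt(n))
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

# Upper-tailed test: reject H0 only if t exceeds the critical value
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.2f}, critical value = {t_critical}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Note the square root in the standard error; leaving it out is the easiest mistake to make when doing this step on a calculator.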

Example 12.1 Using the P-Value Method


Note everything is exactly the same except for steps 4 and 5 (and the comparison we make in step 6).

1. Decide what test you are doing/what test statistic you will use
t-test of μ

2. State your hypotheses
H0: μ = 2
H1: μ > 2

3. State your significance level
Let α = 0.01

4. State your rejection rule
Reject the null hypothesis if p-value < 0.01. (This will ALWAYS be less than.)

5. Calculate your test statistic and corresponding p-value
Since we are going to use Excel for this, we go to our data and conduct the test. This test is in Data Analysis Plus and is called "t-test: Mean". We go to Add-ins, Data Analysis Plus, and select "t-test: Mean". Select the input data range, enter a hypothesized mean of 2, and click "labels" if appropriate (meaning that when you entered your data range, you included the column label cell). Note that the dialog box also asks you to enter an Alpha. You can ignore this cell, since most sophisticated software packages will not ask you and the only reason Excel does is so that it can provide a correct critical value. But, we won't be using that part of the output so it does not matter!

6. Compare to your rejection rule and make a decision about the null hypothesis
Notice that we want to be sure to use the correct p-value, which in this case is the one-tailed p-value highlighted in the Excel output. Since 0.0134 is NOT less than 0.01, we do NOT reject the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is not enough evidence to conclude that the recycling plant will be profitable. (Notice that this conclusion is simply the alternative hypothesis stated in terms of the problem.)
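For anyone curious where a one-tailed p-value like this comes from, here is a rough sketch that integrates the Student t density numerically using only the Python standard library. It is not how Excel computes it. The helper names `t_pdf` and `t_upper_tail` are made up for this illustration, and because it starts from the rounded statistics (x̄ = 2.18, s = 0.98) rather than the raw data, the result can differ slightly in the last digit from the 0.0134 in the Excel output.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The distribution is symmetric, so half the area lies to the right of 0
    return 0.5 - area_0_to_t

# Rounded sample statistics from Example 12.1
n, x_bar, s, mu_0, alpha = 148, 2.18, 0.98, 2.0, 0.01

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)
reject_h0 = p_value < alpha   # the p-value rule is ALWAYS "less than alpha"

print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Either way, the p-value lands a little above 0.01, so the decision matches the critical value method.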

Watch the video in Blackboard for a demonstration

Estimation
To estimate the confidence interval when σ is unknown, we use the same idea -- use s as an estimate of σ and use the t-statistic in place of z, so the formula looks like this:

x̄ ± t(α/2, n-1) × s / √n

Now, let's go through Example 12.2. Here's the problem:

In 2007 (the latest year reported) 134,543,000 tax returns were filed in the United States. The Internal Revenue Service (IRS) examined 1.03% or 1,385,000 of them to determine if they were correctly done. To determine how well the auditors are performing, a random sample of these returns was drawn and the additional tax was reported. Estimate with 95% confidence the mean additional income tax collected from the 1,385,000 files audited.

First, we need to go through our flow chart to see what test we should do. We have already gone through the chart and found that for this particular inference we are studying, we are looking at central location with interval data. The next question asks what type of inference we need. The key to answering this question is how the question is worded. You are given background information and then asked to "Estimate with 95% confidence the mean additional income tax collected from the 1,385,000 files audited." Anytime you see the word "estimate", you are doing Estimation. I know it sounds kind of silly to point out right now, but it is a stumbling block for some students when they see a random mix of problems for the first time. So, that leads us to what is called the "t-estimate of μ", which means we need to do a confidence interval. For confidence intervals, we do not have a multiple-step process to go through. We simply compute and then interpret the interval. I'll go through it first by-hand, then using Excel.

Example 12.2 By-Hand


This interval works just like the interval we did for the mean when we did have the population standard deviation. We are establishing an upper and lower limit around our sample mean that we have some measure of confidence will capture our population mean.

In our formula, we need our x̄ and s, which we need to get from our sample. Even though we are doing this problem by-hand, we have a reasonably large data set, so we can use basic Excel functions to get the statistics we need. When we use the "=average" function, we find x̄ = 11,343 and when we use the "=stdev" function, we find that s = 4,400. We also need to determine the appropriate t-statistic that cuts off α/2 in the tail. Remember that our confidence level is 1-α = 0.95; so α = 0.05 and α/2 = 0.025. We also need our degrees of freedom. From our sample, we see that our n = 184 (we can use the "=count" function). So our degrees of freedom is 183. Going to our t-table, we have a choice of df=180 or df=190. You can pick the closest one. So, we need to find t(0.025, 180). We get t = 1.973. Now, using our formula:

x̄ ± t × s / √n = 11,343 ± 1.973 × 4,400 / √184 = 11,343 ± 640

So LCL = $10,703 and UCL = $11,983. Interpretation: I am 95% confident that the interval from $10,703 to $11,983 will capture the average additional tax collected.
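As a quick check on the by-hand interval, the same calculation can be sketched in Python. It reuses the sample statistics and the table value t = 1.973 quoted above (the data file itself is not reproduced here), and the variable names are my own.

```python
import math

# Sample statistics reported for Example 12.2
n = 184
x_bar = 11_343   # sample mean of additional tax, in dollars
s = 4_400        # sample standard deviation, in dollars

# Table value quoted in the notes: t that cuts off 0.025 in the tail, df = 180
t_critical = 1.973

# Margin of error: t * s / sqrt(n)
margin = t_critical * s / math.sqrt(n)
lcl = x_bar - margin
ucl = x_bar + margin

print(f"margin of error = {margin:,.0f}")
print(f"LCL = ${lcl:,.0f}, UCL = ${ucl:,.0f}")
```

Changing the confidence level only changes `t_critical`; everything else in the calculation stays the same.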

Example 12.2 Using Excel's Data Analysis Tool


Once we have determined that a confidence interval is what we want, we go to our data file and go to Add-ins, Data Analysis Plus, and select "t-estimate: Mean". The only inputs we need are our input data range and our alpha. The alpha is needed here! Also, don't forget to click the labels box if appropriate. We get the following output:

Our interpretation is the same! Notice that I have interpreted this in keeping with what we talked about in Chapter 10, using the interval as the subject of the sentence and including the confidence level. The interpretation for this problem in the text has left out the confidence level, so it is incomplete. And, remember, you have to be careful where you put that 95%!

Watch the video in Blackboard for a demonstration

Some other key points in this section:

Just as we talked about before, it is possible to have a finite population, in which case we must use the finite population correction factor.

When we have the population size, we can do a confidence interval estimator of the population total simply by multiplying our upper and lower limits by the given population size. There are some sample problems like this!

We made an assumption of normality for the population when we conducted the t-test, but as long as it is approximately normal, we can use this test. Whenever we do inferential statistics, we must verify that the required conditions have been met. To check for normality, we can draw histograms to view the sample data. It is important to realize that we only need the sample data to not be extremely nonnormal. The larger our sample, the less of a problem this would be.

The degrees of freedom is the number of independent pieces of information involved in any estimates we must use. For the t-distribution, we use s as an estimate of σ. In this calculation we have n-1 independent pieces of information. Why? If we have x̄ and n-1 of the values, we can determine the last one. This may be clear as mud to you and that's okay. I wanted to give you a slightly different explanation of degrees of freedom than your text does so that perhaps it will make sense to you. I heard degrees of freedom explained many different ways before it finally clicked. Just know that the number of degrees of freedom for the t-distribution is n-1 and that other distributions will have different "formulas" for degrees of freedom.

The t-statistic is interpreted just like z -- it is the number of standard errors away from the mean (remember, standard error is simply the standard deviation of the sampling distribution).

The t-distribution has more variability than z since we now have 2 variables that may differ from sample to sample: x̄ and s.

Do the following suggested exercises for Section 12.1 and review the solutions before proceeding.

Section Review Exercises

12.3, 12.4, 12.5, *12.9, *12.10, **12.19, **12.21
*do these for t-statistic only
**use critical value rule

Do the following "BY-HAND" using Excel only to calculate sample statistics needed; show all 7 steps for hypothesis tests and use the critical value rule to make your decision; interpret all confidence intervals:
12.25, 12.26, 12.28, 12.29

Do the following using the appropriate data analysis tool; show all 7 steps for hypothesis tests and use the p-value rule to make your decision; interpret all confidence intervals:
12.32, 12.33, 12.34, 12.40, 12.42, 12.46, 12.47, 12.51, 12.53

Exercise Solutions

12.2 Inference about a Population Variance


Now that we know how to draw inferences about the population mean from a sample, we will see a test to draw inferences about the population variance. In our Flowchart:

We are still working with interval data, but now the type of descriptive measurement is variability. You might guess that to draw inferences about the population variance, we would use the sample variance, s², as an estimator. You would be correct!! In order to use it we need a sampling distribution so that we can use some probability estimates. You recall that the sample variance is s² = Σ(x - x̄)² / (n-1). This can be algebraically rearranged to be Σ(x - x̄)² = (n-1)s². Lucky for us, statisticians have determined that (n-1)s² divided by the population variance σ² is chi-squared distributed:

χ² = (n-1)s² / σ²

Remember the chi-square distribution from Chapter 8?

Well, since we will be assuming a null hypothesis about the population variance, we will have a value for the denominator, so we can use this as our test statistic and use the chi-square distribution as the sampling distribution!! The chi-square formula is also used to develop the confidence interval. How convenient!! We still can do both types of inference: hypothesis tests and estimation. Since Hypothesis tests are more complicated, we will again tackle those first!

Hypothesis Tests
I'll summarize the test for you below as I did for the other tests. As with the tests about the mean, we can have both one-tailed and two-tailed tests. Remember that the chi-square distribution is not symmetrical and is always positive, so this makes determining our critical values and confidence interval a little different than it was for the z and the t tests. However, like the t-distribution, it is a family of distributions that differ according to their degrees of freedom. The degrees of freedom for the chi-square distribution is n-1.


Now, let's go through Example 12.3. Here's the problem:

Container-filling machines are used to package a variety of liquids, including milk, soft drinks, and paint. Ideally, the amount of liquid should vary only slightly, since large variations will cause some containers to be underfilled (cheating the customer) and some to be overfilled (resulting in costly waste). The president of a company that developed a new type of machine boasts that this machine can fill 1-liter (1,000 cubic centimeters) containers so consistently that the variance of the fills will be less than 1. To examine the veracity of the claim, a random sample of 25 1-liter fills was taken and the results (cubic centimeters) recorded. Do these data allow the president to make this claim at the 5% significance level?

Remember that you would have used your flow chart and gotten to the point of needing to determine what type of inference. The key wording here is: "Do these data allow the president to make this claim at the 5% significance level?" To answer this question, we must use a hypothesis test, so we are led to the χ²-test of σ².

Example 12.3 Using the Critical Value Method


1. Decide what test you are doing/what test statistic you will use
χ²-test of σ²

2. State your hypotheses
H0: σ² = 1
H1: σ² < 1
Since he claims the variance will be less than 1 cc², that will be our alternative hypothesis.

3. State your significance level
Let α = 0.05

4. State your rejection rule
This is a lower-tailed test with df = 25-1 = 24, so we need the chi-square value that cuts off 5% in the lower tail (that is, 95% of the area lies to its right): χ²(0.95, 24) = 13.8484. Our rule is, "Reject the null hypothesis if χ² < 13.8484". (Check back to Chapter 8 if you do not recall how to use the chi-square distribution.)

5. Calculate your test statistic
We go to our sample data and calculate s² to be 0.6333. Then:
χ² = (n-1)s² / σ² = (24)(0.6333) / 1 = 15.20

6. Compare to your rejection rule and make a decision about the null hypothesis
Since 15.20 is NOT less than 13.8484, we DO NOT reject the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is not enough evidence to conclude that the variance of the new machine will be less than 1 cc². Note the wording of the conclusion when we DO NOT reject -- we state that there is NOT enough evidence to conclude that the alternate is true.
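The arithmetic for Example 12.3 can be sketched in Python as a check. The sample variance and the critical value are the figures quoted in these notes, and the variable names are my own.

```python
# Figures reported for Example 12.3
n = 25                  # number of fills sampled
s_squared = 0.6333      # sample variance
sigma_squared_0 = 1.0   # variance under the null hypothesis

# Table value quoted in the notes: chi-square with df = 24 that cuts off
# 5% in the LOWER tail (95% of the area to its right)
chi2_critical = 13.8484

# chi-square = (n - 1) * s^2 / sigma^2
chi2_stat = (n - 1) * s_squared / sigma_squared_0

# Lower-tailed test: reject H0 only if the statistic falls BELOW the critical value
reject_h0 = chi2_stat < chi2_critical

print(f"chi-square = {chi2_stat:.2f}, critical value = {chi2_critical}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because the alternative is "less than", the rejection region sits in the lower tail, which is why the comparison uses `<` here.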

Using the p-value method: Since we can only approximate the p-value for the χ²-statistic by hand, we will use the p-value method only when we use the computer.

Example 12.3 Using the P-Value Method


Note everything is exactly the same except for steps 4 and 5 (and the comparison we make in step 6).

1. Decide what test you are doing/what test statistic you will use
χ²-test of σ²

2. State your hypotheses
H0: σ² = 1
H1: σ² < 1

3. State your significance level
Let α = 0.05

4. State your rejection rule
Reject the null hypothesis if p-value < 0.05. (This will ALWAYS be less than.)

5. Calculate your test statistic and corresponding p-value
Since we are going to use Excel for this, we go to our data and conduct the test. This test is in Data Analysis Plus and is called "Chi-squared test: Variance". We go to Add-ins, Data Analysis Plus, and select "Chi-squared test: Variance". Select the input data range, enter a hypothesized variance of 1, and click "labels" if appropriate. Remember that the dialog box also asks you to enter an Alpha. You can ignore this cell, since most sophisticated software packages will not ask you and the only reason Excel does is so that it can provide a correct critical value. But, we won't be using that part of the output so it does not matter!

6. Compare to your rejection rule and make a decision about the null hypothesis
Notice that we want to be sure to use the correct p-value, which in this case is the one-tailed p-value highlighted in the Excel output. Since 0.0852 is NOT less than 0.05, we do NOT reject the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is not enough evidence to conclude that the variance of the new machine will be less than 1 cc². Note the wording of the conclusion when we DO NOT reject -- we state that there is NOT enough evidence to conclude that the alternate is true.
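To see roughly where the 0.0852 comes from, here is a sketch that computes the lower-tail chi-square probability with the Python standard library only. It is not how Excel does it. The helper `chi2_cdf` is a made-up name for this illustration and uses the standard series expansion of the regularized lower incomplete gamma function. The inputs are the rounded figures from the notes.

```python
import math

def chi2_cdf(x, df, tol=1e-12):
    """P(X <= x) for a chi-square variable with df degrees of freedom (x > 0).

    Uses the series  gamma(a, z) = z^a e^(-z) * sum_k z^k / (a (a+1) ... (a+k))
    with a = df/2 and z = x/2, divided by Gamma(a).
    """
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Figures reported for Example 12.3
n, s_squared, sigma_squared_0, alpha = 25, 0.6333, 1.0, 0.05

chi2_stat = (n - 1) * s_squared / sigma_squared_0
p_value = chi2_cdf(chi2_stat, df=n - 1)   # lower-tailed test, so use the left tail
reject_h0 = p_value < alpha

print(f"chi-square = {chi2_stat:.2f}, one-tailed p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For an upper-tailed test about a variance, the p-value would be `1 - chi2_cdf(...)` instead.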

Watch the video in Blackboard for a demonstration

Estimation
Since the sampling distribution of chi-square is not symmetrical, the formula for the confidence interval is stated differently than for z and t. We cannot just use a negative value for the lower limit since we do not have negative values in this distribution. To estimate the confidence interval for the population variance, we use the following formula, where χ²(A, df) is the table value that cuts off an area A in the upper tail:

LCL = (n-1)s² / χ²(α/2, n-1)
UCL = (n-1)s² / χ²(1-α/2, n-1)

Now, let's go through Example 12.4. Estimate with 99% confidence the variance of fills in Example 12.3. Using our Flow Chart, since we are asked to "Estimate with 99% confidence", we are led to what is called the "χ²-estimate of σ²", which means we need to do a confidence interval. For confidence intervals, we do not have a multiple-step process to go through. We simply compute and then interpret the interval. I'll go through it first by-hand, then using Excel.

Example 12.4 By-Hand


In our formula, we need our s², which we need to get from our sample. Even though we are doing this problem by-hand, we can use a basic Excel function to get the statistic we need. When we use the "=var" function, we find that s² = 0.6333. We also need the two chi-square statistics for the tail areas. Since we want a 99% confidence interval, that leaves 1% to be divided over the two tails, so each tail area will be 0.005. So we need the χ² that cuts off 0.005 in the upper tail and the χ² that cuts off 0.005 in the lower tail. We need our degrees of freedom to figure this out; since our sample size is 25, df = 24. So we need:

χ²(0.005, 24) = 45.5585 and χ²(0.995, 24) = 9.88623

Now, using our formula:

LCL = (24)(0.6333) / 45.5585 = 0.3336
UCL = (24)(0.6333) / 9.88623 = 1.537

Now, one thing that can keep this from feeling too much like "plug 'n chug" is to forget about which calculation is the UCL and which is the LCL. You have two different chi-square table values and you are dividing the same number, (n-1)s², by each of them. The larger of the two results will be your upper limit and the smaller will be your lower limit. Interpretation: I am 99% confident that the interval from 0.3336 to 1.537 will capture the variance of fills. Again, note how I have interpreted this differently than the text. You need to be sure to include the confidence level in the interpretation and be careful not to make a probability statement about the population parameter.
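The "divide the same number by both table values" idea can be sketched in Python. The two chi-square values below are the standard table entries for 24 degrees of freedom (tail areas of 0.005); they are typed in by hand as an assumption of this sketch, not computed. With these inputs the lower limit works out to about 0.3336.

```python
# Figures reported for Examples 12.3 and 12.4
n = 25
s_squared = 0.6333

# Standard chi-square table values for df = 24 (assumed, entered by hand)
chi2_upper_tail = 45.5585   # cuts off 0.005 in the upper tail
chi2_lower_tail = 9.88623   # cuts off 0.005 in the lower tail

# Same numerator for both limits: (n - 1) * s^2
numerator = (n - 1) * s_squared

# Divide by each table value; the smaller result is the LCL, the larger the UCL
lcl, ucl = sorted([numerator / chi2_upper_tail, numerator / chi2_lower_tail])

print(f"99% confidence interval for the variance: {lcl:.4f} to {ucl:.3f}")
```

Sorting the two quotients means you never have to remember which table value goes with which limit.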

Example 12.4 Using Excel's Data Analysis Tool


Once we have determined that a confidence interval is what we want, we go to our data file and go to Add-ins, Data Analysis Plus, and select "Chi-squared Estimate: Variance". The only inputs we need are our input data range and our alpha. The alpha is needed here! Also, don't forget to click the labels box if appropriate. We get the following output:


Our interpretation is the same!

Watch the video in Blackboard for a demonstration

As for the t-test, we made an assumption that the population was normal. Again, as long as it is not extremely nonnormal, we can use this test. We can use a histogram to check that the data are not extremely nonnormal.

Do the following suggested exercises for Section 12.2 and review the solutions before proceeding.

Section Review Exercises


12.57

Do the following "BY-HAND" using Excel only to calculate sample statistics needed; show all 7 steps for hypothesis tests and use the critical value rule to make your decision; interpret all confidence intervals:
12.59, 12.60, 12.61

Do the following using the appropriate data analysis tool; show all 7 steps for hypothesis tests and use the p-value rule to make your decision; interpret all confidence intervals:
12.63, 12.64, 12.65, 12.66

Exercise Solutions

12.3 Inference about a Population Proportion


When we have a population of nominal data, remember that all we can do is frequency counts. There are a variety of inferential tests that can use these frequency counts. We will only cover one here. Going back to our Flowchart:

We are looking at the Nominal Data Type section and you will see the next question is about number of categories. Right now, we are going to address only the "Two" categories section. We will address "Two or more" later in the course. When we have nominal data with two categories, we can calculate a proportion from the frequency counts. So, for example, suppose we have a variable representing the type of gasoline purchased and it has two categories: 87 octane and something else. We can calculate the proportion of cars purchasing 87 octane. To describe this population of nominal data, the population proportion p is the parameter we use. So, we use the sample proportion p̂ as the estimator and the sampling distribution of the proportion for our inference. Since we have two categories, this is a binomial variable. The sample proportion is calculated as the number of successes (the category we are interested in) divided by our sample size. As long as our sample size is big enough so that both np and n(1-p) are greater than 5 (remember this from Chapter 8? p is the "probability of success"), then the sampling distribution of the proportion is approximately normal, and we can use the z-distribution for our inference. Again, we can do both estimation and hypothesis testing (as you see in the flow chart).
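Now that all three branches of the "Describe a Population" flow chart have come up, the decision path can be sketched as a small lookup in Python. This is only an illustration of the logic described in these notes. The dictionary layout, the function name `choose_technique`, and the label "z-estimate of p" are my own assumptions, and the "two or more categories" branch is left out because it is covered later in the course.

```python
# Hypothetical encoding of the "Describe a Population" flow chart.
# Key: (data type, answer to the next question on the chart)
# Value: technique for each type of inference
FLOW_CHART = {
    ("interval", "central location"): {
        "hypothesis test": "t-test of mu",
        "estimation": "t-estimate of mu",
    },
    ("interval", "variability"): {
        "hypothesis test": "chi-squared test of sigma^2",
        "estimation": "chi-squared estimate of sigma^2",
    },
    ("nominal", "two categories"): {
        "hypothesis test": "z-test of p",
        "estimation": "z-estimate of p",
    },
}

def choose_technique(data_type, branch, inference):
    """Walk the chart: data type -> measurement or categories -> type of inference."""
    try:
        return FLOW_CHART[(data_type.lower(), branch.lower())][inference.lower()]
    except KeyError:
        raise ValueError("combination not covered in this sketch") from None

# "Sufficient evidence to conclude..." about a mean of interval data
print(choose_technique("Interval", "Central location", "Hypothesis test"))
# "Estimate with 99% confidence the variance..."
print(choose_technique("Interval", "Variability", "Estimation"))
```

The wording of the question ("conclude", "infer", "sufficient evidence" versus "estimate") decides the last argument.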

Hypothesis Testing
The procedure you follow is exactly the same as for the other tests we have done, but the formulas are different. The mean and standard deviation for the sampling distribution of p are different and the formula used for z reflects this. Below is a summary of the tests. As with the tests we have done so far, we can have both one-tailed and two-tailed tests.
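For reference, the standard z-test of a proportion can be sketched as follows. This is a reconstruction from the usual textbook form, where p is the proportion stated in the null hypothesis and the rejection regions correspond to the three possible alternative hypotheses.

```latex
z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \qquad \hat{p} = \frac{x}{n}

% Rejection regions at significance level \alpha:
% H_1: p > p_{\text{null}} \;\Rightarrow\; \text{reject if } z > z_{\alpha}
% H_1: p < p_{\text{null}} \;\Rightarrow\; \text{reject if } z < -z_{\alpha}
% H_1: p \ne p_{\text{null}} \;\Rightarrow\; \text{reject if } |z| > z_{\alpha/2}
```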

Now, let's go through Example 12.5.

After the polls close on Election Day, networks compete to be the first to predict which candidate will win. The predictions are based on counts in certain precincts and on exit polls. Exit polls are conducted by asking random samples of voters who have just exited from the polling booth (hence the name) for which candidate they voted. In American presidential elections the candidate who receives the most votes in a state receives the state's entire Electoral College vote. In practice, this means that either the Democrat or the Republican candidate will win. Suppose that the results of an exit poll in one state were recorded where 1 = Democrat and 2 = Republican. The polls close at 8:00 P.M. Can the networks conclude from these data that the Republican candidate will win the state? Should the network announce at 8:01 P.M. that the Republican candidate will win?

Remember that you would have used your flow chart and gotten to the point of needing to determine what type of inference. The key wording here is "Can the networks conclude from these data that the Republican candidate will win the state?" To answer this question, we must use a hypothesis test, so we are led to the Z-test of a proportion.

Example 12.5 Using the Critical Value Method


1. Decide what test you are doing/what test statistic you will use
Z-test of a proportion

2. State your hypotheses
H0: p = 0.5
H1: p > 0.5
The specific parameter of interest is the proportion of Republican votes, and we just need to test if it is greater than 50%.

3. State your significance level
Let $\alpha$ = 0.05
There is no particular reason to be concerned about either type of error, so we'll just use the typical 0.05 for alpha.

4. State your rejection rule
Reject if Z > $z_{\alpha}$ = $z_{0.05}$ = 1.645

5. Calculate your test statistic
From our sample data, we can count the number of "successes" (votes for Republican) and we get 407 with a sample size of 765, so our sample proportion is $\hat{p}$ = 407/765 = 0.532. So

$$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{0.532 - 0.5}{\sqrt{0.5(1-0.5)/765}} = 1.77$$

6. Compare to your rejection rule and make a decision
Since 1.77 is greater than 1.645, we reject the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is enough evidence to conclude that the Republican candidate will win.

Since we are using the z-statistic, we could get our p-value from the tables, but we will continue with the convention of using the p-value method whenever we use the computer.
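The critical value calculation for Example 12.5 can be checked with a short Python sketch. It uses only the standard library; the function name z_test_proportion is made up for this illustration, and the counts (407 successes out of 765) come from the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0):
    """z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


alpha = 0.05
z = z_test_proportion(407, 765, 0.5)

# Right-tailed test: the critical value cuts off alpha in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The computed critical value agrees with the table value of 1.645 to three decimals, so the decision matches the by-hand result.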

Example 12.5 Using the P-Value Method


1. Decide what test you are doing/what test statistic you will use
Z-test of a proportion

2. State your hypotheses
H0: p = 0.5
H1: p > 0.5

3. State your significance level
Let $\alpha$ = 0.05

4. State your rejection rule
Reject the null hypothesis if p-value < 0.05. (This will ALWAYS be less than.)

5. Calculate your test statistic and corresponding p-value
Since we are going to use Excel for this, we go to our data and conduct the test. This test is in Data Analysis Plus and is called "Z-Test: Proportion". We go to Add-ins, Data Analysis Plus, and select "Z-Test: Proportion". Select the input data range, enter a "code for success" of 2 (this is the code in your data assigned to the category you are testing), enter the hypothesized proportion, and click "labels" if appropriate. Remember that the dialog box also asks you to enter an Alpha. You can ignore this cell, since most sophisticated software packages will not ask you and the only reason Excel does is so that it can provide a correct critical value. But, we won't be using that part of the output so it does not matter!

An important thing to realize is that sometimes the data may not be set up with codes! It depends on how the data was entered. So, you may need to essentially create your own nominal variable with appropriate codes.

6. Compare to your rejection rule and make a decision about the null hypothesis
Notice that we want to be sure to use the correct p-value, which in this case is the one-tailed p-value in the Excel output. Since 0.0382 is less than 0.05, we do REJECT the null hypothesis and conclude:

7. State your conclusion in terms of the problem
There is enough evidence to conclude that the Republican candidate will win.
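For readers without Data Analysis Plus, the same one-tailed p-value can be sketched in Python. The list of coded votes below is not the actual data file; it is rebuilt from the counts reported in the example (407 Republican votes coded 2, and the remaining 358 coded 1), and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Stand-in for the data column: rebuilt from the reported counts,
# NOT the original data file. 1 = Democrat, 2 = Republican.
votes = [2] * 407 + [1] * 358

success_code = 2   # the "code for success" entered in the dialog box
p0 = 0.5           # hypothesized proportion
alpha = 0.05

n = len(votes)
successes = sum(1 for v in votes if v == success_code)
p_hat = successes / n

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# H1: p > 0.5, so the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
reject = p_value < alpha

print(f"z = {z:.4f}, one-tailed p-value = {p_value:.4f}, reject H0: {reject}")
```

If the raw data were entered as text labels instead of codes, the only change needed in this sketch would be the value assigned to success_code.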

Watch the video in Blackboard for a demonstration

Estimation
To build a confidence interval for the population proportion, since we are using z as our test statistic, the method is essentially the same as for the confidence interval for a mean. But, we use the appropriate mean and standard error. The formula for the confidence interval is then:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Remember that both the test statistic and confidence interval formulas require np > 5 and n(1 - p) > 5. Also, just like for other intervals, if we have a large, but finite population, we can multiply the upper and lower limits by the population size to determine an interval estimate for the actual total population.

Let's go through the Nielsen Ratings Example.

Statistical techniques play a vital role in helping advertisers determine how many viewers watch the shows that they sponsor. Although several companies sample television viewers to determine what shows they watch, the best known is the A. C. Nielsen firm. The Nielsen ratings are based on a random sample of approximately 5,000 of the 115 million households in the United States with at least one television (in 2010). A meter attached to the televisions in the selected households keeps track of when the televisions are turned on and what channels they are tuned to. The data are sent to Nielsen's computer every night, from which Nielsen computes the rating, and sponsors can determine the number of viewers and the potential value of any commercials. The results from Sunday, February 14, 2010 for the time slot 9:00 to 9:30 P.M. have been recorded using the following codes:

NBC wants to estimate how many households were tuned to the Vancouver Winter Olympics.

Since we are asked to "estimate", rather than conclude or infer something, we need to do a confidence interval. Also note that they actually want an estimate of how many households, not just an estimate of the proportion, so we will go one extra step. There is no indication of how confident they want to be, so we will do a 95% confidence interval.

Nielsen Example By-Hand


From our data, we need our sample proportion, which we can get by using the "=countif" function in Excel to count the number of 4s. We get 1319 and we are told that our sample size is 5000, so the sample proportion is $\hat{p}$ = 1319/5000 = 0.2638. We also need to determine the appropriate z-statistic that cuts off $\alpha/2$ in the tail. Remember that our confidence level is 1 - $\alpha$ = 0.95, so $\alpha$ = 0.05 and $\alpha/2$ = 0.025. So, we need $z_{0.025}$, which we find from our table to be 1.96. Now, using our formula:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.2638 \pm 1.96\sqrt{\frac{0.2638(1-0.2638)}{5000}} = 0.2638 \pm 0.0122$$

So, LCL = 0.2516 and UCL = 0.2760.

And given that the population size is 115 million, we get: LCL = 0.2516*115 = 28.934 million and UCL = 0.2760*115 = 31.740 million.

Interpretation: I am 95% confident that the interval from 0.2516 to 0.2760 will capture the proportion of viewers that watched the Vancouver Winter Olympics, and the interval from 28.934 million to 31.740 million captures the number of households that watched.
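The by-hand interval can be verified with a short Python sketch. The function name proportion_ci is made up for this illustration; the inputs (1319 successes, n = 5000, 115 million households) are the figures from the example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (LCL, UCL) for a population proportion using the z interval."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lcl, ucl = proportion_ci(1319, 5000)

# Scale the limits by the population size (in millions of households).
population_millions = 115
lcl_households = lcl * population_millions
ucl_households = ucl * population_millions

print(f"Proportion: {lcl:.4f} to {ucl:.4f}")
print(f"Households: {lcl_households:.3f} to {ucl_households:.3f} million")
```

The household limits printed here can differ from the by-hand figures in the third decimal place, because the by-hand calculation multiplies limits that were already rounded to four decimals.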

Nielsen Example Using Excel's Data Analysis Tool


Once we have determined that a confidence interval is what we want, we go to our data file and go to Add-ins, Data Analysis Plus, and select "Z-Estimate: Proportion". The inputs we need are our input data range, our code for success (4), and our alpha. Also, don't forget to click the labels box if appropriate. We get the following output:

Our interpretation is the same! Remember to be careful how you state this!

Watch the video in Blackboard for a demonstration

Missing Data
This section is actually before the Nielsen example, but I thought it made more sense to show that first. Missing data is generally due to nonresponses to our questions. Have you ever done a survey and not answered one of the questions? These can impact the validity of our data and there are tests that have to be done to assess "nonresponse bias". There are many ways to deal with missing data and there is a huge research stream on methods for dealing with missing data. Eliminating the missing responses as described here is only one of the appropriate methods. The important point is that not addressing the missing data can screw up your test results!!!

Selecting the Sample Size to Estimate the Proportion


As with our work with the mean, we sometimes want to determine the sample size that will give us the interval width that we want for a given confidence level. Just like with the mean, we want to set a "bound" B on the error of estimation. The formula for doing this is similar:

$$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{B}\right)^2$$

In order to do this, we need to have a value for $\hat{p}$. The problem is that this is our sample proportion and we have not taken the sample yet! There are two different methods we can use. If we have some idea of what we think our sample proportion might be, we can use that to calculate our sample size, or we can assume the worst case and set $\hat{p}$ = 0.5. If it turns out that $\hat{p}$ is 0.5, we will get our desired interval width. If $\hat{p}$ turns out to be anything other than 0.5, then our interval will be narrower than we had wanted, because $\hat{p}(1-\hat{p})$ is largest at 0.5. Narrower is even better, but it means we did take a larger sample than needed and perhaps spent more than necessary.
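Both approaches to choosing the planning value can be compared in a small Python sketch. The bound of 0.03 and the prior guess of 0.2 are hypothetical values chosen only for illustration, and the function name sample_size_for_proportion is made up for this example. The result is always rounded up, since a fractional observation is not possible.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_proportion(bound, planning_p=0.5, confidence=0.95):
    """Sample size so the error of estimation is at most `bound`.

    planning_p is the guessed sample proportion; 0.5 is the worst case.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = (z * sqrt(planning_p * (1 - planning_p)) / bound) ** 2
    return ceil(n)


# Hypothetical target: estimate the proportion to within 0.03 at 95% confidence.
n_worst_case = sample_size_for_proportion(0.03)                   # p-hat = 0.5
n_with_guess = sample_size_for_proportion(0.03, planning_p=0.2)   # prior guess

print(n_worst_case, n_with_guess)
```

The worst-case sample is noticeably larger than the one based on a prior guess, which is the cost trade-off described above.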

Do the following suggested exercises for Section 12.3 and review the solutions before proceeding.

Section Review Exercises


12.70, 12.72, 12.74, 12.76, 12.77, 12.80

Do the following "BY-HAND" using Excel only to calculate sample statistics needed; show all 7-steps for hypothesis tests and use the critical value rule to make your decision; interpret all confidence intervals 12.81, 12.82, 12.84

Do the following using the appropriate data analysis tool; show all 7 steps for hypothesis tests and use the p-value rule to make your decision; interpret all confidence intervals
12.93, 12.94, 12.100, 12.101, 12.105, 12.106, 12.107, 12.111, 12.116, 12.118, 12.122, 12.123

Exercise Solutions

12.4 Applications in Marketing

We will not be covering this section, but you might read through it to see some different applications. (I have no idea at all whether you would see something like this later on in a marketing class, so . . . take a look!)

Try these additional end of chapter exercises and review the solutions. This is where you will really get to practice figuring out what technique to use! Get in the habit of using the flow chart to help you decide!!

Chapter Review Exercises


Do the following using the appropriate data analysis tool; show all 7 steps for hypothesis tests and use the p-value rule to make your decision; interpret all confidence intervals
12.131, 12.132, 12.136, 12.137, 12.141, 12.142, 12.145, 12.150

Exercise Solutions

Is it getting any easier? The basics are usually the toughest. Once you get the hang of the hypothesis testing and confidence intervals, it is a matter of choosing the right test!! Now, we move on to look at comparing two populations to each other! And we make it exponentially tougher!!
