MATH& 146

Lesson 4
Section 1.3
Study Beginnings


Populations and Samples
The population is the complete collection of
individuals or objects that you wish to learn about.

To study larger populations, we select a sample. The
idea of sampling is to select a portion of the population
and study that portion to gain information about the
population.


Parameters and Statistics
A parameter is a value (usually a proportion or average) that describes the population. For every parameter there is a corresponding sample statistic. The statistic is a numerical value summarizing the sample data and describing the sample the same way the parameter describes the population.

Example 1
Define the population, sample, parameter, and statistic from the following study: We want to know the proportion of new students that were satisfied with New Student Orientation at YVC. 100 first-year students at the college were randomly sampled, and 72 said they were satisfied.
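In Example 1 the sample statistic is the proportion of sampled students who were satisfied. Here is a minimal Python sketch of that arithmetic. The helper name `sample_proportion` is hypothetical and is not defined anywhere in the lesson.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return the sample proportion p-hat = successes / n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return successes / n


# Example 1: 72 of the 100 randomly sampled first-year students were satisfied.
p_hat = sample_proportion(72, 100)
print(p_hat)  # 0.72
```

The value 0.72 describes only the 100 sampled students. It is the statistic used to estimate the parameter.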

Example 2
Define the population, sample, parameter, and statistic from the following study: You want to determine the average number of glasses of milk college students drink per day. Suppose yesterday you asked five of your friends in your English class how many glasses of milk they drank the day before. The answers were 1, 3, 1, 0, and 12 glasses of milk.
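The research question in Example 2 asks about an average, so the statistic is a sample mean. The sketch below computes it with Python's standard `statistics.mean`. The variable names are only for illustration.

```python
from statistics import mean

# The five answers from Example 2 (glasses of milk drunk the day before).
glasses = [1, 3, 1, 0, 12]

# Sample mean: the statistic that estimates the population average.
x_bar = mean(glasses)
print(x_bar)  # 3.4
```

The single answer of 12 pulls the mean above all four of the other responses.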

Research Questions
The first step in conducting research is to identify topics or questions that are to be investigated. A clearly laid out research question is helpful in identifying what subjects or cases should be studied and what variables are important.

Research Questions
A research question should refer to a target population. Oftentimes, however, it is too expensive or difficult to collect data for every case in a population. Instead, a sample is taken. A sample represents a subset of the cases and is often a small fraction (usually less than one-tenth) of the population. Sample data is then used to estimate the population parameter and answer the research question.

Example 3
Consider the following research question: "Over the last 5 years, what is the average time to degree for Duke undergraduate students?"
a) What are the target population and parameter?
b) Suppose the researcher met two students who took more than 7 years to graduate from Duke. Does that prove it takes longer to graduate at Duke than at other colleges? Why or why not?

Example 4
Consider the following research question: "Does a new drug reduce the number of deaths in patients with severe heart disease?"
a) What are the target population and parameter?
b) Suppose my friend's dad had a heart attack and died after they gave him the new heart disease drug. Does that prove that the drug does not work? Why or why not?

Anecdotal Evidence
Both of the conclusions of the last two examples were based on some data. However, there were two problems:
• First, the data only represent one or two cases.
• Second, it is unclear whether these cases are actually representative of the population.
Data collected in this haphazard fashion are called anecdotal evidence.

Anecdotal Evidence
When anecdotal evidence is cited, there is no reason to expect the individuals to be representative of anyone but themselves. They can make nice stories, but lousy statistics.

Bias
If someone were permitted to pick and choose exactly which cases were included in a sample, it is entirely possible that the sample could be skewed to that person's interests. This introduces bias into a sample. A biased sample causes problems because any statistic computed from that sample has the potential to be consistently erroneous.
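A small simulation helps make "consistently erroneous" concrete. The Python sketch below uses a made-up population and a hypothetical "pick and choose" rule in which the person drawing the sample only selects cases with values above 50. Neither the population nor the rule comes from the lesson. The simulation repeats each sampling method 200 times. The means from random samples center on the parameter, while the means from hand-picked samples consistently land above it.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical population of 10,000 synthetic measurements (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = fmean(population)  # the population parameter

# Hypothetical "pick and choose" rule: the person drawing the sample
# only ever selects cases whose value is above 50.
favored = [x for x in population if x > 50]


def random_sample_mean(n: int) -> float:
    """Mean of a simple random sample: every case is equally likely."""
    return fmean(random.sample(population, n))


def hand_picked_sample_mean(n: int) -> float:
    """Mean of a sample drawn only from the favored cases (selection bias)."""
    return fmean(random.sample(favored, n))


trials, n = 200, 100
random_means = [random_sample_mean(n) for _ in range(trials)]
biased_means = [hand_picked_sample_mean(n) for _ in range(trials)]

print(f"parameter (population mean):    {true_mean:.2f}")
print(f"average of random-sample means: {fmean(random_means):.2f}")
print(f"average of hand-picked means:   {fmean(biased_means):.2f}")
print("hand-picked means above the parameter:",
      sum(m > true_mean for m in biased_means), "of", trials)
```

Larger hand-picked samples would not fix the problem. The error comes from how the cases were chosen, not from how many were chosen.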

An Example of “Bad Data”
The 1936 presidential election between Franklin Roosevelt and Alf Landon is notable for the Literary Digest poll, which was based on over two million returned postcards. In its October 31 issue, the Literary Digest predicted that Landon would easily win, with 370 electoral votes and 57% of the popular vote.

1936 Election Results
Candidate     Predicted Vote    Actual Vote
FDR           161 (~43%)        523 (60.8%)
Alf Landon    370 (57%)         8 (36.5%)
(Votes are electoral votes, with the share of the popular vote in parentheses.)
Landon's electoral vote total of eight is a tie for the record low for a major-party nominee since the current U.S. two-party system began in the 1850s. The Literary Digest was completely discredited because of the poll and was soon discontinued.
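The 1936 results table can be used to measure how far off the poll was. This short Python snippet only redoes that arithmetic with the table's own figures. FDR's predicted share is given only as "~43%", so his popular-vote gap is approximate.

```python
# Figures from the 1936 results table: (electoral votes, % of popular vote).
# FDR's predicted share is listed as "~43%", so his popular-vote miss is approximate.
predicted = {"FDR": (161, 43.0), "Landon": (370, 57.0)}
actual = {"FDR": (523, 60.8), "Landon": (8, 36.5)}

for name in predicted:
    pred_ev, pred_pct = predicted[name]
    act_ev, act_pct = actual[name]
    # Positive numbers mean the candidate did better than the poll predicted.
    print(f"{name}: electoral miss {act_ev - pred_ev:+d}, "
          f"popular-vote miss {act_pct - pred_pct:+.1f} points")
```

The snippet reports an electoral-vote miss of 362 for each candidate. The popular-vote misses are about 17.8 points for FDR and 20.5 points for Landon.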

Why did the Literary Digest fail?
The first major problem with the poll was in the selection process for the names on the mailing list, which were taken from telephone directories, club membership lists, lists of magazine subscribers, etc. Such a list is guaranteed to be slanted toward middle- and upper-class voters, and by default to exclude lower-income voters.

Why did the Literary Digest fail?
The second problem with the Literary Digest poll was that out of the 10 million people whose names were on the original mailing list, only about 2.4 million responded to the survey. Thus, the size of the sample was about one-fourth of what was originally intended. (In addition, people who respond to surveys are different from people who don't.)
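The "about one-fourth" figure is the poll's response rate. A quick Python check using the slide's own numbers:

```python
# Figures from the slide: 10 million names mailed, about 2.4 million replies.
mailed = 10_000_000
responded = 2_400_000

response_rate = responded / mailed
print(f"response rate: {response_rate:.0%}")  # response rate: 24%
```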

Bias
In general, there are three common types of bias that might occur in a sample:
• Selection bias: The method for selection makes the sample unrepresentative of the population.
• Nonresponse bias: A sample is chosen, but a subset cannot or will not respond.
• Response bias: Participants in a survey provide incorrect information, intentionally or unintentionally.
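Nonresponse bias can occur even when the sample itself is chosen at random. The Python sketch below is a hypothetical simulation and does not use data from the lesson. It builds a synthetic population in which 60% support candidate A and assumes response probabilities of 20% for supporters and 40% for everyone else. The random sample reflects the population well. The respondents show only about 0.12 / (0.12 + 0.16) ≈ 0.43 support.

```python
import random

random.seed(7)

# Hypothetical population of 100,000 voters (synthetic, not real data):
# 1 = supports candidate A, 0 = does not. True support is 60%.
population = [1] * 60_000 + [0] * 40_000

# Assumed response probabilities: supporters of A are less likely to reply.
RESPONSE_PROB = {1: 0.2, 0: 0.4}

# Step 1: a simple random sample, so there is no selection bias.
sample = random.sample(population, 5_000)

# Step 2: each sampled person replies only with their assumed probability.
respondents = [x for x in sample if random.random() < RESPONSE_PROB[x]]

true_p = sum(population) / len(population)
sample_p = sum(sample) / len(sample)
respondent_p = sum(respondents) / len(respondents)

print(f"population proportion:    {true_p:.3f}")
print(f"random sample proportion: {sample_p:.3f}")
print(f"respondent proportion:    {respondent_p:.3f} from {len(respondents)} replies")
```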

Bias
Bias is the bane of sampling: the one thing above all to avoid. Conclusions based on samples drawn with biased methods are inherently flawed. There is usually no way to fix bias after the sample is drawn and no way to salvage useful information from it.

Example 5
Indicate whether the potential bias is a selection bias, a nonresponse bias, or a response bias. A survey question asked of unmarried men was "What is the most important feature you consider when deciding whether to date somebody?" The results were found to depend on whether the interviewer was male or female.

Example 6
For each situation, explain why selection bias could be introduced, and how it could affect your results.
a) A cage has 1000 rats. You pick the first 20 you can catch for your experiment.
b) A public opinion poll is conducted using the telephone directory.
c) You are conducting a study of a new diabetes drug. You advertise for participants in the newspaper and on TV.

Example 7
You need to conduct a study of longevity for people who were born in the decade following the end of World War II in 1945. If you were to visit graveyards and use only the birth/death dates listed on tombstones, would you get good results? Why or why not?

Example 8
"If you had to do it over again, would you have children?" This is the question that advice columnist Ann Landers asked her readers back in 1976. It turns out that nearly 70% of the 10,000 responses she received were "No." A professional poll by Newsday found that 91% of randomly chosen respondents would have children again. Explain the apparent contradiction between these two surveys using what you have learned about sampling.

Types of Variables
In many studies more than one variable is recorded per case or individual. It is often the purpose of a study to determine if and/or how one variable (called the explanatory variable) affects another (called the response variable).

Types of Variables
• Response Variable: The outcome of a study; a variable you would be interested in predicting or forecasting.
• Explanatory Variable: Any variable that explains the response variable.

Example 9
Pick out which variable you think should be the explanatory variable and which variable should be the response.
a) Weights of nuggets of gold (in ounces) and their market value (in $) over the last few days are provided, and you wish to use this to estimate the value of a gold ring that weighs 4 ounces.

Example 9 continued
b) You have data collected on the amount of time since chlorine was added to the public swimming pool and the concentration of chlorine still in the pool. Chlorine was added at 8 AM, and you wish to know what the concentration is now, at 3 PM.
c) You have data on the circumference of oak trees (measured 12 inches from the ground) and their age (in years). An oak tree in the park has a circumference of 36 inches, and you wish to know approximately how old it is.

Example 10
Suppose you wanted to conduct a study to predict a student's success. Using a student's GPA as the response variable, what are some explanatory variables that might be worth considering? Determine the variable type (categorical or numerical) of each explanatory variable. For each numerical explanatory variable, guess whether the association with the response will be positive, negative, or none.