
STAT 4220 Fall 2011
Assignment 1
DUE: Friday, September 9, in class

Question 1
Decide whether a discrete or continuous random variable is the best model for each of the following variables. (0.5 pts each)
(a) The lifetime of a biomedical device after implant in a patient
(b) The number of times a transistor in a computer memory changes state in one operation
(c) The strength of a concrete specimen
(d) The proportion of defective solder joints on a circuit board
(e) The weight of an injection-molded plastic part
(f) The number of molecules in a sample of gas
(g) The energy generated from a reaction
(h) The concentration of organic solids in a water sample

Question 2
Consider the two samples shown here:
Sample 1: 20 19 18 17 18 16 20 16
Sample 2: 20 16 20 16 18 20 18 16
(a) Calculate the range for both samples. Would you conclude that both samples exhibit the same variability? Explain. (1 pt)
(b) Calculate the sample standard deviation (sd) for both samples. Do these quantities indicate that both samples have the same variability? Explain. (1 pt)
(c) Write a short statement contrasting the sample range with the sample sd as a measure of variability. (1 pt)

Question 3
A very important aspect of a statistician's job is interpreting data. If this isn't done with care, you can come up with some dangerous false conclusions! For instance, suppose that you have just had a potentially life-threatening medical condition that requires surgery. You can have this surgery at one of two hospitals (let's call them A and B). You look up information on the Internet, and find the following data on deaths following surgery in the two hospitals:

            Hospital A   Hospital B
Died                63           16
Survived          2037          784
Total             2100          800

(a) Which hospital do you choose and why? You should support your explanation with evidence from the data. (2 pts)

Now it turns out that each patient was classified as being in either good or poor condition when first admitted to hospital. We may break the hospital data down according to this new categorical variable, as displayed in the following two tables.

GOOD CONDITION
            Hospital A   Hospital B
Died                 6            8
Survived           594          592
Total              600          600

POOR CONDITION
            Hospital A   Hospital B
Died                57            8
Survived          1443          192
Total             1500          200
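As a minimal sketch of how these tables can be compared, the following Python snippet computes the death rate for each hospital within each condition group and after pooling the two groups. The counts come from the GOOD CONDITION and POOR CONDITION tables; the dictionary layout and the function names are illustrative assumptions, not part of the assignment.

```python
# Counts copied from the two condition tables: (died, total) per hospital.
# The dictionary layout and names are illustrative assumptions.
counts = {
    "good": {"A": (6, 600), "B": (8, 600)},
    "poor": {"A": (57, 1500), "B": (8, 200)},
}


def death_rate(died, total):
    """Return the proportion of patients who died."""
    return died / total


def overall_rate(hospital):
    """Pool both condition groups for one hospital and return its death rate."""
    died = sum(counts[cond][hospital][0] for cond in counts)
    total = sum(counts[cond][hospital][1] for cond in counts)
    return death_rate(died, total)


for hospital in ("A", "B"):
    by_condition = ", ".join(
        f"{cond}: {death_rate(*counts[cond][hospital]):.2%}" for cond in counts
    )
    print(f"Hospital {hospital}: overall {overall_rate(hospital):.2%} ({by_condition})")
```

Printing the pooled and per-condition rates side by side makes it easy to check whether the ranking of the two hospitals is the same at both levels of aggregation.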

(b) Suppose that you are a patient in good condition. Which hospital do you prefer? What about if you are in poor condition? Give a brief explanation based on the data. (2 pts)
(c) This is a situation in which the association between two variables (survival status and hospital) is completely turned around according to whether or not a lurking variable is taken into account. What do we call this phenomenon in statistics? What is the lurking variable in this case? Can you explain, with evidence from the data, why the lurking variable that you identified is important in this example? (3 pts)

Question 4
This example is concerned with data from an experiment on shoe wear. A manufacturing company wanted to compare two synthetic materials (A and B) for use in the soles of boys' shoes. Data on the reduction in sole thickness were taken from a number of shoes after boys had worn them for a period of three months. The results are displayed in the plot below (data source: Box, Hunter & Hunter (1978), Statistics for Experimenters).
[Figure: Dot plot of shoe wear data, grouped according to the material used to construct the shoe sole. Groups: Material_A and Material_B; horizontal axis: WEAR, with ticks at 10, 12 and 14.]

(a) What type of plot is this? (0.5 pts)
(b) Do you think that these data suggest that either material is better than the other? Explain. (3 pts)

Question 5
A team of researchers is investigating the effect of two drugs designed to help people quit smoking. They found that 39 people out of 90 who decided to use Drug A at the beginning of 1998 were no longer smoking at the end of 1998. In contrast, only 17 people out of 115 who chose to use Drug B at the beginning of 1998 had quit smoking by the end of 1998. The researchers concluded that Drug A is superior to Drug B when it comes to helping people quit smoking.
(a) Is this an experiment or an observational study? Explain. (1 pt)
(b) Comment on the validity of the researchers' conclusion. Can you think of any other reasons why more individuals in the first group might have quit smoking? (2 pts)
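The comparison behind the researchers' claim in Question 5 is simple arithmetic, sketched below in Python. The variable names are illustrative assumptions. The snippet only reproduces the observed quit proportions and says nothing about why the two groups differ.

```python
# Counts quoted in Question 5; the variable names are illustrative assumptions.
quit_a, total_a = 39, 90    # Drug A: quitters, users
quit_b, total_b = 17, 115   # Drug B: quitters, users

# Observed proportion of users who had quit by the end of 1998.
prop_a = quit_a / total_a
prop_b = quit_b / total_b

print(f"Drug A: {prop_a:.1%} quit, Drug B: {prop_b:.1%} quit")
print(f"Difference in proportions: {prop_a - prop_b:.3f}")
```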

Question 6
(a) What type of plot is this? (0.5 pts)
(b) Describe the phenomenon displayed. (1 pt)
(c) Which plot type asks you to draw the relationship between two variables? (1 pt)
(d) Use rough values from the given plot to construct an approximate example of the plot you proposed in part (c). (2 pts)
(e) What advantage does the plot given here have over the type in your answer to part (c)? (2 pts)

Question 7
A scientist collected data on a particular species of crab. She recorded the weight of n = 12 crabs, giving a sample mean of x̄ = 248.7 and a standard deviation of s = 27.4 (all measurements in grams). She then found that one crab was an outlier, turning out to be from another species. It had a weight of 299.9 grams. She removed this crab from the sample and recalculated the mean and standard deviation from the remaining 11 crabs.
(a) What was the mean of these 11 crabs? (1 pt)
(b) Carrying on from part (a), would you expect the standard deviation to be smaller or larger after the removal of the outlier? (2 pts)
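The recalculation described in Question 7 can be done from the summary statistics alone, without the individual weights. The Python sketch below is one way to do it, assuming that s is the sample standard deviation with divisor n - 1 (the question does not state the divisor). All variable names are illustrative.

```python
import math

# Summary statistics quoted in Question 7 (grams).
n, mean, sd = 12, 248.7, 27.4
outlier = 299.9

# Recover the sum and the sum of squares from the summaries.
# Assumption: sd uses the n - 1 divisor, so sum((x - mean)^2) = (n - 1) * sd^2.
total = n * mean
sum_sq = (n - 1) * sd**2 + n * mean**2

# Drop the outlier and recompute both statistics for the remaining crabs.
n_new = n - 1
mean_new = (total - outlier) / n_new
ss_new = (sum_sq - outlier**2) - n_new * mean_new**2
sd_new = math.sqrt(ss_new / (n_new - 1))

print(f"mean of remaining {n_new} crabs: {mean_new:.2f} g")
print(f"sd of remaining {n_new} crabs:   {sd_new:.2f} g")
```

Working from the sum and sum of squares avoids needing the raw data, at the cost of relying on the stated divisor assumption.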

Question 8
Download the room temperature dataset (temperature measurements, in Kelvin, taken from 4 corners of a room) from Wyoweb.
(a) Plot the 4 trajectories, FrontLeft, FrontRight, BackLeft and BackRight, on the same plot. (3 pts)
(b) Comment on any features you observe in your plot. (2 pts)

Question 9
Download the six point board thickness dataset (thickness of 26 SPF boards from a saw mill; the thickness is measured with a laser and the units of measurement are mils, one thousandth of an inch; this is a subset of a larger industrial data set, and no further adjustments were made to the data) from Wyoweb.
(a) Construct an appropriate plot for the first 100 rows of data. (3 pts)
(b) Comment on these data based on the plot you constructed in part (a). (3 pts)

Question 10
Download the website traffic dataset (the number of visits to a small website on each day; if a user accesses the site after 30 minutes of inactivity, that will be logged as a new visit) from Wyoweb.
(a) Calculate the five-number summary and standard deviation for the variable visits in the data set. (1 pt)
(b) Construct a histogram with a density plot for the variable visits in the data set. Comment on your plot. (3 pts)
(c) Create a plot that shows the variability in website traffic for each day of the week. Comment on your plot. (3 pts)
(d) Use the same data set to describe any time-based trends that are apparent. (3 pts)
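For the five-number summary and standard deviation asked for in Question 10(a), a minimal Python sketch using only the standard library might look like the following. The `visits` list is made-up placeholder data, not the Wyoweb dataset, and the helper name `five_number_summary` is an assumption. Quartile conventions differ between textbooks and software; this sketch uses the "inclusive" method of `statistics.quantiles`, which may not match the convention used in the course.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    # "inclusive" treats the data as containing the population min and max;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical daily visit counts, NOT the actual Wyoweb dataset.
visits = [27, 31, 39, 40, 33, 24, 21, 29, 35, 38]

print("five-number summary:", five_number_summary(visits))
print("sample sd:", statistics.stdev(visits))
```

With the real data, the `visits` list would be replaced by the column read from the downloaded file.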
