
Math 1040 Project

For this statistics class project, we gathered data from bags of Skittles and analyzed it
using the tools we learned during the semester. Each student purchased a 2.17-ounce bag
of Skittles, and the combined class data produced the following results.

Data Collection
Each student in the class will purchase one 2.17-ounce bag of Original Skittles and
record the following data:
Number of candies by color (class totals)

Red    Orange    Yellow    Green    Purple
260    240       255       287      256
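
The share of each color follows directly from these counts. A minimal Python sketch of that arithmetic is shown here; the names class_counts and proportions are illustrative and are not part of the assignment.

```python
# Class totals copied from the data collection table.
class_counts = {"red": 260, "orange": 240, "yellow": 255, "green": 287, "purple": 256}


def proportions(counts):
    """Return each category's share of the total count."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


class_props = proportions(class_counts)
for color, p in class_props.items():
    print(f"{color:>6}: {p:.3f}")
```

Rounded to whole percentages, these shares should agree with the labels on the pie chart.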

[Pie chart: Skittles candies. Red 20%, orange 18%, yellow 20%, green 22%, purple 20%.]


[Pareto chart: Class sample. Orange 240, yellow 255, purple 256, red 260, green 287.]

[Bar chart: my sample vs class sample, by color (orange, yellow, purple, red, green). Vertical axis from 0 to 350. Legend: Series1, Series2.]

My total candies

              Orange    Green    Purple    Yellow    Red
Count         10        21       9         15        7
Proportion    0.161     0.339    0.145     0.242     0.113

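In the pie chart the class data appear to be evenly distributed, with every color near 20%.
The Pareto chart shows the differences more clearly: green is the most common color (287) and
orange is the least common (240). I would say that my data is only roughly consistent with the
rest of the class. Green is the most common color in both, but with only 62 candies in my bag
the proportions vary more, for example 11.3% red compared with about 20% for the class.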

Mean    Std. Dev.    Min    Q1    Med    Q3      Max
59.6    2.17         56     58    59     61.5    64

Frequency Distribution

# per bag    Frequency
50-52        0
53-55        1
56-58        8
59-61        9
62-64        5
The box plot and the histogram are related: both show the center and spread of the number of
candies per bag. The mean number of candies in a 2.17-ounce bag is 59.6 with a standard
deviation of 2.17. This small spread is expected because the bags are sold by weight.
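
As a rough cross-check, the mean can be approximated from the frequency distribution alone by treating every bag in a class as if it sat at the class midpoint. The sketch below does that with the frequencies listed in the table; the helper name grouped_mean is my own, and the result is only an approximation of the mean computed from the raw counts.

```python
# (lower limit, upper limit, frequency) rows copied from the frequency distribution.
bins = [(50, 52, 0), (53, 55, 1), (56, 58, 8), (59, 61, 9), (62, 64, 5)]


def grouped_mean(rows):
    """Approximate the mean from grouped data using class midpoints."""
    n = sum(freq for _, _, freq in rows)
    weighted_total = sum((low + high) / 2 * freq for low, high, freq in rows)
    return weighted_total / n, n


approx_mean, n_bags = grouped_mean(bins)
print(f"{n_bags} bags, approximate mean {approx_mean:.2f}")
```

The frequencies add up to 23 bags, and the midpoint estimate lands near 59 candies per bag.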

Categorical vs quantitative data


Categorical data classify individuals by an attribute, such as the classes someone is taking
for school or a person's gender. Pie charts, Pareto charts, and bar graphs are used to display
categorical data. Quantitative data are counts or measurements, such as the number of females
in your class or how many students attend your school. Quantitative data can be displayed with
stem plots, dot plots, histograms, box plots, scatter plots, and time series plots. In this
project the candy colors are categorical and the number of candies per bag is quantitative.
Using a histogram made the number of candies per bag much easier to interpret.

Confidence interval
A confidence interval is a range of values, computed from sample data, that is likely to
contain a population parameter such as a proportion, a mean, or a standard deviation. The
confidence level states how confident we are that the interval captures the true value.
For this project we were asked to:
Construct a 99% confidence interval estimate for the true proportion of yellow candies.
Construct a 95% confidence interval estimate for the true mean number of candies per bag.
Construct a 98% confidence interval estimate for the standard deviation of the number of
candies per bag.

The 99% confidence interval for the true proportion of yellow candies was determined to be
0.168 < p < 0.224. From this we can say that we are 99% confident that the true proportion of
yellow candies among all Skittles falls within that interval.
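
The yellow interval can be checked with the usual normal-approximation formula for a proportion. This is a minimal sketch, assuming the class totals (255 yellow out of 1,298 candies) are the sample and that this is the method that was used; the function name proportion_ci is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(255, 1298, 0.99)
print(f"{low:.3f} < p < {high:.3f}")
```

With these counts the limits come out close to the interval reported above; small differences in the last digit can come from rounding.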
The 95% confidence interval for the true mean number of candies per bag was determined to have
a lower limit of 58.01 and an upper limit of 60.18. From this we can say that we are 95%
confident that the true mean falls between 58.01 and 60.18.
The 98% confidence interval for the standard deviation of the number of candies per bag was
1.85 to 3.80. We are 98% confident that the true standard deviation falls between 1.85 and
3.80 candies.
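
For reference, the intervals for the mean and the standard deviation are usually built from the textbook formulas below. The report does not show which formulas were used, so this is only the standard form. Here the sample mean is written as \bar{x}, the sample standard deviation as s, and the number of bags as n; t_{\alpha/2} is the t critical value, and \chi^2_L and \chi^2_R are the left and right chi-square critical values, all with n - 1 degrees of freedom.

```latex
% Confidence interval for the mean (population standard deviation unknown)
\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}

% Confidence interval for the standard deviation
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{R}}} \;<\; \sigma \;<\; \sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{L}}}
```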
Hypothesis Tests
A hypothesis test uses sample data to decide whether there is enough evidence to reject a claim
about a population. We do this by stating the null hypothesis and the alternative hypothesis and
then comparing a test statistic with a critical value. For this project we were asked to:
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.

For the first test we used the classical method with a significance level of 0.05 to test the claim
that 20% of Skittles are red. Our test statistic was 1.89, which is less than the critical value of
1.96, so we failed to reject the null hypothesis. This result seems reasonable because the proportion
of red candies in the class sample is 0.200.
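
The classical decision rule for a proportion can be sketched as follows. This assumes the class totals (260 red out of 1,298 candies) are the sample and a two-tailed test; the report does not show its inputs, so the statistic printed here is not expected to reproduce the 1.89 quoted above. The function name one_proportion_z is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic for testing H0: p = p0 against a sample proportion."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
z = one_proportion_z(260, 1298, 0.20)
reject = abs(z) > z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Whenever the absolute value of the statistic stays below the critical value, the decision is the same: fail to reject the null hypothesis.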
For the second test we used the t-distribution to find our test statistic. Our test statistic was
7.87 and the critical value was 2.819. Because the test statistic is greater than the critical value,
it falls in the rejection region and we reject the null hypothesis that the mean is 55. This result
seems reasonable because the sample mean was calculated to be 59.13.
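
For reference, the usual one-sample t statistic has the form below, where \bar{x} is the sample mean, s is the sample standard deviation, n is the number of bags, and \mu_0 is the claimed mean. The notation is my own. The critical value 2.819 is the two-tailed table value at the 0.01 level for 22 degrees of freedom, which corresponds to n = 23 bags.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = 55, \qquad \text{df} = n - 1
```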

Reflection
The conditions that are needed for confidence interval estimates and hypothesis tests are as follows:
1. The sample must be a simple random sample. For our project this was the case.
2. The binomial distribution of the sample proportion must be approximately normal, and the
number of candies per bag should be approximately normally distributed. The histogram
appears roughly bell shaped. However, one outlier was noticed, and the data are also
skewed left.
3. The sample size should be at least 30. In our case we fell short of this requirement.
Because our sample is smaller than required, the tests have less power, which could lead
to a Type II error.
