
Maddison Capel

Math 1040 Skittles Project


April 29, 2015

Introduction
This project is a statistical analysis of the contents of a 2.17-ounce bag of Skittles. We
analyzed the contents of our own bags as well as the bags of the rest of the class, for a
total of 40 bags. The purpose of this project is to find the average number of Skittles per
bag and how many of each color are in each bag. The statistical techniques used in the
project are organizing and displaying categorical and quantitative data, confidence
intervals, and hypothesis testing. This is a fun and helpful way of showing how statistics
can be useful in everyday life.

Organizing and Displaying Categorical Data: Colors


The charts below show the proportion of the Skittle colors in every bag included in the
project.

[Chart: Count of Skittle Colors. Red 457, Orange 541, Yellow 487, Green 452, Purple 500]
[Chart: Skittle Color Proportions. Red 0.19, Orange 0.22, Yellow 0.20, Green 0.19, Purple 0.21]

My original assumption was that the Skittle colors would be evenly distributed, with 20%
of each color. The data and graphs largely support this: each color makes up between
about 19% and 22% of the sample. It appears that the manufacturer tries to put an equal
amount of each color in every bag.

My bag of Skittles contained:

Color     Number of candies   Proportion
Red                      11        0.175
Orange                   14        0.222
Yellow                   14        0.222
Green                    13        0.206
Purple                   11        0.175

The total number of candies in the sample: 2437


The entire sample contains:

Color     Number of candies   Proportion
Red                     457        0.188
Orange                  541        0.222
Yellow                  487        0.200
Green                   452        0.185
Purple                  500        0.205
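
The proportions in the class table can be reproduced from the color counts. The following is a minimal sketch in Python; the variable names are illustrative and are not part of the original project.

```python
# Class totals by color, copied from the table for the entire sample.
counts = {"red": 457, "orange": 541, "yellow": 487, "green": 452, "purple": 500}

total = sum(counts.values())  # total number of candies in the class sample

# Proportion of each color = color count / total count.
proportions = {color: n / total for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:<8}{counts[color]:>5}{p:>8.3f}")
print(f"{'total':<8}{total:>5}")
```

Dividing by the total rather than by the number of bags gives the share of all candies that are a given color, which is the quantity used later for the proportion interval and test.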

Organizing and Displaying Quantitative Data: the Number of Candies per Bag

The total number of candies in my single 2.17-ounce bag of Skittles = 63
The total number of bags in the sample collected by the entire class = 40
The total number of candies in the sample collected by the entire class = 2437
For the entire sample:
x̄ = 60.9
s = 1.9
5-number summary:
Min: 54.0
Q1: 60.0
Q2 (median): 61.0
Q3: 62.0
Max: 64.0
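
As a quick check on the summary values, the sample mean can be recomputed from the two totals, and the quartiles can be used to look for outliers. This sketch assumes the three quartiles are listed in the order Q1, median, Q3, and it uses the common 1.5 × IQR rule, which the project itself does not mention.

```python
# Values reported for the class sample.
total_candies = 2437
total_bags = 40
q1, median, q3 = 60.0, 61.0, 62.0
minimum, maximum = 54.0, 64.0

mean = total_candies / total_bags  # sample mean, candies per bag

# 1.5 * IQR rule (an assumption here; the report does not name an outlier rule).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"mean = {mean:.1f}")
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("minimum is a low outlier:", minimum < lower_fence)
print("maximum is a high outlier:", maximum > upper_fence)
```

Under that rule the smallest bag falls below the lower fence while the largest bag stays inside the upper fence, which is consistent with a distribution that has a longer tail on the left.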

The graph does depict what I expected. As I was originally looking through the class data
of the total number of Skittles in each bag, I could tell right away that the mean would be
around 61 or 62 just based on the frequency of those numbers in the data set. It appears
that the data is skewed to the left. The data from the entire class does fit with what my
personal data showed.

[Histogram: Number of Skittles per bag. Horizontal axis: Candies per bag. Vertical axis:
Frequency. Bar frequencies, left to right: 1, 1, 3, 8, 10, 11, 5, 1]

Reflection: Categorical Versus Quantitative Data


Categorical data consists of names or labels that are not numbers representing counts or
measurements. Quantitative data, as its name suggests, consists of numbers
representing counts or measurements. The types of graphs that make sense for
categorical data are pie charts and Pareto charts, because they group the data into
categories that are identified by name rather than by number. Since the categories are
only labels, it wouldn't make sense to find a difference, mean, or standard deviation of
the labels themselves. You can still count how many items fall in each group and graph
the data that way. The types of graphs that would be used for quantitative data are
ogives, dotplots, and histograms. Those graphs work best with quantitative data because
they all place numbers on the horizontal axis instead of names, so the groups are
defined by numeric values or ranges rather than by a name or label. It wouldn't make
sense to name the sections of these graphs because they are based on numbers.

Confidence Interval Estimates


A confidence interval is a range of values used to estimate the true value of a population
parameter. You could use a confidence interval to estimate a population proportion, a
population standard deviation, or a population mean by taking a sample statistic and
calculating a margin of error around it. Below are the answers I found to the questions
while using the confidence interval technique:
Construct a 95% confidence interval estimate for the true proportion of purple candies:
0.189 < p < 0.221. What that means is we are 95% confident that the interval from 0.189
to 0.221 contains the true population proportion of purple candies.
Construct a 99% confidence interval estimate for the true mean number of candies per
bag: 60.09 < μ < 61.71. What that means is we are 99% confident that the interval from
60.09 to 61.71 contains the population mean.
Construct a 98% confidence interval estimate for the standard deviation of the number
of candies per bag: 1.49 < σ < 2.52. What that means is we are 98% confident that the
interval from 1.49 to 2.52 contains the population standard deviation.
Full written calculations can be found on the last page.
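
The three intervals can be checked with a short Python sketch. This is not the written calculation from the project. The t and chi-square critical values are typed in from standard tables as assumptions, and the chi-square values come from the df = 40 row, which is the nearest row many printed tables offer for n - 1 = 39. Those values happen to reproduce the bounds reported above; exact df = 39 values would give slightly different bounds.

```python
from math import sqrt
from statistics import NormalDist

# 95% CI for the proportion of purple candies (normal approximation).
n_candies, purple = 2437, 500
p_hat = purple / n_candies
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96
e_p = z_crit * sqrt(p_hat * (1 - p_hat) / n_candies)
ci_p = (p_hat - e_p, p_hat + e_p)

# 99% CI for the mean number of candies per bag (t distribution).
n_bags, x_bar, s = 40, 60.9, 1.9
t_crit = 2.708  # assumed table value: 39 degrees of freedom, 0.005 in each tail
e_mu = t_crit * s / sqrt(n_bags)
ci_mu = (x_bar - e_mu, x_bar + e_mu)

# 98% CI for the standard deviation (chi-square distribution).
# Assumed table values from the df = 40 row, 0.01 in each tail.
chi2_right, chi2_left = 63.691, 22.164
df = n_bags - 1
ci_sigma = (sqrt(df * s**2 / chi2_right), sqrt(df * s**2 / chi2_left))

print(f"{ci_p[0]:.3f} < p < {ci_p[1]:.3f}")
print(f"{ci_mu[0]:.2f} < mu < {ci_mu[1]:.2f}")
print(f"{ci_sigma[0]:.2f} < sigma < {ci_sigma[1]:.2f}")
```

Note that the proportion interval uses n = 2437 candies while the mean and standard deviation intervals use n = 40 bags.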

Hypothesis Tests
A hypothesis test is a procedure for testing a claim about a property of a population. First
you state a null hypothesis, that the parameter is equal to the claimed value, and an
alternative hypothesis, that it is greater than, less than, or not equal to that value. You
then calculate the test statistic and either reject the null hypothesis or fail to reject
the null hypothesis. Below are the answers I found to the questions while using the
hypothesis testing technique:
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green:
n = 2437    x = 452    p = 0.20    q = 0.80    α = 0.01    p̂ = 0.1854739
H₀: p = 0.20
H₁: p ≠ 0.20
z = -1.7927    critical values: ±2.575    p-value = 0.0730
p-value > α: Fail to reject the null hypothesis. There is not sufficient evidence to warrant
rejection of the claim that 20% of Skittles are green.
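
The test statistic for the green-candy claim can be recomputed with Python's standard library. This is a sketch that assumes a two-tailed z test for a proportion; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x = 2437, 452        # candies in the sample, green candies
p0, alpha = 0.20, 0.01  # claimed proportion, significance level

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic

std_normal = NormalDist()
# Exact two-tailed critical value; printed tables often round it to 2.575.
critical = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * std_normal.cdf(-abs(z))       # two-tailed p-value

reject = p_value < alpha
print(f"p_hat = {p_hat:.7f}, z = {z:.4f}, critical = ±{critical:.3f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing |z| with the critical value and comparing the p-value with α are two forms of the same decision rule, so they should always lead to the same conclusion.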
Use the 0.05 significance level to test the claim that the mean number of candies in the
bag of Skittles is 56:
n = 40    x̄ = 60.9    μ = 56.0    s = 1.9    α = 0.05
H₀: μ = 56.0
H₁: μ ≠ 56.0
t = 16.3107    critical values: ±2.023    p-value < 0.0001
p-value < α: Reject the null hypothesis. There is sufficient evidence to warrant rejection
of the claim that the mean number of Skittles in a bag is 56.
Full written calculations can be found on the last page.
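
The t statistic for the claim about the mean can be checked the same way. Python's standard library has no t distribution, so this sketch compares the statistic with the table critical value reported above instead of computing a p-value.

```python
from math import sqrt

n, x_bar, s = 40, 60.9, 1.9  # bags, sample mean, sample standard deviation
mu0 = 56.0                   # claimed mean number of candies per bag

standard_error = s / sqrt(n)
t = (x_bar - mu0) / standard_error  # test statistic with n - 1 = 39 degrees of freedom

# Two-tailed critical value at the 0.05 level, taken from the report.
t_crit = 2.023

reject = abs(t) > t_crit
print(f"standard error = {standard_error:.4f}, t = {t:.4f}")
print("reject H0:", reject)
```

The statistic is large because the sample mean sits nearly five candies above the claimed value while the standard error is only about 0.3 candies.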

Reflection: Confidence Interval and Hypothesis Tests


The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population proportions are as follows: The sample must be a
simple random sample, which means each member of the population has an equal probability
of being chosen. A simple random sample is meant to be an unbiased representation of a
group. The conditions for the binomial distribution must be satisfied, which means there
is a fixed number of trials, the trials are independent, there are two categories of
outcomes, and the probabilities remain constant for each trial. Lastly, there must be at
least 5 successes and 5 failures. Yes, my sample did meet the conditions, as it was a
simple random sample, there were only two possible outcomes, and the trials were
independent.
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population means are as follows: The sample must be a simple
random sample. The population must be normally distributed, or the sample size must be
larger than 30 (n > 30). Yes, the sample does meet the requirements because it is a
simple random sample and the sample size is larger than 30 (n = 40).
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on a population standard deviation are as follows: The sample must
be a simple random sample. The population must have normally distributed values. The
requirement of a normal distribution is much stricter here than in the earlier
topics, so departures from a normal distribution can result in large errors. The sample is
a simple random sample and it is normally distributed, so it does meet the requirements.
One possible error in working with this data is mixing up which n to use in a given
equation. Our calculations used two sample sizes depending on what the question was
asking: n = 2437, the total number of Skittles, and n = 40, the number of bags of
candies. It could be very easy to mix the two up and get very wrong answers. Another
error that could have been made is a Type I or Type II error. A Type I error is when you
reject a null hypothesis when it is actually true. A Type II error is when you fail to reject a
null hypothesis when it is actually false. The sampling method could be improved by
collecting more bags to get a more accurate understanding. It might also be helpful
when doing this project to purchase bags of Skittles from different stores and locations.
If all the Skittles are purchased from the same store, that store could have received a
shipment that was unusual in some way. Maybe the manufacturer sent out a shipment of
bags that had far too few or far too many Skittles, which would skew the data and
cause outliers.
In conclusion, the data from the entire class shows there are slightly more orange candies
in a bag of Skittles than any other color. However, the colors are very close to being
evenly distributed, with about 20% of each color across the sample. The manufacturer
also appears to put roughly the same number of candies in each bag.

Reflective Writing
As a result of this project I have learned how to calculate a confidence interval for the
proportion, mean, and standard deviation of a set of sample data. It showed me how to
organize and display the data in a meaningful and constructive way. Performing the steps
in this project also helped solidify the key concepts I have learned throughout the
semester. I am a very hands-on learner, so this project was a perfect way to help me
remember these concepts long term. It is also useful to know how to take a set of data
and either dispute or confirm a claim made by the manufacturer. This technique is called
hypothesis testing.
My math skills have improved tremendously by taking this class. Not only do I feel I now
have a knowledge base for some of the most relevant mathematics I have ever taken,
but my problem-solving and critical thinking skills have also improved. These newly
acquired skills would be useful in future classes when trying to calculate the probability
of getting an answer right on a test when I am merely guessing on a multiple-choice
question. Another application where my newly acquired statistics skills would come in
handy is when trying to find out what my grade is in the class by using the weighted
mean concept.
This project helped improve my problem-solving skills because the class in general has
taught me how to look at a problem with a more analytic eye. Now, when I look at a
problem, instead of getting overwhelmed if I don't understand it right away, I will take
the bits and pieces that I do understand and work with them until I have come to a greater
understanding of the question. I used to be terrified of story problems, but now I feel I
have the skills necessary to dissect the question and break it down into solvable pieces.
I have taken many math classes throughout my years of education and statistics is by far
the most useful one. The other math classes like algebra and geometry are useful
because you have to have a basic understanding of those topics in order to be a
well-rounded and educated person. But never in my life have I used the quadratic
equation at work or when hanging out with family and friends. I have, however, been at
a family gathering reading an article that discussed a statistical analysis of a survey.
Now I know to question the source, sampling method, and sample size. Before taking this
class I would have accepted the data at face value and left it at that. I am more
analytical now and appreciate accurate information so I can have the knowledge to agree
or disagree with the statement.
