Introduction
This project is a statistical analysis of the contents of 2.17-ounce bags of Skittles. We
analyzed the contents of our own bag as well as the bags of the rest of the class, for a
total of 40 bags. The purpose of this project is to find the average number of Skittles per
bag and how many of each color are in each bag. The statistical techniques used in the
project include summaries of categorical and quantitative data, confidence intervals, and
hypothesis testing. This is a fun and helpful way of showing how statistics can be useful
in everyday life.
[Figures: class totals by color (Red, Orange, Yellow, Green, Purple), charted as counts and as proportions.]
One bag (63 candies):
Color         Red      Orange   Yellow   Green    Purple
Number        11       14       14       13       11
Proportion    0.175    0.222    0.222    0.210    0.175
Entire class (2437 candies):
Color         Red      Orange   Yellow   Green    Purple
Number        457      541      487      452      500
Proportion    0.188    0.222    0.200    0.185    0.205
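The proportions for the entire class can be re-derived from the counts. The following is a minimal sketch, assuming only the five class totals listed above; the variable names are illustrative and not part of the original project.

```python
# Class totals by color, as listed in the class table.
counts = {"red": 457, "orange": 541, "yellow": 487, "green": 452, "purple": 500}

# Total number of candies across all 40 bags.
total = sum(counts.values())

# Proportion of each color, rounded to three decimals as in the table.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

for color, p in proportions.items():
    print(f"{color:>6}: {counts[color]:4d}  {p:.3f}")
print(f" total: {total}")
```

The total printed here is the same n used later for the test on the proportion of green candies.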
Mean: x̄ = 60.9
Standard deviation: s = 1.9
5-number summary:
Min: 54.0
Q1: 60.0
Q2 (median): 61.0
Q3: 62.0
Max: 64.0
The graphs depict what I expected. As I looked through the class data for the total number
of Skittles in each bag, I could tell right away that the mean would be around 61 or 62,
based on how often those numbers appear in the data set. The data appears to be skewed
to the left. The data from the entire class does fit into what my
[Figure: histogram of candies per bag (x-axis: candies per bag; y-axis: frequency). Bar frequencies as extracted: 1, 1, 3, 8, 10, 11, 5, 1.]
used for quantitative data are ogives, dotplots, and histograms. Those graphs work best
with quantitative data because they all place numbers, rather than names, on the
horizontal axis, and that is how the categories or groups are sectioned; the groups are not
given a name or label. It wouldn't make sense to name the sections of these graphs
because they are based on numbers.
Construct a 99% confidence interval estimate for the true mean number of candies per
bag: 60.09 < μ < 61.71. This means we are 99% confident that the population mean μ
falls between 60.09 and 61.71.
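The interval for the mean can be checked from the summary statistics. This is a minimal sketch, not the full written calculation: it assumes the critical value 2.708 read from a t table (99% confidence, 39 degrees of freedom), and the variable names are illustrative.

```python
import math

n = 40          # number of bags
x_bar = 60.9    # sample mean candies per bag
s = 1.9         # sample standard deviation
t_crit = 2.708  # ASSUMED t-table value for 99% confidence, df = n - 1 = 39

# Margin of error E = t * s / sqrt(n), then the interval x_bar +/- E.
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.2f} < mu < {upper:.2f}")
```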
Construct a 98% confidence interval estimate for the standard deviation of the number
of candies per bag: 1.49 < σ < 2.52. This means we are 98% confident that the
population standard deviation σ falls between 1.49 and 2.52.
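The interval for the standard deviation can be sketched the same way. The chi-square critical values below are assumptions: they come from the df = 40 row of a standard chi-square table (the nearest row many printed tables offer for df = 39), and they reproduce the interval reported above. Exact df = 39 values would give a slightly different interval.

```python
import math

n = 40   # number of bags
s = 1.9  # sample standard deviation

# ASSUMED table values for 98% confidence (df = 40 row of a chi-square table).
chi2_right = 63.691  # area 0.01 to the right
chi2_left = 22.164   # area 0.99 to the right

# sqrt((n-1)s^2 / chi2_right) < sigma < sqrt((n-1)s^2 / chi2_left)
lower = math.sqrt((n - 1) * s**2 / chi2_right)
upper = math.sqrt((n - 1) * s**2 / chi2_left)

print(f"{lower:.2f} < sigma < {upper:.2f}")
```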
Full written calculations can be found on the last page.
Hypothesis Tests
A hypothesis test is a procedure for testing a claim about a property of a population. First
you state a null hypothesis and an alternative hypothesis about whether the population
value is equal to, greater than, less than, or not equal to the claimed value. You then
compute the test statistic and either reject the null hypothesis or fail to reject the null
hypothesis. Below are the answers I found using the hypothesis testing technique:
Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green:
n = 2437
p = 0.20
x = 452
q = 0.80
α = 0.01
p̂ = 0.1854739
H0: p = 0.20
H1: p ≠ 0.20
z = -1.7927
p-value = 0.0730 (two-tailed)
p-value > α: Fail to reject the null hypothesis. There is not sufficient evidence to warrant
rejection of the claim that 20% of Skittles are green.
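The test on the proportion of green candies can be sketched with the normal approximation. This is a minimal illustration using only the standard library; the helper normal_cdf is a hypothetical name introduced here, and the two-tailed p-value reflects the "not equal" alternative.

```python
import math

n = 2437      # total candies in the class sample
x = 452       # green candies
p0 = 0.20     # claimed proportion of green candies
alpha = 0.01  # significance level

p_hat = x / n
# Test statistic for a proportion: (p_hat - p0) / sqrt(p0 * q0 / n)
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def normal_cdf(v):
    """Standard normal cumulative distribution function (illustrative helper)."""
    return 0.5 * (1 + math.erf(v / math.sqrt(2)))

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * normal_cdf(-abs(z))
reject = p_value < alpha

print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```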
Use the 0.05 significance level to test the claim that the mean number of candies in a
bag of Skittles is 56:
n = 40
x̄ = 60.9
μ = 56.0
s = 1.9
α = 0.05
H0: μ = 56.0
H1: μ ≠ 56.0
t = 16.3107
p-value < 0.0001
p-value < α: Reject the null hypothesis. There is sufficient evidence to warrant rejection
of the claim that the mean number of Skittles in a bag is 56.
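The test statistic for the mean can be checked from the summary statistics. This is a minimal sketch that uses the critical-value method rather than a p-value; the critical value 2.023 is an assumption read from a t table (two-tailed, α = 0.05, 39 degrees of freedom).

```python
import math

n = 40          # number of bags
x_bar = 60.9    # sample mean candies per bag
mu0 = 56.0      # claimed population mean
s = 1.9         # sample standard deviation
t_crit = 2.023  # ASSUMED t-table value, two-tailed alpha = 0.05, df = 39

# Test statistic: t = (x_bar - mu0) / (s / sqrt(n))
t = (x_bar - mu0) / (s / math.sqrt(n))

# Reject H0 when the statistic falls beyond the critical value in either tail.
reject = abs(t) > t_crit

print(f"t = {t:.4f}, reject H0: {reject}")
```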
Full written calculations can be found on the last page.
group. The conditions for the binomial distribution must be satisfied: there is a fixed
number of trials, the trials are independent, there are two categories of outcomes, and
the probability remains constant for each trial. Lastly, there must be at least 5 successes
and 5 failures. My samples did meet the conditions: the data came from a simple random
sample, there were only two possible outcomes, and the trials were independent.
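The requirement of at least 5 successes and 5 failures can be checked directly. This is a small sketch that assumes the green-candy counts from the proportion test (n = 2437, x = 452); the variable names are illustrative.

```python
n = 2437  # total candies in the class sample
x = 452   # green candies ("successes")

successes = x
failures = n - x

# The requirement holds when both counts are at least 5.
conditions_met = successes >= 5 and failures >= 5

print(f"successes = {successes}, failures = {failures}, met: {conditions_met}")
```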
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on population means are as follows: The sample must be a simple
random sample. Either the population must be normally distributed or the sample size
must be larger than 30 (n > 30). The sample does meet the requirements because it is a
simple random sample and the sample size is n = 40, which is larger than 30.
The conditions that must be met before performing calculations for confidence intervals
and hypothesis tests on a population standard deviation are as follows: The sample must
be a simple random sample. The population must have normally distributed values. The
requirement of a normal distribution is much stricter here than in the topics discussed
earlier, so departures from a normal distribution can result in large errors. The sample is a
simple random sample and it is normally distributed, so it does meet the requirements.
One possible error in using this data is mixing up which n to use in a given equation. In
our calculations we used two values of n (sample size), depending on what the question
was asking. We used n = 2437 when working with the total number of Skittles, and
n = 40 when working with the number of bags of candies. It would be very easy to mix
the two up and get very wrong answers. Another error that could have been made is a
Type I or Type II error. A Type I error is rejecting a null hypothesis when it is actually
true. A Type II error is failing to reject a null hypothesis when it is actually false. You can
always improve the sampling method by adding more samples to get a more accurate
understanding. It might also be helpful to purchase bags of Skittles from different stores
and locations. If all the Skittles are purchased from the same store, that store could have
received a shipment that was unusual in some way. Maybe the manufacturer sent out a
shipment that had far too few or far too many Skittles, which would skew the data and
cause outliers.
In conclusion, the data from the entire class shows there are slightly more orange candies
in a bag of Skittles than any other color. However, the colors are very close to being
evenly distributed: each color makes up about 20% of the candies. The manufacturer
also appears to put roughly the same number of candies in each bag.
Reflective Writing
As a result of this project I have learned how to calculate a confidence interval for a
proportion, a mean, and a standard deviation from a set of sample data. It showed me
how to organize and display the data in a meaningful and constructive way. Performing
the steps in this project also helped solidify the key concepts I have learned throughout
the semester. I am a very hands-on learner, so this project was a perfect way to help me
remember these concepts long term. It is also useful to know how to take a set of data
and either dispute or confirm a claim made by the manufacturer. This technique is called
hypothesis testing.
My math skills have improved tremendously by taking this class. Not only do I feel I now
have a knowledge base for some of the most relevant mathematics I have ever taken,
but my problem solving and critical thinking skills have also improved. These newly
acquired skills would be useful in future classes when trying to calculate the probability
of getting an answer right on a test when I am merely guessing on a multiple choice
question. Another application where my newly acquired statistics skills would come in
handy is finding out what my grade is in the class by using the weighted mean concept.
This project helped improve my problem solving skills because the class in general has
taught me how to look at a problem with a more analytic eye. Now, when I look at a
problem, instead of getting overwhelmed if I don't understand it right away, I take the bits
and pieces that I do understand and work with them until I have come to a greater
understanding of the question. I used to be terrified of story problems, but now I feel I
have the skills necessary to dissect the question and break it down into solvable pieces.
I have taken many math classes throughout my years of education and statistics is by far
the most useful one. Other math classes like algebra and geometry are useful because
you need a basic understanding of those topics in order to be a well-rounded and
educated person. But never in my life have I used the quadratic equation at work or when
spending time with family and friends. I have, however, been at a family gathering
reading an article that discussed a statistical analysis of a survey. Now I know to question
the source, sampling method, and sample size. Before taking this class I would have
accepted the data at face value and left it at that. I am more analytical now and
appreciate accurate information, so I can have the knowledge to agree or disagree with
the statement.