
Mike Warren

Math 1040
Oremus

Statistical Analysis of a Bag of Skittles

At the start of the semester, every student in Mr. Oremus’s Statistics classes purchased a standard 2.17 oz bag of Original Skittles. Each student then counted and recorded the total number of candies in his or her bag and the number of candies of each color. Using these data, we were able to carry out a study and demonstrate a handful of useful statistical measurements on a relatively small sample drawn from the population of all 2.17 oz Original Skittles bags.

The table below shows the number of Skittles of each color in my bag.

Color    Red   Orange   Yellow   Green   Purple
Count    10    10       14       11      13

All told, there were 58 individual pieces of candy in my bag. I also made predictions about what proportion of the candies in my bag each of the five colors would make up. I expected each color to be equally represented, but this turned out to be incorrect.

Color                  Red    Orange   Yellow   Green   Purple
Expected proportion    .20    .20      .20      .20     .20
Observed proportion    .17    .17      .24      .19     .22
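The observed proportions are relative frequencies: each color's count divided by the 58 candies in my bag. A minimal Python sketch of that arithmetic, using the counts from my table (the variable names are only illustrative):

```python
# Color counts from my bag, taken from the count table above.
my_bag = {"Red": 10, "Orange": 10, "Yellow": 14, "Green": 11, "Purple": 13}

total = sum(my_bag.values())  # 58 candies in all

# Relative frequency = count of one color / total candies in the bag.
observed = {color: count / total for color, count in my_bag.items()}

# If all five colors were equally likely, each would make up 1/5 of the bag.
expected = 1 / len(my_bag)

for color, p in observed.items():
    print(f"{color:<7} observed {p:.2f}  expected {expected:.2f}")
```

Rounded to two places, this prints 0.17, 0.17, 0.24, 0.19, and 0.22 for red, orange, yellow, green, and purple, against an expected 0.20 for each.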

I expected to see roughly equal proportions of each color in the bag of Skittles, and while the proportions for each color are similar, they are not as close as I thought they would be. No one color was grossly over- or under-represented, so there were no extreme outliers, but it is somewhat surprising that the colors were not equally distributed, since I assume the bags are filled by machines.

None of the proportions in my bag matched the corresponding proportions from the class data set. My bag had a noticeably lower proportion of red and orange Skittles than the class, while green was only slightly lower. Yellow and purple appeared quite a bit more frequently in my bag than the class data would predict. This suggests there is some variability in the Skittles bag-filling process.

After comparing the color proportions for my own bag, I did the same for the entire class’s data.

Assuming that Skittles bags are filled by machines, I would expect these machines to dispense the same number of Skittles of each color into every bag. Therefore, I expected each color to make up 20% of the total Skittles in each bag.

Color                  Red     Orange   Yellow   Green   Purple
Expected proportion    20%     20%      20%      20%     20%
Observed proportion    20.3%   19.9%    20.4%    20.2%   19%

[Figures: pie chart and bar graph, both titled “Skittles”, showing the class’s candy counts by color (Yellow, Red, Green, Orange, Purple); bar graph axes: Colors vs. Count]

The class data do represent a sample, but not a random sample, because each student bought a bag of Skittles by choice, wherever it was convenient. Had our bags been selected at random from every 2.17 oz Original Skittles bag in the world, our data would indeed be a random sample. The population of interest is the world’s supply of 2.17 oz Original Skittles bags, which we obviously did not have access to.

Next, I calculated some statistics that paint a picture of what a typical 2.17 oz Original Skittles bag might look like in terms of the number of candies inside. The mean is the average number of candies per bag. The standard deviation measures how far bag counts typically fall from that mean; if the counts are roughly bell-shaped, about 68% of bags should fall within one standard deviation of the mean (from the mean minus the standard deviation to the mean plus the standard deviation). The 5-Number Summary provides five descriptive measurements: the minimum observed number of candies in a bag, the lower quartile (or 25th percentile), the median number of candies in a bag, the upper quartile (or 75th percentile), and the maximum observed number of candies in a bag. Included are two helpful graphics illustrating the given data.

• Mean number of candies per bag: 59.4


• Standard Deviation for number of candies per bag: 2.8
• 5-Number Summary of number of candies per bag: 52, 58, 60, 61, 65
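The one-standard-deviation range and the box-plot outlier fences can both be worked out directly from the summary values above. The sketch below assumes the common 1.5 × IQR convention for flagging outliers, which the report does not itself specify:

```python
# Summary statistics for candies per bag, as reported for the class data.
mean, sd = 59.4, 2.8
minimum, q1, median, q3, maximum = 52, 58, 60, 61, 65

# Empirical rule: if counts are roughly bell-shaped, about 68% of bags
# should fall within one standard deviation of the mean.
one_sd_low, one_sd_high = mean - sd, mean + sd

# Assumed box-plot convention: values more than 1.5 * IQR beyond the
# quartiles are flagged as possible outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"About 68% of bags: {one_sd_low:.1f} to {one_sd_high:.1f} candies")
print(f"IQR = {iqr}; fences: {lower_fence} to {upper_fence}")
print("Minimum flagged as possible outlier:", minimum < lower_fence)
print("Maximum flagged as possible outlier:", maximum > upper_fence)
```

With these numbers, roughly 68% of bags would be expected to hold between about 56.6 and 62.2 candies. The IQR is 3, so the fences are 53.5 and 65.5; the minimum of 52 falls below the lower fence, so a box plot drawn with this rule would mark it as a possible outlier, while the maximum of 65 would not be flagged.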

The class data suggest that 2.17 oz Skittles bags contain about 59 or 60 candies on average. The frequency histogram is roughly bell-shaped, with a slight skew toward the lower end of the scale, which is consistent with the mean (59.4) falling just below the median (60). My own data showed that yellow and purple were noticeably more common than red or orange, and moderately more common than green. This video from a Skittles factory may help explain why yellow candies were so much more abundant than most of the other colors in my bag: https://www.youtube.com/watch?v=e3DDHVWGnRc.

I’m not surprised by the class data, actually, because they are more consistent with what I expected to see: the proportion of each color was nearly equal, all very close to 20% of the total candies.

Categorical data describe characteristics or categories (e.g., color, shape, taste), while quantitative data are numerical measurements or counts (e.g., weight, height, the number of candies in a bag). Categorical data can be represented in graphs that display frequency of occurrence, such as a bar graph showing how many candies of each color are in a bag of Skittles. Quantitative data can be shown using graphs such as a histogram, which shows how the values are distributed, or a box plot, which marks important values such as the minimum, lower quartile, median, upper quartile, and maximum.

Because quantitative data are numerical values, one can calculate their mean and standard deviation, whereas it is not possible to take the mean of a color, a preference, or a shape. Calculations for categorical data instead include frequency and relative frequency, which measure, respectively, the number of times a category occurs in a data set and that number as a proportion of all the observations in the set.

Finally, I calculated confidence intervals for the Skittles data. A confidence interval is a range of plausible values for an unknown population value, such as a mean or a proportion, together with a confidence level (for example, 95%) that describes how reliably the method used to build such intervals captures the true value. As an analogy, we might know that a mail carrier is on a certain street without knowing which house she is currently at. We could say that we are 95% confident that she is somewhere between house A and house H, which still leaves the possibility that she is not at any of those houses and is really at house I.
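The mail-carrier picture can be made more precise: "95% confident" describes the method, which captures the true value in about 95% of repeated samples. The simulation below is a sketch of that idea using a made-up population of bag counts (mean 60, standard deviation 3, samples of 30 bags); these numbers are hypothetical and are not the class data. The critical value 2.045 is the 95% t-table value for 29 degrees of freedom.

```python
import random
import statistics

random.seed(1040)  # fixed seed so the run is reproducible

# Hypothetical population of bag counts (NOT the class data).
TRUE_MEAN, TRUE_SD = 60.0, 3.0
SAMPLE_SIZE, TRIALS = 30, 2000
T_STAR = 2.045  # 95% t critical value for 29 degrees of freedom (t table)

hits = 0
for _ in range(TRIALS):
    # Draw one sample of bags and build a 95% t-interval for the mean.
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    low, high = xbar - T_STAR * se, xbar + T_STAR * se
    # Did this particular interval capture the true population mean?
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contained the true mean")
```

The printed coverage should land close to 95%. Any single interval either contains the true mean or misses it; the confidence level describes how often the procedure succeeds.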

With this understanding in mind, we are 99% confident that the population proportion of yellow candies is between .18914 and .22051, and 95% confident that the mean number of candies (of all colors) per bag of Skittles is between 58.737 and 60.019. The yellow interval suggests that, across all 2.17 oz Original Skittles bags, roughly 19% to 22% of the candies are yellow. It describes the overall population proportion, not the proportion in any single bag, which can vary more widely (my own bag was about 24% yellow). The width of the interval allows for sampling variation around our estimate, and the confidence level acknowledges that the method will occasionally produce an interval that misses the true value.
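The yellow interval follows the standard one-proportion z-interval, p̂ ± z*·√(p̂(1 − p̂)/n). The report does not list the counts behind it, so the sketch below assumes 900 yellow candies out of 4,394 in total; those assumed values reproduce the reported interval but are not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Two-sided critical value, about 2.576 for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# ASSUMED counts for illustration: 900 yellow out of 4,394 candies.
low, high = proportion_ci(900, 4394, 0.99)
print(f"99% CI for the proportion of yellow: ({low:.5f}, {high:.5f})")
```

With the assumed counts, this prints (0.18914, 0.22051).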

In the case of the mean number of candies per bag, the narrow interval suggests that our sample mean is a precise estimate of the population average. Together with the small standard deviation (2.8 candies), this suggests that bag counts are fairly consistent, which may well be due to Skittles bags being filled by calibrated machines. We are only 95% confident that the interval captures the population mean, which leaves room for the possibility that the actual population mean lies outside our confidence interval.
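The mean interval has the same shape, x̄ ± t*·s/√n, using a t critical value because the population standard deviation is unknown. In this sketch, the mean and standard deviation are the rounded values reported earlier, while n = 74 bags and t* ≈ 1.993 (the 95% value for 73 degrees of freedom) are assumptions, since the report does not state the number of bags:

```python
from math import sqrt

def mean_ci(xbar, s, n, t_star):
    """t-interval for a population mean: xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# Mean and standard deviation are the rounded values reported above.
# ASSUMPTIONS: n = 74 bags (not stated in the report) and t* = 1.993,
# the approximate 95% critical value for 73 degrees of freedom.
low, high = mean_ci(59.4, 2.8, 74, 1.993)
print(f"95% CI for mean candies per bag: ({low:.2f}, {high:.2f})")
```

Because the inputs are rounded, the result, about (58.75, 60.05), is close to but not exactly the reported (58.737, 60.019).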

Confidence intervals can be used to estimate specific values of populations without having to count every member of that population individually. This technique involves some error, but for populations that cannot feasibly be counted individually, it can provide values that are close enough to the actual value to suffice. The confidence level also acknowledges that the actual value may lie outside the calculated interval.

To summarize this experiment, taking measurements can be very useful for making predictions and anticipating outcomes. It was interesting to see that some of the same methods used in high-budget studies and surveys are simple enough to apply to Skittles data. From my own perspective, understanding and recognizing patterns and averages is a crucial skill, particularly for those, like me, who are pursuing a career in medicine. A practitioner who has specific expected outcomes in mind and can recognize abnormal results or outliers can make all the difference in a patient’s life. For example, if I know what function someone should have three months after ACL surgery, then I can recognize when something is amiss and look for the problem and a solution so that my patient can return to full function and life satisfaction.

This project gave me the chance to apply what I know about calculating statistical measures, graphing, building 5-number summaries, and finding confidence intervals. Having spent some time in the research world, I was exposed to much of this in my first year of college, but I faced a steep learning curve as I was immersed in hours of reading and deciphering academic journal articles, in addition to the principles and mechanisms of biochemistry, metabolism, and organic chemistry I was trying to learn on my own and from the chemist I was interning with at the time. In summary, this project was an excellent opportunity to tie together the concepts we learned in class.
