Rachel Skidmore

Math 1040
Skittles Final Project

In the Skittles term project, each student was instructed to purchase a 2.17 ounce bag of
Original Skittles. The number of Skittles of each color was recorded and compiled into a class
data set. The categorical data were displayed using pie charts and Pareto charts. These charts
summarize the entire class sample as well as the data from my personal bag of Skittles. The
number of candies per bag was then displayed in a frequency histogram and in a box plot built
from a five-number summary of the data.
As you can see in table number one, the entire sample is displayed as categorical data
in the form of a pie chart. The sample contained 1,020 candies in five colors: yellow,
orange, green, purple, and red. The pie chart for the entire class looks fairly evenly divided
among the colors, with each color making up between 18.7% and 20.9% of the sample. However, in
table number three, you can see that my own bag contained noticeably more red and green Skittles
than the other colors.
Table number two is a Pareto chart, which shows that yellow Skittles were the most common
color in the class sample and red Skittles were the least common. However, table number four,
which reflects my own bag, shows the exact opposite: red was the most common color and yellow
was the least common. Therefore, the color distribution in my individual bag does not agree with
the distribution in the class sample. My bag had noticeably more of some colors than others,
while the class sample showed a more equal amount of each color.
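As a rough check on these comparisons, the percentages and the Pareto ordering can be recomputed from the counts listed with Tables 1 through 4. This is only a sketch; the names class_counts, bag_counts, and pareto_rows are made up for illustration.

```python
# Counts copied from the data lines given with Tables 1-4.
class_counts = {"Yellow": 213, "Orange": 195, "Green": 211, "Purple": 210, "Red": 191}
bag_counts = {"Red": 18, "Green": 16, "Orange": 12, "Purple": 11, "Yellow": 7}


def pareto_rows(counts):
    """Return (color, count, percent) rows sorted from most to least common."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(color, n, 100 * n / total) for color, n in ordered]


for label, counts in (("Class sample", class_counts), ("Own bag", bag_counts)):
    print(label)
    for color, n, pct in pareto_rows(counts):
        print(f"  {color:<7}{n:>5}  {pct:5.1f}%")
```

Sorting by count is what turns a plain bar chart into a Pareto chart, so the first and last rows are the most and least common colors in each sample.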
Overall, the shape of the distribution of candies per bag is fairly symmetrical. There were 17
bags of Skittles in the sample. The smallest bag contained 56 Skittles, while the two bags with
the most Skittles had 64 each. As you can see in table number five, the majority of the bags had
a total somewhere in the middle. The frequency histogram shows that nine of the bags had
between 59 and 61 Skittles, compared with 4 bags with 56 to 58 and 4 bags with 62 to 64. My
individual bag, which had 64 Skittles, was at the high end of the range.
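The symmetry can also be checked numerically by estimating the mean from the grouped frequencies, using each class midpoint in place of the individual bag totals. This is a sketch of the standard grouped-mean estimate, not a calculation taken from the original project; the variable names are illustrative.

```python
# Class intervals and frequencies from the frequency table (Table 5).
classes = [((56, 58), 4), ((59, 61), 9), ((62, 64), 4)]

n_bags = sum(freq for _, freq in classes)

# Each class midpoint stands in for the bag totals that fall in that class.
weighted_sum = sum(((low + high) / 2) * freq for (low, high), freq in classes)
grouped_mean = weighted_sum / n_bags

print(n_bags, grouped_mean)
```

Because the frequencies 4, 9, 4 are balanced around the middle class, the estimate falls at the middle midpoint, which agrees with the class mean of 60 reported with the confidence intervals.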
Categorical data, also known as qualitative data, places each observation into a named
category rather than measuring it with a number. In this project, the categorical variable is
the color of each Skittle. The pie charts and Pareto charts represent this qualitative data,
since you can easily see how many Skittles of each color there were. Quantitative data, on the
other hand, consists of numerical counts or measurements, and in this case it is the number of
Skittles in each bag. This type of data is more easily seen in the frequency histogram in Table
5, which groups the bag totals into three classes and shows how many bags fell into each class.
Because the classes have a natural numerical order, a histogram displays them better than a pie
chart.
Ultimately, for qualitative data, pie charts or bar graphs make the most sense, because they
show how the whole sample is divided among the categories. Pie charts do not work well for
quantitative data, as they show each slice as a share of a whole rather than where values fall
along a number line. Quantitative data is best represented using histograms, line charts, or
box plots, as seen in table number 6, because these graphs show the center, spread, and shape
of the values.
In terms of calculations, finding the mean and median is most useful for quantitative
variables. Since quantitative data is numerical, it makes sense to compute an average as well
as the middle value. For categorical, or qualitative, data, neither calculation applies, but
the mode, which here is the color of Skittle that occurs most often, is very useful.
Overall, in terms of this project, both categorical and quantitative data were represented.

(Tables 1 & 2) - Yellow: 213 Orange: 195 Green: 211 Purple: 210 Red: 191
Sample Size: 1020, 17 total bags
Table 1:

Skittles Categorical Data - Entire Sample

Yellow: 20.9%
Green: 20.7%
Purple: 20.6%
Orange: 19.1%
Red: 18.7%

Table 2:

Pareto Categorical Data - Entire Sample


Yellow (20.9%)
Green (20.7%)
Purple (20.6%)
Orange (19.1%)
Red (18.7%)
[Pareto chart: count axis from 180 to 215]

(Tables 3 & 4) - Red: 18 Green: 16 Orange: 12 Purple: 11 Yellow: 7
Sample Size: 64
Table 3:

Skittles Categorical Data - Rachel Skidmore

Red: 28.1%
Green: 25.0%
Orange: 18.8%
Purple: 17.2%
Yellow: 10.9%

Table 4:

Pareto Categorical Data - Rachel Skidmore

Red (28.1%)
Green (25.0%)
Orange (18.8%)
Purple (17.2%)
Yellow (10.9%)
[Pareto chart: count axis from 0 to 18]

Table 5:
The frequency histogram reflects the total number of Skittles found in each bag. There were 17
bags total. The total number in my own bag was 64 Skittles.
Total in each bag    Frequency
56-58                4
59-61                9
62-64                4

Frequency Histogram
[Histogram: frequency axis from 0 to 9; classes starting at 56, 59, and 62]

Table 6:
5-Number Summary: (Based on the total number of Skittles found in each bag.)
Minimum = 56
Q1 = 59
Q2 (median) = 60
Q3 = 61
Maximum = 64
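
The quartiles in this summary also determine the usual 1.5 x IQR outlier fences. The sketch below assumes that rule and uses the smallest and largest bag totals reported earlier (56 and 64); the variable names are illustrative.

```python
q1, q3 = 59, 61               # quartiles from the summary
iqr = q3 - q1                 # interquartile range

# Fences under the common 1.5 * IQR rule (an assumption; other rules exist).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

smallest, largest = 56, 64    # smallest and largest bag totals in the class
has_outliers = smallest < lower_fence or largest > upper_fence

print(iqr, lower_fence, upper_fence, has_outliers)
```

Under that rule the fences land exactly on the smallest and largest bag totals, so no bag in the class sample would be flagged as an outlier.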

Skittles Box Plot

[Box plot: axis ticks from 58 to 61.5]

Confidence Interval Estimates:


A confidence interval is a range of values used to estimate an unknown population parameter. The
level of confidence is the proportion of such intervals, computed from repeated random samples,
that would contain the parameter. Three confidence interval estimates were made. A 99%
confidence interval was used to estimate the true proportion of yellow candies. A 95%
confidence interval was used to estimate the true mean number of candies per bag of Skittles. A
98% confidence interval was used to estimate the standard deviation of the number of candies
per bag.
For reference:
Yellow: 213 Orange: 195 Green: 211 Purple: 210 Red: 191
Sample Size: 1020, 17 total bags
Class mean: 60
Class standard deviation: 2.42

Discussion and interpretation of the confidence interval estimates:


A confidence interval was constructed to estimate the true proportion of yellow Skittles. Based
on the calculations, we are 99% confident that the true proportion of yellow Skittles is
between 0.178 and 0.242. This interval was computed from a sample of 1,020 candies. This means
that if we were to collect many random samples of the same size and build an interval from each
one, about 99% of those intervals would contain the true population proportion.
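The proportion interval can be sketched in Python from the reference counts (213 yellow out of 1,020). This assumes the standard normal-approximation interval for a proportion; the variable names are illustrative, and small differences from the reported endpoints can come from rounding in the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

yellow, n = 213, 1020          # yellow candies and total candies in the class sample
confidence = 0.99

p_hat = yellow / n                                    # sample proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # critical value, about 2.576
margin = z * sqrt(p_hat * (1 - p_hat) / n)            # margin of error

lower, upper = p_hat - margin, p_hat + margin
print(f"{lower:.3f} to {upper:.3f}")
```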
A second confidence interval was constructed to estimate the true mean number of candies per
bag of Skittles. Based on the calculations, we are 95% confident that the interval from 58.76 to
61.24 contains the true value of the mean number of Skittles per bag. This means that if we
were to collect many random samples of the same size and build an interval from each one, about
95% of those intervals would contain the true mean.
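The mean interval can be reproduced from the reference values (17 bags, mean 60, standard deviation 2.42). This sketch assumes a t interval with 16 degrees of freedom and takes the critical value 2.120 from a standard t table; the variable names are illustrative.

```python
from math import sqrt

n, mean, s = 17, 60, 2.42      # bags, class mean, class standard deviation
t_crit = 2.120                 # t table value: df = 16, area 0.025 in each tail

margin = t_crit * s / sqrt(n)  # margin of error
lower, upper = mean - margin, mean + margin

print(f"{lower:.2f} to {upper:.2f}")
```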
A third confidence interval was constructed to estimate the standard deviation of the number of
candies per bag. Based on the calculations, we are 98% confident that the standard deviation of
candies per bag is between 1.71 and 4.02. This means that if we were to collect many random
samples of the same size and build an interval from each one, about 98% of those intervals
would contain the true value of the population standard deviation.
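The standard deviation interval can be sketched the same way. It assumes the chi-square interval for a standard deviation with 16 degrees of freedom, with the two critical values (5.812 and 32.000) read from a standard chi-square table; the variable names are illustrative.

```python
from math import sqrt

n, s = 17, 2.42                # bags and class standard deviation
df = n - 1

# Chi-square table values for df = 16 with area 0.01 in each tail (98% confidence).
chi2_left, chi2_right = 5.812, 32.000

lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)

print(f"{lower:.2f} to {upper:.2f}")
```

The interval is not centered on 2.42 because the chi-square distribution is skewed to the right.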
Hypothesis Testing:
The purpose of a hypothesis test is to make a statement regarding a population, collect evidence
to test the statement (the test statistic), and then either reject or fail to reject that
statement.
Statistical hypothesis testing involves claims about population parameters, which may or
may not be true. A null hypothesis (H0) is formed, which gives the statement that is being
tested. Then an alternative hypothesis (H1) is formed as the opposite of the null hypothesis. A
test statistic is then computed in order to assess the evidence against the null. After graphing
the test statistic and critical region, we can determine whether we will reject or fail to
reject the null hypothesis. We will reject the null hypothesis if the test statistic is in the
critical region. We will fail to reject the null hypothesis if the test statistic is not in the
critical region.

Discussion and interpretation of hypothesis testing:


In the first hypothesis test, a 0.05 significance level was used to test the claim that 20% of
all candies in a Skittles bag are red. The test statistic of -1.6 fell between the critical
values of -1.96 and 1.96 for a 0.05 significance level, so it was not in the critical region.
Therefore we failed to reject the null hypothesis: there is not sufficient evidence to reject
the claim that 20% of all candies in a Skittles bag are red.
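The decision for this test can be sketched in Python from the class counts (191 red out of 1,020), assuming a two-tailed one-proportion z test. The variable names are illustrative. With the counts as listed, the statistic works out to roughly -1.02 rather than the reported -1.6; the original calculation is not reproduced here, so the source of that difference is not visible, but both values lie between the critical values and lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist

red, n = 191, 1020             # red candies and total candies in the class sample
p0 = 0.20                      # claimed proportion under the null hypothesis
alpha = 0.05                   # significance level (two-tailed test assumed)

p_hat = red / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # critical value, about 1.96

# Reject only when the statistic falls in the critical region (beyond +/- z_crit).
reject_null = abs(z) > z_crit
print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject_null}")
```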
Next, a 0.01 significance level was used to test the claim that the mean number of Skittles in
a bag is 55. The test statistic of 8.51 fell in the critical region, beyond the critical values,
so we rejected the null hypothesis. Therefore, there is sufficient evidence to reject the claim
that the mean number of Skittles in a bag is 55.
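This second test can be sketched from the reference values (17 bags, mean 60, standard deviation 2.42). It assumes a two-tailed one-sample t test with 16 degrees of freedom, with the critical value 2.921 taken from a standard t table; the variable names are illustrative.

```python
from math import sqrt

n, mean, s = 17, 60, 2.42      # bags, class mean, class standard deviation
mu0 = 55                       # claimed mean under the null hypothesis
t_crit = 2.921                 # t table value: df = 16, area 0.005 in each tail

t_stat = (mean - mu0) / (s / sqrt(n))    # test statistic

# Reject when the statistic falls in the critical region (beyond +/- t_crit).
reject_null = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```

The statistic is far beyond the critical value, which is why the claim of 55 candies per bag is rejected even at the stricter 0.01 level.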

Reflections:
In this Skittles project, I was able to further develop my problem solving skills. The first
stages of the project required sorting and analyzing the data, calculating what the data meant
for the class, and displaying it. After collecting the data, I was able to use the skills from
class to interpret what the class found. By combining each class member's data and also keeping
my own separate, I was able to see how much a single bag of Skittles can differ from a class
sample of 17 bags. Being able to compare my own data against the class data was very useful, and
it is something I will be able to use in the future for real world math applications.
Furthermore, after analyzing the qualitative and quantitative aspects of this project, the
further testing we did emphasized just how much can be done with a single data set. In the
latter part of this project, we worked with confidence interval estimates as well as hypothesis
testing. I was able to be 99% confident that between 17.8% and 24.2% of all Skittles are
yellow. I was also able to determine that there was not enough evidence to reject our null
hypothesis, the claim that 20% of the candies in a bag of Skittles are red. Being able to
answer these types of questions about a data set certainly strengthened my problem solving
skills. This is largely due to having to determine exactly which formulas to use and what
numbers to use in those formulas.
Ultimately, my views on real-world math applications have changed in that I can analyze and
interpret data more thoroughly. This will aid in my ability to perform studies and analyze data as
a Sociology major.
