
Skittles Term Project

Ana de Freitas
Salt Lake Community College

Introduction:
In my introduction to statistics class, we were required to do a term project involving
Skittles and statistical analysis. Each student was given a bag of Original Skittles and
counted how many Skittles of each color were in the bag (disregarding any partial Skittles). We
could then compare our individual data to the class data using graphs and other displays. This
let us as students use many concepts we learned in class, including organizing data, analyzing
data, and interpreting the results. This report shows all the data I collected and the
calculations I made, describes the process, and explains the meaning and significance of the
data and results.

My individual Skittle count:


Color     Count
Red       14
Orange    12
Yellow    15
Green     5
Purple    14
Total     60

Organizing and Displaying Categorical Data (colors):

[Figure: Pie Chart of Each Color from the Class Sample. Red 376 (23%), Orange 315 (19%), Yellow 328 (20%), Green 291 (18%), Purple 329 (20%).]

Color     My data   Class total   Proportion of class total
Red       14        376           376/1639 = 0.229
Orange    12        315           315/1639 = 0.192
Yellow    15        328           328/1639 = 0.200
Green     5         291           291/1639 = 0.178
Purple    14        329           329/1639 = 0.201
Total     60        1639
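The proportion column can be reproduced in a few lines. This is a minimal Python sketch, not part of my original hand calculations; the variable names are hypothetical, and the counts are the class totals from the table above.

```python
# Class totals per color, copied from the table above
class_totals = {"Red": 376, "Orange": 315, "Yellow": 328, "Green": 291, "Purple": 329}

total = sum(class_totals.values())  # 1639 Skittles in the class sample

# Proportion of the class sample for each color, rounded to 3 decimals
proportions = {color: round(count / total, 3) for color, count in class_totals.items()}

for color, p in proportions.items():
    print(f"{color:<7} {class_totals[color]}/{total} = {p:.3f}")
```

Rounding each proportion to three decimals matches the precision used in the table.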

[Figure: Pareto Chart of candy count of each color. x-axis: Color; y-axis: Count (0 to 400).]
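By definition, a Pareto chart is a bar chart whose bars are arranged from the most frequent category to the least frequent. A small Python sketch of that ordering step for the class totals (names are hypothetical):

```python
class_totals = {"Red": 376, "Orange": 315, "Yellow": 328, "Green": 291, "Purple": 329}

# Pareto order: sort the colors from highest to lowest count
pareto_order = sorted(class_totals.items(), key=lambda item: item[1], reverse=True)

for color, count in pareto_order:
    print(f"{color:<7} {count}")
```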

Graphs comparing my bag with the entire sample:

[Figure: Bar chart, "My individual Skittle Count". x-axis: Color; y-axis: Count (0 to 16).]

[Figure: Bar chart, "Mean Skittle Count of entire sample per bag". x-axis: Color; y-axis: Count (0 to 16).]

[Figure: Bar chart, "Proportions of entire sample". x-axis: Color; y-axis: Count (0 to 400).]

Observations of this data:


The graphs represent the data accurately. I had never thought about how Skittles were
distributed in each bag. My individual bag had less than half of the class's average green
count: my bag contained 5 green Skittles, while the class averaged about 11 per bag. I would
say that my green count is an outlier because it is far from the class mean. My orange and red
counts matched the class averages for orange and red (rounded to the nearest whole Skittle). I
had a few more yellow and purple Skittles than average. Red was the most common color in the
class sample overall.
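These comparisons can be checked by dividing each class total by the number of bags (27 bags were tested) and setting the result next to my own counts. A minimal Python sketch; the dictionary names are hypothetical.

```python
class_totals = {"Red": 376, "Orange": 315, "Yellow": 328, "Green": 291, "Purple": 329}
my_bag = {"Red": 14, "Orange": 12, "Yellow": 15, "Green": 5, "Purple": 14}
num_bags = 27  # number of bags in the class sample

# Mean count of each color per bag across the class
mean_per_bag = {color: total / num_bags for color, total in class_totals.items()}

for color, mean in mean_per_bag.items():
    diff = my_bag[color] - mean
    print(f"{color:<7} class mean {mean:5.2f}  my bag {my_bag[color]:2d}  difference {diff:+.2f}")
```

For green, the class mean is about 10.78, and 5 is just under half of that.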
Organizing and Displaying Quantitative Data (The number of candies per bag):

Mean number of candies per bag = 60.7
Standard Deviation = 1.5

5-number summary:
Minimum = 58.0
Q1 = 59.0
Q2 (median) = 61.0
Q3 = 62.0
Maximum = 64.0

IQR (Q3 - Q1) = (62 - 59) = 3.00
Lower Fence = Q1 - 1.5(IQR) = 59.0 - 1.5(3) = 54.5
Upper Fence = Q3 + 1.5(IQR) = 62.0 + 1.5(3) = 66.5

Bin   Frequency
57    0
58    1
59    6
60    5
61    7
62    5
63    2
64    1
65    0
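The summary statistics and fences can be recomputed from the frequency table. This is a Python sketch using the standard `statistics` module, not my hand-written method. Quartile conventions differ between textbooks and software. For this particular data set, the module's default `"exclusive"` method gives the same quartiles as taking the median of each half.

```python
import statistics

# Frequency table: number of candies per bag -> number of bags
frequency = {58: 1, 59: 6, 60: 5, 61: 7, 62: 5, 63: 2, 64: 1}

# Expand the table back into one value per bag (27 bags)
data = sorted(value for value, count in frequency.items() for _ in range(count))

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)

# For this data set, the "exclusive" method matches the median-of-halves quartiles
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any bag outside the fences would be flagged as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"n = {len(data)}, mean = {mean:.1f}, s = {sd:.1f}")
print(f"min = {min(data)}, Q1 = {q1}, median = {median}, Q3 = {q3}, max = {max(data)}")
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

The empty outlier list agrees with the fences: every bag lies between 54.5 and 66.5.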

[Figure: Frequency Histogram. x-axis: Bins (number of candies per bag), 58 to 65 and More; y-axis: Frequency (0 to 8).]

Box Plot:

[Figure: Box plot of the amount of candies per bag, on an axis from 50 to 70, with markers at 58, 59, 61, 62 and 64.]


Observations of this data:
The overall distribution of the number of Skittles per bag is approximately normal: the frequency
histogram has a roughly bell-shaped curve. One could argue that the high frequency at 59 candies
per bag gives the graph a second peak. I didn't know what to expect at first, but since the
numbers of candies per bag aren't far from each other, the graphs look as expected. More people
had 61 candies in their bag (7 bags) than any other number. I had 60 Skittles in my bag, which
is close to 61. In fairness, we only tested 27 bags; with a larger sample, our estimates would
be more reliable.

Reflection:
Quantitative data consists of numbers that represent counts or measurements. Categorical
(qualitative) data consists of labels or names that are not numerical.
Qualitative:
Pie charts are great for categorical data because they show how a whole is divided among
categories. Each category is represented by a slice of the pie whose area is proportional to its
percentage. Calculations are limited with qualitative data; examples include finding the mode or a
percentage. If you ask a person a yes-or-no question, there are only two categories into which
their answer can fall, and you could then calculate the percentage of people who said yes or no.
Bar charts are a good choice for qualitative data as well.
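As a small illustration of the mode and percentage calculations, here is a Python sketch that treats my own bag as raw categorical data, one color label per Skittle. The variable names are hypothetical.

```python
from collections import Counter

# My bag as raw categorical data: one color label per Skittle
my_counts = {"Red": 14, "Orange": 12, "Yellow": 15, "Green": 5, "Purple": 14}
labels = [color for color, count in my_counts.items() for _ in range(count)]

tally = Counter(labels)
mode_color, mode_count = tally.most_common(1)[0]  # the most frequent category
n = len(labels)

print(f"Mode: {mode_color} ({mode_count} of {n})")
for color, count in tally.items():
    print(f"{color:<7} {100 * count / n:.1f}%")
```

Counting and computing percentages is about as far as the arithmetic goes here; a "mean color" would not make sense.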
Quantitative:
Numerical data can be represented in many ways, using graphs or tables. Some graphs
include line graphs, bar graphs, dot plots, time series, scatterplots, histograms, etc.
When working with a set of numbers, one can calculate the mean, median, standard
deviation, frequency, mode, maximum, minimum, etc.; which ones to use depends on what
a person is looking for. Numerical data is only useful when the numbers are related to
what a person is trying to find; the numbers should have significance or meaning behind
them. It is also very important to label graphs correctly. If I tell someone I have 42,
they will most likely wonder what it is I have: it could be 42 grams, 42 shoes, etc.
Graphs help you see relationships between variables, and a visual representation of
data is always nice to have.
Confidence Interval Estimates:
A confidence interval is a range of values, derived from sample statistics, that is
likely to include the value of an unknown population parameter. We use confidence
intervals when we want to estimate a population parameter, such as a proportion,
mean, or standard deviation, with a stated level of confidence.
a). 99% confidence interval estimate for the true proportion of green candies.
.154 < p < .202
b). 95% confidence interval estimate for the true mean number of candies per bag.
60.106 < μ < 61.294

c). 98% confidence interval estimate for the standard deviation of the number of
candies per bag.
1.132 < σ < 2.189

MY HAND-WRITTEN CALCULATIONS AND RESULTS ARE POSTED ON A SEPARATE PDF.
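The three intervals can be reproduced approximately with a short Python sketch. It is not a copy of my hand-written work. It assumes the usual textbook formulas: p̂ ± z·√(p̂q̂/n), x̄ ± t·s/√n, and √((n−1)s²/χ²_R) < σ < √((n−1)s²/χ²_L). It also uses the rounded summary values x̄ = 60.7 and s = 1.5. Python's standard library has no t or chi-square distribution, so those critical values (df = 26) are hard-coded from standard tables; that is an assumption of the sketch. With unrounded p̂ = 291/1639, the lower limit in (a) comes out as 0.153. Rounding p̂ to 0.178 first gives 0.154. The upper limit in (c) comes out at about 2.190.

```python
import math
from statistics import NormalDist

# a) 99% CI for the proportion of green candies
x_green, n_candies = 291, 1639
p_hat = x_green / n_candies
z = NormalDist().inv_cdf(1 - 0.01 / 2)  # about 2.576
e_p = z * math.sqrt(p_hat * (1 - p_hat) / n_candies)
ci_p = (p_hat - e_p, p_hat + e_p)

# b) 95% CI for the mean number of candies per bag
n_bags, x_bar, s = 27, 60.7, 1.5  # rounded values reported above
t_crit = 2.056                    # t table, df = 26, 0.025 in the right tail
e_mu = t_crit * s / math.sqrt(n_bags)
ci_mu = (x_bar - e_mu, x_bar + e_mu)

# c) 98% CI for the standard deviation (chi-square table values, df = 26)
chi2_right, chi2_left = 45.642, 12.198  # 0.01 in each tail
ci_sigma = (
    math.sqrt((n_bags - 1) * s**2 / chi2_right),
    math.sqrt((n_bags - 1) * s**2 / chi2_left),
)

print(f"a) {ci_p[0]:.3f} < p < {ci_p[1]:.3f}")
print(f"b) {ci_mu[0]:.3f} < mu < {ci_mu[1]:.3f}")
print(f"c) {ci_sigma[0]:.3f} < sigma < {ci_sigma[1]:.3f}")
```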

Hypothesis Tests:
Hypothesis testing refers to the formal procedures statisticians use to decide whether to reject a
statistical hypothesis: a claim or assumption about a population parameter.
a). 0.05 significance level to test the claim that 20% of all Skittle candies are purple.
Fail to reject Ho
b). 0.01 significance level to test the claim that the mean number of candies in a bag of skittles
is 62.0.
Reject Ho
MY HAND-WRITTEN CALCULATIONS AND RESULTS ARE POSTED ON A SEPARATE PDF.
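Both decisions can be checked with a Python sketch. It makes several assumptions: both claims are tested against two-tailed alternatives, test (a) uses a z statistic with a P-value, and test (b) uses a t statistic with the table critical value t = 2.779 (df = 26, 0.005 in each tail). The mean test uses the rounded values x̄ = 60.7 and s = 1.5.

```python
import math
from statistics import NormalDist

# a) Claim: 20% of all Skittles are purple. H0: p = 0.20, H1: p != 0.20
x_purple, n_candies, p0, alpha_a = 329, 1639, 0.20, 0.05
p_hat = x_purple / n_candies
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_candies)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed P-value
decision_a = "Reject H0" if p_value <= alpha_a else "Fail to reject H0"

# b) Claim: mean candies per bag is 62.0. H0: mu = 62.0, H1: mu != 62.0
n_bags, x_bar, s, mu0 = 27, 60.7, 1.5, 62.0
t = (x_bar - mu0) / (s / math.sqrt(n_bags))
t_crit = 2.779  # t table, df = 26, alpha = 0.01 split over two tails
decision_b = "Reject H0" if abs(t) > t_crit else "Fail to reject H0"

print(f"a) z = {z:.2f}, P-value = {p_value:.3f}: {decision_a}")
print(f"b) t = {t:.2f}, critical values = ±{t_crit}: {decision_b}")
```

The purple proportion (about 0.201) is very close to 0.20, so the P-value is large. The sample mean lies about 4.5 standard errors below 62.0, which is well beyond the critical value.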
Reflection:
The conditions for each procedure:
Proportion: simple random sample, at least 5 successes and at least 5 failures, and the
conditions for a binomial distribution are met.
Mean: simple random sample, and either the population is normally distributed or the sample
size is greater than 30.
Standard deviation: simple random sample, and the population must be normally distributed.
There are a lot of checkpoints when it comes to statistics. One requirement that I know we did
not meet was the sample size: our sample size (n = 27 bags) was under 30, but our distribution
appears approximately normal. Our frequency histogram has an overall bell-shaped curve, although
the 59 bin (59 candies per bag) shows a noticeable bump (a second peak). This bump is not an
outlier, since every bag falls between the fences of 54.5 and 66.5; it is simply part of the
data, so I would still consider the distribution approximately normal. The sampling method could
have been improved by using a larger sample, for example by compiling data from more than one
class. There is also the chance of a Type I or Type II error: we may have rejected a true null
hypothesis (Type I) or failed to reject a false null hypothesis (Type II). In test (a) we failed
to reject H0, so a Type II error is possible; in test (b) we rejected H0, so a Type I error is
possible.
