
Brad Bingham

Math 1040 Skittles Term Project

This project is designed to determine statistically how well several bags of Skittles
represent the color distribution of the entire population of Skittles. Twenty-five bags of
original Skittles were purchased, assessed individually, and subsequently eaten. The color distribution
of each bag was reported and compiled in an Excel spreadsheet. The following data are derived from
my individual bag of Skittles and from data compiled from 24 additional bags of Skittles.

Categorical Data
The results for this section are derived from the sums, means and standard
deviations of the collected data. The information is displayed numerically in
tables as well as visually in bar and pie charts. Observations are detailed at the end
of the section.
Quantitative Data
The results for this section are derived from the medians of the total candies for
each bag. The information is displayed in several tables, a box-and-whisker chart,
and a histogram. Observations are detailed at the end of the section.
Confidence Interval Tests
The results for this section are derived from the collected data and
predetermined confidence levels. Each interval estimate is stated, and the results are
shown below it. Observations are detailed at the end of the
section.

Hypothesis Testing
The results for this section are derived from the collected data
and predetermined significance levels. Each hypothesis is stated, and the
corresponding results are listed below it. Observations are detailed
at the end of the section.

Categorical Data

My Bag of Candies

Color     Number   Proportion
Red         19       0.317
Orange      14       0.233
Yellow       5       0.083
Green       13       0.217
Purple       9       0.150
Total       60       1.000

Number of Candies (Whole Class)

Color     Number   Proportion   Mean   Standard Deviation
Red         292      0.193      11.7         3.34
Orange      307      0.203      12.3         3.37
Yellow      299      0.198      12.0         4.22
Green       297      0.196      11.9         3.35
Purple      317      0.210      12.7         4.07
Total      1512      1.000      60.5         1.76
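The proportion columns in the two tables can be checked with a short script. This is a minimal sketch, not part of the original spreadsheet work; the function name `proportions` and the dictionary layout are assumptions made for illustration, and the counts are copied from the tables above.

```python
# Color counts copied from the tables above.
class_counts = {"Red": 292, "Orange": 307, "Yellow": 299, "Green": 297, "Purple": 317}
my_bag_counts = {"Red": 19, "Orange": 14, "Yellow": 5, "Green": 13, "Purple": 9}


def proportions(counts):
    """Return each color's share of the total, rounded to three decimal places."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_props = proportions(class_counts)
my_bag_props = proportions(my_bag_counts)

for color in class_counts:
    print(f"{color:<7} my bag: {my_bag_props[color]:.3f}   class: {class_props[color]:.3f}")
```

Printing the two columns side by side makes it easy to see how far my bag departs from the class totals for each color.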

[Figure: pie chart, "Numbers of Candies (My Bag)", by color: Red, Orange, Yellow, Green, Purple]

[Figure: pie chart, "Numbers of Candies (Whole Class)", by color: Red, Orange, Yellow, Green, Purple]

[Figure: bar chart, "Number of Candies", by color, vertical axis from 0 to 20]

[Figure: bar chart, "Number of Candies (Whole Class)", by color, vertical axis from 275 to 320]

Observations:
The small sample from my bag may not be a good representation of the color distribution of
all Skittles. As the sample size increases to 25 bags of Skittles, the proportions of colors
seem to even out. The table containing data collected from the whole class and the corresponding pie
chart show this relationship clearly. The corresponding bar graph seems to show differences as
pronounced as those in my individual bag, but it is misleading: its vertical axis starts at 275
rather than 0, which exaggerates small differences. The table and pie chart show that the
visual differences in the bar graph are unlikely to represent a statistically
significant difference.

Quantitative Data

Five-Number Summary

Minimum   55
Q1        60
Median    61
Q3        62
Maximum   63

Total Candies   Frequency
     55             1
     56             0
     57             0
     58             2
     59             2
     60             7
     61             4
     62             8
     63             1

Observations:
The data show a general grouping of total candies per bag. Of the 25
bags, the majority (19) fall in the range of 60 to 62 candies per bag. The data do not follow a true normal
distribution, partly because fewer bags contained 61 candies than 60 or 62. My individual bag of Skittles contained
60 candies, which falls within 1 candy of the median number of candies per bag. There was one
outlier (the bag with 55 candies) that may have skewed the quantitative results.
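The five-number summary and the outlier can be reproduced from the frequency table. The sketch below assumes the common 1.5 × IQR rule for flagging outliers, which the report does not state explicitly, and uses the default ("exclusive") quartile method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values on other data sets.

```python
import statistics

# Frequency table copied from above: total candies per bag -> number of bags.
freq = {55: 1, 56: 0, 57: 0, 58: 2, 59: 2, 60: 7, 61: 4, 62: 8, 63: 1}

# Expand the table into one sorted value per bag (25 values in all).
totals = [value for value, count in sorted(freq.items()) for _ in range(count)]

q1, median, q3 = statistics.quantiles(totals, n=4)
five_number = (min(totals), q1, median, q3, max(totals))

# Assumed rule: a value is an outlier if it lies more than 1.5 * IQR beyond a quartile.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in totals if x < low_fence or x > high_fence]

print("Five-number summary:", five_number)
print("Outliers:", outliers)
print("Mean:", round(statistics.mean(totals), 1))
print("Standard deviation:", round(statistics.stdev(totals), 2))
```

Under these assumptions the only flagged value is the bag with 55 candies.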

Reflection on Quantitative vs Qualitative results:


Qualitatively, the color distribution indicates that my bag of Skittles was not a
good representation of the entire population of Skittles. Quantitatively, my bag of Skittles was
just about average. Color counts vary more from bag to bag (standard deviations of 3.34 to 4.22
per color) than does the number of candies per bag (standard deviation of 1.76).

Confidence Interval Estimate


To determine how confident we can be that our data reflect the total population
of Skittles, we will use confidence interval estimates. Each estimate produces a range in which the
true population value is likely to fall at the stated confidence level. If a proposed population value
falls within that range, we may accept it as plausible. We will estimate three values to
determine whether they can be used as population estimates for all Skittles.

95% confidence interval estimating the true proportion of purple candies.

$n = 1512$
$\hat{p} = 0.210$
$z_{\alpha/2} = 1.96$

$E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
$E = 1.96\sqrt{0.210(1-0.210)/1512}$
$E = 0.021$

$\hat{p} - E < p < \hat{p} + E$
$(0.210 - 0.021) < p < (0.210 + 0.021)$
$0.189 < p < 0.231$
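The margin of error for a proportion can be recomputed in a few lines. This is a sketch under the usual normal-approximation assumptions; the function name `proportion_ci` is hypothetical, and the purple count of 317 out of 1512 is taken from the class table. Note that the margin of error is the critical value multiplied by the standard error, so both factors appear in the code.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_ci(317, 1512, 0.95)
print(f"E = {margin:.3f}, interval = ({low:.3f}, {high:.3f})")
```

Because the script uses the unrounded proportion 317/1512, its endpoints may differ from a hand calculation in the third decimal place.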

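99% confidence interval estimating the true mean number of candies per bag.

$n = 25$
$df = 24$
$\bar{x} = 60.5$
$s = 1.76$
$t_{\alpha/2} = 2.797$

$E = t_{\alpha/2}\,(s/\sqrt{n})$
$E = 2.797\,(1.76/\sqrt{25})$
$E = 0.985$

$\bar{x} - E < \mu < \bar{x} + E$
$(60.5 - 0.985) < \mu < (60.5 + 0.985)$
$59.5 < \mu < 61.5$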
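The same arithmetic can be written as a small function. Python's standard library has no t distribution, so this sketch takes the critical value 2.797 (99% confidence, 24 degrees of freedom) as an input read from a t table; the function name `mean_ci` is an assumption for illustration.

```python
from math import sqrt


def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval for a mean, given a t critical value from a table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Sample mean, sample standard deviation, sample size, and t critical value from above.
low, high, margin = mean_ci(60.5, 1.76, 25, 2.797)
print(f"E = {margin:.3f}, interval = ({low:.1f}, {high:.1f})")
```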

98% confidence interval estimating the true standard deviation of the numbers of candies per bag.

$n = 25$
$df = 24$
$s = 1.76$
$\chi^2_L = 10.856$
$\chi^2_R = 42.980$

$\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$
$\sqrt{(25-1)(1.76)^2/42.980} < \sigma < \sqrt{(25-1)(1.76)^2/10.856}$
$1.315 < \sigma < 2.617$
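A short check of the standard deviation interval follows. As with the t value, the chi-square critical values (10.856 and 42.980 for 24 degrees of freedom at 98% confidence) are passed in from a table rather than computed; the function name `std_dev_ci` is hypothetical.

```python
from math import sqrt


def std_dev_ci(s, n, chi2_left, chi2_right):
    """Confidence interval for a standard deviation from chi-square critical values."""
    sum_of_squares = (n - 1) * s ** 2
    # The larger (right-tail) critical value gives the lower endpoint.
    return sqrt(sum_of_squares / chi2_right), sqrt(sum_of_squares / chi2_left)


low, high = std_dev_ci(1.76, 25, 10.856, 42.980)
print(f"{low:.3f} < sigma < {high:.3f}")
```

The square root and the squared standard deviation both matter here: without them the endpoints would describe neither the variance nor the standard deviation.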

Observations:
The confidence interval estimates are consistent with our expectations in all three cases.
If the five colors were evenly distributed, the proportion of purple candies would be 20%,
and that value is compatible with the interval estimate for purple. The interval for the mean
suggests that the true mean number of candies per bag is close to our sample mean of 60.5.
Lastly, the interval for the standard deviation brackets our sample standard deviation
of 1.76.

Hypothesis Tests
Hypothesis tests are used to determine whether the data give sufficient evidence against a given
hypothesis. We state a hypothesis with mathematical parameters so that it can be tested against data.
If we fail to reject the hypothesis, it has not been proven; the data are merely consistent with it.
A hypothesis that repeatedly fails to be rejected gains credibility over time. We will test two
hypotheses, listed below with their corresponding results.

0.01 significance level to test the claim that 20% of all Skittles are green

$n = 1512$
$\hat{p} = 297/1512 = 0.196$
$\alpha = 0.01$
$z_{\alpha/2} = 2.575$

$H_0: p = 0.200$
$H_1: p \neq 0.200$
$z = (\hat{p} - p)/\sqrt{p(1-p)/n}$
$z = (297/1512 - 0.200)/\sqrt{0.200(1-0.200)/1512}$
$z = -0.347$
P-value $= 0.728$
Because the P-value is greater than $\alpha$ and $|z|$ does not exceed $z_{\alpha/2}$, we
fail to reject $H_0$.
Conclusion: There is not sufficient evidence to
reject the claim that 20% of all Skittles are green.
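The test statistic and P-value for a claim about a proportion can be sketched as follows. The function name `proportion_z_test` is an assumption; the inputs are the green count from the class table (297 of 1512) and the claimed proportion of 0.20. The standard error uses the claimed proportion, as the test assumes the null hypothesis is true.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-tailed z test of the claim that the population proportion equals p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Two-tailed P-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.01
z, p_value = proportion_z_test(297, 1512, 0.20)
reject = p_value < alpha
print(f"z = {z:.3f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

A P-value far above the significance level means the sample gives no reason to doubt the 20% claim.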

0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56

$n = 25$
$df = 24$
$\bar{x} = 60.5$
$s = 1.76$
$\alpha = 0.05$
$t_{\alpha/2} = 2.064$

$H_0: \mu = 56$
$H_1: \mu \neq 56$
$t = (\bar{x} - \mu)/(s/\sqrt{n})$
$t = (60.5 - 56)/(1.76/\sqrt{25})$
$t = 12.784$
Because $|t|$ exceeds $t_{\alpha/2}$, we reject $H_0$.
Conclusion: There is sufficient evidence to reject
the claim that the mean number of candies in a
bag of Skittles is 56.
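The t statistic can be checked the same way. In this sketch the two-tailed critical value for a 0.05 significance level with 24 degrees of freedom, 2.064, is read from a standard t table and supplied by hand; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for testing the claim that the population mean equals mu0."""
    return (x_bar - mu0) / (s / sqrt(n))


# Two-tailed critical value for alpha = 0.05, df = 24 (from a t table).
t_crit = 2.064
t = one_sample_t(60.5, 56, 1.76, 25)
reject = abs(t) > t_crit
print(f"t = {t:.3f}, reject H0: {reject}")
```

The statistic is so far beyond the critical value that the decision would be the same at any common significance level.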

Observations:
The first test failed to reject the hypothesis that 20% of Skittles are green. This
means the data are consistent with 20% of Skittles being green, although the test does not prove it.
Rejecting the hypothesis would have indicated that 20% is either too high or too low a percentage.
The second test rejected the hypothesis that 56 is the true mean number of candies in a bag of
Skittles. Because the sample mean of 60.5 is well above 56, the true mean is very likely above 56.

Reflection
Confidence interval estimates and hypothesis tests require choices that affect how the results are
interpreted. Choosing a confidence level closer to 100% means demanding greater certainty that the
interval captures the true value, at the cost of a wider interval. If the statistic were used to
determine whether a piece of medical equipment was going to function correctly, we would want to use
a higher confidence level. Lower confidence levels produce narrower intervals but carry a greater
risk of missing the true value. The confidence levels were predetermined in these exercises, and
the results have varied as discussed.
Interval estimation is used when the population parameters are not known. A sample of the
population needs to be examined, and a confidence level needs to be chosen before testing can begin.
The total population of all Skittles in the world has not been counted for this project, but
parameters of that population have been estimated from our results.
Confidence intervals and hypothesis tests rely on a statistical phenomenon, the central limit
theorem, by which the distribution of sample means approaches a normal distribution as sample size
grows, whatever the shape of the population. Examining t scores and z scores, using P-values, and
reasoning logically through hypotheses help us determine which conclusions are plausible when seeking
statistical significance in data. The requirements of each procedure must be satisfied before its
results can be trusted to describe the population.
Errors may have occurred at many points throughout these tests. The data were
reported by 25 different individuals, who may have reported incorrectly. The calculations have been
checked against other sources for accuracy, but there is still a chance of miscalculations within the data.
The percentage of error may be impossible to calculate for a project with this many aspects. The
issue is not significant enough to require reexamination of the data to eliminate further
error. However, this exercise will be repeated in future statistics classes, and its results may be
cross-examined with those pending results.

Reflective Writing
What have I learned as a result of this project?

This project incorporates almost every aspect of the course work in Math 1040. It has helped me
visualize everything from the integrity needed in collecting data to the process of understanding if
presented data is valid. The more specific ideas about formulas and mathematical process may escape
me sometime in the future, but the basic ideas outlined through this project will stick with me
throughout life. I've learned how to collect data correctly, analyze data processes, and formulate my
own opinion about statistical results.
Collecting data is a very important step in the process of compiling a convincing statistical
argument. This type of project faced few real threats of misrepresentation, but I understand how
someone, unknowingly or with a bias, could have altered it. The data could have been reported from
vague recollection or could have incorporated intentionally biased data sets. Parameters could have
been set up in the collection of data to reject evidence not wanted by the statistician collecting
the data. The data could have been limited to a very small sample to exaggerate the differences
desired. The process and parameters of data collection are just as crucial to finding the true
results as is the analysis itself.
Analyzing data involves many complex mathematical formulas that can be used to manipulate the
information toward a desired result. The goal of statistics is to find significance in data and clearly
convey it to others. For some statisticians the desired result is not a true representation of the data. The
principles I will retain from this class, and specifically this assignment, will help me determine whether
statistics I encounter are skewed. Concepts like normal distribution curves,
the different frequency and proportion rules, and what is needed to make mathematical statistics successful
will help me distinguish biased data from true data. For example, using the mean value rather than a
median value, for some data sets, may skew data in a way favorable for a biased opinion.
Understanding even the basics of what formulas accomplish and how they do it has helped me become
more critical of statistics I analyze. That criticism has helped me make more educated choices.
Critical thinking is the ability to formulate solutions and opinions when presented with new
information. The skills I have learned through this paper have built a foundation to critically analyze
statistical reports. With my background in exercise science, I often read reports about new and
emerging ideas about physical fitness. I have already used the skills I've learned through this class to
determine that some peer-reviewed findings were biased and unreliable. If not for my newfound cognitive
tools, I might have blindly accepted the information presented in that paper. Until that subject has a
more complete analysis of its data, I will reject its hypothesis.
I have been equipped with an arsenal of tools that will help me in my daily life. I most likely
will not become a professional statistician, but I know what impact understanding statistics can have.
There is no doubt that there will be occasions in my future professional life where statistics must be
understood and evaluated correctly. I feel confident in my ability to determine the integrity of data
collection, data analysis and to form an accurate opinion of statistics.
