
Original Skittles

The purpose of this project is to pull together the statistical concepts we have been studying:
turning raw counts into values that numerically summarize the data, and then using those values
to create graphical displays. Using confidence interval estimates and hypothesis tests, we will
attempt to draw conclusions about the proportion of each color and the number of candies per bag
of Skittles. While this project would not have been possible without the data contributed by the
class, the observations and graphical illustrations shown below are my own work.

Proportion of each color:


Class Proportions

Color     Count    Ratio       Percent
Green     291      291/1639    18%
Orange    315      315/1639    19%
Purple    329      329/1639    20%
Red       376      376/1639    23%
Yellow    328      328/1639    20%
Total     1639                 100%

My Proportions

Color     Count    Ratio       Percent
Green     11       11/60       18%
Orange    13       13/60       22%
Purple    6        6/60        10%
Red       21       21/60       35%
Yellow    9        9/60        15%
Total     60                   100%
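The percentages in the two tables above can be reproduced from the raw counts. The following is a minimal sketch in Python; the function and variable names (`proportions`, `class_counts`, `my_counts`) are illustrative and not part of the original project.

```python
# Counts by color, as reported in the class table and in my own bag.
class_counts = {"Green": 291, "Orange": 315, "Purple": 329, "Red": 376, "Yellow": 328}
my_counts = {"Green": 11, "Orange": 13, "Purple": 6, "Red": 21, "Yellow": 9}


def proportions(counts):
    """Return each color's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


class_props = proportions(class_counts)
my_props = proportions(my_counts)

# Print the two sets of proportions side by side, one color per row.
for color in class_counts:
    print(f"{color:<7} class {class_props[color]:6.1%}   mine {my_props[color]:6.1%}")
```

Rounding each fraction to the nearest whole percent gives the Percent columns shown in the tables.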

The class proportions range from 18% to 23%, while mine range from 10% to 35%, largely because
my bag contained a higher share of red candies (35%) than the class as a whole (23%). However,
red is also the most common color in the class sample. I concluded that my bag was not special
or unique in any way; the proportions are relatively consistent.

[Figure: Pie chart of the number of candies of each color in the class sample. Green: 291;
Orange: 315; Purple: 329; Red: 376; Yellow: 328.]

Visual distribution of numerical data using a Pie Chart.

[Figure: Pareto chart, "Number of Skittles Per Color". Vertical axis: count, 0 to 400. Bars in
descending order: Red, Purple, Yellow, Orange, Green.]

The two graphs above illustrate the count of each color: one as a Pie Chart and the other as a
Pareto Chart. Both charts depict the same information; the only difference is the presentation.
Across the class, the colors are fairly evenly represented, with each falling within a narrow
range. Red is the most frequent color and green is the least frequent.

[Figure: Histogram, "Number of Skittles per Bag". Horizontal axis: number of Skittles per bag,
58 to 64. Vertical axis: frequency, 0 to 8.]

The number of Skittles per bag ranged from 58 to 64. The most common count was 61, which 7
individuals received. Some bags had about 3 fewer than the mean, and a small percentage had
about 3 more, but the distribution is concentrated between 60 and 62, around the mean. I was
not surprised by the data: Skittles are packaged by machines, and this level of numerical
accuracy is not uncommon for this type of technology. I enjoy Skittles candy and am pleased to
announce that I was not the unfortunate one who received only 58 pieces instead of 61 or more.
I was in the average range among the rest of the class.

Statistic             Value
Mean                  60.8
Median                61.0
Mode                  61.0
Standard Deviation    1.4
Sample Variance       2.0
Range                 5.0
Minimum               58.0
Maximum               64.0
Sum                   1581.0
Count                 26.0
Q1                    59.5
Q2                    61
Q3                    62

It was important to define the difference between categorical data and quantitative data for this
project. Quantitative data consist of numerical values that represent counts or measurements,
while categorical data consist of names or labels that do not represent counts or measurements.
Each type calls for different graphs, depending on the circumstance. To illustrate the ages in
years of survey respondents, for example, a quantitative display would work best because the
numbers can be ordered and grouped. For categorical data, pie charts or Pareto charts work best:
labeling the categories as simple pie slices or bars maintains the integrity of the information
and keeps the display simple. Calculations such as proportions make the most sense for
categorical data, where each slice or bar is proportional to the frequency count for its category
and showing the relative sizes of the components is necessary. Calculations for quantitative data
are best used to reveal the distribution of the data, such as clusters or gaps, yearly high or low
values, or IQ scores, while keeping the original data values.
Interval Estimates
Three main concepts are necessary to make and appropriately interpret estimates of a population
proportion: point estimates, confidence intervals, and sample sizes. The sample proportion is the
best point estimate of a population proportion, but a point estimate is a single value that gives
no indication of how precise or reliable the estimate actually is. A confidence interval turns the
estimate into a range of values rather than a single value. The higher the confidence level, the
wider the interval, and the more confident we are that the interval contains the true value of the
population parameter.
Requirements
For estimating a population proportion:
1. The sample is a simple random sample.
2. There is a fixed number of trials, the trials are independent, there are two categories of
outcomes, and the probabilities remain constant for each trial.
3. There are at least 5 successes and at least 5 failures.
*In our case, the requirements were met.
For estimating a population mean (t or z):
1. The sample is a random sample.
2. The population is normally distributed or n > 30.
*In our case, the requirements were not met. We had 27 samples.
For estimating a population standard deviation or variance:
1. The sample is a simple random sample.
2. The population must be normally distributed.
*Based on the histogram and box plot provided in previous pages, the population is not
normally distributed. The requirements were not met.
Estimates based on Skittles Project

A 99% confidence interval for the true proportion of green candies:

Margin of error: 0.0243

We are 99% confident that the interval (15.3%, 20.2%) contains the population proportion of
green candies.
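
The margin of error and interval for the green candies can be checked from the class counts (291 green out of 1639). This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is illustrative, and it assumes the usual normal-approximation interval for a proportion, which the report does not spell out.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# 291 green candies out of 1639 in the class sample, 99% confidence.
p_hat, margin, (low, high) = proportion_ci(291, 1639, 0.99)
print(f"p_hat = {p_hat:.4f}, E = {margin:.4f}, CI = ({low:.1%}, {high:.1%})")
```

Under these assumptions the margin of error rounds to 0.0243 and the interval to (15.3%, 20.2%), matching the values reported above.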

A 95% confidence interval estimate for the true mean number of candies per bag:

Margin of error: 0.5887

We are 95% confident that the interval (60.1, 61.3) contains the true mean number of candies per
bag.

A 98% confidence interval for the standard deviation of the number of candies per bag:

We are 98% confident that the interval (1.1, 2.1) contains the standard deviation of the number of
candies per bag.
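
The report does not show how the interval for the standard deviation was calculated. The sketch below assumes the standard chi-square method, with n = 26 and s = 1.4 taken from the summary table (both are assumptions about which inputs were used), and with critical values for 25 degrees of freedom read from a standard chi-square table.

```python
from math import sqrt

n = 26    # Count from the summary table (assumed to be the n used here)
s = 1.4   # Sample standard deviation from the summary table (rounded)
df = n - 1

# Chi-square critical values for df = 25 at 98% confidence,
# taken from a standard chi-square table.
chi2_right = 44.314   # area of 0.01 to the right
chi2_left = 11.524    # area of 0.99 to the right

# Interval for sigma: sqrt((n-1)s^2 / chi2_right) < sigma < sqrt((n-1)s^2 / chi2_left)
low = sqrt(df * s**2 / chi2_right)
high = sqrt(df * s**2 / chi2_left)
print(f"98% CI for sigma: ({low:.1f}, {high:.1f})")
```

With these inputs the limits round to 1.1 and 2.1, consistent with the interval stated above.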

Hypothesis Testing
Hypothesis testing is a method used to test whether a claim or statement about a property of a
population is valid. In inferential statistics we can never prove a claim or accept it as true,
but we can reject a claim, which allows a new claim to be tested for its validity.
Rare Event Rule for Inferential Statistics:

If, under a given assumption, the probability of a particular observed event is extremely
small, we conclude that the assumption is probably not correct.
Type I error: The mistake of rejecting the null hypothesis when it is actually true.
Type II error: The mistake of failing to reject the null hypothesis when it is actually false.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are purple.
1) Claim: p = 0.20 (null hypothesis)
Opposite: p ≠ 0.20 (alternative hypothesis)
2) Significance level: 0.05
3) Test statistic: z = 0.0741; critical values: ±1.96
Since 0.0741 < 1.96, the test statistic does not fall in the critical region.
4) Decision: Fail to reject the null hypothesis.
5) Conclusion: There is not sufficient evidence to reject the claim that 20% of candies are purple.
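
The test statistic for the purple claim can be reproduced from the class counts (329 purple out of 1639). This is a minimal sketch of a two-tailed one-proportion z test using only the Python standard library; the function name `one_proportion_z_test` is illustrative and not from the original project.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not the sample proportion.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value


# 329 purple candies out of 1639, claimed proportion 0.20, significance level 0.05.
z, critical, p_value = one_proportion_z_test(329, 1639, 0.20, 0.05)
reject = abs(z) > critical
print(f"z = {z:.4f}, critical = ±{critical:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The z score rounds to 0.0741, well inside the critical values of ±1.96, so the decision is to fail to reject the null hypothesis.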
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 62.
1) Claim: μ = 62 (null hypothesis)
Opposite: μ ≠ 62 (alternative hypothesis)
2) Significance level: 0.01
3) Test statistic: P-value = 0.0001, which is less than the significance level of 0.01.
4) Decision: Reject the null hypothesis.
5) Conclusion: There is sufficient evidence to reject the claim that the mean number of candies
per bag is 62.
Summary
The requirements for estimating a population proportion were met, but the requirements for
estimating a population mean and a population standard deviation were not. The sample size was
below 30 and the population was not normally distributed. Strictly speaking, the methods of those
sections do not apply to the mean and standard deviation, and other methods would be more
appropriate for such cases. The sampling would be improved if the sample size met the requirement
of more than 30. Using other sampling methods, such as systematic, stratified, or cluster
sampling, would not necessarily improve the sample until it was larger. The more people included
in the sample, the better the sample would reflect the population.

Based on the statistical research, we constructed a 99% confidence interval for the proportion of
green candies, a 95% confidence interval for the true mean number of candies per bag, and a 98%
confidence interval for the standard deviation of the number of candies per bag. Additionally, we
had the opportunity to test certain claims about proportions and means, which aided in
understanding the values needed to draw statistical inferences. We tested the claim that 20% of
all Skittles candies are purple and concluded that there was not sufficient evidence to reject
it. We also tested the claim that the mean number of candies in a bag of Skittles is 62 and
concluded that there was sufficient evidence to reject it.