
Analysis of Skittles Data

Data Collection
Each student in the class will purchase one 2.17-ounce bag of Original Skittles and
record the following data:
Number of      Number of    Number of       Number of       Number of      Number of
total candies  red candies  orange candies  yellow candies  green candies  purple candies
54             6            13              14              10             11

Organizing and Displaying Data

The pie chart showed the total number of candies of each color for the whole class. Red was
the most common color and yellow was the least common. My own bag was the opposite: yellow was
the most common color and red was the least common.

Summary statistics of the color counts for the class totals:

Color    Sum
Red      360
Orange   315
Yellow   294
Green    313
Purple   319

The number of candies of each color in Octans candy bag:

Color    Amount
Red      6
Orange   13
Yellow   14
Green    10
Purple   11

The graph reflects what I expected to see. However, the overall data collected by the whole
class do not agree with the data from my single bag: the most common and least common colors
are reversed.
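
The comparison above can be checked with a short Python sketch. It uses only the color counts
from the two tables in this report. The helper name `proportions` is made up for illustration,
and the script is one possible way to do the check, not the method used for the original graphs.

```python
# Color counts copied from the class-total table and the single-bag table above.
class_totals = {"Red": 360, "Orange": 315, "Yellow": 294, "Green": 313, "Purple": 319}
my_bag = {"Red": 6, "Orange": 13, "Yellow": 14, "Green": 10, "Purple": 11}


def proportions(counts):
    """Return each color's share of the total count (hypothetical helper)."""
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


class_share = proportions(class_totals)
bag_share = proportions(my_bag)

# Print the two sets of proportions side by side.
for color in class_totals:
    print(f"{color:<7} class {class_share[color]:.3f}   bag {bag_share[color]:.3f}")

# Most and least common color in each data set.
class_most = max(class_totals, key=class_totals.get)
class_least = min(class_totals, key=class_totals.get)
bag_most = max(my_bag, key=my_bag.get)
bag_least = min(my_bag, key=my_bag.get)
print("class: most", class_most, "/ least", class_least)
print("bag:   most", bag_most, "/ least", bag_least)
```

Printing proportions rather than raw counts puts the class (1601 candies) and the single bag
(54 candies) on the same scale.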

Quantitative Data
Summary statistics:
Column          n   Mean  Std. dev.  Median  Min  Max  Q1  Q3
Total Skittles  27  59.3  2.72       59      54   66   58  61

The data appear to be approximately normally distributed (bell-shaped). The total number of
candies per bag is not too far from the average, and the mean and median are close to each
other. This reflects what I expected to see.
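
One rough way to put numbers on "not too far from the average" is to express the smallest and
largest bag as z-scores. The sketch below assumes the rounded summary statistics from the table
are accurate enough for this purpose. The helper `z_score` is made up for illustration, and the
check does not replace looking at the histogram.

```python
# Rounded summary statistics copied from the table above.
mean, std_dev, median = 59.3, 2.72, 59
minimum, maximum = 54, 66


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


z_min = z_score(minimum, mean, std_dev)
z_max = z_score(maximum, mean, std_dev)

print(f"z-score of smallest bag: {z_min:.2f}")
print(f"z-score of largest bag:  {z_max:.2f}")
print(f"mean minus median:       {mean - median:.1f}")
```

With these inputs, both extremes lie within about 2.5 standard deviations of the mean, and the
mean and median differ by 0.3 candies.
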
Categorical (qualitative) data describe a characteristic and do not represent a measurement.
In this project, the qualitative data are the colors of the Skittles, which are best represented
by a pie chart or a Pareto chart. Quantitative data represent a measurement or a count; here,
that is the total number of candies per bag. The types of graph that make sense for these data
are the histogram and the boxplot.

Confidence Interval explanation


A confidence interval (sometimes abbreviated as CI) is a range of values used to
estimate the true value of a population parameter. Every confidence interval is
associated with a confidence level, which is chosen according to how confident a
person wants to be that the true value lies within the interval.
1. Construct a 99% confidence interval estimate for the true proportion of
yellow candies.
n = sample size (total number of candies from the entire class)
x = number of yellow candies in the sample
p hat = sample proportion (x / n)
p = population proportion
α = 0.01 because the confidence level is 99% (α = 1 - confidence level)
E = margin of error (the maximum likely difference, with probability 0.99,
between the observed sample proportion p hat and the true value of the
population proportion p)
The result: we are 99% confident that the interval from 0.159 to 0.209
contains the true value of the population proportion p.
2. Construct a 95% confidence interval estimate for the true mean number of
candies per bag.
n = sample size (number of bags)
μ = population mean
x bar = sample mean
E = margin of error
s = standard deviation of the sample
The result: we are 95% confident that the interval from 58.23 to 60.37
contains the true value of the population mean μ.
3. Construct a 98% confidence interval estimate for the standard deviation of
the number of candies per bag.
n = sample size (number of bags)
s = standard deviation of the sample
α = 0.02 because the confidence level is 98% (α = 1 - confidence level)
chi-squared left = left-tailed critical value of chi-squared
chi-squared right = right-tailed critical value of chi-squared
The result: we are 98% confident that the interval from 2.049 to 3.964
contains the true value of the population standard deviation.
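
The three intervals above can be approximately reproduced with the standard textbook formulas.
The sketch below is a reconstruction under stated assumptions, not the original calculation:
it assumes n = 1601 (the sum of the class color totals) and x = 294 for the proportion, uses
the rounded values x bar = 59.3 and s = 2.72 from the summary table, and takes the t and
chi-squared critical values for 26 degrees of freedom from standard tables. The function names
are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Inputs taken from the tables in this report (mean and std. dev. are rounded).
n_candies, x_yellow = 1601, 294
n_bags, x_bar, s = 27, 59.3, 2.72

# Critical values for df = n_bags - 1 = 26, copied from standard tables (assumption).
T_CRIT_95 = 2.056        # t with area 0.025 in each tail
CHI2_LEFT_98 = 12.198    # chi-squared with area 0.99 to its right
CHI2_RIGHT_98 = 45.642   # chi-squared with area 0.01 to its right


def proportion_ci(x, n, confidence):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def mean_ci(x_bar, s, n, t_crit):
    """t interval for a mean: x_bar +/- t * s / sqrt(n)."""
    e = t_crit * s / sqrt(n)
    return x_bar - e, x_bar + e


def std_dev_ci(s, n, chi2_left, chi2_right):
    """Chi-squared interval for a standard deviation."""
    df = n - 1
    return sqrt(df * s**2 / chi2_right), sqrt(df * s**2 / chi2_left)


p_low, p_high = proportion_ci(x_yellow, n_candies, 0.99)
m_low, m_high = mean_ci(x_bar, s, n_bags, T_CRIT_95)
sd_low, sd_high = std_dev_ci(s, n_bags, CHI2_LEFT_98, CHI2_RIGHT_98)

print(f"99% CI for proportion of yellow: {p_low:.3f} to {p_high:.3f}")
print(f"95% CI for mean candies per bag: {m_low:.2f} to {m_high:.2f}")
print(f"98% CI for standard deviation:   {sd_low:.3f} to {sd_high:.3f}")
```

Because the mean and standard deviation in the summary table are rounded, the second and third
intervals from this sketch can differ from the reported ones by about 0.01.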

Hypothesis Test Explanation


A hypothesis is a claim or statement about a property of a population. A hypothesis
test is a procedure for testing a claim about a property of a population.
First, identify the null hypothesis and the alternative hypothesis, and express both
in symbolic form. The null hypothesis indicates no change, and the alternative
hypothesis indicates that there is a change. Once the alternative hypothesis is
determined, we can tell whether the test is one-tailed or two-tailed, which determines
how α is distributed between the tails. Next, choose the relevant sampling
distribution and calculate the test statistic from the claim and the sample data.
Then either find the P-value or identify the critical value, and state a conclusion
about the claim in simple, non-technical terms.
1. Use a 0.05 significance level to test the claim that 20% of all Skittles candies
are red.
I used the classical method, which compares the test statistic with the critical
value. If the test statistic falls beyond the critical value, we reject the null
hypothesis (and so reject the claim). If it does not, we fail to reject the null
hypothesis, which means there is not sufficient evidence against the claim. In
this case, the test statistic is beyond the critical value, so we reject the
claim that 20% of all Skittles candies are red.
2. Use a 0.01 significance level to test the claim that the mean number of
candies in a bag of Skittles is 55.
I used the classical method to compare the test statistic with the critical
value. In this case, the test statistic is far beyond the critical value, so we
reject the claim that the mean number of candies in a bag of Skittles is 55.
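
Both tests above can be sketched with the classical (critical value) method. The sketch assumes
two-tailed tests, since each claim is a statement of equality. It also assumes x = 360 red
candies out of n = 1601 from the class totals, the rounded values x bar = 59.3 and s = 2.72
from the summary table, and a t critical value for 26 degrees of freedom taken from a standard
table. The function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Two-tailed t critical value for alpha = 0.01, df = 26, from a standard table (assumption).
T_CRIT_01 = 2.779


def proportion_z_test(x, n, p0, alpha):
    """Two-tailed z test of H0: p = p0. Returns (z, critical z, reject H0?)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


def mean_t_test(x_bar, s, n, mu0, t_crit):
    """Two-tailed t test of H0: mu = mu0. Returns (t, reject H0?)."""
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, abs(t) > t_crit


# Test 1: claim that 20% of all candies are red, alpha = 0.05.
z, z_crit, reject_p = proportion_z_test(360, 1601, 0.20, 0.05)
print(f"z = {z:.2f}, critical value = +/-{z_crit:.2f}, reject H0: {reject_p}")

# Test 2: claim that the mean number of candies per bag is 55, alpha = 0.01.
t, reject_mu = mean_t_test(59.3, 2.72, 27, 55, T_CRIT_01)
print(f"t = {t:.2f}, critical value = +/-{T_CRIT_01}, reject H0: {reject_mu}")
```

Under these assumptions both test statistics fall beyond their critical values, which agrees
with the two conclusions stated above.
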
Reflection
The conditions for doing interval estimates are that the data come from a normal
distribution, that the sample is a simple random sample, and that the events are
independent. Based on the Skittles data that we collected in our class, we met
these conditions. Possible errors include miscounting the number of candies of
each color, for example because students counted the candies in their bags in
different ways. Through this statistical work, we learned that a confidence
interval gives a range that is likely to contain the true value of a parameter,
and that a hypothesis test is a procedure for testing a claim about a property
of a population.