Professor: Ping Yu
Math 1040
07-25-2016
Final Term Project
For this project, each student in our statistics class bought a 2.17-ounce bag of Skittles, 29 bags in total. Then we counted the number of Skittles and recorded how many candies of each color were in the bag. After that, we determined the proportion of each color of candy and created a pie chart and a Pareto chart. Then we compared the class data with our own data, observing any similarities or differences.
Next, we used the class data to calculate the mean, standard deviation and 5-number summary of the number of candies per bag. We drew a frequency histogram of the number of candies per bag as well as a box plot based on the 5-number summary. We also wrote a reflection about the differences between categorical (qualitative) and quantitative data.
Pie Chart: proportion of each color in the class data (Red 21%, Orange 20%, Yellow 18%, Green 21%, Purple 20%)
Pareto Chart: number of candies of each color in the class data, sorted from most to least (Red, Green, Purple, Orange, Yellow)
Number of candies of each color
Color                  Red   Orange   Yellow   Green   Purple   Total
My bag (1 bag)           9       10       13      14       10      56
Class data (29 bags)   366      343      314     355      346    1724
My bag had the most green candies, so I expected the class data to have the most green candies too. However, red was the most common color in the class data, which did not match my expectation. Similarly, my bag had the fewest red candies, so I expected red to be the least common color in the class data as well; instead, yellow was the least common, which shows that this expectation was also wrong.
The proportions of candies in my bag were 0.1607 red, 0.1786 orange, 0.2321 yellow, 0.2500 green and 0.1786 purple. In comparison, the class proportions were 0.21 red, 0.20 orange, 0.18 yellow, 0.21 green and 0.20 purple. I noticed that my bag had equal proportions of orange and purple candies, and the class data also had similar proportions of orange and purple Skittles.
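The proportions compared above can be recomputed from the color counts. This is a minimal sketch for checking the arithmetic, not the method used in the project; the dictionary and function names are illustrative. The class red count is derived as the class total (1724) minus the other four color counts. The sketch also picks out the most and least common color in each data set.

```python
COLORS = ["Red", "Orange", "Yellow", "Green", "Purple"]

# Counts from my single bag.
my_bag = {"Red": 9, "Orange": 10, "Yellow": 13, "Green": 14, "Purple": 10}

# Class counts (29 bags); red is derived as the total minus the other four colors.
class_total = 1724
class_counts = {"Orange": 343, "Yellow": 314, "Green": 355, "Purple": 346}
class_counts["Red"] = class_total - sum(class_counts.values())


def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Return each color's share of the total number of candies."""
    total = sum(counts.values())
    return {color: counts[color] / total for color in COLORS}


mine = proportions(my_bag)
cls = proportions(class_counts)

for color in COLORS:
    print(f"{color:<7} my bag {mine[color]:.4f}   class {cls[color]:.4f}")

# Most and least common colors in each data set.
print("my bag: most", max(mine, key=mine.get), "least", min(mine, key=mine.get))
print("class:  most", max(cls, key=cls.get), "least", min(cls, key=cls.get))
```

Printing four decimal places makes the small differences between orange and purple in the class data visible, while the rounded two-decimal class proportions hide them.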
Organizing and Displaying Quantitative Data: The Number of Candies per Bag
Mean: 59.45
Standard Deviation: 2.4
Minimum: 53
Q1: 58
Median: 60
Q3: 61
Maximum: 64
Frequency Histogram and Box Plot: number of candies per bag (class data, 29 bags)
Candies per bag   52-53   54-55   56-57   58-59   60-61   62-63   64-65
Frequency             1       1       1      10      10       5       1
Reflection:
From the histogram and the box plot, the distribution does not appear symmetric like a bell curve; it is negatively skewed (skewed to the left). The five-number summary supports this: the mean of 59.45 is lower than the median of 60, which is consistent with a left-skewed distribution. The standard deviation is about 2.4 candies, which is small relative to the mean (about 4% of 59.45). This suggests that the manufacturing process for Skittles is well controlled and keeps the variation in the number of candies per bag to a minimum.
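The two numerical claims in this reflection (the mean is below the median, and the standard deviation is small) can be checked directly from the summary statistics. The sketch below also computes the range, the interquartile range and the 1.5 × IQR fences that many box-plot tools use to flag possible outliers; that fence rule is an assumed convention, not something stated in the project. Under it, the minimum of 53 lies just below the lower fence of 53.5.

```python
# Summary statistics for the number of candies per bag (class data, 29 bags).
mean, sd = 59.45, 2.4
minimum, q1, median, q3, maximum = 53, 58, 60, 61, 64

data_range = maximum - minimum      # spread of the whole data set
iqr = q3 - q1                       # spread of the middle 50% of bags

# Assumed box-plot convention: Tukey's 1.5 x IQR fences for possible outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
min_below_fence = minimum < lower_fence

# A mean below the median is consistent with (though not proof of) left skew.
mean_below_median = mean < median

# Coefficient of variation: the standard deviation as a fraction of the mean.
cv = sd / mean

print(f"range = {data_range}, IQR = {iqr}")
print(f"outlier fences: {lower_fence} to {upper_fence}, minimum below fence: {min_below_fence}")
print(f"mean < median: {mean_below_median}, CV = {cv:.1%}")
```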
Reflection:
Quantitative data is defined as data that consists of numerical measurements or counts. Some examples of quantitative data are height, distance and weight. Categorical data consists of attributes, labels or non-numerical categories. Some examples of categorical data are color, gender and brand of product. In this project, the color of each candy is categorical, while the number of candies per bag is quantitative. Because quantitative data is numerical, we can perform meaningful calculations with it. This is not possible for categorical data, which cannot be measured but can be counted after the data is classified into categories.
For quantitative data, it makes sense to calculate measures of center such as the mean, median and mode. It is also possible to examine the spread of the data by calculating the standard deviation, variance, range, quartiles and interquartile range. We can plot quantitative data with histograms, box plots, dot plots and stem-and-leaf plots, which give us a visual assessment of how the data is spread around the center.
For categorical data, the usual graphs are pie charts and bar charts; a Pareto chart is a bar chart with the categories sorted from most to least frequent. Categorical data has no meaningful mean and no measures of spread such as the standard deviation and variance, because the values are not numerical and such calculations are not possible. Categorical data consists of categories that can be counted and summarized in a frequency table. If the number of categories is small, the data can be displayed with a pie chart or a bar chart. If the number of categories is very large, a bar chart is preferable, because a pie chart with many thin slices does not represent the categories clearly.
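As a small illustration of summarizing categorical data in a frequency table, the sketch below tabulates the class color counts with Python's `collections.Counter`. Sorting by count with `most_common()` gives the same order as the Pareto chart (Red, Green, Purple, Orange, Yellow). The red count is derived as the class total of 1724 minus the other four color counts; the variable names are illustrative.

```python
from collections import Counter

# Class color counts (29 bags); red is derived as the total minus the other colors.
class_counts = Counter({"Orange": 343, "Yellow": 314, "Green": 355, "Purple": 346})
class_counts["Red"] = 1724 - sum(class_counts.values())
total = sum(class_counts.values())

# Frequency table in Pareto order: most_common() sorts categories by count, descending.
print(f"{'Color':<8}{'Count':>6}{'Rel. freq.':>12}")
for color, count in class_counts.most_common():
    print(f"{color:<8}{count:>6}{count / total:>12.3f}")
```

Only counting and dividing are involved here, which reflects the point above: categories can be tallied and compared, but not averaged.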
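Construct a 99% CI estimate for the true proportion of yellow candies:
$\hat{p} = 314/1724 \approx 0.182$, $n = 1724$, $z_{\alpha/2} = z_{0.005} \approx 2.576$
$\hat{p} - E < p < \hat{p} + E$, where $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
$E = 2.576\sqrt{0.182(0.818)/1724} \approx 0.024$
$0.182 - 0.024 < p < 0.182 + 0.024$
$0.158 < p < 0.206$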
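The interval above can be reproduced with Python's standard library. This is a minimal sketch of the normal-approximation (Wald) interval used here; `statistics.NormalDist().inv_cdf` supplies the critical value $z_{\alpha/2}$, and the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float):
    """Wald interval: p_hat +/- z_(alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for 99% confidence
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


# Yellow candies in the class data: 314 of 1724.
p_hat, E, lower, upper = proportion_ci(314, 1724, 0.99)
print(f"p_hat = {p_hat:.3f}, E = {E:.3f}, {lower:.3f} < p < {upper:.3f}")
```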
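Construct a 95% CI estimate for the true mean number of candies per bag:
The population standard deviation is unknown, so we use the t distribution. The sample size (n = 29) is just under 30, so we assume that the number of candies per bag is approximately normally distributed.
$\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$
$\bar{x} = 59.45$, $s = 2.4$, $n = 29$, $df = 28$, $t_{\alpha/2} = 2.048$
$E = 2.048 \cdot \dfrac{2.4}{\sqrt{29}} \approx 0.913$
$\bar{x} \pm E = 59.45 \pm 0.913$
$58.537 < \mu < 60.363$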
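A minimal sketch of the same t interval. Python's standard library has no t distribution, so the critical value 2.048 (df = 28) is hard-coded from a t table, as in the hand calculation; if SciPy is available, `scipy.stats.t.ppf(0.975, 28)` returns the same value.

```python
from math import sqrt

# Summary statistics for candies per bag (class data).
x_bar, s, n = 59.45, 2.4, 29
t_crit = 2.048  # t_(0.025) with df = n - 1 = 28, read from a t table

E = t_crit * s / sqrt(n)          # margin of error
lower, upper = x_bar - E, x_bar + E
print(f"E = {E:.3f}, {lower:.3f} < mu < {upper:.3f}")
```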
Construct a 98% CI estimate for the standard deviation of the number of candies per bag:
$\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$, with $df = 28$, $\chi^2_R = 48.278$, $\chi^2_L = 13.565$
$\sqrt{\dfrac{28(2.4)^2}{48.278}} < \sigma < \sqrt{\dfrac{28(2.4)^2}{13.565}}$
$1.828 < \sigma < 3.448$
B) We are 95% confident that the true mean number of candies per bag is contained within the interval $58.537 < \mu < 60.363$. In other words, if we took many samples of 29 bags and constructed a 95% confidence interval from each one, about 95% of those intervals would contain the true mean.
C) We are 98% confident that the population standard deviation of the number of candies per bag is contained within the interval $1.828 < \sigma < 3.448$. In other words, if we took many samples of 29 bags and constructed a 98% confidence interval from each one, about 98% of those intervals would contain the true standard deviation.
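The standard-deviation interval in part C can be checked the same way. This sketch hard-codes the chi-square table values for df = 28 (48.278 and 13.565); with SciPy these would be `scipy.stats.chi2.ppf(0.99, 28)` and `scipy.stats.chi2.ppf(0.01, 28)`. It uses s = 2.4 exactly as reported, so the third decimal may differ from a hand calculation that rounds intermediate values.

```python
from math import sqrt

s, n = 2.4, 29
df = n - 1
# Chi-square critical values for df = 28 and a 98% interval (alpha = 0.02),
# read from a chi-square table: right-tail areas 0.01 and 0.99.
chi2_right = 48.278
chi2_left = 13.565

# Larger chi-square value gives the lower bound, smaller one gives the upper bound.
lower = sqrt(df * s**2 / chi2_right)
upper = sqrt(df * s**2 / chi2_left)
print(f"{lower:.3f} < sigma < {upper:.3f}")
```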
Hypothesis Tests
A hypothesis test is a statistical method that uses sample data to evaluate the credibility of a hypothesis about a population parameter.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red:
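$\alpha = 0.05$
$H_0: p = 0.20$ (original claim), $H_1: p \neq 0.20$
$\hat{p} \approx 0.212$ (proportion of red candies in the class data), $n = 1724$
$z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \dfrac{0.212 - 0.2}{\sqrt{0.2(0.8)/1724}} \approx 1.25$
$P(Z < 1.25) = 0.8944$, so the two-tailed P-value is $2(1 - 0.8944) = 0.2112$
Critical values: $z = \pm 1.96$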
Final Conclusion:
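The test statistic z = 1.25 falls between the critical values -1.96 and 1.96, so we fail to reject the null hypothesis that 20% of Skittles candies are red. There is not sufficient evidence to reject the claim. Because we failed to reject the null hypothesis, the error we might be making is a Type II error (failing to reject a null hypothesis that is actually false).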
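The red-candy test can be sketched with the standard library as well. It uses the rounded sample proportion 0.212 from the text, computes the z statistic, the two-tailed critical value and the P-value, and applies the same decision rule. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.20        # claimed proportion of red candies (H0)
p_hat = 0.212    # sample proportion of red candies, rounded as in the text
n = 1724         # total candies in the class data
alpha = 0.05

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed P-value

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

The critical-value rule and the P-value rule (reject when the P-value is below alpha) always agree for a two-tailed z test, so either one can be reported.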
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55:
Alpha = 0.01
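As a hedged sketch only, assuming a two-tailed one-sample t test with the summary statistics reported earlier (mean 59.45, standard deviation 2.4, n = 29) and the standard t-table value 2.763 for df = 28 at alpha = 0.01, the test statistic for this claim could be computed like this. The variable names are illustrative.

```python
from math import sqrt

mu0 = 55                     # claimed mean number of candies per bag (H0)
x_bar, s, n = 59.45, 2.4, 29
alpha = 0.01
t_crit = 2.763               # t_(0.005) with df = 28, from a t table (two-tailed)

# One-sample t statistic: distance of the sample mean from mu0 in standard errors.
t = (x_bar - mu0) / (s / sqrt(n))
reject = abs(t) > t_crit
print(f"t = {t:.2f}, critical = +/-{t_crit}, reject H0: {reject}")
```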
behind the statistical analysis of studies with simple but important terms such as median, range, mean and mode that are so frequently used in so many instances.
All in all, I have learned a lot throughout the entire semester, even though it was very condensed and fast-paced. I have learned how to apply these techniques in daily-life situations; most importantly, the course helped me develop my critical-thinking ability.