
Meseret Asfaw

Professor: Ping Yu
Math 1040
07-25-2016
Final Term Project
For this project, each student in our statistics class bought a 2.17-ounce bag of Skittles,
29 bags in total. We counted the candies in each bag and recorded how many candies of each
color it contained. We then determined the proportion of each color and created a pie chart
and a Pareto chart. Finally, we compared the class data with our own data, noting any
similarities or differences.
Next, we used our data to calculate the mean, standard deviation, and five-number summary
of the number of candies per bag. We drew a frequency histogram of the total number of
candies per bag as well as a box plot based on the five-number summary. We also wrote a
reflection on the differences between qualitative and quantitative data.

Organizing and Displaying Categorical Data: Colors

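Pie Chart

[Figure: pie chart of the class color proportions. Red 21%, Orange 20%, Yellow 18%, Green 21%, Purple 20%.]

Pareto Chart

[Figure: Pareto chart of the class counts. Vertical axis: Number of Candies (280 to 370). Horizontal axis: Color, ordered Red, Green, Purple, Orange, Yellow.]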

Number of candies                Red  Orange  Yellow   Green  Purple   Total
My bag (1 bag)                     9      10      13      14      10      56
The whole class (29 bags)        366     343     314     355     346    1724

The color with the most candies in my bag was green, so I expected green to be the most
common color in the class data as well. However, the most common color in the class data was
red, which did not match my expectation. The least common color in my bag was red, while the
least common color in the class data was yellow, so my expectation that red would also be the
least common color overall was wrong as well.
The proportions of candies in my bag were 0.1607 red, 0.1786 orange, 0.2321 yellow, 0.25
green, and 0.1786 purple. In comparison, the class proportions were 0.21 red, 0.20 orange,
0.18 yellow, 0.21 green, and 0.20 purple. I noticed that the proportions of orange and purple
candies were equal in my bag, and that they were also nearly equal in the class data.
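The proportions above can be rechecked with a short script. This is only a minimal sketch: the counts are the ones from the tables in this report, while the dictionary and function names are made up for illustration.

```python
# Counts per color, taken from the tables in this report.
my_bag = {"Red": 9, "Orange": 10, "Yellow": 13, "Green": 14, "Purple": 10}
class_total = {"Red": 366, "Orange": 343, "Yellow": 314, "Green": 355, "Purple": 346}


def proportions(counts):
    """Return each color's share of the total, rounded to 4 decimal places."""
    total = sum(counts.values())
    return {color: round(n / total, 4) for color, n in counts.items()}


my_props = proportions(my_bag)
class_props = proportions(class_total)

# Print the two sets of proportions side by side for comparison.
for color in my_bag:
    print(f"{color:<7} my bag: {my_props[color]:.4f}   class: {class_props[color]:.4f}")
```

Comparing the two columns makes it easy to see where a single bag departs from the class as a whole.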
Organizing and Displaying Quantitative Data: The Number of Candies per Bag
Mean: 59.45
Standard Deviation: 2.4

Minimum: 53
Q1: 58
Median: 60
Q3: 61
Max: 64

Candies per bag   Frequency
52-53                     1
54-55                     1
56-57                     1
58-59                    10
60-61                    10
62-63                     5
64-65                     1
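As a rough stand-in for the histogram, the frequency table can be drawn as a text chart. This is a sketch under one stated assumption: the count for the 64-65 class is inferred from the total of 29 bags and the maximum of 64, since it is not legible in the table itself. The function name is hypothetical.

```python
# Class intervals and frequencies from the frequency table.
# ASSUMPTION: the 64-65 count (1) is inferred from n = 29 bags and a maximum of 64.
freq = {
    "52-53": 1,
    "54-55": 1,
    "56-57": 1,
    "58-59": 10,
    "60-61": 10,
    "62-63": 5,
    "64-65": 1,
}


def text_histogram(table):
    """Build one line per class: the label, a bar of '#' marks, and the count."""
    return [f"{label} | {'#' * count} ({count})" for label, count in table.items()]


lines = text_histogram(freq)
print("\n".join(lines))
print("Total bags:", sum(freq.values()))
```

The long bars at 58-59 and 60-61 and the thin tail toward 52-53 show the same shape that the histogram and box plot describe.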

[Figure: Histogram of Total Numbers of Candies. Horizontal axis: candies per bag, in classes 52-53 through 64-65. Vertical axis: frequency.]

[Figure: box plot of the Five Number Summary.]

Reflection:
From the histogram and the box plot, the distribution does not appear symmetric like a bell
curve; it is negatively skewed (skewed to the left). The five-number summary supports this: the
mean of 59.45 is lower than the median of 60, which is consistent with a left-skewed distribution.
The standard deviation is about 2.4 candies, which is small relative to the mean. This suggests
that the manufacturing process for Skittles is well controlled and keeps the variation in the
number of candies per bag to a minimum.

Reflection:
Quantitative data consist of numerical measurements or counts. Some examples of quantitative
data are height, distance, and weight. Categorical data consist of attributes, labels, or
non-numerical categories. Some examples of categorical data are color, gender, and brand of
product. Because quantitative data are numerical, we can perform meaningful calculations on
them. This is not possible for categorical data, which cannot be measured, although they can be
counted after the data are classified into categories.
For quantitative data it makes sense to calculate measures of center such as the mean, median,
and mode. It is also possible to examine the spread of the data by calculating the standard
deviation, variance, range, quartiles, and interquartile range. We can plot quantitative data
with histograms, box plots, dot plots, and stem-and-leaf plots. These give us a visual
assessment of how the data are spread around the center.
For categorical data, the most common graphs are pie charts and bar charts (including Pareto
charts). There is no meaningful mean and no measure of spread such as the standard deviation or
variance for categorical data, because the data are not numerical and such calculations are not
feasible. Categorical data consist of categories that can be counted and summarized in a
frequency table. If the number of categories is small, the data can be visually assessed with
either a pie chart or a bar chart. If the number of categories is very large, a bar chart is
preferable because a pie chart may not represent the categories clearly.

Confidence Interval Estimates:


Purpose and Meaning of a Confidence Interval
The purpose of taking a random sample from a lot or population and computing a
statistic, such as the mean from the data, is to approximate the mean of the population. How well
the sample statistic estimates the underlying population value is always an issue. A confidence
interval addresses this issue because it provides a range of values which is likely to contain the
population parameter of interest.

Construct a 99 % CI estimate for the true proportion of yellow candies:

99% confidence interval for the proportion of yellow candies:

$$\hat{p} - E < p < \hat{p} + E, \qquad E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$\hat{p} = \frac{314}{1724} = 0.182 \quad \text{(yellow candies)}$$

$$E = 2.575\sqrt{\frac{0.182(1-0.182)}{1724}} = 0.024$$

$$0.182 - 0.024 < p < 0.182 + 0.024$$

$$0.158 < p < 0.206$$
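The arithmetic for this interval can be rechecked with a few lines of code. This is a minimal sketch: the counts and the critical value 2.575 come from the calculation above, and the variable names are only illustrative.

```python
import math

x, n = 314, 1724      # yellow candies and total candies in the class data
z_crit = 2.575        # table value of z for a 99% confidence level, as used above

p_hat = x / n
# Margin of error for a proportion: z * sqrt(p_hat * (1 - p_hat) / n)
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

Note that the margin of error comes out to roughly 0.024, which is what makes the interval run from about 0.158 to 0.206.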

Construct a 95% CI estimate for the true mean number of candies per bag:
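Since the sample size (n = 29) is close to 30 and the population standard deviation is unknown,
we assume the population is approximately normal and use the t distribution.

$$\bar{x} - E < \mu < \bar{x} + E, \qquad E = t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$$s = 2.4, \quad \bar{x} = 59.45, \quad n = 29, \quad df = 28, \quad t_{\alpha/2} = 2.048$$

$$E = 2.048 \cdot \frac{2.4}{\sqrt{29}} = 0.913$$

$$59.45 - 0.913 < \mu < 59.45 + 0.913$$

$$58.537 < \mu < 60.363$$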
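The same interval can be reproduced in code. This is a sketch using the summary statistics from this report; the critical value 2.048 is the table value for 28 degrees of freedom quoted above rather than something the script computes.

```python
import math

x_bar, s, n = 59.45, 2.4, 29   # sample mean, sample standard deviation, number of bags
t_crit = 2.048                 # table value of t for 95% confidence with df = 28

# Margin of error for a mean when sigma is unknown: t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"E = {margin:.3f}")
print(f"{lower:.3f} < mu < {upper:.3f}")
```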

Construct a 98% CI estimate for the standard deviation of the number of candies per bag

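$$\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$$

With n - 1 = 28, s = 2.4, and chi-square critical values of 48.278 (right) and 13.565 (left):

$$\sqrt{\frac{28(2.4)^2}{48.278}} < \sigma < \sqrt{\frac{28(2.4)^2}{13.565}}$$

$$1.828 < \sigma < 3.448$$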
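The bounds for the standard deviation can be rechecked as follows. This is a sketch: the two chi-square critical values are the table values used in the calculation above (they are typed in, not computed), and the variable names are illustrative.

```python
import math

s, n = 2.4, 29
df = n - 1
chi2_right = 48.278   # table value: chi-square with df = 28, right-tail area 0.01
chi2_left = 13.565    # table value: chi-square with df = 28, right-tail area 0.99

# The larger critical value gives the lower bound and the smaller gives the upper bound.
lower = math.sqrt(df * s**2 / chi2_right)
upper = math.sqrt(df * s**2 / chi2_left)

print(f"{lower:.3f} < sigma < {upper:.3f}")
```

With s rounded to 2.4, the lower bound computes to about 1.83 and the upper bound to about 3.45.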

Discuss and interpret


A) We are 99% confident that the true proportion of yellow candies lies within
0.158 < p < 0.206. In other words, if we took many samples of Skittles and built an
interval from each one, about 99 out of every 100 intervals would contain the true
proportion and about 1 would not.

B) We are 95% confident that the true mean number of candies per bag lies within
58.537 < μ < 60.363. In other words, if we took many samples of bags and built an interval
from each one, about 95 out of every 100 intervals would contain the true mean and about 5
would not.

C) We are 98% confident that the population standard deviation of the number of candies
per bag lies within 1.828 < σ < 3.448. In other words, if we took many samples of bags and
built an interval from each one, about 98 out of every 100 intervals would contain the true
standard deviation and about 2 would not.
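One way to see what a confidence level means is to simulate repeated sampling and count how often the interval captures the true value. The sketch below is illustration only: the population mean and standard deviation are hypothetical values chosen for the simulation, not measured facts about Skittles, and the population is assumed to be normal.

```python
import math
import random
import statistics

random.seed(1040)                # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 59.5, 2.4   # HYPOTHETICAL population values, for illustration only
N, T_CRIT, TRIALS = 29, 2.048, 2000   # bags per sample, t for 95% with df = 28, repetitions

covered = 0
for _ in range(TRIALS):
    # Draw one sample of N bag counts from the assumed normal population.
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = statistics.mean(sample)
    margin = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    # Count the sample if its 95% interval contains the true mean.
    if x_bar - margin < TRUE_MEAN < x_bar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.3f} of the intervals contain the true mean")
```

The fraction printed should land near 0.95. The confidence level describes how often the interval-building procedure succeeds over many samples, not the share of candies or samples that fall inside one particular interval.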

Hypothesis Tests
A hypothesis test is a statistical method that uses sample data to evaluate the credibility
of a hypothesis about a population parameter.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red:
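$$\alpha = 0.05$$

$$H_0: p = 0.2 \quad \text{(original claim)}$$

$$H_1: p \neq 0.2$$

$$\hat{p} = \frac{366}{1724} = 0.212 \quad \text{(red candies)}$$

$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.212 - 0.2}{\sqrt{\frac{0.2(0.8)}{1724}}} = 1.25$$

Area to the left of z = 1.25: 0.8944

Critical values: z = ±1.96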

Final Conclusion:
The z test statistic of 1.25 is less than the critical value of 1.96, so we fail to reject the
null hypothesis that the proportion of red candies is 20%. In making this conclusion we might
be making a Type II error.
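The test for the proportion of red candies can be sketched in code as well. The counts, claimed proportion, and significance level are from this report. The sample proportion is rounded to three decimals to mirror the hand calculation, and the two-tailed P-value is an extra output of the sketch, not a figure from the report.

```python
import math
from statistics import NormalDist

x, n = 366, 1724          # red candies and total candies in the class data
p0, alpha = 0.20, 0.05    # claimed proportion and significance level

p_hat = round(x / n, 3)   # rounded to 0.212, as in the hand calculation
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()                      # standard normal distribution
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # two-tailed P-value

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, P-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Because the test statistic falls between the two critical values, the sketch reaches the same decision as the conclusion above.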

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55:

$$\alpha = 0.01$$

$$H_0: \mu = 55 \quad \text{(original claim)}$$

$$H_1: \mu \neq 55$$

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{59.45 - 55}{2.4/\sqrt{29}} = 9.98$$

Critical values (df = 28): t = ±2.763

Final Conclusion:
The test statistic of 9.98 is greater than the critical value of 2.763, so we reject the null
hypothesis that the mean number of candies in a bag is 55; the evidence against it is strong.
In making this conclusion we might be making a Type I error.
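The test for the mean can be sketched the same way, using the sample mean of 59.45 and standard deviation of 2.4 from the summary statistics. The critical value 2.763 is the table value for 28 degrees of freedom quoted above. Small differences in the test statistic can arise depending on how the mean is rounded, but the decision is not affected.

```python
import math

x_bar, s, n = 59.45, 2.4, 29   # sample mean, sample standard deviation, number of bags
mu0 = 55                       # claimed mean number of candies per bag
t_crit = 2.763                 # table value of t for alpha = 0.01 (two-tailed), df = 28

# Test statistic for a mean when sigma is unknown
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, critical value = {t_crit}")
print("Reject H0" if reject else "Fail to reject H0")
```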
Conditions for interval estimates and hypothesis tests for population proportion:
In this case, the normal distribution is used as an approximation to the binomial
distribution, since the sample proportion is treated as a binomial proportion. The conditions
to check before using this approximation are np ≥ 5 and nq ≥ 5, where n is the sample size,
p is the proportion, and q = 1 - p. The sample should also be a simple random sample.
For our samples, the sampling can be treated as random, since each student purchased a bag of
Skittles independently from a different location.
Here n = 1724 and p = 0.2, so np ≈ 345 and nq ≈ 1379. Both are greater than 5, so the
conditions were met for the estimation and testing.
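The check on np and nq is simple enough to script. This is a small sketch using the values from this section; the variable names are illustrative.

```python
n, p = 1724, 0.20   # total candies and the claimed proportion
q = 1 - p

np_value = n * p    # expected number of successes
nq_value = n * q    # expected number of failures

# The normal approximation is considered reasonable when both are at least 5.
conditions_met = np_value >= 5 and nq_value >= 5
print(f"np = {np_value:.1f}, nq = {nq_value:.1f}, conditions met: {conditions_met}")
```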
Conditions for interval estimates and hypothesis tests for population mean:
The conditions for this case are:
The sample must be a simple random sample.
The population is normally distributed or the sample size is n > 30.
The population standard deviation is unknown, so we use the t distribution for both the
interval estimate and the hypothesis test.
Our sample size is 29, which is just below 30, so we treated the requirement as approximately
met and used the t distribution.

Conditions for interval estimates for population standard deviation:


The conditions for this case are:
The sample should be a simple random sample.
The population must be normally distributed.
For our samples, we do not have information that the population is normally distributed even
though our sample size is 29. So, this requirement may not have been met for our samples.
Possible Errors:
The possible errors which could have been made are:
There can be measurement or counting errors. Some students may have counted incorrectly or
included broken candies in the data. A Type I or Type II error in the hypothesis tests is
also possible.

Skittle project reflection


When I first started the Skittles project, I was intimidated by the process of using
statistical concepts to interpret real-life data. I had a hard time working on it because of how
difficult it seemed at first, and I was worried about how much work it was going to require.
We needed to collect a random sample of data, organize that data, create graphs and charts, and
interpret what the information means.
Throughout the semester we were taught a great variety of statistical concepts that gave
us the necessary tools to complete this project. Little by little I realized that I was not
only capable of understanding the instructions, but also able to perform the correct sequence
of steps for each of the exercises in this project. As in any other discipline or class, after
learning the theory, practice makes all the difference.
This project allowed us to put into practice key principles studied throughout the term,
from using a sampling method to performing hypothesis tests. The most challenging aspects
of the project were really understanding each concept and how it applied to the population of
Skittles, not just to our sample.
This project has helped me better understand statistics and how it works in the real
world. It turns out that statistics is used in the real world and plays an important part in
our everyday lives, providing information that is beneficial to society.
This project and the class in general have given me the tools to distinguish valid
professional papers from questionable sources of information by examining things like graphs,
confidence levels, and confidence intervals, which can make all the difference when judging
whether a study is well done. I am also able to understand the language behind the statistical
analysis of studies, including simple but important terms such as median, range, mean, and
mode that are used so frequently.
All in all, I have learned a lot throughout the semester, even though it was condensed and
fast paced. I have learned how to apply these techniques in daily-life situations and, most
importantly, the class helped me develop my critical thinking.
