
Jennifer Sorto & Nicole S.

Aurora Jensen
Skittles Project
October 4, 2015

In this project we combined the Skittles bags collected by each student and counted the number of candies of each color in every bag. We started by recording how many candies of each color were in each student's bag. Next, we analyzed the data to compare what each student found. Finally, we draw conclusions from the data using confidence intervals and hypothesis tests.
Organizing and Displaying Categorical Data: Colors

After completing this part of the project, we noticed that there are more orange Skittles than any other color and that yellow is the least common color. Even so, the counts for the different colors are all fairly close to one another, and no color has dramatically more or fewer candies than the rest. According to our individual data, there is also more orange and less yellow in our Skittles than any other color. This matches the data collected from the entire class.
Jennifer's Data Collection
Red: 14
Orange: 17
Yellow: 8
Green: 10
Purple: 11

Nicole's Data Collection
Red: 8
Orange: 11
Yellow: 13
Green: 15
Purple: 9

Class Data Collection
Red: 282
Orange: 339
Yellow: 276
Green: 305
Purple: 294
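As a quick check on the observations about orange and yellow, the short Python sketch below totals the class counts listed above, converts each one to a proportion, and picks out the most and least common colors. The dictionary and variable names are our own choices for illustration.

```python
# Class color counts from the Class Data Collection above.
class_counts = {
    "Red": 282,
    "Orange": 339,
    "Yellow": 276,
    "Green": 305,
    "Purple": 294,
}

# Total number of candies counted by the whole class.
total = sum(class_counts.values())

# Proportion of each color, rounded to three decimals for display.
proportions = {color: round(count / total, 3) for color, count in class_counts.items()}

# Colors with the largest and smallest counts.
most_common = max(class_counts, key=class_counts.get)
least_common = min(class_counts, key=class_counts.get)

print("Total candies:", total)
for color, share in proportions.items():
    print(f"{color}: {share}")
print("Most common:", most_common, "| Least common:", least_common)
```

The proportions range only from about 0.18 to 0.23, which is consistent with the observation that no color stands far apart from the others.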

Organizing and Displaying Quantitative Data: The Number of Candies per Bag
Summary statistics:
Column: Class
Mean: 59.8
Std. dev.: 2.90
Median: 60
Min: 53
Max: 68
Q1: 59
Q3: 61

My observation of these data is that they show the distribution among the different colors in each Skittles bag, as well as the mean, standard deviation, and 5-number summary of the number of candies per bag. The distribution appears to be bell-shaped: the counts start low, increase, and then drop off again. The graphs show what I expected to see, since they somewhat relate to my own individual data set. The overall data collected from the class also relate to my own single bag, which had a high number of orange candies and a low number of yellow candies. My bag contained a total of 60 candies, and there were 25 bags of candies in the sample.
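The mean and standard deviation in the summary statistics can be reproduced from the class totals. This Python sketch assumes 25 bags, which is what the 24 degrees of freedom (DF = n - 1) in the later confidence-interval output imply, and it uses the sample variance of 8.39 reported in that same output. The standard error computed here is the one that appears in the interval for the mean and in the t test.

```python
import math

# Assumed inputs: 1496 candies in total, 25 bags (DF = 24 in the later
# output), and the sample variance of 8.39 from the 98% interval output.
total_candies = 1496
n_bags = 25
sample_variance = 8.39

mean = total_candies / n_bags          # sample mean, x-bar
std_dev = math.sqrt(sample_variance)   # sample standard deviation, s
std_err = std_dev / math.sqrt(n_bags)  # standard error of the mean, s / sqrt(n)

print(f"mean = {mean:.2f}")
print(f"std. dev. = {std_dev:.2f}")
print(f"std. err. = {std_err:.8f}")
```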
Reflection
Qualitative (categorical) data describe the nature or qualities of the subject in words; they can be broken down into categories so that every observation in the data set fits into exactly one category, such as the color of a Skittle. Quantitative data, by contrast, are numbers that are measured or counted, such as the number of candies per bag. Bar graphs and pie charts make the most sense for categorical data because the bars or slices are labeled with the categories, and the distribution of the variable gives either the count or the percent of individuals falling into each category. Histograms and box plots do not make sense for categorical data because they are designed for quantitative data. For the same reason, histograms and box plots make the most sense for quantitative data: they display the shape of a distribution and organize the numbers so that they are easier to read. Bar graphs and pie charts don't make sense for quantitative data because, as mentioned earlier, they are meant for categorical data.
November 23, 2015
Confidence Interval Estimates
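The purpose of a confidence interval is to give a range of plausible values for an unknown population parameter, along with a confidence level that describes the uncertainty that comes from working with a sample.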
95% confidence interval results:
p : Proportion of successes
Method: Standard-Wald
Proportion: p
Count: 294
Total: 1496
Sample Prop.: 0.19652406
Std. Err.: 0.010273739
L. Limit: 0.17638791
U. Limit: 0.21666022

99% confidence interval results:
Outcomes in : Candies per Bag
Success : 68
p : Proportion of successes
Method: Standard-Wald
Variable, Count, Total, Sample Prop., Std. Err., L. Limit, U. Limit

98% confidence interval results:
σ² : Variance of variable
Variable: Candies per Bag
Sample Var.: 8.39
DF: 24
L. Limit: 4.6849894
U. Limit: 18.547651

We are 95% confident that the true proportion of purple Skittles is between 0.176 and 0.217, based on 294 purple candies in a sample of 1496 Skittles.
At the 99% confidence level, we estimate the true mean number of candies per bag, for which the sample mean is 59.84.
We are 98% confident that the true variance of the number of candies per bag, which measures how much the counts deviate from the mean, is between 4.685 and 18.548.
Hypothesis Tests
The general purpose of a hypothesis test is to use sample data to decide whether there is enough evidence to reject a claim (the null hypothesis) about a population parameter.
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.5
HA : p ≠ 0.5
Proportion: p
Count: 305
Total: 1469
Sample Prop.: 0.20762423
Std. Err.: 0.013045451
Z-Stat: -22.412085
P-value: <0.0001

Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable: Candies per Bag
Sample Mean: 59.84
Std. Err.: 0.57930993
DF: 24
T-Stat: 6.6285761
P-value: <0.0001

We used a 0.01 significance level to test the claim that 20% of all Skittles candies are green.
We used a 0.05 significance level to test the claim that the mean number of candies per bag is 56.
Reflection
The conditions for constructing interval estimates and conducting hypothesis tests for population proportions are, first, that the sample is a simple random sample and, second, that the sample is sufficiently large, with at least 10 successes and 10 failures. Our samples have met these conditions.
The conditions for interval estimates and hypothesis tests for population means are that the data come from a random sample and that the variable is normally distributed in the population. Our samples do meet these conditions.
The conditions for interval estimates of a standard deviation are that the characteristic being measured is numerical, the sample is a simple random sample, and the population is normally distributed. Our samples do meet these conditions.
Possible errors in using these data include errors in calculation and confusion in interpreting the data. The sampling method could have been improved by having a more organized data set that included all the numbers needed for the calculations. The conclusion we have drawn from our research is that the amount of each Skittles color differs noticeably from bag to bag in our data set.

Confidence Interval Estimates


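The purpose of a confidence interval is to give a range of plausible values for an unknown population parameter, along with a confidence level that describes the uncertainty that comes from working with a sample.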
95% confidence interval results:
p : Proportion of successes
Method: Standard-Wald
Proportion: p
Count: 294
Total: 1496
Sample Prop.: 0.19652406
Std. Err.: 0.010273739
L. Limit: 0.17638791
U. Limit: 0.21666022

95% confidence interval estimate for the true proportion of purple candies:
0.176 < p < 0.217
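The Standard-Wald interval in the output above is p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n). A minimal Python sketch of that formula, using only the standard library's `statistics.NormalDist` for the critical value, is shown below; the function name `wald_interval` is our own.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence):
    """Standard-Wald confidence interval for a population proportion."""
    p_hat = successes / n                                # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value z_(alpha/2)
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)         # estimated standard error
    margin = z * std_err                                 # margin of error
    return p_hat - margin, p_hat + margin

# 294 purple candies out of 1496 Skittles, 95% confidence.
low, high = wald_interval(294, 1496, 0.95)
print(f"{low:.3f} < p < {high:.3f}")
```

The limits agree with the L. Limit and U. Limit values in the output table (0.17638791 and 0.21666022).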

99% confidence interval results:
Outcomes in : Candies per Bag
Success : 68
p : Proportion of successes
Method: Standard-Wald
Variable: Candies per Bag
Count: 1
Total: 25
Sample Prop.: 0.04
Std. Err.: 0.039191836
L. Limit: -0.060951479
U. Limit: 0.14095148

99% confidence interval estimate for the true mean number of candies per bag:
58.35 < μ < 61.33
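The limits above match a normal (z) critical value of about 2.576 applied to the sample mean and standard error from the output. Because the population standard deviation is unknown, a t critical value with 24 degrees of freedom is the usual textbook choice and gives a slightly wider interval. The Python sketch below computes both; the t value 2.797 is taken from a standard t table rather than computed, since Python's standard library has no t distribution.

```python
from statistics import NormalDist

mean = 59.84           # sample mean number of candies per bag
std_err = 0.57930993   # standard error of the mean from the output
confidence = 0.99

# z-based interval (matches the limits reported above).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576
low, high = mean - z * std_err, mean + z * std_err

# t-based interval; 2.797 is t_(0.005) with 24 degrees of freedom from a t table.
t_crit = 2.797
t_low, t_high = mean - t_crit * std_err, mean + t_crit * std_err

print(f"z interval: {low:.2f} < mu < {high:.2f}")
print(f"t interval: {t_low:.2f} < mu < {t_high:.2f}")
```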

98% confidence interval results:
σ² : Variance of variable
Variable: Candies per Bag
Sample Var.: 8.39
DF: 24
L. Limit: 4.6849894
U. Limit: 18.547651

98% confidence interval estimate for the standard deviation of the number of candies per bag (the square roots of the variance limits above):
2.164 < σ < 4.307
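The interval for σ comes from the chi-square interval for the variance, (n - 1)s²/χ²_R < σ² < (n - 1)s²/χ²_L, followed by square roots. This Python sketch reproduces the variance limits in the output using chi-square critical values for 24 degrees of freedom (10.856 and 42.980, each cutting off 0.01 in a tail) that we looked up in a standard chi-square table, since Python's standard library has no chi-square distribution.

```python
import math

n = 25                  # number of bags (DF = n - 1 = 24)
sample_variance = 8.39  # sample variance s^2 of candies per bag

# Chi-square critical values for 24 degrees of freedom at 98% confidence,
# from a standard chi-square table (area 0.01 in each tail).
chi2_left = 10.856
chi2_right = 42.980

# Confidence interval for the variance sigma^2.
var_low = (n - 1) * sample_variance / chi2_right
var_high = (n - 1) * sample_variance / chi2_left

# Taking square roots gives the interval for the standard deviation sigma.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"{var_low:.4f} < sigma^2 < {var_high:.4f}")
print(f"{sd_low:.3f} < sigma < {sd_high:.3f}")
```

The resulting interval for σ contains the sample standard deviation of 2.90 reported in the summary statistics.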

We are 95% confident that the true proportion of purple Skittles is between 0.176 and 0.217, based on 294 purple candies in a sample of 1496 Skittles.
We are 99% confident that the true mean number of candies per bag is between 58.35 and 61.33; the sample mean is 59.84.
We are 98% confident that the true standard deviation of the number of candies per bag, which measures how much the counts deviate from the mean, is between 2.164 and 4.307.
Hypothesis Tests
The general purpose of a hypothesis test is to use sample data to decide whether there is enough evidence to reject a claim (the null hypothesis) about a population parameter.
Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2
Proportion: p
Count: 305
Total: 1469
Sample Prop.: 0.20762423
Std. Err.: 0.010436361
Z-Stat: 0.73054527
P-value: 0.4651

In this sample we are using a 0.01 significance level to test the claim that 20% of Skittles candies are green. In the P-value method, if the P-value is greater than the significance level, we fail to reject the null hypothesis (H0).
This sample has a P-value of 0.4651 and a significance level of 0.01. Since the P-value is greater than the significance level, we fail to reject the null hypothesis p = 0.2: there is not sufficient evidence to reject the claim that 20% of Skittles candies are green.
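The P-value method described above can be sketched in Python for this one-proportion z test. The sketch uses the same counts as the output table (305 green candies out of a total of 1469) and the standard library's `statistics.NormalDist`; the P-value is two-tailed to match the alternative HA: p ≠ 0.2.

```python
import math
from statistics import NormalDist

successes, n = 305, 1469  # green candies and total entered in the output above
p0 = 0.2                  # claimed proportion under H0
alpha = 0.01              # significance level

p_hat = successes / n
std_err = math.sqrt(p0 * (1 - p0) / n)        # the test uses p0, not p_hat
z = (p_hat - p0) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed because HA: p != 0.2

# P-value method: reject H0 only when the P-value is at most alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.4f}, P-value = {p_value:.4f}: {decision}")
```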
Hypothesis test results:
μ : Mean of variable
H0 : μ = 56
HA : μ ≠ 56
Variable: Candies per Bag
Sample Mean: 59.84
Std. Err.: 0.57930993
DF: 24
T-Stat: 6.6285761
P-value: <0.0001

In this sample we are using a 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56.
The P-value in this sample (<0.0001) is less than the significance level of 0.05, so we reject the null hypothesis. There is sufficient evidence to reject the original claim that the mean number of candies in a Skittles bag is 56.
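The t statistic in the output is (x̄ - μ0)/(s/√n). Because Python's standard library has no t distribution, this sketch applies the equivalent critical-value method instead of computing the P-value: it compares |t| with t_(0.025) = 2.064 for 24 degrees of freedom, a value we took from a standard t table. Both methods lead to the same decision at the 0.05 level.

```python
import math

mean = 59.84         # sample mean number of candies per bag
mu0 = 56             # claimed mean under H0
s = math.sqrt(8.39)  # sample standard deviation
n = 25               # number of bags (DF = 24)

std_err = s / math.sqrt(n)
t_stat = (mean - mu0) / std_err

# Two-tailed critical value t_(0.025) with 24 degrees of freedom, from a t table.
t_crit = 2.064

# Critical-value method: reject H0 when |t| falls beyond the critical value.
decision = "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"
print(f"t = {t_stat:.4f}, critical value = {t_crit}: {decision}")
```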

The conditions for constructing interval estimates and conducting hypothesis tests for population proportions are that the sample is a simple random sample and that it is sufficiently large, with at least 10 successes and 10 failures. The samples included in this project meet these conditions because they were collected by each member of our class.
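The "at least 10 successes and 10 failures" requirement is simple to check directly. The helper `proportion_conditions_met` is a hypothetical name chosen for this Python sketch, and it covers only the sample-size part of the conditions, not whether the sample is random. It is applied here to the purple and green counts used in the proportion procedures.

```python
def proportion_conditions_met(successes, n, minimum=10):
    """Return True when a sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Counts used in the proportion procedures above.
purple_ok = proportion_conditions_met(294, 1496)  # purple candies, 95% interval
green_ok = proportion_conditions_met(305, 1469)   # green candies, hypothesis test
print(purple_ok, green_ok)
```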
The conditions for interval estimates and hypothesis tests for population means are that the data come from a random sample (here, the bags collected by our classmates) and that the variable is normally distributed in the population. Our samples do meet these conditions.
The conditions for interval estimates of a standard deviation are that the sample is a simple random sample and that the population is normally distributed. This normality requirement is stricter than it is for means, because the chi-square method used for standard deviations is sensitive to departures from normality. The bell shape of our distribution of candies per bag suggests that this requirement is reasonable here.

Possible errors in using these data include errors in calculation and confusion in interpreting the data. The sampling method could have been improved by providing the data in a more useful form; I had to reorganize the data in several different ways to make them work in the equations, which made the project take longer than it otherwise would have. The conclusion we have drawn from our research is that the amount of each Skittles color differs noticeably from bag to bag in our data set.

Statistics 1040
Eportfolio Reflection
December 8, 2015

In completing this project I am proud of what I have accomplished. Not only have I gained a better understanding of real-world math applications, but this class has also helped me develop my problem-solving and critical-thinking skills, all of which I will be able to use in other classes. In this project I have incorporated much of what I learned this semester in my Statistics 1040 class. I now have a better understanding of what a chart is, how to create different kinds of charts, what kinds of charts there are, and how to interpret them. I have also improved my skills at collecting data by using various sampling methods and applying them according to what was asked of me throughout this project. Before starting statistics I had no idea what a confidence interval was or how to conduct a hypothesis test; this class has given me the tools and the instruction to understand this knowledge and use it in classes outside of statistics. For example, I have already been able to apply what I learned in my statistics class to help my younger sibling complete a report for his English class. His report is about why high school students drop out of school and do not graduate. We were able to use a sample we conducted on why students drop out, and we supported his paper with charts and facts gathered using my Statistics 1040 knowledge. The result was a well-supported, scientifically sound paper that made the project a success. I am convinced I will have many more such opportunities throughout my college career and into my future career as a registered nurse.
