
Math 1040 Term Project-Skittles

Introduction:
This term project demonstrates many of the concepts I have learned
over the course of the semester. These include organizing and
representing data in graphs such as pie charts and histograms, and
using confidence intervals and hypothesis testing to draw conclusions
about the data collected. I collected my own data from a 2.17-ounce
bag of Skittles, as did 20 other students in my class. Below is my
individual data:

Individual Data Collection:

Number of RED candies:     11
Number of ORANGE candies:  13
Number of YELLOW candies:  10
Number of GREEN candies:   12
Number of PURPLE candies:  10

Note: I only included whole candies in my data collection, and
disregarded the candies that were broken.

Below is the entire data for the class:

RED   ORANGE   YELLOW   GREEN   PURPLE
18    15       11       9       7
7     8        13       19      16
10    8        18       11      12
12    15       8        12      11
9     16       17       12      5
9     12       21       6       9
12    14       6        13      14
11    13       10       12      10
15    17       8        8       15
10    15       14       10      12
10    13       10       8       17
18    5        19       11      7
7     13       10       18      13
8     10       5        10      16
8     15       12       12      11
16    8        11       19      8
13    13       8        18      8
10    10       12       17      15
7     22       11       9       9
14    16       10       15      7
15    19       4        9       11

The pie charts show that the colors occur with similar frequencies.
In a few bags a color appeared less often than expected, and in others
a color appeared more often than its most common count. Overall,
though, most bags held roughly equal numbers of each color. My own
data differs slightly from the rest of the class, because some of the
colors in my bag occurred less frequently than the others.
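
The share of each color behind a pie chart like this can be worked out directly from the class data. The following is only a minimal sketch, not the software used for the project: the names `class_counts`, `totals`, and `color_shares` are made up for illustration, and the lists assume each color's counts are read down its own column of the class data.

```python
# Class counts per bag (21 bags), one list per color, read from the class data.
class_counts = {
    "red":    [18, 7, 10, 12, 9, 9, 12, 11, 15, 10, 10, 18, 7, 8, 8, 16, 13, 10, 7, 14, 15],
    "orange": [15, 8, 8, 15, 16, 12, 14, 13, 17, 15, 13, 5, 13, 10, 15, 8, 13, 10, 22, 16, 19],
    "yellow": [11, 13, 18, 8, 17, 21, 6, 10, 8, 14, 10, 19, 10, 5, 12, 11, 8, 12, 11, 10, 4],
    "green":  [9, 19, 11, 12, 12, 6, 13, 12, 8, 10, 8, 11, 18, 10, 12, 19, 18, 17, 9, 15, 9],
    "purple": [7, 16, 12, 11, 5, 9, 14, 10, 15, 12, 17, 7, 13, 16, 11, 8, 8, 15, 9, 7, 11],
}

# Total candies of each color across all bags.
totals = {color: sum(counts) for color, counts in class_counts.items()}
grand_total = sum(totals.values())

# Share of each color, i.e. the relative size of its pie slice.
color_shares = {color: total / grand_total for color, total in totals.items()}

for color, share in color_shares.items():
    print(f"{color:<7} {totals[color]:>4} candies  {share:.1%}")
```

If the bags really held equal numbers of each color, every share would be close to 20%.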

Summary statistics:

Column          Mean     Std. dev.   Min   Max   Q1   Q3   Median
RedCandies      11.38    3.49        7     18    9    14   10
OrangeCandies   13.234   3.94        6     22    10   15   13
YellowCandies   11.38    4.42        4     21    8    13   11
GreenCandies    12.29    3.93        6     19    9    15   12
PurpleCandies   11.09    3.49        5     17    8    14   11

5 Number Summaries:
Red Candies: MIN: 7; Q1: 9; MEDIAN: 10; Q3: 14; MAX: 18
Orange Candies: MIN: 6; Q1: 10; MEDIAN: 13; Q3: 15; MAX: 22
Yellow Candies: MIN: 4; Q1: 8; MEDIAN: 11; Q3: 13; MAX: 21
Green Candies: MIN: 6; Q1: 9; MEDIAN: 12; Q3: 15; MAX: 19
Purple Candies: MIN: 5; Q1: 8; MEDIAN: 11; Q3: 14; MAX: 17
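
As a rough check on the summary statistics, the values for one color can be recomputed with Python's standard `statistics` module. This is a sketch under assumptions, not the tool used for the project: the `red` list assumes the red counts are read down the red column of the class data, and the "inclusive" quartile method is just one convention, so other software may report slightly different quartiles.

```python
import statistics

# Red counts for the 21 bags, read from the class data.
red = [18, 7, 10, 12, 9, 9, 12, 11, 15, 10, 10, 18, 7, 8, 8, 16, 13, 10, 7, 14, 15]

mean = statistics.mean(red)
std_dev = statistics.stdev(red)  # sample standard deviation (divides by n - 1)

# Quartiles with the "inclusive" method; this choice is an assumption.
q1, median, q3 = statistics.quantiles(red, n=4, method="inclusive")

five_number_summary = (min(red), q1, median, q3, max(red))
print(f"mean = {mean:.2f}, std dev = {std_dev:.2f}")
print("five-number summary:", five_number_summary)
```

The same steps apply to the other four colors by swapping in their columns.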

The distribution varies from color to color, but overall each one is
approximately normal. In some bags the count for a color fell below
the mean for that color, and in others it was above the mean. I was
expecting the distributions to be less normal, because my own
individual data did not suggest that the compiled class data would be
normal.

Categorical data places each observation in a category instead of
measuring it with a number. Grouping Skittles by color is an example
of categorical data, since each color can be considered a category.
Quantitative data consists of numbers, or values, such as the number
of candies of each color in a bag. When the graphs above showed how
frequently each color occurred, those counts were quantitative data.
It would not make sense to treat the colors themselves as
quantitative, as they don't have a numerical value attached to them.

Confidence Interval Estimates

99% confidence interval for the true proportion of yellow candies:
Margin of error, E = 0.2807269
99% Confidence Interval (using normal approx):
0.1954636 < p < 0.7569174

95% confidence interval estimate for the true mean number of candies
per bag:
Margin of error, E = 0.0942987
95% Confidence Interval (using normal approx):
0.0723679 < p < 0.2609654

98% confidence interval estimate for the standard deviation of the
number of candies per bag:
Margin of error, E = 0.1119267
98% Confidence Interval (using normal approx):
0.7214067 < p < 0.94526
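
For reference, a normal-approximation interval for a proportion takes the sample proportion and adds and subtracts z times the square root of p(1 - p)/n. The sketch below shows that formula only; it is not claimed to reproduce the figures reported above. The function name `proportion_confidence_interval` is hypothetical, and the inputs assume all candies in the class data are pooled into one sample (238 yellow out of 1245 in total).

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error, margin_of_error


# Assumed inputs: yellow candies and all candies, summed over the class data.
YELLOW_TOTAL = 238
ALL_CANDIES = 1245

low, high, margin = proportion_confidence_interval(YELLOW_TOTAL, ALL_CANDIES, 0.99)
print(f"E = {margin:.4f}, {low:.4f} < p < {high:.4f}")
```

A different choice of sample, such as a single bag, would give a much wider interval because n is smaller.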

Hypothesis Tests
The general purpose of hypothesis testing is to test a claim about a
population using the data that we have collected. The test determines
whether the sample provides sufficient evidence to reject the original
claim; if it does not, we fail to reject the original claim.

Use a 0.05 significance level to test the hypothesis that 20% of all
Skittles candies are red.
Alternative Hypothesis: p not equal to p(hyp)
Sample proportion: 0.2
Test Statistic, z: 0.0000
Critical z: ±1.9600
P-Value: 1.0000
95% Confidence interval: 0.098788 < p < 0.301212
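
The numbers in this test can be traced with a short two-sided z test for one proportion. This is a sketch under assumptions: the function name `one_proportion_z_test` is hypothetical, and the sample of 12 red candies out of 60 is an assumed example chosen only because it gives the reported sample proportion of 0.2; the report does not state the sample size.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha):
    """Two-sided z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses the hypothesized proportion
    z = (p_hat - p0) / standard_error
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > critical_z
    return z, critical_z, p_value, reject


# Assumed sample: 12 red candies out of 60 (sample proportion 0.2).
z, critical_z, p_value, reject = one_proportion_z_test(12, 60, p0=0.20, alpha=0.05)
print(f"z = {z:.4f}, critical z = ±{critical_z:.4f}, P-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Because the sample proportion equals the hypothesized 20%, the test statistic is zero and the claim is not rejected.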

Use a 0.01 significance level to test the claim that the mean number of
candies in a bag of Skittles is 55.
Alternative Hypothesis: μ not equal to μ(hyp)
t Test
Test Statistic, t: 0.0000
Critical t: ±2.8453
P-Value: 1.0000
99% Confidence interval: 53.13729 < μ < 56.86271
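
The t test works the same way, with the sample standard deviation in place of a known one. The sketch below is illustrative only: the function name `one_sample_t_test` is hypothetical, and the sample mean of 55, standard deviation of 3, and n = 21 are assumed inputs that are not stated in the report; they were picked because they are consistent with the interval shown. The critical t of 2.8453 is taken from the output above instead of being computed.

```python
from math import sqrt


def one_sample_t_test(sample_mean, sample_sd, n, mu0, critical_t):
    """Two-sided t test of H0: mu = mu0, given a critical t from a table."""
    standard_error = sample_sd / sqrt(n)
    t = (sample_mean - mu0) / standard_error
    margin_of_error = critical_t * standard_error
    interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
    reject = abs(t) > critical_t
    return t, interval, reject


# Assumed inputs (not stated in the report): mean 55, std. dev. 3, n = 21.
t, (low, high), reject = one_sample_t_test(55, 3, 21, mu0=55, critical_t=2.8453)
print(f"t = {t:.4f}, {low:.4f} < mu < {high:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

With a sample mean equal to the claimed mean, t is zero and the claim of 55 candies per bag is not rejected at the 0.01 level.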

In both hypothesis tests, the claimed value fell inside the
corresponding confidence interval. The 95% confidence interval from
the test that 20% of all Skittles candies are red gave a range of
roughly 10% to 30% for the percentage of red candies. The hypothesis
test that used a 0.01 significance level to test the claim that the
mean number of candies in a bag of Skittles is 55 showed that the mean
could range from about 53 to 57 candies. Given the actual data from
the class, this hypothesis could be rejected, since some students had
bags whose total number of candies was greater than the upper limit of
the confidence interval.
Reflection:
In this class I learned many new concepts related to statistics. I
learned how to make proper graphs and how to recognize when a graph
might misrepresent the actual data.
This introductory statistics course introduced me to the ways that I
will be examining data in my future career. I learned the importance
of random sampling, and the course has given me a basic understanding
of statistics and how it is applied in the real world.
Though I do not know which specific concepts I will be using in the
future, learning how to properly make graphs and represent data has
given me an idea of how I can interpret the data I will be working
with. I can also now recognize when a study provides a good
representation of the entire population. If a study mentions its
confidence level or the fact that its sample was random, I am more
inclined to trust its data than data from a sample that is not random.

This project was challenging, and given the short amount of time
that was allowed for this final project, it was especially difficult to
finish. It did allow me to figure out which concepts I still struggle with
and need to study more, and it showed me how much I have learned
about statistics thus far. Overall, though, this project has shown me
how to think of statistics in real-world applications and has given me a
better understanding of how to represent samples, populations, and
calculations on data through the use of various types of graphs.
