
Taste the Rainbow

In this paper, we'll be analyzing information gathered from 22 bags of Skittles in our
class. The sample (1,300 candies in total) is shown in several charts, each one presenting
different information about the data, and each chart is followed by an explanation of what it
shows. The goal is to observe the data we were given, compile it into charts, and understand
how each chart explains different information about the candies that were collected.

The first table below contains the number and percentage of each of my colors and the total
count from my own personal bag of candy.
Red Orange Yellow Green Purple Total
14 8 8 9 20 59
23.73% 13.56% 13.56% 15.25% 33.90% 100%

The following table contains the data from the entire class; it is used in the rest of the
charts throughout the paper. It contains the total number of each color and the total number
of candies in each bag. As you'll see in the table below, there are similarities and
differences when my personal bag is compared to the whole class sample.

Number Red Orange Yellow Green Purple Total
1 14 18 12 8 8 60
2 11 11 7 11 18 58
3 13 11 7 17 12 60
4 14 5 10 13 14 56
5 16 15 18 12 0 61
6 7 8 10 12 16 53
7 9 8 23 16 7 63
8 14 14 9 11 13 61
9 15 10 11 10 8 54
10 11 9 16 12 13 61
11 13 16 12 5 12 58
12 10 7 16 9 19 61
13 15 12 7 17 8 59
14 12 8 7 14 18 59
15 15 12 8 11 10 56
16 13 12 19 7 9 60
17 19 9 8 13 12 61
18 10 16 11 10 11 58
19 11 14 15 11 11 62
20 11 16 12 13 8 60
21 12 13 11 13 11 60
22 14 8 8 9 20 59
Total 279 252 257 254 258 1300
% 21.46% 19.38% 19.77% 19.54% 19.85% 100%
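
The totals and percentages in the last two rows can be checked with a short script. This is only a sketch: the rows are copied from the class table above, and the names BAGS, color_totals, and color_percents are my own labels, not part of the class data.

```python
# Class data: one row per bag; columns are Red, Orange, Yellow, Green, Purple.
COLORS = ["Red", "Orange", "Yellow", "Green", "Purple"]
BAGS = [
    (14, 18, 12, 8, 8),
    (11, 11, 7, 11, 18),
    (13, 11, 7, 17, 12),
    (14, 5, 10, 13, 14),
    (16, 15, 18, 12, 0),
    (7, 8, 10, 12, 16),
    (9, 8, 23, 16, 7),
    (14, 14, 9, 11, 13),
    (15, 10, 11, 10, 8),
    (11, 9, 16, 12, 13),
    (13, 16, 12, 5, 12),
    (10, 7, 16, 9, 19),
    (15, 12, 7, 17, 8),
    (12, 8, 7, 14, 18),
    (15, 12, 8, 11, 10),
    (13, 12, 19, 7, 9),
    (19, 9, 8, 13, 12),
    (10, 16, 11, 10, 11),
    (11, 14, 15, 11, 11),
    (11, 16, 12, 13, 8),
    (12, 13, 11, 13, 11),
    (14, 8, 8, 9, 20),
]

# Add up each color column across all 22 bags.
color_totals = [sum(column) for column in zip(*BAGS)]
grand_total = sum(color_totals)

# Share of the whole sample for each color, rounded to two decimals.
color_percents = [round(100 * count / grand_total, 2) for count in color_totals]

for color, count, pct in zip(COLORS, color_totals, color_percents):
    print(f"{color}: {count} ({pct}%)")
print("Total:", grand_total)
```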

[Pie chart: Number of Skittles. Red 279 (21%), Orange 252 (19%), Yellow 257 (20%), Green 254 (20%), Purple 258 (20%)]


In the pie chart above, each slice shows the count and percentage of one color out of the
total count of 1,300 candies.
One similarity between my bag and the total class sample is that my bag had two colors
(orange and yellow) with the same count, and the class's colors were all within about two
percentage points of each other. In contrast, purple was only the second most common color in
the class sample (red was the most common), but in my own bag purple was by far the largest
percentage of the whole.

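The Pareto chart above shows the number of candies of each color, like the pie chart, but it
arranges the colors in descending order. For example, you can see in this Pareto chart that
red is the most common color (279) and orange is the least common (252). Because every color
makes up roughly 20% of the candies, no small group of colors accounts for most of the sample.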
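
As a rough sketch of what goes into a Pareto chart, the class totals can be sorted from largest to smallest and given a running percentage. The totals come from the class table; the variable names are only my own labels.

```python
# Class totals per color, taken from the class table.
totals = {"Red": 279, "Orange": 252, "Yellow": 257, "Green": 254, "Purple": 258}
grand_total = sum(totals.values())

# A Pareto chart orders the categories from the largest count to the smallest.
ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Running (cumulative) percentage, the line usually drawn over the bars.
cumulative = []
running = 0
for color, count in ordered:
    running += count
    cumulative.append((color, count, round(100 * running / grand_total, 1)))

for color, count, cum_pct in cumulative:
    print(f"{color:<8}{count:>5}{cum_pct:>8}%")
```

With these totals the first bar (red) covers about 21% of the candies, and it takes four of the five colors to pass 80%.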

The data below is the information used to create the boxplot and the histogram that follow.
Mean: 59.1
Standard deviation: 2.5
Five-number summary: (Minimum) 53, (Q1) 58, (Median) 60, (Q3) 61, (Maximum) 63
Total bags of candy: 22
Number of candies in my own bag: 59
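
These summary numbers can be reproduced from the Total column of the class table. The sketch below assumes the quartiles are found as the medians of the lower and upper halves of the sorted totals; other software may use a slightly different quartile rule.

```python
import statistics

# Total candies in each of the 22 bags (Total column of the class table).
bag_totals = [60, 58, 60, 56, 61, 53, 63, 61, 54, 61, 58,
              61, 59, 59, 56, 60, 61, 58, 62, 60, 60, 59]

mean = statistics.mean(bag_totals)
std_dev = statistics.stdev(bag_totals)  # sample standard deviation (divides by n - 1)

ordered = sorted(bag_totals)
half = len(ordered) // 2

# Assumption: Q1 and Q3 are the medians of the lower and upper halves.
q1 = statistics.median(ordered[:half])
median = statistics.median(ordered)
q3 = statistics.median(ordered[-half:])
five_number = (ordered[0], q1, median, q3, ordered[-1])

print(f"Mean {mean:.1f}, standard deviation {std_dev:.1f}")
print("Five-number summary:", five_number)
```
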
The boxplot below displays the five-number summary in four sections. The ends of the whiskers
mark the minimum and maximum values, the gray section runs from quartile one to the median, the
yellow section runs from the median to quartile three, and the line separating the two colors
is the median, which is also called quartile two.

[Boxplot of total candies per bag: horizontal axis from 0 to 70; marked values are Min, Q1, Med, Q3, Max]

Below is the frequency distribution for the following histogram.

Total # of candies   Frequency
50-52                0
53-55                2
56-58                5
59-61                13
62-64                2

[Histogram: frequency of the total # of candies from the 22 bags; horizontal axis: Total # of candies; vertical axis: Frequency, 0 to 14]
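
The frequency counts can be rebuilt from the bag totals with a few lines. This is a sketch: the class limits are copied from the frequency distribution, and the names classes and frequency are my own.

```python
# Total candies in each of the 22 bags (Total column of the class table).
bag_totals = [60, 58, 60, 56, 61, 53, 63, 61, 54, 61, 58,
              61, 59, 59, 56, 60, 61, 58, 62, 60, 60, 59]

# Class limits used in the frequency distribution (both ends included).
classes = [(50, 52), (53, 55), (56, 58), (59, 61), (62, 64)]

# Count how many bag totals fall inside each class.
frequency = {
    f"{low}-{high}": sum(low <= total <= high for total in bag_totals)
    for low, high in classes
}

for label, count in frequency.items():
    print(f"{label}: {count}")
```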

The boxplot is skewed to the left and the histogram is also skewed left, which means the mean
(59.1) is pulled toward the left side, below the median (60). Notice that the standard
deviation is 2.5; in other words, the variation in the number of Skittles found in each bag is
very small. For example, out of 22 bags, the totals 60 and 61 each appeared five times and 58
and 59 each appeared three times, and 20 of the 22 bags (about 91%) lie within two standard
deviations of the mean (59.1). When comparing my own bag to the 22 bags from the class sample,
I didn't expect to see as many patterns throughout the data as I did. Most of the data is
within a small range, so it would be fair to say that yes, the total from my own bag of
candies (59) agrees with the class sample.
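
One way to check how tightly the bags cluster is to count how many totals fall within one and two standard deviations of the mean. The helper bags_within below is a hypothetical name of my own, and the sketch uses the sample standard deviation.

```python
import statistics

# Total candies in each of the 22 bags (Total column of the class table).
bag_totals = [60, 58, 60, 56, 61, 53, 63, 61, 54, 61, 58,
              61, 59, 59, 56, 60, 61, 58, 62, 60, 60, 59]

mean = statistics.mean(bag_totals)
std_dev = statistics.stdev(bag_totals)  # sample standard deviation


def bags_within(k):
    """Count the bags whose total is within k standard deviations of the mean."""
    low, high = mean - k * std_dev, mean + k * std_dev
    return sum(low <= total <= high for total in bag_totals)


within_one = bags_within(1)
within_two = bags_within(2)
print(f"Within 1 SD: {within_one} of {len(bag_totals)} bags")
print(f"Within 2 SD: {within_two} of {len(bag_totals)} bags")
```

With this data, 16 of the 22 bags fall within one standard deviation and 20 fall within two.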

The first thing we want to identify before creating charts or graphs is whether the data given
is qualitative (categorical) or quantitative. The charts that were created in this paper are
separated by those qualities. The difference between the two is that quantitative data is
countable or measurable and is typically numerical, such as someone's income, how much they
weigh, or their height, while qualitative data categorizes or labels data. Quantitative data is
more structured than categorical data, so it is best shown with line graphs, boxplots, or
frequency histograms, as we did above with the boxplot and the histogram. One of the most
common ways quantitative data is collected is through surveys and online polls; the samples are
usually larger and can be used to gain numerical data for later use in statistics. Qualitative
data, on the other hand, describes or labels things, such as female or male, a person's name,
or a social security number. This type of data is more for showing the reasons behind something
or simply labeling information as it is. The Pareto chart is a good example of a chart for
qualitative data because each bar represents a category, so the differences between the
categories stand out; the pie chart works the same way. Qualitative data can come from themes
of discussion and observations, and it is very useful in describing the attributes of data; the
calculations that make the best sense for it are counts and percentages of each category.

Confidence Intervals
A confidence interval is a range of values, for example 50-60, that is estimated from a sample.
Why do we use confidence intervals? Well, the purpose of a confidence interval is to give a
range that is likely to include an unknown population parameter, such as a proportion or a
mean. In other words, if we had a 90% level of confidence (alpha = 0.10) and built intervals
from many samples, we would expect 90% of the intervals to include the parameter and 10% not to
include it.
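
The idea that about 90% of the intervals include the parameter can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the population is a made-up normal distribution with mean 59 and standard deviation 2.5, the standard deviation is treated as known, and the seed is fixed only so the run can be repeated.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population, chosen only for illustration.
TRUE_MEAN, TRUE_SD = 59.0, 2.5
N_PER_SAMPLE, N_TRIALS = 22, 2000

# A 90% confidence level leaves 5% in each tail.
z_crit = NormalDist().inv_cdf(0.95)
margin = z_crit * TRUE_SD / math.sqrt(N_PER_SAMPLE)  # SD treated as known

hits = 0
for _ in range(N_TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_PER_SAMPLE)]
    center = mean(sample)
    # Does this sample's interval capture the true mean?
    if center - margin <= TRUE_MEAN <= center + margin:
        hits += 1

coverage = hits / N_TRIALS
print(f"{coverage:.1%} of the intervals contain the true mean")
```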

1) We are 99% confident that the true proportion of yellow candies lies between 0.169 and
0.226. This means that, based on our sample of 1,300 candies, yellow candies make up
between 17% and 23% of all Skittles.

2) The sample mean is 59.1 candies per bag from 22 bags, so the population mean lies
between about 58 and 60. Another interpretation would be: We are 95% confident that the
true mean number of candies per bag lies between 57.99 and 60.21.

3) We are 98% confident that the population standard deviation of the number of candies
per bag lies between about 1.84 and 3.84. In other words, with a sample standard
deviation of 2.5 from 22 bags, the true spread is probably between about 2 and 4 candies.
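
The three intervals can be sketched in code from the summary numbers in this paper. Several things here are assumptions: the t critical value (2.080) and the chi-square values (38.932 and 8.897) are table values for 21 degrees of freedom typed in by hand, and the third calculation assumes item 3 refers to an interval for the population standard deviation.

```python
import math
from statistics import NormalDist

# 1) 99% interval for the proportion of yellow candies.
yellow, n_candies = 257, 1300
p_hat = yellow / n_candies
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)
margin_p = z_crit * math.sqrt(p_hat * (1 - p_hat) / n_candies)
ci_proportion = (p_hat - margin_p, p_hat + margin_p)

# 2) 95% interval for the mean number of candies per bag.
x_bar, s, n_bags = 59.1, 2.5, 22
T_CRIT_95_DF21 = 2.080  # assumption: t table value, 21 degrees of freedom
margin_m = T_CRIT_95_DF21 * s / math.sqrt(n_bags)
ci_mean = (x_bar - margin_m, x_bar + margin_m)

# 3) 98% interval for the standard deviation (assumed meaning of item 3).
CHI2_RIGHT_DF21 = 38.932  # assumption: chi-square table, right-tail area 0.01
CHI2_LEFT_DF21 = 8.897    # assumption: chi-square table, left-tail area 0.01
sum_sq = (n_bags - 1) * s ** 2
ci_sd = (math.sqrt(sum_sq / CHI2_RIGHT_DF21), math.sqrt(sum_sq / CHI2_LEFT_DF21))

print("Proportion:", [round(v, 3) for v in ci_proportion])
print("Mean:", [round(v, 2) for v in ci_mean])
print("Standard deviation:", [round(v, 2) for v in ci_sd])
```
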
Hypothesis Tests

Hypothesis testing means making a claim about a population and then testing it; the purpose of
such a test is to reject or fail to reject that claim. For example, you gather information (a
sample) and decide whether to reject the null hypothesis or fail to reject it, based on whether
the evidence you've gathered is strong enough.

1) Ho: p = .20
H1: p not equal .20
x (red candies) 279
n 1300
proportion 0.2
p-hat 0.2146
z-score 1.32
p-value 0.1877

This is a two-tailed test, and there is not sufficient evidence at the level of significance
(0.05) to reject the claim that 20% of all Skittles are red. Because the P-value is greater
than the level of significance, we fail to reject the null hypothesis (0.1877 > 0.05).
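
The test for the proportion of red candies can be sketched as a one-proportion z-test. The counts come from the class table; the variable names and the use of Python's NormalDist for the normal curve are my own choices.

```python
import math
from statistics import NormalDist

red, n = 279, 1300   # red candies and total candies in the class sample
p0 = 0.20            # claimed proportion under the null hypothesis
alpha = 0.05         # level of significance

p_hat = red / n
# Standard error uses the claimed proportion p0, as in a one-proportion z-test.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject Ho" if reject_null else "Fail to reject Ho")
```

With these counts the test statistic is about 1.32 and the decision is to fail to reject the null hypothesis.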

2) Ho: pop-mean = 55
H1: pop-mean not equal 55
x-bar 59.1
pop-mean 55
s 2.5
n 22
t-stat 7.69
P-value 0 (to four decimal places)
CV +-2.831

There is enough evidence at the level of significance (0.01) to reject the claim that the mean
number of candies per bag is 55. Because the P-value is less than the level of significance, we
reject the null hypothesis (0 < 0.01), and the test statistic is beyond the critical value
(7.69 > 2.831).
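
The t statistic for the second test can be checked the same way. This sketch takes n as 22, the number of bags, because that is the sample size that reproduces the reported t statistic; the critical value is the one quoted in the table rather than something the code looks up.

```python
import math

x_bar = 59.1   # sample mean candies per bag
mu0 = 55       # claimed population mean under the null hypothesis
s = 2.5        # sample standard deviation
n = 22         # assumption: number of bags, not the number of candies

# One-sample t statistic.
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Critical value quoted in the paper (two-tailed, alpha = 0.01, 21 degrees of freedom).
T_CRIT = 2.831
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}")
print("Reject Ho" if reject_null else "Fail to reject Ho")
```
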
Reflection
When we're working on interval estimates and hypothesis testing, we need to define the
population and identify the parameter (example: the average CEO salary). We then select a
sampling method like simple random sampling, gather the data, and use a formula to make
estimates or to carry out a hypothesis test. Yes, the examples we worked on in this paper met
the conditions for the testing and estimations. However, the sampling method could be improved
by increasing the sample size.
Considering example one in the hypothesis tests above: we failed to reject the null hypothesis,
so if the alternative hypothesis was actually true we would have made a type 2 error. For
example two, we rejected the null hypothesis, so we would have made a type 1 error if the null
was actually true.
We can conclude from our research that the sample size of our data will affect the results we
get from testing, calculations, or estimations. The results will also differ depending on what
type of evidence we gather and how we choose to interpret it. Data can also easily mislead the
reader if it is not shown accurately in charts or explanations.
