
C. S.

Statistics 1040

You place a dollar and some change in the hand of the cashier at the local convenience store, and
you are now the owner of a standard bag of Skittles. You offer a word of thanks as you slip out the door.
Once you are home, you clear a space at the kitchen counter, dump the contents of the bag onto it,
and begin counting.

Like me, each member of the class purchased a bag and counted the candies.
The following is a report on the categorical and quantitative data gathered from counting Skittles. I will
use techniques learned in this class to present and interpret the data.

The first thing I like to do when trying to understand data is to look at the numbers. I have
changed the text color of the bag that I specifically counted.

             Red   Orange   Yellow   Green   Purple   Total
Amber         12        8       10      16       14      60
Violeta       16        8       13      12       15      64
Felecia        8       18       13       8       14      61
Joseph        11       11       10      11       20      63
Bethany        8       13       13      22        5      61
Kyungtae       9       12       13       6       18      58
Yuliza        11        8       14      12       13      58
Kellie        18       14        3      10       15      60
Christina     10       18       19      10        6      63
Sheena        10       11        9      10       20      60
            ------   ------   ------  ------   ------  ------
Total        113      121      117     117      140     608

# of bags: 10

[Pie chart: Total Class Skittle Colors. Purple .230, Orange .199, Yellow .192, Green .192, Red .186]

Once you understand the entire class count, I can break down the numbers even further by
giving you the counts from just my bag.

[Pie chart: Christina's Skittle Colors. Yellow .30, Orange .29, Red .16, Green .16, Purple .09]

In each pie chart, all the values add up to 1. This is because each color's value is a part of the
total, which is 100%. Instead of using percentages, we have converted the numbers into decimal
form. Based on the total class counts, each color makes up roughly the same share, which I would expect
considering that the total sample size was 10 bags. However, if you look at just the colors from
Christina's bag, Yellow and Orange make up noticeably larger portions. When we look at the colors alone,
we call this type of data Categorical Data.
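
The decimal values in the pie charts can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the class totals in the table, and the names `class_counts` and `color_proportions` are made up for illustration.

```python
# Class totals per color, copied from the count table.
class_counts = {"Red": 113, "Orange": 121, "Yellow": 117, "Green": 117, "Purple": 140}


def color_proportions(counts):
    """Return each color's share of the total as a decimal."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


proportions = color_proportions(class_counts)
for color, share in proportions.items():
    print(f"{color}: {share:.3f}")

# The shares are parts of one whole, so together they add up to 1.
print(f"Sum of shares: {sum(proportions.values()):.3f}")
```

Swapping in the counts from a single bag gives the values for that bag's pie chart in the same way.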

To describe the data further, we can use Quantitative Data. In contrast to categorical data,
quantitative data looks at the numbers rather than the colors. To summarize this type of data,
take the total number of candies (608) and divide it by the total number of bags (10). This gives
the mean, which is the average number of candies per bag. Using the mean, we can find the standard
deviation, which measures how far the bag totals typically fall from the mean.

Mean: 608/10 = 60.8


Standard Deviation: 2.04
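
As a quick check, the mean and standard deviation can be recomputed from the ten bag totals in the table. This sketch assumes the report's 2.04 is the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes; the variable names are illustrative.

```python
from statistics import mean, stdev

# Total candies in each of the 10 bags, copied from the count table.
bag_totals = [60, 64, 61, 63, 61, 58, 58, 60, 63, 60]

bag_mean = mean(bag_totals)  # sum of the totals divided by the number of bags
bag_sd = stdev(bag_totals)   # sample standard deviation (divides by n - 1)

print(f"Mean: {bag_mean:.1f}")
print(f"Standard deviation: {bag_sd:.2f}")
```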

The calculations above give us the average for the 10 bags. Using the totals from each
individual bag, we can also describe the spread with a 5-Number Summary. This type of summary
is exactly what it sounds like: it gives the lowest and highest totals and, within those limits,
the quartile breakdowns. Here is the visual:

Max: 64
Q3: 63
Q2: 60.5
Q1: 60
Min: 58
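
The summary above can be reproduced with a short function. This is a sketch that assumes the quartiles are found as the medians of the lower and upper halves of the sorted totals, which matches the values listed here; other quartile conventions (including the defaults in some software) can give slightly different Q1 and Q3. The function name is illustrative.

```python
from statistics import median

# Total candies in each of the 10 bags, copied from the count table.
bag_totals = [60, 64, 61, 63, 61, 58, 58, 60, 63, 60]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the middle
    upper_half = ordered[-half:]   # values above the middle
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


minimum, q1, q2, q3, maximum = five_number_summary(bag_totals)
print(f"Min: {minimum}  Q1: {q1}  Q2: {q2}  Q3: {q3}  Max: {maximum}")
```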

Interestingly, the average bag of Skittles has about 60 candies. But we can't always predict how
many of each color are in the bag. Using a sample of 10 bags, we can guess that each color has
about the same chance of being in any given bag. Distribution is important when looking at the 10 bag
totals. Looking at the box plot for the 5-Number Summary, the whiskers (the lines coming from the grey
box above) are roughly the same length. This suggests that the bag totals are roughly symmetric, which
is consistent with a normal distribution.

Comparing the frequency histograms lets us check whether the numbers match what we expected.
Across the 10-bag total, the colors are close to equal. In my bag alone, that was not the case.
As we learned, the larger the sample, the closer the results tend to come to the expected
distribution. A single bag will look more lopsided. Therefore, the overall data will differ from
the data collected from one bag, as shown:

[Bar charts: counts by color]

Total Skittles          Christina's Skittle Count
Total    608            Total    63
Purple   140            Yellow   19
Orange   121            Orange   18
Green    117            Green    10
Yellow   117            Red      10
Red      113            Purple    6

Categorical data is best shown with charts that give a visual layout of the categories, which in
this case are the colors. A 5-Number Summary would not be an appropriate way to display colors, just
as a pie chart would not give a good picture of the bag totals. Some graphs, like the bar charts
above, show both colors and numbers. Calculations are most helpful for quantitative data. You do not
need them to name the colors, but you do need an equation to find each color's percentage and to
convert that percentage into a decimal. So categorical data is more than simple color
representation, although it is not as descriptive as quantitative data.

Statistics gives us more than graphs for describing values. We can also use it to estimate an
unknown value with a Confidence Interval. The purpose of a confidence interval is to give a range of
plausible values for something we can only estimate. It is most useful when we want to describe a
large population from a sample. Common confidence levels are 95% and 99%; a 99% level means that we
are 99% confident that the interval contains the true value.

The best way to understand is to use an example. Say we would like to know the 99%
confidence interval for the true proportion of yellow candies in a bag of Skittles. This is done by
using the equation:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

First gather the data you will need. Because 99% is a commonly used confidence level, we know
that it is represented by 2.575. To find $\hat{p}$ (P-hat) we divide the number of yellow
candies (117) by the total number of candies (608): $\hat{p}$ = 117/608 = .192. $z_{\alpha/2}$ is the
critical value for our confidence level (2.575). Recall that n is the total number of candies (608).
Now we can do the math. Because we are looking for a lower and an upper limit, we have to apply the
result twice, once subtracting and once adding.

$2.575 \cdot \sqrt{.192(1 - .192)/608} = .041$
Lower: .192 - .041 = .151
Upper: .192 + .041 = .233

The confidence interval is (.151, .233).
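
The same arithmetic can be sketched in Python. The critical value 2.575 is the table value quoted above, and P-hat is rounded to three decimals to follow the hand calculation (using the unrounded 117/608 shifts the upper limit slightly in the third decimal). The function name `proportion_interval` is illustrative.

```python
from math import sqrt


def proportion_interval(p_hat, n, z):
    """Return (lower, upper, margin) for a proportion confidence interval."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# 117 yellow candies out of 608, rounded to .192 as in the hand calculation.
p_hat = round(117 / 608, 3)

# z = 2.575 is the table value for 99% confidence quoted in the text.
lower, upper, margin = proportion_interval(p_hat, n=608, z=2.575)

print(f"Margin of error: {margin:.3f}")
print(f"Interval: ({lower:.3f}, {upper:.3f})")
```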

Let's do another one, this time for the true mean number of candies per bag at a confidence level
of 95%. The equation we will use still needs to be computed twice. However, because we are looking
at the mean, the equation looks like this:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

As we learned earlier, the mean, represented here by $\bar{x}$ (X-bar), is 60.8 and the standard
deviation (s) is 2.04. We are looking at the total number of bags, so n = 10. The only thing left is
our t value. For a 95% confidence level with n - 1 = 9 degrees of freedom, the t-table lists 2.262.
All we have left is to plug in the numbers!

$2.262 \cdot 2.04/\sqrt{10} = 1.46$
Lower: 60.8 - 1.46 = 59.34
Upper: 60.8 + 1.46 = 62.26

The confidence interval is (59.34, 62.26).
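
A small Python sketch of the same calculation, using the rounded mean (60.8), standard deviation (2.04) and the t value (2.262) quoted in the text. The function name `mean_interval` is illustrative.

```python
from math import sqrt


def mean_interval(x_bar, s, n, t):
    """Return (lower, upper, margin) for a confidence interval around a mean."""
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Values taken from the text: 10 bags, mean 60.8, standard deviation 2.04,
# and t = 2.262 for 95% confidence.
lower, upper, margin = mean_interval(x_bar=60.8, s=2.04, n=10, t=2.262)

print(f"Margin of error: {margin:.2f}")
print(f"Interval: ({lower:.2f}, {upper:.2f})")
```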

We can interpret the intervals we computed as reference ranges for the share of yellow candies
and for the number of candies per bag. In the first example, we found with 99% confidence that the
true proportion of yellow candies in a bag of Skittles is between .151 and .233 (about 15% and 23%).
This means we can expect yellow to make up less than a quarter of the candies. In the second example,
we found with 95% confidence that the true mean number of candies per bag is between 59.34 and 62.26
(about 59 to 62). When you purchase a bag of candies, you can guess that it contains about 61 candies.

To decide whether the data support a claim, we conduct Hypothesis Tests. This type of testing
gives us a way to reject or fail to reject a claim. There are three types of tests you can conduct:
Right Tail, Left Tail, and Two Tail. In hypothesis testing we use a Level of Significance, which is
the probability of rejecting a claim that is actually true, to determine our Critical Value, the
number we compare to the test statistic to decide whether we reject or fail to reject.

Each test requires several steps. First, determine which test you are performing. For these
examples, we will only be using the Two Tail testing method. Second, because we have proportions and
means, you must figure out which one you are trying to find, as they both have different formulas. We
can begin with the calculations to really help the information sink in.

Suppose I say that 20% of all Skittles candies are red, and I want to test this at a .05
significance level. I have made a claim, and I can use a Hypothesis Test to decide whether the data
contradict it. I will be using the proportion equation because the claim is about a share of all the
candies. To help keep the data straight, we will use different symbols: $H_0$ (H-naught) is the
original claim, p stands for proportion, and .20 is the percentage converted to a decimal. Since we
also need to represent the other side of the claim, that the proportion of red candies does not equal
.20, I will use $H_1$ (the alternative).

$H_0: p = .20$
$H_1: p \neq .20$

We are dealing with proportions, so we must use $\hat{p}$ (P-hat). Remember that $\hat{p} = x/n$. This
time x is the number of red candies (113) and n is the total number of candies (608):
$\hat{p}$ = 113/608 = .186.

Continuing with the steps for these tests, third, we need to find a test statistic. This gives us a
number to compare to the critical value. The test statistic formula is:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

$z = (.186 - .20)/\sqrt{.20(1 - .20)/608} = -.863$

To determine the Critical Value, we use the .05 significance level. Again this is a common value,
and I know that it is 1.96. However, because we are using a Two Tail test, we must use both the
positive and the negative value. Take the answer from our test statistic, -.863. Where does it fit on
the bell curve in relation to the critical values?

[Bell curve: the test statistic -.863 lies between the critical values -1.96 and 1.96]

We can see that the test statistic falls between the critical values. The conclusion is to fail
to reject the claim that 20% of all Skittles are red, because there is not sufficient evidence to
reject it.
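
The two-tail proportion test can be sketched in Python as follows. P-hat is rounded to .186 to follow the hand calculation, and 1.96 is the table value quoted in the text; the function name `proportion_z_statistic` is illustrative.

```python
from math import sqrt


def proportion_z_statistic(p_hat, p0, n):
    """Test statistic for a claim about a population proportion."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# 113 red candies out of 608, rounded to .186 as in the hand calculation.
p_hat = round(113 / 608, 3)
z = proportion_z_statistic(p_hat, p0=0.20, n=608)

# Two-tail test at the .05 level: reject only if z is beyond -1.96 or 1.96.
critical_value = 1.96
reject_claim = abs(z) > critical_value

print(f"Test statistic: {z:.3f}")
print("Reject the claim" if reject_claim else "Fail to reject the claim")
```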

What if we changed the claim and used a .01 significance level to test that the mean number
of Skittles per bag is 55? We would start by picking out the key words: mean, .01 significance level,
and the value we are testing (55). For this example, we will let the mean be represented by $\mu$ (Mu).

$H_0: \mu = 55$
$H_1: \mu \neq 55$

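To complete the computations, we still need more data. Lucky for us, we already have the
information. Mean of candies per bag ($\bar{x}$, X-bar): 60.8; total number of bags n: 10; standard
deviation (s): 2.04. The only piece missing is the critical value for the .01 significance level.
Because we are testing a mean from 10 bags using the sample standard deviation, we use the t-table
with n - 1 = 9 degrees of freedom, which lists 3.250. Remember we must set it to a positive and a
negative. Now that we have all the values, here is the equation we will be using:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

$t = (60.8 - 55)/(2.04/\sqrt{10}) = 8.99$

We need to get our bell curve and place our critical value, 3.250.

[Bell curve: critical values -3.250 and 3.250; the test statistic 8.99 falls to the right of 3.250]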

Here we can see that our test statistic is outside our critical values. We therefore reject the
claim that the mean number of Skittles per bag is 55, because our evidence supports the alternative.
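
A Python sketch of the test on the mean. It uses the rounded mean (60.8) and standard deviation (2.04) from the text. The critical value 3.250 is an assumption: it is the t-table value for a two-tail test at the .01 level with 9 degrees of freedom. The z-table value 2.575 leads to the same decision, because the test statistic is far beyond either one. The function name `mean_test_statistic` is illustrative.

```python
from math import sqrt


def mean_test_statistic(x_bar, mu0, s, n):
    """Test statistic for a claim about a population mean."""
    return (x_bar - mu0) / (s / sqrt(n))


# Values from the text: sample mean 60.8, claimed mean 55, s = 2.04, 10 bags.
t_stat = mean_test_statistic(x_bar=60.8, mu0=55, s=2.04, n=10)

# Assumed critical value: t-table, two-tail, .01 level, 9 degrees of freedom.
critical_value = 3.250
reject_claim = abs(t_stat) > critical_value

print(f"Test statistic: {t_stat:.2f}")
print("Reject the claim" if reject_claim else "Fail to reject the claim")
```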

After working through our own confidence intervals and hypothesis tests, you can see that they
play a crucial role in statistics. They give us a way to share findings with a stated level of
confidence. This type of analysis is useful in research, census gathering, and understanding a
population from a sample without having to spend a large amount of money and time to get a result. Be
aware that errors are possible. They can happen in two ways. Type 1: you reject H-naught when you are
not supposed to. Type 2: you do not reject H-naught when you are supposed to. We can improve our
methods by getting a larger number of samples. Using a smaller level of significance will also reduce
the chance of a Type 1 error. After performing all the calculations, I can conclude that Skittles
cost about two pennies apiece. It is up to you to decide if that is a good value.

For my final reflection, I will start by saying I had my doubts about this project. I was glad to
know that the first thing I had to do was count, which is not too difficult. Looking at the rest of
the project, I thought there was no way I would be able to explain the process for computing
equations. Each equation has so many steps, and I wondered how I would express them all in a way that
makes sense. Although I did get confused often, I was able to use critical thinking skills to find
the solutions in my textbook, notes from class, or online. I was also lucky enough to have great
classmates.

This project was meant to help me in math. I feel that I learned more about the computer
programs Excel and Word. I made graphs and wrote equations with code. I am proud of the
way I continued to test myself when I got stuck. I did refer to online videos for help at times, but
I think that growing as a person requires you to be able to admit that you need help. One of the
things this assignment requires is creating an online presence, and I learned to do that by trial
and error.

As a perfectionist, I spent a lot of time writing on paper before I put together this final draft.
This type of large assignment takes dedication and perseverance. Each week after class, I would review
the questions for this assignment to try to get another section completed. I did not want to feel
rushed and overwhelmed. I would say that what I learned most this semester was time management. I know
that I am a hard-working student, but adding full-time employment, as well as being needed at home,
gets trying.

Don't give up!
