
Ema Condori-Teves

Professor Oremus
Math 1040 Statistics
28 November 2018
Skittles Term Project
Throughout the semester we have been studying various aspects of statistics and how to collect, calculate, and analyze data from real-world applications. The Skittles project was done to show how to interpret and apply statistics through a real-life example. The first part of the project was to collect the data we were going to be working with; in this case, it was the number of candies of each color in our bag. Everyone in our class purchased a 2.17-ounce bag of Original Skittles and recorded the following data.

Number of red candies: 13
Number of orange candies: 7
Number of yellow candies: 16
Number of green candies: 7
Number of purple candies: 18
Total number of candies: 61
This was the data recorded from my own Skittles bag.
The data were then compiled with those of all the students in the Math 1040 class this semester, and a dot plot was created to show the total number of Skittles in each bag. The purpose of this section was to find the mean number of Skittles per bag and to see how much individual bags deviated from it.

Team Project Part 2


In the second part of the project, we were asked to determine the proportion of each color within the overall sample gathered by the class. To do this, we first made an educated guess and listed our own expectations of what we thought the proportions would be and why. Then we opened the data set and computed the proportions of Red, Orange, Yellow, Green, and Purple candies in the class data set, noting that the sample size is the total number of candies collected by the class.

As a guess, we assumed that the proportions of Skittle colors would be much more evenly distributed across a large sample than within a small or individual sample. We predicted that each color would make up around 20% of the class sample, although the results might vary by a few percentage points depending on the sample size. We had discussed how a class sample would not be large enough to reflect all the different Skittles factories and their production accuracy, or all Skittles consumers.
The total number of Skittles for the class was 6,680.
The total number of Red Skittles was 1,340, a relative frequency of 0.201 or 20.1%.
The total number of Orange Skittles was 1,356, a relative frequency of 0.203 or 20.3%.
The total number of Yellow Skittles was 1,410, a relative frequency of 0.211 or 21.1%.
The total number of Green Skittles was 1,245, a relative frequency of 0.186 or 18.6%.
The total number of Purple Skittles was 1,329, a relative frequency of 0.199 or 19.9%.
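The relative frequencies above can be checked with a short script. This is only a sketch in Python (the project itself used StatCrunch); the counts are the class totals listed above, and the names `class_counts` and `relative_frequencies` are made up for illustration.

```python
# Class counts by color, copied from the class totals listed above.
class_counts = {
    "Red": 1340,
    "Orange": 1356,
    "Yellow": 1410,
    "Green": 1245,
    "Purple": 1329,
}


def relative_frequencies(counts):
    """Return each category's share of the total, rounded to 3 decimals."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


class_total = sum(class_counts.values())
class_rel_freq = relative_frequencies(class_counts)

print("Total:", class_total)
for color, p in class_rel_freq.items():
    # Show the share both as a decimal and as a percentage.
    print(f"{color}: {p:.3f} ({p:.1%})")
```

Each relative frequency is simply the color count divided by the total number of candies in the class sample.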
In StatCrunch we created a pie chart and a Pareto chart for the total number of candies of each
color in our class data set.

Then we decided whether the class data represented a random sample and what the population was for this particular set of data.

Our group had a few different opinions while discussing whether the class data would represent a random sample. It was argued that the answer depends on what we consider the sample and the population. The sample might be considered a convenience sample if the class members are the sample, although our focus is on comparing Skittle colors and not necessarily on the people chosen to collect the data. If we considered the entire Skittles production as the population, our data would be nowhere near enough to represent it: because our class (online and on-campus) doesn't represent the entire Skittles production, it would not be accurate to conclude that the data represent the entire population. If we were to consider all of SLCC as our population, then our class would be considered a random sample of the students at the school who were given the assignment to collect the Skittle data, although one could argue for an even bigger population, such as the U.S. We concluded that we could be considered a random sample, treating the class as a random sample of Skittle consumers.

We then created a table that displays the counts by color and total from our own bag of candies

together with the counts by color and total for the entire class sample.

          Count Red   Count Orange   Count Yellow   Count Green   Count Purple   Count Total
My Bag    13          7              16             7             18             61
Class     1340        1356           1410           1245          1329           6680

[Charts: Total Class Skittles; My Bag Skittles]


Then we discussed our observations of this data and responded to the following prompts:
• Do the graphs reflect what you expected to see? Are there any surprises?
• Are there any observations that appear to be outliers? If so, what impact might they have on graphics and summary statistics?
• Does the distribution of colors in the total class data match with your own data from your single bag of candies or are they different?

The graphs for the class total count of the Skittle colors are what I had expected to see. With a total of 5 colors, I had assumed that there should be about 20% of each color within each bag. Although I had predicted that the amounts would vary within individual bags, I assumed that as a larger sample was gathered, the equal percentage of colors would show more prominently. There were a few outliers within the initial data that was gathered, although they seemed mainly to be errors on the participants' part or possibly mistakes in typing in the data. These data were removed from the total count because they would have distorted the graphics and summary statistics with inaccurate values. The data from my own bag did differ slightly from the class total data, although that was expected. The class total shows a more equal distribution of color, while my personal bag doesn't quite reflect an equal distribution of colors. I would estimate that an even larger sample would reflect an even more equal distribution of colors, although individual bags may not. From just looking at the data collected from my personal bag, one would assume that the color distribution is not equal, although a larger sample leads to a much different conclusion.
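To put a number on how far each sample sits from an equal split of colors, the sketch below compares my bag with the class sample. It is an illustration in Python rather than part of the original StatCrunch work; the counts come from the table above, and the helper names (`proportions`, `max_gap_from_equal`) are hypothetical.

```python
# Counts in the order Red, Orange, Yellow, Green, Purple (from the table above).
my_bag = [13, 7, 16, 7, 18]
class_sample = [1340, 1356, 1410, 1245, 1329]


def proportions(counts):
    """Convert raw counts into shares of the sample total."""
    total = sum(counts)
    return [n / total for n in counts]


def max_gap_from_equal(counts):
    """Largest absolute difference between any color's share and an equal share."""
    equal_share = 1 / len(counts)  # 0.20 when there are five colors
    return max(abs(p - equal_share) for p in proportions(counts))


my_gap = max_gap_from_equal(my_bag)
class_gap = max_gap_from_equal(class_sample)

print(f"My bag: largest gap from 20% is {my_gap:.1%}")
print(f"Class:  largest gap from 20% is {class_gap:.1%}")
```

The largest gap is several times bigger for the single bag than for the class sample, which matches the observation that the class totals look more evenly distributed.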

Skittles Project Part 3

Using the total number of candies in each bag in our class sample, we computed the

following measures for the variable “Total candies in each bag” and reported these summary

statistics rounded to one decimal place below.

Mean number of candies per bag: 60.2

Standard deviation of the number of candies per bag: 7.0

5-number summary for the number of candies per bag (minimum, Q1, median, Q3, maximum): 35, 58, 59, 61, 97

We then created a frequency histogram, as well as a box plot, for the variable “Total candies in each bag”.

I then discussed my findings about the variable “Total candies in each bag” and addressed the following in my writing: What is the shape of the distribution? Do the graphs reflect what you expected to see? Does the overall data collected by the whole class agree with your own data from a single bag of candies? Include the number of candies from your own bag and the total number of bags in the class sample in your discussion.

When considering the total number of Skittles in each bag in our sample, we get a mean of about 60 Skittles per bag. The shape of the histogram is a bit difficult to determine when looking at the graph with a bin width of 10. Most of the data is clustered within the range of 50-70 Skittles per bag, so the histogram mainly looks like it consists of only two bars with a high frequency and not much else, but the graph's shape becomes more distinguishable as you reduce the bin width. When I changed the bin width to 1, the shape looked a lot more bell-shaped, with the data clustered around 60-61 and distributed in a bell-shaped manner around that. The graphs were what I had expected to see: Skittles are meant to be packaged in a pretty precise manner, although there is some variation around the average of 60 Skittles per bag. The overall data from the class did agree with my own, with the mean for the whole class being about 60 Skittles and the count of Skittles from my own bag being 61.

I also explained the difference between categorical and quantitative data by addressing

the following in my writing: What types of graphs make sense and what types of graphs do

not make sense for categorical data? For quantitative data? Explain why. What types of

calculations make sense and what types of calculations do not make sense for categorical

data? For quantitative data? Explain why.

Categorical and quantitative data are both organized through tables and graphs, although the types of tables and graphs used to represent each type of data vary. Categorical data is summarized mainly in frequency tables and relative frequency tables. The graphs used to represent categorical data are bar graphs, side-by-side bar graphs, and pie charts. These graphs best represent categorical data because they compare data whose variables classify individuals based on characteristics or attributes, as opposed to quantitative data, whose variables are based on a numerical measure. Quantitative data, on the other hand, can be broken down even further depending on the type of data, whether it is discrete or continuous. Discrete quantitative data, which is data with a finite or countable number of possible values, will contain values that can be broken down into categories. Discrete quantitative data with a wide variety of outcomes and continuous data, whose variables have an infinite number of possible values that are not countable, will contain values that are best grouped into classes. Both discrete and continuous quantitative data are best organized in tables such as frequency, relative frequency, cumulative frequency, and cumulative relative frequency tables. The graphs that best represent quantitative data are histograms, stem-and-leaf plots, dot plots, frequency polygons, ogives, and time series plots. These tables and graphs much better represent and compare data that has variables with numerical values.

Skittles Project Part 4

Construct a 99% confidence interval estimate for the population proportion of yellow candies.

1. Sample proportion: 1410/6680 = 0.211

Z score = 2.58

$0.211 - 2.58\sqrt{\frac{0.211(1-0.211)}{6680}} = 0.198$ (lower bound)

$0.211 + 2.58\sqrt{\frac{0.211(1-0.211)}{6680}} = 0.224$ (upper bound)
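The same interval can be reproduced in a few lines of Python. This is a sketch of the standard large-sample interval for a proportion, not the StatCrunch procedure itself; the function name `proportion_ci` is hypothetical, and the critical value is taken from the standard normal distribution (about 2.576, which the hand calculation rounds to 2.58).

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 1,410 yellow candies out of 6,680 in the class sample.
lower, upper = proportion_ci(1410, 6680, 0.99)
print(f"99% CI for the yellow proportion: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimal places, the bounds agree with the hand calculation of 0.198 and 0.224.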

Construct a 95% confidence interval estimate for the population mean number of candies per
bag.

2. Lower Bound: 56.583 Upper Bound: 57.497
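For reference, an interval for a mean can be sketched from summary statistics alone. Several things here are assumptions rather than facts from the report: the number of bags is not stated, so `ASSUMED_BAGS = 111` is a guess (roughly 6,680 candies divided by the mean of 60.2), and the normal critical value is used as a large-sample stand-in for the t value that StatCrunch would use. The output is illustrative only and may not match the bounds reported above, which came from StatCrunch on the full class data set.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_normal(sample_mean, sample_sd, n, confidence):
    """Approximate confidence interval for a mean using a normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# ASSUMPTION: the report does not state the number of bags; 111 is a guess
# based on 6,680 total candies and a mean of 60.2 candies per bag.
ASSUMED_BAGS = 111

# Mean (60.2) and standard deviation (7.0) are the summary statistics from Part 3.
mean_lower, mean_upper = mean_ci_normal(60.2, 7.0, ASSUMED_BAGS, 0.95)
print(f"Approximate 95% CI for the mean: ({mean_lower:.1f}, {mean_upper:.1f})")
```

An interval built this way is always centered on the sample mean, with a width that shrinks as the number of bags grows.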

Discuss and interpret (with complete sentences) the results of each of your interval estimates

3. We can be 99% confident that the population proportion of yellow Skittles is between 0.198 and 0.224. This means that for every hundred Skittles, we would expect to find between 20 and 22 that are yellow.

In a paragraph, explain in general the purpose and meaning of a confidence interval.

A confidence interval, according to the textbook, is an interval of numbers for an unknown parameter that is reported with a level of confidence (the level of confidence represents the proportion of intervals that will contain the parameter if a large number of different samples are obtained). This basically means that the confidence interval gives a range of values, between a lower bound and an upper bound, that is likely to contain the population parameter. For example, if we were 95% confident that the population mean for a certain situation lies between 73.4 and 103.7, that would mean that if repeated samples were taken and a 95% interval computed for each sample, one would expect 95% of those intervals to contain the population mean. It is important to consider that these calculations must be made under certain conditions, two of the most important being that the data are obtained through a simple random sample or a randomized experiment, and that the sample size is no more than 5% of the population size.

Reflection on the Term Project


Throughout this semester we have been studying statistics and how to apply statistical evidence to real-life situations. Throughout the process of working on this project I have been able to understand and apply many concepts, such as organizing and analyzing data, drawing conclusions, using confidence intervals and hypothesis tests, and clearly presenting the collected data and explaining it in a report. I was also able to understand how important and useful statistics is through its many applications to our daily lives and jobs.

At the beginning of the semester, each student taking the class was asked to purchase a 2.17-ounce bag of Original Skittles to conduct our statistical research. As the semester progressed, we were able to apply knowledge and concepts that we had learned from each chapter to the data collected from the Skittles. The first part of the project taught me how to hypothesize, collect data, and create visual representations of it, as well as how to analyze those results and apply meaning to them. The second part of the project helped me understand how to compute the mean, standard deviation, frequency, etc., and to create a graph using StatCrunch or a calculator, but most importantly how to analyze and derive meaning from the shape of the graphs and boxplots made from the collected data. I was also able to gain a better understanding of what type of data was collected (categorical or quantitative) and why that is relevant in each statistical experiment conducted.

In the third part of the project, I learned how to construct a confidence interval estimate

for the population proportion and mean through StatCrunch and on a calculator, as well as

interpret the results for each interval estimate. I was able to learn a lot about inference and what

can be concluded from collected data and statistical evidence throughout this project and

semester.
