
Team Project

1. Are there any identifiable problems with the data? If so, how did you as a group deal with the
issues? Why?

When looking at the data, I noticed three errors. Row 35 had a total of 98, far more than a typical
bag. Row 38 had no number for the purple Skittles, which made its total 38, far too few
Skittles. Row 45 had far too many Skittles in every column, for a total of
2836.

I would deal with these issues by removing those three rows from the data, because values
that extreme could distort the final results.

2. Determine the proportion of each color within the overall sample gathered by the class.
FIRST: Guess! What do you expect the proportions to be? Why?
Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and Purple
candies in the class data set. Note that the sample size is the total number of candies collected
by the class.

If I had to guess, I would say that each color count would be around 500, give or take a few,
since the five colors should appear in roughly equal shares.

Here are the counts for each color, excluding the erroneous rows in the data:
1. Red = 533
2. Orange = 582
3. Yellow = 525
4. Green = 526
5. Purple = 546

My prediction was fairly close: every color count landed in the 500s.
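To turn the color counts above into proportions, each count is divided by the total number of candies. A minimal Python sketch, assuming the five counts listed above (which sum to 2712) are the cleaned class totals:

```python
# Color counts listed above (erroneous rows excluded)
counts = {"Red": 533, "Orange": 582, "Yellow": 525, "Green": 526, "Purple": 546}

def proportions(counts):
    """Return each color's share of the total number of candies."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}

props = proportions(counts)
for color, p in props.items():
    print(f"{color:<7} {p:.3f}")
```

Every proportion comes out close to one fifth, and yellow comes out to about 0.194, the same p(hat) used in question 8.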
3. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each
color in our class data set. Submit copies of your graphs in this report.
Appropriate labels, titles, and formatting of graphics are expected throughout this project.
Tutorials in your Interactive Reading assignments will help, and StatCrunch exercises posted
on the technology discussion board also walk you through the procedures step-by-step.
4. Does the class data represent a random sample? What would the population be? Collaborate
to discuss sampling and our data in a paragraph or two.
The data collected for this project is a clear example of a simple random
sample. As defined in the interactive assignment, a sample of size n from a population of size N
is obtained through simple random sampling if every possible sample of size n has an equal
chance of occurring. That is exactly what we are doing in this project. The population here is
the students in our class.
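The definition can be checked with a small simulation. This Python sketch uses a hypothetical population of five labeled bags (an illustration only, not our class data): it lists every possible sample of size 2, then draws many samples with `random.sample` to see that each possible sample turns up about equally often.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical population of N = 5 labeled bags (illustration only)
population = ["bag1", "bag2", "bag3", "bag4", "bag5"]
n = 2

# Every possible sample of size n
all_samples = list(combinations(population, n))
print(len(all_samples), comb(len(population), n))  # both 10

# random.sample draws without replacement; sorting makes order irrelevant
rng = random.Random(1)
draws = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(100_000))

# Each of the 10 samples should appear roughly 100_000 / 10 = 10_000 times
for sample in all_samples:
    print(sample, draws[sample])
```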
The data collected did have some errors that would have given us misleading readings in
our graphs, so those rows were removed from the overall data. From there we guessed that each
color count would be around 500, and each count turned out to be in the 500s. We then
represented all of our data in the graphs above as a pie chart and two Pareto charts. One of
them shows the frequency of each color of Skittles in descending order, the same idea as the
relative frequency graph above.
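A Pareto chart is a bar chart with the categories sorted from largest to smallest count. As a hedged sketch in Python (not the StatCrunch procedure), using the counts listed under question 2, the ordering and relative frequencies behind the Pareto charts can be computed like this; the cumulative column is an extra that Pareto charts often draw as a line.

```python
counts = {"Red": 533, "Orange": 582, "Yellow": 525, "Green": 526, "Purple": 546}

def pareto_table(counts):
    """Sort colors by count (largest first) and add relative and cumulative frequency."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for color, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        rel = count / total
        cumulative += rel
        rows.append((color, count, rel, cumulative))
    return rows

for color, count, rel, cum in pareto_table(counts):
    print(f"{color:<7} {count:>4} {rel:6.3f} {cum:6.3f}")
```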

5. Using the total number of candies in each bag in our class sample, compute the following measures
for the variable Total candies in each bag:

(a) mean number of candies per bag


With outliers = 118.8

Without outliers = 60.7
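Both means can be reproduced from numbers already in this report: the class total of 5702 candies over 48 bags, and the three flagged bag totals (98, 38 and 2836). A short Python check, assuming those three rows are the only ones removed:

```python
# Class total and number of bags with all rows included
total_all, bags_all = 5702, 48
# Totals of the three rows flagged as errors (rows 35, 38 and 45)
flagged = [98, 38, 2836]

mean_with = total_all / bags_all
mean_without = (total_all - sum(flagged)) / (bags_all - len(flagged))
print(f"with outliers:    {mean_with:.2f}")
print(f"without outliers: {mean_without:.2f}")
```

This prints 118.79 and 60.67. Removing just three bags cuts the mean almost in half, which shows how strongly one extreme value like 2836 pulls the mean.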


(b) standard deviation of the number of candies per bag
With outliers = 396.4

Without outliers = 2.5


(c) 5-number summary for the number of candies per bag
With outliers and without outliers

                   Minimum   Q1     Median   Q3   Maximum
With outliers      38        59.5   61       62   2836
Without outliers   54        59.5   61       62   68
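One common way to decide whether a value is an outlier is the 1.5 × IQR rule. This is a standard textbook rule, not necessarily the method StatCrunch or our group used. A Python sketch using the quartiles from the "With outliers" row above:

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles from the "With outliers" row of the five-number summary
lower, upper = iqr_fences(59.5, 62)
flagged = [98, 38, 2836]  # totals of rows 35, 38 and 45
print(lower, upper)
print([x for x in flagged if x < lower or x > upper])
```

All three flagged totals fall well outside the fences of 55.75 and 65.75. By this rule the minimum of 54 and maximum of 68 in the cleaned data would also be flagged, so the rule is stricter than simply removing the erroneous rows.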
6. Create a frequency histogram for the variable Total candies in each bag

7. Create a box plot for the variable Total candies in each bag
8. Construct a 99% confidence interval estimate for the population proportion of yellow
candies.

Without outliers
n = 2712
x = 525
p(hat) = 0.194
$E = z_{0.005}\sqrt{\hat{p}(1-\hat{p})/n} = 0.01955$

With outliers
n = 5702
x = 1116
p(hat) = 0.196
$E = z_{0.005}\sqrt{\hat{p}(1-\hat{p})/n} = 0.0135$
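The interval itself is p(hat) plus or minus the margin of error. A minimal Python sketch of the large-sample z interval, using `statistics.NormalDist` for the critical value; `proportion_ci` is a hypothetical helper name, not a StatCrunch function.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.99):
    """Large-sample z interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

for label, x, n in [("without outliers", 525, 2712), ("with outliers", 1116, 5702)]:
    p_hat, margin, (low, high) = proportion_ci(x, n)
    print(f"{label}: p_hat = {p_hat:.3f}, E = {margin:.4f}, CI = ({low:.3f}, {high:.3f})")
```

The margins come out to about 0.0195 and 0.0135, matching the values above, so the intervals are roughly 0.174 to 0.213 without outliers and 0.182 to 0.209 with them.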
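9. Construct a 95% confidence interval estimate for the population mean number of
candies per bag.

Without outliers
n = 45
x̄ = 60.67
$E = t_{0.025}\, s/\sqrt{n} = 0.7495$

With outliers
n = 48
x̄ = 118.79
$E = t_{0.025}\, s/\sqrt{n} = 116.3199$

Individual bag
n = 5
x̄ = 12.4
$E = t_{0.025}\, s/\sqrt{n} = 4.35485$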
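Python's standard library has no t distribution, so this sketch finds the critical value t* numerically (Simpson's rule plus bisection); with SciPy available, `scipy.stats.t.ppf` would do the same job. It assumes s = 2.5, the rounded standard deviation from question 5(b), so its margin differs slightly from 0.7495. The helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 below zero plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Find t* with P(T <= t*) = 1 - alpha/2 by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(x_bar, s, n, confidence=0.95):
    """t interval for a population mean from summary statistics."""
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return margin, (x_bar - margin, x_bar + margin)

margin, (low, high) = mean_ci(60.67, 2.5, 45)
print(f"E = {margin:.3f}, CI = ({low:.2f}, {high:.2f})")
```

With s = 2.5 the margin is about 0.751, close to the 0.7495 above, giving an interval of roughly 59.92 to 61.42 candies per bag.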

The first step was to find the 99% confidence interval for the proportion of yellow Skittles. To
see how the results would differ with and without the outliers, both calculations were made. The
margin of error was 0.01955 without outliers and 0.0135 with outliers, so it decreases when the
outliers are included. Those rows add many more candies to the sample, and the larger the sample
size n, the smaller the margin of error. The 95% confidence interval for the population mean gave
very different results. The margin of error was 0.7495 without outliers and 116.3199 with outliers.
This large increase happens because the outlier bags, especially the one with 2836 candies, greatly
inflate the standard deviation. A separate calculation for an individual bag of candy gave a margin
of error of 4.35485. That statistic seemed to make the most sense to me.
Individual Work

                Red    Orange   Yellow   Green   Purple   Total
My Bag          10     11       18       8       16       63
Class Counts    1131   1165     1116     1146    1144     5702
Class Counts*   533    582      525      526     564      2730

*Lines 35, 38 and 45 excluded

1. The graphs were not surprising. The number of candies of each color stayed relatively even once
all the bags from the class were counted.
2. A few entries were very different from the rest of the data, so I made charts both with and
without them. Including or excluding those outliers did not seem to make much difference to how
evenly the colors were distributed.
3. They are different. I would expect the colors to become more evenly distributed as more
bags are counted.
Chart made in Excel with lines 35, 38 and 45 excluded

Chart made in Excel with lines 35, 38 and 45 included


Chart made in Excel with lines 35, 38 and 45 included

Chart made in Excel with lines 35, 38 and 45 excluded


The shape of the distribution is symmetric for both counts (my bag and the entire class). This
surprised me. I had always assumed that each bag would have about the same number of each color and
about the same total. This proved not to be the case more than half of the time (disregarding the
outliers). The total number of candies in my bag is 63, and the total number of bags is 45 (excluding
the three erroneous rows).

Quantitative variables are expressed numerically; they can be measured and used in calculations.
Don't be deceived: not all numbers represent a quantitative variable. Some numbers, such as zip
codes, are categorical variables. The best ways to graph quantitative data are a dot plot,
stem-and-leaf diagram, time plot, histogram or boxplot. Pie charts may not be suitable when there
are too many categories. Frequency distributions are also good for quantitative data, because they
give more detail about how the values are spread out.

Categorical variables don't have numerical or quantitative meaning; they describe characteristics,
or qualitative data. The best ways to graph categorical data are a bar graph or a pie chart. A
bar graph is useful when we want to compare the different parts with each other, not necessarily
each part with the whole.

A confidence interval expresses how much uncertainty there is in a particular statistic. It is
built by adding and subtracting a margin of error from the sample statistic. It tells you how
confident you can be that the results from a poll or survey reflect what you would find if it
were possible to survey the entire population. Confidence intervals are tied directly to confidence
levels: a 95% confidence level, for example, means that about 95% of intervals built this way would
contain the true population value.
