Term Group Project

SLCC - Spring 2017

May 4 - 2017

Chelsea Rinde

Jessica Reynolds

Paul Baker

Trisha Snow

Whytnie Heuser

A Statistical Evaluation of Skittles

Report Introduction:

We were tasked with analyzing bags of Skittles. Is there a consistent number of candies in any given 2.17 oz bag? Is the number of candies of a given color normally distributed, or is it random? Using a sample whose size was set by the number of students enrolled in the current class, we recorded the metrics of each bag of Skittles. These metrics were then used to answer these questions with five-number summaries, histograms, confidence intervals, and hypothesis testing.

To visualize the proportion of each Skittle color, we created a pie chart of relative frequency. From smallest relative frequency to largest we have red, orange, green, yellow, and purple, with respective relative frequencies of 0.186, 0.199, 0.204, 0.204, and 0.207.

Pie charts represent quantity poorly. A Pareto chart supplements the pie chart by showing each category's relative frequency together with the cumulative relative frequency, so we can see how the total climbs toward 100% with each additional category.
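The cumulative column of a Pareto chart can be sketched from the relative frequencies reported above. This is only an illustration of the idea, not the tool used for the report; the function name `pareto_table` is hypothetical.

```python
# Relative frequencies reported above for the class sample.
rel_freq = {
    "red": 0.186,
    "orange": 0.199,
    "green": 0.204,
    "yellow": 0.204,
    "purple": 0.207,
}


def pareto_table(freqs):
    """Return (category, relative frequency, cumulative frequency) rows, largest first."""
    rows = []
    running = 0.0
    # A Pareto chart orders categories from most to least frequent.
    for color, freq in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((color, freq, round(running, 3)))
    return rows


for color, freq, cumulative in pareto_table(rel_freq):
    print(f"{color:<7} {freq:.3f} {cumulative:.3f}")
```

The last cumulative value reaches 1.000, which is the 100% line on the chart.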

In addition to the charts, it is helpful to see the numbers broken down without visualization. The table below displays the data for the class as a whole. The columns are the colors, and the rows are the total and the sample proportion.

Our group's individual contributions to the sample are displayed in the following table. Again, the columns are the colors, and each row is one member's bag. The last two rows are the totals for our group followed by the proportions within our group.

We found that purple occurred most frequently, while red occurred least. Comparing our contributions with the sample shows, preliminarily, that the numbers appear to be consistent from bag to bag. This is reflected in our proportions being similar to those of the sample, with purple being the highest proportion and red the smallest. We expect a single bag of candy to agree with the data, because we also calculated the minimum and maximum number of candies per bag for the entire class, and our group's results are consistent with these.

Organizing and Displaying Quantitative Data: The number of candies per bag

Taking the sample to be the data from our whole class, and not just our group, we determine the following.

Sample Mean x̄: 60.4

Sample Standard Deviation s: 1.74

Sample Size n: 24

There are 24 bags in total, and from them we calculated the sample mean and standard deviation. The boxplot also visualizes this data. The data appear to be approximately normally distributed: the whiskers of the box plot are approximately equal in length, which indicates symmetry, and the box itself isn't very large in relation to the range, so there isn't much spread (which we confirm with our sample standard deviation s).

To assist visualization further, we can show the number of Skittles per bag within our class sample with a histogram. Using a class width of one, we can see that bags containing 59 and 61 candies occur most frequently, while bags containing 62 and 63 candies occur least frequently.

An early look at our own contributions supports this picture: our bag counts are very close to the most frequently occurring counts in the sample.

After calculating the mean number of candies per bag and a five-number summary, we find that the distribution is bell-shaped according to the histogram. The Pareto chart cannot show a normal distribution, because it is used for categorical rather than numerical data. The five-number summary is also consistent with a normal distribution: there is no apparent skew either left or right. This is what we expected to see, since these graphs reflect the underlying data.

The difference between categorical and quantitative data in this project is that the categorical data are the colors of the candies in each bag, while the quantitative data are the number of candies in each bag, along with the totals for the rest of the class. The graphs that make sense for categorical data are pie and Pareto charts; the graphs that make sense for quantitative data are box plots and histograms.

A confidence interval is a range between two values that, at a stated level of confidence, is expected to contain a population parameter. This estimate helps quantify the certainty, or margin of error, of our sample study.

Using 99% confidence and the proportion of yellow candies within the sample, we can calculate the margin of error and then generate our confidence interval. With this information we can project our proportion to the population. We are 99% confident that the proportion of yellow candies in the whole population of Skittles is between 0.177 and 0.231.
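The yellow-proportion interval can be reproduced approximately with the usual normal-approximation formula p̂ ± z·sqrt(p̂(1 − p̂)/n). The report does not state the total number of candies counted, so the value `N_CANDIES = 1450` below is an assumption (roughly 24 bags × 60.4 candies), and `proportion_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    # Two-tailed critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Assumption: total candies is about 24 bags x 60.4 per bag; not stated in the report.
N_CANDIES = 1450

low, high = proportion_ci(0.204, N_CANDIES, 0.99)
print(round(low, 3), round(high, 3))
```

Under that assumed total, the sketch lands on the same bounds as the report, 0.177 and 0.231.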

Using 95% confidence, we can estimate the mean number of candies per bag. With our sample of 24 bags, we can calculate the mean number of candies per bag and the margin of error of our interval. Projecting this to the population, we are 95% confident that the mean number of candies per bag is between 59.665 and 61.135. Since candies come in whole (unbroken) units, we round up and state that there are between 60 and 62 candies per bag on average.
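The bounds 59.665 and 61.135 are consistent with a t-interval, x̄ ± t·s/sqrt(n), using the summary statistics reported earlier. The report does not name the method, so treating it as a t-interval is an assumption. The critical value 2.069 (95%, 23 degrees of freedom) is taken from a standard t-table, and `mean_ci` is a hypothetical helper name.

```python
from math import sqrt

# Two-tailed t critical value for 95% confidence and 23 degrees of freedom,
# read from a standard t-table (assumed; the report does not list it).
T_CRIT_95_DF23 = 2.069


def mean_ci(mean, stdev, n, t_crit):
    """Confidence interval for a population mean from summary statistics."""
    margin = t_crit * stdev / sqrt(n)
    return mean - margin, mean + margin


low, high = mean_ci(60.4, 1.74, 24, T_CRIT_95_DF23)
print(round(low, 3), round(high, 3))
```

The margin of error here is about 0.735 candies, which is what separates the sample mean of 60.4 from either bound.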

Hypothesis Tests

A hypothesis is a claim about a population that may or may not be true. Hypothesis testing is the procedure by which the claim is tested. Data are gathered from a sample of the population and used to test the hypothesis. If the data are consistent with the null hypothesis, we fail to reject it; if the data provide sufficient evidence against it, the null hypothesis is rejected.

The first hypothesis test examined the claim that 20% of all Skittles are red, using a .05 significance level. This is a two-tailed test because it uses an equal vs. not-equal claim. The absolute value of the test statistic, -1.371, is less than the critical value of 1.96; therefore, we fail to reject the null hypothesis. There is not sufficient sample evidence to warrant rejection of the claim that 20% of all Skittles are red.
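The decision rule for this two-tailed test can be sketched by comparing the absolute value of the reported test statistic with the critical value. The sketch takes the statistic -1.371 as given rather than recomputing it, since the raw counts are not in the report; `two_tailed_z_decision` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_tailed_z_decision(z, alpha):
    """Return (critical value, p-value, reject?) for a two-tailed z test."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of a statistic at least this extreme in either tail.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return critical, p_value, abs(z) > critical


critical, p_value, reject = two_tailed_z_decision(-1.371, 0.05)
print(f"critical = {critical:.2f}, p-value = {p_value:.3f}, reject = {reject}")
```

Because the p-value exceeds the .05 significance level, the p-value approach gives the same outcome as the critical-value comparison: fail to reject.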

The second hypothesis test examined the claim that the mean number of Skittles per bag is 55, using a .01 significance level. This is also a two-tailed test, due to the equal vs. not-equal claim. The test statistic, 15.211, is greater than the critical value of 2.807; therefore, we reject the null hypothesis. There is sufficient sample evidence to warrant rejection of the claim that the mean number of Skittles in a bag is 55.
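As a rough check, the test statistic can be recomputed from the rounded summary statistics with t = (x̄ − μ₀)/(s/sqrt(n)). This gives about 15.20 rather than the reported 15.211; the small gap presumably comes from the report using unrounded values, which is an assumption. The name `one_sample_t` is hypothetical, and the critical value 2.807 is the one quoted in the report.

```python
from math import sqrt


def one_sample_t(mean, mu0, stdev, n):
    """One-sample t statistic computed from summary statistics."""
    return (mean - mu0) / (stdev / sqrt(n))


# Critical value quoted in the report (two-tailed, alpha = .01, 23 degrees of freedom).
T_CRIT_99_DF23 = 2.807

t_stat = one_sample_t(60.4, 55, 1.74, 24)
reject = abs(t_stat) > T_CRIT_99_DF23
print(f"t = {t_stat:.2f}, reject = {reject}")
```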

Confidence interval estimates for a population mean require simple random sampling, which was accomplished in our study. Each student randomly selected a bag of Skittles from a variety of store locations. The number and color of Skittles within each bag were also random, as the manufacturer did not dictate exact number and color combinations for each bag. The second condition for interval estimates is that the sampling distribution be approximately normal. As shown by the general bell shape of the Skittles Per Bag histogram, the sample data were approximately normally distributed.

The conditions for performing hypothesis tests likewise require simple random sampling and an approximately normal sampling distribution. Based on the sampling method used and the sample data, we believe these conditions have been met.

Though we feel the appropriate conditions have been met to conduct our statistical evaluation, there are still potential errors that could have occurred. Any of the students could have made a mistake while counting. Any of the students or the professor could have made a typographical error while submitting the data to the class for use. We are also making the naive assumption that our region (where all students purchased Skittles) is not special and that the population of Skittles, as a whole, is bagged the same way regardless of geographical destination. If bagging does vary by region, we have not taken the regions into account and we do not know where their borders lie. Because of this, errors can be introduced if students in this class purchased bags on different sides of these borders. The sample they gathered may not correctly reflect either the overall population or the population of any one region.

To limit the possibility of these sampling errors, we could purchase bags of Skittles from a wider geographical area, such as across the state, from various locations in the country, or even from across the world. By purchasing from a wider geographical area we could account for any differences in bagging the candy or other potential deviations made by the manufacturer. Once the bags of candy were obtained, we could have each bag counted and the data recorded and peer-reviewed for accuracy by different individuals to ensure the correct sample data were obtained. This would limit mistakes from human error.

After conducting our statistical analysis, we have found that, given our sample (and assuming its size is adequate rather than merely an artifact of the number of students enrolled in the class), we can make inferences about the population, such as the proportion of a given color of candy and how much candy is packaged per bag. We can also perform hypothesis tests to reject or fail to reject claims about the population. We conclude that the number of candies per bag appears to be approximately normally distributed, and that the number of candies per bag, as well as the proportion of each color, is most likely intentionally set and not random.
