
The purpose of this project is to pull together the statistical concepts we have been studying: calculating values that numerically summarize data and then using those values to create graphical displays. Using confidence interval estimates and hypothesis tests, we attempt to draw conclusions about the proportion of each color and the number of candies per bag of Skittles. While this project would not have been possible without the contributions of the whole class, the observations and graphical illustrations shown below reflect my own work.

Class Proportions

Color    Count   Ratio      Percent
Green    291     291/1639   18%
Orange   315     315/1639   19%
Purple   329     329/1639   20%
Red      376     376/1639   23%
Yellow   328     328/1639   20%
Total                       100%

My Proportions

Color    Count   Ratio      Percent
Green    11      11/60      18%
Orange   13      13/60      22%
Purple   6       6/60       10%
Red      21      21/60      35%
Yellow   9       9/60       15%
Total                       100%

The class proportions range from 18% to 23%, while mine range from 10% to 35%, mainly because my bag had a larger share of red candies (and a smaller share of purple) than the class as a whole. However, red is also the most common color in the class sample. I concluded that my bag was not special or unique in any way. Proportions are relatively consistent.
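The percentages in the two tables can be checked with a short script. This is only a sketch: the counts are copied from the tables above, and the helper name `percentages` is made up for illustration.

```python
# Color counts copied from the class table and from my own bag.
class_counts = {"Green": 291, "Orange": 315, "Purple": 329, "Red": 376, "Yellow": 328}
my_counts = {"Green": 11, "Orange": 13, "Purple": 6, "Red": 21, "Yellow": 9}


def percentages(counts):
    """Return each color's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {color: round(100 * n / total) for color, n in counts.items()}


class_pct = percentages(class_counts)
my_pct = percentages(my_counts)

# Print the class and personal percentages side by side.
for color in class_counts:
    print(f"{color:<7} class {class_pct[color]:>3}%   mine {my_pct[color]:>3}%")
```

Rounded to whole percents, the results agree with the Percent columns in the tables.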

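[Figure: pie chart of the class color counts, with slices labeled by color and count (for example Green; 291, Orange; 315, Yellow; 328)]

[Figure: Pareto chart of the class color counts, bars ordered Red, Purple, Yellow, Orange, Green; vertical axis from 0 to 400]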

The graphs above illustrate the proportion of each color, one as a pie chart and the other as a Pareto chart. Both charts depict the same information; the only difference is presentation. Across the class, the color counts are consistent and fall within a similar range, with red the most frequent color and green the least frequent.

[Figure: histogram of the number of candies per bag. Horizontal axis: candies per bag, 58 to 64. Vertical axis: frequency, 0 to 8.]

The number of Skittles per bag ranged from 58 through 64. The most common count was 61, which 7 bags contained, and this is also close to the mean. A few bags held about 3 fewer candies than the mean and a small percentage held about 3 more, but the distribution is concentrated between 60 and 62, around the mean. I was not surprised by the data: Skittles are packaged by machines, and numerical accuracy is not uncommon for this type of technology. I enjoy Skittles candy and am pleased to announce that I was not the unfortunate one who received only 58 pieces instead of 61 or more. I was in the average range among the rest of the class.

Statistic            Value
Mean                 60.8
Median               61.0
Mode                 61.0
Standard Deviation   1.4
Sample Variance      2.0
Range                6.0
Minimum              58.0
Maximum              64.0
Sum                  1581.0
Count                26.0
Q1                   59.5
Q2                   61
Q3                   62

It was important to define the difference between categorical data and quantitative data for this project. Quantitative data consist of numerical values that represent counts or measurements, while categorical data consist of names or labels that do not represent counts or measurements. Either type of graph could be used depending on the circumstance. To illustrate the ages in years of survey respondents, a quantitative graph would work best because numbers are easier to organize in this situation. For categorical data, pie charts or Pareto charts work best, labeling each category as a simple pie slice or bar to maintain the integrity and simplicity of the information. Calculations for categorical data make the most sense when each slice or bar is proportional to the frequency count for its category and when showing the relative sizes of the components is necessary. Calculations for quantitative data are best used to illustrate a cluster or gap, yearly high or low values, or values such as IQ scores, revealing the distribution of the data while keeping the original data values.

Interval Estimates

Three main concepts are necessary to make and appropriately interpret estimates of proportions: point estimates, confidence intervals, and sample sizes. The sample proportion is the best point estimate of a population proportion, but a point estimate is a single value that gives no indication of how good the estimate actually is and offers no balance between precision and reliability. With a confidence interval the estimate becomes a range rather than a single value, and the higher the confidence level, the more certain we are that the interval contains the true value of the population parameter.

Requirements

For estimating a Population Proportion:

1. Sample is a simple random sample

2. Fixed number of trials, independent trials, two categories of outcomes, and probabilities that remain constant for each trial.

*In our case, our requirements were met.

For estimating a population mean (t or z)

1. Sample is a random sample

2. Population is normally distributed or n > 30

*In our case, requirements were not met. We had 27 samples.

For estimating a population standard deviation or variance

1. Sample is a simple random sample

2. Population must be normally distributed

*Based on the histogram and box plot provided in previous pages, the population is not

normally distributed. Requirements were not met.

Estimates based on Skittles Project

A 99% confidence interval estimate for the true proportion of green candies

We are 99% confident that the interval (15.3%, 20.2%) contains the population proportion of green candies.
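As a check on the interval for green candies, here is a minimal sketch of the usual normal-approximation interval for a proportion, assuming that is the method that was used. The counts (291 green out of 1639) come from the class table; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                               # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_(alpha/2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # margin of error E
    return p_hat - margin, p_hat + margin


# 291 green candies out of 1639 in the class sample, 99% confidence.
low, high = proportion_ci(291, 1639, 0.99)
print(f"99% CI for the proportion of green: ({low:.1%}, {high:.1%})")
```

With these inputs the limits round to 15.3% and 20.2%, which agrees with the interval reported above.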

A 95% confidence interval estimate for the true mean number of candies per bag

We are 95% confident that the interval (60.1, 61.3) contains the true mean number of candies per

bag.

A 98% confidence interval for the standard deviation of the number of candies per bag

We are 98% confident that the interval (1.1, 2.1) contains the standard deviation of the number of

candies per bag.
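The report does not show how the intervals for the mean and the standard deviation were computed. The standard textbook forms are given here, assuming a t interval for the mean (population standard deviation unknown) and a chi-square interval for the standard deviation. The symbols are the usual ones and do not appear in the original: sample mean, sample standard deviation s, sample size n, and critical values with n - 1 degrees of freedom.

```latex
% Confidence interval for the population mean (sigma unknown)
\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}

% Confidence interval for the population standard deviation
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{R}}} \;<\; \sigma \;<\; \sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{L}}}
```

Here the right-tail critical value is the larger of the two chi-square values, so it produces the lower limit for the standard deviation.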

Hypothesis Testing

Hypothesis testing is a method used to test whether a claim or statement about a property of a population is valid. In inferential statistics we can never prove a claim or accept it as true, but we can reject a claim, which allows a new claim to be tested for its validity.

Rare Event Rule for Inferential Statistics:

If, under a given assumption, the probability of a particular observed event is extremely

small, we conclude that the assumption is probably not correct.

Type I error: The mistake of rejecting the null hypothesis when it is actually true.

Type II error: The mistake of failing to reject the null hypothesis when it is actually false.

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are purple.

1) Claim: p = 0.20 (null hypothesis)
   Opposite: p ≠ 0.20 (alternative hypothesis)
2) Significance level: 0.05
3) Test statistic: z = 0.0741; critical value: 1.96. Since 0.0741 < 1.96, the test statistic does not fall in the critical region.
4) Decision: Fail to reject the null hypothesis.
5) Conclusion: There is not sufficient evidence to reject the claim that 20% of candies are purple.
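The purple test can be sketched as a one-proportion z test, assuming the class counts from the table (329 purple out of 1639) were the sample. The function name `one_proportion_z_test` is hypothetical, and the P-value it returns is extra information that the report does not state.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z test of the claim that the population proportion equals p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed P-value
    return z, critical, p_value, abs(z) > critical


z, critical, p_value, reject = one_proportion_z_test(329, 1639, 0.20, 0.05)
print(f"z = {z:.4f}, critical value = {critical:.2f}, reject H0: {reject}")
```

The test statistic rounds to 0.0741, well inside the critical values of plus or minus 1.96, so the decision is to fail to reject the null hypothesis.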

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of

Skittles is 62.

1) Claim: μ = 62 (null hypothesis)
   Opposite: μ ≠ 62 (alternative hypothesis)
2) Significance level: 0.01
3) Test statistic: P-value = 0.0001, which is less than the significance level of 0.01.
4) Decision: Reject the null hypothesis.
5) Conclusion: There is sufficient evidence to reject the claim that the mean number of candies per bag is 62.
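Only the P-value is reported for this test. For a claim about a mean with the population standard deviation unknown, the test statistic would normally be the t statistic below. This is an assumption about the method, and the symbols are the usual textbook ones rather than notation from the report.

```latex
t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}}, \qquad \mu_{0} = 62, \qquad
\text{reject } H_{0} \text{ if } P\text{-value} \le \alpha
```

With the reported P-value of 0.0001 and a significance level of 0.01, the rejection condition holds.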

Summary

The requirements for estimating a population proportion were met, but the requirements for estimating a population mean and a population standard deviation were not. The sample size was below 30 and the population was not normally distributed. Strictly speaking, the methods of these sections do not apply to the mean and standard deviation, and other methods would be more appropriate for such cases. The sampling would be improved if the sample size met the requirement of more than 30. Using other sampling methods, such as systematic, stratified, or cluster sampling, would not necessarily improve the sample until it was larger. The more people included in the sample, the better the sample would reflect the population.

Based on the statistical analysis, we were able to construct 99%, 95%, and 98% confidence intervals for the proportion of green candies, the true mean number of candies per bag, and the standard deviation of the number of candies per bag, respectively. Additionally, we had the opportunity to test certain claims about proportions and means, which aided in understanding the values needed to draw statistical inferences. We tested the claim that 20% of all Skittles candies are purple and concluded that there was not sufficient evidence to reject it. We also tested the claim that the mean number of candies in a bag of Skittles is 62 and concluded that there was sufficient evidence to reject it.
