Nishanth Kalavakolanu

Kiker 3A Statistics
9/12/16 Data Exploration Project
I have often seen my peers go out after school to grab food from various
restaurants, and I wanted to know how common this practice was. Was it just my group of
friends that regularly went to Mueller for a post-school meal, or did the rest of the senior
class do the same? I used the data exploration project as an opportunity to answer this
question.
My sample was collected using a Facebook poll on the LASA 2017 Facebook page with
the question "How many times a week do you eat out after school?" I received 57 responses to
my poll, with people eating out from 0 to 3 times a week. I felt that using days per week provided
an accurate measurement of how frequently people eat out after school.

The five number summary for my data is as follows:


Min: 0
Q1: 0
Median: 1
Q3: 1
Max: 3

This was calculated using the fivenum function in RStudio. It is interesting to see that the min is
equal to Q1 and the median is equal to Q3. This means that at least a quarter of the people who
responded do not eat out after school at all, and that every response between the median and Q3
was exactly 1 day per week. This makes sense, as the range of my data was very small.
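
As a rough sketch of how this summary could be reproduced, the block below rebuilds the responses from the stemplot counts given later in this report (24, 21, 11, and 1 responses for 0, 1, 2, and 3 days). That reconstruction is an assumption, and the vector name days is only illustrative.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report
# (24 zeros, 21 ones, 11 twos, 1 three). `days` is an illustrative name.
days <- rep(0:3, times = c(24, 21, 11, 1))

length(days)   # sample size
fivenum(days)  # min, lower hinge (Q1), median, upper hinge (Q3), max
```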

Here are some other statistics that I calculated using their corresponding functions in R.
The range of my data is pretty small, as there are only 5 days in a school week and nobody ate
out more than 3 times a week.
Mean: 0.807
Range: 3
Standard Deviation: 0.811
Variance: 0.658
IQR: 1
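
The calls behind these numbers would look roughly like the sketch below. It assumes the responses are rebuilt from the stemplot counts in this report, and days is an illustrative variable name.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))

mean(days)         # arithmetic mean
diff(range(days))  # range as a single number: max minus min
sd(days)           # sample standard deviation (n - 1 in the denominator)
var(days)          # sample variance, the square of sd(days)
IQR(days)          # Q3 minus Q1 using R's default quantile method
```

Note that range(days) on its own returns the min and max as a pair, so diff() is used to turn it into one number.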

Although the range of my data was fairly small, it did contain one outlier. This outlier was
found using the 1.5 IQR method.
Q3 of the data is 1. The IQR is also 1, and 1.5 x the IQR is 1.5.
Q3 + 1.5 x IQR = 2.5
Only one person ate out more than 2.5 days a week, making them the one outlier.
Q1 was 0, so the lower cutoff is Q1 - 1.5 x IQR = -1.5. Since there are no negative values in
the data, there are no outliers on the left.
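
The 1.5 IQR check can be sketched in R as follows. The data vector is an assumption rebuilt from the stemplot counts in this report, and the names days, lower_fence, and upper_fence are illustrative.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))

q1  <- unname(quantile(days, 0.25))
q3  <- unname(quantile(days, 0.75))
iqr <- q3 - q1

lower_fence <- q1 - 1.5 * iqr  # values below this count as outliers
upper_fence <- q3 + 1.5 * iqr  # values above this count as outliers

# Values that fall outside either fence
days[days < lower_fence | days > upper_fence]
```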

The outlier and 5 number summary can be easily visualized in this boxplot

Here is a histogram that helps further visualize the data.
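
A sketch of how the boxplot and histogram could be regenerated is below. It assumes the responses are rebuilt from the stemplot counts in this report. The bin edges are my own choice so that each whole number of days gets its own bar, not necessarily the settings used for the original figures.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))

# plot = FALSE returns the numbers behind each plot; drop it to draw the plot.
bp <- boxplot(days, plot = FALSE)
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # points that would be drawn individually as outliers

# Bin edges at the half-integers (an illustrative choice)
h <- hist(days, breaks = seq(-0.5, 3.5, by = 1), plot = FALSE)
h$counts  # bar heights, one bar per whole number of days
```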


This stemplot gives a good visualization of the frequencies of the data collected. You can see
that as the days per week increases, the number of people who eat out starts to fall at quite a
quick rate.
0 | 000000000000000000000000
1 | 000000000000000000000
2 | 00000000000
3 | 0

2 | 0 = 2.0 days per week

If we add 100 to every value in the data, we get a new 5 number summary, which is the same as
the previous one except with 100 added to all the values. The sample size remains the same
at 57.
Min: 100
Q1: 100
Median: 101
Q3: 101
Max: 103

The mean for our new set of data is also the same as the previous mean with 100 added,
giving us a mean of 100.807. Because the spread of the data did not change when 100 was
added, the standard deviation and the variance remain the same, with values 0.811 and 0.658
respectively. Similarly, the IQR remained at 1 since the spread didn't change.
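
The effect of adding a constant can be checked with a short sketch like the one below. The data vector is an assumption rebuilt from the stemplot counts in this report, and days and shifted are illustrative names.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days    <- rep(0:3, times = c(24, 21, 11, 1))
shifted <- days + 100

fivenum(shifted) - fivenum(days)  # every summary value moves up by 100
mean(shifted) - mean(days)        # so does the mean
c(sd(days), sd(shifted))          # measures of spread are unchanged
c(var(days), var(shifted))
c(IQR(days), IQR(shifted))
```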

Because the spread stayed the same, the outliers also remained the same. Q3 of the data is 101.
The IQR is 1, and 1.5 x the IQR is 1.5.
Q3 + 1.5 x IQR = 102.5
Only one value is greater than 102.5, making it the one outlier.
Q1 was 100, so the lower cutoff is 98.5. Since there are no values under 100 in the data, there
are no outliers on the left.

Here is a stemplot of this new data


100 | 000000000000000000000000
101 | 000000000000000000000
102 | 00000000000
103 | 0

101 | 0 = 101.0

The outlier and 5 number summary can be easily visualized in this boxplot

Here is a histogram to also help visualize the data


Both the stemplot and the boxplot look remarkably similar to those of the original data, as the
spread remains the same.

If we increase the numbers in our data by 50%, which can be done by multiplying all
the values by 1.5, we get a new 5 number summary. This is the same as the original one
except that all the values have been multiplied by 1.5. The sample size remains the same at 57.

Min: 0.0
Q1: 0.0
Median: 1.5
Q3: 1.5
Max: 4.5

Because the spread of the new data set has changed, there will be a difference in values that
measure spread such as standard deviation, variance, and IQR.

Mean: 1.21
Range: 4.5
Standard Deviation: 1.217
Variance: 1.48
IQR: 1.5

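With the exception of variance, all of these values are the original values multiplied by 1.5.
The variance is instead multiplied by 1.5 squared, or 2.25, because it is measured in squared
units.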
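
A quick way to see which factor each statistic picks up is to divide the new value by the old one, as in this sketch. The data vector is an assumption rebuilt from the stemplot counts in this report, and days and scaled are illustrative names.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days   <- rep(0:3, times = c(24, 21, 11, 1))
scaled <- days * 1.5

fivenum(scaled)            # each original summary value times 1.5

mean(scaled) / mean(days)  # mean scales by 1.5
sd(scaled)   / sd(days)    # standard deviation scales by 1.5
IQR(scaled)  / IQR(days)   # IQR scales by 1.5
var(scaled)  / var(days)   # variance scales by 1.5^2 = 2.25
```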

Using the 1.5 IQR method, although there was a change in spread, the outliers stayed the
same, with only 1 outlier at 4.5.

Q3 of the data is 1.5. The IQR is also 1.5, and 1.5 x the IQR is 2.25.
Q3 + 1.5 x IQR = 3.75
Only one value is greater than 3.75, giving us one outlier at 4.5.
Q1 was 0, and since there are no negative values in the data, there are no outliers on the left.

Here is a stemplot of the data:


0 | 000000000000000000000000
1 | 555555555555555555555
2 |
3 | 00000000000
4 | 5

2 | 0 = 2.0

Here are some charts and graphs to better visualize the data

Although my data is nowhere close to symmetrical and normally distributed, if we
assume it to be so, the qnorm function of R calculates that the top 10% of the data would
lie above 1.85 days per week eating out. To highlight how far off this is, if we take the actual
number of people who ate out more than 1.85 times per week, 12 people, and divide it by the
sample size of 57, we see that in actuality around 21% of people eat out more than 1.85 times
per week.
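
The comparison between the normal model and the actual responses can be sketched as below. The data vector is an assumption rebuilt from the stemplot counts in this report, and days and cutoff are illustrative names.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))

# Value with 10% of a normal distribution above it (the 90th percentile),
# using the sample mean and standard deviation as the model's parameters.
cutoff <- qnorm(0.90, mean = mean(days), sd = sd(days))
cutoff

# Share of the actual responses that exceed the same cutoff
mean(days > cutoff)
```

Taking the mean of a TRUE/FALSE vector gives the proportion of TRUE values, which is why mean(days > cutoff) returns the observed share directly.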

I used the pnorm function of R to calculate the probability that a value was greater than 5 units
above the mean assuming my data was normally distributed. This should yield an extremely
small value because the range of my data is only 3 and I had no data more than 3 units from the
mean.
Typing 1-pnorm((5+mean(data)), mean(data), sd(data)) into R gives the value 3.603393e-10.
This number is extremely tiny, as was to be expected from my data.
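
For reference, the same upper-tail probability can be written two ways in R, as sketched below. The data vector is an assumption rebuilt from the stemplot counts in this report. The names days, m, s, and the two result variables are illustrative.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))
m <- mean(days)
s <- sd(days)

# Form used in the report: one minus the lower-tail probability
tail_by_subtraction <- 1 - pnorm(m + 5, mean = m, sd = s)

# Same tail requested directly, which avoids subtracting from a number near 1
tail_direct <- pnorm(m + 5, mean = m, sd = s, lower.tail = FALSE)

tail_by_subtraction
tail_direct
```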

By using the pnorm function and typing
(pnorm((2+mean(data)), mean(data), sd(data)))-(pnorm((-3+mean(data)), mean(data), sd(data)))
into R, we calculate the probability of a value being below 2 units above the mean, and subtract
from it the probability of a value being below 3 units below the mean. The probability that R
returns is 0.9930324, or 99.3%, which is the probability that a value would lie between 3 units
below the mean and 2 units above the mean.
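
The same interval probability can also be computed on the standard normal scale by dividing each distance from the mean by the standard deviation. The sketch below shows both forms. The data vector is an assumption rebuilt from the stemplot counts in this report, and the variable names are illustrative.

```r
# Assumption: raw responses rebuilt from the stemplot counts in this report.
days <- rep(0:3, times = c(24, 21, 11, 1))
m <- mean(days)
s <- sd(days)

# P(m - 3 < X < m + 2) for a normal model with mean m and sd s
p_interval <- pnorm(m + 2, mean = m, sd = s) - pnorm(m - 3, mean = m, sd = s)

# Same probability using z-scores on the standard normal
p_interval_z <- pnorm(2 / s) - pnorm(-3 / s)

p_interval
p_interval_z
```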

From this data I can conclude that the most common answer was not eating out after school at
all, because the mode of my data was 0 days per week. I can also conclude that as days per week
increased, fewer people ate out. The histograms and stemplots support this conclusion, as
frequency quickly dropped off as days per week increased. One explanation for my conclusion is
that eating out is pretty expensive and many seniors would rather save their money than spend
it eating out one or more times a week after school. These results somewhat surprised me, as I
thought more seniors were eating out after school.
