Basic Definitions
Descriptive vs. Inferential Statistics
Descriptive Statistics describe, display, and summarize data. You describe data you already have.
• The goal is to turn a bunch of confusing data into useful information.
• It involves summary statistics like means and medians, graphical displays like charts and graphs, and good organization.
Inferential Statistics is when you use data you do have to make educated guesses about data you don’t have. You are making inferences about that missing data.
• Using past data to predict future data is common, as is using one group to make predictions about another. You can use present data to “predict” the past, too.
• E.g. All women in Canada, all Ryerson students, all NHL teams.
• E.g. You interview 200 randomly selected Ryerson students.
• E.g. Height: a person can be 180cm, or 180.5cm, or 180.2867cm, etc.
• E.g. Shoe Size: You can buy a size 10 shoe, or a size 10 ½, but you cannot buy a size 10.35.
An Example of Statistics
Source: Coleman, Michael. “Journal Poll: Clinton Still Ahead in NM.” Albuquerque Journal 5 Nov. 2016
https://www.abqjournal.com/883092/clinton-still-ahead-in-new-mexico.html
The Journal Poll of likely New Mexico voters, conducted Nov. 1-3, showed
Clinton leading Trump 45 percent to 40 percent.
Johnson, the Libertarian Party candidate and former two-term New Mexico
governor, pulled 11 percent support in the new Journal Poll, compared with 24
percent of New Mexicans who supported him in the newspaper’s late September
poll.
[…]
If you can collect the frequency of particular responses, you can illustrate those frequencies using pie charts and bar charts.
• Bar charts make it easier to compare the size of one group to the size of another.
• Pie charts make it easier to compare the size of one group to the size of the whole.
To compare two variables, you can use contingency tables or you can use bar charts with multiple bars per group.
• To describe time series data, line charts work well.
• E.g. Viewing a stock price over time.
• To look for correlations and see how one variable influences another, use scatter charts.
• Display comparisons between groups using pie charts and bar charts.
• To illustrate the distribution of a set of data, you can group the data using frequency tables and use those to make histograms, a special kind of bar chart.
Frequency Tables
In-Class Exercise
We’ll perform a quick survey in class to gather data on how many hours of sleep Ryerson students get
per night. This is not a randomized sample, so it won’t be representative and we couldn’t use it for
inferential statistics. Luckily for us, frequency tables are descriptive statistics so we’re in the clear. Follow
the six steps to making a frequency table, then fill in the other columns.
An Ogive
An ogive is what you get when you convert a cumulative frequency table or a relative cumulative
frequency table (or both) to a line chart.
[Figure: Ogive of Height. Horizontal axis: Height in cm. (165 to 195). Vertical axes: Cumulative Frequency (0 to 12) and Cumulative Relative Frequency (0% to 100%).]
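The numbers behind an ogive like this one can be sketched in a few lines of Python. This is only an illustration: it uses the 12 heights from the mean example in these notes, and it assumes classes of width 5 starting at 165 (read off the chart's horizontal axis), with each class including its lower bound and excluding its upper bound. The function name cumulative_table is made up for this sketch, and the classes used in lecture may differ.

```python
# Heights in cm: the 12 values used in the mean example in these notes.
heights = [182, 187, 194, 168, 181, 174, 168, 174, 178, 179, 190, 165]


def cumulative_table(data, start, width, n_classes):
    """Return one row per class: (upper_bound, frequency, cumulative, relative_cumulative)."""
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        # Assumed convention: a class includes its lower bound and excludes its upper bound.
        frequency = sum(1 for x in data if lower <= x < upper)
        cumulative += frequency
        rows.append((upper, frequency, cumulative, cumulative / len(data)))
    return rows


# Class boundaries 165, 170, ..., 195 are an assumption taken from the chart axis.
table = cumulative_table(heights, start=165, width=5, n_classes=6)
for upper, frequency, cumulative, relative in table:
    print(f"below {upper} cm: frequency={frequency}, "
          f"cumulative={cumulative}, relative cumulative={relative:.1%}")
```

Plotting the cumulative column (or the relative cumulative column) against the upper class boundaries as a line chart would give the ogive.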
Summary Statistics
A summary statistic is a single number that summarizes something about the distribution of your data.
The two major sorts of summary statistics that we need to look at are measures of central tendency and
measures of variability (aka measures of dispersion).
The mean: add up all the data and divide by however many numbers there are.
To get the mean, we calculate:
(182 + 187 + 194 + 168 + 181 + 174 + 168 + 174 + 178 + 179 + 190 + 165)/12 = 178.33
The median: find the middle value.
• If there are an even number of data points, take the mean of the middle two.
To get the median, we put the numbers in order:
165, 168, 168, 174, 174, 178, 179, 181, 182, 187, 190, 194
There are 12 values, so we take the mean of the middle two: (178 + 179)/2 = 178.5
The mid-range: the value halfway between the largest and smallest values. This is almost never used.
To get the mid-range, we take the mean of the largest and smallest values: (165 + 194)/2 = 179.5
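As a quick check on the arithmetic, here is a minimal Python sketch of the three measures of central tendency, applied to the same 12 heights. The function names are chosen for this illustration only.

```python
# Heights in cm from the worked example in these notes.
heights = [182, 187, 194, 168, 181, 174, 168, 174, 178, 179, 190, 165]


def mean(data):
    """Add up all the data and divide by however many numbers there are."""
    return sum(data) / len(data)


def median(data):
    """Middle value of the sorted data; mean of the middle two if the count is even."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def mid_range(data):
    """Value halfway between the smallest and largest values."""
    return (min(data) + max(data)) / 2


print(round(mean(heights), 2))  # 178.33
print(median(heights))          # 178.5
print(mid_range(heights))       # 179.5
```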
In-Class Exercise
Given the five ages of my family, find the mean, median, mode, and mid-range: 31, 29, 7, 6, and 2.
• The range is the difference between the largest value in a set of data and the smallest.
• The variance and standard deviation provide more informative measures of variability, because they use every data point rather than only the two extremes.
• We can compute a sample standard deviation (𝑠) or a population standard deviation (𝜎). Similarly, we can find a sample variance (𝑠²) or a population variance (𝜎²). The process is the same except for one step: the sample versions divide by n − 1 where the population versions divide by N.
𝑠 = √7.5 ≈ 2.74
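The measures of variability can be sketched in Python as well. This is a rough illustration, not the official course method. It assumes the standard convention that the sample variance divides the sum of squared deviations by n − 1 and the population variance divides by N. It uses the 12 heights from the central tendency example, so the family ages exercise is left for you to work by hand. The function names and the sample flag are made up for this sketch.

```python
import math

# Heights in cm from the central tendency example in these notes.
heights = [182, 187, 194, 168, 181, 174, 168, 174, 178, 179, 190, 165]


def data_range(data):
    """Difference between the largest and smallest values."""
    return max(data) - min(data)


def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(data)
    m = sum(data) / n
    squared_deviations = sum((x - m) ** 2 for x in data)
    # Assumed convention: this divisor is the one step that differs.
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


def standard_deviation(data, sample=True):
    """Square root of the matching variance."""
    return math.sqrt(variance(data, sample))


print("range:", data_range(heights))
print("sample variance:", round(variance(heights, sample=True), 2))
print("sample standard deviation:", round(standard_deviation(heights, sample=True), 2))
print("population variance:", round(variance(heights, sample=False), 2))
print("population standard deviation:", round(standard_deviation(heights, sample=False), 2))
```

Because the sample version divides by a smaller number, the sample variance always comes out a little larger than the population variance for the same data.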
X        X − X̄      (X − X̄)²
31       ____       ____
29       ____       ____
7        ____       ____
6        ____       ____
2        ____       ____
Total    ____       ____
Population Variance (σ²): ____
Population Standard Deviation (σ): ____