Scales of Measure
Nominal Data
Places data in categories; a category is just a name for the group.
From the article: Do you believe that extraterrestrial beings have visited Earth at some time in the past? (Believe, Don't believe, Not sure)
People are in one of three categories.
Statistics
Ordinal Data
Categories that have an order.
From the article: How superstitious are you? (Very, Somewhat, Not very, Not at all)
We know that Very is more than Somewhat, which is more than Not very, etc.
Note: Numbers can be assigned to the categories, but doing arithmetic with those numbers does not make much sense.
Code: 4 = Very, 3 = Somewhat, 2 = Not very, 1 = Not at all
Very (4) is not twice as much as Not very (2).
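A minimal Python sketch of this point, assuming the coding above. The dictionary name, variable names, and sample responses are illustrative and not from the article. Ordering the codes is meaningful; taking ratios of them is not.

```python
# Ordinal coding from the slide: a larger code means more superstitious.
CODES = {"Not at all": 1, "Not very": 2, "Somewhat": 3, "Very": 4}

# Hypothetical responses, made up for illustration only.
responses = ["Somewhat", "Very", "Not at all", "Not very", "Somewhat"]

# Sorting by code is meaningful because the categories have an order.
ordered = sorted(responses, key=CODES.get)

# Comparisons are meaningful: "Very" ranks above "Somewhat".
very_ranks_higher = CODES["Very"] > CODES["Somewhat"]

# This ratio can be computed, but it does not mean that "Very" is twice
# as superstitious as "Not very"; the codes only record rank.
ratio_of_codes = CODES["Very"] / CODES["Not very"]

print(ordered)
print(very_ranks_higher, ratio_of_codes)
```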
Interval/Ratio
The numbers are actual numbers: they make sense as numbers and can be used that way.
From the article: What is your age in years?
Someone who is 40 is twice as old as someone who is 20.
The difference in age between someone who is 30 and someone who is 35 is the same as the difference between 25 and 30.
Note: Your text refers to this as interval data; some other texts refer to it as ratio data. For the purposes of this course we will not differentiate the two.
There is no real difference between their methods of analysis.
Other terminology:
Categorical data: nominal and ordinal; gives names to categories.
Numeric data: uses meaningful numbers.
Quantitative: another name for numeric.
Qualitative: another name for categorical.
female  65  250  4     no
female  63  150  4     yes
female  64  300  3.35  yes
female  67   90  3.7   yes
female  63  300  3.9   yes
female  60  250  3     yes
female  64  250  3.8   yes
male    72  187  3.1   yes
female  66  150  3.4   no
Why do we care?
The type of data dictates the summary and the analysis that will be used.
Summaries of categorical data:
Proportions and counts
Example: Superstitious? 2% very, 22% somewhat, 31% not very, 45% not at all.
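As a rough illustration, counts and proportions for a categorical variable can be tallied as in the sketch below. The raw responses are hypothetical: they assume 100 respondents so that the proportions match the percentages in the example above. The variable names are also assumptions.

```python
from collections import Counter

# Hypothetical raw responses, constructed so that the percentages match
# the example (2% very, 22% somewhat, 31% not very, 45% not at all).
responses = (["Very"] * 2 + ["Somewhat"] * 22
             + ["Not very"] * 31 + ["Not at all"] * 45)

counts = Counter(responses)      # count per category
total = sum(counts.values())     # number of respondents
proportions = {cat: n / total for cat, n in counts.items()}

for cat in ["Very", "Somewhat", "Not very", "Not at all"]:
    print(f"{cat:<11} count={counts[cat]:>3}  proportion={proportions[cat]:.0%}")
```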
Summaries of numeric data.
Averages, medians, standard deviations
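A short sketch of these numeric summaries, using the soda amounts from the dot plot example in these slides. The slides do not say which standard deviation is meant; the sample version (dividing by n - 1) is assumed here.

```python
import statistics

# Soda amounts in ounces, from the dot plot example in these slides.
soda_oz = [9, 9, 6, 15, 12, 14, 40]

mean_oz = statistics.mean(soda_oz)      # average
median_oz = statistics.median(soda_oz)  # middle value after sorting
sd_oz = statistics.stdev(soda_oz)       # sample standard deviation (n - 1), an assumption

print(mean_oz, median_oz, round(sd_oz, 2))
```

The mean (15) is above the median (12) because the single value of 40 pulls the average upward.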
Graphics
Statistical results are often presented in graphical displays.
1 picture = 1000 words
Bar charts: graphical representation of categorical data
Horizontal axis: categories
Vertical axis: count or percentage of subjects in that category
[Figure: Categorical data]
Stem & Leaf Display
2) Write the stems in a column with the smallest stem at the top; include all stems in the range of the data.
3) Use only 1 digit in the leaves; drop digits or round off.
4) Record a leaf for each number in the corresponding stem row; ordering the leaves in each row helps.
Data: 18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
Stem: 10s digit; leaf: 1s digit
Example: 18 has stem = 1 and leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
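The construction can be sketched in Python as follows. This is one possible implementation, assuming non-negative whole numbers with the 10s digit as the stem; the function name `stem_and_leaf` is made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative whole numbers."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # stem = 10s digit, leaf = 1s digit
        rows[stem].append(leaf)
    # Include every stem in the range of the data, even one with no leaves.
    return {s: sorted(rows[s]) for s in range(min(rows), max(rows) + 1)}

data = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the leaves within each row corresponds to the advice that ordering the leaves helps.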
Dot Plot
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.
[Figure: dot plot of the soda amounts, with the axis marked at 10, 20, 30, and 40 ounces]
We can see the main cluster is around 10. Smallest at 6, largest at 40. Big gap between 40 and the other values.
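A rough text version of the dot plot, as a sketch. It is drawn with one row per value rather than along a horizontal axis, and the variable names are assumptions. It also computes the largest gap between neighboring values.

```python
from collections import Counter

# Soda amounts in ounces, from the slide.
soda_oz = [9, 9, 6, 15, 12, 14, 40]

counts = Counter(soda_oz)
lo, hi = min(soda_oz), max(soda_oz)

# One text row per whole-ounce value; each dot is one teenager.
lines = [f"{value:>2} | " + "." * counts[value] for value in range(lo, hi + 1)]
print("\n".join(lines))

# The largest gap between neighboring sorted values shows how far 40 sits
# from the rest of the data.
ordered = sorted(soda_oz)
largest_gap = max(b - a for a, b in zip(ordered, ordered[1:]))
print("smallest:", lo, "largest:", hi, "largest gap:", largest_gap)
```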
Good for small numbers of values, but becomes cumbersome with many values. For larger numbers of values we could stack up values in categories.
Histograms
Histogram: bar chart of quantitative data
The range of possible values is broken into categories.
Example: Undergraduate university students survey: How much did you spend on textbooks this academic term? Categories: $0 to $100, $101 to $200, etc.
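The grouping step can be sketched as below, assuming whole-dollar amounts and the $100-wide categories named in the example. The survey answers and the helper name `textbook_bin` are hypothetical, for illustration only.

```python
from collections import Counter

def textbook_bin(amount):
    """Label the $100-wide category for a whole-dollar amount.

    Categories follow the example: $0 to $100, $101 to $200, and so on.
    """
    index = max(0, (amount - 1) // 100)        # 0..100 -> 0, 101..200 -> 1
    low = 0 if index == 0 else index * 100 + 1
    high = (index + 1) * 100
    return f"${low} to ${high}"

# Hypothetical survey answers in whole dollars, for illustration only.
spent = [0, 85, 100, 101, 250, 380, 400, 420, 650, 800]

bin_counts = Counter(textbook_bin(a) for a in spent)
for label, n in bin_counts.items():
    print(f"{label:<14} {'#' * n}")
```

Categories with no values are simply absent from this tally; a real histogram would still show them as empty bars.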
[Figure: Textbooks]
About 20 people paid between $601 and $700
Amounts are centered around $400.
More people are around $400 than around $200.
Values are as high as $800 and as low as $0.
Histograms are useful for understanding the distribution of quantitative variables:
Where are the main chunks of data?
How spread out are the values?
What is the shape of the data?
[Figure: TV Time]
About 110 people spent between 11 and 15 hours.
Main cluster is between 0 and 10.
Smallest value is 0 and largest value is around 60.
Values can't be below 0, the minimum possible, and many people are around that lower limit.
Maximum possible is 168; no one is around there or even close.
Values tail off to the right side.
Shapes of distributions
Skewed to the right (or positively skewed)
Long tail to the right.
Generally because individuals are stacked up near a lower limit and are unlimited on the upper end.
[Figure: TV Time. Skewed right: long tail to the right]
Skewed to the left (or negatively skewed)
Long tail to the left.
Generally because individuals are stacked up near an upper limit and are unlimited on the lower end.
[Figure: Birth Year. Labels: "Switched majors, five- or six-year students" and "Traditional students"]
Symmetric
Tails are approximately equal in both directions.
Major cluster is far from the limits on both ends.
[Figure: Textbooks. Approximately symmetric]
[Figure: Bimodal. Labels: Peak 1 and Peak 2]
Bimodal: two peaks
Caused by two or more groups.
Multi-modal: several peaks
Shapes of distributions
Help us understand the data:
Skewed => typically because of a natural limit that subjects are near.
Symmetric => subjects are not near the limit.
Multi-modal => multiple distinct groups within the distribution.
Outliers
Outliers: unusual values that do not fit with the rest of the pattern.
They may be data entry errors or may be actual unusual values.
Recall the dot plot of soda amounts.
40 might be considered an outlier. It may be a data entry error, or it may be an actual value.
Class Problem
The following are 10 observations on October snow cover for Eurasia during the years 1970-1979 (in million km²):
6.5 12.0 14.9 10.0 10.7 7.9 21.9 12.5 14.5 9.2
Create a stem & leaf display of the data. Is there an outlier in the data set?