
Statistics

ST 361 Statistics for Engineers
Graphical Displays

Kimberly Weems
ksweems@ncsu.edu
5260 SAS Hall

Scales of Measure
Nominal Data
Places data in categories
The value is just another name for the group
From the article: Do you believe that extraterrestrial beings have visited Earth at some time in the past? (Believe, Don't believe, Not sure)
Each person is in one of three categories.

Scales of Measure
Ordinal Data
Categories that have an order
From the article: How superstitious are you? (Very, Somewhat, Not very, Not at all)
We know that Very is more than Somewhat, which is more than Not very, etc.

Scales of Measure
Note: Numbers can be assigned to the categories, but doing arithmetic with those numbers does not make much sense.
Code: 4 = Very, 3 = Somewhat, 2 = Not very, 1 = Not at all
Very is not twice as much as Not very.

Scales of Measure
Interval/Ratio
Numbers are actually numbers => they make sense as numbers and can be used that way
From the article: What is your age in years?
Someone who is 40 is twice as old as someone who is 20.
The difference in age between someone 30 and someone 35 is the same as the difference between 25 and 30.

Scales of Measure
Note: Your text refers to this as interval data; some other texts refer to it as ratio data. For the purposes of this course we will not differentiate between the two.
There is no real difference between the methods of analysis.

Other terminology:
Categorical data: nominal and ordinal; gives names to categories
Numeric data: uses meaningful numbers
Quantitative: another name for numeric
Qualitative: another name for categorical

Example: Intro Stat Students


gender   height   textbooks   HSGPA   car
female   69.5     320         3.1     yes
female   65       250         4       no
female   63       150         4       yes
female   64       300         3.35    yes
female   67       90          3.7     yes
female   63       300         3.9     yes
female   60       250         3       yes
female   64       250         3.8     yes
male     72       187         3.1     yes
female   66       150         3.4     no

Why do we care?
The type of data dictates the summary that will be used, so we must choose the analysis to match.
Summaries of categorical data:
Proportions and counts

Example: Superstitious? 2% very, 22% somewhat, 31% not very, 45% not at all.


Why do we care?
Summaries of numeric data.
Averages, medians, standard deviations

Example: Age? Average 35 years
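
As a rough illustration of the two kinds of summary, the sketch below computes counts and proportions for a categorical variable and the average, median, and standard deviation for a numeric one. It assumes the car and height columns of the Intro Stat Students example, transcribed as best the flattened table allows; the function names are made up for this example.

```python
from collections import Counter
from statistics import mean, median, stdev

# Assumed transcription of two columns from the Intro Stat Students example.
car = ["yes", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"]
height = [69.5, 65, 63, 64, 67, 63, 60, 64, 72, 66]


def categorical_summary(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (k, k / n) for category, k in counts.items()}


def numeric_summary(values):
    """Return the average, median, and sample standard deviation."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


car_summary = categorical_summary(car)
height_summary = numeric_summary(height)

for category, (k, p) in car_summary.items():
    print(f"car = {category}: count {k}, proportion {p:.0%}")
print(height_summary)
```

Averaging the car column would be meaningless, which is why the type of data has to be identified before a summary is chosen.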


Graphics
Statistical results are often presented in graphical displays.
1 picture = 1000 words

Help understand the story behind the data.
Visualize the distribution.

The three rules of data analysis won't be difficult to remember:

1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.
2. Make a picture: it shows important features of and patterns in the data. You may also see things that you did not expect: the extraordinary (possibly wrong) data values or unexpected patterns.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.

Graphics
Bar charts: graphical representation of categorical data
Horizontal axis: categories
Vertical axis: count or percentage of subjects in that category

Pie charts: angle represents the proportion of values in each category

Categorical data

Story: More females than males
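
A bar chart of this kind can be sketched in plain text: one bar per category, with the bar length equal to the count. The example below assumes the gender column of the ten-student table (nine female, one male), which may not be the data behind the original figure; `text_bar_chart` is a hypothetical helper.

```python
from collections import Counter

# Assumed data: gender column of the Intro Stat Students example.
gender = ["female"] * 9 + ["male"]


def text_bar_chart(values):
    """Return one text bar per category, longest bar first."""
    counts = Counter(values)
    return [f"{category:<8}| {'#' * k} ({k})" for category, k in counts.most_common()]


bars = text_bar_chart(gender)
print("\n".join(bars))
```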


What about numeric values?


Plot numeric values to see where they are located.
Stem-and-leaf displays
Dot plot: graphic of numeric data

Stem and Leaf Displays


Partition each number in the data into a stem and a leaf.
Constructing a stem-and-leaf display:
1) Determine the stem and leaf partition (5-20 stems).
2) Write the stems in a column with the smallest stem at the top; include all stems in the range of the data.
3) Use only 1 digit in the leaves; drop digits or round off.
4) Record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps.

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
Stem: tens digit; leaf: ones digit
18: stem = 1; leaf = 8; 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
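
The construction steps can be sketched in a few lines of code. This is a minimal version that assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; `stem_and_leaf` is a name chosen for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build a stem-and-leaf display (stem = tens, leaf = ones digit)."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting orders the leaves in each row
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    lines = []
    # Include every stem in the range of the data, even those with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Calling `stem_and_leaf(ages + [95])` adds empty rows for stems 7 and 8, because every stem in the range of the data is listed.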

Suppose a 95-year-old is hired

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages
1) Each measurement is displayed.
2) Values are in ascending order in each stem row.
3) Relatively simple (when the data set is not too large).
Disadvantages
The display becomes unwieldy for large data sets.

Dot Plot
A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.

[Figure: dot plot of the seven soda amounts on a horizontal axis marked at 10, 20, 30, and 40]

Dot Plot
We can see the main cluster is around 10. Smallest at 6, largest at 40. Big gap between 40 and the other values.
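
A text version of the dot plot makes these observations easy to check. The sketch below assumes whole-number values, stacks one "o" per observation above its position on the axis, and reports the largest gap between neighboring values; both function names are invented for this example.

```python
from collections import Counter

soda = [9, 9, 6, 15, 12, 14, 40]  # ounces, from the example above


def dot_plot(values):
    """Return text rows: stacked dots, an axis line, and min/max labels."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = hi - lo + 1
    rows = []
    for level in range(max(counts.values()), 0, -1):
        row = "".join("o" if counts[x] >= level else " " for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * width)
    rows.append(str(lo).ljust(width - len(str(hi))) + str(hi))
    return rows


def largest_gap(values):
    """Return (gap, lower value, upper value) for the widest gap between neighbors."""
    ordered = sorted(values)
    return max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))


plot = dot_plot(soda)
print("\n".join(plot))
print("smallest:", min(soda), "largest:", max(soda), "largest gap:", largest_gap(soda))
```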


Dot Plot
Good for small numbers of values, but becomes cumbersome with many values. For larger numbers of values we could stack up values in categories.


Histograms
Histogram: bar chart of quantitative data
The range of possible values is broken into categories.

Example: Undergraduate university student survey: How much did you spend on textbooks this academic term? Categories: $0 to $100, $101 to $200, etc.
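
Breaking the range into categories can be sketched as follows. The example assumes whole-dollar amounts and uses the ten textbook values from the Intro Stat Students table as a small stand-in for the survey data, which is not reproduced in these slides; the helper names are hypothetical.

```python
import math
from collections import Counter

# Assumed stand-in data: textbooks column of the Intro Stat Students example.
textbooks = [320, 250, 150, 300, 90, 300, 250, 250, 187, 150]


def bin_index(amount, width=100):
    """Index of the category holding amount: 0 for $0-$100, 1 for $101-$200, ..."""
    return max(0, math.ceil(amount / width) - 1)


def bin_label(index, width=100):
    """Human-readable label for a category index."""
    low = 0 if index == 0 else index * width + 1
    return f"${low} to ${(index + 1) * width}"


def histogram(values, width=100):
    """Return (label, count) pairs for every category up to the largest value."""
    counts = Counter(bin_index(v, width) for v in values)
    return [(bin_label(i, width), counts[i]) for i in range(max(counts) + 1)]


table = histogram(textbooks)
for label, count in table:
    print(f"{label:<13}| {'#' * count} ({count})")
```

The height of each bar is the number of values that fall in that category, so the printed rows are the histogram turned on its side.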


Textbooks
About 20 people paid between $601 and $700


Textbooks
Amounts are centered around $400.
More people are around $400 than around $200.
Values are as big as $800 and as low as $0.
Useful for understanding the distribution of quantitative variables:
Where are the main chunks of data? How spread out are the values? What is the shape of the data?

TV Time
About 110 people spent between 11 and 15 hours

TV Time
Main cluster between 0 and 10.
Smallest value 0 and largest value around 60.
Can't be below 0, the minimum possible. Many people are around the lower limit.
Maximum possible is 168. No one is around there or even close.
Values tail off to the right side.

Shapes of distributions
Skewed to the right (or positively skewed)
Long tail to the right.
Generally because individuals are stacked up near a lower limit and unlimited on the upper end.

TV Time
Skewed Right: Long tail to the Right


Shapes of distributions
Skewed to the left (or negatively skewed)
Long tail to the left.
Generally because individuals are stacked up near an upper limit and unlimited on the lower end.

Birth Year
[Figure: histogram of student birth years with the following groups labeled]
Traditional students
Switched majors, five- or six-year students
Nontraditional students
Older students back for a life change

Shapes of distributions
Symmetric
Tails approximately equal in both directions.
Major cluster far from limits on both ends.

Textbooks
Approximately symmetric


Bimodal
[Figure: histogram with two peaks, labeled Peak 1 and Peak 2]

A Variety of Emissions Distributions by Plant Code


Shapes of distributions
Bimodal: two peaks
Caused by two or more groups.
Multimodal: several peaks

Shapes of distributions
Help us understand the data:
Skewed => typically because of a natural limit that subjects are near.
Symmetric => subjects are not near a limit.
Multimodal => multiple distinct groups within the distribution.

Outliers
Outliers: unusual values that do not fit with the rest of the pattern.
May be data entry errors or may be actual unusual values.

Outliers
Recall dot plot
40 might be considered an outlier. It may be a data entry error, or it may be an actual value.
[Figure: dot plot of the soda amounts on a horizontal axis marked at 10, 20, 30, and 40]

Class Problem
The following are 10 observations on October snow cover for Eurasia during the years 1970-1979 (in million km^2):
6.5 12.0 14.9 10.0 10.7 7.9 21.9 12.5 14.5 9.2

Create a stem-and-leaf display of the data. Is there an outlier in the data set?
