estimation
learning objectives:
Parameters
Random sample
estimation
Statistics
Statistical estimation
Estimate
CI for the population means
95% CI = $\bar{x} \pm 1.96\,\mathrm{SEM}$
99% CI = $\bar{x} \pm 2.58\,\mathrm{SEM}$
$\mathrm{SEM} = \mathrm{SD} / \sqrt{n}$
Interval estimation
Confidence interval (CI)
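[Figure: standard normal curve; z axis from -3.0 to 3.0; areas of 34%, 14% and 2% in successive one-SD bands on each side of the mean; critical values -2.58, -1.96, 1.96 and 2.58 marked]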
Interval estimation
Confidence interval (CI), interpretation and example
[Figure: histogram of age; x axis "Age in years" from 22.5 to 60.0, y axis "Frequency" from 0 to 50]
$\bar{x}$ = 41.0, SD = 8.7, SEM = 0.46, 95% CI (40.0, 42.0), 99% CI (39.7, 42.1)
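The interval formulas can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are illustrative, and the inputs are the rounded summary values quoted above, so the recomputed limits may differ from the slide's by about 0.1.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def confidence_interval(mean, sem_value, z):
    """Return (lower, upper) for mean ± z * SEM."""
    margin = z * sem_value
    return (mean - margin, mean + margin)

# Rounded summary values from the slide. The sample size n is not given
# there, so the quoted SEM is used directly instead of calling sem().
mean_age, sem_age = 41.0, 0.46

ci95 = confidence_interval(mean_age, sem_age, 1.96)  # 95% CI
ci99 = confidence_interval(mean_age, sem_age, 2.58)  # 99% CI

print("95% CI: ({:.1f}, {:.1f})".format(*ci95))
print("99% CI: ({:.1f}, {:.1f})".format(*ci99))
```

The 99% interval is wider than the 95% interval because a larger multiplier (2.58 instead of 1.96) is applied to the same SEM.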
Testing of hypotheses
learning objectives:
Scientific knowledge
Reason and intuition
Empirical observation
Systematic error
CHANCE
[Figure: histogram of AGE; x axis from 23.8 to 58.8, frequency axis from 0 to 70, with green lines marking the cut-off values]
If our observed age value lies outside the green lines, the
probability of getting a value as extreme as this if the null
hypothesis is true is < 5%
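The "less than 5%" statement can be illustrated numerically. The sketch below assumes the test statistic is a z value that follows a standard normal distribution when the null hypothesis is true; that assumption and the function names are illustrative and do not come from the slides.

```python
import math

def two_sided_p(z):
    """Probability of a standard normal value at least as extreme as z (both tails)."""
    return math.erfc(abs(z) / math.sqrt(2))

def is_significant(z, alpha=0.05):
    """True when the two-sided p-value falls below the chosen level alpha."""
    return two_sided_p(z) < alpha

for z in (1.0, 1.96, 2.58):
    print("z = {:.2f}  p = {:.4f}  reject at 5%: {}".format(
        z, two_sided_p(z), is_significant(z)))
```

Values beyond ±1.96 give p below 0.05 and values beyond ±2.58 give p below 0.01, which matches the multipliers used for the 95% and 99% confidence intervals.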
Testing of hypotheses
Definition of p-value.
irreparable damage
treated but not harmed
would be done
by the treatment
Multiple comparison
[Figure labels: skewness, kurtosis]
Some concepts related to the statistical
methods.
Degrees of freedom
the number of scores, items, or other units in
the data set, which are free to vary
$\chi^2 = \sum_i \frac{1}{f_{ei}} (f_{oi} - f_{ei})^2$
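Critical values of $\chi^2$ (tabulated values for df = 1):
$\chi^2 > 3.841$: $p < 0.05$
$\chi^2 > 6.635$: $p < 0.01$
$\chi^2 > 10.83$: $p < 0.001$
Example: $\chi^2 = 14.2$, df = 3 (4 - 1), $0.0005 < p < 0.05$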
Null hypothesis is rejected at 5% level
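The tail probabilities behind these cut-offs can be computed without statistical tables. The function below is a sketch using only the Python standard library; the name `chi2_sf` is illustrative, and it relies on the closed-form expressions for the chi-square upper tail that hold for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# The three cut-offs quoted above, evaluated at df = 1
for cutoff in (3.841, 6.635, 10.83):
    print("chi2 = {:6.3f}, df = 1: p = {:.4f}".format(cutoff, chi2_sf(cutoff, 1)))

# The example statistic from the slide
print("chi2 = 14.2, df = 3: p = {:.4f}".format(chi2_sf(14.2, 3)))
```

For the example, the computed p is roughly 0.003, inside the range quoted on the slide and below 0.05, so the null hypothesis is rejected at the 5% level.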
Selected nonparametric tests
Chi-Square test.
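$F_{rc} = \frac{1}{N} f_r f_c$
where $f_r$ and $f_c$ are the row and column totals and $N$ is the grand total,
then
$\chi^2 = \sum_i \sum_j \frac{1}{F_{ij}} (f_{ij} - F_{ij})^2$
df = (r - 1)(c - 1), where r and c are the numbers of rows and columns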
Selected nonparametric tests
Chi-Square test. Example
Question: whether men are treated more aggressively for
cardiovascular problems than women?
Observed frequencies
                 Sex
Cardiac Cath     male    female    Row total
No               15      16        31
Yes              45      24        69
Column total     60      40        100
Selected nonparametric tests
Chi-Square test. Example
Expected frequencies
                 Sex
Cardiac Cath     male    female    Row total
No               18.6    12.4      31
Yes              41.4    27.6      69
Column total     60      40        100
Selected nonparametric tests
Chi-Square test. Example
Result:
p > 0.05
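The worked example can be reproduced step by step. The following sketch, with illustrative function names, computes the expected frequencies from the row and column totals of the observed table, then the chi-square statistic and its degrees of freedom.

```python
# Observed counts from the example: rows are Cardiac Cath (No, Yes),
# columns are Sex (male, female).
observed = [
    [15, 16],
    [45, 24],
]

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Return (statistic, df) for a two-way table of observed counts."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

expected = expected_frequencies(observed)
stat, df = chi_square(observed)

print("expected:", [[round(e, 1) for e in row] for row in expected])
print("chi-square = {:.2f}, df = {}".format(stat, df))
print("exceeds 3.841 (5% level, df = 1):", stat > 3.841)
```

The statistic is about 2.52 with df = 1, which is below the 5% critical value of 3.841 and agrees with the result p > 0.05.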
Techniques include:
Factor Analysis / Principal Components
Analysis
Hierarchical Clustering
K-Means Cluster
Non-Linear Principal Components Analysis
(PRINCALS/CATPCA)
The new Two-Step Cluster
Which Technique to Use?
•Cluster Analysis
•Categories
•Factor Analysis
  •Exploratory
  •Confirmatory
•Discriminant Analysis
•AnswerTree
Which Test to use?
Factor Analysis - to find patterns within variables
Categories - use if data doesn’t fit assumptions for Factor Analysis
Cluster Analysis - to find patterns between individuals
Two-Step Cluster - to use with both categorical and continuous variables
Discriminant Analysis - to look for differences between groups, try to predict target variable
AnswerTree - combinations of data, to predict target variable
Multivariate Analysis
[Figure: discriminant function plot of group centroids for "Brand usually use" (Rambo, Brad and Clint AP Spray and Roll-on); vertical axis "Function 2" from -4 to 4]
Link to more information
Also useful:
http://www.spss.com/pdfs/S115AD8-1202A.pdf
http://www.norusis.com/pdf/SPC_v13.pdf
Brand usually use by Cluster (Percent)
                     Cluster
                     1       2       3       4       5       6       Combined
Rambo AP Spray       .0%     18.1%   52.3%   .0%     29.6%   .0%     100.0%
Rambo AP Roll-on     70.3%   29.7%   .0%     .0%     .0%     .0%     100.0%
Brad AP Spray        .0%     .0%     .0%     100.0%  .0%     .0%     100.0%
Brad AP Roll-on      .0%     3.6%    .0%     96.4%   .0%     .0%     100.0%
Clint AP Spray       .0%     .4%     .0%     .0%     .0%     99.6%   100.0%
Clint AP Roll-on     .0%     14.3%   .0%     85.7%   .0%     .0%     100.0%
Employment Status by Cluster
Cluster 2 (‘Clint’ roll-on) is largely made up of part-time,
retired and not working respondents. Cluster 4 also has
a high number of retired respondents, while Cluster 6
(‘Clint’ spray) also has a high percentage of part-time
and unemployed respondents.
employ Employment Status (Percent)
                       Cluster
                       1       2       3       4       5       6       Combined
Full time employment   24.5%   2.3%    29.7%   13.8%   16.8%   12.9%   100.0%
Part-time employment   11.9%   61.9%   4.8%    2.4%    .0%     19.0%   100.0%
Not employed           .0%     79.3%   .0%     9.8%    .0%     10.9%   100.0%
Student                .0%     91.9%   .0%     1.0%    .0%     7.1%    100.0%
Retired                .0%     61.1%   .0%     33.3%   5.6%    .0%     100.0%
Age Group by Cluster (Percent)
               Cluster
               1       2       3       4       5       6       Combined
Under 18       .0%     96.8%   .0%     3.2%    .0%     .0%     100.0%
18-24          .0%     57.2%   .0%     6.9%    27.6%   8.3%    100.0%
25-34          18.3%   11.1%   47.4%   10.5%   .0%     12.7%   100.0%
35-44          23.0%   3.8%    44.6%   14.2%   .0%     14.4%   100.0%
45-54          29.7%   5.5%    .0%     13.1%   39.5%   12.2%   100.0%
55-64          15.5%   38.7%   .0%     15.2%   19.1%   11.6%   100.0%
65 or over     .0%     68.8%   .0%     18.8%   .0%     12.5%   100.0%
[Figure: chart panels titled "TwoStep Cluster Number = 4" and "TwoStep Cluster Number = 6"]
[Figure: software screenshot with labels: Results Window (not shown), Output Window (not shown), Editor Window, Explorer Window, Libraries Folder]
bar graph?
double bar graph?
Histogram?
Bar Graph
A bar graph can be used to display and compare data.
The scale should include all the data values and be easily divided into equal intervals.
[Figure: bar graph with bars for Spanish, Mandarin, Hindi and English on a scale from 0 to 1000]
How to interpret a Bar Graph?
•The bar graph shows Mr. Snowden’s students by gender and band membership.
How many of Mr. Snowden’s students are band members?
How many of Mr. Snowden’s students are not band members?
[Figure: bar graph with bars for Female band, Female not band, Male band and Male not band; vertical axis from 0 to 7]
Double Bar Graph
Can be used to compare two related sets of data
[Figure: double bar graph; categories 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr; vertical axis from 0 to 90]
How to make a Double-
Bar Graph?
Choose a scale and
interval for the vertical
axis.
Draw a pair of bars for
each country’s data. Use
different colors to show
males and females.
Label the axes and give
the graph a title.
Make a key to show what
each bar represents.
The table shows the highway speed limits on interstate roads within three states.

State      Urban      Rural
Florida    65 mi/h    70 mi/h
Texas      70 mi/h    70 mi/h
Vermont    55 mi/h    65 mi/h

Choose a scale
[Figure: empty vertical axis with a scale from 0 to 80; key: Urban, Rural]
Step 2 Draw a pair of bars for each
state’s data. Use different colors
to show urban and rural.
[Figure: paired bars for Florida, Texas and Vermont on a vertical axis from 0 to 80]
Step 3 and 4
Label the axes and give the graph a title.
Make a key to show what each bar represents.
[Figure: completed double bar graph titled "Speed Limit on Interstate Roads"; vertical axis "Speed Limit (mi/h)" from 0 to 80; horizontal axis Florida, Texas, Vermont; key: Urban, Rural]
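The steps for the speed-limit graph can be sketched in Python. This is a text-only illustration using the standard library, not a plotting recipe from the slides: the function names, the interval of 20 and the bar characters are assumptions chosen for the example.

```python
import math

# Speed limits from the table, in mi/h
speed_limits = {
    "Florida": {"Urban": 65, "Rural": 70},
    "Texas":   {"Urban": 70, "Rural": 70},
    "Vermont": {"Urban": 55, "Rural": 65},
}

def axis_maximum(values, interval):
    """Smallest multiple of `interval` that is at least the largest value."""
    return math.ceil(max(values) / interval) * interval

def text_double_bar_graph(data, interval=20, units_per_char=2):
    """Draw one pair of bars per state: '#' for Urban, '=' for Rural."""
    all_values = [v for pair in data.values() for v in pair.values()]
    top = axis_maximum(all_values, interval)        # choose the scale
    lines = ["Speed Limit on Interstate Roads (mi/h), scale 0 to {}".format(top)]
    for state, pair in data.items():                # a pair of bars per state
        for series, value in pair.items():
            mark = "#" if series == "Urban" else "="
            bar = mark * (value // units_per_char)
            lines.append("{:<8}{:<6}{} {}".format(state, series, bar, value))
    lines.append("Key: # Urban, = Rural")           # key for the two series
    return "\n".join(lines)

chart = text_double_bar_graph(speed_limits)
print(chart)
```

With an interval of 20 the largest value, 70, rounds up to an axis maximum of 80, the same top value used on the slides.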
Histogram
Step 1
Make a frequency table of the data. Be sure to use equal intervals.

Number of hours of TV (tally)
1  II             6  III
2  IIII           7  IIII - IIII
3  IIII - IIII    8  III
4  IIII - I       9  IIII
5  IIII - III

Number of hours of TV    Frequency
1-3                      15
4-6                      17
7-9                      16
Step 2
Choose an appropriate scale and interval for the
vertical axis. The greatest value on the scale should
be at least as great as the greatest frequency.
[Figure: empty axes; vertical scale 0, 4, 8, 12, 16, 20; horizontal intervals 1-3, 4-6, 7-9]
Step 3
Draw a bar for each interval. The height of the bar is the frequency for
that interval. Bars must touch but not overlap.
Label the axes and give the graph a title.
[Figure: histogram titled "Hours of Television Watched"; horizontal axis "Hours" with intervals 1-3, 4-6, 7-9; vertical axis "Number of students" from 0 to 20]
[Figure: completed histogram "Hours of Television Watched"; horizontal axis "Hours" (1-3, 4-6, 7-9); vertical axis "Number of students" (0 to 20)]
The list below shows the results
of a typing test in words per
minute. Make a histogram of
the data.
62, 55, 68, 47, 50, 41, 62, 39,
54, 70, 56, 70, 56, 47, 71, 55,
60, 42
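One way to approach this exercise is to build the frequency table first and then draw one bar per interval. The sketch below is illustrative only: the function name and the choice of intervals of width 10 starting at 30 are assumptions, and other equal-width intervals would be equally valid.

```python
# Typing test results in words per minute, as listed above
scores = [62, 55, 68, 47, 50, 41, 62, 39,
          54, 70, 56, 70, 56, 47, 71, 55,
          60, 42]

def frequency_table(data, start, width):
    """Count the values in equal intervals start..start+width-1, and so on."""
    table = {}
    low = start
    while low <= max(data):
        high = low + width - 1
        table["{}-{}".format(low, high)] = sum(low <= x <= high for x in data)
        low += width
    return table

# Assumed intervals: 30-39, 40-49, ... (width 10)
table = frequency_table(scores, start=30, width=10)

# One row per interval; the run of '#' plays the role of the bar
for interval, count in table.items():
    print("{:<6}{:>3}  {}".format(interval, count, "#" * count))
```

The greatest frequency in this table is 6, so a vertical scale from 0 to 6 or 0 to 8 would satisfy the rule from Step 2.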
Essential Information
Commonly used visual tools
Charts:
Bar
Line
Pie
XY
Area
Thematic map