Kanmani
Head-in Charge
Department of Education
Manonmaniam Sundaranar
University, Tirunelveli-12.
kan_mani_msc@yahoo.com
APPROPRIATE STATISTICS
FOR DATA ANALYSIS
BASIC CONCEPTS
Population
Collection of all individuals or objects or items
under study and denoted by N
Sample
A part of a population and denoted by n
Variable
Characteristic of an individual or object.
Qualitative and Quantitative variables
Parameter
Characteristic of the population
Statistic
Characteristic of the sample
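Population and sample measures
Size: N (population), n (sample)
Mean: $\bar{x} = \frac{\sum x}{n}$
SD (divisor n): $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
SD (divisor n - 1): $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Proportion: $\frac{x}{n}$
Correlation coefficient: $r = \frac{\mathrm{COV}(x, y)}{\sigma_x \sigma_y}$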
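The measures listed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample scores are illustrative assumptions and do not come from the slides.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(xs) / len(xs)

def sd(xs, ddof=0):
    """Standard deviation with divisor n (ddof=0) or n - 1 (ddof=1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))

def pearson_r(xs, ys):
    """r = COV(x, y) / (sd_x * sd_y), with every term using divisor n."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

# Hypothetical sample of n = 8 scores (not from the slides)
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sample_mean = mean(scores)
sd_n = sd(scores)                # divisor n
sd_n_minus_1 = sd(scores, 1)     # divisor n - 1
# Proportion x / n: share of scores that are at least 5
proportion = sum(1 for s in scores if s >= 5) / len(scores)
```

The n - 1 divisor always gives a slightly larger value than the n divisor, and the gap shrinks as n grows.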
Organise
data
Analyse the
Organised data
Quantitative data set: obtain the modified range, divide it into several classes, and then make a frequency table
Pictorial representation of a data set
Graphical representation of a data set
Dec 6, 2015
Categorical or Qualitative data set: Bar Diagram, Pie Diagram
Quantitative data set: Histogram, Stem-and-leaf display, Pie diagram, Time plot
Locating a central value of the data set is done by the measures Mode, Median, Mean
Studying the variability or dispersion of the data set: quantifying the disparity among the data entries is done by the measures Range, Inter-quartile Range, Variance, SD
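As an illustration of these measures, here is a short sketch using Python's standard `statistics` module. The marks are hypothetical. The quartiles use the module's default "exclusive" method, so other packages may report slightly different quartile values for the same data.

```python
import statistics

# Hypothetical marks, used only for illustration
marks = [55, 60, 60, 65, 70, 75, 80, 95]

# Measures that locate a central value
mode = statistics.mode(marks)
median = statistics.median(marks)
mean = statistics.mean(marks)

# Measures that quantify dispersion
data_range = max(marks) - min(marks)
q1, _, q3 = statistics.quantiles(marks, n=4)  # three quartile cut points
iqr = q3 - q1
variance = statistics.variance(marks)  # sample variance, divisor n - 1
sd = statistics.stdev(marks)           # sample SD, divisor n - 1

print(mode, median, mean, data_range, iqr, round(variance, 2), round(sd, 2))
```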
Sampling Techniques
Probability Sampling
  Simple Random Sample
  Stratified Random Sample
    Proportionate
    Disproportionate
  Systematic Random Sample
  Cluster Sampling
    One Stage
    Two Stage
    Multi Stage
Non-Probability Sampling
  Convenience Sampling
  Quota Sampling
  Judgement Sampling
  Snowball Sampling
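Three of the probability sampling techniques named above can be sketched in a few lines of Python. The function names, the `seed` parameter, and the example strata are assumptions made for illustration. The proportionate allocation simply rounds each stratum's share, so the shares may not always add up exactly to n.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each unit equally likely."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit, with k = N // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def proportionate_stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the population.

    strata maps a stratum name to the list of units in that stratum.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }

# Hypothetical population of 100 numbered units
population = list(range(100))
srs = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 10, seed=1)
stratified = proportionate_stratified_sample(
    {"urban": population[:60], "rural": population[60:]}, 10, seed=1
)
```

A disproportionate stratified sample would replace the rounded share with a sample size fixed separately for each stratum.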
Coding
Data Entry
(Keyboarding)
Data
Analysis
Descriptive
Analysis
Univariate
Analysis
Bivariate
Analysis
Interpretation
Multivariate
Analysis
Statistical Inference
Statistical Inference
  Theory of Estimation
    Point Estimation
    Interval Estimation
  Testing of Hypothesis
    Parametric Test
    Non-Parametric Test
Measurement Scales
Types of measurement scales are
Nominal Scale
Ordinal Scale
Interval scale
Ratio Scale
Scales of Measurement
Scale of Measurement | Scale Qualities | Example(s)
Ratio | Magnitude, Equal Intervals, Absolute Zero | Age, Height, Weight, Percentage
Interval | Magnitude, Equal Intervals | Temperature
Ordinal | Magnitude | Likert Scale, Anything rank ordered
Nominal | None | Names, Lists of words
Appropriate Statistics
Nominal
  [ Cross tabs ] Chi square, Phi, Cramer's, Contingency
  [ Nonparametric ] Chi-square, Runs, Binomial, McNemar, Cochran
Ordinal
  [ Frequencies ] Median, Interquartile range
  [ Nonparametric ] Kolmogorov-Smirnov, Sign, Wilcoxon, Kendall coefficient of concordance, Friedman two-way ANOVA, Mann-Whitney U, Wald-Wolfowitz, Kruskal-Wallis
Interval
  Mean, Standard Deviation, Pearson's product-moment correlation, t test, Analysis of variance, Multivariate analysis of variance (MANOVA), Factor analysis, Regression, Multiple correlation (R)
Ratio
  Coefficient of Variation (CV = SD / M)
Testing Hypotheses About | Number of Samples | Measurement Scale | Test
Frequency Distribution | One | Nominal | Chi-square
Frequency Distribution | Two or more | Nominal | Chi-square
Means | One (large sample) | Interval or Ratio | Z Test
Means | One (small sample) | Interval or Ratio | t Test
Means | Two (large sample) | Interval or Ratio | Z Test
Means | Two (small sample) | Interval or Ratio | t Test
Means | Three or more (small sample) | Interval or Ratio | ANOVA
Proportions | One (large sample) | Interval or Ratio | Z Test
Proportions | One (small sample) | Interval or Ratio | t Test
Proportions | Two (large sample) | Interval or Ratio | Z Test
Proportions | Two (small sample) | Interval or Ratio | t Test
Variance | Two or more samples | Interval or Ratio | ANOVA
Mark, by Class
Class | N | Mean | Std. Deviation | Std. Error Mean
Class A | 5 | 118.00 | 6.671 | 2.983
Class B | 5 | 129.60 | 6.656 | 2.977

Mark: test of the difference between the class means
F = .178, Sig. = .684
Assumption | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference (Lower, Upper)
Equal variances assumed | -2.753 | 8 | .025 | -11.60 | 4.214 | -21.318, -1.882
Equal variances not assumed | -2.753 | 8.000 | .025 | -11.60 | 4.214 | -21.318, -1.882

Class | Sample Mean | SD
Class A | 118.00 | 6.671
Class B | 129.60 | 6.656
t Value = 2.753, P Value = 0.025
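The t value in this output can be reproduced from the reported summary statistics alone. The sketch below assumes the usual pooled-variance (equal variances assumed) form of the two-sample t test; the function name is hypothetical. Because the standard deviations are rounded to three decimals, the result matches the reported t only to about the third decimal place.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic under the equal-variances assumption."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df, se_diff

# Summary statistics for Class A and Class B as reported in the output
t_value, df, se_diff = pooled_t(118.00, 6.671, 5, 129.60, 6.656, 5)
print(round(t_value, 2), df, round(se_diff, 3))
```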
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 2195.200 | 2 | 1097.600 | 18.646 | .000
Within Groups | 706.400 | 12 | 58.867 | |
Total | 2901.600 | 14 | | |
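The mean squares and the F ratio in the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below shows that arithmetic, plus a small helper that computes the sums of squares from raw groups. The function names and the raw example groups are assumptions for illustration; the slides give only the summary table for the three schools.

```python
def f_from_sums_of_squares(ss_between, df_between, ss_within, df_within):
    """Mean squares and F ratio from the entries of an ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

def one_way_anova(groups):
    """Between- and within-group sums of squares and F from raw data."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value

# Values taken from the ANOVA table above
ms_b, ms_w, f_table = f_from_sums_of_squares(2195.200, 2, 706.400, 12)

# Hypothetical raw data for three small groups (not the school marks)
ss_b, ss_w, f_raw = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```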
Mark
Duncan
School
School I
School II
School III
N
5
5
5
74.60
Non-Parametric Tests
5. K Independent samples
Kolmogorov-Smirnov test
Mann-Whitney U Test
Correlation Analysis
Example:
75,79,59,78,84,65
Correlations
 | MATHS | STATISTICS
MATHS: Pearson Correlation | 1 | .968**
MATHS: Sig. (2-tailed) | . | .002
MATHS: N | 6 | 6
STATISTICS: Pearson Correlation | .968** | 1
STATISTICS: Sig. (2-tailed) | .002 | .
STATISTICS: N | 6 | 6
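A significance value for a correlation comes from testing the hypothesis that the population correlation is zero. One standard way to do this, sketched below, converts r into a t statistic with n - 2 degrees of freedom. Whether the software that produced the table uses exactly this route is an assumption here, and the function name is hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t statistic for H0: no linear correlation, with n - 2 df."""
    df = n - 2
    t_value = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t_value, df

# r and N as reported in the Correlations table
t_value, df = t_for_correlation(0.968, 6)
print(round(t_value, 2), df)
```

The larger the resulting t for the given degrees of freedom, the smaller the two-tailed significance value.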
Regression Analysis
Linear regression: $Y = \alpha + \beta X$
Where Y: Dependent variable
X: Independent variable
$\alpha$ and $\beta$: Two constants called regression coefficients
$\beta$: Slope coefficient, i.e. the change in the value of Y corresponding to a one-unit change in X
$\alpha$: Y intercept, i.e. the value of Y when X = 0
The F test is used to test the significance of the linear relationship between the two variables Y and X.
$H_0: \beta = 0$ (There is no linear relationship between Y and X)
School Climate: 25, 34, 55, 45, 56, 49, 65
Academic Achievement: 58, 62, 80, 75, 84, 72, 89
Variables Entered/Removed
Model | Variables Entered | Variables Removed | Method
1 | School Climate | . | Enter

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .978 | .957 | .949 | 2.55330

ANOVA
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 732.832 | 1 | 732.832 | 112.409 | .000
Residual | 32.597 | 5 | 6.519 | |
Total | 765.429 | 6 | | |

Coefficients
Model 1 | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig.
(Constant) | 36.436 | 3.698 | | 9.853 | .000
School Climate | .805 | .076 | .978 | 10.602 | .000
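The main figures in this output can be reproduced from the School Climate and Academic Achievement data with ordinary least squares. The sketch below is a plain-Python illustration with a hypothetical function name, not the procedure the statistics package runs internally.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with R square and F."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy
    ss_residual = syy - ss_regression
    return {
        "slope": slope,
        "intercept": intercept,
        "r_square": ss_regression / syy,
        # F = regression mean square (1 df) / residual mean square (n - 2 df)
        "f": ss_regression / (ss_residual / (n - 2)),
    }

school_climate = [25, 34, 55, 45, 56, 49, 65]
academic_achievement = [58, 62, 80, 75, 84, 72, 89]
fit = simple_linear_regression(school_climate, academic_achievement)
print({name: round(value, 3) for name, value in fit.items()})
```

The fitted line is approximately Academic Achievement = 36.436 + 0.805 × School Climate, in agreement with the Coefficients table.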
Multivariate Analysis
Dependency Techniques
Independency Techniques
Dependence Methods
  How many variables are dependent?
    One Dependent Variable
      Metric (the scales are ratio or interval): Multiple Regression
      Nonmetric (the scales are nominal or ordinal): Multiple Discriminant Analysis
    Several Dependent Variables
      Metric (the scales are ratio or interval): Multivariate Analysis of Variance (MANOVA)
      Nonmetric (the scales are nominal or ordinal): Conjoint Analysis
    Multiple independent and dependent variables: Canonical Analysis
Independence Methods
  Metric (the scales are ratio or interval): Factor Analysis, Cluster Analysis, Metric Multidimensional Scaling
  Nonmetric (the scales are nominal or ordinal): Nonmetric Multidimensional Scaling
Discriminant Analysis
Factor Analysis
Cluster Analysis
Defining the problem: First define the problem and decide upon the variables on which the objects are to be clustered.
Selection of similarity or distance measures: The similarity measure examines the proximity between the objects. Closer or more similar objects are grouped together, and objects that are farther apart are placed in different groups. There are three major methods to measure the similarity between objects:
1. Euclidean Distance measures
2. Correlation coefficient
3. Association coefficients
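As an illustration of the first measure, the sketch below computes the Euclidean distance between objects described by numeric variables and builds the full distance matrix that a clustering procedure would start from. The function names and the example objects are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences across variables."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def distance_matrix(objects):
    """Pairwise Euclidean distances; entry [i][j] compares object i with j."""
    return [[euclidean_distance(p, q) for q in objects] for p in objects]

# Hypothetical objects measured on two variables
objects = [(0, 0), (3, 4), (6, 8)]
distances = distance_matrix(objects)
```

A smaller distance means the two objects are more similar. When the variables are on very different scales, they are commonly standardised first so that no single variable dominates the distance.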
Selection of clustering approach: Next, select the appropriate clustering approach. There are two types of clustering approaches:
1. Hierarchical Clustering approach
2. Non-Hierarchical Clustering approach
The hierarchical clustering approach is either top-down (divisive) or bottom-up (agglomerative). Prominent hierarchical clustering methods are: Single linkage, Complete linkage, Average linkage, Ward's method and Centroid method.
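A bottom-up hierarchical procedure with single linkage can be sketched as follows: start with every object in its own cluster, then repeatedly merge the two clusters whose closest members are nearest to each other. The function name, the stopping rule (a target number of clusters), and the example points are assumptions for illustration. `math.dist` requires Python 3.8 or later.

```python
import math

def single_linkage(points, num_clusters):
    """Agglomerative clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > num_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical objects measured on two variables
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three_clusters = single_linkage(points, 3)
two_clusters = single_linkage(points, 2)
```

Complete linkage would use `max` instead of `min` in `cluster_distance`, and average linkage would use the mean of all pairwise distances.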