Basic Definitions
- Population: the collection of all individuals, objects, or items under study; its size is denoted by N.
- Sample: a part of the population; its size is denoted by n.
- Variable: a characteristic of an individual or object.
- Parameter: a characteristic of the population.
- Statistic: a characteristic of the sample.
Qualitative and Quantitative variables
NOTATIONS OF POPULATION AND SAMPLE

Characteristic            Population                      Sample
Size                      N                               n
Mean                      μ                               x̄ = Σx / n
Standard deviation        σ = √( Σ(x − μ)² / N )          s = √( Σ(x − x̄)² / (n − 1) )
Proportion                P                               p = x / n
Correlation coefficient   ρ                               r = Cov(x, y) / (s_x · s_y)
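The sample statistics above can be computed directly from their definitions. The sketch below uses small hypothetical data (not from the slides) to illustrate the sample mean, sample standard deviation, and correlation coefficient:

```python
import math

# Illustrative paired observations (hypothetical data)
x = [4.0, 7.0, 6.0, 5.0, 8.0]
y = [2.0, 6.0, 5.0, 4.0, 8.0]
n = len(x)

mean_x = sum(x) / n                                                    # x-bar = (sum x) / n
mean_y = sum(y) / n
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))         # sample SD s
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
cov_xy = sum((xi - mean_x) * (yi - mean_y)
             for xi, yi in zip(x, y)) / (n - 1)                        # sample covariance
r = cov_xy / (s_x * s_y)                                               # r = Cov(x, y) / (s_x s_y)
```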
Organise data
Dr.R.RAVANAN, Presidency College
Sampling Techniques
Sampling (flowchart summary):
- Probability sampling: Simple Random Sampling, Stratified Random Sampling (proportionate or disproportionate), Cluster Sampling (one-stage, two-stage, or multi-stage)
- Non-probability sampling: Convenience Sampling, Judgement Sampling, Quota Sampling, Snowball Sampling
Descriptive Analysis
Univariate Analysis
Bivariate Analysis
Multivariate Analysis
Interpretation
Statistical Inference
Statistical inference (flowchart summary):
- Theory of Estimation: Point Estimation, Interval Estimation
- Testing of Hypothesis: Parametric Tests, Non-Parametric Tests

Jun 16, 2011, Dr.R.RAVANAN, Presidency College
Measurement Scales
Types of measurement scales are Nominal Scale Ordinal Scale Interval scale Ratio Scale
- Nominal: arbitrary labels with no ordering; e.g., M/F coded 0/1, political parties on a left-to-right spectrum given labels 0, 1, 2 (it makes no sense to state that M > F).
- Ordinal: ordering exists, but differences are not meaningful; e.g., Likert scales (rate your degree of satisfaction on a scale of 1..5), restaurant ratings.
- Interval: differences make sense, but ratios do not; e.g., temperature (C, F), dates.
- Ratio: both differences and ratios are meaningful; e.g., height, weight, age, length.
What is a natural zero? Some scales of measurement have a natural zero and some do not. For example, height and weight have a natural zero at no height or no weight. Consequently, it makes sense to say that 2 m is twice as large as 1 m; both of these variables are ratio scale. On the other hand, year and temperature (C) do not have a natural zero. The year 0 is arbitrary, and it is not sensible to say that the year 2000 is twice as old as the year 1000. Similarly, 0 C is arbitrary (why pick the freezing point of water?), and it again does not make sense to say that 20 C is twice as hot as 10 C. Both of these variables are interval scale.
Scales of Measurement
Scale of Measurement   Level   Scale Qualities                             Permissible operations                                        Example(s)
Ratio                  4       Magnitude, Equal Intervals, Absolute Zero   Addition/subtraction and multiplication/division of scale values   Age, Height, Weight, Percentage
Interval               3       Magnitude, Equal Intervals                  Addition and subtraction of scale values                      Temperature
Ordinal                2       Magnitude                                   Greater-than or less-than operations                          Likert Scale, anything rank ordered
Nominal                1       None                                        Classification and counting only                              Names, Lists of words
Appropriate Statistics
Nominal
[ Cross tabs ]
Chi-square, Phi, Cramér's V, Contingency coefficient
[ Nonparametric ] Chi-square, Runs, Binomial, McNemar, Cochran Q
Ordinal
[ Frequencies ]
Median, Interquartile range
Interval
Mean, Standard deviation, Pearson's product-moment correlation, t test, Analysis of variance (ANOVA), Multivariate analysis of variance (MANOVA), Factor analysis, Regression, Multiple correlation (R)
Ratio
Coefficient of Variation, (CV = SD / M)
[ Nonparametric ]
Kolmogorov-Smirnov, Sign, Wilcoxon, Kendall coefficient of concordance, Friedman two-way ANOVA, Mann-Whitney U, Wald-Wolfowitz, Kruskal-Wallis
Statistical Inference
There are two types of statistical inferences: Estimation of population parameters and hypothesis testing. Hypothesis testing is one of the most important tools of application of statistics to real life problems. Most often, decisions are required to be made concerning populations on the basis of sample information. Statistical tests are used in arriving at these decisions.
Steps in Hypothesis Testing
1. Identify the null hypothesis H0 and the alternate hypothesis H1.
2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision.
If the test statistic does not fall in the critical region: Conclude that there is not enough evidence to reject H0.
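The five steps can be sketched with a one-sample t-test. The data, the hypothesized mean of 50, and α = 0.05 below are all hypothetical, chosen only to illustrate comparing the observed statistic with the critical value:

```python
from scipy import stats

# Hypothetical sample; H0: mu = 50 vs H1: mu != 50 (two-sided)
data = [54.2, 51.8, 55.9, 52.4, 53.3, 56.1, 54.8, 52.7]
alpha = 0.05

t_obs, p_value = stats.ttest_1samp(data, popmean=50)    # step 3: observed statistic
t_crit = stats.t.ppf(1 - alpha / 2, df=len(data) - 1)   # step 4: critical value

# Step 5: decision
if abs(t_obs) > t_crit:
    decision = "reject H0"
else:
    decision = "fail to reject H0"
```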
Types of Error

Type of decision   H0 true             H0 false
Reject H0          Type I error (α)    Correct decision
Accept H0          Correct decision    Type II error (β)
Use the key by answering the questions in the most relevant way.
1. Have you got more than two samples?
No......go to 2 Yes.....go to 8
4. Do your data sets have any factor in common (dependence), i.e. location or individuals?
No.....Mann-Whitney U test Yes.....Wilcoxon Matched Pairs test
5. Do your data sets have any factor in common (dependence), i.e. location or individuals?
No......go to 6 Yes.....paired sample t-test
6. Do your data sets have equal variances (F-test)?
No......unequal variance t-test Yes.....go to 7
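Step 6 of the key can be sketched in code: test the variance ratio with an F-test, then choose the pooled (equal-variance) or Welch (unequal-variance) t-test accordingly. The two samples below are hypothetical:

```python
from scipy import stats

# Two hypothetical independent samples
a = [12.1, 14.3, 13.8, 15.2, 13.0, 14.9]
b = [10.4, 11.8, 16.5, 9.2, 15.7, 12.3]

# F-test for equal variances: ratio of sample variances, larger on top
var_a, var_b = stats.tvar(a), stats.tvar(b)      # sample variances (n-1 denominator)
f = max(var_a, var_b) / min(var_a, var_b)
df1, df2 = len(a) - 1, len(b) - 1
p_f = 2 * stats.f.sf(f, df1, df2)                # two-sided p-value

# Choose the t-test variant based on the F-test outcome
equal_var = p_f > 0.05
t, p = stats.ttest_ind(a, b, equal_var=equal_var)  # pooled if True, Welch if False
```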
Descriptive Statistics
Diversity Indices
Comparisons
Correlations
Regression
Test of Hypothesis
Test of hypotheses concerning mean(s). Test of hypotheses concerning variance(s). Test of hypotheses concerning proportion(s).
Independent Samples Test (SPSS output, variable: Mark)

Levene's Test for Equality of Variances: Sig. = .684 (equal variances may be assumed)
t-test for Equality of Means:
  n = 5 per group, SD = 6.671 and 6.656
  t = -2.753, df = 8 (equal variances assumed) and 8.000 (not assumed)
  p value = 0.025
  95% Confidence Interval of the Difference: (-21.318, -1.882)
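The SPSS output above can be reproduced from the summary statistics alone. The raw group means are not shown in the extraction, but their difference (-11.6) is recoverable as the midpoint of the 95% confidence interval, so the means 50.0 and 61.6 below are illustrative values with that difference:

```python
from scipy import stats

# Illustrative means (difference -11.6 recovered from the CI midpoint);
# n and SD are taken from the table above
t, p = stats.ttest_ind_from_stats(mean1=50.0, std1=6.671, nobs1=5,
                                  mean2=61.6, std2=6.656, nobs2=5)
# t ≈ -2.753 with df = 8 and p ≈ 0.025, matching the SPSS output
```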
Comparing Multiple Populations: 1. Comparing multiple population variances 2. Comparing multiple population means
[SPSS post-hoc comparison output: three groups with N = 5 each; the "Subset for alpha = .05" columns are not recoverable from the extraction]
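Comparing multiple population means is done with one-way ANOVA, with a test of equal variances as a preliminary check. Since the group values in the table above are not recoverable, the three groups below are hypothetical, with N = 5 each as in the table:

```python
from scipy import stats

# Three hypothetical groups of N = 5 each
g1 = [24.0, 26.5, 23.8, 25.1, 24.9]
g2 = [50.2, 51.8, 49.5, 52.0, 50.7]
g3 = [70.4, 68.9, 71.2, 69.8, 70.6]

# Levene's test checks the equal-variance assumption first
w_stat, p_levene = stats.levene(g1, g2, g3)

# H0: mu1 = mu2 = mu3
f_stat, p_value = stats.f_oneway(g1, g2, g3)
```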
Non-Parametric Tests
In some situations, the practical data may be non-normal and/or it may not be possible to estimate the parameter(s) of the data. The tests used in such situations are called non-parametric tests. Since these tests make no assumptions about the distribution and its parameters, they are known as non-parametric (NP) tests or distribution-free tests. NP tests can be used even for nominal data (qualitative data, such as greater or less) and ordinal data, such as ranked data. NP tests require less calculation, because there is no need to compute parameters.
Kolmogorov-smirnov test
It is similar to the chi-square test for goodness of fit of a given set of data to an assumed distribution. This test is more powerful for small samples, whereas the chi-square test is suited for large samples. H0: the given data follow the assumed distribution. H1: the given data do not follow the assumed distribution. The K-S test is a one-tailed test. Hence, if the calculated value of D is more than the theoretical value of D for a given significance level, reject H0; otherwise accept H0.
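A minimal sketch of the K-S goodness-of-fit test, using a small hypothetical sample tested against a standard normal distribution:

```python
from scipy import stats

# Hypothetical sample; H0: the data follow N(0, 1)
sample = [-0.42, 0.13, 1.05, -1.27, 0.58, -0.09, 0.91, -0.66, 0.24, -1.48]

# D is the maximum distance between the empirical and assumed CDFs;
# reject H0 when D exceeds the critical value (equivalently, when p < alpha)
d_stat, p_value = stats.kstest(sample, "norm")
```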
Mann-Whitney U Test
The Mann-Whitney U test is an alternative to the two-sample t-test. This test is based on the ranks of the observations of the two samples put together. An alternate name for this test is the Rank-Sum Test. Let R1 = the sum of the ranks of the observations of the first sample, and R2 = the sum of the ranks of the observations of the second sample. Objective: to check whether the two samples are drawn from populations having the same distribution. Compute Z = [U − E(U)] / SD(U), where U = n1n2 + n1(n1 + 1)/2 − R1 or U = n1n2 + n2(n2 + 1)/2 − R2.
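A minimal sketch of the test on two small hypothetical samples (scipy computes U from the rank sums internally, using an exact p-value for small untied samples):

```python
from scipy import stats

# Two small hypothetical samples with no overlap
s1 = [14, 18, 22, 25, 29]
s2 = [30, 33, 35, 38, 41]

u_stat, p_value = stats.mannwhitneyu(s1, s2, alternative="two-sided")
# With complete separation, the U statistic for the first sample is 0
```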
Correlation Analysis
Correlation analysis is a statistical technique used to measure the magnitude of the linear relationship between two variables. Correlation analysis cannot be used in isolation to describe the relationship between variables; it can be used along with regression analysis to determine the nature of the relationship between two variables. Thus correlation analysis can be used for further analysis. Two prominent types of correlation coefficient are the Pearson product-moment correlation coefficient and Spearman's rank correlation coefficient. Testing the significance of the correlation coefficient: Type I H0: ρ = 0 and H1: ρ ≠ 0; Type II H0: ρ = r and H1: ρ ≠ r; Type III H0: ρ1 = ρ2 and H1: ρ1 ≠ ρ2.
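Both coefficients, together with the Type I significance test (H0: ρ = 0), can be sketched on hypothetical paired data:

```python
from scipy import stats

# Hypothetical paired observations
x = [2, 4, 5, 7, 9, 11, 12]
y = [10, 14, 18, 19, 25, 27, 32]

r, p_pearson = stats.pearsonr(x, y)       # Pearson product-moment r; tests H0: rho = 0
rho, p_spearman = stats.spearmanr(x, y)   # Spearman's rank correlation
# y increases monotonically with x, so Spearman's rho is exactly 1
```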
Regression Analysis
Regression analysis is used to predict the nature and closeness of relationships between two or more variables. It evaluates the causal effect of one variable on another variable. It is used to predict the variability in the dependent (or criterion) variable based on information about one or more independent (or predictor) variables. Two variables: simple or linear regression analysis. More than two variables: multiple regression analysis.
R²: the strength of association, i.e., to what degree the variation in Y can be explained by X. If R² = 0.10, then only 10% of the total variation in Y can be explained by the variation in the X variables.
Example for Regression Analysis School Climate : 25, 34, 55, 45, 56, 49, 65 Academic Achievement: 58, 62, 80, 75, 84, 72, 89
Variables Entered/Removed: Model 1 — Variables Entered: School Climate; Method: Enter.

Model Summary: R = .978, R Square = .957, Adjusted R Square = .949, Std. Error of the Estimate = 2.55330.

ANOVA: Regression SS = 732.832 (df = 1, Mean Square = 732.832); Residual SS = 32.597 (df = 5, Mean Square = 6.519); Total SS = 765.429 (df = 6); F = 112.409, Sig. = .000.

Coefficients: t = 9.853 (constant) and t = 10.602 (School Climate).
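The SPSS output can be checked by fitting the regression directly to the slide's data; the R² computed below matches the Model Summary value of .957:

```python
import numpy as np

# Data from the example above
climate = np.array([25, 34, 55, 45, 56, 49, 65], dtype=float)
achievement = np.array([58, 62, 80, 75, 84, 72, 89], dtype=float)

# Simple linear regression by least squares
slope, intercept = np.polyfit(climate, achievement, 1)
predicted = slope * climate + intercept

# R^2 = 1 - SS_residual / SS_total
ss_res = np.sum((achievement - predicted) ** 2)
ss_tot = np.sum((achievement - achievement.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```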
Multivariate Analysis
Multivariate analysis is defined as the set of statistical techniques that simultaneously analyse more than two variables on a sample of observations. Multivariate analysis helps the researcher evaluate the relationship between multiple (more than two) variables simultaneously. Multivariate techniques are broadly classified into two categories:
Dependence techniques
Interdependence techniques
Classification of multivariate methods (flowchart summary). Are some of the variables dependent on others?
- Yes: dependence methods. How many variables are dependent?
  - One dependent variable, metric (interval or ratio scale): Multiple Regression
  - One dependent variable, nonmetric (nominal or ordinal scale): Multiple Discriminant Analysis
  - Several dependent variables, metric: Multivariate Analysis of Variance (MANOVA)
  - Several dependent variables, nonmetric: Conjoint Analysis
  - Multiple independent and dependent variables: Canonical Analysis
- No: interdependence methods.
  - Metric (interval or ratio scale): Factor Analysis, Cluster Analysis, Metric Multidimensional Scaling
  - Nonmetric (nominal or ordinal scale): Nonmetric Multidimensional Scaling
Discriminant Analysis
Discriminant analysis aims at studying the effect of two or more predictor (independent) variables on a certain evaluation criterion. The evaluation criterion may consist of two or more groups: two groups such as good or bad, like or dislike, successful or unsuccessful, above or below an expected level; or three groups such as good, normal, or poor. The analysis checks whether the predictor variables discriminate among the groups, and identifies which predictor variable is more important compared to the other predictor variable(s). Such analysis is called discriminant analysis.
Discriminant Analysis
Designing a discriminant function: Y = aX1 + bX2, where Y is a linear composite representing the discriminant function, and X1 and X2 are the predictor (independent) variables that affect the evaluation criterion of the problem of interest. Finding the discriminant ratio (K) and determining the variables that account for the intergroup difference in terms of group means: this ratio is the maximum possible ratio between the variability between groups and the variability within groups. Finding the critical value that can be used to assign a new data set (i.e., a new combination of values of the predictor variables) to its appropriate group. Testing H0: the group means are equal vs. H1: the group means are not equal, using the F test at a given significance level.
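The two-group, two-predictor case can be sketched with Fisher's linear discriminant: the weights (a, b) maximize the between-group to within-group variability ratio, and the critical value is taken as the midpoint of the projected group means. The two groups below are hypothetical:

```python
import numpy as np

# Two hypothetical groups, each row an observation of (X1, X2)
good = np.array([[4.0, 6.0], [5.0, 7.0], [6.0, 8.0], [5.5, 6.5]])
bad = np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 2.5], [2.5, 3.5]])

m1, m2 = good.mean(axis=0), bad.mean(axis=0)

# Pooled within-group scatter matrix
Sw = (np.cov(good.T) * (len(good) - 1) + np.cov(bad.T) * (len(bad) - 1))

# Fisher weights (a, b) in Y = a*X1 + b*X2
w = np.linalg.solve(Sw, m1 - m2)

# Critical value: midpoint of the projected group means
cutoff = w @ (m1 + m2) / 2

# Assign a new observation to a group
new = np.array([3.5, 5.0])
group = "good" if w @ new > cutoff else "bad"
```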
Factor Analysis
Factor analysis can be defined as a set of methods in which the observable or manifest responses of individuals on a set of variables are represented as functions of a small number of latent variables called factors. Factor analysis helps the researcher reduce the number of variables to be analyzed, thereby making the analysis easier. For example, consider a market researcher at a credit card company who wants to evaluate the credit card usage and behaviour of customers using various variables: age, gender, marital status, income level, education, employment status, credit history, and family background. Analysis based on such a wide range of variables can be tedious and time consuming. Using factor analysis, the researcher can reduce the large number of variables into a few dimensions called factors that summarize the available data. It aims at grouping the original input variables into factors that underlie the input variables. For example, age, gender, and marital status can be combined under a factor called demographic characteristics; income level, education, and employment status can be combined under a factor called socio-economic status; and credit history and family background can be combined under a factor called background status.
Cluster Analysis
Cluster analysis can be defined as a set of techniques used to classify objects into relatively homogeneous groups called clusters. It involves identifying similar objects and grouping them under homogeneous groups. A cluster is a group of objects that display high correlation with each other and low correlation with the objects in other clusters.
3. Non-hierarchical clustering approach: a cluster center is first determined, and all the objects that are within the specified distance from the cluster center are included in the cluster.
4. Deciding on the number of clusters to be selected.
5. Interpreting the clusters.
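The non-hierarchical idea in step 3 can be sketched as a nearest-center assignment: given pre-chosen cluster centers, each object joins the cluster whose center it is closest to. The points and centers below are hypothetical:

```python
import numpy as np

# Hypothetical 2-D observations forming two visible groups
points = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                   [5.0, 5.2], [4.8, 5.1], [5.3, 4.9]])

# Two pre-chosen cluster centers
centers = np.array([[1.0, 1.0], [5.0, 5.0]])

# Distance from every point to every center, then assign to the nearest one
dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)
```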
Books for Reference
- Darren George and Paul Mallery, SPSS for Windows Step by Step: A Simple Guide and Reference, Sixth Edition, Pearson Education.
- R. Panneerselvam, Research Methodology, Prentice-Hall of India Private Limited, New Delhi.