Introduction to Statistics and SPSS
At the end of Session 1 you will be able to:
1. Understand the definition of statistics.
2. Understand what descriptive and inferential statistics are.
3. Understand the various types of data.
4. Understand the sources of data.
5. Create a codebook and code a questionnaire.
6. Set the variable properties.
7. Code the values for each variable.
8. Know how to enter your data.
Introduction to Statistics
What is statistics?
What is a variable?
What is data?
Sources of data
Types of data
Descriptive statistics
  Graphs, tables and illustrations
  Numerical measurements
Inferential statistics

What is statistics?
Statistics is a group of methods used to collect, analyze, present and interpret data and to make good decisions.

What is a variable / observation / data?
A variable is a characteristic under study that assumes different values for different elements.
An observation is the value of a variable for an element.
A data set is a collection of observations on one or more variables.

Sources of data
Primary
  Observation
  Experiment
  Survey
Secondary
  Ready and available data sets from individuals or institutions such as BNM, statistical departments, the World Bank, etc.
Survey
  Census or sampling
Sampling
  Probability or non-probability
Types of data
Nominal: data that can be grouped into mutually exclusive and collectively exhaustive categories.
Ordinal: data that can be grouped into mutually exclusive and collectively exhaustive categories, and the categories can be ranked.
Interval: data that can be grouped and ranked as above, and the difference between values is known, but there is NO meaningful zero.
Ratio: data that can be grouped and ranked as above, the difference between values is known, and there is a meaningful zero.
Data can also be described as qualitative or quantitative, and as discrete or continuous.

Descriptive & Inferential Statistics
Descriptive statistics consists of methods for organizing, displaying and describing data by using illustrations and summary measures.
Inferential statistics consists of methods that use sample results to help make decisions or predictions about a population.
Descriptive Statistics
Graphical methods (tables and graphs); the choice depends on the type of data, qualitative or quantitative
  Tables: frequency distribution table and relative frequency table
  Graphs: bar chart, pie chart, Pareto chart, histogram, line graph, polygon and ogive
Numerical measurements
  Central tendency: mean, median, mode and mid-range
  Variability: variance, standard deviation
  Position: quantiles (quartiles, deciles, percentiles)
  Shape: skewness (left or right) and kurtosis (peaked or flat)
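The numerical measurements listed above can be tried outside SPSS as well. The following is a minimal sketch using Python's standard `statistics` module; the `ages` sample is made up purely for illustration and is not taken from any course data set.

```python
import statistics

# Hypothetical sample (assumption): ages of ten respondents
ages = [21, 22, 22, 23, 24, 24, 24, 26, 29, 35]

# Central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)
mid_range = (min(ages) + max(ages)) / 2

# Variability (sample versions, which divide by n - 1)
variance = statistics.variance(ages)
std_dev = statistics.stdev(ages)

# Position: the three quartiles split the sorted data into four parts
q1, q2, q3 = statistics.quantiles(ages, n=4)

print("mean", mean, "median", median, "mode", mode, "mid-range", mid_range)
print("variance", round(variance, 2), "std dev", round(std_dev, 2))
print("quartiles", q1, q2, q3)
```

In this sample the mean is larger than the median, which is what one would expect when a few large values pull the distribution to the right.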
Codebook and Coding
Prepare a codebook (every question needs a code, e.g. male = 0, female = 1) with these columns:
  Variable
  SPSS variable name
  Coding instructions
Coding variable values
  Closed-ended questions
    Dichotomous question, e.g. Yes/No coded 0 and 1; gender
    Multiple answers (e.g. "tick as many answers as apply", such as a menu)
      Create a new variable for each possible answer
      The value of each new variable is 0 or 1
  Open-ended questions (e.g. "give your opinion")
    List down all the answers given
    Group the answers into meaningful groups (e.g. 2 or 3 groups)
  Missing values: 9 = missing value, 8 = not applicable
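As an illustration of these coding rules, here is a small Python sketch. The variable names (`gender`, `owns_car`, `menu_*`) and the helper functions are hypothetical; only the 0/1 coding and the 9/8 missing-value codes come from the notes above. Treating an unrecognised answer as missing is an assumption made to keep the example short.

```python
# Codes from the notes: 9 = missing value, 8 = not applicable
MISSING = 9
NOT_APPLICABLE = 8

# Hypothetical codebook: variable name -> {answer text: numeric code}
codebook = {
    "gender": {"male": 0, "female": 1},
    "owns_car": {"no": 0, "yes": 1},
}

def encode(variable, answer):
    """Return the numeric code for an answer; blank or unknown answers become MISSING."""
    if answer is None or answer.strip() == "":
        return MISSING
    return codebook[variable].get(answer.strip().lower(), MISSING)

def encode_multiple(ticked, options):
    """Multiple-answer question: one new 0/1 variable per possible answer."""
    return {f"menu_{opt}": int(opt in ticked) for opt in options}

# One respondent: female, left the car question blank, ticked rice and soup
row = {
    "gender": encode("gender", "Female"),
    "owns_car": encode("owns_car", ""),
}
row.update(encode_multiple({"rice", "soup"}, ["rice", "noodles", "soup"]))
print(row)
```

Each key of `row` corresponds to one column in the data file, so a multiple-answer question with three options occupies three columns.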
Data Entry
Prepare the template using the questionnaire from Exercise 1
  Using SPSS
  Using Excel
Enter the data using the questionnaire from Exercise 1

SPSS first screen
How to open and exit a file
Data entry: Variable View
Example of Variable View
Example of Data View

What to look for in Variable View?
  Name: name of the variable. No illegal characters such as spaces, $, * or !. Use a short, simple name related to the questionnaire.
  Type: numeric or string
  Width: length of the data
  Decimals: number of decimal places
  Label: descriptive label for your variable
  Values: values assigned to your observations
  Missing: codes for any missing values
Session 2
Data Entry and Descriptive Statistics
At the end of Session 2 you will be able to:
1. Screen and clean up the data.
2. Obtain descriptive statistics from SPSS.
3. Recode, regroup and transform data.
4. Know how to select cases for analysis.
Exercise 2: Screening and Cleaning
Use file Exercise 2 - Unclean Data Set.sav
Step 1 - Run a frequency table to screen the data.
Step 2 - Check for errors in each frequency table.
Step 3 - Correct any errors. Refer back to the original questionnaire.
Step 4 - Repeat Steps 1 to 3 until you are sure that the data set is clean.
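The screening steps above can be sketched in Python to show the idea behind them: build a frequency table for each variable, then flag any value that is not a valid code. The variables, valid codes and data below are invented for illustration and are not the contents of the exercise file.

```python
from collections import Counter

# Hypothetical valid codes, as they might appear in a codebook (assumption)
valid_codes = {
    "gender": {0, 1, 9},
    "marital": {1, 2, 3, 4, 9},
}

# Hypothetical raw entries; 11 and 7 stand in for typing errors
data = {
    "gender": [0, 1, 1, 0, 11, 1, 9],
    "marital": [1, 2, 2, 7, 4, 1, 1],
}

def frequency_table(values):
    """Step 1: count how often each code occurs, sorted by code."""
    return dict(sorted(Counter(values).items()))

def find_errors(data, valid_codes):
    """Step 2: list (case number, value) pairs whose value is not a valid code."""
    errors = {}
    for var, values in data.items():
        bad = [(i + 1, v) for i, v in enumerate(values) if v not in valid_codes[var]]
        if bad:
            errors[var] = bad
    return errors

for var, values in data.items():
    print(var, frequency_table(values))
print("to check against the questionnaire:", find_errors(data, valid_codes))
```

The case numbers returned by `find_errors` tell you which questionnaires to pull out for Step 3.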
Descriptive Statistics
Frequency table
Command: Analyze, Descriptive Statistics, Frequencies
Frequency table - output

Purpose of a frequency table
  Cleaning up: detect errors
  Recode/regroup
  To provide basic information about each variable

Exercise 3: Run descriptive statistics
Use the cleaned data set from Exercise 2. Obtain descriptive statistics for each variable.
Command: Analyze, Descriptive Statistics, Descriptives
Output:
Descriptive Statistics

Variable                            N     Minimum  Maximum  Mean     Std. Deviation
Matriks number (matric number)      1130  2.0E+09  2.0E+09  2.0E+09  467276.895
Taraf perkahwinan (marital status)  1130  1.00     4.00     1.0221   .18454
Etnik (ethnicity)                   1130  1.00     4.00     1.0549   .39773
Agama (religion)                    1130  1.00     6.00     1.0619   .45493
Valid N (listwise)                  1130

Descriptive statistics using Excel
Under DATA, choose DATA ANALYSIS

Data transformation
Recode
  Command: Transform, Recode (into Same Variables or into Different Variables)
Compute
  Command: Transform, Compute Variable

Session 3
Correlation and Regression
At the end of Session 3 you will be able to:
1. Obtain a correlation and interpret the result.
2. Differentiate between independent and dependent variables.
3. Run a simple regression analysis.
4. Perform cross tabulation and the chi-square test.
5. Perform a t-test.
Correlation
Independent variable (X) and dependent variable (Y)
Only for continuous (quantitative) data
The coefficient lies between -1 and +1
Denoted by r, e.g. r = .11 (reported starting with the decimal point, no leading zero)
Strength of relationship: close to 1 or -1 is strong; close to 0 is weak
Significant or not
Shows only ASSOCIATION, not CAUSATION

Exercise 4 - Correlation
Open file Exercise 4 - Correlation Record 1.sav
Run Correlation.
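To see what the coefficient r measures, it can help to compute it by hand once. The sketch below implements the standard Pearson formula in plain Python; the `x` and `y` values are made up for illustration and have nothing to do with the exercise file.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their own variation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (assumption): five paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.2f}")  # about 0.77: a fairly strong positive association
```

Whatever data are supplied, the result always falls between -1 and +1, and a high value says nothing about whether X causes Y.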
Correlation scatter plot

Regression
Simple regression: one independent and one dependent variable
Multiple regression: one dependent and many independent variables
Predicting one variable from another
Y = a + bX + error
  a = intercept; b = slope (how much Y changes for a one-unit change in X)
  error = all other factors that influence Y but are not captured by X

Assumptions
1. Variable types: independent variables are quantitative or dummy variables; the dependent variable is quantitative
2. Non-zero variance
3. No PERFECT multicollinearity
4. Predictors are uncorrelated with external variables
5. Homoscedasticity
6. Independent errors (Durbin-Watson)
7. Normally distributed errors
8. Independence
9. Linearity

Regression: OLS
Ordinary Least Squares method: minimizes the sum of squared residuals

Regression: R²
Sum of Squares Residual (SSE): actual value less predicted value
Sum of Squares Regression (SSR): predicted value less mean value
Sum of Squares Total (SST): actual value less mean value
Each difference is squared and then summed, so SST = SSR + SSE.
R² = how much of the variation in Y is explained by X
R² = 1 - SSE/SST = SSR/SST
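The relationship between the OLS line and R² can be checked with a few lines of Python. This is a minimal sketch for simple regression only, on invented data; the function name `fit_simple_ols` is hypothetical. In the code, SSE is the sum of squared (actual - predicted) differences and SSR is the sum of squared (predicted - mean) differences.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data (assumption): five paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_simple_ols(x, y)
predicted = [a + b * xi for xi in x]
mean_y = sum(y) / len(y)

sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual: actual - predicted
ssr = sum((pi - mean_y) ** 2 for pi in predicted)          # regression: predicted - mean
sst = sum((yi - mean_y) ** 2 for yi in y)                  # total: actual - mean

r_squared = 1 - sse / sst
print(f"Y = {a:.1f} + {b:.1f}X")
print(f"SSE = {sse:.1f}, SSR = {ssr:.1f}, SST = {sst:.1f}, R2 = {r_squared:.2f}")
```

With these numbers the two routes to R² agree (1 - SSE/SST and SSR/SST), which is a quick way to confirm that the sums of squares have been labelled correctly.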
Exercise 5 - Regression
Open file Exercise 5 - Regression Record 2.sav
Descriptive Statistics

Variable                                  Mean      Std. Deviation  N
Record Sales (thousands)                  193.2000  80.69896        200
Advertising Budget (thousands of pounds)  614.4123  485.65521       200
No. of plays on Radio 1 per week          27.5000   12.26958        200
Attractiveness of Band                    6.7700    1.39529         200

Regression output: correlations
Correlations

Pearson Correlation
                                          Record Sales  Advertising Budget  Plays on Radio 1  Attractiveness
Record Sales (thousands)                  1.000         .578                .599              .326
Advertising Budget (thousands of pounds)  .578          1.000               .102              .081
No. of plays on Radio 1 per week          .599          .102                1.000             .182
Attractiveness of Band                    .326          .081                .182              1.000

Sig. (1-tailed)
                                          Record Sales  Advertising Budget  Plays on Radio 1  Attractiveness
Record Sales (thousands)                  .             .000                .000              .000
Advertising Budget (thousands of pounds)  .000          .                   .076              .128
No. of plays on Radio 1 per week          .000          .076                .                 .005
Attractiveness of Band                    .000          .128                .005              .

N = 200 for every pair of variables.

Regression output: model summary
Model Summary (c)

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change  Durbin-Watson
1      .578 (a)  .335      .331               65.99144                    .335             99.587    1    198  .000
2      .815 (b)  .665      .660               47.08734                    .330             96.447    2    196  .000           1.950

a. Predictors: (Constant), Advertising Budget (thousands of pounds)
b. Predictors: (Constant), Advertising Budget (thousands of pounds), Attractiveness of Band, No. of plays on Radio 1 per week
c. Dependent Variable: Record Sales (thousands)

Regression output: ANOVA table
ANOVA (c)

Model              Sum of Squares  df   Mean Square  F        Sig.
1  Regression      433687.8        1    433687.833   99.587   .000 (a)
   Residual        862264.2        198  4354.870
   Total           1295952         199
2  Regression      861377.4        3    287125.806   129.498  .000 (b)
   Residual        434574.6        196  2217.217
   Total           1295952         199

a. Predictors: (Constant), Advertising Budget (thousands of pounds)
b. Predictors: (Constant), Advertising Budget (thousands of pounds), Attractiveness of Band, No. of plays on Radio 1 per week
c. Dependent Variable: Record Sales (thousands)

Regression: how to report

                        B       SE B   β
Model 1
  Constant              134.14  7.54
  Advertising Budget    0.10    0.01   .58*
Model 2
  Constant              -26.61  17.35
  Advertising Budget    0.09    0.01   .51*
  Plays on BBC Radio 1  3.37    0.28   .51*
Note: R² = .34 for Model 1; ΔR² = .33 for Model 2. * p < .001

Crosstabs: how?
For qualitative (ordinal and nominal) data only.
Command: Analyze, Descriptive Statistics, Crosstabs

Independent variable: row or column?
Request percentages for the row or column where your independent variable is.
Pearson chi-square test
  Conditions: expected count of more than 5 in each cell; mutually exclusive categories
  Likelihood ratio: use when samples are small
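The Pearson chi-square statistic that Crosstabs reports, and the expected-count condition, can be illustrated with a short Python sketch. The 2 x 2 table of counts below is invented for illustration, and the function name `chi_square` is hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square for a table of counts (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Hypothetical crosstab (assumption): rows = two groups, columns = yes / no
observed = [[30, 10],
            [20, 40]]

stat, df, expected = chi_square(observed)
min_expected = min(min(row) for row in expected)

print(f"chi-square = {stat:.2f}, df = {df}")
print("expected counts:", expected)
print("all expected counts above 5:", min_expected > 5)
```

For df = 1 the 5% critical value is 3.84, so a statistic this large would indicate that the row and column variables are associated in this made-up table.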
Exercise 6
Open file Exercise 6 - Chi Square & Recode.sav
Run cross tabulations
Interpret the results