
A NOTE ON REGRESSION ANALYSIS

This is a step-by-step approach to doing regression analysis in SPSS. It is meant to help you with your projects; it is not meant to explain the fundamentals of regression analysis.

A) How to do multivariate regression analysis with only quantitative variables?

A multivariate regression analysis is one where we have more than one independent variable and one dependent variable. The general form of multivariate regression can be written as

Y = β0 + β1X1 + β2X2 + ... + βnXn + ε

where X1, X2, ..., Xn are the independent variables, Y is the dependent variable and ε is the error term.

Step 1: Whenever you get a data set for regression analysis, the first step is to run a scatter plot of the dependent variable against each of the independent variables. In other words, take the dependent variable on the Y-axis and X1, X2, X3, ..., Xn (one at a time) on the X-axis. The scatter plot will tell you the type of relationship you are likely to find. If the relationship is linear, you are more likely, but not certain, to get significant values. However, if the relationship is not linear, you might think of inserting higher-order terms of the independent variables (for example, X², X³, ln(X), etc.).

Step 2: The next step is to run the descriptive statistics available in SPSS. Find out the mean, standard deviation, sample size and variance. Look at the standard deviations and see which independent variable has the highest standard deviation; this may also help you with outlier detection later. Look at the sample size and check whether any data are missing. In case of missing data, you either have to remove that observation entirely or do a missing value analysis (such as replacing it with the mean value).

Step 3: Run the correlation matrix, along with the significance values, among the independent variables with the help of SPSS. If the correlations are not significant, you are lucky! But real-life data will have significant correlations. If the correlations are significant, you have a problem of multicollinearity. As a business manager, you need to decide what level of correlation you are comfortable with. As a rule of thumb, correlations below 0.40 are okay; however, you are the best judge. Another way of deciding is to look at the variance inflation factor (VIF), which I deal with in the subsequent steps.

Step 4: Run the regression analysis with the help of SPSS. The first statistic to look at is the F-statistic. If the F-statistic is not significant (i.e. its significance value is greater than 0.05), then your data set cannot be used for regression analysis as it stands. In such a case, you have either to modify the model or look for errors in your raw data.
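Although this note works through the SPSS menus, the same checks can be sketched in code. The following is a minimal illustration of Steps 1 to 4 in Python (pandas and statsmodels); the file name projects.csv and the variable names sales, price and adspend are hypothetical placeholders for your own data.

```python
# A rough Python equivalent of Steps 1-4 (the note itself uses SPSS menus).
# "projects.csv", "sales", "price" and "adspend" are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv("projects.csv")
y_col, x_cols = "sales", ["price", "adspend"]

# Step 1: scatter plot of the dependent variable against each independent variable
for x in x_cols:
    df.plot.scatter(x=x, y=y_col, title=f"{y_col} vs {x}")
plt.show()

# Step 2: descriptive statistics (mean, standard deviation, count) and missing values
print(df[[y_col] + x_cols].describe())
print(df[[y_col] + x_cols].isna().sum())

# Step 3: correlation matrix among the independent variables
print(df[x_cols].corr())

# Step 4: fit the regression and check the F-statistic and its significance
X = sm.add_constant(df[x_cols])
model = sm.OLS(df[y_col], X, missing="drop").fit()
print(f"F = {model.fvalue:.2f}, significance = {model.f_pvalue:.4f}")
```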

If the F-statistic is significant, then look at the adjusted R². This value tells you the amount of variation in your dependent variable that is explained by your independent variables. Normally, we expect a high value of adjusted R²; how high depends on the type of data. If your data are cross-sectional, i.e. time is not one of the independent variables, an adjusted R² of 0.30 is acceptable. However, if your data are time series, i.e. time is one of the independent variables or one of your independent variables is related to time, the adjusted R² should be more than 0.70. We also look at the standard error of the estimate, which should be low.

Step 5: We now look at the table that estimates the coefficients. The first thing we look at is the significance of the t-statistics. The value should be less than or equal to 0.05 for a coefficient to be significant and to be included in the model. The value of the estimated coefficient is given in the column titled Unstandardized Coefficients.

Step 6: Remember that we also looked at the correlation coefficients of the independent variables and their significance. In order to check whether the level of collinearity is acceptable, we run the VIF statistic. The VIF statistic is available under Analyze, Regression, Linear in SPSS. Once you have opened the regression command box, go to the Statistics tab. A window will open showing the regression coefficient options. Click on Collinearity diagnostics and click Continue. The output sheet will show the VIF in the table where the coefficients are estimated. A VIF of less than 10 is considered okay, with no problem of multicollinearity.

Step 7: In case your F-statistic is not significant, you can try some quick fixes. Look at the scatter plots that you drew initially and check for outliers. A good starting point is the independent variable with the highest standard deviation. Remove the outliers if your sample size is large and you do not mind losing a few observations. Rerun the regression analysis and see if your F-statistic is significant. If you have a small sample size, look at your scatter plots and see the type of relationship. If the relationship is not linear, transform the independent variable and rerun. See if you get significant results. EVEN AFTER ALL THIS YOU STILL MIGHT NOT GET SIGNIFICANT RESULTS, WHICH MEANS YOUR INDEPENDENT VARIABLES DO NOT EXPLAIN THE VARIATION IN THE DEPENDENT VARIABLE.
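Steps 5 to 7 can be sketched the same way, continuing the hypothetical Python example above; the 3-standard-deviation cut-off used for outliers below is an illustrative choice, not a rule from this note.

```python
# Continuing the sketch above (same hypothetical df, X and fitted model): Steps 5-7.
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Adjusted R-squared and the coefficient table (unstandardized coefficients, t, p)
print(f"Adjusted R-squared = {model.rsquared_adj:.3f}")
print(model.summary().tables[1])

# Step 6: VIF for each independent variable; values below 10 are usually acceptable
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))

# Step 7: a crude outlier fix -- drop observations more than 3 standard deviations
# from the mean of the most dispersed independent variable, then refit
worst = df[x_cols].std().idxmax()
keep = (df[worst] - df[worst].mean()).abs() <= 3 * df[worst].std()
refit = sm.OLS(df.loc[keep, y_col], sm.add_constant(df.loc[keep, x_cols])).fit()
print(f"Refit: F = {refit.fvalue:.2f}, significance = {refit.f_pvalue:.4f}")
```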
B) How to do regression analysis with qualitative variables?

We can also include categorical variables (qualitative variables) in our regression analysis. Examples of categorical variables are the sex of an individual (male/female), the geographical regions of a country (east, west, north, south), etc. In order to include such variables in our model, we introduce a new type of variable called a dummy variable. Dummy variables are most commonly represented by 0 and 1, although there is no mathematical requirement for this; any two numbers could be used. The purpose of dummy variables is to inform SPSS that observations coded 0 represent one category of the variable and are different from those coded 1. For example, the sex of an individual can be either male or female, so if we represent males with 0, then females are represented with 1. Similarly, if there are four categories, say east, west, north and south, we can represent them using dummy variables in the following way:

              East   West   North   South (reference)
Dummy X1        1      0      0        0
Dummy X2        0      1      0        0
Dummy X3        0      0      1        0
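For illustration only, the coding scheme in the table above can be reproduced in Python with pandas; the variable name region and its four categories are assumptions made for this example.

```python
# A minimal pandas sketch of the coding scheme in the table above; the column
# name "region" and its four categories are placeholders.
import pandas as pd

df = pd.DataFrame({"region": ["East", "West", "North", "South"]})

# Keep only n-1 = 3 dummies so that South becomes the reference category
# (coded 0 on every dummy variable).
dummies = (pd.get_dummies(df["region"])
             .reindex(columns=["East", "West", "North"])
             .astype(int))
print(dummies)
#    East  West  North
# 0     1     0      0    East  -> 100
# 1     0     1      0    West  -> 010
# 2     0     0      1    North -> 001
# 3     0     0      0    South -> 000 (reference)
```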

It is evident from the table above that East is represented as 100 while South is represented as 000. The category that takes the value 0 on all the dummy variables is called the reference category, and its (implicit) dummy is called the reference dummy. So for n categories, we need (n-1) dummy variables. Run the regression analysis the same way as we discussed above for quantitative variables. However, the coefficients of a dummy variable regression signify something different from those of quantitative variables. The mean value of the reference category is given by β0:

μ(X0) = β0

Similarly, the mean value of the category represented by dummy variable X1 is given by β0 + β1:

μ(X1) = β0 + β1
β1 = μ(X1) − β0
β1 = μ(X1) − μ(X0)

From the regression coefficient table, if we find that β1 is significant and positive, we can deduce that the mean value of the category represented by dummy variable X1 is higher than that of the reference category. Similarly, we can find the mean values of the other categories and compare them.
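The relationship between the coefficients and the category means can be checked with a small, made-up numerical example in Python; all numbers and variable names below are invented purely for illustration.

```python
# A sketch of the interpretation above: the intercept (b0) estimates the mean of
# the reference category and b0 + b1 the mean of the category coded by X1.
# The data values here are invented for illustration only.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "North", "North", "South", "South"],
    "sales":  [12.0,   14.0,   9.0,    11.0,   8.0,     10.0,    5.0,     7.0],
})
X = (pd.get_dummies(df["region"])
       .reindex(columns=["East", "West", "North"])
       .astype(int))
model = sm.OLS(df["sales"], sm.add_constant(X)).fit()

print(model.params)                          # const = mean(South) = 6,
                                             # East  = mean(East) - mean(South) = 7
print(df.groupby("region")["sales"].mean())  # category means agree with b0, b0 + b1
```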
