
S10: REGRESSION ANALYSIS

PGP I Term 3 (2016-17)


Research for Marketing Decisions

Prof. Avinash Mulky


Review: Basic analysis

Check frequencies and percentages for all important variables


Calculate means and standard deviations for important
interval-scaled variables
Do cross tabulations (2 variables at a time) for nominally
scaled variables to generate insights
Do t-tests for important interval scaled variables
Check ANOVAs for impact of categorical variables on interval
measurements
One way ANOVAs for one factor
Two-way (or n-way) ANOVAs for two (or more) factors and
interactions
Review: Cross tabulations

Study associations between pairs of nominal variables


Brands A, B, C, D across segments formed by age, gender, SEC
Underlying model is independence between the two variables
(no association between the row variable and the column
variable)
In the chi-square (χ²) test we test this model by comparing observed
and expected frequencies.
To analyze models with more than two dimensions, we
have to use log-linear models
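For example, the chi-square test of independence can be run directly on a brand-by-segment cross tabulation. A minimal sketch with made-up counts (the numbers are illustrative, not from the course data), using scipy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical brand-preference counts: rows = gender segment, columns = brands A-D
observed = np.array([
    [40, 30, 20, 10],   # male respondents
    [20, 30, 30, 20],   # female respondents
])

chi2, p, dof, expected = chi2_contingency(observed)

# Expected counts assume independence: row_total * col_total / grand_total
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```

A small p-value leads us to reject independence, i.e. brand preference is associated with the segment variable.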
2 way ANOVA

Design: Advertising (Low / Medium / High) crossed with Price (Low / Medium / High)

Model: x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}

SS_{total} = SS_{ads} + SS_{price} + SS_{ads \times price} + SS_{within}

With a advertising levels, b price levels, and n observations per cell:

SS_{total} = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x})^2

SS_{ads} = nb \sum_i (\bar{x}_{i\cdot\cdot} - \bar{x})^2

SS_{price} = na \sum_j (\bar{x}_{\cdot j\cdot} - \bar{x})^2

SS_{ads \times price} = n \sum_i \sum_j (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x})^2

SS_{within} = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x}_{ij\cdot})^2
Results

Source                          SS      df   MS     F     p-value
A (Ad)                          5.3     1    5.3    0.75  0.3951
B (Price)                       74.46   2    37.23  5.27  0.0127
A x B (Ad x Price)              52.6    2    26.3   3.72  0.0392
Error (5 test markets per cell) 169.6   24   7.07
Total                           301.96  29

Source: Iacobucci (2015)
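The sum-of-squares decomposition above can be verified numerically. A sketch with simulated (hypothetical) data for a balanced 2 x 3 design with 5 replicates per cell, matching the layout of this example:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 2, 3, 5                      # ad levels, price levels, replicates per cell
x = rng.normal(size=(a, b, n))         # hypothetical sales data, indexed (ad, price, replicate)

grand = x.mean()
cell = x.mean(axis=2)                  # cell means x̄_ij.
row = x.mean(axis=(1, 2))              # ad-level means x̄_i..
col = x.mean(axis=(0, 2))              # price-level means x̄_.j.

ss_total = ((x - grand) ** 2).sum()
ss_a = n * b * ((row - grand) ** 2).sum()
ss_b = n * a * ((col - grand) ** 2).sum()
ss_ab = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
ss_within = ((x - cell[:, :, None]) ** 2).sum()

# In a balanced design the decomposition is exact:
assert np.isclose(ss_total, ss_a + ss_b + ss_ab + ss_within)
```

Dividing each SS by its degrees of freedom gives the mean squares, and each effect's F statistic is its mean square over MS within, as in the results table above.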


Table of means: Purchase intention or sales

                   Price
Ad             Low    Medium  High   Row mean
Standard       61.4   32.7    19.1   37.7
Luxury         41.8   24.5    27.3   31.2
Column mean    51.6   28.6    23.2

Source: Iacobucci (2015)
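The Ad x Price interaction reported in the ANOVA shows up directly in these cell means. A quick check:

```python
# Cell means from the table above (rows: Standard / Luxury ad; columns: Low / Medium / High price)
standard = [61.4, 32.7, 19.1]
luxury = [41.8, 24.5, 27.3]

# If there were no interaction, the Standard-minus-Luxury gap would be roughly
# constant across price levels; here it changes sign at High price.
gaps = [s - l for s, l in zip(standard, luxury)]
print(gaps)  # positive at Low and Medium price, negative at High
```

The gap flipping sign is exactly the crossing of lines in the interaction plot.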


Interactions

[Figure: purchase intention plotted against price level (Low / Medium / High), one line for the Standard ad and one for the Luxury ad. The Standard ad is higher at Low and Medium price but lower at High price, so the lines cross: this crossover is the Ad x Price interaction.]

Source: Adapted from Iacobucci (2015)
Type of relationship examined

Dependence
  One dependent variable, single relationship:
    Multiple regression (dependent metric)
    Conjoint analysis (dependent metric)
    Multiple discriminant analysis (dependent non-metric)
    Linear probability models (dependent non-metric)
  Several dependent variables, single relationship:
    Canonical correlation analysis with dummy variables (dependent metric/non-metric, predictor metric)
    Multivariate analysis of variance (dependent metric, predictor non-metric)
  Multiple relationships of dependent and independent variables:
    Structural equation modeling

Interdependence
  Focus on variables:
    Factor analysis
  Focus on cases/objects:
    Cluster analysis
  Focus on objects:
    Multidimensional scaling (attributes are metric)
    Correspondence analysis (attributes are metric/non-metric)
Correlation

Also known as the product moment correlation, Pearson
correlation coefficient, simple correlation, or bivariate
correlation
An index used to determine whether a linear (straight-line)
relationship exists between two metric (interval- or ratio-
scaled) variables, say X and Y
Represented by r
r = Cov(X,Y) / (S_X S_Y)
r varies between −1 and +1
The absolute value of r indicates the strength of the association between X and Y; the sign indicates its direction
r does not depend on the underlying units of measurement
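As a check on these properties, r can be computed directly from its formula. A sketch using the attitude/duration data from the regression example later in this deck:

```python
import numpy as np

x = np.array([10.0, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])   # duration of residence
y = np.array([6.0, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])      # attitude toward the city

# r = Cov(X, Y) / (S_x * S_y)
r = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

# Unit-free: rescaling X (say, months instead of years) leaves r unchanged
r_scaled = np.cov(12 * x, y)[0, 1] / ((12 * x).std(ddof=1) * y.std(ddof=1))
assert np.isclose(r, r_scaled)

print(round(r, 5))  # 0.93608 for this data (matches the SPSS output later)
```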
SPSS Windows: Pearson Correlation

1. Select ANALYZE from the SPSS menu bar.

2. Click CORRELATE and then BIVARIATE.

3. Move Attitude[attitude] into the VARIABLES box. Then move

Duration[duration] into the VARIABLES box.

4. Check PEARSON under CORRELATION COEFFICIENTS.

5. Check ONE-TAILED under TEST OF SIGNIFICANCE.

6. Check FLAG SIGNIFICANT CORRELATIONS.

7. Click OK.
Bivariate Regression

In the bivariate regression model, the general form of a

straight line is: Y = β0 + β1X

where
Y = dependent or criterion variable
X = independent or predictor variable
β0 = intercept of the line
β1 = slope of the line

The regression procedure adds an error term to account for the

probabilistic or stochastic nature of the relationship:

Yi = β0 + β1Xi + ei

where ei is the error term associated with the i-th observation.
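In the bivariate case the least squares estimates have closed forms, b1 = S_XY / S_XX and b0 = Ȳ − b1·X̄. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical (X, Y) pairs lying near the line Y = 1 + 2X
x = np.array([1.0, 2, 3, 4, 5])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

# Residuals e_i = y_i - (b0 + b1 x_i) sum to zero by construction
e = y - (b0 + b1 * x)
assert abs(e.sum()) < 1e-9

print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```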


Explaining Attitude Toward
the City of Residence

Respondent  Attitude toward  Duration of  Importance attached
No          the city         residence    to weather
1           6                10           3
2           9                12           11
3           8                12           4
4           3                4            1
5           10               12           11
6           4                6            1
7           5                8            7
8           2                2            4
9           11               18           8
10          9                9            10
11          10               17           8
12          2                2            5
Plot of Attitude with Duration

Step 1: Plot the values in a scatterplot

[Figure: scatterplot of Attitude (vertical axis) against Duration of Residence (horizontal axis, 2.25 to 18), showing a clear positive linear pattern.]

Source: Malhotra and Dash (2011)
Bivariate Regression

Step 2: Derive the best possible fit using the method of least squares

[Figure: fitted line Ŷ = β0 + β1X drawn through the scatter; for each observation the vertical distance ej between the observed Yj and the fitted value Ŷj is the residual, and least squares minimizes the sum of these squared residuals.]

Source: Malhotra and Dash (2011)
Decomposition of Variation

Step 3: Assess how much of the variation in Y the fitted line captures

[Figure: each deviation of Y from its mean Ȳ splits into explained variation (SSreg, fitted value vs Ȳ) and residual variation (SSres, observed value vs fitted value).]

Source: Malhotra and Dash (2011)


Bivariate Regression

Multiple R 0.93608
R2 0.87624
Adjusted R2 0.86387
Standard Error 1.22329

ANALYSIS OF VARIANCE
df Sum of Squares Mean Square

Regression 1 105.95222 105.95222


Residual 10 14.96444 1.49644
F = 70.80266 Significance of F = 0.0000

VARIABLES IN THE EQUATION


Variable     b        SEb      Beta (β)  t      Sig. of t
Duration     0.58972  0.07008  0.93608   8.414  0.0000
(Constant)   1.07932  0.74335            1.452  0.1772

Source: Malhotra and Dash (2011)
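This output can be reproduced from the raw data table. A numpy sketch (the rounded values match the SPSS output above):

```python
import numpy as np

duration = np.array([10.0, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
attitude = np.array([6.0, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

b1, b0 = np.polyfit(duration, attitude, 1)    # slope, then intercept

fitted = b0 + b1 * duration
ss_res = ((attitude - fitted) ** 2).sum()     # residual sum of squares
ss_tot = ((attitude - attitude.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot                      # coefficient of determination

print(round(b1, 5), round(b0, 5), round(r2, 5))  # 0.58972 1.07932 0.87624
```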


Assumptions

The error term is normally distributed: for each fixed value of X, the
distribution of Y is normal.

The means of all these normal distributions of Y, given X, lie on a
straight line with slope β1.

The mean of the error term is 0.

The variance of the error term is constant (homoscedasticity); this
variance does not depend on the values assumed by X.

The error terms are uncorrelated; in other words, the observations
have been drawn independently.

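Two consequences of least squares — zero-mean residuals and residuals uncorrelated with X — hold by construction and make a useful sanity check; normality and constant variance must instead be judged from residual plots. A sketch with the attitude data:

```python
import numpy as np

duration = np.array([10.0, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
attitude = np.array([6.0, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

b1, b0 = np.polyfit(duration, attitude, 1)
e = attitude - (b0 + b1 * duration)           # residuals

# Mean-zero and uncorrelated-with-X hold exactly for OLS with an intercept;
# check normality/homoscedasticity with a histogram and a residual-vs-X plot.
assert abs(e.mean()) < 1e-9
assert abs((e * (duration - duration.mean())).mean()) < 1e-9
```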
Multiple Regression

The general form of the multiple regression model

is as follows:

Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + e

which is estimated by the following equation:

Ŷ = a + b1X1 + b2X2 + b3X3 + ... + bkXk

The coefficient a represents the intercept,

but the b's are now the partial regression coefficients.

© 2007 Prentice Hall
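With the attitude data from the bivariate example plus the importance-of-weather variable, the partial coefficients can be estimated by least squares. A numpy sketch (the rounded values match the SPSS output on the next slide):

```python
import numpy as np

attitude = np.array([6.0, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])
duration = np.array([10.0, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
importance = np.array([3.0, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5])

# Design matrix with an intercept column
X = np.column_stack([np.ones(len(attitude)), importance, duration])
coef, *_ = np.linalg.lstsq(X, attitude, rcond=None)
a, b_imp, b_dur = coef

print(round(a, 5), round(b_imp, 5), round(b_dur, 5))  # 0.33732 0.28865 0.48108
```

Each b is a partial coefficient: the expected change in attitude per unit change in that predictor, holding the other predictor fixed.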


Multiple Regression

Multiple R 0.97210
R2 0.94498
Adjusted R2 0.93276
Standard Error 0.85974

ANALYSIS OF VARIANCE
df Sum of Squares Mean Square

Regression 2 114.26425 57.13213


Residual 9 6.65241 0.73916
F = 77.29364 Significance of F = 0.0000

VARIABLES IN THE EQUATION


Variable     b        SEb      Beta (β)  t      Sig. of t
IMPORTANCE   0.28865  0.08608  0.31382   3.353  0.0085
DURATION     0.48108  0.05895  0.76363   8.160  0.0000
(Constant)   0.33732  0.56736            0.595  0.5668
Source: Malhotra and Dash (2011)
Multicollinearity

Multicollinearity occurs when intercorrelations among


predictors are very high.
The partial regression coefficients may not be estimated precisely.
The standard errors are likely to be high.
The magnitudes, as well as the signs of the partial regression coefficients, may
change from sample to sample.
Assessment of relative importance of the independent variables in explaining
variation in the dependent variable becomes difficult.
Predictor variables may be incorrectly included or removed in stepwise
regression.
Adjusting for Multicollinearity
Use only one of the variables in a highly correlated set of variables.
Transform the set of independent variables into orthogonal factors
through principal component analysis
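A common way to quantify multicollinearity is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors; values above roughly 10 are usually taken as problematic. A sketch for the two-predictor case from the earlier example, where R_j² is simply the squared correlation between the predictors:

```python
import numpy as np

duration = np.array([10.0, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
importance = np.array([3.0, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5])

# With exactly two predictors, R_j^2 is the squared correlation between them
r = np.corrcoef(duration, importance)[0, 1]
vif = 1 / (1 - r ** 2)

print(round(vif, 2))  # about 1.43 - far below 10, so multicollinearity is not a concern here
```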
SPSS Windows: Regression
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then LINEAR.

3. Move the dependent variable into the DEPENDENT box.

4. Move the independent variables into the INDEPENDENT(S) box.

5. Select ENTER in the METHOD box.

6. Click on STATISTICS and check ESTIMATES under REGRESSION


COEFFICIENTS. Check MODEL FIT. Click CONTINUE.

7. Click Plots. In the LINEAR REGRESSION: PLOTS box, move *ZRES


into the Y box and *ZPRED into the X box. Check HISTOGRAM
and NORMAL PROBABILITY PLOT in the STANDARDISED
RESIDUALS PLOT. Click CONTINUE.

8. Click OK.