
# S10: REGRESSION ANALYSIS

## PGP I Term 3 (2016-17)

Research for Marketing Decisions

## Prof. Avinash Mulky

## Review: Basic analysis

- Check frequencies and percentages for all important variables.
- Calculate means and standard deviations for important interval-scaled variables.
- Do cross tabulations (two variables at a time) for nominally scaled variables to generate insights.
- Do t-tests for important interval-scaled variables.
- Check ANOVAs for the impact of categorical variables on interval measurements:
  - one-way ANOVAs for one factor;
  - two-way (or n-way) ANOVAs for two (or more) factors and interactions.
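As a quick illustration of the t-test step (not part of the slides; the ratings below are made up), the pooled two-sample t statistic can be computed directly:

```python
import numpy as np

def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate
    (the equal-variance form of the test)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(sp2 * (1 / na + 1 / nb))
    return (a.mean() - b.mean()) / se

# Hypothetical ratings from two consumer segments
t = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # -> -2.0
```

The statistic would then be compared with the t distribution on na + nb - 2 degrees of freedom, exactly as SPSS does internally.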
## Review: Cross tabulations

- Study associations between pairs of nominal variables, e.g. brands A, B, C, D across segments formed by age, gender, SEC.
- The underlying model is independence between the two variables (no association between the row variable and the column variable).
- In the chi-square test we test this model by comparing observed and expected frequencies.
- To analyze models with more than two dimensions, we have to use log-linear models.
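The observed-versus-expected comparison can be sketched in a few lines of Python (the brand-by-gender counts below are invented for illustration):

```python
import numpy as np

def chi_square_independence(observed):
    """Chi-square statistic for a two-way contingency table:
    compare observed counts with the counts expected under independence."""
    observed = np.asarray(observed, float)
    row = observed.sum(axis=1, keepdims=True)   # row totals as a column vector
    col = observed.sum(axis=0, keepdims=True)   # column totals as a row vector
    expected = row @ col / observed.sum()       # outer product of margins / grand total
    return ((observed - expected) ** 2 / expected).sum()

# Hypothetical 2x2 brand-by-gender table
chi2 = chi_square_independence([[20, 30], [30, 20]])  # -> 4.0
```

With (2-1)(2-1) = 1 degree of freedom the 5% critical value is 3.84, so a statistic of 4.0 would lead us to reject independence here.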
## 2-way ANOVA

|                   | Advertising: Low | Advertising: Medium | Advertising: High |
|-------------------|------------------|---------------------|-------------------|
| **Price: Low**    |                  |                     |                   |
| **Price: Medium** |                  |                     |                   |
| **Price: High**   |                  |                     |                   |

(Each cell holds the observations for one treatment combination.)

Model: $x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$

The total variation splits into four components:

$SS_{total} = SS_{ads} + SS_{price} + SS_{ads \times price} + SS_{within}$

where, for $a$ ad levels, $b$ price levels, and $n$ observations per cell,

$SS_{total} = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x})^2$

$SS_{ads} = bn \sum_i (\bar{x}_{i..} - \bar{x})^2$

$SS_{price} = an \sum_j (\bar{x}_{.j.} - \bar{x})^2$

$SS_{ads \times price} = n \sum_i \sum_j (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x})^2$

$SS_{within} = \sum_i \sum_j \sum_k (x_{ijk} - \bar{x}_{ij.})^2$
## Results

| Source                 | SS     | df | MS    | F    | p-value |
|------------------------|--------|----|-------|------|---------|
| A (Ad)                 | 5.3    | 1  | 5.3   | 0.75 | 0.3951  |
| B (Price)              | 74.46  | 2  | 37.23 | 5.27 | 0.0127  |
| A x B (Ad x Price)     | 52.6   | 2  | 26.3  | 3.72 | 0.0392  |
| Error (5 test markets) | 169.6  | 24 | 7.07  |      |         |
| Total                  | 301.96 | 29 |       |      |         |

Source: Iacobucci (2015)
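The sum-of-squares decomposition holds exactly for any balanced layout, which is easy to verify numerically. The sketch below uses randomly generated data, not the Iacobucci test-market figures:

```python
import numpy as np

rng = np.random.default_rng(1)
a_levels, b_levels, n = 2, 3, 5          # e.g. 2 ad themes x 3 price levels x 5 test markets
x = rng.normal(size=(a_levels, b_levels, n))

grand = x.mean()
mean_a = x.mean(axis=(1, 2))             # marginal means of factor A
mean_b = x.mean(axis=(0, 2))             # marginal means of factor B
mean_ab = x.mean(axis=2)                 # cell means

ss_total = ((x - grand) ** 2).sum()
ss_a = b_levels * n * ((mean_a - grand) ** 2).sum()
ss_b = a_levels * n * ((mean_b - grand) ** 2).sum()
ss_ab = n * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
ss_within = ((x - mean_ab[:, :, None]) ** 2).sum()

# For a balanced design the four components add back to the total
assert abs(ss_total - (ss_a + ss_b + ss_ab + ss_within)) < 1e-9
```

Dividing each SS by its degrees of freedom gives the mean squares, and each F ratio is that mean square over the within-cell mean square, as in the table above.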

## Table of means: Purchase intention or sales

[Table and figure: cell means of purchase intention by Ad type (Standard, Luxury) and Price (Low, Medium, High), with an interaction plot of purchase intention against price for the Standard and Luxury ads.]

Source: Adapted from Iacobucci (2015)
## Type of relationship examined

Dependence techniques:

- One dependent variable, single relationship:
  - Multiple Regression (dependent metric)
  - Conjoint Analysis (dependent metric)
  - Multiple Discriminant Analysis (dependent non-metric)
  - Linear Probability Models (dependent non-metric)
- Several dependent variables, single relationship:
  - Canonical Correlation Analysis with dummy variables (dependent metric/non-metric, predictor metric)
  - Multivariate Analysis of Variance (dependent metric, predictor non-metric)
- Multiple relationships of dependent and independent variables:
  - Structural Equation Modeling

Interdependence techniques:

- Focus on variables: Factor Analysis
- Focus on cases/objects: Cluster Analysis
- Focus on objects:
  - Multidimensional Scaling (attributes are metric)
  - Correspondence Analysis (attributes are metric/non-metric)
## Correlation

- Also known as the product moment correlation, Pearson correlation coefficient, simple correlation, or bivariate correlation.
- An index used to determine whether a linear (straight-line) relationship exists between two metric (interval- or ratio-scaled) variables, say X and Y.
- Represented by r: $r = Cov_{XY} / (S_X S_Y)$
- r varies between -1 and +1; its absolute value indicates the strength of the association between X and Y.
- Does not depend on the underlying units of measurement.
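A small sketch (with made-up numbers) of the formula, including a check of the unit-free property:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of X and Y divided by the
    product of their standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# r is unit-free: linearly rescaling X (say, metres to feet) leaves it unchanged
r_scaled = pearson_r([3.28 * v + 10 for v in x], y)
assert abs(r - r_scaled) < 1e-12
```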
## SPSS Windows: Pearson Correlation

3. Move Attitude [attitude] into the VARIABLES box. Then move Duration [duration] into the VARIABLES box.
6. Check FLAG SIGNIFICANT CORRELATIONS.
7. Click OK.
## Bivariate Regression

In the bivariate regression model, the general form of a straight line is:

$Y = \beta_0 + \beta_1 X$

where

- Y = dependent or criterion variable
- X = independent or predictor variable
- $\beta_0$ = intercept of the line
- $\beta_1$ = slope of the line

The regression procedure adds an error term to account for the probabilistic or stochastic nature of the relationship:

$Y_i = \beta_0 + \beta_1 X_i + e_i$

where $e_i$ is the error term associated with the i-th observation.
## Explaining Attitude Toward the City of Residence

| Respondent No. | Attitude toward the city | Duration of residence | Importance attached to weather |
|----------------|--------------------------|-----------------------|--------------------------------|
| 1  | 6  | 10 | 3  |
| 2  | 9  | 12 | 11 |
| 3  | 8  | 12 | 4  |
| 4  | 3  | 4  | 1  |
| 5  | 10 | 12 | 11 |
| 6  | 4  | 6  | 1  |
| 7  | 5  | 8  | 7  |
| 8  | 2  | 2  | 4  |
| 9  | 11 | 18 | 8  |
| 10 | 9  | 9  | 10 |
| 11 | 10 | 17 | 8  |
| 12 | 2  | 2  | 5  |
## Plot of Attitude with Duration

[Figure: scatter plot of attitude toward the city (vertical axis) against duration of residence (horizontal axis, roughly 2 to 18 years), showing a positive linear pattern.]

Source: Malhotra and Dash (2011)
## Bivariate Regression

Step 2: Derive the best possible fit using the method of least squares.

[Figure: fitted line $\hat{Y} = \beta_0 + \beta_1 X$ with observed values $Y_j$ at $X_1, \ldots, X_5$ and the residuals $e_j$ measured vertically from each point to the line.]

Source: Malhotra and Dash (2011)
## Decomposition of Variation

Step 3: Assess how much of the variation in Y the fitted line captures.

[Figure: the total variation of Y about its mean $\bar{Y}$ split into the variation explained by the regression line and the residual variation $SS_{res}$, shown at $X_1, \ldots, X_5$.]

Source: Malhotra and Dash (2011)
## Bivariate Regression

| Multiple R     | 0.93608 |
|----------------|---------|
| R²             | 0.87624 |
| Adjusted R²    | 0.86387 |
| Standard Error | 1.22329 |

Analysis of variance:

| Source     | df | Sum of Squares | Mean Square |
|------------|----|----------------|-------------|
| Regression | 1  | 105.95222      | 105.95222   |
| Residual   | 10 | 14.96444       | 1.49644     |

F = 70.80266, significance of F = 0.0000

Variables in the equation:

| Variable   | b       | SE_b    | Beta (β) | t     | Significance of t |
|------------|---------|---------|----------|-------|-------------------|
| Duration   | 0.58972 | 0.07008 | 0.93608  | 8.414 | 0.0000            |
| (Constant) | 1.07932 | 0.74335 |          | 1.452 | 0.1772            |
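These figures can be reproduced from the 12-observation attitude/duration table with a few lines of least-squares arithmetic; a minimal check in Python, assuming the data exactly as printed earlier:

```python
import numpy as np

duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2], float)
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2], float)

# Least-squares slope and intercept for attitude = b0 + b1 * duration
xc = duration - duration.mean()
yc = attitude - attitude.mean()
b1 = (xc * yc).sum() / (xc ** 2).sum()
b0 = attitude.mean() - b1 * duration.mean()

# R^2 = 1 - residual variation / total variation
resid = attitude - (b0 + b1 * duration)
r2 = 1 - (resid ** 2).sum() / (yc ** 2).sum()

print(round(b1, 5), round(b0, 5), round(r2, 5))
# matches the output: 0.58972, 1.07932, 0.87624
```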

## Assumptions

- The error term is normally distributed: for each fixed value of X, the distribution of Y is normal.
- The means of all these normal distributions of Y, given X, lie on a straight line with slope $\beta_1$.
- The variance of the error term is constant; it does not depend on the values assumed by X.
- The error terms are uncorrelated; in other words, the observations have been drawn independently.
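Two algebraic consequences of fitting least squares with an intercept are worth knowing when reading residual plots: the residuals sum to zero and are uncorrelated with the predictor by construction, so any visible pattern signals a violated assumption, not the fit itself. A small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # made-up linear model plus noise

# OLS fit via the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Residuals sum to ~0 and have ~zero covariance with x
assert abs(resid.sum()) < 1e-8
assert abs((resid * (x - x.mean())).sum()) < 1e-8
```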

## Multiple Regression

The general form of the multiple regression model is as follows:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_k X_k + e$

which is estimated by the following equation:

$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \ldots + b_k X_k$

The coefficient a represents the intercept, but the b's are now the partial regression coefficients.
Source: Prentice Hall (2007)

## Multiple Regression

| Multiple R     | 0.97210 |
|----------------|---------|
| R²             | 0.94498 |
| Adjusted R²    | 0.93276 |
| Standard Error | 0.85974 |

Analysis of variance:

| Source     | df | Sum of Squares | Mean Square |
|------------|----|----------------|-------------|
| Regression | 2  | 114.26425      | 57.13213    |
| Residual   | 9  | 6.65241        | 0.73916     |

F = 77.29364, significance of F = 0.0000

Variables in the equation:

| Variable   | b       | SE_b    | Beta (β) | t     | Significance of t |
|------------|---------|---------|----------|-------|-------------------|
| IMPORTANCE | 0.28865 | 0.08608 | 0.31382  | 3.353 | 0.0085            |
| DURATION   | 0.48108 | 0.05895 | 0.76363  | 8.160 | 0.0000            |
| (Constant) | 0.33732 | 0.56736 |          | 0.595 | 0.5668            |

Source: Malhotra and Dash (2011)
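The partial regression coefficients above can likewise be reproduced from the 12-observation data table by solving the least-squares problem with both predictors; a minimal check, assuming the data exactly as printed earlier:

```python
import numpy as np

duration   = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2], float)
importance = np.array([3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5], float)
attitude   = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2], float)

# Solve attitude = a + b_imp * importance + b_dur * duration by least squares
X = np.column_stack([np.ones(12), importance, duration])
coef, *_ = np.linalg.lstsq(X, attitude, rcond=None)
a, b_imp, b_dur = coef

print(round(b_imp, 5), round(b_dur, 5), round(a, 5))
# matches the table: 0.28865, 0.48108, 0.33732
```

Note how each b is a *partial* coefficient: the duration coefficient drops from 0.58972 in the bivariate model to 0.48108 once importance of weather is held constant.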
## Multicollinearity

Multicollinearity occurs when intercorrelations among predictors are very high. Consequences:

- The partial regression coefficients may not be estimated precisely; the standard errors are likely to be high.
- The magnitudes, as well as the signs, of the partial regression coefficients may change from sample to sample.
- Assessing the relative importance of the independent variables in explaining variation in the dependent variable becomes difficult.
- Predictor variables may be incorrectly included or removed in stepwise regression.
## Adjusting for Multicollinearity

- Use only one of the variables in a highly correlated set of variables.
- Transform the set of independent variables into orthogonal factors through principal component analysis.
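A common way to detect the problem before adjusting for it is the variance inflation factor (VIF), one per predictor. This sketch is an illustration with synthetic data, not part of the slides; a VIF much above 10 is the usual red flag:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on all other columns plus an intercept."""
    X = np.asarray(X, float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly a copy of x1 -> collinear pair
x3 = rng.normal(size=200)              # unrelated predictor
v1, v2, v3 = vif(np.column_stack([x1, x2, x3]))
# v1 and v2 blow up because of the collinear pair; v3 stays near 1
```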
## SPSS Windows: Regression

1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then LINEAR.
6. Click on STATISTICS and check ESTIMATES under REGRESSION COEFFICIENTS. Check MODEL FIT. Click CONTINUE.
7. Click PLOTS. In the LINEAR REGRESSION: PLOTS box, move *ZRESID into the Y box and *ZPRED into the X box. Check HISTOGRAM and NORMAL PROBABILITY PLOT under STANDARDIZED RESIDUAL PLOTS. Click CONTINUE.
8. Click OK.