Data Analysis

Frequency Distribution
In a frequency distribution, one variable is considered at a time.
A frequency distribution for a variable produces a
table of frequency counts, percentages, and
cumulative percentages for all the values associated
with that variable.
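As a rough illustration of the table described above, the sketch below builds frequency counts, percentages, and cumulative percentages for one variable. The `ratings` data and the function name `frequency_distribution` are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_distribution(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * cumulative / n))
    return rows


# Hypothetical responses on a 1-5 rating scale (illustrative data only).
ratings = [3, 4, 2, 5, 4, 3, 4, 1, 5, 4]
table = frequency_distribution(ratings)

print("value count percent cumulative")
for value, count, pct, cum_pct in table:
    print(f"{value:>5} {count:>5} {pct:>6.1f}% {cum_pct:>9.1f}%")
```

The last row's cumulative percentage is always 100%, which is a quick sanity check on the table.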
Statistics Associated with Frequency Distribution
Measures of Location
The mean, or average value, is the most commonly used
measure of central tendency. The mean, $\bar{X}$, is given by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where
$X_i$ = observed values of the variable X
$n$ = number of observations (sample size)
The mode is the value that occurs most frequently. It
represents the highest peak of the distribution. The mode
is a good measure of location when the variable is
inherently categorical or has otherwise been grouped into
categories.

The median of a sample is the middle value when
the data are arranged in ascending or descending
order. If the number of data points is even, the
median is usually estimated as the midpoint between
the two middle values by adding the two middle
values and dividing their sum by 2. The median is
the 50th percentile.

Measures of Variability
The variance is the mean squared deviation from the mean:
$$s_x^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$$
The standard deviation is the square root of the variance.
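The measures of location and variability above can be sketched with Python's standard `statistics` module. The `data` values are hypothetical. The variance is also computed by hand with the n - 1 denominator to mirror the formula on the slide.

```python
import statistics

# Hypothetical sample (illustrative data only).
data = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(data)          # sum of values divided by n
median = statistics.median(data)      # midpoint of the two middle values when n is even
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance, n - 1 in the denominator
std_dev = statistics.stdev(data)      # square root of the sample variance

# The same variance computed directly from the formula.
n = len(data)
manual_variance = sum((x - mean) ** 2 for x in data) / (n - 1)

print(mean, median, mode, variance, std_dev)
```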
Cross-Tabulation
While a frequency distribution describes one variable
at a time, a cross-tabulation describes two or more
variables simultaneously.
Cross-tabulation results in tables that reflect the joint
distribution of two variables with a limited number of
categories or distinct values.
Since two variables have been cross-classified,
percentages can be computed either columnwise,
based on column totals, or rowwise, based on row
totals.
The general rule is to compute the percentages in the
direction of the independent variable, across the
dependent variable.
Pepsi Consumption by Gender

                     Gender
Pepsi Consumption    Male     Female
Light                33.3%    66.7%
Heavy                66.7%    33.3%
Column total         100%     100%
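A minimal sketch of computing percentages in the direction of the independent variable (here gender, so the percentages are columnwise). The slide reports only percentages, so the cell counts below (15 males and 15 females) are an assumption chosen to reproduce them. The function name `column_percentages` is also hypothetical.

```python
from collections import Counter


def column_percentages(pairs):
    """pairs: list of (column_value, row_value) tuples.

    Returns {row: {column: percent of that column's total}}.
    """
    cell_counts = Counter(pairs)
    column_totals = Counter(col for col, _ in pairs)
    rows = sorted({row for _, row in pairs})
    cols = sorted(column_totals)
    return {
        row: {col: 100 * cell_counts[(col, row)] / column_totals[col] for col in cols}
        for row in rows
    }


# Assumed counts (not given on the slide): 15 males and 15 females.
respondents = (
    [("Male", "Light")] * 5
    + [("Male", "Heavy")] * 10
    + [("Female", "Light")] * 10
    + [("Female", "Heavy")] * 5
)
table = column_percentages(respondents)

for row, cells in table.items():
    print(row, {col: f"{pct:.1f}%" for col, pct in cells.items()})
```

Each column sums to 100%, which matches the "Column total" row of the table.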
Purchase of Fashion Clothing by Marital Status

                                Current Marital Status
Purchase of Fashion Clothing    Married    Unmarried
High                            31%        52%
Low                             69%        48%
Column total                    100%       100%
Number of respondents           700        300
Purchase of Fashion Clothing by Marital Status and Sex

                                              Sex
                                Male                      Female
Purchase of Fashion Clothing    Married    Not Married    Married    Not Married
High                            35%        40%            25%        60%
Low                             65%        60%            75%        40%
Column totals                   100%       100%           100%       100%
Number of cases                 400        120            300        180
Case Cleopatra
Prelaunch Market Research

Objective: to assess
Response to the Cleopatra advertisement
Product acceptance
Design
Supergroup
Ad test, Product Placement
Methodology
??
Toronto
Case Cleopatra

Results
Positive from the group
50% Buying Intention Post Ad
64% Buying Intention Post Trial
Decision
Launch in Quebec
Premium
Advt. and some consumer promotion
Case Cleopatra

Problems
Location
Beyond Trial
Adoption, Purchase Frequency
Poor performance
Sales
Case: Cleopatra
Post Launch Study

204 All Soap Users
99 Cleopatra Users (Try)
Results
High Awareness
73.5%
Low Trials
14%
Case Cleopatra
Trial Implications
Lost Opportunity
73.5% awareness vs. 14.2% trial
Critical factor
High awareness not enough
Awareness, Interest, Evaluation, Trial, Adoption
Case Cleopatra
Low Trials Reasons

Lack of adequate promotional support
Low redemption of coupons
Sweepstakes did not work at all
Problems with the ad : Exhibit 13

63% do not intend to try
59% no or a negative reaction to the Cleopatra
Why?
Case Cleopatra
Problems with the ad

Jug of perfume being poured
Strong smell a problem
Perceived to be harsh and not for skin care
Footnote Exhibit 11
Execution of bath
Showers outnumber baths 4:1 in Quebec (Exhibit 12)
Not for everyday usage
67% occasional usage (Exhibit 12)
Case Cleopatra
Decision Options
Discontinue brand
Continue the current strategy
4.5% market share
Smaller niche
Case Cleopatra
Decision Options
Discontinue brand
Subsidiary/Sales force reputation
Externally
Internally
Need a contender for skin care segment
Case Cleopatra
Decision Options
Continue the current strategy
Significantly higher trial levels

Increase in promotions
Increase in expenses
More losses
Case Cleopatra
Brand Performance
High Conversion rate
Strong diagnostics among users
Exhibit 10
Skin care 50%
Fragrance 53%
Case Cleopatra: Exhibit 9

Brand               Conversion rate (all + most occasions) / ever tried
Aloe and Lanolin    16%
Camay               14%
Cleopatra           31%
Dove                21%
Palmolive           12%
Case Cleopatra
Scale down expectations

Target a smaller segment
Need to profile current acceptors

Need to promote to this group
Change advertising- low/drop.
Reduce distribution coverage
With better incentives
Further Analysis: Crosstabs
Exhibits 9 and 10
Dove Regular vs. Others
Age segments
MHI groups
Problem 0
Pepsi has conducted a pilot U & A (usage and attitude)
study for its brands. It found that the favourite
brand varies between males and females: 5 of 15 males
and 10 of 15 females prefer Mirinda, and the reverse
is true for Pepsi. How should Pepsi test
this relationship?
Statistics Associated with Cross-Tabulation
Chi-Square
To determine whether a systematic association
exists, the probability of obtaining a value of
chi-square as large as or larger than the one calculated
from the cross-tabulation is estimated.
An important characteristic of the chi-square statistic
is the number of degrees of freedom (df) associated
with it. That is, $df = (r - 1) \times (c - 1)$, where r is
the number of rows and c is the number of columns.
The null hypothesis (H0) of no association between
the two variables will be rejected only when the
calculated value of the test statistic is greater than
the critical value of the chi-square distribution with the
appropriate degrees of freedom.

The chi-square statistic ($\chi^2$) is used to test the
statistical significance of the observed association in
a cross-tabulation.
The expected frequency for each cell can be
calculated by using a simple formula:
$$f_e = \frac{n_r n_c}{n}$$
where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size

For the data in the table, the expected frequencies for
the cells, going from left to right and from top to
bottom, are:
$$\frac{15 \times 15}{30} = 7.50, \quad \frac{15 \times 15}{30} = 7.50, \quad \frac{15 \times 15}{30} = 7.50, \quad \frac{15 \times 15}{30} = 7.50$$
Then the value of $\chi^2$ is calculated as follows:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
For the data in the table, the value of $\chi^2$ is
calculated as:
$$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5}$$
$$= 0.833 + 0.833 + 0.833 + 0.833$$
$$= 3.333$$
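The calculation above can be reproduced with a short sketch. The observed counts come from Problem 0 (5 of 15 males and 10 of 15 females prefer Mirinda, the reverse for Pepsi). The critical value 3.841 is the standard tabulated chi-square value for 1 degree of freedom at the 0.05 level. It does not appear on the slides and is included here as an assumption about the significance level being used.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency n_r * n_c / n
            statistic += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: Mirinda, Pepsi. Columns: male, female (counts from Problem 0).
observed = [[5, 10], [10, 5]]
statistic, df = chi_square(observed)

CRITICAL_VALUE = 3.841  # tabulated chi-square, df = 1, 0.05 level (assumed level)
reject_h0 = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under that assumed 0.05 level, the calculated value of 3.333 is below the critical value, so with this pilot sample the null hypothesis of no association would not be rejected.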
Marketing Problem 1
Vodafone Mobile has conducted a pilot
customer satisfaction study. From a sample of
29 IIM students, it found an average of 4.724 on a
7-point satisfaction scale, with a standard deviation
of 1.579. For the firm, the minimum acceptable level
of customer satisfaction is a value greater than 4.
What should the market research manager recommend to
the marketing manager?
Hypothesis Testing Using the t Statistic
1. Formulate the null (H0) and the alternative (H1)
   hypotheses.
2. Select the appropriate formula for the t statistic.
3. Select a significance level for testing H0. Typically,
   the 0.05 level is selected.
4. Take one or two samples and compute the mean
   and standard deviation for each sample.
5. Calculate the t statistic assuming H0 is true.
6. Calculate the degrees of freedom and estimate the
   probability of getting a more extreme value of the
   statistic from Table 4 (alternatively, calculate the
   critical value of the t statistic).
7. If the probability computed in step 6 is smaller than
   the significance level selected in step 3, reject H0.
   If the probability is larger, do not reject H0.
8. Express the conclusion reached by the t test in
   terms of the marketing research problem.
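One Sample
t Test
The hypotheses may be formulated as:
$$H_0: \mu \le 4.0$$
$$H_1: \mu > 4.0$$
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
$$s_{\bar{X}} = 1.579/\sqrt{29} = 1.579/5.385 = 0.293$$
$$t = (4.724 - 4.0)/0.293 = 0.724/0.293 = 2.471$$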
The degrees of freedom for the t statistic to test the
hypothesis about one mean are n - 1. In this case,
n - 1 = 29 - 1 or 28. From Table 4 in the Statistical
Appendix, the probability of getting a more extreme
value than 2.471 is less than 0.05 (Alternatively, the
critical t value for 28 degrees of freedom and a
significance level of 0.05 is 1.7011, which is less than
the calculated value). Hence, the null hypothesis is
rejected. The satisfaction level does exceed 4.0.
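The one-sample test above can be checked from the summary statistics alone. The sketch below uses the figures given in Marketing Problem 1 and the critical value 1.7011 quoted on the slide. The function name `one_sample_t` is hypothetical. Because the code does not round the standard error to 0.293, it gives t of about 2.469 rather than 2.471.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic and degrees of freedom for a test about one mean."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1


# Summary statistics from Marketing Problem 1.
t, df = one_sample_t(sample_mean=4.724, hypothesized_mean=4.0, sample_sd=1.579, n=29)

CRITICAL_T = 1.7011  # one-tailed critical value for 28 df at 0.05, as quoted on the slide
reject_h0 = t > CRITICAL_T

print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```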
Marketing Problem 2
Levers has launched a new brand of
coffee. It is interested in knowing if
consumers in South and North India are
responding differently to its new
product. What testing procedure do you
recommend to Levers?
Two Independent Samples
Means
In the case of means for two independent samples,
the hypotheses take the following form.
$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \neq \mu_2$$
The two populations are sampled and the means and
variances computed based on samples of sizes $n_1$
and $n_2$. If both populations are found to have the
same variance, a pooled variance estimate is
computed from the two sample variances as follows:
$$s^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2} \quad \text{or} \quad s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

The standard deviation of the test statistic can be
estimated as:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
The appropriate value of t can be calculated as:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$$
The degrees of freedom in this case are $(n_1 + n_2 - 2)$.

F Test
An F test of sample variance may be performed
if it is not known whether the two populations have
equal variance. In this case, the hypotheses
are:
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$
F Statistic
The F statistic is computed from the sample
variances as follows:
$$F_{(n_1 - 1),(n_2 - 1)} = \frac{s_1^2}{s_2^2}$$
where
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$n_1 - 1$ = degrees of freedom for sample 1
$n_2 - 1$ = degrees of freedom for sample 2
$s_1^2$ = sample variance for sample 1
$s_2^2$ = sample variance for sample 2
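A minimal sketch of the two-sample formulas, assuming the null hypothesis that the two population means are equal (so the hypothesized difference is 0). The `south` and `north` ratings are hypothetical data invented for illustration, and the function names `pooled_t` and `f_ratio` are not from the slides.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0, with its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    return t, n1 + n2 - 2


def f_ratio(sample1, sample2):
    """Ratio of sample variances, with (n1 - 1) and (n2 - 1) degrees of freedom."""
    return statistics.variance(sample1) / statistics.variance(sample2)


# Hypothetical liking scores from two regions (illustrative data only).
south = [5, 6, 7, 6, 6]
north = [4, 5, 3, 4, 4]

f_stat = f_ratio(south, north)
t, df = pooled_t(south, north)

print(f"F = {f_stat:.3f}, t = {t:.3f}, df = {df}")
```

In practice the F ratio would be examined first, since the pooled estimate is only appropriate when the two variances can be treated as equal.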
Pepsi has launched two new variants of
diet Pepsi. It has decided to conduct a
product test to arrive at a suitable
product, using a C.L.T. on a group of
consumers. What testing procedures would
you suggest to Pepsi?
Paired Samples
The difference in these cases is examined by a paired samples t
test. To compute t for paired samples, the paired difference
variable, denoted by D, is formed and its mean and variance
calculated. Then the t statistic is computed. The degrees of
freedom are n - 1, where n is the number of pairs. The relevant
formulas are:
$$H_0: \mu_D = 0$$
$$H_1: \mu_D \neq 0$$
$$t_{n-1} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$
where
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
$$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
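The paired-samples formulas can be sketched as follows, assuming each consumer rates both variants so that the observations form pairs. The scores for `variant_a` and `variant_b` are hypothetical, and `paired_t` is an illustrative name.

```python
import math
import statistics


def paired_t(first, second, hypothesized_mean_diff=0.0):
    """Paired-samples t statistic and degrees of freedom (n - 1, n = number of pairs)."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    d_bar = statistics.mean(differences)  # mean of the paired differences
    s_d = statistics.stdev(differences)   # standard deviation of the differences
    t = (d_bar - hypothesized_mean_diff) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings of two variants by the same five consumers.
variant_a = [6, 5, 7, 6, 8]
variant_b = [5, 5, 6, 4, 7]

t, df = paired_t(variant_a, variant_b)
print(f"t = {t:.3f}, df = {df}")
```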
Nestle has launched a new variant of
drinking chocolate. It has decided to
conduct a product test to assess
consumer response. It has divided the
country into 4 geographic zones and
would like to know if regional
differences are relevant for this new
launch. What testing procedures would
you suggest to Nestle?
Relationship Among Techniques
Analysis of variance (ANOVA) is used as a test of
means for two or more populations. The null
hypothesis, typically, is that all means are equal.
Analysis of variance must have a dependent variable
that is metric (measured using an interval or ratio
scale).
There must also be one or more independent
variables that are all categorical (nonmetric).
Categorical independent variables are also called
factors.
Decomposition of the Total Variation: One-way ANOVA

Independent variable X, with categories X1, X2, X3, ..., Xc

                 X1     X2     X3     ...    Xc     Total sample
                 Y1     Y1     Y1            Y1     Y1
                 Y2     Y2     Y2            Y2     Y2
                 :      :      :             :      :
                 Yn     Yn     Yn            Yn     YN
Category mean    Ȳ1     Ȳ2     Ȳ3            Ȳc     Ȳ

Within category variation = SSwithin (down each column)
Between category variation = SSbetween = SSx (across the category means)
Total variation = SSy (over the total sample)
Statistics Associated with One-way Analysis of Variance
SSbetween. Also denoted as SSx, this is the variation
in Y related to the variation in the means of the
categories of X. This represents variation between
the categories of X, or the portion of the sum of
squares in Y related to X.
SSwithin. Also referred to as SSerror, this is the
variation in Y due to the variation within each of the
categories of X. This variation is not accounted for
by X.
SSy. This is the total variation in Y.
Conducting One-way Analysis of Variance
Decompose the Total Variation
The total variation in Y, denoted by SSy, can be
decomposed into two components:
SSy = SSbetween + SSwithin
where the subscripts between and within refer to the
categories of X. SSbetween is the variation in Y related
to the variation in the means of the categories of X.
For this reason, SSbetween is also denoted as SSx.
SSwithin is the variation in Y related to the variation
within each category of X. SSwithin is not accounted
for by X. Therefore it is referred to as SSerror.

Test Significance
The null hypothesis may be tested by the F statistic
based on the ratio between the two mean square estimates:
$$F = \frac{SS_x / (c - 1)}{SS_{error} / (N - c)} = \frac{MS_x}{MS_{error}}$$
This statistic follows the F distribution, with (c - 1) and
(N - c) degrees of freedom (df).

Interpret the Results
If the null hypothesis of equal category means is not
rejected, then the independent variable does not
have a significant effect on the dependent variable.
On the other hand, if the null hypothesis is rejected,
then the effect of the independent variable is
significant.
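The one-way ANOVA steps above (decompose the total variation, then form the F ratio) can be sketched as below. The four `zones` of ratings are hypothetical data loosely modelled on the Nestle problem, and `one_way_anova` is an illustrative name. The resulting F would still need to be compared with a tabulated critical value for (c - 1) and (N - c) degrees of freedom.

```python
def one_way_anova(groups):
    """Return (SSy, SSbetween, SSwithin, F, (df_between, df_within)) for a list of groups."""
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)  # N, the total sample size
    c = len(groups)            # number of categories of X
    grand_mean = sum(all_values) / n_total

    # Variation of the category means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own category mean.
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    # Variation of each observation around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    f = (ss_between / (c - 1)) / (ss_within / (n_total - c))
    return ss_total, ss_between, ss_within, f, (c - 1, n_total - c)


# Hypothetical product ratings from four geographic zones (illustrative data only).
zones = [[4, 5, 6], [6, 7, 8], [5, 6, 7], [8, 9, 10]]
ss_total, ss_between, ss_within, f, df = one_way_anova(zones)

print(f"SSy = {ss_total}, SSbetween = {ss_between}, SSwithin = {ss_within}")
print(f"F = {f:.2f} with df = {df}")
```

The check that SSy equals SSbetween + SSwithin is a useful guard against arithmetic slips when the decomposition is done by hand.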
