
Factor Analysis

Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It extracts the maximum common variance from all variables and puts it into a common score. This score can be used as an index of all the variables for further analysis. Factor analysis is part of the general linear model (GLM) and rests on several assumptions: the relationships are linear, there is no multicollinearity, the relevant variables are included in the analysis, and there is a true correlation between variables and factors. Several extraction methods are available, but principal component analysis is used most commonly.

Types of factoring:

There are different types of methods used to extract the factor from the data set:

1. Principal component analysis: This is the most common method used by researchers. PCA starts by extracting the maximum variance and putting it into the first factor. It then removes the variance explained by the first factor and extracts the maximum remaining variance for the second factor. This process continues until the last factor.

2. Common factor analysis: The second most preferred method by researchers, it extracts the common variance and puts it into factors. This method does not include the unique variance of each variable. It is used in SEM.

3. Image factoring: This method is based on the correlation matrix. OLS regression is used to predict the factors in image factoring.

4. Maximum likelihood method: This method also works on the correlation matrix, but it uses maximum likelihood estimation to extract the factors.

5. Other methods of factor analysis: Alpha factoring and unweighted least squares are also available; weighted least squares is another regression-based method used for factoring.

Factor loading:

Factor loading is essentially the correlation coefficient between the variable and the factor. A factor loading shows how much of the variance in the variable is explained by that particular factor. In the SEM approach, as a rule of thumb, a factor loading of 0.7 or higher indicates that the factor extracts sufficient variance from that variable.

Eigenvalues: Eigenvalues are also called characteristic roots. The eigenvalue shows the variance explained by that particular factor out of the total variance. From the communality column, we can see how much variance is explained by the first factor out of the total variance. For example, if our first factor explains 68% of the total variance, the remaining 32% will be explained by the other factors.
Factor score: The factor score is also called the component score. It is a score computed for every row (case) across all columns (variables), and it can be used as an index of all the variables for further analysis. We can standardize this score by multiplying it by a common term. In any subsequent analysis using factor scores, we assume that all variables behave as their factor scores.

Criteria for determining the number of factors: According to the Kaiser criterion, the eigenvalue is a good criterion for determining a factor. If the eigenvalue is greater than one, we should consider it a factor; if the eigenvalue is less than one, we should not. According to the variance extraction rule, the variance extracted should be more than 0.7; if it is less than 0.7, we should not consider it a factor.

Rotation method: Rotation makes the output easier to interpret. Eigenvalues do not affect the rotation method, but the rotation method affects the eigenvalues, i.e. the percentage of variance extracted by each factor. A number of rotation methods are available: (1) no rotation, (2) varimax rotation, (3) quartimax rotation, (4) direct oblimin rotation, and (5) promax rotation. Each of these can be easily selected in SPSS, and we can compare the variance explained under each method.
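
A minimal sketch of varimax rotation, assuming scikit-learn 0.24 or later (whose FactorAnalysis accepts a rotation argument); the data are random and purely illustrative:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis  # rotation= requires scikit-learn >= 0.24

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                     # illustrative data, rows = respondents

fa_none = FactorAnalysis(n_components=2).fit(X)
fa_varimax = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

# components_ holds the loadings (factors x variables); rotation redistributes the
# variance explained across the factors without changing the total.
print(fa_none.components_.T.round(2))      # unrotated loadings
print(fa_varimax.components_.T.round(2))   # varimax-rotated loadings
```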

Assumptions:

No outlier: Assume that there are no outliers in data.

Adequate sample size: The number of cases must be greater than the number of factors.

No perfect multicollinearity: Factor analysis is an interdependency technique. There should not be perfect multicollinearity between the variables.

Homoscedasticity: Since factor analysis is a linear function of measured variables, it does not
require homoscedasticity between the variables.

Linearity: Factor analysis is also based on the linearity assumption. Non-linear variables can also be used; after transformation, however, they become linear.

Interval Data: Interval data are assumed.

Key concepts and terms:

Exploratory factor analysis: Assumes that any indicator or variable may be associated with any
factor. This is the most common factor analysis used by researchers and it is not based on any
prior theory.

Confirmatory factor analysis (CFA): Used to determine the factors and factor loadings of measured variables, and to confirm what is expected on the basis of pre-established theory. CFA assumes that each factor is associated with a specified subset of measured variables. It commonly uses two approaches:

The traditional method: The traditional factor method is based on the principal factor analysis method rather than common factor analysis. The traditional method allows the researcher to gain more insight into the factor loadings.

The SEM approach: CFA can alternatively be carried out as a structural equation model. In the SEM diagram, each latent variable sends straight arrows only to its own set of observed variables, curved arrows represent the covariance between every pair of latent variables, and error and disturbance terms are attached to their respective observed variables. If a standardized residual in the SEM is less than two in absolute value, the factor is considered to extract sufficient variance from that variable; if it is greater than two, there is still unexplained variance that could be captured by another factor. Chi-square and a number of other goodness-of-fit indexes are used to test how well the model fits.

For a detailed explanation: https://stats.idre.ucla.edu/spss/output/factor-analysis/

--------------------------------------------------------------------------------------------------------------------------------

Communality in Factor Analysis

Communalities indicate the common variance shared by the factors with a given variable. A higher communality indicates that a larger amount of the variance in the variable has been extracted by the factor solution. For a good factor analysis, communalities should be 0.4 or greater.
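
A minimal sketch of the calculation, assuming a loadings matrix (variables x retained factors) is already available; the numbers are illustrative:

```python
import numpy as np

# Illustrative loadings: one row per variable, one column per retained factor
loadings = np.array([
    [0.78, 0.10],
    [0.65, 0.25],
    [0.15, 0.72],
    [0.20, 0.30],
])
communalities = (loadings ** 2).sum(axis=1)   # variance extracted per variable
flag = communalities < 0.4                    # variables falling below the 0.4 guideline
print(communalities.round(2), flag)
```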

-------------------------------------------------------------------------------------------------------------------

Rotation of Factors

Most of the rationale for rotating factors comes from Thurstone (1947) and Cattell (1978), who defended its use because this procedure simplifies the factor structure and therefore makes its interpretation easier and more reliable (i.e., easier to replicate with different data samples).

Thurstone suggested five criteria to identify a simple structure. According to these criteria, still often reported in the literature, a matrix of loadings (where the rows correspond to the original variables and the columns to the factors) is simple if:

1. each row contains at least one zero;

2. for each column, there are at least as many zeros as there are columns (i.e., number of factors kept);

3. for any pair of factors, there are some variables with zero loadings on one factor and large loadings on the other factor;

4. for any pair of factors, there is a sizable proportion of zero loadings;

5. for any pair of factors, there is only a small number of large loadings.

Rotations of factors can be (and used to be) done graphically, but are mostly obtained analytically, which requires specifying the notion of simple structure mathematically in order to implement it in a computer program.

-------------------------------------------------------------------------------------------------------------------

KMO Analysis

The Kaiser-Meyer-Olkin (KMO) test is a measure of how suited your data is for factor analysis. The test measures sampling adequacy for each variable in the model and for the complete model. The statistic is a measure of the proportion of variance among variables that might be common variance. The higher this proportion, the more suited your data is to factor analysis.

KMO returns values between 0 and 1. A rule of thumb for interpreting the statistic:

KMO values between 0.8 and 1 indicate the sampling is adequate.

KMO values less than 0.6 indicate the sampling is not adequate and that remedial action should
be taken. Some authors put this value at 0.5, so use your own judgment for values between 0.5
and 0.6.

KMO values close to zero mean that there are large partial correlations compared to the sum of correlations. In other words, there are widespread partial correlations, which are a big problem for factor analysis.
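
A minimal sketch of the overall KMO statistic computed directly from the correlation and partial-correlation matrices (packages such as factor_analyzer provide an equivalent function, but the formula is shown here for clarity); X is assumed to be a numeric data matrix with one row per case:

```python
import numpy as np

def kmo_overall(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (rows of X = cases)."""
    corr = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d              # partial correlations via the inverse correlation matrix
    np.fill_diagonal(corr, 0.0)     # exclude the diagonal from both sums
    np.fill_diagonal(partial, 0.0)
    r2 = (corr ** 2).sum()
    p2 = (partial ** 2).sum()
    return r2 / (r2 + p2)           # close to 1: small partial correlations, data well suited
```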

------------------------------------------------------------------------------------------------------------------

Discriminant Analysis
For a detailed explanation: https://stats.idre.ucla.edu/stata/dae/discriminant-function-analysis/

http://www.quickmba.com/marketing/research/

https://www.originlab.com/doc/Origin-Help/DiscAnalysis-Result

Importance of Wilks' Lambda in DA

https://www.statisticshowto.datasciencecentral.com/wilks-lambda/

Wilks’ lambda (Λ) is a test statistic that’s reported in results from MANOVA, discriminant analysis, and other multivariate procedures. Other similar test statistics include Pillai’s trace criterion and Roy’s gcr criterion.

In MANOVA, Λ tests if there are differences between group means for a particular combination
of dependent variables. It is similar to the F-test statistic in ANOVA. Lambda is a measure of the
percent variance in dependent variables not explained by differences in levels of the
independent variable. A value of zero means that there isn’t any variance not explained by the
independent variable (which is ideal). In other words, the closer to zero the statistic is, the more
the variable in question contributes to the model. You would reject the null hypothesis when
Wilks’ lambda is close to zero, although this should be done in combination with a small p-value.

In discriminant analysis, Wilks’ lambda tests how well each independent variable contributes to the model. The scale ranges from 0 to 1, where 0 means total discrimination and 1 means no discrimination. Each independent variable is tested by putting it into the model and then taking it out, generating a Λ statistic. The significance of the change in Λ is measured with an F-test; if the F-value is greater than the critical value, the variable is kept in the model. This stepwise procedure is usually performed using software like Minitab, R, or SPSS, whose stepwise output shows which variables (from a list of a dozen or more) were kept using this procedure.
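
A minimal sketch of the statistic itself, computed as the ratio of the within-group to the total sum-of-squares-and-cross-products determinants; X is an assumed numeric matrix of predictors and groups holds the class labels:

```python
import numpy as np

def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T); values near 0 indicate strong discrimination."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered                 # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        cg = Xg - Xg.mean(axis=0)
        W += cg.T @ cg                        # pooled within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)
```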

Attribute based perceptual map in DA


https://onlinelibrary.wiley.com/doi/full/10.1002/atr.200

-------------------------------------------------------------------------------------------------------------------

Perceptual Maps
Perceptual Maps are useful for four key reasons:

1. Assessing Strengths And Weaknesses Relative To Competing Brands Along Certain Criteria
Important To The Customer
-This is revealed by the positions of the marketer’s brand and competing brands along the axis.

2. Identification Of Competitive Advantage For The Brand

-Perceptual maps show differentiation among products in the customer’s mind.

-For example, in a perceptual map representing the car market based on two dimensions, “conservative” vs. “sporty” and “classy/distinctive” vs. “practical/affordable,” Porsche will probably be seen as the classiest and sportiest of the cars in consumers’ minds, providing the brand with a strong competitive advantage.

-Assess opportunities for new brands, as well as for repositioning existing brands.

3. Identifying Market Opportunities

-Empty spaces near an ideal point (meaning an attractive market segment) on the perceptual
map represent potential market opportunities.

4. See How Ideal Points Are Moving

-In addition, perceptual maps show how ideal points shift as markets mature, and therefore a
brand might shift its positioning in order to retain or gain a competitive advantage.

What do you do when your product’s features are not registering with customers? If a brand has
a competitive advantage on an attribute that is not salient, marketers can educate their
customers as to why it is important and show them why they should care about this attribute.

If this does not work or if your positioning is not registering, marketers usually consider changing
their positioning with a strategy that is more likely to be effective.

---------------------------------------

ALTERNATE ANSWER
Benefit #1: Unlocking insights about your competitors and industry

In its purest form, perceptual brand mapping shows the relative position of competing brands
based on how those brands are perceived by consumers. The axes are carefully chosen brand
attributes that are both known to be highly compelling to consumers and that also enable
maximum differentiation among the brands. Here’s an example of a simplified map regarding
beverage sweeteners.
The value of perceptual mapping is in the visual impact. Through this view of the marketplace,
findings may come to light more quickly than they do with tables of data.

In this example, the perceptual map shows the “proximity” advantage that natural sugar
substitutes (like stevia-based Truvia) have over artificial substitutes like Equal for acquiring sugar
users who are seeking a healthier alternative. The implications of this advantage are major due
to the sheer size of the sugar users segment, and the fact that artificial sweeteners like Equal
may never be considered by many sugar users.

To do perceptual mapping right, we recommend creating multiple perceptual mappings for your
brand and the marketplace by using a few different pairs of brand attributes that are determined
to be highly important to the target audience. Here are two more simplified examples for beverage sweeteners.
This exercise of generating multiple perceptual mappings can be fruitful because it requires
evaluating your brand, competitors, and the overall marketplace through many different lenses.
These additional maps, using different attribute pairs, highlight additional limitations for artificial
sweeteners like Equal relative to natural substitutes like stevia-based Truvia in the eyes of sugar
users. These additional limitations are relative healthiness (natural vs. artificial) and even a
relative “cool” factor due to the novelty and growing momentum of natural sweeteners like
stevia. (For example, stevia is a key ingredient in Coca-Cola’s newly introduced green-packaged
“Life” product.)

Ultimately, these are the types of learnings that can confirm specific brands’ strategies or
strengths, and that can identify where an industry’s “sweet spot” or value may be located (due
to the density or type of brands in a specific position), as indicated on Map #3.

Benefit #2: Communicating where the brand is headed

We have described how perceptual brand mapping can surface learnings and insights by taking a
snapshot of current brands’ positions in the marketplace. Perceptual mapping can also be an
effective tool for describing a desired future position for the company’s brand, especially when a
company is seeking to change its positioning or when an industry is undergoing a major shift.
Understandably, a perceptual map can also help track the re-positioning of a brand over time as
it moves towards that desired future position. Let’s return to the sweeteners example.

In this hypothetical example, a perceptual map shows the position that Equal wants to own –
different from the position that it holds in consumers’ minds today. Given that many companies
struggle with the right ways to communicate their strategies throughout the organization,
perceptual brand mapping can be an effective and efficient tool worth considering for that
purpose.

Benefit #3: Confirming alignment with your business and brand strategy
A key reason that perceptual mapping is an effective tool for marketers is that the mappings can
help gain alignment on strategy. A critical conversation to generate with perceptual mapping is
whether a brand’s position is the right position, given the organization’s over-arching business
strategy (e.g. vision, mission, competitive advantages) and brand strategy (e.g. intended brand
positioning and architecture).

We have discussed how perceptual maps can help to identify key learnings and can help to chart
the way forward for a brand, however ultimately there are some critical questions that a team
must answer about a brand’s current and intended future position in the marketplace:

Does this brand position align with the company’s strengths, assets, and capabilities?

Does this brand position effectively express the unique value that we are offering?

Is this brand position credible and compelling with consumers and customers?

How sustainable and defensible is this brand position relative to competitors?

In the case of the Equal brand, the company validated the “natural” sweetener opportunity and
aligned on the importance of competing in that space with brands like Truvia. However,
ultimately they determined that a different brand strategy would be most effective with sugar
users, launching a new “Naturals, from the makers of Equal” product line. This final map shows
where they ultimately landed, having begun with insights generation, then identifying the
desired future position, and then aligning on the business and brand strategies to make it all
happen.
In conclusion, perceptual brand mapping can be an effective and efficient tool for marketers. Our
belief is that it is an often overlooked tool in the marketer’s toolkit. We hope that you will
consider perceptual mapping as you seek to surface market insights, communicate brand
direction, and ensure alignment of business and brand strategies.

-------------------------------------------------------------------------------------------------------------------------

Conjoint Analysis
https://www.sawtoothsoftware.com/download/techpap/interpca.pdf

Part-worth in conjoint analysis

Partworth utilities (also known as attribute importance scores and level values) are numerical scores that measure how much each attribute and level influenced the customer’s decision to make that choice. Because attribute and level partworths are interrelated, in this post we will look at them using the same example of tissue paper. Suppose the company wants to find out customers’ preferences for tissue paper in order to re-assess its product range as a pathway to growth.

Level partworths allow you to dive deeper to understand what specific features within an
attribute drive customers’ choice. In this example, recycled unscented tissue paper is strongly
preferred to white scented and somewhat to white unscented.

Levels that are strongly preferred by customers are assigned higher scores, levels that perform
poorly (in comparison) are assigned lower scores. The chart is scaled so that, for each attribute,
the sum of all positive values equals (the absolute value of) the sum of all negative values.

Again, it is important to remember that these partworths are relative. If we include “black with
velvet scent” as another level for the attribute “scent and colour”, the relative value of each level
will change.

Conjoint analysis for new product designs

https://core.ac.uk/download/pdf/145015323.pdf

New product development research and conjoint marketing research are tools that help
managers assess future profitability and understand situational market variables.

Marketing managers are often faced with difficult tasks directed at assessing future profitability,
sales, and market share for new product entries or modifications of existing products or
marketing strategies. To be successful, market research is essential for competitive evaluation
and strategic positioning. Specifically, customer research is needed to best understand the
impact of situational market variables. Forward Analytics offers new product development
research and conjoint marketing research to help our clients understand and execute important
business decisions.
Real enterprise problems can be addressed and solved using the methodologies of conjoint marketing research. For instance, we can help you construct a strategic marketing plan that takes into account how specific customer groups will react to a new product, at which price points, and how the competition is likely to behave. First, let's take a moment to explain how conjoint analysis works.

Conjoint analysis is a multivariate statistical technique used to measure consumer preference for multi-feature product or service alternatives. A custom research study is designed to show how
various elements of a product can be selected to predict customer preferences for those
elements. Conjoint assumes that when a consumer makes a decision about a product, the
decision is based on trade-offs among product characteristics. Since any one product probably
will not contain everything the customer wants at a price they are prepared to pay, the customer
has to decide which features they need or want the most and which they are prepared to trade off.

Conjoint analysis allows the researcher to simulate real world consumer decision-making
processes by designing product profiles incorporating various features, or attributes, with each
attribute having two or more attribute levels. Like Customer Value Analysis, conjoint marketing
research can show the weight of importance of each product attribute.

How to Identify and Rank Attributes

There are two methods of performing conjoint analysis: full profile or an orthogonal array. The
full profile method generates every possible profile, given the attributes of interest. While this
type of conjoint analysis is practical for products or services with only a few attributes and only
two levels per attribute, it becomes quite cumbersome to evaluate products or services with
more attributes and more than two levels per attribute.

For instance, suppose your range of features incorporates 8 attributes, with each attribute containing 2 to 4 levels. These parameters can generate 1,536 possible profiles, a number too high to expect any sort of meaningful evaluation by the targeted audience. Therefore, when the product or service analyzed has more than a few attributes and each attribute has more than two levels, researchers will often use a subset of all possible profiles. This subset is called an orthogonal array.
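
A rough sketch of why the full-profile count grows so quickly; the attributes and levels below are hypothetical, not those of the 8-attribute example above:

```python
from itertools import product

# Hypothetical attributes and levels (illustrative only)
attributes = {
    "material": ["recycled", "white", "bamboo"],
    "scent": ["unscented", "scented"],
    "ply": ["1-ply", "2-ply", "3-ply"],
    "pack_size": ["4", "8", "12", "24"],
}
full_profiles = list(product(*attributes.values()))
print(len(full_profiles))   # 3 * 2 * 3 * 4 = 72 profiles from just four attributes
```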

An orthogonal array assumes no interaction exists between attributes and focuses strictly on the
level of consumer preference for each of the attributes independent of one another. In this case,
an orthogonal array narrowed the number of service profiles down to 16, a significantly more
manageable number for customer research.

Once the service profiles have been generated, respondents are asked to rank each of them in
order of personal preference. The results of this research are analyzed using the conjoint
procedure. This procedure derives a level of importance for each attribute and a utility score for
every level of each attribute. Levels of importance and utility scores are derived for each
individual respondent, and levels of importance and utility scores for the entire sample are
calculated by averaging the individual scores. The level of importance for each attribute is
calculated by taking the range of the utility scores for each attribute and dividing it by the sum of
the ranges of all attributes. The level of importance coefficient, therefore, measures on a
proportional scale (levels of importance for all attributes will total 100) the relative importance
of each of the attributes.
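
A minimal sketch of that importance calculation, assuming averaged part-worth utilities are already available per attribute level (the values are illustrative, not estimated from data):

```python
def attribute_importance(partworths):
    """Importance = utility range per attribute / sum of ranges, scaled to total 100."""
    ranges = {a: max(u.values()) - min(u.values()) for a, u in partworths.items()}
    total = sum(ranges.values())
    return {a: 100 * r / total for a, r in ranges.items()}

# Hypothetical part-worths for the tissue-paper example (illustrative values)
utils = {
    "material": {"recycled": 0.6, "white": -0.6},
    "scent": {"unscented": 0.3, "scented": -0.3},
    "price": {"$2": 0.9, "$3": 0.1, "$4": -1.0},
}
print(attribute_importance(utils))   # the three importances sum to 100
```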

Conjoint Analysis requires a custom research study with custom data collection. Forward
Analytics' market research consultants can work closely with you to design a consumer market
research study, specifically crafting a marketing research questionnaire to employ to your
customers and potential customers.

Customer Research

With the client's knowledge of their markets and products and our expertise in marketing
research questionnaire design and methodology development, we design a detailed consumer
market research survey. Forward Analytics' market research consultants will work closely with
you regarding the control product attributes (e.g. price, ease of use, etc.) and attribute levels
(e.g. price ranges, controls or easy-to-read manual, etc.) to be included in the conjoint analysis
and consumer market research.

The SPSS conjoint analysis will show such things as which combination of features is most
preferred, which particular features most influence preference of the total product, and the
relative importance of each feature. More importantly, the system allows us to enter simulations
after all of the data is collected and analyzed. We will be able to specify products with attribute
levels you can provide and test which are the most preferred. Our marketing research consultant
will explain how to implement the results into your strategic market plan.

Forward Analytics' new product development research and conjoint marketing research will benefit your strategic marketing plan by helping you assess:

Optimal design configuration for a new product

Profitability and/or market share for new products given the current competitors.

Competitive reaction to your strategies of introducing a new product.

Impact of new competitor products on profits or market share if you make no changes.

Customer switch rates from your current products to new products, or from the competition's.

Response to alternative advertising strategies and/or advertising themes.

Response to alternative pricing strategies, specific price levels, or proposed price changes.

Conjoint marketing research is an excellent resource for understanding how choices are made
and consequently the importance of price. For some, conjoint analysis is the only way of carrying
out pricing research. However, conjoint analysis is a more technical form of customer research
and requires higher levels of design skills. If pricing research is to be conducted it is often
advantageous to include it as part of a broad conjoint study or new product development
research study.

--------------------------------------------------------------------------------------------------------------------------

Kmeans Analysis

In K-means clustering, the number of clusters is predetermined. Given that number, initial cluster centres are chosen and cases are assigned to the nearest centre using the squared Euclidean distance measure; the centres are then updated from the assigned cases. Subsequently, cluster membership, the final cluster centres, and the distances between the final cluster centres are evaluated.
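
A minimal sketch using scikit-learn's KMeans; the data and the choice of three clusters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))            # illustrative case-by-variable data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(2))      # final cluster centres
print(np.bincount(km.labels_))           # cluster membership counts
```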

What is F-Ratio?

The F-ratio is a statistical ratio used to analyze whether the expected values of a variable within predefined groups differ from one another. In other words, it is a ratio of sums of squares (variances).

F= Between Group Variability/Within Group Variability

Significance of F-Ratio in K-Means

F-ratios are calculated to estimate the differences between the clusters. From these values, the role of each variable in the formation of the clusters is determined. In the analysis of variance (ANOVA) table, the significance level of the F-ratio is important: the higher the significance value, the smaller the contribution of the variable to the separation of the clusters.

Example:

From the ANOVA table in this example, it is evident that net profit has the least impact on the formation of the clusters (high significance level: 0.795), while assets have the highest impact (lowest significance level: 0.002).
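
A minimal sketch of how such an ANOVA table can be reproduced for each variable, assuming k-means cluster labels and a case-by-variable matrix X are available (the variable names are placeholders):

```python
import numpy as np
from scipy.stats import f_oneway

def cluster_anova(X, labels, names):
    """Print the F-ratio and its significance for each variable across the clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    for j, name in enumerate(names):
        groups = [X[labels == k, j] for k in np.unique(labels)]
        F, p = f_oneway(*groups)                       # between- vs within-cluster variability
        print(f"{name}: F = {F:.2f}, sig = {p:.3f}")   # high sig -> weak cluster separator
```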

-----------------------------------------------------------------------------------------------------------------------

Segmentation approaches
Segmentation approaches can range from throwing darts at the data to human judgment and to
advanced cluster modeling. We will explore four such methods: factor segmentation, k-means
clustering, TwoStep cluster analysis, and latent class cluster analysis.

Factor Segmentation
Factor segmentation is based on factor analysis. The first step is to factor-analyze or form groups
of attributes that express some sort of common theme. The number of factors is determined
using a combination of statistics and knowledge of the category. Once the number of factors has
been determined, each respondent receives a score for each of the factors. Respondents are
then assigned to the factor that has the highest score.
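
A rough sketch of the assignment step, using scikit-learn's FactorAnalysis on illustrative data to obtain the factor scores described above:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                              # illustrative attribute ratings

scores = FactorAnalysis(n_components=3).fit_transform(X)   # one score per factor per respondent
segment = scores.argmax(axis=1)                            # assign each respondent to the
print(np.bincount(segment))                                # factor with the highest score
```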

K-Means Clustering

This method attempts to identify similar groups of respondents based on selected characteristics. Like most segmentation techniques, k-means clustering requires that the analyst
specifies the desired number of clusters or segments. During the procedure the distances of
each respondent from the cluster centers are calculated. The procedure repeats until the
distance between cluster centers is maximized (or other specified criterion is reached).
Respondents are assigned to the cluster with the nearest center.

The procedure provides some statistics that can provide information on the ability of each
variable to differentiate the segments. K-means is simple to execute because most statistical
software packages include this procedure, and it can be used with a large number of
respondents or data records.

TwoStep Cluster Analysis

TwoStep cluster analysis is based on hierarchical clustering (SPSS Inc., 2001; Zhang, et al., 1996;
and Chiu et al., 2001). The algorithm identifies groups of cases that exhibit similar response
patterns. Typically, cases are assigned to the cluster with the nearest center. The analyst can, however, specify a noise percentage (cases that do not belong to any cluster). Segment membership is then determined by the distance of the respondent to the closest non-noise cluster and to the noise cluster. Respondents who are nearest to the noise cluster are considered outliers.

The algorithm contains two stages: (1) preclustering and (2) hierarchical clustering. The
precluster stage groups the respondents into several small clusters. The cluster stage uses the
small clusters as input and groups them into larger clusters. Based on well-defined statistics, the
procedure can automatically select the optimal number of clusters given the input variables. The
algorithm is able to handle both continuous and categorical segmentation variables.

Latent Class Cluster Analysis

Latent class cluster analysis uses probability modeling to maximize the overall fit of the model to
the data. The model can identify patterns in multiple dependent variables (such as attitudes and
needs) and quantify correlation of dependent variables with related variables (such as buying
behaviors). For each survey respondent, the analysis delivers the probability of belonging to
each cluster (segment). Respondents are assigned to the cluster to which they have the highest
probability of belonging.

This method includes statistics to guide the analyst in selecting the optimal number of clusters,
and it can incorporate segmentation variables of mixed metrics. Latent class cluster analysis can
include respondents who have missing values for some of the dependent variables, which
reduces the rate of misclassification (assigning consumers or businesses to the wrong segment).
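
Dedicated latent class software handles mixed-metric indicators, but for continuous indicators the idea can be sketched with a finite mixture model; scikit-learn's GaussianMixture returns the per-class membership probabilities described above (the data and the three classes are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                   # illustrative attitude/needs data

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gm.predict_proba(X)                     # probability of belonging to each class
segment = probs.argmax(axis=1)                  # assign to the most likely class
print(gm.bic(X))                                # BIC helps compare numbers of classes
```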

--------------------------------------------------------------------------------------------------------------------

Chi-Square for Customer Segmentation


CHAID, a technique whose original intent was to detect interaction between variables (i.e., find
"combination" variables), recursively partitions a population into separate and distinct groups,
which are defined by a set of independent (predictor) variables, such that the CHAID Objective is
met: the variance of the dependent (target) variable is minimized within the groups, and
maximized across the groups.

CHAID stands for CHi-squared Automatic Interaction Detection:

CHi-squared

Automatic

Interaction

Detection

Its advantages are that its output is highly visual and contains no equations. It commonly takes the form of an organization chart, more commonly referred to as a tree display. As an illustration, consider the CHAID tree below. The tree can loosely be interpreted as: the overall (average) response of 10% (from a population of size 1,000) is explained and predicted primarily by Marital Status, and secondarily by Gender and Pet Ownership. Note: CHAID does not work well with small sample sizes, as respondent groups can quickly become too small for reliable analysis.
[Figure: CHAID Tree]

In addition to CHAID detecting interaction between independent variables – for explanatory studies that are concerned with the impact that many variables have on each other (e.g., in the Response Tree above, Marital Status & Gender and Marital Status & Pet Ownership are two interaction variables, as they differentially affect response rates across the bottom respondent groups) – it is often used as a prediction method. Using CHAID, the data analyst can uncover relationships between a dependent variable, e.g., response to a mail solicitation, and a host of predictor variables such as product, price, promotion, recency, frequency, and prior purchases. Accordingly, the result is a CHAID regression tree that allows the data analyst to predict which individuals are most likely to respond in the future to a similar mail solicitation. The above describes CHAID's original intent and frequent usage.

Today in database marketing, CHAID primarily serves as a market segmentation technique. The Response Tree above represents a market segmentation of the population under consideration. The (five) bottom branch "boxes", called nodes, namely the segments, represent the resultant market segmentation. The segments are prioritized for targeting based first on their level of responsiveness and second on their size. The upper segments, defined by response rates larger than the overall response rate (10% in this case), are the "low-hanging" fruits, which are high-yielding (generate response greater than average) and require little effort to obtain. The lower segments, defined by response smaller than the average, are "high-floating" fruits, which are not high-yielding and require extra effort to acquire. However, the lower segments offer the marketer a challenge with a "juicy" yield if a high-octane strategy can be devised to efficiently tap into these segments. The middle segments, defined by response about equal to the average, offer the marketer a choice: either use the current business-as-usual strategy to yield average results (10%), or implement an unexpected, forceful strategy (as for the lower segments) to efficiently stimulate these segments to produce greater-than-average results. Thus, the priority of the five segments, three upper segments {1, 2 and 3}, one middle segment {4} and one lower segment {5}, for targeting is:

{Married Males, 50% response rate, size 50}

{Divorced with no Pets, 50% response rate, size 50}

{Single Females, 26.7% response rate, size 150}

{Singles, 10% response rate, size 400}

{Divorced with Pets, 7.1% response rate, size 350}

It should be mentioned that CHAID also is used as an exploratory method, and is an alternative
to the multiple regression model, especially when the dataset is not well-suited to the formality
of the parametric (i.e., rigid) statistical multiple regression analysis. What is more, Dr. Bruce
Ratner has explicated many novel and effective uses of CHAID ranging from statistical modeling
and analysis to data mining.
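
A minimal sketch of the splitting criterion itself: CHAID evaluates candidate splits with a chi-squared test of independence between a predictor and the response. The counts below are hypothetical and are not the figures from the tree described above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: marital-status groups; columns: [responders, non-responders] (hypothetical counts)
table = np.array([
    [40, 60],     # married
    [25, 175],    # single
    [20, 180],    # divorced
])
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 4))   # a small p-value supports splitting on this predictor
```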

---------------------------------------------------------------------------------------------------------------------

Hierarchical vs K-Means
When should I go for K-Means Clustering and when for Hierarchical Clustering ?

Often people get confused about which of the two techniques, K-means clustering or hierarchical clustering, should be used for performing a cluster analysis.

Well, the answer is pretty simple: if your data is small, go for hierarchical clustering, and if it is large, go for K-means clustering.

In hierarchical clustering, first all the possible distances among the observations are calculated. With basic knowledge of permutations and combinations, we know that the number of distances would be

No. of pairs = nC2 = n(n-1)/2, where n is the number of observations.

Once the nearby observations form pairs, the distances among the newly formed pairs are calculated.

Imagine the number of distances if n = 5: in the first iteration it would be 5! / (2! × 3!) = 10, which is manageable.

However, if n = 10,000, then the number of distances = 10000! / (2! × 9998!) = 10000 × 9999 / 2 = 49,995,000.

And this is only the first iteration. Even though the number of distances reduces significantly in every iteration, calculating this many distances becomes quite unmanageable.

Hence we switch to K-means clustering.

In K-means clustering, suppose we go for K = 3 clusters. Then all the observations are divided into 3 clusters in a purely random fashion, and 3 centroids are calculated.

Now the distance of each observation from each centroid is calculated. So in the first iteration, keeping the number of observations at 10,000 again, the number of distances calculated would be 3 × 10,000 = 30,000.

Then the centroids are recalculated, and then the distances again (30,000 again).

So even after a fair number of iterations, the calculation of distances remains quite manageable.

Then one would say we should use only K-means... well, I would say... you can. But in K-means clustering we need to iterate the model to find the optimal number of clusters, whereas hierarchical clustering automatically gives results at various numbers of clusters.

Time is money, so please make a habit of saving it.

Hence, use hierarchical clustering for small datasets and K-means clustering for large datasets.
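
A minimal sketch of that practical choice in scikit-learn; the sample sizes and cluster count are illustrative:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)
small = rng.normal(size=(200, 4))       # small dataset: the full distance matrix is cheap
large = rng.normal(size=(100_000, 4))   # large dataset: avoid the O(n^2) distance matrix

labels_small = AgglomerativeClustering(n_clusters=3).fit_predict(small)
labels_large = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(large)
print(np.bincount(labels_small), np.bincount(labels_large))
```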
