
Network-Based Marketing: Identifying Likely Adopters via Consumer Networks

Author(s): Shawndra Hill, Foster Provost and Chris Volinsky


Source: Statistical Science, Vol. 21, No. 2, A Special Issue on Statistical Challenges and
Opportunities in Electronic Commerce Research (May, 2006), pp. 256-276
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/27645754
Accessed: 10-03-2015 09:28 UTC


This content downloaded from 202.92.130.58 on Tue, 10 Mar 2015 09:28:56 UTC
All use subject to JSTOR Terms and Conditions

Statistical Science
2006, Vol. 21, No. 2, 256-276
DOI: 10.1214/088342306000000222
© Institute of Mathematical Statistics, 2006

Network-Based Marketing: Identifying Likely Adopters via Consumer Networks

Shawndra Hill, Foster Provost and Chris Volinsky

Abstract. Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) "Network neighbors" (those consumers linked to a prior customer) adopt the service at a rate 3-5 times greater than baseline groups selected by the best practices of the firm's marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption.

Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.

Shawndra Hill is a Doctoral Candidate and Foster Provost is Associate Professor, Department of Information, Operations and Management Sciences, Leonard N. Stern School of Business, New York University, New York, New York 10012-1126, USA (e-mail: shill@stern.nyu.edu; fprovost@stern.nyu.edu). Chris Volinsky is Director, Statistics Research Department, AT&T Labs Research, Shannon Laboratory, Florham Park, New Jersey 07932, USA (e-mail: volinsky@research.att.com).

1. INTRODUCTION

Network-based marketing seeks to increase brand recognition and profit by taking advantage of a social network among consumers. Instances of network-based marketing have been called word-of-mouth marketing, diffusion of innovation, buzz marketing and viral marketing (we do not consider multilevel marketing, which has become known as "network" marketing). Awareness or adoption spreads from consumer to consumer. For example, friends or acquaintances may tell each other about a product or service, increasing awareness and possibly exercising explicit advocacy. Firms may use their websites to facilitate consumer-to-consumer advocacy via product recommendations (Kautz, Selman and Shah, 1997) or via on-line customer feedback mechanisms (Dellarocas, 2003). Consumer networks may also provide leverage to the advertising or marketing strategy of the firm. For example, in this paper we show how analysis of a consumer network improves targeted marketing.

This paper makes two contributions. First we survey the burgeoning methodological research literature on network-based marketing, in particular on statistical analyses for network-based marketing. We review the research questions posed, and the data and analytic techniques used. We also discuss challenges and opportunities for research in this area. The review allows us to postulate necessary data requirements for studying the effectiveness of network-based marketing and to highlight the lack of current research that satisfies those requirements. Specifically, research must have access both to direct links between consumers and to direct information on the consumers' product adoption. Because of inadequate data, prior studies have not been able to provide direct, statistical support (Van den Bulte and Lilien, 2001) for the hypothesis that network linkage can directly affect product/service adoption.

The second contribution is to provide empirical support that network-based marketing indeed can improve on traditional marketing techniques. We introduce telecommunications data that present a natural testbed for network-based marketing models, in which communication linkages as well as product adoption rates can be observed. For these data, we show three main results: (1) "Network neighbors" (those consumers linked to a prior customer) adopt the service at a rate 3-5 times greater than baseline groups selected by the best practices of the firm's marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More sophisticated network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption.

2. NETWORK-BASED MARKETING

There are three, possibly complementary, modes of network-based marketing.

Explicit advocacy: Individuals become vocal advocates for the product or service, recommending it to their friends or acquaintances. Particular individuals such as Oprah, with her monthly book club reading list, may represent "hubs" of advocacy in the consumer relationship network. The success of The Da Vinci Code, by Dan Brown, may be due to its initial marketing: 10,000 books were delivered free to readers thought to be influential enough (e.g., individuals, booksellers) to stimulate the traffic in paid-for editions (Paumgarten, 2003). When firms give explicit incentives to consumers to spread information about a product via word of mouth, it has been called viral marketing, although that term could be used to describe any network-based marketing where the pattern of awareness or adoption spreads from consumer to consumer.

Implicit advocacy: Even if individuals do not speak about a product, they may advocate implicitly through their actions, especially through their own adoption of the product. Designer labeling has a long tradition of using consumers as implicit advocates. Firms commonly capitalize on influential individuals (such as athletes) to advocate products simply by conspicuous adoption. More recently, firms have tried to induce the same effect by convincing particularly "cool" members of smaller social groups to adopt products (Gladwell, 1997; Hightower, Brady and Baker, 2002).

Network targeting: The third mode of network-based marketing is for the firm to market to prior purchasers' social-network neighbors, possibly without any advocacy at all by customers. For network targeting, the firm must have some means to identify these social neighbors.

These three modes may be used in combination. A well-cited example of viral marketing combines network targeting and implicit advocacy: The Hotmail free e-mail service appended to the bottom of every outgoing e-mail message the hyperlinked advertisement, "Get your free e-mail at Hotmail," thereby targeting the social neighbors of every current user (Montgomery, 2001), while taking advantage of the user's implicit advocacy. Hotmail saw an exponentially increasing customer base. Started in July 1996, in the first month alone Hotmail acquired 20,000 customers. By September 1996 the firm had acquired over 100,000 accounts, and by early 1997 it had over 1 million subscribers.

Traditional marketing methods do not appeal to some segments of consumers. Some consumers apparently value the appearance of being on the cutting edge or "in the know," and therefore derive satisfaction from promoting new, exciting products. The firm BzzAgents (Walker, 2004) has managed to entice voluntary (unpaid) marketing of new products. Furthermore, although more and more information has become available on products, parsing such information is costly to the consumer. Explicit advocacy, such as word-of-mouth advocacy, can be a useful way to filter out noise.

A key assumption of network-based marketing through explicit advocacy is that consumers propagate "positive" information about products after they either have been made aware of the product by traditional marketing vehicles or have experienced the product themselves. Under this assumption, a particular subset of consumers may have greater value to firms because they have a higher propensity to propagate product information (Gladwell, 2002), based on a combination of their being particularly influential and their having more friends (Richardson and Domingos, 2002). Firms should want to find these influencers and to promote useful behavior.

3. LITERATURE REVIEW

Many quantitative statistical methods used in empirical marketing research assume that consumers act independently. Typically, many explanatory attributes are collected on each actor and used in multivariate modeling such as regression or tree induction. In contrast, network-based marketing assumes interdependency among consumer preferences. When interdependencies exist, it may be beneficial to account for their effects in targeting models. Traditionally in statistical research, interdependencies are modeled as part of a covariance structure, either within a particular observational unit (as in the case of repeated measures experiments) or between observational units. Studies of network-based marketing instead attempt to measure these interdependencies through implicit links, such as matching on geographic or demographic attributes, or through explicit links, such as direct observation of communications between actors. In this section, we review the different types of data and the range of statistical methods that have been used to analyze them, and we discuss the extent to which these methods naturally accommodate networked data.

Work in network-based marketing spans the fields of statistics, economics, computer science, sociology, psychology and marketing. In this section, we organize prominent work in network-based marketing by six types of statistical research: (1) econometric modeling, (2) network classification modeling, (3) surveys, (4) designed experiments with convenience samples, (5) diffusion theory and (6) collaborative filtering and recommender systems. In each case, we provide an overview of the approach and a discussion of a prominent example. This (brief) survey is not exhaustive. In the final subsection, we discuss some of the statistical challenges inherent in incorporating this network structure.

3.1 Econometric Models

Econometrics is the application of statistical methods to the empirical estimation of economic relationships. In marketing this often means the estimation of two simultaneous equations: one for the marketing organization or firm and one for the market. Regression and time-series analysis are found at the core of econometric modeling, and econometric models are often used to assess the impact of a target marketing campaign over time.

Econometric models have been used to study the impact of interdependent preferences on rice consumption (Case, 1991), automobile purchases (Yang and Allenby, 2003) and elections (Linden, Smith and York, 2003). For each of the aforementioned studies, geography is used in part as a proxy for interdependence between consumers, as opposed to direct, explicit communication. However, different methods are used in the analysis.

Most recently, Yang and Allenby (2003) suggested that traditional random effects models are not sufficient to measure the interdependencies of consumer networks. They developed a Bayesian hierarchical mixture model where interdependence is built into the covariance structure through an autoregressive process. This framework allows testing of the presence of interdependence through a single parameter. It also can incorporate the effects of multiple networks, each with its own estimated dependence structure. In their application, they use geography and demography to create a "network" of consumers in which links are created between consumers who exhibit geographic or demographic similarity. The authors showed that the geographically defined network of consumers is more useful than the demographic network for explaining consumer behavior as it relates to purchasing Japanese cars. Although they do not have data on direct communication between consumers, the framework presented by Yang and Allenby (2003) could be extended to explicit network data where links are created between consumers through their explicit communication as opposed to demographic or geographic similarity. A drawback of this approach is that the interdependence matrix has size n^2, where n is the number of consumers; consumer networks are extremely large and prohibit parameter estimation using this method. Sparse matrix techniques or clever clustering of the observations would be a natural extension.
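
The storage concern raised for the n^2 interdependence matrix can be made concrete with a small sketch. The Python code below is illustrative only and is not the estimation procedure of Yang and Allenby (2003); the function names (`build_sparse_links`, `stored_entries`, `neighbor_average`) and the toy edge list are hypothetical. It keeps only the observed links of a consumer network rather than the full n x n matrix, and computes a neighbor average of a consumer-level quantity, which is the kind of operation a row-normalized dependence structure would need.

```python
from collections import defaultdict


def build_sparse_links(edges):
    """Store an undirected consumer network as adjacency sets.

    Only observed links are kept, so memory grows with the number
    of links rather than with n * n.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def stored_entries(neighbors):
    """Count the nonzero entries of the implied symmetric link matrix."""
    return sum(len(nbrs) for nbrs in neighbors.values())


def neighbor_average(neighbors, values, consumer):
    """Average `values` over the consumers linked to `consumer`.

    Equivalent to one row of a row-normalized link matrix times the
    vector of values. Returns 0.0 for a consumer with no links.
    """
    nbrs = neighbors.get(consumer, ())
    if not nbrs:
        return 0.0
    return sum(values[j] for j in nbrs) / len(nbrs)


# Hypothetical toy data: 5 consumers, 4 observed links.
n = 5
edges = [(0, 1), (0, 2), (1, 2), (3, 4)]
preference = {0: 1.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 0.0}

links = build_sparse_links(edges)
print("dense matrix entries:", n * n)
print("stored link entries:", stored_entries(links))
print("neighbor average for consumer 0:", neighbor_average(links, preference, 0))
```

With millions of consumers and a modest number of links per consumer, the stored entries scale roughly with the number of links, whereas the dense matrix scales with n^2. Whether such a representation suffices for a full Bayesian hierarchical model is a separate question that this sketch does not address.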
3.2 Network Classification Models

Network classification models use knowledge of the links between entities in a network to estimate a quantity of interest for those entities. Typically, in such a model an entity is influenced most by those directly connected to it, but is also affected to a lesser extent by those further away. Some network classification models use an entire network to make predictions about a particular entity on the network; Macskassy and Provost (2004) provided a brief survey. However, most methods have been applied to small data sets and have not been applied to consumer data. Much research in network classification has grown out of the pioneering work by Kleinberg (1999) on hubs and authorities on the Internet, and out of Google's PageRank algorithm (Brin and Page, 1998), which (to oversimplify) identifies the most influential members of a network by how many influential others "point" to them. Although neither study uses statistical models, both are related to well-understood notions of degree centrality and distance centrality from the field of social-network analysis.

One paper that models a consumer network for maximizing profit is by Richardson and Domingos (2002), in which a social network of customers is modeled as a Markov random field. The probability that a given customer will buy a given product is a function of the states of her neighbors, attributes of the product and whether or not the customer was marketed to. In this framework it is possible to assign a "network value" to every customer by estimating the overall benefit of marketing to that customer, including the impact that the marketing action will have on the rest of the network (e.g., through word of mouth). The authors tested their model on a database of movie reviews from an Internet site and found that their proposed methodology outperforms non-network methods for estimating customer value. Their network formulation uses implicit links (customers are linked when a customer reads a review by another customer and subsequently reviews the item herself) and implicit purchase information (they assume a review of an item implies a purchase and vice versa).

3.3 Surveys

Most research in this area does not have information on whether consumers actually talk to each other. To address this shortcoming, some studies use survey sampling to collect comprehensive data on consumers' word-of-mouth behavior. By sampling individuals and contacting them, researchers can collect data that are difficult (or impossible) to obtain directly by observing network-based marketing phenomena (Bowman and Narayandas, 2001). The strength of these studies lies in the data, including the richness and flexibility of the answers that can be collected from the responders. For instance, researchers can acquire data about how customers found out about a product and how many others they told about the product. An advantage is that researchers can design their sampling scheme to control for any known confounding factors and can devise fully balanced experimental designs that test their hypotheses. Since the purpose of models built from survey data is description, simple statistical methods like logistic regression or analysis of variance (ANOVA) typically are used.

Bowman and Narayandas (2001) surveyed more than 1700 purchasers of 60 different products who previously had contacted the manufacturer of that product. The purchasers were asked specific questions about their interaction with the manufacturer and its impact on subsequent word-of-mouth behavior. The authors were able to capture whether the customers told others of their experience and if so, how many people they told. The authors found that self-reported "loyal" customers were more likely to talk to others about the products when they were dissatisfied, but interestingly not more likely when they were satisfied. Although studies like this collect some direct data on consumers' word-of-mouth behavior, the researchers do not know which of the consumers' contacts later purchased the product. Therefore, they cannot address whether word-of-mouth actually affects individual sales.

3.4 Designed Experiments with Convenience Samples

Designed experiments enable researchers to study network-based marketing in a controlled setting. Although the subjects typically comprise a convenience sample (such as those undergraduates who answer an ad in the school newspaper), the design of the experiment can be completely randomized. This is unlike the studies that rely on secondary data sources or data from the Web. Typically ANOVA is used to draw conclusions.

Frenzen and Nakamoto (1993) studied the factors that influence individuals' decisions to disseminate information through a market via word-of-mouth. The subjects were presented with several scenarios that represented different products and marketing strategies, and were asked whether they would tell trusted and nontrusted acquaintances about the product/sale. They studied the effect of the cost/value manipulations on the consumers' willingness to share information actively with others, as a function of the strength of the social tie. In this study, the authors did not allow the subjects to construct their explicit consumer network; instead, they asked the participants to hypothesize about their networks. The experiments used data from a convenience sample to generalize over a complete consumer network. The authors also employed simulations in their study. They found that the stronger the moral hazard (the risk of problematic behavior) presented by the information, the stronger the ties must be to foster information propagation. Generally, the authors showed that network structure and information characteristics interact when individuals form their information transmission decisions.

3.5 Diffusion Models

Diffusion theory provides tools, both quantitative and qualitative, to assess the likely rate of diffusion of a technology or product. Qualitatively, researchers have identified numerous factors that facilitate or hinder technology adoption (Fichman, 2004), as well as social factors that influence product adoption (Rogers, 2003). Quantitative diffusion research involves empirical testing of predictions from diffusion models, often informed by economic theory.

The most notable and most influential diffusion model was proposed by Bass (1969). The Bass model of product diffusion predicts the number of users who will adopt an innovation at a given time t. It hypothesizes that the rate of adoption is a function solely of the current proportion of the population who have adopted. Specifically, let F(t) be the cumulative proportion of adopters in the population. The diffusion equation, in its simplest form, models F(t) as a function of p, the intrinsic adoption rate, and q, a measure of social contagion. When q > p, this equation describes an S-shaped curve, where adoption is slow at first, takes off exponentially and tails off at the end. This model can effectively model word-of-mouth product diffusion at the aggregate, societal level.

In general, the empirical studies that test and extend accepted theories of product diffusion rely on aggregate-level data for both the customer attributes and the overall adoption of the product (Ueda, 1990; Tout, Evans and Yakan, 2005); they typically do not incorporate individual adoption. Models of product diffusion assume that network-based marketing is effective. Since understanding when diffusion occurs and the extent to which it is effective is important for marketers, these methods may benefit from using individual-level data. Data on explicit networks would enable the extension of existing diffusion models, as well as the comparison of results using individual- versus aggregate-level data.

In his first study, Bass tested his model empirically against data for 11 consumer durables. The model yielded good predictions of the sales peak and the timing of the peak when applied to historical data. Bass used linear regression to estimate the parameters for future sales predictions, measuring the goodness of fit (R^2 value) of the model for 11 consumer durable products. The success of the forecasts suggests that the model may be useful in providing long-range forecasting for product sales or adoption. There has been considerable follow-up work on diffusion since this groundbreaking work. Mahajan, Müller and Kerin (1984) review this work. Recent work on product diffusion explores the extent to which the Internet (Fildes, 2003) as well as globalization (Kumar and Krishnan, 2002) play a role in product diffusion.

3.6 Collaborative Filtering and Recommender Systems

Recommender systems make personalized recommendations to individual consumers based on demographic, content and link data (Adomavicius and Tuzhilin, 2005). Collaborative filtering methods focus on the links between consumers; however, the links are not direct. They associate consumers with each other based on shared purchases or similar ratings of shared products.

Collaborative filtering is related to explicit consumer network-based marketing because both target marketing tasks benefit from learning from data stored in multiple tables (Getoor, 2005). For example, Getoor and Sahami (1999), Huang, Chung and Chen (2004) and Newton and Greiner (2004) established the connection between the recommendation problem and statistical relational learning through the application of probabilistic relational models (PRM's) (Getoor, Friedman, Koller and Pfeffer, 2001). However, none of these groups used explicit links between customers for learning. Recommendation systems may well benefit from information about explicit consumer interaction as an additional, perhaps quite important, aspect of similarity.
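
The S-shaped adoption curve described for the Bass (1969) model in Section 3.5 can be illustrated numerically. The sketch below uses the standard closed-form solution F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), which solves dF/dt = (p + qF)(1 - F) with F(0) = 0. The parameter values p = 0.03 and q = 0.38 are illustrative assumptions rather than estimates from this paper, and the function names are hypothetical.

```python
import math


def bass_cumulative(t, p, q):
    """Cumulative proportion of adopters F(t) under the Bass model.

    p is the intrinsic adoption rate and q the social contagion
    parameter; F(0) = 0 and F(t) approaches 1 as t grows.
    """
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)


def bass_rate(t, p, q):
    """Adoption rate dF/dt = (p + q * F) * (1 - F) at time t."""
    f = bass_cumulative(t, p, q)
    return (p + q * f) * (1.0 - f)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks.

    For q > p the peak is at ln(q / p) / (p + q); otherwise the rate
    is highest at t = 0 and the curve is not S-shaped.
    """
    if q <= p:
        return 0.0
    return math.log(q / p) / (p + q)


# Illustrative (assumed) parameters with q > p, giving an S-shaped curve.
p, q = 0.03, 0.38
t_peak = bass_peak_time(p, q)
for t in (0, 2, 4, 6, 8, 10, 15, 20):
    print(f"t={t:>2}  F(t)={bass_cumulative(t, p, q):.3f}  rate={bass_rate(t, p, q):.4f}")
print(f"adoption rate peaks near t = {t_peak:.2f}")
```

At the peak time the cumulative proportion equals 1/2 - p/(2q), so a larger contagion parameter q relative to p pushes the inflection point toward one half of the eventual adopters.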

3.7 Research Opportunities and Statistical Challenges

We see that there is a burgeoning body of work that addresses consumers' interactions and their effects on purchasing. To our knowledge the foregoing types represent the main statistical approaches taken in research on network-based marketing. In each approach, there are assumptions made in the data collection or in the analysis that restrict them from providing strong and direct support for the hypothesis that network-based marketing indeed can improve on traditional techniques. Surveys and convenience samples can suffer from small and possibly biased samples. Collaborative filtering models have large samples, but do not measure direct links between individuals. Models in network classification and econometrics historically have used proxies like geography instead of data on direct communications, and almost all studies have no accurate, specific data on which (and what) customers purchase.

To paint a complete picture of network influence for a particular product, the ideal data set would have the following properties: (1) large and unbiased sample, (2) comprehensive covariate information on subjects, (3) measurement of direct communication between subjects and (4) accurate information on subjects' purchases. The data set we present in the next section has all of these properties and we will demonstrate its value for statistical research into network influence. The question of how to analyze such data brings up many statistical issues:

Data-set size. Network-based marketing data sets often arise from Internet or telecommunications applications and can be quite large. When observations number in the millions (or hundreds of millions), the data become unwieldy for the typical data analyst and often cannot be handled in memory by standard statistical analysis software. Even if the data can be loaded, their size renders the interactive style of analysis common with tools like R or Splus painfully slow. In Internet or telecommunications studies, there often are two massive sources of data: all actors (web sites, communicators), along with their descriptive attributes, and the transactions among these actors. One solution is to compress the transaction information into attributes to be included in the actors' attribute set. It has been shown that file squashing (DuMouchel, Volinsky et al., 1999), which attempts to combine the best features of preprocessed data with random sampling, can be useful for customer attrition prediction. DuMouchel et al. claimed that squashing can be useful when dealing with up to billions of records. However, there may be a loss of important information which can be captured only by complex network structure.

More sophisticated network information derived from transactional data can also be incorporated into the matrix of customer information by deriving network attributes such as degree distribution and time spent on the network (which we demonstrate below). Similarly, other types of data such as geographical data or temporal data, which otherwise would need to be handled by some sophisticated methodology, can be folded into the analysis by creating new covariates. It remains an open question whether clever data engineering can extract all useful information to create a set of covariates for traditional analysis. For example, knowledge of communication with specific sets of individuals can be incorporated, and may provide substantial benefit (Perlich and Provost, 2006).

Once the data are combined, the remaining data set still may be quite large. While much data mining research is focused on scaling up the statistical toolbox to today's massive data sets, random sampling remains an effective way to reduce data to a manageable size while maintaining the relationships we are trying to discover, if we assume the network information is fully encoded in the derived variables. The amount of sampling necessary will depend on the computing environment and the complexity of the model, but most modern systems can handle data sets of tens or hundreds of thousands of observations. When sampling, care must be taken to stratify by any attributes that are of particular interest or to oversample those attributes that have extremely skewed distributions.

Low incidence of response. In applications where the response is a consumer's purchase or reaction to a marketing event, it is common to have a very low response rate, which can result in poor fit and reduced ability to detect significant effects for standard techniques like logistic regression. If there are not many independent attributes, one solution is Poisson regression, which is well suited for rare events. Poisson regression requires forming buckets of observations based on the independent attributes and modeling the aggregate response in these buckets as a Poisson random variable. This requires discretization of any continuous independent attributes, which may not be desirable. Also, if there are even a moderate number of independent attributes, the buckets will be too sparse to allow Poisson modeling. Other solutions that have been proposed include oversampling positive responses and/or undersampling

negative responses. Weiss (2004) gave an overview of the literature on these and related techniques, showing that there is mixed evidence as to their effectiveness. Other studies of note include the following. Weiss and Provost (2003) showed that, given a fixed sample size, the optimal class proportion in training data varies by domain and by ultimate objective (but can be determined); generally speaking, to produce probability estimates or rankings, a 50:50 distribution is a good default. However, Weiss and Provost's results are only for tree induction. Japkowicz and Stephen (2002) experimented with neural networks and support-vector machines, in addition to tree induction, showing (among other things) that support-vector machines are insensitive to class imbalance. However, they considered [...] response [...] in need of more systematic empirical and theoretical [...]

Incorporating extended network structure. [...] network structure lend themselves [...] network-centric [...] useful [...]fluenced [...]ings and (4) has neighbors like itself. Hoff, Raftery and Handcock (2002) defined a Markov-chain Monte Carlo method to estimate latent positions of the actors for [...]

from homophily. Unless


the content of communi

word-of-mouth
Separating
about
there is information
cations, one cannot
mouth
transmission
Social

theory
cate with
each

that there was word-of

conclude

about the product.


communi
that people who
are more
likely to be simi

of information

tells

us

other

a concept
called homophily
(Blau,
and
Smith-Lovin
Cook, 2001). Ho
1977; McPherson,
a
for
wide variety of relation
is exhibited
mophily
lar to each

other,

of similarity. Therefore,
linked
ships and dimensions
are
consumers
and
like-minded
like-minded,
probably
consumers
tend to buy the same products. One way to
is to account for con
address this issue in the analysis
scores (Rosenbaum
sumer similarity using propensity
were developed
scores
and Rubin,
1984). Propensity
clinical trials and at
in the context of nonrandomized
tempt to adjust for the fact that the statistical profile of
patients who received treatment may be different than
the profile of those who did not, and that these differ
ences

or enhance

the apparent effect of the


treatment. Let T represent the treatment, X represent
the treatment and
the independent
attributes excluding
score
Y represent
the response. Then the propensity
=
= P(T =
x). By matching
PS(x)
1|X
propensity
scores in the treatment and control groups using typical
could mask

like demographic
of homophily
data, we can
of
account (partially) for the possible
confoundedness
other independent attributes.
indicators

data sets. This

social-network

in an unobserved

embeds

the actors

space," which could be more


than the actual transactions
for pre
themselves
sales. The field of statistical relational
learning

dicting

"social

2005) has recently produced a wide variety of


that could be applicable. Often
these models
influence to propagate
the
network.
through

(Getoor,
methods
allow

study.

as a Markov

(2001) used
to assign every node a "network value."
this technique
A node with high network value (1) has a high prob
(2) is likely to give the product
ability of purchase,
a high rating, (3) is influential on its neighbors'
rat

1998; Mease, Wyner and Buja, 2006)


(Chan and Stolfo,
rule induction
and Stern,
and multiphase
(Clearwater
This
is an area
1991 ;Joshi, Kumar and Agarwal,
2001).

attributes

set of

method

One

modeled
by her neighborhood
field. Domingos
and Richardson

random

small

primarily
with unbalanced

Data with

to a robust

(em
analyses.
simple
in our analysis)
from
is to create attributes
ployed
the network
them into a traditional
data and plug
to let each actor be in
Another
is
analysis.
approach

to deal
techniques
include ensemble

data. Other

noise-free

structure.

data. Missing
transactions
data in network
Missing
are common?often
is observ
only part of a network
able. For instance, firms typically have transactional
data on their customers
only or may have one class
of communication

(e-mail) but not another


(cellular
phone). One attempt to account for these missing
edges
is to use network
structure to assign a probability
of
a missing

edge

an edge
everywhere
creates
this probability

Thresholding
can be added
which
lesser weight
related
closely

to the network,
(Agarwal and Pregibon,

framework

pseudo-edges,
perhaps with

This

is

2004).
to the link prediction
problem,
where
the next links will be

tries to predict
Nowell
and Kleinberg,
PRM

is not present.

models

2003). One extension


link structure through

which
(Liben
of the
the use

of reference uncertainty
and existence uncertainty. The
a
extension
includes
unified generative model
for both
content and relational structure, where interactions be
tween the attributes
and link structure are modeled
(Getoor,

Friedman,

Koller

and Taskar,

2003).
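The bucketing step that Poisson regression requires for rare responses, as discussed earlier in this section, can be illustrated with a minimal Python sketch. The attribute names (`age`, `loyalty`), the age cut point and the toy records are hypothetical and are not taken from the study. The sketch only forms the buckets and counts responses; fitting the Poisson model itself is left out.

```python
from collections import defaultdict


def discretize_age(age):
    """Hypothetical two-level discretization of a continuous attribute."""
    return "under_40" if age < 40 else "40_plus"


def bucket_counts(records):
    """Aggregate prospects into buckets keyed by discretized attributes.

    Returns {bucket: (n_prospects, n_responses)}. A Poisson model would treat
    n_responses as the count and n_prospects as the exposure of each bucket.
    """
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        key = (discretize_age(rec["age"]), rec["loyalty"])
        totals[key][0] += 1
        totals[key][1] += rec["response"]
    return {key: (n, y) for key, (n, y) in totals.items()}


def empty_response_buckets(buckets):
    """Buckets with no responses at all: the sparsity problem noted in the text."""
    return sorted(key for key, (n, y) in buckets.items() if y == 0)


# Toy data (hypothetical): response is 1 for a sale, 0 otherwise.
records = [
    {"age": 25, "loyalty": 1, "response": 0},
    {"age": 31, "loyalty": 1, "response": 1},
    {"age": 52, "loyalty": 1, "response": 0},
    {"age": 47, "loyalty": 3, "response": 0},
    {"age": 63, "loyalty": 3, "response": 1},
    {"age": 38, "loyalty": 3, "response": 0},
]
buckets = bucket_counts(records)
sparse = empty_response_buckets(buckets)
```

Even in this tiny example, two of the four buckets contain no responses. This is the situation the text warns about when the number of independent attributes grows.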

4. DATA SET AND PRIMARY HYPOTHESIS

This section details our data set, derived primarily from a direct-mail marketing campaign to potential customers of a new communications service (later we augment the primary data with a large set of consumer-specific attributes). The firm's marketing team identified and marketed to a list of prospects using its standard methods. We investigate whether network-related effects or evidence of "viral" information spread are present in this group. As we will describe, the firm also marketed to a group we identified using the network data, which allows us to test our hypotheses further. We are not permitted to disclose certain details, including specifics about the service being offered and the exact size of the data set.

4.1 Initial Data Details

In late 2004, a telecommunications firm undertook a large direct-mail marketing campaign to potential customers of a new communications service. This service involved new technology and, because of this, it was believed that marketing would be most successful to those consumers who were thought to be "high tech."

In keeping with standard practice, the marketing team collected attributes on a large set of prospects: consumers whom they believed to be potential adopters of the service. The marketing team used demographic data, customer relationship data, and various other data sources to create profitability and behavioral models to identify prospective targets, that is, consumers who would receive a targeted mailing. The data the marketing team provided us with did not contain the underlying customer attributes (e.g., demographics), but instead included values for derived attributes that defined 21 marketing segments (Table 1) that were used for campaign management and post hoc analyses. The sample included millions of consumers. The team believed that the different segments would have varying response rates and it was important to separate the segments in this way to learn the most from the campaign.

An important derived variable was loyalty, a three-level score based on previous relationships with the firm, including previous orders of this and other services. Roughly, loyalty level 3 comprises customers with moderate-to-long tenure and/or those who have subscribed to a number of services in the past. Loyalty level 2 comprises those customers with which the firm has had some limited prior experiences. Loyalty level 1 comprises consumers who did not have service with the firm at the time of mailing; little (if any) information is available on them. Previous analyses have shown that loyalty and tenure attributes have substantial impact on response to campaigns. Other important attributes were based on demographics and other customer characteristics. The attribute Intl is an indicator of whether the prospect had previously ordered any international services; Tech1 (hi, med or low) and Tech2 (1-10, where 1 = high tech) are scores derived from demographics and other attributes that estimate the interest and ability of the customer to use a high-tech service; Early Adopt is a proprietary score that estimates the likelihood of the customer to use a new product, based on previous behavior. We also show the Offer, indicating that different segments received different marketing messages: P1-P3 indicate different postcards that were sent, L1 and L2 indicate different letters, and a "+" indicates that a "call blast" accompanied the mailing.

In defining the segments, those groups with high loyalty values were permitted lower values from the technology and early adoption models. Segments 15 and 16 were provided by an external vendor; there were insufficient data on these prospects to fit our Tech and Early Adopt models, as indicated by a "?" in Table 1.

[Table 1: Descriptive statistics for the marketing segments (see Section 4.1 for details). Columns: Segment, Loyalty, Intl, Tech1, Tech2, Early Adopt, Offer, % of list, %NN.]

4.2 Primary Hypothesis and Network Neighbors

The research goal we consider here is whether relaxing the assumption of independence between consumers can improve demonstrably the estimation of response likelihood. Thus, our first hypothesis is that someone who has direct communication with a current subscriber is more likely herself to adopt the service. It should be noted that the firm knows only of communications initiated by one of its customers through a service of the firm, so the network data are incomplete (considerably), especially for the lower loyalty groups.

Data on communications events include anonymous identities for the transactors, a time stamp and the transaction duration. For purposes of this research, all data are rendered anonymous so that individual identifiers are protected.

In pursuit of our hypothesis, we constructed an attribute called network neighbor (or NN), a flag that indicates whether the targeted consumer had communicated with a current user of the service in a time period prior to the marketing campaign. Overall, 0.3% of the targets are network neighbors. In Table 1, the percentage of network neighbors (%NN) is broken down by segment.

In addition, the marketing team invited us to create our own segment, which they also would target. Our "segment 22" consisted of network neighbors that were not already on the current list of targets. To make sure our list contained viable prospects, the marketing team calculated the derived technology and early adopter scores for the consumers on our list. They filtered based on these scores, but they relaxed the thresholds used to limit their original list. For instance, someone with loyalty = 1 needed a Tech2 score less than 4 to merit inclusion on the initial list; this threshold was relaxed for our list to Tech2 less than 7. In this way, the team allowed prospects who missed inclusion on the first cut to make it into segment 22 if they were network neighbors. However, the marketing team still avoided targeting customers who they believed had very small probabilities of a purchase. For those network neighbors who did not score high enough to warrant inclusion in segment 22, we still tracked their purchase records to see if any of them subscribed to the service in the absence of the marketing campaign; see below. Overall, the profile of the candidates in our segment 22 was considered to be subpar in terms of demographics, affinity and technological capability. Notably, for our final conclusions, these targets are potential customers the firm would have otherwise ignored. The size of segment 22 was about 1.2% of the marketing list.

To summarize, the above process divides the prospect universe along two dimensions: (1) targets, those consumers identified by the marketing models as being worthy of solicitation, and (2) network neighbors, those who had direct communication with a subscriber. Table 2 shows the relative size for each combination (using the non-network-neighbor targets as the reference set). Note the non-NN nontargets, who neither are network neighbors nor are they deemed to be good prospects. This group is the majority of the prospect space and includes consumers that the firm has very little information about, because they are low-usage communicators or do not subscribe to any services with the firm.

Table 2
Data categories

Target = Y, NN = Y: NN targets. Segments 1-22. Relative size = 0.015. Prospects identified by marketing models and who also are network neighbors. Those in segment 22 have reduced thresholds on the marketing model scores.
Target = Y, NN = N: Non-NN targets. Segments 1-21. Relative size = 1. Prospects identified by marketing models who were not network neighbors.
Target = N, NN = Y: NN nontargets. Relative size = 0.10. Consumers who were network neighbors, but were not marketed to because they scored poorly on marketing models.
Target = N, NN = N: Non-NN nontargets. Relative size > [illegible]. Consumers who were not considered to be good prospects by the marketing model and also were not network neighbors.

Notes. The data for our study are broken down into targets and network neighbors. The "relative size" value shows the number of prospects in each group, relative to the non-NN target group.

4.3 Modeling with Consumer-Specific Data

To determine whether relaxing the independence assumption (using the network data) improves modeling, we fit models using a wide range of demographic and consumer-specific independent attributes (many of which are known or believed to affect the estimated likelihood of purchase). Overall, we collected the values for over 150 attributes to assess their effect on sales likelihood and their interactions with the network-neighbor variable. These values included the following:

Loyalty data: We obtained finer-grained loyalty information than the simple categorization described above, including past spending, types of service, how often the customer responded to prior mailings, a loyalty score generated by a proprietary model and information about length of tenure.

Geographic data: Geographic data were necessary for the direct mail campaign. These data include city, state, zip code, area code and metropolitan city code.

Demographic data: These include information such as gender, education level, credit score, head of household, number of children in the household, age of members in the household, occupation and home ownership. Some of this information was inferred at the census tract level from the geographic data.

Network attributes: As mentioned earlier, we observed communications of current subscribers with other consumers. In addition to the simple network neighbor flag described earlier, we derived more sophisticated attributes from prospects' communication patterns. We will return to these in Section 5.6.

4.4 Data Limitations

We encountered missing values for customers across all loyalty levels. The amount of missing information is directly related to the level of experience we have had with the customer just prior to the direct mailing. For example, geography data are available for all targets across all three loyalty levels. On the other hand, as the number of services and tenure with the firm decline, so does the amount of information (e.g., transactions) available for each target. Given the difference in information as loyalty varies, we grouped customers by loyalty level and treated the levels separately in our analyses. This stratification leaves three groups that are mostly internally consistent with respect to missing values.

The overall response rate is very low. As discussed above, this presents challenges inherent with a heavily skewed response variable. For example, an analysis that stratifies over many different attributes may have several strata with no sales at all, rendering these strata mostly useless. The data set is large, which helps to ameliorate this problem, but in turn presents logistical problems with many sophisticated statistical analyses. In this paper, we restrict ourselves to relatively straightforward analyses.

4.5 Loyalty Distribution

A look at the distribution of the loyalty groups across the four categories (Figure 1) of prospects shows that the firm targeted customers in the higher loyalty groups relatively heavily. The network-neighbor target group appears to skew toward the less loyal prospects; this is due to the fact that segment 22, which makes up a large part of the network-neighbor population, comprises predominantly low-loyalty consumers.

5. ANALYSIS

Next we will show direct, statistical evidence that consumers who have communicated with prior customers are more likely to become customers. We show this in several ways, including using our own best efforts to build competing targeting models and conducting thorough assessments of predictive ability on out-of-sample data. Then we consider more sophisticated network attributes and show that targeting can be improved further.
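The network-neighbor (NN) flag defined in Section 4.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event layout `(party_a, party_b, timestamp)`, the integer time stamps and all identifiers are hypothetical, and the sketch treats a communication in either direction as a link, which the paper does not specify.

```python
def network_neighbors(events, subscribers, campaign_start):
    """Return non-subscribers who communicated with a current subscriber
    before the campaign started.

    events: iterable of (party_a, party_b, timestamp) tuples (hypothetical layout).
    subscribers: set of anonymized ids of current users of the service.
    """
    neighbors = set()
    for party_a, party_b, timestamp in events:
        if timestamp >= campaign_start:
            continue  # only the period prior to the campaign counts
        if party_a in subscribers and party_b not in subscribers:
            neighbors.add(party_b)
        elif party_b in subscribers and party_a not in subscribers:
            neighbors.add(party_a)
    return neighbors


def nn_flags(prospects, neighbors):
    """Map each prospect id to 1 (network neighbor) or 0 (not)."""
    return {p: int(p in neighbors) for p in prospects}


# Toy data (hypothetical ids and time stamps).
subscribers = {"s1", "s2"}
events = [
    ("s1", "p1", 10),   # before the campaign: p1 becomes a network neighbor
    ("p2", "p3", 12),   # no subscriber involved
    ("s2", "p3", 99),   # after the campaign start: ignored
    ("p4", "s2", 15),   # link counted in either direction (assumption)
]
flags = nn_flags(["p1", "p2", "p3", "p4"], network_neighbors(events, subscribers, 50))
```

Because the firm observes only part of the communication network, a flag built this way understates the true set of neighbors, especially for the lower loyalty groups.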

FIG. 1. Loyalty distribution by customer category. The three bars show the relative sizes of the three loyalty groups for our four data categories. The network neighbors (NN) show a much larger proportion of low-loyalty consumers than the non-NN group.

5.1 Network-Based Marketing Improves Response

Segmentation provides an ideal setting to test the significance and magnitude of any improvement in modeling by including network-neighbor information, while stratifying by many attributes known to be important, such as loyalty and tenure. The response variable is the take rate for the targets in the two months following the direct mailing. The take rate is the proportion of the targeted consumers who adopted the service within a specified period following the offer. For each segment, we performed a simple logistic regression for the independent network-neighbor attribute versus the dependent sales response. In Figure 2, we graphically present parameter estimates (equivalent to log-odds ratios) for the network attribute along with 95% confidence intervals for 20 of the 21 segments (segment 5 had only a small number of network-neighbor prospects and zero network-neighbor sales, and therefore had an infinite log odds).

[FIG. 2. Results of logistic regression. Parameter estimates plotted as log-odds ratios with 95% confidence intervals. The number plotted at the value of the parameter estimate refers back to segment numbers from Table 1. Horizontal axis: Segments (ordered by log odds).]

Figure 2 shows that in all 20 segments the network-neighbor effect is positive (the parameter estimate is greater than zero), demonstrating an increased take rate for the network-neighbor group within each segment. For 17 of these segments, the log-odds ratio is significantly different from the null hypothesis value of 0 (p < 0.05), indicating that being a network neighbor significantly affected sales in those segments.

While odds ratios allow for tests of significance of an independent variable, they are not as directly interpretable as comparisons of take rates of the network neighbor and non-network-neighbor groups in a given segment. The take rates for the network neighbors are plotted versus the non-network neighbors in Figure 3, where the size of the point is proportional to the log size of the segment. All segments have higher take rates in the network-neighbor subgroup, except for the one segment that had no network-neighbor sales (the smallest sample size). Over the entire data set, the network-neighbors' take rates were greater by a factor of 3.4. This value is plotted in Figure 3 as a dotted line with slope = 3.4. The right-hand plot of Figure 3 shows the relationship between each segment's take rate and its lift ratio, defined as the take rate for NN divided by the take rate for non-NN. The plot shows that the benefit of being a network neighbor is greater for those segments with lower overall take rates.

[FIG. 3. Take rates for marketing segments. Left: For each segment, comparison of the take rate of the non-network neighbors with that of the network neighbors. The size of the glyph is proportional to the log size of the segment. There is one outlier not plotted, with a take rate of 11% for the network neighbors and 0.3% for the non-network neighbors. Reference lines are plotted at x = y and at the overall take-rate ratio of 3.4. Right: Plot of the take rate for the non-network group versus lift ratio for the network neighbors.]

As Figure 3 shows, some of the segments had much higher take rates than others. To assess statistical significance of the network-neighbor effect after accounting for this segment effect, we ran a logistic regression across all segments, including the main effects for the network-neighbor attribute, dummy attributes for each segment and the interaction terms between the two. Two of the interaction terms had to be deleted: one from segment 22, which only had network-neighbor cases, and one from the segment with no sales from the network neighbors. We ran a full logistic regression and used stepwise variable selection.

The results of the logistic regression reiterate the significance of being a network neighbor. The final model can be found in Table 3. The coefficient of 2.0 for the network-neighbor attribute in the final model is an estimate of the log odds, which we exponentiate to get an odds ratio of 7.49, with a 95% confidence interval of (5.64, 9.94). More than half of the segment effects and most of the interactions between the network-neighbor attribute and those segment effects are significant. The interpretation of these interactions is important. Note that the interaction coefficients are negative and very close in magnitude to the coefficients of the main effects of the segments themselves. Therefore, although the segments themselves are significant, in the presence of the network attribute the segments' effect is mostly negated by the interaction effect. Since the segments represent known important attributes like loyalty, tenure and demographics, this is evidence that being a network neighbor is at least as important in this context.

Table 3
Coefficients and confidence intervals for the final segment model

Attribute                 Coeff (c.i.)
Network neighbor (NN)     2.0 (1.7, 2.3)
Segment = 1               1.7 (0.9, 2.5)
Segment = 2               1.8 (1.2, 2.4)
Segment = 4               2.1 (1.3, 3.0)
Segment = 5               1.9 (0.4, 3.3)
Segment = 6               1.9 (1.2, 2.5)
Segment = 7               1.4 (1.0, 1.9)
Segment = 8               1.3 (0.9, 1.7)
Segment = 17              1.5 (0.7, 2.2)
Segment = 19              2.2 (1.6, 2.9)
NN x Segment = 1          -1.1 (-2.1, 0.0)
NN x Segment = 2          -0.9 (-1.7, -0.2)
NN x Segment = 4          -1.8 (-4.0, 0.4)
NN x Segment = 6          -1.5 (-2.6, -0.6)
NN x Segment = 7          -1.2 (-1.7, -0.6)
NN x Segment = 8          -0.8 (-1.3, -0.4)
NN x Segment = 17         -1.6 (-2.8, -0.5)
NN x Segment = 19         -1.1 (-1.9, -0.3)

Significance (shown at the 0.05 (*) and 0.01 (**) levels): [illegible]

In Table 4 we present an analysis of deviance table, an analog to analysis of variance used for nested logistic regressions (McCullagh and Nelder, 1983). The table confirms the significance of the main effects and of the interactions. Each level of the nested model is significant when a chi-squared approximation is used for the differences of the deviances. The fact that so many of the interactions are significant demonstrates that the network-neighbor effect varies for different segments of the prospect population.
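The per-segment quantities used in Section 5.1 can be illustrated with a short Python sketch. With a single binary predictor, the logistic-regression coefficient equals the log-odds ratio of the 2x2 table of NN status against sales, so the sketch computes that ratio directly with a Wald 95% interval, together with the lift ratio. The counts are hypothetical and are not data from the study. The zero-cell check mirrors the segment whose log odds were infinite.

```python
import math


def log_odds_ratio(nn_sales, nn_no_sales, other_sales, other_no_sales, z=1.96):
    """Log-odds ratio of a 2x2 table with a Wald confidence interval.

    Returns (estimate, lower, upper) on the log-odds scale.
    """
    cells = (nn_sales, nn_no_sales, other_sales, other_no_sales)
    if min(cells) <= 0:
        # e.g. a segment with zero NN sales: the estimate is infinite or undefined
        raise ValueError("a zero cell gives an infinite or undefined log-odds ratio")
    estimate = math.log((nn_sales * other_no_sales) / (nn_no_sales * other_sales))
    std_err = math.sqrt(sum(1.0 / c for c in cells))
    return estimate, estimate - z * std_err, estimate + z * std_err


def lift_ratio(nn_sales, nn_total, other_sales, other_total):
    """Take rate for NN divided by take rate for non-NN."""
    return (nn_sales / nn_total) / (other_sales / other_total)


# Hypothetical segment: 30 sales among 1000 NN targets, 10 among 1000 non-NN targets.
est, low, high = log_odds_ratio(30, 970, 10, 990)
odds_ratio = math.exp(est)           # back-transform to the odds-ratio scale
lift = lift_ratio(30, 1000, 10, 1000)
```

The same back-transformation links a coefficient to its odds ratio. Exponentiating exactly 2.0 gives about 7.39, so the reported 7.49 presumably comes from the unrounded coefficient.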

S. HILL, F. PROVOST AND C. VOLINSKY


Tab
of deviance

Analysis

NN

interactions

10687

of attributes

at each

is shown

at the 0.05

not identified
us to compare
take rates
targets for the segments
types of targets. However, many of
targets fall into the network-only

segment data enable


of network and non-network
both

the network-neighbor
segment 22. Segment

Significance^1

9 63
370
8 41

10733

22

The

that contained

Change in deviance

10869

of the group

Significance

5.2 Segment

study

11200

Intercept

Segment
Segment + NN
Segment

the network-neighbor

DF

Deviance

Variable

E4

table fo.

that the
22 comprises
prospects
not
to
be
models
deemed
good can
original marketing
can
see
we
from
the
distribu
As
for
didates
targeting.
most
the
tion in Figure
1, this segment for
part contains
who had no prior relationship with the firm.
the take rates for segment 22 with the
compare
take rates for the combined
group, including all of seg
in
ments
the
leftmost
three bars of Figure 4.
1-21,
consumers
We

The network-neighbor
segment 22 is (not surprisingly)
as the NN groups in segments
not as successful
1-21,
1-21 were selected based
since the targets in segments
for mar
them favorable
that made
we
see
the
that
22 net
segment
keting. Interestingly,
non-NN
the
work neighbors
targets from
outperform
on characteristics

1-21. These
segment 22 network neighbors,
segments
on the basis of their network ac
identified primarily
likely by almost 3 to 1 to purchase
tivity, were more
than the more "favorable" prospects who were not net
work neighbors.
Since those in segment 22 either were

be unworthy
would have

(**)

by marketing

analysts

or were

prospects,
they represent
"fallen through the cracks"

deemed

Improving

Now we will

to

who

in the tradi

process.
a Multivariate

Targeting

assess whether

the NN

Model
attribute

can im

a multivariate

prove
targeting model
by incorporating
all that we know or can find out (over 150 different at
demo
tributes) about the targets, including geography,
and other company-specific
from
attributes,
graphics
internal and external sources (see Section 3.2).
As discussed
in Section
3.7, we tried to address
an important causal question
that
(as well as possible)
arises: Is this network-neighbor
effect due to word of
or simply due to homophily?
The observed ef
fect may not be indicating viral propagation,
but in
a
stead may
demonstrate
effective
way
very
simply
to find like-minded
people. This theoretical distinction

mouth

may not matter much to the firm for this particular type
of marketing
process, but is important to make, for ex
before
future campaigns
that try to
ample,
designing
take advantage of word-of-mouth
behavior.
we
cannot
control for unobserved
Although
ities, we can be as careful
to ensure that the statistical

1.35%

levels.

customers

tional marketing
5.3

(*) and 0.01

similar

as possible
in our analysis
NN prospects
of
the
profile

cases. Since
is the same as the profile for the non-NN
set contains many more non-NN
cases than
we
case
a
NN cases,
match each NN
with
single non
our data

0.83%

II
Network
Neighbors
Segs1-21
FlG.

4.

network
compared
nontarget

%%$
Wmn
' W/^n
Network

0.28%|
W0\
mz-y\
' Non-Network '
Neighbors
Segs1-21

Neighbors
Seg 22

rates for marketing


and non-network
neighbors
Take

segments.
neighbors

the all-network-neighbor
take
All
network
neighbors.

with

non-network-neighbor

group

(segments

Q.11%
?mm?
Network

rates for

in segments

22 and with
segment
rates are relative
to
1-21).

reasonably
NN group.

Neighbors
Non-Targets

Take

case

to it by calculating
that is as close as possible
scores
all
of
the
attributes
propensity
using
explanatory
considered
in
Section
At
the
end of
described
(as
3.7).
as
as is
this matching
the
NN
close
is
process,
group

NN

the
1-21
the
the

possible

in statistical

properties

to the non

to heterogeneity
of data sources across the three
we
scores to create
used
the propensity
loyalty groups,
a matched
data set for each group. For each (individu
in
ally), we fitted a full logistic regression
including
Due

teractions

and selected

a final model

This content downloaded from 202.92.130.58 on Tue, 10 Mar 2015 09:28:56 UTC
All use subject to JSTOR Terms and Conditions

using

stepwise

NETWORK-BASED MARKETING
Table
Results

of multivariate

model

Loyalty
1
NN

Significant

NN

Discount

attributes

Level

calling plan (-)(I)

of Int'l

Referral

(-)

band

Tenure

firm

with

(-)

to loyalty

Belonged
Referral

program

of adults

Number

service

High Tech Msg

plan

Letter

of country
indicator
to
Belonged
loyalty program
Chumer
(-)

Recent

grad
at residence
children

Any

to

responder

mailing

High Tech model score (I)

College
Tenure

plan

Type of previous
score
Credit

Previous

(-)

Region

communicator

International

calling
plan
firm
[Table 5. Results of logistic regression, by loyalty group. The attribute rows did not survive extraction; row labels that remain legible include Tenure, Revenue, # of devices, NN and Child < 18 at home (-).]

Beta hat for NN (95% CI): 0.68 (0.46, 0.91); 0.99 (0.49, 1.49); 0.84 (0.52, 1.16).
Take rates reported for the three loyalty groups: 0.9%, 0.4% and 0.3%.

Notes. Significant attributes from the logistic regressions across levels of loyalty, after variable selection (p < 0.05). Bold indicates significance at the 0.01 level; (-) indicates that the effect of the variable was negative; (I) indicates a significant interaction with the NN variable.

All attributes were checked for outliers, transformations and collinearity with other attributes, and we removed or combined the attributes that accounted for any significant correlations.

Table 5 shows the results of the logistic regressions, which show the attributes that were found to be significant, those that were negatively correlated with take rate, and those that had interactions with the NN attribute. Each of the three models found the network-neighbor attribute to be significant along with several others. The significant attributes tended to be attributes regarding the prospects' previous relationships with the firm, such as previous international services, tenure with firm, churn identifiers and revenue spent with the firm. These attributes are typically correlated with demographic attributes, which explains the lack of significance of many of the demographic attributes considered. Interestingly, tenure with firm is significant in loyalty groups 1 and 2, but with different signs. In the most loyal group, tenure is negatively correlated, but in the mid-level loyalty group it is positive. This unexpected result may be due to differing compositions of the two groups; those consumers with long tenure in the most loyal group might be people who just never change services, while long tenure in the other group might be an indicator that they are gaining more trust in the company. In loyalty group 1, there is limited information about previous services with the firm. For those customers, knowing whether the customer has responded to any previous marketing campaigns has a significant effect.

Table 5 also shows parameter estimates for NN and the take rates in the three loyalty groups. The take rates are highest in the group with the most loyalty but, interestingly, this group gets the least lift (smallest parameter estimate) from the NN attribute. So the impact of network-neighbor is stronger for those market segments with lower loyalty, where actual take rates are weakest.
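To make the NN parameter estimates easier to read, the following minimal sketch converts them to odds ratios. It assumes the estimates are ordinary logistic-regression coefficients on the log-odds scale and ignores the interaction terms noted in Table 5, so the numbers are only a rough guide. The function names `odds_ratio` and `summarize` are hypothetical, and each estimate is paired with the interval of which it is the midpoint.

```python
import math

# NN coefficient estimates with their 95% confidence limits, as legible in Table 5.
# Pairing each estimate with an interval is an assumption based on interval midpoints.
nn_estimates = [
    (0.68, 0.46, 0.91),
    (0.99, 0.49, 1.49),
    (0.84, 0.52, 1.16),
]


def odds_ratio(beta):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


def summarize(beta, lower, upper):
    """Return the odds ratio and its interval by exponentiating each limit."""
    return odds_ratio(beta), odds_ratio(lower), odds_ratio(upper)


for beta, lower, upper in nn_estimates:
    point, low, high = summarize(beta, lower, upper)
    print(f"beta = {beta:.2f}: odds ratio {point:.2f} ({low:.2f}, {high:.2f})")
```

Under these assumptions, every interval lies above an odds ratio of 1, which matches the statement that the network-neighbor attribute is significant in all three models.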

5.4 Consumers Not Targeted

As discussed above, only a select subset of our network-neighbor list was subject to marketing, based on relaxed thresholds on eligibility criteria. The remainder of the list, the nontarget network neighbors, made up the majority. Potential customers were omitted for various reasons: they were not believed to have high-tech capacity; they were on a do-not-contact list; address information was unreliable, and so on. Nonetheless, we were able to identify whether they purchased the product in the follow-up time period. The take rate for this group was 0.11%, and is shown relative to the target groups as the rightmost bar in Figure 4. Although they were not even marketed to, their take rate is almost half that for the non-NN targets, who were chosen as some of the best prospects by the marketing team. This group comprises consumers without any known favorable characteristics that would have put them on the list of prospects. The fact that they are network neighbors alone supports a relatively high take rate, even in the absence of direct marketing. This lends some support to an explanation of word-of-mouth propagation rather than homophily.

Finally, we will briefly discuss the remainder of the consumer space: the nontarget non-NN group. Unfortunately, it is very difficult to estimate a take rate in this category, which could be considered a baseline rate for all of the other take rates. To do this, we would need to estimate the size of the space of all prospects. This includes all of the prospects the firm knows about, as well as customers of the firm's competitors and consumers who might purchase this product that do not have current telecommunications service with any provider. It has been established that the size of the communications market is difficult to estimate in general (Poole, 2004); our best estimates of this baseline take rate put it at well below 0.01%, at least an order of magnitude less than even the nontarget network neighbors.

On the other hand, a by-product of our study is that we can upper-bound the effect of the mass marketing campaigns by comparing the target-NN group and the nontarget-NN group. The difference in take rates between the targeted network neighbors and the nontargeted network neighbors is about 10 to 1. This difference cannot all be attributed to the marketing effect, since the targeted group was specifically chosen to be better prospects and it is likely that more of them would have signed up for the service even in the total absence of marketing. However, it does seem reasonable to call this factor of 10 an upper bound on the effect of the marketing.
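The ratios quoted in this subsection can be written out as a short worked calculation. The symbols $p_{\text{target-NN}}$, $p_{\text{nontarget-NN}}$ and $p_{\text{baseline}}$ are introduced here only for illustration; the target-NN value is merely what the "about 10 to 1" ratio implies, not a separately reported figure.

```latex
\[
\frac{p_{\text{target-NN}}}{p_{\text{nontarget-NN}}} \approx 10,
\qquad
p_{\text{nontarget-NN}} = 0.11\%
\;\Longrightarrow\;
p_{\text{target-NN}} \approx 10 \times 0.11\% = 1.1\% .
\]
\[
p_{\text{baseline}} < 0.01\%
\;\Longrightarrow\;
\frac{p_{\text{nontarget-NN}}}{p_{\text{baseline}}} > \frac{0.11\%}{0.01\%} = 11 .
\]
```

The second line is consistent with the statement that the baseline is at least an order of magnitude below the nontarget network neighbors.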

5.5 Out-of-Sample Ranking Performance

These results suggest that we can give fine-grained estimations as to which customers are more or less likely to respond to an offer. Such estimations can be quite valuable: the consumer pool is immense and a campaign will have a limited budget. Therefore, being able to pick a better list of "top-k" prospects will lead directly to increased profit (assuming targeting costs are not much higher for higher ranked prospects).

In this section, we show that combining the network-neighbor attribute with the traditional attributes improves the ability to rank customers accurately. For each consumer, we create a record that comprises all of the traditional attributes, including loyalty, demographic and geographic attributes, as well as network-neighbor status. Note that in different business scenarios, different types and amounts of data are available. For example, for low-loyalty customers, very few descriptive attributes are known. We report results here using all attributes; the findings are qualitatively similar for every different subset of attributes we have tried (namely, segment, loyalty, geography, demographic). The response variable is the same as above and we used the same logistic regression models. We measure the predictive ranking ability in the binary response variable by an increase in the Wilcoxon-Mann-Whitney statistic, equivalent to the area under the ROC curve (AUC). The ROC curve represents the trade-off between false negative and false positive rates for each possible predicted probability score cutoff resulting from the logistic regression model. Specifically, the AUC is the probability that a randomly chosen (as yet unseen) taker will be ranked higher than a randomly chosen nontaker; AUC = 1.0 means the classes are perfectly separated and AUC = 0.5 means the list is randomly shuffled. All reported AUC values are averages obtained using 10-fold cross-validation.

Table 6 shows the AUC values for the three loyalty groups, quantifying the expected benefit from the improved logistic regression models. There is an increase in AUC for each group, with the largest increase belonging to loyalty level 1, for which the least information is available; note that here the ranking ability without the network information is not much better than random.

Table 6. ROC analysis: AUC values that result from the application of logistic regression models

Loyalty    trad atts    trad atts + NN
1          0.54         0.60
2          0.64         0.67
3          0.60         0.64

Note. The logistic regression models were built using all available attributes with (trad atts + NN) and without (trad atts) the network-neighbor attribute. We see an increase in AUC across all loyalty groups when the NN attribute is included in the model.

To visualize this improvement, Figure 5(a) shows cumulative response ("lift") curves when using the model on loyalty group 3. The lower curve depicts the performance of the model using all traditional attributes, and the upper curve includes the traditional marketing attributes and the network-neighbor attribute. In Figure 5(b), one can see the marked improvement that would be obtained from sending to the top-k prospects on the list. For example, for the top 20% of the list, without the NN attribute, the take rate is 1.51%; with the NN attribute, it is 1.72%. The NN attribute does not improve the ranking for the top 10% of the list.

[Figure 5: panels (a) and (b); horizontal axis: Cumulative % of Consumers Targeted (Ranked by Predicted Sales); curves: trad atts, trad atts + NN.]

FIG. 5. Power of the segmentation curves. (a) Lift curves for models built with all attributes with (trad atts + NN) and without (trad atts) the network-neighbor attribute. The model with the NN attribute outperforms the model without it. For example, if the firm sent out 50% of the mailing, they would get 70% of the positive responses with the NN compared to receiving only 63% of the responses without it. (b) Top-k analysis. Consumers are ranked by probability scores from the logistic regression model. The model that includes the NN attribute outperforms the model without. For example, for the top 20% of targets, the take rate is 1.51% without the NN attribute and 1.72% with the NN attribute.
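The two ranking measures used in this section, the AUC read as a Wilcoxon-Mann-Whitney probability and the top-k take rate, can be illustrated with a minimal sketch. The function names `auc_wmw` and `top_k_take_rate` and the toy scores are assumptions made for illustration; they are not the study's data or code, and the sketch does no cross-validation.

```python
def auc_wmw(scores, labels):
    """AUC as the probability that a random taker outranks a random nontaker.

    Ties between a taker and a nontaker count as half a win.
    """
    takers = [s for s, y in zip(scores, labels) if y == 1]
    nontakers = [s for s, y in zip(scores, labels) if y == 0]
    if not takers or not nontakers:
        raise ValueError("need at least one taker and one nontaker")
    wins = 0.0
    for t in takers:
        for n in nontakers:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(takers) * len(nontakers))


def top_k_take_rate(scores, labels, fraction):
    """Take rate among the top `fraction` of prospects, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    return sum(y for _, y in ranked[:k]) / k


# Toy predicted scores and outcomes (1 = taker), invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

print(auc_wmw(scores, labels))
print(top_k_take_rate(scores, labels, 0.2))
```

The pairwise loop is quadratic in the number of consumers; it is written this way because it mirrors the probabilistic definition in the text, not because it would scale to a full prospect list.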

5.6 Improving Performance By Adding More Sophisticated Network Attributes

Knowing whether a consumer is a network neighbor is one of the simplest indicators of consumer-to-consumer relationship that can be extracted from the network data. We now investigate whether augmenting the model with more sophisticated social-network information can add additional value. In this section, we will focus on the social network that comprises (only) the current customers of this service (which here we will call "the network"), along with the periphery of prospects who have communicated with those on the network (the network neighbors). We investigate whether we can improve targeting by using more sophisticated measures of social relationship with the network of existing customers.

Table 7 summarizes a set of additional social-network attributes that we add to the logistic regression. The terminology we use is borrowed to some degree from the fields of social-network analysis and graph theory. Social-network analysis (SNA) involves measuring relationships (including information transmission) between people on a network. The nodes in the network represent people and the links between them represent relationships between the nodes. The SNA measures help quantify intuitive social notions, such as connectedness, influence, centrality, social importance and so on. Graph theory helps to understand problems better by representing them as interconnected nodes, and provides vocabulary and methods for operating mathematically.

Table 7. Network attribute descriptions

Attribute                   Description
Degree                      Number of unique customers communicated with before the mailing
Transactions                Number of transactions to/from customers before the mailing
Seconds of communication    Number of seconds communicated with customers before mailing
Connected to influencer     Is an influencer in prospect's local neighborhood?
Connected component size    Size of the connected component prospect belongs to
Max similarity              Max overlap in local neighborhood with any existing neighboring customer

Three of the attributes that we introduce can be derived from a prospect's local neighborhood (the set of immediate communication partners on the network; recall that these all are current customers). Degree measures the number of direct connections a node has. Within the local neighborhood, we also count the number of Transactions, and the length of those transactions (Seconds of communication).

The network is made up of many disjoint subgraphs. Given a graph G = (V, E), where V is a set of vertices (nodes) and E is a set of links between them, the connected components of G are the sets of vertices such that all vertices in each set are mutually connected (reachable by some path) and no two vertices in different sets are connected. The size of the connected component may be an indicator for awareness of and positive views about the product. If a prospect is linked to a large set of "friends" all of whom have adopted the service, she may be more likely to adopt herself. Connected component size is the size of the largest connected component (in the network) to which the prospect is connected.

We also move beyond a prospect's local neighborhood. Observing the local neighborhoods of a prospect's local neighbors, we can define a measure of social similarity. We define social similarity as the size of the overlap in the immediate network neighborhoods of two consumers. Max similarity is the maximum social similarity between the prospect and any neighbors of the prospect. Finally, the firm also can observe the prior dynamics of its customers. In particular, the firm can observe which customers communicated before and/or after their adoption as well as the date customers signed up. Using this information, we define influencers as those subscribers who signed up for the service and, subsequently, we see one of their network neighbors sign up for the service. Connected to influencer is an indicator of whether the prospect is connected to one of these influencers. We appreciate that we do not actually know if there was true influence.

We use all of the aforementioned attributes and show AUC values for these predictive models in Table 8. We find that some of these network attributes have considerable predictive power individually and have even more value when combined. This is indicated by AUCs of 0.68 for both transactions and seconds of communication. We do not find high AUCs individually for degree, connected component size, similarity or connected to influencer. Ultimately, we find that the logistic regression model built with the network attributes results in an AUC of 0.71 compared to an AUC of 0.66 without the network attributes, using only the traditional marketing attributes described in previous sections. (Recall that this represents the ability to rank the network neighbors, who already have especially high take rates as a group, as we have shown.)

Table 8. ROC analysis

Attribute(s)                                            AUC
Transactions                                            0.68
Seconds of communication                                0.68
Connected to influencer                                 0.59
Degree                                                  0.53
Connected component size                                0.55
Similarity                                              0.55
All network                                             0.71
All traditional (loyalty, demographic, geographic)      0.66
All traditional + all network                           0.71

Note. AUC values result from logistic regression models built on each of the constructed network attributes individually, as well as in combination. Results are presented for loyalty-level 3 customers.

Interestingly, when we combine the traditional attributes with the network attributes, there is no additional gain in AUC, even though many of these attributes were shown to be significant in the broader analysis above. The similarities represented implicitly or explicitly in the network attributes seem to account for all useful information captured by traditional demographics and other marketing attributes. That traditional demographics and other marketing attributes do not add value is not only of theoretical interest, but practical as well; for example, in cases such as this where demographic data must be purchased.

Our result is further confirmed by the lift and take rate curves displayed in Figure 6(a) and (b), respectively. One can achieve substantially higher take rates using the new network attributes as compared to using the traditional attributes. For example, we find that for the top 20% of the targeted list, without the network attributes, the take rate is 2.2%; with the network attributes, it is 3.1%. Likewise, at the top 10% of the list, the take rate with the network attributes is 4.4% compared to 2.9% without them.
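Three of the network attributes defined in this section (Degree, Connected component size and Max similarity) can be computed from an adjacency structure. The sketch below is a hedged illustration on a made-up toy graph; it assumes an undirected communication graph stored as a dictionary of neighbor sets, and the helper names (`build_graph`, `component_of`, and so on) are hypothetical rather than taken from the study.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of unique communication partners of `node`."""
    return len(graph.get(node, set()))


def component_of(graph, start, allowed):
    """Breadth-first search from `start`, visiting only nodes in `allowed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, set()):
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def connected_component_size(graph, prospect, customers):
    """Size of the largest customer-only component that the prospect touches."""
    best = 0
    visited = set()
    for neighbor in graph.get(prospect, set()):
        if neighbor in customers and neighbor not in visited:
            component = component_of(graph, neighbor, customers)
            visited |= component
            best = max(best, len(component))
    return best


def max_similarity(graph, prospect):
    """Largest neighborhood overlap between the prospect and any of its neighbors."""
    own = graph.get(prospect, set())
    return max((len(own & graph.get(n, set())) for n in own), default=0)


# Toy network: "p" is a prospect; "a" to "e" are existing customers (invented data).
customers = {"a", "b", "c", "d", "e"}
graph = build_graph([("p", "a"), ("p", "b"), ("p", "d"),
                     ("a", "b"), ("b", "c"), ("d", "e")])

print(degree(graph, "p"))
print(connected_component_size(graph, "p", customers))
print(max_similarity(graph, "p"))
```

Restricting the search to `customers` reflects the definition of "the network" as current customers only; the Transactions, Seconds of communication and Connected to influencer attributes would need call records and sign-up dates, which this sketch does not model.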

[Figure 6: panels (a) and (b); horizontal axis: Cumulative % of Consumers Targeted (Ranked by Predicted Sales); curves: trad atts, trad atts + net.]

FIG. 6. Power of segmentation curves. (a) Lift curves for models built with all traditional attributes, with (trad atts + net) and without (trad atts) the network attributes. If the firm sent out 50% of the mailing, they would have received 77% of the positive responses with the network attributes, compared to receiving 63% of the responses without the network attributes. (b) Top-k analysis. The model including the network attributes (trad atts + net) outperforms the model without them (trad atts). For example, for the top 20% of targets ranked by score, the take rate is 2.2% without the network attributes and 3.1% with the network attributes.

6. LIMITATIONS

We believe our study to be the first to combine data on direct customer communication with data on product adoption to show the effect of network-based marketing statistically. However, there are limitations in our study that are important to point out.

There are several types of missing, incomplete or unreliable data which could influence our results. We have records of all of the communication (using the service) to and from current customers of the service. That is not true for all the network-neighbor consumers. As such, we do not have complete information about the network-neighbor targets (as well as the non-network-neighbor targets). In addition, some of the attributes we used were collected by purchasing data from external sources. These data are known to be at least partially erroneous and outdated, although it is not well known how much so. An additional problem is joining data on customers from external sources to internal communication data, leading to missing data or sometimes just blatantly incorrect data. Finally, telecommunications firms are not legally able to collect information regarding the actual content of the communication, so we are not able to determine if the consumers in question discussed the product. In this regard, our data are inferior to some other domains where content is visible, such as Internet bulletin boards or product discussion forums.

We expect the network-neighbor effect to manifest itself differently for different types of products. Most of the studies done to date on viral marketing have focused on the types of products that people are likely to talk about, such as a new, high-tech gadget or a recently released movie. We expect there to be less buzz for less "sexy" products, like a new deodorant or a sale on grapes at the supermarket. The study presented in this paper involves a new telecommunications service, which involves a new technology and features that consumers have perhaps never been exposed to before. The firm hopes the new technology and features are such that they would encourage word of mouth.

What can we say about other products that might not be quite so buzz-worthy? To study this, we compared the new service studied here to a roll-out of another product by the same firm. This other product was simply a new pricing plan for an older telecommunications service. Customers who signed up for this new plan could stand to save a significant amount of money, depending on their current usage patterns. However, the range and variety of telecommunications pricing plans in the marketplace is so extensive and so confusing to the typical consumer that we do not believe that this is the type of product that would generate a lot of discussion between consumers. We refer to the two products as the pricing plan and the new technology.

For the pricing plan, we have the same knowledge of the network as we do for the new technology. For those consumers who belong to the pricing plan, we know who they communicate with and then we can follow these network-neighbor candidates to see if they ultimately sign up for the plan. We construct a measure of "network neighborness" as follows. For a series of consecutive months, we gather data for all customers who ordered the product in that month. We calculate the percentage of these new customers who were network neighbors, that is, those who had previously communicated with a user of the product. This percentage is a measurement of the proportion of new sales being driven by network effects. By comparing this percentage across two products, we get insight into which product stimulates network effects more.

We now look at this value for our two products over an 8-month period. The time period for the two products was chosen so that it would be within the first year after the product was broadly available. The results are shown in Figure 7. The two main points to take away are that the new service has a higher percent of purchasers who are network neighbors and also an increasing one (except for the dip in month 5). In contrast, the pricing plan has a flat network-neighbor percentage, never increasing above 3%.

[Figure 7: horizontal axis: Month; series: New Service, Pricing Plan.]

FIG. 7. Network-neighborness plot for new service versus pricing plan.

Interestingly, the dip in the plot for the new service corresponds exactly to the month of the direct marketing discussed earlier. Before the campaign, we can see that the network-neighbor effect was increasing, that more and more of the purchasers in a given month were network neighbors. During the mass marketing campaign, we exposed many non-network neighbors to the service and many of them ended up purchasing it, temporarily dropping the network-neighbor percentage. After the campaign, we see the network-neighbor percentage starting to increase again.

This network-neighborness measure should not be confused with the success of the product, as the pricing plan was quite successful from a sales perspective, but it does suggest that the pricing plan is a product that has less of a network-based spread of information. This difference might be due to the new service creating more word-of-mouth or perhaps we are seeing the effects of homophily. People who interact with each other are more likely to be similar in their propensity for purchasing the new service than in their propensity for purchasing a particular pricing plan. Again, the effects of word of mouth versus homophily are difficult to discern without knowing the content of the communication.
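The "network neighborness" measure described in this section can be sketched as a small monthly computation. The data layout below is an assumption made for illustration: `orders` maps each customer to the month in which they ordered, and `contacts` lists communications as (party, party, month). A new customer counts as a network neighbor here if they communicated, strictly before their own order month, with someone who was already a user at the time of that communication. The function name and the toy records are hypothetical.

```python
def network_neighborness(orders, contacts):
    """Percentage of each month's new customers who were network neighbors.

    orders:   {customer: month in which the customer ordered the product}
    contacts: iterable of (a, b, month) communication records
    """
    neighbors = set()
    for a, b, t in contacts:
        for person, other in ((a, b), (b, a)):
            if person not in orders or other not in orders:
                continue
            # `other` is already a user when the contact happens,
            # and `person` orders in a strictly later month.
            if orders[other] <= t < orders[person]:
                neighbors.add(person)

    totals, hits = {}, {}
    for customer, month in orders.items():
        totals[month] = totals.get(month, 0) + 1
        if customer in neighbors:
            hits[month] = hits.get(month, 0) + 1
    return {m: 100.0 * hits.get(m, 0) / totals[m] for m in sorted(totals)}


# Toy records, invented for illustration only.
orders = {"u1": 1, "u2": 2, "u3": 2, "u4": 3, "u5": 3}
contacts = [("u1", "u2", 1), ("u3", "u9", 1), ("u2", "u4", 2), ("u5", "u4", 3)]

print(network_neighborness(orders, contacts))
```

Running the same function on the order and contact records of two different products would give the two monthly series that are compared in Figure 7; the strict "before the order month" rule is one possible reading of "previously communicated" and could be relaxed.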

7. DISCUSSION

One of the main concerns for any firm is when, how and to whom they should market their products. Firms make marketing decisions based on how much they know about their customers and potential customers. They may choose to mass market when they do not know much. With more information, they may market directly based on some observed characteristics. We provide strong evidence that whether and how well a consumer is linked to existing customers is a powerful characteristic on which to base direct marketing decisions. Our results indicate that a firm can benefit from the use of social networks to predict the likelihood of purchasing. Taking the network data into account improves significantly and substantially on both the firm's own marketing "best practices" and our best efforts to collect and model with traditional data.

The sort of directed network-based marketing that we study here has applicability beyond traditional telecommunications companies. For example, eBay recently purchased Internet-telephony upstart Skype for $2.6 billion; they now also will have large-scale, explicit data on who talks to whom. With gmail, Google's e-mail service, Google now has access to explicit networks of consumer interrelationships and already is using gmail for marketing; directed network-based marketing might be a next step. Various systems have emerged recently that provide explicit linkages between acquaintances (e.g., MySpace, Friendster, Facebook), which could be fruitful fields for network-based marketing. As more consumers create interlinked blogs, another data source arises. More generally, these results suggest that such linkage data potentially could be a sort of data considered for acquisition by many types of firms, as purchase data now are being collected routinely by many types of retail firms through loyalty cards. Even academic departments could benefit from such data; for example, the enrollment in specialized classes could be bolstered by "marketing" to those linked to existing students. Such links exist (e.g., via e-mail). It remains to design tactics for using them that are acceptable to all.

It is tempting to argue that we have shown that customers discuss the product and that discussion helps to improve take rates. However, word of mouth is not the only possible explanation for our result. As discussed in detail above, it may be that the network is a powerful source of information on consumer homophily, which is in accord with social theories (Blau, 1977; McPherson, Smith-Lovin and Cook, 2001). We have tried to control for homophily by using a propensity-matched sample to produce our logistic regression model. However, it may well be that direct communications between people is a better indicator of deep similarity than any demographic or geographic attributes. Either cause, homophily or word of mouth, is interesting both theoretically and practically.

R.

FiLDES,

both

V. Mahajan,
19 327-328.

and

of AT&T, as well
for useful
versity of Maryland,
We

suggestions.
mous
reviewers

Getoor,

drafts.

previous

Learning

Mining
Berlin.
Springer,

Learn.

REFERENCES
G.

A.

and Tuzhilin,

the next

Toward

(2005).

A survey of the state-of


of recommender
systems:
generation
IEEE Trans. Knowledge
extensions.
the-art and possible
and
Data Engineering
17 734-749.
D.

Agarwal,
ties of

Philadelphia.
F. M.

Bass,

of Social
D. and Narayandas,

Theory

contacts

initiated

with

D.

Web

pertextual

L.
search

30 107-117.
Systems
A.C.
(1991). Spatial

metrica

S. (1998). Toward
and cost distributions:

and Stolfo,

non-uniform

class

fraud

In Proc.

detection.

on Knowledge
Press, Menlo

S. H.

and

DELLAROCAS,
Promise
agement

event
physics
67 159-182.

M.

of

tional

on Knowledge
Conference
ACM Press, New York.
W.,

DuMouchel,
and Pregibon,
Fifth

ACM

In Proc.

Volinsky,
D.

rule-learning

classification.

Computer

R.

G.

for information

Huang,

Mining
Going

technology
and methods.
J. Assoc.

6-15.

the network
).Mining
SIGKDD
Interna
and Data

Discovery

Mining

Johnson,

T.,

beyond
innovation
Information

Press,

C.

Cortes,
flatter.

In Proc.

on Knowledge
York.

New

the dominant

paradigm
con
Emerging
5
314-355.
Systems

research:

J. Mach.

relation

probabilistic
WEBKDD

San

1999,

Yorker March

17,

K.

Point:

How

Little

T. L.

and Baker,
environment

Inves

(2002).

in hedonic

service

events.

J. Busi

of sporting

study

Can

Things

Boston.

Books,

E.

W.

and Handcock,

to social
MR

Sta

(2004). A graph model


J. Amer. Soc. Informa

H. C.

S.

systematic

(2002).
/. Amer.

analysis.

systems.
55 259-274.

and Technology
N.
and Stephen,

La

S.

1951262

and Chen,

recommender

M.

network

The

(2002).
Intelligent

study.

imbal

class

Data

Analysis

429-449.
R.
and Agarwal,
(2001).
Evaluating
rare
to
and
classes:
algorithms
classify
boosting
Comparison
on Data
In Proc. IEEE International
Conference
improvements.
IEEE Press,

257-264.

Mining
Kautz,

V.

Kumar,

M.,

H.,

Selman,

Combining

social

B.

and

networks

NJ.

Piscataway,
M.

Shah,
and

(1997).

web:

Referral

collaborative

Comm.

filtering.

sources

in a hyperlinked

ACM 40(3) 63-65.


J. (1999). Authoritative
J. ACM 46 604-632.

BERG,
V.

T. V.

and Krishnan,
An

D.
for

problem

MR

(2002).

networks.

on Information
Conference
ACM Press, New York.
556-559.
Linden,

G.,

Smith,

B.

and

worked

Paper
Working
York University.
Mahajan,

toolkit

(2003).

F. (2004).
a univariate

and

Stern

#CeDER-04-08,

V., M?ller,

J.

E.

for new products


strategy
mouth. Management
Sei.

and Kerin,
with
30

positive
1389-1404.


318-330.
link pre

Twelfth

collaborative

7 76-80.
Computing
S. and Provost,
data:

In Proc.

and Knowledge

York,

recommendations?Item-to-item

Macskassy,

diffusion

Multinational

framework.

social

en

1747649

Sei. 21
Marketing
and KLEINBERG,
J. (2003).
The

alternative

LlBEN-NowELL,

Internet

ACM

A.

Chung,

Japkowicz,
ance problem:

diction

ACM

B.

Taskar,

55 691-101.

approaches
97 1090-1098.

Z.,

Kumar,

Man

and

The New

tional

(2001

Seventh

C,

mechanisms.

D.

coolhunt.

M.

Raftery,

vironment.

of mouth:

of word

flat files
Squashing
International
Conference

(2004).

R D.,

models:

(1999).

SIGKDD
and Data

Discovery
FlCHMAN,

(1991).

307-338.

eds.)

of link structure.

Using
In Proc.

The Tipping
Back Bay

Brady,

R.,

Research

Klein

P. and Richardson,
customers.

study

Conference
AAAI
164-168.

Mining

E. G.

value

57-66.

case

The

the role of the physical


tigating
An exploratory
consumption:

Joshi,

Econo

learning with
in credit

scalable
A

The
C
(2003).
digitization
and challenges
feedback
of online
Sei. 49 1407-1424.

Domingos,

Hightower,

tion Science

demand.

(1999).

filtering.

(1997).

for E-commerce

/. Market

Lavrac,

Koller,

M.

M.
GLADWELL,
(2002).
a Big Difference.
Make

International

Data

Stern,

in high-energy
Communications

program
Physics

Fourth

and

Discovery
Park, CA.

Clearwater,

cepts

of

59 953-965.

E.

card

in household

patterns

share

of a large-scale
hy
Networks
and ISDN

anatomy

Computer

engine.

Case,

Chan,

The

(1998).

on

impact
behavior.

and word-of-mouth
category
requirements
Research-38
281
-297'.
ing
S. and Page,

M.

space
tist Assoc.

customer

N.

A.

In Relational

models.

78-88.

tent

Managing

The

and Sahami,

Springer,
D.
and Pfeffer,

relational
and

In

learning.
Conference.
Berlin.

CA.

Diego,

Hoff,

A Primitive

415.

3625

N.,

for collaborative

Gladwell,

ness

(2001).

manufacturers:

SIAM,

models

consumer

and Heterogeneity:
Inequality
Structure.
Free Press, New York.

Bowman,

Brin,

In Proc.

Mining.

for model

growth
product
Sei. 15 215-227.

Management
P. M.
(1977).

communi

Enhancing
blockmodels.

new

(1969).

durables.
BLAU,

(2004).

stochastic
using Bayesian
on Data
International
Conference

interest
SIAM

Fourth

D.

and Pregibon,

L.

Getoor,

relational

International

Koller,

N.,

Friedman,

L.,

statistical

models
Learning
probabilistic
Res. 3 679-707.
MR1983942

(2003).

Adomavicius,

cooperation,
Research
20

J. Consumer

15th

probabilistic
(S. Dzeroski

Data

Getoor,

on

Programming,
in Comput.
Sei.

Friedman,

L.,

(2001).

on

comments

insightful

Structure,

(1993).

information.

Tutorial

(2005).

Logic
Notes

Lecture

discussions

offered

who

L.

ductive

and helpful
like to thank three anony

also

would

K.

of market

Models,
Diffusion
by
Internat. J. Forecasting

360-375.

Paul and Deepak Agar


as Chris Dellarocas
of the Uni

wal

the flow

GETOOR,

like to thank DeDe

We would

of New-Product
(2003). Review
E. M?ller
and Y. Wind,
eds.

J. and Nakamoto,

Frenzen,

ACKNOWLEDGMENTS


Amazon.com
filtering.

Classification
case

School

R.

Interna

Management

study.

of Business,

IEEE

in net
CeDER
New

Introduction
(1984).
word-of
and negative


Models.

and Hall,

Chapman

McPHERSON,

M.,

cation

trees

Learn.

Res.

and class

A.

(2001).

N.

PAUMGARTEN,

Interfaces
R. (2004).

Learning,
Banff,

Learning.
(2003). No.

Yorker May
5.
C. and Provost,

Perlich,

Learning
D.

Applying

for collaborative

Relational

on Machine

POOLE,

J. (2001).

Annual

Boosted

(2006).

estimation.

for relational

quantitative
31(2) 90-108.

learning

acknowledged.
Distribution-based

(2006).
with

Canada.

Alberta,

1 fan dept.
F.

marketing

M.
sites

score.

P. R.

identifier

attributes.

aggre
Machine

Estimating

of

the telephone
universe.
In Proc.
Tenth ACM

and Domingos,

for viral marketing.

P. (2002).
In Proc.

Discovery

knowledge
Mining
ACM SIGKDD

Eighth

Statist.
D.

Evans,

D.

B.

ed.

Free

bias in
Reducing
on the propensity

(1984).

subclassification

using
Assoc.

79 516-524.

J. and Yakan, A. (2005). Collaborative filtering: Special case in predictive analysis. Internat. J. Computer Mathematics 82 1-11. MR2159280

Ueda, T. (1990). A study of a competitive Bass model which takes into account competition among firms. J. Operations Research Society of Japan 33 319-334.

den

C.

Bulte,

tion revisited:

can J. Sociology
R. (2004).

New

costly:
J. Artificial
G. M.
WEISS,

Yang,

The

The

effect

Intelligence
(2004).

SIGKDD

hidden

of

Dec.

Research

Mining

preferences.

L.

versus

effort.

sight)

Learning
distribution

persuaders.

when
on

Ameri

The

training data
tree induction.

19 315-354.

with

rarity: A unifying
Newsletter
6 7-19.

(2003). Modeling
J. Marketing
Research


innova

Medical

(2001).
marketing

(in plain
5, 69-75.

(2003).
class

Explorations
S. and Allenby,
G. M.

consumer

G.

contagion
106 1409-1435.

York Times Magazine


G. and Provost,
F.

are

ACM

and Lilien,

Social

Walker,

Weiss,
the size

5th

Innovations,

of

Diffusion

and Rubin,

studies

J. Amer.

K.,

Van
The New

(2003).

and Data

Discovery

York.

observational

J. Mach.

probabilistic
In Proc. Workshop
filtering.
21st International
Confer

Bayesian
Mark-recapture
approach.
on Knowledge
International
SIGKDD
Conference
ACM Press, New York.
and Data Mining
659-664.

Richardson,

New

Press,

classifi

62 65-105.
(2004).

E. M.

ROGERS,

of

Hierarchical

sharing

Birds

Review

Tout,

L.

models

on Statistical

gation

A.

probability/quantile

J. and Greiner,

relational

ence

and Buja,

to the Internet.

techniques

61-70.

Mining

To appear.

Montgomery,

Newton,

and COOK,
networks.

on Knowledge
Conference
ACM Press, New York.

International

MR0727836

Rosenbaum,

A.

D., Wyner,

Linear

Generalized

(1983).

York.
L.

in social

Homophily
27 415-444.

Sociology

New

Smith-Lovin,

of a feather:

Mease,

J. A.

P. and Nelder,

McCULLAGH,

framework.

interdependent
40 282-294.
