
Network-Based Marketing: Identifying Likely Adopters via Consumer Networks
Author(s): Shawndra Hill, Foster Provost and Chris Volinsky

Source: Statistical Science, Vol. 21, No. 2, A Special Issue on Statistical Challenges and
Opportunities in Electronic Commerce Research (May, 2006), pp. 256-276
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/27645754
Statistical Science
2006, Vol. 21, No. 2, 256-276
DOI: 10.1214/088342306000000222
© Institute of Mathematical Statistics, 2006
Abstract. Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) "Network neighbors" (those consumers linked to a prior customer) adopt the service at a rate 3-5 times greater than baseline groups selected by the best practices of the firm's marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption.

Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
Shawndra Hill is a Doctoral Candidate and Foster Provost is Associate Professor, Department of Information, Operations and Management Sciences, Leonard N. Stern School of Business, New York University, New York, New York 10012-1126, USA (e-mail: shill@stern.nyu.edu; fprovost@stern.nyu.edu). Chris Volinsky is Director, Statistics Research Department, AT&T Labs Research, Shannon Laboratory, Florham Park, New Jersey 07932, USA (e-mail: volinsky@research.att.com).

1. INTRODUCTION

Network-based marketing seeks to increase brand recognition and profit by taking advantage of a social network among consumers. Instances of network-based marketing have been called word-of-mouth marketing, diffusion of innovation, buzz marketing and viral marketing (we do not consider multilevel marketing, which has become known as "network" marketing). Awareness or adoption spreads from consumer to consumer. For example, friends or acquaintances may tell each other about a product or service, increasing awareness and possibly exercising explicit advocacy. Firms may use their websites to facilitate consumer-to-consumer advocacy via product recommendations (Kautz, Selman and Shah, 1997) or via on-line customer feedback mechanisms (Dellarocas, 2003). Consumer networks may also provide leverage to the advertising or marketing strategy of the firm. For example, in this paper we show how analysis of a consumer network improves targeted marketing.

This paper makes two contributions. First we survey the burgeoning methodological research literature on network-based marketing, in particular on statistical analyses for network-based marketing. We review the research questions posed, and the data and analytic techniques used. We also discuss challenges and opportunities for research in this area. The review allows us to postulate necessary data requirements for studying the effectiveness of network-based marketing and to highlight the lack of current research that satisfies those requirements. Specifically, research must have access both to direct links between consumers and to direct information on the consumers' product adoption. Because of inadequate data, prior studies have not been able to provide direct, statistical support (Van den Bulte and Lilien, 2001) for the hypothesis that network linkage can directly affect product/service adoption.

The second contribution is to provide empirical support that network-based marketing indeed can improve on traditional marketing techniques. We introduce telecommunications data that present a natural testbed for network-based marketing models, in which communication linkages as well as product adoption rates can be observed. For these data, we show three main results: (1) "Network neighbors" (those consumers linked to a prior customer) adopt the service at a rate 3-5 times greater than baseline groups selected by the best practices of the firm's marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More sophisticated network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption.

2. NETWORK-BASED MARKETING

There are three, possibly complementary, modes of network-based marketing.

Explicit advocacy: Individuals become vocal advocates for the product or service, recommending it to their friends or acquaintances. Particular individuals such as Oprah, with her monthly book club reading list, may represent "hubs" of advocacy in the consumer relationship network. The success of The Da Vinci Code, by Dan Brown, may be due to its initial marketing: 10,000 books were delivered free to readers thought to be influential enough (e.g., individuals, booksellers) to stimulate the traffic in paid-for editions (Paumgarten, 2003). When firms give explicit incentives to consumers to spread information about a product via word of mouth, it has been called viral marketing, although that term could be used to describe any network-based marketing where the pattern of awareness or adoption spreads from consumer to consumer.

Implicit advocacy: Even if individuals do not speak about a product, they may advocate implicitly through their actions, especially through their own adoption of the product. Designer labeling has a long tradition of using consumers as implicit advocates. Firms commonly capitalize on influential individuals (such as athletes) to advocate products simply by conspicuous adoption. More recently, firms have tried to induce the same effect by convincing particularly "cool" members of smaller social groups to adopt products (Gladwell, 1997; Hightower, Brady and Baker, 2002).

Network targeting: The third mode of network-based marketing is for the firm to market to prior purchasers' social-network neighbors, possibly without any advocacy at all by customers. For network targeting, the firm must have some means to identify these social neighbors.

These three modes may be used in combination. A well-cited example of viral marketing combines network targeting and implicit advocacy: The Hotmail free e-mail service appended to the bottom of every outgoing e-mail message the hyperlinked advertisement, "Get your free e-mail at Hotmail," thereby targeting the social neighbors of every current user (Montgomery, 2001), while taking advantage of the user's implicit advocacy. Hotmail saw an exponentially increasing customer base. Started in July 1996, in the first month alone Hotmail acquired 20,000 customers. By September 1996 the firm had acquired over 100,000 accounts, and by early 1997 it had over 1 million subscribers.

Traditional marketing methods do not appeal to some segments of consumers. Some consumers apparently value the appearance of being on the cutting edge or "in the know," and therefore derive satisfaction from promoting new, exciting products. The firm BzzAgents (Walker, 2004) has managed to entice voluntary (unpaid) marketing of new products. Furthermore, although more and more information has become available on products, parsing such information is costly to the consumer. Explicit advocacy, such as word-of-mouth advocacy, can be a useful way to filter out noise.

A key assumption of network-based marketing through explicit advocacy is that consumers propagate "positive" information about products after they either have been made aware of the product by traditional marketing vehicles or have experienced the product themselves. Under this assumption, a particular subset of consumers may have greater value to firms because they have a higher propensity to propagate product information (Gladwell, 2002), based on a combination of their being particularly influential and their having more friends (Richardson and Domingos, 2002). Firms should want to find these influencers and to promote useful behavior.
3. LITERATURE REVIEW

Many quantitative statistical methods used in empirical marketing research assume that consumers act independently. Typically, many explanatory attributes are collected on each actor and used in multivariate modeling such as regression or tree induction. In contrast, network-based marketing assumes interdependency among consumer preferences. When interdependencies exist, it may be beneficial to account for their effects in targeting models. Traditionally in statistical research, interdependencies are modeled as part of a covariance structure, either within a particular observational unit (as in the case of repeated measures experiments) or between observational units. Studies of network-based marketing instead attempt to measure these interdependencies through implicit links, such as matching on geographic or demographic attributes, or through explicit links, such as direct observation of communications between actors. In this section, we review the different types of data and the range of statistical methods that have been used to analyze them, and we discuss the extent to which these methods naturally accommodate networked data.

Work in network-based marketing spans the fields of statistics, economics, computer science, sociology, psychology and marketing. In this section, we organize prominent work in network-based marketing by six types of statistical research: (1) econometric modeling, (2) network classification modeling, (3) surveys, (4) designed experiments with convenience samples, (5) diffusion theory and (6) collaborative filtering and recommender systems. In each case, we provide an overview of the approach and a discussion of a prominent example. This (brief) survey is not exhaustive. In the final subsection, we discuss some of the statistical challenges inherent in incorporating this network structure.

3.1 Econometric Models

Econometrics is the application of statistical methods to the empirical estimation of economic relationships. In marketing this often means the estimation of two simultaneous equations: one for the marketing organization or firm and one for the market. Regression and time-series analysis are found at the core of econometric modeling, and econometric models are often used to assess the impact of a target marketing campaign over time.

Econometric models have been used to study the impact of interdependent preferences on rice consumption (Case, 1991), automobile purchases (Yang and Allenby, 2003) and elections (Linden, Smith and York, 2003). For each of the aforementioned studies, geography is used in part as a proxy for interdependence between consumers, as opposed to direct, explicit communication. However, different methods are used in the analysis. Most recently, Yang and Allenby (2003) suggested that traditional random effects models are not sufficient to measure the interdependencies of consumer networks. They developed a Bayesian hierarchical mixture model where interdependence is built into the covariance structure through an autoregressive process. This framework allows testing of the presence of interdependence through a single parameter. It also can incorporate the effects of multiple networks, each with its own estimated dependence structure. In their application, they use geography and demography to create a "network" of consumers in which links are created between consumers who exhibit geographic or demographic similarity. The authors showed that the geographically defined network of consumers is more useful than the demographic network for explaining consumer behavior as it relates to purchasing Japanese cars. Although they do not have data on direct communication between consumers, the framework presented by Yang and Allenby (2003) could be extended to explicit network data where links are created between consumers through their explicit communication as opposed to demographic or geographic similarity.

A drawback of this approach is that the interdependence matrix has size n^2, where n is the number of consumers; consumer networks are extremely large and prohibit parameter estimation using this method. Sparse matrix techniques or clever clustering of the observations would be a natural extension.
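To make the scaling concern concrete, the following minimal Python sketch contrasts the n^2 entries of a dense interdependence matrix with the number of links that must be stored when only nearby consumers are linked. It is an illustration only, not the method of Yang and Allenby (2003): the grid-cell proximity rule, the `cell_size` parameter and all function names are hypothetical assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_sparse_links(locations, cell_size=1.0):
    """Link consumers that fall in the same grid cell.

    The grid-cell rule is a hypothetical stand-in for "geographic
    similarity"; it is an assumption of this sketch, not the paper's rule.
    """
    cells = defaultdict(list)
    for consumer, (x, y) in locations.items():
        cells[(int(x // cell_size), int(y // cell_size))].append(consumer)

    links = defaultdict(set)  # adjacency sets: only nonzero entries are stored
    for members in cells.values():
        for a, b in combinations(members, 2):
            links[a].add(b)
            links[b].add(a)
    return links


def dense_entries(n):
    """Number of cells in a full n-by-n interdependence matrix."""
    return n * n


def stored_entries(links):
    """Number of nonzero (directed) entries kept by the sparse structure."""
    return sum(len(neighbors) for neighbors in links.values())


# Toy data: five consumers with made-up coordinates.
locations = {
    "a": (0.1, 0.2),
    "b": (0.4, 0.9),
    "c": (5.0, 5.5),
    "d": (5.2, 5.1),
    "e": (9.9, 0.3),
}
links = build_sparse_links(locations)
print(dense_entries(len(locations)), stored_entries(links))
```

With realistic consumer counts the gap between the two numbers grows quickly, which is why a sparse representation (or clustering of observations) is the natural direction suggested above.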
3.2 Network Classification Models

Network classification models use knowledge of the links between entities in a network to estimate a quantity of interest for those entities. Typically, in such a model an entity is influenced most by those directly connected to it, but is also affected to a lesser extent by those further away. Some network classification models use an entire network to make predictions about a particular entity on the network; Macskassy and Provost (2004) provided a brief survey. However, most methods have been applied to small data sets and have not been applied to consumer data. Much research in network classification has grown out of the pioneering work by Kleinberg (1999) on hubs and authorities on the Internet, and out of Google's PageRank algorithm (Brin and Page, 1998), which (to oversimplify) identifies the most influential members of a network by how many influential others "point" to them. Although neither study uses statistical models, both are related to well-understood notions of degree centrality and distance centrality from the field of social-network analysis.

One paper that models a consumer network for maximizing profit is by Richardson and Domingos (2002), in which a social network of customers is modeled as a Markov random field. The probability that a given customer will buy a given product is a function of the states of her neighbors, attributes of the product and whether or not the customer was marketed to. In this framework it is possible to assign a "network value" to every customer by estimating the overall benefit of marketing to that customer, including the impact that the marketing action will have on the rest of the network (e.g., through word of mouth). The authors tested their model on a database of movie reviews from an Internet site and found that their proposed methodology outperforms non-network methods for estimating customer value. Their network formulation uses implicit links (customers are linked when a customer reads a review by another customer and subsequently reviews the item herself) and implicit purchase information (they assume a review of an item implies a purchase and vice versa).

3.3 Surveys

Most research in this area does not have information on whether consumers actually talk to each other. To address this shortcoming, some studies use survey sampling to collect comprehensive data on consumers' word-of-mouth behavior. By sampling individuals and contacting them, researchers can collect data that are difficult (or impossible) to obtain directly by observing network-based marketing phenomena (Bowman and Narayandas, 2001). The strength of these studies lies in the data, including the richness and flexibility of the answers that can be collected from the responders. For instance, researchers can acquire data about how customers found out about a product and how many others they told about the product. An advantage is that researchers can design their sampling scheme to control for any known confounding factors and can devise fully balanced experimental designs that test their hypotheses. Since the purpose of models built from survey data is description, simple statistical methods like logistic regression or analysis of variance (ANOVA) typically are used.

Bowman and Narayandas (2001) surveyed more than 1700 purchasers of 60 different products who previously had contacted the manufacturer of that product. The purchasers were asked specific questions about their interaction with the manufacturer and its impact on subsequent word-of-mouth behavior. The authors were able to capture whether the customers told others of their experience and if so, how many people they told. The authors found that self-reported "loyal" customers were more likely to talk to others about the products when they were dissatisfied, but interestingly not more likely when they were satisfied. Although studies like this collect some direct data on consumers' word-of-mouth behavior, the researchers do not know which of the consumers' contacts later purchased the product. Therefore, they cannot address whether word-of-mouth actually affects individual sales.

3.4 Designed Experiments with Convenience Samples

Designed experiments enable researchers to study network-based marketing in a controlled setting. Although the subjects typically comprise a convenience sample (such as those undergraduates who answer an ad in the school newspaper), the design of the experiment can be completely randomized. This is unlike the studies that rely on secondary data sources or data from the Web. Typically ANOVA is used to draw conclusions.

Frenzen and Nakamoto (1993) studied the factors that influence individuals' decisions to disseminate information through a market via word-of-mouth. The subjects were presented with several scenarios that represented different products and marketing strategies, and were asked whether they would tell trusted and nontrusted acquaintances about the product/sale. They studied the effect of the cost/value manipulations on the consumers' willingness to share information actively with others, as a function of the strength of the social tie. In this study, the authors did not allow the subjects to construct their explicit consumer network; instead, they asked the participants to hypothesize about their networks. The experiments used data from a convenience sample to generalize over a complete consumer network. The authors also employed simulations in their study. They found that the stronger the moral hazard (the risk of problematic behavior) presented by the information, the stronger the ties must be to foster information propagation. Generally, the authors showed that network structure and information characteristics interact when individuals form their information transmission decisions.

3.5 Diffusion Models
Diffusion theory provides tools, both quantitative and qualitative, to assess the likely rate of diffusion of a technology or product. Qualitatively, researchers have identified numerous factors that facilitate or hinder technology adoption (Fichman, 2004), as well as social factors that influence product adoption (Rogers, 2003). Quantitative diffusion research involves empirical testing of predictions from diffusion models, often informed by economic theory.

The most notable and most influential diffusion model was proposed by Bass (1969). The Bass model of product diffusion predicts the number of users who will adopt an innovation at a given time t. It hypothesizes that the rate of adoption is a function solely of the current proportion of the population who have adopted. Specifically, let F(t) be the cumulative proportion of adopters in the population. The diffusion equation, in its simplest form, models F(t) as a function of p, the intrinsic adoption rate, and q, a measure of social contagion. When q > p, this equation describes an S-shaped curve, where adoption is slow at first, takes off exponentially and tails off at the end. This model can effectively model word-of-mouth product diffusion at the aggregate, societal level.

In general, the empirical studies that test and extend accepted theories of product diffusion rely on aggregate-level data for both the customer attributes and the overall adoption of the product (Ueda, 1990; Tout, Evans and Yakan, 2005); they typically do not incorporate individual adoption. Models of product diffusion assume that network-based marketing is effective. Since understanding when diffusion occurs and the extent to which it is effective is important for marketers, these methods may benefit from using individual-level data. Data on explicit networks would enable the extension of existing diffusion models, as well as the comparison of results using individual- versus aggregate-level data.

In his first study, Bass tested his model empirically against data for 11 consumer durables. The model yielded good predictions of the sales peak and the timing of the peak when applied to historical data. Bass used linear regression to estimate the parameters for future sales predictions, measuring the goodness of fit (R^2 value) of the model for 11 consumer durable products. The success of the forecasts suggests that the model may be useful in providing long-range forecasting for product sales or adoption. There has been considerable follow-up work on diffusion since this groundbreaking work. Mahajan, Muller and Kerin (1984) review this work. Recent work on product diffusion explores the extent to which the Internet (Fildes, 2003) as well as globalization (Kumar and Krishnan, 2002) play a role in product diffusion.

3.6 Collaborative Filtering and Recommender Systems

Recommender systems make personalized recommendations to individual consumers based on demographic content and link data (Adomavicius and Tuzhilin, 2005). Collaborative filtering methods focus on the links between consumers; however, the links are not direct. They associate consumers with each other based on shared purchases or similar ratings of shared products.

Collaborative filtering is related to explicit consumer network-based marketing because both target marketing tasks benefit from learning from data stored in multiple tables (Getoor, 2005). For example, Getoor and Sahami (1999), Huang, Chung and Chen (2004) and Newton and Greiner (2004) established the connection between the recommendation problem and statistical relational learning through the application of probabilistic relational models (PRMs) (Getoor, Friedman, Koller and Pfeffer, 2001). However, neither group used explicit links between customers for learning. Recommendation systems may well benefit from information about explicit consumer interaction as an additional, perhaps quite important, aspect of similarity.
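As a rough illustration of how collaborative filtering associates consumers through shared purchases rather than direct links, the sketch below scores unseen items by the purchase overlap between consumers. It is a minimal example under stated assumptions: Jaccard overlap is used as the similarity measure purely for illustration, and the function names and toy purchase data are hypothetical, not taken from any of the cited systems.

```python
def jaccard(a, b):
    """Overlap of two purchase sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, consumer, k=2):
    """Rank items the consumer has not bought by similarity-weighted votes.

    Consumers are linked only implicitly, through shared purchases;
    no direct communication data is used.
    """
    own = purchases[consumer]
    scores = {}
    for other, items in purchases.items():
        if other == consumer:
            continue
        sim = jaccard(own, items)
        if sim == 0.0:
            continue  # no shared purchases, so no implicit link
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Toy purchase histories (made up for this sketch).
purchases = {
    "ann": {"book", "phone"},
    "bob": {"book", "phone", "camera"},
    "cat": {"book", "tablet"},
    "dan": {"guitar"},
}
print(recommend(purchases, "ann"))
```

An explicit-link variant would simply add observed communication between two consumers as a further term in the similarity, which is the kind of extension the paragraph above suggests.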
3.7 Research Opportunities and Statistical Challenges

We see that there is a burgeoning body of work that addresses consumers' interactions and their effects on purchasing. To our knowledge the foregoing types represent the main statistical approaches taken in research on network-based marketing. In each approach, there are assumptions made in the data collection or in the analysis that restrict them from providing strong and direct support for the hypothesis that network-based marketing indeed can improve on traditional techniques. Surveys and convenience samples can suffer from small and possibly biased samples. Collaborative filtering models have large samples, but do not measure direct links between individuals. Models in network classification and econometrics historically have used proxies like geography instead of data on direct communications, and almost all studies have no accurate, specific data on which (and what) customers purchase.

To paint a complete picture of network influence for a particular product, the ideal data set would have the following properties: (1) large and unbiased sample, (2) comprehensive covariate information on subjects, (3) measurement of direct communication between subjects and (4) accurate information on subjects' purchases. The data set we present in the next section has all of these properties and we will demonstrate its value for statistical research into network influence. The question of how to analyze such data brings up many statistical issues:

Data-set size. Network-based marketing data sets often arise from Internet or telecommunications applications and can be quite large. When observations number in the millions (or hundreds of millions), the data become unwieldy for the typical data analyst and often cannot be handled in memory by standard statistical analysis software. Even if the data can be loaded, their size renders the interactive style of analysis common with tools like R or Splus painfully slow. In Internet or telecommunications studies, there often are two massive sources of data: all actors (web sites, communicators), along with their descriptive attributes, and the transactions among these actors. One solution is to compress the transaction information into attributes to be included in the actors' attribute set. It has been shown that file squashing (DuMouchel, Volinsky et al., 1999), which attempts to combine the best features of preprocessed data with random sampling, can be useful for customer attrition prediction. DuMouchel et al. claimed that squashing can be useful when dealing with up to billions of records. However, there may be a loss of important information which can be captured only by complex network structure.

More sophisticated network information derived from transactional data can also be incorporated into the matrix of customer information by deriving network attributes such as degree distribution and time spent on the network (which we demonstrate below). Similarly, other types of data such as geographical data or temporal data, which otherwise would need to be handled by some sophisticated methodology, can be folded into the analysis by creating new covariates. It remains an open question whether clever data engineering can extract all useful information to create a set of covariates for traditional analysis. For example, knowledge of communication with specific sets of individuals can be incorporated, and may provide substantial benefit (Perlich and Provost, 2006).

Once the data are combined, the remaining data set still may be quite large. While much data mining research is focused on scaling up the statistical toolbox to today's massive data sets, random sampling remains an effective way to reduce data to a manageable size while maintaining the relationships we are trying to discover, if we assume the network information is fully encoded in the derived variables. The amount of sampling necessary will depend on the computing environment and the complexity of the model, but most modern systems can handle data sets of tens or hundreds of thousands of observations. When sampling, care must be taken to stratify by any attributes that are of particular interest or to oversample those attributes that have extremely skewed distributions.

Low incidence of response. In applications where the response is a consumer's purchase or reaction to a marketing event, it is common to have a very low response rate, which can result in poor fit and reduced ability to detect significant effects for standard techniques like logistic regression. If there are not many independent attributes, one solution is Poisson regression, which is well suited for rare events. Poisson regression requires forming buckets of observations based on the independent attributes and modeling the aggregate response in these buckets as a Poisson random variable. This requires discretization of any continuous independent attributes, which may not be desirable. Also, if there are even a moderate number of independent attributes, the buckets will be too sparse to allow Poisson modeling. Other solutions that have been proposed include oversampling positive responses and/or undersampling negative responses. Weiss (2004) gave an overview of the literature on these and related techniques, showing that there is mixed evidence as to their effectiveness. Other studies of note include the following. Weiss and Provost (2003) showed that, given a fixed sample size, the optimal class proportion in training data varies by domain and by ultimate objective (but can be determined); generally speaking, to produce probability estimates or rankings, a 50:50 distribution is a good default. However, Weiss and Provost's results are only for tree induction. Japkowicz and Stephen (2002) experimented with neural networks and support-vector machines, in addition to tree induction, showing (among other things) that support-vector machines are insensitive to class imbalance. However, they considered primarily noise-free data. Other techniques to deal with unbalanced response attributes include ensemble (Chan and Stolfo, 1998; Mease, Wyner and Buja, 2006) and multiphase rule induction (Clearwater and Stern, 1991; Joshi, Kumar and Agarwal, 2001). This is an area in need of more systematic empirical and theoretical study.

Separating word-of-mouth from homophily. Unless there is information about the content of communications, one cannot conclude that there was word-of-mouth transmission of information about the product. Social theory tells us that people who communicate with each other are more likely to be similar to each other, a concept called homophily (Blau, 1977; McPherson, Smith-Lovin and Cook, 2001). Homophily is exhibited for a wide variety of relationships and dimensions of similarity. Therefore, linked consumers probably are like-minded, and like-minded consumers tend to buy the same products. One way to address this issue in the analysis is to account for consumer similarity using propensity scores (Rosenbaum and Rubin, 1984). Propensity scores were developed in the context of nonrandomized clinical trials and attempt to adjust for the fact that the statistical profile of patients who received treatment may be different from the profile of those who did not, and that these differences could mask or enhance the apparent effect of the treatment. Let T represent the treatment, X represent the independent attributes excluding the treatment and Y represent the response. Then the propensity score PS(x) = P(T = 1 | X = x). By matching propensity scores in the treatment and control groups using typical indicators of homophily like demographic data, we can account (partially) for the possible confoundedness of other independent attributes.

Incorporating extended network structure. Data with network structure lend themselves to a robust set of network-centric analyses. One simple method (employed in our analysis) is to create attributes from the network data and plug them into a traditional analysis. Another approach is to let each actor be influenced by her neighborhood modeled as a Markov random field. Domingos and Richardson (2001) used this technique to assign every node a "network value." A node with high network value (1) has a high probability of purchase, (2) is likely to give the product a high rating, (3) is influential on its neighbors' ratings and (4) has neighbors like itself. Hoff, Raftery and Handcock (2002) defined a Markov-chain Monte Carlo method to estimate latent positions of the actors for small social-network data sets. This embeds the actors in an unobserved "social space," which could be more useful than the actual transactions themselves for predicting sales. The field of statistical relational learning (Getoor, 2005) has recently produced a wide variety of methods that could be applicable. Often these models allow influence to propagate through the network.
data. Missing
transactions
data in network
Missing
are common?often
is observ
only part of a network
able. For instance, firms typically have transactional
data on their customers
only or may have one class
of communication
(e-mail) but not another

(cellular
phone). One attempt to account for these missing
edges
is to use network
structure to assign a probability
of
a missing
edge
an edge
everywhere
creates
this probability
Thresholding
can be added
which
lesser weight
related
closely
to the network,
(Agarwal and Pregibon,
framework
pseudo-edges,
perhaps with
This
is
2004).
to the link prediction
problem,
where
the next links will be
tries to predict
Nowell
and Kleinberg,
PRM
is not present.
models
2003). One extension

link structure through
which
(Liben
of the
the use
of reference uncertainty
and existence uncertainty. The
a
extension
includes
unified generative model
for both
content and relational structure, where interactions be
tween the attributes
and link structure are modeled
(Getoor,
Friedman,
Koller
and Taskar,
2003).
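
The propensity-score idea introduced earlier in this section, PS(x) = P(T = 1 | X = x)
followed by matching treated and control cases on the score, can be illustrated with a
small sketch. Everything in it is an assumption made for illustration: the single synthetic
attribute, the plain gradient-ascent logistic fit, the greedy one-to-one matching rule and
all function names. It is not the procedure of any of the cited papers.

```python
import math


def propensity(weights, x):
    """Estimated P(T = 1 | X = x) under a logistic model (bias is weights[0])."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, treated, lr=0.5, epochs=300):
    """Fit the logistic model by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, t in zip(rows, treated):
            err = t - propensity(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def match_one_to_one(treated_scores, control_scores):
    """Greedy nearest-score matching without replacement.

    Returns (treated_position, control_position) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break
        j = min(available, key=lambda k: (abs(control_scores[k] - score), k))
        available.remove(j)
        pairs.append((i, j))
    return pairs


# Synthetic, hypothetical data: one attribute in [0, 1); "treatment" (for example,
# being a network neighbor) occurs only among cases with larger attribute values.
rows = [[i / 100.0] for i in range(100)]
treated = [1 if (i >= 40 and i % 4 == 0) else 0 for i in range(100)]

weights = fit_logistic(rows, treated)
scores = [propensity(weights, x) for x in rows]
treated_idx = [i for i, t in enumerate(treated) if t == 1]
control_idx = [i for i, t in enumerate(treated) if t == 0]
treated_scores = [scores[i] for i in treated_idx]
control_scores = [scores[i] for i in control_idx]

pairs = match_one_to_one(treated_scores, control_scores)
```

Greedy matching is only one of several possible matching rules; it is used here because it
is short, and the result depends on the order in which treated cases are processed.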
4. DATA SET AND PRIMARY HYPOTHESIS

This section details our data set, derived primarily from a direct-mail marketing campaign
to potential customers of a new communications service (later we augment the primary data
with a large set of consumer-specific attributes). The firm's marketing team identified
and marketed to a list of prospects using its standard methods. We investigate whether
network-related effects or evidence of "viral" information spread are present in this
group. As we will describe, the firm also marketed to a group we identified using the
network data, which allows us to test our hypotheses further. We are not permitted to
disclose certain details, including specifics about the service being offered and the
exact size of the data set.

4.1 Initial Data Details

In late 2004, a telecommunications firm undertook a large direct-mail marketing campaign
to potential customers of a new communications service. This service involved new
technology and, because of this, it was believed that marketing would be most successful
to those consumers who were thought to be "high tech."

In keeping with standard practice, the marketing team collected attributes on a large set
of prospects: consumers whom they believed to be potential adopters of the service. The
marketing team used demographic data, customer relationship data, and various other data
sources to create profitability and behavioral models to identify prospective targets,
that is, consumers who would receive a targeted mailing. The data the marketing team
provided us with did not contain the underlying customer attributes (e.g., demographics),
but instead included values for derived attributes that defined 21 marketing segments
(Table 1) that were used for campaign management and post hoc analyses. The sample
included millions of consumers. The team believed that the different segments would have
varying response rates and it was important to separate the segments in this way to learn
the most from the campaign.

An important derived variable was loyalty, a three-level score based on previous
relationships with the firm, including previous orders of this and other services.
Roughly, loyalty level 3 comprises customers with moderate-to-long tenure and/or those who
have subscribed to a number of services in the past. Loyalty level 2 comprises those
customers with which the firm has had some limited prior experiences. Loyalty level 1
comprises consumers who did not have service with the firm at the time of mailing; little
(if any) information is available on them. Previous analyses have shown that loyalty and
tenure attributes have substantial impact on response to campaigns.

Other important attributes were based on demographics and other customer characteristics.
The attribute Intl is an indicator of whether the prospect had previously ordered any
international services; Tech1 (hi, med, low) and Tech2 (1-10, where 1 = high tech) are
scores derived from demographics and other attributes that estimate the interest and
ability of the customer to use a high-tech service; Early Adopt is a proprietary score
that estimates the likelihood of the customer to use a new product, based on previous
behavior. We also show the Offer, indicating that different segments received different
marketing messages: P1-P3 indicate different postcards that were sent, L1 and L2 indicate
different letters, and a "+" indicates that a "call blast" accompanied the mailing.

In defining the segments, those groups with high loyalty values were permitted lower
values from the technology and early adoption models. Segments 15 and 16 were provided by
an external vendor; there were insufficient data on these prospects to fit our Tech and
Early Adopt models, as indicated by a "?" in Table 1.

TABLE 1
Descriptive statistics for the marketing segments (see Section 4.1 for details)
[Columns: Segment, Loyalty, Intl, Tech1, Tech2, Early Adopt, Offer, % of list, %NN. The
per-segment cell values are scrambled in the extracted text and cannot be assigned to rows.]

4.2 Primary Hypothesis and Network Neighbors

The research goal we consider here
is whether relaxing the assumption of independence between consumers can improve
demonstrably the estimation of response likelihood. Thus, our first hypothesis is that
someone who has direct communication with a current subscriber is more likely herself to
adopt the service. It should be noted that the firm knows only of communications initiated
by one of its customers through a service of the firm, so the network data are incomplete
(considerably), especially for the lower loyalty groups.

Data on communications events include anonymous identifiers for the transactors, a time
stamp and the transaction duration. For the purposes of this research, all data are
rendered anonymous so that individual identities are protected.

In pursuit of our hypothesis, we constructed an attribute called network neighbor (or NN):
a flag that indicates whether the targeted consumer had communicated with a current user
of the service in a time period prior to the marketing campaign. Overall, 0.3% of the
targets are network neighbors. In Table 1, the percentage of network neighbors (%NN) is
broken down by segment.
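
The construction of the network-neighbor flag can be sketched as follows. This is a minimal
illustration, not the firm's actual pipeline: the record layout (two anonymous identifiers,
a date and a duration), the function name and the sample data are all hypothetical.

```python
from datetime import date


def network_neighbors(comm_records, subscribers, targets, campaign_date):
    """Return the targets who communicated with a current subscriber before the campaign.

    comm_records: iterable of (id_a, id_b, day, duration_seconds) tuples (assumed layout).
    """
    flagged = set()
    for id_a, id_b, day, _duration in comm_records:
        if day >= campaign_date:
            continue  # only communications prior to the campaign count
        if id_a in subscribers and id_b in targets:
            flagged.add(id_b)
        if id_b in subscribers and id_a in targets:
            flagged.add(id_a)
    return flagged


# Hypothetical anonymized data.
subscribers = {"s1", "s2"}
targets = {"t1", "t2", "t3"}
records = [
    ("s1", "t1", date(2004, 9, 1), 120),   # subscriber contacted target t1 before campaign
    ("t2", "t3", date(2004, 9, 5), 30),    # no subscriber involved
    ("s2", "t3", date(2004, 12, 20), 45),  # after the campaign date, ignored
]
nn = network_neighbors(records, subscribers, targets, date(2004, 11, 1))
nn_share = len(nn) / len(targets)
```

Because the firm observes only communications initiated by its own customers, a flag built
this way marks a lower bound on the true set of neighbors.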
In addition, the marketing team invited us to create our own segment, which they also
would target. Our "segment 22" consisted of network neighbors that were not already on the
current list of targets. To make sure our list contained viable prospects, the marketing
team calculated the derived technology and early adopter scores for the consumers on our
list. They filtered our list based on these scores, but they relaxed the thresholds used
to limit their original list. For instance, someone with loyalty = 1 needed a Tech2 score
less than 4 to merit inclusion on the initial list; this threshold was relaxed for our
list to Tech2 less than 7. In this way, the team allowed prospects who missed inclusion on
the first cut to make it into segment 22 if they were network neighbors. However, the
marketing team still avoided targeting customers who they believed had very small
probabilities of a purchase. For those network neighbors who did not score high enough to
warrant inclusion in segment 22, we still tracked their purchase records to see if any of
them subscribed to the service in the absence of the marketing campaign; see below.
Overall, the profile of the candidates in our segment 22 was considered to be subpar in
terms of demographics, affinity and technological capability. Notably, for our final
conclusions, these targets are potential customers the firm would have otherwise ignored.
The size of segment 22 was about 1.2% of the marketing list.

To summarize, the above process divides the prospect universe along two dimensions:
(1) targets, those consumers identified by the marketing models as being worthy of
solicitation, and (2) network neighbors, those who had direct communication with a
subscriber. Table 2 shows the relative size for each combination (using the
non-network-neighbor targets as the reference set). Note the non-NN nontargets, who
neither are network neighbors nor are they deemed to be good prospects. This group is the
majority of the prospect space and includes consumers that the firm has very little
information about, because they are low-usage communicators or do not subscribe to any
services with the firm.
4.3 Modeling with Consumer-Specific Data

To determine whether relaxing the independence assumption (using the network data)
improves modeling, we fit models using a wide range of demographic and consumer-specific
independent attributes (many of which are known or believed to affect the estimated
likelihood of purchase). Overall, we collected values for over 150 attributes to assess
their effect on sales likelihood and their interactions with the network-neighbor
variable. These values included the following:

Loyalty data: We obtained finer-grained loyalty information than the simple categorization
described above, including past spending, types of service, how often the customer
responded to prior mailings, a loyalty score generated by a proprietary model and
information about length of tenure.

Geographic data: Geographic data were necessary for the direct mail campaign. These data
include city, state, zip code, area code and metropolitan city code.

Demographic data: These include information such as gender, education level, credit score,
head of household, number of children in the household, age of members in the household,
occupation and home ownership. Some of this information was inferred at the census tract
level from the geographic data.

Network attributes: As mentioned earlier, we observed communications of current
subscribers with other consumers. In addition to the simple network neighbor flag
described earlier, we derived more sophisticated attributes from prospects' communication
patterns. We will return to these in Section 5.6.

TABLE 2
Data categories

NN = Y, Target = Y: NN targets (Segments 1-22). Relative size = 0.015. Prospects
identified by marketing models and who also are network neighbors. Those in segment 22
have reduced thresholds on the marketing model scores.

NN = Y, Target = N: NN nontargets. Relative size = 0.10. Consumers who were network
neighbors, but were not marketed to because they scored poorly on marketing models.

NN = N, Target = Y: Non-NN targets (Segments 1-21). Relative size = 1. Prospects
identified by marketing models who were not network neighbors.

NN = N, Target = N: Non-NN nontargets. Relative size > [value missing in the extracted
text]. Consumers who were not considered to be good prospects by the marketing model and
also were not network neighbors.

Notes. The data for our study are broken down into targets and network neighbors. The
"relative size" value shows the number of prospects in each group relative to the non-NN
target group.

4.4 Data Limitations

We encountered missing values for customers across all loyalty levels. The amount of
missing information is directly related to the level of experience we have had with the
customer just prior to the direct mailing. For example, geography data are available for
all targets across all three loyalty levels. On the other hand, as the number of services
and tenure with the firm decline, so does the amount of information (e.g., transactions)
available for each target. Given the difference in information as loyalty varies, we
grouped customers by loyalty level and treated the levels separately in our analyses. This
stratification leaves three groups that are mostly internally consistent with respect to
missing values.

The overall response rate is very low. As discussed above, this presents challenges
inherent with a heavily skewed response variable. For example, an analysis that stratifies
over many different attributes may have several strata with no sales at all, rendering
these strata mostly useless. The data set is large, which helps to ameliorate this
problem, but in turn presents logistical problems with many sophisticated statistical
analyses. In this paper, we restrict ourselves to relatively straightforward analyses.

4.5 Loyalty Distribution

A look at the distribution of the loyalty groups across the four categories of prospects
(Figure 1) shows that the firm targeted customers in the higher loyalty groups relatively
heavily. The network-neighbor target group appears to skew toward the less loyal
prospects; this is due to the fact that segment 22, which makes up a large part of the
network-neighbor population, comprises predominantly low-loyalty consumers.

5. ANALYSIS

Next we will show direct, statistical evidence that consumers who have communicated with
prior customers are more likely to become customers. We show this in several ways,
including using our own best efforts to build competing targeting models and conducting
thorough assessments of predictive ability on out-of-sample data. Then we consider more
sophisticated network attributes and show that targeting can be improved further.

FIG. 1. Loyalty distribution by customer category. The three bars show the relative sizes
of the three loyalty groups for our four data categories. The network neighbors (NN) show
a much larger proportion of low-loyalty consumers than the non-NN group.

5.1 Network-Based Marketing Improves Response

Segmentation provides an ideal setting to test the significance and magnitude of any
improvement in modeling by including network-neighbor information, while stratifying by
many attributes known to be important, such as loyalty and tenure. The response variable
is the take rate for the targets in the two months following the direct mailing. The take
rate is the proportion of the targeted consumers who adopted the service within a
specified period following the offer. For each segment, we performed a simple logistic
regression for the independent network-neighbor attribute versus the dependent sales
response. In Figure 2, we graphically present parameter estimates (equivalent to log-odds
ratios) for the network attribute along with 95% confidence intervals for 20 of the 21
segments (segment 5 had only a small number of network-neighbor prospects and zero
network-neighbor sales, and therefore had an infinite log odds).

Figure 2 shows that in all 20 segments the network-neighbor effect is positive (the
parameter estimate is greater than zero), demonstrating an increased take rate for the
network-neighbor group within each segment. For 17 of these segments, the log-odds ratio
is significantly different from the null hypothesis value of 0 (p < 0.05), indicating that
being a network neighbor significantly affected sales in those segments.

FIG. 2. Results of logistic regression. Parameter estimates plotted as log-odds ratios
with 95% confidence intervals. The number plotted at the value of the parameter estimate
refers back to segment numbers from Table 1 (ordered by log odds).

While odds ratios allow for tests of significance of an independent variable, they are not
as directly interpretable as comparisons of take rates of the network-neighbor and
non-network-neighbor groups in a given segment. The take rates for the network neighbors
are plotted versus the non-network neighbors in Figure 3, where the size of the point is
proportional to the log size of the segment. All segments have higher take rates in the
network-neighbor subgroup, except for the one segment that had no network-neighbor sales
(the smallest sample size). Over the entire data set, the network-neighbors' take rates
were greater by a factor of 3.4. This value is plotted in Figure 3 as a dotted line with
slope = 3.4. The right-hand plot of Figure 3 shows the relationship between each segment's
take rate and its lift ratio, defined as the take rate for NN divided by the take rate for
non-NN. The plot shows that the benefit of being a network neighbor is greater for those
segments with lower overall take rates.

FIG. 3. Take rates for marketing segments. Left: For each segment, comparison of the take
rate of the non-network neighbors with that of the network neighbors. The size of the
glyph is proportional to the log size of the segment. There is one outlier not plotted,
with a take rate of 11% for the network neighbors and 0.3% for the non-network neighbors.
Reference lines are plotted at x = y and at the overall take-rate ratio of 3.4. Right:
Plot of the take rate for the non-network group versus lift ratio for the network
neighbors.

As Figure 3 shows, some of the segments had much higher take rates than others. To assess
statistical significance of the network-neighbor effect after accounting for this segment
effect, we ran a logistic regression across all segments, including the main effects for
the network-neighbor attribute, dummy attributes for each segment and the interaction
terms between the two. Two of the interaction terms had to be deleted: one from segment
22, which only had network-neighbor cases, and one from the segment with no sales from the
network neighbors. We ran a full logistic regression and used stepwise variable selection.
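
For a logistic regression with a single binary predictor, the fitted slope equals the
sample log-odds ratio of the 2x2 table of counts, and the usual Wald standard error is the
square root of the sum of the reciprocal cell counts. The sketch below uses that fact to
compute, for one segment, the log-odds ratio with a 95% interval, the two take rates and
the lift ratio as defined above. The counts and the function name are hypothetical; they
are not figures from the study.

```python
import math


def segment_summary(nn_sales, nn_total, non_sales, non_total, z=1.96):
    """Log-odds ratio (with Wald interval), take rates and lift ratio for one segment."""
    a, b = nn_sales, nn_total - nn_sales        # NN: buyers, non-buyers
    c, d = non_sales, non_total - non_sales     # non-NN: buyers, non-buyers
    if min(a, b, c, d) == 0:
        # Mirrors the segment with zero NN sales: the log odds is not finite.
        raise ValueError("log-odds ratio is undefined when a cell count is zero")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    nn_rate = nn_sales / nn_total
    non_rate = non_sales / non_total
    return {
        "log_odds_ratio": log_or,
        "ci": (log_or - z * se, log_or + z * se),
        "odds_ratio": math.exp(log_or),
        "nn_take_rate": nn_rate,
        "non_nn_take_rate": non_rate,
        "lift_ratio": nn_rate / non_rate,
    }


# Hypothetical segment: 12 sales among 1,000 NN prospects, 350 among 100,000 non-NN.
summary = segment_summary(12, 1000, 350, 100000)
```

An interval whose lower end stays above zero corresponds to a significant positive
network-neighbor effect at the 0.05 level in that segment.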
The results of the logistic regression reiterate the significance of being a network
neighbor. The final model can be found in Table 3. The coefficient of 2.0 for the
network-neighbor attribute in the final model is an estimate of the log odds, which we
exponentiate to get an odds ratio of 7.49, with a 95% confidence interval of (5.64, 9.94).
More than half of the segment effects and most of the interactions between the
network-neighbor attribute and those segment effects are significant. The interpretation
of these interactions is important. Note that the magnitudes of the interaction
coefficients are negative and very close in magnitude to the coefficients of the main
effects of the segments themselves. Therefore, although the segments themselves are
significant, in the presence of the network attribute the segments' effect is mostly
negated by the interaction effect. Since the segments represent known important attributes
like loyalty, tenure and demographics, this is evidence that being a network neighbor is
at least as important in this context.

TABLE 3
Coefficients and confidence intervals for the final segment model

Attribute                 Coeff (c.i.)
Network neighbor (NN)     2.0 (1.7, 2.3)

[The remaining rows are main effects for nine segments (including Segment = 1, 5, 8, 17
and 19) and the interactions NN x Segment = 1, 2, 4, 6, 7, 8, 17, 19. Their coefficients
survive in the extracted text, but the row assignment and the significance markers do not.
Main-effect values: 1.7 (0.9, 2.5), 1.8 (1.2, 2.4), 2.1 (1.3, 3.0), 1.9 (0.4, 3.3),
1.9 (1.2, 2.5), 1.4 (1.0, 1.9), 1.3 (0.9, 1.7), 1.5 (0.7, 2.2), 2.2 (1.6, 2.9).
Interaction values: -1.1 (-2.1, 0.0), -0.9 (-1.7, -0.2), -1.8 (-4.0, 0.4),
-1.5 (-2.6, -0.6), -1.2 (-1.7, -0.6), -0.8 (-1.3, -0.4), -1.6 (-2.8, -0.5),
-1.1 (-1.9, -0.3).]

Significance of the attributes in the logistic regression model is shown at the 0.05 (*)
and 0.01 (**) levels.

In Table 4 we present an analysis of deviance table, an analog to analysis of variance
used for nested logistic regressions (McCullagh and Nelder, 1983). The table confirms the
significance of the main effects and of the interactions. Each level of the nested model
is significant when a chi-squared approximation is used for the differences of the
deviances. The fact that so many of the interactions are significant demonstrates that the
network-neighbor effect varies for different segments of the prospect population.
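
The chi-squared comparison of nested logistic models described above can be sketched in a
few lines. The deviances and the number of added parameters below are hypothetical and
chosen only for illustration, and the closed-form tail probability used here holds only for
an even number of degrees of freedom; a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def deviance_test(dev_smaller_model, dev_larger_model, extra_params):
    """Change in deviance between nested models and its approximate p-value."""
    change = dev_smaller_model - dev_larger_model
    return change, chi2_sf_even_df(change, extra_params)


# Hypothetical nested models: adding 4 terms lowers the deviance from 1000.0 to 980.0.
change, p_value = deviance_test(1000.0, 980.0, 4)
significant_at_001 = p_value < 0.01
```

A small p-value indicates that the added group of terms improves the fit by more than
chance alone would explain, which is the sense in which each level of a nested model is
called significant.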

TABLE 4
Analysis of deviance table for the network-neighbor study

Variable                       Deviance
Intercept                      11200
Segment                        10869
Segment + NN                   10733
Segment + NN + interactions    10687

[The DF, Change in deviance and Significance columns are scrambled in the extracted text;
the surviving entries read "9 63", "370" and "8 41" and cannot be assigned to rows.]

Significance of the group of attributes at each level is shown at the 0.05 (*) and
0.01 (**) levels.

5.2 Segment 22

The segment data enable us to compare take rates of network and non-network targets for
the segments that contained both types of targets. However, many of the network-neighbor
targets fall into the network-only segment 22. Segment 22 comprises prospects that the
original marketing models deemed not to be good candidates for targeting. As we can see
from the distribution in Figure 1, this segment for the most part contains consumers who
had no prior relationship with the firm. We compare the take rates for segment 22 with the
take rates for the combined group, including all of segments 1-21, in the leftmost three
bars of Figure 4.

The network-neighbor segment 22 is (not surprisingly) not as successful as the NN groups
in segments 1-21, since the targets in segments 1-21 were selected based on
characteristics that made them favorable for marketing. Interestingly, we see that the
segment 22 network neighbors outperform the non-NN targets from segments 1-21. These
segment 22 network neighbors, identified primarily on the basis of their network activity,
were more likely by almost 3 to 1 to purchase than the more "favorable" prospects who were
not network neighbors. Since those in segment 22 either were not identified by marketing
analysts or were deemed to be unworthy prospects, they represent customers who would have
"fallen through the cracks" in the traditional marketing process.

5.3 Improving a Multivariate Targeting Model

Now we will assess whether the NN attribute can improve a multivariate targeting model by
incorporating all that we know or can find out (over 150 different attributes) about the
targets, including geography, demographics and other company-specific attributes, from
internal and external sources (see Section 3.2).

As discussed in Section 3.7, we tried to address (as well as possible) an important causal
question that arises: Is this network-neighbor effect due to word of mouth or simply due
to homophily? The observed effect may not be indicating viral propagation, but instead may
simply demonstrate a very effective way to find like-minded people. This theoretical
distinction may not matter much to the firm for this particular type of marketing process,
but is important to make, for example, before designing future campaigns that try to take
advantage of word-of-mouth behavior.

Although we cannot control for unobserved similarities, we can be as careful as possible
in our analysis to ensure that the statistical profile of the NN prospects is the same as
the profile for the non-NN cases. Since our data set contains many more non-NN cases than
NN cases, we match each NN case with a single non-NN
case that is as close as possible to it by calculating propensity scores using all of the
explanatory attributes considered (as described in Section 3.7). At the end of this
matching process, the NN group is as close as is reasonably possible in statistical
properties to the non-NN group.

FIG. 4. Take rates for marketing segments. Take rates for network neighbors and
non-network neighbors in segments 1-21 compared with the all-network-neighbor segment 22
and with nontarget network neighbors. All take rates are relative to the
non-network-neighbor group (segments 1-21). [Bar labels: Network Neighbors, Segs 1-21:
1.35%; Network Neighbors, Seg 22: 0.83%; Non-Network Neighbors, Segs 1-21: 0.28%; Network
Neighbors, Non-Targets: 0.11%.]

Due to heterogeneity of data sources across the three loyalty groups, we used the
propensity scores to create a matched data set for each group. For each (individually), we
fitted a full logistic regression including interactions and selected a final model using
stepwise variable selection. All attributes were checked for outliers, transformations and
collinearity with other attributes, and we removed or combined the attributes that
accounted for any significant correlations.

TABLE 5
Results of multivariate model

                            Loyalty 1            Loyalty 2            Loyalty 3
Beta hat for NN (95% CI)    0.99 (0.49, 1.49)    0.84 (0.52, 1.16)    0.68 (0.46, 0.91)
Take rate                   0.3%                 0.4%                 0.9%

[The "Significant attributes" rows list NN at every loyalty level, together with
attributes describing prior relationships with the firm (for example tenure with firm,
discount calling plan, referral plan, loyalty program membership, previous response to
mailings and international calling) and some demographic attributes. The assignment of
individual attributes to loyalty levels is not recoverable from the extracted text.]

Notes. Significant attributes from logistic regressions across loyalty levels (p < 0.05).
Bold indicates significance at 0.01 level; (-) indicates the effect of the variable was
negative; (I) indicates a significant interaction with the NN variable.

Table 5 shows the results of the logistic regressions, which show the attributes that were
found to be significant, those that were negatively correlated with take rate, and those
that had interactions with the NN attribute. Each of the three models found the network
neighbor attribute to be significant along with several others. The significant attributes
tended to be attributes regarding the prospects' previous relationships with the firm,
such as previous international services, tenure with firm, churn identifiers and revenue
spent with the firm. These attributes are typically correlated with demographic
attributes, which explains the lack of significance of many of the demographic attributes
considered. Interestingly, tenure with firm is significant in loyalty groups 1 and 2, but
with different signs. In the most loyal group, tenure is negatively correlated, but in the
mid-level loyalty group it is positive. This unexpected result may be due to differing
compositions of the two groups; those consumers with long tenure in the most loyal group
might be people who just never change services, while long tenure in the other group might
be an indicator that they are gaining more trust in the company. In loyalty group 1, there
is limited information about previous services with the firm. For those customers, knowing
whether the customer has responded to any previous marketing campaigns has a significant
effect.

Table 5 also shows parameter estimates for NN and the take rates in the three loyalty
groups. The take rates are highest in the group with the most loyalty but, interestingly,
this group gets the least lift (smallest parameter estimate) from the NN attribute. So the
impact of network-neighbor is stronger for those market segments with lower loyalty, where
actual take rates are weakest.
5.4 Consumers Not Targeted

As discussed above, only a select subset of our network-neighbor list was subject to
marketing, based on relaxed thresholds on eligibility criteria. The remainder of the list,
the nontarget network neighbors, made up the majority. Potential customers were omitted
for various reasons: they were not believed to have high-tech capacity; they were on a
do-not-contact list; address information was unreliable, and so on. Nonetheless, we were
able to identify whether they purchased the product in the follow-up time period. The take
rate for this group was 0.11%, and is shown relative to the target groups as the rightmost
bar in Figure 4. Although they were not even marketed to, their take rate is almost half
that for the non-NN targets, consumers chosen as some of the best prospects by the
marketing team. This group comprises consumers without any known favorable characteristics
that would have put them on the list of prospects. The fact that they are network
neighbors alone supports a relatively high take rate, even in the absence of direct
marketing. This lends some support to an explanation of word-of-mouth propagation rather
than homophily.

Finally, we will briefly discuss the remainder of the consumer space: the non-NN nontarget
group. Unfortunately, it is very difficult to estimate a take rate in this category, which
could be considered a baseline rate for all of the other take rates. To do this, we would
need to estimate the size of the space of all prospects. This includes all of the
prospects the firm knows about, as well as customers of the firm's competitors and
consumers who might purchase this product that do not have current telecommunications
service with any provider. It has been established that the size of the communications
market is difficult to estimate (Poole, 2004); our best estimates of this baseline take
rate put it at well below 0.01%, at least an order of magnitude less than even the
nontarget network neighbors.

On the other hand, a by-product of our study is that we can upper-bound the effect of the
mass marketing campaigns in general by comparing the target-NN group and the nontarget-NN
group. The difference in take rates between the targeted network neighbors and the
nontargeted network neighbors is about 10 to 1. This difference cannot all be attributed
to the marketing effect, since the targeted group was specifically chosen to be better
prospects and it is likely that more of them would have signed up for the service even in
the total absence of marketing. However, it does seem reasonable to call this factor of 10
an upper bound on the effect of the marketing.
5.5 Out-of-Sample Ranking Performance

These results suggest that we can give fine-grained estimations as to which

are more
or less
as well as network-neighbor status. Note that in different business scenarios, different
types and amounts of data are available. For example, for low-loyalty customers, very few
descriptive attributes are known. We report results here using all attributes; the
findings are qualitatively similar for every different subset of attributes we have tried
(namely, segment, loyalty, geography, demographic). The response variable is the same as
above and we
els. We measure
nary response
can be
to respond to an offer. Such estimations
the consumer
pool is immense and a
quite valuable:
a
limited
will
have
be
budget. Therefore,
campaign
ing able to pick a better list of "top-/:" prospects will
to increased profit (assuming
lead directly
targeting
prises
the ability
to rank customers
consumer,
we
all of the traditional
ing loyalty,
demographic
statistic,
for each predicted possible probability

score cutoff re
from
the
model.
sulting
logistic regression
Specifically,
the AUC is the probability
that a randomly chosen (as
yet unseen) taker will be ranked higher than a randomly
chosen nontaker; AUC = 1.0 means
the classes are per
= 0.5 means
and
AUC
the list is ran
fectly separated
domly shuffled. All reported AUC values are averages
obtained
using 10-fold cross-validation.

6 shows the AUC
values for the three
Table
alty groups,
the expected
regression models.
quantifying
benefit
from
loy
the
There
is an in
improved
logistic
crease in AUC for each group, with the largest increase
to loyalty level 1, for which
the least infor
belonging
mation
is available;
note that here the ranking
the network
is not much
information
without
than
ability
better
random.
this improvement,
Figure 5(a) shows cu
curves
mulative
when using the model
response ("lift")
on loyalty group 3. The lower curve
the per
depicts
formance of the model
all
traditional
attributes,
using
To visualize
and the upper curve includes the traditional marketing

attributes and the network-neighbor
attribute. In Fig
ure 5(b), one can see the marked
that
improvement
Table
ROC
analysis:
create
attributes
accurately.
a
record
that
(trad atts),
and geographic
as
that result from

regression
10.54
the application
of
trad
+ NN
models
atts
atts
0.60
20.64
0.67
30.60
0.64
were built using all available

models
logistic
regression
with
and without
(trad atts + NN)
(trad atts) the network
see an increase
across
attribute. We
in AUC
all loyalty
neighbor
when
NN
is
the
attribute
included
the
in
model.
groups
The
attributes
includ
attributes,
values
trad
Loyalty
Note.
com
AUC
logistic
higher for higher ranked prospects).

In this section, we show that combining
the network
attribute with
the traditional
attributes
im
neighbor
each
logistic regression mod

the predictive
ranking ability in the bi
variable by an increase in theWilcoxon
to the area under

equivalent
curve
the
(AUC). The ROC
represents
trade off between false negative and false positive rates
Mann-Whitney
the ROC curve
costs are not much
For
the same
customers
likely
proves
used
271
2.5
- trad atts
-trad atts + NNi
1.5
E
3
O
0.5
% of Consumers
Targeted
0.8
0.6
0.4
0.2
Cumulative
(Ranked by Predicted
0.4
0.2
Cumulative
Sales)
% of Consumers
0.6
Targeted
(a)
0.8
Sales)
(b)
curves
built with all attributes

with
(trad atts) and without
(trad atts + AW)
it. For example,
the model
without
outperforms
if the firm sent out 50% of the
to receiving
with
without
it. (b) Top-k
the NN compared
responses
get 70% of the positive
only 63% of the responses
they would
mailing,
scores from
are
the
NN
attribute
the
model.
The
model
that
includes
the
Consumers
ranked
logistic
regression
by
probability
analysis.
the NN attribute
and
For example, for the top 20% of targets,
the take rate is 1.51% without
1.72% with the
the model
without.
outperforms
FIG.
5.
curves.
(a) Lift
NN
Power
of the segmentation
with the NN
The model
attribute.
network-neighbor
for models
attribute
attribute.
from sending to the top-/: prospects

for the top 20% of the list,
be obtained
would
on the list. For example,

without
the NN attribute,
the take rate is 1.51%; with

it is 1.72%. The NN attribute does not
the NN
attribute,
improve the ranking
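The ranking measures used in Section 5.5 can be illustrated with a small Python sketch. It computes the AUC directly from its Wilcoxon-Mann-Whitney interpretation (the chance that a random taker is scored above a random nontaker), the take rate among the top-k fraction of a ranked list, and the cumulative share of takers captured (the quantity plotted in a lift curve). The function names and the toy scores are hypothetical; the study's own models, data and cross-validation procedure are not reproduced here.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random taker outscores a random nontaker (ties count 1/2)."""
    takers = [s for s, y in zip(scores, labels) if y == 1]
    nontakers = [s for s, y in zip(scores, labels) if y == 0]
    if not takers or not nontakers:
        raise ValueError("need at least one taker and one nontaker")
    wins = 0.0
    for t in takers:
        for n in nontakers:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(takers) * len(nontakers))


def _top(scores: Sequence[float], labels: Sequence[int], fraction: float):
    """Labels of the top `fraction` of prospects, ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    return [y for _, y in ranked[:k]]


def top_k_take_rate(scores, labels, fraction) -> float:
    """Take rate among the top `fraction` of the ranked list."""
    top = _top(scores, labels, fraction)
    return sum(top) / len(top)


def cumulative_response(scores, labels, fraction) -> float:
    """Share of all takers that fall in the top `fraction` of the ranked list."""
    return sum(_top(scores, labels, fraction)) / sum(labels)


# Toy example (made-up scores; 1 = taker, 0 = nontaker).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(auc_mann_whitney(toy_scores, toy_labels))          # 17/21, about 0.81
print(top_k_take_rate(toy_scores, toy_labels, 0.2))      # 0.5
print(cumulative_response(toy_scores, toy_labels, 0.5))  # 2/3
```

The pairwise loop is quadratic in the number of prospects and is meant only to make the definition concrete; a rank-based computation would be the usual choice for data of the size discussed in the paper.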
5.6 Improving Performance By Adding More Sophisticated Network Attributes

Knowing whether a consumer is a network neighbor is one of the simplest indicators of consumer-to-consumer interaction that can be extracted from the network data. We now investigate whether augmenting the model with more sophisticated social-network information can add additional value. In this section, we focus on the social network that comprises (only) the current customers of this service (which here we will call "the network"), along with the periphery of prospects who have communicated with those on the network (the network neighbors). We investigate whether we can improve targeting by using more sophisticated measures of social relationship with the network of existing customers.

Table 7 summarizes a set of additional social network attributes that we add to the logistic regression. The terminology that we use is borrowed to some degree from the fields of social-network analysis and graph theory. Social-network analysis (SNA) involves measuring relationships (including information transmission) between people on a network. The nodes in the network represent people and the links between them represent relationships between the nodes. The SNA measures help quantify intuitive social notions, such as connectedness, influence, centrality, social importance and so on. Graph theory helps to understand problems better by representing them as interconnected nodes, and provides vocabulary and methods for operating mathematically.

Table 7. Network attribute descriptions

Attribute                   Description
Degree                      Number of unique customers communicated with before the mailing
Transactions                Number of transactions to/from customers before the mailing
Seconds of communication    Number of seconds communicated with customers before mailing
Connected to influencer     Is an influencer in prospect's local neighborhood?
Connected component size    Size of the connected component prospect belongs to
Max similarity              Max overlap in local neighborhood with any existing neighboring customer

Three of the attributes that we introduce can be derived from a prospect's local neighborhood (the set of immediate communication partners on the network; recall that these all are current customers). Degree measures the number of direct connections a node has. Within the local neighborhood, we also count the number of Transactions, and the length of those transactions (Seconds of communication).

The network is made up of many disjoint subgraphs. Given a graph G = (V, E), where V is a set of vertices (nodes) and E is a set of links between them, the connected components of G are the sets of vertices such that all vertices in each set are mutually connected (reachable by some path) and no two vertices in different sets are connected. The size of the connected component may be an indicator for awareness of and positive views about the product. If a prospect is linked to a large set of "friends" all of whom have adopted the service, she may be more likely to adopt herself. Connected component size is the size of the largest connected component (in the network) to which the prospect is connected.

We also move beyond a prospect's local neighborhood. Observing the local neighborhoods of a prospect's local neighbors, we can define a measure of social similarity. We define social similarity as the size of the overlap in the immediate network neighborhoods of two consumers. Max similarity is the maximum social similarity between the prospect and any neighbors of the prospect. Finally, the firm also can observe the prior dynamics of its customers. In particular, the firm can observe which customers communicated before and/or after their adoption as well as the date customers signed up. Using this information, we define influencers as those subscribers who signed up for the service and, subsequently, we see one of their network neighbors sign up for the service. Connected to influencer is an indicator of whether the prospect is connected to one of these influencers. We appreciate that we do not actually know if there was true influence.

We use all of the aforementioned attributes and show AUC values for the resulting predictive models in Table 8. We find that some of these network attributes have considerable predictive power individually and have even more value when combined. This is indicated by AUCs of 0.68 for both transactions and seconds of communication. We do not find high AUCs individually for connected component size, similarity or connected to influencer. Ultimately, we find that the logistic regression model built with the network attributes results in an AUC of 0.71 compared to an AUC of 0.66 without the network attributes, that is, using only the traditional marketing attributes described in previous sections. (Recall that this represents the ability to rank the network neighbors, who already have especially high take rates as a group, as we have shown.)

Table 8. ROC analysis

Attribute(s)                                            AUC
Degree                                                  0.59
Transactions                                            0.68
Seconds of communication                                0.68
Connected to influencer                                 0.53
Connected component size                                0.55
Similarity                                              0.55
All network                                             0.71
All traditional (loyalty, demographic, geographic)      0.66
All traditional + all network                           0.71

Note. AUC values result from logistic regression models built on each of the constructed network attributes individually, as well as in combination. Results are presented for loyalty-level 3 customers.

Interestingly, when we combine the traditional attributes with the network attributes, there is no additional gain in AUC, even though many of these attributes were shown to be significant in the broader analysis above. The similarities represented implicitly or explicitly in the network attributes seem to account for all useful information captured by traditional demographics and other marketing attributes. That traditional demographics and other marketing attributes do not add value is not only of theoretical interest, but practical as well, for example in cases such as this where demographic data must be purchased.

Our result is further confirmed by the lift and take rate curves displayed in Figure 6(a) and (b), respectively. One can achieve substantially higher take rates using the new network attributes as compared to using the traditional attributes. For example, we find that for the top 20% of the targeted list, without the network attributes, the take rate is 2.2%; with the network attributes, it is 3.1%. Likewise, at the top 10% of the list, the take rate with the network attributes is 4.4% compared to 2.9% without them.

FIG. 6. (a) Lift curves. Power of segmentation curves for models built with all traditional attributes, with (trad atts + net) and without (trad atts) the network attributes. The model including the network attributes (trad atts + net) outperforms the model without them (trad atts). If the firm sent out 50% of the mailing, they would have received 77% of the positive responses with the network attributes, compared to receiving 63% of the responses without the network attributes. (b) Top-k analysis. For example, for the top 20% of targets ranked by score, the take rate is 2.2% without the network attributes and 3.1% with the network attributes. [Both panels plot against the cumulative % of consumers targeted, ranked by predicted sales.]
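The network attributes of Table 7 can be sketched in Python as follows. This is an illustrative reading of the definitions in Section 5.6, not the authors' implementation: the record layout `(caller, callee, seconds)`, the function names and the toy data are all hypothetical, the call records are assumed to be already restricted to the period before the mailing, and connected components are computed on the customer-only network.

```python
from collections import defaultdict, deque


def build_adjacency(calls):
    """Undirected adjacency sets from (caller, callee, seconds) records."""
    adj = defaultdict(set)
    for u, v, _seconds in calls:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_influencers(adj, signup_month, customers):
    """Customers with at least one customer neighbor who signed up later."""
    return {
        c for c in customers
        if any(n in customers and signup_month[n] > signup_month[c]
               for n in adj.get(c, ()))
    }


def component_sizes(adj, customers):
    """Size of each customer's connected component in the customer-only network."""
    size = {}
    for start in customers:
        if start in size:
            continue
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first search restricted to customers
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in customers and v not in seen:
                    seen.add(v)
                    queue.append(v)
        for u in seen:
            size[u] = len(seen)
    return size


def prospect_attributes(prospect, calls, customers, signup_month):
    """Table 7 attributes for one prospect (assumed pre-mailing call records)."""
    adj = build_adjacency(calls)
    neighbors = adj.get(prospect, set()) & customers  # local neighborhood
    durations = [
        s for u, v, s in calls
        if (u == prospect and v in customers) or (v == prospect and u in customers)
    ]
    influencers = find_influencers(adj, signup_month, customers)
    sizes = component_sizes(adj, customers)
    return {
        "degree": len(neighbors),
        "transactions": len(durations),
        "seconds": sum(durations),
        "connected_to_influencer": any(n in influencers for n in neighbors),
        "connected_component_size": max((sizes[n] for n in neighbors), default=0),
        "max_similarity": max(
            (len(neighbors & adj.get(n, set())) for n in neighbors), default=0
        ),
    }


# Toy example: customers a-d, prospects p and x (all values made up).
toy_customers = {"a", "b", "c", "d"}
toy_signup = {"a": 1, "b": 2, "c": 3, "d": 1}
toy_calls = [("p", "a", 60), ("p", "a", 30), ("p", "b", 120),
             ("a", "b", 10), ("b", "c", 5), ("d", "x", 7)]
print(prospect_attributes("p", toy_calls, toy_customers, toy_signup))
```

In the toy data, prospect `p` talks to customers `a` and `b`, both of whom were followed into the service by a neighbor, so `p` is connected to an influencer and to a component of three customers.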
6. LIMITATIONS

We believe our study to be the first to combine data on direct customer communication with data on product adoption to show the effect of network-based marketing statistically. However, there are limitations in our study that are important to point out.

There are several types of missing, incomplete or unreliable data which could influence our results. We have records of all of the communication (using the firm's service) to and from current customers of the service. That is not true for all the network-neighbor consumers. As such, we do not have complete information about the network-neighbor targets (as well as the non-network-neighbor targets). In addition, some of the attributes we used were collected by purchasing data from external sources. These data are known to be at least partially erroneous and outdated, although it is not well known how much so. An additional problem is joining data on customers from external sources to internal communication data, leading to missing data or sometimes just blatantly incorrect data. Finally, telecommunications firms are not legally able to collect information regarding the actual content of communication, so we are not able to determine if the consumers in question discussed the product. In this regard, our data are inferior to some other domains where content is visible, such as Internet bulletin boards or product discussion forums.

We expect the network-neighbor effect to manifest itself differently for different types of products. Most of the studies done to date on viral marketing have focused on the types of products that people are likely to talk about, such as a new, high-tech gadget or a recently released movie. We expect there to be less buzz for less "sexy" products, like a new deodorant or a sale on grapes at the supermarket. The study presented in this paper involves a new telecommunications service, which involves a new technology and features that consumers have perhaps never been exposed to before. The firm hopes the new technology and features are such that they would encourage word of mouth.

What can we say about other products that might not be quite so buzz-worthy? To study this, we compared the new service studied here to a roll-out of another product by the same firm. This other product was simply a new pricing plan for an older telecommunications service. Customers who signed up for this new plan could stand to save a significant amount of money, depending on their current usage patterns. However, the range and variety of telecommunications pricing plans in the marketplace is so extensive and so confusing to the typical consumer that we do not believe that this is the type of product that would generate a lot of discussion between consumers. We refer to the two products as the pricing plan and the new technology.

For the pricing plan, we have the same knowledge of the network as we do for the new technology. For those consumers who belong to the pricing plan, we know who they communicate with and then we can follow these network-neighbor candidates to see if they ultimately sign up for the plan. We construct a measure of "network neighborness" as follows. For a series of consecutive months, we gather data for all customers who ordered the product in that month. We calculate the percentage of these new customers who were network neighbors, that is, those who had previously communicated with a user of the product. This percentage is a measurement of the proportion of new sales being driven by network effects. By comparing this percentage across two products, we get insight into which product stimulates network effects more.

We now look at this value for our two products over an 8-month period. The time period for the two products was chosen so that it would be within the first year after the product was broadly available. The results are shown in Figure 7. The two main points to take away are that the new service has a higher percent of purchasers who are network neighbors and also an increasing one (except for the dip in month 5). In contrast, the pricing plan has a flat network-neighbor percentage, never increasing above 3%.

FIG. 7. Network-neighborness plot for new service versus pricing plan. [Horizontal axis: Month; series: New Service, Pricing Plan.]

Interestingly, the dip in the plot for the new service corresponds exactly to the month of the direct marketing discussed earlier. Before the campaign, we can see that the network-neighbor effect was increasing, that more and more of the purchasers in a given month were network neighbors. During the mass marketing campaign, we exposed many non-network neighbors to the service and many of them ended up purchasing it, temporarily dropping the network-neighbor percentage. After the campaign, we see the network-neighbor percentage starting to increase again.

This network-neighborness measure should not be confused with the success of the product, as the pricing plan was quite successful from a sales perspective, but it does suggest that the pricing plan is a product that has less of a network-based spread of information. This difference might be due to the new service creating more word-of-mouth or perhaps we are seeing the effects of homophily. People who interact with each other are more likely to be similar in their propensity for purchasing the new service than in their propensity for purchasing a particular pricing plan. Again, the effects of word of mouth versus homophily are difficult to discern without knowing the content of the communication.
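The "network neighborness" measure described in Section 6 can be sketched as a per-month percentage. The sketch below is one possible reading, with assumptions that are not in the source: adoption and calls are recorded at month granularity, an adopter counts as a network neighbor if, in an earlier month, they communicated with someone who was already a user at the time of that call, and all names and data are hypothetical.

```python
from collections import defaultdict


def network_neighborness(adoption_month, calls):
    """Percentage of each month's new customers who were network neighbors.

    adoption_month: dict mapping customer -> month of adoption (int).
    calls: iterable of (party_a, party_b, month) communication records.
    """
    never = float("inf")  # sentinel for parties who never adopted

    def was_neighbor(customer, month):
        # Assumed rule: a call strictly before the adoption month, with a
        # partner who had already adopted by the month of that call.
        for a, b, t in calls:
            if t >= month:
                continue
            if a == customer and adoption_month.get(b, never) <= t:
                return True
            if b == customer and adoption_month.get(a, never) <= t:
                return True
        return False

    adopters_by_month = defaultdict(list)
    for customer, month in adoption_month.items():
        adopters_by_month[month].append(customer)

    return {
        month: 100.0 * sum(was_neighbor(c, month) for c in adopters) / len(adopters)
        for month, adopters in sorted(adopters_by_month.items())
    }


# Toy example (made-up data): z never adopts.
toy_adoption = {"a": 1, "b": 2, "c": 2, "d": 3}
toy_calls = [("a", "b", 1), ("c", "z", 1), ("b", "d", 2), ("d", "c", 1)]
print(network_neighborness(toy_adoption, toy_calls))
```

Running the same function on the adoption records of two different products, as the section does for the new service and the pricing plan, gives two monthly series that can be compared directly.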
7. DISCUSSION

One of the main concerns for any firm is when, how and to whom they should market their products. Firms make marketing decisions based on how much they know about their customers and potential customers. They may choose to mass market when they do not know much. With more information, they may market directly based on some observed characteristics. We provide strong evidence that whether and how well a consumer is linked to existing customers is a powerful characteristic on which to base direct marketing decisions. Our results indicate that a firm can benefit from the use of social networks to predict the likelihood of purchasing. Taking the network data into account improves significantly and substantially on both the firm's own marketing "best practices" and our best efforts to collect and model with traditional data.

The sort of directed network-based marketing that we study here has applicability beyond traditional telecommunications companies. For example, eBay recently purchased Internet-telephony upstart Skype for $2.6 billion; they now also will have large-scale, explicit data on who talks to whom. With gmail, Google's e-mail service, Google now has access to explicit networks of consumer interrelationships and already is using gmail for marketing; directed network-based marketing might be a next step. Various systems have emerged recently that provide explicit linkages between acquaintances (e.g., MySpace, Friendster, Facebook), which could be fruitful fields for network-based marketing. As more consumers create interlinked blogs, another data source arises. More generally, these results suggest that such linkage data potentially could be a sort of data considered for acquisition by many types of firms, as purchase data now are being collected routinely by many types of retail firms through loyalty cards. Even academic departments could benefit from such data; for example, the enrollment in specialized classes could be bolstered by "marketing" to those linked to existing students. Such links exist (e.g., via e-mail). It remains to design tactics for using them that are acceptable to all.

It is tempting to argue that we have shown that customers discuss the product and that discussion helps to improve take rates. However, word of mouth is not the only possible explanation for our result. As discussed in detail above, it may be that the network is a powerful source of information on consumer homophily, which is in accord with social theories (Blau, 1977; McPherson, Smith-Lovin and Cook, 2001). We have tried to control for homophily by using a propensity-matched sample to produce our logistic regression model. However, it may well be that direct communications between people is a better indicator of deep similarity than any demographic or geographic attributes. Either cause, homophily or word of mouth, is interesting both theoretically and practically.
R.
FiLDES,
both
V. Mahajan,
19 327-328.
and
of AT&T, as well
for useful
versity of Maryland,
We
suggestions.
mous
reviewers
Getoor,
drafts.
previous
Learning
Mining
Berlin.
Springer,
Learn.
REFERENCES
G.
A.
and Tuzhilin,
the next
Toward
(2005).
A survey of the state-of

of recommender
systems:
generation
IEEE Trans. Knowledge
extensions.
the-art and possible
and
Data Engineering
17 734-749.
D.
Agarwal,
ties of
Philadelphia.
F. M.
Bass,
of Social
D. and Narayandas,
Theory
contacts
initiated
with
D.
Web
pertextual
L.
search
30 107-117.
Systems
A.C.
(1991). Spatial
metrica
S. (1998). Toward
and cost distributions:
and Stolfo,
non-uniform
class
fraud
In Proc.
detection.
on Knowledge
Press, Menlo
S. H.
and
DELLAROCAS,
Promise
agement
event
physics
67 159-182.
M.
of
tional
on Knowledge
Conference
ACM Press, New York.
W.,
DuMouchel,
and Pregibon,
Fifth
ACM
In Proc.
Volinsky,
D.
rule-learning
classification.
Computer
R.
G.
for information
Huang,
Mining
Going
technology
and methods.
J. Assoc.
6-15.
the network
).Mining
SIGKDD
Interna
and Data
Discovery
Mining
Johnson,
T.,
beyond
innovation
Information
Press,
C.
Cortes,
flatter.
In Proc.
on Knowledge
York.
New
the dominant
paradigm
con
Emerging
5
314-355.
Systems
research:
J. Mach.
relation
probabilistic
WEBKDD
San
1999,
Yorker March
17,
K.
Point:
How
Little
T. L.
and Baker,
environment
Inves
(2002).
in hedonic
service
events.
J. Busi
of sporting
study
Can
Things
Boston.
Books,
E.
W.
and Handcock,
to social
MR
Sta
(2004). A graph model

J. Amer. Soc. Informa
H. C.
S.
systematic
(2002).
/. Amer.
analysis.
systems.
55 259-274.
and Technology
N.
and Stephen,
La
S.
1951262
and Chen,
recommender
M.
network
The
(2002).
Intelligent
study.
imbal
class
Data
Analysis
429-449.
R.
and Agarwal,
(2001).
Evaluating
rare
to
and
classes:
algorithms
classify
boosting
Comparison
on Data
In Proc. IEEE International
Conference
improvements.
IEEE Press,
257-264.
Mining
Kautz,
V.
Kumar,
M.,
H.,
Selman,
Combining
social
B.
and
networks
NJ.
Piscataway,
M.
Shah,
and
(1997).
web:
Referral
collaborative
Comm.
filtering.
sources
in a hyperlinked
ACM 40(3) 63-65.

J. (1999). Authoritative
J. ACM 46 604-632.
BERG,
V.
T. V.
and Krishnan,
An
D.
for
problem
MR
(2002).
networks.
on Information
Conference
556-559.
Linden,
G.,
Smith,
B.
and
worked
Paper
Working
York University.
Mahajan,
toolkit
(2003).
F. (2004).
a univariate
and
Stern
#CeDER-04-08,
V., M?ller,
J.
E.
for new products

strategy
mouth. Management
Sei.
and Kerin,
with
30
positive
1389-1404.
318-330.
link pre
Twelfth
collaborative
7 76-80.
Computing
S. and Provost,
data:
In Proc.
and Knowledge
York,
recommendations?Item-to-item
Macskassy,
diffusion
Multinational
framework.
social
en
1747649
Sei. 21
Marketing
and KLEINBERG,
J. (2003).
The
alternative
LlBEN-NowELL,
Internet
ACM
A.
Chung,
Japkowicz,
ance problem:
diction
ACM
B.
Taskar,
55 691-101.
approaches
97 1090-1098.
Z.,
Kumar,
Man
and
The New
tional
(2001
Seventh
C,
mechanisms.
D.
coolhunt.
M.
Raftery,
vironment.
of mouth:
of word
flat files
Squashing
International
Conference
(2004).
R D.,
models:
(1999).
SIGKDD
and Data
Discovery
FlCHMAN,
(1991).
307-338.
eds.)
of link structure.
Using
In Proc.
The Tipping
Back Bay
Brady,
R.,
Research
Klein
P. and Richardson,
customers.
study
Conference
AAAI
164-168.
Mining
E. G.
value
57-66.
case
The
the role of the physical

tigating
An exploratory
consumption:
Joshi,
Econo
learning with
in credit
scalable
A
The
C
(2003).
digitization
and challenges
feedback
of online
Sei. 49 1407-1424.
Domingos,
Hightower,
tion Science
demand.
(1999).
filtering.
(1997).
for E-commerce
/. Market
Lavrac,
Koller,
M.
M.
GLADWELL,
(2002).
a Big Difference.
Make
International
Data
Stern,
in high-energy
Communications
program
Physics
Fourth
and
Discovery
Park, CA.
Clearwater,
cepts
of
59 953-965.
E.
card
in household
patterns
share
of a large-scale
hy
Networks
and ISDN
anatomy
Computer
engine.
Case,
Chan,
The
(1998).
on
impact
behavior.
and word-of-mouth
category
requirements
Research-38
281
-297'.
ing
S. and Page,
M.
space
tist Assoc.
customer
N.
A.
In Relational
models.
78-88.
tent
Managing
The
and Sahami,
Springer,
D.
and Pfeffer,
relational
and
In
learning.
Conference.
Berlin.
CA.
Diego,
Hoff,
A Primitive
415.
3625
N.,
for collaborative
Gladwell,
ness
(2001).
manufacturers:
SIAM,
models
consumer
and Heterogeneity:
Inequality
Structure.
Free Press, New York.
Bowman,
Brin,
In Proc.
Mining.
for model
growth
product
Sei. 15 215-227.
Management
P. M.
(1977).
communi
Enhancing
blockmodels.
new
(1969).
durables.
BLAU,
(2004).
stochastic
using Bayesian
on Data
International
Conference
interest
SIAM
Fourth
D.
and Pregibon,
L.
Getoor,
relational
International
Koller,
N.,
Friedman,
L.,
statistical
models
Learning
probabilistic
Res. 3 679-707.
MR1983942
(2003).
Adomavicius,
cooperation,
Research
20
J. Consumer
15th
probabilistic
(S. Dzeroski
Data
Getoor,
on
Programming,
in Comput.
Sei.
Friedman,
L.,
(2001).
on
comments
insightful
Structure,
(1993).
information.
Tutorial
(2005).
Logic
Notes
Lecture
discussions
offered
who
L.
ductive
and helpful
like to thank three anony
also
would
K.
of market
Models,
Diffusion
by
Internat. J. Forecasting
360-375.
Paul and Deepak Agar

as Chris Dellarocas
of the Uni
wal
the flow
GETOOR,
like to thank DeDe
We would
of New-Product
(2003). Review
E. M?ller
and Y. Wind,
eds.
J. and Nakamoto,
Frenzen,
ACKNOWLEDGMENTS
Amazon.com
filtering.
Classification
case
School
R.
Interna
Management
study.
of Business,
IEEE
in net
CeDER
New
Introduction
(1984).
word-of
and negative
Models.
and Hall,
Chapman
McPHERSON,
M.,
cation
trees
Learn.
Res.
and class
A.
(2001).
N.
PAUMGARTEN,
Interfaces
R. (2004).
Learning,
Banff,
Learning.
(2003). No.
Yorker May
5.
C. and Provost,
Perlich,
Learning
D.
Applying
for collaborative
Relational
on Machine
POOLE,
J. (2001).
Annual
Boosted
(2006).
estimation.
for relational
quantitative
31(2) 90-108.
learning
acknowledged.
Distribution-based
(2006).
with
Canada.
Alberta,
1 fan dept.
F.
marketing
M.
sites
score.
P. R.
identifier
attributes.
aggre
Machine
Estimating
of
the telephone
universe.
In Proc.
Tenth ACM
and Domingos,
for viral marketing.
P. (2002).
In Proc.
Discovery
knowledge
Mining
ACM SIGKDD
Eighth
Statist.
D.
Evans,
D.
B.
ed.
Free
bias in
Reducing
on the propensity
(1984).
subclassification
using
Assoc.
79 516-524.
J. and Yakan,
A.
case in predictive
tering: Special
Mathematics
82 1-11. MR2159280
fil
Collaborative
(2005).
Internat.
analysis.
J. Computer
T. (1990). A study of a competitive

Bass model which
takes
into account
firms.
J.
Research
among
competition
Operations
Society
of Japan 33 319-334.
Ueda,
den
C.
Bulte,
tion revisited:
can J. Sociology
R. (2004).
New
costly:
J. Artificial
G. M.
WEISS,
Yang,
The
The
effect
Intelligence
(2004).
SIGKDD
hidden
of
Dec.
Research
Mining
preferences.
L.
versus
effort.
sight)
Learning
distribution
persuaders.
when
on
Ameri
The
training data
tree induction.
19 315-354.
with
rarity: A unifying
Newsletter
6 7-19.
(2003). Modeling
/. Marketing
Research
innova
Medical
(2001).
marketing
(in plain
5, 69-75.
(2003).
class
Explorations
S. and Allenby,
G. M.
consumer
G.
contagion
106 1409-1435.
York Times Magazine

G. and Provost,
F
are
ACM
and Lilien,
Social
Walker,
Weiss,
the size
5th
Innovations,
of
Diffusion
and Rubin,
studies
J. Amer.
K.,
Van
The New
(2003).
and Data
Discovery
York.
observational
J. Mach.
probabilistic
In Proc. Workshop
filtering.
21st International
Confer
Bayesian
Mark-recapture
approach.
on Knowledge
International
SIGKDD
Conference
and Data Mining
659-664.
Richardson,
New
Press,
classifi
62 65-105.
(2004).
E. M.
ROGERS,
of
Hierarchical
sharing
Birds
Review
Tout,
L.
models
on Statistical
gation
A.
probability/quantile
J. and Greiner,
relational
ence
and Buja,
to the Internet.
techniques
61-70.
Mining
To appear.
Montgomery,
Newton,
and COOK,
networks.
on Knowledge
Conference
International
MR0727836
Rosenbaum,
A.
D., Wyner,
Linear
Generalized
(1983).
York.
L.
in social
Homophily
27 415-444.
Sociology
New
Smith-Lovin,
of a feather:
Mease,
J. A.
P. and Nelder,
McCULLAGH,
framework.
interdependent
40 282-294.

