Professional Documents
Culture Documents
570590
ISSN 0732-2399 (print) ISSN 1526-548X (online) http://dx.doi.org/10.1287/mksc.2013.0786
2013 INFORMS
Eva Ascarza
Columbia Business School, New York, New York 10027, ascarza@columbia.edu
Bruce G. S. Hardie
London Business School, London NW1 4SA, United Kingdom, bhardie@london.edu
A s firms become more customer-centric, concepts such as customer equity come to the fore. Any serious
attempt to quantify customer equity requires modeling techniques that can provide accurate multiperiod
forecasts of customer behavior. Although a number of researchers have explored the problem of modeling
customer churn in contractual settings, there is surprisingly limited research on the modeling of usage while
under contract. The present work contributes to the existing literature by developing an integrated model of
usage and retention in contractual settings. The proposed method fully leverages the interdependencies between
these two behaviors even when they occur on different time scales (or clocks), as is typically the case in most
contractual/subscription-based business settings.
We propose a model in which usage and renewal are modeled simultaneously by assuming that both behav-
iors reflect a common latent variable that evolves over time. We capture the dynamics in the latent variable
using a hidden Markov model with a heterogeneous transition matrix and allow for unobserved heterogeneity
in the associated usage process to capture time-invariant differences across customers.
The model is validated using data from an organization in which an annual membership is required to gain
the right to buy its products and services. We show that the proposed model outperforms a set of benchmark
models on several important dimensions. Furthermore, the model provides several insights that can be useful
for managers. For example, we show how our model can be used to dynamically segment the customer base
and identify the most common paths to death (i.e., stages that customers go through before churn).
Key words: churn; retention; contractual settings; access services; hidden Markov models; RFM; latent variable
models
History: Received: April 10, 2012; accepted: February 9, 2013; Preyas Desai served as the editor-in-chief and
Scott Neslin served as associate editor for this article. Published online in Articles in Advance May 22, 2013.
570
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 571
of its facilities. And let us consider Susan, who cur- a dynamic latent variable model that captures the
rently has certain fitness goals and is committed to interdependencies between these two behaviors, even
exercise; as such, she is a member of her local gym. when they occur on different clocks. The proposed
Susans commitment will be reflected in the number model fully leverages the two-clock nature of most
of times she exercises in a particular week. If she feels contractual settings by allowing usage and retention
very committed, she is likely to exercise very often. to occur on two different time scales. This approach
However, if her commitment decreases over time, her makes it possible to generate accurate multiperiod
forecasts of usage and renewal behavior and pro-
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
renewal at t = 12 because that would require usage shows that differences in satisfaction levels explain a
data for periods 811, which are currently the future.2 substantial portion of the variance in contract dura-
One possible solution is to create a usage sub- tions. Similarly, Bolton and Lemon (1999) find a sig-
model in which current usage is modeled as a func- nificant relationship between satisfaction and usage.
tion of past usage. The churn and usage models can Other researchers have examined alternative attitudi-
be cobbled together to provide multiperiod forecasts nal constructs such as commitment (e.g., Gruen et al.
of both behaviors (i.e., use the usage model to pre- 2000, Verhoef 2003). In this research, we assume the
dict future usage, which then serves as an input into existence of such a latent variable. We seek to model
the churn model to predict future churn).3 Examples it and its effects on the manifest variables; however,
of this approach in the marketing literature include we do not formally define the variable, a position
Borle et al. (2008), who combine an interpurchase consistent with most latent variable models in the
time model for usage with a conditional hazard for statistics, biostatistics, econometric, and psychometric
retention, and Bonfrer et al. (2010), who combine a literatures.4 For the sake of linguistic simplicityand
geometric Brownian motion process for usage with a given the nature of our empirical settingwe will call
first-passage time model for defection. Both models it commitment going forward.5
provide multiperiod forecasts of usage and churn, but There are several advantages associated with such
they assume that churn can occur each usage period a modeling approach. First, it easily accommodates
(i.e., usage and renewal decisions occur on the same and leverages the fact that usage and renewal typ-
clock), thus limiting the settings in which they can ically occur on two different time scales, without
be applied. the need to ignore (or aggregate) useful informa-
Our proposed approach to the modeling problem is tion. Second, although the joint estimation of mod-
illustrated in Figure 2. We assume the existence of a els in which retention and usage are modeled as
common latent variable, the level of which influences a function of past usage allows for the possibility
a customers usage and her likelihood of renewing of exogenous factors/shocks affecting both decisions
her contract at each renewal opportunity. As such, simultaneously (e.g., through a correlated error struc-
changes in usage over time, along with the decision ture), such an approach does not explicitly model sys-
to churn, reflect dynamics in the latent variable. If we tematic dynamics in those factors. In other words,
can forecast the evolution of the latent variable mul- combining the two submodels controls for com-
tiple periods into the future, forecasts of usage and mon factors affecting both decisions, but it does not
renewal follow automatically. (In others words, there explain them. (Explicitly modeling the evolution
is no need to use predictions of usage as inputs when of a latent variable affecting both decisions not only
predicting usage and retention.) improves our insights into customer behavior but also
The idea of usage and renewal being driven by implies that we can forecast those systematic changes,
a common latent variable is supported by survey- resulting in more accurate predictions.6 ) Last, but by
based research in the marketing literature, which no means least, our modeling approach provides a
shows that contract renewal and service usage are
driven by some underlying attitudinal construct. For 4
At an abstract level, the notion of a latent variable driving both
example, Rust and Zahorik (1993) propose a dynamic behaviors is very similar to the idea of usage and renewal being
framework linking customer satisfaction to retention. outcomes of the maximization of a common utility function. We
They find that changes in retention rates are linked to do not use the term utility to refer to the latent variable because
we are not using a formal utility maximization framework when
changes in satisfaction with the service. Bolton (1998)
developing our model.
5
We acknowledge that the concept of commitment has been
2
This problem exists regardless of how many usage periods occur defined and previously studied in the marketing literature (e.g.,
between successive renewal opportunities. Morgan and Hunt 1994, Garbarino and Johnson 1999, Gruen et al.
3
Some of the benchmark models considered in our empirical 2000). Its theoretical definition and measurement are beyond the
analysis are based on this logic. An obvious problem with such scope of this paper.
6
an approach is that any prediction error propagates through the We support this claim in our empirical analysis when we compare
forecasts. our approach to other methods.
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 573
compelling and intuitive story of customer behavior Danaher 2002), where different methods have been
in contractual settings that is easy to communicate to proposed to model longitudinal data with some type
a nontechnical audience. of attrition process. With variations particular to each
Consistent with much of the modeling literature in application area, these longitudinal models see the
the customer relationship management (CRM) area, joint estimation of a measurement process and a sur-
we assume that the behaviors of interest are mea- vival function. However, such models do not address
sured in discrete time; for example, Kumar et al. our research objective. First, they focus on control-
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
(2008) and Venkatesan and Kumar (2004) use months, ling for dropout-induced bias, rather than predicting
whereas Borle et al. (2008) and Bonfrer et al. (2010) dropout. And second, these methods do not naturally
use weeks. This same literature typically treats usage lend themselves to the two-clock nature of the prob-
data in one of two ways. The first focuses on the lem we are addressing.
time between events, with a submodel for the usage A number of marketing researchers have pro-
(e.g., expenditure, number of units purchased) asso- posed various dynamic latent models to capture con-
ciated with each event (e.g., Venkatesan and Kumar sumers evolving behavior. For example, Sabavala
2004, Borle et al. 2008). The second sums usage across and Morrison (1981), Fader et al. (2004), and Moe
all the events that occur in a given time interval, and Fader (2004a, b) present nonstationary probability
giving us the total usage per period (e.g., total expen- models for media exposure, new product purchasing,
diture per month, total number of units purchased and website usage, respectively; Netzer et al. (2008)
per week), and an appropriate model for that quan- and Montoya et al. (2010) develop models of char-
tity is specified (e.g., Kumar et al. 2008, Sriram et al. itable giving and drug-prescribing behavior. None
2012).7 In this research we take the second approach of these models is directly relevant to our model-
because it fits naturally with our assumed data gen- ing problem because they assume a univariate action
erating process. given the latent state. Nevertheless, our proposed
Before we formally develop the model, let us note model is influenced by their work.
other streams of literature potentially relevant to our
modeling objective. In essence, the data we seek to
model are from a bivariate longitudinal process in 3. The Model
which one variable is contingent on the other (i.e., We now turn our attention to the specification of our
customers need to get access to the service), and model, first outlining the intuition of the model. We
one of the variables follows a binary and absorb- assume that observed usage and renewal behaviors
ing mechanism (i.e., churn is absorbing). As such, reflect the individuals commitment at any point in
it appears that a discrete/continuous model of con- time. We assume that this latent variable is discrete
sumer demand (e.g., Hanemann 1984, Krishnamurthi in nature and evolves over time in a stochastic man-
and Raj 1988, Chintagunta 1993) could be used to ner. Variations in commitment are reflected in usage
address this problem. This type of model was pro- behavior (e.g., number of transactions). The decision
posed in the marketing and economics literature to of whether or not to renew the contract when it comes
model joint decisions (binary/continuous) such as up for renewal also reflects the individuals commit-
whether to buy and, if so, how much to buy. The ment at that point in time; if her commitment is below
underlying assumption of such models is that all the a certain threshold, she will not renew her contract.
involved decisions are consequences of optimizing a Consistent with the nature of contractual businesses,
common utility function. Although they have been we assume that churn is absorbing (i.e., once a cus-
extended to handle dropout (e.g., Narayanan et al. tomer churns, we do not observe her behavior any
2007), we cannot automatically make use of these more). The fact that an individual is under contract
models because they do not accommodate the two at a particular point in time implies that her commit-
different time scales inherent in the data typically at ment had to be above some renewal threshold in all
an analysts disposal. preceding renewal periods.
Another possible starting point is the biostatistics With reference to Figure 3, where we have a
literature on modeling longitudinal data with dropout monthly subscription with usage observed on a
or a terminal event (e.g., Diggle and Kenwark 1994, weekly basis, the fact that this person is active in the
Henderson et al. 2000, Xu and Zeger 2001, Hashemi second month means that she renewed in week 4,
et al. 2003, Cook and Lawless 2007) and, to a which implies that her unobserved commitment was
lesser extent, related work in the fields of market- above the renewal threshold in that particular week.
ing and economics (e.g., Hausman and Wise 1979, However, it does not tell us anything about the level
of the latent variable in weeks 11 21 31 51 0 0 0 3 this has
7
At one level, the two approaches are equivalent, given the inter- to be inferred from her usage behavior. As her com-
play between timing and counting processes. mitment is below the renewal threshold in week 8,
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
574 Marketing Science 32(4), pp. 570590, 2013 INFORMS
Finally, we need to establish the initial conditions For each customer i we have a total of Ti usage
for the commitment states in period 1. We assume that observations. Let yit be customer is observed usage
the probability that customer i belongs to commit- in period t, and let Si = 6Si1 1 Si2 1 0 0 0 1 SiTi 7 denote the
ment state k at period 1 is determined by the vector (unobserved) sequence of states to which customer i
q = 6q1 1 q2 1 0 0 0 1 qK 7, where belongs during the observation window, with realiza-
tion si = 6si1 1 si2 1 0 0 0 1 siTi 7. The customers usage likeli-
P 4Si1 = k5 = qk 1 k = 11 0 0 0 1 K0 (4) hood function is
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
Table 1 Illustrative Feasible and Infeasible Commitment State the underlying commitment process. If Ti = n1 2n1 0 0 0 1
Sequences and the customer did not renew her contract, i con-
Month 1 Month 2 tains 4K 15Ti /n K Ti Ti /n1 possible paths; otherwise, i
contains 4K 154Ti 15/n+1 K Ti 4Ti 15/n1 paths.
Wk 1 Wk 2 Wk 3 Wk 4 Wk 5 Wk 6 Wk 7 Wk 8 Wk 9 Feasible? Considering all customers in our sample, and rec-
1 3 1 2 2 3 2 2 3 ognizing the heterogeneity in i , the overall likeli-
2 3 1 1 2 3 2 2 3 hood function is
2 3 2 2 2 3 2 1 3
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
2 3 1 1 L4A1 q1 1 data5
2 3 2 2 2 3 2 1
2 3 2 2 2 3 2 2 3 I Z
Y Z Z
2 1 1 2 2 1 2 2 1 = Li 4i 1 q1 1 i data5
i=1 0 4i1 5 4iK 5
f 4i A5f 4i 5 di di 1 (8)
The first three sequences are infeasible. The first
sequence of states cannot occur given that, for an where 4ij 5 is the simplex 86ij1 1 ij2 1 0 0 0 1 ijk 7 ijk
individual to have become a customer, her commit- 03 k = 11 0 0 0 1 K3 Kk=1 ijk = 19.
P
ment state in period 1 must have (by definition) been To summarize, we propose a joint model of usage
greater than 1. The following two sequences of states and churn in which the two behaviors (which typ-
are also infeasible because if a customer is active in ically occur on different time scales, but need not)
week 9 (the third month), her commitment state at the reflect a common dynamic latent variable modeled
end of the first and second months (weeks 4 and 8) using a hidden Markov model. Churn is deter-
had to have been greater than 1. However, there are ministically linked to the latent variable, whereas
no restrictions about her commitment states in any usage is modeled as a state-dependent Poisson pro-
periods other than 4 and 8, which means the next four cess that incorporates time-invariant cross-sectional
sequences shown in Table 1 are feasible. heterogeneity.
At first glance, the specification for the churn pro- We estimate the model parameters using a hierar-
cess might seem restrictive given that renewal behav- chical Bayesian framework. In particular, we use data
ior is assumed to be deterministic conditional on the augmentation techniques to draw from the distribu-
commitment state. However, membership of this hid- tion of the latent states Sit as well as the individual-
den state evolves in a stochastic and heterogeneous level parameters i and i . We control for the path
manner. As a result, renewal behavior is modeled restrictions (because of the nature of the contract
probabilistically, and customers are allowed to churn renewal process) when augmenting the latent states.
at different rates. As a consequence, the evaluation of the likelihood
3.1.4. Bringing It All Together. We now combine function becomes simpler, reducing to the expres-
the three processes to characterize the overall model. sion of the conditional (usage) likelihood function,
usage
For each customer i, we have shown how the unob- Li 41 i Si = si 1 data5. See Web Appendix A (avail-
served sequence Si determines her renewal pattern able as supplemental material at http://dx.doi.org/
over time. Moreover, conditional on her Si = si , the 10.1287/mksc.2013.0786) for further details.
expression for the usage likelihood was derived. To
remove the conditioning on si , we need to consider all 3.2. Parameter Identification
possible paths that Si may take, weighting each usage This basic model has K 2 + 4K 15 + K + 1 popula-
likelihood by the probability of that path: tion parameters, which are the elements of A, q, ,
and , respectively. The Markov process (determined
Li 4i 1 q1 1 i data5 by parameters A and q) captures both churn and
X usage changes in usage behavior, whereas the Poisson pro-
= Li 41 i Si = si 1 data5f 4si i 1 q51 (7)
si i
cess (determined by parameters and ) links these
underlying dynamics to usage behavior alone.
where i denotes all possible commitment state The challenge in estimation is to separate individ-
paths customer i might have during the observation ual dynamics from cross-sectional heterogeneity and
usage
window, Li 41 i Si = si 1 data5 is given in (6), the general randomness associated with the Poisson
and f 4si A1 q5 is the probability of path si . If there usage process. We have a number of observations of
were no restrictions as a result of the renewal pro- individual usage, but there are typically fewer for
cess, the space i would include all possible com- renewal behavior. This scarcity in renewal data is due
binations of the K states across Ti periods (i.e., K Ti to two factors: renewal only happens every n peri-
possible paths). However, as discussed earlier, the ods, and the churn process is absorbing (i.e., once
nature of the renewal process places constraints on a customer churns, we do not observe her behavior
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 577
anymore). Nevertheless, given the deterministic link where 1 reflects how mean usage varies as a function
between the latent state and renewal behavior (i.e., of the individual time-invariant covariates denoted
in any given renewal period, customers churn if and by xi , and 2 captures the effects of the time-varying
only if they are in state 1, the lowest commitment covariates (e.g., marketing activities at time t) denoted
state), all renewal observations are very informative by zit .14
and are therefore necessary for identification. Moreover, the observed covariates could have a
Let us outline the intuition of how the model more persistent effect on customer behavior or, in
parameters are identified.13 We estimate the state-
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
3.3. Incorporating Covariates in the Model 3.4. Variants on the Basic Model
In many situations the firm will have reliable cus- The proposed model assumes that, conditional on the
tomer demographic data and/or information about underlying state, usage behavior is characterized by
the interactions between the firm and the cus- a Poisson distribution. Behaviors for which this spec-
tomers. The latter case might include a wide range ification is appropriate include the number of credit
of marketing actions, from untargeted advertis- card transactions per month, the number of movies
ing campaigns or promotional activities aimed at all purchased each month in a pay-TV setting, and the
customers to individual-level direct mail and email number of phone calls made per week. However, in
communications. some settings, the usage level has an upper bound,
This additional information can be incorporated either because of capacity constraints from the com-
into the model in different ways. If we expect the panys side or because the time period in which
customer-level covariates to explain cross-sectional usage is observed is short. Going back to the above-
variation in overall usage levels or if the time-varying mentioned gym example, if one wants to model the
marketing actions are expected to have a short-term
impact on usage alone (and not on other behaviors), 14
As a particular case of the latter, one can easily incorporate sea-
we could simply make the usage rate in Equation (5) sonal dummies or any other type of time-varying information to
a function of the available covariates: control for seasonality and time trends. This is examined empiri-
cally in Web Appendix C.
it 6Sit = k7 = k i exp41 xi + 2 zit 51 (9) 15
Experimental data would provide a perfect scenario for measur-
ing such effects, because this would not only solve any identifica-
13
We have also run simulation analyses to corroborate this. The tion issues but also alleviate the potential endogeneity of marketing
results are reported in Web Appendix B. actions.
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
578 Marketing Science 32(4), pp. 570590, 2013 INFORMS
number of days a member attends in a particular with information about the products offered and com-
week, the Poisson may not be the most appropriate plete an order form. When ones membership is close
distribution because there is an upper bound of seven to expiring (generally one month before the expira-
days. Similarly, an orchestra will have a fixed number tion date), the organization sends out a renewal let-
of performances in any given booking period, and the ter. If membership is not renewed, the benefits can no
analyst may wish to acknowledge this upper bound longer be received.
when modeling usage. In these cases the Poisson dis- We focus on the cohort of individuals that took out
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
tribution should be replaced by the binomial distribu- their initial subscription during the first quarter of
tion in which the upper bound (e.g., the number of 2002 and analyze their buying and renewal behavior
days in a week, total number of performances offered) for the following four years. Expressing these data in
is the number of trials. terms of periods (as we defined t in 3), we have a
There are also situations where usage is not dis- total of 17 periods. We observe usage in periods 116
crete; for example, usage could refer to time (e.g., and renewal decisions in periods 5, 9, 13, and 17.16
minutes used in wireless contracts), expenditure (e.g., 4.1.1. Some Patterns in the Data. Of the 1,173
total amount spent), or other nonnegative continu- members of this cohort, 884 renewed at the end of
ous quantities (e.g., MBs downloaded in a wireless their first year (75% renewal rate), 738 renewed at
data plan). The proposed model is easily applied in the end of year 2 (83% renewal rate), 634 renewed at
such settings, provided the distributional assumption least three times (86% renewal rate), and 575 were still
of the usage process is modified; we simply replace active after four renewal opportunities.
the Poisson or binomial with, say, a gamma or lognor- This cohort of customers made a total of 14,255
mal distribution. (Details of these alternative specifi- purchases across the entire observation period. On
cations are provided in Web Appendix D.) average, a subscriber made 1.05 purchases per period.
Finally, there could be cases in which usage and However, the transaction behavior was very heteroge-
renewal occur each and every period (i.e., operate neous across subscribers, with the average number of
on the same clock), or alternatively, the firm tempo- purchases per period ranging from 0 to 41.9. (We use
rally aggregates the usage data (e.g., a gym offering the words purchase and transaction interchange-
monthly subscriptions and recording the total num- ably.) There was also variability in within-customer
ber of attendances in each month). The model pre- variation in the number of transactions. The indi-
sented in 3.1 can easily accommodate such cases. vidual coefficient of variation for usage ranged from
One simply needs to set n = 1 in the specification almost 0 (customers whose transaction history was
of the renewal process, which basically restricts the very stable) to 4 (customers with high variation in
lowest commitment state to be absorbing. (As a con- their period-to-period purchasing behavior).
sequence, the transition matrix would have fewer To examine the observed relationship between
elements because there is no possibility to move from usage and renewal behaviors, we split the cohort into
state 1.) four groups depending on how long they were mem-
bers of the organization (1 year, 2 years, etc.) and look
at the evolution of their usage behavior over time.
4. Empirical Analysis Figure 4 plots the cumulative moving average of the
4.1. Data number of transactions by period (indexed against
We explore the performance of the proposed model period 1). We observe that, on average, customers
using data from an organization in which an annual decrease their usage prior to churn.
subscription/membership is required to gain the right To examine whether this pattern also occurs at
to buy (or use) its products and services, as is the case the customer level, we analyze individual-level usage
for some warehouse clubs and priority-booking behavior at the end of each customers observation
schemes for cultural organizations. Membership of period. We find that for the majority of customers,
this scheme also provides subscribers with additional the number of transactions decreases before churn:
benefits, including newsletters and invitations to spe- for 70% of churners, transaction levels in the last two
cial events.
In addition to the membership fee, subscribers are 16
In developing the logic of our model, we discussed a contract
an important source of revenue for the organization period of n = 4 with renewal occurring at 4, 8, 12, etc. For the
through their usage behavior. The company generates case of quarterly periods, this implicitly assumed that customers
approximately $5 million a year from membership are acquired immediately at the beginning of the period (e.g., Jan-
uary 1, the first day of Q1) with the contract expiring at the end of
fees alone and a further $40 million from members fourth period (e.g., December 31, the last day of Q4). In this empir-
transactions. Each year is divided into four buying ical setting, customers are acquired throughout the first period,
periods; all members receive a catalog each period which means the first renewal occurs sometime in the fifth period.
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 579
Figure 4 Indexed Cumulative Moving Average of Usage, by Duration Table 3 Parameters of the Usage Process with Three States
of Membership
Parameter Posterior mean 95% CPI
Figure 5 Distributions of the State-Specific Usage Process Poisson Table 5 Mean Transition Probabilities and the 95% Interval of
Means Individual Posterior Means
40 To state
State 1
State 2 From state 1 2 3
State 3
30 1 0.60 0.38 0.02
[0.60 0.61] [0.37 0.38] [0.02 0.02]
Frequency (%)
10
the data do not want this state to be absorbing in this
empirical setting.
To get a tangible sense of how the model fits the
0 1 2 3 4 data, we compare the actual and predicted levels of
Mean transaction rate usage in the calibration period. We find that the three-
state specification of the proposed model gives a very
good fit when predicting total usage: the mean abso-
should be read as follows: for an average individual lute percentage error (MAPE) for the total number
in state 3, the probability of remaining in state 3 is of transactions per period is 7.44%. However, being
0.72, the average probability of switching to state 2 able to track aggregate levels of usage is not enough;
in the next period is 0.23, and the average probability we would expect the model to capture cross-sectional
of switching to the lowest commitment state is 0.05. differences too. We compute the posterior distribu-
Note that individuals do not switch states with the tions of the maximum number of transactions, the
same propensity; if we look at individuals within the minimum number of transactions, and five common
95% interval of individual posterior means, their (pos- percentiles of the transaction distribution. Comparing
terior mean) probabilities of switching from state 3 to these with the actual numbers, we observe in Table 6
state 2 range from 0.06 to 0.34. that all the summaries of the actual usage behavior
Care must be taken when interpreting the state 1 lie within the 95% CPI of the posterior distributions.
transition probabilities ( 11 = 0060, 12 = 0038, and These results, combined with the good aggregate fit,
13 = 0002), because these probabilities only apply to confirm that the model predictions in the calibration
nonrenewal periods. The model assumes that cus- period are accurate.
tomers churn if they are in the lowest commitment To further assess the validity of the model, we
state during a renewal period; however, not all peri- look at the relationship between the observed behav-
ods are renewal occasions. Therefore, it is possible to iors (both usage and renewal) and the latent states
find individuals who were in state 1 at a particular to which customers are assigned. First, we group the
time (nonrenewal period) and changed their commit- customers depending on their levels of usage (i.e.,
ment state before the renewal occasion occurred. The whether recent usage is below/above their regular
fact that the estimate of 11 is less than 1 implies that levels). Then we look at both the probability of being
assigned to each latent state and the actual renewal
behavior at the end of the current contract. Regard-
Table 4 Parameters of the Commitment Process with Three States ing usage, we confirm that the probability of being
assigned to lower states increases when a customers
Parameter Posterior mean 95% CPI
q1 0000
q2 0050 [0.44 0.55] Table 6 Comparing the In-Sample Distribution of Transactions with
q3 0050 [0.44 0.55] the Model Predictions
11 38017 [32.32 46.80] Actual Posterior mean 95% CPI
12 24020 [16.39 31.30]
13 1013 [0.75 1.66] Min 0.00 0.00 [0.00 0.00]
21 0025 [0.23 0.28] 5% 0.00 0.00 [0.00 0.00]
22 0028 [0.21 0.37] 25% 2.00 1.94 [1.00 2.00]
23 0021 [0.19 0.23] 50% 4.00 4.56 [4.00 5.00]
31 0013 [0.12 0.15] 75% 11.00 11.07 [10.25 12.00]
32 0062 [0.55 0.70] 95% 32.85 33.10 [31.00 35.00]
33 1094 [1.73 2.12] Max 457.00 453.04 [396.00 513.00]
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 581
usage in recent period is below their average. Regard- their current average.18 We also consider two heuris-
ing renewal, we observe that individuals assigned tics for predicting churn: Heuristic C (no usage)
to state 1 have much higher churn rates than those assumes that churn occurs if there is no usage activity
assigned to higher states. (Details of this analysis are during the last two periods, and Heuristic D (lower
provided in Web Appendix E). usage) assumes that churn occurs if an individuals
Now that we have shown the quality of the model average usage over the last two periods is lower than
predictions in the calibration period, we examine the that of the corresponding periods in the previous
year. (The latter heuristic is in the spirit of Berry and
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
is unlike the forecasts obtained using the proposed Table 7 Assessing the Period 12 Predictive Performance of the
method, where all predictions are based on the evo- Usage Models
lution of the underlying commitment state. Aggregate Disaggregate Individual
Bivariate Model. Another way to model our data is (% error) ( 2 ) (MSE)
to assume two latent variables where one variable
Heuristic
determines usage and the other determines retention. A (periodic) 2808 1609 300
Further, one could allow these two variables to be B (status quo) 2108 13707 109
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
20 21
We estimate both specifications varying the number of hidden Note that we are using this statistic as a measure of the match
states. The best-fitting model for the specification with homoge- between the actual and predicted period 12 histograms, not as a
neous usage has three states, whereas that for the homogeneous measure of the goodness of fit of the model. As such, we do not
transition specification has four states. report any p-values.
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
Marketing Science 32(4), pp. 570590, 2013 INFORMS 583
Figure 6 Comparing the Predicted and Actual Distributions of the because their hit rates are higher. However, it should
Number of Transactions in Period 12 be noted that the actual retention rate in the sample is
500 86%, and these two methods predict that almost every
Actual customer renews (98% and 95% predicted renewal
450 Heuristic A
Poisson cross rates). As a consequence, the high figures (for hit rate)
400 Bivariate associated with the logistic regression models are a
Full
350 consequence of their classifying stayers correctly (at
Number of customers
300
Second, we compare the accuracy of the methods
250 when predicting renewal behavior at the end of our
200 validation period (i.e., period 17). The period 17 pre-
dictions for the homogeneous usage specification and
150
the full-model specification are exceptionally accurate
100 at the aggregate level ( 006% and 0.5%, respectively),
50 with hit rates of 67% and 68%. We note that the
predicted renewal rates associated with the homoge-
0
0 1 2 3 4 5 6 7+ neous transition specification are the same for periods
Number of transactions 13 and 17; this is a natural consequence of the model
specification.
4.3.3. Renewal Forecast. The predictive perfor- 4.3.4. Renewal and Usage Forecast. Finally, we
mance of all the churn models is presented in Table 8. consider the overall forecasting accuracy of the mod-
First, we compare actual versus predicted renewal els by examining usage behavior in periods 1416,
rates in period 13. As shown in the table, the two which in turn depends on predicted renewal behavior
dynamic latent variable model specifications with het- in period 13.
erogeneous transition probabilities provide the most We look at actual versus predicted usage lev-
accurate predictions of the future renewal rate (1.0% els in periods 1416, examining the accuracy of the
and 2.7% error). Both heuristics yield very poor predictions at the aggregate, disaggregate, and indi-
predictions of period 13 churn. The two logistic vidual levels (see Table 9). Comparing the aggregate
regression models overestimate future renewal (and MAPE computed across all forecast periods, we find
therefore the size of the customer base) by more than that the full model provides the most accurate predic-
10%. We also compute the hit rate (i.e., the percent- tions over the entire validation period (MAPE = 204%).
age of customers correctly classified) for all methods. The combination of the RFM-based Poisson and logis-
The full model correctly classifies 78% of customers, tic regression models results in very poor estimates
the highest among the three dynamic latent variable of future behavior at the aggregate level. When we
models. At first glance, it appears that the logistic consider the disaggregate-level (average 2 across the
regression models are better than the proposed model three forecast periods) and individual-level (squared
error averaged across the three forecast periods and
the 738 customers who were still active at the end of
Table 8 Assessing the Period 13 and Period 17 Predictive
Performance of the Renewal Models
the calibration period) measures of predictive accu-
racy, we see that the full-model specification is clearly
Period 13 Period 17 superior.
Renewal Hit rate Renewal Hit rate To conclude, we have shown that our proposed
rate (%) % error (%) rate (%) % error (%) dynamic latent variable model accurately predicts
Heuristic
C (no usage) 27 6808 37 Table 9 Assessing the Accuracy of Usage Predictions for
D (lower usage) 63 2605 60 Periods 1416
Logistic regression
Aggregate Disaggregate Individual
Cross-sectional 98 1306 85 51 4306 57
(MAPE) (avg. 2 ) (MSE)
Panel 95 1009 83 68 2407 60
Bivariate model 79 708 71 79 1204 68 RFM
Proposed model Cross-sectional 2604 29108 1805
Homogeneous 87 100 77 90 006 67 Panel 6802 4802 1804
usage Bivariate model 300 3508 508
Homogeneous 81 601 73 81 1509 60 Proposed model
transition Homogeneous usage 408 8402 500
Full specification 88 207 78 91 005 68 Homogeneous transition 1002 2602 305
Actual 86 91 Full specification 204 1600 301
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
584 Marketing Science 32(4), pp. 570590, 2013 INFORMS
Table 10 Comparing the Present Value of Actual and Predicted a very powerful tool for customer valuationthe
Validation-Period Revenue dynamic aspect of the model provides additional
Total revenue ($000) Aggregate % error insights that are managerially useful. Understanding
the evolution of customer churn and usage propen-
RFM cross-sectional 21386 26 sity has the potential to provide marketers operat-
RFM panel 41548 136
ing in contractual business with useful information
Bivariate model 11006 46
Proposed model for issues such as segmentation, cross selling, and the
design of retention programs. In this section we show
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
Prob(State)
Customer A 0.5 0.5 0.5 0.5
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
1 1 1 1
Prob(State)
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
1 1 1 1
Prob(State)
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
notably from period 5 to period 8, which we inter- same. The subsequent drop in purchasing for cus-
pret as a drop in her commitment. Why do we see tomer B and the lack of purchasing by customer C
these differences in the underlying states when their are reflected in the changing inferred probabilities
behavior in periods 58 is the same? The answer is of commitment state membership for the next two
because the state membership probabilities also reflect periods; the model detects that both customers have
the differences in their usage over their entire life- decreased their commitment.
time. As customer B was more active than customer A Finally, it is worth noting that although customers B
in year 1, observing one period with zero purchases and C end up in the same place at the time of
(period 8) is a likely indicator of a drop in underly- renewal (i.e., the lowest commitment level), they got
ing commitment. However, given customer As past there via two different paths. This phenomenon opens
behavior, one period of no purchases does not neces- an interesting question: Are there particular paths
sarily indicate a high risk of churn. that customers go through before cancelling their
Comparing customers B and C, we note that Bs subscription?
purchasing in periods 14 is higher than that of C.
5.2. Identifying Paths to Death
This is a consequence of the differences in i as well
We can extend this analysis by calculating the evo-
as in the commitment levels for those periods; the
lution of commitment for all the customers in our
latter is reflected in the inferred commitment level
sample and then using that information to identify
for period 5customer B has a high probability of
similar paths to death (i.e., the most common com-
being in the highest commitment state, whereas cus-
mitment paths that customers go through before can-
tomer C has an almost equal probability of being in celing their contracts). Knowledge of these different
states 2 or 3as well as in the evolution of com- paths should be of interest to those developing cus-
mitment in subsequent periods. For example, the tomer retention programs.
jump in customer Cs purchasing in period 6 is inter- For all customers who cancel their membership
preted as evidence of an increase in commitment. after one year, we compute the posterior probabilities
Even though customer B made the same number of of belonging to each latent state in each period. These
purchases in period 6, there is little change in the probabilities are computed based on the individual
probability of her being in the highest commitment posterior draws of state membership in each period.24
state because two purchases is not out of the ordi-
nary in light of her transactions in periods 14. The 24
We use probabilities, as opposed to posterior states, because their
inferred probabilities of commitment state member- use allows us to account for uncertainty around the posterior esti-
ship for these two customers are now basically the mate of state membership.
Ascarza and Hardie: A Joint Model of Usage and Churn in Contractual Settings
586 Marketing Science 32(4), pp. 570590, 2013 INFORMS
Prob(State)
Cluster 1
(57%) 0.5 0.5 0.5 0.5
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
1 1 1 1
Prob(State)
Cluster 2
(11%) 0.5 0.5 0.5 0.5
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
1 1 1 1
Prob(State)
Cluster 3
(32%) 0.5 0.5 0.5 0.5
0 0 0 0
1 2 3 1 2 3 1 2 3 1 2 3
State State State State
We then perform a k-means cluster analysis to iden- is a small group of customers who were highly com-
tify groups of customers with similar commitment mitted for most of the year. Cluster 3 corresponds to
evolution patterns (i.e., customers are grouped based those customers who started high but whose commit-
on their probability of belonging to each commitment ment to the organization decreased as time went by
level in all periods before cancellation). Varying the (slow death).
number of clusters from two to four, we find that
5.3. Dynamic Segmentation
a three-cluster solution best represents the data. The
We can also group customers on the basis of their
cluster centroids are plotted in Figure 8.
commitment level at each point in time, resulting in
Cluster 1, representing 57% of the sample, corre-
a dynamic segmentation scheme that could help the
sponds to those customers whose commitment was marketer better understand her customer base. Fig-
very low since the start of their membership. Except ure 10(a) shows how the segment sizes evolve over
for the first period, these customers are always very
likely to belong to the lowest commitment state. Figure 9 Evolution of Commitment States Before Cancellation
In contrast, cluster 2s commitment is at its highest
Cluster 1
level from the moment of acquisition until period 4, 3
Cluster 2
the moment at which it decreases to the medium Cluster 3
level. (Note that this cluster is much smaller than
the first one, representing 11% of the sample.) Finally,
Commitment level
Figure 10 Examining the Dynamics of Segment Sizes How would a company benefit from such segmen-
(a) Number of customers in each segment tation scheme? Segmenting customers on the basis of
1,200 their underlying commitment enables us to not only
State 1 detect at-risk customers (i.e., potential churners) but
State 2 also identify highly committed customers. This is in
1,000 State 3
the marketers interests if, for instance, commitment
is correlated with the purchasing or use of additional
Number of customers
800
products and services.
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
Table 12 Attendance of Other Events in 2004 by Period 12 a set of benchmark models, providing more accurate
Commitment Segment predictions of both churn and usage behaviors. Such
Commitment segment No. of customers Avg. no. of events predictions lie at the heart of any attempt to quan-
tify customer equity, a concept central to any firms
1 62 0053
efforts to become more customer-centric. Forecasts of
2 300 0057
3 376 0076 usage can have additional value, such as in settings
where usage levels affect service quality, which in
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
Krishnamurthi L, Raj SP (1988) A model of brand choice and Reinartz W, Kumar V (2003) The impact of customer relation-
purchase quantity price sensitivities. Marketing Sci. 7(1): ship characteristics on profitable lifetime duration. J. Marketing
120. 67(1):7799.
Kumar V, Venkatesan R, Bohling T, Beckmann D (2008) The power Risselada H, Verhoef PC, Bijmolt THA (2010) Staying power of
of CLV: Managing customer lifetime value at IBM. Marketing churn prediction models. J. Interactive Marketing 24(3):198208.
Sci. 27(4):585599. Rust RT, Zahorik AJ (1993) Customer satisfaction, customer reten-
Larivire B, Van den Poel D (2005) Predicting customer reten- tion, and market share. J. Retailing 69(2):193215.
tion and profitability by using random forests and regression Rust RT, Zeithaml VA, Lemon KN (2001) Driving Customer Equity:
forests techniques. Expert Systems Appl. 29(2):472484. How Customer Lifetime Value Is Reshaping Corporate Strategy
Downloaded from informs.org by [128.59.222.12] on 23 April 2014, at 07:25 . For personal use only, all rights reserved.
Lemmens A, Croux C (2006) Bagging and boosting classification (Free Press, New York).
trees to predict churn. Marketing Res. 43(2):276286. Sabavala DJ, Morrison DG (1981) A nonstationary model of binary
Lemon KN, White TB, Winer RS (2002) Dynamic customer relation- choice applied to media exposure. Management Sci. 27(6):
ship management: Incorporating future considerations into the 637657.
service retention decision. J. Marketing 66(1):114. Schweidel DA, Bradlow ET, Fader PS (2008a) Modeling the evo-
Lu J (2002) Predicting customer churn in the telecommunications lution of customers service portfolios. Working paper, Emory
industryAn application of survival analysis modeling using University, Atlanta. http://ssrn.com/abstract=985639.
SAS . SAS User Group Internat. (SUGI27) Online Proc. (SAS Schweidel DA, Fader PS, Bradlow ET (2008b) Understanding ser-
Institute, Cary, NC), Paper 114. vice retention within and across cohorts using limited infor-
Moe WW, Fader PS (2004a) Capturing evolving visit behavior in mation. J. Marketing 72(1):8294.
clickstream data. J. Interactive Marketing 18(1):519. Schweidel DA, Bradlow ET, Fader PS (2011) Portfolio dynamics
Moe WW, Fader PS (2004b) Dynamic conversion behavior at for customers of a multi-service provider. Management Sci.
e-commerce sites. Management Sci. 50(3):326335. 57(3):471486.
Montoya R, Netzer O, Jedidi K (2010) A dynamic allocation of phar- Scott SL, James GM, Sugar CA (2005) Hidden Markov models
maceutical detailing and sampling for long-term profitability. for longitudinal comparisons. J. Amer. Statist. Assoc. 100(470):
Marketing Sci. 29(5):909924. 359369.
Morgan RYH, Hunt SD (1994) The commitment-trust theory of rela- Sriram S, Chintagunta PK, Manchanda P (2012) The effects of ser-
tionship marketing. J. Marketing 58(3):2038. vice quality on usage and termination of a video on demand
Mozer MC, Wolniewicz R, Grimes DB, Johnson E, Kaushansky H service. Working paper, University of Michigan, Ann Arbor.
(2000) Predicting subscriber dissatisfaction and improving Venkatesan R, Kumar V (2004) A customer lifetime value frame-
retention in the wireless telecommunications industry. IEEE work for customer selection and resource allocation strategy.
Trans. Neural Networks 11(3):690696. J. Marketing 68(4):106125.
Naik PA, Mantrala MK, Sawyer AG (1998) Planning media sched- Verhoef PC (2003) Understanding the effect of customer relation-
ules in the presence of dynamic advertising quality. Marketing ship management efforts on customer retention and customer
Sci. 17(3):214235. share development. J. Marketing 67(4):3045.
Narayanan S, Chintagunta PK, Miravete EJ (2007) The role of self- Wooldridge JM (2002) Econometric Analysis of Cross Section and Panel
selection and usage uncertainty in the demand for local tele- Data (MIT Press, Cambridge, MA).
phone service. Quant. Marketing Econom. 5(1):134. Wbben M, Wangenheim F (2008) Instant customer base analysis:
Netzer O, Lattin JM, Srinivasan V (2008) A hidden Markov Managerial heuristics often get it right. J. Marketing 72(3):8293.
model of customer relationship dynamics. Marketing Sci. Xie JX, Song M, Sirbu M, Wang Q (1997) Kalman filter estimation of
27(2):185204. new product diffusion models. J. Marketing Res. 34(3):378393.
Parr Rud O (2001) Data Mining Cookbook: Modeling Data for Market- Xu J, Zeger SL (2001) Joint analysis of longitudinal data com-
ing, Risk, and Customer Relationship Management (John Wiley & prising repeated measures and times to events. Appl. Statist.
Sons, New York). 50(3):375387.