
Decision Support Systems xxx (2011) xxx–xxx

Journal homepage: www.elsevier.com/locate/dss

Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model
Hyeseon Lee a, Yeonhee Lee a, Hyunbo Cho a, Kwanyoung Im a, Yong Seog Kim b

a Division of Industrial and Management Engineering, Pohang University of Science and Technology, San 31 Hyoja, Pohang 790-784, Republic of Korea
b Management Information Systems Department, Jon M. Huntsman School of Business, 3515 Old Main Hill, Utah State University, Logan, UT 84322-3515, USA

Article info

Article history: Received 27 July 2010; received in revised form 5 June 2011; accepted 24 July 2011; available online xxxx
Keywords: Partial least squares; Customer relationship management; Business intelligence; Churn management

Abstract

In a very competitive mobile telecommunication business environment, marketing managers need a business intelligence model that allows them to maintain an optimal (or at least a near optimal) level of churners effectively and efficiently while minimizing the costs of their marketing programs. As a first step toward an optimal churn management program for marketing managers, this paper focuses on building an accurate and concise predictive model for churn prediction, using a partial least squares (PLS)-based methodology on data sets with highly correlated variables. A preliminary experiment demonstrates that the presented model provides more accurate performance than traditional prediction models and identifies key variables to better understand churning behaviors. Further, a set of simple churn marketing programs (device management, overage management, and complaint management strategies) is presented and discussed. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

For many firms in a very competitive market and business environment, customer churn management becomes one of the most critical success factors, mainly due to higher acquisition costs for new customers. In particular, with exceptionally high annual churn rates (20–40%), firms in the mobile telecommunications industry try to develop predictive models that accurately identify which customers are most likely to terminate the current relationship. Therefore, the optimal selection of customer targets (e.g., most probable churners) and of the marketing campaign size at which the profit of the marketing campaign is maximized has been considered a critical success factor in the customer relationship management (CRM) program. However, according to a recent study [36] based on a Dutch survey involving 228 database marketing companies, many marketing managers still select their targets either intuitively or based on long-standing methods such as cross tabulation or RFM (recency, frequency, monetary), a segmentation-based database marketing method [32]. For example, in RFM analysis, marketing managers first divide customers into a total of 125 groups (5 groups for each of the three past purchasing pattern dimensions) in such a way that customers within the same group are similar in terms of past purchasing patterns such as how recently they have made a purchase (recency), how often they have made purchases (frequency), and how much they have spent (monetary). Then, marketers narrow down their focus on
A preliminary version of this paper was published and presented in AMCIS 2010.
Corresponding author. Tel.: +1 435 797 2271; fax: +1 435 797 2351. E-mail address: yong.kim@usu.edu (Y.S. Kim).
0167-9236/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2011.07.005

customer groups who have recently made a purchase, have made multiple purchases, and have spent more money in their customer lifetime. This RFM analysis and its variants have been successfully used for more than 30 years because they are simple and easy to apply. However, they are very limited in the sense that they cannot be applied to new customers in target markets because no RFM information is available for new customers. In addition, they do not provide any insights about new customers and markets, and hence cannot be used to expand customer bases: marketers focus only on customers with high RFM scores, which in turn boosts the RFM scores of targeted customers while lowering those of customers with low RFM scores. Finally, RFM analysis may not be applicable to certain products. For example, customers who recently purchased an expensive car are not likely to buy another car very soon. In contrast, customers who purchased a car several years ago are more likely to purchase another car.

Fortunately, the rapid development of new data mining and business intelligence models over the past two decades in computer science, cognitive science, and information systems makes it possible for marketing managers to disseminate micromarketing messages targeted toward a specific group of households that are most likely to be open to the customized incentives. When a churn management program is operated successfully, marketing managers can further identify the most likely churners in advance, and then develop and present the right retention offers customized across different customer segments. Both marketing [4,29] and data mining researchers [1,6,20,35] have presented various database marketing approaches for successful CRM programs. Several studies presented

Please cite this article as: H. Lee, et al., Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model, Decis. Support Syst. (2011), doi:10.1016/j.dss.2011.07.005


models to target households by using the knowledge extracted from the customer's purchase history [11,32]. In other studies [25,26], the profitability condition of a campaign was explicitly formulated as a function of the model performance along with campaign cost and revenue factors such as mailing costs and marginal revenue per identified positive record. In another study [16], a business intelligence model was presented to identify as many customers as possible who will respond to a specific solicitation campaign letter by studying the effects of variable selection and class distribution on the performance of both a primitive classifier system and a relatively more sophisticated classifier system. Other studies showed the effects of customer satisfaction and switching barriers on customer loyalty in Korea [15] and the relationship among customer retention, loyalty, and satisfaction in Germany [9]. A few models narrow their focus to tasks such as selecting prospects in the automotive industry [10] and identifying the most likely insurance buyers [17]. In particular, several studies aim to provide accurate models for churn prediction in the telecommunication industry [3,7,37] based on various statistical and data mining models such as neural network models, support vector machines, or clustering algorithms along with customers' demographics, billing, and call detail records. A recent discussion of the advantages and disadvantages of various models for churn management can be found in [14]. However, most of these models have focused not on identifying critical success factors but only on building an accurate churn prediction model, which may serve only as the first step in developing churn marketing programs. In this paper, we present not only a prediction model to accurately predict customers' churning behavior but also a simple but implementable churn marketing program.
Note that for successful initiation and management of churn marketing programs, both the predictive accuracy and the comprehensibility of a model become important. In particular, a rule-based system that consists of too many if-then statements cannot provide the key drivers of consumer behaviors and hence greatly reduces managers' trust in the system itself. To address both issues simultaneously, we propose a partial least squares (PLS)-based methodology that allows a marketing manager to maintain an optimal (or at least a near optimal) level of churners effectively and efficiently through her marketing programs. The detailed objectives are: (i) to build an accurate and concise predictive model for churner prediction based on PLS-based methodologies from a vast amount of data with highly correlated variables; (ii) to conceptually design an effective churn control model after understanding key drivers of consumer behaviors; and (iii) to develop a segmentation-based marketing campaign strategy, to validate its effectiveness for a chosen campaign size, and to determine an optimal marketing campaign size that maximizes the campaign profit.

In our approach, PLS is employed as the prediction modeling method because it places minimal demands on measurement scales, sample size, and residual distributions, and it is capable of handling a large number of highly correlated variables, measurement errors, and missing data [21,24,38]. Further, PLS models can naturally be used for dimension reduction through a variable selection mechanism based on the variable importance in projection (VIP) scores. Therefore, PLS models can be used not only for constructing highly accurate models but also for enhancing the comprehensibility of models by choosing a subset of the original predictive variables. By doing so, marketing managers can save a great amount of effort and cost in identifying key determinants of customers' churn behaviors.
However, eliminating many input variables may have different effects on the predictive accuracy of models depending on their representational powers and structural complexities. Therefore, this study aims to analyze the relationship between variable selection and the performance of several PLS models. The remainder of this paper is organized as follows. Section 2 provides a review of PLS methodology. In Section 3, the research

framework based on PLS models is introduced, and evaluation metrics to compare various predictive models on a data set collected by a mobile phone service provider are explained. In Section 4, experimental results of linear and nonlinear PLS models compared to Logit and random models are presented. Section 5 illustrates a scenario of the optimal churn marketing campaigns based on customer segmentations. Finally, Section 6 provides the conclusion of the paper and suggests several directions for further research.

2. Literature review

Various statistical models such as logistic and ordinary regression models, canonical correlation, or structural equation modeling (SEM) have been used to investigate relationships between independent variables (predictors) and dependent variables (responses). Note that while the ordinary regression method models a continuous dependent variable as a linear combination of continuous predictor variables, the logistic regression method (or Logit model) models the log odds of a binary dependent variable as a linear combination of a set of predictor variables that may be continuous, discrete, dichotomous, or a mix of any of these. In particular, binomial logistic regression has been frequently used to estimate the odds of a certain event occurring based on the maximum likelihood estimation (MLE) technique. In addition, logistic regression does not require normally distributed variables and does not assume homoscedasticity, and it has been widely adopted by marketing researchers to study brand choice behaviors of customers [12,13]. While the logistic regression model has been successful in several marketing research areas, a new multivariate projection approach is studied for predicting churning behaviors in this study.
The PLS method is one such method, and it can also be used to reduce the original large-scale data to lower dimensional data in order to deal with highly (both linearly and nonlinearly) correlated data between independent variables and dependent variables [8,19,22]. Therefore, one of the main objectives in PLS analysis is to find a few PLS factors that explain most of the variation in both independent and dependent variables. The PLS factors that explain most of the variation in responses using observed information of the predictors can constitute good predictive models for new responses. Fundamentally, the PLS method can be used as an alternative to well known models such as logistic or ordinary linear regression. One of the most popular application areas of the PLS method is process optimization. In such an application, a PLS model is applied to extract latent variables as a linear or nonlinear combination of process variables and to determine the optimal settings of the process variables from the latent variable spaces [5]. In another recent study [38], the PLS method is compared with a well known data mining algorithm, random forest, to identify the key variables that cause much variation in bulk vaccine yield so that vaccine researchers can optimize new control processes to reduce the variation. In another study [18], PLS is utilized with some success in building a predictive system to assess failure probability in small to medium-sized Finnish firms using financial and non-financial variables and reorganization plan information. Note that the PLS method can model both multiple responses and multiple predictor variables, even when multicollinearity among predictors is suspected. Further, the PLS method is known to be robust when there are many observations with missing values in the data.
However, it is difficult for the researcher to interpret loadings of the independent latent variables from the PLS method, and hence it is favored as a predictive technique and not as an interpretive technique. The PLS method can be implemented either as a regression model to predict response variables or as a path model to understand the structural relationship among records. In this study, the PLS method is used as a regression model to predict churners based on demographic, psychographic, and historical service usage information. We also note that in prediction tasks, the number of latent variables is an important


factor that affects the predictive accuracy. The number of latent variables is usually chosen by cross-validation, considering the proportion of variation explained by each latent variable. The variable importance in projection (VIP) has been proposed and used as a measure of the importance of each variable in contributing to the response of interest. After the PLS model is fitted, the VIP of the j-th variable is calculated as the sum of weights on the latent variables from the j-th variable divided by the total weights. Since the average of squared VIP over all variables is 1, the "greater than one" rule is generally used as a criterion for selecting significant variables.

In nonlinear PLS models, the nonlinear functional relationship can be constructed by algorithms using neural networks or a Gaussian kernel. Qin and McAvoy [27] propose neural net PLS to approximate the nonlinear mapping between the input and output score variables, and show better prediction than PLS and a direct neural network. Malthouse et al. [22] implement nonlinear PLS with a feed-forward neural network and apply the model with multiple response variables. Nonlinear PLS based on neural network mapping requires much computational time and has limitations when applied to large data sets. Rosipal and Trejo [30] proposed a kernel-based partial least squares regression method. Kernel methods are learning methods that find nonlinear patterns using linear models. By moving input data to a higher dimensional space using mapping functions, linear patterns in the expanded space, which represent nonlinear patterns in the original feature space, can be found [33]. It is also possible to detect nonlinear patterns in the original input space without an explicit definition of mapping functions by using a kernel function, which represents an inner product of two vectors in the expanded space.
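The VIP rule described earlier in this section can be illustrated with a short sketch. It assumes the commonly used VIP formula, in which each latent component's normalized squared weights are weighted by the response variation that component explains; the paper describes the computation only loosely, so this form is an assumption. The function name `vip_scores` and the toy weights and explained sums of squares are hypothetical.

```python
import math

def vip_scores(weights, explained_ss):
    """Compute VIP scores under the commonly used formula (an assumption here).

    weights[a][j]   : PLS weight of variable j on latent component a
    explained_ss[a] : response sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)  # squared norm of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores

# Hypothetical two-component model with four predictors.
weights = [[0.8, 0.5, 0.1, 0.3],
           [0.2, 0.1, 0.9, 0.4]]
explained_ss = [6.0, 2.0]

vip = vip_scores(weights, explained_ss)
# "Greater than one" rule: keep variables whose VIP is at least the cut-off.
selected = [j for j, v in enumerate(vip) if v >= 1.0]
```

Under this formula the squared VIP scores average to exactly 1 by construction, which is why a cut-off near 1.0 is a natural default.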
For this study, we used a Gaussian kernel function for the nonlinear PLS, which works especially well in cases where the dimension of the original input space is so low that expanding the input space into a higher dimensional abstract space provides much more information than before.
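As a minimal sketch of the kernel idea, the following computes a Gaussian kernel matrix directly from input vectors, without ever constructing the higher dimensional mapping. It assumes the kernel form exp(-||x_i - x_j||^2 / k) with a width parameter k; the helper names and the sample points are hypothetical.

```python
import math

def gaussian_kernel(x_i, x_j, width):
    """Gaussian kernel value exp(-||x_i - x_j||^2 / width)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / width)

def kernel_matrix(rows, width):
    """Pairwise kernel (Gram) matrix for a list of input vectors."""
    return [[gaussian_kernel(r, s, width) for s in rows] for r in rows]

# Hypothetical two-dimensional inputs.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
gram = kernel_matrix(points, width=1.0)
```

A kernel PLS procedure would then operate on this n-by-n matrix in place of the original predictor matrix, which is one reason such methods become expensive as the number of observations grows.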

3. Research model and data set

3.1. Research model

Our research framework consists of four sequential steps based on typical CRM processes, as illustrated in Fig. 1. As the first step, it is necessary to preprocess raw data into a readily available format for further analysis. In this study, two different techniques, eliminating records with missing values and variable selection, are used separately or together for preprocessing raw data. Once preprocessed data sets are obtained from raw data, three different types of classifiers (Logit regression, linear PLS, and nonlinear PLS models) are calibrated and evaluated. In this process, the performance of a random model is also evaluated to highlight the additional gains in predictive power from using any one of the three intelligent models. As one of the intelligent model types, several linear PLS models are calibrated and evaluated in terms of comprehensibility, computational complexity, and predictive power related metrics such as hit ratio. Then a nonlinear PLS model and Logit regression models are constructed and compared with the most accurate linear PLS model. Multiple Logit regression models with different pre-defined significance values are considered to estimate the effectiveness of dimension reduction via stepwise variable selection and the PLS method. We also intend to compare a linear PLS model to a nonlinear model to see if a nonlinear relationship mapping between predictors and response variables can boost the predictive power of the linear PLS models. Finally, managerial insights extracted from experimental results are discussed to help data analysts and marketing managers develop a successful churn management program by improving service quality and developing management strategies through hardware replacement, complaint management, and service quality improvement.

Fig. 1. Research framework.

3.2. Data set

The data sets used in this study are provided by the Teradata Center for CRM at Duke University, and the original data set for calibration has 171 predictor variables and 100,000 observations. The complete set of variables includes three types of variables: behavioral information such as minutes of use, revenue, and handset equipment; company interaction information such as customer calls into the customer service center; and customer household demographics. For each customer, churn was calculated based on whether the customer left the company during the 31- to 60-day period after the customer was originally sampled. Although the actual percentage of customers who left the company in a given month is approximately 1.8%, churners in the original data set were oversampled to create roughly a 50-50 split between churners and nonchurners. However, the test data set with 51,306 observations is expected to represent a realistic churning rate of 1.8%. The original data sets are preprocessed for further analysis as follows.

First, most categorical variables are excluded, either because of a high missing rate or because they would have to be encoded into multiple binary variables, which yields low predictive power. We include only 11 categorical variables, which are either indicator variables or countable variables such as number of handsets and number of subscribers. This is because each categorical variable has very little predictive power in general [31]. Second, continuous variables with more than 20% of missing values are eliminated. This leaves 123 predictors, comprising 11 categorical variables and 112 continuous variables. Finally, records with missing values in the data set with 123 predictors are removed from further analysis. After these preprocessing steps, the training set contains 67,181 observations with 32,862 churners, while the test set contains 34,986 observations with 619 churners.

3.3. Evaluation metrics

In this study, the hit rate and the lift trend curve are used to numerically quantify the predictive power of models and to graphically represent the performance for easy comparison, respectively. The hit


rate is defined in this study as the number of correctly identified churners out of the churner candidates. When only the x% of customers predicted most likely to churn are considered for the model evaluation, it is called the hit rate at target point x%. For example, if the model is required to select the 1000 customers who are most likely to churn from 10,000 observations, and 200 of them turn out to be actual churners, then the hit rate at the 10% target point (1000/10,000 = 10%) is 20% (200/1000 = 20%). The lift trend curve shows the lift at the x% target point, where lift is the hit rate of a predictive model divided by the hit rate of a random model. This paper uses the raw number of correctly identified churners over small target points because of the limited budget and time constraints for developing marketing programs.

4. Experimental results

4.1. Stepwise variable selection vs. PLS variable selection

The PLS models can be used to reduce data dimension using the variable importance in projection (VIP) score of each variable. Table 1 presents the ranked list of only those variables whose VIP score is higher than or equal to 1.0. According to Wold [39], the researcher may safely remove any independent variable which has a small VIP (<0.8) and a small absolute value of regression coefficient. As an exploratory study, we use a set of VIP cut-off values (0, 1.0, 1.2, and 1.5). We denote by PLS1.0 a PLS model with all variables whose VIP scores are greater than or equal to 1.0, while the PLSall model utilizes all variables (i.e., PLS0). Therefore, the PLS1.5 model is the most parsimonious PLS model while the PLSall model is the most comprehensive model. We also indicate in Table 1 whether each of these variables positively or negatively affects the churning decision of service users.

Table 1. Variables with VIP scores based on PLS1.0 model.

Rank | Variable description | VIP score | Churn impact
1 | Number of days (age) of current equipment | 2.995 | +
2 | Handset: refurbished or new | 1.9089 | −
3 | Age of first household member | 1.7519 | −
4 | Average monthly minutes of use over the life of the customer | 1.5625 | +
5 | Range of revenue of voice overage | 1.5562 | +
6 | Range of revenue of overage | 1.5556 | +
7 | Percentage change in monthly minutes of use vs. previous three month average | 1.5406 | −
8 | Range of overage minutes of use | 1.5302 | +
9 | Average monthly number of calls over the life of the customer | 1.5211 | +
10 | Mean revenue of voice overage | 1.4597 |
11 | Mean overage revenue | 1.4586 |
12 | Account spending limit | 1.4448 |
13 | Mean overage minutes of use | 1.4039 |
14 | Range of revenue (charge amount) | 1.3976 |
15 | Total number of months in service | 1.3268 |
16 | Mean total monthly recurring charge | 1.3051 |
17 | Number of handsets issued | 1.3 |
18 | Range of number of minutes of use | 1.2882 |
19 | Range of number of attempted voice calls placed | 1.2703 |
20 | Range of number of completed voice calls | 1.2666 |
21 | Range of number of attempted calls | 1.2655 |
22 | Range of number of completed calls | 1.2597 |
23 | Range of number of inbound and outbound peak voice calls | 1.2536 |
24 | Mean monthly revenue (charge amount) | 1.2278 |
25 | Mean rounded minutes of use of customer care calls | 1.1984 |
26 | Number of models issued | 1.1964 |
27 | Range of unrounded minutes of use of peak voice calls | 1.182 |
28 | Mean unrounded minutes of use of customer care (see CUSTCARE_MEAN) calls | 1.1652 |
29 | Average monthly revenue over the previous 3 months | 1.1572 |
30 | Average monthly revenue over the life of the customer | 1.1497 |
31 | Mean number of customer care calls | 1.1307 |
32 | Total number of calls over the life of the customer | 1.1162 |
33 | Mean number of monthly minutes of use | 1.1101 |
34 | Number of unique subscribers in the household | 1.1094 |
35 | Billing adjusted total number of calls over the life of the customer | 1.1091 |
36 | Mean number of dropped (failed) voice calls | 1.0683 | +
37 | Average monthly number of calls over the previous 6 months | 1.055 |
38 | Total minutes of use over the life of the customer | 1.0485 |
39 | Range of rounded minutes of use of customer care calls | 1.0447 |
40 | Range of number of received voice calls | 1.0445 |
41 | Billing adjusted total minutes of use over the life of the customer | 1.0375 |
42 | Average monthly minutes of use over the previous 3 months | 1.0375 |
43 | Range of unrounded minutes of use of customer care calls | 1.0351 | +
44 | Range of total monthly recurring charge | 1.0299 |
45 | Range of unrounded minutes of use of completed voice calls | 1.0221 |
46 | Range of number of off-peak voice calls | 1.0036 |

For example, the number of days (age) of current equipment variable is ranked first because it has the highest VIP score (2.995), and it is positively associated with the churning decision of service users, meaning that a service user who has kept her current mobile equipment longer is more likely to switch to another service provider. It is also noted that the variable indicating whether a subscriber's handset is refurbished or new (handset: refurbished or new) is considered the second most important variable, although it is negatively associated with the churning decision of service users. Due to the limited space, we focus on the first nine variables in Table 1, which constitute the PLS1.5 model because their VIP scores are higher than or equal to 1.5, and leave the interpretation of the remaining variables to the reader.

The most interesting fact that we noticed is that the two most predictive variables are related to mobile handsets, and it is not very difficult to intuitively link these variables to churn behavior. For example, many mobile phone service subscribers in the USA are required to enroll in a two-year contract on the condition that certain handsets are given free. Most subscribers are not likely to churn to another service provider until the contract expires, in order to avoid a financial penalty. When the contract period is about to end and/or the utility (or benefit) from adopting a new handset with more services and better interfaces from another service provider outweighs the switching cost, a user is more likely to switch to a new service provider. Therefore, the number of days of current equipment is a strong indicator of whether the current subscriber may switch or stay. The second most important variable, handset: refurbished or new, implies almost the same insight. Typically, a customer is most likely to purchase a new handset as part of a promotion package from the current service provider and hence to stay with the current provider until the end of the contract period.

However, if a used or refurbished handset was purchased without any obligation, the customer is more likely to switch to a new service provider. From the marketing managers' perspective, the aforementioned findings imply that marketers must pay attention to current users who have used the current handset for almost their entire contract period or whose contract periods end in the near future. Once users with higher

probability of churning are identified, marketing managers can immediately contact them to offer customized micromarketing messages before the contract period ends or users switch to another service provider. For example, about 1 month before the end of the contract period, the marketer may send an appreciation letter to offer a new handset with more advanced functions at a reasonable price as long as the current user agrees to stay with the current service provider for another contract period. Other variables of the user's service usage patterns are also informative for predicting churn behaviors. For example, heavy service users with a higher range of overage revenue and number of minutes of use (5th, 6th, and 8th variables in Table 1), or a higher value of average monthly number of calls over the life of the customer (9th variable in Table 1), are most likely to churn unless they are extremely satisfied with


the current service quality. In addition, when users do not use the mobile service as much as they have used it in the past 3 months (7th variable in Table 1), they are likely to switch. Further, users who have been through many billing adjustments over their lifetime (4th variable in Table 1) are most likely to churn, partly because this is a possible indicator of poor satisfaction with the current service. Interestingly, the age of the first household member affects the user's churning behavior negatively (i.e., older users tend to be more loyal to the current service provider than young users), indicating that young users are more responsive to new service plans associated with the cost, functionality, and design of mobile devices.

In order to measure the effectiveness of dimension reduction of the proposed PLS models, a simple variable selection procedure, stepwise variable selection, is performed. Once a set of informative variables is selected through forward selection, they are used as predictor variables of Logit regression models to estimate the likelihood of becoming churners. Note that the stepwise variable selection procedure starts with the empty set of variables and greedily adds the variables that most improve performance based on χ² scores. It stops adding new variables when no additional variables satisfy the predefined significance level (e.g., α = 0.05) for entry into the model. Two different significance levels are used (α = 0.05 and α = 0.15). From the perspective of model interpretation, a predictive model with a parsimonious variable subset (i.e., α set to 0.05) is preferred because of improved comprehensibility, as long as the performance of the two models is comparable. Further, the choice of a simpler model is consistent with the well known Occam's razor principle, which states that if all other things are equal, a simpler model generalizes better and hence is preferred [2].
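The forward selection loop described above can be sketched as follows. This is an illustrative skeleton, not the authors' implementation: the entry test is passed in as a caller-supplied function returning a p-value (in practice this would come from a χ² score test on a refitted Logit model), and the variable names and p-values below are hypothetical.

```python
def forward_select(candidates, entry_p_value, alpha=0.05):
    """Greedy forward selection with a significance level for entry.

    entry_p_value(selected, candidate) must return the p-value of adding
    `candidate` to a model that already contains `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        p_values = {c: entry_p_value(selected, c) for c in remaining}
        best = min(remaining, key=lambda c: p_values[c])
        if p_values[best] >= alpha:
            break  # no remaining variable meets the entry criterion
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical, fixed p-values; a real test would depend on `selected`.
toy_p = {"equipment_age": 0.001, "refurbished": 0.01,
         "household_age": 0.10, "noise": 0.60}

def toy_test(selected, candidate):
    return toy_p[candidate]

strict = forward_select(toy_p, toy_test, alpha=0.05)
loose = forward_select(toy_p, toy_test, alpha=0.15)
```

With these toy values the stricter level admits two variables and the looser level admits three, mirroring the trade-off between parsimony and coverage discussed in the text.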
Due to the limited space, we only compare the first nine variables selected by the two models: the stepwise variable selection with α = 0.05 and the PLS1.5 model-based variable selection (VIP ≥ 1.5). These variables are listed in Table 2 in the order of selection in the stepwise procedure or in the order of VIP scores in the PLS variable selection. Variables in bold are selected by both models. We immediately noticed that the two most important variables identified by the PLS1.5 model are also chosen in the same order by the stepwise variable selection with α = 0.05. In addition, five variables are selected by both methods. Overall, the selected variables reflect customers' usage behaviors and interactions with the company, which is in line with marketing science work [31].

4.2. Churner prediction with linear PLS models

In this section, we present the performance of linear PLS models in a prediction task to correctly identify churners. Fig. 2
Table 2. List of variables selected via stepwise variable selection and PLS.
Variable selection method | Selected variables

presents the hit rates of linear PLS models combined with four different cut-off values of VIP scores. Unfortunately, Fig. 2 shows that PLS models with more input variables are more predictive. Note, however, that all PLS models perform significantly better than a random model, whose proportion of hit records is always equal to the proportion of chosen records. Both the PLS all and PLS 1.0 models correctly identify more than 30% of churners at the 20% target point. In particular, the performance of the PLS 1.0 model is very comparable to that of the PLS all model when a small portion of records is chosen for the churn management program, and hence it can be used as an alternative to the PLS all model when both the costs and benefits of the churn management program are considered. Note that with more targets, a marketing manager needs to spend more money and time to distribute campaign messages and offer discounted service fees or free handsets.

Fig. 3 presents the relative performance of all linear PLS models compared to a random model. The x axis represents the proportion of chosen records, and the y axis represents the lift, defined as the hit rate of a given model divided by the hit rate of a random model at a chosen record proportion. Therefore, lift values higher than 1.0 indicate that the compared model performs better than a random model. We immediately note that both the PLS all and PLS 1.0 models identify twice as many hit records (= correctly identified churners) as a random model, while other PLS models such as PLS 1.2 and PLS 1.5 also identify about 60% more churners at the 5% selection point. Naturally, all PLS models show decreasing lift values as more users with a lower estimated probability of churning are included in the pool of churner candidates.
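The hit rate and lift measures used in Figs. 2 and 3 can be computed as in the minimal sketch below. It assumes that the hit rate at a target point is the share of all churners found among the top-scored records, which matches the statement that a random model's hit rate equals the proportion of chosen records. The function names, scores, and labels are illustrative and not part of the original study.

```python
from typing import Sequence


def n_chosen(n_records: int, proportion: float) -> int:
    """Number of records selected at a given target proportion (at least 1)."""
    return max(1, int(round(proportion * n_records)))


def hit_rate(scores: Sequence[float], labels: Sequence[int], proportion: float) -> float:
    """Share of all churners (label 1) captured in the top-scored records."""
    k = n_chosen(len(scores), proportion)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / sum(labels)


def lift(scores: Sequence[float], labels: Sequence[int], proportion: float) -> float:
    """Hit rate of the model divided by the hit rate of a random model."""
    random_hit_rate = n_chosen(len(scores), proportion) / len(scores)
    return hit_rate(scores, labels, proportion) / random_hit_rate


# Toy example: ten customers, four of whom are churners.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

for p in (0.2, 0.5, 1.0):
    print(p, hit_rate(scores, labels, p), lift(scores, labels, p))
```

When every record is selected, the lift necessarily falls to 1.0, which is consistent with the decreasing lift trends discussed above.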
However, we note that parsimonious PLS models with higher VIP cut-off scores, PLS 1.2 and PLS 1.5, show relatively stable lift trends compared to PLS models with lower VIP cut-off scores such as the PLS all and PLS 1.0 models, partly because of their relatively lower lift values. However, we also attribute this finding to the fact that parsimonious PLS models typically generalize better with a smaller number of input variables for a prediction task.

4.3. Comparative analysis for churner prediction

We also compare the prediction performance of PLS models to other popular classification models in the marketing community: two Logit models, an artificial neural network (ANN) model, a decision tree (DT) classifier, a Naïve Bayes (NB) model, and a random model. For PLS models, we choose a linear PLS model, PLS all, that shows the best performance out of all linear PLS models in Fig. 2. We also implement a nonlinear PLS model, PLS kernel, using a Gaussian kernel function, \(K(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / k)\), where \(x_i\) and \(x_j\) represent the i th

Stepwise variable selection (α = 0.05) | 1. Number of days of current equipment*, 2. Handset (refurbished or new)*, 3. Range of overage minutes of use*, 4. Mean of unrounded minutes of use of completed voice calls, 5. Age of first household member*, 6. Total number of months in service, 7. Account spending limit, 8. Billing adjusted total minutes over the life of the customer*, 9. Mean of number of minutes of use
PLS 1.5 model-based variable selection (VIP cut-off of 1.5) | 1. Number of days of current equipment*, 2. Handset (refurbished or new)*, 3. Age of first household member*, 4. Billing adjusted total minutes over the life of the customer*, 5. Range of revenue of voice overage, 6. Range of revenue of overage, 7. Percentage change in monthly minutes of use vs. previous 3 month average, 8. Range of overage minutes of use*, 9. Average monthly number of calls over the life of the customer
Variables marked with * (set in bold type in the original): variables selected by both selection models.

Fig. 2. Hit rates of linear PLS models.

Please cite this article as: H. Lee, et al., Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model, Decis. Support Syst. (2011), doi:10.1016/j.dss.2011.07.005

H. Lee et al. / Decision Support Systems xxx (2011) xxxxxx

Fig. 3. Lift trends of linear PLS models.

Fig. 4. Hit rates of single classifiers.

and j th records of X, and k is a parameter of the Gaussian kernel function. To lessen the computational load, we compute the values of the kernel function from 6000 samples with an even class distribution drawn from the training data, after standardizing the values of observations for each variable; the value of k is subjectively set at 200. For comparison purposes, two Logit models are implemented: Logit 0.15 with α = 0.15 and Logit 0.05 with α = 0.05. Logit 0.05 is the more parsimonious model because the stepwise variable selection method stops adding new variables under a stricter significance level. Although the two models are very comparable, the Logit 0.15 model was slightly better in terms of prediction power and hence was chosen for the prediction performance comparison. Note, however, that for interpretation we prefer the Logit 0.05 model to the Logit 0.15 model because it helps marketing managers better understand churn behaviors from fewer variables and hence develop new micromarketing programs easily.

To implement the ANN, DT, and NB classifiers, we use a popular data mining package, Weka, which is publicly available at www.cs.waikato.ac.nz/ml/weka/. Note that while all these algorithms have been tested on various data sets in almost all business applications [34][40], they require a different number and kind of parameters, and their performance and computational complexity depend on a set of parameter values optimized for each data set. For example, an ANN model is known to be very accurate and robust to noise in data sets [23], but its prediction performance is strongly dependent on various parameters such as the network architecture (e.g., a three-layered back-propagation network), the transfer function (e.g., a sigmoid transformation), the number of hidden nodes, the learning rate, the momentum rate, the weights and biases on each connection among neurons, and the number of learning epochs.
However, we implement a standard three-layered ANN classifier with back-propagation learning, using all default parameter values predetermined by Weka, for easy replication of our experimental results. Another popular classifier, DT, provides more intuitive outputs in the form of a list of if-then rules and has relatively few parameters compared with ANNs. In our implementation, we use a pruned C4.5 tree [28] based on an entropy-based gain ratio criterion to split records, with all default parameter values. The Naïve Bayes model is one of the simplest models and requires no parameters; it uses Bayes' theorem to estimate the conditional probability of response values. We use the same hit rate and corresponding lift trend curve to compare the performance of the prediction models, and graphically summarize their performance in Figs. 4 and 5.
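As a concrete illustration of the Gaussian kernel used by the PLS kernel model earlier in this section, the sketch below evaluates \(K(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / k)\) with k = 200 on a few toy records. It covers only the kernel (Gram) matrix, not the kernel PLS fit itself, and the function names and data are assumptions made for the example.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x_i: Sequence[float], x_j: Sequence[float], k: float = 200.0) -> float:
    """Gaussian kernel exp(-||x_i - x_j||^2 / k) between two records."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / k)


def kernel_matrix(records: Sequence[Sequence[float]], k: float = 200.0) -> List[List[float]]:
    """Symmetric Gram matrix of kernel values over all pairs of records."""
    return [[gaussian_kernel(r, c, k) for c in records] for r in records]


# Toy standardized records (illustrative values only).
X = [[0.0, 0.0], [10.0, 10.0], [1.0, -1.0]]
K = kernel_matrix(X)
for row in K:
    print([round(v, 4) for v in row])
```

Because the Gram matrix grows quadratically with the number of records, computing it on a 6000-record sample rather than the full training data keeps memory and run time manageable.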

First of all, Fig. 4 presents the hit rates of six predictive models: a linear PLS classifier (PLS all), a nonlinear PLS classifier (PLS kernel), Logit 0.15, a DT, an ANN, and a random model. We note that while both the PLS all and PLS kernel models perform best in general, PLS kernel performs slightly better than the PLS all model, possibly because a nonlinear PLS model is suitable for fitting a nonlinear relationship between the churn indicator and all other variables. In particular, PLS kernel performs relatively better at the 5% and 10% target points, returning higher lift values than all the other models. Surprisingly, the DT and ANN models show disappointing performance, while a simple Logit 0.15 performs reasonably well. In particular, a DT model with default parameter settings results in a very large tree structure with 8839 leaves (the total size of the tree is 17,677) whose performance is not significantly different from that of a random model. Considering that its tree structure consists of too many if-then rules, we attribute its low performance to the default settings of the tree parameters or to possible overfitting, in which a calibrated model explains the training data set too well but predicts poorly on a new

Fig. 5. Lift trends of single classifier models compared to that of a random model.



test data set. The performance of the NB model is not distinguishable from that of the DT model, and hence is not shown in Figs. 4 and 5. Since no parameter tuning is required for an NB model, we attribute its poor performance to the fact that its probability estimates can be biased by the assumption, unrealistic in our study, that input variables are conditionally independent of each other given the value of the class indicator.

Fig. 5 shows the lift trends of the classifier models compared to that of a random model. All classifiers except the DT model (i.e., PLS all, PLS kernel, Logit 0.15, and the ANN model) show lift values that are highest at the 5% selection point and then gradually decrease as more prospects are targeted. The lift values of the DT model at the 5% and 10% selection points are lower than 1, indicating that its performance is worse than that of a random model, while at other selection points its performance is better than that of a random model.

One thing to note is that there are various indices for comparing classifiers, such as predictive accuracy, computational complexity in terms of memory and CPU time requirements, and ease of interpretation of the resulting models. For example, in our experiment, the ANN model performs significantly better than a random model and the DT model in terms of predictive accuracy, but it takes much longer to calibrate (more than 5 h for the ANN vs. 140.9 s for the DT), and its final neural network structure is often not immediately understandable due to its black-box nature. Note also that it is possible to boost the performance of all classifier models, including not only the ANN and DT models but also the PLS all and PLS kernel models, by tuning their parameter values to the data set in use. In particular, an ensemble model that combines multiple single classifiers may significantly boost predictive performance.
Since such a comprehensive comparative analysis requires testing numerous models calibrated with different parameter values and ensemble construction strategies, we leave this important issue as a possible future research direction.

5. Segmentation based churn marketing campaigns

In this section, we propose a simple marketing campaign that combines our PLS-based churner identification model with a standard clustering analysis to demonstrate the applicability of the proposed approach in a real-world situation and to recommend the most appropriate campaign size, at which the profit from a marketing campaign is maximized. The overall procedure for the marketing campaign is as follows. For a chosen campaign size (e.g., the top 5% of customers based on their churn probability), we first segment customers into five different groups using the K-means algorithm, one of the most well-known clustering methods. Note that both the number of clusters and the properties of the variables used for clustering are the main factors that determine the outcomes of clustering analysis. Although a method for identifying the optimal number of clusters has yet to be developed, theoretically the number of clusters ranges between one and the number of data points. In this study, we subjectively form five clusters for easy demonstration of our approach. In terms of variables for clustering analysis, we note that the comprehensibility and applicability of marketing programs are important for marketing managers. Therefore, we first identify a set of variables that are known to be the most significant factors in explaining customer churn behaviors, based on their highest VIP scores. Then, we select seven variables necessary for the development and implementation of marketing strategies. The seven variables and marketing programs will be discussed in detail later.
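The segmentation step can be illustrated with a minimal K-means sketch (Lloyd's algorithm). The study clusters the selected customers into five groups along seven variables; the toy example below uses two clusters in two dimensions to stay readable. Passing explicit initial centroids is an assumption made here for reproducibility, since the paper does not state how the clusters were initialized.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [list(c) for c in initial_centroids]
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data: two well-separated groups (illustrative values only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(assignment)
print(centroids)
```

The returned centroids play the role of the cluster centroid values that are later compared across strategy dimensions.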
Once five clusters of customers are formed, we analyze the characteristics of each cluster to determine which marketing strategy is most suitable, and apply only the best-fit marketing strategy to each cluster. Then, we compute revenue and cost to verify the effectiveness of the marketing campaign for each cluster. By

repeating the aforementioned steps over a set of campaign sizes (e.g., the top 5%, 10%, 15%, 20%, 25%, and 30% of customers based on their churn probabilities), we also recommend the best campaign size in terms of estimated marketing profits.

5.1. Marketing program development

Our proposed marketing program consists of three marketing strategies: device management strategy (DMS), overage management strategy (OMS), and complaints management strategy (CMS). The main purpose of DMS is to provide incentives to renew the contract for customers who are most likely to churn when their contract period is close to expiration, while OMS focuses on soliciting current overage users for premium services at marginal increases in service fees. Finally, CMS aims to retain customers who have experienced technical difficulties and accounting discrepancies by providing highly responsive services and financial incentives. Each marketing strategy is based on one or two variables out of the seven variables used for customer clustering. The relationships between these strategies and cluster characteristics are detailed in Section 5.2.

The core element of DMS is to provide free mobile devices to customers whose service contract period will expire soon, and thus to have them extend their service contract for two more years. The estimated revenue from each true positive (TP) customer is then his or her monthly service fee (RevMean) for 24 months, where a TP customer is an actual churner among the customers who are predicted to be churners in the original data set. For all analyses from now on, we use a TP customer as a surrogate for a customer who is predicted to accept the DMS offer (or other marketing offers such as OMS or CMS) and who actually accepts it. Further, we assume that all TP customers accept the marketing offer and stay with the current service provider.
Although we acknowledge that this is a very strong assumption, we believe that the general framework for demonstrating the comparative effectiveness of marketing campaigns is still valid. When we apply the same DMS to all the customers in the same cluster, the total estimated revenue is the sum of the estimated revenue from each TP customer. For the marketing cost of DMS, we consider $5 as the handling cost for each customer and $100 as the new device cost for each TP customer. Note that the handling cost includes postage, customer care calls ($2 per call), and any other small costs incurred to promote the strategy. We summarize the formulae to compute the revenue and cost of DMS as follows:

\[ \mathrm{Revenue}_{\mathrm{DMS}} = \sum_{i=1}^{n} \mathrm{RevMean}_i \times \mathrm{TP}_i \times 24\ \text{months} \]

\[ \mathrm{Cost}_{\mathrm{DMS}} = \sum_{i=1}^{n} \left( \text{Handling cost} + \text{Device cost} \times \mathrm{TP}_i \right) \]
where \(\mathrm{TP}_i\) is a Boolean indicator of whether customer i belongs to the TP group and n is the number of customers in the cluster chosen for a specific marketing strategy (here, DMS).

The main purpose of OMS is to solicit current overage users, who frequently use their voice and data plans beyond their allowance, for premium services at marginal increases in service fees. Heavy users may benefit from the OMS offer because they enjoy upgraded services at only a marginal cost increase and avoid unexpected and expensive overage charges. The service providers also benefit from OMS because they can turn uncertain cash flow from monthly overage charges into steady and predictable cash flow. To compute the revenue from OMS for each TP customer who upgrades his/her service plan, we assume that each TP customer has been enrolled in the current service for 12 months on average (i.e., 12 more months remain until the end of the current service contract). Then, the revenue from OMS for each TP customer is the sum of his/her current service fee (RevMean) and the additional fee to upgrade his/her



current service to the next available premium service for the coming 12 months. The cost components of OMS include the handling cost of $5 for all customers and the promotion cost of waiving upgrade costs for the first 3 months for each TP customer as a token of gratitude. We summarize the formulae to compute the revenue and cost of OMS as follows:

\[ \mathrm{Revenue}_{\mathrm{OMS}} = \sum_{i=1}^{n} \left( \mathrm{RevMean}_i + \text{Upgrade cost}_i \right) \times \mathrm{TP}_i \times 12\ \text{months} \]

\[ \mathrm{Cost}_{\mathrm{OMS}} = \sum_{i=1}^{n} \left( \text{Handling cost} + \text{Upgrade cost}_i \times \mathrm{TP}_i \times 3\ \text{months} \right) \]
where \(\text{Upgrade cost}_i\) represents the upgrade cost for customer i from the current service plan to the next available premium service, while \(\mathrm{TP}_i\) is a Boolean indicator of whether customer i belongs to the TP group.

The last strategy, CMS, targets customers who are not satisfied with the service and have made many complaint calls to the service centers, causing high operating costs (typically $2 per call). Therefore, the objective of CMS is to satisfy these customers through customer care service so that they make fewer complaint calls and stay with the current service plan for the remaining contract period. The core part of CMS implementation is to credit 1% of the monthly service fee to each customer who made multiple complaint calls, after the complaints are verified as legitimate by customer care operators. The estimated revenue is the sum of monthly service charges (RevMean) from all TP customers over the remaining service contract period of 12 months on average. The estimated cost is the sum of $5 as the handling cost for all customers and 1% of the monthly service charge of each TP customer for 12 months. The revenue and cost of this strategy, and hence its profit, can be calculated as follows:

\[ \mathrm{Revenue}_{\mathrm{CMS}} = \sum_{i=1}^{n} \mathrm{RevMean}_i \times \mathrm{TP}_i \times 12\ \text{months} \]

\[ \mathrm{Cost}_{\mathrm{CMS}} = \sum_{i=1}^{n} \left( \text{Handling cost} + \mathrm{RevMean}_i \times \mathrm{TP}_i \times 1\% \times 12\ \text{months} \right) \]
We acknowledge that revenue and cost structures vary depending on organizational characteristics, and hence different formulae for computing revenue and cost may be more appropriate in other scenarios. Even so, the proposed scheme is still useful as a general framework for developing and implementing churn marketing programs. Finally, we summarize the three marketing strategies in Table 3.

5.2. Segmentation based relevance mapping of marketing strategies

Once the three marketing strategies are developed, the next step is to identify a group of customers who are suited to each marketing strategy. Note that we have already identified a set of seven predictive variables with the highest VIP scores to cluster all the customers into five groups and then select a marketing strategy suitable for each cluster. The first two variables, number of days of current equipment (eqpdays) and handset (refurbished), with higher centroid values are used to identify the group of customers who keep current mobile
Table 3. Summary of marketing strategies.
Marketing strategy | Objective
Device management strategy (DMS) | Renew contract period
Overage management strategy (OMS) | Stabilize cash flow from overage use
Complaints management strategy (CMS) | Reduce operational costs

devices for a longer time and/or are more likely to keep a refurbished device. Since customers in this cluster are highly likely to churn based on our prior analyses, they become ideal candidates for DMS. The next three clustering variables, mean overage revenue (ovrrev_Mean), mean revenue of voice overage (vceovr_Mean), and mean revenue of data overage (datovr_Mean), measure users' overage tendency, where mean overage revenue equals the sum of mean revenue of voice overage and mean revenue of data overage. Customers in clusters with higher centroid values of these variables are likely to churn but are also the most likely to respond positively to OMS. Another variable, mean minutes of customer care calls (ccrndmou_Mean), is used to recognize a group of customers who are most likely to respond positively to CMS. Finally, one demographic variable, age of first household member (age1), is also used for clustering to improve the comprehensibility of clustering outputs, but it is not used specifically for selecting the most appropriate marketing strategy for each cluster.

Once five clusters are formed along the seven variables, we analyze the characteristics of each cluster to determine the most appropriate marketing strategy. For illustration purposes, we show the centroid values of the five clusters in Table 4 for a campaign size set to the top 5% of customers based on their churn probabilities. To determine the most appropriate marketing strategy for each cluster, we rank each cluster into three levels, Low, Middle, and High, in terms of centroid values along each strategy dimension.
For example, after considering the centroid values of the DMS dimensional variable (eqpdays) in Table 4, we notice that the centroid value of Cluster 2 (1138.6291) is much higher than that of the other clusters, and hence we assign High relevance to Cluster 2 along DMS. Middle relevance is assigned to two other clusters, Cluster 3 (587.4744) and Cluster 4 (456.3294), while Low relevance is assigned to the two remaining clusters, Cluster 1 (380.25) and Cluster 5 (258.797). We repeat the same process to assign a relevance rank to each cluster along the two other marketing strategies, OMS based on the overage variables (ovrrev_Mean, vceovr_Mean, and datovr_Mean) and CMS based on the complaint variable (ccrndmou_Mean). The resulting marketing strategy relevance map is summarized in Table 5. Once the relevance map is completed, we can (subjectively) choose the most appropriate marketing strategy for each cluster. If a cluster has high (or mid) relevance values for multiple marketing strategies, the marketing strategy with the highest marketing impact is chosen, where impact is determined by the VIP scores of the corresponding variables. In our analysis, the strategy with the highest marketing impact is DMS, followed by OMS and then CMS. The final marketing strategy for each cluster in Table 4 is shown in the last column of Table 5.
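One way to encode the selection rule is sketched below. The paper describes the choice as subjective, so the rule here is an assumption: pick the strategy with the highest relevance level and break ties by marketing impact (DMS, then OMS, then CMS). Under that assumption the sketch reproduces the final strategies listed in Table 5; the relevance levels are copied from that table.

```python
from typing import Dict

LEVEL = {"Low": 0, "Mid": 1, "High": 2}
IMPACT_ORDER = ["DMS", "OMS", "CMS"]  # highest marketing impact first


def choose_strategy(relevance: Dict[str, str]) -> str:
    """Pick the most relevant strategy, breaking ties by marketing impact."""
    return max(
        IMPACT_ORDER,
        key=lambda s: (LEVEL[relevance[s]], -IMPACT_ORDER.index(s)),
    )


# Relevance map from Table 5.
relevance_map = {
    1: {"DMS": "Low", "OMS": "High", "CMS": "High"},
    2: {"DMS": "High", "OMS": "Low", "CMS": "Low"},
    3: {"DMS": "Mid", "OMS": "Low", "CMS": "Low"},
    4: {"DMS": "Low", "OMS": "Low", "CMS": "Mid"},
    5: {"DMS": "Low", "OMS": "Mid", "CMS": "High"},
}

final_strategy = {cluster: choose_strategy(r) for cluster, r in relevance_map.items()}
print(final_strategy)
```

Cluster 1 shows the tie-break at work: OMS and CMS are both rated High, and OMS is chosen because it has the higher marketing impact.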

5.3. Evaluation of marketing strategies

Once we assign the most appropriate marketing strategy to each cluster, we can calculate the outcomes of the marketing campaign by estimating actual revenue and cost based on the formulae described in Section 5.1. We summarize the outcomes of marketing campaigns when the top 5% of customers who are most likely to churn are

Table 3 (continued). Target customer | DMS: whose service contract is going to expire soon | OMS: who makes overage usage beyond allowance | CMS: who is loyal and makes valid claims

Table 3 (continued). Core program | DMS: free mobile device | OMS: cross-sell or up-sell | CMS: service points


Table 4. Centroid values of five clusters when campaign size is set at top 5% of customers.
Attribute | Full data | Cluster 1 (152)* | Cluster 2 (585)* | Cluster 3 (371)* | Cluster 4 (507)* | Cluster 5 (133)*
eqpdays | 690.862 | 380.25 | 1138.62 | 587.474 | 456.329 | 258.79
Refurbished | 0.7117 | 1 | 1 | 0 | 1 | 0
ovrrev_Mean | 31.6745 | 152.231 | 8.3662 | 9.9711 | 26.0078 | 78.558
vceovr_Mean | 31.347 | 150.333 | 8.3659 | 9.9529 | 25.4949 | 78.431
datovr_Mean | 0.317 | 1.8051 | 0.0003 | 0.0182 | 0.5048 | 0.1273
ccrndmou_Mean | 2.742 | 4.3882 | 0.6798 | 1.1851 | 4.5207 | 7.4937
age1 | 38.5538 | 39.8947 | 39.8838 | 38.8841 | 36.785 | 36.992

*) The numbers shown in parentheses indicate the number of records in each cluster.

chosen for marketing campaigns in Table 6, which shows an overall profit of $62,008 in total. To help marketing managers determine the marketing campaign size at which the profit of marketing campaigns is maximized, we repeat the aforementioned steps over five other campaign sizes: the top 10%, 15%, 20%, 25%, and 30% of customers who are most likely to churn. We expect in advance that the profit of marketing strategies increases as the marketing campaign grows, reaches a maximum, and starts to decrease after an optimal marketing campaign size. Experimental results of marketing strategies over six campaign sizes are summarized in Table 7. According to Table 7, the profit from marketing programs is maximized at a marketing campaign size of the top 25% of customers. However, the profit at a campaign size of the top 30% of customers ($178,361) is not significantly different from the maximized profit ($178,959), indicating that anywhere between the top 25% and 30% of customers would be an optimal campaign size. Although different outcomes are quite possible due to the subjective process of assigning the most appropriate marketing strategy to each cluster and the subjective values in the revenue and cost formulae, the findings of our approach are still valid and provide a useful tool for marketing managers.

6. Conclusion and future research

In this paper, we introduce and build an accurate and parsimonious model based on the PLS method to predict churn behaviors on data sets with highly correlated variables. The PLS method is useful when typical regression analyses cannot be applied to data sets with high correlation among variables. In terms of the quality of variable selection, the capability of the PLS model is very similar to that of a popular stepwise variable selection algorithm, and both are capable of selecting quite a few identical variables.
In terms of the predictive accuracy of the PLS model, measured by hit rates and shown in lift trend curves, all PLS models perform well and PLS all is the most predictive linear model. We notice that while the PLS all model performs slightly better than the PLS 1.0 model, the PLS 1.0 model is much more parsimonious and hence may be preferred by marketing managers. Both the linear and nonlinear PLS models, PLS all and PLS kernel, perform significantly better than a random model, and PLS kernel performs relatively better at the 5% and 10% target points, returning higher lift values than the best linear PLS model. However, the relative advantage of the nonlinear PLS kernel model over the linear PLS model decreases at higher target points, possibly due to the loss of strong nonlinearity between the predictors and the target variable.

Another important contribution of this paper is a churn marketing program developed from a predictive PLS model. This churn marketing program utilizes the VIP scores obtained from a PLS model. By using only the seven input variables with the highest VIP scores for segmenting customers, we greatly improve the interpretability of the clustering outcomes and the controllability of the proposed churn marketing program, which consists of device management, overage management, and complaint management strategies. Three types of variables (i.e., one type for each marketing strategy) are used to develop and customize marketing strategies that can be tailored to the unique organizational characteristics of each business entity. For the actual implementation of the proposed marketing program, a marketing campaign manager first determines the most appropriate marketing strategy for each cluster of customers based on the relevance scores between the cluster centroids along the three types of variable dimensions and the marketing strategies. With the available revenue and cost infrastructure, she can compute the expected profit from the proposed marketing campaign. We also illustrate how a marketing campaign manager can utilize the proposed churn marketing scheme to determine an optimal marketing campaign size at which the profit of the marketing campaign is maximized.

A simple extension of the proposed scheme is to consider the different responses of customers in the same customer segment to retention offers. The proposed churn management program in this study considers different markets and customer segments to identify which offers are most appropriate for each cluster. Although customers in the same cluster may share common behavioral characteristics, it is quite possible that they respond differently to retention offers. In particular, it is important to note that more retention effort must be given to potential churners who are most likely to react positively.
Overall, churn management should identify not only customers who are most likely to leave the current service provider but also those who are most likely to respond positively to the right retention offer.

Another promising direction for future research is to extend the current research framework to develop an optimal churn management system based on a mathematical formulation. In such a churn management system, the subset of variables chosen from the PLS predictive model is carefully investigated and divided into controllable and uncontrollable variables. Then, a mathematical formulation is applied to minimize the churn rate (or maximize the financial benefits from the churn management program) by changing the values of the controllable variables, subject to the constraint that the costs involved in manipulating the controllable variables are less than or equal to the overall budget limit.

Table 5. Marketing strategy relevance map.
Cluster | DMS | OMS | CMS | Final strategy
Cluster 1 | Low | High | High | OMS
Cluster 2 | High | Low | Low | DMS
Cluster 3 | Mid | Low | Low | DMS
Cluster 4 | Low | Low | Mid | CMS
Cluster 5 | Low | Mid | High | CMS

Table 6. Marketing program outcome (campaign size: 5%, US$).
Strategy | Cluster | Revenue (R) | Cost (C) | Profit (R − C)
DMS | Clusters 2, 3 | 42,593 | 8380 | 34,213
OMS | Cluster 1 | 7322 | 1195 | 6128
CMS | Clusters 4, 5 | 25,118 | 3451 | 21,667
Total | | 75,033 | 13,026 | 62,008



Table 7
Experimental results of marketing strategies over six campaign sizes (US$).

Campaign size    DMS        OMS       CMS       Total
5%               34,213     6128      21,667    62,008
10%              84,173     11,105    20,758    116,036
15%              111,707    7845      24,960    144,512
20%              91,417     38,842    27,854    158,113
25%              104,856    38,110    35,993    178,959
30%              68,703     71,723    37,935    178,361

Values: Sequential orders of variable selection.

Acknowledgments

The author wishes to thank the Teradata Center for CRM at Duke University for making the data sets available. This research by Hyeseon Lee was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) from the Ministry of Education, Science and Technology (20100003628).

References

[1] S. Bhattacharyya, S. Bhattacharyya, Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing, Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining, 2000, pp. 465–473.
[2] A. Blumer, A. Ehrenfeucht, D. Haussler, M.K. Warmuth, Occam's razor, Information Processing Letters 24 (1987) 377–380.
[3] I. Bose, X. Chen, Hybrid models using unsupervised clustering for prediction of customer churn, Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS) vol. 1 (2009).
[4] J.R. Bult, T. Wansbeek, Optimal selection for direct mail, Marketing Science 14 (4) (1995) 378–394.
[5] I.-G. Chong, S.L. Albin, C.-H. Jun, A data mining approach to process optimization without an explicit quality function, IIE Transactions 39 (2007) 795–804.
[6] P.B. Chou, E. Grossman, D. Gunopulos, P. Kamesam, Identifying prospective customers, Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining, 2000, pp. 447–456.
[7] K. Coussement, D. Van den Poel, Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques, Expert Systems with Applications 34 (1) (2008) 313–327.
[8] P. Geladi, B. Kowalski, Partial least-squares regression: a tutorial, Analytica Chimica Acta 185 (1986) 1–17.
[9] T.J. Gerpott, W. Rams, A. Schindler, Customer retention, loyalty, and satisfaction in the German mobile cellular telecommunications market, Telecommunications Policy 25 (2001) 249–269.
[10] W. Gersten, R. Wirth, D. Arndt, Predictive modeling in automotive direct marketing: tools, experiences and open issues, Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining, 2000, pp. 398–406.
[11] F. Gönül, M.Z. Shi, Optimal mailing of catalogs: a new methodology using estimable structural dynamic programming models, Management Science 44 (9) (1998) 1249–1262.
[12] P.M. Guadagni, J. Little, A Logit model of brand choice calibrated on scanner data, Marketing Science 2 (3) (1983) 203–238.
[13] S. Guris, N. Metin, E. Caglayan, The brand choice model of wine consumers: a multinomial Logit model, Quality & Quantity 41 (3) (2007) 447–460.
[14] J. Hadden, A. Tiwari, R. Roy, D. Ruta, Computer assisted customer churn management: state-of-the-art and future trends, Computers & Operations Research 34 (10) (2007) 2902–2917.
[15] M.-K. Kim, M.-C. Park, D.-H. Jeong, The effects of customer satisfaction and switching barrier on customer loyalty in Korean mobile telecommunication services, Telecommunications Policy 28 (2004) 145–159.
[16] Y. Kim, Toward a successful CRM: variable selection, sampling, and ensemble, Decision Support Systems 41 (2) (2006) 542–553.
[17] Y. Kim, W.N. Street, G.J. Russell, F. Menczer, Customer targeting: a neural network approach guided by genetic algorithms, Management Science 51 (2) (2005) 264–276.
[18] E.K. Laitinen, Data system for assessing probability of failure in SME reorganization, Industrial Management & Data Systems 108 (7) (2008) 849–866.
[19] S. Lakshminarayanan, S. Shah, K. Nandakumar, Modeling and control of multivariable processes: dynamic PLS approach, AIChE Journal 43 (9) (1997) 2307–2322.
[20] M. Lejeune, A.P.M. Miguel, Measuring the impact of data mining on churn management, Internet Research: Electronic Network Applications and Policy 11 (5) (2001) 375–387.
[21] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi, Process monitoring and diagnosis by PLS methods, AIChE Journal 40 (5) (1994) 826–836.
[22] E. Malthouse, A. Tamhane, R. Mah, Nonlinear partial least squares, Computers and Chemical Engineering 21 (8) (1997) 875–890.
[23] T. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[24] P. Nelson, P.A. Taylor, J.F. MacGregor, Missing data methods in PCA and PLS: score calculations with incomplete observations, Chemometrics and Intelligent Laboratory Systems 35 (1) (1996) 45–65.
[25] M.C. Mozer, R. Wolniewicz, D.B. Grimes, Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry, IEEE Transactions on Neural Networks 11 (3) (2000) 690–696.
[26] G. Piatetsky-Shapiro, B. Masand, Estimating campaign benefits and modeling lift, Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 1999, pp. 185–193.
[27] S.J. Qin, T.J. McAvoy, Nonlinear PLS modeling using neural networks, Computers and Chemical Engineering 16 (4) (1992) 379–391.
[28] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[29] V.R. Rao, J.H. Steckel, Selecting, evaluating and updating prospects in direct mail marketing, Journal of Direct Marketing 9 (2) (1995) 20–31.
[30] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2002) 97–123.
[31] P.E. Rossi, R. McCulloch, G. Allenby, The value of household information in target marketing, Marketing Science 15 (3) (1996) 321–340.
[32] J. Schmid, A. Weber, Desktop Database Marketing, NTC Business Books, Lincolnwood, IL, 1998.
[33] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[34] P. Sinha, J.H. May, A case-based reasoning system for indirect bank lending, Journal of Management Information Systems 21 (3) (2004) 249–280.
[35] S. Thomassey, A. Fiordaliso, A hybrid sales forecasting system based on clustering and decision trees, Decision Support Systems 42 (1) (2006) 408–421.
[36] P.C. Verhoef, P.N. Spring, J.C. Hoekstra, P.S.H. Leeflang, The commercial use of segmentation and predictive modeling techniques for database marketing in The Netherlands, Decision Support Systems 34 (4) (2003) 471–481.
[37] C.-P. Wei, I.-T. Chiu, Turning telecommunications call details to churn prediction: a data mining approach, Expert Systems with Applications 23 (2) (2002) 103–112.
[38] M.C. Wiener, L. Obando, J. O'Neill, Building process understanding for vaccine manufacturing using data mining, Quality Engineering 22 (3) (2010) 157–168.
[39] S. Wold, PLS for multivariate linear modeling, in: H. van de Waterbeemd (Ed.), QSAR: Chemometric Methods in Molecular Design. Methods and Principles in Medicinal Chemistry, Verlag Chemie, Weinheim, Germany, 1994.
[40] W. Zhang, Q. Cao, M.J. Schniederjans, Neural network earnings per share forecasting models: a comparative analysis of alternative methods, Decision Sciences 35 (2) (2004) 205–237.

Dr. Hyeseon Lee is a research professor in the Department of Industrial & Management Engineering at POSTECH (Pohang University of Science and Technology). She received her M.S. degree in Statistics from Cornell University and her Ph.D. from Kyungpook National University in Korea. She was a research programmer at the National Opinion Research Center affiliated with the University of Chicago for 3 years, and worked as a statistician in the Biostatistics division at the University of California, San Diego for 2 years. She served as a project investigator in the marketing division of Samsung Electronics in 2003, and developed a customer scoring model based on Samsung Electronics' appliance buyer database. Her major interest in statistics is data mining methods using high dimensional data. Partial least squares methods are one of her research topics. She also studies diagnostic problems, applying machine learning techniques to clinical data.

Dr. Yeonhee Lee is a research fellow at the Gyeonggi Institute of Science and Technology Promotion (GSTEP), South Korea. She received her M.S. degree in Business Administration from the TU Berlin and her Ph.D. in Business Administration from the Free University of Berlin. Dr. Lee's primary research interest is in high-tech marketing, research and development (R&D) in services, and technology-based service innovation. Her papers have appeared in Journal of Engineering and Technology Management and Journal of Business Management. Dr. Lee currently works on topics about regional science and technology policy and technology cluster and network development.

Dr. Hyunbo Cho is an associate professor in the Department of Industrial Engineering at the Pohang University of Science and Technology. He received his B.S. and M.S. in Industrial Engineering from Seoul National University in 1986 and 1988, respectively, and his Ph.D. in Industrial Engineering with specialization in Manufacturing Systems Engineering from Texas A&M University in 1993. His Ph.D. dissertation was associated with defining and implementing an intelligent workstation controller for CIM. He is a recipient of the SME's 1997 Outstanding Young Manufacturing Engineer Award. His areas of expertise include Shop Floor Control, CAPP, and Internet-based manufacturing/ASP. He is an active member of IIE and SME.

Mr. Kwanyoung Im is currently a Ph.D. student in Technology Innovation Management at POSTECH (Pohang University of Science and Technology). He received his M.S. degree in Information System from Yonsei University in Korea. His research interests include B2C marketing, business model innovation, frameworks for building business models, and feasibility studies for new business models.

Dr. Yong Seog Kim is an associate professor in the Management Information Systems department at Utah State University. He received his M.S. degree in Computer Science and Ph.D. in Business Administration from the University of Iowa. Dr. Kim's primary research interest is in decision support systems utilizing various data mining (KDD) algorithms such as variable selection, clustering, classification, and ensemble methods. His papers have appeared in Management Science, Decision Support Systems, Intelligent Data Analysis, Expert Systems with Applications, and Journal of Computer Information Systems, and conference proceedings of KDD, AMCIS, DSI, HICSS, and many others. Dr. Kim currently serves on the editorial board of the Journal of Computer Information Systems, Journal of Information Technology Cases and Applications, and Journal of Emerging Trends in Computing and Information Sciences.

