Kyu-Hwan Jung
SK telecom
Seoul, South Korea
Onlyou7@postech.ac.kr
Abstract
This paper explores the possible application of a
single SVM classifier and its variants to the churner
identification problem in the mobile telecommunication
industry, where customer retention programs have
become more important than ever due to a highly
competitive business environment. In particular, this
study introduces a uniformly subsampled ensemble
(USE) model of SVM classifiers combined with
principal component analysis (PCA), not only to
reduce the high dimensionality of the data but also to
boost the reliability and accuracy of calibrated
models on data sets with highly skewed class
distributions. According to our experiments, the
performance of the USE SVM model with PCA is
superior to that of all compared models, and the
number of principal components (PCs) affects the
accuracy of the ensemble models.
1. Introduction
The availability of cheap hard disk space and the
expansion of data collection technologies empower
many companies to easily monitor and visualize
customers' daily purchase and usage patterns through
online transaction processing (OLTP) databases [5].
As a result, most companies today have plenty of
data. However, data itself is not information; data
must be turned into information so that users can
answer their own questions with the right information
at the right time and in the right place.
2.3. Evaluation
We used hit rate as the evaluation metric for our
research. The hit rate is a popular measure for
numerically evaluating the predictive power of
models in the marketing field [18]. It is calculated as

\text{Hit rate} = \frac{1}{n}\sum_{i=1}^{n} H_i \qquad (1)
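The hit rate in Eq. (1) is simply the fraction of correct predictions over the n evaluated customers. A minimal sketch, assuming H_i is 1 for a correct prediction and 0 otherwise (the function name and label arrays below are illustrative, not from the paper):

```python
def hit_rate(y_true, y_pred):
    """Fraction of predictions that match the true labels (Eq. 1)."""
    hits = [1 if t == p else 0 for t, p in zip(y_true, y_pred)]  # H_i values
    return sum(hits) / len(hits)

# 3 of 4 predictions correct -> hit rate of 0.75
print(hit_rate([1, 0, 1, 1], [1, 0, 0, 1]))
```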
f(x) = \sum_{m=1}^{M} w_m f_m(x) \qquad (4)
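Eq. (4) combines the outputs of the M base classifiers as a weighted sum. A minimal sketch of this combination rule, assuming uniform weights w_m = 1/M as in a uniformly subsampled ensemble (the toy decision functions below are hypothetical stand-ins for trained SVMs):

```python
def ensemble_score(x, base_models, weights=None):
    """Weighted combination f(x) = sum_m w_m * f_m(x) over base models."""
    if weights is None:
        # Uniform weights: each base model contributes equally.
        weights = [1.0 / len(base_models)] * len(base_models)
    return sum(w * f(x) for w, f in zip(weights, base_models))

# Toy base "decision functions" on a scalar input (placeholders for SVMs).
models = [lambda x: x - 1.0, lambda x: 2.0 * x, lambda x: -x + 3.0]
print(ensemble_score(2.0, models))  # uniform average of 1.0, 4.0, 1.0
```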
4. Experimental results
In this section, we present the process and results
of applying the proposed Uniformly Subsampled
Ensemble SVM to the telecommunications market
data. Fig. 3 shows the correlation matrix of the
variables. We can see that there are high correlations
among the features, which supports the need to
extract uncorrelated new features.
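The uncorrelated new features are obtained by projecting the data onto its principal components. A minimal PCA sketch via the eigendecomposition of the covariance matrix, using synthetic correlated data (the paper's telecom variables are not reproduced here):

```python
import numpy as np

# Synthetic data: three features driven by one latent factor, hence correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

Xc = X - X.mean(axis=0)                 # center the features
cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort components by variance
pcs = Xc @ eigvecs[:, order[:2]]        # project onto top-2 components

# The projected features are (near-)uncorrelated: off-diagonals ~ 0.
print(np.round(np.cov(pcs, rowvar=False), 4))
```

Because the eigenvectors of the covariance matrix are orthogonal, the projected features have a diagonal covariance matrix, which is exactly the decorrelation that motivates the PCA step.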
5. Conclusions
6. References
[1] E. Bauer and R. Kohavi, "An empirical comparison of
voting classification algorithms: Bagging, Boosting, and
variants", Machine Learning, 36(1-2):105–139, 1999.
[2] L. Breiman. Bagging predictors. Machine Learning,
24(2):123–140, 1996.
[3] L. Breiman. Stacked regressions. Machine Learning,
24(1):49–64, 1996.
[4] R. B. Cattell. The scree test for the number of factors.
Multivariate Behavioral Research, 1:245–276, 1966.
[5] S. Chaudhuri and U. Dayal. An overview of data
warehousing and OLAP technology. SIGMOD Rec., 26:65–
74, March 1997.
[6] Y. Freund and R. Schapire. Experiments with a new
Boosting algorithm. In Proc. of 13th Int'l Conf. on Machine
Learning, pages 148–156, Bari, Italy, 1996.
[7] K.-H. Jung, D. Lee, and J. Lee. Fast support-based
clustering method for large-scale problems. Pattern
Recognition, 43:1975–1983, 2010.