
Hybrid Intelligence Systems for Data Imputation

Chandan Gautam (12MCMB03)
Under the guidance of Prof. V. Ravi

Outline

Problem Statement
Missing Data and their causes
Data Imputation
Literature Survey
Proposed Methods
Results
Conclusions
References

Problem Statement
Developing Hybrid Intelligence Systems for Data Imputation
Based on Statistical and Machine Learning Techniques.

What is missing data?


In real-world scenarios, missing data is an inevitable and common problem in various disciplines.
It limits researchers' ability to draw conclusions: even if results are obtained after deleting the incomplete records, those results may be biased and inappropriate.
So, the missing values have to be imputed.

Example dataset with missing values ("??"):

Age   Salary   Incentive
25    4000     ??
??    500      0
27    ??       50
82    2000     150
42    6500     1000

Literature Survey
N. Ankaiah, V. Ravi (2011). A novel soft computing hybrid for data imputation, in Proceedings of the 7th International Conference on Data Mining (DMIN), Las Vegas, USA.
J. Mistry, F. V. Nelwamondo, T. Marwala (2009). Data estimation using principal component analysis and auto-associative neural networks, Journal of Systemics, Cybernetics and Informatics, Volume 7, pp. 72-79.
I. B. Aydilek, A. Arslan (2013). A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm, Information Sciences, vol. 233, pp. 25-35.
Shichao Zhang (2012). Nearest neighbor selection for iteratively kNN imputation, The Journal of Systems and Software, vol. 85(11), pp. 2541-2552.

Mean Imputation

Creating Missing Values and Mean Imputation


Initially, no missing data:

Age   Salary   Incentive
25    4000     200
34    500      0
27    1000     50
82    2000     150
42    6500     1000

After creating missing values:

Age   Salary   Incentive
25    4000     ??
??    500      0
27    ??       50
82    2000     150
42    6500     1000

Mean imputation fills each ?? with the column mean of the observed values:
Age = 44, Salary = 3250, Incentive = 300

Mean Imputation
MAPE

Compute the mean absolute percentage error (MAPE) value (Flores, 1986):

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{\lvert x_i - \hat{x}_i\rvert}{x_i}$$

where
n = number of missing values in a given dataset,
$\hat{x}_i$ = value predicted by the mean imputation for the i-th missing value,
$x_i$ = actual value.
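A minimal sketch of this metric in Python, applied to the three cells deleted in the example above (the function name is ours):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error (in %) over the imputed cells."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 / len(actual) * np.sum(np.abs(actual - predicted) / actual)

# Deleted cells, mean-imputed: incentive 200 -> 300, age 34 -> 44,
# salary 1000 -> 3250.
print(mape([200, 34, 1000], [300, 44, 3250]))
```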

Mean Imputation
Result of Mean Imputation

Average MAPE value over 10 folds - Mean Imputation

Dataset          Mean Imputation
Auto mpg         59.7
Body fat         11.61
Boston Housing   37.77
Forest fires     24.728
Iris             23.57
Prima Indian     24.022
Spanish          55.53
Spectf           14.85
Turkish          66.007
UK bankruptcy    37.07
UK Credit        28.43
Wine             29.99

The error value is too high for most of the datasets, so we need some other methods.

Proposed Methods
Module I
PCA-AAELM Imputation
ECM-Imputation
ECM-AAELM Imputation

Module II
PSO-ECM Imputation
PSO-ECM + ECM-AAELM

Module III
CPAANN Imputation
Gray+PCA-AAELM
Gray+CPAANN


Overview of ELM

Overview of Extreme Learning Machine (ELM)


Overview of ELM
Architecture of ELM

Training:

Output of the hidden nodes for an input $x$: $H = g(a \cdot x + b)$; the $i$-th hidden node computes $g(a_i \cdot x + b_i)$, where
$a_i$ is the weight vector of the connection between the input nodes and the $i$-th hidden node, and
$b_i$ is the threshold of the $i$-th hidden node.

Output of the SLFN:

$$f(x) = \sum_{i=1}^{m} \beta_i\, g(a_i \cdot x + b_i)$$

where $\beta_i$ is the weight vector of the connection between the $i$-th hidden node and the output nodes.

Output weight: $\beta = H^{\dagger} O$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $O$ is the target output matrix.

Testing: $H_T = g(a \cdot y + b)$ for a test input $y$, and Output $= H_T \cdot \beta$.
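For concreteness, a minimal ELM sketch under these equations: random input weights and thresholds, sigmoid activation, and output weights from the Moore-Penrose pseudo-inverse. The function names, data and hidden-layer size are our own illustration, not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, O, m):
    """Single-hidden-layer ELM: random (a, b), analytic output weights."""
    a = rng.normal(size=(X.shape[1], m))   # input-to-hidden weights a_i
    b = rng.normal(size=m)                 # hidden-node thresholds b_i
    H = sigmoid(X @ a + b)                 # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ O           # beta = H^+ O (Moore-Penrose)
    return a, b, beta

def elm_predict(X, a, b, beta):
    return sigmoid(X @ a + b) @ beta

# Tiny usage example on synthetic data.
X = rng.uniform(size=(100, 3))
O = np.sin(X.sum(axis=1, keepdims=True))
a, b, beta = elm_train(X, O, m=20)
print(np.mean((elm_predict(X, a, b, beta) - O) ** 2))
```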

Table of Activation Functions

Proposed Method
Architecture of AAELM (Auto-associative ELM)

Auto-encoders are feed-forward neural networks trained to recall the input space.
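Continuing the sketch above, AAELM simply sets the target matrix equal to the input. One plausible recall-based way to use it for imputation follows; the iteration scheme is our assumption, not spelled out on the slide:

```python
# AAELM: train the ELM to reproduce its own input (targets O = X).
a, b, beta = elm_train(X, X, m=20)           # reuses elm_train from above
record = X[0].copy()
record[1] = X[:, 1].mean()                   # initialize a "missing" cell
for _ in range(10):                          # recall until the cell stabilizes
    record[1] = elm_predict(record[None, :], a, b, beta)[0, 1]
print(record[1])
```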

Ensembled-AAELM
Ensembling of AAELM

Run AAELM 10 times independently on the same dataset to generate the ensemble.
Use three different probability distributions (uniform, normal and logistic) to generate the random weights, and two different activation functions (sigmoid and Gaussian) at the hidden layer.
AAELM is thus ensembled over a total of six combinations of probability distribution and activation function.

Ensembled-AAELM
Result of Ensembled AAELM

Average MAPE value over 10 folds - Ensembled AAELM

Problems and Solutions of Ensembled AAELM

Drawbacks of AAELM

AAELM's dependency on randomness is very high and significant, because each run of ELM yields different results.
Results can sometimes fluctuate wildly.

Remedy of the Above Problem

We propose two new hybrid methods to stabilize the randomness of AAELM:
PCA-AAELM
ECM-AAELM

PCA-AAELM

Proposed Method 1:
PCA-AAELM


PCA-AAELM
Architecture of PCA-AAELM

[Diagram: PCA-AAELM architecture contrasted with the traditional ELM.]

PCA-AAELM
Results

Average MAPE value over 10 folds - PCA-AAELM

ECM-Imputation

Proposed Method 2:
Evolving Clustering method (ECM)
based Imputation


ECM-Imputation

Block Diagram of the Proposed Method

Complete dataset with missing values -> split into complete records and incomplete records.
Complete records -> ECM clustering -> obtained cluster centers.
Incomplete records -> find the nearest cluster center for each incomplete record.
Impute the incomplete features with the corresponding features of the nearest cluster center -> dataset without missing values.

ECM-Imputation

How are missing values calculated with the help of the cluster centers?

For an incomplete record $(2, 3, ?)$, compute the distance to each cluster center over the observed features:

Center $(0, 2, 0)$: $\sqrt{(2-0)^2 + (3-2)^2} = \sqrt{5}$
Center $(3, 1, 3)$: $\sqrt{(2-3)^2 + (3-1)^2} = \sqrt{5}$
Center $(1, 2, 5)$: $\sqrt{(2-1)^2 + (3-2)^2} = \sqrt{2}$   (nearest)
Center $(5, 3, 2)$: $\sqrt{(2-5)^2 + (3-3)^2} = \sqrt{9} = 3$
Center $(1, 9, \cdot)$: $\sqrt{(2-1)^2 + (3-9)^2} = \sqrt{37}$

The missing feature is imputed with the corresponding feature of the nearest cluster center, here 5.
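The same rule in Python, using the five centers of the worked example (the last center's third feature is not given in the source, so np.nan stands in for it):

```python
import numpy as np

# Cluster centers from the worked example.
centers = np.array([[0, 2, 0],
                    [3, 1, 3],
                    [1, 2, 5],
                    [5, 3, 2],
                    [1, 9, np.nan]])

record = np.array([2, 3, np.nan])            # incomplete record (2, 3, ?)
observed = ~np.isnan(record)                 # mask of observed features

# Distances to each center over the observed features only.
d = np.sqrt(((centers[:, observed] - record[observed]) ** 2).sum(axis=1))
nearest = np.argmin(d)                       # index 2 -> distance sqrt(2)

record[~observed] = centers[nearest, ~observed]   # impute from that center
print(d ** 2, record)                        # [5. 5. 2. 9. 37.] [2. 3. 5.]
```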

ECM-Imputation
Results

Average MAPE value over 10 folds - ECM-Imputation

Dataset          Mean     K-Means+MLP        ECM-Imputation
                          [Ankaiah & Ravi]
Auto mpg         59.7     23.75              18.03
Body fat         11.61    7.83               6.31
Boston Housing   37.77    21.01              17.84
Forest fires     24.728   26.61              22.29
Iris             23.57    9.41               5.27
Prima Indian     24.022   29.7               27.16
Spanish          55.53    39.91              31.98
Spectf           14.85    12.14              10.21
Turkish          66.007   33.01              27.90
UK bankruptcy    37.07    30.96              46.14
UK Credit        28.43    32.17              27.40
Wine             29.99    21.58              15.61

ECM-AAELM

Proposed Method 3:
ECM-AAELM


ECM-AAELM
Architecture of ECM-AAELM

[Diagram: ECM-AAELM architecture contrasted with the traditional ELM.]

ECM-AAELM
Results

Average MAPE value over 10 folds - ECM-AAELM


PCA/ECM-AAELM
Behavior of PCA/ECM-AAELM with different activation functions

[Chart 1: MAPE of PCA-AAELM on each dataset (Auto mpg, Body fat, Boston Housing, Forest fires, Iris, Prima Indian, Spanish, Spectf, Turkish, UK bankruptcy, UK Credit, Wine), one curve per activation function: Sigmoid, Sinh, Cloglogm, Bsigmoid, Sine, Hardlim, Tribas, Radbas, Softplus.]

[Chart 2: MAPE of ECM-AAELM on the same datasets for the activation functions Sigmoid, Sinh, Cloglogm, Bsigmoid, Sine, Hardlim, Tribas, Radbas, Softplus, Gaussian and Rectifier.]

ECM-AAELM
Influence of the Dthr value on MAPE results: ECM-AAELM

[Chart: MAPE vs. Dthr, with Dthr varied from 0.035 to 0.98 in steps of 0.035, one curve per dataset (Auto MPG, Body fat, Boston Housing, Forest fires, Iris, Prima Indian, Spanish, Spectf, Turkish, UK Credit, UK bankruptcy, Wine).]

PSO-ECM

Module II:
Proposed Method 4:
PSO-ECM

Block Diagram of the Proposed Method

1. The dataset contains incomplete records; split it into complete records and incomplete records.
2. Initialize the PSO parameters and apply ECM to the complete records with the initialized Dthr value.
3. Perform ECM imputation of the incomplete records based on the nearest cluster center.
4. Compute the covariance matrix of the complete records (Ccov) and of the total records after imputation (Tcov), and the determinants of Ccov and Tcov.
5. Compute the MSE between Ccov and Tcov, and the absolute difference between det(Ccov) and det(Tcov).
6. Invoke PSO to select the Dthr value, and apply ECM with the Dthr value yielded by PSO.
7. If the error is minimum, stop: the parameter-optimized ECM imputation with the optimized Dthr value leaves the dataset with no incomplete records; otherwise, repeat from step 4.
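A sketch of the fitness function this loop minimizes, under our reading of the diagram; ecm_impute is a hypothetical stand-in for steps 2-3, and adding the two error terms is our assumption about how they are combined:

```python
import numpy as np

def fitness(dthr, complete, incomplete, ecm_impute):
    """Error between the covariance structure of the complete records
    and that of the full dataset after ECM imputation with this Dthr."""
    imputed = ecm_impute(complete, incomplete, dthr)   # hypothetical helper
    total = np.vstack([complete, imputed])
    c_cov = np.cov(complete, rowvar=False)             # Ccov
    t_cov = np.cov(total, rowvar=False)                # Tcov
    mse = np.mean((c_cov - t_cov) ** 2)                # MSE b/w Ccov & Tcov
    det_diff = abs(np.linalg.det(c_cov) - np.linalg.det(t_cov))
    return mse + det_diff    # combined objective (sum is our assumption)

# PSO then searches Dthr in (0, 1] for the minimum of this fitness.
```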

PSO-ECM
Results

Average MAPE value over 10 folds - PSO-ECM based Imputation

Dataset          Mean     K-Means+MLP        ECM-Imputation   PSO-ECM
                          [Ankaiah & Ravi]
Auto mpg         59.7     23.75              18.03            15.34844
Body fat         11.61    7.83               6.31             4.96008
Boston Housing   37.77    21.01              17.84            14.49978
Forest fires     24.728   26.61              22.29            18.33909
Iris             23.57    9.41               5.27             4.82263
Prima Indian     24.022   29.7               27.16            24.57587
Spanish          55.53    39.91              31.98            20.73123
Spectf           14.85    12.14              10.21            9.85382
Turkish          66.007   33.01              27.9             19.28137
UK bankruptcy    37.07    30.96              46.14            30.97627
UK Credit        28.43    32.17              27.4             24.61695
Wine             29.99    21.58              15.61            12.75819

Proposed Method 5:
PSO-ECM + ECM-AAELM


PSO-ECM + ECM-AAELM

Proposed Model


PSO-COV-ECM + ECM-AAELM
Results

Average MAPE value over 10 folds - PSO-COV-ECM + ECM-AAELM

PSO-ECM + ECM-AAELM
Comparison

Comparison of the results before and after selection of the optimal Dthr value

Dataset          Mean     K-Means+MLP        ECM-AAELM   PSO-ECM + ECM-AAELM
                          [Ankaiah & Ravi]   (before)    (after)
Auto mpg         59.7     23.75              17.38       14.69
Body fat         11.61    7.83               5.33        4.64
Boston Housing   37.77    21.01              16.48       14.44
Forest fires     24.728   26.61              21.54       18.17
Iris             23.57    9.41               5.10        4.83
Prima Indian     24.022   29.7               23.95       23.96
Spanish          55.53    39.91              22.09       18.53
Spectf           14.85    12.14              8.05        8.18
Turkish          66.007   33.01              21.49       18.97
UK bankruptcy    37.07    30.96              40.06       28.66
UK Credit        28.43    32.17              26.85       24.79
Wine             29.99    21.58              14.88       12.60

CPAANN

Module III:
Proposed Method 6:
CPAANN


CPAANN
Introduction of CPNN

Introduction

CPNN uses semi-supervised learning:
Unsupervised part: a Kohonen SOM layer trained by competitive learning.
Supervised part: a Grossberg outstar layer.

We added the concept of auto-associativity to CPNN and created the Counter Propagation Auto-associative Neural Network (CPAANN).

Comparison

Average MAPE value over 10 folds - CPAANN

Dataset          Mean     K-Means+MLP        CPAANN based Imputation
                          [Ankaiah & Ravi]
Auto mpg         59.7     23.75              18.32
Body fat         11.61    7.83               5.25
Boston Housing   37.77    21.01              14.86
Forest fires     24.728   26.61              16.97
Iris             23.57    9.41               6.51
Prima Indian     24.022   29.7               18.21
Spanish          55.53    39.91              17.13
Spectf           14.85    12.14              8.61
Turkish          66.007   33.01              16.07
UK bankruptcy    37.07    30.96              21.96
UK Credit        28.43    32.17              22.88
Wine             29.99    21.58              11.56

Gray+PCA-AAELM

Proposed Method 7:
Gray + PCA-AAELM

Proposed Method:

Stage I: Gray Distance Based Nearest Neighbor Imputation
Stage II: PCA-AAELM Based Imputation

Gray+PCA-AAELM
Comparison

Results of PCA-AAELM with Mean Imputation and with Gray Distance based Imputation

Dataset          Mean     K-Means+MLP        PCA-AAELM with    PCA-AAELM with Gray
                          [Ankaiah & Ravi]   Mean Imputation   Distance based Imputation
Auto mpg         59.7     23.75              28.63             16.92
Body fat         11.61    7.83               6.01              5.41
Boston Housing   37.77    21.01              20.9              17.46
Forest fires     24.728   26.61              19.41             20.89
Iris             23.57    9.41               10.23             5.79
Prima Indian     24.022   29.7               22.06             22.03
Spanish          55.53    39.91              30.09             28.06
Spectf           14.85    12.14              9.11              8.38
Turkish          66.007   33.01              30.18             27.38
UK bankruptcy    37.07    30.96              37.7              37.95
UK Credit        28.43    32.17              25.27             27.79
Wine             29.99    21.58              16.6              14.78

Gray+CPAANN

Proposed Method 8:
Gray + CPAANN


Gray+CPAANN

Proposed Method:

Stage I: Gray Distance Based Nearest Neighbour Imputation
Stage II: CPAANN Based Imputation

Gray+CPAANN
Comparison

Results of CPAANN with Mean Imputation and with Gray Distance based Imputation

Dataset          Mean     K-Means+MLP        CPAANN with       CPAANN with Gray
                          [Ankaiah & Ravi]   Mean Imputation   Distance based Imputation
Auto mpg         59.7     23.75              18.32             15.31
Body fat         11.61    7.83               5.25              4.71
Boston Housing   37.77    21.01              14.86             15.01
Forest fires     24.728   26.61              16.97             17.91
Iris             23.57    9.41               6.51              4.03
Prima Indian     24.022   29.7               18.21             19.34
Spanish          55.53    39.91              17.13             14.21
Spectf           14.85    12.14              8.61              8.53
Turkish          66.007   33.01              16.07             17.37
UK bankruptcy    37.07    30.96              21.96             20.58
UK Credit        28.43    32.17              22.88             13.70
Wine             29.99    21.58              11.56             11.72

Comparison Between All Methods

Comparison between all proposed methods, based on average MAPE value over 10 folds

Dataset          PCA-AAELM  ECM-Imputation  ECM-AAELM  PSO-ECM  PSO-ECM+ECM-AAELM  Gray+PCA-AAELM  CPAANN  Gray+CPAANN
Auto mpg         28.63      18.03           17.38      15.35    14.39              16.92           18.32   15.31
Body fat         6.01       6.31            5.33       4.96     4.61               5.41            5.25    4.71
Boston Housing   20.90      17.84           16.48      14.50    14.18              17.46           14.86   15.01
Forest fires     19.41      22.29           21.54      18.34    17.66              20.89           16.97   17.91
Iris             10.23      5.27            5.10       4.82     4.75               5.79            6.51    4.03
Prima Indian     22.06      27.16           23.95      24.58    23.38              22.03           18.21   19.34
Spanish          30.09      31.98           22.09      20.73    16.99              28.06           17.13   14.21
Spectf           9.11       10.21           8.05       9.85     8.18               8.38            8.61    8.53
Turkish          30.18      27.90           21.49      19.28    16.49              27.38           16.07   17.37
UK bankruptcy    37.70      46.14           40.06      30.98    26.89              37.95           21.96   20.58
UK Credit        25.27      27.40           26.85      24.62    23.66              27.79           22.88   13.70
Wine             16.60      15.61           14.88      12.76    12.21              14.78           11.56   11.72

Conclusions

Conclusions

The results indicate that all the proposed methods give significantly better results compared to K-Means+MLP.
ECM-Imputation alone outperformed K-Means+MLP, which demonstrates the powerful local learning capability of ECM.
ECM-AAELM yields higher accuracy than PCA-AAELM.
The output of ECM-AAELM depends primarily on the threshold value (Dthr) of ECM; its output does not fluctuate wildly across activation functions.
In our experiments, selecting the optimal Dthr value consistently led to better imputation.
For PCA-AAELM, the Softplus activation function is recommended because it performed better than the other activation functions.
Gray Distance based imputation performed better than Mean imputation as a preprocessing step for most of the datasets.

Papers

List of Published and Communicated Research Papers

C. Gautam, V. Ravi, Evolving Clustering Based Data Imputation, 3rd IEEE Conference ICCPCT, Kanyakumari, Mar 21-22, 2014.
C. Gautam, V. Ravi, Data Imputation via Evolutionary Computation, Clustering and a Neural Network, to be communicated to IEEE Computational Intelligence Magazine (CIM).
A Hybrid Data Imputation Method Based on Gray System Theory and Counterpropagation Auto-associative Neural Network, to be communicated to Neurocomputing.
Imputation of Missing Data Using PCA, Extreme Learning Machine and Gray System Theory, to be communicated to the 5th Joint International Conference on Swarm, Evolutionary and Memetic Computing (SEMCCO 2014).

References
Data Imputation

References

Abdella, M., & Marwala, T. (2005). The use of genetic algorithms and neural networks to approximate missing data in database, IEEE 3rd International Conference on Computational Cybernetics, Mauritius, pp. 207-212.
Mistry, J., Nelwamondo, F. V., & Marwala, T. (2009). Data estimation using principal component analysis and auto-associative neural networks, Journal of Systemics, Cybernetics and Informatics, Volume 7, pp. 72-79.
Ankaiah, N., & Ravi, V. (2011). A novel soft computing hybrid for data imputation, International Conference on Data Mining, Las Vegas, USA.
Vriens, M., & Melton, E. (2002). Managing missing data, Marketing Research, Volume 14, Issue 3, pp. 12-17.
Naveen, N., Ravi, V., & Rao, C. R. (2010). Differential evolution trained radial basis function network: application to bankruptcy prediction in banks, International Journal of Bio-Inspired Computation (IJBIC), Volume 2, Issue 3, pp. 222-232.

References
Data Imputation (Cont.)

Nelwamondo, F. V., Golding, D., & Marwala, T. (2013). A dynamic programming approach to missing data estimation using neural networks, Information Sciences, Elsevier, Volume 237, pp. 49-58.
Nishanth, K. J., Ankaiah, N., Ravi, V., & Bose, I. (2012). Soft computing based imputation and hybrid data and text mining: The case of predicting the severity of phishing alerts, Expert Systems with Applications, Volume 39, Issue 12, pp. 10583-10589.
Nishanth, K. J., & Ravi, V. (2013). A computational intelligence based online data imputation method: An application for banking, Journal of Information Processing Systems, vol. 9(4), pp. 633-650.
Krishna, M., & Ravi, V. (2013). Particle swarm optimization and covariance matrix based data imputation, IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Enathi.
Ravi, V., & Krishna, M. (2014). A new online data imputation method based on general regression auto associative neural network, Neurocomputing, vol. 138, pp. 207-212.

References
Extreme Learning Machine (ELM)

Huang, G. B., Zhu, Q., & Siew, C. (2006). Extreme Learning Machine: Theory and Applications, Neurocomputing, Elsevier, 7th Brazilian Symposium on Neural Networks, Volume 70, pp. 489-501.
Rajesh, R., & Siva, J. (2011). Extreme Learning Machine: A Review and State of the Art, International Journal of Wisdom Based Computing, Volume 1, pp. 35-49.
Huang, G., Wang, D., & Lan, Y. (2011). Extreme Learning Machine: A Survey, International Journal of Machine Learning and Cybernetics, Volume 2, Issue 2, pp. 107-122.
Bartlett, P. (1997). For Valid Generalization, the Size of the Weights is More Important than the Size of the Network, Advances in Neural Information Processing Systems, Volume 9, pp. 134-140.
Huang, G., Chen, L., & Siew, C. (2006). Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes, IEEE Transactions on Neural Networks, Volume 17, Issue 4, pp. 879-892.

References
Extreme Learning Machine (ELM) (Cont.)

Zhu, Q., Qin, A. K., Suganthan, P. N., & Huang, G. (2005). Evolutionary Extreme Learning Machine, Pattern Recognition, Elsevier, Volume 38, Issue 10, pp. 1759-1763.
Castaño, A., Fernández-Navarro, F., & Hervás-Martínez, C. (2013). PCA-ELM: A Robust and Pruned ELM Approach Based on PCA, Neural Processing Letters, Springer, Volume 37, Issue 3, pp. 377-392.
Huang, G. B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man and Cybernetics, Volume 42, Issue 2, pp. 513-529.

References
Extreme Learning Machine (In Finance)

Duan, G., Huang, Z., & Wang, J. (2009). Extreme Learning Machine for Bank Clients Classification, International Conference on Information Management, Innovation Management and Industrial Engineering, Xi'an, China, Volume 2, pp. 496-499.
Duan, G., Huang, Z., & Wang, J. (2010). Extreme Learning Machine for Financial Distress Prediction for Listed Company, International Conference on Logistics Systems and Intelligent Management, Harbin, China, Volume 3, pp. 1961-1965.
Zhou, H., Lan, Y., Soh, Y. C., Huang, G. B., & Zhang, R. (2012). Credit Risk Evaluation with Extreme Learning Machine, IEEE International Conference on Systems, Man and Cybernetics, Seoul, Korea, pp. 1064-1069.
Teresa, M., Carmen, M., David, B., & José, F. (2012). Extreme Learning Machine to Analyze the Level of Default in Spanish Deposit Institutions, Journal of Methods for the Quantitative Economy and Enterprise, Volume 13, Issue 1, pp. 3-23.

References
Activation Function

Sibi, P., Jones, S., & Siddarth, P. (2013). Analysis of Different Activation Functions Using Back Propagation Neural Networks, Journal of Theoretical and Applied Information Technology, Volume 47, Issue 3, pp. 1264-1268.
Peng, J., Li, L., & Tang (2013). Combination of Activation Functions in Extreme Learning Machines for Multivariate Calibration, Chemometrics and Intelligent Laboratory Systems, Elsevier, Volume 120, pp. 53-58.
Gomes, G. S. S., Ludermir, T. B., & Lima, L. M. M. R. (2011). Comparison of new activation functions in neural network for forecasting financial time series, Neural Computing and Applications, Springer, Volume 20, Issue 3, pp. 417-439.
Asaduzzaman, Md., Shahjahan, M., & Murase, K. (2009). Faster Training Using Fusion of Activation Functions for Feed Forward Neural Networks, International Journal of Neural Systems, Volume 19, Issue 06, pp. 437-448.
Karlik, B., & Olgac, A. V. (2010). Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks, International Journal of Artificial Intelligence and Expert Systems, Volume 1, Issue 4, pp. 111-122.

References
Activation Function(Cont.), ECM, Cross Validation & PCA

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks, International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, USA, Volume 15, pp. 315-323.
Song, Q., & Kasabov, N. (2001). ECM: A Novel On-line, Evolving Clustering Method and Its Applications, Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems, Berlin, pp. 87-92.
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). "Cross Validation", in Encyclopedia of Database Systems (EDBS), Springer, Volume 1, pp. 532-538.
Smith, L. (2002). A Tutorial on Principal Components Analysis.

References
Counter Propagation Neural Network (CPAANN)

Kuzmanovski, I., & Novič, M. (2008). Counter-propagation neural networks in Matlab, Chemometrics and Intelligent Laboratory Systems, pp. 84-91.
Taner, M. (1997). Kohonen's self-organizing networks with "conscience".
Ballabio, D., & Vasighi, M. (2012). A MATLAB toolbox for Self Organizing Maps and supervised neural network learning strategies, Chemometrics and Intelligent Laboratory Systems, pp. 24-32.
Ballabio, D., Consonni, V., & Todeschini, R. (2009). The Kohonen and CP-ANN toolbox: A collection of MATLAB modules for Self Organizing Maps and Counterpropagation Artificial Neural Networks, Chemometrics and Intelligent Laboratory Systems, pp. 115-122.
Sivanandam, S. N., Sumathi, S., & Deepa, S. N. Introduction to Neural Networks Using MATLAB 6.0.
Mehrotra, K., Mohan, C. K., & Ranka, S. Elements of Artificial Neural Networks.

Thank You

Activation Function

Sibi, P., Jones, S., & Siddarth, P. (2013). Analysis of Different Activation Functions Using Back Propagation Neural Networks, Journal of Theoretical and Applied Information Technology, Volume 47, Issue 3, pp. 1264-1268.
Gomes, G. S. S., Ludermir, T. B., & Lima, L. M. M. R. (2011). Comparison of new activation functions in neural network for forecasting financial time series, Neural Computing and Applications, Springer, Volume 20, Issue 3, pp. 417-439.

Activation Function (Cont.)

Karlik, B., & Olgac, A. V. (2010). Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks, International Journal of Artificial Intelligence and Expert Systems, Volume 1, Issue 4, pp. 111-122.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks, International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, USA, Volume 15, pp. 315-323.

Experimental Design

Experimental Design for PCA-AAELM and ECM-AAELM

10-fold cross validation has been used in our experiments.
Both PCA-AAELM and ECM-AAELM have one user-defined parameter: for PCA it is the retained variance (i.e., the eigenvalues kept), and for ECM it is the threshold value (Dthr).
We fixed the activation function and varied the variance from 1 to 99 in PCA-AAELM, and the threshold from 0.001 to 0.999 in ECM-AAELM, for each activation function on the whole dataset.
We used 11 activation functions and compared their performances.

PCA-AAELM
Steps of PCA-AAELM

The following steps are required for the training process:

1. Take the training dataset and perform PCA.
2. Select the optimal number of hidden nodes, using the principal-component values as the input weights.
3. Compute PC * Training Data as the hidden-layer input.
4. Perform the non-linear transformation.
5. Compute the output weight by performing the Moore-Penrose generalized inverse.

This yields the neural network model.
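A compact sketch of these steps, under our reading that the retained principal components replace ELM's random input weights; the tanh activation and helper names are our choices:

```python
import numpy as np

def pca_aaelm_train(X, var_pct=95):
    """PCA-AAELM sketch: principal components as input weights, then AAELM."""
    Xc = X - X.mean(axis=0)
    # Principal components of the training data (rows of Vt).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = 100 * np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(var, var_pct) + 1)   # nodes for var_pct% variance
    W = Vt[:k].T                                 # input weights = top-k PCs
    H = np.tanh(X @ W)                           # non-linear transformation
    beta = np.linalg.pinv(H) @ X                 # Moore-Penrose output weights
    return W, beta

W, beta = pca_aaelm_train(np.random.default_rng(1).uniform(size=(50, 4)))
```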

ECM-AAELM
Evolving Clustering Method

[Diagram: ECM evolving over samples x1, ..., x9. Each new cluster Ci starts with radius Ri0 = 0 at its first sample; as further samples arrive, centers and radii are updated (C10 -> C11 -> C12 -> C13, C20 -> C21, C30).]
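A sketch of the ECM pass described by Song & Kasabov (2001), as we read it: a sample falling inside an existing radius changes nothing; otherwise the cluster minimizing distance plus radius is widened if that sum stays within 2*Dthr, else a new zero-radius cluster is created:

```python
import numpy as np

def ecm(X, dthr):
    """Evolving Clustering Method sketch: one pass, returns centers and radii."""
    centers, radii = [X[0].copy()], [0.0]        # first cluster: R = 0
    for x in X[1:]:
        d = np.array([np.linalg.norm(x - c) for c in centers])
        if np.any(d <= np.array(radii)):         # inside a cluster: no update
            continue
        j = np.argmin(d + np.array(radii))       # candidate cluster to widen
        s = d[j] + radii[j]
        if s > 2 * dthr:                         # too far: create new cluster
            centers.append(x.copy()); radii.append(0.0)
        else:
            radii[j] = s / 2                     # widen: new radius = s/2,
            centers[j] = x + (centers[j] - x) * radii[j] / d[j]
            # (after the move, ||center - x|| equals the new radius)
    return np.array(centers), np.array(radii)

centers, radii = ecm(np.random.default_rng(2).uniform(size=(30, 2)), dthr=0.2)
```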

ECM-AAELM
Steps of ECM-AAELM

The following steps are required for the training process:

1. First, perform ECM on the given dataset and find how many clusters are generated.
2. Extract the centre of each cluster and treat each cluster as a hidden node; the number of hidden nodes therefore equals the number of generated clusters.
3. Calculate the normalized Euclidean distance from the centre of each cluster (i.e., from each hidden node).
4. Perform a non-linear transformation of this distance with the activation function to get the hidden-node output.
5. The normalized Euclidean distance formula is

$$\|x - y\| = \sqrt{\frac{1}{q}\sum_{i=1}^{q}(x_i - y_i)^2}$$

where q is the number of features.
6. Then perform the Moore-Penrose generalized inverse on the output of the previous step and multiply it by the dataset to calculate the output weight.
7. Finally, multiply the output weight by the hidden-node output to get the final output.
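Steps 1-7 in code, reusing the ecm sketch above; the Gaussian applied to the normalized distances is our choice of activation:

```python
import numpy as np

def ecm_aaelm_train(X, dthr):
    """ECM-AAELM sketch: hidden nodes = ECM cluster centers, targets = inputs."""
    centers, _ = ecm(X, dthr)                         # steps 1-2
    q = X.shape[1]
    # Steps 3 and 5: normalized Euclidean distance to every center.
    D = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) / q)
    H = np.exp(-D ** 2)                               # step 4: Gaussian activation
    beta = np.linalg.pinv(H) @ X                      # step 6: output weights
    return centers, beta                              # step 7: output = H @ beta
```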

ELM
Why Moore-Penrose Inverse

Why are we using the Moore-Penrose generalized inverse?

The Moore-Penrose inverse provides the solution of a linear system

$$Ax = y$$

such that the error $\|Ax - y\|$ and the norm $\|x\|$ are both minimized simultaneously, giving a unique solution:

$$x = A^{\dagger} y, \qquad A^{\dagger} = (A^{T}A)^{-1}A^{T}$$

(the closed form holds when $A^{T}A$ is invertible; in ELM, the hidden-layer output matrix $H$ plays the role of $A$).
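A quick numpy check of this closed form against the SVD-based pseudo-inverse (valid when A has full column rank):

```python
import numpy as np

A = np.random.default_rng(3).normal(size=(6, 3))     # full column rank (a.s.)
closed_form = np.linalg.inv(A.T @ A) @ A.T           # (A^T A)^-1 A^T
print(np.allclose(closed_form, np.linalg.pinv(A)))   # True
```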

Flow of the CPNN Algorithm

Initialize the network.
Repeat for N epochs, and for all inputs in each epoch:
  Get the input.
  Find the winner node.
  Update the winner and its neighbourhood (Kohonen layer).
  Update the nodes at the Grossberg outstar layer.
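A minimal epoch of the forward-only network with auto-associative targets (Y = X), as CPAANN uses; updating only the winner and keeping a fixed learning rate are simplifications of ours (the slide also updates a neighbourhood):

```python
import numpy as np

def cpaann_epoch(X, W_koh, W_gross, lr=0.1):
    """One CPNN epoch with auto-associative targets (CPAANN: Y = X)."""
    for x in X:
        win = np.argmin(np.linalg.norm(W_koh - x, axis=1))  # find winner
        W_koh[win] += lr * (x - W_koh[win])           # competitive update
        W_gross[win] += lr * (x - W_gross[win])       # Grossberg outstar update
    return W_koh, W_gross

rng = np.random.default_rng(4)
X = rng.uniform(size=(40, 3))
W_koh = rng.uniform(size=(8, 3))       # 8 hidden (Kohonen) nodes
W_gross = rng.uniform(size=(8, 3))     # outstar weights to the output layer
for _ in range(20):
    cpaann_epoch(X, W_koh, W_gross)
```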

Architecture of Forward-only CPNN

[Diagram: input layer x1, ..., xm connected to hidden layer h1, ..., hn by weights trained with simple competitive learning; hidden layer connected to output layer y1, ..., yp by weights trained with the outstar rule.]

How are the weights updated?

[Figure: 9 hidden nodes (red) and 10 input samples (blue).]

How to Calculate the Gray Distance

Gray Relational Coefficient:

$$GRC(x^{mis}_{kp}, x_{ip}) = \frac{\min_i \min_p \lvert x^{mis}_{kp} - x_{ip}\rvert + \zeta \,\max_i \max_p \lvert x^{mis}_{kp} - x_{ip}\rvert}{\lvert x^{mis}_{kp} - x_{ip}\rvert + \zeta \,\max_i \max_p \lvert x^{mis}_{kp} - x_{ip}\rvert}$$

with p = 1, 2, ..., m; k = 1, 2, ..., n; i = 1, 2, ..., o, and $0 \le \zeta \le 1$ controlling the level of differences with respect to the relational coefficient.

Gray Relational Grade:

$$GRG(x^{mis}_{k}, x_{i}) = \frac{1}{m}\sum_{p=1}^{m} GRC(x^{mis}_{kp}, x_{ip}), \qquad i = 1, 2, ..., o;\; k = 1, 2, ..., n.$$

Example

Record R1 has a missing value in attr5; R2-R5 are complete:

        attr1   attr2   attr3   attr4   attr5
R1      0.2     0.9     0.6     0.5     ??
R2      0.1     0.9     0.4     0.6     0.3
R3      0.1     0.8     0.5     0.6     0.4
R4      0.8     0.5     0.3     0.2     0.2
R5      0.5     0.3     0.9     0.7     0.8

Absolute differences |R1 - Ri| over attr1-attr4, with the per-record minimum and maximum:

        Abs.Diff1   Abs.Diff2   Abs.Diff3   Abs.Diff4   Min    Max
R2      0.1         0           0.2         0.1         0      0.2
R3      0.1         0.1         0.1         0.1         0.1    0.1
R4      0.6         0.4         0.3         0.3         0.3    0.6
R5      0.3         0.6         0.3         0.2         0.2    0.6

Global Min = 0, global Max = 0.6. The gray relational coefficients and grades are then:

        GRC1       GRC2       GRC3       GRC4       GRG
R2      0.75       1          0.6        0.75       0.775
R3      0.75       0.75       0.75       0.75       0.75
R4      0.333333   0.428571   0.5        0.5        0.440476
R5      0.5        0.333333   0.5        0.6        0.483333

R2 has the highest gray relational grade, so it is the nearest neighbour of R1, and the missing attr5 of R1 is imputed with R2's attr5:

Imputation by gray distance = 0.3 (actual value = 0.3).
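The same example computed numerically (ζ = 0.5 is the value the slide's numbers imply):

```python
import numpy as np

R1 = np.array([0.2, 0.9, 0.6, 0.5])             # observed attrs of the record
others = np.array([[0.1, 0.9, 0.4, 0.6],         # R2..R5 on the same attrs
                   [0.1, 0.8, 0.5, 0.6],
                   [0.8, 0.5, 0.3, 0.2],
                   [0.5, 0.3, 0.9, 0.7]])
attr5 = np.array([0.3, 0.4, 0.2, 0.8])           # candidate donor values

diff = np.abs(others - R1)                       # |x_kp - x_ip|
zeta = 0.5
lo, hi = diff.min(), diff.max()                  # global min and max
grc = (lo + zeta * hi) / (diff + zeta * hi)      # gray relational coefficient
grg = grc.mean(axis=1)                           # gray relational grade
print(grg)                                       # [0.775 0.75 0.440476 0.483333]
print(attr5[np.argmax(grg)])                     # 0.3 -> imputed value
```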

Gray Distance Based Imputation
Results

Average MAPE value over 10 folds - Gray Distance Based Imputation

Dataset          Mean     K-Means+MLP   Gray Distance Based Imputation
Auto mpg         59.7     23.75         16.73
Body fat         11.61    7.83          7.65
Boston Housing   37.77    21.01         19.28
Forest fires     24.728   26.61         22.89
Iris             23.57    9.41          5.34
Prima Indian     24.022   29.7          28.06
Spanish          55.53    39.91         36.29
Spectf           14.85    12.14         11.60
Turkish          66.007   33.01         36.63
UK bankruptcy    37.07    30.96         39.75
UK Credit        28.43    32.17         28.90
Wine             29.99    21.58         17.58

Literature Survey

Data Imputation

Vriens, M., & Melton, E. (2002). Managing missing data, Marketing Research, Volume 14, Issue 3, pp. 12-17.
Mistry, J., Nelwamondo, F. V., & Marwala, T. (2009). Data estimation using principal component analysis and auto-associative neural networks, Journal of Systemics, Cybernetics and Informatics, Volume 7, pp. 72-79.
Ankaiah, N., & Ravi, V. (2011). A novel soft computing hybrid for data imputation, International Conference on Data Mining, Las Vegas, USA.
Nishanth, K. J., Ankaiah, N., Ravi, V., & Bose, I. (2012). Soft computing based imputation and hybrid data and text mining: The case of predicting the severity of phishing alerts, Expert Systems with Applications, Volume 39, Issue 12, pp. 10583-10589.
Krishna, M., & Ravi, V. (2013). Particle swarm optimization and covariance matrix based data imputation, IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Enathi.

Extreme Learning Machine (ELM)

Huang, G. B., Zhu, Q., & Siew, C. (2006). Extreme Learning Machine: Theory and Applications, Neurocomputing, Elsevier, 7th Brazilian Symposium on Neural Networks, Volume 70, pp. 489-501.
Huang, G., Wang, D., & Lan, Y. (2011). Extreme Learning Machine: A Survey, International Journal of Machine Learning and Cybernetics, Volume 2, Issue 2, pp. 107-122.
Rajesh, R., & Siva, J. (2011). Extreme Learning Machine: A Review and State of the Art, International Journal of Wisdom Based Computing, Volume 1, pp. 35-49.
Huang, G. B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man and Cybernetics, Volume 42, Issue 2, pp. 513-529.

ECM & CPNN

Evolving Clustering Method

Song, Q., & Kasabov, N. (2001). ECM: A Novel On-line, Evolving Clustering Method and Its Applications, Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems, Berlin, pp. 87-92.

Counter Propagation Neural Network

Kuzmanovski, I., & Novič, M. (2008). Counter-propagation neural networks in Matlab, Chemometrics and Intelligent Laboratory Systems, pp. 84-91.
Ballabio, D., Consonni, V., & Todeschini, R. (2009). The Kohonen and CP-ANN toolbox: A collection of MATLAB modules for Self Organizing Maps and Counterpropagation Artificial Neural Networks, Chemometrics and Intelligent Laboratory Systems, pp. 115-122.
Sivanandam, S. N., & Deepa, S. N. Introduction to Neural Networks Using MATLAB 6.0.

Dataset Description

Dataset          Number of records   Number of attributes
Auto mpg         392                 7
Body fat         252                 14
Boston Housing   506                 13
Forest fires     516                 10
Iris             150                 4
Prima Indian     768                 8
Spanish          66                  9
Spectf           267                 44
Turkish          40                  12
UK bankruptcy    60                  10
UK Credit        1225                12
Wine             178                 13