
2015 IEEE International Conference on Big Data (Big Data)

Genetic Deep Neural Networks Using Different Activation Functions for Financial Data Mining
Luna M. Zhang
Soft Tech Consulting, Inc.
Chantilly, USA
e-mail: lunamzhang@gmail.com
Abstract—A Deep Neural Network (DNN) that uses the same activation function for all hidden neurons has an optimization limitation due to its single mathematical functionality. To overcome it, a new DNN with different activation functions is designed to globally optimize both the parameters (weights and biases) and the function selections. In addition, a novel Genetic Deep Neural Network (GDNN) with different activation functions uses genetic algorithms to optimize the parameters and selects the best activation function combination for the different neurons from many candidate combinations through sufficient simulations. Two sample financial data sets (Dow Jones Industrial Average and 30-Year Treasury Constant Maturity Rate) were used for performance analysis. Simulation results indicate that a GDNN using different activation functions can perform better than one using a single activation function. Future work includes (1) developing more effective DNNs using different activation functions with both cloud and GPU computing, (2) creating more effective DNNs by using new training optimization methods other than genetic algorithms, (3) using big data to further test the performance of the new GDNN, and (4) expanding its big data mining application areas (e.g., health and biomedical informatics, computer vision, social networks, security, etc.).

Keywords—Deep learning; machine learning; neural networks; genetic algorithms; big data mining; optimization; activation functions


I. INTRODUCTION

A Deep Neural Network (DNN) has many successful applications in big data mining, computer vision, image processing, pattern recognition, biomedical informatics, etc. [1-3]. A DNN usually has neurons that all use the same activation function, such as the unipolar sigmoid function [1-3]. A traditional learning algorithm usually focuses only on optimizing the parameters (weights and biases) among the neurons using one activation function. Different neural networks with different activation functions have different performance results. For instance, among five activation functions (bipolar sigmoid, unipolar sigmoid, hyperbolic tangent, conic section, and radial basis functions), simulation results indicate that the hyperbolic tangent function performs the best [4]. Thus, finding optimal or near-optimal activation functions for different neurons to greatly improve performance is an important and difficult research problem.

The human brain has a huge number of various biological neurons that all have different functionalities. Therefore, all biological neurons do not have the same mathematical function because they are not completely physically and biologically identical. To mimic the biological neural networks in the human brain, we create a more complex DNN with various neurons that use different mathematical activation functions, instead of a conventional DNN whose neurons all use the same activation function. The main merit is that a new learning algorithm can globally optimize both the parameters (weights and biases) among all the neurons and the activation function selection for each neuron to significantly enhance prediction performance (achieving comparatively lower error rates).

This paper investigates whether a DNN using more than one activation function performs better than a DNN using only one activation function. In addition, a new Genetic Deep Neural Network (GDNN) using different activation functions is developed by using Genetic Algorithms (GAs) to optimize the parameters. A simulation-based activation function optimization method is used to directly select the GDNN with the best combination of activation functions based on sufficient simulation results.

II. NEW DEEP NEURAL NETWORKS USING GENETIC ALGORITHMS AND DIFFERENT ACTIVATION FUNCTIONS

A. A Generic Framework of DNN

For the basic structure, a DNN has an input layer, n hidden layers, and an output layer. For given inputs x_1, x_2, ..., x_{N_0} with N_0 input neurons, let the outputs of the input layer be o_k^{(0)} = x_k for k = 1, 2, ..., N_0. The output of the j-th neuron in the i-th layer is defined as

o_j^{(i)} = g_j^{(i)} \Big( \sum_{k=1}^{N_{i-1}} w_{j,k}^{(i)} \, o_k^{(i-1)} + b_j^{(i)} \Big),    (1)

where i denotes the i-th layer (i.e., i = 0 is the input layer and i = n+1 is the output layer), g_j^{(i)} is the activation function of the j-th neuron in the i-th layer, b_j^{(i)} is the internal bias of the j-th neuron in the i-th layer, w_{j,k}^{(i)} is the weight between the k-th neuron in the (i-1)-th layer and the j-th neuron in the i-th layer, and o_j^{(i)} is defined for j = 1, 2, ..., N_i (i.e., the i-th layer has N_i neurons). This general DNN with N hidden neurons and M output neurons can use a maximum of N+M different activation functions.

Each neuron in each hidden or output layer of the general DNN can have a different activation function. Hence, if there are 100 neurons and 3 different activation functions, then there are a total of 3^100 combinations of functions for all the neurons. The advantage is that a global optimization algorithm can be used to find an optimal or near-optimal function combination in addition to only locally optimizing the parameters among all the neurons.

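To make Eq. (1) concrete, here is a minimal Java sketch of the forward pass in which every hidden and output neuron can be assigned its own activation function. It is illustrative only, not the paper's implementation; the class name SimpleGDNN and its field layout are assumptions.

```java
import java.util.function.DoubleUnaryOperator;

// Minimal sketch of Eq. (1): neuron j in layer i applies its own activation
// g[i][j] to a weighted sum of the previous layer's outputs plus its bias.
public class SimpleGDNN {
    private final double[][][] w;            // w[i][j][k]: weight from neuron k in layer i-1 to neuron j in layer i
    private final double[][] b;              // b[i][j]: bias of neuron j in layer i
    private final DoubleUnaryOperator[][] g; // g[i][j]: activation function of neuron j in layer i

    public SimpleGDNN(double[][][] w, double[][] b, DoubleUnaryOperator[][] g) {
        this.w = w;
        this.b = b;
        this.g = g;
    }

    // Propagates the inputs through all hidden layers and the output layer.
    public double[] forward(double[] inputs) {
        double[] prev = inputs;                      // outputs of layer i-1 (layer 0 = the inputs)
        for (int i = 0; i < w.length; i++) {         // layers 1..n and the output layer
            double[] out = new double[w[i].length];
            for (int j = 0; j < w[i].length; j++) {
                double sum = b[i][j];
                for (int k = 0; k < prev.length; k++) {
                    sum += w[i][j][k] * prev[k];
                }
                out[j] = g[i][j].applyAsDouble(sum); // per-neuron activation function
            }
            prev = out;
        }
        return prev;
    }
}
```

Under this layout, a GA chromosome only needs to encode the entries of w and b, while the array g is fixed by the activation function combination currently being evaluated.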

B. GDNN with Different Activation Functions


DNNs that use GAs and other evolutionary computation methods are described in [5-10]. GAs can be an excellent alternative to existing methods for training DNNs.
A new GDNN is developed by using GAs to optimize the parameters and by applying the simulation-based activation function optimization method to select the GDNN with the best function combination. The three activation functions tested by the new GDNN are given below:

unipolar sigmoid (U):   g(x) = \frac{1}{1 + e^{-x}},                        (2)

bipolar sigmoid (B):    g(x) = \frac{1 - e^{-x}}{1 + e^{-x}},               (3)

hyperbolic tangent (H): g(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.       (4)
III. SIMULATIONS AND PERFORMANCE ANALYSIS

The new GDNN is implemented in Java. 206 training data and 34 testing data (Dow Jones Industrial Average [11]) are used to analyze the GDNN with different numbers of hidden layers (4, 6, ..., 20). 1000 training data and 100 testing data (30-Year Treasury Constant Maturity Rate [11]) are also used to analyze it with 4, 6, 8, 10, and 12 hidden layers. Both data set samples are obtained from the source: http://www.forecasts.org/data/index.htm [11]. The fitness function (mean squared error (MSE)) to be minimized by the GAs is defined as

MSE = \frac{1}{KL} \sum_{i=1}^{K} \sum_{j=1}^{L} \big( o_j^{i} - t_j^{i} \big)^2,    (5)

where o_j^{i} is the predicted output and t_j^{i} is the correct output of the j-th output neuron for the i-th input datum, for K data and L output neurons.
In this paper, all of the neurons in the same hidden layer or the same output layer have the same activation function.
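For concreteness, the following is a minimal real-coded GA sketch of the kind of optimizer described above. It is not the paper's Java implementation; the class name SimpleGA, the population layout, the mutation rate, and the binary tournament selection are illustrative assumptions. The fitness callback would decode a chromosome into the network's weights and biases and return the MSE of Eq. (5) on the training data.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Minimal real-coded GA sketch: evolves a flat chromosome holding all
// weights and biases, minimizing a fitness such as the MSE of Eq. (5).
public class SimpleGA {
    private final Random rnd = new Random(42);

    public double[] minimize(ToDoubleFunction<double[]> fitness, int dim,
                             int popSize, int generations) {
        double[][] pop = new double[popSize][dim];
        double[] fit = new double[popSize];
        for (int i = 0; i < popSize; i++) {
            for (int d = 0; d < dim; d++) pop[i][d] = rnd.nextDouble() * 2 - 1; // init in [-1, 1]
            fit[i] = fitness.applyAsDouble(pop[i]);
        }
        for (int gen = 0; gen < generations; gen++) {
            int best = bestIndex(fit);
            double[][] next = new double[popSize][];
            double[] nextFit = new double[popSize];
            next[0] = pop[best].clone();                  // elitism: keep the best chromosome
            nextFit[0] = fit[best];
            for (int i = 1; i < popSize; i++) {
                double[] p1 = pop[tournament(fit)];       // binary tournament selection
                double[] p2 = pop[tournament(fit)];
                double[] child = new double[dim];
                for (int d = 0; d < dim; d++) {
                    child[d] = rnd.nextBoolean() ? p1[d] : p2[d];                    // uniform crossover
                    if (rnd.nextDouble() < 0.05) child[d] += rnd.nextGaussian() * 0.1; // Gaussian mutation
                }
                next[i] = child;
                nextFit[i] = fitness.applyAsDouble(child);
            }
            pop = next;
            fit = nextFit;
        }
        return pop[bestIndex(fit)];
    }

    private int tournament(double[] fit) {
        int a = rnd.nextInt(fit.length), b = rnd.nextInt(fit.length);
        return fit[a] <= fit[b] ? a : b;                  // lower MSE wins
    }

    private int bestIndex(double[] fit) {
        int best = 0;
        for (int i = 1; i < fit.length; i++) if (fit[i] < fit[best]) best = i;
        return best;
    }
}
```

The hyperparameter values shown (mutation rate, step size, seed) are placeholders, not values reported in the paper.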

A. GDNN with One Activation Function

Each neuron in the GDNN uses the same activation function (B, U, or H). Results are shown in Tables I and II.

TABLE I. PERFORMANCE OF GDNN WITH SINGLE ACTIVATION FUNCTION FOR DOW JONES INDUSTRIAL AVERAGE

No. Hidden Layers | B (train) | U (train) | H (train) | B (test) | U (test) | H (test)
4                 |  0.0238   |  0.0247   |  0.0391   |  0.0098  |  0.0368  |  0.0211
6                 |  0.0442   |  0.0442   |  0.0212   |  0.0163  |  0.0387  |  0.0349
8                 |  0.0380   |  0.0386   |  0.0436   |  0.0330  |  0.0833  |  0.0411
10                |  0.0482   |  0.0603   |  0.0478   |  0.0720  |  0.1066  |  0.0164
12                |  0.0608   |  0.0520   |  0.0645   |  0.1071  |  0.0601  |  0.0503
14                |  0.0633   |  0.0645   |  0.0645   |  0.0207  |  0.1024  |  0.1016
16                |  0.0361   |  0.0645   |  0.0645   |  0.0164  |  0.1022  |  0.1031
18                |  0.0563   |  0.0645   |  0.0645   |  0.0188  |  0.1022  |  0.1019
20                |  0.0645   |  0.0645   |  0.0645   |  0.1019  |  0.1023  |  0.1026
Avg.              |  0.0484   |  0.0531   |  0.0527   |  0.0440  |  0.0816  |  0.0637

TABLE II. PERFORMANCE OF GDNN WITH SINGLE ACTIVATION FUNCTION FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

No. Hidden Layers | B (train) | U (train) | H (train) | B (test) | U (test) | H (test)
4                 |  0.0101   |  0.0066   |  0.0078   |  0.0043  |  0.0051  |  0.0102
6                 |  0.0135   |  0.0104   |  0.0120   |  0.0161  |  0.0172  |  0.0028
8                 |  0.0243   |  0.0169   |  0.0162   |  0.0359  |  0.0201  |  0.0080
10                |  0.0242   |  0.0379   |  0.0239   |  0.0077  |  0.0597  |  0.0223
12                |  0.0360   |  0.0399   |  0.0316   |  0.0700  |  0.0365  |  0.0635
Avg.              |  0.0216   |  0.0223   |  0.0183   |  0.0268  |  0.0277  |  0.0213

B. GDNN with Two Activation Functions

The hidden layers alternate between two different activation functions, and the output layer uses the same activation function as the last (n-th) hidden layer. For example, B-U in Table III means that all the neurons in the odd hidden layers (1st, 3rd, etc.) use the bipolar sigmoid function while all the neurons in the even hidden layers (2nd, 4th, etc.) use the unipolar sigmoid function; the output layer then uses the unipolar sigmoid function. Results are shown in Tables III to VI.
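As an illustration of the layer-wise assignment just described (a sketch under the stated rule, not the paper's code; the class and method names are assumptions), the pattern can be expanded into a per-layer activation array as follows:

```java
import java.util.function.DoubleUnaryOperator;

// Expands a cyclic activation pattern over the hidden layers, e.g. {B, U}
// gives B-U-B-U-...; the output layer reuses the last hidden layer's function.
public class LayerPattern {
    public static DoubleUnaryOperator[] assign(DoubleUnaryOperator[] pattern, int hiddenLayers) {
        DoubleUnaryOperator[] perLayer = new DoubleUnaryOperator[hiddenLayers + 1]; // +1 for the output layer
        for (int i = 0; i < hiddenLayers; i++) {
            perLayer[i] = pattern[i % pattern.length];        // cycle through the pattern
        }
        perLayer[hiddenLayers] = perLayer[hiddenLayers - 1];  // output layer copies the last hidden layer
        return perLayer;
    }
}
```

For example, assign(new DoubleUnaryOperator[]{Activation.B, Activation.U}, 4), using the enum sketched in Section II.B, yields B-U-B-U for the four hidden layers and U for the output layer, matching the B-U setting of Tables III and IV.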

TABLE III. TRAINING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE (TRAINING MSE)

No. Hidden Layers |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
4                 | 0.0292 | 0.0381 | 0.0401 | 0.0279 | 0.0202 | 0.0359
6                 | 0.0358 | 0.0367 | 0.0306 | 0.0219 | 0.0281 | 0.0374
8                 | 0.0385 | 0.0431 | 0.0430 | 0.0392 | 0.0458 | 0.0420
10                | 0.0446 | 0.0512 | 0.0515 | 0.0399 | 0.0394 | 0.0418
12                | 0.0436 | 0.0502 | 0.0501 | 0.0559 | 0.0324 | 0.0446
14                | 0.0576 | 0.0645 | 0.0526 | 0.0554 | 0.0540 | 0.0645
16                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
18                | 0.0580 | 0.0573 | 0.0645 | 0.0645 | 0.0645 | 0.0645
20                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
Avg.              | 0.0485 | 0.0522 | 0.0513 | 0.0482 | 0.0459 | 0.0511


TABLE IV. TESTING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE

Pattern             |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
Average Testing MSE | 0.0511 | 0.0479 | 0.0550 | 0.0491 | 0.0583 | 0.0532

TABLE V. TRAINING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE (TRAINING MSE)

No. Hidden Layers |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
4                 | 0.0065 | 0.0060 | 0.0079 | 0.0079 | 0.0127 | 0.0101
6                 | 0.0113 | 0.0117 | 0.0097 | 0.0108 | 0.0173 | 0.0106
8                 | 0.0245 | 0.0189 | 0.0237 | 0.0138 | 0.0156 | 0.0202
10                | 0.0226 | 0.0240 | 0.0305 | 0.0272 | 0.0236 | 0.0147
12                | 0.0227 | 0.0200 | 0.0334 | 0.0304 | 0.0399 | 0.0391
Avg.              | 0.0175 | 0.0161 | 0.0211 | 0.0180 | 0.0218 | 0.0189

TABLE VI. TESTING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

Pattern             |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
Average Testing MSE | 0.0132 | 0.0152 | 0.0154 | 0.0230 | 0.0302 | 0.0161

C. GDNN with Three Activation Functions

The hidden layers cycle through three different activation functions, and the output layer reuses one of the three functions in the pattern. For example, B-H-U in Table VII means that all the neurons in the 1st, 4th, ... hidden layers use B, all the neurons in the 2nd, 5th, ... hidden layers use H, and all the neurons in the 3rd, 6th, ... hidden layers use U; the output layer uses H. Results are shown in Tables VII to X.
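The simulation-based activation function optimization method used throughout this section amounts to training one GDNN per candidate pattern and depth and keeping the pattern with the lowest average testing MSE. A minimal sketch follows; the class name, the Evaluator callback, and its signature are illustrative assumptions, not the paper's code.

```java
import java.util.List;

// Simulation-based activation-function selection: evaluate every candidate
// pattern (e.g. "B-U", "U-B-H", ...) and keep the one whose trained GDNNs
// achieve the lowest average testing MSE over all tested depths.
public class PatternSelector {
    // Trains a GDNN for the given pattern and depth and returns its testing MSE.
    public interface Evaluator {
        double testMse(String pattern, int hiddenLayers);
    }

    public static String selectBest(List<String> patterns, int[] depths, Evaluator eval) {
        String best = null;
        double bestAvg = Double.MAX_VALUE;
        for (String p : patterns) {
            double sum = 0.0;
            for (int n : depths) sum += eval.testMse(p, n);   // e.g. depths = {4, 6, ..., 20}
            double avg = sum / depths.length;
            if (avg < bestAvg) {                              // keep the lowest average testing MSE
                bestAvg = avg;
                best = p;
            }
        }
        return best;
    }
}
```

Applied to the averages reported in Tables IV, VI, VIII, and X, this is the selection step that singles out one activation function combination per data set.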
TABLE VII. TRAINING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE (TRAINING MSE)

No. Hidden Layers | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
4                 | 0.0387 | 0.0225 | 0.0222 | 0.0334 | 0.0230 | 0.0365
6                 | 0.0492 | 0.0381 | 0.0349 | 0.0228 | 0.0256 | 0.0370
8                 | 0.0470 | 0.0474 | 0.0372 | 0.0393 | 0.0404 | 0.0391
10                | 0.0500 | 0.0359 | 0.0380 | 0.0477 | 0.0411 | 0.0512
12                | 0.0572 | 0.0442 | 0.0541 | 0.0574 | 0.0492 | 0.0483
14                | 0.0645 | 0.0645 | 0.0515 | 0.0618 | 0.0430 | 0.0645
16                | 0.0645 | 0.0513 | 0.0645 | 0.0645 | 0.0550 | 0.0606
18                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
20                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
Avg.              | 0.0556 | 0.0481 | 0.0479 | 0.0507 | 0.0451 | 0.0518

TABLE VIII. TESTING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE

Pattern             | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
Average Testing MSE | 0.0648 | 0.0742 | 0.0524 | 0.0762 | 0.0351 | 0.0694

TABLE IX. TRAINING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE (TRAINING MSE)

No. Hidden Layers | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
4                 | 0.0082 | 0.0081 | 0.0071 | 0.0074 | 0.0396 | 0.0058
6                 | 0.0101 | 0.0101 | 0.0128 | 0.0128 | 0.0159 | 0.0103
8                 | 0.0246 | 0.0073 | 0.0201 | 0.0156 | 0.0089 | 0.0190
10                | 0.0340 | 0.0307 | 0.0354 | 0.0235 | 0.0282 | 0.0273
12                | 0.0304 | 0.0152 | 0.0330 | 0.0396 | 0.0369 | 0.0262
Avg.              | 0.0214 | 0.0143 | 0.0217 | 0.0198 | 0.0259 | 0.0177

TABLE X. TESTING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

Pattern             | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
Average Testing MSE | 0.0234 | 0.0209 | 0.0279 | 0.0189 | 0.0172 | 0.0162

IV. CONCLUSIONS

The simulation results indicate that, among the 15 GDNNs, the GDNN with U-B-H is the best, with the minimum average testing MSE (0.0351), for the Dow Jones Industrial Average data, and the GDNN with B-H is the best, with the minimum average testing MSE (0.0132), for the 30-Year Treasury Constant Maturity Rate data. Thus, a GDNN using different activation functions can perform better than one using a single function. The new GDNN can optimize both the parameters, by GAs, and the activation function selections, by the simulation-based activation function optimization method.

Future work includes (1) developing more effective DNNs by quickly optimizing both the parameters and the activation function selections using cloud computing and GPU computing, (2) creating more effective DNNs by using optimization methods other than GAs, such as human-inspired algorithms [12], ant colony optimization, and bees algorithms, (3) using big data to further test the performance of the new GDNN, and (4) expanding its big data mining application areas (e.g., health and biomedical informatics, computer vision, social networks, security, etc.).
REFERENCES
[1] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring Strategies for Training Deep Neural Networks," Journal of Machine Learning Research, vol. 10, pp. 1-40, January 2009.
[2] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proc. International Conference on Artificial Intelligence and Statistics (AISTATS'10), pp. 249-256, 2010.
[3] https://en.wikipedia.org/wiki/Deep_learning.
[4] B. Karlik and A. V. Olgac, "Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks," International Journal of Artificial Intelligence and Expert Systems (IJAE), vol. 1, issue 4, pp. 111-122, 2011.
[5] J. D. Lamos-Sweeney, "Deep Learning Using Genetic Algorithms," Thesis, Department of Computer Science, Rochester Institute of Technology, May 2012.
[6] O. E. David and I. Greental, "Genetic algorithms for evolving deep neural networks," Proc. 2014 Conference Companion on Genetic and Evolutionary Computation, pp. 1451-1452, 2014.
[7] S. S. Tirumala, "Implementation of Evolutionary Algorithms for Deep Architectures," 2nd International Workshop on Artificial Intelligence and Cognition, Torino, Italy, pp. 164-171, 2014.
[8] T. Shinozaki and S. Watanabe, "Structure discovery of deep neural network based on evolutionary algorithms," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4979-4983, 2015.
[9] E. Fall and H.-H. Chiang, "Neural networks with dynamic structure using a GA-based learning method," 2015 IEEE 12th International Conference on Networking, Sensing and Control (ICNSC), pp. 7-12, 2015.
[10] S. Lander and Y. Shang, "EvoAE -- A New Evolutionary Method for Training Autoencoders for Deep Learning Networks," 2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC), pp. 790-795, 2015.
[11] http://www.forecasts.org/data/index.htm.
[12] L. M. Zhang and Y.-Q. Zhang, "The Human-Inspired Algorithm: A Hybrid Nature-Inspired Approach to Optimizing Continuous Functions with Constraints," Journal of Computational Intelligence and Electronic Systems, vol. 2, no. 1, pp. 80-87, June 2013.

