
2015 IEEE International Conference on Big Data (Big Data)

Genetic Deep Neural Networks Using Different Activation Functions for Financial Data Mining
Luna M. Zhang
Soft Tech Consulting, Inc.
Chantilly, USA
e-mail: lunamzhang@gmail.com
Abstract—A Deep Neural Network (DNN) that uses the same activation function for all hidden neurons has an optimization limitation due to its single mathematical functionality. To overcome it, a new DNN with different activation functions is designed to globally optimize both the parameters (weights and biases) and the function selections. In addition, a novel Genetic Deep Neural Network (GDNN) with different activation functions uses genetic algorithms to optimize the parameters and selects the best activation function combination for the different neurons from many candidate combinations through sufficient simulations. Two sample financial data sets (Dow Jones Industrial Average and 30-Year Treasury Constant Maturity Rate) were used for performance analysis. Simulation results indicate that a GDNN using different activation functions can perform better than one using a single activation function. Future work includes (1) developing more effective DNNs using different activation functions with both cloud and GPU computing, (2) creating more effective DNNs by using new training optimization methods other than genetic algorithms, (3) using big data to further test the performance of the new GDNN, and (4) expanding its big data mining application areas (e.g., health and biomedical informatics, computer vision, social networks, security, etc.).

Keywords—Deep learning; machine learning; neural networks; genetic algorithms; big data mining; optimization; activation functions


I. INTRODUCTION

A Deep Neural Network (DNN) has many successful applications in big data mining, computer vision, image processing, pattern recognition, biomedical informatics, etc. [1-3]. A DNN usually has neurons that all use the same activation function, such as the unipolar sigmoid function [1-3]. A traditional learning algorithm usually focuses only on optimizing the parameters (weights and biases) among the neurons using one activation function. Different neural networks with different activation functions have different performance results. For instance, among five activation functions (bipolar sigmoid, unipolar sigmoid, hyperbolic tangent, conic section, and radial basis functions), simulation results indicate that the hyperbolic tangent function performs the best [4]. Thus, finding optimal or near-optimal activation functions for different neurons to greatly improve performance is an important and difficult research problem.

The human brain has a huge number of various biological neurons that all have different functionalities. Therefore, all biological neurons do not have the same mathematical function because they are not completely physically and biologically identical. To mimic the biological neural networks in the human brain, we create a more complex DNN with various neurons that use different mathematical activation functions, instead of a conventional DNN whose neurons all use the same activation function. The main merit is that a new learning algorithm can globally optimize both the parameters (weights and biases) among all the neurons and the activation function selection for each neuron to significantly enhance prediction performance (achieving comparatively lower error rates).

This paper investigates whether a DNN using more than one activation function performs better than a DNN using only one activation function. In addition, a new Genetic Deep Neural Network (GDNN) using different activation functions is developed by using Genetic Algorithms (GAs) to optimize the parameters. A simulation-based activation function optimization method is used to directly select the GDNN with the best combination of activation functions based on sufficient simulation results.

II. NEW DEEP NEURAL NETWORKS USING GENETIC ALGORITHMS AND DIFFERENT ACTIVATION FUNCTIONS

A. A Generic Framework of DNN

For the basic structure, a DNN has an input layer, n hidden layers, and an output layer. For given inputs x_1, x_2, ..., x_{N_0} with N_0 input neurons, let the outputs of the input layer be o_k^{(0)} = x_k for k = 1, 2, ..., N_0. The output of the j-th neuron in the i-th layer is defined as

o_j^{(i)} = g_j^{(i)} \Big( \sum_{k=1}^{N_{i-1}} w_{j,k}^{(i)} \, o_k^{(i-1)} + b_j^{(i)} \Big),    (1)

where i denotes the i-th layer (i.e., i = 0 is the input layer and i = n+1 is the output layer), g_j^{(i)} is the activation function of the j-th neuron in the i-th layer, b_j^{(i)} is the internal bias of the j-th neuron in the i-th layer, w_{j,k}^{(i)} is the weight between the k-th neuron in the (i-1)-th layer and the j-th neuron in the i-th layer, and o_j^{(i)} is defined for j = 1, 2, ..., N_i (i.e., the i-th layer has N_i neurons). This general DNN with N hidden neurons and M output neurons can use a maximum of N+M different activation functions.

Each neuron in each hidden or output layer of the general DNN can have a different activation function. Hence, if there are 100 neurons and 3 different activation functions, then there are a total of 3^100 combinations of functions for all the neurons. The advantage is that a global optimization algorithm can be used to find an optimal or near-optimal function combination in addition to only locally optimizing the parameters among all the neurons.

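To make Eq. (1) concrete, here is a minimal Java sketch of the forward pass in which every hidden and output neuron can be assigned its own activation function. It is illustrative only, not the paper's implementation; the class name SimpleGDNN and its field layout are assumptions.

```java
import java.util.function.DoubleUnaryOperator;

// Minimal sketch of Eq. (1): neuron j in layer i applies its own activation
// g[i][j] to a weighted sum of the previous layer's outputs plus its bias.
public class SimpleGDNN {
    private final double[][][] w;            // w[i][j][k]: weight from neuron k in layer i-1 to neuron j in layer i
    private final double[][] b;              // b[i][j]: bias of neuron j in layer i
    private final DoubleUnaryOperator[][] g; // g[i][j]: activation function of neuron j in layer i

    public SimpleGDNN(double[][][] w, double[][] b, DoubleUnaryOperator[][] g) {
        this.w = w;
        this.b = b;
        this.g = g;
    }

    // Propagates the inputs through all hidden layers and the output layer.
    public double[] forward(double[] inputs) {
        double[] prev = inputs;                      // outputs of layer i-1 (layer 0 = the inputs)
        for (int i = 0; i < w.length; i++) {         // layers 1..n and the output layer
            double[] out = new double[w[i].length];
            for (int j = 0; j < w[i].length; j++) {
                double sum = b[i][j];
                for (int k = 0; k < prev.length; k++) {
                    sum += w[i][j][k] * prev[k];
                }
                out[j] = g[i][j].applyAsDouble(sum); // per-neuron activation function
            }
            prev = out;
        }
        return prev;
    }
}
```

Under this layout, a GA chromosome only needs to encode the entries of w and b, while the array g is fixed by the activation function combination currently being evaluated.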

B. GDNN with Different Activation Functions


DNNs that use GAs and other evolutionary computation methods are described in [5-10]. GAs can be an excellent alternative to existing methods for training DNNs.
A new GDNN is developed by using GAs to optimize the parameters and by applying the simulation-based activation function optimization method to select the GDNN with the best function combination. The three activation functions tested by the new GDNN are given below:

unipolar sigmoid (U):   g(x) = \frac{1}{1 + e^{-x}},                        (2)

bipolar sigmoid (B):    g(x) = \frac{1 - e^{-x}}{1 + e^{-x}},               (3)

hyperbolic tangent (H): g(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.       (4)
III. SIMULATIONS AND PERFORMANCE ANALYSIS

The new GDNN is implemented in Java. 206 training data and 34 testing data (Dow Jones Industrial Average [11]) are used to analyze the GDNN with different numbers of hidden layers (4, 6, ..., 20). 1000 training data and 100 testing data (30-Year Treasury Constant Maturity Rate [11]) are also used to analyze it with 4, 6, 8, 10, and 12 hidden layers. Both data set samples are obtained from the source: http://www.forecasts.org/data/index.htm [11]. The fitness function (mean squared error (MSE)) to be minimized by the GAs is defined as

MSE = \frac{1}{KL} \sum_{i=1}^{K} \sum_{j=1}^{L} \big( o_j^{i} - t_j^{i} \big)^2,    (5)

where o_j^{i} is the predicted output and t_j^{i} is the correct output of the j-th output neuron for the i-th input datum, for K data and L output neurons.
In this paper, all of the neurons in the same hidden layer or the same output layer have the same activation function.
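For concreteness, the following is a minimal real-coded GA sketch of the kind of optimizer described above. It is not the paper's Java implementation; the class name SimpleGA, the population layout, the mutation rate, and the binary tournament selection are illustrative assumptions. The fitness callback would decode a chromosome into the network's weights and biases and return the MSE of Eq. (5) on the training data.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Minimal real-coded GA sketch: evolves a flat chromosome holding all
// weights and biases, minimizing a fitness such as the MSE of Eq. (5).
public class SimpleGA {
    private final Random rnd = new Random(42);

    public double[] minimize(ToDoubleFunction<double[]> fitness, int dim,
                             int popSize, int generations) {
        double[][] pop = new double[popSize][dim];
        double[] fit = new double[popSize];
        for (int i = 0; i < popSize; i++) {
            for (int d = 0; d < dim; d++) pop[i][d] = rnd.nextDouble() * 2 - 1; // init in [-1, 1]
            fit[i] = fitness.applyAsDouble(pop[i]);
        }
        for (int gen = 0; gen < generations; gen++) {
            int best = bestIndex(fit);
            double[][] next = new double[popSize][];
            double[] nextFit = new double[popSize];
            next[0] = pop[best].clone();                  // elitism: keep the best chromosome
            nextFit[0] = fit[best];
            for (int i = 1; i < popSize; i++) {
                double[] p1 = pop[tournament(fit)];       // binary tournament selection
                double[] p2 = pop[tournament(fit)];
                double[] child = new double[dim];
                for (int d = 0; d < dim; d++) {
                    child[d] = rnd.nextBoolean() ? p1[d] : p2[d];                    // uniform crossover
                    if (rnd.nextDouble() < 0.05) child[d] += rnd.nextGaussian() * 0.1; // Gaussian mutation
                }
                next[i] = child;
                nextFit[i] = fitness.applyAsDouble(child);
            }
            pop = next;
            fit = nextFit;
        }
        return pop[bestIndex(fit)];
    }

    private int tournament(double[] fit) {
        int a = rnd.nextInt(fit.length), b = rnd.nextInt(fit.length);
        return fit[a] <= fit[b] ? a : b;                  // lower MSE wins
    }

    private int bestIndex(double[] fit) {
        int best = 0;
        for (int i = 1; i < fit.length; i++) if (fit[i] < fit[best]) best = i;
        return best;
    }
}
```

The hyperparameter values shown (mutation rate, step size, seed) are placeholders, not values reported in the paper.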

A. GDNN with One Activation Function

Each neuron in the GDNN uses the same activation function (B, U, or H). Results are shown in Tables I and II.

TABLE I. PERFORMANCE OF GDNN WITH SINGLE ACTIVATION FUNCTION FOR DOW JONES INDUSTRIAL AVERAGE

No. Hidden Layers | B (train) | U (train) | H (train) | B (test) | U (test) | H (test)
4                 |  0.0238   |  0.0247   |  0.0391   |  0.0098  |  0.0368  |  0.0211
6                 |  0.0442   |  0.0442   |  0.0212   |  0.0163  |  0.0387  |  0.0349
8                 |  0.0380   |  0.0386   |  0.0436   |  0.0330  |  0.0833  |  0.0411
10                |  0.0482   |  0.0603   |  0.0478   |  0.0720  |  0.1066  |  0.0164
12                |  0.0608   |  0.0520   |  0.0645   |  0.1071  |  0.0601  |  0.0503
14                |  0.0633   |  0.0645   |  0.0645   |  0.0207  |  0.1024  |  0.1016
16                |  0.0361   |  0.0645   |  0.0645   |  0.0164  |  0.1022  |  0.1031
18                |  0.0563   |  0.0645   |  0.0645   |  0.0188  |  0.1022  |  0.1019
20                |  0.0645   |  0.0645   |  0.0645   |  0.1019  |  0.1023  |  0.1026
Avg.              |  0.0484   |  0.0531   |  0.0527   |  0.0440  |  0.0816  |  0.0637

TABLE II. PERFORMANCE OF GDNN WITH SINGLE ACTIVATION FUNCTION FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

No. Hidden Layers | B (train) | U (train) | H (train) | B (test) | U (test) | H (test)
4                 |  0.0101   |  0.0066   |  0.0078   |  0.0043  |  0.0051  |  0.0102
6                 |  0.0135   |  0.0104   |  0.0120   |  0.0161  |  0.0172  |  0.0028
8                 |  0.0243   |  0.0169   |  0.0162   |  0.0359  |  0.0201  |  0.0080
10                |  0.0242   |  0.0379   |  0.0239   |  0.0077  |  0.0597  |  0.0223
12                |  0.0360   |  0.0399   |  0.0316   |  0.0700  |  0.0365  |  0.0635
Avg.              |  0.0216   |  0.0223   |  0.0183   |  0.0268  |  0.0277  |  0.0213

B. GDNN with Two Activation Functions

The hidden layers alternate between two different activation functions, and the output layer uses the same activation function as the last (n-th) hidden layer. For example, B-U in Table III means that all the neurons in the odd hidden layers (1st, 3rd, etc.) use the bipolar sigmoid function while all the neurons in the even hidden layers (2nd, 4th, etc.) use the unipolar sigmoid function; the output layer then uses the unipolar sigmoid function. Results are shown in Tables III to VI.
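As an illustration of the layer-wise assignment just described (a sketch under the stated rule, not the paper's code; the class and method names are assumptions), the pattern can be expanded into a per-layer activation array as follows:

```java
import java.util.function.DoubleUnaryOperator;

// Expands a cyclic activation pattern over the hidden layers, e.g. {B, U}
// gives B-U-B-U-...; the output layer reuses the last hidden layer's function.
public class LayerPattern {
    public static DoubleUnaryOperator[] assign(DoubleUnaryOperator[] pattern, int hiddenLayers) {
        DoubleUnaryOperator[] perLayer = new DoubleUnaryOperator[hiddenLayers + 1]; // +1 for the output layer
        for (int i = 0; i < hiddenLayers; i++) {
            perLayer[i] = pattern[i % pattern.length];        // cycle through the pattern
        }
        perLayer[hiddenLayers] = perLayer[hiddenLayers - 1];  // output layer copies the last hidden layer
        return perLayer;
    }
}
```

For example, assign(new DoubleUnaryOperator[]{Activation.B, Activation.U}, 4), using the enum sketched in Section II.B, yields B-U-B-U for the four hidden layers and U for the output layer, matching the B-U setting of Tables III and IV.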

TABLE III. TRAINING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE (TRAINING MSE)

No. Hidden Layers |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
4                 | 0.0292 | 0.0381 | 0.0401 | 0.0279 | 0.0202 | 0.0359
6                 | 0.0358 | 0.0367 | 0.0306 | 0.0219 | 0.0281 | 0.0374
8                 | 0.0385 | 0.0431 | 0.0430 | 0.0392 | 0.0458 | 0.0420
10                | 0.0446 | 0.0512 | 0.0515 | 0.0399 | 0.0394 | 0.0418
12                | 0.0436 | 0.0502 | 0.0501 | 0.0559 | 0.0324 | 0.0446
14                | 0.0576 | 0.0645 | 0.0526 | 0.0554 | 0.0540 | 0.0645
16                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
18                | 0.0580 | 0.0573 | 0.0645 | 0.0645 | 0.0645 | 0.0645
20                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
Avg.              | 0.0485 | 0.0522 | 0.0513 | 0.0482 | 0.0459 | 0.0511


TABLE IV. TESTING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE

Pattern             |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
Average Testing MSE | 0.0511 | 0.0479 | 0.0550 | 0.0491 | 0.0583 | 0.0532

TABLE V. TRAINING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE (TRAINING MSE)

No. Hidden Layers |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
4                 | 0.0065 | 0.0060 | 0.0079 | 0.0079 | 0.0127 | 0.0101
6                 | 0.0113 | 0.0117 | 0.0097 | 0.0108 | 0.0173 | 0.0106
8                 | 0.0245 | 0.0189 | 0.0237 | 0.0138 | 0.0156 | 0.0202
10                | 0.0226 | 0.0240 | 0.0305 | 0.0272 | 0.0236 | 0.0147
12                | 0.0227 | 0.0200 | 0.0334 | 0.0304 | 0.0399 | 0.0391
Avg.              | 0.0175 | 0.0161 | 0.0211 | 0.0180 | 0.0218 | 0.0189

TABLE VI. TESTING PERFORMANCE OF GDNN WITH TWO ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

Pattern             |  B-U   |  B-H   |  U-B   |  U-H   |  H-U   |  H-B
Average Testing MSE | 0.0132 | 0.0152 | 0.0154 | 0.0230 | 0.0302 | 0.0161

C. GDNN with Three Activation Functions

The hidden layers cycle through three different activation functions, and the output layer reuses one of the three functions in the pattern. For example, B-H-U in Table VII means that all the neurons in the 1st, 4th, ... hidden layers use B, all the neurons in the 2nd, 5th, ... hidden layers use H, and all the neurons in the 3rd, 6th, ... hidden layers use U; the output layer uses H. Results are shown in Tables VII to X.
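The simulation-based activation function optimization method used throughout this section amounts to training one GDNN per candidate pattern and depth and keeping the pattern with the lowest average testing MSE. A minimal sketch follows; the class name, the Evaluator callback, and its signature are illustrative assumptions, not the paper's code.

```java
import java.util.List;

// Simulation-based activation-function selection: evaluate every candidate
// pattern (e.g. "B-U", "U-B-H", ...) and keep the one whose trained GDNNs
// achieve the lowest average testing MSE over all tested depths.
public class PatternSelector {
    // Trains a GDNN for the given pattern and depth and returns its testing MSE.
    public interface Evaluator {
        double testMse(String pattern, int hiddenLayers);
    }

    public static String selectBest(List<String> patterns, int[] depths, Evaluator eval) {
        String best = null;
        double bestAvg = Double.MAX_VALUE;
        for (String p : patterns) {
            double sum = 0.0;
            for (int n : depths) sum += eval.testMse(p, n);   // e.g. depths = {4, 6, ..., 20}
            double avg = sum / depths.length;
            if (avg < bestAvg) {                              // keep the lowest average testing MSE
                bestAvg = avg;
                best = p;
            }
        }
        return best;
    }
}
```

Applied to the averages reported in Tables IV, VI, VIII, and X, this is the selection step that singles out one activation function combination per data set.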
TABLE VII. TRAINING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE (TRAINING MSE)

No. Hidden Layers | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
4                 | 0.0387 | 0.0225 | 0.0222 | 0.0334 | 0.0230 | 0.0365
6                 | 0.0492 | 0.0381 | 0.0349 | 0.0228 | 0.0256 | 0.0370
8                 | 0.0470 | 0.0474 | 0.0372 | 0.0393 | 0.0404 | 0.0391
10                | 0.0500 | 0.0359 | 0.0380 | 0.0477 | 0.0411 | 0.0512
12                | 0.0572 | 0.0442 | 0.0541 | 0.0574 | 0.0492 | 0.0483
14                | 0.0645 | 0.0645 | 0.0515 | 0.0618 | 0.0430 | 0.0645
16                | 0.0645 | 0.0513 | 0.0645 | 0.0645 | 0.0550 | 0.0606
18                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
20                | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645 | 0.0645
Avg.              | 0.0556 | 0.0481 | 0.0479 | 0.0507 | 0.0451 | 0.0518

TABLE VIII. TESTING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR DOW JONES INDUSTRIAL AVERAGE

Pattern             | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
Average Testing MSE | 0.0648 | 0.0742 | 0.0524 | 0.0762 | 0.0351 | 0.0694

TABLE IX. TRAINING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE (TRAINING MSE)

No. Hidden Layers | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
4                 | 0.0082 | 0.0081 | 0.0071 | 0.0074 | 0.0396 | 0.0058
6                 | 0.0101 | 0.0101 | 0.0128 | 0.0128 | 0.0159 | 0.0103
8                 | 0.0246 | 0.0073 | 0.0201 | 0.0156 | 0.0089 | 0.0190
10                | 0.0340 | 0.0307 | 0.0354 | 0.0235 | 0.0282 | 0.0273
12                | 0.0304 | 0.0152 | 0.0330 | 0.0396 | 0.0369 | 0.0262
Avg.              | 0.0214 | 0.0143 | 0.0217 | 0.0198 | 0.0259 | 0.0177

TABLE X. TESTING PERFORMANCE OF GDNN WITH THREE ACTIVATION FUNCTIONS FOR 30-YEAR TREASURY CONSTANT MATURITY RATE

Pattern             | B-H-U  | B-U-H  | H-B-U  | H-U-B  | U-B-H  | U-H-B
Average Testing MSE | 0.0234 | 0.0209 | 0.0279 | 0.0189 | 0.0172 | 0.0162

IV. CONCLUSIONS

The simulation results indicate that, among the 15 GDNNs, the GDNN with U-B-H is the best, with the minimum average testing MSE (0.0351), for the Dow Jones Industrial Average data, and the GDNN with B-H is the best, with the minimum average testing MSE (0.0132), for the 30-Year Treasury Constant Maturity Rate data. Thus, a GDNN using different activation functions can perform better than one using a single function. The new GDNN can optimize both the parameters, by GAs, and the activation function selections, by the simulation-based activation function optimization method.

Future work includes (1) developing more effective DNNs by quickly optimizing both the parameters and the activation function selections using cloud computing and GPU computing, (2) creating more effective DNNs by using optimization methods other than GAs, such as human-inspired algorithms [12], ant colony optimization, and bees algorithms, (3) using big data to further test the performance of the new GDNN, and (4) expanding its big data mining application areas (e.g., health and biomedical informatics, computer vision, social networks, security, etc.).
REFERENCES
[1] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring Strategies for Training Deep Neural Networks," Journal of Machine Learning Research, vol. 10, pp. 1-40, January 2009.
[2] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proc. International Conference on Artificial Intelligence and Statistics (AISTATS'10), pp. 249-256, 2010.
[3] https://en.wikipedia.org/wiki/Deep_learning.
[4] B. Karlik and A. V. Olgac, "Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks," International Journal of Artificial Intelligence and Expert Systems (IJAE), vol. 1, issue 4, pp. 111-122, 2011.
[5] J. D. Lamos-Sweeney, "Deep Learning Using Genetic Algorithms," Thesis, Department of Computer Science, Rochester Institute of Technology, May 2012.
[6] O. E. David and I. Greental, "Genetic algorithms for evolving deep neural networks," Proc. 2014 Conference Companion on Genetic and Evolutionary Computation, pp. 1451-1452, 2014.
[7] S. S. Tirumala, "Implementation of Evolutionary Algorithms for Deep Architectures," 2nd International Workshop on Artificial Intelligence and Cognition, Torino, Italy, pp. 164-171, 2014.
[8] T. Shinozaki and S. Watanabe, "Structure discovery of deep neural network based on evolutionary algorithms," 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4979-4983, 2015.
[9] E. Fall and H.-H. Chiang, "Neural networks with dynamic structure using a GA-based learning method," 2015 IEEE 12th International Conference on Networking, Sensing and Control (ICNSC), pp. 7-12, 2015.
[10] S. Lander and Y. Shang, "EvoAE -- A New Evolutionary Method for Training Autoencoders for Deep Learning Networks," 2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC), pp. 790-795, 2015.
[11] http://www.forecasts.org/data/index.htm.
[12] L. M. Zhang and Y.-Q. Zhang, "The Human-Inspired Algorithm: A Hybrid Nature-Inspired Approach to Optimizing Continuous Functions with Constraints," Journal of Computational Intelligence and Electronic Systems, vol. 2, no. 1, pp. 80-87, June 2013.

