1. Introduction
With the rapid development of the consumer credit market in China, personal credit scoring plays an increasingly important role: it helps commercial banks keep away from credit risks. In the West, personal credit scoring methods have developed over a long period and are relatively advanced, and many useful methods have been applied in this area [1], including neural networks. In practice, the BP network based on the back-propagation algorithm has been applied widely in many fields, such as function approximation and pattern recognition. However, the BP algorithm has some insufficiencies, such as plunging into local minima and slow convergence near the training goal. Aiming at these insufficiencies, we establish a hybrid neural network through the combination of a genetic algorithm (GA) and the BP algorithm, use this model for personal credit scoring, and finally compare the applied results of the hybrid network with those of a single BP network.
The BP weight update rule is

$$ w_{ji}^{l}(k+1) = w_{ji}^{l}(k) - \eta\,\frac{\partial \mathrm{MSE}}{\partial w_{ji}^{l}} \qquad (1) $$

where $w_{ji}^{l}$ is the weight connecting neuron $i$ in layer $l-1$ with neuron $j$ in layer $l$, and $\eta$ is a positive number named the learning rate, which is used to control the learning steps and is usually given a very small positive value. The BP algorithm is an effective method in practice; however, it has some insufficiencies, including [3]: (1) BP uses gradient descent to minimize MSE, which often makes it plunge into a local minimum; (2) the performance of the algorithm is very sensitive to the setting of the learning rate: if the learning rate is too high, the algorithm may oscillate and become unstable, while if it is too small, the algorithm takes a long time to converge; (3) the initial weights and biases of the network, which are important to the convergence of BP, are generated randomly, which sometimes even prevents the network from reaching the training goal. Moreover, the structure of a BP network is often constructed from the decision-maker's experience, for there are no uniform rules to lean on. In view of these insufficiencies, some methods have been introduced to improve the performance of the BP
algorithm, including gradient descent with momentum and gradient descent with an adaptive learning rate. The first method is introduced to solve BP's problem of plunging into local minima, that is:

$$ w(k+1) = w(k) + \Delta w(k), \qquad \Delta w(k) = -\eta\,\frac{\partial \mathrm{MSE}(w)}{\partial w} + \alpha\,\Delta w(k-1) \qquad (2) $$

where $\alpha$ is named the momentum factor. Momentum allows a network to respond not only to the local gradient but also to recent trends in the error surface. Acting like a low-pass filter, momentum allows the network to ignore small features in the error surface: without momentum a network may get stuck in a shallow local minimum, whereas with momentum it can slide through such a minimum, and a larger learning rate can be used to hasten convergence. The second method is based on the following consideration: if the value of MSE falls after a weight update, the chosen learning rate $\eta$ is small and a larger $\eta$ should be chosen; otherwise a smaller $\eta$ should be chosen. These two improved BP algorithms are effective in avoiding local minima and in determining the learning rate. However, the problems of randomly generated initial weights and biases and of slow convergence near the training goal are not solved effectively.
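As a minimal sketch of how the momentum update of formula (2) behaves (the quadratic objective, learning rate, momentum factor and step count below are illustrative choices, not values from the paper; with alpha = 0 the loop reduces to the plain update of formula (1)):

```python
def gd_momentum(grad, w0, eta=0.1, alpha=0.9, steps=200):
    """Gradient descent with momentum, formula (2):
    w(k+1) = w(k) + dw(k), dw(k) = -eta * grad(w(k)) + alpha * dw(k-1)."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -eta * grad(w) + alpha * dw  # momentum term reuses the previous step
        w = w + dw
    return w

# Illustrative MSE-like quadratic (w - 3)^2 with gradient 2(w - 3);
# the iterate slides toward the minimizer w = 3.
w_star = gd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star)
```

The momentum term lets the iterate overshoot small bumps in the error surface, which is exactly the "sliding through a shallow local minimum" effect described above.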
accuracy on disequilibrium (imbalanced) data, we make a pretreatment of the sample data to scale the variables into the range [0, 1]. Here we divide the variables into a discrete group and a sequential group.

Table 1. Inputs and Outputs

x1   Education Level         elementary 1, middling 2, advanced 3
x2   Monthly Income          actual value
x3   Organization Character  national department 1; scientific, educational, cultural and health department 2; trade and business 3; post and communication 4; financial and insurance 5; social service 6; supply of water, electricity and gas 7; industry 8; real estate 9; other 10
x4   Career                  manager 1, technique 2, officer 3, jobless 4, other 5
x5   Spouse                  yes 1, no 2
x6   Loan Amount             actual value
x7   Time limit              actual value
x8   Return mode             1, corpus 2
x9   Surety                  pledge 1, impawn 2, other 3
x10  Age                     actual value
y    Default or not          yes 0, no 1
For variables in the discrete group, which contains x1, x3, x4, x5, x7, x8 and x9, we use Formula (3):

$$ Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (3) $$

where Y ∈ [0, 1] represents the outcome of data processing, and X_min and X_max represent the minimum and maximum values of the variable X respectively. For variables in the sequential group, which contains x2, x6 and x10, we observe that they obey normal distributions, that is, x_i ~ N(μ, σ²). So we adopt Formula (4):

$$ Y = \Phi\!\left(\frac{X - \mu}{\sigma}\right) \qquad (4) $$

where

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^{2}/2}\,dt $$

represents the standard normal cumulative distribution function.

Figure 1. Flowchart of hybrid neural network for personal credit scoring
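The two pretreatment formulas can be sketched as follows (function names are illustrative; Φ is evaluated through the standard error function rather than the integral directly):

```python
import math

def minmax_scale(x, x_min, x_max):
    """Formula (3): Y = (X - Xmin) / (Xmax - Xmin), giving Y in [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def normal_cdf_scale(x, mu, sigma):
    """Formula (4): Y = Phi((X - mu) / sigma), where Phi is the standard
    normal CDF, computed via the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(minmax_scale(2, 1, 3))      # midpoint of [1, 3] maps to 0.5
print(normal_cdf_scale(0, 0, 1))  # the mean maps to 0.5
```

Both transforms land in [0, 1], so discrete and sequential variables arrive at the network on a comparable scale.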
number of neurons selected in each layer, there are no fixed rules to lean on; one proceeds from the problem at hand and some empirical rules. Here we choose a feed-forward three-layer network with one hidden layer, and for the number of neurons in the hidden layer we refer to the empirical formula (5) [6]:

$$ L_k = \sqrt{P\,(O + 3)} + 1 \qquad (5) $$

where P and O represent the numbers of nodes in the input layer and output layer, and L_k is the maximum number of hidden neurons. In this paper, we have 10 input variables and 1 output variable, so we choose 10 nodes in the input layer and 1 node in the output layer, as well as 7 nodes in the hidden layer (since L_k = √40 + 1 ≈ 7.3).
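The hidden-layer sizing rule of formula (5) is a one-liner (the function name is illustrative):

```python
import math

def max_hidden_neurons(p, o):
    """Empirical formula (5): Lk = sqrt(P * (O + 3)) + 1, the maximum
    number of hidden neurons for P inputs and O outputs."""
    return math.sqrt(p * (o + 3)) + 1

# With 10 inputs and 1 output, Lk = sqrt(40) + 1, about 7.3,
# which is why the paper uses 7 hidden neurons.
print(max_hidden_neurons(10, 1))
```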
GA selects individuals according to their level of fitness. Highly fit individuals, relative to the whole population, have a high probability of being selected for mating, whereas less fit individuals have a correspondingly low probability of being selected. In this paper, we select the fitness function as:

$$ f = \frac{M}{\mathrm{MSE}} = \frac{M}{\frac{1}{N}\sum_{i=1}^{N}(t_i - a_i)^2} \qquad (6) $$

where N is the number of training samples, a_i and t_i represent the actual and desired outputs of the network respectively, and M is a big positive number used to make the fitness more distinctive; here we set it as 100. Genetic operators include selection, crossover and mutation: crossover exploits promising regions of the search space, while mutation searches further in the problem space and helps GA avoid local convergence. For selection, we employ a roulette wheel mechanism to probabilistically select individuals based on their fitness levels. This method is called the proportional model, that is:

$$ p_i = \frac{f_i}{\sum_{j=1}^{n} f_j} \qquad (7) $$

where p_i is the expected selection probability of individual i and n is the population size.

4.3. Parameters of GA

According to the network's structure, the weights connecting the input layer and hidden layer form a matrix W1 of size 10 by 7 containing 70 elements, and the biases of the hidden layer form a matrix B1 of size 7 by 1 containing 7 elements; the weights connecting the hidden layer and output layer form a matrix W2 of size 7 by 1 containing 7 elements, and the bias of the output layer is a matrix B2 containing 1 element. So the number of parameters to be optimized by GA is 70 + 7 + 7 + 1 = 85. Considering the number of parameters and the required precision, we choose real-valued encoding. The locations of the parameters in the chromosome string are listed in Table 2. GA operates simultaneously on a number of potential solutions, called a population, consisting of encodings of the parameter set. Typically, a population is composed of between 20 and 100 individuals; here we set the population size to 100.

Table 2. Locations of Each Parameter in Chromosome

W1   w11 ... w10,7   (positions 1-70)
W2   w1 ... w7       (positions 71-77)
B1   b1 ... b7       (positions 78-84)
B2   b               (position 85)

For the crossover operator, we choose arithmetic crossover, that is:

$$ X_A^{t+1} = \alpha X_B^{t} + (1-\alpha)\,X_A^{t}, \qquad X_B^{t+1} = \alpha X_A^{t} + (1-\alpha)\,X_B^{t} \qquad (8) $$

where α is a parameter which can be selected as a constant or as a variable determined by the evolution generation. Here, we choose 0.9 as the value of α. For mutation, we choose the non-uniform mutation method, which can be described as follows: when making a mutation operation from chromosome X = x1 x2 ... xk ... xl to X' = x1 x2 ... x'k ... xl, if the range of the gene xk (named the mutation point) is [U_min^k, U_max^k], then the new gene x'k is determined by the following formula:

$$ x_k' = \begin{cases} x_k + \Delta(t,\, U_{\max}^{k} - x_k), & \mathrm{random}(0,1) = 0 \\ x_k - \Delta(t,\, x_k - U_{\min}^{k}), & \mathrm{random}(0,1) = 1 \end{cases} \qquad (9) $$

where Δ(t, y) (y representing U_max^k - x_k or x_k - U_min^k) is a random number obeying a non-uniform distribution, and the probability of Δ(t, y) approaching 0 increases as the evolution generation grows. Mutation is usually applied with low probability, typically in the range 0.0001 to 0.1; here we choose 0.08 as the mutation probability. Besides, we set the number of evolution generations to 1000 as the termination criterion of GA.
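The three GA operators above can be sketched compactly in Python. This is a minimal illustration under stated assumptions: the shrinkage exponent `b`, the default mutation probability, and all function names are illustrative, and Δ(t, y) is realized with the common shrinking form y·(1 − r^((1 − t/T)^b)):

```python
import random

def roulette_select(population, fitness):
    """Proportional selection, formula (7): p_i = f_i / sum_j f_j."""
    total = sum(fitness)
    r, acc = random.uniform(0.0, total), 0.0
    for individual, f in zip(population, fitness):
        acc += f
        if acc >= r:
            return individual
    return population[-1]

def arithmetic_crossover(xa, xb, alpha=0.9):
    """Arithmetic crossover, formula (8): children are convex
    combinations of the two parent chromosomes."""
    child_a = [alpha * b + (1 - alpha) * a for a, b in zip(xa, xb)]
    child_b = [alpha * a + (1 - alpha) * b for a, b in zip(xa, xb)]
    return child_a, child_b

def nonuniform_mutate(x, t, t_max, lo, hi, b=2.0, pm=0.08):
    """Non-uniform mutation, formula (9): each mutated gene moves toward
    one of its bounds by Delta(t, y), whose magnitude shrinks to 0 as
    generation t approaches t_max."""
    def delta(y):
        return y * (1.0 - random.random() ** ((1.0 - t / t_max) ** b))
    out = []
    for k, xk in enumerate(x):
        if random.random() < pm:  # mutate this gene with probability pm
            if random.random() < 0.5:
                xk = xk + delta(hi[k] - xk)  # move toward the upper bound
            else:
                xk = xk - delta(xk - lo[k])  # move toward the lower bound
        out.append(xk)
    return out
```

Because Δ(t, y) never exceeds y, a mutated gene always stays inside its range [U_min^k, U_max^k], and the shrinking perturbation gives a coarse global search early on and a fine local search late in the run.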
4.4. Parameters of BP
The parameters of the BP algorithm are set as follows. As the training function we choose traingdx, which combines the improved BP algorithms of adaptive learning rate and momentum training, with the momentum factor set to 0.9 and the learning rate updated by formula (10):
$$ \eta_{k+1} = \begin{cases} 1.05\,\eta_k, & \mathrm{MSE}_k < \mathrm{MSE}_{k-1} \\ 0.7\,\eta_k, & \mathrm{MSE}_k > 1.04\,\mathrm{MSE}_{k-1} \\ \eta_k, & \text{otherwise} \end{cases} \qquad (10) $$

The number of training epochs in each cyclic iteration is 100, the performance function is MSE, and the training goal is set to 0.
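The adaptive learning-rate rule of formula (10) can be sketched as follows (the function name is illustrative):

```python
def update_learning_rate(eta, mse_k, mse_prev):
    """Formula (10): grow eta by a factor of 1.05 when MSE decreases,
    shrink it by a factor of 0.7 when MSE grows by more than 4%,
    and keep it unchanged otherwise."""
    if mse_k < mse_prev:
        return 1.05 * eta   # error fell: the step was conservative
    if mse_k > 1.04 * mse_prev:
        return 0.7 * eta    # error jumped: the step overshot
    return eta              # small fluctuation: leave eta alone

eta = 0.01
eta = update_learning_rate(eta, mse_k=0.5, mse_prev=1.0)  # error fell, eta grows
eta = update_learning_rate(eta, mse_k=1.1, mse_prev=1.0)  # error jumped, eta shrinks
print(eta)
```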
Figure 3. Training curve of the hybrid neural network (MSE vs. iteration, about 300 iterations; final MSE = 6.2796e-6)
Figure 4. Training curve of the single BP network (MSE vs. epoch on a logarithmic scale over 10,000 epochs; Training in blue, Goal in black; the training goal is not reached)
5. Result Analysis
5.1. Comparison on Training Result
Comparing Figure 3 and Figure 4, we can see that the hybrid neural network converges to 6.2796e-6 and achieves the training goal, whereas the single BP neural network does not achieve the training goal within the maximum number of training epochs. In Figure 3, we can see that once the BP algorithm has converged to about 0.002 and the decline in MSE falls below 0.000001, the program uses GA to optimize the network's weights and biases; with this optimization, MSE declines sharply and the training goal is finally achieved. In Figure 4, however, MSE shows almost no distinct change around 0.002, and the single BP network never achieves the training goal. This indicates that with the combination of GA and BP algorithms, the learning
Figure 2. Fitness curve of GA during evolution (fitness vs. generation, 0-1000)
ability of the neural network has been improved. Using GA to optimize the weights and biases, the problem of the BP algorithm's slow convergence around the training goal can be solved.
From Table 3, we can see that on total classification accuracy, the hybrid neural network achieves a higher result (95.27%) than the single BP neural network. There are two types of error: a type I error mistakes a client with a good credit condition for a bad one and refuses to offer the loan, while a type II error works the opposite way. From Table 3, the GA-BP hybrid neural network and the single BP neural network have equal error rates on type I errors, but on type II errors the hybrid neural network's rate (only 3.02%) is lower than the single BP neural network's (5.60%). So from the aspects of total classification accuracy and the type II error rate, we can say the GA-BP hybrid neural network is better than the single BP neural network. The reason lies mainly in the fact that the hybrid neural network achieves the training goal and learns the characteristics of the samples fully, whereas the single BP neural network does not achieve the training goal, so its classification accuracy is lower than that of the hybrid network.

6. Conclusions

In this paper, we establish a hybrid neural network based on the combination of GA and BP algorithms and apply the model to personal credit scoring in commercial banks. The training results of the model indicate that the hybrid algorithm can effectively improve the learning ability of the neural network and can overcome the BP algorithm's slow convergence near the training goal. The classification results on the testing sample data indicate that the hybrid neural network achieves higher classification accuracy than the single BP network does.

References

[1] Lyn C. Thomas, "A Survey of Credit and Behavioral Scoring: Forecasting Financial Risk of Lending to Consumers", International Journal of Forecasting, 2000, No. 16, pp. 149-172.
[2] Dong Xu, Zheng Wu, The System Analysis and Design Based on MATLAB 6.x - Neural Network (2nd edition), Xidian University Press, Xi'an, 2002.
[3] Hao Pan, Xiaoyong Wang, Qiong Chen, et al., "Application of BP neural network based on genetic algorithm", 2005, Vol. 25, No. 12, pp. 2777-2779.
[4] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[5] R. P. Lippmann, "An introduction to computing with neural nets", IEEE ASSP Magazine, April 1987, pp. 4-22.
[6] Xiaofeng Hui, Yunquan Hu, Fei Hu, "Study on the Application of BP Neural Network in Forecasting the Exchange Rate Based on GA", Quantitative & Technical Economics, 2002, No. 2, pp. 80-83.