
Problem: The objective is to approximate a non-linear function using a feedforward artificial neural network (FF ANN).

Regression is one of the best-studied problems in the history of ANNs. As a result, Matlab provides specialised built-in tools (such as nftool) that can solve regression problems efficiently without much experimentation to set the various parameters. Still, for the question at hand the following issues have to be addressed:

1. A new target has to be constructed from the original data based on five independent non-linear functions.
2. A set of 1000 data points each for training, validation and testing has to be selected from the original data set of 13600 points. They can be selected either as
   a. consecutive data points, or
   b. random data points.
3. Pre-processing of the data, if required:
   a. removing constant rows;
   b. mapping the minimum and maximum values to -1 and +1.
4. A NN has to be built with a suitable
   a. number of hidden layers,
   b. number of neurons in each hidden layer,
   c. transfer function for each layer, and
   d. training algorithm to train the network.
5. The problem of generalisation and over-fitting has to be studied.

Solution: First, the new target data (and hence a new non-linear function) is constructed from the five targets T1, T2, T3, T4 and T5 given in the question and my student number (r0223444):
Tnew = (4*T1+4*T2+4*T3+3*T4+2*T5)/17;

The primary attraction of an ANN is its ability to learn from a set of observations, which entails defining a cost function over the observation data; here the MSE on the test set is taken as the cost function. Based on the default settings of nftool and the given data set, it can be assumed that in most cases the best parameter settings for the architecture are:

a. an input layer with 2 neurons;
b. 1 hidden layer with n neurons and the tansig transfer function;
c. an output layer with 1 neuron and the purelin transfer function;
d. the training algorithm trainlm to train the network.

Let us also assume that an MSE of at most 0.01 is acceptable.
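As an illustration of these assumed defaults, the following sketch builds such a 2-n-1 network with the legacy newff interface also used in the appendix code; the hidden-layer size n = 10 is just a placeholder, and inputs/targets are assumed to hold the 2xN input matrix and 1xN target vector.

% Sketch of the assumed default architecture: 2 inputs - n tansig hidden neurons - 1 purelin output
n = 10;                                   % placeholder hidden-layer size
net = newff(inputs, targets, n);          % inputs: 2xN matrix, targets: 1xN vector
net.layers{1}.transferFcn = 'tansig';     % non-linear hidden layer (newff default)
net.layers{2}.transferFcn = 'purelin';    % linear output layer (newff default)
net.trainFcn   = 'trainlm';               % Levenberg-Marquardt training (newff default)
net.performFcn = 'mse';                   % cost function: mean squared error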

Given the above settings, the table below compares the performance (MSE) on the training and validation sets for different numbers of hidden neurons (desired MSE < 0.01).

S.No | Neurons in hidden layer | MSE (Training Data) | MSE (Validation Data) | Epochs
  1  |   2 | 0.0393      | 0.0346      |  32
  2  |   4 | 0.0294      | 0.0305      |  61
  3  |   6 | 0.0065      | 0.0071      | 213
  4  |  10 | 0.0013      | 0.0016      | 567
  5  |  15 | 2.0263e-004 | 2.3878e-004 | 389
  6  |  25 | 6.3218e-006 | 9.1975e-006 | 213
  7  |  50 | 1.2695e-007 | 4.3578e-007 | 572
  8  |  75 | 2.8736e-007 | 2.4866e-005 | 179
  9  | 100 | 2.0458e-007 | 3.2613e-006 | 332
 10  | 200 | 1.4986e-005 | 4.8771e-004 | 532

Tab.1 Performance of networks with various numbers of hidden neurons
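The sweep behind Tab.1 can be reproduced with a loop of the following form (a sketch only: the random data division and random weight initialisation mean the exact numbers will differ between runs; inputs and targets are assumed to be defined as above).

% Sketch: sweep the hidden-layer size and record training/validation MSE
neurons = [2 4 6 10 15 25 50 75 100 200];
for i = 1:numel(neurons)
    net = newff(inputs, targets, neurons(i));          % 2-n-1 network (tansig/purelin, trainlm)
    [net, tr] = train(net, inputs, targets);
    outputs  = net(inputs);
    mseTrain = perform(net, targets .* tr.trainMask{1}, outputs);
    mseVal   = perform(net, targets .* tr.valMask{1},   outputs);
    fprintf('%4d neurons: train MSE = %.4g, val MSE = %.4g, epochs = %d\n', ...
            neurons(i), mseTrain, mseVal, tr.num_epochs);
end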

Please note that the training set consists of 1000 data points selected at random so that they cover the entire surface of the function f(X1,X2) = Tnew (index-based sampling would not cover the whole surface); the two sampling strategies are sketched below.
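The difference between the two sampling strategies of point 2 above can be sketched as follows (variable names follow the appendix code; 3000 points are drawn in total, 1000 each for training, validation and testing).

% Random sampling: 3000 rows drawn uniformly from the 13600-point data set
allData      = [X1 X2 Tnew];
shuffledRows = randperm(13600);
randomSample = allData(shuffledRows(1:3000), :);   % covers the whole input surface

% Consecutive (index-based) sampling: the first 3000 rows only
consecutiveSample = allData(1:3000, :);            % covers only part of the surface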

Fig.1 Surface of the training data

From Tab.1 we can conclude that the optimal number of hidden neurons lies roughly between 6 and 10: the desired MSE < 0.01 is first reached with 6 neurons. The table also shows that with a very large number of hidden neurons over-fitting becomes a major issue (the validation MSE grows much larger than the training MSE). Taking the network with 10 hidden neurons as the starting point, we can now experiment with other parameters to improve its performance further (MSE < 0.001).

S.No | Parameter | MSE (Training Data) | MSE (Validation Data) | Epochs
  1  | Original (10 hidden neurons)            | 0.0013      | 0.0016      | -
  2  | mapminmax                               | 5.6947e-004 | 6.3489e-004 | 439
  3  | trainbfg                                | 0.0014      | 0.0017      | 532
  4  | Mu = 0.005, Mu_Dec = 0.001, Mu_Inc = 10 | 2.7101e-004 | 4.2813e-004 | 746
  5  | Mu = 0.05, Mu_Dec = 0.1, Mu_Inc = 10    | 1.5789e-004 | 2.1400e-004 | 694
  6  | Mu = 0.05, Mu_Dec = 0.99, Mu_Inc = 10   | 4.3929e-004 | 4.3248e-004 | 1000
  7  | logsig                                  | 6.9459e-004 | 6.4244e-004 | 764
  8  | Initzero                                | 2.2323e-004 | 3.2821e-004 | 617
  9  | Smallrand/initnw                        | 1.7152e-004 | 1.9203e-004 | 515
 10  | Smallrand/initwb                        | 2.6759e-004 | 3.3184e-004 | 838
 11  | Two hidden layers, 6/4 neurons          | 1.1380e-004 | 1.1814e-004 | 626
 12  | Two hidden layers, 3/6 neurons          | 7.4606e-004 | 6.2828e-004 | 1000
 13  | Two hidden layers, 5/5 neurons          | 6.5655e-005 | 6.7525e-005 | 706
 14  | Two hidden layers, 4/6 neurons          | 1.6552e-004 | 2.2040e-004 | 1000

Tab.2 Performance of the selected network over various parameters
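Row 2 of Tab.2 (mapminmax) corresponds to scaling the inputs and targets to the range [-1, +1] before training. A minimal sketch, assuming inputs and targets are defined as above and that the network is trained on the scaled data, is:

% Scale inputs and targets to [-1, +1]; psIn/psOut store the mappings
[inputsScaled,  psIn]  = mapminmax(inputs);
[targetsScaled, psOut] = mapminmax(targets);

% After training, map the network output back to the original target range
outputsScaled = net(inputsScaled);
outputs       = mapminmax('reverse', outputsScaled, psOut);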

Discussion: As the second row of Tab.2 shows, there is a considerable improvement in training when the inputs and targets are scaled to the range [-1, +1]. This is due to the use of Levenberg-Marquardt optimisation in training; as we will see later, the same applies to Bayesian regularisation training, since it uses the same technique. The various parameters of the trainlm algorithm were also varied and their influence on training studied. From rows 4, 5 and 6 of Tab.2 it can be inferred that the best mu settings are Mu = 0.05, Mu_Dec = 0.1 and Mu_Inc = 10 (see the sketch below). Note also that the usually smooth (exponentially decreasing) performance curve becomes unstable with a large Mu_Dec, as shown in Fig.2.
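The best mu settings found above (row 5 of Tab.2) are set on the trainlm training parameters as follows (a sketch):

% Levenberg-Marquardt step-size (mu) schedule, best values from Tab.2
net.trainParam.mu     = 0.05;   % initial mu
net.trainParam.mu_dec = 0.1;    % factor applied when an iteration reduces the error
net.trainParam.mu_inc = 10;     % factor applied when an iteration increases the error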

Fig.2 Performance of the network when Mu = 0.05, Mu_Dec = 0.99 and Mu_Inc = 10

Row 7 of Tab.2 also indicates that, although it is better to have a sigmoid function in the hidden layer (to introduce the non-linearity), the exact shape of the transfer function has little influence on training the network. The next major factor influencing performance is the way the biases and network weights are initialised. Two methods are studied here: Initzero, which initialises all biases and weights to zero, and Smallrand, which initialises them to small random values. Although the difference in the results was small, Smallrand consistently outperformed Initzero, indicating that it is better to initialise with small random values than with zeros.

Of the handful of training algorithms tested, trainlm was far better than the others on the same network: for example, traingdm with lr = 0.01 and mc = 0.9 reached MSE = 0.94703 (51 epochs), traingd with lr = 0.01 reached MSE = 1.0059 (1000 epochs), and trainscg performed similarly (the settings for this comparison are sketched after this discussion). The success of trainlm can be attributed to two of its characteristics: a design suited to least-squares problems that are approximately linear, and its ability to produce an accurate approximation with a small number of weights.

Another important decision in the network architecture is the number of hidden layers. It is influenced by two factors: the non-linearity of the underlying function and the generalisation required. Since a second hidden layer with a few neurons is advantageous when the target function has several hills and valleys, a few such scenarios are tested in rows 11 to 14 of Tab.2.
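The gradient-descent settings quoted in the comparison above translate into the following sketch; only the training-function choice and its parameters differ between the variants.

% Gradient descent with momentum (traingdm), lr = 0.01, mc = 0.9
net.trainFcn      = 'traingdm';
net.trainParam.lr = 0.01;       % learning rate
net.trainParam.mc = 0.9;        % momentum constant

% Plain gradient descent (traingd), lr = 0.01
%net.trainFcn      = 'traingd';
%net.trainParam.lr = 0.01;

% Levenberg-Marquardt (trainlm), the algorithm finally retained
%net.trainFcn = 'trainlm';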

Based on these results it is clear that the 2-5-5-1 network, with the tansig transfer function in its hidden layers and the purelin transfer function in its output layer (trained with trainlm), outperforms all the other networks; its construction is sketched below. Please note that a linear transfer function is necessary in the output layer to scale the output to match the targets.
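The selected architecture corresponds to the newff call used in the appendix code; a minimal sketch of its construction is:

% Final architecture 2-5-5-1: two tansig hidden layers of 5 neurons, one purelin output
net = newff(inputs, targets, [5 5]);       % hidden layers of 5 and 5 neurons
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'tansig';
net.layers{3}.transferFcn = 'purelin';     % linear output scales to the target range
net.trainFcn = 'trainlm';
[net, tr] = train(net, inputs, targets);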

Fig.3 Final network selected for regression

The MSE on the test data is 6.5849e-005, and the output/target surface of the network on the test data is given in the appendix (Fig.4). The performance on the training and validation sets is 6.5655e-005 and 6.7525e-005 respectively. The performance is clearly better than that of the ANN with 10 hidden neurons in a single layer, and the network also generalises well on the test data.

Conclusions/Further Improvements: Even though the selected network performed well on the test data with good generalisation, it can be improved further, both in performance and in generalisation. The performance of a given network differs between training sessions because of the randomness introduced by the initialisation of the weights and biases, so for a given network and data set it is better to store the weights and biases for which the performance is close to optimal and to re-use them when testing on new data. In this experiment, generalisation was obtained through early stopping (monitoring the error on the validation set) and by randomising the data set; it can be improved further, especially for large networks, using Bayesian regularisation (trainbr), which evaluates performance based on a combination of the MSE and the network weights. Both ideas are sketched below.
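Both suggested improvements can be sketched as follows; the file name trainedNet.mat and the variable newInputs are hypothetical.

% Bayesian regularisation: the performance measure combines the MSE with the size of the weights
net.trainFcn = 'trainbr';
[net, tr] = train(net, inputs, targets);

% Store the trained network (weights and biases included) for later re-use
save('trainedNet.mat', 'net');             % hypothetical file name
% ... in a later session ...
stored = load('trainedNet.mat');
netStored = stored.net;
outputsNew = sim(netStored, newInputs);    % newInputs: hypothetical 2xM matrix of new inputs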

Appendix:

Fig.4 Output/target surface of the network on the test data

Code:
%Load Data
clear;
load 'E:\ANN-Matlab\Exam_ANN_2010-2011_matfiles\Data_Problem1.mat';

%Generate Tnew
Tnew = (4*T1+4*T2+4*T3+3*T4+2*T5)/17;

%Randomly sample 3000 data points (1000 each for training, validation and testing)
Training = [X1 X2 Tnew];
shuffledRows = randperm(13600);
randomrows = Training(shuffledRows(1:3000), :);
Training_X1 = randomrows(:,1);
Training_X2 = randomrows(:,2);
Training_T = randomrows(:,3);
Training_Data = [Training_X1 Training_X2];

%Assign inputs/targets and standardise them (zero mean, unit variance)
inputs = Training_Data';
targets = Training_T';
[inputs,ps] = mapstd(inputs);
[targets,ps1] = mapstd(targets);

% Create the fitting network: two hidden layers of 5 neurons each (2-5-5-1)
hiddenLayerSize = 10;                           % only used by the commented fitnet alternative
net = newff(inputs,targets,[5,5]);
%net = fitnet(hiddenLayerSize);

% Choose input and output pre/post-processing functions
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};

%Network parameters
net.layers{1}.initFcn = 'initwb';               % layer initialisation function
net.layers{2}.initFcn = 'initwb';               % layer initialisation function
%net.layers{1}.transferFcn = 'logsig';          % transfer function
net.inputWeights{1,1}.initFcn = 'randsmall';    % small random initial weights and biases
net.layerWeights{2,1}.initFcn = 'randsmall';    % weight from hidden layer 1 to hidden layer 2
net.biases{1}.initFcn = 'randsmall';
net.biases{2}.initFcn = 'randsmall';

% Setup division of data for training, validation and testing
net.divideFcn = 'dividerand';                   % divide data randomly
%net.divideFcn = 'divideind';                   % divide data based on index
net.divideMode = 'sample';                      % divide up every sample
net.divideParam.trainRatio = 1/3;
net.divideParam.valRatio = 1/3;
net.divideParam.testRatio = 1/3;
%net.divideParam.trainInd = 1:4534;
%net.divideParam.valInd = 4535:9068;
%net.divideParam.testInd = 9069:13600;
%[trainInd,valInd,testInd] = divideind(inputs,1:4534,4535:9068,9069:13600);
%[trainInd,valInd,testInd] = divideind(targets,1:4534,4535:9068,9069:13600);

% Training function: newff defaults to trainlm (Levenberg-Marquardt)
%net.trainFcn = 'trainscg';
%net.trainParam.lr = 0.01;                      % only for gradient descent
%net.trainParam.mc = 0.9;                       % only for traingdm
net.trainParam.max_fail = 25;                   % early stopping on the validation set
net.trainParam.epochs = 1000;

net.performFcn = 'mse';
%net.performParam.ratio = 1.0;

% Levenberg-Marquardt step-size (mu) settings, best values from Tab.2
net.trainParam.mu = 0.05;
net.trainParam.mu_dec = 0.1;
net.trainParam.mu_inc = 10;

% Choose a performance function
% For a list of all performance functions type: help nnperformance
%net.performFcn = 'mse';                        % mean squared error (already set above)

% Choose plot functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
    'plotregression', 'plotfit'};

% Train the network
[net,tr] = train(net,inputs,targets);

% Test the network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);

% Recalculate training, validation and test performance (unselected samples are NaN-masked)
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)

% Plot the target (green) and network output (red) points on the test set
hold on;
plot3(Training_X1 .* tr.testMask{1}', Training_X2 .* tr.testMask{1}', targets' .* tr.testMask{1}', 'og');
plot3(Training_X1 .* tr.testMask{1}', Training_X2 .* tr.testMask{1}', outputs' .* tr.testMask{1}', 'or');
hold off;

%hold on;
%plot3(Training_X1 .* tr.trainMask{1}', Training_X2 .* tr.trainMask{1}', Training_T .* tr.trainMask{1}', 'or');
%plot3(Training_X1 .* tr.valMask{1}', Training_X2 .* tr.valMask{1}', Training_T .* tr.valMask{1}', '*b');
%plot3(Training_X1 .* tr.testMask{1}', Training_X2 .* tr.testMask{1}', Training_T .* tr.testMask{1}', 'og');
%hold off;

% View the network
%view(net)
figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(net,inputs,targets)
figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
