Letter Image Recognition using Neural Network Pattern Recognition
Objective: To identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet.
Number of Inputs/Attributes: 16 numeric attributes
Number of Targets: 26
Number of Instances: 20000
A neural network has been used to classify the instances as one of the 26 letters based on the inputs; this is a pattern recognition problem.
Number of Instances used for Training: 16000
Number of Instances used for Validation: 3000
Number of Instances used for Testing: 1000
Fivefold cross validation has been used to ensure that the training and test sets are independent and that every instance is used for testing.
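The report's experiments were run in MATLAB; as an illustration only, the fivefold partitioning described above can be sketched in Python/NumPy (the function name `kfold_indices` is hypothetical, not from the report):

```python
import numpy as np

def kfold_indices(n_instances, k=5, seed=0):
    """Split instance indices into k disjoint folds of equal size
    (hypothetical helper; the report uses MATLAB's built-in tooling)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_instances)   # shuffle so folds are random
    return np.array_split(idx, k)        # k folds, here 5 x 4000

folds = kfold_indices(20000, k=5)
# Each fold serves once as the held-out set; the remaining folds train.
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
```

Because every index appears in exactly one fold, each of the 20000 instances is tested exactly once across the five runs, which is the property the report relies on.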
Data Processing: The target file consisted of letters, which could not be imported into MATLAB directly, so each letter was converted into a vector of 1s and 0s. A 26-column vector was defined to code each letter uniquely. For example, A was coded as a 1 in the first position followed by 25 zeros. The input is therefore a 20000 x 16 matrix and the target a 20000 x 26 matrix.
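This one-hot encoding of the targets can be sketched as follows (a Python/NumPy illustration; the helper name `one_hot_targets` is hypothetical, the encoding rule is the one described above):

```python
import numpy as np
import string

def one_hot_targets(letters):
    """Encode capital letters as rows of a 26-column 0/1 matrix,
    e.g. 'A' -> a 1 in the first position followed by 25 zeros."""
    letters = list(letters)
    T = np.zeros((len(letters), 26), dtype=int)
    for row, ch in enumerate(letters):
        T[row, string.ascii_uppercase.index(ch)] = 1
    return T

T = one_hot_targets("ABZ")  # 3 x 26 matrix, one row per letter
```

Applied to all 20000 target letters, this yields exactly the 20000 x 26 target matrix the report describes.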
Selection of Network: We start with a two layer Neural Network with the following specifications:
Two Layer Networks: Network: 16 x 26 (16 inputs, 26 outputs; the hidden layer size is varied below)
Number of Neurons in the output layer: 26 (as the number of outputs is 26)
Transfer function: Tan Sigmoid in both layers
Algorithm: Scaled Conjugate Gradient
Convergence is based on minimum Mean Squared Error.
Learning Rate: 1
Number of Neurons in the hidden layer    Mean Squared Error (validation set)
16                                       0.0259
18                                       0.0204
20                                       0.0119
22                                       0.0111
24                                       0.0126
26                                       0.0141
For more than 26 neurons in the hidden layer, no further improvement in MSE is found. The table shows that, for a learning rate of 1, the best performance is obtained with 22 neurons in the hidden layer. Note that the values shown in the table are means over the five cross-validation folds.
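The model-selection step here reduces to picking the hidden-layer size with the smallest validation MSE. A minimal Python sketch, using the measured values from the table above:

```python
# Validation MSEs from the table above (learning rate 1),
# keyed by number of hidden neurons.
val_mse = {16: 0.0259, 18: 0.0204, 20: 0.0119,
           22: 0.0111, 24: 0.0126, 26: 0.0141}

# Select the hidden-layer size with the lowest validation MSE.
best_size = min(val_mse, key=val_mse.get)
```

The same argmin-over-the-table selection is repeated for each learning rate in the sections that follow.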
Learning Rate: 0.75
Number of Neurons in the hidden layer    Mean Squared Error (validation set)
16                                       0.01306
18                                       0.0177
20                                       0.0112
22                                       0.0138
24                                       0.0131
26                                       0.0123
From the above table it is clear that for a learning rate of 0.75, the best performance is obtained for 20 neurons in the hidden layer.
Learning Rate: 0.5
Number of Neurons in the hidden layer    Mean Squared Error (validation set)
16                                       0.0164
18                                       0.0142
20                                       0.0134
22                                       0.0124
24                                       0.0136
26                                       0.0209
From the above table it is evident that for a learning rate of 0.5, the best performance is obtained for 22 neurons in the hidden layer.
Comparing the results for the three learning rates, the lowest MSE is obtained with a learning rate of 1 and 22 neurons in the hidden layer.
We next examine whether there is any significant improvement in the performance of the network when the number of hidden layers is increased beyond one.
Three Layer Networks:
Network 1: 16 15 15 26
Number of Neurons in the two hidden layers: 15 each
Learning Rate: 1
Number of Neurons in the output layer: 26
Transfer function: Tan Sigmoid in all layers
Algorithm: Scaled Conjugate Gradient
Convergence is based on minimum Mean Squared Error.
Mean Squared Error: 0.0142 on the validation set (after fivefold cross validation)

Network 2: 16 12 12 26
Learning Rate: 1
Number of Neurons in the two hidden layers: 12 each
Mean Squared Error: 0.0148 on the validation set (after fivefold cross validation)

As for the two layer networks, results are displayed only for the two best three layer models after trying different combinations of neurons and learning rates. Network 1, with 15 neurons in each of the two hidden layers and a learning rate of 1, is the better of the two. Comparing the performance of the two layer and three layer networks, the best overall network is the two layer network with 22 neurons in the hidden layer and a learning rate of 1.

Final Neural Network Model:
Number of Neurons in the hidden layer: 22
Learning Rate: 1
Transfer function: Tan Sigmoid in both layers
Mean Squared Error: 0.011044
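The forward pass of this final 16-22-26 network, with the tan-sigmoid (tanh) transfer function in both layers, can be sketched in Python/NumPy (random weights stand in for the trained MATLAB weights, which are not reproduced in the report):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of the final network: 16 inputs, 22 hidden neurons,
    26 outputs, tan-sigmoid transfer function in both layers."""
    h = np.tanh(x @ W1 + b1)      # hidden layer: 22 neurons
    return np.tanh(h @ W2 + b2)   # output layer: 26 neurons, one per letter

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 22)) * 0.1, np.zeros(22)  # placeholder weights
W2, b2 = rng.normal(size=(22, 26)) * 0.1, np.zeros(26)
y = forward(rng.normal(size=(5, 16)), W1, b1, W2, b2)   # 5 sample instances
```

At prediction time, the index of the largest of the 26 outputs selects the letter, matching the one-hot target coding described earlier.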
The performance plot is shown below:
The performance plot shows MSE versus epochs (iterations) for the training, validation and test sets separately. The figure shows that as the number of iterations increases, the MSE falls, and training stops at the point beyond which the validation MSE would begin to increase. Training stopped when the MSE on the validation set reached its lowest value of 0.011 at epoch 560.
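The stopping rule visible in the performance plot is validation-based early stopping. A minimal Python sketch, assuming a recorded per-epoch validation MSE curve (the function name and the `patience` parameter are illustrative, not taken from the report):

```python
def train_with_early_stopping(mse_per_epoch, patience=3):
    """Track the best validation MSE seen so far and stop once it has
    failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, mse in enumerate(mse_per_epoch):
        if mse < best:
            best, best_epoch, wait = mse, epoch, 0  # new best: reset counter
        else:
            wait += 1
            if wait >= patience:
                break  # validation MSE is rising: stop training
    return best_epoch, best

# Synthetic curve: MSE falls, bottoms out, then rises again.
curve = [0.5, 0.3, 0.2, 0.15, 0.16, 0.17, 0.18]
best_epoch, best_mse = train_with_early_stopping(curve, patience=3)
```

In the report's run, the analogous minimum was 0.011 at epoch 560.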
The Receiver Operating characteristics plot is shown below:
The ROC curve plots the true positive rate (sensitivity) on the Y axis against the false positive rate (1 - specificity) on the X axis as the classification threshold is varied. The coloured lines show the curves for each of the 26 classes. The network performs best when all the curves lie toward the top left corner. The figure shows that all curves lie near the top left corner, implying that the network is performing well.

Conclusion: The neural network model 16 22 26 that has been selected for solving the character pattern recognition problem is shown below:
The selected model works well on the training set in that it does not overfit, while also performing well on the test set.
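As a supplement, the per-class ROC points described above (true positive rate versus false positive rate as the threshold sweeps over the network's output scores) can be computed with a short Python sketch; the function name is illustrative, and ties between scores are not handled:

```python
import numpy as np

def roc_curve_points(scores, labels):
    """TPR and FPR for one class as the decision threshold is lowered
    past each score in turn (what each coloured ROC line plots)."""
    order = np.argsort(-scores)            # highest score first
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)                # true positives at each threshold
    fps = np.cumsum(1 - labels)            # false positives at each threshold
    tpr = tps / labels.sum()
    fpr = fps / (len(labels) - labels.sum())
    return fpr, tpr

# A perfectly separated class: both positives outscore both negatives.
fpr, tpr = roc_curve_points(np.array([0.9, 0.8, 0.3, 0.1]),
                            np.array([1, 1, 0, 0]))
```

For a well-separated class like this one, the TPR reaches 1 while the FPR is still 0, which is exactly the hug-the-top-left-corner behaviour seen in the report's figure.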