
A New Approach to MRI Brain Images Classification

Abstract: The aim of this work is to present an automated
method that assists in classifying MR images as normal or
abnormal. The method consists of four stages: pre-processing
of the MR images, feature extraction, dimensionality
reduction and classification. After histogram equalization of
the image, features are extracted using the discrete wavelet
transform (DWT). The features are then reduced using
principal component analysis (PCA). In the last stage, three
classification methods are employed: k-nearest neighbour
(k-NN), Parzen window and artificial neural network (ANN).
Our work modifies and extends previous studies on the
diagnosis of brain diseases; we obtain a better classification
rate with fewer features, and we use a larger and rather
different database.

Keywords: Magnetic resonance imaging, pattern
recognition, classification, wavelet feature extraction,
neural networks.
1. Introduction
Over the last century, the increasing use of
sophisticated laboratory tests has made diagnosis more
sensitive and accurate. The use of computer
technology in medical decision support is now
widespread across a wide range of medical areas,
such as cancer research, gastroenterology and
heart diseases [1]. In the study of the human brain, magnetic
resonance imaging (MRI) plays an important role in
ongoing research. The rich information that MR
images provide about soft tissue anatomy has
dramatically improved the quality of brain pathology
diagnosis and treatment. However, the amount of data is
far too large for manual interpretation, and hence there is a
great need for automated image analysis tools [2].
Pattern recognition techniques are being increasingly
employed in MRI data analysis [3]. The automatic
classification of brain magnetic resonance images is a
vital step in separating healthy subjects from subjects
with various brain diseases such as cerebrovascular
disease, Alzheimer's disease, brain tumour and
inflammatory disease.
Automatically classifying MR images is challenging,
which has led to many different approaches. In the
literature, MR images have been classified
via supervised techniques such as artificial neural
networks and support vector machines (SVM) [2], and
k-nearest neighbor (k-NN) and feed-forward
back-propagation ANN [6], and via unsupervised
classification techniques such as the self-organizing
map (SOM) [2] and fuzzy c-means [4]. In this study we
used three supervised machine learning algorithms
(ANN, Parzen window and k-NN) to classify the images.
In the recently introduced work [6], the authors used a
hybrid intelligent technique for classifying MRI brain
images. Their dataset contained 10 normal images and 60
abnormal images, including Alzheimer's disease, glioma,
bronchogenic carcinoma and cerebrovascular disease. They
achieved a higher classification rate than previous
methods.
In this study, our goal is to achieve a higher
classification rate in distinguishing normal images
from those with brain abnormality. First, the MR images
are pre-processed to bring the mean pixel intensity to the
same level, so that dark images become as bright as the
other images. Second, features are extracted using the
wavelet transform. Wavelets are a suitable
tool for this task because they allow analysis of images
at various levels of resolution [5]. Third, principal
component analysis (PCA) is used to reduce the
number of features and to increase discrimination
between classes. PCA is appealing because it effectively
reduces the dimensionality of the data and therefore
the computational cost of analyzing new data [6].
Finally, three pattern recognition methods, k-NN, Parzen
window and ANN, are used for classification. The results
show correct classification ratios of 98.4% to 99.2% on
the test images. Our work extends and modifies the
method introduced in [6]. It differs in several respects:
the database contains more and different images, we add a
pre-processing step and an additional classifier, fewer
PCA features are needed to reach the maximum
classification rate, and we obtain a better classification
rate. DWT, PCA and the classifiers themselves are,
however, commonly used steps in pattern recognition
problems.
The paper is organized as follows. Section 2
briefly describes the image dataset and the steps of the
diagnosis method, including pre-processing, feature
extraction and feature reduction. Section 3 covers the
classification stage. Results and a comparison with
previous work are presented in Section 4. Section 5
summarizes the approach and outlines future work.
Shahla Najafi*, Mehdi Chehel Amirani**, and Zahra Sedghi***
*Urmia University, Electrical Engineering Department, sh.najafi66@yahoo.com
** Urmia University, Electrical Engineering Department , m.amirani@urmia.ac.ir
*** Urmia University, Electrical Engineering Department , st_z.sedghi@urmia.ac.ir

2. Methodology
2.1 Imaging Data
The diagnosis methods were implemented on a
real human brain MRI dataset. The protocol includes
high-resolution axial, T2-weighted 256 × 256 pixel images.
The dataset contains 125 normal and 41 abnormal MR
images; the abnormal images include cerebrovascular,
Alzheimer's, brain tumor, inflammatory, infectious and
degenerative diseases. The data were collected from the
Harvard Medical School website [13] and the Laboratory of
Neuro Imaging (LONI) website [14]. Fig. 1 shows some
examples of normal and abnormal subjects.
2.2 Pre-processing
Some images in the dataset were darker than others
because of problems with the acquisition scanner. The
scans are corrected for intensity non-uniformity using
histogram equalization. Fig. 2 shows an example of a dark
MR image before and after pre-processing.
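The pre-processing step can be sketched as follows. This is a minimal NumPy illustration of standard histogram equalization on an 8-bit grayscale image, not the authors' implementation; the function name `equalize_histogram` and the synthetic test image are assumptions made for the example.

```python
import numpy as np

def equalize_histogram(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image (2-D uint8 array)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]            # CDF at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                       # constant image: nothing to stretch
        return image.copy()
    # Map each grey level through the normalized CDF (a look-up table).
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[image]

# Hypothetical "dark" scan: grey levels occupy only the range 0..63.
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(256, 256), dtype=np.uint8)
bright = equalize_histogram(dark)
```

After equalization the grey levels of the dark image are spread over the full 0..255 range, which is the effect described for Fig. 2.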

Fig. 1: Examples from (a) normal (b) abnormal subjects

Fig. 2: (a) Original image (b) Image after pre-processing

2.3 Feature Extraction
We use wavelet coefficients to generate the initial
features. The wavelet transform is a well-established tool
for feature extraction. Its main advantage is that it
provides localized frequency information about a function
or signal, which is particularly beneficial for
classification. Wavelets have previously been used as a
feature extraction method for discrimination [7].
In the two-dimensional wavelet transform, a scaling
function $\varphi(x, y)$ and three wavelets are required:
$\psi^H(x, y)$ (measures variations along columns),
$\psi^V(x, y)$ (responds to variations along rows), and
$\psi^D(x, y)$ (corresponds to variations along diagonals).
The discrete wavelet transform of an image $f(x, y)$ of size
$M \times N$ is then

\[ W_\varphi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0,m,n}(x, y) \tag{1} \]

\[ W_\psi^i(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^i_{j,m,n}(x, y) \tag{2} \]

where $i$ identifies the directional wavelets
($i = \{H, V, D\}$) and $j_0$ is an arbitrary starting scale. The
$W_\varphi(j_0, m, n)$ coefficients define an approximation of
$f(x, y)$ at scale $j_0$. The $W_\psi^i(j, m, n)$ coefficients add
horizontal, vertical, and diagonal details for scales $j \ge j_0$
[8]. Fig. 3 shows the process in block diagram form.
There are several different kinds of wavelets which have
gained popularity throughout the development of wavelet
analysis. One important discrete wavelet is the Haar
wavelet. Basically, it is one period of a square wave.
Because of its simplicity, it is often the wavelet to be
chosen [9].
We use three-scale Haar ($H_2$) basis functions for DWT
feature extraction. For an image of size 256 × 256,
we use the approximation coefficients of the third level as
the features. The third-level approximation is 32 × 32, so
the number of features used in this stage is 1024.

\[ H_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{3} \]
Fig.4 shows the discrete wavelet transform of one
example MR image.

Fig. 3: The analysis filter banks of discrete wavelet transform

Fig. 4: discrete wavelet transform of one example MR image
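The feature count can be checked with a small sketch of the Haar approximation sub-band. It assumes the orthonormal Haar filters of Eq. (3), so each level sums every 2 × 2 block and scales by 1/2. The function name `haar_approximation` and the constant test image are illustrative assumptions, and only the approximation coefficients are computed, not the detail sub-bands.

```python
import numpy as np

def haar_approximation(image: np.ndarray, levels: int = 3) -> np.ndarray:
    """Return the level-`levels` Haar approximation sub-band of a 2-D image."""
    approx = image.astype(np.float64)
    for _ in range(levels):
        rows, cols = approx.shape
        if rows % 2 or cols % 2:
            raise ValueError("image sides must be divisible by 2**levels")
        # Orthonormal 2-D Haar low-pass: sum each 2x2 block and scale by 1/2
        # (1/sqrt(2) along rows times 1/sqrt(2) along columns).
        blocks = approx.reshape(rows // 2, 2, cols // 2, 2)
        approx = blocks.sum(axis=(1, 3)) / 2.0
    return approx

# A 256 x 256 image yields a 32 x 32 approximation, i.e. 1024 features.
image = np.ones((256, 256))
features = haar_approximation(image, levels=3).ravel()
```

Each level halves both image dimensions, so three levels take 256 × 256 to 32 × 32, matching the 1024 features quoted in the text.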
2.4 Feature Reduction
Measurement cost and classification accuracy are
two predominant reasons for minimizing the
dimensionality of the pattern representation (i.e., the
number of features). Classifiers which are built on the
selected representation can use less memory and be faster
by utilizing the limited feature set.
Linear transforms, due to their simplicity, have been
widely used for feature extraction. These transforms
create a smaller set of features from linear combination of
the initial features.
One of the best-known linear feature extractors is
principal component analysis (PCA), or the
Karhunen-Loeve expansion [10]. The basic
approach in principal components is conceptually
quite simple. First, the $d$-dimensional mean vector $\mu$
and the $d \times d$ covariance matrix $\Sigma$ are computed for the
full data set. Next, the eigenvectors and eigenvalues
are computed and sorted in order of decreasing
eigenvalue. Call these eigenvectors $e_1$ with
eigenvalue $\lambda_1$, $e_2$ with eigenvalue $\lambda_2$, and so on. Next,
the $k$ eigenvectors with the largest eigenvalues are chosen.
In practice, this is done by looking at the spectrum of
eigenvalues. Form a $d \times k$ matrix $A$ whose columns consist
of the $k$ eigenvectors. The data are then preprocessed
according to

\[ x' = A^t (x - \mu) \tag{4} \]

Since PCA uses the most expressive features
(eigenvectors with the largest eigenvalues), it
effectively approximates the data by a linear subspace
using the mean squared error criterion [11].
In our work, the number of features after PCA that
achieves the maximum accuracy is 6 for k-NN, 5 for
Parzen window and 7 for ANN.
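The PCA steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code; the names `pca_fit` and `pca_transform` and the random toy data are assumptions. Samples are stored as rows, so Eq. (4) is applied row-wise as `(X - mu) @ A`.

```python
import numpy as np

def pca_fit(X: np.ndarray, k: int):
    """X has shape (n_samples, d). Returns the mean (d,) and A of shape (d, k)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)        # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return mu, eigvecs[:, order]

def pca_transform(X: np.ndarray, mu: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Row-wise form of x' = A^t (x - mu)."""
    return (X - mu) @ A

# Toy data standing in for the wavelet features (40 samples, 20 dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20)) * np.linspace(5.0, 0.5, 20)
mu, A = pca_fit(X, k=6)
Z = pca_transform(X, mu, A)
```

In the paper's setting, `X` would hold the 1024 wavelet features per image and `k` would be 5, 6 or 7 depending on the classifier.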
3. Classification
3.1 K-nearest Neighbor
K-nearest neighbor is one of the simplest pattern
recognition classification techniques. The algorithm for
the nearest neighbor rule is summarized as follows. Given
an unknown feature vector $x$ and a distance measure,
then:
1. Out of the $N$ training vectors, identify the $k$ nearest
neighbours, irrespective of class label. $k$ is chosen
to be odd.
2. Out of these $k$ samples, identify the number of
vectors, $k_i$, that belong to class $\omega_i$, $i = 1, 2, \ldots, M$.
Obviously, $\sum_i k_i = k$.
3. Assign $x$ to the class $\omega_i$ with the maximum number
$k_i$ of samples.
Various distance measures can be used, including the
Euclidean and Mahalanobis [12]. The value of k is tuned
until the maximum level of accuracy is achieved. For this
dataset we use Euclidean distance and after several trials
and errors, k = S is the best value for k.
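The nearest neighbor rule with Euclidean distance can be sketched as below. The function name `knn_predict`, the toy training set and the value `k=3` are illustrative assumptions only and are not taken from the paper.

```python
import numpy as np

def knn_predict(train_X: np.ndarray, train_y: np.ndarray,
                x: np.ndarray, k: int = 3):
    """Classify x by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # class with the largest k_i

# Toy two-class training set (0 = normal, 1 = abnormal), for illustration.
train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
label_a = knn_predict(train_X, train_y, np.array([0.5, 0.5]), k=3)
label_b = knn_predict(train_X, train_y, np.array([5.5, 5.5]), k=3)
```

With two classes, an odd `k` avoids tied votes, which is why the rule above asks for `k` to be odd.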
3.2 Parzen Window
In this approach to classification, a $d$-dimensional
window is created around each of the training samples and,
depending on the number of patterns that fall within
those windows, probability estimates for the different
classes are made. Formally, this can be stated as

\[ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\!\left(\frac{x - x_i}{h_n}\right) \tag{5} \]

where $V_n$ is the volume of a $d$-dimensional hypercube in
feature space with edge length $h_n$. Here $\varphi$ is the window
function, which may be a general probability density
function, and $p_n(x)$ is the estimated probability density
of the pattern $x$ for the given class. It remains to the
designer to choose the form of $\varphi$; in practice a Gaussian
is commonly chosen.
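A Gaussian-window version of the Parzen estimate can be sketched as follows. The sketch assumes $V_n = h^d$ and assigns a pattern to the class with the largest prior-weighted density; the function names, the window width `h=1.0` and the toy data are assumptions, since the paper does not state these details.

```python
import numpy as np

def parzen_density(x: np.ndarray, samples: np.ndarray, h: float) -> float:
    """Gaussian-window Parzen estimate p_n(x) from samples of shape (n, d)."""
    n, d = samples.shape
    u = (x - samples) / h
    # Standard d-dimensional Gaussian window evaluated at each (x - x_i) / h.
    window = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2.0 * np.pi) ** (d / 2.0)
    return float(window.sum() / (n * h ** d))     # (1/n) * sum(window / V_n)

def parzen_classify(x, train_X, train_y, h: float = 1.0):
    """Pick the class with the largest prior-weighted Parzen density."""
    classes = np.unique(train_y)
    scores = [np.mean(train_y == c) * parzen_density(x, train_X[train_y == c], h)
              for c in classes]
    return classes[int(np.argmax(scores))]

# Toy two-class training set (0 = normal, 1 = abnormal), for illustration.
train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
label_a = parzen_classify(np.array([0.5, 0.5]), train_X, train_y)
label_b = parzen_classify(np.array([5.5, 5.5]), train_X, train_y)
```

The window width `h` controls the smoothness of the estimate and would normally be tuned on the training data.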

3.3 Artificial Neural Network
Neural networks have attracted increasing interest over
the last few years, and the application of neural network
techniques to pattern recognition has received
significant attention.
A neural network is a massively parallel distributed
processor that has a natural propensity for storing
experiential knowledge and making it available for
use. It resembles the brain in two respects [13]:
1. Knowledge is acquired by the network through a
learning process.
2. Interconnection strengths known as synaptic
weights are used to store the knowledge.
Basically, learning is a process by which the free
parameters (i.e., synaptic weights and bias levels) of
a neural network are adapted through a continuing
process of stimulation by the environment in which
the network is embedded. The type of learning is
determined by the manner in which the parameter
changes take place.
Supervised learning assumes the availability of a
labeled set of training data made up of $N$ input-output
examples:

\[ T = \{(x_i, d_i)\}_{i=1}^{N} \tag{6} \]

where $x_i$ is the input vector of the $i$th example, $d_i$ is
the desired response of the $i$th example (assumed to be
scalar for convenience of presentation), and $N$ is the
sample size. Given the training sample $T$, the requirement
is to compute the free parameters of the neural network
so that the actual output $y_i$ of the neural network due
to $x_i$ is close enough to $d_i$ for all $i$ in a statistical
sense. For example, we may use the mean-square error

\[ E(n) = \frac{1}{N} \sum_{i=1}^{N} (d_i - y_i)^2 \tag{7} \]

as the index of performance to be minimized.
As depicted in Fig. 5, a multilayer perceptron, the
most important and widely used network model, is made
up of layers of processing units. Each layer is fully
connected to the succeeding layer. The input layer, the
first processing unit, is set by the problem data, and
the output layer, the last processing unit, yields the
solution values. In addition to these two layers, there
are usually one or more layers of hidden neurons, which
are so called because these neurons are not directly
accessible. The hidden neurons extract important features
contained in the input data.
The multilayer neural network employed as the
classifier in this study had three layers (chosen after
several trials of different numbers of hidden layers with
different numbers of neurons). The first layer consisted
of 7 input elements, in accordance with the 7 features
selected from the wavelet coefficients by PCA. After many
trials, the number of neurons in the hidden layer was set
to five, which showed the best performance. Two neurons
in the output layer were used to represent the normal and
abnormal human brain.

Fig. 5: Fully connected network with one hidden layer and
one output layer
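The 7-5-2 network and the mean-square error of Eq. (7) can be sketched as a forward pass in NumPy. The sigmoid activation, the random weights, the one-hot targets and the function names are assumptions, since the paper does not specify them, and training by back-propagation is omitted. With two output neurons, the squared error is summed over the outputs before averaging over the examples.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a fully connected 7-5-2 network (one hidden layer)."""
    hidden = sigmoid(X @ W1 + b1)        # shape (n, 5): hidden-layer outputs
    return sigmoid(hidden @ W2 + b2)     # shape (n, 2): normal vs abnormal

def mean_square_error(desired: np.ndarray, actual: np.ndarray) -> float:
    """Mean over examples of the squared error summed over output neurons."""
    return float(np.mean(np.sum((desired - actual) ** 2, axis=1)))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(7, 5)), np.zeros(5)    # input -> hidden weights
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)    # hidden -> output weights
X = rng.normal(size=(4, 7))                      # four 7-feature PCA vectors
D = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)  # one-hot targets
Y = mlp_forward(X, W1, b1, W2, b2)
error = mean_square_error(D, Y)
```

Training would adjust `W1`, `b1`, `W2` and `b2` to reduce `error` over the training sample.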
4. Results and Comparison
We apply a supervised method for the diagnosis of
normal and abnormal MRI brain images. As mentioned,
the method employs four stages: pre-processing, feature
extraction, feature reduction and classification. After
histogram equalization of the images, the coefficients of
the first three levels of decomposition of the MR images
with the Haar wavelet are computed to extract the
features. Then, the 3rd approximation component and all
detailed components are used as the wavelet coefficients.
These coefficients are used for feature extraction. The
dimension of the feature vector is 1024, and we use
principal component analysis to reduce the number of
features. Reducing the feature dimension by PCA increases
the accuracy of classification. Three classifiers based on
pattern recognition methods, k-NN, Parzen window and ANN,
are used for classification.
The experimental results of the classifiers are
compared in TABLE I, which shows the number of features
extracted by PCA for each classifier, the number of normal
and abnormal images used for training and testing, the
number of images misclassified by each classifier and,
finally, the correct classification ratio for the two
image classes. The analysis of the experimental results
shows that a correct classification ratio of 98.4% is
achieved with the ANN classifier, 99.2% with the Parzen
window and 99.2% with k-NN.
To evaluate the effectiveness of our method, we
compare our results with previous works [2, 6].
TABLE II gives the classification accuracies of our
method and recent results. This comparison shows that
our system has a high correct classification ratio and
requires less computation owing to the feature reduction
based on PCA.
5. Conclusions and Future Works
In this study, we used a machine learning based
method for classification of brain MR images into two
classes: normal and abnormal. Our method combines
pre-processing, the discrete wavelet transform for feature
extraction, feature reduction by principal component
analysis and supervised learning classifiers (k-NN,
Parzen window and ANN), with which we achieved promising
results in classifying healthy and patient subjects.
Correct classification ratios of 98.4% for the artificial
neural network, 99.2% for the Parzen window and 99.2% for
the k-nearest neighbor classifier demonstrate the
advantage of our method.
We have applied this method only to axial T2-weighted
images at a particular depth inside the brain.
The same method can be employed for T1-weighted,
T2-weighted, proton density and other types of MR images,
and with more than one slice of brain MRI, in order to
achieve better accuracy. Therefore, one can develop
software for a diagnostic system for the detection of
brain disorders such as Alzheimer's, cerebrovascular,
inflammatory and neoplastic diseases.
TABLE I: Classification Results

Classifier           Features   Total    Training images     Testing images      Images          Correct classification
                     by PCA     images   Normal   Abnormal   Normal   Abnormal   misclassified   ratio (%)
PP+DWT+PCA+ANN       7          166      19       21         106      20         2               98.4
PP+DWT+PCA+k-NN      6          166      19       21         106      20         1               99.2
PP+DWT+PCA+Parzen    5          166      19       21         106      20         1               99.2
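The ratios in TABLE I are consistent with counting correct decisions over the 126 test images (106 normal plus 20 abnormal). The short check below assumes that this is how the ratio was computed; the function name is hypothetical.

```python
def correct_classification_ratio(n_test: int, n_misclassified: int) -> float:
    """Percentage of test images that were classified correctly."""
    return 100.0 * (n_test - n_misclassified) / n_test

n_test = 106 + 20                                  # test images in TABLE I
misclassified = {"ANN": 2, "k-NN": 1, "Parzen": 1}
ratios = {name: round(correct_classification_ratio(n_test, m), 1)
          for name, m in misclassified.items()}
```

Two errors out of 126 give 98.4%, and one error gives 99.2%, matching the table.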

TABLE II: Comparison With Other Methods Reported in the Literature

Method                  Techniques used for classification                    Correct classification ratio (%)
Our work                PP+DWT+PCA+ANN                                        98.4
                        PP+DWT+PCA+k-NN                                       99.2
                        PP+DWT+PCA+Parzen                                     99.2
El-Dahshan et al. [6]   DWT+PCA+ANN                                           97
                        DWT+PCA+k-NN                                          98
Chaplot et al. [2]      DWT + SOM                                             94
                        DWT + SVM with linear kernel                          96
                        DWT + SVM with radial basis function based kernel     98

References
[1] F. Gorunescu, "Data mining techniques in computer-aided
diagnosis: Non-invasive cancer detection," PWASET, vol. 25,
pp. 427-430, 2007.
[2] S. Chaplot, L. M. Patnaik, and N. R. Jagannathan,
"Classification of magnetic resonance brain images using
wavelets as input to support vector machine and neural
network," Biomed. Signal Process., vol. 1, pp. 86-92, Jun.
2006.
[3] E. Formisano, F. D. Martino, and G. Valente, "Multivariate
analysis of fMRI time series: classification and regression of brain
responses using machine learning," Magnetic Resonance Imaging,
vol. 26, pp. 921-934, Jan. 2008.
[4] M. Maitra and A. Chatterjee, "Hybrid multiresolution Slantlet
transform and fuzzy c-means clustering approach for
normal-pathological brain MR image segregation," Med. Eng.
Phys., Aug. 2007.
[5] S. Kara and F. Dirgenali, "A system to diagnose
atherosclerosis via wavelet transforms, principal component
analysis and artificial neural networks," Expert Syst. Appl.,
vol. 32, pp. 632-640, Feb. 2006.
[6] E. A. El-Dahshan, T. Hosny, and A. M. Salem, "Hybrid intelligent
techniques for MRI brain images classification," Digital Signal
Processing, vol. 20, pp. 433-441, 2010.
[7] K. Karibasappa and S. Patnaik, "Face recognition by ANN using
wavelet transform coefficients," IE (India) Journal of
Computer Eng., vol. 85, pp. 17-23, 2004.
[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd
edition, New Jersey, Prentice Hall, 2008.
[9] K. Roy and P. Bhattacharya, "Optimal features subset
selection and classification for Iris recognition," Journal of
Image Video Process., Mar. 2008.
[10] A. K. Jain, P. W. Robert Duin, and M. Jianchang, "Statistical
pattern recognition: A review," IEEE Trans. Pattern Anal.
Mach. Intell., vol. 22, pp. 4-37, Jan. 2000.
[11] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern
Classification, Wiley, New York, 2001.
[12] S. Theodoridis and K. Koutroumbas, Pattern Recognition,
Academic Press, p. 45, 1999.
[13] Harvard Medical School, Web, data available at
http://med.harvard.edu/AANLIB/.
[14] Laboratory of Neuro Imaging (LONI), web,
http://www.loni.ucla.edu/.
