

ARTICLE IN PRESS

Image and Vision Computing xx (2006) 1-15. www.elsevier.com/locate/imavis

Hidden Markov models-based 3D MRI brain segmentation


M. Ibrahim, N. John, M. Kabuka *, A. Younis
Department of Electrical and Computer Engineering, College of Engineering, University of Miami, 1251 Memorial Drive, Room 406, Coral Gables, FL 33146, USA

Received 18 September 2004; received in revised form 4 February 2006; accepted 1 March 2006

Abstract

This paper introduces a 3D MRI segmentation algorithm based on Hidden Markov Models (HMMs). The mathematical models for the HMM that forms the basis of the segmentation algorithm are developed for both the continuous and discrete cases and contrasted with Hidden Markov Random Fields in terms of complexity and extensibility to larger fields. The presented algorithm clearly demonstrates the capacity of HMMs to tackle multi-dimensional classification problems. The HMM-based segmentation algorithm was evaluated through application to simulated brain images from the McConnell Brain Imaging Centre, Montréal Neurological Institute, McGill University, as well as real brain images from the Internet Brain Segmentation Repository (IBSR), Harvard University. The HMM exhibited high accuracy in segmenting the simulated brain data and an even higher accuracy, compared to other techniques, when applied to the IBSR 3D MRI data sets. The achieved accuracy of the segmentation results is attributed to the HMM foundation and the utilization of the 3D model of the data. The IBSR 3D MRI data sets encompass various levels of difficulty and artifacts that were chosen to pose a wide range of challenges, which required handling of sudden intensity variations as well as global intensity level correction and 3D anisotropic filtering. During segmentation, each class of MR tissue was assigned a separate HMM, and all of the models were trained using the discriminative MCE training algorithm. The results were numerically assessed and compared to those reported using other techniques applied to the same data sets, including the manual segmentations establishing the ground truth for the real MR brain data. The results obtained using the HMM-based algorithm were the closest to the manual segmentation ground truth in terms of an objective measure of overlap.

© 2006 Elsevier B.V. All rights reserved.
Keywords: Hidden Markov Models; Image segmentation; Medical imaging

1. Introduction

Interpretation of biomedical imaging of the brain plays an important part in the diagnosis of various diseases and injuries. Due to the importance of brain imaging interpretation, significant research efforts have been devoted to developing better and more efficient techniques in several related areas, including processing, modeling, and understanding of brain images. In particular, the problem of automating 3D segmentation of brain imaging using Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET) or other modalities has received special attention, as evidenced by numerous published research

* Corresponding author. Tel.: +1 305 284 2212; fax: +1 305 284 4044. E-mail address: m.kabuka@miami.edu (M. Kabuka).

0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2006.03.001

work [1-3]. This is mainly due to the multitude of benefits that may be gained from accurate automated 3D brain segmentation. Segmentation frameworks based on Markov Random Fields (MRFs) and Hidden Markov Random Fields (HMRFs) were introduced in several reported efforts [9-12]. MRFs and HMRFs share the common property of revealing the dependency between the imaging voxels to be segmented and their first-degree neighbors. However, both frameworks are computationally intensive, which adversely affects their practical applicability in medical environments. On the other hand, Hidden Markov Models (HMMs) have proven valuable when applied to Automatic Speech Recognition (ASR) [4], where ASR is essentially a pattern recognition problem. In fact, HMRFs, which are mainly applied in computer vision and image processing, grew out of further developments of HMMs. Hidden Markov Chains have also been reported for image segmentation using radar, synthetic and multi-sensor images [31-33]. A generalized mixture estimation approach is

presented for unsupervised classification of Hilbert-Peano scans of radar images [31], which combines Hidden Markov Chain models and Hidden Markov Random Field models. Similarly, pairwise Markov Random Chain models provided the basis for unsupervised signal and image segmentation of simulated as well as radar images [32]. Another approach utilizing Hidden Markov Chains was presented for image segmentation of synthetic and multi-sensor radar images [33]. These techniques provide promising results for utilizing HMMs for MR image segmentation. HMMs, implemented using the Viterbi algorithm, are sufficiently capable of encoding the first-degree relationships and can be extended to higher degrees. Encoding first-degree relationships among the voxels will be shown, as evidenced by the experimental results, to provide sufficient information for accurate segmentation of 3D MRI brain imaging data. The main training algorithms that have been developed for HMMs are the Baum-Welch algorithm [4] and the Maximum Mutual Information (MMI) algorithm [5]. The inefficiency of both techniques is argued in the context of Bayesian classification, where it is shown that neither algorithm necessarily results in the best Bayesian threshold [6]. Consequently, a new algorithm, namely Minimum Classification Error (MCE) training, was developed [6], which exposes each of the HMM nodes to both the patterns to be rejected and the patterns to be recognized. As a result, the HMM nodes can minimize the accompanying error rate by moving the Bayesian threshold closer to the correct location, as shown in Fig. 1. Many advances in brain MR image segmentation have relied on a Bayesian framework and Markov Random Fields (MRFs) [17]. In [15], the smoothness and piecewise contiguous nature of the tissue regions in MR cerebral images was modeled using a 3D MRF.
A segmentation algorithm, based on the statistical model, finds the approximate Maximum A Posteriori (MAP) estimate of the segmentation model parameters from the MR imaging data. Another segmentation scheme was based on the Iterative Conditional Modes (ICM) algorithm [18], in which measurement model parameters were estimated using local information at each site, and the prior model parameters were estimated using the segmentation results after each cycle of iterations. In this

Fig. 1. Correct Bayesian threshold vs. an erroneous one (class and non-class probability density functions plotted as probability against the argument).

case, MRFs were used to model only the intensity process, and the segmentation results were improved by incorporating the discontinuity process into the prior model. The scheme also addressed the effects of magnetic field inhomogeneities and biological variations of tissues as variations of the model parameters. Unfortunately, this model did not investigate the discontinuity process in the 3D MR volumes. A fully automated 3D segmentation technique for MR brain images was introduced in [19] that relied on an MRF model to capture the non-parametric distributions of tissue intensities, neighborhood correlations, and signal inhomogeneities in MR images. The technique used two algorithms, based on Simulated Annealing and on Iterative Conditional Modes, and started with a training process of typical echo intensities and the setting of one of the MRF parameters according to the expected inhomogeneity. The technique was able to automatically segment the entire 3D MR volume, as well as different MR images acquired using the same MR sequence. Another study [20] embedded the problem of functional MRI (fMRI) analysis in a Bayesian framework, and then provided an algorithm to restore and analyze fMRI using MRFs in that framework. The study analyzed the shortcomings of the Statistical Parameter Map (SPM) by using a 3D MRF where the third dimension represents time; the proposed restoration approach was applied before using SPM, which resulted in an improvement of the detection sensitivity. This study also analyzed the hemodynamic response using three parameters, the norm, the maximum and the time when the maximum occurs, where it was shown that when the values of these parameters in neighboring voxels are far from each other, the probability of detection is lower, since the associated hemodynamic responses are not consistent in the spatial domain. Hence, the problem was modeled using two-level MRF interactions between the activation map and the three parameter maps.
The detection of an activated area thus depends on the norm of the hemodynamic response and some contextual information on this norm, as well as the consistency of the hemodynamic function parameters across this area. Another fully automated method for model-based tissue classification of magnetic resonance (MR) images of the brain was introduced in [16]. The method relies on MRFs to incorporate contextual information and uses a digital brain atlas for the expected a priori information on the spatial locations of the tissue classes. The main idea of the method is to interleave the classification with MR bias field correction, intensity distribution estimation, and estimation of the MRF parameters. Hence, in each iteration it improves the classification of the segmented single and multi-spectral MR images and the correction of MR signal inhomogeneities. The proposed strategy can be considered a fully automated method for tissue classification that produces objective and reproducible results. Another automatic method is presented in [21], where the objective of the study is to classify the brain tissue while taking into account the partial volume effect, which results in MR image volumes being composed of a mixture of several tissue types. This study assumes that the brain dataset is composed of gray matter, white matter, cerebrospinal fluid,


and mixtures (called mix-classes). The study provided a statistical model of the mix-classes and showed that it can be approximated by a Gaussian function under some conditions. The proposed method used a two-step strategy: in the first step, it segmented the brain into pure and mix-classes, while in the second step it re-classified the mix-classes into the pure classes using knowledge about the obtained pure classes. Both steps use MRF models as well as the multi-fractal dimension describing the topology of the brain, which provides an additional energy term in the MRF model to improve discrimination of the mix-classes. The proposed strategy is unsupervised, fully automatic, and uses only T1-weighted images. In [22], a statistical framework for partial volume segmentation of MR images of the brain was introduced. The framework starts by segmenting the image using a parametric statistical model in which each voxel is classified as one single type of tissue. Then, it uses a down-sampling step that addresses partial volumes along the borders between tissues. In this step, a number of voxels in the original image grid contribute to the intensity of each voxel in the resulting image grid. The framework also uses an Expectation Maximization (EM) approach to estimate the parameters of the new model and to perform the partial volume classification. In [23], a statistical segmentation framework for brain MR images based on Hidden Markov Random Fields (HMRFs) is introduced, which overcomes the problem that Finite Mixture (FM) models [24,25] do not take into account the spatial properties of the image. The HMRF model is an MRF model whose state sequence cannot be observed directly but can be indirectly estimated through observations. The strategy also uses an EM algorithm to provide an accurate and robust segmentation. The study in [26] introduced an efficient and accurate automatic 3D segmentation approach for brain MR images.
The approach uses a brain atlas in conjunction with a robust registration procedure to find a non-rigid transformation that maps the standard brain to the specimen to be segmented; this transformation is used to separate the brain from non-brain tissues and to compute prior probabilities for each class at each voxel location. The approach also involved a fast and accurate way to find optimal segmentations based on EM models, given the intensity models along with the spatial coherence assumption. Unfortunately, the study does not take the Partial Volume (PV) effect into account. A contextual segmentation technique to detect brain activation from functional brain images based on a Bayesian framework is presented in [28], which uses an MRF model to represent configurations of activated brain voxels. It also uses likelihoods given by statistical parametric maps to find the maximum a posteriori estimate of the segmentation. The technique is capable of analyzing experiments involving multiple-input stimuli. The study in [27] introduced a model-based approach for automatic segmentation and classification of multi-parameter MR brain images into 15 tissue classes. The model approximated the spatial distribution of tissue classes by a Gaussian MRF and used the maximum likelihood method to estimate class probabilities and transitional probabilities for each pixel of the image. The proposed algorithm is not only

accurate compared to manual segmentation but can also learn new tissue classes. An unsupervised tissue characterization algorithm was introduced in [29] that is both statistically principled and patient specific. The method used adaptive standard finite normal mixture and inhomogeneous MRF models, whose parameters were estimated using the EM method and relaxation labeling algorithms under information-theoretic criteria. A technique for assessing the accuracy of segmentation algorithms was presented in [10] and applied to the performance evaluation of brain editing and brain tissue segmentation algorithms for MR images. It relied on distance-based discrepancy features between the ground truth obtained from a realistic digital brain phantom, which is taken as a reference, and the edited/segmented brain tissues. The proposed strategy can be used to evaluate and validate any segmentation algorithm, and it is able to determine quantitatively to what extent a segmentation algorithm is sensitive to internal parameters, noise, artifacts or distortions when a ground truth is given. In this paper, a segmentation algorithm based on Hidden Markov Models is presented, in conjunction with the required preprocessing, for MR data. The algorithm is multi-dimensional and demonstrates a high degree of accuracy for 3D MRI brain segmentation compared to other techniques. Unlike the generic pre-processing used in most image processing and computer vision applications, the pre-processing phases used in this algorithm are specifically developed to handle problems encountered in 3D MRI brain segmentation. These problems include correction of sudden intensity variations resulting from artifacts during the acquisition process and global brightness and contrast correction, with both problems showing a significant impact on segmentation accuracy.
In addition to its segmentation accuracy, the HMM-based segmentation algorithm's distinguishing characteristics include efficient computational requirements, a unique scanning of the 3D MRI data that enables modeling of the effect of a voxel's neighborhood on that voxel's segmentation, and generic applicability to larger neighborhoods, which is important for the detection of larger features that exceed the high-resolution neighborhood size. The 3D MRI segmentation algorithm was evaluated using simulated 3D MRI brain data sets obtained from the McConnell Brain Imaging Centre, Montréal Neurological Institute, McGill University (http://www.bic.mni.mcgill.ca/) and real 3D MRI brain data sets obtained from the Internet Brain Segmentation Repository (IBSR), Center for Morphometric Analysis at Massachusetts General Hospital (http://www.cma.mgh.harvard.edu/ibsr/). The 3D MRI data sets are used to perform an objective assessment of the segmentation results based on a metric that enables comparison between the segmentation results obtained using the presented algorithm and the manual segmentations performed by clinical experts, which are available from the IBSR web site. The metric is termed the overlap coefficient; it equals one if the automatic segmentation results are identical to the manual ones and reduces to zero when there is no intersection. The quality of the


segmentation results obtained using the algorithm presented in this paper was further evaluated by comparison with the results of other algorithms applied to the same data sets and published on the IBSR website.

This paper is organized as follows: Section 2 describes the underlying mathematical foundation upon which the algorithm is based, detailing the adopted mathematical model for the discrete Hidden Markov Model followed by the mathematical foundation of the continuous case. A complexity analysis comparing Markov Random Fields and Hidden Markov Models is presented in Section 3. Section 4 provides the details of the preprocessing phases. Section 5 details the training and segmentation steps in both the continuous and discrete cases. Finally, experimental results using both real and simulated 3D MRI data sets are presented in Section 6.

2. Mathematical model

The basic foundation of the presented algorithm relies on the ability of the underlying Hidden Markov Model (HMM) to build knowledge about the input multi-dimensional data vectors or sequences that reflect the parameters of the MR imaging modality, i.e. intensity information about the voxel and its neighborhood. Hidden Markov Models are descendants of Markov Chains, which are made of different states statistically bound by transition probabilities. An HMM is characterized by a set of internal states, the transition probabilities among the states in response to an input symbol from the sequence, and the emission probabilities of symbols from the different states. The HMM knowledge is built, during the learning stage, in the form of the transition and emission probabilities of the states conditioned on the input symbols of the sequence, based on two mathematical assumptions. First, the Markovianity assumption, which is expressed as follows:

p(q_i = s_i \mid q_{i-1} = s_a, q_{i-2} = s_b, \ldots) = p(q_i = s_i \mid q_{i-1} = s_a)    (1)

Eq. (1) imposes the condition that the probability p of transition from one state q_{i-1} to another q_i depends only on the previous state q_{i-1}; in other words, the probability is independent of the states prior to q_{i-1}. Second, the assumption that the emission probabilities from each state are independent of each other, which leads to the output probability being the product of the emission probabilities of all states, as expressed in Eq. (2):

p(O \mid \Lambda) = \sum_{q_1, q_2, \ldots, q_n} \pi_{q_1} b_{q_1}(O_1) a_{q_1 q_2} b_{q_2}(O_2) a_{q_2 q_3} \cdots b_{q_n}(O_n)    (2)

where p is the output probability of a chain O = O_1 O_2 \cdots O_n, b_{q_x}(O_y) is the emission probability of pattern O_y from state q_x, a_{ij} is the transition probability from state i to state j, \pi_{q_x} is the initial probability of state q_x, and \Lambda is a vector representing the model parameters. Higher-order Markov Models increase the level of dependency, which complicates the analysis of higher-order systems. Moreover, first-order Hidden Markov Models assume that the states are hidden and cannot be observed at the output stage. Instead, only the outputs emitted from those states are observable, without knowledge of which states emitted them. This holds when Hidden Markov Models are viewed from a perspective similar to the one presented in [4], where the HMM is imagined as a process generating output symbols and the observations are viewed from the outside without knowing which states emitted them. From that viewpoint, the emission probability of one state can well be assumed to be independent of the other outputs. However, a different case exists when the HMM is used for MR image segmentation, where the objective is to find the best state sequence that might have produced an output. By inspecting Eq. (2), the output probability for each segment is calculated using the most probable path only, i.e. without a summation over all possible paths. During training, the goal of the training algorithms is to increase the output probability of input sequences representing a certain class of tissues. Hence, the transition and emission probabilities are updated in a manner that maximizes the output probability of a given class of tissues. In some cases this entails changing the transition and emission probabilities of prior states in order to maximize the output probability given a certain terminating output, a case that is clear if the output probability is considered to be due only to the most probable path. This mechanism in turn encodes some form of relationship between the terminal and the input sequences. The encoding of relations arises from the fact that, upon updating the transition probabilities of the prior states, their values are decreased, forcing the most probable chain of states to change to another set of states having higher transition and emission probabilities.

Fig. 2. Example HMM: two states with initial probabilities \pi = 0.5 each; State 0 emits b(0) = 0.8, b(1) = 0.2, State 1 emits b(0) = 0.4, b(1) = 0.6; transition probabilities a_{00} = 0.3, a_{01} = 0.7, a_{10} = 0.6, a_{11} = 0.4.

That relation encoding takes place is demonstrated through a numerical example showing that emission probabilities during classification with HMMs are conditioned by non-neighboring outputs. The encoding of this relation is demonstrated through the example HMM shown in Fig. 2, which involves two states, State 0 and State 1, with initial probabilities \pi_{q_0} = \pi_{q_1} = 0.5, emission probabilities b_{q_0}(0) = 0.8, b_{q_0}(1) = 0.2, b_{q_1}(0) = 0.4, b_{q_1}(1) = 0.6, and transition probabilities a_{00} = 0.3, a_{01} = 0.7, a_{10} = 0.6, a_{11} = 0.4. This was tested using a sequence of five outputs, 00000, and the most probable chain was found to be 01010 with a probability of 0.016257. However, when the last output is changed, giving 00001, the most probable chain changes not only in terms of the last state but also in terms of the first state, becoming 10101 with a probability of 0.008129. Changing the output emitted by the final state induced a change in the initial state, and consequently changes the emission probability of O_1 depending on O_4. This shows that the emission probability of output intensities can be conditioned by the presence of other output intensities emitted by non-neighboring states. A simple argument based on those results shows that the HMM can encode relations in more than one dimension, since the intensities in these sequences are constructed from a 3x3x3 neighborhood of voxels. Moreover, the HMM encodes relations between intensities of non-neighboring voxels in the same 3x3x3 neighborhood, even if they do not reside in the same clique as defined in HMRF models. The knowledge stored in the HMM encodes the conditional dependency between the voxels' intensities and the class of tissue to which they belong, in the form of the initial, transition and emission probabilities, based on the mathematical model of the HMM transition among the constituent states. In contrast, Hidden Markov Random Fields (HMRFs) are based on the Gibbs distribution, which encodes relations between voxels through the usage of cliques and mathematical modeling of the potential. In other words, both MRFs and HMRFs provide a mathematical model for the dependency between voxel intensities. However, HMMs can establish similar dependencies among pixel/voxel intensities that lie in larger regions or do not belong to the same clique, as will be shown in Section 3. In this work, when presenting the pixel/voxel data to the HMM-based segmentation module, each pixel/voxel is represented by a vector composed of its grayscale/color value as well as those of the other pixels/voxels in its neighborhood: a 9-pixel vector and a 27-voxel vector for 2D and 3D imaging data, respectively. The vector is presented to the HMM models, and the output probability is calculated using the prior training knowledge stored in each model. Labeling takes place by setting the label of the voxel to that of the HMM showing the highest output probability.
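The most-probable-chain behavior discussed above can be reproduced with a short script. The sketch below is an illustrative Python implementation (not the authors' code): it evaluates both the full sum of Eq. (2) by brute-force enumeration and the Viterbi best path over the Fig. 2 model. Ties between equally probable predecessors are broken toward State 0, so decoded chains and probabilities for some sequences depend on that convention; for the all-zeros sequence it recovers the chain 01010 quoted in the text.

```python
import itertools
import numpy as np

# Parameters of the example HMM in Fig. 2
pi = np.array([0.5, 0.5])
A = np.array([[0.3, 0.7],
              [0.6, 0.4]])   # A[i, j]: transition probability from state i to j
B = np.array([[0.8, 0.2],
              [0.4, 0.6]])   # B[q, o]: probability that state q emits symbol o

def output_probability(obs):
    """Eq. (2): sum over every possible state chain (exponential enumeration)."""
    total = 0.0
    for chain in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[chain[0]] * B[chain[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[chain[t - 1], chain[t]] * B[chain[t], obs[t]]
        total += p
    return total

def viterbi(obs):
    """Most probable state chain and its probability (best single path)."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))           # best path probability ending in each state
    psi = np.zeros((T, n), dtype=int)  # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

path, p_best = viterbi([0, 0, 0, 0, 0])
print(path)   # -> [0, 1, 0, 1, 0], i.e. the chain 01010
print(p_best <= output_probability([0, 0, 0, 0, 0]))  # best path <= full sum
```

Decoding the modified sequence 00001 yields a chain that differs from 01010 in more than just the final state, illustrating the relation encoding described above.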
The outputs of an HMM can be discrete, taking certain specific quantized levels, or continuous, based on continuous probability density functions (PDFs). The most common continuous PDF representation is a multivariate Gaussian distribution whose covariances are assumed to be zero, reducing it to an additive mixture of normal distributions. Classification is performed by estimating the probability that a pattern was generated by each HMM; the most probable model to produce that pattern determines its tissue type or class. HMMs were previously used successfully in automatic speech recognition (ASR) and are commonly used with the Minimum Classification Error (MCE) training algorithm described in [6,7], which forms the foundation of the learning process employed in the proposed segmentation framework. During MCE training, the derivatives of the output are computed with respect to every parameter to be updated. Since the output we seek is the class number, a continuous differentiable formula is required that evaluates the correctness of the result by replacing the non-differentiable discrete on/off output. The mathematical model of the loss in [6,7] was used for that case, where

l_i = \mathrm{sigmoid}(d_i) = \frac{1}{1 + e^{-\gamma d_i + \theta}}    (3)

where \gamma is the sigmoid slope, \theta is a shift, and d_i is a continuous variable that becomes more negative as the result becomes more correct, i.e. when the HMM of class i has higher probability. It can be expressed as follows:

d_i = -g_i(X; \Lambda) + \left[ \frac{1}{k-1} \sum_{j=1, j \neq i}^{k} g_j(X; \Lambda)^{\eta} \right]^{1/\eta}    (4)

The right term of Eq. (4) approaches \max_{j \neq i} g_j(X; \Lambda) as \eta \to \infty, which leads to d_i being negative if the HMM model of class i shows the highest probability, and so will the corresponding l_i. g_x is a discriminant function for each class, which does not necessarily correspond to a probability, since no restrictions are imposed for that purpose. However, by using HMMs the output is the probability of the pattern, and the discriminant used is the probability due to the most probable path. k is the number of models involved. MCE updates each parameter so as to approach the minimum of l_i. For a certain parameter x, this update proceeds as follows:

x_{t+1} = x_t - \varepsilon \frac{\partial l_i}{\partial x}    (5)

where \varepsilon is the learning rate. In the MCE algorithm [7], it is shown that if the learning rate is chosen such that the following conditions are satisfied:

\sum_{t=0}^{\infty} \varepsilon_t \to \infty    (6)

\sum_{t=0}^{\infty} \varepsilon_t^2 < \infty    (7)

then the model parameters \Lambda approach at least a local minimum \Lambda^*. It is also described that, by using a small sigmoid slope that increases across iterations, the global minimum is achievable with a higher probability than with other training algorithms, due to smoothing of the error surface. Both considerations were addressed in the context of this paper, where the learning rate is given by:

\varepsilon_t = \frac{\varepsilon_0}{1 + \alpha t}    (8)

where \alpha is a constant and t is time, substituted by the iteration number. The integral of \varepsilon_t from zero to infinity diverges, while the integral of the squared learning rate is finite (equal to \varepsilon_0^2 / \alpha), i.e. conditions (6) and (7) are satisfied. In other words, the proposed HMM training is accurate, since it converges toward the global minimum, as well as robust, since the convergence depends only on the established learning rate.
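The schedule of Eq. (8) can be checked against conditions (6) and (7) numerically; \varepsilon_0 = 0.1 and \alpha = 0.01 below are assumed values chosen only for illustration.

```python
import numpy as np

def lr(t, eps0=0.1, alpha=0.01):
    """Eq. (8): eps_t = eps0 / (1 + alpha * t)."""
    return eps0 / (1.0 + alpha * t)

t = np.arange(1_000_000)
# Partial sum of eps_t grows like (eps0/alpha)*log(1 + alpha*T): unbounded, Eq. (6)
partial_sum = lr(t).sum()
# Partial sum of eps_t**2 stays below eps0**2 * (1 + 1/alpha) by the integral test, Eq. (7)
partial_sq = (lr(t) ** 2).sum()
print(partial_sum, partial_sq)
```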

+ model

ARTICLE IN PRESS
M. Ibrahim et al. / Image and Vision Computing xx (2006) 115

Two HMM models will be considered during the analysis. The first is a binary discrete HMM in which each node has an emission probability for zero and an emission probability for one; consequently, the input is taken in the form of a long vector holding the binary equivalent of the intensities, represented in eight bits each. The second is a continuous model in which each node represents the emission probabilities in the form of a Gaussian mixture. The analysis of the discrete model is presented first, followed by the formulas necessary for the continuous Gaussian mixture. Since l_i is a function of d_i, and d_i is a function of g_x, x = 1, \ldots, k, the derivative of l_i with respect to a certain parameter x is given by the chain rule:

\frac{\partial l_i}{\partial x} = \frac{\partial l_i}{\partial d_i} \frac{\partial d_i}{\partial g_x} \frac{\partial g_x}{\partial x}    (9)

\frac{\partial l_i}{\partial d_i} = \gamma l_i (1 - l_i)    (10)

\frac{\partial d_i}{\partial g_x} =
\begin{cases}
-1, & x = i \\
\left[ \frac{1}{k-1} \sum_{j=1, j \neq i}^{k} g_j^{\eta} \right]^{1/\eta - 1} \dfrac{g_x^{\eta-1}}{k-1}, & x \neq i
\end{cases}    (11)
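The analytic derivatives in Eqs. (10) and (11) can be verified against a finite-difference approximation of l_i. The values of \eta, \gamma, \theta and the discriminants below are hypothetical (and positive, so the \eta-th power is well defined); this is a sanity-check sketch, not part of the authors' algorithm.

```python
import numpy as np

GAMMA, THETA, ETA = 1.0, 0.0, 3.0

def loss(g, i):
    """l_i from Eqs. (3)-(4)."""
    k = len(g)
    others = np.delete(g, i)
    d = -g[i] + (np.sum(others ** ETA) / (k - 1)) ** (1.0 / ETA)
    return 1.0 / (1.0 + np.exp(-GAMMA * d + THETA))

def dloss_dg(g, i, x):
    """(dl_i/dd_i) * (dd_i/dg_x): Eq. (10) times Eq. (11)."""
    li = loss(g, i)
    k = len(g)
    if x == i:
        dd = -1.0                                                    # Eq. (11), x = i
    else:
        s = np.sum(np.delete(g, i) ** ETA) / (k - 1)
        dd = s ** (1.0 / ETA - 1.0) * g[x] ** (ETA - 1.0) / (k - 1)  # Eq. (11), x != i
    return GAMMA * li * (1.0 - li) * dd                              # Eq. (10)

g = np.array([0.6, 0.3, 0.2])
h = 1e-7
for x in range(3):
    bumped = g.copy()
    bumped[x] += h
    numeric = (loss(bumped, 0) - loss(g, 0)) / h
    print(x, numeric, dloss_dg(g, 0, x))  # numeric and analytic should agree
```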

The output of the HMM can take several forms:

g_i = \sum_{q \in C} g(x, q, \Lambda)    (12)

g_i = \max_{q \in C} g(x, q, \Lambda)    (13)

g_i = \left[ \frac{1}{N(C)} \sum_{q \in C} g(x, q, \Lambda) \right]^{1/\eta}    (14)

where C represents the set of possible chains and N(C) is the number of elements in C. The output can be any of the previous forms or functions of them. The MCE training discussed in [6,7] was based on Eq. (13), which is called the segmental form, where only the most optimal path is considered for update during the Generalized Probabilistic Descent update step. Since minimizing or maximizing a function requires the minimization or maximization of its log, we choose the discriminant function given by

g_i = \log \pi_{q_0} + \sum_{t=1}^{T} \left[ \log a_{q_{t-1} q_t} + \log b_{q_t}(O_t) \right]    (15)

where the b's are the output functions, the a's are the transition probabilities and the \pi's are the initial probabilities. The HMM imposes constraints on most of the parameters associated with each model. Such constraints include that the transition probabilities going out of a state must sum to one and that the initial probabilities must sum to one, among many others, all of which have to be satisfied during the parameter update. For that reason, a substitution was used in [6] that guarantees those constraints, where the substituted parameter is the one that gets updated in each step. The substitution previously used in [6] for the initial probability is:

\pi_x = \frac{\exp(\bar{p}_x)}{\sum_{q=1}^{Q} \exp(\bar{p}_q)}    (16)

The previous substitution works well except for the fact that it uses exponents, which slows down execution. Another substitution, which does not depend on exponents, is proposed and used in this research:

\pi_x = \frac{\bar{p}_x^2}{\sum_{q=1}^{Q} \bar{p}_q^2}    (17)

a_{ix} = \frac{\bar{a}_{ix}^2}{\sum_{q=1}^{Q} \bar{a}_{iq}^2}    (18)

P_0 = \frac{\bar{P}_0^2}{\bar{P}_0^2 + \bar{P}_1^2}, \qquad P_1 = \frac{\bar{P}_1^2}{\bar{P}_0^2 + \bar{P}_1^2}    (19)

where P_0 and P_1 belong to a certain state and represent the emission probabilities of zeros and ones, respectively. The parameters that get updated are the substituted (barred) parameters. To update the initial probabilities, we need to find the derivative of g_x with respect to every \bar{p}_q, where q = 1, \ldots, Q. If q = q_0, then

\frac{\partial g_x}{\partial \bar{p}_{q_0}} = \frac{2 (1 - \pi_{q_0})}{\bar{p}_{q_0}}    (20)

On the other hand, if q = z \neq q_0, a dependency still exists through the normalization formula (17), and the derivative becomes:

\frac{\partial g_x}{\partial \bar{p}_z} = -\frac{2 \bar{p}_z \pi_{q_0}}{\bar{p}_{q_0}^2}    (21)

A similar case holds for the transition probabilities. During the update, we consider the derivatives of the transition probabilities going out from a certain state i to a state j:

\frac{\partial g_x}{\partial \bar{a}_{ij}} = \sum_{t=1}^{T} \frac{1}{a_{q_{t-1} q_t}} \frac{\partial a_{q_{t-1} q_t}}{\partial \bar{a}_{ij}}    (22)

\frac{\partial a_{ij}}{\partial \bar{a}_{ij}} = \frac{2 a_{ij} (1 - a_{ij})}{\bar{a}_{ij}}    (23)

\frac{\partial a_{ix}}{\partial \bar{a}_{ij}} = -\frac{2 a_{ix}^2 \bar{a}_{ij}}{\bar{a}_{ix}^2}, \quad x \neq j    (24)

Updating the output probabilities is easier than for the rest of the parameters. The first step is to find the derivative of g with respect to b.
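The squared-parameter substitution of Eq. (17) and the derivatives of Eqs. (20) and (21) can be validated with a finite-difference check. The parameter values below are illustrative only; the derivative taken is that of \log \pi_x, which is the initial-probability term of the discriminant in Eq. (15).

```python
import numpy as np

def normalize_squares(p_bar):
    """Eq. (17): pi_x = p_bar_x**2 / sum_q p_bar_q**2 -- probabilities without exponents."""
    sq = p_bar ** 2
    return sq / sq.sum()

def grad_log_pi(p_bar, x, z):
    """Analytic derivative of log(pi_x) w.r.t. p_bar_z, Eqs. (20) and (21)."""
    pi = normalize_squares(p_bar)
    if z == x:
        return 2.0 * (1.0 - pi[x]) / p_bar[x]         # Eq. (20)
    return -2.0 * p_bar[z] * pi[x] / p_bar[x] ** 2    # Eq. (21)

# Finite-difference check of the analytic gradients against log(pi_0)
p_bar = np.array([0.7, 1.3, 0.4])
h = 1e-6
for z in range(3):
    bumped = p_bar.copy()
    bumped[z] += h
    numeric = (np.log(normalize_squares(bumped)[0]) - np.log(normalize_squares(p_bar)[0])) / h
    print(z, numeric, grad_log_pi(p_bar, 0, z))  # the two columns should agree
```

The same squaring-and-normalizing pattern applies unchanged to the transition rows of Eq. (18) and the binary emission pair of Eq. (19).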

+ model

ARTICLE IN PRESS
M. Ibrahim et al. / Image and Vision Computing xx (2006) 115 7

vgx 1 Z bq vbq vbx 2P0 P1 2x K1 1 Z 1 vP P vbx 2P0 P1 1 K2x 0 Z 0 vP P

(25)

The only difference between the discrete and the continuous HMM models is the way the output probability is calculated. In the continuous case, b(x) is derived from a Gaussian mixture as follows:

b(x) = \sum_{k=1}^{K} C_k N(x; \mu_k, \sigma_k^2)   (28)

where two constraints are imposed. The first is that the weights C_k must sum to one. The second is that the standard deviations \sigma_k are always positive; to guarantee this, \sigma_k^2 is used for the standard deviation, so that \sigma_k^4 is the variance. \mu_k is the mean of distribution k and x is the input variable. The substitution used for the weights is given by

C_k = \tilde{C}_k^2 / \sum_{x=1}^{K} \tilde{C}_x^2   (29)

where K is the number of mixtures used. Updating the parameters is governed by the following equations:

\partial C_x / \partial \tilde{C}_x = 2 C_x (1 - C_x) / \tilde{C}_x   (30)

\partial C_y / \partial \tilde{C}_x = -2 C_y^2 \tilde{C}_x / \tilde{C}_y^2   (31)

\partial b(x) / \partial C_j = N(x; \mu_j, \sigma_j^2)   (32)

\partial b(x) / \partial \mu_j = (\partial b(x) / \partial C_j) \, C_j (x - \mu_j) / \sigma_j^4   (33)

\partial b(x) / \partial \sigma_j = (\partial b(x) / \partial \mu_j) \, (4 (x - \mu_j)^2 - 2 \sigma_j^4) / ((x - \mu_j) \sigma_j)   (34)

This then leads to the general form of the training and segmentation algorithms for a 3×3×3 3D neighborhood. Voxel data is represented as a vector composed of 27 floating-point numbers, each giving the intensity of the voxel itself or of one of its 26 3D neighbors. This vector is presented to the HMM models and the output probability is calculated using the prior training knowledge stored in each model. Labeling takes place by setting the label of the voxel to that of the model showing the highest output probability.

3. Comparison with Hidden Markov Random Field

The comparison of the HMM and the HMRF in the context of MRI segmentation is presented from two points of view. The first is performance, for which a complexity analysis of both is presented. The second is the ability to encode relations among voxels in larger neighborhoods. To assess the computational efficiency of the proposed HMM-based segmentation framework, its complexity is compared to that of the widely utilized HMRF-based segmentation. Since the continuous Gaussian mixture HMM is the variant most similar to HMRF segmentation, its complexity analysis is used for the performance comparison. This starts with the evaluation of the Gaussian mixture, given by

O = \sum_{i=1}^{G} (w_i / \sqrt{2 \pi \sigma^2}) \, e^{-(x - \mu)^2 / \sigma^2}   (35)

where w_i is the weight associated with each Gaussian response and G is the number of mixtures used. The number of floating-point operations N_f required for such an operation is given by

O_{HMM} = 9 \times G   (36)

where the nine operations account for finding (x − μ), squaring it, finding σ², negating (x − μ)², finding the exponent, calculating 2πσ², finding the square root, dividing w_i by the square root, and multiplying by the exponent. Hence, Eq. (36) gives the number of floating-point operations required to find the output probability of a single value in a sequence. In its first iteration, the Viterbi algorithm computes the output probability of the first pattern in the sequence multiplied by the initial probability of each node, which amounts to n × (1 + 9G) computations for n nodes. In each subsequent step, the Viterbi algorithm multiplies the current probability set at each node by the transition probability to each node, which requires an extra n² operations, and adds the output probability of the current pattern, which needs 1 + 9G operations. The total number of operations is therefore

O_{HMM} = n (1 + 9G) + n^2 (1 + 9G)(L - 1) \;\Rightarrow\; O_{HMM} = O(n^2 G L)   (37)

where L is the length of the sequence.

HMRF models start by counting the number of cliques in the 3×3×3 neighborhood. Those cliques can only be formed within 2×2×2 neighborhoods, i.e. composed of eight voxels; any combination of voxels larger than two will form a clique in that neighborhood. Each clique requires the evaluation of its potential. Since the complexity of computing the potential depends on the model being used, the potential is assumed to require one cycle per voxel and another cycle for the clique, which yields the best-case scenario for the HMRF models. This can be demonstrated by the simplest case of subtracting
the mean out of each voxel, squaring the result, and summing all the potentials together; more complicated models will, in turn, require higher complexities. The number of operations N_V required to carry out these computations of the potential is thus:

N_V = 2 \times 16 \times (2 + 1) + 4 \times 4 \times (2 + 1 + 1) + \sum_{v=3}^{8} \binom{8}{v} \times 4 \times (v + v - 1)   (38)

The probability distribution P(f) of the configuration is a Gibbs distribution (with respect to the neighborhood system used), given by:

P(f) = (1/Z) \, e^{-U(f)/T}, \quad \text{where} \quad U(f) = \sum_{\text{all cliques}} V_c(f)   (39)

If Z is assumed to be constant, by restricting the cliques either to single locations or to single and double locations, then the computational complexity of evaluating the Gibbs distribution is O(1), since it requires a constant number of operations irrespective of the model parameters. However, for more accurate computations, the estimation of Z increases the order of complexity; moreover, for a continuous case like the one presented in this paper it is impossible to find the exact value of Z, as it would be the result of 27 nested integrals. This leads to estimation, which in turn affects the accuracy of the computed probability. The complexity becomes:

O_{HMRF} = O(Z)   (40)

So, in the continuous case, the HMRF requires the computation of 27 nested integrals, whereas the HMM depends on the number of classes, the number of nodes, and the size of the input vector. In the HMM, application to larger neighborhoods requires only a change in the size of the input vectors used for training and segmentation to represent the larger neighborhood, since the parameter updates (11)–(34) do not rely on the size of the input vector. No further change in the HMM algorithm is necessary. As a result, the HMM provides a robust foundation that is generically applicable to the segmentation of multidimensional datasets in arbitrarily large neighborhoods, i.e. applicable to MRI as well as MRSI data. Larger neighborhoods raise a computational concern in the case of HMRF segmentation. For example, the classification of a voxel based on a neighborhood larger than 3×3×3 involves the mapping from MRF to Gibbs distribution, which, in turn, entails computing the Gibbs distribution in a 3×3×3 neighborhood. Larger regions necessitate the analysis of higher-order Markov fields, which requires the re-definition of the neighborhood. To successfully relate larger neighborhoods, HMRF must be used with iterative segmentation. This is due to the dependency of segmenting each voxel on its neighbors and their prior segmentations, which are used to compute the potential. Thus, the segmentation becomes subject to iterative local maximization/minimization

algorithms like the Expectation Maximization (EM) and Iterated Conditional Modes (ICM) algorithms, which are typically used to avoid the analytically intractable nature of estimating the best solution for the HMRF. A common concern with such methods is their sensitivity to the initialization conditions and to the reaction of the system to input sequences during segmentation. HMMs, by contrast, are easily applicable to larger neighborhoods at the cost of the additional computation required by the algorithm (an increase in L in Eq. (37)), rather than an increased sensitivity to initial conditions and input sequences. The problem of applicability to larger neighborhoods is specifically important in the context of segmentation of biomedical imaging data from multiple modalities, where the voxel neighborhood must be extended across modalities or across time, e.g. in functional MRI, beyond the 3×3×3 neighborhood. Although such an increase may provide better segmentation accuracy, the improvement is bounded: it occurs only up to a certain neighborhood size, after which there is no significant change in segmentation accuracy due to the smoothing effect of utilizing a larger neighborhood. This further complicates the choice of an appropriate neighborhood size, since if the neighborhood becomes very large, the segmentation accuracy can be negatively affected. Hence, the contributions of the different neighbors to the segmentation process may be weighted according to their distance from the voxel under investigation. These weights can be inversely proportional to the distance between the neighboring pixels/voxels and the investigated pixel/voxel; in other words, the significance of the neighboring pixels/voxels in the segmentation strategy increases as the neighbors become closer to the pixel/voxel under investigation.

4. Preprocessing phase

Preprocessing steps, applied prior to segmentation, aim to reduce the effects of noise, address intensity inhomogeneities, and perform global intensity level correction. These are based on existing techniques and are only presented here for completeness; they are not discussed in detail.

4.1. Intensity inhomogeneities

Intensity inhomogeneities are defined as variations in voxel intensities through or across imaging data sets, which appear as either sudden or slow variations. Handling both types of intensity variation in a preprocessing phase prior to segmentation improves the segmentation accuracy by controlling the adverse effects caused by such inhomogeneities. A normalized histogram intersection between each two consecutive images in a data set is used for this purpose. The distributions of pixel intensities between each pair of consecutive images are expected to change slowly; if the mean and variance across slices nearly match, then the distribution will change slowly. Assuming that I_i is the intensity of pixel i in an image, the standard deviation of the image is given by

Fig. 3. Sudden intensity correction steps.

s = \sqrt{ (1/N) \sum_{i=1}^{N} (I_i - m)^2 }   (41)

where m is the mean intensity. If we assume a contrast a and a brightness b that change the standard deviation of the voxel intensity distribution to s', then:

s' = \sqrt{ (1/N) \sum_{i=1}^{N} (a I_i + b - a m - b)^2 } = a s   (42)

This shows that the standard deviation is affected only by the contrast. This case maps to the correction of each slice with respect to its preceding slice. The slices considered are those having non-empty preceding slices, which can be determined because, after skull peeling (cerebrum reconstruction, or skull stripping), all background voxels are exactly zero. A similar argument holds for the brightness, where

m' = a m + b   (43)

which means that, knowing a from Eq. (42), b can be estimated from Eq. (43) (Fig. 3).

4.2. Global intensity level correction

Global intensity correction is addressed after handling both sudden and slow intensity variations. Since the HMM-based segmentation utilizes the grayscale or color information of voxels, it is sensitive to global variations among data sets. In order to remedy this condition, global correction is employed to maximize the histogram intersection between the data sets, so that errors due to intensity differences are minimized. Normalized histograms are used because different data sets contain different numbers of pixels/voxels: the histogram, which represents the frequency of the intensities, is normalized against the total number of non-background pixels present in each data set. The brightness and contrast that maximize the integral of the histogram intersection, expressed as follows, are applied after an anisotropic filtering stage:

\hat{H}_{int} = \int_{v=0}^{V_{Max}} \hat{H}(I_A, v) \wedge \hat{H}(I_B, v) \, dv   (44)

The brightness and contrast values were estimated in the same way as in Eqs. (42) and (43); they were applied after the sudden intensity correction and before the filtering.

5. Training and segmentation steps

Based on the mathematical models of both the discrete and continuous HMM-based segmentation techniques, the general form of the HMM-based training and segmentation algorithms for a 3D neighborhood N involves representing each voxel by a vector or sequence of symbols v. The sequence represents the relevant parameters of the voxel and of the voxel's neighbors in N. The representative vector or symbol sequence of each voxel is presented to a set of HMM models, each corresponding to a separate class or tissue type, and the output probabilities are calculated using prior training knowledge stored in each HMM model. Labeling takes place by assigning to the voxel the label associated with the HMM showing the highest output probability. Training of both the continuous and the discrete models follows the same procedure (Fig. 4), and segmentation likewise follows the same procedure for both the discrete and continuous HMM-based techniques (Fig. 5). If labeling encounters segments whose characteristics are not consistent with any of the known tissue types, these are classed as unknown tissue. A clinical expert is then requested to assign a label to the unknown tissues. The segments' characteristics are then used to initialize the knowledge of the newly identified tissue and the corresponding HMM, and the acquired knowledge is then used to label new segments that belong to the newly identified tissue type.

Fig. 4. Training the HMM.

Fig. 5. Segmenting with the HMM.

6. Experimental results

Three types of preprocessing were applied: 3D anisotropic filtering as described in [13]; global intensity level correction, which we previously showed in [14] could be applied to MRI sequences; and the correction, also described in [14], of the sudden intensity variations that appear in many MR sequences. The same techniques were used here for preprocessing of the images prior to segmentation. The anisotropic filter used had k = 5 and was applied for 10 iterations. For the discrete HMM, the number of states was successively increased, and beyond 15 states no significant improvement was detected. The number of states used was 10, with a Gaussian mixture of 15 distributions. The maximum number of iterations was set to 30,000 and the sigmoid slope to 0.08. Fig. 6 shows the classification accuracy (1 − loss), averaged over every 1000 iterations, for the discrete model. It is clear that the training algorithm reaches the minimum of the error surface after around 10,000 iterations, which justifies the choice of 30,000 as an upper bound on the number of iterations during experimentation.

Fig. 6. Classification accuracy (1 − loss) evaluated across iterations, averaged over every 1000 iterations.

6.1. BrainWeb data results

The algorithm was tested using simulated digital phantoms from the BrainWeb MR simulator (http://www.bic.mni.mcgill.ca/brainweb/). The digital phantoms were T1-weighted (TR, TE, and flip angle of 18 ms, 10 ms, and 30°) and were obtained at slice thicknesses of 1–9 mm to investigate the influence of noise, field inhomogeneity, and contrast, with levels of noise varying from 1 to 9% and levels of spatial inhomogeneity, i.e. intensity variation within each tissue class, from 0 to 40%. The comparison was performed on the basis of the Dice similarity coefficient, which measures the overlap between two segmentations X and Y:

D(X, Y) = 2 |X \cap Y| / (|X| + |Y|)   (46)

Table 1
BrainWeb results, 1 mm slice (rows: noise level; columns: spatial inhomogeneity and tissue)

Noise     0% White   0% Gray   20% White   20% Gray   40% White   40% Gray
0%        0.831      0.872     0.831       0.872      0.708       0.756
1%        0.825      0.870     0.756       0.801      0.706       0.756
3%        0.815      0.869     0.772       0.828      0.713       0.773
5%        0.793      0.860     0.765       0.833      0.717       0.793
7%        0.739      0.832     0.739       0.825      0.702       0.797
9%        0.663      0.796     0.682       0.799      0.672       0.787
Average   0.778      0.85      0.758       0.826      0.703       0.777

where |r| represents the number of voxels in segment r. The Dice coefficient was computed for both the gray matter and the white matter segmentations. The results are shown in Tables 1–5. As can be seen from the tables, the Dice similarity coefficients show that the HMM segmentation remains accurate for white matter (WM) and gray matter (GM) even in the presence of increasing noise and spatial inhomogeneities. Increasing the slice thickness has the expected effect of reducing the accuracy of the algorithm, as evidenced by the similarity coefficient. This is expected, as the algorithm itself is geared to use in 3D image

Table 2
BrainWeb results, 3 mm slice (rows: noise level; columns: spatial inhomogeneity and tissue)

Noise     0% White   0% Gray   20% White   20% Gray   40% White   40% Gray
0%        0.707      0.790     0.671       0.727      0.637       0.707
1%        0.708      0.792     0.672       0.741      0.636       0.707
3%        0.703      0.802     0.682       0.765      0.641       0.723
5%        0.695      0.804     0.685       0.783      0.646       0.749
7%        0.668      0.796     0.664       0.782      0.642       0.758
9%        0.600      0.770     0.620       0.772      0.618       0.755
Average   0.68       0.792     0.666       0.762      0.637       0.733

Table 3
BrainWeb results, 5 mm slice (rows: noise level; columns: spatial inhomogeneity and tissue)

Noise     0% White   0% Gray   20% White   20% Gray   40% White   40% Gray
0%        0.629      0.707     0.607       0.662      0.582       0.637
1%        0.628      0.711     0.609       0.669      0.583       0.641
3%        0.634      0.733     0.624       0.704      0.591       0.664
5%        0.620      0.750     0.625       0.719      0.601       0.697
7%        0.618      0.755     0.611       0.745      0.603       0.714
9%        0.598      0.753     0.600       0.752      0.601       0.729
Average   0.621      0.735     0.613       0.709      0.594       0.68

Table 4
BrainWeb results, 7 mm slice (rows: noise level; columns: spatial inhomogeneity and tissue)

Noise     0% White   0% Gray   20% White   20% Gray   40% White   40% Gray
0%        0.568      0.630     0.559       0.589      0.524       0.567
1%        0.568      0.634     0.560       0.594      0.543       0.570
3%        0.577      0.662     0.564       0.613      0.551       0.595
5%        0.569      0.578     0.581       0.652      0.562       0.629
7%        0.563      0.708     0.577       0.671      0.556       0.641
9%        0.510      0.698     0.596       0.697      0.555       0.690
Average   0.559      0.652     0.573       0.636      0.549       0.615

Table 5
BrainWeb results, 9 mm slice (rows: noise level; columns: spatial inhomogeneity and tissue)

Noise     0% White   0% Gray   20% White   20% Gray   40% White   40% Gray
0%        0.526      0.573     0.523       0.539      0.512       0.518
1%        0.532      0.576     0.523       0.538      0.513       0.521
3%        0.534      0.598     0.532       0.570      0.519       0.538
5%        0.529      0.630     0.539       0.593      0.527       0.594
7%        0.508      0.659     0.535       0.634      0.533       0.596
9%        0.527      0.673     0.529       0.646      0.534       0.630
Average   0.526      0.618     0.53        0.587      0.523       0.566

data sets in which the neighboring images are separated by a distance similar to the in-plane pixel spacing (i.e. the slice thickness is close to the pixel distance). In the case of the simulated data set, the ground truth is established during the original data creation, whereby each tissue is clearly established and the segmentation is completely known; this data set therefore does not need an expert segmentation for comparison. Additionally, the expected results should be better than for real data. Note that

Fig. 7. Sample segmentation of simulated digital phantoms.

this is used as an initial test of the capability of the algorithm to carry out segmentation; further testing with real data, including comparisons with existing methods and expert segmentations, is presented as well. Sample images for 1 mm slices are shown in Fig. 7. The leftmost images are the original phantom data, the center images are the data used to generate the phantom (i.e. the ground truth data), and the rightmost images are the segmented result. As can be seen, the subjective quality of the segmentation reflects the objective overlap results. One noticeable exception is that the segmentation algorithm is currently configured to consider only gray and white matter, and ignores all other tissue types. Further work is continuing on expanding the algorithm for use with additional tissue types.

6.2. IBSR data results

After segmenting each case, the accuracy of the HMM-based segmentation relative to the manual segmentations, as well as to the results of existing techniques including maximum likelihood, was determined using the Tanimoto
Table 6
IBSR reported results

Method                                       White   Gray
Adaptive MAP                                 0.567   0.564
Biased MAP                                   0.562   0.558
Fuzzy c-means                                0.567   0.473
Maximum A posteriori Probability (MAP)       0.554   0.550
Maximum-Likelihood                           0.551   0.535
Tree-structure k-means                       0.571   0.477
Manual (4 brains averaged over 2 experts)    0.832   0.876

coefficient, which was previously used in existing techniques, and is given by

T(X, Y) = |X \cap Y| / |X \cup Y|   (47)

where |r| represents the number of voxels in segment r. By the definitions of the Dice and Tanimoto coefficients, T(X, Y) \leq D(X, Y), so the Tanimoto coefficient is the more conservative of the two, with equality subject to the condition that
Table 7
Overlapping results obtained from applying both HMM algorithms on the IBSR data after training

Case      Discrete White   Discrete Gray   Continuous White   Continuous Gray
100_23    0.517            0.694           0.774              0.879
11_3      0.537            0.718           0.778              0.878
110_3     0.589            0.747           0.746              0.869
111_2     0.614            0.737           0.748              0.857
112_2     0.610            0.761           0.761              0.874
12_3      0.574            0.748           0.784              0.881
191_3     0.504            0.708           0.762              0.870
13_3      0.617            0.746           0.761              0.868
202_3     0.566            0.743           0.756              0.864
205_3     0.499            0.623           0.723              0.782
7_8       0.587            0.745           0.758              0.869
8_4       0.595            0.723           0.742              0.853
17_3      0.613            0.716           0.735              0.854
4_8       0.590            0.690           0.669              0.813
15_3      0.592            0.697           0.669              0.817
5_8       0.578            0.635           0.731              0.854
16_3      0.604            0.719           0.702              0.842
2_4       0.596            0.684           0.635              0.797
6_10      0.528            0.582           0.752              0.855
Average   0.546            0.671           0.699              0.809
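For reference, the two overlap measures reported in the tables can be computed as follows; a minimal sketch (NumPy, with random binary masks standing in for real segmentations), which also illustrates that the Tanimoto coefficient never exceeds the Dice coefficient:

```python
import numpy as np

def dice(x, y):
    """Dice overlap of Eq. (46): 2|X n Y| / (|X| + |Y|)."""
    inter = np.logical_and(x, y).sum()
    return 2.0 * inter / (x.sum() + y.sum())

def tanimoto(x, y):
    """Tanimoto overlap of Eq. (47): |X n Y| / |X u Y|."""
    return np.logical_and(x, y).sum() / np.logical_or(x, y).sum()

rng = np.random.default_rng(0)
seg = rng.random((8, 8, 8)) > 0.5    # hypothetical binary segmentation mask
truth = rng.random((8, 8, 8)) > 0.5  # hypothetical ground-truth mask

t, d = tanimoto(seg, truth), dice(seg, truth)
assert t <= d                                        # Tanimoto is the more conservative measure
assert dice(seg, seg) == tanimoto(seg, seg) == 1.0   # equality when X = Y
```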

Table 8
Overlapping results obtained without carrying out sudden intensity correction

Case      White      Gray
100_23    0.792087   0.867957
11_3      0.795936   0.863375
110_3     0.756762   0.850009
111_2     0.777369   0.844556
112_2     0.775694   0.854445
12_3      0.814668   0.873579
191_3     0.798191   0.864343
13_3      0.799298   0.869574
202_3     0.793114   0.857034
205_3     0.760803   0.761042
7_8       0.743964   0.836973
8_4       0.734167   0.817848
17_3      0.710917   0.814474
4_8       0.631508   0.774435
15_3      0.700746   0.790821
5_8       0.238758   0.690957
16_3      0.72307    0.815892
2_4       0.632165   0.77051
6_10      0.386919   0.668827
Average   0.668307   0.774333

X = Y (or that the overlap is empty). Either coefficient can be utilized in the evaluation; the choice here is dictated by the need for a consistent comparison of the HMM-based segmentation results with the published results. The Tanimoto coefficient was computed for both gray matter and white matter segmentation based on an analysis of variance in which the coefficient is the dependent variable while the training data set and the tissue type are the independent factors. The results are given in Table 7. Table 6 shows the average results reported on the IBSR website using the same data used in this study. The BMAP algorithm described in [11] is based on HMRF computation. Although the results of the discrete model are close to most of the reported ones, the results of the

continuous model are superior even when compared to the HMRF. The IBSR data consists of various image sequences representing differing real-world data sets. The HMM was trained with one data set and then used to segment the remaining data sets. In both Tables 7 and 8, the first column gives the image sequence numbers, and the last row gives the averages for the HMM for comparison with existing results. For a fair comparison, the preprocessing phase for intensity variations was removed, and the results were compared with those of the Adaptive MAP algorithm [15], which accounts for the intensity variations of segments through an ML stage used for initialization. The AMAP is based on Hidden Markov Random Fields, so after removing this preprocessing phase the comparison becomes close to a direct comparison of the two algorithms, except for the difference in filtering. We compare with the AMAP and not the BMAP because the latter models the bias field, which was not considered in our analysis. The results in Table 8 demonstrate that the HMM was able to segment the brain with higher accuracy. This supports the argument presented at the beginning of the paper: during classification, HMMs encode relations not only between neighboring voxels but also between voxels at non-neighboring sites, which is not the case for the HMRF. Segmented slices from case 5_8, which was described earlier in the discussion of sudden intensity correction, are shown in Fig. 8. The comparison reveals the expected behavior: without sudden intensity correction, one slice is erroneously segmented entirely as gray matter, and the bright slice as white matter. The results in the tables provide an objective assessment of the quality of the algorithms; accuracy in practical cases may be higher, since each of the IBSR data sets contains at least one form of difficulty.
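The sudden intensity correction whose effect is evaluated here can be sketched as a slice-wise affine match of mean and standard deviation to the preceding slice, following Eqs. (42) and (43). This is a minimal illustrative sketch, not the authors' implementation; the histogram-intersection test that decides when the correction is needed is omitted:

```python
import numpy as np

def correct_sudden_intensity(volume):
    """Match each slice's non-background mean/std to its predecessor:
    sigma' = a * sigma  =>  a = sigma_prev / sigma_cur,  b = m_prev - a * m_cur.
    Background voxels are exact zeros after skull stripping and are skipped."""
    vol = volume.astype(float).copy()
    for k in range(1, vol.shape[0]):
        prev_fg = vol[k - 1][vol[k - 1] != 0]
        cur_fg = vol[k][vol[k] != 0]
        if prev_fg.size == 0 or cur_fg.size == 0:
            continue  # slices with empty predecessors are left unchanged
        a = prev_fg.std() / cur_fg.std()        # contrast, from Eq. (42)
        b = prev_fg.mean() - a * cur_fg.mean()  # brightness, from Eq. (43)
        vol[k][vol[k] != 0] = a * cur_fg + b
    return vol

# a toy two-slice volume whose second slice has a sudden brightness/contrast jump
rng = np.random.default_rng(1)
vol = 100.0 + 10.0 * rng.standard_normal((2, 16, 16))
vol[1] = 2.0 * vol[1] + 50.0   # simulated sudden intensity variation
fixed = correct_sudden_intensity(vol)
```

After correction, the second slice's mean and standard deviation match those of the first slice exactly, which is the condition the slow-variation assumption of Section 4.1 relies on.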

Fig. 8. Comparison between segmented slices from image sequence 5_8 with and without carrying out sudden intensity correction.

7. Conclusion

In this paper, a 3D MRI segmentation algorithm based on HMMs is presented. The algorithm demonstrates the ability of HMMs to handle multi-dimensional classification, whereas HMMs were previously considered candidates for 1D classification only. The HMM model, together with carefully constructed preprocessing steps, showed significant improvement in the quality of 3D MRI segmentation when objectively compared to other results obtained using the same data. Both simulated and real data were used in the evaluation of the algorithm, with promising results. The objective measure on the simulated phantoms (created from the images used to establish ground truth) showed that the algorithm, although currently restricted to gray and white matter only, accurately identifies these tissues within limits of error. Further work is progressing on increasing the number of identified tissues. The results for the real data (using expert manual segmentations as ground truth) showed that the overlap measures are better than those of previously established methods and are within the limits of error; this is significant given that even the expert manual segmentations show variation from one operator to another. Comparisons between HMMs and HMRFs concerning the complexity of the computations involved and the ability to segment based on decisions made over larger regions were presented. The comparative results indicate that the current mathematical model of the MRF using the Gibbs distribution can be extended to neighborhoods larger than 3×3×3 only if the Markovianity assumption is extended to higher orders, a restriction that is not required for the current HMM modeling scheme. Using HMRFs for larger neighborhoods thus remains a challenge requiring more research into innovative modeling schemes.
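The decision rule discussed throughout, one trained HMM per tissue class with the voxel labeled by the model of highest score over the neighborhood sequence, can be sketched as follows. This is an illustrative sketch with random stand-in parameters rather than trained values; the inner loop also makes the per-step n² cost behind Eq. (37) visible:

```python
import numpy as np

def viterbi_log_score(obs, log_pi, log_A, log_B):
    """Best-path log-likelihood of a discrete observation sequence.
    Each step is O(n^2) for n states, so a length-L sequence costs O(n^2 L)."""
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return float(delta.max())

def classify_voxel(obs, models):
    """Label the voxel with the class whose HMM gives the highest score."""
    return max(models, key=lambda c: viterbi_log_score(obs, *models[c]))

rng = np.random.default_rng(2)
n_states, n_symbols = 4, 16

def random_model():
    log_pi = np.log(rng.dirichlet(np.ones(n_states)))
    log_A = np.log(rng.dirichlet(np.ones(n_states), size=n_states))
    log_B = np.log(rng.dirichlet(np.ones(n_symbols), size=n_states))
    return log_pi, log_A, log_B

models = {"WM": random_model(), "GM": random_model()}  # stand-in tissue models
obs = rng.integers(0, n_symbols, size=27)  # quantized 3x3x3 neighborhood sequence
label = classify_voxel(obs, models)
assert label in models
```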
In the HMM, application to larger neighborhoods requires only a change in the input vectors used for training and segmentation to represent the larger neighborhood, since the parameter updates do not rely on the size of the input vector; no further change in the HMM algorithm is necessary. For that purpose, the proposed HMM provides a robust foundation, insensitive to initial conditions, for the segmentation of MR imaging data. The problem that affects the application of the MRF in segmentation is the classification of a voxel based on regions larger than a 3×3×3 neighborhood. The mapping from the HMRF to the Gibbs distribution restricts the neighborhood used to compute the Gibbs distribution to 3×3×3. Larger regions necessitate the analysis of higher-order Markov fields, which in turn requires the re-definition of the neighborhood. To successfully relate larger neighborhoods, HMRF must be used with iterative segmentation, owing to the dependency of each voxel's segmentation on its neighbors and their prior segmentations, which are used to compute the potential. This in turn becomes subject to iterative local maximization/minimization algorithms like Expectation Maximization (EM) and Iterated Conditional Modes (ICM). A common and crucial problem with such methods is

their sensitivity to the initialization conditions and the reaction of the system to input patterns during segmentation. The cost of increasing the neighborhood size in the context of the proposed segmentation strategy is the extra computation required by the algorithm (an increase in L in Eq. (37)). The problem of applicability to larger neighborhoods is specifically important in the context of segmentation of biomedical imaging data from multiple modalities, where the pixel/voxel neighborhood must be extended across modalities or across time, e.g. in functional MRI, beyond the 3×3×3 neighborhood. Although this increase may provide better segmentation accuracy, the improvement will only occur up to a certain point, after which there is no significant change in the segmentation accuracy; this issue is under investigation. It can further complicate the choice of the appropriate neighborhood size, since if the neighborhood becomes very large, the segmentation accuracy can be negatively affected. Hence, the contributions of the different neighbors to the segmentation strategy can be weighted according to their distance from the voxel under investigation; these weights can be inversely proportional to the distance between the neighboring pixels/voxels and the investigated pixel/voxel. In other words, the significance of the neighboring pixels/voxels in the segmentation strategy increases as the neighbors become closer to the pixel/voxel under investigation. Further work is continuing on the effect of increased neighborhood sizes. Other considerations that would enhance the accuracy of segmentation include the use of multi-spectral images: not only T1 but also T2 and PD. The vectors used in classification would then be extracted from each voxel and its neighbors in the three images, forming a 27×3 input. In this case, a 3D Gaussian mixture model can be used, where the input to each state is a vector of the three intensities.
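The 27×3 multi-spectral input described above can be assembled per voxel as follows; a minimal sketch with synthetic volumes standing in for co-registered T1, T2, and PD channels (all names illustrative):

```python
import numpy as np

def neighborhood_vectors(channels, i, j, k):
    """Stack the 3x3x3 neighborhood of voxel (i, j, k) from each channel,
    giving the 27 x 3 input described in the text (one column per modality)."""
    cols = [vol[i - 1:i + 2, j - 1:j + 2, k - 1:k + 2].reshape(27)
            for vol in channels]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(3)
t1, t2, pd = (rng.random((8, 8, 8)) for _ in range(3))  # synthetic channels
vec = neighborhood_vectors((t1, t2, pd), 4, 4, 4)
assert vec.shape == (27, 3)
assert vec[13, 0] == t1[4, 4, 4]   # row 13 is the centre voxel itself
```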
Similar work using multi-sensor data and Hidden Markov Chains has been reported [33], concluding that HMCs are well suited to such problems. These are promising results that have yet to be applied to MR imaging data; further work on applying HMMs to multi-spectral MR imaging data is currently in progress.

References
[1] W. Grimson, G. Ettinger, T. Kapur, M. Leventon, W. Wells, R. Kikinis, Utilizing segmented MRI data in image-guided surgery, International Journal of Pattern Recognition and Artificial Intelligence 11 (8) (1998) 1367–1397.
[2] S. Warfield, J. Dengler, J. Zaers, C.R.G. Guttmann, W.M. Wells, G.J. Ettinger, J. Hiller, R. Kikinis, Automatic identification of grey matter structures from MRI to improve the segmentation of white matter lesions, Journal of Image Guided Surgery 1 (6) (1996) 326–338.
[3] E. Grimson, M. Leventon, G. Ettinger, A. Chabrerie, S. Nakajima, F. Ozlen, H. Atsumi, R. Kikinis, P. Black, Clinical experience with a high precision image-guided neurosurgery system, MICCAI, Springer, Berlin, 1998, pp. 63–73.
[4] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[5] L.R. Bahl, P.F. Brown, P.V. de Souza, R.L. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, Proceedings of the IEEE ICASSP (1988) 49–52.
[6] B.-H. Juang, W. Chou, C.-H. Lee, Minimum classification error rate methods for speech recognition, IEEE Transactions on Speech and Audio Processing (1997) 257–265.
[7] S. Katagiri, B.-H. Juang, C.-H. Lee, Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method, Proceedings of the IEEE 86 (11) (1998) 2345–2373.
[9] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Transactions on Medical Imaging 20 (1) (2001) 45–57.
[10] J.C. Rajapakse, J. Piyaratna, Bayesian approach to segmentation of statistical parametric maps, IEEE Transactions on Biomedical Engineering 48 (10) (2001) 1186–1194.
[11] J.C. Rajapakse, F. Kruggel, Segmentation of MR images with intensity inhomogeneities, Image and Vision Computing 16 (1998) 165–180.
[12] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to segmentation of single-channel cerebral MR images, IEEE Transactions on Medical Imaging 16 (2) (1997) 176–186.
[13] N.M. John, A three dimensional statistical model for image segmentation and its application to MR brain images, PhD thesis, University of Miami, 1999.
[14] N.M. John, M. Kabuka, M.O. Ibrahim, Multivariate statistical model for 3D image segmentation with application to medical images, Journal of Digital Imaging 16 (4) (2004) 365–377.
[15] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to segmentation of single-channel cerebral MR images, IEEE Transactions on Medical Imaging 16 (2) (1997) 176–186.
[16] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated model-based tissue classification of MR images of the brain, IEEE Transactions on Medical Imaging 18 (10) (1999) 897–908.
[17] S.Z.
[20] X. Descombes, F. Kruggel, D.Y. von Cramon, Spatio-temporal fMRI analysis using Markov random fields, IEEE Transactions on Medical Imaging 17 (6) (1998) 1028–1039.
[21] S. Ruan, C. Jaggi, J. Xue, J. Fadili, D. Bloyet, Brain tissue classification of magnetic resonance images using partial volume modeling, IEEE Transactions on Medical Imaging 19 (12) (2000) 1179–1187.
[22] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A unifying framework for partial volume segmentation of brain MR images, IEEE Transactions on Medical Imaging 22 (1) (2003) 105–113.
[23] W.M. Wells, E.L. Grimson, R. Kikinis, F.A. Jolesz, Adaptive segmentation of MRI data, IEEE Transactions on Medical Imaging 15 (8) (1996) 429–442.
[24] R. Guillemaud, J.M. Brady, Estimating the bias field of MR images, IEEE Transactions on Medical Imaging 16 (6) (1997) 238–251.
[25] J.L. Marroquin, B.C. Vemuri, S. Botello, F. Calderon, A. Fernandez-Bouzas, An accurate and efficient Bayesian method for automatic segmentation of brain MRI, IEEE Transactions on Medical Imaging 21 (8) (2002) 934–945.
[26] B. Moretti, L.M. Fadili, S. Ruan, N. Bloyet, B. Mazoyer, Phantom-based performance evaluation: application to brain segmentation from magnetic resonance images, Medical Image Analysis 4 (4) (2000) 303–316.
[27] A. Zavaljevski, A.P. Dhawan, M. Gaskil, W. Ball, J.D. Johnson, Multi-level adaptive segmentation of multi-parameter MR brain images, Computerized Medical Imaging and Graphics 24 (2) (2000) 87–98.
[28] Y. Wang, T. Adali, J. Xuan, Z. Szabo, Magnetic resonance image analysis by information theoretic criteria and stochastic site models, IEEE Transactions on Information Technology in Biomedicine 5 (2) (2001) 150–158.
[29] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, A statistical framework for partial volume segmentation, Lecture Notes in Computer Science 2208 (2001) 204–212.
[31] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, F. Tupin, Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields, IEEE Transactions on Geoscience and Remote Sensing 41 (3) (2003) 675–685.
[32] S. Derrode, W. Pieczynski, Signal and image segmentation using pairwise Markov chains, IEEE Transactions on Signal Processing 52 (9) (2004) 2477–2489.
[33] N. Giordana, W. Pieczynski, Estimation of generalized multisensor hidden Markov chains and unsupervised image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (5) (1997) 465–475.

[5] L. Bahl, P.F. de Souza Brown, P.V., K.L. Mercer, Maximum mutual information estimation of hidden markov parameters for speech recognition, Proceedings of the IEEE (1988) 4952. [6] Biing-Hwang Juang, W. Chou, Chin-Hui Lee, Minimum classication error rate methods for speech recognition, IEEE Transactions on Speech and Audio Processing (1997) 257265. [7] S. Katagiri, Biing-Hwang Juang, Chin-Hui Lee, Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method, Proceedings of the IEEE 86 (11) (1998) 23452373. [9] Y. Zhang, M. Brady, S. Smith, Segmentation of brain MR images through a hidden markov random eld model and the expectation maximization algorithm, IEEE Transactions on Medical Imaging 20 (1) (2001) 4557. [10] J.C. Rajapakse, J. Piyaratna, Bayesian approach to segmentation of statistical parametric maps, IEEE Transactions on Biomedical Engineering 48 (10) (2001) 11861194. [11] J.C. Rajapakse, F. Kruggel, Segmentation of MR images with intensity inhomogeneities, Image and Vision Computing 16 (1998) 165180. [12] Jagath. C. Rajapakse, Jay. N. Giedd, Judith. L. Rapoport, Statistical approach to segmentation of single-channel cerebral MR images, IEEE Transactions on Medical Imaging 16 (2) (1997) 176186. [13] N.M. John, A three dimensional statistical model for image segmentation and its application to mr brain images, PhD thesis, University of Miami, 1999. [14] N.M. John, M. Kabuka, M.O. Ibrahim, Multivariate statistical model for 3D image segmentation with application to medical images, Journal of Digital Imaging 16 (4) (2004) 365377. [15] J.C. Rajapakse, J.N. Giedd, J.L. Rapoport, Statistical approach to segmentation of single-channel cerebral MR images, IEEE Transactions on Medical Imaging 16 (2) (1997) 176186. [16] K. Van Leemput, F. Maes, D. Vandermeulen, P. Suetens, Automated model-based tissue classication of MR images of the brain, IEEE Transactions on Medical Imaging 18 (10) (1999) 8979008. [17] S.Z. 
Li, Markov random eld models in computer vision, in: Proceedings of the European Conference on Computer Vision, Stockholm, Sweden, 1994, pp. 361370. [18] J. Besag, On the statistical analysis of dirty pictures, Journal of the Royal Statistical Society, Series B 48 (3) (1986) 259302. [19] K. Held, E.R. Kops, B.J. Krause, W.M. Wells III, R. Kikinis, H. rtner, Markov random eld segmentation of brain MR ller-Ga W. Mu images, IEEE Transactions on Medical Imaging 16 (6) (1997) 878886.
