Professional Documents
Culture Documents
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/311736612
CITATIONS READS
3 27
2 authors, including:
Wentao Zhu
16 PUBLICATIONS 104 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Wentao Zhu on 27 December 2016.
The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document
and are linked to publications on ResearchGate, letting you access and read them immediately.
Adversarial Deep Structural Networks for
Mammographic Mass Segmentation
1
To this end, we propose an end-to-end adversarial deep
structural network to perform mass segmentation (Fig. 1).
The proposed network is constructed by taking advantage of
many insights from recent successful networks [19, 32, 11],
and is designed to robustly learn from the poor contrastive CRF structural
I Fully Convolutional
and scarce mammographic images. Specifically, an end- Radv learning
Network
to-end fully convolutional network (FCN) with CRF struc-
tural learning is applied to the mass segmentation. Adver-
sarial training is introduced into the network to learn ro- Figure 1: The proposed adversarial deep FCN-CRF net-
bustly from the scarce mammographic images. To fully ex- work with four convolutional layers followed by CRF struc-
plore the features of mass regions, categorical distributions tural learning.
considering the positions are added for predictions. With
these components, we validate our adversarial deep struc-
tural network on two public mammographic mass segmen- such as low signal-to-noise ratio, the poor contrast nature of
tation datasets. The proposed network is shown to consis- mass and normal breast tissues. There are few work using
tently outperform other algorithms for mass segmentation. deep networks to process the mammogram [7, 34]. Dhun-
Our main contributions in this paper are: (1) Integrat- gel et al. employed multiple deep belief networks (DBNs),
ing CNN+CRF and adversarial training into a unified end- GMM classifier and priori knowledge as the potential func-
to-end training framework has not been attempted before. tions, and structural SVM (SSVM) to perform the spatially
Both components are essential and necessary for achieving structural learning [8]. They also used CRF with tree re-
the state-of-the-art performance. (2) We employ an end- weighted belief propagation to boost the segmentation re-
to-end trained network to do the mass segmentation while sults [9]. A recent work used the output from a CNN as
previous works needed a lot of hand-designed features or an additional potential function, yielding the state-of-the-
procedures, such as calculating potential functions inde- art performance on mass segmentation [7]. However, the
pendently. (3) Our model achieves state-of-the-art results two-stage training used in these methods produces potential
on two commonly used mammographic mss segmentation functions that can easily over-fit the training data, and can-
datasets, the Inbreast dataset and the DDSM-BCRP dataset. not handle spatial information that might be important for
The rest of this paper is organized as follows. Sec. 2 in- segmentation.
troduces the related work of mammographic mass and nat-
ural image segmentation. Sec. 3 describes the framework 2.2. Adversarial Training and Deep Structural net-
of adversarial deep FCN-CRF network for mass segmenta- works
tion. Sec. 4 shows experimental comparisons among these
Adversarial examples are generated by making small
methods. The paper concludes in Sec. 5.
perturbations in the input space that can significantly in-
crease the loss function corresponding to these samples
2. Related Work [29]. The adversarial examples can be used to force the
2.1. Medical Image Segmentation with Neural Net- model to learn more robust features, which is a novel regu-
works larization technique for deep networks and can yield better
performance than the base deep networks [11, 21, 20].
In contrast to the natural image segmentation, only a few Deep structural learning is an emerging topic for natu-
works have leveraged neural networks for medical image ral image processing. The recurrent convolutional network
segmentation. Ngo and Carneiro used deep belief networks learns unified features of pixels and spatial structures for
as the initialization of distance regularized level set method scene labeling [25]. The fully convolutional network re-
to segment ventricle in magnetic resonance (MR) images places the fully connected layers in CNN with convolu-
[1]. The deep contour-aware network employed multi-task tional layers to preserve the spatial structure information
learning to learn the segmentation and contour of glands hi- [19]. Chen et al. refines the FCN results using the fully
erarchically [5]. Liskowski and Krawiec trained deep con- connected CRF [15] to do spatially structural learning [6].
volutional network on a large dataset to do retinal segmen- The CRF as Recurrent Neural Networks (RNN) further un-
tation, and obtained impressive results [18]. The DeepCut rolls the fully connected CRF, interprets it as a RNN, and
network iteratively updated weak annotations and learned trains the model with FCN followed by CRF as an end-to-
the convolutional neural network followed by CRF to do end network [32]. Lin et al. further explores CNN as the
brain and lung segmentation in MR images [26]. pairwise potential function in CRF capturing semantic cor-
Compared to other medical images, mammographic relations of neighboring patches [17, 16].
mass segmentation has its specific issues and challenges, The successful models for natural images easily over-fit
where In denotes the nth image, yn,i is the label of ith pixel
u
in the nth image, N is the number of training mammo-
Qt-1
Qt graphic images, Ni is the number of pixels in the image,
Message Re- Compatibility Local Normali
Passing weighting transform update zation and is the parameter of FCN. In our case, the size of im-
kG
ages is fixed 40 40. That is, Ni is 1,600.
CRF is a commonly used method in the structural learn-
Figure 2: The recurrent unit in the CRF as RNN, which con- ing well suited for pixel-wise labeling of image segmenta-
sists of massage passing, re-weighting, compatibility trans- tion. In image segmentation, CRF models pixel labels as
form, local update and normalization. random variables in a markov random field (MRF) condi-
tioned on a global observation. To make the annotation con-
sistent, we still use y = (y1 , y2 , . . . , yi , . . . , y1600 )T to de-
note the random variables of pixel labels in an image, where
the training set, which leads to poor performance for mam- yi {0, 1}. The 0 denotes the pixel belonging to back-
mographic image processing tasks of scarce and low signal- ground, and 1 denotes it belonging to mass region. The
to-noise ratio data. The adversarial training provides ef- Gibbs energy of fully connected pairwise CRF [15] is
fective regularization for deep structure networks when the X X
data is scarce. The end-to-end training scheme eliminates E(y) = u (yi ) + p (yi , yj ), (2)
over-fitting of FCN compared with the two-stage training i i<j
[7]. The priori knowledge varying with pixel positions can
be used to enhance the FCN in mass segmentation further. where unary potential function u (yi ) is the loss of FCN in
our case, pairwise potential function p (yi , yj ) defines the
cost of labeling pair (yi , yj ).
3. Adversarial FCN-CRF network The pairwise potential function p (yi , yj ) can be defined
Leveraging the insights from recent successful deep con- as
(m)
X
volutional networks for natural image segmentation, we de- p (yi , yj ) = (yi , yj ) w(m) kG (fi , fj ), (3)
sign an end-to-end trained fully convolutional network-CRF m
(FCN-CRF) network with adversarial training. Fig. 1 shows where label compatibility function is given by the Potts
the architecture of the proposed network which has four (m)
model in our case, kG is the Gaussian kernel applied to
convolutional layers, and one CRF as RNN layer. For ad- feature vectors, w(m) is the learned weight. Pixel values Ii
versarial training, the input to the network is the adversar- and positions pi can be used as the feature vector fi .
ial examples. To fully consider the characters of mammo- Efficient inference algorithm
Q can be obtained by mean
graphic image segmentation, a position priori is integrated field approximation Q(y) = i Qi (yi ) [15]. The update
into FCN. rule is
In the following section, we first review the FCN-CRF
(m)
X (m)
network [32] briefly to make the paper self-contained. We Qi (l) kG (fi , fj )Qj (l) for all m,
then introduce how to explore the priori knowledge of the i6=j
mass segmentation and add it into the FCN. Lastly, we de- (m)
X
Qi (l) w(m) Qi (l),
scribe the adversarial training of the network which enables m
effective learning for the scarce mammographic images. X
Qi (l) (l, l0 )Qi (l), (4)
l0 L
3.1. Overview of FCN-CRF Network
Qi (l) exp(u (yi = l)) Qi (l),
Fully convolutional network is a successful model ap-
1
plying convolution neural network (CNN) to the image seg- Qi exp Qi (l) ,
mentation task [19]. The advantage of FCN over the model Zi
with CNN plus fully connected networks is that FCN can where the first line is the message passing from label of
preserve the spatial structure of predictions. FCN consists pixel i to label of pixel j, the second line is re-weighting
of convolution operator, de-convolution operator [31], or with the learned weights w(m) , the third line is compatibil-
max-pooling in each layer. ity transform, the fourth line is adding unary potentials, and
For training, the FCN optimizes the maximum likelihood the last step is normalization operator. Here L = {0, 1}
loss function denotes background or mass. The initialization of inference
N Ni
use unary potential function as Qi (yi ) = Z1i exp(u (yi )).
1 XX The above mean field approximation can be interpreted
LF CN = log p(In |yn,i , ), (1)
N n=1 i=1 as a recurrent neural network (RNN) in Fig. 2 [32]. The
In general, the calculation of exact R is intractable because
the exact minimization is not solvable w.r.t. R, especially
for complicated models such as deep networks. The linear
approximation and L2 norm box constrain can be used for
Figure 3: The empirical estimation of priori on INbreast the calculation of perturbation [11] as
(left) and DDSM-BCRP (right) training datasets.
g
Radv = , (8)
kgk2
recurrent unit takes last states approximation Qt1 , unary where g = I log p(y|I, ). For adversarial fully convolu-
potential function u , and the Gaussian kernel kG as the tional network, the network predicts label of each pixel in-
input. The output is updated approximation Qt . After in-
Q
dependently as p(y|I, ) = i p(yi |I, ). For adversarial
terpreting the CRF as a RNN, the framework using FCN as CRF as RNN, the prediction of network relies
Q on mean field
potential function followed by CRF can be trained with an approximation inference as p(y|I, ) = i Q(yi |I, ).
end-to-end scheme [32]. The adversarial training forces the model to fit examples
3.2. FCN Enhancement with Position Priori with the worst perturbation as well. The adversarial loss is
defined as
The categorical distribution varies greatly with the pixel
N
positions in our mammographic mass segmentation. From 1 X
Ladv () = log p(yn |In + Radv,n , ). (9)
observation, most of the mass located in the center of ROI, N n=1
and the boundary of ROI is more likely background, as
shown in Fig. 3. The phenomenon is also validated by the In training, the total loss is defined as the sum of adver-
result that priori alone gives much better segmentation than sarial loss and the empirical loss based on training samples
multiple DBNs and GMM classifier [7]. as
The conventional FCN provides predictions for pixels in- N
dependently. It only considers global class distribution dif- 1 X
L() = Ladv () log p(yn |In , )+ kk2 , (10)
ference corresponding to the number of filters (channels) in N n=1 2
the last layer. Here we take the categorical priori of dif-
ferent positions into consideration and add it into the FCN where is the regularization factor used to avoid over-
prediction as fitting, p(yn |In , ) is either prediction in the enhanced FCN
or posteriori approximated by mean field inference in the
p(yi |I, ) p(yi )p(I|yi , ), (5) CRF as RNN for the nth image In . The L2 regularization
where p(yi ) is the categorical priori distribution varied with term is used only for the parameters in CRF.
the pixel position i, and p(I|yi , ) is the prediction of con- 3.4. Mass Segmentation using the Learned Model
ventional FCN. The log p(yi |I, ) is used as the unary po-
tential function for u (yi ) in the CRF as RNN. For multi- With the learned adversarial FCN-CRF model, the infer-
ple FCNs as potential functions, the potential function is ence p(yn |In , ) is simply the segmentation result of FCN-
defined as X CRF network because the adversarial training is a regular-
u (yi ) = w(u0) u0 (yi ), (6) ization of the networks. The inference of FCN-CRF can be
u0 calculated directly by running several steps in the CRF as
where w(u0) is the learned weight for unary potential func- RNN network as Eq. 4. The unary potential functions of
tion, u0 (yi ) is the potential function provided by one FCN. CRF as RNN can be obtained as p(yi |I, ) in Eq. 5. The
Experimental result shows that adding position priori ob- pairwise potential function of CRF as RNN can be calcu-
tains 0.25% Dice index improvement than the result of FCN lated using Eq. 3.
without position priori on the INbreast dataset [22].
4. Experiments
3.3. Adversarial FCN-CRF Network
We validate the proposed model on two publicly and
Adversarial training provides strong regularization for most frequently used mammographic mass segmentation
deep networks [11]. The idea of adversarial training is that datasets: INbreast dataset [22] and DDSM-BCRP dataset
if the model is robust enough, it should be invariant to exam- [12]. We use the same ROI extraction and re-size princi-
ple with the worst perturbation (adversarial examples [29]). ple as [8, 7, 9]. Due to the low contrast of mammographic
The perturbation R for adversarial example can be obtained images, image enhancement technique is used on the ex-
as tracted ROI images as the first 9 steps of enhancement [2],
min log p(y|I + R, ). (7) followed by pixel position dependent normalization is used
R,kRk
after the enhancement. The normalization makes the train- Method Dice(%)
ing converge quickly. We further augment each training set Cardoso et al. [4] 88
by flipping horizontally, flipping vertically, flipping hori- Deep Structure Learning [8] 88
zontally and vertically, which made the training set 4 times TRW Deep Structure Learning [9] 89
larger than the original training set. Deep Structure Learning + CNN [7] 90
For consistent comparison, the Dice index metric is FCN 89.48
used for the segmentation results and is defined Dice = FCN with Adversarial Training 89.71
2T P
2T P +F P +F N . For fair comparison, we also validate the Jointly Trained FCN-CRF 89.78
Deep Structure Learning + CNN [7] on our processed data, FCN-CRF with Adversarial Training 90.07
and obtain similar result (Dice index 0.9013) on the IN- Multi-FCN 90.47
breast dataset. Multi-FCN with Adversarial Training 90.71
To investigate the impact of each component in our Jointly Trained Multi-FCN-CRF 90.76
model, we conduct experiments under different configura- Multi-FCN-CRF with Adversarial Training 90.97
tions defined as follows:
FCN is the network integrating position priori into Table 1: Comparisons on INbreast dataset.
FCN. We use the enhanced FCN rather than the con-
ventional FCN in all experiments.
The first layer contains 16 filters of size 3 3 1, max
Adversarial FCN is the FCN with adversarial training. pooling with strides 2 2. The second layer contains
Jointly Trained FCN-CRF is the FCN followed by 13 filters of size 3 3 16, max pooling with strides
CRF as RNN with an end-to-end training scheme. 2 2. The third layer has 415 filters of size 8 8 13.
The fourth layer has 2 filters of size size 40 40 415
Jointly Trained Adversarial FCN-CRF is the Jointly for deconvolutional operator.
Trained FCN-CRF with end-to-end adversarial train-
ing. The first layer contains 37 filters of size 2 2 1, max
pooling with strides 2 2. The second layer contains
Multi-FCN is four FCNs with different configurations 12 filters of size 2 2 37, max pooling with strides
followed by a linear weighting layer to fuse the four 2 2. The third layer has 355 filters of size 9 9 12.
predictions with end-to-end training. The fourth layer has 2 filters of size size 40 40 355
for deconvolutional operator.
Adversarial Multi-FCN is the Multi-FCN with end-to-
end adversarial training. The parameters of FCNs are set such that the number
of each layers parameters is almost the same as that of
Jointly Trained Multi-FCN-CRF is with the Multi- CNN used in the work [7]. For optimization, we use Adam
FCN as potential functions followed by CRF as RNN algorithm [13] with learning rate 0.003. The used for
with end-to-end training. weights of CRF as RNN is 0.5 in the two datasets. The
used in adversarial training are 0.1 and 0.5 for INbreast and
Jointly Trained Adversarial Multi-FCN-CRF is the
DDSM-BCRP datasets respectively. For mean field approx-
Jointly Trained Multi-FCN-CRF with end-to-end ad-
imation or the CRF as RNN, we use 5 iterations/time steps
versarial training.
in the training and 10 iterations/time steps in the test phase.
The configuration of FCN is 6 filters of size 5 5 1, Due to the scarcity of data, we simplify the Gaussian kernel
max pooling of strides 2 2 in the first layer, 12 filters of kG (fi , fj ) by only considering the neighboring pixels:
size 5 5 6, max pooling of strides 2 2 in the second
kG (fi , fj ) = [exp(|Ii Ij |2 /2), exp(|pi pj |2 )/2]T ,
layer, 588 filters of size 7 7 12 in the third layer, 2 filters
(11)
of size 4040588 for the last deconvolutional layer. Note
which is a special form of truncated Gaussian kernel.
that we use the configuration in the methods using one FCN.
Other three FCNs used in the Multi-FCN are as follows: 4.1. INbreast Dataset
The first layer contains 9 filters of size 4 4 1, max The INbreast dataset is a recently released mammo-
pooling with strides 2 2. The second layer contains graphic mass analysis dataset, which contains 410 mam-
12 filters of size 4 4 9, max pooling with stride mographic images [22]. The dataset has 115 cases where
2 2. The third layer has 588 filters of size 7 7 12. 90 cases are from women with both breasts affected. Four
The fourth layer has 2 filters of size 40 40 588 for types of lesions are included: masses, calcifications, asym-
deconvolutional operator. metries and distortions. Compared to other datasets such as
mini-MIAS [27], the INbreast dataset provides more accu- Method Dice(%)
rate contours of lesion region and the mammographic im- Beller et al. [3] 70
ages are of high quality. Deep Structure Learning [8] 87
For mass segmentation, the dataset contains 116 mass TRW Deep Structure Learning [9] 89
regions. We use the first 58 masses for training and the rest Deep Structure Learning + CNN [7] 90
for test, which is of the same protocol used in these works FCN 90.21
[8, 7, 9]. FCN with Adversarial Training 90.78
We have compared our schemes with other recently pub- Jointly Trained FCN-CRF 90.97
lished mammographic mass segmentation methods [4, 8, 9, FCN-CRF with Adversarial Training 91.03
7]. We summarize the results in terms of the publicly used Multi-FCN 91.17
Dice index in Table 1. Jointly Trained Multi-FCN 91.20
Table 1 shows that the successfully used CNN features Multi-FCN-CRF jointly training 91.26
in natural image provide superior performance on medical Multi-FCN-CRF with Adversarial Training 91.30
image analysis, outperforming contour based method [4].
Our enhanced FCN achieves about 0.5% improvement than
the CNN followed by fully connected network (88.98%). Table 2: Comparisons on DDSM-BCRP dataset.
The adversarial training yields 0.3% improvement on aver-
age. Incorporating the spatially structural constraint further
produces 0.3% improvement. Using model average or mul- pared to that of INbreast dataset, which means the vari-
tiple potential functions contributes the most to segmenta- ation of groundtruth is much smaller. The position pri-
tion results which is consistent with work showing that the ori contributes more to the model in such case. Adver-
best model requires five different unary potential functions sarial training improves more than 0.5% compared to the
[7]. Combining all the components together achieves the model using FCN only. The FCN-CRF with adversarial
best performance with relative 9.7% improvement. In our training obtains about 0.8% improvement over the baseline.
experiment, there is a phenomenon that the FCN over-fits The multi-FCN-CRF with end-to-end adversarial training
heavily on the training set which can even achieve above achieves 1.3% (relative 13%) improvement over the Deep
98.60% Dice index. The phenomenon explains why the Structure Learning + CNN [7]. The result shows the advan-
two-stage training cannot boost the performance too much. tage of our proposed network.
The adversarial training works effectively as a regulariza-
tion to eliminate the over-fitting. We believe the over-fitting 4.3. Discussions
is mainly caused by the scarce mammographic images and To further understand the adversarial training, we visual-
we strongly support the creation of a large mammographic ize the segmentation results of FCN (the first row) and FCN
analysis dataset to accelerate mammographic image analy- with adversarial training (the second row) on the INbreast
sis research. (a) and DDSM-BCRP (b) datasets as in Fig. 4. For clarity,
we only visualize the perimeters of mass prediction results.
4.2. DDSM-BCRP Dataset We randomly select seven samples from the two datasets
The DDSM dataset is the largest mammographic image respectively.
analysis dataset, which contains 2,620 cases (10,480 im- From Fig. 4, we observe that the FCN with adversar-
ages) [12]. However, many of images in the DDSM dataset ial training yields more accurate borders (green lines) than
are too old and even damaged. A subset of the DDSM that of FCN. The FCN with adversarial training obtains bet-
dataset, DDSM-BCRP, is commonly used for its relatively ter segmentation results than using FCN only on the two
high resolutions and pixel level groundtruth annotations. datasets. This supports using adversarial training as an ef-
The DDSM-BCRP dataset includes 89 cases (356 images) fective regularizer to learn better models.
for training and 90 cases (360 images) for testing, which We further visualize segmentation results using the four
contains 39 mass cases (156 images) for training and 40 networks, FCN (the first row), FCN with adversarial train-
mass cases (160 images) for testing. After ROI extraction, ing (the second row), jointly trained FCN-CRF (the third
there are 84 ROI for training, and 87 ROI for test. Compar- row) and FCN-CRF with adversarial training (the fourth
isons with contour based method [3] and two stage trained row) on the INbreast (a) and DDSM-BCRP (b) datasets in
deep structure models [8, 9, 7] are summarized in Table 2. Fig. 5.
Table 2 shows that, using enhanced FCN achieves com- From Fig. 5, we observe that the segmentations in the
parable performance as the previous best performance. The first row have vague borders and many outliers within the
reason is that there are not many sharp boundaries in the predicted borders. The segmentations in the second row
segmentation groundtruth of DDSM-BCRP dataset com- have fewer vague borders and fewer outliers than the pre-
(a)