
2014 World Congress on Computing and Communication Technologies

Multiple Feature Extraction from Cervical Cytology Images by Gaussian Mixture Model
G. Karthigai Lakshmi
Department of Computer Science
V.V. Vanniaperumal College for Women
VIRUDHUNAGAR
karthigailakshmi64@gmail.com

K. Krishnaveni
Department of Computer Science
Sri Ramasamy Naidu Memorial College
SATTUR
kkveni_srnmc@yahoo.co.in

Abstract - In this paper, methods for the automated extraction of multiple features of the cytoplasm and nuclei from cervical cytology images are described. Edges of the image are enhanced by an edge-sharpening filter. A Gaussian mixture model, using Expectation Maximization and K-means clustering, is then used to segment the image into its components: background, nucleus and cytoplasm. Features have been identified for both multiple-cell and single-cell cervical cytology images. For multiple-cell images, the nucleus-to-cytoplasm ratio is calculated. For cells with a single nucleus, a set of features such as the center, perimeter, area and mean intensity of the nucleus and cytoplasm is extracted. These features may be used to determine the stage of cancer.
Keywords - Cervical Cytology, k-means clustering, Expectation Maximization, Pap Smear Test, Structural features

I. INTRODUCTION
Cervical cancer is caused by Human papillomavirus (HPV). It is the fifth deadliest cancer in humans and the second deadliest in women. In developing and underdeveloped countries, awareness of the causes and effects of cervical cancer is far lower than in developed countries. Cervical cancer kills 280,000 women every year. In India, Chennai ranks third in the world in cervical cancer mortality according to population-based cancer registries compiled in 2001 [1]. Indian women accounted for 27% (77,100) of all cervical cancer deaths in 2008 [2]. In 2010, 33,400 Indian women died of cervical cancer, and every year 16 per 100,000 women are affected by the disease. Cervical cancer can be cured if it is properly diagnosed at an early stage. The Papanicolaou test, or Pap smear test, is the first step in diagnosing gynecological cancer. Visual interpretation of the features of cervical images is crucial in discriminating the severity of the disease. Manual screening of cervical cytology images obtained from the Pap smear test is time-consuming and error-prone for several reasons, such as uneven staining, poor contrast and blood stains. Differentiating cells as benign or malignant can be automated to reduce human error and improve diagnosis.

Dysplastic cells of cervical cancer have immature cytoplasm, abnormal nuclear features and an increased nucleus-to-cytoplasm ratio. Features extracted from microscopic cervical cytology images can serve to diagnose the stage of cancer, and medical image processing techniques help in extracting and analyzing these features and determining the severity of the disease.

Figure 1. (a) Normal cervical cytology cell; (b) Severe dysplastic cell.

Automated cervical cancer diagnosis involves three phases: noise removal, segmentation and feature extraction. To remove noise, a mean filter that preserves edge information may be used. Boundary-based or region-based segmentation methods may be used to extract the region of interest from the images. Edges mark discontinuities in an image in the form of color or gray-level changes or variations in texture, and edge-detecting filters may be used to identify them. However, since the boundaries of cells in cervical cytology images are irregular, these filters are not suitable for segmentation. Instead, edges may be enhanced and contours used to detect the cytoplasm and nuclei. Gradient Vector Flow (GVF) snakes and active contour models have been proposed in many research works, but the efficiency of these methods depends critically on the initial positioning of the contour. Nazahah Mustafa et al. [3] proposed a seed-based region growing algorithm for automated multicell segmentation of ThinPrep images. This method uses k-means clustering to split the image into background, cytoplasm and nuclei, and a seed pixel selected using moments is used for region growing. The method works well for images with non-overlapping cells. M.E. Plissiti et al. [4][5] have used fuzzy C-means clustering and morphological reconstruction techniques.

The aim of this paper is to implement a simple yet efficient method for feature extraction that needs little preprocessing. Cancerous cells differ from normal cells in both structural and textural features. Qualitative features can be interpreted visually by cytopathologists, whereas quantitative features can be analyzed accurately by automation.

In this paper, Pap smear cervical cytology images from the NCI web atlas of the Bethesda System [6] and the database of Herlev Hospital, Denmark [7] are used. These images, in RGB color space, are preprocessed to enhance their quality. A Gaussian Mixture Model (GMM) supported by Expectation Maximization (EM) and K-means clustering is used to segment each image into its components: background, nucleus and cytoplasm. Statistical features used to categorize cervical cells as Normal, Low-grade Squamous Intraepithelial Lesion (LSIL) or High-grade Squamous Intraepithelial Lesion (HSIL) are extracted from the nucleus and cytoplasm.

978-1-4799-2876-7/13 $31.00 © 2013 IEEE
DOI 10.1109/WCCCT.2014.89
II. FEATURE EXTRACTION BY GAUSSIAN MIXTURE MODEL
A. Preprocessing
The cervical cytology image in RGB format is given as input. To sharpen the boundaries of the cytoplasm and nuclei, an edge-sharpening filter is used. This is achieved by subtracting a scaled unsharp (blurred) version of the image from the original. The resultant image is given as input to the Gaussian mixture model.
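Subtracting a scaled blurred copy of an image from the original is the classic unsharp-masking idea. The plain-Python sketch below applies it to one color plane in its common form, (1 + amount) x original - amount x blurred, clipped to [0, 255]. The 3x3 mean filter used as the blur, the `amount` parameter and all function names are assumptions for illustration; the paper does not state the kernel or scaling factor it used.

```python
from typing import List

Channel = List[List[float]]  # one color plane, rows of intensities in [0, 255]


def box_blur(channel: Channel) -> Channel:
    """3x3 mean filter with replicated borders (assumed blur; the paper gives no kernel)."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    total += channel[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(channel: Channel, amount: float = 1.0) -> Channel:
    """Sharpen as (1 + amount) * original - amount * blurred, clipped to [0, 255]."""
    blurred = box_blur(channel)
    return [
        [min(255.0, max(0.0, (1.0 + amount) * p - amount * b)) for p, b in zip(row, brow)]
        for row, brow in zip(channel, blurred)
    ]


def sharpen_rgb(red: Channel, green: Channel, blue: Channel, amount: float = 1.0):
    """Apply the same unsharp mask to each color plane independently."""
    return tuple(unsharp_mask(c, amount) for c in (red, green, blue))
```

Applied to a horizontal step edge with rows [50, 50, 200, 200], the two pixels adjacent to the edge are pushed apart to 0 and 250 while the flat ends are unchanged, which is the boundary-enhancing effect the preprocessing step relies on.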

B. Gaussian Mixture Model
A cervical cytology image is an array of pixel values, each pixel having an intensity value represented as an integer. The distribution of pixel intensities in an image can be modelled by a mixture model. In this paper, a Gaussian Mixture Model (GMM), a parametric probability density function represented as a weighted sum of Gaussian component densities, is used. The parameters of each component (class) of the GMM, its mean $\mu_i$ and standard deviation $\sigma_i$, are found by iterating an Expectation Maximization (EM) algorithm. A GMM can be represented by the following equation:

$$ p(x) = \sum_{i=1}^{k} p_i \, Q_i(x \mid \mu_i, \sigma_i) \tag{1} $$

where $k$ is the number of Gaussian components, $p_i > 0$ is the mixing weight of component $i$, with $\sum_{i=1}^{k} p_i = 1$ so that the total probability is 1, and $x$ is a random variable representing pixel intensity. $Q_i$ in (1) is a Gaussian distribution with parameters $(\mu_i, \sigma_i)$:

$$ Q_i(x \mid \mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right) \tag{2} $$

C. Expectation Maximization Method
Expectation maximization [8] is an iterative method for estimating the means and standard deviations of the components of the Gaussian mixture model. It finds maximum likelihood or maximum a posteriori (MAP) estimates of the parameters. The EM iteration alternates between an Expectation (E) step and a Maximization (M) step until convergence. The E step computes the expected value of the log-likelihood, evaluated using the current estimates of the parameters. The M step computes the parameters for the next iteration by maximizing the expected log-likelihood found in the E step. These parameter estimates are then used to determine the distribution of the latent variables in the next E step.

D. K-Means Clustering Algorithm
The EM algorithm identifies the mean and standard deviation of each region of the cervical image (background, cytoplasm and nucleus). The K-means clustering algorithm performs the following steps:
1. For each pixel in the cervical image, the closest cluster centre, measured by Euclidean distance, is identified.
2. Each cluster centre is replaced by the coordinate-wise average of the intensity values of all pixels closest to it.
Steps 1 and 2 are alternated until the assignments no longer change, so that every pixel in the image is assigned to one of the three regions.

E. Feature Extraction
The Bethesda System classifies cervical cytology cells as:
1. Atypical Squamous Cells (ASC)
a. of undetermined significance (ASC-US)
b. cannot exclude HSIL (ASC-H)
2. Low-grade Squamous Intraepithelial Lesion (LSIL / CIN1)
3. High-grade Squamous Intraepithelial Lesion (HSIL: CIN2, CIN3)
4. Squamous Cell Carcinoma
This classification is based on visual inspection of the size of the nucleus, the position of the nucleus with respect to the center of the cytoplasm, the ratio of the area of the nucleus to that of the cytoplasm (N/C ratio), the shape of the nucleus, intensity variations in the nucleus, and the inter-nuclear distance.
The database of Herlev Hospital, Denmark, has been benchmarked into seven categories:
1. Normal Superficial squamous epithelial
2. Normal Intermediate squamous epithelial
3. Normal Columnar epithelial
4. Abnormal Mild squamous non-keratinizing dysplasia
5. Abnormal Moderate squamous non-keratinizing dysplasia
6. Abnormal Severe squamous non-keratinizing dysplasia
7. Abnormal Squamous cell carcinoma in situ
In this paper, the ratio of the area of the nuclei to that of the cytoplasm is calculated for multi-cellular images, and its implications are derived from the guidelines of the Bethesda System. For single-cell images [9], multifarious features may be calculated to classify the images using a classifier such as an Artificial Neural Network, a genetic algorithm, an ant colony approach or any other suitable classifier.
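Subsections B to D can be tied together in a short sketch. For pixel intensities $x_1, \dots, x_N$, the E step of EM computes the responsibility $\gamma_{n,i}$ of component $i$ for pixel $n$, and the M step re-estimates the parameters of eq. (1) from these responsibilities (the standard updates for a univariate GMM):

$$ \gamma_{n,i} = \frac{p_i\, Q_i(x_n \mid \mu_i, \sigma_i)}{\sum_{j=1}^{k} p_j\, Q_j(x_n \mid \mu_j, \sigma_j)}, \qquad N_i = \sum_{n=1}^{N} \gamma_{n,i} $$

$$ p_i = \frac{N_i}{N}, \qquad \mu_i = \frac{1}{N_i} \sum_{n=1}^{N} \gamma_{n,i}\, x_n, \qquad \sigma_i^2 = \frac{1}{N_i} \sum_{n=1}^{N} \gamma_{n,i}\, (x_n - \mu_i)^2 $$

The plain-Python sketch below implements these updates and then runs the two K-means steps of subsection D. Several choices are assumptions rather than details given in the paper: the input is a flat list of grayscale intensities (for example the luminance of eq. (4)), the EM initialisation, the floor of 1.0 on each $\sigma_i$, and seeding K-means with the sorted EM means, since the text does not say exactly how EM and K-means are combined. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Univariate normal density Q_i(x | mu_i, sigma_i), eq. (2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def fit_gmm_em(xs: Sequence[float], k: int = 3, iters: int = 200,
               tol: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Estimate weights p_i, means mu_i and std devs sigma_i of a 1-D GMM by EM."""
    lo, hi = min(xs), max(xs)
    # Assumed initialisation: means spread evenly over the intensity range.
    mus = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    sigmas = [max((hi - lo) / (2.0 * k), 1.0)] * k
    ps = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E step: responsibilities gamma[n][i] under the current parameters.
        gamma, ll = [], 0.0
        for x in xs:
            w = [ps[i] * gaussian(x, mus[i], sigmas[i]) for i in range(k)]
            total = sum(w) or 1e-300  # guard against underflow to 0
            ll += math.log(total)
            gamma.append([wi / total for wi in w])
        # M step: closed-form updates that maximise the expected log-likelihood.
        for i in range(k):
            n_i = sum(g[i] for g in gamma) or 1e-12
            ps[i] = n_i / len(xs)
            mus[i] = sum(g[i] * x for g, x in zip(gamma, xs)) / n_i
            var = sum(g[i] * (x - mus[i]) ** 2 for g, x in zip(gamma, xs)) / n_i
            sigmas[i] = max(math.sqrt(var), 1.0)  # assumed floor against collapse
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return ps, mus, sigmas


def kmeans_1d(xs: Sequence[float], centres: Sequence[float],
              max_iters: int = 100) -> Tuple[List[int], List[float]]:
    """K-means on scalar intensities (steps 1 and 2 of subsection D)."""
    centres = list(centres)
    labels: List[int] = []
    for _ in range(max_iters):
        # Step 1: nearest centre; Euclidean distance in 1-D is |x - c|.
        new_labels = [min(range(len(centres)), key=lambda j: abs(x - centres[j]))
                      for x in xs]
        # Step 2: move each centre to the mean intensity of its pixels.
        for j in range(len(centres)):
            members = [x for x, lab in zip(xs, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = sum(members) / len(members)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centres


def segment_intensities(xs: Sequence[float], k: int = 3) -> Tuple[List[int], List[float]]:
    """EM estimates the class means; K-means, seeded with them, labels every pixel.

    With sorted seeds, label 0 is the darkest class. Reading 0 as nucleus,
    1 as cytoplasm and 2 as background is an assumption about stained smears.
    """
    _, mus, _ = fit_gmm_em(xs, k)
    return kmeans_1d(xs, sorted(mus))
```

On a synthetic intensity list made of three well-separated groups, the labels line up with the groups from darkest to brightest. Mapping them to nucleus, cytoplasm and background assumes the usual bright-field Pap stain appearance, where nuclei are darkest and the background brightest.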

F. Features and their description
1. The area of the nucleus or cytoplasm is the count of pixels that lie in the respective region.
2. The ratio of the area of the nucleus to that of the cytoplasm is calculated as

$$ \text{N/C ratio} = \frac{\text{Nucleus Area}}{\text{Nucleus Area} + \text{Cytoplasm Area}} \tag{3} $$

3. The brightness of the nucleus and cytoplasm is calculated from the luminance Y of their pixels:

$$ Y = 0.299\,\text{Red} + 0.587\,\text{Green} + 0.114\,\text{Blue} \tag{4} $$

4. The diameters are the diameters of the inscribed and circumscribed circles of the nucleus and cytoplasm.
5. Elongation is the ratio of the shortest to the longest diameter of the object under consideration.
6. Roundness is the ratio of the actual area to the area of the circumscribed circle of the object.
7. Perimeter is the count of pixels that lie on the boundary of the object.
8. Nucleus position is a measure of how well the nucleus is centred within the cell. It is calculated from the centres of the nucleus and cytoplasm.

Figure 2. (a) Normal multi-cellular image; (b) Cancerous multi-cellular image with closely spaced, irregularly shaped nuclei.

Table 3
Features extracted

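[Image column: Original Image]
No. | Nucleus Mean Intensity | Nucleus Area (pixels) | Nucleus Diameter (pixels) | Cytoplasm Mean Intensity | Cytoplasm Area (pixels) | Cytoplasm Diameter (pixels)
1 | 136 | 1187 | 39 | 280 | 2573 | 57
2 | 127 | 3227 | 64 | 156 | 5467 | 87
3 | 186 | 2175 | 53 | 209 | 8115 | 101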
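The features of subsection F, several of which appear in Table 3, are defined through pixel counts and circle diameters. The plain-Python sketch below computes them for one non-empty region given as a boolean mask over an RGB image. Assumptions not taken from the paper: the perimeter counts region pixels that have a 4-neighbour outside the region; the circumscribed and inscribed diameters are approximated as twice the largest and smallest distance from the region's centroid to its boundary pixels; elongation uses these two diameters as the longest and shortest ones; and nucleus position is reported as the distance between the nucleus and cytoplasm centroids. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Mask = List[List[bool]]                      # True where a pixel belongs to the region
RGBImage = List[List[Tuple[int, int, int]]]
Pixel = Tuple[int, int]                      # (row, column)


def luminance(rgb: Tuple[int, int, int]) -> float:
    """Eq. (4): Y = 0.299 R + 0.587 G + 0.114 B."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def region_pixels(mask: Mask) -> List[Pixel]:
    return [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def boundary_pixels(mask: Mask) -> List[Pixel]:
    """Region pixels with at least one 4-neighbour outside the region or image."""
    h, w = len(mask), len(mask[0])
    edge = []
    for y, x in region_pixels(mask):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                edge.append((y, x))
                break
    return edge


def region_features(mask: Mask, image: RGBImage) -> Dict[str, object]:
    """Subsection F features for one non-empty region (nucleus or cytoplasm)."""
    pixels = region_pixels(mask)
    edge = boundary_pixels(mask)
    area = len(pixels)                                          # feature 1
    cy = sum(y for y, _ in pixels) / area
    cx = sum(x for _, x in pixels) / area
    # Approximation: circle diameters from centroid-to-boundary distances.
    dists = [math.hypot(y - cy, x - cx) for y, x in edge]
    d_out, d_in = 2 * max(dists), 2 * min(dists)                # feature 4
    return {
        "area": area,
        "perimeter": len(edge),                                 # feature 7
        "centre": (cy, cx),
        "mean_intensity": sum(luminance(image[y][x]) for y, x in pixels) / area,  # feature 3
        "d_inscribed": d_in,
        "d_circumscribed": d_out,
        "elongation": d_in / d_out if d_out else 1.0,           # feature 5 (approximation)
        "roundness": area / (math.pi * (d_out / 2) ** 2) if d_out else 1.0,  # feature 6
    }


def nucleus_position(nucleus: Dict[str, object], cytoplasm: Dict[str, object]) -> float:
    """Feature 8, taken here as the centroid distance (0 means perfectly centred)."""
    (ny, nx), (cy, cx) = nucleus["centre"], cytoplasm["centre"]
    return math.hypot(ny - cy, nx - cx)
```

For a 3 x 3 square region inside a uniformly gray 5 x 5 image, the sketch reports an area of 9 pixels, a perimeter of 8 pixels and a centre at (2.0, 2.0). On such small shapes the discrete approximations are coarse, so roundness can exceed 1.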

III. RESULTS AND DISCUSSION

Thirty-eight Pap smear cervical cytology images with multiple cells were taken as experimental samples. When the ratio of the area of the nucleus to that of the cytoplasm [10] [11] is less than 0.4, the cells are normal. When the ratio lies between 0.4 and 0.5, the cells are classified as Cervical Intraepithelial Neoplasia 1 (CIN1, LSIL). When the ratio lies between 0.5 and 0.67, the cells are classified as Cervical Intraepithelial Neoplasia 2 (CIN2, HSIL). When the ratio lies above 0.67, the cells are categorized as Cervical Intraepithelial Neoplasia 3 (CIN3, HSIL). Some results are shown in Table 2.
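The thresholds above amount to a simple rule on the ratio of eq. (3). The sketch below, with hypothetical function names, computes the ratio and maps it to the four categories. The text does not say how values falling exactly on 0.4, 0.5 or 0.67 are assigned, so the interval boundaries chosen here are an assumption.

```python
def nc_ratio(nucleus_area: int, cytoplasm_area: int) -> float:
    """Eq. (3): nucleus area / (nucleus area + cytoplasm area)."""
    total = nucleus_area + cytoplasm_area
    if total <= 0:
        raise ValueError("total segmented area must be positive")
    return nucleus_area / total


def classify_nc(ratio: float) -> str:
    """Map an N/C ratio to the Section III categories.

    Assumed intervals: [0, 0.4) normal, [0.4, 0.5) CIN1, [0.5, 0.67] CIN2, > 0.67 CIN3.
    """
    if ratio < 0.4:
        return "Normal"
    if ratio < 0.5:
        return "CIN1 (LSIL)"
    if ratio <= 0.67:
        return "CIN2 (HSIL)"
    return "CIN3 (HSIL)"
```

For example, `classify_nc(0.0365)` returns "Normal", in line with image 1 of Table 2.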
Nearly 100 Pap smear cervical cytology images with single cells were taken as experimental samples from the database of Herlev University Hospital, and their features were estimated.

IV. CONCLUSION

The proposed method separates the nucleus and cytoplasm in both single-cell and multi-cell Pap smear cervical cytology images using a Gaussian mixture model. The results obtained are found to be satisfactory, and the relevant features are extracted. Future work will attempt to automate the classification of cytology images into one of the groups normal, LSIL or HSIL using a soft computing technique.

Table 1
Result of Segmentation by Gaussian Mixture Model
[Image columns: Original Image, Segmented Image]

REFERENCES
[1] http://www.canceratlasindia.org
[2] http://globocan.iarc.fr/
[3] Nazahah Mustafa, Nor Ashidi Mat Isa and Mohd Yusoff Mashor, "Automated Multicells Segmentation of ThinPrep Image Using Modified Seed Based Region Growing Algorithm," Biomedical Soft Computing and Human Sciences, Vol. 14, No. 2, pp. 41-47, 2009.
[4] M.E. Plissiti, E.E. Tripoliti, A. Charchanti, O. Krikoni and D.I. Fotiadis, "Automated Detection of Cell Nuclei in PAP Stained Cervical Smear Images Using Fuzzy Clustering."
[5] Marina E. Plissiti, Christophoros Nikou and Antonia Charchanti, "Automated Detection of Cell Nuclei in Pap Smear Images Using Morphological Reconstruction and Clustering," IEEE Transactions on Information Technology in Biomedicine, Vol. 15, No. 2, March 2011.
[6] http://nih.techriver.net
[7] http://labs.fme.aegean.gr/decision
[8] http://en.wikipedia.org/wiki/Expectation_maximization_algorithm
[9] Eric Martin, "Pap Smear Classification," Technical University of Denmark.
[10] http://eurocytolgy.eu
[11] G. Karthigai Lakshmi and K. Krishnaveni, "Automated Extraction of Cytoplasm and Nuclei from Cervical Cytology Images by Fuzzy Thresholding and Active Contours," International Journal of Computer Applications (0975-8887), Vol. 73, No. 15, July 2013.

Table 2
Implication from Nucleus to Cytoplasm Ratio
[Image columns: Input Image, Cytoplasm, Nucleus/Nuclei]
No. | N/C Ratio
1 | 0.0365
2 | 0.8613
3 | 1.7683
Image 1 is a normal cell; images 2 and 3 are cancerous cells.
