
Computerized evaluation of mammographic image quality using phantom images
Geoffrey Dougherty*
Department of Radiologic Sciences, Faculty of Allied Health Sciences, Kuwait University, PO Box 31470, 90805 Sulaibikhat, Kuwait
* Corresponding author. Fax: +965-4833662; e-mail: geoff@hsc.kuniv.edu.kw
Received 7 January 1998; accepted 5 August 1998
Abstract
A simple, quick and computerized method for quantitatively evaluating the image quality of mammography phantom images has been developed. Images of the American College of Radiology (ACR) mammographic accreditation phantom were acquired under different X-ray techniques, scored and ranked subjectively by five expert readers, and digitized for quantitative analysis. The contrast and signal-to-noise (contrast-to-noise) ratios of the main nodule and microcalcification group were obtained accurately and reproducibly using an image processing protocol. The contrast values were successful at discriminating differences in image quality due to variations in scatter conditions (as a result of different kVp's, and the presence or absence of an acrylic scatterer and/or a moving Bucky grid). They were more precise, reproducible and sensitive than the ACR score. In particular, the contrast of the main nodule was highly correlated (r_s = 0.988; p < 0.001) with the ranking of image quality by our panel of expert readers. © 1998 Elsevier Science Ltd. All rights reserved
Keywords: Mammography; Mammographic phantom; Image quality; Conspicuity
1. Introduction
Worldwide, breast cancer is one of the most serious health threats facing women. In the United States cancer of the breast is the most frequently diagnosed malignancy [1] and is a leading cause of cancer mortality among women [2]. At present, mammography remains the sole imaging modality with proven capability of detecting small, clinically occult breast lesions in asymptomatic women [3]. The medical value of mammographic screening for breast cancer has been demonstrated in a number of trials [4–11], albeit with some reservations on age and frequency of screening [12,13]. Survival and recovery depend on early diagnosis, and hence the image quality of the mammographic system is of crucial importance. In particular, the portrayal of fine detail and the improvement of contrast with surrounding fibroglandular tissue are the keys to the visualization of early or subtle abnormalities [14,15]. Although mammography is more accurate than any other currently available modality, limitations in sensitivity (typically 80–90%) and specificity (typically 15–30%) are a concern. Meticulous attention to detail in the mammographic assessment is required, since even a very slight decrease in image quality could result in the loss of diagnostically useful information.
High quality mammographic images depend on all the factors within the imaging chain, including the X-ray equipment, the screen–film combination and its degree of contact [16], and the processing. Technical image quality is conventionally described in terms of the contrast, sharpness and noise properties of the image formation process [17], since these elements can be determined on a more or less independent basis. However, the effects of these three image properties on the visibility of fine mammographic detail are interrelated, and their relative importance in the depiction of subtle but diagnostically significant abnormalities depends on the target of interest. Consequently breast phantoms, which contain structures that mimic clinically relevant features such as soft-tissue masses and calcifications, are routinely used to monitor the performance of a mammographic imaging system and assess global image quality. Evaluation using a phantom is convenient to conduct in a standardized manner since patient radiation dose is not a factor, although there are concerns that these tests may not adequately simulate the radiologists' search for specific anatomic details against highly complex backgrounds [18,19].
A number of different phantoms are commercially available [20,21]. They contain a range of structures, some of which are more sensitive to resolution-related factors
(focal spot size, film–screen combination), some to contrast-related factors (kVp, scatter, film characteristics and processing procedures), and some a combination of both. The
Mammography Accreditation Program (MAP) of the Amer-
ican College of Radiology (ACR) uses the RMI model 156
phantom (Radiation Measurements Inc., Middleton, WI),
which simulates a breast of 4.5 cm thickness, composed of
50% fatty tissue and 50% glandular tissue. It consists of a
square acrylic block with a wax insert embedded with structures simulating various sizes of fibres (fibrils), specks (microcalcifications) and masses (nodules). The visibility
of these structures is assessed as either visible, partially visible or not visible; arbitrary scores are assigned for each structure seen (or partially seen), and scores are totalled to give a global image quality score. Performance on this imaging test is a significant factor in accreditation of a facility.
Despite the adoption of a system for assigning fractional
scores to partially visible structures, and other deductions
and artefact assessment, there is a considerable subjective
element in the overall evaluation. Not only may there be
some inconsistency amongst readers in interpreting the
scoring criteria, there may even be different visual acuities
amongst readers. Furthermore, due to the arrangement of the structures in the ACR phantom in a progressive pattern, readers can anticipate the presence of invisible or barely visible structures, contributing to false-positives and an elevated score [22]. Clearly a much more objective assessment of the phantom images is required to provide a valid, consistent and reproducible figure of merit. With proper quantitation, the phantom could then be used with confidence to assess the effects of changes in exposure conditions (especially kVp) and film/screen characteristics, and to discriminate better between different mammographic imaging systems.
Quantitative phantom image analysis has been recog-
nized as a possible solution to these problems [19,23,24].
The aim of our study was to establish a simple, quick and
convenient methodology to quantitatively assess
mammographic phantom image quality using image processing techniques, and to identify a convenient metric that would be sensitive to exposure conditions and discriminatory between different imaging systems, and could be used as part of a consistent quality control program. Its validity as an indicator of overall image quality would be tested against the evaluation of a series of images by a panel of five expert readers. If such a metric could be identified, its adoption in
quality control should enable hospitals to deliver optimum
mammography at the lowest possible dose to screened
women.
2. Methods
2.1. Phantom
The RMI model 156 Mammographic Accreditation Phan-
tom comprises an acrylic (Lucite) block with a removable
wax insert, approximating a 4.0–4.5 cm compressed breast. The wax insert contains the target structures, consisting of six fibrils (F1–F6), five groups of microcalcifications (M1–M5) and five nodules (N1–N5), arranged in four rows of four (Fig. 1). Within each class of structures the targets are of progressively smaller contrast and size, so that they range from most easily visible to least easily visible. The aluminium identification plate was carefully removed from each wax insert, in order to scan in square images of the structures without including part of the plate.
2.2. Images
A series of test images of the ACR phantom were
acquired with a dedicated mammography system (Senographe DMR, General Electric), using a molybdenum target and filter, a focal spot size of 0.3 mm and a focus-to-film distance of 60 cm. A variety of operating conditions was used to test the effect on image quality of kVp, mAs, scatter, and film type.
We used two methods to vary the degree of scatter radiation.
Images of the wax insert were taken with and without the acrylic block to give more or less scatter, and images of the wax insert were taken with and without the moving Bucky grid, whose presence reduces scatter radiation.
All images within a given series were obtained in a single session, using Kodak Min-R medium films and screens. Film processing was performed on a Kodak RP X-OMAT processor (Model M6B) using a standard 90 s developing cycle and a developer temperature of 35°C. A nominal 28 kVp and automatic exposure control (AEC), with the AEC photocell at the middle of the three available positions, was recognized as the normal operating condition.
2.3. Digitization
All the images were digitized with a laser digitizer (Hewlett Packard Scanjet II CX/T), using 8 bits per pixel to span the optical density range 1.0–1.4. The images taken without the acrylic scatterer were digitized at 340 dpi (corresponding to a pixel size of 75 µm) to produce images of 1024 × 1024 pixels. A typical image is shown in Fig. 2. Those taken with the scatterer in place were magnified, since the structures were further from the film-screen; in order to scale these images to the same size, they were scanned at 320 dpi. The resulting grey levels are linearly related to the optical density of the film in each pixel. The images were saved in TIFF format using Photostyler (Aldus Corporation, Seattle, USA); no enhancement was performed.

Fig. 1. Schematic view of the ACR Mammographic Phantom showing the structures (F = fibrils, M = microcalcifications, N = nodules) and their positions, and the labelling scheme used.
Automatic software optimization of brightness and contrast was not invoked, since it is not a linear process; instead all images within a series were scanned at identical software settings for brightness and contrast. The file format was changed to IMG (Kontron) format using X_IMAGE (Foster Findlay, Newcastle, England) and the files transferred to the hard disk of a Mipron workstation (Kontron Electronik GmbH, Eching, Germany) for analysis and display. The Mipron workstation comprises a control processor unit based on an 80486 processor at 66 MHz and an image processing unit, which contains a pipeline-structured array processor, display processor, additional image memory and a video input/output board. The Mipron software contains a range of functions for a wide range of image analysis applications, and allows users to assemble and edit customized macro files. A specialized image processing workstation is not essential for the analysis. We have found that a Pentium 133 MHz computer and software such as Kontron's KS-400 (available for the PC and other platforms) can reproduce these results, with only a marginally slower performance.
2.4. Computer analysis
A monotonically varying background optical density in the anode–cathode direction was frequently encountered. This results from the heel effect, and its magnitude increases with the amount of scatter during image formation (as a result of higher kVp, presence of the acrylic scatterer or disabling the Bucky grid). In order to isolate the features in the image from this varying background, a background image was formed by low-pass filtering the digitized image (using a 256 × 256 averaging kernel); this background was then subtracted from the original digitized image. Subtraction of the background image effectively minimized this artefact, and conveniently produced images of similar brightness for further analysis. However, a low frequency background variability of smaller amplitude was also noticed, especially at higher kVp's on those images taken with the acrylic scatterer. The source of this variability is unknown, although its appearance has previously been reported by others [19]. Our background subtraction did not remove this variability entirely, but subsequent processing using 256 × 256 sub-images succeeded in discriminating it from the features in the image.
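As an illustration, this background-flattening step can be sketched in a few lines of Python (the function name, file handling and brightness restoration here are illustrative choices, not part of the original Mipron macro):

```python
# Sketch of the background-flattening step, assuming an 8-bit digitized
# phantom image held in a NumPy array. A 256 x 256 uniform (averaging)
# kernel, as described in the text, estimates the slowly varying
# background (heel effect), which is then subtracted.
import numpy as np
from scipy.ndimage import uniform_filter

def flatten_background(image: np.ndarray, kernel: int = 256) -> np.ndarray:
    """Return the image with its low-frequency background removed."""
    img = image.astype(np.float64)
    background = uniform_filter(img, size=kernel, mode="nearest")
    flattened = img - background        # zero-mean residual
    return flattened + img.mean()       # restore a similar overall brightness
```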
Two regions (each 256 × 256 pixels) were then extracted from each image, to isolate respectively the most prominent nodule, N1, and the most prominent microcalcification group, M1. Our analysis limits measurement to these two structures, which are visible under even the worst operating conditions. Attempts to include other structures in the analysis were not successful with all images, using the processing protocol described here. With N2, for example, simple thresholding was often unable to even approximately segment feature from non-feature pixels. The visibilities of N1 and M1 are primarily limited by contrast discrimination and resolution, respectively, and clinical visibility will depend on a combination of both factors. Analysis of the fibrils was not included in this study since their visibility depends on both contrast and resolution according to some unknown and complex combination.
For both of these excised regions (containing N1 and M1), the contrast between features and background (which we call C(N1) and C(M1) respectively) was computed. The pixel values in the digitized images are proportional to the logarithm of the film exposure for exposures in the quasi-linear region of the characteristic curve. Average signal values are required within the feature, and in the background surrounding the feature. The contrast is the difference between the two corresponding average values. It is equivalent to (signal − background)/background if the pixel values are converted to units of exposure [25].
The processing challenge is to determine the contrast values accurately. In the case of N1 the feature itself is not well-delineated, the pixel values inside and outside the feature are not very different, and there is significant variation in the two values which are to be subtracted. The difficulty of estimating the difference between two nearly equal numbers, each subject to imprecision, is a fundamental problem in measurement. In this study, the contrast of N1 relative to background was around 4%. In clinical situations even lower contrasts are common. A 3 mm carcinoma in a 42 mm thick breast has been calculated to have a subject contrast of only 1.3% [26], close to the threshold contrast for human vision of about 0.8% [27]. For M1, the difficulty lies in the small size of the individual specks, each of which typically occupied a 7 × 7 pixel region.

Fig. 2. Image of the ACR Mammographic Accreditation Phantom (taken at 27 kVp, 5 mAs, no acrylic scatterer).
Attempts to distinguish feature from non-feature pixels by eye, and draw a circular contour in the overlay plane to partition them, were abandoned as being too operator-dependent, and therefore not reproducible. We also rejected the procedure of placing circles of fixed diameters around the nodules, used in a previous study [19], as being arbitrary. Instead the sub-image containing N1 was thresholded to produce a binary image, which was then morphologically closed (using a diagonal cross-shaped structuring element, comprising 5 pixels, to merge closely adjacent objects), and filled (to eliminate holes (grey value 0) inside the object). Small bright areas below a chosen threshold size (generally 100–200 pixels), due to noisy background pixels, were discarded, and the resulting binary mask was used to discriminate feature from non-feature pixels by logically `ANDing' it with the original (grey-scale) sub-image. This proved to be a straightforward process, when coded as a macro, and produced reproducible results. Fig. 3 shows the mask obtained for a typical sub-image containing the nodule N1, using this protocol. Once the pixels were discriminated, the averages of both feature pixels (around 12 000 pixels or so) and non-feature pixels (the remaining 53 000 or so in each sub-image) were computed. The difference between these two values is our metric C(N1), regarded either as an average contrast (in density terms) or an average contrast-to-background ratio (in exposure terms). A second metric was computed by dividing C(N1) by the standard deviation (`noise') of the feature pixels, and this is the signal-to-noise ratio, SNR(N1) (although some sources [28] would refer to it as the contrast-to-noise ratio, CNR).
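A minimal sketch of this masking protocol, assuming the sub-image is available as a NumPy array and a suitable threshold has already been chosen (the threshold argument and the 150-pixel minimum area are illustrative values within the 100–200 pixel range quoted above):

```python
# Sketch of the N1 segmentation: threshold, morphological closing with a
# 5-pixel diagonal-cross structuring element, hole filling, removal of
# small bright areas, then averaging of feature and non-feature pixels.
import numpy as np
from scipy import ndimage

# 5-pixel diagonal cross ("X"-shaped) structuring element, as in the text
SE = np.array([[1, 0, 1],
               [0, 1, 0],
               [1, 0, 1]], dtype=bool)

def nodule_metrics(sub: np.ndarray, thresh: float, min_area: int = 150):
    """Return (C(N1), SNR(N1)) for a 256x256 sub-image containing N1."""
    binary = sub > thresh                           # threshold
    binary = ndimage.binary_closing(binary, SE)     # merge adjacent objects
    binary = ndimage.binary_fill_holes(binary)      # fill holes in the object
    labels, n = ndimage.label(binary)               # discard small bright areas
    areas = ndimage.sum(binary, labels, range(1, n + 1))
    mask = np.isin(labels, np.nonzero(areas >= min_area)[0] + 1)
    feature = sub[mask]                             # ~12 000 feature pixels
    background = sub[~mask]                         # ~53 000 background pixels
    contrast = feature.mean() - background.mean()   # average contrast C(N1)
    return contrast, contrast / feature.std()       # SNR(N1) = C / noise
```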
The feature pixels comprising the specks in the sub-image containing M1 were successfully identified by thresholding the image, morphologically opening it (using a diagonal cross-shaped structuring element, comprising 5 pixels, to eliminate small objects but retain the size of the speck features), and then discarding areas smaller than a chosen threshold size if necessary (a threshold of about 20 pixels worked in most cases). Fig. 4 shows the mask obtained for a typical sub-image containing the microcalcification group M1. Our processing protocol preserves the size and shape of the individual calcifications. The average of the feature pixels (around 300 for the six microcalcifications in each sub-image) and non-feature pixels was computed, and analogous metrics C(M1) and SNR(M1) obtained.
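A companion sketch for M1, under the same assumptions; the only differences from the N1 protocol are the morphological opening (in place of closing and hole-filling) and the smaller area cutoff of about 20 pixels:

```python
# Sketch of the M1 segmentation and contrast measurement.
import numpy as np
from scipy import ndimage

SE = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=bool)  # diagonal cross

def speck_metrics(sub: np.ndarray, thresh: float, min_area: int = 20):
    """Return (C(M1), SNR(M1)) for a 256x256 sub-image of the speck group."""
    binary = ndimage.binary_opening(sub > thresh, SE)  # remove isolated noise
    labels, n = ndimage.label(binary)
    areas = ndimage.sum(binary, labels, range(1, n + 1))
    mask = np.isin(labels, np.nonzero(areas >= min_area)[0] + 1)
    contrast = sub[mask].mean() - sub[~mask].mean()    # ~300 feature pixels
    return contrast, contrast / sub[mask].std()
```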
2.5. ACR scoring
The readers scored the film images using an experimental design modelled after the three-alternative forced choice algorithm [29,30]. The images were viewed on a standard clinical viewbox in a darkened room. The viewing area of the box was reduced by an opaque cardboard mask to eliminate extraneous light, and a magnifying glass was available. Readers were encouraged to vary their viewing distance and position in order to find the position at which the features were most visible. For each of the 16 positions within each image, the reader indicated whether structures were visible, partially visible or not visible, and this information was recorded on a standardized form and converted to a net score.
Scoring was according to the ACR criteria [31], which allow half scores for structures which are not completely visible, e.g. if the whole length of the fibre is not seen, or not all of the specks in a group are visible (2–3 specks seen in a group result in a half-score, 4–6 result in the full score). The full ACR scoring system of deductions and artefact assessment was implemented. The scoring was performed by five readers, all practising radiographers familiar with mammography. The complete scoring task was repeated by each reader on two separate occasions to check for any inconsistencies in the scoring of an individual reader.
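For concreteness, the half/full score rule for a speck group can be encoded as follows (a sketch only; the group is assumed to contain six specks, and the deductions and artefact assessment of the full ACR system are omitted):

```python
# Illustrative encoding of the ACR half/full score rule for one speck group.
def speck_group_score(specks_seen: int) -> float:
    if specks_seen >= 4:        # 4-6 specks visible: full score
        return 1.0
    if specks_seen >= 2:        # 2-3 specks visible: half score
        return 0.5
    return 0.0                  # fewer than 2 specks: no score
```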
Fig. 3. The binary mask (right) used to discriminate feature from non-feature pixels in the sub-image of the nodule N1 (from Fig. 2) shown at left.
Fig. 4. The binary mask (right) used to discriminate feature from non-feature pixels in the sub-image of the microcalcification group M1 (from Fig. 2) shown at left.
2.6. Ranking of image quality
In addition, because of concerns regarding the arbitrary weights given to visible features in the ACR scoring system, the readers were asked on a separate occasion to rank the image quality of the radiographs. Radiographs were presented side-by-side in pairs to a reader, who recorded a preference for one of the members of the pair. To rank a set of n radiographs, each reader independently performed all possible pair-wise comparisons within the set (i.e. n(n − 1)/2 paired comparisons). This method is commonly used when objects being compared can only be judged subjectively. We followed the protocol used in an earlier study [32].
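One simple way to turn such exhaustive pairwise preferences into a rank order is to count "wins" across all n(n − 1)/2 pairs, sketched below (the exact tallying protocol of ref. [32] may differ):

```python
# Sketch: derive a rank order from exhaustive pairwise preferences.
# 'prefer(a, b)' stands in for a reader's recorded choice for each pair.
from itertools import combinations

def rank_by_wins(items, prefer):
    wins = {x: 0 for x in items}
    for a, b in combinations(items, 2):          # all n(n-1)/2 pairs
        wins[prefer(a, b)] += 1
    return sorted(items, key=lambda x: -wins[x]) # most wins = rank 1
```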
3. Results
Table 1 shows the effect of changing tube kVp on the contrast and signal-to-noise metrics. The mAs value was fixed at 140 mAs, the value obtained using automatic exposure control (AEC) at 28 kVp. The acrylic block was included in some exposures to simulate tissue scatter, and removed in others to produce better quality images under minimal scatter conditions. Over this limited range of kVp's, optical densities were reasonably uniform within the range 1.0–1.4. Net ACR scores, and overall image quality ranking, are included in the table.
The effect on the contrast metrics of changing mAs (at 28 kVp) around the auto-exposure values of 140 mAs (with the acrylic scatterer) and 5 mAs (without the acrylic scatterer) is shown in Table 2.
Table 3 shows the effect on the contrast values of disabling the Bucky grid, for the range of kVp values used in Table 1. Images taken with the acrylic scatterer in place were generally unsatisfactory, and were excluded from the analysis.
The validity of the contrast metrics, C(N1) and C(M1), as indicators of the quality of phantom images was determined by comparing the ranking due to their measured values with the quality ranking as judged by our expert panel of five readers. The rankings of the members of the panel showed little interobserver variability. A convenient non-parametric measure of correlation is the Spearman rank correlation coefficient, r_s [33], given by

r_s = 1 − 6 Σ d_i² / [n(n² − 1)]    (1)

where the two sets of values are ranked 1 to n and d_i is the difference in ranking for each pair of observations. A correlation coefficient of unity indicates perfect correlation, and zero indicates uncorrelated rankings; intermediate values are compared with critical values of the test statistic to interpret their correlation at various significance levels. Values of r_s showing the correlations of the various metrics with the ranking of image quality by our panel of experts are included in Table 1.

Table 1
Values of the contrast and SNR indices for N1 (main nodule) and M1 (main calcification group) of the mammographic phantom, exposed with (✓) and without (×) the acrylic scatterer, at various kVp's. Rankings based on these indices are given in parentheses. (The mAs value with the scatterer was fixed at 140; without the scatterer it was 5 mAs. The moving Bucky (grid) was enabled.)

kVp   Acrylic   C(N1)      SNR(N1)    C(M1)       SNR(M1)     ACR score (±0.5)   Expert panel ranking
26    ✓         8.40 (2)   2.98 (4)   16.24 (3)   1.61 (9)    28.0               3
27    ✓         8.11 (4)   2.85 (5)   14.12 (6)   1.91 (7)    25.5               4
28    ✓         6.18 (7)   3.26 (1)   12.55 (8)   1.97 (6)    25.0               7
29    ✓         6.02 (8)   3.19 (2)   12.21 (9)   2.28 (3)    21.0               8
30    ✓         5.75 (10)  2.23 (8)   11.25 (10)  2.12 (5)    18.5               10
26    ×         8.56 (1)   2.49 (6)   18.15 (1)   1.35 (10)   40.5               1
27    ×         8.30 (3)   2.26 (7)   16.11 (4)   1.84 (8)    29.0               2
28    ×         6.36 (5)   2.15 (9)   17.83 (2)   2.42 (2)    29.0               5
29    ×         6.30 (6)   3.00 (3)   15.13 (5)   2.59 (1)    21.0               6
30    ×         5.87 (9)   1.31 (10)  13.38 (7)   2.22 (4)    18.5               9

Spearman rank correlation coefficient: 0.988 (C(N1)), 0.067 (SNR(N1)), 0.855 (C(M1)), −0.648 (SNR(M1)), 0.924 (ACR score)
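As an illustration, Eq. (1) is simple to compute. The sketch below (the function name is ours; the rankings are those derived from the C(N1) column of Table 1 and the expert panel column) reproduces the quoted r_s = 0.988:

```python
# Spearman rank correlation, Eq. (1), applied to the Table 1 rankings.
def spearman_rs(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

c_n1_rank   = [2, 4, 7, 8, 10, 1, 3, 5, 6, 9]   # ranking by measured C(N1)
expert_rank = [3, 4, 7, 8, 10, 1, 2, 5, 6, 9]   # expert panel ranking
print(round(spearman_rs(c_n1_rank, expert_rank), 3))  # -> 0.988
```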
Table 2
Values of the contrast index, C, for nodule N1 and microcalcification group M1, at various mAs values, with (✓) and without (×) the acrylic scatterer. The AEC (automatic exposure control) values of mAs are marked with an asterisk. The moving Bucky (grid) was enabled.

kVp   Acrylic   mAs    C(N1)   C(M1)   ACR score
28    ✓         100    5.97    12.14   21
28    ✓         140*   6.40    12.74   26
28    ✓         160    5.90    12.08   26
28    ×         3      6.71    17.58   28.5
28    ×         4      6.85    18.06   29
28    ×         5*     7.01    20.64   28.5
28    ×         7      6.05    16.35   26
28    ×         9      3.90    8.74    21
4. Discussion
In order to be clinically useful, an image quality index
must be sensitive enough to discriminate between subtle
variations in system performance over the range of typical
operating conditions used to image breasts of differing sizes
and composition.
Table 1 shows that the values of the contrast metrics
(C(N1) and C(M1)) fall monotonically as tube kVp
increases. This is expected due to the increasing relative
importance of Compton scattering at higher X-ray energies
[34]. The ACR scoring also falls with increased kVp,
although the fall is less uniform. We ascribe this to the
coarse (i.e. 3-alternative) nature of the scoring, and the arbi-
trary scores awarded to each visible feature. (In particular,
we noticed large observer variability in scoring the fifth fibril, F5.)
The values of the contrast metrics, C(M1) and C(N1),
were slightly higher (for the same kVp) for the images
obtained without the acrylic scatterer. This is the result of
increased image quality due to the reduced scattering.
The value of the Spearman rank correlation coefficient correlating the expert panel ranking with C(N1) is very high (0.988), exceeding the critical value of the test statistic for n = 10 at a significance level of 0.001 (i.e. 0.867). This indicates that C(N1) is a valid indicator of image quality, correlating strongly with the expert panel opinion adopted as our "gold standard". Interestingly, the value is even higher than that obtained for the correlation between the expert panel ranking and the ACR score (i.e. 0.924). The correlation coefficient for C(M1) with expert ranking is not so high (0.855), but it is valid at the p < 0.005 level.
The values of our alternative metrics, SNR(N1) and SNR(M1), do not show a monotonic fall with rising kVp. This results from noise values which do not show any distinct pattern with kVp. Theoretically we would expect the measured noise to fall with rising kVp, as increasing scatter blurs the variations within the features, although the effect over the range 26–30 kVp may be small. Although our method minimizes pre-processing of the images, subtraction of the smoothed background is probably affecting the noise values in the features differently from one image to another. The noise estimate will also depend on the size of the feature area, since the noise in the images does not have a uniform frequency distribution. It represents an integration of the two-dimensional noise power spectrum between frequencies corresponding to two pixels (the Nyquist frequency) and the size of the feature, and there is some variability in this latter value. Clearly, the residual low frequency background artefact, evident on the images taken at higher kVp with the acrylic scatterer, will affect the noise measurements. Whatever the reasons, our noise measurements are not reliable and consequently neither are the derived SNR metrics. The values of the Spearman rank correlation coefficient for SNR(N1) and SNR(M1) with expert ranking (0.067 and −0.648 respectively) confirm that they are not valid descriptors of image quality, and we have omitted them from further discussion.
C(N1) monitors the conspicuity of the large, low-contrast masses and will primarily reflect low contrast detectability, whilst C(M1) monitors the conspicuity of the small specks above background and thus is expected to provide information on the spatial resolution within the images. A metric obtained from the thin, moderately low contrast fibrils is expected to be influenced both by spatial resolution and contrast factors, and a combination of all three separate metrics may be required to assess global image quality. Nevertheless, we have shown that the contrast alone of the main nodule (N1) in the ACR phantom provides a simple, useful and reliable indicator of image quality. A computerized image-processing scheme is required to measure the contrast accurately and reproducibly. The low grey level contrast necessitates high digitization quality, viz. either a 12-bit digitizer, or an 8-bit digitizer set to span a reduced range of optical density values of interest (e.g. OD 1.0–1.4). Our measurements reliably followed the ranking of an expert panel, with validity at the p < 0.001 level. Accurate manual measurement of the contrast using a densitometer is not practical. The optical densities of nodule feature and background are very close (e.g. 0.54 and 0.57 respectively), such that the difference between them hardly exceeds the varying noise level. Reproducible sampling in the numbers necessary to produce acceptable precision in the contrast would not be possible; even moderate sampling would be difficult and time-consuming.
Manual measurement of the contrast of the microcalcifications in the phantom image would require precise alignment of a microdensitometer beam. However, we have shown that the contrast of the main calcification group (M1) can be accurately measured by computer, without resorting to complex alignment with a template image. Our measurements correlated well with expert ranking, with validity at the p < 0.005 level. However, the precision of the values is limited because far fewer pixels are averaged in the process.
Table 3
Values of the contrast index, C, for nodule N1 and microcalcification group M1, with the moving Bucky (grid) disabled. (The mAs was 3 for all exposures.)

kVp   Acrylic   C(N1)   C(M1)   ACR score
26    ×         7.35    16.74   36
27    ×         6.92    16.86   36
28    ×         5.84    15.68   25.5
29    ×         4.78    12.36   20.5
30    ×         3.94    10.93   18.5
It is the computer averaging of many pixel values which results in acceptably precise contrast values. Averaging of N values reduces the imprecision by √N. Thus, typical contrast values of around 8 (grey levels), which are imprecise to ±√2 (i.e. each value is imprecise to ±1 before subtraction), can produce averaged contrast values precise to about ±0.012 if 12 000 values are used. We have observed that if the same operator measures C(N1) for the same image on different occasions, the measured values are indeed precise to within about ±0.01. However, if different operators measure C(N1) the precision falls, with a variability of about ±0.04. This indicates that there is a degree of variability within our protocol (most likely during initial thresholding) and/or the operators have different visual acuities. Similarly, between-reader variability worsened the precision of our C(M1) values from an expected ±0.08 (√2/√300) to around ±0.3. Nevertheless, these levels of precision are far superior to what could be achieved using manual measurements.
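The arithmetic of this √N argument is easily checked numerically; the snippet below reproduces the ±0.012 and ±0.08 quoted above (to rounding):

```python
# Numerical check: each pixel value is imprecise to about +/-1 grey level,
# a difference of two values to +/-sqrt(2), and an average over N such
# values to +/-sqrt(2)/sqrt(N).
import math

for name, n in [("C(N1)", 12000), ("C(M1)", 300)]:
    print(name, "expected precision: +/-%.3f" % (math.sqrt(2) / math.sqrt(n)))
# C(N1) expected precision: +/-0.013
# C(M1) expected precision: +/-0.082
```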
The effect on image quality of varying mAs at fixed kVp was investigated. As mAs was increased, the images obtained were overall darker. Table 2 shows that the contrast of the features changes with mAs, although the changes are smaller than those caused by changing kVp. The changes are qualitatively what might be expected: the contrast is maximal at the optimal (i.e. AEC) exposure, and is reduced by both under- and over-exposure as a result of the sigmoidal shape of the characteristic curve of the film. Detailed analysis of the effects of mAs on image quality would be rather complicated because of the highly non-linear response of mammographic film to exposure, and is beyond the scope of the current study.
Table 2 includes values taken from images recorded under AEC conditions, which can be compared with corresponding values in Table 1 taken from different images under identical conditions but on different days. The measured contrast values are broadly similar, but differ by more than the imprecision involved in reading identical images. This is expected, and is due to inherent and random fluctuations of image quality between images obtained at identical X-ray techniques (due to the Poissonian statistical variation of the images and changes in processing conditions [19]).
Grids are used in many radiographic procedures to reduce secondary (scattered) radiation and hence improve image contrast. In mammography, due to the low X-ray tube potentials and the small volume of tissue irradiated, scatter was thought to have a negligible effect on image contrast. However, depending on breast thickness and, to a lesser degree, field size, the scatter-to-primary ratio in mammography ranges from 0.4 to 1.5, resulting in a significant contrast degradation (scatter degradation factors, SDFs, of 0.71–0.4) unless scatter is controlled [35]. The GE Senographe DMR uses a moving Bucky grid with a grid ratio of 5:1 and a strip density of 60 lines/cm, resulting in a Bucky factor of about 2 and an expected contrast improvement of about 1.4 [35]. Images are routinely taken with the grid in position, but it can be disabled.
The contrast values for the images taken with the grid disabled, shown in Table 3, can be compared with the corresponding values with the grid enabled (in Table 1). They are lower, as expected, although the experimental contrast improvement factor for the main nodule is about 1.25 on average, less than the theoretically expected value of 1.4.
A quantitative image-quality index for imaging systems can be calculated from physical parameters. It depends on the system modulation transfer function (MTF), the image contrast obtained, and noise in the system. The International Commission on Radiation Units and Measurements [17] has proposed the use of a signal-to-noise ratio in the form of the Noise Equivalent Quanta (NEQ) as a quantitative index of image quality. The NEQ represents the signal-to-noise ratio as a function of spatial frequency, and so relates to an observer's ability to distinguish features of clinical relevance from noise of similar spatial frequency. Formally, the NEQ depends on the square of the MTF divided by the Wiener (or noise) power spectrum, W_n(u):

NEQ(u) = K² · MTF²(u) / W_n(u)    (2)

where K depends on the imaging task. Thus the spatial resolution of the imaging system, information on which is contained within the system MTF, directly affects the image quality as represented by the NEQ. A direct film exposure (i.e. without the use of a fluorescent screen) would result in improved image quality as a result of its better high spatial frequency information.
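Given measured MTF and Wiener-spectrum curves on a common frequency axis, Eq. (2) is a pointwise calculation; the sketch below uses synthetic placeholder curves purely to illustrate the computation:

```python
# Pointwise evaluation of Eq. (2): NEQ(u) = K^2 * MTF^2(u) / W_n(u).
# The MTF and Wiener spectrum below are synthetic placeholders, not
# measured data; K is the task-dependent constant from the text.
import numpy as np

u = np.linspace(0.1, 10.0, 100)        # spatial frequency (cycles/mm)
mtf = np.exp(-0.3 * u)                 # placeholder system MTF
wiener = 1e-6 * (1.0 + 1.0 / u)        # placeholder noise power spectrum
K = 1.0

neq = K**2 * mtf**2 / wiener           # NEQ as a function of frequency
```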
Measurement of the NEQ requires microdensitometer scans of images of bar pattern test objects (to obtain the MTF) and scans of areas of uniform film density (from which the Wiener spectrum can be obtained from Fourier analysis), along with sophisticated software. It is not a trivial exercise. Furthermore, it is predicated on a sophisticated theoretical basis with which most radiographers and radiologists will not be familiar.
At the other extreme of image quality analysis is the arbitrary scoring system used with the Mammography Accreditation Phantom of the ACR. This is quick and simple to implement, but variability in reader performance due to the inherent subjectivity of the assessment mechanism affects the overall score despite efforts to minimize it. Medical physicists, for example, have been shown to produce significantly higher scores than radiologists or untrained readers [36].
Our approach took an intermediate position. It is not based on sophisticated mathematical theory, yet it provides a computer-based, and therefore objective, robust, numerical metric to quantitate image quality, which is easy to implement. Its simplicity and sensitivity make it an attractive indicator for clinical use. We deliberately avoided developing a methodology that would require alignment of images with a standard reference image, either in the spatial [19] or Fourier [24] domain, since this requires precise digitization, scaling and fine registration of images, which is both difficult and time-consuming. Our protocol can be easily adapted to other phantoms containing more complex features, such as those found in anthropomorphic phantoms [18,21], although adaptive thresholding may be necessary to cope with the anticipated increase in the inhomogeneity of the background reflecting the structure of the breast tissue. We are currently investigating how well the contrast metrics obtained from the ACR phantom reflect the global quality of clinical mammograms, and their eventual clinical diagnoses.
Studies on observer detection performance, using statistical decision theory, concluded that an optimal human observer uses generalized signal-to-noise ratios to estimate image quality [37]. However, this may provide little insight into how a radiologist actually performs a specific visual task. Our studies agree with others [19,24] which have found contrast alone, rather than contrast (or signal)-to-noise, to be the pre-eminent factor linked to human performance in ranking image quality. All the work to date reflects the clinical reality within a mammographic facility, namely that factors which explicitly affect the spatial resolution (such as the focal spot size and film-screen combination, and the sampling rate used in subsequent digitization) are not varied. It is possible that comparison of image quality across different systems may require an explicit consideration of spatial resolution. However, our initial studies using a Senographe 600S, Kodak Min-R cassettes and Dupont Microvision film suggest that the contrast metric remains valid when comparing across current dedicated mammography systems.
Digital mammography can overcome many of the limitations of conventional mammography by separating, and independently optimizing, the functions of image acquisition and display. Our proposed quality indices are just as applicable to phantom images obtained from digital systems, and are even easier to implement since the mammograms are already in digital form.
Whilst the evaluation of image quality is most conveniently conducted using a standardized phantom, we are interested in adapting our protocol to the perception of clinical abnormalities. Given the shortage of trained radiologists and the virtual impossibility of sustaining interest in the interpretation of large numbers of mammographic images, automated evaluation of image quality and detection of abnormalities is an imperative.
5. Summary
A convenient and systematic protocol for evaluating the image quality of digitized mammographic phantom images has been developed. It involves the measurement of the contrast of a low-contrast nodule and/or a group of microcalcifications from images of the American College of Radiology (ACR) mammographic accreditation phantom acquired under different X-ray techniques. The first step was to minimize the effect of the inhomogeneous background. The features of interest were then separated from the background using a mask obtained by appropriate thresholding and morphological processing. Computer averaging over many pixels produced acceptably precise contrast values for a nodule even though the difference between pixel values within and without a nodule hardly exceeded the noise level. No alignment or registration of images was required in the measurement of the microcalcifications. The resulting values provided robust, numerical metrics which correlated well (with Spearman rank correlation coefficients of 0.988 and 0.855, valid at the p < 0.001 and p < 0.005 levels, for the main nodule and microcalcification group respectively) with the ranking of image quality by a panel of five experts. The approach opens up the possibility of a simple but reproducible method of identifying clinical pathologies, which is a necessary precondition for an efficient screening programme.
Acknowledgements
This work was supported by a grant from Kuwait Uni-
versity (MH021).
References
[1] Breast cancer facts and figures. Atlanta, GA: American Cancer Society, 1996.
[2] Ries LAG, Hankey BF, Miller BA, Hartman AM, Edwards BK. Cancer statistics review 1973–88. NIH Publication no. 91-2789. Bethesda, MD: National Cancer Institute, 1991.
[3] Orel SG, Troupin RH. Nonmammographic imaging of the breast: current issues and future prospects. Seminars in Roentgenology, 1993;28:231–241.
[4] Shapiro S, Venet W, Strax P, Venet L, Roeser R. Ten-to-fourteen-year effect of screening on breast cancer mortality. J Natl Cancer Inst, 1982;69:349–355.
[5] Kopans DB, Feig SA. The Canadian National Breast Screening Study: a critical review. Am J Roentgenol, 1993;161:755–760.
[6] Nystrom L, Rutqvist LE, Wall S, Lindgren A, Lindqvist M, Ryden S, Andersson I, Bjurstam N, Fagerberg G, Frisell J, Tabar L, Larsson L-G. Breast cancer screening with mammography: overview of Swedish randomised trials. Lancet, 1993;341:973–978.
[7] Peters GN, Vogel GV, Evans WP, Bondy M, Halabi S, Lord J, Laville EA. The Texas Breast Screening Project: Part 1: Mammographic and clinical results. South Med J, 1993;86:385–399.
[8] Bird RE. A successful breast cancer screening program. Cancer, 1992;69:1938–1941.
[9] Shapiro S. Periodic breast cancer screening in seven foreign countries. Cancer, 1992;69:1919–1942.
[10] Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grontoft O. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiol Clin North Am, 1992;30:187–210.
[11] Smart CR, Hendrick RE, Rutledge JH, Smith RA. Benefit of mammographic screening in women aged 40–49 years. Cancer, 1995;75:1619–1622.
[12] Liddell MJ. Mass mammographic screening: the other side of the argument. Aust Fam Physician, 1993;22:1168–1175.
[13] Den Otter W, Merchant TE, Beijerunck D, Koten JW. Exclusion from mammographic screening of women genetically predisposed to breast cancer. Anticancer Res, 1993;13:1113–1115.
[14] Haiart DC, Henderson J. A comparison of interpretation of screening mammograms by a radiographer, a doctor and a radiologist: results and implications. Brit J Clin Pract, 1991;45:43–45.
[15] UK Trial of Early Detection of Breast Cancer Group. Specificity of screening in United Kingdom trial of early detection of breast cancer. Brit Med J, 1992;304:346–349.
[16] Dougherty G, Newman D. Quantitation of film-screen contact in radiographic imaging. Radiography, 1996;2:301–309.
[17] Medical Imaging: The assessment of image quality. International Commission on Radiation Units and Measurements. Report 54. Bethesda, MD, 1996.
[18] Yaffe MJ, Byng JW, Caldwell CB, Bennett NR. Anthropomorphic radiological phantoms for mammography. Med Prog Tech, 1993;19:23–30.
[19] Chakraborty DP, Eckert MP. Quantitative versus subjective evaluation of mammography accreditation phantom images. Med Phys, 1995;22:133–143.
[20] Kimme-Smith C, Bassett LW, Gold RH. A review of mammography test objects for the calibration of resolution, contrast and exposure. Med Phys, 1989;16:758–765.
[21] Caldwell CB, Fishell EK, Jong RA, Weiser WJ, Yaffe MJ. Evaluation of mammographic image quality: pilot study comparing five methods. Am J Roentgen, 1992;159:295–301.
[22] Dougherty G, Newman D. The effect of anticipation in the scoring of mammographic accreditation phantom images. Radiography, 1997;3:279–286.
[23] Brooks KW, Trueblood JH, Kearfott KJ, Lawton DT. Automated analysis of mammographic quality control images. Med Phys, 1993;20:881–892.
[24] Brooks KW, Trueblood JH, Kearfott KJ, Lawton DT. Automated analysis of the American College of Radiology mammographic accreditation phantom images. Med Phys, 1997;24:709–723.
[25] Floyd CE, Baydush AH, Lo JY, Bowsher JE, Ravin CE. Bayesian restoration of chest radiographs: scatter compensation with improved signal-to-noise ratio. Invest Radiol, 1994;29:904–910.
[26] Johns PC, Yaffe MJ. X-ray characterisation of normal and neoplastic breast tissues. Phys Med Biol, 1987;32:675–695.
[27] Wagner AJ, Frey GD. Quantitative mammography contrast threshold test tool. Med Phys, 1995;22:127–132.
[28] Baydush AH, Floyd CE. Spatially varying Bayesian image estimation. Acad Radiol, 1996;3:129–136.
[29] Rehm K, Strother SC, Anderson JR, Schaper KA, Rottenberg DA. Display of merged multimodality brain images using interleaved pixels with independent color scales. J Nucl Med, 1994;35:1815–1821.
[30] Aufrichtig R, Xue P, Thomas CW, Gilmore GC, Wilson DL. Perceptual comparison of pulsed and continuous fluoroscopy. Med Phys, 1994;21:245–256.
[31] Hendrick RE. Mammography Quality Control Manual. American College of Radiology, Committee on Quality Assurance in Mammography. Reston, VA: American College of Radiology, 1992.
[32] Dougherty G. Quantitative indices for ranking the severity of hepatocellular carcinoma. Comp Med Imag Graph, 1995;19:329–338.
[33] Kruskal WH. Ordinal measures of association. J Am Stat Assoc, 1958;53:814–861.
[34] Photon, electron, proton and neutron interaction data for body tissues. International Commission on Radiation Units and Measurements. Report No. 46. Bethesda, MD, 1992:122.
[35] Barnes GT. Contrast and scatter in X-ray imaging. Radiographics, 1991;11:307–323.
[36] Brooks KW, Trueblood JH, Kearfott KJ. Subjective evaluation of mammographic accreditation phantom images by three observer groups. Invest Radiol, 1994;29:42–47.
[37] Wagner RF. Decision theory and the detail SNR of Otto Schade. Photog Sci Eng, 1978;22:41–46.
Geoff Dougherty received a B.Sc. from the University of Manchester in 1971, and a Ph.D. from Keele University in 1979. He has held University posts in England, Australia, Switzerland and Malaysia, and was a visiting Research Professor at Swarthmore College, Philadelphia. He has been Professor of Medical Image Processing at the University of Kuwait since 1992. His research interests include medical applications of image processing and pattern recognition, fractal dimension and lacunarity as multiresolution texture measures, and the quantitation of global image quality. Professor Dougherty is a Senior Member of IEEE, a Member of AAPM, and a Fellow of IEE and AIP. (Note: IEE, Institution of Electrical Engineers (UK); AIP, Australian Institute of Physics.)