
Machine Learning Approaches to Automatic BI-RADS Classification of Mammography Reports

Bethany Percha and Daniel Rubin


Program in Biomedical Informatics and Department of Radiology, Stanford University

Introduction

Clinical information is often recorded as narrative (unstructured) text. This is problematic for both researchers and clinicians, as free text thwarts attempts to standardize language and ensure document completeness.

Natural language processing could be used to extract relevant information from unstructured text reports, but the reports must be both complete and consistent.

A feedback system that extracts relevant information from text as it is being generated, and prompts the physician to modify the report as needed, would be useful in both physician training and clinical practice.

Here we present preliminary results on a classification system that automatically assigns BI-RADS assessment codes to mammography reports. The BI-RADS assessment codes are:
0  Incomplete
1  Negative
2  Benign finding(s)
3  Probably benign
4  Suspicious abnormality
5  Highly suggestive of malignancy
6  Known biopsy-proven malignancy

Preprocessing

41,142 reports were extracted from Stanford's radTF database. Of these, 38,665 were diagnostic mammograms (not specimen analyses or descriptions of biopsy procedures), and 22,109 of those also had BI-RADS codes (older reports frequently lack them) and were unilateral (single-breast) mammography reports.

Each remaining report was then converted into a feature vector, where each feature was the number of times a given word stem appeared in the report. There were 2,216 unique stems.
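As a concrete illustration, here is a minimal sketch of this bag-of-stems vectorization. The poster does not specify the tokenizer or stemmer used; this sketch assumes NLTK's Porter stemmer, scikit-learn's CountVectorizer, and two made-up report snippets.

    import re

    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import CountVectorizer

    stemmer = PorterStemmer()

    def stem_tokenizer(text):
        # Lowercase, keep alphabetic tokens, and reduce each to its stem
        # (e.g. the Porter stemmer maps "malignancy" to "malign").
        return [stemmer.stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]

    reports = [  # hypothetical snippets, not data from the study
        "No mammographic features of malignancy. Stable post-biopsy change.",
        "Calcifications 3 cm from the nipple, incompletely evaluated.",
    ]

    # Rows are reports; each column counts occurrences of one unique stem.
    vectorizer = CountVectorizer(tokenizer=stem_tokenizer, token_pattern=None)
    X = vectorizer.fit_transform(reports)
    print(vectorizer.get_feature_names_out())
    print(X.toarray())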
Feature Ranking

The most informative features were chosen using chi-squared attribute evaluation. The most informative stems were:

Stem        Most common context                           Occurrences per report, by class
                                                             0    1    2    3    4    5    6
breast      (many contexts)                                4.2  1.9  3.8  4.6  5.7  6.9  7.7
featur      "no mammographic features of malignancy"       0.1  1.1  1.2  0.1  0.1  0.1  0.1
nippl       "x cm from the nipple" (describing a mass)     1.2  0.1  0.2  0.8  2.3  4.6  2.8
malign      "no mammographic features of malignancy"       0.1  1.1  1.2  0.1  0.1  0.3  0.3
evalu       "incompletely evaluated"                       1.0  0.0  0.0  0.1  0.1  0.1  0.2
incomplet   "incompletely evaluated"                       0.9  0.0  0.0  0.0  0.0  0.0  0.0
mammograph  "no mammographic features of malignancy"       0.3  1.5  1.8  0.7  0.9  1.7  1.2
stabl       "stable post-biopsy change"                    0.2  0.3  1.5  0.7  0.4  0.2  0.5
calcif      "calcifications"                               0.6  0.1  0.7  1.3  1.5  1.9  2.0
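A minimal sketch of this ranking step, assuming the stem-count matrix X from the previous sketch and a vector y of BI-RADS codes (0-6), one per report. The poster's tooling is not stated; scikit-learn's chi2 computes the same kind of per-feature chi-squared statistic.

    import numpy as np
    from sklearn.feature_selection import chi2

    def rank_stems(X, y, stem_names, k=10):
        # One chi-squared statistic per stem; a high score means the
        # stem's counts depend strongly on the BI-RADS class.
        scores, _pvalues = chi2(X, y)
        order = np.argsort(scores)[::-1]  # most informative first
        return [(stem_names[i], scores[i]) for i in order[:k]]

    # e.g. rank_stems(X, y, vectorizer.get_feature_names_out())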
Classification

Technique                                          % Accuracy
Naive Bayes                                              76.4
Multinomial Naive Bayes                                  83.1
K-Nearest Neighbors (K=10)                               87.5
Support Vector Machines:
  LIBLINEAR (L2-norm, one-against-one)                   89.3
  LIBLINEAR (multiclass, Crammer and Singer)             89.3
  LIBLINEAR-POLY2 (polynomial kernel, degree 2)          90.1

Accuracy was determined using 10-fold cross-validation.

Misclassification error did not decrease significantly with more training data, indicating high bias; including more features, such as bigrams, did not improve performance.
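A minimal sketch of the classifier comparison under 10-fold cross-validation, again assuming X and y from above. scikit-learn's LinearSVC wraps LIBLINEAR, and multi_class="crammer_singer" selects the multiclass formulation from the table; the plain Naive Bayes row and the degree-2 polynomial variant are not reproduced here.

    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import LinearSVC

    models = {
        "Multinomial Naive Bayes": MultinomialNB(),
        "K-Nearest Neighbors (K=10)": KNeighborsClassifier(n_neighbors=10),
        "Linear SVM (L2-regularized)": LinearSVC(),
        "Linear SVM (Crammer-Singer)": LinearSVC(multi_class="crammer_singer"),
    }

    for name, model in models.items():
        # Mean accuracy over 10 folds, as in the results table.
        acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
        print(f"{name}: {acc:.1%}")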
The final confusion matrix (units are %) was:

True class   Classified as...
                 0     1     2     3      4     5     6
0             93.7   2.3   3.1   0.1    0.8   0.0   0.0
1              0.4  93.6   5.9   0.1    0.0   0.0   0.0
2              0.9  11.1  87.1   0.1    0.6   0.0   0.1
3              7.1  21.1  49.1   9.7   12.6   0.0   0.3
4              8.5   3.7  10.6   0.6   75.9   0.0   0.7
5              0.0   0.0   0.0   0.0  100.0   0.0   0.0
6              4.9   4.9  24.6   0.8   27.9   0.0  36.9
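A sketch of how such a row-normalized (percentage) confusion matrix can be computed from out-of-fold predictions; the choice of LinearSVC here is illustrative, not necessarily the model behind the table above.

    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import LinearSVC

    # Out-of-fold predictions from 10-fold cross-validation.
    y_pred = cross_val_predict(LinearSVC(), X, y, cv=10)

    # normalize="true" scales each row by the true-class total; multiply
    # by 100 to match the percentage units in the table above.
    cm = 100 * confusion_matrix(y, y_pred, labels=list(range(7)), normalize="true")
    print(cm.round(1))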
Conclusions

Radiologists' word choices are a good indicator of which BI-RADS class they choose, but the correspondence is not perfect, particularly for the higher BI-RADS values.

The development of training software for radiologists based on this approach could help them standardize their descriptions of images and learn to better describe which specific features of an image cause them to place it in a given class.
