
Breast Cancer Diagnostics with Bayesian Networks

Reevaluating the Wisconsin Breast Cancer Database with BayesiaLab

Stefan Conrady, stefan.conrady@conradyscience.com

Dr. Lionel Jouffe, jouffe@bayesia.com

March 5, 2011

Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting

Table of Contents

Introduction
About the Authors
Stefan Conrady
Lionel Jouffe

Case Study & Tutorial

Background
Wisconsin Breast Cancer Database
Notation
Data Import
Unsupervised Learning
Model 1: Markov Blanket
Model 1 Performance
Model 2: Augmented Markov Blanket
Model 2 Performance
Structural Coefficient
Conclusion

Model Application
Interactive Inference
Target Interpretation Tree
Summary

References

Contact Information
Conrady Applied Science, LLC
Bayesia SAS

Copyright


Introduction

Data classification is one of the most common tasks in the field of statistical analysis and countless methods have been
developed for this purpose over time. A common approach is to develop a model based on known historical data, i.e.
where the class membership of a record is known, and to use this generalization to predict the class membership for a
new set of observations.

Applications of data classification permeate virtually all fields of study, including the social sciences, engineering, and biology. In the medical field, classification problems often appear in the context of disease identification, i.e. making a diagnosis about a patient's condition. The medical sciences have a long history of developing a large body of knowledge that links observable symptoms with known types of illnesses. It is the physician's task to use the available medical knowledge to make inferences based on the patient's symptoms, i.e. to classify the medical condition, in order to enable appropriate treatment.

Over the last two decades, so-called medical expert systems have emerged, which are meant to support physicians in
their diagnostic work. Given the sheer amount of medical knowledge in existence today, it should not be surprising that
significant benefits are expected from such machine-based support in terms of medical reasoning and inference.

In this context, several papers by Wolberg, Street, Heisey and Mangasarian became much-cited examples. They proposed an automated method for the classification of Fine Needle Aspirates1 through image processing and machine learning, with the objective of achieving greater accuracy in distinguishing between malignant and benign cells for the diagnosis of breast cancer. At the time of their study, the practice of visual inspection of FNA yielded an inconsistent diagnostic accuracy. The proposed new approach would reliably increase this accuracy to over 95%. This research was quickly translated into clinical practice and has since been applied with continued success.

As part of their studies in the late 1980s and 1990s, the research team generated what became known as the Wisconsin
Breast Cancer Database, which contains measurements of hundreds of FNA samples and the associated diagnoses. This
database has been extensively studied, especially outside the medical field. Statisticians and computer scientists have
proposed a wide range of techniques for this classification problem and have continuously raised the benchmark for
predictive performance.

Our objective with this paper is to present Bayesian networks as a very practical framework for working with this kind of classification problem. Furthermore, we intend to demonstrate how the BayesiaLab software can, quickly and simply, create a Bayesian network model whose performance is on par with virtually all existing models. Also, while most of our previous white papers focused on marketing science applications, we hope that this case study from the medical field can demonstrate the universal applicability of Bayesian networks.

We speculate that our modeling approach with Bayesian networks (as the framework) and BayesiaLab (as the software tool) achieves 99% of the performance of the best conceivable, custom-developed model, while requiring only 10% of the development time. This allows researchers to focus more on the subject matter of their studies, because they are less distracted by the technicalities of traditional statistical tools. As a result, Bayesian networks and BayesiaLab are important innovations for accelerating research and pursuing translational science.

1 Fine needle aspiration (FNA) is a percutaneous ("through the skin") procedure that uses a fine gauge needle (22 or 25 gauge) and a syringe to sample fluid from a breast cyst or remove clusters of cells from a solid mass. With FNA, the cellular material taken from the breast is usually sent to the pathology laboratory for analysis.

About the Authors

Stefan Conrady
Stefan Conrady is the co-founder and managing partner of Conrady Applied Science, LLC, a privately held consulting
firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied
Science was appointed the authorized sales and consulting partner of Bayesia SAS for North America. Stefan Conrady
has extensive management experience in the fields of product planning, marketing science and advanced analytics. Prior
to establishing his own firm, he was heading the Analytics & Forecasting group at Nissan North America.

Lionel Jouffe
Dr. Lionel Jouffe is co-founder and CEO of France-based Bayesia SAS. Lionel Jouffe holds a Ph.D. in Computer Science
and has been working in the field of Artificial Intelligence since the early 1990s. He and his team have been developing
BayesiaLab since 1999 and it has emerged as the leading software package for knowledge discovery, data mining and
knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as
in business and industry. The relevance of Bayesian networks, especially in the context of consumer research, is highlighted by Bayesia's strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.


Case Study & Tutorial

Background
To provide context for this study, we quote Mangasarian, Street and Wolberg (1994), who conducted the original research on breast cancer diagnosis with digital image processing and machine learning:

Most breast cancers are detected by the patient as a lump in the breast. The majority of breast lumps are benign, so it is the physician's responsibility to diagnose breast cancer, that is, to distinguish benign lumps from malignant ones. There are three available methods for diagnosing breast cancer: mammography, FNA with visual interpretation and surgical biopsy. The reported sensitivity, i.e. ability to correctly diagnose cancer when the disease is present, of mammography varies from 68% to 79%, of FNA with visual interpretation from 65% to 98%, and of surgical biopsy close to 100%.

Therefore mammography lacks sensitivity, FNA sensitivity varies widely, and surgical biopsy, although accurate, is invasive, time consuming and costly. The goal of the diagnostic aspect of our research is to develop a relatively objective system that diagnoses FNAs with an accuracy that approaches the best achieved visually.

Wisconsin Breast Cancer Database


This breast cancer database was created through the clinical work of Dr. William H. Wolberg at the University of Wis-
consin Hospitals in Madison. As of 1992, Dr. Wolberg had collected 699 instances of patient diagnoses in this database,
consisting of two classes: 458 benign cases (65.5%) and 241 malignant cases (34.5%).

The following eleven attributes2 are included in the database:

1. Sample code number

2. Clump Thickness (1 - 10)

3. Uniformity of Cell Size (1 - 10)

4. Uniformity of Cell Shape (1 - 10)

5. Marginal Adhesion (1 - 10)

6. Single Epithelial Cell Size (1 - 10)

7. Bare Nuclei (1 - 10)

8. Bland Chromatin (1 - 10)

9. Normal Nucleoli (1 - 10)

10. Mitoses (1 - 10)

11. Class (benign/malignant)

Attributes 2 through 10 were computed from digital images of fine needle aspirates (FNA) of breast masses. These features describe the characteristics of the cell nuclei in the image. The class membership was established via subsequent biopsies or via long-term monitoring of the tumor.

2 Upon exclusion of the row identifier, this database is ideally suited for the evaluation version of BayesiaLab, which is
limited to ten nodes.


We will not go into detail here regarding the definition of the attributes and their measurement. Rather, we refer the
reader to papers referenced in the bibliography.

The Wisconsin Breast Cancer Database is available to any interested researcher from the UC Irvine Machine Learning
Repository.3 We use this database in its original format without any further transformation, so our results can be di-
rectly compared to dozens of methods that have been developed since the original study.

Notation
To clearly distinguish between natural language, software-specific functions and study-specific variable names, the fol-
lowing notation is used:

• BayesiaLab-specific functions, keywords, commands, etc., are shown in bold type.

• Attribute/variable/node names are capitalized and italicized.

Data Import
Our modeling process begins with importing the database, which is available in a CSV format, into BayesiaLab. The
Data Import Wizard guides the analyst through the required steps.

In the first dialogue box of the Data Import Wizard, we can click on Define Typing and specify that we wish to set aside a test set from the database. Following common practice, we will randomly select 20% of the 699 records as test data; the remaining 80% will serve as our training data set.
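
For readers who prefer to prepare the data outside BayesiaLab, the same 80/20 split can be reproduced with a short Python sketch. The file name, column names and random seed below are assumptions (the UCI file ships without a header row), so adjust them as needed.

# Sketch only: load the Wisconsin Breast Cancer Database and set aside
# a random 20% test set, mirroring the split described above.
# File name and column names are assumptions; the UCI file has no header row.
import pandas as pd
from sklearn.model_selection import train_test_split

columns = ["Sample code", "Clump Thickness", "Uniformity of Cell Size",
           "Uniformity of Cell Shape", "Marginal Adhesion",
           "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
           "Normal Nucleoli", "Mitoses", "Class"]

# "?" marks the missing values of Bare Nuclei in the raw file.
data = pd.read_csv("breast-cancer-wisconsin.data", names=columns, na_values="?")

train, test = train_test_split(data, test_size=0.2, random_state=42)
print(len(train), "training cases,", len(test), "test cases")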

3 UC Irvine Machine Learning Repository website: http://archive.ics.uci.edu/ml/


In the next step, the Data Import Wizard suggests a data type for each variable (or attribute4). Attributes 2 through 10 are identified as continuous variables, and Class is read as a discrete variable. Only for the first variable, Sample code, does the analyst have to specify Row Identifier, so that it is not mistaken for a continuous predictor variable.

For the import process of this study, the most important step is the selection of the discretization algorithm. As we know
that the exclusive objective is classification, we will choose the Decision Tree algorithm, which will discretize each vari-
able for an optimum information gain with respect to the target variable Class.

Bayesian networks are entirely non-parametric, probabilistic models, and their estimation requires a certain minimum number of observations. To help us with the selection of discretization levels, we use the heuristic of five observations per parameter and probability cell. Given that we have a relatively small database with only 560 observations,5 three discretization intervals for each variable appear to be an appropriate choice. If we used a higher number of discretization levels, we would most likely need more observations for the reliable estimation of the parameters.
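
The Decision Tree discretization itself is built into BayesiaLab, but its underlying idea can be illustrated in Python: fit a shallow decision tree of one variable against Class and reuse the learned split points as interval boundaries. The sketch below is an approximation of that idea, not BayesiaLab's exact algorithm.

# Sketch: supervised, decision-tree-based discretization of a single variable.
# Illustrative approximation only, not BayesiaLab's exact implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_bins(x, y, n_intervals=3):
    """Return up to n_intervals - 1 cut points chosen to maximize the
    information gain of the binned variable with respect to the target y."""
    tree = DecisionTreeClassifier(criterion="entropy", max_leaf_nodes=n_intervals)
    tree.fit(np.asarray(x, dtype=float).reshape(-1, 1), y)
    # Internal nodes carry the learned thresholds; leaf nodes are marked -2.
    thresholds = tree.tree_.threshold[tree.tree_.feature >= 0]
    return np.sort(thresholds)

# Example (names reused from the loading sketch above):
# cuts = tree_bins(train["Clump Thickness"], train["Class"])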

4 “Attribute” and “variable” are used interchangeably throughout the paper.


5 560 cases are in the training set (80%) and 139 are in the test set (20%).


Upon clicking Finish, we will immediately see a representation of the newly imported database in the form of a fully
unconnected Bayesian network. Each variable is now represented as a blue node in the graph panel of BayesiaLab.

The question mark symbol, which is associated with the Bare Nuclei node, indicates that there are missing values for
this variable. Hovering over the question mark with the mouse pointer while pressing the “i” key will show the number
of missing values.

Unsupervised Learning
When working with BayesiaLab, it is recommended to always perform Unsupervised Learning first on any newly im-
ported database. This is the case, even when the exclusive objective is predictive modeling, for which Supervised Learn-
ing will later be the main tool.

Learning>Association Discovering>EQ will initiate the EQ algorithm, which, in this case, is suitable for the initial review
of the database. For larger databases with significantly more variables, the Maximum Weight Spanning Tree is a very
fast algorithm and can be used first instead.

The analyst can visually review the learned network structure and compare it to his or her domain knowledge. This
quickly provides a “sanity check” for the database and the variables and it may highlight any inconsistencies.


Furthermore, one can also display the Pearson correlation between the nodes, by selecting Analysis>Graphic>Pearson’s
Correlation and clicking the Display Arc Comment button in the toolbar.

For instance, a potentially incorrect sign of a correlation would be noticed immediately by the analyst, as the arcs are color-coded: red and blue arcs indicate negative and positive Pearson correlations respectively.

Model 1: Markov Blanket


Now that all data is stored within BayesiaLab (and reviewed through the Unsupervised Learning step), we can proceed
to the modeling stage. Given our objective of predicting the state (benign versus malignant) of the variable Class, we will
define it as the Target Variable by right-clicking on the node and selecting Set as Target Variable from the contextual
menu. We need to specify this explicitly, so the subsequent Supervised Learning algorithm can use Class as the depend-
ent variable. The supervised learning algorithms are then available under Learning>Target Node Characterization.


In most cases, the Markov Blanket algorithm is a good starting point for any predictive model. This algorithm is ex-
tremely fast and can even be applied to databases with thousands of variables and millions of records, although data-
base size is not a concern in this particular study.

The Markov Blanket for a node A is the set of nodes composed of A’s parents, its children, and its children’s other par-
ents (=spouses).

The Markov Blanket of node A contains all the variables which, if we know their states, shield node A from the rest of the network. This means that the Markov Blanket of a node is the only knowledge needed to predict the behavior of that node. Learning a Markov Blanket selects the relevant predictor variables, which is particularly helpful when there is a large number of variables in the database. (In fact, this can also serve as a highly efficient variable selection method in preparation for other types of modeling outside the Bayesian network framework.)
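
To make the definition concrete, the following Python sketch collects the Markov Blanket of a node from a directed graph stored as parent sets. The structure used in the example is purely hypothetical and only serves to illustrate the mechanics; it is not the learned network.

# Sketch: the Markov blanket of a node = its parents, its children, and the
# children's other parents (spouses), given a DAG as {node: set of parents}.
def markov_blanket(node, parents):
    children = {n for n, ps in parents.items() if node in ps}
    spouses = {p for child in children for p in parents[child]} - {node}
    return parents.get(node, set()) | children | spouses

# Tiny hypothetical structure for illustration only.
parents = {
    "Class": set(),
    "Uniformity of Cell Size": {"Class"},
    "Bare Nuclei": {"Class", "Uniformity of Cell Size"},
    "Mitoses": {"Class"},
}
print(markov_blanket("Class", parents))
# -> {'Uniformity of Cell Size', 'Bare Nuclei', 'Mitoses'}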

Upon Markov Blanket learning for our database, the resulting Bayesian network looks as follows:


This suggests that Class has a direct probabilistic relationship with all variables except Marginal Adhesion and Single Epithelial Cell Size, which are disconnected. The absence of a connection with the Target indicates that these nodes are independent of the Target given the nodes in the Markov Blanket.

For a better visual interpretation, we will apply the Force Directed Layout algorithm and obtain a view with Class at its center. Both unconnected variables are shown at the bottom of the graph.

Beyond distinguishing between predictors (connected nodes) and non-predictors (disconnected nodes), we can further examine the relationship with the Target Node Class by highlighting the Mutual Information of the arcs connecting the nodes. This function is accessible within the Validation Mode via Analysis>Graphic>Arcs' Mutual Information.


The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship between
the nodes. Intuitively, Mutual Information measures the information that X and Y share: it measures how much know-
ing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then know-
ing X does not provide any information about Y and vice versa, so their Mutual Information is zero. At the other ex-
treme, if X and Y are identical then all information conveyed by X is shared with Y: knowing X determines the value of
Y and vice versa.

Formal Definition of Mutual Information

I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right)
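
As a numerical illustration of this definition, the sketch below estimates the Mutual Information of two discrete variables from their observed joint frequencies. It is a generic computation (in bits), independent of BayesiaLab.

# Sketch: mutual information I(X;Y) of two discrete variables, estimated
# from observed joint counts; the result is expressed in bits.
import numpy as np
import pandas as pd

def mutual_information(x, y):
    pxy = pd.crosstab(x, y).values.astype(float)
    pxy /= pxy.sum()                      # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])))

# I(X;X) equals the entropy of X; I(X;Y) is zero if X and Y are independent.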


We can also show the values of the Mutual Information on the graph by clicking on Display Arc Comments.

In the top part of the comment box attached to each arc, the Mutual Information of the arc is shown.
Below, expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the
direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative mutual
information in the opposite direction of the arc (child node ➔ parent node).

Model 1 Performance
As we are not equipped with specific domain knowledge about the variables, we will not further interpret these relation-
ships but rather run an initial test for Network Performance — we want to know how well this Markov Blanket model
can predict the states of the Class variable, i.e. benign versus malignant. This test is available via Analysis>Network Per-
formance>Targeted.

Using our previously defined test set for validating our model, we obtain the following, rather encouraging results:

Markov Blanket - Test Set


Of the 87 benign cases of the test set, 96.5% were correctly identified (true negative), which corresponds to a false posi-
tive rate of 3.5%. More importantly though, of the 52 malignant cases, 100% were identified correctly (true positive)
with no false negatives. This yields a total precision of 97.8%.
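
These figures follow directly from the test-set confusion matrix. A short sketch of the arithmetic, with the benign counts inferred from the percentages reported above:

# Sketch: deriving the reported rates from the test-set confusion matrix.
# Benign is treated as the negative class, malignant as the positive class;
# the counts 84/3 are inferred from the 96.5%/3.5% split of 87 benign cases.
tn, fp = 84, 3   # benign cases: correctly identified vs. false positives
tp, fn = 52, 0   # malignant cases: all identified, no false negatives

true_negative_rate = tn / (tn + fp)                  # 84/87  -> 96.5%
false_positive_rate = fp / (tn + fp)                 # 3/87   -> 3.5%
total_precision = (tp + tn) / (tp + tn + fp + fn)    # 136/139 -> 97.8%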

Analogous to the original papers on this topic, we will also perform a K-Fold Cross Validation, which will iteratively
select different test and training sets, and, based on those, learn and test the model. The Cross Validation can be per-
formed via Tools>Cross Validation>Targeted.

We choose 10 samples, i.e. 10 iterations with 69 cases as test samples and 630 training cases.
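
The principle of this procedure can also be sketched outside BayesiaLab. In the sketch below, fit and evaluate are placeholders for learning the network on the training folds and scoring it on the held-out fold; they are assumptions, not BayesiaLab functions.

# Sketch: K-fold cross-validation with k = 10, i.e. 10 iterations in which
# roughly 69 cases serve as the test sample and the remaining ~630 for training.
from sklearn.model_selection import KFold

def cross_validate(data, fit, evaluate, k=10):
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(data):
        model = fit(data.iloc[train_idx])                     # learn on ~630 cases
        scores.append(evaluate(model, data.iloc[test_idx]))   # test on ~69 cases
    return sum(scores) / len(scores)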

The results from the Cross Validation confirm the good performance of this model. The overall precision is 96.7%, with a false negative rate of 2.9%.


Markov Blanket - Cross Validation

At this point we might be tempted to conclude our analysis, as our Markov Blanket modeling is already performing at a
level comparable to the most sophisticated (and complex) models ever developed from this database. More remarkable
though is the minimal effort that was required for creating our model with the Supervised Learning algorithms in
BayesiaLab. Even a new user of BayesiaLab would be expected to replicate the above steps in less than 30 minutes.

Model 2: Augmented Markov Blanket


BayesiaLab offers an extension to the Markov Blanket algorithm, named Augmented Markov Blanket, which performs unsupervised learning on the nodes previously selected by the Markov Blanket learning. This makes it possible to identify influence paths between the predictor variables and can potentially help improve the prediction performance.

This sequence of algorithms can be started via Learning>Target Node Characterization>Augmented Markov Blanket.


As can be expected, the resulting network is somewhat more complex than the standard Markov Blanket.

The additional arcs (compared to the Markov Blanket network) are highlighted with green markers.

Model 2 Performance
With this Augmented Markov Blanket network we now proceed to performance evaluations, analogous to the Markov
Blanket model. Initially, we evaluate the performance on the test set.


Augmented Markov Blanket - Test Set

To complete the evaluation of this model, we will also perform a K-Fold Cross Validation.

Augmented Markov Blanket - Cross-Validation


Despite the greater complexity of the model, we only see a marginal improvement in overall precision.

Structural Coefficient
Up to this point, we have not addressed the Structural Coefficient (SC), which is the only adjustable parameter for all
the learning algorithms in BayesiaLab. This parameter is available to manage network complexity.

By default, this Structural Coefficient is set to 1, which reliably prevents the learning algorithms from overfitting the
model to the data. In studies with relatively few observations, the analyst’s judgment is needed for determining a poten-
tial downward adjustment of this parameter. On the other hand, when data sets are very large, increasing the parameter
to values higher than 1 will help manage the network complexity.
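
Conceptually, the Structural Coefficient weights the structural (complexity) part of the score that the learning algorithms try to minimize against the data-fit part. The sketch below illustrates this trade-off with a generic MDL-style score; it is a simplified illustration of the coefficient's role, not BayesiaLab's exact scoring function.

# Sketch: how a structural coefficient trades off network complexity against
# data fit in an MDL-style score (simplified; not BayesiaLab's exact formula).
def mdl_score(structure_bits, data_bits, sc=1.0):
    """Lower is better. structure_bits encodes the graph and its parameters;
    data_bits encodes the data given the model; sc weights the structure term."""
    return sc * structure_bits + data_bits

# sc < 1 penalizes complexity less (richer networks, higher risk of overfitting);
# sc > 1 penalizes complexity more (sparser networks for very large data sets).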

Given the fairly simple network structure of Model 1, complexity was of no concern. Model 2 is more complex, but still
very manageable. The question is, could a more complex network provide greater precision without overfitting? To an-
swer this question, we will perform the Structural Coefficient Analysis, which generates several metrics that help in mak-
ing a trade-off between complexity and precision. The function Tools>Cross Validation>Structural Coefficient Analysis
starts this process.

We are prompted to specify the range of the Structural Coefficient to be examined and the number of iterations. The
Number of Iterations determines the interval steps to be taken within the specified range of the Structural Coefficient.
Given the relatively light computational load, we choose 50 iterations. With more complex models, we might be more
conservative, as each iteration re-learns and re-evaluates the network. Furthermore, we select Compute Structure/
Target’s Precision Ratio to compute our target metric.

The resulting report will show us how the network structure changes as a function of the Structural Coefficient. This can
be interpreted as the degree of confidence the analyst should have in any particular arc in the structure.


Clicking Graphs will show a synthesized network consisting of all structures generated during the iterative learning process.


The reference structure is represented by black arcs, which show the original network learned prior to the start of the
Structural Coefficient Analysis. The blue-colored arcs are not contained in the reference structure, but they appear in
networks that have been learned as a function of the different Structural Coefficients (SC). The thickness of the arcs is
proportional to the frequency of individual arcs existing in the learned networks.

More important for us, however, is determining the correct level of network complexity for reliable and accurate prediction performance while avoiding overfitting the data. We can plot several different metrics in this context by clicking Curve.

Structure/Target’s Precision Ratio is the most relevant metric in our case and the corresponding plot is shown below.
This first plot shows the metric computed for the whole database.

Typically, the “elbow” of the L-shaped curve identifies a suitable value for the Structural Coefficient (SC). More formally, we would look for the point on the curve where the second derivative is maximized. By visual inspection, an SC value of around 0.4 appears to be a good candidate for that point. The portion of the curve where SC values approach 0 shows the characteristic pattern of overfitting, which is to be avoided.
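
The second-derivative criterion can also be applied numerically to the exported curve. The sketch below assumes the curve is available as two arrays of SC values (in ascending order) and the corresponding Structure/Target's Precision Ratio values.

# Sketch: locate the "elbow" as the SC value where the discrete second
# derivative of the ratio curve is largest (a simple numerical heuristic).
import numpy as np

def elbow(sc_values, ratio_values):
    sc = np.asarray(sc_values, dtype=float)
    ratio = np.asarray(ratio_values, dtype=float)
    second_derivative = np.gradient(np.gradient(ratio, sc), sc)
    return sc[np.argmax(second_derivative)]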

In order to further validate this interpretation, we will also compute the same metric for the training/test database.


This graph has the same properties as the previous one and suggests a similar SC value. As a result, we can have some
confidence in this new value for the Structural Coefficient.

We will also plot the Target’s Precision alone as a function of the SC. On the surface, the curve resembles an L-shape,
too, but the curve moves only within roughly 1 percentage point, i.e. between 97% and 98%. For practical purposes,
this means that the curve is virtually flat.


As a result, the Structure/Target's Precision Ratio, i.e. \frac{\text{Structure}}{\text{Target's Precision}}, is primarily a function of the numerator, Structure, as the denominator, Target's Precision, is nearly constant across a wide range of SC values, as per the graph above.

The joint interpretation of Target's Precision and the Structure/Target's Precision Ratio indicates that little can be gained by lowering the SC, but that there is a definite risk of overfitting.

Nevertheless, we relearn the network with an SC of 0.4, generating, as expected, a more complex network, which is
displayed below.

The performance of the model (with SC=0.4) on the test set appears to be virtually the same,


Augmented Markov Blanket (SC=0.4) - Test Set

and the result from the K-Fold Cross Validation is not materially different from the previous performance with SC=1.

Augmented Markov Blanket (SC=0.4) - Cross-Validation


Conclusion
The models reviewed, Markov Blanket and Augmented Markov Blanket (SC=0.4 and SC=1), have performed at virtually
indistinguishable levels in terms of classification performance. The greater complexity of either Augmented Markov
Blanket specification did not yield the expected precision gain. Precision and false negatives are shown as the key met-
rics in the summary table below.

Summary                              Test Set (n=139)             Cross Validation (n=699)
                                     Precision  False Negatives   Precision  False Negatives
Markov Blanket                       97.84%     0                 96.71%     7
Augmented Markov Blanket (SC=1)      97.84%     1                 97.14%     5
Augmented Markov Blanket (SC=0.4)    98.56%     0                 96.71%     7

In this situation, the choice should fall on the most parsimonious specification, as this provides the best prospect of good generalization of the model beyond the samples observed in this study. The originally specified Markov Blanket model is thus recommended as the model of choice.

Reestimating these models with more observations could potentially change this conclusion and might more clearly dif-
ferentiate the classification performance. For now, however, we select the Markov Blanket model and it will serve as the
basis for the next section of this paper, Model Application.


Model Application

Interactive Inference
Without further discussion of the merits of each model specification, we will now show how the learned Markov Blanket model can be applied in practice. For instance, we can use BayesiaLab to review the individual classification predictions made by the model. This feature is called Interactive Inference and can be accessed via Inference>Adaptive Inference.

This will bring up Monitors for all variables in the Monitor Panel, and the navigation bar above allows scrolling
through each record of the test set. Record #0 can be seen below with all the associated observations highlighted in
green. Given the observations shown, the model predicts a 99.76% probability that the cells from this FNA sample are
malignant (the Monitor is highlighted in red).
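
Behind each of these predictions lies ordinary probabilistic inference: the posterior probability of Class given the observed evidence. For readers who want to see the mechanics, the following sketch performs exact inference by enumeration on a small discrete network. The structure and probabilities in the example are made up for illustration; they are not the learned model.

# Sketch: exact inference by enumeration in a small discrete Bayesian network.
# A network is stored as {node: (tuple of parents, CPT)}, where the CPT maps
# a tuple of parent values to {node value: probability}.
from itertools import product

def joint_probability(network, assignment):
    """Probability of one complete assignment under the factored model."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        parent_values = tuple(assignment[parent] for parent in parents)
        p *= cpt[parent_values][assignment[node]]
    return p

def posterior(network, states, query, evidence):
    """P(query | evidence), obtained by summing the joint over all completions."""
    free = [n for n in network if n not in evidence and n != query]
    scores = {}
    for q_value in states[query]:
        total = 0.0
        for combo in product(*(states[n] for n in free)):
            assignment = dict(zip(free, combo))
            assignment.update(evidence)
            assignment[query] = q_value
            total += joint_probability(network, assignment)
        scores[q_value] = total
    z = sum(scores.values())
    return {value: score / z for value, score in scores.items()}

# Tiny hypothetical example (probabilities are made up; the priors mirror the
# 65.5% / 34.5% class split of the database).
states = {"Class": ["benign", "malignant"], "Mitoses": ["low", "high"]}
network = {
    "Class":   ((), {(): {"benign": 0.655, "malignant": 0.345}}),
    "Mitoses": (("Class",), {("benign",):    {"low": 0.95, "high": 0.05},
                             ("malignant",): {"low": 0.40, "high": 0.60}}),
}
print(posterior(network, states, "Class", {"Mitoses": "high"}))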

For reference, we will also show record #22, which is classified as benign.


Most cases are rather clear-cut, as above, with record #19 being one of the few exceptions. Here, the probability of ma-
lignancy is 73%.

Target Interpretation Tree


In situations when only individual cases are under review by a pathologist (rather than a batch of cases from a database), BayesiaLab can also express the model in the form of a Target Interpretation Tree. It is a kind of decision tree, which prescribes the sequence in which evidence should be sought to gain the maximum amount of information towards a diagnosis. As can be seen in the tree diagram, Uniformity of Cell Size provides the highest information gain. Upon obtaining this piece of evidence, Uniformity of Cell Shape will bring the highest information gain among the remaining variables. Due to the size of a complete Target Interpretation Tree, only three levels of evidence are shown in the following diagram.


In our particular example, this may not be relevant, as all pieces of evidence, i.e. all observations regarding the FNA, are obtained simultaneously. However, in the context of other diagnostic methods, such as mammography and surgical biopsy, a tree-based decision structure can help prioritize the sequence of exams, given the evidence obtained up to that point.
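
The greedy principle behind such a tree, namely asking next for the variable that is most informative about Class given the evidence already collected, can be sketched as follows. This is a conceptual illustration working directly on the data and reusing the mutual_information helper from the earlier sketch; BayesiaLab derives the actual tree from the learned network rather than from raw data.

# Sketch: greedily pick the next piece of evidence as the variable with the
# highest mutual information with the target among the records consistent
# with the evidence collected so far. Conceptual illustration only.
# (The Sample code identifier is assumed to have been dropped beforehand.)
def next_best_evidence(data, target, evidence):
    subset = data
    for variable, value in evidence.items():
        subset = subset[subset[variable] == value]
    candidates = [c for c in subset.columns if c != target and c not in evidence]
    return max(candidates,
               key=lambda c: mutual_information(subset[c], subset[target]))

# With no evidence yet, this would typically return "Uniformity of Cell Size",
# the variable that provides the highest information gain about Class.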

Summary
By using Bayesian networks as the framework, we have shown a practical new modeling approach based on the widely
studied Wisconsin Breast Cancer Database. Our prediction accuracy is comparable with the results of all known studies
on this topic.

With BayesiaLab as the software tool, modeling with Bayesian networks becomes accessible to a very broad range of analysts and researchers, including non-statisticians. The speed of modeling, analysis and subsequent implementation makes BayesiaLab a suitable tool in many areas of research, and especially for translational science.


References
Abdrabou, E. A. M. L., and A. E. B. M. Salem. “A Breast Cancer Classifier Based on a Combination of Case-Based Reasoning and Ontology Approach.”
El-Sebakhy, E. A., K. A. Faisal, T. Helmy, F. Azzedin, and A. Al-Suhaim. “Evaluation of Breast Cancer Tumor Classification with Unconstrained Functional Networks Classifier.” In the 4th ACS/IEEE International Conference on Computer Systems and Applications, 281–287, 2006.
Hung, M. S., M. Shanker, and M. Y. Hu. “Estimating Breast Cancer Risks Using Neural Networks.” Journal of the Operational Research Society 53, no. 2 (2002): 222–231.
Karabatak, M., and M. C. Ince. “An Expert System for Detection of Breast Cancer Based on Association Rules and Neural Network.” Expert Systems with Applications 36, no. 2 (2009): 3465–3469.
Mangasarian, Olvi L., W. Nick Street, and William H. Wolberg. “Breast Cancer Diagnosis and Prognosis via Linear Programming.” Operations Research 43 (1995): 570–577.
Mu, T., and A. K. Nandi. “Breast Cancer Diagnosis from Fine-Needle Aspiration Using Supervised Compact Hyperspheres and Establishment of Confidence of Malignancy.”
Wolberg, W. H., W. N. Street, D. M. Heisey, and O. L. Mangasarian. “Computer-Derived Nuclear Features Distinguish Malignant from Benign Breast Cytology.” Human Pathology 26, no. 7 (1995): 792–796.
Wolberg, William H., W. Nick Street, and O. L. Mangasarian. “Machine Learning Techniques to Diagnose Breast Cancer from Image-Processed Nuclear Features of Fine Needle Aspirates.” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.2109.
Wolberg, William H., W. Nick Street, and Olvi L. Mangasarian. “Breast Cytology Diagnosis via Digital Image Analysis” (1993). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.9894.


Contact Information

Conrady Applied Science, LLC


312 Hamlet’s End Way
Franklin, TN 37067
USA
+1 888-386-8383
info@conradyscience.com
www.conradyscience.com

Bayesia SAS
6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
+33(0)2 43 49 75 69
info@bayesia.com
www.bayesia.com

Copyright
© 2011 Conrady Applied Science, LLC and Bayesia SAS. All rights reserved.

Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following:

• You may print or download this document for your personal and noncommercial use only.

• You may copy the content to individual third parties for their personal use, but only if you acknowledge Conrady
Applied Science, LLC and Bayesia SAS as the source of the material.

• You may not, except with our express written permission, distribute or commercially exploit the content. Nor may
you transmit it or store it in any other website or other form of electronic retrieval system.
