CONTENTS
HISTORY
CONDITIONAL PROBABILITY
BAYES THEOREM
NAÏVE BAYES CLASSIFIER
BELIEF NETWORK
APPLICATION OF BAYESIAN NETWORK
PAPER ON CYBER CRIME DETECTION
HISTORY
http://en.wikipedia.org/wiki/Bayesian_probability
HISTORY (Cont.)
http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
HISTORY (Cont.)
Current uses of Bayesian Networks:
Microsoft's printer troubleshooter.
Diagnosing diseases (Mycin).
Predicting oil and stock prices.
Controlling the space shuttle.
Risk analysis of schedule and cost overruns.
CONDITIONAL PROBABILITY
P(A): probability of event A.
P(S) = 1, where S is the sample space.
Events A and B.
Example:
There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. Find the probability of picking a red ball from basket 1.
CONDITIONAL PROBABILITY
The question above asks for P(red ball | basket 1): intuitively, the probability of a red ball drawn from the sample space of basket 1 alone.
So the answer is 2/7.
The equations to solve it are:
P(A|B) = P(A,B) / P(B) [Product Rule]
P(A,B) = P(A) * P(B) [if A and B are independent]
How do you solve P(basket 2 | red ball)?
BAYESIAN THEOREM
A special case of the Bayesian Theorem:
P(A,B) = P(B) x P(A|B)
P(B,A) = P(A) x P(B|A)
Since P(A,B) = P(B,A),
P(B) x P(A|B) = P(A) x P(B|A)
=> P(A|B) = [P(A) x P(B|A)] / P(B)
BAYESIAN THEOREM
Solution to P(basket 2 | red ball):
P(basket 2 | red ball) = [P(b2) x P(r|b2)] / P(r)
= [(1/2) x (4/7)] / (6/14)
= 2/3 ≈ 0.66
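The arithmetic above can be double-checked with a short Python sketch (the variable names are mine, not the slides'):

```python
# Verify P(basket 2 | red ball) via Bayes' theorem.
p_b1, p_b2 = 0.5, 0.5     # each basket equally likely to be chosen
p_red_b1 = 2 / 7          # B1: 2 red of 7 balls
p_red_b2 = 4 / 7          # B2: 4 red of 7 balls

# Total probability of drawing a red ball (law of total probability).
p_red = p_b1 * p_red_b1 + p_b2 * p_red_b2   # 3/7 = 6/14

# Bayes' theorem: P(b2 | r) = P(b2) * P(r | b2) / P(r)
p_b2_red = (p_b2 * p_red_b2) / p_red
print(p_b2_red)  # 2/3 ≈ 0.667
```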
BAYESIAN THEOREM
Example 2: A medical cancer diagnosis problem
There are 2 possible outcomes of a diagnosis: +ve, -ve. We know 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
If a patient's test returns +ve, should we diagnose the patient as having cancer?
BAYESIAN THEOREM
P(cancer) = .008
P(+ve|cancer) = .98
P(+ve|-cancer) = .03
P(-cancer) = .992
P(-ve|cancer) = .02
P(-ve|-cancer) = .97
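Plugging these numbers into Bayes' theorem shows the answer; a minimal Python sketch (variable names are mine):

```python
p_cancer = 0.008
p_nocancer = 0.992
p_pos_cancer = 0.98      # P(+ve | cancer)
p_pos_nocancer = 0.03    # P(+ve | -cancer)

# Unnormalized posteriors for each hypothesis given a +ve test.
num_cancer = p_cancer * p_pos_cancer        # 0.00784
num_nocancer = p_nocancer * p_pos_nocancer  # 0.02976

# Normalize: P(cancer | +ve)
p_cancer_pos = num_cancer / (num_cancer + num_nocancer)
print(p_cancer_pos)  # ≈ 0.21, so the more probable diagnosis is no cancer
```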
BAYESIAN THEOREM
Example:
There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn: 1 is white and the other is red. What is the probability that they came from the first box?
BAYESIAN THEOREM
Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are white and red.
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = [2c1 x 4c1] / 9c2 = 2/9
P(A|E2) = [3c1 x 2c1] / 7c2 = 2/7
P(A|E3) = [4c1 x 3c1] / 8c2 = 3/7
BAYESIAN THEOREM
P(E1|A) = [P(E1) x P(A|E1)] / Σi P(Ei) x P(A|Ei)
= 0.23727
P(E2|A) = 0.30509
P(E3|A) = 1 - (0.23727 + 0.30509) = 0.45764
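The posteriors above can be reproduced in a few lines of Python, using `math.comb` for the nCr terms (variable names are mine):

```python
from math import comb

priors = [1/3, 1/3, 1/3]  # a box is chosen at random

# P(A | Ei): one white and one red out of 2 drawn balls.
likelihoods = [
    (2 * 4) / comb(9, 2),  # B1: 2 white, 4 red, 9 balls total -> 2/9
    (3 * 2) / comb(7, 2),  # B2: 3 white, 2 red, 7 balls total -> 2/7
    (4 * 3) / comb(8, 2),  # B3: 4 white, 3 red, 8 balls total -> 3/7
]

# Bayes' theorem with normalization over all three boxes.
evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posteriors)  # ≈ [0.237, 0.305, 0.458], i.e. 14/59, 18/59, 27/59
```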
BAYESIAN CLASSIFICATION
Why use Bayesian Classification:
Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
BAYESIAN CLASSIFICATION
Sources/References
Problem???
TECHNIQUE
Design issues
Approach
ASSUMPTIONS
Final Algorithm
LEARN_NAIVE_BAYES_TEXT(Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples
Vocabulary ← set of all distinct words and tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
For each target value vj in V do
docsj ← the subset of documents from Examples for which the target value is vj
P(vj) ← |docsj| / |Examples|
Textj ← a single document created by concatenating all members of docsj
n ← total number of distinct word positions in Textj
for each word wk in Vocabulary
nk ← number of times word wk occurs in Textj
P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)
CLASSIFY_NAIVE_BAYES_TEXT(Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.
positions ← all word positions in Doc that contain tokens found in Vocabulary
Return vNB, where
vNB = argmax over vj in V of P(vj) x Π over i in positions of P(ai|vj)
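The LEARN and CLASSIFY procedures above can be sketched in Python. This is a minimal illustration, not the original implementation; the toy documents and all names are my assumptions. Log probabilities are used in classification to avoid numeric underflow:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, classes):
    """examples: list of (word_list, class_label) pairs.
    Learns class priors P(v) and word probabilities P(w|v)
    with Laplace (add-one) smoothing, as in the algorithm above."""
    vocabulary = {w for doc, _ in examples for w in doc}
    priors, cond = {}, {}
    for v in classes:
        docs_v = [doc for doc, label in examples if label == v]
        priors[v] = len(docs_v) / len(examples)
        text_v = [w for doc in docs_v for w in doc]  # concatenated Text_j
        n = len(text_v)
        counts = Counter(text_v)
        cond[v] = {w: (counts[w] + 1) / (n + len(vocabulary))
                   for w in vocabulary}
    return vocabulary, priors, cond

def classify_naive_bayes_text(doc, vocabulary, priors, cond):
    """Return argmax_v log P(v) + sum of log P(a_i|v) over known words."""
    return max(priors, key=lambda v: math.log(priors[v]) +
               sum(math.log(cond[v][w]) for w in doc if w in vocabulary))

# Toy usage example (invented data):
examples = [("chinese beijing chinese".split(), "china"),
            ("tokyo japan chinese".split(), "japan")]
vocab, priors, cond = learn_naive_bayes_text(examples, ["china", "japan"])
print(classify_naive_bayes_text("beijing chinese".split(), vocab, priors, cond))
# -> china
```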
misc.forsale
soc.religion.christian
alt.atheism
comp.os.ms-windows.misc
rec.autos
talk.politics.guns
sci.space
comp.sys.ibm.pc.hardware
rec.sport.baseball
talk.politics.mideast
sci.crypt
comp.windows.x
rec.motorcycles
talk.politics.misc
sci.electronics
comp.sys.mac.hardware
rec.sport.hockey
talk.religion.misc
sci.med
APPLICATIONS
Thank you !
REFERENCES
1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag, New York, Inc., USA.
2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
7. http://www.bayesia.com, accessed on 20/12/2002.
Bayesian Networks
Syntax:
a set of nodes, one per variable
Some conventions.
Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(x1, ..., xn) = Πi P(xi | parents(Xi))
e.g. P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)
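The product-of-local-distributions semantics can be illustrated with a small sketch. The CPT numbers below are illustrative assumptions of mine, not values from the slides:

```python
# Local conditional probabilities for the five-node network
# (j, m, a, b, e), one entry per node given its parents.
# These numbers are made up for illustration.
cpt = {
    "b": 0.001,     # P(b)
    "e": 0.002,     # P(e)
    "a|b,e": 0.95,  # P(a | b, e)
    "j|a": 0.90,    # P(j | a)
    "m|a": 0.70,    # P(m | a)
}

# Joint probability = product of each node's local distribution:
# P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)
p_joint = cpt["j|a"] * cpt["m|a"] * cpt["a|b,e"] * cpt["b"] * cpt["e"]
print(p_joint)
```

The point of the factorization is that five small tables replace one table over all 2^5 joint assignments.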
Example of Construction of a BN
Description
Goal
How well does our model detect or classify attacks, and respond to them later on?
The system requires the estimation of two quantities:
the probability of detection (PD)
the probability of false alarm (PFA).
It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.
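PD and PFA are standard rates over a confusion matrix; a minimal sketch (the function name and counts are my illustration, not the paper's):

```python
def detection_rates(tp, fn, fp, tn):
    """PD = true-positive rate (attacks correctly flagged);
    PFA = false-positive rate (normal traffic wrongly flagged)."""
    pd = tp / (tp + fn)
    pfa = fp / (fp + tn)
    return pd, pfa

# Hypothetical counts: 100 attack records, 100 normal records.
pd, pfa = detection_rates(tp=90, fn=10, fp=5, tn=95)
print(pd, pfa)  # 0.9 0.05
```

Raising PD (flagging more traffic as attacks) tends to raise PFA as well, which is why both cannot simultaneously reach their ideal values of 1 and 0.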
Input DataSet
Sample dataset (figure not recovered; attributes shown include protocol_type).
Data Gathering
MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.
Observation 1:
As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).
Observation 2:
Observation 3:
Data
Performance evaluation
QUESTIONS OR QUERIES
Thank you !