CONTENTS
HISTORY
CONDITIONAL PROBABILITY
BAYES THEOREM
NAÏVE BAYES CLASSIFIER
BELIEF NETWORK
APPLICATION OF BAYESIAN NETWORK
PAPER ON CYBER CRIME DETECTION
HISTORY
http://en.wikipedia.org/wiki/Bayesian_probability
HISTORY (Cont.)
http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
HISTORY (Cont.)
Current uses of Bayesian Networks:
Microsoft's printer troubleshooter.
Diagnosing diseases (Mycin).
Predicting oil and stock prices.
Controlling the space shuttle.
Risk analysis of schedule and cost overruns.
CONDITIONAL PROBABILITY
P(A): probability of event A.
P(S) = 1, where S is the sample space.
Events A and B.
Example:
There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. Find the probability of picking a red ball from basket 1.
CONDITIONAL PROBABILITY
The question above asks for P(red ball | basket 1): intuitively, the probability of a red ball drawn from the sample space of basket 1 alone.
So the answer is 2/7.
The equations to solve it are:
P(A|B) = P(A,B) / P(B) [Product Rule]
P(A,B) = P(A) * P(B) [if A and B are independent]
How do you solve P(basket 2 | red ball)?
BAYESIAN THEOREM
A special case of the Bayesian Theorem:
P(A,B) = P(B) x P(A|B)
P(B,A) = P(A) x P(B|A)
Since P(A,B) = P(B,A),
P(B) x P(A|B) = P(A) x P(B|A)
=> P(A|B) = [P(A) x P(B|A)] / P(B)
BAYESIAN THEOREM
Solution to P(basket 2 | red ball):
P(basket 2 | red ball) = [P(b2) x P(r|b2)] / P(r)
= [(1/2) x (4/7)] / (6/14)
= 2/3 ≈ 0.66
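The arithmetic above can be double-checked with a short Python sketch (the variable names are mine, not the slides'):

```python
# Verify P(basket 2 | red ball) via Bayes' theorem.
p_b1, p_b2 = 0.5, 0.5     # each basket equally likely to be chosen
p_red_b1 = 2 / 7          # B1: 2 red of 7 balls
p_red_b2 = 4 / 7          # B2: 4 red of 7 balls

# Total probability of drawing a red ball (law of total probability).
p_red = p_b1 * p_red_b1 + p_b2 * p_red_b2   # 3/7 = 6/14

# Bayes' theorem: P(b2 | r) = P(b2) * P(r | b2) / P(r)
p_b2_red = (p_b2 * p_red_b2) / p_red
print(p_b2_red)  # 2/3 ≈ 0.667
```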
BAYESIAN THEOREM
Example 2: A medical cancer diagnosis problem
There are 2 possible outcomes of a diagnosis: +ve, -ve. We know 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
If a patient's test returns +ve, should we diagnose the patient as having cancer?
BAYESIAN THEOREM
P(cancer) = .008
P(+ve|cancer) = .98
P(+ve|-cancer) = .03
P(-cancer) = .992
P(-ve|cancer) = .02
P(-ve|-cancer) = .97
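Plugging these numbers into Bayes' theorem shows the answer; a minimal Python sketch (variable names are mine):

```python
p_cancer = 0.008
p_nocancer = 0.992
p_pos_cancer = 0.98      # P(+ve | cancer)
p_pos_nocancer = 0.03    # P(+ve | -cancer)

# Unnormalized posteriors for each hypothesis given a +ve test.
num_cancer = p_cancer * p_pos_cancer        # 0.00784
num_nocancer = p_nocancer * p_pos_nocancer  # 0.02976

# Normalize: P(cancer | +ve)
p_cancer_pos = num_cancer / (num_cancer + num_nocancer)
print(p_cancer_pos)  # ≈ 0.21, so the more probable diagnosis is no cancer
```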
BAYESIAN THEOREM
Example:
There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn: 1 is white and the other is red. What is the probability that they came from the first box?
BAYESIAN THEOREM
Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are white and red.
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = [2c1 x 4c1] / 9c2 = 2/9
P(A|E2) = [3c1 x 2c1] / 7c2 = 2/7
P(A|E3) = [4c1 x 3c1] / 8c2 = 3/7
BAYESIAN THEOREM
P(E1|A) = [P(E1) x P(A|E1)] / Σi P(Ei) x P(A|Ei)
= 0.23727
P(E2|A) = 0.30509
P(E3|A) = 1 - (0.23727 + 0.30509) = 0.45764
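The posteriors above can be reproduced in a few lines of Python, using `math.comb` for the nCr terms (variable names are mine):

```python
from math import comb

priors = [1/3, 1/3, 1/3]  # a box is chosen at random

# P(A | Ei): one white and one red out of 2 drawn balls.
likelihoods = [
    (2 * 4) / comb(9, 2),  # B1: 2 white, 4 red, 9 balls total -> 2/9
    (3 * 2) / comb(7, 2),  # B2: 3 white, 2 red, 7 balls total -> 2/7
    (4 * 3) / comb(8, 2),  # B3: 4 white, 3 red, 8 balls total -> 3/7
]

# Bayes' theorem with normalization over all three boxes.
evidence = sum(p * l for p, l in zip(priors, likelihoods))
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posteriors)  # ≈ [0.237, 0.305, 0.458], i.e. 14/59, 18/59, 27/59
```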
BAYESIAN CLASSIFICATION
Why use Bayesian Classification:
Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
BAYESIAN CLASSIFICATION
Sources/References
Problem???
TECHNIQUE
Design issues
Approach
ASSUMPTIONS
Final Algorithm
LEARN_NAIVE_BAYES_TEXT(Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples
Vocabulary ← set of all distinct words and tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms
For each target value vj in V do
docsj ← the subset of documents from Examples for which the target value is vj
P(vj) ← |docsj| / |Examples|
Textj ← a single document created by concatenating all members of docsj
n ← total number of distinct word positions in Textj
for each word wk in Vocabulary
nk ← number of times word wk occurs in Textj
P(wk|vj) ← (nk + 1) / (n + |Vocabulary|)
CLASSIFY_NAIVE_BAYES_TEXT(Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.
positions ← all word positions in Doc that contain tokens found in Vocabulary
Return vNB, where
vNB = argmax over vj in V of P(vj) x Π over i in positions of P(ai|vj)
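The LEARN and CLASSIFY procedures above can be sketched in Python. This is a minimal illustration, not the original implementation; the toy documents and all names are my assumptions. Log probabilities are used in classification to avoid numeric underflow:

```python
import math
from collections import Counter

def learn_naive_bayes_text(examples, classes):
    """examples: list of (word_list, class_label) pairs.
    Learns class priors P(v) and word probabilities P(w|v)
    with Laplace (add-one) smoothing, as in the algorithm above."""
    vocabulary = {w for doc, _ in examples for w in doc}
    priors, cond = {}, {}
    for v in classes:
        docs_v = [doc for doc, label in examples if label == v]
        priors[v] = len(docs_v) / len(examples)
        text_v = [w for doc in docs_v for w in doc]  # concatenated Text_j
        n = len(text_v)
        counts = Counter(text_v)
        cond[v] = {w: (counts[w] + 1) / (n + len(vocabulary))
                   for w in vocabulary}
    return vocabulary, priors, cond

def classify_naive_bayes_text(doc, vocabulary, priors, cond):
    """Return argmax_v log P(v) + sum of log P(a_i|v) over known words."""
    return max(priors, key=lambda v: math.log(priors[v]) +
               sum(math.log(cond[v][w]) for w in doc if w in vocabulary))

# Toy usage example (invented data):
examples = [("chinese beijing chinese".split(), "china"),
            ("tokyo japan chinese".split(), "japan")]
vocab, priors, cond = learn_naive_bayes_text(examples, ["china", "japan"])
print(classify_naive_bayes_text("beijing chinese".split(), vocab, priors, cond))
# -> china
```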
misc.forsale
soc.religion.christian
alt.atheism
comp.os.ms-windows.misc
rec.autos
talk.politics.guns
sci.space
comp.sys.ibm.pc.hardware
rec.sport.baseball
talk.politics.mideast
sci.crypt
comp.windows.x
rec.motorcycles
talk.politics.misc
sci.electronics
comp.sys.mac.hardware
rec.sport.hockey
talk.religion.misc
sci.med
APPLICATIONS
Thank you !
REFERENCES
1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag, New York, Inc., USA.
2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
7. http://www.bayesia.com, accessed on 20/12/2002.
Bayesian Networks
Syntax:
a set of nodes, one per variable
Some conventions.
Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(x1, ..., xn) = Πi P(xi | parents(Xi))
e.g. P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)
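The product-of-local-distributions semantics can be illustrated with a small sketch. The CPT numbers below are illustrative assumptions of mine, not values from the slides:

```python
# Local conditional probabilities for the five-node network
# (j, m, a, b, e), one entry per node given its parents.
# These numbers are made up for illustration.
cpt = {
    "b": 0.001,     # P(b)
    "e": 0.002,     # P(e)
    "a|b,e": 0.95,  # P(a | b, e)
    "j|a": 0.90,    # P(j | a)
    "m|a": 0.70,    # P(m | a)
}

# Joint probability = product of each node's local distribution:
# P(j, m, a, b, e) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)
p_joint = cpt["j|a"] * cpt["m|a"] * cpt["a|b,e"] * cpt["b"] * cpt["e"]
print(p_joint)
```

The point of the factorization is that five small tables replace one table over all 2^5 joint assignments.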
Example of Construction of a BN
Description
Goal
How well does our model detect or classify attacks, and respond to them later on?
The system requires the estimation of two quantities:
the probability of detection (PD)
the probability of false alarm (PFA).
It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.
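PD and PFA are standard rates over a confusion matrix; a minimal sketch (the function name and counts are my illustration, not the paper's):

```python
def detection_rates(tp, fn, fp, tn):
    """PD = true-positive rate (attacks correctly flagged);
    PFA = false-positive rate (normal traffic wrongly flagged)."""
    pd = tp / (tp + fn)
    pfa = fp / (fp + tn)
    return pd, pfa

# Hypothetical counts: 100 attack records, 100 normal records.
pd, pfa = detection_rates(tp=90, fn=10, fp=5, tn=95)
print(pd, pfa)  # 0.9 0.05
```

Raising PD (flagging more traffic as attacks) tends to raise PFA as well, which is why both cannot simultaneously reach their ideal values of 1 and 0.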
Input DataSet
Sample dataset (figure not recovered; attributes shown include protocol_type).
Data Gathering
MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.
Observation 1:
As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).
Observation 2:
Observation 3:
Data
Performance evaluation
QUESTIONS OR QUERIES
Thank you !