Machine Learning - 2
Neural Networks & Genetic Algorithms
Today
Last time: Decision Trees (Evaluation & Unsupervised Learning)
Neural Networks: Perceptrons, Multilayer Neural Networks
Genetic Algorithms
Neural Networks
Biological Neurons
Behavior of a Neuron
A Simple Perceptron
Learning: use the training data to adjust the weights in the network.
The Idea
Features (xi) and output:

Student    First last year?  Male?  Works hard?  Drinks?  First this year?
Richard    Yes               Yes    No           Yes      No
Alan       Yes               Yes    Yes          No       Yes
[Slide figure: the perceptron training procedure, Steps 1-4.]
A Simple Example

Student    A last year?  Male?  Works hard?  Drinks?  A this year?
Richard    Yes           Yes    No           Yes      No
Alan       Yes           Yes    Yes          No       Yes
Alison     No            No     Yes          No       No
Jeff       No            Yes    No           Yes      No
Gail       Yes           No     Yes          Yes      Yes
Simon      No            Yes    Yes          Yes      No
Richard (initial weights all 0.2, threshold 0.55):
(1×0.2) + (1×0.2) + (0×0.2) + (1×0.2) = 0.6 ≥ 0.55 → output is 1,
but he did not get an A this year.
So reduce all weights of active connections (those with input 1) by 0.05.
So we get w1 = 0.15, w2 = 0.15, w3 = 0.2, w4 = 0.15.
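The update above can be sketched in a few lines. This is a minimal illustration of the slide's rule (fixed threshold 0.55, fixed step 0.05, move only the weights of active inputs); the function names `predict` and `update` are my own.

```python
def predict(weights, x, t=0.55):
    """Fire (output 1) iff the weighted sum reaches the threshold t."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= t else 0

def update(weights, x, target, step=0.05, t=0.55):
    """If the prediction is wrong, move the weights of active inputs
    (x_i = 1) up or down by a fixed step."""
    out = predict(weights, x, t)
    if out == target:
        return weights
    sign = 1 if target > out else -1        # raise if output too low, lower if too high
    return [w + sign * step if xi == 1 else w
            for w, xi in zip(weights, x)]

# Richard: features (1, 1, 0, 1), did not get an A (target 0)
w = [0.2, 0.2, 0.2, 0.2]
print(predict(w, [1, 1, 0, 1]))            # 1 (0.6 >= 0.55, but wrong)
w = update(w, [1, 1, 0, 1], 0)
print([round(v, 2) for v in w])            # [0.15, 0.15, 0.2, 0.15]
```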
Alan:
(1×0.15) + (1×0.15) + (1×0.2) + (0×0.15) = 0.5 < 0.55 → output is 0,
but he did get an A this year.
So increase all weights of active connections by 0.05.
So we get w1 = 0.2, w2 = 0.2, w3 = 0.25, w4 = 0.15.
Let's check (w1 = 0.2, w2 = 0.1, w3 = 0.25, w4 = 0.1):
Richard: (1×0.2) + (1×0.1) + (0×0.25) + (1×0.1) = 0.4 < 0.55 → output is 0
Alan: (1×0.2) + (1×0.1) + (1×0.25) + (0×0.1) = 0.55 ≥ 0.55 → output is 1
Alison: (0×0.2) + (0×0.1) + (1×0.25) + (0×0.1) = 0.25 < 0.55 → output is 0
Jeff: (0×0.2) + (1×0.1) + (0×0.25) + (1×0.1) = 0.2 < 0.55 → output is 0
Gail: (1×0.2) + (0×0.1) + (1×0.25) + (1×0.1) = 0.55 ≥ 0.55 → output is 1
Simon: (0×0.2) + (1×0.1) + (1×0.25) + (1×0.1) = 0.45 < 0.55 → output is 0
All six students are now classified correctly.
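The check above is mechanical, so it is easy to verify in code. This sketch replays the slide's final weights against all six students; the dictionary layout and names are mine.

```python
# Final weights and threshold from the slide.
W = [0.2, 0.1, 0.25, 0.1]
T = 0.55

# (A last year, Male, Works hard, Drinks) -> got an A this year
students = {
    "Richard": ([1, 1, 0, 1], 0),
    "Alan":    ([1, 1, 1, 0], 1),
    "Alison":  ([0, 0, 1, 0], 0),
    "Jeff":    ([0, 1, 0, 1], 0),
    "Gail":    ([1, 0, 1, 1], 1),
    "Simon":   ([0, 1, 1, 1], 0),
}

for name, (x, label) in students.items():
    s = sum(w * xi for w, xi in zip(W, x))
    pred = 1 if s >= T else 0
    print(f"{name}: sum={s:.2f} pred={pred} actual={label}")
    assert pred == label     # the perceptron classifies everyone correctly
```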
The perceptron computes a linear decision boundary, w1x1 + w2x2 − t = 0, which divides the input space into a decision region for C1 and a decision region for C2.
[Slide figures: the decision boundary in the (x1, x2) plane, and the analogous separating plane in three dimensions (x1, x2, x3).]
Adding a Bias
With a bias b, the neuron computes b + Σi xiwi.
[Slide figure: a two-input neuron with inputs x1, x2, weights w1, w2, and bias b.]
[Slide figure: a perceptron with inputs x1, …, xn and weights w1, …, wn; the bias is folded in as an extra input xn+1 = 1 with weight wn+1.]
The unit computes the weighted sum Σi=1..n+1 wi xi, passes it through an activation function f, and outputs the final classification:
y = f( Σi=1..n+1 wi xi )
Possible activation functions f:

Step function:
y = 1 if Σi=1..n+1 wi xi ≥ t
    0 otherwise

Sign function:
y = +1 if Σi=1..n+1 wi xi ≥ 0
    −1 otherwise

Sigmoid function:
y = 1 / (1 + e^(−Σi=1..n+1 wi xi))
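The three activation functions are one-liners. A minimal sketch, written for a pre-computed weighted sum s = Σ wi xi; the function names are mine.

```python
import math

def step(s, t=0.55):
    """Step: 1 if s >= t, else 0 (t is the threshold)."""
    return 1 if s >= t else 0

def sign(s):
    """Sign: +1 if s >= 0, else -1."""
    return 1 if s >= 0 else -1

def sigmoid(s):
    """Sigmoid: squashes s smoothly into (0, 1); differentiable."""
    return 1.0 / (1.0 + math.exp(-s))

print(step(0.6), sign(-0.3), round(sigmoid(0.0), 2))   # 1 -1 0.5
```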
Delta Rule
1. Δw = η (T − O), where η is the learning rate, T the target output, and O the actual output.
So:
2. Δwi = η (T − O) xi
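The delta rule in code: each weight moves in proportion to the error (T − O) and its own input xi, scaled by the learning rate η. With η = 0.05 and binary outputs this reproduces the fixed ±0.05 updates used earlier; the helper name `delta_rule` is mine.

```python
def delta_rule(weights, x, target, output, eta=0.1):
    """Apply one delta-rule update: w_i += eta * (T - O) * x_i."""
    return [w + eta * (target - output) * xi for w, xi in zip(weights, x)]

# Alan's case: the unit output 0 but the target is 1, so the weights of
# active inputs (x_i = 1) grow by eta = 0.05.
w = delta_rule([0.15, 0.15, 0.2, 0.15], [1, 1, 1, 0],
               target=1, output=0, eta=0.05)
print([round(v, 2) for v in w])   # [0.2, 0.2, 0.25, 0.15]
```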
A second worked example: assume a perceptron with initial weights w1 = 0.75, w2 = 0.5, w3 = −0.6.
[Slide figures: the training data, the training iterations, and the resulting output table over inputs x1 and x2.]
A Perceptron Network
So far, we looked at a single perceptron, which can only separate one class C1 from its complement ~C1.
Ex: learning to recognize digits → 10 possible outputs → need a perceptron network.
[Slide figure: a network of perceptrons, one per class, with outputs C1/~C1, C2/~C2, …, C6/~C6.]
The sigmoid function (which is differentiable) is used as the activation function.
Multilayer Neural Networks
A multilayer network consists of an input layer, one or more hidden layers, and an output layer.
Decision Boundaries
[Slide figures: the linear decision boundary of a single perceptron, followed by the more complex decision regions of multilayer networks.]
Feed-forward + Backpropagation
Feed-forward: pass an input through the network, layer by layer, to compute the output.
Backpropagation: propagate the error at the output back through the network to adjust the weights.
repeat the forward pass and backward pass for the next data point
until all data points are examined (1 epoch)
repeat this entire exercise until the overall error is minimised
Ex: squared error = ½ Σi∈Output layer (Targeti − Actuali)² < 0.001
Backpropagation
In a multilayer network, the error is known only at the output layer and must be propagated back to the hidden units.
Intuitively: each hidden unit receives a share of the error of the output units it feeds into, weighted by the connection strengths.
Backpropagation Visually
Goal: minimize the error by moving downhill on the error surface, stepping from (w1, w2) to (w1+Δw1, w2+Δw2).
[Slide figure: the error surface over weight space.]
Oi = sigmoid( Σj wji xj ) = 1 / (1 + e^(−Σj wji xj))

Using the derivative of the sigmoid, g'(x) = g(x)(1 − g(x)), the error terms are:
for an output unit k: δk = Ok (1 − Ok) (Tk − Ok)
for a hidden unit h: δh = Oh (1 − Oh) Σk∈outputs whk δk
Example: XOR
[Slide figure: a network with inputs x1, x2, hidden units 3 and 4, and output unit 5 producing O5.]
Initial weights and thresholds: w13 = 0.5, w23 = 0.4, w14 = 0.9, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ3 = 0.8, θ4 = −0.1, θ5 = 0.3.
(w13 = 0.5 and θ3 = 0.8 are not legible here; they are the values consistent with O3 = 0.5250 in the computation below.)
Each unit computes
Oi = sigmoid( Σj wji xj ) = 1 / (1 + e^(−Σj wji xj))
[Slide: the XOR truth table of inputs x1, x2 and target output T.]
Output of neuron 5 for x1 = 1, x2 = 1 (target T = 0): O5 ≈ 0.5097.
The output error term δ5 will be used to modify w35 and w45.
δ3 = O3 (1 − O3) δ5 w35
   = 0.5250 × (1 − 0.5250) × (−0.1274) × (−1.2)
   = 0.0381
δ4 = O4 (1 − O4) δ5 w45
   = 0.8808 × (1 − 0.8808) × (−0.1274) × 1.1
   = −0.0147
δ3 is used to modify w13 and w23; δ4 is used to modify w14 and w24.
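The forward pass and the three error terms can be checked numerically. A minimal sketch of the slides' XOR example, treating each threshold θ as a negative bias; w13 = 0.5 and θ3 = 0.8 are assumed values consistent with O3 = 0.5250.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Initial weights and thresholds of the XOR network.
w13, w23, w14, w24 = 0.5, 0.4, 0.9, 1.0
w35, w45 = -1.2, 1.1
th3, th4, th5 = 0.8, -0.1, 0.3

x1 = x2 = 1
T = 0                                      # XOR(1, 1) = 0

# Forward pass.
O3 = sigmoid(x1 * w13 + x2 * w23 - th3)    # ~0.5250
O4 = sigmoid(x1 * w14 + x2 * w24 - th4)    # ~0.8808
O5 = sigmoid(O3 * w35 + O4 * w45 - th5)    # ~0.5097

# Backward pass: output delta first, then the hidden deltas.
d5 = O5 * (1 - O5) * (T - O5)              # ~ -0.1274
d3 = O3 * (1 - O3) * d5 * w35              # ~  0.0381
d4 = O4 * (1 - O4) * d5 * w45              # ~ -0.0147
print(round(d5, 4), round(d3, 4), round(d4, 4))   # -0.1274 0.0381 -0.0147
```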
[Slide figure: the network with x1 = 1, x2 = 1, showing O5, δ5 = −0.1274, and δ4 = −0.0147.]
Ex: squared error = ½ Σi∈Output layer (Ti − Oi)² < 0.001
The Result
After training, the weights converge to:
w23 = 4.76, w14 = 6.39, w24 = 6.39, w35 = −10.38, w45 = 9.77, θ4 = 2.84, θ5 = 4.56
[Slide figure: the trained network and its output O5.]
Error is minimized:

x1  x2   Target T5   Actual O5   Error e
1   1    0           0.0155      −0.0155
0   1    1           0.9849       0.0151
1   0    1           0.9849       0.0151
0   0    0           0.0175      −0.0175
Speech synthesis
NETtalk: uses the context, i.e. the letters around a letter, to learn how to pronounce that letter.
Input: a letter and its surrounding letters
Output: phoneme
NETtalk Architecture
Neural Networks
Disadvantage:
Advantage:
Genetic Algorithms
Solution Representation
Ex: AATAGC
Genetic Operators
Crossover: combine two parents by swapping their tails after a crossover point:
1#0# + 0#10 → 1#10 and 0#0#
Mutation: randomly change one position:
0#10 → 0#11
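Both operators are simple string surgery. A minimal sketch reproducing the slide's examples (crossover point after position 2); the helper names are mine.

```python
def crossover(a, b, point):
    """One-point crossover: swap the tails after `point`, giving two children."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(s, pos, symbol):
    """Point mutation: replace the character at `pos` with `symbol`."""
    return s[:pos] + symbol + s[pos + 1:]

print(crossover("1#0#", "0#10", 2))   # ('1#10', '0#0#')
print(mutate("0#10", 3, "1"))         # 0#11
```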
Example (1)
Features (xi) and output f(X):

Student    A last year?  Male?    Works hard?  Drinks?  A this year?
Richard    Yes / 1       Yes / 1  No / 0       Yes / 1  No / 0
Alan       Yes / 1       Yes / 1  Yes / 1      No / 0   Yes / 1
Alison     No / 0        No / 0   Yes / 1      No / 0   No / 0
Jeff       No / 0        Yes / 1  No / 0       Yes / 1  No / 0
Gail       Yes / 1       No / 0   Yes / 1      Yes / 1  Yes / 1
Simon      No / 0        Yes / 1  Yes / 1      Yes / 1  No / 0

Initial hypotheses, chosen at random.
Fitness function: the proportion of students predicted correctly.
#1## (only males will get an A) predicts correctly 2/6 -- Alan, Alison
0#10 predicts correctly 3/6 -- Richard, Jeff, Simon
11## predicts correctly 4/6 -- Alan, Alison, Jeff, Simon
#11# predicts correctly 4/6 -- Richard, Alan, Alison, Jeff
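The four fitness scores above can be recomputed. This sketch assumes the slide's convention: a rule over {0, 1, #} predicts "A this year" iff every non-# position matches the student's features; the names `matches` and `fitness` are mine.

```python
# (A last year, Male, Works hard, Drinks) -> A this year
STUDENTS = [
    ([1, 1, 0, 1], 0),  # Richard
    ([1, 1, 1, 0], 1),  # Alan
    ([0, 0, 1, 0], 0),  # Alison
    ([0, 1, 0, 1], 0),  # Jeff
    ([1, 0, 1, 1], 1),  # Gail
    ([0, 1, 1, 1], 0),  # Simon
]

def matches(rule, x):
    """A '#' matches anything; '0'/'1' must equal the feature value."""
    return all(c == "#" or int(c) == xi for c, xi in zip(rule, x))

def fitness(rule):
    """Fraction of the six students classified correctly by the rule."""
    correct = sum((1 if matches(rule, x) else 0) == label
                  for x, label in STUDENTS)
    return correct / len(STUDENTS)

for rule in ["#1##", "0#10", "11##", "#11#"]:
    print(rule, round(fitness(rule), 3))   # 2/6, 3/6, 4/6, 4/6
```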
Example (2)
Fitness function:
Proportional Selection
Individuals are selected with probability proportional to their fitness:
fitness(A) = 3/6 → 50%
fitness(B) = 1/6 → 17%
fitness(C) = 2/6 → 33%
[Slide figure: a pie chart of the selection probabilities.]
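Proportional selection is usually implemented as a roulette wheel: draw a random point on [0, total fitness) and walk through the cumulative fitnesses. A minimal sketch using the A/B/C fitnesses above; the function name `proportional_select` is mine.

```python
import random

def proportional_select(population, rng=random.random):
    """Pick an individual with probability fitness / total fitness.
    `population` is a list of (individual, fitness) pairs."""
    total = sum(f for _, f in population)
    r = rng() * total                 # random point on the wheel
    acc = 0.0
    for individual, f in population:
        acc += f
        if r <= acc:
            return individual
    return population[-1][0]          # guard against rounding at the end

pop = [("A", 3/6), ("B", 1/6), ("C", 2/6)]
random.seed(0)
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10000):
    counts[proportional_select(pop)] += 1
# Empirical frequencies should be close to 50% / 17% / 33%.
print({k: v / 10000 for k, v in counts.items()})
```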
Proportional Cross-over
Example: the N-queens problem. If N = 8, there are 12 unique solutions (up to symmetry).
Problem Representation
Observation that eliminates many arrangements from consideration: no two queens can share a column, so a state can be represented by one row number per column, e.g.
<16257483>
source: Russell & Norvig (2003)
Fitness Function
Fitness = the number of non-attacking pairs of queens (at most C(8,2) = 28).
Ex: fitness = 28 − 1 = 27 (one attacking pair).
source: Russell & Norvig (2003)
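The fitness function is a pair count over the row-per-column representation. A minimal sketch; the test board "15863724" is a known 8-queens solution used here for illustration.

```python
from itertools import combinations

def fitness(state):
    """Number of non-attacking pairs of queens, out of C(8,2) = 28.
    `state` lists the row of the queen in each column, e.g. '16257483'."""
    rows = [int(c) for c in state]
    non_attacking = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(rows), 2):
        # Columns are distinct by construction; check rows and diagonals.
        if r1 != r2 and abs(r1 - r2) != abs(c1 - c2):
            non_attacking += 1
    return non_attacking

print(fitness("16257483"))   # 27 = 28 - 1 (one attacking pair)
print(fitness("15863724"))   # 28: a full solution
```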
Crossing Over
[Slide figure: four parent states with fitnesses 28−5 = 23, 28−4 = 24, 20, and 11.]
Selection probabilities are proportional to fitness:
23/(24+23+20+11) = 23/78 ≈ 29%
24/(24+23+20+11) = 24/78 ≈ 31%, etc.
Genetic Algorithms
Applications:
Difficulties: