OVERVIEW
Tony Cooper
Senior Data Scientist
tonycooper@kpmg.co.nz
July 2016
kpmg.com/nz
Agenda
Introduction
Machine Learning
- Deep Learning
- Transfer Learning
- Reinforcement Learning
Introduction
Last meeting: what Machine Learning can do
This meeting: how Machine Learning works
Not covering: how to do Machine Learning (e.g. test/train split)
Not covering: applications (see e.g. a long list at
http://www.deeplearningpatterns.com/doku.php/applications)
Machine Learning
Recommender Systems
Computational Advertising
Hyperpersonalisation (segmentation with segment size 1)
Computer Vision
GPU programming
KPMG Hardware
7 Node Spark Cluster (7 x 2 Xeon)
1 GPU Server:
- 4 x Tesla K80 (4 x 24 GB GPU RAM, 4 x 5000 cores, 4 x 5.8 TeraFLOPs)
- 2 x Xeon, 14 cores each (56 threads)
- 1 TB RAM
- 6 TB SSD
Agenda
Introduction
Machine Learning
- Deep Learning
- Transfer Learning
- Reinforcement Learning
Machine Learning
(Diagram: overlapping fields: Statistical Learning (SL), Machine Learning (ML), AI, and Machine Intelligence; learning divides into supervised, unsupervised, and semi-supervised)
(Diagram: recommended books, ranging from practical to technical and from Dummies to expert level; the Dummies book's R and Python code can be downloaded without buying the book, and one expert text is machine learning's best kept secret)
Internet
http://deeplearning.stanford.edu/tutorial/
http://cs231n.stanford.edu/
(some images in this presentation taken from there)
Recommender System
(Diagram: a 3 x 5 ratings matrix with missing entries, factorised into a 3 x 2 user-feature matrix with entries h11…h32 and a 2 x 5 item-feature matrix with entries w11…w25; for example)
r12 = h11·w12 + h12·w22
11 equations in 16 unknowns
Generically:
rij = hi1·w1j + hi2·w2j + … (one term per latent feature)
Solved using Alternating Least Squares (the machine chooses the latent features)
The machine did the work for us in deciding what features to use
R pseudo code
# ratings matrix (NA = unrated)
R <- matrix(nrow=3, ncol=5, data=c(4,2,1,NA,5,NA,3,3,3,NA,4,2,1,3,NA))
# initial users matrix
h <- matrix(nrow=3, ncol=2, data=rnorm(6))
# initial items matrix
w <- matrix(nrow=5, ncol=2, data=rnorm(10))
# alternate between the two least-squares fits to minimise the squared error
for (iter in 1:5) {
  # update users: regress each user's observed ratings on w
  for (i in 1:3) {
    h[i, ] <- solve(...)
  }
  # update items: regress each item's observed ratings on h
  for (j in 1:5) {
    w[j, ] <- solve(...)
  }
}
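The pseudo code above can be made concrete. Below is a minimal NumPy sketch of the same Alternating Least Squares idea; the ratings matrix matches the one in the R code, while the seed and iteration count are arbitrary choices, not from the slides:

```python
import numpy as np

# 3 users x 5 items ratings matrix; np.nan marks unrated items
R = np.array([
    [4, np.nan, 3, np.nan, 1],
    [2, 5,      3, 4,      3],
    [1, np.nan, 3, 2,      np.nan],
])
observed = ~np.isnan(R)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 2))   # user latent features
w = rng.normal(size=(5, 2))   # item latent features

for it in range(50):
    # update users: least-squares fit of each user's observed ratings on w
    for i in range(3):
        m = observed[i]
        h[i] = np.linalg.lstsq(w[m], R[i, m], rcond=None)[0]
    # update items: least-squares fit of each item's observed ratings on h
    for j in range(5):
        m = observed[:, j]
        w[j] = np.linalg.lstsq(h[m], R[m, j], rcond=None)[0]

pred = h @ w.T                # every cell filled in, including the missing ones
rmse = np.sqrt(np.mean((pred[observed] - R[observed]) ** 2))
print(round(rmse, 3))
```

Each half-step is an ordinary least-squares problem, which is why alternating them is cheap and drives the error down quickly.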
Title | Sale Method | Bedrooms | Land Area | Floor Area | Existing/New | Valuation | Valuation Year | Sale Price
  | P | 2 |     |    |   |        |      | 308000
  | P | 3 |     |    |   |        |      | 400000
C | P | 3 |     |    | E |        |      | 484000
F | P | 3 |     |    | E |        |      | 625000
  | P | 4 |     |    |   |        |      | 695000
  | P | 3 |     |    |   |        |      | 760000
C | A | 3 |     |    | E | 790000 | 2011 | 945000
F | P | 2 | 511 |    | E |        |      | 730000
F | P | 5 | 556 |    | E | 670000 |      | 815000
F | A | 3 | 612 |    | E | 570000 |      | 810000
  | P | 3 | 754 |    |   |        |      | 400000
F | A | 3 | 809 |    | E | 730000 | 2011 | 780000
  | P | 1 |     | 32 |   | 220000 |      | 223000
S | P | 1 |     | 45 | E | 220000 | 2008 | 265000
  | P | 1 |     | 51 |   |        |      | 173000
S | P | 2 |     | 51 | E | 230000 | 2011 | 230000
  | P | 2 |     | 54 |   |        |      | 336575
  | P | 2 |     | 59 |   | 270000 |      | 289800
F | P | 2 |     | 60 | E |        |      | 405000
C | T | 2 |     | 70 |   |        |      | 385000
  | P | 2 |     | 70 |   | 340000 |      | 370000
  | A | 2 |     | 70 |   | 340000 |      | 380000
C | A | 2 |     | 74 | E |        |      | 505000
  | P | 2 |     | 74 |   | 375000 |      | 395500
F | P | 3 | 202 | 80 |   |        |      | 640000
U | P | 2 |     | 81 | E |        |      | 305000
  | P | 2 |     | 81 |   |        |      | 315000
Problems
The world isn't linear
Doesn't handle interactions easily (Samuel Johnson: "Your manuscript is good and original, but what is original is not good; what is good is not original")
Doesn't handle missing values at all
Doesn't handle correlated inputs well
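The interaction problem is easy to see numerically. A small NumPy sketch (synthetic data, purely illustrative): ordinary least squares on a response that is a pure x1·x2 interaction explains almost nothing until the product is supplied as a feature, which is exactly the manual feature engineering linear models force on us:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(500, 2))
y = x[:, 0] * x[:, 1]              # pure interaction: no main effects at all

def r_squared(X, y):
    """Fit ordinary least squares and return R^2 on the training data."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

ones = np.ones((len(x), 1))
r2_raw = r_squared(np.hstack([x, ones]), y)                        # x1, x2 only
r2_inter = r_squared(np.hstack([x, (x[:, 0] * x[:, 1])[:, None], ones]), y)

print(round(r2_raw, 3))    # near 0: the linear model cannot see the interaction
print(round(r2_inter, 3))  # near 1: works once we hand-craft the x1*x2 feature
```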
(Figures: scatter plots of Response against Input, fitted with curves of increasing flexibility)
- Cubic Fit
- Quartic Fit
- Quintic Fit
- Overfitting Fit
- Sigmoid
y = w1·h1 + w2·h2 + … + wn·hn
Linear: the hi are constant (fixed inputs)
y = w1·sigmoid1(x) + w2·sigmoid2(x) + … + wn·sigmoidn(x)
A neural network is just a bunch of weighted sigmoid regressions; n is the number of nodes
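That claim can be shown literally. In the NumPy sketch below (toy data; the slopes and centres of the sigmoids are fixed at random rather than trained, an assumption made to keep it short), the output layer is nothing but a linear regression on n sigmoid features:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x)              # a wiggly response to fit

n = 8                                   # number of hidden nodes
slopes = rng.normal(scale=10, size=n)   # how steep each sigmoid is
centres = rng.uniform(0, 1, size=n)     # where each sigmoid switches
H = sigmoid(np.outer(x, slopes) - slopes * centres)   # h1(x) ... hn(x)
H = np.column_stack([H, np.ones_like(x)])             # plus a bias term

# y ~ w1*h1(x) + ... + wn*hn(x): a weighted sum of sigmoid regressions
w = np.linalg.lstsq(H, y, rcond=None)[0]
fit = H @ w
rmse = np.sqrt(np.mean((fit - y) ** 2))
print(round(rmse, 3))                   # a bendy curve from straight-line tools
```

A real network would also learn the slopes and centres by gradient descent; fixing them at random is enough to show where the flexibility comes from.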
(Figures: further fits of Response against Input; the final panel shows Overfitting)
playground.tensorflow.org
Agenda
Introduction
Machine Learning
- Deep Learning
- Transfer Learning
- Reinforcement Learning
- Deep Learning
Same Problem as
Suburb | List Price | Agreement Date | Type
Mt Roskill | 308000 | 40972 | R
Mt Roskill | 300000 | 40944 | R
Mt Albert  |        | 41007 | R
Mt Albert  |        | 40900 | R
Mt Albert  | 695000 | 40728 | R
Mt Roskill | 760000 | 40862 | R
Mt Albert  |        | 40961 | R
Mt Albert  |        | 40996 | R
Mt Albert  |        | 41016 | R
Mt Eden    |        | 40856 | R
Mt Roskill | 380000 | 40985 | R
Mt Albert  |        | 40975 | R
Mt Eden    | 160000 | 40757 | R
Mt Eden    |        | 40689 | R
Mt Eden    | 173000 | 40996 | APT
Mt Albert  | 249000 | 40967 | APT
Mt Eden    | 359000 | 40819 | R
Mt Albert  | 299000 | 40985 | R
Mt Eden    |        | 40709 | R
Mt Eden    |        | 40974 | APT
Mt Albert  | 380000 | 40750 | R
Mt Eden    |        | 40788 | R
Mt Albert  |        | 40985 | R
Mt Albert  | 399000 | 40994 | R
Mt Eden    | 665000 | 40966 | R
Mt Albert  | 319000 | 40711 | APT
Mt Albert  | 319000 | 40757 | APT
(housing sales table repeated from earlier)
Linear Regression
Tree
Random Forest
Gradient Boosting
Going Deeper
playground.tensorflow.org
Adding X1*X2
Go and Play!
(use ReLU)
deeplearning4j.org
Autoencoders
Compression
Dimension Reduction (resembles PCA)
Noise reduction (MRI example)
Drawing stuff
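The PCA resemblance can be made precise for a linear autoencoder. A NumPy sketch (synthetic data; the SVD is used directly to produce the encoder and decoder a trained linear autoencoder would converge to):

```python
import numpy as np

# synthetic data that really lives in 2 dimensions, embedded in 5
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5))

# a linear autoencoder with a 2-unit bottleneck finds the top principal
# subspace; the SVD gives that optimum directly
Vt = np.linalg.svd(X, full_matrices=False)[2]
encode = Vt[:2].T          # 5 inputs -> 2 bottleneck units (compression)
decode = Vt[:2]            # 2 bottleneck units -> 5 reconstructed outputs
X_hat = (X @ encode) @ decode

err = np.max(np.abs(X - X_hat))
print(err < 1e-9)          # True: nothing is lost, the data was 2-D all along
```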
Convolutions
Just functions that combine pixels in a weighted way
A way of getting a correlation between a shape and parts of an image
Example: find red circles in an image, find edges
Measure how much the image part matches the shape
(clarifai.com)
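The "weighted combination of pixels" idea fits in a few lines. A NumPy sketch, with a toy 5 x 6 image and a hand-made vertical-edge kernel (both purely illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel across the image, taking a weighted sum of the
    pixels under it at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# toy image: dark left half, bright right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# kernel that scores how strongly brightness rises from left to right
kernel = np.array([[-1.0, 1.0]])
edges = convolve2d(image, kernel)
print(edges[0])   # responds only at the column where the edge sits
```

The output at each position measures how well that patch of the image matches the shape the kernel encodes, which is all a convolutional layer's filters do.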
Agenda
Introduction
Machine Learning
- Deep Learning
- Transfer Learning
- Reinforcement Learning
- Transfer Learning
(one more reason why the robots will be eating our lunch)
Transfer Learning
Transfer Learning uses an existing set of weights for an existing Deep Learning
network and adapts (retrains) some of the layers for a new set of images. This
lets us (and robots) transfer learning to new tasks.
The next two examples show a case where the last three layers of a network
downloaded from the internet are retrained to distinguish between cars and
SUVs. The fruit image example does no retraining but takes weights from the
second-to-last layer as inputs to a model trained in R.
Results - before
Results - after
setwd("~/MATLAB/matconvnet")
library(readr)   # read_csv, write_csv
library(caret)   # trainControl, train, confusionMatrix (xgboost also required)
cat("reading data file\n")
train <- read_csv("fruit.csv", col_names=FALSE)
train <- as.data.frame(train)   # plain data frame so column indexing works below
rows <- nrow(train)
cols <- ncol(train)
# the last column holds the class label
train[, cols] <- factor(train[, cols], labels=c("peaches", "apricots"))
set.seed(2) # 2 gives a nice even split
inTrain <- sample(rows) # shuffle; no test sample
x.train <- train[inTrain, -cols]
y.train <- train[inTrain, cols]
fitControl <- trainControl(method="repeatedcv", number=4, repeats=50) #, classProbs=TRUE, summaryFunction=twoClassSummary)
cat("training model\n")
xgbGrid <- expand.grid(.nrounds=4, .max_depth=3, .eta=0.3, .gamma=0, .colsample_bytree=0.6, .min_child_weight=1) # good
xgb <- train(x=x.train, y=y.train, method='xgbTree', trControl=fitControl, tuneGrid=xgbGrid)
save(xgb, file="xgb.RData")
pred <- predict(xgb, newdata=x.train)
confusionMatrix(pred, y.train)
# output the predicted values
cat("saving predicted labels\n")
pred <- predict(xgb, newdata=train[, -cols])
pred <- data.frame(pred)   # wrap as a data frame for write_csv
write_csv(pred, "fruitresults.csv", col_names=FALSE)
cat("done\n")
Result
Agenda
Introduction
Machine Learning
- Deep Learning
- Transfer Learning
- Reinforcement Learning
- Reinforcement Learning
(one more reason why the robots will be eating our lunch)
Reinforcement Learning
Semi-supervised learning (not all input data has labels)
Labels are sparse
The labels come some time after the input data (when we get our reward we don't know which actions in the past contributed to that reward)
https://gym.openai.com/envs
Q-Learning Model
Using Q turns a semi-supervised problem into a supervised problem
Q(s,a) is the long-run mean reward from taking action a in state s
Q(s,a) = E[ r(t+1) + γ·r(t+2) + γ²·r(t+3) + … | s, a ]
We don't know Q, but we play lots of games using trial and error and record all the rewards from playing various actions a.
We update Q as we go
It's not as difficult as it looks! (c.f. Kalman Filter)
Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') - Q(s,a))
Q-Learning Algorithm
Loop through frames and games updating
Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') - Q(s,a))
Where
s=current frame + previous frame
a=action (space bar or nothing)
Q is a deep neural network approximation of the expectation E[·]
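The same update can be watched working on something far smaller than a game. A tabular sketch (a made-up 4-state corridor with reward 1 for reaching the right end; a plain Q table stands in for the deep network):

```python
import numpy as np

# states 0..3 in a corridor; reaching state 3 pays reward 1 and ends the game
# actions: 0 = step left, 1 = step right
alpha, gamma, eps = 0.5, 0.9, 0.2
Q = np.zeros((4, 2))
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3   # next state, reward, done

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: usually exploit Q, occasionally explore at random
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # the Q-learning update from the slide:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

policy = np.argmax(Q, axis=1)
print(policy[:3])   # the learned policy steps right in states 0, 1 and 2
```

The reward only arrives at the far end, yet the update propagates its (discounted) value back through the earlier states, which is exactly how sparse, delayed labels get used.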
Demo
Results (videos)
https://www.youtube.com/watch?v=W2CAghUiofY
https://www.youtube.com/watch?v=TmPfTpjtdgg