
PRML: Pattern Recognition and Machine Learning

Pattern Recognition And Machine Learning

planktonli

PRML

23

Introduction

Probability Distributions --

Linear Models for Regression


planktonli

Linear Models for Classification planktonli


marginalization Fisher
Laplace
Neural Networks @
BP
Kernel Methods Dual Representations
Gaussian Processes GP

Sparse Kernel Machines @


support vector machine Dual Representations KKT
SVM RVM
Graphical Models @
inference
Mixture Models and EMKmeans EMExpectation
Maximization GMM EM
Variational Inference@
_CASIA KL
Mean Field
MCMCMarkov Chain Monte Carlo
Metropolis-Hastings
1 Gibbs Sampling Slice SamplingHamiltonian MCMC
@_CASIA
PCA PPCA
EM PCA Autoencoder

Sequential Data@_ISCAS
Hidden Markov Models EM HMM
Combining Models@
committeesBoostingAdaBoost

PRML, MLAPP (Machine Learning: A Probabilistic Perspective): @Copper_PKU: @_SIAT: @priceton: @unluckyAllenZealot
PRML QQ

QQ 177217565
424269692

PRML
@Nietzsche_
http://weibo.com/dmalgorithms

Introduction

likrain

PRML

->
->

=============================================================

planktonli(1027753147) 19:01:30
learning

(373242125) 19:01:36

HX(458728037) 19:01:57
->
likrain(261146056) 19:02:22

planktonli(1027753147) 19:02:50

likrain(261146056) 19:02:59
@planktonli

(373242125) 19:04:38
,?
(66707180) 19:04:43

likrain(261146056) 19:05:32
@planktonli
@ N x_n,y_n
@HX

planktonli(1027753147) 19:06:42
@: : clustering, dimension reduction, density estimation
likrain(261146056) 19:07:02

google, transductive SVM


planktonli(1027753147) 19:08:33
inductive learning transductive learning
likrain(261146056) 19:08:36

planktonli(1027753147) 19:09:09
inductive learning semi-supervised learning
likrain(261146056) 19:09:26

=========================================================

Generalization Bounds

topic


MIT CMU machine learning deparment CSML
UCL

planktonli(1027753147) 19:15:43

likrain(261146056) 19:17:17

siri


fMRI
fMRI
fMRI

(1316061080) 19:25:06
D
likrain(261146056) 19:25:33

planktonli(1027753147) 19:25:46

likrain(261146056) 19:25:50
x1,x2,x3xn
planktonli(1027753147) 19:26:17

likrain(261146056) 19:26:28

ok


PRML

regularization.

AIC

=============================================================

@(289765648) 19:35:33

(1316061080) 19:35:47

@(289765648) 19:35:57
p-value
HX(458728037) 19:36:14
cross validation
lost(549294286) 19:37:49
AIC
likrain(261146056) 19:38:23
@
10 1024 AIC
AIC

@(289765648) 19:40:20
p-value
likrain(261146056) 19:40:23
p-value
@(289765648) 19:40:36

likrain(261146056) 19:40:43


lost(549294286) 19:40:58
AIC
likrain(261146056) 19:41:07
BICEBIC
DM-(864914991) 19:41:30

likrain(261146056) 19:41:37

@(289765648) 19:42:01

DM-(864914991) 19:42:10

likrain(261146056) 19:42:39

DM-(864914991) 19:43:00
2^n
@(289765648) 19:43:27
2^n
likrain(261146056) 19:43:35

@(289765648) 19:44:09
2^n
likrain(261146056) 19:44:13
paper
@(289765648) 19:44:37

likrain(261146056) 19:44:57

(813394698) 19:45:51

likrain(261146056) 19:45:56

(794717493) 19:46:14
@likrain
9 9
HX(458728037) 19:46:26

(794717493) 19:48:51

@(289765648) 19:49:24

(66707180) 19:49:44

@(289765648) 19:50:19

HX(458728037) 19:50:32

@(289765648) 19:50:45

HX(458728037) 19:51:33

@(289765648) 19:52:26

HX(458728037) 19:52:26

(373242125) 19:52:49
,.
planktonli(1027753147) 19:52:59
,google naive bayes + distributed computing
@(289765648) 19:53:26

Xiaoming(50777501) 19:53:35
ML

@(289765648) 19:53:42

planktonli(1027753147) 19:53:53
google 1TB
HX(458728037) 19:54:21

(373242125) 19:55:01

(1549614810) 19:55:04
n gram
(794717493) 19:55:06

HX(458728037) 19:55:30

(794717493) 19:55:42

(373242125) 19:55:42

planktonli(1027753147) 19:55:45

To Xiaoming: +
(66707180) 19:56:22
HX

@(289765648) 19:56:44

HX(458728037) 19:57:06

@(289765648) 19:57:43

(66707180) 19:58:05
-
planktonli(1027753147) 19:58:44
RGB histogram, SIFT, shape context
HX(458728037) 19:58:54
800*600 800*600*3
Xiaoming(50777501) 20:00:26
To planktonliLearning"

HX(458728037) 20:01:24

(373242125) 20:01:43
,?
planktonli(1027753147) 20:01:57
To Xiaoming: \

(794717493) 20:02:20
@
Xiaoming(50777501) 20:02:27
To HX"
(373242125) 20:02:58
,.
HX(458728037) 20:03:03

(794717493) 20:03:17

planktonli(1027753147) 20:03:27
neural network/support vector machine
(373242125) 20:03:48
,.
planktonli(1027753147) 20:03:49

Xiaoming(50777501) 20:03:52
@ NN
planktonli(1027753147) 20:04:02

HX(458728037) 20:04:06
SVM
(374350284) 20:04:52

planktonli(1027753147) 20:05:03
SVM model
likrain(261146056) 20:04:53

(66707180) 20:06:14

@(289765648) 20:06:19

likrain(261146056) 20:06:28

,
,,,


code
(794717493) 20:08:39

likrain(261146056) 20:08:52

planktonli(1027753147) 20:09:21

=========================================================

likrain(261146056) 20:10:28

(66707180) 20:10:31
likrain
likrain(261146056) 20:10:39

likrain(261146056) 20:10:44

planktonli(1027753147) 20:12:25
,

likrain(261146056) 20:12:41

graphic model
@planktonli
generative model
planktonli(1027753147) 20:14:36
@likrain,
@(289765648) 20:14:40

likrain(261146056) 20:14:41
ppt
HX(458728037) 20:14:53
generative model discriminative model
@(289765648) 20:14:56
loss function
likrain(261146056) 20:15:23
loss function?
planktonli(1027753147) 20:15:35

likrain(261146056) 20:15:37

planktonli(1027753147) 20:15:55
objective function
likrain(261146056) 20:16:06
l_2 loss function = gaussian noise

@( (289765648) 20:16:44

likrain(261146056) 20:16:53

ok

lost(549294286) 20:18:09

likrain(261146056) 20:18:29
noise
@(289765648) 20:18:38

likrain(261146056) 20:18:45
LAD
==


n 2^n

Functional Geometrical Analysis.

GFA

ok
planktonli(1027753147) 20:25:15
,Functional Geometrical Analysis high dimension

likrain(261146056) 20:25:55

planktonli(1027753147) 20:27:39

likrain(261146056) 20:28:04
ok
lost(549294286) 20:28:11
2^n
n
likrain(261146056)

lost(549294286) 20:28:38

likrain(261146056) 20:28:53

planktonli(1027753147) 20:29:16

likrain(261146056) 20:29:20

lost(549294286) 20:29:28

likrain(261146056) 20:29:30

vapnik VC

lost(549294286) 20:30:39

likrain(261146056) 20:30:45

jun zhu

~~

shrake-DM(965229647) 21:43:44
likrain square loss, svm hinge
loss alex
loss common sense 2000 likrain
GFA cmu random projection

Probability Distributions

:@Nietzsche_
(813394698) 9:11:56
PRML Probability Distributions

PRML Probability Distributions beta


-

N m

818

beta

beta beta
beta
a b beta

N l m

a 1
small batches

-beta ml

PRML

[0,1] N N
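As a rough illustration of the conjugate Beta-Bernoulli updating discussed above (the hyperparameters a, b simply accumulate observed counts, which is why the sequential / small-batch view works), here is a minimal numpy sketch; the prior values and the simulated coin data are illustrative assumptions, not from the discussion:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 2.0                       # Beta prior hyperparameters (illustrative)
data = rng.random(100) < 0.7          # simulated Bernoulli observations, true mu = 0.7

for x in data:                        # sequential / small-batch update: just count
    if x:
        a += 1                        # one more "1" observed
    else:
        b += 1                        # one more "0" observed

print("posterior mean of mu:", a / (a + b))   # E[mu | D] = a / (a + b)
```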

Gaussian Processes
p(y|x) x x
Gamma
Gaussian-Gamma distribution

Student t-distributionStudent nb

t t
t outliers

RIVERS(773600590) 11:01:09

(813394698) 11:01:48

EM

xunyu(2118773) 11:09:18

(348609341) 11:10:16
ML
(813394698) 11:10:31
beta

g()u(x) x

lnp(X|) 0

DUDA

Linear Models for Regression


planktonli
planktonli(1027753147) 18:58:12
PRML 3 linear regression
, applied mathematics + computer science
: http://t.qq.com/keepuphero/mine
, http://www.zhizhihu.com/
machine learning
, 3 3 linear regression:
introductionprobability distributions linear regression? machine learning
essence ?
: , \
function ,,\\
\\,
,

, ? 2 ,, L1,L
objective : Polynomial Fitting?
talyor

Hilbert
Polynomial Fitting
Polynomial
Wilbur_(1954123) 19:28:26
x0
planktonli(1027753147) 19:29:21
, x0 ,

Wilbur_(1954123) 19:30:41

planktonli(1027753147) 19:31:03
,

Wilbur_(1954123) 19:31:45

planktonli(1027753147) 19:31:51
,,,wavelet
, kernel construction
Bishop
1) Polynomial Fitting sin 2)
Polynomial Fitting model Model w ,w

X
: model
, NN,SVM , model ,
, model Bishop
Linear regression
model , linear model,
linear model BP perceptron modelSVM BP model
deep learning SVM shallow model machine learning
linear regression
,Linear regression

Gaussian ,Linear model MLE Least square

Gaussian precision

, 2

model

, model, lasso, ridge regression LASSO


:sparse coding,Compressed sensing L0--> L1sparse
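A minimal sketch of the regularized least-squares idea mentioned above, contrasting a plain polynomial least-squares fit with an L2-penalized (ridge) fit; the toy sine data, the polynomial order, and the lambda value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)   # noisy sine targets

Phi = np.vander(x, 10, increasing=True)        # 9th-order polynomial design matrix

w_mle = np.linalg.lstsq(Phi, t, rcond=None)[0] # plain least squares (overfits)

lam = 1e-3                                     # ridge penalty (illustrative)
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print("||w_mle||  =", np.linalg.norm(w_mle))     # typically huge
print("||w_ridge|| =", np.linalg.norm(w_ridge))  # shrunk by the penalty
```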

Bias-Variance Decomposition shrink ,

PPT 2

1--> Bias 2--> VarianceBias ,Variance


Bias Variance

Wilbur_(1954123) 20:33:07

planktonli(1027753147) 20:33:25
Variance ,Bias Variance ,Bias tradeoff
Bias Variance

Bias-Variance Decomposition
Bayesian Linear Regression

Bayesian ,, linear model


gaussian process
Wilbur_(1954123) 20:51:25

planktonli(1027753147) 20:54:44

Wilbur?
Wilbur_(1954123) 20:57:24

planktonli(1027753147) 20:58:08
0 ,linear model kernel

Bayesian model

model

NN,SVM NN,SVM BP error,RBF


SVM 2 ,,

=============================================================
Wilbur_(1954123) 21:08:29
RBF SVM
planktonli(1027753147) 21:09:01
SVM 0 RBF, RBF RBF
Wilbur_(1954123) 21:11:07
0 x^k SVM
RBF
planktonli(1027753147) 21:11:40
,
Wilbur_(1954123) 21:12:35
logistic
w
planktonli(1027753147) 21:13:37
,SVM
Wilbur_(1954123) 21:13:41

planktonli(1027753147) 21:14:03


gaussian
Wilbur_(1954123) 21:15:55
SVM multiple kernel learning

planktonli(1027753147) 21:16:45
,MK kernel kernel power
Wilbur_(1954123) 21:17:05

planktonli(1027753147) 21:17:16
,kernel construction ML
Wilbur_(1954123) 21:18:14

planktonli(1027753147) 21:50:29
Linear1.pdf
linear classification Linear2.pdf

Linear Models for Classification


planktonli
planktonli(1027753147) 19:52:28
:
1) Fisher , (Fisher )
2)
3)
4) Laplace

1)
2)
3) MAP
MAP
MAP (poor man's Bayesian) marginalization point estimate
MAP (poor man's Bayesian) test
PRML Bayesian Empirical Bayesian

Curve fitting
1) MLE likelihood function w point estimate
2) MAP (poor man's Bayes) prior probability posterior probability w
MAP MLE likelihood function L2 penalty
point estimation
3) fully Bayesian approach sum rule product ruledegree of belief machinery
rule degree of belief predictive distribution
marginalize (sum or integrate) over the whole of parameter space w

x X t label w

probability w marginalization
marginalization Graphical Model
Laplace approximationVariation inferenceMCMC
:Graphical Model marginalization

LS(Least Square) Least squares for classification

Generalized Linear Model: an activation function acting on a linear function of the feature variables

Linear Model

sign ,
D Euclidean space D-1
Linearly separable D
Coding scheme1-of-K binary coding scheme K
i K i 1 0

,, 1 1,1 :
3 : ,,
1 1 6
1 1 3
,
,
,
1 :

1 , 1 1,
,,

? Fisher's Linear Discriminant


echo<echotooo@gmail.com> 20:26:46


planktonli(1027753147) 20:27:35
,
echo<echotooo@gmail.com> 20:28:12
pass
planktonli(1027753147) 20:28:54
, Fisher's Linear Discriminant (Linear Discriminant Analysis, LDA), Fisher
Graphical Model Latent Dirichlet allocation LDA
PCA(),ICA()
manifold dimension reduction LDA ,
PCA Loss LDA ,


Fisher LS , Fisher ?
echo<echotooo@gmail.com> 20:41:53
....LDA
planktonli(1027753147) 20:44:12
LDA Gaussian , power performance data distribution
LDA Kernel trick
echo<echotooo@gmail.com> 20:45:29

planktonli(1027753147) 20:45:32
Gaussian
(813394698) 20:46:01

echo<echotooo@gmail.com> 20:46:40

planktonli(1027753147) 20:47:00
Between class Variance
echo<echotooo@gmail.com> 20:47:28


planktonli(1027753147) 20:47:45
mean , LDA
if the distribution of data is not so good, then we may use Kernel Fisher discriminant analysis
I mean that the distribution doesn't meet the gaussian
echo<echotooo@gmail.com> 20:50:35
kfda
planktonli(1027753147) 20:50:42
the detailed info you can see at the web site http://en.wikipedia.org/wiki/Kernel_Fisher_discriminant_analysis
KDA LDA Kernel Sb, Sw
KDA kernel fisher
(37633749) 20:52:11
OK
planktonli(1027753147) 20:52:24
, NN perceptron training error

,perceptron
Probabilistic Generative Models MAP ,
2 Probabilistic Generative Models logistic sigmoid function

input class-conditional distribution


Discriminant model make decision

gaussian ,

, 2 2 , MLE


1) class-conditional distribution MLE
class-conditional distribution
2) GLM Logistic
,
Probabilistic Discriminative Models
,
mapping the data from the original space to a new space may make it linearly separable

Logistic regression ,
, Probabilistic Discriminative Models point estimation
Bayesian Logistic Regression
we want to approximate the posterior using Gaussian
Laplace Approximation

Laplace

, Bayesian Logistic Regression

=============================================================
zeno(117127143) 21:25:06

planktonli(1027753147) 21:26:02
model p(x,y) model p(y|X) boundary

zeno(117127143) 21:26:38

planktonli(1027753147) 21:27:34
: Naive Bayes, Graphical Model : NN,SVM,LDA,Decision Tree
+ : SVM kernel construction generative
Generative + Discriminant
zeno(117127143) 21:30:38
X
planktonli(1027753147) 21:31:39
, IID statistics IID ?
HX(458728037) 21:33:12
boundary
planktonli(1027753147) 21:34:32
two class gaussian distribution
PPT

zeno(117127143) 21:36:52

feature feature
(236995728) 21:34:22

planktonli(1027753147) 21:37:10
MLE statistics statistics NN
LDA boundary
(498290717) 21:39:41

ant/wxony(824976094) 21:42:18
feature

HX(458728037) 21:44:51

ant/wxony(824976094) 21:45:43

planktonli(1027753147) 21:48:38
Generative verses discriminative classifier PPT :
"Lecture2.pdf"

Neural Networks

:@
(66707180) 18:55:06
3,4
5
1.
2. error
3.
BP Jacobian Hessian

x1,x2,x3 1
f "" , activation function. sigmoid .
sigmoid - logistic

x1...xd x0 y
1,...,yk . hidden level, z1,...,zM

j=1,...,M M . (1)wji weight,wj0 . a


j activation.
activation aj h(.)

zj

activation

ak

, identity
, y = a. sigmoid multiclass softmax .

2 NN deep learning 5-9 . forward propagation
back propagation
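A minimal sketch of the two-layer forward propagation just described (a_j = sum_i w_ji x_i + w_j0, z_j = h(a_j), then the output activations); tanh hidden units and an identity output are assumed here for the regression case, and the layer sizes are arbitrary:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer network: a_j = W1 x + b1, z_j = tanh(a_j), y = W2 z + b2."""
    z = np.tanh(W1 @ x + b1)      # hidden unit outputs z_j = h(a_j)
    return W2 @ z + b2            # identity output activations (regression case)

rng = np.random.default_rng(0)
D, M, K = 3, 5, 2                 # input, hidden, output sizes (arbitrary)
W1 = rng.standard_normal((M, D)); b1 = np.zeros(M)
W2 = rng.standard_normal((K, M)); b2 = np.zeros(K)
print(forward(rng.standard_normal(D), W1, b1, W2, b2))
```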

=============================================================
(236995728) 19:20:43

(66707180) 19:21:11

sigmoid
.
(1106207961) 19:22:30

(303957511) 19:23:47
2 2

ant/wxony(824976094) 19:24:30
svm
(303957511) 19:29:42
sigmoid
(236995728) 19:24:07
sigmoid 0-1
(66707180) 19:25:19
.[01]

(236995728) 19:26:56
@
(66707180) 19:27:33
SVM
(38900407) 19:29:00
W,b
(236995728) 19:29:11
@ @
(66707180) 19:29:48

(236995728) 19:30:24
@
(66707180) 19:30:47

(303957511) 19:30:52

(66707180) 19:31:33

(236995728) 19:31:59
0-1
(1106207961) 19:32:47

Wilbur_(1954123) 19:33:59

sigmoid
[0, 1]
(303957511) 19:39:50

tanh sigmoid
(66707180) 19:39:59
NN
(1106207961) 19:40:03

1
(66707180) 19:40:18
y = wx+b +1 b w,

(38900407) 19:41:36
Ng b
(66707180) 19:43:16
b
(236995728) 19:40:33

@
(66707180) 19:42:13

(236995728) 19:42:58
@
(66707180) 19:44:30

(236995728) 19:45:05

zeno(117127143) 19:45:31
nn xor
HX(458728037) 19:45:56
BP
(66707180) 19:46:07
BP

=========================================================
(66707180) 19:46:59

error error w b
NN binary K binary multiclass
error .
t

error

error errro

identity
ak

error .
binary multiclass error
binary
error

binary sigmoid multiclass K


. error ..
error BP E(w) ak

binary multiclass error


yk-tk

y = a, a y - t.

error
error

(9219642) 20:07:40

(66707180) 20:07:56
.
error

0.

error w back propagation


E w E activation a a w .

(236995728) 20:21:30
error function
(66707180) 20:21:38

(785011830) 20:22:35
hidden w chain rule
(66707180) 20:22:49

E w .

1. Apply the input Xn and forward propagate to get the hidden and output activations a.
2. Evaluate the deltas at the output units.
3. Backpropagate the deltas to the hidden units.
4. Evaluate the required derivatives of E with respect to each weight.
5. Update the weights w.
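A minimal numpy sketch of the five steps above for a single training case, assuming a sum-of-squares error, tanh hidden units, and an identity output; the network sizes and learning rate are illustrative:

```python
import numpy as np

def backprop(x, t, W1, b1, W2, b2):
    """Steps 1-4 above for one case: forward pass, output/hidden deltas, gradients."""
    z = np.tanh(W1 @ x + b1)                       # step 1: forward propagate
    y = W2 @ z + b2                                # identity outputs, squared error assumed
    d_out = y - t                                  # step 2: output deltas y_k - t_k
    d_hid = (1 - z**2) * (W2.T @ d_out)            # step 3: backpropagate through tanh
    return (np.outer(d_hid, x), d_hid,             # step 4: dE/dW1, dE/db1
            np.outer(d_out, z), d_out)             #         dE/dW2, dE/db2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3)); b1 = np.zeros(4)
W2 = rng.standard_normal((2, 4)); b2 = np.zeros(2)
x, t = rng.standard_normal(3), np.array([1.0, -1.0])

eta = 0.1
for _ in range(100):                               # step 5: gradient-descent weight updates
    gW1, gb1, gW2, gb2 = backprop(x, t, W1, b1, W2, b2)
    W1 -= eta * gW1; b1 -= eta * gb1
    W2 -= eta * gW2; b2 -= eta * gb2
print("fitted output:", np.tanh(W1 @ x + b1) @ W2.T + b2)
```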
(303957511) 20:32:09
BP
(66707180) 20:33:32
batch .batch
w .

. error


early stop w,
error

M M

monica(909117539) 20:42:38

(66707180) 20:42:56
error M
NN
monica(909117539) 20:44:36

(66707180) 20:44:38
ML
M

NN train
monica(909117539) 20:47:01

(66707180) 20:47:08
M M train w

NLP
.
ouput

Ng deep learning
100x100 w .
input unit . hidden

units planes plane feature map feature map un


it input feature map unit weight
plane
100 100 plane 6
6
plane unit unit
plane unit
32x32 input 32x32 unit. 4x4
28x28 6 6 plane plane 28x28
unit.
plane unit 4x4 unit 16 weight plane
unit 16 weight .
6 16x6=96 weight 6 bias .
6 28x28 unit 6x28x28 unit.

6 32x32 96x96 400


4x4 unit 400 x (96-4) x (96-4) = 300 300

unit .

plane plane unit


plane 4 uni
t plane 4 unit

a. /

b.
(236995728) 21:01:30
@
(1549614810) 21:01:44
cnn normalization
(66707180) 21:03:26
Andrew Ng deep learning

(38900407) 21:04:19

(1549614810) 21:05:01
max pooling
(66707180) 21:05:02
http://ibillxia.github.io/blog/2013/04/06/Convolutional-Neural-Networks/
(1106207961) 21:05:02

(66707180) 21:06:40
deep learning Andrew Ng notes,

(1106207961) 21:08:04

(1549614810) 21:08:56

(66707180) 21:09:03
w .

Kernel Methods


:@Nietzsche_
(813394698) 9:16:05
Kernel

KNNSVM --
dual representation
y=

kernel methods
stationary kernels


dual representation

L2

w J(w) 0

J(w)

K(X)

DUAL

w
DUAL
svm DUAL
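A minimal sketch of the dual representation just described for regularized least squares: the dual variables are a = (K + lambda*I)^{-1} t and a prediction only needs kernel evaluations, y(x) = k(x)^T a. The Gaussian kernel, its width, and the toy data are illustrative assumptions:

```python
import numpy as np

def rbf(A, B, gamma=10.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.random((30, 1))
t = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 1e-2
a = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), t)   # dual variables a = (K + lam I)^-1 t

X_new = np.array([[0.25], [0.5]])
print(rbf(X_new, X) @ a)            # y(x) = k(x)^T a: predictions use only kernel values
```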
80(850639048) 10:09:50
professor
(813394698) 10:12:57

mercer

HMM

(813394698) 10:40:34
Gaussian Processes
(1106207961) 10:41:02

(1009129701) 10:41:14

(813394698) 10:42:41
Gaussian Processes Gaussian Processes

ARMA (autoregressive moving average) models, Kalman filters, radial basis function networks

w 0 0

GP

marginal distribution
C

k c

log

sigmoid

MCMC Laplace approximation

Gaussian Processes GP
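A minimal Gaussian process regression sketch along the lines just described: build C_N = K + beta^{-1} I from a covariance function and read off the predictive mean and covariance. The squared-exponential kernel, its hyperparameters, and the noise precision are illustrative assumptions:

```python
import numpy as np

def kern(a, b, theta=1.0, ell=0.3):
    """Squared-exponential covariance (hyperparameters are illustrative)."""
    return theta * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(1)
x = rng.random(20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
beta = 100.0                                        # assumed noise precision

C = kern(x, x) + np.eye(len(x)) / beta              # C_N = K + beta^-1 I
x_star = np.linspace(0, 1, 5)
k = kern(x, x_star)                                 # covariances between train and test points

mean = k.T @ np.linalg.solve(C, t)                  # predictive mean  k^T C_N^-1 t
cov = kern(x_star, x_star) + np.eye(5) / beta - k.T @ np.linalg.solve(C, k)
print(mean)
print(np.sqrt(np.diag(cov)))                        # predictive standard deviations
```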

http://blog.videolectures.net/100-most-popular-machine-learning-talks-at-videolectures-net/
Jordan talks David MacKay
Information Theory, Inference and Learning Algorithms
(9219642) 14:35:09

an ?
(813394698) 14:36:44

(9219642) 14:37:48

2
(813394698) 14:49:01
@

(9219642) 14:56:08

Sparse Kernel Machines



: @
(66707180) 18:59:22
PRML 7

KNN

sparse kernel machine


SVM RVM (relevance vector machine) SVM SVM

. y(x)=0

SVM margin margin y(x)

SVM margin
x
:

|y(x)|/||w||

x y=0 r

x y(x)=0

:
w b y=0

w b
w b,

w b 0 w b

w L(w,b,a) w b a

n=1...N

SVM

0 KKT KKT

KKT

f(x1,x2):

g=c g-c=0 . g f


g
f g
g f lamda

g<0 KKT x`

inactive
lamda 0 f lamda 0.
g ,

lamda 0 g=0

KKT :

SVM

KKT
1

=============================================================

Fire(564122106) 20:01:56
KKT
(66707180) 20:02:33
KKT
Fire(564122106) 20:03:54
KKT

kxkr<lxfkxkr@126.com> 20:04:18

Wolf<wuwja@foxmail.com> 20:04:19

an 0 kkt
kxkr<lxfkxkr@126.com> 20:04:20
KKT
YYKuaiXian(335015891) 20:04:40
Ng
(852383636) 20:04:47
sv
Wolf<wuwja@foxmail.com> 20:05:05

margin sv
YYKuaiXian(335015891) 20:05:40

Wolf<wuwja@foxmail.com> 20:05:54

kkt primal dual

kxkr<lxfkxkr@126.com> 20:07:03

(1316103319) 20:07:08

Wolf<wuwja@foxmail.com> 20:07:19

kkt sv 0 sv 0

Fire(564122106) 20:08:38
sv
Wolf<wuwja@foxmail.com> 20:09:26

sv

sv sv svm
(66707180) 20:09:37
sv svm 0
0.
(852383636) 20:11:04
sv sv
Wolf<wuwja@foxmail.com> 20:11:30

sv svhinge

(66707180) 20:12:24
kkt hinge svm
hinge svm :

hinge

hinge

z>1 0 yt>1 .
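To make the hinge-loss view concrete, here is a minimal sketch that trains a linear classifier by subgradient descent on the hinge loss plus an L2 penalty (this is not SMO, just the loss-function picture discussed above); the toy data, lambda, and step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(+1, 0.5, (50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])          # labels in {-1, +1}

w, b, lam, eta = np.zeros(2), 0.0, 1e-2, 0.1
for _ in range(200):                                # subgradient descent on hinge + L2
    viol = t * (X @ w + b) < 1                      # points where the hinge is active
    w -= eta * (lam * w - (t[viol, None] * X[viol]).sum(0) / len(X))
    b -= eta * (-t[viol].sum() / len(X))

print("training accuracy:", np.mean(np.sign(X @ w + b) == t))
```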

=========================================================


svm .

, slack variables

0. n=1,..,N

margin

y(x)=0 ,
,

margin trade-off

upper bound C trade-off.


SVM C
C
C

w, b { } 0 w, b, L(w,b,a)

SVM
svm

chunking:
1.

=0

2.
3.SVM SMO, sequential minimal optimization.
SVM ()

x
nonlinear manifold manifold .
(66707180) 20:48:59

(1316103319) 20:49:58
manifold
(66707180) 20:50:26

SVM .

y(x) t

0.:

.
PAC(probably approximately correct) VC
PAC PAC
VC PAC

=============================================================
Fire(564122106) 21:05:56
svm svm
(66707180) 21:07:36
svm
Fire(564122106) 21:09:29

Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models
(852383636) 21:11:05
svm
Making SVMs Scalable to Large Data Sets using Hierarchical Cluster Indexing
Data Mining and Knowledge Discovery
Fire(564122106) 21:16:29

Fire 21:14:33

"Selective Block Minimization for Faster Convergence of Limited Memory Large_Scale Linear Mo
dels.pdf"
(852383636) 21:17:41
svm Support Vector Clustering

=========================================================
(66707180) 8:54:01
RVM SVMSVM decision SVM
C
RVM
RVM SVM
RVM RVM
w

precision

y(x)
RVM

SVM (7.64) SVM

RVM SVM

RVM w
w w
w w
w w 0 RVM

NxM

RVM

K NxN

w p(t|x,w)

evidence approximation type-2 maximum likelihood

sharply peaked

relatively flat

log :

C NxN
C

EM

log

w m i
w

w
0

RVM

relevance vectors SVM

SVM SVM RVM


SVM RVM

RVM :

sigmoid w RVM
ARD RVM
RVM

RVM sigmoid , 4.5


Laplace approximation

Laplace approximation

p(z) q(z)
p(z) mode() mode

p(z)

p(z) p(z)

A f(z)

A>0
f(x) z M
A MxM

A .mode
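A minimal 1-D Laplace approximation sketch: locate the mode of an unnormalized density f(z), take A = -d^2 ln f / dz^2 at the mode, and approximate with a Gaussian q(z) = N(z0, A^{-1}); the particular density (a sigmoid times a Gaussian, in the spirit of PRML 4.4) and the numerical mode search are illustrative assumptions:

```python
import numpy as np

def ln_f(z):
    """Log of an unnormalized density: a Gaussian times a steep sigmoid (illustrative)."""
    return -0.5 * z**2 - np.log1p(np.exp(-20 * z - 4))

zs = np.linspace(-3, 3, 20001)
z0 = zs[np.argmax(ln_f(zs))]                        # crude search for the mode
h = 1e-4
A = -(ln_f(z0 + h) - 2 * ln_f(z0) + ln_f(z0 - h)) / h**2   # A = -d^2 ln f/dz^2 at the mode

print("mode z0 =", z0, " precision A =", A)         # q(z) = N(z | z0, 1/A)
print("approx normalizer Z ~", np.exp(ln_f(z0)) * np.sqrt(2 * np.pi / A))
```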

RVM

q(z)

RVM w mode log

mode. log

( mode)

Z Z=
:

RVM

Graphical Models

: @
(66707180) 18:52:10

1.2. inference

prml

:
p(x1,x2,...x7)=

xk

a,b,c p(a,b|c)=p(a|c)p(b|c) ca b
a, b, c

c tail-to-tailc ab c

a,b
c ,p(a,b):

:p(a,b)=p(a)p(b) a,b
c

a,b c

a,b :

tail-to-tail tail-to-head

c a b c a b c

head-to-head

c a b c a b

a b .
c

p(a,b|c)=p(a|c)p(b|c)
D-separation:
A,B,C A B A B C
:
1, A B tail-to-tail head-to-tail C
2. head-to-head C

f e d-separation. f tail-to-tail f f C.
e head-to-head e c e C
speedmancs<speedmancs@qq.com> 19:23:05
2
(66707180) 19:23:21
f e d-separation..

speedmancs<speedmancs@qq.com> 19:24:54
prml PGM

(66707180) 19:27:14
filter
p(x1,x2...,xn)

MRF (Markov random field) MRF tail-to-tail


MRF

A B C
speedmancs<speedmancs@qq.com> 19:33:16

(66707180) 19:33:18
A B C C A B
(66707180) 19:33:49

MRF
MRF
clique
.
(x1,x2,x3)(x2,x3,x4)

MRF potential function potential :

potential
.
Z normalization

speedmancs<speedmancs@qq.com> 19:42:47
Z
(66707180) 19:43:15

p(x) potential

potential :

p(x) E(Xc)E(Xc)
. potential

Yi Xi Xi Xi
xi yi xj
(xi, yi)(xi, xj)
(xi, yi) E(xi,yi)=

(xi,xj) E(xi, xj)=

xi yi ( xi xj).

speedmancs<speedmancs@qq.com> 19:56:55

(66707180) 19:57:28
Xi E(x,y)

speedmancs<speedmancs@qq.com> 19:57:34
-1
(66707180) 19:58:20

Such a term has the effect of biasing the model towards pixel values that have one particular sign in preference to the other.
speedmancs<speedmancs@qq.com> 19:59:29

(66707180) 19:59:39
E(x,y) xi -1 1E(x,y)

E(x,y)p(x,y)
<liyitan2144@163.com> 20:01:13

(66707180) 20:02:18
liyitan2144
speedmancs<speedmancs@qq.com> 20:02:33

(66707180) 20:02:59
.
1 1

inference

=============================================================
speedmancs<speedmancs@qq.com> 20:06:45
beta, h
(66707180) 20:07:30
ICM,iterated conditional modes
max-product max-product , inference
ICM ., ICM
max-product
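A minimal sketch of ICM on the binary image-denoising MRF discussed above: each pixel is set in turn to whichever of {-1, +1} gives the lower local energy h*x - beta*x*(neighbour sum) - eta*x*y; the toy image, the noise level, and the parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.ones((30, 30)); clean[8:22, 8:22] = -1                # toy {-1,+1} image
y = clean * np.where(rng.random(clean.shape) < 0.1, -1, 1)       # flip 10% of the pixels

beta, eta, h = 2.0, 1.0, 0.0       # coupling, data term, bias
x = y.copy()
for _ in range(5):                 # ICM sweeps over interior pixels (borders kept fixed)
    for i in range(1, 29):
        for j in range(1, 29):
            nb = x[i-1, j] + x[i+1, j] + x[i, j-1] + x[i, j+1]
            # local energy of x_ij = s is  h*s - beta*s*nb - eta*s*y_ij; pick the lower one
            x[i, j] = 1 if (h - beta * nb - eta * y[i, j]) < (-h + beta * nb + eta * y[i, j]) else -1

print("pixels still wrong:", int(np.sum(x != clean)))
```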

speedmancs<speedmancs@qq.com> 20:15:07
denoise graph cut beta

<liyitan2144@163.com> 20:16:12
graph cut

kxkr<lxfkxkr@126.com> 20:16:30

kxkr<lxfkxkr@126.com> 20:17:18

(66707180) 20:18:11
. prml
kxkr<lxfkxkr@126.com> 20:19:05
ICMgraph-cut
figure8.30 section8.4figure8.32
(66707180) 20:20:15

kxkr<lxfkxkr@126.com> 20:21:25

=========================================================
(66707180) 20:22:24
inference inference

y x

p(xn) xn x
x k k (N-1)

inference factor graph


factor graph .


1.
2.

. polytree
.
. factor factor .

p(x1,x2,x3)=p(x1)p(x2|x1)p(x3|x1,x2) factor
factor factor

x1,x2,x3 fa,fb,fc,fd factor factor


factor
kxkr<lxfkxkr@126.com> 20:42:10
clique
(66707180) 20:42:19

kxkr<lxfkxkr@126.com> 20:44:15

(66707180) 20:45:10
x1,x2,x3 . factor
kxkr<lxfkxkr@126.com> 20:45:17

(66707180) 20:45:22
factor factor factor
kxkr<lxfkxkr@126.com> 20:47:10

(66707180) 20:47:21
factor
sum-product .
<liyitan2144@163.com> 20:49:50
factor
factor .
(66707180) 20:51:06
factor factormax-product
root
<liyitan2144@163.com> 20:52:38

factor factor
kxkr<lxfkxkr@126.com> 20:54:37
factor

<liyitan2144@163.com> 20:56:57
clique factor factor
kxkr<lxfkxkr@126.com> 20:57:53
factor
<liyitan2144@163.com> 21:00:00

factor
(66707180) 21:00:39
prml HMM CRF
.
inference. factor xi xi root , root
(message) root .
1 , factor f(x).
message factor
message f(x) sum

x3 message :

x1 fa p(x3)

fb x2 , x2 ,

p(x2)

x2

sum-product
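A minimal sketch of sum-product message passing on a chain, as just described: forward messages mu_alpha and backward messages mu_beta are computed once, and any marginal p(x_n) is their (normalized) product; the number of states, the chain length, and the random pairwise potentials are illustrative assumptions:

```python
import numpy as np

K, N = 3, 5                                    # K states per variable, chain of length N
rng = np.random.default_rng(0)
psi = rng.random((N - 1, K, K))                # pairwise potentials psi_n(x_n, x_{n+1})

alpha = [np.ones(K)]                           # forward messages
for n in range(N - 1):
    alpha.append(psi[n].T @ alpha[-1])         # sum over x_n of psi_n * incoming message
beta = [np.ones(K) for _ in range(N)]          # backward messages
for n in range(N - 2, -1, -1):
    beta[n] = psi[n] @ beta[n + 1]             # sum over x_{n+1}

for n in range(N):
    p = alpha[n] * beta[n]                     # unnormalized marginal of x_n
    print("p(x_%d) =" % n, p / p.sum())        # dividing by Z gives the marginal
```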

huajh7(284696304) 21:34:52
temporal model , Dynamic Bayesian network (DBN), plate models
; belief propagation exact approximation loopy ; Inference
MCMC, tree graph Structure Learning BIC score

Mixture Models and EM



: @Nietzsche_
(813394698) 9:10:56
k-means EM
k-means 50 clustering
survey: Data clustering: 50 years beyond K-means; k-means

k-means

k rnk

xn k rnk 1 0
k-means
NP-hard k-means

k
E
M
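A minimal numpy sketch of the two alternating k-means steps just described (assign each point to its nearest centre, then recompute each centre as the mean of its assigned points); the toy data and K are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in ((0, 0), (2, 2), (0, 3))])

K = 3
mu = X[rng.choice(len(X), K, replace=False)]            # random initial centres
for _ in range(20):
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    r = d.argmin(1)                                     # assignment step (r_nk)
    mu = np.array([X[r == k].mean(0) if np.any(r == k) else mu[k]
                   for k in range(K)])                  # update step: recompute means
print(mu)
```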

<ccab4209211@qq.com> 9:29:00

(813394698) 9:30:16
E M ,


k-means

k-means

k-means

k-means
k-means

DBSCAN Science
Clustering by fast search and find of density peaks clustering outliers
detection k-means

k-means k k-means
(465191936) 9:49:20
k-means
(813394698) 9:49:53
k-means

(465191936) 9:50:08
k coursera sse k
(813394698) 9:51:41
k
survey: Clustering stability an overview k
DPMDL MDL k-meansDP
JORDAN
GMM

GMM

Z k-means
z k zk=1, 0
zk=1

zk=1

zk=1

log

log
log
EM
GMM

zn
zn
EM GMM

EM GMM E

E znk

E M
EM log


karnon(447457116) 10:39:42
EM GMM
(813394698) 10:40:14

EM

E
M Q
Q

EM

EM GMM
M Q

k-means EM k-means E
z 1 0, EM z

k-means EM k-means

EM

log

znk=1

E M

k=1
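A minimal EM-for-GMM sketch following the E/M steps discussed above: the E step computes responsibilities gamma(z_nk), the M step re-estimates the mixing coefficients, means, and covariances from them; the toy two-component data and the initialization are illustrative assumptions:

```python
import numpy as np

def gauss(X, mu, cov):
    """Multivariate normal density at each row of X."""
    d = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt(np.linalg.det(2 * np.pi * cov))
    return np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, inv, d)) / norm

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(3, 0.7, (100, 2))])
N, D, K = len(X), 2, 2

pi = np.ones(K) / K
mu = X[rng.choice(N, K, replace=False)].copy()
cov = np.array([np.eye(D) for _ in range(K)])

for _ in range(50):
    # E step: responsibilities gamma(z_nk) proportional to pi_k N(x_n | mu_k, cov_k)
    dens = np.stack([pi[k] * gauss(X, mu[k], cov[k]) for k in range(K)], axis=1)
    gamma = dens / dens.sum(1, keepdims=True)
    # M step: re-estimate the parameters from the responsibilities
    Nk = gamma.sum(0)
    pi = Nk / N
    mu = (gamma.T @ X) / Nk[:, None]
    for k in range(K):
        d = X - mu[k]
        cov[k] = (gamma[:, k, None] * d).T @ d / Nk[k]

print("mixing coefficients:", pi)
print("means:\n", mu)
```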

=============================================================
EM :

log


q(z)

q(z) KL

q(z)

Jensen KL

0 q(z) z

E z

bound

Q M

EM

E M

E z

M
EM EM
EM

Approximate Inference

: @_CASIA
Wilbur_(1954123) 20:02:04

EM

MCMC

f: x -> f(x) df/dxF: f -> F(f)


dF/df
0

1 PRML D E

KL


KL
q
p

KL

KL(q||p)
KL(p||q)

q(x) > p(x) KL(q||p)


q(x) < p(x) q(x) 0 p(x)/q(x)
0.8/0.4 0.01/0.001 log q(x) q(x)
0 0 KL(q||p) KL(p||q)
0.32 0.81 KL

q p

p q p PRML 4.4

p
KL(q||p) 1.17
0.090.07KL(p||q) 23.20.120.07 KL p
q p p p

p p
0 KL(q||p)KL(q||p) 0.690.69
3.45KL(p||q) 43.915.40.97 KL(p||q)
p moment-matching KL(q||p)
mode-seeking
KL(q||p) p(x) 0 q(x)
0 q(x)/p(x)5 KL(p||q)
p(x) 0 q(x) 0 p(x) 0 q(x) 0

KL(q||p) q(x) q(x) 0 KL(p||q)


q(x) q(x) 0
10.3

KL(q||p) KL(p||q)
KL(q||p) KL(p||q)
KL(q||p)
KL
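A small numerical check of the asymmetry discussed above, computing KL(q||p) and KL(p||q) for two discrete distributions; the probability values here are illustrative, not the exact numbers quoted in the discussion:

```python
import numpy as np

def kl(q, p):
    """KL(q || p) = sum_x q(x) ln(q(x) / p(x)) for discrete distributions."""
    return float(np.sum(q * np.log(q / p)))

p = np.array([0.50, 0.40, 0.10])     # "true" distribution (illustrative values)
q = np.array([0.80, 0.15, 0.05])     # an approximation

print("KL(q||p) =", kl(q, p))
print("KL(p||q) =", kl(p, q))        # different numbers: KL is not symmetric
```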

=============================================================
(346723494) 20:24:23
KL(q||p) q p

Yuli(764794071) 20:25:31
KL KL Divergence
(346723494) 20:25:57

Wilbur_(1954123) 20:27:06
p q p
mode mean

linbo-phd-bayesian(99878724) 20:27:15
KL(q||p)=00
(346723494) 20:28:06

Wilbur_(1954123) 20:29:21
ln

(421723497) 20:30:44
PRML P56
Wilbur_(1954123) 20:31:50
KL

(421723497) 20:35:31
KL ?
Wilbur_(1954123) 20:37:15
PRML (10.20) Hellinger distance

integral of (p(x)^(1/2) - q(x)^(1/2))^2 dx
WayneZhang(824976094) 20:41:52
KL
karnon(447457116) 20:42:24
KL ,

Wilbur_(1954123) 20:43:07

=========================================================
PRML 4.4
mode
mode mode mode

\theta^*mode

PRML 10.1

KL

10.1.1
(10.5)

Z_i

mean field
mean field
RBM
KL L(q)(10.2)(10.4)

9.4 (9.70)(9.72) Z
Z p(Z|X)
KL
(10.5)(10.6)

MLAPP (21.28)(21.31)

MLAPP Machine Learning - A Probabilistic Perspective

huajh7(284696304) 21:02:10
lower bound L(q) log P(D) KL(Q||P)=0
L(Q) q
q

Wilbur_(1954123) 21:03:04

http://www.blog.huajh7.com/variational-bayes/

q_j q_j (21.30)

(21.32)

x_j x_i
x_j
10.1.1

p(x)

coordinate descent KL
L(q)(10.2)(10.4)(10.5)(10.3) L(q) q_j
q_i L(q)

KL

(10.6)

KL KL

exp
mean-field MLAPP 21.3.1 PRML
q_j ok

KL divergenceKLD
mean field KLD KLD
KL(q||p) q
p
p(Z|X)
p(X)p(X) p(Z,X) p(Z,X)
p(Z,X)

ln(p(X)) KLD L L
:
Z_i

Z_j mean field L

KLD L

p Z_j q_j
q_j coordinate descent

=============================================================
(992463596) 21:15:06
10.6 10.7 const
Zi 1 lnqi j
const j
huajh7(284696304) 21:25:23
const const
const z_j z_j
(1549614810) 21:26:08
mean filed koller
(992463596) 21:26:57
Z j i 1
huajh7(284696304) 21:27:48
const exp(E_{i~=j}[..])
(992463596) 21:28:23
exp
huajh7(284696304) 21:28:32
ln .
-<zh3f@qq.com> 21:26:42

Wilbur_(1954123) 21:28:32
@- MAP
trade-off
huajh7(284696304) 21:29:54

Wilbur_(1954123) 21:30:01

huajh7(284696304) 21:32:23
variational Bayesian distributional approximation wilbur
The Variational Bayes Method in Signal Processing 9
(94755934) 21:32:46
@ local minimum non convex
convex minimum
(992463596) 21:33:23
dependence convex
intractable 10.7 const
Wilbur_(1954123) 21:36:04
const(10.9)
(992463596) 21:36:17

(94755934) 21:36:30
@huajh7EM
(992463596) 21:37:16
EM
/sun (76961223) 21:37:31
EM
Happy(467594602) 21:38:30
jordan introduction
huajh7(284696304) 21:38:33
10.6-10.9 KLq||p=0

constVB EM
(992463596) 21:41:00
@Happy jordan introduction
Happy(467594602) 21:41:20
introduction to variational methods in graphical model

(992463596) 21:42:47
thanks Graphical Models, Exponential Families, and Variational Inference
huajh7(284696304) 21:43:25
Neal,HintonA view of the EM algorithm that justifies incremental, sparse,and other variants.pdf
EM
Happy(467594602) 21:43:25

(94755934) 21:43:26

@huajh Z marginalize

huajh7(284696304) 21:44:06
GMM
(992463596) 21:45:37
exponential
huajh7(284696304) 21:46:01
exponential
Wilbur_(1954123) 21:46:01
mean field Z_j @huajh7
Happy(467594602) 21:46:16
mean-field
(992463596) 21:46:18

huajh7(284696304) 21:46:45

(992463596) 21:46:46
64 01 2^64
(94755934) 21:47:33
partition function
Happy(467594602) 21:47:35

(94755934) 21:48:12

huajh7(284696304) 21:48:16

Happy(467594602) 21:48:30

huajh7(284696304) 21:48:35

(992463596) 21:49:29
10.9 10.10
huajh7(284696304) 21:49:42

Happy(467594602) 21:50:03
jordan
(992463596) 21:51:09

huajh7(284696304) 21:51:33
@ (partition function)..
(992463596) 21:51:41
1
Happy(467594602) 21:52:25

(992463596) 21:53:00

(94755934) 21:56:38

Happy(467594602) 21:57:29
jordan graphical model
Wilbur_(1954123) 21:58:14
PRML MLAPP Bayesian Reasoning and Machine Learning
(992463596) 21:58:16
lda variational inference
Happy(467594602) 21:58:28

/sun (76961223) 21:58:33


@
huajh7(284696304) 21:58:33
bishop :
A Tutorial on Variational Bayesian Inference http://staffwww.dcs.shef.ac.uk/people/c.fox/fox_vbtut.pdf LDA
/sun (76961223) 21:59:37
LDA mean field
(992463596) 21:59:38
blei
Happy(467594602) 21:59:46

/sun (76961223) 22:00:02

huajh7(284696304) 22:00:03

/sun (76961223) 22:00:12

Happy(467594602) 22:00:21

/sun (76961223) 22:00:26


Expectation propagation
huajh7(284696304) 22:00:37

(992463596) 22:00:40
kl
Happy(467594602) 22:01:52
jordan
(992463596) 22:05:45
http://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
blei

(992463596) 22:09:18
variational inference ?
Happy(467594602) 22:09:26

(992463596) 22:10:04
Gibbs sampling
Happy(467594602) 22:10:16

(992463596) 22:11:00
rbm variational inference
Happy(467594602) 22:11:21
mean-field
(1002621044) 22:19:40
lda
Happy(467594602) 22:23:48
lda
/sun (76961223) 22:24:22
mean field expectation propagation gibbs samplingdistributed online

_Matrix_(690119781) 22:28:18
https://github.com/sudar/Yahoo_LDA
(407365525) 22:31:18
variational inferencegibbs sampling
/sun (76961223) 22:38:35
sampling
(407365525) 22:39:08
vb
huajh7(284696304) 22:40:24
VB
(94755934) 22:42:16
VB
huajh7(284696304) 22:45:22
variational message passing (
consensus diffusion paper BP loopy BP
(94755934) 22:48:28
vb
huajh7(284696304) 22:49:05

(94755934) 23:04:01
karnon jmlr vb
Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization

huajh7(284696304) 23:10:47
2,3 combined and extended

light.(513617306) 23:15:13

karnon(447457116) 23:15:34

(94755934) 23:17:43
vb vb
karnon(447457116) 23:21:29
vb
svd

=========================================================

q(z) p(z)(10.9)

z1 (10.9) z2 (10.11)
(10.11)

z2

m1 z2 m2 z1

(10.13)(10.15)

Gamma

-Gamma

mean field (10.9)

N
(10.9)

Gamma

(10.27)(10.30)

10.4

2.3.6

Gamma

10.8

(10.9)

Gamma

(10.95)(10.97)

a_Nb_N m_NS_N

10.2

Expectation Propagation

p(Z) q(Z) p(Z)


10.7

(10.9)
mean field
@huajh7

=============================================================
(983272906) 21:44:16

Wilbur_(1954123) 21:45:20
(10.91)
Y(414474527) 21:47:49

tzk<tangzk2011@163.com> 21:48:29
LDA
<(523723864) 21:48:43
10.9 tractable
zeno(117127143) 21:52:58

Wilbur_(1954123) 21:53:54
@Y

http://ipg.epfl.ch/~seeger/lapmalmainweb/papers/seeger_wipf_spm10.pdf RBM

zeno(117127143) 21:54:19

Wilbur_(1954123) 21:54:43
@< p(X,Z)
@zeno
<(523723864) 21:55:01
10.9 qj
Wilbur_(1954123) 21:55:57
(10.9)
karnon(447457116) 21:59:27
naive
naive
Wilbur_(1954123) 22:02:31

zeno(117127143) 22:03:27

<(523723864) 22:04:21
lower bound
karnon(447457116) 22:04:22
taylor
Wilbur_(1954123) 22:04:43
@<

karnon(447457116) 22:06:49

zeno(117127143) 22:10:58
kl kl
pgm
mrfhmmcrf pgm pgm
(1549614810) 23:41:20

karnon(447457116) 0:02:10

(337595903) 6:31:34
@karnon

zeno(117127143) 6:52:44
25%

abc d

karnon(447457116) 7:33:16

Sampling Methods

: @Nietzsche_
(813394698) 9:05:00
Markov Chain Monte Carlo Metropolis-Hastings Gibbs Sampling
Slice SamplingHybrid Monte Carlo

L(Q)

Markov Chain Monte Carlo


sampling intractable
MCMC

p(Z|X) sampling

z p(z) f(z) p(z)

sampling z(l)

f(z)so,

p(z) sampling
sampling EM
log Z

Z Z

EM theta IP :

I P(Z|X) P(theta|X)

P P(Z|X)

I P
sampling methods rejection sampling
Importance sampling MCMC
Markov Chain Monte Carlo

orz
MCMC Metropolis

proposal distribution

proposal Z*

1 01
Metropolis-Hastings algorithm

Metropolis-Hastings markov chains

detailed balance

marginal distribution

z' z

11.40 11.41 11.39

Metropolis-Hastings
Metropolis proposal distribution
proposal distribution

Gaussian centred on the current state


proposal distribution

Metropolis-Hastings

q proposal distribution

@(289765648) 10:49:15
Zp
(813394698) 10:52:04
Zp z

1 101

11.45

11.44
11.40
proposal distribution Gaussian centred on the current state
state of the art
Gaussian centred on the current state
proposal distribution
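A minimal Metropolis sketch with the Gaussian random-walk proposal centred on the current state, as just described; with a symmetric proposal the acceptance probability reduces to min(1, p~(z*)/p~(z)). The target density, the step size, and the burn-in length are illustrative assumptions:

```python
import numpy as np

def p_tilde(z):
    """Unnormalized target: a two-component mixture (illustrative)."""
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

rng = np.random.default_rng(0)
z, step, samples = 0.0, 1.0, []
for _ in range(20000):
    z_star = z + step * rng.standard_normal()                   # Gaussian proposal centred on z
    if rng.random() < min(1.0, p_tilde(z_star) / p_tilde(z)):   # symmetric proposal: Metropolis rule
        z = z_star                                               # accept, otherwise keep the old state
    samples.append(z)

samples = np.array(samples[2000:])                               # discard burn-in
print("posterior mean estimate:", samples.mean())
```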

Gibbs Sampling


Metropolis-Hastings 1.

z2 z3 z1
z1 z3 z2

Z3

Metropolis-Hastings 1

gibbs sampling

1.

gibbs sampling MLAPP blocked gibbs collapsed gibbs
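A minimal Gibbs sampling sketch for a correlated bivariate Gaussian: each step resamples one variable from its exact conditional given the other, which is the special case of Metropolis-Hastings with acceptance probability 1 mentioned above; the correlation value and chain length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                         # target: zero-mean bivariate Gaussian with correlation rho
sd = np.sqrt(1 - rho**2)          # conditional standard deviation
z1, z2, samples = 0.0, 0.0, []
for _ in range(20000):
    z1 = rng.normal(rho * z2, sd)        # sample z1 from p(z1 | z2)
    z2 = rng.normal(rho * z1, sd)        # sample z2 from p(z2 | z1)
    samples.append((z1, z2))

samples = np.array(samples[2000:])
print("sample correlation:", np.corrcoef(samples.T)[0, 1])   # should be close to rho
```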


Metropolis-Hastings
slice sampling U

U U

z (b)

The Hybrid Monte Carlo Algorithm Hamiltonian MCMC

Hybrid Monte Carlo

E(z)

Hybrid Monte Carlo

detailed balance
ORC(267270520) 12:14:03

Introducing Monte Carlo Methods with R (use R) PSR MCMC

(813394698) 12:36:37
Markov Chain Monte Carlo In Practice(Gilks)
(403774317) 12:38:06
code
(813394698)
MCMC burn-in koller

Continuous Latent Variables



: @_CASIA
Wilbur_(1954123) 20:00:49
PRML
9.2 K z
0-1 K 1 1 0 x K
z kmeans GMM

12

2.3.1

PCAFA

PCA Karhunen-Loève transform (KL transform), Hotelling transform

Hotelling 1933
Pearson 1901

u_1

lambda_1 u_1 S
u_1

S
S
D

D M<D
M

12.14

D M

PCA S DxD S O(D^3)

M O(M*D^2)
power method(Golub and Van Loan, 1996) O(M*D^2) D
PCA O(M*D^2)
EM PPCA
PCA
S

(12.28) X^T

XX^T NxN
XX^T X^TX
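A minimal sketch of the N < D trick just described: instead of eigendecomposing the DxD covariance (O(D^3)), eigendecompose the NxN matrix X X^T / N and map its eigenvectors back with X^T; the toy low-rank data and the number of retained components are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 50, 200, 2                                    # N < D case
X = rng.standard_normal((N, 5)) @ rng.standard_normal((5, D))   # data near a low-dim subspace
Xc = X - X.mean(0)

# direct route: top eigenvectors of the DxD covariance S (O(D^3))
vals, vecs = np.linalg.eigh(Xc.T @ Xc / N)
U_direct = vecs[:, -M:]

# N < D trick: eigendecompose the NxN matrix (1/N) X X^T, then map back with X^T
vals2, V = np.linalg.eigh(Xc @ Xc.T / N)
U_small = Xc.T @ V[:, -M:]
U_small /= np.linalg.norm(U_small, axis=0)              # renormalize to unit length

# the two principal subspaces agree up to sign
print(np.abs(np.sum(U_direct * U_small, axis=0)))
```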

=============================================================
Wilbur_(1954123) 20:30:56

dxhml(601765336) 20:32:36

Wilbur_(1954123) 20:33:35
PCA

Phinx(411584794) 20:34:46

coli<fwforever@126.com> 20:35:19

(15673189) 20:35:22

Wilbur_(1954123) 20:36:11
O(D^3) D O(D^2)

DxD NxN
(15673189) 20:37:54
pca matlab

=========================================================
Wilbur_(1954123) 20:40:45

PCA PPCA PCA


1.
2. EM PCA

3. EM missing values
4. PPCA EM
5.
6.
7. PPCA
8.
PPCA 2.3.1
z p(z)
x z p(x|z)
z 0

W mu sigma^2
p(z) z
x

z M epsilon D sigma^2*I
W
mu epsilon
PPCA PCA

musigma^2
2.3.1, 2.3.2, 2.3.3
p(z) p(x|z) p(x)

(12.31)(12.32) 2.3.3
p(x)(12.33)

C DxD

O(D^3) O(M^3)

W, mu, sigma^2 0

mu

(12.43)

sum(x_i^T*S*x_i) = Tr(X^T*S*X) X i x_i X'SX (i,j)


X' i S X j (12.43)(12.44) trace
Tr(ABC) = Tr(BCA) = Tr(CAB)

U_M S M
Tipping and Bishop (1999b)L_M
R MxM R

PPCA PCA

sigma^2

(12.32)(12.33)sigma^2
W

dxhml(601765336) 21:36:52
X
Wilbur_(1954123) 21:40:43
@dxhml
W sigma^2

EM PPCA

PPCA EM EM
FA missing data EM
EM E z_n z_n*z_n^T M
W sigma^2
mu mu W sigma^2

EM W sigma^2
O(NDM) D M
O(ND^2) EM online missing data

===========================================================
(15673189) 22:07:20
ppac pca em ?
Wilbur_(1954123) 22:10:20

online
PPCA PCA

(15673189) 22:12:12

Wilbur_(1954123) 22:12:14
PCA

=========================================================
Wilbur_(1954123) 20:57:53
12.2.3 PCA PPCA
z W x
0 X
W x mu sigma^2*I W
W PCA
EM PCA
W

(12.45)

PCA

W
W precision alpha_i
precision

alpha_i w_i
0 alpha alpha
PRML 7.2
EM W

4.4 Laplace approximation alpha_i

EM M alpha_i
M sigma^2 PPCA EM (12.57) W

E
alpha_i
alpha_i
EM EM

12.2.4 FAFA PPCA PPCA

FA

FA FA
FA FA PCA
FA
12.3 PCAKPCA

PCA PCA
PCA

DxD

phi

phi(x_n)*phi(x_n)^T phi(x_n)^T*phi(x_n)
phi(x_n)
(12.73)(12.74)

v_i phi(x_n)

a phi(x) a

12.76 12.75

phi(x_l)^T

K 0

PCA

(12.80)(12.81) a_i lambda_i


phi(x)

x x_n

phi(x_n) 0

KPCA K NxN PCA DxD

PRML
12 12.4
ICAPCA

ICA

ICA
http://ishare.iask.sina.com.cn/f/16979796.html
http://ufldl.stanford.edu/tutorial/index.php/ICA http://ufldl.stanford.edu/tutorial/index.php/RICA
ICA

autoassociative neural networks


autoassociative neural networks deep learning
autoencoder

M D deep learning
M D

deep learning pretraining

BP
http://ufldl.stanford.edu/tutorial/index.php/Autoencoders ICA

12.4.3
kmeans
PCA
PCA

http://www.iro.umontreal.ca/~kegl/research/pcurves/
MDSISOMAP LLEMDS

MDS
PCA MDS
PCA

ISOMAP LLE science ISOMAP MDS


A BC D
A D A D LLE
x x
LLE
http://www.cs.nyu.edu/~roweis/lle/ matlab
sam roweis
ISOMAP LLE

GTM SOM

============================================================
(328509423) 22:05:41
autoencoder intput xoutput x y
Wilbur_(1954123) 22:07:31
x
denoising autoencoder
(328509423) 22:08:47
autoencoder 3 3
intput x vector vectors
Wilbur_(1954123) 22:11:11

(328509423) 22:12:09
autoencoder regression y
(271182928) 22:12:42
x regression y x
x correlation
Wilbur_(1954123) 22:15:30
100 80 60 80 100
100 80 60 100
autoencoder

(271937125) 22:11:47

Wilbur_(1954123) 22:15:30
dodo
http://blog.sina.com.cn/s/blog_4d92192101008end.html

(75553395) 22:16:42

Wilbur_(1954123) 22:17:50

fine-tuning
(328509423) 22:19:49
autoencoder pca learning
intputx x correlation interaction learning
x correlation x y correlation x
y PLS.
Wilbur_(1954123) 22:24:35
x x y y
y x
CNN

(328509423) 22:26:03
multitask learning y correlation Jieping Ye

XXX<liyitan2144@163.com> 22:26:04
@Wilbur
Wilbur_(1954123) 22:26:21
@
XXX<liyitan2144@163.com> 22:26:27
Jieping Ye multitask learning
(328509423) 22:26:42
@Wilbur autoencoder framework
try
Wilbur_(1954123) 22:27:55
@XXX
XXX<liyitan2144@163.com> 22:30:04

Deep learning@Wilbur

Wilbur_(1954123) 22:30:51
@
http://deeplearning.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder
@XXX hashing trick

XXX<liyitan2144@163.com> 22:33:04
hashing trick learning to hash
Wilbur_(1954123) 22:35:23
learning to hash hash code google WTA hashing
CVPR best paper hashing trick learning to hash

submodular optimization
Towards Efficient and Exact MAP-Inference for Large Scale Discrete Computer Vision Problems via Combinatorial Optimization

CV
XXX<liyitan2144@163.com> 22:39:33
bound
Wilbur_(1954123) 22:45:26

Sequential Data

: @_ISCAS
-<zh3f@qq.com> 19:01:27
DNA

12

Hidden Markov Models: Linear Dynamical Systems


HMM
1

3(emission probabilities)

HMM 1 2 Zn
Zn k

1
2

3 p(Z|X)
(419504839) 19:24:21

-<zh3f@qq.com> 19:25:18
@
(419504839) 19:27:20

-<zh3f@qq.com> 19:28:27

HMM
(250992259) 19:30:46

-<zh3f@qq.com> 19:32:45
@

-<zh3f@qq.com> 19:32:51
1 EM HMM
123

EM
E

13.10 13.12,

13.12

Lagrange

M E HMM E

13.35

z_N

13.13

M EM HMM

argmax_Z{p(Z|X)} E

13.13

HMM
log

log(

*p(x_1|z_1)

log(

Z expZ
HMM
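A minimal sketch of the max-over-paths decoding just mentioned (argmax_Z p(Z|X)), done in the log domain to avoid underflow, i.e. the Viterbi recursion with backtracking; the toy HMM parameters and observation sequence are illustrative assumptions:

```python
import numpy as np

pi = np.array([0.6, 0.4])                  # initial state probabilities (illustrative HMM)
A = np.array([[0.7, 0.3], [0.4, 0.6]])     # A[i, j] = p(z_{n+1} = j | z_n = i)
B = np.array([[0.9, 0.1], [0.2, 0.8]])     # B[k, x] = p(x | z = k)
obs = [0, 0, 1, 1, 1, 0]

logA, logB = np.log(A), np.log(B)
omega = np.log(pi) + logB[:, obs[0]]       # omega_1(k): best log-prob of a path ending in k
back = []
for x in obs[1:]:
    scores = omega[:, None] + logA         # scores[i, j]: extend best path at i with i -> j
    back.append(scores.argmax(0))          # remember the best predecessor of each state
    omega = scores.max(0) + logB[:, x]

path = [int(omega.argmax())]               # backtrack the most probable state sequence
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print("most probable states:", path)
```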
Linear Dynamical Systems HMM

(872352128) 21:09:21
hmm hmm
EM

Combining Models

: @
(66707180) 18:57:18
14 combining models

committees,
boosting: committees L

conditional mixture model

Bayesian model averaging


H h p(h)

X h

committees committees boosting

bias-variance trade-off 3.5 copy

sin
lamda bias variance variance

sin variance

bootstrap M bootstrap N
M M M M

committees x:

h(x)

y(x) h(x)

error

M :

committees

error 0

error committees error 1/M


error error error boosting
.boosting
AdaBoost, adaptive boosting.
50%

boosting

1/N. M
(14.15) x
x (14.16)

. (14.17):

50%
)

0.5,

(14.18)

0 exp(a) 1

exp(0)=1.
(14.19):

(m=1)
2 3 m=2
150
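A minimal AdaBoost sketch with decision stumps following (14.15)-(14.19) as discussed above: equal initial weights, weighted error epsilon_m, learner weight alpha_m = ln((1-epsilon_m)/epsilon_m), exponential re-weighting of misclassified points, and a weighted vote at the end; the toy data (a circular boundary that no single stump can capture) and the number of rounds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
t = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)     # circular boundary, hard for one stump

def stump(X, feat, thresh, sign):
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def fit_stump(X, t, w):
    """Pick the axis-aligned threshold with the smallest weighted error."""
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = np.sum(w * (stump(X, feat, thresh, sign) != t))
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

w = np.ones(len(X)) / len(X)                           # (14.15): equal initial weights 1/N
learners = []
for m in range(20):
    err, feat, thresh, sign = fit_stump(X, t, w)
    eps = err / w.sum()                                # (14.16): weighted error rate
    alpha = np.log((1 - eps) / eps)                    # (14.17): weight of this learner
    w *= np.exp(alpha * (stump(X, feat, thresh, sign) != t))   # (14.18): boost misclassified points
    learners.append((alpha, feat, thresh, sign))

F = sum(a * stump(X, f, th, s) for a, f, th, s in learners)    # (14.19)-style weighted vote
print("training accuracy:", np.mean(np.sign(F) == t))
```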

(66707180) 19:59:59
@ZealotMaster
ZealotMaster(850458544) 20:00:14
over fitting m=150
(852383636) 20:02:23

(66707180) 20:02:25

(852383636) 20:04:50

(66707180) 20:04:56
boosting

================================================================
boosting boosting boosting
bound bound
adaboost

N N

boosting

, boosting .

boosting

boosting

boosting error ,

error

y(x) variational minimization,


half the log-odds

z<0

cross-entropy

svm
hinge

================================================================

x
x
ID3, C4.5, CART CART
CART classification and regression trees

5 A-E

trade-off

Q(T) cross-entropy Gini index


:

Gini index

================================================================
conditional mixture models

mixture of experts

9
x

9 EM

, k n

1 0 log

EM EM
E k n

responsibilities n k
responsibilities

EM M

,
:

W const log

E M
EM EM
30 50
responsibilities k n

EM M closed-form
IRLS(iterative reweighted least squares)

mixtures of expertsmixture of experts


x

gating

experts, experts

gating experts
gating

softmax

gating expert EM M
IRLS hierarchical mixture of experts
14