Machine Learning Personal Notes, Complete Edition v4.3

2014
V4.3

2014

haiguang2000@qq.com
qq10822884
2017-06-08
2014
Machine Learning()
Web
10 18
ppt
2014 2014
https://www.coursera.org/course/ml
potplayer
ppt
_V
http://pan.baidu.com/s/1pKLATJl xn4w
qq
2017-6-7

1.0 2014.12.16
1.1 2014.12.31
2.0 2015.02.17
2.1 2015.02.23
2.2 2015.03.02
2.3 2015.03.14
2.4 2015.05.02
2.5 2015.05.13
3.0 2016.01.11 OCTAVE
3.1 2016.01.15
3.2 2016.02.15
3.3 2016.02.19
4.0 2016.02.24
4.1 2016.03.20
4.2 2016.03.28
4.3 2017.06.08

Week 1
1 Introduction
1.1 Welcome
1.2 What is Machine Learning?
1.3 Supervised Learning
1.4 Unsupervised Learning
2 Linear Regression with One Variable
2.1 Model Representation
2.2 Cost Function
2.3 Cost Function - Intuition I
2.4 Cost Function - Intuition II
2.5 Gradient Descent
2.6 Gradient Descent Intuition
2.7 Gradient Descent for Linear Regression
2.8 What's Next
3 Linear Algebra Review
3.1 Matrices and Vectors
3.2 Addition and Scalar Multiplication
3.3 Matrix Vector Multiplication
3.4 Matrix Matrix Multiplication
3.5 Matrix Multiplication Properties
3.6 Inverse and Transpose
Week 2
4 Linear Regression with Multiple Variables
4.1 Multiple Features
4.2 Gradient Descent for Multiple Variables
4.3 Gradient Descent in Practice I - Feature Scaling
4.4 Gradient Descent in Practice II - Learning Rate
4.5 Features and Polynomial Regression
4.6 Normal Equation
4.7 Normal Equation Noninvertibility (Optional)
5 Octave Tutorial
5.1 Basic Operations
5.2 Moving Data Around
5.3 Computing on Data
5.4 Plotting Data
5.5 Control Statements: for, while, if statements
5.6 Vectorization
5.7 Working on and Submitting Programming Exercises
Week 3
6 Logistic Regression
6.1 Classification
6.2 Hypothesis Representation
6.3 Decision Boundary
6.4 Cost Function
6.5 Simplified Cost Function and Gradient Descent
6.6 Advanced Optimization
6.7 Multiclass Classification: One-vs-all
7 Regularization
7.1 The Problem of Overfitting
7.2 Cost Function
7.3 Regularized Linear Regression
7.4 Regularized Logistic Regression
Week 4
8 Neural Networks: Representation
8.1 Non-linear Hypotheses
8.2 Neurons and the Brain
8.3 Model Representation I
8.4 Model Representation II
8.5 Examples and Intuitions I
8.6 Examples and Intuitions II
8.7 Multiclass Classification
Week 5
9 Neural Networks: Learning
9.1 Cost Function
9.2 Backpropagation Algorithm
9.3 Backpropagation Intuition
9.4 Implementation Note: Unrolling Parameters
9.5 Gradient Checking
9.6 Random Initialization
9.7 Putting It Together
9.8 Autonomous Driving
Week 6
10 Advice for Applying Machine Learning
10.1 Deciding What to Try Next
10.2 Evaluating a Hypothesis
10.3 Model Selection and Train/Validation/Test Sets
10.4 Diagnosing Bias vs. Variance
10.5 Regularization and Bias/Variance
10.6 Learning Curves
10.7 Deciding What to Do Next Revisited
11 Machine Learning System Design
11.1 Prioritizing What to Work On
11.2 Error Analysis
11.3 Error Metrics for Skewed Classes
11.4 Trading Off Precision and Recall
11.5 Data For Machine Learning
Week 7
12 Support Vector Machines
12.1 Optimization Objective
12.2 Large Margin Intuition
12.3 Mathematics Behind Large Margin Classification (Optional)
12.4 Kernels I
12.5 Kernels II
12.6 Using an SVM
Week 8
13 Clustering
13.1 Unsupervised Learning: Introduction
13.2 K-Means Algorithm
13.3 Optimization Objective
13.4 Random Initialization
13.5 Choosing the Number of Clusters
14 Dimensionality Reduction
14.1 Motivation I: Data Compression
14.2 Motivation II: Visualization
14.3 Principal Component Analysis Problem Formulation
14.4 Principal Component Analysis Algorithm
14.5 Choosing the Number of Principal Components
14.6 Reconstruction from Compressed Representation
14.7 Advice for Applying PCA
Week 9
15 Anomaly Detection
15.1 Problem Motivation
15.2 Gaussian Distribution
15.3 Algorithm
15.4 Developing and Evaluating an Anomaly Detection System
15.5 Anomaly Detection vs. Supervised Learning
15.6 Choosing What Features to Use
15.7 Multivariate Gaussian Distribution (Optional)
15.8 Anomaly Detection using the Multivariate Gaussian Distribution (Optional)
16 Recommender Systems
16.1 Problem Formulation
16.2 Content Based Recommendations
16.3 Collaborative Filtering
16.4 Collaborative Filtering Algorithm
16.5 Vectorization: Low Rank Matrix Factorization
16.6 Implementational Detail: Mean Normalization
Week 10
17 Large Scale Machine Learning
17.1 Learning With Large Datasets
17.2 Stochastic Gradient Descent
17.3 Mini-Batch Gradient Descent
17.4 Stochastic Gradient Descent Convergence
17.5 Online Learning
17.6 Map Reduce and Data Parallelism
18 Application Example: Photo OCR
18.1 Problem Description and Pipeline
18.2 Sliding Windows
18.3 Getting Lots of Data and Artificial Data
18.4 Ceiling Analysis: What Part of the Pipeline to Work on Next
19 Conclusion
19.1 Summary and Thank You
Week 1
1 Introduction
1.1 Welcome
Reference video: 1 - 1 - Welcome (7 min).mkv
Facebook
AI
A B
web
web
DNA
AI
Netflix iTunes Genius
AI
12 IT HR

1.2 What is Machine Learning?
Reference video: 1 - 2 - What is Machine Learning_ (7 min).mkv
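Arthur Samuel defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.
In the 1950s Samuel wrote a checkers-playing program. Samuel himself was not a strong player, but the program played a very large number of games against itself, learned which board positions tend to lead to wins and which to losses, and eventually played better than Samuel did.
A more recent definition comes from Tom Mitchell: a computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
In the checkers example, the experience E is playing many games, the task T is playing checkers, and the performance measure P is the probability that the program wins its next game.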
Email
email
P P
T E
1.3 Supervised Learning
Reference video: 1 - 3 - Supervised Learning (12 min).mkv
750
$150,000
$200,000
1 0
5 1 5
0 1
012
30 1 2 3

X O X
2 3 5
5 3
3 5
1.
2.
0 1
0 1
0 1
1.4 Unsupervised Learning
Reference video: 1 - 4 - Unsupervised Learning (14 min).mkv
URL
news.google.com
DNA
email Facebook +
12345678910,
12345678910
JAVA
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
Octave Octave, Octave Matlab
Octave Octave
Matlab Matlab
Octave Octave
SVM
Octave C++ JAVA
C++ Java
C++ Java Python
Octave Octave
Octave
C++ Java
C++
Octave
Octave

Week 1
2 Linear Regression with One Variable
2.1 Model Representation
Reference video: 2 - 1 - Model Representation (8 min).mkv
1250
220000()
0/1
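The notation used to describe this regression problem (Training Set):
m: the number of examples in the training set
x: the feature / input variable
y: the target / output variable
(x,y): one example in the training set
(x(i),y(i)): the i-th training example
h: the solution or function produced by the learning algorithm, also called the hypothesis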
h hypothesis() h
h x y y h
x y
h hypothesis
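One possible form of h is $h_\theta(x) = \theta_0 + \theta_1 x$. Because there is only one feature / input variable, this kind of problem is called linear regression with one variable.
2.2 Cost Function
Reference video: 2 - 2 - Cost Function (8 min).mkv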
m m = 47
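The hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$.
The parameters $\theta_0$ and $\theta_1$ to be chosen are, in the house-price example, the intercept on the y axis and the slope of the line.
The gap between the model's prediction and the actual value in the training set is the modeling error.
The goal is to choose the parameters that minimize the sum of squared modeling errors, that is, to minimize the cost function
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
Plotting $J(\theta_0,\theta_1)$ against $\theta_0$ and $\theta_1$ shows that there is a point at which $J(\theta_0,\theta_1)$ is smallest.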
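As a minimal sketch of the squared-error cost function, the Octave snippet below evaluates J on a three-point toy data set. The data, the variable names and the anonymous function computeCost are illustrative assumptions, not part of the course materials.

```octave
% Toy training set (assumed for illustration): y equals x exactly,
% so theta = [0; 1] fits perfectly.
X = [1 1; 1 2; 1 3];   % first column is the constant feature x0 = 1
y = [1; 2; 3];
m = length(y);         % number of training examples

% J(theta) = 1/(2m) * sum over i of (h(x_i) - y_i)^2, with h(x) = X * theta
computeCost = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

J_zero = computeCost([0; 0]);   % predicts 0 everywhere: (1 + 4 + 9) / 6
J_fit  = computeCost([0; 1]);   % perfect fit: cost 0
```

Writing the cost as a function of theta alone makes it easy to compare parameter choices on the same data.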
2.3 Cost Function - Intuition I
Reference video: 2 - 3 - Cost Function - Intuition I (11 min).mkv
2.4 Cost Function - Intuition II
Reference video: 2 - 4 - Cost Function - Intuition II (9 min).mkv
J(0,1)
0 1
0 1
J 0 1
2.5 Gradient Descent
Reference video: 2 - 5 - Gradient Descent (11 min).mkv
J(0,1)
0,1,...,n
local minimum
global minimum
360
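The batch gradient descent algorithm is:
repeat until convergence {
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ (for j=0 and j=1)
}
Here $\alpha$ is the learning rate, which determines how large a step is taken in the direction of steepest descent.
$\theta_0$ and $\theta_1$ must be updated simultaneously: evaluate both right-hand sides first (for j=0 and j=1, using the current $\theta_0$ and $\theta_1$), and only then assign the new values to $\theta_0$ and $\theta_1$.
2.6 Gradient Descent Intuition
Reference video: 2 - 6 - Gradient Descent Intuition (12 min).mkv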

$$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$$
J()
learning rate
11 1

$$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$$

1 1 1
J()
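2.7 Gradient Descent for Linear Regression
Reference video: 2 - 7 - GradientDescentForLinearRegression (6 min).mkv
Applying gradient descent to the linear regression cost function requires the partial derivatives of $J(\theta_0,\theta_1)$:
j=0: $\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)$
j=1: $\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}$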
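The update rule for linear regression can be sketched in Octave as follows. The toy data, the learning rate alpha = 0.1 and the iteration count are assumptions chosen only so that this small example converges.

```octave
% Toy data (assumed): the line y = x, so the optimum is theta = [0; 1].
X = [1 1; 1 2; 1 3];   % first column is x0 = 1
y = [1; 2; 3];
m = length(y);

alpha = 0.1;            % learning rate (assumed value)
theta = zeros(2, 1);    % start from theta0 = theta1 = 0

for iter = 1:2000
  % Both partial derivatives at once: (1/m) * sum((h(x_i) - y_i) * x_ij)
  grad = (X' * (X * theta - y)) / m;
  % Simultaneous update of theta0 and theta1
  theta = theta - alpha * grad;
end
```

Computing grad from the old theta before assigning the new one is what makes the update simultaneous.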
""
""
(normal equations)
2.8 What's Next
Reference video: 2 - 8 - What_'s Next (6 min).mkv
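Week 1
3 Linear Algebra Review
3.1 Matrices and Vectors
Reference video: 3 - 1 - Matrices and Vectors (9 min).mkv
A 4×2 matrix has 4 rows and 2 columns; in general the dimension of a matrix with m rows and n columns is m×n.
$A_{ij}$ denotes the element in row i, column j.
A vector is a special matrix with a single column; for example, a 4×1 matrix is a four-dimensional column vector.
Vectors can be 1-indexed or 0-indexed.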
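3.2 Addition and Scalar Multiplication
Reference video: 3 - 2 - Addition and Scalar Multiplication (7 min).mkv
Matrix addition: matrices of the same dimension are added element by element, for example
$$\begin{bmatrix}1&0\\2&5\\3&1\end{bmatrix}+\begin{bmatrix}4&0.5\\2&5\\0&1\end{bmatrix}=\begin{bmatrix}5&0.5\\4&10\\3&2\end{bmatrix}$$
Scalar multiplication: every element is multiplied by the scalar, for example
$$3\times\begin{bmatrix}1&0\\2&5\\3&1\end{bmatrix}=\begin{bmatrix}3&0\\6&15\\9&3\end{bmatrix}=\begin{bmatrix}1&0\\2&5\\3&1\end{bmatrix}\times 3$$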
3.3 Matrix Vector Multiplication
Reference video: 3 - 3 - Matrix Vector Multiplication (14 min).mkv
Multiplying an m×n matrix by an n×1 vector gives an m×1 vector.
3.4 Matrix Matrix Multiplication
Reference video: 3 - 4 - Matrix Matrix Multiplication (11 min).mkv
Multiplying an m×n matrix by an n×o matrix gives an m×o matrix.
A B
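3.5 Matrix Multiplication Properties
Reference video: 3 - 5 - Matrix Multiplication Properties (9 min).mkv
Matrix multiplication is not commutative: in general $AB \neq BA$.
Matrix multiplication is associative: $A(BC) = (AB)C$.
Identity matrix: just as 1 is the identity for multiplication of numbers, matrix multiplication has an identity matrix, usually written I or E, with 1 on the main diagonal and 0 everywhere else.
$AI = IA = A$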
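3.6 Inverse and Transpose
Reference video: 3 - 6 - Inverse and Transpose (11 min).mkv
Inverse: if A is an m×m square matrix and has an inverse, then $AA^{-1} = A^{-1}A = I$.
The inverse is usually computed in OCTAVE or MATLAB.
Transpose: let A be an m×n matrix whose element in row i, column j is a(i,j), i.e. A=a(i,j).
The transpose of A is the n×m matrix B with B=a(j,i), i.e. b(i,j)=a(j,i): the element in row i, column j of B is the element in row j, column i of A. It is written $A^T = B$ (some books write A'=B).
Intuitively, every element of A is mirrored about the 45-degree line that runs down and to the right from the element in row 1, column 1.
Basic properties of the transpose:
$(A \pm B)^T = A^T \pm B^T$
$(AB)^T = B^TA^T$
$(A^T)^T = A$
$(KA)^T = KA^T$
In matlab the transpose is written directly as
x=y'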
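Week 2
4 Linear Regression with Multiple Variables
4.1 Multiple Features
Reference video: 4 - 1 - Multiple Features (8 min).mkv
With several features, the features of the model are written $(x_1, x_2, ..., x_n)$. Notation:
n: the number of features
$x^{(i)}$: the i-th training example, i.e. the i-th row of the feature matrix, which is a vector. For example $x^{(2)} = \begin{bmatrix}1416\\3\\2\\40\end{bmatrix}$
$x_j^{(i)}$: the j-th feature of the i-th training example, i.e. row i, column j of the feature matrix. For example $x_2^{(2)} = 3$ and $x_3^{(2)} = 2$.
The multivariable hypothesis is $h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n$.
This formula has n+1 parameters and n variables. To simplify it, introduce $x_0 = 1$.
The parameter vector is then (n+1)-dimensional, every training example is also an (n+1)-dimensional vector, and the feature matrix X has dimension m*(n+1). The formula simplifies to $h_\theta(x) = \theta^TX$, where the superscript T denotes the matrix transpose.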
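4.2 Gradient Descent for Multiple Variables
Reference video: 4 - 2 - Gradient Descent for Multiple Variables (5 min).mkv
As in the single-variable case, the cost function is the sum of squared modeling errors:
$$J(\theta_0,\theta_1,...,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
where $h_\theta(x) = \theta^TX = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_nx_n$.
For n>=1 the batch gradient descent update is, simultaneously for every j = 0, ..., n:
$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$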
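4.3 Gradient Descent in Practice I - Feature Scaling
Reference video: 4 - 3 - Gradient Descent in Practice I - Feature Scaling (9 min).mkv
With multiple features, gradient descent converges faster when all features have similar scales.
Take two features as an example: house size ranging over 0-2000 square feet and number of bedrooms ranging over 0-5. The contours of the cost function are then very elongated, and gradient descent needs many iterations to converge.
The remedy is to scale every feature to roughly the range -1 to 1.
The simplest way is $x_n = \frac{x_n - \mu_n}{s_n}$, where $\mu_n$ is the mean and $s_n$ the standard deviation of that feature.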
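A minimal sketch of this scaling in Octave is shown below. The feature values are made up for illustration, and bsxfun is used here only as one portable way to apply the per-column operation.

```octave
% Hypothetical data: column 1 = house size (square feet), column 2 = bedrooms.
X = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];

mu = mean(X);   % per-feature mean (1 x 2 row vector)
s  = std(X);    % per-feature standard deviation (1 x 2 row vector)

% x_n := (x_n - mu_n) / s_n, applied to every column
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), s);
```

After scaling, every column has mean 0 and standard deviation 1. The same mu and s would have to be reused to scale any new example before making a prediction.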
4.4 Gradient Descent in Practice II - Learning Rate
Reference video: 4 - 4 - Gradient Descent in Practice II - Learning Rate (9 min).mkv
0.001

$\alpha$ = 0.01, 0.03, 0.1, 0.3, 1, 3, 10
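4.5 Features and Polynomial Regression
Reference video: 4 - 5 - Features and Polynomial Regression (8 min).mkv
In the house-price problem, with $x_1$ = frontage and $x_2$ = depth, a single feature $x$ = frontage * depth = area gives
$h_\theta(x) = \theta_0 + \theta_1x$
A linear model does not suit every data set; sometimes a curve is needed, such as a quadratic model
$h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2^2$
or a cubic model
$h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2^2 + \theta_3x_3^3$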
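4.6 Normal Equation
Reference video: 4 - 6 - Normal Equation (16 min).mkv
The normal equation finds the parameters that minimize the cost function by solving $\frac{\partial}{\partial\theta_j}J(\theta_j) = 0$.
Let X be the feature matrix of the training set (including $x_0 = 1$) and y the vector of training targets. The normal equation then gives
$\theta = (X^TX)^{-1}X^Ty$
The superscript T denotes the matrix transpose and the superscript -1 the matrix inverse. With $A = X^TX$, $(X^TX)^{-1} = A^{-1}$.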
Octave
pinv(X'*X)*X'*y
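When the number of features n is large, computing $(X^TX)^{-1}$ is expensive: the cost of inverting the matrix is about $O(n^3)$.
The normal equation is generally still acceptable while n is below about 10000.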
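The normal equation can be tried out on a tiny example, sketched below in Octave. The three data points are an assumption chosen so that the exact answer, theta = [0; 1], is easy to check by hand.

```octave
% Toy data (assumed): y = x, so the best fit is theta0 = 0, theta1 = 1.
X = [1 1; 1 2; 1 3];   % feature matrix with x0 = 1 in the first column
y = [1; 2; 3];

% theta = (X^T X)^(-1) X^T y, computed with the pseudo-inverse
theta = pinv(X' * X) * X' * y;
```

Unlike gradient descent, this needs no learning rate and no iterations, at the price of inverting an (n+1) by (n+1) matrix.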
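4.7 Normal Equation Noninvertibility (Optional)
Reference video: 4 - 7 - Normal Equation Noninvertibility (Optional) (6 min).mkv
This section revisits the normal equation ( normal equation ) $\theta = (X^TX)^{-1}X^Ty$, computed as theta = inv(X'*X)*X'*y, and asks what happens when the matrix X'X is not invertible.
Octave has two functions for inverting a matrix, pinv() and inv(). pinv() computes the pseudo-inverse, so it still returns a usable result even when X'X is not invertible.
X'X is usually non-invertible for one of two reasons.
The first is redundant features. Suppose x1 is the size of a house in square feet and x2 is the same size in square metres. Since 1 metre is about 3.28 feet, $x_1 = x_2 \times (3.28)^2$ always holds; the two features are linearly dependent and X'X is not invertible.
The second is having too many features for the number of training examples, i.e. m less than or equal to n. For example, with m = 10 training examples and n = 100 features, the parameter vector has n+1 = 101 dimensions, and fitting 101 parameters from only 10 examples is asking too much of the data.
When X'X turns out to be non-invertible, first look for redundant features such as x1 and x2 above and delete one of them. If there are simply too many features, delete some of them or use regularization.
In Octave, pinv() still behaves sensibly when X'X is not invertible.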
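The redundant-feature case can be sketched in Octave as below. The data are hypothetical: the third column is exactly twice the second, so X'X is singular. The explicit tolerance passed to pinv is an assumption made here to keep the example robust against rounding, not something the course prescribes.

```octave
% Hypothetical data with a redundant feature: column 3 = 2 * column 2.
X = [1 1 2; 1 2 4; 1 3 6];
y = [1; 2; 3];

A = X' * X;   % singular, because the columns of X are linearly dependent

% pinv returns the pseudo-inverse; singular values below the tolerance
% (second argument, chosen here for illustration) are treated as zero.
theta = pinv(A, 1e-10) * X' * y;

predictions = X * theta;   % still reproduces y
```

The resulting theta is one of infinitely many parameter vectors that fit the data equally well; the pseudo-inverse picks the one with the smallest norm.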
Week 2
5 Octave Tutorial
5.1 Basic Operations
Reference video: 5 - 1 - Basic Operations (14 min).mkv
Octave
C++JavaPythonNumpy
Octave Octave
Octave
Octave (prototyping language) Octave
C++ Java
Octave C++
Java
OctaveMATLABPythonNumPy
Octave MATLAB
matlabmatlab Octave D
matlab MATLAB
PythonNumPy R R
PythonNumPy
Octave NumPy
R Octave
Octave
Octave
Octave
Octave Octave
Octave
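Basic arithmetic works as expected: 5 + 6 gives 11, and likewise 3 - 2, 5 * 8, 1/2 and 2 ^ 6.
Logical operations: 1==2 evaluates to false, which is shown as 0.
Not-equal is written ( ~= ) rather than ( != ).
1 && 0 is logical AND, 1 || 0 is logical OR, and XOR ( 1, 0 ) returns 1.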
Octave 324.x
Octave
Octave
Octave
A 3 A 3
b "hi"
C 3 1 C
A A
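disp(C) prints the value of C without the "C =" prefix.
sprintf formats a string: with the format %0.6f, sprintf prints a with 6 digits after the decimal point.
v = [1 2 3] makes v a row vector with 3 columns and 1 row.
v = [1;2;3] makes a column vector with 3 rows and 1 column.
v = 1:0.1:2 makes v a row vector that starts at 1 and increases in steps of 0.1 up to 2, so v has 11 elements: 1, 1.1, 1.2, 1.3 and so on up to 2.
v = 1:6 makes v the integers from 1 to 6.
ones(2, 3) generates a 2×3 matrix whose elements are all 1.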
w A
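rand generates random matrices: rand(3, 3) returns a 3×3 matrix whose elements are random values between 0 and 1.
randn draws its values from a Gaussian (normal) distribution with mean 0.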
hist
help
Octave
Octave
Octave
5.2 Moving Data Around
Reference video: 5 - 2 - Moving Data Around (16 min).mkv
Octave Octave
Octave
Octave
A A
A = [1 2; 3 4; 5 6]
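A is a 3×2 matrix. In Octave the size() command returns the dimensions of a matrix.
size(A) returns 3 2.
The value returned by size() is itself a 1×2 matrix, which can be stored in sz:
sz = size(A)
sz is then a 1×2 matrix whose two elements are 3 and 2.
size(sz) returns 1 2, because sz is a 1×2 matrix.
size(A, 1) returns 3, the size of the first dimension of A, i.e. the number of rows of A.
size(A, 2) returns 2, the number of columns of A.
For a vector v such as v = [1 2 3 4], length(v) returns the size of its largest dimension, here 4.
length(A) returns 3, because A is 3×2 and the larger dimension is 3.
length is normally applied to vectors rather than matrices, for example
length([1;2;3;4;5]) returns 5.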
Octave Octave
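pwd shows the directory Octave is currently working in.
cd changes the directory, for example cd C:\Users\ang\Desktop
ls lists the files in the current directory; it comes from the Unix and Linux command of the same name.
The directory used here contains the files featuresX.dat and priceY.dat.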
featuresX
47 2104 3
1600 3
priceY featuresX priceY
Octave featuresX.dat
featuresX priceY.dat
load('featuresX.dat')
Octave
who Octave
featuresX featuresX
size(featuresX) returns 47 2: featuresX is a 47×2 matrix.
size(priceY) returns 47 1: priceY is a 47-dimensional vector.
who whos
double
clear clear featuresX
whos featuresX
v= priceY(1:10)
Y 10 v
save hello.mat v v
hello.mat hello.mat
MATLAB MATLAB
MATLAB
clear
hello.mat v v
hello.mat save
save hello.txt v -ascii
ascii
hello.txt
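A(3,2) indexes the element in row 3, column 2 of A.
A(2,:) returns every element of the second row of A; the colon means "every element along that row or column".
A(:,2) returns every element of the second column of A, here 2 4 6.
A([1 3],:) returns every element of rows 1 and 3 of A.
Indexing can also be used for assignment: A(:,2) = [10;11;12] replaces the second column of A with 10 11 12, so the first column is still 1 3 5 and the second column becomes 10 11 12.
A = [A, [100; 101; 102]] appends a new column vector to the right of A.
A(:) puts all elements of A into a single column vector, here a 9×1 vector.
With A = [1 2; 3 4; 5 6] and B = [11 12; 13 14; 15 16], C = [A B] places the two matrices side by side, A on the left and B on the right.
C = [A; B] stacks them instead, A on top and B below; the semicolon means "go to the next row". C is then a 6×2 matrix.
[A B] and [A, B] give the same result.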
Octave
Octave
Octave
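5.3 Computing on Data
Reference video: 5 - 3 - Computing on Data (13 min).mkv
Suppose A is a 3×2 matrix, B is a 3×2 matrix and C is a 2×2 matrix.
A*C multiplies a 3×2 matrix by a 2×2 matrix and returns a 3×2 matrix.
A .* B multiplies element by element: each element of A is multiplied by the corresponding element of B, so the first element is 1 times 11 = 11, the second is 2 times 12 = 24, and so on.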
Octave
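A .^ 2 squares every element of A.
With V = [1; 2; 3], 1 ./ V returns the element-wise reciprocals 1/1 1/2 1/3.
1 ./ A likewise returns the reciprocal of every element of A.
exp(v) raises e to the power of every element of v, and abs(v) returns the absolute value of every element of v.
-v is the same as -1*v.
To add 1 to every element of v, one way is v + ones(length(v),1): with v = [1; 2; 3], length(v) is 3, ones(3,1) is a 3×1 vector of ones, and v + ones(3,1) gives [2; 3; 4].
A simpler way is v + 1, which also adds 1 to every element of v.
A' is the transpose of A, and (A')' is A again.
With a=[1 15 2 0.5], a 1×4 matrix, val=max(a) returns the largest element of a, which is 15.
[val, ind] = max(a) returns both the maximum, val = 15, and its index, ind = 2.
max(A) applied to a matrix returns the maximum of each column.
a<3 compares element by element and returns 1 where the element is less than 3 and 0 otherwise: [1 0 1 1].
find(a<3) returns the indices of the elements of a that are less than 3.
A = magic(3) returns a 3×3 magic square (magic squares): every row, every column and both diagonals sum to the same value.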
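[r,c] = find( A>=7 ) finds the elements of A that are greater than or equal to 7; r and c are the row and column indices of those elements.
help find shows the documentation of find; help works for any command.
sum(a) adds up all elements of a.
prod(a) returns their product.
floor(a) rounds down, so 0.5 becomes 0.
ceil(a) rounds up, so 0.5 becomes 1.
rand(3) returns a 3×3 random matrix; max(rand(3), rand(3)) returns a 3×3 matrix holding the element-wise maximum of two random 3×3 matrices.
max(A,[],1) returns the maximum of each column, here 8 9 7; the 1 means "along the first dimension of A".
max(A,[],2) returns the maximum of each row, here 8 7 9.
max(A) defaults to the column-wise maximum.
max(max(A)) or max(A(:)) returns the largest element of the whole matrix, here 9.
Now let A be a 9×9 magic square, A = magic(9).
sum(A,1) sums each column of the 9×9 matrix; every column sums to 369.
sum(A,2) sums each row of A; every row also sums to 369.
To sum the diagonal of A, build a 9×9 identity matrix with eye(9) and multiply element-wise: A.*eye(9) sets everything off the diagonal to 0, so
sum(sum(A.*eye(9)))
returns 369.
The other diagonal also sums to 369; it can be picked out by flipping the identity matrix vertically with flipud.
pinv(A) returns the pseudo-inverse of A.
temp = pinv(A) stores it in temp; temp*A is then approximately the identity matrix, with 1 on the diagonal and values very close to 0 elsewhere.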
5.4 Plotting Data
Reference video: 5 - 4 - Plotting Data (10 min).mkv
J()
Octave
Octave
plot(t, y1)
t y1
y2
Octave

cos(x) 1
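plot(t, y1) followed by hold on keeps the current figure, so the next plot is drawn on top of the old one.
plot(t, y2, 'r') then draws y2 in the same figure; 'r' selects red as the line colour.
xlabel('time') labels the X axis (the horizontal axis).
ylabel('value') labels the vertical axis.
legend('sin', 'cos') adds a legend identifying the two curves.
title('myplot') puts a title at the top of the figure.
print -dpng 'myplot.png' saves the figure as a png file.
Octave can save other formats as well; see help plot.
close closes the figure.
Octave can also number figures:
figure(1); plot(t, y1); opens the first figure and plots t against y1.
figure(2); plot(t, y2); opens a second figure and plots t against y2.
subplot(1,2,1) divides the figure into a 1*2 grid and selects the first cell, so plot(t, y1) is drawn in the first cell.
subplot(1,2,2) selects the second cell, and plot(t, y2) draws y2 there.
axis([0.5 1 -1 1]) changes the axis ranges: the horizontal axis of the current plot runs from 0.5 to 1 and the vertical axis from -1 to 1.
See help axis in Octave for details.
clf clears the figure.
Let A be a 5×5 magic square, A = magic(5).
imagesc(A) draws the 5*5 matrix as a 5*5 grid of coloured cells, one colour per value of A.
imagesc(A), colorbar, colormap gray; draws the same matrix with a grey colour map and adds a colour bar showing which shade corresponds to which value.
imagesc(magic(15)), colorbar, colormap gray; does the same for a 15*15 magic square.
Commands can be chained with commas: typing a=1,b=2,c=3 and pressing Enter runs the three commands one after another and prints all three results.
With semicolons instead, a=1; b=2;c=3; the results are not printed.
This is how imagesc, colorbar and colormap were put on one line above.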
Octave Octave
if while for
5.5 Control Statements: for, while, if statements
Reference video: 5 - 5 - Control Statements_ for, while, if statements (13 min).mkv
Octave "for" "while" "if"
for
v 10 1
for" i 1 10 i = 1:10 v(i)
2 i end
v 2 2 i
1 10 i 1 10
indices () 1 10
indices 1 10
i = indices i 1 10 disp(i)
for
breakcontinue Octave breakcontinue
Octave
while
i 1 v(i) 100 i 1
i 5
100
while
break v(i) = 999 i = i+1 i 6
break () (end)
if i
1 while v(i) 999 i
i 6 while
v 5 999
if while end
end if end while
if-else
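The for, while, if and break statements discussed in this section can be sketched together as follows. This is a reconstruction for illustration based on the values mentioned in the notes (a 10×1 vector, powers of 2, the values 100 and 999), not a copy of the lecture code.

```octave
v = zeros(10, 1);        % 10 x 1 vector of zeros

% for loop: set v(i) to 2 to the power i, for i from 1 to 10
for i = 1:10
  v(i) = 2^i;
end

% while loop: overwrite the first five elements with 100
i = 1;
while i <= 5
  v(i) = 100;
  i = i + 1;
end

% while with if and break: overwrite the first five elements with 999
i = 1;
while true
  v(i) = 999;
  i = i + 1;
  if i == 6
    break;               % leave the loop once i reaches 6
  end
end
```

Each if and each while is closed by its own end, which is why two end keywords appear at the bottom of the last loop.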
Octave exit
Octave quit
(functions)
squarethisnumber.m Octave
Windows
Octave
function y = squareThisNumber(x) Octave y
y Octave
x y x
search path ()
Octave
addpath C:\Users\ang\desktop
Octave Octave
Users\ang\desktop
SquareThisNumber
cd
Octave
SquareAndCubeThisNumber(x) (x x )
y1 y2
y1 y2
C C++
Octave
[a,b] = SquareAndCubeThisNumber(5) sets a to 25 and b to 125, the square and the cube of 5.
[1,1], [2,2], [3,3]
Octave J() J
Octave X = [1 1; 1 2; 1 3];
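In Octave, j = costFunctionJ(X, y, theta) computes the cost j.
With x = [1;2;3], y = [1;2;3] and theta = [0; 1], i.e. $\theta_0 = 0$ and $\theta_1 = 1$, the hypothesis is the 45-degree line that fits the data perfectly, so j is 0.
With theta = [0; 0], i.e. $\theta_0 = 0$ and $\theta_1 = 0$, the hypothesis predicts 0 everywhere and j is 2.333.
That is the sum of the squares of 1, 2 and 3 divided by 2m: (1 + 4 + 9)/6 is about 2.33.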
Octave for while
if
Octave Octave
5.6 Vectorization
Reference video: 5 - 6 - Vectorization (14 min).mkv
Octave
MATLAB PythonNumPy Java C C++
Octave a b
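$h_\theta(x) = \sum_{j=0}^{n}\theta_jx_j$, a sum from j =0 to j = n.
$h_\theta(x)$ can also be written $\theta^Tx$, the inner product of two vectors.
With n =2, $\theta = [\theta_0; \theta_1; \theta_2]$ and $x = [x_0; x_1; x_2]$.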
h(x) prediction 0.0
prediction h(x) for j 0 n+1
prediction theta(j) x(j)
0 012 MATLAB
1 MATLAB 0 theta(1)
theta(2) theta(3) MATLAB 1
for j 1 n+1 0 n
for n
x prediction theta x
for
Octave x
Octave
C++
C++
C++
j 012 j 012
n 2 012
for j 0
1 2 j

for
n+1
x(i)
u = 2v +5w u 2 v 5 w
for 012
Octave
C++Java
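The difference between the unvectorized and the vectorized computation of the hypothesis can be sketched like this. The parameter and feature values are assumptions picked for easy arithmetic; note that Octave indexes from 1, so theta(1) holds the parameter written as theta 0 in the formulas.

```octave
theta = [1; 2; 3];   % hypothetical parameters theta0, theta1, theta2
x     = [1; 4; 5];   % hypothetical features x0 = 1, x1, x2

% Unvectorized: accumulate theta(j) * x(j) in a for loop
prediction_loop = 0.0;
for j = 1:length(theta)
  prediction_loop = prediction_loop + theta(j) * x(j);
end

% Vectorized: a single inner product theta' * x
prediction_vec = theta' * x;
```

Both give 1*1 + 2*4 + 3*5 = 24; the vectorized form is shorter and lets Octave use its optimized linear algebra routines.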
5.7 Working on and Submitting Programming Exercises
Reference video: 5 - 7 - Working on and Submitting Programming Exercises (4 min).mkv
'ml-class-ex1'
pdf
warmUpExercise.m
55 A = eye(5)
55 warmUpExercise()
5x5
Octave C:\Users\ang\Desktop\ml-class-ex1
'warmUpExercise()'
5x5
submit()
'1'
1 1
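Week 3
6 Logistic Regression
6.1 Classification
Reference video: 6 - 1 - Classification (8 min).mkv
In a classification problem the variable y to be predicted takes discrete values. The learning algorithm introduced here for such problems is Logistic Regression.
In a binary classification problem the dependent variable (dependent variable) falls into one of two classes, the negative class and the positive class, so y ∈ {0,1}, where 0 denotes the negative class and 1 the positive class.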
y 0 1
1 0
y 0 1 0 1
1 0
0 1
y 1 0 0 1
6.2 Hypothesis Representation
Reference video: 6 - 2 - Hypothesis Representation (7 min).mkv
0 1
0 1
0 1
when $h_\theta(x) \ge 0.5$, predict y=1
when $h_\theta(x) < 0.5$, predict y=0
0.5
[0,1]
0 1
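The hypothesis of the logistic regression model is $h_\theta(x) = g(\theta^TX)$, where X is the feature vector and g is the logistic function, a commonly used S-shaped function (Sigmoid function):
$g(z) = \frac{1}{1+e^{-z}}$
Put together, the hypothesis is
$h_\theta(x) = \frac{1}{1+e^{-\theta^TX}}$
$h_\theta(x)$ gives, for a given input, the estimated probability (estimated probability) that the output variable is 1 under the chosen parameters: $h_\theta(x) = P(y=1|x;\theta)$
For example, if for a given x the computed value is $h_\theta(x)$=0.7, there is a 70% chance that y is the positive class, and the chance that y is the negative class is 1-0.7=0.3.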
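A small Octave sketch of the sigmoid hypothesis follows. The parameter vector and the example input are hypothetical values chosen so that the inner product is exactly 1.

```octave
% Logistic (sigmoid) function, applied element-wise
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-3; 1; 1];   % hypothetical parameters
x     = [1; 2; 2];    % hypothetical example, with x0 = 1

z = theta' * x;       % -3 + 2 + 2 = 1
p_pos = sigmoid(z);   % estimated probability that y = 1
p_neg = 1 - p_pos;    % estimated probability that y = 0

g_zero = sigmoid(0);  % the sigmoid crosses 0.5 at z = 0
```

Because z is positive here, p_pos is above 0.5 and the model would predict the positive class.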
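6.3 Decision Boundary
Reference video: 6 - 3 - Decision Boundary (15 min).mkv
The decision boundary (decision boundary) helps in understanding what the logistic regression hypothesis computes.
In logistic regression we predict:
when $h_\theta(x) \ge 0.5$, y=1
when $h_\theta(x) < 0.5$, y=0
From the S-shaped curve of g:
z=0 gives g(z)=0.5
z>0 gives g(z)>0.5
z<0 gives g(z)<0.5
Since $z = \theta^TX$:
$\theta^TX \ge 0$ predicts y=1
$\theta^TX < 0$ predicts y=0
Suppose the parameter vector is [-3 1 1]. Then when $-3 + x_1 + x_2 \ge 0$, i.e. $x_1 + x_2 \ge 3$, the model predicts y=1.
The line $x_1 + x_2 = 3$ is the decision boundary of the model: it separates the region predicted as 1 from the region predicted as 0, i.e. the region where y=0 from the region where y=1.
More complex boundaries need higher-order features. With $h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2)$ and parameters [-1 0 0 1 1], the decision boundary is the circle of radius 1 centred at the origin.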
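6.4 Cost Function
For linear regression the cost function was the sum of squared modeling errors. In principle the same definition could be used for logistic regression, but substituting
$h_\theta(x) = \frac{1}{1+e^{-\theta^TX}}$
into it produces a non-convex function (non-convex function):
$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Such a cost function has many local minima, which interferes with gradient descent's search for the global minimum. The cost function of logistic regression is therefore redefined as
$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}Cost\left(h_\theta(x^{(i)}), y^{(i)}\right)$
where Cost(h(x),y) is $-\log(h_\theta(x))$ if y=1 and $-\log(1-h_\theta(x))$ if y=0.
This Cost(h(x),y) has the following properties: when y=1 and h is 1 the error is 0, and when y=1 but h is not 1 the error grows as h gets smaller; when y=0 and h is 0 the cost is 0, and when y=0 but h is not 0 the error grows as h gets larger.
Cost(h(x),y) can be written in a single expression:
$Cost(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$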
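The single-expression form of the logistic regression cost, together with its gradient, can be sketched in Octave as below. The four training examples and the names costJ and gradJ are assumptions made for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical training set: one feature plus the constant x0 = 1
X = [1 0.5; 1 1.5; 1 3.0; 1 4.0];
y = [0; 0; 1; 1];
m = length(y);

% J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ), with h = sigmoid(X*theta)
costJ = @(theta) -(y' * log(sigmoid(X * theta)) + ...
                   (1 - y)' * log(1 - sigmoid(X * theta))) / m;

% Gradient: (1/m) * X' * (h - y)
gradJ = @(theta) X' * (sigmoid(X * theta) - y) / m;

J0 = costJ([0; 0]);   % with theta = 0 every h is 0.5, so J = log(2)
g0 = gradJ([0; 0]);
```

At theta = 0 the model is maximally uncertain, which is why the cost equals log(2) regardless of the labels.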
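Besides gradient descent, other algorithms are often used to minimize the cost function J(θ) with $h_\theta(x) = g(\theta^TX)$. They are more sophisticated and usually do not need a manually chosen learning rate: Conjugate Gradient, BFGS (Broyden-Fletcher-Goldfarb-Shanno) and L-BFGS (limited-memory BFGS).
fminunc is a minimizer built into matlab and octave. To use it, we supply a function that computes the cost and its partial derivatives.
An example of using fminunc in octave:
function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);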
6.5 Simplified Cost Function and Gradient Descent
Reference video: 6 - 5 - Simplified Cost Function and Gradient Descent (10 min).mkv
J()
p(y=1|x;) x
y=1 y=1
J()
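To find the parameters that minimize the cost function, use gradient descent (gradient descent):
Repeat {
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$
(simultaneously update all $\theta_j$)
}
Working out the derivative gives
$\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
so each update sums over i=1 to m and multiplies by $x_j^{(i)}$.
With n features the parameter vector is $\theta = [\theta_0; \theta_1; \theta_2; ...; \theta_n]$, and all of its entries are updated with this rule.
The rule looks identical to the one for linear regression, but the hypothesis is different: for linear regression $h_\theta(x) = \theta^TX$, whereas for logistic regression
$h_\theta(x) = \frac{1}{1+e^{-\theta^TX}}$
The two gradient descent algorithms are therefore not the same thing.
The updates of $\theta_0$ through $\theta_n$ could be written as a for loop (for i=1 to n, or for i=1 to n+1), but a vectorized implementation that updates all parameters at once is preferable to a for loop.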
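A vectorized version of this update can be sketched as follows. The data set, the learning rate and the number of iterations are assumptions; the data are deliberately not separable so that the parameters stay finite.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical, non-separable training set
X = [1 1; 1 2; 1 3; 1 4];   % first column is x0 = 1
y = [0; 1; 0; 1];
m = length(y);

costJ = @(theta) -(y' * log(sigmoid(X * theta)) + ...
                   (1 - y)' * log(1 - sigmoid(X * theta))) / m;

alpha = 0.1;                % learning rate (assumed value)
theta = zeros(2, 1);
J_start = costJ(theta);

for iter = 1:500
  % All partial derivatives at once, no loop over j
  grad = X' * (sigmoid(X * theta) - y) / m;
  theta = theta - alpha * grad;
end

J_end = costJ(theta);
```

The only difference from the linear regression loop is the sigmoid applied to X * theta.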
6.6 Advanced Optimization
Reference video: 6 - 6 - Advanced Optimization (14 min).mkv
J()
J()
J() J
01 n
J()
J()
J()

J() J
j


J() J
j
BFGS () L-BFGS (
) J()
(line search)
BFGS L-BFGS
L-BFGS BFGS
Octave MATLAB
Octave
C, C++, Java
L-BFGS
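For example, suppose there are two parameters $\theta_1$ and $\theta_2$, and the cost function is $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$. J(θ) is minimized at $\theta_1 = 5$, $\theta_2 = 5$.
The partial derivatives of J(θ) are $2(\theta_1 - 5)$ and $2(\theta_2 - 5)$.
To minimize J(θ) with an advanced optimization algorithm, first write an Octave function like this: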
function [jVal, gradient]=costFunction(theta)
jVal=(theta(1)-5)^2+(theta(2)-5)^2;
gradient=zeros(2,1);
gradient(1)=2*(theta(1)-5);
gradient(2)=2*(theta(2)-5);
end
21
costFunction
fminunc Octave
options=optimset('GradObj','on','MaxIter',100);
initialTheta=zeros(2,1);
[optTheta, functionVal, exitFlag]=fminunc(@costFunction, initialTheta, options);
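options is a data structure that holds the settings for the optimizer.
'GradObj', 'on' sets the gradient objective parameter to On(on), which tells fminunc that the objective function supplies a gradient.
'MaxIter', 100 sets the maximum number of iterations to 100.
initialTheta is the initial guess, a 2×1 vector, and in the call to fminunc the @ symbol creates a handle (pointer) to the costFunction function.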
costFunction
Octave
theta costFunction jval
gradientgradient
theta(1) theta(2)
6.7 Multiclass Classification: One-vs-all
Reference video: 6 - 7 - Multiclass Classification_ One-vs-all (6 min).mkv
(logistic regression)
"" (one-vs-all)
y=1y=2y=3y=4
y=1 y=2 y=3
"" 1 31 4
0 1 2 3 1 2 3 4 1
""
y=1
y=2 y=3
1 ""
2 3 1
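For each class i we train a logistic regression classifier $h_\theta^{(i)}(x)$ that treats y=i as the positive class and every other class as negative, for example one classifier for y=1, one for y=2, and so on:
$h_\theta^{(i)}(x) = P(y=i|x;\theta)$
To make a prediction for a new input x, run every classifier on x and choose the class i for which $h_\theta^{(i)}(x)$ is largest:
$\max_i h_\theta^{(i)}(x)$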
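The prediction step of one-vs-all can be sketched in Octave as below. The parameter matrix all_theta is hypothetical (it is not the result of any training run); each row holds the parameters of one classifier.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical parameters: row i belongs to the classifier for class i
all_theta = [ 2 -1 -1;
             -1  2 -1;
             -1 -1  2];

x = [1; 0.2; 3.0];                % new example, with x0 = 1

probs = sigmoid(all_theta * x);   % one estimated probability per class
[~, prediction] = max(probs);     % index of the most confident classifier
```

Here the third classifier produces the largest value, so the example is assigned to class 3.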
Week 3
7 Regularization
7.1 The Problem of Overfitting
Reference video: 7 - 1 - The Problem of Overfitting (10 min).mkv
(over-fitting)
(regularization)
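There are two ways to deal with overfitting:
1. Discard features that do not help prediction, either by choosing by hand which features to keep or by using a model selection algorithm (for example PCA).
2. Regularization: keep all the features but reduce the magnitude of the parameters.
7.2 Cost Function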
3 4 3 4
3 4
3 4
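λ is called the Regularization Parameter. By convention $\theta_0$ is not penalized.
If λ is too large, every penalized parameter is driven towards 0, the model degenerates to $h_\theta(x) = \theta_0$, a horizontal line, and underfits.
The penalty term is $\lambda\sum_{j=1}^{n}\theta_j^2$, which gives the regularized Cost Function
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
λ has to be chosen sensibly: it trades off fitting the training set against keeping the parameters small.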
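The regularized cost can be sketched in Octave as follows. The toy data and the name regCost are assumptions; the point of the example is that the penalty skips the first parameter.

```octave
% Toy data (assumed): y = x
X = [1 1; 1 2; 1 3];   % first column is x0 = 1
y = [1; 2; 3];
m = length(y);

% J = 1/(2m) * [ sum of squared errors + lambda * sum of theta_j^2 for j >= 1 ]
% theta(1) is theta0 in the formulas and is left out of the penalty.
regCost = @(theta, lambda) (sum((X * theta - y) .^ 2) + ...
                            lambda * sum(theta(2:end) .^ 2)) / (2 * m);

J_noreg = regCost([0; 1], 0);   % perfect fit, no penalty: 0
J_reg   = regCost([0; 1], 3);   % penalty only: 3 * 1^2 / 6 = 0.5

% Changing lambda does not change the cost when only theta0 is non-zero
J_a = regCost([5; 0], 0);
J_b = regCost([5; 0], 10);
```

With lambda = 3 even the perfectly fitting line has a positive cost, which is how regularization pushes the optimizer towards smaller parameters.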
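7.3 Regularized Linear Regression
Reference video: 7 - 3 - Regularized Linear Regression (11 min).mkv
Because $\theta_0$ is not regularized, the gradient descent update treats it separately:
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$ for j=1,2,...,n
The normal equation can be regularized as well: $\theta = \left(X^TX + \lambda M\right)^{-1}X^Ty$, where M is the (n+1)*(n+1) identity matrix with its top-left element set to 0.
7.4 Regularized Logistic Regression
Reference video: 7 - 4 - Regularized Logistic Regression (9 min).mkv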
J()
J()
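$h_\theta(x) = g(\theta^TX)$
In Octave, fminunc can again be used to find the parameters that minimize the cost function.
Notes:
1. The gradient descent update for regularized logistic regression looks the same as the one for regularized linear regression, but the two differ because their $h_\theta(x)$ are different.
2. $\theta_0$ does not take part in any of the regularization.
Week 4
8 Neural Networks: Representation
8.1 Non-linear Hypotheses
Reference video: 8 - 1 - Non-linear Hypotheses (10 min).mkv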
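With only two features $x_1$, $x_2$, polynomial features work well.
With 100 features, even using only the pairwise products $x_1x_2 + x_1x_3 + x_1x_4 + ... + x_2x_3 + x_2x_4 + ... + x_{99}x_{100}$ gives close to 5000 combined features, too many for ordinary logistic regression.
Image recognition makes this worse. Using greyscale rather than RGB values and small 50x50 pixel images, every pixel is a feature, so there are 2500 features; pairing them up for a polynomial model gives about $2500^2/2$ (close to 3 million) features.
8.2 Neurons and the Brain
Reference video: 8 - 2 - Neurons and the Brain (8 min).mkv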
90
BrainPort
FDA ()
YouTube
8.3 Model Representation I
Reference video: 8 - 3 - Model Representation I (12 min).mkv
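Each neuron can be regarded as a processing unit (processing unit / Nucleus) with many inputs (input / Dendrite) and one output (output / Axon).
In a neural network model each such unit is an activation unit, and the parameters are also called weights (weight).
In a three-layer network, $x_1, x_2, x_3$ are the input units that receive the raw data, $a_1, a_2, a_3$ are intermediate units that process the data and pass it on, and the output unit computes $h_\theta(x)$.
The first of the 3 layers is the Input Layer, the last is the Output Layer, and the layers in between are the Hidden Layers. Every layer except the output gets an extra bias unit.
Notation:
$a_i^{(j)}$: the i-th activation unit of layer j.
$\theta^{(j)}$: the weight matrix that maps layer j to layer j+1; for example $\theta^{(1)}$ maps the first layer to the second. Its size is the number of activation units in layer j+1 times the number of activation units in layer j plus one; here $\theta^{(1)}$ is 3*4.
Each a is determined by all the x of the previous layer and the weight attached to each x.
Computing layer by layer from left to right in this way is called FORWARD PROPAGATION.
Writing x, θ and a as matrices,
$x = \begin{bmatrix}x_0\\x_1\\x_2\\x_3\end{bmatrix}$, $\theta = \begin{bmatrix}\theta_{10}&\cdots&\cdots&\cdots\\\cdots&\cdots&\cdots&\cdots\\\cdots&\cdots&\cdots&\theta_{33}\end{bmatrix}$, $a = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix}$
gives $a = g(\theta \cdot X)$.
8.4 Model Representation II
Reference video: 8 - 4 - Model Representation II (12 min).mkv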
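FORWARD PROPAGATION can be written more compactly in vectorized form, which also makes it easier to see what the network computes.
These models are called Neural Networks.
If the input layer is covered up, what remains, $a_0, a_1, a_2, a_3$ feeding the output unit, is exactly Logistic Regression for $h_\theta(x)$.
In effect a neural network behaves like logistic regression, except that the inputs of the logistic regression, $[x_1 \sim x_3]$, are replaced by the hidden-layer values $[a_1^{(2)} \sim a_3^{(2)}]$:
$h_\theta(x) = g(\theta_0^{(2)}a_0^{(2)} + \theta_1^{(2)}a_1^{(2)} + \theta_2^{(2)}a_2^{(2)} + \theta_3^{(2)}a_3^{(2)})$
$a_0, a_1, a_2, a_3$ can be seen as more advanced features derived from $x_0, x_1, x_2, x_3$: because they are learned from x, they can predict new data better than x itself.
8.5 Examples and Intuitions I
Reference video: 8 - 5 - Examples and Intuitions I (7 min).mkv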
x1,x2,...,xn
AND
OR
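The AND function can be represented by a single neuron whose output uses the sigmoid function.
With $\theta_0 = -30$, $\theta_1 = 20$, $\theta_2 = 20$, the output is $h_\theta(x) = g(-30 + 20x_1 + 20x_2)$.
Since g(x) is very close to 0 for large negative x and very close to 1 for large positive x, the output is close to 1 only when $x_1 = x_2 = 1$, so $h_\theta(x) \approx x_1$ AND $x_2$.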
AND
OR
OR AND
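8.6 Examples and Intuitions II
Reference video: 8 - 6 - Examples and Intuitions II (10 min).mkv
BINARY LOGICAL OPERATORS: when the inputs are 0 or 1, single neurons with the following weights implement the logical operators:
weights -30, 20, 20: AND
weights -10, 20, 20: OR
weights 10, -20: NOT
Neurons can be combined to build more complex operations. For example XNOR, which outputs 1 only when both inputs are equal (both 1 or both 0): XNOR = (x1 AND x2) OR ((NOT x1) AND (NOT x2))
First build a neuron that expresses (NOT x1) AND (NOT x2).
Then combine the AND neuron, the (NOT x1) AND (NOT x2) neuron and the OR neuron into one network, which implements XNOR.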
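The XNOR network can be checked with a short forward-propagation sketch in Octave. The AND and OR weights are the ones listed above; the weights 10, -20, -20 for the (NOT x1) AND (NOT x2) neuron are an assumption taken to be the usual choice for this construction.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hidden layer: row 1 computes x1 AND x2,
% row 2 computes (NOT x1) AND (NOT x2) (weights assumed: 10, -20, -20)
Theta1 = [-30  20  20;
           10 -20 -20];
% Output layer: OR of the two hidden units
Theta2 = [-10 20 20];

inputs = [0 0; 0 1; 1 0; 1 1];   % all combinations of x1, x2
out = zeros(4, 1);

for k = 1:4
  a1 = [1; inputs(k, :)'];          % add bias unit x0 = 1
  a2 = [1; sigmoid(Theta1 * a1)];   % hidden activations plus bias unit
  out(k) = sigmoid(Theta2 * a2);    % network output
end

xnor_result = round(out);           % threshold at 0.5
```

The rounded outputs are 1 for the inputs (0,0) and (1,1) and 0 for the two mixed inputs, which is the XNOR truth table.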
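8.7 Multiclass Classification
Reference video: 8 - 7 - Multiclass Classification (4 min).mkv
When there are more than two classes, for example y=1,2,3..., the output layer has one neuron per class; each output is 1 or 0 depending on whether the input belongs to that class.
With input vector x and 4 classes, the output layer has 4 neurons, and the network output is a vector $[a\ b\ c\ d]^T$ in which exactly one of a,b,c,d is 1, indicating the class.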
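Week 5
9 Neural Networks: Learning
9.1 Cost Function
Notation: there are m training examples, each with an input x and an output y; L is the number of layers in the network; $S_l$ is the number of neurons in layer l (not counting the bias unit), so $S_L$ is the number of units in the output layer.
Neural network classification comes in two kinds:
Binary classification: $S_L$=1, y=0 or 1.
K-class classification: $S_L$=K, $y_i$ = 1 means the example belongs to class i (K>2).
In logistic regression there is a single scalar output y. In a neural network there can be many outputs: $h_\theta(x)$ is a K-dimensional vector, and so is each training label y. The cost function therefore sums over the K outputs:
$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K}y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + (1-y_k^{(i)})\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
The regularization term sums the squares of every weight in every layer, excluding the bias weights (index 0).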
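9.2 Backpropagation Algorithm
Reference video: 9 - 2 - Backpropagation Algorithm (12 min).mkv
Forward propagation computes $h_\theta(x)$ layer by layer, starting from the input. Backpropagation computes the partial derivatives of the cost function by starting with the error at the output layer and working backwards, layer by layer, to the second layer.
Take a single training example $(x^{(1)}, y^{(1)})$ and a four-layer network with K=4, $S_L$=4, L=4.
The error of the output layer is the difference between the activation and the actual value $y_k$, for k=1:K:
$\delta^{(4)} = a^{(4)} - y$
The error of the previous layer is
$\delta^{(3)} = (\Theta^{(3)})^T\delta^{(4)} .* g'(z^{(3)})$
where $g'(z^{(3)})$ is the derivative of the S-shaped function, $g'(z^{(3)}) = a^{(3)} .* (1 - a^{(3)})$, and $(\Theta^{(3)})^T\delta^{(4)}$ is the weighted sum of the errors passed back from layer 4. Likewise
$\delta^{(2)} = (\Theta^{(2)})^T\delta^{(3)} .* g'(z^{(2)})$
The first layer holds the inputs, so it has no error term.
With λ=0, i.e. without regularization, the partial derivatives are
$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = a_j^{(l)}\delta_i^{(l+1)}$
The indices mean: l is the layer; j is the index of the activation unit in layer l (the input side of the weight); i is the index of the error unit in layer l+1 that the weight feeds into.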
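In Octave, to use an optimizer such as fminunc, the weight matrices must first be unrolled into a single vector and converted back into matrices after the optimizer returns.
Suppose there are three weight matrices Theta1, Theta2 and Theta3, of sizes 10*11, 10*11 and
1*11:
thetaVec = [Theta1(:) ; Theta2(:) ; Theta3(:)]
...optimization using functions like fminunc...
Theta1 = reshape(thetaVec(1:110), 10, 11);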
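The round trip between matrices and a single vector can be sketched as below. The constant matrix contents are placeholders chosen so that the three blocks are easy to tell apart; the index ranges follow from the sizes 10*11, 10*11 and 1*11.

```octave
% Placeholder weight matrices (values chosen only to be distinguishable)
Theta1 = 1 * ones(10, 11);
Theta2 = 2 * ones(10, 11);
Theta3 = 3 * ones(1, 11);

% Unroll all three into one column vector of 110 + 110 + 11 = 231 elements
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];

% Recover the matrices from the vector
T1 = reshape(thetaVec(1:110),   10, 11);
T2 = reshape(thetaVec(111:220), 10, 11);
T3 = reshape(thetaVec(221:231),  1, 11);
```

The same slicing has to be repeated inside the cost function, because the optimizer only ever passes the unrolled vector.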
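9.3 Backpropagation Intuition
Reference video: 9 - 3 - Backpropagation Intuition (13 min).mkv
9.4 Implementation Note: Unrolling Parameters
Reference video: 9 - 4 - Implementation Note_ Unrolling Parameters (8 min).mkv
9.5 Gradient Checking
Reference video: 9 - 5 - Gradient Checking (12 min).mkv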
Numerical Gradient Checking
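To estimate the gradient at a point θ, evaluate the cost function at θ - ε and θ + ε and take the slope of the line through those two points, where ε is a very small value,
usually 0.001.
In Octave:
gradApprox = (J(theta + eps) - J(theta - eps)) / (2*eps)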
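For a parameter vector, the check is done one component at a time. The sketch below uses a made-up cost function, J(theta) = sum(theta.^3), whose exact gradient 3*theta.^2 is known, so the approximation can be compared against it. The name EPSILON is used instead of eps to avoid shadowing Octave's built-in eps.

```octave
J = @(t) sum(t .^ 3);   % hypothetical cost function with a known gradient
theta = [1; 2];
EPSILON = 1e-3;

gradApprox = zeros(size(theta));
for i = 1:numel(theta)
  e = zeros(size(theta));
  e(i) = EPSILON;       % perturb only the i-th parameter
  gradApprox(i) = (J(theta + e) - J(theta - e)) / (2 * EPSILON);
end

analytic = 3 * theta .^ 2;   % exact gradient: [3; 12]
```

In practice the analytic gradient would come from backpropagation, and the check would be switched off once the two agree, because the numerical version is slow.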
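9.6 Random Initialization
Reference video: 9 - 6 - Random Initialization (7 min).mkv
Initializing all parameters to the same value (for example 0) does not work for a neural network, so the parameters are usually initialized to random values between -ε and ε. For a 10×11 parameter matrix, with eps holding the value of ε:
Theta1 = rand(10, 11) * (2*eps) - eps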
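A runnable version of this initialization is sketched below. The value 0.12 for the range is an assumption, and the variable is named INIT_EPSILON so that Octave's built-in eps is left untouched.

```octave
INIT_EPSILON = 0.12;   % assumed range; any small positive value works

% rand gives values in (0, 1); scale to (0, 2*eps) and shift to (-eps, eps)
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
```

Every entry ends up between -INIT_EPSILON and INIT_EPSILON, and different entries get different values, which breaks the symmetry between hidden units.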
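9.7 Putting It Together
Reference video: 9 - 7 - Putting It Together (14 min).mkv
Training a neural network:
1. Randomly initialize the parameters.
2. Use forward propagation to compute $h_\theta(x)$ for every input.
3. Write code to compute the cost function J.
4. Use backpropagation to compute all the partial derivatives.
5. Use numerical gradient checking to verify those partial derivatives.
6. Use an optimization algorithm to minimize the cost function.
9.8 Autonomous Driving
Reference video: 9 - 8 - Autonomous Driving (7 min).mkv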
Dean Pomerleau
ALVINN (Autonomous Land Vehicle In a Neural Network)
ALVINN NavLab
ALVINN
ALVINN ALVINN
30x32 ALVINN
ALVINN
ALVINN 12
Week 6
10 Advice for Applying Machine Learning
10.1 Deciding What to Try Next
Reference video: 10 - 1 - Deciding What to Try Next (6 min).mkv
x1x2 x3
x1 x2 x1x2
lambda
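1. Get more training examples.
2. Try a smaller set of features.
3. Try getting additional features.
4. Try adding polynomial features.
5. Try decreasing the regularization parameter lambda.
6. Try increasing the regularization parameter lambda.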
""
10.2 Evaluating a Hypothesis
Reference video: 10 - 2 - Evaluating a Hypothesis (8 min).mkv
h(x)
70%
30%
1. J
2.
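10.3 Model Selection and Train/Validation/Test Sets
Reference video: 10 - 3 - Model Selection and Train_Validation_Test Sets (12 min).mkv
Suppose we are choosing among 10 polynomial models of different degrees.
Split the data: 60% as the training set, 20% as the cross-validation set and 20% as the test set.
Model selection then works as follows:
1. Train the 10 models on the training set.
2. Compute the cross-validation error of each of the 10 models on the cross-validation set.
3. Choose the model with the smallest cross-validation error.
4. Use the model chosen in step 3 to compute the generalization error on the test set.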
10.4 Diagnosing Bias vs. Variance
Reference video: 10 - 4 - Diagnosing Bias vs. Variance (8 min).mkv
d d
d d
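10.5 Regularization and Bias/Variance
Reference video: 10 - 5 - Regularization and Bias_Variance (11 min).mkv
We choose a series of λ values to test, usually between 0 and 10, each roughly 2 times the previous one
(for example 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10: 12 values in total).
1. Train 12 models with different degrees of regularization on the training set.
2. Compute the cross-validation error of each of the 12 models.
3. Choose the model with the smallest cross-validation error.
4. Use the model chosen in step 3 to compute the generalization error on the test set.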
10.6 Learning Curves
Reference video: 10 - 6 - Learning Curves (12 min).mkv
sanity check
100 1
10.7 Deciding What to Do Next Revisited
Reference video: 10 - 7 - Deciding What to Do Next Revisited (7 min).mkv
1.1
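1. Get more training examples: fixes high variance.
2. Try a smaller set of features: fixes high variance.
3. Try getting additional features: fixes high bias.
4. Try adding polynomial features: fixes high bias.
5. Try decreasing the regularization parameter λ: fixes high bias.
6. Try increasing the regularization parameter λ: fixes high variance.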

Week 6
11 Machine Learning System Design
11.1 Prioritizing What to Work On
Reference video: 11 - 1 - Prioritizing What to Work On (10 min).mkv
100
1 0 1001
1.
2.
3.
4. watch w4tch
" Honey Pot "
170
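11.2 Error Analysis
Reference video: 11 - 2 - Error Analysis (13 min).mkv
Error analysis helps to choose the next step more systematically. Start with a simple system that can be built quickly, ideally within 24 hours. The recommended approach:
1. Start with a simple algorithm that can be implemented quickly, and test it on the cross-validation set.
2. Plot learning curves to decide whether more data, more features, or something else is likely to help.
3. Do error analysis: manually examine the cross-validation examples the algorithm got wrong and look for systematic trends.
Example: should discount/discounts/discounted/discounting be treated as the same word? Error analysis alone cannot answer this. Try both variants and compare them with a single numerical measure, such as the cross-validation error.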
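11.3 Error Metrics for Skewed Classes
Reference video: 11 - 3 - Error Metrics for Skewed Classes (12 min).mkv
Skewed classes means that the training set contains very many examples of one class and very few, or none, of the other.
Example: predicting whether a tumor is malignant when only 0.5% of the training examples are malignant. A non-learning algorithm that always predicts benign has an error of only 0.5%, while a trained neural network may have an error of 1%. The size of the error is therefore not a good basis for judging the algorithm.
Precision and Recall are used instead. Every prediction falls into one of four cases:
1. True Positive (TP): predicted true, actually true.
2. True Negative (TN): predicted false, actually false.
3. False Positive (FP): predicted true, actually false.
4. False Negative (FN): predicted false, actually true.
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)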
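A small sketch of these two definitions, with y = 1 as the rare positive class. The function name `precision_recall` and the ten-example label lists are made up for illustration. The first call shows why a predictor that always answers 0 looks good on error alone but has zero recall.

```python
def precision_recall(y_true, y_pred):
    """Compute precision = TP/(TP+FP) and recall = TP/(TP+FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention chosen here (an assumption): report 0.0 when a ratio is undefined
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
p0, r0 = precision_recall(y_true, [0] * 10)  # always predicts the majority class
p1, r1 = precision_recall(y_true, [1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
```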
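11.4 Trading Off Precision and Recall
Reference video: 11 - 4 - Trading Off Precision and Recall (14 min).mkv
The classifier outputs a value between 0 and 1, and by default we predict true when it exceeds the threshold 0.5.
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
To predict true only when we are very confident (higher precision), use a threshold above 0.5, such as 0.7 or 0.9. To catch as many true cases as possible (higher recall), use a threshold below 0.5, such as 0.3.
One way to choose the threshold is the F1 score (F1 Score):
$$F_1 = \frac{2PR}{P+R}$$
Choose the threshold that gives the highest F1.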
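Picking the threshold with the highest F1 can be sketched like this. The (precision, recall) pairs below are hypothetical measurements for three thresholds, not values from the notes, and `f1_score` is an illustrative helper name.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); defined as 0.0 here when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs measured on a cross-validation set
candidates = {0.3: (0.4, 0.9), 0.5: (0.6, 0.7), 0.9: (0.95, 0.1)}
best_threshold = max(candidates, key=lambda t: f1_score(*candidates[t]))
```

F1 is low whenever either precision or recall is low, which is why the extreme threshold 0.9 loses despite its high precision.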
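11.5 Data For Machine Learning
Reference video: 11 - 5 - Data For Machine Learning (11 min).mkv
Michele Banko and Eric Brill ran an influential study, published in 2001, on classifying confusable words: filling the blank __ in a sentence with the right word from (to, two, too). They compared several algorithms, including a perceptron and Winnow, while varying the size of the training set.
Most algorithms performed similarly, and all of them improved as the training set grew. An "inferior" algorithm given more data could beat a "superior" one. This led to the saying that it is not the one with the best algorithm who wins, but the one with the most data.
When is this true? First, the features x must contain enough information to predict y accurately. In the two, to, too example the words around the blank are the features x, and they are enough to decide that the blank __ should be (two) rather than to or too. A useful test: given x, could a human expert confidently predict y? Second, use an algorithm with many parameters, so that bias is low, together with a very large training set, so that variance is low.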
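12 Support Vector Machines (Week 7)
12.1 Optimization Objective
Reference video: 12 - 1 - Optimization Objective (15 min).mkv
In supervised learning, how well an algorithm is applied often matters more than whether algorithm A or algorithm B is chosen. One more algorithm is nevertheless widely used: the Support Vector Machine (SVM).
Start from logistic regression:
$$h_\theta(x) = \frac{1}{1+e^{-z}}, \qquad z = \theta^T x$$
If y=1 we want h(x) ≈ 1, which requires θ^T x >> 0. If y=0 we want h(x) ≈ 0, which requires θ^T x << 0.
Ignoring the factor 1/m, a single example (x, y) contributes the following term to the logistic regression cost:
$$-y\log\frac{1}{1+e^{-z}} - (1-y)\log\left(1-\frac{1}{1+e^{-z}}\right)$$
When y=1 only the first term remains. The SVM replaces it with a piecewise-linear function cost1(z) that is 0 for z >= 1 and a straight line to the left of z=1. When y=0 only the second term remains, and it is replaced in the same way by cost0(z).
The SVM objective then differs from regularized logistic regression in three ways.
The factor 1/m is dropped. Multiplying an objective by a positive constant does not change where its minimum is: (u-5)^2+1 has its minimum at u=5, and after multiplying by 10, 10(u-5)^2+10 still has its minimum at u=5.
The two terms are weighted differently. Logistic regression minimizes A + λB, where A is the training cost and B the regularization term. The SVM minimizes C·A + B instead. Setting C = 1/λ gives the same trade-off, so C plays the role of 1/λ.
$$\min_\theta\; C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
The hypothesis does not output a probability. The SVM predicts y=1 if θ^T x >= 0 and y=0 otherwise.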
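The objective can be written out directly once cost1 and cost0 are fixed. The sketch below assumes the standard hinge shapes max(0, 1 - z) and max(0, 1 + z); the notes only say the functions are piecewise linear, so the exact slope is an assumption, as are the function names and the tiny data set.

```python
def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1 (assumed hinge shape)."""
    return max(0.0, 1.0 - z)


def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1 (assumed hinge shape)."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * sum of per-example costs + 1/2 * sum of theta_j^2 for j >= 1."""
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * v for t, v in zip(theta, x_i))  # z = theta^T x
        data_term += y_i * cost1(z) + (1 - y_i) * cost0(z)
    regularization = 0.5 * sum(t * t for t in theta[1:])  # theta_0 is not penalized
    return C * data_term + regularization


# Each row of X starts with the constant feature x0 = 1
X = [[1.0, 1.0], [1.0, -1.0]]
y = [1, 0]
J = svm_objective([0.0, 2.0], X, y, C=100.0)
```

Here both examples satisfy their margin condition (z = 2 and z = -2), so only the regularization term contributes to J.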
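12.2 Large Margin Intuition
Reference video: 12 - 2 - Large Margin Intuition (11 min).mkv
The SVM is often described as a large margin classifier. Recall the two cost functions of z: cost1(z) for positive examples and cost0(z) for negative examples.
If y=1, cost1(z)=0 only when z >= 1, so we want θ^T x >= 1. If y=0, cost0(z)=0 only when z <= -1, so we want θ^T x <= -1.
This asks for more than a correct classification, which only needs θ^T x > 0 for y=1 and θ^T x <= 0 for y=0. The SVM wants values of at least 1 and at most -1 rather than just above or below 0, which builds in a safety margin.
Now let C be very large, for example C=100000. The optimizer is then strongly pushed to make the first term of the objective 0, that is, to find θ with θ^T x(i) >= 1 when y(i)=1 and θ^T x(i) <= -1 when y(i)=0. The problem becomes
$$\min_\theta \frac{1}{2}\sum_{j=1}^{n}\theta_j^2 \quad\text{s.t.}\quad \theta^T x^{(i)} \ge 1 \text{ if } y^{(i)}=1, \qquad \theta^T x^{(i)} \le -1 \text{ if } y^{(i)}=0$$
For separable data the resulting decision boundary keeps the largest possible distance from the nearest examples of both classes. This distance is called the margin.
With C as large as 100000 the boundary reacts strongly to a single outlier. With a smaller C the SVM can ignore a few outliers and still find a sensible boundary between y=1 and y=0.
Since C plays the role of 1/λ:
A large C corresponds to a small λ, which may lead to overfitting (high variance).
A small C corresponds to a large λ, which may lead to underfitting (high bias).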
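12.3 Mathematics Behind Large Margin Classification (Optional)
Reference video: 12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20 min).mkv
Vector inner product. Let u = [u1 u2]^T and v = [v1 v2]^T. The inner product of u and v is u^T v. The norm of u is its length:
$$\|u\| = \sqrt{u_1^2+u_2^2}$$
Project v onto u at a right angle (90 degrees) and let p be the signed length of the projection. Then
$$u^T v = p\,\|u\| = u_1v_1 + u_2v_2 = v^T u$$
p carries a sign. If the angle between u and v is less than 90 degrees, p is positive. If the angle is greater than 90 degrees, p is negative.
Application to the SVM. To simplify, let θ0 = 0 and n = 2, with features x1 and x2. The objective is
$$\min_\theta \frac{1}{2}\sum_{j=1}^{n}\theta_j^2 = \frac{1}{2}\left(\theta_1^2+\theta_2^2\right) = \frac{1}{2}\left(\sqrt{\theta_1^2+\theta_2^2}\right)^2 = \frac{1}{2}\|\theta\|^2$$
What is θ^T x? Using the inner product with θ in the role of u and x(i) in the role of v:
$$\theta^T x^{(i)} = p^{(i)}\,\|\theta\| = \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)}$$
where p(i) is the projection of x(i) onto θ. The constraints θ^T x(i) >= 1 and θ^T x(i) <= -1 become
$$p^{(i)}\,\|\theta\| \ge 1 \text{ if } y^{(i)}=1, \qquad p^{(i)}\,\|\theta\| \le -1 \text{ if } y^{(i)}=0$$
With θ0 = 0 the decision boundary passes through the origin (0,0), and θ is at 90 degrees to the boundary. Suppose the boundary runs close to the examples x(1) and x(2). Their projections p(1) and p(2) onto θ are then small, so p(i)·||θ|| >= 1 (or <= -1) can only hold if ||θ|| is large, which the objective penalizes. If the boundary keeps its distance from the examples, the projections p(1), p(2), p(3) are larger and ||θ|| can be smaller. Minimizing ||θ|| therefore makes the projections p(i), and with them the margin, as large as possible.
The same reasoning applies when θ0 is not 0. The simplification θ0 = 0 only forces the boundary through the origin. This large margin behaviour is what the SVM does when C is very large.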
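The identity u^T v = p·||u|| is easy to check numerically. The following sketch uses made-up vectors; `projection_length` is a hypothetical helper name.

```python
import math


def projection_length(v, u):
    """Signed length p of the projection of v onto u, so that u.v = p * ||u||."""
    norm_u = math.sqrt(sum(c * c for c in u))
    inner = sum(a * b for a, b in zip(u, v))
    return inner / norm_u


u = [3.0, 4.0]                                  # ||u|| = 5
v = [2.0, 1.0]
p = projection_length(v, u)                     # (3*2 + 4*1) / 5
p_opposite = projection_length([-2.0, -1.0], u) # angle greater than 90 degrees
```

The second call flips v, so the angle to u exceeds 90 degrees and the projection comes out negative.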
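12.4 Kernels I
Reference video: 12 - 4 - Kernels I (16 min).mkv
A non-linear decision boundary can be obtained with high-order polynomial features. Writing the features as f1, f2, ..., fn, the hypothesis is a weighted sum of them. Is there a better way to build f1, f2, f3 than polynomial terms?
Kernels build new features from an example x by measuring how close x is to pre-chosen landmarks l(1), l(2), l(3). For example
$$f_1 = \mathrm{similarity}(x, l^{(1)}) = \exp\left(-\frac{\|x-l^{(1)}\|^2}{2\sigma^2}\right)$$
This similarity function is called a Kernel, and this particular one is the Gaussian Kernel.
If x is very close to a landmark l, the distance is approximately 0 and f ≈ e^(-0) = 1. If x is far from l, f ≈ e^(-(a large number)) = 0.
For x = [x1 x2], f is largest when x coincides with l(1) and falls off as x moves away from l(1). The value of σ^2 determines how quickly f falls off.
Example with landmarks l(1), l(2), l(3): for a point close to l(1), f1 ≈ 1 and f2, f3 ≈ 0. If θ0 + θ1f1 + θ2f2 + θ3f3 > 0 we predict y=1. With suitable parameters, points close to l(1) or l(2) are predicted y=1 and points far from both are predicted y=0, so the decision boundary is determined by the new features f1, f2, f3 rather than by the original features.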
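A minimal sketch of the Gaussian similarity between a point and a landmark, using the standard form exp(-||x - l||^2 / (2σ^2)). The function name and the sample points are made up for illustration.

```python
import math


def gaussian_kernel(x, landmark, sigma=1.0):
    """Similarity between x and a landmark: exp(-||x - l||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma ** 2))


l1 = [3.0, 5.0]
f_near = gaussian_kernel([3.0, 5.0], l1)            # x coincides with the landmark
f_far = gaussian_kernel([13.0, 15.0], l1)           # x far from the landmark
f_far_wide = gaussian_kernel([13.0, 15.0], l1, sigma=10.0)  # larger sigma, slower decay
```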
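12.5 Kernels II
Reference video: 12 - 5 - Kernels II (16 min).mkv
Choosing the landmarks: with m training examples, choose m landmarks and set l(1)=x(1), l(2)=x(2), ..., l(m)=x(m). The new features are then built from the training data itself.
Given x, compute the new feature vector f and predict y=1 when θ^T f >= 0.
In practice the regularization term is computed as θ^T M θ rather than θ^T θ, where M is a matrix that depends on the kernel. This simplifies the computation.
Use existing packages such as liblinear or libsvm to minimize the SVM cost function.
An SVM used without a kernel is said to use a linear kernel.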
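12.6 Using An SVM
Reference video: 12 - 6 - Using An SVM (21 min).mkv
Writing your own SVM optimizer is not recommended. Use a well-tested SVM package, such as liblinear or libsvm.
Besides the Gaussian kernel there are other kernels:
Polynomial Kernel
String kernel
chi-square kernel
histogram intersection kernel
...
A kernel has to satisfy Mercer's theorem so that SVM optimization packages handle it correctly.
Multi-class classification: with k classes, train k SVMs, one for each class against the rest. Many SVM packages have multi-class support built in.
Even with a package, two things remain to be decided:
1. The parameter C, which controls the bias/variance trade-off.
2. The kernel.
Logistic regression or SVM? Let n be the number of features and m the number of training examples.
(1) If n is large compared with m, use logistic regression or an SVM without a kernel.
(2) If n is small and m is of medium size, for example n between 1 and 1000 and m between 10 and 10000, use an SVM with a Gaussian kernel.
(3) If n is small and m is large, for example n between 1 and 1000 and m above 50000, an SVM becomes slow. Add more features, then use logistic regression or an SVM without a kernel.
A neural network is likely to do well in all three cases, but may be slower to train than an SVM.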
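13 Clustering (Week 8)
13.1 Unsupervised Learning: Introduction
Reference video: 13 - 1 - Unsupervised Learning_ Introduction (3 min).mkv
In unsupervised learning the training set has no labels: it consists of x(1), x(2), ..., x(m) only, without any y. The algorithm has to find structure in the data by itself, for example by grouping the data into clusters.
One application of clustering is social network analysis, for example on Facebook or Google+.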
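13.2 K-Means Algorithm
Reference video: 13 - 2 - K-Means Algorithm (13 min).mkv
K-means is the most widely used clustering algorithm. It takes an unlabeled data set and groups it into clusters.
K-means is an iterative algorithm. To cluster the data into K groups:
1. Choose K random points, called cluster centroids.
2. For every example, find the nearest of the K centroids and assign the example to it. All examples assigned to the same centroid form one cluster.
3. Compute the mean of each cluster and move its centroid to that mean.
4. Repeat steps 2 and 3 until the centroids no longer change.
Let μ1, μ2, ..., μk denote the cluster centroids and let c(1), c(2), ..., c(m) store the index of the centroid closest to example i. K-means in pseudocode:
Repeat {
for i = 1 to m
c(i) := index (from 1 to K) of cluster centroid closest to x(i)
for k = 1 to K
μk := average (mean) of points assigned to cluster k
}
The first loop, for i, is the cluster assignment step. The second loop, for k, moves the centroids.
K-means can also be applied to data without clearly separated groups, for example to decide T-shirt sizes from customer measurements.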
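The two alternating steps can be sketched in plain Python. This is an illustrative sketch, not the notes' implementation: the function name `kmeans`, the fixed iteration count, and the four sample points are assumptions, and the initial centroids are passed in explicitly so that the result is reproducible.

```python
def kmeans(points, centroids, iterations=10):
    """Run a fixed number of K-means iterations from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Cluster assignment step: index of the closest centroid for each point
        for i, x in enumerate(points):
            assignments[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
            )
        # Move-centroid step: each centroid becomes the mean of its points
        for k in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assignments[i] == k]
            if members:  # a centroid with no points is left where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

A production version would stop when the assignments no longer change instead of running a fixed number of iterations.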
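13.3 Optimization Objective
Reference video: 13 - 3 - Optimization Objective (7 min).mkv
K-means minimizes the sum of the squared distances between every example and the centroid of the cluster it belongs to. The K-means cost function is also called the Distortion function:
$$J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K) = \frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^2$$
where μ_c(i) is the centroid closest to x(i). The goal is to find the c(1),c(2),...,c(m) and μ1,μ2,...,μk that minimize the cost.
In each K-means iteration, the assignment step reduces the cost with respect to c(i), and the move step reduces it with respect to μ. The cost therefore never increases from one iteration to the next.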
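13.4 Random Initialization
Reference video: 13 - 4 - Random Initialization (8 min).mkv
Before running K-means, initialize the cluster centroids randomly:
1. Choose K<m, that is, fewer centroids than training examples.
2. Pick K training examples at random and set the K centroids equal to these K examples.
K-means can get stuck in a local minimum, depending on the initialization. To reduce this risk, run K-means several times, each time with a new random initialization, and keep the result with the smallest cost. This works well when K is small (2 to 10), but helps little when K is large.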
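13.5 Choosing the Number of Clusters
Reference video: 13 - 5 - Choosing the Number of Clusters (8 min).mkv
There is no single best way to choose the number of clusters. It is usually chosen by hand, depending on the purpose of running K-means.
One heuristic is the elbow method: plot the cost J against K. If the cost drops quickly from 1 to 2 and from 2 to 3 and only slowly after 3, the curve has an elbow at K = 3, and 3 is a reasonable choice for K.
Example: a T-shirt maker can cluster customers into 3 sizes (S, M, L) or 5 sizes (XS, S, M, L, XL). The choice depends on which serves the T-shirt business better.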
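14 Dimensionality Reduction (Week 8)
14.1 Motivation I: Data Compression
Reference video: 14 - 1 - Motivation I_ Data Compression (10 min).mkv
Suppose there are two features, x1 and x2, that measure nearly the same quantity. The pair (x1, x2) is then highly redundant, and the data can be reduced from two dimensions to one by projecting every point onto a line.
The same idea reduces three-dimensional data to two dimensions by projecting onto a plane, or data with 1000 features to 100 features.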
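14.2 Motivation II: Visualization
Reference video: 14 - 2 - Motivation II_ Visualization (6 min).mkv
Dimensionality reduction also helps to visualize data. Suppose every country is described by 50 features, such as GDP and GDP per capita. Data with 50 dimensions cannot be plotted, but after reducing it to 2 dimensions it can.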
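14.3 Principal Component Analysis Problem Formulation
Reference video: 14 - 3 - Principal Component Analysis Problem Formulation (9 min).mkv
Principal component analysis (PCA) is the most common dimensionality reduction algorithm.
PCA looks for a direction (Vector direction) onto which the data can be projected with the smallest mean squared projection error.
To reduce from n dimensions to k dimensions, find k vectors u(1),u(2),...,u(k) onto which the data is projected so that the total Projected Error is as small as possible.
PCA and linear regression are different algorithms. PCA minimizes the projection error, while linear regression minimizes the prediction error.
PCA reduces n features to k features, for example from 100 dimensions to 10.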
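14.4 Principal Component Analysis Algorithm
Reference video: 14 - 4 - Principal Component Analysis Algorithm (15 min).mkv
PCA from n dimensions down to k dimensions:
Step 1: mean normalization. Compute the mean μj of every feature and set xj = xj - μj. If the features are on different scales, also divide by the standard deviation.
Step 2: compute the covariance matrix
$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}\big(x^{(i)}\big)^T$$
Step 3: compute the eigenvectors of the covariance matrix. In Octave this can be done with the singular value decomposition: [U, S, V] = svd(sigma)
For an n×n covariance matrix, U is an n×n matrix. To reduce the data from n to k dimensions, take the first k columns of U. This gives an n×k matrix Ureduce, and the new feature vector is
$$z^{(i)} = U_{reduce}^T\,x^{(i)}$$
Here x is n×1, so the result z is k×1.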
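14.5 Choosing The Number Of Principal Components
Reference video: 14 - 5 - Choosing The Number Of Principal Components (13 min).mkv
Choose the smallest k for which the ratio of the mean squared projection error to the total variance of the data is small enough. A ratio below 1% means that 99% of the variance is retained. Retaining 95% is also common.
One way is to start with K=1, run PCA to get Ureduce and z, and check whether the ratio is below 1%. If not, try K=2, and so on, until the ratio is below 1%.
A better way to choose K uses the output of svd in Octave:
[U, S, V] = svd(sigma)
S is an n×n matrix that is 0 everywhere except on the diagonal. The condition can be checked from its diagonal entries:
$$1 - \frac{\sum_{i=1}^{k}S_{ii}}{\sum_{i=1}^{n}S_{ii}} \le 0.01$$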
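Given the diagonal of S, the smallest k that retains a target share of the variance can be found with one pass. This is a sketch under stated assumptions: `choose_k` is a hypothetical helper, and the singular values in the usage lines are made up.

```python
def choose_k(singular_values, retained=0.99):
    """Smallest k whose leading diagonal entries of S reach the retained share."""
    total = sum(singular_values)
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s
        if running / total >= retained:
            return k
    return len(singular_values)


S_diag = [8.0, 1.5, 0.4, 0.1]           # made-up diagonal of S, largest first
k_90 = choose_k(S_diag, retained=0.90)  # the first two entries cover 95%
k_97 = choose_k(S_diag, retained=0.97)  # the first three entries cover 99%
```

This avoids rerunning PCA for every candidate k: the decomposition is computed once and only the running sum changes.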
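14.6 Reconstruction from Compressed Representation
Reference video: 14 - 6 - Reconstruction from Compressed Representation (4 min).mkv
PCA as a compression algorithm may turn 1000-dimensional data into 100-dimensional data. Given a compressed z(i) with 100 dimensions, how do we get back to an approximation of the original x(i) with 1000 dimensions?
Take an example where x has 2 dimensions and z has 1 dimension. Compression is z = Ureduce^T x. The reverse mapping is
$$x_{approx} = U_{reduce}\,z, \qquad x_{approx} \approx x$$
This maps each z(1) back to a point on the line onto which the original x(1), x(2), ... were projected. It is how PCA recovers an approximation of X from the low-dimensional representation Z.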
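14.7 Advice for Applying PCA
Reference video: 14 - 7 - Advice for Applying PCA (13 min).mkv
Suppose we are doing computer vision on images of 100×100 pixels, which means 10000 features.
1. Use PCA to compress the data to 1000 features.
2. Run the learning algorithm on the compressed training set.
3. At prediction time, use the Ureduce learned before to turn the input x into the feature vector z, then predict.
Ureduce has to be learned from the training set only and is then applied to the cross-validation and test sets as well.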
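15 Anomaly Detection (Week 9)
15.1 Problem Motivation
Reference video: 15 - 1 - Problem Motivation (8 min).mkv
Anomaly detection is a widely used application of machine learning.
Example: an aircraft engine manufacturer measures features of every engine during QA (quality assurance) testing. This gives a data set from x(1) to x(m) of m engines. Given a new engine xtest, we want to know whether it is anomalous.
From the data x(1),x(2),..,x(m) we build a model p(x) that tells how likely it is that xtest belongs to the same population. If p(xtest) < ε, the example is flagged as an anomaly.
Fraud detection works the same way:
x(i) = features of the activity of user i
Build the model p(x) = probability that these feature values belong to a normal user.
Flag users with p(x) < ε.
Another example is monitoring the computers of a data center, with features such as memory use, number of disk accesses, CPU load and network traffic.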
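15.2 Gaussian Distribution
Reference video: 15 - 2 - Gaussian Distribution (10 min).mkv
If the variable x follows a Gaussian distribution, we write x ~ N(μ, σ^2), with density
$$p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The parameters are estimated from the data:
$$\mu = \frac{1}{m}\sum_{i=1}^{m}x^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\big(x^{(i)}-\mu\big)^2$$
Note: in machine learning the variance is usually divided by m, not by m-1 as in statistics. Whether 1/m or 1/(m-1) is used makes very little difference in practice once the training set is reasonably large, and machine learning generally uses the 1/m version.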
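15.3 Algorithm
Reference video: 15 - 3 - Algorithm (12 min).mkv
Given the data set x(1),x(2),...,x(m), estimate μj and σj^2 for every feature:
$$\mu_j = \frac{1}{m}\sum_{i=1}^{m}x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\big(x_j^{(i)}-\mu_j\big)^2$$
For a new example compute p(x):
$$p(x) = \prod_{j=1}^{n}p(x_j;\mu_j,\sigma_j^2) = \prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
If p(x) < ε, the example is an anomaly.
With two features, p(x) can be drawn as a surface whose height z is the value of p(x). The threshold ε then defines a boundary: points on it have p(x) = ε, points inside it have p(x) > ε and are normal, and points outside it are anomalies.
The quality of the model depends on the choice of the features x, since p(x) is built from them.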
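The per-feature Gaussian model fits in a few lines. The sketch below is illustrative only: the function names, the five training rows, and the threshold ε = 1e-3 are assumptions, not values from the notes.

```python
import math


def fit_gaussian(X):
    """Estimate mu_j and sigma_j^2 (dividing by m) for every feature column."""
    m = len(X)
    n = len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2


def density(x, mu, sigma2):
    """p(x) as the product of the one-dimensional Gaussian densities."""
    p = 1.0
    for xj, mj, s2 in zip(x, mu, sigma2):
        p *= math.exp(-(xj - mj) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p


# Made-up training data: five normal examples with two features
X = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1], [1.1, 2.0], [0.9, 2.0]]
mu, sigma2 = fit_gaussian(X)
epsilon = 1e-3  # hypothetical threshold; normally chosen on a cross-validation set
is_anomaly = density([5.0, 5.0], mu, sigma2) < epsilon
is_normal = density([1.0, 2.0], mu, sigma2) >= epsilon
```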
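15.4 Developing and Evaluating an Anomaly Detection System
Reference video: 15 - 4 - Developing and Evaluating an Anomaly Detection System (13 min).mkv
Anomaly detection is unsupervised, but labeled data is still needed to evaluate it. Example: 10000 normal engines and 20 anomalous engines, split as follows:
Training set: 6000 normal engines.
Cross-validation set: 2000 normal engines and 10 anomalous engines.
Test set: 2000 normal engines and 10 anomalous engines.
Evaluation:
1. Fit the model p(x) on the training set.
2. On the cross-validation set, try different values of ε as the threshold and choose ε by the F1 score, or by precision and recall.
3. With the chosen ε, predict on the test set and compute the F1 score, or precision and recall.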
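15.5 Anomaly Detection vs. Supervised Learning
Reference video: 15 - 5 - Anomaly Detection vs. Supervised Learning (8 min).mkv
Anomaly detection:
1. Very few positive examples (anomalies, y=1) and a large number of negative examples (y=0).
2. Many different kinds of anomalies, so the few positive examples are not enough for an algorithm to learn what anomalies look like.
3. Future anomalies may look nothing like the anomalies seen so far.
Supervised learning:
1. A large number of both positive and negative examples.
2. Enough positive examples for the algorithm to learn what they look like.
3. Future positive examples are likely to be similar to the ones in the training set.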
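15.6 Choosing What Features to Use
Reference video: 15 - 6 - Choosing What Features to Use (12 min).mkv
The model assumes that every feature follows a Gaussian distribution. If a feature does not, it helps to transform it, for example with x = log(x+c), where c is a non-negative constant, or with x = x^c, where c is a fraction between 0 and 1.
Error analysis: if an anomalous example still gets a high p(x), look at that example and try to find a new feature that sets it apart.
New features can also be built by combining existing ones. When monitoring computers in a data center, for example, the ratio of CPU load to network traffic can serve as a new feature.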
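15.7 Multivariate Gaussian Distribution (Optional)
Reference video: 15 - 7 - Multivariate Gaussian Distribution (Optional) (14 min).mkv
The ordinary Gaussian model computes p(x) by multiplying the densities of the individual features. The multivariate Gaussian model builds the covariance matrix of the features and uses all features together to compute p(x).
First compute the mean of all features, then the covariance matrix:
$$\mu = \frac{1}{m}\sum_{i=1}^{m}x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m}\big(x^{(i)}-\mu\big)\big(x^{(i)}-\mu\big)^T$$
Then compute p(x):
$$p(x) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
|Σ| is the determinant of the matrix, computed in Octave with det(sigma).
The covariance matrix changes the shape of the model, as in these five cases:
1. An ordinary Gaussian model.
2. Feature 1 gets a smaller spread while the spread of feature 2 is kept.
3. Feature 2 gets a larger spread while the spread of feature 1 is kept.
4. The spreads are unchanged and the two features become positively correlated.
5. The spreads are unchanged and the two features become negatively correlated.
The multivariate model captures correlations between features automatically but is more expensive to compute. It requires m>n, otherwise the covariance matrix is not invertible. In practice m>10n is usually recommended.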
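15.8 Anomaly Detection using the Multivariate Gaussian Distribution (Optional)
Reference video: 15 - 8 - Anomaly Detection using the Multivariate Gaussian Distribution (Optional) (14 min).mkv
The multivariate Gaussian distribution has two parameters: μ, an n-dimensional vector, and Σ, the n×n covariance matrix. Fit both on the training set, compute P(x) for a new example, and flag the example as an anomaly when P(x) < ε.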
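16 Recommender Systems (Week 9)
16.1 Problem Formulation
Reference video: 16 - 1 - Problem Formulation (8 min).mkv
Recommender systems are an important application of machine learning. One example is iTunes Genius.
Running example: 5 movies and 4 users, Alice, Bob, Carol and Dave, who have each rated some of the movies.
Notation:
nu is the number of users.
nm is the number of movies.
r(i,j) = 1 if user j has rated movie i.
y(i,j) is the rating that user j gave to movie i.
mj is the number of movies rated by user j.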
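16.2 Content Based Recommendations
Reference video: 16 - 2 - Content Based Recommendations (15 min).mkv
Suppose every movie has two features: x1 for how romantic it is and x2 for how much action it contains. Every movie then has a feature vector, for example x(1) = [0.9 0] for the first movie.
A linear regression model is trained for every user. For the first user the parameters are θ(1). In general:
θ(j) is the parameter vector of user j.
x(i) is the feature vector of movie i.
For user j and movie i, the predicted rating is (θ(j))^T x(i).
The cost function for user j is
$$\min_{\theta^{(j)}}\;\frac{1}{2}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{k=1}^{n}\big(\theta_k^{(j)}\big)^2$$
where i:r(i,j)=1 means that only the movies rated by user j are included. In ordinary linear regression both terms are multiplied by 1/2m; here m is dropped. The term with index 0 is not regularized.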
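16.3 Collaborative Filtering
Reference video: 16 - 3 - Collaborative Filtering (10 min).mkv
Collaborative filtering learns the movie features and the user parameters at the same time:
1. Initialize x(1),x(2),...,x(nm) and θ(1),θ(2),...,θ(nu) to small random values.
2. Minimize the cost function with gradient descent.
3. After training, predict (θ(j))^T x(i) as the rating of user j for movie i.
The learned features can also be used to find related movies. If a user is watching movie x(i), look for another movie x(j) for which the distance ||x(i)-x(j)|| is small.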
16.4 Collaborative Filtering Algorithm
Reference video: 16 - 4 - Collaborative Filtering Algorithm (9 min).mkv
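16.5 Vectorization: Low Rank Matrix Factorization
Reference video: 16 - 5 - Vectorization_ Low Rank Matrix Factorization (8 min).mkv
A trained collaborative filtering model can answer two more questions:
1. Given one product, can we find other products related to it?
2. A user recently looked at one product. Are there related products to recommend?
The ratings form a matrix Y with 5 rows (movies) and 4 columns (users):
Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)
Love at last           5          5        0          0
Romance forever        5          ?        ?          0
Cute puppies of love   ?          4        0          ?
Nonstop car chases     0          0        5          4
Swords vs. karate      0          0        5          ?
For every movie i the model learns a feature vector x(i). To find movies j related to movie i, look for a small distance between x(i) and x(j). To find the 5 movies most similar to movie i, take the 5 movies j with the smallest distance.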
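16.6 Implementational Detail: Mean Normalization
Reference video: 16 - 6 - Implementational Detail_ Mean Normalization (9 min).mkv
Suppose a new user, Eve, has not rated any movie. Without further measures the model has no basis for recommending anything to Eve.
Mean normalization: subtract from every rating the mean rating μi of the corresponding movie, and train on the normalized matrix. The prediction then adds the mean back:
(θ(j))^T(x(i)) + μi
For Eve the model predicts the mean rating of every movie.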
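Mean normalization only averages the ratings that actually exist. The sketch below assumes a rating matrix `Y` (movies by users) and an indicator matrix `R` with 1 where a rating exists. Both matrices and the function name are made up for illustration, with a last column standing in for a user without any ratings.

```python
def mean_normalize(Y, R):
    """Subtract each movie's mean rating, using only entries where R is 1."""
    mu = []
    Y_norm = []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        mean = sum(rated) / len(rated) if rated else 0.0
        mu.append(mean)
        # Unrated entries stay at 0 so they do not influence training
        Y_norm.append([y - mean if r == 1 else 0.0 for y, r in zip(y_row, r_row)])
    return Y_norm, mu


# Two movies, five users; the last user has rated nothing
Y = [[5, 5, 0, 0, 0], [0, 0, 5, 4, 0]]
R = [[1, 1, 1, 1, 0], [1, 1, 1, 1, 0]]
Y_norm, mu = mean_normalize(Y, R)
# With regularization pulling the parameters of the new user towards zero
# (theta = 0), the prediction theta^T x + mu_i reduces to the movie mean:
prediction_for_new_user = [0.0 + m for m in mu]
```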
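17 Large Scale Machine Learning (Week 10)
17.1 Learning With Large Datasets
Reference video: 17 - 1 - Learning With Large Datasets (6 min).mkv
Suppose the training set has 100 million examples. Every iteration of batch gradient descent has to sum over all of them, so even 20 iterations are very expensive.
First check whether such a large training set is really needed. Perhaps 1000 training examples already give good results. Plotting learning curves helps to decide.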
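17.2 Stochastic Gradient Descent
Reference video: 17 - 2 - Stochastic Gradient Descent (13 min).mkv
17.3 Mini-Batch Gradient Descent
Reference video: 17 - 3 - Mini-Batch Gradient Descent (6 min).mkv
Mini-batch gradient descent lies between batch gradient descent and stochastic gradient descent: every update uses b training examples. b is usually chosen between 2 and 100.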
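17.4 Stochastic Gradient Descent Convergence
Reference video: 17 - 4 - Stochastic Gradient Descent Convergence (12 min).mkv
To monitor stochastic gradient descent, compute the cost of each example just before updating the parameters with it. After every X iterations, average the costs of the last X examples and plot this average against the number of iterations. A typical choice is to average over 1000 examples.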
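17.5 Online Learning
Reference video: 17 - 5 - Online Learning (13 min).mkv
Online learning handles a continuous stream of data by learning from one example at a time and then discarding it.
Example: a shipping service. A user asks for a package to be shipped from A to B, and the website quotes a price, such as $50 or $20. The user then either accepts the quote (y=1) or does not (y=0). From the features of the request and the quoted price, the model learns p(y=1), the probability that the user accepts, and the price can be set accordingly.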
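17.6 Map Reduce and Data Parallelism
Reference video: 17 - 6 - Map Reduce and Data Parallelism (14 min).mkv
Map reduce splits the training set across several computers. Each computer computes part of the sum that the algorithm needs, and a central server combines the partial results. Many learning algorithms can be written as sums over the training set and can therefore be parallelized in this way.
Example: with 400 training examples and 4 computers, every computer sums over 100 examples, and the four partial sums are added up for the gradient descent update.
The same idea works on a single computer with a multi-core CPU, with one part of the data per CPU core.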
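The splitting can be imitated in a single process to show that the combined result equals the ordinary batch gradient. This is a sketch under simplifying assumptions: a one-parameter linear model h(x) = theta * x, synthetic data y = 2x, and hypothetical function names. A real deployment would run the per-chunk sums on separate machines or cores.

```python
def partial_gradient(theta, chunk):
    """Sum of (h(x) - y) * x over one chunk, for the model h(x) = theta * x."""
    return sum((theta * x - y) * x for x, y in chunk)


def mapreduce_gradient(theta, data, n_machines=4):
    """Split the data, compute partial sums ("map"), then combine them ("reduce")."""
    size = len(data) // n_machines
    chunks = [data[k * size:(k + 1) * size] for k in range(n_machines - 1)]
    chunks.append(data[(n_machines - 1) * size:])  # last chunk takes the remainder
    partials = [partial_gradient(theta, chunk) for chunk in chunks]
    return sum(partials) / len(data)


# 400 synthetic examples generated from y = 2x
data = [(float(x), 2.0 * x) for x in range(400)]
g_parallel = mapreduce_gradient(1.0, data)
g_single = partial_gradient(1.0, data) / len(data)
g_at_optimum = mapreduce_gradient(2.0, data)
```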
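18 Application Example: Photo OCR (Week 10)
18.1 Problem Description and Pipeline
Reference video: 18 - 1 - Problem Description and Pipeline (7 min).mkv
Photo OCR reads the text that appears in a photo. The task is split into a pipeline of steps:
1. Text detection: find the regions of the image that contain text.
2. Character segmentation: split each text region into single characters.
3. Character classification: decide which character each segment shows.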
18.2 Sliding Windows
Reference video: 18 - 2 - Sliding Windows (15 min).mkv
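18.3 Getting Lots of Data and Artificial Data
Reference video: 18 - 3 - Getting Lots of Data and Artificial Data (16 min).mkv
Ways of getting more data:
1. Artificial data synthesis.
2. Collecting and labeling data by hand.
3. Crowdsourcing.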
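18.4 Ceiling Analysis: What Part of the Pipeline to Work on Next
Reference video: 18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).mkv
Ceiling analysis shows which part of the pipeline is most worth improving. One component at a time is replaced by a perfect, hand-made output, and the accuracy of the whole system is measured again.
In the photo OCR example, making the output of text detection 100% correct raises the accuracy of the whole system from 72% to 89%. Making character segmentation 100% correct as well adds only 1%. Making character classification 100% correct adds the last 10%, for 100% overall.
Text detection therefore deserves the most effort, and character segmentation the least.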
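The arithmetic of a ceiling analysis is a running difference between consecutive accuracies. In the sketch below, the 72% and 89% figures and the gains of 1% and 10% come from the text; the intermediate value 90 is derived from them (89 + 1), and the stage labels and variable names are illustrative.

```python
# Overall accuracy (%) as successive components are replaced by perfect output
stages = [
    ("overall system", 72),
    ("text detection", 89),
    ("character segmentation", 90),   # derived: 89 + 1
    ("character recognition", 100),
]

# Gain attributable to perfecting each component, in percentage points
gains = {
    stages[i][0]: stages[i][1] - stages[i - 1][1]
    for i in range(1, len(stages))
}
best_target = max(gains, key=gains.get)
```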
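19 Conclusion (Week 10)
19.1 Summary and Thank You
Reference video: 19 - 1 - Summary and Thank You (5 min).mkv
Topics covered in the course:
Supervised learning, with labeled examples x(i), y(i): linear regression, logistic regression, neural networks and support vector machines.
Unsupervised learning, with unlabeled examples x(i): K-means, principal component analysis and anomaly detection.
Special applications: recommender systems and large scale machine learning.
Advice on building learning systems: bias and variance, regularization, learning curves, error analysis, and evaluation measures such as precision, recall and the F1 score.
Thanks to Andrew Ng.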

