
Machine Learning course notes (2014)

Version V4.3

Contact: haiguang2000@qq.com (QQ group: 10822884)

2017-06-08
These notes accompany the 2014 session of the Coursera course "Machine Learning":

https://www.coursera.org/course/ml

The course videos (playable with potplayer) and the lecture slides (ppt) can be downloaded from:

http://pan.baidu.com/s/1pKLATJl xn4w

2017-6-7

Version history:

1.0  2014.12.16
1.1  2014.12.31
2.0  2015.02.17
2.1  2015.02.23
2.2  2015.03.02
2.3  2015.03.14
2.4  2015.05.02
2.5  2015.05.13
3.0  2016.01.11  (OCTAVE)
3.1  2016.01.15
3.2  2016.02.15
3.3  2016.02.19
4.0  2016.02.24
4.1  2016.03.20
4.2  2016.03.28
4.3  2017.06.08

Contents

Week 1
  Chapter 1   Introduction
  Chapter 2   Linear Regression with One Variable
  Chapter 3   Linear Algebra Review
Week 2
  Chapter 4   Linear Regression with Multiple Variables
  Chapter 5   Octave Tutorial
Week 3
  Chapter 6   Logistic Regression
  Chapter 7   Regularization
Week 4
  Chapter 8   Neural Networks: Representation
Week 5
  Chapter 9   Neural Networks: Learning
Week 6
  Chapter 10  Advice for Applying Machine Learning
  Chapter 11  Machine Learning System Design
Week 7
  Chapter 12  Support Vector Machines
Week 8
  Chapter 13  Clustering
  Chapter 14  Dimensionality Reduction
Week 9
  Chapter 15  Anomaly Detection
  Chapter 16  Recommender Systems
Week 10
  Chapter 17  Large Scale Machine Learning
  Chapter 18  Application Example: Photo OCR
  Chapter 19  Conclusion
Week 1

Chapter 1  Introduction

1.1 Welcome

Reference video: 1 - 1 - Welcome (7 min).mkv

Facebook

AI

A B

web


web

DNA

AI

Netflix iTunes Genius

AI

12 IT HR




1.2 What is Machine Learning?

Reference video: 1 - 2 - What is Machine Learning_ (7 min).mkv

Arthur Samuel defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed. In the 1950s, Samuel wrote a checkers-playing program that played tens of thousands of games against itself; by observing which board positions tended to lead to wins, it eventually came to play better than Samuel himself did.

Tom Mitchell gives a more modern definition: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

For example, for an email spam filter: the task T is classifying emails as spam or not spam, the performance measure P is the fraction of emails correctly classified, and the experience E is watching the user label emails as spam or not spam.



1.3 Supervised Learning

Reference video: 1 - 3 - Supervised Learning (12 min).mkv

Consider predicting housing prices from a dataset of house sizes and sale prices. Suppose a friend has a 750-square-foot house: fitting a straight line to the data might predict a price of about $150,000, while fitting a quadratic function might predict closer to $200,000. This is a regression problem — the output we predict is a continuous value.


Another standard example is classification: given the size of a tumor, predict whether it is malignant (1) or benign (0). Here the output is a discrete value, 0 or 1, rather than a continuous quantity. Classification problems can also involve more than two classes — for instance outputs 0, 1, 2, 3 for several distinct categories.


X O X

2 3 5

5 3

3 5


1.

2.

0 1

0 1

0 1


1.4 Unsupervised Learning

Reference video: 1 - 4 - Unsupervised Learning (14 min).mkv

URL

news.google.com


DNA

email Facebook +


12345678910,

12345678910

JAVA

[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

Octave Octave, Octave Matlab

Octave Octave

Matlab Matlab

Octave Octave

SVM

Octave C++ JAVA

C++ Java

C++ Java Python

Octave Octave

Octave

C++ Java

C++

Octave

Octave



Chapter 2  Linear Regression with One Variable

2.1 Model Representation

Reference video: 2 - 1 - Model Representation (8 min).mkv

Returning to housing prices: given a training set of house sizes and sale prices, suppose a friend's house is 1250 square feet; a model fit to the data might predict it would sell for about $220,000. Since the output is a continuous price rather than a 0/1 label, this is a regression problem.

Notation: m denotes the number of training examples.

Terminology:

- Training Set: the set of examples used to fit the model.
- x: the input variable / feature.
- y: the output variable / target variable.
- (x, y): a single training example.
- (x^(i), y^(i)): the i-th training example.
- h: the hypothesis — the function the learning algorithm outputs, which maps from x to a predicted value of y.


One possible hypothesis is a linear function of a single input variable: $h_\theta(x) = \theta_0 + \theta_1 x$ — linear regression with one variable (univariate linear regression).


2.2 Cost Function

Reference video: 2 - 2 - Cost Function (8 min).mkv

As before, m is the number of training examples; in the housing data used here, m = 47.

The hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$.

θ₀ and θ₁ are the model's parameters; different parameter choices give different hypothesis lines. We want to choose parameters so that hθ(x) is close to y on the training examples.

The difference between the predicted value hθ(x^(i)) and the actual value y^(i) is the modeling error.


The goal is to choose parameters minimizing the sum of squared modeling errors, i.e. to minimize the cost function

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

We seek the values of θ₀ and θ₁ that minimize J(θ₀, θ₁). J is called the squared-error cost function, and it is a reasonable default choice for most regression problems.
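The notes use Octave; as an equivalent sketch, the cost function above can be computed in a few lines of Python/NumPy (the sample data below is made up for illustration):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x   # h_theta(x) for every example
    errors = predictions - y            # modeling errors
    return np.sum(errors ** 2) / (2 * m)

# Made-up training set lying exactly on the line y = 2x:
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

print(compute_cost(0.0, 2.0, x, y))  # perfect fit: cost is 0
print(compute_cost(0.0, 0.0, x, y))  # h(x) = 0 everywhere: (4 + 16 + 36) / 6
```

A perfect fit gives zero cost; any mismatch between predictions and targets makes J positive.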


2.3 Cost Function - Intuition I

Reference video: 2 - 3 - Cost Function - Intuition I (11 min).mkv


2.4 Cost Function - Intuition II

Reference video: 2 - 4 - Cost Function - Intuition II (9 min).mkv

J(0,1)


0 1

0 1

J 0 1


2.5 Gradient Descent

Reference video: 2 - 5 - Gradient Descent (11 min).mkv

Gradient descent is an algorithm for minimizing the cost function J(θ₀, θ₁) — and, more generally, J(θ₀, θ₁, ..., θₙ). The idea: start from some initial parameter values, then repeatedly change the parameters a little in the direction that most reduces J, until settling at a minimum. Depending on the starting point, gradient descent may reach a local minimum rather than the global minimum.

Picture standing on a hillside, turning 360 degrees, and asking: if I take one small step, which direction goes downhill fastest? Repeating this step by step traces a path to a low point.

The variant used here is batch gradient descent, in which each step uses all m training examples. The parameter α is the learning rate, which controls the step size.

The gradient descent update rule, applied for j = 0 and j = 1, is:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

Importantly, θ₀ and θ₁ must be updated simultaneously: compute both new values from the current parameters, then assign them together. Updating θ₀ first and then using the new θ₀ when updating θ₁ would be incorrect.




2.6 Gradient Descent Intuition

Reference video: 2 - 6 - Gradient Descent Intuition (12 min).mkv


Recall the update rule:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

The derivative term is the slope of J at the current point, and α scales the step taken along it.

If the learning rate α is too small, gradient descent takes tiny steps and converges very slowly. If α is too large, it can overshoot the minimum, fail to converge, or even diverge.


Note also that as gradient descent approaches a local minimum, the derivative term automatically shrinks, so the steps get smaller even with a fixed α. At the minimum itself the derivative is zero and the update leaves θ unchanged.



2.7 Gradient Descent for Linear Regression

Reference video: 2 - 7 - GradientDescentForLinearRegression (6 min).mkv

Applying gradient descent to the squared-error cost of linear regression, the derivative terms work out to:

$$j = 0: \quad \frac{\partial}{\partial \theta_0} J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$j = 1: \quad \frac{\partial}{\partial \theta_1} J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
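Putting the update rules together, here is a minimal Python/NumPy sketch of batch gradient descent for univariate linear regression (the notes use Octave; the data, learning rate, and iteration count below are invented for illustration):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1*x, simultaneous updates."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = (theta0 + theta1 * x) - y
        # Compute BOTH gradients from the current parameters, then update together.
        grad0 = errors.sum() / m
        grad1 = (errors * x).sum() / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                    # data generated from theta0 = 1, theta1 = 2
t0, t1 = gradient_descent(x, y)
print(round(t0, 3), round(t1, 3))
```

On this noiseless data the parameters converge to the generating values (1, 2).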


""

""

(normal equations)


2.8 What's Next

Reference video: 2 - 8 - What_'s Next (6 min).mkv

Chapter 3  Linear Algebra Review

3.1 Matrices and Vectors

Reference video: 3 - 1 - Matrices and Vectors (9 min).mkv

A matrix is a rectangular array of numbers; its dimension is written rows × columns, e.g. a 4×2 matrix has 4 rows and 2 columns, and an m×n matrix has m rows and n columns. A_ij denotes the element in the i-th row and j-th column.

A vector is a matrix with a single column, e.g. a 4×1 matrix. Vector indexing can be 1-based or 0-based; these notes use 1-based indexing.



3.2 Addition and Scalar Multiplication

Reference video: 3 - 2 - Addition and Scalar Multiplication (7 min).mkv

Matrix addition is element-wise (the matrices must have the same dimensions):

[1 0; 2 5] + [4 0.5; 2 5] = [5 0.5; 4 10]

Scalar multiplication multiplies every element by the scalar, and division by a scalar divides every element:

3 × [1 0; 2 5; 3 1] = [3 0; 6 15; 9 3]


3.3 Matrix Vector Multiplication

Reference video: 3 - 3 - Matrix Vector Multiplication (14 min).mkv

An m×n matrix multiplied by an n×1 vector gives an m×1 vector.


3.4 Matrix Matrix Multiplication

Reference video: 3 - 4 - Matrix Matrix Multiplication (11 min).mkv

An m×n matrix multiplied by an n×o matrix gives an m×o matrix: element (i, j) of the product is the i-th row of A times the j-th column of B.


3.5 Matrix Multiplication Properties

Reference video: 3 - 5 - Matrix Multiplication Properties (9 min).mkv

Matrix multiplication is not commutative: in general AB ≠ BA.

Matrix multiplication is associative: (AB)C = A(BC).

The identity matrix I (sometimes written E) has 1s on the diagonal and 0s elsewhere. For any matrix A of compatible dimensions, AI = IA = A.


3.6 Inverse and Transpose

Reference video: 3 - 6 - Inverse and Transpose (11 min).mkv

Matrix inverse: only square (m×m) matrices can have inverses. If A is m×m and invertible, A A⁻¹ = A⁻¹ A = I. In OCTAVE or MATLAB the inverse is computed with inv() or pinv().

Matrix transpose: if A is an m×n matrix with elements a(i,j), its transpose B = Aᵀ (written A' in MATLAB) is the n×m matrix with b(i,j) = a(j,i) — the rows of A become the columns of Aᵀ, as if the matrix were mirrored along its 45° diagonal.

Basic transpose properties:

(A ± B)ᵀ = Aᵀ ± Bᵀ
(AB)ᵀ = BᵀAᵀ
(Aᵀ)ᵀ = A
(KA)ᵀ = KAᵀ

In MATLAB, transpose is written with a quote: x = y'.

Week 2

Chapter 4  Linear Regression with Multiple Variables

4.1 Multiple Features

Reference video: 4 - 1 - Multiple Features (8 min).mkv

Now the model has multiple features x₁, x₂, ..., xₙ (for houses: size, number of bedrooms, number of floors, and so on).

Notation:

- n: the number of features.
- x^(i): the input features of the i-th training example — now a vector rather than a single number.
- x_j^(i): the value of feature j in the i-th training example.

The hypothesis becomes:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

To simplify the notation, introduce x₀ = 1, so every example has n+1 feature values and θ is an (n+1)-dimensional vector. The hypothesis can then be written as a product:

$$h_\theta(x) = \theta^T x$$

and the training inputs form an m×(n+1) design matrix X.



4.2 Gradient Descent for Multiple Variables

Reference video: 4 - 2 - Gradient Descent for Multiple Variables (5 min).mkv

The cost function is the same as before, now over the vector-valued θ:

$$J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

with $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$. The gradient descent update for n ≥ 1 features is:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously for } j = 0, \dots, n\text{)}$$



4.3 Gradient Descent in Practice I - Feature Scaling

Reference video: 4 - 3 - Gradient Descent in Practice I - Feature Scaling (9 min).mkv

When features have very different scales (e.g. house size in the range 0–2000 square feet versus number of bedrooms in 0–5), the contours of the cost function become elongated and gradient descent converges slowly. Feature scaling rescales all features into comparable ranges, ideally roughly −1 to 1.


The simplest method is mean normalization: replace xₙ with (xₙ − μₙ)/sₙ, where μₙ is the mean of the feature and sₙ is its standard deviation (or range).
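Mean normalization can be sketched in Python/NumPy as follows (the notes use Octave; the feature values below echo the housing data but are otherwise illustrative):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature (column) to (x - mean) / std, as described above."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# House size in square feet, number of bedrooms:
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [1416.0, 2.0]])
X_norm, mu, sigma = mean_normalize(X)
print(X_norm.mean(axis=0))  # each column now has mean ~0
```

After scaling, both features contribute comparably to the gradient, so a single learning rate works well for all parameters.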


4.4 Gradient Descent in Practice II - Learning Rate

Reference video: 4 - 4 - Gradient Descent in Practice II - Learning Rate (9 min).mkv

To choose the learning rate, plot J(θ) against the number of iterations: if gradient descent is working, J should decrease on every iteration. One automatic convergence test is to stop when J decreases by less than some small threshold such as 0.001 in an iteration, but reading the plot is usually more reliable. A practical strategy is to try values roughly 3× apart:

α = 0.01, 0.03, 0.1, 0.3, 1, 3, 10


4.5 Features and Polynomial Regression

Reference video: 4 - 5 - Features and Polynomial Regression (8 min).mkv

We are free to define new features. In the housing example, instead of using frontage (x₁) and depth (x₂) separately, we can define a single feature x = frontage × depth = area.

Linear regression can also fit nonlinear functions by using polynomial features:

$$h_\theta(x) = \theta_0 + \theta_1 x$$
$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$$
$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$$

With polynomial features, feature scaling becomes essential, since x, x², and x³ have very different ranges.



4.6 Normal Equation

Reference video: 4 - 6 - Normal Equation (16 min).mkv


The normal equation solves for θ analytically rather than iteratively: set each partial derivative $\frac{\partial}{\partial \theta_j} J(\theta)$ to 0 and solve. With the design matrix X (including the x₀ = 1 column) and the target vector y, the solution is:

$$\theta = (X^T X)^{-1} X^T y$$


In Octave this is:

pinv(X'*X)*X'*y

Comparison with gradient descent: the normal equation needs no learning rate and no iteration, but computing (XᵀX)⁻¹ costs roughly O(n³) in the number of features n. For large n (say above 10000) gradient descent is usually preferable; for small n the normal equation is fast and convenient.
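The Octave one-liner above translates directly to Python/NumPy (a sketch; the data below is made up, generated from y = 1 + 2x):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, using pinv as the Octave version does."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Design matrix with the x0 = 1 column included:
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = normal_equation(X, y)
print(theta)  # approximately [1, 2]
```

No learning rate, no iterations: one matrix computation recovers the parameters exactly on this noiseless data.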



4.7 Normal Equation Noninvertibility (Optional)

Reference video: 4 - 7 - Normal Equation Noninvertibility (Optional) (6 min).mkv

The normal equation computes θ = (XᵀX)⁻¹Xᵀy. What if XᵀX is noninvertible (singular)?

Octave has two functions for inverting matrices: pinv() and inv(). pinv() computes the pseudo-inverse, so it returns a usable θ even when XᵀX is singular.

XᵀX is usually noninvertible for one of two reasons:

1. Redundant (linearly dependent) features. For example, if x₁ is the size in square feet and x₂ the size in square meters, then x₁ = (3.28)² · x₂ always, and XᵀX is singular. Delete one of the redundant features.

2. Too many features relative to training examples (m ≤ n). For example, with m = 10 examples and n = 100 features, θ has n+1 = 101 elements, and fitting 101 parameters from 10 examples is problematic. Delete some features, or use regularization (introduced later).

To summarize: if XᵀX turns out to be singular, look for linearly dependent features (such as x₁ and x₂ above) and remove them, or reduce the number of features when m is small. Using pinv() in Octave handles the singular case gracefully.

Chapter 5  Octave Tutorial

5.1 Basic Operations

Reference video: 5 - 1 - Basic Operations (14 min).mkv

Octave

C++JavaPythonNumpy

Octave Octave

Octave

Octave (prototyping language) Octave

C++ Java

Octave C++

Java

OctaveMATLABPythonNumPy

Octave MATLAB

matlabmatlab Octave D

matlab MATLAB

PythonNumPy R R

PythonNumPy

Octave NumPy

R Octave

Octave

Octave

Octave


Octave Octave

Octave

Elementary arithmetic works as expected: 5 + 6 gives 11, and subtraction, multiplication, division, and exponentiation are written 3 - 2, 5 * 8, 1/2, 2 ^ 6.

Logical operations: 1 == 2 evaluates to false.


1 == 2 returns 0 (false). Note that "not equal" is written ~= in Octave, not !=.

Logical AND: 1 && 0 gives 0. Logical OR: 1 || 0 gives 1. Exclusive or: xor(1, 0) gives 1.

Variables are assigned with =, e.g. a = 3, or b = 'hi' for a string; ending a command with a semicolon suppresses the printed output.


Comparisons yield 1 or 0: c = (3 >= 1) stores 1 in c. Typing a variable name, such as a, prints its value; disp(a) prints it as well, and disp(sprintf('%0.6f', a)) prints it with six decimal places using a C-style format string.


Vectors and matrices: v = [1 2 3] creates a 1×3 row vector, while v = [1; 2; 3] creates a 3×1 column vector.

v = 1:0.1:2 creates a row vector running from 1 to 2 in steps of 0.1 (1, 1.1, 1.2, ..., 2 — eleven elements), and v = 1:6 gives the integers 1 through 6.

ones(2, 3) creates a 2×3 matrix of ones.

rand(3, 3) builds a 3×3 matrix of random numbers drawn uniformly between 0 and 1, and w = rand(1, 3) a random row vector. randn instead draws Gaussian (normal) random values with mean 0.


hist(w) plots a histogram of the values in w. The help command shows documentation for any function, e.g. help hist.

These are the basic operations; the following sections cover moving data around and computing on it.


5.2 Moving Data Around

Reference video: 5 - 2 - Moving Data Around (16 min).mkv

This section covers how to load, store, and manipulate data in Octave.

Suppose A = [1 2; 3 4; 5 6], a 3×2 matrix. size(A) returns its dimensions, here 3 2. In fact size() itself returns a 1×2 matrix, so sz = size(A) stores [3 2] in sz, and size(sz) gives 1 2.


size(A, 1) gives the number of rows (3), and size(A, 2) the number of columns (2). For a vector v = [1 2 3 4], length(v) returns 4. Applied to a matrix, length returns the longer dimension — length(A) is 3 for our 3×2 matrix, and length([1;2;3;4;5]) is 5 — so length is normally used only on vectors.

For the file system: pwd shows Octave's current directory, cd changes it (e.g. cd C:\Users\ang\Desktop), and ls lists its files, as in Unix/Linux. Suppose the current directory contains the data files featuresX.dat and priceY.dat.


featuresX holds the training features — 47 rows of two columns, such as 2104 3 and 1600 3 (house size and number of bedrooms) — and priceY the corresponding prices. Load them with:

load featuresX.dat
load priceY.dat

(or equivalently load('featuresX.dat')). The who command lists the variables currently in Octave's workspace; after loading, featuresX appears among them.


size(featuresX) returns 47 2 (a 47×2 matrix) and size(priceY) returns 47 1. whos is a more detailed version of who: it also shows each variable's dimensions, memory usage, and type (e.g. double).


clear featuresX deletes that variable; running whos afterwards shows it is gone. clear with no argument deletes all variables in the workspace.

v = priceY(1:10) stores the first ten elements of priceY in v.

save hello.mat v saves v to the file hello.mat in a compressed binary (MATLAB) format. After clearing the workspace, load hello.mat restores v. To save in human-readable text instead, use:

save hello.txt v -ascii


Indexing (with A = [1 2; 3 4; 5 6]): A(3,2) is the element in row 3, column 2. A(2,:) is the whole second row, and A(:,2) is the whole second column — here 2, 4, 6. A([1 3],:) selects rows 1 and 3.

Indexing also assigns: A(:,2) = [10; 11; 12] replaces the second column, so A's columns become 1 3 5 and 10 11 12.


A = [A, [100; 101; 102]] appends a new column on the right of A. A(:) stacks all elements of A into a single column vector — for a 3×3 matrix, a 9×1 vector.

With A = [1 2; 3 4; 5 6] and B = [11 12; 13 14; 15 16], C = [A B] places B to the right of A.


C = [A; B] instead stacks B below A, giving a 6×2 matrix. Note that [A B] and [A, B] are equivalent.


With these commands you can construct, inspect, slice, and combine matrices quickly in Octave.


5.3 Computing on Data

Reference video: 5 - 3 - Computing on Data (13 min).mkv

Let A and B be 3×2 matrices and C a 2×2 matrix. A*C is a valid matrix product: 3×2 times 2×2 gives a 3×2 result. A .* B multiplies element-wise instead — for A = [1 2; 3 4; 5 6] and B = [11 12; 13 14; 15 16], the first elements give 1·11 = 11 and 2·12 = 24, and so on. In Octave the dot generally marks element-wise operations.


A .^ 2 squares each element of A. For v = [1; 2; 3], 1 ./ v gives the element-wise reciprocals 1/1, 1/2, 1/3, and 1 ./ A works the same way on a matrix. log(v) and exp(v) apply the element-wise logarithm and base-e exponential.


abs(v) gives element-wise absolute values, and -v is the same as -1*v.

To add 1 to each element of v = [1 2 3] (giving [2 3 4]), one way is to add a vector of ones, v + ones(length(v), 1) — i.e. v + ones(3, 1) — but the simpler way is just v + 1.


A' is the transpose of A, and (A')' gives back A.

With a = [1 15 2 0.5], a 1×4 vector: val = max(a) returns 15, and [val, ind] = max(a) additionally returns its index, ind = 2. Note that max(A) on a matrix returns the column-wise maxima.

a < 3 performs an element-wise comparison, returning [1 1 0 1] — 1 (true) where the element is less than 3, 0 (false) elsewhere. find(a < 3) instead returns the indices of the elements less than 3.

A = magic(3) builds a 3×3 magic square, in which every row, column, and diagonal sums to the same value.


[r, c] = find(A >= 7) returns the row indices r and column indices c of the elements of A that are at least 7, so A(r(1), c(1)) is one such element. When in doubt about a function like find, use help find.

sum(a) adds up the elements of a, and prod(a) (for "product") multiplies them together.

floor(a) rounds each element down (0.5 becomes 0) and ceil(a) rounds up (0.5 becomes 1).

max(rand(3), rand(3)) builds a 3×3 matrix of element-wise maxima of two random matrices. For the magic square A, max(A, [], 1) takes the maximum along dimension 1 (per column), giving 8 9 7, while max(A, [], 2) gives the per-row maxima, 8 7 9. Plain max(A) defaults to column-wise maxima. The overall maximum of a matrix is max(max(A)), or equivalently max(A(:)) — for this A, the value 9.


For a 9×9 magic square A = magic(9): sum(A, 1) gives the column sums (each 369) and sum(A, 2) the row sums (also all 369). To sum the diagonal, multiply element-wise by the identity eye(9): A .* eye(9) keeps only the diagonal entries, and sum(sum(A .* eye(9))) again gives 369. flipud(A) flips the matrix upside down, which helps sum the other diagonal.

pinv(A) computes the (pseudo-)inverse of A: temp = pinv(A) followed by temp * A gives, up to rounding, the identity matrix — ones on the diagonal and zeros elsewhere.


5.4 Plotting Data

Reference video: 5 - 4 - Plotting Data (10 min).mkv

Plotting is invaluable for debugging learning algorithms — for instance, plotting the cost J(θ) against the iteration number shows whether gradient descent is converging.

Given a vector t and function values y1, plot(t, y1) draws y1 against t; a second series y2 can be plotted the same way.


By default a new plot replaces the old one. After plot(t, y1), the command hold on keeps the existing figure, so plot(t, y2, 'r') draws the second curve on top, in red. Labels and annotations:

xlabel('time')        (x-axis label)
ylabel('value')       (y-axis label)
legend('sin', 'cos')
title('myplot')


print -dpng 'myplot.png' saves the figure as a PNG file (help plot lists other formats), and close closes the figure window.

Figures can be numbered: figure(1); plot(t, y1); and figure(2); plot(t, y2); open two windows. subplot(1,2,1) divides the figure into a 1×2 grid and selects the first cell, so plot(t, y1) draws there; subplot(1,2,2) then selects the second cell for plot(t, y2).



axis([0.5 1 -1 1]) sets the x-axis range to 0.5–1 and the y-axis range to −1–1 for the current plot (help axis gives details), and clf clears the figure.

To visualize a matrix, let A = magic(5). imagesc(A) draws a 5×5 grid of colored cells whose colors correspond to the values of A. A common combination is imagesc(A), colorbar, colormap gray — adding a color scale bar and switching to a grayscale map.


The same works for larger matrices, e.g. imagesc(magic(15)), colorbar, colormap gray for a 15×15 magic square. Note the comma chaining here: a=1,b=2,c=3 runs all three commands when you press Enter, printing each result.


Using semicolons instead — a=1; b=2; c=3; — runs the same assignments but suppresses the printed output.

With imagesc, colorbar, and colormap you can already visualize matrices in many ways. The next section covers the control statements if, while, and for.


5.5 Control Statements: for, while, if

Reference video: 5 - 5 - Control Statements_ for, while, if statements (13 min).mkv

Octave provides "for", "while", and "if" statements, as well as user-defined functions.

A for loop: starting from v = zeros(10, 1),

for i = 1:10,
  v(i) = 2^i;
end;

fills v with the powers of 2, the loop variable i running from 1 to 10.


The loop range can also be held in a variable: after indices = 1:10, the loop for i = indices, disp(i); end; prints the numbers 1 through 10. Octave supports break and continue inside loops as well.

A while loop works similarly.


Example: i = 1; while i <= 5, v(i) = 100; i = i+1; end; overwrites the first five elements of v with 100.

A while loop with break: i = 1; while true, v(i) = 999; i = i+1; if i == 6, break; end; end; also sets the first five elements (here to 999), exiting once i reaches 6. The inner end closes the if and the outer end closes the while.

An if-else chain looks like: if v(1) == 1, disp('one'); elseif v(1) == 2, disp('two'); else, disp('other'); end;


To leave Octave, type exit or quit.

Functions are defined in files: a function squareThisNumber lives in a file squareThisNumber.m. The file begins

function y = squareThisNumber(x)

which tells Octave that the function returns one value y and takes one argument x; the body then assigns y. To call it, Octave must either be cd'd into the directory containing the file, or that directory must be on Octave's search path, added with e.g.:

addpath('C:\Users\ang\desktop')


Unlike C or C++, Octave functions can return several values:

function [y1, y2] = squareAndCubeThisNumber(x)
y1 = x^2;
y2 = x^3;

Then [a, b] = squareAndCubeThisNumber(5) gives a = 25 and b = 125.

A more useful example: a function that computes the cost J(θ) for a training set. Take the three points (1,1), (2,2), (3,3). In Octave the design matrix (with the x₀ = 1 column) is X = [1 1; 1 2; 1 3]; and y = [1; 2; 3];.


Suppose costFunctionJ(X, y, theta) implements the squared-error cost. With theta = [0; 1], the hypothesis is the 45° line hθ(x) = x, which fits (1,1), (2,2), (3,3) exactly, so j = costFunctionJ(X, y, theta) returns 0.

With theta = [0; 0], the hypothesis predicts 0 everywhere, and the cost is (1² + 2² + 3²)/(2m) = 14/6 ≈ 2.333.

Loops, control statements, and user-defined functions are thus enough to implement the cost functions used in this course.


5.6 Vectorization

Reference video: 5 - 6 - Vectorization (14 min).mkv

Whether you work in Octave, MATLAB, Python/NumPy, Java, C, or C++, good linear algebra libraries are available, and using them — vectorizing your code — is both simpler and much faster than writing explicit loops.

Consider the hypothesis $h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$: it can be computed either as a sum over terms or as an inner product of the vectors θ and x. Take n = 2, so the sum is θ₀x₀ + θ₁x₁ + θ₂x₂.


An unvectorized implementation accumulates the sum in a loop:

prediction = 0.0;
for j = 1:n+1,
  prediction = prediction + theta(j) * x(j);
end;

Note the index shift: MATLAB/Octave indexing starts at 1, so θ₀ is stored in theta(1), θ₁ in theta(2), and the loop runs from 1 to n+1 rather than 0 to n.

The vectorized implementation is one line:

prediction = theta' * x;

which is shorter and faster, since the inner product is computed by the optimized linear algebra library. The same contrast holds in C++ with a linear algebra library: a hand-written loop versus a single inner-product expression.


A subtler example: the gradient descent updates for linear regression,

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

must be carried out simultaneously for j = 0, 1, 2, ... One option is a for loop over j.



The vectorized alternative treats θ as a single vector and performs one update:

$$\theta := \theta - \alpha \delta, \qquad \delta = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

where θ and δ are (n+1)-dimensional vectors and x^(i) is the feature vector of the i-th example.


Think of it like the vector equation u = 2v + 5w: the whole vector u is computed at once, rather than element by element in a loop. Vectorized implementations in Octave — or in C++, Java, and other languages with good linear algebra libraries — run substantially faster than hand-written loops.
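The loop/vectorized contrast can be shown concretely in Python/NumPy (a sketch; the θ and x values are invented for illustration):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # theta_0 .. theta_2
x = np.array([1.0, 4.0, 5.0])       # x_0 = 1, then the features

# Unvectorized: accumulate theta_j * x_j in a loop.
prediction = 0.0
for j in range(len(theta)):
    prediction += theta[j] * x[j]

# Vectorized: a single inner product, computed by the optimized library.
vectorized = theta @ x

print(prediction, vectorized)  # both give 1 + 8 + 15 = 24
```

Both forms compute the same h(x); the vectorized one is shorter and, on large vectors, dramatically faster.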


5.7 Working on and Submitting Programming Exercises

Reference video: 5 - 7 - Working on and Submitting Programming Exercises (4 min).mkv

The programming exercises come as a folder, e.g. 'ml-class-ex1', containing a pdf of instructions and code templates. The first warm-up task, in warmUpExercise.m, is to return a 5×5 identity matrix — completed by adding:

A = eye(5);

Calling warmUpExercise() then returns the 5x5 identity matrix.


In Octave, cd into the exercise directory (e.g. C:\Users\ang\Desktop\ml-class-ex1), type 'warmUpExercise()' to check that it prints the 5x5 identity, then run submit() to grade it; the submit script asks which part to submit (e.g. '1').


The submission output then reports which parts of the exercise received credit.

Week 3

Chapter 6  Logistic Regression

6.1 Classification

Reference video: 6 - 1 - Classification (8 min).mkv

In classification problems the variable y to predict is discrete. The most widely used classification algorithm is logistic regression (Logistic Regression).

Start with binary classification: the dependent variable takes only two values, y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class.


Why not use linear regression with a threshold — predict y = 1 when hθ(x) ≥ 0.5 and y = 0 otherwise? Because a single distant training example can tilt the fitted line and shift the threshold, badly misclassifying the remaining points. Moreover, linear regression can output values much larger than 1 or smaller than 0, even though the labels are only 0 and 1. Logistic regression always outputs values between 0 and 1; despite the name "regression", it is a classification algorithm.


6.2 Hypothesis Representation

Reference video: 6 - 2 - Hypothesis Representation (7 min).mkv

We want a hypothesis whose output always lies in [0, 1], which can be interpreted as a probability and thresholded: predict y = 1 when hθ(x) ≥ 0.5 and y = 0 when hθ(x) < 0.5.

Logistic regression uses:

$$h_\theta(x) = g(\theta^T x)$$


where g is the logistic (Sigmoid) function, an S-shaped curve:

$$g(z) = \frac{1}{1 + e^{-z}}$$

so that

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

The output is interpreted as an estimated probability: $h_\theta(x) = P(y = 1 \mid x; \theta)$. For example, if for some input x we compute hθ(x) = 0.7, there is a 70% chance that y = 1, and correspondingly a 1 − 0.7 = 0.3 chance that y = 0.
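The sigmoid and the logistic hypothesis are two lines of Python/NumPy (a sketch — the notes use Octave, and the example θ and x are invented):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic regression hypothesis: estimated P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

print(sigmoid(0.0))                     # 0.5: the midpoint of the S-curve
print(sigmoid(100.0), sigmoid(-100.0))  # saturates toward 1 and 0
```

g(0) = 0.5 is what makes "θᵀx ≥ 0" and "hθ(x) ≥ 0.5" the same condition, which the next section uses to derive decision boundaries.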


6.3 Decision Boundary

Reference video: 6 - 3 - Decision Boundary (15 min).mkv

The decision boundary is the surface separating the region where the model predicts y = 1 from the region where it predicts y = 0.

Since we predict y = 1 when hθ(x) ≥ 0.5, and g(z) ≥ 0.5 exactly when z ≥ 0 (with z = θᵀx):

- predict y = 1 when θᵀx ≥ 0
- predict y = 0 when θᵀx < 0

Example: with θ = [-3, 1, 1], the model predicts y = 1 when −3 + x₁ + x₂ ≥ 0, i.e. when x₁ + x₂ ≥ 3. The decision boundary is the line x₁ + x₂ = 3.


Decision boundaries can also be nonlinear. With polynomial features, e.g. $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$ and θ = [-1, 0, 0, 1, 1], the model predicts y = 1 when x₁² + x₂² ≥ 1: the decision boundary is the unit circle.


6.4 Cost Function

Reference video: 6 - 4 - Cost Function (11 min).mkv

How do we fit the parameters θ? With $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, reusing the squared-error cost

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

produces a non-convex function of θ, with many local minima, so gradient descent is not guaranteed to find the global minimum.


Instead, define

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$$

with

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Intuition: when y = 1, the cost is 0 if hθ(x) = 1 and grows without bound as hθ(x) → 0 — the model pays a very large penalty for confidently predicting the wrong class. Symmetrically, when y = 0 the cost is 0 at hθ(x) = 0 and grows as hθ(x) → 1. This choice makes J(θ) convex.


Minimizing this J(θ) yields the fitted parameters; to predict for a new x, output $h_\theta(x) = g(\theta^T x)$, interpreted as the probability that y = 1.

Besides gradient descent, more sophisticated minimizers can be used: Conjugate Gradient, BFGS (Broyden–Fletcher–Goldfarb–Shanno), and L-BFGS. MATLAB and Octave provide fminunc, which requires a function returning both the cost and the gradient:

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end

options = optimset('GradObj', 'on', 'MaxIter', '100');
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);


6.5 Simplified Cost Function and Gradient Descent

Reference video: 6 - 5 - Simplified Cost Function and Gradient Descent (10 min).mkv

Because y is always 0 or 1, the two-case cost can be written in a single line:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

giving

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Recall the interpretation hθ(x) = p(y = 1 | x; θ).


Gradient descent then minimizes J(θ) with the usual rule, updating every parameter simultaneously:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

where the sum runs over the m training examples $(x^{(i)}, y^{(i)})$ and $x_j^{(i)}$ is the j-th feature of the i-th example. All components θ₀, θ₁, ..., θₙ are updated together on each step. This update rule has exactly the same form as the one for linear regression.


Although the update rule has the same form as linear regression's, it is not the same algorithm: in linear regression $h_\theta(x) = \theta^T x$, while here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, so the hypothesis inside the sum is different. The updates for θ₀ through θₙ can be done with a for loop (for i = 1 to n+1 in 1-indexed Octave), but a vectorized implementation is preferable.


6.6 Advanced Optimization

Reference video: 6 - 6 - Advanced Optimization (14 min).mkv

From an implementation standpoint, gradient descent needs code that, given θ, computes two quantities: the cost J(θ) and its partial derivatives $\frac{\partial}{\partial \theta_j} J(\theta)$ for j = 0, ..., n. Strictly, gradient descent only needs the derivatives, but computing J(θ) as well makes it easy to monitor convergence.




Given that same pair — J(θ) and its gradient — more advanced optimizers can be used instead of gradient descent: Conjugate Gradient, BFGS, and L-BFGS (limited-memory BFGS). Their advantages: no need to manually pick a learning rate α (an internal line search chooses step sizes automatically), and they often converge much faster. Their disadvantage: they are considerably more complex internally.

The practical advice is not to implement these algorithms yourself but to use a good library. Octave and MATLAB ship with very good implementations; in C, C++, Java, and other languages the quality of available libraries varies more, so it may be worth trying several.

As a concrete example, suppose θ has two components and

$$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$$

whose minimum is at θ₁ = 5, θ₂ = 5. In Octave, write a function returning the cost and the gradient:

function [jVal, gradient]=costFunction(theta)

jVal=(theta(1)-5)^2+(theta(2)-5)^2;

gradient=zeros(2,1);

gradient(1)=2*(theta(1)-5);

gradient(2)=2*(theta(2)-5);

end

This returns the cost jVal and a 2×1 gradient vector. With costFunction defined, call fminunc as follows:

options=optimset('GradObj','on','MaxIter',100);

initialTheta=zeros(2,1);

[optTheta, functionVal, exitFlag]=fminunc(@costFunction, initialTheta, options);

Here options configures fminunc: 'GradObj','on' says the supplied function also returns the gradient, and 'MaxIter',100 caps the iterations at 100. initialTheta is the 2×1 starting point, and @costFunction is a handle (pointer) to the cost function.

Running this, fminunc returns optTheta (the minimizer, here about [5; 5]), functionVal (the cost at the optimum, essentially 0), and exitFlag (the convergence status). Note that fminunc requires theta to have dimension at least 2, and that inside costFunction, gradient(1) and gradient(2) hold the partial derivatives with respect to theta(1) and theta(2).
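For readers working in Python rather than Octave, the analogous call (a sketch, using SciPy's general-purpose minimizer in place of fminunc) on the same toy cost looks like this:

```python
import numpy as np
from scipy.optimize import minimize

def cost_function(theta):
    """J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2 and its gradient."""
    jval = (theta[0] - 5.0) ** 2 + (theta[1] - 5.0) ** 2
    gradient = np.array([2.0 * (theta[0] - 5.0), 2.0 * (theta[1] - 5.0)])
    return jval, gradient

initial_theta = np.zeros(2)
# jac=True means cost_function returns (cost, gradient) -- like GradObj 'on'.
result = minimize(cost_function, initial_theta, jac=True,
                  method='BFGS', options={'maxiter': 100})
print(result.x)  # approximately [5, 5]
```

As with fminunc, you supply only the cost and gradient; the BFGS machinery picks the step sizes itself.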


6.7 Multiclass Classification: One-vs-all

: 6 - 7 - Multiclass Classification_ One-vs-all (6 min).mkv

In this video we extend logistic regression to multiclass classification using the "one-vs-all" (one-vs-rest) method. Examples: automatically tagging email into work, friends, family, and hobby folders (y = 1, 2, 3, 4); diagnosing a patient as not ill, cold, or flu (y = 1, 2, 3); classifying weather as sunny, cloudy, rainy, or snowy. Unlike binary classification, the data now falls into several classes rather than just 0 and 1.

The one-vs-all idea: for each class i, train a separate binary classifier by treating class i as the positive class and lumping all other classes together as the negative class. This yields a family of classifiers

h_θ^(i)(x) = P(y = i | x; θ),  for i = 1, …, k

each estimating the probability that y = i. To classify a new input x, run all k classifiers and pick the class with the highest probability:

y = argmax_i h_θ^(i)(x)

That class i is our prediction for y.
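The prediction step can be sketched in Python as follows (the three θ vectors are made-up stand-ins for trained per-class classifiers):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """Run every per-class classifier h_i(x) and return the argmax class i."""
    probs = [sigmoid(sum(t * xj for t, xj in zip(theta, x))) for theta in thetas]
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical trained parameters for classes 0, 1, 2 (bias feature x0 = 1).
thetas = [[-1.0, 2.0, 0.0],    # class 0: fires on large x1
          [-1.0, 0.0, 2.0],    # class 1: fires on large x2
          [1.0, -2.0, -2.0]]   # class 2: fires when both are small
print(predict_one_vs_all(thetas, [1.0, 3.0, 0.0]))
```

Each classifier votes with a probability; the argmax implements the max_i h_θ^(i)(x) rule from the text.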

- 3 -(Regularization)

(Regularization)

7.1 The Problem of Overfitting

: 7 - 1 - The Problem of Overfitting (10 min).mkv

In this video we discuss the problem of over-fitting and introduce regularization (regularization), a technique that reduces over-fitting and helps our learning algorithms generalize to new examples.


Two main options for addressing over-fitting:

1. Reduce the number of features: manually select which features to keep, or use a model-selection algorithm (such as PCA, discussed later in the course).

2. Regularization: keep all the features, but reduce the magnitude of the parameters θ_j.


7.2 Cost Function

: 7 - 2 - Cost Function (10 min).mkv

In the over-fit regression example it is the high-order terms θ3x³ and θ4x⁴ that bend the curve; if θ3 and θ4 were close to 0, the hypothesis would be nearly quadratic and would fit better. So we modify the cost to penalize θ3 and θ4, for instance by adding large multiples of θ3² and θ4² to it — the minimizer is then forced to keep θ3 and θ4 small. More generally, we shrink all the parameters by adding a regularization term weighted by λ, the regularization parameter. If λ is set too large, every θ_j (j ≥ 1) is driven toward 0 and the hypothesis degenerates to the flat line h(x) = θ0, which under-fits.


The regularized cost function is

J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1..n} θ_j² ]

Note the regularization sum runs from j = 1, so by convention θ0 is not penalized. Choosing λ well matters: too large and every θ_j is pushed toward 0, leaving an under-fitting hypothesis.
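A minimal Python sketch of this regularized cost (linear-regression version; the tiny dataset and λ values are made up):

```python
def regularized_cost(theta, X, y, lam):
    """J = (1/2m) * [ sum (h - y)^2 + lam * sum_{j>=1} theta_j^2 ]."""
    m = len(X)
    squared_err = sum(
        (sum(t * xj for t, xj in zip(theta, xi)) - yi) ** 2
        for xi, yi in zip(X, y)
    )
    reg = lam * sum(t ** 2 for t in theta[1:])  # theta[0] is not penalized
    return (squared_err + reg) / (2 * m)

# Toy data lying exactly on y = x (with bias feature x0 = 1).
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]
print(regularized_cost([0.0, 1.0], X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost([0.0, 1.0], X, y, lam=4.0))  # same fit, penalty added
```

The two calls show how the penalty term adds cost for nonzero θ_j even when the fit itself is perfect.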


7.3 Regularized Linear Regression

: 7 - 3 - Regularized Linear Regression (11 min).mkv

For regularized linear regression, gradient descent separates θ0 from the rest:

θ0 := θ0 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_0^(i)
θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i),  for j = 1, 2, …, n

so on each step every regularized θ_j is first shrunk slightly toward 0. The normal equation also has a regularized form: θ = (XᵀX + λL)⁻¹ Xᵀy, where L is the (n+1)×(n+1) identity-like matrix whose top-left entry (the θ0 position) is 0.


7.4 Regularized Logistic Regression

: 7 - 4 - Regularized Logistic Regression (9 min).mkv

For regularized logistic regression we likewise add the term (λ/2m) Σ_{j=1..n} θ_j² to J(θ), and either run gradient descent on the regularized J(θ) or hand the cost and its gradient to an advanced optimizer such as Octave's fminunc. The update rule looks the same as for regularized linear regression, with two notes:

1. Although the update looks identical, here h(x) = g(θᵀx), so it is a different algorithm from regularized linear regression.

2. θ0 is not regularized.

- 4 -(Neural Networks: Representation)

(Neural Networks: Representation)

8.1 Non-linear Hypotheses

: 8 - 1 - Non-linear Hypotheses (10 min).mkv

With logistic regression, complex non-linear boundaries require polynomial feature combinations. With 100 original features x1, …, x100, even just the pairwise products x1x2 + x1x3 + x1x4 + … + x2x3 + x2x4 + … + x99x100 amount to roughly 5000 (≈ 100²/2) new features — the feature space explodes. This matters especially in computer vision, where the features are pixel intensities (three channels for RGB). Even a small 50×50 grayscale image has 2500 pixels, and its pairwise quadratic features alone number about 2500²/2 ≈ 3 million; ordinary logistic regression cannot handle this, which is where neural networks come in.


8.2 Neurons and the Brain

: 8 - 2 - Neurons and the Brain (8 min).mkv

Neural networks were widely studied in the 1980s and early 90s, declined, and have resurged as computers became fast enough to train large networks. A motivating idea is the "one learning algorithm" hypothesis: rewiring experiments in which sensory signals are routed to a different cortical area — which then learns to process them — suggest the brain may use a single flexible learning mechanism rather than thousands of specialized programs. Sensory-substitution devices point the same way, such as BrainPort (which underwent FDA trials, letting blind users perceive images via a camera feeding signals to the tongue) and humans learning echolocation; talks on these experiments are easy to find, e.g. on YouTube.



8.3 Model Representation I

: 8 - 3 - Model Representation I (12 min).mkv

Each biological neuron can be viewed as a processing unit with a nucleus, many dendrites serving as inputs, and an axon serving as output; a neural network is a large number of such neurons wired together.


In our model, each neuron is a logistic (sigmoid) unit, also called an activation unit: it takes inputs, combines them through its parameters θ — in the neural-network literature called weights — and outputs an activation.


The network has three kinds of layers. The first layer holds the input units x1, x2, x3 (the input layer); the middle layer holds the activation units a1, a2, a3 (a hidden layer); the final layer produces the output h_θ(x) (the output layer). A bias unit (x0 = 1, a0 = 1) is conventionally added to each non-output layer.

Notation: a_i^(j) is the activation of unit i in layer j, and Θ^(j) is the matrix of weights mapping layer j to layer j+1. If layer j has s_j units and layer j+1 has s_{j+1} units, then Θ^(j) has dimension s_{j+1} × (s_j + 1) — the +1 accounting for the bias unit. In the network shown, Θ^(1) is a 3×4 matrix.


Computing the activations from the inputs, layer by layer from left to right, is called FORWARD PROPAGATION:

a1^(2) = g(Θ10^(1) x0 + Θ11^(1) x1 + Θ12^(1) x2 + Θ13^(1) x3)
a2^(2) = g(Θ20^(1) x0 + Θ21^(1) x1 + Θ22^(1) x2 + Θ23^(1) x3)
a3^(2) = g(Θ30^(1) x0 + Θ31^(1) x1 + Θ32^(1) x2 + Θ33^(1) x3)

Vectorized, with x including x0 = 1: z^(2) = Θ^(1) x and a^(2) = g(z^(2)).


8.4 Model Representation II

: 8 - 4 - Model Representation II (12 min).mkv

FORWARD PROPAGATION also explains why neural networks (Neural Networks) are powerful. Cover the input layer and look only at the last two layers: what remains is exactly logistic regression, except that the "features" fed into the output unit are the hidden-layer activations a0, a1, a2, a3 rather than the raw inputs x0, x1, x2, x3:

h_θ(x) = g(Θ0^(2) a0 + Θ1^(2) a1 + Θ2^(2) a2 + Θ3^(2) a3)

In other words, the hidden layer maps [x1 … x3] to [a1^(2) … a3^(2)], learning its own — potentially far more expressive — features a from x, instead of requiring us to hand-design polynomial features.


8.5 Examples and Intuitions I

: 8 - 5 - Examples and Intuitions I (7 min).mkv

Neural networks can compute complex non-linear functions of binary inputs x1, x2 ∈ {0, 1}. Consider a single sigmoid unit computing the logical AND of its two inputs. Choose the weights θ0 = −30, θ1 = 20, θ2 = 20, so the output is

h_θ(x) = g(−30 + 20x1 + 20x2)

Evaluating the sigmoid g: g(−30) ≈ 0, g(−10) ≈ 0, and g(10) ≈ 1. So h_θ(x) ≈ 1 only when x1 = x2 = 1: the unit computes x1 AND x2.
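The AND weights can be checked numerically with a small Python sketch of the single sigmoid unit:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def and_unit(x1, x2):
    """h(x) = g(-30 + 20*x1 + 20*x2): the lecture's AND weights."""
    return sigmoid(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(and_unit(x1, x2)))  # rounds to the AND truth table
```

Rounding the four outputs reproduces the truth table of AND exactly.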


The OR function works the same way with θ0 = −10, θ1 = 20, θ2 = 20: h_θ(x) = g(−10 + 20x1 + 20x2) is ≈ 1 whenever x1 = 1 or x2 = 1. OR differs from AND only in the weights.


8.6 Examples and Intuitions II

: 8 - 6 - Examples and Intuitions II (10 min).mkv

A single unit can implement the BINARY LOGICAL OPERATORS on 0/1 inputs:

weights (−30, 20, 20) give AND
weights (−10, 20, 20) give OR
weights (10, −20) give NOT

Now consider XNOR, which outputs 1 exactly when its two inputs are equal. Since XNOR = (x1 AND x2) OR ((NOT x1) AND (NOT x2)), build a hidden layer with one unit computing x1 AND x2 and a second computing (NOT x1) AND (NOT x2) (weights 10, −20, −20), then feed both into an OR unit in the output layer. Combining AND, (NOT x1) AND (NOT x2), and OR in this way yields a network computing XNOR — layer by layer, networks can build up progressively more complex functions.


8.7 Multiclass Classification

: 8 - 7 - Multiclass Classification (4 min).mkv

For multiclass classification with a neural network (say, recognizing which of 4 object categories an image x contains), the output layer has one unit per class, each producing a value in (0, 1). The target y is then represented not as an integer y = 1, 2, 3, 4 but as a 4-dimensional vector [a b c d]ᵀ in which exactly one of a, b, c, d is 1, indicating the true class.

- 5 -(Neural Networks: Learning)

(Neural Networks: Learning)

9.1 Cost Function

: 9 - 1 - Cost Function (7 min).mkv

Notation: the training set has m examples (x, y); L is the number of layers; S_l is the number of neurons (units) in layer l, excluding the bias unit; S_L is the number of output units. For binary classification, S_L = 1 and y = 0 or 1. For K-class classification with K > 2, S_L = K and y is a K-vector with y_i = 1 at the true class.

In logistic regression the output was a single scalar y; here h_θ(x) is a K-dimensional vector, and the cost function generalizes accordingly:

J(Θ) = −(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log((h_Θ(x^(i)))_k) + (1 − y_k^(i)) log(1 − (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1..L−1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} (Θ_{ji}^{(l)})²

The inner sum over k adds up the logistic cost of each of the K outputs against its label; the regularization term sums the squares of every weight in the network except those multiplying the bias units.


9.2 Backpropagation Algorithm

: 9 - 2 - Backpropagation Algorithm (12 min).mkv

To minimize J(Θ) we need its partial derivatives ∂J(Θ)/∂Θ_{ij}^{(l)}; back-propagation is the algorithm that computes them. Take a single training example (x^(1), y^(1)) and a four-layer network with K = 4, S_L = 4, L = 4. First forward-propagate to compute the activations of every layer. Then, working backwards from the output layer, compute an "error" term δ for each layer; for the output layer,

δ_k^(4) = a_k^(4) − y_k,  for k = 1..K



For the hidden layers the errors propagate backwards through the weights, scaled by the derivative of the sigmoid: δ^(3) = (Θ^(3))ᵀ δ^(4) .* g′(z^(3)), where g′(z^(3)) = a^(3) .* (1 − a^(3)); similarly for δ^(2). There is no δ^(1), since the input layer carries no error. Ignoring regularization (λ = 0), the partial derivatives are

∂J(Θ)/∂Θ_{ij}^{(l)} = a_j^{(l)} δ_i^{(l+1)}

where l is the layer, j indexes the unit in layer l whose activation feeds the weight, and i indexes the unit in layer l+1 whose error it affects. Over a full training set, these products are accumulated for all examples and the regularization term is added.


As before, we can hand J(Θ) and its gradient to an advanced optimizer such as Octave's fminunc — but these routines expect the parameters as a single vector, so the weight matrices must be unrolled into one vector and reshaped back when needed. For example, with Theta1 and Theta2 of size 10×11 and Theta3 of size 1×11:

thetaVec = [Theta1(:) ; Theta2(:) ; Theta3(:)];

...optimization using functions like fminunc...

Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
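The same unroll/reshape round trip can be sketched in pure Python, with lists of lists standing in for Octave matrices (note that Octave's Theta(:) stacks columns, i.e. is column-major, which the sketch mirrors):

```python
def unroll(mats):
    """Column-major flatten (like Octave's [Theta1(:); Theta2(:)])."""
    vec = []
    for m in mats:
        rows, cols = len(m), len(m[0])
        for c in range(cols):
            for r in range(rows):
                vec.append(m[r][c])
    return vec

def reshape(vec, rows, cols):
    """Column-major reshape (like Octave's reshape(vec, rows, cols))."""
    return [[vec[c * rows + r] for c in range(cols)] for r in range(rows)]

Theta1 = [[1, 2], [3, 4]]   # 2x2 toy weight matrix
Theta2 = [[5, 6]]           # 1x2 toy weight matrix
thetaVec = unroll([Theta1, Theta2])
print(thetaVec)             # column-major order: [1, 3, 2, 4, 5, 6]
assert reshape(thetaVec[0:4], 2, 2) == Theta1
assert reshape(thetaVec[4:6], 1, 2) == Theta2
```

The slice boundaries (0:4, 4:6 here; 1:110, 111:220, 221:231 in the Octave example) must match each matrix's element count exactly.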


9.3 Backpropagation Intuition

: 9 - 3 - Backpropagation Intuition (13 min).mkv




9.4 Implementation Note: Unrolling Parameters

: 9 - 4 - Implementation Note_ Unrolling Parameters (8 min).mkv


9.5 Gradient Checking

: 9 - 5 - Gradient Checking (12 min).mkv

Back-propagation is easy to get subtly wrong, so we verify it with Numerical Gradient Checking: approximate each partial derivative by a two-sided difference, perturbing θ by a small ε (on the order of 0.001 or smaller) in each direction and taking the slope between the two points. In Octave:

gradApprox = (J(theta + eps) - J(theta - eps)) / (2*eps)

Compare this estimate against the gradient computed by back-propagation; once they agree, disable the check — it is far too slow to leave on during training.
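A concrete Python sketch of the two-sided check, applied to a toy cost whose true derivative we know:

```python
def J(theta):
    return theta ** 3  # toy cost; the analytic derivative is 3*theta^2

def grad_approx(J, theta, eps=1e-4):
    """Two-sided numerical derivative: (J(theta+eps) - J(theta-eps)) / (2*eps)."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

theta = 2.0
numeric = grad_approx(J, theta)
analytic = 3 * theta ** 2
print(numeric, analytic)  # the two values agree to roughly eps^2
```

In a real network the same check is applied to each component of the unrolled parameter vector, comparing against back-propagation's output.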



9.6 Random Initialization

: 9 - 6 - Random Initialization (7 min).mkv

Initializing all parameters to zero (or to any identical value) makes every hidden unit compute the same function — the symmetry is never broken. Instead, initialize each parameter to a random value in [−ε, ε]. For a 10×11 weight matrix in Octave:

Theta1 = rand(10, 11) * (2*eps) - eps
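The Python equivalent (INIT_EPSILON is a small constant you choose; 0.12 here is an arbitrary illustrative value):

```python
import random

INIT_EPSILON = 0.12

def rand_init(rows, cols, eps=INIT_EPSILON):
    """Weights uniform in [-eps, eps], breaking the symmetry of all-zero init."""
    return [[random.uniform(0, 1) * 2 * eps - eps for _ in range(cols)]
            for _ in range(rows)]

Theta1 = rand_init(10, 11)
assert all(-INIT_EPSILON <= w <= INIT_EPSILON for row in Theta1 for w in row)
```

Each weight matrix in the network gets its own independently drawn random values.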


9.7 Putting It Together

: 9 - 7 - Putting It Together (14 min).mkv

Putting it together — to train a neural network:

1. Pick a network architecture (number of layers and hidden units) and randomly initialize the weights.
2. Forward-propagate to compute h_Θ(x) for each example.
3. Compute the cost J(Θ).
4. Back-propagate to compute the partial derivatives of J(Θ).
5. Use numerical gradient checking to verify the derivatives, then disable the check.
6. Minimize J(Θ) with gradient descent or an advanced optimizer.


9.8 Autonomous Driving

: 9 - 8 - Autonomous Driving (7 min).mkv

As an example of what neural networks can do, consider Dean Pomerleau's ALVINN (Autonomous Land Vehicle In a Neural Network), which learned to steer the NavLab vehicle. A low-resolution 30×32 image of the road ahead is fed into a neural network trained to imitate the steering directions chosen by a human driver; after a few minutes of watching a person drive, ALVINN can steer the vehicle on its own, and separate networks can be trained for different road types, switching between them as the road changes.

- 6 -(Advice for Applying Machine Learning)

(Advice for Applying Machine Learning)

10.1 Deciding What to Try Next

: 10 - 1 - Deciding What to Try Next (6 min).mkv

Suppose you have trained regularized linear regression to predict housing prices from features x1, x2, x3, …, but it makes unacceptably large errors on new data. Options to try next:

1. Get more training examples
2. Try a smaller set of features
3. Try getting additional features
4. Try adding polynomial features (x1², x2², x1x2, …)
5. Decrease the regularization parameter λ (lambda)
6. Increase λ

Rather than picking among these at random — any one can cost months of effort — run machine learning "diagnostics": tests that reveal what is and is not working, and hence which of the options above is actually worth pursuing.



10.2 Evaluating a Hypothesis

: 10 - 2 - Evaluating a Hypothesis (8 min).mkv

To evaluate a hypothesis h_θ(x), split the data into a training set (typically 70%) and a test set (30%), splitting at random if the data is ordered. Learn θ on the training set, then measure performance on the test set:

1. For linear regression, compute the test-set cost J_test(θ).
2. For logistic regression, compute J_test(θ), or — more intuitively — the misclassification (0/1) error rate on the test set.


10.3 Model Selection and Train/Validation/Test Sets

: 10 - 3 - Model Selection and Train_Validation_Test Sets (12 min).mkv

For model selection — say choosing the polynomial degree among 10 candidate models — split the data three ways: training set 60%, cross-validation set 20%, test set 20%. Then:

1. Train each of the 10 models on the training set.
2. Evaluate each of the 10 models on the cross-validation set.
3. Pick the model with the lowest cross-validation error.
4. Report the generalization error of the chosen model (step 3) on the test set.



10.4 Diagnosing Bias vs. Variance

: 10 - 4 - Diagnosing Bias vs. Variance (8 min).mkv

162
- 6 -(Advice for Applying Machine Learning)

Plot training error and cross-validation error against the polynomial degree d. For small d both errors are high and close together — high bias (under-fitting). As d grows, training error keeps falling while cross-validation error first falls and then rises; a low training error with a much higher cross-validation error indicates high variance (over-fitting).


10.5 Regularization and Bias/Variance

: 10 - 5 - Regularization and Bias_Variance (11 min).mkv

To choose λ, try a range of values from 0 up to about 10, doubling at each step — e.g. 0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10, giving 12 candidates. Then:

1. Train the 12 models, one per λ.
2. Evaluate each on the cross-validation set.
3. Pick the λ with the lowest cross-validation error.
4. Report the error of the model chosen in step 3 on the test set.



10.6 Learning Curves

: 10 - 6 - Learning Curves (12 min).mkv

Learning curves are a good sanity check: plot training error and cross-validation error as functions of the training-set size m — e.g. train on 1 example, then 2, and so on up to 100. With high bias, both curves flatten out at a high error, and gathering more data will not help. With high variance, there is a large gap between a low training error and a high cross-validation error, and more data is likely to help.



10.7 Deciding What to Do Next (Revisited)

: 10 - 7 - Deciding What to Do Next Revisited (7 min).mkv

Revisiting the six options of section 10.1, each addresses a specific diagnosis:

1. Get more training examples — fixes high variance
2. Try a smaller set of features — fixes high variance
3. Try getting additional features — fixes high bias
4. Try adding polynomial features — fixes high bias
5. Decrease λ — fixes high bias
6. Increase λ — fixes high variance

For neural networks: small networks are computationally cheap but prone to under-fitting; large networks are prone to over-fitting, which is usually better addressed with regularization (choosing λ) than by shrinking the network. Compare architectures by cross-validation error.



- 6 -(Machine Learning System Design)

(Machine Learning System Design)

11.1 Prioritizing What to Work On

: 11 - 1 - Prioritizing What to Work On (10 min).mkv

Example: building a spam classifier. Represent each email by a feature vector x over, say, the 100 words most indicative of spam versus non-spam, with x_j = 1 if word j appears and 0 otherwise, and label y = 1 for spam. Ways you might try to lower the error:

1. Collect more data — e.g. "Honey Pot" projects that create fake email addresses to harvest spam
2. Develop more sophisticated features from the email routing information (headers)
3. Develop features for the message body — e.g. should discount and discounts be treated as the same word? how should punctuation be handled?
4. Develop algorithms to detect deliberate misspellings (watch vs w4tch)

None of these is guaranteed to help; the next sections show how to decide where to spend your time.


11.2 Error Analysis

: 11 - 2 - Error Analysis (13 min).mkv

The recommended process for building a learning system:

1. Start with a simple algorithm you can implement quickly — implement it and test it on your cross-validation data.
2. Plot learning curves to decide whether more data, more features, and so on are likely to help.
3. Error analysis (error analysis): manually examine the cross-validation examples your algorithm misclassified, and look for systematic patterns among them — which categories of email are missed, and which features would have helped.

Having a single numerical evaluation metric matters here: for example, to decide whether treating discount/discounts/discounted/discounting as one word (stemming) helps, run the classifier with and without it and compare one error number.



11.3 Error Metrics for Skewed Classes

: 11 - 3 - Error Metrics for Skewed Classes (12 min).mkv

Skewed classes (skewed classes): one class is far rarer than the other. Example: cancer prediction where only 0.5% of patients have cancer. A trivial non-learning "algorithm" that always predicts y = 0 achieves 0.5% error, beating a real classifier with 1% error — so raw accuracy is misleading here. Instead use precision (Precision) and recall (Recall), defined from the four prediction outcomes:

1. True positive (TP): predicted 1, actually 1
2. True negative (TN): predicted 0, actually 0
3. False positive (FP): predicted 1, actually 0
4. False negative (FN): predicted 0, actually 1

Precision = TP/(TP+FP): of all patients we predicted to have cancer, the fraction that actually do.

Recall = TP/(TP+FN): of all patients who actually have cancer, the fraction we correctly predicted.
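These metrics in Python (the counts below are made up for illustration):

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of predicted positives, fraction truly positive
    recall = tp / (tp + fn)     # of actual positives, fraction we caught
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (used in the next section)."""
    return 2 * precision * recall / (precision + recall)

p, r = precision_recall(tp=80, fp=20, fn=120)
print(p, r, f1_score(p, r))
```

Note how an "always predict y = 0" classifier would have tp = 0 and hence recall 0, exposing the trick that raw accuracy hides.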


11.4 Trading Off Precision and Recall

: 11 - 4 - Trading Off Precision and Recall (14 min).mkv

Logistic regression outputs a value in [0, 1]; we predict 1 when h_θ(x) exceeds a threshold, 0.5 by default, with

Precision = TP/(TP+FP)
Recall = TP/(TP+FN)

Raising the threshold (to 0.7 or 0.9) means predicting cancer only when very confident: higher precision, lower recall. Lowering it (to 0.3) avoids missing true cases: higher recall, lower precision. To compare thresholds (or algorithms) with a single number, use the F1 score (F1 Score), the harmonic mean

F1 = 2 · P·R / (P + R)

and choose the threshold that maximizes F1 on the cross-validation set.


11.5 Data For Machine Learning

: 11 - 5 - Data For Machine Learning (11 min).mkv

Michele Banko and Eric Brill (2001) compared learning algorithms on disambiguating confusable words — e.g. filling "For breakfast I ate __ eggs" with one of (to, two, too) — and found that quite different algorithms, including a perceptron (perceptron), Winnow, memory-based learning, and naive Bayes, all improve steadily as the training set grows from 0.1 million to 1000 million words, with the algorithm ranking mattering less and less. Hence the saying: "It's not who has the best algorithm that wins, it's who has the most data."

But massive data only helps under two conditions. First, the features x must carry enough information to predict y — a useful test is whether a human expert could confidently predict y from x alone. The surrounding words suffice for a person to choose "two" in the breakfast sentence; by contrast, a house's price cannot be predicted from its size alone, so more (size, price) data would not help much. Second, the algorithm should have many parameters (low bias) so that, trained on a very large set, it is unlikely to over-fit — giving low bias and low variance together.

- 7 -(Support Vector Machines)

(Support Vector Machines)

12.1 Optimization Objective

: 12 - 1 - Optimization Objective (15 min).mkv

What often matters more than the choice between algorithm A and algorithm B is the amount of data and your skill in applying the algorithm. Still, one more supervised learning algorithm is widely used in industry and academia: the support vector machine (Support Vector Machine), or SVM, which this video derives starting from logistic regression.


Recall logistic regression: h_θ(x) = 1/(1+e^(−z)) with z = θᵀx. If y = 1, we want h_θ(x) ≈ 1, which requires z = θᵀx ≫ 0; if y = 0, we want h_θ(x) ≈ 0, i.e. z ≪ 0. Each training example (x, y) contributes (ignoring the 1/m factor) the cost

−y · log(1/(1+e^(−θᵀx))) − (1−y) · log(1 − 1/(1+e^(−θᵀx)))

For y = 1 only the first term, −log(1/(1+e^(−z))), is active: it is near 0 for large z and grows as z decreases. The SVM replaces this curve with a piecewise-linear approximation, cost₁(z): exactly zero for z ≥ 1, rising along a straight line as z falls below 1. Symmetrically, for y = 0 the term −log(1 − 1/(1+e^(−z))) is replaced by cost₀(z): exactly zero for z ≤ −1, rising linearly as z increases above −1. These flat-then-linear hinge costs give the SVM computational advantages and lead to the large-margin behavior discussed in the next video.

Two conventional changes from logistic regression. First, drop the 1/m factor: minimizing a function and minimizing m times that function yield the same θ — just as minimizing (u−5)²+1 and minimizing 10(u−5)²+10 both give u = 5. Second, instead of writing the objective as A + λB (data term A plus λ times regularization term B), the SVM convention writes it as C·A + B; setting C = 1/λ gives the same optimal θ (a large C plays the role of a small λ, and vice versa). The SVM optimization objective is therefore

min_θ C Σ_{i=1..m} [ y^(i) cost₁(θᵀx^(i)) + (1 − y^(i)) cost₀(θᵀx^(i)) ] + (1/2) Σ_{j=1..n} θ_j²

Finally, unlike logistic regression, the SVM does not output a probability: having learned θ, it directly predicts y = 1 when θᵀx ≥ 0 and y = 0 otherwise. That is the mathematical definition of the support vector machine.


12.2 Large Margin Intuition

: 12 - 2 - Large Margin Intuition (11 min).mkv

The SVM is sometimes called a large margin classifier. Look again at the two hinge costs: cost₁(z) is zero only when z = θᵀx ≥ 1 (not merely ≥ 0), and cost₀(z) is zero only when θᵀx ≤ −1. So the SVM demands an extra margin of safety beyond barely classifying correctly: θᵀx ≥ 1 for positive examples and θᵀx ≤ −1 for negative ones, whereas classification alone would only need θᵀx > 0 or θᵀx < 0.

Now suppose C is very large — say 100,000. The optimizer is then strongly pushed to make the first term exactly 0, i.e. to satisfy θᵀx^(i) ≥ 1 whenever y^(i) = 1 and θᵀx^(i) ≤ −1 whenever y^(i) = 0, and subject to those constraints to minimize the remaining term (1/2) Σ_j θ_j².



Solving this constrained problem yields a decision boundary that separates the classes with the largest possible distance to the nearest examples — that distance is the margin (margin), hence "large margin classifier". With a very large C (as in the C = 100,000 case), a single outlier (outlier) can swing the boundary dramatically; since C acts like 1/λ, choosing C not too large lets the SVM ignore a few outliers — and even handle data that is not linearly separable — while still producing a sensible large-margin boundary.



12.3 Mathematics Behind Large Margin Classification (Optional)

: 12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20

min).mkv

First, some facts about vector inner products. For u = [u1; u2] and v = [v1; v2], the inner product is uᵀv = u1v1 + u2v2 = vᵀu. The norm ‖u‖ = √(u1² + u2²) is u's Euclidean length. Geometrically, uᵀv = p · ‖u‖, where p is the signed projection of v onto u: drop a perpendicular from v onto the line through u; p is the length of the resulting segment. p is positive when the angle between u and v is less than 90°, zero at exactly 90°, and negative when the angle exceeds 90°.


Now apply this to the SVM. Simplify to θ0 = 0 and n = 2 features, so the objective is

min_θ (1/2)(θ1² + θ2²) = (1/2) ‖θ‖²

i.e. minimize the squared norm of θ. By the projection picture, each constraint quantity can be rewritten as

θᵀx^(i) = p^(i) · ‖θ‖ = θ1 x1^(i) + θ2 x2^(i)

where p^(i) is the projection of the example x^(i) onto the vector θ. The constraints θᵀx^(i) ≥ 1 (for y^(i) = 1) and θᵀx^(i) ≤ −1 (for y^(i) = 0) thus become p^(i)·‖θ‖ ≥ 1 and p^(i)·‖θ‖ ≤ −1.


Why does this produce a large margin? The parameter vector θ is perpendicular (at 90°) to the decision boundary, and with θ0 = 0 the boundary passes through the origin (0,0). Suppose the boundary sits close to the training examples: then the projections p^(1), p^(2), … of the examples onto θ are small, and satisfying p^(i)·‖θ‖ ≥ 1 (or ≤ −1 for negatives) forces ‖θ‖ to be large — exactly what the objective (1/2)‖θ‖² penalizes. A boundary far from all the examples makes the |p^(i)| large, allowing ‖θ‖ to be small. So minimizing ‖θ‖ subject to the constraints drives the SVM toward the boundary whose projections p^(i) — the margin — are as large as possible. The same conclusion holds when θ0 ≠ 0, where the boundary need not pass through the origin; and as before, the strict margin-maximizing behavior corresponds to a very large C.


12.4 Kernels I

: 12 - 4 - Kernels I (16 min).mkv

To build non-linear classifiers of the form h(x) = θ0 + θ1f1 + θ2f2 + … + θnfn, instead of defining the features f1, f2, f3, … as polynomial terms, we can define them using landmarks (landmarks) l^(1), l^(2), l^(3): each feature measures the similarity between x and one landmark, e.g. with the Gaussian kernel

f1 = similarity(x, l^(1)) = exp( −‖x − l^(1)‖² / (2σ²) )

If x ≈ l^(1), then f1 ≈ e⁰ = 1; if x is far from l^(1), then f1 ≈ e^(−large) ≈ 0. So each feature is a closeness score to a landmark, and σ² controls how quickly the similarity falls off with distance (larger σ² means a slower, smoother fall-off).

Now suppose for a 2-feature input [x1 x2] we have landmarks l^(1), l^(2), l^(3) and parameters with θ1, θ2 positive and θ3 ≈ 0. A point near l^(1) gives f1 ≈ 1 and f2, f3 ≈ 0, so h(x) = θ0 + θ1f1 + θ2f2 + θ3f3 > 0 and we predict y = 1; likewise near l^(2). Points far from all landmarks are predicted y = 0. The boundary enclosing the region near l^(1) and l^(2) is a non-linear decision boundary obtained from the features f1, f2, f3.
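The Gaussian similarity itself is a one-liner; here is a Python sketch (the landmark and σ are chosen arbitrarily):

```python
import math

def gaussian_kernel(x, l, sigma):
    """f = exp(-||x - l||^2 / (2*sigma^2)): near 1 close to the landmark, near 0 far away."""
    sq_dist = sum((xj - lj) ** 2 for xj, lj in zip(x, l))
    return math.exp(-sq_dist / (2 * sigma ** 2))

l1 = [3.0, 5.0]
print(gaussian_kernel([3.0, 5.0], l1, sigma=1.0))  # x == landmark -> 1.0
print(gaussian_kernel([9.0, 0.0], l1, sigma=1.0))  # far from landmark -> ~0
```

Increasing sigma makes the second value decay more slowly with distance, which is exactly the bias/variance knob discussed later.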


12.5 Kernels II

: 12 - 5 - Kernels II (16 min).mkv

In practice the landmarks are the training examples themselves: given m examples, set l^(1) = x^(1), l^(2) = x^(2), …, l^(m) = x^(m), so each input x is represented by an m-vector f of similarities to every training example. We predict y = 1 when θᵀf ≥ 0, and train by minimizing the SVM objective with θᵀx replaced by θᵀf. Implementations additionally replace the regularization term θᵀθ by θᵀMθ for a kernel-dependent matrix M — a computational trick that lets them scale to large m. These details are why you should use a mature package (e.g. liblinear, libsvm) rather than write your own solver. Parameter effects: large C (i.e. small λ) → lower bias, higher variance; small C → higher bias, lower variance; large σ² → smoother features, higher bias; small σ² → lower bias, higher variance. An SVM used without a kernel is called a linear kernel (linear kernel) SVM.


12.6 Using An SVM

: 12 - 6 - Using An SVM (21 min).mkv

Practical advice for using an SVM: use a library (liblinear, libsvm) rather than implementing the optimization yourself — you still must choose the parameter C and the kernel. With many features and few examples, a linear kernel (no kernel) avoids over-fitting; otherwise the Gaussian kernel is the common choice (choose σ², and do perform feature scaling first). Other kernels exist — polynomial kernel (Polynomial Kernel), string kernel, chi-square kernel, histogram intersection kernel, and so on — but are rarely needed, and any similarity function used as a kernel must satisfy Mercer's theorem for the package's optimizations to apply. For multiclass problems, most packages have built-in multiclass support; otherwise use one-vs-all with K SVMs.

Choosing between logistic regression and SVMs, with n features and m examples:

(1) n large relative to m (e.g. text classification: n = 10,000, m = 10–1000): use logistic regression or a linear-kernel SVM.
(2) n small, m intermediate (n = 1–1000, m = 10–10,000): use an SVM with a Gaussian kernel.
(3) n small, m very large (n = 1–1000, m = 50,000+): a Gaussian-kernel SVM is slow; create or add more features, then use logistic regression or a linear-kernel SVM.

A well-designed neural network is likely to work well in all of these regimes, but may be slower to train. One notable advantage of the SVM is that its optimization problem is convex, so good packages find the global minimum without worrying about local optima. And in practice, which algorithm you pick often matters less than the amount of data you have and your skill at error analysis, feature design, and debugging.

- 8 -(Clustering)

(Clustering)

13.1 Unsupervised Learning: Introduction

: 13 - 1 - Unsupervised Learning_ Introduction (3 min).mkv

In unsupervised learning we are given unlabeled data x^(1), x^(2), …, x^(m) with no labels y, and ask the algorithm to find structure in it — for instance groups of similar points, found by clustering algorithms. Applications include market segmentation, social network analysis (e.g. friend groups on Facebook or Google+), organizing compute clusters, and analyzing astronomical data.



13.2 K-Means Algorithm

: 13 - 2 - K-Means Algorithm (13 min).mkv

K-means is the most popular clustering algorithm. Given an unlabeled dataset of n-dimensional points and a desired number of clusters K, it proceeds by:

1. Randomly choosing K points as the initial cluster centroids (cluster centroids).
2. Cluster assignment: assigning every example to the closest centroid.
3. Move centroids: moving each centroid to the mean of the points assigned to it.
4. Repeating steps 2–3 until convergence.


Notation: μ1, μ2, …, μK denote the K cluster centroids, and c^(1), c^(2), …, c^(m) the index of the cluster each example is currently assigned to. The K-means algorithm:

Repeat {

for i = 1 to m

c(i) := index (from 1 to K) of cluster centroid closest to x(i)

for k = 1 to K

μk := average (mean) of points assigned to cluster k

}

The first for-loop is the cluster-assignment step and the second the move-centroid step (a centroid with no points assigned can be removed or re-initialized). K-means is also useful when the data has no obviously separated clusters — e.g. segmenting customers by height and weight to design T-shirt sizes.
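The two alternating steps can be sketched in Python on toy one-dimensional data (a real implementation would be vectorized and n-dimensional):

```python
def kmeans(points, centroids, iters=10):
    """Alternate cluster assignment and centroid moves on 1-D data."""
    assign = []
    for _ in range(iters):
        # cluster assignment: index of the closest centroid for each point
        assign = [min(range(len(centroids)),
                      key=lambda k: (p - centroids[k]) ** 2) for p in points]
        # move centroids: mean of the points assigned to each cluster
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if its cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, assign

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids, assign = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # converges to roughly [1.0, 8.0]
```

On this toy set the centroids settle on the means of the two obvious groups after a single pass.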


13.3 Optimization Objective

: 13 - 3 - Optimization Objective (7 min).mkv

K-means is in fact minimizing an optimization objective, the distortion function (Distortion function):

J(c^(1), …, c^(m), μ1, …, μK) = (1/m) Σ_{i=1..m} ‖x^(i) − μ_{c^(i)}‖²

the average squared distance of each example x^(i) from the centroid μ_{c^(i)} of the cluster it is assigned to. The cluster-assignment step minimizes J over the c^(i) with the centroids held fixed, and the move-centroid step minimizes J over the μk with the assignments held fixed — so J never increases across iterations.


13.4 Random Initialization

: 13 - 4 - Random Initialization (8 min).mkv

K-means can converge to a local optimum of the distortion depending on how it is initialized. Recommended random initialization:

1. Ensure K < m.
2. Pick K distinct training examples at random and set the K centroids equal to them.

To guard against bad local optima, run K-means many times with different random initializations and keep the run with the lowest distortion J. This helps most when K is small (roughly 2–10); for large K, additional runs rarely change the result much.


13.5 Choosing the Number of Clusters

: 13 - 5 - Choosing the Number of Clusters (8 min).mkv

There is no fully automatic way to choose the number of clusters K; most often it is picked by hand from a visualization of the data. The "elbow method" plots the distortion J against K and looks for an elbow where the curve turns from falling steeply to falling slowly, choosing that K — but often the curve is smooth and shows no clear elbow. A more reliable guide is the downstream purpose: for T-shirt sizing, compare K = 3 (sizes S, M, L) against K = 5 (XS, S, M, L, XL) by how well each choice serves the business.

- 8 -(Dimensionality Reduction)

(Dimensionality Reduction)

14.1 Motivation I: Data Compression

: 14 - 1 - Motivation I_ Data Compression (10 min).mkv

The first motivation for dimensionality reduction is data compression. If two features are redundant — say x1 is a length in centimetres and x2 the same length in inches — the data lies (up to rounding) along a line, and we can reduce from 2D to 1D by projecting onto that line. Similarly, 3D data lying near a plane can be reduced to 2D by projecting onto the plane. The same idea scales up: reducing, say, 1000-dimensional data to 100 dimensions saves memory and disk space and speeds up learning algorithms.


14.2 Motivation II: Visualization

: 14 - 2 - Motivation II_ Visualization (6 min).mkv

The second motivation is visualization. A dataset of 50 countries with many features each (GDP, per-capita GDP, life expectancy, and so on) cannot be plotted directly; reducing it to 2 dimensions lets us plot the 50 countries and see structure — though the new axes usually have no direct physical interpretation.


14.3 Principal Component Analysis Problem Formulation

: 14 - 3 - Principal Component Analysis Problem Formulation (9 min). mkv

Principal component analysis (PCA) is the most common dimensionality-reduction algorithm. PCA finds a direction (a vector, Vector direction) onto which to project the data so that the projected error (Projected Error) — the mean squared distance between the points and their projections — is minimized. More generally, to reduce from n dimensions to k, PCA finds k vectors u^(1), u^(2), …, u^(k) and projects the data onto the subspace they span, minimizing the projection error. PCA is not linear regression: regression minimizes vertical errors in order to predict y, whereas PCA minimizes perpendicular projection errors, treats all features symmetrically, and involves no y at all. A benefit of PCA is that the reduced features are chosen by the algorithm rather than designed by hand.


14.4 Principal Component Analysis Algorithm

: 14 - 4 - Principal Component Analysis Algorithm (15 min).mkv

The PCA algorithm, reducing from n dimensions to k:

1. Mean normalization (and optionally feature scaling): compute the mean μj of each feature and replace xj with xj − μj.
2. Compute the covariance matrix (covariance matrix) Σ = (1/m) Σ_{i=1..m} (x^(i))(x^(i))ᵀ.
3. Compute the eigenvectors (eigenvectors) of Σ; in Octave this is done with singular value decomposition (singular value decomposition):

[U, S, V] = svd(Sigma)

U is an n×n matrix whose columns are the directions we want. Take its first K columns to form the n×k matrix Ureduce, and compute the reduced representation

z^(i) = Ureduceᵀ x^(i)

which maps each x (n×1) down to z (k×1).


14.5 Choosing The Number Of Principal Components

: 14 - 5 - Choosing The Number Of Principal Components (13 min).mkv

How do we choose k, the number of principal components? Choose the smallest k for which the average squared projection error divided by the total variance of the data is at most 1% — that is, "99% of the variance is retained" (95% is also a common target). One could try K = 1 (computing Ureduce and z, checking whether the 1% condition holds, then K = 2, and so on), but the S matrix from

[U, S, V] = svd(sigma)

makes this cheap: S is an n×n diagonal matrix (zeros off the diagonal), and the variance retained for a given k is

(Σ_{i=1..k} S_ii) / (Σ_{i=1..n} S_ii)

so we simply increase k until this ratio reaches 0.99.


14.6 Reconstruction from Compressed Representation

: 14 - 6 - Reconstruction from Compressed Representation (4 min).mkv

PCA compresses, say, 1000-dimensional examples x^(1), x^(2), … to 100-dimensional z^(1), z^(2), …; can we go back? Since z = Ureduceᵀ x, the approximate inverse mapping is

x_approx = Ureduce · z

which is n×1 and satisfies x_approx ≈ x: the reconstructed points lie exactly on the projection line (or subspace), close to the originals. So PCA provides both a compression map from x to z and a reconstruction map from z back to an approximation of x.


14.7 Advice for Applying PCA

: 14 - 7 - Advice for Applying PCA (13 min).mkv

Advice on applying PCA. A common use is speeding up supervised learning: with 100×100-pixel images, x has 10,000 features; so

1. Extract the inputs x^(1), …, x^(m) from the training set and run PCA to get a lower-dimensional z^(1), …, z^(m) (say 1000 dimensions).
2. Train on the (z, y) pairs.
3. For new examples, apply the same Ureduce to map x to z before predicting.

Note that Ureduce must be learned on the training set only, then reused for the cross-validation and test sets. Do not use PCA to prevent over-fitting — regularization does that better, because PCA discards information without ever looking at the labels y. And do not insert PCA into a project by default: first try the learning algorithm on the original data, and add PCA only if that proves too slow or memory-hungry.

- 9 -(Anomaly Detection)

(Anomaly Detection)

15.1 Problem Motivation

: 15 - 1 - Problem Motivation (8 min).mkv

(Anomaly detection)

QA

()

x(1) x(m) m

xtest
227
- 9 -(Anomaly Detection)

x(1),x(2),..,x(m) xtest

p(x)

X(i) = i

p(x) =

p(x)<

CPU


15.2 Gaussian Distribution

: 15 - 2 - Gaussian Distribution (10 min).mkv

The Gaussian (normal) distribution: x ~ N(μ, σ²) has density

p(x; μ, σ²) = (1/(√(2π) σ)) exp( −(x − μ)² / (2σ²) )

Given a dataset, estimate μ = (1/m) Σ x^(i) and σ² = (1/m) Σ (x^(i) − μ)². Statistics texts usually divide the variance by m−1 rather than m, but in machine learning the 1/m convention is customary; for large m the difference is negligible.


15.3 Algorithm

: 15 - 3 - Algorithm (12 min).mkv

The density-estimation algorithm: given the training set x^(1), x^(2), …, x^(m), fit μj and σj² for each feature j, and model

p(x) = Π_{j=1..n} p(xj; μj, σj²)

treating the features as independent Gaussians. Flag x as an anomaly when p(x) < ε, and as normal when p(x) ≥ ε. The level set p(x) = ε traces contours in feature space; points falling in the low-density region outside them are the anomalies.
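The per-feature Gaussian model can be sketched in Python (the tiny dataset and the threshold ε are made up):

```python
import math

def fit_gaussian(values):
    """Estimate mu and sigma^2 with the 1/m convention."""
    m = len(values)
    mu = sum(values) / m
    sigma2 = sum((v - mu) ** 2 for v in values) / m
    return mu, sigma2

def p(x, params):
    """p(x) = product over features of the univariate Gaussian densities."""
    prob = 1.0
    for xj, (mu, s2) in zip(x, params):
        prob *= math.exp(-(xj - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return prob

# Five "normal engines" with two features each.
train = [[4.9, 2.1], [5.1, 1.9], [5.0, 2.0], [4.8, 2.2], [5.2, 1.8]]
params = [fit_gaussian([row[j] for row in train]) for j in range(2)]

eps = 1e-3
print(p([5.0, 2.0], params) > eps)  # True: a typical point is not flagged
print(p([9.0, 0.1], params) > eps)  # False: a far-out point has p < eps
```

The threshold ε would in practice be chosen on a cross-validation set, as described in the next section.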


15.4 Developing and Evaluating an Anomaly Detection System

: 15 - 4 - Developing and Evaluating an Anomaly Detection System (13

min). mkv

Although the model is fit to unlabeled data, evaluating it requires some labeled anomalies. Example: 10,000 normal engines and 20 known-anomalous ones. A typical split: a training set of 6,000 normal engines (p(x) is fit on these only); a cross-validation set of 2,000 normal plus 10 anomalous; a test set of 2,000 normal plus 10 anomalous. Then:

1. Fit p(x) on the training set.
2. On the cross-validation set, predict "anomaly" when p(x) < ε; because the classes are highly skewed, evaluate with precision/recall or the F1 score, and choose the ε that maximizes F1.
3. Report the final F1 (or precision and recall) on the test set.


15.5 Anomaly Detection vs. Supervised Learning

: 15 - 5 - Anomaly Detection vs. Supervised Learning (8 min).mkv

Since we do use some labeled data (y = 1 anomalous, y = 0 normal), when should we prefer anomaly detection over supervised learning?

Use anomaly detection when: 1. there are very few positive (y = 1) examples and many negative ones; 2. there are many different "types" of anomaly, so no algorithm can learn what positives look like from so few of them; 3. future anomalies may look nothing like any seen so far.

Use supervised learning when: 1. there are reasonably many positive and negative examples; 2. the positive examples are similar enough that an algorithm can learn their appearance; 3. future positives are likely to resemble those in the training set.


15.6 Choosing What Features to Use

: 15 - 6 - Choosing What Features to Use (12 min).mkv

The model assumes roughly Gaussian features, so plot a histogram of each feature and, if it is far from bell-shaped, transform it — e.g. x := log(x + c) for some constant c, or x := x^c for a fractional c between 0 and 1 — until the histogram looks more Gaussian. Error analysis helps too: examine anomalies the model missed (where p(x) came out large) and design new features that would have separated them. In the data-center example, a feature like CPU load divided by network traffic is high for a machine stuck in an infinite loop but not serving requests, catching an anomaly the raw features miss.


15.7 Multivariate Gaussian Distribution (Optional)

: 15 - 7 - Multivariate Gaussian Distribution (Optional) (14 min).mkv

The multivariate Gaussian distribution models p(x) in one shot, rather than as a product of per-feature densities, using a mean vector μ ∈ ℝⁿ and an n×n covariance matrix Σ:

p(x) = (1/((2π)^(n/2) |Σ|^(1/2))) exp( −(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ) )

where |Σ| is the determinant of Σ (det(sigma) in Octave). Varying the parameters changes the contours:

1. Scaling Σ's diagonal up or down widens or narrows the distribution.
2. Making diagonal entry 1 larger than entry 2 stretches the contours along the first axis.
3. Making entry 2 larger than entry 1 stretches along the second axis.
4. Nonzero off-diagonal entries tilt the contours, capturing correlations between features.
5. Varying μ moves the center.

The original per-feature model is exactly the special case in which Σ is diagonal (axis-aligned contours). Trade-offs: the original model needs correlated effects to be captured by manually combining features, but it is computationally cheap and works even for small m; the multivariate model captures correlations automatically but must compute Σ⁻¹ and requires m > n — in practice m ≥ 10n — or Σ will be singular and non-invertible (redundant, linearly dependent features cause the same problem).



15.8 Anomaly Detection using the Multivariate Gaussian Distribution (Optional)

: 15 - 8 - Anomaly Detection using the Multivariate Gaussian Distribution

(Optional) (14 min).mkv

To apply the multivariate Gaussian to anomaly detection: estimate μ and the n×n matrix Σ from the training data, then flag a new example x when p(x) < ε. The original model is the constrained case with a diagonal Σ. Prefer the multivariate version when features are correlated and m is comfortably larger than n; otherwise, create combined features by hand and stay with the original model.


- 9 -(Recommender Systems)

(Recommender Systems)

16.1 Problem Formulation

: 16 - 1 - Problem Formulation (8 min).mkv

Recommender systems (for example, product suggestions on shopping sites, or iTunes Genius) are among the most important applications of machine learning in industry, and they also illustrate the idea of learning features automatically. Running example: predicting movie ratings. Users rate movies 0–5 stars; say 5 movies and 4 users — Alice, Bob, Carol, and Dave — with some ratings missing. Notation:

n_u = the number of users
n_m = the number of movies
r(i,j) = 1 if user j has rated movie i
y^(i,j) = the rating user j gave movie i (defined only when r(i,j) = 1)
m_j = the number of movies user j has rated

The goal is to predict the missing ratings.


16.2 Content Based Recommendations

: 16 - 2 - Content Based Recommendations (15 min).mkv

x1 x2

x(1)[0.9 0]

(1)

(j) j

x(i) i

j i((j))Tx(i)

i:r(i,j) j

1/2m m 0

244
- 9 -(Recommender Systems)


16.3 Collaborative Filtering

: 16 - 3 - Collaborative Filtering (10 min).mkv

Conversely, if the users' preference vectors θ^(j) were known, we could learn the movie features x^(i) by the symmetric optimization. Collaborative filtering does both at once:

1. Learn the features x^(1), …, x^(n_m) and the parameters θ^(1), …, θ^(n_u) together,
2. by minimizing a single cost over both sets of variables;
3. then predict user j's rating of movie i as ((θ^(j))ᵀ x^(i)).

The learned x^(i) also yields a notion of similarity between movies: movies i and j are similar when ‖x^(i) − x^(j)‖ is small, which supports "movies related to this one" recommendations.


16.4 Collaborative Filtering Algorithm

: 16 - 4 - Collaborative Filtering Algorithm (9 min).mkv


16.5 Vectorization: Low Rank Matrix Factorization

: 16 - 5 - Vectorization_ Low Rank Matrix Factorization (8 min).mkv

This video covers two things: 1. a vectorized form of collaborative filtering; 2. using the learned features to find related movies. Collect all ratings into a matrix Y (here 5 movies × 4 users):

Movie                 | Alice (1) | Bob (2) | Carol (3) | Dave (4)
Love at last          | 5         | 5       | 0         | 0
Romance forever       | 5         | ?       | ?         | 0
Cute puppies of love  | ?         | 4       | 0         | ?
Nonstop car chases    | 0         | 0       | 5         | 4
Swords vs. karate     | 0         | 0       | 5         | ?

Stacking the movie features as rows of X and the user parameters as rows of Θ, the full matrix of predicted ratings is XΘᵀ, whose (i,j) entry is (θ^(j))ᵀ x^(i). Because this product has low rank, the method is called low-rank matrix factorization. And to find the 5 movies most related to movie i, take the movies j with the smallest ‖x^(i) − x^(j)‖.


16.6 Implementational Detail: Mean Normalization

: 16 - 6 - Implementational Detail_ Mean Normalization (9 min).mkv

Mean normalization: if a new user, Eve, has rated no movies, the regularization term drives her θ to 0 and every predicted rating ((θ^(j))ᵀx^(i)) becomes 0 — useless for recommendations. Fix: subtract each movie's mean rating μ_i from Y before training, and predict ((θ^(j))ᵀ x^(i)) + μ_i afterwards; a brand-new user like Eve is then predicted each movie's average rating.

- 10 -(Large Scale Machine Learning)

10

(Large Scale Machine Learning)

17.1 Learning With Large Datasets

: 17 - 1 - Learning With Large Datasets (6 min).mkv

Very large datasets (say m = 100,000,000) help only when the model has low bias, and they make each batch gradient-descent step expensive, since every step sums over all m examples. Before committing to the full set, check whether training on a random subset of, say, 1000 examples does nearly as well: plot learning curves on the subsample — if they show high variance, more data will help; if high bias, it will not.


17.2 Stochastic Gradient Descent

: 17 - 2 - Stochastic Gradient Descent (13 min).mkv


17.3 Mini-Batch Gradient Descent

: 17 - 3 - Mini-Batch Gradient Descent (6 min).mkv

Mini-batch gradient descent uses b examples per update — typically b between 2 and 100 — sitting between batch gradient descent (b = m) and stochastic gradient descent (b = 1); with a vectorized implementation it can outperform both.
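A pure-Python sketch of mini-batch updates for linear regression (toy data with b = 2, a fixed learning rate, and no shuffling — all illustrative simplifications):

```python
def minibatch_gd(X, y, alpha=0.1, b=2, epochs=1000):
    """theta_j -= alpha*(1/b)*sum over the batch of (h - y)*x_j, batch by batch."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        for start in range(0, len(X), b):
            bx, by = X[start:start + b], y[start:start + b]
            grad = [0.0] * len(theta)
            for xi, yi in zip(bx, by):
                h = sum(t * xj for t, xj in zip(theta, xi))
                for j in range(len(theta)):
                    grad[j] += (h - yi) * xi[j]
            theta = [t - alpha * g / len(bx) for t, g in zip(theta, grad)]
    return theta

# Data generated by y = 2*x, with a bias feature x0 = 1.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
theta = minibatch_gd(X, y)
print(theta)  # should approach [0, 2]
```

Each parameter update here touches only b examples, which is the whole point when m is in the millions.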


17.4 Stochastic Gradient Descent Convergence

: 17 - 4 - Stochastic Gradient Descent Convergence (12 min). mkv

To check that stochastic gradient descent is converging, every 1000 iterations (say) plot the cost averaged over the last 1000 examples processed. The curve is noisier than batch gradient descent's; a gently decreasing trend means the algorithm is working, while an increasing trend means the learning rate α is too large. α can also be decreased slowly over time to help convergence, at the cost of extra parameters to tune.


17.5 Online Learning

: 17 - 5 - Online Learning (13 min).mkv


Online learning handles a continuous stream of data — learn from each example as it arrives and then discard it, rather than storing a fixed training set. Example: a shipping website where users specify an origin A and a destination B, and we quote a price (say $50, or a discounted $20); y = 1 if the user buys the service, y = 0 otherwise. With features x describing the user, the route, and the price, we learn p(y=1|x;θ) with logistic regression, performing a single gradient-descent update on each (x, y) as it arrives and then moving on. This lets the model adapt as user preferences drift over time. A related use is learning to rank: predicting which results (e.g. which 10 of 100 phones) a user is most likely to click.


17.6 Map Reduce and Data Parallelism

: 17 - 6 - Map Reduce and Data Parallelism (14 min).mkv

Map-reduce splits the summation inside batch gradient descent across machines: with m = 400 examples and 4 machines, each machine computes the partial sum over its 100 examples, and a central server adds the four partial sums and performs the parameter update. Any learning algorithm whose main cost is a sum over the training set can be parallelized this way — and the same idea applies within a single machine, across its CPU cores.

- 10 -(Application Example: Photo OCR)

(Application Example: Photo OCR)

18.1 Problem Description and Pipeline

: 18 - 1 - Problem Description and Pipeline (7 min).mkv

The photo OCR problem — reading text in photographs — is solved with a pipeline of three stages:

1. Text detection (Text detection): find the regions of the image containing text
2. Character segmentation (Character segmentation): split each text region into individual characters
3. Character classification (Character classification): recognize each character

Breaking a system into such a pipeline of modules, often owned by separate engineers or teams, is a common way to organize machine learning applications.


18.2 Sliding Windows

: 18 - 2 - Sliding Windows (15 min).mkv



18.3 Getting Lots of Data and Artificial Data

: 18 - 3 - Getting Lots of Data and Artificial Data (16 min).mkv

Ways to get a lot of training data, including artificial data synthesis:

1. Create data from scratch — e.g. rendering characters in many fonts against varied backgrounds.
2. Amplify an existing labeled set with distortions — e.g. warping character images, or adding realistic noise to audio clips.
3. Collect and label the data yourself, or crowd-source the labeling.

Before choosing, it is worth estimating how much work each route takes to obtain, say, ten times the current data — it is often less than expected.


18.4 Ceiling Analysis: What Part of the Pipeline to Work on Next

: 18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14

min).mkv

Ceiling analysis decides which pipeline stage deserves engineering effort. First measure the overall system accuracy — say 72%. Then, for each stage in turn, substitute 100%-correct (ground-truth) output for that stage and re-measure the overall accuracy: with perfect text detection, accuracy rises from 72% to 89% — a potential 17-point gain; additionally making character segmentation perfect raises it to 90%, only 1 more point; additionally making character classification perfect reaches 100%, a further 10 points. So improving text detection has by far the highest ceiling, and character segmentation the lowest — allocate your team's time accordingly.

- 10 -(Conclusion)

(Conclusion)

19.1 Summary and Thank You

: 19 - 1 - Summary and Thank You (5 min).mkv

A summary of the course's main topics: supervised learning with labeled examples (x^(i), y^(i)) — linear regression, logistic regression, neural networks, and SVMs; unsupervised learning with unlabeled x^(i) — K-means clustering, PCA, and anomaly detection; special applications — recommender systems and large-scale learning, including map-reduce; and practical advice for building ML systems — bias/variance, regularization, deciding what to work on next, evaluation metrics such as precision, recall, and the F1 score, learning curves, error analysis, and ceiling analysis.


Andrew Ng
