Learning
The Philosophy
Advantages
Applications
Algorithms
Venkat Reddy
Bigdata Analytics
Venkat Reddy
The Learning
Machine Learning
[Diagram: TRAINING DATA -> Learning algorithm -> Trained machine; Query -> Trained machine -> Answer]
Ship
Water
Rock
Iron object
Fiber object, etc.
It's easy for the human brain but tough for a machine. A machine needs
time and a good amount of training data to classify objects
accurately.
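The flow from training data through a learning algorithm to a trained machine that answers queries can be sketched in a few lines. This is only an illustration: the nearest-centroid rule, the function names `train` and `answer`, and every feature value are assumptions made up for the example, not anything taken from the slides.

```python
import math

def train(examples):
    """'Learning algorithm': average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The "trained machine" here is simply one centroid per label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def answer(model, query):
    """Answer a query with the label whose centroid is closest to it."""
    return min(model, key=lambda label: math.dist(model[label], query))

# Hypothetical training data: (feature vector, label). The numbers are invented.
training_data = [
    ([9.0, 8.5], "Ship"), ([8.0, 9.0], "Ship"),
    ([1.0, 1.5], "Water"), ([1.5, 0.5], "Water"),
    ([5.0, 2.0], "Rock"), ([6.0, 1.0], "Rock"),
]

model = train(training_data)
print(answer(model, [8.5, 8.0]))
```

With more (and more varied) training examples per label, the centroids become more representative, which is the sense in which the machine needs a good amount of training data.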
Prospective customers
Dissatisfied customers
Good customers
Bad payers
Obtain:
Identify:
Face recognition
Signature / fingerprint / iris verification
DNA fingerprinting
Medicine:
Internet
Hit ranking
Spam filtering
Text categorization
Text translation
Recommendation
Computer interfaces:
Bayes Network
[Figure: Bayes network. Node labels: A (2%), B (90%), C (8%). Other values shown: 1%, 60%, 2%.]
[Figure: Bayes network. Node labels: B (50%), E (40%), D (60%), A (20%). Other values shown: 1%, 5%, 2%.]
PCA and FA
Player | Avg Runs | Total wickets | Height | Not outs | Highest Score | Best Bowling
45
5.5
15
120
50
34
5.2
34
209
38
36
183
46
6.1
78
160
37
45
5.8
56
98
32
5.10
89
183
18
123
35
19
239
6.1
56
18
96
6.6
87
10
16
83
5.9
32
11
17
138
5.10
12
SVM Classification
Linear Classifier
Classifier Margin
Define the margin of a linear classifier as the width by which the boundary could be
increased before hitting a datapoint.
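The definition above can be turned into a small calculation. The sketch below assumes the boundary is written as w . x + b = 0 and that it is widened equally on both sides, so the margin is twice the distance from the boundary to its closest datapoint. The function name `margin` and the sample points are hypothetical.

```python
import math

def margin(w, b, points):
    """Width of the widest band centred on w . x + b = 0 that touches no point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Perpendicular distance of each point to the boundary: |w . x + b| / ||w||
    distances = [
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    ]
    return 2 * min(distances)

# Hypothetical datapoints; the boundary below is the vertical line x1 = 0.
points = [(3.0, 0.0), (5.0, 1.0), (-2.0, 0.0), (-4.0, 2.0)]
print(margin((1.0, 0.0), 0.0, points))  # closest point is 2 away, so the width is 4.0
```

Rescaling w and b together describes the same boundary, so it leaves the margin unchanged; that is why the distance is divided by the length of w.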
Maximum Margin
Support Vectors
w . x + b > 0
w . x + b = 0
w . x + b < 0
f(x) = sign(w . x + b)
red is +1
blue is -1
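The decision rule f(x) = sign(w . x + b) can be written directly as code. This is a minimal sketch: the weights and bias are hypothetical values chosen for illustration, and sending points that lie exactly on the boundary to -1 is an arbitrary choice made here.

```python
def classify(w, b, x):
    """f(x) = sign(w . x + b): +1 on one side of the boundary, -1 on the other."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # A point exactly on the boundary (score == 0) is assigned -1 here by convention.
    return 1 if score > 0 else -1

w, b = (1.0, -1.0), 0.5  # hypothetical weights and bias

print(classify(w, b, (2.0, 1.0)))  # score is 1.5, so the class is +1
print(classify(w, b, (0.0, 2.0)))  # score is -1.5, so the class is -1
```

Training an SVM amounts to choosing w and b so that this rule separates the classes with the largest possible margin; the sketch only covers the prediction step.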
ML is more heuristic
Focused on improving performance of a learning agent
Also looks at real-time self-learning and robotics, areas that are
not part of data mining
Some algorithms are so heuristic that there is no single
right or wrong answer
Let's take the K-Means example
K-Means clustering
Overall population
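For readers who want to see the mechanics, here is a minimal sketch of K-Means in plain Python. It assumes Euclidean distance and random starting centroids (the standard Lloyd's iteration); the function name `kmeans`, its parameters, and the sample points are hypothetical and not taken from the slides.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return [nearest(p) for p in points], centroids

# Hypothetical population with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because the starting centroids are random, a different `seed` can lead to a different final clustering on data that is less clearly separated than this toy set, which is one reason the method has no single right answer.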
        Weight
Cust1   68
Cust2   72
Cust3   100

        Weight  Age
Cust1   68      25
Cust2   72      70
Cust3   100     28

        Weight  Age  Income
Cust1   68      25   60,000
Cust2   72      70   9,000
Cust3   100     28   62,000
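The effect of adding variables can be checked on the three customers above. The sketch below assumes Euclidean distance on the raw, unscaled values and asks which two customers are closest for each set of variables; the helper name `closest_pair` is hypothetical.

```python
import math

# Customer values as given in the tables above.
customers = {
    "Cust1": {"Weight": 68, "Age": 25, "Income": 60000},
    "Cust2": {"Weight": 72, "Age": 70, "Income": 9000},
    "Cust3": {"Weight": 100, "Age": 28, "Income": 62000},
}

def closest_pair(variables):
    """Return the two customers with the smallest Euclidean distance."""
    names = sorted(customers)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

    def distance(pair):
        a, b = pair
        return math.dist(
            [customers[a][v] for v in variables],
            [customers[b][v] for v in variables],
        )

    return min(pairs, key=distance)

for variables in (["Weight"], ["Weight", "Age"], ["Weight", "Age", "Income"]):
    print(variables, closest_pair(variables))
```

On weight alone, Cust1 and Cust2 are the closest pair; once age is added, Cust1 and Cust3 are, and income (whose scale dwarfs the other two variables) keeps it that way. The grouping therefore depends on which variables go into the distance.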
Distance Measures
Euclidean distance
City-block (Manhattan) distance
Chebyshev distance
Minkowski distance
Mahalanobis distance
Maximum distance
Cosine similarity
Simple correlation between observations
Minimum distance
Weighted distance
Venkat Reddy distance
These measures will not necessarily produce the same clusters in the
above example.
So the same ML algorithm can give different results. Generally this is not
the case with conventional methods.
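A few of the listed measures are simple enough to write out, which makes it easy to compare them on the same pair of observations. This is a sketch using the textbook definitions; the function names are chosen for the example, and the two vectors are Cust1 and Cust2 from the weight and age table.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalises Manhattan (p=1) and Euclidean (p=2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (a similarity, not a distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

cust1, cust2 = (68, 25), (72, 70)  # (Weight, Age)
for measure in (euclidean, manhattan, chebyshev, cosine_similarity):
    print(measure.__name__, measure(cust1, cust2))
```

Each measure weighs coordinate differences differently, so plugging a different one into a clustering routine can change which observations count as neighbours.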
Thank you
More ..
https://www.facebook.com/groups/SASanalysts/
http://www.slideshare.net/21_venkat/presentations