LEARNING
An introduction
APPLICATIONS
WHY MACHINE LEARNING?
An agent is learning if it improves its performance on future tasks after
making observations about the world.
TYPES OF MACHINE LEARNING
Supervised
Unsupervised
Reinforcement
SUPERVISED LEARNING
From a collection of input-output pairs, learn a function that predicts the
output for new inputs.
Requires a collection of labelled data.
UNSUPERVISED LEARNING
Agent learns patterns in data, even though no explicit feedback is given.
Clustering is an example of unsupervised learning.
REINFORCEMENT LEARNING
Learns from a series of reinforcements: rewards and punishments.
Rewards and punishments are designed to help the agent achieve its goal.
Reinforcements need not be given at the end of each action.
This leads to the credit assignment problem: deciding which of the preceding actions deserve credit or blame for the outcome.
SUPERVISED LEARNING
Given a training set of N example input-output pairs (x1, y1), (x2, y2), ..., (xN, yN), learn a function h that approximates the true function f that generated the outputs.
HYPOTHESIS
The hypothesis h must generalise well to data unseen during training.
It is said to generalise well if it correctly predicts the value of y for novel x.
TYPES
Regression
Classification
REGRESSION
CLASSIFICATION
FEATURES
The inputs x in the training data are called features.
As mentioned, they need not be scalars. They can also be vectors.
How to choose features?
Oranges vs Apples
LINEAR REGRESSION
One solution to regression could be a linear model of the input:
h(x) = w0 + w1 x
EXTENSION
The previous case can be extended to a polynomial in x:
h(x) = w0 + w1 x + w2 x^2 + ... + wM x^M
BASIS FUNCTION
A basis function is a non-linear function of the input variable.
It is represented by φ(x).
In the previous case, the basis functions were polynomial functions, φj(x) = x^j.
MULTIVARIATE LINEAR REGRESSION
The output might depend on multiple inputs.
All the inputs are captured in a vector. Vectors will be represented by boldface letters.
SOLUTIONS
Now that the model of the system is in place, how do we find the weight vector w?
Normal equation
Gradient descent
NORMAL EQUATION
Then, the weights that minimise the squared error are given in closed form by
w = (X^T X)^(-1) X^T y,
where the rows of X are the input vectors and y is the vector of target outputs.
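The closed-form solution above can be sketched in a few lines of NumPy; the data set here is purely illustrative (points on the line y = 2x + 1):

```python
import numpy as np

# Each row is [1, x], so w[0] is the intercept and w[1] the slope.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: solve (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # → [1. 2.]
```

Using `np.linalg.solve` rather than explicitly forming the inverse of X^T X is numerically safer and is the usual way to apply the normal equation in practice.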
ERROR FUNCTION
Least square error:
E(w) = (1/2) Σ_n (h(x_n) - y_n)^2
GRADIENT DESCENT
ERROR GRAPH
MINIMIZATION
VISUALIZATION
ITERATION VS ERROR
CODE
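A minimal sketch of batch gradient descent for the same linear-regression setup; the data, learning rate, and iteration count are illustrative choices, not prescriptions:

```python
import numpy as np

# Illustrative data on the line y = 2x + 1; rows are [1, x].
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w = np.zeros(2)   # initial weights
eta = 0.05        # learning rate (small enough for this data to converge)
for _ in range(5000):
    error = X @ w - y            # prediction error on every sample
    grad = X.T @ error / len(y)  # gradient of (1/2N) * sum of squared errors
    w -= eta * grad              # step along the negative gradient
print(w)  # close to [1, 2]
```

With a learning rate that is too large the updates diverge instead of converging; plotting the error after each iteration (as on the "iteration vs error" slide) is the standard way to check this.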
OVERFITTING
A learning algorithm must perform well on novel data.
Hence it must generalise well.
Failure to do so will lead to large errors on testing data.
OVERFITTING
ERROR DURING OVERFITTING
WEIGHTS DURING OVERFITTING
WITH LARGE DATA SET
REGULARIZATION
We need to keep the weights small, hence we add a regularization term to the error function:
E(w) = (1/2) Σ_n (h(x_n) - y_n)^2 + (λ/2) ||w||^2
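With an L2 penalty, the normal equation gains an extra λI term. A sketch comparing the two solutions on illustrative data (λ is an arbitrary choice here):

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

lam = 0.1  # regularization strength; larger values shrink the weights more
# Regularized normal equation: w = (X^T X + lam * I)^(-1) X^T y
w_reg = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(w_ols, w_reg)  # the regularized weights are pulled slightly toward zero
```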
CURVE FITTING WITH REGULARIZATION
BIAS VS VARIANCE
High bias occurs when training has underfit the data.
High variance occurs when training has overfit the data.
If the training error and the cross-validation error are both high, the algorithm suffers from high bias (underfitting).
If the training error is low and the cross-validation error is high, the algorithm suffers from high variance (overfitting).
BIAS AND VARIANCE
HIGH BIAS (UNDERFIT)
Getting more data won't be useful.
HIGH VARIANCE (OVERFIT)
More data can improve the performance of the learning algorithm.
LOGISTIC REGRESSION
A small change to linear regression converts it to logistic regression, which is used for binary classification: the linear output w^T x is passed through the sigmoid function.
The output h(x) = σ(w^T x) is interpreted as the probability that the input belongs to the positive class.
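A minimal sketch of the hypothesis; the weights here are made up for illustration, not learned from data:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    # probability that x belongs to the positive class
    return sigmoid(w @ x)

w = np.array([-3.0, 1.0])   # illustrative weights: decision boundary near x = 3
x = np.array([1.0, 4.0])    # [1, x] with a leading 1 for the intercept
p = predict_proba(w, x)
label = 1 if p >= 0.5 else 0
print(p, label)             # p is above 0.5, so the predicted class is 1
```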
WHY SIGMOID?
K-NEAREST NEIGHBORS
One of the simplest machine learning algorithms for classification.
An input is classified by a majority vote of its neighbors. The number of neighbors is decided by k (a small positive integer).
If k = 1, the object is simply assigned to the class of its single nearest neighbor.
The neighbors are taken from the training set, but no explicit training happens.
Distance is described by a norm function.
Generally, it is assumed to be the Euclidean distance.
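The voting scheme above can be sketched directly; the tiny two-cluster data set is illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # majority vote among their labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # → 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # → 1
```

Note that all the work happens at prediction time: there is no fitting step, which is exactly the "no explicit training" property mentioned above.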
NN VS 5-NN CLASSIFICATION
SENSITIVE TO LOCAL STRUCTURE
SKEWED CLASS DISTRIBUTION
If the class distribution is skewed, majority voting can lead to wrong results.
Instead, the vote of each neighbor can be weighted by its distance from the unlabelled data point.
SELECTION OF K
The larger the value of k, the smaller the effect of noise on the output.
In binary classification, it is helpful to choose k to be an odd number to
avoid ties.
Hyper-parameter tuning.
ADVANTAGES OF KNN
DISADVANTAGES
Computational time for classification increases with the training data size.
Training data is required during classification.
A pixel-wise distance metric leads to classification based on color distribution rather than perceptual or semantic similarity.
EVALUATION METRIC FOR CLASSIFICATION
These metrics are useful when the data set is skewed.
Precision is the fraction of all retrieved elements that are relevant.
Precision => How useful the results are.
Recall is the fraction of all relevant instances that are retrieved.
Recall => How complete the results are.
PRECISION AND RECALL
F-SCORE
The F-score combines precision and recall into a single number; the F1 score is their harmonic mean:
F1 = 2 * (precision * recall) / (precision + recall)
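The definitions above translate directly into code; the labels below are a made-up example:

```python
def precision_recall_f1(y_true, y_pred):
    # counts for a binary classification, with 1 as the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # fraction of retrieved items that are relevant
    recall = tp / (tp + fn)     # fraction of relevant items that are retrieved
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # → (0.75, 0.75, 0.75)
```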