
Artificial Intelligence

Slide 6
Muhammad Faizan Tahir
Lecturer,
Department of Computer Science
Dadabhoy Institute of Higher Education
Contact:
Faizan_tahir@hotmail.com
+92 321 9767673
What is Learning?
Webster's definition of "learn":

"To gain knowledge or understanding of, or skill in, by study, instruction, or experience"

Learning a set of new facts
Learning HOW to do something
Improving an ability already learned
What is Learning?
Examples:

Riding a bike (motor skills)
A telephone number (memorizing)
Playing backgammon (strategy)
Developing a scientific theory (abstraction)
Language
Recognizing fraudulent credit card transactions
Etc.
What is Machine Learning?
Machine learning is programming computers to
optimize a performance criterion using example
data or past experience.
There is no need to learn to calculate payroll.
Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)
What is Machine Learning?
Learning general models from data of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Arthur Samuel (1959): "a field of study that gives computers the ability to learn without being explicitly programmed"
What is Machine Learning?
Optimize a performance criterion using example
data or past experience.
Role of Statistics: Inference from a sample
Role of computer science: Efficient algorithms to
- Solve the optimization problem
- Represent and evaluate the model for inference
Getting computers to program themselves
Automating automation
Let the data do the work instead!
What is Machine Learning?
Traditional Programming:
Data + Program -> Computer -> Output

Machine Learning:
Data + Output -> Computer -> Program
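The two boxes can be sketched in Python (an illustrative example, not part of the slides): a hand-written rule versus the same rule recovered from example (input, output) pairs by least squares.

```python
# Traditional programming: a human writes the program (the rule).
def fahrenheit_traditional(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the program (here, a line y = a*x + b) is
# recovered from example (input, output) pairs by least squares.
def fit_line(pairs):
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

data = [(0, 32.0), (10, 50.0), (20, 68.0), (30, 86.0)]
a, b = fit_line(data)
print(round(a, 2), round(b, 2))  # 1.8 32.0 -- the learned "program"
```

The learned coefficients reproduce the hand-written rule, but nothing about Fahrenheit was coded: the data did the work.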
What is Machine Learning?
Machine learning is less like magic and more like gardening:
Seeds = Algorithms
Nutrients = Data
Gardener = You
Plants = Programs
Examples of ML?
Machine learning is a cutting-edge technology with wide applications, e.g.:
Web search: Google, Yahoo!, Bing, etc.
Recommendation systems for books/movies/music, e.g.
Amazon
Personalized internet advertising, e.g. Facebook, Gmail
Spam filtering
Autopilot in planes, cruise control in cars
Finance: Algorithmic trading
Biology: Bioinformatics
Chemistry: Computational chemistry
Big Data applications in any field
[Your favorite area]
Machine Learning in a Nutshell
Tens of thousands of machine learning algorithms
Hundreds of new ones every year
Every machine learning algorithm has three
components:
Representation
Evaluation
Optimization
Representation
Decision trees
Sets of rules / Logic programs
Instances
Graphical models (Bayes/Markov nets)
Neural networks
Support vector machines
Model ensembles
Genetic algorithms
Etc.
Evaluation
Accuracy
Precision and recall
Squared error
Likelihood
Posterior probability
Cost / Utility
Margin
Entropy
Etc.
Optimization
Combinatorial optimization
E.g.: Greedy search
Convex optimization
E.g.: Gradient descent
Constrained optimization
E.g.: Linear programming
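The convex-optimization case can be sketched in a few lines of Python (an illustration, not from the slides; the objective function and learning rate are my own choices):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimise a differentiable function, given its gradient,
    by repeatedly stepping downhill."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Example: minimise f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```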
Machine Learning Algorithms
For identifying spam or recognizing written digits, we
have labeled examples to learn from
Supervised Learning: Teach the method the relationships in the data, based on a set of input-output pairs of observations
For finding groups of similar patients, we do not know
the correct groupings
Unsupervised Learning: Let the method learn relationships and
structure of data on its own
Other types of learning algorithms
semi-supervised learning, reinforcement learning
Examples
Supervised learning
Decision tree induction
Rule induction
Instance-based learning
Bayesian learning
Neural networks
Support vector machines
Model ensembles
Learning theory
Unsupervised learning
Clustering
Dimensionality reduction
Supervised Learning
Supervised learning:
For each training example, both the input variables and the associated response are available
- i.e. input/output pairs (X(1), Y(1)), ..., (X(n), Y(n)) are available to the learning algorithm
Supervised Learning
Given a sequence of input/output pairs of the form <xi, yi>, where xi is a possible input and yi is the output associated with xi, learn a function f such that:
- f(xi) = yi for all i,
- f makes a good guess for the outputs of inputs that it has not previously seen.
[If f has only 2 possible outputs, f is called a concept and learning is called concept learning.]
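A toy Python sketch of such an f (my own illustration: it memorizes the training pairs exactly and falls back on a nearest-neighbour guess for unseen inputs; since this f has only 2 possible outputs, it is a concept in the slide's sense):

```python
def learn(pairs):
    """Return a function f with f(xi) = yi on every training pair and a
    nearest-neighbour guess for inputs it has not previously seen."""
    table = dict(pairs)
    def f(x):
        if x in table:                               # seen input: exact output
            return table[x]
        nearest = min(table, key=lambda xi: abs(xi - x))
        return table[nearest]                        # unseen input: a guess
    return f

f = learn([(1, "odd"), (2, "even"), (3, "odd"), (4, "even")])
print(f(2))    # even  (memorised training pair)
print(f(10))   # even  (guess: the nearest seen input is 4)
```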
Supervised Learning
A supervised learning algorithm analyzes the training data and produces an inferred function:
- a classifier, if the output is discrete, OR
- a regression function, if the output is continuous.
Unsupervised Learning
Measurements for each observation, but no
associated response
i.e. we have X(i)s but not Y(i)s
Use the data to understand the relationships
between variables or among observations
Unsupervised Learning
While Supervised Learning considers the
input/output pairs of the form <xi, yi>,
Unsupervised Learning focuses on the input
only: xi. It has no knowledge of the output,
yi.
Unsupervised Learning attempts to group together (or cluster) similar xi's.
Different similarity measures can be used as
well as different strategies for building the
clusters.
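One such strategy can be sketched in a few lines of Python (an illustration of the idea, not from the slides): k-means on 1-D points, with absolute distance as the similarity measure and a naive initialisation.

```python
def kmeans(points, k, iters=20):
    """Cluster 1-D points into k groups (naive initialisation:
    the first k points serve as the starting centers)."""
    centers = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to
            i = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)              # ...its nearest center
        centers = [sum(c) / len(c) if c else centers[i]   # recompute means
                   for i, c in enumerate(clusters)]
    return clusters

groups = kmeans([1.0, 1.2, 0.8, 9.9, 10.1, 10.0], k=2)
print(sorted(sorted(g) for g in groups))
# [[0.8, 1.0, 1.2], [9.9, 10.0, 10.1]]
```

No labels are given; the two groups emerge from the similarity of the inputs alone.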
Classification
Given a collection of records (training set)
Each record contains a set of attributes, one of the
attributes is the class.
Find a model for class attribute as a function of
the values of other attributes.
Goal: previously unseen records should be
assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets.
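The usual division can be sketched in Python (illustrative; the 70/30 fraction and the fixed seed are my own arbitrary choices):

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a data set and divide it into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)     # fixed seed: reproducible split
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 7 3
```

The model is fit on the training portion only; accuracy measured on the held-out test portion estimates performance on previously unseen records.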
Decision Tree
What is a decision tree?
Decision Tree Learning (DTL) is a form of inductive learning task with the following objective:
- Use a training set of examples to create a hypothesis that makes general conclusions.
Attribute: a variable that we take into account in making a decision.
Target attribute: the attribute that we want to take on a certain value; we'll decide based on it.
Decision Tree Learning
Decision tree learning is a method for approximating
discrete-valued target functions.
The learned function is represented by a decision tree.
A learned decision tree can also be re-represented as a set of if-then rules.
Decision tree learning is one of the most widely used and
practical methods for inductive inference.
It is robust to noisy data and capable of learning
disjunctive expressions.
Decision tree learning searches a completely expressive hypothesis space.
Avoids the difficulties of restricted hypothesis spaces.
Its inductive bias is a preference for small trees over large trees.
The decision tree algorithms ID3 and C4.5 are very popular inductive inference algorithms, and they have been successfully applied to many learning tasks.
Decision Tree (Real world examples)
Astronomy:
- Astronomy has been an active domain for using
automated classification techniques.
- Decision trees have been used for filtering noise from Hubble Space Telescope images.
- Decision trees have helped in star-galaxy classification,
and determining galaxy counts.

Remote Sensing:
- Remote sensing has been a strong application area for
pattern recognition work on decision trees.
Decision Tree (Real world examples)
Medicine:
- Medical research and practice have long been important
areas of application for decision tree techniques.
- Recent uses of automatic induction of decision trees can
be found in diagnosis, cardiology, psychiatry, etc.

Manufacturing and Production:
- Decision trees have recently been used to non-destructively test welding quality, for semiconductor manufacturing, for increasing productivity, for material procurement method selection, for process optimization in electrochemical machining, to schedule printed circuit board assembly lines, etc.
Training Examples

(PlayTennis training data table; figure omitted)
Positive (Yes): {D3, D4, D5, D7, D9, D10, D11, D12, D13}
Negative (No): {D1, D2, D6, D8, D14}
Decision Tree Terms

- Root node: the topmost condition check
- Internal nodes: condition checks on attribute values
- Leaf nodes: decision points
Constructing a Decision Tree
Which attribute to choose?
- Information Gain
- ENTROPY
Where to stop?
- Termination criteria
Advantages:
- They are fast
- Robust
- Require very little experimentation
You may also build some intuitions about your customer base. E.g.
Are customers with different family sizes truly different?
Decision Tree for PlayTennis
Top-Down Induction of Decision Trees -- ID3

1. A <- the best decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant node
4. Sort training examples to leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of target attribute), STOP; else iterate over new leaf nodes.
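The five steps above can be sketched as a recursive Python function (an illustrative reconstruction under my own data layout, not the course's exact implementation; it picks the best attribute by information gain, the measure the following slides describe):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    parts = defaultdict(list)
    for row, label in zip(rows, labels):
        parts[row[attr]].append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    # Step 5: stop when all examples share the same target value
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: choose the best decision attribute for this node
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Steps 3-4: one branch per value; sort examples down the branches
    branches = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(label)
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(sub_rows, sub_labels, rest)
                   for value, (sub_rows, sub_labels) in branches.items()}}

rows = [{"Outlook": "Sunny", "Wind": "Weak"},
        {"Outlook": "Sunny", "Wind": "Strong"},
        {"Outlook": "Rain", "Wind": "Weak"},
        {"Outlook": "Rain", "Wind": "Strong"}]
labels = ["No", "No", "Yes", "Yes"]
print(id3(rows, labels, ["Outlook", "Wind"]))
# {'Outlook': {'Sunny': 'No', 'Rain': 'Yes'}}
```

In this toy data the label depends only on Outlook, so the tree splits once and stops: every leaf is perfectly classified.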
Which Attribute is best?
We would like to select the attribute that is most useful
for classifying examples.
Information gain measures how well a given attribute
separates the training examples according to their target
classification.
ID3 uses this information gain measure to select among
the candidate attributes at each step while growing the
tree.
In order to define information gain precisely, we use a
measure commonly used in information theory, called
entropy
Entropy characterizes the (im)purity of an arbitrary
collection of examples.
Which Attribute is best?
Want to measure the "purity" of the split
- more certain about Yes/No after the split:
a pure set (4 Yes / 0 No) => completely certain (100%)
an impure set (3 Yes / 3 No) => completely uncertain (50%)
- can't use P(Yes | set):
the measure must be symmetric: (4 Yes / 0 No) is as pure as (0 Yes / 4 No)
Which Attribute is best?
Entropy
Entropy is a measure of uncertainty in the
data
Given a collection S, containing positive
and negative examples of some target
concept, the entropy of S relative to this
Boolean classification is:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
where S is a sample of training examples, p+ is the proportion of positive examples, and p- is the proportion of negative examples.
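The formula can be checked with a few lines of Python (illustrative). For the PlayTennis data above, S has 9 positive and 5 negative examples, so Entropy(S) comes out near 0.94.

```python
import math

def entropy(pos, neg):
    """Entropy(S) = -p+ log2(p+) - p- log2(p-)."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                  # treat 0 * log2(0) as 0
            p = count / total
            e -= p * math.log2(p)
    return e

print(round(entropy(9, 5), 3))  # 0.94  -- the PlayTennis collection
print(entropy(4, 0))            # 0.0   -- pure set, completely certain
print(entropy(3, 3))            # 1.0   -- impure set, maximum uncertainty
```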
Information Gain
Entropy is a measure of the impurity in a collection of training examples.
Information gain is a measure of the effectiveness of an attribute in classifying the training data: it measures the expected reduction in entropy caused by partitioning the examples according to an attribute.
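Combining the two definitions in Python (my own illustration; the counts below are the standard PlayTennis split on the Wind attribute, assumed here: 8 Weak days with 6 Yes / 2 No and 6 Strong days with 3 Yes / 3 No):

```python
import math
from collections import defaultdict

def entropy(labels):
    total = len(labels)
    e = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        e -= p * math.log2(p)
    return e

def information_gain(examples, labels, attribute_index):
    """Expected reduction in entropy from splitting `examples`
    on the attribute at `attribute_index`."""
    partitions = defaultdict(list)
    for example, label in zip(examples, labels):
        partitions[example[attribute_index]].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# PlayTennis "Wind" attribute (assumed counts): 8 Weak days
# (6 Yes / 2 No) and 6 Strong days (3 Yes / 3 No).
wind = [["Weak"]] * 8 + [["Strong"]] * 6
play = ["Yes"] * 6 + ["No"] * 2 + ["Yes"] * 3 + ["No"] * 3
print(round(information_gain(wind, play, 0), 3))  # 0.048
```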
ID3 -- Result
THANK YOU
