
Machine Learning

Unit 1

1. What do you mean by a well-posed learning problem? Explain the important features that are required to well-define a learning problem.
Ans)
Definition: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks T, as
measured by P, improves with experience E.

Identify three features:

 class of tasks
 measure of performance to be improved
 source of experience

A Robot Driving Learning Problem:

 Task T: driving on public, 4-lane highways using vision sensors
 Performance measure P: average distance traveled before an error (as judged by a human overseer)
 Training experience E: a sequence of images and steering commands recorded while observing a human driver

A Handwriting Recognition Learning Problem:

 Task T: recognizing and classifying handwritten words within images
 Performance measure P: percent of words correctly classified
 Training experience E: a database of handwritten words with given classifications
A Text Categorization Learning Problem:

 Task T: assign a document to its content category
 Performance measure P: precision and recall
 Training experience E: a set of pre-classified example documents
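
For instance, the three features can be recorded as a small data structure. The following Python sketch is only illustrative (the class name LearningProblem is made up, not from any library); it writes down T, P, and E for the handwriting recognition problem above.

from dataclasses import dataclass

@dataclass
class LearningProblem:
    task: str                 # T: the class of tasks
    performance_measure: str  # P: how improvement is measured
    training_experience: str  # E: the source of experience

handwriting = LearningProblem(
    task="recognizing and classifying handwritten words within images",
    performance_measure="percent of words correctly classified",
    training_experience="a database of handwritten words with given classifications",
)
print(handwriting)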

2 a) Explain the working of the Find-S algorithm with an example.
Ans)

Find-S Algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint a_i in h:
      If the constraint a_i is satisfied by x, do nothing;
      otherwise replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.

(OR)
Example:
Here is an example of finding the target hypothesis using the Find-S algorithm. The example is divided into multiple parts:
1. Training data generation: Since concept learning works on past experience, we need to have training data ready for the learning process. This step involves generating training data for a simple example. To test the application you can create your own data; the application currently generates test data as Map[String, Any].

2. Trainer initialization: This task involves the creation of a Trainer with a model (Find-S) and some basic configuration, such as the training ratio (the ratio between training samples and validation samples, typically represented by a double value in the range 0 to 1, where 0 represents 0% and 1 represents 100%).

3. Training: The trainer is responsible for making the model learn the concept from the training samples, but we need to trigger that event using the trainer function ‘train’. Once it is triggered, the trainer divides the training samples into two parts, training data and validation data, based on the training ratio, and passes the training samples to the model synchronously.
4. Trained model: After the training process finishes, we can use the trained model to make predictions and analyze the final hypothesis (hypothesis set) using the ‘getHypothesis’ function.

5. Testing: To test the trained model, we can pass a sample object into the model using the ‘predict’ function and compare the actual output with the expected output for verification.
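
The steps above describe a generic trainer workflow; as a concrete illustration, here is a minimal, self-contained Python sketch of Find-S itself (the function name find_s and the EnjoySport-style data are illustrative, not the Trainer API mentioned above).

def find_s(examples):
    # examples: list of (attribute_tuple, label) pairs; label is 'Yes' or 'No'
    positives = [x for x, label in examples if label == "Yes"]
    h = list(positives[0])          # start from the first positive example (most specific)
    for x in positives[1:]:
        for i, value in enumerate(x):
            if h[i] != value:       # constraint not satisfied by this positive example
                h[i] = "?"          # generalize to the next more general constraint
    return h

# Attributes: Sky, AirTemp, Humidity, Wind, Water, Forecast
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']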

2 b) Will the Candidate Elimination algorithm converge to the correct hypothesis? Justify.
Ans)

• If there is a consistent hypothesis, then the algorithm will converge to S = G = {h} when enough examples are provided.
• False examples may cause the removal of the correct h.
• If the examples are inconsistent, S and G become empty.
• This can also happen when the concept to be learned is not in H.
3 a) How is a decision tree represented? Discuss any one appropriate problem for decision tree learning.
Ans)
Decision tree:
A decision tree is a tree in which each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).

DECISION TREE REPRESENTATION:


Decision trees classify instances by sorting them down the tree from the
root to some leaf node, which provides the classification of the instance. Each
node in the tree specifies a test of some attribute of the instance, and each
branch descending from that node corresponds to one of the possible values for
this attribute. An instance is classified by starting at the root node of the tree,
testing the attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute in the given example. This process is
then repeated for the subtree rooted at the new node.
[Figure: A decision tree for the concept PlayTennis. An example is classified by sorting it through the tree to the appropriate leaf node, then returning the classification associated with this leaf (Yes or No).]
The figure illustrates a typical learned decision tree. This decision tree classifies
Saturday mornings according to whether they are suitable for playing tennis. For
example, the instance
(Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
would be sorted down the leftmost branch of this decision tree and would
therefore be classified as a negative instance (i.e., the tree predicts that
PlayTennis = no). This tree and the example used in Table 3.2 to illustrate the ID3
learning algorithm are adapted from (Quinlan 1986).
In general, decision trees represent a disjunction of conjunctions of
constraints on the attribute values of instances. Each path from the tree root to a
leaf corresponds to a conjunction of attribute tests, and the tree itself to a
disjunction of these conjunctions. For example, the decision tree shown in Figure
corresponds to the expression
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
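
As a small illustration, the PlayTennis tree and the expression above can be written as plain if-then tests; this Python sketch (the function name classify_play_tennis is hypothetical) returns the label for a given instance.

def classify_play_tennis(outlook, humidity, wind):
    # Mirrors (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak) for the 'Yes' class.
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

# The instance from the text is sorted down the leftmost (Sunny) branch:
print(classify_play_tennis(outlook="Sunny", humidity="High", wind="Strong"))   # No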

APPROPRIATE PROBLEM FOR DECISION TREE LEARNING:

Avoiding overfitting:
The ID3 algorithm continues splitting on attributes until either it classifies all the data points or there are no more attributes to split on. As a result, it is prone to creating decision trees that overfit, performing very well on the training data at the expense of accuracy with respect to the entire distribution of data.
There are, in general, two approaches to avoid this in decision trees:
- Allow the tree to grow until it overfits and then prune it.
- Prevent the tree from growing too deep by stopping it before it perfectly classifies the training data.
A decision tree’s growth is specified in terms of the number of layers, or depth,
it’s allowed to have. The data available to train the decision tree is split into
training and testing data and then trees of various sizes are created with the help
of the training data and tested on the test data. Cross-validation can also be used
as part of this approach. Pruning the tree, on the other hand, involves testing the
original tree against pruned versions of it. Leaf nodes are removed from the tree
as long as the pruned tree performs better on the test data than the larger tree.
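
Both approaches can be sketched with scikit-learn, which implements CART-style trees rather than ID3; the dataset and parameter values below are only illustrative. max_depth stops growth early, while cost-complexity pruning (ccp_alpha) grows the tree and then tests pruned versions on held-out data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Pre-pruning: stop the tree before it grows too deep.
shallow = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# 2) Post-pruning: grow the tree, then keep the pruned version that scores best on test data.
path = DecisionTreeClassifier().cost_complexity_pruning_path(X_train, y_train)
pruned = [DecisionTreeClassifier(ccp_alpha=a).fit(X_train, y_train) for a in path.ccp_alphas]
best = max(pruned, key=lambda t: t.score(X_test, y_test))

print("shallow tree accuracy:", shallow.score(X_test, y_test))
print("best pruned tree accuracy:", best.score(X_test, y_test))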

(OR)

Incorporating continuous valued attributes:


Our initial definition of ID3 is restricted to attributes that take on a discrete set of values. One way to make the ID3 algorithm more useful with continuous variables is to turn them, in a way, into discrete variables. Let’s say that in our example of Play Badminton the temperature is continuous (see the following table); we could then test the information gain of certain partitions of the temperature values, such as temperature > 42.5. Typically, whenever the classification changes from no to yes or from yes to no, the average of the two temperatures is taken as a potential partition boundary.
Because 42 corresponds to No and 43 corresponds to Yes, 42.5 becomes a candidate. If one of these partitions ends up exhibiting the greatest information gain, it is used as an attribute and temperature is removed from the set of potential attributes to split on.
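
A short Python sketch of this thresholding idea, on made-up temperature data (the values and helper names such as info_gain are illustrative): candidate thresholds are the midpoints where the class label flips, and each candidate is scored by its information gain.

from math import log2

def entropy(labels):
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((n / len(labels)) * log2(n / len(labels)) for n in counts.values())

def info_gain(temps, labels, threshold):
    left  = [l for t, l in zip(temps, labels) if t <= threshold]
    right = [l for t, l in zip(temps, labels) if t > threshold]
    remainder = (len(left) / len(labels)) * entropy(left) \
              + (len(right) / len(labels)) * entropy(right)
    return entropy(labels) - remainder

temps  = [40, 42, 43, 45, 48]                 # sorted continuous attribute values
labels = ["No", "No", "Yes", "Yes", "No"]     # classification for each value

# Candidate thresholds: midpoints where the label changes (e.g., 42.5 between 42/No and 43/Yes).
candidates = [(temps[i] + temps[i + 1]) / 2
              for i in range(len(temps) - 1) if labels[i] != labels[i + 1]]
print({c: round(info_gain(temps, labels, c), 3) for c in candidates})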

3 b) Elaborate on Classification And Regression Trees (CART) with examples.
Ans)
Decision trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent variable) based on the values of several input (or independent) variables. Here we discuss the CART decision tree methodology. The CART or Classification & Regression Trees methodology was introduced in 1984 by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone as an umbrella term to refer to the following types of decision trees:
 Classification Trees: where the target variable is categorical and the tree is
used to identify the "class" within which a target variable would likely fall
into.

 Regression Trees: where the target variable is continuous and the tree is used to predict its value.

Classification tree example: Consider the widely referenced Iris data classification
problem introduced by Fisher [1936; see also Discriminant Function
Analysis and General Discriminant Analysis (GDA)]. The data file Irisdat reports the
lengths and widths of sepals and petals of three types of irises (Setosa, Versicol,
and Virginic). The purpose of the analysis is to learn how we can discriminate
between the three types of flowers, based on the four measures of width and
length of petals and sepals. Discriminant function analysis will estimate several
linear combinations of predictor variables for computing classification scores (or
probabilities) that allow the user to determine the predicted classification for
each observation. A classification tree will determine a set of logical if-then
conditions (instead of linear equations) for predicting or classifying cases instead:
The interpretation of this tree is straightforward: if the petal width is less than or equal to 0.8, the respective flower would be classified as Setosa; if the petal width is greater than 0.8 and less than or equal to 1.75, then the respective flower would be classified as Versicol; else, it belongs to class Virginic.
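
This classification tree can be reproduced approximately with scikit-learn's CART implementation; the sketch below is illustrative, and the exact thresholds learned (around 0.8 and 1.75 on petal width) depend on the implementation and settings.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)
# Print the learned if-then rules in plain text.
print(export_text(tree, feature_names=list(iris.feature_names)))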

Regression tree Example:


The general approach of deriving predictions from a few simple if-then conditions can be applied to regression problems as well. This example is based on the data file Poverty, which contains 1960 and 1970 Census figures for a random selection of 30 counties. The research question (for that example) was to determine the correlates of poverty, that is, the variables that best predict the percent of families below the poverty line in a county. A reanalysis of those data, using regression tree analysis and v-fold cross-validation, yields the following results:
Again, the interpretation of these results is rather straightforward: counties where the percent of households with a phone is greater than 72% generally have a lower poverty rate. The greatest poverty rate is evident in those counties that show less than (or equal to) 72% of households with a phone, and where the population change (from the 1960 census to the 1970 census) is less than -8.3 (minus 8.3). These results are straightforward, easily presented, and intuitively clear as well: there are some affluent counties (where most households have a telephone), and those generally have little poverty. Then there are counties that are generally less affluent, and among those the ones that shrank the most showed the greatest poverty rate. A quick review of the scatterplot of observed vs. predicted values shows how the discrimination between the latter two groups is particularly well "explained" by the tree model.
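
A regression tree in the same spirit can be sketched with scikit-learn; since the Poverty data file is not included in these notes, the column names (pct_phone, pop_change, pct_poverty) and the synthetic values below are made up for illustration only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
pct_phone  = rng.uniform(40, 95, size=30)    # % of households with a phone (synthetic)
pop_change = rng.uniform(-20, 20, size=30)   # population change, 1960 -> 1970 (synthetic)
# Synthetic target: poverty is higher where phone ownership is low and the population shrank.
pct_poverty = 35 - 0.3 * pct_phone - 0.4 * np.minimum(pop_change, 0) + rng.normal(0, 2, size=30)

X = np.column_stack([pct_phone, pop_change])
reg = DecisionTreeRegressor(max_depth=2).fit(X, pct_poverty)
print(export_text(reg, feature_names=["pct_phone", "pop_change"]))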

(OR)
Visit the links below for CART examples:
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
https://www.datasciencecentral.com/profiles/blogs/introduction-to-classification-regression-trees-cart
