By: Vatsal J. Gajera
(09BCE010)
What is Machine Learning?
It is a branch of artificial intelligence. It is
a scientific discipline concerned with the
design and development of algorithms
that allow computers to evolve behaviours
based on empirical data, such as data from
sensors and databases.
Technical Definition of
Machine Learning:
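So Info(D) = -(3/7) log2(3/7) - (4/7) log2(4/7)
= 0.9852
For age there are 2 young, 2 middle, and 3 senior tuples. Both young tuples are
in the “no” class, both middle tuples are in the “yes” class, and among the
senior tuples 1 is in the “yes” class and 2 are in the “no” class.
So Info_age(D) = (2/7) * (-(2/2) log2(2/2) - (0/2) log2(0/2))
+ (2/7) * (-(2/2) log2(2/2) - (0/2) log2(0/2))
+ (3/7) * (-(1/3) log2(1/3) - (2/3) log2(2/3))
= 0 + 0 + (3/7) * 0.9183
= 0.3936
Here 0 log2(0) is taken as 0, so the pure young and middle partitions
contribute nothing.
So Gain(age) = Info(D) - Info_age(D)
= 0.9852 - 0.3936
≈ 0.5917 (computed from the unrounded values)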
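The arithmetic for the age attribute can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `entropy` and `info_gain` are illustrative, logarithms are base 2, and empty classes are skipped so that 0 log2(0) counts as 0.

```python
from math import log2

def entropy(counts):
    """Info(D) for a list of class counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, partitions):
    """Gain = Info(D) minus the size-weighted entropy of the partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent_counts) - weighted

# Class counts are written as [yes, no].
parent = [3, 4]                             # 3 "yes" and 4 "no" tuples in D
age_partitions = [[0, 2], [2, 0], [1, 2]]   # young, middle, senior

info_d = entropy(parent)
gain_age = info_gain(parent, age_partitions)
print(round(info_d, 4), round(gain_age, 4))  # 0.9852 0.5917
```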
Just as we calculated the gain for age, we have to calculate the gain for every
attribute. The attribute with the highest gain value becomes the split
node.
[Figure: decision tree with AGE as the root node and three branches: Young,
Middle, and Senior. The nodes below each branch are labelled Income, Student,
Credit_rating, and Class: Buy or Not.]
2. Gain Ratio:
The information gain measure is biased toward tests with many
outcomes. That is, it prefers to select attributes having a large number of
values. For example, consider an attribute that acts as a unique identifier, such
as product_ID. A split on it produces as many partitions as there are tuples,
each holding a single tuple and therefore pure, so Info_product_ID(D) = 0.
The information gain is then maximal, yet such a split is useless for
classification.
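The bias can be seen on the 7-tuple example. The sketch below assumes a hypothetical product_ID column that is unique per tuple (this column is not in the original data); every partition then holds one tuple, its entropy is 0, and the gain equals Info(D), the largest value possible.

```python
from math import log2

def entropy(counts):
    """Entropy of a list of class counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Class labels of the 7 tuples: 3 "yes" and 4 "no".
labels = ["yes", "yes", "yes", "no", "no", "no", "no"]

# Hypothetical unique identifier: one partition per tuple, counts as [yes, no].
id_partitions = [[1, 0] if label == "yes" else [0, 1] for label in labels]

info_d = entropy([3, 4])
info_id = sum(sum(p) / len(labels) * entropy(p) for p in id_partitions)
gain_id = info_d - info_id

print(info_id == 0)       # True: every single-tuple partition is pure
print(gain_id == info_d)  # True: the gain reaches its maximum
```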
For our example there are 2 tuples for young, 2 for middle, and 3 for senior,
so SplitInfo_age(D) = -(2/7) log2(2/7) - (2/7) log2(2/7) - (3/7) log2(3/7)
= 1.5567
The attribute with the maximum gain ratio is selected as the split node.
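As a rough check, the gain ratio for age can be computed using the standard C4.5 definition, GainRatio(A) = Gain(A) / SplitInfo_A(D). The sketch below assumes base-2 logarithms and the class counts from the age example; the variable names are illustrative. SplitInfo is the entropy of the partition sizes, so the same helper serves both quantities.

```python
from math import log2

def entropy(counts):
    """Entropy of a list of counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Class counts per age value, written as [yes, no].
age_partitions = [[0, 2], [2, 0], [1, 2]]   # young, middle, senior
sizes = [sum(p) for p in age_partitions]    # [2, 2, 3]
total = sum(sizes)

info_d = entropy([3, 4])
info_age = sum(n / total * entropy(p) for n, p in zip(sizes, age_partitions))
gain_age = info_d - info_age

split_info_age = entropy(sizes)             # entropy of the partition sizes
gain_ratio_age = gain_age / split_info_age

print(round(split_info_age, 4), round(gain_ratio_age, 4))  # 1.5567 0.3801
```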
3. Gini Index:
The Gini index is used in CART. It measures the
impurity of D:
Gini(D) = 1 - ∑ Pi^2 (for i = 1 to m)
where Pi is the probability that a tuple in D belongs to class Ci and m is the
number of classes.
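For the 7-tuple example used earlier (3 "yes" and 4 "no"), the Gini index can be computed as follows. This is a minimal sketch; the function name `gini` is illustrative, and Pi is estimated as the fraction of tuples in class Ci.

```python
def gini(counts):
    """Gini(D) = 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

gini_d = gini([3, 4])     # whole data set: 3 "yes", 4 "no"
gini_pure = gini([2, 0])  # a pure partition, such as the middle-aged tuples

print(round(gini_d, 4))   # 0.4898
print(gini_pure)          # 0.0
```

A pure partition has a Gini index of 0, and the value grows as the classes become more evenly mixed.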