
R Machine Learning

Trainer: Dr Ghazaleh Babanejad

Website: www.tertiarycourses.com.my
Email: enquiry@tertiaryinfotech.com
About the Trainer
Dr Ghazaleh Babanejad received her PhD from
Universiti Putra Malaysia, Faculty of
Computer Science and Information Technology. Her
PhD thesis is on recommender systems in the field of skyline
queries over dynamic and incomplete databases.
She also works in the Data Science field as a
trainer and Data Scientist, and has worked on Machine
Learning and Process Mining projects. She holds
several international certificates, including Practical Machine
Learning (Johns Hopkins University), Mining Massive
Datasets (Stanford University), Process Mining
(Eindhoven University), Hadoop (University of San Diego)
and MongoDB for DBAs (MongoDB Inc).
She has more than 5 years of experience
as a lecturer and database administrator.
Agenda
Module 1 Introduction to Machine Learning
- What is Machine Learning
- Installing mlr package
- Supervised vs Unsupervised Learning
- Regression vs Classification

Module 2 Datasets
- Iris Dataset
- Boston Housing Price Dataset
- Mtcars Dataset

Module 3 Preprocessing
- Sampling
- Impute missing values
- Normalize columns
- Split data into train and test set
Agenda
Module 4 Regression based Models
- What is Supervised Learning
- Linear Regression
- Logistic Regression Classifier
- Linear Discriminant Analysis

Module 5 Tree based Models
- Decision Tree
- Random Forest
- Gradient Boost
- XG Boost

Module 6 Nearest Neighbor Models
- Naive Bayes
- KNN
Agenda
Module 7 Unsupervised Learning
- What is Unsupervised Learning
- Clustering
- Dimensionality Reduction

Module 8 Intro to Neural Network (Optional)
- What is Neural Network
- Multi Layer Perceptron
Prerequisite
Basic knowledge of R is assumed
Exercise Files
Download the exercise file from
https://github.com/rkrtiwari/rMachineLearning
Module 1
Getting Started
What is Machine Learning?
• Machine Learning is about building
programs with tunable parameters that
are adjusted automatically so as to
improve their behavior by adapting to
previously seen data
• Machine Learning is a subfield of
Artificial Intelligence
Why Machine Learning?
http://www.goratings.org/
Machine Learning
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
– Clustering
– Dimensionality Reduction
R Packages for ML
• rpart
• randomForest
• e1071
• glmnet
• nnet
• class
• FNN
• xgboost
• MASS (for lda)
Installing and Loading R ML Packages

install.packages("mlr")
library(mlr)
Module 2
Datasets
Iris Flower Dataset

setosa (0) versicolor (1) virginica (2)


Iris flower dataset, introduced in 1936 by
Sir Ronald Fisher
Iris Flower Dataset
Features in the Iris dataset:
• sepal length in cm
• sepal width in cm
• petal length in cm
• petal width in cm
Target classes to predict:
• setosa
• versicolor
• virginica
Load Iris Dataset
data(iris)
dim(iris)
levels(iris$Species)
head(iris)
Boston Housing Price Dataset
There are 13 features and one target (MEDV) in this dataset.
In MASS::Boston the column names are lowercase (e.g. medv).
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT % lower status of the population
- MEDV median value of owner-occupied homes in $1000's (target)
Load Boston Housing Dataset
library(MASS)
Boston
dim(Boston)
head(Boston)
Mtcars Dataset
Motor Trend Car (mtcars) dataset
There are 11 features for this dataset.
- mpg Miles/(US) gallon
- cyl Number of cylinders
- disp Displacement (cu.in.)
- hp Gross horsepower
- drat Rear axle ratio
- wt Weight (1000 lbs)
- qsec 1/4 mile time
- vs Engine (0 = V-shaped, 1 = straight)
- am Transmission (0 = automatic, 1 = manual)
- gear Number of forward gears
- carb Number of carburetors
Load MTCars Dataset
mtcars
dim(mtcars)
head(mtcars)
Module 3
Pre-processing data
Sampling
## selecting rows and columns

iris2.1=subset(iris,
select=c("Sepal.Length","Sepal.Width"))
# select only these 2 columns

iris2.2=iris[1:100, ]
# select the first 100 rows
Sampling
## random sampling

# take a random sample of 50 rows from the iris dataset
# sample without replacement
myiris <- iris[sample(1:nrow(iris), 50,
replace=FALSE),]
Impute missing values

data(airquality)
aqr=airquality
summary(aqr)

imp = impute(aqr, classes = list(integer = imputeMean(),
factor = imputeMode()), dummy.classes = "integer")
summary(imp$data)
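What mean imputation does can be sketched in base R, without mlr. This is only an illustration of the idea, not how mlr implements it; the helper name impute_mean is hypothetical.

```r
# Minimal base-R sketch of mean imputation (helper name is hypothetical)
data(airquality)
aq <- airquality

# Replace NA entries of a numeric vector with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Count missing values per column before imputation
colSums(is.na(aq))

# Apply the helper to every numeric column
num_cols <- sapply(aq, is.numeric)
aq[num_cols] <- lapply(aq[num_cols], impute_mean)

# No missing values should remain
colSums(is.na(aq))
```

Filling with the mean leaves each column's mean unchanged but shrinks its variance, which is worth keeping in mind when many values are missing.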
Normalize data
## normalize columns
iris2.4 = normalizeFeatures(iris[,1:4],
method = "range")

summary(iris2.4)
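Range normalization rescales each column to the interval [0, 1] using (x - min) / (max - min). A minimal base-R sketch of that formula follows; the helper name range_scale is hypothetical and this is not mlr's own code.

```r
# Min-max ("range") scaling of a numeric vector to [0, 1]
range_scale <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply it to the four numeric iris columns
iris_scaled <- as.data.frame(lapply(iris[, 1:4], range_scale))

# Each column should now run from 0 to 1
sapply(iris_scaled, range)
```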
Train and test set
## create train and test set

nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
iris.train <- iris[inTrain,]
iris.test <- iris[-inTrain,]
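Because sample() is random, the split changes on every run. A small sketch of the same split made repeatable with set.seed(); the seed value 42 is an arbitrary assumption.

```r
set.seed(42)                        # any fixed seed makes the split repeatable
nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6 * nr)   # 60% of the row indices, without replacement
iris.train <- iris[inTrain, ]
iris.test  <- iris[-inTrain, ]

nrow(iris.train)                    # 90 rows for training
nrow(iris.test)                     # 60 rows for testing

# No row appears in both sets
length(intersect(rownames(iris.train), rownames(iris.test)))
```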
Module 4
Regression Models
What is Supervised Learning
• In Supervised Learning, we have a
dataset consisting of both features and
labels.
• The input data (X) is associated with a
target label (y)
Supervised Learning Examples
• Spam Email Filter
• Tumor Classification
Classification Steps
# Step 1: Load classifier library
library(package)

# Step 2: Split the data
index <- sample(....prob = c(0.6, 0.4))

# Step 3: Training
model <- classifier(y ~ ., data = train)

# Step 4: Prediction
class <- predict(model, newdata = test)
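One concrete instance of these four steps, as a sketch. It assumes the rpart package (chosen here only as an example classifier), the iris data, and an index-based 60/40 split instead of the prob-based one outlined above; the seed is arbitrary.

```r
# Step 1: load the classifier library (rpart is just an example choice)
library(rpart)

# Step 2: split the data 60/40
set.seed(1)
inTrain <- sample(1:nrow(iris), 0.6 * nrow(iris))
train <- iris[inTrain, ]
test  <- iris[-inTrain, ]

# Step 3: training
model <- rpart(Species ~ ., data = train)

# Step 4: prediction (type = "class" returns labels rather than probabilities)
pred <- predict(model, newdata = test, type = "class")

# Fraction of test rows classified correctly
accuracy <- mean(pred == test$Species)
accuracy
```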
Multiple Linear Regression
Split the Boston dataset
### Splitting data
library(MASS)  # provides the Boston dataset
data(Boston)
nr <- nrow(Boston)
inTrain <- sample(1:nr, 0.6*nr)
bh.train <- Boston[inTrain,]
bh.test <- Boston[-inTrain,]
Making tasks and learner
### Making Tasks
library(mlr)

regr.task = makeRegrTask(id = "bh", data = bh.train,
target= "medv")
regr.task

### Making learner
regr.lrn = makeLearner("regr.lm")
regr.lrn
Train the model
mod = train(regr.lrn, regr.task)
mod

names(mod)
getLearnerModel(mod)
Make predictions
regr.pred = predict(mod, newdata =
bh.test)
regr.pred

performance(regr.pred, measures =
list(rmse))

head(getPredictionTruth(regr.pred))

head(getPredictionResponse(regr.pred))
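To see what the rmse measure reports, the same kind of model can be fitted with lm() directly and the error computed by hand. This is a sketch under assumptions: a fresh random split with an arbitrary seed, so the number will not match the mlr run above.

```r
library(MASS)   # provides the Boston data

set.seed(1)
nr <- nrow(Boston)
inTrain <- sample(1:nr, 0.6 * nr)
bh.train <- Boston[inTrain, ]
bh.test  <- Boston[-inTrain, ]

# Multiple linear regression of medv on all other columns
fit <- lm(medv ~ ., data = bh.train)
pred <- predict(fit, newdata = bh.test)

# Root mean squared error: square root of the mean squared difference
rmse_value <- sqrt(mean((bh.test$medv - pred)^2))
rmse_value
```

Since medv is in units of $1000, the RMSE reads directly as a typical prediction error in thousands of dollars.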
Visualize Results
plotLearnerPrediction(regr.lrn,
features="lstat", task=regr.task)
Ex: Multiple Linear Regression
Use the mlr regressor to build a model to
predict median house price (MEDV) using the
Boston dataset

Time: 5 mins
Logistic Regression Classifier
Split the Iris Dataset
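# two-class subset of iris (logistic regression needs a binary target)
data(iris)
iris2 = subset(iris, Species %in% c("versicolor","virginica"))
iris2$Species = factor(iris2$Species)
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]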
Making task and learner
log.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
log.task

log.lrn = makeLearner("classif.logreg")
# add predict.type = "prob" if you want probabilities

log.lrn
Train the model
mod = train(log.lrn, log.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
log.pred = predict(mod, newdata =
ir2.test)
log.pred

performance(log.pred, measures =
list(mmce, acc))

head(getPredictionTruth(log.pred))

head(getPredictionResponse(log.pred))
Verify Model
### Confusion Matrix
calculateConfusionMatrix(log.pred)

### ROC curve
# for ROC the prediction must be type "prob"
df = generateThreshVsPerfData(log.pred,
measures = list(fpr, tpr, mmce))
plotROCCurves(df)
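The confusion matrix and the acc and mmce measures can also be worked out by hand. The sketch below uses glm() directly on the two-class iris subset; the choice of two predictors, the 0.5 threshold and the seed are assumptions made for illustration.

```r
data(iris)
iris2 <- subset(iris, Species %in% c("versicolor", "virginica"))
iris2$Species <- factor(iris2$Species)

set.seed(1)
inTrain <- sample(1:nrow(iris2), 0.6 * nrow(iris2))
train <- iris2[inTrain, ]
test  <- iris2[-inTrain, ]

# Logistic regression on two predictors
fit <- glm(Species ~ Petal.Length + Petal.Width, data = train, family = binomial)

# Predicted probability of the second factor level (virginica)
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "virginica", "versicolor"),
               levels = levels(test$Species))

# Confusion matrix: rows are true classes, columns are predicted classes
cm <- table(truth = test$Species, predicted = pred)
cm

acc  <- sum(diag(cm)) / sum(cm)   # accuracy: share of correct predictions
mmce <- 1 - acc                   # mean misclassification error
c(acc = acc, mmce = mmce)
```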
Visualize Results
plotLearnerPrediction(log.lrn,
features=c("Petal.Length","Petal.Width"),
task=log.task)
Ex: Logistic Regression Classifier
Use Logistic regression to build a model to
predict am variable using mtcars dataset

Time: 5 mins
Linear Discriminant Analysis
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
lda.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
lda.task

lda.lrn = makeLearner("classif.lda")

lda.lrn
Train the model
mod = train(lda.lrn, lda.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
lda.pred = predict(mod, newdata =
ir2.test)
lda.pred

performance(lda.pred, measures =
list(mmce, acc))

head(getPredictionTruth(lda.pred))

head(getPredictionResponse(lda.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(lda.pred)
Visualize Results
plotLearnerPrediction(lda.lrn,
features=c("Petal.Length","Petal.Width"),
task=lda.task)
Ex: Linear Discriminant
Use LDA to build a model to predict gear
variable using mtcars dataset

Time: 5 mins
Decision Tree
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
rpart.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
rpart.task

rpart.lrn = makeLearner("classif.rpart")

rpart.lrn
Train the model
mod = train(rpart.lrn, rpart.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
rpart.pred = predict(mod, newdata =
ir2.test)
rpart.pred

performance(rpart.pred, measures =
list(mmce, acc))

head(getPredictionTruth(rpart.pred))

head(getPredictionResponse(rpart.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(rpart.pred)
Visualize Results
plotLearnerPrediction(rpart.lrn,
features=c("Petal.Length","Petal.Width"),
task=rpart.task)
Ex: Decision Tree
Use Decision Tree to build a model to
predict gear variable using mtcars dataset

Time: 5 mins
Random Forest
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
rf.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
rf.task

rf.lrn = makeLearner("classif.randomForest")

rf.lrn
Train the model
mod = train(rf.lrn, rf.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
rf.pred = predict(mod, newdata =
ir2.test)
rf.pred

performance(rf.pred, measures =
list(mmce, acc))

head(getPredictionTruth(rf.pred))

head(getPredictionResponse(rf.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(rf.pred)
Visualize Results
plotLearnerPrediction(rf.lrn,
features=c("Petal.Length","Petal.Width"),
task=rf.task)
Ex: Random Forest
Use Random Forest to build a model to
predict gear variable using mtcars
dataset

Time: 5 mins
Gradient Boost
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
gbm.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
gbm.task

gbm.lrn = makeLearner("classif.gbm")

gbm.lrn
Train the model
mod = train(gbm.lrn, gbm.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
gbm.pred = predict(mod, newdata =
ir2.test)
gbm.pred

performance(gbm.pred, measures =
list(mmce, acc))

head(getPredictionTruth(gbm.pred))

head(getPredictionResponse(gbm.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(gbm.pred)
Visualize Results
plotLearnerPrediction(gbm.lrn,
features=c("Petal.Length","Petal.Width"),
task=gbm.task)
Ex: Gradient Boost
Use GBM tree to build a model to predict
gear variable using mtcars dataset

Time: 5 mins
XG Boost
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
xg.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
xg.task

xg.lrn = makeLearner("classif.xgboost")

xg.lrn
Train the model
mod = train(xg.lrn, xg.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
xg.pred = predict(mod, newdata =
ir2.test)
xg.pred

performance(xg.pred, measures =
list(mmce, acc))

head(getPredictionTruth(xg.pred))

head(getPredictionResponse(xg.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(xg.pred)
Visualize Results
plotLearnerPrediction(xg.lrn,
features=c("Petal.Length","Petal.Width"),
task=xg.task)
Ex: XG Boost
Use XG boost tree to build a model to
predict gear variable using mtcars
dataset

Time: 5 mins
Naïve Bayes
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
nb.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
nb.task

nb.lrn = makeLearner("classif.naiveBayes")

nb.lrn
Train the model
mod = train(nb.lrn, nb.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
nb.pred = predict(mod, newdata =
ir2.test)
nb.pred

performance(nb.pred, measures =
list(mmce, acc))

head(getPredictionTruth(nb.pred))

head(getPredictionResponse(nb.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(nb.pred)
Visualize Results
plotLearnerPrediction(nb.lrn,
features=c("Petal.Length","Petal.Width"),
task=nb.task)
Ex: naiveBayes
Use naiveBayes to build a model to
predict gear variable using mtcars
dataset

Time: 5 mins
k Nearest Neighbour
Split the Iris Dataset
nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
knn.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
knn.task

knn.lrn = makeLearner("classif.knn")

knn.lrn
Train the model
mod = train(knn.lrn, knn.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
knn.pred = predict(mod, newdata =
ir2.test)
knn.pred

performance(knn.pred, measures =
list(mmce, acc))

head(getPredictionTruth(knn.pred))

head(getPredictionResponse(knn.pred))
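For comparison, a minimal sketch that calls knn() from the class package directly, here on the full three-class iris data. The value k = 3 and the seed are arbitrary assumptions.

```r
library(class)   # provides knn()

set.seed(1)
inTrain <- sample(1:nrow(iris), 0.6 * nrow(iris))
train_x <- iris[inTrain, 1:4]      # numeric features only
test_x  <- iris[-inTrain, 1:4]
train_y <- iris$Species[inTrain]
test_y  <- iris$Species[-inTrain]

# Each test row gets the majority class of its 3 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

table(truth = test_y, predicted = pred)
knn_acc <- mean(pred == test_y)
knn_acc
```

kNN has no separate training step: the training rows themselves are the model, and all the work happens at prediction time.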
Verify Model
### Confusion Matrix

calculateConfusionMatrix(knn.pred)
Visualize Results
plotLearnerPrediction(knn.lrn,
features=c("Petal.Length","Petal.Width"),
task=knn.task)
Ex: k Nearest Neighbour
Use kNN to build a model to predict gear
variable using mtcars dataset

Time: 5 mins
Support Vector Machines
Split the Iris Dataset
### Splitting data
data(iris)
iris2=subset(iris, subset=iris$Species
%in% c("versicolor","virginica"))
iris2$Species=factor(iris2$Species)

nr <- nrow(iris2)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris2[inTrain,]
ir2.test <- iris2[-inTrain,]
Making task and learner
svm.task = makeClassifTask(id = "ir2", data
= ir2.train, target= "Species")
svm.task

svm.lrn = makeLearner("classif.svm")

svm.lrn
Train the model
mod = train(svm.lrn, svm.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
svm.pred = predict(mod, newdata =
ir2.test)
svm.pred

performance(svm.pred, measures =
list(mmce, acc))

head(getPredictionTruth(svm.pred))

head(getPredictionResponse(svm.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(svm.pred)
Visualize Results
plotLearnerPrediction(svm.lrn,
features=c("Petal.Length","Petal.Width"),
task=svm.task)
Ex: support vector machines
Use svm to build a model to predict am
variable using mtcars dataset

Time: 5 mins
Unsupervised Learning
What is Unsupervised Learning
• In Unsupervised Learning, we have a
dataset consisting of features but
without labels.
• The most common method is cluster
analysis, which is used for exploratory
data analysis to find hidden patterns or
grouping in data.
Unsupervised Learning Examples
• Image grouping
• Grouping of drug molecules
K-means clustering
Split the Iris Dataset
### Splitting data
data(iris)

nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris[inTrain,-5]
ir2.test <- iris[-inTrain,-5]
ir2.Class<- iris[-inTrain,5]
## clustering only deals with numeric data
Making task and learner
kmeans.task = makeClusterTask(id = "ir2",
data = ir2.train)
kmeans.task

kmeans.lrn =
makeLearner("cluster.kmeans",
centers=3)
# specify how many clusters you want

kmeans.lrn
Train the model
mod = train(kmeans.lrn, kmeans.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
kmeans.pred = predict(mod, newdata =
ir2.test)
kmeans.pred

head(getPredictionResponse(kmeans.pred))
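Cluster numbers carry no meaning by themselves, so a common check is to cross-tabulate them against known labels where these exist. A base-R sketch with kmeans() on the full iris data; nstart = 10 and the seed are assumptions.

```r
set.seed(1)

# k-means on the four numeric columns, 3 clusters, 10 random restarts
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

km$size      # number of points in each cluster
km$centers   # cluster centroids, one row per cluster

# Compare cluster assignments with the true species (not used in fitting)
table(cluster = km$cluster, species = iris$Species)
```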
Visualize Results
plotLearnerPrediction(kmeans.lrn,
features=c("Petal.Length","Petal.Width"),
task=kmeans.task)
Ex: kMeans
Use kmeans to cluster the mtcars
dataset into 5 groups

Time: 5 mins
Dimensionality Reduction - PCA
cor(iris[,1:4])

names(iris)
plot(iris$Sepal.Length, col = iris$Species)
plot(iris$Sepal.Width, col = iris$Species)

## PCA on full data set
pc2 <- prcomp(iris[,1:4], scale. = TRUE)
pc2$x[1:3,]

plot(pc2$x[,1], col = iris$Species)


Dimensionality Reduction - PCA
iris2 <- scale(iris[,1:4])
iris2[1:5,]%*%pc2$rotation
pc2$x[1:5,]

summary(pc2)
vars <- apply(pc2$x, 2, var)
props <- vars / sum(vars)
cumsum(props)

barplot(cumsum(props))
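The proportions of variance can also be read off the sdev component of the prcomp result, and the scores can be reproduced from the scaled data. A short sketch that checks both; the variable names are illustrative.

```r
pc <- prcomp(iris[, 1:4], scale. = TRUE)

# Variance of each principal component is the squared standard deviation
vars  <- pc$sdev^2
props <- vars / sum(vars)
cumsum(props)          # cumulative proportion of variance explained

# With standardized columns the total variance equals the number of columns
sum(vars)

# Scores are the standardized data multiplied by the rotation matrix
scores <- scale(iris[, 1:4]) %*% pc$rotation
all.equal(unname(scores), unname(pc$x))
```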
Neural Network (optional)
One Layer MLP
Split the Iris Dataset
### Splitting data
data(iris)

nr <- nrow(iris)
inTrain <- sample(1:nr, 0.6*nr)
ir2.train <- iris[inTrain,]
ir2.test <- iris[-inTrain,]
Making task and learner
nn.task = makeClassifTask(id = "ir2", data =
ir2.train, target= "Species")
nn.task

nn.lrn = makeLearner("classif.nnet")
Train the model
mod = train(nn.lrn, nn.task)
mod

names(mod)

getLearnerModel(mod)
Make Prediction
nn.pred = predict(mod, newdata =
ir2.test)
nn.pred

performance(nn.pred, measures =
list(mmce, acc))

head(getPredictionTruth(nn.pred))

head(getPredictionResponse(nn.pred))
Verify Model
### Confusion Matrix

calculateConfusionMatrix(nn.pred)
Visualize Results
plotLearnerPrediction(nn.lrn,
features=c("Petal.Length","Petal.Width"),
task=nn.task)
Ex: Neural Network
Use neural nets to build a model to
predict gear variable using mtcars
dataset

Time: 5 mins
Summary
Parting Message
Q&A
Feedback
https://goo.gl/EDezXH

Thank You!
Ghazaleh Babanejad
ghazaleh.babanejad@gmail.com
01123005257
