
Different Classifiers

J48 decision tree- builds a model for the target value based on the attribute values of the available
data. It grows the tree top-down, splitting first on the attribute with the highest information
gain and then repeating on each branch.
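
A minimal sketch of the idea in Python, using scikit-learn's DecisionTreeClassifier with the entropy criterion as a rough analogue of Weka's J48 (J48 implements C4.5, while scikit-learn's tree is CART-based); the iris dataset is just a convenient stand-in.

```python
# Rough analogue of J48: a decision tree that chooses splits by information gain
# (entropy). This only approximates C4.5's behaviour.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

print(tree.predict(X[:5]))         # predicted classes for a few instances
print(tree.feature_importances_)   # attributes that contributed most to the splits
```
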
OneR (One rule)
-generates one rule for each predictor in the data (mapping each value of that predictor to its
most frequent class) and selects the rule with the smallest total error.
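
OneR is simple enough to write out by hand; here is a minimal Python sketch, with toy rows and column names invented purely for illustration.

```python
# OneR sketch: for each attribute, map each of its values to the majority class,
# then keep the single attribute whose rule makes the fewest errors.
from collections import Counter

def one_r(rows, target):
    """rows: list of dicts; target: name of the class column."""
    best = (None, None, float("inf"))          # (attribute, rule, errors)
    for attr in (a for a in rows[0] if a != target):
        counts = {}
        for r in rows:                         # tally classes per attribute value
            counts.setdefault(r[attr], Counter())[r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(r[target] != rule[r[attr]] for r in rows)
        if errors < best[2]:
            best = (attr, rule, errors)
    return best

rows = [  # invented toy data
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
print(one_r(rows, "play"))   # best attribute, its one rule, and its error count
```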

Naive Bayes
-called naive Bayes (or "idiot Bayes") because the calculation of the probabilities for each
hypothesis is simplified to make it tractable
- based on Bayes' rule of conditional probability. It predicts outcomes for new data from prior
probabilities estimated on the training set. Downside: it assumes all predictors are independent
and largely ignores correlations between attributes and values, which is unrealistic in real life
(it is called naive for that reason). It is known to be a fast and decent classifier but a bad
estimator, so its probability outputs are not accurate.
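
A minimal scikit-learn sketch (GaussianNB is one common variant; the built-in dataset and train/test split are arbitrary choices). Note the caveat above: the probability outputs should be treated with caution.

```python
# Naive Bayes sketch: class predictions are often reasonable, but the
# independence assumption tends to make predict_proba poorly calibrated.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print(nb.score(X_test, y_test))       # accuracy as a classifier
print(nb.predict_proba(X_test[:3]))   # probability estimates: use with caution
```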

Random forest
-creates a set of decision trees from randomly selected subsets of the training set, then
combines them by voting: each tree gives a classification (we say the tree "votes" for that
class), and the forest predicts the class with the most votes (the mode of the trees' classes for
classification, or the mean of their predictions for regression).
-pro- reduces overfitting (fitting too many parameters relative to the number of observations).
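
A minimal scikit-learn sketch of the voting idea; the number of trees and the dataset are arbitrary illustrative choices.

```python
# Random forest sketch: each tree is trained on a bootstrap sample of the
# training set (with random feature subsets at each split); the forest predicts
# the class that receives the most votes across trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.predict(X[:5]))        # majority vote over the 100 trees
print(forest.predict_proba(X[:1]))  # averaged class probabilities across trees
```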

IBK (k-nearest neighbors algorithm)- makes predictions by using the training dataset directly.
It predicts by searching through the entire training set for the k most similar instances
(the neighbors) and summarizing the output variable over those k instances (e.g., the mode for
classification or the mean for regression). The nearest neighbors are typically found with the
Euclidean distance formula.
-pros- very fast training. Cons- prediction takes more time and memory, and it struggles with a
large number of inputs.
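
A minimal sketch of the same idea using scikit-learn's KNeighborsClassifier (the counterpart of Weka's IBk); the points below are invented for illustration.

```python
# k-NN sketch: "training" just stores the data; prediction finds the k nearest
# stored points by Euclidean distance and takes the majority class among them.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])  # invented points
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict([[1.1, 0.9], [4.8, 5.2]]))   # -> [0 1]

# The distance itself is just the Euclidean norm between points:
print(np.linalg.norm(np.array([1.1, 0.9]) - X, axis=1))
```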

SVM (support vector machine) - An algorithm that plots all the data points in n-dimensional
space and classifies by finding the hyperplane/boundary that best separates the classes.
-pros- performs well when there is a clear margin of separation between classes; can apply
both linear and non-linear classification techniques to the data.
-cons- doesn't perform well on large datasets or when the data set is noisy (the classes
overlap heavily).
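
A minimal scikit-learn sketch contrasting a linear and a non-linear (RBF kernel) SVM; the dataset and the scaling step are illustrative assumptions.

```python
# SVM sketch: find the maximum-margin hyperplane separating the classes.
# A linear kernel gives a linear boundary; an RBF kernel gives a non-linear one.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_train, y_train)
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

print(linear_svm.score(X_test, y_test), rbf_svm.score(X_test, y_test))
```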

ZeroR- the simplest classification method. It predicts the majority category (it finds the most
popular class and guesses it every time - basic probability). It ignores all attributes and relies
only on the class distribution of the training set, so it has no real predictive power. A trivial
classifier, but it gives you a baseline. (Look at the baseline first: if the baseline beats all the
other classifiers, or the other classifiers have very low accuracy, then examine the attributes
and check whether they are actually significant.)
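
scikit-learn has no classifier named ZeroR, but DummyClassifier(strategy="most_frequent") behaves the same way; a minimal baseline sketch on an arbitrary built-in dataset:

```python
# ZeroR sketch: always predict the majority class of the training set; its
# accuracy is the baseline any real classifier should beat.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier

X, y = load_breast_cancer(return_X_y=True)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

print(baseline.score(X, y))     # equals the proportion of the majority class
print(baseline.predict(X[:5]))  # the same class every time
```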

Statistics

Logistic Regression
Logistic regression is the type of analysis used in statistics to predict outcomes when the
dependent (outcome) variable is dichotomous (yes/no, success/failure).
The attributes are treated as independent variables; they can be either measurement
variables (e.g., age) or categorical/dummy variables (e.g., education, marital status).
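
The notes below refer to R's glm(); as a Python stand-in, here is a minimal sketch using statsmodels' logit on simulated data. All variable names, coefficients, and the sample size are invented for illustration.

```python
# Logistic regression sketch: a dichotomous outcome (deposit yes/no) modeled from
# a measurement variable (age) and a dummy variable (loan). Data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
age = rng.integers(20, 70, n)
loan = rng.integers(0, 2, n)                    # 0 = no loan, 1 = has a loan
true_logit = -3.0 + 0.05 * age - 0.8 * loan     # made-up "true" model
deposit = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

df = pd.DataFrame({"deposit": deposit, "age": age, "loan": loan})
model = smf.logit("deposit ~ age + C(loan)", data=df).fit()
print(model.summary())   # coefficients, standard errors, z-statistics, p-values
```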

The C-statistic (AUC statistic) measures the goodness of fit of the logistic model. A value of 0.5
means the model discriminates no better than chance; values between 0.70 and 0.80 are generally
considered good, and values above 0.80 indicate a strong model. Here we have 0.88, which is a
really strong model.
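
A quick sketch of how the c-statistic is computed from predicted probabilities, using scikit-learn's roc_auc_score; the labels and probabilities below are invented.

```python
# c-statistic / AUC sketch: the probability that a randomly chosen positive case
# receives a higher predicted probability than a randomly chosen negative case.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]                   # invented outcomes
p_hat = [0.10, 0.50, 0.35, 0.80, 0.70, 0.45]  # invented model probabilities
print(roc_auc_score(y_true, p_hat))           # ~0.89 here; 0.5 = chance, 1.0 = perfect
```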

Odds Ratio Table


*By default, R chooses the lowest-coded category of a categorical variable as the reference
group. Many statistical packages report the Wald chi-square as the test statistic for the slope in
a logistic regression. The Wald test statistic is (estimate/SE)², which follows a chi-square
distribution with 1 degree of freedom. The glm() procedure in R reports a z-statistic,
z = estimate/SE, which is the square root of the Wald chi-square. These two approaches give
identical p-values for the slopes.
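
A small numeric check of that relationship, with an invented coefficient estimate and standard error.

```python
# Wald test sketch: z = estimate/SE, Wald chi-square = z^2 (1 df), and the
# two-sided z test gives the same p-value as the chi-square test.
from scipy import stats

estimate, se = -0.78, 0.21                   # hypothetical slope and SE
z = estimate / se
wald_chi2 = z ** 2

p_from_z = 2 * stats.norm.sf(abs(z))         # two-sided z test
p_from_chi2 = stats.chi2.sf(wald_chi2, 1)    # chi-square test with 1 df
print(z, wald_chi2, p_from_z, p_from_chi2)   # the two p-values are identical
```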

-Confidence intervals that do not contain the value 1 indicate a significant association at the
p < 0.05 level (for 95 percent confidence intervals). An interval that includes 1 means there is no
difference between the two groups being compared with respect to what you are measuring.
-ex: for the significant variable loan (0 = no loan, 1 = has a loan), the odds of making a bank
deposit are 54 percent lower for those who currently have a loan than for those who currently
don't have a loan.
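
A tiny sketch of the arithmetic behind that interpretation; the coefficient below is invented so that it reproduces the 54 percent figure.

```python
# Odds ratio sketch: exponentiating a logistic-regression coefficient gives the
# odds ratio; (1 - OR) * 100% is the percent decrease in odds when OR < 1.
import numpy as np

coef_loan = -0.78                 # hypothetical log-odds coefficient for loan = 1
odds_ratio = np.exp(coef_loan)    # ~0.46
print(odds_ratio)
print(f"odds of a deposit are {(1 - odds_ratio) * 100:.0f}% lower with a loan")
```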
