Decision trees are a type of supervised machine learning. They use known training data to create a process that predicts the results of that data. That process can then be used to predict the results for unknown data.

A decision tree processes data into groups based on the values of the data and the features it is provided. At each decision gate, the data gets split into two separate branches, which might be split again. The final groupings of the data are called leaf nodes.

Decision trees can be used for regression, to get a real numeric value, or for classification, to split data into distinct categories.
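
As a quick illustration, here is a minimal sketch of both modes using scikit-learn (an assumed choice of library; the primer does not name one):

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Classification: predict a distinct category from known training data.
    clf = DecisionTreeClassifier().fit([[0], [1], [2], [3]], ["a", "a", "b", "b"])
    print(clf.predict([[2.5]]))   # -> ['b']

    # Regression: predict a real numeric value.
    reg = DecisionTreeRegressor().fit([[0], [1], [2], [3]], [0.0, 0.1, 0.9, 1.0])
    print(reg.predict([[2.5]]))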

PICKING THE SPLITS

Decision trees look through every possible split and pick the best one. Candidate splits sit between two adjacent points in a given feature; if the feature is real-valued, the split occurs halfway between the two points.

Splits are always selected on a single feature, never on an interaction between multiple features, so a tree splits the data as a series of splits on single features. That means if there is a relationship between multiple features (for instance, if density is the critical feature but the data provides it only through separate raw features), the tree will not do what a human might do and combine them.
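
A sketch of that candidate-split enumeration (candidate_splits is a hypothetical helper, not code from the primer):

    import numpy as np

    def candidate_splits(feature_values):
        # Sort the feature and drop duplicates, then place a candidate
        # threshold halfway between each pair of adjacent values.
        v = np.unique(feature_values)
        return (v[:-1] + v[1:]) / 2.0

    print(candidate_splits([3.0, 1.0, 2.0, 2.0]))   # -> [1.5 2.5]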

Additional splits can occur until a stopping criterion is reached. Common stopping criteria are:

Maximum Depth: the maximum number of splits in a row has been reached
Maximum Leaves: don't allow any more leaves
Min Samples Split: only split a node if it has at least this many samples
Min Samples Leaf: only split a node if both resulting children have at least this many samples
Min Weight Fraction: only split a node if it has at least this percentage of the total samples
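
These criteria map directly onto constructor arguments in common implementations; a sketch assuming scikit-learn:

    from sklearn.tree import DecisionTreeClassifier

    # Each argument corresponds to one stopping criterion above;
    # the values shown are arbitrary examples.
    tree = DecisionTreeClassifier(
        max_depth=5,                    # Maximum Depth
        max_leaf_nodes=20,              # Maximum Leaves
        min_samples_split=10,           # Min Samples Split
        min_samples_leaf=5,             # Min Samples Leaf
        min_weight_fraction_leaf=0.01,  # Min Weight Fraction
    )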

Time Complexity

For a single decision tree, the most expensive operation is picking the best splits. Picking the best split requires evaluating every possible split, and finding every possible split requires sorting all the data on every feature. That means for M features and N points, this takes M * N lg(N) time.

For machine learning algorithms that use multiple decision trees, those sorts can be done a single time and cached.
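
A minimal sketch of that caching idea, assuming the data sits in a NumPy array:

    import numpy as np

    X = np.random.rand(1000, 4)     # N = 1000 points, M = 4 features

    # Sort each feature column once, M * N lg(N) total. Every tree in
    # an ensemble can reuse these orderings instead of re-sorting.
    sort_orders = np.argsort(X, axis=0)
    # Column j of sort_orders lists row indices in ascending order of feature j.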

Overfitting

Overfitting, which means matching the training data too closely at the expense of the test data, is a key concern for decision trees. Different stopping criteria should be evaluated with cross-validation to mitigate overfitting.
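
One way to run that evaluation, sketched with scikit-learn's cross_val_score:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Score several maximum depths; pick the one with the best
    # cross-validated accuracy, not the best training fit.
    for depth in (2, 3, 5, 10, None):
        scores = cross_val_score(DecisionTreeClassifier(max_depth=depth), X, y, cv=5)
        print(depth, scores.mean())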

RANDOMIZATION

When a single decision tree is run, it usually looks at every point in the available data. However, some algorithms combine multiple decision trees to capture their benefits while mitigating overfitting.

Two widely used algorithms that use multiple decision trees are Random Forests and Gradient Boosted Trees. Random Forests use a large number of slightly different decision trees run in parallel and averaged to get a final result. Gradient Boosted Trees use decision trees run in series, with later trees correcting errors from previous trees.

If multiple trees are combined, it can be advantageous to have a random element in the creation of the trees, which can mitigate overfitting.

Bagging (short for bootstrap aggregating) is drawing samples from the data set with replacement. If you had 100 data points, you would randomly draw 100 points and on average get 63.2% unique points and 36.8% repeats.

Sub-sampling is drawing a smaller set than your data set. For instance, if you have 100 points, randomly draw 60 of them (typically without replacement).

Random Forests tend to use bagging; Gradient Boosted Trees tend to use sub-sampling.

Additionally, the features the tree splits on can be randomized. Instead of evaluating every feature for the optimum split, a subset of the features can be evaluated (frequently the square root of the number of features).
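
Both sampling schemes in a few lines of NumPy (the 63.2% figure is the expected unique fraction, roughly 1 - 1/e, of a bootstrap draw):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100

    # Bagging: draw n points WITH replacement.
    bag = rng.choice(n, size=n, replace=True)
    print(len(np.unique(bag)) / n)          # about 0.632 on average

    # Sub-sampling: draw a smaller set WITHOUT replacement.
    sub = rng.choice(n, size=60, replace=False)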

REGRESSION DECISION TREES

Regression trees default to splitting data based on mean squared error (MSE). To calculate MSE: take the average value of all points in each group; for each point in that group, subtract the average value from the true value and square the error; then sum all of the squares and take the average.

Mean Absolute Error (MAE) is also an option for picking splits instead of MSE. To calculate MAE: take the median (not the mean) of all points in the group; for each point, subtract the median value from the true value and take the absolute value of that error; then sum the results and take the average.

The chart below shows data that was split in order to generate two new groups that had the lowest possible MSE.

EXAMPLE REGRESSION

The first example is a regression tree attempting to match a sine wave using a single split. The second is a depth 2 regression tree, which gets 3 splits (i.e. 4 final groups). Those results end up being fairly similar.
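
A sketch that reproduces the experiment with scikit-learn's DecisionTreeRegressor (the primer shows only the resulting plots):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
    y = np.sin(X).ravel()

    # Depth 1 gives a single split (two constant segments);
    # depth 2 gives 3 splits (four constant segments).
    for depth in (1, 2):
        model = DecisionTreeRegressor(max_depth=depth).fit(X, y)
        print(depth, model.predict([[np.pi / 2]]))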

INFORMATION GAIN

Decision trees pick their splits in order to maximize information gain. Information is a metric which tells you how much error you probably have in your decision tree. Information gain is how much less error you have after the split than before the split.

Regression Tree Information Gain

Regression trees use either MSE or MAE as their metric. The information gain is the MSE / MAE before the split minus the MSE / MAE after the split (weighted by the number of data points the split operated on). The best possible MSE / MAE is zero.

MSE by default tends to focus on points with the highest error and split them into their own groups, since their error has a squared effect.
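
That bookkeeping in code, reusing the hypothetical group_mse helper from above:

    def information_gain(y_parent, y_left, y_right, metric=group_mse):
        # Error before the split, minus the size-weighted error after it.
        n = len(y_parent)
        after = (len(y_left) * metric(y_left)
                 + len(y_right) * metric(y_right)) / n
        return metric(y_parent) - after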

Classification Information Gain

Classification trees measure their information gain by either entropy or Gini impurity. The best result for those values is also zero, and the worst result occurs when every category has an equal likelihood of being at any given node.

Gini equation:

    Gini = 1 - Σᵢ pᵢ²

Entropy equation:

    Entropy = -Σᵢ pᵢ log₂(pᵢ)

Where pᵢ is the probability of having a given data class in your data set.
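
Both measures, transcribed from those equations (hypothetical helper names):

    import numpy as np

    def gini(labels):
        # Gini = 1 - sum of squared class probabilities.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy(labels):
        # Entropy = -sum of p * log2(p) over the classes present.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Two equally likely classes: the worst case described above.
    print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))   # -> 0.5 1.0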

FEATURE IMPORTANCES

Decision trees lend themselves to identifying which features were the most important. This is done by calculating which feature resulted in the most information gain (weighted by the number of data points), across all of the splits. The information gain can be MSE / MAE for regression trees, or entropy / Gini for classification trees.
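
scikit-learn exposes this weighted-gain tally as feature_importances_, assuming that implementation is in use:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)

    # One score per feature; the scores sum to 1.
    for name, score in zip(data.feature_names, tree.feature_importances_):
        print(name, round(score, 3))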
