
RapidMiner process - getting started with Assignments 2 and 3 (fundRaising data)

Explore the data after reading it in through the Read Excel node. Look into the distribution of values in the different variables.

Would we like to transform some of the variables? What transformations? Maybe try a log transform of some variables (those with very skewed distributions)? Maybe try grouping values into ranges -- these can be user-specified, based on domain knowledge or what seems 'common sense'.

The following nodes in the RapidMiner process show some examples of transformations. Check the distributions after the transformations -- do the transformations help (why?). Go into the nodes to make sure you understand how the data transformations are specified.
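(The transformations themselves are done inside the RapidMiner nodes; purely for reference, here is a rough Python/pandas sketch of the same exploration and log-transform idea. The file name fundRaising.xlsx is an assumption about how the data was exported; RAMNTALL is one of the skewed numeric variables.)

    import numpy as np
    import pandas as pd

    # analogous to the Read Excel node (file name is an assumption)
    df = pd.read_excel("fundRaising.xlsx")

    # look at the raw distribution and its skewness
    print(df["RAMNTALL"].describe())
    print("skewness before:", df["RAMNTALL"].skew())

    # log1p copes with zero values; compare skewness after the transform
    df["RAMNTALL_log"] = np.log1p(df["RAMNTALL"])
    print("skewness after: ", df["RAMNTALL_log"].skew())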

The top part shows how certain variable transformations are obtained. The Validation node is for splitting the data into training and validation sets and learning a model, as in the last assignment. The lower part shows how we may perform a Principal Components Analysis (PCA).
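(For reference, a minimal scikit-learn sketch of the split-and-learn idea behind the Validation node; the label column name TARGET_B and the 70/30 split are assumptions, not settings taken from the process.)

    from sklearn.model_selection import train_test_split

    # hold out 30% of the rows for validation; using TARGET_B as the label is an assumption
    X = df.drop(columns=["TARGET_B"])
    y = df["TARGET_B"]
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.3, random_state=42)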

The Generate Attributes node shows how new attributes can be obtained by specifying functions on existing attributes. The dialog box for specifying new variables is obtained by pressing the Edit List button. Try creating some new variables yourself.
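(A pandas sketch of the same idea, continuing from the sketch above; the two derived attributes below are only illustrations, not attributes required by the assignment.)

    import numpy as np

    # new columns defined as functions of existing ones (illustrative formulas)
    df["GIFT_RATIO"] = df["LASTGIFT"] / df["AVGGIFT"].replace(0, np.nan)
    df["LOG_RAMNTALL"] = np.log1p(df["RAMNTALL"])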

The first Discretize node defines ranges for the values of the RAMNTALL variable - i.e. it converts this variable from numeric to nominal values having multiple 'classes' or groups. These groups are specified as shown, by pressing the Edit List button.

You can specify any names for the classes - here, we have chosen the group names based on the value ranges they include. The second group is named '50-100' (you can name it '50 to 100' or '50 to Hundred Frogs' if you like). The upper limits for the ranges are specified in sequence.
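(A pandas equivalent of this discretization, continuing the sketch above; the cut points 50 and 100 follow the group names mentioned here, while the edge 200 and the labels are just examples.)

    import numpy as np
    import pandas as pd

    # RAMNTALL becomes a nominal variable whose classes are named after
    # the value ranges they cover; upper limits are given in sequence
    bins   = [-np.inf, 50, 100, 200, np.inf]
    labels = ["0-50", "50-100", "100-200", "200+"]
    df["RAMNTALL_binned"] = pd.cut(df["RAMNTALL"], bins=bins, labels=labels)
    print(df["RAMNTALL_binned"].value_counts())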

The same ranges can be specified for multiple variables in a single Discretize node - as in the case of the third Discretize node (shown below).

The attribute filter type is set to subset to indicate that multiple attributes are to be selected into this node. Attributes can be selected into the right-side pane of the dialog (the dialog is obtained by pressing the Select Attributes button).

Here, the AVGGIFT and LASTGIFT variables are selected into this node, so the discretization operation will be performed on these attributes. The value ranges for the different groups are specified in the same way as in the last node.
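(Continuing the sketch: the same bin edges and labels applied to both selected attributes in one pass.)

    # same ranges applied to AVGGIFT and LASTGIFT
    for col in ["AVGGIFT", "LASTGIFT"]:
        df[col + "_binned"] = pd.cut(df[col], bins=bins, labels=labels)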

For the PCA part, a subset of attributes to be used for PCA is selected using the Select Attributes node. Attributes to be included are selected into the right-side panel.

Note - only the selected attributes are available at the 'exa' output port of the Select Attributes node. We next normalize these attributes using the Normalize node - its parameters can specify different methods of normalizing. We have chosen 'range transformation' with 0.0 and 1.0 as the min and max of the range.
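(A scikit-learn sketch of the range transformation; the particular five-column subset listed below is an assumption about which attributes were selected for PCA.)

    from sklearn.preprocessing import MinMaxScaler

    # rescale each selected attribute to the range [0.0, 1.0]
    pca_cols = ["RAMNTALL", "AVGGIFT", "LASTGIFT", "MAXRAMNT", "NGIFTALL"]  # assumed subset
    scaler = MinMaxScaler(feature_range=(0.0, 1.0))
    X_scaled = scaler.fit_transform(df[pca_cols])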

For the PCA node, we choose 'keep variance' and set 0.95 as the 'variance threshold'. This means that we'd like to retain as many of the new variables as needed to keep 95% of the information content, or variance, in the data.
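(In scikit-learn the same 'keep variance' behaviour is obtained by passing the threshold as a fraction to n_components; a sketch continuing from the normalized matrix above.)

    from sklearn.decomposition import PCA

    # keep as many components as needed to explain 95% of the variance
    pca = PCA(n_components=0.95)
    scores = pca.fit_transform(X_scaled)
    print(pca.explained_variance_ratio_)            # variance captured per component
    print(pca.explained_variance_ratio_.cumsum())   # cumulative variance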

The results of PCA can be seen at the 'pre' output port of the PCA node. It lists the principal components in descending order of the variance they capture, along with the cumulative variance - we see here that the first three principal components (PC1, PC2, PC3) capture 95% of the total information. So, instead of the 5 original variables, we can use just 3 principal components.

The eigenvectors are also shown:

and these can help calculate the values of the new variables (PC1, PC2, ...) from the values of the original variables. The 'exa' output port gives the new attributes (principal components) for the data (i.e. gives the values of PC1, PC2, ... for each data row). We can then use these values in subsequent processing.
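(As a small check of that relationship in the scikit-learn sketch: each PC score is a weighted sum of the centred, normalized inputs, with the eigenvectors supplying the weights.)

    import numpy as np

    # rows of pca.components_ are the eigenvectors; projecting the centred
    # data onto them reproduces the PC1, PC2, ... values for each row
    manual_scores = (X_scaled - pca.mean_) @ pca.components_.T
    assert np.allclose(manual_scores, scores)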
