Analytics AY 2016-17
Dr. Sridhar Vaithianathan
IT & Analytics
Office 2
sridhar.v@imthyderabad.edu.in
Mobile: 99899 04245
Topic Outline
Statistics Vs Analytics
Data Mining Methods
Data Mining Techniques
Data Mining Process
Analytics: What it is NOT?
Statistics                                        Analytics
Macro-decisioning                                 Micro-decisioning
Explain/describe population relationships         Predict values of new records
Small sample, few variables                       Large sample, many variables
Find a good-fitting statistical model             Models/algorithms with high predictive power
Confidence intervals, hypothesis tests, p-values  Predictive power metrics and cost
Analytics Vs Statistics
The prime objective of --------- is to minimize the error and improve the accuracy.
Answer: ANALYTICS
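To make "minimize the error and improve the accuracy" concrete, here is a minimal sketch of two common predictive-power metrics. The choice of RMSE (for a numerical outcome) and accuracy (for a categorical outcome) is an assumption for illustration; the slides do not name specific metrics, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error for a numerical outcome (lower is better)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def accuracy(actual, predicted):
    """Share of records whose predicted class matches the actual class (higher is better)."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)
```

Both functions compare predictions with known outcomes, so in practice they would be computed on records that were not used to fit the model.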
PREDICTION (NUMERICAL Y)
DIMENSION REDUCTION
CLASSIFICATION (CATEGORICAL Y)
SEGMENTATION
WHAT GOES WITH WHAT
Data Mining Techniques
Classification
Prediction
Association
Cluster/Segmentation
Preprocessing DATA
Oversampling Rare Events
Types of Variables
Handling Categorical Variables
Variable Selection : Parsimony
Problem of Overfitting
How many Variables (X-Y plots)
How much Data:
10 cases for each variable, or
6 x m x p cases (m = number of outcome classes, p = number of variables)
Detecting Outliers
Handling Missing Data
Normalizing Data
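The two "How much Data" rules of thumb in the list above can be sketched as a small helper. The function and parameter names are hypothetical, and the result is only a rough minimum, not a guarantee of adequate data.

```python
def min_records(num_variables, num_classes=None):
    """Rule-of-thumb minimum number of records.

    Without num_classes: 10 cases for each variable.
    With num_classes (classification): 6 x m x p, where m is the number of
    outcome classes and p is the number of variables.
    """
    if num_classes is None:
        return 10 * num_variables
    return 6 * num_classes * num_variables
```

For example, 8 variables call for at least 80 records under the first rule, and 96 records under the second rule for a two-class outcome.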
DATA PARTITION
UNDERSAMPLING ???
Types of Variables
Determine the types of pre-processing needed, and algorithms used
Main distinction: Categorical vs. numeric
Numeric
Continuous
Integer
Categorical
Ordered (low, medium, high)
Unordered (male, female)
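As a rough illustration of the distinctions above, the sketch below labels a column of raw values by type and converts an ordered categorical variable to ranks. The function names and the type-checking heuristic are assumptions made for this example; real data usually needs domain knowledge (an integer column can still be a category code).

```python
def variable_type(values):
    """Label a column as 'integer', 'continuous', or 'categorical' from its Python values."""
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    if all(is_number(v) for v in values):
        return "continuous"
    return "categorical"

def encode_ordered(values, ordered_levels):
    """Map an ordered categorical variable to ranks 0, 1, 2, ... following ordered_levels."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]
```

An ordered variable such as (low, medium, high) can be passed to `encode_ordered` because its levels have a meaningful sequence; an unordered variable such as (male, female) has no such sequence and is usually handled with dummy variables instead.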
Handling Categorical Variables
Dummy Variable in regression
Eg: Occupation
Student
Unemployed
Employed
Retired
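The Occupation example above can be turned into dummy variables as sketched below. This assumes the usual regression convention of m - 1 dummies for m categories, with one category held out as the reference level; the choice of "Retired" as the reference, the helper name, and the column prefix are all hypothetical.

```python
OCCUPATIONS = ["Student", "Unemployed", "Employed", "Retired"]

def dummy_code(values, categories, reference, prefix="Occupation"):
    """Create m - 1 binary (0/1) columns per record; the reference category is all zeros."""
    kept = [c for c in categories if c != reference]
    return [{f"{prefix}_{c}": int(v == c) for c in kept} for v in values]

rows = dummy_code(["Student", "Retired"], OCCUPATIONS, reference="Retired")
```

Dropping one category avoids redundancy with the regression intercept: a record with zeros in all three columns is known to belong to the reference category.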
[Figure: Expenditure, axis from 300 to 1000]
Detecting Outliers
An outlier is an observation that is extreme, being distant from the rest of the data (the definition of "distant" is deliberately vague)
Outliers can have a disproportionate influence on models (a problem if they are spurious)
An important step in data pre-processing is detecting outliers
Once an outlier is detected, domain knowledge is required to determine whether it is an error or truly extreme
In some contexts, finding outliers is the purpose of the DM exercise (e.g., airport security screening). This is called anomaly detection.
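Because "distant" is left vague, any detection rule is a convention rather than a definition. The sketch below uses one common convention, flagging values more than 1.5 interquartile ranges outside the quartiles; the rule, the multiplier `k`, the function name, and the expenditure figures are all assumptions for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical expenditure values with one suspicious record
expenditure = [420, 450, 480, 510, 530, 560, 590, 620, 2500]
flagged = iqr_outliers(expenditure)
```

A flagged value is only a candidate: whether 2500 is a data-entry error or a genuinely high spender still has to be settled with domain knowledge.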
DATA Partition
XLMINER INSTALLATION :
\\10.1.1.11\commonfolder\DMBA - Elective
DMBA Text Book Datasets:
\\10.1.1.11\commonfolder\DMBA - Elective\DMBA DATASETS\DM for BI - DATA SETS June 2016