Regression Analysis
ESCUELA SUPERIOR POLITECNICA DEL LITORAL ESPOL, Guayaquil, Ecuador
Linear Regression
Data Mining, agabad@espol.edu.ec
[Figure: scatter plot of price in USD against age in months, with the fitted line]
Estimated coefficients: $\beta_0^* = 20294.06$, $\beta_1^* = -170.93$
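As a small worked illustration, the fitted line from the plot can be evaluated directly. This is only a sketch: the function name `predict_price` is hypothetical, and it assumes the two values on the slide are the intercept and the slope of price (USD) on age (months).

```python
# Coefficients as read from the slide's plot (assumed: intercept and slope).
BETA_0 = 20294.06   # estimated price in USD at age 0 months
BETA_1 = -170.93    # estimated change in price (USD) per additional month


def predict_price(age_months: float) -> float:
    """Return the fitted price in USD for the given age in months."""
    return BETA_0 + BETA_1 * age_months


if __name__ == "__main__":
    # Ages chosen only for illustration.
    for age in (0, 12, 24, 60):
        print(f"age = {age:2d} months -> fitted price = {predict_price(age):.2f} USD")
```

For example, at 24 months the fitted price is 20294.06 - 170.93 × 24 = 16191.74 USD.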
Normal Equations
Coefficient of Determination
[Figure: Venn diagram of Y and X1 illustrating the total sum of squares]
SST: variance to be explained by the predictors
SSE: variance explained by X1
SSR: variance NOT explained by X1
$$R^2 = \frac{SSE}{SST}$$
The coefficient of determination is used to judge the adequacy of the linear regression model.
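The ratio can be checked numerically. The sketch below follows the naming used in these slides (SSE = variance explained by the model, SSR = variance not explained, SST = total); the data values and function names are made up for illustration. The identity SST = SSE + SSR holds here because the line is a least-squares fit with an intercept.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sums_of_squares(y, y_hat):
    """Return (SST, SSE, SSR): total, explained, and residual sums of squares."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((fi - y_bar) ** 2 for fi in y_hat)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, sse, ssr


def r_squared(y, y_hat):
    """Coefficient of determination, R^2 = SSE / SST (slide naming)."""
    sst, sse, _ = sums_of_squares(y, y_hat)
    return sse / sst


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]          # illustrative data, not from the slides
    y = [2, 4, 5, 4, 5]
    b0, b1 = fit_simple_ols(x, y)
    y_hat = [b0 + b1 * xi for xi in x]
    print("SST, SSE, SSR =", sums_of_squares(y, y_hat))
    print("R^2 =", r_squared(y, y_hat))
```

Note that many textbooks swap the two abbreviations (SSE for the error sum of squares); the code keeps the convention of the diagram above.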
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
[Figure: Venn diagram of Y, X1 and X2]
Common variance explained by X1 and X2
Unique variance explained by X1
Unique variance explained by X2
Variance NOT explained by X1 or X2
Parameter Computation
Coefficient Computation
$$\beta^* = (X'X)^{-1} X'Y$$
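The formula can be sketched in plain Python. This is an illustration under stated assumptions, not the course's implementation: the helper names are hypothetical, a column of ones is added to X for the intercept, and the system (X'X) β = X'Y is solved by Gauss-Jordan elimination instead of forming the inverse explicitly, which gives the same coefficients when X'X is invertible.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (a square, b a vector) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_coefficients(rows, y):
    """Return [b0, b1, ..., bp] solving the normal equations (X'X) b = X'y.

    rows: one list of predictor values per observation (no intercept column).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


if __name__ == "__main__":
    # Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2.
    rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [1, 3, 4, 6, 8]
    print(ols_coefficients(rows, y))
```

In practice a numerical library routine such as `numpy.linalg.lstsq` would normally be preferred for stability; the hand-written solver is here only to keep the example self-contained.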
Regression Statistics
$$R^2 = \frac{SSE}{SST}$$
where SST is the total sum of squares.
Coefficient of determination: used to judge the adequacy of the regression model.
$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$
n = sample size
k = number of independent variables
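A minimal sketch of the adjustment, with a hypothetical function name and illustrative inputs:

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n: sample size, k: number of independent variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # Illustrative values: R^2 = 0.6 from 5 observations and 1 predictor.
    print(adjusted_r_squared(0.6, n=5, k=1))
```

With R² = 0.6, n = 5 and k = 1 this gives 1 - 0.4 × 4/3 ≈ 0.467. For k ≥ 1 and R² < 1 the adjusted value is below R², which is how the statistic penalizes additional predictors.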
Logistic Regression
Extends the idea of linear regression to situations where the outcome variable is categorical.
$$\text{Loan} = \beta_0 + \beta_1 \cdot \text{Income}$$
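A straight line in Income can take any real value, while Loan is either 0 or 1. Logistic regression instead passes the linear term through the logistic function, so the result can be read as a probability between 0 and 1. The sketch below illustrates this; the coefficient values and function names are invented for the example and are not estimates from the UniversalBank data.

```python
import math


def logistic(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loan_probability(income_k: float, beta_0: float, beta_1: float) -> float:
    """P(Loan = 1) for an income given in $K/year under a one-predictor model."""
    return logistic(beta_0 + beta_1 * income_k)


if __name__ == "__main__":
    # Hypothetical coefficients chosen only for illustration.
    b0, b1 = -6.0, 0.04
    for income in (50, 100, 150, 200):
        print(f"Income = {income:3d} $K -> P(Loan = 1) = {loan_probability(income, b0, b1):.3f}")
```

With these made-up coefficients the probability is 0.5 at Income = 150, where the linear term equals zero, and rises toward 1 as Income increases.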
UniversalBank Data
Age: age of customer
Experience: professional experience in years
Income: income of customer ($K/year)
CCAvg: average monthly credit card spending
Loan ∈ {0, 1}: whether the customer accepted the loan
[Figure: Loan against Income ($K/year), with points marked as accepted loan or did not accept loan]
Data preprocessing
Partition 60% training, 40% validation
Create 0/1 dummy variables for categorical predictors
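Both preprocessing steps can be sketched with the standard library. The function names, the `Level` column and its values are hypothetical; records are assumed to be dictionaries, and the split is made reproducible with a fixed seed.

```python
import random


def partition(records, train_fraction=0.6, seed=1):
    """Shuffle the records and split them into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def add_dummies(records, column, drop_first=False):
    """Replace a categorical column with 0/1 indicator columns.

    With drop_first=True the first category (in sorted order) is omitted
    and acts as the reference level.
    """
    categories = sorted({r[column] for r in records})
    if drop_first:
        categories = categories[1:]
    encoded = []
    for r in records:
        row = {key: value for key, value in r.items() if key != column}
        for c in categories:
            row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Hypothetical records; "Level" stands in for any categorical predictor.
    data = [{"id": i, "Level": "ABC"[i % 3]} for i in range(10)]
    train, valid = partition(add_dummies(data, "Level"))
    print(len(train), "training rows,", len(valid), "validation rows")
    print(train[0])
```

When the model includes an intercept, one indicator per categorical predictor is usually dropped, since the full set of indicators sums to one and is collinear with the intercept column.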
Predictive Accuracy
$$\text{Accuracy} = \frac{\text{number of correct classifications}}{\text{number of instances in our database}}$$
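The ratio translates directly into code. A minimal sketch with a hypothetical function name and made-up labels:

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Made-up labels: 4 of the 5 predictions match.
    y_true = [1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 1, 0]
    print(accuracy(y_true, y_pred))
```

Here four of five predictions are correct, so the accuracy is 0.8.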
[Figure: classification counts by true label; rows: 100 0 0 / 9 90 1 / 45 45 10]
ROC Curve
Complete Example:
Predicting Delayed Flights DC to NY
Outcome: delayed or not-delayed
Predictors:
Day of week
Departure time
Origin (DCA, IAD, BWI)
Destination (LGA, JFK, EWR)
Carrier
Weather (1 = bad weather)
Data Preprocessing
Predictor Model