
ISC 42: Fundamentals of Predictive Analytics

Cox Regression for Survival Data

Cox Regression

[2]

What if you want to know for how long the customers will remain
with your telecommunications company?
What if you want to know when your customers will cancel their
credit card?
What if you want to know when students are likely to leave the
University?
What if you want to know when students are likely to be employed?

Use Cox Regression!

Cox Regression

[2]

Use Cox Regression to identify the chances of survival and failure

Survival Rate:
Chance that the subject is still in the study (has not experienced
the event) at time t
Hazard Rate:
Risk of failure
Time-dependent
Probability of the event happening at time t given that the
individual is at risk at time t

Cox Regression

[2]

How can you use Cox Regression?

Step 1: Decide on what you want to know. Decide on the event and the
time to the event that you want to analyze.
Ex: How long does it take you to find a job? What's the chance that
you will find a job?
How long until a company adopts a new technology?
How long until a student drops out from the University?
Step 2: Track subjects and wait for the event to happen.
Step 3: Compute the Survival Rate, Cumulative Survival Rate,
Hazard Rate, and Cumulative Hazard Rate.
Survival Rate: Chance that the subject stays in the sample (has not
yet experienced the event).
Hazard Rate: Risk of failure. Chance of the event happening.

Censored Data
These are samples that cannot be tracked anymore.
Ex. We do not know if they found a job.
We do not know if they survived their battle against cancer.
Why?
They did not reply to the survey anymore.
They could no longer be contacted.
Do you still use this data?
Yes. Their records are still valid data for the period during which
they were in the study.

Survival Analysis

[2]

Examines the length of time to a critical event
Data may include censored entries. Some entries are not yet
conclusive.
Ex. Customers who have not left yet as of writing do not have a
definite future survival time.

What is Survival Analysis? (continued)

[2]

Censored data are those subjects that are no longer tracked

n.d.(2010). Predictive Modeling with IBM SPSS Modeler Student Guide. IBM Corporation Inc.

Censored Data
Cancer
Death

Censored   Explanation
1          Subject has not died yet (Censored)
0          Subject has died
1          Subject has not died yet (Censored)
0          Subject has died

Event:
Death from Cancer

Censored:
Subjects who survived until the end of the study, but we do not know
what happened after the study
Subjects who dropped out of the study while the study was ongoing
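A minimal sketch of how the two censoring cases above could be encoded as data. The record layout (`event_time`, `last_contact`), the study end at day 1000, and the three subjects are all hypothetical and are not taken from the slides. The status code follows the "Censored if 0" convention of the patient table: 1 means the event was observed, 0 means censored.

```python
from typing import Optional, Tuple

STUDY_END = 1000  # hypothetical end of the study, in days


def to_time_and_status(event_time: Optional[int], last_contact: int) -> Tuple[int, int]:
    """Return (time, status); status 1 = event observed, 0 = censored."""
    if event_time is not None and event_time <= STUDY_END:
        return event_time, 1
    # No observed event: the subject is censored at last contact or study end,
    # whichever comes first.
    return min(last_contact, STUDY_END), 0


# Hypothetical subjects: (event_time or None, last_contact)
subjects = [
    (120, 120),    # died on day 120
    (None, 400),   # dropped out on day 400
    (None, 1000),  # still alive when the study ended
]
encoded = [to_time_and_status(e, c) for e, c in subjects]
print(encoded)
```

Censored subjects stay in the data with the time they were last known to be event-free, which is why they still count as "at risk" up to that time.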

Data of Cancer Patients
Total Patients: 26

Time   Censored if 0   Age
59                     72
115                    74
156                    66
268                    74
329                    43
353                    63
365                    64
377                    58
421                    53
431                    50
448                    56
464                    56
475                    59
477                    64
563                    55
638                    56
744                    50
769                    59
770                    57
803                    39
855                    43
1040                   38
1106                   44
1129                   53
1206                   44
1227                   59


At time 59, how many have died?
1
At time 115, how many have died?
2


At time 59, how many are at risk?
26
At time 115, how many are at risk?
25, because 1 patient has already died
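The at-risk counts can be checked with a short sketch. The `times` list holds the 26 follow-up times from the patient table; the helper name `number_at_risk` is an assumption made for illustration. A subject is counted as at risk at time t if the follow-up time is at least t.

```python
# Follow-up times of the 26 patients, copied from the table
times = [59, 115, 156, 268, 329, 353, 365, 377, 421, 431, 448, 464, 475,
         477, 563, 638, 744, 769, 770, 803, 855, 1040, 1106, 1129, 1206, 1227]


def number_at_risk(times, t):
    """Count subjects whose follow-up time is at least t."""
    return sum(1 for x in times if x >= t)


print(number_at_risk(times, 59))
print(number_at_risk(times, 115))
```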


At time 59, how many are at risk?
26
How many events?
1
Hazard Rate?
1/26 = ?


At time 115, how many are at risk?
25
How many events at this time?
1 (the second death overall; the first was at time 59)
Hazard Rate?
1/25 = ?


At time 115, how many are at risk?
25
How many events at this time?
1
Survival Rate?
(1 - 1/26) x (1 - 1/25) = 24/26 = ?

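Notations:

t_j = time
n_j = number at risk
d_j = number of events
S_j = number of censored
\lambda_j = hazard function
\Lambda(t_j) = cumulative hazard function
S(t_j) = survival function

Hazard Function:
$\lambda_j = d_j / n_j$

Cumulative Hazard Function:
$\Lambda(t_j) = \sum_{k \le j} \lambda_k$

Kaplan-Meier Survival Function:
$S(t_j) = \prod_{k \le j} (1 - \lambda_k)$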

Hazard Rate of the Cancer Patients

Compute the hazard rate.

Recall:
n_j = number at risk at a given time j
d_j = number of events at a given time j
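A minimal sketch of the hazard computation, assuming the usual discrete-time definition: the hazard at an event time is the number of events at that time divided by the number at risk, and the cumulative hazard is the running sum. The function name and the five-subject toy data are hypothetical; they are not the cancer patient data.

```python
from collections import Counter


def hazard_table(times, status):
    """Return rows (t_j, n_j, d_j, hazard_j, cumulative_hazard_j) per event time."""
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    rows, cumulative = [], 0.0
    for t in sorted(deaths):
        n_j = sum(1 for x in times if x >= t)  # still at risk just before t
        d_j = deaths[t]                        # events exactly at t
        h_j = d_j / n_j
        cumulative += h_j
        rows.append((t, n_j, d_j, h_j, cumulative))
    return rows


# Hypothetical toy data: 5 subjects, status 0 = censored
toy_times = [2, 3, 3, 5, 8]
toy_status = [1, 1, 0, 1, 0]
for row in hazard_table(toy_times, toy_status):
    print(row)
```

Censored subjects never add to the event count, but they do count in the number at risk until their censoring time.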

Kaplan-Meier Survival Rate of the Cancer Patients

Compute the survival rate.

Recall:
$S(t_j) = \prod_{k \le j} (1 - d_k / n_k)$
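A minimal sketch of the Kaplan-Meier (product-limit) estimate: at each event time, the running survival is multiplied by one minus the share of at-risk subjects who had the event. The function name and the toy data are hypothetical and are not the cancer patient data.

```python
def kaplan_meier(times, status):
    """Return [(t_j, S(t_j))] at each event time using the product-limit rule."""
    event_times = sorted({t for t, s in zip(times, status) if s == 1})
    survival, curve = 1.0, []
    for t in event_times:
        n_j = sum(1 for x in times if x >= t)  # number at risk
        d_j = sum(1 for x, s in zip(times, status) if x == t and s == 1)  # events
        survival *= 1 - d_j / n_j
        curve.append((t, survival))
    return curve


# Hypothetical toy data: 5 subjects, status 0 = censored
toy_times = [2, 3, 3, 5, 8]
toy_status = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_status):
    print(t, round(s, 3))
```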

Parametric Survival Function


Look at the shape of the hazard function
Parametric Model

Hazard Function

Survival Function

Exponential
Weibull
Gompertz
Log-logistic

The exponential model assumes that the hazard (the risk of the event
happening) does not change over time.
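To make the constant-hazard assumption concrete, here is a small sketch using the standard exponential forms: hazard equal to a fixed rate, and survival S(t) = exp(-rate * t). The rate value of 0.01 is hypothetical.

```python
import math


def exponential_hazard(t, rate):
    """Hazard of the exponential model: constant, independent of t."""
    return rate


def exponential_survival(t, rate):
    """Survival function of the exponential model: S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


rate = 0.01  # hypothetical events per unit of time
for t in (0, 50, 100):
    print(t, exponential_hazard(t, rate), round(exponential_survival(t, rate), 4))
```

The hazard column stays the same at every time while the survival column keeps shrinking, which is the shape to look for before choosing this model.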

Cox Regression for Survival Data

[2]

Uses predictors to predict the likelihood of the event of interest
occurring at time t.
Based on survival analysis (time from terminal illness until death)
Can be used to predict churn (time until the customer leaves the
company)

Cox Regression for Survival Data


What can you say about this graph?

After 60 months, 50% of the customers left
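A reading such as "50% left after 60 months" is the median survival time: the first time at which the survival curve reaches 0.5. The sketch below shows how that could be read off a curve; the curve values are hypothetical and only chosen to reproduce the 60-month reading, since the original graph data is not available.

```python
def median_survival_time(curve):
    """Return the first time at which survival is at or below 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical survival curve: (months, share of customers remaining)
curve = [(12, 0.90), (24, 0.78), (36, 0.66), (48, 0.57), (60, 0.50), (72, 0.41)]
print(median_survival_time(curve))
```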



[2]


Cox Regression or the Cox Proportional Hazard Model


[2]

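Is a survival model that represents hazard as a function of time and
predictor fields that can be continuous

$h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)$

Where:
h(t | x) = hazard function
h_0(t) = baseline hazard function
t = time
\beta = coefficients of the covariates
x = predictors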
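A minimal sketch of how predictors enter the model, assuming the standard Cox form in which the hazard is a baseline hazard multiplied by exp(beta . x). The coefficient values and the two patients are hypothetical. Because the baseline hazard is shared, it cancels when two subjects are compared, so their hazard ratio is the same at every time t.

```python
import math


def relative_hazard(coefficients, predictors):
    """exp(beta . x): hazard relative to the baseline hazard h0(t)."""
    linear = sum(b * x for b, x in zip(coefficients, predictors))
    return math.exp(linear)


# Hypothetical coefficients for (age in years, treated yes=1 / no=0)
beta = [0.05, -0.7]
patient_a = [60, 1]
patient_b = [50, 1]

# The baseline hazard cancels, leaving exp(0.05 * (60 - 50)).
ratio = relative_hazard(beta, patient_a) / relative_hazard(beta, patient_b)
print(round(ratio, 4))
```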


Cox Regression or the Cox Proportional Hazard Model


[2]

Interpreting Coefficients

If the coefficient is positive:
Lower duration, higher hazard rate (the event is more likely to
happen)
As the independent variable increases, time-to-event decreases (the
sooner the event will happen)
If the coefficient is negative:
Higher duration, lower hazard rate (the event is less likely to
happen)
As the independent variable increases, time-to-event increases
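The sign rule can be illustrated by turning each coefficient into a hazard ratio, exp(beta), which is above 1 for a positive coefficient and below 1 for a negative one. The variable names and coefficient values below are hypothetical examples for a churn model, not results from the slides.

```python
import math


def describe_coefficient(name, beta):
    """Turn a Cox coefficient into a hazard ratio and a plain-language reading."""
    hazard_ratio = math.exp(beta)
    if beta > 0:
        effect = "higher hazard, shorter time-to-event"
    elif beta < 0:
        effect = "lower hazard, longer time-to-event"
    else:
        effect = "no change in hazard"
    return f"{name}: beta={beta:+.2f}, hazard ratio={hazard_ratio:.2f} ({effect})"


# Hypothetical coefficients for a churn model
for name, beta in [("monthly_complaints", 0.40), ("years_as_customer", -0.25)]:
    print(describe_coefficient(name, beta))
```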

QUESTIONS?

THANK YOU & GOD BLESS!
