evtechnologies.com/predicting-survivors-titanic-using-sap/
Ahmed Sherif
4/26/2016
Step 2: Analyze the data to make sure all of the columns are appropriately valued
Step 3: Assign roles within the model by selecting the target and predictor variables. For this model,
Survived is our target variable, since we want to see how well we can predict the survival outcome
based on everything else. (Please note that KxIndex is just an index variable generated by the tool to
give each row of data a unique identifier, and it will be excluded from the final model. I also removed boat,
home destination, and ID, as they were insignificant to the model outcome.)
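Outside the tool, the same role assignment can be sketched in a few lines of Python. This is only an illustration and not what SAP Predictive Analytics does internally. The column spellings below (`id`, `boat`, `home_dest`, and so on) are assumptions, and `assign_roles` is a hypothetical helper.

```python
# Assumed column spellings; the actual names in the data set may differ.
TARGET = "survived"
EXCLUDED = {"KxIndex", "boat", "home_dest", "id"}


def assign_roles(columns):
    """Split column names into (target, predictors), skipping excluded columns."""
    if TARGET not in columns:
        raise ValueError(f"target column {TARGET!r} not found")
    predictors = [c for c in columns if c != TARGET and c not in EXCLUDED]
    return TARGET, predictors


# Hypothetical header row, used only to exercise the helper.
columns = ["KxIndex", "id", "pclass", "sex", "age", "fare",
           "boat", "home_dest", "survived"]
target, predictors = assign_roles(columns)
print(target, predictors)
```

Everything that is neither the target nor explicitly excluded is treated as an explanatory variable, which mirrors the "predict Survived from everything else" setup described above.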
Step 4: After selecting all the explanatory variables as well as the target variable, go ahead and generate the
model. (Please note that the probability threshold was kept at the default setting of 50%, meaning a probability
of less than 0.5 indicates No Survival and a probability of 0.5 or greater indicates Yes Survival.
This setting can be adjusted in the advanced settings before model generation if the threshold needs to be reevaluated.)
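The threshold rule can be written out as a small Python function. This is a minimal sketch of the rule as described here, not the tool's own code; the function name `classify` and its `threshold` parameter are made up for illustration.

```python
def classify(probability, threshold=0.5):
    """Return 1 (Yes Survival) if probability >= threshold, else 0 (No Survival)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= threshold else 0


print(classify(0.49))                 # below the default threshold -> 0
print(classify(0.50))                 # exactly at the threshold -> 1
print(classify(0.62, threshold=0.7))  # stricter threshold -> 0
```

Raising the threshold makes the model more conservative about predicting survival, which is the kind of adjustment the advanced settings allow.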
Step 5: Evaluate Results
The most significant contributor was the sex of those aboard, with women more likely to survive than
men. The next two contributors were the fare passengers paid to board and the passenger
class they were in. Clearly there is a direct relationship between how much you paid for a ticket and the class
you were assigned to. Together, these two indicators suggest that social class may have been a factor in who was
given priority for rescue during the Titanic's fatal end.
Final Step: Apply Training Model on Test Results
We will now apply the model built from the training set to the 309 rows of data we stored in the test data set
to see how well we can predict the actual survival outcome. For the output, we want to select the option to
view probability. For our purposes, each probability is then mapped back to a predicted outcome of 0 or 1:
Less than 50% = 0
50% or Greater = 1
You can view the output inside the tool or export it to a data source. I always like to export it to an
Excel spreadsheet to quickly calculate my scores. The variable that we are looking for is
proba_rr_survived. We round it to a whole number and compare that result to the original value of
survived. After some calculations in Excel for the rows I tested, I got 245 correct matches out of 309. That rate
comes out to 79.3%, almost identical to our KI score above of 79.8%.
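The Excel scoring can also be reproduced in Python. The sketch below assumes the exported rows have been loaded into two parallel lists, one for proba_rr_survived and one for survived; the `accuracy` helper and the four sample rows are made up purely to exercise it and are not taken from the Titanic data.

```python
def accuracy(probabilities, actuals, threshold=0.5):
    """Return (correct, total, rate) after thresholding each probability."""
    if len(probabilities) != len(actuals):
        raise ValueError("inputs must have the same length")
    if not actuals:
        raise ValueError("inputs must not be empty")
    # An explicit >= comparison is used instead of round(), because
    # Python's round(0.5) gives 0 while Excel's ROUND(0.5, 0) gives 1.
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for pred, actual in zip(predictions, actuals) if pred == actual)
    return correct, len(actuals), correct / len(actuals)


# Made-up rows, only to show the call.
sample_probabilities = [0.91, 0.12, 0.50, 0.35]
sample_actuals = [1, 0, 0, 1]
print(accuracy(sample_probabilities, sample_actuals))  # (2, 4, 0.5)

# The counts reported in this post: 245 correct matches out of 309.
reported_rate = 245 / 309
print(f"{reported_rate:.1%}")  # 79.3%
```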
In conclusion, we found that the data gathered on 1,300 of those on the Titanic allowed us to correctly predict
the survival outcome for roughly 80% of those we tested our model on. We found that Sex, Fare, and Class were
the highest individual predictors contributing to the model score.
FYI, the actual Joseph Dawson also died on the Titanic.
To learn more about SAP Predictive Analytics, please register and join our 4-part webinar series called
I Love Predictive.
Resources
Learn more about logistic regression algorithms
SAP Predictive Analytics Tutorials for Automated Analytics and Expert Analytics
Titanic Data Set