Getting Started with the Titanic Challenge

This blog post describes the steps performed in the Titanic Machine Learning Challenge on Kaggle, as part of the tutorial project.

The main aim of this challenge is to predict whether each passenger survived, using the given training data set. Three files are provided: train.csv (training data), test.csv (test data for which we need to make predictions), and gender_submission.csv (which shows how the predictions should be structured).

The two important libraries used here are NumPy and pandas.

The first step is to load the data files, which is done with the pandas function “read_csv”. Next, a hypothesis is made about the data: that all the female passengers survived and all the male passengers died. To check this hypothesis, we calculate the percentage of female passengers who survived and the percentage of male passengers who survived.
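
The loading and percentage steps can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: it assumes the training data has "Sex" and "Survived" columns (with Survived stored as 0 or 1), and it reads a few made-up rows from a string so that it runs without the Kaggle files. The helper name survival_rate is hypothetical.

```python
import io

import pandas as pd

# A few made-up rows shaped like train.csv (NOT real Titanic data).
# With the real files, pass the path to train.csv to read_csv instead.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,SibSp,Parch
1,0,3,male,1,0
2,1,1,female,1,0
3,1,3,female,0,0
4,0,1,male,0,0
5,1,2,male,0,2
6,0,3,female,3,1
"""

train_data = pd.read_csv(io.StringIO(SAMPLE_CSV))


def survival_rate(df, sex):
    """Fraction of passengers of the given sex whose Survived value is 1."""
    survived = df.loc[df["Sex"] == sex, "Survived"]
    return survived.sum() / len(survived)


rate_women = survival_rate(train_data, "female")
rate_men = survival_rate(train_data, "male")

print(f"% of women who survived: {rate_women:.1%}")
print(f"% of men who survived: {rate_men:.1%}")
```

On the real train.csv, the same two calls give the percentages used to check the hypothesis.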

This calculation shows that about 75% of the women on the ship survived, while only about 19% of the men survived. Taking the other columns in the data into account should allow better predictions. Since doing this by hand would be very tedious, we will use a machine learning model to make the predictions.

In this case, a random forest model will be used. This algorithm is widely used for classification and regression problems. It builds a number of decision trees based on patterns in the training data, and the combined output of these trees is then used to predict the result for each passenger in the test data.
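
As an illustration, here is a minimal sketch of training a random forest and producing predictions in the submission layout. Several things here are assumptions rather than details taken from this post: the use of scikit-learn's RandomForestClassifier, the choice of feature columns (Pclass, Sex, SibSp, Parch), the hyperparameter values, and the output file name. The rows are made up so that the example runs on its own; they are not real Titanic data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up rows shaped like train.csv and test.csv (NOT real Titanic data).
train_data = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "SibSp": [1, 1, 0, 0, 0, 3],
    "Parch": [0, 0, 0, 0, 2, 1],
})
test_data = pd.DataFrame({
    "PassengerId": [7, 8, 9],
    "Pclass": [3, 2, 1],
    "Sex": ["male", "female", "male"],
    "SibSp": [0, 1, 0],
    "Parch": [0, 0, 1],
})

# Assumed feature choice; other columns could be used as well.
features = ["Pclass", "Sex", "SibSp", "Parch"]

# One-hot encode the text column "Sex" so the model receives numeric inputs.
X = pd.get_dummies(train_data[features])
# Give the test set the same columns, in the same order, as the training set.
X_test = pd.get_dummies(test_data[features]).reindex(columns=X.columns, fill_value=0)
y = train_data["Survived"]

# Assumed hyperparameters; random_state makes the run repeatable.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X, y)
predictions = model.predict(X_test)

# Two columns, PassengerId and Survived, as in gender_submission.csv.
output = pd.DataFrame({
    "PassengerId": test_data["PassengerId"],
    "Survived": predictions,
})
print(output)
# With the real data, the result could be saved for submission with:
# output.to_csv("submission.csv", index=False)
```

With the real files, train_data and test_data would come from read_csv on train.csv and test.csv, and the rest of the sketch would stay the same.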

The code described above is available in this Kaggle Notebook: https://www.kaggle.com/harshitharavi96/getting-started-with-titanic/edit