Completed the classic Kaggle Titanic project, cleaning the data and training machine learning models to predict passenger survival.
Imputed missing values with the median and explored key features through visualisations, then fitted and compared Logistic Regression and Random Forest classifiers. The Random Forest performed best, reaching around 80% accuracy on a held-out test set.
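
The imputation and model comparison described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. It assumes scikit-learn and pandas. The data is synthetic and only mimics a few Titanic-style columns, and the column names, the 0/1 encoding of Sex, the 80/20 split, and the hyperparameters are all assumptions. The helper names make_synthetic_passengers and compare_models are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_passengers(n=600, seed=0):
    """Build a synthetic table that only mimics Titanic-style columns (assumption)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pclass": rng.integers(1, 4, size=n),          # classes 1, 2, 3
        "Sex": rng.integers(0, 2, size=n),             # 1 = female (assumed encoding)
        "Age": rng.normal(30, 12, size=n).clip(1, 80),
        "Fare": rng.gamma(2.0, 15.0, size=n),
    })
    # Made-up survival rule plus noise; not derived from the real dataset.
    score = 1.5 * df["Sex"] - 0.6 * (df["Pclass"] - 2) + 0.01 * df["Fare"] - 0.8
    df["Survived"] = (score + rng.normal(0, 1, size=n) > 0).astype(int)
    # Blank out roughly 20% of ages so there is something to impute.
    df.loc[rng.random(n) < 0.2, "Age"] = np.nan
    return df


def compare_models(df, seed=0):
    """Fit both classifiers on a train split and return test-set accuracy per model."""
    X = df.drop(columns="Survived")
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        # The imputer sits inside each pipeline, so medians come from training rows only.
        "logistic_regression": make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            LogisticRegression(max_iter=1000),
        ),
        "random_forest": make_pipeline(
            SimpleImputer(strategy="median"),
            RandomForestClassifier(n_estimators=200, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_models(make_synthetic_passengers()).items():
        print(f"{name}: {acc:.3f}")
```

Keeping the imputer inside the pipeline is one possible design choice: it avoids computing medians on rows that later end up in the test split. The accuracies printed here reflect only the synthetic data and say nothing about the real results.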
Feature importance analysis confirmed sex, fare, and passenger class as the strongest predictors of survival, consistent with the historical reality of the disaster.
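
A feature importance ranking of this kind could be produced along the following lines. This is a hedged sketch that assumes the importances came from scikit-learn's impurity-based feature_importances_ attribute on a fitted Random Forest, which the summary does not state. The data is synthetic, and make_toy_data and rank_features are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def make_toy_data(n=500, seed=0):
    """Synthetic features and labels; column names are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "Sex": rng.integers(0, 2, size=n),
        "Pclass": rng.integers(1, 4, size=n),
        "Fare": rng.gamma(2.0, 15.0, size=n),
        "Noise": rng.normal(size=n),  # carries no signal by construction
    })
    # Made-up labelling rule plus noise; not derived from the real dataset.
    score = 1.5 * X["Sex"] - 0.6 * (X["Pclass"] - 2) + 0.01 * X["Fare"] - 0.8
    y = (score + rng.normal(0, 1, size=n) > 0).astype(int)
    return X, y


def rank_features(X, y, seed=0):
    """Fit a random forest and return impurity-based importances, highest first."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = pd.Series(forest.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(rank_features(X, y))
```

Impurity-based importances tend to favour continuous or high-cardinality features such as fare, so a cross-check with sklearn.inspection.permutation_importance on held-out data may be worthwhile.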
Identified potential next steps for improving the model.
The final Kaggle submission scored 0.77990.