Abstract:
Health is a state of full well-being and a cornerstone of international development, with considerable investments made over the last three decades to reduce morbidity and mortality. Under-five mortality, generally defined as the death of children under the age of five, is still a critical public health challenge in developing countries. This study proposes a machine learning-based approach for predicting under-five mortality using health, socio-demographic, and climate data from Eastern Hararghe, Ethiopia. The data used in this study were collected from the Hararghe Health Demographic Surveillance System and the Ethiopian National Meteorology Agency. The following eight
supervised machine learning algorithms were considered: Naïve Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Attentive Tabular Network (TabNet), and Convolutional Neural Network (CNN). The proposed framework covers data preprocessing, exploratory data analysis, model training, prediction, performance evaluation, and identification of the key determinants of under-five mortality. An 80:20 split was observed to give the best model performance. Preprocessing techniques were then applied to enhance data quality before training the machine learning models. There were two experimental setups: one with a data-balancing technique and the other without. Results indicated that models trained on the balanced dataset consistently outperformed those trained on the unbalanced data. Among all the models developed, XGBoost recorded the highest accuracy, with a testing accuracy of 97.9%, precision of 98%, recall of 98%, and F1-score of 98%. The determinants of under-five mortality identified in this study were antenatal care, child gender, wealth index, total number of living children, whether the preceding child is alive, physical health, place of birth, and weight of the baby. In the end, the XGBoost algorithm emerged as the best among the models, proving to be the most
reliable predictive model for under-five mortality. This study has shown the potential of machine learning to help tackle critical public health challenges by leveraging diverse datasets to enhance decision-making and interventions. Household-level climate data were not used in this thesis and should be taken into account by future researchers.