Rachael Mburu is a data analyst with three years of experience in research and data analysis. She previously worked for the International Center for Tropical Agriculture (CIAT) as a research and data analysis consultant and currently works for Triggerise as a data analyst. She has experience in data analysis and data visualization using R, STATA, SPSS, Tableau, and Power BI.
Project Title: Comparison of Elastic Net and Random Forest in identifying risk factors of stunting in children under five years of age in Kenya.
Children with a height-for-age z-score (HAZ) more than 2 standard deviations below the World Health Organization (WHO) child growth standards median are said to be stunted. According to the Kenya Demographic and Health Survey (KDHS, 2014), the national prevalence of stunting among children under five was 26\%, slightly higher than the 25\% average prevalence for developing countries. This work compares Random Forest and Elastic Net in identifying determinants of under-five childhood stunting, with variable importance as the key outcome. The KDHS women and children data were used for analysis. The data were cleaned using STATA and analyzed with R. Because the classes of the response variable were imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was employed to obtain class-balanced data. Missing observations were imputed using an imputation function from the randomForest library in R. The Random Forest and Elastic Net algorithms were used to obtain determinants of stunting, while the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve was used to compare the models. The top 5 factors in terms of importance according to Random Forest are: underweight status, region, child’s age, ethnicity, and mother’s current age. According to the Elastic Net algorithm, the top 5 variables by coefficient importance are: underweight children, Nairobi region, a preceding birth interval of 60+ months, children aged 12-23 months, and children of Luhya ethnicity. In terms of ROC performance, Random Forest had an AUC of 0.92 while Elastic Net had an AUC of 0.86. Based on our findings, most of the top-ranked variables selected by Random Forest and Elastic Net are similar. Nevertheless, Random Forest outperformed Elastic Net, as measured by AUC, in determining the factors of under-five childhood stunting.
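As a minimal illustration of the model comparison metric, the AUC can be read as the probability that a randomly chosen stunted child receives a higher predicted score than a randomly chosen non-stunted child, with ties counted as one half. The sketch below is in Python rather than the R workflow used in the project, and it is not the authors' code. The function name `roc_auc`, the toy labels, and the two score vectors are hypothetical and chosen only to show the calculation; they are not KDHS data or study results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair is counted as 1 when the positive case has the higher score,
    as 0.5 when the two scores are tied, and as 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumption): 1 = stunted (HAZ < -2), 0 = not stunted.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted probabilities from two competing models.
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b_scores = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

On this toy data the first model ranks more stunted/non-stunted pairs correctly and therefore has the higher AUC. This is the sense in which one classifier is said to perform better than another under this metric.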