COVID-19 Mortality Prediction among Patients Using Epidemiological Parameters: An Ensemble Machine Learning Approach

Krishnaraj Chadaga1

Srikanth Prabhu1,*

Shashikiran Umakanth2

Vivekananda Bhat K1

Niranjana Sampathila3

Rajagopala Chadaga P4 

Krishna Prakasha K5

1Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

2Department of Medicine, Dr. TMA Pai Hospital, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

3Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

4Department of Mechanical and Manufacturing Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

5Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India

Abstract

Coronavirus disease 2019 (COVID-19) is a dangerous disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It spread quickly around the world and was declared a global pandemic on 11 March 2020. Vaccines have been developed to prevent the spread of the disease, and research into a cure is ongoing. Machine learning (ML) has proven useful in battling COVID-19, and various applications have been deployed to understand real-world events through careful analysis of data. In this work, we perform a retrospective analysis of epidemiological parameters to predict mortality among SARS-CoV-2 patients. The goal is to identify the predictive parameters that indicate which patients are at the highest risk of death. Supervised ensemble machine learning models, namely random forest, CatBoost, AdaBoost, gradient boosting, extreme gradient boosting (XGBoost) and LightGBM (Light Gradient Boosting Machine), were developed on a COVID-19 epidemiology dataset obtained from Mexico. Before the models were built, Pearson's correlation and mutual information analysis between the dependent and independent features were used to establish the strength of association between features in the dataset. Extreme gradient boosting achieved the best performance, with an accuracy of 96%.
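As an illustration of the feature-screening step mentioned in the abstract, the following minimal sketch computes Pearson's correlation and mutual information between binary features and a binary outcome. It is not the authors' code: the feature names (`pneumonia`, `sex`), the outcome name (`died`) and the eight toy records are hypothetical and are not taken from the Mexican dataset. The sketch uses only the Python standard library.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical toy records (assumption: 1 = present / deceased, 0 = absent / survived).
died = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "pneumonia": [1, 1, 1, 0, 0, 0, 0, 1],
    "sex": [0, 1, 0, 1, 0, 1, 0, 1],
}

# Rank features by mutual information with the outcome, strongest first.
ranking = sorted(
    features,
    key=lambda name: mutual_information(features[name], died),
    reverse=True,
)

for name in ranking:
    r = pearson(features[name], died)
    mi = mutual_information(features[name], died)
    print(f"{name}: pearson={r:.3f}, mutual_information={mi:.3f}")
```

In a setup like this, features that score highly on both measures would be natural candidates to pass to the ensemble classifiers. The real study may have screened features differently.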