TY - JOUR
T1 - Supervised Learning Algorithm for Predicting Mortality Risk in Older Adults Using Cardiovascular Health Study Dataset
AU - Navarrete, Jean Paul
AU - Pinto, Jose
AU - Figueroa, Rosa Liliana
AU - Lagos, Maria Elena
AU - Zeng, Qing
AU - Taramasco, Carla
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2022/11
Y1 - 2022/11
N2 - Featured Application: In this project, we designed an algorithm to predict mortality from multiple chronic conditions and cardiovascular diseases. We designed this algorithm to function as a decision aid for healthcare professionals. Multiple chronic conditions are an important factor influencing mortality in older adults. At the same time, cardiovascular events in older adult patients are one of the leading causes of mortality worldwide. This study aimed to design a machine learning model capable of predicting mortality risk in older adult patients with cardiovascular pathologies and multiple chronic diseases using the Cardiovascular Health Study database. The methodology for algorithm design included (i) database analysis, (ii) variable selection, (iii) feature matrix creation and data preprocessing, (iv) model training, and (v) performance analysis. The analysis and variable selection were performed through previous knowledge, correlation, and histograms to visualize the data distribution. The machine learning models selected were random forest, support vector machine, and logistic regression. The models were trained using two sets of variables. First, eight years of the data were summarized as the mode of all years per patient for each variable (123 variables). The second set of variables was obtained from the mode every three years (369 variables). The results show that the random forest trained with the second set of variables has the best performance (89% accuracy), which is better than other reported results in the literature.
KW - Cardiovascular Health Study
KW - logistic regression
KW - machine learning
KW - mortality risk
KW - multiple chronic diseases
KW - random forest
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85142831103&partnerID=8YFLogxK
U2 - 10.3390/app122211536
DO - 10.3390/app122211536
M3 - Article
AN - SCOPUS:85142831103
SN - 2076-3417
VL - 12
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 22
M1 - 11536
ER -