TY - GEN
T1 - Predicting cardiovascular disease by combining optimal feature selection methods with machine learning
AU - Segura, Mauricio Rodriguez
AU - Nicolis, Orietta
AU - Marquez, Billy Peralta
AU - Carrillo Azocar, Juan
N1 - Publisher Copyright: © 2020 IEEE. Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/11/16
Y1 - 2020/11/16
N2 - Cardiovascular disease (CVD) is one of the main causes of death worldwide, and early detection could prevent deaths associated with cardiac problems. In this work, we propose a methodology based on data pre-processing and machine learning (ML) techniques for predicting cardiovascular disease using the Sleep Heart Health Study (SHHS) dataset. First, principal component analysis and lowest-p-value logistic regression are applied to select optimal features that could be related to CVD. The selected features are then used to train four ML algorithms: Naïve Bayes (NB), Feed-Forward Neural Networks (NN), Support Vector Machine (SVM), and Random Forest (RF). The output of the proposed models is a binary feature, and SMOTE sampling is used to balance the training set. Among the proposed methods, NN provided the best accuracy (0.81) and AUC (0.76), outperforming the results obtained in other studies.
KW - Cardiovascular disease
KW - classification models
KW - linear regression
KW - PCA
UR - http://www.scopus.com/inward/record.url?scp=85098634023&partnerID=8YFLogxK
U2 - 10.1109/SCCC51225.2020.9281168
DO - 10.1109/SCCC51225.2020.9281168
M3 - Conference contribution
AN - SCOPUS:85098634023
T3 - Proceedings - International Conference of the Chilean Computer Science Society, SCCC
BT - 2020 39th International Conference of the Chilean Computer Science Society, SCCC 2020
PB - IEEE Computer Society
T2 - 39th International Conference of the Chilean Computer Science Society, SCCC 2020
Y2 - 16 November 2020 through 20 November 2020
ER -