A Hybrid Deep Learning Framework for Cardiovascular Risk Prediction Using Temporal Embeddings, Ensemble Learning, and Bayesian Uncertainty Estimation
Abstract
This study presents a hybrid deep learning framework for predicting cardiovascular disease (CVD) risk that combines four techniques in one system: Long Short-Term Memory (LSTM) autoencoders for temporal representation learning, hybrid feature fusion, stacked ensemble learning, and Bayesian uncertainty estimation. The framework is intended for early CVD risk stratification, with the aim of improving predictive performance, clinical acceptability, and interpretability. The data source was the widely used Framingham Heart Study dataset, with 4,240 records and 16 clinical variables. Preprocessing consisted of Hampel filtering for outlier removal, mean imputation of missing values, and Min-Max normalization. Principal Component Analysis (PCA) was then applied to retain the components that explain the most variance. To model the evolution of risk over time, a synthetic temporal sequence was generated and passed through the LSTM autoencoder, yielding 32-dimensional latent features. These temporal embeddings were concatenated with the PCA components to form a 41-dimensional hybrid feature space. Class imbalance was addressed with the Synthetic Minority Over-Sampling Technique (SMOTE). The stacked ensemble classifier used eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and Gradient Boosting as base learners, with a Multilayer Perceptron (MLP) trained as the meta-learner. For uncertainty quantification, a separate Bayesian MLP using Monte Carlo Dropout was built. The stacked model achieved 96.06% accuracy, 97.67% recall, and an area under the receiver operating characteristic curve (AUC-ROC) of 99.31%, surpassing the individual classifiers. Bayesian analysis produced a mean predictive uncertainty of 0.087. Stratified risk assessment revealed clinically relevant clusters, with close correspondence between predicted and actual CVD incidence. The resulting interpretable AI model provides accurate CVD risk prediction that is suitable for routine clinical use and wearable monitoring.
DOI: https://doi.org/10.31449/inf.v50i1.13040
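The three preprocessing steps named in the abstract (Hampel filtering, mean imputation, Min-Max normalization) can be illustrated for a single numeric column with the minimal sketch below. It is not the authors' implementation. The abstract does not give the Hampel parameters, so the sketch assumes a global (whole-column) variant with a threshold of three scaled median absolute deviations, and assumes that flagged values are replaced by the column median. The function names and the sample readings are hypothetical.

```python
from statistics import mean, median


def hampel_clip(values, n_sigmas=3.0):
    """Replace outliers with the column median (global Hampel rule, assumed variant)."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    # 1.4826 scales the MAD so it estimates the standard deviation for normal data.
    mad = 1.4826 * median([abs(v - med) for v in observed])
    if mad == 0:
        return list(values)  # no spread to measure against; leave the column as is
    limit = n_sigmas * mad
    return [med if v is not None and abs(v - med) > limit else v for v in values]


def mean_impute(values):
    """Fill missing entries (None) with the mean of the observed entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def preprocess_column(values):
    """Apply the three steps in the order listed in the abstract."""
    return min_max(mean_impute(hampel_clip(values)))


# Hypothetical systolic blood pressure readings; None marks a missing value.
sys_bp = [120.0, 130.0, None, 125.0, 400.0, 110.0, 140.0]
scaled = preprocess_column(sys_bp)
```

Here the reading of 400.0 is flagged and replaced before imputation, so it distorts neither the imputed mean nor the Min-Max range. In a real pipeline the median, mean, minimum, and maximum would be estimated on the training split only and then reused on held-out data.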
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







