The projects in this repository were created using RStudio and R.
- RStudio: Version 2023.09.1+494 "Desert Sunflower" Release (cd7011dce393115d3a7c3db799dda4b1c7e88711, 2023-10-16) for Windows
- R Version: 4.3.1
To identify potential predictors of death in heart failure (HF) patients, this project performs an exploratory analysis and subset selection, then constructs a supervised learning (SL) model using logistic regression. The aim is to identify risk factors for people living with HF, in the hope of suggesting lifestyle changes and treatments that prolong the lives of people living with this condition.
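As a rough illustration of the modelling step, the sketch below fits a logistic regression with base R's `glm()` and then applies backward stepwise selection with `step()`. The data frame `toy` and the columns `age`, `creatinine`, and `death` are simulated placeholders, not variables taken from the repository, and AIC-based stepwise selection is only an assumed stand-in for whichever subset selection method the project actually uses.

```r
# Simulated placeholder data; none of these values come from MIMIC-III.
set.seed(1)
n <- 200
toy <- data.frame(
  age        = round(rnorm(n, mean = 70, sd = 10)),
  creatinine = rlnorm(n, meanlog = 0, sdlog = 0.4)
)
# Simulate a binary death outcome whose log-odds depend on both predictors
toy$death <- rbinom(n, 1, plogis(-6 + 0.07 * toy$age + 0.5 * toy$creatinine))

# Logistic regression: binomial family with the default logit link
fit <- glm(death ~ age + creatinine, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios per one-unit increase
odds_ratios <- exp(coef(fit))

# Backward stepwise selection by AIC (assumed method, see note above)
reduced <- step(fit, trace = 0)
```

Calling `summary(reduced)` would show which predictors remain after selection, along with their estimated coefficients and standard errors.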
The data used in this project is sourced from the Medical Information Mart for Intensive Care (MIMIC-III), a large, single-center database that holds information about patient admissions to critical care units at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA. The dataset was adapted to include selected variables relating to demographics, vital signs, comorbidities, and laboratory measurements. Patients in the source data were older than 15 years of age at the time of intensive care unit (ICU) admission; of these, 13,389 had a diagnosis of HF, and 1,177 adult patients were included in this analysis.
- Data Cleaning. The data is cleaned prior to analysis and subset creation in RStudio.
- Feature Engineering. The dataset contains 1,929 missing values, and the proportion missing varies by variable. Predictive mean matching (PMM) is used to impute them.
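To show the idea behind PMM, here is a simplified single-column sketch in base R. It fits a linear model on the rows where the target is observed, predicts a mean for every row, and fills each missing value with the observed value of a randomly chosen donor among the `k` rows with the closest predicted means. The function `pmm_impute`, the data frame `toy`, and its columns are hypothetical; the sketch assumes the predictors are complete and omits the parameter draws and multiple imputations that a full implementation (for example the `mice` package) would perform, so it should not be read as the project's actual imputation code.

```r
# Simplified predictive mean matching for one numeric column (illustrative only).
pmm_impute <- function(data, target, predictors, k = 5) {
  obs <- !is.na(data[[target]])
  if (all(obs)) return(data)
  # Fit a linear model using only rows where the target is observed
  fit <- lm(reformulate(predictors, response = target),
            data = data[obs, , drop = FALSE])
  # Predicted means for every row, observed or missing
  pred <- predict(fit, newdata = data)
  pred_obs <- pred[obs]
  y_obs <- data[[target]][obs]
  for (i in which(!obs)) {
    # Donors: the k observed rows whose predicted means are closest to row i
    donors <- order(abs(pred_obs - pred[i]))[seq_len(min(k, length(y_obs)))]
    # Borrow the observed value of one randomly chosen donor
    data[[target]][i] <- y_obs[donors[sample.int(length(donors), 1)]]
  }
  data
}

# Hypothetical toy data with two missing heart-rate values
set.seed(42)
toy <- data.frame(
  age        = c(60, 72, 55, 81, 67, 90, 48, 76),
  heart_rate = c(80, NA, 75, 95, NA, 102, 70, 88)
)
imputed <- pmm_impute(toy, "heart_rate", "age", k = 3)
```

Because imputed values are borrowed from observed donors, they always fall within the range of real measurements, which is one reason PMM is often preferred over filling in raw regression predictions.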