ddong63/Portfolio

Project 1: Data Visualization of COVID

Dashboard: link

  • Processed the COVID structured data (incl. data quality check) using dplyr
  • Created plots with ggplot2 and plotly to show the trends over time

Project 2: De-Identification Pipeline wiki

  • Pulled notes from the server and identified personal health information (PHI) using Named Entity Recognition algorithms.
  • Checked the notes and flagged those requiring manual review.
  • Developed a resynthesis algorithm that replaces identifiers (name, address, etc.) in the raw text with surrogates.
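The resynthesis step can be illustrated with a minimal sketch. It is not the project's actual implementation: the `SURROGATES` table, the `resynthesize` function, and the `(start, end, label)` span format are all hypothetical, and the spans are assumed to be non-overlapping character offsets of the kind an NER step might produce.

```python
# Hypothetical surrogate values per entity label; a real pipeline would
# likely draw these from larger, randomized pools.
SURROGATES = {
    "NAME": "Jane Doe",
    "ADDRESS": "100 Main St",
}


def resynthesize(text, spans):
    """Replace each (start, end, label) span in text with a surrogate.

    Spans are assumed to be non-overlapping character offsets.
    """
    # Work from the end of the text so that earlier offsets stay valid
    # even when a surrogate differs in length from the original.
    for start, end, label in sorted(spans, reverse=True):
        surrogate = SURROGATES.get(label, f"[{label}]")
        text = text[:start] + surrogate + text[end:]
    return text


# Toy note and hand-written spans, for illustration only.
note = "Patient John Smith lives at 42 Elm Road."
spans = [(8, 18, "NAME"), (28, 39, "ADDRESS")]
print(resynthesize(note, spans))
```

Replacing from right to left avoids recomputing offsets after each substitution, which matters because surrogates rarely match the length of the identifier they replace.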

Project 3: Cluster Membership

Scope: clustered 48 antigens into 4 groups.

  • Fitted standardized data across time points and treatment groups.
  • Applied a hierarchical clustering algorithm with Ward's minimum variance method to find compact, spherical clusters.
  • Chose the number of clusters by cutting the resulting hierarchical tree based on Pearson correlation.
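As a rough illustration of Ward's minimum variance method, the sketch below greedily merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. It is a plain-Python teaching version, not the project's code: the function names and the toy data are assumptions, the number of clusters `k` is taken as given, and the standardization and Pearson-based cut described above are not reproduced.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(rows, k):
    """Merge clusters until k remain; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [rows[r] for r in clusters[ij[0]]],
                [rows[r] for r in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because combinations yields i < j
    return clusters


# Toy profiles: two visibly separated groups of three rows each.
rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
        [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]]
groups = ward_clusters(rows, k=2)
print(sorted(sorted(g) for g in groups))
```

This naive version recomputes centroids at every step and is only suitable for small inputs; for real data a library routine is the usual choice.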

Project 4: Lending Club Loan Risk Prediction

Data size: 500k+; Data Source: Link

  • Cleaned and processed the Lending Club open-source dataset (2019 Q1-Q4).
  • Applied feature engineering to 150+ features and identified the top factors for loan risk.
  • Trained Random Forest and Gradient Boosting models to predict loan risk (average precision and recall > 75%).
  • Created an interactive loan risk prediction tool using a Flask API.
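For readers unfamiliar with the metrics quoted above, the following sketch shows how precision and recall are computed from binary predictions. The `precision_recall` helper and the label vectors are made up for illustration; they are not the project's data or evaluation code, and how the project averaged its scores is not stated.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted
    # or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: 1 = risky loan, 0 = safe loan (hypothetical encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision answers "of the loans flagged as risky, how many were risky?", while recall answers "of the risky loans, how many were flagged?".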
