MA in Quantitative Methods in the Social Sciences (Concentration: Data Science), Columbia University, Graduate School of Arts and Sciences (GSAS)
MPA in International Affairs (Specializations: United Nations Studies, International Conflict Resolution), Columbia University, School of International & Public Affairs (SIPA)
BA in Political Science (Concentration: International Affairs), Northwestern University
Work Experience
Founder, CEO, and Lead Geospatial Machine Learning Engineer @ Crisis Forecast - Las Vegas, NV - 2023 - Present
Data Scientist Intern @ Volkswagen Group of America (VGoA) - Belmont, CA - 2022-2023
SIPA Capstone Project Consultant @ International Peace Institute (IPI) - New York, NY - 2020
Protection of Civilians (PoC) Team Intern, Division of Policy, Evaluation, and Training @ United Nations Department of Peace Operations (UN DPO) - New York, NY - 2019
Refugee Status Determination (RSD) Intern @ United Nations High Commissioner for Refugees (UNHCR) - Cairo, Egypt - 2016
Technology Used: R Markdown, QGIS, mlr3, spatially aware cross-validation, XGBoost, Random Forest, Support Vector Machines, Principal Component Analysis, multi-objective tuning, hyperband, nsga2R, sf, raster, terra, spdep, blockCV, ggplot2, rayshader, pROC, PRROC, caret, data.table, future (parallel processing)
Contents: Geospatial machine learning model creation and performance comparison, feature engineering, creating dynamically weighted and asymmetrically penalized metrics for highly skewed data, spatial weight matrix creation, feature selection, feature importance, creating maps with fishnet grid cell observations, kernel density estimation heatmaps, Global and Local Moran's I calculation, variogram analysis for determining the range of spatial autocorrelation, and data visualization of 2D and 3D maps, bar charts, histograms, risk category distribution charts, ROC Curves, and Precision-Recall Curves.
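Global Moran's I, mentioned above, can be illustrated with a short sketch. This is a minimal pure-Python version (the portfolio itself uses R packages such as spdep); the weight matrix and values below are toy data, not from the project.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a binary spatial weight matrix.

    values  : list of floats, one per spatial unit
    weights : n x n list of lists; weights[i][j] > 0 if units i and j are neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Toy example: 5 cells in a chain, each neighboring the cell before and after it.
W = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(morans_i(x, W))  # → 0.5 (positive spatial autocorrelation along the chain)
```

A value near +1 indicates clustering of similar values, near -1 dispersion, and near the expected value -1/(n-1) spatial randomness.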
Contents: Geospatial interpolation, hot spot analysis, Moran's I calculation, Moran's I residual analysis, Lagrange Multiplier (LM) and Robust LM lag and error diagnostics, and Spatial Durbin models.
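Geospatial interpolation, listed above, can be sketched with inverse distance weighting (IDW), one common interpolation method; this is a hedged Python illustration of the concept with made-up points, not the project's R workflow.

```python
def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    num = den = 0.0
    for x, y, v in known:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return v  # target coincides with a sample point
        w = 1.0 / d2 ** (power / 2)  # weight decays with distance^power
        num += w * v
        den += w
    return num / den

pts = [(0, 0, 10.0), (2, 0, 20.0)]
print(idw(pts, (1, 0)))  # → 15.0, midpoint of two equidistant samples
```

Higher `power` values localize the estimate more strongly around nearby samples.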
Data Visualization Storytelling:
Title: "UNAMID: Did the UN’s Withdrawal from Darfur Lead to More Violence against Civilians?" (9 Pages)
Technology Used: R Markdown, sf, tidyr, dplyr, htmlwidgets, igraph, and visNetwork
Contents: Interactive undirected network graph, with nodes weighted by eigenvector centrality, edges weighted by total conflict episodes between node dyads, community clusters determined by the Walktrap graph cluster algorithm, edge type distinguished by color, and pop-up displays of edge attributes as users hover over edges.
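Eigenvector centrality, used above to weight nodes, can be computed by power iteration. This is a minimal sketch on a toy star graph, not the project's igraph/visNetwork code; the identity shift (`A + I`) is added so the iteration also converges on bipartite graphs.

```python
def eigenvector_centrality(adj, iters=200):
    """Eigenvector centrality via power iteration on (A + I)."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        # (A + I) v : the +v term shifts eigenvalues and prevents oscillation
        nxt = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(nxt)
        v = [x / norm for x in nxt]
    return v

# Star graph: node 0 connected to nodes 1, 2, 3.
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
c = eigenvector_centrality(A)
print(c)  # hub scores highest (1.0); leaves tie at 1/sqrt(3) ≈ 0.577
```

The hub's score dominates because its neighbors' scores feed back into it, which is exactly why eigenvector centrality highlights well-connected nodes in a conflict network.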
Contents: Converting news articles, grouped by publishing date, into time-series machine learning forecasting models, with a performance comparison of Ridge, Lasso, Random Forest, and XGBoost regression models.
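The ridge step of such a comparison can be sketched with its closed-form solution. This assumes NumPy is available; the lag features and daily counts below are hypothetical, standing in for article counts by publishing date.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Hypothetical daily article counts; predict today from the two previous days.
counts = np.array([5, 7, 6, 8, 9, 8, 10, 11], dtype=float)
X = np.column_stack([counts[1:-1], counts[:-2]])  # lag-1, lag-2 features
y = counts[2:]
w = ridge_fit(X, y, alpha=0.1)
pred = X @ w
```

The `alpha` penalty shrinks coefficients toward zero, which stabilizes forecasts when lagged features are highly correlated, as adjacent-day counts usually are.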
Title: "LDA Topic Modeling & VADER Sentiment Analysis for Political News Articles on Nigeria"
Technology Used: Python, R Markdown, Excel, NLTK for stopwords, PorterStemmer, and PunktSentenceTokenizer, gensim library for CoherenceModel, LdaModel, and corpora, Jaccard similarity, vaderSentiment library, itertools, ggplot2
Contents: Text data cleaning, Latent Dirichlet Allocation (LDA) topic modeling of Nigerian news article text, and VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis scores for articles containing specific political words, compared across quarters of the year.
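Jaccard similarity, listed in the technology above for comparing topic keyword sets, is simple to sketch; the topic keywords below are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of topic keywords."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)  # |intersection| / |union|

t1 = {"election", "party", "vote", "senate"}
t2 = {"election", "vote", "court", "ruling"}
print(jaccard(t1, t2))  # 2 shared terms / 6 distinct terms ≈ 0.333
```

In topic-model selection, low average Jaccard overlap between topics (alongside high coherence) suggests the chosen number of topics yields distinct, non-redundant topics.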
Focus: Estimating the causal effect of a job training program on post-intervention earnings using randomized experimental data, with regression adjustment, treatment heterogeneity analysis, and Monte Carlo power validation.
Technology Used: Python, pandas, NumPy, SciPy, statsmodels (OLS, ANCOVA), seaborn, matplotlib, Monte Carlo simulation.
Contents: A randomized controlled trial of 445 participants evaluating the impact of a job training program on 1978 earnings. Randomization was validated through baseline balance tests and standardized mean differences. Treatment effects were estimated using Welch mean comparisons, fully adjusted regression with HC3 robust standard errors, and ANCOVA controlling for baseline earnings, with interaction models assessing heterogeneity and log-income models addressing skewness. The preferred ANCOVA model estimated a statistically significant earnings increase of about $1,773 (95% CI: $443–$3,102). Analytical power was approximately 80%, confirmed via Monte Carlo simulation.
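The Monte Carlo power check described above can be sketched as follows. This uses a normal-approximation critical value rather than a full Welch t-test (reasonable at a few hundred units per arm), and the arm sizes, effect, and standard deviation are hypothetical, not the study's actual values.

```python
import random
import statistics

def mc_power(n_t, n_c, effect, sd, sims=2000, crit_z=1.96, seed=1):
    """Fraction of simulated trials in which the treatment-control mean
    difference is detected at the ~5% level (normal approximation)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        treat = [rng.gauss(effect, sd) for _ in range(n_t)]
        ctrl = [rng.gauss(0.0, sd) for _ in range(n_c)]
        diff = statistics.mean(treat) - statistics.mean(ctrl)
        se = (statistics.variance(treat) / n_t
              + statistics.variance(ctrl) / n_c) ** 0.5
        if abs(diff / se) > crit_z:
            hits += 1
    return hits / sims

power = mc_power(n_t=185, n_c=260, effect=1600, sd=6000)
```

Simulated power close to the analytical figure is a useful sanity check that the study design could plausibly detect an effect of the estimated size.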
Contents: An observational study analyzing 548 sales observations across 137 retail locations and 10 markets. Employed linear mixed effects models with nested random effects to account for hierarchical data structure (locations within markets). Identified bimodal distribution in sales, stratified analysis by market segment, and conducted pairwise comparisons with Bonferroni correction. Found Promotion 1 fairly consistently associated with higher sales compared to alternatives across market segments.
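The Bonferroni correction used for the pairwise comparisons above is straightforward to sketch; the p-values below are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted threshold (alpha / m) and, for each
    p-value, whether it remains significant after correction."""
    threshold = alpha / len(p_values)
    return threshold, [(p, p < threshold) for p in p_values]

thr, results = bonferroni([0.004, 0.020, 0.300])
print(thr)      # 0.05 / 3 comparisons
print(results)  # only p = 0.004 survives the corrected threshold
```

Dividing alpha by the number of comparisons controls the family-wise error rate, at the cost of conservatism as the number of pairwise tests grows.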
Contents: API design with POST /predict for multipart file upload (UploadFile) and a health check endpoint (GET /). One-time initialization at application startup using FastAPI lifespan to avoid reloading tokenizer/analyzer per request. Sentence segmentation with a Punkt tokenizer updated using a custom abbreviation set to avoid splitting on common abbreviations. Sentence-level VADER compound scoring and aggregation into an overall mean sentiment score for the full document. Structured, validated JSON responses using Pydantic response models (filename, sentence count, per-sentence scores, overall score). Local execution using Uvicorn.
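The aggregation step described above can be sketched without the web framework. The full endpoint requires FastAPI, NLTK, and VADER; this hedged sketch shows only how per-sentence compound scores might be combined into the document-level response shape, with placeholder scores and an illustrative (not the actual) response layout.

```python
from statistics import mean

def aggregate_sentiment(sentence_scores):
    """Combine per-sentence VADER compound scores into a document-level
    summary of the kind a /predict endpoint could return (shape illustrative)."""
    return {
        "sentence_count": len(sentence_scores),
        "per_sentence": sentence_scores,
        "overall": round(mean(sentence_scores), 4) if sentence_scores else 0.0,
    }

print(aggregate_sentiment([0.6, -0.2, 0.1])["overall"])  # → 0.1667
```

In the real service, a Pydantic response model would validate this structure before it is serialized to JSON.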
Contents: A proof-of-concept for using Retrieval Augmented Generation (RAG) with an LLM to search PDF documents on a computer, providing context for chatbot queries.
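The retrieval half of RAG can be sketched with bag-of-words cosine similarity. Real pipelines use learned embeddings and a vector store; this toy version, with invented document chunks, only illustrates the "find the most relevant chunks, then hand them to the LLM as context" step.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=1):
    """Return the k chunks most similar to the query; in a RAG pipeline
    these would be prepended to the LLM prompt as context."""
    q = Counter(query.lower().split())
    return sorted(chunks, reverse=True,
                  key=lambda c: cosine(Counter(c.lower().split()), q))[:k]

docs = ["the treaty was signed in 1998",
        "rainfall patterns shifted in the region",
        "the peace treaty reduced violence"]
print(retrieve(docs, "when was the treaty signed"))
```

Swapping the bag-of-words vectors for sentence embeddings turns this into the standard dense-retrieval setup used in practice.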
Contents: Web scraping using BeautifulSoup and TOR; structuring originating URLs, scraped URLs, and the text and tables from scraped pages into Pandas DataFrames for analysis. Extensive cleaning of scraped text via regex. Viewing styled Pandas DataFrames and rendering scraped tables as HTML.
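The regex cleaning step above can be sketched with a few typical rules; the specific patterns and the sample string are illustrative, not the project's actual cleaning pipeline.

```python
import re

def clean_scraped_text(text):
    """Typical regex cleanup for scraped article text (illustrative rules)."""
    text = re.sub(r"<[^>]+>", " ", text)      # strip leftover HTML tags
    text = re.sub(r"https?://\S+", "", text)  # drop bare URLs
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    return text.strip()

raw = "<p>Sales  rose.</p>  See https://example.com for details."
print(clean_scraped_text(raw))  # → "Sales rose. See for details."
```

Order matters here: tags and URLs are removed first so the final whitespace pass can normalize the gaps they leave behind.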
Convolutional Neural Networks (CNNs) for Image Classification:
Contents: Comparing the performance of 8 CNN deep learning models on X-ray images from three classes (COVID-19, viral pneumonia, and healthy). These include transfer learning models (e.g., InceptionV3), and various techniques to improve model generalization and help avoid overfitting (e.g., dropout, batch normalization, early stopping, data augmentation, L1 and L2 regularization, fire modules, and ways of using deep networks effectively). I also demonstrate best practices for structuring filters/kernels, channels, layers, activation functions, pooling, convolutional blocks, and other model components for optimal performance. Metrics include confusion matrices, accuracy, precision, recall, F1-score, ROC curve, and AUC. An analysis of non-augmented vs. augmented data models, with the specific augmentation techniques used, is shown. Architectures and training strategies for each model are detailed.
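The evaluation metrics named above all derive from confusion-matrix counts, which a short sketch makes concrete; the counts below are invented for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

m = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(round(m["f1"], 3))  # → 0.842
```

On imbalanced medical-imaging classes, precision, recall, and F1 are more informative than raw accuracy, which is why they appear alongside it above.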
Contents: Comparing the performance of 5 RNN sequential text classification models with varying architectures, trained on text from accurate and misleading tweets about COVID-19, in order to classify unseen tweets as containing true or false information.
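Before any RNN sees a tweet, the text must be encoded as fixed-length integer sequences. This sketch shows that standard preprocessing step with a toy vocabulary; the example tweets are invented and the `max_len` choice is arbitrary.

```python
def encode_and_pad(texts, max_len=6):
    """Map tokens to integer ids and pad/truncate each sequence to max_len,
    the usual preprocessing before feeding text into an RNN."""
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids for padding / unseen tokens
    for t in texts:
        for tok in t.lower().split():
            vocab.setdefault(tok, len(vocab))
    seqs = []
    for t in texts:
        ids = [vocab.get(tok, 1) for tok in t.lower().split()][:max_len]
        seqs.append(ids + [0] * (max_len - len(ids)))  # right-pad with <pad>
    return vocab, seqs

vocab, seqs = encode_and_pad(["masks reduce spread",
                              "vaccines cause no harm at all"])
print(seqs)  # → [[2, 3, 4, 0, 0, 0], [5, 6, 7, 8, 9, 10]]
```

The resulting equal-length id sequences are what an embedding layer consumes, regardless of which RNN architecture sits on top.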
About
Visit the website below to view the HTML files in this portfolio