Project-Portfolio

I have completed a series of data-related projects and challenges to demonstrate what I have learned and how I have applied the skills I acquired throughout my learning journey.

Python for Data Science
Data Engineering
SQL
Machine Learning
Business Intelligence (Tableau)

Python for Data Science

Project	Description	Project Scope	Requirements	Libraries	Date Completed
ABC Product Segmentation and Analysis	The dataset covers a year of electronics sales transactions. The business objective is to identify the products that generated 80% of the profit.	Data exploration and preprocessing, exploratory data analysis, ABC product segmentation and analysis	Python, Python IDE or text editor and CLI, Jupyter notebook	numpy, pandas, matplotlib, seaborn	March 2024
Customer Segmentation - Percentile Ranking	Segment the customers of an online gift store using its transaction history	Data extraction, data exploration, data preprocessing, exploratory data analysis, customer segmentation based on RFM metrics	Python, Python IDE or text editor and CLI, Jupyter notebook	os, sys, psycopg2, pandas, numpy, matplotlib, seaborn, dotenv, scipy	August 2023
Happy Deliveries Case Study	Analyze data from Happy Deliveries, a food delivery business based in Ireland that aims to use its data to gain an advantage over its competitors.	Data cleaning, exploratory data analysis	Python, Jupyter notebook	pandas, numpy, matplotlib, seaborn	May 2023
Product Range Analysis	Masterschool's capstone project, integrating skills and tools for data analysis.	Data preprocessing, exploratory data analysis, customer segment analysis, product category analysis, statistical hypothesis testing, insights	Python, Jupyter notebook	pandas, numpy, matplotlib, seaborn, scipy	August 2022
A/B Test – Online Store	Analyze an incomplete A/B test whose expected result was that, within 14 days of signing up, users would show better conversion into product page views, product cart views, and purchases, with at least a 10% increase at each stage of the funnel.	Data exploration, data preprocessing, exploratory data analysis, evaluation of A/B test results, and conclusion	Python, Jupyter notebook	pandas, numpy, matplotlib, seaborn, plotly.express, scipy, statsmodels.api	August 2022
Comparing Search Interest with Google Trends	Analyze Google Trends data on the popularity of five major browsers and of the Kardashian & Jenner sisters.	Statistical data analysis	Python, Jupyter notebook	pandas, matplotlib	July 2022
Data Storytelling	Build a data story from the movies dataset of a fictitious production company	Data collection, data preprocessing, insights, presentation	Python, Jupyter notebook, Google Slides or PowerPoint for the presentation	pandas, numpy, ast, re, matplotlib, seaborn	May 2022
Communicate Data Findings	Demonstrate the value of data visualization in the data analysis process through exploratory and explanatory analysis of loan data from Prosper Marketplace	Preliminary wrangling; univariate, bivariate, and multivariate exploration; inferences; explanatory data analysis; slideshow to convey key insights	Python, Jupyter notebook	pandas, numpy, matplotlib, seaborn	May 2022
Data Wrangling - Twitter Data	Perform data wrangling tasks on the tweet archive of Twitter user @dog_rates, also known as WeRateDogs	Data gathering, data quality audit, data cleaning, analysis, and visualization	Python, Jupyter notebook	pandas, numpy, matplotlib, requests, json, tweepy, IPython.core.display	April 2022
Analyze A/B Test Results	Perform an A/B test analysis to help the company decide whether to implement the new web page, keep the old page, or run the experiment longer.	Data exploration, data preprocessing, probability analysis, hypothesis testing, logistic regression, conclusion	Python, Jupyter notebook	pandas, numpy, random, matplotlib, statsmodels.api, subprocess	February 2022
Explore US Bikeshare Data	Use Python to explore bike-sharing data from three major US cities: Chicago, New York, and Washington.	Data exploration, descriptive statistics, scripting (take in raw user input to create an interactive CLI experience that presents these statistics)	Python, Jupyter notebook, Python IDE or text editor + CLI	pandas, time, numpy, calendar	February 2022
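
The ABC segmentation project ranks products by profit and draws the line where the cumulative share reaches 80%. A minimal sketch of that idea in plain Python follows; the function name `abc_segment`, the 80%/95% cut-offs for classes A, B, and C, and the sample figures are illustrative assumptions, not the project's actual code or data.

```python
def abc_segment(profit_by_product, a_cut=0.80, b_cut=0.95):
    """Hypothetical ABC segmentation by cumulative profit share.

    Products are ranked by descending profit. A product is class A while the
    profit accumulated *before* it is below a_cut (so the product that crosses
    the 80% line is still counted in A), class B while below b_cut, else C.
    Assumes non-negative profits.
    """
    total = sum(profit_by_product.values())
    ranked = sorted(profit_by_product.items(), key=lambda kv: kv[1], reverse=True)
    segments = {}
    cumulative = 0.0
    for product, profit in ranked:
        share_before = cumulative / total
        if share_before < a_cut:
            segments[product] = "A"
        elif share_before < b_cut:
            segments[product] = "B"
        else:
            segments[product] = "C"
        cumulative += profit
    return segments


# Illustrative data only: laptop and monitor together cover 80% of profit.
print(abc_segment({"laptop": 500, "monitor": 300, "cable": 100, "mouse": 60, "case": 40}))
# → {'laptop': 'A', 'monitor': 'A', 'cable': 'B', 'mouse': 'B', 'case': 'C'}
```

In pandas the same logic is usually expressed with `sort_values`, `cumsum`, and a division by the total.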

Data Engineering

Project	Description	Project Scope	Requirements	Libraries	Date Completed
ETL Pipeline	Build an ETL pipeline for the online transaction data of an e-commerce platform	Extract: connect to the Amazon Redshift data warehouse and extract online transaction data, with transformations performed in SQL; Transform: identify and remove duplicate records; Load: connect to Amazon S3 cloud object storage and write the cleaned data as a CSV file to an S3 bucket; Run: execute the ETL from the CLI and with Docker	Python, DBeaver, Python IDE or text editor + CLI, Docker, Amazon Redshift credentials, Amazon S3 details	psycopg2-binary, pandas, boto3, dotenv	June 2023
Data Modeling with Postgres	Create a database schema and a Postgres database, and develop ETL processes for song and log data collected from a music streaming app	Create a database schema by defining fact and dimension tables; create a Postgres database with tables designed to optimize queries for song play analysis; build an ETL pipeline that transfers data from files in two local directories into these tables using Python and SQL	Python, Python IDE or text editor + CLI, Jupyter notebook	psycopg2, os, glob, pandas, ipython-sql	March 2022
Data Modeling with Cassandra	Create an Apache Cassandra database and build an ETL pipeline for song and user activity data from a music streaming app, stored in a directory of CSV files	Model an Apache Cassandra database and build an ETL pipeline	Python, Jupyter notebook	pandas, cassandra, re, os, glob, numpy, json, csv	March 2022
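
The Transform step of the ETL pipeline removes duplicate records before the CSV is written to S3. Below is a stdlib-only sketch of that step, assuming "duplicate" means an exact whole-row match; the function names and sample rows are hypothetical. The project itself lists pandas, where `DataFrame.drop_duplicates()` does the same job, and the S3 upload (for example with boto3's `put_object`) is omitted here.

```python
import csv
import io


def remove_duplicates(rows):
    """Drop exact duplicate rows, keeping the first occurrence and original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # whole-row key: assumes duplicates match on every field
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def to_csv(rows, header):
    """Serialize rows to a CSV string, ready to be uploaded as an object body."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical extracted records; the third is a duplicate of the first.
extracted = [["1001", "USB cable", "9.99"], ["1002", "Mouse", "5.00"], ["1001", "USB cable", "9.99"]]
print(to_csv(remove_duplicates(extracted), ["invoice", "item", "price"]))
```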

SQL

Project / Challenge	Description	Tasks	Date Completed
E-book Startup	Answer business questions to help the startup create a product value proposition	Data exploration	August 2022
Deforestation Exploration	Create views and write simple and complex SQL queries to answer questions from the fictitious ForestQuery management about deforestation's global situation, regional outlook, and country-level details	Data exploration	March 2022
SQL Interview Questions on DataLemur	Solutions to SQL interview questions at each difficulty level		In progress

Machine Learning

Project	Description	Project Scope	Area	Algorithm	Date Completed
Customer Segmentation based on RFM-Kmeans Clustering	Carry out customer segmentation based on RFM metrics using K-means clustering	Data exploration, data preprocessing, EDA, customer segmentation	Unsupervised Learning - Clustering	K-means Clustering	August 2023
Predicting Credit Card Approvals	Build an automatic credit card approval predictor using the credit card approval dataset	Data exploration, data preprocessing (scaling, label encoding, missing value imputation), EDA, fitting a logistic regression model to the training set, making predictions, evaluating performance	Supervised Learning - Classification	Logistic Regression	August 2022
Customer RFM Segmentation using K-means Clustering	Segment customers by recency, frequency, and monetary value using K-means clustering	Data preprocessing, EDA, customer segmentation	Unsupervised Learning - Clustering	K-means Clustering	August 2022
Product Categorization	Identify product categories from the description field in the transaction history of an online gift store	Data preprocessing, EDA, product categorization	NLP, Unsupervised Learning - Clustering	TF-IDF, K-means Clustering, nltk, wordcloud	August 2022
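
Both RFM projects group customers with K-means. The sketch below shows the technique in plain Python (Lloyd's algorithm) on standardized recency, frequency, and monetary columns. It is not the projects' code: the deterministic farthest-first seeding stands in for the k-means++ initialization that libraries such as scikit-learn use by default, and the sample RFM values are made up.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so recency, frequency, and monetary weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest chosen centroid (a stand-in for k-means++)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [statistics.fmean(d) for d in zip(*members)]
    return labels, centroids


# Made-up (recency_days, frequency, monetary) rows: three active, three lapsed.
rfm = [(5, 20, 2000), (3, 25, 2500), (4, 22, 2100), (300, 1, 50), (280, 2, 80), (310, 1, 40)]
labels, _ = kmeans(standardize(rfm), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Standardizing first matters because monetary values would otherwise dominate the Euclidean distance.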

Business Intelligence (Tableau)

Project / Challenge	Dataset	Visualization	Date Completed
Electronic Sales Performance Dashboard	Preprocessed electronics dataset	Sales overview, performance by product, category, state/city, and hour, top performing products and categories, product segments analysis, key products analysis	March 2024
Executive Dashboard	Preprocessed online transaction data and customer segments data from the Customer Segmentation project	Text table to show performance metrics, top-performing products and top-performing customers bar charts, revenue by segment pie chart, daily revenue timeline with 7-day moving average line chart	August 2023
Superstore - Executive Dashboard	Superstore dataset	Map to visualize profit ratio by geography, area charts to convey sales by category and by segment, text table using measure names and measure values as the key performance indicators, bar chart that appears when hovered over the map	August 2022
Max and Min Sales by Month	Superstore dataset	Table showing sub-category by month: sub-categories are sorted in descending order by total sales of the year; for each month, highlight the min sales in red, and max sales in blue; users should be able to filter by year and category	August 2022
Customers whose Sales have Increased Each Year	Superstore dataset	Text table showing sales by customers and years but only including customers whose sales have increased from 2018 to 2021 and sorting by 2021 sales amount	August 2022
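
The Executive Dashboard's daily revenue timeline includes a 7-day moving average. As an illustration of the arithmetic only, here is a trailing moving average in Python; the function name is hypothetical, and the assumption that early days average over however many days are available is roughly analogous to a Tableau table calculation such as `WINDOW_AVG(SUM([Revenue]), -6, 0)` on a hypothetical `[Revenue]` field, not a description of the dashboard's exact setup.

```python
from collections import deque


def trailing_moving_average(values, window=7):
    """Mean of the current value and up to window-1 previous values.

    The first few points average over fewer days because the window
    cannot extend before the first row.
    """
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / len(buf))
    return out


print(trailing_moving_average([1, 2, 3, 4, 5, 6, 7, 8]))
# → [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
```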
