I have completed a series of data-related projects and challenges to demonstrate what I have learned and how I have applied the skills I acquired throughout my learning journey.
| Project | Description | Project Scope | Requirements | Libraries | Completed |
|---|---|---|---|---|---|
| ABC Product Segmentation and Analysis | The dataset is a year's worth of electronics sales transactions. The business objective is to identify the products which generated 80% of profit. | Data exploration and preprocessing, exploratory data analysis, and ABC product segmentation and analysis | Python, Python IDE or text editor and CLI, Jupyter notebook | numpy, pandas, matplotlib, seaborn | March 2024 |
| Customer Segmentation - Percentile Ranking | Carry out segmentation of customers using the transaction history of an online gift store | Data extraction, data exploration, data preprocessing, exploratory data analysis, customer segmentation based on RFM metrics | Python, Python IDE or text editor and CLI, Jupyter notebook | os, sys, pyscopg2, pandas, numpy, matplotlib, seaborn, dotenv, scipy, | August 2023 |
| Happy Deliveries Case Study | Use the data from Happy Deliveries, a food delivery business based in Ireland. The company aims to utilize its data to gain an advantage over its competitors. | Data cleaning, exploratory data analysis | Python, Jupyter notebook | Pandas, numpy, matplotlib, seaborn | May 2023 |
| Product Range Analysis | Masterschool's capstone project integrating skills and tools for data analysis. | Data Preprocessing, Exploratory Data Analysis, Customer Segments Analysis, Product Category Analysis, Statistical Hypotheses, Insights | Python, Jupyter notebook | Pandas, numpy, matplotlib, seaborn,scipy | August 2022 |
| A/B Test – Online Store | Conduct an analysis of an incomplete A/B test project with expected result: Within 14 days of signing up, users will show better conversion into product page views, product cart views and purchases. At each of the stages of the funnel, there will be at least a 10% increase. | Data exploration, data preprocessing, exploratory data analysis, evaluation of A/B test results, and conclusion | Python, Jupyter notebook | Pandas, numpy, matplotlib, seaborn, ploty.express, scipy, statsmodel.pi | August 2022 |
| Comparing Search Interest with Google Trends | Analyze Google Trends data related to the popularity of five major browsers and the Kardashian & Jenner sisters. | statistical data analysis | Python, Jupyter notebook | Pandas, matplotlib | July 2022 |
| Data Storytelling | Build a data story from the movies dataset of a fictitious production company | Data collection, data preprocessing, insights, presentation | Python, Jupyter notebook, Google slides or PowerPoint for the presentation | Pandas, numpy, ast, re, matplotlib, seaborn | May 2022 |
| Communicate Data Findings | Demonstrate the importance and value of data visualization techniques in the data analysis process: exploratory data analysis and explanatory data analysis using the loan data from Proper Marketplace | Preliminary wrangling, univariate, bivariate, and multivariate exploration, inferences, explanatory data analysis, slideshow to convey key insights | Python, Jupyter notebook | Pandas, numpy, matplotlib, seaborn | May 2022 |
| Data Wrangling - Twitter Data | Perform data wrangling tasks on the tweet archive of Twitter user @dog_rates, also known as WeRateDogs | Data gathering, data quality audit, data cleaning, analysis, and visualization | Python, Jupyter notebook | Pandas, numpy, matplotlib, request, json, tweepy, IPython.core.display | April 2022 |
| Analyze A/B Test Result | Performed A/B test and help the company decide whether they should implement the new web page, keep the old page, or run the experiment longer. | Data exploration, data preprocessing, probability analysis, hypothesis testing, logistic regression, conclusion | Python, Jupyter notebook | pandas, numpy, random, matplotlib, statsmodel.api, subprocess | February 2022 |
| Explore US Bikeshare Data | Explore data related to bike sharing systems for three (3) major cities in the United States: Chicago, New York, and Washington by using Python. | Data exploration, descriptive statistics, scripting – take in raw input to create an interactive experience in the CLI to present these statistics | Python, Jupyter notebook, Python IDE or text editor + CLI | Pandas, time, numpy, calendar | February 2022 |
| Project | Description | Scope | Requirements | Libraries | Date Completed |
|---|---|---|---|---|---|
| ETL Pipeline | Build an ETL pipeline for online transactions data of an e-commerce platform | Extract - connect to Amazon Redshift data warehouse and extract online transaction data with transformation tasks performed using SQL, Transform - identify and remove duplicated records, Load - connect to Amazon S3 cloud object storage and write the cleaned data as a CSV file into an S3 bucket, Run ETL from the CLI and by using Docker | Python, Dbeaver, Python IDE or text editor + CLI, Docker, Amazon Redshift credentials, Amazon S3 details | psycopg2-binary, pandas, boto3, dotenv | June 2023 |
| Data Modeling with Postgres | Create a database schema, a Postgres database, and develop ETL processes on song and log data collected from a music streaming app | Create a database schema by defining fact and dimension tables, create a Postgres database with tables designed to optimize queries on song play analysis, build an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL | Python, Python IDE or text editor + CLI, Jupyter notebook | psycopg2, os, glob, pandas, ipython-sql | March 2022 |
| Data Modeling with Cassandra | Create an Apache Cassandra database and built an ETL pipeline on songs and user activity data collected from a music streaming app which resides in a directory of csv files | Modeling Apache Cassandra database and build ETL pipeline | Python, Jupyter notebook | pandas, cassandra, re, os, glob, numpy, json, csv | March 2022 |
| Project / Challenge | Description | Tasks | Date Completed |
|---|---|---|---|
| E-book startup | Provide answers to business questions to help the startup company create a product value proposition | Data exploration | August 2022 |
| Deforestation Exploration | Create views, simple and complex SQL queries to answer questions from the fictitious ForestQuery management about the deforestation’s global situation, regional outlook, and country-level details | Data exploration | March 2022 |
| SQL Interview Questions on Data Lemur | Solutions to SQL interview questions at each level of difficulty | in progress |
| Project | Description | Project Scope | Area | Algorithm | Date Completed |
|---|---|---|---|---|---|
| Customer Segmentation based on RFM-Kmeans Clustering | Carry out customer segmentation based on RFM metrics using K-means clustering | Data exploration, data preprocessing, EDA, customer segmentation | Unsupervised Learning - Clustering | K-means Clustering | August 2023 |
| Predicting Credit Card Approvals | Build an automatic credit card approval predictor using the credit card approval dataset | Data exploration, data preprocessing (scaling, label encoding, missing value imputation), EDA, fitting logistic regression model to the train set, making predictions, evaluating performance | Supervised Learning - Classification | Logistic Regression | August 2022 |
| Customer RFM Segmentation using K-means Clustering | Segmented customers on the basis of recency, frequency, and monetary value using K-means clustering | data preprocessing, EDA, customer segmentation | Unsupervised Learning - Clustering | K-means Clustering | August 2022 |
| Product Categorization | Carry out product category identification based on the description field of an online transaction history dataset of an online gifts store | data preprocessing, EDA, product categorization | NLP, unsupervised learning- Clustering | TF-IDF, K-means Clustering, nltk, wordcloud | August 2022 |
| Project / Challenge | Dataset | Visualization | Date Completed |
|---|---|---|---|
| Electronic Sales Performance Dashboard | Preprocessed electronics dataset | Sales overview, performance by product, category, state/city, and hour, top performing products and categories, product segments analysis, key products analysis | March 2024 |
| Executive Dashboard | Preprocessed online transaction data and customer segments data from the Customer Segmentation project | Text table to show performance metrics, top-performing products and top-performing customers bar charts, revenue by segment pie chart, daily revenue timeline with 7-day moving average line chart | August 2023 |
| Superstore - Executive Dashboard | Superstore dataset | Map to visualize profit ratio by geography, area charts to convey sales by category and by segment, text table using measure names and measure values as the key performance indicators, bar chart that appears when hovered over the map | August 2022 |
| Max and Min Sales by Month | Superstore dataset | Table showing sub-category by month: sub-categories are sorted in descending order by total sales of the year; for each month, highlight the min sales in red, and max sales in blue; users should be able to filter by year and category | August 2022 |
| Customers whose Sales have Increased Each Year | Superstore dataset | Text table showing sales by customers and years but only including customers whose sales have increased from 2018 to 2021 and sorting by 2021 sales amount | August 2022 |