🛒 Zepto Product Data Analysis using SQL

This SQL project analyzes product-level data from Zepto, an online grocery delivery platform. The dataset includes product information such as name, MRP, discount, availability, and quantity. The goal is to clean and transform the data, then extract business insights using SQL.

📁 Dataset Structure

The zepto table includes the following columns:

Column Name	Description
sku_id	Unique identifier for each product (Primary Key)
category	Product category
name	Product name
mrp	Maximum Retail Price (in paise in the raw data; converted to rupees during cleaning)
discountPercent	Discount offered on the product, as a percentage
availableQuantity	Quantity currently available in stock
discountedSellingPrice	Final selling price after discount (same unit as mrp)
weightInGms	Weight of the product in grams
outOfStock	Original stock status as a string ('TRUE' or 'FALSE')
quantity	Quantity sold or ordered
outstock	Stock status converted to Boolean (added during data exploration)

🧪 Data Exploration

1. View Sample Data

SELECT * FROM zepto LIMIT 10;

➡️ Displays the first 10 rows to understand the dataset structure.

2. Count total rows

SELECT COUNT(*) FROM zepto;

➡️ Total number of records in the table.

3. Add boolean stock column

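ALTER TABLE zepto ADD COLUMN outstock BOOLEAN;
-- Map the text flag to a real boolean; unmatched values stay NULL.
UPDATE zepto SET outstock = CASE 
    WHEN outOfStock = 'FALSE' THEN FALSE 
    WHEN outOfStock = 'TRUE' THEN TRUE 
END;

➡️ Converts the outOfStock string values to a boolean column, outstock. Values that match neither string are left as NULL.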
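As a quick check on the conversion, a grouped count along the lines of the sketch below shows how each original string mapped to the new column. It uses only the two columns named above; the alias row_count is illustrative. Any row where outstock is NULL points to an unexpected value in outOfStock.

```sql
-- Compare the original text flag with the converted boolean.
-- A NULL in outstock means the CASE expression matched neither 'TRUE' nor 'FALSE'.
SELECT outOfStock, outstock, COUNT(*) AS row_count
FROM zepto
GROUP BY outOfStock, outstock
ORDER BY row_count DESC;
```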

4. Check for NULLs

SELECT * FROM zepto
WHERE category IS NULL OR name IS NULL OR ...;

➡️ Checks for missing values in the dataset.
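A complementary way to look at missing data is to count NULLs per column rather than list the affected rows. The sketch below does this for a handful of columns taken from the Dataset Structure table; the selection and the aliases are illustrative, and it is not meant to reproduce the full column list of the query above.

```sql
-- COUNT(column) skips NULLs, so COUNT(*) - COUNT(column) is the number of NULLs.
SELECT COUNT(*) - COUNT(category)        AS null_category,
       COUNT(*) - COUNT(name)            AS null_name,
       COUNT(*) - COUNT(mrp)             AS null_mrp,
       COUNT(*) - COUNT(discountPercent) AS null_discount_percent,
       COUNT(*) - COUNT(weightInGms)     AS null_weight
FROM zepto;
```

A result of all zeros means none of the listed columns contain NULLs.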

5. Distinct product categories

SELECT DISTINCT category FROM zepto ORDER BY category;

➡️ Lists all unique product categories.

6. Count of in-stock vs out-of-stock products

SELECT COUNT(sku_id) AS product_count, 
       CASE WHEN outstock = FALSE THEN 'available' 
            WHEN outstock = TRUE THEN 'notavailable' END AS stock_status
FROM zepto
GROUP BY outstock;

➡️ Counts products by stock status (available vs. not available).
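The same split can be expressed as a single percentage, which is often easier to track over time. This is a minimal sketch assuming the outstock column created in step 3; the aliases are illustrative, and rows where outstock is NULL are counted as in stock here.

```sql
-- Averaging a 0/1 flag gives the fraction of rows where the flag is 1.
SELECT COUNT(*) AS total_products,
       SUM(CASE WHEN outstock THEN 1 ELSE 0 END) AS out_of_stock_products,
       ROUND(100 * AVG(CASE WHEN outstock THEN 1 ELSE 0 END), 2) AS out_of_stock_pct
FROM zepto;
```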

7. Products listed more than once

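SELECT name, COUNT(sku_id) AS sku_count 
FROM zepto 
GROUP BY name 
HAVING COUNT(sku_id) > 1 
ORDER BY sku_count DESC;

➡️ Lists product names that appear in more than one row, most frequent first. Because sku_id is unique, these are repeated names rather than exact duplicate rows.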
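To judge whether a repeated name is a true duplicate or a legitimately different listing, it can help to see how much the rows sharing that name differ. The sketch below is one way to do that, using MRP as the distinguishing field; that choice and the aliases are assumptions for illustration.

```sql
-- For each repeated name, show how many SKUs share it and the spread of MRPs.
-- distinct_mrps = 1 suggests the rows may be duplicates of one another.
SELECT name,
       COUNT(sku_id)       AS sku_count,
       COUNT(DISTINCT mrp) AS distinct_mrps,
       MIN(mrp)            AS min_mrp,
       MAX(mrp)            AS max_mrp
FROM zepto
GROUP BY name
HAVING COUNT(sku_id) > 1
ORDER BY sku_count DESC;
```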

🧹 Data Cleaning

1. Check for MRP or discounted price being 0

SELECT * FROM zepto WHERE mrp = 0 OR discountedSellingPrice = 0;

➡️ Detects invalid pricing records.

2. Remove such invalid entries

DELETE FROM zepto WHERE mrp = 0 OR discountedSellingPrice = 0;

➡️ Cleans out erroneous product data.
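Because DELETE is destructive, one cautious pattern is to run it inside a transaction and compare the number of rows removed with the number expected. The sketch below assumes the table uses a transactional storage engine such as InnoDB; the aliases are illustrative.

```sql
START TRANSACTION;

-- How many rows are about to be removed?
SELECT COUNT(*) AS rows_to_delete
FROM zepto
WHERE mrp = 0 OR discountedSellingPrice = 0;

DELETE FROM zepto
WHERE mrp = 0 OR discountedSellingPrice = 0;

-- ROW_COUNT() reports the rows affected by the previous statement.
SELECT ROW_COUNT() AS rows_deleted;

-- Keep the change with COMMIT, or run ROLLBACK instead to undo it.
COMMIT;
```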

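3. Convert prices from paise to rupees

UPDATE zepto 
SET mrp = mrp / 100.0, discountedSellingPrice = discountedSellingPrice / 100.0;

➡️ Divides both price columns by 100 so they are expressed in rupees (1 rupee = 100 paise). Run this only once; repeating it divides the prices again.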
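A simple consistency check on the price columns is to compare the stored selling price with the MRP less the discount. The sketch below assumes discountPercent is a value between 0 and 100 and that discountedSellingPrice is derived from mrp in this way; the 1-rupee tolerance and the alias expected_price are arbitrary choices for illustration.

```sql
-- Rows where the stored selling price differs from (MRP less discount)
-- by more than the chosen tolerance of 1 rupee.
SELECT name, mrp, discountPercent, discountedSellingPrice,
       ROUND(mrp * (1 - discountPercent / 100), 2) AS expected_price
FROM zepto
WHERE ABS(discountedSellingPrice - mrp * (1 - discountPercent / 100)) > 1;
```

An empty result suggests the three columns agree within the tolerance.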

📊 Business Insights

Q1: Top 10 best-value products by discount

SELECT DISTINCT name, mrp, discountPercent 
FROM zepto 
ORDER BY discountPercent DESC 
LIMIT 10;

➡️ Lists the 10 products with the highest discount percentage.

Q2: High MRP products that are out of stock

SELECT DISTINCT name, mrp 
FROM zepto 
WHERE outstock IS TRUE AND mrp > 300 
ORDER BY mrp DESC;

➡️ Identifies out-of-stock products with an MRP above ₹300.

Q3: Estimated revenue by category

SELECT SUM(discountedSellingPrice * availableQuantity) AS total_revenue, category 
FROM zepto 
GROUP BY category 
ORDER BY total_revenue DESC;

➡️ Estimates the revenue each category would generate if all available stock sold at the discounted price.
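Absolute totals are easier to compare when shown alongside each category's share of the overall figure. The sketch below adds that share with a window function over the grouped result, which requires MySQL 8.0 or later; the alias revenue_share_pct is illustrative.

```sql
-- SUM(...) OVER () adds up the per-category totals, giving the grand total
-- against which each category's share is computed.
SELECT category,
       SUM(discountedSellingPrice * availableQuantity) AS total_revenue,
       ROUND(100 * SUM(discountedSellingPrice * availableQuantity)
                 / SUM(SUM(discountedSellingPrice * availableQuantity)) OVER (), 2) AS revenue_share_pct
FROM zepto
GROUP BY category
ORDER BY total_revenue DESC;
```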

Q4: Products with high MRP but low discount

SELECT DISTINCT name, mrp, discountPercent 
FROM zepto 
WHERE mrp > 500 AND discountPercent < 10 
ORDER BY mrp DESC, discountPercent DESC;

➡️ Finds products with an MRP above ₹500 that are discounted by less than 10%.

Q5: Top 5 categories by average discount

SELECT ROUND(AVG(discountPercent), 2) AS avg_discount, category 
FROM zepto 
GROUP BY category 
ORDER BY avg_discount DESC 
LIMIT 5;

➡️ Shows the five categories with the highest average discount percentage.

Q6: Price per gram for products of 100 g or more

SELECT DISTINCT name, ROUND((discountedSellingPrice / weightInGms), 2) AS price_per_gram 
FROM zepto 
WHERE weightInGms >= 100 
ORDER BY price_per_gram;

➡️ Ranks products weighing at least 100 g by price per gram, cheapest first.

Q7: Classify product sizes

SELECT name, weightInGms, 
       CASE 
           WHEN weightInGms < 1000 THEN 'low'
           WHEN weightInGms < 5000 THEN 'medium'
           ELSE 'bulk' 
       END AS category_quantity 
FROM zepto;

➡️ Classifies each product by weight as low (under 1 kg), medium (1 kg to under 5 kg), or bulk (5 kg and above).
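To see how the catalogue is distributed across these size classes, the same CASE expression can be grouped and counted. This sketch reuses the thresholds above; the aliases size_class and product_count are illustrative, and grouping by a column alias relies on a MySQL extension to standard SQL.

```sql
-- Count how many products fall into each size class.
SELECT CASE
           WHEN weightInGms < 1000 THEN 'low'
           WHEN weightInGms < 5000 THEN 'medium'
           ELSE 'bulk'
       END AS size_class,
       COUNT(*) AS product_count
FROM zepto
GROUP BY size_class
ORDER BY product_count DESC;
```

Note that rows with a NULL weight fall through to the ELSE branch and are counted as bulk.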

Q8: Total inventory weight by category

SELECT SUM(weightInGms * availableQuantity) AS total_weight, category 
FROM zepto 
GROUP BY category 
ORDER BY total_weight DESC;

➡️ Computes the total weight of available stock, in grams, for each category.

✅ Conclusion

This project shows how to clean and analyze product data using SQL, focusing on stock availability, pricing insights, and revenue potential.

🛠 Tools Used

MySQL
SQL Workbench / DBeaver
GitHub

👨‍💻 Author

Praveen S R
SQL | Data Analytics | Machine Learning
