Perfume Data Analysis and Rating Prediction

This project scrapes perfume data, analyzes it, and predicts perfume ratings. It combines several data sources with machine learning models to examine which factors influence perfume ratings and prices.

Features

Data Scraping: Scripts to scrape perfume data from various websites, including Parfumo, Idealo, and Douglas.
Data Analysis: In-depth analysis of perfume data, including ratings, prices, and scent profiles.
Prediction Models: Machine learning models to predict perfume ratings based on various features.
Data Visualization: Plots for exploring the data and inspecting the model results.

Installation

Clone the repository:
```
git clone <repository-url>
```
Install the required packages:
```
pip install -r requirements.txt
```

Usage

The project is organized into several directories and notebooks:

Scraping/: Contains scripts for scraping perfume data.
idealo_prices/: Contains scripts for scraping prices from Idealo.
duckduckgo_prices/: Contains scripts for scraping prices from DuckDuckGo.
nb_*.ipynb: Jupyter notebooks for data analysis, prediction, and visualization.

To run a scraper, change into its directory and run the Python scripts there. To run an analysis, open the corresponding nb_*.ipynb notebook in Jupyter.

Data

The project uses a variety of data sources, including:

perfumes.xlsx: The main dataset with perfume information.
idealo_prices.csv: Price data from Idealo.
data/: Contains various data files, including brand information and scraped data.
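
As a rough illustration of how the price data could be combined with the main dataset, the following sketch left-joins the cheapest price per perfume onto a perfume table. The column names (name, rating, price), the helper attach_prices, and the toy rows are all assumptions made for this example; the actual columns in perfumes.xlsx and idealo_prices.csv may differ.

```python
import io

import pandas as pd


def attach_prices(perfumes: pd.DataFrame, prices: pd.DataFrame, key: str = "name") -> pd.DataFrame:
    """Left-join the cheapest listed price per perfume onto the perfume table.

    `key` and the "price" column are hypothetical names chosen for this sketch.
    """
    # A perfume may appear in several price listings; keep the lowest one.
    cheapest = prices.groupby(key, as_index=False)["price"].min()
    # A left join keeps every perfume, including those without any price.
    return perfumes.merge(cheapest, on=key, how="left")


# Toy stand-ins for the real files (not actual project data).
perfumes = pd.DataFrame({"name": ["A", "B", "C"], "rating": [7.9, 8.4, 6.5]})
prices = pd.read_csv(io.StringIO("name,price\nA,59.90\nA,54.50\nB,120.00\n"))

merged = attach_prices(perfumes, prices)
print(merged)
```

With the real files, the two frames would presumably come from pd.read_excel("perfumes.xlsx") and pd.read_csv("idealo_prices.csv"), after adjusting the key and price column names to match the data.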

Models

The project uses the following machine learning libraries:

XGBoost: For predicting perfume ratings.
PyTorch: For building and training neural networks.
Optuna: For hyperparameter tuning.

The trained models are saved in the models/ directory.

hamza1av/Parfumo
