PrivyScan 🍝

"You clicked 'I agree.' We actually read it."

PrivyScan is an AI-powered privacy policy analyzer that summarizes lengthy privacy policies, labels their important sections, and rates each policy from A to E based on how privacy-friendly it is. The aim is to help users understand what they are agreeing to before they click accept.

Overview of the Project

PrivyScan uses AI and machine learning to simplify complex privacy policies. A user enters a website URL, and the system automatically fetches that site's privacy policy, summarizes it, categorizes its sections, and rates it. The goal is to make online privacy information transparent, accessible, and easy to understand.

Models Used

Task	Model Used
Summarization	BART
Policy Labelling / Classification	TF-IDF + Logistic Regression
Privacy Rating	LegalBERT

Dataset Used

Dataset	Description
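OPP-115	Corpus of 115 website privacy policies with segment-level annotations of the data practices they describe, such as first-party collection, third-party sharing, and data retention.
ToS;DR	Terms of Service; Didn't Read: a community-driven dataset that reviews and rates terms of service and privacy policies.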

Methodology

1. Preprocessing

Privacy policy extraction
Text cleaning
Chunking large policies into manageable sections
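The three preprocessing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not PrivyScan's backend code. It assumes the privacy policy's URL is already known (the README does not say how the policy page is found from a site URL), and the helper names (fetch_html, extract_text, clean_text, chunk_text) and the 400-word chunk budget are hypothetical.

```python
import re
import urllib.request
from html.parser import HTMLParser
from typing import List


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the server-reported charset (UTF-8 otherwise)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html: str) -> str:
    """Extraction: strip markup and return the page's visible text."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def clean_text(text: str) -> str:
    """Cleaning: collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Chunking: pack sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # Start a new chunk if this sentence would push the current one over budget.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
        # A single sentence longer than the budget is hard-split into fixed-size pieces.
        while len(current) > max_words:
            chunks.append(" ".join(current[:max_words]))
            current = current[max_words:]
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A word budget is only a rough stand-in for the token limits of the downstream models; counting tokens with the model's own tokenizer would be more precise.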

2. Summarization

BART generates simplified summaries for each chunk
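A sketch of per-chunk summarization with BART through the Hugging Face transformers pipeline API. The facebook/bart-large-cnn checkpoint and the length settings are assumptions, since the README does not name the weights PrivyScan uses. The summarizer is passed in as a plain function, so the chunk loop can be exercised without downloading a model.

```python
from typing import Callable, List

# Assumed checkpoint; the README does not say which BART weights are used.
DEFAULT_MODEL = "facebook/bart-large-cnn"


def load_bart_summarizer(model_name: str = DEFAULT_MODEL) -> Callable[[str], str]:
    """Wrap a Hugging Face summarization pipeline as a text -> text function."""
    from transformers import pipeline  # imported lazily so this module loads without transformers

    summarizer = pipeline("summarization", model=model_name)

    def summarize(text: str) -> str:
        # truncation=True keeps over-long chunks within BART's input limit.
        result = summarizer(text, max_length=130, min_length=30, do_sample=False, truncation=True)
        return result[0]["summary_text"]

    return summarize


def summarize_chunks(chunks: List[str], summarize: Callable[[str], str]) -> List[str]:
    """Summarize each non-empty chunk independently."""
    return [summarize(chunk) for chunk in chunks if chunk.strip()]
```

Typical use would be summarize_chunks(chunks, load_bart_summarizer()); injecting the function also makes it easy to swap the local pipeline for a call to a remotely hosted model.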

3. Classification / Labelling

TF-IDF + Logistic Regression classifies chunks into policy categories
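A minimal scikit-learn version of a TF-IDF + Logistic Regression labeller. In PrivyScan the classifier is presumably trained on OPP-115 segments; the six training sentences below are invented for illustration, the label names are borrowed from OPP-115's category set, and the vectorizer and regression settings are assumptions.

```python
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline

# Invented toy examples; a real model would be fit on annotated OPP-115 segments.
TRAIN_TEXTS = [
    "We collect your email address when you register.",
    "We collect your location to provide the service.",
    "We share your data with advertising partners.",
    "We share information with third parties for marketing.",
    "We retain your data for two years.",
    "Records are retained until your account is deleted.",
]
TRAIN_LABELS = [
    "First Party Collection/Use",
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "Third Party Sharing/Collection",
    "Data Retention",
    "Data Retention",
]


def train_chunk_classifier(texts: List[str], labels: List[str]) -> Pipeline:
    """Fit TF-IDF features (unigrams + bigrams) feeding a logistic regression classifier."""
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2), sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


def label_chunks(model: Pipeline, chunks: List[str]) -> List[str]:
    """Predict one policy category per chunk."""
    return [str(label) for label in model.predict(chunks)]
```

For example, label_chunks(train_chunk_classifier(TRAIN_TEXTS, TRAIN_LABELS), ["We may share your data with partners."]) labels the sentence as third-party sharing. A linear model over sparse n-gram features is cheap to train and serve, and its per-term weights are easy to inspect.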

4. Privacy Rating

LegalBERT assigns each policy a privacy rating from A to E based on the privacy practices it describes
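One way LegalBERT could produce an A to E grade is as a five-way sequence classifier fine-tuned on graded policies (for example, ToS;DR grades). The sketch below assumes the public nlpaueb/legal-bert-base-uncased checkpoint, a label order of A through E, and a hypothetical aggregation rule that averages per-chunk probabilities; none of these details come from the README. Loading the base checkpoint with num_labels=5 creates an untrained classification head, so the model must be fine-tuned before its outputs are meaningful.

```python
from typing import List, Sequence

# Assumed label order: index 0 -> "A", ..., index 4 -> "E".
GRADES = ["A", "B", "C", "D", "E"]


def load_legalbert_rater(model_name: str = "nlpaueb/legal-bert-base-uncased"):
    """Load LegalBERT with a 5-way classification head (one logit per grade).

    The head is randomly initialised until the model is fine-tuned on graded policies.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(GRADES))
    model.eval()
    return tokenizer, model


def chunk_grade_probs(chunks: List[str], tokenizer, model) -> List[List[float]]:
    """Return one probability distribution over GRADES per chunk."""
    import torch

    batch = tokenizer(chunks, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1).tolist()


def aggregate_grade(chunk_probs: Sequence[Sequence[float]]) -> str:
    """Hypothetical rule: average the per-chunk distributions and pick the most likely grade."""
    if not chunk_probs:
        raise ValueError("at least one chunk is required")
    n = len(chunk_probs)
    mean = [sum(p[i] for p in chunk_probs) / n for i in range(len(GRADES))]
    best = max(range(len(GRADES)), key=lambda i: mean[i])
    return GRADES[best]
```

Averaging treats every chunk equally; a stricter alternative would let the worst-rated chunks dominate, which may better match how users judge a policy.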

5. Pipeline Integration

All models are chained into a single end-to-end processing pipeline
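The end-to-end flow can be expressed as one function that wires the stages together. This is a hypothetical sketch: the names analyze_policy, ChunkReport, and PolicyReport are illustrative, each stage is injected as a callable, and it assumes the rating is computed from the raw chunks, which the README does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChunkReport:
    text: str
    summary: str
    category: str


@dataclass
class PolicyReport:
    grade: str
    chunks: List[ChunkReport] = field(default_factory=list)


def analyze_policy(
    text: str,
    chunk: Callable[[str], List[str]],
    summarize: Callable[[str], str],
    classify: Callable[[List[str]], List[str]],
    rate: Callable[[List[str]], str],
) -> PolicyReport:
    """Run chunking, summarization, classification and rating over one policy's text."""
    # 1. Preprocessing: split the cleaned policy into non-empty chunks.
    chunks = [c for c in chunk(text) if c.strip()]
    # 2. Summarization: one summary per chunk.
    summaries = [summarize(c) for c in chunks]
    # 3. Classification: one category label per chunk.
    categories = list(classify(chunks))
    if len(categories) != len(chunks):
        raise ValueError("classifier must return exactly one label per chunk")
    # 4. Rating: a single A to E grade for the whole policy (assumed to use the raw chunks).
    grade = rate(chunks)
    reports = [ChunkReport(c, s, cat) for c, s, cat in zip(chunks, summaries, categories)]
    return PolicyReport(grade=grade, chunks=reports)
```

In practice, classify could wrap a fitted scikit-learn model's predict method and summarize could wrap a BART summarization pipeline; injecting the stages keeps the orchestration testable with stand-in functions.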

Tech Stack

Component	Platform
Frontend	Vercel
Backend API	Render
ML Inference	Hugging Face Spaces
Version Control	GitHub
Containerization	Docker

~ by Team Spaghetti 🍝
