MentorMe - Image Plagiarism Detection System

Real-time plagiarism detection for student image submissions using perceptual hashing, CLIP embeddings, and vector search.

Features

Perceptual hashing (pHash, dHash, aHash) for fast duplicate detection
CLIP embeddings for semantic similarity
Vector search with FAISS or pgvector
AI-generated image detection (DALL-E, Midjourney, Stable Diffusion)
Peer and self-plagiarism checking
Async processing with RabbitMQ
Optional student ID hashing for privacy
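
The optional student ID hashing can be pictured with a small standard-library sketch. This is an illustration only: the function name hash_student_id, the normalization step, and the use of HMAC-SHA256 with a secret key are assumptions, not a description of what utils/security.py actually does.

```python
import hashlib
import hmac


def hash_student_id(student_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a student ID as a 64-character hex string.

    Hypothetical helper: the project's real hashing scheme may differ.
    """
    # Normalize so that " STU-1 " and "stu-1" map to the same digest (an assumption).
    normalized = student_id.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-secret-change-me"  # placeholder key, not taken from the project
    print(hash_student_id("STU-1001", key))
```

A keyed hash is used in this sketch so that the digest cannot be reproduced by someone who knows a student ID but not the key.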

Quick Start

1. Start Services

./start-dev-env.sh

This starts PostgreSQL and RabbitMQ containers using Podman and creates a .env file with default settings.

2. Install Dependencies

python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\activate
pip install -r requirements.txt

3. Run Application

python app.py

The application will:

Connect to PostgreSQL and RabbitMQ
Start listening for plagiarism check submissions
Process images and store results in the database

Configuration

Configuration is managed through a .env file, which the start script creates automatically.

Essential Settings

# Database Connection
POSTGRES_USER=plagiarism_user
POSTGRES_PASSWORD=secure_password
POSTGRES_DB=plagiarism_db
POSTGRES_HOST=postgres  # Use 'localhost' for local development

# Message Queue
RABBITMQ_HOST=rabbitmq  # Use 'localhost' for local development
RABBITMQ_USER=admin
RABBITMQ_PASS=admin123

Detection Thresholds

# Hash matching (lower = stricter)
HASH_MATCH_THRESHOLD=8  # Hamming distance

# Semantic similarity (higher = stricter)
SEMANTIC_MATCH_THRESHOLD=0.80  # 0.0 to 1.0

# Self-plagiarism grace period
RESUBMISSION_WINDOW_DAYS=14  # Days
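
To make the direction of the two thresholds concrete, here is a minimal sketch of how they might be applied. It assumes hashes are compared as non-negative integers, that semantic similarity is cosine similarity, and that the comparisons are inclusive (<= and >=). The function names are hypothetical, and the resubmission window is not covered.

```python
import math
from typing import Sequence

HASH_MATCH_THRESHOLD = 8          # maximum Hamming distance (lower = stricter)
SEMANTIC_MATCH_THRESHOLD = 0.80   # minimum similarity (higher = stricter)


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two non-negative integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_hash_match(hash_a: int, hash_b: int,
                  threshold: int = HASH_MATCH_THRESHOLD) -> bool:
    """Treat two hashes as a match when at most `threshold` bits differ."""
    return hamming_distance(hash_a, hash_b) <= threshold


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_semantic_match(a: Sequence[float], b: Sequence[float],
                      threshold: float = SEMANTIC_MATCH_THRESHOLD) -> bool:
    """Treat two embeddings as a match when their similarity reaches `threshold`."""
    return cosine_similarity(a, b) >= threshold
```

Lowering HASH_MATCH_THRESHOLD shrinks the set of hashes that count as duplicates, while raising SEMANTIC_MATCH_THRESHOLD shrinks the set of embeddings that count as similar.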

Vector Search Backend

# Choose FAISS (in-memory) or pgvector (database)
USE_PGVECTOR=false  # Set to 'true' for pgvector

See .env.example for all available options.
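
The detection settings above could be read into a typed object along these lines. This is a sketch under assumptions: DetectionSettings, parse_bool, and load_settings are hypothetical names, the defaults mirror the values shown in this README, and config/config.py may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DetectionSettings:
    hash_match_threshold: int
    semantic_match_threshold: float
    resubmission_window_days: int
    use_pgvector: bool


def parse_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> DetectionSettings:
    """Build settings from environment-style key/value pairs."""
    semantic = float(env.get("SEMANTIC_MATCH_THRESHOLD", "0.80"))
    if not 0.0 <= semantic <= 1.0:
        raise ValueError("SEMANTIC_MATCH_THRESHOLD must be between 0.0 and 1.0")
    return DetectionSettings(
        hash_match_threshold=int(env.get("HASH_MATCH_THRESHOLD", "8")),
        semantic_match_threshold=semantic,
        resubmission_window_days=int(env.get("RESUBMISSION_WINDOW_DAYS", "14")),
        use_pgvector=parse_bool(env.get("USE_PGVECTOR", "false")),
    )


if __name__ == "__main__":
    settings = load_settings({"USE_PGVECTOR": "true"})
    print("pgvector" if settings.use_pgvector else "FAISS")
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.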

Testing

Run All Tests

# Install test dependencies
pip install -r requirements-test.txt

# Run tests
pytest tests/ -v

# With coverage report
pytest tests/ --cov --cov-report=html

Test Status

Total: 384 tests
Passing: 364 (94.8%)
Skipped: 20 (DB manager mocks - covered by integration tests)
Failing: 0
Execution Time: ~7 minutes

Coverage by Component:

✅ Integration tests: 11/11 (end-to-end workflows)
✅ Hash handlers: Complete coverage
✅ CLIP embeddings: 35/38 (3 skipped - validation tests)
✅ AI detection: 15/15
✅ FAISS handler: 22/22
✅ Image validator: 18/18

Project Structure

mentorme/
├── app.py                      # Main application entry
├── api/
│   └── api.py                 # FastAPI REST endpoints
├── config/
│   └── config.py              # Configuration management
├── database/
│   ├── db_manager.py          # Database operations
│   ├── init.sql               # Schema definition
│   └── dumps/                 # Database backups
├── image_worker/
│   ├── worker.py              # Core detection engine
│   ├── clip_handler.py        # CLIP embeddings (768D)
│   ├── hash_handler.py        # Perceptual hashing
│   ├── ai_generated_detector.py  # AI detection
│   ├── image_validator.py     # Image validation
│   ├── faiss_handler.py       # FAISS vector backend
│   └── pgvector_handler.py    # pgvector backend
├── mq/
│   └── rmq_client.py          # RabbitMQ client
├── plag_checker/
│   ├── submissions_checker.py # Message orchestrator
│   └── submission_status.py   # Status tracking
├── processors/
│   ├── base_processor.py      # Base processor class
│   ├── image_processor.py     # Image processing
│   └── text_processor.py      # Text processing
├── scripts/
│   ├── dump_database.sh       # Database backup
│   ├── restore_database.sh    # Database restore
│   └── download_clip_model.py # CLIP model downloader
├── seeding/
│   ├── seed_ref_images.py     # Reference image indexing
│   └── seed_from_xlsx.py      # Bulk submission seeding
├── utils/
│   ├── security.py            # Student ID hashing
│   └── exceptions.py          # Custom exceptions
└── tests/                      # Test suite (384 tests)

How It Works

Hash Check: Compares perceptual hashes (Hamming distance)
CLIP Check: Generates 768D embedding (ViT-L/14), searches vector index
AI Detection: Checks metadata and statistical patterns
Result: Returns plagiarism status with confidence score
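
As an illustration of the hash check, the sketch below computes an average hash (aHash) from a small grayscale grid using only the standard library. It is a simplified stand-in: the project relies on the imagehash library, and this version assumes the image has already been resized and converted to grayscale. The function name average_hash is hypothetical.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Compute an average hash from a grayscale grid (typically 8x8).

    Each pixel contributes one bit: 1 if it is brighter than the grid mean,
    otherwise 0. Bits are packed row by row, first pixel as the highest bit.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("pixel grid must not be empty")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


if __name__ == "__main__":
    # A synthetic 8x8 gradient and a uniformly brightened copy of it.
    original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    brighter = [[v + 3 for v in row] for row in original]
    distance = bin(average_hash(original) ^ average_hash(brighter)).count("1")
    print(distance)
```

Because every bit is relative to the mean brightness, a uniform brightness shift leaves the hash unchanged, which is why small Hamming distances indicate near-duplicates.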

Priority: Peer > Reference > Self (resubmission) > Original
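
One way to read the priority order is that when a submission matches in several categories, the leftmost category wins. The sketch below encodes that interpretation. MatchType and resolve_status are hypothetical names, and the worker's actual decision logic may be more involved.

```python
from enum import IntEnum
from typing import Iterable


class MatchType(IntEnum):
    # Lower value = higher priority: Peer > Reference > Self > Original
    PEER = 0
    REFERENCE = 1
    SELF = 2
    ORIGINAL = 3


def resolve_status(matches: Iterable[MatchType]) -> MatchType:
    """Return the highest-priority match, or ORIGINAL when nothing matched."""
    return min(matches, default=MatchType.ORIGINAL)


if __name__ == "__main__":
    # A submission that resembles both the student's earlier work and a peer's.
    print(resolve_status([MatchType.SELF, MatchType.PEER]).name)
```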

Container Management

# Check status
podman ps

# View logs
podman logs mentorme-postgres
podman logs mentorme-rabbitmq

# Stop services
podman stop mentorme-postgres mentorme-rabbitmq

# Restart services
./start-dev-env.sh

Monitoring

RabbitMQ Management UI:

URL: http://localhost:15672
Login: admin/admin123

Database Stats:

-- Total submissions
SELECT COUNT(*) FROM submissions;

-- Plagiarism rate (percent); NULLIF avoids division by zero on an empty table
SELECT
  COUNT(*) FILTER (WHERE is_plagiarized = true) * 100.0 / NULLIF(COUNT(*), 0) AS plagiarism_rate
FROM submissions;

Troubleshooting

RabbitMQ Connection Issues:

podman restart mentorme-rabbitmq
podman logs mentorme-rabbitmq

PostgreSQL Connection Issues:

podman restart mentorme-postgres
podman exec mentorme-postgres pg_isready -U plagiarism_user

Test Failures:

pip install -r requirements-test.txt --upgrade
pytest tests/ -vv --tb=short

Documentation

DOCUMENTATION.md - Complete technical documentation
DATABASE_DUMPS.md - Database backup/restore guide
Seeding README - Data seeding instructions
Copilot Instructions - Development guidelines

License

MIT License

Acknowledgments

OpenCLIP - Image embeddings
FAISS - Vector search
imagehash - Perceptual hashing
asyncpg, psycopg3 - PostgreSQL drivers
aio-pika - RabbitMQ client
