This repository contains two NLP classification models built with the Hugging Face Transformers library. The project focuses on two distinct domains:
- Adversarial Prompt Security (binary classification)
- Scientific Text Classification (multiclass classification)
Both models share a common pipeline of Transformer-based classification and data augmentation.
Table of Contents
- Overview
- Quick Start Guide
  - Local
  - Demo
  - Tests
- Documentation
- Project Structure
- Conventions
- Project Description
- Project Extension and Future Work
To run the project locally, first make sure `just` and `uv` are installed. If they are not, run the commands for your OS:

macOS:

```shell
brew install just uv
```

Linux (Debian/Ubuntu):

```shell
sudo apt-get update
sudo apt-get install -y just
curl -LsSf https://astral.sh/uv/install.sh | sh
# then restart your shell so uv is on PATH
```

Windows:

```powershell
# uv (official installer)
irm https://astral.sh/uv/install.ps1 | iex

# just: install with ONE of the following package managers
# winget (preferred if available)
winget install casey.just -e  # if this ID doesn't resolve on some systems, use scoop or chocolatey instead
# scoop
scoop install just
# chocolatey
choco install just
```

Then install the dependencies and activate the virtual environment:

```shell
just install
source .venv/bin/activate
# On Windows: .venv\Scripts\activate
```

TBA
To run the tests, activate the virtual environment and run:

```shell
python -m pytest
```

To check coverage, run:

```shell
python -m pytest --cov=src --cov-fail-under=90 --cov-report=term-missing
```

CI is configured in `.github/workflows/ci.yml` and is intentionally PR-focused.
It runs when a pull request is opened, reopened, synchronised (new commits pushed), or marked ready for review.
Draft pull requests are ignored until they are marked as ready.
Dependency installation in CI uses `uv sync --group dev --frozen` to enforce lockfile reproducibility.
CI pipeline stages (in execution order):

- Check PR Commit Policy
  - Fails if the PR contains anything other than exactly one commit.
  - Fails if a commit message starts with `fixup!` or `squash!`.
  - Keeps PR history clean before merge.
- Pre-commit Checks
  - Runs all hooks from `.pre-commit-config.yaml`.
  - Enforces formatting, linting, and lightweight safety checks.
- Type Check (Pyright)
  - Runs static type checks with `pyright`.
  - Catches interface/typing issues before runtime tests.
- Smoke Tests
  - Runs the `smoke` marker subset (`pytest -m smoke`).
  - Provides a fast runtime sanity check before the full test suite.
- Pytest (Python 3.11)
  - Runs the full test suite and enforces a minimum coverage of 90%.
  - Uploads `coverage.xml` as a workflow artifact for inspection.
- Docs Build
  - Runs `mkdocs build --strict`.
  - Fails the PR if documentation pages, links, or API autodoc references are invalid.
- Dependency Vulnerability Audit (Non-blocking)
  - Runs `pip-audit` against installed dependencies.
  - Reports known vulnerabilities in CI logs.
  - Is intentionally non-blocking while the security posture is being established.
  - Runs with `if: always()` so findings are still reported when test stages fail.
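The commit-policy rules above are simple enough to express as a pure function. The following is a minimal Python sketch, not the script the workflow actually runs: the function name `check_commit_policy` is hypothetical, and it assumes the PR's commit messages have already been collected (for example from the GitHub API or `git log`).

```python
"""Hypothetical sketch of the CI commit-policy rules (not the workflow's real script)."""

# Prefixes that git adds for autosquash commits (`git commit --fixup` / `--squash`).
FORBIDDEN_PREFIXES = ("fixup!", "squash!")


def check_commit_policy(messages: list[str]) -> list[str]:
    """Return policy violations for a PR's commit messages.

    Args:
        messages: Commit messages of the PR, oldest first (assumed pre-collected).

    Returns:
        Human-readable violation strings; an empty list means the PR passes.
    """
    violations: list[str] = []
    # Rule 1: the PR must contain exactly one commit.
    if len(messages) != 1:
        violations.append(f"expected exactly one commit, found {len(messages)}")
    # Rule 2: no commit may be an unsquashed fixup!/squash! commit.
    for message in messages:
        if message.startswith(FORBIDDEN_PREFIXES):
            subject = message.splitlines()[0]
            violations.append(f"autosquash commit not allowed: {subject!r}")
    return violations


if __name__ == "__main__":
    print(check_commit_policy(["Add smoke tests"]))  # → []
    print(check_commit_policy(["Add smoke tests", "fixup! Add smoke tests"]))  # two violations
```

Keeping the check a pure function over a list of messages makes it easy to unit-test without a Git checkout; only the code that gathers the messages would need repository access.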
Security and dependency maintenance is configured with Dependabot in `.github/dependabot.yml`:
- Weekly Python dependency update PRs (from `pyproject.toml`).
- Weekly GitHub Actions version update PRs.
The workflow also uses concurrency cancellation:
- When new commits are pushed to the same PR, in-progress older runs are cancelled.
- This avoids stale CI feedback and reduces consumed GitHub Actions minutes.
Branch protection/ruleset settings:
- Require a pull request before merging.
- Required approvals: `0` (solo workflow), while keeping the code-owner and conversation rules.
- Require review from Code Owners.
- Require conversation resolution before merging.
- Require status checks to pass (must be enabled), with these required checks:
  - Check PR Commit Policy
  - Pre-commit Checks
  - Type Check (Pyright)
  - Smoke Tests
  - Pytest (Python 3.11)
  - Docs Build
- Keep Dependency Vulnerability Audit (Non-blocking) informational for now, rather than a required blocking status check.
- Block force pushes.
- Require linear history.
- Allow squash merging (and optionally rebase merging), with merge commits disabled.
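The Smoke Tests check selects tests by pytest marker (`pytest -m smoke`). Below is a minimal sketch of a smoke-tagged test; `normalize_text` is a hypothetical helper standing in for real project code, and the example assumes the `smoke` marker is registered under `markers` in the `[tool.pytest.ini_options]` table of `pyproject.toml` (otherwise pytest emits a `PytestUnknownMarkWarning`).

```python
"""Hypothetical example of a test tagged for the CI smoke stage."""

import pytest


def normalize_text(text: str) -> str:
    """Collapse runs of whitespace and lowercase (hypothetical stand-in helper)."""
    return " ".join(text.split()).lower()


@pytest.mark.smoke
def test_normalize_text_collapses_whitespace() -> None:
    # Fast and dependency-free: no model download, no GPU, so it suits a smoke run.
    assert normalize_text("  Ignore   Previous\tInstructions ") == "ignore previous instructions"
```

Running `python -m pytest -m smoke` locally selects the same subset CI runs; the test still belongs to the full suite, because a marker only adds selection metadata.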
Project documentation is built with MkDocs Material and published to GitHub Pages.
The site combines hand-written guides from docs/ and API reference pages generated
from in-code Google-style docstrings.
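For reference, a Google-style docstring looks like the one below. The function is a hypothetical example, not part of this repository, and the rendering step is an assumption: a docstring plugin such as mkdocstrings with its Google parser (the plugin is not named here). `Args`, `Returns`, `Raises`, and `Examples` are standard Google-style section headers.

```python
"""Hypothetical module illustrating the Google docstring style."""


def split_dataset(
    texts: list[str], labels: list[int], train_fraction: float = 0.8
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Split paired texts and labels into train and validation sets.

    The split is positional (no shuffling), which keeps the example short.

    Args:
        texts: Input documents.
        labels: Integer class label for each document.
        train_fraction: Fraction of examples assigned to the training split.

    Returns:
        A ``(train, validation)`` tuple of ``(text, label)`` pairs.

    Raises:
        ValueError: If ``texts`` and ``labels`` differ in length or
            ``train_fraction`` is outside ``(0, 1)``.

    Examples:
        >>> train, val = split_dataset(["a", "b"], [0, 1], train_fraction=0.5)
        >>> len(train), len(val)
        (1, 1)
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be in (0, 1)")
    pairs = list(zip(texts, labels))
    cut = int(len(pairs) * train_fraction)  # index where the validation split begins
    return pairs[:cut], pairs[cut:]
```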
Local docs commands:

```shell
just docs-build
just docs-serve
```

Documentation workflows:
- PRs run a strict build (`uv run mkdocs build --strict`) as a blocking CI gate.
- Pushes to `main` trigger `.github/workflows/docs-publish.yml` to deploy to GitHub Pages.
One-time GitHub repository settings:
- Settings -> Pages -> Build and deployment -> Source: GitHub Actions.
- Branch protection/ruleset -> required status checks: include Docs Build.
The project structure is shown below, with the role of each file:
| Folder | File | Description |
|---|---|---|
| ... | ... | ... |
TBA
TBA
Adversarial Prompt Security
A binary classifier designed to detect prompt injections and jailbreak attempts (e.g., "ignore previous instructions", "DAN", roleplay).
- Backbone: ...
- Focus: ...
- Techniques: ...
Scientific Text Classification
A multiclass classifier that sorts text into scientific/technical categories versus general content.
- Backbone: ...
- Focus: ...
- Techniques: ...
TBA