A STARK-based prover for proving unigram LLM-watermark detection using the Stwo prover.
Unigram watermarking embeds hidden signatures in LLM-generated text. The detection algorithm:
- Uses a PRF (Poseidon2) with a secret key to classify each token as "green" or "red"
- Counts the number of green tokens in the text
- Compares the count against a threshold derived from the z-score
With γ = 0.5 (half the vocabulary is "green"), unwatermarked text has ~50% green tokens. Watermarked text has significantly more.
This library generates a zero-knowledge STARK proof that the green count exceeds the threshold, allowing a public verifier to confirm watermark detection without knowing the secret key.
This project is a PoC provided for experimental purposes only.
It has not been audited for security, correctness, or performance, and must not be used in production.
Quick setup (automated):
bash cli/setup.sh
source .venv/bin/activateRequirements: Python 3.8+, Rust toolchain, transformers, torch
Generate, detect, prove, and verify watermarked text:
# Generate watermarked text
python cli/zunigram_cli.py generate "Write a poem about love" -o output.json --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" -n 200
# Detect watermark
python cli/zunigram_cli.py detect output.json
# Generate STARK proof
python cli/zunigram_cli.py prove output.json -o proof.json
# Verify proof
python cli/zunigram_cli.py verify proof.json$ cargo run -p benchmarks --release| Tokens | Green | Prove Time | Verify Time |
|---|---|---|---|
| 256 | ~128 | ~16ms | ~644.54µs |
| 512 | ~256 | ~30ms | ~607.63µs |
| 1024 | ~512 | ~90ms | ~787.04µs |
| 2048 | ~1024 | ~320ms | ~1.12ms |
PRF Component (air/prf/):
- Proves Poseidon2 permutation for each token
- Provides (token, prf_output) via LogUp
Unigram Component (air/unigram/):
- Consumes (token, prf_output) from PRF
- Computes
is_greenfromprf_outputvia range check - Accumulates green count
- Binds public inputs via LogUp
- Boolean:
is_green ∈ {0, 1},is_padding ∈ {0, 1} - Range Check for computing
is_greenfromprf_output - Accumulator:
green_count[i] = green_count[i-1] + is_green[i] - LogUp Balance: PRF provides, Unigram consumes
MIT