Differential testing infrastructure for the PolyML lexer. PolyFuzz automatically generates SML source files, lexes them with both PolyML and an independent reference lexer (Verilex), and reports any discrepancies in token output.
| Component | Language | Purpose |
|---|---|---|
| smlgen | Kotlin | Generates randomised SML test corpus |
| polylex-harness | SML / C | AFL++ instrumented harness around the PolyML lexer |
| verilex | Kotlin | Independent reference lexer for SML |
| diffcomp | Kotlin | Token-level differential comparison and reporting |
| orchestrator | Python | End-to-end pipeline orchestration and analytics |
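
As a rough illustration of what a token-level comparison involves, the sketch below walks two token streams in lockstep and reports the first position where they disagree. It is not the diffcomp implementation (which is written in Kotlin); the `Token` and `Mismatch` types, their fields, and the `first_mismatch` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional


@dataclass(frozen=True)
class Token:
    kind: str  # hypothetical token class, e.g. "ID" or "INT"
    text: str  # the lexeme as it appeared in the source


@dataclass(frozen=True)
class Mismatch:
    index: int                   # position in the token stream
    polyml: Optional[Token]      # None if the PolyML stream ended early
    reference: Optional[Token]   # None if the reference stream ended early


def first_mismatch(
    polyml: list[Token], reference: list[Token]
) -> Optional[Mismatch]:
    """Return the first position where the two streams disagree, or None."""
    # zip_longest pads the shorter stream with None, so a length
    # difference is reported as a mismatch at the first missing token.
    for i, (a, b) in enumerate(zip_longest(polyml, reference)):
        if a != b:
            return Mismatch(i, a, b)
    return None
```

Reporting only the first divergence is a deliberate simplification: once two lexers disagree on one token, later differences are often a consequence of the first.
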
- JDK 21+ (for smlgen, verilex, and diffcomp via Gradle)
- Python 3.13+ and uv
- PolyML (for the lexer harness)
- AFL++ (optional, for fuzz testing)
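
If you want to confirm the prerequisites are on your `PATH` before building, a small check along the following lines may help. The executable names are assumptions (`poly` for PolyML and `afl-fuzz` for AFL++ are the usual ones, but your installation may differ), and the sketch only checks that each tool is present; it does not check the JDK 21+ or Python 3.13+ version requirements.

```python
import shutil

# Executable names are assumptions; adjust them for your installation.
REQUIRED = {"JDK": "java", "Python": "python3", "uv": "uv", "PolyML": "poly"}
OPTIONAL = {"AFL++": "afl-fuzz"}


def missing_tools(tools: dict[str, str], which=shutil.which) -> list[str]:
    """Return the display names of tools whose executable is not on PATH."""
    # `which` is injectable so the function can be exercised without
    # depending on what is installed on the current machine.
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print(f"missing required tool: {name}")
    for name in missing_tools(OPTIONAL):
        print(f"missing optional tool: {name}")
```
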
# Initialise submodules
git submodule update --init --recursive
# Build everything
make
# Verify build artifacts
make check

The ./polyfuzz wrapper script at the repo root is the main entry point. It handles dependency installation automatically.
# Run a single campaign (100 tests, random seed)
./polyfuzz -d ./results -n 100
# Run 5 campaigns with a fixed seed
./polyfuzz -d ./results -n 200 -s 42 -N 5
# Run without AFL (direct corpus comparison)
./polyfuzz -d ./results -n 100 --no-afl
# Run a single pipeline stage
./polyfuzz -d ./results run-stage smlgen
# Analyse existing results
./polyfuzz -d ./results analyse

Run ./polyfuzz --help for the full set of options.
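
When driving campaigns from another script, it can be convenient to assemble the argument list programmatically. The helper below is a sketch that uses only the flags shown in the examples above (`-d`, `-n`, `-s`, `-N`, `--no-afl`); the function name `build_polyfuzz_command` and its parameter names are hypothetical and not part of PolyFuzz.

```python
from typing import Optional


def build_polyfuzz_command(
    results_dir: str,
    tests: int,
    seed: Optional[int] = None,
    campaigns: Optional[int] = None,
    use_afl: bool = True,
) -> list[str]:
    """Build an argument list for the ./polyfuzz wrapper script."""
    cmd = ["./polyfuzz", "-d", results_dir, "-n", str(tests)]
    if seed is not None:
        cmd += ["-s", str(seed)]       # fixed seed
    if campaigns is not None:
        cmd += ["-N", str(campaigns)]  # number of campaigns
    if not use_afl:
        cmd.append("--no-afl")         # direct corpus comparison
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repo root. Building a list rather than a shell string avoids quoting problems if the results path contains spaces.
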