GRAIL
Documentation: Miner • Validator
grail delivers post-training for language models with cryptographically verifiable inference. It implements the GRAIL protocol (Guaranteed Rollout Authenticity via Inference Ledger) so that rollouts produced during RL are tied to a specific model and input, and can be independently verified by validators.
- grail (lowercase): The Bittensor subnet implementation orchestrating miners and validators for verifiable post-training
- GRAIL (uppercase): The protocol that proves rollout authenticity and model identity
- The current release is inference-only: miners generate rollouts and validators verify and score them.
- Reinforcement learning post-training (e.g., GRPO trainer and model updates) will be added in a future version.
Prover/Verifier implementation with:
- PRF-based index derivation and sketch commitments for token-level verification
- Verifier-supplied challenge (drand + chain/window context)
- Token and model-config validation; structured signatures bound to model identity
- SAT problem binding and solution checks for end-to-end rollout verification
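The index-derivation and sketch ideas listed above can be illustrated with a small sketch. It is not the GRAIL implementation: SHA-256 stands in for the PRF, and the function names (`prf`, `derive_indices`, `sketch`) and the integer feature vector are assumptions made for illustration. Only the constants `PRIME_Q` and `CHALLENGE_K` come from this README.

```python
import hashlib

PRIME_Q = 2_147_483_647  # modulus for sketch values (constant listed in this README)
CHALLENGE_K = 16         # minimum number of challenged positions (same source)


def prf(seed: bytes, label: bytes, counter: int) -> int:
    """Stand-in PRF (assumption): SHA-256 over seed, label and counter, as a 64-bit int."""
    digest = hashlib.sha256(seed + label + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def derive_indices(challenge: bytes, seq_len: int, k: int = CHALLENGE_K) -> list[int]:
    """Deterministically pick k distinct token positions from a verifier challenge."""
    k = min(k, seq_len)  # cannot challenge more positions than exist
    chosen: list[int] = []
    counter = 0
    while len(chosen) < k:
        idx = prf(challenge, b"index", counter) % seq_len
        if idx not in chosen:  # skip repeats so positions stay distinct
            chosen.append(idx)
        counter += 1
    return sorted(chosen)


def sketch(values: list[int], challenge: bytes) -> int:
    """Toy sketch (assumption): PRF-weighted sum of integer features, mod PRIME_Q."""
    total = 0
    for i, value in enumerate(values):
        weight = prf(challenge, b"weight", i) % PRIME_Q
        total += weight * value
    return total % PRIME_Q
```

Because both functions depend only on the challenge and the data, a prover and a verifier holding the same challenge arrive at the same positions and the same sketch value, which is what makes spot-checking possible.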
GRPO-style rollout system with:
- Multiple rollouts per problem, token-level logprob tracking, advantage computation
- Qwen-style chat template injection for reasoning/solution tagging
- SAT-specific SATRolloutGenerator with modular reward vector composition
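As a rough illustration of the advantage computation mentioned above, the following sketch uses the group-relative normalization commonly associated with GRPO: each rollout's reward is centered on the group mean and scaled by the group standard deviation. The `Rollout` container and `group_advantages` function are hypothetical names, and whether grail normalizes in exactly this way is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """Hypothetical container for one rollout of a single problem."""
    token_ids: list
    logprobs: list  # one log-probability per generated token
    reward: float


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollouts of one problem
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for the same problem, two of which solved it.
rollouts = [
    Rollout([1, 2], [-0.1, -0.2], 1.0),
    Rollout([1, 3], [-0.3, -0.9], 0.0),
    Rollout([1, 4], [-0.5, -0.4], 0.0),
    Rollout([1, 2], [-0.1, -0.2], 1.0),
]
advantages = group_advantages([r.reward for r in rollouts])
```

Rollouts that beat the group average receive positive advantages and the rest receive negative ones; if every rollout earns the same reward, all advantages are zero.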
Modular environments, currently:
- SAT Problems (sat.py): Deterministic 3-SAT generation, parsing, reward shaping
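A minimal sketch of deterministic 3-SAT generation and solution checking follows. It is not the code in sat.py: the function names, the hash-based seeding, and the uniform sampling are assumptions. The ranges (3 to 10 variables, 5 to 20 clauses, clause length 3) follow the parameters stated elsewhere in this README.

```python
import hashlib
import random


def generate_3sat(seed: str):
    """Build a 3-SAT instance that depends only on the seed string."""
    # Hash the seed into an integer so any string yields a well-defined RNG state.
    rng = random.Random(int.from_bytes(hashlib.sha256(seed.encode()).digest(), "big"))
    num_vars = rng.randint(3, 10)
    num_clauses = rng.randint(5, 20)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)  # three distinct variables
        # A positive literal means the variable itself, a negative one its negation.
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return num_vars, clauses


def is_satisfied(clauses, assignment):
    """Check an assignment, where assignment[i] is the value of variable i + 1."""
    return all(
        any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )
```

Since the instance is a pure function of the seed, a validator can rebuild the exact problem a miner claims to have solved and then check the reported assignment with `is_satisfied`.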
Object-storage utilities for miner/validator coordination:
- Upload mined rollouts (sink_window_inferences), publish validated rollouts (upload_valid_rollouts)
- Optional dataset export to Hugging Face (upload_to_huggingface)
- Randomness (grail/infrastructure/drand.py): Robust drand v2-first client with fallbacks and a mock beacon for testing
- Chain & credentials (grail/infrastructure/chain.py): Manages R2 credential commitments and metagraph access
Typer-based CLI with subcommands: mine, validate (and experimental train).
Best practices for miners:
- Do not override model-related environment variables (GRAIL_MODEL_NAME, GRAIL_MAX_NEW_TOKENS).
- Leave the final 2 blocks of each window for upload; generation should stop near the end automatically.
- Prefer uv sync for reproducible installs.
- Problem Generation: Validators derive a SAT instance from a public seed that mixes drand randomness with the window’s block hash
- Rollout Collection: Miners generate multiple GRPO rollouts, tracking token ids and logprobs for proof construction
- GRAIL Verification: Validators verify tokens, the GRAIL commitment/opening against the claimed model, the deterministic SAT instance, and the reported solution
- Reward & Weights: Validators score miners over recent windows using unique/valid/successful rollout metrics with a superlinear curve, then normalize and set weights on-chain
- Model Updates (planned): Validated rollouts will be used for post-training in a future release
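Two of the steps above, public seed derivation and superlinear weight normalization, can be sketched as follows. The hashing scheme, the exponent of 2.0, and the function names are assumptions chosen for illustration; this README does not state the actual mixing function or curve.

```python
import hashlib


def derive_public_seed(drand_randomness: bytes, block_hash: bytes) -> str:
    """Mix drand randomness with a window's block hash (assumed: SHA-256 of both)."""
    return hashlib.sha256(drand_randomness + block_hash).hexdigest()


def compute_weights(scores: dict[str, float], exponent: float = 2.0) -> dict[str, float]:
    """Apply a superlinear curve to raw scores, then normalize so weights sum to 1."""
    # exponent > 1 makes the curve superlinear; 2.0 is a hypothetical choice.
    curved = {miner: max(score, 0.0) ** exponent for miner, score in scores.items()}
    total = sum(curved.values())
    if total == 0:
        return {miner: 0.0 for miner in scores}  # nothing to reward this round
    return {miner: value / total for miner, value in curved.items()}


seed = derive_public_seed(bytes.fromhex("ab" * 32), bytes.fromhex("cd" * 32))
weights = compute_weights({"miner_a": 1.0, "miner_b": 3.0})
```

With an exponent of 2, a miner with three times the raw score receives nine times the weight, so sustained high-quality output is rewarded more than proportionally.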
The GRAIL protocol ensures:
- Deterministic, publicly auditable challenges (drand + chain context)
- Model-binding proof of token processing; no substitution or replay
- Deterministic SAT instance reconstruction and solution verification
- PRIME_Q: 2,147,483,647 (mod prime for sketches)
- CHALLENGE_K: 16 (minimum challenged positions)
- TOLERANCE: 3 (numeric tolerance for comparisons)
- MODEL_NAME: default Qwen/Qwen3-4B-Instruct-2507 (override via GRAIL_MODEL_NAME)
- MAX_NEW_TOKENS: configurable generation cap (default 1024 via env)
- WINDOW_LENGTH: 50 blocks per scoring window
- 3-SAT: Variables 3–10, Clauses 5–20, Clause length 3; deterministic from seed
- Hugging Face Transformers compatible, exposes token ids/logprobs
- CUDA recommended for throughput
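To make the window arithmetic concrete, the sketch below combines the 50-block window length with the guidance to leave the final 2 blocks of each window for upload. It assumes windows are aligned to multiples of WINDOW_LENGTH; the helper names and the UPLOAD_BUFFER_BLOCKS constant are hypothetical.

```python
WINDOW_LENGTH = 50        # blocks per scoring window (from this README)
UPLOAD_BUFFER_BLOCKS = 2  # final blocks reserved for upload (hypothetical constant name)


def window_start(block: int) -> int:
    """First block of the window containing `block` (assumes aligned windows)."""
    return (block // WINDOW_LENGTH) * WINDOW_LENGTH


def blocks_remaining(block: int) -> int:
    """Blocks left in the current window, counting the current block."""
    return window_start(block) + WINDOW_LENGTH - block


def should_stop_generating(block: int) -> bool:
    """True once only the upload buffer remains in the window."""
    return blocks_remaining(block) <= UPLOAD_BUFFER_BLOCKS
```

For a window covering blocks 100 through 149, generation would continue through block 147 and stop at block 148, leaving blocks 148 and 149 for upload.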
This project uses uv for dependency management.
# Clone the repository
git clone https://github.com/tplr-ai/grail
cd grail
# Create a venv
uv venv
# Activate the virtual environment
source .venv/bin/activate
# Install dependencies
uv sync

# Copy then fill out env items (wallets, network, R2 credentials)
cp .env.example .env
# Run miner locally
grail mine

# Copy then fill out env items
cp .env.example .env
# Run validator locally
grail validate

Notes:
- Randomness is fetched from drand; miners mix it with the window's block hash
- Rollouts are uploaded to object storage (R2/S3); validators fetch, verify, score, and set weights
- Validated rollouts can be exported to a Hugging Face dataset for analysis
- Monitoring: miners and validators can log metrics to the public W&B project for real-time scores and issues: https://wandb.ai/tplr/grail
- Verifiable Training: Cryptographic binding of rollouts to model and input
- Decentralized Post-Training: Internet-scale contribution and evaluation
- Problem Agnostic: Environment framework enables new domains beyond SAT
- Incentive Aligned: On-chain weights reward sustained, verifiable improvements
We welcome contributions to:
- New environments and reward vectors
- Protocol robustness and verification
- Performance and throughput improvements
- Documentation and examples