This repository contains a comprehensive training dataset for fine-tuning large language models to become experts in the Witness supply chain attestation framework.
The dataset is designed to train models to:
- Instrument CI/CD pipelines with `witness run` commands (see the sketch after this list)
- Create policy documents for attestation verification
- Write Rego policies for all attestors in go-witness
- Design multi-step workflows with cross-step validation
- Handle security scenarios including tampering detection and policy enforcement
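For instance, a minimal `witness run` invocation might look like the following sketch. The step name, output path, and key file are placeholders, and exact flags vary between Witness releases:

```bash
# Run a Go build under Witness, capturing a signed attestation.
# The key path and step name are illustrative; check `witness run --help`
# for the flags supported by your Witness version.
witness run --step build \
  -o build-attestation.json \
  --signer-file-key-path testkey.pem \
  -- go build -o myapp .
```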
```
witness-evals/
├── data/
│   ├── attestors/               # Per-attestor training examples
│   │   ├── commandrun.jsonl     # Command execution attestor
│   │   ├── git.jsonl            # Git repository attestor
│   │   ├── environment.jsonl
│   │   └── material-product.jsonl
│   ├── policies/                # Policy creation examples
│   │   └── policy-creation.jsonl
│   ├── workflows/               # Multi-step pipeline examples
│   │   └── ci-cd-workflows.jsonl
│   └── security/                # Attack/defense scenarios
├── scripts/
│   ├── generate_dataset.py      # Generate training data
│   └── validate_dataset.py      # Validate JSONL format
└── docs/
    └── FINE_TUNING_GUIDE.md
```
All training examples follow the OpenAI fine-tuning format (JSONL with messages):
````json
{
  "messages": [
    {
      "role": "system",
      "content": "You are an expert in the Witness supply chain attestation framework..."
    },
    {
      "role": "user",
      "content": "How do I attest a Go build with commandrun tracking?"
    },
    {
      "role": "assistant",
      "content": "Here's how to attest a Go build...\n\n```bash\nwitness run --step build...\n```"
    }
  ]
}
````

Verified dataset:
- Total examples: 10,000 ✅
- Verification: 100% passed `witness verify`
- File size: 26 MB
- Attestors covered: 15 combinations
- Quality: Formally verified

Generated dataset:
- Total examples: 10,000
- File size: 20 MB
- Quality: Programmatically generated

Hand-crafted dataset:
- Total examples: 22
- Attestors covered: commandrun, git, environment, material, product
- Quality: Hand-crafted
Breakdown of the 22 hand-crafted examples:

| Category | Examples | Description |
|---|---|---|
| commandrun | 6 | Command execution, tracing, exit code validation |
| git | 5 | Repository attestation, branch validation, signatures |
| environment | 3 | System info, env vars, hostname restrictions |
| material-product | 3 | Input/output files, cross-step validation |
| policies | 3 | Policy document structure, multi-step, Rego integration |
| workflows | 2 | GitHub Actions, container builds |
```bash
# Generate training data
python3 scripts/generate_dataset.py

# Validate JSONL format (a quick jq spot-check is sketched below)
python3 scripts/validate_dataset.py
```

See docs/FINE_TUNING_GUIDE.md for detailed instructions on fine-tuning with:
- Llama 3
- Mistral
- Other open-source models
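Before fine-tuning, you can also spot-check that every line of a dataset file parses as JSON (assuming `jq` is installed; the path below is illustrative):

```bash
# jq treats a .jsonl file as a stream of JSON documents; `empty` suppresses
# output, so a zero exit status means every line parsed cleanly.
jq empty data/attestors/commandrun.jsonl && echo "valid JSONL"
```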
To expand the dataset:

1. Edit `scripts/generate_dataset.py`:
   - Add new `generate_*_examples()` methods
   - Cover additional attestors (aws, gitlab, github, oci, sbom, etc.)
   - Add security/adversarial scenarios
2. Regenerate: `python3 scripts/generate_dataset.py`
3. Validate: `python3 scripts/validate_dataset.py`
The following attestors from go-witness need examples (a usage sketch follows the list):
- aws-iid (AWS Instance Identity)
- aws-codebuild
- gcp-iit (GCP Identity Token)
- github (GitHub Actions)
- gitlab (GitLab CI)
- jenkins
- docker
- oci (Container images)
- sbom (Software Bill of Materials)
- vex (Vulnerability Exploitability eXchange)
- sarif (Static Analysis)
- maven
- lockfiles
- k8smanifest
- secretscan
- system-packages
- slsa (SLSA Provenance)
- omnitrail
- jwt
- link (in-toto link)
- policyverify
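A training example for any of these would pair a `witness run` invocation with the fields that attestor captures. As a sketch, extra attestors are requested with the `-a` flag (attestor names and flag behavior vary by go-witness version; all file names are placeholders):

```bash
# Request additional attestors with -a; each adds its own section
# to the resulting attestation collection.
witness run --step package \
  -a sbom -a slsa \
  -o package-attestation.json \
  --signer-file-key-path testkey.pem \
  -- tar -czf app.tar.gz ./bin
```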
To contribute examples:
- Add examples to the appropriate category in `generate_dataset.py`
- Follow the existing format (system + user + assistant messages)
- Include:
  - Complete `witness run` / `witness verify` commands (see the sketch after this list)
  - Rego policy examples
  - Explanations of captured fields
- Run validation before submitting
- Ensure no sensitive data in examples
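For reference, a complete verification command might look like the following sketch (flag names and file paths are illustrative and may differ across Witness versions):

```bash
# Verify an artifact against a signed policy using a collected attestation.
# All file names here are placeholders.
witness verify \
  -p policy-signed.json \
  -k policy-public-key.pem \
  -f myapp \
  -a build-attestation.json
```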
For Llama 3 / Mistral fine-tuning:

```yaml
base_model: meta-llama/Llama-3-8B
learning_rate: 2e-5
batch_size: 4
num_epochs: 3
warmup_steps: 100
gradient_accumulation_steps: 4
max_seq_length: 2048
```

After fine-tuning, evaluate the model on:
- Accuracy: Can it generate correct witness commands?
- Policy creation: Valid JSON policy documents?
- Rego syntax: Syntactically correct Rego policies? (see the spot-check sketch below)
- Completeness: All required flags included?
- Security: Does it recommend secure practices?
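The policy and Rego criteria lend themselves to quick automated spot-checks, assuming the OPA CLI and `jq` are installed and the model's outputs are saved to the illustrative file names below:

```bash
# Check that a generated policy document is valid JSON.
jq empty generated-policy.json

# Check that a generated Rego module parses and compiles.
opa check generated-policy.rego
```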
Apache 2.0 - Same as go-witness