About · Quick Start · Architecture · Training · Benchmarks · Structure · Contributing
Springhead is a quantum-classical hybrid language model from TheWakeSystems, built on Qwen2.5-Coder-32B. It replaces a subset of the classical transformer layers with proprietary quantum-informed Springhead Hybrid modules, achieving extreme parameter compression while preserving core reasoning capabilities.
- 🌀 Extreme Compression — 3,398M → 43.7M trainable parameters (≈ 1.3%), dramatically reducing memory footprint
- ⚛️ Quantum-Classical Hybrid — replaces 8 of 64 transformer blocks with quantum-informed tensor network layers (`MonarchProj` + `EntanglementLayer`)
- 🖥️ Multi-GPU Ready — automated device dispatch across up to 16 GPUs via `accelerate`, with intelligent memory load balancing (see the sketch after this list)
- 🧠 Reasoning Preserved — retains the mathematical and logical reasoning performance of the base Qwen2.5-Coder-32B
- 🔌 Drop-in Compatible — works with standard Hugging Face Transformers pipelines
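Multi-GPU loading is handled internally by `load_model` in `scripts/benchmark_hybrid.py`; the sketch below shows the assumed underlying mechanism, standard Hugging Face `accelerate` auto-dispatch (the path in `MODEL_PATH` is illustrative):

```python
# Hedged sketch: accelerate's automatic device dispatch, which shards the
# model across all visible GPUs and balances per-device memory.
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "/path/to/Qwen2.5-Coder-32B"  # illustrative path

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # BF16, as recommended below
    device_map="auto",           # accelerate dispatches layers across GPUs
    trust_remote_code=True,
)
```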
| ✅ Recommended | ❌ Not Recommended |
|---|---|
| Quantum-classical hybrid NN research | Production code generation (without further fine-tuning) |
| Hardware-constrained inference testing | High-risk decision-making |
| Knowledge Distillation experiments | Zero-shot critical reasoning |
- Python 3.10+
- CUDA-capable GPU(s) (BF16 recommended)
- ~60 GB total GPU memory (single or multi-GPU)
```bash
git clone https://github.com/THeWakeSystems/Springhead.git
cd Springhead
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

```python
import torch
from transformers import AutoTokenizer
from scripts.benchmark_hybrid import load_model, generate
MODEL_PATH = "/path/to/Qwen2.5-Coder-32B"
CHECKPOINT = "checkpoints/checkpoints_hybrid_v2/epoch_2.pt"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = load_model(CHECKPOINT, MODEL_PATH, device="cuda", dtype="bf16")
response = generate(model, tokenizer, "Write a Python function for binary search.")
print(response)
```

Or run the benchmark script directly from the command line:

```bash
python scripts/benchmark_hybrid.py \
--model_path /path/to/Qwen2.5-Coder-32B \
--checkpoint checkpoints/checkpoints_hybrid_v2/epoch_2.pt \
--device cuda \
--dtype bf16
```

Springhead targets layers 48–63 of the 64-layer Qwen2.5-Coder-32B backbone. Each replaced MLP is substituted with:
```
Original MLP → MonarchProj → EntanglementLayer → Hybrid Output
```
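`MonarchProj` and `EntanglementLayer` are proprietary, so the following is only a minimal structural sketch of how such a replacement could be wired, using the `block_size` and `entangle_rank` values from the table below; the actual implementation lives in `model/CustomQwen32B_hybrid.py`.

```python
# Minimal structural sketch (NOT the proprietary implementation): a
# block-diagonal projection followed by residual low-rank mixing.
import torch
import torch.nn as nn

class MonarchProj(nn.Module):
    """Block-diagonal (Monarch-style) projection: far fewer parameters than a dense MLP."""
    def __init__(self, dim: int, block_size: int = 64):
        super().__init__()
        assert dim % block_size == 0
        self.block_size = block_size
        n_blocks = dim // block_size
        self.blocks = nn.Parameter(torch.randn(n_blocks, block_size, block_size) * block_size ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        x = x.view(b, t, d // self.block_size, self.block_size)  # split features into blocks
        x = torch.einsum("btnd,nde->btne", x, self.blocks)       # mix within each block
        return x.reshape(b, t, d)

class EntanglementLayer(nn.Module):
    """Residual low-rank channel mixing, loosely inspired by tensor-network entanglement."""
    def __init__(self, dim: int, entangle_rank: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, entangle_rank, bias=False)
        self.up = nn.Linear(entangle_rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.tanh(self.down(x)))  # mix information across blocks

class HybridMLP(nn.Module):
    """Stand-in for a replaced transformer MLP: MonarchProj → EntanglementLayer."""
    def __init__(self, dim: int, block_size: int = 64, entangle_rank: int = 64):
        super().__init__()
        self.proj = MonarchProj(dim, block_size)
        self.entangle = EntanglementLayer(dim, entangle_rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.entangle(self.proj(x))
```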
| Metric | Value |
|---|---|
| Base Model | Qwen2.5-Coder-32B |
| Total Layers | 64 |
| Hybrid Layers | 8 (layers 48–63) |
| Trainable Parameters | 43.7M |
| Original Target Parameters | 3,398M |
| Compression Ratio | ≈ 1.3% |
| `u_proj_output_dim` | 4 |
| `block_size` / `entangle_rank` | 64 |
| Recommended Hardware | 16× CUDA GPUs (BF16), ~58.8 GB total VRAM |
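Continuing that hypothetical sketch, injecting the replacements into layers 48–63 could look like the loop below (attribute names follow the standard Hugging Face Qwen2 module layout; the real injection logic lives in `model/CustomQwen32B_hybrid.py`):

```python
# Hypothetical injection loop, reusing the HybridMLP sketch above.
for idx in range(48, 64):  # layers 48-63 inclusive
    model.model.layers[idx].mlp = HybridMLP(
        model.config.hidden_size,  # 5120 for Qwen2.5-Coder-32B
        block_size=64,
        entangle_rank=64,
    )
```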
The hybrid layers are trained via Knowledge Distillation to match the original layer outputs (a minimal sketch of the objective follows the list below):
```bash
python scripts/train_hybrid.py \
--model_path /path/to/Qwen2.5-Coder-32B \
--output_dir checkpoints/ \
--num_epochs 3 \
--batch_size 2 \
--learning_rate 1e-3
```

- Freezes all base model parameters
- Trains only the injected quantum-informed projections
- Supports both SFT and KD loss modes
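A minimal sketch of that recipe, assuming the KD objective is a per-layer MSE between the frozen original MLP's output and its hybrid replacement's output (`model`, `hybrid_mlp`, and `original_mlp` are illustrative names, not the repo's exact API):

```python
# Hedged sketch of the freeze-then-distill setup (assumed form, not the
# exact train_hybrid.py implementation). `model` is the hybrid model from
# the Quick Start above.
import torch
import torch.nn.functional as F

# 1. Freeze every parameter in the base model...
for p in model.parameters():
    p.requires_grad = False

# 2. ...then unfreeze only the injected projections in layers 48-63.
for idx in range(48, 64):
    for p in model.model.layers[idx].mlp.parameters():
        p.requires_grad = True

# 3. KD loss: match the frozen original MLP's output on the same hidden states.
def kd_layer_loss(hybrid_mlp, original_mlp, hidden_states):
    with torch.no_grad():  # teacher (original MLP) provides targets only
        target = original_mlp(hidden_states)
    return F.mse_loss(hybrid_mlp(hidden_states), target)
```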
Run the integrated benchmark suite across 5 task categories:
| Category | Status |
|---|---|
| 🧮 Math Reasoning | ✅ Stable |
| 🔢 Logic | ✅ Stable |
| 💻 Code Generation | ⚠️ Known issues (see note below) |
| 🌐 Commonsense | |
| 🌍 Multilingual | |
Note: At the current 1.3% compression ratio, code generation exhibits semantic breaks and token repetition. For production deployments, consider increasing `entangle_rank` or reducing the number of replaced layers.
```
Springhead/
├── model/
│   └── CustomQwen32B_hybrid.py    # Hybrid model architecture
├── scripts/
│   ├── train_hybrid.py            # Training / KD pipeline
│   ├── benchmark_hybrid.py        # Multi-task benchmark suite
│   └── benchmark_results/         # Saved benchmark outputs
├── examples/
│   └── simple_inference.py        # Minimal inference example
├── checkpoints/
│   └── checkpoints_hybrid_v2/     # Pretrained hybrid weights (Git LFS)
├── MODEL_CARD.md                  # Detailed model card
├── RELEASE_NOTES.md               # Version changelog
└── requirements.txt               # Python dependencies
```
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-idea`)
- Commit your changes (`git commit -m 'Add amazing idea'`)
- Push to your branch (`git push origin feature/amazing-idea`)
- Open a Pull Request
Please read MODEL_CARD.md for model-specific considerations before submitting PRs that modify the architecture.
- Code Generation: Extreme compression may cause token repetition and semantic discontinuities
- Memory Footprint: Despite parameter compression, overall GPU memory requirement remains ~58.8 GB in BF16
- Model Parity: Exact behavioral parity with the upstream Qwen2.5-Coder-32B base model is not guaranteed
See RELEASE_NOTES.md for the full list and mitigation roadmap.
This project is licensed under the Apache 2.0 License — see LICENSE for details.
Built with ❤️ by TheWakeSystems