
feat: EXP-15-19 training research, Gemma 4 adapter, data pipeline#377

Merged
CalebisGross merged 3 commits into main from autoresearch/ft-mar25 on Apr 4, 2026
Conversation

@CalebisGross
Collaborator

Summary

  • EXP-15 through EXP-19 completed: rotation experiments, data scaling (12K encoding examples), Gemma 4 E2B integration
  • Encoding spoke solved: 100% novel schema compliance on both Qwen 3.5 2B and Gemma 4 E2B
  • Bespoke spoke models outperform Gemini 3 Flash on mnemonic's encoding task (100% vs 0% schema compliance)
  • Qwen 3.5 2B selected as production encoding model (1.7x faster than Gemma 4 locally at equal quality)
  • Hallucination stress test: both spoke models 5/7, Gemini 1/7

Key additions

  • gemma_spoke_adapter.py — Gemma 4 E2B spoke adapter (NF4, PLE CPU offload, SpokeWrappedLayer); a rough sketch follows this list
  • batch_encode.py — Gemini Batch API pipeline for scalable training data generation
  • compare_models.py — Side-by-side model comparison (schema, speed, quality)
  • stress_test_hallucination.py — Hard input testing for detail preservation
  • enrich_and_generate.py, extract_prenuke_data.py, merge_training_data.py — Data pipeline
  • train_qwen_spokes.py — Updated with --model-type gemma support, OOM protection
  • experiment_registry.md — Full results for EXP-15 through EXP-19
  • Handoff encoding preserved verbatim in agent.go
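The SpokeWrappedLayer and NF4 setup in gemma_spoke_adapter.py are project-specific and not shown in this description. As a rough illustration of the loading path, here is a minimal sketch assuming a bitsandbytes NF4-quantized base model and a LoRA-style low-rank "spoke" wrapper around a frozen linear layer. The model id, rank, and wrapping targets are placeholders, not the actual adapter code.

```python
# Minimal sketch: NF4-quantized base model plus a low-rank "spoke" wrapper.
# Assumes transformers + bitsandbytes; model id, rank, and wrapped layers are placeholders.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, as noted in the PR
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-model-id",             # placeholder for the Gemma 4 E2B checkpoint
    quantization_config=bnb_config,
    device_map="auto",                   # allows large embedding tables to sit on CPU
)

class SpokeWrappedLayer(nn.Module):
    """Frozen base linear layer plus a small trainable low-rank 'spoke' path (assumed design)."""

    def __init__(self, base_layer: nn.Module, rank: int = 16):
        super().__init__()
        self.base_layer = base_layer
        for p in self.base_layer.parameters():
            p.requires_grad = False       # base weights stay frozen
        in_f, out_f = base_layer.in_features, base_layer.out_features
        self.down = nn.Linear(in_f, rank, bias=False)
        self.up = nn.Linear(rank, out_f, bias=False)
        nn.init.zeros_(self.up.weight)    # adapter starts as a no-op

    def forward(self, x):
        return self.base_layer(x) + self.up(self.down(x))
```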

Test plan

  • Qwen novel eval: 10/10 (100%) schema compliance
  • Gemma novel eval: 10/10 (100%) schema compliance
  • Hallucination stress test: Qwen 5/7, Gemma 5/7
  • Model comparison: Qwen 19.7s/input, Gemma 33.9s/input, Gemini fails schema compliance
  • End-to-end daemon integration via serve_spokes.py (rough endpoint sketch after this list)
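serve_spokes.py itself is not included in this description; the sketch below only illustrates the general shape of an OpenAI-compatible /v1/chat/completions endpoint on port 8899 that a daemon configured for LM Studio could point at. The endpoint path and port come from the PR text; the handler names, request model, and generation stub are assumptions.

```python
# Minimal sketch of an OpenAI-compatible chat endpoint on port 8899 (assumed shape,
# not the actual serve_spokes.py). Model/tokenizer loading and generation are stubbed.
import time
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    max_tokens: int = 512

def run_spoke_model(messages: list[dict], max_tokens: int) -> str:
    # Placeholder: the real server would build a prompt (e.g. apply_chat_template)
    # and call model.generate on the spoke-adapted Qwen/Gemma model.
    return "stub response"

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    text = run_spoke_model(req.messages, req.max_tokens)
    return {
        "id": "chatcmpl-local",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8899)
```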

🤖 Generated with Claude Code

CalebisGross and others added 3 commits April 4, 2026 08:33
…testing

Research session covering rotation experiments, data scaling, Gemma 4
integration, and production quality validation.

Experiments:
- EXP-15/15b: orthogonal rotation (refuted full-space, minor bottleneck)
- EXP-16: clean run 3 replication (eval 0.6074, 70% novel schema)
- EXP-17: v2 dataset — 100% novel schema (removed poison data)
- EXP-18: 12K encoding-only — 100% novel schema confirmed
- EXP-19: Gemma 4 E2B + spokes — 100% schema, 5/7 stress test

Infrastructure:
- gemma_spoke_adapter.py (NF4, PLE offload, SpokeWrappedLayer)
- batch_encode.py (Gemini Batch API, 50% cheaper)
- compare_models.py, stress_test_hallucination.py
- enrich_and_generate.py, extract_prenuke_data.py, merge_training_data.py
- train_qwen_spokes.py --model-type gemma support
- Handoff encoding preserved verbatim (agent.go)

Results: Qwen 100% schema 20s/input vs Gemma 100% 34s/input vs Gemini 0%.
Both spoke models 5/7 on hallucination stress test. Qwen selected as
production encoding model for speed at equal quality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
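For context on the batch_encode.py pipeline mentioned in the commit above, here is a rough sketch of preparing a JSONL file of encoding prompts for batch submission. The per-line request schema is only an approximation of the Gemini Batch API format and should be checked against its documentation; the file path and prompt template are invented for illustration.

```python
# Rough sketch: writing encoding prompts to a JSONL file for batch submission.
# The request schema approximates the Gemini Batch API format; verify against the docs.
# File paths and the prompt template are placeholders.
import json

PROMPT_TEMPLATE = "Encode the following note into the target schema:\n\n{note}"

def build_batch_file(notes: list[str], out_path: str = "encode_requests.jsonl") -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        for i, note in enumerate(notes):
            record = {
                "key": f"encode-{i}",
                "request": {
                    "contents": [
                        {
                            "role": "user",
                            "parts": [{"text": PROMPT_TEMPLATE.format(note=note)}],
                        }
                    ]
                },
            }
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    build_batch_file(["example note text"])
    # The resulting JSONL is then uploaded and submitted as a batch job,
    # which the commit message cites as roughly half the cost of interactive calls.
```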
HTTP server on port 8899 that serves Qwen/Gemma + spokes as a
/v1/chat/completions endpoint. Allows the mnemonic daemon to use the
spoke model like LM Studio, without GGUF export.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CalebisGross CalebisGross merged commit f6ce427 into main Apr 4, 2026
@CalebisGross CalebisGross deleted the autoresearch/ft-mar25 branch April 4, 2026 12:38