Pretraining and inference code for a large-scale depth-recurrent language model (Python; updated Dec 29, 2025).
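For orientation, here is a minimal sketch of what depth recurrence means in practice: a single weight-tied core block is applied a variable number of times, with the token embeddings re-injected at every step, so compute depth can be scaled at inference time without adding parameters. All names and shapes below are hypothetical illustrations, not this repo's API:

```python
import torch
import torch.nn as nn

class DepthRecurrentLM(nn.Module):
    """Toy depth-recurrent LM: one shared block looped `recurrences` times."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One weight-tied block, reused at every recurrence step.
        self.core = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.inject = nn.Linear(2 * d_model, d_model)  # mixes state with input
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, recurrences: int = 4) -> torch.Tensor:
        e = self.embed(tokens)                                   # (B, T, D)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        state = torch.zeros_like(e)  # initial latent state (random init also common)
        for _ in range(recurrences):  # depth grows with this loop; parameters do not
            state = self.core(
                self.inject(torch.cat([state, e], dim=-1)), src_mask=mask
            )
        return self.lm_head(state)                               # (B, T, vocab)

logits = DepthRecurrentLM(32000)(torch.randint(0, 32000, (2, 16)))  # (2, 16, 32000)
```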
Research and training stack for AVA — a tool-using, memory-aware virtual assistant targeting 4 GB VRAM. Spans custom transformers, verifier-RL, external memory, multi-domain benchmarks, and Gemma 4 inference optimization.
Recurrent-depth transformer, fixed. Fork of kyegomez/OpenMythos with scatter-based MoE (2.94x faster), proper ACT halting, DeepSeekMoE load balancing, SDPA kernels, and a working training loop.
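The scatter-based dispatch likely refers to a standard MoE optimization: rather than running every expert over a masked copy of all tokens, tokens are sorted by their routed expert so that each expert performs one dense matmul over a contiguous slice, and the outputs are scattered back to the original order. A hypothetical top-1 sketch (load-balancing loss and capacity limits omitted; none of these names come from the fork):

```python
import torch
import torch.nn as nn

class ScatterMoE(nn.Module):
    """Top-1 MoE with sort-based (scatter) dispatch instead of masked loops."""

    def __init__(self, d_model: int = 512, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        flat = x.reshape(-1, D)                       # (N, D) flattened tokens
        gate = self.router(flat).softmax(-1)          # (N, E) routing probs
        weight, expert = gate.max(-1)                 # top-1 expert per token
        order = expert.argsort()                      # group tokens by expert
        grouped = flat[order]
        counts = torch.bincount(expert, minlength=len(self.experts))
        out = torch.empty_like(grouped)
        start = 0
        for i, n in enumerate(counts.tolist()):       # one dense matmul per expert
            out[start:start + n] = self.experts[i](grouped[start:start + n])
            start += n
        restored = torch.empty_like(out)
        restored[order] = out                         # scatter back to token order
        return (restored * weight.unsqueeze(-1)).reshape(B, T, D)

y = ScatterMoE()(torch.randn(2, 16, 512))  # (2, 16, 512)
```

The speedup in such schemes comes from replacing per-expert boolean masking of the full token batch with contiguous slices, so each expert launches a single dense matmul over only its assigned tokens; whether that accounts for the quoted 2.94x is specific to this fork's benchmarks.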