
feat(runtime): add Unsloth Gemma 4 support for E2B, E4B, and 26B-A4B#267

Merged
slomin merged 1 commit into main from feat/issue-265-gemma4-support on Apr 3, 2026
Conversation


slomin (Collaborator) commented on Apr 3, 2026

Closes #265

Summary

  • Add first-class Potato support for Unsloth Gemma 4 GGUFs: E2B, E4B, and experimental 26B-A4B
  • Model family detection, projector repo resolution, vision capability detection, and runtime env propagation
  • Shell integration: mmproj handling, repo resolution, and candidate generation for all three variants
  • Auto-switch to the llama_cpp runtime when activating Gemma 4 models, since ik_llama does not yet support the gemma3n/gemma4 archs (tracked in ikawrakow/ik_llama.cpp#1572); see the runtime-selection sketch after this list
  • Strip Gemma 4 <|channel>thought tokens in the chat UI to work around an upstream llama.cpp PEG parser bug (fix pending in ggml-org/llama.cpp#21326)
  • Fix cp same-file error in build_llama_runtime.sh universal profile packaging
  • Refactor default_projector_candidates_for_model into a shared helper used by both Qwen3.5 and Gemma 4
  • Build and deploy upstream llama.cpp (commit a1cfb64) to the llama_cpp runtime slot on the Pi
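
As a rough illustration of the auto-switch item above, here is a minimal Python sketch. The names (resolve_runtime, GEMMA4_FAMILIES) are hypothetical and not the actual Potato API; the real logic lives in the model activation path.

```python
# Illustrative sketch only: force Gemma 4 activations onto llama_cpp because
# ik_llama lacks the gemma3n/gemma4 archs (ikawrakow/ik_llama.cpp#1572).
GEMMA4_FAMILIES = {"gemma4-e2b", "gemma4-e4b", "gemma4-26b-a4b"}

def resolve_runtime(model_family: str, requested_runtime: str) -> str:
    """Return the runtime to use when activating a model."""
    if model_family in GEMMA4_FAMILIES and requested_runtime == "ik_llama":
        # ik_llama cannot load Gemma 4 yet; fall back to upstream llama.cpp.
        return "llama_cpp"
    return requested_runtime
```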

Test evidence

pytest: 649 passed in 10.45s
playwright: 144 passed (35.1s)

Pi QA

  • ssd.local (Pi 5 8GB): E2B Q4_K_M — loads, vision works, projector auto-downloads ✓
  • ssd.local (Pi 5 8GB): E4B Q4_0 — loads, inference works ✓
  • potato.local (Pi 5 16GB): 26B-A4B IQ4_NL — loads (15.5GB RSS), inference at 2.76 tok/sec, vision with mmproj works ✓
  • potato.local: Channel token stripping verified clean via Chrome MCP ✓
  • Qwen3.5 regression: switched back, no issues ✓

Adds Potato support for Unsloth Gemma 4 multimodal GGUF models,
including E2B, E4B, and an experimental 26B-A4B path.

Extends model-family detection, projector repo resolution, projector
candidate handling, runtime env propagation, and launcher mmproj logic
for the supported Gemma 4 variants while keeping existing Qwen3.5
behavior intact.
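
As a sketch of what the family detection and the shared projector-candidate helper could look like, assuming hypothetical filename patterns and mmproj naming (the shipped patterns and repo names live in the Potato model-family module):

```python
import re

# Assumed filename patterns; the shipped detection logic may differ.
_FAMILY_PATTERNS = {
    r"gemma-?4.*e2b": "gemma4-e2b",
    r"gemma-?4.*e4b": "gemma4-e4b",
    r"gemma-?4.*26b-?a4b": "gemma4-26b-a4b",
    r"qwen-?3\.5": "qwen3.5",
}

def detect_model_family(gguf_name: str) -> str | None:
    """Map a GGUF filename to a known model family, or None if unrecognized."""
    name = gguf_name.lower()
    for pattern, family in _FAMILY_PATTERNS.items():
        if re.search(pattern, name):
            return family
    return None

def default_projector_candidates_for_model(family: str) -> list[str]:
    """Shared helper: mmproj filename candidates for a vision-capable family."""
    base = {
        "gemma4-e2b": "mmproj-gemma4-e2b",
        "gemma4-e4b": "mmproj-gemma4-e4b",
        "gemma4-26b-a4b": "mmproj-gemma4-26b-a4b",
        "qwen3.5": "mmproj-qwen3.5",
    }.get(family)
    if base is None:
        return []  # no vision projector known for this family
    # Prefer higher-precision projectors first when auto-downloading.
    return [f"{base}-f16.gguf", f"{base}-q8_0.gguf"]
```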

Automatically switches Gemma 4 activations to the llama_cpp runtime
when needed, packages shared libraries safely for runtime bundles, and
strips Gemma 4 channel-thought tokens from chat output until upstream
template parsing catches up.
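
A minimal sketch of the channel-thought stripping, assuming the span is delimited by the <|channel>thought marker quoted in the summary; the exact delimiters depend on the Gemma 4 chat template, and the real implementation sits in the chat UI rendering path:

```python
import re

# Assumed delimiters: strip everything from a channel "thought" marker up to
# the next channel marker (or end of text). Revisit once upstream llama.cpp
# ships the template parser fix (ggml-org/llama.cpp#21326).
_CHANNEL_THOUGHT = re.compile(
    r"<\|channel\|?>thought.*?(?=<\|channel\|?>|\Z)",
    re.DOTALL,
)

def strip_channel_thought(text: str) -> str:
    """Remove channel-thought spans before rendering assistant output."""
    return _CHANNEL_THOUGHT.sub("", text).strip()
```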

Closes #265
slomin force-pushed the feat/issue-265-gemma4-support branch from 603d1da to e493351 on April 3, 2026 08:42
slomin merged commit 9faaf9a into main on Apr 3, 2026
2 checks passed
slomin deleted the feat/issue-265-gemma4-support branch on April 3, 2026 08:44
