feat: Go eval CLI with CGo FFI bridge #9

Merged
urmzd merged 1 commit into main from feat/go-eval-cli on Apr 14, 2026

@urmzd (Owner) commented Apr 14, 2026

Summary

  • Adds a Go eval CLI (apps/eval-cli) that benchmarks base vs GAP flows across multiple providers (Google, OpenAI, Ollama, Groq, GitHub Models)
  • Introduces C FFI bindings (src/cffi.rs) so the Go CLI can call the Rust GAP apply engine via CGo
  • Fixes Ollama provider ignoring --host flag when OLLAMA_API_KEY is set (was silently redirecting to ollama.com)
  • Fixes silent error swallowing from saige SDK ErrorDelta — stream errors are now extracted and surfaced in all runner paths
  • Adds justfile recipes: build-go, test-go, run-go, report-go
  • Includes experiment 026 results (gemma4 via Ollama): base flow succeeds, GAP envelope apply fails (model capability limitation)
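
The `--host` fix in the summary comes down to a precedence rule: an explicit host should win over any default implied by `OLLAMA_API_KEY`. The following is a minimal sketch of that rule, not the code from this PR. `resolveOllamaHost` and both constants are hypothetical names chosen for illustration, and the fallback values are assumptions (the local default is Ollama's usual port; the cloud default comes from the "redirecting to ollama.com" note above).

```go
package main

import "fmt"

// Hypothetical defaults, for illustration only.
const (
	localOllamaHost = "http://localhost:11434" // assumed local default
	cloudOllamaHost = "https://ollama.com"     // assumed default when an API key is present
)

// resolveOllamaHost picks the base URL for the Ollama provider.
// An explicit --host value always takes precedence; the API key only
// influences the default when no host was given.
func resolveOllamaHost(hostFlag, apiKey string) string {
	if hostFlag != "" {
		return hostFlag
	}
	if apiKey != "" {
		return cloudOllamaHost
	}
	return localOllamaHost
}

func main() {
	fmt.Println(resolveOllamaHost("http://10.0.0.5:11434", "secret")) // http://10.0.0.5:11434
	fmt.Println(resolveOllamaHost("", "secret"))                      // https://ollama.com
	fmt.Println(resolveOllamaHost("", ""))                            // http://localhost:11434
}
```

The bug described above corresponds to checking the API key before the flag, in which case the first call would have returned the cloud host.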

Test plan

  • go vet ./... passes
  • go test ./... passes
  • Smoke tested direct OpenAI adapter vs provider.Build — both produce output
  • End-to-end run of experiment 026 with Ollama (gemma4) — base flow produces valid artifacts, GAP flow correctly reports parse/apply failures
  • Test with a cloud provider (Google/OpenAI) for GAP apply success validation

Adds a Go-based eval CLI (apps/eval-cli) that benchmarks base vs GAP
flows across providers (Google, OpenAI, Ollama, Groq, GitHub Models).
Uses CGo to call the Rust GAP apply engine via a new C FFI layer
(src/cffi.rs).

Includes provider factory, experiment runner, structured envelope
scoring, markdown report generation, and justfile recipes. Fixes silent
error swallowing from saige SDK ErrorDelta by extracting stream errors
in all runner paths. Fixes Ollama provider ignoring --host when
OLLAMA_API_KEY is set.
@urmzd merged commit a47fc53 into main on Apr 14, 2026
1 check passed
@urmzd deleted the feat/go-eval-cli branch on April 14, 2026 23:54