Fix signed-out Ollama prompt/chat with Gemma 4 #2563

Merged
senamakel merged 6 commits into tinyhumansai:main from JZKK720:ollama-fix on May 25, 2026

Conversation

@JZKK720 (Contributor) commented May 24, 2026

Summary

  • Allow gemma4:e4b-it-q8_0 as a local Ollama chat model for the validated smoke path.
  • Route user-facing local prompt/chat RPC calls through interactive inference entrypoints that do not block on scheduler capacity when the core is signed out.
  • Preserve the existing gated internal inference path for non-interactive flows that should still respect scheduler capacity limits.
  • Add regression tests covering signed-out prompt/chat execution so the non-blocking behavior stays enforced.

Problem

  • In signed-out core-only usage, openhuman.inference_prompt and openhuman.inference_chat could hang indefinitely against Ollama because the RPC path waited on scheduler capacity that is unavailable in signed_out mode.
  • The requested Ollama model gemma4:e4b-it-q8_0 was not accepted by the current chat-model allowlist, which blocked the target smoke configuration.

Solution

  • Add interactive local inference service methods that bypass wait_for_capacity() for direct user-invoked prompt/chat calls.
  • Switch the JSON-RPC local inference ops to those interactive methods while keeping the existing gated path available for internal capacity-managed flows.
  • Extend the local Ollama chat-model allowlist to include gemma4:e4b-it-q8_0.
  • Add signed-out regression tests for both prompt and chat execution.
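
The split between the gated and interactive paths can be sketched as follows. This is a minimal, std-only illustration, assuming the gate behaves like a counting permit pool; `Gate`, `infer_internal`, and `prompt_gated` are hypothetical stand-ins, and only the `wait_for_capacity` and `prompt_interactive` names are borrowed from this PR. The actual service is async and dispatches to a real provider.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical stand-in for the scheduler gate: a counting permit pool.
pub struct Gate {
    permits: Mutex<usize>,
    freed: Condvar,
}

impl Gate {
    pub fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(permits), freed: Condvar::new() })
    }

    /// Blocks until a permit is available, then takes it.
    pub fn wait_for_capacity(&self) {
        let mut available = self.permits.lock().unwrap();
        while *available == 0 {
            available = self.freed.wait(available).unwrap();
        }
        *available -= 1;
    }

    /// Returns a permit to the pool and wakes one waiter.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// Shared implementation: the two entry points differ only in `gated`.
fn infer_internal(gate: &Gate, prompt: &str, gated: bool) -> String {
    if gated {
        gate.wait_for_capacity();
    }
    // Placeholder for the provider call (assumption: no real model is contacted).
    let reply = format!("echo: {}", prompt);
    if gated {
        gate.release();
    }
    reply
}

/// Internal, capacity-managed path: waits for a permit first.
pub fn prompt_gated(gate: &Gate, prompt: &str) -> String {
    infer_internal(gate, prompt, true)
}

/// User-invoked path: never touches the gate, so it cannot hang on it.
pub fn prompt_interactive(gate: &Gate, prompt: &str) -> String {
    infer_internal(gate, prompt, false)
}
```

With a gate created via `Gate::new(0)`, `prompt_interactive` still returns immediately, while `prompt_gated` would wait until some other caller invokes `release`. Keeping one shared internal function means provider dispatch and response shaping stay in a single place.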

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — CI enforces the diff-cover gate for this PR; local merged coverage was not run because frontend dependencies are not installed in this checkout.
  • Coverage matrix updated — N/A: no feature rows were added, removed, or renamed; existing rows still apply.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: this change is limited to the local inference RPC path and targeted regression coverage.
  • Linked issue closed via Closes #NNN in the ## Related section — N/A: no linked issue was provided.

Impact

  • Affects the Rust core local-inference runtime.
  • Signed-out/core-only Ollama prompt and chat requests no longer hang on scheduler gating.
  • gemma4:e4b-it-q8_0 can be selected directly for the validated Ollama smoke configuration.

Related

  • Feature IDs: 3.1.3, 3.2.1, 3.2.4
  • Closes: N/A: no linked issue was provided.
  • Follow-up PR(s)/TODOs: add a repository-owned JSON-RPC integration test for the signed-out Ollama prompt/chat path if broader CI coverage is needed.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: ollama-fix
  • Commit SHA: 1ed33aa54d3bf77677e015afafec7528d3a9d155

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only PR; CI typecheck passes independently
  • N/A: pnpm typecheck — Rust-only PR; CI typecheck passes independently
  • Focused tests: live JSON-RPC smoke against a patched core on http://127.0.0.1:7182 returned ollama smoke ok for both openhuman.inference_prompt and openhuman.inference_chat; openhuman.inference_status reported provider ollama, state ready, and model/chat model gemma4:e4b-it-q8_0.
  • Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml --all --check
  • Tauri fmt/check (if changed): N/A: no Tauri shell files changed.

Validation Blocked

  • command: pnpm --filter openhuman-app format:check; pnpm typecheck
  • error: prettier and tsc were not available because node_modules is not installed in this checkout.
  • impact: frontend formatting/typecheck were not exercised locally; this PR only changes Rust core files.

Behavior Changes

  • Intended behavior change: signed-out local Ollama prompt/chat requests should execute immediately instead of waiting forever on scheduler capacity, and gemma4:e4b-it-q8_0 should be accepted as the configured chat model.
  • User-visible effect: local prompt/chat smoke succeeds in core-only signed-out mode with the requested Gemma 4 Ollama model.

Parity Contract

  • Legacy behavior preserved: internal capacity-managed inference flows still use the gated path.
  • Guard/fallback/dispatch parity checks: JSON-RPC prompt/chat now dispatch through interactive methods only; the shared internal implementation continues to handle provider dispatch and response shaping.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): updated

Summary by CodeRabbit

  • New Features

    • Added interactive local inference for single prompts and multi-turn chat that bypasses global scheduler waits.
    • Expanded supported local chat models to include a newer model.
    • Documented an optional base-URL override for the local model server.
  • Bug Fixes

    • Improved runner connectivity probe to use a more appropriate request method.
  • Tests

    • Added tests verifying interactive inference does not block and that server connectivity checks accept healthy runners.
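
A "does not block" assertion of this kind generally comes down to running the call under a deadline. The PR's tests are Tokio tests; the sketch below is a std-only analogue under that assumption, and `run_with_timeout` is a hypothetical helper name, not something from the repository.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `f` on a worker thread and returns its result, or `None` if it does
/// not finish within `limit`. A timed-out worker is left detached.
pub fn run_with_timeout<T, F>(limit: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(limit).ok()
}
```

A regression test would wrap the interactive call in `run_with_timeout(Duration::from_secs(2), ...)` while the scheduler permit is held elsewhere, and fail if the result is `None`.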

Review Change Stack

@JZKK720 JZKK720 requested review from a team and Copilot May 24, 2026 09:20
@coderabbitai (bot, Contributor) commented May 24, 2026

Walkthrough

Adds interactive inference entry points that bypass the global scheduler gate, refactors chat gating to be optional, updates callers to use interactive variants, adds non-blocking tests, expands allowed chat models, and changes the Ollama tags probe to GET.
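
To make the probe change concrete: Ollama exposes `/api/tags` as a GET endpoint that lists local models, so a liveness check needs no request body. The sketch below is a std-only illustration, not the code in `ollama_admin.rs`; `build_tags_probe`, `parse_status`, and `probe_tags` are hypothetical names, and treating any 2xx status as healthy is an assumption.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Builds a minimal HTTP/1.1 GET request for the tags endpoint.
pub fn build_tags_probe(host: &str) -> String {
    format!(
        "GET /api/tags HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        host
    )
}

/// Extracts the status code from a raw response such as "HTTP/1.1 200 OK".
pub fn parse_status(response: &str) -> Option<u16> {
    let status_line = response.lines().next()?;
    status_line.split_whitespace().nth(1)?.parse().ok()
}

/// Connects to `host` (e.g. "127.0.0.1:11434") and reports whether the
/// runner answered the GET probe with a 2xx status.
pub fn probe_tags(host: &str) -> io::Result<bool> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(build_tags_probe(host).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(matches!(parse_status(&response), Some(200..=299)))
}
```

Splitting request construction and status parsing from the socket code keeps both pieces testable without a running server.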

Changes

Interactive Inference and Gating

| Layer | File(s) | Summary |
|---|---|---|
| Interactive inference API and conditional gating | src/openhuman/inference/local/service/public_infer.rs | Adds LocalAiService::prompt_interactive and chat_with_history_interactive, refactors chat_with_history_internal(..., gated: bool), and makes scheduler gate acquisition conditional on gated. |
| Callsites updated to use interactive methods | src/openhuman/inference/local/ops.rs | local_ai_prompt and local_ai_chat now call prompt_interactive(...) and chat_with_history_interactive(...) respectively, preserving parameters. |
| Test coverage for non-blocking interactive methods | src/openhuman/inference/local/service/public_infer_tests.rs | Adds two Tokio tests that hold an LLM scheduler permit, mock Ollama endpoints, and assert interactive calls return mocked responses within a 2s timeout. |
| MVP chat model allowlist expansion | src/openhuman/inference/model_ids.rs | Adds gemma4:e4b-it-q8_0 to MVP_ALLOWED_CHAT_MODELS, adds a unit test validating it, and updates a comment in the rejection test. |
| Ollama admin probe & test | src/openhuman/inference/local/service/ollama_admin.rs, src/openhuman/inference/local/service/ollama_admin_tests.rs | Changes the /api/tags runner probe from POST to GET and adds a test that verifies ensure_ollama_server succeeds with a mock external runner. |
| Environment example update | .env.example | Documents optional OPENHUMAN_OLLAMA_BASE_URL with default guidance for non-default Ollama deployments. |
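
An optional base-URL override like `OPENHUMAN_OLLAMA_BASE_URL` is typically resolved with a fallback. The following is a sketch under assumptions: `resolve_base_url` and `ollama_base_url` are hypothetical names, and using Ollama's stock address `http://localhost:11434` as the fallback is an assumption about this project rather than something the PR states.

```rust
use std::env;

/// Ollama's stock listen address; using it as the fallback is an assumption.
const DEFAULT_OLLAMA_BASE_URL: &str = "http://localhost:11434";

/// Picks the override when it is present and non-blank, otherwise the default.
/// A trailing slash is trimmed so endpoint paths can be appended uniformly.
pub fn resolve_base_url(override_value: Option<&str>) -> String {
    let chosen = override_value
        .map(str::trim)
        .filter(|value| !value.is_empty())
        .unwrap_or(DEFAULT_OLLAMA_BASE_URL);
    chosen.trim_end_matches('/').to_string()
}

/// Reads the override from the process environment.
pub fn ollama_base_url() -> String {
    resolve_base_url(env::var("OPENHUMAN_OLLAMA_BASE_URL").ok().as_deref())
}
```

Keeping the pure `resolve_base_url` separate from the environment read avoids mutating process-wide state in tests.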

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

rust-core

Suggested reviewers

  • graycyrus

Poem

🐰 I nudged the gate and hopped ahead,
Interactive prompts now leap instead.
Gemma4 joined the merry throng,
Mock servers answered, brave and strong.
Queue released — I bound along.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the main objective: fixing signed-out Ollama prompt/chat with Gemma 4 by introducing interactive methods that bypass scheduler gating and adding Gemma 4 to the allowlist. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


@coderabbitai coderabbitai Bot added the feature ("Net-new user-facing capability or product behavior.") and bug labels May 24, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR adjusts the local inference (Ollama/LM Studio) routing so user-invoked prompt/chat JSON-RPC calls don’t hang in signed-out core-only mode, and expands the Ollama chat-model allowlist to include the requested Gemma 4 variant.

Changes:

  • Allow gemma4:e4b-it-q8_0 in the local Ollama chat-model allowlist (with a regression unit test).
  • Add interactive local inference entrypoints for prompt + chat history that bypass scheduler_gate::wait_for_capacity().
  • Route JSON-RPC local_ai_prompt / local_ai_chat through the interactive entrypoints and add signed-out regression tests to enforce non-blocking behavior.
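
An allowlist check of the kind described here can be sketched as below. Only the `MVP_ALLOWED_CHAT_MODELS` name and the `gemma4:e4b-it-q8_0` tag come from this PR; the other list entry is a placeholder, and `validate_chat_model` is a hypothetical function, not the code in `model_ids.rs`.

```rust
/// Illustrative allowlist. Only the Gemma 4 tag is taken from the PR;
/// "example-model:latest" is a placeholder, not a real entry.
pub const MVP_ALLOWED_CHAT_MODELS: &[&str] = &["example-model:latest", "gemma4:e4b-it-q8_0"];

/// Returns the trimmed model ID when it is allowlisted, otherwise an error
/// message naming the rejected ID.
pub fn validate_chat_model(id: &str) -> Result<&str, String> {
    let trimmed = id.trim();
    if MVP_ALLOWED_CHAT_MODELS.contains(&trimmed) {
        Ok(trimmed)
    } else {
        Err(format!("chat model `{}` is not in the local allowlist", trimmed))
    }
}
```

An exact-match list keeps the accepted set explicit: a different quantization of the same model family stays rejected until it is added deliberately.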

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/openhuman/inference/model_ids.rs | Expands the local Ollama chat-model allowlist and adds a test for the new Gemma 4 ID. |
| src/openhuman/inference/local/service/public_infer.rs | Introduces interactive prompt/chat entrypoints and makes chat gating conditional via an internal helper. |
| src/openhuman/inference/local/service/public_infer_tests.rs | Adds signed-out regression tests asserting prompt/chat interactive paths don't block. |
| src/openhuman/inference/local/ops.rs | Switches JSON-RPC prompt/chat operations to call the new interactive service methods. |
Comments suppressed due to low confidence (1)

src/openhuman/inference/local/service/public_infer_tests.rs:416

  • Same concern as above: calling scheduler_gate::init_global here initialises global state for the entire unit-test binary, which can cause order-dependent behavior / skipped scheduler_gate tests. Prefer isolating this via an integration test binary or a test-only reset/isolated init helper.
    let config = enabled_config();
    crate::openhuman::scheduler_gate::init_global(&config);
    let _signed_out = crate::openhuman::scheduler_gate::SignedOutTestGuard::set(true);
    let service = ready_service(&config);
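
One common way to limit the blast radius of test-only global state is a guard that restores the previous value on drop. The sketch below shows that pattern in isolation; `SIGNED_OUT`, `is_signed_out`, and `SignedOutGuard` are hypothetical and are not the repository's `SignedOutTestGuard`, whose implementation is not shown in this PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical process-wide flag standing in for the signed-out state.
static SIGNED_OUT: AtomicBool = AtomicBool::new(false);

pub fn is_signed_out() -> bool {
    SIGNED_OUT.load(Ordering::SeqCst)
}

/// Sets the flag for the guard's lifetime and restores the prior value on drop.
pub struct SignedOutGuard {
    previous: bool,
}

impl SignedOutGuard {
    pub fn set(value: bool) -> Self {
        // `swap` stores the new value and hands back the old one atomically.
        let previous = SIGNED_OUT.swap(value, Ordering::SeqCst);
        Self { previous }
    }
}

impl Drop for SignedOutGuard {
    fn drop(&mut self) {
        SIGNED_OUT.store(self.previous, Ordering::SeqCst);
    }
}
```

A guard like this undoes the change at scope exit but does not serialize tests that run in parallel, so it addresses only part of the order-dependence concern; a one-time global init still needs a separate test binary or a reset hook.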


Comment thread src/openhuman/inference/local/service/public_infer.rs
Comment thread src/openhuman/inference/local/service/public_infer_tests.rs Outdated
@coderabbitai coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/local/service/public_infer.rs`:
- Around line 66-83: Add diagnostic debug logging to prompt_interactive (and
similarly to inference_interactive call sites) to record function entry/exit and
branch decisions so gating and the new interactive path are triageable: log at
entry with the config/user context, log the disabled gate branch when
!config.local_ai.runtime_enabled (including a short reason), log which system
prompt branch was chosen (no_think vs default), and log the call to and return
from inference_interactive with outcome (Ok/Err) so callers can correlate
behavior; update the same pattern for the related interactive/gating code around
inference_interactive to cover both entry, branches, and exit.
- Around line 220-228: Module-level documentation still states that autocomplete
is the only ungated path; update the docs to reflect the new interactive
prompt/chat entry points. Edit the module doc comment in the same file (update
the top-of-file //! or /*! */ doc block) to mention the new interactive
functions such as chat_with_history_interactive and any related interactive
prompt APIs (e.g., chat_with_history_internal entry points), clarify that
interactive prompt/chat is also an ungated path, and remove or rephrase the
stale sentence about autocomplete being the only ungated path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0777053c-6b84-4edf-b996-3bb4d77d6f01

📥 Commits

Reviewing files that changed from the base of the PR and between e9ca97c and 1ed33aa.

📒 Files selected for processing (4)
  • src/openhuman/inference/local/ops.rs
  • src/openhuman/inference/local/service/public_infer.rs
  • src/openhuman/inference/local/service/public_infer_tests.rs
  • src/openhuman/inference/model_ids.rs

Comment thread src/openhuman/inference/local/service/public_infer.rs
Comment thread src/openhuman/inference/local/service/public_infer.rs
@coderabbitai coderabbitai Bot added the working label ("A PR that is being worked on by the team.") May 24, 2026
@senamakel (Member) commented

great pr! merging shortly

@senamakel senamakel self-assigned this May 25, 2026
senamakel added 2 commits May 24, 2026 21:58
# Conflicts:
#	src/openhuman/inference/local/ops.rs
#	src/openhuman/inference/local/service/public_infer.rs
…ctive paths

- Add trace logs to prompt_interactive and chat_with_history_interactive
  indicating scheduler gate bypass (addresses @Copilot and @coderabbitai
  on public_infer.rs:80 and public_infer.rs:220-228)
- Update inline_complete_interactive doc comment to reflect that
  prompt_interactive and chat_with_history_interactive are also ungated
@coderabbitai coderabbitai Bot added the rust-core label ("Core Rust runtime in src/: CLI, core_server, shared infrastructure.") May 25, 2026
@senamakel senamakel merged commit cc67177 into tinyhumansai:main May 25, 2026
35 of 42 checks passed
@JZKK720 JZKK720 deleted the ollama-fix branch May 25, 2026 06:31

Labels

  • bug
  • feature: Net-new user-facing capability or product behavior.
  • rust-core: Core Rust runtime in src/: CLI, core_server, shared infrastructure.
  • working: A PR that is being worked on by the team.

3 participants