Fix signed-out Ollama prompt/chat with Gemma 4 by JZKK720 · Pull Request #2563 · tinyhumansai/openhuman

JZKK720 · 2026-05-24T09:20:37Z

Summary

Allow gemma4:e4b-it-q8_0 as a local Ollama chat model for the validated smoke path.
Route user-facing local prompt/chat RPC calls through interactive inference entrypoints that do not block on scheduler capacity when the core is signed out.
Preserve the existing gated internal inference path for non-interactive flows that should still respect scheduler capacity limits.
Add regression tests covering signed-out prompt/chat execution so the non-blocking behavior stays enforced.

Problem

In signed-out core-only usage, openhuman.inference_prompt and openhuman.inference_chat could hang indefinitely against Ollama because the RPC path waited on scheduler capacity that is unavailable in signed_out mode.
The requested Ollama model gemma4:e4b-it-q8_0 was not accepted by the current chat-model allowlist, which blocked the target smoke configuration.

Solution

Add interactive local inference service methods that bypass wait_for_capacity() for direct user-invoked prompt/chat calls.
Switch the JSON-RPC local inference ops to those interactive methods while keeping the existing gated path available for internal capacity-managed flows.
Extend the local Ollama chat-model allowlist to include gemma4:e4b-it-q8_0.
Add signed-out regression tests for both prompt and chat execution.
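The split between the gated and interactive entry points can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `SchedulerGate`, `LocalService`, the `timeout` parameter, and the echo reply are all hypothetical stand-ins. The real `wait_for_capacity()` is described as waiting indefinitely; the sketch adds a timeout only so the blocked case can be observed.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for the scheduler gate: a small counting semaphore.
pub struct SchedulerGate {
    permits: Mutex<usize>,
    freed: Condvar,
}

impl SchedulerGate {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), freed: Condvar::new() }
    }

    /// Waits until a permit is free or `timeout` elapses.
    /// Returns true if a permit was taken. (Assumption: the real gate has no timeout.)
    pub fn wait_for_capacity(&self, timeout: Duration) -> bool {
        let guard = self.permits.lock().unwrap();
        let (mut permits, _timed_out) = self
            .freed
            .wait_timeout_while(guard, timeout, |p| *p == 0)
            .unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// Returns one permit to the gate and wakes a waiter.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// Hypothetical service with one shared implementation behind two entry points.
pub struct LocalService<'a> {
    gate: &'a SchedulerGate,
}

impl<'a> LocalService<'a> {
    pub fn new(gate: &'a SchedulerGate) -> Self {
        Self { gate }
    }

    /// Gated entry point for internal, capacity-managed flows.
    pub fn prompt(&self, text: &str, timeout: Duration) -> Result<String, String> {
        self.prompt_internal(text, true, timeout)
    }

    /// Interactive entry point: never touches the scheduler gate.
    pub fn prompt_interactive(&self, text: &str) -> Result<String, String> {
        self.prompt_internal(text, false, Duration::ZERO)
    }

    fn prompt_internal(&self, text: &str, gated: bool, timeout: Duration) -> Result<String, String> {
        if gated && !self.gate.wait_for_capacity(timeout) {
            return Err("no scheduler capacity".to_string());
        }
        // Stand-in for the provider call (Ollama in the PR).
        let reply = format!("echo: {text}");
        if gated {
            self.gate.release();
        }
        Ok(reply)
    }
}
```

With a gate that has zero permits, `prompt_interactive` still returns while `prompt` gives up after its timeout, which mirrors the signed-out situation described under Problem.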

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — CI enforces the diff-cover gate for this PR; local merged coverage was not run because frontend dependencies are not installed in this checkout.
Coverage matrix updated — N/A: no feature rows were added, removed, or renamed; existing rows still apply.
All affected feature IDs from the matrix are listed in the PR description under ## Related
No new external network dependencies introduced (mock backend used per Testing Strategy)
Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: this change is limited to the local inference RPC path and targeted regression coverage.
Linked issue closed via Closes #NNN in the ## Related section — N/A: no linked issue was provided.

Impact

Affects the Rust core local-inference runtime.
Signed-out/core-only Ollama prompt and chat requests no longer hang on scheduler gating.
gemma4:e4b-it-q8_0 can be selected directly for the validated Ollama smoke configuration.
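As a rough illustration of what an allowlist check of this kind might look like, here is a small sketch. Only the constant name `MVP_ALLOWED_CHAT_MODELS` and the ID `gemma4:e4b-it-q8_0` come from this PR; the other entry, the `validate_chat_model` helper, and the exact-match rule are assumptions made for the example.

```rust
/// Illustrative allowlist. Only the last entry is taken from this PR;
/// "example-model:latest" is a hypothetical placeholder.
pub const MVP_ALLOWED_CHAT_MODELS: &[&str] = &["example-model:latest", "gemma4:e4b-it-q8_0"];

/// Hypothetical validator: accepts an ID only if it matches an allowlist
/// entry exactly after trimming surrounding whitespace.
pub fn validate_chat_model(id: &str) -> Result<&str, String> {
    let trimmed = id.trim();
    if MVP_ALLOWED_CHAT_MODELS.contains(&trimmed) {
        Ok(trimmed)
    } else {
        Err(format!("chat model `{trimmed}` is not in the allowlist"))
    }
}
```

Under these assumptions, `validate_chat_model("gemma4:e4b-it-q8_0")` succeeds, while a different tag of the same model family would still be rejected until it is added explicitly.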

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: ollama-fix
Commit SHA: 1ed33aa54d3bf77677e015afafec7528d3a9d155

Validation Run

N/A: pnpm --filter openhuman-app format:check — Rust-only PR; CI typecheck passes independently
N/A: pnpm typecheck — Rust-only PR; CI typecheck passes independently
Focused tests: live JSON-RPC smoke against a patched core on http://127.0.0.1:7182 returned ollama smoke ok for both openhuman.inference_prompt and openhuman.inference_chat; openhuman.inference_status reported provider ollama, state ready, and model/chat model gemma4:e4b-it-q8_0.
Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml --all --check
Tauri fmt/check (if changed): N/A: no Tauri shell files changed.

Validation Blocked

command: pnpm --filter openhuman-app format:check; pnpm typecheck
error: prettier and tsc were not available because node_modules is not installed in this checkout.
impact: frontend formatting/typecheck were not exercised locally; this PR only changes Rust core files.

Behavior Changes

Intended behavior change: signed-out local Ollama prompt/chat requests should execute immediately instead of waiting forever on scheduler capacity, and gemma4:e4b-it-q8_0 should be accepted as the configured chat model.
User-visible effect: local prompt/chat smoke succeeds in core-only signed-out mode with the requested Gemma 4 Ollama model.

Parity Contract

Legacy behavior preserved: internal capacity-managed inference flows still use the gated path.
Guard/fallback/dispatch parity checks: JSON-RPC prompt/chat now dispatch through interactive methods only; the shared internal implementation continues to handle provider dispatch and response shaping.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: this PR
Resolution (closed/superseded/updated): updated

Summary by CodeRabbit

New Features
- Added interactive local inference for single prompts and multi-turn chat that bypasses global scheduler waits.
- Expanded supported local chat models to include a newer model.
- Documented an optional base-URL override for the local model server.
Bug Fixes
- Improved runner connectivity probe to use a more appropriate request method.
Tests
- Added tests verifying interactive inference does not block and that server connectivity checks accept healthy runners.
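The optional base-URL override mentioned above could be resolved along these lines. This is a sketch under stated assumptions: the variable name `OPENHUMAN_OLLAMA_BASE_URL` comes from the PR, but `resolve_base_url`, `base_url_from_env`, the trimming rules, and the fallback to Ollama's usual local address `http://127.0.0.1:11434` are illustrative and may differ from what the project does.

```rust
use std::env;

pub const OLLAMA_BASE_URL_ENV: &str = "OPENHUMAN_OLLAMA_BASE_URL";
/// Assumption: fall back to Ollama's usual local listen address.
pub const DEFAULT_OLLAMA_BASE_URL: &str = "http://127.0.0.1:11434";

/// Hypothetical helper: picks the override if it is non-empty, else the default.
/// A trailing slash is dropped so paths like "/api/tags" can be appended directly.
pub fn resolve_base_url(override_value: Option<&str>) -> String {
    match override_value.map(str::trim) {
        Some(v) if !v.is_empty() => v.trim_end_matches('/').to_string(),
        _ => DEFAULT_OLLAMA_BASE_URL.to_string(),
    }
}

/// Reads the override from the process environment.
pub fn base_url_from_env() -> String {
    resolve_base_url(env::var(OLLAMA_BASE_URL_ENV).ok().as_deref())
}
```

Keeping the pure `resolve_base_url` separate from the environment read makes the fallback logic testable without mutating process-wide state.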

qodo-code-review · 2026-05-24T09:20:41Z

Qodo reviews are paused for this user.

coderabbitai · 2026-05-24T09:21:04Z

📝 Walkthrough

Walkthrough

Adds interactive inference entry points that bypass the global scheduler gate, refactors chat gating to be optional, updates callers to use interactive variants, adds non-blocking tests, expands allowed chat models, and changes the Ollama tags probe to GET.

Changes

Interactive Inference and Gating

| Layer / File(s) | Summary |
|---|---|
| Interactive inference API and conditional gating: `src/openhuman/inference/local/service/public_infer.rs` | Adds `LocalAiService::prompt_interactive` and `chat_with_history_interactive`, refactors `chat_with_history_internal(..., gated: bool)`, and makes scheduler gate acquisition conditional on `gated`. |
| Callsites updated to use interactive methods: `src/openhuman/inference/local/ops.rs` | `local_ai_prompt` and `local_ai_chat` now call `prompt_interactive(...)` and `chat_with_history_interactive(...)` respectively, preserving parameters. |
| Test coverage for non-blocking interactive methods: `src/openhuman/inference/local/service/public_infer_tests.rs` | Adds two Tokio tests that hold an LLM scheduler permit, mock Ollama endpoints, and assert interactive calls return mocked responses within a 2s timeout. |
| MVP chat model allowlist expansion: `src/openhuman/inference/model_ids.rs` | Adds `gemma4:e4b-it-q8_0` to `MVP_ALLOWED_CHAT_MODELS`, adds a unit test validating it, and updates a comment in the rejection test. |
| Ollama admin probe & test: `src/openhuman/inference/local/service/ollama_admin.rs`, `src/openhuman/inference/local/service/ollama_admin_tests.rs` | Changes the `/api/tags` runner probe from POST to GET and adds a test that verifies `ensure_ollama_server` succeeds with a mock external runner. |
| Environment example update: `.env.example` | Documents optional `OPENHUMAN_OLLAMA_BASE_URL` with default guidance for non-default Ollama deployments. |
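For the probe change listed above, a dependency-free sketch of a GET-based readiness check is shown here. Ollama serves its model list at `/api/tags` over GET, which is why a GET probe fits better than POST. Everything else is an assumption for illustration: `tags_probe_request` and `probe_runner` are hypothetical names, and the project presumably uses an HTTP client rather than a raw `TcpStream`.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Builds the raw HTTP/1.1 request used by the probe (GET, no body).
pub fn tags_probe_request(host: &str) -> String {
    format!(
        "GET /api/tags HTTP/1.1\r\nHost: {host}\r\nAccept: application/json\r\nConnection: close\r\n\r\n"
    )
}

/// Hypothetical probe: returns Ok(true) if the runner answers with a 2xx status.
pub fn probe_runner(addr: SocketAddr, timeout: Duration) -> io::Result<bool> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.write_all(tags_probe_request(&addr.to_string()).as_bytes())?;

    // Only the status line is needed, e.g. "HTTP/1.1 200 OK".
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line)?;
    let code = status_line
        .split_whitespace()
        .nth(1)
        .and_then(|c| c.parse::<u16>().ok());
    Ok(matches!(code, Some(200..=299)))
}
```

A mock runner for tests only needs to accept one connection, check that the request line starts with `GET /api/tags`, and reply with a 200 status.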

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

tinyhumansai/openhuman#1975: Related inference/Ollama routing refactor touching the same local inference service layer.

Suggested labels

rust-core

Suggested reviewers

graycyrus

Poem

🐰 I nudged the gate and hopped ahead,
Interactive prompts now leap instead.
Gemma4 joined the merry throng,
Mock servers answered, brave and strong.
Queue released — I bound along.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the main objective: fixing signed-out Ollama prompt/chat with Gemma 4 by introducing interactive methods that bypass scheduler gating and adding Gemma 4 to the allowlist. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |

Copilot

Pull request overview

This PR adjusts the local inference (Ollama/LM Studio) routing so user-invoked prompt/chat JSON-RPC calls don’t hang in signed-out core-only mode, and expands the Ollama chat-model allowlist to include the requested Gemma 4 variant.

Changes:

Allow gemma4:e4b-it-q8_0 in the local Ollama chat-model allowlist (with a regression unit test).
Add interactive local inference entrypoints for prompt + chat history that bypass scheduler_gate::wait_for_capacity().
Route JSON-RPC local_ai_prompt / local_ai_chat through the interactive entrypoints and add signed-out regression tests to enforce non-blocking behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `src/openhuman/inference/model_ids.rs` | Expands the local Ollama chat-model allowlist and adds a test for the new Gemma 4 ID. |
| `src/openhuman/inference/local/service/public_infer.rs` | Introduces interactive prompt/chat entrypoints and makes chat gating conditional via an internal helper. |
| `src/openhuman/inference/local/service/public_infer_tests.rs` | Adds signed-out regression tests asserting prompt/chat interactive paths don't block. |
| `src/openhuman/inference/local/ops.rs` | Switches JSON-RPC prompt/chat operations to call the new interactive service methods. |
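The "does not block" assertion in the regression tests can be approximated with a small timeout harness. The PR's tests are described as Tokio tests with a 2s timeout; the sketch below uses plain threads and a channel instead so it stays dependency-free, and `returns_within` is a hypothetical helper name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `call` on a worker thread and returns its result only if it
/// finishes within `limit`; returns None if the call is still blocked.
pub fn returns_within<T, F>(limit: Duration, call: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(call());
    });
    rx.recv_timeout(limit).ok()
}
```

A test would hold the scheduler permit, wrap the interactive call in `returns_within(Duration::from_secs(2), ...)`, and assert that the result is `Some` with the mocked response. A gated call under the same conditions would be expected to come back as `None`.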

Comments suppressed due to low confidence (1)

src/openhuman/inference/local/service/public_infer_tests.rs:416

Same concern as above: calling scheduler_gate::init_global here initialises global state for the entire unit-test binary, which can cause order-dependent behavior / skipped scheduler_gate tests. Prefer isolating this via an integration test binary or a test-only reset/isolated init helper.

    let config = enabled_config();
    crate::openhuman::scheduler_gate::init_global(&config);
    let _signed_out = crate::openhuman::scheduler_gate::SignedOutTestGuard::set(true);
    let service = ready_service(&config);
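One common way to limit the kind of leakage raised in this comment is a scoped guard that restores the previous value when it goes out of scope. The sketch below is only an illustration of that pattern: `SIGNED_OUT`, `is_signed_out`, and `SignedOutGuard` are hypothetical, and the actual `SignedOutTestGuard` in the repository may work differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical process-wide flag standing in for the signed-out state.
static SIGNED_OUT: AtomicBool = AtomicBool::new(false);

pub fn is_signed_out() -> bool {
    SIGNED_OUT.load(Ordering::SeqCst)
}

/// Scoped override: sets the flag on creation and restores the
/// previous value when dropped, even if the test panics.
pub struct SignedOutGuard {
    previous: bool,
}

impl SignedOutGuard {
    pub fn set(value: bool) -> Self {
        let previous = SIGNED_OUT.swap(value, Ordering::SeqCst);
        Self { previous }
    }
}

impl Drop for SignedOutGuard {
    fn drop(&mut self) {
        SIGNED_OUT.store(self.previous, Ordering::SeqCst);
    }
}
```

A guard like this undoes the override but does not serialize tests that run in parallel within one binary, and it does nothing for one-shot global initialisation such as `init_global`. That is why the comment suggests a separate integration test binary or an isolated init helper.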

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/local/service/public_infer.rs`:
- Around line 66-83: Add diagnostic debug logging to prompt_interactive (and
similarly to inference_interactive call sites) to record function entry/exit and
branch decisions so gating and the new interactive path are triageable: log at
entry with the config/user context, log the disabled gate branch when
!config.local_ai.runtime_enabled (including a short reason), log which system
prompt branch was chosen (no_think vs default), and log the call to and return
from inference_interactive with outcome (Ok/Err) so callers can correlate
behavior; update the same pattern for the related interactive/gating code around
inference_interactive to cover both entry, branches, and exit.
- Around line 220-228: Module-level documentation still states that autocomplete
is the only ungated path; update the docs to reflect the new interactive
prompt/chat entry points. Edit the module doc comment in the same file (update
the top-of-file //! or /*! */ doc block) to mention the new interactive
functions such as chat_with_history_interactive and any related interactive
prompt APIs (e.g., chat_with_history_internal entry points), clarify that
interactive prompt/chat is also an ungated path, and remove or rephrase the
stale sentence about autocomplete being the only ungated path.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0777053c-6b84-4edf-b996-3bb4d77d6f01

📥 Commits

Reviewing files that changed from the base of the PR and between e9ca97c and 1ed33aa.

📒 Files selected for processing (4)

src/openhuman/inference/local/ops.rs
src/openhuman/inference/local/service/public_infer.rs
src/openhuman/inference/local/service/public_infer_tests.rs
src/openhuman/inference/model_ids.rs

senamakel · 2026-05-25T04:55:19Z

great pr! merging shortly

# Conflicts: # src/openhuman/inference/local/ops.rs # src/openhuman/inference/local/service/public_infer.rs

…ctive paths
- Add trace logs to prompt_interactive and chat_with_history_interactive indicating scheduler gate bypass (addresses @Copilot and @coderabbitai on public_infer.rs:80 and public_infer.rs:220-228)
- Update inline_complete_interactive doc comment to reflect that prompt_interactive and chat_with_history_interactive are also ungated

Developer added 2 commits May 24, 2026 17:18

Fix signed-out Ollama prompt and chat

3cd559f

Format Ollama inference regression test

1ed33aa

JZKK720 requested review from a team and Copilot May 24, 2026 09:20

Copilot started reviewing on behalf of JZKK720 May 24, 2026 09:21

coderabbitai Bot added the feature and bug labels May 24, 2026

Copilot AI reviewed May 24, 2026

Comment thread src/openhuman/inference/local/service/public_infer.rs

Comment thread src/openhuman/inference/local/service/public_infer_tests.rs Outdated

coderabbitai Bot requested changes May 24, 2026

Comment thread src/openhuman/inference/local/service/public_infer.rs

Comment thread src/openhuman/inference/local/service/public_infer.rs

Isolate interactive inference permit tests

946d4d5

coderabbitai Bot added the working label May 24, 2026

Probe Ollama readiness with GET /api/tags

d592e92

senamakel self-assigned this May 25, 2026

senamakel added 2 commits May 24, 2026 21:58

Merge branch 'main' into pr/2563

12084f8

coderabbitai Bot added the rust-core label May 25, 2026

coderabbitai Bot approved these changes May 25, 2026

senamakel merged commit cc67177 into tinyhumansai:main May 25, 2026
35 of 42 checks passed

JZKK720 deleted the ollama-fix branch May 25, 2026 06:31

senamakel merged 6 commits into tinyhumansai:main from JZKK720:ollama-fix
