Fix signed-out Ollama prompt/chat with Gemma 4 #2563
📝 Walkthrough
Adds interactive inference entry points that bypass the global scheduler gate, refactors chat gating to be optional, updates callers to use interactive variants, adds non-blocking tests, expands allowed chat models, and changes the Ollama tags probe to GET.
Changes
Interactive Inference and Gating
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Pull request overview
This PR adjusts the local inference (Ollama/LM Studio) routing so user-invoked prompt/chat JSON-RPC calls don’t hang in signed-out core-only mode, and expands the Ollama chat-model allowlist to include the requested Gemma 4 variant.
Changes:
- Allow `gemma4:e4b-it-q8_0` in the local Ollama chat-model allowlist (with a regression unit test).
- Add interactive local inference entrypoints for prompt + chat history that bypass `scheduler_gate::wait_for_capacity()`.
- Route JSON-RPC `local_ai_prompt` / `local_ai_chat` through the interactive entrypoints and add signed-out regression tests to enforce non-blocking behavior.
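The second bullet can be illustrated with a minimal sketch of a shared helper that makes gating optional. Everything here is an assumption for illustration: `Gate`, `Gating`, and `chat_internal` are hypothetical stand-ins, the function signatures are simplified, and the stand-in `wait_for_capacity` only counts calls instead of blocking the way the real scheduler gate would.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the scheduler gate. The real
/// `scheduler_gate::wait_for_capacity()` would block until capacity is
/// available; this one only counts calls so the sketch never hangs.
pub struct Gate {
    waits: AtomicUsize,
}

impl Gate {
    pub fn new() -> Self {
        Gate { waits: AtomicUsize::new(0) }
    }

    pub fn wait_for_capacity(&self) {
        self.waits.fetch_add(1, Ordering::SeqCst);
    }

    pub fn wait_count(&self) -> usize {
        self.waits.load(Ordering::SeqCst)
    }
}

/// Whether a call should queue behind the gate or go straight through.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Gating {
    Scheduled,
    Interactive,
}

/// Shared helper: both public entry points funnel through here, and only
/// the scheduled variant touches the gate.
fn chat_internal(gate: &Gate, gating: Gating, prompt: &str) -> String {
    if gating == Gating::Scheduled {
        gate.wait_for_capacity();
    }
    // Placeholder for the actual provider call.
    format!("reply to: {prompt}")
}

/// Gated path, intended for background work.
pub fn chat_with_history(gate: &Gate, prompt: &str) -> String {
    chat_internal(gate, Gating::Scheduled, prompt)
}

/// Ungated path, intended for direct user-invoked calls.
pub fn chat_with_history_interactive(gate: &Gate, prompt: &str) -> String {
    chat_internal(gate, Gating::Interactive, prompt)
}
```

Keeping a single internal helper means the two variants cannot drift apart in behavior; the only difference is whether the gate is consulted.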
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/openhuman/inference/model_ids.rs` | Expands the local Ollama chat-model allowlist and adds a test for the new Gemma 4 ID. |
| `src/openhuman/inference/local/service/public_infer.rs` | Introduces interactive prompt/chat entrypoints and makes chat gating conditional via an internal helper. |
| `src/openhuman/inference/local/service/public_infer_tests.rs` | Adds signed-out regression tests asserting prompt/chat interactive paths don't block. |
| `src/openhuman/inference/local/ops.rs` | Switches JSON-RPC prompt/chat operations to call the new interactive service methods. |
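For readers unfamiliar with asserting "doesn't block" in a test, one common shape is to run the call on a worker thread and wait on a channel with a timeout. The sketch below assumes nothing about the actual tests in `public_infer_tests.rs`: `returns_within`, `interactive_prompt`, and `gated_prompt_signed_out` are hypothetical names, and the gated path is simulated with a long sleep rather than a real scheduler gate.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `call` on a worker thread and returns its result only if it
/// finishes within `limit`. `None` means the call had not returned yet.
pub fn returns_within<T, F>(limit: Duration, call: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone, so a failed send is ignored.
        let _ = tx.send(call());
    });
    rx.recv_timeout(limit).ok()
}

/// Hypothetical interactive call: answers immediately, no gate involved.
pub fn interactive_prompt(prompt: &str) -> String {
    format!("ok: {prompt}")
}

/// Hypothetical gated call in signed-out mode. The wait for capacity is
/// simulated with a sleep that far exceeds any sensible test timeout.
pub fn gated_prompt_signed_out(prompt: &str) -> String {
    thread::sleep(Duration::from_secs(10));
    format!("late: {prompt}")
}
```

A regression test would then assert that `returns_within(limit, ...)` yields `Some(..)` for the interactive path. The worker thread is detached, so a call that stays blocked does not stall the test itself.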
Comments suppressed due to low confidence (1)
src/openhuman/inference/local/service/public_infer_tests.rs:416
- Same concern as above: calling `scheduler_gate::init_global` here initialises global state for the entire unit-test binary, which can cause order-dependent behavior / skipped `scheduler_gate` tests. Prefer isolating this via an integration test binary or a test-only reset/isolated init helper.
let config = enabled_config();
crate::openhuman::scheduler_gate::init_global(&config);
let _signed_out = crate::openhuman::scheduler_gate::SignedOutTestGuard::set(true);
let service = ready_service(&config);
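One way to keep such test-only state from leaking between tests is a scoped guard that restores the previous value on drop. The sketch below is an assumption about the general technique, not a description of how `SignedOutTestGuard` or `init_global` are implemented: `ScopedSignedOut` and `signed_out_override` are hypothetical names, and the override is stored per thread.

```rust
use std::cell::Cell;

thread_local! {
    /// Per-thread override, so tests running on other threads are unaffected.
    static SIGNED_OUT_OVERRIDE: Cell<Option<bool>> = Cell::new(None);
}

/// Reads the override for the current thread; `None` means no test has
/// set one and the real state should be used.
pub fn signed_out_override() -> Option<bool> {
    SIGNED_OUT_OVERRIDE.with(|slot| slot.get())
}

/// Hypothetical guard: sets the override on creation and restores the
/// previous value when dropped, including on panic unwinding.
pub struct ScopedSignedOut {
    previous: Option<bool>,
}

impl ScopedSignedOut {
    pub fn set(value: bool) -> Self {
        let previous = SIGNED_OUT_OVERRIDE.with(|slot| slot.replace(Some(value)));
        ScopedSignedOut { previous }
    }
}

impl Drop for ScopedSignedOut {
    fn drop(&mut self) {
        SIGNED_OUT_OVERRIDE.with(|slot| slot.set(self.previous));
    }
}
```

A thread-local override does not propagate to threads the code under test spawns, so this shape only fits when the checked state is read on the test's own thread.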
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/local/service/public_infer.rs`:
- Around line 66-83: Add diagnostic debug logging to prompt_interactive (and
similarly to inference_interactive call sites) to record function entry/exit and
branch decisions so gating and the new interactive path are triageable: log at
entry with the config/user context, log the disabled gate branch when
!config.local_ai.runtime_enabled (including a short reason), log which system
prompt branch was chosen (no_think vs default), and log the call to and return
from inference_interactive with outcome (Ok/Err) so callers can correlate
behavior; update the same pattern for the related interactive/gating code around
inference_interactive to cover both entry, branches, and exit.
- Around line 220-228: Module-level documentation still states that autocomplete
is the only ungated path; update the docs to reflect the new interactive
prompt/chat entry points. Edit the module doc comment in the same file (update
the top-of-file //! or /*! */ doc block) to mention the new interactive
functions such as chat_with_history_interactive and any related interactive
prompt APIs (e.g., chat_with_history_internal entry points), clarify that
interactive prompt/chat is also an ungated path, and remove or rephrase the
stale sentence about autocomplete being the only ungated path.
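The logging request in the first finding could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Config` struct, the `no_think` parameter, and both function signatures are hypothetical, and a `Vec<String>` stands in for whatever logging macro the crate actually uses.

```rust
/// Hypothetical minimal config; only the flag named in the review is modelled.
pub struct Config {
    pub runtime_enabled: bool,
}

/// Stand-in for the provider call; always succeeds in this sketch.
fn inference_interactive(system: &str, prompt: &str) -> Result<String, String> {
    Ok(format!("[{system}] {prompt}"))
}

/// Logs entry, each branch decision, and exit. `log` is a stand-in sink.
pub fn prompt_interactive(
    config: &Config,
    prompt: &str,
    no_think: bool,
    log: &mut Vec<String>,
) -> Result<String, String> {
    // Log the prompt length rather than its content.
    log.push(format!("prompt_interactive: enter (prompt_len={})", prompt.len()));

    if !config.runtime_enabled {
        log.push("prompt_interactive: disabled (runtime_enabled=false)".to_string());
        return Err("local AI runtime is disabled".to_string());
    }

    let system = if no_think { "no_think" } else { "default" };
    log.push(format!("prompt_interactive: system prompt branch={system}"));

    log.push("prompt_interactive: calling inference_interactive (scheduler gate bypassed)".to_string());
    let result = inference_interactive(system, prompt);
    log.push(format!(
        "prompt_interactive: exit ({})",
        if result.is_ok() { "Ok" } else { "Err" }
    ));
    result
}
```

With entry, branch, and exit lines in place, a hang or an early return can be told apart from the log alone.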
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0777053c-6b84-4edf-b996-3bb4d77d6f01
📒 Files selected for processing (4)
- `src/openhuman/inference/local/ops.rs`
- `src/openhuman/inference/local/service/public_infer.rs`
- `src/openhuman/inference/local/service/public_infer_tests.rs`
- `src/openhuman/inference/model_ids.rs`
great pr! merging shortly
# Conflicts:
#   src/openhuman/inference/local/ops.rs
#   src/openhuman/inference/local/service/public_infer.rs
…ctive paths
- Add trace logs to `prompt_interactive` and `chat_with_history_interactive` indicating scheduler gate bypass (addresses @Copilot and @coderabbitai on `public_infer.rs:80` and `public_infer.rs:220-228`)
- Update `inline_complete_interactive` doc comment to reflect that `prompt_interactive` and `chat_with_history_interactive` are also ungated
Summary
- `gemma4:e4b-it-q8_0` as a local Ollama chat model for the validated smoke path.
Problem
- `openhuman.inference_prompt` and `openhuman.inference_chat` could hang indefinitely against Ollama because the RPC path waited on scheduler capacity that is unavailable in `signed_out` mode.
- `gemma4:e4b-it-q8_0` was not accepted by the current chat-model allowlist, which blocked the target smoke configuration.
Solution
- `wait_for_capacity()` for direct user-invoked prompt/chat calls.
- `gemma4:e4b-it-q8_0`.
Submission Checklist
- `## Related` `docs/RELEASE-MANUAL-SMOKE.md`): N/A, this change is limited to the local inference RPC path and targeted regression coverage.
- `Closes #NNN` in the `## Related` section: N/A, no linked issue was provided.
Impact
- `gemma4:e4b-it-q8_0` can be selected directly for the validated Ollama smoke configuration.
Related
- `3.1.3`, `3.2.1`, `3.2.4`
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
- `ollama-fix`
- `1ed33aa54d3bf77677e015afafec7528d3a9d155`
Validation Run
- `pnpm --filter openhuman-app format:check`: Rust-only PR; CI typecheck passes independently
- `pnpm typecheck`: Rust-only PR; CI typecheck passes independently
- `http://127.0.0.1:7182` returned `ollama smoke ok` for both `openhuman.inference_prompt` and `openhuman.inference_chat`; `openhuman.inference_status` reported provider `ollama`, state `ready`, and model/chat model `gemma4:e4b-it-q8_0`.
- `cargo fmt --manifest-path Cargo.toml --all --check`
Validation Blocked
- `command:` `pnpm --filter openhuman-app format:check`; `pnpm typecheck`
- `error:` `prettier` and `tsc` were not available because `node_modules` is not installed in this checkout.
- `impact:` frontend formatting/typecheck were not exercised locally; this PR only changes Rust core files.
Behavior Changes
- `gemma4:e4b-it-q8_0` should be accepted as the configured chat model.
Parity Contract
Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Tests