Feat: smart routing#569
Conversation
… functionality - Bumped OpenHuman version from 0.52.7 to 0.52.9 in Cargo.lock files. - Introduced a new routing module that implements intelligent model routing based on task complexity and local model health. - Added a health checker for the local Ollama model server to improve routing decisions. - Enhanced the provider to classify tasks and determine the appropriate backend (local or remote) for processing requests. - Updated related files to support the new routing logic and ensure seamless integration with existing functionalities.
- Removed the `build_tool_instructions` function from the public API, simplifying the routing module. - Updated the `IntelligentRoutingProvider` to utilize a more efficient model resolution process, improving routing decisions based on task complexity and local model health. - Introduced a new `quality` module to assess response quality, enabling better fallback decisions when local responses are deemed low quality. - Enhanced the `RoutingHints` struct to provide more granular control over routing behavior, including privacy requirements and cost sensitivity. - Added tests to validate the new routing logic and quality assessment, ensuring robust functionality across various scenarios.
…ration - Introduced a new test file `live_routing_e2e.rs` containing end-to-end tests for routing against a live backend. - Tests require a valid backend URL, user session JWT, and real network interactions, hence marked as `#[ignore]`. - Implemented functionality to set up environment variables, write configuration files, and perform JSON-RPC calls to validate routing behavior. - Added assertions to ensure correct handling of various routing cases, enhancing test coverage for the routing module.
📝 WalkthroughWalkthroughAdds a new intelligent routing subsystem (routing factory, policy, provider, health, quality, telemetry), wires provider construction to use the routing factory, exposes Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant Routing as IntelligentRoutingProvider
participant Policy as PolicyEngine
participant Health as LocalHealthChecker
participant Local as LocalProvider
participant Remote as RemoteProvider
participant Quality as QualityChecker
participant Telemetry as TelemetryEmitter
Client->>Routing: chat(request with model_hint)
Routing->>Policy: classify(model_hint)
Policy-->>Routing: TaskCategory
Routing->>Health: is_healthy()
Health-->>Routing: bool
Routing->>Policy: decide(category, local_available, hints)
Policy-->>Routing: primary + optional_fallback
alt primary == Local
Routing->>Local: execute_request
Local-->>Routing: response / error
Routing->>Quality: is_low_quality(response)
Quality-->>Routing: bool
alt response OK and quality good
Routing->>Telemetry: emit(local_success_record)
else error or low-quality and fallback exists
Routing->>Remote: execute_request
Remote-->>Routing: response
Routing->>Telemetry: emit(fallback_record)
end
else primary == Remote
Routing->>Remote: execute_request
Remote-->>Routing: response
Routing->>Telemetry: emit(remote_record)
end
Routing-->>Client: response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (6)
tests/json_rpc_e2e.rs (1)
808-815: Clarifyhint:reactionexpected behavior.The test expects
hint:reactionto pass through verbatim to the backend (line 814), but the PR description states lightweight tasks should route locally when Ollama is healthy. Since the test config doesn't setlocal_ai.enabled = true, this behavior is consistent with the remote-only path—but the comment on line 813 could be clearer:- // Web chat currently forwards lightweight hint overrides as-is. + // With local_ai disabled, lightweight hints pass through to remote as-is. ("hint:reaction", "hint:reaction"),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/json_rpc_e2e.rs` around lines 808 - 815, Update the comment above the routing_cases array to clearly state why the ("hint:reaction", "hint:reaction") entry is expected: mention that web chat forwards lightweight hint overrides verbatim and that in this test the local AI path is disabled (local_ai.enabled not set/false), so the remote-only routing applies; reference the routing_cases array and the "hint:reaction" entry to make the justification explicit.src/openhuman/routing/mod.rs (1)
62-85: Movenew_provider()to a sibling file per module guidelines.The
new_providerfunction contains operational/factory logic (constructing providers, wiring dependencies). Per coding guidelines,mod.rsshould remain light and export-focused, with operational code in sibling files likefactory.rsorops.rs.As per coding guidelines: "Keep domain mod.rs files light and export-focused; put operational code in sibling files (ops.rs, store.rs, schedule.rs, types.rs), then re-export the public API from mod.rs".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/routing/mod.rs` around lines 62 - 85, Move the operational factory function new_provider out of mod.rs into a sibling file (e.g., factory.rs or ops.rs) and keep mod.rs export-focused: create a new file containing the new_provider implementation that constructs OpenAiCompatibleProvider, LocalHealthChecker (using OLLAMA_BASE_URL), and calls IntelligentRoutingProvider::new with local_ai_config.chat_model_id and remote_fallback_model; then update mod.rs to pub use the new_provider symbol (re-export) so callers still import new_provider from the module while the factory wiring lives in the sibling file.src/openhuman/routing/health.rs (2)
88-105: Consider reusingreqwest::Clientinstead of building per-probe.Creating a new
Clienton everyprobe()call is suboptimal —reqwest::Clientis designed to be reused and internally manages connection pools. Building it per-call incurs allocation overhead and prevents connection reuse for repeated probes.♻️ Proposed refactor: store Client in the struct
pub struct LocalHealthChecker { probe_url: String, cache: Mutex<Option<HealthCache>>, ttl: Duration, + client: reqwest::Client, } impl LocalHealthChecker { pub fn with_ttl(base_url: &str, ttl: Duration) -> Self { + let client = reqwest::Client::builder() + .timeout(PROBE_TIMEOUT) + .build() + .unwrap_or_default(); Self { probe_url: format!("{base_url}/api/tags"), cache: Mutex::new(None), ttl, + client, } } async fn probe(&self) -> bool { - let client = reqwest::Client::builder() - .timeout(PROBE_TIMEOUT) - .build() - .unwrap_or_default(); - - match client.get(&self.probe_url).send().await { + match self.client.get(&self.probe_url).send().await { Ok(resp) => resp.status().is_success(), // ... } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/routing/health.rs` around lines 88 - 105, The probe() method currently builds a new reqwest::Client on every call; change the struct that implements probe() (the type containing probe_url and probe()) to own a single reqwest::Client (e.g., add a client: reqwest::Client field) that is constructed once (in the struct's constructor/new or factory) with the PROBE_TIMEOUT and reused for each probe; update probe() to call self.client.get(&self.probe_url).send().await and remove per-call Client::builder() creation, and ensure the initial Client::builder().timeout(PROBE_TIMEOUT).build() error is handled at construction time rather than unwrap_or_default.
56-85: Add trace-level logs for cache hits and probe results.Per coding guidelines, add development-oriented logs at branch decisions and state transitions. The health check result directly impacts routing decisions, so visibility into cache behavior aids debugging.
📝 Proposed logging additions
pub async fn is_healthy(&self) -> bool { // Fast path: return cached result if still fresh. { let guard = self.cache.lock(); if let Some(cached) = guard.as_ref() { if cached.checked_at.elapsed() < cached.ttl { + tracing::trace!( + healthy = (cached.last_result == CachedStatus::Healthy), + "[routing] local health cache hit" + ); return cached.last_result == CachedStatus::Healthy; } } } // Slow path: probe and update cache. let healthy = self.probe().await; + tracing::debug!( + url = %self.probe_url, + healthy, + "[routing] local health probe completed" + ); let status = if healthy {As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/routing/health.rs` around lines 56 - 85, Add trace-level logs in is_healthy to show cache behavior and probe outcomes: log a trace when the fast-path cache is hit (include cached.last_result, cached.checked_at.elapsed() and cached.ttl), log a trace when the cache is stale/miss before calling self.probe(), log the probe result after awaiting self.probe() (include the boolean healthy and the mapped CachedStatus), and log when updating the cache (include new HealthCache fields: last_result, checked_at, ttl). Use the existing is_healthy, probe, cache, HealthCache and CachedStatus symbols and emit trace-level messages so these branch decisions and state transitions are visible in dev logs.src/openhuman/routing/provider.rs (1)
268-306: Complex nested fallback logic — consider consolidation.The local path has three branches: error fallback (272-280), quality fallback (281-300), and success (298-300, 302). The duplication of
self.remote.chat(request, fb, temperature).awaitin both fallback paths and the deep nesting reduce readability.Consider extracting a
should_fallback(&Result<ChatResponse>, privacy: bool) -> boolhelper and a single fallback path to reduce nesting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/routing/provider.rs` around lines 268 - 306, The local routing branch in effective_primary has duplicated nested branches for error- and quality-based fallbacks calling self.remote.chat(...) in multiple places; extract a helper like should_fallback(result: &Result<ChatResponse>, privacy_required: bool, fallback: &Option<RoutingTarget>) -> bool that checks: (1) error and !privacy_required, or (2) Ok(resp) and !privacy_required and quality::is_low_quality(resp.text.as_deref().unwrap_or("")) and fallback.is_some(); then replace the branching in the RoutingTarget::Local arm to call self.local.chat(request, m, temperature).await once, set fallback_occurred when should_fallback returns true, and perform a single unified remote fallback call self.remote.chat(request, fb, temperature).await (using the fb from fallback) if needed; keep references to RoutingTarget::Local, RoutingTarget::Remote, local.chat, remote.chat, hints.privacy_required, and quality::is_low_quality to locate the changes.tests/live_routing_e2e.rs (1)
106-144: SSE stream reader lacks internal read timeout.The
read_sse_event_by_typesfunction relies entirely on the outer 120s timeout. If the backend sends partial data and stalls (keepalive without events), the stream won't close and may hang. For manually-run ignored tests this is acceptable, but adding a per-chunk timeout would improve robustness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/live_routing_e2e.rs` around lines 106 - 144, The SSE reader read_sse_event_by_types should wrap each await of stream.next() with a per-chunk timeout to avoid hanging on partial/stalled keepalives: import tokio::time::timeout and use timeout(Duration::from_secs(N), stream.next()).await at the point where stream.next().await is currently called, handle the Ok(Some(Ok(bytes))) path as before, treat Ok(None) as stream closed, treat Ok(Some(Err(e))) as the existing panic path, and treat Err(_) (timeout) by continuing the loop (or returning a descriptive error) so the function doesn't block indefinitely; add a small configurable constant (e.g., CHUNK_TIMEOUT_SECS) near the top of read_sse_event_by_types to control the per-chunk timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/routing/policy.rs`:
- Around line 149-156: The latency_budget and cost_sensitivity checks in the
use_local decision are dead code because only Lightweight/Medium tasks reach
this branch and matches!(category, TaskCategory::Lightweight |
TaskCategory::Medium) always makes the hint checks unreachable; remove the hint
clauses so use_local is computed from local_available and the category match
only (i.e., simplify the use_local expression), or alternatively if you intended
hints to affect Medium tasks refactor the condition to match only Lightweight in
the category check and apply hint checks for Medium via matches!(category,
TaskCategory::Medium) combined with hints.LatencyBudget/CostSensitivity; update
the variable use_local and associated comment accordingly.
In `@src/openhuman/routing/provider.rs`:
- Around line 158-165: The code slices `text[..text.len().min(80)]` which can
panic on UTF-8 boundaries; replace this with a safe truncation helper and use it
in the tracing::warn call: add a small function like `fn truncate_safe(s: &str,
max_bytes: usize) -> &str { if s.len() <= max_bytes { s } else { let mut end =
max_bytes; while end > 0 && !s.is_char_boundary(end) { end -= 1; } &s[..end] }
}` (or use `char_indices()` to compute the boundary), then change the
`response_preview = &text[..text.len().min(80)]` to `response_preview =
truncate_safe(&text, 80)` where the branch checks `Ok(text) if !privacy_required
&& quality::is_low_quality(text)` and the fallback handling
(RoutingTarget::Remote, fallback_fn, hint) remains unchanged.
In `@tests/live_routing_e2e.rs`:
- Around line 31-46: The EnvVarGuard implementation calls
std::env::set_var/remove_var which are unsafe in Rust 2024; wrap those calls in
unsafe blocks and document the safety rationale: ensure the test-level mutex
(live_e2e_env_lock) serializes access and explain in a comment that the guard
relies on that mutex to avoid UB, or alternatively replace the approach with
process-level isolation if you need stricter safety. Specifically, update
EnvVarGuard::set_to_path to call unsafe { std::env::set_var(...) } and update
Drop::drop to call unsafe { std::env::set_var(...) } / unsafe {
std::env::remove_var(...) }, and add a concise comment referencing EnvVarGuard
and live_e2e_env_lock describing why the unsafe blocks are sound.
---
Nitpick comments:
In `@src/openhuman/routing/health.rs`:
- Around line 88-105: The probe() method currently builds a new reqwest::Client
on every call; change the struct that implements probe() (the type containing
probe_url and probe()) to own a single reqwest::Client (e.g., add a client:
reqwest::Client field) that is constructed once (in the struct's constructor/new
or factory) with the PROBE_TIMEOUT and reused for each probe; update probe() to
call self.client.get(&self.probe_url).send().await and remove per-call
Client::builder() creation, and ensure the initial
Client::builder().timeout(PROBE_TIMEOUT).build() error is handled at
construction time rather than unwrap_or_default.
- Around line 56-85: Add trace-level logs in is_healthy to show cache behavior
and probe outcomes: log a trace when the fast-path cache is hit (include
cached.last_result, cached.checked_at.elapsed() and cached.ttl), log a trace
when the cache is stale/miss before calling self.probe(), log the probe result
after awaiting self.probe() (include the boolean healthy and the mapped
CachedStatus), and log when updating the cache (include new HealthCache fields:
last_result, checked_at, ttl). Use the existing is_healthy, probe, cache,
HealthCache and CachedStatus symbols and emit trace-level messages so these
branch decisions and state transitions are visible in dev logs.
In `@src/openhuman/routing/mod.rs`:
- Around line 62-85: Move the operational factory function new_provider out of
mod.rs into a sibling file (e.g., factory.rs or ops.rs) and keep mod.rs
export-focused: create a new file containing the new_provider implementation
that constructs OpenAiCompatibleProvider, LocalHealthChecker (using
OLLAMA_BASE_URL), and calls IntelligentRoutingProvider::new with
local_ai_config.chat_model_id and remote_fallback_model; then update mod.rs to
pub use the new_provider symbol (re-export) so callers still import new_provider
from the module while the factory wiring lives in the sibling file.
In `@src/openhuman/routing/provider.rs`:
- Around line 268-306: The local routing branch in effective_primary has
duplicated nested branches for error- and quality-based fallbacks calling
self.remote.chat(...) in multiple places; extract a helper like
should_fallback(result: &Result<ChatResponse>, privacy_required: bool, fallback:
&Option<RoutingTarget>) -> bool that checks: (1) error and !privacy_required, or
(2) Ok(resp) and !privacy_required and
quality::is_low_quality(resp.text.as_deref().unwrap_or("")) and
fallback.is_some(); then replace the branching in the RoutingTarget::Local arm
to call self.local.chat(request, m, temperature).await once, set
fallback_occurred when should_fallback returns true, and perform a single
unified remote fallback call self.remote.chat(request, fb, temperature).await
(using the fb from fallback) if needed; keep references to RoutingTarget::Local,
RoutingTarget::Remote, local.chat, remote.chat, hints.privacy_required, and
quality::is_low_quality to locate the changes.
In `@tests/json_rpc_e2e.rs`:
- Around line 808-815: Update the comment above the routing_cases array to
clearly state why the ("hint:reaction", "hint:reaction") entry is expected:
mention that web chat forwards lightweight hint overrides verbatim and that in
this test the local AI path is disabled (local_ai.enabled not set/false), so the
remote-only routing applies; reference the routing_cases array and the
"hint:reaction" entry to make the justification explicit.
In `@tests/live_routing_e2e.rs`:
- Around line 106-144: The SSE reader read_sse_event_by_types should wrap each
await of stream.next() with a per-chunk timeout to avoid hanging on
partial/stalled keepalives: import tokio::time::timeout and use
timeout(Duration::from_secs(N), stream.next()).await at the point where
stream.next().await is currently called, handle the Ok(Some(Ok(bytes))) path as
before, treat Ok(None) as stream closed, treat Ok(Some(Err(e))) as the existing
panic path, and treat Err(_) (timeout) by continuing the loop (or returning a
descriptive error) so the function doesn't block indefinitely; add a small
configurable constant (e.g., CHUNK_TIMEOUT_SECS) near the top of
read_sse_event_by_types to control the per-chunk timeout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a8693a09-4b35-481d-aa6d-a7de211d3bfd
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
src/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/session/builder.rssrc/openhuman/channels/runtime/startup.rssrc/openhuman/mod.rssrc/openhuman/providers/ops.rssrc/openhuman/routing/health.rssrc/openhuman/routing/mod.rssrc/openhuman/routing/policy.rssrc/openhuman/routing/provider.rssrc/openhuman/routing/quality.rssrc/openhuman/routing/telemetry.rstests/json_rpc_e2e.rstests/live_routing_e2e.rs
|
@YellowSnnowmann pls resolve the coderabbit issues and then push |
|
…Checker - Introduced a new `factory.rs` file containing the `new_provider` function to construct an `IntelligentRoutingProvider` that integrates local AI capabilities with remote backend providers. - Enhanced the `LocalHealthChecker` in `health.rs` by adding a `reqwest::Client` for improved health probing, including better logging for cache hits and misses, and streamlined cache updates. - Updated health check logic to utilize the new client, ensuring more reliable health status checks for local AI services.
- Moved the `new_provider` function from `mod.rs` to a new `factory.rs` module to improve code organization and maintainability. - Updated public exports to include the new location of `new_provider`, ensuring continued accessibility for constructing `IntelligentRoutingProvider` instances. - Removed the old implementation from `mod.rs`, streamlining the routing module's structure.
- Removed redundant conditions for routing medium tasks locally, streamlining the decision-making process in the `decide` function. - Updated comments to reflect the simplified logic, enhancing code clarity and maintainability.
…andling - Introduced a timeout mechanism for reading SSE events to prevent indefinite blocking. - Updated environment variable management in tests to ensure safe access and cleanup. - Improved comments for clarity regarding the safety of environment variable mutations during tests.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/openhuman/routing/policy.rs (1)
47-60:⚠️ Potential issue | 🟠 MajorLatency/cost hints are still a no-op.
decide()only branches onprivacy_required,Heavy, andlocal_available, solatency_budgetandcost_sensitivitynever change the chosen target. The tests at Lines 353-382 also only assert outcomes that already happen without those hints, so they won't catch this. Either make these fields participate in routing or drop them from the public API/docs until they do.Also applies to: 120-149, 351-382
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/routing/policy.rs` around lines 47 - 60, The RoutingHints fields latency_budget and cost_sensitivity are currently ignored by decide(): update the decide() logic to include these hints (in addition to privacy_required, Heavy, and local_available) by computing a simple preference score or tie-breaker so latency_budget and cost_sensitivity influence target selection; e.g., in decide() read RoutingHints.latency_budget and .cost_sensitivity and bias the choice toward local for low-latency or high-cost-sensitivity preferences (or toward remote for opposite values), or if you prefer the alternate approach, remove latency_budget and cost_sensitivity from the public RoutingHints API and docs and adjust tests accordingly (update references to RoutingHints, decide(), privacy_required, Heavy, and local_available so behavior/tests remain consistent).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/routing/provider.rs`:
- Around line 450-471: The router currently advertises streaming and
unconditionally calls remote.stream_chat_with_system (in
stream_chat_with_system) which bypasses privacy checks; change
supports_streaming() and stream_chat_with_system() to consult
resolve()/policy::decide() and the RoutingHints privacy flag before delegating:
have supports_streaming() return true only if the effective resolve/policy
allows remote streaming for the given model/hints (or explicitly document
local-only support), and in stream_chat_with_system() call resolve() and
policy::decide() (using policy::decide() or similar decision path used
elsewhere) to determine Local vs Remote; if decision == Local but local
streaming not implemented, return a fail-closed boxed stream that yields a
StreamResult::Err immediately instead of calling remote.stream_chat_with_system,
otherwise call remote.stream_chat_with_system with resolve_remote_model as
before. Reference symbols: supports_streaming, stream_chat_with_system,
resolve(), policy::decide(), resolve_remote_model,
remote.stream_chat_with_system, and ensure the error-producing stream matches
StreamResult<StreamChunk> semantics.
In `@tests/json_rpc_e2e.rs`:
- Around line 808-817: The test's routing_cases entry for ("hint:reaction",
"hint:reaction") is wrong because classify("hint:reaction") returns Lightweight
and resolve_remote_model() maps non-Heavy tasks to remote_fallback_model; update
the expected target model in tests/json_rpc_e2e.rs to "e2e-mock-model" (or
remove the ("hint:reaction", ...) case entirely if you don't want to exercise
this path), ensuring the change aligns with the IntelligentRoutingProvider
behavior and remote_fallback_model resolution.
---
Duplicate comments:
In `@src/openhuman/routing/policy.rs`:
- Around line 47-60: The RoutingHints fields latency_budget and cost_sensitivity
are currently ignored by decide(): update the decide() logic to include these
hints (in addition to privacy_required, Heavy, and local_available) by computing
a simple preference score or tie-breaker so latency_budget and cost_sensitivity
influence target selection; e.g., in decide() read RoutingHints.latency_budget
and .cost_sensitivity and bias the choice toward local for low-latency or
high-cost-sensitivity preferences (or toward remote for opposite values), or if
you prefer the alternate approach, remove latency_budget and cost_sensitivity
from the public RoutingHints API and docs and adjust tests accordingly (update
references to RoutingHints, decide(), privacy_required, Heavy, and
local_available so behavior/tests remain consistent).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a29d8192-8d2a-4c44-97a7-096caa419a6a
📒 Files selected for processing (7)
src/openhuman/routing/factory.rssrc/openhuman/routing/health.rssrc/openhuman/routing/mod.rssrc/openhuman/routing/policy.rssrc/openhuman/routing/provider.rstests/json_rpc_e2e.rstests/live_routing_e2e.rs
✅ Files skipped from review due to trivial changes (1)
- src/openhuman/routing/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/openhuman/routing/health.rs
- tests/live_routing_e2e.rs
- Updated the routing logic for medium tasks to utilize hints for local bias, ensuring more accurate routing decisions. - Revised comments throughout the code to clarify the behavior of task categories and routing preferences. - Adjusted test cases to reflect the new routing logic, ensuring they accurately validate the expected behavior for medium tasks.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/json_rpc_e2e.rs`:
- Around line 788-790: The test setup uses write_min_config() which leaves local
AI enabled, causing policy::decide() to choose Lightweight/local-first and
making hint:reaction non-deterministic; update the test's configuration after
invoking write_min_config() (for both openhuman_home and user_scoped_dir
occurrences) to explicitly disable local AI (set [local_ai] enabled = false) so
the scenario is remote-only and /openai/v1/chat/completions will
deterministically receive the e2e-mock-model.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bdaea13d-36d2-4dbe-be12-9cf80f916e7a
📒 Files selected for processing (4)
src/openhuman/routing/mod.rssrc/openhuman/routing/policy.rssrc/openhuman/routing/provider.rstests/json_rpc_e2e.rs
✅ Files skipped from review due to trivial changes (2)
- src/openhuman/routing/mod.rs
- src/openhuman/routing/provider.rs
…tion - Added a timeout mechanism to the dictation event test to prevent indefinite blocking while waiting for the "pressed" event. - Enhanced the test logic to consume events until the expected event type is received, improving reliability and clarity in the test flow.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/json_rpc_e2e.rs (1)
812-815:⚠️ Potential issue | 🟠 Major
hint:reactionis still asserting the wrong contract.In this remote-only setup,
hint:reactionis a lightweight hint, so it should fall back to the configured remote/default model rather than reach the backend as the literal"hint:reaction". The expectation here and the exemption at Lines 869-876 reintroduce the opposite behavior.🛠️ Suggested fix
let routing_cases = [ ("hint:reasoning", "reasoning-v1"), ("hint:agentic", "agentic-v1"), ("hint:coding", "coding-v1"), ("reasoning-v1", "reasoning-v1"), - // Web chat forwards lightweight hint overrides as-is for this path, - // so the upstream model receives the original hint string. - ("hint:reaction", "hint:reaction"), + // Lightweight hints are not normalized; in remote-only mode they route + // to the configured remote fallback/default model. + ("hint:reaction", "e2e-mock-model"), ]; @@ - if model_override.starts_with("hint:") - && *model_override != "hint:reaction" - && *expected_model != *model_override - { + if model_override.starts_with("hint:") && *expected_model != *model_override { assert!( !captured_models.iter().any(|m| m == model_override), "hint model should not pass through for case={model_override}: {captured_models:?}" ); }Also applies to: 869-876
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/json_rpc_e2e.rs` around lines 812 - 815, The test currently asserts ("hint:reaction","hint:reaction") which treats the lightweight hint "hint:reaction" as a literal backend model; change the expectation so the hint falls back to the configured remote/default model instead of the literal string. Update the tuple in tests/json_rpc_e2e.rs to expect the test suite's remote/default model identifier (the same symbol used elsewhere for remote_model/default_model) and adjust the exemption logic that special-cases "hint:reaction" (the block around the lightweight-hint exemption) so it does not route lightweight hints to the backend as the literal "hint:reaction" but allows the normal remote fallback. Ensure both the tuple and the exemption block use the shared remote/default model symbol so behavior is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/json_rpc_e2e.rs`:
- Around line 812-815: The test currently asserts
("hint:reaction","hint:reaction") which treats the lightweight hint
"hint:reaction" as a literal backend model; change the expectation so the hint
falls back to the configured remote/default model instead of the literal string.
Update the tuple in tests/json_rpc_e2e.rs to expect the test suite's
remote/default model identifier (the same symbol used elsewhere for
remote_model/default_model) and adjust the exemption logic that special-cases
"hint:reaction" (the block around the lightweight-hint exemption) so it does not
route lightweight hints to the backend as the literal "hint:reaction" but allows
the normal remote fallback. Ensure both the tuple and the exemption block use
the shared remote/default model symbol so behavior is consistent.
* chore: update OpenHuman version to 0.52.9 and add intelligent routing functionality - Bumped OpenHuman version from 0.52.7 to 0.52.9 in Cargo.lock files. - Introduced a new routing module that implements intelligent model routing based on task complexity and local model health. - Added a health checker for the local Ollama model server to improve routing decisions. - Enhanced the provider to classify tasks and determine the appropriate backend (local or remote) for processing requests. - Updated related files to support the new routing logic and ensure seamless integration with existing functionalities. * refactor(routing): streamline provider and enhance routing logic - Removed the `build_tool_instructions` function from the public API, simplifying the routing module. - Updated the `IntelligentRoutingProvider` to utilize a more efficient model resolution process, improving routing decisions based on task complexity and local model health. - Introduced a new `quality` module to assess response quality, enabling better fallback decisions when local responses are deemed low quality. - Enhanced the `RoutingHints` struct to provide more granular control over routing behavior, including privacy requirements and cost sensitivity. - Added tests to validate the new routing logic and quality assessment, ensuring robust functionality across various scenarios. * feat(tests): add live end-to-end routing tests for real backend integration - Introduced a new test file `live_routing_e2e.rs` containing end-to-end tests for routing against a live backend. - Tests require a valid backend URL, user session JWT, and real network interactions, hence marked as `#[ignore]`. - Implemented functionality to set up environment variables, write configuration files, and perform JSON-RPC calls to validate routing behavior. - Added assertions to ensure correct handling of various routing cases, enhancing test coverage for the routing module. * refactor(format): ran format command * feat(routing): add IntelligentRoutingProvider and enhance LocalHealthChecker - Introduced a new `factory.rs` file containing the `new_provider` function to construct an `IntelligentRoutingProvider` that integrates local AI capabilities with remote backend providers. - Enhanced the `LocalHealthChecker` in `health.rs` by adding a `reqwest::Client` for improved health probing, including better logging for cache hits and misses, and streamlined cache updates. - Updated health check logic to utilize the new client, ensuring more reliable health status checks for local AI services. * refactor(routing): move new_provider function to factory module - Moved the `new_provider` function from `mod.rs` to a new `factory.rs` module to improve code organization and maintainability. - Updated public exports to include the new location of `new_provider`, ensuring continued accessibility for constructing `IntelligentRoutingProvider` instances. - Removed the old implementation from `mod.rs`, streamlining the routing module's structure. * refactor(routing): simplify local task routing logic - Removed redundant conditions for routing medium tasks locally, streamlining the decision-making process in the `decide` function. - Updated comments to reflect the simplified logic, enhancing code clarity and maintainability. * docs(tests): clarify comments in json_rpc_e2e.rs regarding hint overrides logic. * refactor(tests): enhance live routing end-to-end tests with timeout handling - Introduced a timeout mechanism for reading SSE events to prevent indefinite blocking. - Updated environment variable management in tests to ensure safe access and cleanup. - Improved comments for clarity regarding the safety of environment variable mutations during tests. * refactor(tests): update SSE event reading in live routing tests. * refactor(routing): enhance medium task routing logic and update comments - Updated the routing logic for medium tasks to utilize hints for local bias, ensuring more accurate routing decisions. - Revised comments throughout the code to clarify the behavior of task categories and routing preferences. - Adjusted test cases to reflect the new routing logic, ensuring they accurately validate the expected behavior for medium tasks. * refactor(tests): implement timeout handling for dictation event reception - Added a timeout mechanism to the dictation event test to prevent indefinite blocking while waiting for the "pressed" event. - Enhanced the test logic to consume events until the expected event type is received, improving reliability and clarity in the test flow. --------- Co-authored-by: Steven Enamakel <31011319+senamakel@users.noreply.github.com>
Summary
Lightweight,Medium,Heavy) and routing policy using task type, privacy constraints, latency/cost hints, and local health.reasoning-v1,agentic-v1,coding-v1).Problem
Solution
Lightweight/Medium/Heavyhint:reasoning->reasoning-v1hint:agentic->agentic-v1hint:coding->coding-v1channel_web_chat+ SSE) for real backend routing verification.Submission Checklist
cargo testtargeted suites)(Any feature related checklist can go in here)
Impact
Related
hint:reactionunder healthy local runtime.Summary by CodeRabbit
New Features
Behavior Changes
Tests