Skip to content

Feat: smart routing#569

Merged
senamakel merged 13 commits into
tinyhumansai:mainfrom
YellowSnnowmann:feat/smart-routing
Apr 14, 2026
Merged

Feat: smart routing#569
senamakel merged 13 commits into
tinyhumansai:mainfrom
YellowSnnowmann:feat/smart-routing

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented Apr 14, 2026

Summary

  • Implemented an intelligent, policy-driven routing layer that selects local vs remote model execution by task complexity and runtime conditions.
  • Added deterministic task categorization (Lightweight, Medium, Heavy) and routing policy using task type, privacy constraints, latency/cost hints, and local health.
  • Defaulted lightweight/medium paths to local when healthy, with guarded fallback to remote on local failure or low-quality output.
  • Kept heavy tasks remote-first while normalizing heavy hint model names to backend-valid model IDs (reasoning-v1, agentic-v1, coding-v1).
  • Added routing telemetry and expanded test coverage (policy/unit/integration/live) for routing and fallback behavior.

Problem

  • Model selection was not consistently optimized by complexity, latency/cost sensitivity, privacy, and runtime health.
  • This led to avoidable remote usage, weaker local utilization, and higher risk of latency/cost inefficiency.
  • Heavy hint model strings could also be forwarded upstream in a form not aligned with backend model registry expectations.

Solution

  • Added a routing provider that:
    • classifies tasks into Lightweight / Medium / Heavy
    • evaluates local availability/health
    • applies deterministic routing rules with explainable outcomes
    • performs safe local->remote fallback on error or low-quality output (unless privacy requires local-only behavior)
  • Implemented heavy-hint normalization so remote calls use backend-compatible model IDs:
    • hint:reasoning -> reasoning-v1
    • hint:agentic -> agentic-v1
    • hint:coding -> coding-v1
  • Applied the same normalization logic to streaming remote path for consistency.
  • Added/updated tests:
    • policy tests (classification + decision determinism)
    • provider fallback/regression tests
    • JSON-RPC integration tests for web-chat-triggered routing cases
    • live ignored e2e (channel_web_chat + SSE) for real backend routing verification.

Submission Checklist

  • Unit tests — routing policy/provider tests pass (cargo test targeted suites)
  • E2E / integration — JSON-RPC routing integration test added and passing; live ignored e2e added and validated in real backend setup
  • N/A — Not applicable
  • Doc comments — Routing modules and tests include behavior-focused comments
  • Inline comments — Added for non-obvious routing/fallback/config-resolution behavior

(Any feature related checklist can go in here)

Impact

  • Behavioral: Lightweight/medium tasks now prefer local execution when healthy, improving local capacity utilization.
  • Reliability: Fallback behavior is explicit and tested (error-based and quality-based).
  • Compatibility: Heavy remote routing aligns with backend registry model names, reducing model-resolution failures.
  • Observability: Routing decisions and outcomes are traceable via telemetry/log fields.
  • Performance/Cost: Expected improvement in latency and token cost for lightweight/medium workloads via local-first policy.

Related

Summary by CodeRabbit

  • New Features

    • Added intelligent routing to choose local vs remote inference with health checks, policy-driven decisions, quality heuristics, and telemetry; factory to build routing-enabled providers.
    • Local health probing with caching and TTL; routing records emitted for observability.
  • Behavior Changes

    • Provider construction now uses the intelligent routing provider by default; routing influences streaming, warmup, and fallback behavior.
  • Tests

    • New unit and end-to-end tests covering routing, health, quality fallback, privacy overrides, and live scenarios.

… functionality

- Bumped OpenHuman version from 0.52.7 to 0.52.9 in Cargo.lock files.
- Introduced a new routing module that implements intelligent model routing based on task complexity and local model health.
- Added a health checker for the local Ollama model server to improve routing decisions.
- Enhanced the provider to classify tasks and determine the appropriate backend (local or remote) for processing requests.
- Updated related files to support the new routing logic and ensure seamless integration with existing functionalities.
- Removed the `build_tool_instructions` function from the public API, simplifying the routing module.
- Updated the `IntelligentRoutingProvider` to utilize a more efficient model resolution process, improving routing decisions based on task complexity and local model health.
- Introduced a new `quality` module to assess response quality, enabling better fallback decisions when local responses are deemed low quality.
- Enhanced the `RoutingHints` struct to provide more granular control over routing behavior, including privacy requirements and cost sensitivity.
- Added tests to validate the new routing logic and quality assessment, ensuring robust functionality across various scenarios.
…ration

- Introduced a new test file `live_routing_e2e.rs` containing end-to-end tests for routing against a live backend.
- Tests require a valid backend URL, user session JWT, and real network interactions, hence marked as `#[ignore]`.
- Implemented functionality to set up environment variables, write configuration files, and perform JSON-RPC calls to validate routing behavior.
- Added assertions to ensure correct handling of various routing cases, enhancing test coverage for the routing module.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

Adds a new intelligent routing subsystem (routing factory, policy, provider, health, quality, telemetry), wires provider construction to use the routing factory, exposes routing module, adjusts a harness re-export, and adds unit and e2e tests for routing behavior.

Changes

Cohort / File(s) Summary
Module & Exports
src/openhuman/mod.rs, src/openhuman/agent/harness/mod.rs
Expose new public routing module; remove pub(crate) re-export of build_tool_instructions from harness, leaving only build_tool_instructions_filtered.
Routing Subsystem
src/openhuman/routing/mod.rs, src/openhuman/routing/health.rs, src/openhuman/routing/policy.rs, src/openhuman/routing/provider.rs, src/openhuman/routing/quality.rs, src/openhuman/routing/telemetry.rs, src/openhuman/routing/factory.rs
Add intelligent routing API and implementation: local health checker, deterministic task classification and decision policy, IntelligentRoutingProvider orchestration with guarded fallback semantics, quality heuristics, telemetry record+emit, and factory to construct routing providers.
Provider Ops & Call Sites
src/openhuman/providers/ops.rs, src/openhuman/agent/harness/session/builder.rs, src/openhuman/channels/runtime/startup.rs
Add create_intelligent_routing_provider and replace prior provider construction calls to use the new factory (now passing &config or config as required).
Tests / E2E
tests/json_rpc_e2e.rs, tests/live_routing_e2e.rs
Capture mock backend model in e2e tests; add routing-focused asynchronous JSON-RPC/SSE e2e tests and an ignored live routing smoke test that exercises local/remote behavior against a live backend.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Routing as IntelligentRoutingProvider
    participant Policy as PolicyEngine
    participant Health as LocalHealthChecker
    participant Local as LocalProvider
    participant Remote as RemoteProvider
    participant Quality as QualityChecker
    participant Telemetry as TelemetryEmitter

    Client->>Routing: chat(request with model_hint)
    Routing->>Policy: classify(model_hint)
    Policy-->>Routing: TaskCategory
    Routing->>Health: is_healthy()
    Health-->>Routing: bool
    Routing->>Policy: decide(category, local_available, hints)
    Policy-->>Routing: primary + optional_fallback

    alt primary == Local
        Routing->>Local: execute_request
        Local-->>Routing: response / error
        Routing->>Quality: is_low_quality(response)
        Quality-->>Routing: bool
        alt response OK and quality good
            Routing->>Telemetry: emit(local_success_record)
        else error or low-quality and fallback exists
            Routing->>Remote: execute_request
            Remote-->>Routing: response
            Routing->>Telemetry: emit(fallback_record)
        end
    else primary == Remote
        Routing->>Remote: execute_request
        Remote-->>Routing: response
        Routing->>Telemetry: emit(remote_record)
    end

    Routing-->>Client: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • senamakel

Poem

🐇 I hop through hints and health checks bright,
I choose local burrows or the cloud's high light,
If local falters, off to remote I race,
I log the path and metrics of each case,
A routing rabbit, nimble, swift, and spry.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feat: smart routing' directly relates to the main change: a comprehensive policy-driven routing layer for local vs remote model execution decisions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
tests/json_rpc_e2e.rs (1)

808-815: Clarify hint:reaction expected behavior.

The test expects hint:reaction to pass through verbatim to the backend (line 814), but the PR description states lightweight tasks should route locally when Ollama is healthy. Since the test config doesn't set local_ai.enabled = true, this behavior is consistent with the remote-only path—but the comment on line 813 could be clearer:

-        // Web chat currently forwards lightweight hint overrides as-is.
+        // With local_ai disabled, lightweight hints pass through to remote as-is.
         ("hint:reaction", "hint:reaction"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/json_rpc_e2e.rs` around lines 808 - 815, Update the comment above the
routing_cases array to clearly state why the ("hint:reaction", "hint:reaction")
entry is expected: mention that web chat forwards lightweight hint overrides
verbatim and that in this test the local AI path is disabled (local_ai.enabled
not set/false), so the remote-only routing applies; reference the routing_cases
array and the "hint:reaction" entry to make the justification explicit.
src/openhuman/routing/mod.rs (1)

62-85: Move new_provider() to a sibling file per module guidelines.

The new_provider function contains operational/factory logic (constructing providers, wiring dependencies). Per coding guidelines, mod.rs should remain light and export-focused, with operational code in sibling files like factory.rs or ops.rs.

As per coding guidelines: "Keep domain mod.rs files light and export-focused; put operational code in sibling files (ops.rs, store.rs, schedule.rs, types.rs), then re-export the public API from mod.rs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/routing/mod.rs` around lines 62 - 85, Move the operational
factory function new_provider out of mod.rs into a sibling file (e.g.,
factory.rs or ops.rs) and keep mod.rs export-focused: create a new file
containing the new_provider implementation that constructs
OpenAiCompatibleProvider, LocalHealthChecker (using OLLAMA_BASE_URL), and calls
IntelligentRoutingProvider::new with local_ai_config.chat_model_id and
remote_fallback_model; then update mod.rs to pub use the new_provider symbol
(re-export) so callers still import new_provider from the module while the
factory wiring lives in the sibling file.
src/openhuman/routing/health.rs (2)

88-105: Consider reusing reqwest::Client instead of building per-probe.

Creating a new Client on every probe() call is suboptimal — reqwest::Client is designed to be reused and internally manages connection pools. Building it per-call incurs allocation overhead and prevents connection reuse for repeated probes.

♻️ Proposed refactor: store Client in the struct
 pub struct LocalHealthChecker {
     probe_url: String,
     cache: Mutex<Option<HealthCache>>,
     ttl: Duration,
+    client: reqwest::Client,
 }

 impl LocalHealthChecker {
     pub fn with_ttl(base_url: &str, ttl: Duration) -> Self {
+        let client = reqwest::Client::builder()
+            .timeout(PROBE_TIMEOUT)
+            .build()
+            .unwrap_or_default();
         Self {
             probe_url: format!("{base_url}/api/tags"),
             cache: Mutex::new(None),
             ttl,
+            client,
         }
     }

     async fn probe(&self) -> bool {
-        let client = reqwest::Client::builder()
-            .timeout(PROBE_TIMEOUT)
-            .build()
-            .unwrap_or_default();
-
-        match client.get(&self.probe_url).send().await {
+        match self.client.get(&self.probe_url).send().await {
             Ok(resp) => resp.status().is_success(),
             // ...
         }
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/routing/health.rs` around lines 88 - 105, The probe() method
currently builds a new reqwest::Client on every call; change the struct that
implements probe() (the type containing probe_url and probe()) to own a single
reqwest::Client (e.g., add a client: reqwest::Client field) that is constructed
once (in the struct's constructor/new or factory) with the PROBE_TIMEOUT and
reused for each probe; update probe() to call
self.client.get(&self.probe_url).send().await and remove per-call
Client::builder() creation, and ensure the initial
Client::builder().timeout(PROBE_TIMEOUT).build() error is handled at
construction time rather than unwrap_or_default.

56-85: Add trace-level logs for cache hits and probe results.

Per coding guidelines, add development-oriented logs at branch decisions and state transitions. The health check result directly impacts routing decisions, so visibility into cache behavior aids debugging.

📝 Proposed logging additions
     pub async fn is_healthy(&self) -> bool {
         // Fast path: return cached result if still fresh.
         {
             let guard = self.cache.lock();
             if let Some(cached) = guard.as_ref() {
                 if cached.checked_at.elapsed() < cached.ttl {
+                    tracing::trace!(
+                        healthy = (cached.last_result == CachedStatus::Healthy),
+                        "[routing] local health cache hit"
+                    );
                     return cached.last_result == CachedStatus::Healthy;
                 }
             }
         }

         // Slow path: probe and update cache.
         let healthy = self.probe().await;
+        tracing::debug!(
+            url = %self.probe_url,
+            healthy,
+            "[routing] local health probe completed"
+        );
         let status = if healthy {

As per coding guidelines: "Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/routing/health.rs` around lines 56 - 85, Add trace-level logs
in is_healthy to show cache behavior and probe outcomes: log a trace when the
fast-path cache is hit (include cached.last_result, cached.checked_at.elapsed()
and cached.ttl), log a trace when the cache is stale/miss before calling
self.probe(), log the probe result after awaiting self.probe() (include the
boolean healthy and the mapped CachedStatus), and log when updating the cache
(include new HealthCache fields: last_result, checked_at, ttl). Use the existing
is_healthy, probe, cache, HealthCache and CachedStatus symbols and emit
trace-level messages so these branch decisions and state transitions are visible
in dev logs.
src/openhuman/routing/provider.rs (1)

268-306: Complex nested fallback logic — consider consolidation.

The local path has three branches: error fallback (272-280), quality fallback (281-300), and success (298-300, 302). The duplication of self.remote.chat(request, fb, temperature).await in both fallback paths and the deep nesting reduce readability.

Consider extracting a should_fallback(&Result<ChatResponse>, privacy: bool) -> bool helper and a single fallback path to reduce nesting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/routing/provider.rs` around lines 268 - 306, The local routing
branch in effective_primary has duplicated nested branches for error- and
quality-based fallbacks calling self.remote.chat(...) in multiple places;
extract a helper like should_fallback(result: &Result<ChatResponse>,
privacy_required: bool, fallback: &Option<RoutingTarget>) -> bool that checks:
(1) error and !privacy_required, or (2) Ok(resp) and !privacy_required and
quality::is_low_quality(resp.text.as_deref().unwrap_or("")) and
fallback.is_some(); then replace the branching in the RoutingTarget::Local arm
to call self.local.chat(request, m, temperature).await once, set
fallback_occurred when should_fallback returns true, and perform a single
unified remote fallback call self.remote.chat(request, fb, temperature).await
(using the fb from fallback) if needed; keep references to RoutingTarget::Local,
RoutingTarget::Remote, local.chat, remote.chat, hints.privacy_required, and
quality::is_low_quality to locate the changes.
tests/live_routing_e2e.rs (1)

106-144: SSE stream reader lacks internal read timeout.

The read_sse_event_by_types function relies entirely on the outer 120s timeout. If the backend sends partial data and stalls (keepalive without events), the stream won't close and may hang. For manually-run ignored tests this is acceptable, but adding a per-chunk timeout would improve robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/live_routing_e2e.rs` around lines 106 - 144, The SSE reader
read_sse_event_by_types should wrap each await of stream.next() with a per-chunk
timeout to avoid hanging on partial/stalled keepalives: import
tokio::time::timeout and use timeout(Duration::from_secs(N),
stream.next()).await at the point where stream.next().await is currently called,
handle the Ok(Some(Ok(bytes))) path as before, treat Ok(None) as stream closed,
treat Ok(Some(Err(e))) as the existing panic path, and treat Err(_) (timeout) by
continuing the loop (or returning a descriptive error) so the function doesn't
block indefinitely; add a small configurable constant (e.g., CHUNK_TIMEOUT_SECS)
near the top of read_sse_event_by_types to control the per-chunk timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/routing/policy.rs`:
- Around line 149-156: The latency_budget and cost_sensitivity checks in the
use_local decision are dead code because only Lightweight/Medium tasks reach
this branch and matches!(category, TaskCategory::Lightweight |
TaskCategory::Medium) always makes the hint checks unreachable; remove the hint
clauses so use_local is computed from local_available and the category match
only (i.e., simplify the use_local expression), or alternatively if you intended
hints to affect Medium tasks refactor the condition to match only Lightweight in
the category check and apply hint checks for Medium via matches!(category,
TaskCategory::Medium) combined with hints.LatencyBudget/CostSensitivity; update
the variable use_local and associated comment accordingly.

In `@src/openhuman/routing/provider.rs`:
- Around line 158-165: The code slices `text[..text.len().min(80)]` which can
panic on UTF-8 boundaries; replace this with a safe truncation helper and use it
in the tracing::warn call: add a small function like `fn truncate_safe(s: &str,
max_bytes: usize) -> &str { if s.len() <= max_bytes { s } else { let mut end =
max_bytes; while end > 0 && !s.is_char_boundary(end) { end -= 1; } &s[..end] }
}` (or use `char_indices()` to compute the boundary), then change the
`response_preview = &text[..text.len().min(80)]` to `response_preview =
truncate_safe(&text, 80)` where the branch checks `Ok(text) if !privacy_required
&& quality::is_low_quality(text)` and the fallback handling
(RoutingTarget::Remote, fallback_fn, hint) remains unchanged.

In `@tests/live_routing_e2e.rs`:
- Around line 31-46: The EnvVarGuard implementation calls
std::env::set_var/remove_var which are unsafe in Rust 2024; wrap those calls in
unsafe blocks and document the safety rationale: ensure the test-level mutex
(live_e2e_env_lock) serializes access and explain in a comment that the guard
relies on that mutex to avoid UB, or alternatively replace the approach with
process-level isolation if you need stricter safety. Specifically, update
EnvVarGuard::set_to_path to call unsafe { std::env::set_var(...) } and update
Drop::drop to call unsafe { std::env::set_var(...) } / unsafe {
std::env::remove_var(...) }, and add a concise comment referencing EnvVarGuard
and live_e2e_env_lock describing why the unsafe blocks are sound.

---

Nitpick comments:
In `@src/openhuman/routing/health.rs`:
- Around line 88-105: The probe() method currently builds a new reqwest::Client
on every call; change the struct that implements probe() (the type containing
probe_url and probe()) to own a single reqwest::Client (e.g., add a client:
reqwest::Client field) that is constructed once (in the struct's constructor/new
or factory) with the PROBE_TIMEOUT and reused for each probe; update probe() to
call self.client.get(&self.probe_url).send().await and remove per-call
Client::builder() creation, and ensure the initial
Client::builder().timeout(PROBE_TIMEOUT).build() error is handled at
construction time rather than unwrap_or_default.
- Around line 56-85: Add trace-level logs in is_healthy to show cache behavior
and probe outcomes: log a trace when the fast-path cache is hit (include
cached.last_result, cached.checked_at.elapsed() and cached.ttl), log a trace
when the cache is stale/miss before calling self.probe(), log the probe result
after awaiting self.probe() (include the boolean healthy and the mapped
CachedStatus), and log when updating the cache (include new HealthCache fields:
last_result, checked_at, ttl). Use the existing is_healthy, probe, cache,
HealthCache and CachedStatus symbols and emit trace-level messages so these
branch decisions and state transitions are visible in dev logs.

In `@src/openhuman/routing/mod.rs`:
- Around line 62-85: Move the operational factory function new_provider out of
mod.rs into a sibling file (e.g., factory.rs or ops.rs) and keep mod.rs
export-focused: create a new file containing the new_provider implementation
that constructs OpenAiCompatibleProvider, LocalHealthChecker (using
OLLAMA_BASE_URL), and calls IntelligentRoutingProvider::new with
local_ai_config.chat_model_id and remote_fallback_model; then update mod.rs to
pub use the new_provider symbol (re-export) so callers still import new_provider
from the module while the factory wiring lives in the sibling file.

In `@src/openhuman/routing/provider.rs`:
- Around line 268-306: The local routing branch in effective_primary has
duplicated nested branches for error- and quality-based fallbacks calling
self.remote.chat(...) in multiple places; extract a helper like
should_fallback(result: &Result<ChatResponse>, privacy_required: bool, fallback:
&Option<RoutingTarget>) -> bool that checks: (1) error and !privacy_required, or
(2) Ok(resp) and !privacy_required and
quality::is_low_quality(resp.text.as_deref().unwrap_or("")) and
fallback.is_some(); then replace the branching in the RoutingTarget::Local arm
to call self.local.chat(request, m, temperature).await once, set
fallback_occurred when should_fallback returns true, and perform a single
unified remote fallback call self.remote.chat(request, fb, temperature).await
(using the fb from fallback) if needed; keep references to RoutingTarget::Local,
RoutingTarget::Remote, local.chat, remote.chat, hints.privacy_required, and
quality::is_low_quality to locate the changes.

In `@tests/json_rpc_e2e.rs`:
- Around line 808-815: Update the comment above the routing_cases array to
clearly state why the ("hint:reaction", "hint:reaction") entry is expected:
mention that web chat forwards lightweight hint overrides verbatim and that in
this test the local AI path is disabled (local_ai.enabled not set/false), so the
remote-only routing applies; reference the routing_cases array and the
"hint:reaction" entry to make the justification explicit.

In `@tests/live_routing_e2e.rs`:
- Around line 106-144: The SSE reader read_sse_event_by_types should wrap each
await of stream.next() with a per-chunk timeout to avoid hanging on
partial/stalled keepalives: import tokio::time::timeout and use
timeout(Duration::from_secs(N), stream.next()).await at the point where
stream.next().await is currently called, handle the Ok(Some(Ok(bytes))) path as
before, treat Ok(None) as stream closed, treat Ok(Some(Err(e))) as the existing
panic path, and treat Err(_) (timeout) by continuing the loop (or returning a
descriptive error) so the function doesn't block indefinitely; add a small
configurable constant (e.g., CHUNK_TIMEOUT_SECS) near the top of
read_sse_event_by_types to control the per-chunk timeout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a8693a09-4b35-481d-aa6d-a7de211d3bfd

📥 Commits

Reviewing files that changed from the base of the PR and between 219a2ff and 3346698.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/channels/runtime/startup.rs
  • src/openhuman/mod.rs
  • src/openhuman/providers/ops.rs
  • src/openhuman/routing/health.rs
  • src/openhuman/routing/mod.rs
  • src/openhuman/routing/policy.rs
  • src/openhuman/routing/provider.rs
  • src/openhuman/routing/quality.rs
  • src/openhuman/routing/telemetry.rs
  • tests/json_rpc_e2e.rs
  • tests/live_routing_e2e.rs

Comment thread src/openhuman/routing/policy.rs Outdated
Comment thread src/openhuman/routing/provider.rs
Comment thread tests/live_routing_e2e.rs
senamakel
senamakel previously approved these changes Apr 14, 2026
@senamakel
Copy link
Copy Markdown
Member

@YellowSnnowmann pls resolve the coderabbit issues and then push

@YellowSnnowmann
Copy link
Copy Markdown
Contributor Author

@YellowSnnowmann pls resolve the coderabbit issues and then push
Resolving

…Checker

- Introduced a new `factory.rs` file containing the `new_provider` function to construct an `IntelligentRoutingProvider` that integrates local AI capabilities with remote backend providers.
- Enhanced the `LocalHealthChecker` in `health.rs` by adding a `reqwest::Client` for improved health probing, including better logging for cache hits and misses, and streamlined cache updates.
- Updated health check logic to utilize the new client, ensuring more reliable health status checks for local AI services.
- Moved the `new_provider` function from `mod.rs` to a new `factory.rs` module to improve code organization and maintainability.
- Updated public exports to include the new location of `new_provider`, ensuring continued accessibility for constructing `IntelligentRoutingProvider` instances.
- Removed the old implementation from `mod.rs`, streamlining the routing module's structure.
- Removed redundant conditions for routing medium tasks locally, streamlining the decision-making process in the `decide` function.
- Updated comments to reflect the simplified logic, enhancing code clarity and maintainability.
…andling

- Introduced a timeout mechanism for reading SSE events to prevent indefinite blocking.
- Updated environment variable management in tests to ensure safe access and cleanup.
- Improved comments for clarity regarding the safety of environment variable mutations during tests.
senamakel
senamakel previously approved these changes Apr 14, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/openhuman/routing/policy.rs (1)

47-60: ⚠️ Potential issue | 🟠 Major

Latency/cost hints are still a no-op.

decide() only branches on privacy_required, Heavy, and local_available, so latency_budget and cost_sensitivity never change the chosen target. The tests at Lines 353-382 also only assert outcomes that already happen without those hints, so they won't catch this. Either make these fields participate in routing or drop them from the public API/docs until they do.

Also applies to: 120-149, 351-382

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/routing/policy.rs` around lines 47 - 60, The RoutingHints
fields latency_budget and cost_sensitivity are currently ignored by decide():
update the decide() logic to include these hints (in addition to
privacy_required, Heavy, and local_available) by computing a simple preference
score or tie-breaker so latency_budget and cost_sensitivity influence target
selection; e.g., in decide() read RoutingHints.latency_budget and
.cost_sensitivity and bias the choice toward local for low-latency or
high-cost-sensitivity preferences (or toward remote for opposite values), or if
you prefer the alternate approach, remove latency_budget and cost_sensitivity
from the public RoutingHints API and docs and adjust tests accordingly (update
references to RoutingHints, decide(), privacy_required, Heavy, and
local_available so behavior/tests remain consistent).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/routing/provider.rs`:
- Around line 450-471: The router currently advertises streaming and
unconditionally calls remote.stream_chat_with_system (in
stream_chat_with_system) which bypasses privacy checks; change
supports_streaming() and stream_chat_with_system() to consult
resolve()/policy::decide() and the RoutingHints privacy flag before delegating:
have supports_streaming() return true only if the effective resolve/policy
allows remote streaming for the given model/hints (or explicitly document
local-only support), and in stream_chat_with_system() call resolve() and
policy::decide() (using policy::decide() or similar decision path used
elsewhere) to determine Local vs Remote; if decision == Local but local
streaming not implemented, return a fail-closed boxed stream that yields a
StreamResult::Err immediately instead of calling remote.stream_chat_with_system,
otherwise call remote.stream_chat_with_system with resolve_remote_model as
before. Reference symbols: supports_streaming, stream_chat_with_system,
resolve(), policy::decide(), resolve_remote_model,
remote.stream_chat_with_system, and ensure the error-producing stream matches
StreamResult<StreamChunk> semantics.

In `@tests/json_rpc_e2e.rs`:
- Around line 808-817: The test's routing_cases entry for ("hint:reaction",
"hint:reaction") is wrong because classify("hint:reaction") returns Lightweight
and resolve_remote_model() maps non-Heavy tasks to remote_fallback_model; update
the expected target model in tests/json_rpc_e2e.rs to "e2e-mock-model" (or
remove the ("hint:reaction", ...) case entirely if you don't want to exercise
this path), ensuring the change aligns with the IntelligentRoutingProvider
behavior and remote_fallback_model resolution.

---

Duplicate comments:
In `@src/openhuman/routing/policy.rs`:
- Around line 47-60: The RoutingHints fields latency_budget and cost_sensitivity
are currently ignored by decide(): update the decide() logic to include these
hints (in addition to privacy_required, Heavy, and local_available) by computing
a simple preference score or tie-breaker so latency_budget and cost_sensitivity
influence target selection; e.g., in decide() read RoutingHints.latency_budget
and .cost_sensitivity and bias the choice toward local for low-latency or
high-cost-sensitivity preferences (or toward remote for opposite values), or if
you prefer the alternate approach, remove latency_budget and cost_sensitivity
from the public RoutingHints API and docs and adjust tests accordingly (update
references to RoutingHints, decide(), privacy_required, Heavy, and
local_available so behavior/tests remain consistent).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a29d8192-8d2a-4c44-97a7-096caa419a6a

📥 Commits

Reviewing files that changed from the base of the PR and between 3346698 and b94f4ff.

📒 Files selected for processing (7)
  • src/openhuman/routing/factory.rs
  • src/openhuman/routing/health.rs
  • src/openhuman/routing/mod.rs
  • src/openhuman/routing/policy.rs
  • src/openhuman/routing/provider.rs
  • tests/json_rpc_e2e.rs
  • tests/live_routing_e2e.rs
✅ Files skipped from review due to trivial changes (1)
  • src/openhuman/routing/mod.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/openhuman/routing/health.rs
  • tests/live_routing_e2e.rs

Comment thread src/openhuman/routing/provider.rs Outdated
Comment thread tests/json_rpc_e2e.rs
- Updated the routing logic for medium tasks to utilize hints for local bias, ensuring more accurate routing decisions.
- Revised comments throughout the code to clarify the behavior of task categories and routing preferences.
- Adjusted test cases to reflect the new routing logic, ensuring they accurately validate the expected behavior for medium tasks.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/json_rpc_e2e.rs`:
- Around line 788-790: The test setup uses write_min_config() which leaves local
AI enabled, causing policy::decide() to choose Lightweight/local-first and
making hint:reaction non-deterministic; update the test's configuration after
invoking write_min_config() (for both openhuman_home and user_scoped_dir
occurrences) to explicitly disable local AI (set [local_ai] enabled = false) so
the scenario is remote-only and /openai/v1/chat/completions will
deterministically receive the e2e-mock-model.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdaea13d-36d2-4dbe-be12-9cf80f916e7a

📥 Commits

Reviewing files that changed from the base of the PR and between b94f4ff and 978e928.

📒 Files selected for processing (4)
  • src/openhuman/routing/mod.rs
  • src/openhuman/routing/policy.rs
  • src/openhuman/routing/provider.rs
  • tests/json_rpc_e2e.rs
✅ Files skipped from review due to trivial changes (2)
  • src/openhuman/routing/mod.rs
  • src/openhuman/routing/provider.rs

Comment thread tests/json_rpc_e2e.rs Outdated
…tion

- Added a timeout mechanism to the dictation event test to prevent indefinite blocking while waiting for the "pressed" event.
- Enhanced the test logic to consume events until the expected event type is received, improving reliability and clarity in the test flow.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/json_rpc_e2e.rs (1)

812-815: ⚠️ Potential issue | 🟠 Major

hint:reaction is still asserting the wrong contract.

In this remote-only setup, hint:reaction is a lightweight hint, so it should fall back to the configured remote/default model rather than reach the backend as the literal "hint:reaction". The expectation here and the exemption at Lines 869-876 reintroduce the opposite behavior.

🛠️ Suggested fix
     let routing_cases = [
         ("hint:reasoning", "reasoning-v1"),
         ("hint:agentic", "agentic-v1"),
         ("hint:coding", "coding-v1"),
         ("reasoning-v1", "reasoning-v1"),
-        // Web chat forwards lightweight hint overrides as-is for this path,
-        // so the upstream model receives the original hint string.
-        ("hint:reaction", "hint:reaction"),
+        // Lightweight hints are not normalized; in remote-only mode they route
+        // to the configured remote fallback/default model.
+        ("hint:reaction", "e2e-mock-model"),
     ];
@@
-        if model_override.starts_with("hint:")
-            && *model_override != "hint:reaction"
-            && *expected_model != *model_override
-        {
+        if model_override.starts_with("hint:") && *expected_model != *model_override {
             assert!(
                 !captured_models.iter().any(|m| m == model_override),
                 "hint model should not pass through for case={model_override}: {captured_models:?}"
             );
         }

Also applies to: 869-876

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/json_rpc_e2e.rs` around lines 812 - 815, The test currently asserts
("hint:reaction","hint:reaction") which treats the lightweight hint
"hint:reaction" as a literal backend model; change the expectation so the hint
falls back to the configured remote/default model instead of the literal string.
Update the tuple in tests/json_rpc_e2e.rs to expect the test suite's
remote/default model identifier (the same symbol used elsewhere for
remote_model/default_model) and adjust the exemption logic that special-cases
"hint:reaction" (the block around the lightweight-hint exemption) so it does not
route lightweight hints to the backend as the literal "hint:reaction" but allows
the normal remote fallback. Ensure both the tuple and the exemption block use
the shared remote/default model symbol so behavior is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/json_rpc_e2e.rs`:
- Around line 812-815: The test currently asserts
("hint:reaction","hint:reaction") which treats the lightweight hint
"hint:reaction" as a literal backend model; change the expectation so the hint
falls back to the configured remote/default model instead of the literal string.
Update the tuple in tests/json_rpc_e2e.rs to expect the test suite's
remote/default model identifier (the same symbol used elsewhere for
remote_model/default_model) and adjust the exemption logic that special-cases
"hint:reaction" (the block around the lightweight-hint exemption) so it does not
route lightweight hints to the backend as the literal "hint:reaction" but allows
the normal remote fallback. Ensure both the tuple and the exemption block use
the shared remote/default model symbol so behavior is consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28dc7962-fdf2-46e3-8629-2ff653eeb657

📥 Commits

Reviewing files that changed from the base of the PR and between 978e928 and a739760.

📒 Files selected for processing (1)
  • tests/json_rpc_e2e.rs

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
* chore: update OpenHuman version to 0.52.9 and add intelligent routing functionality

- Bumped OpenHuman version from 0.52.7 to 0.52.9 in Cargo.lock files.
- Introduced a new routing module that implements intelligent model routing based on task complexity and local model health.
- Added a health checker for the local Ollama model server to improve routing decisions.
- Enhanced the provider to classify tasks and determine the appropriate backend (local or remote) for processing requests.
- Updated related files to support the new routing logic and ensure seamless integration with existing functionalities.

* refactor(routing): streamline provider and enhance routing logic

- Removed the `build_tool_instructions` function from the public API, simplifying the routing module.
- Updated the `IntelligentRoutingProvider` to utilize a more efficient model resolution process, improving routing decisions based on task complexity and local model health.
- Introduced a new `quality` module to assess response quality, enabling better fallback decisions when local responses are deemed low quality.
- Enhanced the `RoutingHints` struct to provide more granular control over routing behavior, including privacy requirements and cost sensitivity.
- Added tests to validate the new routing logic and quality assessment, ensuring robust functionality across various scenarios.

* feat(tests): add live end-to-end routing tests for real backend integration

- Introduced a new test file `live_routing_e2e.rs` containing end-to-end tests for routing against a live backend.
- Tests require a valid backend URL, user session JWT, and real network interactions, hence marked as `#[ignore]`.
- Implemented functionality to set up environment variables, write configuration files, and perform JSON-RPC calls to validate routing behavior.
- Added assertions to ensure correct handling of various routing cases, enhancing test coverage for the routing module.

* refactor(format): ran format command

* feat(routing): add IntelligentRoutingProvider and enhance LocalHealthChecker

- Introduced a new `factory.rs` file containing the `new_provider` function to construct an `IntelligentRoutingProvider` that integrates local AI capabilities with remote backend providers.
- Enhanced the `LocalHealthChecker` in `health.rs` by adding a `reqwest::Client` for improved health probing, including better logging for cache hits and misses, and streamlined cache updates.
- Updated health check logic to utilize the new client, ensuring more reliable health status checks for local AI services.

* refactor(routing): move new_provider function to factory module

- Moved the `new_provider` function from `mod.rs` to a new `factory.rs` module to improve code organization and maintainability.
- Updated public exports to include the new location of `new_provider`, ensuring continued accessibility for constructing `IntelligentRoutingProvider` instances.
- Removed the old implementation from `mod.rs`, streamlining the routing module's structure.

* refactor(routing): simplify local task routing logic

- Removed redundant conditions for routing medium tasks locally, streamlining the decision-making process in the `decide` function.
- Updated comments to reflect the simplified logic, enhancing code clarity and maintainability.

* docs(tests): clarify comments in json_rpc_e2e.rs regarding hint overrides
logic.

* refactor(tests): enhance live routing end-to-end tests with timeout handling

- Introduced a timeout mechanism for reading SSE events to prevent indefinite blocking.
- Updated environment variable management in tests to ensure safe access and cleanup.
- Improved comments for clarity regarding the safety of environment variable mutations during tests.

* refactor(tests): update SSE event reading in live routing tests.

* refactor(routing): enhance medium task routing logic and update comments

- Updated the routing logic for medium tasks to utilize hints for local bias, ensuring more accurate routing decisions.
- Revised comments throughout the code to clarify the behavior of task categories and routing preferences.
- Adjusted test cases to reflect the new routing logic, ensuring they accurately validate the expected behavior for medium tasks.

* refactor(tests): implement timeout handling for dictation event reception

- Added a timeout mechanism to the dictation event test to prevent indefinite blocking while waiting for the "pressed" event.
- Enhanced the test logic to consume events until the expected event type is received, improving reliability and clarity in the test flow.

---------

Co-authored-by: Steven Enamakel <31011319+senamakel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants