fix(local-ai): cap local models to 1b preset#1029

Merged: senamakel merged 4 commits into tinyhumansai:main from senamakel:fix/models on Apr 30, 2026
@senamakel (Member) commented Apr 29, 2026

Summary

  • Restrict local AI preset selection to the single 1B Gemma preset in the core presets RPC and apply-preset path.
  • Change local AI defaults and allowlist fallbacks to gemma3:1b-it-qat, with vision disabled and lightweight embeddings.
  • Keep GUI preset/bootstrap expectations aligned with the new single-preset behavior.
  • Add regression tests that verify only the 1B preset is exposed and larger tiers are rejected.

Problem

  • The app still exposed larger local model tiers even though the 1B model is sufficient for local summarization.
  • Larger models increase battery and resource usage, and they created a mismatch between the intended lightweight local-AI experience and what the GUI/core allowed.
  • Some core defaults still pointed at larger local models, so config/bootstrap behavior was not fully aligned with the intended cap.

Solution

  • Update local AI config defaults and model ID fallbacks to prefer the 1B Gemma preset and lightweight embeddings.
  • Return only MVP-allowed presets from local_ai_presets, and reject non-allowed larger tiers in local_ai_apply_preset and env-tier overrides.
  • Adjust the preset metadata copy so the GUI presents the 1B preset as the only local model choice.
  • Update app and Rust tests to assert the new ram_2_4gb / 1B-only behavior.
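
The gating described in the Solution list can be sketched roughly as follows. This is a minimal illustration, not the merged code: `ModelTier`, `is_mvp_allowed`, `MVP_MAX_TIER`, `local_ai_presets`, and `local_ai_apply_preset` are names taken from this PR, while their bodies, the `Ram4To8Gb` variant, the `Preset` struct, and the larger model tag are assumptions made for the example.

```rust
// Illustrative sketch only; see the lead-in for which names are assumed.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Ram2To4Gb,
    Ram4To8Gb, // hypothetical larger tier, stands in for any non-MVP tier
    Custom,
}

/// The single tier this build is allowed to expose.
pub const MVP_MAX_TIER: ModelTier = ModelTier::Ram2To4Gb;

impl ModelTier {
    pub fn as_str(self) -> &'static str {
        match self {
            ModelTier::Ram2To4Gb => "ram_2_4gb",
            ModelTier::Ram4To8Gb => "ram_4_8gb", // assumed identifier
            ModelTier::Custom => "custom",
        }
    }

    /// True only for the tier allowed by the build-level cap.
    pub fn is_mvp_allowed(self) -> bool {
        self == MVP_MAX_TIER
    }
}

#[derive(Debug)]
pub struct Preset {
    pub tier: ModelTier,
    pub chat_model: &'static str,
}

// Full preset table; the second entry is a made-up larger tier.
static ALL_PRESETS: [Preset; 2] = [
    Preset { tier: ModelTier::Ram2To4Gb, chat_model: "gemma3:1b-it-qat" },
    Preset { tier: ModelTier::Ram4To8Gb, chat_model: "gemma3:4b-it-qat" },
];

/// Presets RPC: return only the presets allowed in this build.
pub fn local_ai_presets() -> Vec<&'static Preset> {
    ALL_PRESETS.iter().filter(|p| p.tier.is_mvp_allowed()).collect()
}

/// Apply-preset RPC: reject any tier outside the allowlist.
pub fn local_ai_apply_preset(tier: ModelTier) -> Result<&'static Preset, String> {
    if !tier.is_mvp_allowed() {
        return Err(format!(
            "tier '{}' is not available in this build",
            tier.as_str()
        ));
    }
    ALL_PRESETS
        .iter()
        .find(|p| p.tier == tier)
        .ok_or_else(|| format!("no preset defined for tier '{}'", tier.as_str()))
}
```

Filtering the listing and rejecting on apply are both driven by the same `is_mvp_allowed` check, so the GUI and the RPC cannot drift apart on which tiers exist.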

Submission Checklist

  • Unit tests — Vitest (app/) and cargo test (core) for the changed local AI preset/bootstrap paths
  • E2E / integration — Not added; this change is configuration/preset gating and was covered with targeted core RPC-facing tests plus app unit tests
  • N/A — If truly not applicable, say why (e.g. change is documentation-only)
  • Doc comments — Updated local AI preset/model comments where the build-level cap changed
  • Inline comments — Existing comments cover the non-obvious gating/allowlist behavior; no new inline comments were needed beyond that

Impact

  • Desktop local AI now consistently defaults to the 1B text-only preset when enabled.
  • The GUI no longer offers larger local-model tiers, and the core rejects attempts to opt into them through the preset RPC or OPENHUMAN_LOCAL_AI_TIER.
  • This reduces battery/runtime cost for local summarization and removes drift between UI and backend behavior.
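
As a rough illustration of the `OPENHUMAN_LOCAL_AI_TIER` behavior described above, the sketch below parses an env-provided tier string and ignores anything that is custom or not MVP-allowed. Only the env variable name, `ModelTier::Custom`, `is_mvp_allowed`, and the `ram_2_4gb` identifier come from this PR; `parse`, `tier_from_env_value`, `tier_from_env`, the `ram_4_8gb` string, and the log wording are hypothetical, and `eprintln!` stands in for whatever logging the real loader uses.

```rust
// Illustrative sketch only; helper names and log text are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Ram2To4Gb,
    Ram4To8Gb, // hypothetical larger tier
    Custom,
}

impl ModelTier {
    /// Parse a tier identifier; unknown strings yield `None`.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim().to_ascii_lowercase().as_str() {
            "ram_2_4gb" => Some(Self::Ram2To4Gb),
            "ram_4_8gb" => Some(Self::Ram4To8Gb), // assumed identifier
            "custom" => Some(Self::Custom),
            _ => None,
        }
    }

    pub fn is_mvp_allowed(self) -> bool {
        matches!(self, Self::Ram2To4Gb)
    }
}

/// Decide whether an env-provided value should trigger preset application.
/// Returns `Some(tier)` only for MVP-allowed, non-custom tiers.
pub fn tier_from_env_value(raw: Option<&str>) -> Option<ModelTier> {
    let raw = raw?;
    match ModelTier::parse(raw) {
        Some(tier) if tier != ModelTier::Custom && tier.is_mvp_allowed() => Some(tier),
        Some(tier) => {
            eprintln!("[config] ignoring OPENHUMAN_LOCAL_AI_TIER={raw}: {tier:?} is not allowed in this build");
            None
        }
        None => {
            eprintln!("[config] ignoring OPENHUMAN_LOCAL_AI_TIER={raw}: unknown tier");
            None
        }
    }
}

/// Thin wrapper that reads the real environment.
pub fn tier_from_env() -> Option<ModelTier> {
    let value = std::env::var("OPENHUMAN_LOCAL_AI_TIER").ok();
    tier_from_env_value(value.as_deref())
}
```

Splitting the pure decision (`tier_from_env_value`) from the environment read keeps the rejection logic testable without mutating process state.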

Related

  • Closes:
  • Follow-up PR(s)/TODOs:

Summary by CodeRabbit

  • Configuration Updates

    • Local AI now defaults to the lighter 1B Gemma 3 model to reduce battery and resource usage
    • Vision disabled by default
    • Embedding model switched to all-minilm for efficiency
    • Preset system simplified to a single MVP-optimized 1B tier
  • Behavior Changes

    • Unsupported or custom model tiers are rejected in this build with an explicit error message
  • Tests

    • Unit and e2e tests updated to expect the single MVP 1B preset and tier behavior

@senamakel senamakel requested a review from a team April 29, 2026 18:37
@coderabbitai (Bot, Contributor) commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 792e750a-ef76-415c-a1aa-4b6fe1bbfcd2

📥 Commits

Reviewing files that changed from the base of the PR and between 34d2aec and 5585c25.

📒 Files selected for processing (1)
  • tests/json_rpc_e2e.rs

Walkthrough

The PR restricts local AI presets to a single MVP tier (ram_2_4gb), updates default model IDs to lighter choices, and adds runtime and env-override validation to reject custom or non-MVP tiers; tests, schemas, and handler responses are updated accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Validation: src/openhuman/config/schema/load.rs | Env override parsing now rejects ModelTier::Custom and any tier not allowed for this build (!tier.is_mvp_allowed()); only MVP-allowed tiers trigger preset application; warning logs updated. |
| Default Model Identifiers: src/openhuman/config/schema/local_ai.rs, src/openhuman/local_ai/model_ids.rs | Default/fallback model IDs changed: chat → gemma3:1b-it-qat, vision → "" (disabled), embedding → all-minilm:latest. |
| Preset System & Tier Management: src/openhuman/local_ai/presets.rs | Preset exposure constrained to a single MVP tier (ram_2_4gb); recommend_tier returns MVP_MAX_TIER; preset metadata updated to reflect Gemma 3 1B and disabled vision; tests adjusted. |
| API Schemas & Handlers: src/openhuman/local_ai/schemas.rs | local_ai_apply_preset schema now advertises only ram_2_4gb/disabled; handlers return MVP-only presets and reject non-MVP tiers at runtime with explicit error messages. |
| Tests & Frontend Mock: app/src/utils/__tests__/localAiBootstrap.test.ts, src/openhuman/local_ai/schemas_tests.rs, tests/json_rpc_e2e.rs | Mocks and tests updated to expect a single ram_2_4gb preset, reflect new model/quantization fields and applied tier, and assert handler rejection of unsupported larger tiers. |
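
The "Default Model Identifiers" change summarized above might look something like the sketch below. The three model ID values are the ones listed in this PR; the `LocalAiModels` struct, its field names, the constants, and the `vision_enabled` and `effective_chat_model` helpers are assumptions introduced for illustration and may not match the real config schema.

```rust
// Illustrative sketch only; struct, field, and helper names are assumed.

pub const DEFAULT_CHAT_MODEL: &str = "gemma3:1b-it-qat";
pub const DEFAULT_VISION_MODEL: &str = ""; // empty string means vision is disabled
pub const DEFAULT_EMBEDDING_MODEL: &str = "all-minilm:latest";

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocalAiModels {
    pub chat_model_id: String,
    pub vision_model_id: String,
    pub embedding_model_id: String,
}

impl Default for LocalAiModels {
    fn default() -> Self {
        Self {
            chat_model_id: DEFAULT_CHAT_MODEL.to_string(),
            vision_model_id: DEFAULT_VISION_MODEL.to_string(),
            embedding_model_id: DEFAULT_EMBEDDING_MODEL.to_string(),
        }
    }
}

impl LocalAiModels {
    /// Vision counts as enabled only when a non-blank model ID is configured.
    pub fn vision_enabled(&self) -> bool {
        !self.vision_model_id.trim().is_empty()
    }
}

/// Fall back to the 1B default when the configured chat model is blank.
pub fn effective_chat_model(configured: &str) -> &str {
    let trimmed = configured.trim();
    if trimmed.is_empty() {
        DEFAULT_CHAT_MODEL
    } else {
        trimmed
    }
}
```

Representing "vision disabled" as an empty model ID follows the `vision → ""` entry in the table; a dedicated boolean or `Option<String>` would be an alternative design that this sketch does not assume the project uses.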

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API_Handler as "Local AI Handler"
    participant Presets as "Presets Module"
    participant Config as "Config Loader/Applier"

    Client->>API_Handler: POST /local_ai/apply_preset { tier: "ram_2_4gb" }
    API_Handler->>Presets: validate_tier(tier)
    Presets-->>API_Handler: OK (tier is MVP-allowed)
    API_Handler->>Config: apply_preset_to_config(tier)
    Config->>Config: reject if ModelTier::Custom or !is_mvp_allowed()
    Config-->>API_Handler: applied config / result
    API_Handler-->>Client: 200 OK (appliedTier: "ram_2_4gb")

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through code on silent paws,

Picked one small tier with gentle laws,
Gemma 1B, vision tucked away,
Light and snug to save the day,
Ram_2_4gb—my cozy play.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: restricting local AI models to a 1B preset, which is the core objective across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai (Bot, Contributor) left a comment

🧹 Nitpick comments (1)
src/openhuman/local_ai/presets.rs (1)

197-203: Consider removing unused computation.

The ram_gb value is computed on line 199 but the recommendation logic no longer uses it (always returns MVP_MAX_TIER). While it provides context in the debug log, you could simplify this if the RAM-based recommendation is not planned to return.

♻️ Optional: Inline the log if RAM-based logic is permanently removed
 pub fn recommend_tier(device: &DeviceProfile) -> ModelTier {
-    let ram_gb = device.total_ram_gb();
     let tier = MVP_MAX_TIER;
-    tracing::debug!(ram_gb, ?tier, "[local_ai] recommended model tier");
+    tracing::debug!(ram_gb = device.total_ram_gb(), ?tier, "[local_ai] recommended model tier");
     tier
 }
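
For context, a self-contained version of the simplified function from the diff above could look like this. `recommend_tier`, `DeviceProfile`, `total_ram_gb`, `ModelTier`, and `MVP_MAX_TIER` are names from the review; the `total_ram_bytes` field, the body of `total_ram_gb`, and the `Ram4To8Gb` variant are assumptions, and `eprintln!` is used here in place of `tracing::debug!` so the sketch needs no external crate.

```rust
// Illustrative sketch only; DeviceProfile's layout is assumed.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Ram2To4Gb,
    Ram4To8Gb, // hypothetical larger tier, never recommended in this build
}

pub const MVP_MAX_TIER: ModelTier = ModelTier::Ram2To4Gb;

pub struct DeviceProfile {
    pub total_ram_bytes: u64, // assumed field
}

impl DeviceProfile {
    /// Total RAM in whole GiB (assumed implementation).
    pub fn total_ram_gb(&self) -> u64 {
        self.total_ram_bytes / (1024 * 1024 * 1024)
    }
}

/// Always recommends the capped tier; RAM is read only for the log line.
pub fn recommend_tier(device: &DeviceProfile) -> ModelTier {
    let tier = MVP_MAX_TIER;
    eprintln!(
        "[local_ai] recommended model tier: ram_gb={} tier={:?}",
        device.total_ram_gb(),
        tier
    );
    tier
}
```

With the cap in place, a 2 GiB laptop and a 64 GiB workstation both get `Ram2To4Gb`, which is why the separate `ram_gb` binding no longer affects the result.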
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/local_ai/presets.rs` around lines 197 - 203, The function
recommend_tier computes ram_gb from DeviceProfile but never uses it and always
returns MVP_MAX_TIER; either remove the unused ram_gb binding and its inclusion
in the tracing::debug call or restore RAM-based logic—update the function
recommend_tier to either drop ram_gb and simplify the debug message to not
reference ram_gb, or implement the intended tier selection using ram_gb so the
variable is used before returning MVP_MAX_TIER.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 961fd556-69d9-473b-9762-acd11c229993

📥 Commits

Reviewing files that changed from the base of the PR and between 74074cd and e0c68f0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • app/src/utils/__tests__/localAiBootstrap.test.ts
  • src/openhuman/config/schema/load.rs
  • src/openhuman/config/schema/local_ai.rs
  • src/openhuman/local_ai/model_ids.rs
  • src/openhuman/local_ai/presets.rs
  • src/openhuman/local_ai/schemas.rs
  • src/openhuman/local_ai/schemas_tests.rs

@senamakel senamakel merged commit ac97a96 into tinyhumansai:main Apr 30, 2026
13 of 15 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

1 participant