
fix(plugins): stop reconciliation install loop, slow plugin list, uninstall resurrection#309

Merged
ChuckBuilds merged 8 commits into main from fix/reconciliation-loop-and-plugin-perf
Apr 8, 2026

Conversation


@ChuckBuilds ChuckBuilds commented Apr 8, 2026

Summary

Three interacting bugs reported by ericepe on Discord after a fresh install:

  1. Infinite reinstall loop — logs flooded with Plugin not found in registry: github / youtube (pastebin). The reconciler fired on every HTTP request, which on a Pi4 also accounted for much of the ~98% CPU and the long "connecting to display" hangs.
  2. Uninstalled plugins reappeared — the UI reported success but the plugin came back, even after the user manually deleted plugin-repos/<name> from disk.
  3. Plugin list was very slow / pegged the CPU on a Pi4 — both /plugins/installed and especially /plugins/store/list were doing synchronous per-plugin work on every request.

All three are addressed. Each is a confirmed root cause from reading and running the code paths on a real Pi, not a guess.

Commits

  • b031b991 fix(plugins): stop reconciliation install loop, slow /plugins/installed, and uninstall resurrection
  • c03eb8db perf(plugins): parallelize Plugin Store browse and extend metadata cache TTLs
  • 5ed48d5c perf(plugins): drop redundant per-plugin manifest.json fetch in search_plugins

What changed

Fix 1 — Reconciliation infinite retry loop

  • web_interface/app.py: _run_startup_reconciliation no longer resets _reconciliation_started = False on unresolved inconsistencies, so @app.before_request cannot retrigger the entire pass on every page load. Reconciliation runs exactly once per process.
  • src/plugin_system/state_reconciliation.py: added _unrecoverable_missing_on_disk set, populated by _auto_repair_missing_plugin whenever a candidate isn't in the registry. Subsequent reconciliation passes skip these plugins entirely (no install_plugin, no fetch_registry).
  • Cheap pre-check before calling install_plugin so we never fire the expensive forced GitHub fetch when we already know the plugin is missing from the registry.
  • Transient registry failures are not marked unrecoverable — only "definitely not in the registry" is.
  • Added force=True parameter on reconcile_state(...) and the /plugins/state/reconcile endpoint so users can retry after the underlying issue is fixed. The endpoint also now passes store_manager (previously omitted, silently disabling auto-repair).
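The bullets above can be condensed into a rough sketch of the new control flow. The names _unrecoverable_missing_on_disk and reconcile_state(force=...) come from this PR; _find_missing_on_disk, the registry shape, and the store-manager interface are illustrative stand-ins, not the real implementation:

```python
# Condensed, hypothetical sketch of the reconciliation flow described above.
class StateReconciliation:
    def __init__(self, store_manager):
        self.store_manager = store_manager
        self._unrecoverable_missing_on_disk = set()

    def reconcile_state(self, force=False):
        if force:
            # Explicit retry path: clear the skip list so users can retry
            # after fixing the underlying issue.
            self._unrecoverable_missing_on_disk.clear()
        repaired = []
        for plugin_id in self._find_missing_on_disk():
            if plugin_id in self._unrecoverable_missing_on_disk:
                continue  # no install_plugin, no fetch_registry
            # Cheap pre-check; with raise_on_failure=True a transient
            # registry error raises here and nothing gets cached.
            registry = self.store_manager.fetch_registry(raise_on_failure=True)
            if plugin_id not in {p["name"] for p in registry}:
                # Definitively absent from the registry: never retry this
                # plugin again in this process.
                self._unrecoverable_missing_on_disk.add(plugin_id)
                continue
            if self.store_manager.install_plugin(plugin_id):
                repaired.append(plugin_id)
        return repaired

    def _find_missing_on_disk(self):
        return []  # placeholder for the real config-vs-disk diff
```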

Fix 2 — Uninstall plugin reappears

  • web_interface/blueprints/api_v3.py: reordered the uninstall flow (both queue and direct paths) — config cleanup now happens before file removal, eliminating the "config-entry-with-no-files" race window the reconciler used to resurrect on.
  • Added an in-memory uninstall tombstone on PluginStoreManager (mark_recently_uninstalled / was_recently_uninstalled, 5-minute TTL). The reconciler honors it as a hard "do not auto-repair" signal even if state somehow drifts out of sync.
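A minimal sketch of such a tombstone, assuming only what the bullet states (the two method names and the 5-minute TTL); the storage layout and lazy expiry are illustrative:

```python
import time

# Hypothetical standalone version of the tombstone described above; in the PR
# these methods live on PluginStoreManager.
class UninstallTombstones:
    def __init__(self, ttl_seconds=300.0):  # 5-minute TTL per the PR
        self._ttl = ttl_seconds
        self._tombstones = {}  # plugin_id -> expiry timestamp

    def mark_recently_uninstalled(self, plugin_id, now=None):
        now = time.monotonic() if now is None else now
        self._tombstones[plugin_id] = now + self._ttl

    def was_recently_uninstalled(self, plugin_id, now=None):
        now = time.monotonic() if now is None else now
        expiry = self._tombstones.get(plugin_id)
        if expiry is None:
            return False
        if now >= expiry:
            del self._tombstones[plugin_id]  # lazily expire old entries
            return False
        return True
```

The reconciler would consult was_recently_uninstalled before any auto-repair attempt and treat a hit as a hard stop.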

Fix 3a — /plugins/installed slowness

  • src/plugin_system/store_manager.py: _get_local_git_info now caches results keyed on .git/HEAD mtime. Subsequent calls do zero subprocess work unless the working copy actually moved.
  • web_interface/blueprints/api_v3.py: GET /plugins/installed mtime-gates discover_plugins() against the plugins directory and drops the per-plugin manifest.json re-read entirely. Added ?refresh=1 for force-rescan.
  • Bumped registry_cache_timeout from 5 → 15 minutes.
  • fetch_registry now falls back to the stale cache on transient network failure instead of returning an empty list.

Fix 3b — /plugins/store/list slowness (Plugin Store browse)

Most users install via the Plugin Store (ZIP extraction, no .git dir), so the git-info cache above doesn't help them directly. The real pain was the store browse endpoint:

  • Parallelize per-plugin GitHub enrichment via a ThreadPoolExecutor (10 workers, preserves registry order). Category/tag/query filters now run before enrichment so we never waste HTTP requests on plugins that will be filtered out.
  • Drop the redundant per-plugin manifest.json fetch from search_plugins. The registry's plugins.json already carries description (it is generated from each manifest by update_registry.py), and last_updated is filled from the commit info fetched in the same loop. Removing this call eliminated one of three per-plugin HTTPS round trips — crucially, the one that regularly hit the _http_get_with_retries(max_retries=3, backoff=0.75) tail when DNS flapped, which dominated wall time even after parallelization.
  • Bump commit/manifest cache TTLs from 5 min → 30 min so a warm browse session stays warm.
  • Stale-on-error fallback in _get_github_repo_info and _get_latest_commit_info: on network failure or 403, return the previously-cached value rather than zeroing out stars / clearing last_commit. Matches the pattern already in fetch_registry.
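The filter-before-enrich shape, with executor.map preserving registry order, might look roughly like this. The function name search_plugins and the 10-worker pool match the description above; the filter and enrich callables are stand-ins for the real category/tag/query filters and GitHub calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of the browse path: filter first, then enrich in
# parallel. executor.map yields results in input order, which the UI relies on.
def search_plugins(registry, category=None, enrich=lambda p: p, max_workers=10):
    # 1. Apply filters BEFORE enrichment so no HTTP is spent on plugins
    #    the caller will never see.
    filtered = [
        p for p in registry
        if category is None or p.get("category") == category
    ]
    if not filtered:
        return []
    # 2. Fan out per-plugin enrichment across the pool; map() preserves
    #    the order of `filtered`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich, filtered))
```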

Benchmarks on real Pi hardware (devpi, Raspberry Pi, ARM64, 31 plugins in registry)

/api/v3/plugins/installed

|    | cold   | warm    |
|----|--------|---------|
| PR | 221 ms | 7–13 ms |

10 repeated warm hits produced zero new reconciliation retries, confirming the loop is gone.

/api/v3/plugins/store/list (the big one)

|                                                      | cold    | warm     |
|------------------------------------------------------|---------|----------|
| main (pre-fix, serial + manifest fetch + 5-min TTLs) | 30.06 s | 23 ms    |
| PR (parallel + manifest drop + 30-min TTLs)          | 16.43 s | 11–27 ms |

46% faster cold, with identical plugin count, identical order, and zero diffs on the critical fields the UI reads (stars, last_commit, last_commit_author, last_updated, default_branch, verified, repo, name, author). A single description field differed between the two runs because the PR uses the registry description instead of the live GitHub manifest description for that plugin — an acceptable trade since update_registry.py is meant to be authoritative.

The remaining 16 s cold time is two GitHub API calls per plugin × 31 plugins bottlenecked by the slowest thread-pool wave. After the first load, warm hits are sub-30 ms and the 30-minute cache TTL means normal browsing stays warm across realistic sessions.
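For intuition, the wave arithmetic from the stated figures (2 GitHub calls per plugin, 31 plugins, 10 workers) works out as follows; this is back-of-envelope only, using numbers from this PR description:

```python
# Back-of-envelope bound on the remaining cold time.
requests_per_plugin = 2
plugins = 31
workers = 10
total_requests = requests_per_plugin * plugins  # 62 requests per cold browse
waves = -(-total_requests // workers)           # ceil(62 / 10) = 7 waves
# Each wave lasts as long as its slowest request, so cold time is bounded
# below by roughly 7 x the worst single-request latency.
```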

Reconciliation loop verified on the live service

I stopped ledmatrix.service + ledmatrix-web.service, checked out the PR branch, and restarted. The journal showed exactly the expected behavior:

  • github and youtube (the exact plugins from the bug report) hit the registry pre-check, were marked unrecoverable, no install attempt
  • music and weather WERE in the registry, auto-repaired successfully
  • /api/v3/plugins/installed fired 13 times over the test; zero AutoRepair or Starting state reconciliation entries after the initial pass

Tests

29/29 tests pass:

  • test/web_interface/test_state_reconciliation.py — 5 new cases:
    • test_not_in_registry_marks_unrecoverable_without_install
    • test_subsequent_reconcile_does_not_retry
    • test_force_reconcile_clears_unrecoverable_cache
    • test_registry_unreachable_does_not_mark_unrecoverable
    • test_recently_uninstalled_skips_auto_repair
  • test/test_store_manager_caches.py — new file, 13 cases:
    • Tombstone lifecycle + TTL expiry (3)
    • Git-info cache hit / mtime invalidation / no-.git early return (3)
    • Parallel search_plugins: order preservation, filter-before-enrichment, concurrent-execution wall-time bound (3)
    • Stale-on-error fallback for repo info and commit info (2)
    • Registry stale-cache fallback (with and without prior cache) (2)

Broader sweep: 288/288 in the rest of the suite. Two pre-existing failures in TestDottedKeyNormalization reproduce on main and are unrelated.

Not in this PR

  • Static image plugin "doesn't show" — the canonical plugin-repos/static-image/manager.py already handles the schema mismatch I initially suspected (top-level images vs image_config.images). Without specific log output pinpointing what fails for that user, a speculative patch could mask the real bug. Tracking separately.
  • /plugins/update fast-path for ZIP-installed plugins — the update endpoint still calls get_plugin_info(fetch_latest_from_github=True) which makes 3 GitHub requests. A version-only comparison against the cached registry would reduce that to zero. Saving for a follow-up; the update path is only hit on explicit user action, not on every page load.
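The version-only comparison sketched in that follow-up could be as small as this; the function name and dotted-version format are assumptions, and real manifest/registry field names may differ:

```python
# Hypothetical fast-path check: compare the installed version against the
# cached registry entry instead of making GitHub requests.
def update_available(installed_version, registry_version):
    def parse(v):
        # Accept optional leading "v", e.g. "v1.2.0" -> (1, 2, 0).
        return tuple(int(part) for part in v.lstrip("v").split("."))
    return parse(registry_version) > parse(installed_version)
```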

Test plan

  • pytest test/test_store_manager_caches.py test/web_interface/test_state_reconciliation.py -q --no-cov passes locally.
  • On a Pi4: restart the service, observe reconciliation runs exactly once, CPU settles.
  • Browse the Plugin Store page cold — should complete in roughly half the previous time.
  • Install a plugin via the UI, uninstall it, refresh several times — stays gone. Manually delete plugin-repos/<plugin>/ too; should stay gone.
  • Hand-edit config/config.json to add "fakeplugin": {"enabled": true} that does not exist in the registry. Start the web interface, refresh the plugin page repeatedly, tail the log — exactly one reconciliation attempt, then silence.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Reconciliation gains a force option to reattempt repairs and bypass the session-level skip list; the reconciliation endpoint accepts a JSON force flag.
    • Transactional plugin uninstall with automatic config snapshot/restore on failure.
    • Tracks recent uninstalls to avoid resurrecting just-removed plugins.
  • Bug Fixes

    • Registry and metadata fetching fall back to cached/stale data on network errors to improve resilience.
    • Reduced unnecessary disk reads during plugin discovery.
    • Prevent repeated auto-repair retries for irrecoverable missing plugins in the same process.
  • Tests

    • Added extensive tests covering caching, reconciliation, uninstall transactions, and endpoint payload handling.

fix(plugins): stop reconciliation install loop, slow /plugins/installed, and uninstall resurrection

Three interacting bugs reported by a user (Discord/ericepe) on a fresh install:

1. The state reconciler retried failed auto-repairs on every HTTP request,
   pegging CPU and flooding logs with "Plugin not found in registry: github
   / youtube". Root cause: ``_run_startup_reconciliation`` reset
   ``_reconciliation_started`` to False on any unresolved inconsistency, so
   ``@app.before_request`` re-fired the entire pass on the next request.
   Fix: run reconciliation exactly once per process; cache per-plugin
   unrecoverable failures inside the reconciler so even an explicit
   re-trigger stays cheap; add a registry pre-check to skip the expensive
   GitHub fetch when we already know the plugin is missing; expose
   ``force=True`` on ``/plugins/state/reconcile`` so users can retry after
   fixing the underlying issue.

2. Uninstalling a plugin via the UI succeeded but the plugin reappeared.
   Root cause: a race between ``store_manager.uninstall_plugin`` (removes
   files) and ``cleanup_plugin_config`` (removes config entry) — if
   reconciliation fired in the gap it saw "config entry with no files" and
   reinstalled. Fix: reorder uninstall to clean config FIRST, drop a
   short-lived "recently uninstalled" tombstone on the store manager that
   the reconciler honors, and pass ``store_manager`` to the manual
   ``/plugins/state/reconcile`` endpoint (it was previously omitted, which
   silently disabled auto-repair entirely).

3. ``GET /plugins/installed`` was very slow on a Pi4 (UI hung on
   "connecting to display" for minutes, ~98% CPU). Root causes: per-request
   ``discover_plugins()`` + manifest re-read + four ``git`` subprocesses per
   plugin (``rev-parse``, ``--abbrev-ref``, ``config``, ``log``). Fix:
   mtime-gate ``discover_plugins()`` and drop the per-plugin manifest
   re-read in the endpoint; cache ``_get_local_git_info`` keyed on
   ``.git/HEAD`` mtime so subprocesses only run when the working copy
   actually moved; bump registry cache TTL from 5 to 15 minutes and fall
   back to stale cache on transient network failure.

Tests: 16 reconciliation cases (including 5 new ones covering the
unrecoverable cache, force-reconcile path, transient-failure handling, and
recently-uninstalled tombstone) and 8 new store_manager cache tests
covering tombstone TTL, git-info mtime cache hit/miss, and the registry
stale-cache fallback. All 24 pass; the broader 288-test suite continues to
pass with no new failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a per-process "unrecoverable missing" cache and a reconcile_state(force: bool = False) option; StateReconciliation now pre-checks the registry and respects uninstall tombstones before auto-repair. PluginStoreManager gains uninstall-tombstones, extended cache/backoff logic, stale-on-error fallbacks, improved git metadata caching, and concurrent search enrichment; web endpoints and tests updated accordingly.

Changes

  • State reconciliation — src/plugin_system/state_reconciliation.py: Add self._unrecoverable_missing_on_disk; change the signature to reconcile_state(self, force: bool = False); clear the unrecoverable cache when force=True; check tombstones and the unrecoverable cache before auto-repair; pre-check the registry (raise_on_failure) and mark definitive failures as unrecoverable; avoid caching transient registry errors.
  • Plugin store manager — src/plugin_system/store_manager.py: Increase several TTLs, add per-instance uninstall tombstones with TTL and mark_recently_uninstalled/was_recently_uninstalled, add _git_info_cache keyed by .git signature, introduce _record_cache_backoff, return stale GitHub/commit/registry data on transient errors, add fetch_registry(..., raise_on_failure=False), and change search_plugins to pre-filter then concurrently enrich results.
  • Web interface — web_interface/app.py, web_interface/blueprints/api_v3.py: Startup reconciliation always sets _reconciliation_done; run discover_plugins() only when inconsistencies were fixed; /plugins/state/reconcile accepts optional JSON { "force": ... } and forwards the coerced boolean to reconcile_state; /plugins/installed uses mtime-based refresh and avoids per-request manifest re-reads; add transactional uninstall helpers that snapshot/restore config and perform a best-effort atomic uninstall.
  • Tests — test/test_store_manager_caches.py, test/web_interface/test_state_reconciliation.py, test/test_uninstall_and_reconcile_endpoint.py: Add tests for the uninstall-tombstone lifecycle, git-info mtime caching, stale-on-error fallbacks and backoff behavior, concurrent enrichment and filtering in search_plugins, unrecoverable-cache behavior (including the force=True clear), recent-uninstall blocking, transactional uninstall rollback scenarios, and reconcile endpoint payload coercion and handling.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StateRec as StateReconciliation
    participant StoreMgr as PluginStoreManager
    participant Registry as Plugin Registry
    participant Disk as File System

    Client->>StateRec: reconcile_state(force=False)
    alt force=True
        StateRec->>StateRec: clear unrecoverable cache
    end
    StateRec->>StateRec: check unrecoverable cache for plugin_id
    alt in unrecoverable cache
        StateRec-->>Client: report MANUAL_FIX_REQUIRED (skip auto-repair)
    else not cached
        StateRec->>StoreMgr: was_recently_uninstalled(plugin_id)?
        StoreMgr-->>StateRec: true/false
        alt recently uninstalled
            StateRec-->>Client: skip auto-repair
        else not recently uninstalled
            StateRec->>StoreMgr: fetch_registry(raise_on_failure=True)
            StoreMgr->>Registry: network request (may return cached fallback)
            Registry-->>StoreMgr: registry data / error
            StoreMgr-->>StateRec: registry data or stale/error
            StateRec->>StateRec: is plugin in registry?
            alt not in registry (definitive)
                StateRec->>StateRec: mark plugin unrecoverable
                StateRec-->>Client: report MANUAL_FIX_REQUIRED
            else in registry
                StateRec->>StoreMgr: install_plugin(plugin_id)
                StoreMgr->>Disk: download & install
                alt install succeeds
                    StoreMgr-->>StateRec: install success
                else install fails
                    StateRec->>StateRec: mark plugin unrecoverable
                    StoreMgr-->>StateRec: install failed
                end
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.46%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the three main fixes: stopping the reconciliation install loop, addressing slow plugin list performance, and fixing uninstall resurrection. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
test/web_interface/test_state_reconciliation.py (1)

414-433: Consider using assert_called_once_with for stricter verification.

assert_called_with only verifies the arguments of the most recent call without checking call count. Using assert_called_once_with would catch potential bugs where install_plugin is invoked multiple times.

💡 Suggested improvement
-        self.store_manager.install_plugin.assert_called_with("ghost")
+        self.store_manager.install_plugin.assert_called_once_with("ghost")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/web_interface/test_state_reconciliation.py` around lines 414 - 433, In
test_force_reconcile_clears_unrecoverable_cache replace the loose call assertion
with a strict one: change the final assertion that inspects
store_manager.install_plugin from assert_called_with("ghost") to
assert_called_once_with("ghost") so the test verifies both the argument and that
install_plugin was invoked exactly once on the reconciler re-run (refer to the
test method name test_force_reconcile_clears_unrecoverable_cache and the mock
method store_manager.install_plugin).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 504-516: The current registry-fetching method silently falls back
to self.registry_cache on requests.RequestException/json.JSONDecodeError which
hides network errors and breaks
state_reconciliation._auto_repair_missing_plugin's logic; modify the registry
fetch method to accept a raise_on_failure: bool (default False) parameter and
when True re-raise the caught exception instead of returning the stale cache
(i.e., in the except blocks for requests.RequestException and
json.JSONDecodeError, if raise_on_failure: raise), and update callers that need
to distinguish transient network failures—specifically call the fetch with
raise_on_failure=True from
state_reconciliation._auto_repair_missing_plugin—while keeping the default
behavior for UI callers that prefer the stale cache.

In `@web_interface/blueprints/api_v3.py`:
- Around line 2850-2857: The uninstall flow must be made transactional: stop the
uninstall if api_v3.config_manager.cleanup_plugin_config(plugin_id,
remove_secrets=True) raises an exception (do not proceed to file deletion) and
when calling PluginStoreManager.uninstall_plugin(...) if that call returns False
or raises, restore the previously saved config/secrets (the config snapshot you
took before cleanup) and surface/raise the error instead of silently logging and
continuing; update the code paths around cleanup_plugin_config, the
preserve_config check, and PluginStoreManager.uninstall_plugin to (1) take a
pre-cleanup snapshot of config/secrets, (2) abort and re-raise/log + return
failure when cleanup_plugin_config fails, and (3) on uninstall_plugin failure
restore the snapshot before returning/failing, logging clear context via logger
rather than proceeding.
- Around line 2390-2397: The code assumes request.get_json() returns a dict and
uses payload.get('force') which will raise on non-object JSON and mis-handle
string booleans; change the block that builds force so you first confirm payload
is a dict (after request.get_json(silent=True) or {}), then extract the value
using payload.get('force', False) only if isinstance(payload, dict), and pass
that value through the existing _coerce_to_bool(...) helper before calling
reconciler.reconcile_state(force=...), ensuring non-object JSON returns the
default False.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6712dc9a-b433-4303-abb3-dee3fcca67e8

📥 Commits

Reviewing files that changed from the base of the PR and between 7812245 and b031b99.

📒 Files selected for processing (6)
  • src/plugin_system/state_reconciliation.py
  • src/plugin_system/store_manager.py
  • test/test_store_manager_caches.py
  • test/web_interface/test_state_reconciliation.py
  • web_interface/app.py
  • web_interface/blueprints/api_v3.py

perf(plugins): parallelize Plugin Store browse and extend metadata cache TTLs

Follow-up to the previous commit addressing the Plugin Store browse path
specifically. Most users install plugins via the store (ZIP extraction,
no .git directory) so the git-info mtime cache from the previous commit
was a no-op for them; their pain was coming from /plugins/store/list.

Root cause. search_plugins() enriched each returned plugin with three
serial GitHub fetches: _get_github_repo_info (repo API), _get_latest_commit_info
(commits API), _fetch_manifest_from_github (raw.githubusercontent.com).
Fifteen plugins × three requests × serial HTTP = 30–45 sequential round
trips on every cold browse. On a Pi4 over WiFi that translated directly
into the "connecting to display" hang users reported. The commit and
manifest caches had a 5-minute TTL, so even a brief absence re-paid the
full cost.

Changes.

- ``search_plugins``: fan out per-plugin enrichment through a
  ``ThreadPoolExecutor`` (max 10 workers, stays well under unauthenticated
  GitHub rate limits). Apply category/tag/query filters before enrichment
  so we never waste requests on plugins that will be filtered out.
  ``executor.map`` preserves input order, which the UI depends on.
- ``commit_cache_timeout`` and ``manifest_cache_timeout``: 5 min → 30 min.
  Keeps the cache warm across a realistic session while still picking up
  upstream updates in a reasonable window.
- ``_get_github_repo_info`` and ``_get_latest_commit_info``: stale-on-error
  fallback. On a network failure or a 403 we now prefer a previously-
  cached value over the zero-default, matching the pattern already in
  ``fetch_registry``. Flaky Pi WiFi no longer causes star counts to flip
  to 0 and commit info to disappear.

Tests (5 new in test_store_manager_caches.py).

- ``test_results_preserve_registry_order`` — the parallel map must still
  return plugins in input order.
- ``test_filters_applied_before_enrichment`` — category/tag/query filters
  run first so we don't waste HTTP calls.
- ``test_enrichment_runs_concurrently`` — peak-concurrency check plus a
  wall-time bound that would fail if the code regressed to serial.
- ``test_repo_info_stale_on_network_error`` — repo info falls back to
  stale cache on RequestException.
- ``test_commit_info_stale_on_network_error`` — commit info falls back to
  stale cache on RequestException.

All 29 tests (16 reconciliation, 13 store_manager caches) pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
test/test_store_manager_caches.py (1)

186-189: Remove unused variable concurrent_workers.

This variable is defined but never used in the test.

🧹 Remove dead code
         import threading
-        concurrent_workers = []
         peak_lock = threading.Lock()
         peak = {"count": 0, "current": 0}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_store_manager_caches.py` around lines 186 - 189, Remove the dead
variable concurrent_workers declared alongside peak_lock and peak in the test
(the unused symbol concurrent_workers); delete its declaration and any related
unused assignment so only peak_lock and peak remain, leaving the threading
import intact if still used by peak_lock or other code (search for
concurrent_workers to ensure no remaining references).
src/plugin_system/store_manager.py (1)

668-678: Add parentheses to clarify operator precedence.

The condition on line 671 mixes and and or without parentheses, making the intent unclear. While and binds tighter than or, explicit parentheses improve readability and silence the linter warning.

✨ Suggested clarification
-        if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:
+        if len(filtered) == 1 or (not fetch_commit_info and len(filtered) < 4):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/store_manager.py` around lines 668 - 678, The conditional
combining len(filtered), fetch_commit_info and logical operators in the
early-return needs explicit parentheses to clarify precedence: update the if in
store_manager (the block that returns [_enrich(p) for p in filtered]) to use
parentheses around the intended subexpressions (e.g., group the single-item
check and the fetch_commit_info+size check) so it reads unambiguously (refer to
variables/functions: filtered, fetch_commit_info, and _enrich) before proceeding
to the ThreadPoolExecutor branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 138d4a15-6842-4130-bac6-660ef20c6ea9

📥 Commits

Reviewing files that changed from the base of the PR and between b031b99 and c03eb8d.

📒 Files selected for processing (2)
  • src/plugin_system/store_manager.py
  • test/test_store_manager_caches.py

perf(plugins): drop redundant per-plugin manifest.json fetch in search_plugins

Benchmarking the previous parallelization commit on a real Pi4 revealed
that the 10x speedup I expected was only ~1.1x. Profiling showed two
plugins (football-scoreboard, ledmatrix-flights) each spent 5 seconds
inside _fetch_manifest_from_github — not on the initial HTTP call, but
on the three retries in _http_get_with_retries with exponential backoff
after transient DNS failures. Even with the thread pool, those 5-second
tail latencies stayed in the wave and dominated wall time.

The per-plugin manifest fetch in search_plugins is redundant anyway.
The registry's plugins.json already carries ``description`` (it is
generated from each plugin's manifest by update_registry.py at release
time), and ``last_updated`` is filled in from the commit info that we
already fetch in the same loop. Dropping the manifest fetch eliminates
one of the three per-plugin HTTPS round trips entirely, which also
eliminates the DNS-retry tail.

The _fetch_manifest_from_github helper itself is preserved — it is
still used by the install path.

Tests unchanged (the search_plugins tests mock all three helpers and
still pass); this drop only affects the hot-path call sequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
src/plugin_system/store_manager.py (3)

84-89: Consider more precise type hint for _git_info_cache.

The type hint Dict[str, tuple] could be more specific to improve IDE support and documentation:

-        self._git_info_cache: Dict[str, tuple] = {}
+        self._git_info_cache: Dict[str, tuple[float, Dict[str, str]]] = {}

This is optional but would make the cache structure clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/store_manager.py` around lines 84 - 89, The _git_info_cache
annotation is too generic (Dict[str, tuple]); update it to a precise typing that
reflects the cached shape used by _get_local_git_info — for example Dict[str,
Tuple[float, GitInfoType]] or Dict[str, Tuple[float, Dict[str, Any]]] depending
on whether you store a custom GitInfo dataclass or a dict — and import/alias the
appropriate typing names (Tuple, Any or the GitInfo dataclass) so IDEs and
static checkers can infer the cache contents.

673-673: Add parentheses to clarify operator precedence.

The condition mixes and and or without parentheses, making the intended logic unclear to readers.

-        if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:
+        if len(filtered) == 1 or (not fetch_commit_info and len(filtered) < 4):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/store_manager.py` at line 673, The boolean condition mixing
"and" and "or" is ambiguous; update the conditional that currently reads "if
len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:" to use
parentheses around the intended grouping so precedence is explicit (referencing
the variables/operands filtered and fetch_commit_info). Modify the condition so
it clearly expresses whether the "not fetch_commit_info" should bind with the
second comparison (len(filtered) < 4) or the entire left-hand expression, e.g.,
by adding parentheses around the appropriate side, and keep the logic otherwise
unchanged.

67-67: Minor: Replace ambiguous multiplication sign with ASCII 'x'.

Line 67 contains a Unicode multiplication sign (×) which could cause confusion. Replace with ASCII x for clarity.

-        # minutes keeps the cache warm across a realistic session while
+        # minutes keeps the cache warm across a realistic session while

Actually, looking more carefully, the × appears in:

#     paid for ~3 HTTP requests × N plugins (30-60s serial).
Suggested fix
-        # minutes meant every fresh browse on
-        # a Pi4 paid for ~3 HTTP requests × N plugins (30-60s serial). 30
+        # minutes meant every fresh browse on
+        # a Pi4 paid for ~3 HTTP requests x N plugins (30-60s serial). 30
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/store_manager.py` at line 67, In
src/plugin_system/store_manager.py update the comment string that currently uses
a Unicode multiplication sign (`×`) — specifically the comment containing "paid
for ~3 HTTP requests × N plugins (30-60s serial)" — to use an ASCII 'x' instead
(e.g., "paid for ~3 HTTP requests x N plugins (30-60s serial)") so the comment
is unambiguous; this change is only in the comment in the store_manager module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/plugin_system/store_manager.py`:
- Around line 84-89: The _git_info_cache annotation is too generic (Dict[str,
tuple]); update it to a precise typing that reflects the cached shape used by
_get_local_git_info — for example Dict[str, Tuple[float, GitInfoType]] or
Dict[str, Tuple[float, Dict[str, Any]]] depending on whether you store a custom
GitInfo dataclass or a dict — and import/alias the appropriate typing names
(Tuple, Any or the GitInfo dataclass) so IDEs and static checkers can infer the
cache contents.
- Line 673: The boolean condition mixing "and" and "or" is ambiguous; update the
conditional that currently reads "if len(filtered) == 1 or not fetch_commit_info
and len(filtered) < 4:" to use parentheses around the intended grouping so
precedence is explicit (referencing the variables/operands filtered and
fetch_commit_info). Modify the condition so it clearly expresses whether the
"not fetch_commit_info" should bind with the second comparison (len(filtered) <
4) or the entire left-hand expression, e.g., by adding parentheses around the
appropriate side, and keep the logic otherwise unchanged.
- Line 67: In src/plugin_system/store_manager.py update the comment string that
currently uses a Unicode multiplication sign (`×`) — specifically the comment
containing "paid for ~3 HTTP requests × N plugins (30-60s serial)" — to use an
ASCII 'x' instead (e.g., "paid for ~3 HTTP requests x N plugins (30-60s
serial)") so the comment is unambiguous; this change is only in the comment in
the store_manager module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5cc6e985-b28e-4198-bdfe-1474e7ede053

📥 Commits

Reviewing files that changed from the base of the PR and between c03eb8d and 5ed48d5.

📒 Files selected for processing (1)
  • src/plugin_system/store_manager.py

Chuck and others added 2 commits April 8, 2026 11:21
Regression guard for the caching and tombstone changes in this PR:

- ``install_plugin`` must not be gated by the uninstall tombstone. The
  tombstone only exists to keep the state reconciler from resurrecting a
  freshly-uninstalled plugin; explicit user-initiated installs via the
  store UI go straight to ``install_plugin()`` and must never be blocked.
  Test: mark a plugin recently uninstalled, stub out the download, call
  ``install_plugin``, and assert the download step was reached.

- ``get_plugin_info(force_refresh=True)`` must forward force_refresh
  through to both ``_get_latest_commit_info`` and ``_fetch_manifest_from_github``,
  so that install_plugin and update_plugin (both of which call
  get_plugin_info with force_refresh=True) continue to bypass the 30-min
  cache TTLs introduced in c03eb8d. Without this, bumping the commit
  cache TTL could cause users to install or update to a commit older than
  what GitHub actually has.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…stry error propagation, payload hardening

Three real bugs surfaced by review, plus one nitpick. Each was verified
against the current code before fixing.

1. fetch_registry silently swallowed network errors, breaking the
   reconciler (CONFIRMED BUG).

   The stale-cache fallback I added in c03eb8d made fetch_registry
   return {"plugins": []} on network failure when no cache existed —
   which is exactly the state on a fresh boot with flaky WiFi. The
   reconciler's _auto_repair_missing_plugin code assumed an exception
   meant "transient, don't mark unrecoverable" and expected to never
   see a silent empty-dict result. With the silent fallback in place
   on a fresh boot, it would see "no candidates in registry" and
   mark every config-referenced plugin permanently unrecoverable.

   Fix: add ``raise_on_failure: bool = False`` to fetch_registry. UI
   callers keep the stale-cache-fallback default. The reconciler's
   _auto_repair_missing_plugin now calls it with raise_on_failure=True
   so it can distinguish a genuine registry miss from a network error.
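A minimal sketch of the ``raise_on_failure`` pattern, assuming a ``(payload, timestamp)`` style cache; the real fetch_registry lives in store_manager.py and catches ``requests.RequestException`` rather than the ``OSError`` stand-in used here:

```python
import json
import time


class RegistryClient:
    """Hypothetical illustration of fetch_registry's raise_on_failure flag."""

    def __init__(self, http_get):
        self._http_get = http_get  # injected for testing
        self._cache = None
        self._cache_time = 0.0

    def fetch_registry(self, raise_on_failure: bool = False) -> dict:
        try:
            payload = self._http_get()
            self._cache, self._cache_time = payload, time.time()
            return payload
        except (OSError, json.JSONDecodeError):
            if raise_on_failure:
                raise  # the reconciler needs to see the network error
            if self._cache is not None:
                return self._cache  # UI callers fall back to stale data
            return {"plugins": []}  # last-resort default for the UI only
```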

2. Uninstall was not transactional (CONFIRMED BUG).

   Two distinct failure modes silently left the system in an
   inconsistent state:

   (a) If ``cleanup_plugin_config`` raised, the code logged a warning
       and proceeded to delete files anyway, leaving an orphan install
       with no config entry.
   (b) If ``uninstall_plugin`` returned False or raised AFTER cleanup
       had already succeeded, the config was gone but the files were
       still on disk — another orphan state.

   Fix: introduce ``_do_transactional_uninstall`` shared by both the
   queue and direct paths. Flow:
     - snapshot plugin's entries in main config + secrets
     - cleanup_plugin_config; on failure, ABORT before touching files
     - uninstall_plugin; on failure, RESTORE the snapshot, then raise
   Both queue and direct endpoints now delegate to this helper and
   surface clean errors to the user instead of proceeding past failure.
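The flow can be sketched like this; the callables are injected stand-ins for cleanup_plugin_config and uninstall_plugin, not the real helpers:

```python
def transactional_uninstall(plugin_id, config, cleanup_config, remove_files):
    """Sketch of the _do_transactional_uninstall flow described above."""
    # snapshot the plugin's config entries before touching anything
    snapshot = dict(config.get(plugin_id, {}))
    # cleanup first: if this raises, we ABORT before any files are touched,
    # so no orphan install is left behind
    cleanup_config(plugin_id)
    try:
        if not remove_files(plugin_id):
            raise RuntimeError(f"file removal failed for {plugin_id}")
    except Exception:
        # file removal failed after cleanup succeeded: RESTORE the
        # snapshot so config and disk stay consistent, then surface it
        config[plugin_id] = snapshot
        raise
```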

3. /plugins/state/reconcile crashed on non-object JSON bodies
   (CONFIRMED BUG).

   The previous code did ``payload.get('force', False)`` after
   ``request.get_json(silent=True) or {}``. If a client sent a bare
   string or array as the JSON body, payload would be that string or
   list and .get() would raise AttributeError. Separately,
   ``bool("false")`` is True, so string-encoded booleans were
   mis-handled.

   Fix: guard ``isinstance(payload, dict)`` and route the value
   through the existing ``_coerce_to_bool`` helper.
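The guard amounts to the following; the body of ``_coerce_to_bool`` here is a hypothetical reconstruction (the real helper already exists in the codebase):

```python
def _coerce_to_bool(value) -> bool:
    """Hypothetical stand-in for the existing helper."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # bool("false") is True, so string-encoded booleans need parsing
        return value.strip().lower() in ("1", "true", "yes", "on")
    return bool(value)


def parse_force(payload) -> bool:
    # request.get_json(silent=True) can return a bare string, list, or
    # None; .get() only exists on dicts, so check before calling it
    if not isinstance(payload, dict):
        return False
    return _coerce_to_bool(payload.get("force", False))
```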

4. Nitpick: use ``assert_called_once_with`` in
   test_force_reconcile_clears_unrecoverable_cache. The existing test
   worked in practice (we call reset_mock right before) but the stricter
   assertion catches any future regression where force=True might
   double-fire the install.

Tests added (19 new, 48 total passing):

- TestFetchRegistryRaiseOnFailure (4): flag propagates both
  RequestException and JSONDecodeError, wins over stale cache, and
  the default behavior is unchanged for existing callers.
- test_real_store_manager_empty_registry_on_network_failure (1): the
  key regression test — uses the REAL PluginStoreManager (not a Mock)
  with ConnectionError at the HTTP helper layer, and verifies the
  reconciler does NOT poison _unrecoverable_missing_on_disk.
- TestTransactionalUninstall (4): cleanup failure aborts before file
  removal; file removal failure (both False return and raise) restores
  the config snapshot; happy path still succeeds.
- TestReconcileEndpointPayload (8): bare string / array / null JSON
  bodies, missing force key, boolean true/false, and string-encoded
  "true"/"false" all handled correctly.

All 342 tests in the broader sweep still pass (2 pre-existing
TestDottedKeyNormalization failures reproduce on main and are unrelated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
src/plugin_system/state_reconciliation.py (1)

89-97: Good design for preventing infinite reconciliation loops.

The docstring clearly explains the purpose and lifecycle of the cache. Consider adding a type hint for better IDE support.

💡 Optional: Add type hint for clarity
-        self._unrecoverable_missing_on_disk: set = set()
+        self._unrecoverable_missing_on_disk: set[str] = set()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/state_reconciliation.py` around lines 89 - 97, Add an
explicit type hint to the _unrecoverable_missing_on_disk attribute to improve
IDE/type-checker support: change its annotation from "set" to a concrete element
type such as "set[str]" (i.e., _unrecoverable_missing_on_disk: set[str] = set())
so the attribute reflects it contains plugin ID strings and tools can infer
element types when used elsewhere.
test/test_store_manager_caches.py (1)

206-215: The wall-time bound here is likely to flake on Pi/CI.

elapsed < 0.2 depends on scheduler noise and machine load, so this can fail even when the thread pool is overlapping work correctly. You already track peak["count"]; that overlap signal is the stable part. Either loosen the threshold substantially or switch to a coordination-based overlap check. As per coding guidelines, "Ensure tests are compatible with Raspberry Pi environment".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_store_manager_caches.py` around lines 206 - 215, The elapsed-time
assertion in the test (using t0/elapsed around self.sm.search_plugins) is flaky
on low-power/CI devices; remove or relax the hard wall-time check and rely on
the deterministic overlap signal peak["count"] instead (assert peak["count"] >=
2 via the existing peak variable) or, if you must keep a timing guard,
substantially loosen it (e.g., to 0.5–1.0s) and/or replace it with a
coordination-based check that explicitly waits for concurrent worker entry/exit
events; update the assertions around search_plugins, t0/elapsed, and
peak["count"] accordingly so the test no longer fails on Raspberry Pi/CI due to
scheduler noise.
src/plugin_system/store_manager.py (1)

358-365: Serving stale data without a retry cooldown can still put the UI on the slow path.

These branches return stale cache but leave the entry expired, so once the TTL has passed every request still blocks on a fresh network attempt before returning the same stale payload. During a longer GitHub/registry outage that reintroduces slow store/list responses. Consider refreshing the stale timestamp or keeping a short failure-backoff separately. As per coding guidelines, "Implement graceful degradation to continue operation when non-critical features fail".

Also applies to: 390-395, 554-556, 562-563

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin_system/store_manager.py` around lines 358 - 365, When returning a
stale entry after a fetch error (the branches that check cache_key in
self.github_cache and return stale), update the cache metadata to avoid repeated
immediate retries: either refresh the stored timestamp/TTL on
self.github_cache[cache_key] (set a short backoff expiry like now +
failure_backoff_seconds) or maintain a separate failure_backoff map (keyed by
cache_key) with a next_retry timestamp and check it before attempting network
fetch; apply this change in the blocks that access self.github_cache and return
stale (the snippet using cache_key, stale, req_err and the similar branches
mentioned) so subsequent requests serve the stale payload without blocking until
the backoff expires.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 793-808: In the commit fetch path in store_manager.py (the code
handling requests.get and response handling that uses commit_info_cache and
cache_key), don’t overwrite an existing good cache entry with None on retryable
non-200 responses; instead treat HTTP errors like the RequestException branch:
if response.status_code is not 200 and cache_key exists in
self.commit_info_cache, log a warning and return the stale value; otherwise set
last_error/continue without assigning None into self.commit_info_cache. Ensure
this applies to the block that inspects response.status_code (the same logic
referenced around lines 829–839) so only a successful 200 replaces the cache
entry.
- Around line 1703-1714: The cache currently only checks .git/HEAD mtime
(head_file) which doesn't change on fast-forward; instead read HEAD contents in
the code around plugin_path/head_file and if it begins with "ref: " resolve that
ref to the corresponding ref file under git_dir (or fall back to looking up
packed-refs) and use the resolved ref's mtime or the actual resolved SHA as the
cache key/value stored in self._git_info_cache; apply the same fix to the
duplicate logic near lines 1773-1774 so the cache is invalidated when the branch
ref (or packed ref) updates rather than only when HEAD changes.

In `@test/test_store_manager_caches.py`:
- Around line 366-369: The test currently swallows all exceptions from
self.sm.install_plugin("bar") which masks real failures; change the test to
either stub/mock the remaining install steps so install_plugin("bar") completes
normally (e.g., mock _install_via_git or the methods it calls on StoreManager to
return expected values) or explicitly assert the specific exception you expect
(catch only that exception type and assert its message), and remove the broad
`except Exception: pass` so unexpected errors surface during testing.

In `@web_interface/blueprints/api_v3.py`:
- Around line 2916-2954: Wrap the unload and uninstall steps in the same
rollback boundary: move the call to
api_v3.plugin_manager.unload_plugin(plugin_id) into the try/except around
api_v3.plugin_store_manager.uninstall_plugin(plugin_id), record whether the
plugin was loaded before unloading, and on any exception or a False return call
_restore_plugin_config(plugin_id, snapshot) and, if the plugin manager exists
and the plugin was previously loaded, call
api_v3.plugin_manager.load_plugin(plugin_id) (or the appropriate load method) to
restore runtime state; also catch and log exceptions from unload so they trigger
the same rollback path.

---

Nitpick comments:
In `@src/plugin_system/state_reconciliation.py`:
- Around line 89-97: Add an explicit type hint to the
_unrecoverable_missing_on_disk attribute to improve IDE/type-checker support:
change its annotation from "set" to a concrete element type such as "set[str]"
(i.e., _unrecoverable_missing_on_disk: set[str] = set()) so the attribute
reflects it contains plugin ID strings and tools can infer element types when
used elsewhere.

In `@src/plugin_system/store_manager.py`:
- Around line 358-365: When returning a stale entry after a fetch error (the
branches that check cache_key in self.github_cache and return stale), update the
cache metadata to avoid repeated immediate retries: either refresh the stored
timestamp/TTL on self.github_cache[cache_key] (set a short backoff expiry like
now + failure_backoff_seconds) or maintain a separate failure_backoff map (keyed
by cache_key) with a next_retry timestamp and check it before attempting network
fetch; apply this change in the blocks that access self.github_cache and return
stale (the snippet using cache_key, stale, req_err and the similar branches
mentioned) so subsequent requests serve the stale payload without blocking until
the backoff expires.

In `@test/test_store_manager_caches.py`:
- Around line 206-215: The elapsed-time assertion in the test (using t0/elapsed
around self.sm.search_plugins) is flaky on low-power/CI devices; remove or relax
the hard wall-time check and rely on the deterministic overlap signal
peak["count"] instead (assert peak["count"] >= 2 via the existing peak variable)
or, if you must keep a timing guard, substantially loosen it (e.g., to 0.5–1.0s)
and/or replace it with a coordination-based check that explicitly waits for
concurrent worker entry/exit events; update the assertions around
search_plugins, t0/elapsed, and peak["count"] accordingly so the test no longer
fails on Raspberry Pi/CI due to scheduler noise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2057d87c-aef0-4870-a4d6-e62084b9ea86

📥 Commits

Reviewing files that changed from the base of the PR and between 5ed48d5 and 2e5d079.

📒 Files selected for processing (6)
  • src/plugin_system/state_reconciliation.py
  • src/plugin_system/store_manager.py
  • test/test_store_manager_caches.py
  • test/test_uninstall_and_reconcile_endpoint.py
  • test/web_interface/test_state_reconciliation.py
  • web_interface/blueprints/api_v3.py
✅ Files skipped from review due to trivial changes (1)
  • test/web_interface/test_state_reconciliation.py

Chuck and others added 2 commits April 8, 2026 11:56
Four small cleanups, each verified against current code:

1. ``_git_info_cache`` type annotation was ``Dict[str, tuple]`` — too
   loose. Tightened to ``Dict[str, Tuple[float, Dict[str, str]]]`` to
   match what ``_get_local_git_info`` actually stores (mtime + the
   sha/short_sha/branch/... dict it returns). Added ``Tuple`` to the
   typing imports.

2. The ``search_plugins`` early-return condition
   ``if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4``
   parses correctly under Python's precedence (``and`` > ``or``) but is
   visually ambiguous. Added explicit parentheses to make the intent —
   "single plugin, OR small batch that doesn't need commit info" —
   obvious at a glance. Semantics unchanged.

3. Replaced a Unicode multiplication sign (×) with ASCII 'x' in the
   commit_cache_timeout comment.

4. Removed a dead ``concurrent_workers = []`` declaration from
   ``test_enrichment_runs_concurrently``. It was left over from an
   earlier sketch of the concurrency check — the final test uses only
   ``peak_lock`` and ``peak``.

All 48 tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…back

Verified each finding against the current code. All four inline issues
were real bugs; nitpicks 5-7 were valid improvements.

1. _get_latest_commit_info overwrote a good cached value with None on
   all-branches-404 (CONFIRMED BUG).

   The final line of the branch loop unconditionally wrote
   ``self.commit_info_cache[cache_key] = (time.time(), None)``, which
   clobbered any previously-good entry on a single transient failure
   (e.g. an odd 5xx, a temporary DNS hiccup during the branches_to_try
   loop). Fix: if there's already a good prior value, bump its
   timestamp into the backoff window and return it instead. Only
   cache None when we never had a good value.

2. _get_local_git_info cache did not invalidate on fast-forward
   (CONFIRMED BUG).

   Caching on ``.git/HEAD`` mtime alone is wrong: a ``git pull`` that
   fast-forwards the current branch updates ``.git/refs/heads/<branch>``
   (or packed-refs) but leaves HEAD's contents and mtime untouched.
   The cache would then serve a stale SHA indefinitely.

   Fix: introduce ``_git_cache_signature`` which reads HEAD contents,
   resolves ``ref: refs/heads/<name>`` to the corresponding loose ref
   file, and builds a signature tuple of (head_contents, head_mtime,
   resolved_ref_mtime, packed_refs_mtime). A fast-forward bumps the
   ref file's mtime, which invalidates the signature and re-runs git.

3. test_install_plugin_is_not_blocked_by_tombstone swallowed all
   exceptions (CONFIRMED BUG in test).

   ``try: self.sm.install_plugin("bar") except Exception: pass`` could
   hide a real regression in install_plugin that happens to raise.
   Fix: the test now writes a COMPLETE valid manifest stub (id, name,
   class_name, display_modes, entry_point) and stubs _install_dependencies,
   so install_plugin runs all the way through and returns True. The
   assertion is now ``assertTrue(result)`` with no exception handling.

4. Uninstall rollback missed unload/reload (CONFIRMED BUG).

   Previous flow: cleanup → unload (outside try/except) → uninstall →
   rollback config on failure. Problem: if ``unload_plugin`` raised,
   the exception propagated without restoring config. And if
   ``uninstall_plugin`` failed after a successful unload, the rollback
   restored config but left the plugin unloaded at runtime —
   inconsistent.

   Fix: record ``was_loaded`` before touching runtime state, wrap
   ``unload_plugin`` in the same try/except that covers
   ``uninstall_plugin``, and on any failure call a ``_rollback`` local
   that (a) restores the config snapshot and (b) calls
   ``load_plugin`` to reload the plugin if it was loaded before we
   touched it.

5. Nitpick: ``_unrecoverable_missing_on_disk: set`` → ``Set[str]``.
   Matches the existing ``Dict``/``List`` style in state_reconciliation.py.

6. Nitpick: stale-cache fallbacks in _get_github_repo_info and
   _get_latest_commit_info now bump the cached entry's timestamp by a
   60s failure backoff. Without this, a cache entry whose TTL just
   expired would cause every subsequent request to re-hit the network
   until it came back, amplifying the failure. Introduced
   ``_record_cache_backoff`` helper and applied it consistently.
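The backoff idea, sketched against an assumed ``(timestamp, value)`` cache layout (the real helper's signature may differ):

```python
import time


def record_cache_backoff(cache: dict, key, ttl: float, value,
                         backoff_seconds: float = 60.0) -> None:
    """Re-stamp a stale entry so its TTL expires backoff_seconds from
    now instead of immediately, so a transient failure is not amplified
    by every subsequent request re-hitting the network."""
    cache[key] = (time.time() - ttl + backoff_seconds, value)


def is_fresh(cache: dict, key, ttl: float) -> bool:
    ts, _ = cache[key]
    return (time.time() - ts) < ttl
```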

7. Nitpick: replaced the flaky wall-time assertion in
   test_enrichment_runs_concurrently with just the deterministic
   ``peak["count"] >= 2`` signal. ``peak["count"]`` can only exceed 1
   if two workers were inside the critical section simultaneously,
   which is definitive proof of parallelism. The wall-time check was
   tight enough (<200ms) to occasionally fail on CI / low-power boxes.
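The ``peak["count"]`` pattern looks roughly like this; ``worker`` is a stand-in for the instrumented enrichment callable in the real test:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# peak["count"] can only exceed 1 if two workers were inside the
# critical section simultaneously -- definitive proof of parallelism,
# with no wall-clock assertion to flake on a loaded Pi or CI box.
peak_lock = threading.Lock()
active = {"now": 0}
peak = {"count": 0}


def worker(_):
    with peak_lock:
        active["now"] += 1
        peak["count"] = max(peak["count"], active["now"])
    time.sleep(0.05)  # hold the overlap window open
    with peak_lock:
        active["now"] -= 1


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(worker, range(8)))
```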

Tests (6 new, 54 total passing):

- test_cache_invalidates_on_fast_forward_of_current_branch: builds a
  loose-ref layout under a temp .git/, verifies a first call populates
  the cache, a second call with unchanged state hits the cache, and a
  simulated fast-forward (overwriting ``.git/refs/heads/main`` with a
  new SHA and mtime) correctly re-runs git.
- test_commit_info_preserves_good_cache_on_all_branches_404: seeds a
  good cached entry, mocks requests.get to always return 404, and
  verifies the cache still contains the good value afterwards.
- test_repo_info_stale_bumps_timestamp_into_backoff: seeds an expired
  cache, triggers a ConnectionError, then verifies a second lookup
  does NOT re-hit the network (proves the timestamp bump happened).
- test_repo_info_stale_on_403_also_backs_off: same for the 403 path.
- test_file_removal_failure_reloads_previously_loaded_plugin:
  plugin starts loaded, uninstall_plugin returns False, asserts
  load_plugin was called during rollback.
- test_unload_failure_restores_config_and_does_not_call_uninstall:
  unload_plugin raises, asserts uninstall_plugin was never called AND
  config was restored AND load_plugin was NOT called (runtime state
  never changed, so no reload needed).

Broader test sweep: 348/348 pass (2 pre-existing
TestDottedKeyNormalization failures reproduce on main, unrelated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/plugin_system/store_manager.py (1)

834-898: ⚠️ Potential issue | 🟠 Major

Don't serve stale commit SHAs on a forced refresh.

The update path asks for force_refresh=True before deciding the local checkout is already current. Returning a cached commit from these stale-on-error branches can make that check short-circuit and skip git pull even when upstream has moved. Keep the stale fallback for UI callers, but disable it when force_refresh is true, or split it behind an explicit allow_stale_on_error flag.

🛠️ Suggested change
-                    if cache_key in self.commit_info_cache:
+                    if (not force_refresh) and cache_key in self.commit_info_cache:
                         _, stale = self.commit_info_cache[cache_key]
                         if stale is not None:
                             self._record_cache_backoff(
                                 self.commit_info_cache, cache_key,
                                 self.commit_cache_timeout, stale,
@@
-            if cache_key in self.commit_info_cache:
+            if (not force_refresh) and cache_key in self.commit_info_cache:
                 _, prior = self.commit_info_cache[cache_key]
                 if prior is not None:
                     self._record_cache_backoff(
                         self.commit_info_cache, cache_key,
                         self.commit_cache_timeout, prior,
Based on learnings: Plugin store always fetches latest versions from GitHub (releases/tags/manifest/commit) - GitHub is the source of truth for plugin versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 1755-1800: The cache signature built by _git_cache_signature lacks
the repository config, so add the .git/config file to the signature: attempt to
read git_dir / 'config' (both its mtime via .stat().st_mtime and its text via
read_text(...).strip() inside the same OSError-guarded pattern used for HEAD),
set config_mtime/config_contents to None on error, and append these values to
the returned tuple (e.g., return (head_contents, head_mtime, ref_mtime,
packed_refs_mtime, config_contents, config_mtime)) so config-only changes (like
remote.origin.url) invalidate the cache.
- Around line 587-603: In fetch_registry(), when catching
requests.RequestException or json.JSONDecodeError and returning
self.registry_cache (and only when raise_on_failure is False), bump
self.registry_cache_time forward by a short backoff to avoid repeatedly hitting
the network; add or reuse a SMALL_BACKOFF_SECONDS constant (e.g. 30) and set
self.registry_cache_time = time.time() + SMALL_BACKOFF_SECONDS before returning
self.registry_cache so the stale-cache path gets a short grace period similar to
the repo/commit cache logic.

In `@test/test_uninstall_and_reconcile_endpoint.py`:
- Around line 37-57: The _make_client() test helper mutates the shared api_v3
singleton by replacing attributes like config_manager, plugin_manager,
plugin_store_manager, plugin_state_manager, saved_repositories_manager,
schema_manager, operation_queue, operation_history, and cache_manager with
MagicMocks and never restores them; update _make_client() to snapshot the
original api_v3 attributes before overwriting and restore them after the test
(or use patch.object on each attribute and call addCleanup for each patch) so
the MagicMocks do not leak into later tests and the api_v3 singleton is returned
to its original state.

In `@web_interface/blueprints/api_v3.py`:
- Around line 2841-2851: The catch blocks around the snapshot logic for main
config and secrets currently use broad "except Exception as e"; replace those
with narrower exception types such as FileNotFoundError, IOError, and OSError
when checking/reading files to avoid masking programmer errors—update the two
exception handlers that follow the checks using
api_v3.config_manager.secrets_path and
api_v3.config_manager.get_raw_file_content('secrets') (the blocks referencing
plugin_id and secrets_config) so they only catch FileNotFoundError,
OSError/IOError (as appropriate) and keep the existing logger.warning calls
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 47dabdcb-4503-44b1-9d72-a8be18bb5a5e

📥 Commits

Reviewing files that changed from the base of the PR and between 2e5d079 and 80d8445.

📒 Files selected for processing (5)
  • src/plugin_system/state_reconciliation.py
  • src/plugin_system/store_manager.py
  • test/test_store_manager_caches.py
  • test/test_uninstall_and_reconcile_endpoint.py
  • web_interface/blueprints/api_v3.py

…isolation

All four findings verified as real issues against the current code.

1. _git_cache_signature was missing .git/config (CONFIRMED GAP).

   The cached ``result`` dict from _get_local_git_info includes
   ``remote_url``, which is read from ``.git/config``. But the cache
   signature only tracked HEAD + refs — so a config-only change (e.g.
   ``git remote set-url origin https://...``) would leave the stale
   URL cached indefinitely. This matters for the monorepo-migration
   detection in update_plugin.

   Fix: add ``config_contents`` and ``config_mtime`` to the signature
   tuple. Config reads use the same OSError-guarded pattern as the
   HEAD read.

2. fetch_registry stale fallback didn't bump registry_cache_time
   (CONFIRMED BUG).

   The other caches already had the failure-backoff pattern added in
   the previous review pass (via ``_record_cache_backoff``), but the
   registry cache's stale-fallback branches silently returned the
   cached payload without updating ``registry_cache_time``. Next
   request saw the same expired TTL, re-hit the network, failed
   again — amplifying the original transient failure.

   Fix: bump ``self.registry_cache_time`` forward by the existing
   ``self._failure_backoff_seconds`` (reused — no new constant
   needed) in both the RequestException and JSONDecodeError stale
   branches. Kept the ``raise_on_failure=True`` path untouched so the
   reconciler still gets the exception.

3. _make_client() in the uninstall/reconcile test helper leaked
   MagicMocks into the api_v3 singleton (CONFIRMED RISK).

   Every test call replaced api_v3.config_manager, .plugin_manager,
   .plugin_store_manager, etc. with MagicMocks and never restored them.
   If any later test in the same pytest run imported api_v3 expecting
   original state (or None), it would see the leftover mocks.

   Fix: _make_client now snapshots the original attributes (with a
   sentinel to distinguish "didn't exist" from "was None") and returns
   a cleanup callable. Both setUp methods call self.addCleanup(cleanup)
   so state is restored even if the test raises. On cleanup, sentinel
   entries trigger delattr rather than setattr to preserve the
   "attribute was never set" case.

4. Snapshot helpers used broad ``except Exception`` (CONFIRMED).

   _snapshot_plugin_config caught any exception from
   get_raw_file_content, which could hide programmer errors (TypeError,
   AttributeError) behind the "best-effort snapshot" fallback. The
   legitimate failure modes are filesystem errors (covered by OSError;
   FileNotFoundError is a subclass, IOError is an alias in Python 3)
   and ConfigError (what config_manager wraps all load failures in).

   Fix: narrow to ``(OSError, ConfigError)`` in both snapshot blocks.
   ConfigError was already imported at line 20 of api_v3.py.
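
   The narrowed snapshot block reduces to this shape (the function name and
   the local ConfigError stand-in are illustrative; the real ConfigError is
   the one imported in api_v3.py):

   ```python
   class ConfigError(Exception):
       """Stand-in for the ConfigError that config_manager raises."""

   def snapshot_plugin_config(get_raw_file_content, path):
       """Best-effort snapshot: tolerate expected failures, surface bugs."""
       try:
           return get_raw_file_content(path)
       except (OSError, ConfigError):
           # Legitimate failure modes only: filesystem errors (OSError
           # covers FileNotFoundError; IOError is an alias in Python 3)
           # and config-load failures. A TypeError or AttributeError from
           # a programmer mistake now propagates instead of vanishing
           # behind the fallback.
           return None
   ```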

Tests added (4 new, 58 total passing):

- test_cache_invalidates_on_git_config_change: builds a realistic
  loose-ref layout, writes .git/config with an "old" remote URL,
  exercises _get_local_git_info, then rewrites .git/config with a
  "new" remote URL + new mtime, calls again, and asserts the cache
  invalidated and returned the new URL.
- test_stale_fallback_bumps_timestamp_into_backoff: seeds an expired
  registry cache, triggers ConnectionError, verifies first call
  serves stale, then asserts a second call makes ZERO new HTTP
  requests (proves registry_cache_time was bumped forward).
- test_snapshot_survives_config_read_error: raises ConfigError from
  get_raw_file_content and asserts the uninstall still completes
  successfully — the narrow exception list still catches this case.
- test_snapshot_does_not_swallow_programmer_errors: raises a
  TypeError from get_raw_file_content (not in the narrow list) and
  asserts it propagates up to a 500, AND that uninstall_plugin was
  never called (proves the exception was caught at the right level).

Broader test sweep: 352/352 pass (2 pre-existing
TestDottedKeyNormalization failures reproduce on main, unrelated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChuckBuilds ChuckBuilds merged commit 39ccdcf into main Apr 8, 2026
1 check passed
@ChuckBuilds ChuckBuilds deleted the fix/reconciliation-loop-and-plugin-perf branch April 8, 2026 17:12