fix(plugins): stop reconciliation install loop, slow plugin list, uninstall resurrection #309
Conversation
Three interacting bugs reported by a user (Discord/ericepe) on a fresh install:

1. The state reconciler retried failed auto-repairs on every HTTP request, pegging CPU and flooding logs with "Plugin not found in registry: github / youtube". Root cause: ``_run_startup_reconciliation`` reset ``_reconciliation_started`` to False on any unresolved inconsistency, so ``@app.before_request`` re-fired the entire pass on the next request. Fix: run reconciliation exactly once per process; cache per-plugin unrecoverable failures inside the reconciler so even an explicit re-trigger stays cheap; add a registry pre-check to skip the expensive GitHub fetch when we already know the plugin is missing; expose ``force=True`` on ``/plugins/state/reconcile`` so users can retry after fixing the underlying issue.

2. Uninstalling a plugin via the UI succeeded but the plugin reappeared. Root cause: a race between ``store_manager.uninstall_plugin`` (removes files) and ``cleanup_plugin_config`` (removes config entry) — if reconciliation fired in the gap it saw "config entry with no files" and reinstalled. Fix: reorder uninstall to clean config FIRST, drop a short-lived "recently uninstalled" tombstone on the store manager that the reconciler honors, and pass ``store_manager`` to the manual ``/plugins/state/reconcile`` endpoint (it was previously omitted, which silently disabled auto-repair entirely).

3. ``GET /plugins/installed`` was very slow on a Pi4 (UI hung on "connecting to display" for minutes, ~98% CPU). Root causes: per-request ``discover_plugins()`` + manifest re-read + four ``git`` subprocesses per plugin (``rev-parse``, ``--abbrev-ref``, ``config``, ``log``). Fix: mtime-gate ``discover_plugins()`` and drop the per-plugin manifest re-read in the endpoint; cache ``_get_local_git_info`` keyed on ``.git/HEAD`` mtime so subprocesses only run when the working copy actually moved; bump registry cache TTL from 5 to 15 minutes and fall back to stale cache on transient network failure.

Tests: 16 reconciliation cases (including 5 new ones covering the unrecoverable cache, force-reconcile path, transient-failure handling, and recently-uninstalled tombstone) and 8 new store_manager cache tests covering tombstone TTL, git-info mtime cache hit/miss, and the registry stale-cache fallback. All 24 pass; the broader 288-test suite continues to pass with no new failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
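The "recently uninstalled" tombstone in fix 2 can be sketched as a timestamp map with a TTL check. Everything here (the class name, `UNINSTALL_TOMBSTONE_TTL`, the 60-second value) is illustrative, not the actual implementation:

```python
import time

UNINSTALL_TOMBSTONE_TTL = 60.0  # seconds; illustrative value, not the real one


class TombstoneSketch:
    """Sketch of the short-lived uninstall tombstone the reconciler honors."""

    def __init__(self):
        self._recently_uninstalled = {}  # plugin_id -> uninstall timestamp

    def mark_uninstalled(self, plugin_id):
        # Called by the uninstall path right after the config entry is cleaned.
        self._recently_uninstalled[plugin_id] = time.monotonic()

    def was_recently_uninstalled(self, plugin_id):
        # The reconciler consults this before treating "config entry with
        # no files" as an inconsistency that needs auto-repair.
        ts = self._recently_uninstalled.get(plugin_id)
        if ts is None:
            return False
        if time.monotonic() - ts > UNINSTALL_TOMBSTONE_TTL:
            del self._recently_uninstalled[plugin_id]  # expired tombstone
            return False
        return True
```

Because the tombstone expires on its own, a plugin the user genuinely reinstalls later is still eligible for auto-repair.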
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough
Adds a per-process "unrecoverable missing" cache and a …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant StateRec as StateReconciliation
    participant StoreMgr as PluginStoreManager
    participant Registry as Plugin Registry
    participant Disk as File System
    Client->>StateRec: reconcile_state(force=False)
    alt force=True
        StateRec->>StateRec: clear unrecoverable cache
    end
    StateRec->>StateRec: check unrecoverable cache for plugin_id
    alt in unrecoverable cache
        StateRec-->>Client: report MANUAL_FIX_REQUIRED (skip auto-repair)
    else not cached
        StateRec->>StoreMgr: was_recently_uninstalled(plugin_id)?
        StoreMgr-->>StateRec: true/false
        alt recently uninstalled
            StateRec-->>Client: skip auto-repair
        else not recently uninstalled
            StateRec->>StoreMgr: fetch_registry(raise_on_failure=True)
            StoreMgr->>Registry: network request (may return cached fallback)
            Registry-->>StoreMgr: registry data / error
            StoreMgr-->>StateRec: registry data or stale/error
            StateRec->>StateRec: is plugin in registry?
            alt not in registry (definitive)
                StateRec->>StateRec: mark plugin unrecoverable
                StateRec-->>Client: report MANUAL_FIX_REQUIRED
            else in registry
                StateRec->>StoreMgr: install_plugin(plugin_id)
                StoreMgr->>Disk: download & install
                alt install succeeds
                    StoreMgr-->>StateRec: install success
                else install fails
                    StateRec->>StateRec: mark plugin unrecoverable
                    StoreMgr-->>StateRec: install failed
                end
            end
        end
    end
```
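The gating order in the diagram (run-once flag, unrecoverable cache, tombstone, registry pre-check) can be condensed into a minimal sketch. The class, return values, and the registry-as-a-set simplification are illustrative; the real reconciler fetches the registry over the network and distinguishes transient errors:

```python
class ReconcilerSketch:
    """Illustrative gating logic only; not the project's StateReconciliation."""

    def __init__(self, store_manager, registry_plugins):
        self._store = store_manager
        self._registry = set(registry_plugins)  # stand-in for fetch_registry()
        self._unrecoverable = set()  # per-process cache of known-bad plugin ids
        self._ran_once = False

    def reconcile(self, plugin_ids, force=False):
        if self._ran_once and not force:
            return {}  # run exactly once per process unless explicitly forced
        if force:
            self._unrecoverable.clear()  # let a manual retry re-attempt repair
        self._ran_once = True
        results = {}
        for pid in plugin_ids:
            if pid in self._unrecoverable:
                results[pid] = "MANUAL_FIX_REQUIRED"  # cached: skip expensive work
            elif self._store.was_recently_uninstalled(pid):
                results[pid] = "SKIPPED"  # honor the uninstall tombstone
            elif pid not in self._registry:
                self._unrecoverable.add(pid)  # definitive registry miss: cache it
                results[pid] = "MANUAL_FIX_REQUIRED"
            elif self._store.install_plugin(pid):
                results[pid] = "REPAIRED"
            else:
                self._unrecoverable.add(pid)  # install failed: don't retry
                results[pid] = "MANUAL_FIX_REQUIRED"
        return results
```

The key property is that a second pass is free unless `force=True`, and even a forced pass only re-attempts plugins whose failure cause may have changed.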
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
test/web_interface/test_state_reconciliation.py (1)
414-433: Consider using `assert_called_once_with` for stricter verification.
`assert_called_with` only verifies the arguments of the most recent call without checking call count. Using `assert_called_once_with` would catch potential bugs where `install_plugin` is invoked multiple times.
💡 Suggested improvement

```diff
-        self.store_manager.install_plugin.assert_called_with("ghost")
+        self.store_manager.install_plugin.assert_called_once_with("ghost")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/web_interface/test_state_reconciliation.py` around lines 414 - 433, In test_force_reconcile_clears_unrecoverable_cache replace the loose call assertion with a strict one: change the final assertion that inspects store_manager.install_plugin from assert_called_with("ghost") to assert_called_once_with("ghost") so the test verifies both the argument and that install_plugin was invoked exactly once on the reconciler re-run (refer to the test method name test_force_reconcile_clears_unrecoverable_cache and the mock method store_manager.install_plugin).
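The difference the nitpick is pointing at is easy to demonstrate with a bare `unittest.mock.Mock`:

```python
from unittest.mock import Mock

install_plugin = Mock()
install_plugin("ghost")
install_plugin("ghost")  # an unwanted second call

# assert_called_with inspects only the most recent call, so the
# duplicate invocation slips through unnoticed:
install_plugin.assert_called_with("ghost")  # passes

# assert_called_once_with also checks the call count:
try:
    install_plugin.assert_called_once_with("ghost")
    strict_check_failed = False
except AssertionError:
    strict_check_failed = True  # flagged: called twice, not once
```

This is why the stricter form catches a regression where `force=True` double-fires the install.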
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 504-516: The current registry-fetching method silently falls back
to self.registry_cache on requests.RequestException/json.JSONDecodeError which
hides network errors and breaks
state_reconciliation._auto_repair_missing_plugin's logic; modify the registry
fetch method to accept a raise_on_failure: bool (default False) parameter and
when True re-raise the caught exception instead of returning the stale cache
(i.e., in the except blocks for requests.RequestException and
json.JSONDecodeError, if raise_on_failure: raise), and update callers that need
to distinguish transient network failures—specifically call the fetch with
raise_on_failure=True from
state_reconciliation._auto_repair_missing_plugin—while keeping the default
behavior for UI callers that prefer the stale cache.
In `@web_interface/blueprints/api_v3.py`:
- Around line 2850-2857: The uninstall flow must be made transactional: stop the
uninstall if api_v3.config_manager.cleanup_plugin_config(plugin_id,
remove_secrets=True) raises an exception (do not proceed to file deletion) and
when calling PluginStoreManager.uninstall_plugin(...) if that call returns False
or raises, restore the previously saved config/secrets (the config snapshot you
took before cleanup) and surface/raise the error instead of silently logging and
continuing; update the code paths around cleanup_plugin_config, the
preserve_config check, and PluginStoreManager.uninstall_plugin to (1) take a
pre-cleanup snapshot of config/secrets, (2) abort and re-raise/log + return
failure when cleanup_plugin_config fails, and (3) on uninstall_plugin failure
restore the snapshot before returning/failing, logging clear context via logger
rather than proceeding.
- Around line 2390-2397: The code assumes request.get_json() returns a dict and
uses payload.get('force') which will raise on non-object JSON and mis-handle
string booleans; change the block that builds force so you first confirm payload
is a dict (after request.get_json(silent=True) or {}), then extract the value
using payload.get('force', False) only if isinstance(payload, dict), and pass
that value through the existing _coerce_to_bool(...) helper before calling
reconciler.reconcile_state(force=...), ensuring non-object JSON returns the
default False.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6712dc9a-b433-4303-abb3-dee3fcca67e8
📒 Files selected for processing (6)
src/plugin_system/state_reconciliation.py
src/plugin_system/store_manager.py
test/test_store_manager_caches.py
test/web_interface/test_state_reconciliation.py
web_interface/app.py
web_interface/blueprints/api_v3.py
…che TTLs

Follow-up to the previous commit addressing the Plugin Store browse path specifically. Most users install plugins via the store (ZIP extraction, no .git directory) so the git-info mtime cache from the previous commit was a no-op for them; their pain was coming from /plugins/store/list.

Root cause. search_plugins() enriched each returned plugin with three serial GitHub fetches: _get_github_repo_info (repo API), _get_latest_commit_info (commits API), _fetch_manifest_from_github (raw.githubusercontent.com). Fifteen plugins × three requests × serial HTTP = 30–45 sequential round trips on every cold browse. On a Pi4 over WiFi that translated directly into the "connecting to display" hang users reported. The commit and manifest caches had a 5-minute TTL, so even a brief absence re-paid the full cost.

Changes.
- ``search_plugins``: fan out per-plugin enrichment through a ``ThreadPoolExecutor`` (max 10 workers, stays well under unauthenticated GitHub rate limits). Apply category/tag/query filters before enrichment so we never waste requests on plugins that will be filtered out. ``executor.map`` preserves input order, which the UI depends on.
- ``commit_cache_timeout`` and ``manifest_cache_timeout``: 5 min → 30 min. Keeps the cache warm across a realistic session while still picking up upstream updates in a reasonable window.
- ``_get_github_repo_info`` and ``_get_latest_commit_info``: stale-on-error fallback. On a network failure or a 403 we now prefer a previously-cached value over the zero-default, matching the pattern already in ``fetch_registry``. Flaky Pi WiFi no longer causes star counts to flip to 0 and commit info to disappear.

Tests (5 new in test_store_manager_caches.py).
- ``test_results_preserve_registry_order`` — the parallel map must still return plugins in input order.
- ``test_filters_applied_before_enrichment`` — category/tag/query filters run first so we don't waste HTTP calls.
- ``test_enrichment_runs_concurrently`` — peak-concurrency check plus a wall-time bound that would fail if the code regressed to serial.
- ``test_repo_info_stale_on_network_error`` — repo info falls back to stale cache on RequestException.
- ``test_commit_info_stale_on_network_error`` — commit info falls back to stale cache on RequestException.

All 29 tests (16 reconciliation, 13 store_manager caches) pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
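The order-preservation guarantee the first test pins down is a property of `executor.map` itself: results come back in input order regardless of completion order. A standalone illustration (the `enrich` stand-in and its delays are artificial, not the project's enrichment code):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def enrich(plugin):
    # Simulate slow per-plugin HTTP enrichment; the first item finishes last.
    time.sleep(plugin["delay"])
    return {**plugin, "enriched": True}


plugins = [
    {"id": "a", "delay": 0.05},
    {"id": "b", "delay": 0.0},
    {"id": "c", "delay": 0.02},
]

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(enrich, plugins))

# Input order is preserved even though "b" and "c" completed before "a".
```

This is why the UI ordering survives the parallelization without any explicit re-sorting.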
🧹 Nitpick comments (2)
test/test_store_manager_caches.py (1)
186-189: Remove unused variable `concurrent_workers`.
This variable is defined but never used in the test.
🧹 Remove dead code

```diff
 import threading
-concurrent_workers = []
 peak_lock = threading.Lock()
 peak = {"count": 0, "current": 0}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/test_store_manager_caches.py` around lines 186 - 189, Remove the dead variable concurrent_workers declared alongside peak_lock and peak in the test (the unused symbol concurrent_workers); delete its declaration and any related unused assignment so only peak_lock and peak remain, leaving the threading import intact if still used by peak_lock or other code (search for concurrent_workers to ensure no remaining references).

src/plugin_system/store_manager.py (1)
668-678: Add parentheses to clarify operator precedence.
The condition on line 671 mixes `and` and `or` without parentheses, making the intent unclear. While `and` binds tighter than `or`, explicit parentheses improve readability and silence the linter warning.
✨ Suggested clarification

```diff
-        if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:
+        if len(filtered) == 1 or (not fetch_commit_info and len(filtered) < 4):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/store_manager.py` around lines 668 - 678, The conditional combining len(filtered), fetch_commit_info and logical operators in the early-return needs explicit parentheses to clarify precedence: update the if in store_manager (the block that returns [_enrich(p) for p in filtered]) to use parentheses around the intended subexpressions (e.g., group the single-item check and the fetch_commit_info+size check) so it reads unambiguously (refer to variables/functions: filtered, fetch_commit_info, and _enrich) before proceeding to the ThreadPoolExecutor branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 138d4a15-6842-4130-bac6-660ef20c6ea9
📒 Files selected for processing (2)
src/plugin_system/store_manager.py
test/test_store_manager_caches.py
…h_plugins

Benchmarking the previous parallelization commit on a real Pi4 revealed that the 10x speedup I expected was only ~1.1x. Profiling showed two plugins (football-scoreboard, ledmatrix-flights) each spent 5 seconds inside _fetch_manifest_from_github — not on the initial HTTP call, but on the three retries in _http_get_with_retries with exponential backoff after transient DNS failures. Even with the thread pool, those 5-second tail latencies stayed in the wave and dominated wall time.

The per-plugin manifest fetch in search_plugins is redundant anyway. The registry's plugins.json already carries ``description`` (it is generated from each plugin's manifest by update_registry.py at release time), and ``last_updated`` is filled in from the commit info that we already fetch in the same loop. Dropping the manifest fetch eliminates one of the three per-plugin HTTPS round trips entirely, which also eliminates the DNS-retry tail. The _fetch_manifest_from_github helper itself is preserved — it is still used by the install path.

Tests unchanged (the search_plugins tests mock all three helpers and still pass); this drop only affects the hot-path call sequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
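To see why the retry tail dominated the wave, consider a generic exponential-backoff model. The attempt count, base delay, and factor here are illustrative, not the actual `_http_get_with_retries` parameters:

```python
def worst_case_backoff(attempts, base_delay, factor=2.0):
    """Total sleep time if every attempt fails: base_delay * factor**i
    between consecutive attempts (illustrative model only)."""
    return sum(base_delay * factor ** i for i in range(attempts - 1))


# One initial try plus three retries with a hypothetical 1s base delay
# sleeps 1 + 2 + 4 = 7 seconds before the caller even sees the failure.
# A thread-pool wave is only as fast as its slowest worker, so one flaky
# DNS lookup stalls the entire /plugins/store/list response.
```

Removing the fetch entirely is therefore worth more than tuning the backoff, since the tail disappears along with the request.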
🧹 Nitpick comments (3)
src/plugin_system/store_manager.py (3)
84-89: Consider more precise type hint for `_git_info_cache`.
The type hint `Dict[str, tuple]` could be more specific to improve IDE support and documentation:

```diff
-        self._git_info_cache: Dict[str, tuple] = {}
+        self._git_info_cache: Dict[str, tuple[float, Dict[str, str]]] = {}
```

This is optional but would make the cache structure clearer.
🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/store_manager.py` around lines 84 - 89, The _git_info_cache annotation is too generic (Dict[str, tuple]); update it to a precise typing that reflects the cached shape used by _get_local_git_info — for example Dict[str, Tuple[float, GitInfoType]] or Dict[str, Tuple[float, Dict[str, Any]]] depending on whether you store a custom GitInfo dataclass or a dict — and import/alias the appropriate typing names (Tuple, Any or the GitInfo dataclass) so IDEs and static checkers can infer the cache contents.
673-673: Add parentheses to clarify operator precedence.
The condition mixes `and` and `or` without parentheses, making the intended logic unclear to readers.

```diff
-        if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:
+        if len(filtered) == 1 or (not fetch_commit_info and len(filtered) < 4):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/store_manager.py` at line 673, The boolean condition mixing "and" and "or" is ambiguous; update the conditional that currently reads "if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4:" to use parentheses around the intended grouping so precedence is explicit (referencing the variables/operands filtered and fetch_commit_info). Modify the condition so it clearly expresses whether the "not fetch_commit_info" should bind with the second comparison (len(filtered) < 4) or the entire left-hand expression, e.g., by adding parentheses around the appropriate side, and keep the logic otherwise unchanged.
67-67: Minor: Replace ambiguous multiplication sign with ASCII 'x'.
Line 67 contains a Unicode multiplication sign (`×`) which could cause confusion. Replace with ASCII `x` for clarity.

```diff
-    # minutes keeps the cache warm across a realistic session while
+    # minutes keeps the cache warm across a realistic session while
```

Actually, looking more carefully, the `×` appears in:

```
# paid for ~3 HTTP requests × N plugins (30-60s serial).
```

Suggested fix

```diff
-    # minutes meant every fresh browse on
-    # a Pi4 paid for ~3 HTTP requests × N plugins (30-60s serial). 30
+    # minutes meant every fresh browse on
+    # a Pi4 paid for ~3 HTTP requests x N plugins (30-60s serial). 30
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/store_manager.py` at line 67, In src/plugin_system/store_manager.py update the comment string that currently uses a Unicode multiplication sign (`×`) — specifically the comment containing "paid for ~3 HTTP requests × N plugins (30-60s serial)" — to use an ASCII 'x' instead (e.g., "paid for ~3 HTTP requests x N plugins (30-60s serial)") so the comment is unambiguous; this change is only in the comment in the store_manager module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5cc6e985-b28e-4198-bdfe-1474e7ede053
📒 Files selected for processing (1)
src/plugin_system/store_manager.py
Regression guard for the caching and tombstone changes in this PR:

- ``install_plugin`` must not be gated by the uninstall tombstone. The tombstone only exists to keep the state reconciler from resurrecting a freshly-uninstalled plugin; explicit user-initiated installs via the store UI go straight to ``install_plugin()`` and must never be blocked. Test: mark a plugin recently uninstalled, stub out the download, call ``install_plugin``, and assert the download step was reached.
- ``get_plugin_info(force_refresh=True)`` must forward force_refresh through to both ``_get_latest_commit_info`` and ``_fetch_manifest_from_github``, so that install_plugin and update_plugin (both of which call get_plugin_info with force_refresh=True) continue to bypass the 30-min cache TTLs introduced in c03eb8d. Without this, bumping the commit cache TTL could cause users to install or update to a commit older than what GitHub actually has.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
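The first guard can be sketched as a self-contained test. `StoreManagerSketch` and its `Mock`-stubbed `download` step are hypothetical stand-ins for the real `PluginStoreManager` internals:

```python
from unittest.mock import Mock


class StoreManagerSketch:
    """Illustrative stand-in: the uninstall tombstone must never gate
    an explicit, user-initiated install."""

    def __init__(self):
        self._tombstones = set()
        self.download = Mock(return_value=True)  # stubbed download step

    def mark_uninstalled(self, plugin_id):
        self._tombstones.add(plugin_id)  # consulted only by the reconciler

    def install_plugin(self, plugin_id):
        # Deliberately no tombstone check here: the tombstone exists solely
        # to stop reconciliation auto-repair, not store-UI installs.
        return self.download(plugin_id)


manager = StoreManagerSketch()
manager.mark_uninstalled("clock")
assert manager.install_plugin("clock") is True     # install proceeds
manager.download.assert_called_once_with("clock")  # download step was reached
```

The assertion on `download` mirrors the described test shape: mark recently uninstalled, stub the download, install, and confirm the download step was reached.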
…stry error propagation, payload hardening

Three real bugs surfaced by review, plus one nitpick. Each was verified against the current code before fixing.

1. fetch_registry silently swallowed network errors, breaking the reconciler (CONFIRMED BUG). The stale-cache fallback I added in c03eb8d made fetch_registry return {"plugins": []} on network failure when no cache existed — which is exactly the state on a fresh boot with flaky WiFi. The reconciler's _auto_repair_missing_plugin code assumed an exception meant "transient, don't mark unrecoverable" and expected to never see a silent empty-dict result. With the silent fallback in place on a fresh boot, it would see "no candidates in registry" and mark every config-referenced plugin permanently unrecoverable. Fix: add ``raise_on_failure: bool = False`` to fetch_registry. UI callers keep the stale-cache-fallback default. The reconciler's _auto_repair_missing_plugin now calls it with raise_on_failure=True so it can distinguish a genuine registry miss from a network error.

2. Uninstall was not transactional (CONFIRMED BUG). Two distinct failure modes silently left the system in an inconsistent state: (a) If ``cleanup_plugin_config`` raised, the code logged a warning and proceeded to delete files anyway, leaving an orphan install with no config entry. (b) If ``uninstall_plugin`` returned False or raised AFTER cleanup had already succeeded, the config was gone but the files were still on disk — another orphan state. Fix: introduce ``_do_transactional_uninstall`` shared by both the queue and direct paths. Flow:
- snapshot plugin's entries in main config + secrets
- cleanup_plugin_config; on failure, ABORT before touching files
- uninstall_plugin; on failure, RESTORE the snapshot, then raise
Both queue and direct endpoints now delegate to this helper and surface clean errors to the user instead of proceeding past failure.

3. /plugins/state/reconcile crashed on non-object JSON bodies (CONFIRMED BUG). The previous code did ``payload.get('force', False)`` after ``request.get_json(silent=True) or {}``. If a client sent a bare string or array as the JSON body, payload would be that string or list and .get() would raise AttributeError. Separately, ``bool("false")`` is True, so string-encoded booleans were mis-handled. Fix: guard ``isinstance(payload, dict)`` and route the value through the existing ``_coerce_to_bool`` helper.

4. Nitpick: use ``assert_called_once_with`` in test_force_reconcile_clears_unrecoverable_cache. The existing test worked in practice (we call reset_mock right before) but the stricter assertion catches any future regression where force=True might double-fire the install.

Tests added (19 new, 48 total passing):
- TestFetchRegistryRaiseOnFailure (4): flag propagates both RequestException and JSONDecodeError, wins over stale cache, and the default behavior is unchanged for existing callers.
- test_real_store_manager_empty_registry_on_network_failure (1): the key regression test — uses the REAL PluginStoreManager (not a Mock) with ConnectionError at the HTTP helper layer, and verifies the reconciler does NOT poison _unrecoverable_missing_on_disk.
- TestTransactionalUninstall (4): cleanup failure aborts before file removal; file removal failure (both False return and raise) restores the config snapshot; happy path still succeeds.
- TestReconcileEndpointPayload (8): bare string / array / null JSON bodies, missing force key, boolean true/false, and string-encoded "true"/"false" all handled correctly.

All 342 tests in the broader sweep still pass (2 pre-existing TestDottedKeyNormalization failures reproduce on main and are unrelated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
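The payload hardening in fix 3 reduces to two guards. This sketch is illustrative; `coerce_to_bool` here is an assumption about, not a copy of, the project's `_coerce_to_bool`:

```python
def coerce_to_bool(value):
    """Illustrative boolean coercion: handles real booleans and
    string-encoded ones, since bool("false") is True in Python."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in ("true", "1", "yes", "on")
    return bool(value)


def extract_force(payload):
    # A bare string/array/null JSON body is not a dict, so calling .get()
    # on it would raise AttributeError; default to False instead.
    if not isinstance(payload, dict):
        return False
    return coerce_to_bool(payload.get("force", False))
```

With these guards, `{"force": "false"}`, `"true"` as a bare body, and a JSON array all resolve to sensible booleans instead of crashing the endpoint.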
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/plugin_system/state_reconciliation.py (1)
89-97: Good design for preventing infinite reconciliation loops.
The docstring clearly explains the purpose and lifecycle of the cache. Consider adding a type hint for better IDE support.
💡 Optional: Add type hint for clarity
```diff
-        self._unrecoverable_missing_on_disk: set = set()
+        self._unrecoverable_missing_on_disk: set[str] = set()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/state_reconciliation.py` around lines 89 - 97, Add an explicit type hint to the _unrecoverable_missing_on_disk attribute to improve IDE/type-checker support: change its annotation from "set" to a concrete element type such as "set[str]" (i.e., _unrecoverable_missing_on_disk: set[str] = set()) so the attribute reflects it contains plugin ID strings and tools can infer element types when used elsewhere.

test/test_store_manager_caches.py (1)
206-215: The wall-time bound here is likely to flake on Pi/CI.
`elapsed < 0.2` depends on scheduler noise and machine load, so this can fail even when the thread pool is overlapping work correctly. You already track `peak["count"]`; that overlap signal is the stable part. Either loosen the threshold substantially or switch to a coordination-based overlap check. As per coding guidelines, "Ensure tests are compatible with Raspberry Pi environment".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/test_store_manager_caches.py` around lines 206 - 215, The elapsed-time assertion in the test (using t0/elapsed around self.sm.search_plugins) is flaky on low-power/CI devices; remove or relax the hard wall-time check and rely on the deterministic overlap signal peak["count"] instead (assert peak["count"] >= 2 via the existing peak variable) or, if you must keep a timing guard, substantially loosen it (e.g., to 0.5–1.0s) and/or replace it with a coordination-based check that explicitly waits for concurrent worker entry/exit events; update the assertions around search_plugins, t0/elapsed, and peak["count"] accordingly so the test no longer fails on Raspberry Pi/CI due to scheduler noise.

src/plugin_system/store_manager.py (1)
358-365: Serving stale data without a retry cooldown can still put the UI on the slow path.
These branches return stale cache but leave the entry expired, so once the TTL has passed every request still blocks on a fresh network attempt before returning the same stale payload. During a longer GitHub/registry outage that reintroduces slow store/list responses. Consider refreshing the stale timestamp or keeping a short failure-backoff separately. As per coding guidelines, "Implement graceful degradation to continue operation when non-critical features fail".
Also applies to: 390-395, 554-556, 562-563
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/plugin_system/store_manager.py` around lines 358 - 365, When returning a stale entry after a fetch error (the branches that check cache_key in self.github_cache and return stale), update the cache metadata to avoid repeated immediate retries: either refresh the stored timestamp/TTL on self.github_cache[cache_key] (set a short backoff expiry like now + failure_backoff_seconds) or maintain a separate failure_backoff map (keyed by cache_key) with a next_retry timestamp and check it before attempting network fetch; apply this change in the blocks that access self.github_cache and return stale (the snippet using cache_key, stale, req_err and the similar branches mentioned) so subsequent requests serve the stale payload without blocking until the backoff expires.
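The suggested cooldown can be sketched as a next-retry map consulted before any network attempt. The class, attribute names, and the 30-second value are illustrative, not the project's cache:

```python
import time

FAILURE_BACKOFF = 30.0  # illustrative: seconds to wait before retrying a failed fetch


class StaleCacheSketch:
    """Sketch of stale-on-error with a failure cooldown, so repeated
    requests during an outage do not each block on the network."""

    def __init__(self, fetch):
        self._fetch = fetch       # callable that may raise on network error
        self._cache = {}          # key -> last good value
        self._next_retry = {}     # key -> monotonic time before which we skip fetching

    def get(self, key):
        now = time.monotonic()
        if now < self._next_retry.get(key, 0.0):
            return self._cache.get(key)  # serve stale without touching the network
        try:
            value = self._fetch(key)
        except Exception:
            self._next_retry[key] = now + FAILURE_BACKOFF
            return self._cache.get(key)  # stale-on-error fallback
        self._cache[key] = value
        self._next_retry.pop(key, None)  # success clears the cooldown
        return value
```

A freshness TTL for successful entries is omitted here; the point is only that a failed fetch sets a cooldown so subsequent callers get the stale value immediately instead of re-blocking.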
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 793-808: In the commit fetch path in store_manager.py (the code
handling requests.get and response handling that uses commit_info_cache and
cache_key), don’t overwrite an existing good cache entry with None on retryable
non-200 responses; instead treat HTTP errors like the RequestException branch:
if response.status_code is not 200 and cache_key exists in
self.commit_info_cache, log a warning and return the stale value; otherwise set
last_error/continue without assigning None into self.commit_info_cache. Ensure
this applies to the block that inspects response.status_code (the same logic
referenced around lines 829–839) so only a successful 200 replaces the cache
entry.
- Around line 1703-1714: The cache currently only checks .git/HEAD mtime
(head_file) which doesn't change on fast-forward; instead read HEAD contents in
the code around plugin_path/head_file and if it begins with "ref: " resolve that
ref to the corresponding ref file under git_dir (or fall back to looking up
packed-refs) and use the resolved ref's mtime or the actual resolved SHA as the
cache key/value stored in self._git_info_cache; apply the same fix to the
duplicate logic near lines 1773-1774 so the cache is invalidated when the branch
ref (or packed ref) updates rather than only when HEAD changes.
In `@test/test_store_manager_caches.py`:
- Around line 366-369: The test currently swallows all exceptions from
self.sm.install_plugin("bar") which masks real failures; change the test to
either stub/mock the remaining install steps so install_plugin("bar") completes
normally (e.g., mock _install_via_git or the methods it calls on StoreManager to
return expected values) or explicitly assert the specific exception you expect
(catch only that exception type and assert its message), and remove the broad
`except Exception: pass` so unexpected errors surface during testing.
In `@web_interface/blueprints/api_v3.py`:
- Around line 2916-2954: Wrap the unload and uninstall steps in the same
rollback boundary: move the call to
api_v3.plugin_manager.unload_plugin(plugin_id) into the try/except around
api_v3.plugin_store_manager.uninstall_plugin(plugin_id), record whether the
plugin was loaded before unloading, and on any exception or a False return call
_restore_plugin_config(plugin_id, snapshot) and, if the plugin manager exists
and the plugin was previously loaded, call
api_v3.plugin_manager.load_plugin(plugin_id) (or the appropriate load method) to
restore runtime state; also catch and log exceptions from unload so they trigger
the same rollback path.
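A minimal sketch of that rollback boundary, with hypothetical stand-ins for the api_v3 managers (method names such as `is_loaded` and `restore_plugin_config` are assumptions here, not the real API):

```python
# Hypothetical sketch of the unload + uninstall rollback boundary.
# The manager objects are stand-ins for api_v3 attributes.
def uninstall_with_rollback(plugin_id, config_mgr, plugin_mgr, store_mgr, snapshot):
    was_loaded = plugin_mgr.is_loaded(plugin_id)
    config_mgr.cleanup_plugin_config(plugin_id)
    unloaded = False

    def rollback():
        config_mgr.restore_plugin_config(plugin_id, snapshot)
        if unloaded:
            # only reload if we actually changed runtime state
            plugin_mgr.load_plugin(plugin_id)

    try:
        if was_loaded:
            plugin_mgr.unload_plugin(plugin_id)
            unloaded = True
        if not store_mgr.uninstall_plugin(plugin_id):
            rollback()
            return False
    except Exception:
        rollback()
        raise
    return True
```

Recording `unloaded` separately from `was_loaded` means an exception thrown by `unload_plugin` itself restores config but skips the reload, since runtime state never changed.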
---
Nitpick comments:
In `@src/plugin_system/state_reconciliation.py`:
- Around line 89-97: Add an explicit type hint to the
_unrecoverable_missing_on_disk attribute to improve IDE/type-checker support:
change its annotation from "set" to a concrete element type such as "set[str]"
(i.e., _unrecoverable_missing_on_disk: set[str] = set()) so the attribute
reflects it contains plugin ID strings and tools can infer element types when
used elsewhere.
In `@src/plugin_system/store_manager.py`:
- Around line 358-365: When returning a stale entry after a fetch error (the
branches that check cache_key in self.github_cache and return stale), update the
cache metadata to avoid repeated immediate retries: either refresh the stored
timestamp/TTL on self.github_cache[cache_key] (set a short backoff expiry like
now + failure_backoff_seconds) or maintain a separate failure_backoff map (keyed
by cache_key) with a next_retry timestamp and check it before attempting network
fetch; apply this change in the blocks that access self.github_cache and return
stale (the snippet using cache_key, stale, req_err and the similar branches
mentioned) so subsequent requests serve the stale payload without blocking until
the backoff expires.
In `@test/test_store_manager_caches.py`:
- Around line 206-215: The elapsed-time assertion in the test (using t0/elapsed
around self.sm.search_plugins) is flaky on low-power/CI devices; remove or relax
the hard wall-time check and rely on the deterministic overlap signal
peak["count"] instead (assert peak["count"] >= 2 via the existing peak variable)
or, if you must keep a timing guard, substantially loosen it (e.g., to 0.5–1.0s)
and/or replace it with a coordination-based check that explicitly waits for
concurrent worker entry/exit events; update the assertions around
search_plugins, t0/elapsed, and peak["count"] accordingly so the test no longer
fails on Raspberry Pi/CI due to scheduler noise.
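The deterministic overlap signal suggested above can be sketched like this (a generic pattern, not the project's actual test code):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Generic sketch of the peak-concurrency counter: peak["count"] can only
# exceed 1 if two workers were inside the worker body simultaneously,
# so the test needs no wall-clock timing assertion at all.
peak = {"current": 0, "count": 0}
peak_lock = threading.Lock()

def worker(_):
    with peak_lock:
        peak["current"] += 1
        peak["count"] = max(peak["count"], peak["current"])
    time.sleep(0.05)  # keep the overlap window open briefly
    with peak_lock:
        peak["current"] -= 1

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(worker, range(8)))

assert peak["count"] >= 2, "workers never overlapped"
```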
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2057d87c-aef0-4870-a4d6-e62084b9ea86
📒 Files selected for processing (6)
- src/plugin_system/state_reconciliation.py
- src/plugin_system/store_manager.py
- test/test_store_manager_caches.py
- test/test_uninstall_and_reconcile_endpoint.py
- test/web_interface/test_state_reconciliation.py
- web_interface/blueprints/api_v3.py
✅ Files skipped from review due to trivial changes (1)
- test/web_interface/test_state_reconciliation.py
Four small cleanups, each verified against current code:

1. ``_git_info_cache`` type annotation was ``Dict[str, tuple]`` — too loose. Tightened to ``Dict[str, Tuple[float, Dict[str, str]]]`` to match what ``_get_local_git_info`` actually stores (mtime + the sha/short_sha/branch/... dict it returns). Added ``Tuple`` to the typing imports.

2. The ``search_plugins`` early-return condition ``if len(filtered) == 1 or not fetch_commit_info and len(filtered) < 4`` parses correctly under Python's precedence (``and`` binds tighter than ``or``) but is visually ambiguous. Added explicit parentheses to make the intent — "single plugin, OR small batch that doesn't need commit info" — obvious at a glance. Semantics unchanged.

3. Replaced a Unicode multiplication sign (×) with ASCII 'x' in the commit_cache_timeout comment.

4. Removed a dead ``concurrent_workers = []`` declaration from ``test_enrichment_runs_concurrently``. It was left over from an earlier sketch of the concurrency check — the final test uses only ``peak_lock`` and ``peak``.

All 48 tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…back
Verified each finding against the current code. All four inline issues
were real bugs; nitpicks 5-7 were valid improvements.
1. _get_latest_commit_info overwrote a good cached value with None on
all-branches-404 (CONFIRMED BUG).
The final line of the branch loop unconditionally wrote
``self.commit_info_cache[cache_key] = (time.time(), None)``, which
clobbered any previously-good entry on a single transient failure
(e.g. an odd 5xx, a temporary DNS hiccup during the branches_to_try
loop). Fix: if there's already a good prior value, bump its
timestamp into the backoff window and return it instead. Only
cache None when we never had a good value.
2. _get_local_git_info cache did not invalidate on fast-forward
(CONFIRMED BUG).
Caching on ``.git/HEAD`` mtime alone is wrong: a ``git pull`` that
fast-forwards the current branch updates ``.git/refs/heads/<branch>``
(or packed-refs) but leaves HEAD's contents and mtime untouched.
The cache would then serve a stale SHA indefinitely.
Fix: introduce ``_git_cache_signature`` which reads HEAD contents,
resolves ``ref: refs/heads/<name>`` to the corresponding loose ref
file, and builds a signature tuple of (head_contents, head_mtime,
resolved_ref_mtime, packed_refs_mtime). A fast-forward bumps the
ref file's mtime, which invalidates the signature and re-runs git.
3. test_install_plugin_is_not_blocked_by_tombstone swallowed all
exceptions (CONFIRMED BUG in test).
``try: self.sm.install_plugin("bar") except Exception: pass`` could
hide a real regression in install_plugin that happens to raise.
Fix: the test now writes a COMPLETE valid manifest stub (id, name,
class_name, display_modes, entry_point) and stubs _install_dependencies,
so install_plugin runs all the way through and returns True. The
assertion is now ``assertTrue(result)`` with no exception handling.
4. Uninstall rollback missed unload/reload (CONFIRMED BUG).
Previous flow: cleanup → unload (outside try/except) → uninstall →
rollback config on failure. Problem: if ``unload_plugin`` raised,
the exception propagated without restoring config. And if
``uninstall_plugin`` failed after a successful unload, the rollback
restored config but left the plugin unloaded at runtime —
inconsistent.
Fix: record ``was_loaded`` before touching runtime state, wrap
``unload_plugin`` in the same try/except that covers
``uninstall_plugin``, and on any failure call a ``_rollback`` local
that (a) restores the config snapshot and (b) calls
``load_plugin`` to reload the plugin if it was loaded before we
touched it.
5. Nitpick: ``_unrecoverable_missing_on_disk: set`` → ``Set[str]``.
Matches the existing ``Dict``/``List`` style in state_reconciliation.py.
6. Nitpick: stale-cache fallbacks in _get_github_repo_info and
_get_latest_commit_info now bump the cached entry's timestamp by a
60s failure backoff. Without this, a cache entry whose TTL just
expired would cause every subsequent request to re-hit the network
until it came back, amplifying the failure. Introduced
``_record_cache_backoff`` helper and applied it consistently.
7. Nitpick: replaced the flaky wall-time assertion in
test_enrichment_runs_concurrently with just the deterministic
``peak["count"] >= 2`` signal. ``peak["count"]`` can only exceed 1
if two workers were inside the critical section simultaneously,
which is definitive proof of parallelism. The wall-time check was
tight enough (<200ms) to occasionally fail on CI / low-power boxes.
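The timestamp bump from point 6 can be sketched as follows; the helper name follows the text above, but the (timestamp, value) cache layout and the 60 s constant are assumptions:

```python
import time

# Sketch of the point-6 backoff bump. The (timestamp, value) cache layout
# and the 60s constant are assumptions; the helper name follows the text.
FAILURE_BACKOFF_SECONDS = 60

def record_cache_backoff(cache, key, ttl, value, now=None):
    """Re-store a stale value with a timestamp chosen so the entry stays
    'fresh' for FAILURE_BACKOFF_SECONDS more. A burst of requests during
    an outage then serves the stale payload instead of re-hitting the
    network on every call."""
    now = time.time() if now is None else now
    # Freshness is assumed to be checked as: now - timestamp < ttl, so
    # this timestamp makes the entry expire FAILURE_BACKOFF_SECONDS from now.
    cache[key] = (now + FAILURE_BACKOFF_SECONDS - ttl, value)
    return value
```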
Tests (6 new, 54 total passing):
- test_cache_invalidates_on_fast_forward_of_current_branch: builds a
loose-ref layout under a temp .git/, verifies a first call populates
the cache, a second call with unchanged state hits the cache, and a
simulated fast-forward (overwriting ``.git/refs/heads/main`` with a
new SHA and mtime) correctly re-runs git.
- test_commit_info_preserves_good_cache_on_all_branches_404: seeds a
good cached entry, mocks requests.get to always return 404, and
verifies the cache still contains the good value afterwards.
- test_repo_info_stale_bumps_timestamp_into_backoff: seeds an expired
cache, triggers a ConnectionError, then verifies a second lookup
does NOT re-hit the network (proves the timestamp bump happened).
- test_repo_info_stale_on_403_also_backs_off: same for the 403 path.
- test_file_removal_failure_reloads_previously_loaded_plugin:
plugin starts loaded, uninstall_plugin returns False, asserts
load_plugin was called during rollback.
- test_unload_failure_restores_config_and_does_not_call_uninstall:
unload_plugin raises, asserts uninstall_plugin was never called AND
config was restored AND load_plugin was NOT called (runtime state
never changed, so no reload needed).
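The signature from point 2 (exercised by the fast-forward test above) could look roughly like this; the tuple layout follows the description, the details are assumptions:

```python
from pathlib import Path

def git_cache_signature(git_dir: Path):
    """Tuple that changes when HEAD, the checked-out branch's loose ref,
    or packed-refs changes, so a fast-forward (which touches the ref file
    but not HEAD) invalidates the cache."""
    def mtime(path: Path):
        try:
            return path.stat().st_mtime
        except OSError:
            return None

    head = git_dir / "HEAD"
    try:
        head_contents = head.read_text().strip()
    except OSError:
        head_contents = None

    ref_mtime = None
    if head_contents and head_contents.startswith("ref: "):
        # symbolic HEAD: also track the loose ref file it points at
        ref_mtime = mtime(git_dir / head_contents[len("ref: "):])

    return (head_contents, mtime(head), ref_mtime, mtime(git_dir / "packed-refs"))
```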
Broader test sweep: 348/348 pass (2 pre-existing
TestDottedKeyNormalization failures reproduce on main, unrelated).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/plugin_system/store_manager.py (1)
834-898: ⚠️ Potential issue | 🟠 Major: Don't serve stale commit SHAs on a forced refresh.
The update path asks for `force_refresh=True` before deciding the local checkout is already current. Returning a cached commit from these stale-on-error branches can make that check short-circuit and skip `git pull` even when upstream has moved. Keep the stale fallback for UI callers, but disable it when `force_refresh` is true, or split it behind an explicit `allow_stale_on_error` flag. Based on learnings: Plugin store always fetches latest versions from GitHub (releases/tags/manifest/commit) — GitHub is the source of truth for plugin versions.
🛠️ Suggested change

```diff
-            if cache_key in self.commit_info_cache:
+            if (not force_refresh) and cache_key in self.commit_info_cache:
                 _, stale = self.commit_info_cache[cache_key]
                 if stale is not None:
                     self._record_cache_backoff(
                         self.commit_info_cache, cache_key, self.commit_cache_timeout, stale,
@@
-            if cache_key in self.commit_info_cache:
+            if (not force_refresh) and cache_key in self.commit_info_cache:
                 _, prior = self.commit_info_cache[cache_key]
                 if prior is not None:
                     self._record_cache_backoff(
                         self.commit_info_cache, cache_key, self.commit_cache_timeout, prior,
```
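A compressed sketch of the suggested gating, with `fetch()` standing in for the GitHub call and names simplified from the real method:

```python
import time

# Sketch: stale-on-error fallback for UI callers, but never when
# force_refresh is set (the update path must see real upstream state).
def get_latest_commit(cache, key, ttl, fetch, force_refresh=False, now=None):
    now = time.time() if now is None else now
    if not force_refresh and key in cache and now - cache[key][0] < ttl:
        return cache[key][1]                  # fresh cache hit
    try:
        value = fetch()
        cache[key] = (now, value)
        return value
    except Exception:
        if not force_refresh and key in cache and cache[key][1] is not None:
            return cache[key][1]              # stale fallback, browse only
        raise                                 # forced refresh: surface it
```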
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/plugin_system/store_manager.py`:
- Around line 1755-1800: The cache signature built by _git_cache_signature lacks
the repository config, so add the .git/config file to the signature: attempt to
read git_dir / 'config' (both its mtime via .stat().st_mtime and its text via
read_text(...).strip() inside the same OSError-guarded pattern used for HEAD),
set config_mtime/config_contents to None on error, and append these values to
the returned tuple (e.g., return (head_contents, head_mtime, ref_mtime,
packed_refs_mtime, config_contents, config_mtime)) so config-only changes (like
remote.origin.url) invalidate the cache.
- Around line 587-603: In fetch_registry(), when catching
requests.RequestException or json.JSONDecodeError and returning
self.registry_cache (and only when raise_on_failure is False), bump
self.registry_cache_time forward by a short backoff to avoid repeatedly hitting
the network; add or reuse a SMALL_BACKOFF_SECONDS constant (e.g. 30) and set
self.registry_cache_time = time.time() + SMALL_BACKOFF_SECONDS before returning
self.registry_cache so the stale-cache path gets a short grace period similar to
the repo/commit cache logic.
In `@test/test_uninstall_and_reconcile_endpoint.py`:
- Around line 37-57: The _make_client() test helper mutates the shared api_v3
singleton by replacing attributes like config_manager, plugin_manager,
plugin_store_manager, plugin_state_manager, saved_repositories_manager,
schema_manager, operation_queue, operation_history, and cache_manager with
MagicMocks and never restores them; update _make_client() to snapshot the
original api_v3 attributes before overwriting and restore them after the test
(or use patch.object on each attribute and call addCleanup for each patch) so
the MagicMocks do not leak into later tests and the api_v3 singleton is returned
to its original state.
In `@web_interface/blueprints/api_v3.py`:
- Around line 2841-2851: The catch blocks around the snapshot logic for main
config and secrets currently use broad "except Exception as e"; replace those
with narrower exception types such as FileNotFoundError, IOError, and OSError
when checking/reading files to avoid masking programmer errors—update the two
exception handlers that follow the checks using
api_v3.config_manager.secrets_path and
api_v3.config_manager.get_raw_file_content('secrets') (the blocks referencing
plugin_id and secrets_config) so they only catch FileNotFoundError,
OSError/IOError (as appropriate) and keep the existing logger.warning calls
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 47dabdcb-4503-44b1-9d72-a8be18bb5a5e
📒 Files selected for processing (5)
- src/plugin_system/state_reconciliation.py
- src/plugin_system/store_manager.py
- test/test_store_manager_caches.py
- test/test_uninstall_and_reconcile_endpoint.py
- web_interface/blueprints/api_v3.py
…isolation

All four findings verified as real issues against the current code.

1. _git_cache_signature was missing .git/config (CONFIRMED GAP).
The cached ``result`` dict from _get_local_git_info includes ``remote_url``, which is read from ``.git/config``. But the cache signature only tracked HEAD + refs — so a config-only change (e.g. ``git remote set-url origin https://...``) would leave the stale URL cached indefinitely. This matters for the monorepo-migration detection in update_plugin.
Fix: add ``config_contents`` and ``config_mtime`` to the signature tuple. Config reads use the same OSError-guarded pattern as the HEAD read.

2. fetch_registry stale fallback didn't bump registry_cache_time (CONFIRMED BUG).
The other caches already had the failure-backoff pattern added in the previous review pass (via ``_record_cache_backoff``), but the registry cache's stale-fallback branches silently returned the cached payload without updating ``registry_cache_time``. The next request saw the same expired TTL, re-hit the network, and failed again — amplifying the original transient failure.
Fix: bump ``self.registry_cache_time`` forward by the existing ``self._failure_backoff_seconds`` (reused — no new constant needed) in both the RequestException and JSONDecodeError stale branches. Kept the ``raise_on_failure=True`` path untouched so the reconciler still gets the exception.

3. _make_client() in the uninstall/reconcile test helper leaked MagicMocks into the api_v3 singleton (CONFIRMED RISK).
Every test call replaced api_v3.config_manager, .plugin_manager, .plugin_store_manager, etc. with MagicMocks and never restored them. If any later test in the same pytest run imported api_v3 expecting original state (or None), it would see the leftover mocks.
Fix: _make_client now snapshots the original attributes (with a sentinel to distinguish "didn't exist" from "was None") and returns a cleanup callable. Both setUp methods call self.addCleanup(cleanup) so state is restored even if the test raises. On cleanup, sentinel entries trigger delattr rather than setattr to preserve the "attribute was never set" case.

4. Snapshot helpers used broad ``except Exception`` (CONFIRMED).
_snapshot_plugin_config caught any exception from get_raw_file_content, which could hide programmer errors (TypeError, AttributeError) behind the "best-effort snapshot" fallback. The legitimate failure modes are filesystem errors (covered by OSError; FileNotFoundError is a subclass, IOError is an alias in Python 3) and ConfigError (what config_manager wraps all load failures in).
Fix: narrow to ``(OSError, ConfigError)`` in both snapshot blocks. ConfigError was already imported at line 20 of api_v3.py.

Tests added (4 new, 58 total passing):
- test_cache_invalidates_on_git_config_change: builds a realistic loose-ref layout, writes .git/config with an "old" remote URL, exercises _get_local_git_info, then rewrites .git/config with a "new" remote URL + new mtime, calls again, and asserts the cache invalidated and returned the new URL.
- test_stale_fallback_bumps_timestamp_into_backoff: seeds an expired registry cache, triggers ConnectionError, verifies the first call serves stale, then asserts a second call makes ZERO new HTTP requests (proves registry_cache_time was bumped forward).
- test_snapshot_survives_config_read_error: raises ConfigError from get_raw_file_content and asserts the uninstall still completes successfully — the narrow exception list still catches this case.
- test_snapshot_does_not_swallow_programmer_errors: raises a TypeError from get_raw_file_content (not in the narrow list) and asserts it propagates up to a 500, AND that uninstall_plugin was never called (proves the exception was caught at the right level).

Broader test sweep: 352/352 pass (2 pre-existing TestDottedKeyNormalization failures reproduce on main, unrelated).
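The registry-cache behavior from point 2 can be sketched in isolation; `fetch_fn` abstracts the HTTP call, and the 30 s constant and method shape are illustrative, not the real code:

```python
import time

# Sketch of fetch_registry's stale fallback with a failure backoff, so a
# transient outage is not amplified by every subsequent request.
FAILURE_BACKOFF_SECONDS = 30  # assumed value

class RegistryCache:
    def __init__(self, fetch_fn, ttl=900):
        self.fetch_fn = fetch_fn
        self.ttl = ttl
        self.cache = None
        self.cache_time = 0.0

    def fetch_registry(self, raise_on_failure=False, now=None):
        now = time.time() if now is None else now
        if self.cache is not None and now - self.cache_time < self.ttl:
            return self.cache                  # fresh enough
        try:
            self.cache = self.fetch_fn()
            self.cache_time = now
            return self.cache
        except Exception:
            if raise_on_failure or self.cache is None:
                raise                          # reconciler wants the error
            # Serve stale, and bump cache_time so the next few requests
            # don't re-hit the failing network immediately.
            self.cache_time = now - self.ttl + FAILURE_BACKOFF_SECONDS
            return self.cache
```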
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Three interacting bugs reported by ericepe on Discord after a fresh install:
- `Plugin not found in registry: github / youtube` (pastebin). The reconciler fired on every HTTP request, which on a Pi4 also accounted for much of the ~98% CPU and the long "connecting to display" hangs.
- Uninstalled plugins came back even though uninstall removed `plugin-repos/<name>` from disk.
- `/plugins/installed` and especially `/plugins/store/list` were doing synchronous per-plugin work on every request.

All three are addressed. Each is a confirmed root cause from reading and running the code paths on a real Pi, not a guess.
Commits
- `b031b991` fix(plugins): stop reconciliation install loop, slow `/plugins/installed`, and uninstall resurrection
- `c03eb8db` perf(plugins): parallelize Plugin Store browse and extend metadata cache TTLs
- `5ed48d5c` perf(plugins): drop redundant per-plugin `manifest.json` fetch in `search_plugins`

What changed
Fix 1 — Reconciliation infinite retry loop
- `web_interface/app.py`: `_run_startup_reconciliation` no longer resets `_reconciliation_started = False` on unresolved inconsistencies, so `@app.before_request` cannot retrigger the entire pass on every page load. Reconciliation runs exactly once per process.
- `src/plugin_system/state_reconciliation.py`: added a `_unrecoverable_missing_on_disk` set, populated by `_auto_repair_missing_plugin` whenever a candidate isn't in the registry. Subsequent reconciliation passes skip these plugins entirely (no `install_plugin`, no `fetch_registry`).
- A registry pre-check before `install_plugin`, so we never fire the expensive forced GitHub fetch when we already know the plugin is missing from the registry.
- A `force=True` parameter on `reconcile_state(...)` and the `/plugins/state/reconcile` endpoint so users can retry after the underlying issue is fixed. The endpoint also now passes `store_manager` (previously omitted, silently disabling auto-repair).

Fix 2 — Uninstall plugin reappears
- `web_interface/blueprints/api_v3.py`: reordered the uninstall flow (both queue and direct paths) — config cleanup now happens before file removal, eliminating the "config-entry-with-no-files" race window the reconciler used to resurrect on.
- A short-lived "recently uninstalled" tombstone on `PluginStoreManager` (`mark_recently_uninstalled` / `was_recently_uninstalled`, 5-minute TTL). The reconciler honors it as a hard "do not auto-repair" signal even if state somehow drifts out of sync.

Fix 3a — `/plugins/installed` slowness

- `src/plugin_system/store_manager.py`: `_get_local_git_info` now caches results keyed on `.git/HEAD` mtime. Subsequent calls do zero subprocess work unless the working copy actually moved.
- `web_interface/blueprints/api_v3.py`: `GET /plugins/installed` mtime-gates `discover_plugins()` against the plugins directory and drops the per-plugin `manifest.json` re-read entirely. Added `?refresh=1` for force-rescan.
- `registry_cache_timeout` bumped from 5 → 15 minutes.
- `fetch_registry` now falls back to the stale cache on transient network failure instead of returning an empty list.

Fix 3b —
`/plugins/store/list` slowness (Plugin Store browse)

Most users install via the Plugin Store (ZIP extraction, no `.git` dir), so the git-info cache above doesn't help them directly. The real pain was the store browse endpoint:

- Enrichment now runs in a `ThreadPoolExecutor` (10 workers, preserves registry order). Category/tag/query filters now run before enrichment so we never waste HTTP requests on plugins that will be filtered out.
- Dropped the per-plugin `manifest.json` fetch from `search_plugins`. The registry's `plugins.json` already carries `description` (it is generated from each manifest by `update_registry.py`), and `last_updated` is filled from the commit info fetched in the same loop. Removing this call eliminated one of three per-plugin HTTPS round trips — crucially, the one that regularly hit the `_http_get_with_retries` (max_retries=3, backoff=0.75) tail when DNS flapped, which dominated wall time even after parallelization.
- `_get_github_repo_info` and `_get_latest_commit_info`: on network failure or 403, return the previously-cached value rather than zeroing out `stars` / clearing `last_commit`. Matches the pattern already in `fetch_registry`.

Benchmarks on real Pi hardware (devpi, Raspberry Pi, ARM64, 31 plugins in registry)
`/api/v3/plugins/installed`: 10 repeated warm hits produced zero new reconciliation retries, confirming the loop is gone.

`/api/v3/plugins/store/list` (the big one): 46% faster cold, with identical plugin count, identical order, and zero diffs on the critical fields the UI reads (stars, last_commit, last_commit_author, last_updated, default_branch, verified, repo, name, author). A single `description` field differed between the two runs because the PR uses the registry description instead of the live GitHub manifest description for that plugin — an acceptable trade since `update_registry.py` is meant to be authoritative.

The remaining 16 s cold time is two GitHub API calls per plugin × 31 plugins bottlenecked by the slowest thread-pool wave. After the first load, warm hits are sub-30 ms and the 30-minute cache TTL means normal browsing stays warm across realistic sessions.
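The Fix 3b shape (filter first, then enrich in parallel while preserving order) reduces to something like the following sketch; `enrich()` stands in for the per-plugin GitHub calls and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the browse path: filter cheaply first, then enrich only the
# survivors in parallel, preserving registry order.
def list_store_plugins(registry, query, enrich, workers=10):
    q = query.lower()
    # Filtering BEFORE enrichment means no HTTP work is spent on plugins
    # the user will never see.
    filtered = [p for p in registry if q in p["name"].lower()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, unlike as_completed()
        return list(pool.map(enrich, filtered))
```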
Reconciliation loop verified on the live service
I stopped `ledmatrix.service` + `ledmatrix-web.service`, checked out the PR branch, and restarted. The journal showed exactly the expected behavior:

- `github` and `youtube` (the exact plugins from the bug report) hit the registry pre-check, were marked unrecoverable, no install attempt
- `music` and `weather` WERE in the registry, auto-repaired successfully
- `/api/v3/plugins/installed` fired 13 times over the test; zero `AutoRepair` or `Starting state reconciliation` entries after the initial pass
29/29 tests pass:
- `test/web_interface/test_state_reconciliation.py` — 5 new cases:
  - test_not_in_registry_marks_unrecoverable_without_install
  - test_subsequent_reconcile_does_not_retry
  - test_force_reconcile_clears_unrecoverable_cache
  - test_registry_unreachable_does_not_mark_unrecoverable
  - test_recently_uninstalled_skips_auto_repair
- `test/test_store_manager_caches.py` — new file, 13 cases:
  - `.git` early return (3)
  - `search_plugins`: order preservation, filter-before-enrichment, concurrent-execution wall-time bound (3)

Broader sweep: 288/288 in the rest of the suite. Two pre-existing failures in `TestDottedKeyNormalization` reproduce on `main` and are unrelated.

Not in this PR
- `plugin-repos/static-image/manager.py` already handles the schema mismatch I initially suspected (top-level `images` vs `image_config.images`). Without specific log output pinpointing what fails for that user, a speculative patch could mask the real bug. Tracking separately.
- `/plugins/update` fast-path for ZIP-installed plugins — the update endpoint still calls `get_plugin_info(fetch_latest_from_github=True)`, which makes 3 GitHub requests. A version-only comparison against the cached registry would reduce that to zero. Saving for a follow-up; the update path is only hit on explicit user action, not on every page load.

Test plan
- `pytest test/test_store_manager_caches.py test/web_interface/test_state_reconciliation.py -q --no-cov` passes locally.
- Uninstall a plugin via the UI; it removes `plugin-repos/<plugin>/` too; should stay gone.
- Edit `config/config.json` to add `"fakeplugin": {"enabled": true}`, a plugin that does not exist in the registry. Start the web interface, refresh the plugin page repeatedly, tail the log — exactly one reconciliation attempt, then silence.

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- `force` option to reattempt repairs and skip the session-level skip-list; reconciliation endpoint accepts a JSON `force` flag.

Bug Fixes
Tests