feat(skills): accept comma-separated list of managed-skills manifests by pjdoland · Pull Request #321 · plmbr/notebook-intelligence

pjdoland · 2026-05-19T01:23:43Z

Summary

NBI_SKILLS_MANIFEST (and the matching skills_manifest traitlet) now accept a comma-separated list of YAML/JSON manifests. The reconciler loads each independently and unions the entries, with first-wins URL dedupe and per-entry-error name dedupe. Single-manifest deployments behave identically; the wire format is strictly additive.

The use case: a JupyterHub deployment can ship an org-wide baseline manifest plus a per-team manifest in one env var, without baking a single combined file at deploy time.

Solution

Three layers of change, one commit each, plus a follow-up commit applying the convergent review findings.

load_manifests(sources, *, token) in notebook_intelligence/skill_manifest.py. Iterates sources, catches ManifestError per source so a transient outage on one manifest doesn't block the others, and merges entries with two passes of dedupe:
- By URL: same url: in two manifests means the earlier one wins; a warning names both manifests.
- By installed name: predicted via the same _slug() step the real installer applies (subpath casing, whitespace, capitals all normalize), so two URLs that would install to the same name collide before staging. The second entry drops with a per-entry error.
SkillReconciler now takes manifest_sources: List[str] instead of a single string. When any source fails to load, the cycle skips stale-removal so the missing source's skills don't get falsely orphaned. Per-entry installs still run against whatever did load.
Wire format resolved through a new util.split_csv helper that both _resolve_csv_appended and the new _resolve_skills_manifest_sources share. The new helper means \" ,,, \" and other all-separator inputs degrade cleanly to "no manifests configured" rather than constructing phantom sources.

Testing

tests/test_skill_manifest.py: 13 new cases. split_sources (5), load_manifests (8 covering union, URL dedupe with warning, name collision with explicit names, name collision via slugging, non-GitHub URL warning, partial fetch failure both orderings, URLError isolation, all-fail, order preservation, token-per-source).
tests/test_skill_reconciler.py: 7 new cases in a TestMultipleManifests class. Multi-source union, URL dedupe through the reconciler, name collision through the reconciler, partial failure preserving managed skills (both [good, missing] and [missing, good] orderings), all-sources-fail, empty-sources no-op. Plus 15 existing tests adapted from the single-string signature to the new list signature.
tests/test_config_integration.py: 7 new cases in TestSkillsManifestResolver pinning the env-vs-traitlet contract end to end, including the all-separator-input edge case.

Full suite green: pytest tests/ --ignore=tests/test_claude_client.py → 908 passed. jlpm tsc --noEmit, jlpm lint:check, jlpm jest all clean.

Risks and follow-ups

Permanent partial-failure means stale skills accumulate. A manifest source that fails forever pins stale-removal off indefinitely. Documented as accepted behavior in docs/skills.md; a follow-up could add a "force stale removal after K consecutive partial-failure cycles" escape hatch.
Stop-signal latency. SkillReconciler.stop() can block up to N * FETCH_TIMEOUT_SECONDS if the first manifest fetch hangs. Pre-existing single-manifest behavior; scales linearly with N. Follow-up could thread the stop event through load_manifests so the loop bails between sources.
Per-source ownership of installed skills. Today Skill.managed_source records the entry URL, not the owning manifest URL. With per-source ownership the reconciler could remove only the skills owned by the loaded manifests on partial failure, instead of skipping removal entirely. That's a schema change in serialize_skill_md and warrants its own PR.
Token fanout. NBI_MANAGED_SKILLS_TOKEN is shipped as a Bearer header to every manifest URL in the list. Pre-existing single-manifest behavior; the multi-manifest path multiplies the blast radius of a misconfigured host. Worth a one-line caveat in the docs; deferred to keep this PR scoped to the wire-format change.

The full review pass (six agents: code-reuse, code-quality, efficiency, backend/contract, distributed-systems, test-architecture) produced fix-before-merge items that all landed in the final commit. Remaining items above are follow-ups.

Adds `load_manifests` that takes a list of manifest URLs/paths, loads each independently, and merges the entries with dedupe-by-URL (first-wins + warning) and dedupe-by-predicted-name (skip second + per-entry error). A `split_sources` helper handles the comma-separated env-var / traitlet wire format with whitespace tolerance. The new `MergedManifest` return type carries the deduped entries, per-source errors, advisory warnings, and the set of sources that loaded successfully. Reconciler will consume `loaded_sources` to decide whether stale removal is safe this cycle (a missing source shouldn't orphan its skills). `ManifestEntry` gains a `source` field so dedupe diagnostics can name both the kept and the ignored manifest. Single-manifest callers continue to work unchanged.

Switches SkillReconciler from a single `manifest_source: str` to `manifest_sources: List[str]` and routes the per-cycle load through `load_manifests`. Per-source errors and dedupe warnings surface in ReconcileResult and the server log. Partial-failure safety: if any source fails to load, the reconciler skips stale-removal for that cycle so a transient outage on one manifest can't orphan the skills it owned. Per-entry installs still run against whatever sources did load. Single-source deployments behave identically (1/1 sources loaded → stale removal runs). AIServiceManager now consumes `skills_manifest_sources` (a list) from its options rather than the single-string `skills_manifest`.

`NBI_SKILLS_MANIFEST` and the matching traitlet now accept either a single source (URL or filesystem path) or a comma-separated list of either. Empty input still disables the feature; `,,,` and other all-separator inputs degrade cleanly to "no manifests configured." Docs (`admin-guide.md`, `skills.md`) and the handler 409 message explain the new format. The reconciler endpoint's "no manifest" response now mentions the multi-source contract so operators don't hit it and wonder why their second manifest stops loading.

Six findings from the parallel review pass. - `_predicted_name` now runs the same `_slug()` step the real installer uses, so subpath casing/whitespace variants dedupe correctly. Two manifests pointing at .../skills/data-eda and .../Data_EDA collide before install rather than letting the second managed install silently overwrite the first. - When `_predicted_name` returns None (non-GitHub URL or unparseable entry), `load_manifests` now surfaces a warning so operators know cross-manifest name dedupe was skipped for that entry. The install-time collision check in SkillManager remains the backstop. - Extracted `util.split_csv` and have both `_resolve_csv_appended` (extension.py) and `skill_manifest.split_sources` use it. Single source of truth for the comma/strip/drop-empties wire format. - Extracted `_resolve_skills_manifest_sources(traitlet_value)` so the env-var-vs-traitlet contract can be unit-tested without spinning up the extension harness. Seven new tests pin env-wins, traitlet-fallback, comma splitting, whitespace edges, and the all-separator-input disable path. - Added test coverage for: name collision via slugging (the bug fixed above), non-GitHub URL warning, URLError isolation across sources, and a reverse-order partial-failure case ([missing, good] alongside the existing [good, missing]). Skipped findings noted as follow-ups: stop-signal latency on hanging fetches (pre-existing single-manifest behavior), permanent partial- failure stale accumulation (documented contract), per-source skill ownership for precise stale handling (schema change), failure- visibility metric, and the `MergedManifest.all_loaded()` helper.

Second review pass surfaced a pre-existing thread-safety hole in SkillReconciler.start() / stop(): two concurrent start() calls can spawn an untracked daemon thread, and a start() racing with stop() can orphan threads or flip the wrong stop event. Probability is low but the admin-reachable kill-switch endpoint (SkillsReconcilerStopHandler) is documented as an incident-response tool, so the race is reachable in practice. Adds a `_lifecycle_lock` around the `_thread` / `_stop` mutations. start() is a no-op when a thread already exists; stop() captures the thread reference under the lock then joins outside the lock so a second stop() doesn't block on a hung join. Two new tests pin the contract (concurrent-start does-not-spawn-two-threads, and an empty-manifest-among-loaded test that documents the stale-removal behavior the test architect flagged as silently destructive). Also adds a trust-boundary note to docs/skills.md near the multi-manifest section: NBI_MANAGED_SKILLS_TOKEN is sent to every URL in the comma-separated list, including non-GitHub hosts. The no-redirect handler blocks server-side leaks but cannot stop a typo'd entry from receiving the token. The pre-existing single- manifest contract had the same property; multi-manifest just multiplies the surface and operators should know.

Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).

pjdoland added 4 commits May 18, 2026 20:57

pjdoland added the enhancement New feature or request label May 19, 2026

mbektas approved these changes May 19, 2026

View reviewed changes

mbektas merged commit 3f4e7cc into plmbr:main May 19, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(skills): accept comma-separated list of managed-skills manifests#321

feat(skills): accept comma-separated list of managed-skills manifests#321
mbektas merged 5 commits into
plmbr:mainfrom
pjdoland:feat/multi-skills-manifest

pjdoland commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pjdoland commented May 19, 2026

Summary

Solution

Testing

Risks and follow-ups

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants