feat(skills): accept comma-separated list of managed-skills manifests#321
Merged
Conversation
Adds `load_manifests` that takes a list of manifest URLs/paths, loads each independently, and merges the entries with dedupe-by-URL (first-wins + warning) and dedupe-by-predicted-name (skip second + per-entry error). A `split_sources` helper handles the comma-separated env-var / traitlet wire format with whitespace tolerance. The new `MergedManifest` return type carries the deduped entries, per-source errors, advisory warnings, and the set of sources that loaded successfully. Reconciler will consume `loaded_sources` to decide whether stale removal is safe this cycle (a missing source shouldn't orphan its skills). `ManifestEntry` gains a `source` field so dedupe diagnostics can name both the kept and the ignored manifest. Single-manifest callers continue to work unchanged.
Switches SkillReconciler from a single `manifest_source: str` to `manifest_sources: List[str]` and routes the per-cycle load through `load_manifests`. Per-source errors and dedupe warnings surface in ReconcileResult and the server log. Partial-failure safety: if any source fails to load, the reconciler skips stale-removal for that cycle so a transient outage on one manifest can't orphan the skills it owned. Per-entry installs still run against whatever sources did load. Single-source deployments behave identically (1/1 sources loaded → stale removal runs). AIServiceManager now consumes `skills_manifest_sources` (a list) from its options rather than the single-string `skills_manifest`.
`NBI_SKILLS_MANIFEST` and the matching traitlet now accept either a single source (URL or filesystem path) or a comma-separated list of either. Empty input still disables the feature; `,,,` and other all-separator inputs degrade cleanly to "no manifests configured." Docs (`admin-guide.md`, `skills.md`) and the handler 409 message explain the new format. The reconciler endpoint's "no manifest" response now mentions the multi-source contract so operators don't hit it and wonder why their second manifest stops loading.
Six findings from the parallel review pass. - `_predicted_name` now runs the same `_slug()` step the real installer uses, so subpath casing/whitespace variants dedupe correctly. Two manifests pointing at .../skills/data-eda and .../Data_EDA collide before install rather than letting the second managed install silently overwrite the first. - When `_predicted_name` returns None (non-GitHub URL or unparseable entry), `load_manifests` now surfaces a warning so operators know cross-manifest name dedupe was skipped for that entry. The install-time collision check in SkillManager remains the backstop. - Extracted `util.split_csv` and have both `_resolve_csv_appended` (extension.py) and `skill_manifest.split_sources` use it. Single source of truth for the comma/strip/drop-empties wire format. - Extracted `_resolve_skills_manifest_sources(traitlet_value)` so the env-var-vs-traitlet contract can be unit-tested without spinning up the extension harness. Seven new tests pin env-wins, traitlet-fallback, comma splitting, whitespace edges, and the all-separator-input disable path. - Added test coverage for: name collision via slugging (the bug fixed above), non-GitHub URL warning, URLError isolation across sources, and a reverse-order partial-failure case ([missing, good] alongside the existing [good, missing]). Skipped findings noted as follow-ups: stop-signal latency on hanging fetches (pre-existing single-manifest behavior), permanent partial- failure stale accumulation (documented contract), per-source skill ownership for precise stale handling (schema change), failure- visibility metric, and the `MergedManifest.all_loaded()` helper.
Second review pass surfaced a pre-existing thread-safety hole in SkillReconciler.start() / stop(): two concurrent start() calls can spawn an untracked daemon thread, and a start() racing with stop() can orphan threads or flip the wrong stop event. Probability is low but the admin-reachable kill-switch endpoint (SkillsReconcilerStopHandler) is documented as an incident-response tool, so the race is reachable in practice. Adds a `_lifecycle_lock` around the `_thread` / `_stop` mutations. start() is a no-op when a thread already exists; stop() captures the thread reference under the lock then joins outside the lock so a second stop() doesn't block on a hung join. Two new tests pin the contract (concurrent-start does-not-spawn-two-threads, and an empty-manifest-among-loaded test that documents the stale-removal behavior the test architect flagged as silently destructive). Also adds a trust-boundary note to docs/skills.md near the multi-manifest section: NBI_MANAGED_SKILLS_TOKEN is sent to every URL in the comma-separated list, including non-GitHub hosts. The no-redirect handler blocks server-side leaks but cannot stop a typo'd entry from receiving the token. The pre-existing single- manifest contract had the same property; multi-manifest just multiplies the surface and operators should know.
mbektas
approved these changes
May 19, 2026
pjdoland
added a commit
to pjdoland/notebook-intelligence
that referenced
this pull request
May 22, 2026
Promotes the [Unreleased] CHANGELOG snapshot to [5.0.0] - 2026-05-22 and expands it to cover everything merged into upstream/main after PR plmbr#287's docs refresh. Bumps package.json to 5.0.0. CHANGELOG additions cover the post-plmbr#287 surface: - Settings tabs: plugin marketplace picker (plmbr#284), plugin marketplace details + Update button (plmbr#303), per-workspace MCP disable (plmbr#286), JSON-paste path in Add MCP server (plmbr#285). - Launchers: hide-with-policy (plmbr#288), brand icons for Codex / opencode (plmbr#325, plmbr#333), per-launch directory picker (plmbr#332). - Chat sidebar and agentic UX: workspace @-mention in Claude mode (plmbr#327), reload-open-files-on-disk (plmbr#330), steered system prompt away from over-eager notebook creation (plmbr#336). - Skills: multi-manifest support (plmbr#321), tracks-upstream for user- imported skills (plmbr#322), HTTP kill switch for the reconciler (plmbr#291). - Accessibility: full sub-section covering plmbr#305-plmbr#320. - Security: shell-tool sandbox (plmbr#290), Claude UI-bridge sandbox (plmbr#323), 0o600 on encrypted token (plmbr#293), env-secret scrubbing (plmbr#295), MCP config shape validation (plmbr#299), XSS allowlist (plmbr#296), Copilot WS auth + origin (plmbr#301), GHE host detection (plmbr#292), fastmcp -> mcp SDK swap (plmbr#324). - Fixed: session listing unification (plmbr#310), session preview unwrap (plmbr#331), down-area runtime throw (plmbr#330 follow-up), WS message-handler leak (plmbr#294). - Removed: fastmcp dependency, history.jsonl session gate. Adds a Migration note covering the five behavior changes operators should review before upgrading from 4.x: fastmcp swap, path sandboxes, history.jsonl gate removal, workspace @-mention pointer shape, and the Copilot WebSocket auth/origin tightening. Two reviewer rounds (six personas each) applied: - Round 1 caught security overclaims (plmbr#293, plmbr#299, plmbr#323), the plmbr#284/plmbr#303 mis-attribution, missing migration note, 3 em dashes, and the stale `fastmcp==2.x.*` recommendation in the admin guide. - Round 2 caught the missing plmbr#301 migration bullet, missing version- matrix 5.0.x row, missing README TOC entry, and a couple of style nits (sub-heading overpromise, orphan bullet). Skipped (deferred to future PRs): - README first-run tour mention. - Admin guide HTTP kill-switch row in Failure-modes table. - Terminal drag-drop trust-model precision update after plmbr#327. - Cipher description nit in plmbr#293 (Fernet AES-128-CBC+HMAC, not AES-GCM).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
NBI_SKILLS_MANIFEST(and the matchingskills_manifesttraitlet) now accept a comma-separated list of YAML/JSON manifests. The reconciler loads each independently and unions the entries, with first-wins URL dedupe and per-entry-error name dedupe. Single-manifest deployments behave identically; the wire format is strictly additive.The use case: a JupyterHub deployment can ship an org-wide baseline manifest plus a per-team manifest in one env var, without baking a single combined file at deploy time.
Solution
Three layers of change, one commit each, plus a follow-up commit applying the convergent review findings.
load_manifests(sources, *, token)innotebook_intelligence/skill_manifest.py. Iterates sources, catchesManifestErrorper source so a transient outage on one manifest doesn't block the others, and merges entries with two passes of dedupe:url:in two manifests means the earlier one wins; a warning names both manifests._slug()step the real installer applies (subpath casing, whitespace, capitals all normalize), so two URLs that would install to the same name collide before staging. The second entry drops with a per-entry error.SkillReconcilernow takesmanifest_sources: List[str]instead of a single string. When any source fails to load, the cycle skips stale-removal so the missing source's skills don't get falsely orphaned. Per-entry installs still run against whatever did load.util.split_csvhelper that both_resolve_csv_appendedand the new_resolve_skills_manifest_sourcesshare. The new helper means\" ,,, \"and other all-separator inputs degrade cleanly to "no manifests configured" rather than constructing phantom sources.Testing
tests/test_skill_manifest.py: 13 new cases.split_sources(5),load_manifests(8 covering union, URL dedupe with warning, name collision with explicit names, name collision via slugging, non-GitHub URL warning, partial fetch failure both orderings, URLError isolation, all-fail, order preservation, token-per-source).tests/test_skill_reconciler.py: 7 new cases in aTestMultipleManifestsclass. Multi-source union, URL dedupe through the reconciler, name collision through the reconciler, partial failure preserving managed skills (both[good, missing]and[missing, good]orderings), all-sources-fail, empty-sources no-op. Plus 15 existing tests adapted from the single-string signature to the new list signature.tests/test_config_integration.py: 7 new cases inTestSkillsManifestResolverpinning the env-vs-traitlet contract end to end, including the all-separator-input edge case.Full suite green:
pytest tests/ --ignore=tests/test_claude_client.py→ 908 passed.jlpm tsc --noEmit,jlpm lint:check,jlpm jestall clean.Risks and follow-ups
docs/skills.md; a follow-up could add a "force stale removal after K consecutive partial-failure cycles" escape hatch.SkillReconciler.stop()can block up toN * FETCH_TIMEOUT_SECONDSif the first manifest fetch hangs. Pre-existing single-manifest behavior; scales linearly with N. Follow-up could thread the stop event throughload_manifestsso the loop bails between sources.Skill.managed_sourcerecords the entry URL, not the owning manifest URL. With per-source ownership the reconciler could remove only the skills owned by the loaded manifests on partial failure, instead of skipping removal entirely. That's a schema change inserialize_skill_mdand warrants its own PR.NBI_MANAGED_SKILLS_TOKENis shipped as a Bearer header to every manifest URL in the list. Pre-existing single-manifest behavior; the multi-manifest path multiplies the blast radius of a misconfigured host. Worth a one-line caveat in the docs; deferred to keep this PR scoped to the wire-format change.The full review pass (six agents: code-reuse, code-quality, efficiency, backend/contract, distributed-systems, test-architecture) produced fix-before-merge items that all landed in the final commit. Remaining items above are follow-ups.