fix(self-host): harvest non-weight siblings into slug_dir (gh-8749)#9610

Merged
nnshah1 merged 4 commits into main from nnshah1/gh-8749-mm-routing-hf-harvest
May 16, 2026
@nnshah1
Contributor

@nnshah1 nnshah1 commented May 15, 2026

Summary

After PR #9057 (MDC verify-and-cache pipeline), the per-model slug_dir only contained the 5 typed metadata slots — config.json, tokenizer.json, tokenizer_config.json, generation_config.json, chat_template.*. Consumers that instantiate via from_pretrained(slug_dir) — lightseek-mm in particular — silently degraded to text-prefix-only KV routing because sibling files (preprocessor_config.json, special_tokens_map.json, added_tokens.json, tokenizer.model) were no longer reachable from the model directory.

This PR closes that regression with a ~80-LoC harvest pass at the end of resolve_metadata_files. After the typed-slot resolve loop, walk every directory the resolve actually touched and symlink non-weight sibling files into slug_dir if they aren't already there.

How it works

  1. Collect snapshot directories from two sources:
    • every hf:// snapshot dir returned by hub::from_hf (already populated in hf_snapshots)
    • the parent directory of every file:// URI that the typed-slot loop resolved (covers shared-mount workers and pre-PR1 workers whose CheckedFiles rewrite through checked_file_uri's rung 4 to hf://)
  2. For each snapshot dir, walk it and symlink any file that is:
    • a regular file
    • not in the weight-extension deny-list (.safetensors, .bin, .gguf, .onnx, .tflite, .h5, .pt, .pth, .msgpack)
    • not already present as a typed-slot entry in slug_dir

The symlink target is canonicalized so that downstream canonicalization lands on a stable blob path.
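The two steps above can be sketched roughly as follows. This is a minimal illustration of the harvest as first described, not the code in lib/llm/src/model_card.rs: the deny-list is copied from the description, while `link_or_copy`, the returned count, and the exact skip conditions are assumptions made for the sketch (the real code links through `symlink_force`).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Weight-extension deny-list quoted from the PR description.
const WEIGHT_EXTENSIONS: &[&str] = &[
    "safetensors", "bin", "gguf", "onnx", "tflite", "h5", "pt", "pth", "msgpack",
];

/// True when the path's extension is on the weight deny-list.
fn is_weight_file(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| WEIGHT_EXTENSIONS.contains(&ext))
        .unwrap_or(false)
}

/// Hypothetical stand-in for the real linking helper: symlink on Unix.
#[cfg(unix)]
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    std::os::unix::fs::symlink(src, dst)
}

/// Hypothetical stand-in for the real linking helper: copy elsewhere.
#[cfg(not(unix))]
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    fs::copy(src, dst).map(|_| ())
}

/// Links every regular, non-weight file in `snapshot_dir` into `slug_dir`,
/// leaving entries that already exist untouched. Returns how many were linked.
fn harvest_siblings(snapshot_dir: &Path, slug_dir: &Path) -> io::Result<usize> {
    let entries = match fs::read_dir(snapshot_dir) {
        Ok(entries) => entries,
        // A missing snapshot directory is treated as a no-op.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(0),
        Err(e) => return Err(e),
    };
    let mut linked = 0;
    for entry in entries {
        let path = entry?.path();
        // `is_file` follows symlinks, so symlinked cache entries count as files.
        if !path.is_file() || is_weight_file(&path) {
            continue;
        }
        let Some(name) = path.file_name() else { continue };
        let dst = slug_dir.join(name);
        if dst.symlink_metadata().is_ok() {
            continue; // already present, e.g. a typed-slot entry
        }
        // Canonicalize so the link points at the resolved blob path.
        link_or_copy(&fs::canonicalize(&path)?, &dst)?;
        linked += 1;
    }
    Ok(linked)
}
```

Calling `harvest_siblings(&snapshot, &slug_dir)` once per collected snapshot directory would then leave files such as preprocessor_config.json reachable from slug_dir while weight files stay out.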

Trust boundary

No new trust surface. Every harvested file comes from a snapshot directory the typed-slot resolve already trusted: either a hub::from_hf snapshot (ModelExpress / HF Hub verified) or a file://-resolved directory the worker pointed the typed slots at. Files in slug_dir that arrived via this pass and files that arrived via the typed-slot loop share the same provenance.

Coverage

| Worker mode | MDC slot URIs | Harvest fires? |
| --- | --- | --- |
| Current DYN_SELF_HOST_METADATA=false (HF-backed) | hf:// | ✅ from snapshot |
| Shared-mount worker (worker and frontend on same host / mount) | file:// (path reachable) | ✅ from file parent |
| Pre-PR1 worker (e.g. v1.1.0 image) without shared mount | file:// rewritten to hf:// at rung 4 | ✅ from snapshot |
| Future DYN_SELF_HOST_METADATA=true | http:// | ✗ (needs worker-advertised extras list; separate follow-up, lands before default-ON flip) |
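The coverage matrix reduces to a decision on the scheme of the resolved slot URI. A minimal sketch, assuming the URI has already been through any rung-4 rewrite; `HarvestSource` and `harvest_source` are hypothetical names used only for illustration:

```rust
/// Where the harvest would look for siblings of a resolved slot URI.
#[derive(Debug, PartialEq, Eq)]
enum HarvestSource {
    /// `hf://`: siblings come from the snapshot directory.
    HfSnapshot,
    /// `file://`: siblings come from the file's parent directory.
    FileParent,
    /// Anything else (for example `http://`): the harvest is a no-op.
    Skipped,
}

/// Classifies a slot URI by scheme, mirroring the rows of the coverage table.
fn harvest_source(slot_uri: &str) -> HarvestSource {
    if slot_uri.starts_with("hf://") {
        HarvestSource::HfSnapshot
    } else if slot_uri.starts_with("file://") {
        HarvestSource::FileParent
    } else {
        HarvestSource::Skipped
    }
}
```

Under this reading, the pre-PR1 row is covered because its file:// URIs have become hf:// by the time the harvest runs.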

Compared to #9441's piece #4

#9441 addresses the same symptom for lightseek-mm via a new hub::find_local_snapshot() helper called from a cfg(feature = "lightseek-mm") fallback in mdc_model_dir(). This PR is the same fix one layer earlier and more general:

  • ~15 LoC of wire-up + a ~50 LoC helper + tests, vs ~30 LoC + a new public API
  • Generic across all consumers (sglang/vllm chat processors, future image_processor_config.json readers, …), not just lightseek-mm
  • Lives in the unified resolve_metadata_files pipeline rather than a per-consumer fallback
  • No new public API; reuses the hf_snapshots map the typed-slot loop already builds

If this lands, pieces 1–3 of #9441 (Phi-3 substitution, Qwen2-VL chat-template placeholder, tokenizer-loader merge) stay as-is and piece #4 plus find_local_snapshot can be dropped.

Stale comment cleanup

The ABSOLUTE_MAX_METADATA_BYTES doc referenced a "fallback when size is absent (legacy MDCs)" — is_new_format() was already removed during the PR #9057 collapse. Updated to drop the legacy framing.

Test plan

  • Unit: harvest_siblings_copies_non_weights_skipping_existing — non-weight siblings land, weights filtered, existing typed-slot symlinks preserved
  • Unit: harvest_siblings_tolerates_missing_snapshot — missing snapshot dir is a no-op
  • cargo test -p dynamo-llm --lib model_card:: — all 15 tests pass
  • cargo test -p dynamo-llm --lib — 1101 tests pass, no regressions
  • cargo fmt --check clean
  • cargo clippy -- -D warnings clean
  • E2E: existing mm_agg_router_qwen2-vl-2b / mm_agg_router_qwen2.5-vl-3b / mm_agg_router_phi3-vision post-merge profiles (require GPU dispatch) — these tests use make_image_payload_cached_tokens and assert cached_tokens >= 1 on a repeat MM request, which is exactly what fails today and should pass after this change
  • E2E: pre-PR1 (v1.1.0) worker + this branch frontend — verify backcompat through rung-4 rewrite

Related

Summary by CodeRabbit

  • Improvements

    • Enhanced model deployment to better handle and include sibling metadata files during resolution.
  • Tests

    • Added unit tests for the metadata file harvesting mechanism during model resolution.

After PR #9057, `slug_dir` only contained the 5 typed metadata slots
(`config.json`, `tokenizer.json`, `tokenizer_config.json`,
`generation_config.json`, `chat_template.*`). Consumers that load
through `from_pretrained(slug_dir)` — lightseek-mm in particular —
silently degraded to text-prefix routing because sibling files like
`preprocessor_config.json` were no longer reachable from the model dir.

After the typed-slot resolve loop, harvest non-weight sibling files
from every directory the resolve actually walked: each `hf://` snapshot
returned by `hub::from_hf`, plus every `file://` parent dir (covers
shared-mount workers and pre-PR1 workers whose CheckedFiles rewrite
through rung 4 to `hf://`). Skip files in the worker-published weight
extension deny-list. Skip existing entries so typed-slot symlinks are
never disturbed. Files come from the same canonical snapshots the
typed slots already trust — no new trust surface introduced.

No-op for `http://` (default-ON self-host) — a worker-advertised
extras list is needed there and will land before the default-ON flip.

Also drops a stale "legacy MDCs" reference in the
`ABSOLUTE_MAX_METADATA_BYTES` comment (`is_new_format()` was removed
during the PR #9057 collapse).

Signed-off-by: nnshah1 <neelays@nvidia.com>
@github-actions github-actions Bot added the fix label May 15, 2026
@nnshah1 nnshah1 marked this pull request as ready for review May 15, 2026 21:43
@nnshah1 nnshah1 requested a review from a team as a code owner May 15, 2026 21:43
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR adds a sibling harvest mechanism to the model deployment card resolution pipeline. New helper functions filter weight blobs and symlink non-weight metadata files from snapshot directories into the model slug directory. Integration into the resolution flow collects snapshot directories from both HF and file artifacts, then harvests their sibling files. Unit tests validate correct file copying, weight skipping, and missing directory tolerance.

Changes

Sibling Harvest Mechanism

| Layer / File(s) | Summary |
| --- | --- |
| Sibling harvesting helper functions and tests, lib/llm/src/model_card.rs | Adds is_weight_file filter to identify weight blobs and harvest_siblings function that iterates snapshot entries, skips weights and unreadable paths, and symlinks/copies non-weight metadata into slug_dir without overwriting. Unit tests verify correct file copying, weight blob skipping, preservation of pre-existing typed-slot symlinks, and no-op behavior on missing directories. |
| Resolution pipeline Pass 2.5 integration, lib/llm/src/model_card.rs | Introduces "Pass 2.5" in ModelDeploymentCard::resolve_metadata_files that collects snapshot directories from pre-resolved hf:// snapshots and file:// artifact parents, then invokes harvest_siblings for each to populate additional non-weight metadata in slug_dir before cache path rewriting. |

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: harvesting non-weight sibling files into slug_dir to fix a regression from PR #9057. |
| Description check | ✅ Passed | The PR description comprehensively covers Overview (summary of regression and fix), Details (mechanism and trust analysis), Related Issues (closes #9057 regression, references #9441 and gh-8749), but lacks explicit 'Where should the reviewer start?' section. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/llm/src/model_card.rs (1)

1063-1089: 💤 Low value

Consider best-effort error handling for sibling harvest failures.

The current implementation propagates errors from harvest_siblings via ?, which fails the entire metadata resolution if any single sibling file fails to symlink (e.g., due to transient permission issues). Since the PR describes this as a supplementary pass for MM-aware routing that "silently degrades to text-prefix" without these files, consider logging errors and continuing instead of failing the full resolve.

However, if failing the resolve is intentional (to surface broken setups early), the current approach is acceptable.

🔧 Optional: Log and continue on harvest errors
         for snap in &snapshot_dirs {
-            harvest_siblings(snap, &slug_dir)?;
+            if let Err(e) = harvest_siblings(snap, &slug_dir) {
+                tracing::warn!(
+                    snapshot = %snap.display(),
+                    error = %e,
+                    "sibling harvest failed; MM routing may degrade to text-prefix",
+                );
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/model_card.rs` around lines 1063 - 1089, The call to
harvest_siblings inside the resolve pass currently uses the ? operator and will
abort the entire metadata resolution on any sibling-harvest error; change this
to best-effort handling so failures are logged and the loop continues. In
lib/llm/src/model_card.rs, replace the propagation at the harvest_siblings(snap,
&slug_dir)? call with an explicit match or if-let that logs the error (e.g.,
processLogger.warn or tracing::warn with context including snap and slug_dir)
and continues, so missing or transient permission errors don't fail the whole
resolve while preserving existing behavior when no errors occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/llm/src/model_card.rs`:
- Around line 2078-2081: The test currently calls
std::os::unix::fs::symlink(&typed_slot_blob, slug.path().join("config.json"))
which is Unix-only and will fail to compile on Windows; either wrap that symlink
call (or the whole test) with a #[cfg(unix)] guard, or replace the call with the
cross-platform helper symlink_force from the parent module to create the link;
locate the call by the symbols typed_slot_blob and slug.path() in model_card.rs
and apply the platform conditional or swap to symlink_force so the test builds
on all platforms.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a68ce10a-3588-481e-8ad9-2c6150dceb79

📥 Commits

Reviewing files that changed from the base of the PR and between f0a0891 and 4254853.

📒 Files selected for processing (1)
  • lib/llm/src/model_card.rs

Comment thread lib/llm/src/model_card.rs Outdated
Applied graham-code-review pass to the harvest:

- Trim `harvest_siblings` rustdoc — drop the trust-model paragraph
  (belongs in the PR description, not the function doc).
- Trim the "Pass 2.5" comment in `resolve_metadata_files` to two lines.
- Extract `file_uri_parent(uri: &str) -> Option<PathBuf>` helper so the
  wire-up reads as one `if let Some(parent) = ...` instead of a
  five-condition let-chain. Same total LoC; less to mentally parse at
  the call site; helper is independently testable.
- Split the monolithic `harvest_siblings_copies_non_weights_skipping_existing`
  test into three focused tests: `harvest_brings_in_non_weight_siblings`,
  `harvest_skips_weight_blobs`, `harvest_preserves_existing_dst`.
- Extend `download_config_pipelines_local_files_through_cache` with two
  asserts that the TinyLlama fixture's non-typed siblings
  (`special_tokens_map.json`, `tokenizer.model`) land in slug_dir —
  closes the end-to-end coverage gap for the harvest wire-up.

No behavior change. 17/17 model_card tests pass, fmt + clippy clean.

Signed-off-by: nnshah1 <neelays@nvidia.com>
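For illustration, the extracted helper and its call site might look like the sketch below. Only the signature `file_uri_parent(uri: &str) -> Option<PathBuf>` comes from the commit message; the body (plain prefix stripping, no percent-decoding) and the `collect_file_parents` wrapper are assumptions.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Returns the parent directory of a `file://` URI, or `None` for any other
/// scheme or for a path that has no usable parent.
fn file_uri_parent(uri: &str) -> Option<PathBuf> {
    let path = uri.strip_prefix("file://")?;
    let parent = Path::new(path).parent()?;
    if parent.as_os_str().is_empty() {
        return None; // bare file name: nothing to harvest from
    }
    Some(parent.to_path_buf())
}

/// Hypothetical wire-up: gather the distinct parent dirs of all `file://` URIs.
fn collect_file_parents<'a>(uris: impl IntoIterator<Item = &'a str>) -> BTreeSet<PathBuf> {
    let mut dirs = BTreeSet::new();
    for uri in uris {
        // The call site reads as a single `if let`, as the commit message describes.
        if let Some(parent) = file_uri_parent(uri) {
            dirs.insert(parent);
        }
    }
    dirs
}
```

Collecting into a set means several typed slots that live in the same directory trigger only one harvest of it.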
@nnshah1
Contributor Author

nnshah1 commented May 15, 2026

Self-review (graham-code-review skill)

Ran the graham-code-review skill against this PR; capturing the findings here for transparency. Tightenings already applied in 62836e016b (this push):

Applied:

  • Trimmed harvest_siblings rustdoc — dropped the trust-model paragraph (belongs in PR description, not the function doc).
  • Trimmed the "Pass 2.5" comment block in resolve_metadata_files to two lines.
  • Extracted file_uri_parent(uri: &str) -> Option<PathBuf> helper so the wire-up reads as one if let Some(parent) = ... instead of a five-condition let-chain.
  • Split the monolithic harvest test into three focused tests (harvest_brings_in_non_weight_siblings, harvest_skips_weight_blobs, harvest_preserves_existing_dst).
  • Closed the wire-up coverage gap: download_config_pipelines_local_files_through_cache now asserts that the TinyLlama fixture's non-typed siblings (special_tokens_map.json, tokenizer.model) land in slug_dir after the full pipeline.

Rule checks passed:

  • No unwrap()/expect() in production code; tests use ?
  • tracing (not log) with structured fields (%/?)
  • debug! level for routine internal events; no hot-path logging
  • No reflexive Arc<Mutex<…>>, no new dependencies, SPDX-only header
  • is_weight_file correctly is_-prefixed
  • Symlink replacement is atomic (stage_and_rename_sync → std::fs::rename); the dst.exists() / symlink_force race is benign (idempotent)
  • path.is_file() follows symlinks — intentional for HF cache layout

One acknowledged caveat (not addressed):

  • harvest_siblings is synchronous I/O (std::fs::read_dir, std::fs::canonicalize, symlink_force) called from async fn resolve_metadata_files. Consistent with the rest of this file (stage_and_rename_sync etc.) — pre-existing pattern, not introduced here. Volume is small (1-3 snapshot dirs, ~15 files each), so the blocking impact is in the noise. If the MDC metadata footprint ever balloons, wrap in tokio::task::spawn_blocking.

Final diff: +147 / -85 vs main (down from +166 / -1 before this self-review pass).

…odeRabbit)

Replace direct `std::os::unix::fs::symlink` with the cross-platform
`symlink_force` helper used by the production code path, per CodeRabbit
review on PR #9610. Test compiles on non-Unix and matches the rest of
the file's pattern.

Signed-off-by: nnshah1 <neelays@nvidia.com>
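A cross-platform helper of this kind could be shaped roughly like the sketch below. It is not the repository's `symlink_force`: the name `symlink_force_sketch`, the staging-file naming, and the copy fallback on non-Unix targets are assumptions, loosely following the thread's description of a stage-then-rename replacement.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stage the new entry as a symlink on Unix.
#[cfg(unix)]
fn stage(src: &Path, staged: &Path) -> io::Result<()> {
    std::os::unix::fs::symlink(src, staged)
}

/// Stage the new entry as a plain copy where symlinks are not assumed.
#[cfg(not(unix))]
fn stage(src: &Path, staged: &Path) -> io::Result<()> {
    fs::copy(src, staged).map(|_| ())
}

/// Points `dst` at `src`, replacing whatever entry `dst` currently holds.
/// The new entry is staged next to `dst` and then renamed over it, so a
/// reader sees either the old entry or the new one.
fn symlink_force_sketch(src: &Path, dst: &Path) -> io::Result<()> {
    let name = dst
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dst has no file name"))?;
    let mut staged_name = name.to_os_string();
    staged_name.push(format!(".tmp-{}", std::process::id()));
    let staged = dst.with_file_name(staged_name);

    // Clear a staging entry left behind by an interrupted earlier call.
    match fs::remove_file(&staged) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    stage(src, &staged)?;
    // `rename` replaces an existing destination entry.
    fs::rename(&staged, dst)
}
```

Because the helper degrades to a copy off Unix, a test that checks file contents stays portable, whereas one that checks `is_symlink()` does not.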
Contributor

@krishung5 krishung5 left a comment

Thanks Neelay for putting up the fix in short time!

Comment thread lib/llm/src/model_card.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Comment thread lib/llm/src/model_card.rs Outdated
Addresses three review comments on PR #9610:

1. krishung5 (nit): the file_uri_parent helper's doc + body landed
   AFTER harvest_siblings's doc when I extracted the helper, so the
   harvest_siblings doc was orphaned and the helper carried two stacked
   rustdoc blocks. Swap order: file_uri_parent first with its doc,
   then harvest_siblings with its doc directly above the function.

2. dynamo-ops (correctness): drop the `if dst.exists() { continue }`
   check, which left stale harvested siblings in a reused slug_dir
   when mdcsum collides across different upstream snapshots. Add a
   `typed_filenames: &HashSet<String>` argument so the harvest knows
   which basenames the resolve loop's typed-slot pass owns; skip
   those and re-link every other sibling unconditionally via
   `symlink_force` (already idempotent + atomic). Callers populate
   the set from `entries` (the typed-slot uri list).

3. dynamo-ops (portability): the `harvest_preserves_existing_dst`
   test asserted `preserved.is_symlink()`, which fails on non-Unix
   where `symlink_force` falls back to a copy. Rewritten as
   `harvest_preserves_typed_filenames`: places a typed-slot symlink
   AND a different `config.json` payload in the snapshot dir, runs
   the harvest with the typed name in the deny-list, asserts the
   typed-slot content survives (content equality is portable).

17/17 model_card tests pass, fmt + clippy clean.

Signed-off-by: nnshah1 <neelays@nvidia.com>
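The revised contract from item 2 can be sketched as follows: typed-slot basenames are skipped, and every other non-weight sibling is re-linked unconditionally so a reused slug_dir cannot keep a stale link. The `typed_filenames: &HashSet<String>` argument is taken from the commit message; `relink` is a simplified, non-atomic stand-in for `symlink_force`, and the remaining details are assumptions.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

const WEIGHT_EXTENSIONS: &[&str] = &[
    "safetensors", "bin", "gguf", "onnx", "tflite", "h5", "pt", "pth", "msgpack",
];

fn is_weight_file(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| WEIGHT_EXTENSIONS.contains(&ext))
        .unwrap_or(false)
}

#[cfg(unix)]
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    std::os::unix::fs::symlink(src, dst)
}

#[cfg(not(unix))]
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<()> {
    fs::copy(src, dst).map(|_| ())
}

/// Simplified stand-in for `symlink_force`: drop any old entry, then link.
/// Unlike the helper described in the thread, this is not atomic.
fn relink(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::remove_file(dst) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    link_or_copy(src, dst)
}

/// Re-links every non-weight sibling except the basenames owned by the
/// typed-slot pass, so siblings from an earlier snapshot are overwritten.
fn harvest_siblings(
    snapshot_dir: &Path,
    slug_dir: &Path,
    typed_filenames: &HashSet<String>,
) -> io::Result<()> {
    let entries = match fs::read_dir(snapshot_dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(()),
        Err(e) => return Err(e),
    };
    for entry in entries {
        let path = entry?.path();
        if !path.is_file() || is_weight_file(&path) {
            continue;
        }
        // Non-UTF-8 names are skipped in this sketch.
        let Some(name) = path.file_name().and_then(|n| n.to_str()) else { continue };
        if typed_filenames.contains(name) {
            continue; // owned by the typed-slot pass; never touched here
        }
        relink(&fs::canonicalize(&path)?, &slug_dir.join(name))?;
    }
    Ok(())
}
```

Running the harvest against one snapshot and then another into the same slug_dir leaves each sibling pointing at the second snapshot, while a typed name such as config.json keeps whatever the typed-slot pass placed there.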
@nnshah1 nnshah1 enabled auto-merge (squash) May 16, 2026 00:40
@nnshah1 nnshah1 merged commit 94771a9 into main May 16, 2026
99 checks passed
@nnshah1 nnshah1 deleted the nnshah1/gh-8749-mm-routing-hf-harvest branch May 16, 2026 00:40
saturley-hall pushed a commit that referenced this pull request May 17, 2026
…#9610) (#9652)

Signed-off-by: nnshah1 <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
nnshah1 added a commit that referenced this pull request May 18, 2026
Worker harvest was using `HuggingFaceProvider::is_weight_file` from
mx, whose deny-list is narrower than the one used by the frontend's
gh-8749 hf-sibling harvest (#9610): mx misses `.gguf`, `.onnx`,
`.tflite`, `.pt`, `.pth`. That asymmetry meant a stray PyTorch /
ONNX / gguf weight in the model dir would slip past the worker
filter, get advertised as an `extra_files` entry, and ride the
metadata HTTP path to the frontend — turning a multi-GB weight
blob into a config-cache resident.

Promote `model_card::is_weight_file` to `pub(crate)` and call it
from the worker harvest. Single source of truth for "is this a
weight?" across both gh-8749 harvests; no new constants.

Bumps the worker harvest test to also assert `.pt` is filtered.

Signed-off-by: nnshah1 <neelays@nvidia.com>
3 participants