fix(self-host): harvest non-weight siblings into slug_dir (gh-8749)#9610
Conversation
After PR #9057, `slug_dir` only contained the 5 typed metadata slots (`config.json`, `tokenizer.json`, `tokenizer_config.json`, `generation_config.json`, `chat_template.*`). Consumers that load through `from_pretrained(slug_dir)` — lightseek-mm in particular — silently degraded to text-prefix routing because sibling files like `preprocessor_config.json` were no longer reachable from the model dir. After the typed-slot resolve loop, harvest non-weight sibling files from every directory the resolve actually walked: each `hf://` snapshot returned by `hub::from_hf`, plus every `file://` parent dir (covers shared-mount workers and pre-PR1 workers whose CheckedFiles rewrite through rung 4 to `hf://`). Skip files in the worker-published weight extension deny-list. Skip existing entries so typed-slot symlinks are never disturbed. Files come from the same canonical snapshots the typed slots already trust — no new trust surface introduced. No-op for `http://` (default-ON self-host) — a worker-advertised extras list is needed there and will land before the default-ON flip. Also drops a stale "legacy MDCs" reference in the `ABSOLUTE_MAX_METADATA_BYTES` comment (`is_new_format()` was removed during the PR #9057 collapse). Signed-off-by: nnshah1 <neelays@nvidia.com>
WalkthroughThis PR adds a sibling harvest mechanism to the model deployment card resolution pipeline. New helper functions filter weight blobs and symlink non-weight metadata files from snapshot directories into the model slug directory. Integration into the resolution flow collects snapshot directories from both HF and file artifacts, then harvests their sibling files. Unit tests validate correct file copying, weight skipping, and missing directory tolerance. ChangesSibling Harvest Mechanism
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/llm/src/model_card.rs (1)
1063-1089: 💤 Low valueConsider best-effort error handling for sibling harvest failures.
The current implementation propagates errors from
harvest_siblingsvia?, which fails the entire metadata resolution if any single sibling file fails to symlink (e.g., due to transient permission issues). Since the PR describes this as a supplementary pass for MM-aware routing that "silently degrades to text-prefix" without these files, consider logging errors and continuing instead of failing the full resolve.However, if failing the resolve is intentional (to surface broken setups early), the current approach is acceptable.
🔧 Optional: Log and continue on harvest errors
for snap in &snapshot_dirs { - harvest_siblings(snap, &slug_dir)?; + if let Err(e) = harvest_siblings(snap, &slug_dir) { + tracing::warn!( + snapshot = %snap.display(), + error = %e, + "sibling harvest failed; MM routing may degrade to text-prefix", + ); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/llm/src/model_card.rs` around lines 1063 - 1089, The call to harvest_siblings inside the resolve pass currently uses the ? operator and will abort the entire metadata resolution on any sibling-harvest error; change this to best-effort handling so failures are logged and the loop continues. In lib/llm/src/model_card.rs, replace the propagation at the harvest_siblings(snap, &slug_dir)? call with an explicit match or if-let that logs the error (e.g., processLogger.warn or tracing::warn with context including snap and slug_dir) and continues, so missing or transient permission errors don't fail the whole resolve while preserving existing behavior when no errors occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/llm/src/model_card.rs`:
- Around line 2078-2081: The test currently calls
std::os::unix::fs::symlink(&typed_slot_blob, slug.path().join("config.json"))
which is Unix-only and will fail to compile on Windows; either wrap that symlink
call (or the whole test) with a #[cfg(unix)] guard, or replace the call with the
cross-platform helper symlink_force from the parent module to create the link;
locate the call by the symbols typed_slot_blob and slug.path() in model_card.rs
and apply the platform conditional or swap to symlink_force so the test builds
on all platforms.
---
Nitpick comments:
In `@lib/llm/src/model_card.rs`:
- Around line 1063-1089: The call to harvest_siblings inside the resolve pass
currently uses the ? operator and will abort the entire metadata resolution on
any sibling-harvest error; change this to best-effort handling so failures are
logged and the loop continues. In lib/llm/src/model_card.rs, replace the
propagation at the harvest_siblings(snap, &slug_dir)? call with an explicit
match or if-let that logs the error (e.g., processLogger.warn or tracing::warn
with context including snap and slug_dir) and continues, so missing or transient
permission errors don't fail the whole resolve while preserving existing
behavior when no errors occur.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a68ce10a-3588-481e-8ad9-2c6150dceb79
📒 Files selected for processing (1)
lib/llm/src/model_card.rs
Applied graham-code-review pass to the harvest: - Trim `harvest_siblings` rustdoc — drop the trust-model paragraph (belongs in the PR description, not the function doc). - Trim the "Pass 2.5" comment in `resolve_metadata_files` to two lines. - Extract `file_uri_parent(uri: &str) -> Option<PathBuf>` helper so the wire-up reads as one `if let Some(parent) = ...` instead of a five-condition let-chain. Same total LoC; less to mentally parse at the call site; helper is independently testable. - Split the monolithic `harvest_siblings_copies_non_weights_skipping_existing` test into three focused tests: `harvest_brings_in_non_weight_siblings`, `harvest_skips_weight_blobs`, `harvest_preserves_existing_dst`. - Extend `download_config_pipelines_local_files_through_cache` with two asserts that the TinyLlama fixture's non-typed siblings (`special_tokens_map.json`, `tokenizer.model`) land in slug_dir — closes the end-to-end coverage gap for the harvest wire-up. No behavior change. 17/17 model_card tests pass, fmt + clippy clean. Signed-off-by: nnshah1 <neelays@nvidia.com>
Self-review (graham-code-review skill)Ran the Applied:
Rule checks passed:
One acknowledged caveat (not addressed):
Final diff: +147 / -85 vs main (down from +166 / -1 before this self-review pass). |
…odeRabbit) Replace direct `std::os::unix::fs::symlink` with the cross-platform `symlink_force` helper used by the production code path, per CodeRabbit review on PR #9610. Test compiles on non-Unix and matches the rest of the file's pattern. Signed-off-by: nnshah1 <neelays@nvidia.com>
krishung5
left a comment
There was a problem hiding this comment.
Thanks Neelay for putting up the fix in short time!
Addresses three review comments on PR #9610: 1. krishung5 (nit): the file_uri_parent helper's doc + body landed AFTER harvest_siblings's doc when I extracted the helper, so the harvest_siblings doc was orphaned and the helper carried two stacked rustdoc blocks. Swap order: file_uri_parent first with its doc, then harvest_siblings with its doc directly above the function. 2. dynamo-ops (correctness): drop the `if dst.exists() { continue }` check, which left stale harvested siblings in a reused slug_dir when mdcsum collides across different upstream snapshots. Add a `typed_filenames: &HashSet<String>` argument so the harvest knows which basenames the resolve loop's typed-slot pass owns; skip those and re-link every other sibling unconditionally via `symlink_force` (already idempotent + atomic). Callers populate the set from `entries` (the typed-slot uri list). 3. dynamo-ops (portability): the `harvest_preserves_existing_dst` test asserted `preserved.is_symlink()`, which fails on non-Unix where `symlink_force` falls back to a copy. Rewritten as `harvest_preserves_typed_filenames`: places a typed-slot symlink AND a different `config.json` payload in the snapshot dir, runs the harvest with the typed name in the deny-list, asserts the typed-slot content survives (content equality is portable). 17/17 model_card tests pass, fmt + clippy clean. Signed-off-by: nnshah1 <neelays@nvidia.com>
Worker harvest was using `HuggingFaceProvider::is_weight_file` from mx, whose deny-list is narrower than the one used by the frontend's gh-8749 hf-sibling harvest (#9610): mx misses `.gguf`, `.onnx`, `.tflite`, `.pt`, `.pth`. That asymmetry meant a stray PyTorch / ONNX / gguf weight in the model dir would slip past the worker filter, get advertised as an `extra_files` entry, and ride the metadata HTTP path to the frontend — turning a multi-GB weight blob into a config-cache resident. Promote `model_card::is_weight_file` to `pub(crate)` and call it from the worker harvest. Single source of truth for "is this a weight?" across both gh-8749 harvests; no new constants. Bumps the worker harvest test to also assert `.pt` is filtered. Signed-off-by: nnshah1 <neelays@nvidia.com>
Summary
After PR #9057 (MDC verify-and-cache pipeline), the per-model
slug_dironly contained the 5 typed metadata slots —config.json,tokenizer.json,tokenizer_config.json,generation_config.json,chat_template.*. Consumers that instantiate viafrom_pretrained(slug_dir)— lightseek-mm in particular — silently degraded to text-prefix-only KV routing because sibling files (preprocessor_config.json,special_tokens_map.json,added_tokens.json,tokenizer.model) were no longer reachable from the model directory.This PR closes that regression with a ~80-LoC harvest pass at the end of
resolve_metadata_files. After the typed-slot resolve loop, walk every directory the resolve actually touched and symlink non-weight sibling files intoslug_dirif they aren't already there.How it works
hf://snapshot dir returned byhub::from_hf(already populated inhf_snapshots)file://URI that the typed-slot loop resolved (covers shared-mount workers and pre-PR1 workers whose CheckedFiles rewrite throughchecked_file_uri's rung 4 tohf://).safetensors,.bin,.gguf,.onnx,.tflite,.h5,.pt,.pth,.msgpack)slug_dirThe symlink target is
canonicalized so downstreamcanonicalizes land on a stable blob path.Trust boundary
No new trust surface. Every harvested file comes from a snapshot directory the typed-slot resolve already trusted — either an
hub::from_hfsnapshot (ModelExpress / HF Hub verified) or afile://-resolved directory the worker pointed the typed slots at. Files inslug_dirthat arrived via this pass and files that arrived via the typed-slot loop share the same provenance.Coverage
DYN_SELF_HOST_METADATA=false(HF-backed)hf://file://(path reachable)file://rewritten tohf://at rung 4DYN_SELF_HOST_METADATA=truehttp://Compared to #9441's piece #4
#9441 addresses the same symptom for lightseek-mm via a new
hub::find_local_snapshot()helper called from acfg(feature = "lightseek-mm")fallback inmdc_model_dir(). This PR is the same fix one layer earlier and more general:image_processor_config.jsonreaders, …), not just lightseek-mmresolve_metadata_filespipeline rather than a per-consumer fallbackhf_snapshotsmap the typed-slot loop already buildsIf this lands, pieces 1–3 of #9441 (Phi-3 substitution, Qwen2-VL chat-template placeholder, tokenizer-loader merge) stay as-is and piece #4 plus
find_local_snapshotcan be dropped.Stale comment cleanup
The
ABSOLUTE_MAX_METADATA_BYTESdoc referenced a "fallback whensizeis absent (legacy MDCs)" —is_new_format()was already removed during the PR #9057 collapse. Updated to drop the legacy framing.Test plan
harvest_siblings_copies_non_weights_skipping_existing— non-weight siblings land, weights filtered, existing typed-slot symlinks preservedharvest_siblings_tolerates_missing_snapshot— missing snapshot dir is a no-opcargo test -p dynamo-llm --lib model_card::— all 15 tests passcargo test -p dynamo-llm --lib— 1101 tests pass, no regressionscargo fmt --checkcleancargo clippy -- -D warningscleanmm_agg_router_qwen2-vl-2b/mm_agg_router_qwen2.5-vl-3b/mm_agg_router_phi3-visionpost-merge profiles (require GPU dispatch) — these tests usemake_image_payload_cached_tokensand assertcached_tokens >= 1on a repeat MM request, which is exactly what fails today and should pass after this changeRelated
gh-8749umbrella — worker self-host metadataSummary by CodeRabbit
Improvements
Tests