Update README.md#1
Merged
Merged
Conversation
saturley-hall
approved these changes
Mar 4, 2025
kylehh
pushed a commit
to kylehh/dynamo
that referenced
this pull request
Apr 11, 2025
copy-pr-bot Bot
pushed a commit
that referenced
this pull request
Sep 9, 2025
* add metrics to disconnect * fmt * fmt Signed-off-by: michaelfeil <me@michaelfeil.eu>
grahamking
pushed a commit
that referenced
this pull request
Sep 10, 2025
ayushag-nv
pushed a commit
that referenced
this pull request
Sep 15, 2025
zhongdaor-nv
pushed a commit
that referenced
this pull request
Sep 15, 2025
elyasmnvidian
added a commit
that referenced
this pull request
Sep 22, 2025
Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>
ShounakRay
added a commit
to ShounakRay/fuzzy-dynamo
that referenced
this pull request
Mar 20, 2026
* chore: explanation * feat: add advanced fuzzing harnesses for dynamo-parsers Add invariant, differential, ReDoS, and tool-definition-aware fuzz targets with seed corpus, token dictionary, and coverage tooling. * test: add regression tests for 5 parser bugs Bugs 1-2 (granite streaming prefix loss/trim asymmetry), bug 5 (DeepSeek V3 newline normalization), bug 6 (pythonic text drop), bug 9 (base_json text drop), bug 14 (harmony empty content). * refactor: deduplicate parser code with shared utilities Extract chunk_ends_with_token_prefix() and decode_xml_entities() into tool_calling/utils.rs, replacing 6 inline loops and 2 entity decoders. Fixes UTF-8 byte-slicing panics in XML and GLM47 parsers. Includes regression tests for bugs 3-4, 7-8, 10-13. * feat: add kv-router fuzz targets 5 fuzz targets for dynamo-kv-router: radix tree state machine, block hash computation, KV protocol JSON deserialization, positional indexer, and request extra info splitting. Includes dictionary and seed corpus. * feat: add content preservation and streaming monotonicity fuzz oracles Two new parser fuzz targets: - fuzz_content_preservation: verifies text outside tool-call delimiters survives parsing without fabrication - fuzz_streaming_monotonicity: verifies streaming output accumulates monotonically (never shrinks) * feat: add tokens hash fuzz targets 3 fuzz targets for dynamo-tokens: positional sequence hash round-trip, positional lineage hash boundary testing, and token block sequence stateful invariants. * feat: add runtime codec fuzz targets 4 fuzz targets for dynamo-runtime: two-part message encode/decode round-trip, two-part decode crash oracle, TCP request message decode, and event plane frame decode. * test: add llm protocol regression tests * feat: add unified fuzzing runner, update findings and next steps * refactor: deduplicate fuzz targets with shared utility modules * feat: confirm kv_block_size=0 crash bug via fuzzing * feat: confirm 4 new crash bugs via extended fuzzing campaign New crashes found: - RadixTree RefCell reentrant borrow (radix_tree.rs:371) - RequestExtraInfo division by zero (protocols.rs:365) - TwoPartCodec integer overflow (two_part.rs:58) - MiniMaxAppendThink differential trim asymmetry All 27 fuzz targets run across 5 crates (~140M total executions). Updated findings and next steps documents. Added adversarial seeds targeting integer boundaries and hash collisions. * refactor: minimize fuzz targets and scripts (-1187 lines) Remove verbose comments, doc strings, and redundant code across all 27 fuzz targets, 2 shared libs, and 2 scripts. Delete superseded files (run_parser_fuzz.sh, repo_overview.md). Make coverage.sh auto-discover crates instead of maintaining a hardcoded target list. * docs: expand FINDINGS.md with detailed reproduction steps for all 20 bugs Each bug now includes: description, severity, how found (fuzzing vs audit), exact file/line, crash artifacts, fuzzing reproduction commands, standalone reproducers, root cause analysis, and suggested fixes. * docs: add individual upstream issue templates for all 20 bugs Each file follows the ai-dynamo/dynamo [BUG] issue format with: description, steps to reproduce, expected/actual behavior, suggested fix, environment, and additional context. Related bugs are grouped into single issues where appropriate (3+4, 6+9, 12+13, 19+20). * docs: update findings and issues based on intentionality audit - Remove bug 12/13 issue template (already fixed upstream via character-based iteration in chunk_ends_with_token_prefix) - Mark bugs 12/13 as already fixed in FINDINGS.md - Add debatable verdicts to bugs 5, 10, 19 in FINDINGS.md - Add context notes to issue templates for debatable bugs - Update NEXT_STEPS.md with extended run suggestions * chore: remove upstream CI workflows (require NVIDIA infra) * feat: add tool-call roundtrip and radix tree consistency fuzz targets * docs: add security assessment, severity rankings, and extended campaign results * chore: add setup script for remote configuration * chore: add WORKFLOW.md, tracked hooks, and updated setup script * docs: restore repo_overview.md * 🧪 feat: add ZMQ wire and SSE codec fuzz targets Expand fuzzing coverage to kv-router ZMQ wire protocol and LLM SSE codec * Add 4 kv-router fuzz targets: decode, convert, roundtrip, extra_keys * Add ZMQ wire dictionary entries and msgpack markers * Add 16 ZMQ wire seed corpus files for structured fuzzing * Add llm/fuzz crate with 3 SSE codec fuzz targets (decode, decode_data, streaming) * Add SSE seed corpus (14 files) and dictionary * Add rmp-serde dependency for msgpack serialization in fuzz harness --------- Co-authored-by: ShounakRay <shounak@stanford.edu>
ShounakRay
added a commit
to ShounakRay/fuzzy-dynamo
that referenced
this pull request
Mar 20, 2026
…ers) LaTeX book covering transformer inference, KV caching, distributed serving, Dynamo architecture, KV-aware routing, token blocks, network codecs, fuzzing methodology, and 16 bugs found across 36 fuzz targets. All bug references use ch10 canonical numbering (ai-dynamo#1-ai-dynamo#16), all cross- references resolve, all arithmetic verified. Includes TikZ diagrams, whynotbox pedagogical environments, and marginnotes throughout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
4 tasks
tedzhouhk
added a commit
that referenced
this pull request
Apr 18, 2026
Adds TestQwen235BugReproCase with the exact two picks from the original triage report: * Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4 * Decode: tp=1, dp=8, moe_tp=2, moe_ep=4 Each test pins down a specific symptom that the old profiler path produced: * test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs) and Bug #2 (using .tp_size property) would break the identity for this pick. Verifies all five kwargs and the MoE identity. * test_decode_pick_satisfies_aic_identity — same for the decode pick, which was the exact shape that triggered Bug #3 (missing attention_dp_size). * test_tp_size_property_differs_from_aic_tp_size — direct regression guard: documents that the property returns 1 here while AIC's tp_size is 4. Anyone who wires .tp_size back into AIC kwargs will break this. * test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both sweeps and asserts every AIC call gets complete kwargs. This is the direct reproducer of the originally-reported failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
ranrubin
added a commit
that referenced
this pull request
Apr 20, 2026
Fixes all actionable items from the second review: Bug fixes: - #1: Change returncode=4 → returncode=2 in pytest_configure exit (4 is reserved by pytest for EXIT_NOTESTSCOLLECTED) - #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original) Test quality: - #7: Add missing assertions to test_apply_hf_home_layout (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE) - #8: Use monkeypatch in tests 3 & 4 for proper env isolation (prevents pre-existing env vars from leaking on test failure) Design / correctness: - #3: Fix _models_dir_env docstring ("exactly once" → "once per worker") - #4: Add comment noting TRANSFORMERS_CACHE deprecation - #5: Update --models-dir help text and docs to reflect both supported layouts (bare HF_HUB_CACHE and HF_HOME), not just bare - #10: Restore pytest.skip() in download_lora() (test-only infra); remove now-redundant guard from minio_lora_service fixture - #11: Raise hub/ detection log to WARNING with guidance - #12: Replace shutil.rmtree(ignore_errors=True) with try/except so cleanup failures are logged rather than silently swallowed Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester test deferred — complex due to conftest dependencies, low severity) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>
ranrubin
added a commit
that referenced
this pull request
Apr 20, 2026
Bug fixes: - #1: Add monkeypatch env isolation to test_apply_sets_writable_transformers_cache - #2: Add TRANSFORMERS_CACHE assertion to test_apply_bare_cache_layout (bare-layout path was missing the writable-dir check present in HF_HOME test) Minor cleanups: - #4: Move `import pytest` from inside download_lora() to module top-level (lora_utils.py is test-only infra; pytest is always available) - #5: Replace pytestconfig.getoption("--models-dir") re-checks in predownload_models/predownload_tokenizers with os.environ.get("DYNAMO_MODELS_DIR") (_models_dir_env runs first and sets the var; single source of truth) New coverage tests: - #7: test_models_dir_nonexistent_exits_with_code_2 — subprocess test verifying pytest_configure exits with returncode=2 on bad path - #8: test_download_lora_skips_in_models_dir_mode — verifies download_lora() raises pytest.skip.Exception when DYNAMO_MODELS_DIR is set Not addressed: #3 (keep gpu_0 per project guidelines and previous review retraction), #6 (hook ordering is guaranteed), #9 (complex, low priority) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>
nv-tusharma
added a commit
that referenced
this pull request
Apr 21, 2026
Follow-up on the PR #8415 repro. The reproduction showed two problems in how multi-worker sglang tests handle startup failures: 1. Both workers in SGLangProcess share log_dir = request.node.name, so their stdout/stderr race into a single python3.log.txt. When worker #2 dies during init, worker #1's later output clobbers the crash reason and we cannot tell why worker #2 failed. 2. ManagedEngineProcessMixin.__enter__ launches workers sequentially with a 5 s init_delay but never calls proc.poll(), so a dead worker is indistinguishable from a healthy one. The test then blocks in KvRouter(endpoint, ...) (see tests/router/common.py:1993) waiting for min_initial_workers=2 instances that will never register -- manifesting as the 5 h nightly hang. Changes: - tests/router/test_router_e2e_with_sglang.py: give each worker its own log_dir subdirectory (worker_<idx>) so crash output is preserved. - tests/router/e2e_harness.py: add _assert_workers_alive() that polls every launched worker after delayed_start and after init_delay, and raise RuntimeError with the log path if any worker exited during startup. Fails the test in ~seconds instead of hanging for hours. - .github/workflows/pr.yaml: extend the sglang-hang-repro diagnostic step to tail each worker's python3.log.txt so the next repro captures worker #2's death reason directly. DYN-2784: https://linear.app/nvidia/issue/DYN-2784 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 24, 2026
keivenchang
added a commit
that referenced
this pull request
Apr 30, 2026
Adds 8 integration tests in lib/llm/tests/tool_choice.rs covering each
parser × {auto, required, named-correct, named-wrong}. Two tests pin
the parser-vs-immediate-jail conflict for tool_choice=required with a
TODO(CASE.11) note pointing at DIS-1842 work-item #1.
Renames the parser-unit placeholder tests to drop the bogus CASE.11/12
labels (parser is contract-blind by design). The renamed tests document
the parser-level invariant: parser does NOT filter by tool_choice and
its output is byte-stable regardless of upstream finish_reason.
Updates DSv4 coverage chart to point CASE.11 at the new integration
tests and CASE.12 at the parser-invariant test + cross-parser fixtures
in lib/llm/tests/test_streaming_tool_parsers.rs.
8 new integration tests; 16 tool_choice tests pass; 424 parser tests
pass; clippy clean.
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
keivenchang
added a commit
that referenced
this pull request
Apr 30, 2026
CodeRabbit flagged the new tests as carrying internal Linear ticket references in their doc-comments — repo policy forbids Linear refs in source code. Replaced each occurrence with neutral wording: - DIS-1842 work-item #1 → cross-parser tool_choice parametrisation work-item (tracked separately) - DIS-1842 work-item #5 → cross-parser finish_reason mapping work-item (tracked separately) - Universal gap noted in DIS-1842 → Universal gap noted in the test taxonomy Linear refs remain in the PR description (allowed per repo convention). 424 parser tests + 16 tool_choice tests pass. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
yifjiang
added a commit
to yifjiang/dynamo
that referenced
this pull request
May 4, 2026
…code-waiter cleanup Addresses two correctness issues raised by dynamo-ops review on PR 8965. 1. **Race on PrefillReady re-insert** (model_manager.rs) Both `register_prefill_router` and `activate_prefill_router` previously used a `remove → process → insert` pattern on `prefill_router_activators`. Between the `remove` and the `insert` the entry is gone from the map, so a concurrent `remove_prefill_activator` call (called by the watcher on prefill-component teardown) hits the empty map, skips its cleanup, and we then re-insert a stale `PrefillReady` for a prefill that's already gone. The next decode rebuild's lookup finds the resurrected cache and activates a `PrefillRouter` against a dead endpoint. Refactor both functions to use DashMap's `Entry` API. The shard lock is held for the duration of the OccupiedEntry, so any concurrent `remove_prefill_activator` serializes after us and observes the entry it needs to clear. The PrefillReady-consume path in `register_prefill_router` now reads-and-clones in place without removing; the DecodeWaiting/PrefillReady → PrefillReady transition in `activate_prefill_router` uses `OccupiedEntry::insert(...)` for an atomic value swap that returns the old value. 2. **Decode-waiter cleanup gated on `removed.is_some()`** (watcher.rs) `handle_delete_helper`'s decode branch only called `remove_decode_prefill_waiter` when `remove_worker_set` returned `Some(_)`. If decode registered (creating a `DecodeWaiting` activator entry via `register_prefill_router`) but `handle_add_helper` failed later (e.g., on `kv_chooser_for`, monitor setup, or `build_routed_pipeline`) before `add_worker_set`, no WorkerSet ever landed in the manager — and on later teardown `remove_worker_set` returns `None`, leaving the stale `DecodeWaiting` entry orphaned in the activator map. Drop the `removed.is_some()` guard. The helper itself is state-safe (uses `DashMap::remove_if(|_, v| matches!(v, DecodeWaiting(_)))`) so calling it on a key that's vacant or holds `PrefillReady` is a no-op. The prefill-teardown branch keeps its `removed.is_some()` guard because `remove_prefill_activator` blanket-removes both states, and calling it without a real WorkerSet teardown could orphan a live decode-side `DecodeWaiting`. Behavior is unchanged for all four reproducers (ai-dynamo#1 burst-sweep canary OFF, ai-dynamo#2 Pattern 1 multi-decode restart, ai-dynamo#3 Pattern 2 stale-DecodeWaiting, ai-dynamo#4 burst-sweep canary ON); will re-verify on a fresh build.
4 tasks
5 tasks
keivenchang
added a commit
that referenced
this pull request
May 7, 2026
…p-N / Others DIS-1842's top-N model #1 (DeepSeek V4) used the deepseek_v4 parser, but that family was never wired into the M2 cross-impl harness — the matrix shipped with deepseek_v3 + deepseek_v3_1 only. Closes the gap. * Add 10 PARSER.batch.* INPUTS entries for deepseek_v4 to regenerate_fixtures.py (DSML wire format: <DSML|tool_calls>, <DSML|invoke>, <DSML|parameter string="true|false">). Generated via the live Dynamo binding; new fixture file at tests/parity/parser/fixtures/deepseek_v4/PARSER.batch.yaml. * Wire "deepseek_v4": "deepseek_v4" in _FAMILY_TO_VLLM_KEY. * SGLang has no deepseek_v4 detector — whole-impl n/a, footnote added. * 2 new KNOWN_DIVERGENCES from smoke run: - vllm/deepseek_v4/batch.5 → recovery contract (vLLM surfaces raw text in normal_text on missing end-marker; Dynamo drops everything). - vllm/deepseek_v4/batch.7 → vLLM emits JSON-typed parameter values as raw strings; Dynamo coerces nested object/array types. Restructure the README matrix into two charts: * Top-N (7 rows) — primary parser per top-N model in DIS-1842, with (model) annotation: deepseek_v4 (DeepSeek V4), glm47 (GLM 5.1), harmony (gpt-oss), kimi_k2 (Kimi K2.6), minimax_m2 (MiniMax 2.7), nemotron_deci (Nemotron), qwen3_coder (Qwen 3.5). * Others (4 rows) — parsers in the harness without a top-N owner: deepseek_v3, deepseek_v3_1, gemma4, pythonic. Tally: 200 → 220 cells, 128 → 136 parity, 32 → 34 divergence, 40 → 50 n/a. Also rolls in earlier local-only edits to the same section: compact b1..b10 column headers (full PARSER.batch.N was too wide), case-name dictionary moved into the parity-status section directly above the table, case 1 ... case 10 references in the "Why families' YAMLs look so similar" section renamed to PARSER.batch.N. Smoke: vllm 8 passed / 2 xfailed; sglang 10 skipped (no peer); dynamo 10/10. Doc + harness wiring; no parser-side code change. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
MatejKosec
added a commit
that referenced
this pull request
May 10, 2026
…Bug A) Why --- Commit 3a7b775 fixed `VeloSession::close()` so that it drains BOTH `available_pins` (the `Mutex<BTreeMap<SequenceHash, ImmutableBlock<G2>>>` that holds strong refs to peer-puller-bound G2 blocks) AND `inbound_pulls` (the `DashMap<u64, Vec<SequenceHash>>` that authorizes peer pulls). Without that drain, the prefill-side `ManagedBlockPool::reset()` failed the two-request smoke with `Reset pool count mismatch: expected 1089, got 1086` whenever the puller errored before emitting `Frame::PullAck`. The fix shipped with `close_drains_unacked_holder_pins`, but that test asserts only `test_available_pin_count() == 0` after `close()`. A future refactor that deletes `self.inner.inbound_pulls.clear();` while keeping `self.inner.available_pins.lock().clear();` would silently pass the existing test but reintroduce the asymmetric protocol-state leak. What ---- New regression test `close_drains_inbound_pulls_and_releases_pool_blocks` that pins the invariant on three INDEPENDENT observable axes: 1. `test_available_pin_count() == 0` after `close()` 2. `test_inbound_pulls_count() == 0` after `close()` (NEW accessor, mirrors the existing `test_available_pin_count`) 3. `g2_manager.available_blocks() == g2_manager.total_blocks()` after the local `ImmutableBlock` handles drop, AND `g2_manager.reset_inactive_pool()` succeeds — the same shape as the production `BlockPoolError::ResetError` failure. The test forges TWO inbound `Frame::Pull`s (so `inbound_pulls` has two entries) and never injects `Frame::PullAck`, simulating the recompute-policy peer abort. It asserts non-empty mid-state on both maps before close, then full drain plus pool-parity after close. Two intermediate sanity checks guard against false positives: * `make_available` actually pins 3 blocks (not 0) * `g2_manager.available_blocks() == total_blocks - 3` mid-test (so the pool-parity assertion is measuring real strong-ref drop) Field-type note --------------- `SequenceHash` is `PositionalLineageHash(u128)` — a plain hash value, not a strong block ref. So clearing `inbound_pulls` alone does not itself release G2 blocks; the load-bearing strong-ref drain is on `available_pins`. But the `Frame::Pull`/`PullAck` protocol contract requires both maps to drain in lockstep (see the `Frame::PullAck` arm in `dispatch_frame`); an asymmetric drain at `close()` leaves the session with stale authorize-but-unacked tracking that any future addition of a strong-ref-bearing field to `inbound_pulls` (e.g. for backpressure) would silently leak. Axis #2 is checkable today via the new accessor, so the contract is locked in now. Validation (computelab-build-5, x86_64, dev profile) --------- * `cargo check -p kvbm-engine --tests` — clean (Finished in 29.54s) * `cargo test -p kvbm-engine --test velo_session_loopback` — all 9 tests pass (the new test plus 8 pre-existing siblings, including `close_drains_unacked_holder_pins`) * Negative path A — `close()` body with BOTH drain lines reverted (the literal pre-fix state): assertion `left == right` failed: close() must drain `available_pins` left: 3 right: 0 at velo_session_loopback.rs:723 (Axis #1 fires — production-shape failure of the original bug.) * Negative path B — `close()` body with ONLY `self.inner.inbound_pulls.clear();` reverted (asymmetric drain): assertion `left == right` failed: close() must drain `inbound_pulls`; otherwise authorized- but-unacked pull entries keep `Vec<SequenceHash>` mirrors alive and the corresponding G2 blocks leak past pool reset left: 2 right: 0 at velo_session_loopback.rs:728 (Axis #2 fires — proves the new test independently guards the inbound_pulls drain that the sibling test does not cover.) Also adds `VeloSession::test_inbound_pulls_count()` under the existing `#[cfg(any(test, feature = "testing"))]` gate, mirroring the existing `test_available_pin_count()`. No new dependencies; no production-path unwrap/expect/panic. Signed-off-by: Matej Kosec <mkosec@nvidia.com>
shqizhang
pushed a commit
to shqizhang/dynamo
that referenced
this pull request
May 11, 2026
The shared TCP server keys handlers by {cid:x}/{endpoint_name} where cid
is the process-scoped DRT connection_id. Registering both backend.generate
and prefill.generate from the same dual-mode worker collided on cid/generate
in the dispatcher's DashMap; the second insert silently overwrote the first.
After /switch_role, the prefill MDC's transport URL pointed at the same TCP
slot as the backend MDC, so PrefillRouter dispatches landed on whichever
generate handler won the race -- usually the plain decode handler -- which
returns disaggregated_params=None on every chunk. Rust PrefillRouter
rejected with HTTP 500 NoDisaggregatedParams (~50% of chats failed,
matching the load-balance fraction routed to the switched pod).
Fix: register exactly one TCP handler under cid/generate. The single
generate handler is a polymorphic wrapper that, at request time, looks at
dual_mode.current_role and either runs the decode handler or runs a
chunk-buffering wrapper around partner_prefill_handler.generate (which
consolidates vLLM's multi-chunk prefill output -- kv_transfer_params is
only published on the FINAL chunk by NixlConnector -- into a single chunk
whose disaggregated_params carries the merged kv_transfer_params, so the
Rust PrefillRouter sees the field on its mandatory chunk ai-dynamo#1).
_dual_partner_endpoint is still constructed as an Endpoint object purely
so VllmReregistrar.register('prefill') can publish the prefill MDC; the
MDC's transport URL points at the (now correctly handled) cid/generate
slot.
andyluo7
added a commit
to andyluo7/dynamo
that referenced
this pull request
May 12, 2026
Re-ran the Phase 4 M2.5 escalation without --enforce-eager (HIP graph capture enabled). Production numbers: conc | eager (was) | HIP graphs (now) | speedup -----|-------------|------------------|-------- c=1 | 18.6 tok/s | 94.7 tok/s | 5.1x c=4 | 68.0 tok/s | 261.4 tok/s | 3.8x c=8 | 72.7 tok/s | 587.1 tok/s | 8.1x Notable: 587 tok/s @ c=8 disagg now BEATS Phase 2.5 single-node agg (521 tok/s) -- disagg uses 2 nodes' compute (8 GPUs vs 4 in agg) and the frontend + KV-router overhead does not exceed the doubled capacity. Per-GPU at c=8: 73.4 tok/s/GPU on 8 MI355X. Fork's InferenceX MoRI 1P1D at c=32 reports 73.9 tok/s/GPU -- we match the fork's per-GPU efficiency on a model the fork didn't even test in this configuration. HIP graph capture: ~1:34 per worker for 5 PIECEWISE + 4 FULL graphs. KV cache available after capture: 170.93 GiB per worker. No aiter segfaults observed on gfx950 with M2.5+TP=4. Updates: - docs/08-phase34-escalation-results.md: replaces Phase 4 numbers table with HIP-graphs result; keeps eager for comparison; updates Tier 1 callout to mark item ai-dynamo#1 as DONE; updates final scorecard - README.md: updates headline table M2.5 row to 587 tok/s with footnote explaining the eager vs HIP-graphs context; updates per-GPU efficiency observations to highlight matching fork's published numbers - scripts/phase4_m25_disagg.sh: removes --enforce-eager flag, raises --max-num-seqs from 4 to 8 to amortize HIP graph capture
yao531441
pushed a commit
to yao531441/dynamo
that referenced
this pull request
May 13, 2026
Signed-off-by: michaelfeil <me@michaelfeil.eu>
tanmayv25
added a commit
that referenced
this pull request
May 15, 2026
Four CI failures on pre-merge after the metrics refactor: 1. cargo fmt --check: 4 files weren't `cargo fmt`-clean (engine.rs, metrics.rs, publisher.rs, mocker/engine.rs). Applied. 2. mypy: sglang/llm_engine.py imports `register_engine_registry` from `dynamo.common.backend.metrics` for vendor-registry bridging (SGLang's multiprocess CollectorRegistry path). The refactor wrongly classified this as framework-only and moved it to `_internal_metrics`. Re-export from `metrics.py` — engines legitimately use this primitive for custom-registry scenarios (sglang) alongside the global-registry helper `register_global_registry` (vllm/trtllm). 3. pre-commit isort: 3 files reformatted to single-line imports. 4. pre-commit black: trtllm/args.py reformatted (pre-existing line-length cleanup pulled in by black --all-files). rust-tests (lib/bindings/python) and (kvbm) were cascade-canceled by the workspace fmt failure; they pass once #1 is fixed. Fern docs broken-links + preview failures are unrelated tooling issues (Fern 4.66.1→5.26.5 upgrade prompt) — pre-existing on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tanmayv25
added a commit
that referenced
this pull request
May 16, 2026
…cycle Address senior-architect review of the metrics infrastructure: #1 (two parallel Prometheus paths): merge `register_prometheus` + `setup_component_metrics` into one `setup_metrics(ctx) -> MetricsBindings` method on the Rust `LLMEngine` trait. The trait surface shrinks from 7 → 6 methods; both expfmt bridging and the structured component publisher go through one place. The Python ABC keeps its two methods — the PyO3 bridge adapts (engine-author API unchanged for Python). #3 + #8 (framework lifecycle gauges gated on engine opt-in): - `cleanup_time_seconds` and `drain_time_seconds` move to Rust-side `LifecycleGauges` (prometheus crate gauges registered via the runtime's `MetricsRegistry`). Worker constructs them after `engine.start()` succeeds; observes during cleanup/drain. Emits regardless of engine opt-in. - Drop `set_cleanup_time` / `set_drain_time` from the `ComponentMetricsPublisher` trait — engine no longer responsible. - Drop the corresponding methods from `PyComponentMetricsPublisher`. - `model_load_time_seconds` STAYS on Python `LLMBackendMetrics` for parity with the legacy entry points (both legacy `main.py` and the unified bridge populate it). The unified bridge now constructs `LLMBackendMetrics` UNCONDITIONALLY so the gauge emits even when the engine returns no per-rank sources. #5 (kv_cache_hit_rate Optional semantics): clearer docstring on both Rust `ComponentSnapshot` and the Python ABC — tri-state explicitly documented (`None` = no data, `0.0` = legitimate zero-hit). Deferred with rationale (explained in the post-review architect's note): - #2 (HashMap<MetricKey, f64> ComponentSnapshot): real future-proofing but no current bug; 5 fields for 5 signals isn't worth the refactor cost yet. Revisit at ~10+ signals. - #4 (per-source cadence): explicit defer in original review. - #7 (conformance in CI): workflow change, separate from trait surface. Conformance kit collapses `check_register_prometheus` + `check_component_metrics` into `check_setup_metrics`. All 68 backend-common unit tests pass. Mocker example updated. PyO3 bridge wheel rebuilt successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tanmayv25
added a commit
that referenced
this pull request
May 16, 2026
Four CI failures on pre-merge after the metrics refactor: 1. cargo fmt --check: 4 files weren't `cargo fmt`-clean (engine.rs, metrics.rs, publisher.rs, mocker/engine.rs). Applied. 2. mypy: sglang/llm_engine.py imports `register_engine_registry` from `dynamo.common.backend.metrics` for vendor-registry bridging (SGLang's multiprocess CollectorRegistry path). The refactor wrongly classified this as framework-only and moved it to `_internal_metrics`. Re-export from `metrics.py` — engines legitimately use this primitive for custom-registry scenarios (sglang) alongside the global-registry helper `register_global_registry` (vllm/trtllm). 3. pre-commit isort: 3 files reformatted to single-line imports. 4. pre-commit black: trtllm/args.py reformatted (pre-existing line-length cleanup pulled in by black --all-files). rust-tests (lib/bindings/python) and (kvbm) were cascade-canceled by the workspace fmt failure; they pass once #1 is fixed. Fern docs broken-links + preview failures are unrelated tooling issues (Fern 4.66.1→5.26.5 upgrade prompt) — pre-existing on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.