Update README.md by nvda-mesharma · Pull Request #1 · ai-dynamo/dynamo

nvda-mesharma · 2025-03-04T00:42:20Z

No description provided.

* add metrics to disconnect * fmt * fmt Signed-off-by: michaelfeil <me@michaelfeil.eu>

Signed-off-by: michaelfeil <me@michaelfeil.eu>

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: ayushag <ayushag@nvidia.com>

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>

* chore: explanation * feat: add advanced fuzzing harnesses for dynamo-parsers Add invariant, differential, ReDoS, and tool-definition-aware fuzz targets with seed corpus, token dictionary, and coverage tooling. * test: add regression tests for 5 parser bugs Bugs 1-2 (granite streaming prefix loss/trim asymmetry), bug 5 (DeepSeek V3 newline normalization), bug 6 (pythonic text drop), bug 9 (base_json text drop), bug 14 (harmony empty content). * refactor: deduplicate parser code with shared utilities Extract chunk_ends_with_token_prefix() and decode_xml_entities() into tool_calling/utils.rs, replacing 6 inline loops and 2 entity decoders. Fixes UTF-8 byte-slicing panics in XML and GLM47 parsers. Includes regression tests for bugs 3-4, 7-8, 10-13. * feat: add kv-router fuzz targets 5 fuzz targets for dynamo-kv-router: radix tree state machine, block hash computation, KV protocol JSON deserialization, positional indexer, and request extra info splitting. Includes dictionary and seed corpus. * feat: add content preservation and streaming monotonicity fuzz oracles Two new parser fuzz targets: - fuzz_content_preservation: verifies text outside tool-call delimiters survives parsing without fabrication - fuzz_streaming_monotonicity: verifies streaming output accumulates monotonically (never shrinks) * feat: add tokens hash fuzz targets 3 fuzz targets for dynamo-tokens: positional sequence hash round-trip, positional lineage hash boundary testing, and token block sequence stateful invariants. * feat: add runtime codec fuzz targets 4 fuzz targets for dynamo-runtime: two-part message encode/decode round-trip, two-part decode crash oracle, TCP request message decode, and event plane frame decode. * test: add llm protocol regression tests * feat: add unified fuzzing runner, update findings and next steps * refactor: deduplicate fuzz targets with shared utility modules * feat: confirm kv_block_size=0 crash bug via fuzzing * feat: confirm 4 new crash bugs via extended fuzzing campaign New crashes found: - RadixTree RefCell reentrant borrow (radix_tree.rs:371) - RequestExtraInfo division by zero (protocols.rs:365) - TwoPartCodec integer overflow (two_part.rs:58) - MiniMaxAppendThink differential trim asymmetry All 27 fuzz targets run across 5 crates (~140M total executions). Updated findings and next steps documents. Added adversarial seeds targeting integer boundaries and hash collisions. * refactor: minimize fuzz targets and scripts (-1187 lines) Remove verbose comments, doc strings, and redundant code across all 27 fuzz targets, 2 shared libs, and 2 scripts. Delete superseded files (run_parser_fuzz.sh, repo_overview.md). Make coverage.sh auto-discover crates instead of maintaining a hardcoded target list. * docs: expand FINDINGS.md with detailed reproduction steps for all 20 bugs Each bug now includes: description, severity, how found (fuzzing vs audit), exact file/line, crash artifacts, fuzzing reproduction commands, standalone reproducers, root cause analysis, and suggested fixes. * docs: add individual upstream issue templates for all 20 bugs Each file follows the ai-dynamo/dynamo [BUG] issue format with: description, steps to reproduce, expected/actual behavior, suggested fix, environment, and additional context. Related bugs are grouped into single issues where appropriate (3+4, 6+9, 12+13, 19+20). * docs: update findings and issues based on intentionality audit - Remove bug 12/13 issue template (already fixed upstream via character-based iteration in chunk_ends_with_token_prefix) - Mark bugs 12/13 as already fixed in FINDINGS.md - Add debatable verdicts to bugs 5, 10, 19 in FINDINGS.md - Add context notes to issue templates for debatable bugs - Update NEXT_STEPS.md with extended run suggestions * chore: remove upstream CI workflows (require NVIDIA infra) * feat: add tool-call roundtrip and radix tree consistency fuzz targets * docs: add security assessment, severity rankings, and extended campaign results * chore: add setup script for remote configuration * chore: add WORKFLOW.md, tracked hooks, and updated setup script * docs: restore repo_overview.md * 🧪 feat: add ZMQ wire and SSE codec fuzz targets Expand fuzzing coverage to kv-router ZMQ wire protocol and LLM SSE codec * Add 4 kv-router fuzz targets: decode, convert, roundtrip, extra_keys * Add ZMQ wire dictionary entries and msgpack markers * Add 16 ZMQ wire seed corpus files for structured fuzzing * Add llm/fuzz crate with 3 SSE codec fuzz targets (decode, decode_data, streaming) * Add SSE seed corpus (14 files) and dictionary * Add rmp-serde dependency for msgpack serialization in fuzz harness --------- Co-authored-by: ShounakRay <shounak@stanford.edu>

…ers) LaTeX book covering transformer inference, KV caching, distributed serving, Dynamo architecture, KV-aware routing, token blocks, network codecs, fuzzing methodology, and 16 bugs found across 36 fuzz targets. All bug references use ch10 canonical numbering (ai-dynamo#1-ai-dynamo#16), all cross- references resolve, all arithmetic verified. Includes TikZ diagrams, whynotbox pedagogical environments, and marginnotes throughout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update main branch

Adds TestQwen235BugReproCase with the exact two picks from the original triage report: * Prefill: tp=4, dp=1, moe_tp=1, moe_ep=4 * Decode: tp=1, dp=8, moe_tp=2, moe_ep=4 Each test pins down a specific symptom that the old profiler path produced: * test_prefill_pick_satisfies_aic_identity — Bug #1 (missing moe kwargs) and Bug #2 (using .tp_size property) would break the identity for this pick. Verifies all five kwargs and the MoE identity. * test_decode_pick_satisfies_aic_identity — same for the decode pick, which was the exact shape that triggered Bug #3 (missing attention_dp_size). * test_tp_size_property_differs_from_aic_tp_size — direct regression guard: documents that the property returns 1 here while AIC's tp_size is 4. Anyone who wires .tp_size back into AIC kwargs will break this. * test_end_to_end_sweep_delivers_complete_kwargs_to_aic — runs both sweeps and asserts every AIC call gets complete kwargs. This is the direct reproducer of the originally-reported failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Fixes all actionable items from the second review: Bug fixes: - #1: Change returncode=4 → returncode=2 in pytest_configure exit (4 is reserved by pytest for EXIT_NOTESTSCOLLECTED) - #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original) Test quality: - #7: Add missing assertions to test_apply_hf_home_layout (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE) - #8: Use monkeypatch in tests 3 & 4 for proper env isolation (prevents pre-existing env vars from leaking on test failure) Design / correctness: - #3: Fix _models_dir_env docstring ("exactly once" → "once per worker") - #4: Add comment noting TRANSFORMERS_CACHE deprecation - #5: Update --models-dir help text and docs to reflect both supported layouts (bare HF_HUB_CACHE and HF_HOME), not just bare - #10: Restore pytest.skip() in download_lora() (test-only infra); remove now-redundant guard from minio_lora_service fixture - #11: Raise hub/ detection log to WARNING with guidance - #12: Replace shutil.rmtree(ignore_errors=True) with try/except so cleanup failures are logged rather than silently swallowed Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester test deferred — complex due to conftest dependencies, low severity) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

Bug fixes: - #1: Add monkeypatch env isolation to test_apply_sets_writable_transformers_cache - #2: Add TRANSFORMERS_CACHE assertion to test_apply_bare_cache_layout (bare-layout path was missing the writable-dir check present in HF_HOME test) Minor cleanups: - #4: Move `import pytest` from inside download_lora() to module top-level (lora_utils.py is test-only infra; pytest is always available) - #5: Replace pytestconfig.getoption("--models-dir") re-checks in predownload_models/predownload_tokenizers with os.environ.get("DYNAMO_MODELS_DIR") (_models_dir_env runs first and sets the var; single source of truth) New coverage tests: - #7: test_models_dir_nonexistent_exits_with_code_2 — subprocess test verifying pytest_configure exits with returncode=2 on bad path - #8: test_download_lora_skips_in_models_dir_mode — verifies download_lora() raises pytest.skip.Exception when DYNAMO_MODELS_DIR is set Not addressed: #3 (keep gpu_0 per project guidelines and previous review retraction), #6 (hook ordering is guaranteed), #9 (complex, low priority) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

Follow-up on the PR #8415 repro. The reproduction showed two problems in how multi-worker sglang tests handle startup failures: 1. Both workers in SGLangProcess share log_dir = request.node.name, so their stdout/stderr race into a single python3.log.txt. When worker #2 dies during init, worker #1's later output clobbers the crash reason and we cannot tell why worker #2 failed. 2. ManagedEngineProcessMixin.__enter__ launches workers sequentially with a 5 s init_delay but never calls proc.poll(), so a dead worker is indistinguishable from a healthy one. The test then blocks in KvRouter(endpoint, ...) (see tests/router/common.py:1993) waiting for min_initial_workers=2 instances that will never register -- manifesting as the 5 h nightly hang. Changes: - tests/router/test_router_e2e_with_sglang.py: give each worker its own log_dir subdirectory (worker_<idx>) so crash output is preserved. - tests/router/e2e_harness.py: add _assert_workers_alive() that polls every launched worker after delayed_start and after init_delay, and raise RuntimeError with the log path if any worker exited during startup. Fails the test in ~seconds instead of hanging for hours. - .github/workflows/pr.yaml: extend the sglang-hang-repro diagnostic step to tail each worker's python3.log.txt so the next repro captures worker #2's death reason directly. DYN-2784: https://linear.app/nvidia/issue/DYN-2784 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds 8 integration tests in lib/llm/tests/tool_choice.rs covering each parser × {auto, required, named-correct, named-wrong}. Two tests pin the parser-vs-immediate-jail conflict for tool_choice=required with a TODO(CASE.11) note pointing at DIS-1842 work-item #1. Renames the parser-unit placeholder tests to drop the bogus CASE.11/12 labels (parser is contract-blind by design). The renamed tests document the parser-level invariant: parser does NOT filter by tool_choice and its output is byte-stable regardless of upstream finish_reason. Updates DSv4 coverage chart to point CASE.11 at the new integration tests and CASE.12 at the parser-invariant test + cross-parser fixtures in lib/llm/tests/test_streaming_tool_parsers.rs. 8 new integration tests; 16 tool_choice tests pass; 424 parser tests pass; clippy clean. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

CodeRabbit flagged the new tests as carrying internal Linear ticket references in their doc-comments — repo policy forbids Linear refs in source code. Replaced each occurrence with neutral wording: - DIS-1842 work-item #1 → cross-parser tool_choice parametrisation work-item (tracked separately) - DIS-1842 work-item #5 → cross-parser finish_reason mapping work-item (tracked separately) - Universal gap noted in DIS-1842 → Universal gap noted in the test taxonomy Linear refs remain in the PR description (allowed per repo convention). 424 parser tests + 16 tool_choice tests pass. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

…code-waiter cleanup Addresses two correctness issues raised by dynamo-ops review on PR 8965. 1. **Race on PrefillReady re-insert** (model_manager.rs) Both `register_prefill_router` and `activate_prefill_router` previously used a `remove → process → insert` pattern on `prefill_router_activators`. Between the `remove` and the `insert` the entry is gone from the map, so a concurrent `remove_prefill_activator` call (called by the watcher on prefill-component teardown) hits the empty map, skips its cleanup, and we then re-insert a stale `PrefillReady` for a prefill that's already gone. The next decode rebuild's lookup finds the resurrected cache and activates a `PrefillRouter` against a dead endpoint. Refactor both functions to use DashMap's `Entry` API. The shard lock is held for the duration of the OccupiedEntry, so any concurrent `remove_prefill_activator` serializes after us and observes the entry it needs to clear. The PrefillReady-consume path in `register_prefill_router` now reads-and-clones in place without removing; the DecodeWaiting/PrefillReady → PrefillReady transition in `activate_prefill_router` uses `OccupiedEntry::insert(...)` for an atomic value swap that returns the old value. 2. **Decode-waiter cleanup gated on `removed.is_some()`** (watcher.rs) `handle_delete_helper`'s decode branch only called `remove_decode_prefill_waiter` when `remove_worker_set` returned `Some(_)`. If decode registered (creating a `DecodeWaiting` activator entry via `register_prefill_router`) but `handle_add_helper` failed later (e.g., on `kv_chooser_for`, monitor setup, or `build_routed_pipeline`) before `add_worker_set`, no WorkerSet ever landed in the manager — and on later teardown `remove_worker_set` returns `None`, leaving the stale `DecodeWaiting` entry orphaned in the activator map. Drop the `removed.is_some()` guard. The helper itself is state-safe (uses `DashMap::remove_if(|_, v| matches!(v, DecodeWaiting(_)))`) so calling it on a key that's vacant or holds `PrefillReady` is a no-op. The prefill-teardown branch keeps its `removed.is_some()` guard because `remove_prefill_activator` blanket-removes both states, and calling it without a real WorkerSet teardown could orphan a live decode-side `DecodeWaiting`. Behavior is unchanged for all four reproducers (ai-dynamo#1 burst-sweep canary OFF, ai-dynamo#2 Pattern 1 multi-decode restart, ai-dynamo#3 Pattern 2 stale-DecodeWaiting, ai-dynamo#4 burst-sweep canary ON); will re-verify on a fresh build.

…p-N / Others DIS-1842's top-N model #1 (DeepSeek V4) used the deepseek_v4 parser, but that family was never wired into the M2 cross-impl harness — the matrix shipped with deepseek_v3 + deepseek_v3_1 only. Closes the gap. * Add 10 PARSER.batch.* INPUTS entries for deepseek_v4 to regenerate_fixtures.py (DSML wire format: <DSML|tool_calls>, <DSML|invoke>, <DSML|parameter string="true|false">). Generated via the live Dynamo binding; new fixture file at tests/parity/parser/fixtures/deepseek_v4/PARSER.batch.yaml. * Wire "deepseek_v4": "deepseek_v4" in _FAMILY_TO_VLLM_KEY. * SGLang has no deepseek_v4 detector — whole-impl n/a, footnote added. * 2 new KNOWN_DIVERGENCES from smoke run: - vllm/deepseek_v4/batch.5 → recovery contract (vLLM surfaces raw text in normal_text on missing end-marker; Dynamo drops everything). - vllm/deepseek_v4/batch.7 → vLLM emits JSON-typed parameter values as raw strings; Dynamo coerces nested object/array types. Restructure the README matrix into two charts: * Top-N (7 rows) — primary parser per top-N model in DIS-1842, with (model) annotation: deepseek_v4 (DeepSeek V4), glm47 (GLM 5.1), harmony (gpt-oss), kimi_k2 (Kimi K2.6), minimax_m2 (MiniMax 2.7), nemotron_deci (Nemotron), qwen3_coder (Qwen 3.5). * Others (4 rows) — parsers in the harness without a top-N owner: deepseek_v3, deepseek_v3_1, gemma4, pythonic. Tally: 200 → 220 cells, 128 → 136 parity, 32 → 34 divergence, 40 → 50 n/a. Also rolls in earlier local-only edits to the same section: compact b1..b10 column headers (full PARSER.batch.N was too wide), case-name dictionary moved into the parity-status section directly above the table, case 1 ... case 10 references in the "Why families' YAMLs look so similar" section renamed to PARSER.batch.N. Smoke: vllm 8 passed / 2 xfailed; sglang 10 skipped (no peer); dynamo 10/10. Doc + harness wiring; no parser-side code change. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

…Bug A) Why --- Commit 3a7b775 fixed `VeloSession::close()` so that it drains BOTH `available_pins` (the `Mutex<BTreeMap<SequenceHash, ImmutableBlock<G2>>>` that holds strong refs to peer-puller-bound G2 blocks) AND `inbound_pulls` (the `DashMap<u64, Vec<SequenceHash>>` that authorizes peer pulls). Without that drain, the prefill-side `ManagedBlockPool::reset()` failed the two-request smoke with `Reset pool count mismatch: expected 1089, got 1086` whenever the puller errored before emitting `Frame::PullAck`. The fix shipped with `close_drains_unacked_holder_pins`, but that test asserts only `test_available_pin_count() == 0` after `close()`. A future refactor that deletes `self.inner.inbound_pulls.clear();` while keeping `self.inner.available_pins.lock().clear();` would silently pass the existing test but reintroduce the asymmetric protocol-state leak. What ---- New regression test `close_drains_inbound_pulls_and_releases_pool_blocks` that pins the invariant on three INDEPENDENT observable axes: 1. `test_available_pin_count() == 0` after `close()` 2. `test_inbound_pulls_count() == 0` after `close()` (NEW accessor, mirrors the existing `test_available_pin_count`) 3. `g2_manager.available_blocks() == g2_manager.total_blocks()` after the local `ImmutableBlock` handles drop, AND `g2_manager.reset_inactive_pool()` succeeds — the same shape as the production `BlockPoolError::ResetError` failure. The test forges TWO inbound `Frame::Pull`s (so `inbound_pulls` has two entries) and never injects `Frame::PullAck`, simulating the recompute-policy peer abort. It asserts non-empty mid-state on both maps before close, then full drain plus pool-parity after close. Two intermediate sanity checks guard against false positives: * `make_available` actually pins 3 blocks (not 0) * `g2_manager.available_blocks() == total_blocks - 3` mid-test (so the pool-parity assertion is measuring real strong-ref drop) Field-type note --------------- `SequenceHash` is `PositionalLineageHash(u128)` — a plain hash value, not a strong block ref. So clearing `inbound_pulls` alone does not itself release G2 blocks; the load-bearing strong-ref drain is on `available_pins`. But the `Frame::Pull`/`PullAck` protocol contract requires both maps to drain in lockstep (see the `Frame::PullAck` arm in `dispatch_frame`); an asymmetric drain at `close()` leaves the session with stale authorize-but-unacked tracking that any future addition of a strong-ref-bearing field to `inbound_pulls` (e.g. for backpressure) would silently leak. Axis #2 is checkable today via the new accessor, so the contract is locked in now. Validation (computelab-build-5, x86_64, dev profile) --------- * `cargo check -p kvbm-engine --tests` — clean (Finished in 29.54s) * `cargo test -p kvbm-engine --test velo_session_loopback` — all 9 tests pass (the new test plus 8 pre-existing siblings, including `close_drains_unacked_holder_pins`) * Negative path A — `close()` body with BOTH drain lines reverted (the literal pre-fix state): assertion `left == right` failed: close() must drain `available_pins` left: 3 right: 0 at velo_session_loopback.rs:723 (Axis #1 fires — production-shape failure of the original bug.) * Negative path B — `close()` body with ONLY `self.inner.inbound_pulls.clear();` reverted (asymmetric drain): assertion `left == right` failed: close() must drain `inbound_pulls`; otherwise authorized- but-unacked pull entries keep `Vec<SequenceHash>` mirrors alive and the corresponding G2 blocks leak past pool reset left: 2 right: 0 at velo_session_loopback.rs:728 (Axis #2 fires — proves the new test independently guards the inbound_pulls drain that the sibling test does not cover.) Also adds `VeloSession::test_inbound_pulls_count()` under the existing `#[cfg(any(test, feature = "testing"))]` gate, mirroring the existing `test_available_pin_count()`. No new dependencies; no production-path unwrap/expect/panic. Signed-off-by: Matej Kosec <mkosec@nvidia.com>

The shared TCP server keys handlers by {cid:x}/{endpoint_name} where cid is the process-scoped DRT connection_id. Registering both backend.generate and prefill.generate from the same dual-mode worker collided on cid/generate in the dispatcher's DashMap; the second insert silently overwrote the first. After /switch_role, the prefill MDC's transport URL pointed at the same TCP slot as the backend MDC, so PrefillRouter dispatches landed on whichever generate handler won the race -- usually the plain decode handler -- which returns disaggregated_params=None on every chunk. Rust PrefillRouter rejected with HTTP 500 NoDisaggregatedParams (~50% of chats failed, matching the load-balance fraction routed to the switched pod). Fix: register exactly one TCP handler under cid/generate. The single generate handler is a polymorphic wrapper that, at request time, looks at dual_mode.current_role and either runs the decode handler or runs a chunk-buffering wrapper around partner_prefill_handler.generate (which consolidates vLLM's multi-chunk prefill output -- kv_transfer_params is only published on the FINAL chunk by NixlConnector -- into a single chunk whose disaggregated_params carries the merged kv_transfer_params, so the Rust PrefillRouter sees the field on its mandatory chunk ai-dynamo#1). _dual_partner_endpoint is still constructed as an Endpoint object purely so VllmReregistrar.register('prefill') can publish the prefill MDC; the MDC's transport URL points at the (now correctly handled) cid/generate slot.

Re-ran the Phase 4 M2.5 escalation without --enforce-eager (HIP graph capture enabled). Production numbers: conc | eager (was) | HIP graphs (now) | speedup -----|-------------|------------------|-------- c=1 | 18.6 tok/s | 94.7 tok/s | 5.1x c=4 | 68.0 tok/s | 261.4 tok/s | 3.8x c=8 | 72.7 tok/s | 587.1 tok/s | 8.1x Notable: 587 tok/s @ c=8 disagg now BEATS Phase 2.5 single-node agg (521 tok/s) -- disagg uses 2 nodes' compute (8 GPUs vs 4 in agg) and the frontend + KV-router overhead does not exceed the doubled capacity. Per-GPU at c=8: 73.4 tok/s/GPU on 8 MI355X. Fork's InferenceX MoRI 1P1D at c=32 reports 73.9 tok/s/GPU -- we match the fork's per-GPU efficiency on a model the fork didn't even test in this configuration. HIP graph capture: ~1:34 per worker for 5 PIECEWISE + 4 FULL graphs. KV cache available after capture: 170.93 GiB per worker. No aiter segfaults observed on gfx950 with M2.5+TP=4. Updates: - docs/08-phase34-escalation-results.md: replaces Phase 4 numbers table with HIP-graphs result; keeps eager for comparison; updates Tier 1 callout to mark item ai-dynamo#1 as DONE; updates final scorecard - README.md: updates headline table M2.5 row to 587 tok/s with footnote explaining the eager vs HIP-graphs context; updates per-GPU efficiency observations to highlight matching fork's published numbers - scripts/phase4_m25_disagg.sh: removes --enforce-eager flag, raises --max-num-seqs from 4 to 8 to amortize HIP graph capture

Signed-off-by: michaelfeil <me@michaelfeil.eu>

Four CI failures on pre-merge after the metrics refactor: 1. cargo fmt --check: 4 files weren't `cargo fmt`-clean (engine.rs, metrics.rs, publisher.rs, mocker/engine.rs). Applied. 2. mypy: sglang/llm_engine.py imports `register_engine_registry` from `dynamo.common.backend.metrics` for vendor-registry bridging (SGLang's multiprocess CollectorRegistry path). The refactor wrongly classified this as framework-only and moved it to `_internal_metrics`. Re-export from `metrics.py` — engines legitimately use this primitive for custom-registry scenarios (sglang) alongside the global-registry helper `register_global_registry` (vllm/trtllm). 3. pre-commit isort: 3 files reformatted to single-line imports. 4. pre-commit black: trtllm/args.py reformatted (pre-existing line-length cleanup pulled in by black --all-files). rust-tests (lib/bindings/python) and (kvbm) were cascade-canceled by the workspace fmt failure; they pass once #1 is fixed. Fern docs broken-links + preview failures are unrelated tooling issues (Fern 4.66.1→5.26.5 upgrade prompt) — pre-existing on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cycle Address senior-architect review of the metrics infrastructure: #1 (two parallel Prometheus paths): merge `register_prometheus` + `setup_component_metrics` into one `setup_metrics(ctx) -> MetricsBindings` method on the Rust `LLMEngine` trait. The trait surface shrinks from 7 → 6 methods; both expfmt bridging and the structured component publisher go through one place. The Python ABC keeps its two methods — the PyO3 bridge adapts (engine-author API unchanged for Python). #3 + #8 (framework lifecycle gauges gated on engine opt-in): - `cleanup_time_seconds` and `drain_time_seconds` move to Rust-side `LifecycleGauges` (prometheus crate gauges registered via the runtime's `MetricsRegistry`). Worker constructs them after `engine.start()` succeeds; observes during cleanup/drain. Emits regardless of engine opt-in. - Drop `set_cleanup_time` / `set_drain_time` from the `ComponentMetricsPublisher` trait — engine no longer responsible. - Drop the corresponding methods from `PyComponentMetricsPublisher`. - `model_load_time_seconds` STAYS on Python `LLMBackendMetrics` for parity with the legacy entry points (both legacy `main.py` and the unified bridge populate it). The unified bridge now constructs `LLMBackendMetrics` UNCONDITIONALLY so the gauge emits even when the engine returns no per-rank sources. #5 (kv_cache_hit_rate Optional semantics): clearer docstring on both Rust `ComponentSnapshot` and the Python ABC — tri-state explicitly documented (`None` = no data, `0.0` = legitimate zero-hit). Deferred with rationale (explained in the post-review architect's note): - #2 (HashMap<MetricKey, f64> ComponentSnapshot): real future-proofing but no current bug; 5 fields for 5 signals isn't worth the refactor cost yet. Revisit at ~10+ signals. - #4 (per-source cadence): explicit defer in original review. - #7 (conformance in CI): workflow change, separate from trait surface. Conformance kit collapses `check_register_prometheus` + `check_component_metrics` into `check_setup_metrics`. All 68 backend-common unit tests pass. Mocker example updated. PyO3 bridge wheel rebuilt successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four CI failures on pre-merge after the metrics refactor: 1. cargo fmt --check: 4 files weren't `cargo fmt`-clean (engine.rs, metrics.rs, publisher.rs, mocker/engine.rs). Applied. 2. mypy: sglang/llm_engine.py imports `register_engine_registry` from `dynamo.common.backend.metrics` for vendor-registry bridging (SGLang's multiprocess CollectorRegistry path). The refactor wrongly classified this as framework-only and moved it to `_internal_metrics`. Re-export from `metrics.py` — engines legitimately use this primitive for custom-registry scenarios (sglang) alongside the global-registry helper `register_global_registry` (vllm/trtllm). 3. pre-commit isort: 3 files reformatted to single-line imports. 4. pre-commit black: trtllm/args.py reformatted (pre-existing line-length cleanup pulled in by black --all-files). rust-tests (lib/bindings/python) and (kvbm) were cascade-canceled by the workspace fmt failure; they pass once #1 is fixed. Fern docs broken-links + preview failures are unrelated tooling issues (Fern 4.66.1→5.26.5 upgrade prompt) — pre-existing on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Update README.md

c6ef4ad

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 00:42 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:01 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:06 — with GitHub Actions Failure

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:09 — with GitHub Actions Inactive

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:10 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:31 — with GitHub Actions Failure

nvda-mesharma had a problem deploying to GITLAB March 4, 2025 01:33 — with GitHub Actions Failure

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:49 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 01:57 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 02:00 — with GitHub Actions Inactive

Merge branch 'main' into Update-README.md

6104b6c

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 16:27 — with GitHub Actions Inactive

nvda-mesharma temporarily deployed to GITLAB March 4, 2025 16:28 — with GitHub Actions Inactive

saturley-hall self-requested a review March 4, 2025 16:46

saturley-hall approved these changes Mar 4, 2025

View reviewed changes

nnshah1 merged commit 5033457 into main Mar 4, 2025

nnshah1 deleted the Update-README.md branch March 4, 2025 17:16

maobaolong mentioned this pull request Mar 21, 2025

pip install ai-dynamo[all] ERROR，help #321

Closed

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

Update README.md (ai-dynamo#1)

b9ce8dd

hhzhang16 mentioned this pull request Apr 11, 2025

feat: add .devcontainer based off images in container/ #497

Merged

dtransposed mentioned this pull request Jun 23, 2025

[BUG]: AttributeError: type object 'ServiceConfig' has no attribute 'get_parsed_config' #1608

Closed

coderabbitai Bot mentioned this pull request Sep 9, 2025

feat: frontend, disconnect metrics (#1) #2953

Merged

copy-pr-bot Bot pushed a commit that referenced this pull request Sep 9, 2025

Mf/disconnect metrics (#1)

0ac83d5

* add metrics to disconnect * fmt * fmt Signed-off-by: michaelfeil <me@michaelfeil.eu>

grahamking pushed a commit that referenced this pull request Sep 10, 2025

feat: frontend, disconnect metrics (#1) (#2953)

e42746a

Signed-off-by: michaelfeil <me@michaelfeil.eu>

ayushag-nv pushed a commit that referenced this pull request Sep 15, 2025

feat: frontend, disconnect metrics (#1) (#2953)

031f95c

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: ayushag <ayushag@nvidia.com>

zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025

feat: frontend, disconnect metrics (#1) (#2953)

aba1dd1

Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

elyasmnvidian added a commit that referenced this pull request Sep 22, 2025

chore: fix unit test #1

265870c

Signed-off-by: Elyas Mehtabuddin <emehtabuddin@nvidia.com>

ajcasagrande mentioned this pull request Nov 5, 2025

[BUG]: Tempo fails to run using docker compose due to permissions error #4124

Closed

kthui mentioned this pull request Mar 20, 2026

fix: Sglang Cancellation Aggregated Test. #3896

Open

nnshah1 mentioned this pull request Apr 3, 2026

feat: request lifecycle logging + metrics capture [DIS-1643] #7840

Merged

4 tasks

tanmayv25 mentioned this pull request Apr 15, 2026

DEP: Backend Interface -- LLMEngine ABC and Worker #8251

Open

copy-pr-bot Bot pushed a commit that referenced this pull request Apr 17, 2026

Merge pull request #1 from ai-dynamo/main

99504d4

Update main branch

atchernych mentioned this pull request Apr 23, 2026

Proposal: Hybrid Priority Scheduling in Dynamo #8580

Open

This was referenced Apr 24, 2026

[INVESTIGATE] sglang + DSv4: kwarg workaround breaks live — needs diagnosis #8671

Closed

test(parsers): DIS-1842 — DSv4 + Kimi K2 unit-test coverage gaps #8946

Merged

yifjiang mentioned this pull request May 4, 2026

fix(prefill-router): persist endpoint after handshake so decode rebuild reactivates #8965

Merged

4 tasks

keivenchang mentioned this pull request May 5, 2026

feat(observability): add Dynamo Local Resource Monitor with Grafana dashboard #9055

Merged

5 tasks

harryskim mentioned this pull request May 6, 2026

[Vision] Toward Dynamo 2.0 #9208

Open

dmitry-tokarev-nv mentioned this pull request May 7, 2026

test(helm): add helm-unittest infrastructure and CI job #9266

Merged

richardhuo-nv mentioned this pull request May 7, 2026

fix: do not override the the event_buffer_max_size if user provided one #9284

Merged

yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

feat: frontend, disconnect metrics (ai-dynamo#1) (ai-dynamo#2953)

4d5c088

Signed-off-by: michaelfeil <me@michaelfeil.eu>

nv-tusharma mentioned this pull request May 18, 2026

fix(ci): wire enable_coverage into pytest-local pytest invocation #9559

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update README.md#1

Update README.md#1
nnshah1 merged 2 commits into
mainfrom
Update-README.md

nvda-mesharma commented Mar 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nvda-mesharma commented Mar 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants