CI: setup caching for any merge to main by saturley-hall · Pull Request #8 · ai-dynamo/dynamo

saturley-hall · 2025-03-04T17:57:30Z

This will build a framework-specific image cache for each push onto main which should speed up building the image for the first CI run of a PR.

github-actions · 2025-03-04T18:05:16Z

Test Results

2 files 2 suites 26s ⏱️
71 tests 71 ✅ 0 💤 0 ❌
89 runs 88 ✅ 1 💤 0 ❌

Results for commit 0d41be0.

♻️ This comment has been updated with latest results.

whoisj

Please add an explainer to the description of the PR.

The fact that ever merge to main will populate the cache, which in turn accelerates future builds that involve main, etc.

whoisj · 2025-03-04T18:29:05Z

10m build times is faster - nice work 😄

whoisj

LGTM

Fixes all actionable items from the second review: Bug fixes: - #1: Change returncode=4 → returncode=2 in pytest_configure exit (4 is reserved by pytest for EXIT_NOTESTSCOLLECTED) - #2: Add comment clarifying HF_HUB_OFFLINE double-clear is safe (already in _MODELS_DIR_ENV_KEYS; loop correctly restores original) Test quality: - #7: Add missing assertions to test_apply_hf_home_layout (HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE, DYNAMO_MODELS_DIR, TRANSFORMERS_CACHE) - #8: Use monkeypatch in tests 3 & 4 for proper env isolation (prevents pre-existing env vars from leaking on test failure) Design / correctness: - #3: Fix _models_dir_env docstring ("exactly once" → "once per worker") - #4: Add comment noting TRANSFORMERS_CACHE deprecation - #5: Update --models-dir help text and docs to reflect both supported layouts (bare HF_HUB_CACHE and HF_HOME), not just bare - #10: Restore pytest.skip() in download_lora() (test-only infra); remove now-redundant guard from minio_lora_service fixture - #11: Raise hub/ detection log to WARNING with guidance - #12: Replace shutil.rmtree(ignore_errors=True) with try/except so cleanup failures are logged rather than silently swallowed Not addressed: #6 (keep gpu_0 per project marker policy), #9 (pytester test deferred — complex due to conftest dependencies, low severity) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

Bug fixes: - #1: Add monkeypatch env isolation to test_apply_sets_writable_transformers_cache - #2: Add TRANSFORMERS_CACHE assertion to test_apply_bare_cache_layout (bare-layout path was missing the writable-dir check present in HF_HOME test) Minor cleanups: - #4: Move `import pytest` from inside download_lora() to module top-level (lora_utils.py is test-only infra; pytest is always available) - #5: Replace pytestconfig.getoption("--models-dir") re-checks in predownload_models/predownload_tokenizers with os.environ.get("DYNAMO_MODELS_DIR") (_models_dir_env runs first and sets the var; single source of truth) New coverage tests: - #7: test_models_dir_nonexistent_exits_with_code_2 — subprocess test verifying pytest_configure exits with returncode=2 on bad path - #8: test_download_lora_skips_in_models_dir_mode — verifies download_lora() raises pytest.skip.Exception when DYNAMO_MODELS_DIR is set Not addressed: #3 (keep gpu_0 per project guidelines and previous review retraction), #6 (hook ordering is guaranteed), #9 (complex, low priority) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: rrubin <rrubin@nvidia.com>

#8 Slice A) Adds the prefill-side parallel of decode's cleanup_failed_request so CD-bound prefill failures surface to vLLM via mark_failed_onboarding. Without this path, prefill failures post-(Some(N), true) gnmt left the request hanging in WAITING_FOR_REMOTE_KVS until vLLM's request timeout (or indefinitely if not configured). Changes: - RequestState gains g1_block_ids: Mutex<Option<Vec<BlockId>>>, populated in on_usaa with the FULL G1 window vLLM allocated (not the local-match prefix kick_onboard fills). Failure paths report the full window so vLLM aborts every allocated slot, not just the ones we tried to onboard. - New PrefillCoordinatorImpl::cleanup_failed_request (idempotent, ordered: mark_failed_onboarding → session.close → state eviction). Pre-USAA failures see g1=None and surface only the request_id; the worker handler's request_id pairing (S3a) ensures vLLM still unblocks the request. - run_setup Err and on_usaa's spawned kick_onboard Err now route to cleanup_failed_request instead of logging-only. Tests (cd_prefill_e2e: was 7+1, now 10+1 ignored): - cd_prefill_cleanup_post_usaa_surfaces_full_g1_window — induced mid-flight failure surfaces the full G1 window to vLLM. - cd_prefill_cleanup_pre_usaa_surfaces_request_id_only — pre-USAA failure surfaces empty block_ids; request_id pairing (S3a) makes this sufficient for vLLM to unblock. - cd_prefill_cleanup_is_idempotent — multiple failure-detection paths converging on the same request fire mark_failed_onboarding exactly once. Pre-USAA failures cannot report block_ids (none exist yet); documented limitation per cd-error-path-design.md §6 Q2. Slices B (lifecycle escalation) / D (offload deadline) build on this path and remain on the queue. Test baselines: lib 192, cd_prefill_e2e 10+1 ignored, all other suites unchanged. Workspace-wide regression check pending (S10).

…es seen (#8 Slice C) run_remote_pipeline previously broke out of the commits loop on CommitDelta::Closed without checking whether all expected hashes had arrived first. In the failure case where prefill closes its commit stream prematurely (e.g. mid-pipeline crash, hub-routing mismatch, peer's own setup failure), decode would proceed into the availability drain loop and only surface the failure later via "availability drained with N of M filled" — by which point the request had already eaten an availability cycle. Now bail with anyhow::Error as soon as Closed arrives short. The existing run_remote_pipeline spawn handler routes Err to cleanup_failed_request, which marks the unfilled remote slice failed via mark_failed_onboarding — vLLM aborts that slice (local-match was already onboarded so it stays). Audit event: decode_commits_closed_short (seen / expected counts). Test: cd_decode_commits_closed_short_fails_request — inject half the expected commits + Closed, assert the remote G1 slice is surfaced via mark_failed_onboarding. Passes alongside existing local-kick / remote-pull failure-path tests. Test baselines: cd_decode_e2e 4 → 5; lib + all unified suites unchanged.

…_lifecycle_watcher (R-B Slice 1) Both `RemotePrefillCoordinator::begin_remote_prefill` (decode side) and `PrefillCoordinatorImpl::run_setup` (prefill side) used to inline near-identical tokio::spawn loops driving `session.lifecycle()` to a terminal Detached / Failed / watchdog and emitting role-prefixed audit events. Helper `disagg::lifecycle::spawn_lifecycle_watcher` already existed (added during the session-refactor) but was unused. This slice: - Extends the helper to take a `FnOnce(LifecycleOutcome) -> Future` closure (decode needs `await`-driven failure_sink dispatch). - Helper now formats role-prefixed audit names (`{role}_lifecycle_{detached,failed,watchdog_fired,stream_ended}`) so existing trace tooling (cd_unified_prefill_audit_equiv's `prefill_lifecycle_*` filter, etc) keeps working unchanged. - Decode call site collapses into the helper + two private fns (`decode_failure_reason`, `invoke_failure_sink`) for the failure-sink dispatch logic. Same audits, same ordering (sink invocation BEFORE state eviction). - Prefill call site collapses into the helper + a 5-line on_evict closure that drops the per-request state via Weak<Self> upgrade. - Per-file LIFECYCLE_WATCHDOG consts retired in favor of the shared one in `lifecycle.rs`; prefill's injectable `lifecycle_watchdog: Duration` field (#8) is unchanged. Net: ~140 LOC removed from decode.rs (inline match + audits), ~50 LOC removed from prefill_coordinator.rs. Behavior preserved verbatim — same audit event names, same ordering, same fire-once semantics. Test baselines unchanged: lib 192, cd_decode_e2e 5, cd_prefill_e2e 11+0, cd_loopback 1, cd_bidirectional_e2e 2, all 4 unified suites green. 223 active tests across the CD scope. Reduces R-B Slice 5's blast radius: the unified leader's collapsed coordinator inherits a single watcher pattern instead of needing to merge two divergent inline copies.

…later Reader-first ordering. Old order buried Quick start at position 8 of 9; new order surfaces the runnable commands above all the reference sections (FSx vs EBS, label conventions, aiperf SHA rationale). New section order: 1. Title + 3-config table 2. Pre-requisites (was #3) 3. Quick start (was #8 — promoted) 4. Directory layout (was #7 — now serves as map for the rest) 5. Hardware targets (was #2 — now pure reference; invocation examples moved into Quick start) 6. Storage (was #5) 7. aiperf install (was #6) 8. Naming & ownership (was #4) 9. Notes (unchanged) Also drop the stray "We use ebs by default" sentence — it contradicted both the Storage section and the actual yaml (where the PVC block is fully commented out, no default storage class is set). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cycle Address senior-architect review of the metrics infrastructure: #1 (two parallel Prometheus paths): merge `register_prometheus` + `setup_component_metrics` into one `setup_metrics(ctx) -> MetricsBindings` method on the Rust `LLMEngine` trait. The trait surface shrinks from 7 → 6 methods; both expfmt bridging and the structured component publisher go through one place. The Python ABC keeps its two methods — the PyO3 bridge adapts (engine-author API unchanged for Python). #3 + #8 (framework lifecycle gauges gated on engine opt-in): - `cleanup_time_seconds` and `drain_time_seconds` move to Rust-side `LifecycleGauges` (prometheus crate gauges registered via the runtime's `MetricsRegistry`). Worker constructs them after `engine.start()` succeeds; observes during cleanup/drain. Emits regardless of engine opt-in. - Drop `set_cleanup_time` / `set_drain_time` from the `ComponentMetricsPublisher` trait — engine no longer responsible. - Drop the corresponding methods from `PyComponentMetricsPublisher`. - `model_load_time_seconds` STAYS on Python `LLMBackendMetrics` for parity with the legacy entry points (both legacy `main.py` and the unified bridge populate it). The unified bridge now constructs `LLMBackendMetrics` UNCONDITIONALLY so the gauge emits even when the engine returns no per-rank sources. #5 (kv_cache_hit_rate Optional semantics): clearer docstring on both Rust `ComponentSnapshot` and the Python ABC — tri-state explicitly documented (`None` = no data, `0.0` = legitimate zero-hit). Deferred with rationale (explained in the post-review architect's note): - #2 (HashMap<MetricKey, f64> ComponentSnapshot): real future-proofing but no current bug; 5 fields for 5 signals isn't worth the refactor cost yet. Revisit at ~10+ signals. - #4 (per-source cadence): explicit defer in original review. - #7 (conformance in CI): workflow change, separate from trait surface. Conformance kit collapses `check_register_prometheus` + `check_component_metrics` into `check_setup_metrics`. All 68 backend-common unit tests pass. Mocker example updated. PyO3 bridge wheel rebuilt successfully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

saturley-hall added 4 commits March 4, 2025 12:03

ci: cache image on main for speed of ci

89d25e2

ci: run it on PR before setting back to main

e688847

ci: move to env var specification

7ec5032

ci: trigger main cache on push to main

ec15912

saturley-hall temporarily deployed to GITLAB March 4, 2025 17:57 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 17:59 — with GitHub Actions Inactive

saturley-hall requested review from nv-anants and whoisj March 4, 2025 18:13

saturley-hall marked this pull request as ready for review March 4, 2025 18:15

saturley-hall requested a review from nnshah1 as a code owner March 4, 2025 18:15

saturley-hall requested a review from a team March 4, 2025 18:15

Merge branch 'main' into harrison/main_branch_cache

04817b5

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:15 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:20 — with GitHub Actions Inactive

whoisj requested changes Mar 4, 2025

View reviewed changes

Comment thread .github/workflows/cache_image.yaml Outdated

Comment thread .github/workflows/cache_image.yaml Outdated

nit: fix indentation

165400e

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:30 — with GitHub Actions Inactive

whoisj approved these changes Mar 4, 2025

View reviewed changes

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:31 — with GitHub Actions Inactive

mc-nv reviewed Mar 4, 2025

View reviewed changes

Comment thread .github/workflows/cache_image.yaml Outdated

ci: remove bash copy-paste error

e9c721f

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:40 — with GitHub Actions Inactive

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:43 — with GitHub Actions Inactive

Merge branch 'main' into harrison/main_branch_cache

0d41be0

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:56 — with GitHub Actions Inactive

saturley-hall requested a review from pvijayakrish March 4, 2025 18:59

saturley-hall temporarily deployed to GITLAB March 4, 2025 18:59 — with GitHub Actions Inactive

nv-anants reviewed Mar 4, 2025

View reviewed changes

Comment thread .github/workflows/cache_image.yaml

nv-anants reviewed Mar 4, 2025

View reviewed changes

Comment thread .github/workflows/pr_github_validation.yaml

saturley-hall requested a review from nv-anants March 4, 2025 20:01

nv-anants approved these changes Mar 4, 2025

View reviewed changes

saturley-hall merged commit 0e673dc into main Mar 4, 2025

saturley-hall deleted the harrison/main_branch_cache branch March 4, 2025 20:33

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

CI: setup caching for any merge to main (ai-dynamo#8)

8fb1421

dagil-nvidia mentioned this pull request Oct 21, 2025

feat: Add automated dependency version tracking and extraction #3547

Closed

coderabbitai Bot mentioned this pull request Nov 5, 2025

feat: Update Dynamo k8s deployment example to use ModelExpress #4112

Closed

tanmayv25 mentioned this pull request Apr 15, 2026

DEP: Backend Interface -- LLMEngine ABC and Worker #8251

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI: setup caching for any merge to main#8

CI: setup caching for any merge to main#8
saturley-hall merged 8 commits into
mainfrom
harrison/main_branch_cache

saturley-hall commented Mar 4, 2025 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 4, 2025 •

edited

Loading

Uh oh!

whoisj left a comment

Uh oh!

Uh oh!

Uh oh!

whoisj commented Mar 4, 2025

Uh oh!

whoisj left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

saturley-hall commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results

Uh oh!

whoisj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

whoisj commented Mar 4, 2025

Uh oh!

whoisj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

saturley-hall commented Mar 4, 2025 •

edited

Loading

github-actions Bot commented Mar 4, 2025 •

edited

Loading