ci: add a fast test suite#2031

Merged
terrykong merged 6 commits into main from tk/ci-slim
Feb 27, 2026
Conversation

@terrykong
Collaborator

@terrykong terrykong commented Feb 27, 2026

What does this PR do ?

Introduces a new test suite that runs when the CI:Lfast label is applied. The high-level changes are:

  1. It skips the container build and instead uses the latest nightly container, running prefetch_venvs.py to bring the dependencies in line with the PR.
  2. It runs a fast version of all the tests.
  3. L1 tests are parallelized in this mode.

New unit tests will still run in fast mode because unit tests are selected by an exclusion pattern. An L1 test, by contrast, runs in fast mode only if you label it appropriately.

We will still run the whole suite in the nightly tests, but this gives us a way to get through PRs more quickly.
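The exclusion pattern described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `build_ignore_args`, the array `FAST_EXCLUDED_TESTS`, and both test paths are hypothetical. The real scripts source `excluded_unit_tests.sh`, whose contents are not shown in this thread.

```bash
#!/usr/bin/env bash
# Sketch only: FAST_EXCLUDED_TESTS and its entries are made-up examples.
FAST_EXCLUDED_TESTS=(
    "unit/test_slow_example.py"
    "unit/test_other_slow_example.py"
)

# Print one pytest --ignore flag per excluded path, but only when FAST=1.
# Tests that are not listed (including newly added ones) are never excluded.
build_ignore_args() {
    local -a excluded=()
    if [[ "${FAST:-0}" == "1" ]]; then
        excluded=("${FAST_EXCLUDED_TESTS[@]}")
    fi
    local path
    for path in "${excluded[@]}"; do
        printf -- '--ignore=%s\n' "$path"
    done
}

# Possible usage (not executed here):
#   mapfile -t IGNORE < <(build_ignore_args)
#   pytest "${IGNORE[@]}" unit/
```

Because the list names what to skip rather than what to run, a new test file is picked up by fast mode without any extra step.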

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced new "Lfast" test level for running fast test variants.
    • Added FAST mode support enabling selective test execution with conditional skipping.
    • Implemented image tagging functionality for container build workflows.
    • Added nightly container tagging capability to the CI/CD pipeline.

@terrykong terrykong requested a review from a team as a code owner February 27, 2026 05:57
@github-actions github-actions Bot added the CI Relating to CI label Feb 27, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Feb 27, 2026

Walkthrough

The PR introduces a "Lfast" test level and "fast mode" support for CI/CD pipelines. A new image-tag input parameter controls Docker image versioning (supporting "nightly" or default tags), while exclusion lists enable selective test skipping. Workflow and test scripts are updated to propagate and use this configuration across functional and unit tests.
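The tag fallback mentioned in the walkthrough can be illustrated with a small shell sketch. The helper name `resolve_image_tag` and its argument order are assumptions made for this example. In the actual action, the fallback is expressed in `action.yml`, which is not reproduced in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve_image_tag <image-tag-input> <run-id>
# Prints the explicit tag when one is given, otherwise the run id.
resolve_image_tag() {
    local image_tag="${1:-}"
    local run_id="${2:?run id is required}"
    # ${var:-default} substitutes the default when var is unset or empty.
    printf '%s\n' "${image_tag:-$run_id}"
}
```

With an empty tag the per-run image is used. With a fixed tag, a pre-built image is pulled instead, which is what lets fast mode skip the container build.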

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Action Configuration: .github/actions/test-template/action.yml | Added new public input image-tag with default fallback to github.run_id, updating Docker image pull and container run commands to use the tag when provided, with conditional environment variables NRL_FORCE_REBUILD_VENVS and FAST=1. |
| CI/CD Workflow: .github/workflows/cicd-main.yml | Introduced new test level Lfast with image_tag output derivation (default empty, set to "nightly" for Lfast). Added new job cicd-fast-functional-tests for Lfast mode. Propagated image_tag to all downstream test jobs and added nightly container tagging step after CI_QA_Gate. Extended gating logic to include Lfast dependencies with skipped result handling. |
| Functional Test Script: tests/functional/L1_Functional_Tests_GPU.sh | Added run_test() helper function supporting fast/full mode semantics: skips execution when FAST=1, otherwise runs with timing. Wrapped all test invocations with run_test and conditionally skip research functional tests in FAST mode. |
| Unit Test Scripts: tests/unit/L0_Unit_Tests_Generation.sh, tests/unit/L0_Unit_Tests_Other.sh, tests/unit/L0_Unit_Tests_Policy.sh | Introduced FAST mode support with conditional sourcing of excluded_unit_tests.sh. Refactored test invocations to use parameterized TEST_PATHS, IGNORE, and EXCLUDED_UNIT_TESTS arrays. Added conditional skipping of research unit tests when FAST=1. All test blocks now respect the exclusion lists and run from the tests directory context. |
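A `run_test` helper with fast/full semantics might look like the sketch below. The real helper's signature is not shown in this thread, so the leading `fast`/`full` argument is purely an assumption used here to show how a test could be marked as eligible for fast mode.

```bash
#!/usr/bin/env bash
# Sketch only: the "fast"/"full" first argument is a hypothetical convention.
# Usage: run_test <fast|full> <command> [args...]
run_test() {
    local mode="$1"
    shift
    # In FAST mode, skip anything not explicitly marked as fast.
    if [[ "${FAST:-0}" == "1" && "$mode" != "fast" ]]; then
        echo "SKIP (FAST=1): $*"
        return 0
    fi
    # Otherwise run the command and report its wall-clock time on stderr.
    time "$@"
}
```

For example, `run_test full bash some_long_test.sh` would be skipped under `FAST=1` but run in the nightly suite, while a `fast` entry runs in both.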

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI, CI:L1

Suggested reviewers

  • chtruong814
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR introduces major changes to test suite including new CI/CD job, but description is a template with no testing documentation, verification results, or evidence the feature works correctly. | Add comprehensive testing documentation including test execution results, CI job verification, performance metrics, and check off contributor checklist items. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'ci: add a fast test suite' accurately summarizes the main change: introducing a fast test mode across the CI/CD pipeline and test scripts. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/functional/L1_Functional_Tests_GPU.sh (1)

73-80: Quote command substitution to prevent word splitting.

Same static analysis warning as in L0_Unit_Tests_Other.sh. Consider quoting for defensive coding.

🔧 Proposed fix
 if [[ "${FAST:-0}" != "1" ]]; then
     for test_script in research/*/tests/functional/*.sh; do
-        project_dir=$(echo $test_script | cut -d/ -f1-2)
+        project_dir=$(echo "$test_script" | cut -d/ -f1-2)
         pushd $project_dir
-        time uv run --no-sync bash $(echo $test_script | cut -d/ -f3-)
+        time uv run --no-sync bash "$(echo "$test_script" | cut -d/ -f3-)"
         popd
     done
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/functional/L1_Functional_Tests_GPU.sh` around lines 73 - 80, The shell
loop is using unquoted command substitutions which can cause word-splitting;
update usages of test_script and project_dir and the echo substitution so they
are quoted: quote "${test_script}" when computing project_dir and when passing
into pushd/popd and quote the result of the path extraction used with bash/uv
run (e.g., quote the substitution that computes the relative test path).
Specifically update references to FAST, test_script, project_dir, pushd, popd
and the uv run invocation so all command substitutions and variables are wrapped
in double quotes to prevent word splitting.
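A fully quoted version of this loop could look like the sketch below. The helper names `project_dir_of`, `relative_script_of`, and `run_research_functional_tests` are introduced only for illustration and do not appear in the PR. The `uv run --no-sync bash` invocation is copied from the snippet under review.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# First two path components, e.g. research/<project>.
project_dir_of() { printf '%s\n' "$1" | cut -d/ -f1-2; }

# Everything after the project directory, e.g. tests/functional/<script>.sh.
relative_script_of() { printf '%s\n' "$1" | cut -d/ -f3-; }

run_research_functional_tests() {
    local test_script project_dir rel
    for test_script in research/*/tests/functional/*.sh; do
        # Skip the literal pattern when the glob matches nothing.
        [[ -e "$test_script" ]] || continue
        project_dir="$(project_dir_of "$test_script")"
        rel="$(relative_script_of "$test_script")"
        pushd "$project_dir"
        time uv run --no-sync bash "$rel"
        popd
    done
}
```

Quoting every expansion keeps a path such as `research/my proj/...` as a single word, so `cut` and `pushd` each receive exactly one argument.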
tests/unit/L0_Unit_Tests_Other.sh (1)

68-76: Quote command substitution to prevent word splitting.

Static analysis flagged unquoted command substitution on line 71. While the project prefers consistent formatting, this particular case could cause issues with paths containing spaces.

🔧 Proposed fix
 if [[ "${FAST:-0}" != "1" ]]; then
     for i in research/*/tests/unit; do
-        project_dir=$(dirname $(dirname $i))
+        project_dir=$(dirname "$(dirname "$i")")
         pushd $project_dir
         uv run --no-sync pytest tests/unit
         popd
     done
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/L0_Unit_Tests_Other.sh` around lines 68 - 76, The unquoted command
substitutions in the loop can cause word-splitting for paths with spaces; change
project_dir=$(dirname $(dirname $i)) to project_dir="$(dirname "$(dirname
"$i")")" and also quote usages of the variable when changing directories (e.g.,
pushd "$project_dir" and popd as appropriate) so all command substitutions and
variable expansions are safely quoted; update references in the loop handling
(variable i, project_dir, and pushd) to use these quoted forms.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cicd-main.yml:
- Around line 328-351: The pipeline fails because the test-template action
declares azure-client-id, azure-tenant-id, and azure-subscription-id as required
but most callers (e.g., cicd-fast-functional-tests, cicd-doc-tests,
cicd-unit-tests, cicd-functional-tests) do not pass them; update
.github/actions/test-template/action.yml to mark these inputs as required: false
and give them default: '' (empty string) and ensure the action uses the
has-azure-credentials input to gate Azure usage, or alternatively make the
callers set has-azure-credentials: "false" and provide empty placeholders for
azure-client-id/azure-tenant-id/azure-subscription-id in jobs like
cicd-fast-functional-tests; adjust the action logic to only read Azure inputs
when has-azure-credentials is true.
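The gating idea in this comment, reading the Azure inputs only when credentials are declared present, can be sketched in shell. The environment variable names below (`HAS_AZURE_CREDENTIALS`, `AZURE_CLIENT_ID`, and so on) are assumptions for illustration. How the action actually exposes its inputs to a step is not shown in this thread.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are hypothetical stand-ins for the action inputs.
validate_azure_inputs() {
    # When credentials are not advertised, the Azure inputs may be empty.
    [[ "${HAS_AZURE_CREDENTIALS:-false}" == "true" ]] || return 0
    local name
    for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [[ -z "${!name:-}" ]]; then
            echo "missing required Azure input: $name" >&2
            return 1
        fi
    done
}
```

Under this scheme, callers that never use Azure can leave the three inputs at an empty default and still pass validation.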

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7556d7 and 03c3204.

📒 Files selected for processing (6)
  • .github/actions/test-template/action.yml
  • .github/workflows/cicd-main.yml
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/unit/L0_Unit_Tests_Generation.sh
  • tests/unit/L0_Unit_Tests_Other.sh
  • tests/unit/L0_Unit_Tests_Policy.sh

Comment thread .github/workflows/cicd-main.yml
@terrykong terrykong requested a review from a team as a code owner February 27, 2026 07:53
@terrykong terrykong force-pushed the tk/ci-slim branch 2 times, most recently from fec41cb to 82ab72d Compare February 27, 2026 07:58
@terrykong terrykong added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Feb 27, 2026
@terrykong terrykong requested a review from yuki-97 February 27, 2026 08:00
Signed-off-by: Terry Kong <terryk@nvidia.com>
The _build_container.yml reusable workflow already tags images with
the branch name (:main), so the separate tag-nightly-container job
was redundant. Remove it and update Lfast to reference :main.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Replace NRL_FORCE_REBUILD_VENVS=true with a one-time prefetch_venvs.py
call and fingerprint regeneration when reusing a pre-built container.
This avoids rebuilding venvs on every Ray remote call, significantly
speeding up test execution.

Signed-off-by: Terry Kong <terryk@nvidia.com>
The file was untracked, causing "No such file or directory" errors
in L0_Unit_Tests_Other.sh when running in FAST mode.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels Feb 27, 2026
@terrykong terrykong merged commit 4a7aa47 into main Feb 27, 2026
44 of 45 checks passed
@terrykong terrykong deleted the tk/ci-slim branch February 27, 2026 22:08
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) CI Relating to CI

3 participants