-
Notifications
You must be signed in to change notification settings - Fork 172
[TRITON] Add script to select Triton tests based on diff content #1682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
242d8f4 to
c26fc58
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
64e8d2c to
dc110b6
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
6e37b57 to
45d86fc
Compare
This comment was marked as resolved.
This comment was marked as resolved.
| source_branch, | ||
| ) | ||
| else: | ||
| git_check_branch(source_branch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the reason CI didn't find the branch, could you add some debug steps like git branch -r and git branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I'll add some git commands before running the script to see what's going on. Thank you for the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't run any git command in the container with docker exec -w /workspace triton_test git ....
I was getting the following error:
fatal: detected dubious ownership in repository at '/workspace'
To add an exception for this directory, call:
git config --global --add safe.directory /workspace
This error happens due to a security feature that occurs when the repository's owner is different from the user running the git command. This prevents unauthorized scripts from running in repositories not owned by the current user. You can resolve this error by either explicitly marking the directory as safe in your Git configuration or by changing the directory's ownership to your current user.
After running git config --global --add safe.directory /workspace the test selection script is working. However, I'm not sure if this is the best solution! I'm open to suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of reference:
aiter/.github/workflows/triton-test.yaml
Lines 126 to 127 in c43f997
| docker exec -w /workspace triton_test \ | |
| git config --global --add safe.directory /workspace |
10d6be7 to
c744074
Compare
eabb64d to
c43f997
Compare
brunomazzottiamd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issues to be resolved.
| git fetch origin --no-tags \ | ||
| +refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }} | ||
| # TODO: Evaluate if this step should be removed before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve before merging.
| triton_test \ | ||
| pip install pytest | ||
| # TODO: Evaluate if the security exception is the best way to fix the script execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve before merging.
| python .github/scripts/select_triton_tests.py \ | ||
| --source '${{ github.head_ref }}' --target 'remotes/origin/${{ github.base_ref }}' \ | ||
| --env-var TRITON_TEST --env-file "${ENV_FILE}" | ||
| # TODO: Comment the follwing command before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve before merging, i.e. comment the following line.
|
Marked old discussions as resolved, hiding them. |
|
@gyohuangxin, I have good news to share: The test selection script is working on CI! Taking this PR as an example, we're able to achieve a 22x speedup on test execution. Please take a look at "Triton test execution speedup" section of PR description, I updated it with more details. I'd be glad if you could review the proposed changes on Tomorrow I'll try to handle forked PRs, I think it's a matter of fetching the source and target branches. |
c43f997 to
57a8c61
Compare
| # TODO: Evaluate if the security exception is the best way to fix the script execution. | ||
| # TODO: Uncomment [docker exec -w /workspace triton_test cat "${ENV_FILE}" >> "${GITHUB_ENV}"] | ||
| # command to enable test selection. | ||
| - name: Triton Test Selection Script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we will always run all tests in main branch, we should skip this step when it's a main branch.
| - name: Triton Test Selection Script | |
| - name: Triton Test Selection Script | |
| if: ${{ github.ref != 'refs/heads/main' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding forked PRs, I think the simplest approach is to still run the full test suite. So we can change it to the following:
| - name: Triton Test Selection Script | |
| - name: Triton Test Selection Script | |
| if: ${{ github.ref != 'refs/heads/main' && !github.event.pull_request.head.repo.fork }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to enable forked PRs, but I'm having a hard time... I'll try a bit more today, if I can't get it to work then I'll add the if statement as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this:
- name: Run a step only if it's from a fork
if: ${{ github.event.pull_request.head.repo.fork }}
run: echo "This PR is from a fork."
- name: Run a step only if it's from the same repo (not a fork)
if: ${{ !github.event.pull_request.head.repo.fork }}
run: echo "This PR is from an internal branch."It's always printing This PR is from a fork., for #1682 (not a fork) and #1804 (a fork). What's going on? I'm very confused...
| echo "Running Triton Tests..." | ||
| docker exec -w /workspace triton_test mkdir -p test-reports | ||
| docker exec -w /workspace triton_test pytest -v ${{ env.TRITON_TEST }} --junitxml=test-reports/triton.xml | ||
| docker exec -w /workspace triton_test pytest -v ${TRITON_TEST} --junitxml=test-reports/triton.xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we skip the tests on the main branch, we need to add another step.
echo ${{ env.TRITON_TEST }} >> "${GITHUB_ENV}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my previous tests, I think this isn't required. ${TRITON_TEST} initial value is ${{ env.TRITON_TEST }}, if we don't mess with ${GITHUB_ENV} file it should be fine. I'll perform more tests later today just to confirm my hypothesis.
GitHub already has a "show timestamps" feature, so logging timestamps isn't adding anything.
|
@brunomazzottiamd Congratulations! This is a huge amount of work, thanks so much for all the effort for AIter CI! |
|
Forked PR test being done in #1804. |
5d3c6c5 to
4102cb9
Compare
* Add benchmarks and test selection script to Triton paths filter. * Install NetworkX dependency of test selection script. * Fetch target branch from remote. * Show available branches: It's for debugging purposes, maybe it could be removed before merging. * Run test selection script: It's still a dry run, the result of test selection script isn't being used by pytest yet.
4102cb9 to
a0662d4
Compare
This reverts commit 4bb00df.
Motivation
Triton test step of AITER CI is taking too long and becoming a bottleneck for PR merging. We should programmatically select which Triton tests to run, based on the diff content of a given PR. The full test suite can be executed periodically on
mainbranch only.This PR proposes a Python script that given
automatically selects which Triton tests validate the changes to be merged.
Technical Details
Script help text:
Implementation details:
subprocessmodule to rungitcommands and find out what has changed between source and target branches.pathlibmodule to list all Triton source files in AITER code base (kernels, kernel configurations, unit tests, benchmark scripts).astmodule to recursively parse the Triton source files, tracking all dependency relations. These dependency relations can be between two source files or between a kernel configuration file and a kernel source file.networkxdirected graph. Later, the graph is traversed starting from the diff content until we reach unit tests. All unit tests that are reachable should be executed to validate the proposed changes.Test Plan
Tested locally with two scenarios:
aiter/ops/triton/_triton_kernels/mha.py→ MHA forward kernelaiter/ops/triton/_triton_kernels/mha_onekernel_bwd.py→ MHA backward kernelaiter/ops/triton/_triton_kernels/rope.py→ RoPE kernelaiter/ops/triton/configs/gfx942-GMM.json→ GMM kernel configurationop_tests/op_benchmarks/triton/bench_topk.py→ Top-k benchmark scriptbmazzott/run-triton-tests-selectively, which has changes in:aiter/ops/triton/_triton_kernels/fused_gemm_afp4wfp4_a16w16.py→ standardization ofget_gemm_configargumentsaiter/ops/triton/_triton_kernels/fused_gemm_afp4wfp4_mul_add.py→ standardization ofget_gemm_configargumentsaiter/ops/triton/_triton_kernels/gemm_afp4wfp4.py→ standardization ofget_gemm_configargumentsaiter/ops/triton/_triton_kernels/gmm.py→ standardization of kernel config file patternaiter/ops/triton/_triton_kernels/pod_attention.py→ untested kernelTest Result
Summarized script output:
Test case A:
Test case B:
Test case C:
Test case D:
Triton test execution speedup
Using this PR as an example:
bmazzott/run-triton-tests-selectivelymainTODO before merging
IMPORTANT NOTICE!
mainbranch.Submission Checklist