Skip to content

WIP: adding pr label auto validation#167

Merged
cquil11 merged 33 commits intomainfrom
pr-label-auto-validation
Nov 6, 2025
Merged

WIP: adding pr label auto validation#167
cquil11 merged 33 commits intomainfrom
pr-label-auto-validation

Conversation

@cquil11
Copy link
Copy Markdown
Collaborator

@cquil11 cquil11 commented Nov 5, 2025

Design Doc: Auto PR Validation via Labels

Validate all labels: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19142957749/job/54712862609

Introduction

Currently, when a change is made to a critical-path for a particular configuration, users must manually run workflows to validate the changes. In an effort to move towards general ease-of-use within the InferenceMAX environment, we propose adding validation of particular configs (i.e., gptoss-h200) via pull request labels.

Implementation

Since PR #145, developers have more control over which jobs run, thanks to external master config files and a Python reconciliation script. This architecture lends itself nicely to having a CI label for auto validation of configurations. Below we propose how this will work.

When a user opens a PR, they will still be responsible for identifying what configuration their changes affect. However, once they do so, they will be able to add a label to the PR of the form <RUNNER_TYPE>_<MODEL_PREFIX>. This will then invoke a new workflow label-validation.yml which will run

on:
    pull_request:
        types: [labeled]
        branches:
            - main`

First, a job will be run to parse the label and decide whether or not it meets the format we expect <RUNNER_TYPE>_<MODEL_PREFIX>. Note that this validation just performs a simple regex match for two strings separated by a “_” delimiter. It was considered to perform more validation on this label by loading the .github/configs files and checking to make sure RUNNER_TYPE and MODEL_PREFIX actually exist and are valid, however this would add complexity to the workflow file that we deem unnecessary for now. This can always be added in the future. Upon a match, we save the pattern groups accordingly as runner-type and model-prefix as job outputs to be passed to the next job, get-jobs. This job invokes

python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --model-prefix MODEL_PREFIX --runner-type RUNNER_TYPE  --seq-lens 1k1k --test-mode --config-files ...

and saves the output to GITHUB_OUTPUT for use in the next job, which will actually generate the matrix and run the appropriate jobs.

As a quick refresher, this command will find all the configs that match RUNNER_TYPE, MODEL_PREFIX, and sequence length 1k1k and run only the highest TP level with the lowest available CONC (as indicated by --test-mode).

For instance, let’s say that a developer makes a change to benchmarks/gptoss_fp4_b200_docker.sh (critical path), they would then add the label b200_gptoss. This will trigger the label-validation.yml workflow and first run the job parse-label which will successfully match the regex pattern to extract RUNNER_TYPE and MODEL_PREFIX and send it to the appropriate job output variables. The next step will only run if the job output variables from the previous step are not empty, and will invoke the full-sweep command with the appropriate --runner-type and --model-prefix. Upon success, the validate job will run, calling the actual benchmark-tmpl.yml workflow with the generated parameters from the previous job. Finally, the calc-success-rate job (per usual) at the end of the workflow to calculate the per-GPU success rate.

Labels to Add

The following labels will be added, with runners in accordance with the keys in .github/configs/runners.yaml:

  • h100_gptoss
  • h200_{gptoss,dsr1}
  • b200_{gptoss,dsr1}
  • b200-trt_{gptoss,dsr1}
  • mi300x_{gptoss,dsr1}
  • mi325x_{gptoss,dsr1}
  • mi355x_{gptoss,dsr1}

GB200 Integration

Currently, the way GB200 (and all multinode tests) are run is very different from the single node configurations. This is not desirable and should be fixed ASAP (see issue https://github.com/InferenceMAX/InferenceMAX/issues/171).

Therefore, we will skip integrating GB200 with label auto-validation (for now). This is fine since there is a separate gb200-tests.yml workflow.

Considerations

Hard-Coding Config Files

One possible point of concern is the fact that the master config files along with the runner config file are hardcoded in the get-jobs job. This is potentially problematic as this file will need to be updated if and when additional config files are added in the future.

Insufficient Coverage

Recall that the invoked command to the generate_sweep_configs.py script applies the --test-mode argument, which runs only the highest TP + lowest concurrency run for the specified configuration. This can be problematic if, say, a developer makes a change that only affects the TP4 configurations, and the label only runs the TP8 configurations.

This is likely fine, because in this case, the developer can just manually run the e2e-tests.yml workflow and be more specific about what they would like to run.

Too Much Coverage

Contrary to the point above, the label (being quite inspecific) may unnecessarily run certain configs. For instance, in the case of models with multiple precisions, such as DeepSeek, adding the label h200_dsr1 will run both FP4 and FP8, even if only the FP8 critical path needs to be tested.

Again, this is probably fine for now because the label is still net-net more convenient than manually running the workflows.

Providing an Incorrect Label

If an invalid label is provided, the workflow will fail since the generate_sweep_configs.py script which is ultimately invoked validates the runner type argument. Here is an example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19117241089/job/54629331775.

Security Considerations

First and foremost, only users with at least write permissions on the repo can directly add a label to a PR. This eliminates the case of a random contributor being able to trigger a workflow run on their PR from a fork.

Furthermore, we tested the following path:

  1. Random contributor (less than write permissions) creates a fork and then opens a PR against InferenceMAX/main
  2. A codeowner (member of InferenceMAX with at least write permissions) adds a label on the random contributor’s PR that is supposed to trigger a workflow
  3. At this point, the codeowner must manually kick of the workflows as specified in the repository’s settings for workflows run on a PR opened by a random contributor
  4. At this point, we expected that the workflow would run normally, but secrets such as HF_TOKEN are not passed to workflows running on a PR from a fork. From our contact at GitHub: “When running a workflow for a PR coming from a fork, the secrets are not available. It runs in the context of the fork, not the main repo.”

This is fine, since most PRs that will require validation will come from contributors with write permissions already, and are opening up PRs in the context of the actual InferenceMAX repo and not a fork.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

12 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Nov 5, 2025

📊 Line Count Report

File: utils/matrix-logic/generate_sweep_configs.py

Total Lines: 956

@cquil11 cquil11 removed the h100_dsr1 label Nov 6, 2025
@cquil11 cquil11 marked this pull request as ready for review November 6, 2025 16:49
@cquil11 cquil11 requested a review from a team as a code owner November 6, 2025 16:49
Copilot AI review requested due to automatic review settings November 6, 2025 16:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces automated validation of critical-path configurations through GitHub PR labels. When developers add a label matching <RUNNER_TYPE>_<MODEL_PREFIX> (e.g., b200_gptoss), it automatically triggers validation workflows for the affected configuration.

Key Changes:

  • Adds label-validation.yml workflow that triggers on PR label events
  • Parses labels to extract runner type and model prefix, then generates appropriate test configurations
  • Includes temporary GB200 workaround pending resolution of multi-node testing architecture issue

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/label-validation.yml
Comment thread .github/workflows/label-validation.yml
Copy link
Copy Markdown
Contributor

@functionstackx functionstackx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • missing collect-results

Copy link
Copy Markdown
Contributor

@functionstackx functionstackx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cquil11 cquil11 merged commit 7f418c0 into main Nov 6, 2025
20 of 21 checks passed
@cquil11 cquil11 deleted the pr-label-auto-validation branch November 6, 2025 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants