Conversation
Pull Request Overview
This PR introduces automated validation of critical-path configurations through GitHub PR labels. When developers add a label matching `<RUNNER_TYPE>_<MODEL_PREFIX>` (e.g., `b200_gptoss`), it automatically triggers validation workflows for the affected configuration.
Key Changes:
- Adds a `label-validation.yml` workflow that triggers on PR label events
- Parses labels to extract the runner type and model prefix, then generates the appropriate test configurations
- Includes temporary GB200 workaround pending resolution of multi-node testing architecture issue
functionstackx left a comment:
- missing collect-results
Design Doc: Auto PR Validation via Labels
Validate all labels: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19142957749/job/54712862609
Introduction
Currently, when a change is made to a critical path for a particular configuration, users must manually run workflows to validate the change. In an effort to move towards general ease of use within the InferenceMAX environment, we propose adding validation of particular configs (e.g., `gptoss-h200`) via pull request labels.
Implementation
Since PR #145, developers have more control over which jobs run, thanks to external master config files and a Python reconciliation script. This architecture lends itself nicely to having a CI label for auto validation of configurations. Below we propose how this will work.
When a user opens a PR, they will still be responsible for identifying which configuration their changes affect. Once they do, however, they will be able to add a label to the PR of the form `<RUNNER_TYPE>_<MODEL_PREFIX>`. This will invoke a new workflow, `label-validation.yml`, which runs as follows.

First, a job parses the label and decides whether it matches the expected format `<RUNNER_TYPE>_<MODEL_PREFIX>`. Note that this validation performs only a simple regex match for two strings separated by a `_` delimiter. We considered validating the label further by loading the `.github/configs` files and checking that `RUNNER_TYPE` and `MODEL_PREFIX` actually exist and are valid, but this would add complexity to the workflow file that we deem unnecessary for now; it can always be added later. Upon a match, we save the pattern groups as the `runner-type` and `model-prefix` job outputs, which are passed to the next job, `get-jobs`. This job invokes `python3 ${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py full-sweep --model-prefix MODEL_PREFIX --runner-type RUNNER_TYPE --seq-lens 1k1k --test-mode --config-files ...` and saves the output to `GITHUB_OUTPUT` for use in the next job, which actually generates the matrix and runs the appropriate jobs.

As a quick refresher, this command finds all the configs that match `RUNNER_TYPE`, `MODEL_PREFIX`, and sequence length `1k1k`, and runs only the highest TP level with the lowest available CONC (as indicated by `--test-mode`).
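To make the first step concrete, here is a minimal sketch of the trigger and the parse job described above. This is illustrative only: the exact job and step names, the regex, and the output wiring are assumptions, not the actual workflow contents.

```yaml
# Illustrative sketch of label-validation.yml; names and details are assumptions.
name: label-validation

on:
  pull_request:
    types: [labeled]

jobs:
  parse-label:
    runs-on: ubuntu-latest
    outputs:
      runner-type: ${{ steps.parse.outputs.runner-type }}
      model-prefix: ${{ steps.parse.outputs.model-prefix }}
    steps:
      - name: Parse label into runner type and model prefix
        id: parse
        env:
          LABEL: ${{ github.event.label.name }}
        run: |
          # Simple regex match: two strings separated by a "_" delimiter.
          # On no match, the outputs stay empty and downstream jobs are skipped.
          if [[ "$LABEL" =~ ^([a-z0-9]+)_([a-z0-9]+)$ ]]; then
            echo "runner-type=${BASH_REMATCH[1]}" >> "$GITHUB_OUTPUT"
            echo "model-prefix=${BASH_REMATCH[2]}" >> "$GITHUB_OUTPUT"
          fi
```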
For instance, suppose a developer makes a change to `benchmarks/gptoss_fp4_b200_docker.sh` (a critical path). They would then add the label `b200_gptoss`. This triggers the `label-validation.yml` workflow, which first runs the `parse-label` job; the regex pattern matches, extracting `RUNNER_TYPE` and `MODEL_PREFIX` into the corresponding job output variables. The next job runs only if those output variables are non-empty, and invokes the full-sweep command with the appropriate `--runner-type` and `--model-prefix`. Upon success, the `validate` job runs, calling the actual `benchmark-tmpl.yml` workflow with the parameters generated by the previous job. Finally, the `calc-success-rate` job runs (per usual) at the end of the workflow to calculate the per-GPU success rate. A sketch of this job chain follows.
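Again, this is a hedged sketch rather than the real workflow: the `--config-files` list is elided here just as it is above, and the `benchmark-tmpl.yml` path and input names are assumed.

```yaml
  # Continuation of the sketch above; job names and inputs are assumptions.
  get-jobs:
    needs: parse-label
    # Run only if the label matched the expected pattern.
    if: needs.parse-label.outputs.runner-type != '' && needs.parse-label.outputs.model-prefix != ''
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.gen.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Generate test configurations
        id: gen
        run: |
          # --config-files list elided here, as in the command above.
          matrix=$(python3 "${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py" full-sweep \
            --model-prefix "${{ needs.parse-label.outputs.model-prefix }}" \
            --runner-type "${{ needs.parse-label.outputs.runner-type }}" \
            --seq-lens 1k1k --test-mode --config-files ...)
          echo "matrix=${matrix}" >> "$GITHUB_OUTPUT"

  validate:
    needs: get-jobs
    uses: ./.github/workflows/benchmark-tmpl.yml  # path assumed
    with:
      matrix: ${{ needs.get-jobs.outputs.matrix }}  # input name assumed

  calc-success-rate:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - name: Calculate per-GPU success rate
        run: echo "per-GPU success rate calculation goes here"  # placeholder
```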
Labels to Add

The following labels will be added, with runners in accordance with the keys in `.github/configs/runners.yaml`:
GB200 Integration
Currently, the way GB200 (and all multi-node tests) runs is very different from the single-node configurations. This is not desirable and should be fixed ASAP (see issue https://github.com/InferenceMAX/InferenceMAX/issues/171).
Therefore, we will skip integrating GB200 with label auto-validation (for now). This is fine since there is a separate `gb200-tests.yml` workflow.

Considerations
Hard-Coding Config Files
One possible point of concern is that the master config files, along with the runner config file, are hardcoded in the `get-jobs` job. This is potentially problematic, as the workflow file will need to be updated if and when additional config files are added in the future, as sketched below.
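To illustrate the concern, the hardcoded invocation looks something like the following fragment; the config file names here are purely hypothetical, since the real list is elided in this doc.

```yaml
# Hypothetical fragment of the get-jobs job; config file names are made up.
- name: Generate test configurations
  run: |
    python3 "${GITHUB_WORKSPACE}/utils/matrix-logic/generate_sweep_configs.py" full-sweep \
      --model-prefix "${{ needs.parse-label.outputs.model-prefix }}" \
      --runner-type "${{ needs.parse-label.outputs.runner-type }}" \
      --seq-lens 1k1k --test-mode \
      --config-files .github/configs/example-amd.yaml .github/configs/example-nvidia.yaml
    # Any master config file added in the future must also be appended to the
    # --config-files list above, or its configurations will never be validated.
```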
Insufficient Coverage
Recall that the invoked `generate_sweep_configs.py` command applies the `--test-mode` argument, which runs only the highest-TP, lowest-concurrency run for the specified configuration. This can be problematic if, say, a developer makes a change that only affects the TP4 configurations, and the label only runs the TP8 configurations.

This is likely fine, because in this case the developer can just manually run the `e2e-tests.yml` workflow and be more specific about what they would like to run.

Too Much Coverage
Contrary to the point above, the label (being quite unspecific) may unnecessarily run certain configs. For instance, in the case of models with multiple precisions, such as DeepSeek, adding the label `h200_dsr1` will run both FP4 and FP8, even if only the FP8 critical path needs to be tested.
Again, this is probably fine for now because the label is still net-net more convenient than manually running the workflows.
Providing an Incorrect Label
If an invalid label is provided, the workflow will fail, since the `generate_sweep_configs.py` script that is ultimately invoked validates the runner-type argument. Here is an example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19117241089/job/54629331775.

Security Considerations
First and foremost, only users with at least write permissions on the repo can directly add a label to a PR. This eliminates the case of a random contributor being able to trigger a workflow run on their PR from a fork.
Furthermore, we tested the following path: secrets such as `HF_TOKEN` are not passed to workflows running on a PR from a fork. From our contact at GitHub: "When running a workflow for a PR coming from a fork, the secrets are not available. It runs in the context of the fork, not the main repo."

This is fine, since most PRs that will require validation will come from contributors who already have write permissions and are opening PRs in the context of the actual InferenceMAX repo, not a fork.