CI for performance regressions #973

Merged
DavidEiglspergerQC merged 20 commits into main from feat/ci_regression
Mar 13, 2026
Conversation

Contributor

@DavidEiglspergerQC DavidEiglspergerQC commented Feb 10, 2026

This is a first implementation of automated CI regression detection. If you think this should be done differently (e.g. using asv), I am more than happy to adapt; this is just the approach I found most lightweight and easiest to implement given the current state of the updated benchmarking. The specific problems being benchmarked also need to be discussed, so that we cover the most representative ones. We might also consider using a dedicated machine instead of GitHub-hosted runners.

Runtime Regression CI

This PR adds a lightweight automated runtime regression check for glum.

  • Added a dedicated workflow: .github/workflows/runtime-regression.yml

    • runs on pull_request updates and pushes to main (we may need to adapt this; it was just the easiest setting for testing)
    • compares base vs head in one run
  • Reused existing benchmark tooling (no new harness):

    • glum_benchmarks/run_benchmarks.py
    • glum_benchmarks/compare_results.py
  • Added CI benchmark config: glum_benchmarks/config_ci.yaml

    • focused representative glum problems
    • iterations: 10, num_threads: 1
    • only benchmark + analysis steps enabled
    • thresholds:
      • max_rel_slowdown: 0.15
      • max_abs_slowdown_sec: 0.05
      • max_regressed_cases: 0
  • CI stability/noise controls:

    • warmup run (discarded), then measured run
    • separate cache per ref
    • fixed thread/hash env vars
    • CI uses trimmed_mean aggregation; default benchmark behavior remains min

If regression thresholds are exceeded, the workflow fails and publishes a summary/artifacts for inspection.
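To make the threshold semantics concrete, here is a hypothetical sketch of how the three thresholds could interact. The function and variable names are my own illustration, not the actual compare_results.py implementation, and the choice that a case must exceed both the relative and the absolute threshold to count as regressed is an assumption:

```python
# Hypothetical sketch of the regression check; names and the AND-combination
# of thresholds are illustrative, not the actual compare_results.py logic.

MAX_REL_SLOWDOWN = 0.15      # fail if head is >15% slower than base...
MAX_ABS_SLOWDOWN_SEC = 0.05  # ...and at least 50 ms slower in absolute terms
MAX_REGRESSED_CASES = 0      # no regressed benchmark cases allowed


def is_regression(base_sec: float, head_sec: float) -> bool:
    """A case counts as regressed only if it exceeds BOTH thresholds,
    so very fast benchmarks are not flagged by relative noise alone."""
    rel = (head_sec - base_sec) / base_sec
    abs_slowdown = head_sec - base_sec
    return rel > MAX_REL_SLOWDOWN and abs_slowdown > MAX_ABS_SLOWDOWN_SEC


def check(results: dict[str, tuple[float, float]]) -> bool:
    """results maps problem name -> (base_seconds, head_seconds).
    Returns True if CI should pass."""
    regressed = [name for name, (b, h) in results.items() if is_regression(b, h)]
    for name in regressed:
        b, h = results[name]
        print(f"REGRESSION {name}: {b:.3f}s -> {h:.3f}s (+{(h - b) / b:.0%})")
    return len(regressed) <= MAX_REGRESSED_CASES
```

Requiring both a relative and an absolute slowdown is a common way to avoid false positives on sub-millisecond benchmarks, where a 15% swing can be pure timer noise.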

@DavidEiglspergerQC changed the title from "very first CI integration implementation" to "CI integration implementation" Feb 10, 2026
@DavidEiglspergerQC changed the title from "CI integration implementation" to "CI for performance regressions" Feb 10, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a lightweight automated runtime regression detection system for the glum benchmarking pipeline, aimed at catching performance degradations in pull requests before they're merged.

Changes:

  • Added a new GitHub Actions workflow that runs performance benchmarks comparing the PR base against the head commit
  • Implemented a new comparison script that analyzes benchmark results and fails CI when regression thresholds are exceeded
  • Extended the runtime measurement utility to support configurable aggregation methods (min vs trimmed mean) to reduce CI noise
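The trimmed-mean choice is worth unpacking: it discards the extreme observations before averaging, which makes it more robust to CI noise than a plain mean while still using more of the samples than `min` does. A minimal sketch of the idea (the actual util.py implementation may differ):

```python
def trimmed_mean(samples: list[float], trim_frac: float = 0.1) -> float:
    """Mean of the samples after dropping the lowest and highest
    trim_frac fraction of observations from each end."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_frac)  # count trimmed from each end
    kept = ordered[k: len(ordered) - k] or ordered  # keep all if too few samples
    return sum(kept) / len(kept)
```

With the CI config's 10 iterations and a 10% trim, this drops exactly the fastest and slowest run, so a single OS-scheduling hiccup on a shared runner cannot dominate the aggregate. (scipy.stats.trim_mean provides the same statistic if SciPy is available.)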

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Summary per file:

  • .github/workflows/runtime-regression.yml: new workflow that runs benchmarks for base and head commits in separate worktrees, then compares results
  • glum_benchmarks/config_ci.yaml: CI-specific benchmark configuration with reduced scope and regression thresholds
  • glum_benchmarks/compare_results.py: new script to compare base/head benchmark CSVs and detect regressions
  • glum_benchmarks/util.py: enhanced runtime measurement to support trimmed mean aggregation via environment variables
  • glum_benchmarks/run_benchmarks.py: added CLI arguments for config path and run name overrides, plus threshold metadata fields
  • pixi.toml: added rich dependency for prettier console output
  • pixi.lock: lock file updates for rich and its dependencies (markdown-it-py, mdurl)

Comment thread glum_benchmarks/util.py Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread glum_benchmarks/compare_results.py Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread glum_benchmarks/compare_results.py Outdated
@DavidEiglspergerQC DavidEiglspergerQC marked this pull request as ready for review February 17, 2026 16:32
@DavidEiglspergerQC DavidEiglspergerQC self-assigned this Mar 5, 2026
Base automatically changed from fix/update_benchmarks to main March 5, 2026 19:52
@DavidEiglspergerQC DavidEiglspergerQC linked an issue Mar 6, 2026 that may be closed by this pull request
Member


Overall this is good. Let's do one more round of integration with the existing systems.

Comment thread glum_benchmarks/compare_results.py Outdated
Comment thread glum_benchmarks/config_ci.yaml
Comment thread glum_benchmarks/compare_results.py
Comment thread glum_benchmarks/compare_results.py
Comment thread glum_benchmarks/compare_results.py Outdated
Member


Your solution to the config circular import is great. I tested a CI run with a regression and the output was good. I think this is ready to merge.

@DavidEiglspergerQC DavidEiglspergerQC merged commit e92aa88 into main Mar 13, 2026
25 checks passed
@DavidEiglspergerQC DavidEiglspergerQC deleted the feat/ci_regression branch March 13, 2026 08:29

Development

Successfully merging this pull request may close these issues.

Automatic performance benchmarks

3 participants