Skip to content

Mark slowest CI test outliers as THOROUGH#8854

Merged
kroening merged 1 commit intodiffblue:developfrom
tautschnig:slow-as-THOROUGH
Mar 10, 2026
Merged

Mark slowest CI test outliers as THOROUGH#8854
kroening merged 1 commit intodiffblue:developfrom
tautschnig:slow-as-THOROUGH

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

@tautschnig tautschnig commented Mar 10, 2026

Move four extreme outlier tests from CORE to THOROUGH to reduce CI wall-clock time. These tests were identified by analysing 5 recent 'Build and Test CBMC' workflow runs across macOS-14, Windows (VS 2022 and VS 2025), and Linux.

Tests moved to THOROUGH:

  1. jbmc-strings/StringSubstring/test.desc

    • macOS-14 ARM: 1950s (32.5 min), Linux: 3s — 650x slower
    • Single biggest contributor to the jbmc-strings-symex-driven-lazy- loading-CORE suite (avg 1h26m on macOS-14), which is the primary bottleneck in the entire CI pipeline
    • Also affects jbmc-strings-CORE (29s on macOS-14 vs 3s on Linux)
  2. book-examples/pid/C11.desc

    • macOS-14 ARM: 1069s (17.8 min), Linux: 28s — 38x slower
    • Dominates the book-examples-cprover-smt2-CORE suite entirely (18 min avg on macOS-14, all from this one test)
    • Also slow in book-examples-CORE (3m14s avg on macOS-14)
  3. contracts-dfcc/assigns-local-composite/test.desc

    • macOS-14: 90s, VS 2025: 114s, VS 2022: 78s, Linux: 8s
    • 10-14x slower on macOS and Windows than Linux
    • Top individual test in contracts-dfcc-CORE on all platforms
  4. cbmc-incr-smt2/structs/large_array_of_struct_nondet_index.desc

    • VS 2022: 472s (7.9 min) per solver, VS 2025: 103s, Linux: fast
    • Windows-specific bottleneck; runs twice (z3 + cvc5)
    • Single biggest contributor to cbmc-incr-smt2 suite on Windows

Expected impact on the critical path (check-macos-14-cmake-clang):

  • jbmc-strings-symex-driven-lazy-loading-CORE: ~1h26m -> ~54m (est.)
  • book-examples-cprover-smt2-CORE: ~18m -> ~1m (est.)
  • contracts-dfcc-CORE: ~13m -> ~12m (est.)
  • Overall job: ~2h25m -> ~1h30m (est.)

These tests will continue to run in the THOROUGH CI job (check-ubuntu-24_04-cmake-gcc-THOROUGH).

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Move four extreme outlier tests from CORE to THOROUGH to reduce CI
wall-clock time. These tests were identified by analysing 5 recent
'Build and Test CBMC' workflow runs across macOS-14, Windows (VS 2022
and VS 2025), and Linux.

Tests moved to THOROUGH:

1. jbmc-strings/StringSubstring/test.desc
   - macOS-14 ARM: 1950s (32.5 min), Linux: 3s — 650x slower
   - Single biggest contributor to the jbmc-strings-symex-driven-lazy-
     loading-CORE suite (avg 1h26m on macOS-14), which is the #1
     bottleneck in the entire CI pipeline
   - Also affects jbmc-strings-CORE (29s on macOS-14 vs 3s on Linux)

2. book-examples/pid/C11.desc
   - macOS-14 ARM: 1069s (17.8 min), Linux: 28s — 38x slower
   - Dominates the book-examples-cprover-smt2-CORE suite entirely
     (18 min avg on macOS-14, all from this one test)
   - Also slow in book-examples-CORE (3m14s avg on macOS-14)

3. contracts-dfcc/assigns-local-composite/test.desc
   - macOS-14: 90s, VS 2025: 114s, VS 2022: 78s, Linux: 8s
   - 10-14x slower on macOS and Windows than Linux
   - Top individual test in contracts-dfcc-CORE on all platforms

4. cbmc-incr-smt2/structs/large_array_of_struct_nondet_index.desc
   - VS 2022: 472s (7.9 min) per solver, VS 2025: 103s, Linux: fast
   - Windows-specific bottleneck; runs twice (z3 + cvc5)
   - Single biggest contributor to cbmc-incr-smt2 suite on Windows

Expected impact on the critical path (check-macos-14-cmake-clang):
- jbmc-strings-symex-driven-lazy-loading-CORE: ~1h26m -> ~54m (est.)
- book-examples-cprover-smt2-CORE: ~18m -> ~1m (est.)
- contracts-dfcc-CORE: ~13m -> ~12m (est.)
- Overall job: ~2h25m -> ~1h30m (est.)

These tests will continue to run in the THOROUGH CI job
(check-ubuntu-24_04-cmake-gcc-THOROUGH).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 10, 2026 10:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Moves a small set of regression tests identified as extreme CI runtime outliers from the CORE test set to THOROUGH to reduce critical-path CI wall-clock time while still running them in the thorough job.

Changes:

  • Reclassified 4 outlier tests from CORE to THOROUGH
  • Preserved existing per-test modifiers (e.g., no-new-smt) when present

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
regression/contracts-dfcc/assigns-local-composite/test.desc Switches test classification from CORE to THOROUGH
regression/cbmc-incr-smt2/structs/large_array_of_struct_nondet_index.desc Switches test classification from CORE to THOROUGH
regression/book-examples/pid/C11.desc Switches test classification from CORE to THOROUGH, retaining no-new-smt
jbmc/regression/jbmc-strings/StringSubstring/test.desc Switches test classification from CORE to THOROUGH

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.01%. Comparing base (5719027) to head (6e616e9).
⚠️ Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8854      +/-   ##
===========================================
- Coverage    80.02%   80.01%   -0.02%     
===========================================
  Files         1700     1700              
  Lines       188345   188345              
  Branches        73       73              
===========================================
- Hits        150716   150696      -20     
- Misses       37629    37649      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kroening kroening merged commit b191cc1 into diffblue:develop Mar 10, 2026
44 of 45 checks passed
@tautschnig tautschnig deleted the slow-as-THOROUGH branch March 10, 2026 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants