chore(tooling,ci): throttle py3 xdist workers locally, use all cores in CI #2120

Merged
fselmo merged 4 commits into ethereum:forks/amsterdam from danceratopz:tox-py-remove-maxprocesses
Feb 12, 2026

Conversation

@danceratopz
Member

@danceratopz danceratopz commented Feb 3, 2026

🗒️ Description

This PR changes the xdist behavior of the py3 and tests_pytest_py3 tox environments such that:

  • In CI on self-hosted runners, all available physical cores are used (-n auto) instead of being capped via --maxprocesses.
  • For local development, only 6 workers are used by default; this is overridable via PYTEST_XDIST_AUTO_NUM_WORKERS. The lower default keeps xdist from saturating every core, which otherwise leaves the local workstation unresponsive until the pytest run ends.

It additionally adds psutil to the dev dependency group in order to accurately detect the number of physical cores, cf https://pytest-xdist.readthedocs.io/en/stable/distribution.html#running-tests-across-multiple-cpus
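
To illustrate the difference psutil makes, here is a minimal sketch that reports logical and physical core counts side by side. The helper name `detect_cores` is hypothetical, and returning `None` when psutil is missing is an assumption made for illustration; this is not xdist's own detection code.

```python
import os


def detect_cores():
    """Return (logical, physical) core counts.

    `physical` is None when psutil is not installed (hypothetical fallback
    chosen for this sketch) or when psutil cannot determine the value.
    """
    logical = os.cpu_count()
    try:
        import psutil  # optional dependency, added to the dev group by this PR
    except ImportError:
        return logical, None
    return logical, psutil.cpu_count(logical=False)


logical, physical = detect_cores()
print(f"logical cores: {logical}, physical cores: {physical}")
```

On a machine with simultaneous multithreading the two numbers typically differ, which is why a worker count derived from logical cores can be higher than intended.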

Results

py3 runs:

Approach

Replace -n auto (and -n auto --maxprocesses N) with:

-n {env:PYTEST_XDIST_AUTO_NUM_WORKERS:6}

Tox resolves {env:VAR:default} at config-parse time from the host shell. CI workflows set the env var to auto; locally it falls back to 6.

| Scenario | Host env | Resolves to |
| --- | --- | --- |
| Local (default) | unset | -n 6 |
| Local (override) | PYTEST_XDIST_AUTO_NUM_WORKERS=3 | -n 3 |
| CI | PYTEST_XDIST_AUTO_NUM_WORKERS=auto | -n auto |
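
The three scenarios can be reproduced with a small Python sketch that mimics the `{env:PYTEST_XDIST_AUTO_NUM_WORKERS:6}` substitution. The function `resolve_xdist_args` is a hypothetical stand-in for tox's substitution layer, not tox code, and it only models the set/unset cases listed in the table.

```python
import os

VAR = "PYTEST_XDIST_AUTO_NUM_WORKERS"


def resolve_xdist_args(environ=None, default="6"):
    """Mimic `-n {env:PYTEST_XDIST_AUTO_NUM_WORKERS:6}` (illustrative only).

    Uses the host value when the variable is set, else the default.
    """
    if environ is None:
        environ = os.environ
    return ["-n", environ.get(VAR, default)]


scenarios = {
    "Local (default)": {},
    "Local (override)": {VAR: "3"},
    "CI": {VAR: "auto"},
}
for name, env in scenarios.items():
    print(name, "->", " ".join(resolve_xdist_args(env)))
```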

Why not passenv + PYTEST_XDIST_AUTO_NUM_WORKERS directly?

xdist reads PYTEST_XDIST_AUTO_NUM_WORKERS from the child process environment to resolve -n auto to a worker count and requires an integer. Passing auto through passenv would cause xdist to crash on int("auto"). By resolving the env var in tox's own substitution layer, the child process only ever sees -n <int> or -n auto (with the env var absent), so xdist never encounters a non-integer value.
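
A rough sketch of that separation, under the assumptions stated in the paragraph above: the host value is substituted into `-n`, and the variable itself is not forwarded to the child process. Both `child_invocation` and `is_integer` are hypothetical helpers; how xdist reacts to a non-integer value may depend on its version, so the sketch only demonstrates that `int("auto")` cannot succeed.

```python
VAR = "PYTEST_XDIST_AUTO_NUM_WORKERS"


def is_integer(text):
    """Return True if int(text) succeeds, False if it raises ValueError."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def child_invocation(host_env, default="6"):
    """Model the PR's approach (illustrative, not tox internals).

    The host value ends up in the `-n` argument; the variable is dropped
    from the child environment because it is not listed in passenv.
    """
    workers = host_env.get(VAR, default)
    argv = ["pytest", "-n", workers]
    child_env = {k: v for k, v in host_env.items() if k != VAR}
    return argv, child_env


argv, child_env = child_invocation({VAR: "auto", "HOME": "/home/dev"})
print(argv)                 # ['pytest', '-n', 'auto']
print(VAR in child_env)     # False
print(is_integer("auto"))   # False
```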

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

Cute Animal Picture

🦖

@codecov

codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.07%. Comparing base (342c7bc) to head (8cc9f9a).
⚠️ Report is 20 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2120   +/-   ##
================================================
  Coverage            86.07%   86.07%           
================================================
  Files                  599      599           
  Lines                39472    39472           
  Branches              3780     3780           
================================================
  Hits                 33977    33977           
  Misses                4862     4862           
  Partials               633      633           
Flag Coverage Δ
unittests 86.07% <ø> (ø)

@danceratopz danceratopz changed the title from "chore(tooling,ci): remove maxprocesses for py3 tox env" to "chore(tooling,ci): throttle py3 xdist workers locally, use all cores in CI" Feb 3, 2026
@danceratopz danceratopz added A-tooling Area: Improvements or changes to auxiliary tooling such as uv, ruff, mypy, ... A-ci Area: Continuous Integration labels Feb 3, 2026
@danceratopz danceratopz marked this pull request as ready for review February 3, 2026 17:07
@danceratopz danceratopz requested a review from fselmo February 3, 2026 17:07
@SamWilsn
Contributor

SamWilsn commented Feb 3, 2026

> avoid deadlocks when xdist spawns a worker per core

How does limiting the number of workers avoid a deadlock?

@SamWilsn
Contributor

SamWilsn commented Feb 3, 2026

I'm thinking we maybe combine this with #1393? It should be possible with pytest_xdist_auto_num_workers to:

  • Read numbers of cores
  • Read total system memory
  • Determine if CPython or PyPy

And come up with a reasonable number of cores.
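
A minimal sketch of what such a hook could look like in a `conftest.py`, assuming psutil is installed. `pytest_xdist_auto_num_workers` is the xdist hook named above; the helper `suggest_worker_count` and the per-worker memory budgets (2 GiB for CPython, 4 GiB for PyPy) are hypothetical placeholders, not measured values.

```python
import os
import platform

GIB = 1024 ** 3


def suggest_worker_count(physical_cores, total_memory_bytes, implementation):
    """Cap the worker count by both core count and available memory.

    The per-worker memory budgets are made-up numbers for illustration.
    """
    per_worker = 4 * GIB if implementation == "PyPy" else 2 * GIB
    by_memory = total_memory_bytes // per_worker
    return max(1, min(physical_cores, by_memory))


def pytest_xdist_auto_num_workers(config):
    """xdist hook consulted when resolving `-n auto`."""
    try:
        import psutil
    except ImportError:
        return None  # no result, so other implementations of the hook decide
    cores = psutil.cpu_count(logical=False) or os.cpu_count() or 1
    total = psutil.virtual_memory().total
    return suggest_worker_count(cores, total, platform.python_implementation())


print(suggest_worker_count(16, 8 * GIB, "CPython"))  # 4
```

Keeping the arithmetic in a separate pure function makes the heuristic easy to adjust and test without touching the hook itself.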

@danceratopz
Member Author

danceratopz commented Feb 5, 2026

> > avoid deadlocks when xdist spawns a worker per core
>
> How does limiting the number of workers avoid a deadlock?

Deadlock was a poor choice of wording! What I mean is: Running -n auto leads to full load, which makes your (local) machine unresponsive until pytest finishes.

Updated the description! :)

@danceratopz danceratopz force-pushed the tox-py-remove-maxprocesses branch from 9dde841 to 2b4c639 Compare February 5, 2026 10:01
@danceratopz
Member Author

danceratopz commented Feb 5, 2026

> I'm thinking we maybe combine this with #1393?

I took a look at #1393 and commented there: #1393 (comment)

Based on that, I've just updated this PR to:

  • Add psutil as a dev dep,

> It should be possible with pytest_xdist_auto_num_workers to:
>
>   • Read numbers of cores
>   • Read total system memory
>   • Determine if CPython or PyPy
>
> And come up with a reasonable number of cores.

This is a great idea! This would def help future-proof our setup for CI runner changes. I'd prefer to defer this to a follow-up issue though (tbh, I need to move on in terms of priorities and patience), but happy if someone else picks it up!

I'd just add for the follow-up issue, that this calculation probably won't be set in stone, as new optimizations will impact the memory footprint of generating tests.

For now, this PR means:

  • In CI:
    • We accurately detect the number of physical cores and don't increase the count unintentionally to match the number of logical cores.
    • We use the maximum number of physical cores with -n auto; this might be too memory hungry in the future.
  • For local development:
    • Defaults to 6.
    • Allows devs to override to their preferred value via the PYTEST_XDIST_AUTO_NUM_WORKERS env var.

@danceratopz danceratopz added C-chore Category: chore A-deps Area: Dependencies—Stuff we build on top of (eg. `uv.lock`, `pyproject.toml`) labels Feb 5, 2026
Contributor

@spencer-tb spencer-tb left a comment

LGTM for now! I noticed tests_pytest_py3 has --dist=loadgroup but py3 doesn't

Replace `-n auto` with `-n {env:PYTEST_XDIST_AUTO_NUM_WORKERS:6}` in
py3 and tests_pytest_py3 tox envs. Defaults to 6 workers locally to
avoid deadlocks; CI sets the env var to `auto` for full core usage.
Removes the now-redundant `--maxprocesses` flag.
Without `psutil` pytest-xdist reports the number of logical cores.
@danceratopz danceratopz force-pushed the tox-py-remove-maxprocesses branch from 39a6ae8 to 8cc9f9a Compare February 9, 2026 03:57
@danceratopz
Member Author

Rebased on forks/amsterdam to get the changes from #2116.

> LGTM for now! I noticed tests_pytest_py3 has --dist=loadgroup but py3 doesn't

Thanks for flagging! It's the other way round (py3 uses loadgroup). I added an inline comment explaining why for now:
https://github.com/danceratopz/execution-specs/blob/8cc9f9a1f5765f45ccab81accdfaf35cb0c20cee/tox.ini#L90-L95

We'll likely revisit this flag very soon, since we may also want to enable --dist=worksteal for better load balancing across workers.

@danceratopz
Member Author

danceratopz commented Feb 9, 2026

@SamWilsn I made a follow-up issue to not forget your idea [above](https://github.com//pull/2120#issuecomment-3843476312): - https://github.com//issues/2163

Ah, #1393 is already exactly this issue. I hadn't understood that from the description! I added your comment to #1393's description to make it clearer.

Contributor

@fselmo fselmo left a comment

It seems like this helps those of us who run these locally without an additional -n flag. I think this opens up some flexibility which is nice. Approving based on the other support here for this feature 👍🏼

@fselmo fselmo merged commit 8e276dc into ethereum:forks/amsterdam Feb 12, 2026
20 checks passed

Labels

A-ci Area: Continuous Integration A-deps Area: Dependencies—Stuff we build on top of (eg. `uv.lock`, `pyproject.toml`) A-tooling Area: Improvements or changes to auxiliary tooling such as uv, ruff, mypy, ... C-chore Category: chore
