
[CORE-14786] rptest: speedup data_stat#30329

Merged
WillemKauf merged 1 commit intoredpanda-data:devfrom
WillemKauf:stat_fix
Apr 29, 2026

@WillemKauf
Contributor

@WillemKauf WillemKauf commented Apr 28, 2026

Currently, we use:

```
f"find {RedpandaService.DATA_DIR} -type f -exec stat -c '%n %s' '{{}}' \\;"
```

as a way to stat all of the files in a `redpanda` directory on a node.
This is slow: with `-exec stat ... \;`, `find` spawns a separate `stat` process for every file.
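Because the command is a Python f-string, some characters are escapes: `{{}}` renders as `{}` (the placeholder `find` replaces with each path) and `\\;` renders as `\;` (the escaped `;` that ends `-exec` and triggers one `stat` run per file). A minimal sketch of the rendered shell command, assuming a hypothetical data directory of `/var/lib/redpanda/data` (the real value of `RedpandaService.DATA_DIR` may differ):

```python
# Hypothetical stand-in for RedpandaService.DATA_DIR; the path is an assumption.
DATA_DIR = "/var/lib/redpanda/data"

# `{{}}` becomes `{}` and `\\;` becomes `\;` once the f-string is rendered,
# so the shell sees: ... -exec stat -c '%n %s' '{}' \;
per_file_cmd = f"find {DATA_DIR} -type f -exec stat -c '%n %s' '{{}}' \\;"
print(per_file_cmd)
```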

We can do better simply by changing this command to:

```
f"find {RedpandaService.DATA_DIR} -type f -exec stat -c '%n %s' '{{}}' +"
```

With `+`, `find` batches as many paths as possible (up to the system's argument-length limit) into each `stat` call.
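A rough Python model of the batching that `+` enables may help show why the process count collapses. It is a simplified sketch, not `find`'s actual algorithm: `batch_paths` is a hypothetical helper that assumes each argument costs its length plus one NUL byte and uses a small, illustrative `max_bytes` limit, whereas GNU `find` derives its limit from the system's `ARG_MAX` and the environment size.

```python
from typing import Iterable, List


def batch_paths(paths: Iterable[str], base_cmd: List[str], max_bytes: int) -> List[List[str]]:
    """Group paths into argv lists whose total size stays within max_bytes.

    Simplified model of `find ... -exec cmd {} +`: every argument costs
    len(arg) + 1 bytes (its NUL terminator). A single path that alone exceeds
    the limit still gets its own batch.
    """
    base_cost = sum(len(a) + 1 for a in base_cmd)
    batches: List[List[str]] = []
    current: List[str] = []
    used = base_cost
    for p in paths:
        cost = len(p) + 1
        # Start a new invocation when this path would overflow the current one.
        if current and used + cost > max_bytes:
            batches.append(base_cmd + current)
            current, used = [], base_cost
        current.append(p)
        used += cost
    if current:
        batches.append(base_cmd + current)
    return batches


# Hypothetical segment paths; `\;` would spawn one stat per path instead.
paths = [f"/data/seg-{i:04d}.log" for i in range(1000)]
batches = batch_paths(paths, ["stat", "-c", "%n %s"], max_bytes=4096)
print(len(batches), "stat invocations instead of", len(paths))
```

Even with this deliberately small limit, a thousand files need only a handful of `stat` processes; real limits are usually much larger, so batches are bigger still.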
However, we can do even better by not using `-exec` at all:

```
f"find {RedpandaService.DATA_DIR} -ignore_readdir_race -type f -printf '%p %s\\n'"
```

Here, we don't fork any external processes at all; everything is handled natively in `find`. Errors caused by files being removed concurrently (between `find` reading a directory and examining one of its entries) are silenced with the `-ignore_readdir_race` flag.
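For comparison, a minimal pure-Python sketch of what the `-printf` variant computes: a mapping from each regular file to its size in bytes, gathered without spawning any child processes and tolerant of files that disappear mid-walk. `data_sizes` is a hypothetical helper, not part of rptest; it mimics, but does not exactly reproduce, `find`'s behaviour (for example, traversal order may differ).

```python
import os
import stat
from typing import Dict


def data_sizes(root: str) -> Dict[str, int]:
    """Map each regular file under root to its size in bytes.

    Rough analogue of `find ROOT -ignore_readdir_race -type f -printf '%p %s\\n'`.
    """
    sizes: Dict[str, int] = {}
    # os.walk's default onerror=None silently skips directories that vanish or
    # cannot be listed; followlinks=False matches find's default -P behaviour.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not counted as files.
                st = os.lstat(path)
            except FileNotFoundError:
                # Removed after the directory was read: skip it, analogous to
                # -ignore_readdir_race suppressing the error.
                continue
            # os.walk lists symlinks to files among filenames; keep only
            # regular files, like -type f.
            if stat.S_ISREG(st.st_mode):
                sizes[path] = st.st_size
    return sizes
```

Printing `f"{path} {size}"` for each entry reproduces the `%p %s` line format that `find` emits.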

This should fix CI timeouts in tests with large numbers of files, where 20+ minutes are spent waiting on these `stat` commands.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@WillemKauf
Contributor Author

WillemKauf commented Apr 28, 2026

For example, comparing the three options in my redpanda/src/v directory:

```
willem@bloom:~/redpanda/src/v$ time find -type f -exec stat -c '%n %s' '{}' \; > /dev/null

real	0m2.264s
user	0m1.287s
sys	0m0.972s
willem@bloom:~/redpanda/src/v$ time find  -type f -exec stat -c '%n %s' '{}' + > /dev/null

real	0m0.013s
user	0m0.001s
sys	0m0.011s
willem@bloom:~/redpanda/src/v$ time find -type f -printf '%p %s\n' > /dev/null

real	0m0.011s
user	0m0.001s
sys	0m0.010s
```

And in a directory containing 1M files spread across 100 subdirectories:

```
willem@bloom:/tmp/1m$ time find -type f -exec stat -c '%n %s' '{}' \; > /dev/null

real	9m47.619s
user	5m12.542s
sys	4m33.323s
willem@bloom:/tmp/1m$ time find -type f -exec stat -c '%n %s' '{}' + > /dev/null

real	0m1.571s
user	0m0.302s
sys	0m1.262s
willem@bloom:/tmp/1m$ time find -type f -printf '%p %s\n' > /dev/null

real	0m1.014s
user	0m0.191s
sys	0m0.819s
```
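To put the 1M-file numbers in perspective, a small calculation over the reported `real` times: batching with `+` comes out roughly 374x faster than one `stat` per file, and `-printf` roughly 580x. The per-file overhead line assumes exactly 1,000,000 files, which the benchmark description only states approximately.

```python
# Wall-clock ("real") times from the 1M-file benchmark above, in seconds.
timings = {
    "-exec stat ... \\;": 9 * 60 + 47.619,
    "-exec stat ... +": 1.571,
    "-printf": 1.014,
}

baseline = timings["-exec stat ... \\;"]
for name, secs in timings.items():
    # Speedup relative to spawning one stat process per file.
    print(f"{name:<20} {secs:9.3f}s  speedup x{baseline / secs:7.1f}")

# Assumption: exactly 1,000,000 files, so this is the average cost per file
# of the one-stat-per-file variant.
print(f"per-file overhead: {baseline / 1_000_000 * 1e3:.3f} ms")
```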

Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes RedpandaService.data_stat() in the rptest harness by avoiding per-file stat subprocess invocations, aiming to prevent CI timeouts when many files exist under the Redpanda data directory.

Changes:

  • Replace find ... -exec stat ... \; with a find ... -printf ... approach to avoid forking stat for each file.
  • Update inline commentary to reflect how transient filesystem races appear in output.

Comment thread on `tests/rptest/services/redpanda.py` (outdated)
Currently, we use:

```
f"find {RedpandaService.DATA_DIR} -type f -exec stat -c '%n %s' '{{}}' \\;"
```

as a way to stat all of the files in a `redpanda` directory on a node.
This is bad: with `-exec stat ... \;`, `find` runs one `stat` process per file.
We can do better simply by changing this command instead to:

```
f"find {RedpandaService.DATA_DIR} -type f -exec stat -c '%n %s' '{{}}' +"
```
Where `find` will batch as many files as possible to each `stat` call.
However, we can do even better by not using `-exec` at all:

```
f"find {RedpandaService.DATA_DIR} -ignore_readdir_race -type f -printf '%p %s\\n'"
```

Here, we don't need to fork any extra processes at all, and everything is
handled natively in `find`. Concurrent removal of files is silenced with
the flag `-ignore_readdir_race`.

This should fix timeouts in CI for tests with large numbers of files,
where 20+ minutes is spent waiting on these stat commands.
@vbotbuildovich
Collaborator

Retry command for Build#83784

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/write_caching_fi_e2e_test.py::WriteCachingFailureInjectionE2ETest.test_crash_all@{"use_transactions":false}

@vbotbuildovich
Collaborator

CI test results

test results on build#83784
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FLAKY(FAIL) | WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/83784#019dd62d-95fe-4808-950f-f54e5ad1a4f7 | 14/21 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0941, p0=0.0084, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |

@WillemKauf WillemKauf requested a review from oleiman April 29, 2026 02:15
Member

@oleiman oleiman left a comment


🥳

@nvartolomei
Contributor

please backport to 26.1 and 25.3 to which we still frequently backport changes

@WillemKauf
Contributor Author

> please backport to 26.1 and 25.3 to which we still frequently backport changes

Ack!

@WillemKauf WillemKauf merged commit 2d68eee into redpanda-data:dev Apr 29, 2026
23 checks passed
@vbotbuildovich
Collaborator

/backport v26.1.x

@vbotbuildovich
Collaborator

/backport v25.3.x


Labels

None yet

Projects

None yet


5 participants