
gruuya commented on Mar 27, 2024

Which issue does this PR close?

Progresses #5504.

Rationale for this change

Make run-to-run variance a smaller component of the measured benchmark times by using a larger scale factor (SF), and thus reduce noise and false-positive regressions.
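One common way to quantify how large a component the variance is: the coefficient of variation (sample standard deviation divided by the mean) of repeated timings of the same query. The `relative_noise` helper below is a hypothetical sketch, not part of `bench.sh`; it assumes the timings, in milliseconds, are passed as arguments.

```shell
# Hypothetical helper (not part of bench.sh): prints the coefficient of
# variation, in percent, of repeated timings (in ms) of the same query.
# Usage: relative_noise 90 100 110
relative_noise() {
  printf '%s\n' "$@" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
      mean = sum / n
      # Sample variance, using the n - 1 denominator
      var = (sumsq - n * mean * mean) / (n - 1)
      if (var < 0) var = 0  # guard against tiny negative rounding errors
      printf "%.2f\n", 100 * sqrt(var) / mean
    }'
}
```

With the same spread of 10 ms, `relative_noise 90 100 110` prints `10.00` while `relative_noise 990 1000 1010` prints `1.00`. If the absolute jitter stays roughly constant as queries get longer (an assumption, not a measurement from this PR), a larger SF shrinks the relative noise accordingly.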

What changes are included in this PR?

Run TPC-H at SF 10 in the PR benchmarks as well.

Are these changes tested?

Are there any user-facing changes?

- # Setup the TPC-H data set with a scale factor of 10
+ # Setup the TPC-H data sets for scale factors 1 and 10
  ./bench.sh data tpch
Contributor

Does it make sense to do SF=1?

Contributor Author

Fair point; I guess it can still be beneficial for detecting small systemic or non-linear regressions that are larger than the noise level, but smaller than the sensitivity of SF 10?
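The trade-off above can be made concrete with a simple rule: report a change as a regression only when the relative slowdown exceeds the measured noise level. The `classify_change` helper and the percentage threshold below are hypothetical, a sketch of the idea rather than how DataFusion's benchmark comparison actually decides.

```shell
# Hypothetical helper: classify a change in query time against a noise threshold.
# Usage: classify_change <baseline_ms> <candidate_ms> <noise_pct>
classify_change() {
  awk -v base="$1" -v cand="$2" -v noise="$3" 'BEGIN {
    noise += 0                            # force numeric comparison
    change = 100 * (cand - base) / base   # relative change in percent
    if (change > noise)       print "regression"
    else if (change < -noise) print "improvement"
    else                      print "within noise"
  }'
}
```

With a 5% noise threshold, `classify_change 1000 1030 5` prints `within noise`; the same 3% slowdown is reported as a regression once a lower-noise configuration allows a 2% threshold (`classify_change 1000 1030 2`).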

cd benchmarks
./bench.sh run tpch
./bench.sh run tpch10
Contributor

Perhaps we could run tpch10_mem as well if it doesn't run out of memory (OOM), or tpch_mem otherwise, which should have less variance.

Contributor Author

Yeah I can add tpch10_mem as well.

Contributor Author

OK, I've added both tpch_mem10 and tpch_mem, so that we can observe and compare the noise level of each one.

Also distinguish the output file by the SF used.
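With these additions the PR benchmarks cover two scale factors, each in its default mode and in its `_mem` variant (which, per the discussion, should have less variance). The sketch below is a dry run that only prints the corresponding `./bench.sh run` invocations. The `bench_name` and `result_file` helpers and the `tpch_sf<SF>.json` / `tpch_mem_sf<SF>.json` file names are hypothetical, chosen to illustrate keeping results for different SFs apart; they are not necessarily what `bench.sh` writes.

```shell
# Hypothetical dry run: print the bench.sh invocations for the PR benchmark matrix.
# Benchmark names are the ones used in this thread:
#   SF 1: tpch, tpch_mem    SF 10: tpch10, tpch_mem10
bench_name() {
  local sf="$1" mode="$2"   # mode is "default" or "mem"
  local suffix=""
  if [ "$sf" != "1" ]; then suffix="$sf"; fi
  if [ "$mode" = "mem" ]; then
    echo "tpch_mem${suffix}"
  else
    echo "tpch${suffix}"
  fi
}

# Assumed, illustrative naming scheme that keeps the results for each SF separate.
result_file() {
  local sf="$1" mode="$2"
  if [ "$mode" = "mem" ]; then
    echo "tpch_mem_sf${sf}.json"
  else
    echo "tpch_sf${sf}.json"
  fi
}

for sf in 1 10; do
  for mode in default mem; do
    echo "./bench.sh run $(bench_name "$sf" "$mode")  (results: $(result_file "$sf" "$mode"))"
  done
done
```

Deriving the file name from the SF means the SF 1 and SF 10 results can no longer overwrite each other when both run in the same workflow.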
Dandandan merged commit 7f4b338 into apache:main on Mar 27, 2024
gruuya deleted the pr-bench-tpch-sf10 branch on March 28, 2024 07:49
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Apr 1, 2024
* Run TPC-H SF10 during PR benchmarks

* Add memory benchmarks to the workflow

Also distinguish the output file by the SF used.