GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet#15101

Merged
pitrou merged 4 commits into apache:master from wjones127:feat/parquet-string-bench on Jan 5, 2023

Conversation

@wjones127 wjones127 commented Dec 27, 2022

@wjones127 wjones127 changed the title from "GH-15100: [C++][Parquet] Add benchmark for reading strings from Parque" to "GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet" on Dec 27, 2022

@wjones127
Member Author

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

@wjones127 wjones127 marked this pull request as ready for review December 28, 2022 16:10
@wjones127
Member Author

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

@wjones127
Member Author

@ursabot please benchmark

ursabot commented Dec 30, 2022

Benchmark runs are scheduled for baseline = 6236dba and contender = 3c02495. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️2.04% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.81% ⬆️0.14%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3c02495f ec2-t3-xlarge-us-east-2
[Failed] 3c02495f test-mac-arm
[Finished] 3c02495f ursa-i9-9960x
[Finished] 3c02495f ursa-thinkcentre-m75q
[Finished] 6236dbac ec2-t3-xlarge-us-east-2
[Failed] 6236dbac test-mac-arm
[Finished] 6236dbac ursa-i9-9960x
[Finished] 6236dbac ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Dec 30, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

::arrow::schema({::arrow::field("column", type, null_percentage > 0)}), {arr});
}

static void BM_WriteBinaryColumn(::benchmark::State& state) {
Member

Does it use the PLAIN encoding? Add a comment?

Member Author

I added a comment near the parameters of each benchmark explaining that we use unique_values to trigger the code paths for dictionary and plain encodings. I tried to add a check within the benchmark to validate that we get the expected encodings, but it turned out to be too complicated: the encodings can change from page to page, and they also apply to the definition and repetition levels (IIUC).
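
To illustrate why the number of unique values steers the encoding, here is a minimal standard-library sketch. It is not code from this PR and does not call Arrow or Parquet. `MakeStringColumn`, `EstimateDictionaryBytes`, `LikelyStaysDictionaryEncoded`, and the 1 MiB limit are all hypothetical names and values chosen for illustration. The assumption is that a writer keeps dictionary encoding only while the dictionary (each distinct value stored once with a 4-byte length prefix) stays under a size limit, and falls back to plain encoding otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from the PR: builds `num_rows` strings that
// cycle through `unique_values` distinct values, padded to `value_length` bytes.
// A real benchmark would more likely draw the values at random.
std::vector<std::string> MakeStringColumn(std::size_t num_rows,
                                          std::size_t unique_values,
                                          std::size_t value_length) {
  assert(unique_values > 0);
  std::vector<std::string> column;
  column.reserve(num_rows);
  for (std::size_t i = 0; i < num_rows; ++i) {
    // The decimal index keeps values distinct; the 'x' padding sets the length.
    std::string value = std::to_string(i % unique_values);
    if (value.size() < value_length) {
      value.append(value_length - value.size(), 'x');
    }
    column.push_back(std::move(value));
  }
  return column;
}

// Bytes needed to store every distinct value once, each with a 4-byte length
// prefix followed by its raw bytes.
std::int64_t EstimateDictionaryBytes(const std::vector<std::string>& column) {
  std::unordered_set<std::string> distinct(column.begin(), column.end());
  std::int64_t total = 0;
  for (const std::string& value : distinct) {
    total += 4 + static_cast<std::int64_t>(value.size());
  }
  return total;
}

// Assumed dictionary size limit (1 MiB); the actual writer setting may differ.
constexpr std::int64_t kAssumedDictionaryPageLimit = 1024 * 1024;

// Rough model: dictionary encoding survives only while the dictionary stays
// below the limit; otherwise the writer is assumed to fall back to plain.
bool LikelyStaysDictionaryEncoded(
    const std::vector<std::string>& column,
    std::int64_t limit = kAssumedDictionaryPageLimit) {
  return EstimateDictionaryBytes(column) < limit;
}
```

Under these assumptions, `MakeStringColumn(10000, 8, 16)` yields a 160-byte dictionary and would stay dictionary encoded, while `MakeStringColumn(100000, 100000, 16)` yields about 2 MB of distinct values and would exceed the assumed limit. This only models the column as a whole; as noted above, real encodings can differ from page to page.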

Member

I see. Can you just confirm that the expected encodings are used (and add a comment)?

Member

Just saw the comment below, sorry. Please disregard. :-)

@wjones127 wjones127 requested a review from pitrou January 4, 2023 21:13
@pitrou pitrou merged commit 040310f into apache:master Jan 5, 2023
EpsilonPrime pushed a commit to EpsilonPrime/arrow that referenced this pull request Jan 5, 2023
… Parquet (apache#15101)

* Closes: apache#15100

Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

ursabot commented Jan 5, 2023

Benchmark runs are scheduled for baseline = 25b5093 and contender = 040310f. 040310f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️8.15% ⬆️6.76%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.47% ⬆️0.17%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 040310fe ec2-t3-xlarge-us-east-2
[Failed] 040310fe test-mac-arm
[Finished] 040310fe ursa-i9-9960x
[Finished] 040310fe ursa-thinkcentre-m75q
[Finished] 25b50932 ec2-t3-xlarge-us-east-2
[Failed] 25b50932 test-mac-arm
[Finished] 25b50932 ursa-i9-9960x
[Finished] 25b50932 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Jan 5, 2023

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

ursabot commented Jan 6, 2023

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm

Development

Successfully merging this pull request may close these issues.

Add benchmarks for reading and writing strings

3 participants