Add bench for data page statistics parquet extraction #10950
alamb merged 4 commits into apache:main
@alamb
I think that makes sense to me. The other thing we could do is somehow ignore those columns until the support has been added.
alamb
left a comment
Looks good to me -- thank you @marvinlanhenke
Yes, but this would require another PR to revert those changes. I think adding the bench is not urgent, so merging once the other data types are ready might be the easiest thing to do. I've also addressed your other comments; it should be fine now. Thanks for the review.
Thanks again @marvinlanhenke -- this PR looks good to me. Per your suggestion, let's wait until the required type support has been added.
efredine
left a comment
I checked this out, merged main, and it now runs without errors, so it should be safe to merge.
Thanks for checking this out @efredine -- I merged up from main and updated this PR to get it moving. Once it passes CI, I think it is good to go.
Thanks again @marvinlanhenke. Like @efredine, I verified the benchmark now works without error:
cargo bench --bench parquet_statistic
Compiling bigdecimal v0.4.1
Compiling datafusion v39.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
Finished `bench` profile [optimized] target(s) in 1m 34s
Running benches/parquet_statistic.rs (target/release/deps/parquet_statistic-c6fce472dea5abe8)
Gnuplot not found, using plotters backend
Extract row group statistics for Int64/extract_statistics/Int64
time: [594.98 ns 596.23 ns 597.66 ns]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
Extract data page statistics for Int64/extract_statistics/Int64
time: [6.5665 µs 6.5848 µs 6.6047 µs]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Extract row group statistics for UInt64/extract_statistics/UInt64
time: [576.78 ns 578.78 ns 581.09 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Extract data page statistics for UInt64/extract_statistics/UInt64
time: [6.8120 µs 6.8332 µs 6.8559 µs]
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
Extract row group statistics for F64/extract_statistics/F64
time: [588.96 ns 592.68 ns 596.62 ns]
Extract data page statistics for F64/extract_statistics/F64
time: [7.5959 µs 7.6334 µs 7.6650 µs]
Extract row group statistics for String/extract_statistics/String
time: [897.07 ns 901.70 ns 907.19 ns]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
Extract data page statistics for String/extract_statistics/String
time: [25.507 µs 25.555 µs 25.609 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
time: [947.78 ns 954.30 ns 960.82 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
7 (7.00%) low mild
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
time: [25.602 µs 25.812 µs 26.109 µs]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
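Criterion.rs prints each `time:` line as a confidence interval: lower bound, point estimate, upper bound. To compare the two extraction paths per type, the point estimates from the run above can be normalized to nanoseconds and divided. The following is a minimal std-only sketch, not part of this PR: the helpers `to_ns` and `parse_time_line` are hypothetical names introduced for illustration, and the input lines are copied by hand from the output rather than produced by running the bench.

```rust
// Hypothetical helper: converts one Criterion.rs duration (value + unit)
// to nanoseconds. Accepts both the micro sign (U+00B5) and Greek mu (U+03BC).
fn to_ns(value: &str, unit: &str) -> Option<f64> {
    let v: f64 = value.parse().ok()?;
    let scale = match unit {
        "ns" => 1.0,
        "\u{b5}s" | "\u{3bc}s" | "us" => 1_000.0,
        "ms" => 1_000_000.0,
        "s" => 1_000_000_000.0,
        _ => return None,
    };
    Some(v * scale)
}

// Hypothetical helper: parses "time: [lower estimate upper]" (each a value
// followed by a unit) into (lower, estimate, upper) in nanoseconds.
fn parse_time_line(line: &str) -> Option<(f64, f64, f64)> {
    let inner = line.trim().strip_prefix("time:")?.trim();
    let inner = inner.strip_prefix('[')?.strip_suffix(']')?;
    let tokens: Vec<&str> = inner.split_whitespace().collect();
    if tokens.len() != 6 {
        return None;
    }
    // Pair up (value, unit) tokens and convert each pair; fail if any pair is invalid.
    let ns: Vec<f64> = tokens
        .chunks(2)
        .map(|pair| to_ns(pair[0], pair[1]))
        .collect::<Option<Vec<f64>>>()?;
    Some((ns[0], ns[1], ns[2]))
}

fn main() {
    // (type, row group `time:` line, data page `time:` line), copied from the run above.
    let runs = [
        ("Int64", "time: [594.98 ns 596.23 ns 597.66 ns]", "time: [6.5665 µs 6.5848 µs 6.6047 µs]"),
        ("UInt64", "time: [576.78 ns 578.78 ns 581.09 ns]", "time: [6.8120 µs 6.8332 µs 6.8559 µs]"),
        ("F64", "time: [588.96 ns 592.68 ns 596.62 ns]", "time: [7.5959 µs 7.6334 µs 7.6650 µs]"),
        ("String", "time: [897.07 ns 901.70 ns 907.19 ns]", "time: [25.507 µs 25.555 µs 25.609 µs]"),
        ("Dictionary(Int32, String)", "time: [947.78 ns 954.30 ns 960.82 ns]", "time: [25.602 µs 25.812 µs 26.109 µs]"),
    ];
    for (ty, row_group, data_page) in runs {
        // Only the point estimate (middle value) is used for the ratio.
        let (_, rg, _) = parse_time_line(row_group).expect("row group line");
        let (_, dp, _) = parse_time_line(data_page).expect("data page line");
        println!("{ty:<28} data page / row group = {:.1}x", dp / rg);
    }
}
```

Comparing point estimates ignores the interval widths; a more careful comparison would also check that the intervals do not overlap, and these figures come from a single run on one machine.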
* feat: add data page bench
* chore: add comment
* fix: row_groups + shorten row_group_indices
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #10934.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?