GH-39122: [C++] [Parquet] FLBA reader reading preallocs and null count check #39120
Hattonuri wants to merge 1 commit into apache:main
… we don't have nulls
Thanks for the improvement! I think this PR is worth creating an issue for.

The code LGTM; I will approve it once an issue is created.

This looks ok to me. Note this could probably be improved still. The silly part in this code is that we are generating a bitmap in

Also, did you run any benchmarks?

Actually, you are right about the bitmap; I can now see that AppendToBitmap takes a few percent of the time.

LGTM, waiting for the bitmap part to be finished.

I submitted #39124 as an alternative.

I did a quick-and-dirty benchmark on PLAIN-encoded uncompressed FLBA columns:

Since #39124 is merged, would you mind closing this PR?


We can reserve memory before running the read loops.
We can also add a check for a zero null count, because this function can be called with no nulls even though its name says Spaced.
This code runs here if HasSpacedValues returns true for the column descriptor:
arrow/cpp/src/parquet/column_reader.cc
Lines 1194 to 1227 in ef3797d
But this function does not look at the actual values:
arrow/cpp/src/parquet/column_reader.cc
Lines 77 to 93 in ef3797d
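To illustrate the distinction drawn here, the following is a minimal sketch, not the code from column_reader.cc. It contrasts a schema-level check (can this column ever hold a null?) with a data-level check (does this batch hold one?). The `Node` struct and the functions `SchemaAllowsNulls` and `CountNulls` are hypothetical names invented for this example, and the null rule assumes a flat, non-repeated column where a slot is null when its definition level is below the maximum.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified schema node; this is not parquet::schema::Node.
struct Node {
  bool optional;
  const Node* parent;
};

// Schema-level question: could any value in this column be null?
// The answer depends only on the schema, never on the data being read.
bool SchemaAllowsNulls(const Node* leaf) {
  for (const Node* n = leaf; n != nullptr; n = n->parent) {
    if (n->optional) return true;
  }
  return false;
}

// Data-level question: how many slots in this batch are actually null?
// Assumes a flat (non-repeated) column, where a slot is null when its
// definition level is below the column's maximum definition level.
int64_t CountNulls(const std::vector<int16_t>& def_levels,
                   int16_t max_def_level) {
  int64_t null_count = 0;
  for (int16_t level : def_levels) {
    assert(level <= max_def_level);
    if (level < max_def_level) ++null_count;
  }
  return null_count;
}
```

An optional column can pass the first check while every batch read from it reports zero nulls from the second, which is the case the zero-null-count check targets.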
In my case I have many optional decimal fields, but they can be null only at the beginning and the end, and I think this scenario is not rare.
Currently, a flamegraph of my read looks like this (even though my schema does not contain only decimals, parsing them takes most of the time). This PR optimizes that part, the FLBA record reader appends.
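The two ideas in this description, reserving up front and skipping the spaced logic when the null count is zero, can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's diff: `FlbaColumn` and `AppendBatch` are hypothetical names, and `std::vector` stands in for the Arrow builder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a fixed-size binary builder; not Arrow's API.
struct FlbaColumn {
  std::vector<std::string> values;  // one entry per slot, empty for nulls
  std::vector<bool> valid;          // one validity flag per slot
  int64_t null_count = 0;
};

// `decoded` holds only the non-null values, in order. `valid_bits` has one
// flag per output slot. Both parameter names are illustrative assumptions.
void AppendBatch(const std::vector<std::string>& decoded,
                 const std::vector<bool>& valid_bits, int64_t null_count,
                 FlbaColumn* out) {
  const std::size_t num_slots = valid_bits.size();
  assert(decoded.size() + static_cast<std::size_t>(null_count) == num_slots);

  // Reserve once before the loop instead of growing on every append.
  out->values.reserve(out->values.size() + num_slots);
  out->valid.reserve(out->valid.size() + num_slots);

  if (null_count == 0) {
    // Fast path: the column is optional, but this batch has no nulls,
    // so the values can be appended densely without checking each slot.
    out->values.insert(out->values.end(), decoded.begin(), decoded.end());
    out->valid.insert(out->valid.end(), num_slots, true);
    return;
  }

  // Spaced path: consult the validity flag for every slot.
  std::size_t next = 0;
  for (std::size_t i = 0; i < num_slots; ++i) {
    if (valid_bits[i]) {
      out->values.push_back(decoded[next++]);
      out->valid.push_back(true);
    } else {
      out->values.emplace_back();  // placeholder for the null slot
      out->valid.push_back(false);
      ++out->null_count;
    }
  }
}
```

For a column whose nulls appear only at the beginning and the end, most batches in the middle would take the fast path in a design like this.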

What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?