GH-44345: [C++][Parquet] Add Decimal32/64 support to Parquet #47427
Pull Request Overview
This PR adds support for reading and writing Arrow Decimal32 and Decimal64 types in Parquet files. The implementation extends the existing Decimal128/256 support to include smaller decimal types, allowing for more efficient storage of decimal values with lower precision.
- Extends Parquet I/O to support Decimal32/64 alongside existing Decimal128/256 types
- Adds reader property for enabling smallest decimal type inference from Parquet
- Consolidates decimal serialization logic to support all decimal types uniformly
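To make the "more efficient storage" point concrete, here is a minimal sketch of the per-value width and maximum precision of each Arrow decimal type. The enum and function names are hypothetical and are not taken from the PR. The byte widths and precision limits (9, 18, 38, 76) are the ones Arrow documents for these types. The buffer size counts only the fixed-width values and ignores the validity bitmap.

```cpp
#include <cstddef>

// Hypothetical stand-in for the four Arrow decimal types (not an Arrow API).
enum class DecimalKind { kDecimal32, kDecimal64, kDecimal128, kDecimal256 };

// Bytes used by one value of each decimal type.
constexpr int ByteWidth(DecimalKind kind) {
  switch (kind) {
    case DecimalKind::kDecimal32: return 4;
    case DecimalKind::kDecimal64: return 8;
    case DecimalKind::kDecimal128: return 16;
    case DecimalKind::kDecimal256: return 32;
  }
  return 0;  // unreachable for valid enum values
}

// Largest number of decimal digits each type can represent.
constexpr int MaxPrecision(DecimalKind kind) {
  switch (kind) {
    case DecimalKind::kDecimal32: return 9;
    case DecimalKind::kDecimal64: return 18;
    case DecimalKind::kDecimal128: return 38;
    case DecimalKind::kDecimal256: return 76;
  }
  return 0;  // unreachable for valid enum values
}

// Size of the values buffer for `count` values (validity bitmap not included).
constexpr std::size_t ValueBufferBytes(DecimalKind kind, std::size_t count) {
  return static_cast<std::size_t>(ByteWidth(kind)) * count;
}

// One million decimal(7, 2) values: 4 MB as Decimal32 versus 16 MB as Decimal128.
static_assert(ValueBufferBytes(DecimalKind::kDecimal32, 1000000) == 4000000, "");
static_assert(ValueBufferBytes(DecimalKind::kDecimal128, 1000000) == 16000000, "");
```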
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cpp/src/parquet/properties.h | Adds smallest_decimal_enabled_ flag to ArrowReaderProperties for backward compatibility |
| cpp/src/parquet/column_writer.cc | Extends decimal serialization to support Decimal32/64 with unified template logic |
| cpp/src/parquet/arrow/test_util.h | Refactors decimal test utilities to be generic across all decimal types |
| cpp/src/parquet/arrow/schema_internal.h | Updates function signatures to accept ArrowReaderProperties parameter |
| cpp/src/parquet/arrow/schema_internal.cc | Implements smallest decimal type selection logic using new reader property |
| cpp/src/parquet/arrow/schema.cc | Adds Decimal32/64 cases to schema conversion and metadata restoration |
| cpp/src/parquet/arrow/reader_internal.cc | Extends decimal reading logic to support all decimal types through generic templates |
| cpp/src/parquet/arrow/arrow_reader_writer_test.cc | Adds comprehensive test coverage for Decimal32/64 roundtrip scenarios |
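As a rough illustration of why precision decides which Parquet physical types can hold a decimal, the sketch below follows the Parquet format's decimal rules: INT32 is allowed for precision up to 9, INT64 for precision up to 18, and a byte array needs enough bytes for the signed two's-complement unscaled value. The helper names are hypothetical and this is not the code in column_writer.cc. Which physical type a writer actually picks also depends on its settings.

```cpp
#include <cmath>

// Hypothetical helper: smallest number of bytes whose signed two's-complement
// range covers every unscaled value of a decimal with `precision` digits.
// Requires 10^precision <= 2^(8 * n - 1), solved for n.
int MinDecimalBytes(int precision) {
  const double magnitude_bits = precision * std::log2(10.0);
  return static_cast<int>(std::ceil((magnitude_bits + 1.0) / 8.0));
}

// Parquet format rule: DECIMAL may annotate INT32 for precision 1 to 9.
bool FitsInInt32(int precision) { return precision >= 1 && precision <= 9; }

// Parquet format rule: DECIMAL may annotate INT64 for precision 1 to 18.
bool FitsInInt64(int precision) { return precision >= 1 && precision <= 18; }
```

Under these rules, the maximum precisions of Decimal32 (9) and Decimal64 (18) line up with the 4-byte and 8-byte integer types, which is why the small decimals map naturally onto INT32 and INT64.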
wgtmac
left a comment
+1
Left some nits. Thanks @HuaHuaY!
pitrou
left a comment
Thanks for doing this, @HuaHuaY. Can we also update https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types ?
Is this web page generated by |
Branch updated from adad99a to 531679a.
All CI failures are unrelated and share the same cause. I think it is ready to merge. Do you have more comments, @pitrou?
pitrou
left a comment
LGTM, but can you please rebase or merge from main to avoid the current CI failures @HuaHuaY ?
Branch updated from 8c9c949 to d8c5503.
I rebased the branch.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit a444380. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details.
Rationale for this change
As described in #44345, Decimal32/Decimal64 have been implemented, but Parquet has poor support for them. This change allows writing Decimal32/Decimal64 into a Parquet file the same way as Decimal128/Decimal256, and reading Decimal32/Decimal64 from an existing Parquet file.
What changes are included in this PR?
Write Decimal32/Decimal64 as INT32/INT64/BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY into a Parquet file.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. A flag named smallest_decimal_enabled_ is added in ArrowReaderProperties. To maintain backward compatibility, Arrow infers decimals with small precision as Decimal32/Decimal64 instead of Decimal128 only when the flag is true.
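The effect of the flag can be sketched as a small selection function. This is an illustration under stated assumptions, not the code in schema_internal.cc. The enum and function names are hypothetical, the thresholds are the maximum precisions of the Arrow decimal types (9, 18, 38), and the fallback to Decimal256 above precision 38 is assumed to match the reader's existing behavior.

```cpp
// Hypothetical stand-in for the four Arrow decimal types (not an Arrow API).
enum class DecimalKind { kDecimal32, kDecimal64, kDecimal128, kDecimal256 };

// Sketch of the inference rule described above. With the flag off, small
// precisions keep mapping to Decimal128, which preserves the old behavior.
constexpr DecimalKind InferDecimalKind(int precision, bool smallest_decimal_enabled) {
  if (smallest_decimal_enabled) {
    if (precision <= 9) return DecimalKind::kDecimal32;
    if (precision <= 18) return DecimalKind::kDecimal64;
  }
  // Assumed pre-existing rule: Decimal128 up to precision 38, else Decimal256.
  return precision <= 38 ? DecimalKind::kDecimal128 : DecimalKind::kDecimal256;
}

// Flag off: a decimal(7, 2) column is still read as Decimal128.
static_assert(InferDecimalKind(7, false) == DecimalKind::kDecimal128, "");
// Flag on: the same column is read as Decimal32.
static_assert(InferDecimalKind(7, true) == DecimalKind::kDecimal32, "");
```

Keeping the default off means existing readers see the same schema as before, and only callers that opt in get the narrower types.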