Contributor

@HuaHuaY HuaHuaY commented Aug 26, 2025

Rationale for this change

As described in #44345, Decimal32/Decimal64 are implemented in Arrow, but Parquet support for them is poor. This change allows writing Decimal32/Decimal64 to a Parquet file in the same way as Decimal128/Decimal256, and reading Decimal32/Decimal64 from an existing Parquet file.

What changes are included in this PR?

  1. Support writing Decimal32/Decimal64 as INT32/INT64/BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY to a Parquet file.
  2. Support reading Parquet columns with the Decimal logical type. The Arrow decimal type can either be read from the stored metadata or be inferred.
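
For readers unfamiliar with the byte-array encodings in item 1: the Parquet format stores a decimal in BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY as the big-endian two's-complement bytes of its unscaled value. The following is a minimal standalone sketch of that encoding for a 4-byte (Decimal32-sized) value. It is not the code from this PR, and the helper names `EncodeUnscaledBigEndian` and `DecodeUnscaledBigEndian` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not from the PR): encode the unscaled value of a
// Decimal32-sized decimal as 4 big-endian two's-complement bytes.
// Example: 123.45 with scale 2 has the unscaled value 12345.
std::array<uint8_t, 4> EncodeUnscaledBigEndian(int32_t unscaled) {
  // Reinterpret as unsigned so the shifts are well defined for negatives.
  const uint32_t bits = static_cast<uint32_t>(unscaled);
  return {static_cast<uint8_t>(bits >> 24), static_cast<uint8_t>(bits >> 16),
          static_cast<uint8_t>(bits >> 8), static_cast<uint8_t>(bits)};
}

// Hypothetical inverse: rebuild the unscaled value from the 4 bytes.
int32_t DecodeUnscaledBigEndian(const std::array<uint8_t, 4>& bytes) {
  const uint32_t bits = (static_cast<uint32_t>(bytes[0]) << 24) |
                        (static_cast<uint32_t>(bytes[1]) << 16) |
                        (static_cast<uint32_t>(bytes[2]) << 8) |
                        static_cast<uint32_t>(bytes[3]);
  // Modular conversion back to signed (guaranteed since C++20, and the
  // behavior of mainstream compilers before that).
  return static_cast<int32_t>(bits);
}
```

When the physical type is INT32 or INT64 instead, the unscaled value is stored directly as the integer, so no byte reordering of this kind is involved.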

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. A flag named smallest_decimal_enabled_ is added to ArrowReaderProperties. To maintain backward compatibility, Arrow infers Decimal32/Decimal64 instead of Decimal128 for decimals with small precision only when the flag is true.
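
The inference rule described here can be sketched as a standalone function. This is an illustration under stated assumptions, not the PR's implementation: the function name `InferArrowDecimalType` and the string return values are hypothetical, and the sketch assumes the choice depends only on precision. The precision limits (9, 18, 38 and 76 digits) are the maximum precisions of Arrow's Decimal32, Decimal64, Decimal128 and Decimal256 types.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the PR): pick the Arrow decimal type for a
// Parquet decimal column when the type is inferred rather than read from
// stored metadata. Assumes the choice depends only on precision.
std::string InferArrowDecimalType(int32_t precision,
                                  bool smallest_decimal_enabled) {
  if (precision < 1 || precision > 76) {
    throw std::invalid_argument("precision must be in [1, 76]");
  }
  if (smallest_decimal_enabled) {
    // Opt-in behavior: use the narrowest type that can hold the precision.
    if (precision <= 9) return "decimal32";
    if (precision <= 18) return "decimal64";
  }
  // Default (backward-compatible) behavior: never narrower than Decimal128.
  if (precision <= 38) return "decimal128";
  return "decimal256";
}
```

With the flag off, a column of precision 5 still maps to Decimal128 as before; with the flag on, the same column maps to Decimal32.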

Copilot AI review requested due to automatic review settings August 26, 2025 11:18
@HuaHuaY HuaHuaY requested a review from wgtmac as a code owner August 26, 2025 11:18

Copilot AI left a comment

Pull Request Overview

This PR adds support for reading and writing Arrow Decimal32 and Decimal64 types in Parquet files. The implementation extends the existing Decimal128/256 support to include smaller decimal types, allowing for more efficient storage of decimal values with lower precision.

  • Extends Parquet I/O to support Decimal32/64 alongside existing Decimal128/256 types
  • Adds reader property for enabling smallest decimal type inference from Parquet
  • Consolidates decimal serialization logic to support all decimal types uniformly

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cpp/src/parquet/properties.h | Adds smallest_decimal_enabled_ flag to ArrowReaderProperties for backward compatibility |
| cpp/src/parquet/column_writer.cc | Extends decimal serialization to support Decimal32/64 with unified template logic |
| cpp/src/parquet/arrow/test_util.h | Refactors decimal test utilities to be generic across all decimal types |
| cpp/src/parquet/arrow/schema_internal.h | Updates function signatures to accept ArrowReaderProperties parameter |
| cpp/src/parquet/arrow/schema_internal.cc | Implements smallest decimal type selection logic using new reader property |
| cpp/src/parquet/arrow/schema.cc | Adds Decimal32/64 cases to schema conversion and metadata restoration |
| cpp/src/parquet/arrow/reader_internal.cc | Extends decimal reading logic to support all decimal types through generic templates |
| cpp/src/parquet/arrow/arrow_reader_writer_test.cc | Adds comprehensive test coverage for Decimal32/64 roundtrip scenarios |

@HuaHuaY HuaHuaY changed the title from "GH-44345: [C++] arrow Decimal32/64 read/write parquet" to "GH-44345: [C++][Parquet] arrow Decimal32/64 read/write parquet" Aug 26, 2025
@github-actions github-actions bot added the awaiting committer review label and removed the awaiting review label Aug 26, 2025
Contributor Author

HuaHuaY commented Aug 27, 2025

@pitrou @mapleFU Could you spare some time to review this PR?

@wgtmac wgtmac changed the title from "GH-44345: [C++][Parquet] arrow Decimal32/64 read/write parquet" to "GH-44345: [C++][Parquet] Add Decimal32/64 support to Parquet" Aug 28, 2025
Member

@wgtmac wgtmac left a comment

+1

Left some nits. Thanks @HuaHuaY!

Member

@pitrou pitrou left a comment

Thanks for doing this @HuaHuaY . Can we also update https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types ?

Contributor Author

HuaHuaY commented Aug 28, 2025

> Can we also update https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types ?

Is this web page generated by docs/source/cpp/parquet.rst? I have pushed a commit to add Decimal32/Decimal64.

Member

wgtmac commented Sep 3, 2025

All CI failures are unrelated to this PR and share the same cause, shown below:

CMake Error at /opt/conda/envs/arrow/share/cmake-4.1/Modules/FindPackageHandleStandardArgs.cmake:227 (message):
  Could NOT find LLVMAlt (missing: LLVM_PACKAGE_VERSION CLANG_EXECUTABLE
  LLVM_FOUND LLVM_LINK_EXECUTABLE)

I think it is ready to merge. Do you have more comments? @pitrou

Member

@pitrou pitrou left a comment

LGTM, but can you please rebase or merge from main to avoid the current CI failures @HuaHuaY ?

Contributor Author

HuaHuaY commented Sep 3, 2025

I rebased the branch.

@pitrou pitrou merged commit a444380 into apache:main Sep 3, 2025
34 checks passed
@pitrou pitrou removed the awaiting committer review label Sep 3, 2025
@HuaHuaY HuaHuaY deleted the fix_gh_44345 branch September 3, 2025 08:44
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit a444380.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.
