GH-43640: [Go] allow setting Logical types for pqarrow file writer #43641
go/parquet/pqarrow/schema.go (outdated)
  }

- func fieldToNode(name string, field arrow.Field, props *parquet.WriterProperties, arrprops ArrowWriterProperties) (schema.Node, error) {
+ func fieldToNode(name string, field arrow.Field, props *parquet.WriterProperties, arrprops ArrowWriterProperties, customLogicalType *LogicalType) (schema.Node, error) {
Maybe instead of adding a function parameter here we could optionally specify the logical type in ColumnProperties which can be accessed per-column through the existing props parameter.
I think that would be confusing for the parquet APIs.
When writing parquet files, logical types should be specified right when creating the parquet schema.
This extra parameter is relevant only to pqarrow, so I think it should stay within the scope of pqarrow.
Perhaps I can pass it down via pqarrow.ArrowWriterProperties instead?

^ changed in ef9f2f6
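
For readers following the thread, here is a minimal sketch of the idea discussed above: carrying an optional per-field logical type override in writer properties that are scoped to the Arrow-to-Parquet layer. It is not the code from this PR. All types are local stand-ins for `schema.LogicalType` and `pqarrow.ArrowWriterProperties`, and the option name `WithCustomLogicalType` and the helper `logicalTypeFor` are hypothetical.

```go
package main

import "fmt"

// LogicalType is a local stand-in for the parquet schema.LogicalType interface.
type LogicalType interface {
	String() string
}

type uuidType struct{}

func (uuidType) String() string { return "UUID" }

type noneType struct{}

func (noneType) String() string { return "NONE" }

// ArrowWriterProperties is a local stand-in for pqarrow.ArrowWriterProperties.
// It keeps logical type overrides keyed by field name.
type ArrowWriterProperties struct {
	logicalTypes map[string]LogicalType
}

// WriterOption mutates the properties while they are being built.
type WriterOption func(*ArrowWriterProperties)

// WithCustomLogicalType is a hypothetical option that registers an override
// for a single field.
func WithCustomLogicalType(field string, lt LogicalType) WriterOption {
	return func(p *ArrowWriterProperties) { p.logicalTypes[field] = lt }
}

// NewArrowWriterProperties applies the options in order.
func NewArrowWriterProperties(opts ...WriterOption) ArrowWriterProperties {
	p := ArrowWriterProperties{logicalTypes: make(map[string]LogicalType)}
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

// logicalTypeFor returns the override for field if one was registered,
// otherwise the type derived from the Arrow field (passed as fallback).
func (p ArrowWriterProperties) logicalTypeFor(field string, fallback LogicalType) LogicalType {
	if lt, ok := p.logicalTypes[field]; ok {
		return lt
	}
	return fallback
}

func main() {
	props := NewArrowWriterProperties(WithCustomLogicalType("id", uuidType{}))
	fmt.Println(props.logicalTypeFor("id", noneType{}))
	fmt.Println(props.logicalTypeFor("name", noneType{}))
}
```

Keeping the override in the Arrow-level properties leaves the core Parquet writer properties untouched, which matches the scoping concern raised in the reply above.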
88796be to 8bd7407
8bd7407 to ef9f2f6
go/parquet/pqarrow/schema.go (outdated)
    if customLogicalType != nil {
        logicalType = customLogicalType.Type
        length = customLogicalType.Length
The length here should come from the physical layout. The LogicalType interface has a method, IsApplicable(t parquet.Type, tlen int32) bool, that can validate whether the underlying type is the right size for the logical type.
In practice this means you shouldn't need to specify the length here at all. The arrow types above that correspond to FixedLenByteArray should already be setting length to the appropriate value.
That raises another point. If schema.LogicalType is actually the only field that the new LogicalType struct needs to track, then we can skip defining the new struct type and just use the schema.LogicalType interface instead.
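
As a rough illustration of the validation described above, the sketch below takes the length from the physical layout and uses an `IsApplicable`-style check to accept or reject an override. The types are local stand-ins, not the real `parquet` or `schema` packages. The helper `resolveLogicalType` and the error value are hypothetical, and the UUID rule (a 16-byte fixed-length byte array) follows the Parquet format's definition of that logical type.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalType is a local stand-in for parquet.Type.
type PhysicalType int

const (
	ByteArray PhysicalType = iota
	FixedLenByteArray
)

// LogicalType mirrors only the IsApplicable method quoted in the review.
type LogicalType interface {
	IsApplicable(t PhysicalType, tlen int32) bool
}

// uuidLogicalType is a stand-in: UUID applies to 16-byte fixed-length arrays.
type uuidLogicalType struct{}

func (uuidLogicalType) IsApplicable(t PhysicalType, tlen int32) bool {
	return t == FixedLenByteArray && tlen == 16
}

var errNotApplicable = errors.New("logical type not applicable to physical type")

// resolveLogicalType (hypothetical helper) keeps the derived logical type
// unless an override is given. The override never supplies a length: the
// physical type and length come from the Arrow field, and the override is
// only validated against them.
func resolveLogicalType(derived, override LogicalType, physical PhysicalType, length int32) (LogicalType, error) {
	if override == nil {
		return derived, nil
	}
	if !override.IsApplicable(physical, length) {
		return nil, errNotApplicable
	}
	return override, nil
}

func main() {
	// Accepted: the physical layout matches what UUID requires.
	_, err := resolveLogicalType(nil, uuidLogicalType{}, FixedLenByteArray, 16)
	fmt.Println("16-byte fixed:", err)

	// Rejected: the physical length is wrong for UUID.
	_, err = resolveLogicalType(nil, uuidLogicalType{}, FixedLenByteArray, 8)
	fmt.Println("8-byte fixed:", err)
}
```

Because the override carries no length of its own, a plain interface value is enough, which is the motivation for dropping the wrapper struct.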
Thanks for explaining.
I have updated the code to use schema.LogicalType directly in 3de0781.
Closing in favor of #43679.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?