Skip to content

Conversation

@datbth
Copy link

@datbth datbth commented Aug 12, 2024

Rationale for this change

What changes are included in this PR?

Are these changes tested?

yes

Are there any user-facing changes?

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@datbth datbth changed the title [GH-43640] allow setting Logical types for pqarrow file writer GH-43640: allow setting Logical types for pqarrow file writer Aug 12, 2024
@github-actions github-actions bot added the awaiting review Awaiting review label Aug 12, 2024
@datbth datbth changed the title GH-43640: allow setting Logical types for pqarrow file writer GH-43640: [Go] allow setting Logical types for pqarrow file writer Aug 12, 2024
@github-actions
Copy link

⚠️ GitHub issue #43640 has been automatically assigned in GitHub to PR creator.

}

func fieldToNode(name string, field arrow.Field, props *parquet.WriterProperties, arrprops ArrowWriterProperties) (schema.Node, error) {
func fieldToNode(name string, field arrow.Field, props *parquet.WriterProperties, arrprops ArrowWriterProperties, customLogicalType *LogicalType) (schema.Node, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of adding a function parameter here we could optionally specify the logical type in ColumnProperties which can be accessed per-column through the existing props parameter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be confusing for the parquet APIs.
When writing parquet files, logical types should be specified right when creating the parquet schema.
This extra parameter is only relevant for pqarrow only so I think it should be within the scope of pqarrow.

Perhaps I can pass it down via pqarrow.ArrowWriterProperties instead?
image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ changed in ef9f2f6

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting review Awaiting review awaiting changes Awaiting changes labels Aug 12, 2024
@datbth datbth force-pushed the go/pqarrow_custom_logical_types branch from 88796be to 8bd7407 Compare August 12, 2024 11:21
@datbth datbth force-pushed the go/pqarrow_custom_logical_types branch from 8bd7407 to ef9f2f6 Compare August 12, 2024 11:23
@datbth datbth requested a review from joellubi August 12, 2024 11:24
@datbth datbth marked this pull request as ready for review August 12, 2024 11:24
@datbth datbth requested a review from zeroshade as a code owner August 12, 2024 11:24

if customLogicalType != nil {
logicalType = customLogicalType.Type
length = customLogicalType.Length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length here should come from the physical layout. The LogicalType interface has method IsApplicable(t parquet.Type, tlen int32) bool that can validate whether the underlying type is of the right size for the logical type.

In practice this means you shouldn't need to specify the length here at all. The arrow types above that correspond to FixedLenByteArray should already be setting length to the appropriate value.

That raises another point. If schema.LogicalType is actually the only field that the new LogicalType struct needs to track, then we can skip defining the new struct type and just use the schema.LogicalType interface instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining.
I have updated to use schema.LogicalType directly 3de0781

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 12, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 12, 2024
@datbth datbth requested a review from joellubi August 13, 2024 10:17
@datbth
Copy link
Author

datbth commented Aug 15, 2024

closing in favor of #43679

@datbth datbth closed this Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants