Conversation

@AlenkaF
Member

@AlenkaF AlenkaF commented May 4, 2022

Remove the lines in pq.write_to_dataset that unconditionally set partitioning and file_visitor to None. This is a leftover from #12811, where additional ds.write_dataset keywords were exposed in pq.write_to_dataset.
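
For readers without the diff at hand, here is a minimal sketch of the kind of bug described above. It does not use pyarrow: `write_dataset`, `write_to_dataset_buggy` and `write_to_dataset_fixed` are hypothetical stand-ins, and the real function bodies are assumed to be considerably longer.

```python
def write_dataset(data, base_dir, partitioning=None, file_visitor=None):
    """Hypothetical stand-in for ds.write_dataset: reports what it received."""
    return {"partitioning": partitioning, "file_visitor": file_visitor}


def write_to_dataset_buggy(table, root_path, partitioning=None,
                           file_visitor=None):
    # Leftover lines: the caller's values are overwritten before forwarding,
    # so the keywords are accepted but silently ignored.
    partitioning = None
    file_visitor = None
    return write_dataset(table, root_path, partitioning=partitioning,
                         file_visitor=file_visitor)


def write_to_dataset_fixed(table, root_path, partitioning=None,
                           file_visitor=None):
    # With the leftover lines removed, the keywords are passed through as given.
    return write_dataset(table, root_path, partitioning=partitioning,
                         file_visitor=file_visitor)
```

In this sketch the buggy wrapper forwards None for both keywords regardless of what the caller passed, while the fixed wrapper forwards them unchanged.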

@github-actions

github-actions bot commented May 4, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@lidavidm
Member

lidavidm commented May 4, 2022

Thanks!

Is it possible to add tests for these?

@AlenkaF
Member Author

AlenkaF commented May 5, 2022

I added a test that checks for partitioning and file_visitor being correctly passed in pq.write_to_dataset.

While writing the test I ran into another error. If basename_template is passed to pq.write_to_dataset (that is, it is not None), the code skipped the check for existing_data_behavior, so the call to ds.write_dataset errored because existing_data_behavior was None rather than a string. I added the fix here since this is also a leftover of mine, this time from #12838. I can open a separate PR if anyone prefers that.
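
A rough sketch of the failure mode described above, again without pyarrow. It assumes the default for existing_data_behavior was only filled in on the code path where basename_template is None, and that the default is "overwrite_or_ignore". The function names and the exact control flow are hypothetical; only the three behavior strings are taken from ds.write_dataset.

```python
import uuid

VALID_BEHAVIORS = {"error", "overwrite_or_ignore", "delete_matching"}


def write_dataset(basename_template, existing_data_behavior):
    """Hypothetical stand-in for ds.write_dataset, which expects a string."""
    if not isinstance(existing_data_behavior, str):
        raise TypeError("existing_data_behavior must be a string")
    if existing_data_behavior not in VALID_BEHAVIORS:
        raise ValueError("unknown existing_data_behavior")
    return basename_template, existing_data_behavior


def write_to_dataset_buggy(basename_template=None,
                           existing_data_behavior=None):
    if basename_template is None:
        basename_template = uuid.uuid4().hex + "-{i}.parquet"
        # Assumed bug: the default is only applied inside this branch.
        if existing_data_behavior is None:
            existing_data_behavior = "overwrite_or_ignore"
    return write_dataset(basename_template, existing_data_behavior)


def write_to_dataset_fixed(basename_template=None,
                           existing_data_behavior=None):
    if basename_template is None:
        basename_template = uuid.uuid4().hex + "-{i}.parquet"
    # The default is applied regardless of basename_template.
    if existing_data_behavior is None:
        existing_data_behavior = "overwrite_or_ignore"
    return write_dataset(basename_template, existing_data_behavior)
```

Under these assumptions, passing a custom basename_template to the buggy variant leaves existing_data_behavior as None and the downstream call raises, while the fixed variant always forwards a string.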

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks, looks perfect!

I think it is fine to include the other changes here as well, as they are very similar.

Member

@lidavidm lidavidm left a comment

LGTM.

I wonder whether we should raise an error for some of these 'conflicting' options, for instance when the user passes both 'partitioning' and 'partition_cols', or both 'metadata_collector' and 'file_visitor'.
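
One possible shape for such a check, as a sketch only: `check_conflicting_options` is a hypothetical helper, and the error type and messages are assumptions, not what the follow-up necessarily implements.

```python
def check_conflicting_options(partitioning=None, partition_cols=None,
                              metadata_collector=None, file_visitor=None):
    """Raise if two keywords that express the same thing are both given."""
    if partitioning is not None and partition_cols is not None:
        raise ValueError(
            "Pass either 'partitioning' or 'partition_cols', not both.")
    if metadata_collector is not None and file_visitor is not None:
        raise ValueError(
            "Pass either 'metadata_collector' or 'file_visitor', not both.")
```

Calling the helper at the top of the writer function would reject ambiguous calls early instead of silently preferring one keyword over the other.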

@AlenkaF
Member Author

AlenkaF commented May 10, 2022

Yes, that makes sense. Will do.

@jorisvandenbossche
Member

@AlenkaF do you want to do that here, or in a follow-up PR? (either way is fine)

@AlenkaF
Member Author

AlenkaF commented May 18, 2022

Sorry, am a bit distracted by other issues.
Let's do a follow-up so this PR can get closed. Will create a JIRA for it today.

@AlenkaF
Member Author

AlenkaF commented May 18, 2022

Created a JIRA for the follow-up:
https://issues.apache.org/jira/browse/ARROW-16610

@ursabot

ursabot commented May 19, 2022

Benchmark runs are scheduled for baseline = 1cdedc4 and contender = 0a0d7fe. 0a0d7fe is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.51% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.2% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 0a0d7fea ec2-t3-xlarge-us-east-2
[Failed] 0a0d7fea test-mac-arm
[Failed] 0a0d7fea ursa-i9-9960x
[Finished] 0a0d7fea ursa-thinkcentre-m75q
[Finished] 1cdedc4c ec2-t3-xlarge-us-east-2
[Failed] 1cdedc4c test-mac-arm
[Failed] 1cdedc4c ursa-i9-9960x
[Finished] 1cdedc4c ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@AlenkaF AlenkaF deleted the ARROW-16420 branch June 6, 2022 08:47