Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

No description provided.

@github-actions
Copy link

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9718-ParquetWriter-filesystems branch from 089d84e to fb9a773 Compare August 27, 2020 09:44
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review August 27, 2020 09:47
@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9718-ParquetWriter-filesystems branch from 37ac79f to 12017a5 Compare September 1, 2020 09:39
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jorisvandenbossche . A couple of comments below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"where"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the name of an argument, then put backquotes around it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was copy pasted from the implementation in pyarrow.filesystem, but agree it can be improved. Will update.

The keyword name itself may vary depending on where this helper function is called, so will keep it on a general "the specified path" or so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are legacy filesystem imports? Do we still need them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps import pyarrow.filesystem as legacyfs would make the code easier to read below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we still need them because the full ParquetDataset/ParquetManifest (python) implementation here is based on the legacy filesystems.
But switched to use legacyfs. for the old ones, and plain imports for the new ones

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also from pyarrow import filesystem as legacyfs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am going to leave it as is for now, because the old ones are still used a lot (would make the diff much larger, will keep that for a next PR, eg when actually deprecating)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be lifted out of the with block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it may be simpler to use pytest.raises(ValueError, match="...")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, was copied from another test, but updated to use match

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test ParquetWriter(path=uri)?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one, but it is segfaulting locally .. (maybe similar as ARROW-9814)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass -s to pytest, you should be able to see the C++ crash message (if any).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge it with the commented out test for now? (opened issue for it at https://issues.apache.org/jira/browse/ARROW-9906)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Pushed some updates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was copy pasted from the implementation in pyarrow.filesystem, but agree it can be improved. Will update.

The keyword name itself may vary depending on where this helper function is called, so will keep it on a general "the specified path" or so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we still need them because the full ParquetDataset/ParquetManifest (python) implementation here is based on the legacy filesystems.
But switched to use legacyfs. for the old ones, and plain imports for the new ones

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am going to leave it as is for now, because the old ones are still used a lot (would make the diff much larger, will keep that for a next PR, eg when actually deprecating)

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one, but it is segfaulting locally .. (maybe similar as ARROW-9814)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, was copied from another test, but updated to use match

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9718-ParquetWriter-filesystems branch from b772535 to 5bafd1d Compare September 3, 2020 07:51
@pitrou
Copy link
Member

pitrou commented Sep 7, 2020

@jorisvandenbossche Do you want to merge this?

@jorisvandenbossche
Copy link
Member Author

Yep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants