Skip to content

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Sep 3, 2019

  • FileSystem binding without S3 (it will be implemented in a followup PR)
  • Moved arrow/python/util/datetime.h one level upper, enabling to be used by the bindings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou apply ARROW_PYTHON_EXPORT for all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the functions which were inlined should be kept inline IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I need to move all of the implementation to the header, including the static functions (practically the whole implementation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how it was already, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep everything as internal than that could work, although I prefer readable header files.
Actually we should benchmark whether it has a performance impact or not.

Could you recommend me specific benchmarks if we have any?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to have any datetime-heavy benchmarks in the ASV benchmarks directory.
However, you can probably devise a timeit-based microbenchmark easily. No need to be sophisticated :-)

@kszucs kszucs marked this pull request as ready for review September 4, 2019 14:15
@kszucs
Copy link
Member Author

kszucs commented Sep 5, 2019

@wesm
Copy link
Member

wesm commented Sep 5, 2019

@kszucs it's a typedef... visibility only has to do with exported compiled symbols

@kszucs
Copy link
Member Author

kszucs commented Sep 6, 2019

Ahh, indeed! Thanks!

@kszucs kszucs requested review from pitrou and wesm September 6, 2019 07:55
@kszucs
Copy link
Member Author

kszucs commented Sep 6, 2019

@pitrou I have a "not a subpath error" which is probably windows related? https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27224120/job/dx9qmxdj5tnenu94#L2057

@pitrou
Copy link
Member

pitrou commented Sep 9, 2019

@pitrou I have a "not a subpath error" which is probably windows related?

Right now the filesystem classes expect forward slashes, and it seems you're passing backslashes. You should try pathlib.Path.as_posix.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. It's gonna be very useful. I posted some comments below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to have any datetime-heavy benchmarks in the ASV benchmarks directory.
However, you can probably devise a timeit-based microbenchmark easily. No need to be sophisticated :-)

@pitrou
Copy link
Member

pitrou commented Sep 11, 2019

Looks like the AppVeyor failure needs fixing, can you take a look @kszucs ?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I quickly tried this out, and added a few comments. But looks good!

@jorisvandenbossche
Copy link
Member

For the pathlib comment, the other alternative would also be to add it as a dependency (for python 2 only, just like also enum34 is a py2 only dependency for enum backport).
I would say that if we think it is nice to return a pathlib, we shouldn't preclude ourselves doing that just for the few months of py2 support we still have.

@kszucs
Copy link
Member Author

kszucs commented Sep 11, 2019

After transitioning to py3 only it would be reasonable to use pathlib.Path objects as return types indeed (at least that's my preference). I suggest to create a JIRA where we can discuss further (possible involves other parts of pyarrow), meanwhile I've set the return types to string.

@pitrou
Copy link
Member

pitrou commented Sep 11, 2019

After transitioning to py3 only it would be reasonable to use pathlib.Path objects as return types indeed (at least that's my preference).

You mean from get_target_stats?

@pitrou
Copy link
Member

pitrou commented Sep 11, 2019

it would be reasonable to use pathlib.Path objects as return types indeed (at least that's my preference)

Actually, no. pathlib paths are only for the local filesystem, not for arbitrary resources (e.g. S3). Instead you should return a plain string.

@jorisvandenbossche
Copy link
Member

Actually, no. pathlib paths are only for the local filesystem, not for arbitrary resources (e.g. S3). Instead you should return a plain string.

Ah, that solves the python 2 / should we depend on pathlib question then.

@kszucs
Copy link
Member Author

kszucs commented Sep 11, 2019

We could have a pathlib compatible Path implementation as well. Whatever, the API we're providing is not stable yet.

@pitrou
Copy link
Member

pitrou commented Sep 16, 2019

For now I'm not doing a full review again, feel free to ping me when you think you're ready (and CI is fixed :-)).

@kszucs
Copy link
Member Author

kszucs commented Sep 17, 2019

@pitrou this should be green now. I'd like to move forward with bindings for S3 once it is merged.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two more comments, sorry ;-)

since the C++ side then decodes from utf-8. On Unix, os.fsencode may be
better.
"""
return tobytes(_stringify_path(path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this shouldn't accept Path objects, only str and bytes. Path objects are only valid for local paths, but here we are dealing with paths in arbitrary systems.

Copy link
Member Author

@kszucs kszucs Sep 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou done, actually we don't accept bytes neither, only strings (we're discussing it previously)

@kszucs
Copy link
Member Author

kszucs commented Sep 17, 2019

@pitrou done, builds are green, the travis error is about codecov upload failure.

@pitrou pitrou closed this in ae20ce8 Sep 17, 2019
@pitrou
Copy link
Member

pitrou commented Sep 17, 2019

This is a very nice start, thank you :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants