ARROW-9518: [Python] Deprecate pyarrow serialization #8255
WIP: I still need to add a few additional deprecations and update the docs to explain the deprecation and its alternatives. cc @rok @mrkn @dhirschfeld (since you contributed to serialization somewhat recently)
Thanks for the ping. I was using that functionality. So I'm interested to see how I can replace it. My goal is to pass objects (mapping of arrays, dataframes and primitive types) between languages such as Python, R and TypeScript. I'm still unsure if arrow is capable of this language-agnostic interop for blobs of heterogeneous types? |
And so I am interested to understand how you were using it, to know what to write about a replacement. You mention the goal of passing objects to other languages, but since that is also not possible now with serialize: what do you use
At the moment, not for "random" objects, but only for types that fit into one of the serialization format / IPC message types. The |
aef7bc6 to
1ede379
Compare
1ede379 to
733ff51
Compare
python/pyarrow/__init__.py
Outdated
| "removed in a future version. Use pickle or the pyarrow IPC " | ||
| "functionality instead.") | ||
| if name == "SerializationContext": | ||
| _warnings.warn(_msg.format(name), DeprecationWarning) |
I think we use FutureWarning instead of DeprecationWarning? (because the latter is silent by default)
python/pyarrow/serialization.py
Outdated
| "removed in a future version. Use pickle or the pyarrow IPC " | ||
| "functionality instead.", | ||
| DeprecationWarning, stacklevel=2 | ||
| ) |
Do you want to write a helper instead of pasting this snippet everywhere? :-)
Also, there are existing helpers in pyarrow.util
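One way such a helper could look is a small decorator that emits the warning and then delegates to the wrapped function. This is a sketch under assumptions, not the helper that exists in `pyarrow.util`: the name `deprecated_serialization`, the placeholder `serialize` body, and the opening words of the message are all hypothetical.

```python
import functools
import warnings

# Only the tail of this message appears in the diff; the opening is assumed.
_MSG = ("'pyarrow.{0}' is deprecated and will be "
        "removed in a future version. Use pickle or the pyarrow IPC "
        "functionality instead.")


def deprecated_serialization(func):
    """Hypothetical decorator: warn on every call, then delegate."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller of the wrapper.
        warnings.warn(_MSG.format(func.__name__),
                      DeprecationWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper


@deprecated_serialization
def serialize(value):
    # Placeholder body; the real function does something quite different.
    return repr(value).encode()
```

With this shape, the message text and warning category live in one place, so a later bump from `DeprecationWarning` to `FutureWarning` touches a single line.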
| @@ -52,6 +52,9 @@ | |||
| sparse = None | |||
| pytestmark = pytest.mark.filterwarnings("ignore:'pyarrow:DeprecationWarning") | |||
Hmm... is this another pytest magic? What does it do exactly? Filter these warnings only for this test module?
Indeed, this is a way to suppress (ignore) all DeprecationWarnings that start with "pyarrow" in this test module. That's easier than catching all individual warnings, since this full file is about serialization and thus deprecated.
(and therefore I added a separate file to explicitly test the warnings are raised)
Will add a short comment about it
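To illustrate the matching rules, here is a rough stdlib equivalent of that mark, as a sketch: the `action:message:category` string corresponds approximately to a `warnings.filterwarnings` call, where the message is a regular expression matched against the start of the warning text. The function name `emitted_messages` and the example warning texts are made up for this illustration.

```python
import warnings


def emitted_messages():
    """Return (category, message) pairs that survive the ignore filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Approximate counterpart of "ignore:'pyarrow:DeprecationWarning":
        # the message pattern must match at the start of the warning text.
        warnings.filterwarnings(
            "ignore", message="'pyarrow", category=DeprecationWarning)

        # Ignored: DeprecationWarning whose text starts with 'pyarrow
        warnings.warn("'pyarrow.serialize' is deprecated", DeprecationWarning)
        # Kept: the text does not start with the pattern
        warnings.warn("some other deprecation", DeprecationWarning)
        # Kept: same text, but a different warning category
        warnings.warn("'pyarrow.serialize' is deprecated", FutureWarning)
    return [(w.category.__name__, str(w.message)) for w in caught]
```

Note that the filter is tied to the category: if the warnings are later switched to `FutureWarning`, the mark in the test module would need to change with them.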
I've just got a proof-of-concept arrow serialization framework which can serialize arbitrary Python objects (inheriting from a base class). Unfortunately, after implementing that I found it's not language-agnostic so it's languished as a bit of a curiosity. I need a language-agnostic serialization format which can serialise a Python Currently wondering if |
Yep, but I also want to be able to read the serialized data in from R, TypeScript and C# |
Then you'll have to invent your own serialization format, or find another existing one. |
Yeah, that's what I figured 😞. I've previously invented my own (with protocol buffers) but it's a big job so I was hoping to leverage off existing efforts. Will have to do some more research to see what other options there are... |
I have a slight preference for using DeprecationWarning here, at first. Reasoning: DeprecationWarning is still visible if you are using the functionality directly (eg if you call My idea was then to bump it from DeprecationWarning to FutureWarning in the next release, for example. |
Then we should make sure to have a JIRA for this, otherwise we'll probably forget :) |
This needs rebasing now :-) |
7fd922e to
17b0b19
Compare
Rebased this |
| def test_serialization_deprecated(): | ||
whitespace
python/pyarrow/__init__.py
Outdated
| # deprecated top-level access | ||
| from pyarrow.filesystem import FileSystem as _FileSystem, LocalFileSystem as _LocalFileSystem |
I think we have flake8 disabled on this file.
yeah, that's from the other PR, will fix
kszucs
left a comment
LGTM, just a few nits.
Why are the tests in a separate file? |
Because that allows me to simply ignore all deprecation warnings in the actual test file (see the comment at the pytestmark there) |
Thanks Joris, merging! |