Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

No description provided.

@jorisvandenbossche
Copy link
Member Author

WIP, still need to add a few additional deprecations, and add do an update of the docs explaining the deprecation with alternatives.

cc @rok @mrkn @dhirschfeld (since you contributed to serialization somewhat recently)

@github-actions
Copy link

@dhirschfeld
Copy link

Thanks for the ping. I was using that functionality. So I'm interested to see how I can replace it.

My goal is to pass objects (mapping of arrays, dataframes and primitive types) between languages such as Python, R and TypeScript. pickle, being python-only, is no good for this. I understand the previous functionality was also python only so I suppose I haven't lost anything there.

I'm still unsure if arrow is capable of this language-agnostic interop for blobs of heterogeneous types?

@jorisvandenbossche
Copy link
Member Author

I was using that functionality. So I'm interested to see how I can replace it.

And so I am interested to understand how you were using it, to know what to write about a replacement

You mention the goal of passing objects to other languages, but since that is also not possible now with serialize: what do you use serialize for currently ?
Is it a use case where you can basically replace it with pickle?

I'm still unsure if arrow is capable of this language-agnostic interop for blobs of heterogeneous types?

At the moment, not for "random" objects, but only for types that fit into one of the serialization format / IPC message types. The pyarrow.serialization module always has been python specific. Expanding the IPC messages types might be possible, but is something that needs to be discussed on the dev mailing list to see if there is interest.

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9518-deprecate-serialize branch from aef7bc6 to 1ede379 Compare September 30, 2020 15:02
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review September 30, 2020 15:02
@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9518-deprecate-serialize branch from 1ede379 to 733ff51 Compare October 7, 2020 08:41
"removed in a future version. Use pickle or the pyarrow IPC "
"functionality instead.")
if name == "SerializationContext":
_warnings.warn(_msg.format(name), DeprecationWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use FutureWarning instead of DeprecationWarning? (because the latter is silent by default)

"removed in a future version. Use pickle or the pyarrow IPC "
"functionality instead.",
DeprecationWarning, stacklevel=2
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to write a helper instead of pasting this snippet everywhere? :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there are existing helpers in pyarrow.util

@@ -52,6 +52,9 @@
sparse = None


pytestmark = pytest.mark.filterwarnings("ignore:'pyarrow:DeprecationWarning")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... is this another pytest magic? What does it do exactly? Filter these warnings only for this test module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this is a way to suppress (ignore) all DeprecationWarnings that start with "pyarrow" in this test module. That's easier than catching all individual warnings, since this full file is about serialization and thus deprecated.
(and therefore I added a separate file to explicitly test the warnings are raised)

Will add a short comment about it

@dhirschfeld
Copy link

what do you use serialize for currently?

I've just got a proof-of-concept arrow serialization framework which can serialize arbitrary Python objects (inheriting from a base class). Unfortunately, after implementing that I found it's not language-agnostic so it's languished as a bit of a curiosity.

I need a language-agnostic serialization format which can serialise a Python Dict[str, object] where object is itself serializable to a Dict[str, primitive_type]. I'm performance sensitive and may need to serialize large arrays/DataFrames ("primitive" data-science types). Given that, I hoped pyarrow might fit the bill but it doesn't seem suited to serializing heterogeneous mappings?

Currently wondering if msgpack might be better suited but interested to hear if this is something on the radar/roadmap for arrow?

@pitrou
Copy link
Member

pitrou commented Oct 7, 2020

pickle is the best thing available for arbitrary Python types and heteregenous data, IMO.

@dhirschfeld
Copy link

pickle is the best thing available for arbitrary Python types and heterogenous data, IMO.

Yep, but I also want to be able to read the serialized data in from R, TypeScript and C#

@pitrou
Copy link
Member

pitrou commented Oct 7, 2020

Then you'll have to invent your own serialization format, or find another existing one.

@dhirschfeld
Copy link

Yeah, that's what I figured 😞. I've previously invented my own (with protocol buffers) but it's a big job so I was hoping to leverage off existing efforts. Will have to do some more research to see what other options there are...

@jorisvandenbossche
Copy link
Member Author

I think we use FutureWarning instead of DeprecationWarning? (because the latter is silent by default)

I have a slight preference for using DeprecationWarning here, at first. Reasoning: DeprecationWarning is still visible if you are using the functionality directly (eg if you call pa.serialize() yourself in an interactive environment or a main script, you will see it by default, since Python 3.7 or so), but not visible if you are using some library that is using the deprecated functionality. This gives developers of libraries using this functionality the chance to fix those warnings, before their users see warnings they cannot do anything about.

My idea was then to bump it from DeprecationWarning to FutureWarning in the next release, for example.

@pitrou
Copy link
Member

pitrou commented Oct 7, 2020

My idea was then to bump it from DeprecationWarning to FutureWarning in the next release, for example.

Then we should make sure to have a JIRA for this, otherwise we'll probably forget :)

@pitrou
Copy link
Member

pitrou commented Oct 7, 2020

This needs rebasing now :-)

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-9518-deprecate-serialize branch from 7fd922e to 17b0b19 Compare October 9, 2020 07:36
@jorisvandenbossche
Copy link
Member Author

Rebased this



def test_serialization_deprecated():

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

# deprecated top-level access


from pyarrow.filesystem import FileSystem as _FileSystem, LocalFileSystem as _LocalFileSystem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have flake8 disabled on this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's from the other PR, will fix

Copy link
Member

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just few nits.

@kszucs
Copy link
Member

kszucs commented Oct 9, 2020

Why are the tests in a separate file?

@jorisvandenbossche
Copy link
Member Author

Why are the tests in a separate file?

Because that allows me to simply ignore all deprecation warnings in the actual test file (see the comment at the pytestmark there)

@kszucs
Copy link
Member

kszucs commented Oct 9, 2020

Thanks Joris, merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants