[BEAM-4007] Futurize typehints subpackage #5337

RobbeSneyders · 2018-05-11T14:35:31Z

This pull request prepares the typehints subpackage for Python 3 support. This PR is part of a series in which all subpackages will be updated using the same approach.
This approach has been documented here and the first pull request in the series (Futurize coders subpackage) demonstrating this approach can be found at #5053.

R: @aaltay @tvalentyn

tvalentyn

Thank you. A few comments below.

tvalentyn · 2018-06-13T04:29:36Z

sdks/python/apache_beam/typehints/typehints.py

    return 'Any'

+  def __hash__(self):
+    return hash(id(self))


Can we make this hash(type(self)) or are we running into something like: https://issues.apache.org/jira/browse/BEAM-3730 here?

Yes, this is the same issue, so I made __hash__ instance specific like it was on Python 2.

Can you please add a TODO(BEAM-3730) comment here?

tvalentyn · 2018-06-13T04:30:57Z

sdks/python/apache_beam/typehints/typehints.py


-  def __init__(self, name):
-    self.name = name
+  def __hash__(self):


Let's change the implementation to hash(self.name) to fulfill the contract between __hash__ and __eq__.

This is again the same issue as mentioned above.

I actually think it might make more sense to change __eq__ to be instance specific. If I specify K = beam.typehints.TypeVariable('K') in two separate places in my code base, I might not want them to match.
However, this might break some existing pipelines. I therefore tried to use the equivalent of the Python 2 code.

What do you think?

I reproduced BEAM-3730 but could not easily interpret the failure mode. It is clear that the equality relationship is not defined correctly here. I'd like to understand the failure first, to learn what assumptions the Beam codebase makes about typehints. I don't have a good answer right now. For the purpose of this change, I think keeping hash(id(self)) is ok for now, since it won't make things more broken, but we should add a TODO(BEAM-3730): Fix the contract between __hash__ and __eq__. I'll try to add more context to the issue later.

Can you please add a TODO(BEAM-3730) comment here too?
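For readers following this thread, the mismatch being discussed can be sketched in isolation. The class below is a hypothetical stand-in, not the Beam TypeVariable implementation: it assumes equality by name combined with an instance-specific hash, which is the combination the TODO refers to. Note that on Python 3 a class that defines __eq__ without also defining __hash__ becomes unhashable, which is why an explicit __hash__ is needed at all.

```python
class NameEqVariable(object):
    """Hypothetical stand-in: equality by name, hash by identity."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return type(self) == type(other) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Instance-specific hash, similar to the Python 2 default behavior.
        return hash(id(self))


k1 = NameEqVariable('K')
k2 = NameEqVariable('K')

equal = (k1 == k2)                  # compares names
same_hash = (hash(k1) == hash(k2))  # compares identities
found_in_set = (k2 in {k1})         # set lookup checks the hash before __eq__
```

Two objects that compare equal but hash differently are treated as distinct keys by sets and dicts, so any code that relies on hashed lookup will disagree with code that relies on ==.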

tvalentyn · 2018-06-13T04:47:35Z

sdks/python/apache_beam/typehints/typehints_test.py

    self.assertTrue(is_consistent_with(int, Any))
-    self.assertTrue(is_consistent_with(str, object))
-    self.assertFalse(is_consistent_with(object, str))
+    # object builtin is shadowed by object imported from future.builtins on


Trying to understand why we need to use native_object in the test, given that we import object from builtins both in the test and in the typehints.py. Does the test not pass without this?
Is this condition (https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1110) evaluated differently without using native_object?

I think the linked condition evaluates ok because we import object from future.builtins in both modules, but the issubclass(sub, base) condition will evaluate differently if sub is not based on the object from future.builtins.
I never was a fan of this workaround, but pushed it mostly to get feedback.

Looking back at it, I actually think we should not use object from future.builtins in test modules, because we want to test for 'native' code.
We already decided not to use other future.builtins types in test modules (mostly focused on str and bytes) because of the same reason.

We should also remove the import from typehints.py, because typechecks with object will behave differently.

+1 for not importing object from future.builtins in typehints.py. Another option could be to normalize builtins.object to native object here:
https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1082. But this feels brittle and it seems that importing object here is not that critical.

We can assign _native_object = object before importing from future.builtins. This is maybe more compatible, going forward, than doing this here.

I have removed the from builtins import object statement from all the test files and from typehints.py. This import should not be necessary if we are not working with iterators.
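The _native_object alternative suggested above was not the option taken, but it can be sketched as follows. The helper name is hypothetical, and the snippet is written to run on Python 3, where builtins is the standard-library module and the import rebinds object to the same class. On Python 2 with the future package installed, the thread indicates the imported object is a different class, so issubclass checks against it can give different results for native types.

```python
# Keep a handle on the interpreter's own base class before any import
# can rebind the name `object` in this module.
_native_object = object

# On Python 3 this is the standard-library builtins module, so the name is
# rebound to the same class. On Python 2 the `future` package supplies it.
from builtins import object  # noqa: E402


def is_native_subclass(sub):
    """Hypothetical helper: compare against the native base class."""
    return issubclass(sub, _native_object)


same_class_on_py3 = object is _native_object
str_is_native_subclass = is_native_subclass(str)
```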

tvalentyn · 2018-06-16T00:35:24Z

R: @charlesccychen

charlesccychen · 2018-06-20T21:08:03Z

sdks/python/apache_beam/typehints/typecheck.py

    if output is None:
      return output
-    elif isinstance(output, (dict,) + six.string_types):
+    elif isinstance(output, (dict, str, unicode)):


We should add bytes to this tuple: this preserves the intended behavior in Python 3, where str and unicode here are synonyms.

I don't think we need to add bytes here.
six.string_types is equivalent to basestring on python 2 and str on python 3. (str, unicode) is equivalent to basestring on Python 2 and here also equivalent to str on Python 3 due to the unicode=str assignment.

My understanding is that in Python 2, bytes is equivalent to str, but this is not the case in Python 3. If we leave out bytes, this code path would not be triggered in Python 3 when output is of type bytes, so we should add it to the tuple. Is that correct?

You're right. We also want to trigger this path for bytes.
I'll add it.
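The difference agreed on above can be checked with a small sketch. The function name and the include_bytes flag are hypothetical and exist only to compare the two variants side by side. The unicode = str assignment is an assumption that mirrors the one mentioned earlier in this thread.

```python
import sys

if sys.version_info[0] >= 3:
    # Assumption: mirrors the unicode = str assignment mentioned above.
    unicode = str


def is_dict_or_string(output, include_bytes):
    """Hypothetical helper comparing the tuple with and without bytes."""
    types = (dict, str, unicode)
    if include_bytes:
        types += (bytes,)
    return isinstance(output, types)


text_without_bytes = is_dict_or_string(u'abc', include_bytes=False)
raw_without_bytes = is_dict_or_string(b'abc', include_bytes=False)
raw_with_bytes = is_dict_or_string(b'abc', include_bytes=True)
```

On Python 3, a bytes value only matches once bytes is in the tuple. On Python 2, bytes is an alias of str, so both variants already match.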

charlesccychen · 2018-06-20T21:14:14Z

Thanks @RobbeSneyders. I added some minor comments above.

pabloem · 2018-07-02T18:15:52Z

Is this PR ready to merge?

charlesccychen · 2018-07-02T18:22:02Z

@pabloem Not yet--finishing running Dataflow benchmarks.

charlesccychen · 2018-07-02T23:55:47Z

Thanks, this LGTM. (ccy-benchmark-ok)

RobbeSneyders changed the title ~~[BEAM-4007Futurize typehints subpackage~~ [BEAM-4007] Futurize typehints subpackage May 11, 2018

tvalentyn requested changes Jun 13, 2018


charlesccychen reviewed Jun 20, 2018


Futurize typehints subpackage

ffe4eab

RobbeSneyders force-pushed the typehints branch from 4fc7965 to e73d150 Compare June 21, 2018 21:22

Address PR comments

7b79c0d

RobbeSneyders force-pushed the typehints branch from e73d150 to 7b79c0d Compare June 24, 2018 21:54

charlesccychen merged commit a56ce43 into apache:master Jul 2, 2018

charlesccychen mentioned this pull request Jul 2, 2018

[BEAM-4007] Fix TODO style in typehints.py #5872

Merged
