Skip to content

Conversation

@RobbeSneyders
Copy link
Contributor

This pull request prepares the typehints subpackage for Python 3 support. This PR is part of a series in which all subpackages will be updated using the same approach.
This approach has been documented here and the first pull request in the series (Futurize coders subpackage) demonstrating this approach can be found at #5053.

R: @aaltay @tvalentyn

@RobbeSneyders RobbeSneyders changed the title [BEAM-4007Futurize typehints subpackage [BEAM-4007] Futurize typehints subpackage May 11, 2018
Copy link
Contributor

@tvalentyn tvalentyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. A few comments below.

return 'Any'

def __hash__(self):
return hash(id(self))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this hash(type(self)) or are we running into something like: https://issues.apache.org/jira/browse/BEAM-3730 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the same issue, so I made __hash__ instance specific like it was on Python 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a TODO(BEAM-3730) comment here?


def __init__(self, name):
self.name = name
def __hash__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the implementation hash(self.name) to fulfilll the contract between __hash__ and __eq__.

Copy link
Contributor Author

@RobbeSneyders RobbeSneyders Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again the same issue as mentioned above.

I actually think it might make more sense to change __eq__ to be instance specific. If I specify K = beam.typehints.TypeVariable('K') in two separate places in my code base, I might not want them match.
However, this might break some existing pipelines. I therefore tried to use the equivalent of the Python 2 code.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reproduced the BEAM-3730 but could not easily interpret the failure mode. It is clear that equality relationship is not defined correctly here. I'd like to understand the failure first to understand what assumptions Beam codebase has about typehints. I don't have a good answer right now. For the purpose of this change, I think keeping hash(id(self)) is ok for now since we won't make things more broken with that, but we should add a TODO(BEAM-3730): Fix the contract between __hash__ and eq. I'll try to add more context to the issue later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a TODO(BEAM-3730) comment here too?

self.assertTrue(is_consistent_with(int, Any))
self.assertTrue(is_consistent_with(str, object))
self.assertFalse(is_consistent_with(object, str))
# object builtin is shadowed by object imported from future.builtins on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand why we need to use native_object in the test, given that we import object from builtins both in the test and in the typehints.py. Does the test not pass without this?
Is this condition (https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1110) evaluated differently without using native_object?

Copy link
Contributor Author

@RobbeSneyders RobbeSneyders Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the linked condition evaluates ok because we import object from future.builtins in both modules, but the issubclass(sub, base) condition will evaluate differently if sub is not based on the object from future.builtins.
I never was a fan of this workaround, but pushed it mostly to get feedback.

Looking back at it, I actually think we should not use object from future.builtins in test modules, because we want to test for 'native' code.
We already decided not to use other future.builtins types in test modules (mostly focused on str and bytes) because of the same reason.

We should also remove the import from typehints.py, because typechecks with object will behave differently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for not importing the object from future.builtins in typehints.py . Another option could be to to normalize builtins.object to native object here:
https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1082. But this feels brittle and it seems that importing object here is not that critical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can assign _native_object = object before importing from future.builtins. This is maybe more compatible, going forward, than doing this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed from builtins import object from all the test files and typehints.py. This import should not be necessary if we are not working with iterators.

@tvalentyn
Copy link
Contributor

R: @charlesccychen

if output is None:
return output
elif isinstance(output, (dict,) + six.string_types):
elif isinstance(output, (dict, str, unicode)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add bytes to this tuple: this preserves the intended behavior in Python 3, where str and unicode here are synonyms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to add bytes here.
six.string_types is equivalent to basestring on python 2 and str on python 3. (str, unicode) is equivalent to basestring on Python 2 and here also equivalent to str on Python 3 due to the unicode=str assignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that in Python 2, bytes is equivalent to str, but this is not the case in Python 3, so by leaving out bytes in Python 3, this code path would not be triggered if output is of type bytes, so that we should add this to the list. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. We also want to trigger this path for bytes.
I'll add it.

@charlesccychen
Copy link
Contributor

Thanks @RobbeSneyders. I added some minor comments above.

@pabloem
Copy link
Member

pabloem commented Jul 2, 2018

Is this PR ready to merge?

@charlesccychen
Copy link
Contributor

@pabloem Not yet--finishing running Dataflow benchmarks.

@charlesccychen
Copy link
Contributor

charlesccychen commented Jul 2, 2018

Thanks, this LGTM. (ccy-benchmark-ok)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants