-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-4007] Futurize typehints subpackage #5337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tvalentyn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. A few comments below.
| return 'Any' | ||
|
|
||
| def __hash__(self): | ||
| return hash(id(self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this hash(type(self)) or are we running into something like: https://issues.apache.org/jira/browse/BEAM-3730 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the same issue, so I made __hash__ instance specific like it was on Python 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a TODO(BEAM-3730) comment here?
|
|
||
| def __init__(self, name): | ||
| self.name = name | ||
| def __hash__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's change the implementation hash(self.name) to fulfilll the contract between __hash__ and __eq__.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is again the same issue as mentioned above.
I actually think it might make more sense to change __eq__ to be instance specific. If I specify K = beam.typehints.TypeVariable('K') in two separate places in my code base, I might not want them match.
However, this might break some existing pipelines. I therefore tried to use the equivalent of the Python 2 code.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reproduced the BEAM-3730 but could not easily interpret the failure mode. It is clear that equality relationship is not defined correctly here. I'd like to understand the failure first to understand what assumptions Beam codebase has about typehints. I don't have a good answer right now. For the purpose of this change, I think keeping hash(id(self)) is ok for now since we won't make things more broken with that, but we should add a TODO(BEAM-3730): Fix the contract between __hash__ and eq. I'll try to add more context to the issue later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a TODO(BEAM-3730) comment here too?
| self.assertTrue(is_consistent_with(int, Any)) | ||
| self.assertTrue(is_consistent_with(str, object)) | ||
| self.assertFalse(is_consistent_with(object, str)) | ||
| # object builtin is shadowed by object imported from future.builtins on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand why we need to use native_object in the test, given that we import object from builtins both in the test and in the typehints.py. Does the test not pass without this?
Is this condition (https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1110) evaluated differently without using native_object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the linked condition evaluates ok because we import object from future.builtins in both modules, but the issubclass(sub, base) condition will evaluate differently if sub is not based on the object from future.builtins.
I never was a fan of this workaround, but pushed it mostly to get feedback.
Looking back at it, I actually think we should not use object from future.builtins in test modules, because we want to test for 'native' code.
We already decided not to use other future.builtins types in test modules (mostly focused on str and bytes) because of the same reason.
We should also remove the import from typehints.py, because typechecks with object will behave differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not importing the object from future.builtins in typehints.py . Another option could be to to normalize builtins.object to native object here:
https://github.com/RobbeSneyders/beam/blob/4fc7965ba67fd22a7a7c4a3935ec295aca2811be/sdks/python/apache_beam/typehints/typehints.py#L1082. But this feels brittle and it seems that importing object here is not that critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can assign _native_object = object before importing from future.builtins. This is maybe more compatible, going forward, than doing this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed from builtins import object from all the test files and typehints.py. This import should not be necessary if we are not working with iterators.
| if output is None: | ||
| return output | ||
| elif isinstance(output, (dict,) + six.string_types): | ||
| elif isinstance(output, (dict, str, unicode)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add bytes to this tuple: this preserves the intended behavior in Python 3, where str and unicode here are synonyms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add bytes here.
six.string_types is equivalent to basestring on python 2 and str on python 3. (str, unicode) is equivalent to basestring on Python 2 and here also equivalent to str on Python 3 due to the unicode=str assignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that in Python 2, bytes is equivalent to str, but this is not the case in Python 3, so by leaving out bytes in Python 3, this code path would not be triggered if output is of type bytes, so that we should add this to the list. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. We also want to trigger this path for bytes.
I'll add it.
|
Thanks @RobbeSneyders. I added some minor comments above. |
|
Is this PR ready to merge? |
|
@pabloem Not yet--finishing running Dataflow benchmarks. |
|
Thanks, this LGTM. (ccy-benchmark-ok) |
This pull request prepares the typehints subpackage for Python 3 support. This PR is part of a series in which all subpackages will be updated using the same approach.
This approach has been documented here and the first pull request in the series (Futurize coders subpackage) demonstrating this approach can be found at #5053.
R: @aaltay @tvalentyn