fix(sdk:python): Avoid AttributeError for transforms without hints #36251
The AppliedPTransform initializer would unconditionally attempt to call `get_resource_hints()` on a transform object. This could cause an AttributeError if a PTransform implementation does not define this method. This change adds an `hasattr` check to verify the existence of the `get_resource_hints` method before calling it, preventing the potential crash and making the pipeline construction more robust.
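A minimal sketch of the guard described above, written against hypothetical stand-in classes rather than Beam's real `AppliedPTransform` and `PTransform`. The names `TransformWithHints`, `TransformWithoutHints`, and `resolve_resource_hints` are assumptions made for illustration only.

```python
class TransformWithHints:
    """Hypothetical transform that exposes resource hints."""

    def get_resource_hints(self):
        return {'min_ram': b'4GB'}


class TransformWithoutHints:
    """Hypothetical transform that never defined get_resource_hints()."""


def resolve_resource_hints(transform):
    """Return the transform's hints, or an empty dict if it has none."""
    # Guard the call so a transform lacking the method does not raise
    # AttributeError during pipeline construction.
    if hasattr(transform, 'get_resource_hints'):
        return dict(transform.get_resource_hints())
    return {}
```

With this guard, a transform lacking the method contributes no hints instead of aborting construction.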
After applying #36238 and this one, I get:
Thanks. Updated annotations.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
One more apparently:
Assigning reviewers: R: @damccorm for label python.
The PR bot will only process comments in the main thread (not review comments).
I did more checks. Please help validate this. Thanks.
Added one check for that.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #36251 +/- ##
=============================================
- Coverage 56.84% 40.17% -16.67%
Complexity 3386 3386
=============================================
Files 1220 1220
Lines 185898 185904 +6
Branches 3523 3523
=============================================
- Hits 105672 74688 -30984
- Misses 76885 107875 +30990
Partials 3341 3341
Updated the PR to capture this.
Pushed one more fix for this. :)
Thanks! The test passes now!
      return
    replacement_transform.side_inputs = tuple(
-       original_transform_node.transform.side_inputs)
+       getattr(original_transform_node.transform, 'side_inputs', ()))
Rather than doing all of these attribute checks, can we just set these properties to empty values when we initialize the object?
def __init__(self, label=None):
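The suggestion can be sketched as follows. `BaseTransform` and `NestedTransform` are hypothetical stand-ins, not Beam's `PTransform`, and the attribute names `side_inputs` and `resource_hints` are used only as illustrative defaults.

```python
class BaseTransform:
    """Hypothetical stand-in for a transform base class."""

    def __init__(self, label=None):
        self.label = label
        # Empty defaults guarantee the attributes exist on every instance
        # whose construction runs this initializer.
        self.side_inputs = ()
        self.resource_hints = {}

    def get_resource_hints(self):
        return self.resource_hints


class NestedTransform(BaseTransform):
    """Subclass that adds nothing; it still inherits the defaults."""


def copy_side_inputs(original, replacement):
    # No getattr() fallback is needed when the base initializer has run.
    replacement.side_inputs = tuple(original.side_inputs)
```

The guarantee holds only for instances whose construction actually runs the base initializer, which is why call-site checks and base-class defaults are not strictly interchangeable.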
The root cause is not whether these properties are set. It is caused by the nested MaybeReshuffle. Any fix in MaybeReshuffle could cause an update-compatibility issue. That is why we did #36238.
I don't quite follow - are you saying adding these properties to the PTransform class would cause update incompatibility?
See https://github.com/apache/beam/pull/36184/files#r2359516983: MaybeReshuffle is defined dynamically (inside Create.expand?), which affects the inheritance.
The fields would be there if MaybeReshuffle were not nested.
Thanks, chimed in on that thread. I think we should fix the core label issue which is causing this issue.
I think this fix is much better since it can handle other nested transforms.
So would updating `def __init__(self, label=None):`
The reason https://github.com/apache/beam/pull/36184/files#r2359516983 was breaking is because:
- There were transforms which didn't have explicit labels
- Those transforms get autoassigned names which include the line number. For example, `Map(<lambda at bigquery_file_loads.py:1157>)` in https://github.com/apache/beam/pull/34807/files
- When we change the file, the line number that those transforms land on is no longer the same
So if we:
- Explicitly name the transform which is getting assigned a name with a line number
- Add these properties to `def __init__(self, label=None):`
Then we should fix this issue while avoiding any breaking changes.
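The labeling problem described above can be illustrated in plain Python. This is a sketch under assumptions: `auto_label` and `make_label` are hypothetical helpers that only mimic the `Map(<lambda at file:line>)` naming pattern quoted in the comment; they are not Beam's implementation.

```python
import os


def auto_label(fn):
    """Build a default label from the location where fn was defined."""
    code = fn.__code__
    filename = os.path.basename(code.co_filename)
    # The line number is part of the label, so moving the definition
    # within the file changes the label.
    return f'Map(<lambda at {filename}:{code.co_firstlineno}>)'


def make_label(fn, label=None):
    """Prefer an explicit label; fall back to the location-based one."""
    return label if label is not None else auto_label(fn)


first = lambda x: x + 1
second = lambda x: x + 1  # same body, defined on a different line
```

Here `auto_label(first)` and `auto_label(second)` differ only because of the line number, while `make_label(first, 'AddOne')` stays the same wherever the lambda ends up in the file.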
I don't know what you mean here. The nested transform misses many fields (check the rest of my PR), which are not needed when the transform is nested. My PR can make sure any future nested transform should work.
> The nested transform misses many fields (check the rest of my PR), which are not needed when the transform is nested. My PR can make sure any future nested transform should work.
I agree your PR works. But it is quite messy - for example, we check for the existence of a side_inputs property 3 times when the object is always a PTransform object. It seems much cleaner to just guarantee that this property will always exist on PTransform objects. This also means that if we use these properties elsewhere (now or in the future), we don't need to do more of these kinds of checks.
It seems reasonable to me that PTransform should have these fields in all cases. An alternative would be PTransform providing some functions to get these properties if they exist.
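The accessor alternative mentioned above might look like the following sketch. `BaseTransform`, `get_side_inputs`, and `BareTransform` are hypothetical names chosen for illustration; nothing here is taken from Beam's actual `PTransform` API.

```python
class BaseTransform:
    """Hypothetical base class that centralizes the attribute checks."""

    def get_side_inputs(self):
        # One fallback here replaces repeated getattr() calls at call sites.
        return tuple(getattr(self, 'side_inputs', ()))


class BareTransform(BaseTransform):
    """Instance that never had side_inputs assigned."""


class TransformWithSideInputs(BaseTransform):
    def __init__(self, side_inputs):
        self.side_inputs = side_inputs
```

Call sites would then use `transform.get_side_inputs()` and never need to know whether the attribute was set.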
Regardless, this is a minor code quality issue and not a correctness one. It doesn't need to block the PR if you disagree.
I am going to merge this for now since this covers the needs for #33854 (comment)
Addresses #33854 (comment)