BEAM-13939: Restructure Protos to fix namespace conflicts #16961
Codecov Report
@@ Coverage Diff @@
## master #16961 +/- ##
==========================================
- Coverage 74.12% 74.09% -0.03%
==========================================
Files 677 681 +4
Lines 89069 89209 +140
==========================================
+ Hits 66019 66098 +79
- Misses 21899 21960 +61
Partials 1151 1151
|
|
For python, thinking it might actually be better to make the structure of |
lukecwik
left a comment
Should the proto package name be used as the basis for the directory name?
(e.g. org.apache.beam.model.job_management.v1 -> org/apache/beam/model/job_management/v1 and org.apache.beam.model.fn_execution.v1 -> org/apache/beam/model/fn_execution/v1)
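The mapping suggested above, from proto package name to directory, can be sketched as a small helper. This is only an illustration of the idea; `package_to_dir` is a hypothetical name and is not part of this PR or of the Beam build.

```python
import posixpath


def package_to_dir(proto_package: str) -> str:
    """Map a dotted proto package name to a relative directory path.

    Hypothetical helper for illustration only.
    """
    parts = proto_package.split(".")
    # Reject names such as "" or "org..apache" that contain empty segments.
    if not all(parts):
        raise ValueError("malformed proto package: %r" % proto_package)
    return posixpath.join(*parts)


job_management_dir = package_to_dir("org.apache.beam.model.job_management.v1")
fn_execution_dir = package_to_dir("org.apache.beam.model.fn_execution.v1")
print(job_management_dir)
print(fn_execution_dir)
```

Deriving the directory from the package name keeps the two in lockstep, so a proto file's import path always matches its declared package.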
|
R: @lukecwik |
|
Is there an easy way to run these CI tasks locally? I'm trying some Gradle tasks: some work, some don't, and some that used to work no longer do 😬. I want to speed up my iteration, so if there's a doc/README I can read somewhere, that would be great too! |
|
I will start here |
sdks/python/container/Dockerfile
Outdated
|
retest this please |
|
Run Java PreCommit |
|
Run GoPortable PreCommit |
|
Looks good to me for the proto file changes. |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
lostluck
left a comment
Overall LGTM. Thank you for this finicky bit of work! It'll be nice to not be accidentally misusing proto file imports.
- We may be able to avoid using GOPATH. Per `go help gopath`:
  > GOPATH and Modules
  >
  > When using modules, GOPATH is no longer used for resolving imports.
  > However, it is still used to store downloaded source code (in GOPATH/pkg/mod)
  > and compiled commands (in GOPATH/bin).
I don't insist on this. We can try to fix it later, and avoid the version duplications.
- Why are we losing the license header on the generated GRPC pb.go files, but not the standard pb.go files? Not really an issue, since we can ignore it in the RAT check, but it is strange.
TBH I don't know (sad). I saw that and spent some time loosely investigating. I agree that ideally we would have the license header, so I am happy to find some fix before merging! If you have tips on direction or if your Google-fu is good, I would love any help 🙏🏽 |
|
It looks like so i went ahead and wrote my own hacky implementation |
|
ping @tvalentyn, should we tag in another reviewer? |
|
Run Java PreCommit |
|
taking a look, sorry for delay. |
|
no problem! take your time, glad to hear from you |
|
Tiny note: #17045 is ready to go in, and it makes a necessary change to graphx/v1.proto. If there are weird merge conflicts, it should be fine if you just re-generate. I also have a small change I need to make to that same file to fix a different bug, which would also need regeneration. |
|
Thanks for taking on this change, @thempatel. The new suggested import aliases for Python ( For further steps on this PR I'd suggest cleaning up the commit history if comments from other reviewers have been addressed. |
|
So the vast majority of Python changes seem to be unrelated to this (mostly go specific, already very large) change. Is there a way we could break them out and review them separately? The one change I see is that we (now) have to do renaming of the proto files in apache_beam/portability/api back to a flat structure (and fix their internal imports). |
|
Also, for auto-generated files, I think adding them to RAT is better than modifying these files--they're (clearly marked) machine output. |
Milan can confirm, but since the changes are rooted in changing how the proto files import each other, I don't know how separable each change is at a PR level. We might be able to organize the changes as commits with "proto file changes", "generator changes", "go generated code", "python things" granularity, though. I'm ambivalent, since the Go side has finished review at this point, and I don't understand the nature of the Python side of the change. |
I sympathize with this; it's usually good advice to break large changes into their logical components and land them separately. In this case that will be a challenge, because I see the change set here as one logical component. The nature of building a polyglot SDK on top of proto is that if you change the proto, you have to change all the languages, and there's really no way around this without some long migration path. To that end, I think the per-language changes here are isolated enough that you can hide the changes from other files and review just the Python files, a similar experience to what it would be like if we housed them in their own PR. Maybe a question to ask is what we do if something inexplicably goes wrong with merging this PR: is it easier to recover quickly if we have this change set spread over multiple commits, or a single commit? A revert of one commit is easy; a revert of multiple commits with others interspersed is much harder. If you feel really strongly that the Python changes should be in their own change set, I am happy to oblige.
Let's chat about this, I'm not sure I understand what is being said here. Why would we have to change this back to a flat structure? I think that will be impossible, hence the extensive changes in the generation tooling.
It looks like the generated go files are already in the RAT, though I do agree with Robert that maintaining the ASF license header is a better avenue, plus it reduces the diff on the PR |
|
Just to confirm, imports of the form |
Correct, this PR changes the proto structure from a flat one to a hierarchical one, like so: to so the reason you can no longer do

`from apache_beam.portability.api.org.apache.beam.model.pipeline.beam_runner_api_pb2 import TestStreamPayload`

In order to make this easier to work with, I updated the proto generator to also generate module bindings in the

    from .org.apache.beam.model import pipeline
    # ...
    external_transforms_pb2 = pipeline.external_transforms_pb2
    # ...

so that you don't need to provide the fully qualified path. If we didn't do this, this PR would be even more huge, since we'd have to update every import of the generated bindings in the SDK. |
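As a rough sketch of how a generator could emit alias bindings of the shape shown above: `render_bindings` and its parameters are hypothetical names chosen for illustration, and this is not the generator code from this PR.

```python
def render_bindings(parent_package, subpackage, modules):
    """Return source text that re-exports generated modules under short names.

    Hypothetical helper. It assumes the subpackage already exposes each
    generated module as an attribute (for example because its own
    __init__ imports them).
    """
    lines = ["from .%s import %s" % (parent_package, subpackage)]
    # Sort so that regenerating the file produces a stable diff.
    for name in sorted(modules):
        lines.append("%s = %s.%s" % (name, subpackage, name))
    return "\n".join(lines) + "\n"


bindings_source = render_bindings(
    "org.apache.beam.model",
    "pipeline",
    ["external_transforms_pb2", "beam_runner_api_pb2"],
)
print(bindings_source)
```

Callers then keep importing the short names, so the nested layout can change without touching every import site in the SDK.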
LOG = logging.getLogger()
LOG.setLevel(logging.INFO)

LICENSE_HEADER = """
I would just drop these two headers and skip the checks over adding logic to prepend them to the auto-generated files.
If you're not strongly opposed, I suggest we keep these. My argument is less related to the build system, and more related to messaging to consumers of the SDK: if they are exploring the SDK, it is a useful notice to have so that they know this is Beam's posture
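For readers following the header discussion, prepending a header to generated output can be done idempotently along these lines. This is a minimal sketch and not the script in this PR; `ensure_header` and `PLACEHOLDER_HEADER` are hypothetical, and the placeholder stands in for the real license text.

```python
# Placeholder only: the real header text is deliberately not reproduced here.
PLACEHOLDER_HEADER = "# <license header text goes here>\n"


def ensure_header(source: str, header: str = PLACEHOLDER_HEADER) -> str:
    """Return source with the header prepended, unless it is already there."""
    if source.startswith(header):
        # Already stamped, so regenerating does not stack headers.
        return source
    return header + "\n" + source


generated = "x = 1\n"
stamped_once = ensure_header(generated)
stamped_twice = ensure_header(stamped_once)
print(stamped_once)
```

Because the function checks for an existing header first, running the generator repeatedly leaves the files unchanged.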
lostluck
left a comment
LGTM.
Unless there are any further objections, I can merge this in later this week.
|
I guess @lukecwik needs to validate/approve his requested changes first. |
…structure. This is so that the proto files require usage of a org/apache/beam/model namespace in their imports and so that the generated files also include this namespace in their source file metadata.
…enerate all proto files for the go sdk. This new tool will add any necessary options to the proto compiler and generate the proto files relative to the go sdk root to ensure that the generated files have a namespaced file path in their metadata. If you want to generate a proto file in the go sdk, simply use this script in the go:generate directive, the rest will be taken care of by the script.
…te proto bindings. Updates the README for how to generate the model proto bindings into the SDK
…w namespaced structure of the Beam model. It does this by supporting arbitrary directory structures of proto files by calculating and replacing the generated imports with relative imports with the generated source. Additionally, it will generate bindings that allow for imports of the form `from apache_beam.portability.api import beam_runner_api_pb2` so that the SDK is not dependent on the potentially changing structure of the generated bindings within `api`. Imports of the form `from apache_beam.portability.api.org.apache.beam.model import beam_runner_api_pb2` are still supported. setup.py now attempts to generate the proto bindings on invocation since the package structure must exist before the wheel can be created.
…rder to support the new python output structure
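The relative-import rewriting described in the commit messages above could look roughly like this. It is a sketch under assumptions, not the PR's generator: `to_relative` and `rewrite_imports` are hypothetical names, and the regex assumes the generated code imports its dependencies with absolute `from org.apache.beam.model... import ...` lines.

```python
import re

# Assumed shape of an absolute import emitted into a generated *_pb2.py file.
ABSOLUTE_IMPORT = re.compile(
    r"^from (org\.apache\.beam\.model(?:\.\w+)*) import ", re.MULTILINE)


def to_relative(current_pkg: str, target_pkg: str) -> str:
    """Express target_pkg relative to the package containing the current file."""
    cur = current_pkg.split(".")
    tgt = target_pkg.split(".")
    common = 0
    for a, b in zip(cur, tgt):
        if a != b:
            break
        common += 1
    # One dot means "this package"; each extra dot climbs one level up.
    dots = "." * (len(cur) - common + 1)
    return dots + ".".join(tgt[common:])


def rewrite_imports(source: str, current_pkg: str) -> str:
    """Replace matching absolute imports with relative ones."""
    def repl(match):
        return "from %s import " % to_relative(current_pkg, match.group(1))
    return ABSOLUTE_IMPORT.sub(repl, source)


example = (
    "from google.protobuf import descriptor as _descriptor\n"
    "from org.apache.beam.model.pipeline.v1 import endpoints_pb2 as endpoints__pb2\n"
)
rewritten = rewrite_imports(example, "org.apache.beam.model.fn_execution.v1")
print(rewritten)
```

Relative imports mean the generated tree resolves correctly wherever it is nested inside `apache_beam.portability.api`, without adding `org` to `sys.path`.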
|
thanks @lostluck @lukecwik @tvalentyn @robertwb for the thoughtful reviews, i had fun on this one! 🙏🏽 |
|
Can one of the admins verify this patch? |
Generated protobuf files contain additional information about the messages and services they were compiled from such as the file path to the original source proto file. The protobuf runtime for Golang maintains a global registry of all protobufs being used by registering the descriptors using the file path to the source proto file. If multiple descriptors with the same source file path are registered to the global registry, then the initialization code will prevent startup by panic-ing and printing a message like this to stdout:
This behavior in the Go Protocol Buffer SDK is unlikely to go away:
This change aims to bring the protobuf imports in Beam to follow the guidance from a comment in this issue filed with the protobuf repo:
As such, I've elected to place each protobuf package in a directory `org/apache/beam/model` relative to its respective module root and have updated the build system where necessary.

P.S. This is still a work in progress, but opened the PR to socialize since this will affect all of Beam.
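To make the conflict described above concrete, here is a toy model of a registry keyed by source file path. It is written in Python purely for illustration and does not reproduce the Go protobuf runtime's code or its panic message; the class names and file names are hypothetical.

```python
class DuplicateSourcePathError(RuntimeError):
    """Raised when two different owners register the same source path."""


class ToyFileRegistry:
    """Toy stand-in for a global registry keyed by .proto source path."""

    def __init__(self):
        self._owners = {}

    def register(self, source_path, owner):
        existing = self._owners.get(source_path)
        if existing is not None and existing != owner:
            raise DuplicateSourcePathError(
                "%s already registered by %s" % (source_path, existing))
        self._owners[source_path] = owner


def try_register(registry, source_path, owner):
    """Return True if registration succeeded, False on a path conflict."""
    try:
        registry.register(source_path, owner)
        return True
    except DuplicateSourcePathError:
        return False


# Flat layout: two unrelated projects both compile a file named endpoints.proto.
flat = ToyFileRegistry()
flat.register("endpoints.proto", "beam")
flat_ok = try_register(flat, "endpoints.proto", "some_other_library")

# Namespaced layout: the Beam file carries its directory in the path.
namespaced = ToyFileRegistry()
namespaced.register("org/apache/beam/model/pipeline/v1/endpoints.proto", "beam")
namespaced_ok = try_register(namespaced, "endpoints.proto", "some_other_library")

print(flat_ok, namespaced_ok)
```

With a flat layout the second registration collides; once the path carries the `org/apache/beam/model` prefix, the same short file name no longer conflicts.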