Clean up descriptor.upbdefs dependency of BUILD#24953
Conversation
d3494d6 to
749f5ea
Compare
markdroth
left a comment
There was a problem hiding this comment.
This looks reasonable to me, but please also get a review from @jtattermusch before merging.
Thanks for fixing this!
jtattermusch
left a comment
There was a problem hiding this comment.
I had a few questions
|
|
||
| # TODO(veblush): Remove this workaround once upb is supported well | ||
| # for both Bazel and non-Bazel (https://github.com/grpc/grpc/issues/24904) | ||
| google_api_upbdefs_rule = bazel_rules["//:google_api_upbdefs"] |
There was a problem hiding this comment.
Since in this hack you are manually adding a descriptor.upbdefs.c and descriptor.updefs.h depedency for a specific target //:google_api_upbdefs, it also means that you'd have to add this hack for all the libraries that depend on upb_lib_descriptor and it seems that that's something that's hard to maintain and also something that over time people will run into (and they won't be able to solve this themselves as this whole issue is quite cryptic).
I don't know of a better solution right now (I'd have to spend some time thinking about it and experimenting), but at least you could:
- introduce a separate function (e.g.
_inject_ubp_descriptor_dependency) with a proper description that explains why this hack is being used. I think this hack being applied at the top level is an invitation for folks to add more similar hacks (and I don't like that). Also the function should make it explicit that there are some libraries (currently it's just//:google_api_upbdefsbut it might be more of them in the future) where the upb descriptor.proto dependency needs to be added manually. If there is a reason why only:google_api_upbdefswill ever need the upb_lib_descriptor dependency, please include that reasoning in the comment in said function.
There was a problem hiding this comment.
Idea: Maybe there's some way to adjust the logic from https://github.com/grpc/grpc/pull/24925/files
to fix this without actually requiring to manually list the libraries that depend on descriptor.updefs.c and descriptor.upbdefs.h?
There was a problem hiding this comment.
It turns out that this is unnecessary because src/upb/gen_build_yaml.py already included it. Thanks!
| external_deps = [ | ||
| "upb_lib", | ||
| "upb_lib_descriptor", | ||
| "upb_lib_descriptor_reflection", |
There was a problem hiding this comment.
I'm puzzled by this: Can you elaborate a bit on why replacing upb_lib_descriptor with upb_lib_descriptor_reflection does anything when the conflict we're seeing is "undeclared dependency on descriptor.upbdefs.h"? (it seems like the _reflection version of the library would also depdend on descriptor.upbdefs.h and descriptor.upbdefs.c)
There was a problem hiding this comment.
These replacements are not meant to fix the error. They're merely updated because they're supposed to rely upon it. (e.g. envoy_ads_upbdefs should depend on upb_lib_descriptor_reflection because it's a collection of upb reflections (*.upbdefs) and it has to rely on the reflection of descriptor (upb_lib_descriptor_reflection).
Actually I don't know why this PR happens to fix the original error, to be honest. I think the gRPC build should work without this PR but it happen not to work sometimes on Windows. My guess is that the upb project has descriptor.upbdefs.* which will be generated by one of its targets and gRPC also has the same file name in its project. Mystery here is gRPC doesn't actually depend on the upb target which generates those files. (gRPC just embedded it instead of depending on it) But bazel on Windows appears to be confused by this.
Therefore, I just makes gRPC depend on the target instead of embedding it, which seems to work. :)
749f5ea to
a198f5f
Compare
|
Interestingly, when I try to build the latest version of this PR (using the repro mentioned in the issue), I get a different error: results in @veblush is that something you're seeing as well? |
|
@jtattermusch Oh that's a good point. Thanks! I was able to reproduce this problem. On masterOn upb-descriptor |
|
After some tests, it appears that bazel 3.7.1 can build this only with this PR. I guess bazel 2.2 doesn't go well with |
a198f5f to
5061f93
Compare
|
Although this is built well on Windows after upgrading Bazel to 3.7.1, it still doesn't work on Linux when building it with From the error logs, it appears that protoc failed to generate |
|
It appears that this sometimes works. So this has some indeterministic factor involved. Presumably an order of building targets. |
|
This is my guess on what actually happened. Upb has two types of code generating rules; This should be fine when building with the sandbox spawn_strategy because each target has an isolated environment. But, it could go wrong with the local spawn_strategy because these two targets can share the same directory. Let's assume there is a target relying on
And bazel appears to make generated files which will be used actually later readonly. So it will work like for the target above
This behavior making those files read-only can be observed from the files above with permission
This is why it sometimes succeeds and sometimes fails. This explanation is consistent with the file log below. Case 1 [G1] [M1] [G2] [M2]: FailCase 2 [G1] [G2] [M1] [M2]: Success |
|
Okay, I roughly figured out what's wrong with the linux build with local spawn_strategy. This PR doesn't solve the linux build issue but can solve the windows issue, which is a blocker for Cloud C++ library's gRPC upgrade. @jtattermusch Shall we merge this? |
Given that bazel always wants to have each file created by exactly one rule, this seems like a problem. @haberman, can we change the code generator to generate only the relevant targets in both |
I think this was more or less Bazel-correct, because each rule only declared one set of outputs. It just so happened that the code generator binary would output both, but it doesn't usually matter if a binary creates extra files that are not declared outputs, Bazel will just ignore them. The problem was arising when Bazel was used in non-sandboxed mode, because multiple concurrent runs of the code generator would be sharing the same output directory, and so would conflict as the parallel invocations were not fully isolated from each other.
That was done in protocolbuffers/upb#356 |
|
Would this problem be fixed by updating to a version of upb that includes protocolbuffers/upb#356? |
|
I verified the recent upb fix works with this. So upgrading our upb to protocolbuffers/upb@60607da should fix this linux local bazel build issue. This can be done with on-going #24987 or a separate PR upgrading it. |
|
Upb upgrade is done by #25037 so this is now working. |
5061f93 to
24c698f
Compare
There was a problem hiding this comment.
LGTM if this is still needed to fix #24904.
Thanks for taking the time to investigate the --spawn_strategy=local issue on linux.
To include grpc/grpc#24953 and protocolbuffers/upb#356 which fix https://github.com/protocolbuffers/upb/issues/354. The issue manifested on Windows CI with errors of the form ``` bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied ``` See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545
To include grpc/grpc#24953 and protocolbuffers/upb#356 which fix https://github.com/protocolbuffers/upb/issues/354. The issue manifested on Windows CI with errors of the form ``` bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied ``` See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545 changelog_begin changelog_end
To include grpc/grpc#24953 and protocolbuffers/upb#356 which fix https://github.com/protocolbuffers/upb/issues/354. The issue manifested on Windows CI with errors of the form ``` bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied ``` See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545 changelog_begin changelog_end
To include grpc/grpc#24953 and protocolbuffers/upb#356 which fix https://github.com/protocolbuffers/upb/issues/354. The issue manifested on Windows CI with errors of the form ``` bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied ``` See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545 changelog_begin changelog_end
To include grpc/grpc#24953 and protocolbuffers/upb#356 which fix https://github.com/protocolbuffers/upb/issues/354. The issue manifested on Windows CI with errors of the form ``` bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied ``` See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545 changelog_begin changelog_end Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>
Fixing #24904.
Idea here is removing
descriptor.upbdefs.*from bazel build file because it conflicts withupb_lib_descriptorwhile keeping it in the non-bazel build configuration because they still need them.In addition to this, some
upb_lib_descriptorare fixed withupb_lib_descriptor_reflection(upbdef is for reflection)Internal counterpart cl/346678433 got submitted.