
Clean up descriptor.upbdefs dependency of BUILD#24953

Merged
veblush merged 1 commit into grpc:master from veblush:upb-descriptor
Jan 16, 2021

@veblush
Contributor

@veblush veblush commented Dec 9, 2020

Fixing #24904.

The idea here is to remove descriptor.upbdefs.* from the Bazel BUILD file, because it conflicts with upb_lib_descriptor, while keeping it in the non-Bazel build configurations, which still need it.

In addition, some upb_lib_descriptor dependencies are replaced with upb_lib_descriptor_reflection (*.upbdefs is for reflection).

Internal counterpart cl/346678433 got submitted.

@veblush veblush added lang/core area/build release notes: no Indicates if PR should not be in release notes labels Dec 9, 2020
@veblush veblush force-pushed the upb-descriptor branch 2 times, most recently from d3494d6 to 749f5ea December 10, 2020 00:24
@veblush veblush marked this pull request as ready for review December 10, 2020 03:34
@veblush veblush requested a review from markdroth December 10, 2020 03:34
Member

@markdroth markdroth left a comment

This looks reasonable to me, but please also get a review from @jtattermusch before merging.

Thanks for fixing this!

Contributor

@jtattermusch jtattermusch left a comment

I had a few questions


# TODO(veblush): Remove this workaround once upb is supported well
# for both Bazel and non-Bazel (https://github.com/grpc/grpc/issues/24904)
google_api_upbdefs_rule = bazel_rules["//:google_api_upbdefs"]
Contributor

Since this hack manually adds a descriptor.upbdefs.c and descriptor.upbdefs.h dependency to one specific target, //:google_api_upbdefs, you'd have to repeat it for every library that depends on upb_lib_descriptor. That seems hard to maintain, and over time people will run into it (and they won't be able to solve it themselves, as this whole issue is quite cryptic).

I don't know of a better solution right now (I'd have to spend some time thinking about it and experimenting), but at least you could:

  • introduce a separate function (e.g. _inject_ubp_descriptor_dependency) with a proper description that explains why this hack is being used. I think applying this hack at the top level is an invitation for folks to add more similar hacks (and I don't like that). The function should also make it explicit that there are some libraries (currently just //:google_api_upbdefs, but there may be more in the future) where the upb descriptor.proto dependency needs to be added manually. If there is a reason why only :google_api_upbdefs will ever need the upb_lib_descriptor dependency, please include that reasoning in the comment in that function.

Contributor

Idea: maybe there's some way to adjust the logic from https://github.com/grpc/grpc/pull/24925/files to fix this without having to manually list the libraries that depend on descriptor.upbdefs.c and descriptor.upbdefs.h?

Contributor Author

It turns out that this is unnecessary because src/upb/gen_build_yaml.py already includes it. Thanks!

Comment thread BUILD
external_deps = [
"upb_lib",
"upb_lib_descriptor",
"upb_lib_descriptor_reflection",
Contributor

I'm puzzled by this: can you elaborate a bit on why replacing upb_lib_descriptor with upb_lib_descriptor_reflection does anything when the conflict we're seeing is "undeclared dependency on descriptor.upbdefs.h"? (It seems like the _reflection version of the library would also depend on descriptor.upbdefs.h and descriptor.upbdefs.c.)

Contributor Author

These replacements are not meant to fix the error. They're updated simply because these targets are supposed to depend on it. For example, envoy_ads_upbdefs should depend on upb_lib_descriptor_reflection because it's a collection of upb reflections (*.upbdefs), so it has to rely on the reflection of descriptor (upb_lib_descriptor_reflection).

To be honest, I don't know why this PR happens to fix the original error. I think the gRPC build should work without this PR, but it sometimes fails on Windows. My guess is that the upb project has descriptor.upbdefs.* files generated by one of its targets, and gRPC also has files with the same names in its project. The mystery is that gRPC doesn't actually depend on the upb target that generates those files (gRPC just embeds them instead of depending on it), yet Bazel on Windows appears to be confused by this.

So I just made gRPC depend on that target instead of embedding the files, which seems to work. :)
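The convention described here (a `*_upbdefs` reflection target should use `upb_lib_descriptor_reflection`, not plain `upb_lib_descriptor`) could in principle be checked mechanically. A minimal sketch, assuming the external dependencies are available as a dict from target label to a list of dependency names; that representation, the function name, and the `//:envoy_ads_upb` label in the example are assumptions for illustration, not part of gRPC's build tooling.

```python
from typing import Dict, List


def find_upbdefs_without_reflection(external_deps: Dict[str, List[str]]) -> List[str]:
    """Return *_upbdefs targets that use upb_lib_descriptor without the
    reflection variant, sorted by label."""
    offenders = []
    for target, deps in sorted(external_deps.items()):
        if not target.endswith("_upbdefs"):
            continue  # only reflection collections (*.upbdefs) are checked
        if "upb_lib_descriptor" in deps and "upb_lib_descriptor_reflection" not in deps:
            offenders.append(target)
    return offenders


if __name__ == "__main__":
    deps = {
        "//:envoy_ads_upb": ["upb_lib", "upb_lib_descriptor"],
        "//:envoy_ads_upbdefs": ["upb_lib", "upb_lib_descriptor"],
        "//:google_api_upbdefs": ["upb_lib", "upb_lib_descriptor_reflection"],
    }
    print(find_upbdefs_without_reflection(deps))  # ['//:envoy_ads_upbdefs']
```

Only targets that already pull in the descriptor are flagged, since a `*_upbdefs` target whose protos never import descriptor.proto needs neither library.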

@jtattermusch
Contributor

Interestingly, when I try to build the latest version of this PR (using the repro mentioned in the issue), I get a different error:

bazel build //:grpc++ --spawn_strategy=local

results in

ERROR: /usr/local/google/home/jtattermusch/.cache/bazel/_bazel_jtattermusch/473402ebb46aca5e4ea9eb310c6f9e93/external/com_google_protobuf/BUILD:354:15: Generating upb protos for :descriptor_proto failed (Exit 1): protoc failed: error executing command bazel-out/host/bin/external/com_google_protobuf/protoc '--upb_out=bazel-out/k8-fastbuild/bin/external/com_google_protobuf' '--plugin=protoc-gen-upb=bazel-out/host/bin/external/upb/protoc-gen-upb' ... (remaining 2 argument(s) skipped)
bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied
Target //:grpc++ failed to build
Use --verbose_failures to see the command lines of failed build steps.

@veblush is that something you're seeing as well?

@veblush
Contributor Author

veblush commented Dec 15, 2020

@jtattermusch Oh that's a good point. Thanks! I was able to reproduce this problem.

On master

$ bazel build :grpc++
ERROR: C:/cygwin64/home/kbuilder/grpc/BUILD:3188:1: undeclared inclusion(s) in rule '//:google_api_upbdefs':
this rule is missing dependency declarations for the following files included by 'src/core/ext/upbdefs-generated/google/protobuf/descriptor.upbdefs.c':
  'bazel-out/x64_windows-fastbuild/bin/external/com_google_protobuf/google/protobuf/descriptor.upbdefs.h'

On upb-descriptor

$ bazel build :grpc++
ERROR: C:/cygwin64/home/kbuilder/_bazel_kbuilder/7z4tdihz/external/com_google_protobuf/BUILD:354:2: Generating upb protos for :descriptor_proto failed (Exit 1)
bazel-out/x64_windows-fastbuild/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied
Target //:grpc++ failed to build

@veblush
Contributor Author

veblush commented Dec 16, 2020

After some tests, it appears that this PR builds only with Bazel 3.7.1; I guess Bazel 2.2 doesn't work well with @upb:descriptor_upb_proto_reflection. Bazel 3.7 still doesn't build gRPC without this PR, though, so I expect this to start working together with the other PR (#24981), which bumps the Bazel version to 3.7.1.

@veblush veblush changed the title Add workaround for descriptor.upbdefs Clean up descriptor.upbdefs dependency of BUILD Dec 16, 2020
@veblush
Contributor Author

veblush commented Dec 18, 2020

Although this builds fine on Windows after upgrading Bazel to 3.7.1, it still doesn't work on Linux when built with --spawn_strategy=local.

$ bazel clean --expunge

$ bazel build grpc++ --spawn_strategy=local --verbose_failures
INFO: Running bazel wrapper (see //tools/bazel for details), bazel version 3.7.1 will be used instead of system-wide bazel installation.
ERROR: /usr/local/google/home/veblush/.cache/bazel/_bazel_veblush/c4652c20fd8d5880d194bf82693e4fee/external/com_google_protobuf/BUILD:354:15: Generating upb protos for :descriptor_proto failed (Exit 1): protoc failed: error executing command 
  (cd /usr/local/google/home/veblush/.cache/bazel/_bazel_veblush/c4652c20fd8d5880d194bf82693e4fee/execroot/com_github_grpc_grpc && \
  exec env - \
  bazel-out/host/bin/external/com_google_protobuf/protoc '--upb_out=bazel-out/k8-fastbuild/bin/external/com_google_protobuf' '--plugin=protoc-gen-upb=bazel-out/host/bin/external/upb/protoc-gen-upb' '--descriptor_set_in=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/descriptor_proto-descriptor-set.proto.bin' google/protobuf/descriptor.proto)
Execution platform: @local_config_platform//:host
bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied
Target //:grpc++ failed to build

$ ls -al bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf
total 228
drwxr-x--- 2 veblush primarygroup   4096 Dec 17 19:22 .
drwxr-x--- 3 veblush primarygroup   4096 Dec 17 19:22 ..
-rw-r--r-- 1 veblush primarygroup  16550 Dec 17 19:22 descriptor.upb.c
-r-xr-xr-x 1 veblush primarygroup  46180 Dec 17 19:22 descriptor.upbdefs.c
-r-xr-xr-x 1 veblush primarygroup   7271 Dec 17 19:22 descriptor.upbdefs.h
-rw-r--r-- 1 veblush primarygroup 145081 Dec 17 19:22 descriptor.upb.h

From the error log, it appears that protoc failed to generate descriptor.upb.c because of file permissions: the file was read-only (r-x), which blocks its modification.
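A read-only diagnosis like this can be confirmed directly from file modes. Below is a small Python sketch (the helper name `read_only_files` is hypothetical) that lists regular files in a directory whose owner-write bit is cleared. It inspects the mode bits rather than calling `os.access`, so the answer does not change when run as root.

```python
import os
import stat
from typing import List


def read_only_files(directory: str) -> List[str]:
    """Names of regular files in `directory` whose owner-write bit is cleared.

    Checks the mode bits directly (not os.access), so the answer is the same
    even when run as root.
    """
    names = []
    for name in sorted(os.listdir(directory)):
        mode = os.stat(os.path.join(directory, name)).st_mode
        if stat.S_ISREG(mode) and not mode & stat.S_IWUSR:
            names.append(name)
    return names
```

From the workspace root it would be called as `read_only_files("bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf")`, the same directory listed above.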

@veblush
Contributor Author

veblush commented Dec 18, 2020

It appears that this sometimes works, so there is some nondeterminism involved, presumably the order in which targets are built.

$ bazel clean --expunge

$ bazel build grpc++ --spawn_strategy=local --verbose_failures
INFO: Running bazel wrapper (see //tools/bazel for details), bazel version 3.7.1 will be used instead of system-wide bazel installation.
Target //:grpc++ up-to-date:
  bazel-bin/libgrpc++.a
  bazel-bin/libgrpc++.so

$ ls -al bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf
total 216
drwxr-x--- 2 veblush primarygroup   4096 Dec 17 21:16 .
drwxr-x--- 3 veblush primarygroup   4096 Dec 17 20:06 ..
-r-xr-xr-x 1 veblush primarygroup  16427 Dec 17 21:16 descriptor.upb.c
-r-xr-xr-x 1 veblush primarygroup  46180 Dec 17 21:16 descriptor.upbdefs.c
-r-xr-xr-x 1 veblush primarygroup   7271 Dec 17 21:16 descriptor.upbdefs.h
-r-xr-xr-x 1 veblush primarygroup 134803 Dec 17 21:16 descriptor.upb.h

@veblush
Contributor Author

veblush commented Dec 18, 2020

This is my guess on what actually happened.

Upb has two types of code-generating rules: upb_proto_library and upb_proto_reflection_library. Both use the same code generator, which happens to generate *.upb.* and *.upbdefs.* every time. As a result, upb_proto_library(proto) creates proto.upb.* and proto.upbdefs.*, and upb_proto_reflection_library(proto) does the same.

This should be fine when building with the sandboxed spawn_strategy, because each action runs in an isolated environment. But it can go wrong with the local spawn_strategy, because these two targets can share the same output directory. Suppose there is a target relying on both upb_proto_library(descriptor) and upb_proto_reflection_library(descriptor). Bazel will execute the following steps:

  • [G1] build upb_proto_library(descriptor): generates descriptor.upb.* and descriptor.upbdefs.* in the protobuf directory.
  • [G2] build upb_proto_reflection_library(descriptor): generates descriptor.upb.* and descriptor.upbdefs.* in the protobuf directory.

Bazel also appears to make generated files read-only once they are used as a rule's declared outputs. So for the target above, there are two more steps:

  • [M1] make descriptor.upb.* read-only for upb_proto_library(descriptor).
  • [M2] make descriptor.upbdefs.* read-only for upb_proto_reflection_library(descriptor).

This read-only behavior can be seen in the listings above, where files have permission -r-xr-xr-x. The only ordering constraints among these four steps are that [M1] runs after [G1] and [M2] runs after [G2], so several execution orders are possible. Let's look at two of them.

  1. [G1] [M1] [G2] [M2]: [M1] makes descriptor.upb.* read-only, so [G2] fails.
  2. [G1] [G2] [M1] [M2]: [G1] and [G2] both happen before [M1] and [M2], so it succeeds.

This is why the build sometimes succeeds and sometimes fails. This explanation is consistent with the inotify logs below.

Case 1 [G1] [M1] [G2] [M2]: Fail

$ inotifywait -r -m bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf
CREATE descriptor.upb.c
OPEN descriptor.upb.c
MODIFY descriptor.upb.c
CLOSE_WRITE,CLOSE descriptor.upb.c
CREATE descriptor.upb.h
OPEN descriptor.upb.h
MODIFY descriptor.upb.h
CLOSE_WRITE,CLOSE descriptor.upb.h
CREATE descriptor.upbdefs.c
OPEN descriptor.upbdefs.c
MODIFY descriptor.upbdefs.c
CLOSE_WRITE,CLOSE descriptor.upbdefs.c
CREATE descriptor.upbdefs.h
OPEN descriptor.upbdefs.h
MODIFY descriptor.upbdefs.h
CLOSE_WRITE,CLOSE descriptor.upbdefs.h
ATTRIB descriptor.upb.c
OPEN descriptor.upb.c
ACCESS descriptor.upb.c
CLOSE_NOWRITE,CLOSE descriptor.upb.c
ATTRIB descriptor.upb.h
OPEN descriptor.upb.h
ACCESS descriptor.upb.h
CLOSE_NOWRITE,CLOSE descriptor.upb.h

Case 2 [G1] [G2] [M1] [M2]: Success

$ inotifywait -r -m bazel-out/k8-fastbuild/bin/external/com_google_protobuf/google/protobuf
CREATE descriptor.upb.c
OPEN descriptor.upb.c
MODIFY descriptor.upb.c
CLOSE_WRITE,CLOSE descriptor.upb.c
CREATE descriptor.upb.h
OPEN descriptor.upb.h
MODIFY descriptor.upb.h
CLOSE_WRITE,CLOSE descriptor.upb.h
CREATE descriptor.upbdefs.c
OPEN descriptor.upbdefs.c
MODIFY descriptor.upb.c
OPEN descriptor.upb.c
MODIFY descriptor.upbdefs.c
CLOSE_WRITE,CLOSE descriptor.upbdefs.c
CREATE descriptor.upbdefs.h
OPEN descriptor.upbdefs.h
MODIFY descriptor.upb.c
MODIFY descriptor.upbdefs.h
CLOSE_WRITE,CLOSE descriptor.upb.c
CLOSE_WRITE,CLOSE descriptor.upbdefs.h
MODIFY descriptor.upb.h
OPEN descriptor.upb.h
MODIFY descriptor.upb.h
CLOSE_WRITE,CLOSE descriptor.upb.h
MODIFY descriptor.upbdefs.c
OPEN descriptor.upbdefs.c
MODIFY descriptor.upbdefs.c
CLOSE_WRITE,CLOSE descriptor.upbdefs.c
MODIFY descriptor.upbdefs.h
OPEN descriptor.upbdefs.h
MODIFY descriptor.upbdefs.h
CLOSE_WRITE,CLOSE descriptor.upbdefs.h
ATTRIB descriptor.upbdefs.c
OPEN descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
ACCESS descriptor.upbdefs.c
CLOSE_NOWRITE,CLOSE descriptor.upbdefs.c
ATTRIB descriptor.upbdefs.h
OPEN descriptor.upbdefs.h
ACCESS descriptor.upbdefs.h
CLOSE_NOWRITE,CLOSE descriptor.upbdefs.h
ATTRIB descriptor.upb.c
OPEN descriptor.upb.c
ACCESS descriptor.upb.c
CLOSE_NOWRITE,CLOSE descriptor.upb.c
ATTRIB descriptor.upb.h
OPEN descriptor.upb.h
ACCESS descriptor.upb.h
CLOSE_NOWRITE,CLOSE descriptor.upb.h
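The ordering argument can be checked exhaustively. This Python sketch models the four steps under the assumptions stated in the explanation: each generator step writes all four descriptor files, each M step makes its rule's declared outputs read-only, and a generator step fails if any file it writes is already read-only. It enumerates every order allowed by the two constraints ([M1] after [G1], [M2] after [G2]) and reports which fail. It models the hypothesis only, not Bazel's actual scheduler.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

UPB = frozenset({"descriptor.upb.c", "descriptor.upb.h"})
UPBDEFS = frozenset({"descriptor.upbdefs.c", "descriptor.upbdefs.h"})

# [G1]/[G2]: before upb#356 the generator wrote both *.upb.* and *.upbdefs.*
# for either rule, into the same directory under --spawn_strategy=local.
WRITES: Dict[str, FrozenSet[str]] = {"G1": UPB | UPBDEFS, "G2": UPB | UPBDEFS}
# [M1]/[M2]: Bazel makes each rule's declared outputs read-only.
LOCKS: Dict[str, FrozenSet[str]] = {"M1": UPB, "M2": UPBDEFS}
# Ordering constraints: each M step runs after its G step.
AFTER: Dict[str, str] = {"M1": "G1", "M2": "G2"}

Order = Tuple[str, ...]


def valid_orders() -> List[Order]:
    """All orders of the four steps that respect the constraints."""
    return [
        order
        for order in permutations(("G1", "G2", "M1", "M2"))
        if all(order.index(m) > order.index(g) for m, g in AFTER.items())
    ]


def succeeds(order: Order) -> bool:
    """Replay the steps; a G step fails if any file it writes is read-only."""
    read_only: Set[str] = set()
    for step in order:
        if step in WRITES:
            if WRITES[step] & read_only:
                return False  # protoc reports "Permission denied"
        else:
            read_only |= LOCKS[step]
    return True


if __name__ == "__main__":
    for order in valid_orders():
        print(" ".join(order), "OK" if succeeds(order) else "FAIL")
```

There are six allowed orders and exactly two fail: Case 1, [G1] [M1] [G2] [M2], and its mirror image, [G2] [M2] [G1] [M1], in which [M2] locks descriptor.upbdefs.* before [G1] rewrites them.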

@veblush
Contributor Author

veblush commented Dec 18, 2020

Okay, I've roughly figured out what's wrong with the Linux build with the local spawn_strategy. This PR doesn't solve the Linux build issue, but it does solve the Windows issue, which is a blocker for the Cloud C++ library's gRPC upgrade. @jtattermusch Shall we merge this?

@markdroth
Member

> Upb has two types of code generating rules; upb_proto_library and upb_proto_reflection_library. Both use the same code generator which happens to generate *.upb.* and *.upbdefs.* every time. As a result, upb_proto_library(proto) will create proto.upb.* and proto.upbdefs.*. upb_proto_reflection_library(proto) will do the same thing.

Given that bazel always wants to have each file created by exactly one rule, this seems like a problem. @haberman, can we change the code generator to generate only the relevant targets in both upb_proto_library and upb_proto_reflection_library?

@haberman
Contributor

haberman commented Dec 21, 2020

> Given that bazel always wants to have each file created by exactly one rule, this seems like a problem.

I think this was more or less Bazel-correct, because each rule only declared one set of outputs. It just so happened that the code generator binary would output both, but it doesn't usually matter if a binary creates extra files that are not declared outputs; Bazel will just ignore them.

The problem was arising when Bazel was used in non-sandboxed mode, because multiple concurrent runs of the code generator would be sharing the same output directory, and so would conflict as the parallel invocations were not fully isolated from each other.

> @haberman, can we change the code generator to generate only the relevant targets in both upb_proto_library and upb_proto_reflection_library?

That was done in protocolbuffers/upb#356
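The distinction between declared outputs and files a tool merely happens to write can be made concrete with two checks: whether any files appear in more than one rule's set, and which files each rule writes without declaring them. A minimal sketch with hand-written data for the two upb rules; the file names mirror the listings earlier in the thread, and the after-fix state assumes, per protocolbuffers/upb#356, that each rule's generator emits only its own outputs. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet

UPB = frozenset({"descriptor.upb.c", "descriptor.upb.h"})
UPBDEFS = frozenset({"descriptor.upbdefs.c", "descriptor.upbdefs.h"})

Outputs = Dict[str, FrozenSet[str]]

# What each rule declares to Bazel: these never overlap.
DECLARED: Outputs = {
    "upb_proto_library": UPB,
    "upb_proto_reflection_library": UPBDEFS,
}
# What the shared code generator actually wrote before upb#356.
WRITTEN_BEFORE: Outputs = {rule: UPB | UPBDEFS for rule in DECLARED}
# After upb#356, each rule's generator emits only its declared outputs.
WRITTEN_AFTER: Outputs = dict(DECLARED)


def files_in_conflict(outputs: Outputs) -> FrozenSet[str]:
    """Files that appear in more than one rule's file set."""
    counts = Counter(f for files in outputs.values() for f in files)
    return frozenset(f for f, n in counts.items() if n > 1)


def undeclared(written: Outputs, declared: Outputs) -> Outputs:
    """Per rule, files the tool wrote without declaring them."""
    return {
        rule: written[rule] - declared[rule]
        for rule in written
        if written[rule] - declared[rule]
    }
```

Before the fix, `files_in_conflict(DECLARED)` is empty (so Bazel's bookkeeping is satisfied), while `files_in_conflict(WRITTEN_BEFORE)` contains all four files: harmless inside a sandbox, but a race in a shared local output directory. After the fix both checks come back empty.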

@markdroth
Member

Would this problem be fixed by updating to a version of upb that includes protocolbuffers/upb#356?

@veblush
Contributor Author

veblush commented Dec 21, 2020

I verified that the recent upb fix works with this. So upgrading our upb to protocolbuffers/upb@60607da should fix this Linux local Bazel build issue. This can be done in the ongoing #24987 or in a separate PR that upgrades it.

@veblush
Contributor Author

veblush commented Dec 22, 2020

The upb upgrade was done in #25037, so this now works.

Contributor

@jtattermusch jtattermusch left a comment

LGTM if this is still needed to fix #24904.

Thanks for taking the time to investigate the --spawn_strategy=local issue on linux.

@veblush veblush merged commit ea22dd6 into grpc:master Jan 16, 2021
aherrmann-da pushed a commit to digital-asset/daml that referenced this pull request Feb 2, 2021
To include grpc/grpc#24953 and
protocolbuffers/upb#356 which fix
https://github.com/protocolbuffers/upb/issues/354.

The issue manifested on Windows CI with errors of the form

```
bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied
```

See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545
aherrmann-da added a commit to digital-asset/daml that referenced this pull request Feb 3, 2021
To include grpc/grpc#24953 and
protocolbuffers/upb#356 which fix
https://github.com/protocolbuffers/upb/issues/354.

The issue manifested on Windows CI with errors of the form

```
bazel-out/x64_windows-opt/bin/external/com_google_protobuf/google/protobuf/descriptor.upb.c: Permission denied
```

See https://dev.azure.com/digitalasset/daml/_build/results?buildId=68545

changelog_begin
changelog_end

Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>