refactor all gRPC usages to use Tonic instead of grpcio #11307
|
Caveat:
|
| } | ||
|
|
||
| #[tokio::test] | ||
| // TODO(tonic): Ignored this test because, while `ByteStore` is configured with both endpoints, |
Tonic did not seem to round robin all the time. This test technically relies on an implementation detail of grpcio.
I think this is fine - honestly I'd consider (as a follow-up) just taking a single String rather than Vec<String> for a CAS endpoint. Again, the only reason we supported multiple was because Scoot was trying to avoid server-side load balancing.
| Digest(fp, size_bytes) | ||
| } | ||
|
|
||
| // TODO(tonic): Replace use of this method with `.into` or equivalent. |
There are also quite a few places in the code where I open-coded what this method does. We should choose one way or the other. My concern is that digest == None may not be equivalent to digest == Some(EMPTY_DIGEST) in some places.
Yeah, digest == None is always an error indicating a non-compliant server :(
illicitonion left a comment
Looking great! Thanks!
| .digest | ||
| .as_ref() | ||
| .map(|d| d.try_into()) | ||
| .unwrap_or(Ok(EMPTY_DIGEST)); |
I think this case actually points at an invalid proto, and we should probably return an error, rather than defaulting to the empty digest? These fields are required, even if that's not enforceable in protobuf schema.
Maybe a helper in the bazel_protos crate:
fn require_digest(digest: Option<bazel_protos::Digest>) -> Result<hashing::Digest, String> {
or an
impl TryFrom<Option<bazel_protos::Digest>> for hashing::Digest
so you could just:
hashing::Digest::try_from(file.digest)?
(And throughout this PR)
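One way the suggested helper could look, as a minimal sketch. `ProtoDigest` and `Digest` below are simplified stand-ins for `bazel_protos::Digest` and `hashing::Digest` (the real types and their fingerprint parsing are not reproduced here), and the error wording is an assumption:

```rust
use std::convert::TryFrom;

/// Stand-in for `bazel_protos::Digest` (assumed shape: hex hash plus size).
#[derive(Clone, Debug, PartialEq)]
pub struct ProtoDigest {
    pub hash: String,
    pub size_bytes: i64,
}

/// Stand-in for `hashing::Digest`; the real type stores a parsed fingerprint.
#[derive(Clone, Debug, PartialEq)]
pub struct Digest {
    pub hash: String,
    pub size_bytes: usize,
}

/// Treats a missing digest as an invalid proto instead of silently
/// defaulting to the empty digest.
pub fn require_digest(digest: Option<&ProtoDigest>) -> Result<Digest, String> {
    // A `None` here means the server omitted a required field.
    let d = digest.ok_or_else(|| "Protocol violation: digest missing from proto".to_owned())?;
    // Protobuf sizes are signed; reject negative values rather than casting.
    let size_bytes = usize::try_from(d.size_bytes)
        .map_err(|e| format!("Invalid digest size {}: {}", d.size_bytes, e))?;
    Ok(Digest {
        hash: d.hash.clone(),
        size_bytes,
    })
}
```

A call site would then read `let digest = require_digest(file.digest.as_ref())?;`, which surfaces the missing field as an error at the point of use.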
I did try impl TryFrom<Option<bazel_protos::Digest>> for hashing::Digest but it fails due to limitations in std and the current Rust stable versions:
error[E0119]: conflicting implementations of trait `std::convert::TryFrom<std::option::Option<gen::build::bazel::remote::execution::v2::Digest>>` for type `hashing::Digest`:
--> process_execution/bazel_protos/src/conversions.rs:46:1
|
46 | impl TryFrom<Option<crate::gen::build::bazel::remote::execution::v2::Digest>> for hashing::Digest {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: conflicting implementation in crate `core`:
- impl<T, U> TryFrom<U> for T
where U: Into<T>;
= note: upstream crates may add a new impl of trait `std::convert::From<std::option::Option<gen::build::bazel::remote::execution::v2::Digest>>` for type `hashing::Digest` in future versions
error[E0117]: only traits defined in the current crate can be implemented for arbitrary types
--> process_execution/bazel_protos/src/conversions.rs:46:1
|
46 | impl TryFrom<Option<crate::gen::build::bazel::remote::execution::v2::Digest>> for hashing::Digest {
| ^^^^^------------------------------------------------------------------------^^^^^---------------
| | | |
| | | `hashing::Digest` is not defined in the current crate
| | `std::option::Option` is not defined in the current crate
| impl doesn't use only types from inside the current crate
|
= note: define and implement a trait or new type instead
I could move all of the conversions to hashing to try and avoid the cycle between the crates. But there is a generic implementation for TryFrom that conflicts with any other implementations, and we would apparently need specialization in stable and not nightly to be able to do the Option bit to avoid the catch-all generic impl. rust-lang/rust#50133
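The compiler note ("define and implement a trait or new type instead") points at a workaround that needs no specialization: a trait owned by the same crate as the impl. The sketch below is illustrative only; `OptionDigestExt`, `ProtoDigest`, and `Digest` are hypothetical stand-ins, not the names used in this PR:

```rust
use std::convert::TryFrom;

/// Stand-in for the generated `Digest` proto type.
#[derive(Clone, Debug, PartialEq)]
pub struct ProtoDigest {
    pub hash: String,
    pub size_bytes: i64,
}

/// Stand-in for `hashing::Digest`.
#[derive(Clone, Debug, PartialEq)]
pub struct Digest {
    pub hash: String,
    pub size_bytes: usize,
}

/// Because this trait is defined locally, implementing it for `Option<_>`
/// does not trip the orphan rule (E0117), and it cannot overlap with the
/// blanket `TryFrom` impl in `core` (E0119).
pub trait OptionDigestExt {
    fn require_digest(&self) -> Result<Digest, String>;
}

impl OptionDigestExt for Option<ProtoDigest> {
    fn require_digest(&self) -> Result<Digest, String> {
        match self {
            Some(d) => {
                let size_bytes = usize::try_from(d.size_bytes)
                    .map_err(|e| format!("Invalid digest size {}: {}", d.size_bytes, e))?;
                Ok(Digest {
                    hash: d.hash.clone(),
                    size_bytes,
                })
            }
            // A missing digest is reported as an error, not as the empty digest.
            None => Err("Protocol violation: digest missing from proto".to_owned()),
        }
    }
}
```

With the trait in scope, a call site becomes `file.digest.require_digest()?`, which is close to the ergonomics of the `TryFrom` version.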
Oh well, a method is just as usable :) Thanks for trying!
Will go with a method, I actually had already introduced one called to_pants_digest_opt as part of these "half one way, half the other way" solutions.
| .digest | ||
| .as_ref() | ||
| .map(|d| d.try_into()) | ||
| .unwrap_or(Ok(EMPTY_DIGEST)); |
| let bytes = directory | ||
| .write_to_bytes() | ||
| .map_err(|e| format!("Error serializing directory proto {:?}: {:?}", directory, e))?; | ||
| let mut buf = BytesMut::with_capacity(directory.encoded_len()); |
It feels like a helper function along the lines of:
fn to_bytes<Message: prost::Message>(m: &Message) -> Bytes {could be handy?
Yes, probably better done as a separate PR?
@illicitonion: Thanks a lot for the detailed review! I'll wait a round before reviewing.
I can tackle this if you'd prefer. I'm still trying to land #11256, which is somewhat related.
I'm fine with waiting. Gives me time to try and figure out the stack size issue. Frankly, I don't know if it is the remote execution code, the remote store code, or the test servers.
@Eric-Arellano: Yes, please. I am not familiar with what tooling exactly was required for building grpcio other than cmake. So doing the removal in a separate PR would be good.
Partial fix for the stack issue: I removed the
Tracked the cause of the stack overflow to pants/src/rust/engine/fs/store/src/remote.rs (lines 327 to 333 in f9b98ec). The stack overflow does not occur if
cc0527a to
7b3be4a
Solved the issue by just boxing the future being passed to
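The effect of boxing can be illustrated with a minimal standalone sketch. The call that received the boxed future is not shown in the comment above, so `large_future`, `BoxedFuture`, `boxed`, and `sizes` are all hypothetical names; the sketch only demonstrates that `Box::pin` moves a future's state to the heap and leaves a pointer-sized handle to pass around:

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

/// Hypothetical future that keeps a 16 KiB buffer alive across an `.await`,
/// so the buffer becomes part of the future's own state.
pub async fn large_future(seed: u8) -> usize {
    let mut buf = [0u8; 16 * 1024];
    buf[0] = seed;
    std::future::ready(()).await;
    buf.iter().map(|b| *b as usize).sum()
}

/// Heap-allocated, type-erased future.
pub type BoxedFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

/// Boxing moves the state to the heap; callers only handle a fat pointer.
pub fn boxed(seed: u8) -> BoxedFuture<usize> {
    Box::pin(large_future(seed))
}

/// Returns (size of the unboxed future, size of the boxed handle) in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of_val(&large_future(1)), size_of_val(&boxed(1)))
}
```

An unboxed future is moved by value into every combinator or call that wraps it, so a large state can occupy stack space several times over. The boxed handle stays the size of two pointers regardless of how large the state is.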
Added some commits to actually get the Tonic-based code working when invoked on Toolchain's internal repo: (1) Enter the executor when initializing the scheduler so that Tonic can use
(Pants needs a better story around integration testing of the remote cache and execution code, or at least a way to have unit/functional tests of the setup/init logic.)
@illicitonion: I made the suggested cleanups for digest conversion. Thanks!
Thanks to #11307, we no longer need to install all these tools. [ci skip-build-wheels]
### Problem
#11307 upgraded the gRPC code to [Tonic](https://github.com/hyperium/tonic) from grpcio. As part of that upgrade, Pants now uses [Prost](https://github.com/danburkert/prost) to generate Rust structs for protobuf types. By default, Prost represents binary fields as `Vec<u8>`, which copies the underlying bytes every time a struct is cloned. Representing those fields as `Bytes` avoids the unnecessary copies.
### Solution
Upgrade Prost to danburkert/prost@a1cccbc and enable using `Bytes` for all binary fields.
### Result
Existing tests pass.
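The copy-on-clone cost described in the Problem section can be sketched with the standard library alone. `Arc<[u8]>` is used here only as a stand-in for `bytes::Bytes`, which is likewise a reference-counted buffer; the function names are hypothetical:

```rust
use std::sync::Arc;

/// Cloning a `Vec<u8>` allocates a new buffer and copies every byte,
/// so the clone never points at the original allocation.
pub fn vec_clone_shares_buffer(data: &Vec<u8>) -> bool {
    let copy = data.clone();
    copy.as_ptr() == data.as_ptr()
}

/// Cloning a reference-counted buffer only bumps a counter; both handles
/// point at the same allocation.
pub fn arc_clone_shares_buffer(data: &Arc<[u8]>) -> bool {
    let copy = Arc::clone(data);
    Arc::ptr_eq(&copy, data)
}
```

For a struct with a large binary field, the first behavior means every `clone()` of the struct costs a full copy of the payload, while the second keeps clones cheap.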
Follow-up to #11307 (comment) to add a `to_bytes` helper as an extension function to encode `prost::Message` protobuf types to `Bytes`. Includes roundtrip unit test.
Problem
In testing the remote cache code against Toolchain's remote cache cluster (with both an AWS ALB load balancer and nginx ingress), the Pants client was erroring when receiving HTTP/2 GOAWAY frames, which the HTTP/2 standard allows in the ordinary course of shutting down a connection. A GOAWAY frame with no error attached should not cause an error: it is just the server closing the connection, which it is allowed to do. The client should just reconnect.
Pants uses a very old version of the grpcio library (v0.5.x), so this issue could be a grpcio bug that has already been fixed in a more recent 1.x series release. Long term, we want to switch to Tonic, so instead of upgrading grpcio and hoping the issue is fixed, the better use of time is to port Pants to Tonic.
Solution
Port all gRPC usage in Pants to the Tonic library and the Prost protobuf code generator. This provides idiomatic Rust bindings for gRPC and protobuf and is directly integrated into tokio and hyper.
Result
Existing tests continue to pass.