Support connect-go protocol #1277
38ff8a1 to 1da8c33
4aaaca1 to 9fc5408
pkg/api/message/subscribe_test.go
Outdated
// Consume the keepalive message.
shouldReceive := stream.Receive()
require.True(t, shouldReceive)

// Validate the keepalive message.
msg := stream.Msg()
require.NotNil(t, msg)

// There shouldn't be any more messages.
shouldReceive = stream.Receive()
require.False(t, shouldReceive)

// Connect returns an empty (non-nil) message when the stream is closed.
msg = stream.Msg()
require.NotNil(t, msg)

// The stream should return an invalid argument error when the request is invalid.
err = stream.Err()
require.Error(t, err)
require.Equal(t, connect.CodeInvalidArgument, connect.CodeOf(err))
This handling feels fragile. Why does the test need to know exactly which messages are received and when, including the nil handling, the stream closing, and the keepalives?
All the test is doing is checking that the request is invalid.
Do we have to call Msg, Receive, Msg before we get an error?
I agree, modified to be way more elegant.
mkysel left a comment
nothing more to add
Switch node APIs from gRPC to Connect-go and serve Replication/Metadata/Payer handlers over HTTP/2 h2c while clamping QueryEnvelopes limit to max when 0 or above max and sending an initial keepalive in message.Service.SubscribeEnvelopes
Replace gRPC servers and interceptors with Connect-go handlers, request/response types, and HTTP/2 h2c serving; update gateway, server startup, interceptors, metrics, and tests to use Connect; generate Connect clients/handlers for all protobuf APIs; adjust error handling to connect.NewError and change subscribe to send an initial keepalive; clamp QueryEnvelopes limits and move client construction helpers to Connect.
📍 Where to Start
Start with the Connect service migration in message.Service at [file:pkg/api/message/service.go], then review server setup and registration changes in api.APIServer at [file:pkg/api/server.go] and server.startAPIServer at [file:pkg/server/server.go].
Changes since #1277 opened
TestSubscribeEnvelopesInvalidRequest test [a5d7161]
TestSubscribeSyncCursorBasic test [a5d7161]
TestGetNewestEnvelope test [a5d7161]
📊 Macroscope summarized a5d7161. 22 files reviewed, 63 issues evaluated, 55 issues filtered, 0 comments posted
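The limit clamping mentioned in the PR summary ("clamp QueryEnvelopes limit to max when 0 or above max") can be illustrated with a minimal sketch. This is not the code from pkg/api/message/service.go: the helper name `clampLimit`, the `uint32` type, and the value 1000 for `maxRequestedRows` are assumptions made for the example.

```go
package main

import "fmt"

// maxRequestedRows is an assumed cap; the real constant's value is not
// shown on this page.
const maxRequestedRows uint32 = 1000

// clampLimit returns ceiling when the requested limit is 0 (unset) or
// exceeds ceiling, and the requested value otherwise.
func clampLimit(requested, ceiling uint32) uint32 {
	if requested == 0 || requested > ceiling {
		return ceiling
	}
	return requested
}

func main() {
	for _, requested := range []uint32{0, 50, 1000, 5000} {
		fmt.Printf("requested=%d effective=%d\n",
			requested, clampLimit(requested, maxRequestedRows))
	}
}
```

Silent clamping keeps oversized requests cheap for the server, at the cost that a client asking for more rows than the cap gets fewer back without an error, which is the contract change one of the filtered issues points out.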
🗂️ Filtered Issues
cmd/xmtpd-cli/commands/generate.go — 0 comments posted, 3 evaluated, 3 filtered
stress.NewEnvelopesGenerator was changed to set cleanup to func() { cancel() } instead of closing the underlying transport connection (previously conn.Close()). As a result, welcomeMessageHandler's deferred generator.Close() (lines 96–101) now only cancels a timeout context and does not close any gRPC/gRPC-Web client connections created inside NewEnvelopesGenerator. This violates single paired cleanup and can leak network resources after the command completes. The handlers continue to call Close() expecting cleanup, but the transport is left open. [ Low confidence ]
stress.NewEnvelopesGenerator now sets cleanup to only cancel a context. groupMessageHandler defers generator.Close() (lines 197–202), but that no longer closes the gRPC client connection created inside NewEnvelopesGenerator. This breaks the single paired cleanup invariant and can leave open transports after command completion. [ Low confidence ]
stress.NewEnvelopesGenerator cleanup now only cancels a context. keyPackageHandler defers generator.Close() (lines 292–297), but this no longer closes the underlying gRPC client connection built inside NewEnvelopesGenerator. The transport remains open, violating single paired cleanup and potentially leaking resources. [ Low confidence ]
pkg/api/message/service.go — 0 comments posted, 8 evaluated, 8 filtered
SubscribeEnvelopes creates a time.Ticker with s.options.SendKeepAliveInterval and also calls ticker.Reset(s.options.SendKeepAliveInterval) on each channel receive. If SendKeepAliveInterval is zero or negative (e.g., misconfigured via XMTPD_API_SEND_KEEP_ALIVE_INTERVAL), time.NewTicker and ticker.Reset will panic with "non-positive interval for NewTicker/Reset". There is no validation enforcing a positive duration. This results in a runtime crash on subscription setup or during runtime when resetting the ticker. To fix, validate s.options.SendKeepAliveInterval > 0 up-front and reject the request or use a sane default before creating or resetting the ticker. [ Out of scope ]
limit to maxRequestedRows when a larger limit is requested (lines 334-339). Previously, the implementation allowed any non-zero limit from the request (per diff), only defaulting when zero. This is an externally visible contract change: clients requesting more than maxRequestedRows will receive fewer envelopes than requested, with no error or warning, which may cause unexpected behavior. [ Low confidence ]
In fetchEnvelopes, originator node IDs are converted from uint32 to int32 when populating queries.SelectGatewayEnvelopesByOriginatorsParams.OriginatorNodeIds. If a caller supplies a value greater than math.MaxInt32, the conversion will wrap to a negative int32, potentially causing incorrect query semantics or filtering. While it may not crash, it can lead to silently incorrect behavior. Consider validating the range or using a DB parameter type that preserves unsigned semantics. [ Previously rejected ]
requests processed. The handler builds addresses from req.Msg.Requests (lines 642-645) and queries the DB with the entire list (line 646) without enforcing a maximum. This can enable oversized requests leading to high memory usage and expensive queries, unlike QueryEnvelopes which enforces maxQueriesPerRequest. [ Low confidence ]
resp.InboxId to the value from the last matching logEntry encountered (lines 659-664) without ordering by or comparing AssociationSequenceID. If addressLogEntries contains multiple entries per address, the chosen inbox ID depends on DB result order, which may not reflect the most recent or correct association. [ Low confidence ]
topics before querying the database. There are constants like maxQueriesPerRequest and maxTopicLength used in validateQuery, but no analogous validation is performed here (lines 690-701). A caller can submit a very large list of topics or topics exceeding expected length, potentially causing excessive memory usage, oversized SQL array parameters, or DB errors. [ Low confidence ]
topics inputs. The code builds a single-index map originalSort keyed by string(topic) (lines 696-699) and later assigns a result only to the index returned for that key (lines 719-735). If the request includes the same topic multiple times, originalSort will contain only the last index for that topic, and earlier duplicate indices will remain nil in response.Msg.Results. This violates the externally visible contract where each provided topic should get its newest envelope (or null). [ Already posted ]
validateKeyPackage directly dereferences nested fields payload.UploadKeyPackage.KeyPackage.KeyPackageTlsSerialized without checking for nil. Both UploadKeyPackage and its inner KeyPackage are pointer fields in the protobuf types and can legitimately be nil in incoming requests. If either is nil, this will cause a panic due to a nil pointer dereference. The code should defensively check payload.UploadKeyPackage != nil and payload.UploadKeyPackage.KeyPackage != nil before accessing KeyPackageTlsSerialized, and return a connect error if they are missing. [ Low confidence ]
pkg/api/payer/service.go — 0 comments posted, 6 evaluated, 6 filtered
connect.CodeInternal where the previous implementation returned Unavailable. This change in error code semantics can alter client retry logic and error handling behavior. [ Low confidence ]
Unavailable on empty results; now it returns a successful response with an empty Nodes map. This is a contract/parity change that can break clients relying on the prior error to trigger retries or fallback. [ Low confidence ]
env.payload.Bytes() instead of a connect.Error, leading to inconsistent error typing and potentially incorrect HTTP/Connect status mapping. The handler otherwise consistently wraps errors with connect.NewError. Returning a plain error here can cause the framework to surface an unknown/incorrect status to clients. [ Low confidence ]
status.Errorf(codes.InvalidArgument, ...) (gRPC status) instead of connect.NewError when rejecting over-sized messages. This mixes error types in a Connect handler and can result in incorrect status mapping and response encoding for clients. [ Low confidence ]
connect.CodeInternal rather than connect.CodeInvalidArgument. When ParseGroupID(identifier) fails due to an invalid identifier length, the code returns connect.NewError(connect.CodeInternal, ...) at case topic.TopicKindGroupMessagesV1. This misclassifies a client input validation error as an internal server error and breaks the external contract by not signaling the caller that the request is invalid. [ Low confidence ]
connect.CodeInternal rather than connect.CodeInvalidArgument. When ParseInboxID(identifier) fails due to an invalid identifier length, the code returns connect.NewError(connect.CodeInternal, ...) at case topic.TopicKindIdentityUpdatesV1. This misclassifies a client input validation error as an internal server error and breaks the external contract by not signaling the caller that the request is invalid. [ Low confidence ]
pkg/api/server.go — 0 comments posted, 2 evaluated, 2 filtered
APIServerConfig.PromRegistry and WithPrometheusRegistry are accepted/options-exposed but never used in NewAPIServer. In the prior gRPC implementation, the Prometheus registry was used to register gRPC server metrics (grpcprom.NewServerMetrics). After the refactor to HTTP/Connect handlers, cfg.PromRegistry is silently ignored, resulting in a loss of metrics registration and an externally visible contract change. Callers can pass a registry expecting metrics but receive none, with no error or warning. [ Low confidence ]
APIServer.Close can hang indefinitely when shutdown times out. At lines 166–181, Close creates a timeout context and calls svc.httpServer.Shutdown(shutdownCtx). If Shutdown returns an error due to the timeout (e.g., context deadline exceeded), the server may still be serving and the goroutine started in Start may remain blocked in Serve(svc.listener). The code then unconditionally calls svc.wg.Wait(). Without a forced shutdown (e.g., svc.httpServer.Close() or explicitly closing the listener) to unblock Serve, wg.Wait() can block forever, causing the whole shutdown sequence to hang. This is a regression compared to the previous gRPC implementation, which enforced a forced stop after a graceful timeout (GracefulStop then Stop). [ Already posted ]
pkg/gateway/builder.go — 0 comments posted, 2 evaluated, 2 filtered
Build may construct multiple long-lived resources (Redis client via ensureRedis/setupNonceManager, blockchain publisher via setupBlockchainPublisher, node registry via setupNodeRegistry, and a metrics server via setupMetrics) before calling buildGatewayService. If buildGatewayService later fails (e.g., due to an invalid payer private key or API server initialization error), the function returns an error without tearing down these previously created resources. Given the interfaces (IBlockchainPublisher.Close() and NodeRegistry.Stop() exist) and the metrics server likely holds a listener, this results in leaked connections/goroutines. To fix, ensure that on any error after creating these resources, they are properly closed/stopped (e.g., using defers for cleanup that are canceled when the build completes successfully), or structure the build to allocate resources after all preceding steps are guaranteed not to fail. [ Already posted ]
buildGatewayService creates a net.Listener with net.Listen but does not close it on all failure paths. Specifically, if api.NewAPIServer (or its internally invoked RegistrationFunc) returns an error, the function calls cancel() and returns the error without calling listener.Close(). This leaves the port bound and the listener resource leaked. The same occurs if any error is returned after the listener is created but before serving, e.g., in registrationFunc when payer.NewPayerAPIService fails. To fix, explicitly defer listener.Close() immediately after successful creation, and if startup succeeds and ownership is transferred to the API server, cancel the defer (or reassign responsibility) appropriately. Alternatively, close the listener in each error branch after it is created. [ Already posted ]
pkg/gateway/interceptor.go — 0 comments posted, 6 evaluated, 6 filtered
WrapUnary and WrapStreamingHandler rely on req.Peer().Addr and conn.Peer().Addr without verifying presence. If Peer() returns an object with an empty Addr or if the environment doesn't set peer info (e.g., certain in-process transports), downstream IdentityFn implementations that assume a non-empty address could panic or misbehave. There is no fallback or explicit error when peer address is empty. [ Low confidence ]
identityFn is not guarded against. WrapUnary calls i.identityFn(req.Header(), req.Peer().Addr) without checking that i.identityFn is non-nil. If the interceptor is constructed with a nil IdentityFn, the first request will panic due to a nil function call. [ Low confidence ]
WrapUnary iterates i.authorizers and calls each authorizer(ctx, identity, summary) without checking for nil. If any element in authorizers is nil, the request will panic. [ Low confidence ]
WrapStreamingHandler sets identity on context and calls next but does not apply the same authorizer checks as WrapUnary. If any streaming publish or sensitive operation exists, this asymmetry can allow unauthorized operations over streaming while being blocked over unary. [ Low confidence ]
GatewayServiceErrors. Previously, the code returned status.Error(rlError.Code(), rlError.ClientMessage()), ensuring a client-safe message. Now returnRetryAfterError wraps and returns the rlError itself via connect.NewError(rlError.Code(), rlError), potentially exposing internal error details instead of the intended client-facing ClientMessage(). This is a runtime behavior change that can leak implementation details to clients. [ Low confidence ]
Retry-After value when negative durations are provided. returnRetryAfterError formats the header as int(retryAfter.Seconds()). If a negative duration is returned by RetryAfter(), the emitted Retry-After will be a negative integer string, which is not a valid value per HTTP spec. There is no validation or clamping to ensure non-negative seconds. [ Low confidence ]
pkg/interceptors/client/auth.go — 0 comments posted, 2 evaluated, 2 filtered
In Unary and Stream, the code uses metadata.NewOutgoingContext(ctx, md) to attach the authorization header. NewOutgoingContext replaces any existing outgoing metadata in ctx with md, which can silently discard metadata previously set by callers or other interceptors. This can lead to loss of other headers (e.g., tracing, custom auth, per-RPC settings) and cause unexpected behavior downstream. To safely add the token without clobbering existing metadata, use metadata.AppendToOutgoingContext(ctx, constants.NodeAuthorizationHeaderName, token.SignedString) or merge with existing metadata via metadata.FromOutgoingContext and metadata.NewOutgoingContext with a combined md. [ Low confidence ]
In WrapStreamingClient, when token acquisition fails, the code still calls next(ctx, spec) to obtain a connect.StreamingClientConn and then wraps it with streamingAuthInterceptorFailure. This establishes an underlying streaming connection without an auth header even though all subsequent Send/Receive calls will fail with Unauthenticated. This can cause unintended resource acquisition on both client and server and risks leaking the connection if the caller doesn’t explicitly close it after encountering the immediate send/receive error. A safer approach is to avoid calling next when token generation fails and instead return a synthetic failing connection (or propagate the error immediately if the API allows), ensuring no underlying stream is created when authentication isn’t possible. [ Low confidence ]
pkg/interceptors/server/auth.go — 0 comments posted, 4 evaluated, 4 filtered
NodeAuthorization header values are no longer explicitly rejected. The prior gRPC interceptor logic (extractToken) returned an error if multiple auth tokens were provided. The new Connect implementation uses req.Header().Get(...)/conn.RequestHeader().Get(...), which silently selects one value when multiple are present. This changes the authentication contract and can lead to ambiguous or unsafe acceptance of requests with multiple tokens instead of failing fast. Explicitly detect and reject multiple header values. [ Low confidence ]
i.verifier is non-nil and unconditionally calls i.verifier.Verify(token) in both WrapUnary and WrapStreamingHandler. If NewServerAuthInterceptor is constructed with a nil authn.JWTVerifier, this will panic at runtime. Add a nil check or enforce non-nil verifier during construction. [ Low confidence ]
i.logger is non-nil and unconditionally dereferences it in connectLogIncomingAddress via i.logger.Core().Enabled(...). If NewServerAuthInterceptor is called with a nil *zap.Logger (there are no guards in the constructor), both unary and streaming paths will call i.connectLogIncomingAddress(...) and panic on a nil pointer dereference. Add a nil check before using i.logger or enforce a non-nil logger at construction. [ Low confidence ]
net.SplitHostPort(addr) returns an error), connectLogIncomingAddress now skips logging entirely. Previously, the gRPC interceptor attempted reverse DNS and, on failure, still logged with dns_name set to "unknown". This silent drop of logging on malformed addresses reduces observability and deviates from prior behavior. Consider logging with dns_name=unknown and the raw address even when parsing fails. [ Code style ]
pkg/interceptors/server/logging.go — 0 comments posted, 3 evaluated, 2 filtered
In WrapStreamingHandler, same as the unary interceptor: logging uses connect.CodeOf(err) while the returned error is sanitizeError(err), which maps context errors to specific codes. This can produce logs showing unknown where clients receive deadline_exceeded or canceled. [ Low confidence ]
sanitizeError discards *connect.Error metadata (details, headers, trailers) by constructing a new connect.Error with only finalCode and a new generic error for the message. If handlers attach structured error details or response headers/trailers to a connect.Error, these are lost. This breaks contract parity for error propagation: clients will not receive intended error details, and any server-attached metadata will be silently dropped. [ Low confidence ]
pkg/mlsvalidate/service.go — 0 comments posted, 5 evaluated, 5 filtered
NewMLSValidationService ties the gRPC connection’s lifetime to the incoming ctx (at lines 47-50), which in the visible call chain is cfg.Ctx from NewBaseServer, not the server’s derived svc.ctx. As a result, canceling the server (svc.cancel) will not close the MLS validation gRPC connection; it will remain open until cfg.Ctx is canceled. This creates a lifecycle mismatch and can leak the connection during server shutdown, violating the single paired cleanup and lifecycle contract. The fix is to use the server’s lifecycle context (e.g., svc.ctx) or ensure the passed ctx aligns with the server shutdown semantics. [ Low confidence ]
In GetAssociationState, there is no validation that newUpdates contains at least one update. If an empty slice (or one with nil elements) is passed, the server may reject the request or behave unexpectedly. While this method won’t panic locally, adding an explicit check (e.g., require exactly one new update or at least one, depending on API contract) would avoid sending malformed or semantically invalid requests and provides earlier, clearer errors. [ Code style ]
ValidateGroupMessages: the method does not validate groupMessages elements before passing them to makeValidateGroupMessagesRequest. Inside that helper (outside this code object), each element is used via groupMessage.GetV1().Data. If any element in groupMessages is nil or if the oneof Version is not set to V1 (so GetV1() returns nil), dereferencing .Data will panic. Add explicit validation in ValidateGroupMessages to ensure each groupMessages[i] is non-nil and has a non-nil GetV1() before calling the helper, or make the helper handle these cases gracefully. [ Low confidence ]
ValidateGroupMessages does not verify that the number of responses from the server matches the number of input messages. It constructs out sized to len(response.Responses) and returns it directly. If the service returns fewer or more responses than inputs, this will silently produce a result slice whose length does not correspond to the input size, potentially losing alignment between inputs and outputs. Add a sanity check if len(response.Responses) != len(groupMessages) { return nil, fmt.Errorf(...) } to enforce contract parity. [ Low confidence ]
ValidateGroupMessages fails the entire batch on the first per-item error (!response.IsOk), returning an error and discarding any results computed earlier in the loop. If callers expect per-item results (successes alongside failures) for a batch, this behavior changes the externally visible contract by making the operation all-or-nothing. Consider returning a full results array with per-item error details, or at least accumulate errors and include the index to aid diagnosis; otherwise, document clearly that the method fails-fast and returns no partial results. [ Code style ]
pkg/server/server.go — 0 comments posted, 4 evaluated, 3 filtered
NewBaseServer are not cleaned up on subsequent failures, causing goroutine and listener leaks. Specifically, the function starts components incrementally and returns early on later errors without stopping those already started: [ Previously rejected ]
startAPIServer builds up authentication interceptors but does not attach them unless both jwtVerifier and authInterceptor are non-nil. While the guard seems okay, there is no fallback or explicit logging when auth is expected but unavailable. More importantly, there is no validation that svc.registrant is present when API is enabled; startAPIServer proceeds without auth if either svc.nodeRegistry or svc.registrant is nil, allowing unauthenticated access. Given upstream logic, svc.registrant should be initialized whenever API is enabled, but if that invariant is broken (e.g., via options), the API would start without auth and with no error. At minimum, enforce that when API is enabled and nodeRegistry is provided, registrant must be non-nil, or explicitly document/guard the unauthenticated mode. [ Already posted ]
startAPIServer error path: the function creates a net.Listener (lines 433–436) and then attempts to construct the API server (lines 518–522). If api.NewAPIServer returns an error, the listener is not closed, resulting in a file descriptor/resource leak. Similarly, if registration or subsequent steps fail, there is no Close() on the listener. The listener should be closed on any error after creation unless ownership is transferred to the API server successfully. [ Already posted ]
pkg/stress/envelopes_generator.go — 0 comments posted, 1 evaluated, 1 filtered
In NewEnvelopesGenerator, a context with a 60-second timeout is created and then passed into the Connect client builders. Those builders call utils.BuildHTTP2Client(ctx, isTLS), which (on the h2c/plaintext path) sets http2.Transport.DialTLS to a closure that uses net.Dialer.DialContext(ctx, ...). Because that closure retains the ctx from construction time, once the 60-second timeout expires or cleanup calls cancel(), any subsequent RPCs that require a new connection will attempt dials with a canceled context and fail. This can manifest when using ProtocolConnect, ProtocolConnectGRPC, or ProtocolConnectGRPCWeb over plaintext HTTP (h2c). The correct approach is to avoid capturing a cancelable context in the transport's dialer (use a non-cancelable/background context or Dial with timeouts), or build the HTTP client with a context that remains valid for the client's lifetime. Specifically: [ Already posted ]
pkg/testutils/server/server.go — 0 comments posted, 1 evaluated, 1 filtered
NewTestBaseServer due to missing nil check for cfg.PrivateKey. At line 66, hex.EncodeToString(crypto.FromECDSA(cfg.PrivateKey)) dereferences cfg.PrivateKey without validation. If a test passes nil (which is reachable from the type with no guards), crypto.FromECDSA will attempt to read from a nil pointer and panic. Add an explicit nil check and return a clear error (or generate a test key) when cfg.PrivateKey is nil. [ Test / Mock code ]
pkg/utils/api.go — 0 comments posted, 3 evaluated, 3 filtered
AuthorizationHeaderFromContext strips the Bearer scheme using strings.TrimPrefix(auth[0], "Bearer "), which is case-sensitive and only removes exactly "Bearer " with a single space. If the metadata carries authorization: bearer <token> (lowercase), has multiple spaces or different whitespace (e.g., tabs), or omits the space ("Bearer<token>"), the prefix will not be removed and the returned value will include the scheme text (e.g., "bearer <token>"). Downstream consumers expecting just the raw token (e.g., JWT parsers) will fail to parse, causing authentication failures. [ Low confidence ]
AuthorizationTokenFromHeader strips the Bearer scheme using strings.TrimPrefix(token, "Bearer "), which is case-sensitive and only removes exactly "Bearer " with a single space. If the HTTP header is authorization: bearer <token> (lowercase), contains multiple spaces or tabs, or uses "Bearer<token>" (no space), the prefix will not be removed and the returned value will include the scheme text. Downstream JWT parsing will likely fail due to the malformed token string. [ Low confidence ]
ClientIPFromHeaderOrPeer drops the peer address if net.SplitHostPort(peer) fails (returns empty string). This differs from the established fallback in ClientIPFromContext, which attempts a manual split by ':' when SplitHostPort fails. If the peer argument is a bare IP without a port (e.g., "192.0.2.1") or otherwise not in host:port form, this function will return an empty string instead of a usable IP, causing loss of caller IP in legitimate scenarios. [ Low confidence ]
pkg/utils/api_clients.go — 0 comments posted, 11 evaluated, 7 filtered
In NewConnectReplicationAPIClient, NewConnectGRPCReplicationAPIClient, and NewConnectGRPCWebReplicationAPIClient, the ctx parameter is forwarded to BuildHTTP2Client. For plaintext HTTP (h2c), BuildHTTP2Client installs a DialTLS closure that uses DialContext(ctx, ...), causing future dials to fail after the ctx is canceled or times out. Since NewEnvelopesGenerator provides a 60-second timeout context, clients built with any of these functions will stop being able to establish new connections after ~60 seconds when using h2c. Fix by using a non-cancelable context for transport construction or by ensuring the dial closure does not capture a cancelable context. [ Already posted ]
NewConnectGRPCReplicationAPIClient forwards ctx to BuildHTTP2Client, causing the same dial-closure context capture issue for classic gRPC over h2c. After the context is canceled or times out, future RPCs requiring new connections will fail on plaintext HTTP/2 targets. [ Already posted ]
NewConnectGRPCWebReplicationAPIClient also forwards ctx into BuildHTTP2Client, introducing the same transport dial closure context capture issue for gRPC-Web. Under plaintext HTTP (h2c), once the context expires or is canceled, subsequent connection attempts will fail. [ Already posted ]
NewGRPCConn does not block or validate connectivity at creation time (no WithBlock and likely grpc.NewClient is non-blocking), so it may return a *grpc.ClientConn successfully even for invalid/unreachable targets. Callers may expect a failure from NewGRPCConn when the address is bad (as suggested by returned error and example usage). If the intent is to validate upfront, add grpc.WithBlock() and a context with timeout. If non-blocking is desired, consider documenting that the returned error only reflects option/constructor errors, not connectivity. [ Code style ]
BuildHTTP2Client captures the caller-provided ctx inside the h2c (isTLS == false) transport's DialTLS closure. Because http2.Transport.DialTLS has no per-request context parameter, all future dials will use that single ctx. If the ctx has a deadline or is later canceled (e.g., service shutdown or a short-lived setup context), subsequent requests made by the returned http.Client will fail to dial, even if the requests themselves have valid contexts. This can cause intermittent or permanent failures unrelated to the actual request context. Use context.Background() (or a dedicated long-lived context) for the dialer here, or switch to an API that supports per-request contexts. [ Already posted ]
"localhost:8080" or "example.com") are rejected with "missing host" because url.Parse treats them as paths and sets URL.Hostname() to empty. Although the code defaults scheme to "http" later, it occurs after checking for a non-empty host, so these common inputs are not supported. If scheme-less addresses are intended to be accepted, the function should detect this and prepend "http://" before parsing, or handle the host parsing manually. [ Already posted ]
buildTLSConfig treats a nil certPool from x509.SystemCertPool() as a hard error, even though a nil RootCAs in tls.Config is valid and means “use the host’s verified system roots.” On some platforms or minimal images, SystemCertPool() may return a nil pool (with or without error), and the current code would fail to build a TLS client where the default behavior might otherwise work. Prefer allowing RootCAs to be nil (using the platform default), or only error if SystemCertPool() returns a non-nil error. [ Low confidence ]
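The scheme-less address issue listed above for pkg/utils/api_clients.go suggests prepending "http://" before parsing. A minimal standard-library sketch of that idea follows. The function name `parseTarget` and the "://" detection rule are assumptions made for illustration and do not reflect the repository's actual helper.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseTarget accepts addresses with or without a scheme. Inputs that
// lack "://" are assumed to be plain host[:port] values and default to
// http (hypothetical helper, not the function from the PR).
func parseTarget(addr string) (*url.URL, error) {
	if !strings.Contains(addr, "://") {
		addr = "http://" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return nil, err
	}
	if u.Hostname() == "" {
		return nil, errors.New("missing host")
	}
	return u, nil
}

func main() {
	for _, addr := range []string{"localhost:8080", "example.com", "https://example.com:443", ""} {
		u, err := parseTarget(addr)
		if err != nil {
			fmt.Printf("%q: error: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q: scheme=%s host=%s port=%s\n", addr, u.Scheme, u.Hostname(), u.Port())
	}
}
```

Normalizing before the host check means the "missing host" error fires only for inputs that have no host at all, such as the empty string.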