Skip to content

Conversation

@shivaspeaks
Copy link
Owner

@shivaspeaks shivaspeaks commented Dec 10, 2025

Summary of Changes
This pull request refactors the grpclb load balancer's PICK_FIRST mode to delegate its logic to a standard pick_first load balancing policy.

The key changes are as follows:

  1. grpclb/build.gradle

Added dependency on grpc-util module to access ForwardingLoadBalancerHelper

  1. grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java
  • New imports:
    LoadBalancer, LoadBalancerProvider, LoadBalancerRegistry, ResolvedAddresses, FixedResultPicker, ForwardingLoadBalancerHelper
  • New fields for PICK_FIRST delegation:
    • pickFirstLbProvider - Provider for creating child pick_first LB
    • pickFirstLb - The child LoadBalancer instance
    • pickFirstLbState / pickFirstLbPicker - Track child LB's state and picker
    • currentPickFirstLoadRecorder - Load recorder for token attachment
  • Key behavioral changes:
    • updateServerList() PICK_FIRST case: Instead of creating a single subchannel, it now:
      • Creates the child pick_first LB once and then updates it with new addresses on subsequent updates.
      • Passes addresses to child LB via acceptResolvedAddresses()
    • maybeUpdatePicker() PICK_FIRST case: Uses child LB's state and picker wrapped with ChildLbPickerEntry
    • RoundRobinEntry.picked() signature change: Changed from picked(Metadata) to picked(PickSubchannelArgs) to allow child picker delegation
    • New ChildLbPickerEntry class: Wraps child LB's picker and attaches TokenAttachingTracerFactory for token propagation
    • New PickFirstLbHelper class: Forwarding helper that intercepts updateBalancingState() to store child state and trigger grpclb picker updates
    • Updated shutdown(), requestConnection(), maybeUseFallbackBackends(): Handle the new child LB delegation model
  1. grpclb/src/test/java/io/grpc/grpclb/GrpclbLoadBalancerTest.java
  • Updated tests to reflect the new delegation behavior:
    • Initial state is now CONNECTING (not IDLE) since standard pick_first eagerly connects
    • Tests now verify the child LB is created only once and then handles address updates internally.
    • Adjusted verification expectations for the new flow
  • Key Behavioral Changes:
    • Delegation to child LB: The grpclb state no longer directly manages subchannels for PICK_FIRST mode; the child pick_first LB handles this internally.

@gemini-code-assist
Copy link

Summary of Changes

Hello @shivaspeaks, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the grpclb load balancer's PICK_FIRST mode by introducing delegation to a child pick_first load balancer. The primary motivation is to avoid issues associated with Subchannel.updateAddresses() by recreating the child load balancer whenever backend addresses are updated. This change also standardizes the PICK_FIRST connection lifecycle, making it eagerly connect, and ensures proper token propagation through new helper classes.

Highlights

  • Delegation of PICK_FIRST Subchannel Management: The grpclb load balancer's PICK_FIRST mode now delegates subchannel management to a standard pick_first load balancer. This refactoring eliminates direct calls to Subchannel.updateAddresses() by recreating the child load balancer when addresses change.
  • Updated Connection Behavior for PICK_FIRST: The PICK_FIRST mode now exhibits eager connection behavior, meaning its initial state is CONNECTING rather than IDLE, aligning with the standard pick_first policy.
  • New Helper Classes and Interfaces: Introduced ChildLbPickerEntry to wrap the child load balancer's picker and attach TokenAttachingTracerFactory for token propagation. A PickFirstLbHelper class was added to intercept updateBalancingState() from the child load balancer.
  • Signature Change for RoundRobinEntry.picked(): The picked() method in the RoundRobinEntry interface has been updated to accept PickSubchannelArgs instead of Metadata, allowing for more flexible delegation to child pickers.
  • Test Suite Adjustments: Existing tests for GrpclbLoadBalancer have been updated to reflect the new delegation model, including changes in expected initial states, the absence of updateAddresses() calls, and modified verification expectations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PICK_FIRST mode in the GRPCLB load balancer to delegate to a standard pick_first load balancer. The main goal is to avoid using Subchannel.updateAddresses(), which is achieved by creating a new child load balancer instance on each address update. The changes are comprehensive, affecting state management, subchannel lifecycle, and the picker logic. The tests have been correctly updated to reflect the new behavior, such as eager connections from the child pick_first balancer.

Overall, the changes are well-implemented. I've found one critical issue where the authority override from the child picker's result is not propagated, which could lead to incorrect routing or TLS failures. A code suggestion to fix this is provided.

return childResult;
}
// Wrap the pick result to attach tokens via the tracer factory.
return PickResult.withSubchannel(childResult.getSubchannel(), tracerFactory);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The childResult from the pick_first load balancer might contain an authority override. This implementation loses that information by not propagating childResult.getAuthorityOverride() to the new PickResult. This could lead to incorrect behavior in scenarios that rely on authority override, such as routing to the correct virtual host or for SNI in TLS.

Please propagate the authority override from the child picker's result.

Suggested change
return PickResult.withSubchannel(childResult.getSubchannel(), tracerFactory);
return PickResult.withSubchannel(childResult.getSubchannel(), tracerFactory, childResult.getAuthorityOverride());

@shivaspeaks
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PICK_FIRST mode in grpclb to delegate to a child pick_first load balancer. This is a significant improvement that avoids using the problematic Subchannel.updateAddresses() API. The changes are extensive but appear to be correct and consistently applied. The core logic is now cleaner, as grpclb no longer needs to re-implement pick_first behavior. The tests have been thoroughly updated to reflect the new delegation model and verify the key behavioral changes, such as eager connection and the absence of updateAddresses calls. The implementation of PickFirstLbHelper to intercept state updates from the child LB is a clean solution. Overall, this is a high-quality change that improves the maintainability and correctness of the grpclb implementation.

@shivaspeaks
Copy link
Owner Author

shivaspeaks commented Dec 11, 2025

@gemini-code-assist,

I'm seeing this error in Github Actions:

ERROR: /home/runner/work/grpc-java/grpc-java/grpclb/BUILD.bazel:5:13: Building grpclb/libgrpclb-class.jar (14 source files) failed: (Exit 1): java failed: error executing Javac command (from target //grpclb:grpclb) external/rules_java7.3.2toolchains~remotejdk21_linux/bin/java '--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED' '--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED' ... (remaining 19 arguments skipped)
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
warning: [options] source value 8 is obsolete and will be removed in a future release
warning: [options] target value 8 is obsolete and will be removed in a future release
warning: [options] To suppress warnings about obsolete options, use -Xlint:-options.
grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java:68: error: package io.grpc.util does not exist
import io.grpc.util.ForwardingLoadBalancerHelper;
^
grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java:1254: error: cannot find symbol
private final class PickFirstLbHelper extends ForwardingLoadBalancerHelper {
^
symbol: class ForwardingLoadBalancerHelper
location: class GrpclbState
grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java:576: error: incompatible types: GrpclbState.PickFirstLbHelper cannot be converted to Helper
pickFirstLb = pickFirstLbProvider.newLoadBalancer(new PickFirstLbHelper());
^
grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java:1256: error: method does not override or implement a method from a supertype
@Override
^
grpclb/src/main/java/io/grpc/grpclb/GrpclbState.java:1261: error: method does not override or implement a method from a supertype
@Override
^

@gemini-code-assist
Copy link

Hello @shivaspeaks,

It looks like the compilation error you're encountering in GitHub Actions is due to the Bazel build system not recognizing the grpc-util dependency, even though it was added to grpclb/build.gradle.

For Bazel to correctly resolve the ForwardingLoadBalancerHelper class, you'll need to add //util to the deps list in the grpclb/BUILD.bazel file.

Here's an example of how you might update grpclb/BUILD.bazel:

java_library(
    name = "grpclb",
    srcs = glob(["src/main/java/**/*.java"]),
    deps = [
        "//api",
        "//core",
        "//stub",
        "//util", # Add this line
        "@com_google_guava_guava//jar",
        "@com_google_protobuf//:protobuf_java",
        "@com_google_protobuf//:protobuf_java_util",
        # ... other dependencies
    ],
)

Please make sure to add "//util" to the deps array in your grpclb/BUILD.bazel file and then try running your GitHub Actions again. Let me know if you have any further questions!

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PICK_FIRST load balancing mode in grpclb to delegate to a standard pick_first child load balancer. This is a significant change aimed at avoiding the deprecated Subchannel.updateAddresses() method. The changes are well-contained and the accompanying test updates reflect the new behavior, such as eager connections.

The core logic change in GrpclbState.java to shut down and recreate the child load balancer on every address update is correctly implemented according to the stated goal. The new helper classes ChildLbPickerEntry and PickFirstLbHelper effectively bridge the parent grpclb logic with the delegated child load balancer.

I've found one potential issue regarding state management when the child load balancer is recreated, which could lead to temporary RPC failures. My review includes a suggestion to address this. Overall, this is a good refactoring that modernizes the grpclb implementation.

Comment on lines 576 to 581
pickFirstLb = pickFirstLbProvider.newLoadBalancer(new PickFirstLbHelper());
// Pass addresses to child LB.
pickFirstLb.acceptResolvedAddresses(
ResolvedAddresses.newBuilder()
.setAddresses(eagList)
.build());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There seems to be a potential issue here where stale state from a previous child load balancer could be used. When pickFirstLb is recreated, pickFirstLbState and pickFirstLbPicker are not reset. If maybeUpdatePicker() is called before the new child LB provides its state, it will use the state from the old, now-shutdown LB. This could lead to using a picker with shutdown subchannels, causing RPCs to fail.

It would be safer to reset the child LB's state immediately after creating a new instance.

        pickFirstLb = pickFirstLbProvider.newLoadBalancer(new PickFirstLbHelper());
        // Reset the child LB state, since we created a new one.
        pickFirstLbState = CONNECTING;
        pickFirstLbPicker = new FixedResultPicker(PickResult.withNoResult());
        // Pass addresses to child LB.
        pickFirstLb.acceptResolvedAddresses(
            ResolvedAddresses.newBuilder()
                .setAddresses(eagList)
                .build());

@shivaspeaks
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-executed refactoring of the grpclb load balancer's PICK_FIRST mode. By delegating the logic to a standard pick_first child load balancer, you've successfully eliminated the use of Subchannel.updateAddresses(), which is the main goal for improving dualstack compatibility. The use of a ForwardingLoadBalancerHelper to bridge the child and parent states, and the TokenAttachingTracerFactory for token propagation, are both elegant solutions. The accompanying test updates correctly reflect the new behavior, such as the eager connection characteristic of the standard pick_first policy.

The code is clear and the changes are well-documented in the pull request description. I have one suggestion to improve the robustness of the implementation by adding a null check.

Comment on lines 228 to 229
this.pickFirstLbProvider =
LoadBalancerRegistry.getDefaultRegistry().getProvider("pick_first");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

For improved robustness, it's a good practice to ensure that the pickFirstLbProvider is not null. LoadBalancerRegistry.getDefaultRegistry().getProvider("pick_first") is annotated with @Nullable and can return null if the "pick_first" provider is not found in the registry. A NullPointerException would occur later in updateServerList when pickFirstLbProvider.newLoadBalancer() is called.

Adding a checkNotNull will cause a clean and early failure during initialization if the provider is missing, which is preferable to a deferred NullPointerException.

Suggested change
this.pickFirstLbProvider =
LoadBalancerRegistry.getDefaultRegistry().getProvider("pick_first");
this.pickFirstLbProvider =
checkNotNull(
LoadBalancerRegistry.getDefaultRegistry().getProvider("pick_first"),
"pick_first balancer not available");

@shivaspeaks
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the grpclb load balancer to delegate PICK_FIRST logic to a child pick_first load balancer. This is a significant and positive change that avoids using the Subchannel.updateAddresses() API, aligning with modern gRPC LB patterns. The changes are well-structured, and the test updates correctly reflect the new behavior. My main feedback is a suggestion to optimize the recreation of the child load balancer to avoid unnecessary connection churn when the server list from the balancer is unchanged.

Comment on lines 571 to 576
// Always shutdown and recreate the child LB when addresses change to avoid
// calling Subchannel.updateAddresses(). This ensures we use the new dualstack-
// compatible path where the child LB creates fresh subchannels.
if (pickFirstLb != null) {
pickFirstLb.shutdown();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The comment here states that the child load balancer is recreated "when addresses change", but the current implementation recreates it on every server list update. This happens even if the list of backends is identical to the previous one, which can lead to unnecessary connection churn and performance degradation if the remote balancer sends frequent but identical server lists.

To align with the stated intent and optimize performance, I suggest caching the list of EquivalentAddressGroups and only recreating the child pick_first load balancer when this list actually changes.

You could implement this by:

  1. Adding a new field private List<EquivalentAddressGroup> pickFirstEagList; to GrpclbState.
  2. In updateServerList, for the PICK_FIRST case, before recreating the LB, check if eagList.equals(pickFirstEagList). If they are equal, you can skip recreating the LB. You would still need to update currentPickFirstLoadRecorder if it has changed.
  3. Update pickFirstEagList = eagList; when you do recreate the LB.
  4. Remember to nullify pickFirstEagList when the child LB is shut down (e.g., when the backend list becomes empty, or in the main shutdown() method).

@shivaspeaks
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PICK_FIRST mode in grpclb to delegate the logic to a standard pick_first load balancer. This is a good simplification that removes custom PICK_FIRST logic from grpclb. The implementation looks solid, using ForwardingLoadBalancerHelper to observe the child LB's state and ChildLbPickerEntry to inject token attachment logic.

The tests have been updated to reflect the new behavior, such as the eager connection of the standard pick_first policy.

I have a couple of minor comments:

  • An unrelated change in cronet/build.gradle seems to have been included by accident.
  • A comment in GrpclbLoadBalancerTest.java is confusing and should be updated.

Also, please note that the pull request description seems to be partially out of sync with the implementation. For example, it states that the child LB is recreated on every address update, but the code reuses the existing child LB instance, which is more efficient. It might be worth updating the description to avoid confusion for future readers.

task javadocs(type: Javadoc) {
source = android.sourceSets.main.java.srcDirs
classpath += files(android.getBootClasspath())
// classpath += files(android.getBootClasspath())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This change seems unrelated to the grpclb changes in this pull request. It comments out a line related to the Javadoc classpath in the cronet project. Was this change intended to be part of this PR? If it's a temporary change for local testing, it should be reverted. android.getBootClasspath() is deprecated, but commenting it out might have side effects on the generated Javadocs.

assertThat(picker1.dropList).containsExactly(null, null);

// New server list with drops
// New server list with drops - child LB is recreated (no updateAddresses)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment is confusing as it contradicts the behavior being tested and the comment on lines 1951-1952. The comment states "child LB is recreated (no updateAddresses)", but the test verifies that the child LB is not recreated (verify(helper, times(1)).createSubchannel(...)). Please remove this misleading comment or update it to reflect the actual behavior.

Suggested change
// New server list with drops - child LB is recreated (no updateAddresses)
// New server list with drops

@shivaspeaks
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the PICK_FIRST load balancing logic in grpclb to delegate to a standard pick_first child load balancer. This is a great improvement as it removes custom logic and aligns with the standard gRPC implementation, reducing maintenance overhead. The changes are well-implemented across GrpclbState and the corresponding tests. The core of the change is delegating address management and subchannel picking to a child pick_first LB, with a PickFirstLbHelper and ChildLbPickerEntry to glue the integration for state management and token propagation.

The implementation correctly reuses the child pick_first load balancer across address updates, which is efficient. It only recreates it when recovering from an empty server list. This is slightly different from the pull request description which suggests it's recreated on every update, but the implementation is correct and more performant.

The tests have been updated thoroughly to reflect the new behavior, such as the eager connection of the standard pick_first policy and the removal of direct updateAddresses verifications.

Overall, this is a solid refactoring. I have reviewed the changes and I don't have any specific comments or suggestions for code changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants