
bazel: RBE cleanups — explicit rbe input, gcc worker image, macOS cache-only#1027

Closed
Copilot wants to merge 3 commits into bazel-rbe from copilot/fix-bazel-rbe-cleanups

Copilot AI commented May 9, 2026

The existing RBE integration in _bazel.yml mutated bazel-args via string stripping to remove --config=rbe, x-compile jobs inherited RBE flags they can't use, test-gcc lacked a gcc-capable worker container, and test-macos had no remote cache at all.

Changes

New composite action: actions/bazel/rbe/

Encapsulates auth check + flag decision in one reusable place. Used by both _bazel.yml and test-macos.

  • Inputs: flags (what to append), enabled (default 'true')
  • Runs ./actions/github/container/auth only when enabled == 'true'
  • Appends flags only when enabled AND not a PR event AND auth passed; otherwise emits a ::notice:: and outputs an empty string
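The decision in the last bullet can be sketched as the body of the action's shell step. This is a minimal sketch, not the PR's code: the environment variable names RBE_ENABLED, RBE_FLAGS, EVENT_NAME and AUTHORIZED and the function name decide_rbe_flags are hypothetical stand-ins for the action's enabled/flags inputs, github.event_name, and the auth step's authorized output. It writes the result as a bazel-args step output via $GITHUB_OUTPUT and, following the prompt, only emits a ::notice:: when RBE was requested but skipped.

```shell
#!/usr/bin/env bash
# Minimal sketch (not the PR's code) of the composite action's flag decision.
# Hypothetical env names standing in for the real action inputs/contexts:
#   RBE_ENABLED - the action's `enabled` input (default 'true')
#   RBE_FLAGS   - the action's `flags` input, e.g. "--config=rbe"
#   EVENT_NAME  - github.event_name
#   AUTHORIZED  - the auth step's `authorized` output ('true' when auth passed)
set -euo pipefail

decide_rbe_flags() {
    local args=""
    if [[ "${RBE_ENABLED:-true}" != "true" ]]; then
        : # RBE not requested: no notice, no flags.
    elif [[ "${EVENT_NAME:-}" == "pull_request" ]]; then
        echo "::notice::RBE requested but skipped: pull_request event"
    elif [[ "${AUTHORIZED:-}" != "true" ]]; then
        echo "::notice::RBE requested but skipped: auth check did not pass"
    else
        args="${RBE_FLAGS:-}"
    fi
    # Expose the result as steps.<id>.outputs.bazel-args (empty when skipped).
    echo "bazel-args=${args}" >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT must be set}"
}
```

In a real composite action, the step's env block would map the inputs and the auth step output onto these variables, and callers would then read steps.rbe.outputs.bazel-args; an empty value means "run without RBE".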

_bazel.yml

  • Added rbe: bool input (default false); when false, auth step is skipped entirely
  • Replaced string-stripping logic with steps.rbe.outputs.bazel-args from the composite action
  • x-compile jobs are unaffected — they only receive inputs.bazel-args, which no longer carries --config=rbe
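Appending instead of stripping also avoids a substring hazard. The old ${args//--config=rbe/} expansion removes every occurrence of the text --config=rbe, so if a caller's args ever held both --config=rbe and a longer config name such as --config=rbe-gcc, the longer name would be mangled. The sketch below demonstrates that and adds a hypothetical has_config helper (not part of the PR) that matches whole tokens only. A check like this could assert that the x-compile jobs' args never carry --config=rbe.

```shell
# Hypothetical helper (not from the PR): exact-token check for --config=<name>.
has_config() {
    local args="$1" name="$2" tok
    local -a toks=()
    read -r -a toks <<< "$args"   # split on whitespace, no glob expansion
    for tok in "${toks[@]}"; do
        if [[ "$tok" == "--config=$name" ]]; then
            return 0
        fi
    done
    return 1
}

# Why the old substring stripping was fragile: it also eats the prefix of
# any longer config whose name starts with "rbe", e.g. rbe-gcc.
args="--config=gcc --config=rbe --config=rbe-gcc"
stripped="${args//--config=rbe/}"
echo "$stripped"   # prints "--config=gcc  -gcc"
```

Unlike the expansion, has_config treats --config=rbe and --config=rbe-gcc as distinct tokens.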

bazel.yml callers

| Job | Change |
|---|---|
| test, build | rbe: true; --config=rbe removed from bazel-args |
| test-gcc | rbe: true; --config=rbe-gcc added to bazel-args |
| test-macos | auth + cache-only step via composite action; GITHUB_TOKEN forwarded for the credential helper |

bazel/.bazelrc

  • common:rbe-gcc — overrides --host_platform / --extra_execution_platforms to the gcc worker platforms; used alongside --config=rbe and --config=gcc
  • common:rbe-cache-only — composes remote-cache + bes, no remote-exec; used by macOS

bazel/platforms/rbe/BUILD

Added linux_x64_gcc and linux_arm64_gcc platforms pointing at the docker-v0.1.3 gcc image:

GCC_WORKER_IMAGE = "docker://gcr.io/envoy-ci/envoy-build:gcc-v0.1.3@sha256:2a12bbcd5e95bc037bae5dfb2eb6361e55faf737098cbf90b4378e2345d9eecf"
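Pinning GCC_WORKER_IMAGE means combining a repository, a tag and a sha256 digest into one docker:// reference. A minimal sketch of that step, assuming a hypothetical pin_image_ref helper (not part of the PR) that refuses anything other than a full sha256 digest. The repository is a parameter, so the same helper works for the gcr.io/envoy-ci/envoy-build reference above or the docker.io/envoyproxy/envoy-build name used in the prompt.

```shell
# Hypothetical helper (not part of the PR): compose a pinned docker:// image
# reference of the same shape as GCC_WORKER_IMAGE.
pin_image_ref() {
    local repo="$1" tag="$2" digest="$3"
    # Accept only a full sha256 digest: "sha256:" + 64 lowercase hex chars.
    local re='^sha256:[0-9a-f]{64}$'
    if [[ ! "$digest" =~ $re ]]; then
        echo "pin_image_ref: not a sha256 digest: ${digest}" >&2
        return 1
    fi
    printf 'docker://%s:%s@%s\n' "$repo" "$tag" "$digest"
}
```

Assuming crane from go-containerregistry is installed, the digest could be resolved with crane digest docker.io/envoyproxy/envoy-build:gcc-v0.1.3 and passed as the third argument. If no digest resolves, the helper fails instead of silently producing a floating reference, which leaves the "TODO: pin" fallback from the prompt as an explicit choice.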
Original prompt

Background

PR envoyproxy#4260 enables EngFlow RBE for the Bazel CI workflows. The expensive jobs now run properly with RBE, but several cleanups are needed.

This task targets branch bazel-rbe in phlax/toolshed (open the PR against that branch, NOT against main and NOT against envoyproxy/toolshed).

Current state (branch bazel-rbe)

  • .github/workflows/_bazel.yml: the bazel job runs an actions/github/container/auth step (id rbe). The actual run step parses inputs.bazel-args, and if github.event_name != pull_request and steps.rbe.outputs.authorized != 'true', it strips --config=rbe from the args via ${args//--config=rbe/}. The xcompile-x86-to-arm and xcompile-arm-to-x86 jobs also receive inputs.bazel-args which now includes --config=rbe.
  • .github/workflows/bazel.yml: test, build, and test-gcc all pass --config=rbe directly inside bazel-args. test-macos runs natively on macos-14 (does not call _bazel.yml) and uses no remote cache.

Required cleanups

1. Add an explicit rbe boolean input instead of stripping --config=rbe from bazel-args

In .github/workflows/_bazel.yml:

  • Add a new boolean input rbe (default false) to the workflow_call inputs.
  • Remove --config=rbe from bazel-args in .github/workflows/bazel.yml callers and instead set rbe: true on the callers that should use RBE.
  • The auth check step should still run (when inputs.rbe == true), and --config=rbe should be appended to the bazel command only when:
    • inputs.rbe == true, AND
    • the auth check passed (steps.rbe.outputs.authorized == 'true'), AND
    • this is not a pull_request event (auth is allowed to be skipped/unavailable for PRs from forks — in that case run without rbe).
  • Do NOT mutate inputs.bazel-args by stripping substrings. Just conditionally append --config=rbe to the bazel command line.
  • If inputs.rbe == false, skip the auth step entirely (use a step if:), and never add --config=rbe.
  • Emit a ::notice:: when RBE was requested but skipped (auth failed or PR event), explaining why.

2. x-compile jobs should NOT use RBE

The xcompile-x86-to-arm and xcompile-arm-to-x86 jobs in .github/workflows/_bazel.yml currently inherit inputs.bazel-args which (on the test caller) includes --config=rbe. These x-compile jobs are reportedly broken with RBE.

  • These jobs must run without --config=rbe regardless of the rbe input. The cleanest way given change (1) above (where --config=rbe is no longer baked into bazel-args) is that they will simply never receive it. Verify that with the new rbe input model the x-compile steps run plain bazel test ${{ inputs.bazel-args }} ... and do not get --config=rbe.

3. test-gcc should use RBE but with the gcc worker container

The gcc job in .github/workflows/bazel.yml currently does NOT use RBE (in current branch state — verify; if it does, it would fail because the default RBE worker image does not have gcc).

  • The test-gcc caller should set rbe: true.
  • The gcc actions need to dispatch to a worker container that has gcc. Check how envoy does this — envoy uses docker.io/envoyproxy/envoy-build:gcc-<tag> as a separate gcc-capable build image. The envoyproxy/toolshed repo produces these images (see docker/build/); the latest semver-published docker release at the time of writing is docker-v0.1.3, which produced envoyproxy/envoy-build:gcc-v0.1.3 (as well as worker-v0.1.3).
  • Add a new bazel config in bazel/.bazelrc (e.g. common:rbe-gcc or extend common:gcc when combined with common:rbe) that overrides the container-image exec property on the platforms to point at the latest semver-published gcc image. The cleanest approach is likely:
    • Add a new platform in bazel/platforms/rbe/BUILD (e.g. linux_x64_gcc, linux_arm64_gcc) using a GCC_WORKER_IMAGE constant pointing at docker://docker.io/envoyproxy/envoy-build:gcc-v0.1.3@sha256:<digest> (look up the actual published digest by querying the registry for envoyproxy/envoy-build:gcc-v0.1.3 — if a digest cannot be resolved, leave a TODO comment to pin and use the floating tag for now).
    • Add a common:rbe-gcc config in bazel/.bazelrc that sets --host_platform=//platforms/rbe:linux_x64_gcc and --extra_execution_platforms=//platforms/rbe:linux_x64_gcc,//platforms/rbe:linux_arm64_gcc (overriding the defaults set by common:remote-exec). It should NOT redefine the remote_executor/cache settings — instead the gcc job should pass --config=rbe --config=rbe-gcc (or the bazel.yml caller should set both --config=gcc and the new --config=rbe-gcc, alongside rbe: true).
  • Confirm via the bazel.yml test-gcc caller that with these changes, gcc actions are dispatched onto the gcc worker container.

Note: The latest docker release tag is docker-v0.1.3 (as of 2026-05-08). The image name is envoyproxy/envoy-build:gcc-v0.1.3. Use this; when a new...

This pull request was created from Copilot chat.

phlax and others added 2 commits May 9, 2026 18:30
Signed-off-by: Ryan Northey <ryan@synca.io>
Copilot AI changed the title from "[WIP] Fix cleanups for Bazel RBE configuration" to "bazel: RBE cleanups — explicit rbe input, gcc worker image, macOS cache-only" on May 9, 2026
Copilot AI requested a review from phlax May 9, 2026 17:52
Copilot finished work on behalf of phlax May 9, 2026 17:52
phlax closed this on May 9, 2026