[pull] master from tensorflow:master by pull[bot] · Pull Request #1636 · makesoftwaresafe/tensorflow

pull · 2026-05-11T18:10:40Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

PiperOrigin-RevId: 913642931

There was a recent upstream change and we cannot rely anymore on having the pattern be applied multiple times in one go and also deleting dead ops. So we need to delete them ourselves. This change moves the pattern from tablegen to C++ to make this possible.

Also, do a small fix to the "interestingness" script to avoid printing the result of the grep command.

PiperOrigin-RevId: 913646216

PiperOrigin-RevId: 913656790

Imported from GitHub PR openxla/xla#41779

📝 Summary of Changes

This PR migrates `ReduceScatterCmd` to use `ReduceScatterThunk` directly as a command-buffer command, matching the existing `AllReduceThunk` command migration pattern. It removes the dedicated `ReduceScatterCmd` wrapper and appends `ReduceScatterThunk` as a borrowed command in the command-buffer emitter. It also adds multi-GPU command-buffer tests for `ReduceScatterThunk`, covering eager warmup via `ExecuteOnStream`, command-buffer create, command-buffer update, and output correctness.

🎯 Justification

This reduces duplicate command-buffer collective plumbing and keeps reduce-scatter behavior aligned with the shared `CollectiveThunk` recording path. The change benefits GPU workloads using reduce-scatter collectives captured into command buffers, especially distributed workloads that rely on command-buffer update paths.

🚀 Kind of Contribution

♻️ Cleanup, 🧪 Tests

📊 Benchmark (for Performance Improvements)

Not applicable. This PR is a cleanup/test coverage change and does not claim a performance improvement.

🧪 Unit Tests:

Added/updated command-buffer recording tests in: `//xla/backends/gpu/runtime:all_reduce_thunk_test`

Coverage includes:
- `ReduceScatterThunkTest.RecordCommandBufferCreate`
- `ReduceScatterThunkTest.RecordCommandBufferUpdate`

🧪 Execution Tests:

Added multi-GPU execution coverage in: `//xla/backends/gpu/runtime:all_reduce_thunk_multigpu_test`

New tests:
- `ReduceScatterThunkMultiGpuTest.RecordCommandBufferCreate`
- `ReduceScatterThunkMultiGpuTest.RecordCommandBufferUpdate`

These run with 2 GPUs and verify expected reduce-scatter outputs for both command-buffer create and update paths.

Validated locally with: `bazel test --test_output=errors --test_filter='ReduceScatterThunkMultiGpuTest.*' //xla/backends/gpu/runtime:all_reduce_thunk_multigpu_test`

Copybara import of the project:

-- 77960ea67396bf055ee18937c14863b082e5f1d1 by Shawn Wang <shawnw@nvidia.com>:
[xla:gpu] Migrate ReduceScatterCmd to thunk command

-- 25dd889f23c30976c389923d43f1fba644c01e07 by Shawn Wang <shawnw@nvidia.com>:
[xla:gpu] Add ReduceScatterThunk multigpu tests

-- 2f7b052976da7ae21a85762f0d632c9877fb1334 by Shawn Wang <shawnw@nvidia.com>:
[xla:gpu] Clean up ReduceScatterThunk command buffer deps

-- 77715f319a63d5517e3a7ca8ba7173cfb10a26f0 by Shawn Wang <shawnw@nvidia.com>:
remove usused header

Merging this change closes #41779

PiperOrigin-RevId: 913656825
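For readers unfamiliar with the collective under test, the "expected reduce-scatter outputs" mentioned above can be illustrated with a small host-side reference. This is only a sketch of the usual reduce-scatter semantics with a sum reduction; the function name `reduce_scatter_sum` is hypothetical and is not taken from the XLA tests, which may use different reductions, shapes, and data types.

```python
from typing import List


def reduce_scatter_sum(inputs: List[List[float]]) -> List[List[float]]:
    """Reference reduce-scatter with a sum reduction (illustrative only).

    inputs[r] is the full input buffer contributed by rank r. All buffers
    must have the same length, divisible by the number of ranks.
    Returns outputs, where outputs[r] is the shard rank r ends up holding.
    """
    num_ranks = len(inputs)
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % num_ranks != 0:
        raise ValueError("buffer length must be divisible by the rank count")

    # Step 1: element-wise reduction across all ranks.
    reduced = [sum(buf[i] for buf in inputs) for i in range(length)]

    # Step 2: scatter contiguous shards, one per rank.
    shard = length // num_ranks
    return [reduced[r * shard:(r + 1) * shard] for r in range(num_ranks)]


# Two ranks, mirroring the 2-GPU setup described above.
outputs = reduce_scatter_sum([[1, 2, 3, 4], [10, 20, 30, 40]])
```

With two ranks, rank 0 holds the first half of the element-wise sum and rank 1 holds the second half.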

Imported from GitHub PR openxla/xla#42218

Add a diffing tool for clang-tidy output

Copybara import of the project:

-- 14f33777161c2eac9a9eeb1fa8e6b8d37413b8d3 by Sohaib Iftikhar <sohaibiftikhar@google.com>:
[XLA:BUILD] Add a diffing tool for clang-tidy output

Adds a diffing tool for reading clang-tidy report files and reporting on the output only if that line was affected in the change.

Merging this change closes #42218

PiperOrigin-RevId: 913665880
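The idea of reporting a clang-tidy diagnostic only when its line was touched by the change can be sketched roughly as follows. This is not the tool added in the PR: the function `filter_diagnostics`, the regular expression, and the `changed` mapping (file path to set of changed line numbers) are all assumptions made for illustration, based on the common `path:line:col: warning: message [check]` diagnostic layout.

```python
import re
from typing import Dict, List, Set

# Assumed diagnostic layout: path/to/file.cc:42:7: warning: message [check-name]
DIAGNOSTIC_RE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): (?P<kind>warning|error): "
)


def filter_diagnostics(report: str, changed: Dict[str, Set[int]]) -> List[str]:
    """Keep only diagnostics whose (file, line) is in the changed-line map.

    `changed` is a hypothetical input: it maps a file path to the set of
    line numbers modified by the change under review.
    """
    kept = []
    for text in report.splitlines():
        match = DIAGNOSTIC_RE.match(text)
        if match is None:
            continue  # Skip notes, source snippets, and other non-diagnostic lines.
        if int(match.group("line")) in changed.get(match.group("path"), set()):
            kept.append(text)
    return kept


report = "\n".join([
    "a.cc:3:5: warning: use nullptr [modernize-use-nullptr]",
    "a.cc:9:1: warning: unused variable [clang-diagnostic-unused-variable]",
    "b.cc:3:2: error: something else [some-check]",
])
kept = filter_diagnostics(report, {"a.cc": {3, 4}})
```

Here only the diagnostic on `a.cc` line 3 survives, because it is the only one that falls on a changed line.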

…tendAttrs. PiperOrigin-RevId: 913675881

Flags:
* mixin_max_same_mnk limits the number of mixin configs with the same M N K block sizes. This should help mitigate the cost model not differentiating configs very well.
* mixin_only_faster only allows mixin configs faster than the base set. This may help reduce compile time hit while keeping perf benefits.

PiperOrigin-RevId: 913705752
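One way to picture how the two flags above might interact is the filter below. It is a sketch under stated assumptions, not the XLA implementation: the `Config` type, its fields, the `select_mixins` function, and in particular the reading of "faster than the base set" as "estimated time below the fastest base config" are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """Hypothetical tiling config with a cost-model time estimate."""
    block_m: int
    block_n: int
    block_k: int
    num_warps: int
    est_time_us: float


def select_mixins(base: List[Config], mixins: List[Config],
                  max_same_mnk: int, only_faster: bool) -> List[Config]:
    """Filter mixin configs in the spirit of the two flags (assumed semantics)."""
    # Assumption: "faster than the base set" means beating the best base estimate.
    best_base = min((c.est_time_us for c in base), default=float("inf"))
    per_mnk = defaultdict(int)
    kept = []
    # Visit the most promising mixins first so the per-(M, N, K) cap keeps them.
    for cfg in sorted(mixins, key=lambda c: c.est_time_us):
        if only_faster and cfg.est_time_us >= best_base:
            continue
        key = (cfg.block_m, cfg.block_n, cfg.block_k)
        if per_mnk[key] >= max_same_mnk:
            continue
        per_mnk[key] += 1
        kept.append(cfg)
    return kept


base = [Config(64, 64, 32, 4, 10.0)]
mixins = [
    Config(128, 64, 32, 4, 8.0),
    Config(128, 64, 32, 8, 9.0),
    Config(128, 64, 32, 2, 9.5),
    Config(64, 128, 32, 4, 12.0),
]
kept = select_mixins(base, mixins, max_same_mnk=2, only_faster=True)
```

In this example the cap drops the third config with block sizes (128, 64, 32), and the speed filter drops the config estimated at 12.0 µs because it does not beat the base estimate of 10.0 µs.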

…imation. This old logic only worked for cases when we tile the full row anyway. Also fix the test name. PiperOrigin-RevId: 913717643

PiperOrigin-RevId: 913727465

…Attribute. PiperOrigin-RevId: 913775326

vickyliu-go4it and others added 10 commits May 11, 2026 05:18

[XLA:GPU] Added test case for unsupported kExpandShape case

6a6740a

PiperOrigin-RevId: 913642931

Refactor HloShardingV2 sharding import logic to reuse dict attribute.

35b7ce8

PiperOrigin-RevId: 913656790

Refactor HloShardingV2 sharding import logic to inline setFuncArgFron…

6d96c5a

…tendAttrs. PiperOrigin-RevId: 913675881

[XLA:GPU] Do not rely on unpadded tile sizes during memory access est…

1980e0c

…imation. This old logic only worked for cases when we tile the full row anyway. Also fix the test name. PiperOrigin-RevId: 913717643

Support virtual device memory limit in PluggableDevice

6d40c20

PiperOrigin-RevId: 913727465

Refactor HloShardingV2 sharding import logic to inline removeFrontend…

8653c16

…Attribute. PiperOrigin-RevId: 913775326

pull Bot locked and limited conversation to collaborators May 11, 2026

pull Bot added the ⤵️ pull label May 11, 2026

pull Bot merged commit 8653c16 into makesoftwaresafe:master May 11, 2026

pull[bot] merged 10 commits into makesoftwaresafe:master from tensorflow:master

8 participants
