[pull] master from tensorflow:master #1636
Merged
PiperOrigin-RevId: 913642931
After a recent upstream change, we can no longer rely on the pattern being applied multiple times in one go, or on dead ops being deleted for us, so we need to delete them ourselves. This change moves the pattern from TableGen to C++ to make this possible. It also makes a small fix to the "interestingness" script so that it no longer prints the result of the grep command.

PiperOrigin-RevId: 913646216
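To illustrate what "deleting dead ops ourselves" involves, here is a toy model in Python. It is not the MLIR API and not the code from this change: the `Op` class, its fields, and `erase_dead_ops` are hypothetical names introduced only for this sketch. It assumes an op is dead when it has no side effects and no remaining op uses its result, and it repeats the sweep because erasing one op can make its producers dead in turn.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical toy op: one result, identified by `name`."""
    name: str
    operands: list = field(default_factory=list)  # names of the ops it uses
    has_side_effects: bool = False


def erase_dead_ops(ops):
    """Drop side-effect-free ops whose result is unused, until a fixed point."""
    ops = list(ops)
    changed = True
    while changed:
        # Recompute the set of used results after every sweep.
        used = {operand for op in ops for operand in op.operands}
        live = [op for op in ops if op.has_side_effects or op.name in used]
        changed = len(live) != len(ops)
        ops = live
    return ops


# After a rewrite redirected `ret` from `c` to `a`, both `c` and `b` are dead.
program = [
    Op("a"),
    Op("b", ["a"]),
    Op("c", ["b"]),
    Op("ret", ["a"], has_side_effects=True),
]
remaining = [op.name for op in erase_dead_ops(program)]
```

In this example `c` is removed in the first sweep and `b` only in the second, which is why a single pass is not enough.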
PiperOrigin-RevId: 913656790
Imported from GitHub PR openxla/xla#41779

📝 Summary of Changes

This PR migrates `ReduceScatterCmd` to use `ReduceScatterThunk` directly as a command-buffer command, matching the existing `AllReduceThunk` command migration pattern. It removes the dedicated `ReduceScatterCmd` wrapper and appends `ReduceScatterThunk` as a borrowed command in the command-buffer emitter. It also adds multi-GPU command-buffer tests for `ReduceScatterThunk`, covering eager warmup via `ExecuteOnStream`, command-buffer create, command-buffer update, and output correctness.

🎯 Justification

This reduces duplicate command-buffer collective plumbing and keeps reduce-scatter behavior aligned with the shared `CollectiveThunk` recording path. The change benefits GPU workloads using reduce-scatter collectives captured into command buffers, especially distributed workloads that rely on command-buffer update paths.

🚀 Kind of Contribution

♻️ Cleanup, 🧪 Tests

📊 Benchmark (for Performance Improvements)

Not applicable. This PR is a cleanup/test coverage change and does not claim a performance improvement.

🧪 Unit Tests:

Added/updated command-buffer recording tests in: `//xla/backends/gpu/runtime:all_reduce_thunk_test`

Coverage includes:
- `ReduceScatterThunkTest.RecordCommandBufferCreate`
- `ReduceScatterThunkTest.RecordCommandBufferUpdate`

🧪 Execution Tests:

Added multi-GPU execution coverage in: `//xla/backends/gpu/runtime:all_reduce_thunk_multigpu_test`

New tests:
- `ReduceScatterThunkMultiGpuTest.RecordCommandBufferCreate`
- `ReduceScatterThunkMultiGpuTest.RecordCommandBufferUpdate`

These run with 2 GPUs and verify expected reduce-scatter outputs for both command-buffer create and update paths.

Validated locally with: `bazel test --test_output=errors --test_filter='ReduceScatterThunkMultiGpuTest.*' //xla/backends/gpu/runtime:all_reduce_thunk_multigpu_test`

Copybara import of the project:

-- 77960ea67396bf055ee18937c14863b082e5f1d1 by Shawn Wang <shawnw@nvidia.com>:

[xla:gpu] Migrate ReduceScatterCmd to thunk command

-- 25dd889f23c30976c389923d43f1fba644c01e07 by Shawn Wang <shawnw@nvidia.com>:

[xla:gpu] Add ReduceScatterThunk multigpu tests

-- 2f7b052976da7ae21a85762f0d632c9877fb1334 by Shawn Wang <shawnw@nvidia.com>:

[xla:gpu] Clean up ReduceScatterThunk command buffer deps

-- 77715f319a63d5517e3a7ca8ba7173cfb10a26f0 by Shawn Wang <shawnw@nvidia.com>:

remove usused header

Merging this change closes #41779

PiperOrigin-RevId: 913656825
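For readers unfamiliar with the collective, the "expected reduce-scatter outputs" can be described with a small host-side reference. This is a sketch in plain Python, not the test code from the PR: `reduce_scatter_sum` is a hypothetical helper, and it assumes a sum reduction with equally sized, contiguous shards (rank `r` receives shard `r` of the elementwise sum).

```python
def reduce_scatter_sum(inputs):
    """Reference reduce-scatter with a sum reduction.

    inputs[r] is the full input buffer of rank r. The result's entry r is
    the shard of the elementwise sum that rank r would receive.
    """
    num_ranks = len(inputs)
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % num_ranks != 0:
        raise ValueError("buffer length must be divisible by the rank count")
    shard = length // num_ranks
    # Reduce elementwise across ranks, then hand each rank one slice.
    reduced = [sum(values) for values in zip(*inputs)]
    return [reduced[r * shard:(r + 1) * shard] for r in range(num_ranks)]


# Two ranks, mirroring the 2-GPU setup mentioned in the PR description.
outputs = reduce_scatter_sum([[1, 2, 3, 4], [10, 20, 30, 40]])
```

With these inputs the elementwise sum is `[11, 22, 33, 44]`, so rank 0 receives `[11, 22]` and rank 1 receives `[33, 44]`.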
Imported from GitHub PR openxla/xla#42218

Add a diffing tool for clang-tidy output

Copybara import of the project:

-- 14f33777161c2eac9a9eeb1fa8e6b8d37413b8d3 by Sohaib Iftikhar <sohaibiftikhar@google.com>:

[XLA:BUILD] Add a diffing tool for clang-tidy output

Adds a diffing tool for reading clang-tidy report files and reporting on the output only if that line was affected in the change.

Merging this change closes #42218

PiperOrigin-RevId: 913665880
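The idea of reporting a diagnostic only when its line was affected by the change can be sketched as follows. This is not the tool added in the PR: `filter_diagnostics` and the `changed_lines` mapping are hypothetical, the mapping is assumed to have been computed elsewhere (for example from a diff), and the regular expression assumes the usual `path:line:col: severity: message` shape of clang-tidy diagnostics.

```python
import re

# Assumed diagnostic shape: path:line:col: warning|error: message [check]
DIAGNOSTIC_RE = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>warning|error): (?P<message>.*)$"
)


def filter_diagnostics(report_text, changed_lines):
    """Keep only diagnostics that point at a changed line.

    changed_lines maps a file path to the set of line numbers touched by
    the change (hypothetical input, not produced by this sketch).
    """
    kept = []
    for raw in report_text.splitlines():
        match = DIAGNOSTIC_RE.match(raw)
        if match is None:
            continue  # skip notes, code snippets, and caret lines
        touched = changed_lines.get(match["path"], set())
        if int(match["line"]) in touched:
            kept.append(raw)
    return kept


report = """\
xla/foo.cc:10:3: warning: use nullptr [modernize-use-nullptr]
  int* p = 0;
xla/foo.cc:42:1: warning: unused variable 'x' [clang-diagnostic-unused-variable]
xla/bar.cc:7:5: error: no matching function [clang-diagnostic-error]
"""
relevant = filter_diagnostics(report, {"xla/foo.cc": {10, 11}})
```

Here only the diagnostic on `xla/foo.cc` line 10 survives, because line 42 and `xla/bar.cc` are not in the changed set.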
…tendAttrs. PiperOrigin-RevId: 913675881
Flags:
* mixin_max_same_mnk limits the number of mixin configs with the same M, N, K block sizes. This should help mitigate the cost model not differentiating configs very well.
* mixin_only_faster only allows mixin configs faster than the base set. This may help reduce the compile-time hit while keeping the perf benefits.

PiperOrigin-RevId: 913705752
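One way to read the two flags is as successive filters over the candidate mixin configs. The sketch below is an interpretation, not the XLA implementation: only the flag names come from the commit message, while the `Config` fields, `select_mixins`, and the use of an estimated runtime are hypothetical. It assumes "faster than the base set" means a lower cost-model estimate than the best base config, and that the per-(M, N, K) limit counts mixin configs only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical tiling config with a cost-model runtime estimate."""
    block_m: int
    block_n: int
    block_k: int
    num_warps: int
    est_runtime_us: float


def select_mixins(base, mixins, max_same_mnk=None, only_faster=False):
    """Filter mixin configs the way the two flags are described."""
    # Consider the most promising candidates first.
    candidates = sorted(mixins, key=lambda c: c.est_runtime_us)
    if only_faster and base:
        # Assumption: compare against the fastest config in the base set.
        best_base = min(c.est_runtime_us for c in base)
        candidates = [c for c in candidates if c.est_runtime_us < best_base]
    kept = []
    per_mnk = defaultdict(int)
    for config in candidates:
        key = (config.block_m, config.block_n, config.block_k)
        if max_same_mnk is not None and per_mnk[key] >= max_same_mnk:
            continue  # already have enough configs with these block sizes
        per_mnk[key] += 1
        kept.append(config)
    return kept


base = [Config(64, 64, 32, 4, 10.0)]
mixins = [
    Config(128, 128, 32, 4, 8.0),
    Config(128, 128, 32, 8, 9.0),
    Config(128, 128, 32, 2, 12.0),
    Config(64, 128, 32, 4, 9.5),
]
chosen = select_mixins(base, mixins, max_same_mnk=1, only_faster=True)
```

In this example the 12.0 µs candidate is dropped for not beating the base set, and the 9.0 µs candidate is dropped because a config with the same (128, 128, 32) block sizes was already kept.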
…imation. This old logic only worked for cases when we tile the full row anyway. Also fix the test name. PiperOrigin-RevId: 913717643
PiperOrigin-RevId: 913727465
…Attribute. PiperOrigin-RevId: 913775326
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)