
[pull] master from tensorflow:master #1627

Merged: pull[bot] merged 62 commits into makesoftwaresafe:master from tensorflow:master on May 9, 2026

pull[bot] commented on May 9, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


tensorflower-gardener and others added 30 commits May 7, 2026 23:07
The new API provides `RunIsolationTestOnModule` which performs comparisons between the test runner and a reference runner, including TPU vs Defused TPU and TPU vs Interpreter.

PiperOrigin-RevId: 912336410
PiperOrigin-RevId: 912338506
… pattern

Imported from GitHub PR openxla/xla#36224

📝 Summary of Changes
This PR enables hoisting all-reduce operations out of while loops for scatter-based gradient accumulation patterns, commonly used in ZeRO-1 with gradient accumulation.

🎯 Justification
This optimization improves ZeRO-1 gradient accumulation performance by replacing all-reduce operations inside a loop with one after the loop, reducing communication overhead.
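
To see why moving the all-reduce out of the loop preserves the result, the following pure-Python simulation may help. A sum all-reduce is linear, so accumulating already-reduced gradients gives the same values as accumulating locally and reducing once. The `all_reduce_sum` helper and the gradient values are hypothetical, and the sketch does not model the scatter part of the pattern or the actual XLA pass.

```python
# Hypothetical stand-in for a sum all-reduce across devices: every device
# receives the element-wise sum of all per-device values.
def all_reduce_sum(per_device):
    total = [sum(vals) for vals in zip(*per_device)]
    return [list(total) for _ in per_device]


def accumulate_in_loop(grads_per_step):
    """One all-reduce per loop iteration (the pattern before hoisting)."""
    num_devices = len(grads_per_step[0])
    width = len(grads_per_step[0][0])
    acc = [[0] * width for _ in range(num_devices)]
    for step_grads in grads_per_step:
        reduced = all_reduce_sum(step_grads)
        acc = [[a + r for a, r in zip(acc_d, red_d)]
               for acc_d, red_d in zip(acc, reduced)]
    return acc


def accumulate_then_reduce(grads_per_step):
    """Local accumulation in the loop, a single all-reduce afterwards."""
    num_devices = len(grads_per_step[0])
    width = len(grads_per_step[0][0])
    acc = [[0] * width for _ in range(num_devices)]
    for step_grads in grads_per_step:
        acc = [[a + g for a, g in zip(acc_d, g_d)]
               for acc_d, g_d in zip(acc, step_grads)]
    return all_reduce_sum(acc)


# Made-up integer gradients: 3 accumulation steps, 2 devices, 2 elements each.
grads = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
in_loop = accumulate_in_loop(grads)
hoisted = accumulate_then_reduce(grads)
```

Both variants produce the same accumulated values on every device, but the second issues one collective instead of one per step.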

🚀 Kind of Contribution
✨ New Feature, 🧪 Tests

📊 Benchmark (for Performance Improvements)
The public HLOs in `xla/tools/benchmarks/hlo/` do not have this pattern.

🧪 Unit Tests:
Added to `xla/service/while_loop_all_reduce_code_motion_test.cc`

🧪 Execution Tests:
N/A

Copybara import of the project:

--
f253cd406a6419d2c57818b42ade3617ae92c761 by Sevin Varoglu <svaroglu@nvidia.com>:

Support all-reduce hoisting for scatter-based accumulation pattern

--
b3debd31dd5d72d49f1afdc02f7654f095837516 by Sevin Varoglu <svaroglu@nvidia.com>:

Incorporate review feedback

Merging this change closes #36224

PiperOrigin-RevId: 912351282
Creating launch dimensions was an unnecessary and error-prone step.

PiperOrigin-RevId: 912365353
Imported from GitHub PR openxla/xla#42201

Use latest bant for improved DWYU checks
Copybara import of the project:

--
50ca3e78144a751560a6543e052cc7e035ca6fce by Eugene Zhulenev <ezhulenev@openxla.org>:

Bump DWYU bant version to 0.2.7

Merging this change closes #42201

PiperOrigin-RevId: 912369650
This brings down the number of shape size computations from O(#edges) to O(#instructions).
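
As a rough illustration of the complexity claim, the sketch below counts how often a size function runs when it is evaluated once per operand edge versus once per instruction with a lookup afterwards. The graph, the byte sizes, and the `shape_size` helper are all hypothetical; the commit's actual data structures are not shown in this log.

```python
# Hypothetical graph: instruction name -> list of operand names.
operands = {
    "p0": [],
    "add": ["p0", "p0"],
    "mul": ["add", "p0"],
    "root": ["mul", "add", "p0"],
}
# Made-up byte sizes standing in for a real shape-size function.
byte_sizes = {"p0": 16, "add": 16, "mul": 16, "root": 16}

calls = {"per_edge": 0, "per_instruction": 0}


def shape_size(name, counter):
    """Return the size of `name` and record that a computation happened."""
    calls[counter] += 1
    return byte_sizes[name]


# Per edge: the size is recomputed every time an instruction is used as an operand.
per_edge_total = sum(
    shape_size(op, "per_edge") for ops in operands.values() for op in ops
)

# Per instruction: compute each size once, then look it up for every edge.
cache = {name: shape_size(name, "per_instruction") for name in operands}
cached_total = sum(cache[op] for ops in operands.values() for op in ops)
```

With 7 operand edges and 4 instructions, the cached variant runs the size function 4 times instead of 7 while producing the same total.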

PiperOrigin-RevId: 912384006
PiperOrigin-RevId: 912388623
PiperOrigin-RevId: 912389759
… devices

Imported from GitHub PR openxla/xla#41901

## Summary

- `MakeGlobalTopologyFromPjRtClient` hardcodes `addressable_devices()[0]->device_kind()` for every device in the topology, causing all devices to report device 0's marketing name in mixed-GPU systems
- Use `pjrt_device->device_kind()` for addressable devices, falling back to `addressable_devices()[0]` only for non-addressable devices

## Background

In multi-GPU systems with different architectures (e.g., gfx906 + gfx1101, or Radeon VII + Radeon PRO W7700), `jax.devices()` reports device 0's name for all devices:

```python
>>> [d.device_kind for d in jax.devices()]
['AMD Radeon VII', 'AMD Radeon VII']  # device 1 is actually Radeon PRO W7700
```

While `compute_capability` and `core_count` are correctly per-device, `device_kind` is not — because the IFRT topology builder at `xla/python/pjrt_ifrt/pjrt_client.cc:366` always reads from `addressable_devices()[0]`.

Traced and reproduced on a physical gfx906 + gfx1101 mixed-GPU system. The underlying PJRT layer (`PJRT_DeviceDescription_Kind`) returns correct per-device names — the bug is solely in the IFRT topology construction.
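
The before and after behavior can be modeled in a few lines of Python. This is only a sketch of the selection logic described above, not the C++ code in `pjrt_client.cc`; `FakeDevice`, both helper functions, and the exact name string for the second GPU are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a PjRt device; only device_kind is modeled."""
    device_kind: str


def device_kinds_before(devices, addressable):
    # Old behavior: every entry copies the first addressable device's kind.
    return [addressable[0].device_kind for _ in devices]


def device_kinds_after(devices, addressable):
    # New behavior: use each device's own kind; fall back to the first
    # addressable device only when no device object is available (None here).
    return [
        dev.device_kind if dev is not None else addressable[0].device_kind
        for dev in devices
    ]


radeon_vii = FakeDevice("AMD Radeon VII")
w7700 = FakeDevice("AMD Radeon PRO W7700")
addressable = [radeon_vii, w7700]
# The trailing None models a non-addressable device.
topology = [radeon_vii, w7700, None]

before = device_kinds_before(topology, addressable)
after = device_kinds_after(topology, addressable)
```

In this model `before` repeats device 0's name for every entry, while `after` reports each addressable device's own name and uses the fallback only for the non-addressable entry.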

Fixes ROCm/rocm-jax#390 (cosmetic portion)

## Test plan

- [x] Verify `device_kind` reports correct per-device names in mixed-GPU systems
- [x] Verify homogeneous multi-GPU systems are unaffected
- [x] Verify single-GPU systems are unaffected
- [x] Verify non-addressable device fallback still works
Copybara import of the project:

--
76b9c85c046d04ca5a114791a85cdbd84b78f22d by Luca Bruni <lucbruni@amd.com>:

Fix IFRT topology reporting device 0's device_kind for all devices.

MakeGlobalTopologyFromPjRtClient always used
addressable_devices()[0]->device_kind() when populating the DeviceProto
for every device in the topology. In mixed-GPU systems (e.g., different
GPU architectures), this caused all devices to report device 0's
marketing name instead of their own.

Use pjrt_device->device_kind() for addressable devices, falling back to
addressable_devices()[0] only for non-addressable devices where no
PjRtDevice is available.

Co-Authored-By: Luca Bruni <lucbruni@amd.com>
Co-Authored-By: Luca Bruni <lucaalexbruni@gmail.com>

Merging this change closes #41901

PiperOrigin-RevId: 912404916
Imported from GitHub PR openxla/xla#42096

Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 8.0.0 to 8.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's releases</a>.</em></p>
<blockquote>
<h2>v8.1.0 🌈 New input <code>no-project</code></h2>
<h2>Changes</h2>
<p>This adds a new boolean input <code>no-project</code>.
It only makes sense to use in combination with <code>activate-environment: true</code> and will append <code>--no-project</code> to the <code>uv venv</code> call. This is useful, for example, <a href="https://redirect.github.com/astral-sh/setup-uv/issues/854">if you have a pyproject.toml file with parts unparseable by uv</a>.</p>
<h2>🚀 Enhancements</h2>
<ul>
<li>Add input no-project in combination with activate-environment <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/856">#856</a>)</li>
</ul>
<h2>🧰 Maintenance</h2>
<ul>
<li>fix: grant contents:write to validate-release job <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/860">#860</a>)</li>
<li>Add a release-gate step to the release workflow <a href="https://github.com/zanieb"><code>@​zanieb</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/859">#859</a>)</li>
<li>Draft commitish releases <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/858">#858</a>)</li>
<li>Add action-types.yml to instructions <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/857">#857</a>)</li>
<li>chore: update known checksums for 0.11.7 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/853">#853</a>)</li>
<li>Refactor version resolving <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/852">#852</a>)</li>
<li>chore: update known checksums for 0.11.6 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/850">#850</a>)</li>
<li>chore: update known checksums for 0.11.5 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/845">#845</a>)</li>
<li>chore: update known checksums for 0.11.4 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/843">#843</a>)</li>
<li>Add a release workflow <a href="https://github.com/zanieb"><code>@​zanieb</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/839">#839</a>)</li>
<li>chore: update known checksums for 0.11.3 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/836">#836</a>)</li>
</ul>
<h2>📚 Documentation</h2>
<ul>
<li>Update ignore-nothing-to-cache documentation <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/833">#833</a>)</li>
<li>Pin setup-uv docs to v8 <a href="https://github.com/eifinger"><code>@​eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/829">#829</a>)</li>
</ul>
<h2>⬆️ Dependency updates</h2>
<ul>
<li>chore(deps): bump release-drafter/release-drafter from 7.1.1 to 7.2.0 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/855">#855</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/astral-sh/setup-uv/commit/08807647e7069bb48b6ef5acd8ec9567f424441b"><code>0880764</code></a> fix: grant contents:write to validate-release job (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/860">#860</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/717d6aba0f15312f509f5c4999e34d71ecbab8a9"><code>717d6ab</code></a> Add a release-gate step to the release workflow (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/859">#859</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/5a911eb3a3983b5e650f2dad95c1ce698ca94378"><code>5a911eb</code></a> Draft commitish releases (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/858">#858</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/080c31e04cd7155b0ca676d08c7bc260a4476a23"><code>080c31e</code></a> Add action-types.yml to instructions (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/857">#857</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/b3e97d2ba1a1eed7e9d1f8456dd06c3b725bc3a6"><code>b3e97d2</code></a> Add input no-project in combination with activate-environment (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/856">#856</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/7dd591db9557f680290587fcc578372813b9ff64"><code>7dd591d</code></a> chore(deps): bump release-drafter/release-drafter from 7.1.1 to 7.2.0 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/855">#855</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/1541b7762698877904805605192ecd63d0e4787a"><code>1541b77</code></a> chore: update known checksums for 0.11.7 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/853">#853</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/cdfb2ee6dde255817c739680168ad81e184c4bfb"><code>cdfb2ee</code></a> Refactor version resolving (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/852">#852</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/cb84d12dc6a0d495b82fcae14fa4559b90698660"><code>cb84d12</code></a> chore: update known checksums for 0.11.6 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/850">#850</a>)</li>
<li><a href="https://github.com/astral-sh/setup-uv/commit/1912cc65f2e839707d7a16f2372f30b57d35fd80"><code>1912cc6</code></a> chore: update known checksums for 0.11.5 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/845">#845</a>)</li>
<li>Additional commits viewable in <a href="https://github.com/astral-sh/setup-uv/compare/cec208311dfd045dd5311c1add060b2062131d57...08807647e7069bb48b6ef5acd8ec9567f424441b">compare view</a></li>
</ul>
</details>
<br />

Copybara import of the project:

--
8ec91047cf93fc7cc20ea4d7849a60890ed01cf1 by dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>:

Bump astral-sh/setup-uv from 8.0.0 to 8.1.0

Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 8.0.0 to 8.1.0.
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](astral-sh/setup-uv@cec2083...0880764)

---
updated-dependencies:
- dependency-name: astral-sh/setup-uv
  dependency-version: 8.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Merging this change closes #42096

PiperOrigin-RevId: 912405230
Imported from GitHub PR openxla/xla#42198

When loading an AOT result, the `xla_dump_to` path points to a location on the machine that compiled the binary. Override it with the path for the current process so that all dumps are written to the correct location.
Copybara import of the project:

--
6cebc8044554fccd326bef1f82fc251be1fe005d by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Override xla_dump_to path when loading AOT result

Merging this change closes #42198

PiperOrigin-RevId: 912405569
Imported from GitHub PR openxla/xla#42103

🚀 Kind of Contribution
♻️ Cleanup

Copybara import of the project:

--
62edde4be67bf0154635595852e44079423997df by Aleksei Nurmukhametov <anurmukh@amd.com>:

[NFC] Fix IWYU/DWYU in xla/stream_executor/device_description_test.cc

Merging this change closes #42103

PiperOrigin-RevId: 912406790
Imported from GitHub PR openxla/xla#41440

- Add a `DynamicSliceFusion` analysis library that extracts hero instructions, resolves parameter/result slices, and returns `DynamicSliceConfig` protos from dynamic-slice fusion computations.
- Key APIs: `FindHero()` locates the hero op inside a fusion body, `ResolveParameters()` maps hero operands back to fusion parameters with slice configs and per-dimension offsets, `ResolveResults()` does the same for DUS outputs.
- Offsets are either `ConstantOffset` (literal sunk into the fusion) or `RuntimeOffset` (fusion parameter holding the induction variable), enabling the thunk emitter to verify annotated offsets at runtime.
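
The two offset kinds listed above can be pictured with a small Python model. The names `ConstantOffset` and `RuntimeOffset` come from the PR description; their fields, the `resolve_offsets` helper, and the parameter values are assumptions made for illustration and do not mirror the real C++ types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ConstantOffset:
    value: int  # literal known at compile time


@dataclass(frozen=True)
class RuntimeOffset:
    parameter_index: int  # which fusion parameter carries the offset


Offset = Union[ConstantOffset, RuntimeOffset]


def resolve_offsets(offsets, parameter_values):
    """Turn per-dimension offsets into concrete integers at run time."""
    resolved = []
    for offset in offsets:
        if isinstance(offset, ConstantOffset):
            resolved.append(offset.value)
        else:
            resolved.append(parameter_values[offset.parameter_index])
    return resolved


# Dimension 0 is driven by fusion parameter 2 (for example a loop counter);
# dimension 1 is fixed at 0.
offsets = [RuntimeOffset(parameter_index=2), ConstantOffset(0)]
resolved = resolve_offsets(offsets, parameter_values={2: 5})
```

Keeping the two cases as distinct types lets a consumer tell which offsets are already known and which must be read from a parameter when the fusion runs.
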
Copybara import of the project:

--
d97fa268c679c74f6346e3eb07418bd4358f4722 by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:gpu] Add dynamic-slice fusion analysis library

Merging this change closes #41440

PiperOrigin-RevId: 912408149
This flag will take effect once we switch downstream projects (JAX, TF) to
Bzlmod; until then it is a no-op.

We can later remove --override_repository when the Bzlmod migration is done.

PiperOrigin-RevId: 912431884
PiperOrigin-RevId: 912432140
The test now starts by parsing a SymbolicMap from a string, converts it to an AffineMap, and then converts it back to a SymbolicMap to verify the round trip. This removes the need for a custom AffineMap parsing helper and therefore a TODO.

The class could be removed outright, but as Adrian pointed out it is still used in one place (simplify_affine), so the file stays for now since it does no harm.

PiperOrigin-RevId: 912437098
The calculation of L2 bytes now takes into account the element types of the LHS and RHS operands, providing a more accurate estimate of the data loaded from L2.
More info: First conclusion of b/501002656#comment2.
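
A simplified view of why element types matter: the bytes attributed to each operand scale with that operand's element width. The formula below is an assumption for illustration only (each operand counted once, no tiling or reuse) and is not the cost model's actual computation.

```python
# Byte widths for a few element types (standard sizes).
BYTES_PER_ELEMENT = {"f32": 4, "bf16": 2, "f16": 2, "s8": 1}


def operand_bytes(m, n, k, lhs_type, rhs_type):
    """Bytes read for an [m, k] x [k, n] dot, counting each operand once."""
    lhs_bytes = m * k * BYTES_PER_ELEMENT[lhs_type]
    rhs_bytes = k * n * BYTES_PER_ELEMENT[rhs_type]
    return lhs_bytes + rhs_bytes


# Same problem size, different operand element types.
uniform = operand_bytes(128, 256, 64, "f32", "f32")
mixed = operand_bytes(128, 256, 64, "bf16", "s8")
```

For the same problem size, the mixed-precision case moves a third of the bytes of the all-`f32` case, a difference that an estimate ignoring operand types would miss.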

We now overestimate, but that may be a separate problem, as Nikita suggested in b/501002656#comment3. The bug's case study also still produces the same suggestions for all configs. Even so, this small change clearly improves the cost model's suggestions.

The tests had to be adjusted, probably because of the issue mentioned in b/510666436, so I added the TODO accordingly.

PiperOrigin-RevId: 912437307
PiperOrigin-RevId: 912439391
…ion) in HeapAlgorithmWithFallback for preventing device OOMs.

PiperOrigin-RevId: 912451895
The GPU dot fusion cost model now checks whether any transitive user of a dot instruction is a transpose. If a transpose is found in the users' graph, the dot fusion is marked as unsupported. A TODO has been left to track adding support in the future.
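
A check over transitive users is a plain graph search. The sketch below uses a hypothetical dictionary-based graph and a breadth-first traversal; the real implementation works on HLO instructions and may traverse differently.

```python
from collections import deque

# Hypothetical graph: instruction name -> (opcode, list of user names).
graph = {
    "dot": ("dot", ["add"]),
    "add": ("add", ["transpose", "exp"]),
    "exp": ("exponential", []),
    "transpose": ("transpose", []),
}


def has_transitive_transpose_user(graph, start):
    """Breadth-first search over users of `start`, looking for a transpose."""
    seen = set()
    queue = deque(graph[start][1])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        opcode, users = graph[name]
        if opcode == "transpose":
            return True
        queue.extend(users)
    return False


# The dot is treated as unsupported when a transpose is reachable through users.
dot_is_supported = not has_transitive_transpose_user(graph, "dot")
```

The `seen` set keeps the search linear in the number of reachable instructions even when several paths lead to the same user.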

PiperOrigin-RevId: 912453825
To match API of experimental::TilingHloInstruction.

PiperOrigin-RevId: 912455533
so we don't need --use_experimental_tiling and can also set other flags if need be

PiperOrigin-RevId: 912463889
PatriosTheGreat and others added 27 commits May 8, 2026 07:54
…rt shardings.

To avoid context bloat. In MLIR, updating function result attributes requires creating a new ArrayAttr that holds the DictionaryAttr for all results.

PiperOrigin-RevId: 912524956
This change introduces a new GitHub Actions workflow (`actions-lint.yml`) to run `actionlint` on changes to workflow files. It also fixes several errors found by actionlint.

The change also includes several improvements to existing workflows, such as quoting variables in shell commands, using here-documents for writing to output files, and adding a `halt-for-connection` input to some workflows for manual triggering. Minor fixes to conditional checks and command formatting are also included.

PiperOrigin-RevId: 912531462
PiperOrigin-RevId: 912535650
This change updates scheduling (used by the experimental emitter) to support a customized traversal order. For dot, the experimental emitter now optimizes cache hits by swapping the m and n traversal whenever the LHS operand is smaller than the RHS.

PiperOrigin-RevId: 912556582
…eamz

Adds a new field "enable_priority_queue" to the "/tensorflow/serving/batching/mixed_priority_batching_policy" streamz metric to provide more context on whether priority queueing is enabled when reporting the mixed priority batching policy.

PiperOrigin-RevId: 912586091
- When there is padding, the padding value should be the reduce identity value, not the reduce initial value.
- Windows of size > 1 must still be reported as unsupported if the stride is not 1.
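
The first point is easiest to see with a non-identity initial value. Assuming the context is a windowed sum reduction, the sketch below pads a 1-D input either with the identity of addition (0) or with the initial value; `reduce_window_sum` is a hypothetical helper with stride fixed at 1.

```python
def reduce_window_sum(values, window, init, pad, pad_value):
    """1-D sum reduce-window with stride 1 and `pad` elements on each side."""
    padded = [pad_value] * pad + list(values) + [pad_value] * pad
    out = []
    for start in range(len(padded) - window + 1):
        acc = init
        for x in padded[start:start + window]:
            acc += x
        out.append(acc)
    return out


values = [1, 2, 3]
init = 10  # a non-identity initial value
identity = 0  # identity of addition

correct = reduce_window_sum(values, window=2, init=init, pad=1, pad_value=identity)
wrong = reduce_window_sum(values, window=2, init=init, pad=1, pad_value=init)
```

Padding with the initial value adds it a second time in every window that touches the border, so only the border outputs differ between `correct` and `wrong`.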

PiperOrigin-RevId: 912589684
Updating (removing or setting) N func args/results one by one in a loop creates N^2 pointers, that is, each one will create an ArrayAttr of N pointers.
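
The arithmetic behind the N^2 figure can be written out as follows. This is a counting model only, with hypothetical function names; it does not call any MLIR API.

```python
def pointers_one_by_one(n):
    """Each of the n updates builds a fresh array holding n entries."""
    return sum(n for _ in range(n))


def pointers_batched(n):
    """All n updates are collected first, then one array of n entries is built."""
    return n


n = 1000
one_by_one = pointers_one_by_one(n)
batched = pointers_batched(n)
```

For 1000 arguments or results, updating one at a time allocates a million pointers in this model, compared with 1000 when the updates are applied in a single step.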

PiperOrigin-RevId: 912590644
…ion is seen

during the upward traversal.

PiperOrigin-RevId: 912593212
PiperOrigin-RevId: 912629390
…aled_dot and writing a helper method SupportsPrecisionConfig

PiperOrigin-RevId: 912667637
Eventually we want to replace SymbolicTileAnalysis. To do that, we
need a way to evaluate whether certain tile sizes are valid. As
preparation, add the necessary constraints for Concat during tiling
propagation. The existing constraint was also too strong: it would have
disallowed valid cases with slice(concat). Improve the constraints to handle
cases with a constant base offset as well.

PiperOrigin-RevId: 912669003
Reverts 582b956

PiperOrigin-RevId: 912706974
PiperOrigin-RevId: 912723972
…contact Shardy team.

PiperOrigin-RevId: 912738934
tsl::AsyncValue.

PiperOrigin-RevId: 912749857
pull[bot] locked and limited conversation to collaborators May 9, 2026
pull[bot] added the ⤵️ pull label May 9, 2026
pull[bot] merged commit a4307ab into makesoftwaresafe:master May 9, 2026