
fix: Made histc_input robust for broader hardware #45687

Open
rigen1048 wants to merge 3 commits into huggingface:main from rigen1048:OSI

Conversation


@rigen1048 rigen1048 commented Apr 28, 2026

What does this PR do?

Fixes a NotImplementedError: "histogram_mps" not implemented for 'Int' when running Mixture-of-Experts (MoE) models on Apple Silicon (MPS backend).
The error occurred in src/transformers/integrations/moe.py because torch.histc does not support integer dtypes on MPS, and the original condition did not account for the MPS backend. This PR changes the logic to use int only when running on CUDA, since float operations are more reliably supported across a wider range of hardware, including legacy devices (see the sketch below).

Before:

  • Use float on CPU
  • Use int on all other backends (CUDA, MPS, XPU, TPU, etc.)

After:

  • Use int on CUDA (best performance)
  • Use float32 on all other backends (CPU, MPS, XPU, etc.)

Fixes #45685
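
For illustration, here is a minimal sketch of the selection logic described above. The function name, signature, and bin bounds are assumptions for this sketch, not the actual code in moe.py:

```python
import torch

def expert_histogram(expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Count how many tokens were routed to each expert (hypothetical sketch)."""
    if expert_indices.is_cuda:
        # CUDA's histc kernel handles integer inputs, so keep int for speed.
        values = expert_indices.int()
    else:
        # CPU, MPS, XPU, etc. only implement histc for floating-point inputs.
        values = expert_indices.float()
    return torch.histc(values, bins=num_experts, min=0, max=num_experts)
```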

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.
    • AI was used to understand the broader application, to write a manual smoke-test script for regression checking, and to bring clarity to the PR


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@IlyasMoutawwakil
Member

I think it makes more sense to keep using int as the default for any hardware we are not aware of, because it's better to have the best perf on as much hardware as possible and only use float where int breaks.

@IlyasMoutawwakil
Member

Because integer histograms are easier to compute than floating-point ones.
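
A sketch of that direction, with int as the default and float only for backends known to break. The helper name echoes the PR title, and the device list is an assumption for illustration, not a vetted list:

```python
import torch

# Backends observed so far to lack integer support for torch.histc.
# Membership here is assumed for illustration.
_HISTC_FLOAT_ONLY = {"cpu", "mps"}

def histc_input(expert_indices: torch.Tensor) -> torch.Tensor:
    # Default to int for the best performance on any hardware we have not
    # flagged, and downgrade to float32 only where int is known to break.
    if expert_indices.device.type in _HISTC_FLOAT_ONLY:
        return expert_indices.float()
    return expert_indices.int()
```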

@rigen1048
Author

@IlyasMoutawwakil I agree with you — using int by default makes sense for performance whenever it's reliably supported, since integer histograms are generally faster and cheaper to compute.
That said, a broader range of hardware (such as MPS on Apple Silicon, and potentially some other backends) currently doesn't support integer dtypes for torch.histc, which is what caused the NotImplementedError.
I also considered the "try int first, fall back to float" pattern. While it is more aggressive about keeping good performance on unknown hardware, it has a couple of downsides:

  • It can hide bugs or lead to inconsistent behavior across devices.
  • The fallback path adds a small runtime cost and makes the code slightly more complex.

Because of that, I went with the explicit approach in this PR: int on CUDA (where we know it works well and gives the best perf) and float32 everywhere else for broad compatibility.
Happy to adjust if there's a cleaner way to handle this; the fallback pattern I'm referring to is sketched below.
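
For reference, a hypothetical sketch of that fallback pattern; this is not code from the PR, only an illustration of the approach's shape:

```python
import torch

def histc_with_fallback(expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    # Try the fast integer path first; if the backend's histc kernel rejects
    # integer dtypes, retry with float32.  The except path is what can mask
    # unrelated errors and complicate tracing, as discussed above.
    try:
        return torch.histc(expert_indices.int(), bins=num_experts, min=0, max=num_experts)
    except (RuntimeError, NotImplementedError):
        return torch.histc(expert_indices.float(), bins=num_experts, min=0, max=num_experts)
```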

@rigen1048
Author

Given that the large majority of users (especially for MoE models) run on CUDA, this approach gives them the best performance while ensuring reliable behavior on other backends like MPS. We can always expand the int64 support list later as more backends (ROCm, XPU, etc.) stabilize.

@Vincent08199

It feels like this is a classic performance vs portability tradeoff:

  • int histograms → better performance
  • float histograms → broader hardware support (MPS, etc.)

One question I had: do we know how significant the performance difference is in practice for MoE workloads?

If the gap is large, it might be worth keeping int as the default and falling back only when needed. But if the difference is small, the current approach (float for non-CUDA) seems safer.

Also wondering whether a "try int → fallback to float" approach could be acceptable if the failure is rare and predictable.
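
One way to answer the performance question empirically is a micro-benchmark along these lines (hypothetical; the bin count, tensor size, and iteration count are arbitrary choices):

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
idx = torch.randint(0, 128, (1_000_000,), device=device)

for dtype in (torch.int32, torch.float32):
    x = idx.to(dtype)
    try:
        torch.histc(x, bins=128, min=0, max=128)  # warm-up / support check
    except (RuntimeError, NotImplementedError):
        print(f"{dtype}: histc not supported on {device}")
        continue
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(100):
        torch.histc(x, bins=128, min=0, max=128)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"{dtype}: {(time.perf_counter() - t0) * 10:.3f} ms per call")
```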

@rigen1048
Author

> It feels like this is a classic performance vs portability tradeoff:
>
> * int histograms → better performance
> * float histograms → broader hardware support (MPS, etc.)
>
> One question I had: do we know how significant the performance difference is in practice for MoE workloads?
>
> If the gap is large, it might be worth keeping int as the default and falling back only when needed. But if the difference is small, the current approach (float for non-CUDA) seems safer.
>
> Also wondering whether a "try int → fallback to float" approach could be acceptable if the failure is rare and predictable.

Yes, that's correct. However, the main issue here is more about hardware reality than a classic performance vs portability tradeoff.

Integer histograms for torch.histc are still poorly supported outside of CUDA. They tend to fail on MPS, CPU, and several other backends (XPU, TPU, etc.), and broader support may take a long time to mature.

My current approach uses int only on CUDA (where we get the best performance) and float32 everywhere else. This gives strong performance for the majority of MoE users while ensuring reliable behavior across hardware.

Regarding the "try int first, then fall back to float" idea: I considered it as well. While it could offer better performance on unknown hardware, I'm concerned it might introduce subtle issues, such as problems with torch.compile, Python exception handling interfering with the memory cache, or inconsistent states on backends like Metal (MPS). That said, I might be overthinking it, so I'd really appreciate the maintainers' feedback on how conservative to be here.



Development

Successfully merging this pull request may close these issues.

[moe] mps interface has error "histogram_mps" not implemented for 'Int'
