
Add universal phone recognition model - PhoneticXeus #45355

Open
Shikhar-S wants to merge 8 commits into huggingface:main from Shikhar-S:pxeus



@Shikhar-S Shikhar-S commented Apr 10, 2026

What does this PR do?

This PR introduces PhoneticXeus, a state-of-the-art universal phone recognizer trained on 70+ languages and evaluated on ~100 languages. The model should be highly useful to the linguistics, phonology, and multilingual research communities.
Since this is currently the best-performing multilingual phone recognition model available, I expect the integration will also attract new users to HF.

The E-Branchformer encoder used here is architecturally distinct from existing models and has been employed in top-performing speech models (XEUS, OWSM v2+, etc.). The implementation here (ported from ESPnet) could also serve as a foundation for future speech models in HF.

Code Agent Policy

I have read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR. I used a coding agent for the initial draft of this PR, then tested the output against the original implementations and manually reduced the verbosity of the generated code.

Before submitting

@Shikhar-S Shikhar-S marked this pull request as draft April 10, 2026 04:27
@Rocketknight1
Member

Rocketknight1 commented Apr 10, 2026

Audio model I guess, albeit an unusual one, so cc @ebezzam @eustlb! Transcription to IPA is very niche (but very cool) though, so we may want to keep this as a remote code model unless we expect significant usage.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, phoneticxeus

@Shikhar-S
Author

Thanks @Rocketknight1 for the initial review!
I have added the motivation for this PR to the description above. Additionally, integrating the model into the HF ecosystem will lower the barrier for people to reproduce the results and build on top of them.

@Shikhar-S Shikhar-S changed the title from "Pxeus" to "Add universal phone recognition model - PhoneticXeus" Apr 11, 2026
@Shikhar-S Shikhar-S marked this pull request as ready for review April 11, 2026 02:43
@ebezzam ebezzam added the Audio label Apr 13, 2026
@eustlb
Contributor

eustlb commented Apr 13, 2026

Hey @Shikhar-S, thanks a lot for your PR, really cool contribution!

I’d like to confirm what @Rocketknight1 mentioned above: this model is a better candidate for remote code, which will allow you and users to fully leverage the HF ecosystem. If it gains strong usage, it could become a great candidate for native support 🤗

Thanks a lot for your understanding! I’m happy to help with the remote code option if you need it

@Shikhar-S
Author

Shikhar-S commented Apr 14, 2026

Hi @eustlb, thanks for taking a look!
To add it via the remote code option, do I need to move the modeling code to my HF repo and keep only the auto registrations here? Is there an example or some documentation I can look at for this option? I will change the PR accordingly. Thanks!

Edit: adding @Rocketknight1 to advise on my next step.
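
For context on the remote code question above: in the Transformers custom-model workflow, as far as I understand it, the configuration and modeling files live in the model's Hub repo, and the repo's `config.json` carries an `auto_map` entry that points the Auto classes at them. The sketch below only builds such an entry with the standard library. The file names, class names, and `model_type` value are hypothetical placeholders, not taken from this PR, and the maintainers may recommend a different layout.

```python
import json


def build_auto_map(config_module, config_class, model_module, model_class):
    """Build an `auto_map` dict of the form "<module file>.<class name>".

    All four arguments are illustrative; they are not names from this PR.
    """
    return {
        "AutoConfig": f"{config_module}.{config_class}",
        "AutoModel": f"{model_module}.{model_class}",
    }


# Hypothetical file and class names for a Hub repo hosting the custom code.
auto_map = build_auto_map(
    "configuration_phoneticxeus", "PhoneticXeusConfig",
    "modeling_phoneticxeus", "PhoneticXeusModel",
)

# Assumed fragment of config.json; a real config would hold many more fields.
config_fragment = {"model_type": "phoneticxeus", "auto_map": auto_map}
print(json.dumps(config_fragment, indent=2))
```

In practice this entry is usually not written by hand: calling `register_for_auto_class` on the custom config and model classes and then `push_to_hub` is the documented way to have it generated, and users would then load the model with `AutoModel.from_pretrained(..., trust_remote_code=True)`.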
