Fix _get_feat_extract_output_lengths in qwen3_omni_moe #45091

Closed
hkc5 wants to merge 1 commit into huggingface:main from hkc5:fix-qwen3-omni-moe-feat-extract

hkc5 commented Mar 29, 2026

This PR fixes the unexpected behaviour of the helper function _get_feat_extract_output_lengths in qwen3_omni_moe, as reported in #45083.

Problem

The current implementation calculates the output length of the convolutional layers by:

  1. Taking the input lengths modulo 100
  2. Adding a correction term of (input_lengths // 100) * 13

The result does not match the output-length formula given in the PyTorch Conv2d documentation when that formula is applied to the full input length.
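
For illustration, the calculation described above can be sketched in plain Python. This is a reconstruction from the two listed steps, not code copied from the repository: the names `halve` and `described_output_length` are hypothetical, and the sketch assumes the remainder is passed through three stride-2 reductions before the correction term is added.

```python
def halve(n: int) -> int:
    """One stride-2 reduction: (n - 1) // 2 + 1 (Python floor division)."""
    return (n - 1) // 2 + 1


def described_output_length(input_length: int) -> int:
    """Sketch of the two steps listed above (assumed structure, see lead-in)."""
    # Step 1: keep only the remainder after removing full blocks of 100.
    remainder = input_length % 100
    # Assumption: the remainder goes through three stride-2 reductions.
    reduced = halve(halve(halve(remainder)))
    # Step 2: add 13 for every full block of 100 input positions.
    return reduced + (input_length // 100) * 13


if __name__ == "__main__":
    for n in (100, 150, 200):
        print(n, described_output_length(n))
```

Under these assumptions, an input length of 200 gives 26 and an input length of 150 gives 20.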

Fix

Updated the function to correctly calculate the output length based on the PyTorch Conv2d formula:

  • For Conv2d with kernel_size=3, stride=2, padding=1: output = (input + 2*1 - 3) // 2 + 1 = (input - 1) // 2 + 1
  • Applied three times in sequence, once for each of the 3 conv layers in the audio encoder
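
A minimal sketch of this calculation in plain Python, assuming dilation=1 and integer input lengths. The names `conv_out_len` and `three_layer_output_length` are hypothetical and chosen for illustration; they are not the identifiers used in the PR.

```python
def conv_out_len(n: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length along one dimension of a convolution with dilation=1.

    Follows the Conv2d output-shape formula:
    floor((n + 2*padding - kernel_size) / stride) + 1
    """
    return (n + 2 * padding - kernel_size) // stride + 1


def three_layer_output_length(input_length: int, num_layers: int = 3) -> int:
    """Apply the per-layer formula once for each conv layer, in sequence."""
    length = input_length
    for _ in range(num_layers):
        length = conv_out_len(length)
    return length


if __name__ == "__main__":
    for n in (100, 150, 200):
        print(n, three_layer_output_length(n))
```

With the default arguments, `conv_out_len(n)` reduces to `(n - 1) // 2 + 1`, so an input length of 200 becomes 100, then 50, then 25.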

Files Changed

  • src/transformers/models/qwen3_omni_moe/modeling_qwen3_omni_moe.py
  • src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
  • src/transformers/models/qwen3_omni_moe/processing_qwen3_omni_moe.py

Fixes #45083

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

Development

Successfully merging this pull request may close these issues.

Unexpected behaviour of helper function _get_feat_extract_output_lengths in qwen3_omni_moe
