
fix failure of pytest tests/models/paddleocr_vl/test_modeling_paddleocr_vl.py::PaddleOCRVLModelTest::test_flash_attn_2_fp32_ln #43001

Closed

sywangyi wants to merge 2 commits into huggingface:main from sywangyi:paddleocr_vl

Conversation

@sywangyi
Contributor

quantization_config is not set for the text and visual parts of the model, so the datatype conversion before flash attention is not performed.

pytest tests/models/paddleocr_vl/test_modeling_paddleocr_vl.py::PaddleOCRVLModelTest::test_flash_attn_2_fp32_ln

The failure looks like:

../bk/hub/models--kernels-community--flash-attn2/snapshots/172e23272e585d3c0d97124bc690593af81a0b95/build/torch29-cxx11-cu128-x86_64-linux/flash_attn2/flash_attn_interface.py:1199: in flash_attn_func
    return FlashAttnFunc.apply(
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/autograd/function.py:581: in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../bk/hub/models--kernels-community--flash-attn2/snapshots/172e23272e585d3c0d97124bc690593af81a0b95/build/torch29-cxx11-cu128-x86_64-linux/flash_attn2/flash_attn_interface.py:837: in forward
    out_padded, softmax_lse, S_dmask, rng_state = _wrapped_flash_attn_forward(
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_ops.py:1255: in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_library/autograd.py:111: in autograd_impl
    result = forward_no_grad(*args, Metadata(keyset, keyword_only_args))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_library/autograd.py:40: in forward_no_grad
    result = op.redispatch(keyset & _C._after_autograd_keyset, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_ops.py:848: in redispatch
    return self._handle.redispatch_boxed(keyset, *args, **kwargs)  # type: ignore[return-value]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_library/custom_ops.py:343: in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_compile.py:53: in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:1044: in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_library/custom_ops.py:376: in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
../bk/hub/models--kernels-community--flash-attn2/snapshots/172e23272e585d3c0d97124bc690593af81a0b95/build/torch29-cxx11-cu128-x86_64-linux/flash_attn2/flash_attn_interface.py:94: in _flash_attn_forward
    out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.fwd(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <OpOverloadPacket(op='_flash_attn_9e27194.fwd')>
args = (tensor([[[[0, 0, 0,  ..., 0, 0, 0],
          [0, 0, 0,  ..., 0, 0, 0],
          [0, 0, 0,  ..., 0, 0, 0],
         ...0, 0,  ..., 0, 0, 0],
          [0, 0, 0,  ..., 0, 0, 0]]]], device='cuda:0', dtype=torch.uint8), None, None, 0.0, ...)
kwargs = {}

    def __call__(self, /, *args: _P.args, **kwargs: _P.kwargs) -> _T:
        # overloading __call__ to ensure torch.ops.foo.bar()
        # is still callable from JIT
        # We save the function ptr as the `op` attribute on
        # OpOverloadPacket to access it here.

        # Directly calling OverloadPacket goes into C++, which will check
        # the schema and cause an error for torchbind op when inputs consist of FakeScriptObject so we
        # intercept it here and call TorchBindOpverload instead.
        if self._has_torchbind_op_overload and _must_dispatch_in_python(args, kwargs):
            return _call_overload_packet_from_python(self, *args, **kwargs)
>       return self._op(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
E       RuntimeError: FlashAttention only support fp16 and bf16 data type

/mnt/disk0/wangyi/miniforge3/envs/transformers/lib/python3.11/site-packages/torch/_ops.py:1255: RuntimeError
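For context, FlashAttention only supports fp16/bf16 inputs, so when layer norms run in fp32 (as in test_flash_attn_2_fp32_ln) the query/key/value tensors have to be cast back to the compute dtype right before the kernel call. The snippet below is a minimal, illustrative sketch of that kind of guard; the helper name and the default target dtype are assumptions for illustration, not the actual transformers code.

import torch

def cast_fp32_qkv_for_flash_attn(query, key, value, target_dtype=torch.bfloat16):
    # Illustrative sketch: FlashAttention only supports fp16/bf16. When layer
    # norms run in fp32, q/k/v arrive here as fp32 and must be cast back to
    # the compute dtype before calling the kernel. In practice the target
    # dtype would be inferred from the model weights or, for quantized
    # models, from the quantization config.
    if query.dtype == torch.float32:
        query, key, value = (t.to(target_dtype) for t in (query, key, value))
    return query, key, value

Per the description above, because quantization_config was not set on the text and visual parts, this conversion was skipped and fp32 tensors reached the kernel, producing the RuntimeError shown.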


Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: paddleocr_vl

Collaborator

@ArthurZucker ArthurZucker left a comment


In general we should never have to fix quantization in model-specific code! cc @SunMarc
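A hedged sketch of what a generic (non model-specific) fix could look like: propagating the top-level quantization_config to the text/vision sub-configs in shared code, so dtype handling before flash attention sees it for every composite model. The helper name and attribute layout here are assumptions for illustration, not the actual transformers API.

def propagate_quantization_config(config):
    # Hypothetical helper: copy the top-level quantization_config onto
    # composite sub-configs so that shared dtype handling before flash
    # attention works regardless of which sub-model is running.
    quant = getattr(config, "quantization_config", None)
    for name in ("text_config", "vision_config"):
        sub = getattr(config, name, None)
        if sub is not None and getattr(sub, "quantization_config", None) is None:
            sub.quantization_config = quant
    return config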

@yao-matrix
Contributor

@SunMarc, could you please suggest a way to handle this? Thanks very much.

Member

@SunMarc SunMarc left a comment


Sorry, this regression was most likely caused by #42882. I will fix it.

@SunMarc
Member

SunMarc commented Jan 7, 2026

Please give the PR above a try!

@sywangyi
Contributor Author

sywangyi commented Jan 8, 2026

Please give the PR above a try!

Yes, the PR above fixes it.

@sywangyi sywangyi closed this Jan 8, 2026
