
Conversation

@lizexu123
Collaborator

Fix the inference logic to use num_running_requests instead of max_num_seqs; sizing the batch by the number of actually running requests brought clear gains on smaller models.

* support real bsz

* fix

* fix xpu_model_runner.py, gpu_model_runner.py, gcu_model_runner.py, iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time -> self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
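The change above can be illustrated with a minimal sketch. The buffer name `seq_lens_this_time_buffer` comes from the commit log; the `ModelRunnerSketch` class and `prepare_inputs` method are hypothetical stand-ins for the real model runners, showing the idea of slicing a preallocated per-request buffer down to the real batch size (`num_running_requests`) instead of always passing the full `max_num_seqs` capacity:

```python
import numpy as np


class ModelRunnerSketch:
    """Hypothetical stand-in for gpu/xpu/gcu/iluvatar model runners."""

    def __init__(self, max_num_seqs: int = 8):
        # Buffer preallocated for the worst case; only a prefix of it
        # corresponds to requests that are actually running.
        self.seq_lens_this_time_buffer = np.zeros(max_num_seqs, dtype=np.int32)

    def prepare_inputs(self, seq_lens: list) -> np.ndarray:
        num_running_requests = len(seq_lens)
        self.seq_lens_this_time_buffer[:num_running_requests] = seq_lens
        # Old behavior: hand the model the full max_num_seqs-sized buffer.
        # New behavior: hand it only the live prefix, i.e. the real batch size.
        return self.seq_lens_this_time_buffer[:num_running_requests]


runner = ModelRunnerSketch(max_num_seqs=8)
batch = runner.prepare_inputs([5, 3, 7])
print(batch.shape)  # (3,) rather than (8,)
```

On small models the padded tail of the batch can dominate the useful work, so avoiding computation over the unused `max_num_seqs - num_running_requests` slots is where the reported gains would come from.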

paddle-bot bot commented Aug 5, 2025

Thanks for your contribution!

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit bc0b92b into PaddlePaddle:release/2.1 Aug 6, 2025
11 of 14 checks passed
iosmers added a commit that referenced this pull request Aug 8, 2025
