
Conversation

@lizexu123
Collaborator

Fix the inference logic to use num_running_requests instead of max_num_seqs; sizing the batch by the number of actually running requests brought clear gains on smaller models.

* support real bsz

* fix

* fix xpu_model_runner.py, gpu_model_runner.py, gcu_model_runner.py, iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time -> self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
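The change above can be illustrated with a minimal sketch. The buffer name `seq_lens_this_time_buffer` comes from the commit log; the `ModelRunnerSketch` class and `prepare_inputs` method are hypothetical stand-ins for the real model runners, showing the idea of slicing a preallocated per-request buffer down to the real batch size (`num_running_requests`) instead of always passing the full `max_num_seqs` capacity:

```python
import numpy as np


class ModelRunnerSketch:
    """Hypothetical stand-in for gpu/xpu/gcu/iluvatar model runners."""

    def __init__(self, max_num_seqs: int = 8):
        # Buffer preallocated for the worst case; only a prefix of it
        # corresponds to requests that are actually running.
        self.seq_lens_this_time_buffer = np.zeros(max_num_seqs, dtype=np.int32)

    def prepare_inputs(self, seq_lens: list) -> np.ndarray:
        num_running_requests = len(seq_lens)
        self.seq_lens_this_time_buffer[:num_running_requests] = seq_lens
        # Old behavior: hand the model the full max_num_seqs-sized buffer.
        # New behavior: hand it only the live prefix, i.e. the real batch size.
        return self.seq_lens_this_time_buffer[:num_running_requests]


runner = ModelRunnerSketch(max_num_seqs=8)
batch = runner.prepare_inputs([5, 3, 7])
print(batch.shape)  # (3,) rather than (8,)
```

On small models the padded tail of the batch can dominate the useful work, so avoiding computation over the unused `max_num_seqs - num_running_requests` slots is where the reported gains would come from.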

paddle-bot bot commented Aug 5, 2025

Thanks for your contribution!

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit bc0b92b into PaddlePaddle:release/2.1 Aug 6, 2025
11 of 14 checks passed
iosmers added a commit that referenced this pull request Aug 8, 2025
