Support EB5 mini block-wise FP8 inference. #3
Open
K11OntheBoat wants to merge 130 commits into zoooo0820:deepgemm_sm100 from
Co-authored-by: ddchenhao66 <dhaochen163.com>
* llguidance * add requirements_guided_decoding.txt and doc * fix test_guidance_*.py * fix test_guidance_*.py && mv * fix llguidance choice * test_guidance_* * rm lazy loader --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* commit * commit * commit * commit * commit * commit
…iton and safetensors, del fastsafetensors (PaddlePaddle#5330) * Remove version constraints for setuptools, triton, and fastsafetensors. * remove version for uvicorn * fix according to review
* fix * fix * fix test * fix gpu_model_runner --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…fix moe all_reduce issue (PaddlePaddle#5357)
…le#5138) * [Models] Add forward_meta to moe models' forward function * fix missing param * fix * fix * fix forward_meta * fix test and remove chunked MoE related in config * fix test * fix * fix
…addlePaddle#5197) * [fix] support DP via v1 router and decouple DP and EP * [fix] fix scripts * [fix] reset model path * [fix] dp use get_output_ep, fix router port type, update scripts * [merge] merge with latest code * [chore] remove some debug log * [fix] fix code style check * [fix] fix test_multi_api_server for log_dir name * [chore] reduce logs * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…le#5368) * [doc] update FAQ with logprobs MQ limits and deprecation * [doc] update FAQ with logprobs MQ limits and deprecation * update faq
…ePaddle#5367) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Feature] Support cache kv cache for output tokens * fix bug * fix ci bug * improve coverage * enable output caching by default * fix ci --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [BugFix] fix data_processor async bug * fix bug
* Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function * Update paddlepaddle-xpu installation command Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error. * Update PaddlePaddle installation command
…toring. (PaddlePaddle#5518) * support spec metrics monitor per request * fix bug * remove debug log * fix ut bugs
…cessor/image_preprocessor/image_preprocessor_adaptive.py unit test additions (PaddlePaddle#5265) * test: add unit tests for image_preprocessor_adaptive.py (NO.25) * refactor: merge redundant test functions in test_image_preprocessor_adaptive.py * fix: fix codestyle issues - remove extra blank lines * update * update --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
…n different prefill (PaddlePaddle#5514) * Distinguish the pipelines for sending kv signal in different prefill * up
* [XPU] add speculate_step_system_cache * [XPU] add speculate_step_system_cache * [XPU] add speculate_get_logits * delete context * add ptr check --------- Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…PaddlePaddle#5506) * bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize * bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize * comments * [bugfix] PREEMPTED task blocksize * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add mtp case * Refactor test_mtp.py for clarity and efficiency Removed duplicate import of json and simplified spec_config formatting. --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
…cessor/ernie4_5_vl_processor.py unit test additions (PaddlePaddle#5263) * test: improve ernie4_5_vl_processor.py test coverage * update * improve coverage * update * fix: correct test expectation for thinking_mode false in test_ernie_vl_processor * remove test_process_request_dict_comprehensive test case --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: xiaozude <xiaozude@outlook.com>
7011be9 to 8c29bb1
276487f to c1c7658
No description provided.