[Feature] add envs FD_USE_PHI_FP8_QUANT #6319
base: release/2.4
Thanks for your contribution!
Codecov Report ❌
Coverage diff for release/2.4, #6319:
  Coverage   56.22%
  Files      331
  Lines      41291
  Branches   6292
  Hits       23217
  Misses     16246
  Partials   1828
fastdeploy/envs.py
Outdated
# Timeout for worker process health check in seconds
"FD_WORKER_ALIVE_TIMEOUT": lambda: int(os.getenv("FD_WORKER_ALIVE_TIMEOUT", "30")),
# Whether to use phi FP8 quantization,if 1,use paddle default.
"FD_USE_PHI_FP8_QUANT": lambda: int(os.getenv("FD_USE_PHI_FP8_QUANT", "1")),
Suggested change:
- "FD_USE_PHI_FP8_QUANT": lambda: int(os.getenv("FD_USE_PHI_FP8_QUANT", "1")),
+ "FD_USE_PHI_FP8_QUANT": lambda: bool(int(os.getenv("FD_USE_PHI_FP8_QUANT", "1"))),
update
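The suggestion above wraps the parsed integer in `bool(...)`. A minimal sketch of why the intermediate `int(...)` matters follows. It assumes a registry of zero-argument callables like the one shown in the diff; the dictionary name `environment_variables` and the helper `get_env` are hypothetical and used only for illustration.

```python
import os

# Hypothetical stand-in for the registry in fastdeploy/envs.py: each entry is a
# zero-argument callable, so the environment is read at access time.
environment_variables = {
    "FD_USE_PHI_FP8_QUANT": lambda: bool(int(os.getenv("FD_USE_PHI_FP8_QUANT", "1"))),
}


def get_env(name):
    """Evaluate the registered callable for `name` (illustrative helper)."""
    return environment_variables[name]()


# bool("0") would be True because any non-empty string is truthy, so the value
# is converted to int first and only then to bool.
os.environ["FD_USE_PHI_FP8_QUANT"] = "0"
flag_when_zero = get_env("FD_USE_PHI_FP8_QUANT")

# With the variable unset, the default string "1" applies.
del os.environ["FD_USE_PHI_FP8_QUANT"]
flag_when_unset = get_env("FD_USE_PHI_FP8_QUANT")
```

In this sketch `flag_when_zero` is `False` and `flag_when_unset` is `True`. A non-numeric value such as `"true"` would raise `ValueError` from `int(...)`.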
    x, using_pow2_scale=False, output_scale_transpose=True
)
x_scale_tensor = x_scale_tensor.T
if fastdeploy.envs.FD_USE_PHI_FP8_QUANT == 0:
Suggested change:
- if fastdeploy.envs.FD_USE_PHI_FP8_QUANT == 0:
+ if not fastdeploy.envs.FD_USE_PHI_FP8_QUANT:
update
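For illustration, the truthiness check suggested above can be sketched as a small selector. This assumes, based on the PR description, that a value of 1 selects Paddle's operator and 0 selects FastDeploy's. The function name and the returned labels are hypothetical and do not correspond to real operator names.

```python
def select_fp8_quant_backend(use_phi_fp8_quant):
    """Return a label for the quantization path implied by the flag.

    Hypothetical helper: the labels stand in for the real operators.
    """
    # `not flag` behaves the same for int (0/1) and bool (False/True) values,
    # so the check stays valid whichever type the env registry returns.
    if not use_phi_fp8_quant:
        return "fastdeploy"
    return "paddle"


backend_default = select_fp8_quant_backend(True)
backend_disabled = select_fp8_quant_backend(0)
```

Because `False == 0` holds in Python, the original `== 0` comparison would also keep working after the switch to `bool`; the `not` form simply states the intent more directly.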
/re-run run_ce_cases
Motivation
Modifications
FD_USE_PHI_FP8_QUANT
Controls whether FastDeploy's (FD) quantization operator is used. The default is 1, which uses Paddle's quantization operator.
Usage or Command
Accuracy Tests
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a PR to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.