refactor: improved the cli server module code organization#44875
refactor: improved the cli server module code organization#44875tarekziade wants to merge 4 commits intomainfrom
Conversation
…e CB context manager Refactors `src/transformers/cli/serve.py` to reduce nesting depth, eliminate code duplication, and improve maintainability. No behavioral changes and the public API is unchanged. This change is motivated by discussion in #44566 where type checking was made a bit complex due to the current code architecture.
…rd to follow and complex
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
I used https://gist.github.com/tarekziade/f2ef24d1986ed139928e59965b238e69 to smoke test it. I don't think it's useful to add it in our code base since test coverage is already good. |
|
@tarekziade if by "smoke test" (not familiar w/ the term sorry) you mean test the most important features, thena test with flash-attention, cuda graphs and asynchronous batching, since those are the flags that lead to a good throughput. |
By “smoke test” I mean a very simple curl-like client that exercises the server end-to-end, focusing on happy path. Basically simulating a real user and checking that “no smoke comes out.” It’s meant to complement more targeted tests rather than replace them. Right now the script just runs a few basic operations (using SmoL135), but I can extend it with a few more scenarios to cover CB paths as well. |
What does this PR do?
Refactors
src/transformers/cli/serve.pyto reduce nesting depth, eliminate code duplication, and improve maintainability. No behavioral changes and the public API is unchanged.Also added a module docstring to explain how the fastapi server works.
This change is motivated by discussion in #44566 where type checking was made a bit complex due to the current code architecture.