Skip to content

refactor: improved the cli server module code organization#44875

Draft
tarekziade wants to merge 4 commits intomainfrom
tarekziade-cli-refacto
Draft

refactor: improved the cli server module code organization#44875
tarekziade wants to merge 4 commits intomainfrom
tarekziade-cli-refacto

Conversation

@tarekziade
Copy link
Copy Markdown
Collaborator

@tarekziade tarekziade commented Mar 20, 2026

What does this PR do?

Refactors src/transformers/cli/serve.py to reduce nesting depth, eliminate code duplication, and improve maintainability. No behavioral changes and the public API is unchanged.

Also added a module docstring to explain how the fastapi server works.

This change is motivated by discussion in #44566 where type checking was made a bit complex due to the current code architecture.

…e CB context manager

Refactors `src/transformers/cli/serve.py` to reduce nesting depth, eliminate code duplication,
and improve maintainability. No behavioral changes and the public API is unchanged.

This change is motivated by discussion in #44566
where type checking was made a bit complex due to the current code architecture.
@tarekziade tarekziade changed the title refactor: in cli/server extract helpers, flatten nested functions, use CB context manager refactor: extract helpers, flatten nested functions, use CB context manager Mar 20, 2026
@tarekziade tarekziade changed the title refactor: extract helpers, flatten nested functions, use CB context manager refactor: improved the cli server module code organization Mar 20, 2026
@tarekziade tarekziade self-assigned this Mar 20, 2026
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tarekziade
Copy link
Copy Markdown
Collaborator Author

I used https://gist.github.com/tarekziade/f2ef24d1986ed139928e59965b238e69 to smoke test it.

I don't think it's useful to add it in our code base since test coverage is already good.
@remi-or what scenario do you recommend here to smoke test continuous batching?

@remi-or
Copy link
Copy Markdown
Collaborator

remi-or commented Mar 20, 2026

@tarekziade if by "smoke test" (not familiar w/ the term sorry) you mean test the most important features, thena test with flash-attention, cuda graphs and asynchronous batching, since those are the flags that lead to a good throughput.

@tarekziade
Copy link
Copy Markdown
Collaborator Author

@tarekziade if by "smoke test" (not familiar w/ the term sorry) you mean test the most important features, thena test with flash-attention, cuda graphs and asynchronous batching, since those are the flags that lead to a good throughput.

By “smoke test” I mean a very simple curl-like client that exercises the server end-to-end, focusing on happy path. Basically simulating a real user and checking that “no smoke comes out.”

It’s meant to complement more targeted tests rather than replace them. Right now the script just runs a few basic operations (using SmoL135), but I can extend it with a few more scenarios to cover CB paths as well.

@tarekziade
Copy link
Copy Markdown
Collaborator Author

@SunMarc is refactoring already this module #44796
We will put this patch on hold until it's done and then resume if needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants