
chore(typing): extend typing to src/transformers/cli #44566

Closed
tarekziade wants to merge 3 commits into main from tarekziade-typing-cli

Conversation

@tarekziade (Collaborator) commented Mar 10, 2026

This patch extends the `ty check` run to `src/transformers/cli`.

Based on #44412

tarekziade self-assigned this on Mar 10, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: fbgemm_fp8, finegrained_fp8, gptq, higgs, hqq, metal, mxfp4, sinq

tarekziade force-pushed the tarekziade-typing-cli branch from 524e015 to 5f7993f on March 16, 2026 at 09:13
tarekziade requested a review from @vasqu on March 17, 2026 at 08:42
@vasqu (Contributor) left a comment


Some initial comments from my side: I think we need a better workaround, or else we will encounter these issues across the whole codebase again. Just my intuition/impression; it may well not be that bad.

Comment thread src/transformers/cli/system.py
Comment on lines +117 to +120

```diff
 elif pt_hpu_available and hasattr(torch, "hpu"):
     info["Using HPU in script?"] = "<fill in>"
     info["HPU type"] = torch.hpu.get_device_name()
-elif pt_npu_available:
+elif pt_npu_available and hasattr(torch, "npu"):
```
Contributor

Would we not need this for all devices? CUDA and XPU do not need it?

Collaborator (Author)

Like for safetensors, it depends on how torch has declared its types, and also on whether the API exists in all supported versions. On our side, the safest bet is to assume it's not there and always check for it.

Here, this was an automated change triggered by failures, but we should do this for all `torch.something` accesses.
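
A minimal sketch of that guard pattern, assuming optional torch backends should be treated as absent unless proven otherwise; the helper name and the generic `get_device_name` lookup are illustrative, not code from this PR:

```python
import torch

def optional_device_name(backend: str) -> str | None:
    """Return the device name for an optional torch backend (e.g. hpu or
    npu), or None when this torch build doesn't expose the module or API."""
    module = getattr(torch, backend, None)  # never access torch.<backend> directly
    if module is None or not hasattr(module, "get_device_name"):
        return None
    return module.get_device_name()
```

On a CPU-only torch build this returns None for every backend, which is exactly the "assume it's not there" behavior described above.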

Three outdated comment threads on src/transformers/cli/serve.py (collapsed)
Comment on lines +886 to +888

```python
cb_manager = self.running_continuous_batching_manager
if cb_manager is None:
    raise RuntimeError("Continuous batching manager failed to initialize")
```
Contributor

Suggested change:

```diff
-cb_manager = self.running_continuous_batching_manager
-if cb_manager is None:
-    raise RuntimeError("Continuous batching manager failed to initialize")
+if self.running_continuous_batching_manager is None:
+    raise RuntimeError("Continuous batching manager failed to initialize")
```

I guess this is needed, but just double-checking: do we need the local var conversion?

Collaborator (Author)

The root issue is that this function contains three nested sub-functions. The attribute is narrowed as `ContinuousBatchingManager | None`, and the nested closures can't benefit from the narrowing guard, so it has to be bound to a local variable first.

The real problem is how large and complex that function is; the real fix is to refactor it, but I was trying to minimize the diff. Happy to do it, though, if you think that's in scope for this patch.
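
For illustration, a minimal standalone sketch of the narrowing problem (the names are stand-ins, not the actual serve.py code):

```python
from typing import Optional

class Server:
    manager: Optional[str] = None  # stand-in for ContinuousBatchingManager | None

    def handle(self) -> None:
        if self.manager is None:
            raise RuntimeError("manager failed to initialize")

        def stream_bad() -> str:
            # Checker error: narrowing from the outer `if` does not reach a
            # nested closure, since the attribute could be reassigned before
            # the closure runs; `self.manager` is still `Optional[str]` here.
            return self.manager.upper()

        manager = self.manager  # assigned after the guard, so inferred as `str`

        def stream_ok() -> str:
            # Locals keep the type inferred at assignment, so this passes.
            return manager.upper()
```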

Contributor

Gotcha, no worries, I think we can keep it as is for now. But cc'ing @remi-or for CB viz.

Three outdated comment threads on src/transformers/cli/serve.py (collapsed)
tarekziade force-pushed the tarekziade-typing-cli branch from 7bbc1ae to b1104ee on March 18, 2026 at 06:44
@vasqu (Contributor) left a comment

Looks already better to me 🤗, just a few smaller comments. IMO, the biggest point remains how we handle the batch encoding: this is very core, and I think we should take our time here.


```python
inputs = processor.apply_chat_template(
    req["messages"], return_tensors="pt", add_generation_prompt=True, return_dict=True
).to(model.device)["input_ids"][0]
chat_inputs = require_batch_encoding(
```
Contributor

Opening this here, re #44566 (comment), because I'm lazy and want everything collected in the same review :D

How much lifecycle is left for 3.10 🤔 I think it might be worth it. The usage in other places of the codebase should be similar, so I feel like we'll hit this sooner or later.
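
`require_batch_encoding` isn't shown in this thread; a plausible sketch of such a narrowing helper, assuming the issue is that `apply_chat_template` is typed to return a union while the caller needs a concrete `BatchEncoding`:

```python
from transformers import BatchEncoding

def require_batch_encoding(value: object) -> BatchEncoding:
    """Runtime assertion that narrows a union-typed value for the type
    checker, raising instead of silently passing a wrong type along."""
    if not isinstance(value, BatchEncoding):
        raise TypeError(f"expected BatchEncoding, got {type(value).__name__}")
    return value
```

Usage would then look like `chat_inputs = require_batch_encoding(processor.apply_chat_template(...))`, as in the diff above.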

"Using `fp_quant` with real quantization requires a **Blackwell GPU** and qutlass: `git clone https://github.com/IST-DASLab/qutlass.git && cd qutlass && pip install --no-build-isolation .`. You can use `FPQuantConfig(pseudoquantization=True, ...)` to use Triton-based pseudo-quantization. It doesn't provide any speedups but emulates the quantization behavior of the real quantization."
)

if (
Contributor

Looks unrelated 👀

Collaborator (Author)

ewww, thanks, looks like a bad rebase

tarekziade added a commit that referenced this pull request on Mar 20, 2026

…e CB context manager

Refactors `src/transformers/cli/serve.py` to reduce nesting depth, eliminate code duplication, and improve maintainability. No behavioral changes; the public API is unchanged.

This change is motivated by the discussion in #44566, where type checking was made a bit complex by the current code architecture.
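
The truncated commit title mentions a CB context manager; a hedged sketch of what such a lifecycle wrapper could look like (the factory and the start/stop methods are assumptions, not the actual serve.py implementation):

```python
from contextlib import contextmanager

@contextmanager
def continuous_batching(manager_factory):
    """Guarantee the continuous batching manager is stopped on exit,
    even if request handling raises."""
    manager = manager_factory()  # hypothetical factory returning a manager
    manager.start()              # assumed lifecycle methods
    try:
        yield manager
    finally:
        manager.stop()
```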
@vasqu (Contributor) commented Mar 27, 2026

Argh, it changed a lot on main 😭 Do we want to close/draft this one for now, @tarekziade?

@tarekziade (Collaborator, Author)

@vasqu I can rebase, no worries. I am waiting on @SunMarc's refactoring of the server CLI in #44796 before resuming this one.

Add type declarations for mixin host-class attributes on GenerationMixin,
class-level annotations for dynamically-set attributes on GenerationConfig,
and fix minor typing issues in candidate_generator, watermarking, and
stopping_criteria. Create _typing.py Protocol for documentation/reuse.
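
A minimal sketch of the two patterns this commit message describes (the names are illustrative, not the actual `_typing.py` contents): a Protocol documenting the attributes a mixin expects its host class to provide, and class-level annotations for attributes that are only set dynamically:

```python
from typing import Optional, Protocol

class GenerationHostProtocol(Protocol):
    """Attributes a generation mixin assumes exist on its host class."""
    main_input_name: str

class ExampleConfig:
    # Annotated at class level so the checker knows the attribute exists,
    # even though it is only assigned dynamically in __init__.
    cache_implementation: Optional[str]

    def __init__(self, **kwargs):
        self.cache_implementation = kwargs.pop("cache_implementation", None)
```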
tarekziade force-pushed the tarekziade-typing-cli branch from 0ea408b to 0adcb07 on April 1, 2026 at 15:40
@github-actions (bot) commented Apr 1, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44566&sha=7606ab

@tarekziade (Collaborator, Author)

Refactoring too complex; will cherry-pick in a new PR.
