chore(typing): extend typing to src/transformers/cli #44566

tarekziade wants to merge 3 commits into main.

Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

[For maintainers] Suggested jobs to run (before merge): run-slow: fbgemm_fp8, finegrained_fp8, gptq, higgs, hqq, metal, mxfp4, sinq
force-pushed from 524e015 to 5f7993f
vasqu left a comment:
Some initial comments from my side: I think we need a better workaround, or else we will encounter these issues across the whole codebase again. Just my intuition/impression; it may well not be that bad.
```diff
 elif pt_hpu_available and hasattr(torch, "hpu"):
     info["Using HPU in script?"] = "<fill in>"
     info["HPU type"] = torch.hpu.get_device_name()
-elif pt_npu_available:
+elif pt_npu_available and hasattr(torch, "npu"):
```
Would we not need this for all devices? Do CUDA and XPU not need it?
Like for safetensors, it depends on how torch has declared its types and also on whether the API exists in all supported versions. On our side, the safest bet is to assume it's not there and always check for it.
Here, this was an automated change driven by failures, but we should do this for every `torch.something` access.
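A minimal sketch of the defensive pattern being discussed: never assume an optional torch backend namespace (`torch.hpu`, `torch.npu`, ...) exists, and guard every access with `hasattr`. The `fake_torch` stand-in object below is purely illustrative so the sketch runs without torch installed; it is not the real torch API.

```python
from types import SimpleNamespace

# Stand-in for the torch module: only a `cuda` namespace is declared,
# mimicking a build where optional backends (hpu, npu, ...) are absent.
fake_torch = SimpleNamespace(cuda=SimpleNamespace(is_available=lambda: False))

info = {}
if hasattr(fake_torch, "hpu"):
    # Only reached when the backend namespace actually exists.
    info["HPU type"] = fake_torch.hpu.get_device_name()
else:
    # The guard falls through instead of raising AttributeError.
    info["HPU type"] = "not available"

print(info["HPU type"])  # -> not available
```

The same guard would apply uniformly to every optional `torch.something` namespace, which is the "do this for all devices" point raised above.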
```python
cb_manager = self.running_continuous_batching_manager
if cb_manager is None:
    raise RuntimeError("Continuous batching manager failed to initialize")
```
Suggested change:

```diff
-cb_manager = self.running_continuous_batching_manager
-if cb_manager is None:
-    raise RuntimeError("Continuous batching manager failed to initialize")
+if self.running_continuous_batching_manager is None:
+    raise RuntimeError("Continuous batching manager failed to initialize")
```
I guess this is needed, but just double-checking: do we need the local-variable conversion?
The root issue is that this function contains three sub-functions: the variable is narrowed to `ContinuousBatchingManager | None`, and nested closures can't benefit from the narrowing guard.
The real problem is how large and complex that function is; the real fix is to refactor it, but I was trying to minimize the diff. Happy to do it, though, if you think that's in scope for this patch.
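A small self-contained sketch of the narrowing issue described above (class and function names are illustrative, not the actual transformers code): a `None` guard narrows the type in the outer scope, but type checkers do not carry that narrowing into nested closures, since the variable could be reassigned before the closure runs. Binding to a fresh local captures the narrowed type.

```python
from typing import Optional


class Manager:
    def step(self) -> str:
        return "ok"


def serve(manager: Optional[Manager]) -> str:
    if manager is None:
        raise RuntimeError("manager failed to initialize")
    # A fresh local binding: its inferred type is `Manager`, and closures
    # can rely on it.
    cb_manager = manager

    def handler() -> str:
        # Referencing `manager` directly here would still be
        # `Manager | None` for the type checker; `cb_manager` is not.
        return cb_manager.step()

    return handler()


print(serve(Manager()))  # -> ok
```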
Gotcha, no worries, I think we can keep it as is for now. But cc'ing @remi-or for CB viz.
force-pushed from 7bbc1ae to b1104ee
vasqu left a comment:
Looks already better to me 🤗, just a few smaller comments. IMO, the biggest point remains how we handle the batch encoding; this is very core, and I think we should take our time here.
```python
inputs = processor.apply_chat_template(
    req["messages"], return_tensors="pt", add_generation_prompt=True, return_dict=True
).to(model.device)["input_ids"][0]
chat_inputs = require_batch_encoding(
```
Opening this re #44566 (comment) because I'm lazy and want everything collected in the same review :D
How much of the 3.10 lifecycle is left 🤔 I think it might be worth it. The usage in other places of the codebase should be similar, so I feel we'll hit this sooner or later.
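For context, a hedged sketch of what a `require_batch_encoding`-style helper could look like: a small runtime check that raises on unexpected types and, as a side effect, narrows the return type for the type checker. The `BatchEncoding` class below is a stand-in dict subclass; the real one lives in `transformers`, and this is not necessarily how the PR implements it.

```python
from typing import Any


class BatchEncoding(dict):
    """Stand-in for transformers' BatchEncoding, for illustration only."""


def require_batch_encoding(value: Any) -> BatchEncoding:
    # Runtime guard that doubles as a typing aid: after this call the
    # checker knows the result is a BatchEncoding, not a union type.
    if not isinstance(value, BatchEncoding):
        raise TypeError(f"expected BatchEncoding, got {type(value).__name__}")
    return value


enc = require_batch_encoding(BatchEncoding(input_ids=[[1, 2, 3]]))
print(enc["input_ids"])  # -> [[1, 2, 3]]
```

The design trade-off is the one debated in the thread: a helper like this adds a call at every use site, but it keeps the checked invariant explicit instead of sprinkling casts.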
```python
    "Using `fp_quant` with real quantization requires a **Blackwell GPU** and qutlass: `git clone https://github.com/IST-DASLab/qutlass.git && cd qutlass && pip install --no-build-isolation .`. You can use `FPQuantConfig(pseudoquantization=True, ...)` to use Triton-based pseudo-quantization. It doesn't provide any speedups but emulates the quantization behavior of the real quantization."
)


if (
```
ewww, thanks, looks like a bad rebase
…e CB context manager

Refactors `src/transformers/cli/serve.py` to reduce nesting depth, eliminate code duplication, and improve maintainability. No behavioral changes; the public API is unchanged. This change is motivated by discussion in #44566, where type checking was made more complex by the current code architecture.
Argh, it changed a lot on main 😭 Do we want to close/draft this one for now, @tarekziade?
Add type declarations for mixin host-class attributes on `GenerationMixin`, class-level annotations for dynamically set attributes on `GenerationConfig`, and fix minor typing issues in `candidate_generator`, `watermarking`, and `stopping_criteria`. Create a `_typing.py` Protocol for documentation/reuse.
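A minimal sketch of the kind of `Protocol` a `_typing.py` module could hold to document the attributes a mixin expects from its host class. All names here (`GenerationHost`, `main_input_name`, `DummyModel`) are illustrative assumptions, not the actual transformers definitions.

```python
from typing import Protocol


class GenerationHost(Protocol):
    """Attributes and methods a generation mixin assumes its host provides."""

    main_input_name: str

    def can_generate(self) -> bool: ...


class DummyModel:
    # Structurally satisfies GenerationHost without inheriting from it.
    main_input_name = "input_ids"

    def can_generate(self) -> bool:
        return True


def describe(host: GenerationHost) -> str:
    # Any structurally-conforming class type-checks here.
    return f"{host.main_input_name}:{host.can_generate()}"


print(describe(DummyModel()))  # -> input_ids:True
```

A Protocol like this documents the mixin/host contract in one place, so a type checker can flag a host class that forgets a dynamically set attribute.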
force-pushed from 0ea408b to 0adcb07
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44566&sha=7606ab

Refactoring too complex; will cherry-pick in a new PR.
This patch extends type checking to `src/transformers/cli`. Based on #44412.