Adding support for fp16 for asr pipeline. #20864
Conversation
The documentation is not available anymore as the PR was closed or merged.
    inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
)
if dtype is not None:
    processed = {k: v.to(dtype=dtype) for k, v in processed.items()}
Hi @Narsil,
I think this works fine for whisper models because they only have a single value, input_features.
But in the case of other models like wav2vec2, the model has multiple inputs with different dtypes: input_values, which needs to be cast from float32 to float16, and attention_mask, which I'm not sure whether to keep as int32 or cast to int16.
Yes. And as above, if you directly use the to method on processed, it will take care of that for you.
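For illustration, here is a minimal sketch (not part of this PR; the checkpoint name and dummy waveform are placeholders) of why calling `to` on the returned `BatchFeature` handles the wav2vec2 case: only floating-point tensors are cast, so the attention mask keeps its integer dtype.

```python
import torch
from transformers import AutoFeatureExtractor

# Placeholder checkpoint and dummy one-second waveform, purely for illustration.
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
processed = feature_extractor(
    [0.0] * 16000,
    sampling_rate=16000,
    return_tensors="pt",
    return_attention_mask=True,
)

# BatchFeature.to (added in #20536) only casts floating-point tensors, so
# input_values becomes float16 while attention_mask keeps its integer dtype.
processed = processed.to(dtype=torch.float16)
print(processed["input_values"].dtype)    # torch.float16
print(processed["attention_mask"].dtype)  # unchanged integer dtype
```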
sgugger left a comment
Thanks for working on this! My only comment is to make sure to leverage the to method on BatchFeature (if the feature extractor here returns another type, maybe make sure its to method handles dtype arguments) so that checks like not converting int inputs are applied for free.
Otherwise LGTM!
chunk = inputs[i : i + chunk_len]
processed = feature_extractor(chunk, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt")
if dtype is not None:
    processed = {k: v.to(dtype=dtype) for k, v in processed.items()}
I believe you can call the to directly on processed, which is a BatchFeature and handles dtype in its to method thanks to #20536 (was designed for vision but I think it will apply here too).
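To make the suggestion concrete, here is a minimal sketch (the helper name `preprocess_chunk` is hypothetical, not code from this PR) of what the branch above could collapse to once the `BatchFeature`'s own `to` method is used:

```python
def preprocess_chunk(feature_extractor, chunk, dtype=None):
    # The feature extractor returns a BatchFeature, so the manual per-tensor cast
    # can be replaced by a single `.to(dtype=...)`, which skips non-floating tensors.
    processed = feature_extractor(
        chunk, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt"
    )
    if dtype is not None:
        processed = processed.to(dtype=dtype)
    return processed
```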
- def preprocess(self, inputs, chunk_length_s=0, stride_length_s=None, ignore_warning=False):
+ def preprocess(self, inputs, chunk_length_s=0, stride_length_s=None, ignore_warning=False, dtype=None):
+     print(f"Running with dtype {dtype}")
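For context, here is a rough sketch (not necessarily this PR's exact code) of the usual pipeline pattern for routing a user-facing kwarg such as `torch_dtype` down to the new `dtype` parameter of `preprocess`: `_sanitize_parameters` splits incoming kwargs into preprocess/forward/postprocess dictionaries.

```python
def _sanitize_parameters(self, torch_dtype=None, **kwargs):
    # Route the user-facing `torch_dtype` kwarg to the `dtype` argument of preprocess;
    # forward and postprocess receive nothing extra in this sketch.
    preprocess_params = {}
    if torch_dtype is not None:
        preprocess_params["dtype"] = torch_dtype
    return preprocess_params, {}, {}
```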
* Supporting `fp16` for asr pipeline
* Adding test.
* Style.
* Oops.
* Flake8 update ?
* Fixing flake8 ?
* Revert "Flake8 update ?" This reverts commit 0b917fc.
* Style (accidentally deleted flake8 F401.)
* Move to a bigger test (no small whisper model, and s2t doesn't seem to accept torch_dtype=fp16). Also we need to use a GPU to actually compute on fp16.
* Using BatchFeature capability.
What does this PR do?
Fixes #20862
Many things were considered before settling on this design.
* `feature_extractor(return_tensors="pt", torch_dtype=torch_dtype)`. This would have the advantage of being consistent, but not all feature extractors define this, so it would affect all of them. Then why would we use `torch_dtype` instead of the more commonplace `dtype`, which could apply to TF and Flax as well? It also feels a bit redundant to specify both `return_tensors` and `torch_dtype`; they would be good candidates to fuse into a single parameter (but that is outside the scope of this PR).
* `AutoFeatureExtractor.from_pretrained(..., torch_dtype=torch_dtype)`. This would have the advantage of being global, so users don't need to respecify it on each call. However, we can't specify `return_tensors="pt"` there either, so for consistency I didn't try to put it there.
* `ffmpeg_read(..., dtype=dtype)`. This would be nice to load the waveform directly into fp16 and just let fp16 flow through the feature extractor. However, whisper in particular uses a mel spectrogram, so feeding it fp16 audio might actually damage quality.

In the end, this solution is the simplest I could come up with: let `torch_dtype` flow to the pipeline, use it as a regular parameter, and convert the output of the feature extractor afterwards. This does incur a potential extra copy, but there is no risk of damaging the quality of the input.
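A minimal usage sketch of the end result, assuming a Whisper checkpoint (the model name and audio path below are placeholders): `torch_dtype` is passed to the pipeline and the feature extractor output is cast to it before being fed to the model; per the commit notes above, a GPU is needed to actually compute in fp16.

```python
import torch
from transformers import pipeline

asr = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-base",  # placeholder checkpoint
    torch_dtype=torch.float16,
    device=0,  # fp16 compute effectively requires a GPU
)
print(asr("sample.flac"))  # placeholder audio file path
```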
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.