
Add generate kwargs to AutomaticSpeechRecognitionPipeline#20935

Closed
bofenghuang wants to merge 15 commits into huggingface:main from bofenghuang:add-gen-kwargs-asr-pipeline

Conversation

Contributor

@bofenghuang commented Dec 29, 2022

What does this PR do?

Hi @Narsil 👋,

In this PR, I tried to add generate arguments to AutomaticSpeechRecognitionPipeline in order to run the pipeline with seq2seq models using beam search, contrastive search, etc. I followed the style of TextGenerationPipeline.

import torch
from transformers import pipeline

pipe = pipeline(model="openai/whisper-base", device=0, torch_dtype=torch.float16)

pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac", max_new_tokens=5)
# {'text': ' He hoped'}

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 29, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor

@Narsil left a comment


Thanks for the addition.

I proposed several changes:

  • Using kwargs directly is indeed done in the text-generation pipeline, but it actually showed its limits because max_length is used both by generate and by the tokenization part, so I now try to avoid it.

pipeline(..., generate_kwargs={"num_beams": 5}) is not exactly as elegant, but at least it enables ALL parameters without risking clashing later on with another parameter.

Since max_new_tokens is likely to be the most used parameter, I think we can definitely lift it as a first-class parameter: pipeline(..., max_new_tokens=5) works.
That way we don't risk more obscure parameter clashing and we can still enable complex use cases.

I added an error for the odd case where a user would do pipeline(..., max_new_tokens=5, generate_kwargs={"max_new_tokens": 10}).
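The merging and clash check described here could be sketched roughly like this (a standalone sketch with a hypothetical sanitize_generate_parameters helper, not the actual PR diff):

```python
# Illustrative sketch of the behaviour described above: `max_new_tokens` is
# lifted as a first-class parameter, everything else goes through
# `generate_kwargs`, and defining the same key in both raises an error.
def sanitize_generate_parameters(max_new_tokens=None, generate_kwargs=None):
    forward_params = {}
    if generate_kwargs is not None:
        if max_new_tokens is not None and "max_new_tokens" in generate_kwargs:
            raise ValueError(
                "`max_new_tokens` is defined both as an argument and inside"
                " `generate_kwargs`; please use only one of them."
            )
        forward_params.update(generate_kwargs)
    if max_new_tokens is not None:
        forward_params["max_new_tokens"] = max_new_tokens
    return forward_params
```

With this shape, pipeline(..., generate_kwargs={"num_beams": 5}) and pipeline(..., max_new_tokens=5) both work, while passing max_new_tokens through both paths at once fails loudly.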

Wdyt ?

Do you mind adding a small test for this feature and updating the docstring?

Otherwise LGTM

Comment threads on src/transformers/pipelines/automatic_speech_recognition.py (Outdated)
@bofenghuang
Contributor Author

@Narsil it's indeed better this way. Thanks for the explanation!

@bofenghuang
Contributor Author

Hi @Narsil,

Some tests of ctc_with_lm models failed. I think we could

  1. Lift decoder in __init__ as an individual argument
  2. Add **kwargs into _sanitize_parameters

Personally I prefer the 1st one since the other one may introduce some silent errors. What's your opinion?
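The silent-error risk in option 2 can be illustrated with a toy sketch (hypothetical function names, not the real pipeline code): once a signature accepts **kwargs, a misspelled parameter is swallowed instead of raising.

```python
# Sketch of the "silent error" risk of adding **kwargs to _sanitize_parameters:
# a misspelled parameter raises a TypeError with a strict signature, but is
# silently ignored once **kwargs is accepted.
def sanitize_strict(max_new_tokens=None):
    return {"max_new_tokens": max_new_tokens}

def sanitize_permissive(max_new_tokens=None, **kwargs):
    # typos like `max_new_token` land in kwargs and vanish here
    return {"max_new_tokens": max_new_tokens}
```

Calling sanitize_strict(max_new_token=5) raises a TypeError, whereas sanitize_permissive(max_new_token=5) silently returns {"max_new_tokens": None}, which is exactly the kind of silent error mentioned above.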

@Narsil
Contributor

Narsil commented Dec 29, 2022

Personally I prefer the 1st one since the other one may introduce some silent errors. What's your opinion?

In general I would agree with you. Since pipelines accept so many parameters, I would tend to keep it simple, and maybe just change line 183:

- self.decoder = kwargs["decoder"]
+ self.decoder = kwargs.pop("decoder")

This would be just so the signature is kept to a minimum (the docstring should be good) and to avoid accepting decoder as a positional argument instead of a keyword one. (I know we can do that within the signature, but it does complexify the docs, notably this part: https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
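The reason .pop matters here can be shown with a minimal sketch (illustrative class names, not the real pipeline classes): kwargs["decoder"] leaves the key in kwargs, so it would be forwarded again to the parent, while .pop consumes it.

```python
# Minimal sketch: a child class must consume `decoder` from kwargs before
# forwarding the rest to its parent, otherwise the parent receives an
# unexpected `decoder` keyword.
class Parent:
    def __init__(self, **kwargs):
        # the parent is not expected to ever see `decoder`
        assert "decoder" not in kwargs, "unexpected `decoder` forwarded to parent"

class Child(Parent):
    def __init__(self, **kwargs):
        self.decoder = kwargs.pop("decoder")  # .pop consumes the key
        super().__init__(**kwargs)
```

With kwargs["decoder"] instead of kwargs.pop("decoder"), the key would still be present when super().__init__(**kwargs) runs.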

@Narsil
Contributor

Narsil commented Dec 29, 2022

This is the sort of function complexity that I think is more detrimental than helpful, unfortunately: https://huggingface.co/docs/transformers/v4.25.1/en/main_classes/text_generation#transformers.GenerationMixin.generate

@bofenghuang
Contributor Author

In general I would agree with you. Since pipelines accept so many parameters, I would tend to keep it simple, and maybe just change line 183:

- self.decoder = kwargs["decoder"]
+ self.decoder = kwargs.pop("decoder")

The error occurs at line 173, where _sanitize_parameters is called in the parent :(

@Narsil
Contributor

Narsil commented Dec 30, 2022

The error occurs in the line 173 where _sanitize_parameters is called in parent :(

Ah, so it happens before then; let's do it your way then.

does

__init__(self, ....,, *args, *, decoder, **kwargs)

work?
(Try and force disabling the positional argument for decoder?)

@bofenghuang
Contributor Author

does

__init__(self, ....,, *args, *, decoder, **kwargs)

work? (Try and force disabling the positional argument for decoder?)

No, it's a syntax error :(

Can we do this ?

- def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], *args, **kwargs):
+ def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], decoder: Optional[Union["BeamSearchDecoderCTC", str]] = None, *args, **kwargs):

@Narsil
Contributor

Narsil commented Dec 30, 2022

This will make AutomaticSpeechRecognitionPipeline(feature_extractor, model) interpret model as decoder, which will lead to confusing errors.

Can you try :

+ def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], *, decoder: Optional[Union["BeamSearchDecoderCTC", str]] = None, **kwargs):

Maybe ?
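The difference between the two signatures can be sketched with toy functions (illustrative names, not the real __init__): with decoder as a plain positional parameter, a second positional argument is silently bound to it, while the bare * makes that a TypeError.

```python
# Sketch of why the bare `*` helps: decoder as a positional parameter
# silently captures a stray positional argument, whereas the keyword-only
# form rejects it with a TypeError.
def init_positional(feature_extractor, decoder=None, **kwargs):
    return decoder

def init_keyword_only(feature_extractor, *, decoder=None, **kwargs):
    return decoder
```

So init_positional("fe", model) returns model as the decoder (the confusing case described above), while init_keyword_only("fe", model) raises a TypeError and forces init_keyword_only("fe", decoder=...).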

@bofenghuang
Contributor Author

Can you try :

+ def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], *, decoder: Optional[Union["BeamSearchDecoderCTC", str]] = None, **kwargs):

Maybe ?

No, we need *args for line 173.

@Narsil
Contributor

Narsil commented Dec 30, 2022

Can you try :

+ def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], *, decoder: Optional[Union["BeamSearchDecoderCTC", str]] = None, **kwargs):

Maybe ?

No, we need *args for line 173.

Remove it there too.

@bofenghuang
Contributor Author

@Narsil Oops, the commit history seems to be messed up. Let me create a new one!

@bofenghuang
Contributor Author

Closed, as the other one (#20952) is cleaner.

@bofenghuang bofenghuang deleted the add-gen-kwargs-asr-pipeline branch January 6, 2023 10:39

5 participants