Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode #18351
sgugger merged 4 commits into huggingface:main
Conversation
The documentation is not available anymore as the PR was closed or merged.
@anton-l, are you looking into this? I'm not super familiar with the …
Hi @falcaopetri! Looks like …
Hi everyone. The test setup was wrong, but the new commit's diff should be self-explanatory. Let me know if there's something still missing. You can also check this colab: a 34s -> 5s speed-up just from removing the overhead, even without using more than one process, and the improvement should be roughly linear when decoding multiple batches.
Hi @anton-l, @sanchit-gandhi. I've merged …
This PR looks very nice to me! Let's try to get it merged.
@falcaopetri, let's see what the tests say and then IMO we can merge if they are all green.
Thanks for re-opening it @patrickvonplaten! I've just fixed a code quality issue that made … And should I rebase everything to get a nice and clean history, or is it not necessary?
Force-pushed 19b6608 to 8b43366 (compare)
@patrickvonplaten, I've rebased everything so we (i) get a cleaner merge and (ii) re-trigger CircleCI. Unfortunately, the CircleCI setup is still failing.
It seems there is an issue with your CircleCI permissions; the tests won't run.
Force-pushed 8b43366 to 7e4f592 (compare)
Oh I see, I didn't realize CircleCI was talking about my own user's credentials. Thanks for the heads-up. For future reference, I did make a mistake in the process: after refreshing credentials, CircleCI prompted me to set up a project within my own organization, which then made the …
I just realized that my Wav2Vec2ProcessorWithLM.batch_decode example triggers: … @patrickvonplaten, is there a way to indicate that a map arg shouldn't be taken into account when hashing the transformation? Or maybe …
Uff, yeah, good question. Gently pinging @lhoestq and/or @mariosasko here, as this seems to be …
On the other hand, I don't think it's a must that results are cached with datasets here; I think we can merge this PR without it.
patrickvonplaten
left a comment
Thanks a lot for the PR @falcaopetri - it's good to merge for me. Would love to get a review from @sanchit-gandhi here before merging :-)
You can pass … Alternatively, you can just disable caching in …
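For background on why the hashing chokes here: `datasets` fingerprints a `map` transform by serializing the function and its arguments, and a live `multiprocessing.Pool` deliberately refuses to be pickled. The fingerprinting internals are my assumption, but the pickling failure itself is easy to demonstrate with the standard library alone:

```python
import pickle
from multiprocessing import get_context

# A live Pool holds locks, pipes, and worker handles; CPython's Pool
# raises from __reduce__ rather than letting it be pickled.
pool = get_context("fork").Pool(2)
try:
    pickle.dumps(pool)
    picklable = True
except Exception:
    picklable = False
finally:
    pool.close()
    pool.join()

print(picklable)
```

Anything that cannot be pickled cannot be part of a deterministic `map` fingerprint, hence the caching question above.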
sgugger
left a comment
Thanks a lot for your PR! Just left a couple of nits for styling but otherwise LGTM!
This example is a bit too long for a docstring IMO, can you move it to the doc page instead? (in docs/source/em/models/wav2vec2_with_lm.mdx)
I'm not sure if I understood your suggestion @sgugger. Wav2Vec2ProcessorWithLM seems to be documented within docs/source/en/model_doc/wav2vec2.mdx. Should I add the example as a custom section (after Overview?), similarly to Speech2Text#Inference?
You can add the example after the documentation of the processor class.
Sorry to bother you, @sgugger, but I still can't grasp how to proceed. I've tried things like the snippet below, but doc-builder renders the `<div>`s incorrectly.
[[autodoc]] Wav2Vec2ProcessorWithLM
...
- batch_decode
Example:
```python
>>> ...
```
- decode
The example should be after the whole autodoc block. You can introduce it with a small sentence.
Thanks for the explanation @sgugger. I've refactored the example. FYI, I added a link from batch_decode to the example section (see here). I hope it's OK to have this cross-reference (a .py docstring referencing a section from a .mdx file).
Now I've also realized that [~model.wav2vec2...] were not rendered properly. Are they available only inside an autodoc context?
See my comment below, the links are not right (you don't need the full path but you do need the class name!)
sanchit-gandhi
left a comment
Very nice PR, thank you @falcaopetri!
Also, the new tests are a copy and paste of previous tests (...) Ideas to cut down the duplicated code are welcomed
There's no issue with having duplicated code for the different tests - it's very readable and easy to understand, which is exactly what we want in Transformers 🤗
Force-pushed e047aa1 to 3867feb (compare)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Force-pushed 3867feb to ee2f3b8 (compare)
Original:

> If you are planning to decode multiple batches of audios, you should consider using [`~model.wav2vec2_with_lm.processing_wav2vec2_with_lm.batch_decode`] and passing an instantiated `multiprocessing.Pool`.
> Otherwise, [`~model.wav2vec2_with_lm.processing_wav2vec2_with_lm.batch_decode`] performance will be slower than calling [`~model.wav2vec2_with_lm.processing_wav2vec2_with_lm.decode`] for each audio individually, as it internally instantiates a new `Pool` for every call. See the example below:

Suggested change:

> If you are planning to decode multiple batches of audios, you should consider using [`~Wav2Vec2ProcessorWithLM.batch_decode`] and passing an instantiated `multiprocessing.Pool`.
> Otherwise, [`~Wav2Vec2ProcessorWithLM.batch_decode`] performance will be slower than calling [`~Wav2Vec2ProcessorWithLM.decode`] for each audio individually, as it internally instantiates a new `Pool` for every call. See the example below:
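The pattern this tip recommends can be sketched with the standard library alone. `ProcessorStub` and `_beam_search` are hypothetical stand-ins for `Wav2Vec2ProcessorWithLM` and pyctcdecode's decoder; with transformers installed, the real call is `processor.batch_decode(logits, pool=pool)` on a loaded processor:

```python
from multiprocessing import get_context

def _beam_search(logits_row):
    # stand-in for pyctcdecode's per-utterance decoding
    return "".join(sorted(logits_row))

class ProcessorStub:
    """Mimics the batch_decode(pool=...) surface added by this PR."""

    def batch_decode(self, logits, pool=None):
        if pool is not None:
            # reuse the caller's pool across calls
            return pool.map(_beam_search, logits)
        # no pool given: pay the worker start-up cost on every call
        with get_context("fork").Pool(2) as p:
            return p.map(_beam_search, logits)

processor = ProcessorStub()
batches = [["cba", "fed"], ["ihg"]]

# instantiate the pool once and reuse it for every batch
with get_context("fork").Pool(2) as pool:
    results = [processor.batch_decode(batch, pool=pool) for batch in batches]
```

Note the `fork` context, matching the PR's Unix-only multiprocessing support.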
Force-pushed e2f0f9c to d285576 (compare)
Thanks again for all your work on this!
Hi @falcaopetri, thank you for adding this! After merging to the main branch, we have 3 test failures (when running on GPU):

- tests/models/wav2vec2/test_modeling_tf_wav2vec2.py::TFWav2Vec2ModelIntegrationTest::test_wav2vec2_with_lm_invalid_pool (line 243): RuntimeError: context has already been set
- tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2ModelIntegrationTest::test_wav2vec2_with_lm_invalid_pool (line 243): RuntimeError: context has already been set
- tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2ModelIntegrationTest::test_wav2vec2_with_lm_pool (line 1639): TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

More detailed information can be found here and in its raw logs. Would you like to take a look 🙏? Thank you 🤗
Hi @ydshieh. I'm really sorry about that. Both errors are my fault while setting up the tests.
This should fix both issues: main...falcaopetri:transformers:fix-w2v2-lm-pool. What is your workflow in this case? Should I open a new PR?
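For reference, the root cause of the `RuntimeError` above is that `multiprocessing.set_start_method` mutates process-global state and refuses a second call, while `multiprocessing.get_context` hands out independent context objects with no side effects. A stdlib sketch of the difference (not the actual code from the fix branch):

```python
import multiprocessing as mp

def set_twice():
    # set_start_method may only be called once per process;
    # a second call raises RuntimeError("context has already been set")
    try:
        mp.set_start_method("spawn")
        mp.set_start_method("fork")
    except RuntimeError as err:
        return str(err)
    return None

# get_context, by contrast, returns independent context objects and can
# be called any number of times without touching the global start method
fork_ctx = mp.get_context("fork")
spawn_ctx = mp.get_context("spawn")
print(fork_ctx.get_start_method(), spawn_ctx.get_start_method())
```

This is why tests that call `set_start_method` directly break when another test has already set the context.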
Thank you so much @falcaopetri. Yes, it would be nice if you could open a PR. Otherwise, we can do it ourselves by looking at your branch above ❤️. [Context of our CI] Our CircleCI tests run on CPU, so we use …
What does this PR do?
There are two issues being attacked:

- `Wav2Vec2ProcessorWithLM.batch_decode` generates a big overhead if it's called multiple times (this PR fixes "Wav2Vec2ProcessorWithLM degraded performance when transcribing multiple files" #17879)
- `pyctcdecode` can't use `spawn` Pools (this PR supersedes "fix: 🐛 changing context of multiprocessing while decoding for Windows" #17070)

Changes:

- Added a `pool` argument to `Wav2Vec2ProcessorWithLM.batch_decode`. This allows a user-managed `multiprocessing.Pool` to be (re)used across multiple calls to `batch_decode`.
- Bumped the `pyctcdecode` version requirement. The new version contains code to handle invalid pools.
- Updated `Wav2Vec2ProcessorWithLM` and `Wav2Vec2ProcessorWithLM.batch_decode`'s docs.

An important implementation reference is multiprocessing's "Contexts and start methods". Basically, `batch_decode`'s multiprocessing capabilities are useful only on Unix, which uses `fork` contexts. This PR introduces some checks in this regard. They can be removed once kensho-technologies/pyctcdecode#65 is resolved.

Breaking change
The new `pool` argument can break currently valid code like `processor.batch_decode(logits, 5)`. Previously, the second positional argument meant `num_processes`. If that's an issue, some considerations are:

- `pool` and `num_processes` are mutually exclusive, but a unique arg like `num_processes_or_pool` seemed weird
- `pool` could be made the last argument

Checklist
I couldn't install all deps, so I didn't execute some tests and I didn't build the documentation changes. Let's see what CI shows about:

- Test `test_decoder_batch_1` from `test_processor_wav2vec2_with_lm.py` (it uses `fork`, but it fails on my Mac, probably because of the OS platform behavior). Fixed in 17efdddbeeac75eeacddff3bcc51ced70ec19217.
- `Wav2Vec2ProcessorWithLM.batch_decode`'s new Tips
- `Wav2Vec2ProcessorWithLM.batch_decode`'s new usage example

Also, the new tests are a copy and paste of previous tests. I couldn't figure out how to factor out the duplicated code (I'm more used to pytest's fixtures). Ideas to cut down the duplicated code are welcomed.
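The `pool`/`num_processes` exclusivity and the fork-only check described above could look roughly like this. This is a hypothetical sketch, not the merged implementation: `_decode_one` stands in for pyctcdecode's beam search, and reading `pool._ctx` relies on a private CPython attribute:

```python
from multiprocessing import get_context

def _decode_one(row):
    # placeholder for per-utterance beam-search decoding
    return max(row)

def batch_decode(logits, pool=None, num_processes=None):
    # pool and num_processes are mutually exclusive, per the PR description
    if pool is not None and num_processes is not None:
        raise ValueError("`pool` and `num_processes` are mutually exclusive")
    if pool is not None:
        # only fork-context workers inherit the decoder state loaded in the
        # parent process, so reject spawn/forkserver pools
        if pool._ctx.get_start_method() != "fork":
            raise ValueError("`pool` should use a `fork` context")
        return pool.map(_decode_one, logits)
    # no user pool: fall back to a short-lived pool, as before this PR
    with get_context("fork").Pool(num_processes or 2) as p:
        return p.map(_decode_one, logits)
```

A valid fork pool is used as-is, while an invalid pool (or conflicting arguments) fails fast with a `ValueError`.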
After merge

Once merged, it could be nice to adapt and re-run some scripts such as the evaluation of patrickvonplaten/wav2vec2-large-960h-lv60-self-4-gram. In this case, for example, there are two improvements:

- The script calls `Wav2Vec2ProcessorWithLM.batch_decode` without `batched=True`, which probably means that there's no advantage in using parallel batch decoding. The usage example from this PR allows proper parallel batch decoding.

Before submitting
Who can review?
@patrickvonplaten