Conversation
zhehuaichen left a comment:
Great work! Thank you so much!
Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
titu1994 left a comment:
Requires images to be moved to the GH release; the rest are all minor comments.
| "answer": "the transcription of the audio", # optional for inference, default to "na" in dataloader | ||
| } | ||
|
|
||
|
|
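For illustration, a complete manifest entry might look like the following sketch. The `audio_filepath` and `duration` field names are assumptions based on the usual NeMo manifest convention; only `context` and `answer` appear in the quoted snippet above:

```python
import json

# Hypothetical SpeechLLM manifest entry. "audio_filepath" and "duration"
# follow the usual NeMo manifest convention (an assumption here);
# "context" is the prompt and "answer" is the target text. "answer" is
# optional for inference and defaults to "na" in the dataloader.
entry = {
    "audio_filepath": "audio/sample_0001.wav",
    "duration": 3.2,
    "context": "what does the audio mean?",
    "answer": "the transcription of the audio",
}

# Manifests are JSON Lines: one JSON object per line of the file.
line = json.dumps(entry)
parsed = json.loads(line)
```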
We support more variations of `what does the audio mean?` now, right?
The `context` field in the manifest is optional, and you can put a list of contexts in a context file (one context per line), then set `++model.data.train_ds.context_file=<path to context file>` to ask the dataloader to randomly pick a context from the file for each audio sample. This is useful for training with multiple prompts for the same task. If neither the `context` field nor `context_file` is provided, the dataloader will use a default context of `what does the audio mean?` for all audios. During inference, it is recommended to have the `context` field in the manifest.
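The fallback order described above can be sketched as follows. This is a simplified illustration, not the actual NeMo dataloader code, and `pick_context` is a hypothetical helper:

```python
import random

DEFAULT_CONTEXT = "what does the audio mean?"

def pick_context(sample, context_lines=None):
    """Sketch of the context fallback: the manifest's own "context"
    field wins; otherwise a random line from the context file is used;
    otherwise the default context applies."""
    if sample.get("context"):
        return sample["context"]
    if context_lines:
        return random.choice(context_lines)
    return DEFAULT_CONTEXT
```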
Customizing the fields to use
------------------------------

Note that the use of `prompt_template` here conflicts with the Canary model's (and SpeechLM's) `PromptFormatter` class, which also uses a `model.cfg.prompt_format` called Canary. Just a note.
In order to use a context file, you can set `++model.data.train_ds.context_file=<path to context file>` in the command line, or use multiple context files with `++model.data.train_ds.context_file=[<path to context file1>,<path to context file2>,...]`. If the number of context files is equal to the number of provided datasets, the dataloader will assign each context file to a dataset. Otherwise, the dataloader will randomly pick a context file from all provided context files for each audio sample. Using multiple context files is useful for training with multiple tasks, where each task has its own set of prompts. Meanwhile, you can control the weights for different tasks/datasets by using concatenated tarred datasets, where you can assign weights to datasets by:
What if the task and the context are wildly different during sampling? I.e., for ASR and AST?
Each dataset can have its own list of context files, such that ASR and AST can sample from their own pools separately.
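The per-dataset assignment discussed in this thread can be sketched as follows. This is a hedged illustration of the rule described in the docs above; `choose_context_file` is a hypothetical helper, not a NeMo API:

```python
import random

def choose_context_file(context_files, num_datasets, dataset_idx):
    # If there is exactly one context file per dataset, the assignment
    # is one-to-one (e.g. ASR and AST each keep their own prompt pool);
    # otherwise a context file is picked at random for each sample.
    if len(context_files) == num_datasets:
        return context_files[dataset_idx]
    return random.choice(context_files)
```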
Cool, is this mentioned somewhere else?
Don't add images to git. Upload the file to the latest release, and put the URL in the RST.
Signed-off-by: stevehuang52 <heh@nvidia.com>
…to add_speechlm_docs
* add docs
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* add lhotse specific info
  Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* move images to github release 1.23
  Signed-off-by: stevehuang52 <heh@nvidia.com>
* clean up
  Signed-off-by: stevehuang52 <heh@nvidia.com>

Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
Co-authored-by: zhehuaichen <dian.chenzhehuai@gmail.com>
What does this PR do?
Add docs to SpeechLLM
Collection: [multimodal]