Conversation
jlamypoirier left a comment:
If I understand correctly, you would like to include text-only datasets when training multimodal models?
Please have a look at #402, which has a big impact on this PR: there I forward the dataset requirements from the model to the dataset. Currently a text-only dataset will cause a crash, but I could adjust it to create empty image patches instead, which would make more sense than doing it in the model. What do you think?
Code context:

    return preprocessed_meta

    def _get_empty_image_patches(self, tokens: torch.Tensor, kwargs: dict[str, typing.Any]) -> PatchBatch:
jlamypoirier:
This should probably go in preprocessing/image_patch. It's also very similar to ImagePatchConfig.get_patches_from_images, so maybe it can be reused.
Hey, thanks @jlamypoirier! Yeah, this is about using text-only data for a multimodal model. Are you planning to address it in #402? I am fine with creating those in the dataset instead of the model.
jlamypoirier:
I'll add it myself; it's not much effort and it will affect the PR.
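As a rough illustration of the idea being discussed (producing empty image patches so text-only samples can flow through the multimodal pipeline), here is a minimal sketch. The function name, the returned dict layout, and the `patch_size`/`num_channels` parameters are all assumptions for illustration, not the actual Fast-LLM `PatchBatch` API:

```python
import torch


def get_empty_image_patches(
    tokens: torch.Tensor,
    patch_size: int = 16,   # hypothetical default, not from the PR
    num_channels: int = 3,  # hypothetical default, not from the PR
) -> dict[str, torch.Tensor]:
    """Return a zero-size patch batch for a text-only sample batch.

    The idea: instead of crashing on datasets without images, emit
    tensors with a zero-length patch dimension so downstream shapes
    stay consistent with the image-bearing case.
    """
    batch_size = tokens.shape[0]
    # Flattened patch features: (batch, num_patches=0, channels * patch_size**2)
    patches = torch.zeros(batch_size, 0, num_channels * patch_size**2)
    # Token positions of the (nonexistent) patches.
    positions = torch.zeros(batch_size, 0, dtype=torch.int64)
    return {"patches": patches, "positions": positions}
```

With this shape convention, a downstream concatenation or scatter over the patch dimension is a no-op for text-only batches rather than an error.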
✨ Description
Not sure this is the right way to go about it, but this fixes problems when training a multimodal model on text-only data. @jlamypoirier, what do you think?
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
Dependencies and Configuration
Testing
Performance Impact
📊 Performance Impact Details
If there is any impact on performance, describe it and provide benchmark results, if applicable:
🗒️ Additional Notes
Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.