Text-only training of multimodal models by oleksost · Pull Request #403 · ServiceNow/Fast-LLM

oleksost · 2025-12-04T19:42:10Z

✨ Description

Not sure this is the right way to go about it, but this fixes the problems that come up when training a multimodal model on text-only data. @jlamypoirier, what do you think?

jlamypoirier

If I understand correctly, you would like to include text-only datasets when training multimodal models?

Please have a look at #402, which has a big impact on this PR. In there I forward the dataset requirements from the model to the dataset. Currently a text-only dataset will cause a crash, but I could adjust to create empty image patches instead, and it would make more sense than doing it in the model. What do you think?
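
A minimal sketch of the dataset-side fallback proposed above, not Fast-LLM's actual code. `PatchBatch` here is a hypothetical stand-in that uses plain lists instead of tensors, and `make_sample` and its parameters are assumptions made for illustration. The only point is that a text-only sample can carry a well-formed batch with zero patches, so the model does not need to special-case missing images.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatchBatch:
    """Hypothetical stand-in for the real patch container (assumption)."""

    # Flattened pixel values, one inner list per patch.
    patches: list[list[float]] = field(default_factory=list)
    # Token position in the text sequence where each patch is embedded.
    positions: list[int] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.patches)


def make_sample(tokens: list[int], image_patches: PatchBatch | None, needs_patches: bool) -> dict:
    """Build one sample (hypothetical helper).

    When the model requires patches but the document has none,
    fall back to an empty PatchBatch instead of None or a missing key.
    """
    sample = {"tokens": tokens}
    if needs_patches:
        sample["image_patches"] = image_patches if image_patches is not None else PatchBatch()
    return sample


# A text-only document fed to a model that expects image patches.
text_only = make_sample([101, 2023, 102], image_patches=None, needs_patches=True)
print(len(text_only["image_patches"]))
```

In real code the two fields would presumably be tensors with a zero-length leading dimension, which is an assumption here rather than something stated in this thread.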

jlamypoirier · 2025-12-04T21:15:05Z


        return preprocessed_meta

+    def _get_empty_image_patches(self, tokens: torch.Tensor, kwargs: dict[str, typing.Any]) -> PatchBatch:


This should probably go in preprocessing/image_patch. Also it's very similar to ImagePatchConfig.get_patches_from_images, maybe it can be reused.
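
To illustrate the reuse idea in general terms: if the routine that turns images into patches already loops over a list of images, calling it with an empty list yields an empty result with no separate "empty patches" method. The function below is a hypothetical pure-Python sketch; its name, parameters, and return shape are assumptions and do not reflect the real `ImagePatchConfig.get_patches_from_images`.

```python
def patches_from_images(images, positions, patch_size):
    """Split each image (a 2D list of rows) into square patches.

    Hypothetical helper for illustration only. Returns a pair:
    flattened patches, and the token position each patch belongs to.
    """
    patches, patch_positions = [], []
    for image, position in zip(images, positions):
        height, width = len(image), len(image[0])
        # Walk the image in non-overlapping patch_size x patch_size tiles.
        for top in range(0, height - patch_size + 1, patch_size):
            for left in range(0, width - patch_size + 1, patch_size):
                patch = [
                    image[top + row][left + col]
                    for row in range(patch_size)
                    for col in range(patch_size)
                ]
                patches.append(patch)
                patch_positions.append(position)
    return patches, patch_positions


# Text-only document: no images, so the same code path yields zero patches.
empty_patches, empty_positions = patches_from_images([], [], patch_size=2)

# One 2x4 image placed at token position 5 yields two 2x2 patches.
image = [[1, 2, 3, 4], [5, 6, 7, 8]]
two_patches, two_positions = patches_from_images([image], [5], patch_size=2)
```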

oleksost

Hey, thanks @jlamypoirier! Yeah, this is about using text-only data for a multimodal model. Are you planning to address it in #402? I am fine with creating those in the dataset instead of the model.

jlamypoirier

I'll add it myself; it's not much effort and will affect the PR.

fallback empty patch batch

e88cf2e

oleksost requested a review from jlamypoirier December 4, 2025 19:42

oleksost marked this pull request as ready for review December 4, 2025 19:42

jlamypoirier reviewed Dec 4, 2025

jlamypoirier mentioned this pull request Dec 6, 2025

Ensure compatibility between models and datasets #402

Merged

oleksost closed this Dec 10, 2025

oleksost wants to merge 1 commit into main from text_only_multimodal
