Fix missing video inputs for PerceptionLM. #39971
zucchini-nlp merged 3 commits into huggingface:main
Conversation
molbap
left a comment
LGTM for the fix, cc @zucchini-nlp who made the initial change!
For the non-standard image inputs, OK but would be better with a test that goes with it
zucchini-nlp
left a comment
Okay, thanks! I think we need to standardize output shapes from the image processor to be consistent, though.
Maybe we can always return 5D pixels, or already-flattened 4D pixels? Whichever way looks good; we have models doing both.
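To illustrate the two layouts being discussed, here is a minimal numpy sketch (shapes are hypothetical, not taken from the PerceptionLM processor): a 5D layout keeps the tiles axis explicit, while the flattened 4D layout merges batch and tiles into one axis.

```python
import numpy as np

# Hypothetical shapes: a batch of 2 images, each split into 4 tiles.
batch, tiles, C, H, W = 2, 4, 3, 224, 224
pixels_5d = np.zeros((batch, tiles, C, H, W), dtype=np.float32)

# Option A: keep the explicit 5D layout (batch, tiles, C, H, W).
# Option B: flatten batch and tiles into a single leading axis,
# giving the 4D layout (batch * tiles, C, H, W).
pixels_4d = pixels_5d.reshape(-1, C, H, W)

print(pixels_4d.shape)  # (8, 3, 224, 224)
```

Either convention works as long as the processor and the model agree on one of them.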
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@zucchini-nlp The reason shape unification is done in the models rather than in image_processing is that I noticed the model sees a different input shape in training than in eval/inference.
@zucchini-nlp Let me split the PR and merge the more urgent fix first?
[For maintainers] Suggested jobs to run (before merge): run-slow: perception_lm
@zucchini-nlp My bad. Just realized this comes from the collate_fn in my training script (I added one dimension). Let me open another PR for this simple fix to the image processor, and update the corresponding training script in the model card.
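For context, this is the kind of thing being described: a collate_fn that stacks per-sample arrays adds a leading batch axis, so the model can see 5D pixels in training while the processor emits 4D pixels at inference. The sketch below is a hypothetical reconstruction, not the actual training script:

```python
import numpy as np

# Hypothetical collate_fn: stacking per-sample (tiles, C, H, W) arrays
# prepends a batch axis, yielding 5D (batch, tiles, C, H, W) pixels.
def collate_fn(batch):
    return np.stack([sample["pixel_values"] for sample in batch])

samples = [
    {"pixel_values": np.zeros((4, 3, 224, 224), dtype=np.float32)}
    for _ in range(2)
]
print(collate_fn(samples).shape)  # (2, 4, 3, 224, 224)
```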
* Fix missing video inputs for PerceptionLM.
* Minor fix for vanilla input image (only C,H,W, no tiles dim).
* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)." This reverts commit 181d87b.
Critical: Fixes missing video input for PerceptionLM (accidentally removed in a previous PR).
Minor: Add support for vanilla images that only have C,H,W dims but no tiles dim.
These are non-default image shapes for PLM, but they are useful in demos and on low-resource devices,
e.g., in the just-added "PLM Simple Fine-tuning Example" under
https://huggingface.co/facebook/Perception-LM-1B#plm-usage
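A minimal sketch of the minor fix described above (shapes and the normalization step are illustrative assumptions, not the actual transformers code): a vanilla (C, H, W) image can be brought to the tiled layout by inserting a singleton tiles axis.

```python
import numpy as np

# A "vanilla" image with only (C, H, W) dims and no tiles axis.
image = np.zeros((3, 448, 448), dtype=np.float32)

# Normalize to the tiled layout by inserting a singleton tiles axis:
# (C, H, W) -> (1, C, H, W). Images that already carry a tiles axis
# pass through unchanged.
if image.ndim == 3:
    image = image[np.newaxis, ...]

print(image.shape)  # (1, 3, 448, 448)
```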