Add VidEoMT #44285
…-to-transformers [Videomt] Extend query-stage 5D/4D parity validation to 3-frame videos
…onversion [videomt] Improve verify adapters and DINOv3 failure diagnostics
…n-videomt [VidEoMT] Add temporal query-updater support, fix DINOv2 conversion mappings and re-verify yt_2019_vit_small
What GPU, @NielsRogge? Might be a diff between CI (A10) and your local one.
run-slow: videomt
This comment contains models: ["models/videomt"]
I've added expectations for our CI devices so it works now; multi-GPU still seems to fail. Would be nice if you could check, @NielsRogge (general CI still shaky 😢).
@vasqu friendly ping
Please check #44285 (comment). It hasn't been addressed afaik.
run-slow: videomt
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, videomt
This comment contains models: ["models/videomt"]
vasqu left a comment:
Fixed the last things, good to merge now 🤗
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44285&sha=09b99a
Force merged, because the failure is known and I don't want to further delay this model |
* First draft
* [Videomt] Extend query-stage parity checks to 3-frame inputs
* [Videomt] Add full-model parity check against EoMT reference
* [Videomt] Compare conversion against official GitHub reference
* [Videomt] Simplify conversion to checkpoint-based HF mapping
* [Videomt] Add --verify mode against upstream GitHub implementation
* [Videomt] Improve --verify diagnostics with key remapping and layer checks
* [Videomt] Improve verify backbone candidate fallback and remapping
* [Videomt] Add DINOv3 verify compatibility patch and progress logging
* [Videomt] Extend verify diagnostics with MLP/head parity checks
* [Videomt] Make --verify succeed for converted weight mapping scope
* [videomt] Improve verify adapters and candidate traceback diagnostics
* [videomt] Adapt verify _pos_embed output for DINOv3 candidates
* [videomt] Enable DINOv3 verify candidate by adapting EVA head_dim
* [videomt] Add pre-query layer diagnostics to verify flow
* [videomt] Add deterministic verify probes and deeper pre-query diffs
* [videomt] Penalize skipped keys in verify candidate scoring
* [videomt] Add no-rope A/B diagnostics to verify pre-query layers
* [videomt] Add branch-level pre-query diagnostics to verify
* [videomt] Add fine-grained MLP diagnostics to verify
* [videomt] Verify layer-scale mapping parity in --verify
* [videomt] Validate MLP diagnostic decomposition in verify
* [videomt] Add token-group diagnostics for layer-4 MLP divergence
* [VidEoMT] Add temporal query updater path and re-verify yt_2019_vit_small
* [VidEoMT] Refine 5D execution order and re-check small checkpoint parity
* Simplify conversion script and convert all dinov2 checkpoints
* Add id2label mappings
* Fix all tests
* Add to auto mapping
* Simplify verify_conversion_against_github_reference
* Update absolute tolerance
* Update date
* Revert AGENTS.md
* Address comments
* Add circleci skill, fix circleci
* Fix CI
* Remove skills from git
* Address comments
* Address more comments
* Address comment
* Add docstrings
* Restore AGENTS.md
* Address comment
* fix this one
* Address comments
* [fix] mistral 4 docs (huggingface#44776) fix
* Address comment
* add expectations
* Update date
* Make fix-repo
* fix multi gpu
* fix with changes on main
* fix date
---------
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
What does this PR do?
This PR adds the VidEoMT model, as described in VidEoMT: Your ViT is Secretly Also a Video Segmentation Model.
Gradio demo (running on ZeroGPU): https://huggingface.co/spaces/nielsr/videomt-transformers-demo
Original GitHub thread: tue-mps/videomt#1
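The conversation above mentions adding expectations for the CI devices (A10) and updating the absolute tolerance. As a rough illustration of that idea only, here is a minimal sketch in plain Python. The device labels, the reference numbers, and the helper names `expected_for` and `matches` are all hypothetical placeholders; none of them come from this PR or from the transformers test utilities.

```python
import math

# Hypothetical reference values keyed by device label. The numbers are
# placeholders for illustration, not outputs of the VidEoMT model.
EXPECTED_BY_DEVICE = {
    "a10": [0.1234, -0.5678, 0.9012],
    "default": [0.1233, -0.5679, 0.9013],
}


def expected_for(device: str) -> list[float]:
    """Return the reference values for a device, falling back to the default."""
    return EXPECTED_BY_DEVICE.get(device, EXPECTED_BY_DEVICE["default"])


def matches(actual: list[float], expected: list[float], atol: float = 1e-3) -> bool:
    """Check element-wise closeness using an absolute tolerance only."""
    if len(actual) != len(expected):
        return False
    return all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=atol)
        for a, e in zip(actual, expected)
    )
```

A test would then compare a slice of the model output against `expected_for(device)` with `matches(...)`, so that small numerical differences between GPUs do not fail the run while real regressions still do.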