Add VidEoMT #44285

Merged
vasqu merged 77 commits into huggingface:main from NielsRogge:add_videomt on Mar 25, 2026

Conversation

@NielsRogge
Collaborator

@NielsRogge NielsRogge commented Feb 25, 2026

What does this PR do?

This PR adds the VidEoMT model, as described in VidEoMT: Your ViT is Secretly Also a Video Segmentation Model.

Gradio demo (running on ZeroGPU): https://huggingface.co/spaces/nielsr/videomt-transformers-demo

Original Github thread: tue-mps/videomt#1

NielsRogge and others added 30 commits February 23, 2026 15:37
…-to-transformers

[Videomt] Extend query-stage 5D/4D parity validation to 3-frame videos
…onversion

[videomt] Improve verify adapters and DINOv3 failure diagnostics
…n-videomt

[VidEoMT] Add temporal query-updater support, fix DINOv2 conversion mappings and re-verify yt_2019_vit_small
@vasqu
Contributor

vasqu commented Mar 17, 2026

What gpu @NielsRogge? Might be a diff between CI (A10) and your local one

@vasqu
Contributor

vasqu commented Mar 17, 2026

run-slow: videomt

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/videomt"]
quantizations: []

@vasqu
Contributor

vasqu commented Mar 17, 2026

I've added expectation for our CI devices so it works now, multi gpu still seems to fail. Would be nice if you could check @NielsRogge

(general ci still shaky 😢)
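For readers unfamiliar with the pattern, the comment above refers to keeping separate expected values per accelerator, since numerics can differ slightly between GPUs (for example the A10 used in CI versus a local card). A minimal sketch of the idea in plain Python follows. `EXPECTED_BY_DEVICE`, `pick_expectation`, and all numbers are hypothetical illustrations, not the values or helper actually used in this PR.

```python
# Hypothetical expected output slices, keyed by (device type, compute capability major).
# All numbers are made up for illustration only.
EXPECTED_BY_DEVICE = {
    ("cuda", 8): [0.1234, -0.5678, 0.9101],   # e.g. an A10-class GPU
    ("cuda", 9): [0.1236, -0.5677, 0.9100],
    (None, None): [0.1235, -0.5678, 0.9101],  # fallback for any other device
}


def pick_expectation(device_type, major):
    """Return the most specific expectation available for this device.

    Lookup order: exact (type, major) match, then a type-only entry,
    then the generic fallback.
    """
    for key in ((device_type, major), (device_type, None), (None, None)):
        if key in EXPECTED_BY_DEVICE:
            return EXPECTED_BY_DEVICE[key]
    raise KeyError(f"no expectation for {device_type!r}, {major!r}")
```

With a table like this, a test can pass on both the CI hardware and a contributor's machine without loosening the tolerance for everyone.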

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 5bba765b | workflow commit (merge commit) |
| PR | 20278faf | branch commit (from PR) |
| main | acc89e74 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@NielsRogge
Collaborator Author

@vasqu friendly ping

@vasqu
Contributor

vasqu commented Mar 23, 2026

Please check #44285 (comment)

It hasn't been addressed afaik

@vasqu
Contributor

vasqu commented Mar 25, 2026

run-slow: videomt

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, videomt

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/videomt"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 1a440406 | workflow commit (merge commit) |
| PR | b192fc74 | branch commit (from PR) |
| main | 2f624917 | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@vasqu vasqu enabled auto-merge March 25, 2026 16:08
Contributor

@vasqu vasqu left a comment

Fixed the last things, good to merge now 🤗

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44285&sha=09b99a

@vasqu vasqu disabled auto-merge March 25, 2026 17:05
@vasqu vasqu merged commit 2a54236 into huggingface:main Mar 25, 2026
27 of 29 checks passed
@vasqu
Contributor

vasqu commented Mar 25, 2026

Force merged, because the failure is known and I don't want to further delay this model

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Mar 27, 2026
* First draft

* [Videomt] Extend query-stage parity checks to 3-frame inputs

* [Videomt] Add full-model parity check against EoMT reference

* [Videomt] Compare conversion against official GitHub reference

* [Videomt] Simplify conversion to checkpoint-based HF mapping

* [Videomt] Add --verify mode against upstream GitHub implementation

* [Videomt] Improve --verify diagnostics with key remapping and layer checks

* [Videomt] Improve verify backbone candidate fallback and remapping

* [Videomt] Add DINOv3 verify compatibility patch and progress logging

* [Videomt] Extend verify diagnostics with MLP/head parity checks

* [Videomt] Make --verify succeed for converted weight mapping scope

* [videomt] Improve verify adapters and candidate traceback diagnostics

* [videomt] Adapt verify _pos_embed output for DINOv3 candidates

* [videomt] Enable DINOv3 verify candidate by adapting EVA head_dim

* [videomt] Add pre-query layer diagnostics to verify flow

* [videomt] Add deterministic verify probes and deeper pre-query diffs

* [videomt] Penalize skipped keys in verify candidate scoring

* [videomt] Add no-rope A/B diagnostics to verify pre-query layers

* [videomt] Add branch-level pre-query diagnostics to verify

* [videomt] Add fine-grained MLP diagnostics to verify

* [videomt] Verify layer-scale mapping parity in --verify

* [videomt] Validate MLP diagnostic decomposition in verify

* [videomt] Add token-group diagnostics for layer-4 MLP divergence

* [VidEoMT] Add temporal query updater path and re-verify yt_2019_vit_small

* [VidEoMT] Refine 5D execution order and re-check small checkpoint parity

* Simplify conversion script and convert all dinov2 checkpoints

* Add id2label mappings

* Fix all tests

* Add to auto mapping

* Simplify verify_conversion_against_github_reference

* Update absolute tolerance

* Update date

* Revert AGENTS.md

* Address comments

* Add circleci skill, fix circleci

* Fix CI

* Remove skills from git

* Address comments

* Address more comments

* Address comment

* Add docstrigns

* Restore AGENTS.md

* Address comment

* fix this one

* Address comments

* [fix] mistral 4 docs (huggingface#44776)

fix

* Address comment

* add expectations

* Update date

* Make fix-repo

* fix multi gpu

* fix with changes on main

* fix date

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
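
Several of the commits listed above revolve around checking the converted weights against the reference implementation and adjusting an absolute tolerance. A hedged sketch of what such a parity check can look like, written in plain Python over nested lists: `max_abs_diff`, `check_parity`, and the default tolerance are illustrative assumptions, not the conversion script's actual code.

```python
def max_abs_diff(reference, converted):
    """Largest absolute element-wise difference between two equally shaped nested lists."""
    if isinstance(reference, (list, tuple)):
        if len(reference) != len(converted):
            raise ValueError("shape mismatch between reference and converted outputs")
        # Recurse into each pair of sub-lists; an empty list contributes 0.0.
        return max((max_abs_diff(r, c) for r, c in zip(reference, converted)), default=0.0)
    return abs(reference - converted)


def check_parity(reference, converted, atol=1e-4):
    """Return (passed, worst_diff) so a failing check can report how far off it was."""
    diff = max_abs_diff(reference, converted)
    return diff <= atol, diff
```

Returning the worst difference alongside the pass/fail flag makes it easier to tell a small numerical drift from a genuinely wrong weight mapping.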
@NielsRogge NielsRogge mentioned this pull request Mar 30, 2026
2 tasks
NielsRogge added a commit to NielsRogge/transformers that referenced this pull request Mar 30, 2026