Add SAM 3.1 by NielsRogge · Pull Request #45110 · huggingface/transformers

NielsRogge · 2026-03-30T08:19:42Z

What does this PR do?

[Disclaimer: this PR was written entirely by Codex; I only nudged it in the right direction, similar to #44285.]

Feature request

I'd like to add support for Meta's SAM 3.1 release to transformers.

For the video stack, SAM 3.1 is not a simple checkpoint refresh. The upstream release introduces the new Object Multiplex tracking architecture, so it is not a drop-in replacement for the existing SAM 3 / sam3_video implementation.

Proposed scope

I have a local implementation working for the following scope:

- Image support for SAM 3.1 via the existing sam3 image family
  - Reuse Sam3Model for image inference.
  - Extend the SAM 3 conversion script to accept the merged facebook/sam3.1 checkpoint and extract the detector weights from it.
  - Verify conversion with a save/load/forward check, plus preprocessing parity against the upstream SAM 3 image preprocessing pipeline.
- New sam3_1_video model family for SAM 3.1 video
  - Add a dedicated sam3_1_video implementation based on the SAM 3.1 multiplex tracker architecture.
  - Build it from a modular source file (modular_sam3_1_video.py), with generated config/modeling files.
  - Add a conversion script that loads the public sam3.1_multiplex.pt checkpoint and verifies parity against the upstream implementation.
- Docs and tests
  - Add model docs for sam3_1 and sam3_1_video.
  - Add focused unit tests for the new sam3_1_video model family.
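
To illustrate the "extract the detector weights from the merged checkpoint" step, here is a minimal sketch of pulling one sub-model out of a combined state dict by key prefix. It is not the conversion script from this PR. The `detector.` prefix, the key names, and the helper `extract_submodule_state` are all hypothetical, since the actual key layout of the facebook/sam3.1 checkpoint is not described here. Plain lists stand in for tensors so the snippet runs without any dependencies.

```python
from typing import Any, Dict

# Hypothetical prefix: the real key layout of the merged checkpoint is not
# described in this PR, so "detector." is only a placeholder.
DETECTOR_PREFIX = "detector."


def extract_submodule_state(state_dict: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Return the entries stored under `prefix`, with the prefix stripped from each key."""
    extracted = {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
    if not extracted:
        # Fail loudly instead of silently producing an empty model state.
        raise KeyError(f"no keys start with {prefix!r}")
    return extracted


# Toy stand-in for a merged checkpoint; real values would be tensors.
merged = {
    "detector.backbone.weight": [1.0, 2.0],
    "detector.head.bias": [0.5],
    "tracker.memory.weight": [3.0],
}

detector_state = extract_submodule_state(merged, DETECTOR_PREFIX)
print(sorted(detector_state))
```

Raising on an empty result is a deliberate choice in this sketch: a wrong prefix would otherwise yield a model that loads with no converted weights and only fails later, at the forward check.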

Local status

This is already working locally against the upstream SAM 3.1 codebase and checkpoint:

- real image conversion from facebook/sam3.1 succeeds
- real video conversion from facebook/sam3.1 succeeds
- video parity passes against the upstream implementation
- image preprocessing parity passes against the upstream preprocessing pipeline
- targeted SAM 3 / SAM 3.1 tests pass
- make check-repo passes locally
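
For readers unfamiliar with what a parity check involves, the sketch below shows the general idea: run the same input through the reference and the converted implementation, then require the largest element-wise difference to stay under a tolerance. This is an illustration only, not the check used in this PR. The helper names and the `1e-4` tolerance are assumptions, and flat lists of floats stand in for model outputs.

```python
import math
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def check_parity(
    reference: Sequence[float],
    candidate: Sequence[float],
    atol: float = 1e-4,  # assumed tolerance; the PR does not state one
) -> bool:
    """True if both outputs are finite and agree within `atol`."""
    values = list(reference) + list(candidate)
    if not all(math.isfinite(v) for v in values):
        # NaN or inf in either output counts as a parity failure.
        return False
    return max_abs_diff(reference, candidate) <= atol


# Toy outputs standing in for upstream vs. converted model results.
upstream_out = [0.10, -0.25, 0.70]
converted_out = [0.10, -0.25, 0.70005]
drifted_out = [0.10, -0.25, 0.71]

print(check_parity(upstream_out, converted_out))
print(check_parity(upstream_out, drifted_out))
```

Non-finite values are rejected explicitly because comparisons against NaN are always false, so a NaN can slip past a plain `max` depending on where it appears in the sequence.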

Why a new `sam3_1_video` family?

My current recommendation is:

- keep SAM 3.1 image support inside the existing sam3 image family
- add a separate sam3_1_video family for video, since the multiplex tracker architecture and checkpoint layout differ from the current SAM 3 video implementation

This keeps the image path minimal and avoids forcing the existing sam3_video code through a checkpoint-incompatible architecture jump.

Open questions for maintainers

Codex also had some questions to confirm the expected scope and structure:

- Is a new sam3_1_video model family the preferred approach for SAM 3.1 video support?
- Is it acceptable to keep SAM 3.1 image support in the existing sam3 family rather than adding a separate sam3_1 image model class?
- Should the first PR focus on:
  - image conversion + sam3_1_video core model only
  - or also include a higher-level processor / session API for SAM 3.1 video?
- If this direction looks good, is there any preferred PR split for reviewability?

To do:

- remove plan.md and progress.md
- convert and push checkpoint

github-actions · 2026-03-30T08:20:48Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, sam3, sam3_1_video

HuggingFaceDocBuilderDev · 2026-03-30T08:34:25Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

jjabo · 2026-04-23T12:25:15Z

Is this going to be merged?

First draft

3af407e

NielsRogge added 2 commits March 30, 2026 10:21

Update progress file

3eedf61

Merge remote-tracking branch 'upstream/main' into add_sam_3_1

02f1a90

Fix SAM 3.1 consistency checks

0869ad9

JavierYepez reviewed Mar 30, 2026


Comment thread plan.md

Comment thread progress.md

This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43

Open

NielsRogge wants to merge 4 commits into huggingface:main from NielsRogge:add_sam_3_1


4 participants
