SAM2 Video support fp16 by Guppy16 · Pull Request #43268 · huggingface/transformers

Guppy16 · 2026-01-13T23:21:21Z

What does this PR do?

Fix the SAM2 Video inference processor so that it supports float16 (it currently works only with float32 and bfloat16).

How to reproduce

Demo source from here

This demo already works with dtype = torch.bfloat16 and dtype = torch.float32,
and this PR fixes it for dtype = torch.float16.
(Please note that fp8, int8, etc. do not work.)

import torch
from transformers import Sam2VideoModel, Sam2VideoProcessor
from transformers.video_utils import load_video

device = torch.device("cuda")
dtype = torch.float16
model_name = "facebook/sam2.1-hiera-tiny"

model = Sam2VideoModel.from_pretrained(model_name).to(device, dtype=dtype)

processor = Sam2VideoProcessor.from_pretrained(model_name)


video_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/bedroom.mp4"

video_frames, _ = load_video(video_url)

# Initialize session for streaming
inference_session = processor.init_video_session(
    inference_device=device,
    dtype=dtype,
)

# Process frames one by one
for frame_idx, frame in enumerate(video_frames[:10]):  # Process first 10 frames
    inputs = processor(images=frame, device=device, return_tensors="pt")
    if frame_idx == 0:
        # Add point input on first frame
        processor.add_inputs_to_inference_session(
            inference_session=inference_session,
            frame_idx=0,
            obj_ids=1,
            input_points=[[[[210, 350], [250, 220]]]],
            input_labels=[[[1, 1]]],
            # Must be provided when using streaming video inference
            original_size=inputs.original_sizes[0],
        )
    # Process current frame
    sam2_video_output = model(
        inference_session=inference_session, frame=inputs.pixel_values[0]
    )
    video_res_masks = processor.post_process_masks(
        [sam2_video_output.pred_masks],
        original_sizes=inputs.original_sizes,
        binarize=False,
    )[0]
    print(f"Frame {frame_idx}: mask shape {video_res_masks.shape}")

Who can review?

@yonigozlan @molbap

Guppy16 · 2026-01-16T23:23:24Z

@yonigozlan bump. A few CI/CD pipelines are failing; it seems to be something to do with code quality and consistency between SAM 2 and SAM 3, but I'm not entirely sure.

molbap · 2026-01-19T09:12:38Z

Thanks! The checks are failing because you are modifying a modular file, which is used to auto-generate one or several modeling files. See https://huggingface.co/docs/transformers/v5.0.0rc2/modular_transformers#create-a-modelingpy-file

yonigozlan · 2026-01-22T15:50:28Z

Thank you for raising this issue, @Guppy16! I could indeed reproduce the problem. I simplified the fix a bit so that it converts to the correct dtype as late as possible, and added tests.
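To illustrate what "convert to the correct dtype as late as possible" can look like in general, here is a minimal sketch. It is not the code from this PR (the actual change lives in the SAM2 Video modeling files and is not shown in this thread); the helper names `sine_position_encoding` and `prepare_for_attention` are hypothetical and exist only for this example. The idea is to keep intermediate math in float32 and cast once, right before the tensor would be handed to a half-precision module.

```python
import math

import torch


def sine_position_encoding(num_positions: int, dim: int) -> torch.Tensor:
    """Hypothetical helper: float32 sinusoidal encoding of shape (num_positions, dim)."""
    positions = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)
    freqs = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
    )
    angles = positions * freqs  # shape: (num_positions, dim // 2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)


def prepare_for_attention(features: torch.Tensor, target_dtype: torch.dtype) -> torch.Tensor:
    """Hypothetical helper: add a float32 encoding, then cast once at the end."""
    encoding = sine_position_encoding(features.shape[0], features.shape[1])
    # Intermediate math stays in float32; the single cast happens at the
    # point where the result would be passed on to the model.
    combined = features.to(torch.float32) + encoding
    return combined.to(target_dtype)


features = torch.randn(8, 16, dtype=torch.float16)
out = prepare_for_attention(features, torch.float16)
print(out.dtype, out.shape)
```

Passing the target dtype in explicitly (here it would come from the inference session) means the same helper serves float32, bfloat16, and float16 without hard-coding any of them.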

github-actions · 2026-01-22T15:50:47Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: edgetam_video, sam2_video, sam3_tracker_video, sam3_video

HuggingFaceDocBuilderDev · 2026-01-22T16:00:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* fix: cast memory attention inputs to inference session dtype
* chore: fix formatting
* add fix and tests

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

fix: cast memory attention inputs to inference session dtype

2f97225

Guppy16 commented Jan 13, 2026 on src/transformers/models/sam2_video/modular_sam2_video.py (comment thread now outdated)

chore: fix formatting

058811d

yonigozlan mentioned this pull request Jan 22, 2026

Fix float16 inference for Sam2 Video / Sam3 Video / Sam3 Video Tracker + add tests #43414 (Closed)

yonigozlan added 2 commits January 22, 2026 15:47

add fix and tests

6b3e447

Merge remote-tracking branch 'upstream/main' into patch-1

dfe72aa

yonigozlan approved these changes Jan 22, 2026

yonigozlan enabled auto-merge (squash) January 22, 2026 15:54

yonigozlan merged commit 4a1ad8d into huggingface:main Jan 22, 2026
20 checks passed

yonigozlan merged 4 commits into huggingface:main from Guppy16:patch-1

4 participants
