@@ -69,9 +69,8 @@ def encode_frames(self, frames: np.ndarray) -> np.ndarray:
         with torch.no_grad():
             with torch.amp.autocast(device_type=self.device, enabled=self.device == "cuda"):
                 outputs = self.model.get_image_features(**inputs)
-
-        # L2 normalize on GPU (faster than CPU numpy)
-        frame_embeddings = torch.nn.functional.normalize(outputs, p=2, dim=1)
+        # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings
+        frame_embeddings = torch.nn.functional.normalize(outputs.pooler_output, p=2, dim=1)
 
         # Move to CPU and convert to numpy
         result = frame_embeddings.cpu().numpy().astype(np.float32)

Contributor (medium):

While the new comment is helpful for understanding the change in the transformers library, the removed comment `# L2 normalize on GPU (faster than CPU numpy)` provided a useful performance rationale. It would be beneficial to retain this information for future maintainers, especially in a tutorial. Consider combining both pieces of information.

Suggested change:

-        # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings
+        # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings.
+        # The embeddings are then L2 normalized on GPU (faster than CPU numpy).
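For readers following along with the tutorial, below is a minimal, self-contained sketch of how the changed lines could sit inside an `encode_frames` method, with the reviewer's combined comment applied. The class name, the checkpoint, and the `self.processor` / `self.device` attributes are illustrative assumptions rather than the repository's actual code, and the `hasattr` fallback is an extra defensive touch so the sketch also runs on transformers versions where `get_image_features` still returns a plain tensor.

```python
# Illustrative sketch only; class name, checkpoint, and attribute names are assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoProcessor


class FrameEncoder:
    def __init__(self, model_name: str = "openai/clip-vit-base-patch32"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.processor = AutoProcessor.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name).to(self.device).eval()

    def encode_frames(self, frames: np.ndarray) -> np.ndarray:
        # Preprocess a batch of H x W x C frames into model inputs on the target device.
        inputs = self.processor(images=list(frames), return_tensors="pt").to(self.device)

        with torch.no_grad():
            with torch.amp.autocast(device_type=self.device, enabled=self.device == "cuda"):
                outputs = self.model.get_image_features(**inputs)

        # get_image_features returns BaseModelOutputWithPooling in the transformers version
        # this PR targets; use pooler_output for embeddings. Older versions return a plain
        # tensor, so fall back to it directly.
        embeddings = outputs.pooler_output if hasattr(outputs, "pooler_output") else outputs

        # L2 normalize on GPU (faster than CPU numpy).
        frame_embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

        # Move to CPU and convert to numpy.
        return frame_embeddings.cpu().numpy().astype(np.float32)
```

Keeping the normalization on the GPU and calling `.cpu().numpy()` only once at the end preserves the performance rationale the reviewer asked to retain in the comment.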