[Serve] fix video analysis release test#61324

Merged
aslonnie merged 2 commits into master from abrar-vid-analysis-fix on Feb 26, 2026

Conversation

@abrarsheikh
Contributor

This regression was caused by an upgrade of the transformers library version; specifically, huggingface/transformers#42564 introduced a behavior change.

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from a team as a code owner February 25, 2026 21:16
@abrarsheikh abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Feb 25, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses a regression in the video analysis release test caused by an upgrade of the transformers library. The change in doc/source/serve/tutorials/video-analysis/deployments/encoder.py adapts to the new return type of model.get_image_features: the method previously returned a tensor directly, but now returns a BaseModelOutputWithPooling object. The fix reads the pooler_output attribute of that object to retrieve the frame embeddings before normalization, which is necessary to resolve the regression. I have added one suggestion to improve a code comment.
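As an illustration of the return-type change described above, here is a minimal sketch of a compatibility helper that accepts either return style. The name `unwrap_image_features` is hypothetical and is not part of this PR; the stand-in values below only mimic the two shapes of result, and real code would pass the output of model.get_image_features instead.

```python
from types import SimpleNamespace


def unwrap_image_features(outputs):
    """Return the embeddings from either return style.

    Hypothetical helper (not from the PR): if the result exposes a
    `pooler_output` attribute, use it; otherwise assume the result
    already is the embedding tensor.
    """
    pooled = getattr(outputs, "pooler_output", None)
    return outputs if pooled is None else pooled


# Stand-ins for the two return styles (assumed shapes, not real model output).
old_style = [[3.0, 4.0]]
new_style = SimpleNamespace(pooler_output=[[3.0, 4.0]])

assert unwrap_image_features(old_style) == [[3.0, 4.0]]
assert unwrap_image_features(new_style) == [[3.0, 4.0]]
```

The unwrapped value would then be passed to torch.nn.functional.normalize(..., p=2, dim=1), as in the tutorial code. The PR itself takes the simpler route of reading pooler_output directly, which is reasonable when the transformers version is pinned.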


- # L2 normalize on GPU (faster than CPU numpy)
- frame_embeddings = torch.nn.functional.normalize(outputs, p=2, dim=1)
+ # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings

medium

While the new comment is helpful for understanding the change in the transformers library, the removed comment # L2 normalize on GPU (faster than CPU numpy) provided a useful performance rationale. It would be beneficial to retain this information for future maintainers, especially in a tutorial. Consider combining both pieces of information.

Suggested change
- # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings
+ # get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings.
+ # The embeddings are then L2 normalized on GPU (faster than CPU numpy).

@aslonnie aslonnie merged commit 73b5266 into master Feb 26, 2026
5 of 6 checks passed
@aslonnie aslonnie deleted the abrar-vid-analysis-fix branch February 26, 2026 01:46

Labels

go add ONLY when ready to merge, run all tests

3 participants