[Serve] fix video analysis release test #61324
Signed-off-by: abrar <abrar@anyscale.com>
Code Review
This pull request addresses a regression in the video analysis release test caused by an upgrade of the `transformers` library. The change in `doc/source/serve/tutorials/video-analysis/deployments/encoder.py` adapts to the new return type of `model.get_image_features`: the method previously returned a tensor directly, but it now returns a `BaseModelOutputWithPooling` object. The fix reads that object's `pooler_output` attribute to retrieve the frame embeddings before normalization, which is necessary to resolve the regression. I have added one suggestion to improve a code comment.
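A minimal sketch of the same idea, assuming only what the summary above states: older `transformers` releases returned a tensor from `get_image_features`, while newer ones return an object exposing `pooler_output`. The helper name `extract_image_embeddings` is hypothetical and not part of the tutorial, and `SimpleNamespace` stands in for `BaseModelOutputWithPooling` so the example runs without loading a model.

```python
import torch
import torch.nn.functional as F
from types import SimpleNamespace


def extract_image_embeddings(outputs) -> torch.Tensor:
    """Hypothetical helper: return L2-normalized image embeddings.

    Accepts either a plain tensor (older transformers behavior) or an
    object with a `pooler_output` attribute (newer behavior, e.g.
    BaseModelOutputWithPooling).
    """
    # Newer transformers versions wrap the embeddings in a model-output object.
    if not isinstance(outputs, torch.Tensor):
        outputs = outputs.pooler_output
    # L2 normalize each row along the feature dimension; this runs on
    # whichever device the tensor lives on (the GPU in the tutorial's encoder).
    return F.normalize(outputs, p=2, dim=1)


# Usage: both return shapes yield identical normalized embeddings.
features = torch.randn(4, 512)
legacy = extract_image_embeddings(features)
wrapped = extract_image_embeddings(SimpleNamespace(pooler_output=features))
```

Checking the type at runtime, rather than pinning a library version, keeps a helper like this working on both sides of the behavior change; the tutorial's fix simply reads `pooler_output` directly.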
```diff
-# L2 normalize on GPU (faster than CPU numpy)
-frame_embeddings = torch.nn.functional.normalize(outputs, p=2, dim=1)
+# get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings
```
While the new comment is helpful for understanding the change in the `transformers` library, the removed comment `# L2 normalize on GPU (faster than CPU numpy)` provided a useful performance rationale. It would be beneficial to retain this information for future maintainers, especially in a tutorial. Consider combining both pieces of information.
```diff
-# get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings
+# get_image_features returns BaseModelOutputWithPooling; use pooler_output for embeddings.
+# The embeddings are then L2 normalized on GPU (faster than CPU numpy).
```
Fixes a regression caused by an upgrade of the `transformers` library version; specifically, huggingface/transformers#42564 introduced a behavior change.