[ViLT] Refactor output handling to align with standardized patterns #44098

Open
aman-coder03 wants to merge 2 commits into huggingface:main from aman-coder03:refactor-vilt-output-capture

aman-coder03 commented Feb 17, 2026

What does this PR do?

This PR refactors ViLT's output handling to align with the standardized patterns used across the codebase.

Key changes:

  • Removes manual hidden_states/attentions propagation and passes output_attentions, output_hidden_states, and return_dict cleanly through the encoder call in ViltModel.forward
  • Adds **kwargs forwarding to all child model self.vilt(...) calls (ViltForMaskedLM, ViltForQuestionAnswering, ViltForImageAndTextRetrieval, ViltForTokenClassification) so output flags are correctly propagated from the top-level forward call down to the base model
  • Fixes ViltForTokenClassification to handle the inputs_embeds path correctly when computing text_input_size
  • Fixes ViltForImagesAndTextClassification to return None instead of empty lists for hidden_states and attentions when not requested, ensuring correct output length counting
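
The forwarding and `None`-versus-empty points above can be illustrated with a small, library-free sketch. It is not the ViLT code from this PR: `ToyOutput`, `ToyBase`, and `ToyHead` are hypothetical stand-ins, and the `to_tuple` logic is a simplified assumption about how unset fields are skipped when counting outputs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyOutput:
    # Hypothetical stand-in for a model output class; not the transformers API.
    last_hidden_state: list
    hidden_states: Optional[Tuple[list, ...]] = None
    attentions: Optional[Tuple[str, ...]] = None

    def to_tuple(self):
        # Only fields that are not None count toward the output length.
        fields = (self.last_hidden_state, self.hidden_states, self.attentions)
        return tuple(v for v in fields if v is not None)


class ToyBase:
    """Hypothetical base model that honours the output flags it receives."""

    def forward(self, inputs, output_hidden_states=False, output_attentions=False):
        hidden = [x * 2 for x in inputs]
        return ToyOutput(
            last_hidden_state=hidden,
            # Return None (not an empty list) when a field was not requested.
            hidden_states=(inputs, hidden) if output_hidden_states else None,
            attentions=("attn-layer-0",) if output_attentions else None,
        )


class ToyHead:
    """Hypothetical task head that forwards **kwargs to the base model."""

    def __init__(self):
        self.base = ToyBase()

    def forward(self, inputs, **kwargs):
        # Flags such as output_hidden_states reach the base model unchanged.
        return self.base.forward(inputs, **kwargs)


head = ToyHead()
plain = head.forward([1, 2, 3])
full = head.forward([1, 2, 3], output_hidden_states=True, output_attentions=True)
print(len(plain.to_tuple()), len(full.to_tuple()))  # prints: 1 3
```

In this sketch, a field set to `[]` instead of `None` would pass the `is not None` check and inflate the tuple length, which is the kind of miscount the last bullet describes.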

Why is this needed?

This aligns ViLT with the new output handling patterns introduced in #43979.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: vilt

aman-coder03 changed the title from "[ViLT] Refactor output tracing using capture_outputs decorator" to "[ViLT] Refactor output handling to align with standardized patterns" on Feb 17, 2026