Resource-aware multimodal scene understanding with view selection for efficient captioning and QA.
computer-vision multiview multimodal scene-understanding active-perception research-prototype view-selection stop-policy
Updated Mar 30, 2026