Fix inference loop logic and improve image resizing by RubyNg · Pull Request #24 · Visual-Agent/DeepEyesV2

RubyNg · 2026-01-13T03:13:53Z

Summary

This PR addresses two critical issues in inference_demo.py:

Conversation Logic Fix: The original code failed to append the model's response to history and used the wrong role for execution results, causing infinite loops.
Visual Grounding Fix: Implemented smart_resize strategy (referenced from VLMEvalKit) to fix inaccurate bounding box predictions caused by image downscaling.

Appended response_message to chat_message to maintain context.
Changed execution result role from assistant to user.
Implemented smart_resize to preserve image details for better coordinate grounding.

Tested locally with Visual Probe tasks. The model now correctly generates answers and crops accurate regions.

Fix inference loop logic and improve image resizing

a3efb5b