Skip to content

Fix inference loop logic and improve image resizing#24

Open
RubyNg wants to merge 1 commit intoVisual-Agent:mainfrom
RubyNg:fix/inference
Open

Fix inference loop logic and improve image resizing#24
RubyNg wants to merge 1 commit intoVisual-Agent:mainfrom
RubyNg:fix/inference

Conversation

@RubyNg
Copy link

@RubyNg RubyNg commented Jan 13, 2026

Summary

This PR addresses two critical issues in inference_demo.py:

  1. Conversation Logic Fix: The original code failed to append the model's response to history and used the wrong role for execution results, causing infinite loops.
  2. Visual Grounding Fix: Implemented smart_resize strategy (referenced from VLMEvalKit) to fix inaccurate bounding box predictions caused by image downscaling.

Changes

  • Appended response_message to chat_message to maintain context.
  • Changed execution result role from assistant to user.
  • Implemented smart_resize to preserve image details for better coordinate grounding.

Verified

Tested locally with Visual Probe tasks. The model now correctly generates answers and crops accurate regions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant