This is the code repository for the paper:
CoV: Chain-of-View Prompting for Spatial Reasoning
Haoyu Zhao*, Akide Liu*, Zeyu Zhang*, Weijie Wang*, Feng Chen, Ruihan Zhu, Gholamreza Haffari and Bohan Zhuang†
*Equal contribution. †Corresponding author.
ACL 2026 Findings
Embodied question answering (EQA) in 3D environments often requires collecting context that is distributed across multiple viewpoints and partially occluded. However, most recent vision-language models (VLMs) are constrained to a fixed and finite set of input views, which limits their ability to acquire question-relevant context at inference time and hinders complex spatial reasoning. We propose Chain-of-View (CoV) prompting, a training-free, test-time reasoning framework that transforms a VLM into an active viewpoint reasoner through a coarse-to-fine exploration process. CoV first employs a View Selection agent to filter redundant frames and identify question-aligned anchor views. It then performs fine-grained view adjustment by interleaving iterative reasoning with discrete camera actions, obtaining new observations from the underlying 3D scene representation until sufficient context is gathered or a step budget is reached.
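As a sketch, the coarse-to-fine loop described above can be summarized in a few lines of Python. The helpers passed in (select_anchor_views, vlm_reason, render_view) and the Decision fields are illustrative placeholders, not the repository's actual API:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    answer: Optional[str] = None         # final answer, if context suffices
    camera_action: Optional[str] = None  # e.g. "turn_left", "move_forward"

def chain_of_view(question, frames, scene, select_anchor_views,
                  vlm_reason, render_view, max_steps=10):
    # Coarse stage: drop redundant frames, keep question-aligned anchor views.
    views = select_anchor_views(question, frames)
    # Fine stage: interleave reasoning with discrete camera actions.
    for _ in range(max_steps):
        decision = vlm_reason(question, views)
        if decision.answer is not None:  # sufficient context gathered
            return decision.answer
        # Render a new observation from the 3D scene for the chosen action.
        views.append(render_view(scene, decision.camera_action))
    # Step budget exhausted: answer from the views collected so far.
    return vlm_reason(question, views, force_answer=True).answer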
- 2026-01-09 We release the paper on arXiv.
.
├── cov/ # Main package
├── scripts/ # Utility scripts
├── tools/ # Data processing tools
├── main.py # Main entry point
├── pixi.toml # Pixi environment configuration
└── README.md
- Python 3.9+
- CUDA support (recommended for Habitat-Sim)
The project uses Pixi for dependency management:
# Install dependencies
pixi install
# Activate the environment
pixi shell

Create a .env file in the root directory with your API credentials:
# OpenAI
OPENAI_API_KEY=[your_key_here]
# OpenRouter
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
OPENROUTER_API_KEY=[your_key_here]
# DashScope
DASHSCOPE_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_API_KEY=[your_key_here]

- Download the OpenEQA dataset following the instructions in the original OpenEQA repository.
- Place question files in the data/ directory
- Place scene frames in the data/frames/ directory
Run the agent on OpenEQA questions:
# Specify models.
python main.py model=qwen
# Specify min_action_step
python main.py model=qwen min_action_step=7

You can set your own model backend in cov/config.py.
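If you add a backend, the entry might look roughly like the following. The dictionary layout and field names here are hypothetical; the actual schema is whatever cov/config.py defines:

# Hypothetical backend registry; mirror the structure used in cov/config.py.
MODEL_BACKENDS = {
    "qwen": {
        "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model_name": "qwen-vl-max",      # assumed model id
        "api_key_env": "DASHSCOPE_API_KEY",
    },
    # Register your own backend under a new key, then select it with
    # `python main.py model=<key>`.
}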
Results are saved to the configured output directory with:
- JSON files containing answers and metadata (see the loading sketch after this list)
- HTML reports showing navigation history and visualizations
- Screenshots of selected views and bird's eye views
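As an illustration, the answer records can be inspected with a few lines of Python. The output path and field names below are assumptions for the sake of the example, not the repository's exact schema:

import json

# Path and keys are illustrative; check your configured output directory.
with open("output/results.json") as f:
    results = json.load(f)

for record in results:
    print(record.get("question"), "->", record.get("answer"))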
For evaluation, please follow the LLM-Match protocol from OpenEQA.
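For reference, LLM-Match asks an LLM judge to rate each predicted answer against the ground truth on a 1-5 scale, and OpenEQA aggregates the per-question ratings sigma_i as (1/N) * sum((sigma_i - 1) / 4) * 100. A minimal aggregation helper is sketched below; the judging itself should use OpenEQA's official prompts:

def llm_match_score(ratings):
    # ratings: per-question judge scores in {1, 2, 3, 4, 5}
    return 100.0 * sum((r - 1) / 4 for r in ratings) / len(ratings)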
If you use Chain-of-View (CoV) in your research, please cite:
@article{zhao2026cov,
title={CoV: Chain-of-View Prompting for Spatial Reasoning},
author={Zhao, Haoyu and Liu, Akide and Zhang, Zeyu and Wang, Weijie and Chen, Feng and Zhu, Ruihan and Haffari, Gholamreza and Zhuang, Bohan},
journal={arXiv preprint arXiv:2601.05172},
year={2026}
}
