20 changes: 19 additions & 1 deletion README.md
@@ -49,9 +49,10 @@ The toolbox supports the most advanced high-quality navigation dataset, InternDa
- [🏠 Introduction](#-introduction)
- [🔥 News](#-news)
- [📚 Getting Started](#-getting-started)
- [📦 Overview of Benchmark \& Model Zoo](#-overview-of-benchmark-and-model-zoo)
- [📦 Overview of Benchmark \& Model Zoo](#-overview)
- [🔧 Customization](#-customization)
- [👥 Contribute](#-contribute)
- [🚀 Community Deployment & Best Practices](#-community-deployment--best-practices)
- [🔗 Citation](#-citation)
- [📄 License](#-license)
- [👏 Acknowledgements](#-acknowledgements)
@@ -213,6 +214,23 @@ For example, raising issues, fixing bugs in the framework, and adapting or addin

**Note:** We welcome feedback on the model's zero-shot performance when you deploy it in your own environment. Please share your results and tell us which capabilities you need next; we will select the most valuable requests and collaborate with users to address them over the next few months :)

## 🚀 Community Deployment & Best Practices

We are excited to see InternNav being deployed and extended by the community across different robots and real-world scenarios.
Below are selected community-driven deployment guides and solution write-ups, which may serve as practical references for advanced users.

- **IROS Challenge Nav Track: Champion Solution (2025)**
A complete system-level solution and design analysis for Vision-and-Language Navigation in Physical Environments.
🔗 https://zhuanlan.zhihu.com/p/1969046543286907790

- **Go2 Series Deployment Tutorial (ShanghaiTech University)**
Step-by-step edge deployment guide for InternNav-based perception and navigation.
🔗 https://github.com/cmjang/InternNav-deploy

- **G1 Series Deployment Tutorial (Wuhan University)**
Detailed educational materials on vision-language navigation deployment.
🔗 [*Chapter 5: Vision-Language Navigation (Part II)*](https://mp.weixin.qq.com/s/p3cJzbRvecMajiTh9mXoAw)

## 🔗 Citation

If you find our work helpful, please cite:
15 changes: 5 additions & 10 deletions internnav/agent/dialog_agent.py
@@ -11,21 +11,16 @@
import quaternion
import torch
from PIL import Image, ImageDraw
from transformers import (
AutoProcessor,
AutoTokenizer,
Qwen2_5_VLForConditionalGeneration,
)

from internnav.agent import Agent
from internnav.configs.agent import AgentCfg

try:
pass
except Exception as e:
print(f"Warning: ({e}), Ignore this if not using dual_system.")

try:
from transformers import (
AutoProcessor,
AutoTokenizer,
Qwen2_5_VLForConditionalGeneration,
)
from depth_camera_filtering import filter_depth
from habitat.tasks.nav.shortest_path_follower import ShortestPathFollower
except Exception as e:
6 changes: 3 additions & 3 deletions internnav/dataset/vlln_lerobot_dataset.py
@@ -22,21 +22,21 @@

# Define placeholders for dataset paths
IION_split1 = {
"data_path": "traj_data/mp3d_split1",
"data_path": "projects/VL-LN-Bench/traj_data/mp3d_split1",
"height": 125,
"pitch_1": 0,
"pitch_2": 30,
}

IION_split2 = {
"data_path": "traj_data/mp3d_split2",
"data_path": "projects/VL-LN-Bench/traj_data/mp3d_split2",
"height": 125,
"pitch_1": 0,
"pitch_2": 30,
}

IION_split3 = {
"data_path": "traj_data/mp3d_split3",
"data_path": "projects/VL-LN-Bench/traj_data/mp3d_split3",
"height": 125,
"pitch_1": 0,
"pitch_2": 30,
2 changes: 1 addition & 1 deletion internnav/evaluator/utils/result_logger.py
@@ -319,5 +319,5 @@ def finalize_all_results(self, rank, world_size):
}

# write log content to file
with open(f"{self.name}_result.json", "w") as f:
with open(f"{PROJECT_ROOT_PATH}/logs/{self.name}/result.json", "w") as f:
json.dump(json_data, f, indent=2, ensure_ascii=False)
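
A side note on the new log path (a minimal sketch, not part of the diff): if the nested `logs/<name>/` directory is not already created elsewhere during logger setup, it has to exist before `json.dump` runs. The values below are hypothetical stand-ins for `PROJECT_ROOT_PATH` and `self.name`.

```python
import json
import os

# Hypothetical stand-ins for PROJECT_ROOT_PATH and self.name used in the diff above.
project_root = "/tmp/InternNav"
run_name = "vlln_eval"

log_dir = os.path.join(project_root, "logs", run_name)
os.makedirs(log_dir, exist_ok=True)  # make sure the nested directory exists before writing

with open(os.path.join(log_dir, "result.json"), "w") as f:
    json.dump({"sucs_all": 0.0}, f, indent=2, ensure_ascii=False)
```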
32 changes: 25 additions & 7 deletions internnav/habitat_extensions/vlln/README.md
@@ -3,16 +3,34 @@
Vision-Language-and-Language Navigation (VL-LN) is a new [benchmark](https://0309hws.github.io/VL-LN.github.io/) built upon VLN in Habitat, in which models take vision and language as input and output both language and navigation actions. In contrast to VLN, where agents only take navigation actions, VL-LN agents can ask questions and engage in dialogue with users, completing tasks more effectively through language interaction.
This package adapts [Meta AI Habitat](https://aihabitat.org) for VL-LN within InternNav. It wraps Habitat environments that expose semantic masks, registers dialog-aware datasets and metrics, and provides evaluators that coordinate agent actions, NPC interactions, and logging.

Download our benchmark [dataset](https://huggingface.co/datasets/InternRobotics/VL-LN-Bench) and the [latest checkpoints](https://huggingface.co/InternRobotics/VL-LN-Bench-basemodel) from HuggingFace.
Place the downloaded benchmark under `InternNav/projects/VL-LN-Bench` to match the default path expected by the code.
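
As a rough sketch of one way to fetch both repositories with `huggingface_hub` (the target directories, including `base_model/`, are assumptions based on the layout shown below):

```python
# Sketch: download the VL-LN-Bench dataset and base-model checkpoints into the
# default locations this package expects. Target directories are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="InternRobotics/VL-LN-Bench",
    repo_type="dataset",
    local_dir="projects/VL-LN-Bench",
)
snapshot_download(
    repo_id="InternRobotics/VL-LN-Bench-basemodel",
    local_dir="projects/VL-LN-Bench/base_model",
)
```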

## Package structure

```
habitat_vlln_extensions/
├── __init__.py
├── habitat_dialog_evaluator.py
├── habitat_vlln_env.py
├── measures.py
├── simple_npc/
└── utils/
InternNav
├── assets/
├── internnav/
│   ├── habitat_vlln_extensions/
│   │   ├── simple_npc/
│   │   │   ├── api_key.txt
│   │   ├── measures.py
│   │   ├── habitat_dialog_evaluator.py
│   │   ├── habitat_vlln_env.py
│   ...     ...         ...
...
├── projects/
│   ├── VL-LN-Bench/
│   │   ├── base_model/
│   │   ├── raw_data/
│   │   ├── scene_datasets/
│   │   │   └── mp3d/
│   │   │       ├── 17DRP5sb8fy/
│   │   │       ├── 1LXtFkjw3qL/
│   │   │       ...
│   │   ├── traj_data/
...
```

* `__init__.py` re-exports the public entry points so callers can import
7 changes: 7 additions & 0 deletions internnav/habitat_extensions/vlln/habitat_dialog_evaluator.py
@@ -257,6 +257,13 @@ def calc_metrics(self, global_metrics: dict) -> dict:
# avoid /0 if no episodes
denom = max(len(sucs_all), 1)

# clean NaN in spls, treat as 0.0
torch.nan_to_num(spls_all, nan=0.0, posinf=0.0, neginf=0.0, out=spls_all)

# clean inf in nes; only finite nes are counted
nes_finite_mask = torch.isfinite(nes_all)
nes_all = nes_all[nes_finite_mask]

return {
"sucs_all": float(sucs_all.mean().item()) if denom > 0 else 0.0,
"spls_all": float(spls_all.mean().item()) if denom > 0 else 0.0,
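
For readers skimming the diff, a small self-contained sketch of what the added cleaning does (illustrative values only): NaN SPL entries are kept but count as 0.0 toward the mean, while non-finite NE entries are dropped before averaging.

```python
import torch

# Illustrative tensors standing in for spls_all and nes_all.
spls_all = torch.tensor([0.8, float("nan"), 0.5])
nes_all = torch.tensor([1.2, float("inf"), 0.7])

# Non-finite SPL values become 0.0 in place, so they still count toward the mean.
torch.nan_to_num(spls_all, nan=0.0, posinf=0.0, neginf=0.0, out=spls_all)

# Non-finite NE values are removed, so only finite episodes are averaged.
nes_all = nes_all[torch.isfinite(nes_all)]

print(spls_all.mean().item())  # (0.8 + 0.0 + 0.5) / 3 ≈ 0.433
print(nes_all.mean().item())   # (1.2 + 0.7) / 2 = 0.95
```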
2 changes: 1 addition & 1 deletion internnav/habitat_extensions/vln/README.md
@@ -9,7 +9,7 @@ utilities.
## Package structure

```
habitat_extensions/
habitat_extensions/vln/
├── __init__.py
├── habitat_env.py
├── habitat_default_evaluator.py
13 changes: 10 additions & 3 deletions internnav/habitat_extensions/vln/habitat_vln_evaluator.py
@@ -192,7 +192,7 @@ def eval_action(self):
"nes": nes, # shape [N_local]
}

if ndtws:
if ndtws is not None:
result["ndtws"] = ndtws # shape [N_local]
return result
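
A brief aside on the `is not None` check above (an illustrative sketch, not from the diff): plain truthiness misbehaves both for empty results and for multi-element tensors, so the explicit comparison is the safer guard.

```python
import torch

ndtws = torch.tensor([0.8, 0.6])  # hypothetical gathered nDTW values
print(ndtws is not None)          # True -> the result is recorded
# bool(ndtws) would raise "Boolean value of Tensor with more than one element
# is ambiguous", so a bare `if ndtws:` is unsafe for tensors.

empty = []                        # hypothetical empty-but-present result
print(bool(empty), empty is not None)  # False True -> truthiness would silently drop it
```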

@@ -207,6 +207,13 @@ def calc_metrics(self, global_metrics: dict) -> dict:

# avoid /0 if no episodes
denom = max(len(sucs_all), 1)

# clean NaN in spls, treat as 0.0
torch.nan_to_num(spls_all, nan=0.0, posinf=0.0, neginf=0.0, out=spls_all)

# clean inf in nes; only finite nes are counted
nes_finite_mask = torch.isfinite(nes_all)
nes_all = nes_all[nes_finite_mask]

result_all = {
"sucs_all": float(sucs_all.mean().item()) if denom > 0 else 0.0,
@@ -587,7 +594,7 @@ def _run_eval_dual_system(self) -> tuple:
torch.tensor(spls).to(self.device),
torch.tensor(oss).to(self.device),
torch.tensor(nes).to(self.device),
torch.tensor(ndtw).to(self.device) if 'ndtw' in metrics else None,
torch.tensor(ndtw).to(self.device) if ndtw else None,
)

def _run_eval_system2(self) -> tuple:
@@ -876,5 +883,5 @@ def _run_eval_system2(self) -> tuple:
torch.tensor(spls).to(self.device),
torch.tensor(oss).to(self.device),
torch.tensor(nes).to(self.device),
torch.tensor(ndtw).to(self.device) if 'ndtw' in metrics else None,
torch.tensor(ndtw).to(self.device) if ndtw else None,
)