@kew6688 kew6688 commented Dec 29, 2025

Motivation

Clean up and stabilize the codebase before bumping the version to 0.3.0.

Modification

  • Fix RxR evaluation bug: when collecting results from the distributed nodes, check that the ndtw tensor is not None before using it
  • Standardize VLN-PE evaluation result output: results are now saved to logs/task/result.json
  • Handle NaN and inf values in NE and SPL during Habitat evaluation
  • Update VL-LN benchmark default path to projects/VL-LN-Bench
  • Update the README to include community tutorials
  • Align all evaluation configs to default settings
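
The None check and the NaN/inf handling listed above can be illustrated with a minimal sketch. It is not the code from this PR: the names `finite_mean`, `aggregate` and `save_result` are hypothetical, and dropping non-finite values before averaging is an assumed policy, since the PR does not say whether such values are dropped or replaced. The sketch also assumes that `task` in logs/task/result.json stands for the task name.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def finite_mean(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the entries that are not None, NaN, or +/-inf.

    Returns None when no usable entry remains (assumed policy: drop bad values).
    """
    usable = [v for v in values if v is not None and math.isfinite(v)]
    return sum(usable) / len(usable) if usable else None


def aggregate(
    per_rank: List[Dict[str, Optional[List[float]]]]
) -> Dict[str, Optional[float]]:
    """Merge per-worker metric lists into one summary.

    A worker may report None for a metric (for example ndtw), so each
    entry is checked before it is merged.
    """
    merged: Dict[str, List[float]] = {}
    for rank_result in per_rank:
        for name, values in rank_result.items():
            bucket = merged.setdefault(name, [])
            if values is None:  # metric missing on this worker: skip it
                continue
            bucket.extend(values)
    return {name: finite_mean(vals) for name, vals in merged.items()}


def save_result(summary: Dict[str, Optional[float]], task: str, root: str = "logs") -> Path:
    """Write the summary to <root>/<task>/result.json (hypothetical layout)."""
    path = Path(root) / task / "result.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(summary, indent=2))
    return path


if __name__ == "__main__":
    demo = [
        {"ne": [4.0, float("nan")], "spl": [0.5, float("inf")], "ndtw": None},
        {"ne": [6.0], "spl": [0.7], "ndtw": [0.4]},
    ]
    print(aggregate(demo))
```

Returning None for a metric with no usable values, rather than 0.0, keeps a missing metric distinguishable from a poor score in the saved JSON.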

Regression Benchmark for 0.3.0

All supported benchmarks were re-evaluated, and the results fall within a reasonable range.
(VL-LN dialog; N1-dual-system on R2R and RxR; CMA/RDP/Seq2seq/N1-dual-system in PE and Flash modes)

| Model | Dataset/Benchmark | NE | OS | SR | SPL |
| --- | --- | --- | --- | --- | --- |
| dialog | VL-LN | 9.04 | 56.8 | 17.6 | 9.45 |
| InternVLA-N1 | Habitat R2R | 4.31 | 69.5 | 62.8 | 57.2 |
| InternVLA-N1 | Habitat RxR | 4.68 | 69.1 | 60.8 | 51.3 |
| InternVLA-N1 | Flash | 4.06 | 67.1 | 61.0 | 54.9 |
| InternVLA-N1 | PE | 4.95 | 56.4 | 50.8 | 43.4 |
| RDP | Flash | 6.94 | 44.0 | 26.4 | 19.1 |
| RDP | PE | 6.67 | 47.2 | 24.9 | 17.5 |
| CMA | Flash | 7.82 | 43.2 | 23.4 | 17.6 |
| CMA | PE | 7.24 | 31.2 | 22.2 | 18.7 |
| Seq2seq | Flash | 8.81 | 43.4 | 15.5 | 0.09 |
| Seq2seq | PE | 7.74 | 29.8 | 16.3 | 12.3 |

@kew6688 kew6688 marked this pull request as ready for review December 29, 2025 09:56
@kew6688 kew6688 requested a review from Tai-Wang December 29, 2025 13:24
@Tai-Wang Tai-Wang merged commit f8331e8 into InternRobotics:dev Dec 30, 2025
@kew6688 kew6688 mentioned this pull request Dec 30, 2025
Tai-Wang pushed a commit that referenced this pull request Jan 5, 2026
* fix vlnpe result.json save path

* update vlln dataset path

* fix evaluator bug for rxr ndtw result

* update habitat extensions readme

* add community tutorials

* align eval configs to default path

* fix ne and spl contain NaN issue

* update links for community work

* update readme

* update checkpoint path; isolate transformer dependency

* Update readme IROS