feat(replay): named replay datasets via --replay small|big #1519

Merged
spomichter merged 2 commits into dev from feat/replay-datasets on Mar 11, 2026

Conversation

spomichter (Contributor) commented on Mar 11, 2026

Summary

Adds --replay-dir flag to select which data/ directory to replay from.

Usage

dimos --replay run ...                                          # go2_sf_office (default)
dimos --replay --replay-dir unitree_go2_bigoffice run ...       # big office dataset
dimos --replay --replay-dir unitree_office_walk run ...         # any data/ directory

Changes

  • GlobalConfig: add replay_dir: str = "go2_sf_office" field
  • ReplayConnection: accept dataset param, use directly as data/ dir name
  • make_connection(): pass cfg.replay_dir to ReplayConnection
  • --replay flag unchanged (backward compatible)
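
A minimal sketch of how the changes listed above might fit together. Only the names `GlobalConfig`, `replay_dir`, `ReplayConnection`, `dataset`, and `make_connection` come from this PR; the class bodies, the `replay` boolean field, and the `data_root` parameter are assumptions made for illustration and are not the actual dimos code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class GlobalConfig:
    # Hypothetical stand-in for the real config class.
    replay: bool = False
    replay_dir: str = "go2_sf_office"  # default named in this PR


class ReplayConnection:
    # Hypothetical stand-in: the real class does far more than store a path.
    def __init__(self, dataset: str = "go2_sf_office", data_root: Path = Path("data")) -> None:
        # The dataset name is used directly as the directory name under data/.
        self.data_dir = data_root / dataset


def make_connection(cfg: GlobalConfig) -> ReplayConnection:
    # Pass the configured directory straight through to the replay connection.
    return ReplayConnection(dataset=cfg.replay_dir)
```

With this shape, leaving `--replay-dir` unset keeps the previous behavior, because the default directory is the dataset that was replayed before.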

Available Datasets

| Directory | Description |
| --- | --- |
| go2_sf_office | Short office walk (~1GB, default) |
| unitree_go2_bigoffice | Full office exploration (~11GB, 4164 video, 5492 odom, 2263 lidar) |
| unitree_office_walk | Another office walk |
| Any data/ directory | Custom datasets |

+5/-3 lines, fully backward compatible.

greptile-apps bot (Contributor) commented on Mar 11, 2026

Greptile Summary

This PR upgrades --replay from a bare boolean flag to a named dataset selector (small, big, or a custom data/ directory name) by changing GlobalConfig.replay from bool to str | None and introducing a DATASETS alias registry on ReplayConnection.

Key changes:

  • GlobalConfig.replay: bool = False → str | None = None; unitree_connection_type guard updated to is not None
  • ReplayConnection.DATASETS: new class-level dict mapping "small" → go2_sf_office and "big" → unitree_go2_bigoffice
  • ReplayConnection.__init__: now accepts a dataset: str = "small" param; resolves via DATASETS or passes through for custom dirs
  • make_connection(): passes cfg.replay to ReplayConnection, falling back to "small" when IP is "fake"/"mock"/"replay" but no --replay value was given

Breaking change (documented): --replay now requires a value (e.g. --replay small) instead of being a bare flag. Existing users with REPLAY=true in .env files will silently receive dataset="true" which won't resolve to a known directory — they should migrate to REPLAY=small.
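
The alias lookup and the `REPLAY=true` pitfall described above can be sketched as follows. The `DATASETS` mapping is taken from this review; the helper name `resolve_dataset` is hypothetical and only illustrates the `DATASETS.get(dataset, dataset)` pass-through.

```python
# Alias registry as described in the review of commit ff0cc03.
DATASETS: dict[str, str] = {
    "small": "go2_sf_office",
    "big": "unitree_go2_bigoffice",
}


def resolve_dataset(name: str) -> str:
    # Hypothetical helper: known aliases map to a directory name,
    # anything else is passed through unchanged as a custom directory.
    return DATASETS.get(name, name)


print(resolve_dataset("small"))  # go2_sf_office
print(resolve_dataset("big"))    # unitree_go2_bigoffice
print(resolve_dataset("true"))   # true (legacy REPLAY=true is treated as a directory name)
```

Because unknown names pass through silently, a legacy value such as `"true"` is only detected later, when the directory lookup fails.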

Confidence Score: 4/5

  • Safe to merge; changes are well-scoped with only minor style improvements needed.
  • The logic changes are small and correct — the type change, the is not None guard, the DATASETS registry, and the fallback to "small" all work as intended. Two minor style issues exist: a redundant isinstance check that could be cfg.replay or "small", and DATASETS missing a ClassVar annotation. Neither causes a runtime bug. The documented breaking change (bare --replay flag no longer valid) is clearly communicated in the PR description.
  • No files require special attention beyond the two minor style notes in connection.py.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/core/global_config.py | Clean type change from bool to `str \| None` |
| dimos/robot/unitree/go2/connection.py | Adds DATASETS registry and dataset param to ReplayConnection.__init__; minor: redundant isinstance check in make_connection and missing ClassVar annotation on DATASETS. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["dimos --replay <name> run ..."] --> B["GlobalConfig.replay = <name> | None"]
    B --> C{"unitree_connection_type"}
    C -- "replay is not None" --> D["connection_type = 'replay'"]
    C -- "simulation=True" --> E["connection_type = 'mujoco'"]
    C -- "else" --> F["connection_type = 'webrtc'"]

    D --> G["make_connection()"]
    E --> G
    F --> G

    G -- "ip in fake/mock/replay OR type==replay" --> H{"cfg.replay is str?"}
    H -- "Yes (--replay was set)" --> I["dataset = cfg.replay"]
    H -- "No (ip=replay, no --replay)" --> J["dataset = 'small'"]

    I --> K["ReplayConnection(dataset)"]
    J --> K

    K --> L{"DATASETS.get(dataset, dataset)"}
    L -- "'small'" --> M["dir_name = go2_sf_office"]
    L -- "'big'" --> N["dir_name = unitree_go2_bigoffice"]
    L -- "unknown / custom" --> O["dir_name = dataset (pass-through)"]

    M --> P["get_data(dir_name)"]
    N --> P
    O --> P

Last reviewed commit: ff0cc03
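
The first branch of the flowchart above can be read as a small selection function. This is a sketch under stated assumptions: the `Config` dataclass is a hypothetical stand-in for `GlobalConfig`, and the order in which the conditions are checked is inferred from the diagram rather than confirmed by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Hypothetical stand-in for GlobalConfig as of the reviewed commit.
    replay: Optional[str] = None
    simulation: bool = False


def unitree_connection_type(cfg: Config) -> str:
    # Assumed precedence: replay first, then simulation, then real hardware.
    if cfg.replay is not None:
        return "replay"
    if cfg.simulation:
        return "mujoco"
    return "webrtc"


print(unitree_connection_type(Config(replay="big")))      # replay
print(unitree_connection_type(Config(simulation=True)))   # mujoco
print(unitree_connection_type(Config()))                  # webrtc
```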


  if ip in ("fake", "mock", "replay") or connection_type == "replay":
-     return ReplayConnection()
+     dataset = cfg.replay if isinstance(cfg.replay, str) else "small"

Redundant isinstance check — simplify to cfg.replay or "small"

cfg.replay is typed as str | None, so isinstance(cfg.replay, str) is just a verbose way to write cfg.replay is not None. The expression can be simplified for readability:

Suggested change
- dataset = cfg.replay if isinstance(cfg.replay, str) else "small"
+ dataset = cfg.replay or "small"

This also aligns with how the rest of the file handles optional strings.
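
One caveat worth checking before applying the suggestion: the two expressions agree for `None` and for non-empty strings, but they differ for an empty string. The sketch below compares them; the function names are hypothetical, and whether an empty value can actually reach `make_connection()` in dimos is not established by this thread.

```python
from typing import Optional


def pick_isinstance(replay: Optional[str]) -> str:
    # Original form: falls back only when the value is not a string (i.e. None).
    return replay if isinstance(replay, str) else "small"


def pick_or(replay: Optional[str]) -> str:
    # Suggested form: falls back for any falsy value, including "".
    return replay or "small"


for value in (None, "big", ""):
    print(repr(value), repr(pick_isinstance(value)), repr(pick_or(value)))
```

If an empty `REPLAY=` entry should mean "use the default dataset", the `or` form is arguably the more forgiving of the two.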

Comment on lines +103 to +106
DATASETS: dict[str, str] = {
    "small": "go2_sf_office",
    "big": "unitree_go2_bigoffice",
}

Add ClassVar annotation to DATASETS

DATASETS is a class-level constant but is annotated as a plain instance-level dict. This can confuse type checkers and signals the wrong intent. Consider annotating it with ClassVar to make the intent explicit:

from typing import ClassVar

class ReplayConnection(UnitreeWebRTCConnection):
    DATASETS: ClassVar[dict[str, str]] = {
        "small": "go2_sf_office",
        "big": "unitree_go2_bigoffice",
    }

This also lets type checkers flag accidental per-instance assignment to the mapping; ClassVar has no effect at runtime.

spomichter force-pushed the feat/replay-datasets branch from ff0cc03 to 8198999 on March 11, 2026 15:29
Adds --replay-dir flag to select which data/ directory to replay from:

  dimos --replay run ...                                          # go2_sf_office (default)
  dimos --replay --replay-dir unitree_go2_bigoffice run ...       # big office dataset
  dimos --replay --replay-dir <any_data_dir> run ...              # any dataset

Changes:
- GlobalConfig: add replay_dir field (default 'go2_sf_office')
- ReplayConnection: accept dataset param, use directly as data/ dir name
- --replay flag unchanged (backward compatible)
spomichter force-pushed the feat/replay-datasets branch from 8198999 to f0481c9 on March 11, 2026 15:32
spomichter merged commit e4593a3 into dev on Mar 11, 2026
11 checks passed
spomichter deleted the feat/replay-datasets branch on March 11, 2026 17:58
spomichter mentioned this pull request on Mar 11, 2026
1 task
spomichter added a commit that referenced this pull request Mar 12, 2026
Release v0.0.11

82 PRs, 10 contributors, 396 files changed.

This release brings a production CLI, MCP tooling, temporal memory, and first-class support for coding agents. Dask has been removed. The entire stack now runs from `dimos run` through `dimos stop`.

### Agent-Native Development

DimOS is now built to be driven by coding agents. Point OpenClaw, Claude Code, or Cursor at [AGENTS.md](AGENTS.md) and they can build, run, and debug Dimensional applications using the CLI and MCP interfaces directly.

- **AGENTS.md** — comprehensive onboarding doc: architecture, CLI reference, skill rules, blueprint quick-reference. Your agent reads this and starts coding.
- **MCP server** — all `@skill` methods exposed as HTTP tools. External agents call `dimos mcp call relative_move --arg forward=0.5` or connect via JSON-RPC.
- **MCP CLI** — `dimos mcp list-tools`, `dimos mcp call`, `dimos mcp status`, `dimos mcp modules`
- **Agent context logging** — MCP tool calls and agent messages logged to per-run JSONL for debugging and replay.

### CLI & Daemon

Full process lifecycle — no more Ctrl-C in tmux.

- `dimos run --daemon` — background execution with health checks and run registry
- `dimos stop [--force]` — graceful shutdown with SIGTERM → SIGKILL fallback
- `dimos restart` — replays the original CLI arguments
- `dimos status` — PID, blueprint, uptime, MCP port
- `dimos log -f` — structured per-run logs with follow, JSON output, filtering
- `dimos show-config` — resolved GlobalConfig with source tracing
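
As an illustration of the SIGTERM → SIGKILL fallback mentioned for `dimos stop`, here is a minimal sketch using only the Python standard library. The function name `stop_process` and the timeout value are assumptions; this is not the dimos implementation, which manages daemonized runs through its own run registry.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> str:
    """Ask a child process to exit, escalating if it does not comply."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Start a long-running child, then shut it down gracefully.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```

The grace period gives the process a chance to flush logs and release resources before it is killed outright.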

### Temporal-Spatial Memory

Robots in physical space ingest hours of video and lidar. Temporal-spatial memory gives them a human-like understanding of the world — causal object relationships, entity tracking through time and physical space, and the ability to answer complex temporal queries:

*Who spends the most time in the kitchen? What time on average do I wake up? Which set of switches toggles the main lights? Who was at the office at 9am last Thursday?*

Traditional frame-level embeddings (CLIP, ViT) lose temporal context and don't scale beyond a handful of frames. Video transformers are expensive and don't operate in RGB-D. Dimensional agents work with video + lidar natively, tracking entities across hours and days.

```bash
dimos --replay --replay-dir unitree_go2_office_walk2 run unitree-go2-temporal-memory
```

### Interactive Viewer

Custom Rerun fork (`dimos-viewer`) is now the default. Click-to-navigate: click a point in the 3D view → PointStamped → A* planner → robot moves.

- Camera | 3D split layout on Go2, G1, and drone blueprints
- Native keyboard teleop in the viewer
- `--viewer rerun|rerun-web|rerun-connect|foxglove|none`

### Drone Support

Drone blueprints modernized to match Go2 composition pattern. `drone-basic` and `drone-agentic` work with replay, Rerun, and the full CLI.

```bash
dimos --replay run drone-basic
dimos --replay run drone-agentic
```

### More

- **Go2 fleet control** — multi-robot with `--robot-ips` (#1487)
- **Replay `--replay-dir`** — select dataset, loops by default (#1519, #1494)
- **Interactive install** — `curl -fsSL .../install.sh | bash` (#1395)
- **Nix on non-Debian Linux** (#1472)
- **Remove Dask** — native worker pool (#1365)
- **Remove asyncio dependency** (#1367)
- **Perceive loop** — continuous observation module for agents (#1411)
- **Worker resource monitor** — `dtop` TUI (#1378)
- **G1 agent wiring fix** (#1518)
- **Rerun rate limiting** — prevents viewer OOM on continuous streams (#1509, #1521)
- **RotatingFileHandler** — prevents unbounded log growth (#1492)
- **Test coverage** (#1397), draft PR CI skip (#1398), manipulation test fixes (#1522)

### Breaking Changes

- `--viewer-backend` renamed to `--viewer`
- Dask removed — blueprints using Dask workers need migration to native worker pool
- Default viewer changed from `rerun-web` to `rerun` (native dimos-viewer)

### Contributors

@spomichter, @PaulNechifor, @ruthwikdasyam, @summeryang, @MustafaBhadsorawala, @leshy, @sambull, @JeffHykin, @RadientBrain

## Contributor License Agreement

- [x] I have read and approved the [CLA](https://github.com/dimensionalOS/dimos/blob/main/CLA.md).