feat(core): worker resource monitor#1378

Merged
leshy merged 25 commits into dev from feat/resource-monitor
Mar 3, 2026

@leshy leshy commented Feb 28, 2026

Problem

Since obsoleting Dask, we have had no visibility into the resource usage of modules.

Misc

  • tagged expensive voxel mapper test as tool (was used in dev)
  • removed the unitree_webrtc/__init__.py shim that was loading pygame on every unpickle

Solution

  • Adds psutil-based resource monitoring to ModuleCoordinator.loop(), which collects per-worker system stats and hands them to pluggable publishers (structlog, LCM)
  • Stats are published as pickled messages over LCM to /dimos/resource_stats
  • New dtop CLI tool that subscribes to that LCM topic
  • Bump smart blueprint to 7 workers

TODO

This (with diff stats output, maybe printed on exit) can now be used for profiling tests and for comparing against dev.

Breaking Changes

None

How to Test

uv sync --all-extras
dtop

run some blueprint

dimos --dtop run ...

Contributor License Agreement

  • I have read and approved the CLA.

Adds psutil-based resource monitoring to ModuleCoordinator.loop(),
collecting CPU, memory (PSS/USS/RSS/VMS), threads, children, FDs,
and IO stats per worker process every 2s. Stats are published over
LCM and viewable with the new `dtop` CLI tool.

- WorkerStats dataclass and collect_stats() on Worker/WorkerManager
- ResourceLogger protocol with LCM and structlog implementations
- dtop: live Textual TUI subscribing to /dimos/resource_stats
- psutil added as explicit dependency
- Bump smart blueprint to 7 workers
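
A rough sketch of how a stats dataclass and a pluggable logger protocol could fit together. The names `WorkerStats` and `ResourceLogger` come from the commit message; the field names, the `log_stats` method, and both implementations are assumptions made for illustration, with stdlib `logging` standing in for structlog.

```python
import logging
from dataclasses import asdict, dataclass
from typing import List, Protocol


@dataclass
class WorkerStats:
    """Illustrative fields only; the real dataclass may differ."""

    worker_id: int
    pid: int
    cpu_percent: float
    rss_bytes: int
    num_threads: int


class ResourceLogger(Protocol):
    """Anything with a matching log_stats method satisfies the protocol."""

    def log_stats(self, stats: List[WorkerStats]) -> None: ...


class StdlibResourceLogger:
    """Emits one log record per worker (stdlib logging in place of structlog)."""

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger

    def log_stats(self, stats: List[WorkerStats]) -> None:
        for s in stats:
            self._logger.info("worker_stats %s", asdict(s))


class MemoryResourceLogger:
    """Keeps every batch in memory, which is handy in tests."""

    def __init__(self) -> None:
        self.batches: List[List[WorkerStats]] = []

    def log_stats(self, stats: List[WorkerStats]) -> None:
        self.batches.append(list(stats))


def publish(logger: ResourceLogger, stats: List[WorkerStats]) -> None:
    # The caller depends only on the protocol, not on a concrete publisher.
    logger.log_stats(stats)
```

Because the protocol is structural, a new publisher needs no base class; it only has to provide a compatible `log_stats` method.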
Cache psutil.Process objects across snapshots so cpu_percent(interval=None)
has a previous sample to diff against. Fix wrong module name in docstring,
remove dead _snap_count state, and extend color gradient to cyan→yellow→red.
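
The caching fix above can be sketched roughly as follows. `ProcessCache` and its `factory` parameter are hypothetical names, not the ones used in this PR; the only psutil call assumed is `psutil.Process(pid).cpu_percent(interval=None)`, which returns immediately and compares against the previous call on the same `Process` object, so the first reading from a fresh object is a meaningless 0.0.

```python
from typing import Any, Callable, Dict, Iterable


def _default_factory(pid: int) -> Any:
    # Imported lazily so the sketch can also be exercised with a fake factory.
    import psutil

    return psutil.Process(pid)


class ProcessCache:
    """Keep one process handle per PID so CPU readings have a prior sample."""

    def __init__(self, factory: Callable[[int], Any] = _default_factory) -> None:
        self._factory = factory
        self._procs: Dict[int, Any] = {}

    def cpu_percent(self, pids: Iterable[int]) -> Dict[int, float]:
        live = set(pids)
        # Forget processes that are no longer monitored so the cache stays bounded.
        for pid in list(self._procs):
            if pid not in live:
                del self._procs[pid]
        readings: Dict[int, float] = {}
        for pid in live:
            proc = self._procs.get(pid)
            if proc is None:
                proc = self._procs[pid] = self._factory(pid)
            # Non-blocking: diffs against the last call made on this same object.
            readings[pid] = proc.cpu_percent(interval=None)
        return readings
```

Creating a new `Process` object on every snapshot would reset that baseline each time, which is why the handles are kept between snapshots.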
@leshy leshy changed the title feat(core): per-worker resource monitor + dtop TUI feat(core): worker resource monitor Feb 28, 2026
@leshy leshy marked this pull request as ready for review March 1, 2026 06:54

@greptile-apps greptile-apps bot left a comment

9 files reviewed, 1 comment

resource_logger: ResourceLogger | None = None,
monitor_interval: float = 1.0,
) -> None:
_logger: ResourceLogger = resource_logger or LCMResourceLogger()

I think we should have a flag to only gather statistics if it was requested.
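
One way the opt-in behaviour suggested here might look, as a sketch only: `MonitorConfig`, its `enabled` field, and `maybe_collect` are hypothetical names and are not part of this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class MonitorConfig:
    """Hypothetical settings; monitoring is off unless explicitly requested."""

    enabled: bool = False
    interval: float = 1.0


def maybe_collect(
    config: MonitorConfig,
    collect: Callable[[], Dict[str, float]],
    publish: Callable[[Dict[str, float]], None],
) -> bool:
    """Run one collect/publish step only when monitoring was requested."""
    if not config.enabled:
        # Skip the collection work entirely when nobody asked for stats.
        return False
    publish(collect())
    return True
```

With a gate like this, the coordinator loop pays for neither collection nor publishing unless the flag is set.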

@dimensionalOS dimensionalOS deleted a comment from greptile-apps bot Mar 1, 2026

leshy commented Mar 1, 2026

I think class StatsMonitor could be refactored (it is too coupled to workers rather than processes, and a per-process monitor class would allow better caching), but this feature is now isolated, so refactoring will be easy once we know more.

@leshy leshy merged commit 306f70d into dev Mar 3, 2026
12 checks passed
@spomichter spomichter mentioned this pull request Mar 11, 2026
1 task
spomichter added a commit that referenced this pull request Mar 12, 2026
Release v0.0.11

82 PRs, 10 contributors, 396 files changed.

This release brings a production CLI, MCP tooling, temporal memory, and first-class support for coding agents. Dask has been removed. The entire stack now runs from `dimos run` through `dimos stop`.

### Agent-Native Development

DimOS is now built to be driven by coding agents. Point OpenClaw, Claude Code, or Cursor at [AGENTS.md](AGENTS.md) and they can build, run, and debug Dimensional applications using the CLI and MCP interfaces directly.

- **AGENTS.md** — comprehensive onboarding doc: architecture, CLI reference, skill rules, blueprint quick-reference. Your agent reads this and starts coding.
- **MCP server** — all `@skill` methods exposed as HTTP tools. External agents call `dimos mcp call relative_move --arg forward=0.5` or connect via JSON-RPC.
- **MCP CLI** — `dimos mcp list-tools`, `dimos mcp call`, `dimos mcp status`, `dimos mcp modules`
- **Agent context logging** — MCP tool calls and agent messages logged to per-run JSONL for debugging and replay.

### CLI & Daemon

Full process lifecycle — no more Ctrl-C in tmux.

- `dimos run --daemon` — background execution with health checks and run registry
- `dimos stop [--force]` — graceful shutdown with SIGTERM → SIGKILL fallback
- `dimos restart` — replays the original CLI arguments
- `dimos status` — PID, blueprint, uptime, MCP port
- `dimos log -f` — structured per-run logs with follow, JSON output, filtering
- `dimos show-config` — resolved GlobalConfig with source tracing
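
A minimal sketch of the SIGTERM → SIGKILL fallback mentioned for `dimos stop`, using only the standard library. `stop_process` and `grace_seconds` are hypothetical names, and the real command may track processes differently (for example through its run registry); this only illustrates the general pattern on POSIX.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a child process to exit, then force it if it does not comply in time."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, which cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

The grace period gives the process time to flush logs and release resources before the forced kill.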

### Temporal-Spatial Memory

Robots in physical space ingest hours of video and lidar. Temporal-spatial memory gives them a human-like understanding of the world — causal object relationships, entity tracking through time and physical space, and the ability to answer complex temporal queries:

*Who spends the most time in the kitchen? What time on average do I wake up? Which set of switches toggles the main lights? Who was at the office at 9am last Thursday?*

Traditional frame-level embeddings (CLIP, ViT) lose temporal context and don't scale beyond a handful of frames. Video transformers are expensive and don't operate in RGB-D. Dimensional agents work with video + lidar natively, tracking entities across hours and days.

```bash
dimos --replay --replay-dir unitree_go2_office_walk2 run unitree-go2-temporal-memory
```

### Interactive Viewer

Custom Rerun fork (`dimos-viewer`) is now the default. Click-to-navigate: click a point in the 3D view → PointStamped → A* planner → robot moves.

- Camera | 3D split layout on Go2, G1, and drone blueprints
- Native keyboard teleop in the viewer
- `--viewer rerun|rerun-web|rerun-connect|foxglove|none`

### Drone Support

Drone blueprints modernized to match Go2 composition pattern. `drone-basic` and `drone-agentic` work with replay, Rerun, and the full CLI.

```bash
dimos --replay run drone-basic
dimos --replay run drone-agentic
```

### More

- **Go2 fleet control** — multi-robot with `--robot-ips` (#1487)
- **Replay `--replay-dir`** — select dataset, loops by default (#1519, #1494)
- **Interactive install** — `curl -fsSL .../install.sh | bash` (#1395)
- **Nix on non-Debian Linux** (#1472)
- **Remove Dask** — native worker pool (#1365)
- **Remove asyncio dependency** (#1367)
- **Perceive loop** — continuous observation module for agents (#1411)
- **Worker resource monitor** — `dtop` TUI (#1378)
- **G1 agent wiring fix** (#1518)
- **Rerun rate limiting** — prevents viewer OOM on continuous streams (#1509, #1521)
- **RotatingFileHandler** — prevents unbounded log growth (#1492)
- **Test coverage** (#1397), draft PR CI skip (#1398), manipulation test fixes (#1522)

### Breaking Changes

- `--viewer-backend` renamed to `--viewer`
- Dask removed — blueprints using Dask workers need migration to native worker pool
- Default viewer changed from `rerun-web` to `rerun` (native dimos-viewer)

### Contributors

@spomichter, @PaulNechifor, @ruthwikdasyam, @summeryang, @MustafaBhadsorawala, @leshy, @sambull, @JeffHykin, @RadientBrain

## Contributor License Agreement

- [x] I have read and approved the [CLA](https://github.com/dimensionalOS/dimos/blob/main/CLA.md).
2 participants