feat(perception): look out loop by paul-nechifor · Pull Request #1411 · dimensionalOS/dimos

paul-nechifor · 2026-03-04T01:48:05Z

Problem

We need a skill that can scan for something continuously.

Closes DIM-634

Solution

Added a skill that uses Qwen/Moondream VL models to look out for something in a loop.
The skill analyzes the least blurry image in each time window.
Added the ability to disable a module (useful for disabling spatial-memory to save VRAM).
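
The "least blurry image in a time window" step can be illustrated with a minimal sketch. It assumes sharpness is scored as the variance of a discrete Laplacian over a grayscale frame; the PR description does not say which metric the skill actually uses, and `laplacian_variance` and `sharpest_frame` are hypothetical names.

```python
from statistics import pvariance

Frame = list[list[float]]  # grayscale image as rows of pixel intensities


def laplacian_variance(frame: Frame) -> float:
    """Score sharpness as the variance of a 4-neighbour Laplacian (higher = sharper)."""
    responses = []
    for y in range(1, len(frame) - 1):
        for x in range(1, len(frame[0]) - 1):
            responses.append(
                frame[y - 1][x] + frame[y + 1][x]
                + frame[y][x - 1] + frame[y][x + 1]
                - 4 * frame[y][x]
            )
    # Frames too small to have interior pixels get a neutral score.
    return pvariance(responses) if responses else 0.0


def sharpest_frame(window: list[Frame]) -> Frame:
    """Return the frame with the highest sharpness score in the window."""
    return max(window, key=laplacian_variance)


# Illustrative data: a flat frame has no edges, a checkerboard has strong ones.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
best = sharpest_frame([flat, checker])
```

Only the selected frame would then be passed to the VL model, which keeps inference to roughly one call per window.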

Breaking Changes

None

How to Test

MuJoCo test ("lookout for a person (any person) and when you see him, follow him")

uv run pytest -svm mujoco dimos/e2e_tests/test_scan_and_follow_person.py

Contributor License Agreement

I have read and approved the CLA.

greptile-apps · 2026-03-04T01:51:21Z

Greptile Summary

This PR introduces a look-out loop perception skill (PerceiveLoopSkill) that enables an agent to continuously monitor the camera feed for specified objects using a configurable VL model, notify the agent upon detection, and then automatically stop. Supporting changes include a create_vl_model factory with lazy imports, a disabled_modules API on Blueprint for selectively excluding modules at build time, a corresponding --disable CLI flag for the run command, and a new unitree-go2-spatial blueprint that wires PerceiveLoopSkill in alongside the existing spatial memory module.

Key changes and issues found:

perceive_loop_skill.py — lock held during VL inference (logic): Both query_detections and query are called inside with self._lock, meaning these multi-second inference calls block every other lock-protected operation (including stop_looking_out and look_out_for on the agent thread) for their full duration.
perceive_loop_skill.py — TOCTOU in stop_looking_out (logic): The lock is released between capturing active_lookout_str and calling _stop_lookout(), creating a window where a concurrent image callback can clear the lookout (or a new look_out_for can start), causing _stop_lookout() to either cancel the wrong lookout or report incorrect feedback to the agent.
pyproject.toml — duplicate types-psutil (logic): The PR adds a second types-psutil specifier without removing the existing one, resulting in two conflicting entries in both pyproject.toml and uv.lock.
dimos/robot/cli/dimos.py — implicit single-atom assumption (style): The --disable implementation accesses blueprints[0].module with no guard, which is correct for today's single-atom all_modules entries but silently misbehaves for any future multi-atom module blueprint.
The Blueprint.disabled_modules refactor is clean and well-tested. The create_vl_model lazy-import factory is a solid design, though it still lacks an explicit match fallback (flagged in a prior review).

Confidence Score: 2/5

Not safe to merge without addressing the lock-during-inference and TOCTOU issues in PerceiveLoopSkill, as they introduce real liveness and correctness bugs in the core new feature.
The two bugs in perceive_loop_skill.py are in the hot path of the new feature: holding the RLock across slow model inference blocks the agent from cancelling a lookout in a timely manner, and the TOCTOU in stop_looking_out can cause incorrect lookout cancellation. The duplicate dependency is a packaging correctness issue. The surrounding blueprint and CLI work is well-structured.
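
One common way to avoid holding a lock across slow inference is to snapshot the shared state under the lock, run the model call unlocked, then re-acquire the lock and confirm the state is unchanged before acting. The sketch below illustrates that pattern under simplified assumptions; `LookoutSketch` and its `detect` callable are hypothetical stand-ins, not the code from this PR.

```python
import threading
from typing import Callable, Optional


class LookoutSketch:
    def __init__(self, detect: Callable[[object, str], list]) -> None:
        self._lock = threading.RLock()
        self._detect = detect  # stands in for the slow VL model call
        self._active_lookout: Optional[str] = None
        self.matches: list[str] = []

    def look_out_for(self, query: str) -> None:
        with self._lock:
            self._active_lookout = query

    def stop_looking_out(self) -> None:
        with self._lock:
            self._active_lookout = None

    def on_image(self, image: object) -> None:
        # 1. Snapshot shared state under the lock (fast).
        with self._lock:
            query = self._active_lookout
        if query is None:
            return
        # 2. Run the slow inference without holding the lock, so that
        #    stop_looking_out/look_out_for are never blocked by it.
        detections = self._detect(image, query)
        if not detections:
            return
        # 3. Re-acquire and check the lookout was not cancelled or replaced
        #    while inference was running.
        with self._lock:
            if self._active_lookout != query:
                return
            self._active_lookout = None
        self.matches.append(query)
```

The cost of this design is that a result may be discarded when the lookout changed mid-inference, which is usually preferable to blocking the agent thread for seconds.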
dimos/perception/perceive_loop_skill.py requires the most attention — both the lock-scope and the TOCTOU race need to be resolved before this is production-safe. pyproject.toml needs the old types-psutil entry removed.
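
The TOCTOU window in `stop_looking_out` can be closed by reading and clearing the active lookout inside a single critical section. The following is a minimal sketch of that idea; `StopSketch` and the returned message strings are illustrative assumptions, not the skill's actual implementation.

```python
import threading
from typing import Optional


class StopSketch:
    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._active_lookout: Optional[tuple[str, ...]] = None

    def look_out_for(self, items: list[str]) -> str:
        with self._lock:
            self._active_lookout = tuple(items)
        return f"Started looking for {items}"

    def stop_looking_out(self) -> str:
        # Read and clear in one critical section, so no other thread can
        # change the lookout between the check and the cancellation.
        with self._lock:
            stopped = self._active_lookout
            self._active_lookout = None
        if stopped is None:
            return "Not currently looking out for anything."
        # The message reports exactly the lookout that was cancelled.
        return f"Stopped looking out for {list(stopped)}"
```

Because the value reported to the agent is the one captured while the lock was held, the feedback cannot describe a lookout that a concurrent callback has already replaced.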

Important Files Changed

Filename	Overview
dimos/perception/perceive_loop_skill.py	New core module introducing the look-out loop skill. Has two meaningful logic bugs: (1) VL model inference is performed while holding the RLock, blocking `stop_looking_out`/`look_out_for` for the full inference duration; (2) TOCTOU race in `stop_looking_out` between the lock release and the `_stop_lookout()` call.
dimos/core/blueprints.py	Adds `disabled_modules` feature to Blueprint dataclass. Refactors all builder methods to use `dataclasses.replace` instead of manual reconstruction. Clean change; `_active_blueprints` as a `cached_property` on a frozen dataclass follows the pre-existing pattern in this file.
dimos/models/vl/create_vl_model.py	New factory module for VL model creation with lazy imports. Annotated return type is `VlModel` but the `match` block has no fallback case, so an unexpected `name` value would implicitly return `None`, violating the type contract.
dimos/robot/cli/dimos.py	Adds `--disable` CLI option for disabling modules by name. Logic is correct for single-atom blueprints; silently only disables the first atom for any multi-atom blueprint (currently theoretical).
pyproject.toml	Adds `types-psutil>=7.2.2.20260130,<8` to dev deps without removing the pre-existing `>=7.0.0.20251001,<8` entry, resulting in two conflicting specifiers for the same package in both `pyproject.toml` and `uv.lock`.
dimos/core/test_blueprints.py	Adds two new tests for the `disabled_modules` feature: one verifying that disabled modules are skipped at build time, another verifying that `autoconnect` correctly merges `disabled_modules_tuple`. Tests look correct.
dimos/e2e_tests/conftest.py	Moves the `start_person_track` fixture from `test_person_follow.py` into the shared `conftest.py` so it can be reused by the new `test_scan_and_follow_person.py` test. Logic is unchanged.
dimos/e2e_tests/test_scan_and_follow_person.py	New end-to-end test for the scan-and-follow-person flow. Correctly uses the shared fixtures. Test body is straightforward.
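
The `disabled_modules` pattern described for `dimos/core/blueprints.py` (a frozen dataclass, `dataclasses.replace` in builder methods, and a `cached_property` for the active set) can be sketched as follows. This is a simplified illustration: `BlueprintSketch` tracks modules as plain strings and its method names are hypothetical, whereas the real `Blueprint` has a richer structure.

```python
from dataclasses import dataclass, replace
from functools import cached_property


@dataclass(frozen=True)
class BlueprintSketch:
    modules: tuple[str, ...] = ()
    disabled_modules: frozenset[str] = frozenset()

    def disable(self, *names: str) -> "BlueprintSketch":
        # replace() builds a new frozen instance; the original is untouched.
        return replace(self, disabled_modules=self.disabled_modules | set(names))

    @cached_property
    def active_modules(self) -> tuple[str, ...]:
        # cached_property stores its value in the instance __dict__ directly,
        # which is why it works on a frozen (non-slots) dataclass.
        return tuple(m for m in self.modules if m not in self.disabled_modules)


full = BlueprintSketch(modules=("camera", "spatial-memory", "perceive-loop"))
trimmed = full.disable("spatial-memory")
```

Here `trimmed.active_modules` omits `spatial-memory` while `full` still lists all three modules, mirroring how disabled modules are skipped at build time.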

Sequence Diagram

sequenceDiagram
    participant Agent
    participant PerceiveLoopSkill
    participant ImageStream
    participant VlModel

    Agent->>PerceiveLoopSkill: look_out_for(["person"])
    activate PerceiveLoopSkill
    PerceiveLoopSkill->>VlModel: start()
    PerceiveLoopSkill->>ImageStream: subscribe(sharpness_window)
    PerceiveLoopSkill-->>Agent: "Started looking for ["person"]..."
    deactivate PerceiveLoopSkill

    loop Every ~2s (sharpness window)
        ImageStream->>PerceiveLoopSkill: _on_image(frame)
        activate PerceiveLoopSkill
        Note over PerceiveLoopSkill: Acquires _lock
        PerceiveLoopSkill->>VlModel: query_detections(image, active_lookout_str)
        VlModel-->>PerceiveLoopSkill: detections (empty → return)
        Note over PerceiveLoopSkill: Releases _lock
        deactivate PerceiveLoopSkill
    end

    ImageStream->>PerceiveLoopSkill: _on_image(frame) [match found]
    activate PerceiveLoopSkill
    Note over PerceiveLoopSkill: Acquires _lock (held for entire block)
    PerceiveLoopSkill->>VlModel: query_detections(image, active_lookout_str)
    VlModel-->>PerceiveLoopSkill: detections [non-empty]
    PerceiveLoopSkill->>PerceiveLoopSkill: dispose subscription, clear _active_lookout
    PerceiveLoopSkill->>VlModel: query(image, describe_prompt)
    VlModel-->>PerceiveLoopSkill: description
    PerceiveLoopSkill->>VlModel: stop()
    Note over PerceiveLoopSkill: Releases _lock
    PerceiveLoopSkill->>Agent: add_message("Found a match for ('person',)...")
    deactivate PerceiveLoopSkill

    Agent->>PerceiveLoopSkill: stop_looking_out()
    activate PerceiveLoopSkill
    Note over PerceiveLoopSkill: Acquires _lock → reads active_lookout_str → releases _lock
    PerceiveLoopSkill->>PerceiveLoopSkill: _stop_lookout() [re-acquires _lock]
    PerceiveLoopSkill-->>Agent: "Stopped looking out for..."
    deactivate PerceiveLoopSkill

Last reviewed commit: 32fbd6b

Comments Outside Diff (1)


Duplicate types-psutil dependency entry

This line adds a second types-psutil entry to the dev dependencies. The original entry types-psutil>=7.0.0.20251001,<8 already existed in pyproject.toml (as confirmed by the uv.lock diff, which now contains both specifiers side-by-side). Having two entries for the same package with overlapping but different lower bounds is at best redundant and at worst can confuse the resolver.

The old entry should be removed and replaced by this one, not added alongside it. Please delete the pre-existing "types-psutil>=7.0.0.20251001,<8" line.
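
A duplicate like this can be caught mechanically. The sketch below is a hypothetical helper, not part of the PR: it extracts the package name from each requirement string and reports names that occur more than once. The two `types-psutil` specifiers are taken from the review; the `pytest` entry is made up for illustration.

```python
import re
from collections import Counter


def duplicate_requirements(specs: list[str]) -> list[str]:
    """Return normalized package names that appear more than once."""
    names = []
    for spec in specs:
        found = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
        if found:
            # Normalize as in PEP 503: runs of -, _ and . become a single "-".
            names.append(re.sub(r"[-_.]+", "-", found.group(1)).lower())
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


dev_deps = [
    "types-psutil>=7.0.0.20251001,<8",
    "types-psutil>=7.2.2.20260130,<8",
    "pytest>=8",  # illustrative entry, not taken from the PR
]
duplicates = duplicate_requirements(dev_deps)
```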

dimos/perception/perceive_loop_skill.py

dimos/models/vl/create_vl_model.py

dimos/e2e_tests/conftest.py

dimos/perception/perceive_loop_skill.py

dimos/robot/cli/dimos.py

dimos/models/vl/create.py

dimos/perception/perceive_loop_skill.py

leshy

Looks good; small suggestions at Paul's discretion.

@spomichter

Release v0.0.11

82 PRs, 10 contributors, 396 files changed.

This release brings a production CLI, MCP tooling, temporal memory, and first-class support for coding agents. Dask has been removed. The entire stack now runs from `dimos run` through `dimos stop`.

### Agent-Native Development

DimOS is now built to be driven by coding agents. Point OpenClaw, Claude Code, or Cursor at [AGENTS.md](AGENTS.md) and they can build, run, and debug Dimensional applications using the CLI and MCP interfaces directly.

- **AGENTS.md**: comprehensive onboarding doc covering architecture, CLI reference, skill rules, and blueprint quick-reference. Your agent reads this and starts coding.
- **MCP server**: all `@skill` methods exposed as HTTP tools. External agents call `dimos mcp call relative_move --arg forward=0.5` or connect via JSON-RPC.
- **MCP CLI**: `dimos mcp list-tools`, `dimos mcp call`, `dimos mcp status`, `dimos mcp modules`
- **Agent context logging**: MCP tool calls and agent messages logged to per-run JSONL for debugging and replay.

### CLI & Daemon

Full process lifecycle; no more Ctrl-C in tmux.

- `dimos run --daemon`: background execution with health checks and run registry
- `dimos stop [--force]`: graceful shutdown with SIGTERM → SIGKILL fallback
- `dimos restart`: replays the original CLI arguments
- `dimos status`: PID, blueprint, uptime, MCP port
- `dimos log -f`: structured per-run logs with follow, JSON output, filtering
- `dimos show-config`: resolved GlobalConfig with source tracing

### Temporal-Spatial Memory

Robots in physical space ingest hours of video and lidar. Temporal-spatial memory gives them a human-like understanding of the world: causal object relationships, entity tracking through time and physical space, and the ability to answer complex temporal queries: *Who spends the most time in the kitchen? What time on average do I wake up? Which set of switches toggles the main lights? Who was at the office at 9am last Thursday?*

Traditional frame-level embeddings (CLIP, ViT) lose temporal context and don't scale beyond a handful of frames. Video transformers are expensive and don't operate in RGB-D. Dimensional agents work with video + lidar natively, tracking entities across hours and days.

```bash
dimos --replay --replay-dir unitree_go2_office_walk2 run unitree-go2-temporal-memory
```

### Interactive Viewer

Custom Rerun fork (`dimos-viewer`) is now the default. Click-to-navigate: click a point in the 3D view → PointStamped → A* planner → robot moves.

- Camera | 3D split layout on Go2, G1, and drone blueprints
- Native keyboard teleop in the viewer
- `--viewer rerun|rerun-web|rerun-connect|foxglove|none`

### Drone Support

Drone blueprints modernized to match Go2 composition pattern. `drone-basic` and `drone-agentic` work with replay, Rerun, and the full CLI.

```bash
dimos --replay run drone-basic
dimos --replay run drone-agentic
```

### More

- **Go2 fleet control**: multi-robot with `--robot-ips` (#1487)
- **Replay `--replay-dir`**: select dataset, loops by default (#1519, #1494)
- **Interactive install**: `curl -fsSL .../install.sh | bash` (#1395)
- **Nix on non-Debian Linux** (#1472)
- **Remove Dask**: native worker pool (#1365)
- **Remove asyncio dependency** (#1367)
- **Perceive loop**: continuous observation module for agents (#1411)
- **Worker resource monitor**: `dtop` TUI (#1378)
- **G1 agent wiring fix** (#1518)
- **Rerun rate limiting**: prevents viewer OOM on continuous streams (#1509, #1521)
- **RotatingFileHandler**: prevents unbounded log growth (#1492)
- **Test coverage** (#1397), draft PR CI skip (#1398), manipulation test fixes (#1522)

### Breaking Changes

- `--viewer-backend` renamed to `--viewer`
- Dask removed: blueprints using Dask workers need migration to native worker pool
- Default viewer changed from `rerun-web` to `rerun` (native dimos-viewer)

### Contributors

@spomichter, @PaulNechifor, @ruthwikdasyam, @summeryang, @MustafaBhadsorawala, @leshy, @sambull, @JeffHykin, @RadientBrain

## Contributor License Agreement

- [x] I have read and approved the [CLA](https://github.com/dimensionalOS/dimos/blob/main/CLA.md).

paul-nechifor marked this pull request as draft March 4, 2026 01:48

greptile-apps bot reviewed Mar 4, 2026


paul-nechifor force-pushed the paul/feat/perceive-loop branch from 5002609 to 32fbd6b Compare March 4, 2026 06:46

paul-nechifor assigned paul-nechifor and unassigned paul-nechifor Mar 5, 2026

paul-nechifor marked this pull request as ready for review March 5, 2026 01:38

greptile-apps bot reviewed Mar 5, 2026

dimos/perception/perceive_loop_skill.py

dimos/perception/perceive_loop_skill.py

dimos/robot/cli/dimos.py

paul-nechifor force-pushed the paul/feat/perceive-loop branch from 32fbd6b to 649db7b Compare March 7, 2026 08:31

leshy reviewed Mar 9, 2026

dimos/models/vl/create.py

leshy reviewed Mar 9, 2026

dimos/perception/perceive_loop_skill.py

leshy reviewed Mar 9, 2026

dimos/perception/perceive_loop_skill.py

leshy reviewed Mar 9, 2026

dimos/perception/perceive_loop_skill.py

leshy previously approved these changes Mar 9, 2026


paul-nechifor dismissed leshy’s stale review via 0e07984 March 9, 2026 11:39

paul-nechifor force-pushed the paul/feat/perceive-loop branch 2 times, most recently from 0e07984 to ea7a704 Compare March 9, 2026 22:37

feat(perception): look out loop

dfb30a0

paul-nechifor force-pushed the paul/feat/perceive-loop branch from ea7a704 to dfb30a0 Compare March 9, 2026 22:40

leshy approved these changes Mar 10, 2026


paul-nechifor merged commit 6d85549 into dev Mar 10, 2026
12 checks passed

spomichter mentioned this pull request Mar 11, 2026

Release v0.0.11 #1526

Merged

1 task

paul-nechifor merged 1 commit into dev from paul/feat/perceive-loop
