refactor(perception): temporal memory rewrite with visualization in rerun #1511
spomichter merged 22 commits into dev
Conversation
Greptile Summary

This PR is a well-executed refactor that breaks a 785-line god-class temporal memory module into five clean, testable components.

Key observations:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Src as VideoSource
    participant TM as TemporalMemory (orchestrator)
    participant FWA as FrameWindowAccumulator
    participant WA as WindowAnalyzer
    participant TS as TemporalState
    participant DB as EntityGraphDB
    participant JSONL as JSONL Log
    Src->>TM: color_image (RxPY stream)
    TM->>FWA: add_frame(img, wall_time)
    Note over TM: interval(stride_s) fires
    TM->>FWA: try_extract_window()
    FWA-->>TM: list[Frame] | None
    TM->>TS: to_dict() → state_dict
    TM->>WA: analyze_window(frames, state_dict, w_start, w_end)
    WA-->>TM: AnalysisResult (parsed + raw_vlm)
    TM->>JSONL: _log_jsonl(window_analysis)
    TM->>DB: save_window_data(parsed, w_end)
    TM->>TS: update_from_window(parsed, w_end, ...) → needs_summary
    opt enable_distance_estimation
        TM->>DB: estimate_and_save_distances(...) [background thread]
    end
    opt needs_summary
        TM->>FWA: latest_frame()
        TM->>TS: snapshot()
        TM->>WA: update_summary(frame, rolling_summary, chunk_buffer)
        WA-->>TM: SummaryResult
        TM->>TS: apply_summary(text, w_end, ...)
        TM->>JSONL: _log_jsonl(rolling_summary)
    end
    Note over TM: query() skill call
    TM->>TS: snapshot()
    TM->>FWA: latest_frame()
    TM->>WA: answer_query(question, context, frame)
    WA-->>TM: QueryResult
    TM->>JSONL: _log_jsonl(query)
    TM-->>Src: answer (str)
```
…onents

Architecture refactor of the temporal memory system:
- TemporalMemory(Module): thin orchestrator with reactive RxPY pipeline
- FrameWindowAccumulator: bounded frame buffering + windowing (pure logic)
- WindowAnalyzer: isolated VLM interaction layer (testable, stateless)
- TemporalState: typed dataclass with thread-safe snapshot support
- EntityGraphDB: cleaned up, removed dead semantic_relations table

Config changes:
- TemporalMemoryConfig exposes ALL VLM frequency knobs at top level
- Added db_dir, new_memory, stale_scene_threshold, max_distance_pairs

Storage simplified to two outputs:
- Per-run: temporal_memory.jsonl with raw VLM responses (via get_run_log_dir)
- Persistent: memory/temporal/entity_graph.db (survives across runs)
- Removed: state.json, entities.json, frames_index.jsonl, evidence.jsonl

Other changes:
- extract_time_window uses regex-only (removed wasteful VLM image call)
- Added Rerun GraphNodes/GraphEdges entity graph visualization
- Removed temporal_memory_deploy.py (use blueprints/autoconnect)
- Removed temporal_memory_example.py (replaced by tests)
- Removed temporal_utils/state.py (replaced by TemporalState)
- 29 unit tests covering all components, all passing
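The bounded buffering plus windowing split can be sketched in pure Python. This is an assumed shape, not the actual DimOS class: the field names, uniform subsampling strategy, and constructor defaults are illustrative.

```python
from collections import deque


class FrameWindowAccumulator:
    """Sketch of a bounded frame buffer that hands out the frames
    covering the most recent window_s seconds (hypothetical shape)."""

    def __init__(self, window_s: float = 10.0, max_buffer_frames: int = 300,
                 max_frames_per_window: int = 8):
        self.window_s = window_s
        self.max_frames_per_window = max_frames_per_window
        # deque(maxlen=...) gives the bounded buffer: oldest frames drop off.
        self._buf: deque = deque(maxlen=max_buffer_frames)  # (wall_time, frame)

    def add_frame(self, frame, wall_time: float) -> None:
        self._buf.append((wall_time, frame))

    def try_extract_window(self):
        """Return up to max_frames_per_window frames spanning window_s, or None."""
        if not self._buf:
            return None
        end = self._buf[-1][0]
        window = [f for t, f in self._buf if end - t <= self.window_s]
        if not window:
            return None
        # Uniformly subsample so the VLM sees a bounded number of frames.
        step = max(1, len(window) // self.max_frames_per_window)
        return window[::step][: self.max_frames_per_window]
```

Keeping this logic free of threads and VLM calls is what makes it unit-testable without mocks.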
…, storage docs

- Replace the stale README.md referencing old artifacts (evidence.jsonl, state.json, entities.json, frames_index.jsonl, output_dir)
- Document all TemporalMemoryConfig flags with defaults and descriptions
- Add an architecture diagram showing the 5-component split
- Document storage outputs (per-run JSONL + persistent SQLite DB)
- Add a VLM call budget section for cost estimation
- Add a standalone blueprint composition example
- Remove the redundant temporal_memory.md (was a copy of the old README)
Three bugs found during live testing on real hardware:

1. JSONL not written in worker processes: get_run_log_dir() only checks the in-process global, which is not set in forkserver workers. Fall back to the DIMOS_RUN_LOG_DIR env var (inherited).
2. Rerun visualization invisible: rr.log() is a no-op without an active recording context, and worker processes don't inherit the bridge's Rerun connection. Add _ensure_rerun() that lazily connects to the gRPC server at 127.0.0.1:9876.
3. Query appears to hang with no feedback: add logging before the graph context build and the VLM call so users can see progress.
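Fix 1 amounts to an env-var fallback for state that forkserver workers never receive. A minimal sketch (not the actual DimOS function; the global's name is illustrative, the env var name comes from the commit):

```python
import os
from pathlib import Path

# In-process global set by the main process; forkserver workers never see it,
# because forkserver children do not inherit post-fork module state.
_run_log_dir = None  # type: Path | None


def get_run_log_dir():
    """Prefer the in-process global; fall back to the DIMOS_RUN_LOG_DIR
    environment variable, which child processes do inherit."""
    if _run_log_dir is not None:
        return _run_log_dir
    env = os.environ.get("DIMOS_RUN_LOG_DIR")
    return Path(env) if env else None
```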
…g, fix Rerun

- Persistent DB moved from DIMOS_PROJECT_ROOT/memory/temporal/ to ~/.local/state/dimos/temporal_memory/ (XDG compliant, same root as per-run logs). The repo root is not the right place for runtime state.
- Log both paths at startup so users can always find their data.
- Warn if no run log dir is found (JSONL logging disabled).
- Revert the wrong Rerun connect: other DimOS modules just call rr.log() and let it no-op if no recording exists. The bridge is the canonical Rerun entry point; TemporalMemory follows the same pattern.
- Remove the unused DIMOS_PROJECT_ROOT import.
- Update tests to remove stale DIMOS_PROJECT_ROOT patches.
- Update the README with the new DB location.
Replace the 2D GraphNodes/GraphEdges visualization (separate panel, disconnected from the scene) with 3D entity markers overlaid on the world map using the robot's odometry position.

How it works:
- TemporalMemory now subscribes to odometry: In[Odometry]
- Tracks the robot's (x, y, z) position continuously
- When entities are detected, their world position (the robot's position at detection time) is stored in the entity DB metadata
- After each window analysis, publishes EntityMarkers on an output stream — the Rerun bridge auto-picks it up via to_rerun()
- Renders as rr.Points3D with per-entity labels and colors: person=red, object=green, location=blue

New files:
- dimos/msgs/visualization_msgs/EntityMarkers.py — labeled 3D markers with JSON-over-LCM serialization and to_rerun() → rr.Points3D
- dimos/msgs/visualization_msgs/__init__.py

Changes:
- TemporalMemory: +odometry input, +entity_markers output, robot pose tracking, _publish_entity_markers() replaces _visualize_graph()
- EntityGraphDB.save_window_data(): accepts a metadata= kwarg, passes the world position through to upsert_entity() for all entity types
- Tests: new TestEntityMarkers with publish + to_rerun assertions
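The marker-to-archetype mapping can be sketched as pure data construction. Only rr.Points3D and the color scheme come from the PR; the marker dict fields and helper name are hypothetical.

```python
# Per-type colors from the PR: person=red, object=green, location=blue.
TYPE_COLORS = {"person": (255, 0, 0), "object": (0, 255, 0), "location": (0, 0, 255)}


def markers_to_points3d_args(markers):
    """Return (positions, colors, labels) ready to pass to
    rerun's rr.Points3D(positions, colors=..., labels=...)."""
    positions = [(m["x"], m["y"], m["z"]) for m in markers]
    colors = [TYPE_COLORS.get(m["type"], (128, 128, 128)) for m in markers]
    labels = [m["label"] for m in markers]
    return positions, colors, labels
```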
…markers

EntityMarkers was getting pLCMTransport (pickle) because it had no lcm_encode method, and the bridge's LCM pubsub couldn't decode pickle messages. Added lcm_encode/decode aliases and exported the class from visualization_msgs/__init__.py so resolve_msg_type() finds the class (not the module) when decoding the LCM channel type suffix.

Verified: entity markers now render as labeled 3D points in the Rerun world view, positioned at robot odometry coordinates.
Go2 publishes odom: Out[PoseStamped], not odometry: Out[Odometry]. The different name AND type prevented autoconnect from wiring it in the unitree-go2-temporal-memory blueprint. Changed to odom: In[PoseStamped], which matches the existing convention across all Go2 blueprints. TemporalMemory only reads .position.x/y/z, which PoseStamped provides.
… logging

Two fixes:

1. Entity positions: changed the COALESCE order in the upsert_entity SQL from COALESCE(excluded.metadata, metadata) to COALESCE(metadata, excluded.metadata). Existing metadata (the first-detection position) is now preserved on re-sighting. Previously every update overwrote the position with the robot's current location, causing all entities to cluster at one point.
2. Insight logging: added [temporal-memory] prefixed logger.info() calls for all key events visible in the terminal:
   - Frame/odom counters (every 20 frames)
   - VLM captions, new entities, entities present, relations
   - Rolling summaries (300 chars)
   - Entity marker publish counts
   - Debug log when no window is ready

30 tests pass.
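The COALESCE fix hinges on SQLite upsert scoping: in a DO UPDATE SET clause, a bare column name refers to the existing row while excluded.* refers to the incoming values. A minimal sketch with a hypothetical table (the real entity schema has more columns):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT PRIMARY KEY, metadata TEXT)")

# Corrected argument order: the existing row's metadata wins, so the
# first-detection position survives later re-sightings of the entity.
UPSERT = """
INSERT INTO entities (name, metadata) VALUES (?, ?)
ON CONFLICT(name) DO UPDATE SET
    metadata = COALESCE(metadata, excluded.metadata)
"""
conn.execute(UPSERT, ("person_1", json.dumps({"x": 1.0, "y": 2.0, "z": 0.0})))
# Second sighting at a different robot position: must NOT overwrite.
conn.execute(UPSERT, ("person_1", json.dumps({"x": 9.0, "y": 9.0, "z": 0.0})))
(meta,) = conn.execute(
    "SELECT metadata FROM entities WHERE name = 'person_1'"
).fetchone()
```

With the old order, COALESCE(excluded.metadata, metadata), the incoming value always won, which is exactly the clustering bug described above.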
…gging

Storage changes:
- Added persistent JSONL at ~/.local/state/dimos/temporal_memory/temporal_memory.jsonl that accumulates across runs (raw VLM output + parsed entities). Per-run JSONL is still written to the run log dir.
- --new-memory CLI flag (GlobalConfig) clears both the persistent DB and the persistent JSONL on startup.

Logging improvements:
- 'waiting for frames' logged at INFO (was DEBUG) for the first 3 polls, then every 10th, so users can see the module is alive before frames arrive.
- First frame logged immediately (was: after 20 frames).
- All [temporal-memory] prefixed logs at INFO level.

30 tests pass.
…l_memory/

Robot knowledge is project-specific, not system-wide. The entity graph and accumulated JSONL belong with the project, not in ~/.local/state/.

Persistent storage now at:
- memory/temporal_memory/entity_graph.db (entity graph)
- memory/temporal_memory/temporal_memory.jsonl (raw VLM dump)

Per-run logs stay in ~/.local/state/dimos/logs/<run-id>/. Added memory/temporal_memory/ to .gitignore. Updated README storage docs.
Use Path.cwd()/memory/temporal_memory/ instead of dimos.__file__ resolution. This works for both cloned repos and pip installs — memory lives wherever you run 'dimos run' from, the same pattern as .git/ or node_modules/.
XDG state dir — predictable, works for pip install and git clone. No CWD dependency, no repo root detection. Override with db_dir config.
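The XDG resolution argued for here is simple to express. A sketch with a hypothetical helper name; the DimOS code may resolve this differently:

```python
import os
from pathlib import Path


def temporal_memory_state_dir():
    """Resolve the XDG state directory: honor XDG_STATE_HOME when set,
    else fall back to the spec default of ~/.local/state."""
    base = os.environ.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(base) / "dimos" / "temporal_memory"
```

The tradeoff between this and the CWD-based layout above is exactly the one the commits debate: predictability versus keeping robot knowledge next to the project.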
GlobalConfig updates happen in the main process AFTER workers fork, so workers never see CLI overrides. Fix: CLI sets DIMOS_NEW_MEMORY=1 env var before fork, TemporalMemory checks both GlobalConfig and env.
1. odom.subscribe() is now guarded by a transport check — the module works without odometry (entities get (0, 0, 0) positions). Fixes 'NoneType has no attribute subscribe' in the integration test.
2. Test image value clamped to 255 (50 + i*40 was overflowing uint8).
- Convert the async test to sync (no pytest-asyncio needed)
- Start the consumer (TemporalMemory) before the producer (VideoReplay) to avoid a race where all frames emit before the subscription exists
- Clamp test image values to 255 (uint8 overflow at i=6)
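The uint8 overflow is easy to reproduce: numpy silently wraps 290 to 34 on cast, so frame 6 would look darker than frame 5 and break any brightness-based diff. A sketch of a clamped test-frame generator (the helper name is illustrative):

```python
import numpy as np


def make_test_frame(i: int, size: int = 32) -> np.ndarray:
    """Solid-color test frame whose brightness grows with the index.
    Clamp before the uint8 cast: 50 + i * 40 exceeds 255 at i = 6
    and would otherwise wrap around (290 -> 34)."""
    value = min(50 + i * 40, 255)
    return np.full((size, size, 3), value, dtype=np.uint8)
```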
Force-pushed from fb4bbe9 to 8677b31
… env var

The blueprint module is imported lazily by the CLI after global_config.update(), so global_config.new_memory is correct at that point. Pass it through TemporalMemoryConfig instead of hacking env vars across the fork boundary.
…y all windows

stale_scene_threshold=5.0 (mean pixel diff < 5/255) silently skipped almost every window after the first 2. CLIP filtering already handles duplicate frames. Default to 0 (disabled). Added an INFO log when the skip does fire.
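The stale-scene gate can be sketched as follows. The semantics are assumed from the commit message (mean absolute pixel diff on the 0-255 scale, zero disables the check); the function name is illustrative:

```python
import numpy as np


def scene_is_stale(prev: np.ndarray, cur: np.ndarray, threshold: float = 0.0) -> bool:
    """Skip a window when the mean absolute pixel difference between the
    previous and current frame falls below threshold. threshold=0 disables
    the gate entirely, the new default, since CLIP filtering already
    handles duplicate frames downstream."""
    if threshold <= 0.0:
        return False
    # float32 before subtracting to avoid uint8 wraparound in the diff.
    diff = np.abs(cur.astype(np.float32) - prev.astype(np.float32)).mean()
    return float(diff) < threshold
```

This also shows why threshold=5.0 was so aggressive: any two frames of a mostly static scene easily fall under a mean diff of 5/255.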
Replay datasets can have negative timestamps (relative to recording start). The stride check 'current - last < stride' always passed when timestamps decreased, blocking all windows after the first 2. Fix: abs() the difference.
Problem
The temporal memory module (`dimos/perception/experimental/temporal_memory/`) is a ~3300-line VideoRAG-inspired system that had grown into a god class with 13 overlapping data structures, raw threading instead of RxPY, blocking VLM calls, and tightly coupled concerns.

Key issues:
- `temporal_memory.py` (785 lines) did everything: frame buffering, windowing, VLM calls, state management, summaries, file I/O, queries, graph DB
- `threading.Thread` instead of reactive streams
- `semantic_relations` SQLite table was dead code (schema existed, never written to)
- `extract_time_window()` sent an IMAGE to parse TEXT (wasteful VLM call)

Solution
Split god class into 5 clean components:
- `TemporalMemory(Module)` — `temporal_memory.py` (~350 lines)
- `FrameWindowAccumulator` — `frame_window_accumulator.py` (~160 lines)
- `WindowAnalyzer` — `window_analyzer.py` (~165 lines)
- `TemporalState` — `temporal_state.py` (~170 lines)
- `EntityGraphDB` — `entity_graph_db.py` (~350 lines)

All VLM frequency flags exposed in top-level config:
`fps`, `window_s`, `stride_s`, `summary_interval_s`, `enable_distance_estimation`, `max_distance_pairs`, `stale_scene_threshold`, `max_frames_per_window`, `max_buffer_frames`, `new_memory`, `visualize`

Storage simplified to two outputs:
- `logs/<run>/temporal_memory/temporal_memory.jsonl` — raw VLM text + parsed JSON (greppable by agent)
- `memory/temporal/entity_graph.db` — SQLite, survives across runs; `new_memory=True` clears it.

Dead code removed:
- `temporal_memory_deploy.py`, `temporal_memory_example.py`, `temporal_utils/state.py`
- `semantic_relations` table (never written to)

VLM Call #4 fixed: `extract_time_window()` is regex-only, no image sent.

Rerun visualization: `GraphNodes` + `GraphEdges` for live entity graph, color-coded by type.

Breaking Changes

- `temporal_memory_deploy.py` removed — use blueprint/autoconnect instead
- `temporal_memory_example.py` removed — replaced by tests
- Storage moved from `assets/temporal_memory/` to `memory/temporal/`
- `semantic_relations` table no longer created

How to Test
Unit Tests (29 tests, mocked VLMs — no API key needed)
```shell
source .venv/bin/activate
DISPLAY=:99 python -m pytest dimos/perception/experimental/temporal_memory/test_temporal_memory_module.py -v -c /dev/null
```

Integration Test (real VLM, requires `OPENAI_API_KEY`)

Blueprint: `unitree-go2-temporal-memory`
The existing blueprint at `dimos/robot/unitree/go2/blueprints/agentic/unitree_go2_temporal_memory.py` composes `unitree_go2_agentic` + `temporal_memory()` and is registered in `all_blueprints.py`. The standalone `temporal-memory` component is also registered in `all_blueprints.py` and can be composed with any camera source via `autoconnect()`.

Verified E2E Results (2026-03-10)