Skip to content

refactor(perception): temporal memory rerewite with visualization in rerun #1511

Merged
spomichter merged 22 commits intodevfrom
refactor/temporal-memory
Mar 11, 2026
Merged

refactor(perception): temporal memory rerewite with visualization in rerun #1511
spomichter merged 22 commits intodevfrom
refactor/temporal-memory

Conversation

@spomichter
Copy link
Contributor

@spomichter spomichter commented Mar 10, 2026

Problem

The temporal memory module (dimos/perception/experimental/temporal_memory/) is a ~3300-line VideoRAG-inspired system that had grown into a god class with 13 overlapping data structures, raw threading instead of RxPY, blocking VLM calls, and tightly coupled concerns.

Key issues:

  • temporal_memory.py (785 lines) did everything: frame buffering, windowing, VLM calls, state management, summaries, file I/O, queries, graph DB
  • Raw threading.Thread instead of reactive streams
  • Blocking VLM calls with no skip mechanism when overloaded
  • semantic_relations SQLite table was dead code (schema existed, never written to)
  • extract_time_window() sent an IMAGE to parse TEXT (wasteful VLM call)
  • 13 data structures: 5 in-memory, 4 SQLite tables, 4 files (many redundant)
  • VLM frequency params buried in code, not exposed to users

Solution

Split god class into 5 clean components:

Component File Purpose
TemporalMemory(Module) temporal_memory.py (~350 lines) Thin orchestrator, RxPY pipeline, lifecycle
FrameWindowAccumulator frame_window_accumulator.py (~160 lines) Bounded frame buffer + windowing, pure logic
WindowAnalyzer window_analyzer.py (~165 lines) Isolated VLM calls, stateless, testable
TemporalState temporal_state.py (~170 lines) Typed dataclass, thread-safe snapshot
EntityGraphDB entity_graph_db.py (~350 lines) Cleaned up, dead code removed

All VLM frequency flags exposed in top-level config:
fps, window_s, stride_s, summary_interval_s, enable_distance_estimation, max_distance_pairs, stale_scene_threshold, max_frames_per_window, max_buffer_frames, new_memory, visualize

Storage simplified to two outputs:

  • Per-run: logs/<run>/temporal_memory/temporal_memory.jsonl — raw VLM text + parsed JSON (greppable by agent)
  • Persistent: memory/temporal/entity_graph.db — SQLite, survives across runs. new_memory=True clears.

Dead code removed:

  • temporal_memory_deploy.py, temporal_memory_example.py, temporal_utils/state.py
  • semantic_relations table (never written to)
  • Redundant files: state.json, entities.json, frames_index.jsonl, evidence.jsonl

VLM Call #4 fixed: extract_time_window() is regex-only, no image sent.

Rerun visualization: GraphNodes + GraphEdges for live entity graph, color-coded by type.

Breaking Changes

  • temporal_memory_deploy.py removed — use blueprint/autoconnect instead
  • temporal_memory_example.py removed — replaced by tests
  • Persistent DB location changed from assets/temporal_memory/ to memory/temporal/
  • semantic_relations table no longer created

How to Test

Unit Tests (29 tests, mocked VLMs — no API key needed)

source .venv/bin/activate
DISPLAY=:99 python -m pytest dimos/perception/experimental/temporal_memory/test_temporal_memory_module.py -v -c /dev/null

Integration Test (real VLM, requires OPENAI_API_KEY)

source .venv/bin/activate
export OPENAI_API_KEY=...
DISPLAY=:99 python -m pytest dimos/perception/experimental/temporal_memory/test_temporal_memory_module.py -v -c /dev/null -k "integration" --slow

Blueprint: unitree-go2-temporal-memory

The existing blueprint at dimos/robot/unitree/go2/blueprints/agentic/unitree_go2_temporal_memory.py composes unitree_go2_agentic + temporal_memory() and is registered in all_blueprints.py:

# With real robot or simulation (requires full agentic deps: soundfile, langgraph, etc.)
source .venv/bin/activate
export OPENAI_API_KEY=...
DISPLAY=:99 dimos --simulation run unitree-go2-temporal-memory

# With replay data
DISPLAY=:99 dimos --replay run unitree-go2-temporal-memory

The standalone temporal-memory component is also registered in all_blueprints.py and can be composed with any camera source via autoconnect().

Verified E2E Results (2026-03-10)

Run: 20260310-155811-temporal-memory-replay
JSONL: 17 entries (10 window_analysis + 7 rolling_summary), 33,503 bytes
DB:    10 entities, 10 relations, 57,344 bytes
  E1:  [location] interior office space with an American flag and whiteboard
  E2:  [object]   whiteboard
  E3:  [object]   partially visible furniture piece
  E4:  [object]   open door to an inner room
  E5:  [object]   table
  E6:  [object]   rolling office chair
  E7:  [object]   modern worktable
  E8:  [object]   clutter on worktables
  E9:  [location] open door leading to another room
  E10: [object]   wooden cart with wheels holding cardboard boxes

Contributor License Agreement

  • I have read and approved the CLA

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR is a well-executed refactor that breaks a 785-line god-class temporal memory module into five clean, testable components (TemporalMemory, FrameWindowAccumulator, WindowAnalyzer, TemporalState, EntityGraphDB), adds an RxPY reactive pipeline, exposes all VLM-frequency knobs in a top-level config, removes dead code (semantic_relations table, unused files), and replaces a wasteful VLM-based time-window extractor with a pure-regex implementation.

Key observations:

  • Logic bug in temporal_memory.py: The _analyzer lazy-cache property uses hasattr(self, "__analyzer") with a literal string, but Python name mangling turns self.__analyzer into self._TemporalMemory__analyzer — the hasattr check always returns False, so a new WindowAnalyzer object is created on every window analysis and query call. The caching intent is never realised.
  • Logic bug in temporal_state.py: apply_summary contains a while loop that increments next_summary_at_s by summary_interval_s; if called with summary_interval_s=0.0 the loop never terminates, hanging the caller indefinitely.
  • Missing None guard in window_analyzer.py: answer_query calls raw.strip() without first checking whether raw is None, unlike the consistent guard present in update_summary. The AttributeError is swallowed by the surrounding try/except, but produces a misleading log message.
  • Thread-local DB connections (entity_graph_db.py): close() and commit() only operate on the calling thread's connection. Background distance-estimation threads each open their own thread-local connection that is never explicitly closed.
  • The test suite is substantially improved with 29 unit tests covering all new components, thread-safety, persistence, and JSONL logging — a strong addition.

Confidence Score: 3/5

  • Mergeable after fixing the name-mangling cache bug and the infinite-loop risk in apply_summary.
  • Two confirmed logic bugs (broken _analyzer cache due to Python name mangling; potential infinite loop in apply_summary when summary_interval_s=0) reduce confidence. Both are straightforward one-line fixes, and the overall refactor architecture is sound with good test coverage.
  • temporal_memory.py (name-mangling cache bug) and temporal_state.py (infinite-loop risk in apply_summary) need attention before merge.

Important Files Changed

Filename Overview
dimos/perception/experimental/temporal_memory/temporal_memory.py Refactored from 785-line god class to a ~360-line thin orchestrator; contains a name-mangling bug in the _analyzer lazy-cache property that causes a new WindowAnalyzer to be created on every call.
dimos/perception/experimental/temporal_memory/temporal_state.py New typed, thread-safe state container; contains a potential infinite loop in apply_summary when summary_interval_s=0.0 is passed directly.
dimos/perception/experimental/temporal_memory/window_analyzer.py New stateless VLM interaction layer; well-structured, but answer_query is missing an explicit None check on the VLM response before calling .strip(), unlike the consistent guard in update_summary.
dimos/perception/experimental/temporal_memory/frame_window_accumulator.py New pure-logic bounded frame buffer; correctly thread-safe with a single lock, good stride guard, and clean accessors. No issues found.
dimos/perception/experimental/temporal_memory/entity_graph_db.py Dead semantic_relations table dropped, schema cleaned up; close() and commit() only operate on the calling thread's connection — background distance-estimation thread connections are not explicitly closed.
dimos/perception/experimental/temporal_memory/test_temporal_memory_module.py Significantly expanded test suite (29 unit tests + integration); good coverage of thread safety, new_memory flag, JSONL logging, and all new components. No issues found.
dimos/perception/experimental/temporal_memory/temporal_utils/graph_utils.py Replaces the wasteful VLM-based extract_time_window with a regex-only implementation; semantic_knowledge context key removed; cleaner and no issues found.
dimos/perception/experimental/temporal_memory/init.py Updated exports to include new public components (FrameWindowAccumulator, TemporalState, WindowAnalyzer); no issues found.

Sequence Diagram

sequenceDiagram
    participant Src as VideoSource
    participant TM as TemporalMemory (orchestrator)
    participant FWA as FrameWindowAccumulator
    participant WA as WindowAnalyzer
    participant TS as TemporalState
    participant DB as EntityGraphDB
    participant JSONL as JSONL Log

    Src->>TM: color_image (RxPY stream)
    TM->>FWA: add_frame(img, wall_time)
    Note over TM: interval(stride_s) fires
    TM->>FWA: try_extract_window()
    FWA-->>TM: list[Frame] | None

    TM->>TS: to_dict() → state_dict
    TM->>WA: analyze_window(frames, state_dict, w_start, w_end)
    WA-->>TM: AnalysisResult (parsed + raw_vlm)

    TM->>JSONL: _log_jsonl(window_analysis)
    TM->>DB: save_window_data(parsed, w_end)
    TM->>TS: update_from_window(parsed, w_end, ...) → needs_summary

    opt enable_distance_estimation
        TM->>DB: estimate_and_save_distances(...) [background thread]
    end

    opt needs_summary
        TM->>FWA: latest_frame()
        TM->>TS: snapshot()
        TM->>WA: update_summary(frame, rolling_summary, chunk_buffer)
        WA-->>TM: SummaryResult
        TM->>TS: apply_summary(text, w_end, ...)
        TM->>JSONL: _log_jsonl(rolling_summary)
    end

    Note over TM: query() skill call
    TM->>TS: snapshot()
    TM->>FWA: latest_frame()
    TM->>WA: answer_query(question, context, frame)
    WA-->>TM: QueryResult
    TM->>JSONL: _log_jsonl(query)
    TM-->>Src: answer (str)
Loading

Comments Outside Diff (4)

  1. dimos/perception/experimental/temporal_memory/temporal_state.py, line 160-161 (link)

    Infinite loop when summary_interval_s is zero

    If apply_summary is called with summary_interval_s=0.0, the while loop while self.next_summary_at_s <= w_end + 1e-6 can never terminate because self.next_summary_at_s += 0.0 never advances the counter, hanging the calling thread indefinitely.

    While update_from_window guards against this by only returning True when summary_interval_s > 0, apply_summary is a public method and has no such guard. A defensive check is warranted:

  2. dimos/perception/experimental/temporal_memory/window_analyzer.py, line 162-163 (link)

    Missing None guard before raw.strip()

    self._vlm.query() can return None (as guarded in analyze_window at line 111). If it does here, raw.strip() raises an AttributeError inside the try block — it will be caught and logged as a generic query failed error, making it harder to debug.

    update_summary (line 143) already handles this correctly with if raw and raw.strip(). The same pattern should be applied here:

  3. dimos/perception/experimental/temporal_memory/entity_graph_db.py, line 613-616 (link)

    close() only closes the calling thread's connection

    Because connections are stored in threading.local(), each thread has its own connection. The close() and commit() methods only operate on the connection belonging to the thread that calls them. Background distance-estimation threads (spawned in temporal_memory.py) each open their own connection via _get_connection(), but those connections are never explicitly closed — they are only released when the thread (and therefore the threading.local) is garbage-collected.

    For long-running sessions this is generally fine since SQLite manages the file handles, but it is worth documenting this limitation in the class docstring, and potentially signalling to threads to close their connections before the DB path is removed (e.g. on new_memory=True restart).

  4. dimos/perception/experimental/temporal_memory/temporal_memory.py, line 197-203 (link)

    _analyzer cache broken — name mangling mismatch

    hasattr(self, "__analyzer") passes the literal string "__analyzer" to hasattr. Python does not apply name mangling to string arguments. However, self.__analyzer = ... inside the class body is mangled to self._TemporalMemory__analyzer. This means hasattr always returns False, so a brand-new WindowAnalyzer is created on every single access to _analyzer (every _analyze_window, _update_rolling_summary, and query call). The lazy-cache intent is never realised.

    The fix is to use a single-underscore attribute name (e.g. _analyzer_instance) so no mangling is involved — the hasattr check and the assignment will refer to the same attribute name.

Last reviewed commit: 6b946fa

@spomichter
Copy link
Contributor Author

@greptile

spomichter and others added 17 commits March 11, 2026 13:44
…onents

Architecture refactor of the temporal memory system:

- TemporalMemory(Module): thin orchestrator with reactive RxPY pipeline
- FrameWindowAccumulator: bounded frame buffering + windowing (pure logic)
- WindowAnalyzer: isolated VLM interaction layer (testable, stateless)
- TemporalState: typed dataclass with thread-safe snapshot support
- EntityGraphDB: cleaned up, removed dead semantic_relations table

Config changes:
- TemporalMemoryConfig exposes ALL VLM frequency knobs at top level
- Added db_dir, new_memory, stale_scene_threshold, max_distance_pairs

Storage simplified to two outputs:
- Per-run: temporal_memory.jsonl with raw VLM responses (via get_run_log_dir)
- Persistent: memory/temporal/entity_graph.db (survives across runs)
- Removed: state.json, entities.json, frames_index.jsonl, evidence.jsonl

Other changes:
- extract_time_window uses regex-only (removed wasteful VLM image call)
- Added Rerun GraphNodes/GraphEdges entity graph visualization
- Removed temporal_memory_deploy.py (use blueprints/autoconnect)
- Removed temporal_memory_example.py (replaced by tests)
- Removed temporal_utils/state.py (replaced by TemporalState)
- 29 unit tests covering all components, all passing
…, storage docs

- Replace stale README.md referencing old artifacts (evidence.jsonl,
  state.json, entities.json, frames_index.jsonl, output_dir)
- Document all TemporalMemoryConfig flags with defaults and descriptions
- Add architecture diagram showing 5-component split
- Document storage outputs (per-run JSONL + persistent SQLite DB)
- Add VLM call budget section for cost estimation
- Add standalone blueprint composition example
- Remove redundant temporal_memory.md (was copy of old README)
Three bugs found during live testing on real hardware:

1. JSONL not written in worker processes: get_run_log_dir() only
   checks the in-process global, which is not set in forkserver
   workers. Fall back to DIMOS_RUN_LOG_DIR env var (inherited).

2. Rerun visualization invisible: rr.log() is a no-op without an
   active recording context. Worker processes don't inherit the
   bridge's Rerun connection. Add _ensure_rerun() that lazily
   connects to the gRPC server at 127.0.0.1:9876.

3. Query appears to hang with no feedback: add logging before
   graph context build and VLM call so users can see progress.
…g, fix Rerun

- Persistent DB moved from DIMOS_PROJECT_ROOT/memory/temporal/ to
  ~/.local/state/dimos/temporal_memory/ (XDG compliant, same root
  as per-run logs). Repo root is not the right place for runtime state.

- Log both paths at startup so users can always find their data.

- Warn if no run log dir found (JSONL logging disabled).

- Revert wrong Rerun connect: other DimOS modules just call rr.log()
  and let it no-op if no recording exists. Bridge is the canonical
  Rerun entry point. TemporalMemory follows the same pattern.

- Remove unused DIMOS_PROJECT_ROOT import.
- Update tests to remove stale DIMOS_PROJECT_ROOT patches.
- Update README with new DB location.
Replace the 2D GraphNodes/GraphEdges visualization (separate panel,
disconnected from the scene) with 3D entity markers overlaid on the
world map using the robot's odometry position.

How it works:
- TemporalMemory now subscribes to odometry: In[Odometry]
- Tracks robot (x, y, z) position continuously
- When entities are detected, their world position (robot's position
  at detection time) is stored in the entity DB metadata
- After each window analysis, publishes EntityMarkers on an output
  stream — the Rerun bridge auto-picks it up via to_rerun()
- Renders as rr.Points3D with per-entity labels and colors:
  person=red, object=green, location=blue

New files:
- dimos/msgs/visualization_msgs/EntityMarkers.py — labeled 3D markers
  with JSON-over-LCM serialization and to_rerun() → rr.Points3D
- dimos/msgs/visualization_msgs/__init__.py

Changes:
- TemporalMemory: +odometry input, +entity_markers output, robot pose
  tracking, _publish_entity_markers() replaces _visualize_graph()
- EntityGraphDB.save_window_data(): accepts metadata= kwarg, passes
  world position through to upsert_entity() for all entity types
- Tests: new TestEntityMarkers with publish + to_rerun assertions
…markers

EntityMarkers was getting pLCMTransport (pickle) because it had no
lcm_encode method. The bridge's LCM pubsub couldn't decode pickle
messages. Added lcm_encode/decode aliases and exported the class from
visualization_msgs/__init__.py so resolve_msg_type() finds the class
(not the module) when decoding the LCM channel type suffix.

Verified: entity markers now render as labeled 3D Points in the Rerun
world view, positioned at robot odometry coordinates.
Go2 publishes odom: Out[PoseStamped], not odometry: Out[Odometry].
Different name AND type prevented autoconnect from wiring in
unitree-go2-temporal-memory blueprint. Changed to odom: In[PoseStamped]
which matches the existing convention across all Go2 blueprints.

TemporalMemory only reads .position.x/y/z which PoseStamped provides.
… logging

Two fixes:

1. Entity positions: Changed COALESCE order in upsert_entity SQL from
   COALESCE(excluded.metadata, metadata) to COALESCE(metadata, excluded.metadata).
   Existing metadata (first detection position) is now preserved on
   re-sighting. Previously every update overwrote the position to the
   robot's current location, causing all entities to cluster at one point.

2. Insight logging: Added [temporal-memory] prefixed logger.info() calls
   for all key events visible in terminal:
   - Frame/odom counters (every 20 frames)
   - VLM captions, new entities, entities present, relations
   - Rolling summaries (300 chars)
   - Entity marker publish counts
   - Debug log when no window ready

30 tests pass.
…gging

Storage changes:
- Added persistent JSONL at ~/.local/state/dimos/temporal_memory/
  temporal_memory.jsonl that accumulates across runs (raw VLM output +
  parsed entities). Per-run JSONL still written to run log dir.
- --new-memory CLI flag (GlobalConfig) clears both persistent DB and
  persistent JSONL on startup.

Logging improvements:
- 'waiting for frames' logged at INFO (was debug) for first 3 polls
  then every 10th, so users can see the module is alive before frames
  arrive.
- First frame logged immediately (was: after 20 frames).
- All [temporal-memory] prefixed logs at INFO level.

30 tests pass.
…l_memory/

Robot knowledge is project-specific, not system-wide. The entity graph
and accumulated JSONL belong with the project, not in ~/.local/state/.

Persistent storage now at:
  memory/temporal_memory/entity_graph.db      (entity graph)
  memory/temporal_memory/temporal_memory.jsonl (raw VLM dump)

Per-run logs stay in ~/.local/state/dimos/logs/<run-id>/.

Added memory/temporal_memory/ to .gitignore.
Updated README storage docs.
Path.cwd()/memory/temporal_memory/ instead of dimos.__file__ resolution.
Works for both cloned repos and pip installs — memory lives where you
run 'dimos run' from, same pattern as .git/ or node_modules/.
XDG state dir — predictable, works for pip install and git clone.
No CWD dependency, no repo root detection. Override with db_dir config.
GlobalConfig updates happen in the main process AFTER workers fork,
so workers never see CLI overrides. Fix: CLI sets DIMOS_NEW_MEMORY=1
env var before fork, TemporalMemory checks both GlobalConfig and env.
1. odom.subscribe() now guarded by transport check — module works
   without odometry (entities get (0,0,0) positions). Fixes
   'NoneType has no attribute subscribe' in integration test.

2. Test image value clamped to 255 (was 50+i*40 overflowing uint8).
- Convert async test to sync (no pytest-asyncio needed)
- Start consumer (TemporalMemory) before producer (VideoReplay)
  to avoid race where all frames emit before subscription exists
- Clamp test image values to 255 (uint8 overflow at i=6)
@spomichter spomichter force-pushed the refactor/temporal-memory branch from fb4bbe9 to 8677b31 Compare March 11, 2026 13:45
… env var

Blueprint module is imported lazily by CLI after global_config.update(),
so global_config.new_memory is correct at that point. Pass it through
TemporalMemoryConfig instead of hacking env vars across fork boundary.
…y all windows

stale_scene_threshold=5.0 (mean pixel diff < 5/255) silently skipped
almost every window after the first 2. CLIP filtering already handles
duplicate frames. Default to 0 (disabled). Added INFO log when skip
does fire.
Replay datasets can have negative timestamps (relative to recording
start). The stride check 'current - last < stride' always passed when
timestamps decreased, blocking all windows after the first 2.
Fix: abs() the difference.
@spomichter spomichter changed the title refactor(perception): split temporal memory into clean components refactor(perception): temporal memory rerewite with visualization in rerun Mar 11, 2026
@spomichter spomichter merged commit 514f859 into dev Mar 11, 2026
12 checks passed
@spomichter spomichter deleted the refactor/temporal-memory branch March 11, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant