ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2 by spomichter · Pull Request #345 · dimensionalOS/dimos

spomichter · 2025-06-11T11:26:11Z

Model unit tests for @mdaiter

Not merged yet as I still need to test on my Mac. So far I have run it on my Ubuntu 22.04 machine with CUDA 11.7 / Torch 2.0.1. I also need to push a clean Python install in setup.py so you can run pip install -e ".[mac]" or pip install -e ".[arm,gpu]"

To run:

pytest -s dimos/agents/memory dimos/perception
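
For reference, a rough sketch of how the extras mentioned above might be declared and what a spec like ".[arm,gpu]" would select. The group names (mac, arm, gpu) come from the comment; the package lists and the resolve_extras helper are placeholders for illustration, not what setup.py in this repo contains.

```python
# Hypothetical extras table; in setup.py it would be passed to setuptools as
# setup(..., extras_require=EXTRAS). Package lists are illustrative only.
EXTRAS = {
    "mac": ["onnxruntime"],
    "arm": ["onnx"],
    "gpu": ["onnxruntime-gpu"],
}


def resolve_extras(spec: str, extras: dict = EXTRAS) -> list:
    """Return the de-duplicated packages selected by a spec such as '.[arm,gpu]'."""
    start, end = spec.find("["), spec.rfind("]")
    if start == -1 or end < start:
        return []  # no extras requested, e.g. plain "."
    names = [n.strip() for n in spec[start + 1:end].split(",") if n.strip()]
    unknown = [n for n in names if n not in extras]
    if unknown:
        raise KeyError(f"unknown extras: {unknown}")
    selected = {}  # dict keeps insertion order and drops duplicates
    for name in names:
        for pkg in extras[name]:
            selected.setdefault(pkg, None)
    return list(selected)
```

Keeping the groups in one table makes it easy to see which platform pulls in which runtime before anything is installed.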

leshy · 2025-06-11T11:56:12Z

dimos/agents/memory/test_image_embedding.py

+        """Return the path to the test video."""
+        # Use a video file from assets directory
+        base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../assets"))
+        video_file = "trimmed_video_office.mov"  # Use the same test video as YOLO test


should use LFS store for these files.
merge this, #342 then read:
https://github.com/dimensionalOS/dimos/blob/dev/docs/testing_stream_reply.md

would be something like

video_dir = testData("my_video_dir")

the rest the same, or you can upload individual videos separately

video_path = testData("some_specific_video.mp4")

can do in the follow up PR if annoying but plz let's do it :D
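
A minimal sketch of what the suggestion above could look like inside a test module. The real testData helper lives in the dimos repo and is described in the linked doc; resolve_test_data and DATA_ROOT below are hypothetical stand-ins that only resolve a local path and do not fetch anything from LFS.

```python
from pathlib import Path

# Assumed location of the LFS-backed test assets (hypothetical default).
DATA_ROOT = Path("tests/data")


def resolve_test_data(name: str, root: Path = DATA_ROOT) -> Path:
    """Resolve a test asset (file or directory) by name under the data root.

    Stand-in for the repo's testData helper: it only checks that the path
    exists locally and raises otherwise.
    """
    path = root / name
    if not path.exists():
        raise FileNotFoundError(f"test asset not found: {path}")
    return path
```

A test would then call resolve_test_data("some_specific_video.mp4") instead of building a path relative to __file__, so the asset location is defined in one place.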

Nice sounds good

So LFS seems to be set up only for the tests/data dir, right @leshy? Is it worth adding it to /assets, or automatically to any directory matching "*data", so that @mdaiter can store ONNX YOLO binaries in dimos/models/yolo/data/yolo.onnx or wherever?

@leshy Will do this after merge since your testing stream stuff is in dev

leshy · 2025-06-11T12:00:14Z

dimos/agents/memory/test_image_embedding.py

+from dimos.agents.memory.image_embedding import ImageEmbeddingProvider
+
+
+class TestImageEmbedding:


all of this works without a class, not sure why we are embedding tests in class methods. not very important, but we might care about it for dimos-wide uniformity
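
To illustrate the point, here is the same check written both ways. pytest collects module-level test_* functions as well as test_* methods on Test* classes, so the class adds nothing unless shared setup is needed. fake_embed is a hypothetical stand-in so the sketch runs without the dimos package; it is not the ImageEmbeddingProvider API.

```python
import math


def fake_embed(pixels):
    """Hypothetical embedding: L2-normalise the input values."""
    norm = math.sqrt(sum(p * p for p in pixels)) or 1.0
    return [p / norm for p in pixels]


# Class-based style, as in the PR.
class TestImageEmbedding:
    def test_unit_norm(self):
        vec = fake_embed([3.0, 4.0])
        assert math.isclose(sum(v * v for v in vec), 1.0)


# Equivalent plain-function style; pytest collects it the same way.
def test_unit_norm():
    vec = fake_embed([3.0, 4.0])
    assert math.isclose(sum(v * v for v in vec), 1.0)
```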

leshy · 2025-06-11T12:11:49Z

Not merged as I need to test on my Mac.

I think GitHub Actions supports running tests on macOS, so we can investigate what it takes to have a test grid of machines for this. Might be a few lines' change to the GitHub Actions config

leshy · 2025-06-13T10:00:46Z

Not merged as I need to test on my Mac.

I think GitHub Actions supports running tests on macOS, so we can investigate what it takes to have a test grid of machines for this. Might be a few lines' change to the GitHub Actions config

Investigated this in #346. It wasn't fast or easy (macOS runners have different software installed), so closing for now

Referenced by @leshy in Release v0.0.5, which lists this PR as "ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2 by @spomichter in #345". Full Changelog: https://github.com/dimensionalOS/dimos/commits/v0.0.5

ONNX + LFS Support for Perception models - CLIP, FastSAM, YOLO, SpatialMemory
Former-commit-id: ece81cc
Former-commit-id: 3c97efb [formerly ece81cc], Former-commit-id: 312cea3
Former-commit-id: 9dcb417 [formerly ece81cc], Former-commit-id: 312cea3

Unit tests for CLIP, YOLO, SAM2

b0efd99

leshy reviewed Jun 11, 2025


mdaiter and others added 9 commits June 17, 2025 18:45

ONNX conversions for YOLOv11 and FastSAM

912d00f

Adding CUDA to requirements.txt and explicit check for non-Torch-based CUDA

32edfe7

Move ONNX model files to dimos/models/onnx directory

4a4bb51

Cleaned up YOLO and SAM models, changed paths to use ONNX and added logging

02aee75

Started building pyproject dependency management, added pip install .[cuda]

9733495

Add GIF, MP4, and MOV files to Git LFS with binary designation

e823c02

LFS fix

b94ee3e

LFS fix

0c10ab4

Merge pull request #350 - ONNX Conversations for YOLOv11 and FastSAM

f4e130f

ONNX conversions for YOLOv11 and FastSAM

spomichter changed the title from "Unit tests for CLIP, YOLO, SAM2" to "ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2" Jun 20, 2025

mdaiter and others added 7 commits June 19, 2025 23:29

Add CLIP ONNX conversion and support, with passing vision and text tests

c76055d

Clip ONNX var cleanup

7f96932

Merge branch 'tests_clip_yolo_sam' into tests_clip_yolo_sam_fix_clip_onnx

b1112bb

Merge pull request #353 from dimensionalOS/tests_clip_yolo_sam_fix_clip_onnx

20db26f

Add CLIP ONNX conversion and support, with passing vision and text tests

Merged requirements.txt

61425b0

Merge branch 'dev' into tests_clip_yolo_sam

2525502

CI code cleanup

8e9c851

spomichter changed the base branch from dev to main June 26, 2025 11:24

spomichter changed the base branch from main to dev June 26, 2025 11:24

spomichter and others added 7 commits June 26, 2025 14:52

Added get proj root path utility

763fe5b

Merge branch 'tests_clip_yolo_sam' of github.com:dimensionalOS/dimos into tests_clip_yolo_sam

5141a5b

CI code cleanup

92e291d

Added CLIP to LFS /models/onnx

e3369f5

Updated CLIP to LFS correct path

f6bb8ad

YOLO and FastSAM added to LFS

abfdfe0

Unit test for continuous spatial memory processing

2951893

spomichter added 5 commits June 26, 2025 21:26

Deleted deprecated model files

def4182

Changed model paths to use LFS

28cf5ec

Added office video to tests/data

56326b1

Fixed unit tests to use LFS video stream

cb5afab

Added GPU utils

c7a4e89

spomichter merged commit ece81cc into dev Jun 27, 2025
10 checks passed

spomichter deleted the tests_clip_yolo_sam branch June 27, 2025 10:54

spomichter merged 29 commits into dev from tests_clip_yolo_sam

3 participants
