ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2#345
spomichter merged 29 commits into dev
"""Return the path to the test video."""
# Use a video file from assets directory
base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../assets"))
video_file = "trimmed_video_office.mov"  # Use the same test video as YOLO test
should use LFS store for these files.
merge this, #342 then read:
https://github.com/dimensionalOS/dimos/blob/dev/docs/testing_stream_reply.md
would be something like
dir = testData("my_video_dir")
the rest the same, or you can upload individual videos separately
video_path = testData("some_specific_video.mp4")
can do in the follow up PR if annoying but plz let's do it :D
Nice sounds good
@leshy Will do this after merge, since your testing stream stuff is in dev
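A rough sketch of what the suggestion in this thread could look like in the test, replacing the hard-coded `../../../assets` path. The real `testData` helper and its import path are not shown in this conversation, so `resolve_test_data` and the `TEST_DATA_ROOT` variable below are hypothetical stand-ins, not the dimos API:

```python
import os
from pathlib import Path

# Hypothetical stand-in for the LFS-backed `testData` helper mentioned above.
# The root directory and the environment variable name are assumptions.
DATA_ROOT = Path(os.environ.get("TEST_DATA_ROOT", "data"))


def resolve_test_data(name: str) -> Path:
    """Resolve a named test asset (file or directory) under DATA_ROOT."""
    return (DATA_ROOT / name).resolve()


def video_path(name: str = "some_specific_video.mp4") -> str:
    """Return the absolute path to a test video as a string."""
    return str(resolve_test_data(name))
```

With a helper like this, the test only names the asset it needs and stays independent of where the file sits relative to the test module.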
from dimos.agents.memory.image_embedding import ImageEmbeddingProvider


class TestImageEmbedding:
all of this works as a non-class, not sure why we are embedding tests in class methods? not very important, might care about it for dimos-wide uniformity
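For illustration, the non-class style the reviewer describes is just module-level functions that pytest collects by their `test_` prefix. This is a minimal sketch: `fake_embed` is a hypothetical stand-in and is not `ImageEmbeddingProvider`, whose interface is not shown in this diff.

```python
import hashlib
import math


def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in embedding: deterministic, unit-norm vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Bytes are integers, so (b - 127.5) is never zero and the norm is positive.
    raw = [b - 127.5 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


# Plain functions, no class wrapper; pytest discovers them by name.
def test_embedding_is_unit_norm():
    vec = fake_embed("office chair")
    assert math.isclose(sum(v * v for v in vec), 1.0, rel_tol=1e-9)


def test_embedding_is_deterministic():
    assert fake_embed("office chair") == fake_embed("office chair")
```

A class is still useful when several tests share setup through class-scoped fixtures; otherwise the two styles behave the same under pytest.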
I think github actions support running tests on mac, so we can investigate what it takes to have a test grid of machines for this. might be a few lines change to github actions
Investigated this @ #346 - it wasn't fast & easy (osx runners have diff software installed) so closing for now
ONNX conversions for YOLOv11 and FastSAM
…ip_onnx Add CLIP ONNX conversion and support, with passing vision and text tests
…into tests_clip_yolo_sam
Release v0.0.5
## What's Changed
* Unitree WebRTC implementation on rebased dev by @leshy in #277
* Update ros_observable_topic timeout to 100s by @leshy in #273
* Updated README, more clear on API key requirements and updated go2_ros2_sdk remote by @spomichter in #272
* Release v0.0.4 Patch: readme changes by @spomichter in #292
* Readme patch v0.0.4 by @spomichter in #293
* Development container & CI by @leshy in #278
* env/devcontainer ruff formatting/typing by @leshy in #294
* Global reformat 100 line length by @spomichter in #300
* Global code reformat with ruff by @leshy in #295
* Position/Vector type cleanup & tests by @leshy in #297
* Linelength100 by @leshy in #301
* Auto-delivery of binary data files for testing, rewrite of dev script by @leshy in #298
* pre-commit hooks in dev container & CI, automatic LFS upload by @leshy in #303
* Removed all submodules - Testing by @spomichter in #306
* Fixed v0.0.4 Unitree ROS runfile broken by WebRTC development, Vector.py fixes by @spomichter in #307
* test/mapper by @leshy in #305
* Reduced CI cleanup frequency to PRs only into dev/main by @spomichter in #312
* DimOS Manipulation Framework, ObjectDetectionStream Changes by @spomichter in #308
* Added auto-license header to pre-commit by @spomichter in #336
* Move thread fix for alex planner by @leshy in #334
* base typing cleanup, sensor reply tests+docs by @leshy in #309
* devcontainer docs by @leshy in #338
* ci docs by @leshy in #339
* Add Cerebras Agent by @joshuajerin in #310
* Repo cleanup by @leshy in #340
* noros builds by @leshy in #341
* Update testing_stream_reply.md by @leshy in #342
* ONNX conversions for YOLOv11 and FastSAM by @mdaiter in #350
* Test cicd fake ros change by @spomichter in #361
* Reverted cleanup workflow frequency to on any PUSH due to CICD docker workflow issues by @spomichter in #360
* Trigger docker ros rerun by @spomichter in #363
* Ros CI change detection by @leshy in #364
* trigger full rebuild by @leshy in #365
* Add CLIP ONNX conversion and support, with passing vision and text tests by @mdaiter in #353
* CI fix 3 by @leshy in #367
* ONNX Support for YOLO, SAM2 + Unit tests for CLIP, YOLO, SAM2 by @spomichter in #345
* LFS moved to utils from testing by @leshy in #368
* Contact graspnet integration on pytorch and pyproject build processes setup with cuda/manipulation tags by @spomichter in #370
* data/* deletions by @leshy in #369
* Ci pre-commit and docker builds run in parallel by @leshy in #372
* Ci shared docker cache by @leshy in #371
* Unitree WebRTC integrated with full functionality, remove all ROS dependency, refactored entire robot base class and connection interface, added explore skill by @alexlin2 in #279
* Unitree WebRTC only implementation, Exploration skills [Staging --> Dev] by @spomichter in #379
* Dask lcm multiprocess by @leshy in #377
* DimOS Packaging & Build Improvements for CPU-only, CUDA, Manipulation installations by @spomichter in #394
* Multitree go2 by @leshy in #381
* better LCM system checks, fixes bin/lfs_push by @leshy in #382
* UnitreeSpeak skill over webrtc, Voice Interface added on localhost, Voice interface on mobile device on network by @spomichter in #400
* FIX: multiprocess by @leshy in #402
* Lcmspy cli by @leshy in #404
* changed position type name to pose by @alexlin2 in #358
* WIP: foxglove bridge stub by @leshy in #411
* Create running_without_devcontainer.md by @leshy in #405
* new LCM class format support by @leshy in #417
* Fixed PoseStamped ros_msgs error in dimos-lcm by @spomichter in #457
* Fixes move stream issue, Odom receive issue by @leshy in #456
* Small stream/type fixes for unitree by @leshy in #460
* Local planner, Global Planner, Explore, SpatialMemory working via LCM/Dask Multiprocess by @spomichter in #467
* Added working runfile to Unitreego2Light class by @spomichter in #474
* Point Cloud Filtering and Segmentation, Full 6DOF Object pose estimation, Grasp generation, ZED driver support, Hosted grasp integration by @spomichter in #458
* Stream fixes, Twist, Pose, Quaternion updates by @leshy in #471
* Added self-hosted runner to full CICD by @spomichter in #484
* Full Unitree (Local planner, Explore, SpatialMemory) FakeRTC/WebRTC LCM modules working in self-hosted devcontainer by @spomichter in #487
* Porting types/ LCM msgs/ new LCM types, Transform visualization by @leshy in #477
* Tracking streams lcm dask refactor by @spomichter in #488
* Pytransforms by @leshy in #491
* Fix python and dev docker builds for CICD by @spomichter in #489
* Remove PIL Image Usage by @alexlin2 in #490
* Added missing __init__.py's to transforms by @spomichter in #493
* Added tofix pytest tag back to addopts by @spomichter in #494
* Added module docs by @spomichter in #495
* SpatialMemory converted to Dask module, input LCM odom and video streams by @spomichter in #481
* Run modules tests only on 16gb runner by @spomichter in #499
* Trigger CI only on PR or push to main/dev by @spomichter in #500
* Added more aggressive cleanup workflows by @spomichter in #501
* Visual Servoing for Pick and Place Demo by @alexlin2 in #476
* Testing run-tests container pull fix and removed modules tests by @spomichter in #505
* Fix permissions in pre-build-cleanup by @spomichter in #508
* Moved pre-build cleanup to build template by @spomichter in #509
* dimos lcm update to main branch latest commit by @leshy in #498
* RPC Kwargs by @leshy in #503
* Transform system, stream convinience features, type checking by @leshy in #504
* Dimoslcm bump by @leshy in #510
* Testing UV builds in docker by @spomichter in #513
* OccupancyGrid, Path types by @leshy in #511
* subscribing to transports/streams from main loop by @leshy in #524
* Alex Lin's version of ROS Nav2 by @alexlin2 in #514
* Agent refactor conversation history by @spomichter in #541
* Exposed optional memory_limit param in dimos core by @spomichter in #540
* Agent refactor by @spomichter in #535
* Validating transforms with ros examples by @leshy in #538
* rpc timeout by @leshy in #542
* MuJoCo Simulation by @paul-nechifor in #539
* Revert "MuJoCo Simulation" by @spomichter in #548
* perception refactor to be on parity with old architecture by @alexlin2 in #534
* Skill coordinator by @leshy in #536
* WIP Mujoco simulation by @paul-nechifor in #549
* Fix event loop leak by @paul-nechifor in #547
* Correct way to build package directly in non-editable mode, no manife… by @spomichter in #551
* Office environment mujoco by @paul-nechifor in #554
* Less bandwidth usage on LCM, bug fixed with navigation by @alexlin2 in #559
* disabled old agent tests by @leshy in #563
* Camera Module Refactor, added image rectification by @alexlin2 in #566
* long rpc timeout by @leshy in #569
* Twist message for all move command, added keyboard teleop for easy robot control in sim by @alexlin2 in #570
* numerical sort for sensor replay by @leshy in #564
* 2d detection module by @leshy in #567
* Stream timestamp alignment by @leshy in #557
* Sharpness for Images by @leshy in #560
* Jetson humanoid integration by @spomichter in #590
* 2d detection module + Agent2 - yolo demo by @leshy in #582
* jetson.md cleanup by @spomichter in #602
* Unitree b1 integration with continuous cmd_vel Twist interface, joystick control for testing, C++ UDP server for onboard B1 by @spomichter in #601
* Joystick integrated g1 humanoid by @spomichter in #603
* Unitree b1 manipulation pose integration by @spomichter in #604
* use SHM in Foxglove by @paul-nechifor in #607
* CPU isolated shared mem by @mdaiter in #589
* silence unnecessary unitree go 2 tricks by @paul-nechifor in #615
* Pshm to lcm by @paul-nechifor in #616
* Unitree agents2 skill integration paul by @paul-nechifor in #617
* Unitree go2 runfile integration tool call issues by @spomichter in #605
* gstreamer camera by @paul-nechifor in #613
* zed local node by @leshy in #623
* ROS Bridge for Unitree G1 and B1 Navigation, Working G1 navigation by @spomichter in #610
* B1 ros navigation rebase by @spomichter in #626
* Added build directory to gitignore by @yashas-salankimatt in #628
* 2D detection module + Pointcloud localization by @leshy in #583
* Camera calibration loading by @leshy in #629
* Agent2 nav skills by @paul-nechifor in #630
* WIP shared mem again by @paul-nechifor in #650
* Fix leaks by @paul-nechifor in #649
* Fix SHM leak by @paul-nechifor in #652
* Suppress echos with counter by @paul-nechifor in #653
* Removing websocket vis causing crazy lag by @spomichter in #656
* Suppress with UUID by @paul-nechifor in #655
* Modules navigate object bbox by @spomichter in #654
* Ros bridge test fix by @alexlin2 in #660
* video g1 spatial mem + detection - tomerge by @leshy in #651
* Update README.md by @spomichter in #664
* Image upgrades! Impls for CUDA + numpy, along with an abstraction and full backwards compatibility by @mdaiter in #612
* Revert "Image upgrades! Impls for CUDA + numpy, along with an abstraction and full backwards compatibility" by @leshy in #665
* Detection second pass by @leshy in #662
* CudaImage by @spomichter in #671
* Add start/stop to all modules and other resources by @paul-nechifor in #675
* forgotten context managers by @paul-nechifor in #676
* CUDAImage, NumpyImage, Image implementations with robust backend tests for image operations by @spomichter in #680
* CudaImage by @spomichter in #677
* alibaba env var fix by @leshy in #673
* Rename FakeRTC --> ReplayRTC by @spomichter in #681
* Fix websocketvis performance rebase by @spomichter in #682
* Alexl ros nav intergration by @alexlin2 in #632
* detection pipeline rewrite, embedding, vl model standardization, reid system by @leshy in #674
* cli tooling theme by @leshy in #687
* Fix spatial memory bug in g1 by @spomichter in #689
* Add autoconnect back2 by @paul-nechifor in #684
* Add ability to remap module connections name. by @paul-nechifor in #698
* Add transport which encodes images as JPEG to improve performance. by @paul-nechifor in #693
* New Ruff autofixes by @paul-nechifor in #694
## New Contributors
* @joshuajerin made their first contribution in #310
* @mdaiter made their first contribution in #350
* @yashas-salankimatt made their first contribution in #628
**Full Changelog**: https://github.com/dimensionalOS/dimos/commits/v0.0.5
…CLIP, FastSAM, YOLO, SpatialMemory
ONNX + LFS Support for Perception models - CLIP, FastSAM, YOLO, SpatialMemory
Former-commit-id: ece81cc
Model unit tests for @mdaiter
Not merged as I need to test on my Mac. Ran so far on my Ubuntu 22.04 machine with CUDA 11.7/Torch 2.0.1. Also need to push clean python install in setup.py so you can run
pip install -e ".[mac]"
or
pip install -e ".[arm,gpu]"
To run: