Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Greptile Overview
Greptile Summary
This PR delivers a comprehensive refactoring of the models architecture, establishing a clean type hierarchy for vision-language models, embedding models, and captioning models. The refactor introduces consistent Resource lifecycle management with
Key improvements:
Architecture:
Integration:
Confidence Score: 5/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant App
participant VlModel
participant LocalModel
participant Resource
participant GPU
Note over App,GPU: Model Initialization
App->>VlModel: __init__(auto_resize=(512,512))
VlModel->>LocalModel: super().__init__()
LocalModel->>LocalModel: Check autostart/warmup
Note over App,GPU: Resource Lifecycle
App->>VlModel: start()
VlModel->>LocalModel: start()
LocalModel->>LocalModel: Access _model @cached_property
LocalModel->>GPU: Load model to device
GPU-->>LocalModel: Model loaded
VlModel->>VlModel: Run warmup query
Note over App,GPU: Query Processing
App->>VlModel: query(image, "What is this?")
VlModel->>VlModel: _prepare_image() - resize if needed
VlModel->>GPU: Forward pass
GPU-->>VlModel: Response
VlModel-->>App: String response
Note over App,GPU: Batch Query Optimization
App->>VlModel: query_batch([img1, img2, img3], "Describe")
VlModel->>VlModel: Prepare all images
VlModel->>GPU: Batched forward pass
GPU-->>VlModel: Batch responses
VlModel-->>App: List of responses
Note over App,GPU: Cleanup
App->>VlModel: stop()
VlModel->>LocalModel: stop()
LocalModel->>LocalModel: Delete _model from __dict__
LocalModel->>GPU: torch.cuda.empty_cache()
GPU-->>LocalModel: Memory freed
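The start/stop lifecycle in the diagram can be sketched in plain Python, for illustration only. This is not the dimos implementation: `LocalModelSketch`, `_load`, `load_count`, and `is_loaded` are hypothetical names, the loaded object is a stand-in dict rather than a real model, and the `torch.cuda.empty_cache()` step is only mentioned in a comment so the sketch runs without a GPU.

```python
from functools import cached_property


class LocalModelSketch:
    """Hypothetical stand-in for a lazily loaded local model."""

    def __init__(self) -> None:
        self.load_count = 0  # counts how often the expensive load ran

    def _load(self) -> dict:
        # Assumption: a real model would be built and moved to a device here.
        self.load_count += 1
        return {"weights": "loaded"}

    @cached_property
    def _model(self) -> dict:
        # First access loads; the result is cached in self.__dict__["_model"].
        return self._load()

    def start(self) -> None:
        # Touching the property forces the load if it has not happened yet.
        _ = self._model

    def stop(self) -> None:
        # Dropping the cached entry releases the reference to the model.
        # A GPU-backed model would typically call torch.cuda.empty_cache() afterwards.
        self.__dict__.pop("_model", None)

    @property
    def is_loaded(self) -> bool:
        return "_model" in self.__dict__


model = LocalModelSketch()
model.start()
model.start()  # a second start reuses the cached model
model.stop()
model.start()  # after stop, the model is loaded again
```

Removing the entry from `__dict__` is what makes `cached_property` recompute on the next access, so `stop()` followed by `start()` gives a fresh load rather than a stale handle.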
@greptileai rereview please, also don't include this sequence diagram please
Additional Comments (1)
dimos/models/vl/moondream.py, line 154-156
logic: query_detections() converts normalized coords to pixels using the original image dimensions, but _to_pil() may have resized the image via auto_resize. When auto_resize is active, moondream returns coords normalized to the resized image, so scaling them back to the original dimensions will be incorrect.
Use the resized dimensions from the PIL image instead:
37 files reviewed, 1 comment
@greptileai this is not correct, moondream returns normalized (0.0-1.0) coordinates so we don't need to take scaling factor into account when generating detections
You're absolutely right! I made an error in my analysis.
Moondream's detect() method returns normalized coordinates (0.0-1.0) regardless of the input image size, so the coordinates should indeed be scaled directly to the original image dimensions without considering any resizing factor.
The current implementation in query_detections() is correct:
    pil_image = self._to_pil(image)  # May resize image
    result = self._model.detect(pil_image, query, settings=settings)
    # result contains normalized coords (0.0-1.0)
    # Scale normalized coords to original image dimensions - this is correct
    detection = Detection2DBBox(
        bbox=(
            result["x1"] * image.width,   # Scale to original width
            result["y1"] * image.height,  # Scale to original height
            result["x2"] * image.width,
            result["y2"] * image.height,
        ),
        # ...
    )

Since Moondream normalizes its output coordinates, they represent the same relative positions regardless of input image size, so scaling them to the original image dimensions is the right approach. Thank you for the correction!
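A small numeric check of this reasoning, as a sketch: `to_pixels` is a hypothetical helper and the box and image sizes are made-up values. It assumes the resize maps the full image onto the full resized frame, with no cropping or padding; under that assumption the normalized coordinates are the same in both frames.

```python
def to_pixels(norm_box, width, height):
    """Scale a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = norm_box
    return (x1 * width, y1 * height, x2 * width, y2 * height)


original_size = (1920, 1080)  # made-up original image size (width, height)
resized_size = (512, 288)     # made-up size after auto-resize

# An object occupying a fixed region of the original image, in pixels.
box_in_original = (480.0, 270.0, 960.0, 540.0)

# A plain resize scales x and y by fixed factors.
sx = resized_size[0] / original_size[0]
sy = resized_size[1] / original_size[1]
box_in_resized = (
    box_in_original[0] * sx,
    box_in_original[1] * sy,
    box_in_original[2] * sx,
    box_in_original[3] * sy,
)

# Normalizing in the resized frame, as a detector working on the resized image would.
norm_box = (
    box_in_resized[0] / resized_size[0],
    box_in_resized[1] / resized_size[1],
    box_in_resized[2] / resized_size[0],
    box_in_resized[3] / resized_size[1],
)

# Scaling by the ORIGINAL dimensions recovers the original pixel box (up to float rounding).
recovered = to_pixels(norm_box, *original_size)
```

If the preprocessing letterboxed or cropped the image instead, the normalized values would refer to a different frame and this shortcut would no longer hold.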
@greptileai so why does your review score my moondream implementation 3/5 still, can u update
dimos/utils/cli/plot.py
    # tw, th = plt.terminal_size()
    # return tw, th // 2
Should this be uncommented?
* captioner modules implemented in models/vl, flake.nix fixes
* model structure rework
* refactor
* bugfix
* removed double update_intrinsic on metric3d
* mypy
* typing fixes
* embedding models rewrite
* mobileclip preprocess accessor rewrite
* torch reid models added to lfs, reid/embedding model cleanup
* mobileclip upload
* batch vlm querying
* moondream batch queries and tests
* type fixes
* proper model resource management, speed tests, auto-resizing, plotting
* type fixes
* tests, mypy, correct cleanup
* metric3d tests
* attempting to remove dead code
* scaling bugfix for visual models
* docstring fix
* plotext dep
* open clip dep
* open clip dep fix
* gdown dep
* tensorboard dep
* typing fixes for detections and plotter
* person tracker typing fix
* py 3.10 typing fix
* last type fix
* ignore missing imports (for ros deps)
* nicer init for florence
* type fixes
* mypy ignore ros/mujoco
* addressing PR comments
* image is a fixture
* captioner fixtures
* all PR comments addressed
Release v0.0.6: Pre-Launch Unitree Go2 Release
## What's Changed
* Added is_flying_to_target agent skill and fly_to now return string for agent feedback by @spomichter in #635
* Release v0.0.5 by @spomichter in #697
* Rebase ivan g1 by @paul-nechifor in #709
* Navspec by @leshy in #648
* Remove depth module from base unitree go2 blueprints by @spomichter in #712
* Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714
* Add G1 blueprints, and simulation by @paul-nechifor in #724
* New g1 blueprint runfiles by @spomichter in #706
* Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717
* Add dimos-robot end-to-end test with agents by @paul-nechifor in #716
* Run DimOS and ROS nav in Docker by @paul-nechifor in #700
* Anim experiment by @leshy in #701
* G1 navigation documentation fixes by @spomichter in #738
* Rename dimos-robot to dimos by @paul-nechifor in #740
* Use a process for MuJoCo by @paul-nechifor in #747
* Remove unneeded code files by @paul-nechifor in #718
* Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741
* error on conflicts by @paul-nechifor in #763
* Hosted Moondream 3 for VLM queries by @alexlin2 in #751
* transport: Remove DaskTransport dead code by @ym-han in #767
* Add editorconfig by @paul-nechifor in #769
* add `type: ignore` by @paul-nechifor in #768
* exclude .md changes from CICD builds by @spomichter in #770
* Working Ivan g1 detection in blueprints by @spomichter in #737
* small env fixes on a fresh install by @leshy in #778
* autofixes by @paul-nechifor in #744
* Support running local agents by @paul-nechifor in #739
* pin major version of langchain packages by @paul-nechifor in #789
* Deduplicate Unitree connections/entrypoints. by @paul-nechifor in #749
* Add TTS and STT by @paul-nechifor in #753
* fix mypy errors by @paul-nechifor in #791
* Use structlog and store JSON logs on disk by @paul-nechifor in #715
* Rpc fixes merge by @paul-nechifor in #801
* transport improvements by @leshy in #713
* Added concurrency check by @spomichter in #803
* make connections work with string annotations by @paul-nechifor in #807
* Run mypy checks in GitHub Actions by @paul-nechifor in #805
* Fix incorrect `= None` by @paul-nechifor in #802
* increase mujoco timeout by @paul-nechifor in #823
* MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745
* nix flake revert by @leshy in #824
* fix mypy issues by @paul-nechifor in #827
* PRODUCTION Nav skills on drone with tracking by @spomichter in #640
* Fix added memory limit to blueprint global config by @spomichter in #856
* models/ refactor by @leshy in #819
* Point Detections by @leshy in #859
* Add generic ignore to gitignore by @jeff-hykin in #864
* fix set transport by @paul-nechifor in #866
* cli-precedence by @paul-nechifor in #857
* show `get_data` progress by @paul-nechifor in #873
* skip if OPENAI_API_KEY not defined by @paul-nechifor in #872
* build foxglove extension by @paul-nechifor in #871
* New planner by @paul-nechifor in #792
* Use `uv` by @paul-nechifor in #870
* Add direnv to gitignore by @Kaweees in #875
* Cuda mapper by @leshy in #862
* rename agents to agents_deprecated by @paul-nechifor in #877
* new planner new mapper by @paul-nechifor in #879
* odom ts parsing by @leshy in #882
* Sim fix by @paul-nechifor in #881
* navigation tuning by @leshy in #883
* Fix: Module init and agents by @leshy in #876
* Remove old setup.sh by @paul-nechifor in #888
* Release planner by @leshy in #887
* fix replay leak by @paul-nechifor in #890
* first pass on large file deletions by @leshy in #891
* Generalized manipulator driver by @mustafab0 in #831
* Restore MacOS Support (flake.nix) by @jeff-hykin in #863
* check-uv by @paul-nechifor in #902
* Make dimos pip-installable by @paul-nechifor in #731
* Revert "Restore MacOS Support (flake.nix)" by @leshy in #907
* jeff flake without py env stuff by @leshy in #911
* remove deprecated docker files by @paul-nechifor in #912
* command center stop and home by @leshy in #893
* use packages by @paul-nechifor in #915
* Fix agents prompt by @paul-nechifor in #914
* fix manifest by @paul-nechifor in #916
* fix move skill by @paul-nechifor in #913
* Ignore individual errors by @paul-nechifor in #919
* Feat/rerun latency panels by @Nabla7 in #917
* WIP Release detections by @leshy in #889
* Remove old navigation modules by @paul-nechifor in #923
* Feat/rerun latency panels by @Nabla7 in #925
* Repair camera module by @leshy in #929
* Repair Stream by @leshy in #932
* Docs Clean by @leshy in #933
* docs: sensor streams by @leshy in #934
* Docs: bugfixes by @leshy in #940
* Fixed doclinks to use git ls by @spomichter in #943
* Examples: third party language interop by @leshy in #946
* DOCS: temporal alignment docs improvements by @leshy in #944
* filter bots from commits by @leshy in #947
* Fix skills by @paul-nechifor in #950
* Limit Rerun viewer memory to 4GB default by @Nabla7 in #949
* Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945
* allow registration of different agents by @paul-nechifor in #951
* Pre commit large files by @leshy in #953
* Proper Realsense and ZED Camera Drivers by @alexlin2 in #935
* Granular deps by @leshy in #894
* class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960
* mac compatible commit filter by @paul-nechifor in #961
## New Contributors
* @ym-han made their first contribution in #767
* @jeff-hykin made their first contribution in #745
* @Kaweees made their first contribution in #875
* @mustafab0 made their first contribution in #831
* @Nabla7 made their first contribution in #917

**Full Changelog**: v0.0.5...v0.0.6
* captioner modules implemented in models/vl, flake.nix fixes * model structure rework * refactor * bugfix * removed double update_intrinsic on metric3d * mypy * typing fixes * embedding models rewrite * mobileclip preprocess accessor rewrite * torch reid models added to lfs, reid/embedding model cleanup * mobileclip upload * batch vlm querying * moondream batch queries and tests * type fixes * proper model resource management, speed tests, auto-resizing, plotting * type fixes * tests, mypy, correct cleanup * metric3d tests * attempting to remove dead code * scaling bugfix for visual models * docstring fix * plotext dep * open clip dep * open clip dep fix * gdown dep * tensorboard dep * typing fixes for detections and plotter * person tracker typing fix * py 3.10 typing fix * last type fix * ignore missing imports (for ros deps) * nicer init for florence * type fixes * mypy ignore ros/mujoco * addressing PR comments * image is a fixture * captioner fixtures * all PR comments addressed Former-commit-id: cbe68d2
Release v0.0.6: Pre-Launch Unitree Go2 Release ## What's Changed * Added is_flying_to_target agent skill and fly_to now return string for agent feeback by @spomichter in #635 * Release v0.0.5 by @spomichter in #697 * Rebase ivan g1 by @paul-nechifor in #709 * Navspec by @leshy in #648 * Remove depth module from base unitree go2 blueprints by @spomichter in #712 * Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714 * Add G1 blueprints, and simulation by @paul-nechifor in #724 * New g1 blueprint runfiles by @spomichter in #706 * Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717 * Add dimos-robot end-to-end test with agents by @paul-nechifor in #716 * Run DimOS and ROS nav in Docker by @paul-nechifor in #700 * Anim experiment by @leshy in #701 * G1 navigation documentation fixes by @spomichter in #738 * Rename dimos-robot to dimos by @paul-nechifor in #740 * Use a process for MuJoCo by @paul-nechifor in #747 * Remove unneeded code files by @paul-nechifor in #718 * Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741 * error on conflicts by @paul-nechifor in #763 * Hosted Moondream 3 for VLM queries by @alexlin2 in #751 * transport: Remove DaskTransport dead code by @ym-han in #767 * Add editorconfig by @paul-nechifor in #769 * add `type: ignore` by @paul-nechifor in #768 * exclude .md changes from CICD builds by @spomichter in #770 * Working Ivan g1 detection in blueprints by @spomichter in #737 * small env fixes on a fresh install by @leshy in #778 * autofixes by @paul-nechifor in #744 * Support running local agents by @paul-nechifor in #739 * pin major version of langchain packages by @paul-nechifor in #789 * Deduplicate Unitree connections/entrypoints. 
by @paul-nechifor in #749 * Add TTS and STT by @paul-nechifor in #753 * fix mypy errors by @paul-nechifor in #791 * Use structlog and store JSON logs on disk by @paul-nechifor in #715 * Rpc fixes merge by @paul-nechifor in #801 * transport improvements by @leshy in #713 * Added concurrency check by @spomichter in #803 * make connections work with string annotations by @paul-nechifor in #807 * Run mypy checks in GitHub Actions by @paul-nechifor in #805 * Fix incorrect `= None` by @paul-nechifor in #802 * increase mujoco timeout by @paul-nechifor in #823 * MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745 * nix flake revert by @leshy in #824 * fix mypy issues by @paul-nechifor in #827 * PRODUCTION Nav skills on drone with tracking by @spomichter in #640 * Fix added memory limit to blueprint global config by @spomichter in #856 * models/ refactor by @leshy in #819 * Point Detections by @leshy in #859 * Add generic ignore to gitignore by @jeff-hykin in #864 * fix set transport by @paul-nechifor in #866 * cli-precedence by @paul-nechifor in #857 * show `get_data` progress by @paul-nechifor in #873 * skip if OPENAI_API_KEY not defined by @paul-nechifor in #872 * build foxglove extension by @paul-nechifor in #871 * New planner by @paul-nechifor in #792 * Use `uv` by @paul-nechifor in #870 * Add direnv to gitignore by @Kaweees in #875 * Cuda mapper by @leshy in #862 * rename agents to agents_deprecated by @paul-nechifor in #877 * new planner new mapper by @paul-nechifor in #879 * odom ts parsing by @leshy in #882 * Sim fix by @paul-nechifor in #881 * navigation tuning by @leshy in #883 * Fix: Module init and agents by @leshy in #876 * Remove old setup.sh by @paul-nechifor in #888 * Release planner by @leshy in #887 * fix replay leak by @paul-nechifor in #890 * first pass on large file deletions by @leshy in #891 * Generalized manipulator driver by @mustafab0 in #831 * Restore MacOS Support (flake.nix) by @jeff-hykin in #863 * check-uv by @paul-nechifor in 
#902 * Make dimos pip-installable by @paul-nechifor in #731 * Revert "Restore MacOS Support (flake.nix)" by @leshy in #907 * jeff flake without py env stuff by @leshy in #911 * remove deprecated docker files by @paul-nechifor in #912 * command center stop and home by @leshy in #893 * use packages by @paul-nechifor in #915 * Fix agents prompt by @paul-nechifor in #914 * fix manifest by @paul-nechifor in #916 * fix move skill by @paul-nechifor in #913 * Ignore individual errors by @paul-nechifor in #919 * Feat/rerun latency panels by @Nabla7 in #917 * WIP Release detections by @leshy in #889 * Remove old navigation modules by @paul-nechifor in #923 * Feat/rerun latency panels by @Nabla7 in #925 * Repair camera module by @leshy in #929 * Repair Stream by @leshy in #932 * Docs Clean by @leshy in #933 * docs: sensor streams by @leshy in #934 * Docs: bugfixes by @leshy in #940 * Fixed doclinks to use git ls by @spomichter in #943 * Examples: third party language interop by @leshy in #946 * DOCS: temporal alignment docs improvements by @leshy in #944 * filter bots from commits by @leshy in #947 * Fix skills by @paul-nechifor in #950 * Limit Rerun viewer memory to 4GB default by @Nabla7 in #949 * Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945 * allow registration of different agents by @paul-nechifor in #951 * Pre commit large files by @leshy in #953 * Proper Realsense and ZED Camera Drivers by @alexlin2 in #935 * Granular deps by @leshy in #894 * class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960 * mac compatible commit filter by @paul-nechifor in #961 ## New Contributors * @ym-han made their first contribution in #767 * @jeff-hykin made their first contribution in #745 * @Kaweees made their first contribution in #875 * @mustafab0 made their first contribution in #831 * @Nabla7 made their first contribution in #917 **Full Changelog**: v0.0.5...v0.0.6 Former-commit-id: 
26e61a70a9469f2e33e51f1296f082b470009c09 [formerly 7ffc878] Former-commit-id: 725e628
* captioner modules implemented in models/vl, flake.nix fixes * model structure rework * refactor * bugfix * removed double update_intrinsic on metric3d * mypy * typing fixes * embedding models rewrite * mobileclip preprocess accessor rewrite * torch reid models added to lfs, reid/embedding model cleanup * mobileclip upload * batch vlm querying * moondream batch queries and tests * type fixes * proper model resource management, speed tests, auto-resizing, plotting * type fixes * tests, mypy, correct cleanup * metric3d tests * attempting to remove dead code * scaling bugfix for visual models * docstring fix * plotext dep * open clip dep * open clip dep fix * gdown dep * tensorboard dep * typing fixes for detections and plotter * person tracker typing fix * py 3.10 typing fix * last type fix * ignore missing imports (for ros deps) * nicer init for florence * type fixes * mypy ignore ros/mujoco * addressing PR comments * image is a fixture * captioner fixtures * all PR comments addressed Former-commit-id: 2026ce3
Release v0.0.6: Pre-Launch Unitree Go2 Release ## What's Changed * Added is_flying_to_target agent skill and fly_to now return string for agent feeback by @spomichter in #635 * Release v0.0.5 by @spomichter in #697 * Rebase ivan g1 by @paul-nechifor in #709 * Navspec by @leshy in #648 * Remove depth module from base unitree go2 blueprints by @spomichter in #712 * Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714 * Add G1 blueprints, and simulation by @paul-nechifor in #724 * New g1 blueprint runfiles by @spomichter in #706 * Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717 * Add dimos-robot end-to-end test with agents by @paul-nechifor in #716 * Run DimOS and ROS nav in Docker by @paul-nechifor in #700 * Anim experiment by @leshy in #701 * G1 navigation documentation fixes by @spomichter in #738 * Rename dimos-robot to dimos by @paul-nechifor in #740 * Use a process for MuJoCo by @paul-nechifor in #747 * Remove unneeded code files by @paul-nechifor in #718 * Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741 * error on conflicts by @paul-nechifor in #763 * Hosted Moondream 3 for VLM queries by @alexlin2 in #751 * transport: Remove DaskTransport dead code by @ym-han in #767 * Add editorconfig by @paul-nechifor in #769 * add `type: ignore` by @paul-nechifor in #768 * exclude .md changes from CICD builds by @spomichter in #770 * Working Ivan g1 detection in blueprints by @spomichter in #737 * small env fixes on a fresh install by @leshy in #778 * autofixes by @paul-nechifor in #744 * Support running local agents by @paul-nechifor in #739 * pin major version of langchain packages by @paul-nechifor in #789 * Deduplicate Unitree connections/entrypoints. 
by @paul-nechifor in #749 * Add TTS and STT by @paul-nechifor in #753 * fix mypy errors by @paul-nechifor in #791 * Use structlog and store JSON logs on disk by @paul-nechifor in #715 * Rpc fixes merge by @paul-nechifor in #801 * transport improvements by @leshy in #713 * Added concurrency check by @spomichter in #803 * make connections work with string annotations by @paul-nechifor in #807 * Run mypy checks in GitHub Actions by @paul-nechifor in #805 * Fix incorrect `= None` by @paul-nechifor in #802 * increase mujoco timeout by @paul-nechifor in #823 * MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745 * nix flake revert by @leshy in #824 * fix mypy issues by @paul-nechifor in #827 * PRODUCTION Nav skills on drone with tracking by @spomichter in #640 * Fix added memory limit to blueprint global config by @spomichter in #856 * models/ refactor by @leshy in #819 * Point Detections by @leshy in #859 * Add generic ignore to gitignore by @jeff-hykin in #864 * fix set transport by @paul-nechifor in #866 * cli-precedence by @paul-nechifor in #857 * show `get_data` progress by @paul-nechifor in #873 * skip if OPENAI_API_KEY not defined by @paul-nechifor in #872 * build foxglove extension by @paul-nechifor in #871 * New planner by @paul-nechifor in #792 * Use `uv` by @paul-nechifor in #870 * Add direnv to gitignore by @Kaweees in #875 * Cuda mapper by @leshy in #862 * rename agents to agents_deprecated by @paul-nechifor in #877 * new planner new mapper by @paul-nechifor in #879 * odom ts parsing by @leshy in #882 * Sim fix by @paul-nechifor in #881 * navigation tuning by @leshy in #883 * Fix: Module init and agents by @leshy in #876 * Remove old setup.sh by @paul-nechifor in #888 * Release planner by @leshy in #887 * fix replay leak by @paul-nechifor in #890 * first pass on large file deletions by @leshy in #891 * Generalized manipulator driver by @mustafab0 in #831 * Restore MacOS Support (flake.nix) by @jeff-hykin in #863 * check-uv by @paul-nechifor in 
#902 * Make dimos pip-installable by @paul-nechifor in #731 * Revert "Restore MacOS Support (flake.nix)" by @leshy in #907 * jeff flake without py env stuff by @leshy in #911 * remove deprecated docker files by @paul-nechifor in #912 * command center stop and home by @leshy in #893 * use packages by @paul-nechifor in #915 * Fix agents prompt by @paul-nechifor in #914 * fix manifest by @paul-nechifor in #916 * fix move skill by @paul-nechifor in #913 * Ignore individual errors by @paul-nechifor in #919 * Feat/rerun latency panels by @Nabla7 in #917 * WIP Release detections by @leshy in #889 * Remove old navigation modules by @paul-nechifor in #923 * Feat/rerun latency panels by @Nabla7 in #925 * Repair camera module by @leshy in #929 * Repair Stream by @leshy in #932 * Docs Clean by @leshy in #933 * docs: sensor streams by @leshy in #934 * Docs: bugfixes by @leshy in #940 * Fixed doclinks to use git ls by @spomichter in #943 * Examples: third party language interop by @leshy in #946 * DOCS: temporal alignment docs improvements by @leshy in #944 * filter bots from commits by @leshy in #947 * Fix skills by @paul-nechifor in #950 * Limit Rerun viewer memory to 4GB default by @Nabla7 in #949 * Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945 * allow registration of different agents by @paul-nechifor in #951 * Pre commit large files by @leshy in #953 * Proper Realsense and ZED Camera Drivers by @alexlin2 in #935 * Granular deps by @leshy in #894 * class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960 * mac compatible commit filter by @paul-nechifor in #961 ## New Contributors * @ym-han made their first contribution in #767 * @jeff-hykin made their first contribution in #745 * @Kaweees made their first contribution in #875 * @mustafab0 made their first contribution in #831 * @Nabla7 made their first contribution in #917 **Full Changelog**: v0.0.5...v0.0.6 Former-commit-id: 7ffc878
Release v0.0.6: Pre-Launch Unitree Go2 Release ## What's Changed * Added is_flying_to_target agent skill and fly_to now return string for agent feeback by @spomichter in #635 * Release v0.0.5 by @spomichter in #697 * Rebase ivan g1 by @paul-nechifor in #709 * Navspec by @leshy in #648 * Remove depth module from base unitree go2 blueprints by @spomichter in #712 * Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714 * Add G1 blueprints, and simulation by @paul-nechifor in #724 * New g1 blueprint runfiles by @spomichter in #706 * Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717 * Add dimos-robot end-to-end test with agents by @paul-nechifor in #716 * Run DimOS and ROS nav in Docker by @paul-nechifor in #700 * Anim experiment by @leshy in #701 * G1 navigation documentation fixes by @spomichter in #738 * Rename dimos-robot to dimos by @paul-nechifor in #740 * Use a process for MuJoCo by @paul-nechifor in #747 * Remove unneeded code files by @paul-nechifor in #718 * Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741 * error on conflicts by @paul-nechifor in #763 * Hosted Moondream 3 for VLM queries by @alexlin2 in #751 * transport: Remove DaskTransport dead code by @ym-han in #767 * Add editorconfig by @paul-nechifor in #769 * add `type: ignore` by @paul-nechifor in #768 * exclude .md changes from CICD builds by @spomichter in #770 * Working Ivan g1 detection in blueprints by @spomichter in #737 * small env fixes on a fresh install by @leshy in #778 * autofixes by @paul-nechifor in #744 * Support running local agents by @paul-nechifor in #739 * pin major version of langchain packages by @paul-nechifor in #789 * Deduplicate Unitree connections/entrypoints. 
by @paul-nechifor in #749 * Add TTS and STT by @paul-nechifor in #753 * fix mypy errors by @paul-nechifor in #791 * Use structlog and store JSON logs on disk by @paul-nechifor in #715 * Rpc fixes merge by @paul-nechifor in #801 * transport improvements by @leshy in #713 * Added concurrency check by @spomichter in #803 * make connections work with string annotations by @paul-nechifor in #807 * Run mypy checks in GitHub Actions by @paul-nechifor in #805 * Fix incorrect `= None` by @paul-nechifor in #802 * increase mujoco timeout by @paul-nechifor in #823 * MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745 * nix flake revert by @leshy in #824 * fix mypy issues by @paul-nechifor in #827 * PRODUCTION Nav skills on drone with tracking by @spomichter in #640 * Fix added memory limit to blueprint global config by @spomichter in #856 * models/ refactor by @leshy in #819 * Point Detections by @leshy in #859 * Add generic ignore to gitignore by @jeff-hykin in #864 * fix set transport by @paul-nechifor in #866 * cli-precedence by @paul-nechifor in #857 * show `get_data` progress by @paul-nechifor in #873 * skip if OPENAI_API_KEY not defined by @paul-nechifor in #872 * build foxglove extension by @paul-nechifor in #871 * New planner by @paul-nechifor in #792 * Use `uv` by @paul-nechifor in #870 * Add direnv to gitignore by @Kaweees in #875 * Cuda mapper by @leshy in #862 * rename agents to agents_deprecated by @paul-nechifor in #877 * new planner new mapper by @paul-nechifor in #879 * odom ts parsing by @leshy in #882 * Sim fix by @paul-nechifor in #881 * navigation tuning by @leshy in #883 * Fix: Module init and agents by @leshy in #876 * Remove old setup.sh by @paul-nechifor in #888 * Release planner by @leshy in #887 * fix replay leak by @paul-nechifor in #890 * first pass on large file deletions by @leshy in #891 * Generalized manipulator driver by @mustafab0 in #831 * Restore MacOS Support (flake.nix) by @jeff-hykin in #863 * check-uv by @paul-nechifor in 
#902 * Make dimos pip-installable by @paul-nechifor in #731 * Revert "Restore MacOS Support (flake.nix)" by @leshy in #907 * jeff flake without py env stuff by @leshy in #911 * remove deprecated docker files by @paul-nechifor in #912 * command center stop and home by @leshy in #893 * use packages by @paul-nechifor in #915 * Fix agents prompt by @paul-nechifor in #914 * fix manifest by @paul-nechifor in #916 * fix move skill by @paul-nechifor in #913 * Ignore individual errors by @paul-nechifor in #919 * Feat/rerun latency panels by @Nabla7 in #917 * WIP Release detections by @leshy in #889 * Remove old navigation modules by @paul-nechifor in #923 * Feat/rerun latency panels by @Nabla7 in #925 * Repair camera module by @leshy in #929 * Repair Stream by @leshy in #932 * Docs Clean by @leshy in #933 * docs: sensor streams by @leshy in #934 * Docs: bugfixes by @leshy in #940 * Fixed doclinks to use git ls by @spomichter in #943 * Examples: third party language interop by @leshy in #946 * DOCS: temporal alignment docs improvements by @leshy in #944 * filter bots from commits by @leshy in #947 * Fix skills by @paul-nechifor in #950 * Limit Rerun viewer memory to 4GB default by @Nabla7 in #949 * Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945 * allow registration of different agents by @paul-nechifor in #951 * Pre commit large files by @leshy in #953 * Proper Realsense and ZED Camera Drivers by @alexlin2 in #935 * Granular deps by @leshy in #894 * class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960 * mac compatible commit filter by @paul-nechifor in #961 ## New Contributors * @ym-han made their first contribution in #767 * @jeff-hykin made their first contribution in #745 * @Kaweees made their first contribution in #875 * @mustafab0 made their first contribution in #831 * @Nabla7 made their first contribution in #917 **Full Changelog**: v0.0.5...v0.0.6 Former-commit-id: 7ffc878 
Former-commit-id: 067332a
* captioner modules implemented in models/vl, flake.nix fixes * model structure rework * refactor * bugfix * removed double update_intrinsic on metric3d * mypy * typing fixes * embedding models rewrite * mobileclip preprocess accessor rewrite * torch reid models added to lfs, reid/embedding model cleanup * mobileclip upload * batch vlm querying * moondream batch queries and tests * type fixes * proper model resource management, speed tests, auto-resizing, plotting * type fixes * tests, mypy, correct cleanup * metric3d tests * attempting to remove dead code * scaling bugfix for visual models * docstring fix * plotext dep * open clip dep * open clip dep fix * gdown dep * tensorboard dep * typing fixes for detections and plotter * person tracker typing fix * py 3.10 typing fix * last type fix * ignore missing imports (for ros deps) * nicer init for florence * type fixes * mypy ignore ros/mujoco * addressing PR comments * image is a fixture * captioner fixtures * all PR comments addressed Former-commit-id: 81f5aa8 [formerly 2026ce3] Former-commit-id: 3f5919d
Release v0.0.6: Pre-Launch Unitree Go2 Release

## What's Changed
* Added is_flying_to_target agent skill and fly_to now return string for agent feeback by @spomichter in #635
* Release v0.0.5 by @spomichter in #697
* Rebase ivan g1 by @paul-nechifor in #709
* Navspec by @leshy in #648
* Remove depth module from base unitree go2 blueprints by @spomichter in #712
* Fix Unitree Go2 (replay and spatial memory) by @paul-nechifor in #714
* Add G1 blueprints, and simulation by @paul-nechifor in #724
* New g1 blueprint runfiles by @spomichter in #706
* Update G1/Go2 skills and remove some Robot interfaces by @paul-nechifor in #717
* Add dimos-robot end-to-end test with agents by @paul-nechifor in #716
* Run DimOS and ROS nav in Docker by @paul-nechifor in #700
* Anim experiment by @leshy in #701
* G1 navigation documentation fixes by @spomichter in #738
* Rename dimos-robot to dimos by @paul-nechifor in #740
* Use a process for MuJoCo by @paul-nechifor in #747
* Remove unneeded code files by @paul-nechifor in #718
* Make pygame G1JoystickModule usable for all modules by @paul-nechifor in #741
* error on conflicts by @paul-nechifor in #763
* Hosted Moondream 3 for VLM queries by @alexlin2 in #751
* transport: Remove DaskTransport dead code by @ym-han in #767
* Add editorconfig by @paul-nechifor in #769
* add `type: ignore` by @paul-nechifor in #768
* exclude .md changes from CICD builds by @spomichter in #770
* Working Ivan g1 detection in blueprints by @spomichter in #737
* small env fixes on a fresh install by @leshy in #778
* autofixes by @paul-nechifor in #744
* Support running local agents by @paul-nechifor in #739
* pin major version of langchain packages by @paul-nechifor in #789
* Deduplicate Unitree connections/entrypoints. by @paul-nechifor in #749
* Add TTS and STT by @paul-nechifor in #753
* fix mypy errors by @paul-nechifor in #791
* Use structlog and store JSON logs on disk by @paul-nechifor in #715
* Rpc fixes merge by @paul-nechifor in #801
* transport improvements by @leshy in #713
* Added concurrency check by @spomichter in #803
* make connections work with string annotations by @paul-nechifor in #807
* Run mypy checks in GitHub Actions by @paul-nechifor in #805
* Fix incorrect `= None` by @paul-nechifor in #802
* increase mujoco timeout by @paul-nechifor in #823
* MacOS Support: tests + devShell + mujoco by @jeff-hykin in #745
* nix flake revert by @leshy in #824
* fix mypy issues by @paul-nechifor in #827
* PRODUCTION Nav skills on drone with tracking by @spomichter in #640
* Fix added memory limit to blueprint global config by @spomichter in #856
* models/ refactor by @leshy in #819
* Point Detections by @leshy in #859
* Add generic ignore to gitignore by @jeff-hykin in #864
* fix set transport by @paul-nechifor in #866
* cli-precedence by @paul-nechifor in #857
* show `get_data` progress by @paul-nechifor in #873
* skip if OPENAI_API_KEY not defined by @paul-nechifor in #872
* build foxglove extension by @paul-nechifor in #871
* New planner by @paul-nechifor in #792
* Use `uv` by @paul-nechifor in #870
* Add direnv to gitignore by @Kaweees in #875
* Cuda mapper by @leshy in #862
* rename agents to agents_deprecated by @paul-nechifor in #877
* new planner new mapper by @paul-nechifor in #879
* odom ts parsing by @leshy in #882
* Sim fix by @paul-nechifor in #881
* navigation tuning by @leshy in #883
* Fix: Module init and agents by @leshy in #876
* Remove old setup.sh by @paul-nechifor in #888
* Release planner by @leshy in #887
* fix replay leak by @paul-nechifor in #890
* first pass on large file deletions by @leshy in #891
* Generalized manipulator driver by @mustafab0 in #831
* Restore MacOS Support (flake.nix) by @jeff-hykin in #863
* check-uv by @paul-nechifor in #902
* Make dimos pip-installable by @paul-nechifor in #731
* Revert "Restore MacOS Support (flake.nix)" by @leshy in #907
* jeff flake without py env stuff by @leshy in #911
* remove deprecated docker files by @paul-nechifor in #912
* command center stop and home by @leshy in #893
* use packages by @paul-nechifor in #915
* Fix agents prompt by @paul-nechifor in #914
* fix manifest by @paul-nechifor in #916
* fix move skill by @paul-nechifor in #913
* Ignore individual errors by @paul-nechifor in #919
* Feat/rerun latency panels by @Nabla7 in #917
* WIP Release detections by @leshy in #889
* Remove old navigation modules by @paul-nechifor in #923
* Feat/rerun latency panels by @Nabla7 in #925
* Repair camera module by @leshy in #929
* Repair Stream by @leshy in #932
* Docs Clean by @leshy in #933
* docs: sensor streams by @leshy in #934
* Docs: bugfixes by @leshy in #940
* Fixed doclinks to use git ls by @spomichter in #943
* Examples: third party language interop by @leshy in #946
* DOCS: temporal alignment docs improvements by @leshy in #944
* filter bots from commits by @leshy in #947
* Fix skills by @paul-nechifor in #950
* Limit Rerun viewer memory to 4GB default by @Nabla7 in #949
* Working dimensional MCP server - tested with Claude Code MCP client by @spomichter in #945
* allow registration of different agents by @paul-nechifor in #951
* Pre commit large files by @leshy in #953
* Proper Realsense and ZED Camera Drivers by @alexlin2 in #935
* Granular deps by @leshy in #894
* class VLMAgent(AgentSpec, Module) for streamed VLM queries over Transport by @spomichter in #960
* mac compatible commit filter by @paul-nechifor in #961

## New Contributors
* @ym-han made their first contribution in #767
* @jeff-hykin made their first contribution in #745
* @Kaweees made their first contribution in #875
* @mustafab0 made their first contribution in #831
* @Nabla7 made their first contribution in #917

**Full Changelog**: v0.0.5...v0.0.6

Former-commit-id: 26e61a70a9469f2e33e51f1296f082b470009c09 [formerly 7ffc878]
Former-commit-id: 725e628
Former-commit-id: 2e5f1d4
adds ignore missing imports for ros/mujoco
writes consistent type spec for models
    visual language models: models/vl/base.py
    embedding models: models/embedding/base.py
    local models: models/base.py
    huggingface models: models/base.py

classDiagram
    class LocalModel {
        +device
        +dtype
        +_model
    }
    class HuggingFaceModel {
        +model_name
        +from_pretrained()
    }
    class Captioner {
        <<ABC>>
        +caption(image)
    }
    class VlModel {
        +query(image, query)
        +query_json()
        +query_detections()
    }
    class EmbeddingModel~E~ {
        <<ABC>>
        +embed(*images)
        +embed_text(*texts)
    }
    class Embedding {
        +vector
        +to_numpy()
        +to_torch()
    }
    LocalModel <|-- HuggingFaceModel
    Captioner <|-- VlModel
    Captioner <|-- Florence2Model
    HuggingFaceModel <|-- Florence2Model
    EmbeddingModel <|-- CLIPModel
    HuggingFaceModel <|-- CLIPModel
    EmbeddingModel <|-- MobileCLIPModel
    LocalModel <|-- MobileCLIPModel
    VlModel <|-- Qwen : remote API
    VlModel <|-- Moondream : local GPU
    VlModel <|-- MoondreamHosted : remote API
    EmbeddingModel ..> Embedding : produces

comprehensive parametric tests for all vl/, embedding/ and captioning/ models (you don't write your own tests for a new module, just add it to a list)
ensures models are resources, so have start() and stop() methods that clean up cuda resources, adds optional argument for warmup when initializing
adds methods for batch querying of models (for speed)
adds optional auto-resize of input images to models (for speed)
adds florence2 fast captioning model (implements captioning model type, all vlms are captioners, not all captioners are vlms)
adds torch reid models to lfs
adds mobileclip models to lfs
cleans up dead code (models/segmentation, pointcloud, labels, qwen)
adds a quick cli graphing system at dimos/utils/cli/plot.py for profiling
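
For reviewers who want a picture of the start()/stop() resource lifecycle mentioned above, here is a minimal sketch. It is not the code in models/base.py: the `Resource` base class, the `_load` helper, the `load_count` counter and the `loaded` property are hypothetical, and the expensive GPU load is replaced by a placeholder object. It assumes the model is held in a cached property that stop() drops again, and that `warmup` simply loads the model at construction time.

```python
from abc import ABC, abstractmethod
from functools import cached_property


class Resource(ABC):
    """Anything with an explicit lifecycle (hypothetical base class)."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...


class LocalModel(Resource):
    """Illustrative stand-in: loads weights lazily, frees them on stop()."""

    def __init__(self, warmup: bool = False) -> None:
        self.load_count = 0  # only here to make the behaviour observable
        if warmup:
            # Assumption: warmup means "load eagerly at construction time".
            self.start()

    def _load(self) -> object:
        # Placeholder for the expensive step (e.g. moving weights to a GPU).
        self.load_count += 1
        return object()

    @cached_property
    def _model(self) -> object:
        return self._load()

    def start(self) -> None:
        _ = self._model  # touching the cached property triggers the load

    def stop(self) -> None:
        # Dropping the cached value makes the next access reload the model;
        # a real implementation would also release GPU memory at this point.
        self.__dict__.pop("_model", None)

    @property
    def loaded(self) -> bool:
        return "_model" in self.__dict__
```

Calling start() twice loads once, and calling stop() twice is harmless, which keeps the lifecycle safe to drive from fixtures or module teardown.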
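
The "all vlms are captioners, not all captioners are vlms" relationship and the batch query method can be sketched roughly as follows. The class names `Captioner` and `VlModel` come from the diagram above; everything else (the default caption prompt, the sequential fallback in `query_batch`, and the toy `EchoVlModel` and `FixedCaptioner` classes) is an assumption made for illustration, not the actual code in models/vl/base.py.

```python
from abc import ABC, abstractmethod


class Captioner(ABC):
    @abstractmethod
    def caption(self, image: object) -> str: ...


class VlModel(Captioner):
    @abstractmethod
    def query(self, image: object, query: str) -> str: ...

    def query_batch(self, images: list, query: str) -> list[str]:
        # Fallback: one call per image. A GPU-backed subclass would
        # presumably override this with a single batched forward pass.
        return [self.query(image, query) for image in images]

    def caption(self, image: object) -> str:
        # Any VLM can caption by asking itself a fixed question
        # (the prompt text is an assumption).
        return self.query(image, "Describe this image.")


class EchoVlModel(VlModel):
    """Toy VLM that echoes its inputs instead of running a network."""

    def query(self, image: object, query: str) -> str:
        return f"{query}: {image}"


class FixedCaptioner(Captioner):
    """Toy caption-only model: a Captioner that is not a VlModel."""

    def caption(self, image: object) -> str:
        return "an image"
```

With this shape, code that only needs captions can accept any `Captioner`, while code that needs free-form questions asks for a `VlModel`.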
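
For the optional auto-resize, the arithmetic involved is roughly the following. This is a sketch under assumptions: it treats auto-resize as an aspect-preserving shrink to fit within a maximum size and never upscales, which may differ from what the models actually do. Both function names are hypothetical. The second helper shows how pixel coordinates measured on the resized image map back onto the original.

```python
def fit_within(size: tuple[int, int], max_size: tuple[int, int]) -> tuple[int, int]:
    """Return (width, height) shrunk to fit max_size, keeping aspect ratio."""
    width, height = size
    max_width, max_height = max_size
    scale = min(max_width / width, max_height / height, 1.0)  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_original_pixels(
    box: tuple[float, float, float, float],
    resized: tuple[int, int],
    original: tuple[int, int],
) -> tuple[float, float, float, float]:
    """Map an (x1, y1, x2, y2) pixel box from the resized image to the original."""
    scale_x = original[0] / resized[0]
    scale_y = original[1] / resized[1]
    x1, y1, x2, y2 = box
    return (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y)
```

For example, a 1024x768 frame with a (512, 512) limit becomes 512x384, and a box found at (256, 192, 512, 384) on the small image corresponds to (512, 384, 1024, 768) on the original.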
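
The "just add it to a list" testing approach might look something like this. The sketch uses plain Python so it runs without a test framework; in the repository the list would more likely feed `pytest.mark.parametrize`. The two toy captioners, `CAPTIONER_FACTORIES`, `check_captioner` and `run_all` are all hypothetical names.

```python
from typing import Callable


class UpperCaptioner:
    def caption(self, image: str) -> str:
        return image.upper()


class PrefixCaptioner:
    def caption(self, image: str) -> str:
        return f"caption of {image}"


# Adding a new model to the shared suite means adding one entry here.
CAPTIONER_FACTORIES: list[Callable[[], object]] = [UpperCaptioner, PrefixCaptioner]


def check_captioner(factory: Callable[[], object]) -> None:
    """The shared contract every listed model must satisfy."""
    model = factory()
    result = model.caption("a dog on grass")
    assert isinstance(result, str) and result, f"{factory.__name__} returned no caption"


def run_all() -> int:
    """Run the shared check over every registered factory; return how many ran."""
    for factory in CAPTIONER_FACTORIES:
        check_captioner(factory)
    return len(CAPTIONER_FACTORIES)
```

The point of the design is that the contract lives in one place, so a new module inherits the whole suite without new test code.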