Run AI models too large for your Mac's memory — at near-full speed. Intelligent expert caching, speculative execution, and 15+ research techniques for MoE inference on Apple Silicon.
Production RAG system for scientific literature synthesis with SPECTER2 embeddings, Metal GPU acceleration, multi-LLM support, and automatic BibTeX citations.
First native Apple Silicon (MLX) port of Diffusion Policy (RSS 2023 Best Paper). 6 policy variants, 472 tests, Metal GPU verified. Train and run visuomotor diffusion policies on M-series — no CUDA required.
GPU-accelerated quantum circuit simulator for Apple Silicon (MLX) with Google Willow, IBM Heron, and QuTech Tuna-9 noise models. Up to 34 qubits on M2 Ultra.
Comprehensive VAE performance benchmark comparing PyTorch vs TensorFlow on Apple Silicon (M1/M2/M3). Quantifies training speed, memory efficiency, and Metal GPU utilization across Python versions to guide framework selection for ML prototyping and production deployment.
The NVIDIA Container Toolkit for Mac — Give any Docker container full Apple Silicon Metal GPU access. 100+ GPU operations, LLM inference, training, image gen, audio, embeddings. Zero CUDA. Just Metal.
LeRobot-MLX: HuggingFace LeRobot ported to Apple MLX for native Apple Silicon robotics policy training & inference. 10 policies, 739+ tests, Metal GPU accelerated.
GPU-accelerated robot motion planning on Apple Silicon. Port of NVIDIA cuRobo (CUDA) to MLX — real-time collision-free trajectory generation on M-series Macs.