Open-source AI infrastructure for teams that need to own their stack.
We're a software company in Washington State building tools that make self-hosted LLM deployment practical on Kubernetes. Our work is open source, Apache 2.0 licensed, and designed for production use.
LLMKube — Kubernetes Operator for LLM Inference
A Kubernetes operator that turns LLM deployment into a two-line YAML problem. Define a Model and an InferenceService, and the operator handles the rest — downloading, caching, GPU scheduling, health checks, and exposing an OpenAI-compatible API.
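As a sketch, the two manifests could look something like this — the `apiVersion`, field names, and model source below are illustrative assumptions based on the description above, not LLMKube's documented schema:

```yaml
# Hypothetical manifests -- field names are illustrative assumptions,
# not the operator's documented CRD schema.
apiVersion: llmkube.dev/v1alpha1
kind: Model
metadata:
  name: llama-3-1-8b
spec:
  source: huggingface://meta-llama/Llama-3.1-8B-Instruct
---
apiVersion: llmkube.dev/v1alpha1
kind: InferenceService
metadata:
  name: llama-3-1-8b
spec:
  modelRef: llama-3-1-8b
  gpu: true
```

Applying both with `kubectl apply -f` would hand the rest — download, cache, GPU scheduling, health checks, API exposure — to the operator.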
```shell
llmkube deploy llama-3.1-8b --gpu
```

What makes it different:
- Heterogeneous GPU support — NVIDIA CUDA and Apple Silicon Metal in the same cluster, managed by the same CRDs. The Metal Agent runs inference natively on macOS while Kubernetes handles orchestration.
- OpenAI-compatible API — Drop-in replacement for OpenAI endpoints. Works with LangChain, LlamaIndex, and any OpenAI SDK.
- Full observability — Prometheus metrics, OpenTelemetry tracing, and Grafana dashboards included.
- Air-gap ready — Built for environments where cloud APIs aren't an option.
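Because the API is OpenAI-compatible, clients only need a different base URL. The sketch below builds a standard chat-completion payload with the Python standard library; the service hostname and port are assumptions about how your InferenceService would be exposed in-cluster, and the same payload is what the official OpenAI SDK (or LangChain/LlamaIndex) would send if pointed at that base URL:

```python
import json

# Hypothetical in-cluster endpoint -- the actual Service name and port
# depend on your InferenceService spec and namespace.
BASE_URL = "http://llama-3-1-8b.default.svc.cluster.local:8080/v1"

# Standard OpenAI chat-completion request body; any OpenAI SDK produces
# the same shape, so swapping base_url is the only client change needed.
payload = {
    "model": "llama-3.1-8b",
    "messages": [
        {"role": "user", "content": "Summarize Kubernetes in one sentence."}
    ],
}

# Serialized body you would POST to f"{BASE_URL}/chat/completions".
body = json.dumps(payload)
```

With the official `openai` package, the equivalent is constructing the client with `base_url=BASE_URL` and any placeholder API key.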
Everything we build is open source first. We believe the best infrastructure software gets built in the open, with input from the people who actually use it.
We welcome contributions at every level — from filing issues and improving docs to adding new features. If you're interested in Kubernetes, GPU orchestration, or LLM infrastructure, we'd love to work with you.
- Issues & features: GitHub Issues
- Questions & ideas: GitHub Discussions
- Contributing: CONTRIBUTING.md
- Website: defilan.com
- GitHub: github.com/defilantech
- Location: Washington State