MoMor (Model service Monitor) is an out-of-the-box monitoring system deployment for popular model serving frameworks. It provides comprehensive observability and flexible alerting for your model services without modification to your existing model server code.
- 🚀 Out-of-the-Box Experience: Get started immediately with minimal configuration. Check out our quick_start to see how easy it is.
- 🔌 Broad Framework Support: Seamlessly integrates with popular model serving systems, including:
  - Triton Inference Server
  - SGLang
    - copied from SGLang demo
  - vLLM
    - copied from vLLM demo
  - Text Embeddings Inference (TEI)
    - TODO
  - Any service exposing a Prometheus metrics endpoint
    - PRs welcome
- 🛡️ Robust Data Collection: Leverages the Telegraf Agent for reliable and efficient metrics scraping and aggregation.
- 📊 Elegant Observability: Features stunning dashboards and proactive alerting built on top of Grafana, giving you deep insights into your model performance.
- 💾 High-Performance Storage: Utilizes VictoriaMetrics for fast, cost-saving, scalable, and long-term time-series data storage.
- ⚙️ Production-Ready: Ready for high availability (HA) deployments, delivering a robust, uninterrupted monitoring system.
Simply use the Docker image.
docker run -tid --name=momor --network=host --privileged -v $(pwd)/workspace/:/workspace wingedge777/momor:latest bash
This assumes you have a model server (one of tritonserver, sglang, vllm, text-embeddings-inference) running on the host machine and that it exposes a /metrics endpoint.
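Since the setup depends on the model server exposing a /metrics endpoint, it can help to verify that before going further. The following is a minimal sketch, not part of MoMor: the function names `metrics_url`, `looks_like_prometheus`, and `check_metrics` are hypothetical, and the check is only a heuristic that looks for the `# HELP` / `# TYPE` lines most Prometheus exporters emit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with MoMor) for sanity-checking a
# model server's metrics endpoint before pointing a scraper at it.

# Build the metrics URL for a given host and port.
metrics_url() {
  local host="$1" port="$2"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Read a response body on stdin; succeed if it contains Prometheus
# exposition-format metadata lines ("# HELP ..." or "# TYPE ...").
# This is a heuristic: the metadata lines are common but optional.
looks_like_prometheus() {
  grep -Eq '^# (HELP|TYPE) '
}

# Fetch the endpoint and report whether it looks scrapeable.
check_metrics() {
  local url body
  url="$(metrics_url "$1" "$2")"
  # -f: fail on HTTP errors, -sS: quiet but still show errors.
  if body="$(curl -fsS --max-time 5 "$url")" \
      && looks_like_prometheus <<<"$body"; then
    echo "OK: $url serves Prometheus-style metrics"
  else
    echo "FAIL: $url did not return Prometheus-style metrics" >&2
    return 1
  fi
}
```

For example, `check_metrics localhost 8002` would probe a Triton server on its default metrics port; the port for other frameworks depends on how the server was launched, so adjust it to your deployment.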
docker exec -ti momor bash
bash script/main.sh
If you want to monitor a remote server, change the urls setting in the Telegraf configuration.
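As a rough illustration of what a remote-server setting might look like, the sketch below prints a Telegraf `inputs.prometheus` fragment for a list of metrics URLs. It assumes MoMor scrapes through Telegraf's standard `inputs.prometheus` plugin and its `urls` option; the helper name `render_prometheus_input` and the example addresses are hypothetical, and where the fragment belongs in MoMor's own config files is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a Telegraf inputs.prometheus fragment for the
# metrics URLs passed as arguments. Treat the output as a starting point;
# MoMor's actual Telegraf config layout may differ.
render_prometheus_input() {
  local urls="" url
  for url in "$@"; do
    # Append each URL as a quoted TOML string, comma-separated.
    urls+="${urls:+, }\"$url\""
  done
  printf '[[inputs.prometheus]]\n  urls = [%s]\n' "$urls"
}
```

Calling `render_prometheus_input http://10.0.0.5:8002/metrics` (an example address) emits a two-line fragment whose `urls` array contains that single endpoint; passing several URLs yields one array with all of them.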
See advance_usage
Thanks to the model serving frameworks' open-source community for their invaluable contributions. Special thanks to the creators of Grafana, VictoriaMetrics, and Telegraf. This project is built upon their excellent work.