A high-throughput and memory-efficient inference and serving engine for LLMs
-
Updated
Dec 9, 2025 - Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
SGLang is a fast serving framework for large language models and vision language models.
中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
Low-code framework for building custom LLMs, neural networks, and other AI models
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
Add a description, image, and links to the llama topic page so that developers can more easily learn about it.
To associate your repository with the llama topic, visit your repo's landing page and select "manage topics."