agentmux is a profile-driven launcher for vllm serve. It keeps model presets, runtime flags,
and environment choices in one repo so you can start a known-good server shape without retyping a
huge command line every time.
uv sync
uv pip install vllm --torch-backend=auto

uv run agentmux list
uv run agentmux show qwen2_5_7b
uv run agentmux render deepseek_r1_distill_qwen_14b
uv run agentmux serve qwen2_5_7b --dry-run
uv run agentmux serve qwen2_5_7b

Profiles live in `agentmux.toml`.
Each profile can define:
- `model`
- `served_model_name`
- `host` / `port`
- `dtype`
- `gpu_memory_utilization`
- `max_model_len`
- `max_num_seqs`
- `tensor_parallel_size`
- `attention_backend`
- `env`
- `extra_args`
- `notes`
- Start with stable vLLM on Python 3.12.
- Use `uv pip install vllm --torch-backend=auto` for the initial GPU-aware install.
- `.env` is loaded automatically by `agentmux` before rendering or launching profiles.
- Keep global env locked to stable machine facts like `CUDA_VISIBLE_DEVICES=0`; prefer per-profile flags over broad vLLM env vars.
- Keep project-specific wrappers and presets here; do not rely on shell history for production-ish runs.
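
The split between global `.env` values and per-profile `env` can be sketched as a layered merge. This is only an illustration: the precedence shown here (inherited environment, then `.env`, then the profile's `env` on top) is an assumption, and `parse_dotenv` and `build_env` are hypothetical helpers, not agentmux's actual loader.

```python
import os


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_env(
    base: dict[str, str], dotenv_text: str, profile_env: dict[str, str]
) -> dict[str, str]:
    # Assumed precedence, lowest to highest:
    # inherited environment, .env file, per-profile env.
    env = dict(base)
    env.update(parse_dotenv(dotenv_text))
    env.update(profile_env)
    return env


# .env holds stable machine facts; the profile adds run-specific settings.
dotenv_text = "# machine facts\nCUDA_VISIBLE_DEVICES=0\n"
profile_env = {"VLLM_LOGGING_LEVEL": "INFO"}
env = build_env(dict(os.environ), dotenv_text, profile_env)
```

The resulting dictionary could be passed as `env=` to `subprocess.run` so the launched server sees the merged values without mutating the parent process.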
- Specs: `docs/specs/`
- Decisions: `docs/decisions/`
- Reference: `docs/reference/`