QitOS is the torch-flavor framework for agent researchers.
Prototype methods, run benchmarks, and inspect long-horizon trajectories on one AgentModule + Engine kernel with built-in qita observability.
- v0.5 multimodal core phase 1 is now in the main kernel: OpenAI-compatible image input, screenshot-first `ObservationPack` support, qita visual asset inspection, and a new `visual_inspect_agent` baseline for visual-web / GUI research.
- v0.5 computer-use phase 1 is now live: an OSWorld-inspired `DesktopEnv`, provider-neutral GUI action protocols, `ComputerUseToolSet`, and a new `openai_cua_agent` example for desktop automation research on OpenAI-compatible multimodal models.
- Desktop benchmarking is now split into clear layers: `desktop-starter` remains the canonical starter benchmark, `qitos.recipes.desktop.osworld_starter` now hosts the reproducible baseline recipe, and `qitos.benchmark.osworld` is the new home for real OSWorld-style adapter/runtime/evaluator integration.
- QitOS now separates starter benchmarks, real benchmark adapters, and reproducible recipes across the whole benchmark surface: GAIA, Tau-Bench, CyBench, `desktop-starter`, and `osworld` all route through `qitos.benchmark` plus `qitos.recipes`, with a new contributor guide for third-party benchmark integration.
- Official reproducible-run foundation with `RunSpec`, `ExperimentSpec`, and normalized benchmark outputs.
- New `qit bench` workflow for `run`, `eval`, `replay`, and `export`.
- qita replay, export, and diff surfaces for review-grade trajectory inspection.
- Course-style tutorial track plus new reproducibility and failed-run replay guides.
If this direction resonates, please star the repo, open an issue, or contribute. Early feedback matters a lot.
- Method researchers who want to change prompts, parsers, critics, tools, and memory policies without rewriting the runtime.
- Benchmark users who want GAIA, Tau-Bench, and CyBench workflows on the same kernel they use for agent development.
- Long-running agent debuggers who care about trajectory review, replay, diff, and context-collapse diagnosis instead of app scaffolding alone.
The minimal QitOS agent is a small coding agent: it configures a real model, works inside a workspace, edits code, runs a verification command, and leaves behind a qita-ready trace.
pip install "qitos[models]"
export OPENAI_API_KEY="sk-..."
qit demo minimal
qita board --logdir runs
Optional, but common for OpenAI-compatible providers:
export OPENAI_BASE_URL="https://api.siliconflow.cn/v1/"
export QITOS_MODEL="Qwen/Qwen3-8B"
`qit demo minimal` seeds a tiny buggy workspace, asks a model-backed coding agent to fix it, verifies the patch, and writes the trajectory to `./runs`.
Then go deeper:
- Want ReAct? See `examples/patterns/react.py`.
- Want a coding agent? See `examples/real/coding_agent.py`.
- Want benchmarks? Start with the benchmark guides.
| If you want... | QitOS gives you... |
|---|---|
| reproducible agent research | a stable AgentModule + Engine kernel |
| observability | qita board, replay, export, and trace artifacts |
| benchmark workflows | GAIA, Tau-Bench, and CyBench adapters |
| less framework glue code | one canonical execution loop |
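The kernel idea in the table above can be pictured as a split between an agent that only decides the next action and an engine that owns the loop, tool execution, and trace. The following is a minimal pure-Python sketch of that split under stated assumptions: `Action`, `CounterAgent`, `Engine`, `decide`, and `run` are hypothetical names chosen for illustration, not QitOS's actual `AgentModule` / `Engine` API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical stand-ins for illustration only; these are NOT the QitOS classes.
@dataclass
class Action:
    tool: str       # name of the tool to call, or "final" to stop
    arg: Any = None  # single argument for the tool (or the final answer)


class CounterAgent:
    """Toy agent: owns the policy (what to do next), never the loop."""

    def decide(self, observation: Any, step: int) -> Action:
        if observation is not None and observation >= 3:
            return Action("final", observation)
        return Action("increment", observation or 0)


@dataclass
class Engine:
    tools: dict[str, Callable[[Any], Any]]
    max_steps: int = 10
    trace: list[dict] = field(default_factory=list)

    def run(self, agent: CounterAgent) -> Any:
        observation = None
        for step in range(self.max_steps):
            action = agent.decide(observation, step)
            if action.tool == "final":
                self.trace.append({"step": step, "action": "final", "result": action.arg})
                return action.arg
            # The engine, not the agent, executes the tool and records the step.
            observation = self.tools[action.tool](action.arg)
            self.trace.append(
                {"step": step, "action": action.tool, "arg": action.arg, "observation": observation}
            )
        return None  # step budget exhausted without a final answer


engine = Engine(tools={"increment": lambda x: x + 1})
answer = engine.run(CounterAgent())
print(answer, len(engine.trace))
```

Because the loop and the trace live in the engine in this sketch, a different `decide` policy changes behavior without touching the runtime, which is the kind of separation the table describes.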
- ReAct: text protocol + one-action-per-step baseline.
- PlanAct: explicit plan first, then execute step by step.
- Tree-of-Thought: branch and select before acting.
- Reflexion: actor-critic loop with grounded retry behavior.
- Coding agent: practical coding loop with editor, shell, and memory.
- SWE agent: richer planning-oriented software engineering flow.
- Computer-use agent: web research and computer-use style interaction.
- OpenAI CUA-inspired desktop agent: OSWorld-style screenshot-first desktop control on the QitOS kernel.
- Visual inspect agent: screenshot-first multimodal baseline for visual-web and GUI research.
- EPUB reader: document-grounded reasoning with branching.
- GAIA: benchmark runner on the QitOS kernel.
- Tau-Bench: standardized benchmark adapter path.
- CyBench: CTF-like evaluation with guided metrics.
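The ReAct entry above describes a text protocol with one action per step. A minimal sketch of what enforcing that can look like, assuming a `Thought:` line plus a single `Action: tool[argument]` line per model reply; the exact format used by `examples/patterns/react.py` may differ, and `Step` and `parse_react` are hypothetical helpers, not QitOS functions.

```python
import re
from typing import NamedTuple


class Step(NamedTuple):
    thought: str
    tool: str
    arg: str


# Assumed text protocol (illustrative only): `Thought: ...` then `Action: tool_name[argument]`.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
THOUGHT_RE = re.compile(r"^Thought:\s*(.*)$", re.MULTILINE)


def parse_react(text: str) -> Step:
    """Parse exactly one action from a model reply; reject zero or multiple actions."""
    actions = ACTION_RE.findall(text)
    if len(actions) != 1:
        raise ValueError(f"expected exactly one Action line, found {len(actions)}")
    thought = THOUGHT_RE.search(text)
    tool, arg = actions[0]
    return Step(thought.group(1).strip() if thought else "", tool, arg)


reply = "Thought: I should look at the failing file.\nAction: read_file[src/app.py]"
print(parse_react(reply))
```

Rejecting replies with more than one `Action:` line is what keeps such a baseline at one action per step.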
Canonical examples live in the `examples/` directory, e.g. `examples/patterns/` and `examples/real/`.
QitOS separates tool imports into three layers:
- `qitos.kit`: the simplest curated entrypoint for common toolsets
- `qitos.kit.toolset`: scenario-oriented presets and registry builders
- `qitos.kit.tool.<domain>`: advanced atomic capability imports
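Because a tool can come from any of these layers, a registry has to accept single tools and whole toolsets side by side. The sketch below shows one way to flatten such a mixed list in plain Python. `Tool`, `MiniRegistry`, `include`, and `coding_preset` are hypothetical illustrations, not the real `qitos.ToolRegistry`, whose behavior (for example, how it treats duplicate names) may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical illustration only; this is not qitos.ToolRegistry.
@dataclass(frozen=True)
class Tool:
    name: str
    fn: Callable[..., object]


class MiniRegistry:
    def __init__(self) -> None:
        self.tools: dict[str, Tool] = {}

    def include(self, entries: Iterable[object]) -> "MiniRegistry":
        """Register a mixed list of single tools and toolsets (lists/tuples of entries)."""
        for entry in entries:
            if isinstance(entry, Tool):
                if entry.name in self.tools:
                    raise ValueError(f"duplicate tool name: {entry.name}")
                self.tools[entry.name] = entry
            elif isinstance(entry, (list, tuple)):
                self.include(entry)  # a toolset: flatten it recursively
            else:
                raise TypeError(f"expected a Tool or a toolset, got {type(entry).__name__}")
        return self  # return self so calls can be chained


def coding_preset() -> list[Tool]:
    """Hypothetical scenario preset: here a toolset is just a list of tools."""
    return [
        Tool("write_file", lambda path, text: None),
        Tool("run_shell", lambda cmd: None),
    ]


# One list mixes an atomic tool with a whole preset.
registry = MiniRegistry().include([Tool("read_file", lambda path: None), coding_preset()])
print(sorted(registry.tools))  # ['read_file', 'run_shell', 'write_file']
```

Raising on duplicate names is only one possible policy; a registry could equally let later entries override earlier ones.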
Composition is list-first by default: pass a single list that mixes atomic tools and toolset presets.
from qitos import ToolRegistry
from qitos.kit.tool.file import ReadFile
from qitos.kit.toolset import coding_tools
registry = ToolRegistry().include_toolset(
[
ReadFile(workspace_root="."),
coding_tools(workspace_root="."),
]
)
- Start here: Introduction
- First successful run: Quickstart
- Install options: Installation
- Build your own minimal coding agent: First Agent
- Build the first screenshot-first baseline: Multimodal Core and Visual-Web Research
- Learn the runtime: AgentModule / Engine
- Inspect traces: Observability
- Follow the course: Tutorials
- Run benchmarks: Benchmarks Overview
- Check commands: CLI Reference
- Need API details: API Reference
[Screenshots: QitOS CLI · qita Board · qita Trajectory View]
QitOS is currently in alpha.
- Stable direction: AgentModule + Engine, the trace/qita flow, canonical examples, benchmark adapters, and official reproducible-run contracts.
- Likely to evolve: higher-level convenience APIs, some `kit` modules, and experimental toolsets.
- If you are evaluating adoption, build on the kernel and examples rather than assuming the full API surface is frozen.
- For ongoing project evolution and upgrade notes, see CHANGELOG.md.
- Supported Python version: 3.10+
- User install: `pip install "qitos[models]"`
- Minimal coding agent: `qit demo minimal`
- Optional provider config: `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `QITOS_MODEL`
- Core-only install: `pip install qitos`
- Repo source install: `pip install -r requirements.txt`
- Full contributor install: `pip install -r requirements-dev.txt`
- Installation guide: Installation
Contributions are welcome, especially around benchmark adapters, memory/history workflows, qita UX, and cyber-agent use cases. Start with CONTRIBUTING.md for the PR process, DEVELOPMENT.md for the local workflow, ARCHITECTURE.md for system design, SECURITY.md for disclosure guidance, and CODE_OF_CONDUCT.md for community expectations.
MIT. See LICENSE.

