Qitor/qitos

QitOS

QitOS is a torch-flavored framework for agent researchers.

Prototype methods, run benchmarks, and inspect long-horizon trajectories on one AgentModule + Engine kernel with built-in qita observability.

Quickstart · Tutorial Track · Benchmarks · CLI Reference · Changelog · Chinese README

Latest Progress

  • v0.5 multimodal core phase 1 is now in the main kernel: OpenAI-compatible image input, screenshot-first ObservationPack support, qita visual asset inspection, and a new visual_inspect_agent baseline for visual-web / GUI research.
  • v0.5 computer-use phase 1 is now live: an OSWorld-inspired DesktopEnv, provider-neutral GUI action protocols, ComputerUseToolSet, and a new openai_cua_agent example for desktop automation research on OpenAI-compatible multimodal models.
  • Desktop benchmarking is now split into clear layers: desktop-starter remains the canonical starter benchmark, qitos.recipes.desktop.osworld_starter now hosts the reproducible baseline recipe, and qitos.benchmark.osworld is the new home for real OSWorld-style adapter/runtime/evaluator integration.
  • QitOS now separates starter benchmarks, real benchmark adapters, and reproducible recipes across the whole benchmark surface: GAIA, Tau-Bench, CyBench, desktop-starter, and osworld all route through qitos.benchmark plus qitos.recipes, with a new contributor guide for third-party benchmark integration.
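
"Provider-neutral GUI action protocols" suggests a single internal action vocabulary into which different model providers' outputs are translated. The following is a minimal sketch of that idea. The two payload shapes, the action classes, and the adapter functions are all hypothetical illustrations. They are not the QitOS protocol or any provider's documented format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str = "left"


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class KeyPress:
    keys: tuple[str, ...]


# The neutral vocabulary the desktop runtime would execute.
GuiAction = Union[Click, TypeText, KeyPress]


def from_provider_a(payload: dict) -> GuiAction:
    """Hypothetical provider A: {"type": "click", "x": 10, "y": 20, ...}."""
    kind = payload["type"]
    if kind == "click":
        return Click(int(payload["x"]), int(payload["y"]), payload.get("button", "left"))
    if kind == "type":
        return TypeText(payload["text"])
    if kind == "keypress":
        return KeyPress(tuple(payload["keys"]))
    raise ValueError(f"unsupported action type: {kind!r}")


def from_provider_b(payload: dict) -> GuiAction:
    """Hypothetical provider B: {"action": "left_click", "coordinate": [10, 20]}."""
    kind = payload["action"]
    if kind in ("left_click", "right_click"):
        x, y = payload["coordinate"]
        return Click(int(x), int(y), kind.split("_")[0])
    if kind == "type":
        return TypeText(payload["text"])
    if kind == "key":
        # Combos arrive as "ctrl+s" and are split into individual keys.
        return KeyPress(tuple(payload["text"].split("+")))
    raise ValueError(f"unsupported action: {kind!r}")
```

Because both adapters emit the same frozen dataclasses, a click from either provider compares equal. The executor and the trace therefore never need to know which model produced the action.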

What's New in v0.3.0

  • Official reproducible-run foundation with RunSpec, ExperimentSpec, and normalized benchmark outputs.
  • New qit bench workflow for run, eval, replay, and export.
  • qita replay, export, and diff surfaces for review-grade trajectory inspection.
  • Course-style tutorial track plus new reproducibility and failed-run replay guides.
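
A trajectory diff is most useful for locating the step where two runs of the same task first part ways. Below is a minimal sketch of that core comparison. It assumes a hypothetical StepRecord that holds only an action and an observation. It is not qita's diff implementation or trace schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StepRecord:
    # Hypothetical minimal trace entry; real qita traces carry far more.
    action: str
    observation: str


def first_divergence(a: Sequence[StepRecord], b: Sequence[StepRecord]) -> Optional[int]:
    """Return the index of the first step where two runs differ, or None.

    If one run is a strict prefix of the other, the divergence is reported
    at the shorter run's length (the point where one run stopped early).
    """
    for i, (step_a, step_b) in enumerate(zip(a, b)):
        if step_a != step_b:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

In a real review flow you would probably compare only normalized fields and ignore timestamps or token counts. Otherwise every pair of runs would "diverge" at step 0.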

If this direction resonates, please star the repo, open an issue, or contribute. Early feedback matters a lot.

Live Terminal of QitOS for Code Review

QitOS long-running agent demo

Who QitOS Is For

  • Method researchers who want to change prompts, parsers, critics, tools, and memory policies without rewriting the runtime.
  • Benchmark users who want GAIA, Tau-Bench, and CyBench workflows on the same kernel they use for agent development.
  • Long-running agent debuggers who care about trajectory review, replay, diff, and context-collapse diagnosis instead of app scaffolding alone.

Run QitOS in 2 Minutes

The minimal QitOS demo is a small coding agent. It configures a real model, works inside a workspace, edits code, runs a verification command, and leaves behind a qita-ready trace.

pip install "qitos[models]"
export OPENAI_API_KEY="sk-..."
qit demo minimal
qita board --logdir runs

Optional but common for OpenAI-compatible providers:

export OPENAI_BASE_URL="https://api.siliconflow.cn/v1/"
export QITOS_MODEL="Qwen/Qwen3-8B"
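
If you script against the same variables, a small helper keeps their resolution in one place. This is a hypothetical sketch. It only mirrors the variable names listed in this README and does not reproduce how QitOS itself reads them or which defaults it applies.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    api_key: str
    base_url: Optional[str]  # None: let the client use its own default endpoint
    model: Optional[str]     # None: assumed to be filled in by the caller


def load_provider_config(env: Mapping[str, str] = os.environ) -> ProviderConfig:
    """Read OPENAI_API_KEY, OPENAI_BASE_URL, and QITOS_MODEL from a mapping."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return ProviderConfig(
        api_key=api_key,
        base_url=env.get("OPENAI_BASE_URL") or None,  # empty string counts as unset
        model=env.get("QITOS_MODEL") or None,
    )
```

Accepting a plain mapping makes the helper easy to test without touching the real environment. The resulting api_key and base_url could then be passed to an OpenAI-compatible client such as openai.OpenAI(api_key=..., base_url=...).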

qit demo minimal seeds a tiny buggy workspace, asks a model-backed coding agent to fix it, verifies the patch, and writes the trajectory to ./runs.
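
The seed, fix, verify, and record cycle described above can be mimicked in plain Python to see what each stage produces. This is a self-contained toy, not the qit demo implementation. The file name calc.py, the check command, fake_agent_patch (which stands in for the model call), and the JSONL trace layout are all hypothetical, and real qita traces are much richer.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

BUGGY_SOURCE = "def add(a, b):\n    return a - b\n"
# Verification command run inside the workspace; '.' is the workspace root.
CHECK = "import sys; sys.path.insert(0, '.'); from calc import add; assert add(2, 3) == 5"


def verify(workspace: Path) -> bool:
    # -B stops Python from writing .pyc files, so a stale bytecode cache can
    # never mask the patched source on the second run.
    proc = subprocess.run(
        [sys.executable, "-B", "-c", CHECK], cwd=workspace, capture_output=True
    )
    return proc.returncode == 0


def fake_agent_patch(source: str) -> str:
    # Stand-in for the model: a real agent would propose this edit itself.
    return source.replace("a - b", "a + b")


def run_demo(root: Path) -> dict:
    workspace = root / "workspace"
    workspace.mkdir()
    target = workspace / "calc.py"
    target.write_text(BUGGY_SOURCE)  # 1. seed the buggy workspace

    trajectory = [{"step": 0, "event": "seed", "passed": verify(workspace)}]
    target.write_text(fake_agent_patch(target.read_text()))  # 2. apply the fix
    trajectory.append({"step": 1, "event": "patch", "passed": verify(workspace)})  # 3. verify

    runs = root / "runs"
    runs.mkdir()
    with open(runs / "trajectory.jsonl", "w") as fh:  # 4. record the trajectory
        for record in trajectory:
            fh.write(json.dumps(record) + "\n")
    return {"before": trajectory[0]["passed"], "after": trajectory[1]["passed"]}
```

Running it inside a temporary directory, for example with `with tempfile.TemporaryDirectory() as tmp: run_demo(Path(tmp))`, shows the check failing before the patch and passing after it.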

Then go deeper:

Why QitOS

If you want...                 | QitOS gives you...
reproducible agent research    | a stable AgentModule + Engine kernel
observability                  | qita board, replay, export, and trace artifacts
benchmark workflows            | GAIA, Tau-Bench, and CyBench adapters
less framework glue code       | one canonical execution loop
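
One reading of "torch-flavor" is the split described in this table. The method object defines behavior, much as an nn.Module defines forward, while a single engine owns the execution loop and records the trajectory. The following pure-Python sketch shows that division of labor. ToyAgentModule, ToyEngine, CountdownAgent, and Step are hypothetical names, and the real AgentModule and Engine interfaces may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """One trajectory record: what the agent saw, chose, and got back."""
    observation: Any
    action: str
    result: Any


class ToyAgentModule:
    """Hypothetical method-side base class: defines the policy, not the loop."""

    def decide(self, observation: Any) -> str:
        raise NotImplementedError


class CountdownAgent(ToyAgentModule):
    # Keeps decrementing until the observed counter reaches zero.
    def decide(self, observation: Any) -> str:
        return "stop" if observation <= 0 else "decrement"


@dataclass
class ToyEngine:
    """Hypothetical runtime: one loop shared by every agent."""
    max_steps: int = 10
    trajectory: list[Step] = field(default_factory=list)

    def run(self, agent: ToyAgentModule, state: int) -> list[Step]:
        self.trajectory = []
        for _ in range(self.max_steps):
            action = agent.decide(state)
            if action == "stop":
                self.trajectory.append(Step(state, action, None))
                break
            new_state = state - 1  # the only "tool" in this toy world
            self.trajectory.append(Step(state, action, new_state))
            state = new_state
        return self.trajectory
```

Swapping in a different agent changes only decide(). The step budget, the stopping logic, and the trajectory format stay in the engine, which is what makes runs of different methods comparable.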

Example Gallery

Core Patterns

  • ReAct: text protocol + one-action-per-step baseline.
  • PlanAct: explicit plan first, then execute step by step.
  • Tree-of-Thought: branch and select before acting.
  • Reflexion: actor-critic loop with grounded retry behavior.
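
To make "text protocol + one-action-per-step" concrete, here is a minimal parser for a ReAct-style model reply. The `Action: tool[argument]` and `Final Answer:` formats are assumptions borrowed from common ReAct prompting. They are not necessarily the exact protocol used by the QitOS ReAct example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line formats: "Action: tool_name[argument]" and "Final Answer: ...".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


@dataclass(frozen=True)
class ParsedStep:
    tool: Optional[str]
    argument: Optional[str]
    final_answer: Optional[str]


def parse_react(text: str) -> ParsedStep:
    """Extract exactly one action, or a final answer, from a model reply."""
    final = FINAL_RE.search(text)
    if final:
        return ParsedStep(None, None, final.group(1).strip())
    actions = ACTION_RE.findall(text)  # list of (tool, argument) tuples
    if len(actions) != 1:
        # Enforce the one-action-per-step contract instead of guessing.
        raise ValueError(f"expected exactly one Action line, found {len(actions)}")
    tool, argument = actions[0]
    return ParsedStep(tool, argument, None)
```

Rejecting replies with zero or several actions, rather than picking one, keeps each engine step tied to exactly one tool call. It also surfaces format drift as an explicit parse error in the trace.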

Real Agents

  • Coding agent: practical coding loop with editor, shell, and memory.
  • SWE agent: richer planning-oriented software engineering flow.
  • Computer-use agent: web research and computer-use style interaction.
  • OpenAI CUA-inspired desktop agent: OSWorld-style screenshot-first desktop control on the QitOS kernel.
  • Visual inspect agent: screenshot-first multimodal baseline for visual-web and GUI research.
  • EPUB reader: document-grounded reasoning with branching.

Evaluation

  • GAIA: benchmark runner on the QitOS kernel.
  • Tau-Bench: standardized benchmark adapter path.
  • CyBench: CTF-like evaluation with guided metrics.
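
Routing several benchmarks through one adapter layer keeps the run loop and output format fixed while task loading and scoring vary per benchmark. The following is a minimal sketch of that shape. It assumes a hypothetical BenchmarkAdapter protocol and Result record, does not reflect the actual qitos.benchmark interfaces, and uses exact-match scoring only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    expected: str


@dataclass(frozen=True)
class Result:
    # Hypothetical normalized record: same shape for every benchmark.
    benchmark: str
    task_id: str
    success: bool


class BenchmarkAdapter(Protocol):
    name: str

    def load_tasks(self) -> Iterable[Task]: ...

    def score(self, task: Task, answer: str) -> bool: ...


class ExactMatchToyBench:
    """A two-task stand-in benchmark with exact-match scoring."""
    name = "toy-exact-match"

    def load_tasks(self) -> Iterable[Task]:
        return [Task("t1", "2+2?", "4"), Task("t2", "capital of France?", "Paris")]

    def score(self, task: Task, answer: str) -> bool:
        return answer.strip() == task.expected


def run_benchmark(adapter: BenchmarkAdapter, agent: Callable[[str], str]) -> list[Result]:
    """Drive any adapter through the same loop and emit normalized results."""
    return [
        Result(adapter.name, task.task_id, adapter.score(task, agent(task.prompt)))
        for task in adapter.load_tasks()
    ]
```

Adding a new benchmark in this design means writing one class with load_tasks and score. run_benchmark and everything downstream of Result stay unchanged.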

Canonical examples live in:

Tooling Layout

QitOS separates tool imports into three layers:

  • qitos.kit: the simplest curated entrypoint for common toolsets
  • qitos.kit.toolset: scenario-oriented presets and registry builders
  • qitos.kit.tool.<domain>: advanced atomic capability imports

Composition is list-first by default: include_toolset takes a single list that can mix atomic tools and preset toolsets.

from qitos import ToolRegistry
from qitos.kit.tool.file import ReadFile
from qitos.kit.toolset import coding_tools

registry = ToolRegistry().include_toolset(
    [
        ReadFile(workspace_root="."),
        coding_tools(workspace_root="."),
    ]
)
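
A list-first registry has to flatten a list that contains both individual tools and whole toolsets. Below is a minimal sketch of how that could work. It assumes that toolsets are plain, possibly nested, lists and that a duplicate tool name keeps its first entry. How QitOS actually resolves duplicates is not stated here. ToyToolRegistry, Tool, and toy_coding_tools are hypothetical and are not the QitOS implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Tool:
    name: str


# A toolset is modeled as a (possibly nested) iterable of tools.
ToolSpec = Union[Tool, Iterable["ToolSpec"]]


class ToyToolRegistry:
    """Hypothetical registry accepting tools and toolsets in one list."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def include_toolset(self, specs: Iterable[ToolSpec]) -> "ToyToolRegistry":
        for spec in specs:
            if isinstance(spec, Tool):
                # Assumed policy: the first tool registered under a name wins.
                self._tools.setdefault(spec.name, spec)
            elif isinstance(spec, str):
                # Strings are iterable but are never valid tool specs.
                raise TypeError(f"expected a Tool or a toolset, got {spec!r}")
            else:
                self.include_toolset(spec)  # recurse into a preset toolset
        return self  # allows ToyToolRegistry().include_toolset(...) chaining

    def names(self) -> list[str]:
        return list(self._tools)


def toy_coding_tools() -> list[Tool]:
    # Stand-in for a preset such as coding_tools(); contents are invented.
    return [Tool("read_file"), Tool("write_file"), Tool("run_shell")]
```

With this policy, listing an explicitly configured tool before a preset means the explicit version takes precedence over any same-named tool inside the preset.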

Documentation Map

Preview

QitOS CLI qita Board qita Trajectory View

Status

QitOS is currently in alpha.

  • Stable direction: AgentModule + Engine, trace/qita flow, canonical examples, benchmark adapters, and official reproducible-run contracts.
  • Likely to evolve: higher-level convenience APIs, some kit modules, and experimental toolsets.
  • If you are evaluating adoption, start from the kernel and examples rather than assuming the API surface is frozen.
  • For ongoing project evolution and upgrade notes, see CHANGELOG.md.

Installation and Versions

  • Supported Python version: 3.10+
  • User install: pip install "qitos[models]"
  • Minimal coding agent: qit demo minimal
  • Optional provider config: OPENAI_API_KEY, OPENAI_BASE_URL, QITOS_MODEL
  • Core-only install: pip install qitos
  • Repo source install: pip install -r requirements.txt
  • Full contributor install: pip install -r requirements-dev.txt
  • Installation guide: Installation

Contributing

Contributions are welcome, especially around benchmark adapters, memory/history workflows, qita UX, and cyber-agent use cases. Start with CONTRIBUTING.md for the PR process, DEVELOPMENT.md for the local workflow, ARCHITECTURE.md for system design, SECURITY.md for disclosure guidance, and CODE_OF_CONDUCT.md for community expectations.

License

MIT. See LICENSE.