A universal AI agent sandbox for evaluation, training data construction, and RL training
across ten open-source environments spanning Android, OS, Minecraft, embodied agents, QA, data processing, scientific discovery, and multimodal reasoning.
Quick Start • Environments • RL Training • Custom Env • Configuration • Data • Report
Safactory provides a unified pipeline so you can go from model evaluation to RL training without changing your codebase:
| Goal | What Safactory does |
|---|---|
| Evaluate agents | Run any LLM against realistic simulated environments and collect reward metrics |
| Build training data | Every interaction is automatically logged to SQLite — ready to be used as SFT / RL data |
| RL training | Feed rollout data directly into Slime-based GRPO training via the built-in Buffer Server |
Key strengths:
- 🌍 Multi-domain environments — Android, OS, Minecraft, RoboTrustBench, Embodied ALFRED and more
- ⚡ High concurrency — Environment pool management with async workers for fast parallel rollouts
- 🔌 LLM-agnostic — Works with any OpenAI-compatible endpoint (vLLM, SGLang, OpenAI API)
- 🏗️ Two deployment modes — `local` (single machine) or `remote` (Ray-based cluster)
- 🧩 Extensible — Add a new environment in < 50 lines by implementing a simple `BaseEnv` interface
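To make the extensibility claim concrete, here is a hypothetical sketch of what a minimal environment could look like. The method names (`reset`/`step`) and return shapes are assumptions based on common agent-environment interfaces, not the actual `BaseEnv` contract — see the Custom Environment guide for the real interface.

```python
# Hypothetical sketch of a minimal Safactory-style environment.
# Method names and signatures are illustrative assumptions; consult
# the Custom Environment docs for the real BaseEnv contract.
from dataclasses import dataclass, field

@dataclass
class EchoEnv:
    """Toy environment: rewards the agent for echoing the task string."""
    task: str = "say hello"
    done: bool = field(default=False)

    def reset(self) -> str:
        self.done = False
        return f"Task: {self.task}"   # initial observation for the agent

    def step(self, action: str):
        reward = 1.0 if self.task in action else 0.0
        self.done = True              # single-step episode for simplicity
        return f"You said: {action}", reward, self.done

env = EchoEnv()
env.reset()
obs, reward, done = env.step("I will say hello")
```

The point is the shape of the loop: an observation out, an action in, a scalar reward back — everything else (pooling, logging, scheduling) is handled by the framework.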
```bash
git clone https://github.com/AI45Lab/Safactory.git
cd Safactory
pip install -r requirements.txt
```

Then launch an evaluation run:

```bash
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL \
  --pool-size 1
```

- `--env-config`: selects the evaluation environment (OS / Android / Minecraft, etc.)
- `--llm-base-url`: model service address
- `--llm-api-key`: API key
- `--llm-model`: model name
- `--pool-size`: number of concurrent environment instances

This command automatically handles environment loading, task scheduling, and evaluation execution.
Configuration
- CLI parameters: control model access and concurrent execution (e.g., `--llm-*`, `--pool-size`)
- YAML configuration: defines specific environments and tasks (e.g., dataset, environment parameters)
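For illustration, an environment config might look roughly like the following. The key names here are guesses for the sake of example — the authoritative schema is in the Configuration guide:

```yaml
# Hypothetical config sketch — real keys may differ; see the
# Configuration guide for the actual config.yaml schema.
env_name: osgym
dataset: env/osgym/tasks.jsonl
max_steps: 30
env_params:
  docker_image: osgym:latest
  timeout_s: 120
```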
Every run automatically records step-level interactions (messages, responses, rewards, and environment state) to `test_envs.db`. Records are available immediately after the run completes.
See docs/data-manager.md for the database schema and example queries.
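As a quick sanity check, the log can be queried with Python's built-in `sqlite3` module. The table and column names below are illustrative stand-ins, not the real schema (which is documented in docs/data-manager.md); the sketch builds an in-memory database of the same general shape:

```python
# Illustrative only: the real test_envs.db schema is in
# docs/data-manager.md. The table/columns below are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")    # stand-in for test_envs.db
conn.execute(
    "CREATE TABLE steps (episode_id TEXT, step INTEGER, "
    "response TEXT, reward REAL)"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?, ?)",
    [("ep1", 0, "open app", 0.0), ("ep1", 1, "done", 1.0)],
)

# Aggregate reward per episode — the kind of filter you might apply
# when selecting trajectories for SFT / RL data construction.
row = conn.execute(
    "SELECT episode_id, SUM(reward) FROM steps GROUP BY episode_id"
).fetchone()
print(row)  # ('ep1', 1.0)
```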
RL training runs as two cooperating processes, started in separate terminals:
```bash
# Terminal 1 — Slime training process (requires Slime installation)
cd rl && ./run_slime_generator_vl.sh

# Terminal 2 — Buffer Server (launches the Safactory runner and collects rollouts)
cd rl && ./run_buffer_server.sh
```

Terminals 1 and 2 can run on different machines as long as they can communicate.
Full setup guide: docs/rl-training.md
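For intuition about the training side: GRPO scores each rollout relative to the other rollouts sampled for the same prompt, using the group's mean and standard deviation of reward. A minimal sketch of that advantage computation (illustrative only, not Slime's implementation):

```python
# Minimal GRPO-style group-relative advantage sketch.
# Not Slime's actual code — shown only to illustrate the idea.
def grpo_advantages(rewards: list[float]) -> list[float]:
    """Advantage = (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0           # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Four rollouts of one prompt: two succeeded, two failed.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts get positive advantage, failed ones negative, and the advantages of each group sum to zero — no separate value network is needed.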
Safactory supports optional experience extraction and injection. You can distill reusable lessons from historical trajectories into a local experience library, then inject relevant experience into the agent prompt at the start of a new episode.
For a detailed usage guide, see docs/experience-extraction-injection.md.
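The injection step can be pictured as prepending distilled lessons to the agent's prompt at episode start. The helper below is a hypothetical sketch; the actual API and prompt format are described in docs/experience-extraction-injection.md:

```python
# Hypothetical sketch of experience injection at episode start.
# The real API and prompt format live in
# docs/experience-extraction-injection.md.
def inject_experience(system_prompt: str, lessons: list[str]) -> str:
    """Prepend lessons distilled from past trajectories to the prompt."""
    if not lessons:
        return system_prompt          # nothing relevant in the library
    block = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"{system_prompt}\n\nLessons from past episodes:\n{block}"

prompt = inject_experience(
    "You are an OS automation agent.",
    ["Always verify a window is focused before typing."],
)
```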
| Guide | Description |
|---|---|
| Supported Environments | Setup, Prerequisites, Docker images, and Configuration |
| RL Training | Slime integration, Buffer Server setup, and RL parameters |
| Custom Environment | Step-by-step guide to adding a new environment |
| Configuration | Full CLI reference and config.yaml schema |
| Data Manager | Database schema and SQLite query examples |
| Report | Project report PDF |
Contributions for new environments, bug fixes, and documentation improvements are welcome.
- Fork the repository
- Implement your environment under `env/your_env_name/`
- Add a config YAML and a brief `README.md` in the same directory
- Open a Pull Request
For questions and bug reports, please use the issue tracker.