A universal AI agent sandbox for evaluation, training data construction, and RL training
across ten open-source environments spanning Android, OS, Minecraft, embodied agents, QA, data processing, scientific discovery, and multimodal reasoning.
Quick Start • Environments • RL Training • Custom Env • Configuration • Data • Report
Safactory provides a unified pipeline so you can go from model evaluation to RL training without changing your codebase:
| Goal | What Safactory does |
|---|---|
| Evaluate agents | Run any LLM against realistic simulated environments and collect reward metrics |
| Build training data | Every interaction is automatically logged to SQLite — ready to be used as SFT / RL data |
| RL training | Feed rollout data directly into Slime-based GRPO training via the built-in Buffer Server |
Key strengths:
- 🌍 Multi-domain environments — Android, OS, Minecraft, RoboTrustBench, Embodied ALFRED and more
- ⚡ High concurrency — Environment pool management with async workers for fast parallel rollouts
- 🔌 LLM-agnostic — Works with any OpenAI-compatible endpoint (vLLM, SGLang, OpenAI API)
- 🏗️ Two deployment modes — `local` (single machine) or `remote` (Ray-based cluster)
- 🧩 Extensible — Add a new environment in < 50 lines by implementing a simple `BaseEnv` interface
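To make the extensibility claim concrete, here is a hypothetical sketch of what a minimal environment could look like. The method names (`reset`/`step`) and return shapes are assumptions based on common agent-environment interfaces, not the actual `BaseEnv` contract — see the Custom Environment guide for the real interface.

```python
# Hypothetical sketch of a minimal Safactory-style environment.
# Method names and signatures are illustrative assumptions; consult
# the Custom Environment docs for the real BaseEnv contract.
from dataclasses import dataclass, field

@dataclass
class EchoEnv:
    """Toy environment: rewards the agent for echoing the task string."""
    task: str = "say hello"
    done: bool = field(default=False)

    def reset(self) -> str:
        self.done = False
        return f"Task: {self.task}"   # initial observation for the agent

    def step(self, action: str):
        reward = 1.0 if self.task in action else 0.0
        self.done = True              # single-step episode for simplicity
        return f"You said: {action}", reward, self.done

env = EchoEnv()
env.reset()
obs, reward, done = env.step("I will say hello")
```

The point is the shape of the loop: an observation out, an action in, a scalar reward back — everything else (pooling, logging, scheduling) is handled by the framework.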
```bash
git clone https://github.com/AI45Lab/Safactory.git
cd Safactory
pip install -r requirements.txt
```

Then launch an evaluation run:

```bash
python launcher.py \
  --env-config env/osgym/os_config.yaml \
  --llm-base-url http://YOUR_LLM_HOST/v1 \
  --llm-api-key YOUR_API_KEY \
  --llm-model YOUR_MODEL \
  --pool-size 1
```

- `--env-config`: selects the evaluation environment (OS / Android / Minecraft, etc.)
- `--llm-base-url`: model service address
- `--llm-api-key`: API key
- `--llm-model`: model name
- `--pool-size`: number of concurrent environment instances

This command automatically handles environment loading, task scheduling, and evaluation execution.
Configuration
- CLI parameters: control model access and concurrent execution (e.g., `--llm-*`, `--pool-size`)
- YAML configuration: defines specific environments and tasks (e.g., dataset, environment parameters)
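For illustration, an environment config might look roughly like the following. The key names here are guesses for the sake of example — the authoritative schema is in the Configuration guide:

```yaml
# Hypothetical config sketch — real keys may differ; see the
# Configuration guide for the actual config.yaml schema.
env_name: osgym
dataset: env/osgym/tasks.jsonl
max_steps: 30
env_params:
  docker_image: osgym:latest
  timeout_s: 120
```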
Every run automatically records step-level interactions (messages, responses, rewards, and environment state) to `test_envs.db`. Records are available immediately after the run completes.
See docs/data-manager.md for the database schema and example queries.
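As a quick sanity check, the log can be queried with Python's built-in `sqlite3` module. The table and column names below are illustrative stand-ins, not the real schema (which is documented in docs/data-manager.md); the sketch builds an in-memory database of the same general shape:

```python
# Illustrative only: the real test_envs.db schema is in
# docs/data-manager.md. The table/columns below are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")    # stand-in for test_envs.db
conn.execute(
    "CREATE TABLE steps (episode_id TEXT, step INTEGER, "
    "response TEXT, reward REAL)"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?, ?)",
    [("ep1", 0, "open app", 0.0), ("ep1", 1, "done", 1.0)],
)

# Aggregate reward per episode — the kind of filter you might apply
# when selecting trajectories for SFT / RL data construction.
row = conn.execute(
    "SELECT episode_id, SUM(reward) FROM steps GROUP BY episode_id"
).fetchone()
print(row)  # ('ep1', 1.0)
```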
RL training runs as two cooperating processes, started in separate terminals:
```bash
# Terminal 1 — Slime training process (requires Slime installation)
cd rl && ./run_slime_generator_vl.sh

# Terminal 2 — Buffer Server (launches the Safactory runner and collects rollouts)
cd rl && ./run_buffer_server.sh
```

Terminals 1 and 2 can run on different machines as long as they can communicate.
Full setup guide: docs/rl-training.md
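For intuition about the training side: GRPO scores each rollout relative to the other rollouts sampled for the same prompt, using the group's mean and standard deviation of reward. A minimal sketch of that advantage computation (illustrative only, not Slime's implementation):

```python
# Minimal GRPO-style group-relative advantage sketch.
# Not Slime's actual code — shown only to illustrate the idea.
def grpo_advantages(rewards: list[float]) -> list[float]:
    """Advantage = (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0           # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Four rollouts of one prompt: two succeeded, two failed.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts get positive advantage, failed ones negative, and the advantages of each group sum to zero — no separate value network is needed.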
Safactory supports optional experience extraction and injection. You can distill reusable lessons from historical trajectories into a local experience library, then inject relevant experience into the agent prompt at the start of a new episode.
For a detailed usage guide, see docs/experience-extraction-injection.md.
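The injection step can be pictured as prepending distilled lessons to the agent's prompt at episode start. The helper below is a hypothetical sketch; the actual API and prompt format are described in docs/experience-extraction-injection.md:

```python
# Hypothetical sketch of experience injection at episode start.
# The real API and prompt format live in
# docs/experience-extraction-injection.md.
def inject_experience(system_prompt: str, lessons: list[str]) -> str:
    """Prepend lessons distilled from past trajectories to the prompt."""
    if not lessons:
        return system_prompt          # nothing relevant in the library
    block = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"{system_prompt}\n\nLessons from past episodes:\n{block}"

prompt = inject_experience(
    "You are an OS automation agent.",
    ["Always verify a window is focused before typing."],
)
```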
| Guide | Description |
|---|---|
| Supported Environments | Setup, Prerequisites, Docker images, and Configuration |
| RL Training | Slime integration, Buffer Server setup, and RL parameters |
| Custom Environment | Step-by-step guide to adding a new environment |
| Configuration | Full CLI reference and config.yaml schema |
| Data Manager | Database schema and SQLite query examples |
| Report | Project report PDF |
Contributions for new environments, bug fixes, and documentation improvements are welcome.
- Fork the repository
- Implement your environment under `env/your_env_name/`
- Add a config YAML and a brief `README.md` in the same directory
- Open a Pull Request
For questions and bug reports, please use the issue tracker.