diff --git a/README.md b/README.md
index b2bb9767a..225687226 100644
--- a/README.md
+++ b/README.md
@@ -21,43 +21,28 @@ Train multi-step agents for real-world tasks using GRPO.
 
-## πŸ¦œπŸ”— LangGraph Integration: Build Smarter Multi-Step Agents
+## πŸ“ RULER: Zero-Shot Agent Rewards
 
-ART's **LangGraph integration** enables you to train sophisticated ReAct-style agents that improve through reinforcement learning. Build agents that reason, use tools, and adapt their behavior over time without manual prompt engineering.
+**RULER** (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the restβ€”**no labeled data, expert feedback, or reward engineering required**.
 
 ✨ **Key Benefits:**
 
-- **Automatic behavior improvement** - Train agents to get better at multi-step reasoning
-- **Tool usage optimization** - Learn when and how to use tools more effectively
-- **Seamless integration** - Drop-in replacement for LangGraph's LLM initialization
-- **RULER compatibility** - Train without hand-crafted reward functions
+- **2-3x faster development** - Skip reward function engineering entirely
+- **General-purpose** - Works across any task without modification
+- **Strong performance** - Matches or exceeds hand-crafted rewards in 3/4 benchmarks
+- **Easy integration** - Drop-in replacement for manual reward functions
 
 ```python
-import art
-from art.langgraph import init_chat_model, wrap_rollout
-from langgraph.prebuilt import create_react_agent
+# Before: Hours of reward engineering
+def complex_reward_function(trajectory):
+    # 50+ lines of careful scoring logic...
+    pass
 
-async def email_rollout(model: art.Model, scenario: str) -> art.Trajectory:
-    # Create LangGraph agent with ART's chat model
-    chat_model = init_chat_model(model.name)
-    agent = create_react_agent(chat_model, tools)
-
-    await agent.ainvoke({"messages": [("user", scenario)]})
-    return art.Trajectory(reward=1.0, messages_and_choices=[])
-
-# Train your agent
-scenarios = ["Find urgent emails", "Search Q4 budget"]
-
-# Using wrap_rollout (captures interactions automatically)
-groups = await art.gather_trajectory_groups([
-    art.TrajectoryGroup(wrap_rollout(model, email_rollout)(model, s) for _ in range(4))
-    for s in scenarios
-])
-
-await model.train(groups)
+# After: One line with RULER
+judged_group = await ruler_score_group(group, "openai/o3")
 ```
 
-[πŸ“– Learn more about LangGraph integration β†’](https://art.openpipe.ai/integrations/langgraph-integration) | [πŸ‹οΈ Try the notebook β†’](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb)
+[πŸ“– Learn more about RULER β†’](https://art.openpipe.ai/fundamentals/ruler)
 
 ## ART Overview
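The `ruler_score_group` one-liner in the diff above calls into ART's RULER implementation. To illustrate the underlying idea, here is a self-contained sketch of relative LLM-elicited scoring: a judge rates a group of trajectories against each other, and each trajectory's reward is its normalized standing within the group. The judge here is a deterministic stub standing in for a real LLM call, and every name (`Trajectory`, `stub_judge`, the local `ruler_score_group`) is a simplified stand-in, not ART's actual API.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """Simplified stand-in for an ART trajectory."""
    messages: list          # the rollout's message history
    reward: float = 0.0     # filled in by the judge

def stub_judge(trajectories):
    """Stand-in for an LLM-as-judge: returns one raw score per trajectory.

    A real implementation would send all trajectories to a judge model
    (e.g. "openai/o3") in one prompt and parse its relative scores; this
    stub just counts assistant messages so the example runs offline.
    """
    return [sum(1 for m in t.messages if m["role"] == "assistant")
            for t in trajectories]

def ruler_score_group(group):
    """Map the judge's raw scores to per-trajectory rewards in [0, 1]."""
    raw = stub_judge(group)
    lo, hi = min(raw), max(raw)
    for traj, score in zip(group, raw):
        # Normalize within the group; ties across the board get 0.5.
        traj.reward = 0.5 if hi == lo else (score - lo) / (hi - lo)
    return group

group = [
    Trajectory([{"role": "assistant", "content": "step 1"}]),
    Trajectory([{"role": "assistant", "content": "step 1"},
                {"role": "assistant", "content": "step 2"}]),
]
judged = ruler_score_group(group)
print([t.reward for t in judged])  # β†’ [0.0, 1.0]
```

Because rewards are relative within a group rather than absolute, the same scoring loop works unchanged across tasks, which is what lets RULER replace per-task reward engineering.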