feat: Integrate Chutes API with Kimi K2.5-TEE model (#1)
Conversation
- Add ChutesClient class for Chutes API (https://llm.chutes.ai/v1)
- Support CHUTES_API_TOKEN environment variable for authentication
- Set moonshotai/Kimi-K2.5-TEE as default model
- Enable thinking mode by default with <think>...</think> parsing
- Use Kimi K2.5 recommended parameters (temp=1.0, top_p=0.95 for thinking)
- Increase context limit to 256K tokens for Kimi K2.5
- Add openai>=1.0.0 dependency for OpenAI-compatible API client
- Keep LiteLLMClient as fallback for other providers
- Add get_llm_client() factory function for provider selection

Based on tau-agent integration pattern from: https://github.com/unconst/tau-agent
📝 Walkthrough

Multi-provider LLM support is added via a factory function that selects between Chutes API and OpenRouter clients based on configuration. The agent initialization flow now retrieves LLM clients dynamically. Configuration defaults are updated to use the Chutes provider with the Kimi-K2.5-TEE model, including thinking mode and cost tracking. New dependency on openai>=1.0.0.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant Factory as get_llm_client
    participant Config
    participant ChutesClient
    participant ChutesAPI

    Agent->>Config: Read provider config
    Config-->>Agent: provider="chutes", model="Kimi-K2.5-TEE"
    Agent->>Factory: Call get_llm_client(provider, model, ...)
    Factory->>Factory: Check provider == "chutes"
    Factory->>ChutesClient: Initialize with API token & model
    ChutesClient-->>Factory: Instance created
    Factory-->>Agent: Return ChutesClient
    Agent->>ChutesClient: chat(messages, thinking_mode=True)
    ChutesClient->>ChutesAPI: POST /chat/completions
    ChutesAPI-->>ChutesClient: Response with thinking + text
    ChutesClient->>ChutesClient: Parse thinking, calculate cost
    ChutesClient-->>Agent: LLMResponse(text, thinking, cost)
```
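For orientation, a minimal sketch of what such a factory could look like. The stand-in client classes and parameter names below are simplifications for illustration, not the actual implementation in src/llm/client.py:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChutesClient:
    """Stand-in for the real Chutes client (OpenAI-compatible endpoint)."""
    api_token: Optional[str]
    model: str
    base_url: str = "https://llm.chutes.ai/v1"


@dataclass
class LiteLLMClient:
    """Stand-in for the LiteLLM-based fallback client."""
    model: str


def get_llm_client(provider: str, model: str):
    """Return an LLM client for the configured provider (sketch only)."""
    if provider == "chutes":
        # Chutes authenticates via the CHUTES_API_TOKEN environment variable.
        return ChutesClient(api_token=os.environ.get("CHUTES_API_TOKEN"), model=model)
    # Any other provider falls back to the LiteLLM-based client.
    return LiteLLMClient(model=model)
```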
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/llm/client.py`:
- Line 381: The two clients use different hard-coded defaults for cost_limit
(LiteLLMClient sets 10.0 while ChutesClient sets 100.0); make them consistent by
centralizing the default: add a shared constant (e.g., DEFAULT_LLM_COST_LIMIT)
or read the same env var fallback in both classes, and update LiteLLMClient and
ChutesClient to use that constant/env-based default when initializing
self.cost_limit so both clients default to the same value and honor
LLM_COST_LIMIT uniformly.
🧹 Nitpick comments (7)
src/llm/client.py (5)
23-23: Unused import `sys`.

The `sys` module is imported but not used anywhere in this file.

🧹 Proposed fix

```diff
-import sys
```
194-204: Regex only captures the first `<think>` block.

If the model returns multiple `<think>...</think>` blocks, this regex with `re.search` will only capture the first one; `re.sub` will remove all of them, but the thinking content from subsequent blocks will be lost.

♻️ Proposed fix to capture all thinking blocks

```diff
 def _parse_thinking_content(self, text: str) -> Tuple[str, str]:
     """Parse thinking content from response.

     Kimi K2.5 can return thinking content in:
     1. <think>...</think> tags (for some deployments)
     2. reasoning_content field (official API)

     Returns (thinking_content, final_response).
     """
     if not text:
         return "", ""

     # Check for <think>...</think> pattern
     think_pattern = r"<think>(.*?)</think>"
-    match = re.search(think_pattern, text, re.DOTALL)
+    matches = re.findall(think_pattern, text, re.DOTALL)

-    if match:
-        thinking = match.group(1).strip()
+    if matches:
+        thinking = "\n\n".join(m.strip() for m in matches)
         # Remove the think block from the response
         response = re.sub(think_pattern, "", text, flags=re.DOTALL).strip()
         return thinking, response

     return "", text
```
260-267: Preserve exception chain for better debugging.

When re-raising exceptions, use `raise ... from e` to preserve the original traceback, which aids debugging.

♻️ Proposed fix

```diff
 try:
     response = self._client.chat.completions.create(**kwargs)
     self._request_count += 1
 except Exception as e:
     error_msg = str(e)
     if "authentication" in error_msg.lower() or "api_key" in error_msg.lower() or "unauthorized" in error_msg.lower():
-        raise LLMError(error_msg, code="authentication_error")
+        raise LLMError(error_msg, code="authentication_error") from e
     elif "rate" in error_msg.lower() or "limit" in error_msg.lower():
-        raise LLMError(error_msg, code="rate_limit")
+        raise LLMError(error_msg, code="rate_limit") from e
     else:
-        raise LLMError(error_msg, code="api_error")
+        raise LLMError(error_msg, code="api_error") from e
```
296-304: Hardcoded pricing may become stale.

The cost estimation uses hardcoded pricing values. Consider adding a comment noting where to find current pricing, or making these values configurable.
📝 Suggested improvement
```diff
-    # Estimate cost (Kimi K2.5 pricing via Chutes - approximate)
-    # $0.60 per million input tokens, $2.50 per million output tokens
-    input_cost_per_1k = 0.0006  # $0.60 / 1000
-    output_cost_per_1k = 0.0025  # $2.50 / 1000
+    # Estimate cost (Kimi K2.5 pricing via Chutes - approximate as of Feb 2026)
+    # Check https://chutes.ai/pricing for current rates
+    # $0.60 per million input tokens, $2.50 per million output tokens
+    input_cost_per_1k = 0.0006  # $0.60 / 1M = $0.0006 / 1K
+    output_cost_per_1k = 0.0025  # $2.50 / 1M = $0.0025 / 1K
```
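If configurability is preferred over a comment, one option is to read the rates from the environment with the hardcoded values as fallbacks. This is only a sketch; the variable names below are hypothetical and not part of the PR:

```python
import os

# Allow operators to override pricing without a code change; the variable
# names are illustrative, and the defaults mirror the values in the diff above.
input_cost_per_1k = float(os.environ.get("CHUTES_INPUT_COST_PER_1K", "0.0006"))
output_cost_per_1k = float(os.environ.get("CHUTES_OUTPUT_COST_PER_1K", "0.0025"))
```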
466-473: Preserve exception chain for better debugging.

Same issue as in `ChutesClient`: use `raise ... from e` to preserve the original traceback.

♻️ Proposed fix

```diff
 except Exception as e:
     error_msg = str(e)
     if "authentication" in error_msg.lower() or "api_key" in error_msg.lower():
-        raise LLMError(error_msg, code="authentication_error")
+        raise LLMError(error_msg, code="authentication_error") from e
     elif "rate" in error_msg.lower() or "limit" in error_msg.lower():
-        raise LLMError(error_msg, code="rate_limit")
+        raise LLMError(error_msg, code="rate_limit") from e
     else:
-        raise LLMError(error_msg, code="api_error")
+        raise LLMError(error_msg, code="api_error") from e
```

src/config/defaults.py (1)
82-89: Clarify caching behavior for Chutes API.

The comments mention "Chutes may support server-side caching" and "Keep system prompt stable for best performance," but `cache_extended_retention` and `cache_key` appear to be unused in `ChutesClient` (as noted by static analysis). Consider documenting whether these are placeholders for future use or should be removed.

agent.py (1)
60-60: Unused imports `ChutesClient` and `LiteLLMClient`.

These classes are imported but never directly referenced in this file. The factory function `get_llm_client` returns them internally.

🧹 Proposed fix

```diff
-from src.llm.client import get_llm_client, CostLimitExceeded, ChutesClient, LiteLLMClient
+from src.llm.client import get_llm_client, CostLimitExceeded
```
```python
self.model = model
self.temperature = temperature
self.max_tokens = max_tokens
self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "10.0"))
```
Inconsistent default cost limit between clients.
LiteLLMClient defaults to $10.0 while ChutesClient defaults to $100.0. This inconsistency could surprise users when switching providers.
🔧 Proposed fix for consistency
```diff
-self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "10.0"))
+self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "10.0"))
+self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
```
🤖 Prompt for AI Agents
In `@src/llm/client.py` at line 381, The two clients use different hard-coded
defaults for cost_limit (LiteLLMClient sets 10.0 while ChutesClient sets 100.0);
make them consistent by centralizing the default: add a shared constant (e.g.,
DEFAULT_LLM_COST_LIMIT) or read the same env var fallback in both classes, and
update LiteLLMClient and ChutesClient to use that constant/env-based default
when initializing self.cost_limit so both clients default to the same value and
honor LLM_COST_LIMIT uniformly.
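For illustration only, a minimal sketch of the centralized default described in the prompt above. The class bodies are reduced to the relevant line, and the 100.0 fallback value is an assumption to be aligned with project policy:

```python
import os
from typing import Optional

# Shared default so both clients honor LLM_COST_LIMIT uniformly.
# The constant name follows the reviewer's suggestion; the 100.0 fallback
# value is an assumption, not something fixed by the PR.
DEFAULT_LLM_COST_LIMIT = float(os.environ.get("LLM_COST_LIMIT", "100.0"))


class ChutesClient:
    def __init__(self, cost_limit: Optional[float] = None) -> None:
        # Fall back to the shared default instead of a per-class literal.
        self.cost_limit = cost_limit if cost_limit is not None else DEFAULT_LLM_COST_LIMIT


class LiteLLMClient:
    def __init__(self, cost_limit: Optional[float] = None) -> None:
        self.cost_limit = cost_limit if cost_limit is not None else DEFAULT_LLM_COST_LIMIT
```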
This umbrella commit combines changes from all three feature PRs:

- PR #1: Chutes API integration with Kimi K2.5-TEE model
- PR #2: Comprehensive documentation with Mermaid diagrams
- PR #3: Remove OpenRouter support, replace litellm with Chutes API

Conflicts resolved by taking the latest implementation from PR #3, which provides a cleaner httpx-based client without litellm dependency.
Summary
This PR integrates Chutes.ai API into baseagent, following the integration pattern from tau-agent.
Changes
New Features
- New `ChutesClient` for the Chutes API (`https://llm.chutes.ai/v1`)
- Default model: `moonshotai/Kimi-K2.5-TEE` (1T params, 32B activated)
- Thinking mode enabled by default with `<think>...</think>` parsing

Configuration Updates
- Default LLM provider set to `chutes`
- Temperature: `1.0` for thinking mode, `0.6` for instant mode
- `top_p`: `0.95` (Kimi K2.5 best practice)

Dependencies
- Add `openai>=1.0.0` for the OpenAI-compatible API client

Usage
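(The usage snippet from the original description did not survive extraction; below is a minimal sketch that exercises the Chutes endpoint directly through the `openai` package this PR adds, rather than the project's own client wrapper.)

```python
import os

from openai import OpenAI

# Point the OpenAI-compatible client at the Chutes endpoint; the model name
# and recommended sampling parameters come from the PR description.
client = OpenAI(
    base_url="https://llm.chutes.ai/v1",
    api_key=os.environ["CHUTES_API_TOKEN"],
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5-TEE",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=1.0,  # thinking-mode setting recommended for Kimi K2.5
    top_p=0.95,
)
print(response.choices[0].message.content)
```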
References
Acceptance Criteria
Summary by CodeRabbit