feat(tools): add curl + gitbooks agent tools#907

Merged
senamakel merged 4 commits into
tinyhumansai:mainfrom
senamakel:feat/curl-gitbooks-tools
Apr 25, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented Apr 24, 2026

Summary

  • New curl agent tool: streams downloads to disk under <workspace>/<curl.dest_subdir> with a hard byte ceiling, returning { path, bytes_written, content_type, sha256 }.
  • New gitbooks_search + gitbooks_get_page tools that call the OpenHuman GitBook MCP server (https://tinyhumans.gitbook.io/openhuman/~gitbook/mcp) so the agent can answer product questions from the docs.
  • Extracted SSRF/allowlist guards out of http_request.rs into a shared url_guard.rs module so http_request and curl use one implementation.
  • Two new config blocks: [curl] (dest_subdir, max_download_bytes default 50 MB, timeout_secs) and [gitbooks] (enabled default true, endpoint, timeout_secs).

Problem

Agents had http_request (returns body inline, capped at ~1 MB) but no way to save a file to the workspace, and no first-class way to query our own product documentation. We were one allowlist away from a download tool, and the GitBook docs already speak MCP — we just needed a thin Rust client to surface those two upstream tools as native agent tools.

Solution

curl (src/openhuman/tools/impl/network/curl.rs)

  • Reuses http_request.allowed_domains so there is one allowlist to reason about.
  • Streams bytes_stream() chunks to disk; aborts and removes the partial file if the running total exceeds max_download_bytes.
  • Resolves dest_path lexically under <workspace>/<dest_subdir>: rejects absolute paths, .., and any escape attempts (belt-and-braces starts_with check after the component scan).
  • Computes SHA-256 incrementally during streaming.
  • Permission level Write.
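The lexical dest_path resolution described above could be sketched roughly as follows. This is a minimal illustration, not the code in curl.rs: the helper name `resolve_dest`, its signature, and its error strings are assumptions made for the example, and `root` stands in for the already-joined <workspace>/<dest_subdir>.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not the PR's actual function): resolve `dest_path`
/// under `root` purely lexically, without touching the filesystem.
fn resolve_dest(root: &Path, dest_path: &str) -> Result<PathBuf, String> {
    if dest_path.is_empty() {
        return Err("dest_path must not be empty".to_string());
    }
    let mut resolved = root.to_path_buf();
    // Component scan: only plain names (and `.`) are accepted.
    for component in Path::new(dest_path).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                return Err("`..` is not allowed in dest_path".to_string());
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err("absolute paths are not allowed".to_string());
            }
        }
    }
    // Belt-and-braces check after the scan: the result must still sit
    // under the root and must name something below it.
    if !resolved.starts_with(root) || resolved.as_path() == root {
        return Err("dest_path escapes the download root".to_string());
    }
    Ok(resolved)
}
```

Rejecting `..` outright, rather than normalizing it away, keeps the check independent of symlinks and of whether intermediate directories exist yet.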

gitbooks_* (src/openhuman/tools/impl/network/gitbooks.rs)

  • Inline McpHttpClient does tools/call over POST JSON-RPC and parses the single event: message\ndata: {…} SSE frame the GitBook server returns. No new MCP crate needed yet — kept inline so we can extract it the moment a second remote MCP server appears.
  • gitbooks_search → upstream searchDocumentation { query }; gitbooks_get_page → upstream getPage { url }.
  • Permission level ReadOnly. Togglable via [gitbooks].enabled.
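For readers unfamiliar with the framing, a one-shot parse of the `event: message` / `data: {…}` SSE frame might look like the sketch below. It is an illustration under assumptions: `first_sse_data` is a hypothetical name, and the real McpHttpClient presumably goes on to deserialize the payload as JSON-RPC, which is omitted here.

```rust
/// Hypothetical helper: return the payload of the first non-empty `data:`
/// line in an SSE response body, or `None` if there is no such line.
/// Only the single-frame case described in the PR is handled; multi-line
/// `data:` continuation is deliberately ignored.
fn first_sse_data(body: &str) -> Option<&str> {
    body.lines() // `lines()` also strips a trailing `\r` from CRLF bodies
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        .find(|payload| !payload.is_empty())
}
```

The returned slice borrows from the response body, so the caller can hand it straight to a JSON parser without copying.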

Shared url_guard.rs — moved out of http_request.rs. All 40 SSRF/allowlist tests carried over verbatim. http_request.rs is now ~190 LOC lighter and just imports validate_url.
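As a rough illustration of the kind of check a shared guard centralizes, the sketch below combines private/local IP blocking with allowlist matching. The function names, the subdomain-matching rule, and the exact set of blocked ranges are assumptions for the example; the real validate_url in url_guard.rs may differ, and it also handles scheme checks and host extraction, which are skipped here.

```rust
use std::net::IpAddr;

/// Assumed rule set: reject loopback, private, link-local and unspecified IPs.
fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

/// Hypothetical helper: `host` is an already-extracted hostname or IP literal.
/// A host passes if it is not local/private and equals an allowlist entry
/// or is a subdomain of one (subdomain matching is an assumption).
fn host_allowed(host: &str, allowed_domains: &[&str]) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    if host == "localhost" {
        return false;
    }
    if let Ok(ip) = host.parse::<IpAddr>() {
        if is_blocked_ip(ip) {
            return false;
        }
    }
    allowed_domains.iter().any(|entry| {
        // Normalize entries such as " .Example.com " to "example.com".
        let entry = entry.trim().trim_start_matches('.').to_ascii_lowercase();
        !entry.is_empty() && (host == entry || host.ends_with(&format!(".{entry}")))
    })
}
```

Note that the local-address check runs before the allowlist, so a private address cannot be re-enabled by listing it.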

Submission Checklist

  • Unit tests: cargo test for logic added/changed (124 unit tests in tools::implementations::network::* pass).
  • Live integration — env-gated live smoke tests for both tools that hit real endpoints:
    • OPENHUMAN_CURL_LIVE_TEST=1 cargo test live_download_example_com — downloads https://example.com/, asserts > 100 bytes and that the saved HTML contains "example domain".
    • OPENHUMAN_GITBOOKS_LIVE_TEST=1 cargo test live_search_smoke — queries the real GitBook MCP for "what is openhuman" and asserts a non-error result. Sample excerpt:

      Title: Welcome to OpenHuman
      Link: https://tinyhumans.gitbook.io/openhuman/overview/readme
      Content: OpenHuman connects to your communication platforms, tools, and workflows, and compresses that information into structured intelligence any AI can act on...

  • Doc comments — module-level //! headers + /// on public types and non-obvious helpers.
  • Inline comments — only where the why is non-obvious (the lexical .. rejection rationale, why the SSE parser is one-shot, why curl shares http_request.allowed_domains).

Impact

  • Runtime: desktop only (matches the rest of the agent loop). New tools auto-register through the existing tool registry — no UI/RPC plumbing needed.
  • Security: curl shares http_request's SSRF/allowlist guards, has a hard max_download_bytes ceiling, and rejects absolute / .. dest paths. gitbooks_* only talks to the configured fixed endpoint.
  • Defaults: [gitbooks].enabled = true means existing installs get the docs tools immediately on next core upgrade. [curl] is gated by http_request.allowed_domains exactly like http_request.
  • No migrations, no compat shims.
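The hard max_download_bytes ceiling listed under Security can be illustrated with a synchronous stand-in. The actual tool streams asynchronously and hashes as it goes; this sketch uses std::io::Read in place of the HTTP body, skips the SHA-256 step, and the name `save_with_ceiling` is hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical helper: copy `source` into `dest`, failing as soon as the
/// running total exceeds `max_bytes`. On any failure the partial file is removed.
fn save_with_ceiling<R: Read>(mut source: R, dest: &Path, max_bytes: u64) -> io::Result<u64> {
    let mut file = File::create(dest)?;
    let mut buf = [0u8; 8192];
    let mut written: u64 = 0;

    let outcome = loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break file.flush().map(|_| written), // end of stream
            Ok(n) => n,
            Err(e) => break Err(e),
        };
        written += n as u64;
        if written > max_bytes {
            break Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "download exceeds max_download_bytes",
            ));
        }
        if let Err(e) = file.write_all(&buf[..n]) {
            break Err(e);
        }
    };

    if outcome.is_err() {
        // Close the handle before deleting so removal also works on
        // platforms that refuse to delete open files.
        drop(file);
        let _ = fs::remove_file(dest);
    }
    outcome
}
```

Checking the total before writing each chunk means an oversized response never puts more than the ceiling on disk, even when the server omits or misreports Content-Length.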

Related

  • Closes:
  • Follow-up PR(s)/TODOs:
    • Consider adding a JSON-RPC E2E for curl against the mock server (scripts/test-rust-with-mock.sh).
    • Update src/openhuman/about_app/ capability catalog to mention the new tools.

Summary by CodeRabbit

  • New Features

    • Added Curl download tool (allowlist, size/time limits) and GitBooks tools for searching/fetching docs.
    • New built-in "help" agent that uses GitBooks and memory recall.
  • Chores

    • Centralized outbound URL validation for network tools.
    • Exposed curl/gitbooks configuration and default config fields; updated agent tool allowlists and prompts to include GitBooks.

- `curl` streams downloads to disk under <workspace>/<curl.dest_subdir>,
  shares http_request.allowed_domains, enforces hard byte ceiling
  mid-stream, returns {path, bytes_written, content_type, sha256}.
- `gitbooks_search` + `gitbooks_get_page` mirror the OpenHuman GitBook
  MCP server (searchDocumentation, getPage). Inline McpHttpClient
  parses SSE-framed JSON-RPC.
- Extract URL/SSRF guards from http_request into shared url_guard
  module so both http_request and curl use one implementation.
- Add [curl] and [gitbooks] config blocks with sensible defaults
  (50 MB cap, 120s timeout for curl; gitbooks enabled by default
  pointing at tinyhumans.gitbook.io/openhuman/~gitbook/mcp).

124 unit tests pass; live integration tests gated behind
OPENHUMAN_CURL_LIVE_TEST / OPENHUMAN_GITBOOKS_LIVE_TEST env vars
both pass against the real endpoints.
@senamakel senamakel requested a review from a team April 24, 2026 23:46
@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e92fa57f-6386-4cc9-976e-3557e52b05d2

📥 Commits

Reviewing files that changed from the base of the PR and between fff8f16 and ac0fc8b.

📒 Files selected for processing (14)
  • src/openhuman/agent/agents/code_executor/agent.toml
  • src/openhuman/agent/agents/help/agent.toml
  • src/openhuman/agent/agents/help/mod.rs
  • src/openhuman/agent/agents/help/prompt.md
  • src/openhuman/agent/agents/help/prompt.rs
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/mod.rs
  • src/openhuman/agent/agents/researcher/agent.toml
  • src/openhuman/agent/agents/welcome/agent.toml
  • src/openhuman/agent/agents/welcome/prompt.md
  • src/openhuman/config/schema/types.rs
  • src/openhuman/tools/impl/network/curl.rs
  • src/openhuman/tools/impl/network/gitbooks.rs
  • src/openhuman/tools/ops.rs
✅ Files skipped from review due to trivial changes (5)
  • src/openhuman/agent/agents/mod.rs
  • src/openhuman/agent/agents/code_executor/agent.toml
  • src/openhuman/agent/agents/researcher/agent.toml
  • src/openhuman/agent/agents/welcome/agent.toml
  • src/openhuman/agent/agents/help/prompt.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/openhuman/config/schema/types.rs
  • src/openhuman/tools/impl/network/gitbooks.rs
  • src/openhuman/tools/ops.rs

📝 Walkthrough

Walkthrough

Adds two network tools (Curl and GitBooks) with configs and top-level wiring; centralizes outbound URL validation into a new url_guard module; updates HTTP request tool to use url_guard; registers Curl unconditionally and GitBooks conditionally; and introduces a Help agent that uses GitBooks tools and its prompt.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration schema & top-level config: src/openhuman/config/mod.rs, src/openhuman/config/schema/{mod.rs,tools.rs,types.rs} | Expose and add CurlConfig and GitbooksConfig; integrate them into top-level Config with serde defaults. |
| Curl tool implementation: src/openhuman/tools/impl/network/curl.rs | New CurlTool: allowlist-validated downloads, safe destination resolution, streaming to disk with SHA-256, max-byte enforcement, proxy/timeout support, security-policy checks, and tests. |
| GitBooks tool implementation: src/openhuman/tools/impl/network/gitbooks.rs | New GitbooksSearchTool and GitbooksGetPageTool: JSON-RPC/SSE handling against GitBook MCP, response parsing/rendering, timeout/proxy handling, and tests. |
| URL validation centralization: src/openhuman/tools/impl/network/url_guard.rs | New module with strict validate_url and helpers: scheme checks, host extraction, private/local IP blocking, allowlist normalization and matching, with comprehensive tests. |
| HTTP request tool update: src/openhuman/tools/impl/network/http_request.rs | Delegated URL/domain validation to url_guard, removing prior in-file validation code. |
| Network module exports & registry: src/openhuman/tools/impl/network/mod.rs, src/openhuman/tools/ops.rs | Register CurlTool (always) and GitBooks tools (when enabled); export new tool types; tests ensuring registry behavior updated. |
| Agent additions & prompts: src/openhuman/agent/agents/{help,loader,mod.rs}, src/openhuman/agent/agents/help/{agent.toml,prompt.md,prompt.rs}, src/openhuman/agent/agents/{researcher,welcome}/{agent.toml,prompt.md}, src/openhuman/agent/agents/code_executor/agent.toml | Add help built-in agent and prompt builder; update builtins registry and several agent TOML files to include curl and GitBooks tools; adjust tests for tool availability and flags. |
| Misc tests & small mods: various mod tests updates across tools/loader | Added/updated unit tests for URL validation, curl/gitbooks behavior, registry presence, and live-test guards. |

Sequence Diagrams

sequenceDiagram
    participant Client
    participant CurlTool
    participant SecurityPolicy
    participant UrlGuard
    participant FileSystem
    participant RemoteServer

    Client->>CurlTool: execute(url, dest_path, headers)
    CurlTool->>SecurityPolicy: can_act() / record_action()
    alt Denied
        SecurityPolicy-->>CurlTool: Deny
        CurlTool-->>Client: Error
    else Allowed
        CurlTool->>UrlGuard: validate_url(url, allowed_domains)
        alt Invalid/Not allowlisted
            UrlGuard-->>CurlTool: Error
            CurlTool-->>Client: Error
        else Valid
            CurlTool->>FileSystem: resolve_safe_path(dest_subdir, dest_path)
            alt Unsafe path
                FileSystem-->>CurlTool: Error
                CurlTool-->>Client: Error
            else Safe
                CurlTool->>RemoteServer: GET (timeout, headers, proxy)
                RemoteServer-->>CurlTool: Response stream
                CurlTool->>FileSystem: stream -> file, update SHA256
                alt Exceeds max_download_bytes
                    CurlTool->>FileSystem: delete partial
                    CurlTool-->>Client: Error
                else Completed
                    CurlTool->>FileSystem: flush/close
                    CurlTool-->>Client: Success {path, bytes, content_type, sha256}
                end
            end
        end
    end
sequenceDiagram
    participant Client
    participant GitbooksTool
    participant HttpClient
    participant GitBookMCP

    Client->>GitbooksTool: execute(query or url)
    alt Invalid params
        GitbooksTool-->>Client: Error
    else Valid
        GitbooksTool->>HttpClient: POST JSON-RPC (timeout, proxy)
        HttpClient->>GitBookMCP: HTTP request
        GitBookMCP-->>HttpClient: Response (maybe text/event-stream)
        alt SSE
            HttpClient->>HttpClient: parse first "data:" JSON frame
        else JSON
            HttpClient->>HttpClient: parse JSON body
        end
        HttpClient-->>GitbooksTool: parsed result
        GitbooksTool->>GitbooksTool: render_tool_result(content[], isError)
        GitbooksTool-->>Client: Formatted ToolResult (success or error)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Poem

🐰
Curl fetches files, soft paws on the ground,
GitBooks hum knowledge, each doc-link a sound.
Paths guarded safe, checksums snug and round,
I hop through code where trusted domains are found,
A rabbit cheers: downloads and docs abound!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat(tools): add curl + gitbooks agent tools' directly and specifically summarizes the main change: adding two new agent tools (curl for downloads and gitbooks for documentation access). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.71% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

- New built-in `help` agent (delegate_name `ask_docs`) for product-docs
  questions. Uses gitbooks_search, gitbooks_get_page, memory_recall.
  Read-only sandbox, narrow tool scope, no shell/file_write/spawn.
- Add `curl` to researcher (artifact downloads alongside http_request)
  and code_executor (dataset/binary fetches alongside shell + git).
- Deliberately NOT added to orchestrator: orchestrator delegates rather
  than executing — code_executor / researcher own actual downloads.
  Wildcard agents (tools_agent, morning_briefing) inherit curl through
  the registry.
- Test count bumped 14 → 15; new tests pin help's tool scope, the new
  curl additions, and the orchestrator-must-not-have-curl invariant.
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
src/openhuman/tools/impl/network/curl.rs (1)

152-297: Expand [curl] logging across reject and failure paths.

This only logs start/success today. The autonomy gate, URL/path validation failures, request/send failures, non-2xx responses, size-cap aborts, and cleanup failures all return silently, which will make production debugging much harder for a networked write tool.

As per coding guidelines, "Add substantial development-oriented logging at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error paths" and "Rust debug logging must use stable grep-friendly prefixes (e.g., [domain], [rpc], [ui-flow]) and correlation fields."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/tools/impl/network/curl.rs` around lines 152 - 297, The execute
method in curl.rs currently logs only start/success; add debug/error logs with
target "[curl]" and stable prefix fields at each rejection/failure and key
branch (autonomy gate, rate limit, validate_url error in validate_url(),
resolve_dest() error, create_dir_all failure, client.build() failure,
request.send() failure, non-success HTTP status, fs::File::create() failure,
stream chunk Err, download-size-abort, write_all Err, flush Err, and any cleanup
fs::remove_file() Results) so failures are visible and correlated; for each
error path in execute(), emit tracing::error!(target = "[curl]", url = %url,
dest = %dest_path.display(), reason = %e) or tracing::debug!(...) as appropriate
and include contextual fields like url, dest, bytes_written, status, and sha256
where available, keeping the current return values (ToolResult::error(...))
unchanged and only adding logging calls adjacent to the existing early returns
and cleanup results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/tools/impl/network/curl.rs`:
- Around line 31-47: The constructor new currently accepts dest_subdir directly
which can be absolute or contain .. and escape workspace_dir; validate and
sanitize dest_subdir inside new (and the other place around lines 76-83) by
rejecting or normalizing any absolute paths or ParentDir components before
storing it: check Path::is_absolute(&dest_subdir) and iterate Path::components()
to detect Component::ParentDir, and if found either return an error (or fallback
to a safe default) or strip leading separators and parent segments to produce a
safe relative path; ensure resolve_dest and the starts_with(root) check then
operate on this validated/sanitized self.dest_subdir so workspace confinement
cannot be bypassed.
- Around line 249-278: The error and flush branches must drop the open file
handle before attempting to delete the partial download and ensure cleanup on
flush failure: in the stream chunk Err branch and the write_all Err branch, call
drop(file) (or let file = None if using Option) before calling
fs::remove_file(&dest_path).await (ignore its error), and in the final flush
error path likewise drop the file handle then remove the file and return the
error ToolResult; update the references around bytes_written/hasher as needed in
the function in curl.rs so the file handle is closed before removal.

In `@src/openhuman/tools/impl/network/gitbooks.rs`:
- Line 55: The debug log prints the full user-configurable
GitbooksConfig.endpoint (self.endpoint) and may leak secrets; before calling
tracing::debug in the Gitbooks implementation (the line with
tracing::debug!(target: "[gitbooks]", endpoint = %self.endpoint, ...)), parse
self.endpoint and compute a redacted_endpoint containing only the origin/host
(e.g. scheme+host+port or host only) or a safe "<redacted>" fallback if parsing
fails, then log endpoint = %redacted_endpoint (keeping the "[gitbooks]" target
string and tool = %name) instead of the raw self.endpoint.
- Around line 47-50: Replace the redirect policy that currently allows following
redirects with a no-redirect policy: locate the reqwest client builder
expression that assigns to builder (the line using reqwest::Client::builder()
and reqwest::redirect::Policy::limited(3)) and change the redirect policy to
reqwest::redirect::Policy::none() so the client will not follow 30x responses
away from the configured MCP endpoint.

In `@src/openhuman/tools/ops.rs`:
- Around line 171-195: Add unit tests in the test module for the tools registry
to assert that CurlTool is always registered and that GitbooksSearchTool and
GitbooksGetPageTool are only present when root_config.gitbooks.enabled is true;
specifically, construct root_config with gitbooks.enabled = true and = false,
call the same code path that registers tools (the block that calls
CurlTool::new, GitbooksSearchTool::new, GitbooksGetPageTool::new in ops.rs),
then inspect the produced tools vector or registry to assert a tool with the
curl identity/marker exists in both cases and that entries corresponding to
gitbooks_search/gitbooks_get_page exist only in the enabled case. Ensure tests
fail if the tool identities/names change by matching on the concrete types or
well-known identifiers used when creating those Boxed tools.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a5f6f19d-171a-41d4-8e2e-759b715b7515

📥 Commits

Reviewing files that changed from the base of the PR and between 215a776 and fff8f16.

📒 Files selected for processing (10)
  • src/openhuman/config/mod.rs
  • src/openhuman/config/schema/mod.rs
  • src/openhuman/config/schema/tools.rs
  • src/openhuman/config/schema/types.rs
  • src/openhuman/tools/impl/network/curl.rs
  • src/openhuman/tools/impl/network/gitbooks.rs
  • src/openhuman/tools/impl/network/http_request.rs
  • src/openhuman/tools/impl/network/mod.rs
  • src/openhuman/tools/impl/network/url_guard.rs
  • src/openhuman/tools/ops.rs

Welcome agent can now answer "how does X work" / "what can OpenHuman
do" / "where is the setting for…" questions during onboarding by
searching the real product docs instead of guessing feature names.

- Add gitbooks_search + gitbooks_get_page to welcome's named tool list.
- Update prompt.md: tell welcome to ground answers in the docs, cite
  the URL, and steer the conversation back to setup so it doesn't
  drift into an open-ended Q&A loop.
- Tests now assert both gitbooks tools are present and re-pin the
  read-only invariant (no shell/file_write/curl can leak in).
Merge upstream/main (which split onboarding-status checks into a new
`check_onboarding_status` tool) and address CodeRabbit's actionable
items on the curl/gitbooks PR.

Conflict resolution
- welcome/agent.toml: combined upstream's `check_onboarding_status`
  addition with our gitbooks tools (welcome now has all five tools).
- welcome/prompt.md: kept upstream's `check_onboarding_status` table
  and kept our gitbooks docs-grounded-answer block.
- loader.rs: dropped the old `tools.len() == 4` count assertion
  (welcome's surface keeps growing); added explicit `gitbooks_*`
  presence assertions.

CodeRabbit fixes
- curl: sanitize `dest_subdir` in `CurlTool::new` so a malicious
  `[curl].dest_subdir` config value (`/etc`, `../..`) cannot escape
  the workspace root. Added 4 sanitizer tests.
- curl: drop the file handle before `fs::remove_file` on every
  cleanup path (stream error, write error, size cap, flush error)
  so partial files are reliably deleted on Windows too. Log
  cleanup-removal failures at debug.
- curl: add tracing on every reject/failure path with the stable
  `[curl]` target and correlation fields (url, dest, bytes_written,
  reason).
- gitbooks: redact endpoint to scheme+host[:port] in debug logs so
  query strings, future bearer tokens, or accidental userinfo can't
  leak. Added 2 redactor tests.
- gitbooks: tighten redirect policy from `Policy::limited(3)` to
  `Policy::none()` — same hardening as `http_request`/`curl`, prevents
  an MCP-endpoint operator from redirecting our client to an
  attacker-controlled origin.
- ops: add registration tests — curl is always present;
  gitbooks_search/gitbooks_get_page register only when
  gitbooks.enabled = true.
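
The endpoint redaction from the gitbooks fix above (scheme+host[:port] only, with a "<redacted>" fallback) might look roughly like this. It is a dependency-free sketch under assumptions: `redact_endpoint` is a hypothetical name, and the actual redactor may well use a proper URL parser instead of string splitting.

```rust
/// Hypothetical helper: reduce an endpoint URL to `scheme://host[:port]`
/// for logging, dropping userinfo, path, query string and fragment.
fn redact_endpoint(endpoint: &str) -> String {
    const FALLBACK: &str = "<redacted>";

    let Some((scheme, rest)) = endpoint.split_once("://") else {
        return FALLBACK.to_string();
    };
    let scheme_ok = !scheme.is_empty()
        && scheme
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    if !scheme_ok {
        return FALLBACK.to_string();
    }

    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");
    // Anything before the last '@' is userinfo and must not be logged.
    let host_port = authority.rsplit('@').next().unwrap_or("");
    if host_port.is_empty() {
        return FALLBACK.to_string();
    }
    format!("{scheme}://{host_port}")
}
```

Falling back to a fixed placeholder, rather than logging the raw string when parsing fails, keeps a malformed but secret-bearing value out of the logs.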
@senamakel senamakel merged commit 98b74b5 into tinyhumansai:main Apr 25, 2026
9 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026