feat(tools): add curl + gitbooks agent tools#907
Conversation
- `curl` streams downloads to disk under <workspace>/<curl.dest_subdir>,
shares http_request.allowed_domains, enforces hard byte ceiling
mid-stream, returns {path, bytes_written, content_type, sha256}.
- `gitbooks_search` + `gitbooks_get_page` mirror the OpenHuman GitBook
MCP server (searchDocumentation, getPage). Inline McpHttpClient
parses SSE-framed JSON-RPC.
- Extract URL/SSRF guards from http_request into shared url_guard
module so both http_request and curl use one implementation.
- Add [curl] and [gitbooks] config blocks with sensible defaults
(50 MB cap, 120s timeout for curl; gitbooks enabled by default
pointing at tinyhumans.gitbook.io/openhuman/~gitbook/mcp).
124 unit tests pass; live integration tests gated behind
OPENHUMAN_CURL_LIVE_TEST / OPENHUMAN_GITBOOKS_LIVE_TEST env vars
both pass against the real endpoints.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (14)
✅ Files skipped from review due to trivial changes (5)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds two network tools (Curl and GitBooks) with configs and top-level wiring; centralizes outbound URL validation into a new url_guard module; updates HTTP request tool to use url_guard; registers Curl unconditionally and GitBooks conditionally; and introduces a Help agent that uses GitBooks tools and its prompt. Changes
Sequence DiagramssequenceDiagram
participant Client
participant CurlTool
participant SecurityPolicy
participant UrlGuard
participant FileSystem
participant RemoteServer
Client->>CurlTool: execute(url, dest_path, headers)
CurlTool->>SecurityPolicy: can_act() / record_action()
alt Denied
SecurityPolicy-->>CurlTool: Deny
CurlTool-->>Client: Error
else Allowed
CurlTool->>UrlGuard: validate_url(url, allowed_domains)
alt Invalid/Not allowlisted
UrlGuard-->>CurlTool: Error
CurlTool-->>Client: Error
else Valid
CurlTool->>FileSystem: resolve_safe_path(dest_subdir, dest_path)
alt Unsafe path
FileSystem-->>CurlTool: Error
CurlTool-->>Client: Error
else Safe
CurlTool->>RemoteServer: GET (timeout, headers, proxy)
RemoteServer-->>CurlTool: Response stream
CurlTool->>FileSystem: stream -> file, update SHA256
alt Exceeds max_download_bytes
CurlTool->>FileSystem: delete partial
CurlTool-->>Client: Error
else Completed
CurlTool->>FileSystem: flush/close
CurlTool-->>Client: Success {path, bytes, content_type, sha256}
end
end
end
end
sequenceDiagram
participant Client
participant GitbooksTool
participant HttpClient
participant GitBookMCP
Client->>GitbooksTool: execute(query or url)
alt Invalid params
GitbooksTool-->>Client: Error
else Valid
GitbooksTool->>HttpClient: POST JSON-RPC (timeout, proxy)
HttpClient->>GitBookMCP: HTTP request
GitBookMCP-->>HttpClient: Response (maybe text/event-stream)
alt SSE
HttpClient->>HttpClient: parse first "data:" JSON frame
else JSON
HttpClient->>HttpClient: parse JSON body
end
HttpClient-->>GitbooksTool: parsed result
GitbooksTool->>GitbooksTool: render_tool_result(content[], isError)
GitbooksTool-->>Client: Formatted ToolResult (success or error)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
- New built-in `help` agent (delegate_name `ask_docs`) for product-docs questions. Uses gitbooks_search, gitbooks_get_page, memory_recall. Read-only sandbox, narrow tool scope, no shell/file_write/spawn. - Add `curl` to researcher (artifact downloads alongside http_request) and code_executor (dataset/binary fetches alongside shell + git). - Deliberately NOT added to orchestrator: orchestrator delegates rather than executing — code_executor / researcher own actual downloads. Wildcard agents (tools_agent, morning_briefing) inherit curl through the registry. - Test count bumped 14 → 15; new tests pin help's tool scope, the new curl additions, and the orchestrator-must-not-have-curl invariant.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/openhuman/tools/impl/network/curl.rs (1)
152-297: Expand[curl]logging across reject and failure paths.This only logs start/success today. The autonomy gate, URL/path validation failures, request/send failures, non-2xx responses, size-cap aborts, and cleanup failures all return silently, which will make production debugging much harder for a networked write tool.
As per coding guidelines, "Add substantial development-oriented logging at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error paths" and "Rust debug logging must use stable grep-friendly prefixes (e.g.,
[domain],[rpc],[ui-flow]) and correlation fields."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/impl/network/curl.rs` around lines 152 - 297, The execute method in curl.rs currently logs only start/success; add debug/error logs with target "[curl]" and stable prefix fields at each rejection/failure and key branch (autonomy gate, rate limit, validate_url error in validate_url(), resolve_dest() error, create_dir_all failure, client.build() failure, request.send() failure, non-success HTTP status, fs::File::create() failure, stream chunk Err, download-size-abort, write_all Err, flush Err, and any cleanup fs::remove_file() Results) so failures are visible and correlated; for each error path in execute(), emit tracing::error!(target = "[curl]", url = %url, dest = %dest_path.display(), reason = %e) or tracing::debug!(...) as appropriate and include contextual fields like url, dest, bytes_written, status, and sha256 where available, keeping the current return values (ToolResult::error(...)) unchanged and only adding logging calls adjacent to the existing early returns and cleanup results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tools/impl/network/curl.rs`:
- Around line 31-47: The constructor new currently accepts dest_subdir directly
which can be absolute or contain .. and escape workspace_dir; validate and
sanitize dest_subdir inside new (and the other place around lines 76-83) by
rejecting or normalizing any absolute paths or ParentDir components before
storing it: check Path::is_absolute(&dest_subdir) and iterate Path::components()
to detect Component::ParentDir, and if found either return an error (or fallback
to a safe default) or strip leading separators and parent segments to produce a
safe relative path; ensure resolve_dest and the starts_with(root) check then
operate on this validated/sanitized self.dest_subdir so workspace confinement
cannot be bypassed.
- Around line 249-278: The error and flush branches must drop the open file
handle before attempting to delete the partial download and ensure cleanup on
flush failure: in the stream chunk Err branch and the write_all Err branch, call
drop(file) (or let file = None if using Option) before calling
fs::remove_file(&dest_path).await (ignore its error), and in the final flush
error path likewise drop the file handle then remove the file and return the
error ToolResult; update the references around bytes_written/hasher as needed in
the function in curl.rs so the file handle is closed before removal.
In `@src/openhuman/tools/impl/network/gitbooks.rs`:
- Line 55: The debug log prints the full user-configurable
GitbooksConfig.endpoint (self.endpoint) and may leak secrets; before calling
tracing::debug in the Gitbooks implementation (the line with
tracing::debug!(target: "[gitbooks]", endpoint = %self.endpoint, ...)), parse
self.endpoint and compute a redacted_endpoint containing only the origin/host
(e.g. scheme+host+port or host only) or a safe "<redacted>" fallback if parsing
fails, then log endpoint = %redacted_endpoint (keeping the "[gitbooks]" target
string and tool = %name) instead of the raw self.endpoint.
- Around line 47-50: Replace the redirect policy that currently allows following
redirects with a no-redirect policy: locate the reqwest client builder
expression that assigns to builder (the line using reqwest::Client::builder()
and reqwest::redirect::Policy::limited(3)) and change the redirect policy to
reqwest::redirect::Policy::none() so the client will not follow 30x responses
away from the configured MCP endpoint.
In `@src/openhuman/tools/ops.rs`:
- Around line 171-195: Add unit tests in the test module for the tools registry
to assert that CurlTool is always registered and that GitbooksSearchTool and
GitbooksGetPageTool are only present when root_config.gitbooks.enabled is true;
specifically, construct root_config with gitbooks.enabled = true and = false,
call the same code path that registers tools (the block that calls
CurlTool::new, GitbooksSearchTool::new, GitbooksGetPageTool::new in ops.rs),
then inspect the produced tools vector or registry to assert a tool with the
curl identity/marker exists in both cases and that entries corresponding to
gitbooks_search/gitbooks_get_page exist only in the enabled case. Ensure tests
fail if the tool identities/names change by matching on the concrete types or
well-known identifiers used when creating those Boxed tools.
---
Nitpick comments:
In `@src/openhuman/tools/impl/network/curl.rs`:
- Around line 152-297: The execute method in curl.rs currently logs only
start/success; add debug/error logs with target "[curl]" and stable prefix
fields at each rejection/failure and key branch (autonomy gate, rate limit,
validate_url error in validate_url(), resolve_dest() error, create_dir_all
failure, client.build() failure, request.send() failure, non-success HTTP
status, fs::File::create() failure, stream chunk Err, download-size-abort,
write_all Err, flush Err, and any cleanup fs::remove_file() Results) so failures
are visible and correlated; for each error path in execute(), emit
tracing::error!(target = "[curl]", url = %url, dest = %dest_path.display(),
reason = %e) or tracing::debug!(...) as appropriate and include contextual
fields like url, dest, bytes_written, status, and sha256 where available,
keeping the current return values (ToolResult::error(...)) unchanged and only
adding logging calls adjacent to the existing early returns and cleanup results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a5f6f19d-171a-41d4-8e2e-759b715b7515
📒 Files selected for processing (10)
src/openhuman/config/mod.rssrc/openhuman/config/schema/mod.rssrc/openhuman/config/schema/tools.rssrc/openhuman/config/schema/types.rssrc/openhuman/tools/impl/network/curl.rssrc/openhuman/tools/impl/network/gitbooks.rssrc/openhuman/tools/impl/network/http_request.rssrc/openhuman/tools/impl/network/mod.rssrc/openhuman/tools/impl/network/url_guard.rssrc/openhuman/tools/ops.rs
Welcome agent can now answer "how does X work" / "what can OpenHuman do" / "where is the setting for…" questions during onboarding by searching the real product docs instead of guessing feature names. - Add gitbooks_search + gitbooks_get_page to welcome's named tool list. - Update prompt.md: tell welcome to ground answers in the docs, cite the URL, and steer the conversation back to setup so it doesn't drift into an open-ended Q&A loop. - Tests now assert both gitbooks tools are present and re-pin the read-only invariant (no shell/file_write/curl can leak in).
Merge upstream/main (which split onboarding-status checks into a new `check_onboarding_status` tool) and address CodeRabbit's actionable items on the curl/gitbooks PR. Conflict resolution - welcome/agent.toml: combined upstream's `check_onboarding_status` addition with our gitbooks tools (welcome now has all five tools). - welcome/prompt.md: kept upstream's `check_onboarding_status` table and kept our gitbooks docs-grounded-answer block. - loader.rs: dropped the old `tools.len() == 4` count assertion (welcome's surface keeps growing); added explicit `gitbooks_*` presence assertions. CodeRabbit fixes - curl: sanitize `dest_subdir` in `CurlTool::new` so a malicious `[curl].dest_subdir` config value (`/etc`, `../..`) cannot escape the workspace root. Added 4 sanitizer tests. - curl: drop the file handle before `fs::remove_file` on every cleanup path (stream error, write error, size cap, flush error) so partial files are reliably deleted on Windows too. Log cleanup-removal failures at debug. - curl: add tracing on every reject/failure path with the stable `[curl]` target and correlation fields (url, dest, bytes_written, reason). - gitbooks: redact endpoint to scheme+host[:port] in debug logs so query strings, future bearer tokens, or accidental userinfo can't leak. Added 2 redactor tests. - gitbooks: tighten redirect policy from `Policy::limited(3)` to `Policy::none()` — same hardening as `http_request`/`curl`, prevents an MCP-endpoint operator from redirecting our client to an attacker-controlled origin. - ops: add registration tests — curl is always present; gitbooks_search/gitbooks_get_page register only when gitbooks.enabled = true.
Summary
curlagent tool: streams downloads to disk under<workspace>/<curl.dest_subdir>with a hard byte ceiling, returning{ path, bytes_written, content_type, sha256 }.gitbooks_search+gitbooks_get_pagetools that call the OpenHuman GitBook MCP server (https://tinyhumans.gitbook.io/openhuman/~gitbook/mcp) so the agent can answer product questions from the docs.http_request.rsinto a sharedurl_guard.rsmodule sohttp_requestandcurluse one implementation.[curl](dest_subdir,max_download_bytesdefault 50 MB,timeout_secs) and[gitbooks](enableddefault true,endpoint,timeout_secs).Problem
Agents had
http_request(returns body inline, capped at ~1 MB) but no way to save a file to the workspace, and no first-class way to query our own product documentation. We were one allowlist away from a download tool, and the GitBook docs already speak MCP — we just needed a thin Rust client to surface those two upstream tools as native agent tools.Solution
curl(src/openhuman/tools/impl/network/curl.rs)http_request.allowed_domainsso there is one allowlist to reason about.bytes_stream()chunks to disk; aborts and removes the partial file if the running total exceedsmax_download_bytes.dest_pathlexically under<workspace>/<dest_subdir>: rejects absolute paths,.., and any escape attempts (belt-and-bracesstarts_withcheck after the component scan).Write.gitbooks_*(src/openhuman/tools/impl/network/gitbooks.rs)McpHttpClientdoestools/callover POST JSON-RPC and parses the singleevent: message\ndata: {…}SSE frame the GitBook server returns. No new MCP crate needed yet — kept inline so we can extract it the moment a second remote MCP server appears.gitbooks_search→ upstreamsearchDocumentation { query };gitbooks_get_page→ upstreamgetPage { url }.ReadOnly. Togglable via[gitbooks].enabled.Shared
url_guard.rs— moved out ofhttp_request.rs. All 40 SSRF/allowlist tests carried over verbatim.http_request.rsis now ~190 LOC lighter and just importsvalidate_url.Submission Checklist
cargo testfor logic added/changed (124 unit tests intools::implementations::network::*pass).OPENHUMAN_CURL_LIVE_TEST=1 cargo test live_download_example_com— downloadshttps://example.com/, asserts > 100 bytes and that the saved HTML contains "example domain".OPENHUMAN_GITBOOKS_LIVE_TEST=1 cargo test live_search_smoke— queries the real GitBook MCP for "what is openhuman" and asserts a non-error result. Sample excerpt://!headers +///on public types and non-obvious helpers...rejection rationale, why the SSE parser is one-shot, whycurlshareshttp_request.allowed_domains).Impact
curlshareshttp_request's SSRF/allowlist guards, has a hardmax_download_bytesceiling, and rejects absolute /..dest paths.gitbooks_*only talks to the configured fixed endpoint.[gitbooks].enabled = truemeans existing installs get the docs tools immediately on next core upgrade.[curl]is gated byhttp_request.allowed_domainsexactly likehttp_request.Related
curlagainst the mock server (scripts/test-rust-with-mock.sh).src/openhuman/about_app/capability catalog to mention the new tools.Summary by CodeRabbit
New Features
Chores