feat(redirect_links): SQLite-backed URL shortener for token-heavy links#870
Conversation
New domain at `src/openhuman/redirect_links/` that encodes long tracking URLs to `openhuman://link/<id>` placeholders and expands them back out before messages reach the user. Keeps the full URL in a global SQLite store (`workspace_dir/redirect_links/links.db`), uses content-addressed 8-char hex ids (SHA-256 prefix, lengthened on hash-prefix collision) so re-shortening the same URL is idempotent. RPC surface (`openhuman.redirect_links_*`): `shorten`, `expand`, `list`, `remove`, `rewrite_inbound` (regex-based, min-length guard, trims trailing sentence punctuation), `rewrite_outbound` (unknown ids pass through unchanged). Wired into the controller registry and namespace description. Follows the controller migration checklist: `schemas.rs` declares the registry, `mod.rs` re-exports `all_*_controller_schemas` and `all_*_registered_controllers`, and `ops.rs` is re-exported as `rpc`. 15 unit tests cover store dedup/collision, expand bumps hit count, inbound rewrite preserves surrounding text, trims trailing punctuation, leaves short URLs alone, handles multiple URLs, outbound round-trip, and unknown-id passthrough.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a new openhuman Changes
Sequence Diagram(s)sequenceDiagram
participant Client as RPC Client
participant Handler as Schema Handler
participant Ops as Redirect Ops
participant Store as SQLite Store
rect rgba(100,150,200,0.5)
Note over Client,Store: Shorten Flow
Client->>Handler: rl_shorten(url)
Handler->>Handler: load config
Handler->>Ops: shorten_url(url)
Ops->>Ops: regex match & trim punctuation
Ops->>Store: shorten(url)
Store->>Store: sha256 id gen, insert/dedup
Store-->>Ops: RedirectLink
Ops-->>Handler: RedirectLink
Handler-->>Client: RpcOutcome<RedirectLink>
end
rect rgba(150,200,100,0.5)
Note over Client,Store: Expand / Rewrite Outbound
Client->>Handler: rl_rewrite_outbound(text)
Handler->>Ops: rewrite_outbound(text)
Ops->>Ops: find openhuman://link/<id> tokens
Ops->>Store: expand(id) per match
Store->>Store: lookup, increment hit_count, update last_used_at
Store-->>Ops: URL or None
Ops-->>Handler: RewriteResult{text, replacements}
Handler-->>Client: RpcOutcome<RewriteResult>
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/core/all.rs (1)
271-273: Add a direct test assertion for the new namespace description.Line 271 adds new behavior in
namespace_description, butnamespace_description_known_namespacesdoesn’t currently assertredirect_links.As per coding guidelines: "Ship unit tests and coverage for behavior you are adding or changing before building additional features on top".Suggested test addition
#[test] fn namespace_description_known_namespaces() { assert!(namespace_description("memory").is_some()); assert!(namespace_description("memory_tree").is_some()); + assert!(namespace_description("redirect_links").is_some()); assert!(namespace_description("billing").is_some()); assert!(namespace_description("config").is_some());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/all.rs` around lines 271 - 273, The test suite is missing a direct assertion for the new "redirect_links" entry added to namespace_description; update the test named namespace_description_known_namespaces to include an assertion that the returned known namespaces list (from namespace_description or the function under test) contains "redirect_links" and that its description string matches the new value, ensuring the test checks both presence and exact description text for "redirect_links".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/redirect_links/ops.rs`:
- Around line 127-189: The RPC handlers (rl_shorten, rl_expand, rl_list,
rl_remove, rl_rewrite_inbound, rl_rewrite_outbound) currently emit raw URLs/IDs
in logs; change logging to never include full URLs or querystrings and use a
stable grep-friendly prefix like "[rpc][redirect_links]". Implement or call a
helper (e.g., redact_url) that strips query and sensitive fragments (or returns
only host + path or a fixed "<redacted_url>") and use that instead of link.url /
link.short_url / raw ids in the format! calls inside RpcOutcome::single_log and
RpcOutcome::new; update all log messages to the consistent prefix and redacted
values (e.g., "[rpc][redirect_links] shortened: {redacted_original} ->
{redacted_short}"). Ensure rl_list and rewrite logs also use redacted summaries
(count or redacted samples) rather than full URLs.
In `@src/openhuman/redirect_links/store.rs`:
- Around line 18-27: The function id_from_short currently only handles strings
with the SHORT_URL_PREFIX; update it to also accept a bare hex id by, after
trimming, first checking if trimmed starts with SHORT_URL_PREFIX and validating
the suffix as hex (existing logic), and if not, validating that the entire
trimmed string is a non-empty ASCII hex string and returning it; reference the
id_from_short function and SHORT_URL_PREFIX constant when making the change.
- Around line 41-67: The shorten flow has a check-then-insert race: find_by_url
and get_by_id are non-atomic so concurrent calls can both try to insert the same
id and one will hit a PRIMARY KEY error; replace the INSERT with an idempotent
insert using "INSERT ... ON CONFLICT(id) DO NOTHING" (or the SQLite equivalent)
inside the loop where conn.execute is called, then after the insert run a
post-insert lookup (e.g., call find_by_url(conn, url) and/or get_by_id(conn,
&id)) to determine whether the insert succeeded or an existing row is present
and return that row; if the existing row has a different url (hash-prefix
collision) continue the len += 2 loop as before. Ensure you remove or avoid
relying on the separate pre-insert find_by_url/read checks and handle all
outcomes (inserted, found-by-url, found-by-id-with-different-url) via these
atomic insert + post-check steps in the shorten implementation.
---
Nitpick comments:
In `@src/core/all.rs`:
- Around line 271-273: The test suite is missing a direct assertion for the new
"redirect_links" entry added to namespace_description; update the test named
namespace_description_known_namespaces to include an assertion that the returned
known namespaces list (from namespace_description or the function under test)
contains "redirect_links" and that its description string matches the new value,
ensuring the test checks both presence and exact description text for
"redirect_links".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 09132052-d4cf-4fb6-bbea-a219b0e80fb5
📒 Files selected for processing (7)
src/core/all.rssrc/openhuman/mod.rssrc/openhuman/redirect_links/mod.rssrc/openhuman/redirect_links/ops.rssrc/openhuman/redirect_links/schemas.rssrc/openhuman/redirect_links/store.rssrc/openhuman/redirect_links/types.rs
- store.rs: replace check-then-insert with atomic `ON CONFLICT DO NOTHING` + post-verify so concurrent `shorten()` callers for the same URL don't race into a PRIMARY KEY constraint error. Add a threaded regression test. - store.rs: implement the bare-id path of `id_from_short` promised by its docstring (accepts both `openhuman://link/<id>` and bare hex; normalizes to lowercase). Add a test. - ops.rs: drop raw URLs from RPC logs (full query strings can carry tracking identifiers / PII) and switch every log line to the stable grep-friendly prefix `[redirect_links][rpc][<fn>]`. - core/all.rs: extend `namespace_description_known_namespaces` to assert the new `redirect_links` entry.
Summary
src/openhuman/redirect_links/domain that encodes long tracking URLs (e.g.trip.com/forward/...?bizData=eyJld...) toopenhuman://link/<id>placeholders and expands them back before messages reach the user — saves tokens on every prompt that carries one of these URLs.workspace_dir/redirect_links/links.db; content-addressed 8-char hex ids (SHA-256 prefix, widened on hash-prefix collision) so re-shortening the same URL is idempotent.openhuman.redirect_links_{shorten,expand,list,remove,rewrite_inbound,rewrite_outbound}. Wired into the controller registry andnamespace_descriptionper the controller migration checklist.Follow-ups (not in this PR): wire
rewrite_inboundinto the agent prompt-assembly path andrewrite_outboundinto response delivery so the token savings actually land.Test plan
cargo check --manifest-path Cargo.toml— clean (pre-existing unrelated warning only)cargo test --lib openhuman::redirect_links— 15 new tests passNoneid_from_shortscheme parsing + hex-only guardlistorders newest-first and respects limit;removereports affectedcargo test --lib core::all— 34 existing registry tests still pass (no duplicate methods, every handler has a declared schema, etc.)Summary by CodeRabbit