fix(security): docker hardening, homoglyph detection, async persist, public-bind warning#2011
Conversation
…ersist, public-bind warning 1. docker-compose.yml: Add container security hardening — no-new-privileges, cap_drop ALL, memory/CPU limits, and tmpfs for /tmp. Closes tinyhumansai#1930. 2. prompt_injection/detector.rs: Expand normalization to handle Cyrillic homoglyphs (а→a, е→e, о→o, etc.), fullwidth ASCII (U+FF01..U+FF5E), and additional zero-width/formatting characters (soft hyphen, RTL overrides, combining grapheme joiner). Adds 4 new tests proving homoglyph-based bypass attempts are now detected. Closes tinyhumansai#1925. 3. webhooks/router.rs: Offload persist() file I/O to spawn_blocking when a tokio runtime is available, falling back to sync write in tests. Prevents async runtime stalls under rapid registration churn. Closes tinyhumansai#1933. 4. core/jsonrpc.rs: Add loud security warning when binding on a non-loopback address without an explicit OPENHUMAN_CORE_TOKEN. The server still starts (with an auto-generated token) but operators get clear stderr + log guidance to set the token. Addresses tinyhumansai#1919. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR adds three independent hardening improvements: an RPC startup token check for public binds, expanded Unicode obfuscation detection to strip zero-width and homoglyph characters during prompt normalization, and regression tests to verify homoglyph bypasses are now caught. Webhook persistence documentation and debug logging clarify threading and skipped-write behavior. ChangesInfrastructure and Application Security
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/openhuman/prompt_injection/detector.rs (1)
220-227: ⚡ Quick winDeduplicate the obfuscation-codepoint list to prevent future drift.
The same security-sensitive codepoint set is duplicated in
had_zwspdetection and in stripping logic. If one list changes without the other, scoring and normalization can diverge. Consider a shared helper (e.g.,is_obfuscation_char(ch)) used in both places.♻️ Proposed refactor
+fn is_obfuscation_char(ch: char) -> bool { + matches!( + ch, + '\u{200b}' + | '\u{200c}' + | '\u{200d}' + | '\u{2060}' + | '\u{feff}' + | '\u{00ad}' + | '\u{034f}' + | '\u{180e}' + | '\u{200e}' + | '\u{200f}' + | '\u{202a}'..='\u{202e}' + | '\u{2066}'..='\u{2069}' + ) +} + fn normalize_prompt(input: &str) -> NormalizedPrompt { let lowered = input.to_lowercase(); - let had_zwsp = lowered.chars().any(|ch| { - matches!( - ch, - '\u{200b}' | '\u{200c}' | '\u{200d}' | '\u{2060}' | '\u{feff}' - | '\u{00ad}' | '\u{034f}' | '\u{180e}' | '\u{200e}' | '\u{200f}' - | '\u{202a}'..='\u{202e}' | '\u{2066}'..='\u{2069}' - ) - }); + let had_zwsp = lowered.chars().any(is_obfuscation_char); @@ - '\u{200b}' - | '\u{200c}' - | '\u{200d}' - | '\u{2060}' - | '\u{feff}' - | '\u{00ad}' - | '\u{034f}' - | '\u{180e}' - | '\u{200e}' - | '\u{200f}' - | '\u{202a}'..='\u{202e}' - | '\u{2066}'..='\u{2069}' => continue, + ch if is_obfuscation_char(ch) => continue,Also applies to: 255-267
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/prompt_injection/detector.rs` around lines 220 - 227, The duplicated obfuscation-codepoint list used in the had_zwsp detection should be centralized: implement a single helper function (e.g., is_obfuscation_char(ch) or ObfuscationChar::is(ch)) that encapsulates the full set of code points, replace the inline matches in had_zwsp and the stripping/normalization logic with calls to that helper, and update any other occurrences (notably the block around lines 255-267) to use the same helper to avoid divergence between scoring and normalization.src/openhuman/webhooks/router.rs (1)
524-527: ⚡ Quick winAdd debug logging for the runtime detection branch.
The code selects between async offload (
spawn_blocking) and synchronous write based on runtime availability, but does not log which path is taken. This makes it difficult to diagnose persistence behavior in production vs. test environments.As per coding guidelines, "Use
log/tracingatdebugortracelevel on ... any branch that is hard to infer from tests alone."📊 Proposed logging additions
// Offload to a blocking thread if a tokio runtime is available, // otherwise write synchronously (e.g. in unit tests). if tokio::runtime::Handle::try_current().is_ok() { + debug!("[webhooks] Offloading persist to blocking thread pool"); tokio::task::spawn_blocking(do_write); } else { + debug!("[webhooks] No runtime detected; persisting synchronously"); do_write(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/webhooks/router.rs` around lines 524 - 527, Add debug logging to indicate which execution path is taken when deciding between async offload and direct write: around the branch that calls tokio::runtime::Handle::try_current(), log a debug message before calling tokio::task::spawn_blocking(do_write) (e.g., "spawning blocking task for write") and another debug message before calling do_write() synchronously (e.g., "running write synchronously, no Tokio runtime detected"). Use the project's logging crate (tracing::debug or log::debug) so the messages follow existing instrumentation and include the function/closure name (do_write) for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/webhooks/router.rs`:
- Around line 524-528: Add debug logging to the runtime-detection branch in
persist() so it records which path is taken: log a debug message before calling
tokio::task::spawn_blocking(do_write) and another before calling do_write()
directly; reference the do_write closure and the persist() function so the
branch is easy to find. Use log::debug! or tracing::debug! with a stable context
tag like "[webhooks] persist" and a short message ("using spawn_blocking" vs
"running synchronously"). Also add a brief doc comment on persist() noting it is
best-effort and writes may not complete before process exit and routes can be
re-registered.
---
Nitpick comments:
In `@src/openhuman/prompt_injection/detector.rs`:
- Around line 220-227: The duplicated obfuscation-codepoint list used in the
had_zwsp detection should be centralized: implement a single helper function
(e.g., is_obfuscation_char(ch) or ObfuscationChar::is(ch)) that encapsulates the
full set of code points, replace the inline matches in had_zwsp and the
stripping/normalization logic with calls to that helper, and update any other
occurrences (notably the block around lines 255-267) to use the same helper to
avoid divergence between scoring and normalization.
In `@src/openhuman/webhooks/router.rs`:
- Around line 524-527: Add debug logging to indicate which execution path is
taken when deciding between async offload and direct write: around the branch
that calls tokio::runtime::Handle::try_current(), log a debug message before
calling tokio::task::spawn_blocking(do_write) (e.g., "spawning blocking task for
write") and another debug message before calling do_write() synchronously (e.g.,
"running write synchronously, no Tokio runtime detected"). Use the project's
logging crate (tracing::debug or log::debug) so the messages follow existing
instrumentation and include the function/closure name (do_write) for clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 683816bd-b3c0-4933-b48f-3d204c769bb2
📒 Files selected for processing (5)
docker-compose.ymlsrc/core/jsonrpc.rssrc/openhuman/prompt_injection/detector.rssrc/openhuman/prompt_injection/tests.rssrc/openhuman/webhooks/router.rs
…ersist logging 1. prompt_injection/detector.rs: Extract duplicated obfuscation codepoint list into shared `is_obfuscation_char()` helper, used by both `had_zwsp` detection and normalization stripping to prevent future drift. 2. webhooks/router.rs: Add debug logging to the runtime-detection branch in persist() and document best-effort semantics (fire-and-forget write may not complete before process exit). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
senamakel
left a comment
There was a problem hiding this comment.
Merge conflict with main — needs a rebase
This PR conflicts with main in src/openhuman/webhooks/router.rs (the persist() method). GitHub reports mergeable: CONFLICTING.
Why
Both branches independently improved persist():
- This PR (
0ccf9a9) — added a docstring marking persist as best-effort +debug!logs on the runtime-detection branches (per CodeRabbit's review). main— added apersist_generation: Arc<AtomicU64>field to drop stale queued writes when a newerpersist()is already queued, and moved JSON serialization inside the spawned closure so it runs off-thread too.
The two diffs touch the same lines but don't overlap semantically — both want to live.
Suggested resolution
Take main's structure (generation counter + serialize-inside-closure) and re-apply this PR's docstring + debug logs on top. Concretely, the merged persist() should look like:
/// Persist current routes to disk (best-effort).
///
/// Serialization happens off the async runtime: when called from a tokio
/// context the file write is offloaded to a blocking thread via
/// [`tokio::task::spawn_blocking`] so the worker is never stalled. Falls
/// back to inline I/O when no runtime is available (e.g. unit tests, CLI
/// one-shots).
///
/// A monotonically-increasing generation counter is bumped on every call;
/// previously queued writes with a stale generation skip the disk write to
/// avoid wasted I/O under rapid registration churn.
///
/// **Note:** Because the write is fire-and-forget, it may not complete
/// before process exit. Routes are re-registered on next startup from the
/// persisted file, so a lost write only means the most recent registration
/// change is replayed.
fn persist(&self) {
let Some(ref path) = self.persist_path else {
return;
};
let persisted = {
let routes = match self.routes.read() {
Ok(r) => r,
Err(_) => return,
};
PersistedRoutes {
registrations: routes.values().cloned().collect(),
}
};
let gen = self.persist_generation.fetch_add(1, Ordering::SeqCst) + 1;
let gen_ref = Arc::clone(&self.persist_generation);
let path = path.clone();
let do_write = move || {
if gen_ref.load(Ordering::SeqCst) != gen {
debug!("[webhooks] persist: skipping stale write (gen {})", gen);
return;
}
if let Some(parent) = path.parent() {
let _ = std::fs::create_dir_all(parent);
}
match serde_json::to_string_pretty(&persisted) {
Ok(json) => {
if let Err(e) = std::fs::write(&path, json) {
warn!("[webhooks] Failed to persist routes to {:?}: {}", path, e);
}
}
Err(e) => {
warn!("[webhooks] Failed to serialize routes: {}", e);
}
}
};
if tokio::runtime::Handle::try_current().is_ok() {
debug!("[webhooks] persist: offloading write to blocking thread pool (gen {})", gen);
tokio::task::spawn_blocking(do_write);
} else {
debug!("[webhooks] persist: no tokio runtime, writing synchronously (gen {})", gen);
do_write();
}
}How to apply
git fetch upstream
git checkout fix/security-hardening-batch2
git merge upstream/main
# resolve src/openhuman/webhooks/router.rs as above
cargo fmt
cargo check
cargo test openhuman::webhooks
git add src/openhuman/webhooks/router.rs
git commit
git pushI verified this resolution locally (cargo check + 89/89 webhook tests pass) but can't push it back to your fork without "Allow edits and access to secrets by maintainers" enabled on the PR. If you flip that on, I can push the resolved merge directly.
Otherwise — once you've rebased and resolved, this PR is good to go on my end. Thanks for the security fixes 🙏
- docker-compose.yml: take upstream's hardening (read_only, cap_drop, configurable mem_limit/cpus were already applied); drop our now-redundant duplicate directives. - webhooks/router.rs: take upstream's generation counter + serialize-inside- closure structure; re-apply our docstring (best-effort note) and debug logs on the runtime-detection branches, as suggested by @senamakel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@senamakel Merge conflict resolved in |
|
@coderabbitai re-review |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/prompt_injection/detector.rs (1)
239-285:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrecomposed Unicode letters still bypass this normalization path.
Line 240 lowercases the raw string, but there is still no decomposition pass before Line 281 turns unknown non-ASCII letters into spaces. A payload like
Ígnore all previous instructionsbecomesgnore all previous instructions, so neither the exact override checks nor the regexes match. That leaves the composed-form bypass from#1925open even after this hardening.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/prompt_injection/detector.rs` around lines 239 - 285, normalize_prompt currently lowercases but never decomposes Unicode, so precomposed letters (e.g., Í) bypass mapping and get turned into spaces; add a Unicode decomposition+combining-mark stripping step (use NFKD or NFD via the unicode_normalization crate) after computing lowered and before iterating chars so composed letters are split into base letters + diacritics and diacritics can be removed (retain reference to normalize_prompt, is_obfuscation_char, BASE64_RE, and SPACE_RE). Ensure you run lowercase after decomposition (or apply to the decomposed chars) so mapping for ASCII and homoglyphs still matches, then proceed with the existing mapping loop; this prevents composed-form bypasses without changing downstream logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/openhuman/prompt_injection/detector.rs`:
- Around line 239-285: normalize_prompt currently lowercases but never
decomposes Unicode, so precomposed letters (e.g., Í) bypass mapping and get
turned into spaces; add a Unicode decomposition+combining-mark stripping step
(use NFKD or NFD via the unicode_normalization crate) after computing lowered
and before iterating chars so composed letters are split into base letters +
diacritics and diacritics can be removed (retain reference to normalize_prompt,
is_obfuscation_char, BASE64_RE, and SPACE_RE). Ensure you run lowercase after
decomposition (or apply to the decomposed chars) so mapping for ASCII and
homoglyphs still matches, then proceed with the existing mapping loop; this
prevents composed-form bypasses without changing downstream logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c4d90ec-b526-449d-aebe-2b72ce5cb313
📒 Files selected for processing (4)
src/core/jsonrpc.rssrc/openhuman/prompt_injection/detector.rssrc/openhuman/prompt_injection/tests.rssrc/openhuman/webhooks/router.rs
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
Four security-related changes bundled in one PR: public-bind warning for unauthenticated RPC, Unicode homoglyph normalization in the prompt injection detector, doc/logging improvements for webhook router persist(), and claimed docker-compose hardening. The Rust code changes are well-implemented — homoglyph normalization is thorough, the shared is_obfuscation_char() helper prevents drift, and the fullwidth ASCII math is correct. However, the PR overclaims: docker-compose.yml changes for #1930 are not in the diff, making the "Closes #1930" claim incorrect.
Change Summary
| File | Change type | Description |
|---|---|---|
src/core/jsonrpc.rs |
Modified | Public-bind security warning when OPENHUMAN_CORE_TOKEN is unset on non-loopback |
src/openhuman/prompt_injection/detector.rs |
Modified | Cyrillic homoglyph map, fullwidth ASCII conversion, expanded obfuscation char set, shared is_obfuscation_char() helper |
src/openhuman/prompt_injection/tests.rs |
Modified | 4 new tests: Cyrillic, fullwidth, mixed homoglyphs, soft-hyphen/RTL bypass |
src/openhuman/webhooks/router.rs |
Modified | Expanded doc comments on persist(), debug logging for generation-skipping and runtime detection |
Per-file Analysis
src/core/jsonrpc.rs
Clean implementation. Uses is_public_bind() and CORE_TOKEN_ENV_VAR from existing modules rather than hardcoding. Whitespace-only token values are correctly treated as unset. Logging uses log::error! with [core] prefix, matching the surrounding startup-path code (lines 862-890). The stderr eprintln! with ANSI color codes is appropriate for a security-critical startup warning.
src/openhuman/prompt_injection/detector.rs
Solid normalization expansion. The Cyrillic homoglyph map covers the most commonly abused confusables (а, е, о, р, с, у, х, і, ѕ, һ, ԁ). Fullwidth ASCII conversion math is verified correct: (ch as u32 - 0xff00 + 0x20) maps U+FF01→U+0021 through U+FF5E→U+007E. The is_obfuscation_char() shared helper is the right pattern — single source of truth for both detection and stripping. Obfuscation char set expanded to include soft hyphens, combining grapheme joiners, RTL/LTR overrides, and isolates.
src/openhuman/prompt_injection/tests.rs
Four well-structured tests following the existing enforce() + score >= 0.45 pattern. Good coverage of Cyrillic, fullwidth, mixed, and obfuscation-char attack vectors.
src/openhuman/webhooks/router.rs
Doc comments accurately describe the generation-counter write-skipping and fire-and-forget semantics. Debug logs use [webhooks] prefix consistent with sibling files (ops.rs, bus.rs). No behavioral changes.
|
|
||
| let port = resolved_port; | ||
| let host = resolved_host; | ||
| let bind_addr = format!("{host}:{port}"); |
There was a problem hiding this comment.
[minor] The {workspace} in the output is a literal placeholder (Rust {{ escapes to {), so users see the string {workspace}/core.token rather than the actual path. If the workspace/config directory is accessible at this point in the startup sequence, consider resolving it to the real path — otherwise this is confusing for someone trying to find the file.
// If workspace_dir is available:
eprintln!(
"... the auto-generated token is written to {}/core.token\n",
workspace_dir.display()
);|
@Liohtml The PR description and change table list docker-compose.yml modifications for #1930, and the PR says "Closes #1930" — but docker-compose.yml is not in the diff. The only changed files are |
|
Good catch @graycyrus — the docker-compose.yml changes were dropped during the merge-conflict resolution because upstream independently applied equivalent hardening ( |
|
@graycyrus PR description has been updated — removed the #1930 claim and docker-compose.yml row from the change table. Could you dismiss your review when you get a chance? Thanks! |
…public-bind warning (tinyhumansai#2011) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Three targeted fixes addressing open security and reliability issues:
is_obfuscation_char()helper prevents list drift.OPENHUMAN_CORE_HOSTis non-loopback without explicitOPENHUMAN_CORE_TOKENChanges
src/openhuman/prompt_injection/detector.rsis_obfuscation_char()helpersrc/openhuman/prompt_injection/tests.rssrc/openhuman/webhooks/router.rssrc/core/jsonrpc.rsTest plan
cargo fmt --all -- --check— cleancargo check -p openhuman --lib— no errorscargo test -p openhuman "prompt_injection"— 18/18 pass (4 new)cargo test -p openhuman "webhooks::router"— 25/25 passCloses #1925, #1933. Addresses #1919.
🤖 Generated with Claude Code