feat(contentsafety): add content-safety scanning at CLI output boundary#435
Conversation
No actionable comments were generated in the recent review. 🎉
📒 Files selected for processing: 23
✅ Files skipped from review due to trivial changes: 9
🚧 Files skipped from review as they are similar to previous changes: 10
📝 Walkthrough
Adds a content-safety scanning subsystem for CLI responses: provider interface and registry, a regex-based provider, allowlist/mode controls, scan-before-emit integration into output paths, command-path propagation to response handling, and extensive tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant APIHandler as API/Service Handler
    participant RespHandler as Response Handler
    participant OutputEmitter as Output Emitter
    participant SafetyGate as Content-Safety Gate
    participant Provider as Content-Safety Provider
    participant StdOut as StdOut/Err
    Client->>APIHandler: request
    APIHandler->>RespHandler: HandleResponse(data, opts with CommandPath)
    RespHandler->>OutputEmitter: EmitLarkResponse/EmitShortcut(data, format...)
    OutputEmitter->>SafetyGate: ScanForSafety(cmdPath, data)
    SafetyGate->>SafetyGate: check mode & allowlist
    alt scan needed
        SafetyGate->>Provider: Scan(ctx, ScanRequest)
        Provider-->>SafetyGate: Alert or nil
        alt Alert & Mode=Block
            SafetyGate-->>OutputEmitter: return block error (no output)
        else Alert & Mode=Warn
            SafetyGate-->>OutputEmitter: ScanResult{Alert}
            OutputEmitter->>StdOut: emit output with _content_safety_alert / warn to Err
        else no alert
            SafetyGate-->>OutputEmitter: proceed
            OutputEmitter->>StdOut: emit normal output
        end
    else not scanning
        SafetyGate-->>OutputEmitter: proceed
        OutputEmitter->>StdOut: emit normal output
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ Passed checks (2) | ❌ Failed checks (1 warning)
🚀 PR Preview Install Guide
🧰 CLI update: `npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@17655ae6fbe5e28bb2faeea91f144b0cf6dcb8b8`
🧩 Skill update: `npx skills add larksuite/cli#feat/content-safety -y -g`
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/security/contentsafety/normalize_test.go (1)
79-87: Strengthen the unmarshalable fallback assertion. The current check only verifies non-nil. Assert type preservation so regressions are caught.
Suggested test tightening

```diff
 func TestNormalize_Unmarshalable(t *testing.T) {
 	// A struct with a func field cannot be marshaled; normalize must return original.
 	type Bad struct{ F func() }
 	v := Bad{F: func() {}}
 	got := normalize(v)
-	// Just verify no panic and that we got something back (the original value).
-	if got == nil {
-		t.Error("expected non-nil return for unmarshalable value")
-	}
+	if _, ok := got.(Bad); !ok {
+		t.Fatalf("expected original Bad value, got %T", got)
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/security/contentsafety/normalize_test.go` around lines 79 - 87, The test TestNormalize_Unmarshalable must not only ensure normalize(v) doesn't return nil but also that it returns a value of the original type; update the assertion on the result of calling normalize(v) to verify the returned value's dynamic type matches the input type (i.e., preserve type Bad) using a type assertion or reflect.TypeOf on the result from normalize, so regressions that change the fallback behavior are detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/security/contentsafety/injection.go`:
- Around line 35-37: The delimiter_smuggle regex currently uses
regexp.MustCompile(...) and is case-sensitive, so tokens like "### SYSTEM:" or
"### Assistant:" bypass it; update the pattern in delimiter_smuggle by adding
the case-insensitive flag at the start (e.g. prepend "(?i)" to the existing
regex passed to regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.
---
Nitpick comments:
In `@internal/security/contentsafety/normalize_test.go`:
- Around line 79-87: The test TestNormalize_Unmarshalable must not only ensure
normalize(v) doesn't return nil but also that it returns a value of the original
type; update the assertion on the result of calling normalize(v) to verify the
returned value's dynamic type matches the input type (i.e., preserve type Bad)
using a type assertion or reflect.TypeOf on the result from normalize, so
regressions that change the fallback behavior are detected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 695887df-217d-49bf-b922-dbc8e6d6086d
📒 Files selected for processing (23)
- cmd/api/api.go
- cmd/service/service.go
- extension/contentsafety/registry.go
- extension/contentsafety/types.go
- extension/contentsafety/types_test.go
- internal/client/response.go
- internal/cmdutil/factory_default.go
- internal/envvars/envvars.go
- internal/output/emit.go
- internal/output/emit_core.go
- internal/output/emit_core_test.go
- internal/output/emit_integration_test.go
- internal/output/emit_test.go
- internal/output/envelope.go
- internal/security/contentsafety/injection.go
- internal/security/contentsafety/injection_test.go
- internal/security/contentsafety/normalize.go
- internal/security/contentsafety/normalize_test.go
- internal/security/contentsafety/provider.go
- internal/security/contentsafety/provider_test.go
- internal/security/contentsafety/scanner.go
- internal/security/contentsafety/scanner_test.go
- shortcuts/common/runner.go
```go
Pattern: regexp.MustCompile(
	`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
),
```
Make delimiter_smuggle case-insensitive to avoid simple bypasses.
Line 36 currently misses (?i), so variants like ### SYSTEM: / ### Assistant: won’t match.
Suggested fix

```diff
 {
 	ID: "delimiter_smuggle",
 	Pattern: regexp.MustCompile(
-		`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
+		`(?i)<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`,
 	),
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/security/contentsafety/injection.go` around lines 35 - 37, The
delimiter_smuggle regex currently uses regexp.MustCompile(...) and is
case-sensitive, so tokens like "### SYSTEM:" or "### Assistant:" bypass it;
update the pattern in delimiter_smuggle by adding the case-insensitive flag at
the start (e.g. prepend "(?i)" to the existing regex passed to
regexp.MustCompile) so the whole alternation
(`<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:`)
is matched case-insensitively. Ensure you modify the pattern string where
regexp.MustCompile is called (the variable/reference named delimiter_smuggle)
and run tests to confirm behavior.
Greptile Summary

This PR adds an opt-in content-safety scanning layer (…).

Confidence Score: 4/5

Safe to merge with reservations — three open P1 findings from prior threads (map mutation, goroutine data race on bytes.Buffer, dead EmitShortcut export) should be addressed before shipping. The core scanning logic is well-structured, fail-open, and thoroughly tested. Two new P2 findings (isLarkShapedMap too broad, wrapBlockError nil fragility) are minor. However, three P1 issues from prior review threads remain unaddressed: routeLarkAlert mutates the caller's map in-place, the leaked goroutine can race on non-thread-safe errOut writers in tests, and EmitShortcut is documented as the canonical shortcut path but is never actually called by runner.go.

Important Files Changed

- internal/output/emit.go (routeLarkAlert mutation, isLarkShapedMap)
- internal/output/emit_core.go (goroutine/errOut race, wrapBlockError fragility)
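The "routeLarkAlert mutates the caller's map in-place" finding has a standard fix: copy before annotating. This is a hedged sketch of that copy-before-mutate pattern, not the PR's actual code; the function name and response shape are illustrative, though the `_content_safety_alert` key comes from the review.

```go
package main

import "fmt"

// withAlert returns a new map carrying the safety alert instead of writing
// into the caller's map. A shallow copy is enough here because only a
// top-level key is added.
func withAlert(resp map[string]any, alert any) map[string]any {
	out := make(map[string]any, len(resp)+1)
	for k, v := range resp {
		out[k] = v // shallow copy: caller's map is left untouched
	}
	out["_content_safety_alert"] = alert
	return out
}

func main() {
	resp := map[string]any{"code": 0, "data": "hello"}
	annotated := withAlert(resp, map[string]any{"rule": "delimiter_smuggle"})

	_, callerMutated := resp["_content_safety_alert"]
	fmt.Println(callerMutated)                             // false: original untouched
	fmt.Println(annotated["_content_safety_alert"] != nil) // true: copy carries the alert
}
```

If nested values were also mutated, a deep copy would be needed; for a single injected top-level key, the shallow copy avoids the aliasing Greptile flagged.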
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[API Response] --> B{MODE env var}
    B -- off / unset --> Z[Pass through - no scanning]
    B -- warn / block --> C{Command in ALLOWLIST?}
    C -- no --> Z
    C -- yes --> D{Provider registered?}
    D -- nil --> Z
    D -- regex provider --> E[runContentSafety goroutine + 100ms deadline]
    E -- timeout / panic / error --> Z
    E -- clean nil alert --> Z
    E -- hit + warn mode --> F[alert non-nil errBlocked = nil]
    E -- hit + block mode --> G[alert non-nil errBlocked returned]
    G --> H[wrapBlockError - return ExitError]
    F --> I{Output path}
    I -- EmitShortcut json/jq --> J[Envelope._content_safety_alert]
    I -- EmitShortcut pretty+prettyFn --> K[WriteAlertWarning to stderr]
    I -- EmitLarkResponse Lark-shaped JSON --> L[Inject into response map]
    I -- EmitLarkResponse non-JSON / jq --> K
    I -- runner.Out --> J
    I -- runner.OutFormat pretty+fn / default --> K
```

Reviews (2): Last reviewed commit: "feat(contentsafety): add content-safety ..."
Add a cross-cutting content-safety layer that scans API responses for prompt injection patterns before they reach AI agents. The feature is opt-in via two environment variables (MODE + ALLOWLIST) and defaults to off with zero impact on existing behavior.

Architecture:
- extension/contentsafety: pluggable Provider interface and registry
- internal/security/contentsafety: built-in regex provider (4 rules)
- internal/output: ScanForSafety entry point + EmitLarkResponse
- runner.go Out/OutFormat: 5-line block check at top, minimal invasiveness
- response.go HandleResponse: routes through EmitLarkResponse

Key design decisions:
- Single-provider registry (last-write-wins), aligned with extension/transport
- 100ms deadline enforced via goroutine+select (works even if provider ignores ctx)
- Panic/error/timeout all fail-open with stderr warning
- Built-in provider self-registers via init(), activated by blank import in factory_default.go

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed b50cb5b to 17655ae
alwaysxyang left a comment
Thank you for this excellent and highly valuable contribution! This PR introduces critical prompt injection protection capabilities for Lark CLI's agent-native operation model, addressing a significant security risk as AI agents increasingly become primary users of command-line tools.
As AI agents integrate more deeply with CLI tools to automate workflows, the risk of prompt injection attacks where malicious actors embed hidden instructions in API responses (from documents, messages, calendar entries, and other platform resources) has become a major concern. This implementation provides a strong first line of defense against such attacks, and we are excited to see this security feature added to Lark CLI.
Key Improvements & Recommendations for Next Steps:
- Configuration Management Enhancement:
We recommend extracting the current hardcoded regex detection rules into a dedicated, human-readable configuration file format (YAML or JSON is preferred). This configuration file should include a comprehensive default set of security rules provided out of the box that cover common prompt injection patterns including instruction override attacks, role injection, delimiter smuggling, and system prompt exfiltration attempts.
Additionally, comprehensive documentation should be added to the README file explaining:
- The structure of the configuration file
- How to add custom detection rules tailored to specific use cases
- How to adjust rule sensitivity levels
- How to disable specific rules that may cause false positives in particular environments
This configuration-driven approach will allow users to update security rules without rebuilding the CLI, significantly improving flexibility and response time to emerging attack patterns.
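A configuration file along these lines could carry the same information the hardcoded rules do today. This is purely a hypothetical sketch of the recommended YAML layout — field names, rule IDs, and the `severity`/`enabled` knobs are illustrative suggestions, not an existing format:

```yaml
# Hypothetical content-safety rule file (illustrative layout, not shipped with the CLI)
rules:
  - id: instruction_override
    pattern: '(?i)ignore\s+(all\s+)?(previous|prior)\s+instructions'
    severity: high
    enabled: true
  - id: delimiter_smuggle
    pattern: '(?i)<\|im_(start|end|sep)\|>|<\|endoftext\|>|###\s*(system|assistant|user)\s*:'
    severity: medium
    enabled: true   # set to false to silence a rule that causes false positives
```

Per-rule `enabled` flags and user-supplied entries would cover the customization points listed above (adding rules, tuning sensitivity, disabling noisy rules) without rebuilding the CLI.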
- Clear Feature Positioning & Disclaimer:
This functionality should be positioned explicitly as user-configurable, self-managed content safety protection. The feature is designed to empower end users and organizations to adjust their security posture according to their specific risk tolerance and usage requirements:
- Users operating in high-security environments can enable strict blocking mode and add custom internal rules
- Users in less restrictive environments can run in warning-only mode
- Users with specialized use cases can disable the feature entirely if it conflicts with their workflows
Important Disclaimer: The content safety capabilities provided in this PR are self-service security tools intended for user customization. The Lark platform does not provide any warranties, express or implied, regarding the effectiveness, accuracy, or reliability of these security rules or detection capabilities. Users are solely responsible for evaluating, configuring, and maintaining these security controls to meet their specific security requirements, and for any consequences resulting from their use or misuse of these capabilities.
The implementation should maintain this user-centric approach, ensuring security controls are flexible rather than one-size-fits-all.
This contribution provides an incredibly strong foundation for security in AI agent workflows, and we greatly appreciate the effort that went into this implementation. We look forward to working with you to refine and expand these capabilities!
Summary

- Opt-in via two environment variables (LARK_CONTENT_SAFETY_MODE + LARK_CONTENT_SAFETY_ALLOWLIST); defaults to off with zero impact on existing behavior

Architecture

- extension/contentsafety: pluggable Provider interface and registry
- internal/security/contentsafety: built-in regex provider (4 rules)
- internal/output: ScanForSafety entry point + EmitLarkResponse
- runner.go Out/OutFormat: 5-line block check at top, minimal invasiveness
- response.go HandleResponse: routes through EmitLarkResponse

Key design decisions

- Single-provider registry (last-write-wins), aligned with extension/transport
- Built-in provider self-registers via init(), activated by blank import in factory_default.go

Test plan

- EmitLarkResponse with safety scanning

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests