Skip to content

ECHO-675 ECHO-721 Anonymize transcripts fixes#482

Merged
ussaama merged 1 commit intomainfrom
testing
Mar 17, 2026
Merged

ECHO-675 ECHO-721 Anonymize transcripts fixes#482
ussaama merged 1 commit intomainfrom
testing

Conversation

@ussaama
Copy link
Copy Markdown
Contributor

@ussaama ussaama commented Mar 17, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added conditional PII redaction support for anonymized transcripts with guided redaction patterns for personal information.
  • Improvements

    • Enhanced rendering of redacted content with visual indicators in the markdown editor.
    • Improved redacted token handling throughout the interface for better user experience.

@linear
Copy link
Copy Markdown

linear bot commented Mar 17, 2026

@ussaama ussaama requested a review from spashii March 17, 2026 16:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 17, 2026

Walkthrough

Frontend and backend components are enhanced to support PII redaction and anonymization. Frontend markdown rendering now processes and displays redacted tokens, with utility functions exposed for token escaping. Backend propagates anonymization preferences through system prompts and artifact generation, with conditional PII redaction blocks added across multiple locale-specific templates.

Changes

Cohort / File(s) Summary
Frontend Markdown & Redaction Core
echo/frontend/src/components/common/Markdown.tsx, echo/frontend/src/components/common/RedactedText.tsx, echo/frontend/src/components/form/MarkdownWYSIWYG/MarkdownWYSIWYG.tsx
Exports redaction utilities (escapeRedactedTokens, unescapeRedactedTokens, REDACTED_CODE_PREFIX), updates Markdown component to render redacted code blocks with RedactedBadge, and adds safe escaping/unescaping in MarkdownWYSIWYG editor to handle redacted tokens bidirectionally.
Frontend Message Rendering
echo/frontend/src/components/participant/UserChunkMessage.tsx
Removes conditional RedactedText wrapping from transcripts, allowing redacted content to render directly as markdown without special handling.
Backend Anonymization Flag Wiring
echo/server/dembrane/api/verify.py, echo/server/dembrane/reply_utils.py
Propagates is_anonymized flag from project settings through artifact generation and reply prompts; adds pii_redaction context to system prompt payloads to conditionally enforce PII redaction in generated responses.
Backend Prompt Templates
echo/server/prompt_templates/generate_artifact.en.jinja, echo/server/prompt_templates/revise_artifact.en.jinja, echo/server/prompt_templates/get_reply_system.*.jinja (de, en, es, fr, it, nl)
Introduces conditional PII redaction blocks gated by pii_redaction flag; each template specifies detection rules, allowed token placeholders (e.g., `<redacted_name>`), locale hints, and formatting requirements to ensure consistent redaction behavior across languages.
Localization Reference Updates
echo/frontend/src/locales/*.po (de-DE, en-US, es-ES, fr-FR, it-IT, nl-NL)
Updates source file reference line numbers across translation entries due to code reorganization; no translation content or semantic meaning is altered.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • ECHO-673 Fixes #445: Adds RedactedText component usage across frontend; this PR extends that by exporting redaction utilities and integrating token escaping into markdown rendering pipelines.
  • ECHO-571 pre release changes and fixes #371: Modifies the same server-side prompt templates (get_reply_system.*.jinja) and reply logic; these changes unify the template structure for PII redaction across locales.
  • ECHO-668 helper changes for anonymous transcripts mode (#436) #437: Implements the broader anonymization/PII-redaction feature; this PR provides complementary frontend and backend infrastructure for handling and rendering redacted tokens consistently.

Suggested labels

Feature, improvement

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly references the ticket numbers (ECHO-675, ECHO-721) and accurately summarizes the main feature—anonymizing transcripts with bug fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch testing
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can generate walkthrough in a markdown collapsible section to save space.

Enable the reviews.collapse_walkthrough setting to generate walkthrough in a markdown collapsible section.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
echo/server/prompt_templates/get_reply_system.en.jinja (1)

1-24: ⚠️ Potential issue | 🟠 Major

Drop the keyterms exception in anonymized mode.

The allow-list in Lines 37-38 is both undefined and unsafe: this template never exposes a keyterms section, so the model cannot apply the rule deterministically, and if that input is wired in later it becomes a straight bypass for anonymization for any name/email/phone placed there. Remove the exception or constrain it to a separate non-PII allow-list, then mirror that fix across the copied blocks in the localized reply and artifact templates.

🔒 Suggested change
- - Allow-list: if a detected PII token or exact multi-token phrase appears in keyterms, KEEP it as spoken (no redaction).
+ - Do not whitelist detected PII from project metadata or key terms. If a token matches a PII class, redact it.

Also applies to: 37-38

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@echo/server/prompt_templates/get_reply_system.en.jinja` around lines 1 - 24,
Remove the unsafe `keyterms` exception from the anonymization allow-list in the
get_reply_system template: eliminate any branch or rule that treats a `keyterms`
section as a bypass for PII removal, or alternatively replace it with a
clearly-named non-PII allow-list (e.g., `non_pii_allowlist`) and enforce that
only non-PII tokens are permitted; update the same change in the localized reply
and artifact templates (the copied blocks that reference `keyterms`) so no
template exposes an undefined or unrestricted `keyterms` allow-list that could
bypass anonymization.
echo/server/dembrane/reply_utils.py (2)

176-187: ⚠️ Potential issue | 🟠 Major

Prompt-level redaction still leaks participant names.

When is_anonymized is true, we only add a pii_redaction instruction, but the current conversation and every adjacent conversation still serialize participant_name verbatim. The model still sees real names and can echo them back, so anonymized replies stay best-effort. Replace names with <redacted_name> or drop the field before format_conversation().

Also applies to: 280-285, 299-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@echo/server/dembrane/reply_utils.py` around lines 176 - 187, The code builds
Conversation objects with participant_name even when is_anonymized is true, so
redact or remove participant_name before creating or serializing Conversations
passed to format_conversation(); specifically, when is_anonymized is true set
Conversation.name to "<redacted_name>" (or delete the field) for
current_conversation and any adjacent conversations built the same way, and
ensure build_conversation_transcript/format_conversation are not re-introducing
real names. Update the Conversation constructors and the other similar blocks
that construct adjacent conversations (the other Conversation-creation sites) to
apply the same redaction.

124-164: ⚠️ Potential issue | 🟠 Major

Don't block the event loop with Directus and token counting here.

generate_reply_for_conversation() is async, but these directus.get_items / directus.create_item calls and repeated token_counter(...) invocations still run synchronously on the request path. On large projects, this will pin the worker during prompt assembly and reply persistence. Push them behind await run_in_thread_pool(...).

As per coding guidelines "Always wrap blocking I/O calls using run_in_thread_pool from dembrane.async_helpers in backend code. Wrap calls to directus.* ... and CPU-heavy utilities like token counting or summary generation if they are sync."

Also applies to: 237-255, 287-327, 470-477

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@echo/server/dembrane/reply_utils.py` around lines 124 - 164,
generate_reply_for_conversation currently calls blocking functions
(directus.get_items, directus.create_item and synchronous token_counter calls)
directly on the async path; wrap those calls using await run_in_thread_pool(...)
from dembrane.async_helpers so they execute off the event loop. Specifically,
replace directus.get_items and any directus.create_item invocations inside
generate_reply_for_conversation with calls executed via run_in_thread_pool, and
wrap synchronous token_counter and other CPU-bound utilities used during prompt
assembly and reply persistence similarly. Also apply the same wrapping to the
other occurrences in this module where directus.* or token_counter are called
(e.g., the blocks that assemble prompts and persist replies), ensuring you
import run_in_thread_pool and keep call signatures consistent when moving the
sync call into the thread pool.
echo/server/dembrane/api/verify.py (1)

744-775: ⚠️ Potential issue | 🟠 Major

Don't pass the participant name into anonymized artifact prompts.

is_anonymized only redacts email here. We still inject participant_name unchanged, which lets the model reuse a real name despite the new pii_redaction guard. Redact or omit it alongside email.

🐛 Minimal fix
     participant_name = conversation.get("participant_name")
     if participant_name:
-        parts.append(f"Participant name: {participant_name}")
+        parts.append(
+            f"Participant name: {'<redacted_name>' if is_anonymized else participant_name}"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@echo/server/dembrane/api/verify.py` around lines 744 - 775, The function
_build_user_message_content currently only redacts participant_email but still
injects participant_name; when is_anonymized is True you must redact or omit
participant_name as well (e.g., use '<redacted_name>' or skip adding the
"Participant name" line). Update the logic around participant_name in
_build_user_message_content to respect the is_anonymized flag and also ensure
any previous-artifact rendering via _format_previous_artifacts does not
reintroduce the real name (pass a redaction flag into _format_previous_artifacts
or ensure it performs its own redaction).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@echo/frontend/src/components/common/RedactedText.tsx`:
- Around line 5-33: The current escape/unescape uses the public inline-code form
"redacted:" and causes collisions; change to a collision-safe internal sentinel
and update the related regexes and functions: replace REDACTED_CODE_PREFIX with
a non-colliding sentinel string (e.g. include unlikely characters or a GUID-like
token) and update SAFE_REDACTED_PATTERN to match that sentinel, then update
escapeRedactedTokens to emit `\`<sentinel><type>\`` and unescapeRedactedTokens
to reverse only that sentinel pattern; ensure REDACTED_PATTERN (the original
incoming token regex) and the replacement logic in escapeRedactedTokens and
unescapeRedactedTokens are consistent so normal user-authored code like
`redacted:name` is no longer transformed.

In `@echo/server/prompt_templates/generate_artifact.en.jinja`:
- Around line 20-25: The PII redaction block references an allow-list via
`keyterms` but the template never receives `keyterms`, making the rule
non-deterministic; either remove the allow-list clause or accept and use a
`keyterms` context variable in this template so the rule can be applied
deterministically. Update the `PII Redaction (MANDATORY)` block: if you choose
to keep the allow-list, add `keyterms` to the template context and reference it
wherever the allow-list is checked; otherwise delete the sentence that preserves
detected PII when in `keyterms` so all detected PII is consistently redacted.
Ensure references in the template to `keyterms` (the allow-list) match the exact
variable name you add to callers so behavior is deterministic.

In `@echo/server/prompt_templates/get_reply_system.en.jinja`:
- Around line 31-50: The PII redaction rules are duplicated across multiple
Jinja templates (get_reply_system.*.jinja and revise_artifact.en.jinja); extract
the common placeholder map and cleanup rules into a single shared Jinja macro or
data file (e.g., redaction_macros.jinja providing a macro or variables) and
replace the duplicated blocks in each template with an import/use of that macro
(use {% import %} or {% include %} and call the macro to render the rules),
leaving only the locale-specific prose inside each get_reply_system.*.jinja and
revise_artifact.en.jinja file; update references to the placeholders so
templates preserve existing <redacted_*> tokens exactly and keep the
allow-list/cleanup behavior inside the shared macro.

---

Outside diff comments:
In `@echo/server/dembrane/api/verify.py`:
- Around line 744-775: The function _build_user_message_content currently only
redacts participant_email but still injects participant_name; when is_anonymized
is True you must redact or omit participant_name as well (e.g., use
'<redacted_name>' or skip adding the "Participant name" line). Update the logic
around participant_name in _build_user_message_content to respect the
is_anonymized flag and also ensure any previous-artifact rendering via
_format_previous_artifacts does not reintroduce the real name (pass a redaction
flag into _format_previous_artifacts or ensure it performs its own redaction).

In `@echo/server/dembrane/reply_utils.py`:
- Around line 176-187: The code builds Conversation objects with
participant_name even when is_anonymized is true, so redact or remove
participant_name before creating or serializing Conversations passed to
format_conversation(); specifically, when is_anonymized is true set
Conversation.name to "<redacted_name>" (or delete the field) for
current_conversation and any adjacent conversations built the same way, and
ensure build_conversation_transcript/format_conversation are not re-introducing
real names. Update the Conversation constructors and the other similar blocks
that construct adjacent conversations (the other Conversation-creation sites) to
apply the same redaction.
- Around line 124-164: generate_reply_for_conversation currently calls blocking
functions (directus.get_items, directus.create_item and synchronous
token_counter calls) directly on the async path; wrap those calls using await
run_in_thread_pool(...) from dembrane.async_helpers so they execute off the
event loop. Specifically, replace directus.get_items and any
directus.create_item invocations inside generate_reply_for_conversation with
calls executed via run_in_thread_pool, and wrap synchronous token_counter and
other CPU-bound utilities used during prompt assembly and reply persistence
similarly. Also apply the same wrapping to the other occurrences in this module
where directus.* or token_counter are called (e.g., the blocks that assemble
prompts and persist replies), ensuring you import run_in_thread_pool and keep
call signatures consistent when moving the sync call into the thread pool.

In `@echo/server/prompt_templates/get_reply_system.en.jinja`:
- Around line 1-24: Remove the unsafe `keyterms` exception from the
anonymization allow-list in the get_reply_system template: eliminate any branch
or rule that treats a `keyterms` section as a bypass for PII removal, or
alternatively replace it with a clearly-named non-PII allow-list (e.g.,
`non_pii_allowlist`) and enforce that only non-PII tokens are permitted; update
the same change in the localized reply and artifact templates (the copied blocks
that reference `keyterms`) so no template exposes an undefined or unrestricted
`keyterms` allow-list that could bypass anonymization.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b7d65812-545b-41ab-8b3f-7940d5778f41

📥 Commits

Reviewing files that changed from the base of the PR and between ddbfa7d and 012e498.

📒 Files selected for processing (20)
  • echo/frontend/src/components/common/Markdown.tsx
  • echo/frontend/src/components/common/RedactedText.tsx
  • echo/frontend/src/components/form/MarkdownWYSIWYG/MarkdownWYSIWYG.tsx
  • echo/frontend/src/components/participant/UserChunkMessage.tsx
  • echo/frontend/src/locales/de-DE.po
  • echo/frontend/src/locales/en-US.po
  • echo/frontend/src/locales/es-ES.po
  • echo/frontend/src/locales/fr-FR.po
  • echo/frontend/src/locales/it-IT.po
  • echo/frontend/src/locales/nl-NL.po
  • echo/server/dembrane/api/verify.py
  • echo/server/dembrane/reply_utils.py
  • echo/server/prompt_templates/generate_artifact.en.jinja
  • echo/server/prompt_templates/get_reply_system.de.jinja
  • echo/server/prompt_templates/get_reply_system.en.jinja
  • echo/server/prompt_templates/get_reply_system.es.jinja
  • echo/server/prompt_templates/get_reply_system.fr.jinja
  • echo/server/prompt_templates/get_reply_system.it.jinja
  • echo/server/prompt_templates/get_reply_system.nl.jinja
  • echo/server/prompt_templates/revise_artifact.en.jinja
💤 Files with no reviewable changes (1)
  • echo/frontend/src/components/participant/UserChunkMessage.tsx

@ussaama ussaama enabled auto-merge March 17, 2026 17:13
@ussaama ussaama added this pull request to the merge queue Mar 17, 2026
Merged via the queue into main with commit 68eaa10 Mar 17, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants