Rule-based PII and secret redaction for Markdown documents — audit log, risk-level filtering, LLM pipeline ready
pip install markdown-redactor
printf "Contact me at jane@example.com\n" | markdown-redactor -
Expected output:
Contact me at [REDACTED]
See docs/GUIDE.md for the full API and CLI usage guide.
- Who is this for
- Key features
- Built-in redaction rules
- How redaction works
- Performance
- Security and compliance notes
- Troubleshooting
- Additional resources
- Development and contribution
- Release process
- Teams feeding Markdown documents into LLMs (RAG, agents, chat pipelines)
- Security-conscious teams that need deterministic redaction before inference
- Developers who want a small codebase with extensible rules
- Pluggable architecture: register custom redaction rules without touching core engine
- Markdown-aware behavior: by default, skips fenced code blocks and inline code spans
- Lightweight runtime: zero runtime dependencies
- Typed API: strict typing-friendly design
- Operational visibility: per-rule match counters and timing stats
Default engine includes 24 rules:
- `email`, `phone`
- `ipv4`, `ipv6`
- `us_ssn`, `us_ein`
- `uk_nino`
- `in_pan`, `in_aadhaar`, `in_gstin`
- `br_cpf`, `br_cnpj`
- `iban`, `swift_bic`, `eu_vat`
- `labeled_sensitive_id` (tax ID, driver license, passport, national ID labels)
- `secret_assignment` (password/api_key/token style assignments)
- `credential_uri` (connection-string credentials)
- `aws_access_key`, `generic_token`, `google_api_key`, `jwt`, `private_key`
- `credit_card` (Luhn-validated to reduce false positives)
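The `credit_card` rule is described as Luhn-validated. As an illustration of why that reduces false positives, here is a minimal sketch of the standard Luhn checksum. The function name `luhn_valid` is an assumption made for this example and is not taken from the package's source.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Illustrative helper only; not the package's actual implementation.
    """
    # Keep digits only, so separators such as spaces or hyphens are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A 16-digit order number that merely looks like a card number usually fails this check, so a rule that applies it after the regex match can leave such strings untouched.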
- Markdown text is segmented.
- Based on config, non-redactable segments (like fenced code) can be preserved.
- Each redactable segment is processed by registered rules in order.
- Output and stats are returned.
This makes behavior explicit and easy to extend.
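The four steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the package's engine: the names `Rule`, `regex_rule`, and `redact` are hypothetical, only fenced code blocks are segmented (inline code spans are ignored), and the email pattern is deliberately loose.

```python
import re
from collections import Counter
from typing import Callable, NamedTuple

# Hypothetical rule shape: a name plus a function mapping text to
# (redacted_text, match_count).
class Rule(NamedTuple):
    name: str
    apply: Callable[[str], tuple[str, int]]

# Build the fence marker programmatically to keep this example easy to embed.
_MARK = "`" * 3
FENCED_BLOCK = re.compile("(" + _MARK + ".*?" + _MARK + ")", re.DOTALL)

def regex_rule(name: str, pattern: str, replacement: str = "[REDACTED]") -> Rule:
    compiled = re.compile(pattern)

    def apply(text: str) -> tuple[str, int]:
        # re.subn returns the new string and the number of substitutions.
        return compiled.subn(replacement, text)

    return Rule(name, apply)

def redact(markdown: str, rules: list[Rule], skip_fenced: bool = True):
    stats: Counter[str] = Counter()
    output: list[str] = []
    # Splitting on a capturing group keeps fenced blocks at odd indices.
    for index, segment in enumerate(FENCED_BLOCK.split(markdown)):
        if index % 2 == 1 and skip_fenced:
            output.append(segment)  # preserve non-redactable segment as-is
            continue
        for rule in rules:  # rules run in registration order
            segment, count = rule.apply(segment)
            stats[rule.name] += count
        output.append(segment)
    return "".join(output), dict(stats)
```

Calling `redact(text, [regex_rule("email", r"[\w.+-]+@[\w-]+\.[\w.-]+")])` in this sketch returns the redacted text together with a per-rule match count, mirroring the "output and stats" step.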
Runs in
- This is best-effort pattern redaction, not formal DLP certification
- Always validate on your real data and threat model
- Combine with downstream controls (access controls, logging, policy engines)
- Add organization-specific rules for identifiers, ticket IDs, or internal secrets
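As an example of the last point, an organization-specific pattern for ticket IDs might look like the sketch below. The ticket format (uppercase project key, hyphen, digits) and the function name `redact_ticket_ids` are assumptions for illustration; how such a pattern is registered with the engine is covered in docs/GUIDE.md and is not shown here.

```python
import re

# Assumed ticket format for this example: 2-10 uppercase letters,
# a hyphen, then 1-6 digits (for instance OPS-1234).
TICKET_ID = re.compile(r"\b[A-Z]{2,10}-\d{1,6}\b")

def redact_ticket_ids(text: str, replacement: str = "[REDACTED]") -> tuple[str, int]:
    """Replace ticket-like IDs and return (new_text, match_count)."""
    return TICKET_ID.subn(replacement, text)
```

A pattern this broad also matches strings such as `UTF-8` or `ISO-8601`, which is one reason to validate custom rules against real documents before relying on them.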
- Verify you are using `create_default_engine()` or registering custom rules
- Check whether content is inside fenced/inline code that is skipped by default
- Tighten custom regex patterns
- Keep `--redact-inline-code` / `--redact-fenced-code-blocks` disabled unless required
- Ensure package is installed in active environment
- Try module mode: `python -m markdown_redactor.cli input.md`
- Full usage guide: docs/GUIDE.md
- Architecture guide: docs/ARCHITECTURE.md
- FAQ: docs/FAQ.md
- Support process: SUPPORT.md
- Security policy: SECURITY.md
- Changelog: CHANGELOG.md
- Releasing guide: docs/RELEASING.md
- Guided onboarding docs: docs/README.md
- Runnable examples:
See CONTRIBUTING.md for setup and quality checks.
Primary local quality command:
PYTHONPATH=src .venv/bin/python -m ruff check src tests && \
PYTHONPATH=src .venv/bin/python -m mypy src && \
PYTHONPATH=src .venv/bin/python -m pytest
Maintainers can follow docs/RELEASING.md.
Publishing is automated via .github/workflows/release.yml on tags matching v*.
GitHub Release notes and signed provenance attestations are generated via .github/workflows/github-release.yml.