markdown-redactor

Rule-based PII and secret redaction for Markdown documents, with an audit log, risk-level filtering, and output ready for LLM pipelines

Quick start

pip install markdown-redactor
printf "Contact me at jane@example.com\n" | markdown-redactor -

Expected output:

Contact me at [REDACTED]

See docs/GUIDE.md for the full API and CLI usage guide.

Who is this for

Teams feeding Markdown documents into LLMs (RAG, agents, chat pipelines)
Security-conscious teams that need deterministic redaction before inference
Developers who want a small codebase with extensible rules

Key features

Pluggable architecture: register custom redaction rules without touching the core engine
Markdown-aware behavior: by default, skips fenced code blocks and inline code spans
Lightweight runtime: zero runtime dependencies
Typed API: designed to work well with strict type checking
Operational visibility: per-rule match counters and timing stats

Built-in redaction rules

The default engine includes 24 rules:

email, phone
ipv4, ipv6
us_ssn, us_ein
uk_nino
in_pan, in_aadhaar, in_gstin
br_cpf, br_cnpj
iban, swift_bic, eu_vat
labeled_sensitive_id (tax ID, driver license, passport, national ID labels)
secret_assignment (password/api_key/token style assignments)
credential_uri (connection-string credentials)
aws_access_key, generic_token, google_api_key, jwt, private_key
credit_card (Luhn-validated to reduce false positives)
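
To illustrate why Luhn validation reduces false positives for credit_card, here is a minimal sketch of the checksum. It is not the library's own code; the function name luhn_valid and the 13 to 19 digit length bounds are assumptions made for this example.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Hypothetical helper for illustration; not part of markdown-redactor.
    """
    # Keep only digits so "4111 1111 1111 1111" and "4111-1111-..." both work.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    # Assumed bounds: common card numbers are 13 to 19 digits long.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # well-known test number
    print(luhn_valid("4111 1111 1111 1112"))  # last digit changed
```

A 16-digit order number or timestamp that merely looks like a card number will usually fail this check, so a rule that requires it skips most such strings.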

How redaction works

Markdown text is split into segments.
Depending on configuration, non-redactable segments (such as fenced code) are preserved unchanged.
Each redactable segment is processed by the registered rules, in registration order.
The redacted output and the collected stats are returned.

This makes behavior explicit and easy to extend.
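
The steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the package's implementation: the names Segment, segment, RULES, and redact are hypothetical, the segmentation is a single regex rather than whatever the engine actually uses, and only one toy email rule is registered.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    text: str
    redactable: bool


FENCE = "`" * 3  # three backticks, built here to keep this example easy to embed
# Matches a fenced code block or an inline code span (simplified assumption).
CODE = re.compile(
    re.escape(FENCE) + r".*?" + re.escape(FENCE) + r"|`[^`\n]+`", re.DOTALL
)


def segment(markdown: str) -> list[Segment]:
    """Step 1: split text into redactable prose and non-redactable code."""
    segments: list[Segment] = []
    cursor = 0
    for match in CODE.finditer(markdown):
        if match.start() > cursor:
            segments.append(Segment(markdown[cursor:match.start()], True))
        segments.append(Segment(match.group(), False))
        cursor = match.end()
    if cursor < len(markdown):
        segments.append(Segment(markdown[cursor:], True))
    return segments


# Hypothetical rule registry: (name, pattern) pairs applied in list order.
RULES = [("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"))]


def redact(markdown: str, rules=RULES, placeholder: str = "[REDACTED]"):
    """Steps 2 to 4: preserve code, apply rules in order, return output and stats."""
    stats = {name: 0 for name, _ in rules}
    parts = []
    for seg in segment(markdown):
        text = seg.text
        if seg.redactable:
            for name, pattern in rules:
                text, count = pattern.subn(placeholder, text)
                stats[name] += count
        parts.append(text)
    return "".join(parts), stats


if __name__ == "__main__":
    output, stats = redact("Mail `jane@example.com` or bob@example.com")
    print(output)
    print(stats)
```

In this sketch the address inside the inline code span is left alone while the one in prose is replaced, which mirrors the default skip behavior described under Key features.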

Performance

Runs in $O(n \cdot r)$ time where $n$ is input length and $r$ is active rule count. No network I/O, no AST parsing, no heavy dependencies.

Security and compliance notes

This is best-effort pattern redaction, not a formally certified DLP solution
Always validate on your real data and threat model
Combine with downstream controls (access controls, logging, policy engines)
Add organization-specific rules for identifiers, ticket IDs, or internal secrets
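
As an example of an organization-specific pattern, the sketch below matches ticket IDs for an assumed set of project keys. The keys OPS, SEC, and INFRA, the name TICKET_ID, and the helper redact_ticket_ids are all hypothetical. Registering such a pattern with the engine is not shown here; see docs/GUIDE.md for the actual API.

```python
import re

# Hypothetical project keys; restricting to a known list keeps the pattern
# from matching look-alikes such as "SHA-256" or "UTF-8".
TICKET_ID = re.compile(r"\b(?:OPS|SEC|INFRA)-\d{1,6}\b")


def redact_ticket_ids(text: str, placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Replace ticket IDs and return the new text with the match count."""
    return TICKET_ID.subn(placeholder, text)


if __name__ == "__main__":
    print(redact_ticket_ids("Tracked in OPS-1234; hashed with SHA-256."))
```

Testing a candidate pattern on representative text like this before adding it is a cheap way to catch over-matching early.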

Troubleshooting

Nothing is being redacted

Verify you are using create_default_engine() or registering custom rules
Check whether content is inside fenced/inline code that is skipped by default

Too much is being redacted

Tighten custom regex patterns
Keep --redact-inline-code / --redact-fenced-code-blocks disabled unless required

CLI command not found

Ensure the package is installed in the active environment
Try module mode: python -m markdown_redactor.cli input.md

Additional resources

Full usage guide: docs/GUIDE.md
Architecture guide: docs/ARCHITECTURE.md
FAQ: docs/FAQ.md
Support process: SUPPORT.md
Security policy: SECURITY.md
Changelog: CHANGELOG.md
Releasing guide: docs/RELEASING.md
Guided onboarding docs: docs/README.md
Runnable examples:

Development and contribution

See CONTRIBUTING.md for setup and quality checks.

Primary local quality command:

PYTHONPATH=src .venv/bin/python -m ruff check src tests && \
PYTHONPATH=src .venv/bin/python -m mypy src && \
PYTHONPATH=src .venv/bin/python -m pytest

Release process

Maintainers can follow docs/RELEASING.md.

Publishing is automated via .github/workflows/release.yml on tags matching v*. GitHub Release notes and signed provenance attestations are generated via .github/workflows/github-release.yml.
