A multi-agent system for extracting requirements from data privacy and tech regulation statutes using the Anthropic Agent SDK.
Multi-agent architecture: Specialized agents for different tasks:
- Statute Reader: Parses statute structure (definitions, applicability, rights, duties, exemptions, enforcement)
- Section Analyzer: Extracts specific requirements with exact citations
- Citation Verifier: Validates all citations against the original text
- Requirement Classifier: Categorizes requirements (disclosure, operational, technical, enforcement)
Built-in viewer: Interactive HTML viewer for exploring extracted requirements, filtering by category/confidence, and browsing the statute structure tree
Model Configuration:
- Orchestrator: Uses Opus for complex coordination
- Subagents: Use Sonnet or Haiku, matched to each task's complexity (see the architecture diagram)
Anti-hallucination measures:
- Every requirement must have a direct quote from the statute
- Two-pass verification (extract then verify)
- Confidence scoring for citations
- Flagging of unverified requirements
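The quote-verification step can be approximated in plain Python. The sketch below is illustrative, not the package's actual verifier: it checks whether a quoted passage appears verbatim in the statute, and otherwise falls back to a fuzzy sliding-window score, with an assumed (tunable) 0.9 confidence threshold.

```python
import difflib

def verify_quote(quote: str, statute_text: str) -> tuple[bool, float]:
    """Check that a quoted passage appears in the statute and score the match.

    Returns (verified, confidence): confidence is 1.0 for a verbatim match,
    otherwise the best fuzzy-match ratio against a sliding window.
    """
    # Normalize whitespace so line wrapping doesn't break verbatim matches
    norm = " ".join(quote.split())
    text = " ".join(statute_text.split())
    if norm in text:
        return True, 1.0
    # Fuzzy fallback: slide a quote-sized window across the statute text
    best = 0.0
    window = len(norm)
    step = max(1, window // 4)
    for i in range(0, max(1, len(text) - window + 1), step):
        ratio = difflib.SequenceMatcher(None, norm, text[i:i + window]).ratio()
        best = max(best, ratio)
    # Requirements below the threshold would be flagged as unverified
    return best >= 0.9, best
```

Requirements whose best score stays below the threshold are the ones that get flagged for human review.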
Diagnostics and logging:
- Phase 1 logs section/definition counts on success, or a clear warning on failure
- Parse failures log the raw agent response details for debugging
Statute interpretation skill: Incorporates statutory interpretation guidance from legal experts
PDF Support: Can parse both text files and PDFs (with pdfplumber or pypdf)
# Install from PyPI
pip install techreg-parser
# With PDF support
pip install techreg-parser[pdf]
# Or install locally for development
pip install -e .

# Analyze a statute and output JSON
techreg-parser path/to/statute.txt --output results.json
# Analyze a PDF statute
techreg-parser path/to/statute.pdf --output results.json
# Output markdown report
techreg-parser path/to/statute.txt --output analysis.md --format markdown
# Skip citation verification (faster but less reliable)
techreg-parser path/to/statute.txt --no-verify
# Batch-process a directory of PDFs with 3 concurrent workers
techreg-parser --input-dir "State Privacy Laws/" --parallel 3 --output results/
# Disable Phase 1 structure caching
techreg-parser path/to/statute.txt --no-cache --output results.json
# Open the interactive viewer (then select a JSON results file)
techreg-parser view

import asyncio

from TechRegParser import TechRegParserOrchestrator, OrchestratorConfig

async def main():
    config = OrchestratorConfig(
        verify_citations=True,
        classify_requirements=True,
    )
    parser = TechRegParserOrchestrator(config=config)

    result = await parser.analyze_statute(
        statute_path="path/to/texas_privacy_law.txt",
        output_format="json",
    )

    # Access results
    for req in result.requirements:
        print(f"Requirement: {req.description}")
        print(f"  Citation: {req.citation.section}")
        print(f"  Category: {req.category.value}")
        print(f"  Verified: {req.verified}")
        print()

    # Export to file
    await parser.export_results(result, "output.json", format="json")
asyncio.run(main())

                      +-------------------+
                      |   Orchestrator    |
                      |   (Opus Model)    |
                      +---------+---------+
                                |
       +----------------+-------+--------+----------------+
       |                |                |                |
+------v-----+   +------v-----+   +------v-----+   +------v-----+
|  Statute   |   |  Section   |   |  Citation  |   |Requirement |
|   Reader   |   |  Analyzer  |   |  Verifier  |   | Classifier |
|  (Haiku)   |   |  (Sonnet)  |   |  (Python)  |   |  (Haiku)   |
+------------+   +------------+   +------------+   +------------+
- DISCLOSURE: Must be stated in privacy policy/notice
- OPERATIONAL: Internal compliance processes (response times, procedures)
- TECHNICAL: System/UI implementation (GPC signals, security measures, link placement, UI elements)
- LEGAL FRAMEWORK: Enforcement mechanisms, penalties, AG authority, cure periods
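A cheap keyword heuristic along these lines can sanity-check the model's category assignments. The cue lists below are illustrative, not what the classifier agent actually uses.

```python
from enum import Enum

class Category(Enum):
    DISCLOSURE = "disclosure"
    OPERATIONAL = "operational"
    TECHNICAL = "technical"
    LEGAL_FRAMEWORK = "legal_framework"

# Illustrative keyword cues per category; first match wins
_CUES = {
    Category.DISCLOSURE: ("privacy policy", "notice", "disclose"),
    Category.TECHNICAL: ("opt-out preference signal", "security", "user interface", "link"),
    Category.LEGAL_FRAMEWORK: ("attorney general", "penalty", "cure period", "enforcement"),
}

def classify(description: str) -> Category:
    text = description.lower()
    for category, cues in _CUES.items():
        if any(cue in text for cue in cues):
            return category
    # Default bucket: internal compliance processes
    return Category.OPERATIONAL
```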
Run techreg-parser view to open the built-in interactive viewer in your browser. Then drag-and-drop or select a JSON results file to explore:
- Filter requirements by category, verification status, and confidence threshold
- Full-text search across descriptions and citations
- Browse the statute structure tree with section types and line ranges
- Expand individual requirements to see quoted text, conditions, and metadata
The analysis produces:
- Requirements: List of all extracted requirements with citations
- Definitions: All defined terms from the statute
- Structure: Full statute section tree (IDs, types, titles, line ranges) — included by default in JSON export for the viewer's Structure tab
- Verification: Status of citation verification
- Classification: Category for each requirement
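A JSON export might look like the following; every field name and value here is illustrative, so consult an actual export for the authoritative schema.

```json
{
  "requirements": [
    {
      "description": "Respond to verified consumer requests within 45 days",
      "citation": {
        "section": "Sec. 541.151",
        "quote": "shall respond to the consumer without undue delay"
      },
      "category": "operational",
      "verified": true,
      "confidence": 0.97
    }
  ],
  "definitions": [
    {"term": "Consumer", "definition": "an individual who is a resident of this state"}
  ],
  "structure": [
    {"id": "sec-541-001", "type": "definitions", "title": "Definitions", "lines": [12, 140]}
  ]
}
```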
Based on lessons from analyzing tech regulation statutes:
- Start with definitions sections to anchor interpretation — defined terms control meaning throughout
- Separate disclosure requirements from operational and technical requirements
- Tech regulation statutes follow predictable architecture (definitions, scope, rights, duties, exemptions, enforcement)
- Obligations and defined terms vary across jurisdictions and regulatory domains — never assume uniformity
- Work section by section, not requirement by requirement — structure drives accurate extraction
- Every extracted requirement must trace back to a specific statutory provision with a verbatim quote
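The "definitions first" lesson can be sketched as a pre-pass that collects defined terms before any requirement extraction. The pattern below only covers the common '"X" means ...' drafting style; real statutes vary by jurisdiction.

```python
import re

def extract_definitions(statute_text: str) -> dict[str, str]:
    """Collect defined terms of the form: "Term" means <definition>.

    Exact drafting conventions vary; this covers the '"X" means ...'
    style common in U.S. state privacy laws.
    """
    pattern = re.compile(r'"([^"]+)"\s+means\s+([^.]+)\.')
    return {term: definition.strip() for term, definition in pattern.findall(statute_text)}
```

The resulting map can then anchor interpretation while each section is analyzed in turn.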
Before installing TechRegParser, you need the following set up on your computer. If you're not sure whether you have these, follow the steps below.
Python is the programming language this tool runs on. Download it from python.org. During installation on Windows, make sure to check "Add Python to PATH".
To verify it's installed, open a terminal and run:
python --version

TechRegParser uses Anthropic's AI models to read and analyze statutes. Claude Code is the program that connects to those models.
- Install Claude Code by following the official setup guide
- You will need an Anthropic API key — this is what allows the tool to communicate with the AI. You can get one from console.anthropic.com
- API usage is billed by Anthropic based on how much text is processed. Analyzing a single statute typically costs a few dollars
Claude Code requires Git Bash to run on Windows. Download and install Git for Windows. Use the default installation options.
To verify it's installed:
git --version

If you want to analyze statutes in PDF format (rather than plain text), install PDF support:
pip install techreg-parser[pdf]

For developers and CI environments:
- Python 3.11+
- Claude Code with a configured Anthropic API key
- Git for Windows (Windows only)
- Anthropic Agent SDK (claude-agent-sdk)
- Pydantic 2.0+
- Optional: pdfplumber or pypdf for PDF support
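The optional PDF dependencies can be wired up with a small loader along these lines; this is a sketch of the general approach, not the package's actual code.

```python
from pathlib import Path

def load_statute_text(path: str) -> str:
    """Read a statute as plain text, using a PDF library for .pdf inputs.

    Tries pdfplumber first, then pypdf; both are optional dependencies
    (installed via `pip install techreg-parser[pdf]`).
    """
    p = Path(path)
    if p.suffix.lower() != ".pdf":
        return p.read_text(encoding="utf-8")
    try:
        import pdfplumber
        with pdfplumber.open(p) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    except ImportError:
        pass
    try:
        from pypdf import PdfReader
        return "\n".join(page.extract_text() or "" for page in PdfReader(p).pages)
    except ImportError:
        raise RuntimeError(
            "PDF input requires pdfplumber or pypdf: pip install techreg-parser[pdf]"
        )
```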
When developing TechRegParser with Claude Code, the assistant maintains a persistent memory file at .claude/projects/.../memory/MEMORY.md. This memory carries context across separate conversations so the assistant doesn't re-learn the same things each session.
- Project architecture (phase pipeline, model assignments, orchestrator location)
- Performance optimizations already implemented (caching, early classification, parallel safety)
- CLI flags and their behavior
- Key code patterns (return types, backfill logic, temp file naming)
- Files that should not be deleted (e.g. viewer.html)
- Resuming work across sessions — the assistant already knows the codebase layout, model assignments, and design decisions without needing to re-explore
- Avoiding regressions — recorded patterns (like the backfill duplication or temp file hashing) prevent the assistant from accidentally breaking established behavior
- Consistent style — remembering conventions means new code matches existing patterns
- One-off questions — if you're just asking about Python syntax or a general concept, memory adds no value
- New/unrelated projects — the memory is scoped to this project directory; it won't interfere with other work, but it also won't help
- Speculative or unverified info — memory should only contain patterns confirmed across multiple interactions or explicitly requested by the user, not guesses from reading a single file
- Ask the assistant to "remember X across sessions" to add something
- Ask the assistant to "forget X" or "stop remembering X" to remove an entry
- The memory file is plain markdown — you can edit it directly if needed
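A memory file entry might look like this; the content below is purely illustrative.

```markdown
## Architecture
- Four-phase pipeline: structure parse → section analysis → citation verification → classification
- Orchestrator uses Opus; subagent models are assigned per task

## Do not delete
- viewer.html (served by `techreg-parser view`)
```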
MIT