Recall Improvement Patch

Target: 95.85% → 99%+ recall

Files Included

| File | Action | Description |
|---|---|---|
| `additional_patterns.py` | NEW | Drop into the `detectors/` folder |
| `dictionaries.py` | REPLACE | Replace the existing `detectors/dictionaries.py` |
| `merger_patch.py` | PATCH | Apply changes to `pipeline/merger.py` |
| `orchestrator_registration.py` | REFERENCE | Instructions for registering the new detector |

Installation

Step 1: Add new detector (EMPLOYER, AGE, HEALTH_PLAN_ID)

```bash
cp additional_patterns.py /path/to/scrubiq/detectors/
```

Step 2: Replace dictionaries.py (geo folder + min_length)

```bash
cp dictionaries.py /path/to/scrubiq/detectors/
```
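The new `dictionaries.py` is not reproduced in this README. As a rough illustration of the two ideas named in Step 2, the sketch below loads one term per line from every `.txt` file in a folder (standing in for the geo folder) and skips terms shorter than a minimum length. The function name `load_terms`, the `.txt` layout and the default `min_length=3` are assumptions for illustration, not the patch's actual code or values.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_terms(folder: Path, entity_type: str, min_length: int = 3) -> Dict[str, str]:
    """Hypothetical loader: map each lowercased term to `entity_type`.

    Reads every *.txt file in `folder` (one term per line), strips
    whitespace, and skips blank lines and terms shorter than `min_length`.
    A length floor like this suppresses short, ambiguous strings such as
    "In" or "Me" (also state abbreviations) that cause false positives.
    """
    terms: Dict[str, str] = {}
    for path in sorted(folder.glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            term = line.strip()
            if len(term) < min_length:
                continue  # too short to match safely
            terms[term.lower()] = entity_type
    return terms


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        geo = Path(tmp) / "geo"
        geo.mkdir()
        (geo / "cities.txt").write_text("Boston\nRye\nEl\n", encoding="utf-8")
        print(load_terms(geo, "CITY"))  # {'boston': 'CITY', 'rye': 'CITY'}
```

Whether the real filter runs at load time (as here) or at match time is not stated in the patch; either placement removes the same short terms.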

Step 3: Patch merger.py

Open pipeline/merger.py and make these changes:

3a. Add EMPLOYER to COMPATIBLE_TYPE_GROUPS (line ~20-28):

```python
# Requires `from typing import List, Set` at the top of merger.py.
COMPATIBLE_TYPE_GROUPS: List[Set[str]] = [
    {"NAME", "NAME_PATIENT", "NAME_PROVIDER", "NAME_RELATIVE", "NAME_FAMILY"},
    {"ADDRESS", "STREET", "STREET_ADDRESS", "CITY", "STATE", "ZIP", "LOCATION"},
    {"DATE", "DOB", "DATE_DOB", "DATE_ADMISSION", "DATE_DISCHARGE"},
    {"PHONE", "FAX", "PHONE_MOBILE", "PHONE_HOME", "PHONE_WORK"},
    {"SSN", "SSN_PARTIAL"},
    {"MRN", "PATIENT_ID", "MEDICAL_RECORD"},
    {"HEALTH_PLAN_ID", "MEMBER_ID", "INSURANCE_ID"},
    {"EMPLOYER", "ORGANIZATION", "COMPANY", "COMPANYNAME"},  # <-- ADD THIS LINE
]
```
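The patch does not show how `merger.py` consumes `COMPATIBLE_TYPE_GROUPS`. A plausible reading is that two overlapping spans are treated as the same entity only when their types are equal or share a group. The sketch below shows that check; `types_compatible` is a hypothetical helper, not a name from the patch, and the group list is trimmed for brevity.

```python
from typing import List, Set

# Trimmed copy of the groups above, for illustration only.
COMPATIBLE_TYPE_GROUPS: List[Set[str]] = [
    {"NAME", "NAME_PATIENT", "NAME_PROVIDER", "NAME_RELATIVE", "NAME_FAMILY"},
    {"HEALTH_PLAN_ID", "MEMBER_ID", "INSURANCE_ID"},
    {"EMPLOYER", "ORGANIZATION", "COMPANY", "COMPANYNAME"},
]


def types_compatible(a: str, b: str) -> bool:
    """Hypothetical check: True if the types are equal or share a group."""
    if a == b:
        return True
    return any(a in group and b in group for group in COMPATIBLE_TYPE_GROUPS)


print(types_compatible("COMPANY", "EMPLOYER"))  # True once the new group exists
print(types_compatible("EMPLOYER", "NAME"))     # False
```

Without the new group, a `COMPANY` span from one detector and an `EMPLOYER` span from `additional_patterns.py` over the same text would count as conflicting types rather than one entity.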

3b. Update TYPE_NORMALIZE (line ~669-676):

Find this section:

```python
    # === CLINICAL (context-only, filtered before output) ===
    "HOSPITAL": "FACILITY",
    "ORG": "FACILITY",
    "ORGANIZATION": "FACILITY",
    "VENDOR": "FACILITY",
    "COMPANYNAME": "FACILITY",
    "COMPANY": "FACILITY",
```

Replace with:

```python
    # === EMPLOYER (companies/organizations) ===
    "COMPANYNAME": "EMPLOYER",   # CHANGED from FACILITY
    "COMPANY": "EMPLOYER",       # CHANGED from FACILITY
    "ORG": "EMPLOYER",           # CHANGED from FACILITY
    "ORGANIZATION": "EMPLOYER",  # CHANGED from FACILITY

    # === CLINICAL (context-only, filtered before output) ===
    "HOSPITAL": "FACILITY",
    "VENDOR": "FACILITY",

    # === MEDICATION ===
    "DRUG": "MEDICATION",  # NEW
```
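Assuming `TYPE_NORMALIZE` is applied as a plain dictionary lookup that leaves unknown labels unchanged (the call site is not shown in the patch), the effect of the edit can be checked in isolation. `normalize_type` is a hypothetical helper; the entries are copied from the replacement section above.

```python
from typing import Dict

# Entries copied from the updated TYPE_NORMALIZE section.
TYPE_NORMALIZE: Dict[str, str] = {
    "COMPANYNAME": "EMPLOYER",
    "COMPANY": "EMPLOYER",
    "ORG": "EMPLOYER",
    "ORGANIZATION": "EMPLOYER",
    "HOSPITAL": "FACILITY",
    "VENDOR": "FACILITY",
    "DRUG": "MEDICATION",
}


def normalize_type(entity_type: str) -> str:
    """Map a raw detector label to its canonical type; unknown labels pass through."""
    return TYPE_NORMALIZE.get(entity_type, entity_type)


for raw in ["ORG", "HOSPITAL", "DRUG", "AGE"]:
    print(raw, "->", normalize_type(raw))
# ORG -> EMPLOYER
# HOSPITAL -> FACILITY
# DRUG -> MEDICATION
# AGE -> AGE
```

One consequence worth checking: per the section comment, `FACILITY` types are context-only and filtered before output. After this change, `ORG` and `ORGANIZATION` spans reach output as `EMPLOYER`, which should raise recall but may also surface organization names that were previously suppressed.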

Step 4: Register the new detector

In detectors/orchestrator.py, add:

```python
from .additional_patterns import AdditionalPatternDetector
```

Then add an instance to your detector list:

```python
AdditionalPatternDetector(),
```
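The README does not show `orchestrator.py`. Assuming the orchestrator keeps a list of detector instances and calls `detect(text)` on each (the method name comes from Step 5), registration amounts to appending one more instance. The `Span` dataclass, `Detector` protocol, `AgeStub` and `run_all` below are hypothetical stand-ins, not ScrubIQ's actual types.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Span:
    start: int
    end: int
    text: str
    entity_type: str


class Detector(Protocol):
    def detect(self, text: str) -> List[Span]: ...


class AgeStub:
    """Stand-in detector that tags the literal phrase '45 years old'."""

    def detect(self, text: str) -> List[Span]:
        phrase = "45 years old"
        i = text.find(phrase)
        return [] if i < 0 else [Span(i, i + len(phrase), phrase, "AGE")]


def run_all(detectors: List[Detector], text: str) -> List[Span]:
    """Collect spans from every registered detector, ordered by position."""
    spans: List[Span] = []
    for det in detectors:
        spans.extend(det.detect(text))
    return sorted(spans, key=lambda s: (s.start, s.end))


detectors: List[Detector] = []  # existing detectors would already be listed here
detectors.append(AgeStub())     # the Step 4 change: one more instance
print([s.entity_type for s in run_all(detectors, "Patient is 45 years old.")])  # ['AGE']
```

If the import is added but the instance is not, nothing fails loudly; the new entity types simply never appear, so Step 5 only proves the module imports, not that the orchestrator runs it.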

Step 5: Verify

```bash
python3 -c "
from scrubiq.detectors.additional_patterns import AdditionalPatternDetector

d = AdditionalPatternDetector()

# Test EMPLOYER
spans = d.detect('I work at ABC Corporation.')
print('EMPLOYER:', [s.text for s in spans if s.entity_type == 'EMPLOYER'])

# Test AGE
spans = d.detect('Patient is 45 years old.')
print('AGE:', [s.text for s in spans if s.entity_type == 'AGE'])

# Test HEALTH_PLAN_ID
spans = d.detect('Member ID: XYZ123456')
print('HEALTH_PLAN_ID:', [s.text for s in spans if s.entity_type == 'HEALTH_PLAN_ID'])
"
```

Expected output:

```text
EMPLOYER: ['ABC Corporation']
AGE: ['45 years old']
HEALTH_PLAN_ID: ['XYZ123456']
```
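`additional_patterns.py` is not reproduced in this README. For orientation only, here is a minimal regex sketch that produces the expected output above for the three test sentences. The patterns, the `Span` shape and the class name `RegexSketchDetector` are assumptions; the real patterns are likely different and cover more phrasings. The sketch only mirrors the `detect()` / `.text` / `.entity_type` interface used in Step 5.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    start: int
    end: int
    text: str
    entity_type: str


# (entity type, compiled pattern, capture group holding the value); all hypothetical.
PATTERNS: List[Tuple[str, re.Pattern, int]] = [
    # Capitalized words after "work(s)/worked at|for", e.g. "ABC Corporation".
    ("EMPLOYER", re.compile(r"\b(?:works?|worked) (?:at|for) ([A-Z][\w&]*(?: [A-Z][\w&]*)*)"), 1),
    # "45 years old", "3-year-old", "80 yrs old".
    ("AGE", re.compile(r"\b\d{1,3}[ -](?:years?|yrs?)[ -]old\b", re.IGNORECASE), 0),
    # Value after a labeled member/subscriber/policy/plan identifier.
    ("HEALTH_PLAN_ID", re.compile(
        r"\b(?:member|subscriber|policy|plan)\s*(?:id|#|number|no\.?)\s*[:#]?\s*([A-Z0-9]{6,20})\b",
        re.IGNORECASE), 1),
]


class RegexSketchDetector:
    def detect(self, text: str) -> List[Span]:
        spans: List[Span] = []
        for entity_type, pattern, group in PATTERNS:
            for m in pattern.finditer(text):
                # Offsets and text come from the capture group, not the label.
                spans.append(Span(m.start(group), m.end(group), m.group(group), entity_type))
        return spans


d = RegexSketchDetector()
for sentence, etype in [
    ("I work at ABC Corporation.", "EMPLOYER"),
    ("Patient is 45 years old.", "AGE"),
    ("Member ID: XYZ123456", "HEALTH_PLAN_ID"),
]:
    print(f"{etype}:", [s.text for s in d.detect(sentence) if s.entity_type == etype])
```

Capturing only the value (group 1) rather than the whole match keeps labels like "Member ID:" out of the redacted span, which matters for span-level recall scoring.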

What These Changes Fix

| Entity type | Missed count | Fix |
|---|---|---|
| EMPLOYER | 773 | New patterns in `additional_patterns.py` |
| HEALTH_PLAN_ID | 873 | New patterns in `additional_patterns.py` |
| AGE | 579 | New patterns in `additional_patterns.py` |
| MEDICATION | 79 | Dictionary outputs MEDICATION (not DRUG) |
| CITY/STATE | ~80 | Geo folder now loaded in `dictionaries.py` |
| False positives | - | `min_length` filter in `dictionaries.py` |
| Type mismatches | - | `TYPE_NORMALIZE` updates in `merger.py` |

After Installation

Run your recall test:

```bash
pytest tests/test_synthetic_accuracy.py -v -s
```

Expected improvement: 95.85% → 98% to 99%+
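Recall here is presumably the standard ratio TP / (TP + FN), i.e. detected gold PHI spans over all gold PHI spans; the test file's matching rules are not shown. The sketch below shows the arithmetic. The per-type counts are taken from the table above; the 960/40 example is a toy, not a result from this project.

```python
from typing import Dict


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of gold spans detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no gold spans")
    return true_positives / total


def recall_if_recovered(true_positives: int, false_negatives: int, recovered: int) -> float:
    """Recall after `recovered` former misses are detected (gold total unchanged)."""
    recovered = min(recovered, false_negatives)
    return recall(true_positives + recovered, false_negatives - recovered)


# Missed counts from the "What These Changes Fix" table (CITY/STATE is approximate).
MISSED: Dict[str, int] = {
    "EMPLOYER": 773, "HEALTH_PLAN_ID": 873, "AGE": 579, "MEDICATION": 79, "CITY/STATE": 80,
}
print(sum(MISSED.values()))  # 2384

# Toy numbers only: 1,000 gold spans, 40 missed, then 30 of those recovered.
print(round(recall(960, 40), 4))                   # 0.96
print(round(recall_if_recovered(960, 40, 30), 4))  # 0.99
```

Recall ignores false positives entirely, so the `min_length` change is better judged by a precision check run alongside this test.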
