
jomcgi-old/scrubadub

 
 


scrubadub

Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize; other times we don't. This package makes it easy to scrub personal information from free text without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

  • Names
  • Email addresses
  • Addresses/Postal codes (US, GB, CA)
  • URLs
  • Phone numbers
  • Username and password combinations
  • Skype/Twitter usernames
  • Social security numbers (US, GB)
  • Tax numbers (GB)
  • Driving licence numbers (GB)

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:

>>> import scrubadub

# My cat may be more tech-savvy than most, but he doesn't want other people to know it.
>>> text = "My cat can be contacted on example@example.com, or 1800 555-5555"

# Replaces the phone number and email address with anonymous IDs.
>>> scrubadub.clean(text)
'My cat can be contacted on {{EMAIL}}, or {{PHONE}}'

There are many ways to tailor the behavior of scrubadub using different Detectors and PostProcessors. Scrubadub is highly configurable and supports localisation for different languages and regions.
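To give a feel for what a detector does, here is a minimal sketch using only the Python standard library. It is not scrubadub's implementation or API: the `PATTERNS` table and the `toy_clean` function are hypothetical names invented for this illustration, and the regular expressions are deliberately simplistic compared with the real detectors.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# scrubadub's real detectors are far more thorough than these.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{4} \d{3}-\d{4}\b"),
}


def toy_clean(text):
    """Replace each match of each pattern with a {{LABEL}} placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub("{{" + label + "}}", text)
    return text


print(toy_clean("My cat can be contacted on example@example.com, or 1800 555-5555"))
```

Each entry in the table plays the role of one detector: it finds a kind of sensitive text and swaps it for a labelled placeholder. For real use, prefer scrubadub.clean, which handles many more formats and locales.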

Installation

To install scrubadub using pip, simply type:

pip install scrubadub

This package requires Python 3.5 or later. For Python 2.7 support, see v1.2.2, the last version that supports it.

Some detectors need extra dependencies; see the documentation for details.

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

About

Clean personally identifiable information from dirty dirty text.
