trackingplan/pii-regex-library

Regular Expressions for PII Detection

This repository provides a curated collection of regular expressions (regex) designed to help developers and data analysts detect potential personally identifiable information (PII) in datasets and applications. The patterns cover many countries in the European Union as well as the United States, Canada and Mexico, and include common data types such as email addresses, credit‑card numbers, national identity numbers and social security numbers.

Why this repository exists

Under privacy laws such as the EU’s General Data Protection Regulation (GDPR) and Spain’s LOPD‑GDD, organisations must collect and process only data that is adequate, relevant and limited to what is necessary. Many categories of data (names, phone numbers, identification numbers, emails, IP addresses, location or purchase history) are classed as personal data and fall within these rules. Other categories, such as health data, biometric identifiers, ethnicity or political opinions, are sensitive and require even stronger protections. Detecting PII in logs or datasets is a first step towards complying with the principle of data minimisation.

The regex patterns in this repository are not a silver bullet. They aim to catch typical formats of national identification numbers, phone numbers, bank account numbers and other data types. However, they may produce false positives or false negatives and should be used in conjunction with other validation logic and manual review. Never rely solely on regex to make decisions about individuals.
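As a minimal sketch of what "other validation logic" can look like, the example below pairs a loose candidate pattern for payment card numbers with a Luhn checksum so that arbitrary digit runs are filtered out. The pattern `CARD_CANDIDATE` and the function names are illustrative assumptions, not entries from this repository.

```python
import re
from typing import List

# Hypothetical candidate pattern for illustration only; the patterns
# shipped in this repository may differ.
CARD_CANDIDATE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Keep only regex candidates that also pass the checksum."""
    return [
        m.group()
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(m.group())
    ]
```

A checksum reduces false positives but does not remove them, so matches should still be treated as candidates for review rather than confirmed PII.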

Repository structure

Each country has its own Markdown file under the country_regexes/ directory. Every file contains a table with four columns:

| Data type | Country | ISO Code | Regex |
| --- | --- | --- | --- |

The Data type column lists the kind of information (e.g. Email, Phone number, Social security number). The Country column indicates the jurisdiction to which the pattern applies. The ISO Code column contains the ISO 3166-1 alpha-2 country code (e.g. ES for Spain, DE for Germany, US for the United States). The Regex column contains the regular expression. For example, the pattern for a German social security number matches eight digits followed by a letter and three more digits (twelve characters in total), while the Portuguese social security number comprises eleven digits. We have included similar patterns for more than two dozen EU Member States and added entries for the United States, Canada and Mexico based on publicly available documentation.

Additionally, the repository includes a global.md file with regex patterns that apply to any country, such as generic patterns for ethnicity, geolocation, religion, gender, and IP addresses. These patterns use the ISO code ZZ to indicate their global applicability.
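The four-column layout can be read programmatically. The sketch below assumes each file holds a standard Markdown pipe table, that pipes inside a regex are escaped as `\|`, and that the Regex cell may be wrapped in backticks. The sample rows, the `parse_table` and `patterns_for` helpers, and the placeholder patterns are hypothetical and only illustrate the idea.

```python
import re
from typing import Dict, List

# A pipe counts as a cell separator only when it is not backslash-escaped.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

# Placeholder rows for illustration; not copied from the repository.
SAMPLE = r"""
| Data type | Country | ISO Code | Regex |
| --- | --- | --- | --- |
| Example id | Spain | ES | ^ES-\d{4}$ |
| Example code | Global | ZZ | ^(?:A\|B)\d{3}$ |
"""


def parse_table(markdown: str) -> List[Dict[str, str]]:
    """Extract data rows from a four-column Markdown table."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in UNESCAPED_PIPE.split(line)]
        # Leading and trailing pipes yield empty cells; drop them.
        if cells and cells[0] == "":
            cells = cells[1:]
        if cells and cells[-1] == "":
            cells = cells[:-1]
        if len(cells) != 4:
            continue
        data_type, country, iso, regex = cells
        # Skip the header row and the --- separator row.
        if iso == "ISO Code" or re.fullmatch(r":?-+:?", iso):
            continue
        rows.append({
            "data_type": data_type,
            "country": country,
            "iso": iso,
            # Undo the Markdown escaping to recover the usable pattern.
            "regex": regex.strip("`").replace("\\|", "|"),
        })
    return rows


def patterns_for(rows: List[Dict[str, str]], iso: str) -> List[Dict[str, str]]:
    """Return rows for one country plus the global (ZZ) rows."""
    return [r for r in rows if r["iso"] in (iso, "ZZ")]
```

Calling `patterns_for(parse_table(SAMPLE), "ES")` returns both sample rows, because rows tagged ZZ are treated as applicable everywhere.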

Generic patterns

Some files include generic patterns (e.g. for email addresses or IPv4 addresses) that apply across multiple jurisdictions. Where national formats exist (such as IBAN, CLABE or routing numbers), the pattern reflects the structure described in official sources. For instance, a CLABE account number in Mexico must contain 18 digits that encode the bank, branch, account and a check digit.
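To illustrate why structure alone is not enough, the sketch below combines an 18-digit shape check with the commonly documented CLABE check-digit scheme (weights 3, 7, 1 repeated over the first 17 digits). The pattern `CLABE_SHAPE`, the function names, and the 3/3/11/1 field split are assumptions made for this example; verify them against official documentation before relying on them.

```python
import re
from typing import Dict

# Shape check only: exactly 18 ASCII digits (illustrative pattern).
CLABE_SHAPE = re.compile(r"[0-9]{18}")
WEIGHTS = (3, 7, 1)


def clabe_check_digit(first17: str) -> int:
    """Compute the check digit from the first 17 digits."""
    total = sum(
        (int(d) * WEIGHTS[i % 3]) % 10
        for i, d in enumerate(first17)
    )
    return (10 - total % 10) % 10


def is_valid_clabe(value: str) -> bool:
    """Return True if the value has the right shape and check digit."""
    if CLABE_SHAPE.fullmatch(value) is None:
        return False
    return clabe_check_digit(value[:17]) == int(value[17])


def split_clabe(value: str) -> Dict[str, str]:
    """Split an 18-digit value into its assumed fields."""
    return {
        "bank": value[:3],
        "branch": value[3:6],
        "account": value[6:17],
        "check": value[17],
    }
```

For instance, `is_valid_clabe("032180000118359719")` passes under this scheme, while changing the final digit makes it fail even though the regex alone would still match.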

How to contribute

We welcome contributions from the community. If you know of a new data type or country‑specific pattern that should be added, or if you can improve an existing regex, please open a pull request with your proposed change. To keep the repository consistent:

  1. Follow the table structure. Add one row per data type for each country. Use the same column order (Data type, Country, ISO Code, Regex).
  2. Escape vertical bars. If your regex contains a pipe (|), escape it as \| so that the Markdown table renders correctly.
  3. Cite your sources. Provide a reference to an official or authoritative source describing the format (e.g. legislation, government documentation, or standards body) so that others can verify the pattern. You may include citations in the pull request description.
  4. Respect privacy laws. Ensure that the patterns you submit are used for detecting and protecting data, not for profiling or discrimination. The GDPR principle of minimisation prohibits storing data that is not necessary for the stated purpose.
  5. Test your regex. Where possible, include unit tests or sample strings demonstrating how the pattern works. Remember that some numbers (like social security numbers) include checksum rules; a regex alone generally cannot validate them.
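Guidelines 2 and 5 can be checked with a few lines before opening a pull request. The sketch below is one possible approach: `escape_for_table` and `check_samples` are hypothetical helper names, and the pattern in the usage note is a placeholder rather than a real national format.

```python
import re
from typing import List


def escape_for_table(pattern: str) -> str:
    """Escape pipes so the pattern can sit inside a Markdown table cell.

    Assumes the input is the raw, not yet escaped, pattern.
    """
    return pattern.replace("|", "\\|")


def check_samples(
    pattern: str,
    should_match: List[str],
    should_not_match: List[str],
) -> List[str]:
    """Return a list of human-readable failures (empty means all passed)."""
    compiled = re.compile(pattern)
    failures = []
    for sample in should_match:
        if compiled.fullmatch(sample) is None:
            failures.append(f"expected match: {sample!r}")
    for sample in should_not_match:
        if compiled.fullmatch(sample) is not None:
            failures.append(f"unexpected match: {sample!r}")
    return failures
```

For a placeholder pattern such as `^(?:ES|PT)-\d{4}$`, you could call `check_samples` with a couple of matching and non-matching strings, paste any failures into the pull request description, and use `escape_for_table` to produce the text for the Regex column.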

Disclaimer

The expressions and documentation provided here are offered as is for educational and informational purposes. They do not constitute legal advice. While we have consulted publicly available guidance to build these patterns, we cannot guarantee their accuracy or completeness. Always review national legislation and consult privacy professionals when implementing PII detection in production systems.

License

Unless otherwise noted, the contents of this repository are released under the MIT License. You are free to use, modify and distribute the regex patterns provided you keep the copyright and license notice and abide by the other terms of the license.

Contact

If you have questions or suggestions about this project, please open an issue in the repository’s issue tracker. Thank you for helping build a safer, more privacy‑conscious ecosystem!
