Describe the bug
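mail-parser's regex for the "id" clause of Received headers misbehaves on domains that end with ".id": the dot before the TLD satisfies the [^\w] prefix, so the TLD is treated as the start of an "id" clause.
My scraper fails specifically on messages from these domains.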
To Reproduce
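Parse a message whose Received header contains a domain ending in ".id".
Proposed fix: in const.py:51, change the regex to a negative lookbehind covering whitespace and the dot (current pattern first, proposed pattern second):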
(
r"[^\w](?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*"
r"[(]?envelope-sender|\s+from|\s+by|\s+with"
r"(?! cipher)|\s+for|\s+via|;))"
)
(
r"(?<![^\w\.])(?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*"
r"[(]?envelope-sender|\s+from|\s+by|\s+with"
r"(?! cipher)|\s+for|\s+via|;))"
)
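A minimal, self-contained repro of the difference, assuming the stripped named group is "id" and using a made-up Received header (the hostnames, IP, and ID "abc123" are hypothetical). It runs each pattern with re.finditer and no flags, which may differ from how mail-parser compiles them. The current pattern captures twice on this header, once starting at the ".id" TLD. Note that the proposed (?<![^\w\.]), as written, asserts that the preceding character IS a word character or a dot, so on this header it keeps only the TLD occurrence; the sketch below suggests (?<![\w.]) as what was probably intended.

```python
import re

# Shared body of the "id" pattern quoted from const.py
# (named group reconstructed as "id"; this is an assumption).
ID_BODY = (
    r"(?:id\s+(?P<id>.+?)(?:\s*[(]?envelope-from|\s*"
    r"[(]?envelope-sender|\s+from|\s+by|\s+with"
    r"(?! cipher)|\s+for|\s+via|;))"
)

# Current pattern: consumes one non-word character before "id",
# so the "." in "example.id" qualifies.
CURRENT_ID_RE = re.compile(r"[^\w]" + ID_BODY)

# Proposed pattern exactly as typed in this issue: the lookbehind only
# succeeds when the previous character is a word character or a dot.
AS_TYPED_ID_RE = re.compile(r"(?<![^\w\.])" + ID_BODY)

# Suggested variant: "id" must NOT follow a word character or a dot.
FIXED_ID_RE = re.compile(r"(?<![\w.])" + ID_BODY)

# Hypothetical Received header from a host under the .id TLD.
RECEIVED = (
    "from mail.example.id (mail.example.id [1.2.3.4]) "
    "by mx.google.com with ESMTPS id abc123 "
    "for <user@example.com>; Mon, 1 Jan 2024 00:00:00 +0000"
)


def extract_ids(pattern, header):
    """Return every value captured by the named group "id"."""
    return [m.group("id") for m in pattern.finditer(header)]


if __name__ == "__main__":
    # → ['(mail.example.id [1.2.3.4])', 'abc123']  (two matches)
    print(extract_ids(CURRENT_ID_RE, RECEIVED))
    # → ['(mail.example.id [1.2.3.4])']  (wrong clause only)
    print(extract_ids(AS_TYPED_ID_RE, RECEIVED))
    # → ['abc123']  (extracted correctly, once)
    print(extract_ids(FIXED_ID_RE, RECEIVED))
```

Using a zero-width lookbehind instead of a consumed [^\w] also means the match starts at "id" itself, and a match at the very start of the header is still possible.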
Expected behavior
The ID is extracted correctly and matched only once.
Environment:
- OS: Linux
- Docker: yes
- mail-parser version: 4.1.2