@nobu
Member

@nobu nobu commented Jul 12, 2025

Successive dots are prohibited in RFC5322.

@nobu nobu changed the title from "More checks in email regexp" to "More checks in EMAIL_REGEXP" Jul 12, 2025
@nobu nobu merged commit 0abac72 into ruby:master Jul 12, 2025
26 checks passed
@nobu nobu deleted the more-checks-in-email_regexp branch July 12, 2025 07:07
nobu added a commit to nobu/uri that referenced this pull request Jul 12, 2025
Fix the performance regression at ruby#172 for valid emails.

``` yml
prelude: |
  require 'uri/mailto'
  n = 1000
  re = URI::MailTo::EMAIL_REGEXP
benchmark:
  n.t..t.@docomo.ne.jp: re.match?("n.t..t.@docomo.ne.jp")
  example@example.info: re.match?("example@example.info")
```

|                      |released| 788274b| c5974f0|    this|
|:---------------------|-------:|-------:|-------:|-------:|
|n.t..t.@docomo.ne.jp  |  3.538M|  4.509M|  4.597M|  8.089M|
|                      |       -|   1.27x|   1.30x|   2.29x|
|example@example.info  |  3.627M|  3.461M|  2.622M|  3.610M|
|                      |   1.38x|   1.32x|       -|   1.38x|
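
The YAML above is written for an external benchmark runner. As a rough stand-in, the same two inputs can be timed with Ruby's standard `Benchmark` module. This is only a sketch: `matches_per_second` is a hypothetical helper, not part of uri, and its numbers depend on the machine and on the installed uri version, so they are not comparable to the table above.

```ruby
require 'benchmark'
require 'uri'

# Hypothetical helper (not part of uri): returns how many times per second
# regexp.match?(input) runs, measured with the stdlib Benchmark module.
def matches_per_second(regexp, input, iterations = 100_000)
  seconds = Benchmark.realtime { iterations.times { regexp.match?(input) } }
  iterations / seconds
end

re = URI::MailTo::EMAIL_REGEXP
["n.t..t.@docomo.ne.jp", "example@example.info"].each do |address|
  ips = matches_per_second(re, address)
  puts format("%-22s %8.3fM i/s", address, ips / 1_000_000)
end
```
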
nobu added a commit to nobu/uri that referenced this pull request Jul 12, 2025
Fix the performance regression at ruby#172 for valid emails.

``` yml
prelude: |
  require 'uri/mailto'
  n = 1000
  re = URI::MailTo::EMAIL_REGEXP
benchmark:
  n.t..t.: re.match?("n.t..t.@docomo.ne.jp")
  example: re.match?("example@example.info")
```

|         |released| 788274b| c5974f0|    this|
|:--------|-------:|-------:|-------:|-------:|
|n.t..t.  |  3.795M|  4.864M|  4.993M|  8.739M|
|         |       -|   1.28x|   1.32x|   2.30x|
|example  |  3.911M|  3.740M|  2.838M|  3.880M|
|         |   1.38x|   1.32x|       -|   1.37x|
@osyoyu
Contributor

osyoyu commented Nov 4, 2025

@nobu @hsbt While this change is semantically correct, its impact is rather broad and may cause unintentional breakage. It is standard practice to use EMAIL_REGEXP to test validity on login screens. A user whose email address contains .. will suddenly experience login problems once the service provider updates uri to 1.1.0.

Instead of modifying the original EMAIL_REGEXP constant, can the new regex live under a separate name like RFC5322_EMAIL_REGEXP (just like RFC3986_PARSER), or can a larger announcement be made?


Note: Email addresses like a..a@example.com and a.@example.com have been allowed by a major email provider in the past, and still do exist in the wild. To the best of my knowledge, web services do rely on EMAIL_REGEXP behavior allowing these addresses.
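
For services that do want to reject successive dots, one option that does not depend on the installed uri version is an application-side wrapper around the constant. A minimal sketch, assuming a hypothetical module name `StrictEmail` that is not part of uri:

```ruby
require 'uri'

# Hypothetical application-level helper; the name is illustrative only.
module StrictEmail
  # True when the address matches URI::MailTo::EMAIL_REGEXP and
  # contains no successive dots anywhere in the string.
  def self.valid?(address)
    URI::MailTo::EMAIL_REGEXP.match?(address) && !address.include?("..")
  end
end

puts StrictEmail.valid?("example@example.info")
puts StrictEmail.valid?("a..a@example.com")
```

Keeping the stricter rule in the application leaves the decision about addresses such as a..a@example.com with each service rather than with the library.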

Comment on lines 54 to +55
  # https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
- EMAIL_REGEXP = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
+ EMAIL_REGEXP = /\A(?!\.)(?!.*\.{2})[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
Contributor

@osyoyu osyoyu Nov 4, 2025

As the comment above states, the original regex is mostly drawn from WHATWG HTML LS. This spec states that it intentionally violates RFC 5322 to provide a practical regex for validation.

> This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

Allowing .. is not the only deviation from RFC 5322. If a truly RFC 5322-compliant regexp is needed, I believe it should live under a different name, since it would have to depart too far from the original EMAIL_REGEXP.
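
To make the behavioral difference concrete, the two patterns from the diff in this pull request can be compared side by side. This is a sketch for illustration: both regexps are copied from the diff, while the constant names `OLD_PATTERN` and `NEW_PATTERN` are made up here and do not exist in uri.

```ruby
# Pattern before this pull request (copied from the diff).
OLD_PATTERN = /\A(?!\.)[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

# Pattern after this pull request: adds the (?!.*\.{2}) lookahead,
# which rejects any string containing two successive dots.
NEW_PATTERN = /\A(?!\.)(?!.*\.{2})[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+(?<!\.)@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

%w[example@example.info a..a@example.com a.@example.com].each do |address|
  puts format("%-22s old=%-5s new=%s",
              address, OLD_PATTERN.match?(address), NEW_PATTERN.match?(address))
end
```

With these two patterns, example@example.info matches both, a..a@example.com matches only the old one, and a.@example.com matches neither, because both patterns already contain the `(?<!\.)@` lookbehind.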

@osyoyu
Contributor

osyoyu commented Nov 4, 2025

I have opened #189. Please use it if needed.

@sorah
Member

sorah commented Nov 4, 2025

Reverted at #189
