929. Unique Email Addresses by fhiyo · Pull Request #17 · fhiyo/leetcode

fhiyo · 2024-06-06T07:37:53Z

https://leetcode.com/problems/unique-email-addresses/description/

nodchip · 2024-06-06T14:03:28Z

929_unique-email-addresses.md

+
+### ①
+
+Regular expressions were the first thing that came to mind. Since ReDoS is a concern, should I avoid reaching for regular expressions so readily in Python?


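From a readability standpoint, I think a simple regular expression is fine to use. A complex regular expression, on the other hand, takes time to understand, so I would rather not read one...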

nodchip · 2024-06-06T14:05:58Z

929_unique-email-addresses.md

+            normalized_local_name = re.sub('\.|\+.*', '', local_name)
+            return ''.join([normalized_local_name, '@', domain_name])
+
+        return len(set(map(lambda e: normalize(e), emails)))


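It felt like I was deciphering a puzzle. I did not find it very readable. I think a simple loop would be easier to read.

unique_emails = set()
for email in emails:
    unique_emails.add(normalize(email))
return len(unique_emails)

Hmm, is that so...? To me it did not feel complex enough to call a puzzle. Of course, if that is a widely shared feeling, I would like to fix it.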

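I think it is within the acceptable range. Personally, I feel I can understand len(set(map(normalize, emails))) faster than the for loop.

lambda e: normalize(e) -> normalize
Is that what you mean?

It is not quite a puzzle, but there should be various options, such as introducing an intermediate variable. Did you consciously choose this one among them? Also, strictly speaking, you cannot tell whether it is a syntax error unless you know how the parentheses pair up. Can you say at a glance that it is not a syntax error?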

lambda e: normalize(e) -> normalize

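Yes.

There should be various options, such as introducing an intermediate variable. Did you consciously choose this one among them?

Not exactly "various", but I did weigh it against the version nodchip suggested. I do not have a sense that either one is better, so it was not a clear-cut choice.

Strictly speaking, you cannot tell whether it is a syntax error unless you know how the parentheses pair up. Can you say at a glance that it is not a syntax error?

At a glance, no, I cannot. But is this something everyone pays attention to...? I never had the instinct to worry about it, so I would like to know whether I should. I normally rely on my editor to show whether the brackets match, and I assumed many people do the same.

I do understand that once it grows to something like foo(bar(baz(qux(quux(...))))), there are more cases where an intermediate variable makes it clearer.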

longestCommonPrefix = foldl1 (((.) . (.)) (map fst . takeWhile (uncurry (==))) zip)

I once posted a point-free solution like this. With 11 operators or functions, it is already quite hard to read.

return len(set(map(lambda e: normalize(e), emails)))

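This one has 9.
I think that once you write a lambda, you should pause for a breath.

normalized = map(lambda e: normalize(e), emails)
return len(set(normalized))

On the other hand,

return len(set(map(normalize, emails)))

I think this one is within the acceptable range, but

unique = set(map(normalize, emails))
return len(unique)

even here, pausing for a breath seems to make the intent of building a set clearer.

Psychology and marketing talk about the "magical number": the limit of short-term memory is said to be around 7 or 4 items, so it is best to assume a reader overflows at about that many.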
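
For reference, the three shapes discussed in this thread can be placed side by side. This is only a minimal sketch: `normalize` here is a simplified stand-in that assumes exactly one '@' per address, and the `count_*` function names are made up for illustration and do not appear in the PR.

```python
from typing import List


def normalize(email: str) -> str:
    # Simplified stand-in for the PR's normalize(); assumes exactly one '@'.
    local_name, domain_name = email.split('@')
    local_name = local_name.split('+')[0].replace('.', '')
    return local_name + '@' + domain_name


def count_with_loop(emails: List[str]) -> int:
    # Explicit loop, as first suggested in the review.
    unique_emails = set()
    for email in emails:
        unique_emails.add(normalize(email))
    return len(unique_emails)


def count_nested(emails: List[str]) -> int:
    # Nested calls, passing normalize directly instead of a lambda.
    return len(set(map(normalize, emails)))


def count_with_intermediate(emails: List[str]) -> int:
    # One intermediate name that states why a set is built.
    unique = set(map(normalize, emails))
    return len(unique)
```

All three return the same count; the difference is only in how many nested operations the reader has to hold in mind at once.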

nodchip · 2024-06-06T14:07:38Z

929_unique-email-addresses.md

+class Solution:
+    def numUniqueEmails(self, emails: List[str]) -> int:
+        def normalize(email: str) -> str:
+            g = re.match('(.*)@(.*)', email)


If all this does is split on @, I think split() is more readable.

local_name, domain_name = email.split('@')
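
As a rough comparison of the two approaches, the sketch below runs both on one sample address. The address is made up, and the split() version assumes the input contains exactly one '@'.

```python
import re

email = 'alice@example.com'  # hypothetical sample input

# Regex version from the diff: two greedy groups around '@'.
# Because the first group is greedy, it splits at the last '@'.
g = re.match('(.*)@(.*)', email)
regex_parts = (g[1], g[2])

# split() version suggested in the review; assumes exactly one '@'.
local_name, domain_name = email.split('@')
split_parts = (local_name, domain_name)
```

For a well-formed address with a single '@', both give the same pair.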

nodchip · 2024-06-06T14:10:33Z

929_unique-email-addresses.md

+            g = re.match('(.*)@(.*)', email)
+            local_name = g[1]
+            domain_name = g[2]
+            normalized_local_name = re.sub('\.|\+.*', '', local_name)


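I understand that this means "replace each . and everything from + onward with an empty string", but the implementation felt somewhat complex for what it is trying to do.

normalized_local_name = local_name.split('+')[0]
normalized_local_name = normalized_local_name.replace('.', '')

I think writing it plainly like this would be easier to read.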
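
To check that the plain version behaves like the regex, here is a small sketch with both wrapped in functions. The function names are hypothetical. The regex is written as a raw string, since '\.' and '\+' in a normal string literal are invalid escape sequences that newer Python versions warn about.

```python
import re


def normalize_local_regex(local_name: str) -> str:
    # Remove every '.' and everything from the first '+' to the end.
    return re.sub(r'\.|\+.*', '', local_name)


def normalize_local_plain(local_name: str) -> str:
    # Same effect with two plain string operations.
    normalized_local_name = local_name.split('+')[0]
    return normalized_local_name.replace('.', '')
```

On inputs without newlines the two should agree, because `.*` stops at a newline while split() and replace() do not care about one.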

nodchip · 2024-06-06T14:11:15Z

929_unique-email-addresses.md

+            local_name = g[1]
+            domain_name = g[2]
+            normalized_local_name = re.sub('\.|\+.*', '', local_name)
+            return ''.join([normalized_local_name, '@', domain_name])


For string concatenation this small, I think using '+' is more readable.

normalized_local_name + '@' + domain_name

There are also f-strings.
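
The three ways of building the result mentioned here, shown together as a minimal sketch with made-up sample values:

```python
normalized_local_name = 'testemail'  # sample value
domain_name = 'leetcode.com'         # sample value

# Version in the diff: join a list of parts.
joined = ''.join([normalized_local_name, '@', domain_name])

# Suggested in the review: plain concatenation.
concatenated = normalized_local_name + '@' + domain_name

# f-string alternative.
formatted = f'{normalized_local_name}@{domain_name}'
```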

nodchip · 2024-06-06T14:12:29Z

929_unique-email-addresses.md

+class Solution:
+    def numUniqueEmails(self, emails: List[str]) -> int:
+        def normalize(email: str) -> str:
+            local_name, domain_name = email.rsplit('@', maxsplit=1)


I find this version easier to read.

nodchip · 2024-06-06T14:13:23Z

929_unique-email-addresses.md

+        def normalize(email: str) -> str:
+            local_name, domain_name = email.rsplit('@', maxsplit=1)
+            normalized_local_name = []
+            for ch in local_name:


Even written as a loop, it does not seem to have become much more readable.

More readable compared to what?

nodchip · 2024-06-06T14:14:38Z

929_unique-email-addresses.md

+        def normalize(email: str) -> str:
+            local_name, domain_name = email.rsplit('@', maxsplit=1)
+            local_name = local_name.replace('.', '')
+            plus_position = local_name.find('+')


Personally I find local_name = local_name.split('+')[0] more readable, though opinions may differ.

Personally I find local_name = local_name.split('+')[0] more readable

I also feel the version you suggested is more readable.
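
A sketch of the two ways to drop everything from '+' onward. The diff excerpt only shows the `find('+')` call, so the `if` and slice that follow it are an assumed continuation, and both function names are hypothetical.

```python
def strip_plus_with_find(local_name: str) -> str:
    # Assumed continuation of the diff: slice only when '+' was found,
    # because find() returns -1 when there is no match.
    plus_position = local_name.find('+')
    if plus_position != -1:
        local_name = local_name[:plus_position]
    return local_name


def strip_plus_with_split(local_name: str) -> str:
    # split() always returns at least one element, so [0] is safe
    # even when there is no '+'.
    return local_name.split('+')[0]
```

The split() version needs no special case for a missing '+', which may be part of why it reads more easily.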

TORUS0818 · 2024-06-11T05:35:49Z

929_unique-email-addresses.md

+                    local_part = False
+                    ignored_local_part = False
+                    yield ch
+                    continue


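Is this continue necessary? (Same question for the yield above.)

Strictly speaking, it could be dropped.
My intent was to have, from the top, a suite that handles the domain, a suite that handles '@', and the remainder for the local_part. I wrote it this way so that, in terms of cognitive load, the reader does not have to look further down (a comment saying so would have been good).

I can well understand the view that it is unnecessary, so I do not know what the right answer is... I suppose it is case by case depending on the trade-offs (and if there is a better way to write this in the first place, I would love to hear it).

@TORUS0818, I suspect you may be confusing this with an early return. Without the continue, would the same character not be yielded twice?

I think splitting this into multiple loops, for the local part, the ignored local part, and the domain part, would be clearer.

Same question for the yield above

So I took it as a claim that both lines together are unnecessary.

@liquo-rice
Indeed, my understanding here is shaky. I will go read the documentation.

Would the last character come out twice?

I read it as a comment on lines 105-106 and 110-111.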

if not local_part:
    yield ch
    continue  # unnecessary?
if ch == '@':
    local_part = False
    ignored_local_part = False
    yield ch
    continue  # unnecessary?

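@liquo-rice Indeed, my understanding here is shaky. I will go read the documentation.

Would the last character come out twice?

Without the continue, ch can be yielded a second time by the yield on line 117.

@liquo-rice
Understood.
In this case it happens to pass, but without the continue, everything from @ onward (except .) comes out duplicated.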
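
The duplication can be reproduced with a small generator. This is a hypothetical reconstruction: the PR's full generator is not shown in this thread, so the last three branches are assumptions, and the `use_continue` flag exists only to toggle the behavior being discussed.

```python
from typing import Iterator


def normalize_chars(email: str, use_continue: bool = True) -> Iterator[str]:
    # Hypothetical reconstruction of the generator under review.
    local_part = True
    ignored_local_part = False
    for ch in email:
        if not local_part:
            # Domain part: emit every character as-is.
            yield ch
            if use_continue:
                continue
        if ch == '@':
            local_part = False
            ignored_local_part = False
            yield ch
            if use_continue:
                continue
        # Assumed local-part handling: skip '.', ignore from '+' onward.
        if ignored_local_part or ch == '.':
            continue
        if ch == '+':
            ignored_local_part = True
            continue
        yield ch


with_continue = ''.join(normalize_chars('a.b+c@x.com'))
without_continue = ''.join(normalize_chars('a.b+c@x.com', use_continue=False))
```

Under these assumptions, dropping the continue lets '@' and each domain character other than '.' fall through to the final yield, so they are emitted twice. Since the doubling is applied consistently to every address, the number of distinct results can stay the same, which is presumably why the solution "happens to pass".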

I think splitting this into multiple loops, for the local part, the ignored local part, and the domain part, would be clearer.

from typing import Generator, List

class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        def normalize(email: str) -> Generator[str, None, None]:
            i = 0
            while not (email[i] == '@' or email[i] == '+'):
                if email[i] != '.':
                    yield email[i]
                i += 1
            while email[i] != '@':
                # ignore the substring between '+' and '@' ('@' is excluded)
                i += 1
            while i < len(email):
                yield email[i]
                i += 1

        return len(set(map(lambda e: ''.join(normalize(e)), emails)))

I gave it a try (this assumes the local part contains no '@'). This version feels more straightforward. Thank you.

TORUS0818 · 2024-06-11T07:04:52Z

929_unique-email-addresses.md

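+Split with rsplit, then delete with replace and find. I recalled a discussion on Discord about the case where the local_part contains @, so I searched from the right just in case (confirmed that it is at https://discord.com/channels/1084280443945353267/1201211204547383386/1209856265413591140).
+As far as I can tell from [RFC5322](https://datatracker.ietf.org/doc/html/rfc5322#section-3.4.1) and [RFC1034](https://www.ietf.org/rfc/rfc1034.txt), the domain side probably does not contain @, I think.
+
+If the input is invalid and contains no @, the line `local_name, domain_name = email.rsplit('@', maxsplit=1)` fails to unpack and raises an error. That is probably acceptable.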


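Does this actually raise the error you expect?

local_name, domain_name = email.split('@')

With this, I think it would fail to unpack and raise ValueError.

When I try it in my environment, it does raise an error. Did it not for you?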

$ ipython
Python 3.11.2 (main, May 3 2023, 18:53:30) [Clang 14.0.0 (clang-1400.0.29.102)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: a, b = 'hoge'.rsplit('@', maxsplit=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 1
----> 1 a, b = 'hoge'.rsplit('@', maxsplit=1)

ValueError: not enough values to unpack (expected 2, got 1)

Oh, sorry.

I misread it as "when the input invalidly contains @".

I mistakenly thought the intended behavior was to fail to unpack and raise an error when there are multiple "@".

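The two cases that got mixed up in this exchange (no '@' versus several '@') can be checked directly. A minimal sketch; the helper names and the sample strings other than 'hoge' are made up for illustration.

```python
from typing import Callable, Tuple


def split_from_right(email: str) -> Tuple[str, str]:
    # At most one split, starting from the right-most '@'.
    local_name, domain_name = email.rsplit('@', maxsplit=1)
    return local_name, domain_name


def split_all(email: str) -> Tuple[str, str]:
    # Splits on every '@'; unpacking needs exactly two parts.
    local_name, domain_name = email.split('@')
    return local_name, domain_name


def raises_value_error(func: Callable[[str], Tuple[str, str]], email: str) -> bool:
    # Helper that reports whether unpacking failed with ValueError.
    try:
        func(email)
    except ValueError:
        return True
    return False
```

With no '@' both versions fail to unpack. With more than one '@', only the plain split() version fails, while rsplit with maxsplit=1 keeps the extra '@' in the local part.
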
929_unique-email-addresses

718d1da

nodchip reviewed Jun 6, 2024


929_unique-email-addresses 4th

c7aea26

TORUS0818 reviewed Jun 11, 2024


oda mentioned this pull request Jul 24, 2024

46. Permutations #50

Open

colorbox mentioned this pull request Nov 15, 2024

929. Unique Email Addresses colorbox/leetcode#28

Merged

oda mentioned this pull request Nov 27, 2024

347. Top K Frequent Elements.md katataku/leetcode#9

Open



5 participants
