-
Notifications
You must be signed in to change notification settings - Fork 0
929. Unique Email Addresses #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,186 @@ | ||
| # 929. Unique Email Addresses | ||
|
|
||
| ## 1st | ||
|
|
||
| ### ① | ||
|
|
||
| 正規表現が最初に思いついた。ReDOSが怖いからPythonで正規表現を簡単に選択肢に入れない方がいいのだろうか? | ||
|
|
||
| 答えを返す部分のlambdaは不要で、単に `map(normalize, emails)` で良かった。 | ||
|
|
||
| 所要時間: 15:35 | ||
|
|
||
| 正規表現の計算量、よく分からない。normalize一回あたりで文字列の長さをnとして考えると、 | ||
| - 時間計算量: 最悪指数時間?これはバックトラッキングで指数時間かかる場合がある?無いとすると、O(n) | ||
| - 空間計算量: O(n)か? | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| g = re.match('(.*)@(.*)', email) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| local_name = g[1] | ||
| domain_name = g[2] | ||
| normalized_local_name = re.sub('\.|\+.*', '', local_name) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 「 と平易な書き方をした方が読みやすいと思います。 |
||
| return ''.join([normalized_local_name, '@', domain_name]) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. このくらいの文字列の連結であれば、 '+' を用いたほうが読みやすいと思います。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. あと、f string というのもありますね。 |
||
|
|
||
| return len(set(map(lambda e: normalize(e), emails))) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. パズルを読み解いているような気持ちになりました。あまり読みやすいとは思えませんでした。単純なループで書いたほうが読みやすいと思います。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. うーんそうなんですかね...?自分はパズルと言うほど複雑には感じなかったです。もちろん共通の感覚としてあるなら修正したいですが。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 許容範囲な気がします。私は、 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lambda e: normalize(e) -> normalize パズルというほどではないですが、中間変数を作る選択肢なども様々あるはずで、その中でこれを意識的に選んでいますか。あと、本当はカッコの対応が分かっていないと、シンタックスエラーか分からないはずですが、ひと目見て、シンタックスエラーでないと言えますか。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
そうです。
様々というほどではないですが、nodchipsさんにご提案いただいたものと比べて選択はしました。どちらがより良いという感覚は無いので明確なものではないんですが。
ひと目で分かるかというと分からないですね。ただこれって皆さん意識されてるんですか...?そこを気にする感覚が自分には無かったので、意識しないといけないのか気になります。普段は括弧の対応が正しいかはエディタで分かるようにしていて、そういう人も多いと思っていたので。 foo(bar(baz(qux(quux(...))))) と多くなってくると、中間変数を作った方が分かりやすくなる場合も増えるのは分かるんですが。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. longestCommonPrefix = foldl1 (((.) . (.)) (map fst . takeWhile (uncurry (==))) zip)昔こういうポイントフリーを投稿しましたが、これは11個の演算子または関数で十分厳しいですね。 return len(set(map(lambda e: normalize(e), emails)))これ9つです。 normalized = map(lambda e: normalize(e), emails)
return len(set(normalized))There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 一方 return len(set(map(normalize, emails)))こっちは許容範囲と思いますが、 unique = set(map(normalize, emails))
return len(unique)一呼吸いれても set にした意図が明確になるように思います。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 心理学やマーケティングだと、マジカルナンバーといって、短期記憶の限界は7つか4つかというんで、それくらいであふれると思ったほうがいいと思います。 |
||
| ``` | ||
|
|
||
| ### ② | ||
|
|
||
| rsplitで分けてreplace, findで削除。discord上でlocal_partの中で@が含まれる場合について話していた記憶があったので、一応右から検索した (https://discord.com/channels/1084280443945353267/1201211204547383386/1209856265413591140 にあることを確認)。 | ||
| [RFC5322](https://datatracker.ietf.org/doc/html/rfc5322#section-3.4.1)と[RFC1034](https://www.ietf.org/rfc/rfc1034.txt)を見る限り、domain側に@は多分含まない、と思う。 | ||
|
|
||
| 入力が不正で@が含まれない場合、 `local_name, domain_name = email.rsplit('@', maxsplit=1)` の行でunpackできずにエラーになる。まあいいんじゃないだろうか | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ここ想定されているエラーが出ますか? local_name, domain_name = email.split('@')だとunpack出来ずにValueErrorを吐くと思いますが。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 自分の環境で試すとエラー出ますね。出なかったですか? $ ipython
Python 3.11.2 (main, May 3 2023, 18:53:30) [Clang 14.0.0 (clang-1400.0.29.102)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: a, b = 'hoge'.rsplit('@', maxsplit=1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 1
----> 1 a, b = 'hoge'.rsplit('@', maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. あ、ごめんなさい。 ”入力で不正に@が含まれる場合” "@"が複数入ってた場合にunpack出来ずにエラー吐く仕様なのかと勘違いしました。。 |
||
|
|
||
| 所要時間: 7:00 | ||
|
|
||
| normalize一回あたり、文字列の長さをnとして、 | ||
| - 時間計算量: O(n) | ||
| - 空間計算量: O(n) | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| local_name, domain_name = email.rsplit('@', maxsplit=1) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. こちらの方が読みやすく感じます。 |
||
| local_name = local_name.replace('.', '') | ||
| plus_position = local_name.find('+') | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 個人的には
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
自分もご提案いただいた方が読みやすい気がします。 |
||
| if plus_position != -1: | ||
| local_name = local_name[:plus_position] | ||
| return ''.join([local_name, '@', domain_name]) | ||
|
|
||
| return len(set(map(normalize, emails))) | ||
| ``` | ||
|
|
||
| ### ③ | ||
|
|
||
| local_nameを一文字ずつ進めるパターン。listと文字列を両方normalized_local_nameという同じ変数に入れているのはあまり行儀が良くないかもしれない。文字列もリストのようなものだしまあいいか、と考えた | ||
|
|
||
| 所要時間: 5:02 | ||
|
|
||
| normalize一回あたり、文字列の長さをnとして、 | ||
| - 時間計算量: O(n) | ||
| - 空間計算量: O(n) | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| local_name, domain_name = email.rsplit('@', maxsplit=1) | ||
| normalized_local_name = [] | ||
| for ch in local_name: | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ループで書いても、あまり読みやすくはなっていないように感じました。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 何に対して読みやすくなっていないのでしょうか? |
||
| if ch == '+': | ||
| break | ||
| if ch == '.': | ||
| continue | ||
| normalized_local_name.append(ch) | ||
| normalized_local_name = ''.join(normalized_local_name) | ||
| return ''.join([normalized_local_name, '@', domain_name]) | ||
|
|
||
| return len(set(map(normalize, emails))) | ||
| ``` | ||
|
|
||
|
|
||
| ## 2nd | ||
|
|
||
| ### 参考 | ||
|
|
||
| - https://discord.com/channels/1084280443945353267/1200089668901937312/1207717581390217236 | ||
| - https://discord.com/channels/1084280443945353267/1200089668901937312/1209166861821026356 | ||
| RFCへのリンクが書いてある。 | ||
| - https://discord.com/channels/1084280443945353267/1200089668901937312/1210065979300651048 | ||
|
|
||
| splitしないでindexを進めるパターンがあったのでそれもやる。 | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> Generator[str, None, None]: | ||
| local_part = True | ||
| ignored_local_part = False | ||
| for ch in email: | ||
| if not local_part: | ||
| yield ch | ||
| continue | ||
| if ch == '@': | ||
| local_part = False | ||
| ignored_local_part = False | ||
| yield ch | ||
| continue | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. このcontinue必要でしょうか(上のyieldも同様)
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. いらないといえばいらないですね。 不要では?と考えるのもよく分かるので、どうするのが正解なのかは分からないです...それぞれのトレードオフを踏まえたケースバイケースだろうとは思いますが (そもそももっと良い書き方あればぜひ)。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TORUS0818 さんは、early returnと混同しているような気がします。continueがないと同じ文字が2回yieldされませんか? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. local part, ignored local part, domain partと複数のループに分ける方が、分かりやすいと思います。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
なので、2行まとめて不要じゃないかという主張かなと There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @liquo-rice さん 最後の文字が2回出てきちゃう感じでしょうか? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 私は105-106, 110-111行目に対する指摘だと読みました。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
continueしないと、117行目のyieldでもう一度chが出る可能性があります。 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @liquo-rice
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
class Solution:
def numUniqueEmails(self, emails: List[str]) -> int:
def normalize(email: str) -> Generator[str, None, None]:
i = 0
while not (email[i] == '@' or email[i] == '+'):
if email[i] != '.':
yield email[i]
i += 1
while email[i] != '@':
# ignore the substring between '+' and '@' ('@' is excluded)
i += 1
while i < len(email):
yield email[i]
i += 1
return len(set(map(lambda e: ''.join(normalize(e)), emails)))書いてみました (local partに'@'は無い前提にしている)。こちらの方が素直な気がします、ありがとうございます |
||
| if ignored_local_part or ch == '.': | ||
| continue | ||
| if ch == '+': | ||
| ignored_local_part = True | ||
| continue | ||
| yield ch | ||
|
|
||
| return len(set(map(lambda e: ''.join(normalize(e)), emails))) | ||
| ``` | ||
|
|
||
| yieldする形にした。2回書いて整ってきたものを載せている。 | ||
|
|
||
|
|
||
| - https://discord.com/channels/1084280443945353267/1200089668901937312/1210619083385479258 | ||
| 文字列追加 += が最適化される話と、環境に対して頑健か、という話。 | ||
|
|
||
| ## 3rd | ||
|
|
||
| joinしているところはf-stringで書いてもよかったかもと後で思った。 | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| local_part, domain = email.rsplit('@', maxsplit=1) | ||
| local_part = re.sub('\.|\+.*', '', local_part) | ||
| return ''.join([local_part, '@', domain]) | ||
|
|
||
| return len(set(map(normalize, emails))) | ||
| ``` | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| local_part, domain = email.rsplit('@', maxsplit=1) | ||
| local_part = local_part.replace('.', '') | ||
| plus_position = local_part.find('+') | ||
| if plus_position != -1: | ||
| local_part = local_part[:plus_position] | ||
| return ''.join([local_part, '@', domain]) | ||
|
|
||
| return len(set(map(normalize, emails))) | ||
| ``` | ||
|
|
||
| ## 4th | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| local_part, domain = email.rsplit('@', maxsplit=1) | ||
| local_part = local_part.split('+')[0] | ||
| local_part = local_part.replace('.', '') | ||
| return local_part + '@' + domain | ||
|
|
||
| normalized_emails = set() | ||
| for email in emails: | ||
| normalized_emails.add(normalize(email)) | ||
| return len(normalized_emails) | ||
| ``` | ||
|
|
||
| ①のように正規表現を使ってもそれを使う意味が薄い気がする。少し複雑にはなるがこんな感じだろうか。単体テストを厚めにしないとこの程度でも嫌な感じはある | ||
|
|
||
| ```py | ||
| class Solution: | ||
| def numUniqueEmails(self, emails: List[str]) -> int: | ||
| def normalize(email: str) -> str: | ||
| g = re.match(r'([^+]+)(?:\+.*)?@(.+)', email) | ||
| normalized_local_part = g[1].replace('.', '') | ||
| domain = g[2] | ||
| return f'{normalized_local_part}@{domain}' | ||
|
|
||
| return len(set(map(normalize, emails))) | ||
| ``` | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
読みやすさの観点からは、単純な正規表現であれば使っても良いと思います。一方、複雑な正規表現は、理解に時間がかかるため、あまり読みたくありません…。