139. Word Break #37
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| # 139. Word Break | ||
|
|
||
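| sol1.py: exceeds the time limit. I tried several variants besides this code, but scanning all of wordDict and stripping the matched prefix does not work. | ||
| Defining a recursive function and memoizing it worked. | ||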
|
|
||
| https://discord.com/channels/1084280443945353267/1200089668901937312/1222092873508323368 | ||
|
|
||
| > Start at character 0 and accept if position len(s) can be reached. | ||
|
|
||
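| So it can be solved with DFS after all. The part of the string already stripped can be tracked as an index instead of as a new string. | ||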
|
|
||
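| > My first thought was that this problem can be written as a regular expression, so it should be solvable in O(n). | ||
| > ((apple)|(pen))* | ||
| > Next, as a typical case for that situation, | ||
| > "a" * 51 | ||
| > cannot be | ||
| > expressed with "a" * 2 and "a" * 4, so I expect naive backtracking to fail. | ||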
|
|
||
| "It can be written as a regex, so O(n)" presumably holds only when wordDict is treated as a constant. Still, I want to keep this way of thinking in mind. | ||
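To make the backtracking concern in the quote concrete, the following sketch counts how often the recursive check runs with and without memoization on the quoted input. The helper names `count_calls_naive` and `count_calls_memoized` are hypothetical and do not appear in this PR; the word list `["aa", "aaaa"]` is taken from the quote.

```python
import functools
from typing import List, Tuple


def count_calls_naive(s: str, words: List[str]) -> Tuple[bool, int]:
    """Plain backtracking: the same start index may be explored many times."""
    calls = 0

    def can_break(start: int) -> bool:
        nonlocal calls
        calls += 1
        if start == len(s):
            return True
        return any(
            s.startswith(word, start) and can_break(start + len(word))
            for word in words
        )

    return can_break(0), calls


def count_calls_memoized(s: str, words: List[str]) -> Tuple[bool, int]:
    """Same recursion, but each start index is evaluated at most once."""
    calls = 0

    @functools.cache
    def can_break(start: int) -> bool:
        nonlocal calls
        calls += 1
        if start == len(s):
            return True
        return any(
            s.startswith(word, start) and can_break(start + len(word))
            for word in words
        )

    return can_break(0), calls


if __name__ == "__main__":
    text = "a" * 51
    print(count_calls_naive(text, ["aa", "aaaa"]))
    print(count_calls_memoized(text, ["aa", "aaaa"]))
```

Both versions return False for this input; the difference is only in how many times the body runs, which is what makes the unmemoized version time out on larger inputs.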
|
|
||
| > So I think DP from the front is probably the "model answer". | ||
|
Reviewer comment: I have a feeling the Viterbi algorithm is in the background here.
Author reply: We are not looking for a maximum probability, so I am somewhat unsure, but running a DP whose state is the index may indeed be similar. |
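A minimal sketch of the "DP from the front" idea in the quote, where the state is just the index into `s`. The function name `word_break_dp` is hypothetical; this is one possible reading of the quote, not the code submitted in this PR.

```python
from typing import List


def word_break_dp(s: str, word_dict: List[str]) -> bool:
    # can_reach[i] is True when s[:i] can be written as a concatenation of words.
    can_reach = [False] * (len(s) + 1)
    can_reach[0] = True  # the empty prefix is always reachable
    for start in range(len(s)):
        if not can_reach[start]:
            continue
        for word in word_dict:
            # startswith(word, start) is False when the word would run past the end.
            if s.startswith(word, start):
                can_reach[start + len(word)] = True
    return can_reach[len(s)]
```

Each index is processed once and tries every word, which matches the O(nml) estimate given for the memoized recursion.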
||
|
|
||
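| > For example, prepare a priority queue and say that when it contains a number N, this means "the substring from the beginning up to character N can be expressed as a concatenation of words in wordDict". | ||
| > The initial value is [0] (everything up to character 0 can be expressed). | ||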
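The priority-queue formulation in the quote could be sketched as follows, using `heapq` as the queue. The function name `word_break_heap` and the `seen` set are assumptions added for illustration; the quote itself only describes the queue and its initial value.

```python
import heapq
from typing import List


def word_break_heap(s: str, word_dict: List[str]) -> bool:
    # A number N in the heap means s[:N] can be built from words in word_dict.
    reachable = [0]
    seen = {0}  # assumption: avoid pushing the same position twice
    while reachable:
        position = heapq.heappop(reachable)  # always expand the smallest N first
        if position == len(s):
            return True
        for word in word_dict:
            next_position = position + len(word)
            if next_position in seen:
                continue
            if s.startswith(word, position):
                seen.add(next_position)
                heapq.heappush(reachable, next_position)
    return False
```

Popping the smallest position first makes the traversal proceed left to right, so it visits positions in the same order as the front-to-back DP.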
|
|
||
|
|
||
|
|
||
| https://github.com/garunitule/coding_practice/pull/39 | ||
|
|
||
| An implementation using a trie | ||
|
|
||
| dataclass | ||
| https://docs.python.org/ja/3.10/library/dataclasses.html | ||
|
|
||
| https://docs.python.org/3.10/library/dataclasses.html#dataclasses.field | ||
|
|
||
| > default_factory: If provided, it must be a zero-argument callable that will be called when a default value is needed for this field. Among other purposes, this can be used to specify fields with mutable default values, as discussed below. It is an error to specify both default and default_factory. | ||
|
|
||
| > This has the same issue as the original example using class C. That is, two instances of class D that do not specify a value for x when creating a class instance will share the same copy of x. Because dataclasses just use normal Python class creation they also share this behavior. There is no general way for Data Classes to detect this condition. Instead, the dataclass() decorator will raise a TypeError if it detects a default parameter of type list, dict, or set. This is a partial solution, but it does protect against many common errors. | ||
|
|
||
| > Using default factory functions is a way to create new instances of mutable types as default values for fields: | ||
|
|
||
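| In other words, if a mutable object is placed in a class variable, or written directly as the default value of a dataclass field, it can end up shared between instances. dataclass follows normal Python class creation, so the behavior is basically the same, but when a list/dict/set is given directly as a default it raises an error to reduce accidents (the quoted 3.10 docs say TypeError; CPython raises ValueError in practice). With default_factory, a new mutable object is created every time an instance is created, so each instance gets its own. | ||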
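A small sketch of the behavior summarized above. The class names `Bucket` and `Bad` are hypothetical; the `except` clause accepts both TypeError and ValueError because the quoted documentation and the runtime disagree on which one is raised.

```python
import dataclasses
from typing import List


@dataclasses.dataclass
class Bucket:
    # default_factory is called once per instance, so each Bucket gets its own list.
    items: List[int] = dataclasses.field(default_factory=list)


def direct_mutable_default_is_rejected() -> bool:
    """Return True if @dataclass refuses a list written directly as a default."""
    try:

        @dataclasses.dataclass
        class Bad:
            items: List[int] = []

    except (TypeError, ValueError):
        return True
    return False


first = Bucket()
second = Bucket()
first.items.append(1)  # does not affect second.items
```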
|
|
||
|
|
||
| https://github.com/mamo3gr/arai60/blob/139_word-break/139_word-break/step3_tuple_words.py | ||
|
|
||
| > str.startswith can accept a tuple[str] | ||
|
|
||
| I did not know that. It is intuitive and easy to read. | ||
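A short illustration of the tuple form of `str.startswith`. The helper `starts_with_any` is a hypothetical name; note that the argument must be a tuple, so a list is converted first.

```python
from typing import List


def starts_with_any(s: str, words: List[str], start: int = 0) -> bool:
    # str.startswith accepts a str or a tuple of str, not a list.
    return s.startswith(tuple(words), start)


def list_argument_raises() -> bool:
    """Return True if passing a list directly raises TypeError."""
    try:
        "apple".startswith(["apple", "pen"])  # type: ignore[arg-type]
    except TypeError:
        return True
    return False
```

The optional `start` argument works with the tuple form as well, so the index-based recursion can test all words in one call; it does not report which word matched, though.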
|
|
||
|
|
||
| ## Complexity | ||
| - Let n = len(s), m = len(wordDict), l = max(len(word) for word in wordDict). | ||
|
|
||
| ### sol1.py | ||
| - Time O(nml): can_break is memoized, so its body runs for at most O(n) distinct positions; each run compares against every word in wordDict, which costs O(ml). | ||
|
Reviewer comment: If you estimate a rough concrete running time from the complexity, you may be able to notice a TLE before running the code.
Author reply: Thank you, that was helpful. I want to get into the habit of estimating it myself. |
||
| - Space O(n): recursion stack and memo table. | ||
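As a rough illustration of turning the complexity into a time estimate, the sketch below plugs in upper bounds for n, m and l. The bounds (300, 1000, 20) are assumed from the LeetCode 139 constraints and are not stated in these notes; the steps-per-second figure is only a rule of thumb for CPython, not a measurement.

```python
# Assumed upper bounds from the LeetCode 139 constraints (not stated in these notes).
MAX_S_LEN = 300        # n
MAX_WORDS = 1000       # m
MAX_WORD_LEN = 20      # l

# Assumption: order-of-magnitude throughput for simple CPython operations.
ASSUMED_STEPS_PER_SECOND = 10**7


def estimate_seconds(steps: int, steps_per_second: int = ASSUMED_STEPS_PER_SECOND) -> float:
    """Convert an operation count into a very rough wall-clock estimate."""
    return steps / steps_per_second


# O(nml) for the memoized recursion.
memoized_steps = MAX_S_LEN * MAX_WORDS * MAX_WORD_LEN
memoized_seconds = estimate_seconds(memoized_steps)
```

Under these assumptions the memoized version is on the order of millions of steps, whereas the unmemoized recursion can grow exponentially in n, which is consistent with it hitting the time limit.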
|
|
||
| ### sol2.py | ||
| - Time O(nml): because visited is used, each position is visited at most once, O(n); each visit scans all of wordDict, O(m), with an O(l) string comparison per word. | ||
| - Space O(n): visited and frontier. | ||
|
|
||
| ### sol3.py | ||
| - Time O((m+n)l): building the trie is O(ml); the search in the case where every can_reach entry becomes True is O(nl). | ||
| - Space O(ml+n): trie O(ml), can_reach O(n). | ||
|
|
||
| sol2.py has the larger time complexity because it performs a string comparison against every word at each position. | ||
|
|
||
| I did not come up with sol2 or sol3 on my own, so I want to redo them later. | ||
|
Reviewer comment: nit |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| import functools | ||
| from typing import List | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         len_target = len(s) | ||
|
|
||
|         @functools.cache | ||
|         def can_break(i) -> bool: | ||
|
Author reply: Since this caught the eye of both reviewers, I take it that this is not good code. |
||
|             if i == len_target: | ||
|                 return True | ||
|             for word in wordDict: | ||
|                 if s.startswith(word, i) and can_break(i + len(word)): | ||
|                     return True | ||
|             return False | ||
|
|
||
|         return can_break(0) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| from typing import List | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         if not s: | ||
|             return True | ||
|         stripped_sub_strs = [] | ||
|         for word in wordDict: | ||
|             if s.startswith(word): | ||
|                 stripped_sub_strs.append(s[len(word) :]) | ||
|         return any( | ||
|             [ | ||
|                 self.wordBreak(stripped_sub_str, wordDict) | ||
|                 for stripped_sub_str in stripped_sub_strs | ||
|             ] | ||
|         ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| from typing import List | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         if not s: | ||
|             return True | ||
|         sub_strings = [] | ||
|         for word in wordDict: | ||
|             if s.startswith(word): | ||
|                 sub_strings.append(s[len(word) :]) | ||
|         return any( | ||
|             [ | ||
|                 self.wordBreak(stripped_sub_str, wordDict) | ||
|                 for stripped_sub_str in sub_strings | ||
|             ] | ||
|         ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| import functools | ||
| from typing import List | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         len_target = len(s) | ||
|
|
||
|         @functools.cache | ||
|         def can_break(start_pos: int) -> bool: | ||
|             """returns whether s[start_pos:] can be broken.""" | ||
|             if start_pos == len_target: | ||
|                 return True | ||
|             for word in wordDict: | ||
|                 if s.startswith(word, start_pos) and can_break(start_pos + len(word)): | ||
|                     return True | ||
|             return False | ||
|
|
||
|         return can_break(0) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| from typing import List | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         frontier = [0]  # stack of reachable start positions (DFS) | ||
|         visited = {0} | ||
|         while frontier: | ||
|             start_position = frontier.pop() | ||
|             if start_position == len(s): | ||
|                 return True | ||
|             for word in wordDict: | ||
|                 new_position = start_position + len(word) | ||
|                 if new_position in visited: | ||
|                     continue | ||
|                 if s[start_position:new_position] != word: | ||
|                     continue | ||
|                 frontier.append(new_position) | ||
|                 visited.add(new_position) | ||
|
|
||
|         return False |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| import dataclasses | ||
| from typing import Dict, List | ||
|
|
||
|
|
||
| @dataclasses.dataclass | ||
| class TrieNode: | ||
|     children: Dict[str, "TrieNode"] = dataclasses.field(default_factory=dict)  # quoted: TrieNode is not bound yet inside its own class body | ||
|
Reviewer comment: Python lists are, as far as I recall, dynamic arrays, so I think the same argument applies; but since Python is an interpreted language its bottleneck differs from C++, so it is unclear whether this is worth doing once readability is also taken into account...
Author reply: I see. I understand that in the C++ case this would likely increase cache hits. |
||
|     is_end: bool = False | ||
|
|
||
|
|
||
| class Solution: | ||
|     def wordBreak(self, s: str, wordDict: List[str]) -> bool: | ||
|         len_target = len(s) | ||
|
Reviewer comment: This is probably a matter of taste, but, |
||
|         if len_target == 0: | ||
|             return True | ||
|
|
||
|         root = TrieNode() | ||
|         max_len_of_wordDict = max((len(word) for word in wordDict), default=0) | ||
|         for word in wordDict: | ||
|             node = root | ||
|             for ch in word: | ||
|                 node = node.children.setdefault(ch, TrieNode()) | ||
|             node.is_end = True | ||
|
|
||
|         can_reach = [False] * (len_target + 1) | ||
|         can_reach[0] = True | ||
|
|
||
|         for i in range(len_target): | ||
|             if not can_reach[i]: | ||
|                 continue | ||
|             node = root | ||
|             limit = min(len_target, i + max_len_of_wordDict) | ||
|             for j in range(i, limit): | ||
|                 next_node = node.children.get(s[j]) | ||
|                 if next_node is None: | ||
|                     break | ||
|                 node = next_node | ||
|                 if node.is_end: | ||
|                     can_reach[j + 1] = True | ||
|                     if j + 1 == len_target: | ||
|                         return True | ||
|
|
||
|         return can_reach[len_target] | ||
Comment: What does this mean? If it means that the elements of the list are fixed, I would think they are fixed at the moment the function is called.
Reply: I mean that it is treated as a constant in the complexity estimate.
If it is not a constant, I think the time to construct the automaton that accepts the regular expression also has to be taken into account.
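For reference, the regex framing from this thread can be written directly with the standard `re` module. The function name `word_break_regex` is hypothetical. Note the assumption behind the O(n) argument does not hold here: Python's `re` is a backtracking engine rather than an automaton-based one, so this sketch only shows the formulation and the per-call pattern construction cost, not a linear-time solution.

```python
import re
from typing import List


def word_break_regex(s: str, word_dict: List[str]) -> bool:
    # Building the pattern depends on the total length of the words,
    # which is the construction cost mentioned in the reply above.
    alternation = "|".join(re.escape(word) for word in word_dict)
    pattern = re.compile(f"(?:{alternation})*")
    # fullmatch requires the whole string to be a concatenation of words.
    return pattern.fullmatch(s) is not None
```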