-
Notifications
You must be signed in to change notification settings - Fork 19
Add find_shortest_cycle
#194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Remove the "bidirectional" from the function name, since this is the only path finding function we have. The "bidirectional" part is just an implementation detail.
292810d to
fead8d6
Compare
CodSpeed Performance ReportMerging #194 will not alter performanceComparing Summary
|
fead8d6 to
b1804b0
Compare
K4liber
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Its just a partial review to satisfy my curiosity :)
| ) -> GrimpResult<Option<Vec<ModuleToken>>> { | ||
| // Exclude imports internal to `modules` | ||
| let mut excluded_imports = excluded_imports.clone(); | ||
| for (m1, m2) in modules.iter().tuple_combinations() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I dont know Rust at all, I will try to understand the full code and learn a bit of Rust. Maybe then I will add some useful comments to the PR.
According to those lines (question about logic, not Rust details): what is the logic in lines 62-64 for? Why do we remove internal imports between modules in modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the way to understand this is by looking at the test case test_ignores_internal_imports_when_as_package_is_true in test_cycles.py. If we remove L62-65 then that test case will start failing.
AssertionError: assert ('colors.blue',) == ('colors.red', 'x', 'y', 'z', 'colors.blue')
There is an import between colors.red and colors.blue (and the reverse import too). We don't want to follow either of those imports for the BFS, else we will short circuit and never end up leaving the colors package to find the → x → y → z → path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, makes sense. I mean, I understand now why we have those lines. Thanks for the explanation. But should we have them?
In the test case test_ignores_internal_imports_when_as_package_is_true, I think the shortest cycle could be ("colors.red", "colors.blue"). I think we could consider all child packages of "colors". Why do I think that?
I rather think of package "colors" as a set of all descendant modules inside all sub-packages., not only children.
In case of finding cycles in the whole project, instead of iterating over every single package in some package tree, I would rather just provide a single package. For example, I would like to fix "colors.red". If grimp does not return any cycle for "colors.red", it means that the whole package and all sub-packages inside are fine when it comes to importing.
Maybe we could introduce another parameter called something like "consider_sub_packages: bool = True"? Based on the value we could execute lines 62-64 or not. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function is behaving as I intended it (we're finding the shortest chain from the colors package to itself, but that chain needs to leave the colors package and not just be entirely internal to it).
This behaviour is consistent with e.g. find_shortest_chain. E.g.
p1.m1 → p2.m1
p2.m2 → p3.m1
In the above example find_shortest_chain("p1", "p3", as_packages=True) will return None, since there is no direct import chain between p1 and p3 (the import p2.m1 → p2.m2 is missing). p1 and p3 are treated as entire packages, but p2 is not.
That said, judging by your reply above and #194 (comment) it sounds like find_shortest_cycle is maybe not behaving in the way you needed for your contract. I think I'm a bit confused about the contract you're trying to express. Perhaps we need to write down more clearly the behaviour you're trying to achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about continuing discussion on a high level functionality here (@seddonym already proposed a solid documentation): #189 (comment) and then go back for the low level implementation in this PR after we agree to the scope of high level functionality?
Yes sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to the topic, I guess in the high lvl functionality we can loop over sub-packages of the package in our interest executing find_shortest_cycle. And it will return the internal cycles if any.
For example, having the following dependencies:
graph.add_module("colors")
graph.add_import(importer="colors.red", imported="colors.blue")
graph.add_import(importer="colors.blue", imported="colors.red")
graph.add_import(importer="colors.red", imported="x")
graph.add_import(importer="x", imported="y")
# graph.add_import(importer="y", imported="z")
graph.add_import(importer="z", imported="colors.blue")
we would like to find cycles (including internal ones) for "colors". So we execute:
graph.find_shortest_cycle("colors", as_package=True)
>>> None
We go further and start with a first sub-package of "colors" which is "colors.blue":
graph.find_shortest_cycle("colors.blue", as_package=True)
>>> ('colors.blue', 'colors.red', 'colors.blue')
Wanting to find all cycles, we execute further, selecting the second sub-package "colors.red":
graph.find_shortest_cycle("colors.red", as_package=True)
('colors.red', 'colors.blue', 'colors.red')
We get the semantically equivalent cycle.
I wonder if optionally disabling those lines by having a parameter consider_sub_packages or consider_internal_packages could save some computation time. It could potentially reduce the overlapping calculations on a frequently executed type of request from the high-lvl functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if the entirety of find_cycles_between_siblings is implemented at the Rust layer, in the same way as find_illegal_dependencies_for_layers. Does anyone disagree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thinner the Python layer is, faster executions we get. As @Peter554 mentioned in the PR description we can have both, low and high lvl functionality implemented in Grimp. Then we can utilize find_shortest_cycle in find_cycles_between_siblings, making sure we do not perform overlapping computations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed here: #189 (comment)
we can utilize the find_shortest_cycle method, as it is implemented now, in the high level method that will be the base of the contract.
| "y", | ||
| "z", | ||
| "colors.blue", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the following test:
def test_cycle_root_independent(self):
graph = ImportGraph()
graph.add_module("package_1")
graph.add_module("package_2")
graph.add_module("package_3")
graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
'package_1.module_a',
'package_2.module_b',
'package_1.package_3.module_c'
)
# should the function return the same for package_2 or not?
assert graph.find_shortest_cycle("package_2", as_package=True) == (
'package_1.module_a',
'package_2.module_b',
'package_1.package_3.module_c'
)
Why do we only find the cycle while using "package_1" as root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so basically, providing as_package=True is equivalent to changing all the internal modules paths of the package to just the package name. So in case of package_1 the two following results are almost equivalent (they differ by the module names in the returned cycle):
graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
'package_1.module_a',
'package_2.module_b',
'package_1.package_3.module_c'
)
graph.add_import(importer="package_2.module_b", imported="package_1")
graph.add_import(importer="package_1", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=False) == (
'package_1',
'package_2.module_b',
'package_1'
)
And in the case of package_2 we have the following results:
1.
graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_2", as_package=True) == None
graph.add_import(importer="package_2", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2")
assert graph.find_shortest_cycle("package_2", as_package=False) == None
We get None, because there is no any module in package_2 having a dependency circle to itself through any package outside of package_2.
b1804b0 to
43b7c31
Compare
| "y", | ||
| "z", | ||
| "colors.blue", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so basically, providing as_package=True is equivalent to changing all the internal modules paths of the package to just the package name. So in case of package_1 the two following results are almost equivalent (they differ by the module names in the returned cycle):
graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
'package_1.module_a',
'package_2.module_b',
'package_1.package_3.module_c'
)
graph.add_import(importer="package_2.module_b", imported="package_1")
graph.add_import(importer="package_1", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=False) == (
'package_1',
'package_2.module_b',
'package_1'
)
And in the case of package_2 we have the following results:
1.
graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_2", as_package=True) == None
graph.add_import(importer="package_2", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2")
assert graph.find_shortest_cycle("package_2", as_package=False) == None
We get None, because there is no any module in package_2 having a dependency circle to itself through any package outside of package_2.
| ) -> GrimpResult<Option<Vec<ModuleToken>>> { | ||
| // Exclude imports internal to `modules` | ||
| let mut excluded_imports = excluded_imports.clone(); | ||
| for (m1, m2) in modules.iter().tuple_combinations() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed here: #189 (comment)
we can utilize the find_shortest_cycle method, as it is implemented now, in the high level method that will be the base of the contract.
|
I'm going to close this one as it's a bit stale and I'm not working on it. |
|
@Peter554 I am actually using your work while implementing AcyclicContract (still a draft, but soon it will be ready for a review: seddonym/import-linter#250). It seems that it works as intended. Maybe one thing could be improved which is a deterministic output. Right now it seems to not be deterministic. I get different "shortest cycle" between runs for the exact same dependency graph. |
This PR adds the
find_shortest_cyclemethod to grimp, which I've been proposing in e.g. #189.This is a low-level, granular method for finding cycles in the import graph. We could potentially add some higher order methods for working with import cycles across the entire graph as @K4liber has been suggesting - I think we can potentially have both, e.g. similar to the import chain APIs
find_shortest_chain,find_shortest_chainsandfind_illegal_dependencies_for_layers(I think grimp should offer both lower and higher level APIs - we can add higher level APIs, but we also shouldn't forget/miss the more basic APIs - for one thing not everyone is using grimp with import linter).Still TODO: