Skip to content

Conversation

@Peter554
Copy link
Collaborator

@Peter554 Peter554 commented Mar 2, 2025

This PR adds the find_shortest_cycle method to grimp, which I've been proposing in e.g. #189.

This is a low-level, granular method for finding cycles in the import graph. We could potentially add some higher order methods for working with import cycles across the entire graph as @K4liber has been suggesting - I think we can potentially have both, e.g. similar to the import chain APIs find_shortest_chain, find_shortest_chains and find_illegal_dependencies_for_layers (I think grimp should offer both lower and higher level APIs - we can add higher level APIs, but we also shouldn't forget/miss the more basic APIs - for one thing not everyone is using grimp with import linter).

Still TODO:

  • Align with @seddonym and @K4liber on whether we want this method added to grimp
  • Check implementation for correctness. More (python) tests?
  • Add benchmark
  • Usage docs
  • Update changelog

Peter554 added 2 commits March 2, 2025 06:24
Remove the "bidirectional" from the function name,
since this is the only path finding function we have.

The "bidirectional" part is just an implementation detail.
@Peter554 Peter554 force-pushed the find-shortest-cycle branch 2 times, most recently from 292810d to fead8d6 Compare March 2, 2025 06:45
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 2, 2025

CodSpeed Performance Report

Merging #194 will not alter performance

Comparing Peter554:find-shortest-cycle (43b7c31) with master (8c1bb5d)

Summary

✅ 17 untouched benchmarks

@Peter554 Peter554 force-pushed the find-shortest-cycle branch from fead8d6 to b1804b0 Compare March 2, 2025 07:16
Copy link

@K4liber K4liber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Its just a partial review to satisfy my curiosity :)

) -> GrimpResult<Option<Vec<ModuleToken>>> {
// Exclude imports internal to `modules`
let mut excluded_imports = excluded_imports.clone();
for (m1, m2) in modules.iter().tuple_combinations() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I dont know Rust at all, I will try to understand the full code and learn a bit of Rust. Maybe then I will add some useful comments to the PR.

According to those lines (question about logic, not Rust details): what is the logic in lines 62-64 for? Why do we remove internal imports between modules in modules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the way to understand this is by looking at the test case test_ignores_internal_imports_when_as_package_is_true in test_cycles.py. If we remove L62-65 then that test case will start failing.

AssertionError: assert ('colors.blue',) == ('colors.red', 'x', 'y', 'z', 'colors.blue')

There is an import between colors.red and colors.blue (and the reverse import too). We don't want to follow either of those imports for the BFS, else we will short circuit and never end up leaving the colors package to find the → x → y → z → path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that make sense?

Copy link

@K4liber K4liber Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, makes sense. I mean, I understand now why we have those lines. Thanks for the explanation. But should we have them?

In the test case test_ignores_internal_imports_when_as_package_is_true, I think the shortest cycle could be ("colors.red", "colors.blue"). I think we could consider all child packages of "colors". Why do I think that?

I rather think of package "colors" as a set of all descendant modules inside all sub-packages., not only children.

In case of finding cycles in the whole project, instead of iterating over every single package in some package tree, I would rather just provide a single package. For example, I would like to fix "colors.red". If grimp does not return any cycle for "colors.red", it means that the whole package and all sub-packages inside are fine when it comes to importing.

Maybe we could introduce another parameter called something like "consider_sub_packages: bool = True"? Based on the value we could execute lines 62-64 or not. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function is behaving as I intended it (we're finding the shortest chain from the colors package to itself, but that chain needs to leave the colors package and not just be entirely internal to it).

This behaviour is consistent with e.g. find_shortest_chain. E.g.

p1.m1 → p2.m1
        p2.m2 → p3.m1

In the above example find_shortest_chain("p1", "p3", as_packages=True) will return None, since there is no direct import chain between p1 and p3 (the import p2.m1 → p2.m2 is missing). p1 and p3 are treated as entire packages, but p2 is not.

That said, judging by your reply above and #194 (comment) it sounds like find_shortest_cycle is maybe not behaving in the way you needed for your contract. I think I'm a bit confused about the contract you're trying to express. Perhaps we need to write down more clearly the behaviour you're trying to achieve?

Copy link
Collaborator Author

@Peter554 Peter554 Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about continuing discussion on a high level functionality here (@seddonym already proposed a solid documentation): #189 (comment) and then go back for the low level implementation in this PR after we agree to the scope of high level functionality?

Yes sounds good to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to the topic, I guess in the high lvl functionality we can loop over sub-packages of the package in our interest executing find_shortest_cycle. And it will return the internal cycles if any.

For example, having the following dependencies:

graph.add_module("colors")
graph.add_import(importer="colors.red", imported="colors.blue")
graph.add_import(importer="colors.blue", imported="colors.red")
graph.add_import(importer="colors.red", imported="x")
graph.add_import(importer="x", imported="y")
# graph.add_import(importer="y", imported="z")
graph.add_import(importer="z", imported="colors.blue")

we would like to find cycles (including internal ones) for "colors". So we execute:

graph.find_shortest_cycle("colors", as_package=True)
>>> None

We go further and start with a first sub-package of "colors" which is "colors.blue":

graph.find_shortest_cycle("colors.blue", as_package=True)
>>> ('colors.blue', 'colors.red', 'colors.blue')

Wanting to find all cycles, we execute further, selecting the second sub-package "colors.red":

graph.find_shortest_cycle("colors.red", as_package=True)
('colors.red', 'colors.blue', 'colors.red')

We get the semantically equivalent cycle.

I wonder if optionally disabling those lines by having a parameter consider_sub_packages or consider_internal_packages could save some computation time. It could potentially reduce the overlapping calculations on a frequently executed type of request from the high-lvl functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if the entirety of find_cycles_between_siblings is implemented at the Rust layer, in the same way as find_illegal_dependencies_for_layers. Does anyone disagree?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thinner the Python layer is, faster executions we get. As @Peter554 mentioned in the PR description we can have both, low and high lvl functionality implemented in Grimp. Then we can utilize find_shortest_cycle in find_cycles_between_siblings, making sure we do not perform overlapping computations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed here: #189 (comment)
we can utilize the find_shortest_cycle method, as it is implemented now, in the high level method that will be the base of the contract.

"y",
"z",
"colors.blue",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following test:

    def test_cycle_root_independent(self):
        graph = ImportGraph()
        graph.add_module("package_1")
        graph.add_module("package_2")
        graph.add_module("package_3")
        graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
        graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
        assert graph.find_shortest_cycle("package_1", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
        )
        # should the function return the same for package_2 or not?
        assert graph.find_shortest_cycle("package_2", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
        )

Why do we only find the cycle while using "package_1" as root?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so basically, providing as_package=True is equivalent to changing all the internal modules paths of the package to just the package name. So in case of package_1 the two following results are almost equivalent (they differ by the module names in the returned cycle):

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
)
graph.add_import(importer="package_2.module_b", imported="package_1")
graph.add_import(importer="package_1", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=False) == (
            'package_1',
            'package_2.module_b',
            'package_1'
)

And in the case of package_2 we have the following results:
1.

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_2", as_package=True) == None
graph.add_import(importer="package_2", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2")
assert graph.find_shortest_cycle("package_2", as_package=False) == None

We get None, because there is no any module in package_2 having a dependency circle to itself through any package outside of package_2.

@Peter554 Peter554 force-pushed the find-shortest-cycle branch from b1804b0 to 43b7c31 Compare March 2, 2025 10:27
"y",
"z",
"colors.blue",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so basically, providing as_package=True is equivalent to changing all the internal modules paths of the package to just the package name. So in case of package_1 the two following results are almost equivalent (they differ by the module names in the returned cycle):

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
)
graph.add_import(importer="package_2.module_b", imported="package_1")
graph.add_import(importer="package_1", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=False) == (
            'package_1',
            'package_2.module_b',
            'package_1'
)

And in the case of package_2 we have the following results:
1.

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_2", as_package=True) == None
graph.add_import(importer="package_2", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2")
assert graph.find_shortest_cycle("package_2", as_package=False) == None

We get None, because there is no any module in package_2 having a dependency circle to itself through any package outside of package_2.

) -> GrimpResult<Option<Vec<ModuleToken>>> {
// Exclude imports internal to `modules`
let mut excluded_imports = excluded_imports.clone();
for (m1, m2) in modules.iter().tuple_combinations() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed here: #189 (comment)
we can utilize the find_shortest_cycle method, as it is implemented now, in the high level method that will be the base of the contract.

@Peter554
Copy link
Collaborator Author

I'm going to close this one as it's a bit stale and I'm not working on it.

@Peter554 Peter554 closed this Jun 16, 2025
@K4liber
Copy link

K4liber commented Jun 16, 2025

@Peter554 I am actually using your work while implementing AcyclicContract (still a draft, but soon it will be ready for a review: seddonym/import-linter#250). It seems that it works as intended. Maybe one thing could be improved which is a deterministic output. Right now it seems to not be deterministic. I get different "shortest cycle" between runs for the exact same dependency graph.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants