…ntention Replace the on-demand cache, which was guarded by a single global mutex (29% runtime overhead), with parallel pre-computation at construction time. Use a dense vector instead of an unordered_map for O(1) lock-free neighbor lookups during algorithm execution.
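A minimal sketch of the pre-computation idea described above, not the code from this PR. `NeighborCache`, `computeNeighbors`, and the ring-shaped toy graph are hypothetical names and data introduced only for illustration. It assumes OpenMP for the parallel loop; without `-fopenmp` the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Node = std::uint32_t;
using Neighbors = std::vector<std::pair<Node, double>>; // (neighbor, edge weight)

// Hypothetical stand-in for the expensive per-supernode neighbor computation.
// Here every node u simply points at (u + 1) % n with weight 1.
Neighbors computeNeighbors(Node u, Node n) {
    return {{static_cast<Node>((u + 1) % n), 1.0}};
}

class NeighborCache {
public:
    // All neighbor lists are filled once, in parallel, at construction time.
    explicit NeighborCache(Node n) : cache_(n) {
        assert(n > 0);
        // Each iteration writes only its own slot cache_[u], so no mutex is needed.
#pragma omp parallel for schedule(dynamic, 1024)
        for (std::int64_t u = 0; u < static_cast<std::int64_t>(n); ++u) {
            cache_[static_cast<Node>(u)] = computeNeighbors(static_cast<Node>(u), n);
        }
    }

    // Read-only O(1) lookup by index; safe to call from many threads at once
    // because the cache is never modified after construction.
    const Neighbors &neighbors(Node u) const {
        assert(u < cache_.size());
        return cache_[u];
    }

private:
    std::vector<Neighbors> cache_; // dense: index == supernode id
};
```

The dense vector works here because supernode IDs are assumed to form a contiguous range `[0, n)`; with sparse IDs a hash map would still be needed.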
- Return node move count from parallelMove() for monitoring progress
- Add INFO logs showing inner iteration number, nodes moved, and community count
- Add max inner iterations safety limit (100) with log message when reached
- Remove broken early termination checks that were breaking algorithm correctness
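The loop structure these changes describe could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `runInnerLoop`, `InnerLoopResult`, and the two callbacks are hypothetical, the log goes to `std::clog` rather than the project's INFO logger, and stopping when a pass moves zero nodes is an assumed convergence criterion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>

struct InnerLoopResult {
    std::size_t iterations; // number of move passes executed
    bool hitLimit;          // true if the safety limit stopped the loop
};

// moveOnce stands in for parallelMove() and returns how many nodes it moved;
// countCommunities returns the current number of communities (for logging only).
InnerLoopResult runInnerLoop(const std::function<std::size_t()> &moveOnce,
                             const std::function<std::size_t()> &countCommunities,
                             std::size_t maxInnerIterations = 100) {
    assert(maxInnerIterations > 0);
    std::size_t iteration = 0;
    while (iteration < maxInnerIterations) {
        ++iteration;
        const std::size_t moved = moveOnce();
        std::clog << "inner iteration " << iteration << ": " << moved
                  << " nodes moved, " << countCommunities() << " communities\n";
        if (moved == 0) {
            return {iteration, false}; // assumed convergence: nothing moved
        }
    }
    std::clog << "reached max inner iterations (" << maxInnerIterations << ")\n";
    return {iteration, true};
}
```

Returning the move count from the move step is what makes both the progress log and the termination decision possible without inspecting the partition itself.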
Pre-computing neighbor lists for millions of supernodes was consuming too much memory. Switch to on-demand computation without caching.
Tradeoff:
- Memory: Much lower (no pre-computed cache)
- Performance: ~2-3x slower (recompute neighbors on each access)
This allows the algorithm to run on large graphs without OOM.
The coarsenedGraphs vector was accumulating all intermediate graph views, each holding 4GB+ of memory (nodeMapping + supernodeToOriginal for 115M nodes). Fix: Only keep current coarsened view, not historical ones. The 'mappings' vector already stores everything needed for flattenPartition().
Instead of storing all intermediate mappings (each ~920MB for 115M nodes), compose mappings incrementally, keeping only one mapping at a time.
Memory usage is reduced, but the algorithm still OOMs due to:
- parallelRefine allocates 13M mutexes per iteration (~832MB)
- parallelMove allocates 115M atomics for inQueue (~115MB)
- CoarsenedGraphView uses ~4GB per level
Further optimization is needed for these components.
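Incremental composition can be sketched as follows. `composeInPlace` and its parameter names are hypothetical; the sketch assumes each level's mapping is a dense vector from the current node IDs to the next level's supernode IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Node = std::uint32_t;

// composed[v] holds the supernode of original node v at the current level.
// levelMapping[s] maps a current-level supernode s to its next-level supernode.
// After the call, composed[v] points directly at the next level, so
// levelMapping can be freed and only one full-size mapping stays alive.
void composeInPlace(std::vector<Node> &composed, const std::vector<Node> &levelMapping) {
    for (std::size_t v = 0; v < composed.size(); ++v) {
        assert(composed[v] < levelMapping.size());
        composed[v] = levelMapping[composed[v]];
    }
}
```

With this scheme the composed mapping always has one entry per original node, while each `levelMapping` shrinks with the coarsened graph and can be released right after it is applied.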
After each coarsening step, compact the result partition to remap community IDs to a contiguous range. This reduces the upper bound from ~115M (number of original nodes) to ~13M (number of communities), allowing smaller vectors for communityVolumes and per-thread cutWeights. Also recalculate volumes after compacting to ensure correctness. This reduces peak memory from >64GB (OOM) to ~40GB.
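A small sketch of the compaction step, assuming the partition is stored as a vector of community IDs indexed by node. `compactPartition` and `recomputeVolumes` are illustrative names, not functions taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Node = std::uint32_t;

// Remaps community IDs to the contiguous range [0, k) in order of first
// appearance and returns k. upperBound must exceed every ID in the partition.
Node compactPartition(std::vector<Node> &partition, Node upperBound) {
    constexpr Node none = std::numeric_limits<Node>::max();
    std::vector<Node> remap(upperBound, none);
    Node next = 0;
    for (Node &c : partition) {
        assert(c < upperBound);
        if (remap[c] == none) {
            remap[c] = next++;
        }
        c = remap[c];
    }
    return next;
}

// Rebuilds per-community volumes from per-node volumes after compaction,
// so the vector only needs numCommunities entries.
std::vector<double> recomputeVolumes(const std::vector<Node> &partition,
                                     const std::vector<double> &nodeVolume,
                                     Node numCommunities) {
    assert(partition.size() == nodeVolume.size());
    std::vector<double> volumes(numCommunities, 0.0);
    for (std::size_t v = 0; v < partition.size(); ++v) {
        assert(partition[v] < numCommunities);
        volumes[partition[v]] += nodeVolume[v];
    }
    return volumes;
}
```

The temporary `remap` vector is still sized by the old upper bound, but it is freed when the function returns, whereas the volume and per-thread vectors sized by the new bound live for the whole next phase.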
When countSelfLoopsTwice=true, self-loops should be counted twice in the weighted degree (matching Graph::weightedDegree behavior). Not doing so was causing incorrect volume calculations in the coarsened view.
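The intended rule can be illustrated with a standalone function. This is a sketch over a plain adjacency list, not the coarsened view's actual code; `weightedDegreeOf` and the adjacency representation are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Node = std::uint32_t;
using Adjacency = std::vector<std::pair<Node, double>>; // (neighbor, edge weight)

// Sums the weights of all edges incident to u. A self-loop (u, u) appears once
// in the adjacency list; with countSelfLoopsTwice its weight is added a second time.
double weightedDegreeOf(Node u, const Adjacency &adjacency, bool countSelfLoopsTwice) {
    double sum = 0.0;
    for (const auto &edge : adjacency) {
        assert(edge.second >= 0.0); // assumption: non-negative weights
        sum += edge.second;
        if (countSelfLoopsTwice && edge.first == u) {
            sum += edge.second;
        }
    }
    return sum;
}
```

For a node 0 with a self-loop of weight 2.0 and an edge of weight 1.5 to node 1, the function gives 3.5 without double counting and 5.5 with it.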
Also cap max inner iterations to 20.
Previous versions had problems: