
fix: Parallel Leiden bugs #19

Open
adsharma wants to merge 22 commits into main from leiden_memory_convergence

Conversation

@adsharma
Contributor

Previous versions had several problems:

  • memory usage kept increasing
  • high lock contention and heavy lock memory allocation
  • the algorithm deviated from the original formulation and no longer converged

…ntention

Replace on-demand caching guarded by a single global mutex (29% runtime
overhead) with parallel pre-computation at construction time. Use a dense
vector instead of an unordered_map for O(1) lock-free neighbor lookups
during algorithm execution.
- Return the node move count from parallelMove() for progress monitoring
- Add INFO logs showing the inner iteration number, nodes moved, and community count
- Add a max inner iterations safety limit (100), with a log message when it is reached
- Remove broken early-termination checks that compromised algorithm correctness
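The changes above amount to a capped local-moving loop. A minimal C++ sketch (the name `runLocalMoving` and the callable shape of `parallelMove` are hypothetical; only the move count returned by parallelMove() and the 100-iteration cap come from the commit message):

```cpp
#include <cstdint>
#include <functional>

// Safety cap on inner iterations, per the commit message.
constexpr int maxInnerIterations = 100;

// Run the local-moving phase until no node moves or the cap is hit.
// parallelMove is assumed to return the number of nodes moved this pass.
uint64_t runLocalMoving(const std::function<uint64_t()>& parallelMove) {
    uint64_t totalMoved = 0;
    for (int iter = 0; iter < maxInnerIterations; ++iter) {
        uint64_t moved = parallelMove();
        totalMoved += moved;
        if (moved == 0)  // converged: no node improved its community
            break;
    }
    return totalMoved;
}
```

Returning the per-pass move count is what makes both the convergence check and the INFO progress logs possible.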

Pre-computing neighbor lists for millions of supernodes was consuming
too much memory. Switch back to on-demand computation without caching.

Tradeoff:
- Memory: Much lower (no pre-computed cache)
- Performance: ~2-3x slower (recompute neighbors on each access)

This allows the algorithm to run on large graphs without OOM.
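The on-demand side of the tradeoff can be sketched as follows (hypothetical names and graph representation; the point is that each call re-aggregates a supernode's neighbors from the underlying graph instead of reading a pre-computed cache, so no cache memory and no locks, at the cost of repeated CPU work):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Node = uint64_t;
struct Edge { Node target; double weight; };

// adjacency: original-graph adjacency lists; members: original nodes
// inside the supernode; community: original node -> supernode id.
std::unordered_map<Node, double> supernodeNeighbors(
        const std::vector<std::vector<Edge>>& adjacency,
        const std::vector<Node>& members,
        const std::vector<Node>& community) {
    std::unordered_map<Node, double> result;  // neighbor supernode -> total weight
    for (Node u : members)
        for (const Edge& e : adjacency[u])
            result[community[e.target]] += e.weight;
    return result;  // recomputed on every access: nothing is cached
}
```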

The coarsenedGraphs vector was accumulating all intermediate graph views,
each holding 4GB+ of memory (nodeMapping + supernodeToOriginal for 115M nodes).

Fix: Only keep current coarsened view, not historical ones.
The 'mappings' vector already stores everything needed for flattenPartition().

Instead of storing all intermediate mappings (each ~920MB for 115M nodes),
compose the mappings incrementally, keeping only one mapping at a time.
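Incremental composition can be sketched like this (hypothetical `composeInPlace`; the idea is that one original-node-to-current-supernode mapping is folded together with each level's mapping, so only a single ~920MB vector is ever live):

```cpp
#include <cstdint>
#include <vector>

using Node = uint64_t;

// composed: original node -> current supernode (one entry per original node).
// levelMapping: current supernode -> supernode at the next coarsening level.
// After the call, composed maps original nodes directly to the new level,
// and levelMapping can be discarded.
void composeInPlace(std::vector<Node>& composed,
                    const std::vector<Node>& levelMapping) {
    for (Node& target : composed)
        target = levelMapping[target];
}
```

flattenPartition() then only needs the single composed mapping rather than the whole history of per-level mappings.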

Memory usage is reduced, but the algorithm still OOMs due to:
- parallelRefine allocates 13M mutexes per iteration (~832MB)
- parallelMove allocates 115M atomics for inQueue (~115MB)
- CoarsenedGraphView uses ~4GB per level

Further optimization needed for these components.

After each coarsening step, compact the result partition to remap
community IDs to a contiguous range. This reduces the upper bound
from ~115M (the number of original nodes) to ~13M (the number of communities),
allowing smaller vectors for communityVolumes and per-thread cutWeights.

Also recalculate volumes after compacting to ensure correctness.

This reduces peak memory from >64GB (OOM) to ~40GB.
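Compaction can be sketched as a single renumbering pass (hypothetical `compactPartition`; the returned count is what lets communityVolumes and the per-thread cutWeights vectors be sized by ~13M communities instead of ~115M nodes, and volumes are recomputed afterwards as the commit notes):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Index = uint64_t;

// Remap arbitrary community IDs to a contiguous 0..k-1 range, in place.
// Returns k, the number of distinct communities.
Index compactPartition(std::vector<Index>& partition) {
    std::unordered_map<Index, Index> remap;
    for (Index& c : partition) {
        // First time we see this ID, assign it the next contiguous slot.
        auto [it, inserted] = remap.try_emplace(c, remap.size());
        c = it->second;
    }
    return remap.size();
}
```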

When countSelfLoopsTwice=true, self-loops should be counted twice
in the weighted degree (matching Graph::weightedDegree behavior).
This was causing incorrect volume calculations in the coarsened view.
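The convention can be sketched as follows (hypothetical free function; only the double-counting rule for countSelfLoopsTwice, matching Graph::weightedDegree, comes from the commit message):

```cpp
#include <cstdint>
#include <vector>

using Node = uint64_t;
struct Edge { Node target; double weight; };

// Weighted degree (volume contribution) of node u given its incident edges.
// With countSelfLoopsTwice, a self-loop contributes its weight twice,
// since both of its endpoints attach to u.
double weightedDegree(const std::vector<Edge>& neighbors, Node u,
                      bool countSelfLoopsTwice) {
    double degree = 0.0;
    for (const Edge& e : neighbors) {
        degree += e.weight;
        if (countSelfLoopsTwice && e.target == u)
            degree += e.weight;  // self-loop counted a second time
    }
    return degree;
}
```

Counting a self-loop once instead of twice understates a node's volume, which is why the coarsened view's volume calculations came out wrong.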