feat: louvain community detection for module boundary analysis by carlos-alm · Pull Request #133 · optave/codegraph

carlos-alm · 2026-02-26T23:12:51Z

Summary

Add codegraph communities command that runs Louvain clustering on the dependency graph, compares discovered communities against directory structure, and surfaces architectural drift (split/merge candidates, drift score)
Supports file-level (default) and function-level (--functions) modes with configurable resolution parameter
Integrated into codegraph stats, MCP server (communities tool), and programmatic API exports

Details

New file: src/communities.js (~200 lines) — core module with communitiesData(), communitySummaryForStats(), and communities() CLI display function. Uses graphology + graphology-communities-louvain (pure JS, ~50KB).

CLI options:

--functions — function-level instead of file-level
--resolution <n> — Louvain resolution (default 1.0, higher = more communities)
--drift — show only drift analysis
-T, -j, -d — standard flags
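
To illustrate what the resolution option changes, here is a minimal sketch of resolution-scaled modularity for an undirected, unweighted graph. It assumes the generalized modularity commonly paired with Louvain, where the resolution multiplies the expected-edges term. The function name and input shapes are hypothetical and are not taken from src/communities.js.

```javascript
// Generalized modularity: sum over communities of
//   (internal edges / m) - resolution * (degree sum / 2m)^2
// `edges` is an array of [u, v] pairs without self-loops;
// `community` maps node -> community id. Both shapes are assumptions.
function modularity(edges, community, resolution = 1.0) {
  const m = edges.length;
  if (m === 0) return 0;

  const internalEdges = new Map(); // community -> number of edges inside it
  const degreeSum = new Map();     // community -> sum of member degrees
  const bump = (map, key, by) => map.set(key, (map.get(key) ?? 0) + by);

  for (const [u, v] of edges) {
    const cu = community[u];
    const cv = community[v];
    bump(degreeSum, cu, 1);
    bump(degreeSum, cv, 1);
    if (cu === cv) bump(internalEdges, cu, 1);
  }

  let q = 0;
  for (const [c, d] of degreeSum) {
    const inside = internalEdges.get(c) ?? 0;
    q += inside / m - resolution * (d / (2 * m)) ** 2;
  }
  return q;
}

// Two triangles joined by a single edge.
const sampleEdges = [
  ['a', 'b'], ['b', 'c'], ['a', 'c'],
  ['d', 'e'], ['e', 'f'], ['d', 'f'],
  ['c', 'd'],
];
const twoCommunities = { a: 0, b: 0, c: 0, d: 1, e: 1, f: 1 };
const oneCommunity = { a: 0, b: 0, c: 0, d: 0, e: 0, f: 0 };

for (const resolution of [1.0, 2.0]) {
  console.log(resolution,
    modularity(sampleEdges, twoCommunities, resolution),
    modularity(sampleEdges, oneCommunity, resolution));
}
```

Raising the resolution increases the penalty on communities with a large degree sum, which is why higher values tend to favor more, smaller communities.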

Drift analysis:

Split candidates: directories with members in 2+ communities
Merge candidates: communities spanning 2+ directories
Drift score: 0-100 composite metric
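
The split and merge definitions above can be sketched as follows. This is an illustration only: the function name `findDriftCandidates` and the `assignments` shape (file path mapped to community id) are assumptions, not the PR's actual code.

```javascript
// `assignments` maps a file path to its community id (assumed shape).
function findDriftCandidates(assignments) {
  const dirToCommunities = new Map(); // directory -> Set of community ids
  const communityToDirs = new Map();  // community id -> Set of directories

  const add = (map, key, value) => {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(value);
  };

  for (const [file, community] of Object.entries(assignments)) {
    const slash = file.lastIndexOf('/');
    const dir = slash === -1 ? '.' : file.slice(0, slash);
    add(dirToCommunities, dir, community);
    add(communityToDirs, community, dir);
  }

  // Split candidates: directories whose members fall into 2+ communities.
  const splitCandidates = [...dirToCommunities]
    .filter(([, communities]) => communities.size >= 2)
    .map(([directory, communities]) => ({
      directory,
      communities: [...communities].sort((a, b) => a - b),
    }));

  // Merge candidates: communities whose members span 2+ directories.
  const mergeCandidates = [...communityToDirs]
    .filter(([, dirs]) => dirs.size >= 2)
    .map(([community, dirs]) => ({ community, directories: [...dirs].sort() }));

  return { splitCandidates, mergeCandidates };
}

console.log(findDriftCandidates({
  'src/a.js': 0,
  'src/b.js': 1,
  'lib/c.js': 1,
  'lib/d.js': 1,
}));
```

In this example `src` is a split candidate (its files land in communities 0 and 1) and community 1 is a merge candidate (it spans `lib` and `src`).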

Dependencies: graphology ^0.25.4, graphology-communities-louvain ^2.0.2 (regular deps)

Test plan

13 new integration tests covering file-level, function-level, drift analysis, noTests filtering, resolution parameter, drift-only mode, stats summary, and empty graph edge case
Full test suite passes (710 tests)
Lint clean (Biome)
CLI verified: communities, --functions, --drift -T, --resolution 1.5 -j, stats -T

Add `codegraph communities` command that runs Louvain clustering on the dependency graph, compares discovered communities against directory structure, and surfaces architectural drift (split/merge candidates, drift score). Supports file-level (default) and function-level modes, configurable resolution, and drift-only output. Integrated into stats, MCP, and programmatic API. Impact: 9 functions changed, 8 affected

claude · 2026-02-26T23:13:05Z

Claude finished @carlos-alm's task

PR Review: Community Detection Feature

Comprehensive review complete using the strict codegraph framework.

Todo List

Phase 1: Root cause analysis - understand problem and approach
Phase 2: Check for high-risk configuration changes
Phase 3: Test integrity verification
Phase 4: Code quality and security review
Final rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses the need to detect natural module boundaries in codebases by identifying tightly coupled groups of files/functions that may not align with the current directory structure. This helps identify architectural drift and suggests refactoring opportunities.

Approach: Uses Louvain community detection algorithm on the dependency graph to discover natural clusters, then compares these communities against directory structure to identify split/merge candidates and calculate a drift score.

Risk Assessment: LOW - This is a genuine analysis feature that adds value without modifying core parsing or graph building logic. No symptoms are being hidden.

Phase 2: Configuration Change Detection ✅

Result: PASS - No high-risk configuration changes detected:

No linting rules disabled or relaxed
No test timeouts increased or coverage thresholds lowered
No build configuration modifications
No TypeScript strict mode flags disabled
Only legitimate dependency additions (graphology packages ~50KB)

Phase 3: Test Integrity ✅

Result: PASS - Strong test coverage with no red flags:

13 comprehensive integration tests covering all modes and edge cases
Tests are rigorous with proper assertions (not weakened)
Hand-crafted multi-directory test fixture with realistic scenarios
Empty graph edge case properly handled
No test modifications alongside code changes that would indicate test weakening

Phase 4: Code Quality and Security Review ✅

Architecture & Integration: EXCELLENT

Clean separation of concerns: src/communities.js (~300 lines) contains all logic
Proper lazy loading with dynamic imports and graceful fallbacks
Well-integrated into CLI, MCP server, and programmatic API (src/index.js)
Stats integration with communitySummaryForStats function

Security: SECURE

All SQL queries use prepared statements (no injection risk)
Only readonly database access via openReadonlyOrFail()
No command injection, eval, or other dangerous patterns
Pure console.log output for CLI display (no XSS vectors)

Code Quality: HIGH

Comprehensive error handling for empty/disconnected graphs
Proper resource management (database closure)
Clear algorithmic approach: SQL → graphology → Louvain → drift analysis
Good naming conventions and JSDoc documentation

Performance: EFFICIENT

Lightweight dependency footprint (~50KB for graphology packages)
Efficient SQL queries filtering by node/edge types
O(n) drift analysis with proper data structures (Maps, Sets)
Configurable resolution parameter for fine-tuning

Minor Observations:

Lines 242-302: CLI output could benefit from centralized formatting, but follows existing patterns
Line 202: Drift score calculation is straightforward (split + merge ratios), appropriately simple
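
The review describes the drift score only as "split + merge ratios", so the exact formula is not shown here. One plausible reading, offered purely as an assumption, averages the fraction of split directories and the fraction of merge-candidate communities and scales the result to 0-100. The function and parameter names are hypothetical.

```javascript
// Hypothetical drift score: mean of the split ratio and the merge ratio,
// scaled to 0-100. The real formula in src/communities.js may differ.
function driftScore({ splitCount, directoryCount, mergeCount, communityCount }) {
  const ratio = (part, whole) => (whole > 0 ? part / whole : 0);
  const splitRatio = ratio(splitCount, directoryCount);   // directories spanning 2+ communities
  const mergeRatio = ratio(mergeCount, communityCount);   // communities spanning 2+ directories
  return Math.round(((splitRatio + mergeRatio) / 2) * 100);
}

console.log(driftScore({ splitCount: 1, directoryCount: 2, mergeCount: 1, communityCount: 2 }));
```

Guarding against zero denominators keeps the score at 0 for an empty graph instead of producing NaN.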

Root Cause Analysis

Problem Being Solved: Identifying architectural drift between logical code dependencies and directory structure to guide refactoring decisions.
Why This Approach: Louvain algorithm is well-established for community detection in graphs, and the comparison to directory structure provides actionable insights.
Risk Assessment: Very low - pure analysis feature with no modification to core graph building or parsing functionality.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: This is a well-designed, thoroughly tested feature that adds significant value to the codebase analysis toolkit. The implementation is secure, performant, and follows all established patterns. The only minor limitation is the lack of sophisticated CLI formatting, but this is consistent with the project's current approach and doesn't detract from functionality.

greptile-apps · 2026-02-26T23:15:36Z

Greptile Summary

This PR adds Louvain community detection to analyze module boundaries and architectural drift. The implementation is well-structured with comprehensive test coverage (13 tests, 710 total passing).

Key strengths:

Clean separation of concerns: graph construction, clustering, drift analysis, and display logic are properly isolated
Robust edge case handling: empty graphs, self-loops, duplicate edges, and missing dependencies all handled gracefully
Smart integration: async imports with try-catch in stats() allow graceful degradation if graphology is unavailable
Proper SQL safety: uses prepared statements, read-only operations, and validates node existence before creating edges
Comprehensive testing: covers both file-level and function-level modes, resolution parameters, drift-only mode, and empty graph scenarios
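
The graceful-degradation pattern mentioned above (async import wrapped in try-catch) might look roughly like this. It is a sketch under assumptions: `loadOptional` and `statsWithOptionalSection` are hypothetical names, and the real stats() code is not shown in this PR page.

```javascript
// Try to load an optional dependency; resolve to null instead of throwing
// when the module cannot be loaded.
async function loadOptional(specifier) {
  try {
    return await import(specifier);
  } catch {
    return null;
  }
}

// Hypothetical caller: add a section to the stats object only when the
// optional module is available, otherwise return the base stats unchanged.
async function statsWithOptionalSection(baseStats, specifier, buildSection) {
  const mod = await loadOptional(specifier);
  if (mod === null) return baseStats;
  return { ...baseStats, communities: buildSection(mod) };
}

statsWithOptionalSection({ files: 3 }, 'some-package-that-is-not-installed', () => ({}))
  .then((stats) => console.log(stats));
```

Because the failure is absorbed inside `loadOptional`, callers only have to handle a null result rather than wrap every use in their own try-catch.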

Technical details:

Builds undirected graphs from SQLite (file nodes + imports OR function nodes + calls)
Filters self-loops and deduplicates edges correctly
Drift analysis identifies split candidates (directories spanning communities) and merge candidates (communities spanning directories)
Modularity score properly bounded 0-1, drift score normalized 0-100
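
As a rough illustration of the edge handling described above (skipping self-loops, deduplicating undirected edges, and checking that both endpoints exist), consider the sketch below. The function name and input shapes are assumptions rather than the module's actual code.

```javascript
// `rawEdges` is an array of [source, target] pairs, e.g. rows from a query;
// `knownNodes` is a Set of node ids already added to the graph.
function normalizeEdges(rawEdges, knownNodes) {
  const seen = new Set();
  const edges = [];

  for (const [source, target] of rawEdges) {
    if (source === target) continue; // drop self-loops
    if (!knownNodes.has(source) || !knownNodes.has(target)) continue; // drop dangling edges

    // Order the endpoints so a->b and b->a collapse into one undirected edge.
    const pair = source < target ? [source, target] : [target, source];
    const key = JSON.stringify(pair);
    if (seen.has(key)) continue;

    seen.add(key);
    edges.push(pair);
  }
  return edges;
}

console.log(normalizeEdges(
  [['a', 'b'], ['b', 'a'], ['a', 'a'], ['a', 'z'], ['b', 'c']],
  new Set(['a', 'b', 'c']),
));
```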

The code follows project conventions, has no security issues, and integrates cleanly into CLI, MCP, and programmatic APIs.

Confidence Score: 5/5

This PR is safe to merge with no issues found
The implementation is thoroughly tested, handles edge cases properly, uses safe SQL practices, integrates cleanly with existing code, and follows established project patterns. No logical errors, security vulnerabilities, or breaking changes detected.
No files require special attention

Important Files Changed

Filename	Overview
src/communities.js	New core module implementing Louvain clustering with proper graph construction, drift analysis, and edge case handling
src/cli.js	Added communities command with proper option parsing and async handler for stats integration
src/mcp.js	Added communities tool to MCP server with proper input schema and async import handling
src/queries.js	Integrated communities summary into stats with graceful fallback for missing dependencies
tests/integration/communities.test.js	Comprehensive test suite with 13 tests covering file/function-level modes, drift analysis, and edge cases

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI[CLI: codegraph communities] --> CommData[communitiesData]
    Stats[CLI: codegraph stats] --> CommSummary[communitySummaryForStats]
    MCP[MCP Server] --> CommData
    API[Programmatic API] --> CommData
    
    CommData --> BuildGraph[buildGraphologyGraph]
    CommSummary --> CommData
    
    BuildGraph --> DB[(SQLite DB)]
    BuildGraph --> FileLevel{Mode?}
    FileLevel -->|file-level| FileNodes[nodes: files<br/>edges: imports]
    FileLevel -->|function-level| FnNodes[nodes: functions<br/>edges: calls]
    
    FileNodes --> GraphObj[Graphology Graph]
    FnNodes --> GraphObj
    
    GraphObj --> Louvain[Louvain Algorithm]
    Louvain --> Communities[Community Assignments]
    
    Communities --> DirAnalysis[Directory Analysis]
    DirAnalysis --> Split[Split Candidates]
    DirAnalysis --> Merge[Merge Candidates]
    DirAnalysis --> DriftScore[Drift Score 0-100]
    
    Communities --> Output[JSON Output]
    Split --> Output
    Merge --> Output
    DriftScore --> Output

Last reviewed commit: cc28daa

greptile-apps

9 files reviewed, no comments

* fix: strict type validation for threshold values in complexity queries
  Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()` to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently coercing into valid SQL values. Add integration test verifying exceeds arrays and summary.aboveWarn are correctly computed. Addresses Greptile review feedback on #136. Impact: 2 functions changed, 3 affected
* docs: add complexity, communities, and manifesto to all docs
  Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices, and CLI/MCP examples to reflect today's merged PRs: complexity metrics (#130/#139), Louvain community detection (#133/#134), and manifesto rule engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo), marks backlog items 6/11/21/22 as done, and adds real CLI output examples.
* fix: remove redundant condition in paginate guard clauses
  When limit === undefined, limit !== 0 is always true, so the && check was dead code. Simplified to just check limit === undefined. Impact: 2 functions changed, 18 affected
* docs: update dogfood report with fix statuses
  All 4 bugs now fixed (PR #117 merged, #116 closed via reverse-dep cascade). 3 of 4 suggestions addressed. MCP tool counts updated 18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.
* fix: rename misleading test to match actual behavior
  Test was named "handles non-numeric thresholds gracefully" but only validated baseline exceeds/aboveWarn with valid thresholds. Actual non-numeric threshold tests exist separately. Renamed to "produces correct exceeds and aboveWarn with valid thresholds".
* fix: update stale MCP tool count in dogfood skill (21→24)
* feat: add complexity analysis for Python, Go, Rust, Java, C#, Ruby, PHP
  Parameterize the complexity algorithm to support all 10 languages instead of just JS/TS/TSX. Add per-language COMPLEXITY_RULES, HALSTEAD_RULES, and COMMENT_PREFIXES with three else-if detection patterns (else-wraps-if, explicit elif, alternative field). Guard against tree-sitter keyword leaf tokens that share node type names with their parent constructs. Impact: 4 functions changed, 4 affected

---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
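
The strict threshold validation described in the first commit above can be illustrated with a short sketch. The helper name `isValidThreshold` is hypothetical; only the `typeof === 'number' && Number.isFinite()` check comes from the commit message.

```javascript
// Coercions that a loose check followed by Number(...) would let through:
console.log(Number(''));   // 0
console.log(Number(null)); // 0
console.log(Number(true)); // 1

// Strict check: accept only real, finite numbers.
function isValidThreshold(value) {
  return typeof value === 'number' && Number.isFinite(value);
}

for (const candidate of [10, 0, '', null, true, '5', NaN, Infinity, undefined]) {
  console.log(candidate, isValidThreshold(candidate));
}
```

Only `10` and `0` pass. Strings, booleans, `null`, `undefined`, `NaN`, and `Infinity` are rejected before they can reach a query.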

greptile-apps bot reviewed Feb 26, 2026

carlos-alm merged commit f3e36ad into main Feb 26, 2026
18 checks passed

carlos-alm deleted the feat/community-detection branch February 26, 2026 23:35

greptile-apps bot mentioned this pull request Feb 26, 2026

docs: update build performance benchmarks (v2.4.0) #135

Closed

carlos-alm mentioned this pull request Feb 27, 2026

feat: halstead metrics, maintainability index, and docs update #142

Closed

7 tasks

carlos-alm mentioned this pull request Feb 27, 2026

docs: complexity, communities, manifesto across all docs #144

Merged

4 tasks
