feat: louvain community detection for module boundary analysis#133

Merged
carlos-alm merged 1 commit into main from feat/community-detection
Feb 26, 2026

@carlos-alm
Contributor

Summary

  • Add codegraph communities command that runs Louvain clustering on the dependency graph, compares discovered communities against directory structure, and surfaces architectural drift (split/merge candidates, drift score)
  • Supports file-level (default) and function-level (--functions) modes with configurable resolution parameter
  • Integrated into codegraph stats, MCP server (communities tool), and programmatic API exports

Details

New file: src/communities.js (~200 lines) — core module with communitiesData(), communitySummaryForStats(), and communities() CLI display function. Uses graphology + graphology-communities-louvain (pure JS, ~50KB).

CLI options:

  • --functions — function-level instead of file-level
  • --resolution <n> — Louvain resolution (default 1.0, higher = more communities)
  • --drift — show only drift analysis
  • -T, -j, -d — standard flags
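
To illustrate why a higher resolution tends to yield more communities, here is a minimal sketch of the generalised modularity score that Louvain-style methods optimise. It is not taken from src/communities.js or from graphology-communities-louvain: the function `modularity`, its inputs, and the sample graph are hypothetical, and the exact objective used by the library may differ in detail.

```javascript
// Generalised modularity for an undirected, unweighted graph:
//   Q(gamma) = sum over communities c of [ L_c / m - gamma * (d_c / 2m)^2 ]
// m = number of edges, L_c = edges inside c, d_c = total degree of c.
// All names here are illustrative assumptions, not project code.
function modularity(edgeList, communityOf, gamma = 1.0) {
  const m = edgeList.length;
  if (m === 0) return 0;
  const internal = new Map(); // community id -> L_c
  const degree = new Map(); // community id -> d_c
  for (const [u, v] of edgeList) {
    const cu = communityOf[u];
    const cv = communityOf[v];
    degree.set(cu, (degree.get(cu) ?? 0) + 1);
    degree.set(cv, (degree.get(cv) ?? 0) + 1);
    if (cu === cv) internal.set(cu, (internal.get(cu) ?? 0) + 1);
  }
  let q = 0;
  for (const [c, d] of degree) {
    q += (internal.get(c) ?? 0) / m - gamma * (d / (2 * m)) ** 2;
  }
  return q;
}

// Two triangles joined by a single bridge edge (c-d).
const bridgeEdges = [
  ["a", "b"], ["b", "c"], ["a", "c"],
  ["d", "e"], ["e", "f"], ["d", "f"],
  ["c", "d"],
];
const twoGroups = { a: 0, b: 0, c: 0, d: 1, e: 1, f: 1 };
const oneGroup = { a: 0, b: 0, c: 0, d: 0, e: 0, f: 0 };

for (const gamma of [0.2, 1.0]) {
  console.log(gamma, {
    split: modularity(bridgeEdges, twoGroups, gamma),
    merged: modularity(bridgeEdges, oneGroup, gamma),
  });
}
```

In this toy graph the two-triangle partition scores higher at resolution 1.0, while the single merged community scores higher at 0.2, which matches the "higher = more communities" behaviour described for --resolution.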

Drift analysis:

  • Split candidates: directories with members in 2+ communities
  • Merge candidates: communities spanning 2+ directories
  • Drift score: 0-100 composite metric
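
The drift analysis above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/communities.js: the function `analyzeDrift`, the shape of its input (file path mapped to community id), and in particular the score formula (the average of the split and merge ratios, scaled to 0-100) are hypothetical.

```javascript
// Hypothetical sketch: compare community assignments with directory layout.
// `assignments` maps a file path to the community id a clustering step gave it.
function analyzeDrift(assignments) {
  const dirToCommunities = new Map(); // directory -> Set of community ids
  const communityToDirs = new Map(); // community id -> Set of directories

  for (const [file, community] of Object.entries(assignments)) {
    const slash = file.lastIndexOf("/");
    const dir = slash === -1 ? "." : file.slice(0, slash);
    if (!dirToCommunities.has(dir)) dirToCommunities.set(dir, new Set());
    dirToCommunities.get(dir).add(community);
    if (!communityToDirs.has(community)) communityToDirs.set(community, new Set());
    communityToDirs.get(community).add(dir);
  }

  // Split candidates: directories whose files fall into 2+ communities.
  const splitCandidates = [...dirToCommunities]
    .filter(([, ids]) => ids.size >= 2)
    .map(([directory, ids]) => ({ directory, communities: [...ids].sort((a, b) => a - b) }));

  // Merge candidates: communities whose files live in 2+ directories.
  const mergeCandidates = [...communityToDirs]
    .filter(([, dirs]) => dirs.size >= 2)
    .map(([community, dirs]) => ({ community, directories: [...dirs].sort() }));

  // Assumed composite: mean of the two ratios, scaled to 0-100.
  const splitRatio = dirToCommunities.size ? splitCandidates.length / dirToCommunities.size : 0;
  const mergeRatio = communityToDirs.size ? mergeCandidates.length / communityToDirs.size : 0;
  const driftScore = Math.round(((splitRatio + mergeRatio) / 2) * 100);

  return { splitCandidates, mergeCandidates, driftScore };
}

const drift = analyzeDrift({
  "src/api/routes.js": 0,
  "src/api/auth.js": 1,
  "src/db/users.js": 1,
  "src/db/schema.js": 1,
  "src/util/log.js": 2,
});
console.log(JSON.stringify(drift, null, 2));
```

Here src/api is a split candidate (its files sit in communities 0 and 1), community 1 is a merge candidate (it spans src/api and src/db), and the assumed formula gives a drift score of 33.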

Dependencies: graphology ^0.25.4, graphology-communities-louvain ^2.0.2 (regular deps)

Test plan

  • 13 new integration tests covering file-level, function-level, drift analysis, noTests filtering, resolution parameter, drift-only mode, stats summary, and empty graph edge case
  • Full test suite passes (710 tests)
  • Lint clean (Biome)
  • CLI verified: communities, --functions, --drift -T, --resolution 1.5 -j, stats -T

Add `codegraph communities` command that runs Louvain clustering on the
dependency graph, compares discovered communities against directory
structure, and surfaces architectural drift (split/merge candidates,
drift score). Supports file-level (default) and function-level modes,
configurable resolution, and drift-only output. Integrated into stats,
MCP, and programmatic API.

Impact: 9 functions changed, 8 affected
@claude

claude bot commented Feb 26, 2026

Claude finished @carlos-alm's task

PR Review: Community Detection Feature

Comprehensive review complete using the strict codegraph framework.

Todo List

  • Phase 1: Root cause analysis - understand problem and approach
  • Phase 2: Check for high-risk configuration changes
  • Phase 3: Test integrity verification
  • Phase 4: Code quality and security review
  • Final rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses the need to detect natural module boundaries in codebases by identifying tightly coupled groups of files/functions that may not align with the current directory structure. This helps identify architectural drift and suggests refactoring opportunities.

Approach: Uses Louvain community detection algorithm on the dependency graph to discover natural clusters, then compares these communities against directory structure to identify split/merge candidates and calculate a drift score.

Risk Assessment: LOW - This is a genuine analysis feature that adds value without modifying core parsing or graph building logic. No symptoms are being hidden.

Phase 2: Configuration Change Detection ✅

Result: PASS - No high-risk configuration changes detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No build configuration modifications
  • No TypeScript strict mode flags disabled
  • Only legitimate dependency additions (graphology packages ~50KB)

Phase 3: Test Integrity ✅

Result: PASS - Strong test coverage with no red flags:

  • 13 comprehensive integration tests covering all modes and edge cases
  • Tests are rigorous with proper assertions (not weakened)
  • Hand-crafted multi-directory test fixture with realistic scenarios
  • Empty graph edge case properly handled
  • No test modifications alongside code changes that would indicate test weakening

Phase 4: Code Quality and Security Review ✅

Architecture & Integration: EXCELLENT

  • Clean separation of concerns: src/communities.js (~300 lines) contains all logic
  • Proper lazy loading with dynamic imports and graceful fallbacks
  • Well-integrated into CLI, MCP server, and programmatic API (src/index.js)
  • Stats integration with communitySummaryForStats function

Security: SECURE

  • All SQL queries use prepared statements (no injection risk)
  • Only readonly database access via openReadonlyOrFail()
  • No command injection, eval, or other dangerous patterns
  • Pure console.log output for CLI display (no XSS vectors)

Code Quality: HIGH

  • Comprehensive error handling for empty/disconnected graphs
  • Proper resource management (database closure)
  • Clear algorithmic approach: SQL → graphology → Louvain → drift analysis
  • Good naming conventions and JSDoc documentation

Performance: EFFICIENT

  • Lightweight dependency footprint (~50KB for graphology packages)
  • Efficient SQL queries filtering by node/edge types
  • O(n) drift analysis with proper data structures (Maps, Sets)
  • Configurable resolution parameter for fine-tuning

Minor Observations:

  • Lines 242-302: CLI output could benefit from centralized formatting, but follows existing patterns
  • Line 202: Drift score calculation is straightforward (split + merge ratios), appropriately simple

Root Cause Analysis

Problem Being Solved: Identifying architectural drift between logical code dependencies and directory structure to guide refactoring decisions.
Why This Approach: Louvain algorithm is well-established for community detection in graphs, and the comparison to directory structure provides actionable insights.
Risk Assessment: Very low - pure analysis feature with no modification to core graph building or parsing functionality.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-designed, thoroughly tested feature that adds significant value to the codebase analysis toolkit. The implementation is secure, performant, and follows all established patterns. The only minor limitation is the lack of sophisticated CLI formatting, but this is consistent with the project's current approach and doesn't detract from functionality.

@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR adds Louvain community detection to analyze module boundaries and architectural drift. The implementation is well-structured with comprehensive test coverage (13 tests, 710 total passing).

Key strengths:

  • Clean separation of concerns: graph construction, clustering, drift analysis, and display logic are properly isolated
  • Robust edge case handling: empty graphs, self-loops, duplicate edges, and missing dependencies all handled gracefully
  • Smart integration: async imports with try-catch in stats() allow graceful degradation if graphology is unavailable
  • Proper SQL safety: uses prepared statements, read-only operations, and validates node existence before creating edges
  • Comprehensive testing: covers both file-level and function-level modes, resolution parameters, drift-only mode, and empty graph scenarios

Technical details:

  • Builds undirected graphs from SQLite (file nodes + imports OR function nodes + calls)
  • Filters self-loops and deduplicates edges correctly
  • Drift analysis identifies split candidates (directories spanning communities) and merge candidates (communities spanning directories)
  • Modularity score properly bounded 0-1, drift score normalized 0-100
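
As a rough sketch of the edge handling listed above (undirected edges, self-loop filtering, deduplication, and skipping edges whose endpoints are not known nodes), assuming hypothetical row objects of the form `{ source, target }`. The function name `toUndirectedEdges` and the row shape are illustrative and not taken from the PR.

```javascript
// Hypothetical sketch: normalise directed dependency rows into a
// deduplicated, undirected edge list with no self-loops.
function toUndirectedEdges(rows, knownNodes) {
  const seen = new Set();
  const result = [];
  for (const { source, target } of rows) {
    if (source === target) continue; // drop self-loops
    if (!knownNodes.has(source) || !knownNodes.has(target)) continue; // endpoint missing
    // Order the pair so that a->b and b->a map to the same key.
    const pair = source < target ? [source, target] : [target, source];
    const key = JSON.stringify(pair);
    if (seen.has(key)) continue; // duplicate edge
    seen.add(key);
    result.push(pair);
  }
  return result;
}

const sampleRows = [
  { source: "a.js", target: "b.js" },
  { source: "b.js", target: "a.js" }, // same undirected edge as above
  { source: "a.js", target: "a.js" }, // self-loop
  { source: "a.js", target: "ghost.js" }, // target is not a known node
  { source: "b.js", target: "c.js" },
];
const normalisedEdges = toUndirectedEdges(sampleRows, new Set(["a.js", "b.js", "c.js"]));
console.log(normalisedEdges); // [ [ 'a.js', 'b.js' ], [ 'b.js', 'c.js' ] ]
```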

The code follows project conventions, has no security issues, and integrates cleanly into CLI, MCP, and programmatic APIs.

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • The implementation is thoroughly tested, handles edge cases properly, uses safe SQL practices, integrates cleanly with existing code, and follows established project patterns. No logical errors, security vulnerabilities, or breaking changes detected.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/communities.js | New core module implementing Louvain clustering with proper graph construction, drift analysis, and edge case handling |
| src/cli.js | Added communities command with proper option parsing and async handler for stats integration |
| src/mcp.js | Added communities tool to MCP server with proper input schema and async import handling |
| src/queries.js | Integrated communities summary into stats with graceful fallback for missing dependencies |
| tests/integration/communities.test.js | Comprehensive test suite with 13 tests covering file/function-level modes, drift analysis, and edge cases |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI[CLI: codegraph communities] --> CommData[communitiesData]
    Stats[CLI: codegraph stats] --> CommSummary[communitySummaryForStats]
    MCP[MCP Server] --> CommData
    API[Programmatic API] --> CommData
    
    CommData --> BuildGraph[buildGraphologyGraph]
    CommSummary --> CommData
    
    BuildGraph --> DB[(SQLite DB)]
    BuildGraph --> FileLevel{Mode?}
    FileLevel -->|file-level| FileNodes[nodes: files<br/>edges: imports]
    FileLevel -->|function-level| FnNodes[nodes: functions<br/>edges: calls]
    
    FileNodes --> GraphObj[Graphology Graph]
    FnNodes --> GraphObj
    
    GraphObj --> Louvain[Louvain Algorithm]
    Louvain --> Communities[Community Assignments]
    
    Communities --> DirAnalysis[Directory Analysis]
    DirAnalysis --> Split[Split Candidates]
    DirAnalysis --> Merge[Merge Candidates]
    DirAnalysis --> DriftScore[Drift Score 0-100]
    
    Communities --> Output[JSON Output]
    Split --> Output
    Merge --> Output
    DriftScore --> Output

Last reviewed commit: cc28daa

Contributor

@greptile-apps greptile-apps bot left a comment

9 files reviewed, no comments

@carlos-alm carlos-alm merged commit f3e36ad into main Feb 26, 2026
18 checks passed
@carlos-alm carlos-alm deleted the feat/community-detection branch February 26, 2026 23:35
carlos-alm pushed a commit that referenced this pull request Feb 27, 2026
Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(#130/#139), Louvain community detection (#133/#134), and manifesto rule
engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.
carlos-alm added a commit that referenced this pull request Feb 27, 2026
* fix: strict type validation for threshold values in complexity queries

Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()`
to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently
coercing into valid SQL values. Add integration test verifying exceeds
arrays and summary.aboveWarn are correctly computed.

Addresses Greptile review feedback on #136.

Impact: 2 functions changed, 3 affected
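
A minimal sketch of the difference between the two checks described in this fix. The helper names `looseCheck` and `strictCheck` and the sample values are hypothetical; only the two predicates themselves come from the commit message.

```javascript
// The loose check accepts anything that is not null/undefined.
const looseCheck = (value) => value != null;
// The strict check accepts only finite numbers.
const strictCheck = (value) => typeof value === "number" && Number.isFinite(value);

// Non-numeric inputs that coerce silently into plausible thresholds.
console.log(Number(""), Number(null), Number(true)); // 0 0 1

const thresholdCandidates = ["", true, "10", NaN, Infinity, 0, 15];
const looseAccepted = thresholdCandidates.filter(looseCheck);
const strictAccepted = thresholdCandidates.filter(strictCheck);

console.log(looseAccepted.length); // 7, every candidate passes
console.log(strictAccepted); // [ 0, 15 ]
```

Note that the strict check still accepts 0, which a truthiness test would wrongly reject.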

* docs: add complexity, communities, and manifesto to all docs

Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(#130/#139), Louvain community detection (#133/#134), and manifesto rule
engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.

* fix: remove redundant condition in paginate guard clauses

When limit === undefined, limit !== 0 is always true — the && check
was dead code. Simplified to just check limit === undefined.

Impact: 2 functions changed, 18 affected
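
The redundancy can be checked directly. In the sketch below, `guardBefore` and `guardAfter` model the two forms of the condition from the commit message, and `paginateSketch` is a hypothetical helper whose name, signature, and slicing behaviour are assumptions rather than the project's actual paginate code.

```javascript
// The guard before and after the simplification.
const guardBefore = (limit) => limit === undefined && limit !== 0;
const guardAfter = (limit) => limit === undefined;

// Both forms agree for every input, so the extra clause was dead code.
const sampleLimits = [undefined, null, 0, 5, "5"];
const guardsAgree = sampleLimits.every((limit) => guardBefore(limit) === guardAfter(limit));
console.log(guardsAgree); // true

// Hypothetical paginate helper using the simplified guard.
function paginateSketch(items, { limit, offset = 0 } = {}) {
  if (limit === undefined) return items; // no limit requested: return everything
  return items.slice(offset, offset + limit);
}

console.log(paginateSketch([1, 2, 3])); // [ 1, 2, 3 ]
console.log(paginateSketch([1, 2, 3], { limit: 2, offset: 1 })); // [ 2, 3 ]
```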

* docs: update dogfood report with fix statuses

All 4 bugs now fixed (PR #117 merged, #116 closed via reverse-dep
cascade). 3 of 4 suggestions addressed. MCP tool counts updated
18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.

* fix: rename misleading test to match actual behavior

Test was named "handles non-numeric thresholds gracefully" but only
validated baseline exceeds/aboveWarn with valid thresholds. Actual
non-numeric threshold tests exist separately. Renamed to "produces
correct exceeds and aboveWarn with valid thresholds".

* fix: update stale MCP tool count in dogfood skill (21→24)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
carlos-alm added a commit that referenced this pull request Feb 27, 2026
* fix: strict type validation for threshold values in complexity queries

Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()`
to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently
coercing into valid SQL values. Add integration test verifying exceeds
arrays and summary.aboveWarn are correctly computed.

Addresses Greptile review feedback on #136.

Impact: 2 functions changed, 3 affected

* docs: add complexity, communities, and manifesto to all docs

Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices,
and CLI/MCP examples to reflect today's merged PRs: complexity metrics
(#130/#139), Louvain community detection (#133/#134), and manifesto rule
engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo),
marks backlog items 6/11/21/22 as done, and adds real CLI output examples.

* fix: remove redundant condition in paginate guard clauses

When limit === undefined, limit !== 0 is always true — the && check
was dead code. Simplified to just check limit === undefined.

Impact: 2 functions changed, 18 affected

* docs: update dogfood report with fix statuses

All 4 bugs now fixed (PR #117 merged, #116 closed via reverse-dep
cascade). 3 of 4 suggestions addressed. MCP tool counts updated
18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.

* fix: rename misleading test to match actual behavior

Test was named "handles non-numeric thresholds gracefully" but only
validated baseline exceeds/aboveWarn with valid thresholds. Actual
non-numeric threshold tests exist separately. Renamed to "produces
correct exceeds and aboveWarn with valid thresholds".

* fix: update stale MCP tool count in dogfood skill (21→24)

* feat: add complexity analysis for Python, Go, Rust, Java, C#, Ruby, PHP

Parameterize the complexity algorithm to support all 10 languages instead
of just JS/TS/TSX. Add per-language COMPLEXITY_RULES, HALSTEAD_RULES, and
COMMENT_PREFIXES with three else-if detection patterns (else-wraps-if,
explicit elif, alternative field). Guard against tree-sitter keyword leaf
tokens that share node type names with their parent constructs.

Impact: 4 functions changed, 4 affected

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>