feat: louvain community detection + fix complexity build regression by carlos-alm · Pull Request #134 · optave/codegraph

carlos-alm · 2026-02-26T23:37:40Z

Summary

Community detection: Add Louvain-based community detection for module boundary analysis (codegraph communities CLI, MCP tool, programmatic API). Uses graphology-communities-louvain to partition the function-level dependency graph into communities, revealing natural module boundaries and cross-community coupling
Complexity perf fix: Eliminate redundant file re-parsing in buildComplexityMetrics by caching WASM parse trees from parseFilesAuto and passing them through. Addresses the ~2x build regression from PR feat: cognitive & cyclomatic complexity metrics #130 (native 2.1→4.7 ms/file, WASM 6.6→9.4 ms/file)

Changes

Community detection (`cc28daa`)

New src/communities.js — Louvain partitioning, community stats, bridge edge detection
CLI: codegraph communities command with --min-size, --json, --no-tests flags
MCP: detect_communities tool exposed in both single-repo and multi-repo modes
Programmatic API: exported from src/index.js
13 integration tests covering partitioning, filtering, JSON output, bridge edges
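
To illustrate what "bridge edge detection" means here, a minimal sketch follows. It assumes a community assignment is already available as a `Map` from node name to community id; the helper name `findBridgeEdges` and the edge shape are hypothetical and are not taken from src/communities.js.

```javascript
// Hypothetical helper: names and data shapes are illustrative only.
// A "bridge" here is an edge whose endpoints sit in different communities.
function findBridgeEdges(edges, communityOf) {
  const bridges = [];
  for (const { source, target } of edges) {
    const from = communityOf.get(source);
    const to = communityOf.get(target);
    // Skip edges whose endpoints were not assigned a community.
    if (from === undefined || to === undefined) continue;
    if (from !== to) bridges.push({ source, target, from, to });
  }
  return bridges;
}

// Toy assignment: two communities of two functions each.
const communityOf = new Map([
  ['parse', 0], ['tokenize', 0],
  ['render', 1], ['format', 1],
]);
const edges = [
  { source: 'parse', target: 'tokenize' }, // inside community 0
  { source: 'render', target: 'format' },  // inside community 1
  { source: 'render', target: 'parse' },   // crosses 1 -> 0
];
const bridges = findBridgeEdges(edges, communityOf);
console.log(bridges);
```

Counting such edges per community pair is one simple way to quantify cross-community coupling.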

Complexity perf fix (`62e48db`)

src/parser.js: wasmExtractSymbols returns { symbols, tree, langId }; parseFilesAuto attaches _tree/_langId to symbols objects
src/complexity.js: buildComplexityMetrics uses cached trees, only initializes WASM parsers when fallback is needed (native engine path)
src/builder.js: Nulls out tree references after complexity analysis for prompt GC
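
The caching idea can be sketched in isolation as below. This is not the code from src/parser.js or src/complexity.js: `fakeParse`, `extractSymbols`, and `computeMetrics` are hypothetical stand-ins, and only the `_tree`/`_langId` property names come from the description above.

```javascript
// Stand-in for an expensive parser; counts how often it runs.
let parseCalls = 0;
function fakeParse(source) {
  parseCalls += 1;
  return { rootNode: { text: source } };
}

// Symbol extraction parses once and keeps the tree on the result object.
function extractSymbols(source) {
  const tree = fakeParse(source);
  return { definitions: [], _tree: tree, _langId: 'javascript' };
}

// Metrics reuse the cached tree when present and re-parse only as a fallback.
function computeMetrics(symbols, source) {
  const tree = symbols._tree ?? fakeParse(source);
  const metrics = { length: tree.rootNode.text.length };
  symbols._tree = null; // drop the reference so the tree can be collected
  return metrics;
}

const source = 'function f() { return 1; }';
const cached = extractSymbols(source);        // first parse
const m1 = computeMetrics(cached, source);    // reuses the cached tree
const uncached = { definitions: [] };         // e.g. a path that produced no tree
const m2 = computeMetrics(uncached, source);  // fallback: second parse
console.log(parseCalls, m1, m2);
```

In this sketch two files cost two parses instead of three, which is the kind of saving the fix targets.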

Test plan

npx vitest run tests/integration/communities.test.js — 13 community detection tests
npx vitest run tests/unit/complexity.test.js tests/integration/complexity.test.js — 30 complexity tests
npm test — full suite (710 pass, 13 skipped)
node src/cli.js build . — verify build performance regression is eliminated

Add `codegraph communities` command that runs Louvain clustering on the dependency graph, compares discovered communities against directory structure, and surfaces architectural drift (split/merge candidates, drift score). Supports file-level (default) and function-level modes, configurable resolution, and drift-only output. Integrated into stats, MCP, and programmatic API. Impact: 9 functions changed, 8 affected

Cache WASM parse trees from parseFilesAuto and pass them to buildComplexityMetrics, avoiding redundant parser init, file I/O, and AST re-parsing. Native engine falls back to re-parsing as before. Trees are nulled after use to allow prompt GC. Impact: 5 functions changed, 7 affected

claude · 2026-02-26T23:37:56Z

Claude finished @carlos-alm's task

PR Review: Community Detection + Complexity Performance Fix ✅

Completed a comprehensive review of this 804-line PR, which adds the community detection feature and the complexity performance fix.

Review Checklist:

Phase 1: Root cause analysis - understand problems being solved
Phase 2: Configuration change detection - check for rule relaxations
Phase 3: Test integrity analysis - verify test quality
Phase 4: Code quality review - bugs, security, performance
Final Assessment: Rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved:

Community Detection: New feature to identify natural module boundaries using Louvain algorithm - legitimate architectural analysis enhancement
Complexity Performance Regression: PR feat: cognitive & cyclomatic complexity metrics #130 caused ~2x slowdown (2.1→4.7 ms/file native, 6.6→9.4 ms/file WASM) due to redundant file re-parsing

Why This Approach:

Standard Louvain algorithm via graphology-communities-louvain for graph partitioning
Performance fix caches WASM parse trees from parseFilesAuto, reuses in buildComplexityMetrics
Both address real problems with appropriate solutions

Risk Assessment: ✅ LOW RISK - No symptom masking or workarounds

Phase 2: Configuration Change Detection ✅

Analysis: ✅ NO HIGH-RISK CHANGES

No biome.json, vitest.config.js, or TypeScript config changes
Only legitimate dependency: graphology-communities-louvain

Phase 3: Test Integrity Analysis ✅

Test Coverage: ✅ COMPREHENSIVE AND RIGOROUS

13 new integration tests with hand-crafted multi-directory fixture database
Strong assertions: Modularity bounds (0-1), community structure validation, drift analysis
No weakened expectations: Proper bounds checking throughout
Multi-scenario coverage: File/function level, filtering, JSON output
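
For context on the modularity assertions, here is a minimal sketch of how modularity can be computed for a fixed partition of an undirected, unweighted graph, using Q = Σ_c [ L_c/m − (d_c/2m)² ]. The function and the toy graph are illustrative assumptions and do not reproduce the PR's tests or the library's implementation.

```javascript
// Modularity of a given partition (undirected, unweighted edges).
// L_c = edges inside community c, d_c = total degree of c, m = edge count.
function modularity(edges, communityOf) {
  const m = edges.length;
  const internal = new Map(); // community -> number of internal edges
  const degree = new Map();   // community -> sum of node degrees
  const bump = (map, key, by) => map.set(key, (map.get(key) ?? 0) + by);
  for (const [u, v] of edges) {
    const cu = communityOf[u];
    const cv = communityOf[v];
    bump(degree, cu, 1);
    bump(degree, cv, 1);
    if (cu === cv) bump(internal, cu, 1);
  }
  let q = 0;
  for (const [c, d] of degree) {
    q += (internal.get(c) ?? 0) / m - (d / (2 * m)) ** 2;
  }
  return q;
}

// Two triangles joined by a single edge, one community per triangle.
const toyEdges = [
  ['a', 'b'], ['b', 'c'], ['a', 'c'],
  ['d', 'e'], ['e', 'f'], ['d', 'f'],
  ['c', 'd'],
];
const toyPartition = { a: 0, b: 0, c: 0, d: 1, e: 1, f: 1 };
const q = modularity(toyEdges, toyPartition);
console.log(q.toFixed(4)); // 5/14, about 0.3571
```

Putting every node in a single community gives Q = 0 under this formula, so a positive score indicates more internal edges than a random baseline would produce.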

Phase 4: Code Quality Review ✅

Security: ✅ NO VULNERABILITIES

No SQL injection risks - uses prepared statements throughout
No command injection - only safe string template literals for console output
No eval/Function usage or unsafe child process execution

Memory Management: ✅ PROPER RESOURCE CLEANUP

Database connections properly closed: db.close() in communities.js:103
Parse tree cleanup: symbols._tree = null in complexity.js:319 and builder.js:453
Map/Set objects properly scoped and garbage collected

Error Handling: ⚠️ MINOR CONCERN

communities.js lacks try-catch around Louvain algorithm (line 116)
However: Underlying graphology-communities-louvain is well-tested and stable
Edge case handled: Empty graphs return safe default structure (lines 106-112)

Performance: ✅ OPTIMIZED DESIGN

Lazy loading: WASM parsers only initialized when needed (complexity.js:238-256)
Efficient caching: _tree and _langId attached to symbols in parseFilesAuto
Memory conscious: Trees nulled after use for prompt GC
Algorithm efficiency: Louvain O(n log n), appropriate for dependency graphs

Code Quality: ✅ HIGH QUALITY

Clean separation of concerns: graph construction, analysis, CLI display
Comprehensive JSDoc documentation with type information
Consistent error handling patterns with existing codebase
Follows existing code conventions and patterns

Final Assessment

Critical Concerns

None identified.

Minor Recommendations

Consider adding try-catch around louvain.detailed() call (communities.js:116) for defensive coding
Consider adding input validation for resolution parameter (currently accepts any float)
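
Both recommendations could look roughly like the sketch below. `parseResolution` and `detectSafely` are hypothetical helpers, the default of 1 and the positive-only rule are assumptions, and the fallback shape only loosely mirrors the "safe default structure" mentioned above.

```javascript
// Hypothetical validator: accept only positive finite numbers.
function parseResolution(value, fallback = 1) {
  if (value === undefined) return fallback;
  if (typeof value !== 'number' || !Number.isFinite(value) || value <= 0) {
    throw new RangeError(
      `resolution must be a positive finite number, got ${String(value)}`
    );
  }
  return value;
}

// Hypothetical wrapper: run any detection function and degrade gracefully.
function detectSafely(detect, graph, options) {
  try {
    return { ok: true, result: detect(graph, options) };
  } catch (err) {
    // Assumed fallback shape: no communities, zero modularity.
    return {
      ok: false,
      error: err.message,
      result: { communities: {}, modularity: 0 },
    };
  }
}

// Usage with a detector that fails on purpose.
const failing = () => { throw new Error('boom'); };
const outcome = detectSafely(failing, {}, { resolution: parseResolution(1.5) });
console.log(outcome.ok, outcome.error);
```

Whether to swallow the error or rethrow it with more context is a design choice; the wrapper above favours keeping the CLI output stable.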

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5) - HIGH QUALITY WITH MINOR SUGGESTIONS
Action: APPROVE WITH CONDITIONS - Address minor error handling suggestion if desired
Reasoning: Well-architected solution to real problems. Comprehensive testing, proper security practices, efficient performance fix. The community detection feature adds genuine value for architectural analysis, and the complexity performance regression is properly addressed at its root cause.

greptile-apps · 2026-02-26T23:41:21Z

Greptile Summary

This PR adds Louvain-based community detection for module boundary analysis and eliminates a ~2x performance regression in complexity metrics by caching WASM parse trees.

Community Detection (cc28daa)

Introduces src/communities.js with Louvain algorithm integration via graphology libraries
Builds function-level or file-level dependency graphs from the SQLite database
Detects natural module boundaries and compares them against directory structure
Provides drift analysis showing directories split across communities and communities spanning multiple directories
Exposed via CLI (codegraph communities), MCP tool, and programmatic API
Comprehensive test coverage with 13 integration tests
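
The drift comparison described above can be sketched as follows, assuming a plain object that maps file paths to community ids. `analyzeDrift` and the `splitCandidates`/`mergeCandidates` field names are hypothetical and may not match the output of src/communities.js.

```javascript
// Compare detected communities against the directory layout.
function analyzeDrift(fileToCommunity) {
  const dirToCommunities = new Map();
  const communityToDirs = new Map();
  const add = (map, key, value) => {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(value);
  };
  for (const [file, community] of Object.entries(fileToCommunity)) {
    const slash = file.lastIndexOf('/');
    const dir = slash === -1 ? '.' : file.slice(0, slash);
    add(dirToCommunities, dir, community);
    add(communityToDirs, community, dir);
  }
  // Keep only keys that map to more than one counterpart.
  const multi = (map) =>
    [...map]
      .filter(([, set]) => set.size > 1)
      .map(([key, set]) => ({ key, members: [...set] }));
  return {
    splitCandidates: multi(dirToCommunities), // directory spread over communities
    mergeCandidates: multi(communityToDirs),  // community spanning directories
  };
}

const drift = analyzeDrift({
  'src/parser/lexer.js': 0,
  'src/parser/ast.js': 0,
  'src/cli/main.js': 1,
  'src/cli/legacy.js': 0,
  'src/render/html.js': 1,
});
console.log(JSON.stringify(drift, null, 2));
```

In the toy input, `src/cli` is split across two communities, and both communities span more than one directory.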

Complexity Performance Fix (62e48db)

Addresses the build regression introduced in PR feat: cognitive & cyclomatic complexity metrics #130 where files were parsed twice (once for symbols, once for complexity)
parser.js: wasmExtractSymbols now returns { symbols, tree, langId } and parseFilesAuto caches trees on symbols objects
complexity.js: Reuses cached trees when available, only initializes WASM parsers for fallback when native engine was used
builder.js: Nulls out tree references after complexity analysis for garbage collection
Performance improvement: native parsing, which had regressed from 2.1 to 4.7 ms/file, returns to ~2.1 ms/file; WASM parsing, which had regressed from 6.6 to 9.4 ms/file, returns to ~6.6 ms/file

Confidence Score: 5/5

Safe to merge - well-tested feature addition with important performance fix
Both changes are well-architected with proper error handling, comprehensive test coverage, and no breaking changes. The complexity fix correctly handles both engine paths with appropriate fallback logic. Community detection is properly integrated across CLI, MCP, and API surfaces with graceful degradation.
No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/communities.js | New file implementing Louvain community detection with graph construction, drift analysis, and CLI/API integration |
| src/parser.js | Modified `wasmExtractSymbols` to return tree/langId; `parseFilesAuto` caches trees on symbols for complexity reuse |
| src/complexity.js | Uses cached WASM trees from parser, only initializes parsers for fallback when native engine was used |
| src/builder.js | Nulls out cached tree references after complexity analysis for prompt GC |
| src/cli.js | Added `communities` command with resolution/drift/json options; made `stats` command async for community summary |
| src/mcp.js | Added `communities` tool with proper schema and handler for MCP integration |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[buildGraph: parseFilesAuto] --> B{Engine Type?}
    B -->|Native| C[Native parser: no tree]
    B -->|WASM| D[WASM parser: extract tree]
    C --> E[symbols without _tree/_langId]
    D --> F[symbols._tree = tree<br/>symbols._langId = langId]
    E --> G[buildComplexityMetrics]
    F --> G
    G --> H{Has cached tree?}
    H -->|Yes| I[Use cached tree directly]
    H -->|No| J[Fallback: re-parse with WASM]
    I --> K[Compute complexity metrics]
    J --> K
    K --> L[symbols._tree = null]
    L --> M[Continue to next file]
    M --> N[builder.js: final cleanup<br/>null all _tree/_langId]

Last reviewed commit: 62e48db

greptile-apps

12 files reviewed, no comments


* fix: strict type validation for threshold values in complexity queries. Replace loose `!= null` checks with `typeof === 'number' && Number.isFinite()` to prevent `Number("")`, `Number(null)`, and `Number(true)` from silently coercing into valid SQL values. Add integration test verifying exceeds arrays and summary.aboveWarn are correctly computed. Addresses Greptile review feedback on #136. Impact: 2 functions changed, 3 affected
* docs: add complexity, communities, and manifesto to all docs. Update README, CLAUDE.md, BACKLOG, titan-paradigm, recommended-practices, and CLI/MCP examples to reflect today's merged PRs: complexity metrics (#130/#139), Louvain community detection (#133/#134), and manifesto rule engine (#138). Updates MCP tool count from 21 to 24 (25 in multi-repo), marks backlog items 6/11/21/22 as done, and adds real CLI output examples.
* fix: remove redundant condition in paginate guard clauses. When limit === undefined, limit !== 0 is always true, so the && check was dead code. Simplified to just check limit === undefined. Impact: 2 functions changed, 18 affected
* docs: update dogfood report with fix statuses. All 4 bugs now fixed (PR #117 merged, #116 closed via reverse-dep cascade). 3 of 4 suggestions addressed. MCP tool counts updated 18→23 / 19→24. Rating upgraded 7/10 → 9/10 post-fix.
* fix: rename misleading test to match actual behavior. Test was named "handles non-numeric thresholds gracefully" but only validated baseline exceeds/aboveWarn with valid thresholds. Actual non-numeric threshold tests exist separately. Renamed to "produces correct exceeds and aboveWarn with valid thresholds".
* fix: update stale MCP tool count in dogfood skill (21→24)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
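
The difference between the loose and strict threshold checks mentioned in the first commit can be seen in a small sketch. The helper names `looseIsSet` and `strictIsNumber` are hypothetical; only the two check expressions follow the commit message.

```javascript
// Loose check: anything other than null/undefined is treated as "set".
const looseIsSet = (v) => v != null;
// Strict check: only finite numbers are accepted.
const strictIsNumber = (v) => typeof v === 'number' && Number.isFinite(v);

const candidates = ['', true, '15', NaN, Infinity, 15];

for (const value of candidates) {
  // Number('') is 0 and Number(true) is 1, so a loose check followed by
  // Number() can turn non-numeric input into a plausible-looking threshold.
  console.log(
    JSON.stringify(value) ?? String(value),
    'loose:', looseIsSet(value),
    'strict:', strictIsNumber(value),
    'coerced:', Number(value)
  );
}

const accepted = candidates.filter(strictIsNumber);
console.log(accepted);
```

Every candidate passes the loose check, while only the last one passes the strict check.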

* feat: add complexity analysis for Python, Go, Rust, Java, C#, Ruby, PHP. Parameterize the complexity algorithm to support all 10 languages instead of just JS/TS/TSX. Add per-language COMPLEXITY_RULES, HALSTEAD_RULES, and COMMENT_PREFIXES with three else-if detection patterns (else-wraps-if, explicit elif, alternative field). Guard against tree-sitter keyword leaf tokens that share node type names with their parent constructs. Impact: 4 functions changed, 4 affected

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-actions bot added 2 commits February 26, 2026 16:12

greptile-apps bot reviewed Feb 26, 2026

carlos-alm merged commit 0337318 into main Feb 26, 2026
18 checks passed

carlos-alm deleted the feat/community-detection branch February 26, 2026 23:44

carlos-alm mentioned this pull request Feb 27, 2026

feat: halstead metrics, maintainability index, and docs update #142

Closed

7 tasks

carlos-alm mentioned this pull request Feb 27, 2026

docs: complexity, communities, manifesto across all docs #144

Merged

4 tasks

carlos-alm merged 2 commits into main from feat/community-detection