feat: Add AST-based code chunking support with new splitter and confi… #23

rostilos · 2025-12-30T09:40:12Z

…guration options

coderabbitai · 2025-12-30T09:40:20Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ent in indexing

…namic imports

…ting in repository. Fix issue with non-resolving issues duplicates ( branch reconciliation )

…ed issue reconciliation

…status handling for previous issues

…ure correct handling of duplicate issues

… tokens and enhance connection handling

… issue deduplication; enhance prompt builder with detailed instructions

rostilos · 2025-12-31T11:34:13Z

/codecrow analyze

codecrow-ai · 2025-12-31T11:34:15Z

⚠️ CodeCrow Command Failed

Comment commands are not enabled for this project

Check the job logs in CodeCrow for detailed error information.

codecrow-local · 2025-12-31T11:34:42Z

⚠️ Code Analysis Results

Summary

The pull request introduces a major upgrade to the RAG pipeline with AST-based chunking, an issue post-processing service to improve AI review quality, and critical fixes for Bitbucket Connect App authentication. While the architectural improvements are significant, there are several security and reliability concerns regarding shared secret handling, potential resource leaks in the indexing process, and inconsistent use of identifier fields in the Java backend.

Issues Overview

Severity	Count
🔴 High	1	Critical issues requiring immediate attention
🟡 Medium	2	Issues that should be addressed
🔵 Low	1	Minor issues and improvements

Detailed Issues

🔴 High Severity Issues

Id on Platform: 666

Category: 🔒 Security

File: .../vcsclient/VcsClientProvider.java:307

Issue: In refreshBitbucketConnectAppConnection, if decryption fails, the code falls back to using the shared secret 'as-is'. This bypasses security controls and encourages storing secrets in plaintext. Secrets should always be encrypted at rest.

💡 Suggested Fix

Remove the fallback to plaintext. If the secret cannot be decrypted, it should be considered an invalid configuration and throw an exception.

--- a/java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/VcsClientProvider.java
+++ b/java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/VcsClientProvider.java
@@ -301,8 +301,7 @@
         try {
             decryptedSecret = encryptionService.decrypt(sharedSecret);
         } catch (Exception e) {
-            // Shared secret might be stored unencrypted (old installation) - use as-is
-            log.warn("Could not decrypt shared secret, using as stored: {}", e.getMessage());
-            decryptedSecret = sharedSecret;
+            log.error("Failed to decrypt shared secret for client key: {}", installation.getClientKey());
+            throw new VcsClientException("Invalid or corrupted shared secret encryption", e);
         }

View Issue Details

🟡 Medium Severity Issues

Id on Platform: 667

Category: ⚡ Performance

File: .../core/index_manager.py:331

Issue: The indexing process pre-computes embeddings for the entire batch of 100 chunks in memory before insertion. For very large documents or high-dimensional embeddings, this could lead to memory pressure or OOM (Out Of Memory) errors, especially since GC is only called every 10 batches.

💡 Suggested Fix

Reduce the embedding batch size or implement more aggressive cleanup of the text and embedding buffers.

--- a/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager.py
+++ b/python-ecosystem/rag-pipeline/src/rag_pipeline/core/index_manager.py
@@ -328,1 +328,1 @@
-            embedding_batch_size = 100  # Batch size for embedding API calls
+            embedding_batch_size = 32  # Smaller batch size to reduce memory pressure

View Issue Details

Id on Platform: 668

Category: 🐛 Bug Risk

File: .../analysis/BranchAnalysisProcessor.java:476

Issue: The reconciliation logic was changed to look for 'id' instead of 'issueId'. While this matches the new Pydantic model, several parts of the system (like incremental_review_prompt) still refer to 'issueId', which will cause reconciliation to fail for those components.

💡 Suggested Fix

The AI response should ideally provide both or the backend should check both keys to maintain backward compatibility during the migration to the new 'id' field.

--- a/java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java
+++ b/java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java
@@ -475,1 +475,4 @@
-        Object issueIdFromAi = issueData.get("id");
+        Object issueIdFromAi = issueData.get("id");
+        if (issueIdFromAi == null) {
+            issueIdFromAi = issueData.get("issueId");
+        }

View Issue Details

🔵 Low Severity Issues

Id on Platform: 669

Category: ✨ Best Practices

File: .../service/issue_post_processor.py:450

Issue: The IssuePostProcessor uses a custom SequenceMatcher ratio for similarity. While effective, it doesn't account for semantic meaning of code. Additionally, the line drift correction assumes the 'reason' contains enough keywords, which might not be true for style-only issues.

💡 Suggested Fix

Add a check to ensure at least one high-confidence keyword is found before attempting line correction to avoid 'false correction' to a random similar-looking line.

--- a/python-ecosystem/mcp-client/service/issue_post_processor.py
+++ b/python-ecosystem/mcp-client/service/issue_post_processor.py
@@ -242,1 +242,1 @@
-        return best_match_line if best_match_score > 0 else None
+        return best_match_line if best_match_score > 1.5 else None

View Issue Details

Files Affected

.../core/index_manager.py: 1 issue
.../service/issue_post_processor.py: 1 issue
.../vcsclient/VcsClientProvider.java: 1 issue
.../analysis/BranchAnalysisProcessor.java: 1 issue

Analysis completed on 2025-12-31 11:34:40 | View Full Report | Pull Request

feat: Add AST-based code chunking support with new splitter and confi…

cd5aa00

…guration options

rostilos added 8 commits December 30, 2025 13:06

feat: Update branch handling in RAG queries and improve alias managem…

e7998ac

…ent in indexing

feat: Enhance language module mapping in AST splitter for improved dy…

46005f1

…namic imports

feat: Add pagination support for branch issues and enhance issue coun…

0e121fd

…ting in repository. Fix issue with non-resolving issues duplicates ( branch reconciliation )

fix: Correct issue ID retrieval in BranchAnalysisProcessor for improv…

997edf7

…ed issue reconciliation

feat: Update prompt builder to enhance error grouping and resolution …

dc7c08c

…status handling for previous issues

fix: Clarify instructions for error grouping in prompt builder to ens…

dcb3a80

…ure correct handling of duplicate issues

feat: Implement JWT Bearer grant for refreshing Bitbucket Connect App…

f67b6f5

… tokens and enhance connection handling

feat: Add Issue Post-Processor service for line number validation and…

1f233d3

… issue deduplication; enhance prompt builder with detailed instructions

rostilos merged commit 7cd9b05 into 0.2.0-rc Dec 31, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add AST-based code chunking support with new splitter and confi… #23

feat: Add AST-based code chunking support with new splitter and confi… #23

Uh oh!

rostilos commented Dec 30, 2025

Uh oh!

coderabbitai bot commented Dec 30, 2025

Review skipped

Uh oh!

rostilos commented Dec 31, 2025

Uh oh!

codecrow-ai bot commented Dec 31, 2025 •

edited

Loading

Uh oh!

Uh oh!

codecrow-local bot commented Dec 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Add AST-based code chunking support with new splitter and confi… #23

feat: Add AST-based code chunking support with new splitter and confi… #23

Uh oh!

Conversation

rostilos commented Dec 30, 2025

Uh oh!

coderabbitai bot commented Dec 30, 2025

Review skipped

Uh oh!

rostilos commented Dec 31, 2025

Uh oh!

codecrow-ai bot commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

codecrow-local bot commented Dec 31, 2025

⚠️ Code Analysis Results

Summary

Issues Overview

Detailed Issues

🔴 High Severity Issues

🟡 Medium Severity Issues

🔵 Low Severity Issues

Files Affected

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecrow-ai bot commented Dec 31, 2025 •

edited

Loading