perf: optimize technology detection with content caching and pre-discovery by yacosta738 · Pull Request #421 · dallay/agentsync

yacosta738 · 2026-05-17T10:00:07Z

Optimized the technology detection system by implementing content caching for shared configuration files and pre-discovering layout-specific markers (like Gradle files). This eliminates redundant filesystem I/O and heap allocations during the evaluation of detection rules, making the agentsync skill suggest command measurably faster.

PR created automatically by Jules for task 9856788744421460774 started by @yacosta738

…overy - Implement per-project content cache in `CatalogDrivenDetector` to ensure shared config files (e.g. setup.py, package.json) are read from disk at most once. - Pre-discover Gradle layout files during the primary `RepoMetadata` walk, eliminating redundant O(Rules * Files) filesystem checks and dynamic path constructions in the detection loop. - Pre-compile `PathBuf`s for detection rules to avoid repeated heap allocations and string parsing during rule evaluation. - Document performance learning in `.agents/journal/bolt.md`. These optimizations move technology detection from disk-bound to memory-bound, significantly improving the performance of the `skill suggest` command in large or nested projects.

google-labs-jules · 2026-05-17T10:00:08Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-05-17T10:00:21Z

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Improved technology detection performance through per-project content caching, eliminating repeated file reads when evaluating multiple technology rules.
- Optimized project scanning efficiency by pre-discovering layout-specific markers during the initial scan, reducing redundant disk I/O and dynamic path construction during rule evaluation.

Walkthrough

This PR optimizes technology detection by introducing per-project content caching and pre-discovering Gradle layout markers during the initial filesystem walk. RepoMetadata now collects Gradle files, the detection pipeline maintains a shared content cache cleared between nested projects, and rule evaluation leverages cached file content to reduce repeated disk reads.

Changes

Technology Detection Caching and Gradle Pre-Discovery

Layer / File(s)	Summary
Gradle file pre-discovery in metadata collection `src/skills/detect.rs`	`RepoMetadata` adds a `gradle_files` field to capture Gradle-related files (`build.gradle`, `settings.gradle`) discovered at root level and selected depth-2 paths during the bounded `WalkDir` walk, replacing legacy root-dir tracking.
Content cache threading through detection pipeline `src/skills/detect.rs`	`CatalogDrivenDetector::detect` initializes a per-project `content_cache` and passes it to both root and nested `evaluate_rules()` invocations; cache is cleared between nested projects to isolate config content per project.
Rule compilation and cache-aware evaluation `src/skills/detect.rs`	`CompiledConfigFileContentRules` converts `files` from `Option<Vec<String>>` to `Option<Vec<PathBuf>>` at compile time. `evaluate_rules()` gains a `content_cache` parameter and implements cache-before-read: checks the cache, reads and inserts on miss, then applies configured regex patterns to cached content.
File gathering using pre-discovered gradle_files `src/skills/detect.rs`	`gather_content_scan_files()` is updated to extend files from `metadata.gradle_files` directly when `scan_gradle_layout` is enabled, with deduplication and existence checks applied to explicit file inclusion.
Documentation `.agents/journal/bolt.md`	Journal entry for 2026-05-24 documents the optimization strategy: per-project content caching and pre-discovering layout markers during the primary walk to reduce disk I/O and allocation overhead.

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 A rabbit hops through the code,
Pre-caching files on the road,
Less disk reads, more speed,
Gradle files we'll indeed,
Discover once, reuse with mode! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'perf: optimize technology detection with content caching and pre-discovery' accurately reflects the main changes: implementing content caching for shared config files and pre-discovering layout-specific markers to optimize technology detection performance.
Description check	✅ Passed	The description is directly related to the changeset, explaining the implementation of content caching and pre-discovery mechanisms that correspond to the modifications in detect.rs and the journal entry.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf-tech-detection-caching-9856788744421460774

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

semgrep-code-dallay · 2026-05-17T10:03:21Z

+                Some(cached)
+            } else {
+                let absolute = project_root.join(file_path);
+                if let Ok(content) = fs::read_to_string(absolute) {


Semgrep identified a blocking 🔴 issue in your code:

File path constructed from untrusted configuration rules allows path traversal attack, enabling attackers to read arbitrary files outside the intended directory.

More details about this

The code reads file content from a path constructed by joining project_root with file_path, where file_path comes from untrusted content_rules that may be loaded from external configuration. An attacker can craft a malicious content_rules configuration containing path traversal sequences (like ../../../etc/passwd) in the file paths. When project_root.join(file_path) is called, these sequences allow escaping the intended directory boundary and reading arbitrary files on the system. For example, if project_root is /home/app/config and file_path is ../../../etc/passwd, the code will read /etc/passwd instead of staying within the config directory. The application then searches patterns against this file content, potentially exposing sensitive data like credentials or system information to an attacker who controls the rule configuration.

Dataflow graph

flowchart LR classDef invis fill:white, stroke: none classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none subgraph File0["<b>src/skills/detect.rs</b>"] direction LR %% Source subgraph Source direction LR v0["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L335 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 335] project_root</a>"] end %% Intermediate subgraph Traces0[Traces] direction TB v2["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L335 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 335] metadata</a>"] v3["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L358 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 358] &</a>"] v4["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L358 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 358] in</a>"] v5["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L358 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 358] rel_nested_dir</a>"] v6["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L359 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 359] nested_dir</a>"] v7["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L375 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 375] &</a>"] v8["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L374 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 374] evaluate_rules</a>"] v9["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L421 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 421] project_root</a>"] v10["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L482 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 482] absolute</a>"] end v2 --> v3 v3 --> v4 v4 --> v5 v5 --> v6 v6 --> v7 v7 --> v8 v8 --> v9 v9 --> v10 %% Sink subgraph Sink direction LR v1["<a href=https://github.com/dallay/agentsync/blob/564017c1055a4a806d0d9bf8e92d60869fe8220a/src/skills/detect.rs#L483 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 483] absolute</a>"] end end %% Class Assignment Source:::invis Sink:::invis Traces0:::invis File0:::invis %% Connections Source --> Traces0 Traces0 --> Sink

Loading

To resolve this comment:

✨ Commit fix suggestion

Normalize and validate the file path to prevent path traversal before using it with fs::read_to_string or similar file system functions.

Use a helper function like canonical_existing_path to canonicalize and check the path. You can do this by adding: let file_path = canonical_existing_path(file_path).ok()?; before any code that reads the file, such as fs::read_to_string(file_path).

Only proceed with file operations if the call to canonical_existing_path returns Some(path), meaning the path is valid and canonical.

Make sure that all dynamic path construction using user input (such as joins with user-derived values) passes through this validation step before being used by the application.

Canonicalizing the path with canonical_existing_path ensures that attacker-controlled input cannot break out of the intended directory, which helps to prevent path traversal vulnerabilities.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by tainted-path.

_{You can view more details about this finding in the Semgrep AppSec Platform.}

sentry · 2026-05-17T10:03:59Z

Codecov Report

❌ Patch coverage is 94.82759% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/skills/detect.rs	94.82%	3 Missing ⚠️

📢 Thoughts on this report? Let us know!

sonarqubecloud · 2026-05-17T10:04:14Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
94.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/journal/bolt.md:
- Around line 156-157: The Markdown heading "## 2026-05-24 - Content Caching and
Gradle Discovery for Tech Detection" is missing a blank line beneath it; update
the .agents/journal/bolt.md entry by inserting a single blank line immediately
after that heading to satisfy MD022 linting (ensure the heading and the
following paragraph are separated by one empty line).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04ba79d7-eda9-431c-8b91-e3f51e173b18

📥 Commits

Reviewing files that changed from the base of the PR and between da781df and 564017c.

📒 Files selected for processing (2)

.agents/journal/bolt.md
src/skills/detect.rs

coderabbitai · 2026-05-17T10:04:32Z

+## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection
+**Learning:** Even with single-pass metadata, technology detection was performing redundant I/O and allocations by re-reading shared configuration files (e.g., `setup.py`) and re-calculating Gradle layout paths for every matching rule. This created an \(O(T \times F)\) overhead in I/O and heap allocations.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a blank line after the heading to satisfy Markdown linting.

Line 156 is missing the required blank line below the heading (MD022), which can keep doc-lint warnings noisy in CI.

Suggested fix

## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection + **Learning:** Even with single-pass metadata, technology detection was performing redundant I/O and allocations by re-reading shared configuration files (e.g., `setup.py`) and re-calculating Gradle layout paths for every matching rule. This created an \(O(T \times F)\) overhead in I/O and heap allocations.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection

**Learning:** Even with single-pass metadata, technology detection was performing redundant I/O and allocations by re-reading shared configuration files (e.g., `setup.py`) and re-calculating Gradle layout paths for every matching rule. This created an \(O(T \times F)\) overhead in I/O and heap allocations.

## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection

**Learning:** Even with single-pass metadata, technology detection was performing redundant I/O and allocations by re-reading shared configuration files (e.g., `setup.py`) and re-calculating Gradle layout paths for every matching rule. This created an \(O(T \times F)\) overhead in I/O and heap allocations.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 156-156: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/journal/bolt.md around lines 156 - 157, The Markdown heading "## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection" is missing a blank line beneath it; update the .agents/journal/bolt.md entry by inserting a single blank line immediately after that heading to satisfy MD022 linting (ensure the heading and the following paragraph are separated by one empty line).

github-actions Bot added the backend label May 17, 2026

semgrep-code-dallay Bot reviewed May 17, 2026

View reviewed changes

coderabbitai Bot requested changes May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: optimize technology detection with content caching and pre-discovery#421

perf: optimize technology detection with content caching and pre-discovery#421
yacosta738 wants to merge 1 commit into
mainfrom
perf-tech-detection-caching-9856788744421460774

yacosta738 commented May 17, 2026

Uh oh!

google-labs-jules Bot commented May 17, 2026

Uh oh!

coderabbitai Bot commented May 17, 2026 •

edited

Loading

Summary by CodeRabbit

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

semgrep-code-dallay Bot May 17, 2026

Uh oh!

sentry Bot commented May 17, 2026

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		## 2026-05-24 - Content Caching and Gradle Discovery for Tech Detection
		Learning: Even with single-pass metadata, technology detection was performing redundant I/O and allocations by re-reading shared configuration files (e.g., `setup.py`) and re-calculating Gradle layout paths for every matching rule. This created an \(O(T \times F)\) overhead in I/O and heap allocations.

Conversation

yacosta738 commented May 17, 2026

Uh oh!

google-labs-jules Bot commented May 17, 2026

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

semgrep-code-dallay Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot commented May 17, 2026

Codecov Report

Uh oh!

sonarqubecloud Bot commented May 17, 2026

Quality Gate passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 17, 2026 •

edited

Loading