
fix: benchmark uses stale native addon from npm #156

Merged
carlos-alm merged 2 commits into main from fix/benchmark-stale-addon on Feb 27, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Downloads the just-built native artifact from the triggering Publish workflow and overlays it on top of what npm ci installed
  • Ensures all 4 benchmark jobs (build, embedding, query, incremental) test the freshly compiled addon rather than the last stable npm release (v2.4.0)
  • Adds actions: read permission to each job for cross-workflow artifact access
  • Both new steps guarded by if: github.event_name == 'workflow_run' — manual workflow_dispatch runs skip them gracefully

Test plan

  • Push to main → Publish triggers → Benchmark triggers → verify "Install native addon from Publish" step runs
  • Native benchmark numbers should reflect the just-compiled addon
  • Manual workflow_dispatch still works (skips artifact download, falls back to npm)
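The overlay described above boils down to copying the freshly built `.node` binary over the one `npm ci` installed. A minimal shell sketch of that step, with a temp directory standing in for the artifact that `actions/download-artifact` would place under `${{ runner.temp }}` in CI (the simulated file contents are illustrative only):

```shell
# Simulate the downloaded native-linux-x64 artifact for local illustration;
# in CI this directory is populated by actions/download-artifact.
ARTIFACT_DIR="$(mktemp -d)"
printf 'fake-binary' > "$ARTIFACT_DIR/codegraph-core.node"

# Overlay the freshly built addon over what `npm ci` installed.
TARGET_DIR="node_modules/@optave/codegraph-linux-x64-gnu"
mkdir -p "$TARGET_DIR"
cp "$ARTIFACT_DIR/codegraph-core.node" "$TARGET_DIR/"
echo "Installed fresh native addon into $TARGET_DIR"
```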

@claude

claude bot commented Feb 27, 2026

Claude finished @carlos-alm's task


Reviewing PR: fix: benchmark uses stale native addon from npm

Todo List:

  • Read and analyze the workflow file changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Benchmark workflows were testing the last published npm version (v2.4.0) instead of the freshly built native addon from the current Publish workflow run. This creates a gap between code changes and performance measurements.

Why This Approach: The solution downloads the native-linux-x64 artifact from the triggering Publish workflow and overlays it on the npm-installed package. This ensures benchmarks reflect the actual performance of the code being published.

Risk Assessment: LOW RISK - The change is additive with proper fallbacks and security controls.

Code Quality Assessment

Strengths

  1. Proper conditional guards: All new steps use if: github.event_name == 'workflow_run', ensuring manual runs fall back gracefully to npm packages
  2. Consistent implementation: The same artifact download/install pattern is applied across all 4 benchmark jobs (build, embedding, query, incremental)
  3. Security: Appropriate actions: read permission added for cross-workflow artifact access
  4. Error handling: Creates package.json if missing to ensure require() resolution works
  5. Clear logging: Includes workflow run ID in installation messages for debugging

⚠️ Minor Concerns

  1. Hardcoded artifact name: Uses native-linux-x64 specifically - works for Ubuntu runners but could be more flexible (lines 41, 133, 237, 329)
  2. Missing error handling: No explicit handling if artifact download fails or .node file is missing
  3. Shell globbing risk: Uses *.node wildcard in cp commands (lines 51, 143, 247, 339) - could theoretically copy unintended files

💡 Suggestions

  • Consider adding error checking: [ -f "${{ runner.temp }}/native-artifact"/*.node ] before cp
  • The artifact name could use a variable if other platforms are supported later
  • Consider logging the native addon version/hash for traceability
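A note on the error-checking suggestion above: a glob inside `[ -f ... ]` behaves inconsistently (it errors when several files match and tests a literal pattern when none do), so an explicit filename is more robust. A sketch, with a temp directory simulating the downloaded artifact:

```shell
# Check for the addon by its explicit name rather than a *.node glob,
# so the test cannot misfire when zero or several files match.
ARTIFACT_DIR="$(mktemp -d)"   # stand-in for the artifact download directory
printf 'fake-binary' > "$ARTIFACT_DIR/codegraph-core.node"

ADDON="$ARTIFACT_DIR/codegraph-core.node"
if [ ! -f "$ADDON" ]; then
  # ::error is a GitHub Actions workflow command that annotates the log.
  echo "::error::codegraph-core.node not found in $ARTIFACT_DIR" >&2
  ls -la "$ARTIFACT_DIR"
  exit 1
fi
echo "native addon present: $ADDON"
```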

Phase Analysis

Phase 1 ✅: Solves a real problem - ensures benchmarks test current code, not stale npm packages

Phase 2 ✅: Configuration changes are additive and properly scoped. No rules relaxed or thresholds lowered

Phase 3 ✅: No test integrity issues - this improves test accuracy rather than weakening it

Phase 4 ✅: Code quality is good with proper conditionals, permissions, and fallback behavior

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-implemented fix for a real issue. The approach is sound, properly guarded, and maintains backward compatibility. Minor suggestions above could be addressed in follow-up if needed, but they don't block approval.

The PR effectively solves the core problem of benchmark staleness while maintaining safety through conditional execution and proper permissions. The implementation is consistent across all benchmark jobs and includes appropriate fallbacks.


@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR ensures benchmark workflows test freshly compiled native addons rather than the last stable npm release (v2.4.0).

Key changes:

  • Downloads the just-built native-linux-x64 artifact from the triggering Publish workflow
  • Overlays the fresh .node file into node_modules/@optave/codegraph-linux-x64-gnu/
  • Adds actions: read permission to all 4 benchmark jobs for cross-workflow artifact access
  • Guards both download and install steps with if: github.event_name == 'workflow_run' so manual workflow_dispatch runs skip gracefully

Integration with existing code:

  • The overlay approach works correctly with the native loader in src/native.js, which uses require('@optave/codegraph-linux-x64-gnu') to load platform-specific packages
  • Matches the artifact naming convention (native-linux-x64) established in the Publish workflow
  • Creates the minimal package.json structure needed for Node.js module resolution
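The "minimal package.json" point can be sketched as a shell step: Node's resolver only needs a `name` and `main` field to satisfy `require('@optave/codegraph-linux-x64-gnu')`. The exact field values below (notably the placeholder `version`) are illustrative, not what the workflow necessarily writes:

```shell
# Create the minimal package.json Node's resolver needs, but only if
# npm ci didn't already install the platform package.
PKG_DIR="node_modules/@optave/codegraph-linux-x64-gnu"
mkdir -p "$PKG_DIR"
if [ ! -f "$PKG_DIR/package.json" ]; then
  cat > "$PKG_DIR/package.json" <<'EOF'
{
  "name": "@optave/codegraph-linux-x64-gnu",
  "version": "0.0.0-ci",
  "main": "codegraph-core.node"
}
EOF
fi
cat "$PKG_DIR/package.json"
```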

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation correctly addresses the stated problem using GitHub Actions' cross-workflow artifact access pattern. The changes are isolated to CI configuration, properly guarded with conditionals for graceful fallback, and consistent across all four benchmark jobs. The directory structure and package.json format match what the native loader expects.
  • No files require special attention

Important Files Changed

.github/workflows/benchmark.yml: Adds native addon overlay from Publish workflow artifacts, ensuring benchmarks test the freshly compiled code rather than the stale npm version

Last reviewed commit: d1c763c


@greptile-apps greptile-apps bot left a comment


1 file reviewed, no comments


Download the just-built native artifact from the triggering Publish
workflow and overlay it on top of what npm ci installed. This ensures
benchmarks test the freshly compiled addon rather than the last stable
npm release.

Both new steps are guarded by `if: github.event_name == 'workflow_run'`
so manual workflow_dispatch runs skip them and fall back to npm.
@carlos-alm carlos-alm force-pushed the fix/benchmark-stale-addon branch from d1c763c to 3fd2bb0 on February 27, 2026 06:16

- Validate codegraph-core.node exists before copying (fail with
  ::error annotation and directory listing if missing)
- Use explicit file path instead of *.node glob to avoid copying
  unintended files
- Log sha256 hash of installed addon for traceability
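The three hardening points in this commit message can be sketched as one shell step (the temp directory stands in for the artifact downloaded to `${{ runner.temp }}`; file contents are simulated for illustration):

```shell
ARTIFACT_DIR="$(mktemp -d)"   # stand-in for the downloaded artifact
printf 'fake-binary' > "$ARTIFACT_DIR/codegraph-core.node"

ADDON="$ARTIFACT_DIR/codegraph-core.node"
TARGET_DIR="node_modules/@optave/codegraph-linux-x64-gnu"

# 1. Validate the file exists; fail with an ::error annotation and a
#    directory listing if it doesn't.
if [ ! -f "$ADDON" ]; then
  echo "::error::codegraph-core.node missing from artifact" >&2
  ls -la "$ARTIFACT_DIR"
  exit 1
fi

# 2. Copy the explicit file path, never a *.node glob.
mkdir -p "$TARGET_DIR"
cp "$ADDON" "$TARGET_DIR/codegraph-core.node"

# 3. Log the sha256 of what was actually installed, for traceability.
sha256sum "$TARGET_DIR/codegraph-core.node"
```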
@claude

claude bot commented Feb 27, 2026

Claude finished @carlos-alm's task


Reviewing PR: fix: benchmark uses stale native addon from npm

Todo List:

  • Read and analyze the workflow file changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality assessment
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: Benchmark workflows were testing the last published npm version (v2.4.0) instead of the freshly built native addon from the current Publish workflow run. This creates a gap between code changes and performance measurements.

Why This Approach: The solution downloads the native-linux-x64 artifact from the triggering Publish workflow and overlays it on the npm-installed package. This ensures benchmarks reflect the actual performance of the code being published.

Risk Assessment: LOW RISK - The change is additive with proper fallbacks and security controls.

Code Quality Assessment

Strengths

  1. Proper conditional guards: All new steps use if: github.event_name == 'workflow_run', ensuring manual runs fall back gracefully to npm packages
  2. Consistent implementation: The same artifact download/install pattern is applied across all 4 benchmark jobs (build, embedding, query, incremental)
  3. Security: Appropriate actions: read permission added for cross-workflow artifact access
  4. Error handling: Proper file existence checks and error logging (lines 51-54, 150-153, 261-264, 360-363)
  5. Clear logging: Includes workflow run ID and SHA256 hash for debugging and verification
  6. Fallback behavior: Manual workflow_dispatch runs skip artifact download gracefully

⚠️ Minor Concerns

  1. Hardcoded artifact name: Uses native-linux-x64 specifically - works for Ubuntu runners but could be more flexible
  2. Directory creation: Uses mkdir -p, which is safe but could mask permission issues

💡 Suggestions

  • The error handling is actually quite good with explicit file checks and detailed logging
  • The SHA256 logging adds good traceability for debugging performance differences

Phase Analysis

Phase 1 ✅: Solves a real problem - ensures benchmarks test current code, not stale npm packages

Phase 2 ✅: Configuration changes are additive and properly scoped. No rules relaxed or thresholds lowered

Phase 3 ✅: No test integrity issues - this improves test accuracy rather than weakening it

Phase 4 ✅: Code quality is good with proper conditionals, permissions, error handling, and fallback behavior

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-implemented fix for a real issue. The approach is sound, properly guarded, and maintains backward compatibility. The implementation includes good error handling, logging, and consistent patterns across all benchmark jobs.

The PR effectively solves the core problem of benchmark staleness while maintaining safety through conditional execution and proper permissions. The artifact overlay approach is clean and doesn't interfere with the existing npm package structure.


@carlos-alm carlos-alm merged commit 83f2d4e into main Feb 27, 2026
14 checks passed
@carlos-alm carlos-alm deleted the fix/benchmark-stale-addon branch February 27, 2026 06:39